@prefix vivo: . @prefix edm: . @prefix ns0: . @prefix dcterms: . @prefix skos: . vivo:departmentOrSchool "Science, Faculty of"@en ; edm:dataProvider "DSpace"@en ; ns0:degreeCampus "UBCV"@en ; dcterms:creator "Woollard, Geoffrey Robert Paget"@en ; dcterms:issued "2014-04-11T15:45:10Z"@en, "2014"@en ; vivo:relatedDegree "Master of Science - MSc"@en ; ns0:degreeGrantor "University of British Columbia"@en ; dcterms:description "Current protein sequencing methods include mass spectrometry and Edman degradation. We envision a novel high-throughput protein sequencing method using affinity adapters to recognize the N-terminal residue of a denatured peptide in an iterative process. This thesis takes a first step toward designing robust and selective affinity reagents. We outline our pipeline for designing selective protein adapters that recognize the N-terminal amino acid of a peptide independent of the following sequence. We based our design on a substrate recognition protein in the N-end rule pathway, ClpS. The bacterial N-recognin protein ClpS binds peptide substrates, termed N-degrons, that have a bulky hydrophobic amino acid (L/F/Y/W) at the N-terminus. Using full atom in silico models we designed hydrogen bonding and salt-bridge contacts in ClpS to novel N-degron substrates (N-end D/E/T), predicted the selectivity of these designs, and experimentally verified them. Of 11 designs, we purified nine that were soluble by SDS-PAGE, and obtained a peptide binding profile to 30 peptides with a modified ELISA assay. Most designs were non-specific or had no binding affinity. Four designs M53A, L112F, I45L, I45L_I45L_M53A had an increase in affinity to various substrates, but were not selective as they retained affinity to the native substrates (N-end L/F/Y/W). We performed molecular dynamics simulations on several proteins that were soluble or insoluble under standard expression conditions in E. coli, in order to learn parameters that were indicative of kinetic instability. 
Using a back-to-consensus approach, we identified a point mutant S104F that stabilizes the scaffold of ClpS as assayed by GFP fluorescence in a GFP-ClpS fusion protein. This thesis outlines the computational design pipeline we developed, which includes a RosettaScripts protocol, an in silico selectivity screen with AutoDock, and a kinetic stability confidence score from a molecular dynamics trajectory. Finally, we make suggestions toward designing selective affinity reagents for high-throughput N-end protein sequencing."@en ; edm:aggregatedCHO "https://circle.library.ubc.ca/rest/handle/2429/46377?expand=metadata"@en ; skos:note """Redesign of the N-end Rule Protein ClpS for use in High-Throughput N-end Protein Sequencing
by Geoffrey Robert Paget Woollard
B.Sc., The University of British Columbia, 2011
A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Genome Science and Technology)
THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)
May 2014
© Geoffrey Robert Paget Woollard, 2014

Abstract

Current protein sequencing methods include mass spectrometry and Edman degradation. We envision a novel high-throughput protein sequencing method using affinity adapters to recognize the N-terminal residue of a denatured peptide in an iterative process. This thesis takes a first step toward designing robust and selective affinity reagents. We outline our pipeline for designing selective protein adapters that recognize the N-terminal amino acid of a peptide independent of the following sequence. We based our design on a substrate recognition protein in the N-end rule pathway, ClpS. The bacterial N-recognin protein ClpS binds peptide substrates, termed N-degrons, that have a bulky hydrophobic amino acid (L/F/Y/W) at the N-terminus.
Using full atom in silico models we designed hydrogen bonding and salt-bridge contacts in ClpS to novel N-degron substrates (N-end D/E/T), predicted the selectivity of these designs, and experimentally verified them. Of 11 designs, we purified nine that were soluble by SDS-PAGE, and obtained a peptide binding profile to 30 peptides with a modified ELISA assay. Most designs were non-specific or had no binding affinity. Four designs M53A, L112F, I45L, I45L_I45L_M53A had an increase in affinity to various substrates, but were not selective as they retained affinity to the native substrates (N-end L/F/Y/W). We performed molecular dynamics simulations on several proteins that were soluble or insoluble under standard expression conditions in E. coli, in order to learn parameters that were indicative of kinetic instability. Using a back-to-consensus approach, we identified a point mutant S104F that stabilizes the scaffold of ClpS as assayed by GFP fluorescence in a GFP-ClpS fusion protein. This thesis outlines the computational design pipeline we developed, which includes a RosettaScripts protocol, an in silico selectivity screen with AutoDock, and a kinetic stability confidence score from a molecular dynamics trajectory. Finally, we make suggestions toward designing selective affinity reagents for high-throughput N-end protein sequencing.

Preface

Dr. Joerg Gsponer, Dr. Stephen Withers, and Dr. Leonard Foster conceptualized the method of N-end protein sequencing. In September 2010, the author began this project for an undergraduate honours thesis (PHYS 449). Various parts of this work draw from the reports submitted for that course, as well as the course GSAT 502. Dr. Patrick Chan developed and performed the peptide binding assay, including the cloning and purification steps. Dr. Leonard Foster, the Foster Lab, and Dr. Nobuhiko Tokuriki were essential collaborators for the peptide binding assay experiments, and lent valuable expertise, reagents and personnel.
Geoffrey Woollard carried out the back-to-consensus stabilization of ClpS in collaboration with Dr. Nobuhiko Tokuriki and Dr. Miriam Kaltenbach. The remainder of the work is original, unpublished work of Geoffrey Woollard, performed under the guidance of Dr. Joerg Gsponer.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Acknowledgements
Dedication
Chapter 1: Introduction
  1.1 High-Throughput Protein Sequencing for Proteomics
  1.2 Edman Degradation Sequencing
  1.3 Mass Spectrometry Based Proteomics
  1.4 Aim of this Work
  1.5 High-Throughput N-end Sequencing
  1.6 The N-end Rule
  1.7 Computational Protein Design
Chapter 2: Methods
  2.1 Rosetta
    2.1.1 Fixbb Protocol
    2.1.2 Relax Protocol
    2.1.3 RosettaScripts Protocol
  2.2 AutoDock
  2.3 Structural Analysis
  2.4 Molecular Dynamics Simulations
    2.4.1 MD Structure Preparation
    2.4.2 MD Minimization and Heating
    2.4.3 MD Equilibration
    2.4.4 MD Production
  2.5 Contact Map Analysis
    2.5.1 Contact Pair Separation
    2.5.2 Statistical Tests of Contact Pair Distributions
  2.6 GFP-ClpS Fluorescence Screen
  2.7 Peptide Binding Assay
Chapter 3: Results
  3.1 Benchmarking Protocol on Existing Experimental Data
  3.2 Computational Designs and Experimental Validation
  3.3 Design Optimization
    3.3.1 Investigating Kinetic Stability with Molecular Dynamics Simulations
    3.3.2 Stabilizing ClpS with Back-To-Consensus Mutations
Chapter 4: Conclusion
Bibliography
Appendices
  Appendix A - AutoDock GA-LS parameters
  Appendix B - DNA Sequences
    B.1 pET28a-His6-SUMO-GFP(F64L/S65T/F99S/M153T)-TEV-ClpS
    B.2 Mutagenesis Primers
  Appendix C - RosettaScripts (cst.ndes.xml)
    C.1 Protocol
    C.2 Hydrogen Bond Restraint (amide_hb.cst)
  Appendix D - Resfile
  Appendix E - Molecular Dynamics Scripts: NAMD Executables
    E.1 Minimization, Heating, Equilibration
    E.2 Production

List of Tables

Table 1.1 Published N-recognin Structures
Table 1.2 Experimental Binding Constants (N-recognin :: N-degron)
Table 1.3 Rosetta Energy Terms
Table 3.1 Experimental Results of N-recognin Designs
Table 3.2 Kinetic Stability Labels for Randomization Significance Test

List of Figures

Figure 1.1 High-Throughput Protein Sequencing Schema
Figure 1.2 Binding Interfaces of N-recognins
Figure 1.3 N-end Hydrogen Bond Coordination
Figure 3.1 Wild-type ClpS :: N-degron Binding Profile (Rosetta)
Figure 3.2 Trp N-degron Binding Pose
Figure 3.3 Wild-type ClpS :: N-degron Binding Profile (AutoDock)
Figure 3.4 Wild-type ClpS :: N-degron Bound and Unbound Energy (AutoDock)
Figure 3.5 Wild-type ClpS Binding Profile
Figure 3.6 Docking Poses of the Leu N-degron from AutoDock
Figure 3.7 Binding Pocket Residues of ClpS
Figure 3.8 Specificity Profile of V56H Negative Designs
Figure 3.9 Eleven Core Residues of ClpS
Figure 3.10 Per Residue SASA Distribution
Figure 3.11 L112F Peptide Binding Profile
Figure 3.12 E. coli and C. crescentus Binding Pocket Differences
Figure 3.13 Peptide Binding Affinity of I45L and I45L_M53A
Figure 3.14 Peptide Binding Profile of Thr Designs
Figure 3.15 Contact Map of ClpS and L112K
Figure 3.16 Contact Pair Separation Distribution
Figure 3.17 Top Six Features Based on Max Separation
Figure 3.18 Contact Frequency Separation Enrichment
Figure 3.19 P-value Enrichment for Kinetically Stable vs. Unstable Contact Frequencies
Figure 3.20 Structural Basis of Four Contact Pairs
Figure 3.21 Contact Pairs with > 30 % Frequency Separation
Figure 3.22 Confidence of Kinetic Stability for Various ClpS Mutants
Figure 3.23 Sequence Family of ClpS and Back-to-Consensus Mutations
Figure 3.24 Normalized Fluorescent Signals of Back-to-Consensus Mutations
Figure 3.25 ClpS S104 and Surrounding Aromatic Residues
Figure 4.1 Computational Pipeline
List of Abbreviations

CD - circular dichroism
ELISA - enzyme-linked immunosorbent assay
FA - fluorescence anisotropy
HRP - horseradish peroxidase
IPTG - isopropyl-beta-D-thiogalactopyranoside
ITC - isothermal titration calorimetry
Kd - dissociation constant
MS - mass spectrometry
PBS - phosphate buffered saline
SDS-PAGE - sodium dodecyl sulfate polyacrylamide gel electrophoresis
SILAC - stable isotope labeling with amino acids in cell culture

A - Ala - Alanine
C - Cys - Cysteine
D - Asp - Aspartic acid
E - Glu - Glutamic acid
F - Phe - Phenylalanine
G - Gly - Glycine
H - His - Histidine
I - Ile - Isoleucine
K - Lys - Lysine
L - Leu - Leucine
M - Met - Methionine
N - Asn - Asparagine
P - Pro - Proline
Q - Gln - Glutamine
R - Arg - Arginine
S - Ser - Serine
T - Thr - Threonine
V - Val - Valine
W - Trp - Tryptophan
Y - Tyr - Tyrosine

Acknowledgements

When I first met Dr. Joerg Gsponer in 2010 he presented me with two projects: a stock undergraduate thesis project, and another that was much more difficult, but far more interesting. I chose the latter project, which grew into this thesis. He was always available in his office to bounce ideas off of, and I thank him for the opportunity to work in the challenging area of protein design. I thank Dr. Nobu Tokuriki for making space for me in his lab and welcoming me. Dr. Miriam Kaltenbach provided expertise and encouragement, and I was inspired by the intensity with which she worked and her care for detail. Dr. Leonard Foster and Dr. Patrick Chan, in the midst of many other responsibilities, devoted time to obtaining experimental results and pondering over mysterious results. The hidden work of the support staff at GSAT, CHiBi and MSL makes the wheels of publicly funded science go round, and I thank you all: Hugh Brown, Sharon Ruschkowski, David Thompson, Joyce Huang, and Miranda Joyce.
Dedication

To my family

Chapter 1: Introduction

1.1 High-Throughput Protein Sequencing for Proteomics

High-throughput DNA sequencing revolutionized genomics. The ability to sequence whole genomes has stimulated new disease treatments, increased understanding of microbial diversity and evolutionary origins, and promises to usher in medicine that is predictive, preventative, personalized and participatory (Morozova and Marra 2008; Weston and Hood 2004; Hood and Auffray 2013). Proteins represent a further level of complexity beyond which genes are present in an organism (DNA) and which ones are being expressed (RNA) in the current environment. Proteins mechanistically perform most cellular functions necessary for life, such as enzymatic reactions, cell signaling, transport and structural support. A similar breakthrough in the global study of all proteins (proteomics) requires a corresponding technological advance: high-throughput protein sequencing.

DNA sequencing relies on DNA polymerase, a protein that replicates genomes in living systems with the help of associated proteins and co-factors. In contrast, proteins are not amplified and replicated by a 'protein polymerase', and current protein sequencing relies on chemical methods such as Edman degradation (Edman 1950; Niall 1973) or on mass spectrometry. Furthermore, protein sequencing presents unique challenges. The majority of DNA is spatially localized in cells, and remains at a static concentration when not replicated. In contrast, protein expression, concentration and localization vary across cell types, time points and internal environments (pH, temperature, cellular stress, ageing, drug treatment). All of these variables present challenges to protein sequencing, yet they also highlight the wealth of information that may help explain biological processes, as well as defects in these processes that could lead to diseases (de Hoog and Mann 2004).
1.2 Edman Degradation Sequencing

Edman sequencers were developed half a century ago, and rely on specific chemical reagents to cleave one amino acid off the N-terminus, then identify it chromatographically (Edman 1950; Edman and Begg 1967; Niall 1973). The cleavage and detection steps take tens of minutes for each amino acid, and tens of hours are required to sequence long peptides (Mortz, Nguyen, and Kofoed 2013). Edman sequencers are being eclipsed by mass spectrometry, but are still used to verify the protein sequence of a cleaved peptide fragment that has been isolated by gel electrophoresis (Doucet and Overall 2011). Applied Biosystems announced in 2008 that it was no longer manufacturing Edman sequencers, and would eventually stop selling the reagents (Fong 2008). The decision was made for commercial reasons, as Edman sequencing represented a very small fraction of Applied Biosystems' total business. This suggests that Edman sequencing is a mature technology that lacks creative innovation, and is perhaps even becoming obsolete.

1.3 Mass Spectrometry Based Proteomics

Current breakthroughs in mass spectrometry (MS) proteomics rely on the ability to fragment proteins into peptides, ionize them, and identify their primary structure through their mass-to-charge ratio (Hernandez, Müller, and Appel 2006). Using tandem mass spectrometry, one obtains a mass-to-charge spectrum for each peptide and then fragments the peptide further, breaking off amino acids. One can identify the peptide sequence by comparing the experimental spectra to reference spectra, aided by non-trivial statistical analysis with software. Once peptides have been reliably identified, one can search a database that matches peptides to proteins. MS-based technologies can do more than identify the presence or absence of proteins. MS can detect multiple modifications to proteins, including phosphorylation, acetylation, ubiquitination, glycosylation and oxidation (Silva et al.
2013; Keck et al. 2011; W. Kim et al. 2011). In contrast, genomic or transcriptomic approaches can give only limited insight into the state of proteins in a given environment, because they measure post-translational modifications indirectly. Transcript abundance has been shown to differ from protein levels by as much as 30-fold (Gygi et al. 1999). Both transcript and protein abundance vary, but such high variation in their ratio is largely due to changes in protein levels in different cellular types and states. Thus, protein abundance is more appropriately measured directly through MS than inferred from transcript abundance. Stable isotope labeling with MS allows relative quantification of such variation in protein levels or modifications. A number of in vitro or in vivo methods exist for comparing protein characteristics in a whole organism, tissue, organ structure, cell line, body fluid, or arbitrary protein mixture (e.g. serum) (Gouw, Krijgsveld, and Heck 2010). Depending on the experimental question, labels can be introduced metabolically with isotopically enriched amino acids or salts containing such atoms as 13C, 15N, or 18O. Alternatively, differentially labeled chemical groups can be added to certain functional groups in proteins. An example of metabolic labeling is 'stable isotope labeling with amino acids in cell culture' (SILAC), where an organism or cell line is grown in two conditions: (1) normal media containing normal "light" amino acids and (2) media containing isotopically labeled "heavy" amino acids (Ong and Mann 2006). In any MS experiment, only a fraction of the peptides present in a sample are detected, which makes it difficult to detect low-abundance proteins. Some strategies such as liquid separation attempt to overcome this, and much work continues in this area (Kay et al. 2008; Altelaar, Munoz, and Heck 2013).
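
As a rough illustration of the mass-to-charge quantity discussed in this section, the following Python sketch computes the monoisotopic m/z of an unmodified peptide from standard residue masses. It is a minimal sketch under stated assumptions, not part of the thesis workflow: the function name peptide_mz and the example peptides are hypothetical, masses are rounded to five decimals, and post-translational modifications and isotope labels are ignored.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids,
# rounded to five decimals. Illustrative values, not thesis data.
RESIDUE_MASS = {
    'G': 57.02146, 'A': 71.03711, 'S': 87.03203, 'P': 97.05276,
    'V': 99.06841, 'T': 101.04768, 'C': 103.00919, 'L': 113.08406,
    'I': 113.08406, 'N': 114.04293, 'D': 115.02694, 'Q': 128.05858,
    'K': 128.09496, 'E': 129.04259, 'M': 131.04049, 'H': 137.05891,
    'F': 147.06841, 'R': 156.10111, 'Y': 163.06333, 'W': 186.07931,
}
WATER = 18.01056   # one water is added for the free N- and C-termini
PROTON = 1.00728   # mass of each charge-carrying proton


def peptide_mz(sequence, charge=1):
    # Hypothetical helper: m/z of the [M + zH]z+ ion of an unmodified peptide.
    if charge < 1:
        raise ValueError('charge must be a positive integer')
    neutral_mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return (neutral_mass + charge * PROTON) / charge


if __name__ == '__main__':
    for peptide in ('FRSKGEELFT', 'LVKSKATNLLY'):
        print(peptide, round(peptide_mz(peptide, charge=2), 4))
```

Because Leu and Ile have identical residue masses, this calculation cannot tell them apart, which is one example of information that a mass measurement alone does not resolve.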
1.4 Aim of this Work

This work aims to take the first steps towards a new proteomics technology that is orthogonal to (i.e. fundamentally different from) mass spectrometry. Our goal is to develop a single-molecule method of high-throughput protein sequencing, based on recognition by affinity reagents. The scope of this thesis is to design protein-based affinity reagents for this high-throughput protein sequencing method using computational methods, and to characterize the designed affinity reagents with experimental techniques.

1.5 High-Throughput N-end Sequencing

In order to address some of the limitations of MS-based proteomics, we propose a fundamentally orthogonal technology. We envision the parallel sequencing of hundreds of millions of peptides at single-molecule resolution by sequential steps of Edman degradation (Edman 1950; Niall 1973) combined with affinity reagents and single-molecule fluorescence imaging (Fig. 1.1). The proteins to be sequenced are unfolded into linear strands and immobilized on a substrate with their N-termini (amino termini) exposed. The sequencing method works as follows:

1) The N-terminal amino acid of the immobilized peptide is bound by a specific adapter (i.e. there would be 20 adapters to recognize the 20 types of amino acids).
2) The bound adapter is imaged; for example, it is optically detected through an attached fluorescent molecule. This reveals the identity of the N-terminal amino acid of the peptide.
3) The adapters are washed off the peptides and a single amino acid on the peptide is removed through Edman degradation. One and only one amino acid is removed: the one previously identified.
4) The process is repeated.

Figure 1.1 High-Throughput Protein Sequencing Schema. A protein mixture is immobilized on a matrix and a mixture of adapters is added. These adapter proteins selectively bind the protein to be sequenced depending on its N-terminal amino acid, and each has a unique fluorescent molecule attached. The amino acid sequence is read out by successive rounds of N-terminal amino acid removal via Edman degradation and reapplication of the adapters. Panel labels: Bind, Image, Remove.

In traditional Edman sequencing, a free amino acid is cleaved off and detected with chromatography. This sequencing method is a low-throughput technology for several reasons. It sequences a single peptide at a time, and each cycle of identifying and cleaving one amino acid takes 40 minutes (Mortz, Nguyen, and Kofoed 2013). Our method removes the chromatographic detection step, and uses Edman degradation only to remove an amino acid, which is then washed away. Also, we would sequence an array of peptides in parallel, because the reaction steps (binding, imaging, removing) are performed on each peptide simultaneously.

Once the peptides are immobilized, one cannot amplify them. In DNA sequencing, immobilized oligonucleotides are amplified, increasing the signal during the imaging step (Metzker 2010). The amplification step of the DNA is carried out with DNA polymerase, but there is no known enzyme that can amplify protein in this manner. Therefore this high-throughput technique would require single-molecule identification, which is feasible with current microscopy technologies that break the diffraction limit, such as sub-diffraction-limit imaging by stochastic optical reconstruction microscopy (Rust, Bates, and Zhuang 2006; Zhuang 2009), near-field scanning optical microscopy (F. de Lange et al. 2001), stimulated emission depletion microscopy (Donnert et al. 2006), and total internal reflection fluorescence microscopy (Webb et al. 2012). The missing piece of the puzzle for N-end sequencing is a set of adapter proteins that are able to bind specifically to each of the 20 amino acids on the N-terminus of a peptide.
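
The bind, image, remove cycle described in this section can be sketched as a toy simulation. The Python below is a minimal sketch under idealized assumptions (perfectly selective adapters, no missed or double cleavages, no imaging error); the names sequence_peptides, IDEAL_ADAPTERS, NATIVE_LIKE_ADAPTERS and edman_hours are hypothetical and do not come from the thesis. A residue with no matching adapter is reported as '?'.

```python
AMINO_ACIDS = 'ACDEFGHIKLMNPQRSTVWY'

# Hypothetical ideal adapter set: one perfectly selective adapter per residue,
# each mapped to the label that would be read out by fluorescence imaging.
IDEAL_ADAPTERS = {aa: aa for aa in AMINO_ACIDS}

# Adapter set that only covers the native ClpS substrates (N-end L/F/Y/W).
NATIVE_LIKE_ADAPTERS = {aa: aa for aa in 'LFYW'}


def bind_and_image(peptide, adapters):
    # Steps 1 and 2: the adapter matching the N-terminal residue binds and
    # its label is imaged. Returns None when no adapter recognizes it.
    return adapters.get(peptide[0])


def sequence_peptides(peptides, adapters):
    # Run bind / image / remove cycles on all immobilized peptides in parallel.
    remaining = list(peptides)
    reads = ['' for _ in remaining]
    cycles = 0
    while any(remaining):
        for i, peptide in enumerate(remaining):
            if not peptide:
                continue  # this peptide is already fully degraded
            label = bind_and_image(peptide, adapters)
            reads[i] += label if label is not None else '?'
            # Step 3: Edman degradation removes exactly one N-terminal residue.
            remaining[i] = peptide[1:]
        cycles += 1  # step 4: repeat
    return reads, cycles


def edman_hours(length, minutes_per_residue=40):
    # Time for conventional Edman sequencing of one peptide, using the
    # 40 minutes per residue figure quoted in the text.
    return length * minutes_per_residue / 60


if __name__ == '__main__':
    print(sequence_peptides(['YLFVQ', 'LLL'], IDEAL_ADAPTERS))
    print(sequence_peptides(['YLFVQ'], NATIVE_LIKE_ADAPTERS))
    print(edman_hours(30))
```

In this sketch the number of cycles equals the length of the longest peptide regardless of how many peptides are on the array, which is the sense in which the proposed method is parallel. The second call illustrates why coverage of all 20 residues matters: with only native-like recognition, positions after the hydrophobic residues cannot be read.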
1.6 The N-end Rule

The success of our sequencing approach relies on two things: (1) specificity of the adapter-peptide interaction and (2) coverage of the 20 amino acids that need to be sequenced. The adapter-peptide binding event needs to be independent of other features of the unfolded peptide, such as the next amino acid in the sequence. Therefore the adapter needs to be specific for a single amino acid. Peptide-protein interactions have unique binding characteristics: the peptides tend to bind in grooves and form many backbone hydrogen bonds (London, Movshovitz-Attias, and Schueler-Furman 2010). Peptide-protein interactions are often specified by a sequence recognition motif that defines a class of peptides that are recognized by the protein, and efforts are underway to further incorporate structural information and define minimotifs (Sargeant et al. 2012). The interaction required for N-end sequencing involves only the adapter and the N-terminal amino acid of the peptide, and is therefore unusual among peptide-protein interactions. For example, the adapter for Thr would have the motif N-T (N for the N-terminus), but the adapter would need to exclude all interactions with Thr side chains in the internal sequence of the peptide (not at the N-terminus). Because the adapter-peptide interaction is specific, there would need to be 20 different adapters for the 20 different amino acids that would appear on the N-terminus as the reaction proceeded.

Proteins with some of these characteristics do exist in nature, and are involved in eukaryotic and prokaryotic protein degradation in the so-called N-end rule pathway (Tasaki et al. 2012). In fact, it was this protein degradation pathway that initially inspired the protein sequencing technique. The N-end rule pathway was discovered by Varshavsky and co-workers in 1986 (Bachmair, Finley, and Varshavsky 1986). The half-life of a β-galactosidase reporter protein in yeast was found to depend on the identity of the N-terminal amino acid.
Reporters differed only in their N-terminal amino acid identity, but their half-life ranged from more than 20 hours to less than 3 minutes. The different half-lives led to a classification of N-terminal amino acids as stabilizing (long half-life) or destabilizing (short half-life). Much work continues on refining the categories and discovering further regulation, for instance involving N-terminal acetylation (P. F. Lange et al. 2014; Hwang, Shemorry, and Varshavsky 2010). This pathway is present in all domains of life, and in eukaryotes the N-end rule forms part of the ubiquitin system for protein degradation (Mogk, Schmidt, and Bukau 2007; Tasaki et al. 2012). To date, various naturally occurring peptide substrates of the N-end rule have been determined, along with the functional effect of their degradation. The destabilizing peptide signals that are recognized are termed N-degrons, while the proteins that recognize them and target them for degradation are termed N-recognins.

In eukaryotes, N-terminal basic residues R/K/H, called type 1 N-degrons, are recognized by a protein domain in E3 ubiquitin ligases called the UBR box. Hydrophobic residues L/I/F/Y/W are called type 2 N-degrons and are recognized by these same E3 proteins, but the binding site is in another region called the N-domain, which has high homology to ClpS. In order for a substrate to be degraded it is first bound by the UBR box, then conjugated to ubiquitin through an internal Lys, and then delivered to the proteasome for degradation. Several 3D structures of the UBR box of the E3 ubiquitin ligase Ubr1 in complex with type 1 (R/K/H) N-degrons have been determined, and are summarized in Table 1.1 (Choi et al. 2010; Matta-Camacho et al. 2010). These structures reveal that a negatively charged binding pocket binds the N-terminal residue through ionic and hydrogen bonds (Fig.
1.2 left), a mechanism quite different from the hydrophobic binding pocket of the bacterial N-recognin ClpS (Fig. 1.2 right).

Figure 1.2 Binding Interfaces of N-recognins. (Left, UBR box) The E3 ubiquitin-protein ligase Ubr1 in Baker’s yeast binds charged N-degrons R/K/H (HIA shown) through ionic and hydrogen bond interactions in the UBR box domain (pdb 3nij). (Right, ClpS) In bacteria the protein ClpS binds the N-degrons L/F/Y/W (Y shown) through a hydrophobic binding pocket (pdb 3dnj).

The adapters needed for N-end protein sequencing need to be specific for each amino acid. Therefore the N-end rule proteins would need to be modified to make them specific to each amino acid independent of the downstream sequence, while retaining a sufficiently strong binding affinity. Many crystal structures of ClpS, alone and in complex with an N-degron, have been published (Table 1.1). Using the highest resolution crystal structure of ClpS (3dnj) as a template, we thought it possible to redesign ClpS by computationally modeling mutations and computing the binding affinity to the 20 amino acids in silico. Bacterial ClpS was chosen instead of the eukaryotic UBR box because ClpS has a higher affinity to its hydrophobic N-degrons than does the UBR box to charged N-degrons, as measured by isothermal titration calorimetry (ITC), fluorescence anisotropy (FA), and surface plasmon resonance (SPR) (Table 1.2). This makes ClpS an attractive template for design: because the binding affinities of designs may fall below wild-type values, the initial binding affinity needs to be strong.

Table 1.1 Published N-recognin Structures. H: hydrogens present (+) or absent (-). N-degron: Coordinates present (bold) or absent (not bold) from crystal structure.

pdb | N-recognin | Start Res | End Res | N-degron | Resolution (Å) | H | Organism | Reference
3dnj | ClpS | 40 | 119 | YLFVQRDSKE | 1.15 | + | C. crescentus | Wang et al. 2008
2w9r | ClpS | 25 | 122 | LVKSKATNLLY | 1.70 | - | E. coli | Schuenemann et al. 2009
2wa8 | ClpS | 31 | 119 | FRSKGEELFT | 2.15 | - | E. coli | Schuenemann et al. 2009
2wa9 | ClpS | 39 | 121 | LLTMITDSLA | 2.90 | - | E. coli | Schuenemann et al. 2009
3gq0 | ClpS | 38 | 119 | - | 2.07 | - | C. crescentus | Roman-Hernandez et al. 2009
3g19 | ClpS | 36 | 119 | LLL | 1.85 | - | C. crescentus | Roman-Hernandez et al. 2009
3gq1 | ClpS | 35 | 119 | WLFVQRDSKE | 1.50 | + | C. crescentus | Roman-Hernandez et al. 2009
3g1b | ClpS_M53A | 32 | 119 | WLFVQRDSKE | 1.45 | + | C. crescentus | Roman-Hernandez et al. 2009
3gw1 | ClpS | 29 | 119 | FGG | 2.36 | - | C. crescentus | Roman-Hernandez et al. 2009
3o1f | ClpS | 39 | 106 | - | 1.40 | + | E. coli | Roman-Hernandez et al. 2011
3o2h (refined 2w9r) | ClpS | 25 | 119 | LVKSKATNLLY | 1.70 | + | E. coli | Roman-Hernandez et al. 2011
3o2b (refined 2wa8) | ClpS | 35 | 119 | FRSKGEELFT | 2.05 | + | E. coli | Roman-Hernandez et al. 2011
3o2o (refined 2wa9) | ClpS | 35 | 119 | LKPP ...(ClpS) | 2.90 | + | E. coli | Roman-Hernandez et al. 2011
3ny1 | UBR1 | 98 | 168 | - | 2.09 | - | H. sapiens | Matta-Camacho et al. 2010
3ny2 | UBR2 | 96 | 167 | - | 2.61 | - | H. sapiens | Matta-Camacho et al. 2010
3ny3 | UBR2 | 98 | 167 | RIFS | 1.60 | - | H. sapiens | Matta-Camacho et al. 2010
3nih | Ubr1 | 115 | 194 | RIAAA | 2.10 | - | S. cerevisiae | Choi et al. 2010
3nii | Ubr1 | 115 | 194 | KIAA | 2.10 | - | S. cerevisiae | Choi et al. 2010
3nij | Ubr1 | 115 | 194 | HIAA | 2.10 | - | S. cerevisiae | Choi et al. 2010
3nik | Ubr1 | 115 | 194 | REAA | 1.85 | - | S. cerevisiae | Choi et al. 2010
3nil | Ubr1 | 115 | 194 | RDAA | 1.75 | - | S. cerevisiae | Choi et al. 2010
3nim | Ubr1 | 115 | 194 | RRAA | 2.00 | - | S. cerevisiae | Choi et al. 2010
3nin | Ubr1 | 115 | 194 | RLGES | 2.10 | - | S. cerevisiae | Choi et al. 2010
3nis | Ubr1 | 115 | 194 | - | 1.68 | - | S. cerevisiae | Choi et al. 2010
3nit | Ubr1 | 107 | 194 | - | 2.60 | - | S. cerevisiae | Choi et al. 2010

Table 1.2 Experimental Binding Constants of N-recognin to N-degron. ITC = Isothermal titration calorimetry; FA = Fluorescence anisotropy; SPR = Surface plasmon resonance.

N-degron | Kd (µM) | Technique | N-recognin | Reference
FRSKGEELFT | 3.8 | ITC | ClpS (E. coli) | Schuenemann et al. 2009
LVKSKATNLLY | 4.8 | ITC | ClpS (E. coli) | Schuenemann et al. 2009
WLTMITDSLA | 8.1 | ITC | ClpS (E. coli) | Schuenemann et al. 2009
LRSKGEELFTGV | 6.0 | ITC | ClpS (E. coli) | Schuenemann et al. 2009
WRSKGEELFTGV | 8.1 | ITC | ClpS (E. coli) | Schuenemann et al. 2009
YLFVQYHHHHHHC | 0.40 | FA | ClpS (C. crescentus) | Wang et al. 2008
FLFVQYHHHHHHC | 0.17 | FA | ClpS (C. crescentus) | Wang et al. 2008
WLFVQYHHHHHHC | 0.15 | FA | ClpS (C. crescentus) | Wang et al. 2008
LLFVQYHHHHHHC | 0.48 | FA | ClpS (C. crescentus) | Wang et al. 2008
RA | 38.0 | ITC | UBR box in Ubr1 (S. cerevisiae) | Choi et al. 2010
RA | 27.5 | ITC | UBR box in Ubr1 (S. cerevisiae) | Choi et al. 2010
RAAA | 22.4 | ITC | UBR box in Ubr1 (S. cerevisiae) | Choi et al. 2010
RAAAA | 16.7 | ITC | UBR box in Ubr1 (S. cerevisiae) | Choi et al. 2010
RIAA | 7.30 | ITC | UBR box in Ubr1 (S. cerevisiae) | Choi et al. 2010
KIAA | 58 | ITC | UBR box in Ubr1 (S. cerevisiae) | Choi et al. 2010
HIAA | 12.9 | ITC | UBR box in Ubr1 (S. cerevisiae) | Choi et al. 2010
RLAA | 4.22 | ITC | UBR box in Ubr1 (S. cerevisiae) | Choi et al. 2010
RLGES | 12.4 | ITC | UBR box in Ubr1 (S. cerevisiae) | Choi et al. 2010
RRAA | 17.7 | ITC | UBR box in Ubr1 (S. cerevisiae) | Choi et al. 2010
REAA | 358 | ITC | UBR box in Ubr1 (S. cerevisiae) | Choi et al. 2010
RLAA | 4.22 | ITC | UBR box in Ubr1 (S. cerevisiae) | Choi et al. 2010
RIFSTDTGPGC | 1.0 / 1.2 | FA / SPR | UBR box in Ubr1 (S. cerevisiae) | Xia et al. 2008
FIFSTDTGPGC | 0.7 / 1.0 | FA / SPR | UBR box in Ubr1 (S. cerevisiae) | Xia et al. 2008

1.7 Computational Protein Design

We chose to use computational methods to redesign ClpS as an affinity reagent for high-throughput protein sequencing. In the protein design field, computational methods are commonly employed to redesign ligand-receptor interfaces (Morin, Meiler, and Mizoue 2011; Morin et al. 2011; Tinberg et al. 2013; Allison et al. 2014) and to investigate the specificity of promiscuous receptors to a specific target (London et al. 2011; Havranek, Duarte, and Baker 2004). These methods rely on Rosetta, a biomolecular modeling suite developed in David Baker’s lab at the University of Washington (Rohl et al.
2004; Leaver-Fay, Tyka, et al. 2011). Rosetta uses a Monte-Carlo based approach; the program samples an ensemble of structural states, which are ranked by various energy scores. The most stable structure for a given amino acid sequence is found by minimizing its energy score. The binding affinity can be estimated by computing the interaction energy between the partners in the binding complex. The Rosetta energy function contains terms based on physical models common in molecular mechanics, and others that are derived from the statistical distribution of parameters in the protein data bank. If one assumes that the structural features available in the protein data bank follow a Boltzmann distribution, then one can relate the change in Gibbs free energy (between the folded and some reference state) to the probability of observing a protein in a given folded conformation. The energy function is described explicitly elsewhere (Kortemme, Morozov, and Baker 2003; Rohl et al. 2004; Das and Baker 2008), and is briefly outlined below.

Rosetta's energy function is composed of various linearly weighted terms, and we have summarized the energy function used in Rosetta 3.4 (score12) in Table 1.3. The functional forms of each of the terms in the Rosetta energy function are explicitly reported in the literature (Rohl et al. 2004; Kortemme, Morozov, and Baker 2003; Kortemme, Kim, and Baker 2004; Y. Liu and Kuhlman 2006). The relevant parameters and weights can be determined by consulting the source code, available at rosettacommons.org. The Rosetta energy function contains terms derived from physical principles, such as the Lazaridis-Karplus solvation energy term. This term models interactions of the protein with water, such as the exclusion of hydrophobic amino acids from aqueous solution known as the hydrophobic effect (Chandler 2005). Two Lennard-Jones terms model attractive induced polarization (electrostatic attraction of dipoles) and repulsive overlapping electron density.
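The two ideas in this passage, a Boltzmann relation between observed frequency and energy and a linearly weighted sum of terms, can be illustrated with a minimal Python sketch. It is not Rosetta code: the function names, the temperature, and the example weights are assumptions chosen for illustration, and the real weights and functional forms are those in the Rosetta source.

```python
import math

K_B = 0.0019872  # gas constant in kcal/(mol*K); unit choice is an assumption


def statistical_energy(p_observed, p_reference, temperature=298.0):
    # Boltzmann inversion: dG = -kT * ln(P_observed / P_reference).
    # Features seen more often than in the reference state get a negative energy.
    if p_observed <= 0.0 or p_reference <= 0.0:
        raise ValueError("probabilities must be positive")
    return -K_B * temperature * math.log(p_observed / p_reference)


def total_score(term_energies, weights):
    # Linearly weighted sum of energy terms; the weights used below are
    # hypothetical and are NOT the score12 weights.
    return sum(weights[name] * energy for name, energy in term_energies.items())


example_terms = {"fa_atr": -2.0, "fa_rep": 1.0}
example_weights = {"fa_atr": 1.0, "fa_rep": 0.5}
example_total = total_score(example_terms, example_weights)
```

A feature observed four times more often than in the reference state receives a favourable (negative) statistical energy, which is the sense in which database frequencies are converted into energy terms.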
Various empirical potentials relate the statistical distribution of parameters in the protein data bank to specific energy terms, including an explicit hydrogen bonding term. Rosetta’s energy function continues to be optimized, so that it scores native proteins favourably and is able to reproduce structures in the protein data bank (Leaver-Fay et al. 2013).

The interface of ClpS with its N-degron is essentially one side chain (Fig. 1.2, right). Furthermore, we presume several conserved backbone contacts need to remain for proper binding to occur, because these contacts are conserved across all known crystal structures. These interactions do not seem to affect the specificity, because they are present in multiple complexes of ClpS bound to diverse N-degrons. Presumably any N-degron bound by ClpS would need to bind in a similar manner. We show the hydrogen bond coordination of ClpS in complex with the Tyr N-degron from pdb 3dnj in Fig. 1.3. The small interface of ClpS and the N-degron makes it a particular challenge to design a novel specific interaction, as few new contacts can be introduced with such a small area to optimize. Rosetta has shown success at designing large interfaces between proteins with little flexibility, because one can design multiple complementary interactions between the binding partners and have a high likelihood that they will be in the designed orientation (Fleishman et al. 2011; Karanicolas et al. 2011; King et al. 2012; Strauch, Fleishman, and Baker 2014). At the same time ClpS can be readily simulated with molecular dynamics, because the computational cost is low for a protein of its size.

Therefore our hypothesis is: using computational methods, ClpS can be redesigned to specifically bind a novel N-degron. This would represent a first step toward high-throughput protein sequencing.

Figure 1.3 N-end Hydrogen Bond Coordination. (Left) ClpS binds its N-degron through three hydrogen bonds to its α-amino nitrogen.
The side chains of His 79, Asn 47 and a conserved water molecule lock the N-terminal residue in the hydrophobic binding pocket (pdb 3dnj). (Right) The distances and angles shown are between His 79 and the α-amino nitrogen of the Tyr N-degron from the crystal structure 3dnj. We constrained a distance parameter and two angles involved in N-terminal hydrogen bonding using our RosettaScripts protocol. [Figure 1.3 labels: (left) Asn 47, Tyr 1, His 79, HOH 120; (right) Tyr 1, His 79]

Table 1.3 Rosetta Energy Terms.

Rosetta Energy Term | Physical Basis
fa_rep | Lennard-Jones repulsive
fa_intra_rep | Lennard-Jones repulsive between atoms in the same residue
fa_pair | statistics-based pair term, favours close contact of negative and positive charges
fa_atr | Lennard-Jones attractive
fa_sol | Lazaridis-Karplus solvation energy
hbond_lr_bb | backbone-backbone hbonds distant in primary sequence
hbond_sr_bb | backbone-backbone hbonds close in primary sequence
hbond_bb_sc | sidechain-backbone hydrogen bond energy
hbond_sc | sidechain-sidechain hydrogen bond energy
rama | Ramachandran (angular) preferences
omega | omega dihedral in the backbone
fa_dun | internal energy of sidechain rotamers as derived from Dunbrack’s statistics
ref | reference energy for each amino acid

Chapter 2: Methods

2.1 Rosetta

We used Rosetta version 3.4. Models of ClpS were derived from 3dnj chain A. The original crystal structure coordinates of the three N-terminal hydrogens were used to keep the α-amino nitrogen hydrogen bonding network intact. If these hydrogens were automatically replaced, they took on coordinates incompatible with the α-amino hydrogen bond. The remaining hydrogens were replaced by default, because Rosetta uses different atom name conventions.

2.1.1 Fixbb Protocol

Mutants were generated with fixbb using the following command line flag:

-use_input_sc

The mutation was specified by a resfile, which allowed the residues to take on different conformations (rotamers), except residues Asn 47 and His 79, which coordinate the N-terminal hydrogen bond.
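A resfile of the kind described above can be sketched with a small Python helper. This is an illustration, not the thesis resfile: the helper name and chain ID are assumptions, and the choice of NATAA as the default (repack with the native amino acid), NATRO for the two frozen residues, and PIKAA for the mutated position is one reading of the description that uses standard resfile commands.

```python
def make_resfile(mutations, frozen, chain="A"):
    # mutations: {residue_number: one_letter_amino_acid}, e.g. {112: "F"}
    # frozen: residue numbers whose rotamers must not change
    overlap = set(mutations) & set(frozen)
    if overlap:
        raise ValueError(f"residues both mutated and frozen: {sorted(overlap)}")
    lines = ["NATAA", "start"]  # default behaviour: repack, keep native identity
    for resnum in sorted(frozen):
        lines.append(f"{resnum} {chain} NATRO")  # keep the native rotamer
    for resnum, amino_acid in sorted(mutations.items()):
        lines.append(f"{resnum} {chain} PIKAA {amino_acid}")  # force this residue
    return "\n".join(lines) + "\n"


# L112F with Asn 47 and His 79 held fixed (chain A assumed, as in 3dnj chain A)
l112f_resfile = make_resfile({112: "F"}, frozen=[47, 79])
```

Writing `l112f_resfile` to a file such as resfile.res gives an input that could be passed with the -resfile flag.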
An example resfile for the mutant L112F is included in the appendix.

2.1.2 Relax Protocol

Crystal structures were relaxed into Rosetta’s energy function with minimal structural changes using the relax protocol with the following flags:

-relax:constrain_relax_to_start_coords
-relax:coord_constrain_sidechains
-relax:ramp_constraints false
-ex1
-ex2
-use_input_sc

The α-amino nitrogen hydrogen bond was constrained in the RosettaScripts protocols using the distance and angle values from the relaxed structure.

2.1.3 RosettaScripts Protocol

We used RosettaScripts to build a protocol that combined mutation, fixed backbone relaxation with minimal side chain perturbation, restrained docking with FlexPepDock (Raveh, London, and Schueler-Furman 2010; Raveh et al. 2011), and filtering steps. The full .xml file and a hydrogen bond restraint file are included in the appendix. We applied a harmonic restraint at the N-terminus to enrich for a hydrogen bond in the output models. We restrained a distance and two angles between ND1 in the His 79 side chain and the α-nitrogen of the N-degron (Fig. 1.3). The equilibrium distances and angles were obtained from the relaxed crystal structure. The protocol was run from the command line with the following options:

-use_input_sc
-nstruct 100
-jd2:ntrials 2
-parser:protocol cst.ndes.xml
-constraints:cst_fa_file amide_hb.cst
-constraints:cst_file amide_hb.cst
-cst_fa_weight 1.0
-cst_weight 1.0
-resfile resfile.res

The RosettaScripts protocol included filters for total_score, for hydrogen bonds to the α-amino N of the N-degron (hbond_bb_sc), and for ddg. The filter tolerance values can be found in the .xml script in the appendix. The output structures were filtered heuristically for top total_score and hydrogen bonding scores.
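The heuristic filtering step can be sketched in Python as ranking the output models by total_score and then summarizing another score term over the best few. This is a sketch under assumptions: the record layout, model names, and numbers are hypothetical, and reading scores from Rosetta output files is left out.

```python
from statistics import mean


def top_models(models, n=3, rank_by="total_score"):
    # Lower Rosetta energies are better, so sort in ascending order.
    return sorted(models, key=lambda model: model[rank_by])[:n]


def average_term(models, term):
    # Mean of one score term over a set of models.
    return mean(model[term] for model in models)


# Hypothetical score records for four output models
models = [
    {"name": "model_001", "total_score": -150.0, "ddg": -10.0},
    {"name": "model_002", "total_score": -140.0, "ddg": -12.0},
    {"name": "model_003", "total_score": -155.0, "ddg": -8.0},
    {"name": "model_004", "total_score": -120.0, "ddg": -15.0},
]
best = top_models(models, n=3)
best_ddg = average_term(best, "ddg")
```

Ranking on total_score first and only then reading ddg means a model with a strong ddg but a poor overall energy (model_004 here) does not enter the average.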
Structures with a strong hydrogen bond between the side chain of the N-terminal residue of the N-degron and the designed protein were found by searching for the lowest energy of q1_hbond_sc, a filter term defined in our RosettaScripts protocol. Structures output from the RosettaScripts protocol were used as templates for rigid body docking (fixed backbone coordinates) with AutoDock.

2.2 AutoDock

N-degrons were docked into ClpS and various mutants using AutoDock 4.2.3 and associated command line tools and helper scripts such as AutoDockTools and Autogrid (Morris et al. 2009). Starting from a receptor and ligand PDB, .mol2 files were produced with Open Babel 2.3.1 (O’Boyle et al. 2011), and .pdbqt files were produced using scripts from AutoDockTools (prepare_ligand4.py, prepare_receptor4.py from MGLTools-1.5.6rc3). Autogrid 4.2.3 was used to produce a cubic grid with 15 Å sides centred on the binding pocket (40x40x40 grid with 0.375 Å spacing). We used the genetic algorithm with local search method (GA-LS) to generate 100 docking poses using an RMSD tolerance of zero to avoid clustering. Full docking parameters can be seen in the appendix.

We defined each docking trial as bound or unbound and used their energies to calculate Boltzmann weighted energies and a selectivity score, which is a normalized binding probability. The AutoDock (un)bound energy and selectivity score were computed according to the formulae for (Un)boundEnergy(i) and SelScore(i), where ‘(un)bound docks’ refers to conformational states with an (un)bound N-degron, and where ‘i’ refers to the amino acid identity of the N-degron. We defined bound conformations as those that had an RMSD < 1.5 Å over backbone atoms (C, CA, O, N) in the N-terminal residue. We averaged the binding energy and selectivity score over several templates that passed heuristic score filters from the RosettaScripts output (total_score, hydrogen bonding scores).
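The selectivity score and the Boltzmann weighted (un)bound energy described in this section can be sketched in Python as follows. Several details are assumptions rather than the thesis implementation: the value of k_B T (about 0.593 kcal/mol near 298 K), the input layout (a list of energy and bound-flag pairs per N-degron), and the reading of the (un)bound energy as a weighted mean normalized over the same subset of poses.

```python
import math

KT = 0.593  # assumed k_B*T in kcal/mol (about 298 K); not stated in the text


def boltzmann_weight(energy, kt=KT):
    return math.exp(-energy / kt)


def selectivity_scores(docks, kt=KT):
    # docks: {amino_acid: [(energy, is_bound), ...]} for every docked N-degron.
    # Numerator: Boltzmann weights of the bound poses of one N-degron.
    # Denominator (Z): Boltzmann weights of all poses of all N-degrons.
    z = sum(boltzmann_weight(e, kt) for poses in docks.values() for e, _ in poses)
    return {
        aa: sum(boltzmann_weight(e, kt) for e, bound in poses if bound) / z
        for aa, poses in docks.items()
    }


def weighted_energy(poses, bound, kt=KT):
    # Boltzmann weighted mean energy over the bound (or unbound) poses only.
    selected = [e for e, is_bound in poses if is_bound == bound]
    if not selected:
        return None  # no pose of this kind was observed
    weights = [boltzmann_weight(e, kt) for e in selected]
    return sum(e * w for e, w in zip(selected, weights)) / sum(weights)


# Hypothetical toy data: two N-degrons, two poses each
toy = {"F": [(-1.0, True), (-1.0, False)], "G": [(-1.0, False), (-1.0, False)]}
toy_scores = selectivity_scores(toy, kt=1.0)
```

Because the denominator runs over every pose of every N-degron, an N-degron that docks mostly in unbound poses lowers its own score without raising that of the others.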
The selectivity score and the (un)bound energy used in Section 2.2 are

$$\mathrm{SelScore}(i) = \frac{\sum_{\mathrm{state} \in \{\text{bound docks}\}} e^{-G_{\mathrm{state},i}/k_B T}}{Z} = \frac{\sum_{\mathrm{state} \in \{\text{bound docks}\}} e^{-G_{\mathrm{state},i}/k_B T}}{\sum_{aa} \sum_{\mathrm{state} \in \{\text{docks}\}} e^{-G_{\mathrm{state},aa}/k_B T}}$$

$$\mathrm{(Un)boundEnergy}(i) = \frac{\sum_{\mathrm{state} \in \{\text{(un)bound docks}\}} G_{\mathrm{state},i}\, e^{-G_{\mathrm{state},i}/k_B T}}{Z'} = \frac{\sum_{\mathrm{state} \in \{\text{(un)bound docks}\}} G_{\mathrm{state},i}\, e^{-G_{\mathrm{state},i}/k_B T}}{\sum_{\mathrm{state} \in \{\text{(un)bound docks}\}} e^{-G_{\mathrm{state},aa}/k_B T}}$$

2.3 Structural Analysis

We used Pymol (Schrödinger 2010) to visualize all structures and the command get_area to compute the solvent accessible surface area (SASA). E. coli and C. crescentus structures were aligned with the align command using default settings.

2.4 Molecular Dynamics Simulations

Explicit water MD simulations of ClpS and various mutants were performed with NAMD (Phillips et al. 2005). Rectangular periodic boundary conditions were used. Long-range electrostatic interactions were calculated with a particle mesh Ewald algorithm (Essmann et al. 1995). Langevin dynamics were used with a piston to fix pressure and a thermostat to fix temperature. NAMD scripts are provided in the appendix. After minimization and heating, a 0.7 ns equilibration was followed by a ~200 ns production run used for the contact map analysis.

2.4.1 MD Structure Preparation

Mutants were prepared with Rosetta (fixbb protocol with an N-degron of sequence YLF, which was removed after the mutation). His protonation states were chosen based on the crystal structure 3dnj, which has a 1.15 Å resolution and includes hydrogen atoms. HSE was assigned to HIS 79, which is involved in hydrogen bond formation with the N-degron’s α-amino nitrogen. The remaining His residues (79, 108, 110) were assigned HSD. From the starting .pdb file we used VMD with tcl helper scripts to generate a .psf file for the CHARMM22 force field (MacKerell, et al. 1998), to map atom and residue names to the proper naming conventions, and to solvate and ionize the protein. The solvate package in VMD (MacKerell, et al. 1998) was used to solvate the protein in a water box.
The size of the water box was defined by padding the protein by 10 Å in the x, y and z directions, and the final box was a cube of ~50 Å per side. The autoionize package was used to add Na and Cl ions to a desired concentration of 0.5 mol/L, which resulted in ranges of 19-21 Na ions and 17-19 Cl ions.

2.4.2 MD Minimization and Heating

The solvated structure was first minimized for 2 000 steps with protein atoms fixed, so only the water and ions were allowed to move. This was followed by 2 000 steps of all atom minimization. The protein was heated from 0 K to 310 K in 50 ps with a step size of 1 fs with all heavy atoms (non hydrogen atoms) harmonically restrained.

2.4.3 MD Equilibration

The Langevin piston was turned on for the equilibration. Backbone atoms were harmonically restrained, with the restraint scaling factor lessened in a stepwise manner. The restraint scaling factor was set to 4 for 200 ps, then to 3 for 100 ps, then to 2 for 100 ps, then to 1 for 100 ps, then to 0 for 200 ps.

2.4.4 MD Production

Following visual inspection of the equilibration trajectory, we ran a production trajectory using a 2 fs time step. Structural coordinates were saved every 0.02 ns, representing just under 10 000 time frames, resulting in ~200 ns of production time (199.247 ns).

2.5 Contact Map Analysis

We used the analysis.distances.contact_matrix command in the python module MDAnalysis (Michaud-Agrawal et al. 2011) to calculate residue pair contacts. A residue pair was defined as being in contact if their β-carbons (α-carbons for Gly) were within 10 Å.

2.5.1 Contact Pair Separation

We searched for contact pairs with a clear separation between the two groups (+/-). We required the frequencies of all three examples from one group (+/-) to be higher than those of the other group (-/+). If this requirement was not met (there was no clear separating line), we defined the separation as zero. If it was met, we defined the separation as the smallest distance between the two groups.
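The separation rule for one contact pair can be sketched in Python. The function name and the example frequencies are hypothetical; the rule itself follows the description in this section, taking the gap between the lowest frequency of the higher group and the highest frequency of the lower group.

```python
def separation(group_a, group_b):
    # group_a, group_b: contact frequencies of one residue pair, one value per
    # simulated protein in each group (three per group in this work).
    if min(group_a) > max(group_b):
        return min(group_a) - max(group_b)  # group_a lies entirely above group_b
    if min(group_b) > max(group_a):
        return min(group_b) - max(group_a)  # group_b lies entirely above group_a
    return 0.0  # the groups overlap, so there is no clear separating line


# Hypothetical contact frequencies for two groups of three proteins
clear_gap = separation([0.90, 0.80, 0.85], [0.10, 0.30, 0.20])
overlapping = separation([0.90, 0.25, 0.85], [0.10, 0.30, 0.20])
```

The result is symmetric in the two groups, so a contact pair scores the same whichever group has the higher frequencies.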
We computed the smallest difference by taking the difference between the minimum of the higher group and the maximum of the lower group.

2.5.2 Statistical Tests of Contact Pair Distributions

P-values were computed for a contact pair using three statistical tests: a Welch two sample t-test (Welch 1947), a two-sample Wilcoxon rank sum test (Bauer 1972), and a two-sample Kolmogorov-Smirnov test (Marsaglia, Tsang, and Wang 2003). All statistical tests were performed in R (R Core Team 2013), using the respective commands t.test, wilcox.test and ks.test with default settings.

The t-test computes the significance of the difference of means from two samples. In our case we compared two different distributions (e.g. the blue group = {clps, 53a, 112f} and the red group = {112k, 51n.112k, 51n.56h.112k}). A mean and variance were thus defined for each contact pair for the red and blue groups separately. We therefore used the default settings, which do not assume equal variance. In a two sample t-test, the p-value is low (significant) when the blue and red means are different, while the blue and red variances are low. The Welch t-test assumes a normal distribution, while the Wilcoxon and Kolmogorov-Smirnov tests are non-parametric and make no assumption about the distribution. The Wilcoxon rank sum test ranks the data from lowest to highest (from both distributions), sums up the ranks of the data points from either distribution, and compares the likelihood of getting the different rank sums if the data were from the same underlying distribution. The Kolmogorov-Smirnov test can detect differences in the distribution that are due not to a difference in means but to the shape of the distribution. It computes the maximum probability distance between the cumulative distribution functions (along the vertical axis) at a given point (along the horizontal axis) where they are both defined.
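The test statistics behind two of these tests can be sketched in plain Python. This is an illustration only: the tests in this work were run in R, the function names here are hypothetical, and the sketch stops at the statistics (Welch's t with its degrees of freedom, and the Kolmogorov-Smirnov distance) without computing p-values.

```python
import bisect
import math
import statistics


def welch_t(x, y):
    # Welch's t statistic and Welch-Satterthwaite degrees of freedom;
    # the two samples may have unequal variances.
    vx = statistics.variance(x) / len(x)
    vy = statistics.variance(y) / len(y)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def ks_statistic(x, y):
    # Largest vertical distance between the two empirical cumulative
    # distribution functions, evaluated at every observed value.
    xs, ys = sorted(x), sorted(y)
    distance = 0.0
    for point in sorted(set(xs) | set(ys)):
        fx = bisect.bisect_right(xs, point) / len(xs)
        fy = bisect.bisect_right(ys, point) / len(ys)
        distance = max(distance, abs(fx - fy))
    return distance


# Hypothetical samples for two groups
t_value, dof = welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
d_value = ks_statistic([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

Two samples with identical means but different spreads give a t statistic of zero and a non-zero Kolmogorov-Smirnov distance, which is the difference in sensitivity described above.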
2.6 GFP-ClpS Fluorescence Screen

We expressed a His-SUMO-GFP(F64L/S65T/F99S/M153T)-TEV-ClpS fusion protein in the pET28a vector (sequence in the appendix). Unless otherwise stated, growth and expression were in E. coli cells (BL21) in Luria Broth with kanamycin in 500 µL 96 well plates. Pre-cultures were grown overnight at 37°C (4 mL) to stationary phase, and 50 µL of culture was transferred to 500 µL of fresh media and grown at 30°C. When the optical density at 600 nm reached 0.6, we induced expression with IPTG to a final concentration of 1 mM. Expression conditions were either 2 hours at 37°C or overnight at 16°C. Expression was halted by centrifugation. Before centrifugation we measured the optical density to normalize the fluorescent signal to the amount of cells. The bacterial pellet was lysed with a detergent based lysis buffer (50 mM NaCl; 50 mM Tris HCl, pH 7.5; 0.1 wt/vol % Triton) and was centrifuged. The fluorescent signal in the soluble fraction was measured and normalized by the optical density.

Point mutations were introduced by site-directed mutagenesis with the primers included in the appendix. Cloning steps were performed in E. coli (DH5α), and followed standard procedures. Transformations were performed by heat shock at 42°C, or by electroporation if the cloning efficiency was low. DNA was isolated with the Qiagen QuickLyse Miniprep Kit and DNA sequences were verified by sequencing (GENEWIZ).

2.7 Peptide Binding Assay

The peptide binding assay was based on a modified ELISA approach. Biotinylated peptides were immobilized on streptavidin coated 96 well plates. The N-recognin was added and the plate was then washed to remove the unbound form. The bound N-recognin was coupled to a chemiluminescent readout through horseradish peroxidase (HRP) conjugated anti-His6 antibodies. The final amount of signal was quantified with an Odyssey system. All washing steps were automated by robot to minimize variation. A step-by-step protocol is included below.
Streptavidin was bound to a 96 well plate by incubating the plate at 25°C overnight in phosphate buffered saline (PBS). The plate was washed with washing solution (0.05 % wt/vol bovine serum albumin in PBS). Blocking solution (3 % wt/vol in PBS) was added and incubated at 25°C for 1 hour. Peptides were added to each well and incubated overnight (4°C, 230 RPM). The peptide-coated plate was washed with washing solution. Peptides were synthesized in-house in Leonard Foster’s laboratory, or purchased (Zhejiang Ontores Biotechnolgy CO., Ltd.). N-recognin was added to a final concentration of 5 µM and incubated for 1 hour (25°C, 230 RPM). The plate was washed before we added HRP conjugated anti-His6 antibodies (Abcam), which bound the His6 tag on the N-recognin fusion protein. The antibody was incubated at 25°C for 1 hour before washing with washing solution. HRP substrate 680 was added and incubated for 15 min to produce a chemiluminescent signal before the reaction was terminated. Signals were observed at 680 nm with the Odyssey system. Error bars represent three independent experiments.

Chapter 3: Results

Our goal was to develop a computational design pipeline, and then experimentally validate the designs for peptide binding affinity. In order to gain confidence that the computational pipeline could predict binding affinity, we first set out to recapitulate existing experimental data. After we obtained some initial experimental data, we gained perspective on our computational pipeline, identified the steps that had poor predictive power, and optimized our protocol. In this chapter we first present results that benchmark our computational protocol on existing experimental data involving wild-type ClpS. The next section applies this protocol to designing new versions of ClpS by introducing mutations, and to experimentally validating these designs. In the final section we describe how we learnt from the experimental data and optimized the design process.
3.1 Benchmarking Protocol on Existing Experimental Data

In order to gain confidence that we could design ClpS mutants to bind new N-degrons, we first set out to recapitulate existing experimental data. Wild-type ClpS binds the hydrophobic amino acids L/F/Y/W with sub-µM affinity, as determined with fluorescence anisotropy (K H Wang et al. 2008), and is not selective when other amino acids are on the N-terminus. The second residue has some effect on binding activity as determined by peptide array experiments (Erbse et al. 2006). A positively charged amino acid such as lysine increases binding, whereas a negatively charged amino acid such as aspartate decreases binding. These trends have also been shown to occur in type 2 (L/F/Y/W) recognition by Ubr1 as determined by fluorescence anisotropy and surface plasmon resonance (Xia et al. 2008). We tried to reproduce the N-end specificity profile in silico by docking all 20 types of N-degrons (for the 20 amino acids) into ClpS and calculating a binding score.

The docked N-degrons had the sequence XL or XLF, where X is any one of the 20 amino acids. Using a two residue N-degron has the advantage of introducing less noise, as the changes in the score are due only to the structural changes of the first two residues. At the same time, our goal is to have accurate computational models that imitate experimental conditions, and in this respect a two residue N-degron is a disadvantage. The negatively charged C-terminus of a long peptide does not interact with the binding pocket of ClpS. In experiments where a very small N-degron (e.g. a dipeptide) is bound to ClpS, the C-terminus is indeed near the binding pocket (S. Sriram et al. 2013). The C-terminal region of the peptide is far from the binding pocket in experiments with longer N-degrons, and in our peptide binding assay the C-terminus is covalently linked to beads and therefore cannot interact with ClpS.
When a short N-degron is used for modeling, this negative charge is closer to the N-terminus and therefore to the region of interaction. When docking the N-degrons with AutoDock, we found that the second or third residue was often docked into the binding pocket, rather than the N-terminal residue. If we did not include the third residue, then there were few conformations (often called binding poses) available. When there were few binding poses available, AutoDock had difficulty scoring known non-binders unfavourably. In other words, we observed bound poses less often when there was a third residue in the N-degron, as either the second or third residue could fit inside the binding pocket, and we observed this behaviour preferentially for N-degrons with poor expected experimental binding affinity.

Figure 3.1 Wild-type ClpS :: N-degron Binding Profile (Rosetta). N-degrons with varying N-terminal residues (n_degron) were docked into wild-type ClpS using a RosettaScripts protocol that restrained the N-terminal hydrogen bond. Rosetta energies (value, vertical axis) were computed from the top three models after ranking by total_score. The energies shown are of the complex (total_score, red), the hydrogen bonding energy (between backbone and side chain atoms) of the N-terminal residue of the N-degron to ClpS (q1_hbond_bb_sc, green) and the binding energy (ddg, blue). REU = Rosetta energy units. [Figure 3.1 plot: panels total_score, q1_hbond_bb_sc, ddg; x-axis n_degron; y-axis Energy (REU)]

We used a number of scoring methods and protocols in Rosetta, such as Relax, FlexPepDock, RosettaLigand/RosettaScripts (Kaufmann and Meiler 2012; Lemmon and Meiler 2012; Combs et al. 2013), but in the author’s opinion the binding energies did not sufficiently recapitulate the known experimental binding affinities (data not shown). Fig.
3.1 shows various Rosetta energy scores from our RosettaScripts protocol, and there are discrepancies between these energies and experimental data. We plotted the binding energy (ddg), the energy of the total complex (total_score) and the energy from the hydrogen bond from the N-terminal amino acid to the side chains of His79 and Asn47 (q1_hbond_bb_sc, see Fig. 1.3). Each score was averaged over the top three structures (lowest total_score), from an initial 120 (using the command line flag -nstruct 120). We expected the N-degrons L/F/Y/W to have the lowest ddg scores. However, the top binding predictions according to our Rosetta protocol are F~Y > H > L. Rosetta did not favourably score W by ddg, and placed the side chain in a flipped conformation compared to existing crystal structures (Fig. 3.2).

Figure 3.2 Trp N-degron Binding Pose. (Left) Trp N-degron binds wild-type ClpS through a hydrogen bond of its ring nitrogen to the backbone oxygen of Met75 (pdb 3gq1). (Right) The RosettaScripts protocol output structures with the Trp side chain in a flipped orientation relative to crystal structures. The nearest hydrogen bond acceptors to the Trp side chain were the side chain oxygens of Asp48.

In contrast, we noticed that AutoDock did not flip the side chain of Trp, and output structures in a binding pose akin to the wild-type N-degrons without a guiding restraint. We defined a docked pose to be ‘bound’ by computing the backbone RMSD for the N-terminal residue (a pose with RMSD < 1.5 Å is bound, otherwise it is unbound). When the N-degron was known to bind only weakly to ClpS (e.g. G/R/S, etc.), other non-native unbound conformations were favoured. Fig. 3.3 shows the number of docking poses that are in a bound conformation (Nb in turquoise), normalized over the 100 docking trials. Some N-degrons, such as Lys and Arg, tended to have large interaction energies because they have many atoms.
The Lys and Arg N-degrons had a substantially more negative energy than Leu, but we expect Leu to have a higher binding affinity (Fig. 3.4). We closely examined these cases and found that known non-binder N-degrons were frequently docked in an unbound pose, and the unbound pose had a more favourable energy than the bound pose. For instance, the unbound poses of K/R/G/A/C/S had a substantially more favourable energy than the bound pose, while the opposite trend is found for F/Y/W, i.e. the energy of the bound pose is more favourable than the unbound (Fig. 3.4). Some N-degrons have near equal bound and unbound energies, including Leu, which we expected to have a more favourable binding energy.

Based on these results, we developed a selectivity score that quantified how selective a given ClpS receptor (wild-type or mutant) is for the 20 N-degrons. Each N-degron was docked to ClpS 100 times and each pose was assigned an energy score by AutoDock. We then checked if the docked pose was ‘bound’ by computing the backbone RMSD for the N-terminal residue (a pose with RMSD < 1.5 Å is bound, otherwise it is unbound). After each of the 100 poses was assigned ‘bound’ or ‘unbound’ we calculated the Boltzmann weighted probability of binding (Pb in Fig. 3.3). Finally, the 20 probabilities (one for each N-degron) were normalized such that they add up to one. This last normalization step captures the relative trends of the N-degrons to each other.

Figure 3.3 Wild-type ClpS :: N-degron Binding Profile (AutoDock). The binding profile between ClpS and a three residue N-degron (XLF, where X is each of the 20 amino acids) was computed. We assigned each of 100 docked poses as bound or unbound, and computed the normalized number of bound structures from 100 docking rounds (Nb in turquoise). We then used the binding energies from AutoDock and computed the Boltzmann weighted binding probability (Pb in red).
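The bound/unbound assignment and the normalized count Nb can be sketched in Python. The sketch assumes that each docked pose and the reference pose are given as the coordinates of the four backbone atoms (N, CA, C, O) of the N-terminal residue in the same receptor frame, so no superposition is done before computing the RMSD; the function names and coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    # Root-mean-square deviation between two equally ordered lists of
    # (x, y, z) coordinates, without any superposition.
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


def is_bound(pose, reference, cutoff=1.5):
    # A pose counts as bound if its backbone RMSD to the reference is < 1.5 A.
    return rmsd(pose, reference) < cutoff


def fraction_bound(poses, reference, cutoff=1.5):
    # Nb: number of bound poses divided by the number of docking trials.
    return sum(is_bound(pose, reference, cutoff) for pose in poses) / len(poses)


# Hypothetical backbone coordinates (N, CA, C, O) of the N-terminal residue
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.4, 0.0), (3.2, 1.6, 0.0)]
shifted_1 = [(x + 1.0, y, z) for x, y, z in reference]  # RMSD of 1.0 A
shifted_2 = [(x + 2.0, y, z) for x, y, z in reference]  # RMSD of 2.0 A
nb = fraction_bound([shifted_1, shifted_2], reference)
```

Skipping the superposition is deliberate here: a pose that has the right internal geometry but sits elsewhere in the pocket should count as unbound.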
[Figure 3.3 plot: horizontal axis n_degron (a c d e f g h i k l m n p q r s t v w y); vertical axis Normalized Value (unitless), 0.00 to 0.75; variables Pb and Nb.]

Figure 3.4 Wild-type ClpS :: N-degron Bound and Unbound Energy (AutoDock). We assigned each of 100 docked poses as bound or unbound and, using the energies from AutoDock, computed the Boltzmann weighted binding energy (dGb, red) and unbound energy (dGub, turquoise). The N-degrons F/Y/W and also H had binding energies more favourable than unbound energies. AEU = AutoDock energy units.

[Figure 3.4 plot: horizontal axis n_degron (a c d e f g h i k l m n p q r s t v w y); vertical axis (Un)bound Energy (AEU), -7.5 to 0.0; variables dGb and dGub.]

The selectivity score for wild-type ClpS is shown in Fig. 3.5, and shows that ClpS is most selective for F/Y/W. ClpS is less selective for a known binder, Leu, than for Met and His. Leu preferentially adopts ‘unbound’ poses where the side chain is too deep in the pocket, or where the second Leu residue of the N-degron (LLF) is docked into the binding pocket (Fig. 3.6). His adopts a similar pose to the aromatic rings in F/Y/W, and Met is not sterically hindered, which makes both compatible with a binding pose and results in a high selectivity score. Previous modeling studies argued that Met has a low binding affinity due to entropy loss: it was not completely sterically hindered in the binding pocket, but adopted an uncommon side chain conformation (G Roman-Hernandez et al. 2009).

Figure 3.5 Wild-type ClpS Binding Profile. The selectivity score is shown for wild-type ClpS and the N-degron XLF (where X is each of the 20 amino acids). ClpS is most selective for the N-degrons F/Y/W but unexpectedly has low selectivity to L, which is known to bind with high affinity.

[Figure 3.5 plot: horizontal axis N-Terminal Amino Acid (A C D E F G H I K L M N P Q R S T V W Y); vertical axis Selectivity Score (probability), 0.00 to 0.16.]

Figure 3.6 Docking Poses of the Leu N-degron from AutoDock.
AutoDock did not score the Leu N-degron favourably because the unbound docking poses (white, right) were favoured over bound docking poses (magenta, left). The N-terminal residues of the N-degrons are shown superimposed on wild-type ClpS with His79 (cyan). Only the top 24 binding poses (of 100 trials) are shown, and all poses are within 1 energy unit of the top model. The unbound poses (white, right) are docked too deep into the pocket to form a hydrogen bond with His79, or have the N-terminal residue pointing out of the pocket and the side chain of Leu2 docked into the binding pocket.

[Figure 3.6 panel labels: Bound, Unbound.]

Having partially recapitulated existing experimental data, we then applied this protocol to the problem of designing selective N-recognins.

3.2 Computational Designs and Experimental Validation

This selectivity score can be generalized to quantify the selectivity of a receptor to any ligand. Using this generalized scoring method, we set out to design an N-recognin to bind N-degrons that are either negative (D/E) or polar (T). We therefore introduced mutations into ClpS that optimized the interaction with these novel N-degrons, carried out the same docking steps, and calculated the selectivity score in the same manner as with wild-type ClpS. In the crystal structure of wild-type ClpS with a bound N-degron, 12 residues in ClpS make immediate contact and make up the binding pocket (Fig. 3.7). This interface is particularly small and so we took a negative design approach (G. Schreiber and Keating 2011; Leaver-Fay, Jacak, et al. 2011). Our negative design strategy for ClpS was as follows. We wanted to fill the pocket so that the bulky hydrophobic N-degrons F/Y/W could not fit in. We also wanted to introduce a hydrogen bond or salt-bridge between ClpS and the side chain of the N-degron.
To accomplish this negative strategy, we mutated a residue in the binding pocket so that its side chain could interact in a specific manner with the new N-degron, while at the same time destabilizing the interactions with the N-degrons L/F/Y/W. We mutated Val 56 to His to introduce new hydrogen bonding contacts, and optimized the amino acid identities of D48, T51, and L112. In Fig. 3.8 we show the specificity profile of three mutants. The specificity score predicted the highest specificity to D/E and loss of specificity to F/Y/W. Upon structural analysis, we found that with a more polar binding pocket, the N-degrons D, E and Q could make hydrogen bonding contacts with T51N and L112K, but not residue 48, so we chose to keep that position as wild-type D48. Thus we chose the mutations V56H, T51N and L112K.

Figure 3.7 Binding Pocket Residues of ClpS. The twelve binding pocket residues of ClpS are shown (pdb 3dnj): Ile 45, Leu 46, Asn 47, Asp 48, Asp 49, Thr 51, Met 53, Val 56, Met 75, Val 78, His 79, Leu 112. Panels are colour coded by the residue's physicochemical properties: orange = hydrophobic, green = polar, blue = positively charged, red = negatively charged. The Tyr N-degron is shown in magenta.

[Figure 3.7 residue labels: Ile45, Leu46, Met53, Val56, Met75, Val78, Leu112, Asp48, Asp49, Asn47, Thr51, His79.]

Figure 3.8 Specificity Profile of V56H Negative Designs. The selectivity score is shown for three designs, all having the common mutation V56H. The amino acid identities of residues 48, 51 and 112 were optimized, which resulted in the common double mutation T51N_L112K between the three designs. The wild-type binding selectivity to F/Y/W is largely diminished, while the selectivity to D/E is largely increased. The wild-type amino acid identity of residue D48 was recovered (purple).

The V56H, T51N and L112K mutations simultaneously changed three hydrophobic amino acids to a charged/polar identity, and this design was therefore quite aggressive.
The triple mutant did not express well in bacteria, so we investigated the expressibility of all three point mutations individually, and all three double mutations. The experimental results are summarized in Table 3.1. Out of these six mutants only two were soluble (V56H and V56H_L112K), according to SDS-PAGE. We assayed these two mutants for peptide binding activity, but found them to be non-specific or to show no binding. This could have a number of causes, such as a poorly defined structure or binding pocket. We did not further investigate the reasons for non-specific binding, but assaying for changes in the secondary structure with circular dichroism (CD) is a common technique in the protein design field (Figueroa et al. 2013).

The SDS-PAGE results suggest that most mutants destabilized ClpS and/or affected the folding process. Protein stability is dependent on having a well-packed hydrophobic core (Borgo and Havranek 2012). To define which residues in ClpS are on the surface and which are in the core, we computed the solvent accessible surface area (SASA). A low SASA indicates that the residue is not in contact with water, and therefore forms part of the core. Eleven residues form the core of ClpS, two of which are part of the binding pocket (I45, L112, Fig. 3.9). Three additional binding pocket residues, T51, V56 and M75, have quite low SASA (17 Å², 13 Å² and 13 Å² respectively). The per residue distribution of SASA is shown in Fig. 3.10, with binding pocket residues (left panel) separated from the rest of the protein (right panel), and core residues in red. We therefore suspected that the mutations T51N, V56H, and L112K had destabilized the core, and tried less aggressive mutations (hydrophobic to hydrophobic).

Table 3.1 Experimental Results for N-recognin Designs.
Legend. EXPRESS: *** high expression, ** medium expression, * low expression, 0 no expression. ASSAY: (NS) non-specific, (X) no binding, + increased binding, - decreased binding. Residue colour key in the original: hydrophobic, polar, (+) charged, (-) charged.

Binding pocket positions: 45 46 47 48 49 51 53 56 75 78 79 112
Wild-type residues:       I  L  N  D  D  T  M  V  M  V  H  L

Design            EXPRESS  PREDICT    ASSAY
wild-type         ***      Y F W L    Y F W L
T51N              *        DE
V56H              **       DE         (NS)
L112K             0        DE
T51N_V56H         0        DE
T51N_L112K        0        DE
V56H_L112K        *        DE         (X)
T51N_V56H_L112K   0        DE
M53A              ***      +V         +L +W
L112F             ***      -Y -F -W   +S +T +H
M53A_L112F        ***      T          (X)
M53A_V56Q         ***      T          (NS)
I45L              ***      V          +D +S +T +H
I45L_M53A         ***      V          -F

Figure 3.9 Eleven Core Residues of ClpS. Eleven core residues of ClpS have under 2 Å² SASA, and are shown in spheres (pdb 3dnj): Val 43, Ile 45, Val 59, Leu 60, Ala 71, Cys 86, Ala 94, Val 101, Ser 104, Ala 105, Leu 112. Of these core residues, two are part of the binding pocket: Ile 45 and Leu 112.

Previous studies have described the role of M53 in blocking binding to β-branched N-degrons, such as Val and Ile. Evidence comes from biochemical assays with the E. coli ClpS, where the N-degron is tagged with GFP and the fluorescent signal is monitored over time (K H Wang et al. 2008). The structural basis of V/I exclusion is thought to be steric hindrance caused by M53 with the β-branched atoms. The M53A mutant (C. crescentus) in complex with a Trp peptide shows a native binding conformation (G Roman-Hernandez et al. 2009), and this mutant should therefore be soluble in appropriate conditions and bind the Trp N-degron. M53A is a non-aggressive hydrophobic to hydrophobic mutation, and we were interested in its solubility under our expression conditions.

Figure 3.10 Per Residue SASA Distribution. The solvent accessible surface area (SASA) of the binding pocket residues (left) and non binding pocket residues (right) is shown, with residues labeled on the vertical axis. Core residues are shown in red and defined as residues with under 2 Å² of SASA. SASA was computed with the get_area command of Pymol.
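The core definition used above (SASA under 2 Å²) amounts to a simple threshold filter, sketched below. The function name and the dictionary layout are assumptions for illustration; the SASA values for residues 51, 56 and 75 are the ones quoted in the text, while the values for residues 45, 79 and 112 are hypothetical placeholders, not measurements.

```python
CORE_SASA_CUTOFF = 2.0  # Å²; core threshold stated in the figure captions

# Binding pocket residue numbers as listed for Fig. 3.7
BINDING_POCKET = {45, 46, 47, 48, 49, 51, 53, 56, 75, 78, 79, 112}


def core_residues(sasa_by_residue, cutoff=CORE_SASA_CUTOFF):
    # Returns the residue numbers whose SASA is below the core cutoff.
    return sorted(res for res, area in sasa_by_residue.items() if area < cutoff)


# Per-residue SASA in Å². 51, 56 and 75 are quoted in the text;
# 45, 79 and 112 are HYPOTHETICAL placeholder values.
sasa = {45: 0.5, 51: 17.0, 56: 13.0, 75: 13.0, 79: 60.0, 112: 1.0}

core = core_residues(sasa)
core_in_pocket = [res for res in core if res in BINDING_POCKET]
```

With these inputs, residues 51, 56 and 75 fall outside the core despite their low SASA, which matches how the text treats them as a separate "quite low SASA" group.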
[Figure 3.10 plot: horizontal axis SASA (Å²), 0 to 200; vertical axis Residue; panels: binding pocket, non binding pocket.]

Indeed the mutant M53A was soluble and retained binding activity toward the native N-degrons (L/F/Y/W, Table 3.1), as expected. However, we did not observe an increase in Val binding affinity. We then introduced a bulky hydrophobic residue at the back of the binding pocket, L112F, in order to prevent binding to N-degrons F/Y/W through steric clashes. However, L112F still bound the native N-degrons (Fig. 3.11). The combined mutation M53A_L112F was soluble, but showed no binding activity.

Figure 3.11 L112F Peptide Binding Profile. (Left) The peptide binding assay is a modified enzyme-linked immunosorbent assay (ELISA). The assay was performed in 96-well plates. Biotinylated peptides were immobilized to wells. The amount of ClpS bound was coupled to HRP through the His6 tag on ClpS. The signal was quantified by reacting HRP with substrate and imaging on an Odyssey system at 680 nm. (Right) The peptide binding profiles of wild-type ClpS and the L112F mutant are shown. Each bar represents three independent measurements. The linker for each of the 30 peptides was the same: FVQRWK-Biotin. Peptides are grouped by second residue type (L/E/Q/R) along the horizontal axis: orange = hydrophobic, green = polar, blue = positively charged, red = negatively charged.

We were surprised that the binding affinity of M53A for the Val N-degron did not increase significantly, given the relieved β-branch clash.
However, the original assays reported in the literature were with a V/I N-degron and E. coli ClpS, while all our designs and assays involved the C. crescentus sequence. The sequence identity between E. coli and C. crescentus ClpS is only 55% and two binding pocket residues are different: I45 and L46 in C. crescentus vs. L45 and V46 in E. coli (Fig. 3.12). We “Ecolized” the C. crescentus binding pocket by introducing an I45L mutation, but we were unable to detect a specific increase in Val N-degron binding affinity with or without the additional M53A mutation (Table 3.1, Fig. 3.13). The binding affinity to D, S, T, V and H N-degrons all increased by the same amount, while the L/F/Y/W N-degron signals remained the same.

[Figure 3.11 plot: title ClpS-L112F peptide binding profile; vertical axis Signal, 0.00E+00 to 3.00E+05; horizontal axis peptides XZFVQRWK(6-Ahx-Biotin) with second residue Z = L (X = D, F, L, S, T, V, Y, H, W), Z = E (X = D, F, L, S, T, V, Y), Z = Q (X = D, F, L, S, T, V, Y) and Z = R (X = D, F, L, S, T, I, Y); controls: Anti-His6 antibodies only, Biotinylated HRP.]

Figure 3.12 E. coli and C. crescentus Binding Pocket Differences. There is 55% sequence identity between the ClpS amino acid sequence in C. crescentus (green) and E. coli (cyan). Between the two organisms, only two of twelve binding pocket residues are different: residue 45 (Ile vs. Leu) and residue 46 (Leu vs. Val). The pdbs 3dnj (C. crescentus, green) and 3o1f (E. coli, cyan) are shown aligned and with the Tyr N-degron from 3dnj (magenta).

[Figure 3.12 residue labels: Leu 46 (cc), Val 46 (ec), Ile 45 (cc), Leu 45 (ec).]

Figure 3.13 Peptide Binding Affinity of I45L and I45L_M53A. The peptide binding profiles for two designs are shown: I45L (top) and I45L_M53A (bottom). Residue 45 is Leu in E.
coli, and the side chain points towards the binding pocket. The binding affinity for both proteins is highest to the native N-degrons L/F/Y/W. Peptides are grouped by second residue type (L/E/Q/R) along the horizontal axis: orange = hydrophobic, green = polar, blue = positively charged, red = negatively charged.

Because M53A was soluble, we layered mutations on top of it and computed their selectivity score, in order to find a design that could hydrogen bond a polar N-degron (T). Two computational designs, M53A_V56R/Q, were predicted to have improved selectivity to the T N-degron (Fig. 3.14). Upon visual inspection of the designed structures, the Arg at residue 56 did not make hydrogen bonding contacts with the side chain of the N-degron, while Gln acted as a hydrogen bond donor to the side chain O of the Thr N-degron. Arg is also a very large positively charged residue, and we suspected that it would destabilize the core. Therefore we only expressed the M53A_V56Q design, which was soluble but showed non-specific binding (peptide binding data not shown). V56H had been shown to be soluble, so position 56 tolerates a non-hydrophobic residue, but both V56H and M53A_V56Q may not be properly folded, which is consistent with their non-specific binding (Table 3.1).

Figure 3.14 Peptide Binding Profile of Thr Designs. The amino acid identity of residue 56 was optimized for binding to a Thr N-degron by sampling charged and polar residues that could form hydrogen bonds. AutoDock was used to dock 20 N-degrons, one for each of the 20 amino acids, into the designs. The N-terminal amino acid (aa) of the N-degron is shown on the horizontal axis. The selectivity (sel) was computed by normalizing the Boltzmann weighted binding probability so that the selectivity scores of all 20 N-degrons add up to 1 for each design. The top two sequences were the most selective: M53A_V56Q/R.
The sequence M53A_V56Q makes a hydrogen bond with the side chain of the N-degron Thr through Gln 56, while Arg 56 does not.

[Figure 3.14 plot: horizontal axis n_degron (a c d e f g h i k l m n p q r s t v w y); vertical axis Selectivity Score, 0.00 to 0.20; sequences (seq) 53a.56q and 53a.56r.]

3.3 Design Optimization

We optimized our design process by learning from the limited experimental information outlined in Table 3.1, which consists of wild-type ClpS and 13 mutants, 9 of which were assayed for peptide binding activity. Our goal was to further improve our design protocol so as to have greater predictive power. We also performed some experiments to improve the stability of ClpS.

3.3.1 Investigating Kinetic Stability with Molecular Dynamics Simulations

In the protein design field, mutant proteins are often found to express poorly, or to be toxic to the expression organism (David Baker, private communication). Given a modeled structure of a design, it is not known with high confidence whether the protein will fold into this structure. To determine if the modeled structure is, in fact, the optimal structure for that particular sequence, the sequence is often computationally folded with computing resources contributed by the public through Rosetta@Home (Kaufmann et al. 2010). If a sequence folds into a structure other than the model, then the designers go back to the drawing board. We considered this approach more suitable to de novo design, where a protein is designed from scratch by matching fragments from existing protein structures together (Zanghellini et al. 2006). In contrast to de novo design, we based our designs closely on a naturally occurring protein template, and introduced one to three mutations. Therefore, we were skeptical that our designs would fold into a significantly different structure from wild-type ClpS, and chose another approach. The approach we chose to investigate the stability of protein structures in silico involved time course molecular dynamics (MD) simulations.
By simulating the dynamics of a design in a time course simulation, we could observe the structural changes and determine if the design is kinetically unstable. A previous study successfully filtered out kinetically unstable mutants with MD simulations (Wijma et al. 2014). They assessed kinetic stability by monitoring whether the RMSD increases with time significantly more than that of a control known to form a stable structure. We applied a similar method to our system. We assumed that if a protein would not fold into a well-defined structure as determined by experiment, then it would not be kinetically stable in the final 3D configuration of the design model as determined by MD simulation.

The design models of ClpS were based on the 80 amino acid crystal structure 3dnj. The small size of ClpS made it relatively computationally inexpensive to simulate using explicit water molecules. Our approach was to look for differences in the simulations between mutants that expressed well and had peptide binding activity (high confidence positive examples: wild-type ClpS, M53A, L112F) and mutants that did not express at all (high confidence negatives: L112K, T51N_L112K, T51N_V56H_L112K), as shown in Table 3.1. At the time of performing this analysis we had not made the I45L and I45L_M53A mutants, so they do not form part of the high confidence positives. Furthermore, we did not include T51N_V56H in the high confidence negatives, because we wanted to have some remaining examples on which to validate our results.

We first computed the RMSD, which is commonly used to assess the amount of structural change observed over the course of the simulation. Wild-type ClpS had the lowest backbone RMSD with respect to the crystal structure 3dnj, but this simple metric could not separate the positive group from the negative group, as the L112F design had a higher RMSD than L112K or T51N_V56H_L112K (data not shown). RMSD reduces the structural space to one number.
Various structures can map to the same RMSD, when in fact there may be key differences, signifying instability. We hypothesized that some subtle structural re-arrangements in the high confidence negatives may not be visible by manual inspection (viewing an animation of the simulation). We therefore decided to use a more detailed structural analysis. We computed the contact frequency of all residue pairs over the course of the simulation. This information is an 80 x 80 matrix for our 80 residue protein, where element i,j lies between zero and one and is the frequency with which residues i and j are ‘in contact’, defined as the residue pair being within 10 Å (see Methods). This matrix of values is commonly referred to as a contact map in the literature, and has been used in other studies to examine structural changes (Kass and Horovitz 2002; Andrews et al. 2008). We first calculated the average contact maps for each simulation (Fig. 3.15). Many pairs were never in contact (frequency zero, white), while pairs close in primary structure always stayed in contact (frequency 1, black). Out of 3160 pairs, there were only 717 that were different between the six simulations (not always zero or always 1 for all positive and negative examples).

Figure 3.15 Contact Map of ClpS and L112K. The average contact frequency between all pairs of residues is shown as a contact map, with absolute residue numbering used (1 to 80). Residue pairs were defined as 1 (in contact) or 0 (not in contact) every 0.02 ns, and discrete values were averaged over ~10 000 frames from a ~200 ns molecular dynamics trajectory. A residue pair was defined to be ‘in contact’ if the distance between CB atoms (or CA atoms for Gly) was < 10 Å.

We then searched for contacts that had a clear separation between the two groups (positives and negatives).
By separation, we mean that there was no overlap between the two groups, and the minimum frequency from the higher group was larger than the maximum frequency from the lower group. Out of the 717 contact pairs, we found that only 129 had any separation power (Fig. 3.16). In the inset of Fig. 3.16 we show how we computed the separation for the residue pair 102_111. The kinetically unstable group (red) all had higher contact frequencies than the kinetically stable group (blue). The smallest difference between the two groups was the difference between 51n.112k (lowest in the high group) and 53a (highest in the low group). This residue pair, 102_111, had a separation of 55 %, the largest separation of all contact pairs. However, for most contact pairs the separation power was quite low (< 10 %), and only a few had large separation power. We show the top six features in Fig. 3.17, with p-values computed by a Welch two sample t-test (see Methods).

Figure 3.16 Contact Pair Separation Distribution. The contact frequencies for all residue pairs were computed for a set of 3 kinetically stable (+ set, coloured in blue in the inset panel) and 3 kinetically unstable (- set, coloured in red in the inset panel) proteins. If all three examples from the (+) set were higher than the (-) set (or vice versa), we computed the minimum contact frequency distance between the two groups, and defined this as the ‘separation’ (sep, horizontal axis). In other words, the separation was computed by taking the difference between the lowest frequency in the higher group and the highest frequency in the lower group. In the inset we show an example of how we computed the separation. The separation distribution is shown for the 129 contact pairs (of the 717 that differ between simulations) that have a separation > 0.
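The contact frequency and separation calculations described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions rather than the analysis code used in the thesis: the function names, the coordinate layout (one representative atom per residue per frame), and the toy numbers are hypothetical, while the 10 Å cutoff and the definition of separation follow the text.

```python
import math

CONTACT_CUTOFF = 10.0  # Å; contact definition stated in the text


def contact_frequencies(frames, cutoff=CONTACT_CUTOFF):
    # frames: list of frames; each frame is a list of (x, y, z) coordinates,
    # one per residue (CB atom, or CA for Gly).
    # Returns {(i, j): fraction of frames in which residues i and j are in contact}.
    n_res = len(frames[0])
    counts = {(i, j): 0 for i in range(n_res) for j in range(i + 1, n_res)}
    for coords in frames:
        for (i, j) in counts:
            if math.dist(coords[i], coords[j]) < cutoff:
                counts[(i, j)] += 1
    return {pair: count / len(frames) for pair, count in counts.items()}


def separation(group_a, group_b):
    # Gap between two groups of contact frequencies for one residue pair.
    # Zero when the groups overlap; otherwise the lowest value of the
    # higher group minus the highest value of the lower group.
    if min(group_a) > max(group_b):
        return min(group_a) - max(group_b)
    if min(group_b) > max(group_a):
        return min(group_b) - max(group_a)
    return 0.0


# Toy trajectory: 3 residues, 2 frames (made-up coordinates)
toy_frames = [
    [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (12.0, 0.0, 0.0), (20.0, 0.0, 0.0)],
]
freqs = contact_frequencies(toy_frames)

# Made-up frequencies for one pair: three stable vs. three unstable examples
sep = separation([0.10, 0.20, 0.30], [0.85, 0.90, 0.95])
```

For an 80 residue protein the dictionary returned by `contact_frequencies` has 80 x 79 / 2 = 3160 entries, which matches the number of pairs quoted in the text.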
[Figure 3.16 plot: title sep dist of 129 / 717 features sep > 0; horizontal axis sep, 0.0 to 0.5; vertical axis Density, 0 to 30. Inset: Top 6 Features based on Max Separation.]

Figure 3.17 Top Six Features Based on Max Separation. The contact frequencies for six residue pairs are shown. Kinetically stable sequences are shown in blue, kinetically unstable in red. The p-value was computed using a Welch two sample t-test, to assess the significance of the difference in mean between the blue and red groups. The six contact pairs shown had the greatest separation distance. The separation was defined when all examples of one group (red/blue) were higher than the other (blue/red). The separation was then computed by taking the difference between the lowest frequency in the higher group and the highest frequency in the lower group (e.g. frequency difference between 51n.112k and 53a in the residue pair 102_111).

[Figure 3.17 panels: 102_111 (p=3e-05), 49_111 (p=0.02), 51_78 (p=0.007), 51_113 (p=0.04), 52_112 (p=0.05), 45_87 (p=0.01); rows clps, 53a, 112f, 112k, 51n.112k, 51n.56h.112k; horizontal axis Contact Frequency, 0.0 to 1.0.]

It could be argued that with hundreds of contact pairs, and only three examples in each of the positive and negative groups, one is likely to find contact pairs with separation by chance. Indeed, for any contact pair profile (a profile is the six frequencies of a contact pair, one frequency for each simulation) there will be some separation between the two groups whenever the top three frequencies are labeled as one group, and the bottom three are labeled as another.
Therefore for any contact pair profile, there is some way of rearranging the group labels so that there is a separation between the two groups. There are only 10 ways to assign the group labels, and hundreds of contact pairs, so we suspected that the separation power might be due to chance. In order to directly address this criticism we swapped the labels (positive or negative) on the examples and re-analyzed the data, to see if our separation power was enriched when examples were ‘properly’ labeled. Table 3.2 shows the 10 label groups (the properly labeled group and nine mislabeled ones). For instance, in group 5 we pretended that 112f was a negative example and 51n.112k was a positive example, and thus mislabeled the groups as {clps, 53a, 51n.112k} vs. {112k, 51n.56h.112k, 112f}. We then searched for contact pairs that had separation between the mislabeled groups. The properly labeled group is group 1 in Table 3.2 (bold in the original) and is shown in pink in the figures. We found that only two mislabeled groups (5 and 7) had separation power above 20 % (Fig. 3.18). This enrichment in separation power in the properly labeled group 1 gave us confidence that the contact pairs with a higher separation power revealed important structural changes that could be used to predict the kinetic stability of unknown mutants. Enrichment of significance in the true-labeled group, as measured by three statistical tests, supports this conclusion (Fig. 3.19).

Table 3.2 Kinetic Stability Labels for Randomization Significance Test.

group  clps  53a  112f  112k  51n.112k  51n.56h.112k
1      0     0    0     1     1         1
2      0     0    1     0     1         1
3      0     1    0     0     1         1
4      1     0    0     0     1         1
5      0     0    1     1     0         1
6      0     1    0     1     0         1
7      1     0    0     1     0         1
8      1     0    1     0     0         1
9      0     1    1     0     0         1
10     1     1    0     0     0         1

Figure 3.18 Contact Frequency Separation Enrichment. The contact frequency separation between kinetically stable and unstable groups for all contact pairs was computed. Group 1, shown in pink, represents the ‘true’ groups: kinetically stable = {clps, 53a, 112f} vs.
kinetically unstable: {112k, 51n.112k, 51n.56h.112k}. This process was repeated by randomizing the groups (10 ways in total). There was an enrichment in high separation values for the true group, as indicated by the outlier separation (sep) values around 0.4 – 0.6. Only two other groups, 5 and 7, have separations above 0.2. Group 5 is {clps, 53a, 51n.112k} vs. {112k, 51n.56h.112k, 112f}, while group 7 is {53a, 112f, 51n.112k} vs. {112k, 51n.56h.112k, clps}.

[Figure 3.18 plot: title Separation Enrichment; horizontal axis sep, 0.0 to 0.6; vertical axis Density; one panel per label group (1 to 10).]

Figure 3.19 P-value Enrichment for Kinetically Stable vs. Unstable Contact Frequencies. The statistical difference between kinetically stable and unstable contact frequencies for all contact pairs was computed with three statistical tests: a Welch two sample t-test (t.p), a two-sample Wilcoxon rank sum test (w), and a two-sample Kolmogorov-Smirnov test (ks). P-values are plotted on the horizontal axis. Group 1, shown in pink, represents the ‘true’ group-labels: kinetically stable = {clps, 53a, 112f} vs. kinetically unstable: {112k, 51n.112k, 51n.56h.112k}. This process was repeated by randomizing the groups (10 ways in total). There is an enrichment in low (significant) p-values for the true group, as indicated by the leftward shift of the distributions of the true group-labels.

[Figure 3.19 plots: titles P-value Enrichment and T-Test Enrichment; horizontal axes p-value (0.0 to 1.0) and t.p (1e-05 to 0.1); vertical axis Density; one panel per label group (1 to 10) and test (w.p, ks.p, t.p).]

We preferred to rely on this enrichment method rather than absolute magnitude of p-values for the following reasons.
Our method made no assumptions about the distribution of the contact frequencies, because it used the empirical distribution. Secondly, the ‘sample size’ of each contact pair is small (three for each of the +/- groups), and significance could be due to noise. Even with a multiple testing correction, such as the Bonferroni correction for multiple comparisons (Abdi 2007), the relative values of one p-value to another would not change; all p-values would be scaled by the same correction factor. By swapping the labels, it was clear that just a few contact pairs, shown in Fig. 3.17, were different between the positive and negative groups, and their structural basis was apparent upon visually tracking the contact throughout the simulation. Our method is simple because there are only three examples in each group, and swapping them resulted in only 10 label groups. If we included more examples, this process would need to rely more heavily on automation. For instance, if there are four or five examples instead of three in each group, then there are respectively 35 or 126 label groups. In general, for n examples in total (n/2 in each group) there are (n!/((n/2)!)²)/2 label groups.

We investigated the structural basis of the top contact pairs (Fig. 3.20). The distances of the four top contact pairs in the original crystal structure are within approximately an angstrom of the 10 Å contact definition (8.9, 9.1, 9.9, 10.0 Å). The contact pair 51_78 spans the binding pocket and measures how much the binding pocket opens up. The binding pocket of the kinetically unstable mutants opened up and thus these residue pairs were frequently not in contact. The top contact pair, 102_111, is between Ile 102 in an α-helix and Pro 111 in the loop region (immediately after the α-helix), which forms part of the second shell of the binding pocket, in contact with the helix. The opening of the binding pocket pushed the second shell loop region into contact with the helix (red arrow, Fig. 3.20).
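The number of label groups quoted for the label-swapping check (10, 35 and 126) can be verified with a short sketch. The function names below are hypothetical; the enumeration simply fixes the first example in one half so that each split into two equal groups is counted once, regardless of which half is called positive.

```python
from itertools import combinations
from math import comb


def n_label_groups(total):
    # Number of ways to split `total` examples into two equal, unordered halves:
    # (total! / ((total/2)!)^2) / 2
    return comb(total, total // 2) // 2


def label_groups(names):
    # Enumerates every split of `names` into two equal halves.
    # The first name is pinned to half `a` so mirrored splits are not repeated.
    names = list(names)
    half = len(names) // 2
    first = names[0]
    groups = []
    for rest in combinations(names[1:], half - 1):
        a = (first,) + rest
        b = tuple(name for name in names if name not in a)
        groups.append((a, b))
    return groups


simulations = ["clps", "53a", "112f", "112k", "51n.112k", "51n.56h.112k"]
groups = label_groups(simulations)
```

The first entry of `groups` is the properly labeled split, {clps, 53a, 112f} vs. {112k, 51n.112k, 51n.56h.112k}; the remaining nine are the mislabeled splits against which the enrichment is judged.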
Figure 3.20 Structural Basis of Four Contact Pairs. Four contact pairs that discriminate kinetically stable and unstable sets of proteins are shown on the crystal structure 3dnj: stable = {clps, 53a, 112f} vs. unstable = {112k, 51n.112k, 51n.56h.112k}. The residue pairs that were more frequently in contact in the kinetically stable set are D49_P111, T51_Q113, and T51_V78, while the residue pair I102_P111 was more frequently in contact for the kinetically unstable set. These four contact pairs had a frequency separation (between the positive and negative groups) above 30%. Four other contact pairs, from two of the nine mislabeled groups, also reached this level of separation power. We consider these contact pairs spurious, as they arose when the kinetic stability labels were mixed up. The contact pairs with separation power are shown in Fig. 3.21 in red and blue bars, while those that have zero separation are shown in green and yellow bars. [Figure 3.20 image labels: I102, P111, T51, D49, V78, Q113.]

Figure 3.21 Contact Pairs with > 30% Frequency Separation. The kinetic stability labels were shuffled in order to check if frequency separation between kinetically stable (blue) and unstable (red) groups was due to chance. When the proteins were mislabeled (groups 5 and 7, upper two panels) there were several contact pairs that had spurious separation power: 47_112, 51_59, 54_110, 55_111. Contact pairs that had a separation > 0 are shown with red and blue coloured bars and outlined in bold.

We then compared the contact pair profiles of other mutants to the kinetically stable group, in hopes that we could predict when an unknown mutant would be kinetically stable. If a mutant had a contact pair frequency in the range of values observed in the kinetically stable group, this gave confidence that it too would be kinetically stable.
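The range comparison just described can be sketched in a few lines. This is a minimal illustration rather than the code used in this work: the function name is hypothetical, the frequencies are invented, and the score is taken to be the fraction of selected contact pairs whose frequency in the mutant falls inside the min-max range seen in the kinetically stable examples.

```python
def confidence(mutant, stable_examples, pairs):
    # Fraction of contact pairs whose frequency in `mutant` lies within the
    # min-max range observed across the kinetically stable examples.
    hits = 0
    for pair in pairs:
        observed = [example[pair] for example in stable_examples]
        if min(observed) <= mutant[pair] <= max(observed):
            hits += 1
    return hits / len(pairs)


pairs = ["102_111", "49_111", "51_78", "51_113"]
# Hypothetical contact frequencies (not thesis data).
stable = [
    {"102_111": 0.10, "49_111": 0.80, "51_78": 0.70, "51_113": 0.90},
    {"102_111": 0.20, "49_111": 0.90, "51_78": 0.60, "51_113": 0.85},
    {"102_111": 0.15, "49_111": 0.85, "51_78": 0.75, "51_113": 0.95},
]
unknown = {"102_111": 0.12, "49_111": 0.50, "51_78": 0.65, "51_113": 0.20}

print(confidence(unknown, stable, pairs))
```

By construction every member of the stable group scores 1, which is why the score is only informative for proteins that were not used to choose the contact pairs.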
We took the top six contact pairs, which had the best separation power, and used them to predict whether other mutants are kinetically stable. The confidence score in Fig. 3.22 measures the average similarity to the kinetically stable group, across these six contact pairs. We defined the confidence score as how many contact pairs fall within the range of the kinetically stable examples. There can be 0 to 6 contact pairs shared between an unknown mutant and the kinetically stable group, which we then normalized to 1. Thus a score of 1 means that 6/6 of the top contact pairs in a mutant had a value similar to the kinetically stable group, while a score of 0.5 means that only 3/6 contact pairs had a ‘kinetically stable’ frequency. The kinetically stable group (ClpS, M53A, L112F, shown in blue) by construction had a confidence of 1, while the kinetically unstable group (L112K, T51N_L112K, T51N_V56H_L112K, shown in red) had a confidence of 0.

[Figure 3.21 plot: contact frequency (horizontal axis, 0.0 to 0.8) for each protein (clps, 53a, 112f, 112k, 51n.112k, 51n.56h.112k); one panel per contact pair (102_111, 49_111, 51_78, 51_113, 47_112, 51_59, 54_110, 55_111) for label groups 1, 5 and 7.]

We validated our confidence score on proteins with known solubility that had not been used to choose the 6 contact pairs and make the confidence score. Interestingly, the confidence of mutants that did not express well was low (yellow group). The mutant I45K_L46K_T51K_V56K_M75K_V78K_L112K is predicted to be unstable (confidence = 0). This mutant was never experimentally tested, but with seven lysines in close proximity it should be kinetically unstable, and therefore acted as an additional negative control.

Figure 3.22 Confidence of Kinetic Stability for Various ClpS Mutants.
The kinetic stability for hydrophobic mutants (green bars) was predicted using a confidence score. The confidence score measured how similar the contact frequency was across six residue pairs (see Fig. 3.17) that discriminate known kinetically stable (blue) from unstable (red) proteins. The confidence score for proteins that had poor solubility (yellow, excluding 56h) was predicted to be zero. 56h was soluble but produced non-specific binding. [Figure 3.22 plot: "Contact Map Filters", nFeat = 6; horizontal axis: confidence (0.0 to 1.0); one bar per protein or mutant.]

We computed the confidence of 25 mutants, in hopes of finding some with high confidence. Indeed, several had a confidence value above 40%, the value obtained for V56H, which is expressible but does not have any specific binding activity. The highest confidence mutants were L112M and I45M_V56I, which had respectively 6/6 and 5/6 contact pairs within the range of the kinetically stable group. These predictions can guide future experiments, and further experiments can validate these predictions.

3.3.2 Stabilizing ClpS with Back-To-Consensus Mutations

We directly addressed the instability of ClpS by introducing mutations that were evolutionarily favoured. This ‘back-to-consensus’ approach has been used quite successfully to stabilize proteins (Bershtein, Goldin, and Tawfik 2008; Rockah-Shmuel and Tawfik 2012). The assumption is that mutations that are evolutionarily favoured are more likely to be tolerated than randomized mutations, and on average tend to stabilize the protein.
Therefore we constructed a sequence family of highly similar sequences with > 70% amino acid sequence identity to our particular C. crescentus ClpS construct (Fig. 3.23). We looked for residue positions that had a ‘consensus’ in the sequence alignment, but where our particular form of ClpS had a different amino acid. We settled on 14 mutations at 11 residue positions (some residue positions had multiple amino acid mutations, e.g. S104F and S104L). Our goal was to determine which, if any, consensus mutation would rescue an insoluble mutant of ClpS and increase its solubility back to a wild-type level.

Figure 3.23 Sequence Family of ClpS and Back-to-Consensus Mutations. Back-to-consensus mutations were defined heuristically as residues for which there is consensus as to the amino acid identity across members of the alignment, but for which ClpS in C. crescentus (red box at bottom) differs. The sequence alignment represents members from Pfam annotated with the ClpS domain, and with > 70% sequence identity to ClpS in C. crescentus.

We created a GFP-ClpS fusion protein in order to assay the amount of expressible soluble protein by fluorescence without running a protein gel (SDS-PAGE). We layered consensus mutations on top of the poorly soluble T51N design (Table 3.1) by site-directed mutagenesis. Of the 14 mutations, six, which we verified by DNA sequencing, had a higher solubility level (relative to T51N) at at least one of the two temperatures tested (16 or 37°C). Out of these six we found one mutation, S104F, that improved the solubility of ClpS at both expression temperatures, by a factor of two at 16°C and three at 37°C (Fig. 3.24). We suspect that the S104F mutation stabilizes ClpS through packing interactions with surrounding aromatics (Fig. 3.25).
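The selection of back-to-consensus candidates described above can be sketched as a column-by-column scan of an alignment. This is a hedged illustration, not the procedure used to build Fig. 3.23: the function name, the `min_fraction` threshold and the toy sequences are all assumptions, and positions are reported as 1-based alignment columns rather than ClpS residue numbers.

```python
from collections import Counter


def back_to_consensus(alignment, target, min_fraction=0.5):
    # alignment: aligned homologous sequences of equal length ('-' marks a gap).
    # target: the aligned sequence of the construct of interest.
    # Returns (column, target residue, consensus residue) wherever the target
    # differs from a residue shared by at least `min_fraction` of the family.
    suggestions = []
    for i, target_aa in enumerate(target):
        column = [seq[i] for seq in alignment if seq[i] != "-"]
        if not column or target_aa == "-":
            continue
        consensus_aa, count = Counter(column).most_common(1)[0]
        if consensus_aa != target_aa and count / len(column) >= min_fraction:
            suggestions.append((i + 1, target_aa, consensus_aa))
    return suggestions


# Toy alignment (hypothetical sequences, not the ClpS family).
family = ["MKFA", "MKFA", "MRFA", "MKFG"]
construct = "MKSA"

print(back_to_consensus(family, construct))
```

A stricter threshold trades fewer candidate mutations for stronger evolutionary support; the heuristic used for Fig. 3.23 may weigh the columns differently.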
Unfortunately, we were not able to fully rescue the T51N mutant to wild-type solubility levels, which are 12 and 8 times as high as T51N. Future work includes screening more mutations and combining multiple mutations with DNA shuffling to improve the solubility.

Figure 3.24 Normalized Fluorescent Signals of Back-to-Consensus Mutations. Back-to-consensus mutants were made on top of the insoluble T51N mutation in a GFP-ClpS construct. The mutation S104F improved the GFP fluorescence signal (vertical axis), relative to T51N (dashed line), at both expression temperatures: 16°C (red) and 37°C (turquoise). A high GFP fluorescence value indicates that the protein has high solubility. The same fluorescence levels are shown with the wild-type (wt) levels in the inset. Wild-type ClpS levels were much higher than any back-to-consensus mutant. [Figure 3.24 plot: horizontal axis: back-to-consensus mutation (t51n, s104l, y58h, n81h, v85e, i102m, s104f; wt in the inset); vertical axis: solubility (fold improvement); colour: expression temperature (16, 37).]

Figure 3.25 ClpS S104 and Surrounding Aromatic Residues. The S104F mutant was identified in the back-to-consensus screen. If the S104F mutant has a similar structure to ClpS, then the S104F mutation would be surrounded by aromatic residues (pdb 3dnj). [Figure 3.25 image label: S104F.]

Chapter 4: Conclusion

The goal of this thesis was to redesign ClpS for use in an affinity reagent based N-end sequencing method. If successful, this technology could further the field of proteomics, because it could be used to sequence a vast number of diverse peptides in parallel. Living systems are highly complex, and this complexity cannot be exhaustively captured by knowledge of the proteome. For instance, there is a whole world of interactions of small molecules with DNA, RNA and protein macromolecules, and some authors have called this the ‘missing link’ in the central dogma of molecular biology (S. L. Schreiber 2005; Oprea et al. 2011; Oprea et al. 2007).
Yet a substantial portion of the research community is focusing on the emerging area of proteomics. Proteomics is already quite complicated due to post-translational modifications, protein-protein interactions, and sequence variations of different isoforms of proteins arising from splice variants and pro-peptide removal by cleavage. Even within the same organism, tissue type, cell line, or organelle, the protein species present depend on the environmental conditions and developmental stage. Human disease is complex, and how can we understand this complexity if we cannot measure and 'see' the complexity in our body? One hopes that if one can measure at a systems level, then one can more easily deconvolute the multifactorial nature of complex diseases (Altelaar, Munoz, and Heck 2013).

Efforts are well underway to understand the complexity of the proteome. The Human Proteome Project (HPP) utilizes diverse protein identification technologies to catalogue protein products from the 20 300 human protein-coding genes. Much work has been done, but 3844 proteins remain to be confidently identified (Legrain et al. 2011; Farrah et al. 2013; Farrah et al. 2014; Omenn 2013; Lane et al. 2014). Current technologies employed by the HPP include mass spectrometry, Edman sequencing, antibody-capture, and structure determination. We envision N-end sequencing by protein adapters complementing these existing technologies, and contributing orthogonal lines of evidence to challenging problems.

Are there any advantages of N-end sequencing over existing protein detection techniques? Mass spectrometry (MS) is not inherently quantitative, and many creative strategies exist for quantifying protein levels in a sample (Wasinger, Zeng, and Yau 2013). The amount of peptide detected in MS can depend greatly on its physicochemical characteristics, and comparing signals between diverse peptides does not yield their stoichiometric ratio.
In other words, in MS there is bias in what peptides are detected and many peptides are never detected. In order to detect peptide levels quantitatively (or quantitative ratios between experiments) one can introduce isotopically heavy peptide standards of known concentration, or repeat the experiment under different conditions and detect differences between the same peptide. In contrast, N-end protein sequencing directly detects single molecules. As long as there is no bias in what peptides are captured on the substrate, the author sees no reason why it would not be directly quantitative.

Certain peptides are not amenable to detection in certain types of MS experiments, because they lack a basic charge at the C-terminus, or do not have optimal length characteristics. One MS study used multiple proteases (trypsin, LysC, ArgC, AspN, and GluC) to generate a larger family of peptides and improve protein identification (Swaney, Wenger, and Coon 2010). Peptides cleaved with trypsin had an average length of 8 residues, while the other proteases generated longer peptides ranging from 13 to 21 residues long. The study shows how detecting longer and more diverse peptides leads to more proteins being identified. Other experiments have used non-specific proteases (MacCoss et al. 2002; Schlosser, Vanselow, and Kramer 2005; B. Wang et al. 2008), but these experiments can be difficult to replicate and can increase the analysis time and false discovery rate because of the greater number of possible spectra (Nesvizhskii 2010; Guthals and Bandeira 2012). In contrast, N-end sequencing would directly detect amino acids, and thus remove the challenge of matching experimental spectra to a database. Furthermore, N-end sequencing would not be limited to short tryptic peptides with a basic charge at the C-terminus, and alternative proteases could be employed without difficulty.
Therefore, in contrast to conventional MS, N-end sequencing could detect longer and more diverse peptides. We must admit that we do not know the typical length of the peptides that could be sequenced with our proposed method. If the peptides are too long then they may fold and hide their N-terminus. It has been shown that a disordered N-terminus is important for recognition by ClpS, as assayed indirectly by biochemical degradation (Kevin H Wang et al. 2008). In other words, protein degradation requires a disordered linker between the destabilizing residue and the folded portion of the protein targeted for degradation. Presumably N-end sequencing would only work if the N-terminal region of each peptide was unfolded.

Toward our goal of developing a high-throughput N-end protein sequencing technology, we predicted the specificity of designed adapters, and experimentally validated 12 designs. We summarize our computational pipeline and experimental validation in Fig. 4.1.

Figure 4.1 Computational Pipeline. A summary of the workflow for designing selective affinity reagents for high-throughput N-end protein sequencing (left to right). The binding pocket residues of ClpS are mutated to interact specifically with an N-degron of interest and prevent non-specific interactions. The selectivity is predicted by docking all 20 types of N-degrons (for the 20 amino acids) into various rigid receptor scaffolds (templates) with AutoDock, and is quantified by the selectivity score. The kinetic stability is predicted by analyzing a molecular dynamics trajectory of the design. The final step is experimental validation with the peptide binding assay. Selectivity score shown in the figure, for N-terminal amino acid i (AA1 ... AA20):

$$\mathrm{SelScore}(i) = \frac{\sum_{\mathrm{state} \in \{\text{bound docks}\}} e^{-G_{\mathrm{state},i}/k_B T}}{Z} = \frac{\sum_{\mathrm{state} \in \{\text{bound docks}\}} e^{-G_{\mathrm{state},i}/k_B T}}{\sum_{aa} \sum_{\mathrm{state} \in \{\text{docks}\}} e^{-G_{\mathrm{state},aa}/k_B T}}$$
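The selectivity score of Fig. 4.1 can be illustrated with a small numerical sketch. Several things here are assumptions rather than details of the pipeline: the Boltzmann weights use the standard negative exponent, the docking energies are invented values in kcal/mol, the function name is hypothetical, and which poses count as 'bound docks' is simply supplied by the caller.

```python
import math

K_B_T = 0.593  # k_B * T in kcal/mol at about 298 K


def selectivity_scores(bound_energies, all_energies, kT=K_B_T):
    # bound_energies: {amino acid: energies of poses counted as bound docks}
    # all_energies:   {amino acid: energies of all docked poses}
    # Z sums the Boltzmann weights over every pose of every amino acid.
    z = sum(math.exp(-g / kT) for gs in all_energies.values() for g in gs)
    return {
        aa: sum(math.exp(-g / kT) for g in gs) / z
        for aa, gs in bound_energies.items()
    }


# Hypothetical docking energies (kcal/mol) for three N-terminal amino acids.
all_docks = {"F": [-7.0, -3.0], "L": [-6.5, -2.5], "D": [-4.0, -1.0]}
bound_docks = {"F": [-7.0], "L": [-6.5], "D": [-4.0]}

scores = selectivity_scores(bound_docks, all_docks)
for aa, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(aa, round(score, 3))
```

Because every amino acid shares the same partition function Z, the scores behave like probabilities: they sum to at most 1, and the lowest-energy bound poses dominate.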
[Figure 4.1 graphic: text recovered from the embedded panels.]
Pipeline steps: (1) new binding partner; (2) optimize mutations for binding (Rosetta); (3) template selection; (4) selectivity screen (AutoDock); (5) stability filter (MD); (6) experimental validation.
Workflow stages: high-resolution structures; hot spot residue filtering; negative design; large-scale mutagenesis (Rosetta); template generation (Rosetta filters); template selection (Rosetta, diversity); selectivity screen (AutoDock); MD simulation (NAMD).
Selectivity panel: selectivity score (probability) vs. N-terminal amino acid (A to Y) for the negative designs V56[ST], M75[DENQST], L112F, together with a gap score comparing SelScore(Q) with the maximum SelScore over all amino acids.
Binding pocket panel: the binding pocket of ClpS consists of 12 residues (PDB code 3DNJ, tyrosine-bound); a network of 4 hydrogen bonds coordinates the N-terminal amino acid through the alpha-amino nitrogen; labelled residues: His 79, Asn 47, Asp 49.
Stability panel: top 6 features based on maximum separation: 102_111 (p = 3e-05), 49_111 (p = 0.02), 51_78 (p = 0.007), 51_113 (p = 0.04), 52_112 (p = 0.05), 45_87 (p = 0.01).
Binding assay panel: ClpS-L112F peptide binding profile; signal for 30 peptides of the form XX-FVQRWK(6-Ahx-Biotin) with His6-SUMO-ClpS (5 µM) and His6-SUMO-ClpS-L112F (5 µM); controls: anti-His6 antibodies only, biotinylated HRP.
Literature dissociation constants of C. crescentus ClpS (Wang et al. 2008): YL 400 ± 200 nM; FL 170 ± 100 nM; WL 150 ± 130 nM; LL 480 ± 220 nM.
In hindsight, designs with the T51N mutation were a risky choice, because T51 plays a functional role in N-degron recognition. The backbone O of T51 hydrogen bonds with the amide nitrogen between the first and second residues of the N-degron (K H Wang et al. 2008; G Roman-Hernandez et al. 2009; Giselle Roman-Hernandez et al. 2011). One study determined that this amide recognition is necessary for N-degron binding. The lone amino acid Phe in solution (N-Phe-COOH) could not bind the N-domain of Ubr1, but when the C-terminus was modified to have an amide (N-Phe-CONH2), binding was detected (S. Sriram et al. 2013). We did not test the effect of the T51N mutation on binding, since designs with this mutation expressed poorly. However, we suggest future designs should keep the wild-type amino acid at residue T51, so as to ensure proper placement of its backbone O and N-degron binding.

We gathered some initial data on what mutations were tolerated, and were able to recapitulate the kinetic stability of several ClpS mutants. However, we have been unable to satisfactorily recapitulate the binding affinity of the designs, for example L112F (data not shown).
This inability to recapitulate the binding profiles of an arbitrary mutant could have numerous causes, one of which could be the structural modeling of the designs. We chose a quite conservative approach to modeling structural rearrangements in ClpS upon mutation, restricting backbone and side chain adjustment, so as to introduce minimal noise. ClpS showed itself to be quite sensitive to mutation, and therefore we suspect that mutations such as L112F may cause larger structural rearrangements. One ligand docking study modeled the breathing motions of a receptor by performing a molecular dynamics study on the receptor, finding the normal modes, and then generating an ensemble of templates for use in docking (Gerek and Ozkan 2010). We think this approach could be suitable, but the computational expense of an MD simulation is a disadvantage. Another approach to obtaining an ensemble of ClpS states would be to use multiple crystal structures of ClpS and require a consensus among the results. We chose the crystal structure with the best resolution, 3dnj, but there are many others (Table 1.1), which were crystallized under different conditions and by different groups. However, they are not completely independent, as many use molecular replacement to obtain the phases and resolve the diffraction data (Giselle Roman-Hernandez et al. 2011).

Another reason for the inability to recapitulate the peptide binding activity of designs could be the sensitivity of the peptide binding assay itself. Wild-type ClpS has a submicromolar binding affinity to its native L/F/Y/W N-degrons, and stringent washing steps are likely to wash away a design with weaker binding. However, if the washing steps are too gentle then protein that is not tightly bound remains and introduces background noise when detected in the final HRP readout. Other studies have used fluorescence anisotropy to detect the level of peptide binding activity.
This works by fluorescently labeling the peptide, which tumbles rapidly when unbound in solution, but tumbles more slowly when bound to a large receptor such as ClpS. The difference in fluorescent signals can be used to calculate a Kd. Our assay could be validated, and the washing steps could be automated, if we could compare our technique to an orthogonal technique such as fluorescence anisotropy.

We found ClpS to be particularly sensitive to mutation and addressed this in two ways, through both a computational and an experimental approach. We developed a confidence score that can be used to predict the kinetic stability of a design in silico. We also determined which mutations to ClpS would improve its solubility by tagging it with GFP and assaying the mutants for increased fluorescent signal. We recommend that future work on ClpS focus on improving its stability. Two recent studies have successfully incorporated disulphide bridges into proteins to improve the thermostability (Wijma et al. 2014; L. Liu et al. 2014). Both approaches screened a rationally constructed library, and were able to improve the thermostability (melting temperature, half life at high temperatures, increased optimal temperature at which it performs its function) without significantly interfering with the natural function. We think screening a randomized DNA library with only a few mutations, for instance using a directed evolution approach, would be suitable because ClpS is a small protein and too many mutations may change the shape of its binding pocket and prevent it from binding the N-terminal residue through its hydrogen bond network (Fig. 1.3).

Alternatively, a completely new scaffold could be used to design affinity reagents for N-end based sequencing.
We initially chose ClpS as a template for various reasons: (1) many crystal structures were available; (2) the binding activity had been investigated indirectly by a degradation based biochemical assay, and directly with peptide arrays and fluorescence anisotropy; (3) its binding affinity was greater than that of the UBR box for type 1 R/K/H N-degrons; (4) the protein was not too large. The eukaryotic N-end rule protein Ubr1 is an E3 ubiquitin ligase that contains the N-domain, which has primary and secondary sequence similarity to ClpS (Tasaki et al. 2012; Lupas and Koretke 2003). Studies have reported the role of the N-domain in binding N-degrons with N-terminal I/L/Y/F/W (Tasaki et al. 2009; Kitamura and Fujiwara 2013). A recent study also shows that unacetylated M (when followed by a hydrophobic residue) is a degradation signal through its interaction with the N-domain in yeast and mouse Ubr1 (H.-K. Kim et al. 2013). However, Ubr1 is a large protein (~1900 residues) with multiple domains, and efforts to isolate the N-domain and obtain a well-behaved Ubr1 truncation protein through deletion mutants have proved unsuccessful (H.-K. Kim et al. 2013; Tasaki et al. 2009). The UBR box of Ubr1 binds type 1 charged N-degrons and several crystal structures are published (Choi et al. 2010; Matta-Camacho et al. 2010; S. M. Sriram and Kwon 2010), but no such structures exist of the N-domain. One group has modeled the UBR box and N-domain as being in extremely close proximity, within 15 Å (S. Sriram et al. 2013; Jiang et al. 2013); therefore it may require much trial and error to obtain a well-behaved eukaryotic Ubr1 N-domain truncation. The structure of the N-domain was modeled using the bacterial ClpS structure as the template for a homology model (Schuenemann et al. 2009), but it seems wise to wait for a crystal structure to become available, because the therapeutic relevance of the N-end rule marks Ubr1 as a protein of interest (Pore and Banerjee 2013).
Another technique for stabilizing proteins is to reconstruct their ancient ancestors. This work was pioneered by Joe Thornton at the University of Oregon (Thornton 2004). It has been used to study the tradeoff and interplay between evolving function and stability (Harms and Thornton 2013; Studer, Dessailly, and Orengo 2013). Furthermore, ancient mutations are more often tolerated by a protein than other types of mutations. Dan Tawfik at the Weizmann Institute of Science has had success constructing semi-random DNA libraries using ancestral sequence information (Alcolombri, Elias, and Tawfik 2011). Another study reconstructed evolutionary nodes of Precambrian β-lactamases with high thermostability (Risso et al. 2013). These ancient proteins had up to 100 amino acid differences from modern sequences yet retained their function and were more thermally stable. Their large 35°C enhancement in thermal denaturation temperature led to the speculation that early organisms were highly thermophilic. This technique could be promising to apply to prokaryotic ClpS if a stable UBR box truncation protein is difficult to purify.
There are other applications of N-end rule N-recognins beyond our proposed method of N-end protein sequencing. The N-end rule has been exploited to detect protease activity in a cell-free assay (Oyler and Tsai 2013). This protease detection method works because the protease cleaves a labeled peptide, revealing a destabilizing N-end residue. By tracking the loss of labeled peptide over time (for instance, through GFP signal), the initial protease cleavage activity can be quantified. This cleavage-degradation coupling has been applied to specifically target toxins to infected cells. One study introduced a toxin that was preferentially stabilized in virus-infected cells (Falnes and Olsnes 1998; Falnes et al. 1999), and therefore infected cells were preferentially targeted.
The toxin targeting method worked through removal of an N-end degradation signal by a virus-specific protease. The toxin is safely degraded in healthy cells because they lack the virus-specific protease and thus cannot cleave off the degradation signal. One limitation of the study was the limited viral protease activity in the cytosol of infected cells, which prevented stabilization of the toxin. In other words, the toxin was degraded even in virus-infected cells. A more successful study directly fused a drug to ubiquitin (vs. a destabilizing N-end residue). The authors noted that it is important to choose the right activating protease enzyme for the pro-drug (Tcherniuk, Chroboczek, and Balakirev 2005).
Zhou et al. have already harnessed the ubiquitin degradation system to knock down specific pools of cellular proteins (Zhou 2005). They engineered a protein that contains the F-box, which consists of three helices and is coupled to degradation through the ubiquitin system (Schulman et al. 2000). They fused this F-box containing protein to a peptide that interacts with specific proteins of interest. This method can be used to determine the function of proteins that are necessary for development, and therefore are difficult to study in non-human organisms since loss of the protein is embryo-lethal. This technique is analogous to RNAi at the protein level, acting after post-translational regulation. At a desired time point the engineered degradation adapter can be introduced and protein levels lowered. If one could engineer binding sites for specific amino acids, or some specific N-terminal motif, then one could knock down different pools of protein, depending on their degradation signal.
In conclusion, to return to the original goal of designing adapters for N-end sequencing, we propose the following recommendations to move this project forward.
(1) Reproduce the peptide binding profiles of mutants with an assay such as fluorescence anisotropy to give confidence in the sensitivity of the existing modified affinity-reagent-based assay.
(2) Stabilize the template of ClpS through randomized DNA screening techniques, such as directed evolution or ancestral protein resurrection.
(3) Introduce minimal mutations into the core of ClpS, such as hydrophobic to hydrophobic, but not hydrophobic to charged / polar.
(4) Keep certain residues at their wild-type identity: residues T51, H79, N47, and D49 are involved in coordinating the N-degron α-amino nitrogen and peptide backbone through hydrogen bonds.
Bibliography
Abdi, H. 2007. Bonferroni and Šidák Corrections for Multiple Comparisons. Edited by N J Salkind. Thousand Oaks, CA: Sage.
Alcolombri, Uria, Mikael Elias, and Dan S Tawfik. 2011. “Directed Evolution of Sulfotransferases and Paraoxonases by Ancestral Libraries.” Journal of Molecular Biology 411 (4) (August 26): 837–53. doi:10.1016/j.jmb.2011.06.037.
Allison, Brittany, Steven Combs, Sam DeLuca, Gordon Lemmon, Laura Mizoue, and Jens Meiler. 2014. “Computational Design of Protein-Small Molecule Interfaces.” Journal of Structural Biology 185 (2) (February): 193–202. doi:10.1016/j.jsb.2013.08.003.
Altelaar, A F Maarten, Javier Munoz, and Albert J R Heck. 2013. “Next-Generation Proteomics: Towards an Integrative View of Proteome Dynamics.” Nature Reviews. Genetics 14 (1) (January): 35–48. doi:10.1038/nrg3356.
Andrews, Benjamin T, Shachi Gosavi, John M Finke, José N Onuchic, and Patricia A Jennings. 2008. “The Dual-Basin Landscape in GFP Folding.” Proceedings of the National Academy of Sciences of the United States of America 105 (34) (August 26): 12283–8. doi:10.1073/pnas.0804039105.
Bachmair, A, D Finley, and A Varshavsky. 1986. “In Vivo Half-Life of a Protein Is a Function of Its Amino-Terminal Residue.” Science 234 (4773) (October 10): 179–186. doi:10.1126/science.3018930.
Bauer, David F. 1972.
“Constructing Confidence Sets Using Rank Statistics.” Journal of the American Statistical Association 67 (339): 687–690.
Bershtein, Shimon, Korina Goldin, and Dan S Tawfik. 2008. “Intense Neutral Drifts Yield Robust and Evolvable Consensus Proteins.” Journal of Molecular Biology 379 (5) (June 20): 1029–44. doi:10.1016/j.jmb.2008.04.024.
Borgo, Benjamin, and James J Havranek. 2012. “Automated Selection of Stabilizing Mutations in Designed and Natural Proteins.” Proceedings of the National Academy of Sciences of the United States of America 109 (5) (January 31): 1494–9. doi:10.1073/pnas.1115172109.
Chandler, D. 2005. “Interfaces and the Driving Force of Hydrophobic Assembly.” Nature 437 (7059) (September): 640–647.
Choi, Woo Suk, Byung-Cheon Jeong, Yoo Jin Joo, Myeong-Ryeol Lee, Joon Kim, Michael J Eck, and Hyun Kyu Song. 2010. “Structural Basis for the Recognition of N-End Rule Substrates by the UBR Box of Ubiquitin Ligases.” Nature Structural & Molecular Biology 17 (10) (October): 1175–81. doi:10.1038/nsmb.1907.
Combs, Steven A, Samuel L Deluca, Stephanie H Deluca, Gordon H Lemmon, David P Nannemann, Elizabeth D Nguyen, Jordan R Willis, Jonathan H Sheehan, and Jens Meiler. 2013. “Small-Molecule Ligand Docking into Comparative Models with Rosetta.” Nature Protocols 8 (7) (January): 1277–98. doi:10.1038/nprot.2013.074.
Das, R, and D Baker. 2008. “Macromolecular Modeling with Rosetta.” Annual Review of Biochemistry 77: 363–382.
De Hoog, C L, and M Mann. 2004. “Proteomics.” Annual Review of Genomics and Human Genetics 5: 267–293.
De Lange, Frank, Alessandra Cambi, Richard Huijbens, Bärbel de Bakker, Wouter Rensen, Maria Garcia-Parajo, Niek van Hulst, and Carl G Figdor. 2001. “Cell Biology beyond the Diffraction Limit: Near-Field Scanning Optical Microscopy.” Journal of Cell Science 114 (23): 4153–4160.
Donnert, Gerald, Jan Keller, Rebecca Medda, M Alexandra Andrei, Silvio O Rizzoli, Reinhard Lührmann, Reinhard Jahn, Christian Eggeling, and Stefan W Hell. 2006.
“Macromolecular-Scale Resolution in Biological Fluorescence Microscopy.” Proceedings of the National Academy of Sciences 103 (31): 11440–11445. doi:10.1073/pnas.0604965103.
Doucet, Alain, and Christopher M Overall. 2011. “Broad Coverage Identification of Multiple Proteolytic Cleavage Site Sequences in Complex High Molecular Weight Proteins Using Quantitative Proteomics as a Complement to Edman Sequencing.” Molecular & Cellular Proteomics: MCP 10 (5) (May): M110.003533. doi:10.1074/mcp.M110.003533.
Edman, P. 1950. “Method for Determination of the Amino Acid Sequence in Peptides.” Acta Chem. Scand. 4: 283–293.
Edman, P, and G Begg. 1967. “A Protein Sequenator.” European Journal of Biochemistry / FEBS 1 (1) (March): 80–91.
Erbse, A, R Schmidt, T Bornemann, J Schneider-Mergener, A Mogk, R Zahn, D A Dougan, and B Bukau. 2006. “ClpS Is an Essential Component of the N-End Rule Pathway in Escherichia Coli.” Nature 439 (7077) (February 9): 753–6. doi:10.1038/nature04412.
Essmann, Ulrich, Lalith Perera, Max L Berkowitz, Tom Darden, Hsing Lee, and Lee G Pedersen. 1995. “A Smooth Particle Mesh Ewald Method.” The Journal of Chemical Physics 103 (19).
Falnes, P O, and S Olsnes. 1998. “Modulation of the Intracellular Stability and Toxicity of Diphtheria Toxin through Degradation by the N-End Rule Pathway.” The EMBO Journal 17 (2) (January 15): 615–25. doi:10.1093/emboj/17.2.615.
Falnes, P O, R Welker, H G Kräusslich, and S Olsnes. 1999. “Toxins That Are Activated by HIV Type-1 Protease through Removal of a Signal for Degradation by the N-End-Rule Pathway.” Biochem. J. 343 (1) (October 1): 199–207.
Farrah, Terry, Eric W Deutsch, Michael R Hoopmann, Janice L Hallows, Zhi Sun, Chung-Ying Huang, and Robert L Moritz. 2013. “The State of the Human Proteome in 2012 as Viewed through PeptideAtlas.” Journal of Proteome Research 12 (1) (January 4): 162–71. doi:10.1021/pr301012j.
Farrah, Terry, Eric W Deutsch, Gilbert S Omenn, Zhi Sun, Julian D Watts, Tadashi Yamamoto, David Shteynberg, Micheleen M Harris, and Robert L Moritz. 2014. “State of the Human Proteome in 2013 as Viewed through PeptideAtlas: Comparing the Kidney, Urine, and Plasma Proteomes for the Biology- and Disease-Driven Human Proteome Project.” Journal of Proteome Research 13 (1) (January 3): 60–75. doi:10.1021/pr4010037.
Figueroa, Maximiliano, Nicolas Oliveira, Annabelle Lejeune, Kristian W Kaufmann, Brent M Dorr, André Matagne, Joseph A Martial, Jens Meiler, and Cécile Van de Weerdt. 2013. “Octarellin VI: Using Rosetta to Design a Putative Artificial (β/α)8 Protein.” PloS One 8 (8) (January): e71858. doi:10.1371/journal.pone.0071858.
Fleishman, Sarel J., Timothy A. Whitehead, Damian C. Ekiert, Cyrille Dreyfus, Jacob E. Corn, Eva-Maria Strauch, Ian A. Wilson, and David Baker. 2011. “Computational Design of Proteins Targeting the Conserved Stem Region of Influenza Hemagglutinin.” Science 332 (6031) (May 12): 816–821. doi:10.1126/science.1202617.
Fong, Tony. 2008. “ABI Plans to Shutter Edman Sequencing Business, But Some Customers Complain.”
Gerek, Z Nevin, and S Banu Ozkan. 2010. “A Flexible Docking Scheme to Explore the Binding Selectivity of PDZ Domains.” Protein Science: A Publication of the Protein Society 19 (5) (May): 914–28. doi:10.1002/pro.366.
Gouw, Joost W, Jeroen Krijgsveld, and Albert J R Heck. 2010. “Quantitative Proteomics by Metabolic Labeling of Model Organisms.” Molecular & Cellular Proteomics: MCP 9 (1) (January): 11–24. doi:10.1074/mcp.R900001-MCP200.
Guthals, Adrian, and Nuno Bandeira. 2012. “Peptide Identification by Tandem Mass Spectrometry with Alternate Fragmentation Modes.” Molecular & Cellular Proteomics: MCP 11 (9) (September): 550–7. doi:10.1074/mcp.R112.018556.
Gygi, S P, Y Rochon, B R Franza, and R Aebersold. 1999. “Correlation between Protein and mRNA Abundance in Yeast.” Molecular and Cellular Biology 19 (3) (March): 1720–30.
Harms, Michael J, and Joseph W Thornton. 2013. “Evolutionary Biochemistry: Revealing the Historical and Physical Causes of Protein Properties.” Nature Reviews. Genetics 14 (8) (August): 559–71. doi:10.1038/nrg3540.
Havranek, James J, Carlos M Duarte, and David Baker. 2004. “A Simple Physical Model for the Prediction and Design of Protein-DNA Interactions.” Journal of Molecular Biology 344 (1) (November 12): 59–70. doi:10.1016/j.jmb.2004.09.029.
Hernandez, Patricia, Markus Müller, and Ron D Appel. 2006. “Automated Protein Identification by Tandem Mass Spectrometry: Issues and Strategies.” Mass Spectrometry Reviews 25 (2): 235–54. doi:10.1002/mas.20068.
Hood, Leroy, and Charles Auffray. 2013. “Participatory Medicine: A Driving Force for Revolutionizing Healthcare.” Genome Medicine: 12–15.
Hwang, Cheol-Sang, Anna Shemorry, and Alexander Varshavsky. 2010. “N-Terminal Acetylation of Cellular Proteins Creates Specific Degradation Signals.” Science (New York, N.Y.) 327 (5968) (February 19): 973–7. doi:10.1126/science.1183147.
Jiang, Yanxialei, Subrata Kumar Pore, Jung Hoon Lee, Shashi Sriram, Binh Khanh Mai, Dong Hoon Han, Pritha Agarwalla, et al. 2013. “Characterization of Mammalian N-Degrons and Development of Heterovalent Inhibitors of the N-End Rule Pathway.” Chem. Sci. 4 (8): 3339–3346. doi:10.1039/C3SC51059J.
Karanicolas, John, Jacob E Corn, Irwin Chen, Lukasz A Joachimiak, Orly Dym, Sun H Peck, Shira Albeck, et al. 2011. “A de Novo Protein Binding Pair by Computational Design and Directed Evolution.” Molecular Cell 42 (2) (April 22): 250–60. doi:10.1016/j.molcel.2011.03.010.
Kass, Itamar, and Amnon Horovitz. 2002. “Mapping Pathways of Allosteric Communication in GroEL by Analysis of Correlated Mutations.” Proteins 48 (4) (September 1): 611–7. doi:10.1002/prot.10180.
Kaufmann, Kristian W, Gordon H Lemmon, Samuel L Deluca, Jonathan H Sheehan, and Jens Meiler. 2010.
“Practically Useful: What the Rosetta Protein Modeling Suite Can Do for You.” Biochemistry 49 (14) (April 13): 2987–98. doi:10.1021/bi902153g.
Kaufmann, Kristian W, and Jens Meiler. 2012. “Using RosettaLigand for Small Molecule Docking into Comparative Models.” PloS One 7 (12) (January): e50769. doi:10.1371/journal.pone.0050769.
Kay, Richard, Chris Barton, Lucy Ratcliffe, Balwir Matharoo-Ball, Pamela Brown, Jane Roberts, Phil Teale, and Colin Creaser. 2008. “Enrichment of Low Molecular Weight Serum Proteins Using Acetonitrile Precipitation for Mass Spectrometry Based Proteomic Analysis.” Rapid Communications in Mass Spectrometry: RCM 22 (20) (October): 3255–60. doi:10.1002/rcm.3729.
Keck, Jamie M, Michele H Jones, Catherine C L Wong, Jonathan Binkley, Daici Chen, Sue L Jaspersen, Eric P Holinger, et al. 2011. “A Cell Cycle Phosphoproteome of the Yeast Centrosome.” Science 332 (6037): 1557–1561.
Kim, Heon-Ki, Ryu-Ryun Kim, Jang-Hyun Oh, Hanna Cho, Alexander Varshavsky, and Cheol-Sang Hwang. 2013. “The N-Terminal Methionine of Cellular Proteins as a Degradation Signal.” Cell 156 (1) (December 18): 158–169. doi:10.1016/j.cell.2013.11.031.
Kim, Woong, Eric J Bennett, Edward L Huttlin, Ailan Guo, Jing Li, Anthony Possemato, Mathew E Sowa, et al. 2011. “Systematic and Quantitative Assessment of the Ubiquitin-Modified Proteome.” Molecular Cell 44 (2): 325–340.
King, Neil P, William Sheffler, Michael R Sawaya, Breanna S Vollmar, John P Sumida, Ingemar André, Tamir Gonen, Todd O Yeates, and David Baker. 2012. “Computational Design of Self-Assembling Protein Nanomaterials with Atomic Level Accuracy.” Science (New York, N.Y.) 336 (6085) (June 1): 1171–4. doi:10.1126/science.1219364.
Kitamura, Kenji, and Hidenobu Fujiwara. 2013. “The Type-2 N-End Rule Peptide Recognition Activity of Ubr11 Ubiquitin Ligase Is Required for the Expression of Peptide Transporters.” FEBS Letters 587 (2) (January 16): 214–9. doi:10.1016/j.febslet.2012.11.028.
Kortemme, T, D E Kim, and D Baker.
2004. “Computational Alanine Scanning of Protein-Protein Interfaces.” Science’s STKE: Signal Transduction Knowledge Environment 2004 (219) (February): pl2.
Kortemme, T, A V Morozov, and D Baker. 2003. “An Orientation-Dependent Hydrogen Bonding Potential Improves Prediction of Specificity and Structure for Proteins and Protein-Protein Complexes.” Journal of Molecular Biology 326 (4) (February): 1239–1259.
Lane, Lydie, Amos Bairoch, Ronald C Beavis, Eric W Deutsch, Pascale Gaudet, Emma Lundberg, and Gilbert S Omenn. 2014. “Metrics for the Human Proteome Project 2013-2014 and Strategies for Finding Missing Proteins.” Journal of Proteome Research 13 (1) (January 3): 15–20. doi:10.1021/pr401144x.
Lange, Philipp F, Pitter F Huesgen, Karen Nguyen, and Christopher M Overall. 2014. “Annotating N Termini for the Human Proteome Project: N Termini and Nα-Acetylation Status Differentiate Stable Cleaved Protein Species from Degradation Remnants in the Human Erythrocyte Proteome.” Journal of Proteome Research (March 10). doi:10.1021/pr401191w.
Leaver-Fay, Andrew, Ron Jacak, P Benjamin Stranges, and Brian Kuhlman. 2011. “A Generic Program for Multistate Protein Design.” PloS One 6 (7) (January): e20937. doi:10.1371/journal.pone.0020937.
Leaver-Fay, Andrew, Matthew J O’Meara, Mike Tyka, Ron Jacak, Yifan Song, Elizabeth H Kellogg, James Thompson, et al. 2013. “Scientific Benchmarks for Guiding Macromolecular Energy Function Improvement.” Methods in Enzymology 523 (January): 109–43. doi:10.1016/B978-0-12-394292-0.00006-0.
Leaver-Fay, Andrew, Michael Tyka, Steven M Lewis, Oliver F Lange, James Thompson, Ron Jacak, Kristian Kaufman, et al. 2011. “ROSETTA3: An Object-Oriented Software Suite for the Simulation and Design of Macromolecules.” Methods in Enzymology 487 (January): 545–74. doi:10.1016/B978-0-12-381270-4.00019-6.
Legrain, Pierre, Ruedi Aebersold, Alexander Archakov, Amos Bairoch, Kumar Bala, Laura Beretta, John Bergeron, et al. 2011.
“The Human Proteome Project: Current State and Future Direction.” Molecular & Cellular Proteomics: MCP 10 (7) (July): M111.009993. doi:10.1074/mcp.M111.009993.
Lemmon, Gordon, and Jens Meiler. 2012. “Rosetta Ligand Docking with Flexible XML Protocols.” Methods in Molecular Biology (Clifton, N.J.) 819 (January): 143–55. doi:10.1007/978-1-61779-465-0_10.
Liu, Long, Zhuangmei Deng, Haiquan Yang, Jianghua Li, Hyun-Dong Shin, Rachel R Chen, Guocheng Du, and Jian Chen. 2014. “In Silico Rational Design and Systems Engineering of Disulfide Bridges in the Catalytic Domain of an Alkaline α-Amylase from Alkalimonas Amylolytica To Improve Thermostability.” Applied and Environmental Microbiology 80 (3) (February): 798–807. doi:10.1128/AEM.03045-13.
Liu, Yi, and Brian Kuhlman. 2006. “RosettaDesign Server for Protein Design.” Nucleic Acids Research 34 (Web Server issue) (July 1): W235–8. doi:10.1093/nar/gkl163.
London, Nir, Corissa L Lamphear, James L Hougland, Carol A Fierke, and Ora Schueler-Furman. 2011. “Identification of a Novel Class of Farnesylation Targets by Structure-Based Modeling of Binding Specificity.” PLoS Comput Biol 7 (10).
London, Nir, Dana Movshovitz-Attias, and Ora Schueler-Furman. 2010. “The Structural Basis of Peptide-Protein Binding Strategies.” Structure (London, England: 1993) 18 (2) (February 10): 188–99. doi:10.1016/j.str.2009.11.012.
Lupas, Andrei N, and Kristin K Koretke. 2003. “Bioinformatic Analysis of ClpS, a Protein Module Involved in Prokaryotic and Eukaryotic Protein Degradation.” Journal of Structural Biology 141 (1) (January): 77–83.
MacCoss, Michael J, W Hayes McDonald, Anita Saraf, Rovshan Sadygov, Judy M Clark, Joseph J Tasto, Kathleen L Gould, et al. 2002. “Shotgun Identification of Protein Modifications from Protein Complexes and Lens Tissue.” Proceedings of the National Academy of Sciences of the United States of America 99 (12) (June 11): 7900–5. doi:10.1073/pnas.122231399.
MacKerell, A. D., D. Bashford, R. L. Dunbrack, J. D.
Evanseck, M. J. Field, S. Fischer, J. Gao, et al. 1998. “All-Atom Empirical Potential for Molecular Modeling and Dynamics Studies of Proteins.” The Journal of Physical Chemistry B 102 (18) (April): 3586–3616. doi:10.1021/jp973084f.
Marsaglia, George, Wai Wan Tsang, and Jingbo Wang. 2003. “Evaluating Kolmogorov’s Distribution.” Journal of Statistical Software 8 (18): 1–4.
Matta-Camacho, Edna, Guennadi Kozlov, Flora F Li, and Kalle Gehring. 2010. “Structural Basis of Substrate Recognition and Specificity in the N-End Rule Pathway.” Nature Structural & Molecular Biology 17 (10) (October): 1182–7. doi:10.1038/nsmb.1894.
Metzker, Michael L. 2010. “Sequencing Technologies - the next Generation.” Nature Reviews. Genetics 11 (1) (January): 31–46. doi:10.1038/nrg2626.
Michaud-Agrawal, Naveen, Elizabeth J Denning, Thomas B Woolf, and Oliver Beckstein. 2011. “MDAnalysis: A Toolkit for the Analysis of Molecular Dynamics Simulations.” Journal of Computational Chemistry (April 15). doi:10.1002/jcc.21787.
Mogk, A, R Schmidt, and B Bukau. 2007. “The N-End Rule Pathway for Regulated Proteolysis: Prokaryotic and Eukaryotic Strategies.” Trends in Cell Biology 17 (4) (April): 165–172.
Morin, Andrew, Kristian W Kaufmann, Carie Fortenberry, Joel M Harp, Laura S Mizoue, and Jens Meiler. 2011. “Computational Design of an Endo-1,4-Beta-Xylanase Ligand Binding Site.” Protein Engineering, Design & Selection: PEDS 24 (6) (June): 503–16. doi:10.1093/protein/gzr006.
Morin, Andrew, Jens Meiler, and Laura S Mizoue. 2011. “Computational Design of Protein-Ligand Interfaces: Potential in Therapeutic Development.” Trends in Biotechnology 29 (4) (April): 159–66. doi:10.1016/j.tibtech.2011.01.002.
Morozova, O, and M A Marra. 2008. “Applications of next-Generation Sequencing Technologies in Functional Genomics.” Genomics 92 (5) (November): 255–264.
Morris, Garrett M, Ruth Huey, William Lindstrom, Michel F Sanner, Richard K Belew, David S Goodsell, and Arthur J Olson. 2009.
“AutoDock4 and AutoDockTools4: Automated Docking with Selective Receptor Flexibility.” Journal of Computational Chemistry 30 (16) (December): 2785–91. doi:10.1002/jcc.21256.
Mortz, Ejvind, Thanh Ha Nguyen, and Thomas Kofoed. 2013. “Applications of N-Terminal Edman Sequencing.”
Nesvizhskii, Alexey I. 2010. “A Survey of Computational Methods and Error Rate Estimation Procedures for Peptide and Protein Identification in Shotgun Proteomics.” Journal of Proteomics 73 (11) (October 10): 2092–123. doi:10.1016/j.jprot.2010.08.009.
Niall, H D. 1973. “Automated Edman Degradation: The Protein Sequenator.” Meth. Enzymol. 27: 942–1010.
O’Boyle, Noel M, Michael Banck, Craig A James, Chris Morley, Tim Vandermeersch, and Geoffrey R Hutchison. 2011. “Open Babel: An Open Chemical Toolbox.” Journal of Cheminformatics 3 (1) (January): 33. doi:10.1186/1758-2946-3-33.
Omenn, Gilbert S. 2013. “The Strategy, Organization, and Progress of the HUPO Human Proteome Project.” Journal of Proteomics (October 19). doi:10.1016/j.jprot.2013.10.012.
Ong, Shao-En, and Matthias Mann. 2006. “A Practical Recipe for Stable Isotope Labeling by Amino Acids in Cell Culture (SILAC).” Nature Protocols 1 (6) (January): 2650–60. doi:10.1038/nprot.2006.427.
Oprea, Tudor I, Elebeoba E May, Andrei Leitão, and Alexander Tropsha. 2011. “Computational Systems Chemical Biology.” Methods in Molecular Biology (Clifton, N.J.) 672 (January): 459–88. doi:10.1007/978-1-60761-839-3_18.
Oprea, Tudor I, Alexander Tropsha, Jean-Loup Faulon, and Mark D Rintoul. 2007. “Systems Chemical Biology.” Nature Chemical Biology 3 (8) (August): 447–50. doi:10.1038/nchembio0807-447.
Oyler, G A, and Y C Tsai. 2013. “N-End Rule Protease Activity Indication Methods and Uses Thereof.” Google Patents.
Phillips, James C, Rosemary Braun, Wei Wang, James Gumbart, Emad Tajkhorshid, Elizabeth Villa, Christophe Chipot, Robert D Skeel, Laxmikant Kalé, and Klaus Schulten. 2005.
“Scalable Molecular Dynamics with NAMD.” Journal of Computational Chemistry 26: 1781–1802. doi:10.1002/jcc.20289.
Pore, SK, and R Banerjee. 2013. “The N-End Rule Pathway: Its Physiological Importance and Role in Disease Pathology.” International Journal of Pharmaceutical Sciences Review and Research 23 (1): 266–274.
R Core Team. 2013. “R: A Language and Environment for Statistical Computing.” Vienna, Austria.
Raveh, Barak, Nir London, and Ora Schueler-Furman. 2010. “Sub-Angstrom Modeling of Complexes between Flexible Peptides and Globular Proteins.” Proteins 78 (9) (July): 2029–40. doi:10.1002/prot.22716.
Raveh, Barak, Nir London, Lior Zimmerman, and Ora Schueler-Furman. 2011. “Rosetta FlexPepDock Ab-Initio: Simultaneous Folding, Docking and Refinement of Peptides onto Their Receptors.” PloS One 6 (4) (January): e18934. doi:10.1371/journal.pone.0018934.
Risso, Valeria A, Jose A Gavira, Diego F Mejia-Carmona, Eric A Gaucher, and Jose M Sanchez-Ruiz. 2013. “Hyperstability and Substrate Promiscuity in Laboratory Resurrections of Precambrian β-Lactamases.” Journal of the American Chemical Society 135 (8) (February 27): 2899–902. doi:10.1021/ja311630a.
Rockah-Shmuel, Liat, and Dan S Tawfik. 2012. “Evolutionary Transitions to New DNA Methyltransferases through Target Site Expansion and Shrinkage.” Nucleic Acids Research 40 (22) (December): 11627–37. doi:10.1093/nar/gks944.
Rohl, Carol A, Charlie E M Strauss, Kira M S Misura, and David Baker. 2004. “Protein Structure Prediction Using Rosetta.” In Numerical Computer Methods, Part D, edited by Ludwig Brand and Michael L Johnson, 383:66–93. Academic Press. doi:10.1016/S0076-6879(04)83004-0.
Roman-Hernandez, G, R A Grant, R T Sauer, and T A Baker. 2009. “Molecular Basis of Substrate Selection by the N-End Rule Adaptor Protein ClpS.” Proceedings of the National Academy of Sciences of the United States of America 106 (22) (June): 8888–8893.
Roman-Hernandez, Giselle, Jennifer Y Hou, Robert A Grant, Robert T Sauer, and Tania A Baker. 2011. “The ClpS Adaptor Mediates Staged Delivery of N-End Rule Substrates to the AAA+ ClpAP Protease.” Molecular Cell 43 (2): 217–228.
Rust, Michael J, Mark Bates, and Xiaowei Zhuang. 2006. “Sub-Diffraction-Limit Imaging by Stochastic Optical Reconstruction Microscopy (STORM).” Nature Methods 3 (10) (October): 793–5. doi:10.1038/nmeth929.
Sargeant, David P, Michael R Gryk, Mark W Maciejewski, Vishal Thapar, Vamsi Kundeti, Sanguthevar Rajasekaran, Pedro Romero, et al. 2012. “Secondary Structure, a Missing Component of Sequence-Based Minimotif Definitions.” PloS One 7 (12) (January): e49957. doi:10.1371/journal.pone.0049957.
Schlosser, Andreas, Jens T Vanselow, and Achim Kramer. 2005. “Mapping of Phosphorylation Sites by a Multi-Protease Approach with Specific Phosphopeptide Enrichment and NanoLC-MS/MS Analysis.” Analytical Chemistry 77 (16) (August 15): 5243–50. doi:10.1021/ac050232m.
Schreiber, Gideon, and Amy E Keating. 2011. “Protein Binding Specificity versus Promiscuity.” Current Opinion in Structural Biology 21 (1) (February): 50–61. doi:10.1016/j.sbi.2010.10.002.
Schreiber, Stuart L. 2005. “Small Molecules: The Missing Link in the Central Dogma.” Nature Chemical Biology 1 (2) (July): 64–6. doi:10.1038/nchembio0705-64.
Schrödinger, LLC. 2010. “The PyMOL Molecular Graphics System, Version 1.3r1.”
Schuenemann, Verena J, Stephanie M Kralik, Reinhard Albrecht, Sukhdeep K Spall, Kaye N Truscott, David A Dougan, and Kornelius Zeth. 2009. “Structural Basis of N-End Rule Substrate Recognition in Escherichia Coli by the ClpAP Adaptor Protein ClpS.” EMBO Reports 10 (5): 508–514.
Schulman, B A, A C Carrano, P D Jeffrey, Z Bowen, E R Kinnucan, M S Finnin, S J Elledge, J W Harper, M Pagano, and N P Pavletich. 2000. “Insights into SCF Ubiquitin Ligases from the Structure of the Skp1-Skp2 Complex.” Nature 408 (6810) (November 16): 381–6. doi:10.1038/35042620.
Silva, André M N, Rui Vitorino, M Rosário M Domingues, Corinne M Spickett, and Pedro Domingues. 2013. “Post-Translational Modifications and Mass Spectrometry Detection.” Free Radical Biology & Medicine 65 (December): 925–41. doi:10.1016/j.freeradbiomed.2013.08.184.
Sriram, Shashi, Jung Hoon Lee, Binh Khanh Mai, Yanxialei Jiang, Yongho Kim, Young Dong Yoo, Rajkumar Banerjee, Seung-Han Lee, and Min Jae Lee. 2013. “Development and Characterization of Monomeric N-End Rule Inhibitors through in Vitro Model Substrates.” Journal of Medicinal Chemistry 56 (6) (March 28): 2540–6. doi:10.1021/jm400046q.
Sriram, Shashikanth M, and Yong Tae Kwon. 2010. “The Molecular Principles of N-End Rule Recognition.” Nature Structural & Molecular Biology 17 (10) (October): 1164–5. doi:10.1038/nsmb1010-1164.
Strauch, Eva-Maria, Sarel J Fleishman, and David Baker. 2014. “Computational Design of a pH-Sensitive IgG Binding Protein.” Proceedings of the National Academy of Sciences of the United States of America 111 (2) (January 14): 675–80. doi:10.1073/pnas.1313605111.
Studer, Romain A, Benoit H Dessailly, and Christine A Orengo. 2013. “Residue Mutations and Their Impact on Protein Structure and Function: Detecting Beneficial and Pathogenic Changes.” The Biochemical Journal 449 (3) (February 1): 581–94. doi:10.1042/BJ20121221.
Swaney, Danielle L, Craig D Wenger, and Joshua J Coon. 2010. “Value of Using Multiple Proteases for Large-Scale Mass Spectrometry-Based Proteomics.” Journal of Proteome Research 9 (3) (March 5): 1323–9. doi:10.1021/pr900863u.
Tasaki, Takafumi, Shashikanth M Sriram, Kyong Soo Park, and Yong Tae Kwon. 2012. “The N-End Rule Pathway.” Annual Review of Biochemistry 81 (January): 261–89. doi:10.1146/annurev-biochem-051710-093308.
Tasaki, Takafumi, Adriana Zakrzewska, Drew D Dudgeon, Yonghua Jiang, John S Lazo, and Yong Tae Kwon. 2009. “The Substrate Recognition Domains of the N-End Rule Pathway.” The Journal of Biological Chemistry 284 (3) (January 16): 1884–95.
doi:10.1074/jbc.M803641200.
Tcherniuk, Sergey O, Jadwiga Chroboczek, and Maxim Y Balakirev. 2005. “Construction of Tumor-Specific Toxins Using Ubiquitin Fusion Technique.” Molecular Therapy: The Journal of the American Society of Gene Therapy 11 (2) (February): 196–204. doi:10.1016/j.ymthe.2004.10.009.
Thornton, Joseph W. 2004. “Resurrecting Ancient Genes: Experimental Analysis of Extinct Molecules.” Nature Reviews. Genetics 5 (5) (May): 366–75. doi:10.1038/nrg1324.
Tinberg, Christine E, Sagar D Khare, Jiayi Dou, Lindsey Doyle, Jorgen W Nelson, Alberto Schena, Wojciech Jankowski, et al. 2013. “Computational Design of Ligand-Binding Proteins with High Affinity and Selectivity.” Nature 501 (7466) (September 12): 212–6. doi:10.1038/nature12443.
Wang, Bin, Rainer Malik, Erich A Nigg, and Roman Körner. 2008. “Evaluation of the Low-Specificity Protease Elastase for Large-Scale Phosphoproteome Analysis.” Analytical Chemistry 80 (24) (December 15): 9526–33. doi:10.1021/ac801708p.
Wang, K H, G Roman-Hernandez, R A Grant, R T Sauer, and T A Baker. 2008. “The Molecular Basis of N-End Rule Recognition.” Molecular Cell 32 (3) (November): 406–414.
Wang, Kevin H, Elizabeth S C Oakes, Robert T Sauer, and Tania A Baker. 2008. “Tuning the Strength of a Bacterial N-End Rule Degradation Signal.” The Journal of Biological Chemistry 283 (36) (September 5): 24600–7. doi:10.1074/jbc.M802213200.
Wasinger, Valerie C, Ming Zeng, and Yunki Yau. 2013. “Current Status and Advances in Quantitative Proteomic Mass Spectrometry.” International Journal of Proteomics 2013 (January): 180605. doi:10.1155/2013/180605.
Webb, S E, L Zanetti-Domingues, B C Coles, D J Rolfe, R J Wareham, and M L Martin-Fernandez. 2012. “Multicolour Single Molecule Imaging on Cells Using a Supercontinuum Source.” Biomed Opt Express 3 (3) (March): 400–406.
Welch, B L. 1947. “The Generalisation of Student’s Problems When Several Different Population Variances Are Involved.” Biometrika 34 (1-2) (January): 28–35.
Weston, Andrea D, and Leroy Hood. 2004. “Systems Biology, Proteomics, and the Future of Health Care: Toward Predictive, Preventative, and Personalized Medicine.” Journal of Proteome Research 3 (2): 179–96.
Wijma, Hein J, Robert J Floor, Peter A Jekel, David Baker, Siewert J Marrink, and Dick B Janssen. 2014. “Computationally Designed Libraries for Rapid Enzyme Stabilization.” Protein Engineering, Design & Selection: PEDS 27 (2) (February): 49–58. doi:10.1093/protein/gzt061.
Xia, Zanxian, Ailsa Webster, Fangyong Du, Konstantin Piatkov, Michel Ghislain, and Alexander Varshavsky. 2008. “Substrate-Binding Sites of UBR1, the Ubiquitin Ligase of the N-End Rule Pathway.” The Journal of Biological Chemistry 283 (35) (August 29): 24011–28. doi:10.1074/jbc.M802583200.
Zanghellini, Alexandre, Lin Jiang, Andrew M Wollacott, Gong Cheng, Jens Meiler, Eric A Althoff, Daniela Röthlisberger, and David Baker. 2006. “New Algorithms and an in Silico Benchmark for Computational Enzyme Design.” Protein Science: A Publication of the Protein Society 15 (12) (December): 2785–94. doi:10.1110/ps.062353106.
Zhou, Pengbo. 2005. “Targeted Protein Degradation.” Current Opinion in Chemical Biology 9 (1) (March): 51–5. doi:10.1016/j.cbpa.2004.10.012.
Zhuang, Xiaowei. 2009. “Nano-Imaging with Storm.” Nature Photonics 3 (7) (January): 365–367. doi:10.1038/nphoton.2009.101.

Appendices

Appendix A - AutoDock GA-LS parameters

autodock_parameter_version 4.2 # used by autodock to validate parameter set
outlev 1 # diagnostic output level
intelec # calculate internal electrostatics
seed pid time # seeds for random generator
ligand_types C HD N OA # atoms types in ligand
fld receptor.maps.fld # grid_data_file
map receptor.C.map # atom-specific affinity map
map receptor.HD.map # atom-specific affinity map
map receptor.N.map # atom-specific affinity map
map receptor.OA.map # atom-specific affinity map
elecmap receptor.e.map # electrostatics map
desolvmap receptor.d.map # desolvation map
move ligand.pdbqt # small molecule
about 19.9893 -17.8637 29.9874 # small molecule center
tran0 random # initial coordinates/A or random
quaternion0 random # initial orientation
dihe0 random # initial dihedrals (relative) or random
tstep 2.0 # translation step/A
qstep 50.0 # quaternion step/deg
dstep 50.0 # torsion step/deg
torsdof 6 # torsional degrees of freedom
rmstol 0.0 # cluster_tolerance/A
extnrg 1000.0 # external grid energy
e0max 0.0 10000 # max initial energy; max number of retries
ga_pop_size 150 # number of individuals in population
ga_num_evals 5000000 # maximum number of energy evaluations
ga_num_generations 27000 # maximum number of generations
ga_elitism 1 # number of top individuals to survive to next generation
ga_mutation_rate 0.02 # rate of gene mutation
ga_crossover_rate 0.8 # rate of crossover
ga_window_size 10 #
ga_cauchy_alpha 0.0 # Alpha parameter of Cauchy distribution
ga_cauchy_beta 1.0 # Beta parameter Cauchy distribution
set_ga # set the above parameters for GA or LGA
sw_max_its 300 # iterations of Solis & Wets local search
sw_max_succ 4 # consecutive successes before changing rho
sw_max_fail 4 # consecutive failures before changing rho
sw_rho 1.0 # size of local search space to sample
sw_lb_rho 0.01 # lower bound on rho
ls_search_freq 0.06 # probability of performing local search on individual
set_psw1 # set the above
pseudo-Solis & Wets parameters unbound_model extended # state of unbound ligand ga_run 100 # do this many hybrid GA-LS runs analysis # perform a ranked cluster analysis 88 Appendix B - DNA Sequences B.1 pET28a-His6-SUMO-GFP(F64L/S65T/F99S/M153T)-TEV-ClpS atgggcagcagccatcatcatcatcatcacagcagcggcctggtgccgcgcggcagccatatgagcgatagcgaagttaaccaagaagcaaaaccggaagttaaacctgaagtgaaaccggaaacccatattaacctgaaagttagtgatggcagcagcgagatcttctttaaaatcaaaaaaaccacaccgctgcgtcgtctgatggaagcatttgcaaaacgtcagggtaaagaaatggatagcctgcgttttctgtatgatggtattcgtattcaggcagatcagacaccggaagatctggatatggatgataacgatattatcgaagcacatcgtgaacagaccggtggtatgagtaaaggagaagaacttttcactggagttgtcccaattcttgttgaattagatggtgatgttaatgggcacaaattttctgtcagtggagagggtgaaggtgatgcaacatacggaaaacttacccttaaatttatttgcactactggaaaactacctgttccatggccaacacttgtcactactctgacgtatggtgttcaatgcttttcccgttatccggatcacatgaaacggcatgactttttcaagagtgccatgcccgaaggttatgtacaggaacgcactatatctttcaaagatgacgggaactacaagacgcgtgctgaagtcaagtttgaaggtgatacccttgttaatcgtatcgagttaaaaggtattgattttaaagaagatggaaacattctcggacacaaactcgagtacaactataactcacacaatgtatacatcacggcagacaaacaaaagaatggaatcaaagttaacttcaaaattcgccacaacattgaagatggatccgttcaactagcagaccattatcaacaaaatactccaattggcgatggccctgtccttttaccagacaaccattacctgtcgacacaatctgccctttcgaaagatcccaacgaaaagcgtgaccacatggtccttcttgagtttgtaactgctgctgggattacacatggcatggatgagctctacaaaaccgaaaatctgtattttcagggcacccagaaaccaagcttgtatcgtgttctgattctgaatgatgattataccccgatggaattcgttgtgtatgttctcgagcgcttctttaacaaaagccgtgaagatgcaacccgtattatgctgcatgttcatcagaacggtgttggtgtttgtggcgtgtatacctatgaagttgcagaaaccaaagttgcccaggttattgatagcgcacgacgtcatcagcatccgctgcagtgtaccatggaaaaagattaagagctccgtcgaccagcttgcggccgcactcgagcaccaccaccaccaccactgagatccggctgctaacaaagcccgaaaggaagctgagttggctgctgccaccgctgagcaataactagcataaccccttggggcctctaaacgggtcttgaggggttttttgctgaaaggaggaactatatccggattggcgaatgggacgcgccctgtagcggcgcattaagcgcggcgggtgtggtggttacgcgcagcgtgaccgctacacttgccagcgccctagcgcccgctcctttcgctttcttcccttcctttctcgccacgttcgccggctttccccgtcaagctctaaatcgggggctccctttagggttccgatttagtgctttacggca
cctcgaccccaaaaaacttgattagggtgatggttcacgtagtgggccatcgccctgatagacggtttttcgccctttgacgttggagtccacgttctttaatagtggactcttgttccaaactggaacaacactcaaccctatctcggtctattcttttgatttataagggattttgccgatttcggcctattggttaaaaaatgagctgatttaacaaaaatttaacgcgaattttaacaaaatattaacgcttacaatttaggtggcacttttcggggaaatgtgcgcggaacccctatttgtttatttttctaaatacattcaaatatgtatccgctcatgaattaattcttagaaaaactcatcgagcatcaaatgaaactgcaatttattcatatcaggattatcaataccatatttttgaaaaagccgtttctgtaatgaaggagaaaactcaccgaggcagttccataggatggcaagatcctggtatcggtctgcgattccgactcgtccaacatcaatacaacctattaatttcccctcgtcaaaaataaggttatcaagtgagaaatcaccatgagtgacgactgaatccggtgagaatggcaaaagtttatgcatttctttccagacttgttcaacaggccagccattacgctcgtcatcaaaatcactcgcatcaaccaaaccgttattcattcgtgattgcgcctgagcgagacgaaatacgcgatcgctgttaaaaggacaattacaaacaggaatcgaatgcaaccggcgcaggaacactgccagcgcatcaacaatattttcacctgaatcaggatattcttctaatacctggaatgctgttttcccggggatcgcagtggtgagtaaccatgcatcatcaggagtacggataaaatgcttgatggtcggaagaggcataaattccgtcagccagtttagtctgaccatctcatctgtaacatcattggcaacgctacctttgccatgtttcagaaacaactctggcgcatcgggcttcccatacaatcgatagattgtcgcacctgattgcccgacattatcgcgagcccatttatacccatataaatcagcatccatgttggaatttaatcgcggcctagagcaagacgtttcccgttgaatatggctcataacaccccttgtattactgtttatgtaagcagacagttttattgttcatgaccaaaatcccttaacgtgagttttcgttccactgagcgtcagaccccgtagaaaagatcaaaggatcttcttgagatcctttttttctgcgcgtaatctgctgcttgcaaacaaaaaaaccaccgctaccagcggtggtttgtttgccggatcaagagctaccaactctttttccgaaggtaactggcttcagcagagcgcagataccaaatactgtccttctagtgtagccgtagttaggccaccacttcaagaactctgtagcaccgcctacatacctcgctctgctaatcctgttaccagtggctgctgccagtggcgataagtcgtgtcttaccgggttggactcaagacgatagttaccggataaggcgcagcggtcgggctgaacggggggttcgtgcacacagcccagcttggagcgaacgacctacaccgaactgagatacctacagcgtgagctatgagaaagcgccacgcttcccgaagggagaaaggcggacaggtatccggtaagcggcagggtcggaacaggagagcgcacgagggagcttccagggggaaacgcctggtatctttatagtcctgtcgggtttcgccacctctgacttgagcgtcgatttttgtgatgctcgtcaggggggcggagcctatggaaaaacgccagcaacgcggcctttttacggttcctggccttttgctggccttttgctcacatgttctttcctgcgttatcccctgattctgtggataaccgtattaccgcctttgagtgagct
gataccgctcgccgcagccgaacgaccgagcgcagcgagtcagtgagcgaggaagcggaagagcgcctgatgcggtattttctccttacgcatctgt 89 gcggtatttcacaccgcatatatggtgcactctcagtacaatctgctctgatgccgcatagttaagccagtatacactccgctatcgctacgtgactgggtcatggctgcgccccgacacccgccaacacccgctgacgcgccctgacgggcttgtctgctcccggcatccgcttacagacaagctgtgaccgtctccgggagctgcatgtgtcagaggttttcaccgtcatcaccgaaacgcgcgaggcagctgcggtaaagctcatcagcgtggtcgtgaagcgattcacagatgtctgcctgttcatccgcgtccagctcgttgagtttctccagaagcgttaatgtctggcttctgataaagcgggccatgttaagggcggttttttcctgtttggtcactgatgcctccgtgtaagggggatttctgttcatgggggtaatgataccgatgaaacgagagaggatgctcacgatacgggttactgatgatgaacatgcccggttactggaacgttgtgagggtaaacaactggcggtatggatgcggcgggaccagagaaaaatcactcagggtcaatgccagcgcttcgttaatacagatgtaggtgttccacagggtagccagcagcatcctgcgatgcagatccggaacataatggtgcagggcgctgacttccgcgtttccagactttacgaaacacggaaaccgaagaccattcatgttgttgctcaggtcgcagacgttttgcagcagcagtcgcttcacgttcgctcgcgtatcggtgattcattctgctaaccagtaaggcaaccccgccagcctagccgggtcctcaacgacaggagcacgatcatgcgcacccgtggggccgccatgccggcgataatggcctgcttctcgccgaaacgtttggtggcgggaccagtgacgaaggcttgagcgagggcgtgcaagattccgaataccgcaagcgacaggccgatcatcgtcgcgctccagcgaaagcggtcctcgccgaaaatgacccagagcgctgccggcacctgtcctacgagttgcatgataaagaagacagtcataagtgcggcgacgatagtcatgccccgcgcccaccggaaggagctgactgggttgaaggctctcaagggcatcggtcgagatcccggtgcctaatgagtgagctaacttacattaattgcgttgcgctcactgcccgctttccagtcgggaaacctgtcgtgccagctgcattaatgaatcggccaacgcgcggggagaggcggtttgcgtattgggcgccagggtggtttttcttttcaccagtgagacgggcaacagctgattgcccttcaccgcctggccctgagagagttgcagcaagcggtccacgctggtttgccccagcaggcgaaaatcctgtttgatggtggttaacggcgggatataacatgagctgtcttcggtatcgtcgtatcccactaccgagatatccgcaccaacgcgcagcccggactcggtaatggcgcgcattgcgcccagcgccatctgatcgttggcaaccagcatcgcagtgggaacgatgccctcattcagcatttgcatggtttgttgaaaaccggacatggcactccagtcgccttcccgttccgctatcggctgaatttgattgcgagtgagatatttatgccagccagccagacgcagacgcgccgagacagaacttaatgggcccgctaacagcgcgatttgctggtgacccaatgcgaccagatgctccacgcccagtcgcgtaccgtcttcatgggagaaaataatactgttgatgggtgtctggtcagagacatcaagaaataacgccggaacattagtgcaggcagcttccacagcaatggc
atcctggtcatccagcggatagttaatgatcagcccactgacgcgttgcgcgagaagattgtgcaccgccgctttacaggcttcgacgccgcttcgttctaccatcgacaccaccacgctggcacccagttgatcggcgcgagatttaatcgccgcgacaatttgcgacggcgcgtgcagggccagactggaggtggcaacgccaatcagcaacgactgtttgcccgccagttgttgtgccacgcggttgggaatgtaattcagctccgccatcgccgcttccactttttcccgcgttttcgcagaaacgtggctggcctggttcaccacgcgggaaacggtctgataagagacaccggcatactctgcgacatcgtataacgttactggtttcacattcaccaccctgaattgactctcttccgggcgctatcatgccataccgcgaaaggttttgcgccattcgatggtgtccgggatctcgacgctctcccttatgcgactcctgcattaggaagcagcccagtagtaggttgaggccgttgagcaccgccgccgcaaggaatggtgcatgcaaggagatggcgcccaacagtcccccggccacggggcctgccaccatacccacgccgaaacaagcgctcatgagcccgaagtggcgagcccgatcttccccatcggtgatgtcggcgatataggcgccagcaaccgcacctgtggcgccggtgatgccggccacgatgcgtccggcgtagaggatcgagatctcgatcccgcgaaattaatacgactcactataggggaattgtgagcggataacaattcccctctagaaataattttgtttaactttaagaaggagatatacc

B.2 Mutagenesis Primers

> q36k_f
ctgtattttcagggcaccaagaaaccaagc
> y58h_f
gaattcgttgtgcatgttctcgagcgcttc
> n65q_f
gcttctttcagaaaagccgtgaagatgcaac
> n81h_f
gcatgttcatcagcatggtgttggtgtttg
> n81r_f
gcatgttcatcagagaggtgttggtgtttg
> v85e_f
ggtgttggtgaatgtggcgtgtatacctatg
> y89f_f
gttggtgtttgtggcgtgttcacctatgaag
> a99s_f
gaaaccaaagtttcccaggttattgatagcgc
> a99t_f
gaaaccaaagttacccaggttattgatagcgc
> i102m_f
caaagttgcccaggttatggatagcgcac
> s104f_f
caggttattgatttcgcacgacgtcatcag
> s104l_f
caggttattgatttggcacgacgtcatcag
> r107q_f
gttattgatagcgcacgacagcatcagcatc
> t115v_f
gcatccgctgcagtgtgtcatggaaaaag

Appendix C - RosettaScripts (cst.ndes.xml)

C.1 Protocol

<HbondsToResidue name=hb partners=1 energy_cutoff=-0.5 backbone=0 bb_bb=0 sidechain=1 res_num=81/>
hbonds to q1

C.2 Hydrogen Bond Restraint (amide_hb.cst)

#nd1 h79 -- n y1
# 2.9 A
AtomPair ND1 40 N 81 HARMONIC 2.9 0.001
#n y1 -- nd1 h79 -- ce1 h79 | 109.8 deg | 110.7 x pi / 180 = 1.932
Angle N 81 ND1 40 CE1 40 HARMONIC 1.932 0.01
#n y1 -- nd1 h79 -- cg n79 | 140.5 deg | 139.9 x pi / 180 = 2.441
Angle N 81 ND1 40 CG 40 HARMONIC 2.441 0.01

Appendix D - Resfile

NATAA EX 1 EX 2 EX 3 EX 4
start
6 A PIKAA i # i45
7 A PIKAA l # l46
8 A NATRO # n47
9 A PIKAA d # d48
10 A PIKAA d # d49
12 A PIKAA t # t51
14 A PIKAA m # m53
17 A PIKAA v # v56
36 A PIKAA m # m75
39 A PIKAA v # v78
40 A NATRO # h79
73 A PIKAA f # l112

Appendix E - Molecular Dynamics Scripts: NAMD Executables

E.1 Minimization, Heating, Equilibration

set w1 /home/gw/ner/namd/; # read from
set w2 /home/gw/ner/namd/eq ; #write to
set w3 /home/gw/ner/namd/input ; #read from
coordinates ${w3}/solvated_ionized.pdb
structure ${w3}/solvated_ionized.psf
#bincoordinates ./equi1_restart.coor
#binvelocities ./equi1_restart.vel
#extendedSystem ./equi1_restart.xsc
firsttimestep 0
temperature 0
# force field
paratypecharmm on
parameters ${w3}/par_all27_prot_na.prm
exclude scaled1-4
1-4scaling 1.0 # 1.0 for Charmm, 0.833333 for Amber
# dielectric constant
# nonbond interactions
# gw: vdw starts going to zero at switchdist (10) and hits zero at cutoff (12)
# electrostatics shifted down so that 0 at cutoff OR
# full electrostatics calculated before and PME used after cutoff
switching on
switchdist 10
cutoff 12
pairlistdist 14
# integrator
# unless rigid all (SHAKE) or MOLLY is used need 4 fs for
# long-range electrostatic forces (FullElectFrequency),
# 2 fs for short-range nonbonded forces (nonbondedFreq),
# and 1 fs for bonded forces (timestep)
timestep 1.0
nonbondedFreq 2
FullElectFrequency 4
stepspercycle 20
#PME stuff
cellOrigin 0.05 0.62 0.16
cellBasisVector1 60.43 000.00 000.00
cellBasisVector2 000.00 60.43 000.00
cellBasisVector3 000.00 000.00 60.43
PME on
PmeGridsizeX 64
PmeGridsizeY 64
PmeGridsizeZ 64
# output
outputname ${w2}/equi1
outputenergies 100
outputtiming 100
binaryoutput no
dcdfreq 1000 # how often we output trajectories.
wrapall on
wrapNearest on
# for restarting:
restartname ${w2}/equi1_restart
restartfreq 1000
restartsave no
# position-restrained
fixedAtoms on
fixedAtomsFile ${w3}/fixed.pdb
fixedAtomsCol B # 1.0 in this col.
fixedAtomsForces on
constraints on
consRef ${w3}/restrain_ca.pdb
consKFile ${w3}/restrain_ca.pdb
consKCol B
langevin on
langevinDamping 10
langevinTemp 310
langevinHydrogen on
langevinPiston on
langevinPistonTarget 1.01325
langevinPistonPeriod 200
langevinPistonDecay 100
langevinPistonTemp 310
useGroupPressure yes # smaller fluctuations
# run one step to get into scripting mode
minimize 0
# turn off until later
langevinPiston off
# minimize fixed lipids protein
minimize 2000
output ${w2}/min_fix
# min all atoms
fixedAtoms off
minimize 2000
output ${w2}/min_all
# heat with restrained
run 50000
output ${w2}/heat
# equilibrate volume with backbone restrained
langevinPiston on
constraintScaling 1.00
run 200000
output ${w2}/equil2_ca
constraintScaling 0.75
run 100000
output ${w2}/equil3_ca
constraintScaling 0.50
run 100000
output ${w2}/equil4_ca
constraintScaling 0.25
run 100000
output ${w2}/equil6_ca
constraintScaling 0.0
run 200000
output ${w2}/equil7_ca

E.2 Production

# Production simulation started from a pre-equilibrated conformation
set w1 /home/gw/ner/namd ;
set w2 /home/gw/ner/namd/eq ;
set w3 /home/gw/ner/namd/2res/input ;
set w4 /share/networkscratch/gw/ner/prod ;
coordinates ${w3}/solvated_ionized.pdb
structure ${w3}/solvated_ionized.psf
set mmm 1
set nnn 2
# molecular system
bincoordinates ${w2}/equi1_restart.coor.old
binvelocities ${w2}/equi1_restart.vel.old
extendedSystem ${w2}/equi1_restart.xsc.old
firsttimestep 753000 # get from xsc file: ${w2}/equi1_restart.xsc.old
# force field
paratypecharmm on
parameters ${w3}/par_all27_prot_na.prm
exclude scaled1-4
1-4scaling 1.0 # 1.0 for Charmm, 0.833333 for Amber
# nonbond interactions
switching on
switchdist 8.5
cutoff 10
pairlistdist 12
# integrator
timestep 2.0 # in fs.
rigidBonds all
rigidIterations 500
stepspercycle 10
nonbondedFreq 1
fullElectFrequency 2
# output
outputname ${w4}/prod
outputenergies 100
outputtiming 1000
binaryoutput no
dcdfreq 10000 # how often we output trajectories.
#outputPressure 500
wrapall on
# PME issues
PME yes
PmeGridsizeX 64
PmeGridsizeY 64
PmeGridsizeZ 64
#margin 5
# for restarting:
restartname ${w4}/prod
restartfreq 5000
restartsave no
# Temperature control (langevin)
langevin on
langevinDamping 2 # people use anything between 1 (small friction) to 10 (large friction)
langevinTemp 310
langevinHydrogen on
# Constant Pressure Control (variable volume)
langevinPiston on
langevinPistonTarget 1.01325
langevinPistonPeriod 100
langevinPistonDecay 50
langevinPistonTemp 310
useGroupPressure yes # smaller fluctuations
fixedAtoms off
numsteps 100000000
"""@en ; edm:hasType "Thesis/Dissertation"@en ; vivo:dateIssued "2014-05"@en ; edm:isShownAt "10.14288/1.0166916"@en ; dcterms:language "eng"@en ; ns0:degreeDiscipline "Genome Science and Technology"@en ; edm:provider "Vancouver : University of British Columbia Library"@en ; dcterms:publisher "University of British Columbia"@en ; dcterms:rights "Attribution 2.5 Canada"@en ; ns0:rightsURI "http://creativecommons.org/licenses/by/2.5/ca/"@en ; ns0:scholarLevel "Graduate"@en ; dcterms:title "Redesign of the N-end rule protein ClpS for use in high-throughput N-end protein sequencing"@en ; dcterms:type "Text"@en ; ns0:identifierURI "http://hdl.handle.net/2429/46377"@en .