Open Collections

UBC Theses and Dissertations

The evolution of novel xenobiotic organophosphate activity in the metallo-β-lactamase superfamily Yang, Gloria 2020

The evolution of novel xenobiotic organophosphate activity in the metallo-β-lactamase superfamily

by

Gloria Yang

B.Sc., The University of British Columbia, 2016

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Biochemistry and Molecular Biology)

THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)

December 2020

© Gloria Yang, 2020

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the dissertation entitled:

The evolution of novel xenobiotic organophosphate activity in the metallo-β-lactamase superfamily

submitted by Gloria Yang in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Biochemistry and Molecular Biology

Examining Committee:

Dr. Nobuhiko Tokuriki, Biochemistry & Molecular Biology, UBC, Supervisor
Dr. Stephen G. Withers, Chemistry, UBC, Supervisory Committee Member
Dr. Michael Murphy, Microbiology & Immunology, UBC, Supervisory Committee Member
Dr. Filip L.A. Van Petegem, Biochemistry & Molecular Biology, UBC, University Examiner
Dr. Vikramaditya G. Yadav, Chemical & Biological Engineering, UBC, University Examiner

Abstract

New protein functions often evolve through the recruitment and optimization of latent promiscuous activities. How do mutations alter the molecular architecture to change function? The overarching goal of my thesis is to provide answers to this question, utilizing a novel xenobiotic organophosphate hydrolase (OPH) activity as a model. Directed evolution performed on an N-acyl homoserine lactone (AHL) lactonase enzyme possessing promiscuous OPH activity demonstrated that the new function can be quickly optimized via a handful of mutations that rearranged active site residues to adapt to the new substrate.
Ancestral sequence reconstruction (ASR) conducted on a recently evolved OPH enzyme, methyl-parathion hydrolase (MPH), revealed that the OPH activity emerged from an ancestral lactonase enzyme via five mutations that enlarged the active site to increase complementarity to the new substrate. Subsequent generation of the adaptive fitness landscapes formed by these five mutations uncovered a prevalence of epistatic interactions that constrained the number of accessible evolutionary trajectories. Furthermore, the topologies of the landscapes change drastically in response to subtle differences in substrate substituents. Finally, characterization of several extant lactonase orthologs of MPH revealed that sequence divergence has resulted in lower levels of promiscuous OPH activity in the orthologs compared to the ancestral enzyme that gave rise to MPH. Moreover, the five mutations fail to substantially increase OPH activity in the genetic backgrounds of the orthologs. Comparative directed evolution conducted on the MPH ancestor and the orthologs towards OPH activity shows that the ancestral enzyme is able to improve the new function more rapidly. Overall, the results of this thesis contribute to our understanding of enzyme evolution, and will help to improve protein engineering and design in the future.

Lay Summary

Enzymes are biological catalysts that accelerate the rates of chemical reactions. Enzymes play crucial roles in almost all biological systems, and have additionally been employed by humans for numerous practical applications ranging from manufacturing food and beverages, to degrading environmental wastes, to the treatment of various diseases. Being able to understand how enzymatic functions emerge and are improved by mutations will enable us to create better catalysts for a variety of industrial and medical purposes, and will also further our understanding of the evolution of life itself.
Here, I investigate how a bacterial enzyme acquired the ability to degrade man-made pesticides that have existed in the environment for less than a century, and how mutations enabled the rapid improvement of that function. This work will contribute to our understanding of enzyme evolution and improve our ability to engineer and design catalysts for desirable functions.

Preface

Parts of Chapter 1 were written together with Charlotte M. Miton in the laboratory of Dr. Nobuhiko Tokuriki at the University of British Columbia, Vancouver, Canada and published in “Yang G., Miton C.M., and Tokuriki N. A mechanistic view of enzyme evolution. Protein Science 29, 1724–1747 (2020)”. C. M. Miton and I contributed equally in developing the concepts of the manuscript and curating all of the research papers that were reviewed, with input from Dr. Tokuriki. The manuscript was written and edited by all authors.

Parts of Chapter 2 have been performed in collaboration with Nansook Hong in the laboratory of Dr. Colin Jackson at the Australian National University, Canberra, Australia, and with Florian Baier in the laboratory of Dr. Nobuhiko Tokuriki at the University of British Columbia, Vancouver, Canada, and have been published as “Yang, G., Hong, N., Baier, F., Jackson, C.J., Tokuriki, N. Conformational tinkering drives evolution of a promiscuous activity through indirect mutational effects. Biochemistry 55, 4583-4593 (2016)”. N. Hong performed crystal structure analysis and molecular dynamics simulations of AiiA-wt and AiiA-R4, as shown in Table A.3 and Figures 2.6-2.10. F. Baier helped with the conception of the project, and performed protein purifications and enzyme kinetics. I performed all of the other experiments and wrote the manuscript together with F. Baier and Dr. Tokuriki, with input from all of the other authors.

Parts of Chapter 3 have been performed in collaboration with Dr. Dave W.
Anderson at the University of Calgary, Calgary, Canada, Elias Dohmen in the laboratory of Dr. Erich Bornberg-Bauer at Westfälische Wilhelms University, Münster, Germany, Nansook Hong and Paul D. Carr in the laboratory of Dr. Colin Jackson at the Australian National University, Canberra, Australia, Dr. Shina Caroline Lynn Kamerlin at Uppsala University, Uppsala, Sweden, and Florian Baier in the laboratory of Dr. Nobuhiko Tokuriki at UBC, Vancouver, Canada, and have been published as “Yang, G., Anderson, D.W., Baier, F., Dohmen, E., Hong, N., Carr, P.D., Kamerlin, S.C.L., Jackson, C.J., Bornberg-Bauer, E., Tokuriki, N. Higher-order epistasis shapes the fitness landscape of a xenobiotic-degrading enzyme. Nature Chemical Biology 15, 1120–1128 (2019)”. E. Dohmen conducted bioinformatics, shown in Table B.1 and Figures 3.2-3.3. N. Hong, Paul D. Carr, and F. Baier performed crystal structure analysis of AncDHCH1, shown in Table B.4 and Figures 3.7 and 3.9. N. Hong, Paul D. Carr, and S. C. L. Kamerlin conducted computational analyses of AncDHCH1 and MPH, shown in Table B.5 and Figures 3.7 and 3.10. F. Baier conducted lysate activity assays, shown in Table B.7 and Figures 3.11-3.12 and 3.15. D. W. Anderson conducted statistical analyses, shown in Table B.8 and Figures 3.11, 3.13, and 3.16-3.19. I performed all the other experiments and conceived the project together with Dr. Tokuriki, and also wrote the manuscript with Dr. Tokuriki, with input from all of the other authors.

Parts of Chapter 4 have been performed in collaboration with Elias Dohmen in the laboratory of Dr. Erich Bornberg-Bauer at Westfälische Wilhelms University, Münster, Germany. E. Dohmen conducted bioinformatics, shown in Figure 4.1 and Table 4.3. I performed all other experiments and conceived the project with Dr. Tokuriki, and also wrote the chapter, which was edited by Dr. Tokuriki.

Chapter 5 was written entirely by myself, and edited by Dr. Tokuriki.
Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Acknowledgements
Chapter 1: Introduction
     1.1 Enzymes as biological catalysts
     1.2 The evolution of new functions through the optimization of promiscuous activities
     1.3 Exploring enzyme evolution
          1.3.1 Ancestral sequence reconstruction
          1.3.2 Directed evolution
     1.4 Global view of the distribution of activity-enhancing mutations
          1.4.1 What fraction of mutations improve activity?
          1.4.2 Where are activity-enhancing mutations located?
          1.4.3 How many mutations need to be accumulated to elicit a new function?
          1.4.4 The effect of mutational epistasis
     1.5 Molecular mechanisms of activity-enhancing mutations
          1.5.1 Creation of new interactions with the substrate
          1.5.2 Active-site reshaping
          1.5.3 Conformational tinkering of active site residues
          1.5.4 Repositioning or switching metal cofactors
          1.5.5 Alteration of enzyme dynamics
          1.5.6 Tailoring the enzyme-substrate complex
     1.6 Evolution of xenobiotic activities
          1.6.1 Xenobiotic-degrading enzymes
          1.6.2 Organophosphates
          1.6.3 The evolution of organophosphate-degrading enzymes
     1.7 Aims and scope of thesis
Chapter 2: Evolution of organophosphate hydrolase activity in an N-acyl homoserine lactonase
     2.1 Summary
     2.2 Introduction
     2.3 Materials and Methods
          2.3.1 Molecular cloning of AiiA variants and mutant libraries
          2.3.2 Site-directed mutagenesis
          2.3.3 Generation of mutagenized libraries
          2.3.4 Generation of DNA shuffling libraries
          2.3.5 Activity prescreen on agar plates
          2.3.6 Cell lysate activity screen in 96-well plates
          2.3.7 Enzyme purification for kinetic analysis
          2.3.8 Enzyme kinetics
          2.3.9 Crystallization of AiiA-wt and AiiA-R4
          2.3.10 Molecular docking and molecular dynamics simulations
     2.4 Results
          2.4.1 Directed evolution of AiiA towards increased phosphotriesterase activity
          2.4.2 The functional adaptation of AiiA yielded a generalist enzyme
          2.4.3 Genotypic changes that led to increased phosphotriesterase activity
          2.4.4 The combination of indirect and direct mutations underlies the conformational active site tinkering
          2.4.5 Epistatic interactions altered the contribution of Phe68
          2.4.6 Extended comparative mutational scanning reveals changes in the functional contribute of other non-mutated active site residues
          2.4.7 Epistatic interactions between mutations of the evolutionary trajectory
     2.5 Discussion
Chapter 3: The evolution of organophosphate hydrolase activity in methyl-parathion hydrolase
     3.1 Summary
     3.2 Introduction
     3.3 Materials and Methods
          3.3.1 Phylogenetic analysis and ancestral reconstruction
          3.3.2 Cloning
          3.3.3 Purification of tagged proteins
          3.3.4 Enzyme kinetics
          3.3.5 Site-directed mutagenesis
          3.3.6 Cell lysate activity screen in 96-well plates
          3.3.7 Protein purification for crystallization
          3.3.8 Crystallization of AncDHCH1
          3.3.9 Data collection and structure determination
          3.3.10 Molecular docking
          3.3.11 Linear modeling of genetic and environmental effects
          3.3.12 Evaluation of correlated epistasis between different substrates
          3.3.13 Evolutionary pathway determination
          3.3.14 Testing epistatic consequences with modeled datapoints
     3.4 Results
          3.4.1 MPH evolved from a dihydrocoumarin hydrolase enzyme
          3.4.2 Five mutations enabled the evolution of OP activity
          3.4.3 The rugged adaptive landscape of MPH
          3.4.4 High-order epistasis between five mutations
          3.4.5 Epistasis change between O- vs. S-substituted substrates
          3.4.6 Adaptive landscapes yield insight into OP specificity
     3.5 Discussion
Chapter 4: Historical contingency in the evolution of organophosphate hydrolase activity
     4.1 Summary
     4.2 Introduction
     4.3 Materials and Methods
          4.3.1 Phylogenetic analysis and ancestral reconstruction
          4.3.2 Genomic context analysis
          4.3.3 Cloning
          4.3.4 Site-directed mutagenesis
          4.3.5 Purification of tagged proteins
          4.3.6 Enzyme kinetics
          4.3.7 Generation of mutagenized libraries
          4.3.8 Cloning of mutagenized libraries
          4.3.9 Cell lysate screen in 96-well plates
     4.4 Results
          4.4.1 Sequence information and genomic context
          4.4.2 Characterization of ancestral enzymes and orthologs
          4.4.3 Effects of the five mutations on different genetic backgrounds
          4.4.4 The singular effects of the five mutations in AncDHCH4 and AncDHCH5
          4.4.5 Mapping the genetic changes between backgrounds
          4.4.6 Comparative directed evolution of AncDHCH1 and orthologs
          4.4.7 Genotypic changes of AncDHCH1 and orthologs
     4.5 Discussion
Chapter 5: Conclusion and future outlook
     5.1 General summary and conclusion
     5.2 Future outlook
          5.2.1 Probing the catalytic mechanisms of MPH
          5.2.2 Identifying the genetic causes of mutational incompatibilities
          5.2.3 Investigating mutational incompatibilities using adaptive fitness landscapes
          5.2.4 Obtaining a structural basis for mutational incompatibilities
Bibliography
Appendices
     Appendix A: Supplementary material for Chapter 2
          Table A.1 Library information and amino acid changes
          Table A.2 Site-directed mutagenesis primers
          Table A.3 Crystallography data collection and refinement statistics
          Table A.4 Cell lysate activities
          Table A.5 Kinetic parameters for paraoxonase activity
     Appendix B: Supplementary material for Chapter 3
          Table B.1 Ambiguous sites in AncDHCH1 and the highest Bayesian posterior probability alternative residue
          Table B.2 Catalytic activities of purified enzymes against different substrates
          Table B.3 Kinetic parameters of enzymes used for this study for selected substrates
          Table B.4 Crystallography data collection and refinement statistics
          Table B.5 Docking data of methyl-paraoxon (MPO) and methyl-parathion (MPS) in the active sites of AncDHCH1 and MPH
          Table B.6 Site-directed mutagenesis primers
          Table B.7 Cell lysate activities of the 32 MPH variants
          Table B.8 Relative activities of the 32 MPH variants towards methyl-parathion predicted from linear regression models using effects up to the 2nd order and effects up to the 5th order
     Appendix C: Supplementary material for Chapter 4
          Table C.1 MPH sequences deposited in the GenBank database
          Table C.2 Genomic information of DHCH orthologs
          Table C.3 Primers utilized to generate +m5 variations of ancestral sequences
          Figure C.1 Full SDS-PAGE analysis showing soluble and insoluble fractions of the enzymes characterized
          Table C.4 Kinetic parameters of the enzymes used in this study
          Table C.5 Mutations of all variants sequenced for directed evolution

List of Tables

Table 2.1 Kinetic parameters of AiiA variants and MPH for selected substrates
Table 4.1 Information of the enzymes used in this study
Table 4.2 Amino acid sequence identities of enzymes characterized in this study
Table 4.3 Mutations between AncDHCH4 and JsDHCH and between AncDHCH2 and AncDHCH5
Table 4.4 Summary of mutations of variants picked for each round of directed evolution
Table A.1 Library information and amino acid changes
Table A.2 Site-directed mutagenesis primers
Table A.3 Crystallography data collection and refinement statistics
Table A.4 Cell lysate activities
Table A.5 Kinetic parameters for paraoxonase activity
Table B.1 Ambiguous sites in AncDHCH1 and the highest Bayesian posterior probability alternative residue
Table B.2 Catalytic activities of purified enzymes against different substrates
Table B.3 Kinetic parameters of enzymes used for this study for selected substrates
Table B.4 Crystallography data collection and refinement statistics
Table B.5 Docking data of methyl-paraoxon (MPO) and methyl-parathion (MPS) in the active sites of AncDHCH1 and MPH
Table B.6 Site-directed mutagenesis primers
Table B.7 Cell lysate activities of the 32 MPH variants
Table B.8 Relative activities of the 32 MPH variants towards methyl-parathion predicted from linear regression models using effects up to the 2nd order and effects up to the 5th order
Table C.1 MPH sequences deposited in the GenBank database
Table C.2 Genomic information of DHCH orthologs
Table C.3 Primers utilized to generate +m5 variations of ancestral sequences
Table C.4 Kinetic parameters of the enzymes used in this study
Table C.5 Mutations of all variants sequenced for directed evolution

List of Figures

Figure 1.1 A global view of the distribution of activity-enhancing mutations
Figure 1.2 The effect of mutational epistasis
Figure 1.3 Creation of new enzyme-substrate interactions by evolution
Figure 1.4 Active site reshaping by evolution
Figure 1.5 Conformational tinkering of active site residues by evolution
Figure 1.6 Alteration of enzyme conformational dynamics by evolution
Figure 1.7 Substrate repositioning leads to novel enzyme-substrate complexes in evolution
Figure 1.8 Structure of organophosphates and catalytic mechanisms of OPH enzymes
Figure 1.9 Crystal structures of OPH enzymes from different superfamilies
Figure 2.1 Protein structures and enzymatic reactions of AiiA and MPH
Figure 2.2 Proposed catalytic mechanism of AiiA for native and promiscuous substrates
Figure 2.3 Metal dependency of AiiA for paraoxon activity
Figure 2.4 Overview of the directed evolution scheme
Figure 2.5 Activity changes of AiiA over the evolutionary trajectory towards improved phosphotriesterase activity
Figure 2.6 Structural changes of AiiA during the evolution
Figure 2.7 Structural comparison between AiiA-wt and AiiA-R4
Figure 2.8 B-factor in the crystal structures of AiiA-wt and AiiA-R4
Figure 2.9 Overlay of two independent molecular dynamics (MD) simulations of AiiA-wt and AiiA-R4
Figure 2.10 A snapshot of active site configuration and substrate position in the molecular dynamics (MD) simulations at 25 ns of AiiA-wt and AiiA-R4
Figure 2.11 Comparative mutational scanning of Phe68 in the background of different AiiA variants
Figure 2.12 Comparative mutational scanning of selected non-mutated active site residues between AiiA-wt and AiiA-R4
Figure 2.13 Change in mutational effect in the background of AiiA-wt and in the evolutionary trajectory
Figure 3.1 Representative sequence similarity network (SSN) of enzymes in the metallo-β-lactamase superfamily
Figure 3.2 Phylogenetic tree of MPHs and representative DHCH enzymes
Figure 3.3 Site-specific posterior probabilities for ancestral amino acid sequences
Figure 3.4 Phylogeny and phenotype of methyl-parathion hydrolase
Figure 3.5 A multiple sequence alignment of representative extant MPH, DHCH, and predicted ancestral enzymes
Figure 3.6 Identification of five key adaptive mutations between AncDHCH1 and MPH
Figure 3.7 Structural and biochemical effects of five key mutations
Figure 3.8 Proposed catalytic mechanisms of MPH for organophosphate and dihydrocoumarin hydrolytic reactions
Figure 3.9 Comparison of the crystal structures of AncDHCH1 and MPH
Figure 3.10 Molecular docking poses of AncDHCH1 and MPH from Glide
Figure 3.11 Adaptive landscape and mutational effects of key mutations for methyl-parathion hydrolase activity
Figure 3.12 SDS-PAGE analysis showing soluble fractions of the 32 MPH combination variants
Figure 3.13 Reconstructed adaptive fitness landscapes of MPH for methyl-parathion activity using simulated data
Figure 3.14 Distances between the key functional residues in the crystal structures
Figure 3.15 Adaptive landscapes for three additional OP substrates
Figure 3.16 Statistical analyses of singular and epistatic mutational effects for ethyl-parathion, methyl-paraoxon, and ethyl-paraoxon
Figure 3.17 Mutational analyses for three additional OP substrates
Figure 3.18 The effects of mutations in different genetic backgrounds
Figure 3.19 Changes in the singular and epistatic effects of h258L and i271T between methyl-parathion and methyl-paraoxon substrates
Figure 4.1 The evolution of OPH activity in MPH
Figure 4.2 Sequence alignment of all of the reconstructed ancestral and orthologous sequences utilized in this study
Figure 4.3 Genomic context of DHCH orthologs
Figure 4.4 Biochemical characterization of the ancestral enzymes and orthologs
Figure 4.5 Relationship between effects of the five mutations and the starting phenotype and sequence identity to MPH
Figure 4.6 Singular effects of the five mutations on AncDHCH4 and AncDHCH5 compared to MPH
Figure 4.7 Genotypic divergence between enzymes
Figure 4.8 Comparative directed evolution of AncDHCH1 and five DHCH orthologs
Figure 4.9 Genotypic changes in AncDHCH1 and five orthologs
Figure C.1 Full SDS-PAGE analysis showing soluble and insoluble fractions of the enzymes characterized

List of Abbreviations

AHL  N-acyl homoserine lactone
ASR  ancestral sequence reconstruction
BLAST  basic local alignment search tool
DHC  dihydrocoumarin
DHCH  dihydrocoumarin hydrolase
E. coli  Escherichia coli
EC  enzyme commission
g  gravity
ID  identification
IPTG  isopropyl β-D-1-thiogalactopyranoside
MBL  metallo-β-lactamase
MD  molecular dynamics
ML  maximum likelihood
PCR  polymerase chain reaction
PDB  protein data bank
PTE  phosphotriesterase
ODxxx  optical density at XXX nm
OP  organophosphate
OPH  organophosphate hydrolase
RPM  revolutions per minute
RMSD  root mean square deviation
RX  round x of directed evolution
SDS-PAGE  sodium dodecylsulfate polyacrylamide gel electrophoresis
TS  transition state
UBC  University of British Columbia
vs.  versus
WT  wild-type

Amino acid abbreviations:

A (or Ala)  alanine
C (or Cys)  cysteine
D (or Asp)  aspartate
E (or Glu)  glutamic acid
F (or Phe)  phenylalanine
G (or Gly)  glycine
H (or His)  histidine
I (or Ile)  isoleucine
K (or Lys)  lysine
L (or Leu)  leucine
M (or Met)  methionine
N (or Asn)  asparagine
P (or Pro)  proline
Q (or Gln)  glutamine
R (or Arg)  arginine
S (or Ser)  serine
T (or Thr)  threonine
V (or Val)  valine
W (or Trp)  tryptophan
Y (or Tyr)  tyrosine

Acknowledgements

I thank my supervisor, Dr. Nobuhiko Tokuriki. I’m extremely grateful for the opportunity that he has given me to do my graduate studies in his lab, and for his guidance, teaching, encouragement, and support throughout my PhD studies. He is the best supervisor that I could have asked for. I feel very fortunate to have been able to work with him, and under his tutelage, I have developed much both in my scientific skills and as a person. I thank my committee members, Dr. Stephen G. Withers and Dr. Michael Murphy, for their feedback and support on my research and dissertation. I thank all of my collaborators who have contributed so much to the research described in this thesis. I thank Dr. Colin Jackson and the members of his lab, in particular Nansook Hong and Paul D. Carr, for the work that they did in crystallography and molecular dynamics analyses. I thank Dr. Erich Bornberg-Bauer and Elias Dohmen for their contributions in bioinformatics analyses. I thank Dr. David Anderson for his work conducting statistical analyses. I thank Dr. Shina Caroline Lynn Kamerlin and Anna Pabis for their contributions to conducting molecular dynamics analyses. I thank all of my past and present colleagues in the Tokuriki lab for providing me with help and support throughout my research.
In particular, I thank Florian Baier for mentoring me when I first started working at the lab as a co-op student, teaching me all of the techniques, and supporting me during the initial stages of my project. I thank the co-op students, Elina Levchenko and Jordan Williams-Yuen, for helping me out with my experiments. I thank Dr. Charlotte Miton, Dr. Janine Copp, Dan Kehila, and John Chen for their contributions of ideas and suggestions to my research.

Finally, I thank my family and friends. In particular, I thank my parents for their unwavering love and support. Even during the most difficult times, I’ve always known that I will be able to make it through because I had them at my back.

Chapter 1: Introduction

1.1 Enzymes as biological catalysts

Enzymes are biological catalysts that are capable of accelerating the rates of chemical reactions by as much as 10¹⁵-fold, enabling reactions that may take years in solution to occur on the order of seconds or less (Benkovic & Hammes-Schiffer 2003; Wolfenden 2011). As such, enzymes have come to play a critical role in almost all biological systems, increasing the rates of countless essential reactions that need to occur in an organism in order to sustain life (Wolfenden 2011). Since ancient times, humans have harnessed the power of enzymes in the manufacture of foods and beverages; today, enzymes are employed for a diverse array of industrial and biomedical applications ranging from consumer products such as laundry detergents and cosmetics, to the generation of biofuels, to the degradation of toxic wastes in the environment, to therapeutic treatments for a number of diseases and disorders (Davids et al. 2013; Gurung et al. 2013; Singh et al. 2016). Understanding how enzymes function is thus crucial both for practical applications, and for furthering our fundamental understanding of life itself.
From a mechanistic perspective, enzymes accelerate spontaneous reactions without altering the equilibrium (i.e., the thermodynamics) (Albery & Knowles 1976; Fersht 1974). During a reaction, the substrate goes through a short-lived intermediate form, known as the “transition state” (TS), before being converted into the final product (Benkovic & Hammes-Schiffer 2003; Fersht 1974). The TS is highly unstable, resulting in an energy barrier known as the “activation energy” (Ea), defined as the energetic difference between the ground state of the substrate and the TS, that must be overcome in order for the reaction to proceed (Benkovic & Hammes-Schiffer 2003; Fersht 1974; Schramm 2011). Enzymes function by stabilizing the TS through various mechanisms, such as the stabilization of unfavourable charges, thereby lowering the Ea to accelerate the reaction without affecting the energetic differences between the reactant and product (Benkovic & Hammes-Schiffer 2003; Fersht 1974; Schramm 2011). This is achieved by a specialized region in the enzyme known as the “active site”, which contains functional groups that are able to form specific, temporary bonds with a particular substrate and catalyse the reaction of that substrate (Albery & Knowles 1976; Benkovic & Hammes-Schiffer 2003; Fersht 1974; Jencks 1975).

Enzyme performance is most commonly described by the “catalytic efficiency”, or kcat/KM ratio, where kcat (s⁻¹) is the number of substrate molecules turned over per unit time and KM (M) is the Michaelis constant, the substrate concentration at which the reaction reaches half of its maximum velocity (Vmax) (Benkovic & Hammes-Schiffer 2003). Kinetic perfection is thought to be achieved when enzymes reach the diffusion limit, i.e., reactions become limited by the rate of diffusion of molecules in the solution, which is thought to occur at a kcat/KM of 10⁸-10⁹ s⁻¹ M⁻¹ (Bar-Even et al. 2011).
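As an illustration of the kinetic quantities defined above, the following minimal Python sketch evaluates the standard Michaelis-Menten rate law, v = kcat·[E]·[S] / (KM + [S]), together with the catalytic efficiency kcat/KM. The function names and the parameter values (kcat, KM, and enzyme concentration) are hypothetical and chosen only for illustration; they are not measurements from this thesis.

```python
def michaelis_menten_rate(substrate_conc, kcat, km, enzyme_conc):
    """Initial velocity (M/s) from the Michaelis-Menten rate law.

    substrate_conc and enzyme_conc are in M, kcat in s^-1, km in M.
    """
    if km <= 0 or substrate_conc < 0 or enzyme_conc < 0:
        raise ValueError("km must be positive; concentrations must be non-negative")
    vmax = kcat * enzyme_conc  # maximum velocity at saturating substrate
    return vmax * substrate_conc / (km + substrate_conc)


def catalytic_efficiency(kcat, km):
    """Catalytic efficiency kcat/KM in s^-1 M^-1."""
    return kcat / km


# Hypothetical parameters, for illustration only.
KCAT = 100.0     # s^-1
KM = 1.0e-3      # M
ENZYME = 1.0e-9  # M

if __name__ == "__main__":
    vmax = KCAT * ENZYME
    # At [S] = KM the velocity is half of Vmax, by definition of KM.
    v_at_km = michaelis_menten_rate(KM, KCAT, KM, ENZYME)
    print("v at [S] = KM, as a fraction of Vmax:", v_at_km / vmax)
    # At [S] << KM the rate approaches (kcat/KM) * [E] * [S].
    low_s = 1.0e-6
    v_low = michaelis_menten_rate(low_s, KCAT, KM, ENZYME)
    v_approx = catalytic_efficiency(KCAT, KM) * ENZYME * low_s
    print("exact / approximate rate at low [S]:", v_low / v_approx)
    print("kcat/KM (s^-1 M^-1):", catalytic_efficiency(KCAT, KM))
```

The low-substrate limit is why kcat/KM is a convenient measure of performance: when [S] is well below KM, the reaction behaves as a second-order process between enzyme and substrate with kcat/KM as its apparent rate constant, which can be compared directly with the rate at which the two molecules encounter each other by diffusion.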
Kinetically perfect enzymes, such as carbonic anhydrase and triosephosphate isomerase, are capable of converting substrates to products as quickly as they are delivered to the active site (Bar-Even et al. 2011). The vast majority of enzymes, however, exhibit catalytic efficiencies that are several orders of magnitude lower than the theoretical limit, with the average being ~10^5 s^-1 M^-1 (Bar-Even et al. 2011). This may be due to physicochemical properties of the substrate, such as molecular mass and hydrophobicity, which can impose limits on the binding affinity of a ligand for the macromolecule (Kuntz et al. 1999). Moreover, evolutionary selective pressures also play a key role: achieving maximal catalytic efficiency is unnecessary if it does not improve the overall fitness of the host organism (Wagner 2007).

1.2 The evolution of new functions through the optimization of promiscuous activities

The functional diversity of enzymes is enormous. As of April 2020, the ExplorEnz Database, which was developed by McDonald et al. at Trinity College Dublin to enable access to the data of the International Union of Biochemistry and Molecular Biology (IUBMB) Enzyme Nomenclature List, listed 7736 distinct enzyme functions under the seven classes (Oxidoreductase, Transferase, Hydrolase, Lyase, Isomerase, Ligase, and Translocase) identified by the Enzyme Commission (E.C.) number classification system (McDonald et al. 2009). Over the past century, structural and biochemical characterizations, and, in more recent decades, extensive genetic sequencing, have enabled the classification of an enormous number of protein superfamilies that are characterized by distinct structural folds, catalytic machineries, and mechanisms (Babbitt & Gerlt 2000; Baier, Copp, & Tokuriki 2016; Furnham et al. 2016; Glasner, Gerlt, & Babbitt 2006).
The CATH database (v4.2, 2017) has curated and classified structures from over 6000 protein superfamilies, while its sister resource, Gene3D (v16), contains approximately 52 million protein sequences. Even within a superfamily, functional diversity can be extremely large: all six of the original E.C. classes of catalytic reactions (the seventh class, Translocase, having been added more recently by the IUBMB in August 2018) are represented in the haloacid dehalogenase, enolase, cytosolic glutathione transferase, metallo-β-lactamase (MBL), and amidohydrolase superfamilies (Baier et al. 2016; Bebrone 2007; Furnham et al. 2016; Seibert & Raushel 2005).

How is evolution able to give rise to such a tremendous array of diverse functions?

The current model of enzyme evolution entails the recruitment and optimization of a pre-existing activity in a protein scaffold. Many enzymes, in addition to exhibiting high catalytic efficiency for their native (or cognate, physiological) substrate, display promiscuous reactivity toward other non-native substrates, albeit typically with efficiencies several orders of magnitude lower (Copley 2003; Jensen 1976; Khersonsky & Tawfik 2010; Mohamed & Hollfelder 2013). In the past few decades, a number of studies have demonstrated that promiscuous reactivity is ubiquitous in many enzyme superfamilies (Baier et al. 2016; Hochberg & Thornton 2017; Mohamed & Hollfelder 2013). For example, characterization of 24 enzymes from the MBL superfamily against 10 distinct catalytic reactions revealed that each enzyme catalyzed an average of 1.5 reactions in addition to the native one (Baier & Tokuriki 2014). Substrate and catalytic promiscuity have also been observed in enzymes from the cytosolic glutathione transferase superfamily (Mashiyama et al. 2014), β-keto acid cleavage enzyme family (Bastard et al. 2014), and haloacid dehalogenase superfamily (Huang et al. 2015).
Such latent promiscuous activities may constitute a reservoir from which new chemistries can gradually emerge during functional optimization. Indeed, numerous studies in recent decades have demonstrated that pre-existing promiscuous activities in enzymes can be readily optimized by mutations (Afriat-Jurnou, Jackson, & Tawfik 2012; Baier et al. 2019; Boucher et al. 2014; Kaltenbach et al. 2015; Miton et al. 2018; Tokuriki et al. 2012; Voordeckers et al. 2012; Yang et al. 2016). When an enzyme possessing a weak promiscuous catalytic activity is recruited or selected, how do mutations alter its molecular architecture to rapidly adapt to a new substrate upon environmental changes? To address this question, it is necessary to first reflect on the mechanisms by which enzymes recognize their native and promiscuous substrates. Natural enzymes achieve high rate acceleration through the precise pre-organization of active site residues that provide tailored electrostatic and geometric complementarity with their native substrate, allowing for specific transition state (TS) recognition and stabilization (Benkovic & Hammes-Schiffer 2003; Fersht 1974; Kirby & Hollfelder 2009). Furthermore, their structural motions have often been tuned by evolution to facilitate unique catalytic cycles, i.e., the capture and release of chemical intermediates along the reaction coordinate (Gatti-Lafranconi & Hollfelder 2013). In this context, low-level promiscuous activities are likely to be the result of a suboptimal fit between the active site and the promiscuous TS, originating from poor geometric and electrostatic complementarity and/or out-of-sync dynamics (Nobeli, Favia, & Thornton 2009). Thus, the optimization of a promiscuous activity is likely to involve tinkering with enzyme-substrate interactions to form more optimal Michaelis and TS complexes (Jacob 1977), and tuning structural motions to adapt to new catalytic cycles (Bhabha, Biel, & Fraser 2015).
In recent years, a number of studies have documented the molecular changes associated with an increase in activity during evolutionary cycles. These include directed evolution experiments, starting from the promiscuous activity of a natural or computationally designed enzyme (Giger et al. 2013; Miton et al. 2018; Obexer et al. 2017; Siegel et al. 2010; Tokuriki et al. 2012; Yang et al. 2016), as well as natural examples identified via bioinformatics-based approaches, such as ancestral sequence reconstruction (Bridgham, Ortlund, & Thornton 2009; Clifton et al. 2018; Harms & Thornton 2014; Kaltenbach et al. 2018; Voordeckers et al. 2012).

1.3 Exploring enzyme evolution

For some rapidly evolving sequences, such as enzymes associated with drug resistance and various viral proteins, it is possible to study evolution by following natural trajectories. This method yields only a limited perspective on the subject, however, since it cannot be used for understanding the emergence of more ancient functions. Moreover, the majority of enzymes evolve slowly in nature, on timescales that make them impractical for researchers to follow. Thus, alternate methods must be employed. In this section, we discuss two key laboratory tools for exploring enzyme evolution: ancestral sequence reconstruction and directed evolution.

1.3.1 Ancestral sequence reconstruction

Ancestral sequence reconstruction (ASR) has become a powerful tool for studying enzyme evolution (Fitch 1971; Hochberg & Thornton 2017; Joy et al. 2016; Merkl & Sterner 2016; Zuckerkandl & Pauling 1965). ASR involves the deduction of phylogenetic relationships between genetic sequences in order to statistically predict the likely characteristics of ancestors at branching nodes, which can then be utilized to determine the probable evolutionary steps that have resulted in their divergence (Hochberg & Thornton 2017; Joy et al. 2016; Merkl & Sterner 2016).
ASR enables the characterization of extinct progenitor sequences to determine their functional profiles and the identification of key mutations that have occurred to give rise to the extant sequences that we see today (Hochberg & Thornton 2017). Thus, we are able to effectively trace the course of evolution over timescales of millions or even billions of years.  The idea of ASR was put forward by Pauling and Zuckerkandl in the 1960’s, when they theorized that the information of extinct ancestral genes or protein sequences can be inferred from those of extant homologs (Zuckerkandl & Pauling 1965). Several years later, Fitch developed one of the first algorithms for ASR, employing a maximum parsimony (MP) model, which constructs a tree with the minimum number of evolutionary changes required to explain the relationship between extant sequences (Fitch 1971). The MP method has been used to reconstruct the ribonuclease from an extinct bovid ruminant (Stackhouse et al. 1990).  With the dramatic increase in computing power over the subsequent decades, more complex algorithms have been developed. Notably, there is the maximum likelihood (ML) approach, which searches for the tree with the highest probability, given the extant sequences and the parameters of the model of evolution that is chosen (Yang 1994). Unlike MP, the ML approach takes the probabilities and rates of nucleotide substitutions into account, and is thought to provide more realistic phylogenies in comparison to MP (Yang 1994). Bayesian algorithms are used to search for trees with the highest posterior probability (Huelsenbeck & Ronquist 2001; Huelsenbeck et al. 2001). Bayes’ theorem combines the prior probability of a phylogeny with the likelihood to produce a posterior probability distribution of trees; the tree with the highest posterior probability is considered the best estimate of the phylogeny (Huelsenbeck et al. 2001; Huelsenbeck & Ronquist 2001). 
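The Bayesian criterion described above can be written compactly. The notation below is an assumption introduced here for illustration (it is not taken from the cited works): τ denotes a candidate phylogeny, D the alignment of extant sequences, P(τ) the prior probability of the tree, and P(D | τ) its likelihood under the chosen model of evolution.

```latex
P(\tau \mid D) \;=\; \frac{P(D \mid \tau)\, P(\tau)}{\sum_{\tau'} P(D \mid \tau')\, P(\tau')}
```

The sum in the denominator runs over all candidate trees τ'. Because the number of possible trees grows very rapidly with the number of sequences, this sum generally cannot be evaluated exactly, which is why Bayesian phylogenetic programs typically approximate the posterior distribution by sampling rather than by enumeration.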
ASR has been utilized to study the evolution of a number of different proteins (Akanuma et al. 2013; Boucher et al. 2014; Butzin et al. 2013; Lim et al. 2016; Nguyen et al. 2017; Pillai et al. 2020; Studer et al. 2014), including the thermal adaptation of Precambrian metabolic enzymes from Bacillus (Hobbs et al. 2012), the divergence of a family of fungal glucosidases following gene duplication events (Voordeckers et al. 2012), and the de novo evolution of catalytic activities in the non-catalytic ancestors of cyclohexadienyl dehydratase and chalcone isomerase (Clifton et al. 2018; Kaltenbach et al. 2018). Such studies have yielded valuable insights into the molecular changes of a protein as it evolves towards a new function, and have also addressed various fundamental questions in evolutionary biology such as: 1) Is evolution repeatable?, 2) Are there alternate solutions to the one that has been historically selected?, and 3) Is evolution reversible? In one notable study conducted on the evolution of a glucocorticoid receptor (GR), it was found that specificity towards cortisol evolved between the ancestral sequences AncGR1, which was more strongly activated by mineralocorticoids and only weakly by cortisol, and AncGR2, which was cortisol-specific (Bridgham et al. 2009). Just two mutations were necessary and sufficient to switch the preference of AncGR1 from mineralocorticoids to cortisol, whereas an additional three mutations resulted in the loss of mineralocorticoid sensitivity to yield a highly cortisol-specific receptor that is phenotypically similar to AncGR2 (Bridgham et al. 2009). Interestingly, however, reversion of the five mutations back to their ancestral states in AncGR2 failed to recapitulate the phenotype of AncGR1; instead, a complete loss of function was observed (Bridgham et al. 2009).
This was due to the presence of five additional mutations between AncGR1 and AncGR2 that destabilized the receptor structure needed to support the ancestral function (Bridgham et al. 2009). These five mutations formed an “epistatic ratchet” that made reversal of function evolutionarily difficult (Bridgham et al. 2009).

1.3.2 Directed evolution

Directed evolution is a method used for protein engineering that mimics the natural processes of Darwinian evolution, consisting of iterative rounds of gene diversification and library screening or selection (Cobb, Chao, & Zhao 2013). One of the first experiments that employed directed evolution was conducted in the 1960s, involving the in vitro evolution of a self-replicating RNA molecule (Mills, Peterson, & Spiegelman 1967); with the advent of techniques like error-prone PCR to introduce random mutations into a genetic sequence, directed evolution of proteins was able to take off in the 1990s (Cobb, Chao, & Zhao 2013). One of the landmark examples of directed evolution on enzymes was conducted by Chen and Arnold, who evolved the protease subtilisin E and obtained a variant that exhibited a 256-fold higher activity in 60% dimethylformamide (Chen & Arnold 1993). Frances Arnold would go on to receive the 2018 Nobel Prize in Chemistry "for the directed evolution of enzymes". Directed evolution has a distinct advantage over rational design in that it does not require prior in-depth knowledge of a protein's structure or mechanism. Directed evolution has been highly successful in improving catalytic activity, binding, expression, stability, aggregation-resistance, and other desirable properties of proteins for various industrial and biochemical purposes (Cabrita et al. 2007; Collins et al. 2003; Ebo et al. 2020; Giger et al. 2013; Hsu et al. 2005; Jiménez-Osés et al. 2014; Trevizano et al. 2012; Wang et al. 2018; Zhang et al. 2012).
In addition to biotechnological applications, by characterizing improved variants obtained along a trajectory, directed evolution can also be an invaluable tool for understanding the molecular mechanisms by which mutations affect function (Blomberg et al. 2013; Horsman et al. 2003; Jiménez-Osés et al. 2014; Khersonsky et al. 2011; Miton et al. 2018; Otten et al. 2018; Oue et al. 1999; Tokuriki et al. 2012). For example, the role of more distant mutations that do not directly interact with the substrate/ligand can be probed. Laboratory evolution of an aspartate aminotransferase resulted in a variant with a ~2 million-fold increase in catalytic efficiency towards a non-native substrate, valine, via 17 mutations (Oue et al. 1999). Notably, however, only one of the mutated residues appeared to form direct contact with the substrate, and none interacted with the coenzyme; instead, the mutations function by causing a rearrangement of the two subunits of the dimeric protein and remodelling the active site, which is located at the subunit interface (Oue et al. 1999).

Moreover, similar to ASR, directed evolution is also an important tool for studying evolutionary questions such as the repeatability and evolvability of a function (Baier et al. 2019; Kaltenbach et al. 2015; Khanal et al. 2015; Schulenburg et al. 2015; Tokuriki et al. 2012). In an experiment in which nine L-gamma-glutamyl phosphate (GP) reductase (ProA) orthologs were evolved towards an enhanced promiscuous N-acetyl-L-glutamyl phosphate (NAGP) reductase activity, a single active site mutation, E383A, was found to be crucial for the evolution of the new activity in all of the sequences (Khanal et al. 2015). However, the mutational effect of E383A varied between the different genetic backgrounds, increasing the promiscuous activity by between 50- and 770-fold and decreasing the native activity by between 190- and 2100-fold (Khanal et al. 2015).
Similarly, directed evolution conducted on two orthologous MBL enzymes, VIM2 and NDM1, towards a promiscuous phosphonate monoester hydrolase (PMH) activity resulted in dramatically different evolutionary outcomes, with NDM1 acquiring an overall ~3600-fold increase in PMH fitness compared to only a ~35-fold increase in VIM2 (Baier et al. 2019). These studies highlight the importance of the starting genetic sequence for the evolutionary outcome.

1.4 Global view of the distribution of activity-enhancing mutations

What fraction of mutations is beneficial to a new enzymatic function? How are these beneficial mutations distributed across the enzyme structure? How many beneficial mutations are required to reach high catalytic efficiency? We discuss these topics in this section to provide a bird’s-eye view of the mutations necessary to optimize a novel promiscuous function. While each enzyme model constitutes a unique case in terms of attributes and response to mutations, observations drawn from diverse sequences may reveal general trends and patterns that underlie the optimization of a novel catalytic activity.

1.4.1 What fraction of mutations improve activity?

Based on numerous enzyme engineering efforts, the fraction of beneficial mutations is estimated to be very low: the screening of mutagenized libraries typically results in the identification of fewer than 10 to 20 activity-enhancing mutations. Recent developments in massive-scale mutational analysis platforms, i.e., deep mutational scanning (DMS), are now providing a systematic and more statistical picture of the distribution of fitness effects (DFE) in proteins (Araya & Fowler 2011; Fowler & Fields 2014).
The overall DFE measured during several DMS experiments across different target enzymes is consistent with the observations inferred from enzyme evolution campaigns: ~60-70% of mutations are deleterious, 30-40% are neutral, and fewer than 5% of mutations confer improvements in function (Chen, Fowler, & Tokuriki 2020; Melnikov et al. 2014; Stiffler, Hekstra, & Ranganathan 2015; van der Meer et al. 2016; Wrenbeck, Azouz, & Whitehead 2017). It should be noted that the selection pressure applied during DMS is often coupled to cell growth, such that the DFE reflects changes not only in catalytic activity but also in protein expression, stability, and solubility. However, these observations must be put into perspective: for a protein sequence encompassing hundreds of amino acids, even a very small percentage of all available substitutions could still yield a substantial number of beneficial mutations. For example, 5% of all possible single point substitutions in an enzyme of 300 amino acids (300 positions × 19 alternative residues = 5,700 variants) would still correspond to roughly 285 available beneficial mutations. Indeed, in a DMS study with VIM-2 β-lactamase, more than a hundred beneficial and specificity-altering mutations were identified across 25 different positions (Figure 1.1 A) (Chen et al. 2020). This represents a significant reservoir of accessible beneficial mutations within the local sequence space that can be harnessed.

[Figure 1.1 image not reproduced. Panel A axes: fitness score (x) versus percentage of variants (y); summary bar: 65% negative, 30% neutral, 5% positive. Panel C axes: number of mutations (x) versus fold change in catalytic efficiency (y).]

Figure 1.1 A global view of the distribution of activity-enhancing mutations. (A) Histogram of the distribution of fitness effects for all missense mutations in the enzyme VIM2 under selection for growth at 128 µg/mL of ampicillin.
The dashed vertical lines indicate fitness score cut-offs used to classify fitness effects as negative, neutral, or positive. The bar graph at the top indicates the total percentage of negative, neutral, or positive variants. (B) Cartoon representations of the crystal structures of VIM2 (left, PDB ID: 5yd7) (Chen et al. 2020) and amiE (right, PDB ID: 2uxy) (Wrenbeck et al. 2017) with the positions of all activity-enhancing mutations highlighted as blue spheres. Red circles indicate the location of the active site. The active site Zn2+ ions of VIM2 are depicted as orange spheres, whereas two active site residues of amiE, Trp138 and Cys166, are depicted as orange sticks. (C) Fold-change in kcat/KM of several evolved enzymes against the number of missense mutations acquired during directed evolution. Panel A is adapted with permission from Chen et al. (Chen et al. 2020), Panel C from Goldsmith & Tawfik (Goldsmith & Tawfik 2017).

1.4.2 Where are activity-enhancing mutations located?

Whilst the fraction of beneficial mutations is largely consistent regardless of the enzyme model, the distribution of beneficial mutations across the tertiary structure seems to vary considerably among enzymes. In some cases, beneficial mutations cluster around the active site, e.g., in the DMS study of VIM-2 β-lactamase, 23 out of 25 positions that contain at least one specificity-altering mutation were located within 15 Å of the catalytic zinc ions (Figure 1.1 B) (Chen et al. 2020). 4-oxalocrotonate tautomerase and Tn5 transposon-derived kinase have yielded similar patterns, where the majority of activity- and specificity-altering mutations tended to cluster in the first or second shell (Melnikov et al. 2014; van der Meer et al. 2016).
In contrast, the majority of beneficial and specificity-determining mutations occurring in amiE appear to be located far from the active site: most of the 395 mutations that specifically enhanced growth on isobutyramide were 9-21 Å away from the active site (Figure 1.1 B) (Wrenbeck et al. 2017). Similarly, 53/106 (50%) of mutations in TEM-1 β-lactamase that were found to increase fitness towards cefotaxime are localized on the enzyme surface and far from the active site (Stiffler et al. 2015). While the mutations observed in directed evolution and ASR studies are more biased, i.e., they only represent mutations that were selected during adaptation, these studies have also unveiled the contributions of both proximal and distal mutations. In general, mutations closer to the active site (<10 Å) tend to have larger effects; however, a considerable number of activity-enhancing mutations remain >10 Å away from the catalytic center (Miton & Tokuriki 2016; Morley & Kazlauskas 2005; Wilding et al. 2019). Moreover, distal mutations can still cause large improvements in activity. For example, a mutation located 13 Å away from the active site improved ceftazidime hydrolysis in a β-lactamase by more than 600-fold (Vakulenko et al. 1999); a mutation 19 Å away from the active site improved the activity of a fatty acid desaturase by ~30-fold (Whittle & Shanklin 2001).

Overall, the wide distribution of activity-enhancing mutations on protein structures is likely a reflection of the existence of multiple solutions for improving interactions between an enzyme’s active site and its substrate. Mutations in the active site may generate critical residues that interact with the substrate and stabilize the TS; however, second-shell mutations may help to fine-tune the key residues in the active site and/or binding pocket to be more complementary to the target substrate.
In addition, surface mutations may function by shifting conformational dynamics towards more catalytically active conformations. Thus, the effects of proximal versus distal mutations will vary between different sequences and depend upon the underlying molecular bottleneck that needs to be overcome.

1.4.3 How many mutations need to be accumulated to elicit a new function?

If the optimization of enzyme function cannot be achieved via single point mutations and requires multiple substitutions, how many are necessary to achieve a complete functional transition? A recent systematic analysis of directed evolution studies demonstrated that the overall improvement in activity relative to the total number of mutations varies substantially with the enzyme model (Figure 1.1 C) (Goldsmith & Tawfik 2017). However, a prominent trend is that improvements greater than 1,000-fold typically require the accumulation of at least 10 mutations. Moreover, the catalytic improvements along an evolutionary trajectory often exhibit ‘diminishing returns’, a hallmark of evolutionary optimization processes whereby the fitness improvement per mutation is large in early rounds of evolution but later becomes more incremental (Hartl, Dykhuizen, & Dean 1985; Stebbins 1944). For example, during the evolution of PTE towards arylesterase activity, the first four rounds resulted in a ~1,100-fold increase in activity, whereas the final 14 rounds only saw a ~30-fold increase (Tokuriki et al. 2012). Similarly, the evolution of an arylsulfatase towards phosphonate hydrolysis saw a ~2,800-fold increase in catalytic activity during the first four rounds of evolution, and only a ~40-fold increase during the final five rounds (Miton et al. 2018). Thus, the accumulation of a large number of mutations (10-20 mutations) appears essential to fully achieve the optimization of a novel catalytic activity, even though late-occurring mutations may be comparatively less fruitful (Figure 1.1 C) (Goldsmith & Tawfik
2017).

1.4.4 The effect of mutational epistasis

Epistasis refers to the phenomenon of non-additivity between mutations (de Visser, Cooper, & Elena 2011; Domingo, Baeza-Centurion, & Lehner 2019; Phillips 2008). While the term originated in genetics and was initially used to describe non-additive effects between genes, epistasis has since been applied to a number of different biological systems (de Visser et al. 2011; Domingo et al. 2019; Phillips 2008; Richmond 2001). In protein science, epistasis is generally used to describe interactions between amino acids that result in non-additive effects of two or more mutations on a phenotype, such as binding, stability, or catalytic activity (Kaltenbach & Tokuriki 2014; Starr & Thornton 2016). There are two main types of epistasis in proteins: magnitude epistasis and sign epistasis (Figure 1.2 A) (Starr & Thornton 2016; Kaltenbach & Tokuriki 2014). Magnitude epistasis occurs when the combined effect of mutations is either larger (synergistic) or smaller (antagonistic) than the sum of their individual effects; sign epistasis occurs when the effect of a mutation switches from beneficial to deleterious (or the reverse), depending on the presence or absence of other mutations (Figure 1.2 A) (Kaltenbach & Tokuriki 2014). Epistasis influences the effects of mutations, and consequently impacts evolutionary outcomes (Domingo et al. 2019; Kaltenbach & Tokuriki 2014; Starr & Thornton 2016). For example, when mutations display sign epistasis, the accessibility of available substitutions becomes restricted, as mutations will only have a positive effect on fitness (“fitness” here defined as a protein phenotype that is under selection) in a limited number of genetic backgrounds.
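The distinction between magnitude and sign epistasis can be made concrete with a minimal sketch. It assumes that mutational effects are expressed as log10 fold-changes relative to the wild type, so that "additive" means the logs sum; the function name, the tolerance, and the example values are all hypothetical and chosen only for illustration. The synergistic/antagonistic labels follow the usage above and are most intuitive when both single mutations are beneficial.

```python
import math


def classify_epistasis(f_wt, f_a, f_b, f_ab, tol=1e-9):
    """Classify pairwise epistasis from four fitness (or activity) values.

    f_wt, f_a, f_b, f_ab are positive values for the wild type, the two
    single mutants, and the double mutant. Returns (label, epsilon), where
    epsilon is the deviation of the double mutant from additivity on a
    log10 scale (an assumed convention for this sketch).
    """
    effect_a = math.log10(f_a / f_wt)
    effect_b = math.log10(f_b / f_wt)
    effect_ab = math.log10(f_ab / f_wt)

    # Effect of each mutation in the background already carrying the other.
    effect_a_given_b = effect_ab - effect_b
    effect_b_given_a = effect_ab - effect_a

    epsilon = effect_ab - (effect_a + effect_b)
    if abs(epsilon) <= tol:
        return "none", epsilon

    # Sign epistasis: a mutation is beneficial in one background and
    # deleterious in the other.
    sign_flip = (effect_a * effect_a_given_b < 0) or (effect_b * effect_b_given_a < 0)
    if sign_flip:
        return "sign", epsilon

    if epsilon > 0:
        return "magnitude (synergistic)", epsilon
    return "magnitude (antagonistic)", epsilon


# Hypothetical activities relative to a wild type set to 1.
print(classify_epistasis(1, 10, 10, 100))    # additive on the log scale
print(classify_epistasis(1, 10, 10, 1000))   # more than additive
print(classify_epistasis(1, 10, 10, 20))     # less than additive
print(classify_epistasis(1, 0.5, 10, 50))    # A is deleterious alone, beneficial with B
```

In the last example, mutation A halves activity on its own but gives a five-fold gain once B is present, which is the situation in which the order of mutations starts to matter.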
In the context of evolution, the presence of epistasis results in a “rugged” fitness landscape (“fitness landscape” defined here as a multidimensional representation of the fitnesses of all possible genotypes encompassed by a set of mutations), where the number of accessible pathways from one genotype to another is constrained by intermediates with lower fitness (Figure 1.2 B) (Kaltenbach & Tokuriki 2014; Romero & Arnold 2009; Smith 1970; Starr & Thornton 2016). In those instances, the order in which mutations accumulate becomes essential (Lozovsky et al. 2009; Meini et al. 2015; Mira et al. 2015; Noor et al. 2012; Smith 1970; Weinreich et al. 2006). How prevalent is epistasis? On the scale of an entire protein sequence, epistasis appears to be relatively rare: DMS studies conducted on the RNA recognition motif of poly(A)-binding protein and the IgG-binding domain of protein G found that pairwise epistasis occurred in only 4-5% of double mutants (Melamed et al. 2013; Olson, Wu, & Sun 2014). By contrast, among function-altering mutations, epistasis appears highly prevalent and can drastically alter the effect of mutations (Lozovsky et al. 2009; Meini et al. 2015; Moriuchi et al. 2014; Noor et al. 2012; Weinreich et al. 2006; Yang et al. 2016). A systematic analysis of nine examples of enzyme evolution revealed that 82% of functional mutations exhibit epistasis, with nearly half appearing either neutral or deleterious in the wild-type background, only to become beneficial following the fixation of other mutations along the trajectory (Miton & Tokuriki 2016). Thus, the distribution of activity-enhancing mutations may be altered progressively as adaptation proceeds.
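The idea that epistasis restricts the number of accessible pathways between an ancestral and a derived genotype can be sketched by brute-force enumeration. The example below is illustrative only: it assumes a pathway is "accessible" when fitness never decreases at any step, and both toy landscapes (three mutations, with mutation 2 deleterious unless mutation 0 is already present) are hypothetical. For n mutations there are n! possible orders, so five mutations would give 120 candidate pathways.

```python
from itertools import combinations, permutations


def all_genotypes(n_mutations):
    """Every genotype as a frozenset of the mutation indices it carries."""
    return [
        frozenset(combo)
        for k in range(n_mutations + 1)
        for combo in combinations(range(n_mutations), k)
    ]


def count_accessible_paths(fitness, n_mutations):
    """Count mutation orders along which fitness never decreases.

    `fitness` maps each genotype (a frozenset of mutation indices) to a
    fitness value. Non-decreasing steps are an assumed accessibility rule.
    """
    accessible = 0
    for order in permutations(range(n_mutations)):
        genotype = frozenset()
        path_ok = True
        for mutation in order:
            next_genotype = genotype | {mutation}
            if fitness[next_genotype] < fitness[genotype]:
                path_ok = False
                break
            genotype = next_genotype
        if path_ok:
            accessible += 1
    return accessible


# Smooth landscape: every mutation adds one unit of fitness.
additive = {g: 1.0 + len(g) for g in all_genotypes(3)}

# Rugged landscape: mutation 2 is deleterious unless mutation 0 is present.
rugged = {
    g: (0.5 if (2 in g and 0 not in g) else 1.0 + len(g))
    for g in all_genotypes(3)
}

n_additive = count_accessible_paths(additive, 3)
n_rugged = count_accessible_paths(rugged, 3)
print(f"accessible paths (additive): {n_additive} of 6")
print(f"accessible paths (rugged):   {n_rugged} of 6")
```

In the rugged case only the orders in which mutation 0 precedes mutation 2 remain open, so a single sign-epistatic interaction removes half of the pathways.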
Predicting epistasis and identifying mutations that can collectively improve the enzyme activity by several orders of magnitude is a key challenge for the field of protein evolution and engineering; it requires an in-depth molecular understanding of the epistatic networks embedded within enzyme structures.

[Figure 1.2 image not reproduced. Panel A: phenotypic effect of mutations A, B, and AB under no epistasis, magnitude epistasis, and sign epistasis. Panel B: networks of genotypes from ancestor to derived, with steps marked as increasing or decreasing fitness.]

Figure 1.2 The effect of mutational epistasis. (A) The effects of two mutations when no epistasis is observed (left), when magnitude epistasis is observed (center), and when sign epistasis is observed (right). Concept of figure is adapted from Kaltenbach & Tokuriki (Kaltenbach & Tokuriki 2014). (B) The effects of epistasis on the fitness landscape of a protein. Panel on the left depicts a hypothetical sequence space encompassed by five mutations, where each node (circle) represents a unique genotype, with the ancestral genotype on the left and derived genotype on the right. Genotypes that are separated from each other via single mutations are connected by edges (lines). Panel on the right depicts the restriction in the number of accessible pathways from the ancestral genotype to the derived as a result of epistasis among the mutations, where many intermediate nodes are inaccessible due to a decrease in fitness.

1.5 Molecular mechanisms of activity-enhancing mutations

Understanding the mechanisms by which mutations alter the molecular architecture of enzymes to increase catalytic activity remains a great challenge in the field. Over the years, an increasing number of studies have described the molecular changes occurring in enzymes evolving in nature or the laboratory. Importantly, these studies also demonstrate how enzyme-substrate interactions are altered from promiscuous, sub-optimal interactions to highly organized, optimal ones.
In this section, we highlight and discuss several major mechanisms, providing an atomic-level view of the evolution of novel enzyme functions.

1.5.1 Creation of new interactions with the substrate

One accessible solution for enzyme evolution is the creation of new interactions with a novel substrate. The introduction of a new residue in the active site may, for example, alter electrostatic interactions with the substrate and improve its catalysis. In some cases, the establishment of essential catalytic groups is needed to achieve radical functional transitions. A notable example was described during the natural evolution of cyclohexadienyl dehydratase (CDT) activity from an ancestral solute binding protein (Clifton et al. 2018). In this case of de novo enzyme evolution, the ancestral scaffold possessed an incomplete catalytic machinery; the introduction of a glutamate residue within the binding pocket early in CDT evolution established general-acid catalysis that was essential to the new function. Another remarkable example is the mechanistic transition from atrazine chlorohydrolase (AtzA) to melamine deaminase (TriA), two homologs performing herbicide degradation in nature (Figure 1.3 A). The evolutionary transition requires the Ser-Asn catalytic dyad that assists atrazine dechlorination in AtzA to be replaced by a Cys-Asp dyad for melamine deamination to occur (Noor et al. 2012). A similar mechanism was reported during the directed evolution of a computationally designed retro-aldolase, RA95.0, which initially contained a catalytic Lys210 designed to facilitate C-C bond cleavage of (±) methodol (Althoff et al. 2012; Giger et al. 2013). During the directed evolution of RA95.0, the function of Lys210 was taken over by a new mutation, T83K, which was introduced across the active site in an environment that better promotes catalysis (Figure 1.3 B) (Giger et al. 2013).
Even more surprisingly, further directed evolution resulted in the introduction of three additional mutations in the active site (V51Y, S110N, F180Y), which formed a hydrogen-bonding network with Lys83 (Obexer et al. 2017). This new catalytic tetrad catalyzes retro-aldolization of (±) methodol via a completely retuned catalytic mechanism (Obexer et al. 2017; Zeymer, Zschoche, & Hilvert 2017).

Beyond these extreme cases, new active site interactions appear to reinforce or complement pre-existing ones by providing residue(s) that coordinate unique moieties of the new substrate that are absent from the native substrate. For instance, a six-residue insertion within an active site loop was observed during the neofunctionalization of apicomplexan lactate dehydrogenase (LDH) from an ancestral malate dehydrogenase (AncM/L) (Boucher et al. 2014). This insertion displaces an active site arginine in favor of a new Trp107, forming hydrophobic interactions with the methyl substituent of the novel substrate, pyruvate. Several other studies, including the natural divergence of maltases and isomaltases (Voordeckers et al. 2012) and HisA/TrpF isomerases (Näsvall et al. 2012; Newton et al. 2017), or the laboratory evolution of arylester hydrolysis in PTE phosphotriesterase (Tokuriki et al. 2012), also described a similar phenomenon.

While the creation of a novel interaction with the substrate may be intuitive, this mechanism is not often reported in the literature. One explanation may be that promiscuous substrates typically interact with pre-existing catalytic machinery, taking advantage of its inherent reactivity, albeit in a suboptimal fashion (Bayer, van Loo, & Hollfelder 2017; Sunden et al. 2017). Thus, the creation of new interactions may not be necessary, as functional optimization can be achieved without drastic alterations of enzyme-substrate interactions, particularly when the target substrate closely resembles the native one.

Figure 1.3 Creation of new enzyme-substrate interactions by evolution. (A) Schematic representation of key active site residues in AtzA and TriA. The Ser-Asn dyad, involved in atrazine dechlorination, is replaced by Cys-Asp during the evolution towards melamine deamination (Noor et al. 2012). An additional active site residue, Phe84, is also mutated to leucine. The active site Fe2+ is depicted as a green sphere. (B) Schematic representation of the stepwise evolution of a designed retro-aldolase, RA95.0 (Giger et al. 2013; Obexer et al. 2017). (left) The catalytic dyad (K210-E53 and a water molecule) from the initial RA95.0 design was mutated (centre) to a distinct triad (K83-N110-Y51) in RA95.5-5 during directed evolution. (right) Further evolution resulted in the emergence of a catalytic tetrad (K83-N110-Y51-Y180) in RA95.5-8F. Key active site residues for Panels A and B are depicted as sticks; new residues installed at each evolutionary stage are highlighted in blue. The mutations fixed between each step of the trajectory are indicated on the gray arrows. The chemical structures of atrazine (left) and melamine (right) in Panel A, and the mechanism-based inhibitor of retro-aldolases, 1,3-diketone, in Panel B are shown in purple. Panel B is adapted from Giger et al. (2013).

[Figure 1.3 image labels: (A) AtzA (S331, N328, F84) to TriA (C331, D328, L84): 9 mut. (F84L, N328D, S331C + 6 a.a.). (B) RA95.0 to RA95.5-5: 12 mut. (V51Y, T83K, E53S/T, S110N + 8 a.a.); RA95.5-5 to RA95.5-8F: 18 mut. (F180Y, K210L + 16 a.a.).]

1.5.2 Active-site reshaping

A more prevalent evolutionary mechanism to enhance catalysis is the reshaping of the active site. This mechanism typically involves the tailoring of enzyme-substrate interactions to promote geometric complementarity.
In this case, mutations typically occur around the active site to alter its overall shape and size, without necessarily affecting the electrostatic environment or catalytic machinery. These active site modifications are, of course, case-specific: some enzymes require a narrowing of the active site cavity to promote a snug fit with smaller substrates (Fasan et al. 2007; Fasan et al. 2008; Jiménez-Osés et al. 2014; Khersonsky et al. 2011). For example, during the laboratory optimization of a computationally designed Diels-Alderase (DA_20_00), a ~9,700-fold increase in activity was achieved by the insertion of a 24-residue helix-turn-helix motif and the stepwise contraction of the cavity by multiple mutations to constrain the substrate in a productive orientation (Figure 1.4). This substantial remodeling of the binding pocket improved complementarity to the substrate without affecting the position of the designed catalytic residues (Preiswerk et al. 2014; Siegel et al. 2010). Similar observations were reported during the evolution of cytochrome P450BM3 towards propane hydroxylation (Fasan et al. 2007; Fasan et al. 2008) and the evolution of the adenylation domain of tyrocidine synthetase (TycA) towards smaller (Villiers & Hollfelder 2011) or backbone-modified amino acids (Niquille et al. 2018).

Conversely, an enlargement of the cleft may be required to remove steric hindrance and improve the access of larger substrates to the catalytic machinery (Baier et al. 2019; Clifton et al. 2018; Gould & Tawfik 2005; Kaltenbach et al. 2018; Miton et al. 2018). For instance, the directed evolution of an arylsulfatase (PAS) toward phosphonate hydrolysis resulted in the enlargement of its active site to better accommodate the bulkier new substrate (Miton et al. 2018). Similarly, in the evolution of the metallo-β-lactamase NDM1 towards phosphonate hydrolysis, mutating a tryptophan to glycine removed a steric clash and enhanced enzyme-substrate complementarity (Baier et al. 2019).
Active site reshaping is the most frequently observed mechanism in the literature, and its consequences can be substantial; several studies have observed more than 1,000-fold increases in activity from this molecular strategy. The prevalence of this mechanism suggests, as previously discussed, that enzymes often possess pre-existing residues that are able to form key interactions with promiscuous substrates, such that a modest active site reshaping may ensure rapid functional adaptation.

Figure 1.4 Active site reshaping by evolution. (top) Cut-away views of the substrate binding pocket of designed and evolved Diels-Alderases, in the same orientation, and overlaid with the phosphorylated product analog bound in CE20 (PDB: 4o5t) depicted as sticks. Two key catalytic residues, T195Q and D121Y, introduced by design in the scaffold of a diisopropylfluorophosphatase (DFPase, PDB: 1e1a) to generate the Diels-Alderase activity in mutants DA_20_00 (PDB: 3ilc), CE6 (PDB: 3u0s), CE11 (PDB: 4o5s) and CE20, are highlighted as sticks. A designed 24-residue helix-turn-helix motif that was incorporated into the structure is represented as cartoon, while a buried water molecule, present in all structures, is depicted as a red sphere. The number of mutations fixed between each step is indicated on the grey arrows. (bottom) Catalytic efficiency for the Diels-Alderase reaction at each step of the trajectory, on a log scale. Adapted from Preiswerk et al. (2014) and Siegel et al. (2010).

1.5.3 Conformational tinkering of active site residues

As previously discussed, enzyme evolution involves a significant number of remote mutations that do not directly interact with the substrate.
While the effect of distal mutations is often hard to decipher, a significant number appear to contribute to fine-tuning the position and dynamic motion of active-site residues, a phenomenon known as ‘conformational tinkering’ (Jacob 1977; Yang et al. 2016). For example, during the laboratory evolution of an N-acyl-homoserine lactonase toward paraoxon, two distal mutations contributed to the shift of an active site residue, Phe68, by 3 Å, which promoted interactions with the leaving group of the new substrate (Yang et al. 2016). Moreover, a fascinating sequence of mutational events has been reported by several other studies. (i) An initial mutation introduces a key residue that can potentially interact with the substrate but fails to efficiently do so due to mispositioning or the presence of conformational heterogeneity, i.e., the sampling of multiple discrete rotamers of a residue. (ii) Distal mutations subsequently alter the conformation of this key residue to become more catalytically competent. For example, during the directed evolution of PTE phosphotriesterase toward arylesterase activity, the first mutation that directly interacts with the new substrate, H254R, exhibited conformational heterogeneity. The subsequent fixation of multiple distal mutations contributed to the gradual shift of Arg254 to occupy a bent conformation that provides π–cation interactions with the leaving group of the substrate (Figure 1.5 A) (Campbell et al. 2016; Tokuriki et al. 2012). In addition, the directed evolution of LovD for higher simvastatin synthesis also saw the accumulation of distal mutations that progressively restricted the conformational sampling of Tyr188 within the catalytic triad, stabilizing a conformation that restored the catalytically active geometry (Figure 1.5 B) (Jiménez-Osés et al. 2014). Similar sequential events were observed during the directed evolution of a designed retro-aldolase (Giger et al.
2013) and the natural evolution of chalcone isomerase (Kaltenbach et al. 2018). Finally, during the evolution of a designed Kemp eliminase (KE07) (Khersonsky et al. 2010), the sampling of three distinct active site configurations, each with different catalytic efficiency, was observed along the evolutionary trajectory (Hong et al. 2018). Over time, distal mutations resulted in the stabilization of the most catalytically efficient configuration, exhibiting improved positioning and orientation of a key Trp50 with respect to the substrate.

The conformational tinkering of active site residues might be an important mechanism in enzyme evolution. While the introduction of a key residue in the active site may provide a stepping stone for further evolution, its configuration may be inadequate within its novel surrounding environment. Thus, the alteration of the residue’s position and dynamics by other mutations, including distal ones, is essential to achieve efficient catalysis. The most extreme manifestation of this phenomenon is ‘conformational selection’, as described by Hong et al., which may be closely linked to the evolvability of promiscuous functions (Aharoni et al. 2005; Hong et al. 2018; Ma & Nussinov 2010; Maria-Solano et al. 2018; Tokuriki & Tawfik 2009).

Figure 1.5 Conformational tinkering of active site residues by evolution. (A) Repositioning of the mutated active site residues, H254R and D233E (depicted as sticks), over the course of PTE evolution toward 2-naphthyl hexanoate (2NH) hydrolysis (from left to right, PDB ID: 4pcp, 4xaf, 4xd5, 4xag, 4xay, and 4e3t) (Campbell et al. 2016). (B) Representative snapshots of the reorganization of the K79-S76-Y188 catalytic triad during LovD evolution, observed by MD simulations. Catalytic and noncatalytic regimes are depicted in purple and red, respectively. Tyr188 is gradually shifted by distal mutations from a non-catalytic to a catalytic orientation.
The percentages indicate the occupation of the representative side-chain conformation in simulation time. Red (dotted) arrows indicate the sampling of multiple conformational rotamers. The total number of mutations acquired at each evolutionary stage is indicated beneath the names of the variants. Panel B was adapted from Jiménez-Osés et al. (2014).

1.5.4 Repositioning or switching metal cofactors

Analogous to the ‘conformational tinkering’ of a key residue, the repositioning of active site cofactors, such as metal ions, has been described as a recurring evolutionary mechanism (Sugrue et al. 2016). The laboratory evolution of serum paraoxonase PON1 towards phosphotriester hydrolysis identified an active site mutation, H115W, which altered the metal coordination of the catalytic Ca2+, following a 1.8 Å upward displacement of the cofactor (Aharoni et al. 2004; Ben-David et al. 2020; Ben-David et al. 2013). Interestingly, MD simulations suggest that the Ca2+ position is plastic: the upward metal position appears to have pre-existed in wild-type PON1, and several mutations shifted the equilibrium toward the upward metal coordination, which is more catalytically competent for the phosphotriesterase activity. A similar phenomenon was also observed during PTE evolution towards arylesterase activity, where the distance between two active site Zn2+ ions decreased in the most evolved variant (Kaltenbach et al. 2015; Tokuriki et al. 2012). Intriguingly, the subsequent ‘reverse’ evolution of PTE (back towards its native phosphotriesterase activity) resulted in the precise restoration of the Zn2+ distances to the wild-type configuration (Kaltenbach et al. 2015). Finally, in the directed evolution of metallo-β-lactamase BcII towards broader antibiotic specificity, a decrease in the distance between two active site Zn2+ ions appeared to be critical to stabilizing the new reaction intermediate and improving the rate-limiting step of the reaction (González et al.
2016; Tomatis et al. 2008).

Furthermore, several studies have highlighted the importance of promiscuous metal binding in modulating metalloenzymes’ specificity (Baier et al. 2015). For instance, examination of the specificity of EcoRV restriction endonuclease revealed a single leucine to isoleucine mutation that was able to invert the enzyme’s affinity from Mg2+ to Mn2+ and, in turn, alter specificity (Vipond, Moon, & Halford 1996). In natural or laboratory evolution, the switching of cofactor(s) is thus another avenue through which novel functions may emerge (Anderson et al. 2019).

1.5.5 Alteration of enzyme dynamics

Beyond local conformational tinkering, distal mutations can modulate the dynamics of larger structural elements, from loops to whole protein domains, which may be essential to complete catalytic cycles (Bhabha et al. 2015; Fraser & Jackson 2011; Henzler-Wildman & Kern 2007; Petrović et al. 2018). However, it may be hard to capture the extent of, and the precision with which, conformational dynamics are fine-tuned during enzyme evolution. A case in point was made by a study that attempted to enhance the enantioselectivity of a haloalkane dehalogenase, DhaA, for β-bromoalkanes by transplanting the active site features (eight amino acid substitutions and an 11-amino acid insertion) of a closely related enantioselective homolog, DbjA, into the DhaA scaffold (Sykora et al. 2014). While the active site geometry of the chimeric enzyme (DhaA12) was virtually identical to DbjA, the hybrid enzyme failed to become enantioselective due to inadequate amplitudes of motions and hydration levels (Sykora et al. 2014).

In enzymes for which catalytic cycles proceed via significant conformational transitions to accommodate distinct reaction intermediates, e.g., via the sampling of open/closed states, an evolutionary reshaping of such transitions may be crucial to adapt to a new catalytic cycle (Buller et al. 2015; Fraser et al.
2009; Maria-Solano, Iglesias-Fernández, & Osuna 2019; Otten et al. 2018). Two aforementioned evolutionary studies illustrate this scenario: PTE and CDT (Campbell et al. 2016; Clifton et al. 2018; Kaczmarski et al. 2020). In the former case, a series of mutations along the evolutionary trajectory anchored the active site loop 7 in a productive (closed) conformation, while eliminating the sampling of the non-productive (open) conformation (Figure 1.6) (Campbell et al. 2016). By contrast, other mutations increased the flexibility of another active site loop (loop 5), thus resulting in markedly different motions when compared to the wild-type enzyme. In the latter example, inactive or weakly active CDT ancestors were found to mainly adopt an open, non-catalytic conformation (Clifton et al. 2018; Kaczmarski et al. 2020). The conformational landscape of CDTs was gradually altered by evolution, such that the closed, active conformation became increasingly stabilized, leading to improvements in catalytic efficiency. While we have emphasized the importance of tailoring conformational dynamics, the extent to which such optimization becomes essential for enzyme evolution remains under debate. For instance, contrasting with previous examples, a study of two natural homologs of β-lactamases and their chimeric variants demonstrated that the alteration of protein dynamics is neither necessarily associated with, nor required for, changes in catalytic activity (Gobeil et al. 2014; Gobeil et al. 2019). On the other hand, an extensive structural study of several variants of TEM-1 β-lactamase unveiled that two separate mutations caused opposite and incompatible dynamic changes to the structure of TEM-1, despite both enhancing the hydrolysis of cefotaxime, a novel antibiotic (Dellus-Gur et al. 2015). Thus, we urgently need to advance our understanding of the role of enzyme dynamics in function and evolution in order to design better enzyme engineering strategies (Broom et al. 2020; Davey et al. 2017).
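As a rough illustration of the open/closed population shifts described in this section for PTE loop 7 and the CDT ancestors, the following Python sketch treats an enzyme as a simple two-state Boltzmann system. This is an assumption made purely for illustration: the function name, the temperature of 298.15 K, and the free-energy differences are hypothetical and are not values taken from the cited studies.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def closed_fraction(dg_closed_minus_open, temperature_k=298.15):
    """Equilibrium fraction of enzyme in the closed state of a two-state
    (open/closed) model.

    dg_closed_minus_open is G(closed) - G(open) in kcal/mol; negative
    values favour the closed state.
    """
    rt = R_KCAL * temperature_k
    return 1.0 / (1.0 + math.exp(dg_closed_minus_open / rt))


if __name__ == "__main__":
    # Hypothetical free-energy differences (kcal/mol), not measured values.
    for dg in (2.0, 0.0, -2.0):
        frac = closed_fraction(dg)
        print(f"dG = {dg:+.1f} kcal/mol -> closed fraction = {frac:.3f}")
```

In this toy model, stabilizing the closed state by a few kcal/mol is enough to move the population from mostly open to mostly closed. Real enzymes sample more than two states, so the sketch only conveys the direction and rough scale of such a shift.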

Figure 1.6 Alteration of enzyme conformational dynamics by evolution. Wild-type PTE (PDB ID: 4pcp) and variants R6 (PDB ID: 4xag) and R22 (PDB ID: 4pcn) crystal structures are shown in cartoon putty representation (Campbell et al. 2016). The B-factors are visualized by a colour scale mapped onto the structure, and by ribbon thickness. The active site location is indicated by a red circle, and two Zn2+ ions are shown as green spheres. The number of mutations fixed between each step of the trajectory is indicated on the gray arrow.

[Figure 1.6 image labels: PTE WT, PTE R6, PTE R22; Loop 5 and Loop 7 marked on each structure; colour scale from Rigid to Flexible; arrows labelled 6 mut. and 20 mut.]

1.5.6 Tailoring the enzyme-substrate complex

Finally, molecular changes can directly impact enzyme-substrate interactions during enzyme evolution. In a simple model, the optimization of enzyme function can be thought of as the process leading to a higher frequency of productive binding events between an enzyme and its substrate (Bar-Even et al. 2011; Gamage et al. 2005). While it is challenging to capture molecular evidence of an increase in productive complexes per se, some studies have been able to observe a transition from predominantly non-productive substrate binding modes to more productive ones. For instance, an early evolved mutant of a designed Kemp eliminase, HG2, was found to bind a transition state analog (TSA), 6-nitrobenzotriazole, in both productive and non-productive, or ‘flipped’, orientations (Figure 1.7 A) (Privett et al. 2012). Further evolution fixed the mutation K50Q in mutant HG3 and introduced a new H-bond donor with the TSA. This stabilized the productive complex and caused a >700-fold increase in kcat/KM in subsequent mutants, HG3.17 and HG4 (Blomberg et al. 2013; Broom et al. 2020). The elimination of non-productive binding modes was also described during the ancestral reconstruction of chalcone isomerase (CHI) (Kaltenbach et al. 2018) and the directed evolution of alcohol dehydrogenase A (Hamnevik et al. 2017).
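To put fold changes such as the >700-fold increase in kcat/KM on an energy scale, the sketch below converts a fold change in catalytic efficiency into an apparent change in transition-state stabilization, using the standard transition-state-theory relation ΔΔG‡ = -RT ln(fold change). The temperature of 298.15 K is an assumption that is not stated in the text, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_fold(fold_change, temperature_k=298.15):
    """Apparent change in activation free energy (kcal/mol) implied by a
    fold change in kcat/KM. Negative values mean the evolved enzyme
    stabilizes the transition state more than the starting enzyme."""
    if fold_change <= 0:
        raise ValueError("fold_change must be positive")
    return -R_KCAL * temperature_k * math.log(fold_change)


if __name__ == "__main__":
    # Fold changes quoted in this chapter; the temperature is assumed.
    for fold in (700, 1000, 9700):
        print(f"{fold}-fold -> ddG = {ddg_from_fold(fold):.2f} kcal/mol")
```

Under these assumptions, a 1,000-fold improvement corresponds to roughly 4 kcal/mol of additional apparent transition-state stabilization, which is on the order of one or two hydrogen bonds. This is consistent with the idea that modest structural adjustments can produce large changes in activity.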
Beyond the dichotomy described above, it may be more realistic to consider the existence of a continuum of binding states (or E·S complexes), leading to diverse catalytic efficiencies that depend on the ‘optimality’ of the enzyme-substrate interactions, e.g., the orientation and proximity between the reactants. Indeed, promiscuous substrates may generally form ‘less’ productive or inadequate interactions with non-cognate enzymes (Babtie, Tokuriki, & Hollfelder 2010; Bar-Even et al. 2011). Consequently, the optimization of an enzymatic activity could require the subtle tinkering of enzyme-substrate interactions toward more ‘optimal’ productive binding without major alterations of the catalytic machinery or overall structure. Such tailoring was demonstrated by the directed evolution of PAS arylsulfatase toward phosphonate hydrolysis, where a modest enlargement of the active site cleft by two proximal mutations ‘unlocked’ the access to the catalytic machinery for the bulkier substrate (Miton et al. 2018). Interestingly, this active site reshaping was accompanied by the formation of a new Michaelis complex upon substrate repositioning (Figure 1.7 B). As E·S interactions, in terms of distance and electrostatic complementarity, became more optimally oriented with respect to the catalytic center, evolution resulted in a large increase in TS stabilization. Similar observations were made during the evolution of a designed Kemp eliminase, KE59 (Khersonsky et al. 2012).

It remains difficult to establish which type of mechanism prevails in enzyme evolution, as only a handful of studies have extensively characterized the gradual changes of enzyme-substrate interactions during enzyme evolution, leaving this phenomenon largely undocumented.
These limitations result from the challenge of gathering experimental evidence of subtle changes in enzyme-substrate interactions, in particular for promiscuous substrates that exhibit low catalytic efficiency in the initial stages of evolution. Yet, resolving atomic-level views of such interactions is essential to advance our understanding of enzyme function and evolution (Yabukarski et al. 2019).

Figure 1.7 Substrate repositioning leads to novel enzyme-substrate complexes in evolution. (A) The evolution of a designed Kemp eliminase produced a weakly active mutant, HG2 (PDB ID: 3nyd), exhibiting two binding modes for the TS analog, 6-nitrobenzotriazole (Privett et al. 2012). (left) Asp127 binds the substrate in a productive orientation (teal sticks), as designed. (right) However, a non-productive, flipped orientation (purple sticks) is also observed in the crystal structure. Further evolution stabilized the productive complex upon mutations K50Q and S265T, which eliminated the non-productive orientation in subsequent mutants, e.g., in the most evolved HG4 variant (PDB ID: 5rgf) (Blomberg et al. 2013; Broom et al. 2020). A red circle signals the position of the isoxazolic oxygen in the original substrate. (B) Substrate repositioning induced by mutations T50A and M72V, from PASWT (PDB ID: 1hdh) to PASG9 (PDB ID: 4cxk) during PAS arylsulfatase evolution toward phosphonate monoester hydrolysis (Miton et al. 2018). Evolution shifted the substrate closer to the catalytic machinery (Fgly nucleophile, K375 and H211) without altering the position of the catalytic residues or creating new enzyme-substrate interactions. Snapshots are representative stationary points from MD simulations.

1.6 Evolution of xenobiotic activities

1.6.1 Xenobiotic-degrading enzymes

The past century has seen the development and subsequent introduction of a number of anthropogenic xenobiotic chemicals into the environment.
Some well-known compounds include pesticides such as dichlorodiphenyltrichloroethane (DDT), insulating fluids such as polychlorinated biphenyls (PCBs), and synthetic polymers such as various forms of plastics (Copley 2009; Fang et al. 2014). While important for numerous agricultural, medical, and military purposes, a number of these xenobiotic compounds are also highly recalcitrant and, furthermore, some have been found to pose significant health risks to wildlife and humans (Copley 2009; Fang et al. 2014).

A number of microorganisms isolated over the past few decades exhibit the ability to degrade xenobiotic compounds (Copley 1998; Copley 2000; Copley 2009; Fang et al. 2014). This trait is often brought about by the acquisition of xenobiotic-degrading enzymes, which have evolved to either detoxify these new substrates, or to degrade them so that they can be utilized as a source of nutrients (Copley 1998; Copley 2000; Janssen et al. 2005; Ufarté et al. 2015). Some examples include atrazine chlorohydrolase (AtzA), which catalyzes the dechlorination of atrazine (Noor et al. 2012; Scott et al. 2009), and cis-3-chloroacrylic acid dehalogenase (cis-CaaD) and trans-3-chloroacrylic acid dehalogenase (CaaD), which are involved in the degradation of 1,3-dichloropropene (van Hylckama Vlieg & Janssen 1991). The evolution of these enzymes is hypothesized to have occurred via the recruitment and optimization of latent promiscuous xenobiotic activities in ancestral sequences (Copley 2000; Copley 2003). However, in spite of their presumably recent emergence, many xenobiotic-degrading enzymes exhibit low levels of sequence identity with their closest homologs in the databases.
For example, the aforementioned cis-CaaD shares only 34% sequence identity with its closest known homolog in the tautomerase superfamily (Poelarends et al. 2004). Such divergence makes the identification of the progenitor sequences and the specific mutations that enabled the emergence of the novel xenobiotic activities challenging.

1.6.2 Organophosphates

Organophosphates (OPs) are a class of compounds that possess the general structure of a central phosphate with three ester linkages (Bigley & Raushel 2013; Karalliedde et al. 2001; Singh 2009). There are three major classes of OPs: phosphotriesters, thiophosphotriesters, and phosphorothiolesters (Figure 1.8 A) (Bigley & Raushel 2013). Phosphotriesters consist of a phosphate centre joined to three O-linked groups, thiophosphotriesters have the phosphoryl oxygen replaced by a sulfur, and phosphorothiolesters have at least one of the ester oxygens replaced by a sulfur (Figure 1.8 A) (Bigley & Raushel 2013). OPs function as powerful neurotoxins by inhibiting the enzyme acetylcholinesterase, causing an excess buildup of the neurotransmitter acetylcholine in the body, which results in overstimulation of the nervous system (Ghanem & Raushel 2005; Naughton & Terry 2018). While some OPs such as cyclipostins and cyclophostin are naturally occurring compounds (Nguyen et al. 2017), the majority of OPs currently present in the environment are anthropogenically produced.

Anthropogenic OPs have existed for less than a century. In the 1930s, the German scientist Gerhard Schrader, working at IG Farbenindustrie, began experimenting with a series of synthetic OP compounds as potential pesticides (Karalliedde et al. 2001). However, their utility in chemical warfare soon became apparent to the Nazi government, which began developing OP-based nerve gases (Karalliedde et al. 2001).
Throughout World War II, IG Farbenindustrie would synthesize a number of different nerve agents, known as the G-series, which included OP compounds such as tabun, sarin, and soman (Karalliedde et al. 2001). While fortunately the Nazis did not use these weapons during World War II, there have been several documented cases of OP nerve agents being employed in chemical warfare in subsequent decades, including the Iraqi military attack on Kurdish civilians in the 1980s, the 1995 Tokyo sarin attack by the terrorist group Aum Shinrikyo, and their use in 2013 during the ongoing Syrian Civil War (Naughton & Terry 2018). OPs have also been employed in high-profile assassinations, including that of Kim Jong-nam, the eldest son of the former North Korean leader, Kim Jong-il, in 2017, as well as the attempted assassination of a former Russian military intelligence officer, Sergei Skripal, and his daughter in 2018 (Naughton & Terry 2018).

Following World War II, a number of Allied nations, including the United States, were able to gain access to German research and start synthesizing OPs themselves (Karalliedde et al. 2001). The use of OP pesticides for agricultural purposes began in the 1950s, and became particularly prevalent in the 1970s, when organochlorine pesticides such as DDT began to fall out of favour (Naughton & Terry 2018). OPs also play an important role in controlling a number of mosquito-borne illnesses such as malaria, Zika, and West Nile Virus (Naughton & Terry 2018). The toxicity of OPs to humans and other mammals, however, makes them a health hazard: every year, thousands of cases of both acute and chronic OP poisoning are reported (Naughton & Terry 2018).

Figure 1.8 Structure of organophosphates and catalytic mechanisms of OPH enzymes. (A) Chemical structures of organophosphates.
Examples are shown for each of the three major classes: phosphotriester (methyl-paraoxon, left), thiophosphate (methyl-parathion, center), and phosphorothiolate (malathion, right). (B) Reaction scheme of general OP hydrolysis. X denotes a phosphoryl sulfur or oxygen atom. R1 and R2 can be either aryl or alkyl groups. Y denotes the leaving group that can be a phenol, thiol, or fluoride group. (C) Proposed catalytic mechanism of OP hydrolysis by PTE. A bridging water molecule is coordinated between the two active site metal ions, and is deprotonated by Asp301 to carry out nucleophilic attack on the substrate. (D) Proposed catalytic mechanism of OP hydrolysis by PON1 and DFPase. In contrast to PTE, only a single metal ion is employed, and the mechanism proceeds through a covalent intermediate formed between the OP substrate and the catalytic Asp229/269 of the enzymes. Panels B and C are adapted from Bigley & Raushel (2013).

1.6.3 The evolution of organophosphate-degrading enzymes

In spite of the relatively recent introduction of anthropogenic OPs into the environment, there are already a number of enzymes that have acquired the ability to degrade these substrates through hydrolysis of one of the ester bonds (Figure 1.8 B) (Bigley & Raushel 2013; Furlong et al. 1991; Ghanem & Raushel 2005; Zhang et al. 2005; Ramanathan & Lalithakumari 1999; Russell et al. 2011; Serdar et al. 1982; Singh 2009; Singh & Walker 2006). The labile group of OPs is commonly a phenol or thiol (Bigley & Raushel 2013).
Enzymes that carry out this reaction have been isolated from a number of different organisms, including bacteria, squid, and mammals (Bigley & Raushel 2013; Ghanem & Raushel 2005; Singh 2009). While the evolutionary histories of many of these sequences are currently unknown, it is theorized that similarities between the TS of OPs and those of the enzymes’ native substrates resulted in the existence of OP hydrolase (OPH) activities prior to the widespread use of these compounds (Elias & Tawfik 2011; Elias et al. 2008). Indeed, a number of lactonase enzymes in both the amidohydrolase and MBL superfamilies have been found to possess promiscuous OPH activity (Afriat-Jurnou et al. 2006; Afriat-Jurnou et al. 2012; Baier & Tokuriki 2014; Elias et al. 2008; Elias & Tawfik 2011; Hiblot et al. 2012; Luo et al. 2014). These latent promiscuous activities were theorized to have been recruited and optimized via evolution when OPs became more prevalent in the environment.

The best characterized OPH enzyme is phosphotriesterase (PTE), an enzyme belonging to the amidohydrolase superfamily (Figure 1.9 A) (Bigley & Raushel 2013; Ghanem & Raushel 2005). PTE has been isolated from a number of different strains of bacteria, and is thought to have evolved to provide the host organisms the ability to utilize various OP substrates in the environment as sources of phosphate and carbon. PTE is a metalloenzyme with a binuclear active site that binds two zinc (Zn2+) ions (although the enzyme is also active with other divalent metal ions, including manganese, cadmium, and nickel) (Bigley & Raushel 2013; Hong & Raushel 1996). Extensive structural, biochemical, and computational probing of PTE has revealed that the chemical reaction catalyzed by the enzyme occurs via nucleophilic attack by a hydroxide ion that bridges the two metal ions (Figure 1.8 C) (Aubert, Li, & Raushel 2004; Bigley & Raushel 2013; Hong & Raushel 1996).
Currently, the closest homologous sequences of PTE are N-acyl homoserine (AHL) lactonases, and it has been found that many lactonases do possess promiscuous activity towards OP substrates (Afriat-Jurnou et al. 2006; Afriat-Jurnou et al. 2012; Elias et al. 2008; Elias & Tawfik 2011; Singh 2009). This has led to the hypothesis that PTE evolved through the recruitment of a serendipitously existing OPH activity in an ancestral lactonase enzyme.

Other OPH enzymes include bacterial methyl-parathion hydrolase (MPH) from the metallo-β-lactamase superfamily (Dong et al. 2005), bacterial organophosphorus acid anhydrolase (OPAA) from the prolidase superfamily (Cheng, Harvey, & Chen 1996), and mammalian serum paraoxonase (PON1) (Furlong et al. 1991) and squid diisopropylfluorophosphatase (DFPase) (Hoskin & Long 1972) from the calcium-dependent phosphotriesterase superfamily (Figure 1.9). Furthermore, a number of insect α-carboxylesterases, which are from the serine hydrolase superfamily, have been found to have acquired OPH activity, thus providing their hosts with pesticide resistance (Jackson et al. 2013; Newcomb et al. 1997; Russell et al. 2011). The distinct protein folds and catalytic mechanisms of different OPH enzymes indicate that the function evolved independently. Experiments conducted on PON1 and DFPase, for instance, reveal that only one active site metal ion is employed during the reaction, which proceeds through a covalent enzyme-bound intermediate (Figure 1.8 D) (Bigley & Raushel 2013). Thus, convergent evolution has given rise to a number of different OPH enzymes in the past several decades; however, the progenitor sequences and mutations that have enabled the optimization of OPH activity in many of these cases are currently unknown.

Figure 1.9 Crystal structures of OPH enzymes from different superfamilies.
Cartoon representations of structures of (A) PTE from the amidohydrolase superfamily, (B) MPH and OPHC2 from the metallo-β-lactamase superfamily, (C) OPAA from the prolidase superfamily, and (D) PON1 and DFPase from the calcium-dependent phosphotriesterase superfamily. Green spheres represent the active site metal ions. [Figure 1.9 panel labels: PTE, PDB ID: 1dpm; MPH, PDB ID: 1p9e; OPHC2, PDB ID: 4le6; OPAA, PDB ID: 3l7g; PON1, PDB ID: 1v04; DFPase, PDB ID: 2gvv.]

1.7 Aims and scope of thesis

The overarching aim of this thesis is to unveil the processes by which novel functions emerge in sequences, using the evolution of xenobiotic OPH activity as a model. Utilizing directed evolution and ASR, I seek to uncover the molecular mechanisms by which a promiscuous activity in an enzyme can be optimized. Moreover, I will analyze the impact that epistasis has on enzyme evolution, and how cryptic genetic variation may affect the evolutionary potentials of different sequences.

In Chapter 2, I will describe the directed evolution experiment conducted on an AHL lactonase towards the hydrolysis of the OP substrate, paraoxon, to examine the evolvability of the promiscuous activity. Furthermore, through the use of structural analyses, coupled with extensive mutational and biochemical analyses, I demonstrate how more distant mutations can play a key role in evolution by rearranging residues in the active site. In Chapter 3, I will describe the use of ASR to identify the key mutations that enabled the evolution of OPH activity in the enzyme methyl-parathion hydrolase (MPH) from a lactonase ancestor. By applying extensive statistical analyses on the adaptive fitness landscape of MPH, I uncovered the complex network of epistatic interactions between the key mutations, and how these interactions change in response to subtle differences in substrate substituents. 
In Chapter 4, I investigate the evolvabilities of different genetic backgrounds towards OPH activity by conducting both mutational analyses and directed evolution on several extant lactonase orthologs of MPH. The mutations that enabled the evolution of OPH activity in MPH fail to substantially increase the activity in the orthologs, and only the lactonase ancestor of MPH is able to rapidly optimize the novel function. In Chapter 5, I will discuss the major conclusions that are drawn from this work, and suggest further experiments that can be conducted to advance our understanding of enzyme evolution and sequence-structure-function relationships.

Chapter 2: Evolution of organophosphate hydrolase activity in an N-acyl homoserine lactonase

2.1 Summary

How remote mutations can lead to changes in enzyme function at a molecular level is a central question in evolutionary biochemistry and biophysics. Here, we combine laboratory evolution with biochemical, structural, genetic and computational analysis to dissect the molecular basis for the functional optimization of phosphotriesterase activity in a bacterial lactonase (AiiA) from the metallo-β-lactamase (MBL) superfamily. We show that a 1000-fold increase in phosphotriesterase activity is caused by a more favourable catalytic binding position of the paraoxon substrate in the evolved enzyme that resulted from conformational tinkering of the active site through peripheral mutations. In particular, a non-mutated active site residue, Phe68, was displaced by ~3 Å through the indirect effects of two second-shell trajectory mutations, enabling favourable molecular interactions between the residue and paraoxon. Comparative mutational scanning, i.e., examining the effects of alanine-mutagenesis on different genetic backgrounds, revealed significant changes in the functional roles of Phe68 and other non-mutated active site residues caused by the indirect effects of trajectory mutations. 
Our work provides a quantitative measurement of the impact of second-shell mutations on the catalytic contributions of non-mutated residues, and unveils the underlying intramolecular network of strong epistatic mutational relationships between active site and more remote residues. Defining these long-range conformational and functional epistatic relationships has allowed us to better understand the subtle, but cumulatively significant, role of second- and third-shell mutations in evolution.

2.2 Introduction

The evolution of new enzymatic functions requires remodelling of the active site in order to accommodate and properly position the new substrate. Direct contributions of mutations, i.e., mutations that lead to gain or loss of interaction(s) with the substrate, can cause a significant change in enzyme function (Miton & Tokuriki 2016; Morley & Kazlauskas 2005). Direct contributions can often be rationally elucidated, and many tools have been developed to predict changes in enzyme function (Chica, Doucet, & Pelletier 2005; Kries, Blomberg, & Hilvert 2013). However, it has been observed in enzyme evolution studies that function-altering mutations can occur at residues that are outside the active site, in the second (active-site periphery) and even third (surface) shells of the protein. These residues do not directly interact with the substrate, and thus affect catalytic activity through indirect means (Ben-David et al. 2013; Bocola et al. 2004; Jiménez-Osés et al. 2014; Miton & Tokuriki 2016; Morley & Kazlauskas 2005; Oelschlaeger 2005; Oue et al. 1999; Sykora et al. 2014). Such indirect contributions of mutations often improve catalytic activity through “molecular tinkering” (Bridgham et al. 2010; Jacob 1977), including fine-tuning the shape of the active site (Tokuriki et al. 2012) and the position of catalytic residue(s) (Bocola et al. 2004; Jiménez-Osés et al. 2014) and catalytic metal ion(s) (Ben-David et al. 2013; Tomatis et al. 
2005), and/or altering the dynamics of the enzyme (Dellus-Gur et al. 2015). Although the importance of indirect contributions during enzyme evolution is widely recognized, it is still challenging to elucidate and predict indirect mutational effects (Brodkin et al. 2015). Moreover, direct evidence and quantitative experimental measurements of the extent to which second- and third-shell mutations can affect catalytic function through fine-tuning of active site residue(s) remain limited (Benkovic & Hammes-Schiffer 2003; Goodey & Benkovic 2008). Elucidating the structural and functional role of such mutations and quantifying their contributions to catalytic activity is crucial for our understanding of sequence-structure-function relationships.

Laboratory (or directed) evolution has been successfully employed to gain fundamental insights into the molecular mechanisms of enzyme evolution (Fischer, Kang, & Brindle 2016). Strong selection pressure can be utilized to prevent the fixation of neutral mutations, thereby reducing the number of mutational steps required for adaptation to a selective pressure. In this study, we used laboratory evolution to generate an evolutionary trajectory towards phosphotriesterase (PTE) activity from an N-acyl homoserine lactonase in the metallo-β-lactamase (MBL) superfamily. Bacterial phosphotriesterases evolved within the last half-century to hydrolyze the P-O bond of xenobiotic organophosphate (OP) pesticides, such as paraoxon and parathion (Figure 2.1 B) (Sethunathan & Yoshida 1973; Singh 2009; Singh & Walker 2006). To date, two major classes of bacterial phosphotriesterases have been identified, one belonging to the amidohydrolase (AH) superfamily and one to the MBL superfamily (Bigley & Raushel 2013). While the evolutionary origin and mechanisms of phosphotriesterases in the AH superfamily have been extensively studied (Afriat-Jurnou et al. 2006; Afriat-Jurnou et al. 2012; Bigley & Raushel 2013; Jackson et al. 
2008; Meier et al. 2013), comparatively little is known about phosphotriesterases from the MBL superfamily. Methyl-parathion hydrolase (MPH) was the first phosphotriesterase identified in the MBL superfamily (Dong et al. 2005; Zhongli, Shunpeng, & Guoping 2001). Phylogenetic and enzymatic characterizations of MPH homologous enzymes have indicated that, as with phosphotriesterases from the AH superfamily, MPH most likely evolved from lactonases, enzymes that hydrolyze the C-O ester bond of the lactone ring and are involved in interfering with bacterial quorum sensing (Figure 2.1 B) (Elias & Tawfik 2011). Amongst other lactonases from the MBL superfamily, the N-acyl homoserine lactonase (AHLase) of Bacillus thuringiensis, AiiA, exhibits promiscuous phosphotriesterase activity (Figure 2.1 A) (Baier & Tokuriki 2014). However, the evolutionary potential and plausible evolutionary trajectories of these MBL lactonases toward phosphotriester hydrolysis have not been examined.

In this study, we evolved the promiscuous phosphotriesterase activity of the AiiA lactonase in the laboratory. Six rounds of random mutagenesis and screening of the library of mutants for higher phosphotriesterase activity yielded a variant with eight amino acid substitutions that exhibited a ~1000-fold improvement in its activity towards the organophosphate paraoxon. We performed structural characterizations of the wild-type enzyme and an evolved variant to reveal the molecular mechanism underlying the functional change. Additionally, we conducted extensive mutational analyses to elucidate the functional role of the remote mutations in repositioning non-mutated active-site residues and changing their catalytic contributions. Our work provides a detailed molecular description of the evolution of a new enzymatic function and demonstrates how mutations indirectly affect catalytic activity through conformational tinkering.

Figure 2.1 Protein structures and enzymatic reactions of AiiA and MPH. 
(A) Structural overlay of B. thuringiensis N-acyl homoserine lactonase, AiiA (grey, PDB ID: 3dhb), and the Pseudomonas sp. WBC-3 methyl-parathion hydrolase, MPH (deep blue, PDB ID: 1p9e). The active site metals are shown as beige spheres. (B) Reaction scheme of the native homoserine lactonase (top) and promiscuous phosphotriesterase (bottom) activities of AiiA. A proposed mechanism of AiiA for lactonase activity is described in Figure 2.2 (Momb et al. 2008).

Figure 2.2 Proposed catalytic mechanism of AiiA for native and promiscuous substrates. (A) N-Acyl homoserine lactone hydrolysis by AiiA as described by Momb et al. (2008). Metal cations are coordinated by adjacent histidines, Asp191, and the bridging hydroxide ion, which serves as a nucleophile that attacks the carbonyl carbon of the AHL substrate (Momb et al. 2008). Binding of the substrate is stabilized by hydrogen bond interactions between the two carbonyl groups of the substrate and the metal ions, as well as interactions between the O3 of the substrate and Phe107 via a water molecule (Momb et al. 2008). (B) Proposed binding and mechanism of paraoxon-ethyl hydrolysis by AiiA. The hydroxide ion that serves as the nucleophile in lactone hydrolysis (Momb et al. 2008) likely plays the same role in paraoxon hydrolysis.

2.3 Materials and Methods

2.3.1 Molecular cloning of AiiA variants and mutant libraries

AiiA gene variants and mutant libraries were sub-cloned into a modified pET27(b) vector (Novagen), in which the N-terminal pelB leader sequence was replaced with a Strep-tag II sequence (WSHPQFEK). PCR products and vectors were digested with Nco I and Hind III (Thermo Scientific) for 3 hours at 37°C. The digested vector was further treated with FastAP (alkaline phosphatase, Thermo Scientific) for an additional hour. 
Digested DNA was purified from a 1% agarose gel using a gel extraction kit (Qiagen). Ligations were performed in 20 µL reactions at a vector:insert molar ratio of 1:3 using T4 DNA ligase (Thermo Scientific) with approximately 30 ng vector DNA, and incubated at room temperature for 2 hours. The ligation mixtures were transformed into E. cloni 10G cells (Lucigen), yielding >10⁵ colonies for mutant libraries. Colonies containing mutant library variants were pooled, and the plasmids were purified using a plasmid purification kit (Qiagen) and retransformed into E. coli BL21 (DE3) cells for enzyme expression and activity screening.

2.3.2 Site-directed mutagenesis

Single-point mutant variants were constructed by site-directed mutagenesis as described in the QuikChange Site-Directed Mutagenesis manual (Agilent) using specific primers, which are listed in Table A.2. The genotypes of all variants were confirmed by DNA sequencing.

2.3.3 Generation of mutagenized libraries

Random mutant libraries were generated with error-prone PCR using nucleotide analogues (8-oxo-2'-deoxyguanosine-5'-triphosphate (8-oxo-dGTP) and 2'-deoxy-P-nucleoside-5'-triphosphate (dPTP); TriLink). Two independent PCRs were prepared, one with 8-oxo-dGTP and one with dPTP. Each 50 µL reaction contained 1 × GoTaq Buffer (Promega), 3 µM MgCl2, 1 ng template DNA, 1 µM of primers (forward (T7 promoter): taatacgactcactataggg; reverse (T7 terminator): gctagttattgctcagcgg), 0.25 mM dNTPs, 1.25 U GoTaq DNA polymerase (Promega) and either 100 µM 8-oxo-dGTP or 1 µM dPTP. PCR cycling conditions: initial denaturation at 95°C for 2 minutes followed by 20 cycles of denaturation (30 seconds, 95°C), annealing (60 seconds, 58°C) and extension (70 seconds, 72°C) and a final extension step at 72°C for 5 minutes. Subsequently, each PCR was treated with Dpn I (Thermo Scientific) for 1 hour at 37°C to digest the template DNA. 
PCR products were purified using the Cycle Pure PCR purification kit (Omega Bio-tek) and further amplified with a 2 × Master mix of Econo TAQ DNA polymerase (Lucigen) using 10 ng template from each initial PCR and the same primer set at 1 µM in a 50 µL reaction volume. PCR cycling conditions: Initial denaturation at 95°C for 2 minutes followed by 30 cycles of denaturation (30 seconds, 95°C), annealing (20 seconds, 58°C) and extension (70 seconds, 72°C) and a final extension step (72°C, 2 minutes). The PCR products were purified and cloned as described above. The protocol yielded 1-2 amino acid substitutions per gene (1-2 bp) per round.  2.3.4 Generation of DNA shuffling libraries The staggered extension process (StEP) protocol was used to recombine DNA sequence of improved variants (Zhao et al. 1998). Plasmids of variants were mixed in equimolar amounts to 500 ng of total DNA and used as a template for the StEP reaction. PCR cycling conditions: Initial denaturation at 95°C for 5 minutes, followed by 100 cycles at 95°C for 30 seconds, followed by 58°C for 5 seconds. PCR products were purified using the Cycle Pure PCR purification kit and further amplified with a 2 × Master mix of Econo TAQ DNA polymerase. StEP libraries were cloned as described above.   2.3.5 Activity prescreen on agar plates Transformed cells were plated on LB agar plates with kanamycin (50 µg/mL) and incubated at 37°C overnight yielding about 3,000–5,000 colonies in total on 4–6 plates. Colonies were replicated onto nitrocellulose membranes, which were placed onto LB agar containing kanamycin and 1 mM IPTG for protein expression and incubated at room temperature overnight. After expression, the membranes were placed into an empty petri dish and the cells were lysed by alternating incubations at -20°C and 37°C three times for 10 minutes each. 
To assay activity, 25 mL of 0.5% agarose buffer (50 mM Tris-HCl pH 7.5, 100 mM NaCl and 200 µM MnCl2) containing 500 µM paraoxon (Sigma), or 250 µM after round 4, was poured onto the membrane. Colonies with active enzymes developed a yellow color due to the release of a product, p-nitrophenol. The most active colonies (~200 variants) were directly picked from plates for subsequent screening in 96-well plates.

2.3.6 Cell lysate activity screen in 96-well plates

Colonies picked into 96-well plates were grown in 200 µL of LB with 50 µg/mL kanamycin at 30°C overnight. 20 µL of each culture was used to inoculate 400 µL of LB with 50 µg/mL kanamycin and incubated at 37°C for 3 hours; protein expression was then induced by adding IPTG to a final concentration of 1 mM, followed by incubation at 30°C for another 3 hours. Cells were harvested by centrifugation at 3,220 × g for 10 minutes at 4°C and pellets were frozen at -80°C for at least 1 hour. Cells were lysed by adding 200 µL of 50 mM Tris-HCl pH 7.5, 100 mM NaCl, 0.1% (w/v) Triton X-100, 200 µM MnCl2, 100 µg/mL lysozyme, and 0.5 U benzonase (Novagen). After 30 minutes of incubation at room temperature, the lysate was clarified at 3,220 × g for 20 minutes at 4°C. To assay enzymatic activity, 100 µL of the clarified lysate was mixed with 100 µL paraoxon solution at a final concentration of 250 µM (or 150 µM after round 4) in 50 mM Tris-HCl pH 7.5, 100 mM NaCl, 0.02% Triton X-100 and the reaction was monitored at 405 nm. The activity of the best variants was subsequently confirmed in triplicate cultures and activity assays. The variant with the highest activity was sequenced and used as template for the next round. When multiple improved variants were identified, StEP shuffling was used to recombine beneficial mutations.

2.3.7 Enzyme purification for kinetic analysis

All variants were cloned as described above, transformed and overexpressed in E. 
coli BL21 (DE3) cells and purified using Strep-tactin resin (IBA lifesciences) as described previously (Baier & Tokuriki 2014). During expression, purification and storage, 200 µM MnCl2 was supplied in media and buffers.

2.3.8 Enzyme kinetics

The kinetic parameters and activity levels of purified enzyme variants were obtained as described previously (Baier & Tokuriki 2014). Briefly, the activity for paraoxon-methyl, paraoxon-ethyl, and parathion-ethyl (Sigma) was monitored by following the release of p-nitrophenol at 405 nm with an extinction coefficient of 18,300 M⁻¹cm⁻¹ (Baier & Tokuriki 2014). The activity for N-hexanoyl-L-homoserine lactone was monitored at 560 nm using a phenol red based pH indicator assay. A standard curve was prepared using HCl to calculate an extinction coefficient of 1334 M⁻¹cm⁻¹. The kinetic parameters KM and kcat were determined by fitting the initial rates to the Michaelis–Menten model, $v_0 = k_{\mathrm{cat}}[E]_0[S]_0 / (K_{\mathrm{M}} + [S]_0)$, using KaleidaGraph (Synergy Software).

2.3.9 Crystallization of AiiA-wt and AiiA-R4

Proteins were expressed with an N-terminal His10-tag in E. coli BL21 (DE3) cells in TB media containing 1% glycerol, 50 µg/mL kanamycin and 200 µM ZnCl2. Cells were grown at 30°C for 6 hours and another 16 hours at 22°C and harvested at 8,000 × g for 15 minutes at 4°C. Pellets were resuspended in 50 mM HEPES pH 7.5, 500 mM NaCl, 25 mM imidazole and 200 µM ZnCl2 (Buffer A). Lysis was performed by sonication (OMNI sonic ruptor 400) and the lysate was clarified at 20,000 × g for 60 minutes at 4°C. The proteins were purified using a Ni-NTA column (Qiagen) and eluted in 50 mM HEPES pH 7.5, 500 mM NaCl, 500 mM imidazole, 200 µM ZnCl2. To remove the N-terminal His10-tag, TEV protease cleavage was performed at 4°C for 5 days in 50 mM Tris-HCl pH 8.0, 150 mM NaCl, 0.5 mM EDTA and 1 mM DTT containing 10% TEV protease relative to the purified protein (Cabrita et al. 2007). Subsequently, TEV protease and His10-tags were removed using a Ni-NTA column. 
The proteins were further purified using size exclusion chromatography (HiLoad 16/600 Superdex 75 column, GE-Healthcare), eluted in buffer containing 10 mM HEPES pH 7.0, 2 mM DTT, 100 µM ZnCl2 and concentrated to 20 mg/mL. Crystallization was performed using the hanging drop method by mixing a protein solution (1 µL) and a well solution (2 µL) containing 25% (w/v) PEG 4K, 20% (v/v) glycerol, 80 mM Tris-HCl pH 8.5 and 160 mM MgCl2 as described elsewhere (Liu et al. 2008). Crystals appeared in two weeks at 18°C and continued to grow for several months. Crystals were briefly soaked in 35% PEG 4K before being flash-frozen in nitrogen gas. Crystallographic data were collected at 100 K at the Australian Synchrotron using the MX1 beam line with a wavelength of 0.9537 Å. The diffraction data obtained were indexed, processed and scaled with the programs XDS (Kabsch 2010) and AIMLESS (Evans & Murshudov 2013). All structures were solved by molecular replacement using MOLREP (Vagin & Teplyakov 2010) as implemented in the CCP4 suite of programs (Winn et al. 2011). The models were subsequently optimized by iterative model building with the program COOT (Emsley & Cowtan 2004) and refinement with phenix.refine (Afonine et al. 2012). The structures were then evaluated using MolProbity (Chen et al. 2010). Resolution estimation and data truncation were performed using an overall half-dataset correlation CC(1/2) > 0.5 (Karplus & Diederichs 2012). Refinement statistics were produced by Phenix and are summarized in Table A.3. Structures of wild-type AiiA and AiiA-R4 were deposited in the PDB with accession codes 5EH9 and 5EHT, respectively.

2.3.10 Molecular docking and molecular dynamics simulations

AutoDock Vina (Trott & Olson 2010) was used to dock paraoxon into apo AiiA-wt and AiiA-R4 structures. AutoDockTools4 (Morris et al. 2009) was used to generate polar hydrogens and add partial charges to the proteins using Kollman United charges. 
The search space was a box of 40 × 40 × 40 Å centred on the binding site of the ligand. Both rigid and flexible docking were performed. Flexible torsions of side chains and ligand were assigned based on B-factors of the crystal structures with Autotors. The ligand paraoxon-ethyl was generated using PRODRG (Schüttelkopf & van Aalten 2004). For each calculation, eight poses were obtained and ranked according to the scoring function of AutoDock Vina. Selected protein-ligand structures were used for the molecular dynamics simulation using GROMACS 4.6.5 (Hess et al. 2008). The force field applied for the simulation was GROMOS96-53a6 (van Gunsteren 1996). Ligand topology was generated by the Automated Topology Builder 2.0. Proteins were immersed into a dodecahedron-shaped box of water at a buffering distance of 1.0 nm between the protein and the edge of the box. Sodium ions were added to neutralize charge. Energies were subsequently minimized with the Steepest Descent method for 2,500 steps. The last frame of each energy-minimized structure was used as the initial frame for MD simulation. Electrostatic interactions were calculated with the Particle-Mesh Ewald (PME) method (Darden, York, & Pedersen 1993). The cut-off for PME was 1.0 nm. The time step was set at 2 fs at 300 K. The number of steps was set to 25,000 (0.1 ns for NVT) for generating random velocities with position restraints, 250,000 (1 ns for NPT) for equilibration at 1 atm, and 12,500,000 (25 ns) for the production MD simulation. V-rescale for temperature coupling and Parrinello-Rahman for pressure coupling were used.

2.4 Results

2.4.1 Directed evolution of AiiA towards increased phosphotriesterase activity

B. thuringiensis AiiA, like most enzymes in the MBL superfamily, utilizes two active site metal ions to catalyze chemical reactions by activating a water molecule for the nucleophilic attack and stabilizing the developing charge on the transition state (Figure 2.2 A). 
Many enzymes of the MBL superfamily are considered to be zinc-dependent enzymes (Crowder, Spencer, & Vila 2006; González et al. 2007); however, our previous study showed that diverse metal ions can be accommodated and that the activity profile of the MBL enzymes can vary significantly, depending on which metal ion is incorporated into the active site (Baier et al. 2015). Therefore, we examined the functional effect of various metal ions (Cd²⁺, Co²⁺, Mn²⁺, Ni²⁺ and Zn²⁺) on the phosphotriesterase activity of AiiA (Figure 2.3). Out of the five tested AiiA metal-ion isoforms, Mn²⁺-AiiA exhibited the highest phosphotriesterase activity (v0 = 15.9 nM s⁻¹ for 5 µM of enzyme and 500 µM of paraoxon), which is ~100-fold higher than Zn²⁺-AiiA (v0 = 0.23 nM s⁻¹ for 5 µM of enzyme and 500 µM of paraoxon). Moreover, the phosphotriesterase activity of AiiA was easily detectable in crude cell lysate when Mn²⁺ was supplied to the cell culture media and enzyme activity assay buffer. However, the activity of AiiA in cell lysate was barely detectable in the presence of Zn²⁺. Thus, throughout the directed evolution experiment and subsequent biochemical characterizations we supplied Mn²⁺ to media and buffers.

AiiA was subjected to six rounds of directed evolution to improve its promiscuous phosphotriesterase activity. Mutagenized libraries of AiiA were generated using error-prone PCR and DNA shuffling, subcloned into the expression vector pET27(b), and transformed into E. coli BL21 (DE3) cells. The library (~10⁴ transformants) was plated on agar plates and colonies were transferred onto nitrocellulose membranes for protein expression. Subsequently, colonies were lysed and assayed on membranes for improved phosphotriesterase activity, by measuring the release of p-nitrophenol from the paraoxon substrate, which produces an observable yellow color. 
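The p-nitrophenol readout used for screening and for the initial rates quoted above can be converted into a reaction rate with the Beer–Lambert law. The following Python sketch is illustrative only and is not the analysis procedure used in this work: the extinction coefficient is the value given in Section 2.3.8, whereas the 1 cm path length, the function name `rate_nM_per_s`, and the example absorbance slope are assumptions introduced here.

```python
# Illustrative Beer-Lambert conversion: absorbance slope at 405 nm -> rate.
# Assumptions (not from the thesis): 1 cm path length, example slope value.

EPSILON_PNP = 18_300.0  # M^-1 cm^-1, p-nitrophenol at 405 nm (Section 2.3.8)


def rate_nM_per_s(slope_abs_per_s: float,
                  path_cm: float = 1.0,
                  epsilon: float = EPSILON_PNP) -> float:
    """Convert dA405/dt (absorbance units per second) to a rate in nM/s."""
    if path_cm <= 0 or epsilon <= 0:
        raise ValueError("path length and extinction coefficient must be positive")
    molar_per_s = slope_abs_per_s / (epsilon * path_cm)  # A = epsilon * l * c
    return molar_per_s * 1e9  # M/s -> nM/s


if __name__ == "__main__":
    hypothetical_slope = 1.83e-4  # invented absorbance slope, for illustration
    print(f"{rate_nM_per_s(hypothetical_slope):.2f} nM/s")
```

In a 96-well plate the optical path is shorter than 1 cm and depends on the fill volume, so `path_cm` would have to be set to the value appropriate for the instrument.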
Then, ~200 of the most active colonies were re-grown in 96-well plates, and phosphotriesterase activity of the clarified cell-lysate was assayed spectrophotometrically. The variants with the highest activity improvements in cell-lysate were then selected, sequenced, and used as a template for the next round of directed evolution (Figure 2.4). During the fourth round of evolution our initial screen failed to yield any improved variants. Thus, we lowered the substrate concentration from 500 µM to 200 µM for the membrane pre-screen and from 250 µM to 150 µM for the cell-lysate screening to enable selection for variants with lower KM (Table A.1). In total, six iterative rounds of directed evolution yielded an AiiA variant with a ~270-fold increase in phosphotriesterase activity in cell lysate (Figure 2.5 A).

Figure 2.3 Metal dependency of AiiA for paraoxon activity. AiiA was expressed and purified in the presence of 100 μM of the corresponding metal or without any metal supplied (no metal). Metals were added to the LB media during expression and to all buffers for purification and activity assays. Enzymatic activities were measured using a final concentration of 5 μM of enzyme and 500 μM of paraoxon substrate. The error bars represent the standard deviation of triplicate measurements. [Figure 2.3 plot: x-axis, Metal (no metal, Co, Mn, Zn, Cd, Ni, Fe); y-axis, v0 (nM/s), logarithmic scale from 0.01 to 100.]

Figure 2.4 Overview of the directed evolution scheme. (A) Starting variant(s) were mutagenized using error-prone PCR or StEP (staggered extension process) recombination and subcloned into a vector containing an N-terminal Strep-tag. (B) The resulting library was transformed into E. coli BL21 (DE3) cells for protein expression. The plates were replicated onto nitrocellulose membranes. (C) Colonies on the nitrocellulose membrane were lysed and pre-screened for paraoxonase activity, which can be identified through the development of yellow colour due to the release of p-nitrophenol. 
(D) Approximately 200 of the most active colonies were picked from the nitrocellulose membrane for screening in ~2 × 96-well plates in liquid culture. (E) The most improved variant(s) served as a starting point for the next round of directed evolution. [Figure 2.4 labels: starting variant(s) (AiiA); library generation with epPCR or StEP; transformation and expression in E. coli; prescreen for PTE activity (~2000 variants); screening for improved PTE activity (~200 variants).]

2.4.2 The functional adaptation of AiiA yielded a generalist enzyme

We purified the most improved variant from each round and determined its kinetic parameters for native lactonase and evolved phosphotriesterase activities, using the substrates N-hexanoyl-L-homoserine lactone and paraoxon-ethyl, respectively (Figure 2.5 B-C). The evolutionary trajectory of AiiA exhibits characteristics similar to those observed in other directed evolution experiments. First, the improvements in phosphotriesterase activity exhibit diminishing returns (Kaltenbach et al. 2015; Miton & Tokuriki 2016; Tokuriki et al. 2012), i.e., drastic improvements during initial rounds (~100-fold improvement in kcat/KM in the first 3 rounds), followed by smaller increments in the later rounds (~10-fold increase in kcat/KM in the final 3 rounds). The most evolved variant, AiiA-R6, exhibits a >1000-fold higher kcat/KM for paraoxon hydrolysis compared to the starting point, AiiA-wt (kcat/KM = 5.4 × 10⁵ M⁻¹s⁻¹ vs. 5.1 × 10² M⁻¹s⁻¹). Over the first four rounds, changes in kcat and KM contributed to the improvement in catalytic efficiency, whereas in the last two rounds only KM changed (decreased) significantly. Overall, the improvement in kcat/KM is largely due to a 130-fold decrease in KM (from 3400 to 26 µM), while kcat increased by only 8-fold (from 1.8 to 14 s⁻¹). Second, the trade-off between the native lactonase and evolved phosphotriesterase activities was weakly negative (Aharoni et al. 2005; Khersonsky & Tawfik 2010). The lactonase activity decreased by ~3-fold during the evolution, while the phosphotriesterase activity increased ~1000-fold (Figure 2.5 C), yielding a generalist enzyme with high catalytic activities for lactones and phosphotriesters (kcat/KM ≥ 10⁵ M⁻¹s⁻¹). In addition, catalytic activities towards two related organophosphate compounds, methyl-paraoxon and methyl-parathion, co-evolved along with the target substrate, ethyl paraoxon. Specifically, the kcat/KM for methyl-paraoxon and methyl-parathion increased by ~100-fold from AiiA-wt to AiiA-R6 (1.1 × 10⁵ M⁻¹s⁻¹ and 7.1 × 10² M⁻¹s⁻¹, respectively; Table 2.1).

Figure 2.5 Activity changes of AiiA over the evolutionary trajectory towards improved phosphotriesterase activity. (A) Lactonase (light grey) and phosphotriesterase (dark grey) activity in cell lysate of wild-type (wt) and evolved AiiA variants. Experiments were performed in triplicate and values were averaged with errors (standard deviation) less than 10% (Table A.4). (B) Catalytic parameters, kcat, KM, and kcat/KM, of wt and evolved variants for phosphotriesterase activity. Each value represents the average of three parallel measurements with the standard deviation (bars). Individual values are listed in Table A.5. (C) Catalytic parameters, kcat, KM, and kcat/KM, of wt and evolved variants for the native lactonase activity. 
Each value represents the average of three measurements with the standard deviation (bars). [Figure 2.5 plots: (A) cell lysate activity (nM/s) for paraoxonase and lactonase; (B, C) kcat (s⁻¹), KM (μM), and kcat/KM (M⁻¹s⁻¹); x-axes, variant (WT, R1 to R6); logarithmic y-axes.]

Table 2.1 Kinetic parameters of AiiA variants and MPH for selected substrates. n.d. indicates not determined.

2.4.3 Genotypic changes that led to increased phosphotriesterase activity

In total, eight amino acid substitutions accumulated during six rounds of directed evolution (Figure 2.6 A). Four mutations, L33M, V69G, K139T and I230M, occurred in the first round, but only two, L33M and V69G, are proximally located with respect to the active site (Figure 2.6 A). We generated single point mutants of all four mutations in the background of AiiA-wt to examine their individual contributions, and found that V69G alone accounted for the improvement observed in round 1. The other three mutations were functionally neutral (Table A.4). Only single mutations occurred at each subsequent round: F64C in round 2, M33V in round 3, S20F in round 4, K218R in round 5, and H18Q in round 6. Although most mutations occurred proximal to the active site, only S20F appears to be directly part of the active site cavity (Figure 2.6 A-B). V69G and F64C are located in loop 3 above the active site (Figure 2.6 B). Residue 33, which is located on the β-strand connected to loop 1, adjacent to a metal-binding residue, His235, was mutated twice during the trajectory: L33M at round 1 was neutral, but the subsequent mutation at round 3, M33V, increased phosphotriesterase activity by 2-fold (Figure 2.5 B). S20F (round 4) and H18Q (round 6) occurred in active site loop 8, although H18Q does not appear to be part of the active site cavity (Figure 2.6 A). 
K218R (round 5) is not located near the active site, yet it caused a 1.5-fold decrease in KM during round 5 (Figure 2.5 B).        Paraoxon-ethyl Paraoxon-methyl Parathion-methyl C6-HSL Enzyme kcat [s-1] KM [µM] kcat/KM [s-1M-1] kcat [s-1] KM [µM] kcat/KM [s-1M-1] kcat [s-1] KM [µM] kcat/KM [s-1M-1] kcat [s-1] KM [µM] kcat/KM [s-1M-1] AiiA-wt 1.8 ± 0.1 3400 ± 180 5.1 × 102 5.5 ± 0.4 2400 ± 400 2.3 × 103 n.d. n.d. 3.4 340 ± 33 89 ± 27 3.8 × 106 AiiA-R1 8.2 ± 0.6 1000 ± 170 7.9 × 103 n.d. n.d. n.d. n.d. n.d. n.d. 340 ± 58 350 ± 150 9.6 × 105 AiiA-R2 10 ± 0.3 400 ± 40 2.6 × 104 n.d. n.d. n.d. n.d. n.d. n.d. 200 ± 5.5 290 ± 21 6.9 × 105 AiiA-R3 25 ± 1.8 470 ± 70 5.3 × 104 n.d. n.d. n.d. n.d. n.d. n.d. 500 ± 9.4 460 ± 31 1.1 × 106 AiiA-R4 29 ± 6.6 220 ± 65 1.3 × 105 n.d. n.d. n.d. n.d. n.d. n.d. 470 ± 19 120 ± 22 6.9 × 106 AiiA-R5 15 ± 1.3 40 ± 7 3.8 × 105 n.d. n.d. n.d. n.d. n.d. n.d. 900 ± 69 750 ± 180 1.2 × 106 AiiA-R6 14 ± 1.2 26 ± 5 5.4 × 105 13 ± 0.4 120 ± 8 1.1 × 105 7.2 ± 0.5 11000 ± 1200 7.1 × 102 940 ± 72 820 ± 190 1.2 × 106 MPH 14 ± 1.1 900 ±  200 1.6 × 104  55 ± 6 690 ± 200 8.0 × 104  62 ± 3 880 ± 140 7.0 × 104  0.7 ± 0.1 675 ± 151 1.1 × 103 OHNOOOO2NO P OOOO2NO P OOOO2NO P OOS 48  Figure 2.6 Structural changes of AiiA during the evolution. (A) Overlay of crystal structures of AiiA-wt (grey, PDB ID: 5EH9) and AiiA-R4 (cyan, PDB ID: 5EHT). The C-a atoms of residues mutated during the trajectory are shown as spheres and colored according to their occurrence in the trajectory from grey (R1) to dark blue (R6). (B) Highlights of direct and indirect structural changes occurring in the active site. Residues that were mutated during the evolution are shown as sticks and colored in grey (AiiA-wt) and cyan (AiiA-R4), respectively. The non-mutated residue Phe68 is highlighted in pink. Electron densities (Fo-Fc omit maps) of Phe68 and Ser20 in AiiA-wt, and Phe68 and Phe20 in AiiA-R4 are shown in grey and cyan, respectively, and contoured at 3σ. 
(C and D) Surface representation of the active site of (C) AiiA-wt and (D) AiiA-R4. Phe68 is highlighted in pink and the two mutations at the active site entrance, S20F and V69G, are shown in blue. Metal ions are shown as beige spheres.

2.4.4 The combination of indirect and direct mutations underlies the conformational active site tinkering

To understand the molecular basis underlying the increase in phosphotriesterase activity, we solved the crystal structures of AiiA-wt and AiiA-R4 to a resolution of 1.29 Å under identical crystallization conditions (Table A.3). Both structures exhibit the same crystal packing, which allows a detailed structural comparison with minimal crystallographic artifacts (Table A.3). We selected AiiA-R4 for structural analysis because it exhibits the highest kcat value of all evolved variants, and only marginal improvements in cell lysate activity were obtained in rounds 5 and 6 (Figure 2.5 A). Overall, the backbones of the AiiA-wt and AiiA-R4 structures align with a maximum r.m.s.d. of 0.108 Å over all Cα atoms (Figure 2.6 A), and no significant changes in the positions of the catalytic metal ions and the coordinating residues were observed (Figure 2.7 A-B). However, the active site cavity of AiiA-R4 was remodelled through two major structural modifications (Figure 2.6 B-C). First, the position of loop 3 was altered; in particular, Phe68, which is located on the tip of loop 3, appears to have shifted downwards by 2.9 Å, which resulted in a narrowing of the active site entrance (Figure 2.6 B and Figure 2.8). Phe68 exhibits weak electron density in both AiiA-wt and AiiA-R4, indicating that the side chain is quite mobile (Figure 2.6 B). The conformational tinkering of loop 3 is most likely caused by two mutations on the loop: V69G and F64C.
The substitution of the larger hydrophobic residues with smaller ones created a space underneath loop 3, enabling Phe68 to adopt the downwards position in AiiA-R4. Second, the active site mutation S20F in round 4 resulted in the introduction of a bulky aromatic residue, which directly caused a narrowing of the active site (Figure 2.6 B).

To understand how the reshaping of the active site affected substrate positioning and phosphotriesterase activity, we attempted to soak paraoxon into the protein crystals. However, we were unable to obtain unambiguous substrate or product density to infer substrate binding. Therefore, we computationally docked paraoxon into the structures of AiiA-wt and AiiA-R4 and conducted MD simulations (Figure 2.9 and Figure 2.10). For each variant, we conducted two independent MD simulation runs (25 ns each); the replicate runs gave similar results, but the substrate binding position differs between the active sites of AiiA-wt and AiiA-R4 (Figure 2.9). In AiiA-R4, the paraoxon substrate is reasonably well positioned in the active site, where the scissile P-O bond of paraoxon can be in line with the nucleophilic water molecule, which is activated and bridged between the two metal ions, indicating a catalytically favorable substrate position (Figure 2.10 B). This orientation is supported by the repositioned Phe68, which forms hydrophobic and π-π-stacking interactions with the p-nitrophenol leaving group of paraoxon. The C4 of Phe68 and N of paraoxon maintain an average distance of 3.9 Å during the simulation, which is within the optimal distance range for π-π-stacking interactions (Meyer, Castellano, & Diederich 2003). Additionally, the mutation S20F might also contribute to positioning the substrate by hydrophobic interactions with one ethyl group of paraoxon (Figure 2.10 B).
In contrast, in AiiA-wt, the paraoxon is not placed in a position where an activated water molecule can perform nucleophilic attack, indicating catalytically inactive, non-productive binding (Figure 2.10 A). Phe68 in AiiA-wt remained relatively distant from the substrate during the MD simulations, with an average distance of 5.7 Å between the C4 of Phe68 and N of paraoxon, indicating no interaction between Phe68 and the substrate (Figure 2.10 C). Taken together, the results of our MD simulations lead us to speculate that the remodelling of the active site through the displacement of Phe68 and the active site mutation, S20F, collectively cause a catalytically more favourable paraoxon binding position in AiiA-R4, which may underlie the 200-fold increase in phosphotriesterase activity. Furthermore, we speculate that AiiA uses the same catalytic machinery for phosphotriesterase activity as for its native lactonase activity, in which an activated bridging water molecule between the two metal ions serves as a nucleophile (Figure 2.2 B). This is supported by the fact that mutating Asp108, which stabilizes the water molecule that participates in nucleophilic attack on the carbonyl carbon of the AHL substrate (Momb et al. 2008), to asparagine is highly deleterious (>100-fold decrease) to both lactonase activity and phosphotriesterase activity in AiiA-wt and AiiA-R4 (Table A.4).

Figure 2.7 Structural comparison between AiiA-wt and AiiA-R4. Configuration of active site metals (spheres) and metal binding residues (sticks) of (A) AiiA-wt and (B) AiiA-R4 are shown. Close-up view of the active site of (C) AiiA-wt and (D) AiiA-R4. Residues that are repositioned in the crystallized structure are highlighted in pink. Residues mutated during the directed evolution are highlighted in cyan.
Figure 2.8 B-factor in the crystal structures of (A) AiiA-wt and (B) AiiA-R4. The B-factor value increases from blue to red and loop regions where the B-factor changed between AiiA-wt and AiiA-R4 are highlighted with circles. (C) Comparison of the B-factors of the main chains of AiiA-wt and AiiA-R4. Loop regions where the B-factor changed between AiiA-wt and AiiA-R4 are highlighted with circles.

[Figure 2.8 C axes: main-chain B-factor versus residue number; loops 1, 3, and 8 are marked.]

Figure 2.9 Overlay of two independent molecular dynamics (MD) simulations of (A) AiiA-wt (first and second trials colored light and dark grey, respectively) and (B) AiiA-R4 (first and second trials colored cyan and blue, respectively). Residues that were mutated during the trajectory or appear to have been repositioned are shown as sticks. Paraoxon substrate for the first and second trials are colored yellow and orange, respectively.

Figure 2.10 A snapshot of active site configuration and substrate position in the molecular dynamics (MD) simulations at 25 ns of (A) AiiA-wt and (B) AiiA-R4. Residues that have been repositioned and have not been mutated are shown as sticks and highlighted in pink. Residues mutated during the evolution are highlighted grey and cyan, respectively. A plausible nucleophilic water molecule bridging the two metal ions is shown as spheres. The distance between Phe68 and paraoxon was calculated between the C4 atom of the phenyl ring and the N atom of the nitro group. (C) Distance between Phe68 and paraoxon during two independent MD simulations of 25 ns for AiiA-wt (light and dark grey) and AiiA-R4 (cyan and blue).
2.4.5 Epistatic interactions altered the contribution of Phe68

In order to quantify how mutations of the trajectory alter the catalytic contribution of Phe68, we performed “comparative” alanine scanning mutagenesis, i.e., the activity change caused by F68A was compared in different genetic backgrounds of AiiA. As suggested by the structural analysis, we hypothesized that two adjacent mutations, V69G and F64C (from round 1 and round 2, respectively), could be responsible for the displacement of Phe68, and would potentially alter its functional contribution (Figure 2.6 B). Thus, we generated AiiA variants with the individual mutations V69G and F64C and their combination, V69G-F64C. Subsequently, we introduced F68A into all four variants, including AiiA-wt, and compared its effect on phosphotriesterase activity (Figure 2.11). If the effect of F68A on phosphotriesterase activity differs between the wild-type background and the mutant backgrounds (V69G and/or F64C), it would indicate that V69G and F64C interact epistatically with Phe68 and alter its contribution to catalysis. The individual mutations V69G and F64C increased phosphotriesterase activity in the background of the wild-type (Phe68) by 11.8- and 1.8-fold, respectively (Figure 2.11 B). The double mutant F64C-V69G revealed significant positive epistasis between the two mutations; the combination of the two mutations synergistically increased phosphotriesterase activity by 70-fold, which is 3.4 times higher than the expected increase of 21-fold based on the additive null-model prediction, i.e., if the contributions of the two mutations are independent (Figure 2.11 B).
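The null-model comparison for the F64C-V69G double mutant can be retraced with a few lines of arithmetic. The following is a minimal sketch, assuming that the "additive" null model is additive on a logarithmic scale, so that independent fold-changes multiply; the function name is illustrative and does not come from this work. Because it uses the rounded fold-changes quoted in the text, the computed ratio comes out slightly below the reported value.

```python
import math

def epistasis(fold_a: float, fold_b: float, fold_ab: float) -> dict:
    """Compare an observed double-mutant fold-change with the log-additive null."""
    expected = fold_a * fold_b      # independent effects multiply
    ratio = fold_ab / expected      # > 1 indicates positive epistasis
    return {
        "expected_fold": expected,
        "epistasis_ratio": ratio,
        "log10_epistasis": math.log10(ratio),
    }

# Fold-changes in phosphotriesterase activity relative to AiiA-wt,
# as quoted in the text: V69G, F64C, and the F64C-V69G double mutant.
result = epistasis(fold_a=11.8, fold_b=1.8, fold_ab=70.0)
print(f"expected: {result['expected_fold']:.1f}-fold")
print(f"observed/expected: {result['epistasis_ratio']:.1f}")
```

A ratio above 1 (equivalently, a positive log10 value) corresponds to the positive epistasis described for this pair of mutations.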
Introducing F68A in AiiA-wt was advantageous, and resulted in a 2-fold activity increase (Figure 2.11 B). However, F68A became deleterious in the background of either V69G or F64C, causing a 2.9-fold and 1.9-fold decrease in activity, respectively. F68A became even more deleterious in the background of the F64C-V69G double mutant, with an 11.6-fold decrease in activity (Figure 2.11 B). Thus, the overall effect of F68A changed by 23-fold (2-fold positive vs. 11.6-fold negative) between AiiA-wt and the double mutant V69G-F64C. These results indicate that the role of Phe68 in catalysis was substantially altered during the evolution, which is consistent with the structural and MD simulation analyses. Interestingly, the deleterious effect of F68A was buffered through the accumulation of other mutations in the trajectory. F68A caused a 5-fold decrease in the background of the triple mutant V69G, F64C and S20F (acquired in round 4, Table A.4), and only a 2-fold decrease in AiiA-R4, i.e., the triple mutant plus L33V, K139T, and I230M (Figure 2.11 B). Thus, the contribution of Phe68 to catalysis was not only affected by adjacent mutations, e.g., F64C and V69G, but also by more remote ones, indicating that epistatic interactions can occur between very distant mutations.

Figure 2.11 Comparative mutational scanning of Phe68 in the background of different AiiA variants. (A) The functional contribution of Phe68 was quantified in various genetic backgrounds by measuring the effect of F68A on phosphotriesterase activity. Variants possessing Phe68 are highlighted in grey, and variants with Ala68 in pink. Each value represents the average of three measurements with the standard deviation. (B) The fold-change in catalytic efficiency (kcat/KM) that resulted from the introduction of F68A (e.g., AiiA-F68A/AiiA-wt).

2.4.6 Extended comparative mutational scanning reveals changes in the functional contribution of other non-mutated active residues.
Next, we investigated whether epistatic interactions during the evolutionary process drove any changes in the catalytic contributions of other non-mutated active site residues, in addition to Phe68. We extended our comparative mutational scanning analysis to several other active site residues, Gln72, Arg134, Glu135, Glu136, and Tyr194 (Figure 2.12). These residues were selected because their side chain positions were shifted in the crystal structures and/or MD simulations, suggesting that their interactions with the substrate could be different in AiiA-wt and AiiA-R4 (Figure 2.7 and Figure 2.10). Each of these residues was mutated to alanine, with the exception of Tyr194, which was mutated to phenylalanine to avoid destabilization of the scaffold. Three out of five residues, Gln72, Glu136, and Tyr194, changed their functional contribution during the evolutionary trajectory between AiiA-wt and AiiA-R4 (Figure 2.12 A). Similar to Phe68, the functional contribution of Tyr194 increased during the evolution: Y194F caused a 10-fold improvement in phosphotriesterase activity in AiiA-wt, but its effect was reduced to a 5.5-fold increase in AiiA-R4 (Figure 2.12 A). In contrast, the opposite tendency in functional contribution was observed for Q72A and E136A. Q72A was deleterious (1.6-fold decrease) in the wild-type background, but became beneficial in AiiA-R4 (2.3-fold increase, Figure 2.12 A). Introducing E136A resulted in a 2.6-fold increase in AiiA-wt, and an even greater 11-fold increase in AiiA-R4 (Figure 2.12 A). It should be noted that while the overall catalytic efficiency (kcat/KM) increased with the introduction of Q72A, R134A, E135A, E136A, and Y194F into AiiA-R4, none of these mutations was selected for during the evolution.
Surprisingly, none of the mutants exhibited higher activity than AiiA-R4 in cell lysate screens, which were employed during directed evolution (Table A.4). This is in part because the improvements conferred by these mutations are largely the result of reductions in KM, with a few (E135A and E136A) actually causing an accompanying sharp decline in kcat (Figure 2.12 C). Changes in protein expression and solubility may also cause discrepancies between cell lysate and kinetic data. Overall, the results suggest that the catalytic contributions of several active site residues were indeed altered by remote mutations accumulated during the evolution.

Figure 2.12 Comparative mutational scanning of selected non-mutated active site residues between AiiA-wt and AiiA-R4. The functional contribution of various active site residues, Phe68, Gln72, Arg134, Glu135, Glu136, and Tyr194, was measured in the genetic background of AiiA-wt and AiiA-R4. The bars represent fold change in phosphotriesterase activity between AiiA-wt and mutant (grey), or AiiA-R4 and AiiA-R4 mutant (cyan). Each value represents the average of three measurements with the standard deviation (bars). (A) The fold-change in kcat/KM. (B) The fold-change in kcat. (C) The fold-change in KM.

2.4.7 Epistatic interactions between mutations of the evolutionary trajectory

Finally, we investigated the extent to which epistatic interactions affected the mutational trajectory in the directed evolution. We selected the six mutations that had effects on the catalytic parameters, V69G, F64C, L33V, S20F, K218R and H18Q, and compared their catalytic effect in the genetic background in which they appeared in the trajectory to their effect in the genetic background of AiiA-wt (Figure 2.13).
Two out of the six mutations, F64C of round 2 and S20F of round 4, exhibited positive epistasis, i.e., a more positive effect on the phosphotriesterase activity in the trajectory than individually in the wild-type background. In particular, S20F was almost neutral in the background of AiiA-wt, but became positive (2.5-fold) in the trajectory, which indicates that the positive effect of S20F was permitted by earlier mutations of the trajectory (Figure 2.13 A). In contrast, H18Q was more positive in the background of AiiA-wt compared to its effect in round 6 (Figure 2.13 A). Taken together, the results indicate that intertwined epistatic interaction networks shaped the evolution of AiiA towards improved phosphotriesterase activity, and altered not only the catalytic contributions of non-mutated active site residues but also the effects of mutations that appeared during the later rounds of the evolution.

Figure 2.13 Change in mutational effect in the background of AiiA-wt and in the evolutionary trajectory. Mutations that occurred in the trajectory were introduced into AiiA-wt and their catalytic efficiencies (grey) were compared to their respective improvement in catalytic efficiency during the evolutionary trajectory (green). Each value represents the average of three measurements with the standard deviation (bars). (A) The fold-change in kcat/KM. (B) The fold-change in kcat. (C) The fold-change in KM.
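The trajectory-side effects compared in Figure 2.13 can be approximated from the kcat/KM values for paraoxon-ethyl listed in Table 2.1. The sketch below is illustrative only: it uses the rounded table values, so the results may differ slightly from the plotted data, and the variable names are not taken from this work.

```python
# kcat/KM (M^-1 s^-1) for paraoxon-ethyl, copied from Table 2.1
kcat_km = {
    "wt": 5.1e2, "R1": 7.9e3, "R2": 2.6e4, "R3": 5.3e4,
    "R4": 1.3e5, "R5": 3.8e5, "R6": 5.4e5,
}

variants = list(kcat_km)  # dicts preserve insertion order

# Fold-change of each round relative to the round before it
stepwise = {
    later: kcat_km[later] / kcat_km[earlier]
    for earlier, later in zip(variants, variants[1:])
}

# Cumulative improvement from AiiA-wt to AiiA-R6
overall = kcat_km["R6"] / kcat_km["wt"]

for variant, fold in stepwise.items():
    print(f"{variant}: {fold:.1f}-fold over the previous round")
print(f"overall: {overall:.0f}-fold")
```

The round 4 step evaluates to roughly 2.5-fold, in line with the effect quoted for S20F in the trajectory, and the product of all steps recovers the ~1000-fold overall improvement.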
[Figure 2.13 panels A to C: fold change in kcat/KM, kcat, and KM for V69G (R1), F64C (R2), L33V (R3), S20F (R4), K218R (R5), and H18Q (R6), in the AiiA-wt background and in the trajectory.]

2.5 Discussion

Changes in protein function caused by second- and third-shell mutations are not an uncommon phenomenon, and many directed evolution studies have observed the accumulation of remote mutations that caused significant improvements in enzyme function (Miton & Tokuriki 2016; Morley & Kazlauskas 2005). Fewer studies have, however, elucidated the molecular mechanisms underlying the effects of remote mutations on enzyme function (Ben-David et al. 2013; Dellus-Gur et al. 2015; Jiménez-Osés et al. 2014; Kaltenbach et al. 2015; Perica et al. 2014; Sykora et al. 2014; Tomatis et al. 2008), and there is still a lack of methods devised to perform such characterizations. Here, we provide a detailed molecular explanation for the evolution of a new enzyme function by mutations outside the active site: the ~1000-fold increase in phosphotriesterase activity of AiiA was largely driven by the indirect effects of two second-shell mutations, V69G and F64C, that appeared to reconfigure the active site residue Phe68 and enable a more favorable positioning of the substrate in the active site for catalysis. Moreover, we utilized comparative mutational scanning to provide a quantitative measurement of the indirect mutational effects, and reveal that the functional contribution of Phe68 to the phosphotriesterase activity markedly increased (>20-fold) in the background of the two adjacent mutations V69G-F64C.

Remote and active site mutations each possess their advantages and disadvantages, owing to the level of chemical and structural consequences they cause.
For example, directly mutating an active site residue can result in significant changes in the active site, such as the introduction of new electrostatic interactions between the enzyme and substrate, or significant changes in the size and shape of the active site cavity. However, the level of structural disruption that they cause means that active site mutations can also be detrimental to both the activity and the structural integrity of proteins (Shoichet et al. 1995; Tokuriki et al. 2008). On the other hand, mutations outside the active site might not change the existing chemistry in the active site, but can fine-tune the shape of the cavity and the position of key catalytic residues and cofactors in subtle ways that direct active site mutations cannot (Mesecar, Stoddard, & Koshland 1997). For example, directly mutating Phe68 might not provide the structural modifications required to increase the activity that its indirect repositioning through V69G and F64C can achieve. In this instance, it appears that subtle conformational tinkering, i.e., collapsing loop 3 downward into the active site and shifting the position of Phe68, was a better adaptation strategy for the evolution of AiiA towards phosphotriesterase activity.

Our work provides experimental evidence linking epistatic interactions and active site conformational tinkering. Over the last decade, growing experimental and theoretical evidence has indicated that epistasis, or non-additive interactions among mutations, is prevalent during adaptive evolution and plays a central role in shaping the accessibility of mutational trajectories (de Visser et al. 2011; Harms & Thornton 2013; Kaltenbach & Tokuriki 2014). Mutations that occur at an early stage in an adaptive evolution could permit or restrict the potential of other mutations and affect the appearance of mutations in later stages (Harms & Thornton 2013; 2014; Kaltenbach & Tokuriki 2014; Miton & Tokuriki 2016; Noor et al. 2012).
Here we expand the view of epistatic interactions to residues that are not mutated during evolution, and demonstrate that their functional contribution can change during a functional transition. Such epistatic interactions might be prevalent and play important roles in many other enzyme evolution and engineering examples, in which functional adaptation is achieved through the molecular tinkering of active site residues by second- and third-shell mutations. It is essential that we further develop our knowledge of the molecular mechanisms underlying such indirect effects for the design and engineering of proteins (Hilvert 2013). Thus, along with a detailed structural characterization, comparative mutational scanning, as described in this study, would serve as a robust approach to investigate and connect the role of mutations outside the active site to functional changes in many other protein evolution studies.

Phosphotriesters and lactones differ with respect to their chemical structure and the bonds that are broken during hydrolysis (P-O bond via a pentacoordinate transition state vs. C-O bond via a tetrahedral transition state; Figure 2.1). However, enzymes in three different lactonase families, each with distinct structural folds and active sites (the AH superfamily (TIM barrel fold), the MBL superfamily (αββα-fold), and the paraoxonase (PON) superfamily (β-propeller fold)), all exhibit promiscuous phosphotriesterase activity (Draganov 2010). Elias and Tawfik proposed that the two activities, phosphotriesterase and AHL lactonase, are commonly shared because the two substrates could bind in a similar geometry in the active site, and thus can utilize the same catalytic machinery (Elias & Tawfik 2011). The plausible evolutionary trajectories underlying the functional transition of enzymes in the AH and PON superfamilies have been previously explored using directed evolution (Aharoni et al. 2005; Hawwa et al. 2009; Meier et al. 2013).
In this work, we have made similar observations to previous studies, e.g., the transition between the two reactions can be traversed within only a few mutational steps, supporting the notion that phosphotriesterases promptly evolved in parallel from ancestral lactonases in the time frame of a few decades (Elias & Tawfik 2011). Interestingly, the molecular changes associated with the increase in phosphotriesterase activity appear to be different in each case. In mammalian PON1, a displacement of the catalytic Ca²⁺ was largely responsible for the transition between lactonase and phosphotriesterase activities (Ben-David et al. 2013). Phosphotriesterase activity in DrPLL, a lactonase in the AH superfamily, increased through enlargement of the active site cavity (Meier et al. 2013). OPHC2, another phosphotriesterase in the MBL superfamily, has been shown to have evolved from a dihydrocoumarin lactonase by only two active site mutations (Luo et al. 2014). The evolution of AiiA, presented in this study, demonstrated that improvements in the phosphotriesterase activity were achieved by a reduction in the active site cavity. Therefore, enzymes with similar native functions might be equally good springboards for the evolution of a particular new function. The caveat to this conclusion is that parallel evolution from different starting enzymes might involve different molecular and mechanistic adaptations to achieve the same goal.

Chapter 3: The evolution of organophosphate hydrolase activity in methyl-parathion hydrolase

3.1 Summary

Characterizing adaptive landscapes that encompass the emergence of novel enzyme functions can provide molecular insights into both the enzymatic and evolutionary mechanisms through which key function-altering mutations optimize an activity.
Here, we combine ancestral protein reconstruction with biochemical, structural, and mutational analyses to characterize the functional evolution of methyl-parathion hydrolase (MPH), a xenobiotic organophosphate (OP)-degrading enzyme. We identify five mutations that are necessary and sufficient for the evolution of OP activity in MPH from an ancestral dihydrocoumarin hydrolase. In-depth analyses of the adaptive landscapes encompassing this evolutionary transition revealed that the mutations form a complex interaction network, defined in part by higher-order epistasis, that constrained the number of accessible adaptive pathways. By also characterizing the adaptive landscapes in terms of their functional activities towards three additional OP substrates, we reveal that subtle differences in the polarity of the substrate substituents drastically alter the network of epistatic interactions. Our work suggests that the mutations function collectively to enable substrate recognition via subtle structural repositioning.

3.2 Introduction

How evolution generated a tremendous repertoire of diverse enzymatic functions is a central question in evolutionary biochemistry. Understanding the molecular mechanisms that underlie the evolution of new enzyme functions requires 1) the identification of a minimal set of mutations that are necessary and sufficient to cause the emergence and optimization of a novel function, and 2) an understanding of the biophysical molecular changes caused by these mutations. Recent advances in experimental evolution and phylogenetically informed ancestral sequence reconstruction (ASR) aid in such endeavours by unveiling a set of adaptive mutations that allow substantial functional transitions (Canale et al. 2018; Lozovsky et al. 2009; Lunzer et al. 2005; Meini et al. 2015; O'Maille et al. 2008; Sunden et al. 2015; Tufts et al. 2015; Weinreich et al. 2006).
Further biochemical, biophysical and structural characterization of enzyme variants enables us to uncover the molecular basis of the functional transition (Clifton et al. 2018; Kaltenbach et al. 2018). Elucidating the molecular role of each genotypic change, however, is often difficult due to the prevalence and complexity of epistasis, i.e., the phenomenon in which the effect of a mutation varies significantly depending on the presence or absence of other mutation(s) (Lozovsky et al. 2009; Meini et al. 2015; Sunden et al. 2015; Tufts et al. 2015; Weinreich et al. 2006). Consequently, the mutational effect observed in a particular genetic background, e.g., the wild-type genotype, may not always accurately reflect the impact that the mutation had during actual evolution if it was acquired after the fixation of other, earlier mutations. Understanding the molecular basis of epistasis is thus essential if we are to effectively investigate the sequence-structure-function relationships that govern how proteins evolve novel functions.

Adaptive fitness landscapes, which can be determined by generating and functionally assaying all possible combinations of the mutations responsible for a new function, are a powerful tool for studying the evolutionary and biophysical origins of novel functions by unveiling the potential adaptive pathways that connect the ancestral and derived genotypes (Canale et al. 2018; Lozovsky et al. 2009; Lunzer et al. 2005; O'Maille et al. 2008; Tufts et al. 2015; Weinreich et al. 2006). Recently developed statistical methods enable us to assess and quantify the degree of epistasis, including high-order interactions (interactions involving more than two mutations), providing a comprehensive view of epistasis and the dominant interactions that drive it (Anderson, McKeown, & Thornton 2015; Sailer & Harms 2017; Stormo 2011; Weinreich et al. 2018).
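To make the combinatorial landscape concrete: for n mutations there are 2^n genotypes and n! direct pathways from the ancestral to the derived genotype, so five mutations give 32 genotypes and 120 pathways. The sketch below enumerates both and counts the pathways along which activity never decreases. The two activity functions are toy landscapes invented purely for illustration; they are not measurements or models from this work, and the function names are hypothetical.

```python
from itertools import permutations, product

def genotypes(n: int) -> list:
    """All 2**n genotypes as tuples of 0 (ancestral) / 1 (derived) states."""
    return list(product((0, 1), repeat=n))

def accessible_paths(n: int, activity) -> int:
    """Count orderings of the n mutations along which activity never decreases."""
    count = 0
    for order in permutations(range(n)):
        genotype = [0] * n
        current = activity(tuple(genotype))
        for site in order:
            genotype[site] = 1
            following = activity(tuple(genotype))
            if following < current:
                break  # this step lowers activity, so the path is inaccessible
            current = following
        else:
            count += 1  # no step decreased activity
    return count

# Toy landscape (hypothetical): every mutation adds the same benefit, no epistasis.
def toy_additive(g):
    return sum(g)

# Toy landscape (hypothetical): mutation 0 is harmful unless mutation 1 is present.
def toy_epistatic(g):
    return sum(g) - (2 if g[0] and not g[1] else 0)

print(len(genotypes(5)))                   # 32 genotypes
print(accessible_paths(5, toy_additive))   # all 120 orderings are accessible
print(accessible_paths(5, toy_epistatic))  # 60: only orderings with mutation 1 before mutation 0
```

Even a single pairwise dependency halves the number of accessible orderings in this toy case, which illustrates how epistasis can constrain the available adaptive pathways.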
Moreover, because epistasis reflects interactions between amino acid changes, adaptive landscapes also provide critical insight into the underlying molecular interactions both within an enzyme and between enzyme and substrate. With this study, we integrate an analysis of adaptive landscapes with a comprehensive assessment of the intra- and intermolecular epistatic interactions underlying the evolution of a novel organophosphate (OP) hydrolase to obtain a detailed view of how novel catalytic functions can evolve.  Methyl-parathion hydrolase (MPH) degrades methyl-parathion, a xenobiotic OP compound, with high efficiency, enabling bacteria to utilize the substrate as a source of phosphate and carbon, thereby conveying a selective advantage (Sun et al. 2004). While naturally occurring forms of OPs have been documented (Malla et al. 2011; Nguyen et al. 2017), and OP-degrading enzymes may have existed prior to the industrial era, the activity in MPH is most likely recently acquired. This is because MPH was originally isolated from Pseudomonas sp. WBC-3 that was growing in the polluted soil near a factory manufacturing methyl-parathion (Sun et al. 2004). The enzyme has also disseminated to many bacterial strains via horizontal gene transfer to confer the ability to utilize xenobiotic OPs for nutrients (Liu et al. 2005). Moreover, its closest homologs exhibit low efficiency towards OPs (Luo et al. 2014). Presently, the evolutionary and molecular mechanisms underlying the evolution of MPH are largely unknown. As the selection pressure is relatively simple and well understood, this enzyme provides an excellent opportunity to investigate how novel catalytic functions evolve. Here, we combine ASR with a robust functional, structural, and mutational characterization of the ancestral and derived enzymes to identify a minimal subset of five genotypic changes that were responsible for optimizing the novel enzymatic function. 
We subsequently characterized the adaptive landscape that encompasses this evolutionary transition, as defined by those five key evolutionary substitutions, for activity against four different OP substrates. We then develop and apply extensive statistical analyses to understand how epistatic interactions defined this landscape for each substrate. In particular, we unveil the complex network of epistatically interacting sites that underlies the fine-tuning of the active site for less polar substrates.

Figure 3.1 Representative sequence similarity network (SSN) of enzymes in the metallo-β-lactamase (MBL) superfamily generated using EFI-EST. 87,400 unique protein sequences were obtained from the UniProt database in March 2016, and the similarities among the sequences were calculated from “all-vs.-all” BLAST pairwise comparisons using the Enzyme Function Initiative-Enzyme Similarity Tool (EFI-EST, https://efi.igb.illinois.edu/efi-est/) (Gerlt et al. 2015). Sequences that share >40% sequence identity are grouped together as one node (circle) in the network, resulting in a total of 4,548 nodes representing all 87,400 sequences. Nodes are connected by an edge if the mean pairwise BLAST alignment score between all sequences in the node is above a threshold of 20. Sequence clusters that include experimentally characterized enzymes are highlighted in violet to indicate distinct functional families. The dihydrocoumarin hydrolase (DHCH) family is highlighted in cyan, with large coloured nodes in the family representing experimentally characterized sequences: the large cyan node represents characterized DHCH enzymes and the large orange node methyl-parathion hydrolases (MPH).

[Cluster labels shown in the figure: B1 & B2 β-lactamases; B3 β-lactamases; Persulfide Dioxygenase; Glyoxalases II; Ribonucleases; 4-Pyridoxolactonases; N-Acyl Homoserine Lactonases; Anthranilate Synthases; Binding Proteins; Nitrogenase Reductases; Alkyl Sulfatases; Dehalogenases; DHC Hydrolases; MPH. Alignment score: 20.]

3.3 Materials and Methods

3.3.1 Phylogenetic analysis and ancestral reconstruction

The amino acid sequences of 500 MPH homologs were downloaded from GenBank using protein BLAST starting from Pseudomonas sp. WBC-3 MPH (GI:30038775) in July 2014. Subsequently, redundant sequences with more than 98% sequence identity were removed computationally using CD-HIT (Fu et al. 2012) with standard parameters, resulting in a total of 306 sequences. Then, a multiple sequence alignment (MSA) of these sequences was generated using MUSCLE (MUltiple Sequence Comparison by Log-Expectation) (Edgar 2004), and sequences containing major indels were manually removed, further reducing the number of sequences to 153. For the reconstruction, an MSA of the aforementioned 153 sequences, generated using M-Coffee with standard parameters (Wallace et al. 2006), was used.
The phylogeny was inferred via Randomized Axelerated Maximum Likelihood (RAxML; Stamatakis 2014), utilizing the LG protein model (Le & Gascuel 2008) with parameters +I+G+F, which was the best-fitting evolutionary model according to the Akaike Information Criterion, as tested via PROTTEST (Abascal, Zardoya, & Posada 2005). The a posteriori bootstrapping analysis in RAxML was performed with the extended majority-rule consensus tree criterion. Amino acid sequences of ancestral nodes in the phylogeny were inferred with PAML (version 4.7a; Yang 2007) employing the above-mentioned alignment, phylogeny and evolutionary model. As PAML does not support indel reconstruction, indels in the inferred ancestors were manually reconstructed using a maximum parsimony principle. The probabilities for every amino acid in the predicted sequences were examined for ambiguities (residues with Bayesian posterior probability <0.8). In total, AncDHCH1 contained 25 ambiguous sites, none of which is located in the vicinity of the active site in the protein structure (Figure 3.3 and Table B.1).

3.3.2 Cloning

The nucleotide sequences of the MPH ancestral enzymes, all with signal peptides removed, were designed using the same codons found in the wild-type MPH nucleotide sequence for regions where the amino acid sequences are identical and E. coli optimized codons for regions where the amino acid sequences diverge. All genes contained an Nco I restriction site on the forward side and a Hind III restriction site on the reverse side for cloning. The resulting genes were synthesized (BioBasic), amplified, and cloned into a pET27(b) vector (Novagen) containing an N-terminal Strep-tag II sequence (MASWSHPQFEKGAG) using the Nco I and Hind III restriction enzymes (Thermo Scientific). For crystallization of AncDHCH1, the gene was cloned into a pET28 vector with a TEV (Tobacco Etch Virus) protease-cleavable N-terminal MBP (maltose binding protein)-His6 tag using the Nco I and Hind III restriction enzymes.
Correct sequences of synthesized genes and subcloning into the vector were confirmed by DNA sequencing.

3.3.3 Purification of tagged proteins

The plasmids containing strep-tagged proteins were transformed into E. coli BL21 (DE3) and grown in LB with 50 µg/mL kanamycin overnight. The following day, 8 mL of the overnight cultures were used to inoculate 400 mL LB with 50 µg/mL kanamycin and 100 µM ZnCl2, and the cultures were grown at 30°C, 280 rpm for 3 hours. The cultures were subsequently cooled to 16°C for 30 minutes, 0.2 mM IPTG was added to induce protein expression, and the cultures were incubated at 16°C overnight. Cells were harvested by spinning at 4°C, 3,220 rpm for 10 minutes, and the supernatant was removed. For lysis, the cell pellets were frozen at -20°C overnight, then re-suspended in a mixture of B-PER Protein Extraction Reagent (Thermo Scientific) and 50 mM Tris-HCl buffer, pH 7.5 containing 200 µM ZnCl2, 100 µg/mL lysozyme, and 0.5 U benzonase, and incubated on ice for 1 hour. Cell debris was removed by centrifugation at 4°C, 25,000 × g for 30 minutes. The clarified lysate was loaded onto a Strep-tactin affinity column (IBA). The columns were washed once with Buffer A (50 mM Tris-HCl, pH 7.5 containing 100 mM NaCl and 200 µM ZnCl2), once with Buffer B (50 mM Tris-HCl, pH 7.5 containing 300 mM NaCl and 200 µM ZnCl2), and a final time with Buffer A. Strep-tagged proteins were eluted with Buffer A containing 2.5 mM desthiobiotin. To desalt the proteins, the buffer of the eluted proteins was exchanged with 50 mM Tris-HCl, pH 7.5 containing 200 µM ZnCl2. The eluted proteins were concentrated using a Microsep Advance Centrifugal Device, 10K Omega (Pall Life Sciences).

3.3.4 Enzyme kinetics

The kinetic parameters of the purified enzymes were measured using the same procedure as described previously (Baier & Tokuriki 2014).
Briefly, the activities for paraoxon-methyl, paraoxon-ethyl, parathion-ethyl, and parathion-methyl (Sigma) were monitored by following the release of p-nitrophenol at 405 nm with an extinction coefficient of 18,300 M⁻¹cm⁻¹ in 50 mM Tris-HCl, pH 7.5 containing 100 mM NaCl. The activity for dihydrocoumarin (Sigma) was monitored at 270 nm with an extinction coefficient of 1300 M⁻¹cm⁻¹ in 50 mM HEPES, pH 7.5 containing 100 mM NaCl. The kinetic parameters KM and kcat were determined by fitting the initial rates to the Michaelis–Menten model, $v_0 = k_{cat}[E]_0[S]_0 / (K_M + [S]_0)$, using KaleidaGraph (Synergy Software). The logP partitioning coefficients of the substrates were calculated using ChemDraw 15.0.

3.3.5 Site-directed mutagenesis

To generate single and combinatorial mutants, non-overlapping pairs of primers ~30 bp in length and encoding the desired mutations were designed with a Lgu I restriction site at the 5′ end followed by an overlapping segment of 3 bp to enable annealing of the PCR products following restriction (Table B.6). The entire plasmid containing the target gene was amplified using PCR, with each 50 µL reaction containing 1 × Kapa Hifi Buffer (Kapa Biosystems), 2 ng template DNA, 1 µM of forward and reverse mutational primers in equal amounts, 0.25 mM dNTPs, and 0.25 U Kapa Hifi polymerase (Kapa Biosystems). The cycling conditions were as follows: initial denaturation at 95°C for 2 minutes, followed by 30 cycles of denaturation (20 seconds, 95°C), annealing (15 seconds, 60°C) and extension (5 minutes, 72°C), and a final extension step at 72°C for 5 minutes. Subsequently, each PCR was treated with Dpn I (Thermo Scientific) for 1 hour at 37°C to digest the template DNA, and the products were purified using the Cycle Pure PCR purification kit (Omega Bio-tek). The PCR products were digested with Lgu I (Thermo Scientific) for 1 hour at 37°C. The digested product was purified using the Cycle Pure PCR purification kit (Omega Bio-tek).
Ligations were performed in 20 µL reactions with approx. 20 ng of DNA using T4 DNA ligase (Thermo Scientific), and incubated at room temperature for 2 hours. The ligation mixtures were transformed into E. cloni 10G cells (Lucigen). Successful mutagenesis was confirmed by sequencing.

3.3.6 Cell lysate activity screen in 96-well plates

To test the lysate activities of variants, E. coli BL21 (DE3) transformed with plasmids for each of the 32 MPH variants were grown in triplicate in a 96-deep well plate containing 200 µL of LB media supplemented with 50 µg/mL kanamycin at 30°C, 900 rpm overnight. On the following day, a second 96-deep well plate containing 400 µL of LB media supplemented with 50 µg/mL kanamycin and 100 µM ZnCl2 was inoculated with 20 µL of the aforementioned overnight culture and incubated at 30°C, 900 rpm for 3 hours. Protein expression was induced by adding IPTG to a final concentration of 1 mM, followed by further incubation at 30°C for 3 hours. Cells were harvested by centrifugation at 3,220 × g for 10 minutes and pellets were frozen at -80°C for at least 30 minutes. To lyse the cells, the cell pellets were resuspended in 200 µL of lysis buffer (50 mM Tris-HCl pH 7.5, 100 mM NaCl, 200 µM ZnCl2, containing 0.1% Triton X-100, 100 µg/mL lysozyme and 1 U/mL of benzonase) and incubated at room temperature with shaking at 1200 rpm for 1 hour. The cell lysates were clarified by centrifugation at 3,220 × g for 20 minutes at 4°C. Clarified lysates were diluted 5-fold and measured against a single substrate concentration of 400 µM for all organophosphate compounds to obtain linear initial rates. The lysate activity is given as the rate of substrate hydrolysis in µM/sec/OD, which is calculated from the molar extinction coefficient of the p-nitrophenol leaving group (18,300 M⁻¹cm⁻¹) and normalized to the OD of the cell cultures.

3.3.7 Protein purification for crystallization

For crystallization, AncDHCH1 was expressed in E.
coli BL21 (DE3) cells and grown in 500 mL TB (terrific broth) (Sambrook 2001) media supplemented with 50 µg/mL kanamycin, 1% (w/v) glycerol and 200 µM ZnCl2. Cultures were grown at 30°C for 6 hours and for another 20 hours at RT for expression. Cells were harvested by centrifugation, resuspended in sonication buffer (50 mM Tris-HCl pH 7.5, 100 mM NaCl, 0.2 mM ZnCl2, 1 U/mL of benzonase) and lysed by sonication (Sonic Ruptor 400; Omni International). Clarified lysate was applied to a gravity flow column containing 2 mL amylose resin (New England Biolabs), washed with sonication buffer and eluted in TEV reaction buffer (50 mM Tris-HCl pH 8.0, 0.5 mM EDTA, 1 mM DTT, and 150 mM NaCl) containing 10 mM maltose (Sigma-Aldrich). 20% (w/w) TEV protease containing a His-tag was added and incubated at 4°C for 3 days before the TEV protease and the cleaved MBP-tag were removed using affinity chromatography. The proteins were further purified using size exclusion chromatography (HiLoad 16/600 Superdex 75 column, GE-Healthcare), eluted in buffer containing 20 mM Tris-HCl at pH 7.5, 100 mM NaCl and 0.2 mM ZnCl2. The purity of the protein was confirmed using SDS-PAGE and the protein was concentrated to 10 mg/mL using a 10 kDa molecular weight cut-off ultrafiltration membrane (Amicon, Millipore).

3.3.8 Crystallization of AncDHCH1

AncDHCH1 was crystallized using the sitting drop vapor diffusion method by mixing protein solution (1 µL) and well solution (2 µL). Crystals appeared within a few hours at 18°C in 0.2 M magnesium acetate tetrahydrate, 20% PEG 3350, 10 mM Tris-HCl at pH 7.5, 50 mM NaCl and 0.1 mM ZnCl2 and continued to grow over several weeks.

3.3.9 Data collection and structure determination

The crystallographic data were collected at 100 K with a MarµX X-ray source (1.5418 Å) at the Australian National University, and crystals diffracted to a resolution of 1.6 Å. Diffraction images were indexed and integrated by XDS (Kabsch 2010).
The data were merged and scaled using Aimless in the CCP4 program suite. Phases were obtained by molecular replacement with MOLREP, using the structure deposited under PDB accession code 1P9E as the starting model. The model was refined using phenix.refine in Phenix v1.12 (Afonine et al. 2012) and Refmac v5.8 (Murshudov et al. 2011) in CCP4 v7.0, and was subsequently optimized by iterative model building with the program COOT v0.8.8 (Emsley & Cowtan 2004) (Ramachandran statistics: 96.13% preferred, 3.87% allowed and 0.00% outliers).

3.3.10 Molecular docking

The crystal structures of MPH (PDB ID: 1P9E) and AncDHCH1 (PDB ID: 6C2C) were prepared for computational docking using the protein preparation wizard in the Schrödinger software graphical user interface Maestro (version 11.0.015). Bond orders were assigned to the structures with the PDB Chemical Component Dictionary (CCD). Hydrogen atoms were added to the protein-ligand complex at pH 7. Bonds between metal ions and their ligands were added after correcting the formal charge on the metal ions and the surrounding atoms. The hydrogen bond network was optimized using PROPKA (Bas, Rogers, & Jensen 2008) at pH 7. The structure was further optimized by restrained minimization with convergence of heavy atoms to RMSD 0.3 Å using the OPLS3 force field (Harder et al. 2016). Docking grids were generated using the default settings with a box size of 20 Å, centered on the active site residues. The ligands were prepared using LigPrep with Epik and docked using the Glide (Friesner et al. 2006) SP method with default settings; 20 poses within the default energy window (>0.0 kcal/mol) were requested. The Glide scores (an empirical scoring function) for the various poses are shown in Table B.5. We further optimized a productive pose of the methyl-parathion docking using an energy minimization procedure.
The substrate was moved closer (3.2 Å) to the β-metal ion (based on the crystal structure of an OP:phosphotriesterase Michaelis complex (Jackson et al. 2008)) and the protein-ligand complex was prepared as described above. The structure was further optimized by restrained minimization with convergence of heavy atoms to RMSD 0.3 Å using the OPLS3 force field (Harder et al. 2016). Molecular mechanics geometry minimization was performed using the program Macromodel in Schrödinger, where the energy of the geometry was minimized using the OPLS3 force field with Polak-Ribiere Conjugate Gradient (PRCG) minimization.

3.3.11 Linear modeling of genetic and environmental effects

Definition of genetic and environmental encoding system

To quantify the genetic and environmental determinants of enzyme activity, we used an approach similar to that previously developed (Anderson et al. 2015; Stormo 2011). We constructed regression models that explain lysate activity (L.A.) as a function of the genetic states at the five variable amino acid residues in the enzyme. The genetic variation in the protein was defined in the linear models using one-dimensional variables for the substitutions; residues 72, 193, 258, 271, and 273 are described by single-dimensional vectors a, b, c, d, and e, respectively, with the ancestral state defined as -1 and the derived state defined as +1. These variables make the y-intercept of the linear model equal to the mean activity across all experimental measurements (Anderson et al. 2015; Stormo 2011); therefore, all genetic effects are expressed relative to the mean.

First-order linear models

We constructed our first-order model by regressing the lysate activity of each genotype on independent variables that reflect the individual first-order identities at each genetic position. For example, the linear model for position 72 is expressed as:

$\mathrm{L.A.} = a(u_1) + u_0$,

where $a$ is the effect coefficient of moving +1 in that dimension, $u_1$ is the coordinate representing the genotype (i.e., -1 for ancestral leucine, +1 for derived arginine), and $u_0$ is the y-intercept for the model (equal to the mean across the data). The linear coefficients for each model were computed using ordinary least squares (OLS) regression with the open-source statistical package R (http://www.r-project.org/). The coefficient $a$ indicates the deviation of the derived genetic state from the mean, while $-a$ gives the deviation of the ancestral genetic state from the mean. The total effect of the leucine-to-arginine substitution at position 72 is therefore equal to $2a$.

To determine how well all five first-order effects of substitutions in the protein predict variation in L.A., we constructed the following linear model that included all first-order protein coefficients:

$\mathrm{L.A.} = a(u_1) + b(u_2) + c(u_3) + d(u_4) + e(u_5) + u_0$,

where $u_2$, $u_3$, $u_4$, and $u_5$ are the coordinates representing the genotype for positions 193, 258, 271, and 273, respectively. We then computed the R² values for this first-order model.

Linear models with second-order genetic epistasis

To identify cases of second-order epistatic interactions, we individually introduced every possible interaction term for every two-way combination of genotypes at the variable sites in the protein. These interaction variables were constructed as previously described (Anderson et al. 2015; Stormo 2011). Each interaction is described by a new linear vector, the value for which is determined by taking the outer product between the two first-order linear vectors. For example, the interaction between sites 72 and 258 of the protein will be equal to $(a) \otimes (c) = (ac)$.

The second-order interaction effects are equal to the deviation from the additive effect modeled by each genetic state individually across other genetic backgrounds, and are defined herein as the “marginal” effect.
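The ±1 encoding and the interaction columns described above can be illustrated with a minimal Python sketch. It is not the R code used in this study: the function names (`genotypes`, `fit`, `predict`) and the toy activity values are hypothetical, and it assumes one activity value per genotype across the complete set of 2^5 = 32 genotypes. Under that assumption the ±1 columns are mutually orthogonal, so each OLS coefficient reduces to the mean of (column × activity), and the sketch uses this shortcut instead of a general least-squares solver.

```python
import itertools
from math import prod

# Sites a-e stand for residues 72, 193, 258, 271 and 273 (labels as in the text).
SITES = ("a", "b", "c", "d", "e")


def genotypes():
    """All 2^5 = 32 genotypes, coded -1 (ancestral) or +1 (derived) per site."""
    return list(itertools.product((-1, 1), repeat=len(SITES)))


def terms(max_order):
    """Index tuples for every first- to max_order-order term."""
    out = []
    for order in range(1, max_order + 1):
        out.extend(itertools.combinations(range(len(SITES)), order))
    return out


def fit(activity, max_order):
    """OLS coefficients for a complete, balanced +/-1 design.

    activity: dict mapping genotype tuple -> lysate activity (hypothetical data).
    An interaction column is the element-wise product of its site columns.
    Because all columns are orthogonal here, each coefficient is simply
    mean(column * activity); the intercept u0 is the mean activity.
    """
    n = len(activity)
    coef = {"u0": sum(activity.values()) / n}
    for combo in terms(max_order):
        name = "".join(SITES[i] for i in combo)
        coef[name] = sum(
            prod(g[i] for i in combo) * y for g, y in activity.items()
        ) / n
    return coef


def predict(coef, genotype):
    """Evaluate the fitted linear model for one genotype."""
    total = coef["u0"]
    for name, value in coef.items():
        if name != "u0":
            total += value * prod(genotype[SITES.index(s)] for s in name)
    return total
```

With a fitted model, `2 * coef["a"]` corresponds to the total effect of the substitution at position 72, and `coef["ac"]` is the coefficient of the 72_258 interaction column. For unbalanced or incomplete data the orthogonality shortcut no longer holds and a general OLS routine would be needed.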
As previously described, this method of encoding means that all terms are relative to the mean lysate activity, and so we multiplied this term by two in order to obtain the effect of each substitution in the derived genetic background (i.e., when the derived state exists at the other site) versus the average effect of that substitution regardless of the genetic background (e.g., interaction 72_258 is equal to 2ac).

One advantage of this method of encoding the genetic data is that the first-order model is nested within the second-order model. This allowed us to assess whether addition of the second-order model terms significantly improved the fit by comparing the improvement in the adjusted R² and by a likelihood ratio test against the simpler first-order model. The effect of each second-order interaction (i.e., the epistasis that should be added to the sum of the additive lower-order effects) can be solved from these coefficients.

Linear models with higher-order genetic epistasis

Additional models with third-, fourth-, and fifth-order interactions were constructed analogously, according to previously published analytical methods (Anderson et al. 2015; Stormo 2011).

3.3.12 Evaluation of correlated epistasis between different substrates

We also applied a second approach to characterizing epistatic interactions that was focused on assessing their correlation between different substrate environments. To make these assessments, we compared each substitution’s effect when introduced in every possible genetic background (so five alternatives for each substitution) and performed a linear regression for every pairwise comparison between different substrates. The total range in a substitution’s effects reflects the widest degree of epistatic interactions that it has with other substitutions in this dataset.
Therefore, the degree of correlation between those ranges across different environments should be reflected in the R² value for the linear regression of these effects between those environments. Highly significant correlation implies that the epistatic interactions are largely the same between environments, while a lack of significant correlation implies largely unique patterns of epistasis between the two environments being compared (see Figure 3.18).

3.3.13 Evolutionary pathway determination

To rigorously assess the likely evolutionary pathways through this sequence space, we computed single-step substitutions in the following way: from the starting genotype, we compared its catalytic activity to that of each genotype separated from that starting genotype by only a single substitution; we then identified the neighbouring genotype that had the highest activity, and identified it as a “step” if it also had a higher activity than the starting genotype; we then repeated the same protocol using that new post-step genotype as the new starting point; this process was repeated until arriving at a genotype for which there were no neighbouring genotypes with higher activity, which we then characterized as a “peak”.

One challenge that arose in computing Darwinian evolutionary trajectories in this way was the case of “ambiguous” steps; these were defined as steps where a neighbouring genotype had a higher average activity, but for which at least some replicate measurements were lower than at least some replicate measurements of the starting genotype. Because we did not feel we could definitively argue that either genotype had a higher catalytic activity than the other, we treated these as ambiguous steps and evaluated them under each possible scenario (either taking that step, or not taking that step) within the text, as appropriate.
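The stepwise procedure above amounts to a greedy uphill walk on the landscape. The following Python sketch shows one way to express it; it is an illustration under stated assumptions rather than the code used in this work. Genotypes are assumed to be tuples of 0 (ancestral) and 1 (derived), `replicates` is a hypothetical dictionary mapping each genotype to its replicate activity measurements, and the function names are invented for the example. Ambiguous steps are only flagged here; the sketch does not enumerate the alternative "step not taken" scenarios.

```python
from statistics import mean


def neighbours(genotype):
    """All genotypes that differ from `genotype` at exactly one site."""
    return [
        genotype[:i] + (1 - genotype[i],) + genotype[i + 1:]
        for i in range(len(genotype))
    ]


def is_ambiguous(start_reps, step_reps):
    """True if some replicate of the higher-mean genotype is lower than
    some replicate of the starting genotype (replicate ranges overlap)."""
    return min(step_reps) < max(start_reps)


def greedy_walk(start, replicates):
    """Follow the highest-activity single substitution until a peak.

    Returns (path, flags): the genotypes visited, and one boolean per step
    marking whether that step was ambiguous in the sense defined above.
    """
    path, flags = [start], []
    current = start
    while True:
        best = max(neighbours(current), key=lambda g: mean(replicates[g]))
        if mean(replicates[best]) <= mean(replicates[current]):
            return path, flags  # no fitter neighbour: `current` is a peak
        flags.append(is_ambiguous(replicates[current], replicates[best]))
        path.append(best)
        current = best
```

Because the mean activity strictly increases at every step, the walk always terminates. A flagged step marks the point at which both scenarios (taking or not taking the step) would need to be considered separately.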
3.3.14 Testing epistatic consequences with modeled datapoints

To examine the effect of different orders of epistasis on the distribution of mutational effects, we used the linear models described above to model datapoints based on the effects determined from first-, second-, and higher-order epistatic models, constructing modeled datasets that reflect the different degrees of epistatic complexity. We solved the linear equation for the first- and second-order epistatic model (see above) to create the “simpler” dataset, and then used the linear equation for the full (up to fifth-order) epistatic model to create the “complex” dataset (see Table B.8). We then determined the total effect of introducing each of the five key substitutions into every possible genetic background across this space, and compared how the variation in the effect of each substitution (i.e., its total epistasis) changed upon switching from the simpler to the more complex modeled data (see Figure 3.13 and Table B.8). We found that higher-order epistasis (i.e., three-way, four-way, and five-way interactions) has a significant impact on the distribution of mutational effects, causing, for example, the substitution at position 271 to exhibit sign epistasis, leading it to become beneficial only in the most derived genetic backgrounds (see Figure 3.11 B).

3.4 Results

3.4.1 MPH evolved from a dihydrocoumarin hydrolase enzyme

Previous studies have found that MPH is closely related to a group of enzymes in the metallo-β-lactamase (MBL) superfamily that exhibit high hydrolytic activity towards the lactone dihydrocoumarin (DHC) (Luo et al. 2014) (Figure 3.1). To rigorously establish the evolutionary relationship between DHC hydrolases (DHCH) and MPH enzymes, we performed multiple sequence alignment and phylogenetic analyses on 153 publicly available sequences that are the closest homologs to MPH-wt (GenBank: 1P9E_A) (Figure 3.2).
The vast majority of these sequences (all but six) exhibit a maximum of ~60% amino acid sequence identity with MPH-wt; the remaining six sequences form a small clade in which they share >90% identity. We then inferred the maximum likelihood amino acid sequence for the ancestral phylogenetic nodes that separate the MPH-like clade from the paraphyletic DHCH groups (Figure 3.2, Figure 3.3, and Table B.1). AncDHCH1 refers to the phylogenetic node that is the common ancestor for the MPH clade and a few DHCHs, including JsDHCH (GenBank: WP_043481628.1). AncDHCH2 and AncDHCH3 represent deeper phylogenetic nodes that are ancestral to AncDHCH1 and to other DHCHs (Figure 3.4). Overall, AncDHCH1 has 89% amino acid sequence identity to MPH-wt (differing at 31 substitutions and a single amino-acid insertion), and 73% identity to JsDHCH.  To narrow our analysis to the specific lineage on which MPH functionality evolved, we synthesized, expressed, and purified four extant enzymes (MPH-wt, JsDHCH, BbDHCH (GenBank: EHR68999.1), and SmDHCH (GenBank: WP_021504981.1)) and three reconstructed ancestral enzymes (AncDHCH1, AncDHCH2, AncDHCH3). We assayed these enzymes against a variety of substrates known to be hydrolyzed by enzymes in the MBL superfamily and lactonases in other enzyme superfamilies (Table B.2) (Baier & Tokuriki 2014; Khersonsky & Tawfik 2005). AncDHCH1 and the homologs exhibited high activity towards DHC, with moderate levels towards ester substrates. None of the enzymes exhibited detectable activity towards other lactones, or any of the other substrates that were tested. Thus, our results indicate that the enzymes are indeed most likely lactonases that hydrolyse DHC or similar substrates.   
We subsequently characterized the hydrolytic activities of all the purified enzymes towards DHC (the hypothetical ancestral substrate), methyl-parathion (the canonical MPH OP substrate), and three additional OP compounds (ethyl-parathion, methyl-paraoxon and ethyl-paraoxon) (Figure 3.4 and Table B.3). All four OP substrates contain the same leaving group (p-nitrophenol) but vary in terms of their substituent groups (i.e., sulfur vs. oxygen and ethyl vs. methyl). The extant DHCH homologs and ancestral enzymes all exhibit high DHCH activity (kcat/KM ≥ 10⁵ M⁻¹s⁻¹) and relatively low OP hydrolase (OPH) activity (kcat/KM < 10² M⁻¹s⁻¹), which is similar to other previously studied DHCH enzymes (Luo et al. 2014). In contrast, MPH-wt exhibits substantial activity against all four OP substrates (kcat/KM = 10² to 10⁴ M⁻¹s⁻¹) and only moderate DHCH activity (kcat/KM = 1.1 × 10³ M⁻¹s⁻¹). Interestingly, AncDHCH1 and AncDHCH2 exhibit higher OPH activity compared to the three extant DHCHs and AncDHCH3, albeit still 100- to 1000-fold lower than MPH-wt for three of the OP substrates (with ethyl-paraoxon being the exception) (Figure 3.4 D and Table B.3). These results suggest that OPH activity in AncDHCH1 was a non-physiological and serendipitous promiscuous activity that arose prior to the introduction of methyl-parathion into the environment, and was subsequently recruited and optimized in response to the appearance of this novel substrate.

Figure 3.2 Phylogenetic tree of MPHs and representative DHCH enzymes. A total of 153 of the closest MPH homologous sequences were used for the construction of the phylogeny via RAxML utilizing the LG protein model. Numbers at each of the branching nodes indicate bootstrap values. The cluster of MPH enzymes consists of a number of highly similar sequences (>90% sequence identity); the enzyme isolated from Pseudomonas sp. WBC-3 (MPH-wt) is utilized for this study and labeled in orange. DHCH enzymes that were selected for characterization are labeled in cyan.
Predicted ancestral sequences that were synthesized and characterized in this study are indicated by the cyan nodes. The most recent ancestral enzyme of the MPH (AncMPH) sequences is indicated by an orange node.

Figure 3.3 Site-specific posterior probabilities for ancestral amino acid sequences characterized in this study. a, The Bayesian posterior probabilities of the most likely predicted ancestral amino acid at each position in the resurrected sequences of AncDHCH1 (top), AncDHCH2 (middle), and AncDHCH3 (bottom). The average posterior probabilities for each sequence are shown at the top of each graph. b, Cartoon representation of the structure of AncDHCH1 with the locations of ambiguous positions (predicted ancestral residues with a Bayesian posterior probability <0.8) depicted as spheres. Colours of the spheres depict the posterior probability of the predicted residue, with red indicating lower probability and white higher probability. A full summary of the ambiguous positions and their posterior probabilities can be found in Table B.1. The two metal ions coordinated in the active site are shown as grey spheres.

[Panel a axes: ancestral amino acid position (1 to 330) vs. posterior probability of estimated amino acid (0.0 to 1.0). Average posterior probabilities: AncDHCH1, 0.93; AncDHCH2, 0.92; AncDHCH3, 0.89. Panel b colour bins: ≥ 0.7, 0.6 - 0.69, 0.5 - 0.59, ≤ 0.5.]

Figure 3.4 Phylogeny and phenotype of methyl-parathion hydrolase (MPH). a, Chemical structures of the four organophosphate (OP) and dihydrocoumarin (DHC) substrates utilized in this study. Dashed red lines indicate the bond that is cleaved during the reaction. Proposed mechanisms of MPH for OP and DHC hydrolase activities are described in Figure 3.6. b, Cartoon representation of the crystal structure of MPH from Pseudomonas sp.
WBC-3 (PDB entry 1P9E). The two active site metal ions are shown as grey spheres. c, Cartoon representation of the active site of MPH. The active site metal ions are shown as grey spheres. Residues coordinating the metals (H147, H149, D151, H152, H234, D255, and H302) are highlighted as sticks. d, Schematic representation of the phylogeny of MPHs, DHCHs, and predicted ancestral sequences. A full phylogeny of MPH is shown in Figure 3.2. Spheres represent individual sequences while wedges depict groups of sequences. The number of mutations and the level of identity between the sequences are indicated on the branches. The kinetic activities (kcat/KM, M⁻¹ s⁻¹) of the characterized sequences for DHC and the four OP compounds (methyl-parathion, ethyl-parathion, methyl-paraoxon, and ethyl-paraoxon) are shown next to their corresponding nodes on the phylogeny. The average of two technical replicates was used to fit to the Michaelis-Menten equation; error bars indicate the error in the fit of the data to the equation. A full description of the kinetic parameters of the sequences can be found in Table B.3.

3.4.2 Five mutations enabled the evolution of OP activity

We sought to determine the minimal set of genetic changes necessary for the evolution of OPH activity. 
Out of a total of 32 changes that occurred between AncDHCH1 and MPH-wt, we prioritized testing a subset based on two criteria: 1) the mutation is located in the vicinity of the active site, and 2) the residue otherwise exhibits high conservation within either the extant MPH orthologs or the DHCH orthologs (Figure 3.5 and Figure 3.6 A-B). We identified five mutations that satisfied these criteria: four substitutions, l72R, h258L, i271T, and f273L (the small italic letter denotes the ancestral state of each amino acid residue, and the large letter denotes the derived MPH state), and one insertion, Δ193S (Figure 3.6 A-B). We assessed the combined functional effect of these five mutations by introducing all five into the AncDHCH1 genotype (AncDHCH1+m5) and by reverting the five positions to their ancestral states in MPH-wt (MPH-m5), and then measuring the activities of the purified enzymes against DHC and the four OP substrates. The activity profile of MPH-m5 is almost identical to that of AncDHCH1, while AncDHCH1+m5 largely recapitulates the MPH-wt profile (Figure 3.6 C and Table B.3). Together, these five changes result in a 900-fold improvement in methyl-parathion activity and an 800-fold reduction in DHCH activity, an overall ~700,000-fold shift in relative MPH/DHCH activities.

MPH’s evolved specificity reflects not only the overall reaction chemistry, from C-O bond cleavage via a tetrahedral transition state (DHCH) to P-O bond cleavage via a pentacoordinate transition state (OPH), but also the substituents of the OP compounds, e.g., oxygen vs. sulfur (Figure 3.6 C). The ancestral enzymes (AncDHCH1 and MPH-m5) both exhibit an inverse linear relationship between log(kcat) and the partition coefficient (logP) of the four OP substrates, with methyl substituents preferred over ethyl and oxygen over sulfur (Figure 3.7 A). 
The five genetic changes, however, increase activity against sulfur-substituted substrates to a greater degree (~1000-fold for methyl-parathion and ~400-fold for ethyl-parathion) than against oxygen-substituted substrates (~200-fold for methyl-paraoxon and ~10-fold for ethyl-paraoxon). Consequently, the logP dependence is lost in the derived enzymes (MPH and AncDHCH1+m5), which no longer show a preference for oxygen over sulfur but still exhibit higher activities for methyl substituents over ethyl (Figure 3.7 A). Thus, MPH maintains a preference for methyl substituents but discriminates less between sulfur and oxygen substituents, whereas the interactions between the ancestral enzyme and OP substrates were governed to a greater degree by hydrophobic effects.

To understand the structural basis for the functional changes, we solved the crystal structure of AncDHCH1 to a resolution of 1.7 Å (PDB entry: 6C2C) (Figure 3.7 B and Table B.4) and compared it to the previously published structure of MPH-wt (PDB entry: 1P9E). The two enzymes have almost identical overall structures, with the Cα backbone atoms aligning with a maximum r.m.s.d. of 0.46 Å, although there is a visible change in the shape of the active site pockets (Figure 3.7 B). Previous work suggested that substrates coordinate to the β-metal ion, and that a terminal hydroxide ion bound by the α-metal ion in the active site carries out nucleophilic attack during OP hydrolysis (Purg et al. 2016) (Figure 3.8). The positions of the main catalytic machinery, including the active site metal ions and the residues that coordinate them, are nearly identical between AncDHCH1 and MPH (Figure 3.9). Four of the key mutations (l72R, h258L, i271T, and f273L) are clustered in relatively close proximity to each other and appear to function collectively to enlarge a potential binding pocket in the derived active site. 
The remaining mutation, Δ193S, is located on a loop opposite the other four residues and appears to have altered the conformation of the loop (Figure 3.7 B). We performed molecular docking of methyl-paraoxon and methyl-parathion in the active sites of AncDHCH1 and MPH. While the OP substrates can be positioned in productive poses in MPH, no productive poses were found in AncDHCH1 (Figure 3.10 and Table B.5). Further docking analyses, in which we refined the position of methyl-parathion in MPH and overlaid the substrate position onto the AncDHCH1 active site, indicate that the substrate encounters steric clashes with Phe273 in the ancestral enzyme (Figure 3.7 C). The clash is eliminated by mutation of this residue to a leucine in MPH; additionally, h258L increases the size of a binding pocket to better accommodate the ethyl/methyl group of the substrate (Figure 3.7 C).

Figure 3.5 A multiple sequence alignment of representative extant MPH, DHCH, and predicted ancestral enzymes. Residue numbering is based on the sequence of the structure of MPH from Pseudomonas sp. WBC-3 (MPH-wt, PDB ID: 1P9E). 
Extant MPH enzymes are labeled in light orange; extant DHCH enzymes and predicted ancestral sequences are labeled in pale cyan. Positions of metal binding residues are highlighted in green; positions of the five key functional mutations in MPH evolution (72, 193, 258, 271, and 273) are highlighted in orange. Sequences shown in the alignment: PsMPH, AsMPH, OsMPH, PlMPH, CsMPH, PzMPH, JsDHCH, ObDHCH, DzDHCH, BbDHCH, MpDHCH, AncDHCH1, AncDHCH2, AncDHCH3, and AncMPH.

Figure 3.6 Identification of five key adaptive mutations between AncDHCH1 and MPH. a, Cartoon representation of the crystal structure of MPH-wt (PDB entry 1P9E) with 32 mutations (spheres, 31 substitutions and 1 insertion) that occurred between AncDHCH1 and MPH-wt. Five active site mutations (l72R, Δ193S, h258L, i271T, and f273L) are highlighted as orange spheres. The two active site metals are shown as grey spheres. b, A cropped multiple sequence alignment of representative sequences of extant MPH, DHCH, and resurrected ancestral enzymes. Residues at the positions where the five active site mutations have occurred between AncDHCH1 and MPH are highlighted in orange. A full multiple sequence alignment is presented in Figure 3.5. c, The effects of the five mutations on DHC and OP kinetic activities. Genetic relationships between the four enzymes are described in the inner square. The average of two technical replicates was used to fit to the Michaelis-Menten equation; error bars indicate the error in the fit of the data to the equation. A full description of the kinetic activities of the sequences can be found in Table B.3.
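As a worked check of the specificity shift summarized in panel c, the two fold changes quoted in the text can be combined directly. This is only an arithmetic sketch; the variable names are illustrative, and the inputs are the rounded fold changes from the text rather than the fitted parameters in Table B.3.

```python
# Rounded fold changes quoted in the text for the five mutations.
oph_gain = 900    # fold improvement in methyl-parathion (OPH) activity
dhch_loss = 800   # fold reduction in DHC (DHCH) activity

# The OPH/DHCH activity ratio rises once through the gain in OPH activity
# and once more through the loss of DHCH activity, so the changes multiply.
specificity_shift = oph_gain * dhch_loss

print(f"{specificity_shift:,}-fold shift in relative OPH/DHCH activity")
```

The product, 720,000, agrees with the ~700,000-fold shift reported in the text.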
Figure 3.7 Structural and biochemical effects of five key mutations. a, A plot of the Hansch hydrophobic constant (logP) versus log(kcat) (top) and log(KM) (bottom) for MPH, MPH-m5, AncDHCH1, and AncDHCH1+m5. Plots were generated using the activity profiles of the enzymes towards four different OP substrates: methyl-paraoxon (logP = 0.97), ethyl-paraoxon (logP = 1.7), methyl-parathion (logP = 2.16), and ethyl-parathion (logP = 2.88). Pale cyan indicates enzymes where the five key residues are in their ancestral states; light orange indicates enzymes with the derived state residues. b, Surface representation of the active sites of AncDHCH1 (PDB entry 6C2C) and MPH (PDB entry 1P9E). The five active site mutations are highlighted as sticks. The two active site metal ions are shown as grey spheres. 
c, Overlay of the energy-minimized docking pose of methyl-parathion from MPH onto the apo AncDHCH1 active site (cyan, left), and the same pose in MPH (orange, right). The active site region where the substrate binds is depicted in surface mode. The active site metal ions are shown as grey spheres. The bridging and terminal (which acts as the likely nucleophile) water/hydroxide molecules are shown as red spheres. All initial molecular docking poses are presented in Figure 3.10 and the docking scores in Table B.5.

Figure 3.8 Proposed catalytic mechanisms of MPH for organophosphate and dihydrocoumarin hydrolysis reactions. a, A proposed mechanism for OP hydrolysis of MPH as described by Purg et al. (2016). Metal cations (Me) in the active site are coordinated by adjacent histidines and aspartic acids (His147, His149, Asp151, His152, His234, Asp255, and His302), and a terminal hydroxide ion, which serves as a nucleophile that attacks the phosphate of the substrate. b, A proposed mechanism for DHC hydrolysis of MPH. The hydroxide ion that serves as the nucleophile in OP hydrolysis is presumed to play the same role in DHC hydrolysis.

Figure 3.9 Comparison of the crystal structures of AncDHCH1 and MPH. a, Overlay of the cartoon representations of the crystal structures of AncDHCH1 (pale cyan, PDB entry 6C2C) and MPH (light orange, PDB entry 1P9E). 27 of the 32 amino acid substitutions that have occurred between the two enzymes are depicted as cyan spheres. Five amino acid mutations (l72R, Δ193S, h258L, i271T, and f273L) that occurred in the vicinity of the enzyme active sites are highlighted as orange spheres. The two metals coordinated in the enzyme active sites are shown as grey spheres. b, A close-up view of the active sites of the overlaid crystal structures of AncDHCH1 and MPH from a. Metal binding residues are highlighted as sticks. Metals bound in the active sites are shown as grey spheres.

Figure 3.10 Molecular docking poses of AncDHCH1 (cyan) and MPH (orange) from Glide in complex with a, methyl-parathion, and b, methyl-paraoxon. Enzymes are depicted in cartoon representation. Productive modes of the docked ligand (defined as being aligned for nucleophilic attack by the terminal hydroxide and positioned for terminal coordination to the β-metal ion) are depicted as thick sticks, while non-productive modes are depicted as thin sticks. For AncDHCH1, no productive modes of binding were identified with either substrate; only unproductive modes are displayed. For MPH, the productive poses are presented in the middle panels while the non-productive poses are presented on the right. The active site metal ions are shown as grey spheres. The bridging and terminal (which acts as the likely nucleophile) water/hydroxide molecules are shown as spheres. The top docking scores are presented in Table B.5.

3.4.3 The rugged adaptive landscape of MPH

We next sought to understand the evolutionary process that generated MPH activity. We characterized the topology of the adaptive landscape that encompasses the functional transition between AncDHCH1 and MPH by generating and assaying a complete combinatorial set of the five genetic changes (a total of 32 genotypic combinations) between MPH-m5 and MPH-wt (Figure 3.11 A, Table B.6, and Table B.7). All variants were overexpressed in E. coli and the catalytic activity of the cell lysate for methyl-parathion was measured. The soluble expression of all the genotypes is very similar, indicating that changes in the cell lysate activity largely reflect changes in catalytic efficiency and not protein stability and expression (Figure 3.12).

The five mutations, when introduced into MPH-m5, synergistically improve MPH activity by 970-fold, which exceeds the 210-fold improvement predicted from a null-additive model (i.e., no epistasis) based on the singular effect of each mutation in the background of the ancestral MPH-m5 (Figure 3.11 A). MPH-wt is found to be the single global maximum across this landscape, without any alternative local maxima. However, the adaptive landscape is rugged, and only a limited set of adaptive trajectories is accessible: only 19 out of the 120 (5!) possible trajectories that connect MPH-m5 to MPH-wt could be completed without intermediate steps that required a reduction in MPH activity, indicating that the order in which those mutations accumulated was highly constrained and rather deterministic (Figure 3.11 A). 
This is because three of the mutations (l72R, i271T, and f273L) exhibit sign epistasis, i.e., a reversal in the sign of the effect, from deleterious to beneficial or vice versa, which limits the number of genetic backgrounds in which they can fix (Figure 3.11 B). In particular, i271T is almost always deleterious and becomes beneficial only after the prior appearance of three mutations, l72R/h258L/f273L. The remaining two mutations (h258L and Δ193S) exhibit strong magnitude epistasis, ranging from highly positive to almost neutral (Figure 3.11 B). Given that MPH evolved in an environment with high amounts of methyl-parathion, we assume that, for a given genotype, the mutation that causes the greatest increase in OPH activity would be the most likely to fix. Following this assumption, we infer that the most likely order in which the five adaptive mutations accumulated was h258L, Δ193S, f273L, l72R, and i271T. Our proposed trajectory is supported by the polymorphism observed in extant MPH genes at sites 72 and 271, where a few of the putative MPH orthologs still contain the ancestral residues, suggesting that these mutations likely accrued at a later stage along the trajectory to the extant MPH-wt (Figure 3.5).

Figure 3.11 Adaptive landscape and mutational effects of key mutations for methyl-parathion hydrolase activity. a, The adaptive landscape between MPH-m5 and MPH-wt for methyl-parathion hydrolase activity. Each node represents a unique variant, with the genetic background indicated according to the numerical order of the residues (i.e., 72, 193, 258, 271, 273), where “0” refers to the ancestral and “1” refers to the derived state (e.g., 10000 denotes m5+l72R). The number in the centre of each node indicates its cell lysate activity relative to MPH-m5. 
Edges connecting nodes represent single mutational steps: dark grey arrows indicate paths that lead to an increase in fitness from the previous node and are evolutionarily accessible; dashed light grey lines indicate paths that are inaccessible due to a decrease in fitness from the previous node; and solid light grey lines indicate paths that lead to an increase in fitness but are inaccessible due to a decrease in fitness observed in a previous node. Results shown are the mean lysate activities for three biological replicates. A full description of the activities of all the variants can be found in Table B.7. b, The singular mutational effects (i.e., fold change in lysate activity) of each of the five mutations in the 16 different genetic backgrounds on methyl-parathion activity. Genetic backgrounds are indicated according to the numerical order of the residues (i.e., 72, 193, 258, 271, 273), where “0” refers to the ancestral and “1” refers to the derived state (e.g., 10000 denotes m5+l72R). All analyses were conducted using the mean lysate activities for three biological replicates. c, Statistical analysis of mutational effects in the adaptive landscape. Mutations are indicated by their residue numbers (i.e., 72 denotes the effect of l72R). The heights of the bars of the 1st order effects indicate the average effects (fold-change in lysate activity) of each of the five mutations. Widths of the individual bars in the 1st order effects correspond to the portion of variation in activity (R² in the linear regression model) attributable to each of the singular effects. The heights of the bars in the 2nd, 3rd, 4th, and 5th order effects correspond to the average epistatic effects of the combined mutations. Widths of the 1st, 2nd, 3rd, 4th, and 5th order effects correspond to the variation attributable to combined mutational effects, calculated as the increase in adjusted R² of the fit to the data when each term is added to a linear regression model. The panel reports R² = 0.68 for the 1st order, 0.22 for the 2nd order, 0.09 for the 3rd order, 0.0003 for the 4th order, and 0.0007 for the 5th order effects. d, Graphical representation of the effect of the five mutations and their epistatic relationships mapped on the structure of MPH (PDB entry 1P9E). Positions of each residue reflect the configuration of the alpha-carbon of the residues in the MPH crystal structure. The size and colour of the nodes represent the magnitude of the average singular effects on MPH activity of the mutation. Edges connecting nodes indicate significant epistatic interactions between the residues [log10(epistatic effect)>0.1]; solid lines indicate interactions between pairs while transparent lines indicate interactions between three residues. All analyses were conducted using the mean lysate activities for three biological replicates.
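The accessibility rule used in panel a, and the greedy assumption used in the text to infer the most likely order of fixation, can be sketched in a few lines of Python. The three-site landscape below is hypothetical (invented values, not the measured lysate activities in Table B.7), and the function names are illustrative only.

```python
from itertools import permutations

def accessible_orders(activity, n_sites):
    """Return every mutation order along which activity rises at each step."""
    found = []
    for order in permutations(range(n_sites)):
        genotype = [0] * n_sites
        current = activity[tuple(genotype)]
        for site in order:
            genotype[site] = 1
            step = activity[tuple(genotype)]
            if step <= current:
                break  # this step does not raise activity: path inaccessible
            current = step
        else:
            found.append(order)  # no step was rejected
    return found

def greedy_order(activity, n_sites):
    """At each step, fix the single mutation that raises activity the most."""
    genotype = [0] * n_sites
    order = []
    while len(order) < n_sites:
        best_site, best = None, activity[tuple(genotype)]
        for site in range(n_sites):
            if genotype[site]:
                continue
            trial = genotype.copy()
            trial[site] = 1
            if activity[tuple(trial)] > best:
                best_site, best = site, activity[tuple(trial)]
        if best_site is None:
            break  # no beneficial mutation left: stuck on a local maximum
        genotype[best_site] = 1
        order.append(best_site)
    return order

# Hypothetical landscape: site 0 is deleterious alone but beneficial later
# (sign epistasis), so every order that starts with site 0 is blocked.
toy = {
    (0, 0, 0): 1.0, (1, 0, 0): 0.5, (0, 1, 0): 3.0, (0, 0, 1): 2.0,
    (1, 1, 0): 10.0, (1, 0, 1): 4.0, (0, 1, 1): 5.0, (1, 1, 1): 50.0,
}
paths = accessible_orders(toy, 3)
print(len(paths), "of 6 orders are accessible")
print("greedy order of sites:", greedy_order(toy, 3))
```

Applied to the full five-site landscape, the same enumeration runs over 5! = 120 orders, which is how a count such as 19 accessible trajectories would be obtained.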
Figure 3.12 SDS-PAGE analysis showing soluble fractions of the 32 MPH combination variants. The band corresponding to the Strep-tagged MPH enzyme (33.2 kDa) is indicated by dashed red circles. The genotype of each unique variant is indicated by numbers corresponding to the positions of the five key mutations on the background of the wt sequence (i.e., 72 denotes wt-l72R mutation).

3.4.4 High-order epistasis between five mutations

To understand the molecular basis for the extensive epistasis and rugged adaptive landscape, we performed statistical analyses on the mutational effects by generating a series of nested linear models and fitting them to the landscape (Figure 3.11 C). First, we generated a simple non-epistatic model that determines the average “main effects” (1st order effects) of each mutation across all genetic backgrounds. Next, we constructed a series of more complex models that include pair-wise interaction effects (2nd order), and then 3rd order, 4th order, or 5th order effects (see Methods). This approach allows us to determine the “contribution” of each order of epistasis in terms of the improved R² of each linear model’s fit to the experimental data (Anderson et al. 2015; Stormo 2011). Overall, epistasis accounts for more than 30% of the total variation in MPH activity across this evolutionary landscape, with around 10% being explained by high-order epistasis (3rd-order or higher) (Figure 3.11 C). To verify the significance of higher-order epistasis in determining the pathways that are available, we reconstructed the adaptive landscape using the parameters obtained in the linear models (Figure 3.13 and Table B.8). 
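The idea of attributing variance to each order of interaction can be illustrated with a minimal sketch. It is not the procedure described in Methods: it assumes a complete set of genotypes, a -1/+1 coding of the ancestral/derived states, and plain rather than adjusted R². Under those assumptions every term is orthogonal to the others, so the R² gained by adding one order to a nested model equals the summed contribution of that order's terms. The data and the function name are hypothetical.

```python
from itertools import combinations, product
from math import prod

def r2_by_order(log_activity):
    """Fraction of variance in log activity explained by each order of effect.

    log_activity maps genotype tuples of 0/1 (ancestral/derived) to log10
    activity and must contain all 2**n genotypes.
    """
    n = len(next(iter(log_activity)))
    size = len(log_activity)
    mean = sum(log_activity.values()) / size
    ss_total = sum((y - mean) ** 2 for y in log_activity.values())
    r2 = {}
    for order in range(1, n + 1):
        ss_order = 0.0
        for sites in combinations(range(n), order):
            # With -1/+1 coding each term is orthogonal to all others in a
            # complete design, so its coefficient is a signed average.
            coef = sum(
                y * prod(1 if g[i] else -1 for i in sites)
                for g, y in log_activity.items()
            ) / size
            ss_order += size * coef ** 2
        r2[order] = ss_order / ss_total
    return r2

# Hypothetical three-site example: additive effects plus one pairwise term.
toy = {
    g: 1.0 * g[0] + 0.5 * g[1] + 0.2 * g[2] + 0.4 * g[0] * g[1]
    for g in product((0, 1), repeat=3)
}
r2 = r2_by_order(toy)
for order, value in r2.items():
    print(f"order {order}: R2 contribution = {value:.3f}")
```

In this toy example the contributions sum to one and the 3rd order term explains nothing, because the simulated data contain no three-way interaction.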
When only the 1st and 2nd order effects are included, the reconstructed landscape differs significantly from the experimental data. By contrast, when all the epistatic effects up to the 5th order are included, the reconstructed landscape roughly matches the experimental data.

To further decipher the molecular basis for epistasis, we visualized all 2nd- and 3rd-order epistatic interactions above a minimum threshold [log10(epistasis effect) > 0.1] as "edges" between the Cα atoms of the residues involved on the structure of MPH (Figure 3.11 D). This analysis unveiled a highly connected network of interactions among the five mutations. In several cases, e.g., h258L and i271T, the interacting residues are in relatively close proximity to each other in the active site, and may form direct physical interactions. However, epistasis is also observed between residues that are physically far apart: for example, Δ193S exhibits synergistic epistasis with l72R and f273L via both 2nd- and 3rd-order epistasis despite being located >10 Å away, on the opposite side of the active site (Figure 3.11 D and Figure 3.14). These long-range epistatic interactions may be mediated by the substrate itself, with the mutations affecting catalytic activities by altering the substrate position in the active site.

Figure 3.13 Reconstructed adaptive fitness landscapes of MPH for methyl-parathion activity using simulated data, where the activities of the 32 genotypes are predicted in linear regression models using a, only 1st and 2nd order effects and b, all epistatic effects up to the 5th order. The panels on the left show the fitness landscape for methyl-parathion. Each node represents a unique variant, with the genetic background indicated according to the numerical order of the residues (i.e., 72, 193, 258, 271, 273), where “0” refers to the ancestral and “1” refers to the derived state (e.g., 10000 denotes m5+l72R).
Number in the centre of each node indicates its cell lysate activity relative to MPH-m5. Lines connect nodes that are separated by single mutations. Solid dark grey lines indicate mutational pathways that are evolutionarily accessible (results in an increase in fitness from the previous node). Dashed light grey lines indicate pathways that are inaccessible due to a decrease in fitness from a previous node. Solid grey lines indicate pathways that lead to an increase in fitness, but are evolutionarily inaccessible due to a decrease in fitness observed in the previous node. The panels on the right show the difference in activity between the simulated landscape and the actual, with the numbers on each node depicting the ratio between the actual and predicted fold change in activities. The dashed grey lines indicate nodes that are separated by single mutations. A complete summary of the simulated data is presented in Table B.8.

[Figure 3.13 panel titles: a, activities predicted using effects up to the 2nd order; b, activities predicted using effects up to the 5th order. Legends: increase activity, decrease activity; higher than predicted, lower than predicted.]

Figure 3.14 Distances between the key functional residues in the crystal structures of a, AncDHCH1 (pale cyan, PDB entry 6C2C) and b, MPH (light orange, PDB entry 1P9E). The panels on the left depict cartoon representations of the crystal structures of the two enzymes, with the five key functional residues highlighted as sticks; because AncDHCH1 lacks a residue at position 193, the nearest residue, e192, is used for measurements of inter-residue distances. The panels on the right show the measured distances between the nearest atoms (top) and the Cα (bottom) of the five residues in the crystal structures of the two enzymes. Residues that are close enough for potential physical interactions (distance <4 Å) are bolded.

Inter-residue distances in AncDHCH1 (Å), given as nearest atoms / Cα: l72–e192, 9.6 / 14.4; l72–h258, 7.3 / 10.1; l72–i271, 4.1 / 5.6; l72–f273, 3.8 / 7.6; e192–h258, 13.4 / 18.6; e192–i271, 11.8 / 14.9; e192–f273, 8.2 / 10.1; h258–i271, 3.7 / 7.9; h258–f273, 3.9 / 9.4; i271–f273, 3.9 / 5.4.

Inter-residue distances in MPH (Å), given as nearest atoms / Cα: R72–S193, 8.9 / 15.8; R72–L258, 6.9 / 10.1; R72–T271, 5.3 / 5.3; R72–L273, 3.6 / 6.9; S193–L258, 15.4 / 20.5; S193–T271, 14.4 / 17.7; S193–L273, 9.1 / 13.1; L258–T271, 3.6 / 8.6; L258–L273, 4.4 / 9.7; T271–L273, 3.4 / 5.1.

3.4.5 Epistasis change between O- vs. S-substituted substrates

We conducted comparative landscape analyses between the four different OP substrates to determine how MPH evolved substrate specificity. We assayed all 32 genotypic variants against the remaining three OP substrates (ethyl-parathion, methyl-paraoxon, and ethyl-paraoxon) and quantified the epistatic interactions using the same linear modelling approach described above (Figure 3.15, Figure 3.16, and Figure 3.17). The topology of the adaptive landscapes and the main and epistatic effects of the mutations for methyl- vs. ethyl-parathion are largely similar (Figure 3.15 A and Figure 3.16 A); consequently, the structural epistatic interaction networks for these two substrates are virtually identical (Figure 3.17 A). Comparing the effects of each mutation across all 16 possible genetic backgrounds between the two substrates reveals a strong linear correlation (R² = 0.96), again indicating that epistatic interactions are similar in the two backgrounds; however, three mutations, l72R, Δ193S and f273L, exhibit a slight preference for the methyl substituent over ethyl (Figure 3.18 A).

By contrast, the landscapes for paraoxons (which have a phosphoryl oxygen instead of sulfur) are more rugged, containing several local maxima (Figure 3.15 B-C). Notably, the derived MPH-wt genotype is not the most active genotypic variant for paraoxons; instead, MPH-m5+271 and MPH-m5+72/193/271/273 are the two most active variants against ethyl-paraoxon (Figure 3.15 C).
The main and epistatic effects of each mutation for the two paraoxon substrates differ significantly from those for the parathion substrates (Figure 3.16 B-C); consequently, the interaction networks for the paraoxons are substantially different from those for the parathions, with the paraoxon networks exhibiting fewer significant 2nd-order and more 3rd-order epistatic interactions (Figure 3.17 B-C). Most notably, the average effect of h258L switches from positive in parathion to negative in paraoxon; conversely, the average effect of i271T changes from negative in parathion to positive in paraoxon. Moreover, the 2nd order effect of h258L-i271T changes from synergistic in the parathion backgrounds to strongly antagonistic in the paraoxon backgrounds (Figure 3.16 B-C and Figure 3.17 B-C).

Figure 3.15 Adaptive landscapes for three additional OP substrates. The adaptive landscape of 32 variants between MPH-m5 and MPH-wt for a, ethyl-parathion, b, methyl-paraoxon, and c, ethyl-paraoxon. Each node represents a unique variant with the number at the centre representing its cell lysate activity relative to MPH-m5. Edges connecting nodes represent single mutational steps. The colour scheme of the edges is the same as in Figure 3.11 A. Results shown are the mean lysate activities for three biological replicates. A full description of the activities of all the variants can be found in Table B.7.
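
The linear modelling approach used above to partition variation in activity among orders of epistasis can be illustrated with a minimal sketch. It is not the analysis code used in this work: the 0/1 genotype encoding, the function names, and the synthetic `log_activity` values are assumptions made for illustration, and the sketch reports the gain in plain R² rather than adjusted R².

```python
from itertools import combinations, product

import numpy as np

POSITIONS = (72, 193, 258, 271, 273)  # residue numbers of the five mutations


def design_matrix(genotypes, max_order):
    """Intercept plus one 0/1 interaction column per residue subset up to max_order."""
    columns = [np.ones(len(genotypes))]
    for order in range(1, max_order + 1):
        for subset in combinations(range(genotypes.shape[1]), order):
            # The column is 1 only when every position in the subset is derived.
            columns.append(genotypes[:, list(subset)].prod(axis=1))
    return np.column_stack(columns)


def r_squared(X, y):
    """Ordinary least-squares fit; returns the unadjusted R^2."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residual = y - X @ coef
    return 1.0 - (residual @ residual) / np.sum((y - y.mean()) ** 2)


def r2_gain_by_order(genotypes, log_activity):
    """Increase in R^2 as each successive order of interaction terms is added."""
    gains, previous = {}, 0.0
    for order in range(1, genotypes.shape[1] + 1):
        r2 = r_squared(design_matrix(genotypes, order), log_activity)
        gains[order] = r2 - previous
        previous = r2
    return gains


# Hypothetical landscape: three additive effects plus one pairwise interaction.
genotypes = np.array(list(product((0, 1), repeat=len(POSITIONS))))
log_activity = (0.5 * genotypes[:, 0] + 1.0 * genotypes[:, 1]
                - 0.3 * genotypes[:, 2] + 0.8 * genotypes[:, 3] * genotypes[:, 4])
gains = r2_gain_by_order(genotypes, log_activity)
```

In this synthetic example all of the variation is captured by the 1st and 2nd order terms, so the gains for the 3rd to 5th orders are zero up to rounding error. With measured activities, the analogous gains correspond to the per-order R² contributions reported in Figure 3.11 C and Figure 3.16.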

[Figure 3.16 panel annotations: variation in activity (R²) explained by each order of effects. a, ethyl-parathion: 1st order, 0.64; 2nd order, 0.23; 3rd order, 0.12; 4th order, 0.001; 5th order, 0.0003. b, methyl-paraoxon: 1st order, 0.49; 2nd order, 0.33; 3rd order, 0.16; 4th order, 0.006; 5th order, 0.0006. c, ethyl-paraoxon: 1st order, 0.41; 2nd order, 0.46; 3rd order, 0.12; 4th order, 0.004; 5th order, 0.0002. Axes: fold effect on activity (left panels) and fold change in lysate activity (right panels).]

Figure 3.16 Statistical analyses of singular and epistatic mutational effects for a, ethyl-parathion, b, methyl-paraoxon, and c, ethyl-paraoxon. Panels on the left depict the statistical analyses of the mutational effects in the adaptive landscapes. Mutations are indicated by their residue numbers (i.e., 72 denotes the effect of l72R). The heights of the bars of the 1st order effects indicate the average effects (fold-change in lysate activity) of the five mutations. Widths of the individual bars in the 1st order effects correspond to the portion of variation in activity (R² in the linear regression model) attributable to each of the singular effects. The heights of the bars in the 2nd, 3rd, 4th, and 5th order effects correspond to the average epistatic effects of the combined mutations. Widths of the 1st, 2nd, 3rd, 4th, and 5th order effects correspond to the variation attributable to combined mutational effects, calculated as the increase in adjusted R² of the fit to the data when each term is added to a linear regression model. Panels on the right depict the singular mutational effects (i.e., fold change in activity) of each of the 5 mutations in the 16 different genetic backgrounds. Genetic backgrounds are indicated according to the numerical order of the residues (i.e., 72, 193, 258, 271, 273), where “0” refers to the ancestral and “1” refers to the derived state (e.g., 10000 denotes m5+l72R).
All analyses were conducted using the mean lysate activities for three biological replicates.

Figure 3.17 Mutational analyses for three additional OP substrates. A graphical representation of the effect of the five mutations and their epistatic relationships mapped on the structure of the MPH (PDB entry 1P9E) for a, ethyl-parathion, b, methyl-paraoxon, and c, ethyl-paraoxon. Positions of each residue reflect the configuration of the alpha-carbon of the residues in the MPH crystal structure. The colour scheme for the nodes and edges is described in Figure 3.11 D. A full description of the statistical characterization is given in Figure 3.16.

Figure 3.18 The effects of mutations in different genetic backgrounds. Singular mutational effects of each of the five key mutations in all 16 possible genetic backgrounds for a, ethyl-parathion, b, methyl-paraoxon, and c, ethyl-paraoxon plotted against that for methyl-parathion. d, Double mutational effects of h258L-i271T in all 8 possible genetic backgrounds for methyl-paraoxon plotted against methyl-parathion. The solid black line running through the graph represents a slope of 1. The dashed line indicates the linear fit, with the R² shown in the bottom right corner of each plot. All analyses were conducted using the mean lysate activities for three biological replicates.

3.4.6 Adaptive landscapes yield insight into OP specificity

Paraoxons and parathions are identical in overall size and structure; thus, the observed differences in enzyme specificity must stem from the phosphoryl sulfur or oxygen substituent. The reduced electronegativity of sulfur results in a less polarized, more hydrophobic molecule.
Although this results in slower base-catalyzed hydrolysis in solution (Hong & Raushel 1996), the chemical reactivity of the oxons and thions with highly reactive 4-nitrophenol leaving groups is not substantially different in low-dielectric environments such as gas-phase simulations or enzyme active sites (Jackson et al. 2005).

[Figure 3.18 panel annotations. Axes: log(fold change in lysate activity) for methyl-parathion against the compared substrate. Linear fits: a, ethyl-parathion, R² = 0.96; b, methyl-paraoxon, R² = 0.14; c, ethyl-paraoxon, R² = 0.02; d, h258L/i271T effects for methyl-paraoxon, R² = 0.93.]

Our aforementioned molecular docking experiments indicate that steric hindrance in the ancestral active site restricts the productive binding of OPs; the mutations appear to have made the active site more accommodating to the new substrates (Figure 3.7 C). Interestingly, the KM values (i.e., formation of the enzyme:substrate complex) change little over the course of the evolution; almost all of the change in activity can be attributed to improvements in kcat. The turnover rates for oxons are ~10-fold higher than those for thions in the ancestral enzymes; the rates become comparable in the derived enzymes. Notably, none of the five mutations appear to directly interact with the terminal oxygen/sulfur, which loosely associates with the active site metal ions, nor do they directly affect the catalytic machinery (nucleophile generation or leaving group departure) (Figure 3.7 C).
Thus, the improvements in activity do not appear to result from changes in the chemistry of the reaction, but are more likely the result of changes in the productivity of substrate binding (i.e., the substrate is in a conformation aligned for hydrolysis). We speculate that the more polar oxons have higher affinity for the active site metal ions than thions, enabling them to bind more productively in the ancestral active site. In contrast, the improved active site-substrate complementarity in the evolved enzymes enables both oxons and thions to bind productively, resulting in more equal turnover rates.

Interestingly, characterization of the adaptive fitness landscapes for methyl-parathion and methyl-paraoxon reveals that two mutations, h258L and i271T, interact strongly with each other and affect the oxon vs. thion turnover (Figure 3.18 B-C and Figure 3.19). The average effect of h258L is mildly positive for both parathion and paraoxon when position 271 is in the ancestral state (Ile), but becomes more positive for parathion and strongly negative for paraoxon when position 271 is in the derived state (Thr) (Figure 3.19). Conversely, the average effect of i271T is deleterious for parathion and positive for paraoxon when position 258 is in the ancestral state (His), but becomes deleterious for both substrates when position 258 is in the derived state (Leu) (Figure 3.19). A plot of the double mutational effects of h258L-i271T for methyl-paraoxon against those for methyl-parathion in the eight genetic backgrounds exhibits a linear correlation (R² = 0.93) (Figure 3.18 D), indicating that the strong epistasis between h258L and i271T varies in a ligand-dependent manner. i271T is adjacent to h258L, and may affect the position of the latter through steric hindrance and rotamers. We hypothesize that the conformations of these two residues influence the orientation of the substrate in the active site and its alignment for nucleophilic attack.
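
The background-dependent effects described above (the effect of one mutation averaged over the eight backgrounds in which a second position is held in a given state) can be computed as in the following sketch. The dictionary layout, the function name `effects_of`, and the toy activities are assumptions for illustration only; they are not measured values.

```python
from itertools import product
from statistics import mean

POSITIONS = (72, 193, 258, 271, 273)  # order of positions in each genotype tuple


def effects_of(log_activity, position, given=None):
    """Log10 fold-change from introducing `position` in every background that
    satisfies `given`, a mapping {residue number: 0 (ancestral) or 1 (derived)}."""
    i = POSITIONS.index(position)
    given = given or {}
    effects = []
    for background in product((0, 1), repeat=len(POSITIONS)):
        if background[i] == 1:
            continue  # the mutation is already present in this background
        if any(background[POSITIONS.index(p)] != state for p, state in given.items()):
            continue  # background does not match the requested state
        derived = background[:i] + (1,) + background[i + 1:]
        effects.append(log_activity[derived] - log_activity[background])
    return effects


# Hypothetical log10 activities with antagonism between positions 258 and 271.
log_activity = {
    g: 0.2 * g[2] + 0.3 * g[3] - 1.0 * g[2] * g[3]
    for g in product((0, 1), repeat=len(POSITIONS))
}

h258L_with_ile271 = effects_of(log_activity, 258, given={271: 0})
h258L_with_thr271 = effects_of(log_activity, 258, given={271: 1})
# Pairwise epistasis: change in the mean effect of h258L once i271T is present.
pairwise_epistasis = mean(h258L_with_thr271) - mean(h258L_with_ile271)
```

A negative `pairwise_epistasis`, as in this toy example, corresponds to the antagonistic h258L-i271T interaction described for the paraoxon substrates; a positive value would correspond to synergy.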

Taken together, our results indicate that improving productive substrate binding by increasing active site-substrate complementarity is one mechanism by which MPH evolved activity for the more hydrophobic parathions. This modification is acquired through the collective effects of five mutations that form a complex network of epistatic interactions, and thus must arise in the right order to enable this evolutionary transition.

Figure 3.19 Changes in the singular and epistatic effects of h258L and i271T between methyl-parathion and methyl-paraoxon substrates. Labels in the centre of the nodes indicate the genotypes at the two positions, with the mutation occurring between each genotype indicated on the arrow. The singular effects of h258L and i271T in the 8 possible genetic backgrounds where one of the residues is in the ancestral state (going from the hi genotype) or in the derived state (going from either the Li or hT genotype), along with the pairwise effects of h258L-i271T, are plotted for methyl-paraoxon (red) and methyl-parathion (yellow). The dots on the plots represent the effects of the mutation(s) across the different genetic backgrounds, the heights of the bars represent the average of these effects, and the whiskers indicate the standard deviation. All analyses were conducted using the mean lysate activities for three biological replicates.

3.5 Discussion

Previous studies using ASR examined functions that evolved millions or billions of years ago (Boucher et al. 2014; Bridgham, Carroll, & Thornton 2006; Clifton et al. 2018; Hochberg & Thornton 2017; Kratzer et al. 2014; McKeown et al.
2014). Our study demonstrates that this technique, combined with biochemical and mutational assays, can effectively uncover the molecular mechanisms underlying recently evolved functional novelty. The introduction of xenobiotics into the environment has led to the evolution of many novel enzymatic functions (Copley 2000; Russell et al. 2011). While numerous xenobiotic-degrading enzymes have been functionally characterized, the evolutionary origins and dynamics of most of these new sequences are still unknown due to large genetic “gaps” that currently exist in the databases (Afriat-Jurnou et al. 2012; Copley 2000; Crawford, Jung, & Strap 2007). As more novel sequences are discovered, ASR is becoming an increasingly valuable tool for researching functional evolution.

Our observations of the mechanisms underlying the evolution of MPH are consistent with the conclusions of several previous studies on protein evolution. Specifically, efficient MPH enzymes emerged rapidly, via a handful of genetic changes, by optimizing a promiscuous activity present in the ancestral state (Boucher et al. 2014; Clifton et al. 2018; Luo et al. 2014; Siddiq, Hochberg, & Thornton 2017). At the same time, epistasis between key adaptive mutations is prevalent, and acts to constrain the evolutionary pathways that were available (Lunzer et al. 2005; Meini et al. 2015; O'Maille et al. 2008; Sailer & Harms 2017; Tufts et al. 2015; Weinreich et al. 2006; Weinreich et al. 2018); early mutation(s) played a permissive role by epistatically generating or enhancing the positive effect of later mutations. As a consequence, the order in which substitutions accumulated was likely deterministic (Kaltenbach & Tokuriki 2014; Lobkovsky & Koonin 2012; Miton & Tokuriki 2016; Starr, Picton, & Thornton 2017).
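
How epistasis restricts the available mutational orders can be made concrete with a small sketch that enumerates the 5! = 120 possible orders of five mutations and keeps those along which activity increases at every step, the accessibility criterion used for the landscapes in Chapter 3. The toy landscape below, in which the first mutation is permissive for all others, and the function names are illustrative assumptions rather than measured data or the code used in this work.

```python
from itertools import permutations, product

N_SITES = 5


def mutate(genotype, site):
    """Return the genotype with `site` flipped between 0 and 1."""
    return genotype[:site] + (1 - genotype[site],) + genotype[site + 1:]


def accessible_orders(activity):
    """Orders of the five mutations along which activity rises at every step."""
    orders = []
    for order in permutations(range(N_SITES)):
        genotype = (0,) * N_SITES
        for site in order:
            step = mutate(genotype, site)
            if activity[step] <= activity[genotype]:
                break  # this step does not increase activity
            genotype = step
        else:
            orders.append(order)
    return orders


def local_maxima(activity):
    """Genotypes whose single-mutation neighbours all have lower activity."""
    return [g for g in activity
            if all(activity[mutate(g, s)] < activity[g] for s in range(N_SITES))]


def toy_activity(genotype):
    """Hypothetical landscape: other mutations are deleterious without site 0."""
    if genotype[0] == 0 and any(genotype[1:]):
        return -1.0
    return float(sum(genotype))


activity = {g: toy_activity(g) for g in product((0, 1), repeat=N_SITES)}
orders = accessible_orders(activity)
peaks = local_maxima(activity)
```

In this toy landscape only the 24 orders that begin with the permissive mutation are accessible, and the fully derived genotype is the single peak. A more rugged landscape would return additional entries in `peaks`.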

Most previous studies characterizing adaptive fitness landscapes have focused on unveiling the mutational epistasis that constrains the accessibility of evolutionary pathways (Lunzer et al. 2005; Meini et al. 2015; O'Maille et al. 2008; Sailer & Harms 2017; Tufts et al. 2015; Weinreich et al. 2006). Our in-depth statistical analyses of the adaptive landscapes of five key mutations, combined with robust biochemical and biophysical characterization, provided deep insight into the molecular mechanisms underlying the optimization of methyl-parathion activity. Moreover, we developed a novel approach in which we compare the adaptive landscapes for multiple substrates to decipher the interactions between mutations and substrate substituents. Our results suggest that the MPH active site improved complementarity to methyl-parathion through the collective effects of five mutations that form a complex and interconnected network. We speculate that each mutation impacts catalysis by reshaping the active site to better accommodate the new substrate and reducing non-productive modes of binding, while also reorienting the substrate to optimize nucleophilic attack and transition state stabilization. It is probable that higher turnover rates of the oxons in the non-complementary ancestor are a consequence of the more polarized P=O bond resulting in an increased likelihood of productive coordination to the active site metal ions; in contrast, the more hydrophobic thions likely coordinate non-specifically through van der Waals interactions. The effects of non-productive binding have previously been described for many proteins, including chymotrypsin and an unrelated bacterial phosphotriesterase (Ingles & Knowles 1967; Jackson et al. 2005).
This is also reminiscent of recent work that demonstrated how laboratory evolution of an arylsulfatase was able to cause a 100,000-fold increase in phosphonate-monoester hydrolase activity by enlarging the active site and repositioning the substrate (Miton et al. 2018). In such cases, the structural consequences of mutations can be subtle in the active site, with sub-Ångström distance and angle adjustments resulting in optimized catalytic machinery, causing profound changes in catalytic efficiency and specificity. While the extent to which this model can account for the molecular mechanisms for novel enzyme functions more broadly remains to be seen, it is consistent with other models of enzyme dynamics and conformational diversity (Campbell et al. 2016; Jiménez-Osés et al. 2014; Tokuriki & Tawfik 2009). Fully assessing this will require a detailed understanding of the molecular relationships between the active site and the substrate in other examples of enzyme functional evolution. Combined with comprehensive and deep mutational analyses (Anderson et al. 2015; Canale et al. 2018; Sunden et al. 2015), studies such as this can deepen our understanding of the molecular mechanisms that underlie the evolution of new protein functions.

Chapter 4: Historical contingency in the evolution of organophosphate hydrolase activity

4.1 Summary

Neutral genetic drift – i.e., the accumulation of mutations that have little or no effect on a protein’s function in its current environment – has generated substantial sequence variation amongst enzymes that catalyze the same reaction. Such variation can result in large differences in the physiologically irrelevant promiscuous activities of the enzymes, as well as in their ability to optimize a promiscuous activity via mutations. Consequently, some genotypes may be more capable of adapting under a novel selective pressure and acquiring new functions than others.
In our previous analysis, we identified five key mutations that enabled the optimization of a promiscuous xenobiotic organophosphate hydrolase (OPH) activity in the dihydrocoumarin hydrolase (DHCH) ancestor of the enzyme methyl-parathion hydrolase (MPH). Here, we examine how genetically diverged DHCH enzymes differ in their abilities to evolve OPH activity. We find that the ancestral sequence of MPH exhibits a higher starting level of promiscuous OPH activity compared to five extant DHCH orthologs that were characterized in this study. Moreover, the presence of epistasis has caused the five mutations to exhibit a broad range of effects on different genotypes, ranging from a ~1000-fold increase in OPH activity in the ancestral DHCH enzyme to being neutral or deleterious in most of the orthologs. Finally, comparative directed evolution of the DHCH ancestor and the five orthologs reveals that the ancestor is able to improve OPH activity more rapidly, and is the most efficient enzyme after three rounds of directed evolution. Our work highlights the importance of the genetic starting point for successful evolution towards a novel function.

4.2 Introduction

How does genetic variation affect an enzyme’s evolvability, defined here as “the ability of a protein to adapt in response to mutation and selective pressure” (Romero & Arnold 2009)? It has been demonstrated that many proteins exhibit genetic robustness, i.e., tolerance to mutations, resulting in considerable sequence divergence amongst orthologous proteins due to genetic drift (Paaby & Rockman 2014; Tokuriki & Tawfik 2009). While such variation is usually neutral to the native function of the proteins, it may impact latent promiscuous activities that are not physiologically relevant, and are thus not under selection (Amitai, Gupta, & Tawfik 2007; Baier et al. 2019; Bloom et al. 2007; Khanal et al. 2015; Palmer et al. 1999).
When a novel selective pressure is applied, however, such as the introduction of a new substrate into the environment, sequences in a genetically diverse population that fortuitously possess promiscuous activities towards the substrate will likely confer a fitness advantage to their host organism and undergo adaptive evolution (Amitai et al. 2007; Bloom et al. 2007).

In addition to altering the latent promiscuous activities of sequences, genetic variation can also affect the ability of an enzyme to optimize a function through mutations. Many studies have indicated that intramolecular epistasis – a phenomenon where a mutation’s effect changes depending on the presence or absence of other mutations – is prevalent in proteins (Miton & Tokuriki 2016; Noor et al. 2012; Sailer & Harms 2017; Smith 1970; Tufts et al. 2015; Weinreich et al. 2006). Epistasis can change the effects of the same mutations between genetically diverged orthologous sequences (Baier et al. 2019; Khanal et al. 2015; Parera & Martinez 2014). For example, a study that tested the effect of an A156T substitution on 56 variants of the hepatitis C virus NS3 protease found that the mutation was deleterious in 46 (82.1%) of the backgrounds, but neutral in six (10.8%), and beneficial in four (7.1%) (Parera & Martinez 2014). Similarly, an E382G substitution exhibited differential effects on both the native and promiscuous activities of nine L-gamma-glutamyl phosphate reductase (ProA) orthologs (Khanal et al. 2015). Thus, genetically diverged sequences may require different sets of mutations and mechanisms in order to adapt to a new function. Furthermore, diverged sequences can differ in their evolutionary outcomes, that is, the final level of improvement in a function that they are able to achieve.
A directed evolution experiment conducted on two orthologous β-lactamases towards a promiscuous phosphonate monoester hydrolase (PMH) activity found marked differences in both genetic (i.e., types of mutations) and phenotypic (i.e., catalytic efficiency, solubility, thermostability) changes between the two sequences (Baier et al. 2019). Most notably, one ortholog acquired an overall ~18,000-fold improvement in catalytic activity after ten rounds of evolution whilst the other acquired only a ~20-fold improvement (Baier et al. 2019). While laboratory evolution has strongly indicated that different genotypes will differ in their evolvabilities towards novel functions, whether this can be applied to natural examples remains to be seen. Methyl-parathion hydrolase (MPH) is an enzyme from the metallo-β-lactamase (MBL) superfamily that has acquired the ability to degrade xenobiotic organophosphate (OP) substrates (Liu et al. 2005). In our previous analyses, we found that MPH evolved from an ancestral sequence that exhibits high dihydrocoumarin hydrolase (DHCH) activity (hence named “AncDHCH1”) via five key functional mutations (Figure 4.1). While numerous DHCH orthologs exist in the MBL superfamily, only two sequences that have acquired efficient OPH activity have been isolated thus far: MPH and OPHC2 (Gotthard et al. 2013; Liu et al. 2005; Luo et al. 2014). MPH was originally discovered in Pseudomonas sp. WBC-3 that was growing in the polluted soil near a factory manufacturing methyl-parathion (Sun et al. 2004), and has since disseminated to many different strains of bacteria via horizontal gene transfer (HGT). The absence of other OPH enzymes raises the question of whether diverged DHCH orthologs differ in their evolvabilities towards the novel function. Are there only certain genetic backgrounds that are capable of rapidly acquiring efficient OPH activity?
Will different orthologous sequences acquire the same mutations that MPH did, or do they require their own distinct sets of mutations and mechanisms?

In this study, we address these questions by testing the abilities of ten different DHCH enzymes, sharing 50-89% sequence identity with MPH, to optimize OPH activity. We find that sequence divergence has resulted in AncDHCH1 possessing higher promiscuous OPH activity than the extant DHCH orthologs that were characterized. Moreover, due to epistasis, the five functional mutations exhibited a broad range of effects in the different genetic backgrounds, ranging from a ~1000-fold increase in OPH activity in AncDHCH1 to being either neutral or deleterious in most of the orthologs. Comparative directed evolution of AncDHCH1 and the orthologs towards OPH activity reveals that the orthologs are unable to acquire the same catalytic efficiency towards the novel substrate as the ancestral sequence.

Figure 4.1 The evolution of OPH activity in MPH. (A) A schematic phylogeny of the reconstructed ancestral sequences and orthologs utilized in this study. Circles denote individual sequences whereas triangles indicate clusters of sequences. The catalytic activities (kcat/KM) of the characterized enzymes are displayed in bar graphs. The five mutations that enabled the evolution of OPH activity in MPH are indicated on the branch between AncDHCH1 and MPH. OsDHCH and PvDHCH were not included in our original phylogenetic analyses; their approximate relationships are indicated in the schematic phylogeny. (B) Reaction schemes of dihydrocoumarin hydrolase (DHCH, top) and methyl-parathion hydrolase (OPH, bottom) activities. (C) Overlay of the energy-minimized docking pose of methyl-parathion in MPH in the apo AncDHCH1 active site (left, cyan, PDB ID: 6C2C) and in MPH (right, orange, PDB ID: 1P9E). The five key mutations are highlighted as sticks. The active site metal ions are shown as grey spheres.
The bridging and terminal (which acts as the likely nucleophile) water/hydroxide molecules are shown as red spheres. (D) The host organisms from which MPH sequences have been isolated. The percentages of the orders of bacteria are indicated outside the pie chart. A full list of the MPH sequences is displayed in Table C.1.

[Figure 4.1 graphic. Panel A: bar graphs of kcat/KM (M⁻¹s⁻¹, logarithmic axis from 10⁻¹ to 10⁷) for the ancestral and extant enzymes; the branch between AncDHCH1 and MPH is labelled with l72R, Δ193S, h258L, i271T, and f273L. Panel D: pie chart of host orders (Pseudomonadales, Burkholderiales, Xanthomonadales, Rhizobiales, Sphingomonadales, Enterobacterales, and unknown).]

4.3 Materials and Methods

4.3.1 Phylogenetic analysis and ancestral reconstruction

Inference of the phylogeny of MPH and reconstruction of the ancestral sequences were conducted as previously described (Yang et al. 2019). Briefly, the amino acid sequences of 500 MPH homologs were downloaded from GenBank using protein BLAST starting from Pseudomonas sp. WBC-3 MPH (GI:30038775) in July 2014. Subsequently, redundant sequences with more than 98% sequence identity were removed computationally using CD-Hit (Fu et al. 2012) with standard parameters, resulting in a total of 306 sequences. Then, a multiple sequence alignment (MSA) of these sequences was generated using MUltiple Sequence Comparison by Log-Expectation (MUSCLE) (Edgar 2004), and sequences containing major indels were manually removed, further reducing the number of sequences to 153. For the reconstruction, an MSA of the aforementioned 153 sequences was used, generated using M-Coffee with standard parameters (Wallace et al.
2006). The phylogeny was inferred via Randomized Axelerated Maximum Likelihood (RAxML; Stamatakis 2014), utilizing the LG protein model (Le & Gascuel 2008) with parameters +I+G+F, which was the best-fitting evolutionary model according to the Akaike Information Criterion, tested via PROTTEST (Abascal et al. 2005). The a posteriori bootstrapping analysis in RAxML was performed with the extended majority-rule consensus tree criterion. Amino acid sequences of ancestral nodes in the phylogeny were inferred with PAML (version 4.7a; Yang 2007) employing the above-mentioned alignment, phylogeny, and evolutionary model. As PAML does not support indel reconstruction, indels in the inferred ancestors were manually reconstructed using the maximum parsimony principle.

4.3.2 Genomic context analysis

The amino acid sequences of 500 MPH and DHCH orthologs were downloaded from UniProt using protein BLAST starting from Pseudomonas sp. WBC-3 MPH (UniProt ID: Q841S6) in April 2020. The DNA regions flanking the sequences were retrieved via EFI-GNT, using a neighbourhood size (number of neighbouring genes upstream and downstream) of 10 (Zallot, Oberg, & Gerlt 2019).

4.3.3 Cloning

The nucleotide sequences of the MPH ancestral enzymes, all with signal peptides removed, were designed using the same codons found in the wild-type MPH nucleotide sequence for regions where the amino acid sequences are identical and E. coli optimized codons for regions where the amino acid sequences diverge. The nucleotide sequences of the orthologs SmDHCH, PvDHCH, and OsDHCH were synthesized in the same manner, whereas the orthologs JsDHCH and BbDHCH were entirely E. coli codon optimized. For directed evolution, a codon-optimized version of AncDHCH1 was used.
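The codon-assignment rule described above for the ancestral genes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `design_gene`, the per-residue comparison (the text speaks of identical and diverged regions, not individual positions), the gap handling, and the `ECOLI_PREFERRED` table of commonly used E. coli codons are all hypothetical and are not taken from the thesis or from any gene-synthesis tool.

```python
# Illustrative E. coli codon preferences (an assumption, not the table used in the thesis).
ECOLI_PREFERRED = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}


def design_gene(ref_protein, ref_dna, target_protein, preferred=ECOLI_PREFERRED):
    """Assign codons to target_protein using a reference gene.

    ref_protein and target_protein are aligned strings of equal length,
    with '-' marking gaps. ref_dna encodes the ungapped reference protein.
    Where the target residue matches the reference, the reference codon is
    reused; otherwise the preferred codon for the target residue is used.
    """
    if len(ref_protein) != len(target_protein):
        raise ValueError("aligned sequences must have equal length")
    if len(ref_dna) != 3 * len(ref_protein.replace("-", "")):
        raise ValueError("ref_dna does not match the ungapped reference protein")

    codons = []
    ref_index = 0  # position in the ungapped reference protein
    for ref_aa, tgt_aa in zip(ref_protein, target_protein):
        ref_codon = None
        if ref_aa != "-":
            ref_codon = ref_dna[3 * ref_index:3 * ref_index + 3]
            ref_index += 1
        if tgt_aa == "-":
            continue  # residue absent from the target: emit nothing
        if tgt_aa == ref_aa:
            codons.append(ref_codon)          # identical residue: keep reference codon
        else:
            codons.append(preferred[tgt_aa])  # diverged or inserted residue
    return "".join(codons)


# Toy example (not real MPH sequence): the middle residue differs from the reference.
toy_gene = design_gene("MLK", "ATGTTAAAA", "MRK")
```

In the toy call, the first and last codons are copied from the reference and only the diverged middle position receives a codon from the preference table.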
For all of the orthologous sequences and AncDHCH4 and AncDHCH5, both the wild-type sequences and the versions carrying the five MPH mutations (+m5) were synthesized; for AncDHCH1, AncDHCH2, AncDHCH3, and MPH, the five mutations were introduced using site-directed mutagenesis (described below). All genes contained an Nco I restriction site on the forward side and a Hind III restriction site on the reverse side for cloning. The resulting genes were synthesized (BioBasic and Twist Bioscience), amplified, and cloned into a pET vector (Novagen) containing an N-terminal MBP (maltose-binding protein) tag using the Nco I and Hind III restriction enzymes (Thermo Scientific). Correct sequences of the synthesized genes and their subcloning into the vector were confirmed by DNA sequencing.

4.3.4 Site-directed mutagenesis

To generate single and combinatorial mutants, non-overlapping pairs of primers ~30 bp in length and encoding the desired mutations were designed with a Lgu I restriction site at the 5’ end followed by an overlapping segment of 3 bp to enable annealing of the PCR products following restriction (Table C.3). The entire plasmid containing the target gene was amplified using PCR, with each 50 µL reaction containing 1 × Kapa Hifi Buffer (Kapa Biosystems), 2 ng template DNA, 1 µM of equal amounts of forward and reverse mutational primers, 0.25 mM dNTPs, and 0.25 U Kapa Hifi polymerase (Kapa Biosystems). The cycling conditions were as follows: initial denaturation at 95°C for 2 minutes followed by 30 cycles of denaturation (20 seconds, 95°C), annealing (15 seconds, 60-65°C), and extension (5 minutes, 72°C), and a final extension step at 72°C for 5 minutes. Subsequently, each PCR was treated with Dpn I (Thermo Scientific) for 1 hour at 37°C to digest the template DNA, and the products were purified using the Cycle Pure PCR purification kit (Omega Bio-tek). The PCR products were digested with Lgu I (Thermo Scientific) for 1 hour at 37°C.
The digested product was purified using Cycle Pure PCR purification kit (Omega Bio-tek). Ligations were performed in 20 µL reactions with approximately 20 ng of DNA using T4 DNA ligase (Thermo Scientific), and incubated at room temperature for 1-2 hours. The ligation mixtures were transformed into E. cloni 10G cells (Lucigen). Successful mutagenesis was confirmed by sequencing.  4.3.5 Purification of tagged proteins The plasmids containing MBP-tagged proteins were transformed into E. coli BL21 (DE3) and grown in LB with 50 µg/mL kanamycin overnight. The following day, 8 mL of the overnight cultures were used to inoculate 400 mL of 2x YT media with 50 µg/mL kanamycin and 100 µM ZnCl2, and the cultures were grown at 30°C, 280 × rpm for 3 hours. The cultures were subsequently cooled to 16°C for 30 minutes, and 0.2 mM of IPTG was added to induce protein expression, and the cultures incubated at 16°C overnight. Cells were harvested by spinning at 4°C, 3,220 × g for 10 minutes, and the supernatant removed. For lysis, the cell pellets were frozen at -80°C overnight, and then re-suspended in a mixture of B-PER Protein Extraction Reagent (Thermo Scientific) and 50 mM Tris-HCl buffer, pH 7.5 containing 200 µM ZnCl2, 100 µg/mL lysozyme, and 0.5 U benzonase, and incubated on ice for 1 hour. Cell debris were removed by centrifugation at 4°C, 48,400 × g for 30 minutes. The clarified lysate was loaded into columns containing about 2 mL of amylose resin (New England Biolabs). The columns were washed once with Buffer A (50 mM Tris-HCl, pH 7.5 containing 100 mM NaCl and 200 µM ZnCl2), once with Buffer B (50 mM Tris-HCl, pH 7.5 containing 300 mM NaCl and 200 µM ZnCl2), and a final time with Buffer A. MBP-tagged proteins were eluted with Buffer A containing 10 mM maltose (Sigma-Aldrich). To desalt the proteins, the buffer of eluted proteins was exchanged with 50 mM Tris-HCl, pH 7.5 containing 200 µM ZnCl2. 
The eluted proteins were concentrated using a Microsep Advance Centrifugal Device, 10K Omega (Pall Life Sciences).

4.3.6 Enzyme kinetics

The kinetic parameters of the purified enzymes were measured using the same procedure as described previously (Baier & Tokuriki 2014). Briefly, the activity for methyl-parathion (Sigma) was monitored by following the release of p-nitrophenol at 405 nm with an extinction coefficient of 18,300 M⁻¹cm⁻¹ in 50 mM Tris-HCl, pH 7.5 containing 200 µM ZnCl2. The activity for dihydrocoumarin (Sigma) was monitored at 270 nm with an extinction coefficient of 1,300 M⁻¹cm⁻¹ in 50 mM HEPES, pH 7.5 containing 200 µM ZnCl2. Data were corrected for buffer-catalyzed rates under identical conditions, if significant. The kinetic parameters KM and kcat were determined by fitting the initial rates to the Michaelis-Menten model, $v_0 = k_{cat}[E]_0[S]_0/(K_M + [S]_0)$, using KaleidaGraph (Synergy Software).

4.3.7 Generation of mutagenized libraries

Random mutant libraries were generated with error-prone PCR using the nucleotide analogues 8-oxo-2’-deoxyguanosine-5’-triphosphate (8-oxo-dGTP) and 2’-deoxy-P-nucleoside-5’-triphosphate (dPTP) (TriLink). Two independent PCRs were prepared, one with 8-oxo-dGTP and one with dPTP. Each 50 µL reaction contained 1 × GoTaq Buffer (Promega), 1.5 mM MgCl2, 1 ng template DNA, 1 µM of primers (forward (T7 promoter): taatacgactcactataggg; reverse (T7 terminator): gctagttattgctcagcgg), 0.25 mM dNTPs, 1.25 U GoTaq DNA polymerase (Promega), and either 100 µM 8-oxo-dGTP or 1 µM dPTP. PCR cycling conditions: initial denaturation at 95°C for 2 minutes followed by 20 cycles of denaturation (30 seconds, 95°C), annealing (60 seconds, 56°C), and extension (90 seconds, 72°C), and a final extension step at 72°C for 5 minutes. Subsequently, each PCR was treated with Dpn I (Thermo Scientific) for 1 hour at 37°C to digest the template DNA. PCR products were purified using the Cycle Pure PCR purification kit (Omega Bio-tek) and amplified in a second PCR.
The 50 µL reaction contained 10 ng template from each initial PCR, a nested set of primers at 1 µM, 1 × Q5 Reaction Buffer (NEB), 1 × Q5 High GC Enhancer (NEB), 0.25 mM dNTPs, and 0.5 U Q5 High-Fidelity DNA Polymerase (NEB). PCR cycling conditions: initial denaturation at 95°C for 30 seconds followed by 30 cycles of denaturation (10 seconds, 98°C), annealing (15 seconds, 58°C), and extension (45 seconds, 72°C), and a final extension step at 72°C for 2 minutes. The PCR products were purified as described above. The protocol yielded an average of 1-2 amino acid substitutions per gene (2-3 bp changes) per round.

4.3.8 Cloning of mutagenized libraries

Mutant libraries of AncDHCH1 and the orthologous enzymes were sub-cloned into the same vector described in 4.3.3. PCR products and vectors were digested with Nco I and Hind III (Thermo Scientific) for 1-2 hours at 37°C. The digested vector was further treated with FastAP (alkaline phosphatase, Thermo Scientific) for an additional hour. Digested vector was purified from a 1% agarose gel using a gel extraction kit (Qiagen). Digested PCR products were purified as described above. Ligations were performed in 20 µL reactions at a vector:insert molar ratio of 1:3 using T4 DNA ligase (Thermo Scientific) with approximately 20 ng vector DNA, and incubated at room temperature for 1-2 hours. The ligation mixtures were transformed into E. cloni 10G cells (Lucigen), yielding >10⁴ colonies. The colonies were pooled, and the plasmids were purified using a plasmid purification kit (Qiagen) and retransformed into E. coli BL21 (DE3) cells for enzyme expression and activity screening.

4.3.9 Cell lysate screen in 96-well plates

Colonies picked into 96-well plates were grown in 200 µL of LB with 50 µg/mL kanamycin at 30°C overnight.
20 µL of each culture was used to inoculate 400 µL of LB with 50 µg/mL kanamycin and either 100 µM ZnCl2 (for AncDHCH1) or 100 µM CoCl2 (for all of the orthologous sequences), and the cultures were incubated at 30°C for 3 hours. Protein expression was then induced by adding IPTG to a final concentration of 1 mM, and the cultures were incubated at 30°C for another 3 hours. Cells were harvested by centrifugation at 3,220 × g for 10 minutes at 4°C and pellets were frozen at -80°C overnight. Cells were lysed by adding 200 µL of 50 mM Tris-HCl pH 7.5, 100 mM NaCl, 0.1% (w/v) Triton-X100, either 200 µM ZnCl2 (for AncDHCH1) or 200 µM CoCl2 (for all of the orthologous sequences), 100 µg/mL lysozyme, and 0.5 U benzonase (Novagen). After 30 minutes of incubation at room temperature, the lysate was clarified at 3,220 × g for 20 minutes at 4°C. To assay enzymatic activity, 20 µL of the clarified lysate was mixed with 80 µL methyl-parathion solution at a final concentration of 400 µM in 50 mM Tris-HCl pH 7.5, 100 mM NaCl, 0.02% Triton-X100 and either 200 µM ZnCl2 (for AncDHCH1) or 200 µM CoCl2 (for all of the orthologous sequences), and the reaction was monitored at 405 nm. The activity of the best variants was subsequently confirmed in triplicate cultures and activity assays. The variant with the highest activity was used as the template for the next round of directed evolution.

4.4 Results

4.4.1 Sequence information and genomic context

The ancestral sequences AncDHCH1, AncDHCH2, AncDHCH3, AncDHCH4, and AncDHCH5 were obtained from our previous ancestral reconstruction conducted in 2014 (Yang et al. 2019). We additionally synthesized five extant orthologous sequences obtained from the NCBI GenBank database using protein BLAST starting from the MPH sequence from Pseudomonas sp. WBC-3 (GI:30038775): JsDHCH (GenBank: WP_043481628.1), BbDHCH (GenBank: EHR68999.1), SmDHCH (GenBank: WP_021504981.1), PvDHCH (GenBank: WP_028105201.1), and OsDHCH (GenBank: WP_114217252.1).
The orthologs share between ~50% to ~70% sequence identity with MPH (GenBank: 1P9E_A) (Figure 4.1, Figure 4.2, Table 4.1, and Table 4.2). Our previous phylogenetic analysis has indicated that AncDHCH1 is the ancestral sequence to the MPH clade and JsDHCH, whereas AncDHCH2 is the ancestor to AncDHCH1 and BbDHCH, and AncDHCH3 is the ancestral sequence of AncDHCH2 and a clade of DHCH enzymes that includes SmDHCH. AncDHCH4 is an intermediate node between AncDHCH1 and JsDHCH, and AncDHCH5 is an intermediate node between AncDHCH2 and BbDHCH (Figure 4.1 A). For this study, we have included two additional orthologs, OsDHCH and PvDHCH, that weren’t included in our original analysis conducted in 2014. PvDHCH has higher similarity to MPH, sharing 63% sequence identity, whereas OsDHCH represents a more diverged ortholog with 52% sequence identity (Table 4.2).    To gain more insight into the host organisms that carry MPH and DHCH sequences, we performed protein BLAST search analysis (GenBank) using the MPH sequence (Table C.1). The results yielded a total of 29 sequences that exhibited >90% sequence identity to MPH; a considerable drop off is observed, with the next closest ortholog having only ~70% sequence identity. Analysis of the organisms from which MPH sequences have been isolated found that the majority (34.5%) of them are Burkholderiales, with other common hosts being Rhizobiales (24.1%), and Pseudomonadales (13.8%) (Figure 4.1 D). These orders of bacteria also commonly carry DHCH enzymes, including the orthologs that were characterized in this study (Table 4.1), which can potentially serve as evolutionary starting points for the novel OPH activity. Yet, instead of independently evolving a new OPH enzyme from a DHCH ortholog, all of these bacteria appear to have acquired MPH via HGT.    115  Currently, there is little known about the DHCH enzymes in the MBL superfamily and what functional role they play in their host organisms. 
We examined the genomic contexts of the sequences to determine whether they are part of any characterized operons and metabolic pathways. We performed protein BLAST search analysis (UniProt) using the MPH sequence (UniProt ID: Q841S6) and downloaded 500 MPH and DHCH orthologs. The DNA regions flanking these sequences were retrieved via EFI-GNT (Figure 4.3 and Table C.2). The orthologs do not appear to be in any identified bacterial operons, and many of the flanking genes are uncharacterized (Figure 4.3 and Table C.2). Thus, the functional roles of these enzymes are still unknown.

Table 4.1 Information of the enzymes used in this study.

Enzyme    UniProt ID  GenBank accession  PDB ID  Organismal source                    Length¹
AncDHCH1  n.a.        n.a.               6C2C    Predicted ancestor                   330 a.a.
AncDHCH2  n.a.        n.a.               n.a.    Predicted ancestor                   331 a.a.
AncDHCH3  n.a.        n.a.               n.a.    Predicted ancestor                   352 a.a.
AncDHCH4  n.a.        n.a.               n.a.    Predicted ancestor                   317 a.a.
AncDHCH5  n.a.        n.a.               n.a.    Predicted ancestor                   323 a.a.
MPH       Q841S6      1P9E_A             1P9E    Pseudomonas sp. WBC-3                331 a.a.
JsDHCH²   L9P8Z6      WP_043481628.1     n.a.    Janthinobacterium sp. HH01           317/318 a.a.
BbDHCH    H5WTC6      EHR68999.1         n.a.    Burkholderiales bacterium JOSHI_001  309 a.a.
PvDHCH    n.a.        WP_028105201.1     n.a.    Pseudoduganella violaceinigra        312 a.a.
SmDHCH    n.a.        WP_021504981.1     n.a.    Serratia marcescens                  314 a.a.
OsDHCH    n.a.        WP_114217252.1     n.a.    Ochrobactrum sp. 3-3                 328 a.a.

¹ Length includes the N-terminal signal peptide.
² The JsDHCH sequence that has been deposited in the UniProt database contains an extra amino acid in the N-terminal signal peptide.

Table 4.2 Amino acid sequence identities of enzymes characterized in this study.
          MPH       AncDHCH1  AncDHCH2  AncDHCH3  AncDHCH4  AncDHCH5  JsDHCH    BbDHCH    PvDHCH    OsDHCH    SmDHCH
MPH
AncDHCH1  89%
AncDHCH2  80%       90%
AncDHCH3  71%       80%       89%
AncDHCH4  69%       79%       73%       67%
AncDHCH5  75%       84%       91%       82%       68%
JsDHCH    66%       72%       67%       61%       90%       63%
BbDHCH    67%       73%       79%       71%       62%       87%       57%
PvDHCH    63%       71%       69%       64%       69%       68%       64%       63%
OsDHCH    52%       58%       58%       55%       54%       56%       51%       50%       53%
SmDHCH    54%       60%       64%       61%       54%       62%       51%       56%       54%       52%

A multiple sequence alignment of the amino acid sequences of the enzymes, all with the N-terminal signal peptides removed, was generated using MUSCLE (standard parameters) and then used to calculate the pairwise sequence identities with the web-based program SIAS (http://imed.med.ucm.es/Tools/sias.html), using the length of the longest sequence with gaps taken into account.

[Figure 4.2 alignment graphic. Rows: MPH, AncDHCH1, AncDHCH2, AncDHCH3, AncDHCH4, AncDHCH5, JsDHCH, BbDHCH, PvDHCH, OsDHCH, and SmDHCH. Residue numbering and secondary structure elements (β1-β15, α1-α9, η1-η2) follow MPH.]

Figure 4.2 Sequence alignment
of all of the reconstructed ancestral and orthologous sequences utilized in this study. A multiple sequence alignment was first generated using MUSCLE, and features were highlighted using ESPript 3 (Robert & Gouet 2014), with sequence similarities depiction parameters of BLOSUM62 and a global score of 0.7. The secondary structure elements of MPH (PDB ID: 1P9E) are displayed at the top of the alignment. Conserved residues are highlighted in red boxes; the positions of the five key mutations are highlighted in orange boxes.

Figure 4.3 Genomic context of DHCH orthologs. The amino acid sequences of 500 MPH and DHCH orthologs were downloaded from the UniProt database using protein BLAST starting from MPH from Pseudomonas sp. WBC-3 (UniProt ID: Q841S6) in April 2020. The DNA regions flanking the sequences were retrieved via EFI-GNT (https://efi.igb.illinois.edu/efi-gnt/), using a neighbourhood size (number of neighbouring genes upstream and downstream) of 10. Results from the genomes of the host organisms carrying MPH and two orthologs used in this study, JsDHCH (UniProt ID: L9P8Z6) and BbDHCH (UniProt ID: H5WTC6), along with genomes from several different strains of Burkholderiales are presented. The MPH and DHCH orthologs are indicated by the black box. A full description of the neighbouring genes is presented in Table C.2.

[Figure 4.3 graphic. Genome neighbourhoods shown (query UniProt ID; organism; enzyme), scale bar 3 kbp:
Q841S6; Pseudomonas sp. (strain WBC-3); MPH
H5WTC6; Burkholderiales bacterium JOSHI_001; BbDHCH
L9P8Z6; Janthinobacterium sp. HH01; JsDHCH
A0A4Y8MT85; Paraburkholderia dipogonis
A0A3E0C5U3; Paraburkholderia sp. BL6669N2
E1T966; Burkholderia sp. (strain CCGE1003)
I2IR17; Burkholderia sp. Ch1-1
A0A329BL42; Paraburkholderia bryophila]

4.4.2 Characterization of ancestral enzymes and orthologs

The nucleotide sequences of all of the enzymes were cloned into a pET vector containing an N-terminal MBP (maltose-binding protein) tag, and recombinantly expressed in E. coli BL21 (DE3) cells. SDS-PAGE analysis of the soluble (lysate) and insoluble (pellet) fractions revealed that MPH and all of the ancestral proteins are quite soluble, with >60% of the protein being present in the cell lysate fractions (Figure 4.4 A). Most of the extant orthologs are also quite soluble, the notable exception being BbDHCH, which has only approximately 7% of the protein in the cell lysate fraction (Figure 4.4 A). Insoluble protein aggregates are usually caused by the accumulation of misfolded or partially folded proteins (Trimpin & Brizzard 2009; Villaverde & Carrió 2003). Thus, the amount of biologically active molecules in the cell is diminished for an aggregation-prone protein like BbDHCH, which will likely decrease the overall fitness that the enzyme confers to the host. We subsequently purified all of the enzymes using affinity chromatography and measured their activities towards the lactone dihydrocoumarin (DHC) and methyl-parathion (the canonical OP substrate of MPH) (Figure 4.4 B-C and Table C.4). Note that we refer to the wild-type MPH enzyme, which has the five key mutations necessary for efficient OPH activity, as being in the “derived state”, whereas the wild-type ancestral sequences and orthologs, which do not have the mutations, are referred to as being in the “ancestral state” (Figure 4.2 and Figure 4.4). All of the ancestral enzymes and DHCH orthologs have high DHCH activity (kcat/KM = 10⁵-10⁷ M⁻¹s⁻¹), and low levels of promiscuous OPH activity (kcat/KM = 10⁻¹-10¹ M⁻¹s⁻¹) (Figure 4.4 B-C and Table C.4).
The wild-type MPH, on the other hand, has lower DHCH activity compared to the other enzymes, but much higher OPH activity (kcat/KM = 2.1 × 10⁴ M⁻¹s⁻¹) (Figure 4.4 B-C and Table C.4). Of the DHCH enzymes characterized, the ancestral sequences AncDHCH1, AncDHCH2, and AncDHCH5 exhibited the highest levels of OPH activity (kcat/KM ~10¹ M⁻¹s⁻¹), albeit still ~1000-fold lower compared to MPH, whereas AncDHCH3, AncDHCH4, and all five of the extant orthologs have lower OPH activities (kcat/KM ≤ 10⁰ M⁻¹s⁻¹) (Figure 4.4 C and Table C.4). In particular, the activities of SmDHCH and OsDHCH towards methyl-parathion were too low to be determined. In our previous study, we found that many of the DHCH enzymes have higher activity towards the OP substrate methyl-paraoxon, which differs from methyl-parathion in having a phosphoryl oxygen in place of the sulfur (Yang et al. 2019); hence, we measured the OPH activities of SmDHCH and OsDHCH towards methyl-paraoxon. The activities of the two enzymes towards this substrate are still extremely low (kcat/KM ~10⁻¹ M⁻¹s⁻¹) (Figure 4.4 C). Taken together, AncDHCH1, AncDHCH2, and AncDHCH5 possess higher latent promiscuous activities compared to the other sequences, and may thus provide a greater fitness advantage to host organisms if the activity were to come under selection. Notably, these three enzymes also have the highest sequence identity to MPH (75%-89%) (Table 4.2), suggesting that genetic backgrounds that are more similar to MPH may be more likely to exhibit higher OPH activity.

Figure 4.4 Biochemical characterization of the ancestral enzymes and orthologs. (A) SDS-PAGE analysis of the solubilities of the reconstructed ancestral and orthologous enzymes. Both wild-type (wt, left) and sequences carrying the five MPH mutations (+m5, right) are shown (MPH-m5 denotes the MPH sequence where the five residues have been reverted back to their ancestral states). 
The soluble (S) and insoluble pellet (P) fractions of cell lysates were analyzed by SDS-PAGE. The percentages of protein in the soluble and insoluble fractions were determined from the relative intensities of the supernatant and pellet bands using ImageJ. A full version of the gels can be found in Figure C.1. (B and C) The catalytic activities of the ancestral and orthologous enzymes with the five key positions in either the ancestral (black) or derived (red) states for (B) DHC and (C) OP substrates. Arrows indicate the increase or decrease in activity due to the five mutations. For MPH, the wild-type sequence is the “derived state” while MPH-m5 is the “ancestral state”. A single asterisk indicates activity that is too low to be measured. A double asterisk indicates that methyl-paraoxon activity is reported instead of methyl-parathion activity for OPH activity. A full description of the catalytic activities is shown in Table C.4.

[Figure 4.4 panel labels. (A) Percentage of protein in the soluble/insoluble fractions, wt then +m5 (for MPH, wt then -m5): AncDHCH1 88/12, 92/8; AncDHCH2 88/12, 93/7; AncDHCH3 90/10, 87/13; AncDHCH4 64/36, 38/62; AncDHCH5 64/36, 42/58; MPH 82/18, 86/14; JsDHCH 50/50, 38/62; BbDHCH 7/93, 1/99; PvDHCH 33/67, 49/51; SmDHCH 89/11, 81/19; OsDHCH 88/12, 84/16. (B and C) y-axis: catalytic activity (kcat/KM); legend: ancestral state, derived state.]

4.4.3 Effects of the five mutations on different genetic backgrounds

In our previous analysis, we found five key mutations between AncDHCH1 and MPH that enabled the acquisition of efficient OPH activity in MPH: four substitutions, l72R, h258L, i271T, and f273L (the lowercase letter denotes the ancestral state of each amino acid residue while the uppercase letter denotes the derived MPH state), and one insertion, Δ193S (Figure 4.1 A-C and Figure 4.2). 
We tested the effects of these five mutations on the backgrounds of the ancestral enzymes and extant DHCH orthologs (Figure 4.4 and Table C.4). Note that we refer to MPH-m5, where the five positions have been reverted back to their ancestral states, as being in the “ancestral state”, while for the rest of the enzymes the +m5 variant, which has the five MPH residues, is referred to as being in the “derived state” (Figure 4.2 and Figure 4.4). Although AncDHCH4+m5 and AncDHCH5+m5 exhibit a slight decrease in the amount of protein that is present in the lysate fraction, the solubilities of the +m5 variants for most of the enzymes do not substantially change; hence, the amount of biologically active molecules in the cell is not affected (Figure 4.4 A). The mutations are deleterious to the DHCH activity in all of the backgrounds, resulting in 100-fold or greater decreases in catalytic efficiency; in particular, the DHCH activities of BbDHCH and JsDHCH become too low to be determined above the buffer-catalyzed background rate in the +m5 variant (Figure 4.4 B). Thus, the five residues are important for the native lactonase function. As we have shown in our previous study in Chapter 3 (Yang et al. 2019), the five mutations enabled a ~1000-fold increase in OPH activity in AncDHCH1 and a similar fold-decrease in the activity in MPH when the residues are reverted back to the ancestral state: AncDHCH1+m5 has OPH activity similar to that of wild-type MPH (kcat/KM ~10⁴ M⁻¹s⁻¹) (Figure 4.4 C and Table C.4), while MPH-m5 has OPH activity similar to that of wild-type AncDHCH1 (kcat/KM ~10¹ M⁻¹s⁻¹) (Figure 4.4 C and Table C.4). Although the five mutations still increased OPH activity in the backgrounds of AncDHCH2, AncDHCH3, and AncDHCH4, they did so to a substantially lesser extent. 
AncDHCH2 has a starting OPH activity that is similar to that of AncDHCH1, but the five mutations only improved the activity by ~130-fold in that background, resulting in a final OPH activity of 1.6 × 10³ M⁻¹s⁻¹ (~9-fold lower compared to AncDHCH1+m5) (Figure 4.4 C and Table C.4). AncDHCH3 and AncDHCH4 have lower starting OPH activities (kcat/KM = 0.26 M⁻¹s⁻¹ and 0.87 M⁻¹s⁻¹, respectively). The catalytic activity of AncDHCH3 increased ~16-fold (kcat/KM = 4.3 × 10⁰ M⁻¹s⁻¹ in AncDHCH3+m5), and the activity of AncDHCH4 increased ~85-fold (kcat/KM = 7.4 × 10¹ M⁻¹s⁻¹ in AncDHCH4+m5) (Figure 4.4 C and Table C.4). While AncDHCH5 has a starting OPH activity that is similar to those of AncDHCH1 and AncDHCH2, the five mutations appear to be slightly deleterious in that background (kcat/KM = 9.4 × 10⁰ M⁻¹s⁻¹ in AncDHCH5 vs. 5.9 × 10⁰ M⁻¹s⁻¹ in AncDHCH5+m5) (Figure 4.4 C and Table C.4). In only one of the orthologs, PvDHCH, do the five mutations appear to be beneficial for the OPH activity; however, they only resulted in a small ~5-fold increase in catalytic efficiency (kcat/KM = 1.6 × 10⁰ M⁻¹s⁻¹ in the wild-type vs. kcat/KM = 8.6 × 10⁰ M⁻¹s⁻¹ in the +m5 variant) (Figure 4.4 C and Table C.4). The mutations are neutral in JsDHCH and deleterious in BbDHCH, making OPH activity too low to be determined in the latter enzyme (Figure 4.4 C and Table C.4). Since the methyl-parathion activities of SmDHCH and OsDHCH cannot be determined, we used methyl-paraoxon to compare the OPH activities of the wild-type and +m5 variants. The five mutations are slightly deleterious to the methyl-paraoxon activity in both enzymes (Figure 4.4 C and Table C.4). 
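
The fold changes quoted in the paragraph above follow directly from the ratio of the derived-state to the ancestral-state kcat/KM. The short Python sketch below recomputes them from the values given in the text; it is only an illustration of the arithmetic, and the `classify` helper with its `tolerance` cut-off is a hypothetical convenience that does not correspond to any threshold used in this study.

```python
# kcat/KM values (M^-1 s^-1) towards methyl-parathion, as quoted in the text:
# enzyme -> (ancestral state, derived +m5 state)
KCAT_KM = {
    "AncDHCH3": (0.26, 4.3),
    "AncDHCH4": (0.87, 74.0),
    "AncDHCH5": (9.4, 5.9),
    "PvDHCH": (1.6, 8.6),
}


def fold_change(ancestral: float, derived: float) -> float:
    """Ratio of derived-state to ancestral-state catalytic efficiency."""
    return derived / ancestral


def classify(fc: float, tolerance: float = 1.5) -> str:
    """Label an effect as beneficial, neutral, or deleterious.

    `tolerance` is an arbitrary illustrative cut-off (an assumption),
    not a threshold taken from the thesis.
    """
    if fc >= tolerance:
        return "beneficial"
    if fc <= 1.0 / tolerance:
        return "deleterious"
    return "neutral"


if __name__ == "__main__":
    for enzyme, (anc, der) in KCAT_KM.items():
        fc = fold_change(anc, der)
        print(f"{enzyme}: {fc:.1f}-fold ({classify(fc)})")
```

Because catalytic efficiencies span several orders of magnitude, a ratio rather than a difference is the natural way to compare the two states.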
To determine whether there are factors that predict the starting OPH activity of the enzymes and their response to the mutations, we performed correlation analysis, using the starting OPH activity in the ancestral state, the final OPH activity in the derived state (+m5), the overall fold change in activity between the derived state and ancestral state, and the sequence identity to MPH as variables (Figure 4.5). The starting level of OPH activity exhibits moderate linear correlation with the final level of activity in the derived state (R² = 0.68) (Figure 4.5 A), as well as moderate linear correlation with the overall fold-change in catalytic activity (R² = 0.53) (Figure 4.5 B). Interestingly, both the starting level of OPH activity and the overall fold-change in OPH activity exhibit stronger positive linear correlation with the enzymes’ sequence identity to MPH (R² = 0.89 and 0.75, respectively) (Figure 4.5 C-D). Thus, from the small sampling of enzymes that we have characterized in this study, sequences that are more genetically similar to MPH tend to possess both higher promiscuous activity and higher fold-improvements in response to the mutations. Altogether, our results reveal that the effects of the five mutations are highly dependent on the genetic background in which they occur, and only in AncDHCH1 do they enable a 1000-fold improvement to generate an efficient OPH enzyme.

Figure 4.5 Relationship between effects of the five mutations and the starting phenotype and sequence identity to MPH. 
(A and B) The starting levels of OPH activity (kcat/KM) for each of the characterized enzymes, where the five positions are in their ancestral states, in relation (linear correlation) to (A) their final OPH activity and (B) overall fold change in catalytic activity when the positions are in their derived states (+m5). (C and D) The percent sequence identity of the enzyme to MPH (Table 4.2) in relation (linear correlation) to (C) the starting catalytic activity and (D) the overall fold change in catalytic activity conferred by the five mutations. BbDHCH, where the methyl-parathion activity of the +m5 variant cannot be determined, is omitted from panels A, B, and D, while OsDHCH and SmDHCH, where the methyl-parathion activities for both the wild-type and the +m5 variants cannot be determined, are omitted from all plots.

[Figure 4.5 panel labels. (A) Final catalytic activity (kcat/KM) vs. starting catalytic activity (kcat/KM), R² = 0.68. (B) Fold change in catalytic activity vs. starting catalytic activity (kcat/KM), R² = 0.53. (C) Starting catalytic activity (kcat/KM) vs. % sequence identity to MPH, R² = 0.89. (D) Fold change in catalytic activity vs. % sequence identity to MPH, R² = 0.75.]

4.4.4 The singular effects of the five mutations in AncDHCH4 and AncDHCH5

Our analyses show that the collective effects of the five mutations change substantially between different genetic backgrounds. 
To gain deeper insight into how each of the individual mutations contributes to the differential effects observed, we generated all of the single amino acid variants of the five mutations for AncDHCH4 (a background where the five mutations are beneficial) and AncDHCH5 (a background where the five mutations are slightly deleterious). The variants were expressed in E. coli BL21 (DE3) cells, and their enzymatic OPH activities in cell lysate were measured and compared to the previously determined activities of the MPH variants (Figure 4.6 A) (Yang et al. 2019). Of the five mutations, only the singular effect of l72R appears to be largely the same, resulting in a ~10-fold increase in OPH activity in all three genetic backgrounds (Figure 4.6 A). While i271T was deleterious when initially introduced in the ancestral background of MPH (MPH-m5), causing a ~5-fold decrease in activity, the mutation is slightly beneficial in AncDHCH4 and neutral in AncDHCH5 (Figure 4.6 A). On the other hand, the singular effects of Δ193S and h258L are deleterious in the AncDHCH5 background, while they are positive in both MPH and AncDHCH4 (Figure 4.6 A).

In our previous study, we characterized the adaptive fitness landscape of MPH that is encompassed by the five key mutations, and found that the mutations are highly epistatic; for example, while i271T was deleterious in MPH-m5, it becomes positive (~4-fold increase in activity) following the fixation of three other mutations (Yang et al. 2019). Interestingly, the singular effects of the five mutations are more positive in AncDHCH4 compared to MPH, and would have resulted in a >1000-fold increase in activity in a null additive model where there is no epistasis (vs. the ~200-fold increase predicted from the singular effects of the mutations in MPH) (Figure 4.6 B). 
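
The null additive model referred to above can be sketched in a few lines of Python. In the absence of epistasis, the fold effects of single mutations multiply (equivalently, their logarithms add), and comparing the observed combined fold change with this prediction indicates the direction of epistasis. The function names and the toy single-mutation values below are illustrative assumptions; only the approximate totals (a ~200-fold prediction and ~1000-fold observed change for MPH, and a >1000-fold prediction and ~85-fold observed change for AncDHCH4) are taken from this section.

```python
import math


def additive_prediction(single_fold_effects):
    """Combined fold change expected without epistasis.

    Independent effects multiply on a linear scale, which is the same as
    adding their logarithms.
    """
    return math.prod(single_fold_effects)


def epistasis(actual_fold, predicted_fold):
    """Return (ratio, label) comparing observed and null-model fold changes."""
    ratio = actual_fold / predicted_fold
    if ratio > 1:
        label = "synergistic"
    elif ratio < 1:
        label = "antagonistic"
    else:
        label = "none"
    return ratio, label


# Made-up single-mutation effects, used only to show the multiplication step.
toy_prediction = additive_prediction([10.0, 2.0, 0.5])

# Approximate totals quoted in the text: (observed, null-model prediction).
mph = epistasis(actual_fold=1000, predicted_fold=200)
# For AncDHCH4 the prediction is a lower bound (>1000-fold).
anc4 = epistasis(actual_fold=85, predicted_fold=1000)
```

A ratio above 1 corresponds to mutations that are collectively more beneficial than their individual effects would suggest, and a ratio below 1 to the opposite.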
However, while the five mutations exhibit synergistic epistasis in MPH (i.e., the actual collective effects of the mutations are more positive than their additive effects) and increase OPH activity by ~1000-fold (Figure 4.6 B), AncDHCH4+m5 only had a modest ~85-fold improvement in activity, indicating antagonistic epistasis between the mutations (i.e., the actual collective effects of the mutations are less positive than their additive effects) (Figure 4.6 B). Our analysis reveals that both the singular effects of the individual mutations and their combined epistatic effects have changed between the different genetic backgrounds.

Figure 4.6 Singular effects of the five mutations on AncDHCH4 and AncDHCH5 compared to MPH. (A) The fold difference in cell lysate activity between the single amino acid variants and the ancestral enzyme (e.g., l72R is the ratio between the activity of the l72R variant and the ancestral background) for MPH-m5, AncDHCH4, and AncDHCH5. Each value represents the average of three measurements and the error bars indicate the standard deviation. (B) The overall fold change in activity in the +m5 variants in a null additive model versus the actual fold change in activity. The panel on the left depicts the fold change in cell lysate activity predicted from the singular effects of the five mutations using a null additive model. The panel on the right depicts the actual fold change in catalytic activity.

4.4.5 Mapping the genetic changes between backgrounds

AncDHCH1 has 89% sequence identity to MPH (Table 4.2), diverging from MPH by 32 mutations; however, only five of those mutations are needed for the acquisition of efficient OPH activity, whereas the remaining 27 are mostly neutral. On the other hand, AncDHCH2 shares 80% sequence identity with MPH and 90% sequence identity with AncDHCH1, diverging from the latter by 28 mutations (Table 4.2). 
While these additional mutations do not affect the starting level of OPH activity, they buffer the beneficial effects of the five mutations, and AncDHCH2+m5 has ~9-fold lower activity compared to AncDHCH1+m5 (Figure 4.4 C). Finally, AncDHCH3 has diverged from AncDHCH2 by 27 mutations, which have resulted in both a ~45-fold lower starting OPH activity compared to AncDHCH1 and AncDHCH2, as well as only a modest ~16-fold improvement in the +m5 variant (vs. ~130-fold in AncDHCH2 and ~1000-fold in AncDHCH1) (Figure 4.4 C). To visualize the locations of the genetic changes, and to identify potential regions that may have caused the differences in starting activities and mutational responses, we mapped the mutations that have occurred between each of the ancestral sequences onto model structures. Apart from the five functional mutations, most of the changes between AncDHCH1 and MPH are located outside the active site, in the second (active-site periphery) and third (surface) shells of the protein (Figure 4.7 A). Similarly, all 28 mutations that separate AncDHCH2 and AncDHCH1 appear to be outside the active site (>10 Å away from the active site metal ions in the model structure); however, there are several mutations, including a threonine insertion in AncDHCH2, in the loop where one of the key residues, Leu72, is located (Figure 4.7 A). All of the mutations separating AncDHCH3 and AncDHCH2 are also outside the active site, with the exception of position 272 (a glutamic acid in AncDHCH3 that was subsequently mutated to a glutamine in AncDHCH2), which lies between the two key residues, Ile271 and Phe273 (Figure 4.7 A). 
Additionally, there is a 16-amino acid insertion in AncDHCH3 in the loop where Leu72 is located, which may substantially change the conformation of that loop and affect the enzyme’s OPH activity (Figure 4.7 A).

We conducted the same analyses for AncDHCH4 and AncDHCH5. AncDHCH4 is the intermediate between AncDHCH1 and JsDHCH, and is more similar in phenotype to the extant ortholog in having lower levels of OPH activity (kcat/KM = 0.87 M⁻¹s⁻¹) (Figure 4.1 A and Figure 4.4 C). However, the five mutations enabled a ~85-fold increase in OPH activity in that genetic background, while being largely neutral in JsDHCH. Conversely, AncDHCH5, the intermediate between AncDHCH2 and BbDHCH, is similar in phenotype to AncDHCH2 in having a higher level of OPH activity (kcat/KM = 9.4 M⁻¹s⁻¹); however, the five mutations are slightly deleterious towards the function in that background (Figure 4.1 A and Figure 4.4 C). AncDHCH4 is separated from JsDHCH by 30 amino acid substitutions, whereas AncDHCH5 is separated from AncDHCH2 by 25 amino acid substitutions and two insertions at the C-terminus (Figure 4.7 B-E and Table 4.3); hence, these two sets of genetic changes are the causes of the mutational incompatibilities (i.e., the five mutations no longer increase OPH activity) observed in JsDHCH and AncDHCH5. In both cases, several of the mutations appear to be located in the vicinity of the active site and/or the five key residues (Figure 4.7 B-E). Further analyses will be needed to determine the minimal subset of the 30 and 27 genetic changes that have altered the mutational compatibilities between AncDHCH4 and JsDHCH, and between AncDHCH2 and AncDHCH5, respectively.

Figure 4.7 Genotypic divergence between enzymes. (A) Cartoon representations of the crystal structure of AncDHCH1 (left, PDB ID: 6C2C) and the model structures of AncDHCH2 (center) and AncDHCH3 (right). 
Red spheres indicate positions that have mutated between AncDHCH1 and MPH (left), AncDHCH2 and AncDHCH1 (center), and AncDHCH3 and AncDHCH2 (right). The five key functional mutations between AncDHCH1 and MPH are highlighted as sticks on the AncDHCH1 structure, with e192 being shown to indicate the location of Δ193S. Active site metal ions are depicted as grey spheres. (B) Cartoon representation of the model structure of AncDHCH4. (C) Cartoon representation of the active sites of the model structures of AncDHCH4 and JsDHCH. (D) Cartoon representation of the model structure of AncDHCH2. (E) Cartoon representation of the active sites of the model structures of AncDHCH2 and AncDHCH5. Red spheres in Panels B and D indicate positions that have mutated between AncDHCH4 and JsDHCH and between AncDHCH2 and AncDHCH5. A full description of the mutations is available in Table 4.3. The five key mutations are highlighted as orange sticks in Panels C and E, with e192 being shown to indicate the location of Δ193S. Mutations that have occurred between the sequences are highlighted as red sticks. Active site metal ions are depicted as grey spheres. All models were generated using SWISS-MODEL, using the structure of AncDHCH1 (PDB ID: 6C2C) as a template (Waterhouse et al. 2018).

[Figure 4.7 panel labels: AncDHCH1 vs. MPH; AncDHCH2 vs. AncDHCH1; AncDHCH3 vs. AncDHCH2; AncDHCH4 vs. JsDHCH; AncDHCH2 vs. AncDHCH5. Annotations: Thr insertion; 16 a.a. insertion; Glu272; key residues l72, e192, h258, i271 (v271 in JsDHCH), f273.]

Table 4.3 Mutations between AncDHCH4 and JsDHCH and between AncDHCH2 and AncDHCH5. 
Position  AncDHCH4  JsDHCH      Position  AncDHCH2  AncDHCH5
57        Val       Ile         47        Phe       Tyr
81        Gln       Asn         76        Ser       Arg
86        Lys       Arg         78        Ala       Gly
89        Leu       Glu         82        Ser       Arg
97        Val       Ile         88        Phe       Tyr
113       Thr       Ala         102       Ile       Val
117       Gly       Ser         143       Ile       Val
126       Leu       Met         159       Asn       Asp
127       Ala       Leu         183       Ala       Gln
130       Leu       Phe         209       Glu       Asp
146       Thr       Ser         226       Ile       Val
150       Ser       Gly         227       Lys       Arg
174       Gln       Leu         230       Ala       Pro
178       Tyr       Phe         240       Thr       Ile
183       Ala       Ser         257       Ile       Met
191       Asp       Glu         272       Gln       Lys
195       Ala       Glu         275       Ser       Ala
196       Asn       Asp         282       Ala       Pro
202       Val       Met         297       Leu       Tyr
217       Thr       Val         300       Ala       Val
251       Val       Ala         303       Leu       Val
265       Asp       Ala         310       His       Arg
266       Asp       Thr         311       Ile       Leu
271¹      Ile       Val         328       Ala       Asn
278       Lys       Arg         329       Arg       Lys
279       Val       Ile         b²        Ala       -
301       Ala       Pro         c²        Lys       -
313       Ala       Val
314       Asp       Asn
319       Gln       Asp

Positions of residues are numbered according to the MPH structure (PDB ID: 1P9E).
¹Position 271 is one of the locations of the five key MPH mutations.
²AncDHCH2 contains two extra amino acids at the C-terminus.

4.4.6 Comparative directed evolution of AncDHCH1 and orthologs

To further test the evolvabilities of the sequences and explore alternate mutational pathways, we performed comparative directed evolution from the starting points of the wild-type AncDHCH1 and the five DHCH orthologs. Briefly, randomly mutagenized gene pools were transformed into E. coli BL21 (DE3) cells, and 600-800 colonies were picked at random from the agar plates, inoculated into liquid media (96-well plates), regrown, lysed, and screened for methyl-parathion hydrolysis activity (Figure 4.8 A). For each round, 1-3 representative variants that exhibited improvements in activity were sequenced, with the most improved variant used as the template for the next round of evolution. Note that because AncDHCH1 has much higher OPH activity, its lysate activity could be measured when the enzyme is supplied with zinc (Zn²⁺), the canonical metal that is bound by MBL enzymes. 
However, the starting lysate activities of many of the orthologs are low and difficult to detect in the presence of Zn²⁺. Our previous studies have indicated that MPH variants supplied with cobalt (Co²⁺) showed higher OPH cell lysate activity while not affecting the topology of the adaptive landscapes formed by the mutations (Anderson et al. 2019). Hence, for our directed evolution on the orthologous sequences, Co²⁺ was supplied in place of Zn²⁺.

A total of three rounds of directed evolution were conducted on AncDHCH1, JsDHCH, BbDHCH, and OsDHCH, while two rounds of evolution were conducted on PvDHCH (Figure 4.8 B and Table 4.4). A single round of directed evolution was conducted on SmDHCH (Figure 4.8 B and Table 4.4); two independent libraries generated for round two yielded no variants with further improvements. Although AncDHCH1 was evolved in the presence of Zn²⁺, its trajectory is largely similar when assayed in the presence of Co²⁺ (Figure 4.8 B). Overall, the cell lysate activity of AncDHCH1 improved ~57-fold after three rounds of evolution (Figure 4.8 C). Out of the five orthologs, BbDHCH and PvDHCH acquired the highest levels of OPH activity (42- and 28-fold improvements over the wild-type, respectively); however, they are still >100-fold lower in activity compared to the final evolved variant of AncDHCH1 (Figure 4.8 B-C). JsDHCH, on the other hand, acquired only a modest ~4-fold improvement over the wild-type (Figure 4.8 B-C). 
The cell lysate activities of OsDHCH and SmDHCH were too low to be accurately determined above the background for the wild-type variants, so the overall fold-improvement in activity for these two enzymes cannot be calculated; however, after three rounds of directed evolution, OsDHCH exhibits a higher level of cell lysate activity than the initially more active JsDHCH, indicating that enzymes with lower starting activities can sometimes still be more evolvable than ones with higher starting activities (Figure 4.8 B). Taken together, the results of the directed evolution experiment show that in spite of the five MPH mutations failing to increase OPH activity in most of the orthologs, the orthologs are still capable of improving the function via alternate mechanisms. However, their starting levels of activity are lower than that of AncDHCH1, and they do not evolve as rapidly; hence, AncDHCH1 remains by far the most efficient OPH enzyme.

Figure 4.8 Comparative directed evolution of AncDHCH1 and five DHCH orthologs. (A) Overview of the directed evolution scheme. (1) Starting variant(s) were mutagenized using error-prone PCR and subcloned into a vector containing an N-terminal MBP-tag. (2) The resulting library was transformed into E. coli BL21 (DE3) cells for protein expression. (3) 600-800 colonies were picked from the agar plates for screening in liquid culture in 96-well plates. (4) The most improved variant(s) were sequenced and served as the starting point for the next round of directed evolution. (B) Improvements in cell lysate activities of AncDHCH1 and five orthologs. Enzymes were expressed in E. coli in the presence of 100 μM of Co²⁺ or Zn²⁺ supplied in the media, and lysed with 200 μM of the metal in the buffer. Lysate activities were measured using 20 μL of cell lysate and 80 μL of 500 μM methyl-parathion substrate in assay buffer containing 200 μM of the metal (final substrate concentration is 400 μM). 
The error bars represent the standard deviation of triplicate measurements. “WT” denotes the wild-type variant, whereas “RX” denotes Round X of directed evolution. The cell lysate activities of the wild-type SmDHCH and OsDHCH were too low to be accurately determined above the background, and were excluded from the plot. (C) The overall fold-increase in cell lysate activity in the final evolved variant from the WT. Error bars represent the standard deviation of triplicate measurements. A single asterisk indicates a fold-increase that cannot be calculated because the cell lysate activity of the WT was too low to be determined.

[Figure 4.8 panel labels. (A) Starting variant; library generation with epPCR; transformation in E. coli; screening for improved PTE activity (600-800 variants). (B) y-axis: lysate activity (nM/sec); x-axis: WT, R1, R2, R3; series: AncDHCH1 (Zn²⁺), AncDHCH1, JsDHCH, PvDHCH, BbDHCH, OsDHCH, SmDHCH. (C) y-axis: fold increase in lysate activity.]

4.4.7 Genotypic changes of AncDHCH1 and orthologs

After three rounds of evolution, AncDHCH1 has acquired a total of five mutations. During the first round of evolution, AncDHCH1 accumulated three mutations, including the historical l72R (Figure 4.9, Table 4.4, Table C.5). In the second round of evolution, the enzyme acquired another historical mutation, f273L, while the final round of evolution saw the fixation of a new mutation, t113I (Figure 4.9, Table 4.4, Table C.5). While the five mutations together do not substantially increase OPH activity in most of the extant orthologous backgrounds, a couple of these positions still appear to be mutational hotspots during directed evolution. 
The historical mutation l72R occurred in OsDHCH during the first round of evolution in all of the improved variants that were sequenced (Table C.5), and the residue was also mutated to a glutamine during the third round of evolution for BbDHCH (Figure 4.9, Table 4.4, Table C.5). Moreover, all five of the orthologs have acquired mutations at position 273; however, none of them acquired the historical leucine residue (Table C.5). Instead, serine, valine, cysteine, and tyrosine were obtained, indicating that the orthologs require alternate residues in that position to improve OPH activity. The remaining mutations are scattered across the protein structure, with some (e.g., e70G in PvDHCH and t272S in JsDHCH) in the vicinity of the five key residues, while a number of others are on the protein surface, far from the active site in the model structures (Figure 4.9). Overall, our results show that AncDHCH1 has acquired some of the exact same historical mutations, indicating that evolution from that background may repeatedly result in similar solutions. Conversely, while a few of the same positions also get mutated in the orthologous enzymes, the actual mutations themselves are usually different from the historical ones, and mutations at alternate positions are also frequent.

Table 4.4 Summary of mutations of variants picked for each round of directed evolution.

Round  AncDHCH1            BbDHCH        JsDHCH        OsDHCH       PvDHCH       SmDHCH
1      l72R, n166T, v262I  v197A, i232S  t272S         l72R, v323G  k90N, f273V  l52P, q79R, f273S
2      f273L               m157L, l185R  d186N, i281T  a69V         e70G         -
3      t113I               l72Q          s304P         e86K         -            -

Mutations that are bolded indicate positions that are the same as those of the five key MPH mutations. A full summary of all the variants sequenced for each round can be found in Table C.5.

Figure 4.9 Genotypic changes in AncDHCH1 and five orthologs. Cartoon representation of the crystal structure of AncDHCH1 (PDB ID: 6C2C) and model structures of the orthologs. 
The Cα atoms of residues mutated during the trajectory are shown as spheres and colored according to their occurrence in the trajectory, with red indicating Round 1 (R1), orange Round 2 (R2), and yellow Round 3 (R3). Mutations that are bolded indicate positions that are the same as those of the five key MPH mutations. The crystal structure of MPH (PDB ID: 1P9E) with the five key mutations indicated as red spheres is presented in the top left corner for reference. Active site metal ions are depicted as grey spheres. All models were generated using SWISS-MODEL, using the structure of AncDHCH1 as a template (Waterhouse et al. 2018).

[Figure 4.9 panels: MPH, AncDHCH1, JsDHCH, PvDHCH, SmDHCH, OsDHCH, and BbDHCH, with mutated residues labelled by round (R1, R2, R3).]

4.5 Discussion

Overall, our work addresses the impact of the starting genotype on protein evolvability, using the natural example of OPH activity as a model. Firstly, we find a broad variation in the OPH activities between different DHCH enzymes, with AncDHCH1, AncDHCH2, and AncDHCH5 serendipitously exhibiting higher levels of promiscuous activity; these enzymes may hence have provided a greater fitness advantage to host organisms when the function came under selection. Secondly, the effects of the five key historical mutations also vary broadly across the different genetic backgrounds, ranging from being beneficial in AncDHCH1, AncDHCH2, AncDHCH3, and AncDHCH4, to being neutral or deleterious in AncDHCH5 and four of the extant orthologs. Only in the background of AncDHCH1 are the mutations sufficient to enable the acquisition of OPH activity that is on par with that of MPH. Our observations are consistent with those of several previous studies that have also found changes in the effects of mutations in genetically diverged orthologs (Baier et al. 2019; Khanal et al. 2015; Parera & Martinez 2014). 
Thirdly, in our comparative directed evolution experiments, AncDHCH1 acquired two historical mutations, indicating that evolution from this background may repeatedly result in a similar set of mutations. Conversely, while a few of the same positions do get mutated in the five DHCH orthologs, the majority of the mutations themselves are different, and mutations at other sites are also frequent. Hence, genetic divergence has resulted in the orthologs requiring mutations distinct from the ones that have occurred in MPH in order to improve OPH activity. Finally, none of the five orthologs were able to acquire the same level of OPH activity as AncDHCH1 during directed evolution, and they therefore appear to be less evolvable towards the function compared to the ancestral sequence.

Since the mass production of anthropogenic OP compounds as pesticides following World War II, a number of enzymes that possess activity towards these substrates have been isolated (Bigley & Raushel 2013; Ghanem & Raushel 2005; Singh 2009). OPH activity has independently arisen in several different enzyme superfamilies, indicating that a number of highly divergent sequences are capable of acquiring the novel function (Bigley & Raushel 2013; Ghanem & Raushel 2005; Singh 2009). Yet, at the same time, only a handful of efficient OP-degrading sequences have been found within each of the superfamilies: PTE is the only known OPH enzyme in the amidohydrolase superfamily (Serdar et al. 1982), while MPH and OPHC2 are the only two OPH enzymes in the MBL superfamily (Gotthard et al. 2013; Luo et al. 2014; Sun et al. 2004). This is in spite of the fact that numerous lactonase enzymes with promiscuous OPH activity exist in both superfamilies, and could theoretically serve as evolutionary starting points for the function (Afriat-Jurnou et al. 2012; Luo et al. 2014; Seibert & Raushel 2005). 
Because few DHCH orthologs have been functionally characterized thus far, it is possible that there are other efficient OPH enzymes in the family that have yet to be identified. The results of our study, however, indicate that some sequences are less evolvable towards OPH activity, and there may only be a handful of genetic backgrounds that are capable of rapidly acquiring the new function.
Much research over the past several decades has attempted to understand the underlying determinants of protein evolvability (Bloom et al. 2006; Schulenburg et al. 2015; Tokuriki & Tawfik 2009; Toth-Petroczy & Tawfik 2014). Being able to find and choose the most evolvable starting genotype is important for the successful engineering of a protein towards a desired function (Baier et al. 2019; Bloom et al. 2006; Dellus-Gur et al. 2013; Khanal et al. 2015). From the limited number of enzymes that we have characterized in this study, we found that genetic backgrounds that are closer to MPH (i.e., higher sequence identity) tend to exhibit both higher starting levels of OPH activity and higher fold-improvements in activity in response to the five mutations. Interestingly, AncDHCH4 has a lower starting OPH activity that is similar to that of the JsDHCH ortholog, but exhibits a substantial increase in activity in response to the five MPH mutations; conversely, AncDHCH5 has a higher starting activity similar to that of AncDHCH2, yet the five mutations fail to improve activity in that background. In a study conducted on the evolution of cortisol specificity in glucocorticoid receptors, it was found that two mutations were required to stabilize the mineralocorticoid receptor-like ancestor before further mutations could accumulate and shift the specificity of the ancestor towards cortisol (Ortlund et al. 2007).
These two “permissive mutations” had no immediate consequence on the protein function, but were necessary in order to allow the protein to tolerate subsequent function-altering mutations (Ortlund et al. 2007). Similarly, the divergence that has occurred between AncDHCH4 and JsDHCH and between AncDHCH2 and AncDHCH5 must include a subset of permissive mutations that enable the five function-altering mutations to be beneficial in AncDHCH4 and AncDHCH2. Obtaining crystal structures of the ancestors and orthologs and analyzing the effects that sequence divergence has on structure and/or protein dynamics will likely yield insights into the underlying molecular causes of the mutational incompatibilities.
We acknowledge the shortcomings of our experimental setup. Firstly, all of the proteins are heterologously expressed in E. coli, and their phenotypes, in particular levels of expression and solubilities, may differ in native host organisms. Secondly, due to limitations in our system, the pool of variants that we have screened in our directed evolution experiments (600-800 per round) is relatively small compared to the theoretical number of unique variants (several thousand) that exist in each library. Thus, there may be other functional mutations that we have missed in our screening. Finally, we focus on catalytic efficiencies, but whether these accurately reflect the actual fitness that the enzymes confer to the host is unknown. Establishing a culturing system in which cells growing on minimal media require degradation of OP compounds to obtain a phosphate or carbon source (Rani & Lalithakumari 1994) would provide a much better reflection of fitness. While our experiments may not perfectly capture the processes of enzyme evolution in nature, they nevertheless show the importance of the starting point for the evolvability of a function: only one sequence out of the ten that were tested was able to rapidly obtain efficient OPH activity.
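The screening limitation noted above can be made concrete with a simple sampling estimate. The sketch below is illustrative only and is not part of the original analysis: it assumes that every variant in a library is equally abundant and that clones are picked independently, and it uses a hypothetical library size of 3000 because the text specifies only "several thousand" unique variants.

```python
import math


def coverage_probability(library_size: int, clones_screened: int) -> float:
    """Chance that one particular variant is picked at least once.

    Assumes every variant is equally abundant and clones are sampled
    independently (with replacement), which real libraries only approximate.
    """
    return 1.0 - (1.0 - 1.0 / library_size) ** clones_screened


def clones_needed(library_size: int, target: float) -> int:
    """Smallest number of clones giving at least `target` coverage probability."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - 1.0 / library_size))


# Hypothetical library size; the text only says "several thousand".
ASSUMED_LIBRARY_SIZE = 3000

for screened in (600, 800):  # per-round screening range quoted in the text
    p = coverage_probability(ASSUMED_LIBRARY_SIZE, screened)
    print(f"{screened} clones: P(specific variant sampled) = {p:.2f}")

print("clones for 95% coverage:", clones_needed(ASSUMED_LIBRARY_SIZE, 0.95))
```

Under these assumptions, a given variant is more likely to be missed than sampled in any single round, which is consistent with the caveat that some functional mutations may not have been observed.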
The role of historical contingency (the “sensitivity of outcomes to the details of history” (Blount, Lenski, & Losos 2018)) has been demonstrated in the long-term experimental evolution of E. coli, where only one out of twelve populations was able to evolve the ability to utilize citrate as a carbon source due to potentiating mutations acquired earlier on during evolution (Blount, Borland, & Lenski 2008). Conversely, other experimental evolution studies have observed that adaptation to a novel condition leads to a convergence in the overall phenotype in spite of differing starting genotypes and, usually, differing sets of adaptive mutations acquired (Bedhomme, Lafforgue, & Elena 2013; Kaltenbach et al. 2015; Kryazhimskiy et al. 2014; Simões et al. 2017). How much of a role historical contingency plays in evolution remains to be determined.
Chapter 5: Conclusion and future outlook
5.1 General summary and conclusion
The overarching goal of this thesis is to understand the evolution of novel enzyme functions, utilizing xenobiotic OPH activity as a model. In Chapter 2, we performed directed evolution to demonstrate how the promiscuous OPH activity of AiiA can be quickly optimized via just a handful of mutations. Notably, we were able to uncover the importance of more distant mutations that do not directly interact with the substrate, but function by tinkering with existing residues in the active site. In Chapter 3, we utilized ASR to determine the likely ancestral sequence that gave rise to MPH and to identify the key mutations that were necessary for the optimization of OPH activity. Through extensive statistical analyses of adaptive fitness landscapes, we were able to unveil the complex network of epistatic interactions between the five key mutations, including interactions between residues that are structurally far apart and cannot physically interact.
Furthermore, by conducting adaptive fitness landscape analyses against slightly different substrates, we were able to uncover the changes in the epistatic effects in response to subtle differences in the substrate substituents. Finally, in Chapter 4, we demonstrated the importance of the starting genetic background on the outcome of evolution by testing the effects of the five key mutations identified in Chapter 3 on the backgrounds of various DHCH orthologs. We found that the effects of the mutations differ greatly between different genetic backgrounds, and only one sequence, AncDHCH1, was able to acquire the same level of OPH activity as MPH. Comparative directed evolution of AncDHCH1 and five of the DHCH orthologs further supports the conclusion that AncDHCH1 is more evolvable towards the function. Altogether, this thesis sheds light on both the molecular mechanisms and evolutionary dynamics of functional innovation in proteins.
5.2 Future outlook
5.2.1 Probing the catalytic mechanisms of MPH
The catalytic mechanisms of OPH activity have been heavily researched for PTE (Aubert et al. 2004; Bigley & Raushel 2013; Ghanem & Raushel 2005; Hong & Raushel 1996) as well as PON1 and DFPase (Bigley & Raushel 2013; Blum et al. 2006; Blum et al. 2008; Chen et al. 2010); however, there is currently only limited information on the mechanism of MPH. It is hypothesized that MPH has a catalytic mechanism similar to that of PTE in that it involves a single displacement reaction via nucleophilic attack of the phosphorus centre by an activated water/hydroxide molecule that is bound to the active site metal ions. However, there are subtle differences between PTE and MPH that may indicate differing mechanisms between the two enzymes (Purg et al. 2016). Whereas PTE exhibits a greater preference for oxygen-substituted substrates (Bigley & Raushel 2013; Hong & Raushel 1996), MPH appears to be equally efficient towards both sulfur- and oxygen-substituted substrates.
Furthermore, computational studies conducted by Purg et al. have indicated that MPH utilizes a terminal water/hydroxide molecule bound to the α-metal ion for nucleophilic attack, as opposed to the bridging water molecule between the two metal ions that is likely utilized by PTE (Purg et al. 2016).
A number of experiments can be conducted to probe the catalytic mechanism of MPH and obtain a better understanding of how it has acquired efficient OPH activity. Experiments with chiral OPH substrates and ¹⁸O-labelled water can be utilized to confirm a single-displacement reaction and cleavage of the P-O bond. pH profiles can be constructed to help identify key residues involved in substrate binding and catalysis. Brønsted plots with both parathion and paraoxon analogues will yield valuable insights into the rate-determining steps for bond cleavage and the causes for the different effects of phosphoryl sulphur vs. oxygen substituents. Brønsted plots for various intermediates and variants, in particular the ancestral backgrounds of AncDHCH1 and MPH-m5, as well as combinations of h258L and i271T mutants, can potentially provide further mechanistic insight into how the mutations are altering enzyme function and substrate specificity. Elucidating the chemical and mechanistic changes that occur in response to the genetic changes of the enzyme will provide a much more comprehensive view of how protein function evolves along a trajectory.
5.2.2 Identifying the genetic causes of mutational incompatibilities
In Chapter 4, we have demonstrated that the effects of the five key mutations that enabled the optimization of OPH activity in MPH differ greatly in diverged genetic backgrounds, and fail to substantially increase activity in most of the extant DHCH orthologs. This observation is consistent with the results from other studies, such as those on ProA (Khanal et al. 2015) and MBL enzymes (Baier et al. 2019). What are the causes of genetic incompatibilities?
In a study conducted by Ortlund et al., ASR was utilized to understand the evolution of cortisol specificity of a glucocorticoid receptor (GR) (Harms & Thornton 2014; Ortlund et al. 2007). It was found that two “permissive mutations” needed to fix first in the mineralocorticoid receptor (MR)-like ancestor before further mutations could accumulate to fully shift the specificity towards cortisol (Harms & Thornton 2014; Ortlund et al. 2007). Although the permissive mutations had no noticeable effect on the ligand specificity, they were necessary to stabilize the ancestral receptor so that it could tolerate the specificity-changing mutations that would otherwise have resulted in a loss of function (Harms & Thornton 2014; Ortlund et al. 2007). We have found, in Chapter 4, that the five mutations still substantially increased OPH activity in AncDHCH4, the intermediate sequence between AncDHCH1 (where the mutations are beneficial) and JsDHCH (where the mutations are largely neutral). Conversely, the mutations fail to increase OPH activity in AncDHCH5, the intermediate between AncDHCH2 (where the mutations are beneficial) and BbDHCH (where the mutations are deleterious). To determine the minimal subset of genetic changes that have resulted in the changes in compatibilities, DNA shuffling can be performed between AncDHCH4 and JsDHCH and between AncDHCH2 and AncDHCH5. Specifically, the staggered extension process (StEP) can be used to generate a library of randomly recombined variants (Zhao et al. 1998). Subsequent screening of the library and sequencing of selected variants will enable the identification of particular backgrounds where the mutations are compatible, and thereby disclose the minimal set of permissive mutations that enabled the improvement of OPH activity in some backgrounds and not others. The same analysis can additionally be conducted on AncDHCH2 and AncDHCH3.
AncDHCH3 has a much lower OPH activity compared to AncDHCH1 and AncDHCH2, and a lower fold-improvement in activity in response to the five mutations. This may be caused by a 16-amino acid insertion in one of the loops, as well as a glutamine to glutamic acid mutation at position 272. Removing the insertion and mutating the glutamic acid to a glutamine, and subsequently assaying the activity of the mutant AncDHCH3, will reveal if this is indeed the case. Similarly, DNA shuffling utilizing the aforementioned StEP protocol can be performed between AncDHCH2 and AncDHCH1 to identify the subset of mutations that have caused the lower fold-improvement observed in AncDHCH2.
5.2.3 Investigating mutational incompatibilities using adaptive fitness landscapes
In Chapter 3, we have demonstrated the possibility of analyzing adaptive fitness landscapes to probe molecular interactions between mutations and substrates. This method can be further employed to understand the causes of the mutational incompatibilities that have been observed in many of the genetic backgrounds in Chapter 4. For example, it is unknown whether the differential effects of the five MPH mutations are due to the singular effects of the individual mutations, to negative epistasis occurring between the mutations, or to a combination of both. By generating all combinations of the five mutations in different ancestral enzymes and extant orthologs, and subsequently assaying their activities and conducting statistical analyses on the data, it will be possible to see how both the singular and epistatic effects of the five mutations may change between different genetic backgrounds (Anderson et al. 2015; Stormo 2011). Preliminary analysis of the single amino acid variants for AncDHCH4 and AncDHCH5 indicates that both the singular and epistatic effects of the five mutations have changed in those two backgrounds compared to what has been observed in MPH.
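As a rough illustration of how singular and epistatic effects can be separated on such a landscape, the sketch below enumerates every combination of five mutations and compares an observed activity with a null additive expectation. It assumes that effects are expressed as log10 fold-changes relative to the background, so that non-interacting mutations simply add; the function names and all numerical values are hypothetical and are not data from this thesis, and this is not necessarily the statistical model used in Chapter 3.

```python
from itertools import product

N_MUTATIONS = 5

# All 2**5 = 32 genotypes; each is a tuple of 0/1 flags (ancestral/derived).
GENOTYPES = list(product((0, 1), repeat=N_MUTATIONS))


def additive_expectation(genotype, single_effects):
    """log10 fold-change predicted if the mutations do not interact."""
    return sum(effect for present, effect in zip(genotype, single_effects) if present)


def epistasis(observed_log10, genotype, single_effects):
    """Observed minus additive expectation; negative values mean antagonism."""
    return observed_log10 - additive_expectation(genotype, single_effects)


# Hypothetical numbers for illustration only (log10 fold-change over background).
single_effects = [1.0, 0.5, 0.8, 0.4, 0.6]
all_five = (1, 1, 1, 1, 1)
observed_all_five = 2.0

expected = additive_expectation(all_five, single_effects)
print(f"additive expectation: {10 ** expected:.0f}-fold")
print(f"epistasis (log10): {epistasis(observed_all_five, all_five, single_effects):+.1f}")
```

Applying `epistasis` to each entry of `GENOTYPES` with measured activities would give a per-genotype deviation from additivity; attributing those deviations to specific pairwise or higher-order terms requires a fuller regression model than this sketch provides.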
Notably, in AncDHCH4, it appears that the singular effects of each of the individual mutations are actually more positive than they were in MPH, and would have resulted in a >1000-fold increase under a null additive model where there is no epistasis; however, due to antagonistic epistasis occurring between the mutations, the overall level of improvement in the derived +m5 variant is much lower. Statistical analysis of the full landscape encompassed by the five mutations (i.e., the 32 possible genotypes) will help identify the particular sets of pairwise and/or higher-order epistatic interactions that are responsible for the diminished improvement. This will provide in-depth insight into the causes of mutational incompatibility between the different sequences.
5.2.4 Obtaining a structural basis for mutational incompatibilities
Concurrent with the genetic analyses, structural analyses can also be conducted on the enzymes to probe the differential effects of mutations. By utilizing X-ray crystallography to solve the structures of various orthologs and ancestral enzymes with the five key residues in their ancestral and derived states, it will be possible to obtain a molecular basis for the causes of the mutational incompatibilities that have been observed in mutagenesis experiments. For instance, we will be able to determine whether there are subtle differences in the arrangements of the active site residues that may result in the different mutational effects. In a comparative directed evolution experiment conducted on two MBL orthologs, VIM2 and NDM1, it was found that mutation of an active site tryptophan to glycine had differential effects between the two orthologs: in NDM1 the mutation caused a ~100-fold increase in phosphonate monoester hydrolase (PMH) activity, whereas in VIM2 the mutation had a deleterious effect (Baier et al. 2019).
Structural analyses of the enzyme active sites revealed different arrangements of the tryptophan: in NDM1, the residue initially formed a steric clash with the PMH substrate, and hence, removal of the sidechain had a beneficial effect; conversely, in VIM2, the tryptophan adopts a different orientation that does not hinder substrate binding (Baier et al. 2019).
Additionally, techniques such as molecular docking and molecular dynamics (MD) simulations can probe how the substrate may be positioned in different active sites. A number of recent studies have implicated the optimization of the E·S complex and the freezing out of non-productive and less productive substrate binding modes as important mechanisms for optimizing a promiscuous activity (Blomberg et al. 2013; Kaltenbach et al. 2018; Miton et al. 2018). Indeed, we have previously hypothesized that such mechanisms may have played a key role in the optimization of OPH activity in MPH. By conducting computational analyses, it may be possible to observe whether substrate repositioning and productive vs. non-productive binding have a significant impact on the OPH activity of the different enzymes. Moreover, such analyses can also reveal whether differential protein dynamics play a role in mutational incompatibility. In the aforementioned study conducted on the MBL orthologs, MD simulations showed an increase in the flexibility and repositioning of active site loops in the VIM2 enzyme with the tryptophan mutation, likely blocking substrate accessibility to the active site, and hence accounting for the deleterious effect that the mutation has in that background (Baier et al. 2019). Altogether, these experiments will be important to our understanding of sequence-structure-function relationships.
Bibliography
Abascal, F., Zardoya, R. & Posada, D., 2005. ProtTest: selection of best-fit models of protein evolution. Bioinformatics (Oxford, England), 21(9), 2104–2105.
Afonine, P.V. et al., 2012.
Towards automated crystallographic structure refinement with phenix.refine. Acta Crystallographica. Section D, Biological Crystallography, 68(Pt 4), 352–367.
Afriat-Jurnou, L. et al., 2006. The latent promiscuity of newly identified microbial lactonases is linked to a recently diverged phosphotriesterase. Biochemistry, 45(46), 13677–13686.
Afriat-Jurnou, L., Jackson, C.J. & Tawfik, D.S., 2012. Reconstructing a missing link in the evolution of a recently diverged phosphotriesterase by active-site loop remodeling. Biochemistry, 51(31), 6047–6055.
Aharoni, A. et al., 2005. The “evolvability” of promiscuous protein functions. Nature Genetics, 37(1), 73–76.
Aharoni, A. et al., 2004. Directed evolution of mammalian paraoxonases PON1 and PON3 for bacterial expression and catalytic specialization. Proceedings of the National Academy of Sciences of the United States of America, 101(2), 482–487.
Akanuma, S. et al., 2013. Experimental evidence for the thermophilicity of ancestral life. Proceedings of the National Academy of Sciences, 110(27), 11067–11072.
Albery, W.J. & Knowles, J.R., 1976. Evolution of enzyme function and the development of catalytic efficiency. Biochemistry, 15(25), 5631–5640.
Althoff, E.A. et al., 2012. Robust design and optimization of retroaldol enzymes. Protein Science: A Publication of the Protein Society, 21(5), 717–726.
Amitai, G., Gupta, R.D. & Tawfik, D.S., 2007. Latent evolutionary potentials under the neutral mutational drift of an enzyme. HFSP Journal, 1(1), 67–78.
Anderson, D.W. et al., 2019. Secondary environmental variation creates a shifting evolutionary watershed for the methyl-parathion hydrolase enzyme. bioRxiv, 65, 1544–28.
Anderson, D.W., McKeown, A.N. & Thornton, J.W., 2015. Intermolecular epistasis shaped the function and evolution of an ancient transcription factor and its DNA binding sites. eLife, 4, e07864.
Araya, C.L. & Fowler, D.M., 2011.
Deep mutational scanning: Assessing protein function on a massive scale. Trends in Biotechnology, 29(9), 435–442.
Aubert, S.D., Li, Y. & Raushel, F.M., 2004. Mechanism for the hydrolysis of organophosphates by the bacterial phosphotriesterase. Biochemistry, 43(19), 5707–5715.
Babbitt, P.C. & Gerlt, J.A., 2000. New functions from old scaffolds: How nature reengineers enzymes for new functions. Advances in Protein Chemistry, 55, 1–28.
Babtie, A., Tokuriki, N. & Hollfelder, F., 2010. What makes an enzyme promiscuous? Current Opinion in Chemical Biology, 14(2), 200–207.
Baier, F. & Tokuriki, N., 2014. Connectivity between catalytic landscapes of the Metallo-β-Lactamase superfamily. Journal of Molecular Biology, 426(13), 2442–2456.
Baier, F. et al., 2015. Distinct metal isoforms underlie promiscuous activity profiles of metalloenzymes. ACS Chemical Biology, 10(7), 1684–1693.
Baier, F., Copp, J.N. & Tokuriki, N., 2016. Evolution of enzyme superfamilies: Comprehensive exploration of sequence-function relationships. Biochemistry, 55(46), 6375–6388.
Baier, F. et al., 2019. Cryptic genetic variation shapes the adaptive evolutionary potential of enzymes. eLife, 8, 19.
Bar-Even, A. et al., 2011. The moderately efficient enzyme: evolutionary and physicochemical trends shaping enzyme parameters. Biochemistry, 50(21), 4402–4410.
Bas, D.C., Rogers, D.M. & Jensen, J.H., 2008. Very fast prediction and rationalization of pKa values for protein-ligand complexes. Proteins, 73(3), 765–783.
Bastard, K. et al., 2014. Revealing the hidden functional diversity of an enzyme family. Nature Chemical Biology, 10(1), 42–49.
Bayer, C.D., van Loo, B. & Hollfelder, F., 2017. Specificity effects of amino acid substitutions in promiscuous hydrolases: Context-dependence of catalytic residue contributions to local fitness landscapes in nearby sequence space. ChemBioChem, 18(11), 1001–1015.
Bebrone, C., 2007.
Metallo-β-lactamases (classification, activity, genetic organization, structure, zinc coordination) and their superfamily. Biochemical Pharmacology, 74(12), 1686–1701.
Bedhomme, S., Lafforgue, G. & Elena, S.F., 2013. Genotypic but not phenotypic historical contingency revealed by viral experimental evolution. BMC Evolutionary Biology, 13(1), 46–13.
Ben-David, M. et al., 2020. Enzyme evolution: An epistatic ratchet versus a smooth reversible transition. Molecular Biology and Evolution, 37(4), 1133–1147.
Ben-David, M. et al., 2013. Catalytic metal ion rearrangements underline promiscuity and evolvability of a metalloenzyme. Journal of Molecular Biology, 425(6), 1028–1038.
Benkovic, S.J. & Hammes-Schiffer, S., 2003. A perspective on enzyme catalysis. Science, 301(5637), 1196–1202.
Bhabha, G., Biel, J.T. & Fraser, J.S., 2015. Keep on moving: Discovering and perturbing the conformational dynamics of enzymes. Accounts of Chemical Research, 48(2), 423–430.
Bigley, A.N. & Raushel, F.M., 2013. Catalytic mechanisms for phosphotriesterases. Biochimica Et Biophysica Acta, 1834(1), 443–453.
Blomberg, R. et al., 2013. Precision is essential for efficient catalysis in an evolved Kemp eliminase. Nature, 503(7476), 418–421.
Bloom, J.D. et al., 2006. Protein stability promotes evolvability. Proceedings of the National Academy of Sciences, 103(15), 5869–5874.
Bloom, J.D. et al., 2007. Neutral genetic drift can alter promiscuous protein functions, potentially aiding functional evolution. Biology Direct, 2(1), 17.
Blount, Z.D., Borland, C.Z. & Lenski, R.E., 2008. Historical contingency and the evolution of a key innovation in an experimental population of Escherichia coli. Proceedings of the National Academy of Sciences, 105(23), 7899–7906.
Blount, Z.D., Lenski, R.E. & Losos, J.B., 2018. Contingency and determinism in evolution: Replaying life's tape. Science, 362(6415).
Blum, M.-M. et al., 2006.
Binding of a designed substrate analogue to diisopropyl fluorophosphatase: Implications for the phosphotriesterase mechanism. Journal of the American Chemical Society, 128(39), 12750–12757.
Blum, M.-M. et al., 2008. Inhibitory potency against human acetylcholinesterase and enzymatic hydrolysis of fluorogenic nerve agent mimics by human paraoxonase 1 and squid diisopropyl fluorophosphatase. Biochemistry, 47(18), 5216–5224.
Bocola, M. et al., 2004. Learning from directed evolution: Theoretical investigations into cooperative mutations in lipase enantioselectivity. ChemBioChem, 5(2), 214–223.
Boucher, J.I. et al., 2014. An atomic-resolution view of neofunctionalization in the evolution of apicomplexan lactate dehydrogenases. eLife, 3, e02304.
Bridgham, J.T., Carroll, S.M. & Thornton, J.W., 2006. Evolution of hormone-receptor complexity by molecular exploitation. Science, 312(5770), 97–101.
Bridgham, J.T. et al., 2010. Protein evolution by molecular tinkering: Diversification of the nuclear receptor superfamily from a ligand-dependent ancestor. PLoS Biology, 8(10), e1000497.
Bridgham, J.T., Ortlund, E.A. & Thornton, J.W., 2009. An epistatic ratchet constrains the direction of glucocorticoid receptor evolution. Nature, 461(7263), 515–519.
Brodkin, H.R. et al., 2015. Prediction of distal residue participation in enzyme catalysis. Protein Science: A Publication of the Protein Society, 24(5), 762–778.
Broom, A. et al., 2020. Evolution of an enzyme conformational ensemble guides design of an efficient biocatalyst. bioRxiv, 8, 749–29.
Buller, A.R. et al., 2015. Directed evolution of the tryptophan synthase β-subunit for stand-alone function recapitulates allosteric activation. Proceedings of the National Academy of Sciences of the United States of America, 112(47), 14599–14604.
Butzin, N.C. et al., 2013.
Reconstructed ancestral Myo-inositol-3-phosphate synthases indicate that ancestors of the Thermococcales and Thermotoga species were more thermophilic than their descendants. PLoS ONE, 8(12), e84300.
Cabrita, L.D. et al., 2007. Enhancing the stability and solubility of TEV protease using in silico design. Protein Science: A Publication of the Protein Society, 16(11), 2360–2367.
Campbell, E. et al., 2016. The role of protein dynamics in the evolution of new enzyme function. Nature Chemical Biology, 12(11), 944–950.
Canale, A.S. et al., 2018. Evolutionary mechanisms studied through protein fitness landscapes. Current Opinion in Structural Biology, 48, 141–148.
Chen, J.C.-H. et al., 2010. Neutron structure and mechanistic studies of diisopropyl fluorophosphatase (DFPase). Acta Crystallographica. Section D, Biological Crystallography, 66(Pt 11), 1131–1138.
Chen, J.Z., Fowler, D.M. & Tokuriki, N., 2020. Comprehensive exploration of the translocation, stability and substrate recognition requirements in VIM-2 lactamase. bioRxiv, 51, 57–52.
Chen, K. & Arnold, F.H., 1993. Tuning the activity of an enzyme for unusual environments: sequential random mutagenesis of subtilisin E for catalysis in dimethylformamide. Proceedings of the National Academy of Sciences, 90(12), 5618–5622.
Chen, V.B. et al., 2010. MolProbity: All-atom structure validation for macromolecular crystallography. Acta Crystallographica. Section D, Biological Crystallography, 66(Pt 1), 12–21.
Cheng, T.C., Harvey, S.P. & Chen, G.L., 1996. Cloning and expression of a gene encoding a bacterial enzyme for decontamination of organophosphorus nerve agents and nucleotide sequence of the enzyme. Applied and Environmental Microbiology, 62(5), 1636–1641.
Chica, R.A., Doucet, N. & Pelletier, J.N., 2005. Semi-rational approaches to engineering enzyme activity: Combining the benefits of directed evolution and rational design. Current Opinion in Biotechnology, 16(4), 378–384.
Clifton, B.E.
et al., 2018. Evolution of cyclohexadienyl dehydratase from an ancestral solute-binding protein. Nature Chemical Biology, 14(6), 542–547.
Cobb, R.E., Chao, R. & Zhao, H., 2013. Directed evolution: Past, present and future. AIChE Journal. American Institute of Chemical Engineers, 59(5), 1432–1440.
Copley, S.D., 1998. Microbial dehalogenases: enzymes recruited to convert xenobiotic substrates. Current Opinion in Chemical Biology, 2(5), 613–617.
Copley, S.D., 2000. Evolution of a metabolic pathway for degradation of a toxic xenobiotic: The patchwork approach. Trends in Biochemical Sciences, 25(6), 261–265.
Copley, S.D., 2003. Enzymes with extra talents: Moonlighting functions and catalytic promiscuity. Current Opinion in Chemical Biology, 7(2), 265–272.
Copley, S.D., 2009. Evolution of efficient pathways for degradation of anthropogenic chemicals. Nature Chemical Biology, 5(8), 559–566.
Crawford, R.L., Jung, C.M. & Strap, J.L., 2007. The recent evolution of pentachlorophenol (PCP)-4-monooxygenase (PcpB) and associated pathways for bacterial degradation of PCP. Biodegradation, 18(5), 525–539.
Crowder, M.W., Spencer, J. & Vila, A.J., 2006. Metallo-β-lactamases: Novel weaponry for antibiotic resistance in bacteria. Accounts of Chemical Research, 39(10), 721–728.
Darden, T., York, D. & Pedersen, L., 1993. Particle mesh Ewald: An N⋅log(N) method for Ewald sums in large systems. The Journal of Chemical Physics, 98(12), 10089–10092.
Davey, J.A. et al., 2017. Rational design of proteins that exchange on functional timescales. Nature Chemical Biology, 13(12), 1280–1285.
Davids, T. et al., 2013. Strategies for the discovery and engineering of enzymes for biocatalysis. Current Opinion in Chemical Biology, 17(2), 215–220.
de Visser, J.A.G.M., Cooper, T.F. & Elena, S.F., 2011. The causes of epistasis. Proceedings of the Royal Society B: Biological Sciences, 278(1725), 3617–3624.
Dellus-Gur, E. et al., 2015.
Negative epistasis and evolvability in TEM-1 β-Lactamase – The thin line between an enzyme's conformational freedom and disorder. Journal of Molecular Biology, 427(14), 2396–2409.
Dellus-Gur, E. et al., 2013. What makes a protein fold amenable to functional innovation? Fold polarity and stability trade-offs. Journal of Molecular Biology, 425(14), 2609–2621.
Domingo, J., Baeza-Centurion, P. & Lehner, B., 2019. The causes and consequences of genetic interactions (epistasis). Annual Review of Genomics and Human Genetics, 20, 433–460.
Dong, Y.-J. et al., 2005. Crystal structure of methyl parathion hydrolase from Pseudomonas sp. WBC-3. Journal of Molecular Biology, 353(3), 655–663.
Draganov, D.I., 2010. Lactonases with organophosphatase activity: Structural and evolutionary perspectives. Chemico-Biological Interactions, 187(1-3), 370–372.
Edgar, R.C., 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 32(5), 1792–1797.
Elias, M. & Tawfik, D.S., 2011. Divergence and convergence in enzyme evolution: Parallel evolution of paraoxonases from quorum-quenching lactonases. Journal of Biological Chemistry, 287(1), 11–20.
Elias, M. et al., 2008. Structural basis for natural lactonase and promiscuous phosphotriesterase activities. Journal of Molecular Biology, 379(5), 1017–1028.
Emsley, P. & Cowtan, K., 2004. Coot: model-building tools for molecular graphics. Acta Crystallographica. Section D, Biological Crystallography, 60(Pt 12 Pt 1), 2126–2132.
Evans, P.R. & Murshudov, G.N., 2013. How good are my data and what is the resolution? Acta Crystallographica. Section D, Biological Crystallography, 69(Pt 7), 1204–1214.
Fang, H. et al., 2014. Metagenomic analysis reveals potential biodegradation pathways of persistent pesticides in freshwater and marine sediments. The Science of the Total Environment, 470-471, 983–992.
Fasan, R. et al., 2007.
Engineered alkane-hydroxylating cytochrome P450(BM3) exhibiting nativelike catalytic properties. Angewandte Chemie (International Ed. in English), 46(44), 8414–8418.
Fasan, R. et al., 2008. Evolutionary history of a specialized p450 propane monooxygenase. Journal of Molecular Biology, 383(5), 1069–1080.
Fersht, A.R., 1974. Catalysis, binding and enzyme-substrate complementarity. Proceedings of the Royal Society of London. Series B, Biological Sciences, 187(1089), 397–407.
Fischer, M., Kang, M. & Brindle, N.P., 2016. Using experimental evolution to probe molecular mechanisms of protein function. Protein Science: A Publication of the Protein Society, 25(2), 352–359.
Fitch, W.M., 1971. Toward defining the course of evolution: Minimum change for a specific tree topology. Systematic Biology, 20(4), 406–416.
Fowler, D.M. & Fields, S., 2014. Deep mutational scanning: A new style of protein science. Nature Methods, 11(8), 801–807.
Fraser, J.S. & Jackson, C.J., 2011. Mining electron density for functionally relevant protein polysterism in crystal structures. Cellular and Molecular Life Sciences: CMLS, 68(11), 1829–1841.
Fraser, J.S. et al., 2009. Hidden alternative structures of proline isomerase essential for catalysis. Nature, 462(7273), 669–673.
Friesner, R.A. et al., 2006. Extra precision glide: Docking and scoring incorporating a model of hydrophobic enclosure for protein-ligand complexes. Journal of Medicinal Chemistry, 49(21), 6177–6196.
Fu, L. et al., 2012. CD-HIT: Accelerated for clustering the next-generation sequencing data. Bioinformatics (Oxford, England), 28(23), 3150–3152.
Furlong, C.E. et al., 1991. Purification of rabbit and human serum paraoxonase. Biochemistry, 30(42), 10133–10140.
Furnham, N. et al., 2016. Large-scale analysis exploring evolution of catalytic machineries and mechanisms in enzyme superfamilies. Journal of Molecular Biology, 428(2 Pt A), 253–267.
Gamage, N.U. et al., 2005.
The structure of human SULT1A1 crystallized with estradiol. An insight into active site plasticity and substrate inhibition with multi-ring substrates. The Journal of Biological Chemistry, 280(50), 41482–41486.
Gatti-Lafranconi, P. & Hollfelder, F., 2013. Flexibility and reactivity in promiscuous enzymes. ChemBioChem, 14(3), 285–292.
Gerlt, J.A., Bouvier, J.T., Davidson, D.B., Imker, H.J., Sadkhin, B., Slater, D.R. & Whalen, K.L., 2015. Enzyme function initiative-enzyme similarity tool (EFI-EST): A web tool for generating protein sequence similarity networks. Biochimica et Biophysica Acta (BBA) - Proteins and Proteomics, 1854(8), 1019–1037.
Ghanem, E. & Raushel, F., 2005. Detoxification of organophosphate nerve agents by bacterial phosphotriesterase. Toxicology and Applied Pharmacology, 207(2), 459–470.
Giger, L. et al., 2013. Evolution of a designed retro-aldolase leads to complete active site remodeling. Nature Chemical Biology, 9(8), 494–498.
Glasner, M.E., Gerlt, J.A. & Babbitt, P.C., 2006. Evolution of enzyme superfamilies. Current Opinion in Chemical Biology, 10(5), 492–497.
Gobeil, S.M.C. et al., 2014. Maintenance of native-like protein dynamics may not be required for engineering functional proteins. Chemistry & Biology, 21(10), 1330–1340.
Gobeil, S.M.C. et al., 2019. The structural dynamics of engineered β-Lactamases vary broadly on three timescales yet sustain native function. Scientific Reports, 9(1), 6656–12.
Goldsmith, M. & Tawfik, D.S., 2017. Enzyme engineering: reaching the maximal catalytic efficiency peak. Current Opinion in Structural Biology, 47, 140–150.
Goldsmith, M. et al., 2017. Overcoming an optimization plateau in the directed evolution of highly efficient nerve agent bioscavengers. Protein Engineering Design and Selection, 30(4), 333–345.
González, J.M. et al., 2007. The Zn2 position in Metallo-β-Lactamases is critical for activity: A study on chimeric metal sites on a conserved protein scaffold.
Journal of Molecular Biology, 373(5), 1141–1156.
González, M.M. et al., 2016. Optimization of conformational dynamics in an epistatic evolutionary trajectory. Molecular Biology and Evolution, 33(7), 1768–1776.
Goodey, N.M. & Benkovic, S.J., 2008. Allosteric regulation and catalysis emerge via a common route. Nature Chemical Biology, 4(8), 474–482.
Gotthard, G. et al., 2013. Structural and enzymatic characterization of the phosphotriesterase OPHC2 from Pseudomonas pseudoalcaligenes. PLoS ONE, 8(11), e77995.
Gould, S.M. & Tawfik, D.S., 2005. Directed evolution of the promiscuous esterase activity of carbonic anhydrase II. Biochemistry, 44(14), 5444–5452.
Gurung, N. et al., 2013. A broader view: microbial enzymes and their relevance in industries, medicine, and beyond. BioMed Research International, 2013, 329121.
Hamnevik, E. et al., 2017. Relaxation of nonproductive binding and increased rate of coenzyme release in an alcohol dehydrogenase increases turnover with a nonpreferred alcohol enantiomer. The FEBS Journal, 284(22), 3895–3914.
Harder, E. et al., 2016. OPLS3: A force field providing broad coverage of drug-like small molecules and proteins. Journal of Chemical Theory and Computation, 12(1), 281–296.
Harms, M.J. & Thornton, J.W., 2013. Evolutionary biochemistry: revealing the historical and physical causes of protein properties. Nature Reviews Genetics, 14(8), 559–571.
Harms, M.J. & Thornton, J.W., 2014. Historical contingency and its biophysical basis in glucocorticoid receptor evolution. Nature, 512(7513), 203–207.
Hartl, D.L., Dykhuizen, D.E. & Dean, A.M., 1985. Limits of adaptation: the evolution of selective neutrality. Genetics, 111(3), 655–674.
Hawwa, R. et al., 2009. Structure-based and random mutagenesis approaches increase the organophosphate-degrading activity of a phosphotriesterase homologue from Deinococcus radiodurans. Journal of Molecular Biology, 393(1), 36–57.
Henzler-Wildman, K. & Kern, D., 2007.
Dynamic personalities of proteins. Nature, 450(7172), 964–972.
Hess, B. et al., 2008. GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable molecular simulation. Journal of Chemical Theory and Computation, 4(3), 435–447.
Hiblot, J. et al., 2012. Characterisation of the organophosphate hydrolase catalytic activity of SsoPox. Scientific Reports, 2.
Hilvert, D., 2013. Design of protein catalysts. Annual Review of Biochemistry, 82, 447–470.
Hobbs, J.K. et al., 2012. On the origin and evolution of thermophily: reconstruction of functional precambrian enzymes from ancestors of Bacillus. Molecular Biology and Evolution, 29(2), 825–835.
Hochberg, G.K.A. & Thornton, J.W., 2017. Reconstructing ancient proteins to understand the causes of structure and function. Annual Review of Biophysics, 46(1), 247–269.
Hong, N.-S. et al., 2018. The evolution of multiple active site configurations in a designed enzyme. Nature Communications, 9(1), 3900–10.
Hong, S.B. & Raushel, F.M., 1996. Metal-substrate interactions facilitate the catalytic activity of the bacterial phosphotriesterase. Biochemistry, 35(33), 10904–10912.
Horsman, G.P. et al., 2003. Mutations in distant residues moderately increase the enantioselectivity of Pseudomonas fluorescens esterase towards methyl 3-bromo-2-methylpropanoate and ethyl 3-phenylbutyrate. Chemistry - a European Journal, 9(9), 1933–1939.
Hoskin, F.C. & Long, R.J., 1972. Purification of a DFP-hydrolyzing enzyme from squid head ganglion. Archives of Biochemistry and Biophysics, 150(2), 548–555.
Huang, H. et al., 2015. Panoramic view of a superfamily of phosphatases through substrate profiling. Proceedings of the National Academy of Sciences of the United States of America, 112(16), E1974–83.
Huelsenbeck, J.P. & Ronquist, F., 2001. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics (Oxford, England), 17(8), 754–755.
Huelsenbeck, J.P. et al., 2001.
Bayesian inference of phylogeny and its impact on evolutionary biology. Science, 294(5550), 2310–2314.
Ingles, D.W. & Knowles, J.R., 1967. Specificity and stereospecificity of alpha-chymotrypsin. The Biochemical Journal, 104(2), 369–377.
Jackson, C.J. et al., 2008. In crystallo capture of a Michaelis complex and product-binding modes of a bacterial phosphotriesterase. Journal of Molecular Biology, 375(5), 1189–1196.
Jackson, C.J. et al., 2013. Structure and function of an insect α-carboxylesterase (αEsterase7) associated with insecticide resistance. Proceedings of the National Academy of Sciences of the United States of America, 110(25), 10177–10182.
Jackson, C.J. et al., 2005. The effects of substrate orientation on the mechanism of a phosphotriesterase. Organic & Biomolecular Chemistry, 3(24), 4343.
Jacob, F., 1977. Evolution and tinkering. Science, 196(4295), 1161–1166.
Jencks, W.P., 1975. Binding energy, specificity, and enzymic catalysis: the circe effect. Advances in Enzymology and Related Areas of Molecular Biology, 43(1), 219–410.
Jiménez-Osés, G. et al., 2014. The role of distant mutations and allosteric regulation on LovD active site dynamics. Nature Chemical Biology, 10(6), 431–436.
Joy, J.B. et al., 2016. Ancestral reconstruction. PLoS Computational Biology, 12(7), e1004763.
Kabsch, W., 2010. Integration, scaling, space-group assignment and post-refinement. Acta Crystallographica. Section D, Biological Crystallography, 66(Pt 2), 133–144.
Kabsch, W., 2010. XDS. Acta Crystallographica. Section D, Biological Crystallography, 66(Pt 2), 125–132.
Kaczmarski, J.A. et al., 2020. Altered conformational sampling along an evolutionary trajectory changes the catalytic activity of an enzyme. bioRxiv, 57, 320–32.
Kaltenbach, M. & Tokuriki, N., 2014. Dynamics and constraints of enzyme evolution. Journal of Experimental Zoology Part B: Molecular and Developmental Evolution, 322(7), 468–487.
Kaltenbach, M. et al., 2018.
Evolution of chalcone isomerase from a noncatalytic ancestor. Nature Chemical Biology, 14(6), 548–555.
Kaltenbach, M. et al., 2015. Reverse evolution leads to genotypic incompatibility despite functional and active site convergence. eLife, 4, e06492.
Karalliedde, L. et al., Organophosphates and health. London: Imperial College Press, 2001. Print.
Karplus, P.A. & Diederichs, K., 2012. Linking crystallographic model and data quality. Science, 336(6084), 1030–1033.
Khanal, A. et al., 2015. Differential effects of a mutation on the normal and promiscuous activities of orthologs: implications for natural and directed evolution. Molecular Biology and Evolution, 32(1), 100–108.
Khersonsky, O. & Tawfik, D.S., 2005. Structure-reactivity studies of serum paraoxonase PON1 suggest that its native activity is lactonase. Biochemistry, 44(16), 6371–6382.
Khersonsky, O. & Tawfik, D.S., 2010. Enzyme promiscuity: a mechanistic and evolutionary perspective. Annual Review of Biochemistry, 79, 471–505.
Khersonsky, O. et al., 2012. Bridging the gaps in design methodologies by evolutionary optimization of the stability and proficiency of designed Kemp eliminase KE59. Proceedings of the National Academy of Sciences of the United States of America, 109(26), 10358–10363.
Khersonsky, O. et al., 2010. Evolutionary optimization of computationally designed enzymes: Kemp eliminases of the KE07 series. Journal of Molecular Biology, 396(4), 1025–1042.
Khersonsky, O. et al., 2011. Optimization of the in-silico-designed kemp eliminase KE70 by computational design and directed evolution. Journal of Molecular Biology, 407(3), 391–412.
Kirby, A.J. & Hollfelder, F. From enzyme models to model enzymes. Cambridge: Royal Society of Chemistry, 2009. Print.
Kratzer, J.T. et al., 2014. Evolutionary history and metabolic insights of ancient mammalian uricases. Proceedings of the National Academy of Sciences, 111(10), 3763–3768.
Kries, H., Blomberg, R. & Hilvert, D., 2013.
De novo enzymes by computational design. Current Opinion in Chemical Biology, 17(2), 221–228.
Kryazhimskiy, S. et al., 2014. Global epistasis makes adaptation predictable despite sequence-level stochasticity. Science, 344(6191), 1519–1522.
Kuntz, I.D. et al., 1999. The maximal affinity of ligands. Proceedings of the National Academy of Sciences, 96(18), 9997–10002.
Le, S.Q. & Gascuel, O., 2008. An improved general amino acid replacement matrix. Molecular Biology and Evolution, 25(7), 1307–1320.
Lim, S.A. et al., 2016. Evolutionary trend toward kinetic stability in the folding trajectory of RNases H. Proceedings of the National Academy of Sciences, 113(46), 13045–13050.
Liu, D. et al., 2008. Mechanism of the quorum-quenching lactonase (AiiA) from Bacillus thuringiensis. 1. Product-Bound Structures. Biochemistry, 47(29), 7706–7714.
Liu, H. et al., 2005. Plasmid-borne catabolism of methyl parathion and p-nitrophenol in Pseudomonas sp. strain WBC-3. Biochemical and Biophysical Research Communications, 334(4), 1107–1114.
Lobkovsky, A.E. & Koonin, E.V., 2012. Replaying the tape of life: quantification of the predictability of evolution. Frontiers in Genetics, 3, 246.
Lozovsky, E.R. et al., 2009. Stepwise acquisition of pyrimethamine resistance in the malaria parasite. Proceedings of the National Academy of Sciences, 106(29), 12025–12030.
Lunzer, M. et al., 2005. The biochemical architecture of an ancient adaptive landscape. Science, 310(5747), 499–501.
Luo, X.J. et al., 2014. Switching a newly discovered lactonase into an efficient and thermostable phosphotriesterase by simple double mutations His250Ile/Ile263Trp. Biotechnology and Bioengineering, 111(10), 1920–1930.
Ma, B. & Nussinov, R., 2010. Enzyme dynamics point to stepwise conformational selection in catalysis. Current Opinion in Chemical Biology, 14(5), 652–659.
Malla, R.K. et al., 2011.
The first total synthesis of (±)-cyclophostin and (±)-cyclipostin P: Inhibitors of the serine hydrolases acetyl cholinesterase and hormone sensitive lipase. Organic Letters, 13(12), 3094–3097.
Maria-Solano, M.A., Iglesias-Fernández, J. & Osuna, S., 2019. Deciphering the allosterically driven conformational ensemble in tryptophan synthase evolution. Journal of the American Chemical Society, 141(33), 13049–13056.
Maria-Solano, M.A. et al., 2018. Role of conformational dynamics in the evolution of novel enzyme function. Chemical Communications (Cambridge, England), 54(50), 6622–6634.
Mashiyama, S.T. et al., 2014. Large-scale determination of sequence, structure, and function relationships in cytosolic glutathione transferases across the biosphere. PLoS Biology, 12(4), e1001843.
McDonald, A.G., Boyce, S. & Tipton, K.F., 2009. ExplorEnz: the primary source of the IUBMB enzyme list. Nucleic Acids Research, 37(Database issue), D593–597.
McKeown, A.N. et al., 2014. Evolution of DNA specificity in a transcription factor family produced a new gene regulatory module. Cell, 159(1), 58–68.
Meier, M.M. et al., 2013. Molecular engineering of organophosphate hydrolysis activity from a weak promiscuous lactonase template. Journal of the American Chemical Society, 135(31), 11670–11677.
Meini, M.-R. et al., 2015. Quantitative description of a protein fitness landscape based on molecular features. Molecular Biology and Evolution, 32(7), 1774–1787.
Melamed, D. et al., 2013. Deep mutational scanning of an RRM domain of the Saccharomyces cerevisiae poly(A)-binding protein. RNA, 19(11), 1537–1551.
Melnikov, A. et al., 2014. Comprehensive mutational scanning of a kinase in vivo reveals substrate-dependent fitness landscapes. Nucleic Acids Research, 42(14), e112–e112.
Merkl, R. & Sterner, R., 2016. Ancestral protein reconstruction: techniques and applications. Biological Chemistry, 397(1), 1–21.
Mesecar, A.D., Stoddard, B.L. & Koshland, D.E., 1997.
Orbital steering in the catalytic power of enzymes: small structural changes with large catalytic consequences. Science, 277(5323), 202–206.
Meyer, E.A., Castellano, R.K. & Diederich, F., 2003. Interactions with aromatic rings in chemical and biological recognition. Angewandte Chemie (International Ed. in English), 42(11), 1210–1250.
Mills, D.R., Peterson, R.L. & Spiegelman, S., 1967. An extracellular Darwinian experiment with a self-duplicating nucleic acid molecule. Proceedings of the National Academy of Sciences, 58(1), 217–224.
Mira, P.M. et al., 2015. Adaptive landscapes of resistance genes change as antibiotic concentrations change. Molecular Biology and Evolution, 32(10), 2707–2715.
Miton, C.M. & Tokuriki, N., 2016. How mutational epistasis impairs predictability in protein evolution and design. Protein Science: A Publication of the Protein Society, 25(7), 1260–1272.
Miton, C.M. et al., 2018. Evolutionary repurposing of a sulfatase: A new Michaelis complex leads to efficient transition state charge offset. Proceedings of the National Academy of Sciences, 115(31), E7293–E7302.
Mohamed, M.F. & Hollfelder, F., 2013. Efficient, crosswise catalytic promiscuity among enzymes that catalyze phosphoryl transfer. BBA - Proteins and Proteomics, 1834(1), 417–424.
Momb, J. et al., 2008. Mechanism of the quorum-quenching lactonase (AiiA) from Bacillus thuringiensis. 2. Substrate modeling and active site mutations. Biochemistry, 47(29), 7715–7725.
Morley, K.L. & Kazlauskas, R.J., 2005. Improving enzyme properties: when are closer mutations better? Trends in Biotechnology, 23(5), 231–237.
Morris, G.M. et al., 2009. AutoDock4 and AutoDockTools4: Automated docking with selective receptor flexibility. Journal of Computational Chemistry, 30(16), 2785–2791.
Murshudov, G.N. et al., 2011. REFMAC5 for the refinement of macromolecular crystal structures. Acta Crystallographica. Section D, Biological Crystallography, 67(Pt 4), 355–367.
Naughton, S.X.
& Terry, A.V., 2018. Neurotoxicity in acute and repeated organophosphate exposure. Toxicology, 408, 101–112.
Näsvall, J. et al., 2012. Real-time evolution of new genes by innovation, amplification, and divergence. Science, 338(6105), 384–387.
Newcomb, R.D. et al., 1997. A single amino acid substitution converts a carboxylesterase to an organophosphorus hydrolase and confers insecticide resistance on a blowfly. Proceedings of the National Academy of Sciences, 94(14), 7464–7468.
Newton, M.S. et al., 2017. Structural and functional innovations in the real-time evolution of new (βα)8 barrel enzymes. Proceedings of the National Academy of Sciences of the United States of America, 114(18), 4727–4732.
Nguyen, P.C. et al., 2017. Cyclipostins and cyclophostin analogs as promising compounds in the fight against tuberculosis. Scientific Reports, 1–15.
Nguyen, V. et al., 2017. Evolutionary drivers of thermoadaptation in enzyme catalysis. Science, 355(6322), 289–294.
Niquille, D.L. et al., 2018. Nonribosomal biosynthesis of backbone-modified peptides. Nature Chemistry, 10(3), 282–287.
Nobeli, I., Favia, A.D. & Thornton, J.M., 2009. Protein promiscuity and its implications for biotechnology. Nature Biotechnology, 27(2), 157–167.
Noor, S. et al., 2012. Intramolecular epistasis and the evolution of a new enzymatic function. PLoS ONE, 7(6), e39822.
O'Maille, P.E. et al., 2008. Quantitative exploration of the catalytic landscape separating divergent plant sesquiterpene synthases. Nature Chemical Biology, 4(10), 617–623.
Obexer, R. et al., 2017. Emergence of a catalytic tetrad during evolution of a highly active artificial aldolase. Nature Chemistry, 9(1), 50–56.
Oelschlaeger, P., 2005. Impact of remote mutations on metallo-β-lactamase substrate specificity: Implications for the evolution of antibiotic resistance. Protein Science, 14(3), 765–774.
Olson, C.A., Wu, N.C. & Sun, R., 2014.
A comprehensive biophysical description of pairwise epistasis throughout an entire protein domain. Current Biology, 24(22), 2643–2651.
Ortlund, E.A. et al., 2007. Crystal structure of an ancient protein: evolution by conformational epistasis. Science, 317(5844), 1544–1548.
Otten, R. et al., 2018. Rescue of conformational dynamics in enzyme catalysis by directed evolution. Nature Communications, 9(1), 1314.
Oue, S. et al., 1999. Redesigning the substrate specificity of an enzyme by cumulative effects of the mutations of non-active site residues. The Journal of Biological Chemistry, 274(4), 2344–2349.
Paaby, A.B. & Rockman, M.V., 2014. Cryptic genetic variation: evolution's hidden substrate. Nature Reviews Genetics, 15(4), 247–258.
Palmer, D.R. et al., 1999. Unexpected divergence of enzyme function and sequence: “N-acylamino acid racemase” is o-succinylbenzoate synthase. Biochemistry, 38(14), 4252–4258.
Parera, M. & Martinez, M.A., 2014. Strong epistatic interactions within a single protein. Molecular Biology and Evolution, 31(6), 1546–1553.
Perica, T. et al., 2014. Evolution of oligomeric state through allosteric pathways that mimic ligand binding. Science, 346(6216), 1254346–1254346.
Petrović, D. et al., 2018. Conformational dynamics and enzyme evolution. Journal of the Royal Society, Interface, 15(144), 20180330.
Phillips, P.C., 2008. Epistasis – the essential role of gene interactions in the structure and evolution of genetic systems. Nature Reviews Genetics, 9(11), 855–867.
Pillai, A.S. et al., 2020. Origin of complexity in haemoglobin evolution. Nature, 581(7809), 480–485.
Poelarends, G.J. et al., 2004. Cloning, expression, and characterization of a cis-3-chloroacrylic acid dehalogenase: insights into the mechanistic, structural, and evolutionary relationship between isomer-specific 3-chloroacrylic acid dehalogenases. Biochemistry, 43(3), 759–772.
Preiswerk, N. et al., 2014.
Impact of scaffold rigidity on the design and evolution of an artificial Diels-Alderase. Proceedings of the National Academy of Sciences of the United States of America, 111(22), 8013–8018.
Privett, H.K. et al., 2012. Iterative approach to computational enzyme design. Proceedings of the National Academy of Sciences of the United States of America, 109(10), 3790–3795.
Purg, M. et al., 2016. Probing the mechanisms for the selectivity and promiscuity of methyl parathion hydrolase. Philosophical Transactions. Series a, Mathematical, Physical, and Engineering Sciences, 374(2080), 20160150.
Ramanathan, M.P. & Lalithakumari, D., 1999. Complete mineralization of methylparathion by Pseudomonas sp. A3. Applied Biochemistry and Biotechnology, 80(1), 1–12.
Rani, N.L. & Lalithakumari, D., 1994. Degradation of methyl parathion by Pseudomonas putida. Canadian Journal of Microbiology, 40(12), 1000–1006.
Richmond, M.L., 2001. Women in the early history of genetics. William Bateson and the Newnham College Mendelians, 1900-1910. Isis; an international review devoted to the history of science and its cultural influences, 92(1), 55–90.
Robert, X. & Gouet, P., 2014. Deciphering key features in protein structures with the new ENDscript server. Nucleic Acids Research, 42(Web Server issue), W320–324.
Romero, P.A. & Arnold, F.H., 2009. Exploring protein fitness landscapes by directed evolution. Nature Reviews Molecular Cell Biology, 10(12), 866–876.
Russell, R.J. et al., 2011. The evolution of new enzyme function: lessons from xenobiotic metabolizing bacteria versus insecticide-resistant insects. Evolutionary Applications, 4(2), 225–248.
Sailer, Z.R. & Harms, M.J., 2017. High-order epistasis shapes evolutionary trajectories. PLoS Computational Biology, 13(5), e1005541.
Sambrook, J. R. Molecular cloning: a laboratory manual, third edition. New York: Cold Spring Harbor Laboratory Press, 2001. Print.
Schramm, V.L., 2011.
Enzymatic transition states, transition-state analogs, dynamics, thermodynamics, and lifetimes. Annual Review of Biochemistry, 80, 703–732.
Schulenburg, C. et al., 2015. Comparative laboratory evolution of ordered and disordered enzymes. Journal of Biological Chemistry, 290(15), 9310–9320.
Schüttelkopf, A.W. & van Aalten, D.M.F., 2004. PRODRG: a tool for high-throughput crystallography of protein-ligand complexes. Acta Crystallographica. Section D, Biological Crystallography, 60(Pt 8), 1355–1363.
Scott, C. et al., 2009. Catalytic improvement and evolution of atrazine chlorohydrolase. Applied and Environmental Microbiology, 75(7), 2184–2191.
Seibert, C.M. & Raushel, F.M., 2005. Structural and catalytic diversity within the amidohydrolase superfamily. Biochemistry, 44(17), 6383–6391.
Serdar, C.M. et al., 1982. Plasmid involvement in parathion hydrolysis by Pseudomonas diminuta. Applied and Environmental Microbiology, 44(1), 246–249.
Sethunathan, N. & Yoshida, T., 1973. A Flavobacterium sp. that degrades diazinon and parathion. Canadian Journal of Microbiology, 19(7), 873–875.
Shoichet, B.K. et al., 1995. A relationship between protein stability and protein function. Proceedings of the National Academy of Sciences of the United States of America, 92(2), 452–456.
Siddiq, M.A., Hochberg, G.K. & Thornton, J.W., 2017. Evolution of protein specificity: insights from ancestral protein reconstruction. Current Opinion in Structural Biology, 47, 113–122.
Siegel, J.B. et al., 2010. Computational design of an enzyme catalyst for a stereoselective bimolecular Diels-Alder reaction. Science, 329(5989), 309–313.
Simões, P. et al., 2017. Predictable phenotypic, but not karyotypic, evolution of populations with contrasting initial history. Scientific Reports, 7(1), 1–12.
Singh, B.K., 2009. Organophosphorus-degrading bacteria: ecology and industrial applications. Nature Reviews. Microbiology, 7(2), 156–164.
Singh, B.K. & Walker, A., 2006.
Microbial degradation of organophosphorus compounds. FEMS Microbiology Reviews, 30(3), 428–471.
Singh, R. et al., 2016. Microbial enzymes: industrial progress in 21st century. 3 Biotech, 6(2), 174–15.
Smith, J.M., 1970. Natural selection and the concept of a protein space. Nature, 225(5232), 563–564.
Stackhouse, J. et al., 1990. The ribonuclease from an extinct bovid ruminant. FEBS Letters, 262(1), 104–106.
Stamatakis, A., 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics (Oxford, England), 30(9), 1312–1313.
Starr, T.N. & Thornton, J.W., 2016. Epistasis in protein evolution. Protein Science: A Publication of the Protein Society, 25(7), 1204–1218.
Starr, T.N., Picton, L.K. & Thornton, J.W., 2017. Alternative evolutionary histories in the sequence space of an ancient protein. Nature, 549(7672), 409–413.
Stebbins, J., 1944. The law of diminishing returns. Science, 99(2571), 267–271.
Stiffler, M.A., Hekstra, D.R. & Ranganathan, R., 2015. Evolvability as a function of purifying selection in TEM-1 β-Lactamase. Cell, 160(5), 882–892.
Stormo, G.D., 2011. Maximally efficient modeling of DNA sequence motifs at all levels of complexity. Genetics, 187(4), 1219–1224.
Studer, R.A. et al., 2014. Stability-activity tradeoffs constrain the adaptive evolution of RubisCO. Proceedings of the National Academy of Sciences of the United States of America, 111(6), 2223–2228.
Sugrue, E. et al., 2016. The evolution of new catalytic mechanisms for xenobiotic hydrolysis in bacterial metalloenzymes. Australian Journal of Chemistry, 69(12), 1383–1395.
Sun, L. et al., 2004. Crystallization and preliminary X-ray studies of methyl parathion hydrolase from Pseudomonas sp. WBC-3. Acta Crystallographica. Section D, Biological Crystallography, 60(Pt 5), 954–956.
Sunden, F. et al., 2017.
Differential catalytic promiscuity of the alkaline phosphatase superfamily bimetallo core reveals mechanistic features underlying enzyme evolution. The Journal of Biological Chemistry, 292(51), 20960–20974.
Sunden, F. et al., 2015. Extensive site-directed mutagenesis reveals interconnected functional units in the alkaline phosphatase active site. eLife, 4, e06181.
Sykora, J. et al., 2014. Dynamics and hydration explain failed functional transformation in dehalogenase design. Nature Chemical Biology, 10(6), 428–430.
Tokuriki, N. & Tawfik, D.S., 2009. Protein dynamism and evolvability. Science, 324(5924), 203–207.
Tokuriki, N. & Tawfik, D.S., 2009. Stability effects of mutations and protein evolvability. Current Opinion in Structural Biology, 19(5), 596–604.
Tokuriki, N. et al., 2012. Diminishing returns and tradeoffs constrain the laboratory optimization of an enzyme. Nature Communications, 3, 1257.
Tokuriki, N. et al., 2008. How protein stability and new functions trade off. PLoS Computational Biology, 4(2), e1000002.
Tomatis, P.E. et al., 2008. Adaptive protein evolution grants organismal fitness by improving catalysis and flexibility. Proceedings of the National Academy of Sciences of the United States of America, 105(52), 20605–20610.
Tomatis, P.E. et al., 2005. Mimicking natural evolution in metallo-beta-lactamases through second-shell ligand mutations. Proceedings of the National Academy of Sciences of the United States of America, 102(39), 13761–13766.
Toth-Petroczy, A. & Tawfik, D.S., 2014. The robustness and innovability of protein folds. Current Opinion in Structural Biology, 26, 131–138.
Trimpin, S. & Brizzard, B., 2009. Analysis of insoluble proteins. BioTechniques, 46(5), 321–326.
Trott, O. & Olson, A.J., 2010. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. Journal of Computational Chemistry, 31(2), 455–461.
Tufts, D.M. et al., 2015.
Epistasis constrains mutational pathways of hemoglobin adaptation in high-altitude pikas. Molecular Biology and Evolution, 32(2), 287–298.
Vagin, A. & Teplyakov, A., 2010. Molecular replacement with MOLREP. Acta Crystallographica. Section D, Biological Crystallography, 66(Pt 1), 22–25.
Vakulenko, S.B. et al., 1999. Effects on substrate profile by mutational substitutions at positions 164 and 179 of the class A TEM(pUC19) beta-lactamase from Escherichia coli. The Journal of Biological Chemistry, 274(33), 23052–23060.
van der Meer, J.-Y. et al., 2016. Using mutability landscapes of a promiscuous tautomerase to guide the engineering of enantioselective Michaelases. Nature Communications, 7, 1–16.
van Gunsteren, W.F. et al., Biomolecular Simulation: The GROMOS96 Manual and User Guide. Zürich: Vdf, Hochschulverlag AG an der ETH Zürich, 1996. Print.
van Hylckama Vlieg, J.E. & Janssen, D.B., 1991. Bacterial degradation of 3-chloroacrylic acid and the characterization of cis- and trans-specific dehalogenases. Biodegradation, 2(3), 139–150.
Villaverde, A. & Carrió, M.M., 2003. Protein aggregation in recombinant bacteria: biological role of inclusion bodies. Biotechnology Letters, 25(17), 1385–1395.
Villiers, B. & Hollfelder, F., 2011. Directed evolution of a gatekeeper domain in nonribosomal peptide synthesis. Chemistry & Biology, 18(10), 1290–1299.
Vipond, I.B., Moon, B.J. & Halford, S.E., 1996. An isoleucine to leucine mutation that switches the cofactor requirement of the EcoRV restriction endonuclease from magnesium to manganese. Biochemistry, 35(6), 1712–1721.
Voordeckers, K. et al., 2012. Reconstruction of ancestral metabolic enzymes reveals molecular mechanisms underlying evolutionary innovation through gene duplication. PLoS Biology, 10(12), e1001446.
Wagner, A., Robustness and Evolvability in Living Systems. New Jersey: Princeton University Press, 2007. Print.
Wallace, I.M. et al., 2006.
M-Coffee: combining multiple sequence alignment methods with T-Coffee. Nucleic Acids Research, 34(6), 1692–1699.
Waterhouse, A. et al., 2018. SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Research, 46(W1), W296–W303.
Weinreich, D.M. et al., 2006. Darwinian evolution can follow only very few mutational paths to fitter proteins. Science, 312(5770), 111–114.
Weinreich, D.M. et al., 2018. The influence of higher-order epistasis on biological fitness landscape topography. Journal of Statistical Physics, 172(1), 208–225.
Whittle, E. & Shanklin, J., 2001. Engineering delta 9-16:0-acyl carrier protein (ACP) desaturase specificity based on combinatorial saturation mutagenesis and logical redesign of the castor delta 9-18:0-ACP desaturase. The Journal of Biological Chemistry, 276(24), 21500–21505.
Wilding, M. et al., 2019. Protein engineering: the potential of remote mutations. Biochemical Society Transactions, 47(2), 701–711.
Winn, M.D. et al., 2011. Overview of the CCP4 suite and current developments. Acta Crystallographica. Section D, Biological Crystallography, 67(Pt 4), 235–242.
Wolfenden, R., 2011. Benchmark reaction rates, the stability of biological molecules in water, and the evolution of catalytic power in enzymes. Annual Review of Biochemistry, 80, 645–667.
Wrenbeck, E.E., Azouz, L.R. & Whitehead, T.A., 2017. Single-mutation fitness landscapes for an enzyme on multiple substrates reveal specificity is globally encoded. Nature Communications, 8(1), 15695–10.
Yabukarski, F. et al., 2019. Assessing positioning in enzymatic catalysis via ketosteroid isomerase conformational ensembles. bioRxiv, 56, 449–46.
Yang, G. et al., 2019. Higher-order epistasis shapes the fitness landscape of a xenobiotic-degrading enzyme. Nature Chemical Biology, 15(11), 1120–1128.
Yang, G. et al., 2016. Conformational tinkering drives evolution of a promiscuous activity through indirect mutational effects.
Biochemistry, 55(32), 4583–4593.
Yang, Z., 1994. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. Journal of Molecular Evolution, 39(3), 306–314.
Yang, Z., 2007. PAML 4: phylogenetic analysis by maximum likelihood. Molecular Biology and Evolution, 24(8), 1586–1591.
Zallot, R., Oberg, N. & Gerlt, J.A., 2019. The EFI web resource for genomic enzymology tools: Leveraging protein, genome, and metagenome databases to discover novel enzymes and metabolic pathways. Biochemistry, 58(41), 4169–4182.
Zeymer, C., Zschoche, R. & Hilvert, D., 2017. Optimization of enzyme mechanism along the evolutionary trajectory of a computationally designed (retro-)aldolase. Journal of the American Chemical Society, 139(36), 12541–12549.
Zhao, H. et al., 1998. Molecular evolution by staggered extension process (StEP) in vitro recombination. Nature Biotechnology, 16(3), 258–261.
Zhongli, C., Shunpeng, L. & Guoping, F., 2001. Isolation of methyl parathion-degrading strain M6 and cloning of the methyl parathion hydrolase gene. Applied and Environmental Microbiology, 67(10), 4922–4925.
Zuckerkandl, E. & Pauling, L., 1965. Molecules as documents of evolutionary history. Journal of Theoretical Biology, 8(2), 357–366.

Appendices

Appendix A Supplementary material for Chapter 2

Table A.1 Library information and amino acid changes.

| Round | [S] Prescreen^b | [S] Screen^c | Variants screened^d | StEP shuffling^e | Amino acid change |
|---|---|---|---|---|---|
| 1 | 0.5 mM | 0.25 mM | >2000 (192) | | L33M, V69G, K139T, I230M |
| 2 | 0.5 mM | 0.25 mM | >2000 (192) | | F64C |
| 3 | 0.5 mM | 0.25 mM | >2000 (192) | ✔ | M33V |
| 4^a | 0.5 mM; 0.2 mM | 0.25 mM; 0.15 mM | 2 × >2000 (192) | | S20F |
| 5 | 0.2 mM | 0.15 mM | >2000 (192) | ✔ | K218R |
| 6 | 0.2 mM | 0.15 mM | >2000 (192) | | H18Q |

^a In round 4 two independent rounds of screening were performed with different substrate concentrations, because no improved variant could be identified with higher substrate concentrations.
^b Paraoxon concentration used for agar plate prescreen as described in material & methods.
^c Paraoxon concentration used in 96-well plate screening as described in material & methods.
^d The number of variants screened in agar plate prescreen and in 96-well plates (brackets) per round.
^e DNA shuffling was performed in rounds where several improved variants with distinct mutations were identified.

Table A.2 Site-directed mutagenesis primers.

| Primer name | Template | Mutation | Sequence (5’ to 3’) |
|---|---|---|---|
| V69G-Fwd | AiiA-wt | V69G | ctttttaacggtacatttggtgaaggacagatcttaccg |
| V69G-Rev | | | cggtaagatctgtccttcaccaaatgtaccgttaaaaag |
| L33M-Fwd | AiiA-wt | L33M | gggaaactattaaacatgccggtgtggtgttatcttttg |
| L33M-Rev | | | caaaagataacaccacaccggcatgtttaatagtttccc |
| K139T-Fwd | AiiA-wt | K139T | gcacttcatagagaagaatatatgacagaatgtatattaccgcatttg |
| K139T-Rev | | | caaatgcggtaatatacattctgtcatatattcttctctatgaagtgc |
| I230M-Fwd | AiiA-wt | I230M | gttgtgaaaaaagagaaaccaatgattttctttggtcatgatatagagc |
| I230M-Rev | | | gctctatatcatgaccaaagaaaatcattggtttctcttttttcacaac |
| F64C-Fwd | AiiA-wt | F64C | gttaataatgaagggctttgtaacggtacatttgttgaagg |
| F64C-Rev | | | ccttcaacaaatgtaccgttacaaagcccttcattattaac |
| L33V-Fwd | AiiA-wt | L33V | gggaaactattaaacgtgccggtgtggtgttatcttttg |
| L33V-Rev | | | caaaagataacaccacaccggcacgtttaatagtttccc |
| S20F-Fwd | AiiA-wt | S20F | gttgcatgttggatcattcgtttgttaacagtgcgttaac |
| S20F-Rev | | | gttaacgcactgttaacaaacgaatgatccaacatgcaac |
| K218R-Fwd | AiiA-wt | K218R | gatccagaattagctttatcttcaattagacgtttaaaagaagttgtg |
| K218R-Rev | | | cacaacttcttttaaacgtctaattgaagataaagctaattctggatc |
| H18Q-Fwd | AiiA-wt | H18Q | gtcgttgcatgttggatcaatcgtctgttaacagtgcgttaac |
| H18Q-Rev | | | gttaacgcactgttaacagacgattgatccaacatgcaacgac |
| F68A-Fwd | AiiA-wt | F68A | gggctttttaacggtacagctgttgaaggacagatcttac |
| F68A-Rev | | | gtaagatctgtccttcaacagctgtaccgttaaaaagccc |
| Q72A-Fwd | AiiA-wt | Q72A | gcttgctcttctaggagcgatcttaccgaaaatgactgaggaag |
| Q72A-Rev | | | gcttgctcttctccttcaacaaatgtaccgttaaaaagcccttcattattaac |
| R134A-Fwd | AiiA-wt | R134A | gaggcagcacttcatgcagaagaatatatgaaag |
| R134A-Rev | | | ctttcatatattcttctgcatgaagtgctgcctc |
| E135A-Fwd | AiiA-wt | E135A | gaggcagcacttcatagagcagaatatatgaaagaatgtatattacc |
| E135A-Rev | | | ggtaatatacattctttcatatattctgctctatgaagtgctgcctc |
| E136A-Fwd | AiiA-wt | E136A | gcagcacttcatagagaagcatatatgaaagaatg |
| E136A-Rev | | | cattctttcatatatgcttctctatgaagtgctgc |
| D108N-Fwd | AiiA-wt | D108N | gttctcacttacattttaatcatgcaggaggaaacggtgc |
| D108N-Rev | | | gcaccgtttcctcctgcatgattaaaatgtaagtgagaac |
| Y194F-Fwd | AiiA-wt, AiiA-R4 | Y194F | cgattgatgcatcgttcacgaaagagaattttgaagatgaagtg |
| Y194F-Rev | | | cacttcatcttcaaaattctctttcgtgaacgatgcatcaatcg |
| F68A-F64C-Fwd | AiiA-F68 | F64C | gttaataatgaagggctttgtaacggtacagctgttgaagg |
| F68A-F64C-Rev | | | ccttcaacagctgtaccgttacaaagcccttcattattaac |
| V69G-F64C-Fwd | AiiA-V69G | F64C | gttaataatgaagggctttgtaacggtacatttggtgaagg |
| V69G-F64C-Rev | | | ccttcaccaaatgtaccgttacaaagcccttcattattaac |
| F68A-V69G-F64C-Fwd | AiiA-F68A-V69G | F64C | gttaataatgaagggctttgtaacggtacagctggtgaagg |
| F68A-V69G-F64C-Rev | | | ccttcaccagctgtaccgttacaaagcccttcattattaac |
| R1-F68A-Fwd | AiiA-R1 | F68A | gggctttttaacggtacagctggtgaaggacagatcttac |
| R1-F68A-Rev | | | gtaagatctgtccttcaccagctgtaccgttaaaaagccc |
| R4-F68A-Fwd | AiiA-R4 | F68A | gggctttgtaacggtacagctggtgaaggacagatcttac |
| R4-F68A-Rev | | | gtaagatctgtccttcaccagctgtaccgttacaaagccc |
| R4-Q72A-Fwd | AiiA-R4 | Q72A | gcttgctcttctaggagcgatcttaccgaaaatgactgaggaag |
| R4-Q72A-Rev | | | gcttgctcttctccttcaccaaatgtaccgttacaaagcccttc |
| R4-R134A-Fwd | AiiA-R4 | R134A | gaggcagcacttcatgcagaagaatatatgacag |
| R4-R134A-Rev | | | ctgtcatatattcttctgcatgaagtgctgcctc |
| R4-E135A-Fwd | AiiA-R4 | E135A | gaggcagcacttcatagagcagaatatatgacagaatgtatattacc |
| R4-E135A-Rev | | | ggtaatatacattctgtcatatattctgctctatgaagtgctgcctc |
| R4-E136A-Fwd | AiiA-R4 | E136A | gcagcacttcatagagaagcatatatgacagaatg |
| R4-E136A-Rev | | | cattctgtcatatatgcttctctatgaagtgctgc |

Table A.3 Crystallography data collection and refinement statistics.
Crystal                                 AiiA-wt                         AiiA-R4
PDB ID                                  5EH9                            5EHT
Wavelength (Å)                          0.9537                          0.9537
Resolution range (Å)                    45.47 - 1.29 (1.336 - 1.29)     32.49 - 1.29 (1.336 - 1.29)
Space group                             P 21 21 21                      P 21 21 21
Unit cell (Å)                           54.51 55.47 79.358              54.64 55.74 79.88
Unit cell (°)                           90.00 90.00 90.00               90.00 90.00 90.00
Total reflections                       430254 (42089)                  826812 (69935)
Unique reflections                      61128 (6032)                    61760 (6093)
Multiplicity                            7.0 (7.0)                       13.4 (11.5)
Completeness (%)                        99.80 (99.83)                   99.47 (99.33)
Mean I/sigma (I)                        16.33 (1.32)                    20.78 (1.44)
Wilson B-factor (Å²)                    15.87                           13.34
R-merge                                 0.06476 (1.56)                  0.08195 (1.759)
R-meas                                  0.07007                         0.08529
CC1/2                                   0.999 (0.566)                   0.999 (0.636)
CC*                                     1 (0.85)                        1 (0.882)
R-work                                  0.1234 (0.2397)                 0.1358 (0.2775)
R-free                                  0.1663 (0.2722)                 0.1726 (0.3526)
Number of non-hydrogen atoms            2259                            2204
Macromolecules (atoms of protein)       2036                            2013
Ligands (atoms of glycerols)            56                              32
Water                                   167                             159
Protein residues                        253                             253
RMS (bonds, Å)                          0.022                           0.022
RMS (angles, °)                         2.27                            2.31
Ramachandran favoured (%)               96                              96
Ramachandran outliers (%)               0.39                            0.79
Clash score                             4.32                            3.46
Average B-factor (Å²)                   23.9                            20.8
Macromolecules (B-factors of protein)   22.8                            20.1
Ligands (B-factors of glycerols)        36.6                            30.5
Solvent                                 32.7                            27.3
ᵃ Highest-resolution shell is shown in parentheses.

Table A.4 Cell lysate activities.
                 Lactonaseᵃ                                          Paraoxonaseᵇ
Variant          Rate (nM/s)           Relative to Previous Round    Rate (nM/s)    Relative to Previous Round
WT               1.1 (± 0.05) × 10⁴    -                             5.1 ± 0.8      -
R1               5.5 (± 0.2) × 10³     0.5                           50 ± 4.9       9.8
R2               3.1 (± 0.04) × 10³    0.6                           190 ± 11       3.7
R3               5.9 (± 0.2) × 10³     1.9                           460 ± 16       2.5
R4               1.3 (± 0.1) × 10⁴     2.2                           950 ± 7.0      2.1
R5               1.5 (± 0.1) × 10⁴     1.2                           1000 ± 48      1.1
R6               1.7 (± 0.03) × 10⁴    1.1                           1400 ± 88      1.3
                                       Relative to WT                               Relative to WT
V69G             4.9 (± 0.1) × 10³     0.4                           52 ± 2.5       10.1
L33M             ND                    ND                            4.7 ± 0.1      1.0
I230M            ND                    ND                            5.3 ± 0.05     1.1
K139T            ND                    ND                            5.6 ± 0.1      1.1
F64C             2.5 (± 0.1) × 10³     0.2                           17 ± 2.2       3.4
L33V             2.7 (± 0.1) × 10⁴     2.5                           18 ± 0.5       3.5
S20F             6.0 (± 0.05) × 10³    0.5                           8.8 ± 1.0      1.7
K218R            1.3 (± 0.1) × 10⁴     1.2                           5.9 ± 1.3      1.2
H18Q             1.1 (± 0.03) × 10⁴    1.0                           13 ± 1.8       2.5
F68A             1.7 (± 0.02) × 10⁴    1.6                           10 ± 0.9       2.0
Q72A             8.4 (± 0.4) × 10³     0.8                           2.5 ± 0.2      0.5
R134A            8.8 (± 0.3) × 10³     0.8                           5.0 ± 2.0      1.0
E135A            1.1 (± 0.05) × 10⁴    1.0                           18 ± 0.2       3.4
E136A            4.6 (± 0.1) × 10³     0.4                           4.3 ± 1.2      0.8
Y194F            6.8 (± 0.1) × 10³     0.6                           39 ± 1.1       7.6
V69G-F64C        3.8 (± 0.2) × 10³     0.3                           200 ± 13       38.8
F64C-S20F        3.3 (± 0.2) × 10³     0.3                           22 ± 0.3       4.3
V69G-F64C-S20F   5.9 (± 0.2) × 10³     0.5                           350 ± 16       68.0
V69G-F68A        1.6 (± 0.1) × 10⁴     1.5                           21 ± 0.2       4.0
F64C-F68A        4.3 (± 0.2) × 10³     0.4                           18 ± 0.4       3.6
V69G-F64C-F68A   4.4 (± 0.02) × 10³    0.4                           21 ± 1.2       4.0
                                       Relative to R4                               Relative to R4
R4-F68A          1.5 (± 0.02) × 10⁴    1.1                           310 ± 9.4      0.3
R4-Q72A          1.2 (± 0.03) × 10⁴    0.9                           880 ± 25       0.9
R4-R134A         9.1 (± 0.3) × 10³     0.7                           810 ± 13       0.9
R4-E135A         1.2 (± 0.01) × 10⁴    0.9                           680 ± 17       0.7
R4-E136A         1.8 (± 0.7) × 10³     0.1                           180 ± 9.4      0.2
R4-D108N         1.5 (± 1.1) × 10²     0.0                           5.3 ± 1.3      0.0
R4-Y194F         3.9 (± 0.2) × 10³     0.3                           950 ± 35       1.0
All variants were measured using the same amount of lysate (100 μL in 200 μL reaction) and the same substrate concentration. Measurements were performed in triplicate and values were averaged. Errors represent standard deviation. ND means not determined.
ᵃ To assay lactonase activity a final concentration of 250 μM N-acyl homoserine lactone was used.
ᵇ To assay paraoxonase activity a final concentration of 250 μM paraoxon was used.

Table A.5 Kinetic parameters for paraoxonase activity.
Variant          kcat [s⁻¹]     KM [µM]        Ki [µM]         kcat/KM [s⁻¹M⁻¹]   Relative to Previous Round
WT               1.8 ± 0.1      3400 ± 180     ND              5.1 × 10²          1.0
R1               8.2 ± 0.6      1000 ± 170     ND              7.9 × 10³          15.5
R2               10 ± 0.3       400 ± 40       ND              2.6 × 10⁴          3.3
R3               25 ± 1.8       470 ± 70       7100 ± 1400     5.3 × 10⁴          2.1
R4               29 ± 6.6       220 ± 65       190 ± 60        1.3 × 10⁵          2.5
R5               15 ± 1.3       40 ± 7         550 ± 120       3.8 × 10⁵          2.8
R6               14 ± 1.2       26 ± 5         660 ± 180       5.4 × 10⁵          1.4
                                                                                  Relative to WT
V69G             6.6 ± 0.4      1100 ± 180     ND              6.0 × 10³          11.8
F64C             1.0 ± 0.6      1100 ± 140     ND              9.4 × 10²          1.8
L33V             4.0 ± 0.2      3800 ± 400     ND              1.1 × 10³          2.1
S20F             2.8 ± 0.4      5500 ± 1300    ND              5.2 × 10²          1.0
K218R            1.9 ± 0.1      1400 ± 270     ND              1.3 × 10³          2.6
H18Q             4.5 ± 0.1      2200 ± 100     ND              2.1 × 10³          4.0
F68A             2.9 ± 0.1      3200 ± 110     ND              9.1 × 10²          1.8
Q72A             1.2 ± 0.0      3900 ± 180     ND              3.1 × 10²          0.6
R134A            1.3 ± 0.0      1200 ± 60      ND              1.1 × 10³          2.1
E135A            2.4 ± 0.2      3300 ± 490     ND              7.1 × 10²          1.4
E136A            1.7 ± 0.1      1300 ± 140     ND              1.3 × 10³          2.6
D108N            0.02 ± 0.00    3200 ± 470     ND              5.6 × 10⁰          0.011
Y194F            2.5 ± 0.1      500 ± 30       ND              5.0 × 10³          9.9
V69G-F64C        26 ± 2.7       730 ± 140      10000 ± 3000    3.6 × 10⁴          69.8
V69G-F68A        7.5 ± 1.0      3500 ± 930     ND              2.1 × 10³          4.2
F64C-F68A        1.0 ± 0.1      2000 ± 300     ND              5.0 × 10²          1.0
V69G-F64C-F68A   14 ± 1.1       4400 ± 660     ND              3.1 × 10³          6.0
                                                                                  Relative to R4
R4-F68A          11 ± 0.2       200 ± 6.7      ND              5.5 × 10⁴          0.4
R4-Q72A          26 ± 1.5       86 ± 8.9       630 ± 90        3.0 × 10⁵          2.3
R4-R134A         26 ± 1.6       89 ± 9.7       630 ± 90        2.9 × 10⁵          2.2
R4-E135A         5.7 ± 0.2      22 ± 2.4       ND              2.6 × 10⁵          1.9
R4-E136A         11 ± 1.3       7.5 ± 2.3      380 ± 140       1.5 × 10⁶          11.3
R4-D108N         1.4 ± 0.1      960 ± 100      ND              1.5 × 10³          0.0
R4-Y194F         38 ± 4.1       53 ± 12        1800 ± 800      7.3 × 10⁵          5.5
Measurements were performed in triplicate and values were averaged. Errors represent standard deviation. ND means not determined.
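The relationship between the columns of Table A.5 can be illustrated with a minimal sketch. It assumes that the Ki column is a substrate-inhibition constant (the table itself does not say so) and that rates follow the Michaelis-Menten equation with an optional substrate-inhibition term, v/[E] = kcat[S] / (KM + [S] + [S]²/Ki). The function names are hypothetical and are not taken from the thesis. Values recomputed from the rounded table entries need not match the tabulated kcat/KM exactly.

```python
def catalytic_efficiency(kcat_per_s, km_uM):
    """Return kcat/KM in s^-1 M^-1 from kcat (s^-1) and KM (µM)."""
    return kcat_per_s / (km_uM * 1e-6)


def turnover_rate(s_uM, kcat_per_s, km_uM, ki_uM=None):
    """Return v/[E] (s^-1) at substrate concentration s_uM (µM).

    Assumption: when ki_uM is given, it is treated as a substrate-inhibition
    constant and adds an [S]^2/Ki term to the denominator. With ki_uM=None
    this reduces to the plain Michaelis-Menten equation.
    """
    denominator = km_uM + s_uM
    if ki_uM is not None:
        denominator += s_uM ** 2 / ki_uM
    return kcat_per_s * s_uM / denominator


def fold_change(value, reference):
    """Return value relative to a reference (e.g. the previous round)."""
    return value / reference


# Rounded WT and R1 entries from Table A.5 (kcat in s^-1, KM in µM).
wt_eff = catalytic_efficiency(1.8, 3400)
r1_eff = catalytic_efficiency(8.2, 1000)
print(f"WT kcat/KM ~ {wt_eff:.0f} s^-1 M^-1")
print(f"R1 kcat/KM ~ {r1_eff:.0f} s^-1 M^-1")
print(f"R1 relative to WT ~ {fold_change(r1_eff, wt_eff):.1f}")

# R4 at [S] = KM, without and with the assumed substrate-inhibition term.
print(turnover_rate(220, 29, 220))
print(turnover_rate(220, 29, 220, ki_uM=190))
```

Under this assumed model, a Ki comparable to KM (as listed for R4) lowers the rate noticeably even at moderate substrate concentrations, which is one reason to compare variants by kcat/KM rather than by a single-concentration rate.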
Appendix B Supplementary material for Chapter 3

Table B.1 Ambiguous sites in AncDHCH1 and the highest Bayesian posterior probability alternative residue.
          Selected                                       Alternative
Position  Residue  Probability  Properties               Residue  Probability  Properties
47        Phe      0.55         Hydrophobic, aromatic    Tyr      0.45         Polar, uncharged
51        Met      0.79         Hydrophobic, aliphatic   Leu      0.17         Hydrophobic, aliphatic
57        Val      0.68         Hydrophobic, aliphatic   Ile      0.32         Hydrophobic, aliphatic
78        Ala      0.76         Hydrophobic, aliphatic   Gly      0.09         Polar, uncharged
82        Ser      0.59         Polar, uncharged         Ala      0.37         Hydrophobic, aliphatic
91        Ala      0.79         Hydrophobic, aliphatic   Thr      0.07         Polar, uncharged
102       Val      0.51         Hydrophobic, aliphatic   Ile      0.49         Hydrophobic, aliphatic
111       Val      0.78         Hydrophobic, aliphatic   Ile      0.22         Hydrophobic, aliphatic
127       Ala      0.77         Hydrophobic, aliphatic   Val      0.22         Hydrophobic, aliphatic
159       Asn      0.60         Polar, uncharged         Asp      0.20         Acidic
163       Ala      0.36         Hydrophobic, aliphatic   Arg      0.30         Basic
174       Lys      0.66         Basic                    Gln      0.23         Polar, uncharged
183       Ala      0.64         Hydrophobic, aliphatic   Ser      0.19         Polar, uncharged
190       Asp      0.67         Polar, uncharged         Glu      0.31         Acidic
221       Asp      0.77         Polar, uncharged         Glu      0.23         Acidic
227       Lys      0.78         Basic                    Arg      0.22         Basic
229       Leu      0.54         Hydrophobic, aliphatic   Val      0.31         Hydrophobic, aliphatic
272       Gln      0.49         Polar, uncharged         Ser      0.13         Polar, uncharged
276       Asp      0.67         Acidic                   Thr      0.23         Polar, uncharged
298       Ile      0.60         Hydrophobic, aliphatic   Val      0.40         Hydrophobic, aliphatic
311       Ile      0.58         Hydrophobic, aliphatic   Val      0.41         Hydrophobic, aliphatic
314       Asp      0.54         Acidic                   Glu      0.45         Acidic
319       Arg      0.30         Basic                    Gln      0.21         Polar, uncharged
327       Val      0.52         Hydrophobic, aliphatic   Ala      0.31         Hydrophobic, aliphatic
328       Ala      0.52         Hydrophobic, aliphatic   Val      0.26         Hydrophobic, aliphatic

Table B.2 Catalytic activities of purified enzymes against different substrates.
All enzymes were purified using affinity chromatography, and activity levels of purified enzymes were measured at a single substrate concentration (450 μM for all substrates except for Centa, which was measured at 90 μM) and normalized to an enzyme concentration of 1 μM. Results shown are the mean activities for three technical replicates.
± indicates standard deviation between three technical replicates
n.d. indicates activity not detected
                                   Activity (s⁻¹)
Substrate                          MPH              AncDHCH1           JsDHCH              BbDHCH
centa                              n.d.             n.d.               n.d.                n.d.
p-nitrophenyl sulfate              n.d.             n.d.               n.d.                n.d.
p-nitrophenyl phenylphosphonate    n.d.             n.d.               n.d.                n.d.
p-nitrophenyl phosphoryl choline   n.d.             n.d.               n.d.                n.d.
p-nitrophenyl butyrate             0.01 ± 0.0004    0.08 ± 0.001       0.02 ± 0.006        0.02 ± 0.0005
p-nitrophenyl acetate              0.02 ± 0.0002    4.5 ± 0.3          0.02 ± 0.001        0.4 ± 0.02
bis-(p-nitrophenyl)-phosphate      n.d.             n.d.               n.d.                n.d.
p-nitrophenyl phosphate            n.d.             n.d.               n.d.                n.d.
paraoxon-methyl                    1.2 ± 0.04       0.04 ± 0.003       0.007 ± 0.0003      0.001 ± 0.0001
paraoxon-ethyl                     0.06 ± 0.003     0.01 ± 0.0002      0.005 ± 0.0003      0.001 ± 0.00003
parathion-methyl                   0.6 ± 0.05       0.02 ± 0.001       0.0003 ± 0.00007    n.d.
parathion-ethyl                    0.09 ± 0.004     0.0007 ± 0.0002    n.d.                n.d.
dihydrocoumarin                    0.9 ± 0.02       170 ± 8            120 ± 3             80 ± 0.5
3-oxo-c8-hsl                       n.d.             n.d.               n.d.                n.d.
γ-nonalactone                      n.d.             n.d.               n.d.                n.d.
γ-valerolactone                    n.d.             n.d.               n.d.                n.d.
γ-caprolactone                     n.d.             n.d.               n.d.                n.d.
δ-decalactone                      n.d.             n.d.               n.d.                n.d.

Table B.3 Kinetic parameters of enzymes used for this study for selected substrates.
The average of two technical replicates was used to fit to the Michaelis-Menten equation. For enzymes where no saturation kinetics were observed, the data were fitted to pseudo-first-order kinetics, in which the slope directly corresponds to kcat/KM.
± indicates the error in the fit of the data to the Michaelis-Menten equation
- indicates detectable activity that is too low to be accurately determined
n.d. indicates activity not detected

Parathion-methyl
Enzyme        kcat [s⁻¹]         KM [μM]          kcat/KM [s⁻¹M⁻¹]
MPH           17 ± 1             880 ± 80         1.9 × 10⁴
MPH-m5        0.03 ± 0.002       1500 ± 200       2.1 × 10¹
JsDHCH        -                  -                2.3 × 10⁻¹
BbDHCH        0.002 ± 0.0001     2900 ± 400       5.4 × 10⁻¹
SmDHCH        n.d.               n.d.             n.d.
AncDHCH1      0.03 ± 0.002       1100 ± 200       2.7 × 10¹
AncDHCH1+m5   19 ± 0.7           2900 ± 200       6.4 × 10³
AncDHCH2      0.02 ± 0.0008      990 ± 120        1.7 × 10¹
AncDHCH3      -                  -                1.8 × 10⁻¹

Parathion-ethyl
Enzyme        kcat [s⁻¹]         KM [μM]          kcat/KM [s⁻¹M⁻¹]
MPH           1.6 ± 0.1          1700 ± 200       9.4 × 10²
MPH-m5        0.001 ± 0.0001     400 ± 80         2.5 × 10⁰
JsDHCH        -                  -                4.9 × 10⁻²
BbDHCH        -                  -                -
SmDHCH        n.d.               n.d.             n.d.
AncDHCH1      0.002 ± 0.0006     1900 ± 500       1.1 × 10⁰
AncDHCH1+m5   0.7 ± 0.03         1900 ± 200       3.7 × 10²
AncDHCH2      0.007 ± 0.0008     1100 ± 300       6.6 × 10⁰
AncDHCH3      -                  -                -

Paraoxon-methyl
Enzyme        kcat [s⁻¹]         KM [μM]          kcat/KM [s⁻¹M⁻¹]
MPH           20 ± 1             2600 ± 200       7.6 × 10³
MPH-m5        -                  -                5.0 × 10¹
JsDHCH        0.26 ± 0.03        15000 ± 3000     1.7 × 10¹
BbDHCH        -                  -                1.1 × 10⁰
SmDHCH        -                  -                1.5 × 10⁻¹
AncDHCH1      0.4 ± 0.02         8700 ± 600       4.9 × 10¹
AncDHCH1+m5   43 ± 2             8700 ± 600       4.9 × 10³
AncDHCH2      0.14 ± 0.004       1500 ± 100       9.3 × 10¹
AncDHCH3      -                  -                5.5 × 10⁰

Paraoxon-ethyl
Enzyme        kcat [s⁻¹]         KM [μM]          kcat/KM [s⁻¹M⁻¹]
MPH           0.7 ± 0.1          3800 ± 800       1.8 × 10²
MPH-m5        0.3 ± 0.02         6800 ± 1000      3.8 × 10¹
JsDHCH        0.13 ± 0.03        13000 ± 4000     1.0 × 10¹
BbDHCH        0.08 ± 0.01        17000 ± 3000     5.0 × 10⁰
SmDHCH        n.d.               n.d.             n.d.
AncDHCH1      0.06 ± 0.001       900 ± 70         6.4 × 10¹
AncDHCH1+m5   1.5 ± 0.1          3900 ± 600       3.8 × 10²
AncDHCH2      0.05 ± 0.002       1300 ± 190       3.7 × 10¹
AncDHCH3      -                  -                2.7 × 10⁰

Dihydrocoumarin
Enzyme        kcat [s⁻¹]         KM [μM]          kcat/KM [s⁻¹M⁻¹]
MPH           2.1 ± 0.03         2000 ± 50        1.1 × 10³
MPH-m5        130 ± 10           150 ± 20         8.7 × 10⁵
JsDHCH        580 ± 10           920 ± 40         6.3 × 10⁵
BbDHCH        720 ± 40           420 ± 80         1.7 × 10⁶
SmDHCH        3600 ± 160         2400 ± 240       1.5 × 10⁶
AncDHCH1      500 ± 62           230 ± 80         2.1 × 10⁶
AncDHCH1+m5   7.8 ± 0.8          1200 ± 330       6.8 × 10³
AncDHCH2      1900 ± 300         180 ± 70         1.1 × 10⁷
AncDHCH3      1800 ± 130         220 ± 60         8.2 × 10⁶

Table B.4 Crystallography data collection and refinement statistics.
                        AncDHCH1¹
Data collection
Space group             P 21 21 21
Cell dimensions
    a, b, c (Å)         57.60 88.15 119.80
    α, β, γ (°)         90.00 90.00 90.00
Resolution (Å)          26.38-1.60 (1.65-1.60)
Rmerge                  0.025 (0.13)
I / σI                  15.32 (5.23)
Completeness (%)        96.26 (89.98)
Redundancy              1.9 (1.9)
Refinement
Resolution (Å)          26.38-1.60 (1.65-1.60)
No. reflections         78579 (7219)
Rwork / Rfree           0.173/0.204
No. atoms
    Protein             4521
    Ligand/ion          20
    Water               543
B-factors
    Protein             18.5
    Ligand/ion          34.3
    Water               25.0
R.m.s. deviations
    Bond lengths (Å)    0.017
    Bond angles (°)     1.45
X-ray data were collected from a single crystal.
¹ Values in parentheses are for highest-resolution shell.

Table B.5 Docking data of methyl-paraoxon (MPO) and methyl-parathion (MPS) in the active sites of AncDHCH1 and MPH.
Enzyme      AncDHCH1            MPH
Ligand¹,²   MPO       MPS       MPO       MPS
Pose 1      -3.789    -3.137    -4.009    -3.851
Pose 2      -3.666    -3.078    -3.731    -3.737
Pose 3                -2.978    -3.385    -3.454
Pose 4                -2.963    -3.338    -3.348
Pose 5                -2.962    -2.861    -3.276
Pose 6                -2.926    -2.581    -3.187
Pose 7                -2.817    -2.574    -3.085
Pose 8                -2.696    -2.556    -2.985
Pose 9                -2.616    -2.515    -2.892
Pose 10               -2.605    -2.459    -2.877
Pose 11               -2.595    -2.344    -2.737
Pose 12               -2.414    -2.142    -2.733
Pose 13               -2.374    -2.063    -2.7
Pose 14               -2.35               -2.647
Pose 15               -2.188              -2.62
Pose 16                                   -2.576
Pose 17                                   -2.548
Pose 18                                   -2.454
Glide scores are given in units of kcal/mol.
¹ 20 poses within the default energy window (>0.0 kcal/mol) were requested.
² Catalytically productive poses that are aligned for nucleophilic attack are highlighted in bold.

Table B.6 Site-directed mutagenesis primers.
Primer name            Template   Mutation       Sequence (5' to 3')
MPH-R72L-Fwd           MPH-wt     r72L           gcttgctcttctgtcgacaagctgctgaaccagcc
MPH-R72L-Rev           MPH-wt     r72L           gcttgctcttctgaccggcagcgccaccgtgc
MPH-271-273-Fwd        MPH-wt     none           gcttgctcttctgaccccagcgtcacgacccagctcgac
MPH-T271I-Fwd          MPH-wt     t271I          gcttgctcttctgaccccagcgtcacgatccagctcgac
MPH-L273F-Fwd          MPH-wt     l273F          gcttgctcttctgaccccagcgtcacgacccagttcgac
MPH-T271I-L273F-Fwd    MPH-wt     t271I, l273F   gcttgctcttctgaccccagcgtcacgatccagttcgac
MPH-258-Rev            MPH-wt     none           gcttgctcttctgtcgtcgaactgcaccgcggcgacgagtatc
MPH-L258H-Rev          MPH-wt     l258H          gcttgctcttctgtcgtcgaactgcaccgcggcgacgtgtatc
MPH-193S-Fwd           MPH-wt     Δ193S          gcttgctcttctaaaggcttcttcaaaggcgctatggcctc
MPH-193S-Rev           MPH-wt     Δ193S          gcttgctcttcttttctcgtcgtccggggccttgtcg
Anc1-L72R-Fwd          Anc1-wt    l72R           gcttgctcttcttcgacaagcggctgaaccagccgc
Anc1-L72R-Rev          Anc1-wt    l72R           ccttgctcttctcgaccggcagatccaccgtgc
Anc1-271-273-Fwd       Anc1-wt    none           gcttgctcttctgacgaccccagcgtcacgatccagttcgac
Anc1-I271T-Fwd         Anc1-wt    i271T          gcttgctcttctgacgaccccagcgtcacgacccagttcgac
Anc1-F273L-Fwd         Anc1-wt    f273L          gcttgctcttctgacgaccccagcgtcacgatccagctcgac
Anc1-I271T-F273L-Fwd   Anc1-wt    i271T, f273L   gcttgctcttctgacgaccccagcgtcacgacccagctcgac
Anc1-258-Rev           Anc1-wt    none           gcttgctcttctgtcgaactgcaccgcggcgacgtgtatcag
Anc1-H258L-Rev         Anc1-wt    h258L          gcttgctcttctgtcgaactgcaccgcggcgacgagtatcag
Anc1-193S-Fwd          Anc1-wt    Δ193S          gcttgctcttctaaaggcttcttccagggcgctatgg
Anc1-193S-Rev          Anc1-wt    Δ193S          gcttgctcttcttttctcgtcgtccggggccttgtc

Table B.7 Cell lysate activities of the 32 MPH variants.
Variant Parathion-methyl  Parathion-ethyl  Paraoxon-methyl  Paraoxon-ethyl Rate (µM/s/OD) Relative to -m5  Rate (µM/s/OD) Relative to -m5  Rate (µM/s/OD) Relative to -m5  Rate (µM/s/OD) Relative to -m5 WT 2.8 (± 0.2) × 100 970  5.8 (± 0.8) × 10-2 420  7.8 (± 1.1) × 10-1 200  2.0 (± 0.3) × 10-2 8.9 72 4.8 (± 0.5) × 10-2 16  2.2 (± 0.3) × 10-3 16  2.5 (± 0.2) × 10-2 6.5  6.8 (± 0.6) × 10-3 3.0 193 7.5 (± 0.6) × 10-2 26  3.3 (± 0.3) × 10-3 24  2.2 (± 0.2) × 10-2 5.8  5.8 (± 0.5) × 10-3 2.6 258 4.1 (± 0.2) × 10-1 140  8.0 (± 0.4) × 10-3 57  2.0 (± 0.1) × 100 520  3.8 (± 0.1) × 10-1 170 271 6.8 (± 0.4) × 10-1 230  1.2 (± 0.06) × 10-2 85  2.2 (± 0.1) × 10-1 58  2.8 (± 0.1) × 10-2 12 273 1.3 (± 0.2) × 10-2 4.4  5.7 (± 0.4) × 10-4 4.1  1.2 (± 0.09) × 10-2 3.2  1.8 (± 0.2) × 10-3 0.8 72/193 2.2 (± 0.3) × 10-2 7.7  1.4 (± 0.3) × 10-3 10  1.1 (± 0.2) × 10-2 2.9  4.8 (± 0.9) × 10-3 2.1 72/258 2.7 (± 0.01) × 10-3 0.9  1.8 (± 0.3) × 10-4 1.3  8.7 (± 0.4) × 10-1 230  3.0 (± 0.1) × 10-1 130 72/271 5.6 (± 1.0) × 10-2 19  2.6 (± 0.5) × 10-3 18  6.8 (± 1.2) × 10-2 18  2.5 (± 0.4) × 10-2 11 72/273 1.6 (± 0.2) × 10-2 5.5  1.3 (± 0.2) × 10-3 9.5  1.9 (± 0.2) × 10-2 4.9  7.6 (± 0.9) × 10-3 3.4 193/258 6.7 (± 0.9) × 10-3 2.3  3.5 (± 0.5) × 10-4 2.5  9.9 (± 1.3) × 10-2 26  4.2 (± 0.5) × 10-2 19 193/271 1.7 (± 0.04) × 10-2 5.7  1.3 (± 0.03) × 10-3 9.0  2.0 (± 0.1) × 10-2 5.3  6.6 (± 0.2) × 10-3 2.9 193/273 3.8 (± 0.4) × 10-3 1.3  3.1 (± 0.3) × 10-4 2.2  5.5 (± 0.3) × 10-3 1.4  1.1 (± 0.2) × 10-3 0.5 258/271 5.4 (± 0.2) × 10-1 190  1.2 (± 0.06) × 10-2 87  4.1 (± 0.2) × 10-1 110  5.7 (± 0.2) × 10-2 25 258/273 4.5 (± 0.6) × 10-3 1.5  2.5 (± 0.2) × 10-4 1.8  6.9 (± 0.6) × 10-1 180  3.1 (± 0.3) × 10-1 140 271/273 5.7 (± 0.7) × 10-2 19  2.6 (± 0.4) × 10-3 19  5.5 (± 0.7) × 10-2 14  1.2 (± 0.2) × 10-2 5.5 72/193/258 2.0 (± 0.2) × 10-3 0.7  2.0 (± 0.04) × 10-4 1.4  6.4 (± 0.5) × 10-1 170  3.4 (± 0.3) × 10-1 150 72/193/271 3.7 (± 0.7) × 10-2 13  2.6 (± 0.5) × 10-3 19  5.1 (± 1.0) × 10-2 13  3.4 (± 
0.6) × 10-2 15 72/193/273 1.2 (± 0.1) × 10-2 4.2  2.1 (± 0.3) × 10-3 15  1.4 (± 0.2) × 10-2 3.7  1.1 (± 0.1) × 10-2 4.8 72/258/271 2.9 (± 0.4) × 10-2 9.9  1.1 (± 0.2) × 10-3 8.1  5.5 (± 0.8) × 10-2 14  1.6 (± 0.3) × 10-2 7 72/258/273 1.4 (± 0.1) × 10-3 0.5  9.3 (± 0.9) × 10-5 0.7  5.9 (± 0.3) × 10-1 150  3.6 (± 0.1) × 10-1 160 72/271/273 4.4 (± 0.9) × 10-2 15  2.4 (± 0.4) × 10-3 17  3.6 (± 0.7) × 10-2 9.5  1.6 (± 0.3) × 10-2 7.1 193/258/271 1.8 (± 0.4) × 10-2 6.3  9.2 (± 1.8) × 10-4 6.5  2.3 (± 0.4) × 10-2 6.0  7.8 (± 1.5) × 10-3 3.5 193/258/273 1.7 (± 0.3) × 10-3 0.6  1.5 (± 0.2) × 10-4 1.0  2.0 (± 0.3) × 10-1 53  1.2 (± 0.2) × 10-1 53 193/271/273 4.0 (± 0.2) × 10-2 14  2.1 (± 0.1) × 10-3 15  3.8 (± 0.3) × 10-2 9.9  1.2 (± 0.1) × 10-2 5.1 258/271/273 5.4 (± 0.7) × 10-2 19  2.3 (± 0.3) × 10-3 17  1.3 (± 0.2) × 10-1 34  4.0 (± 0.5) × 10-2 18 72/193/258/271 1.8 (± 0.2) × 10-2 6.1  8.8 (± 1.0) × 10-4 6.3  3.7 (± 0.4) × 10-2 9.7  1.8 (± 0.2) × 10-2 8.1 72/193/258/273 6.8 (± 1.3) × 10-4 0.2  1.2 (± 0.4) × 10-4 0.8  4.4 (± 1.1) × 10-1 110  3.9 (± 1.0) × 10-1 170 72/193/271/273 3.7 (± 0.5) × 10-2 13  2.9 (± 0.3) × 10-3 21  2.7 (± 0.3) × 10-2 7.0  1.8 (± 0.1) × 10-2 8.1 72/258/271/273 4.9 (± 1.2) × 10-3 1.7  1.8 (± 0.4) × 10-4 1.3  7.0 (± 1.5) × 10-3 1.8  2.5 (± 0.5) × 10-3 1.1 193/258/271/273 2.3 (± 0.5) × 10-2 7.8  1.1 (± 0.2) × 10-3 7.5  6.5 (± 1.4) × 10-2 17.0  2.2 (± 0.5) × 10-2 9.8 M5 2.9 (± 0.3) × 10-3 1.0  1.4 (± 0.2) × 10-4 1.0  3.8 (± 0.4) × 10-3 1.0  2.2 (± 0.3) × 10-3 1.0 Results shown are the mean activities for three biological replicates ± indicates standard deviation between three biological replicates  All variants were measured using the same amount of lysate (20 μL in 100 μL reaction) and the same substrate concentration (400 μM)  177 Table B.8 Relative activities of the 32 MPH variants towards methyl-parathion predicted from linear regression models using effects up to the 2nd order and effects up to the 5th order. 
Variant 2nd Order Effects  5th Order Effects Relative to -m5  Relative to -m5 WT 110  917 72 9  13 193 6  25 258 27  120 271 60  183 273 3.0  3.5 72/193 2.4  7 72/258 0.7  0.8 72/271 8  17 72/273 1.0  5.7 193/258 1.3  2.2 193/271 3.8  5.1 193/273 0.7  1.3 258/271 56  160 258/273 0.6  1.5 271/273 7  16 72/193/258 0.2  0.7 72/193/271 4  16 72/193/273 1.1  4 72/258/271 3.9  17.3 72/258/273 0.1  0.4 72/271/273 4  16 193/258/271 3.2  5.1 193/258/273 0.1  0.5 193/271/273 2  13 258/271/273 5  14 72/193/258/271 0.6  4.7 72/193/258/273 0.1  0.2 72/193/271/273 5  9 72/258/271/273 0.8  1.9 193/258/271/273 1.3  8.5 M5 1.0  1.0      178 Appendix C Supplementary material for Chapter 4  Table C.1 MPH sequences deposited in the GenBank database. GenBank accession Query Cover E value % identity Organismal source 1P9E_A 100% 0 100.00% Pseudomonas sp. WBC-3 WP_080708177.1 100% 0 100.00% Pseudomonas AAY99780.1 100% 0 99.70% Insertion vector pWSMK-T AAK40367.1 100% 0 99.70% Pseudomonas putida ABD92793.1 100% 0 99.70% Stenotrophomonas sp. Dsp-4 AAT84091.1 100% 0 99.70% Ochrobactrum sp. mp-4 KAK42483.1 100% 0 99.40% Caballeronia jiangsuensis WP_081851938.1 100% 0 99.09% Caballeronia zhejiangensis AAT84090.1 100% 0 99.09% Ochrobactrum sp. mp-3 AET95411.1 100% 0 99.09% Burkholderia sp. YI23 ABI15199.1 100% 0 98.79% Ochrobactrum sp. Yw18 AAY18224.1 100% 0 98.79% Burkholderia cepacia ABA02342.1 100% 0 98.79% Burkholderia sp. FDS-1 WP_081987209.1 100% 0 98.79% Caballeronia zhejiangensis AEU11362.1 98% 0 100.00% Stenotrophomonas maltophilia AEP44044.1 100% 0 99.09% Sphingopyxis sp. D-8 AAK14390.1 100% 0 98.49% Plesiomonas sp. M6 AAT84089.1 100% 0 98.49% Achromobacter sp. mp-2 ABI15198.1 100% 0 98.19% Ochrobactrum sp. Yw15 ADV17371.1 100% 0 97.89% uncultured bacterium ABI15200.1 100% 0 97.89% Ochrobactrum sp. Yw28 ACC63894.1 100% 0 98.19% Ochrobactrum sp. M231 ABD92795.1 100% 0 93.14% Sphingomonas sp. Dsp-2 AFN20642.1 100% 0 97.28% Cupriavidus sp. 
DT-1 KIL03406.1 95% 0 99.68% Pseudomonas stutzeri AXA20002.1 89% 0 100.00% synthetic construct AAT67170.1 97% 0 91.67% Burkholderia sp. FDS-1 ALM58467.1 89% 0 98.99% Caballeronia jiangsuensis AST48371.1 89% 0 97.97% Ochrobactrum sp. M231   179 Table C.2 Genomic information of DHCH orthologs.  Genome ID Start Stop Size (a.a.) Strand Function Pseudomonas sp.  (strain WBC-3) Q4QYG0 653 918 263 - DDE_Tnp_IS240 Pseudomonas sp.  (strain WBC-3) Q841S6 1017 1348 330 + Lactamase_B Pseudomonas sp.  (strain WBC-3) Q841S5 1358 1783 423 - Sigma70_r2; Sigma70_r4_2 Pseudomonas sp.  (strain WBC-3) Q841S4 1807 2062 253 - DDE_Tnp_IS240 Burkholderiales bacterium JOSHI_001 H5WTB6 30837 31130 291 + GGDEF Burkholderiales bacterium JOSHI_001 H5WTB7 31133 31435 300 - UbiA Burkholderiales bacterium JOSHI_001 H5WTB8 31476 31900 422 + Glyphos_transf Burkholderiales bacterium JOSHI_001 H5WTB9 31908 32125 215 - Methyltransf_25 Burkholderiales bacterium JOSHI_001 H5WTC0 32126 32373 246 - Asp_Glu_race Burkholderiales bacterium JOSHI_001 H5WTC1 32373 32608 233 - Methyltransf_25 Burkholderiales bacterium JOSHI_001 H5WTC2 32615 33028 412 - DUF711 Burkholderiales bacterium JOSHI_001 H5WTC3 33027 33255 226 - NTP_transf_3 Burkholderiales bacterium JOSHI_001 H5WTC4 33254 33386 130 - ParBc Burkholderiales bacterium JOSHI_001 H5WTC5 33389 33655 264 - NTP_transf_3 Burkholderiales bacterium JOSHI_001 H5WTC6 33743 34052 308 + Lactamase_B Burkholderiales bacterium JOSHI_001 H5WTC7 34056 34749 692 - DEAD; Helicase_C; RecG_wedge Burkholderiales bacterium JOSHI_001 H5WTC8 34768 35113 344 + Queuosine_synth Burkholderiales bacterium JOSHI_001 H5WTC9 35112 35779 665 + HisKA; HATPase_c Burkholderiales bacterium JOSHI_001 H5WTD0 35782 36010 226 - Response_reg; GerE Burkholderiales bacterium JOSHI_001 H5WTD1 36076 36933 855 + none Burkholderiales bacterium JOSHI_001 H5WTD2 36951 37328 375 + none Burkholderiales bacterium JOSHI_001 H5WTD3 37392 37518 124 + Cytochrom_C Burkholderiales bacterium JOSHI_001 H5WTD4 37537 
37941 402 + GSDH Burkholderiales bacterium JOSHI_001 H5WTD5 37945 38321 374 + TGT Burkholderiales bacterium JOSHI_001 H5WTD6 38323 38500 176 + none Janthinobacterium sp. HH01 L9P8Y7 62382 62620 236 + UreF Janthinobacterium sp. HH01 L9P8W2 62624 62839 213 + cobW Janthinobacterium sp. HH01 L9P949 62838 63023 183 + HupE_UreJ Janthinobacterium sp. HH01 L9P9A5 63055 63261 205 - Acyltransferase Janthinobacterium sp. HH01 L9PBC4 63320 63778 456 + Sigma54_activat; HTH_8 Janthinobacterium sp. HH01 L9P8Z2 63862 64435 571 + none Janthinobacterium sp. HH01 L9P8W6 64454 64587 132 + none Janthinobacterium sp. HH01 L9P955 64586 64811 223 + Peptidase_C39 Janthinobacterium sp. HH01 L9P9A9 64815 64979 162 + none Janthinobacterium sp. HH01 L9PBC8 64986 65340 352 + none Janthinobacterium sp. HH01 L9P8Z6 65364 65682 317 + Lactamase_B  180 Janthinobacterium sp. HH01 L9P8X0 65686 66037 350 - Oxidored_FMN Janthinobacterium sp. HH01 L9P960 66041 66140 97 - HTH_20 Janthinobacterium sp. HH01 L9P9B4 66182 66535 351 + none Janthinobacterium sp. HH01 L9PBD1 66575 66693 117 - DUF1304 Janthinobacterium sp. HH01 L9P901 66732 66838 104 + Multi_Drug_Res Janthinobacterium sp. HH01 L9P8X5 66838 67158 318 - AstE_AspA Janthinobacterium sp. HH01 L9P965 67323 67738 413 + HlyD_3 Janthinobacterium sp. HH01 L9P9B7 67743 68451 707 + ABC_tran; ABC_membrane; Peptidase_C39 Janthinobacterium sp. HH01 L9PBD6 68464 68525 59 - none Janthinobacterium sp. 
HH01 L9P905 68529 68590 59 - none Paraburkholderia dipogonis A0A4Y8MTG7 687709 688014 304 - HTH_18 Paraburkholderia dipogonis A0A4Y8MY64 688196 688470 272 - Abhydrolase_1 Paraburkholderia dipogonis A0A4Y8MTL7 688654 688866 210 + Isochorismatase Paraburkholderia dipogonis A0A4Y8MTW9 688945 689568 621 + Amidohydro_3 Paraburkholderia dipogonis A0A4Y8MTJ1 689581 690071 488 + cNMP_binding; MS_channel Paraburkholderia dipogonis A0A4Y8MTS7 690118 690332 212 + HD Paraburkholderia dipogonis A0A4Y8MT96 690346 690430 82 + DUF1427 Paraburkholderia dipogonis A0A4Y8MTE3 690446 690763 315 + none Paraburkholderia dipogonis A0A4Y8MTP0 690767 691276 508 + Alginate_exp Paraburkholderia dipogonis A0A4Y8MTL5 691295 691871 575 + cNMP_binding; Pyr_redox_2 Paraburkholderia dipogonis A0A4Y8MT85 691890 692227 335 + Lactamase_B Paraburkholderia dipogonis A0A4Y8MTH7 692243 692415 170 + DUF3331 Paraburkholderia dipogonis A0A4Y8MTN0 692476 693487 1009 + Trans_reg_C Paraburkholderia dipogonis A0A4Y8MTX9 693490 693797 305 - HTH_1; LysR_substrate Paraburkholderia dipogonis A0A4Y8MTJ9 693871 694109 236 + tRNA_SAD Paraburkholderia dipogonis A0A4Y8MTT7 694116 694401 283 + AraC_binding; HTH_18 Paraburkholderia dipogonis A0A4Y8MTA5 694409 694542 131 - none Paraburkholderia dipogonis A0A4Y8MTF3 694546 694769 221 - RraA; like Paraburkholderia dipogonis A0A4Y8MTQ0 695150 695592 440 - MFS_1 Paraburkholderia dipogonis A0A4Y8MTM5 695634 695933 298 - HTH_1; LysR_substrate Paraburkholderia sp. BL6669N2 A0A3E0C5T1 806665 807117 451 + Sugar_tr Paraburkholderia sp. BL6669N2 A0A3E0C4V6 807133 807538 404 + Acyl-CoA_dh_N; Acyl-CoA_dh_2 Paraburkholderia sp. BL6669N2 A0A3E0C4W8 807546 807731 183 + Flavin_Reduct Paraburkholderia sp. BL6669N2 A0A3E0C6D1 807732 807962 228 + none Paraburkholderia sp. BL6669N2 A0A3E0C4S3 807972 808288 315 + HTH_1; LysR_substrate Paraburkholderia sp. BL6669N2 A0A3E0C4T5 808392 808677 283 - AraC_binding; HTH_18 Paraburkholderia sp. 
BL6669N2 A0A3E0C6U2 808683 808921 236 - tRNA_SAD
Paraburkholderia sp. BL6669N2 | A0A3E0C4T1 | 808995 | 809302 | 305 | + | HTH_1; LysR_substrate
Paraburkholderia sp. BL6669N2 | A0A3E0C509 | 809305 | 810331 | 1024 | - | Trans_reg_C
Paraburkholderia sp. BL6669N2 | A0A3E0C5P6 | 810377 | 810550 | 171 | - | DUF3331
Paraburkholderia sp. BL6669N2 | A0A3E0C5U3 | 810566 | 810909 | 342 | - | Lactamase_B
Paraburkholderia sp. BL6669N2 | A0A3E0C4W7 | 810921 | 811498 | 575 | - | cNMP_binding; Pyr_redox_2
Paraburkholderia sp. BL6669N2 | A0A3E0C4Y0 | 811516 | 812035 | 517 | - | Alginate_exp
Paraburkholderia sp. BL6669N2 | A0A3E0C6E7 | 812032 | 812348 | 315 | - | none
Paraburkholderia sp. BL6669N2 | A0A3E0C4T4 | 812365 | 812448 | 82 | - | DUF1427
Paraburkholderia sp. BL6669N2 | A0A3E0C4U4 | 812462 | 812676 | 212 | - | none
Paraburkholderia sp. BL6669N2 | A0A3E0C6V6 | 812723 | 813213 | 488 | - | cNMP_binding; MS_channel
Paraburkholderia sp. BL6669N2 | A0A3E0C4U1 | 813226 | 813853 | 625 | - | Amidohydro_3
Paraburkholderia sp. BL6669N2 | A0A3E0C520 | 813928 | 814140 | 210 | - | Isochorismatase
Paraburkholderia sp. BL6669N2 | A0A3E0C5Q6 | 814268 | 814541 | 272 | + | Abhydrolase_1
Paraburkholderia sp. BL6669N2 | A0A3E0C5V2 | 814687 | 814868 | 179 | - | Acetyltransf_3
Burkholderia sp. (strain CCGE1003) | E1T8Q9 | 681178 | 681428 | 248 | - | none
Burkholderia sp. (strain CCGE1003) | E1T957 | 681500 | 681676 | 174 | - | none
Burkholderia sp. (strain CCGE1003) | E1T958 | 681701 | 681925 | 223 | - | GerE
Burkholderia sp. (strain CCGE1003) | E1T959 | 682487 | 682552 | 63 | + | none
Burkholderia sp. (strain CCGE1003) | E1T960 | 682582 | 683047 | 463 | - | MFS_1
Burkholderia sp. (strain CCGE1003) | E1T961 | 683181 | 683499 | 316 | + | 2-Hacid_dh; 2-Hacid_dh_C
Burkholderia sp. (strain CCGE1003) | E1T962 | 683532 | 683839 | 306 | + | HTH_1; LysR_substrate
Burkholderia sp. (strain CCGE1003) | E1T963 | 683900 | 684210 | 309 | + | HTH_1; LysR_substrate
Burkholderia sp. (strain CCGE1003) | E1T964 | 684263 | 685264 | 999 | - | Trans_reg_C
Burkholderia sp. (strain CCGE1003) | E1T965 | 685302 | 685485 | 182 | - | DUF3331
Burkholderia sp. (strain CCGE1003) | E1T966 | 685509 | 685839 | 328 | - | Lactamase_B
Burkholderia sp. (strain CCGE1003) | E1T967 | 685843 | 686419 | 574 | - | cNMP_binding; Pyr_redox_2
Burkholderia sp. (strain CCGE1003) | E1T968 | 686472 | 686996 | 522 | - | Alginate_exp
Burkholderia sp. (strain CCGE1003) | E1T969 | 686993 | 687309 | 315 | - | none
Burkholderia sp. (strain CCGE1003) | E1T970 | 687377 | 687461 | 82 | - | DUF1427
Burkholderia sp. (strain CCGE1003) | E1T971 | 687470 | 687683 | 212 | - | none
Burkholderia sp. (strain CCGE1003) | E1T972 | 687778 | 688267 | 487 | - | cNMP_binding; MS_channel
Burkholderia sp. (strain CCGE1003) | E1T973 | 688287 | 688910 | 622 | - | Amidohydro_3
Burkholderia sp. (strain CCGE1003) | E1T974 | 689024 | 689202 | 176 | + | Ferritin
Burkholderia sp. (strain CCGE1003) | E1T975 | 689328 | 689601 | 272 | + | Abhydrolase_1
Burkholderia sp. (strain CCGE1003) | E1T976 | 689604 | 689646 | 41 | + | none
Burkholderia sp. Ch1-1 | I2IR07 | 333084 | 333707 | 621 | + | Amidohydro_3
Burkholderia sp. Ch1-1 | I2IR08 | 333720 | 334211 | 489 | + | cNMP_binding; MS_channel
Burkholderia sp. Ch1-1 | I2IR09 | 334274 | 334488 | 212 | + | HD
Burkholderia sp. Ch1-1 | I2IR10 | 334498 | 334578 | 79 | + | DUF1427
Burkholderia sp. Ch1-1 | I2IR11 | 334583 | 334922 | 338 | + | none
Burkholderia sp. Ch1-1 | I2IR12 | 334919 | 335428 | 507 | + | Alginate_exp
Burkholderia sp. Ch1-1 | I2IR13 | 335452 | 336026 | 572 | + | cNMP_binding; Pyr_redox_2
Burkholderia sp. Ch1-1 | I2IR14 | 336074 | 336347 | 272 | + | Abhydrolase_1
Burkholderia sp. Ch1-1 | I2IR15 | 336344 | 336961 | 615 | - | Trans_reg_C
Burkholderia sp. Ch1-1 | I2IR16 | 337043 | 337348 | 303 | - | HTH_18
Burkholderia sp. Ch1-1 | I2IR17 | 337442 | 337773 | 329 | + | Lactamase_B
Burkholderia sp. Ch1-1 | I2IR18 | 337843 | 338007 | 162 | + | DUF3331
Burkholderia sp. Ch1-1 | I2IR19 | 338111 | 339113 | 1000 | + | Trans_reg_C
Burkholderia sp. Ch1-1 | I2IR20 | 339116 | 339422 | 305 | - | HTH_1; LysR_substrate
Burkholderia sp. Ch1-1 | I2IR21 | 339525 | 340099 | 572 | + | TPP_enzyme_M; TPP_enzyme_C; TPP_enzyme_N
Burkholderia sp. Ch1-1 | I2IR22 | 340205 | 340931 | 724 | + | FUSC
Burkholderia sp. Ch1-1 | I2IR23 | 340930 | 341432 | 500 | + | OEP
Burkholderia sp. Ch1-1 | I2IR24 | 341426 | 341493 | 65 | + | DUF1656
Burkholderia sp. Ch1-1 | I2IR25 | 341514 | 341834 | 318 | + | HlyD_3; Biotin_lipoyl_2
Burkholderia sp. Ch1-1 | I2IR26 | 341906 | 342052 | 144 | + | PRC
Burkholderia sp. Ch1-1 | I2IR27 | 342143 | 342203 | 58 | - | none
Paraburkholderia bryophila | A0A329BL34 | 19150 | 19784 | 632 | + | K_trans
Paraburkholderia bryophila | A0A329BK93 | 19813 | 20050 | 235 | - | none
Paraburkholderia bryophila | A0A329BJZ8 | 20223 | 20498 | 273 | + | Abhydrolase_1
Paraburkholderia bryophila | A0A329BL65 | 20546 | 20868 | 320 | - | HlyD_3; Biotin_lipoyl_2
Paraburkholderia bryophila | A0A329BMM2 | 20887 | 20953 | 65 | - | DUF1656
Paraburkholderia bryophila | A0A329BL09 | 20948 | 21448 | 498 | - | OEP
Paraburkholderia bryophila | A0A329BIX0 | 21447 | 22172 | 724 | - | FUSC
Paraburkholderia bryophila | A0A329BTZ7 | 22246 | 22564 | 317 | + | HTH_1; LysR_substrate
Paraburkholderia bryophila | A0A329BUC7 | 22568 | 23575 | 1006 | - | Trans_reg_C
Paraburkholderia bryophila | A0A329BIX3 | 23666 | 23846 | 178 | - | DUF3331
Paraburkholderia bryophila | A0A329BL42 | 23861 | 24191 | 329 | - | Lactamase_B
Paraburkholderia bryophila | A0A329BKA3 | 24276 | 24421 | 144 | + | FAD_binding_3
Paraburkholderia bryophila | A0A329BK09 | 24512 | 24817 | 303 | + | HTH_18
Paraburkholderia bryophila | A0A329BL74 | 24898 | 25897 | 997 | + | Trans_reg_C
Paraburkholderia bryophila | A0A329BMN2 | 25894 | 26485 | 590 | - | cNMP_binding; Pyr_redox_2
Paraburkholderia bryophila | A0A329BL19 | 26509 | 27022 | 511 | - | Alginate_exp
Paraburkholderia bryophila | A0A329BIX9 | 27018 | 27334 | 314 | - | none
Paraburkholderia bryophila | A0A329BU06 | 27372 | 27462 | 89 | - | DUF1427
Paraburkholderia bryophila | A0A329BUD8 | 27471 | 27685 | 212 | - | none
Paraburkholderia bryophila | A0A329BIY3 | 27759 | 28248 | 488 | - | cNMP_binding; MS_channel

The MPH/DHCH orthologs are bolded.

Table C.3 Primers utilized to generate +m5 variations of ancestral sequences.
Primer Name | Template | Mutation | Sequence (5’ to 3’)
Node1_72_F | AncDHCH1 | l72R | GCTTGCTCTTCTTCGACAAGCGGCTGAACCAGCCGC
Node1_72_R | AncDHCH1 | none | GCTTGCTCTTCTCGACCGGCAGATCCACCGTGC
Node1_271_273_WT_F | AncDHCH1 | none | GCTTGCTCTTCTGACGACCCCAGCGTCACGATCCAGTTCGAC
Node1_271F | AncDHCH1 | i271T | GCTTGCTCTTCTGACGACCCCAGCGTCACGACCCAGTTCGAC
Node1_273F | AncDHCH1 | f273L | GCTTGCTCTTCTGACGACCCCAGCGTCACGATCCAGCTCGAC
Node1_271_273_F | AncDHCH1 | i271T, f273L | GCTTGCTCTTCTGACGACCCCAGCGTCACGACCCAGCTCGAC
Node1_258_WT_R | AncDHCH1 | none | GCTTGCTCTTCTGTCGAACTGCACCGCGGCGACGTGTATCAG
Node1_258_R | AncDHCH1 | h258L | GCTTGCTCTTCTGTCGAACTGCACCGCGGCGACGAGTATCAG
Node2_72_F | AncDHCH2 | l72R | GCTTGCTCTTCTCGACAAGCGGCTGACCAACACCAGC
Node2_72_R | AncDHCH2 | none | GCTTGCTCTTCTTCGACCGGCAGATCCACCGTGC
Node2_271_273_WT_F | AncDHCH2 | none | GCTTGCTCTTCTGGACCCCAGCGTCACGATCCAGTTCGAC
Node2_271F | AncDHCH2 | i271T | GCTTGCTCTTCTGGACCCCAGCGTCACGACCCAGTTCGAC
Node2_273F | AncDHCH2 | f273L | GCTTGCTCTTCTGGACCCCAGCGTCACGATCCAGCTCGAC
Node2_271_273_F | AncDHCH2 | i271T, f273L | GCTTGCTCTTCTGGACCCCAGCGTCACGACCCAGCTCGAC
Node2_258_WT_R | AncDHCH2 | none | GCTTGCTCTTCTTCCGGGAACTGCACCGCGGCGACGTGTATC
Node2_258_R | AncDHCH2 | h258L | GCTTGCTCTTCTTCCGGGAACTGCACCGCGGCGACGAGTATC
Node3_72_F | AncDHCH3 | l72R | GCTTGCTCTTCTGACAAGCGGCTGACCAACACCAGCAC
Node3_72_R | AncDHCH3 | none | GCTTGCTCTTCTGTCGACCGGCAGATCCACCGTG
Node3_271_273_WT_F | AncDHCH3 | none | GCTTGCTCTTCTTCCCGGACCCCAGCGTCACGATCGAATTCGAC
Node3_271F | AncDHCH3 | i271T | GCTTGCTCTTCTTCCCGGACCCCAGCGTCACGACCGAATTCGAC
Node3_273F | AncDHCH3 | f273L | GCTTGCTCTTCTTCCCGGACCCCAGCGTCACGATCGAACTCGAC
Node3_271_273_F | AncDHCH3 | i271T, f273L | GCTTGCTCTTCTTCCCGGACCCCAGCGTCACGACCGAACTCGAC
Node3_258_WT_R | AncDHCH3 | none | GCTTGCTCTTCTGGAACTGCACCGCGGCGACGTGTATCAGGTC
Node3_258_R | AncDHCH3 | h258L | GCTTGCTCTTCTGGAACTGCACCGCGGCGACGAGTATCAGGTC
Node123_193del_F¹ | AncDHCH1-3 | Δ193 | GCTTGCTCTTCTAAAGGCTTCTTCCAGGGCGCTATGG
Node1_193del_R | AncDHCH1 | Δ193 | GCTTGCTCTTCTTTTCTCGTCGTCCGGGGCCTTGTC
Node23_193del_R | AncDHCH2-3 | Δ193 | GCTTGCTCTTCTTTTCTCGTCTTCCGGGGCCTTCGC

¹ Ancestral sequences were synthesized with the Ser193 insertion present, which was subsequently removed to generate the wild-type sequences.

Figure C.1 Full SDS-PAGE analysis showing soluble and insoluble fractions of the enzymes characterized in this study. The band corresponding to the MBP-tagged enzymes (approximately 75 kDa) is indicated by dashed red circles. For each enzyme, both the wild-type sequence (wt, left) and the sequence carrying the five MPH mutations (+m5, right) are shown, with the soluble lysate and insoluble pellet fractions for each variant run side by side. EV refers to “empty vector” (i.e., cells transformed with a pET vector that does not contain an enzyme sequence under the T7 promoter).

[Gel image not reproduced. Lane labels: AncDHCH1, AncDHCH2, AncDHCH3, MPH, AncDHCH5, AncDHCH4, BbDHCH, JsDHCH, SmDHCH, PvDHCH, OsDHCH, EV. Each enzyme is shown as wt and +m5, except MPH, which is shown as wt and -m5.]

Table C.4 Kinetic parameters of the enzymes used in this study.

Enzyme | DHCH kcat (s⁻¹) | DHCH KM (µM) | DHCH kcat/KM (s⁻¹ M⁻¹) | OPH kcat (s⁻¹) | OPH KM (µM) | OPH kcat/KM (s⁻¹ M⁻¹)
MPH | 2.3 ± 0.1 | 230 ± 46 | 1.0 × 10⁴ | 17 ± 0.4 | 790 ± 56 | 2.1 × 10⁴
MPH-m5 | 410 ± 9 | 210 ± 15 | 1.9 × 10⁶ | 0.03 ± 0.001 | 1200 ± 180 | 2.1 × 10¹
AncDHCH1 | 980 ± 32 | 180 ± 19 | 5.5 × 10⁶ | 0.02 ± 0.001 | 1300 ± 200 | 1.2 × 10¹
AncDHCH1+m5 | 22 ± 1.9 | 1300 ± 210 | 1.7 × 10⁴ | 17 ± 0.8 | 1100 ± 160 | 1.5 × 10⁴
AncDHCH2 | 1600 ± 70 | 180 ± 26 | 8.9 × 10⁶ | 0.01 ± 0.0005 | 1000 ± 130 | 1.2 × 10¹
AncDHCH2+m5 | 46 ± 4.1 | 900 ± 160 | 5.1 × 10⁴ | 2.2 ± 0.1 | 1400 ± 230 | 1.6 × 10³
AncDHCH3 | 4200 ± 300 | 440 ± 77 | 9.6 × 10⁶ | | | 2.6 × 10⁻¹
AncDHCH3+m5 | 3.3 ± 0.4 | 560 ± 210 | 5.8 × 10³ | 0.007 ± 0.0005 | 1700 ± 370 | 4.3 × 10⁰
AncDHCH4 | 1800 ± 100 | 750 ± 90 | 2.4 × 10⁶ | 0.002 ± 0.0001 | 2800 ± 410 | 8.7 × 10⁻¹
AncDHCH4+m5 | 7.1 ± 0.4 | 650 ± 100 | 1.1 × 10⁴ | 0.1 ± 0.01 | 1500 ± 230 | 7.4 × 10¹
AncDHCH5 | 980 ± 78 | 52 ± 14 | 1.9 × 10⁷ | 0.03 ± 0.002 | 3000 ± 430 | 9.4 × 10⁰
AncDHCH5+m5 | 1.5 ± 0.06 | 100 ± 14 | 1.5 × 10⁴ | 0.01 ± 0.001 | 1800 ± 490 | 5.9 × 10⁰
JsDHCH | 420 ± 26 | 930 ± 150 | 4.5 × 10⁵ | | | 4.0 × 10⁻¹
JsDHCH+m5 | | | n.d. | 0.002 ± 0.0002 | 3100 ± 800 | 5.2 × 10⁻¹
BbDHCH | 320 ± 17 | 380 ± 51 | 8.5 × 10⁵ | 0.001 ± 0.0002 | 3100 ± 970 | 4.1 × 10⁻¹
BbDHCH+m5 | | | n.d. | | | n.d.
PvDHCH | 480 ± 33 | 170 ± 41 | 2.8 × 10⁶ | 0.003 ± 0.0002 | 2200 ± 230 | 1.6 × 10⁰
PvDHCH+m5 | 1.5 ± 0.2 | 830 ± 270 | 1.8 × 10³ | 0.008 ± 0.0003 | 940 ± 110 | 8.6 × 10⁰
SmDHCH¹ | 2000 ± 130 | 840 ± 110 | 2.4 × 10⁶ | | | 9.3 × 10⁻²
SmDHCH+m5¹ | 2.9 ± 0.09 | 150 ± 15 | 1.9 × 10⁴ | | | 5.3 × 10⁻²
OsDHCH¹ | 6900 ± 410 | 850 ± 100 | 8.1 × 10⁶ | 0.001 ± 0.0001 | 7800 ± 1300 | 1.7 × 10⁻¹
OsDHCH+m5¹ | 0.7 ± 0.04 | 1800 ± 200 | 3.7 × 10² | | | 1.1 × 10⁻¹

The average activities of two replicates were fitted to the Michaelis-Menten equation. For enzymes where no saturation kinetics were observed, the data were fitted to pseudo-first-order kinetics, in which the slope corresponds directly to kcat/KM; only kcat/KM is reported in those cases. The reported errors are the errors of the fit to the curve. n.d., not determined because the activity was too low to be plotted.
¹ The methyl-parathion activity of SmDHCH and OsDHCH was too low to be measured. Methyl-paraoxon activity is presented instead for OPH activity.

Table C.5 Mutations of all variants sequenced for directed evolution.

Enzyme | Isolate | Round 1 mutations | Round 2 mutations | Round 3 mutations
AncDHCH1 | 1 | l72R, n166T, v262I | f273L | t113I
AncDHCH1 | 2 | s245P | q81R, a301T | i226V, s277P
AncDHCH1 | 3 | a167T, v243A | | v242M
BbDHCH | 1 | v197A, i232S | m157L, l185R | l72Q
BbDHCH | 2 | t184P | t113P, q248R | a206V, s239T
BbDHCH | 3 | f273Y | s239A |
JsDHCH | 1 | t272S | d186N, i281T | s304P
JsDHCH | 2 | f273S | k125R |
JsDHCH | 3 | q260R | |
OsDHCH | 1 | l72R, e86K, n129T, k158Q, e290A, k293T | a69V | e86K
OsDHCH | 2 | l72R, v323G | v239I |
OsDHCH | 3 | l72R, i208S | y232H |
PvDHCH | 1 | k90N, f273V | e70G |
PvDHCH | 2 | f273C | m69V, n90S |
PvDHCH | 3 | f47P, i65L | n98D |
SmDHCH | 1 | l52P, q79R, f273S | |
SmDHCH | 2 | l52P, q72R, a188T, f273S | |
SmDHCH | 3 | i81T, k191R, f273S | |

Bolded mutations occur at the same positions as the five key MPH mutations.
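The note to Table C.4 describes two fitting regimes: a Michaelis-Menten fit where saturation is observed, and a pseudo-first-order fit whose slope gives kcat/KM where it is not. The following minimal Python sketch illustrates why the two agree at low substrate concentration. It is not the analysis code used in this work. The function names and the substrate concentrations are hypothetical, chosen only for illustration; the kcat and KM values are those reported for the OPH activity of MPH in Table C.4.

```python
def michaelis_menten(s, kcat, km):
    """Turnover rate v/[E] (s^-1) at substrate concentration s (M)."""
    return kcat * s / (km + s)


def slope_through_origin(xs, ys):
    """Least-squares slope m of y = m * x (line forced through the origin)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# MPH, OPH activity (Table C.4): kcat = 17 s^-1, KM = 790 uM
kcat = 17.0      # s^-1
km = 790e-6      # M
efficiency = kcat / km   # catalytic efficiency kcat/KM in s^-1 M^-1

# Hypothetical substrate concentrations far below KM (illustrative only)
substrate = [1e-6, 2e-6, 5e-6, 10e-6]   # M
rates = [michaelis_menten(s, kcat, km) for s in substrate]

# When [S] << KM, v/[E] is approximately (kcat/KM) * [S],
# so the slope of rate against [S] approximates kcat/KM.
approx_efficiency = slope_through_origin(substrate, rates)

print(f"kcat/KM from parameters: {efficiency:.3g} s^-1 M^-1")
print(f"kcat/KM from low-[S] slope: {approx_efficiency:.3g} s^-1 M^-1")
```

The slope slightly underestimates kcat/KM because the highest concentration used here is not negligible relative to KM. With real data, both fits would be done by regression on measured rates, not on rates generated from known parameters as in this sketch.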
