Open Collections

UBC Theses and Dissertations

Biophysical insights into the regulatory and DNA-binding mechanisms of the eukaryotic transcription factors… Perez-Borrajero, Cecilia 2018

BIOPHYSICAL INSIGHTS INTO THE REGULATORY AND DNA-BINDING MECHANISMS OF THE EUKARYOTIC TRANSCRIPTION FACTORS PAX5 AND ETS1

by

Cecilia Perez-Borrajero

B.Sc., University of British Columbia, 2011

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES

(Genome Science and Technology)

THE UNIVERSITY OF BRITISH COLUMBIA

(Vancouver)

February 2018

© Cecilia Perez-Borrajero, 2018

Abstract

Transcription factors are proteins that bind at promoter and enhancer sites to regulate gene expression. In this thesis, I used NMR spectroscopy and other methods to investigate the structural and biophysical basis of DNA binding by two eukaryotic transcription factors that are crucial in the development of lymphocytes, Pax5 and Ets1.

In chapter 2, I describe how the two subdomains comprising the bipartite DNA-binding Paired domain of Pax5 cooperate to mediate transcriptional regulation. The N-terminal subdomain recognizes DNA sequences in a highly specific manner, whereas the C-terminal subdomain shows little sequence discrimination. The more rigid C-terminal subdomain binds DNA primarily through non-specific electrostatic interactions. In contrast, association with specific DNAs by the dynamic N-terminal subdomain involves relatively large and compensating changes in enthalpy and entropy that point to structural rearrangements upon binding. I propose that the distinct behaviors of the subdomains allow the Pax5 protein to rapidly scan non-specific genomic DNA while retaining specificity for functional regulatory sites.

In chapter 3, I expand our understanding of the structural and thermodynamic basis of Ets1 autoinhibition. It was previously reported that an intrinsically disordered serine-rich region (SRR) interacts transiently with the adjacent ETS domain to attenuate DNA binding.
Although the SRR and the ETS domain form a dynamic fuzzy complex, I was able to use NMR spectroscopy and X-ray crystallography to provide a detailed mechanism for this inhibitory interaction. In particular, I exploited a trans peptide system to show that the SRR uses a combination of electrostatic and hydrophobic-driven interactions to sterically block the ETS domain DNA-binding interface. I also show how phosphorylation of the SRR strengthens its association with the ETS domain. Altogether, these results explain how the activity of Ets1 is regulated at the level of DNA binding through post-translational modifications that impinge upon the SRR.

Collectively, my thesis research uncovers many mechanisms that transcription factors use to regulate gene expression, and how they depend on the biophysical properties encoded by their amino acid sequence.

Lay summary

The genetic information found in our DNA must be carefully read by molecules called transcription factors. This process is tightly controlled and very important for cellular health. The goal of my thesis was to understand how two of these DNA readers, called Pax5 and Ets1, carry out their functions. Using different biophysical experiments, I investigated how distinct parts of Pax5 recognize the DNA molecule, and found that they cooperate in unexpected ways that are likely to speed up scanning background DNA to find functionally important gene sequences. I also studied how the reading activity of Ets1 is adjusted through a process called autoinhibition that allows a dimmer-switch response to changing cellular conditions. Overall, my research contributes to a greater understanding of how transcription factors are able to read our genetic information correctly to ensure the well-being of a cell.

Preface

Chapter 2 is based on research that I conducted at the University of British Columbia (UBC) in collaboration with Dr. Lawrence McIntosh, Dr. Mark Okon, Mr. Florian Heinkel, and Ms. Trisha Barnard.
Chapter 3 reflects a collaboration with Dr. Lawrence McIntosh, Dr. Mark Okon, Ms. Chang Sheng-Huei Lin, Mr. Karlton Scheu, and Dr. Michael Murphy.

The majority of sections 2.3.1 and 2.3.2 have been published (Cecilia Perez-Borrajero, Mark Okon, Lawrence P. McIntosh, "Structural and dynamics studies of Pax5 reveal asymmetry in domain stability and DNA binding by the Paired domain" Journal of Molecular Biology, 428: 2372-2391, 2016), and reformatted for this thesis with minor changes. The Pax5 expression plasmids used in this study were generated by Mr. Kevin Zhang under the supervision of Dr. Geneviève Desjardins. Dr. Mark Okon provided technical assistance with NMR experiments. Dr. Lawrence McIntosh assisted with research design, data analysis, and editing of the manuscript. I wrote the manuscript and was responsible for research design, experimental work, and data analysis.

Currently, I am preparing a second manuscript on additional unpublished results incorporated into Chapter 2, sections 2.3.1 and 2.3.2. Mr. Florian Heinkel conducted the molecular dynamics simulations found in section 2.3.1. With my supervision, Ms. Trisha Barnard cloned and conducted expression tests for Pax5 protein constructs outside the Paired domain (section 2.3.3). I conducted the remainder of the experimental work, research design, and data analysis, with assistance from Dr. Lawrence McIntosh.

Chapter 3 is also currently being prepared for publication. I was responsible for the bulk of the research design, experimental work, and data analysis. Ms. Chang Sheng-Huei Lin in the laboratory of Dr. Michael Murphy provided technical assistance with crystallization of the protein complex described in section 3.3.2. She also solved and refined the structure, with help from Dr. Anson C.K. Chan, Dr. Jason C. Grigg, and Dr. Michael E. P. Murphy. Dr. Mark Okon provided key assistance with NMR experiments. Mr.
Karlton Scheu conducted the ³¹P titrations with my guidance (section 3.3.1). Dr. Lawrence P. McIntosh helped with research design and data analysis.

Table of Contents

Abstract
Lay summary
Preface
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Acknowledgements
Dedication
Chapter 1: Introduction
  1.1 Eukaryotic gene transcription
  1.2 Eukaryotic transcription factors
    1.2.1 General features and classification
    1.2.2 Biophysical and structural characteristics
    1.2.3 DNA-binding domains
    1.2.4 Binding site recognition by transcription factors
    1.2.5 DNA search mechanisms
    1.2.6 Mechanisms that regulate DNA binding by transcription factors
  1.3 Research questions and goals
    1.3.1 Investigating the biophysical properties of human Pax5 and its DNA-binding mechanisms
    1.3.2 Understanding the DNA-binding autoinhibitory mechanism of Ets1
Chapter 2: Biophysical characterization of Pax5 and its DNA-binding mechanisms
  2.1 Overview
  2.2 Introduction
    2.2.1 Structural insight from the Paired box (Pax) family of proteins
    2.2.2 Pax5
  2.3 Results
    2.3.1 Biophysical properties of the DNA-binding Paired domain of Pax5
    2.3.2 Mechanisms of DNA binding by Pax5
    2.3.3 Beyond the Paired domain of Pax5
  2.4 Discussion
    2.4.1 Structure of the Pax5 Paired domain and changes upon binding DNA
    2.4.2 Stability of the subdomains of Pax5
    2.4.3 The Pax5 subdomains contribute differently to DNA binding
    2.4.4 Relationship between protein dynamics and DNA-binding specificity
    2.4.5 General model of DNA recognition by the PD and implications in biology
  2.5 Materials and methods
    2.5.1 Expression and purification of Pax5 fragments
    2.5.2 DNA oligonucleotides
    2.5.3 General NMR spectroscopy methods
    2.5.4 CD spectroscopy
    2.5.5 Electrophoretic mobility shift assay (EMSA)
    2.5.6 Isothermal titration calorimetry
    2.5.7 Molecular dynamics simulations
    2.5.8 Accession numbers
Chapter 3: The biophysical basis of phosphorylation-enhanced DNA-binding autoinhibition in Ets1
  3.1 Overview
  3.2 Introduction
    3.2.1 Intrinsically-disordered regions
    3.2.2 Ets1
  3.3 Results
    3.3.1 Phosphate-enhanced hydrophobic effect in Ets1 autoinhibition
    3.3.2 Structural models of steric inhibition by the SRR
    3.3.3 Intrinsic properties of the SRR peptide and the effects of phosphorylation
    3.3.4 The SRR peptide can associate with distantly-related PU.1
  3.4 Discussion
    3.4.1 Hydrophobic amino acids promote intermolecular interactions
    3.4.2 The role of phosphorylation
    3.4.3 The “fuzzy” nature of the interaction
    3.4.4 The mechanisms of DNA-binding regulation in Ets1
  3.5 Materials and methods
    3.5.1 Expression and purification of Ets1₃₀₁₋₄₄₀
    3.5.2 Expression and purification of PU.1₁₆₇₋₂₇₂
    3.5.3 Serine-rich-region (SRR) peptides
    3.5.4 DNA oligonucleotides
    3.5.5 NMR spectroscopy
    3.5.6 Hydrophobicity scale determination
    3.5.7 Crystallization and structure determination
Chapter 4: Concluding remarks
  4.1 The dual roles of the DNA-binding subdomains of Pax5
    4.1.1 Summary, significance, and potential applications
    4.1.2 Limitations, outstanding questions, and future studies
  4.2 Regulation of Ets1 function by an intrinsically-disordered region
    4.2.1 Summary, significance, and potential applications
    4.2.2 Limitations, outstanding questions, and future studies
References
Appendices

List of Tables

Table 2.1: Midpoint unfolding temperatures and thermodynamic parameters of unfolding of the subdomains of Pax5₁₋₁₄₉ as determined by CD and HX.
Table 2.2: Oligonucleotides used for DNA-binding studies.
Table 2.3: Equilibrium dissociation constants (KD values) for Pax5-DNA interactions.
Table 2.4: Temperature-dependent thermodynamic parameters for the Pax5 subdomains binding their CD19 half-site DNAs.
Table 2.5: Ionic strength-dependent thermodynamic parameters for the Pax5 subdomains binding their CD19 half-site DNAs.
Table 3.1: Sequence dependence of the SRR peptide interactions with Ets1₃₀₁₋₄₄₀.
Table 3.2: Increasing ionic strength weakens the Ets1₃₀₁₋₄₄₀/WT2P* interaction.
Table 3.3: Intermolecular NOE crosspeaks used as CYANA distance restraints for NMR structure calculations.
Table 3.4: Data collection and refinement statistics for the Ets1₃₀₁₋₄₄₀/5fPhe2P* complex.

List of Figures

Figure 1.1: Assembly of the PIC and associated proteins at eukaryotic promoters.
Figure 1.2: Overview of the most common DNA-binding domain folds.
Figure 1.3: Mechanisms of facilitated diffusion of TFs on DNA.
Figure 1.4: Mechanisms that regulate DNA binding by TFs.
Figure 1.5: Domain organization of Pax5.
Figure 1.6: Domain organization of Ets1 and selected PTMs.
Figure 2.1: Domain organization and subgroups of the nine identified mammalian Pax proteins.
Figure 2.2: Paired domains characterized to date by X-ray crystallography and NMR spectroscopy.
Figure 2.3: Homeodomains of Pax proteins characterized to date.
Figure 2.4: The subdomains of Pax5₁₋₁₄₉ fold as independent helical bundles.
Figure 2.5: The β-hairpin and linker become ordered upon binding DNA.
Figure 2.6: The β-hairpin structure is stabilized upon binding CD19 DNA.
Figure 2.7: The CTD of Pax5₁₋₁₄₉ is more protected from amide HX than the NTD.
Figure 2.8: The NTD of Pax5 is more resistant to heat and chemical denaturation than the CTD.
Figure 2.9: Sub-nanosecond timescale motions of Pax5 using amide ¹⁵N relaxation experiments.
Figure 2.10: MD simulations shed light into dynamics of the Pax5 subdomains.
Figure 2.11: Cross-correlation analysis of the isolated subdomains explores coupled motions of the NTD and CTD.
Figure 2.12: MD simulations investigating the DNA-bound state of the PD of Pax5.
Figure 2.13: Cross-correlation analysis of the DNA-bound state of the Pax5 PD.
Figure 2.14: Schematic representation of PD protein segments used for DNA-binding studies.
Figure 2.15: Quantification of the interaction between Pax5₁₋₁₄₉ and full-length CD19 DNA.
Figure 2.16: The subdomains exhibit different sequence preferences for CD19 half-sites.
Figure 2.17: The subdomains of Pax5₁₋₁₄₉ exhibit different binding properties for DNA half-sites.
Figure 2.18: The NTD and CTD of Pax5 contact specific and non-specific DNAs using similar binding interfaces.
Figure 2.19: Pax5₁₋₁₄₉ binds specific and non-specific DNAs.
Figure 2.20: Deletion of the β-hairpin weakens DNA binding by the NTD.
Figure 2.21: Contribution of the linker and β-hairpin to DNA binding.
Figure 2.22: In contrast to the CTD, the NTD subdomain of Pax5₁₋₁₄₉ only weakly interacts with non-specific DNA.
Figure 2.23: Thermodynamics of DNA binding by the Pax5 subdomains.
Figure 2.24: The electrostatic contributions to DNA binding by the subdomains of Pax5.
Figure 2.25: The putative partial HD of Pax5 is intrinsically disordered.
Figure 2.26: The partial homeodomain of Pax5 does not interact with the Pax5 PD or a C-terminal fragment of Daxx.
Figure 2.27: Pax5₁₅₁₋₃₉₁ is predominantly disordered under mildly denaturing conditions.
Figure 2.28: The proline-rich transactivation domain of Pax5 is predominantly disordered under native conditions.
Figure 2.29: Cartoon model of the proposed DNA-binding mechanism by the PD of Pax5.
Figure 3.1: The SRR interacts with a well-defined surface of the ETS domain encompassing the recognition helix H3 and flanking regions.
Figure 3.2: Amide chemical shift perturbations in Ets1₃₀₁₋₄₄₀ upon addition of the WT2P peptide.
Figure 3.3: ¹H-¹³C chemical shift perturbations in Ets1₃₀₁₋₄₄₀ upon addition of the WT2P peptide.
Figure 3.4: Determination of the dissociation constants (KD) between the SRR peptide variants and Ets1₃₀₁₋₄₄₀ from ¹⁵N-HSQC monitored titrations.
Figure 3.5: Increased hydrophobicity strengthens the interaction between the ETS domain and the SRR phosphopeptides.
Figure 3.6: The interaction between the SRR peptide and the ETS domain is dependent on ionic strength.
Figure 3.7: The phosphate groups on the WT2P peptide are involved in the interaction with Ets1₃₀₁₋₄₄₀.
Figure 3.8: The SRR peptide does not undergo large changes in backbone chemical shifts upon binding.
Figure 3.9: Changes in the WT2P peptide upon binding the ETS domain.
Figure 3.10: Cognate DNA competes with the WT2P peptide for binding to Ets1₃₀₁₋₄₄₀.
Figure 3.11: Filtered-edited 3D ¹H-¹⁵N/¹³C-¹H NOESY spectrum of ¹⁵N/¹³C-labeled Ets1₃₀₁₋₄₄₀ with bound unlabeled WT2P peptide.
Figure 3.12: NMR-derived models of the Ets1₃₀₁₋₄₄₀/WT2P complex.
Figure 3.13: Crystal structure of the Ets1₃₀₁₋₄₄₀/5fPhe2P* complex.
Figure 3.14: The unbound SRR peptide is predominantly unstructured and exhibits modest changes in NOE patterns upon phosphorylation.
Figure 3.15: The unbound SRR peptides are predominantly disordered.
Figure 3.16: Distinct ¹HN(i-1)-¹HN(i) NOE correlation patterns among the SRR peptide variants.
Figure 3.17: The Ets1 WT2P peptide binds to the ETS domain of PU.1 at two distinct sites.
Figure 3.18: Ets1 DNA-binding regulation through inhibitory and activating protein sequences.
Figure 4.1: The full-length SRR region may extend the ETS domain binding interface.

List of Abbreviations

B-ALL: B-cell acute lymphoblastic leukemia
bHLH: Basic helix-loop-helix domain
bZIP: Basic leucine zipper domain
CaMKII: Calmodulin-dependent kinase II
CD: Circular dichroism
CBP: CREB-binding protein
CTD: C-terminal subdomain
CSP: Chemical shift perturbation
D2O: Deuterium oxide
DBD: DNA-binding domain
DNA: Deoxyribonucleic acid
D. melanogaster: Drosophila melanogaster
DTT: Dithiothreitol
E. coli: Escherichia coli
ETS: E26 transforming specific
EX1: Hydrogen exchange in the unimolecular kinetic limit
EX2: Hydrogen exchange in the bimolecular kinetic limit
GlcNAc: N-Acetylglucosamine
GTF: General transcription factor
HD: Homeodomain
HSQC: Heteronuclear single quantum correlation
HTH: Helix-turn-helix
HX: Hydrogen exchange
IM: Inhibitory module
IPTG: Isopropyl β-D-1-thiogalactopyranoside
ITC: Isothermal titration calorimetry
KD: Equilibrium dissociation constant
LB: Lysogeny broth
MES: 2-(N-morpholino)ethanesulfonic acid
MICS: Motif identification from chemical shift
mRNA: Messenger RNA
NMR: Nuclear magnetic resonance
NOE: Nuclear Overhauser effect
NOESY: Nuclear Overhauser effect spectroscopy
NTD: N-terminal subdomain
OP: Octapeptide
Pax: Paired box
PCR: Polymerase chain reaction
PD: Paired domain
PDB: Protein data bank
PF: Protection factor
pH*: pH meter reading in D2O without correction for isotope effects
PIC: Pre-initiation complex
PRE: Paramagnetic relaxation enhancement
PTM: Post-translational modification
RCI-S2: Random coil index squared order parameter
RMSD: Root mean squared deviation
RNAP: RNA polymerase
SDS-PAGE: Sodium dodecyl sulfate-polyacrylamide gel electrophoresis
SRR: Serine-rich region
SUMO: Small ubiquitin-like modifier
TAD: Transactivation domain
TBP: TATA-binding protein
TF: Transcription factor
TOCSY: Total correlation spectroscopy
TROSY: Transverse relaxation-optimized spectroscopy
wHTH: Winged helix-turn-helix domain

Acknowledgements

Above all, I am eternally grateful to my mother, Melba Borrajero Montejo. My accomplishments, great or small, are all thanks to her enduring support, love, and encouragement throughout my life.

I am immensely grateful to my supervisor, Dr. Lawrence P. McIntosh, for giving me the opportunity to join his group, the freedom to be curious, and for his patience, guidance, dedication, and continued support. Likewise, I am tremendously grateful to Dr.
Mark Okon, for being an amazing colleague, for teaching me the art of NMR, and for making the most of unruly protein samples.

I joined the McIntosh group for the science, and stayed for the people. I want to thank all past and present members of the group for being wonderful labmates. In particular, a huge thank-you to Adrienne Cheung, Helen Huang-Hobbs, Dr. Laura Packer, Dr. Desmond Lau, and Dr. Gerald Platzer for creating a warm and welcoming environment, and for showing me the ropes as a junior graduate student. I am also extremely grateful to Dr. Geneviève Desjardins, Dr. Soumya De, Florian Heinkel, Chloe Gerak, Jacob Brockerman, Dr. Stacy Maynard, Dr. Miriam Kötzler, and Trisha Barnard for great discussions about life, science, and politics.

I want to thank my supervisory committee, Drs. Suzana Straus and Jörg Gsponer, for their advice, motivation, and encouragement during my degree. I also want to thank the Biochemistry and Molecular Biology department at UBC. In particular, I am indebted to Dr. Thibault Mayor, for introducing me to research and teaching me in those early days to think critically. As well, I want to thank Drs. Jason Read and Warren Williams for their help during difficult times as an undergraduate student.

I am also very grateful to the LSI community as well as friends, coffee mates, colleagues, teammates, mentors, and role models, past and present, who enriched my life tremendously during graduate school: Dr. Sophie Comyn, Dr. Ulrich Eckhard, Chang Sheng-Huei Lin, Eugene Kuatsjah, Angelé Arrieta, Dr. Julien Bergeron, Kelsey Harmse, Dr. Fred Rosell, Siobhan Wong, Dr. Bernd Gardill, Hilary Leung, Dr. Roland Wilhelm, Dr. Gaye Sweet, Dr. John Smit, and many others.

Finally, I would like to thank the Genome Science and Technology graduate program for funding and support, and in particular Dr. Phil Hieter, Dr. Stephen Withers, and Sharon Ruschkowski. I may have never discovered NMR without this opportunity!

Dedication

To the loving memory of my father, Dr. Néstor R. Pérez Souto

Chapter 1: Introduction

1.1 Eukaryotic gene transcription

The process of transcription determines which set of genes will be expressed in a cell at any one time. Modulating transcription therefore allows organisms to develop, maintain cellular homeostasis, and adapt to changing cellular and environmental conditions. The complexity of an organism correlates with the sophistication of its gene regulatory mechanisms [1]. In eukaryotes, several RNA polymerases (RNAPs) and thousands of proteins, including general and specific transcription factors (TFs), have been identified as responsible for gene activation or repression [2]. This complexity contributes to the distinct expression patterns observed in differentiated cells with otherwise identical genetic make-up, and highlights the importance of fine-tuning the levels of gene products in a cell.

In eukaryotic systems, gene transcription has three main stages: initiation, elongation, and termination. The first step involves assembly of the pre-initiation complex (PIC), followed by unwinding of the double-stranded DNA to create an open complex (transcriptional “bubble”) (Figure 1.1) [3]. In the case of mRNA transcription, this is catalyzed by RNAP II, which must clear the transcriptional start site following initiation (a step also called promoter escape). Subsequently, elongation proceeds and an mRNA molecule is produced by RNAP II according to its DNA template [3]. During termination, the nascent mRNA molecule is released and RNAP II dissociates from the DNA [4]. Although all three stages are critically regulated, only initiation is directly dependent on the cooperative action of TF proteins.

The eukaryotic PIC is a large protein assembly consisting mainly of RNAP, the general TFs such as TFIID, gene-specific TFs, and co-factors, including histone modifiers and chromatin remodelers [5] (Figure 1.1).
In order to form a PIC capable of transcription, promoter and enhancer DNA elements are recognized in a concerted fashion by general and gene-specific transcription factors. These in turn are able to recruit key protein complexes such as the Mediator and the enhanceosome, which further recruit and stabilize the PIC to promote transcription (Figure 1.1) [5, 6]. Once stably recruited, RNAP II unwinds about four turns of the DNA with the help of the general transcription factor TFIIH, and is poised to clear the transcriptional start site [7].

Figure 1.1: Assembly of the PIC and associated proteins at eukaryotic promoters. Shown is a cartoon representation of the general arrangements of components that allow transcription initiation. Promoter architectures vary depending on the gene and may include the TATA-box, the initiator (Inr), and enhancer DNA elements (green). Although not shown, downstream promoter elements can also be found near the +30 bp position from the transcriptional start site [7]. General TFs (GTFs; blue) typically associate with the proximal promoter regions (e.g. TATA box). Gene-specific TFs (red) and associated co-activators (orange) localize to enhancer elements. The Mediator (yellow) stabilizes the PIC by linking distal enhancer elements to the transcriptional start site. Positioning of the RNAP II complex (purple) is aided by the Mediator and the GTFs. Nucleosomes (brown) are an integral part of this process, and although not shown, chromatin remodeling and histone modifying proteins must also act to allow passage of the transcriptional bubble accompanying RNAP II. This figure was inspired by representations of the PIC found in references [6] and [7].

Any of the above-mentioned stages of transcription can be controlled by mechanisms that fine-tune the timing and extent of RNA production [3].
For instance, the C-terminal domain of RNAP II undergoes extensive phosphorylation, which greatly influences promoter escape and elongation (reviewed in [8]). It is not clear which step (if any) is predominantly regulated and rate-limiting [9], and the answer may depend on the cell type and specific gene in question. Nevertheless, assembly of an open PIC is required for transcription, as it localizes RNAP to the transcriptional start site. In eukaryotes, the presence of nucleosomes and other higher-order DNA structures impedes the passing of the transcriptional bubble [10]. Chromatin remodelling complexes and histone modifiers such as SWI/SNF are therefore also necessary to liberate the DNA binding sites required for transcription initiation and elongation. The RNAP II C-terminal tail can coordinate this process by recruiting histone modifiers as it moves along the DNA [8].

1.2 Eukaryotic transcription factors

1.2.1 General features and classification

Roughly 6 % of the human proteome consists of TF proteins [11] that are crucial in ensuring the high fidelity of gene expression. They are typically characterized by at least a DNA-binding domain (DBD) and a transactivation domain (TAD), responsible for localization to chromatin and co-factor recruitment, respectively [5]. These proteins are broadly classified into two groups based on their biological function. An estimated 200-300 are general TFs involved in the transcription of most genes in the cell [5, 11]. In contrast, upwards of 2000 gene-specific TFs each mediate transcription of a relatively small number of targets [5, 11]. Pax5 and Ets1 are both gene-specific TFs that cooperate in the development of B-cells that make up the lymphatic system [12].

Gene-specific TFs can be further divided into two categories according to their regulation and gene targets.
Constitutively active TFs are typically responsible for controlling the transcription of “house-keeping” genes and may be ubiquitously expressed across many tissue types. Regulatory TFs, which are often tissue-specific, respond to intracellular or extracellular cues and control transcription of more specialized genes [5, 11]. These are generally viewed as being responsible for cellular responses to change. For example, the tumor suppressor p53 is up-regulated in response to DNA damage [13], and cytoplasmic TFs, such as NFκB, translocate to the nucleus upon being activated by cytokine signaling [14]. In most cases, regulatory TFs bind DNA and activate transcription [5]. However, many transcription factors can act both as positive and negative regulators, and some, such as ETV6, are transcriptional repressors [15]. Regulatory TFs are often associated with illnesses such as cancer, characterized by an imbalance of gene products necessary for proper cellular division [16, 17]. Thus, learning about their mechanisms of up-stream regulation and down-stream activity is critical for understanding the diseases linked to their misregulation.

1.2.2 Biophysical and structural characteristics

Eukaryotic TFs are modular in nature, containing discrete domains that often can be replaced by similarly-acting regions derived from other proteins [18, 19]. In 1985 it was shown that the DBD of GAL4 (a yeast protein) could be substituted by that of LexA (a bacterial protein), resulting in the GAL4-mediated activation of LexA target genes [20]. Since then, many studies have shown that different regions in TFs can be “mixed and matched” to generate functional proteins with new properties. Naturally occurring chromosomal translocation events can also result in fusion oncoproteins in which the DNA-binding and transactivation activities are derived from different TFs, leading to aberrant protein activity.
One such example is the fusion oncoprotein Pax5-ETV6, which is strongly implicated in B-cell acute lymphoblastic leukemia (B-ALL) [21]. The presence of this protein, in which the DNA-binding Paired domain of Pax5 is fused to the nearly full-length ETV6 (a member of the ETS TF family), results in B-cell developmental arrest [21].

Partly because of this modularity, structural characterization of TFs has been marked by the “divide and conquer” approach, whereby stand-alone functional domains (such as the DBD) are studied in isolation. However, TFs exhibit a relatively high proportion of intrinsically disordered polypeptide regions linking these domains [22, 23], and thus few full-length TFs have been characterized structurally by traditional approaches. The transactivation domains (TADs), in particular, are often composed of low sequence complexity regions rich in polar and disorder-promoting amino acids [19]. For instance, the TADs of well-known TFs VP16, GCN4, and CREB do not fold in the absence of protein partners [10]. These domains recruit members of the basal transcriptional machinery; however, save for a handful of well-studied examples like p53, very little is known about how TADs function in eukaryotic systems [10, 19].

1.2.3 DNA-binding domains

A wide variety of DBDs are used by eukaryotic TFs to recognize DNA (Figure 1.2). By far, the most common fold is the zinc-finger (ZNF) C2H2 domain, which is present in more than 50 % of known TFs [11]. Other very common DBDs include homeodomains (HDs), basic helix-loop-helix (bHLH), basic leucine zipper (bZIP), Forkhead-type domains, and the winged helix-turn-helix (wHTH) domain present in Ets1 (Figure 1.2). In the majority of cases, the DBD contains an α-helix (named the recognition helix) that fully or partially occupies the major groove of DNA to allow sidechain-base interactions, including hydrogen bonding.
Additional interactions with the sugar/phosphate backbone are provided by adjacent loops, such as the ‘turn’ and ‘wing’ present in wHTH domains. Although much less commonly observed, β-sheets can also bind the major groove of the DNA, as seen in the case of the transposon Tn916 [24]. More commonly, architectural TF proteins can bind the minor groove and induce large phosphodiester backbone conformational changes [25]. The TATA-binding protein (TBP), for example, uses β-sheets to dramatically bend DNA through hydrophobic-driven contacts [25]. Finally, some DBDs, such as the HD of Isl1, are marginally folded in the absence of DNA and only acquire stable structure upon association [26].

Figure 1.2: Overview of the most common DNA-binding domain folds. Representative structures belonging to various DBD families (red) are shown bound to DNA (grey). The most common of these is the ZNF-C2H2 domain, which is stabilized by coordination of Zn²⁺ ions by histidine and cysteine side chains. Shown is the mouse Zif268 protein (top, left). The HD family is represented by Antennapedia from Drosophila melanogaster (D. melanogaster; top, middle). This family is characterized by a three-helix bundle, in which the first two helices arrange nearly parallel to one another, and the third helix recognizes DNA. The bHLH family is exemplified by PHO4 from yeast (top, right). These homo/heteromeric domains are characterized by two helices mediating DNA recognition and dimerization that are separated by a loop. The bZIP family includes mouse CREB (bottom, left). Dimerization of this domain is mediated by leucine residues that form stable hydrophobic-driven interactions. The Forkhead (bottom, middle) and wHTH domains (bottom, right) are represented by human FOXN1 and Ets1, respectively. In both cases, an α-helix inserts into the major groove of the DNA. This figure was made using PyMol [27] from available PDB [28] structures.
1.2.4 Binding site recognition by transcription factors

Proteins and DNA molecules interact using a combination of non-covalent interactions, loosely classified as electrostatic, hydrogen bonding (direct or water-mediated), dipolar (e.g. van der Waals), and hydrophobic [29]. Because DNA is polyanionic, it is not surprising that many DNA-binding proteins are enriched in positively-charged amino acids such as arginine and lysine that contribute to recognition through favorable Coulomb attraction [30]. In addition, polar amino acids like glutamine and asparagine mediate hydrogen bonding and are very commonly observed in protein/DNA interfaces [31]. The aromatic side chains of phenylalanine and histidine can also stack with the DNA bases, thereby contributing to non-specific nucleic acid recognition [32]. In spite of these general trends, no specific pairing between protein side chains and DNA bases universally explains the formation of these complexes [29]. Instead, structural studies have found a wide variety of mechanisms (and thermodynamic contributions) used by these molecules to associate stably [29, 33]. Interestingly, the protein/DNA interface is not exclusively polar, but contains a high proportion (~50%) of aliphatic groups that may contribute to binding via hydrophobic and van der Waals interactions [30]. Although the individual energetic contribution of each of these types of interactions is relatively weak, the additive effect of many such contacts can result in most TFs recognizing cognate DNA with sub-nanomolar affinity (i.e. equilibrium dissociation constant KD < 10⁻⁹ M) [32].

Unlike other types of nucleic acid structures (such as folded single-stranded RNA), the DNA duplex is a relatively uniform double helix. Fine sequence discrimination mechanisms must therefore be at play to recognize important regulatory sites [34].
Specificity can result from direct readout of the pattern in hydrogen bond donors/acceptors and methyl groups that are present in the grooves of the DNA. For example, two arginine residues (Arg392, Arg395) and a histidine (His396) in the DNA recognition helix of ETV6 form hydrogen bonds with the invariant 5’-GGA-3’ bases that define the ETS binding motif [35]. In closely-related Ets1, two arginine residues and a tyrosine mediate the same function [36].

In addition, indirect readout of sequence-dependent DNA features such as its geometry, dynamic properties, and electrostatic potential also provides binding specificity [32, 37]. DNA sequences that are rich in AT bases, for instance, have relatively narrow minor grooves with strongly negative electrostatic potential, and arginine side chains often recognize these grooves specifically [37]. In addition, the flexibility of TA-rich DNA sequences is thought to facilitate binding by proteins such as TBP and EcoRV, which deform the regular DNA geometry upon association [38].

TFs can also bind non-specific DNA with micromolar affinity. This generally involves favorable electrostatic contacts, as well as non-specific base stacking between aromatic side chains and the nucleic acid bases [32]. Histone proteins use this type of mechanism to recognize DNA in a general manner, allowing relatively unbiased chromatin structure formation [29]. Multiple studies have shown that non-specific protein-DNA complexes use the same canonical interface involved in specific DNA recognition, but the extent of the structural rearrangements is smaller ([35, 39, 40] among others). Non-specific complexes are instead characterized by weaker association, greater dynamic properties, and a larger number of water molecules found at the protein-DNA interface [39, 41].
Generally speaking, the release of water molecules from the DNA and protein surfaces upon association contributes favorably to the entropic change of binding and to surface complementarity, and this occurs to a greater extent in specific complexes relative to non-specific ones [29, 34].

1.2.5 DNA search mechanisms

A key question in the field of gene regulation is how TFs locate cognate regulatory DNA sites (KD < 10⁻⁹ M) in a vast excess of non-specific DNA to which they still bind with moderate affinity (KD ~ 10⁻⁵ M) [32]. Even after accounting for chromatin-modulated accessibility, TFs are estimated to exist in the presence of ~ 1.5 mM possible binding sites of 10 base pairs (bp) each [42]. This issue is particularly important in higher organisms because eukaryotic TFs often recognize relatively short and degenerate core DNA sequence motifs of ~ 6 - 8 bp [32] that statistically occur at relatively high frequency throughout the genome. In addition to buffering the pool of TFs within the nucleus, non-specific interactions facilitate the DNA search for high-affinity cognate sites.

Movement along DNA is thought to occur through a combination of one-dimensional (1D) and three-dimensional (3D) mechanisms of diffusion that allow these proteins to rapidly interrogate a large number of non-specific sites and thereby locate cognate promoter or enhancer sequences (Figure 1.3) [43].

Figure 1.3: Mechanisms of facilitated diffusion of TFs on DNA. (a) TFs can slide along the DNA for short distances (50-100 bp) while loosely associated with the negatively charged double helix. (b) Rapid dissociation and re-association to the DNA molecule allows the TF to “hop” or “jump” along the DNA. (c) The presence of multiple DBDs facilitates the intersegmental transfer or “monkey bar” mechanism that contributes to sampling of DNA regions close in space. This figure was inspired by similar schematics found in [44].
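
To put the affinities and motif lengths quoted in this section on a common scale, the following is a minimal sketch and not part of the original analysis. It assumes the standard relation ΔG° = RT ln(KD) with a 1 M standard state at 25 °C, and it treats the genome as random sequence with equiprobable bases, which real genomes are not. The function names and the round genome size of 3 × 10⁹ bp are illustrative assumptions.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1
T_K = 298.15           # assumed temperature (25 degrees C)


def binding_free_energy(kd_molar, temperature=T_K):
    """Standard binding free energy (kJ/mol) from a dissociation constant.

    Uses dG = RT ln(KD / 1 M); more negative means tighter binding.
    """
    return R_KJ * temperature * math.log(kd_molar)


def expected_motif_sites(genome_bp, motif_len):
    """Expected matches of one fixed motif on either strand.

    Assumes random sequence with equiprobable bases and ignores
    palindromic motifs, so this is only an order-of-magnitude guide.
    """
    return 2 * genome_bp / 4 ** motif_len


if __name__ == "__main__":
    dg_specific = binding_free_energy(1e-9)     # cognate site, KD ~ 1 nM
    dg_nonspecific = binding_free_energy(1e-5)  # non-specific DNA, KD ~ 10 uM
    print(f"specific:     {dg_specific:.1f} kJ/mol")
    print(f"non-specific: {dg_nonspecific:.1f} kJ/mol")
    print(f"difference:   {dg_specific - dg_nonspecific:.1f} kJ/mol")

    assumed_genome_bp = 3_000_000_000  # hypothetical round figure
    for k in (6, 8):
        n = expected_motif_sites(assumed_genome_bp, k)
        print(f"{k} bp motif: ~{n:,.0f} chance matches")
```

Under these assumptions, the four orders of magnitude separating specific from non-specific KD values correspond to roughly 23 kJ/mol of binding free energy, while a fixed 6 - 8 bp motif is expected to occur by chance many thousands of times in a genome of this size.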
The 1D ‘sliding’ mechanism involves the loose association of a TF with DNA that still allows rotationally coupled translational diffusion along the helical grooves [45]. This has been proposed to reduce the dimensionality of the search process and thereby help locate TFs on cognate sites that are near (~ 50 - 100 bp) the initial non-specific binding region [46-49]. Sequence-independent electrostatic interactions between TFs and DNA aid in this process by providing a relatively uniform (isopotential) binding interface between the two molecules [50].

If the TF dissociates from DNA fully (i.e. into bulk solution) and rebinds at nearby sites, it is said to ‘hop’ (< 10 bp) or ‘jump’ (> 100 bp) between locations [42, 49]. This 3D diffusion search mode allows TFs to bind at more distant sites, thereby increasing the diversity of sites that are sampled and avoiding local traps [43, 49, 50]. Indeed, DNA-binding proteins such as UL42 [51] and EcoRV [52] have been shown to exhibit these types of movements along the DNA. However, much work needs to be done to establish whether these observations are widely applicable to DNA-binding proteins and TFs specifically. New developments in the field of single-molecule fluorescence microscopy are helping to clarify the nature of these processes and establish their contribution to TF kinetics in real time [53].

The presence of multiple ordered domains, or even intrinsically disordered regions in a TF that can recognize DNA can also influence the search mechanism significantly [32]. For example, proteins with bipartite DBDs or intrinsically-disordered regions with some affinity for DNA, such as Oct-1 and HoxD9, have been shown to use the “monkey bar” intersegmental transfer mechanism, in which two distinct regions of the protein contact separate DNA molecules at once [54, 55]. Intersegmental transfer allows TFs to associate with DNA sites close in space, yet far in sequence.
As a result, the TF molecule is not confined to one chromosome, but can potentially associate with distant promoter/enhancer sites.

1.2.6 Mechanisms that regulate DNA binding by transcription factors

Broadly speaking, regulation of TF activity can be thought to occur on two levels: i) at the gene/mRNA level leading to the presence of a functional TF protein, and ii) at the protein level. Both layers of regulation ultimately influence the ability of a given TF to bind DNA and recruit co-factors as required to activate or repress transcription of cognate genes. Here, I will focus on the control elements acting on the TF protein itself and the various mechanisms that influence DNA binding affinity and specificity. These are summarized in Figure 1.4.

Figure 1.4: Mechanisms that regulate DNA binding by TFs. Diagrammatic representation of some of the multiple mechanisms that may contribute to fine-tuning of the transactivation potential of TFs (red). In the presence of an activating intracellular signal in the cytoplasm (yellow), autoinhibition may be relieved and allow translocation of the TF to the nucleus. Post-translational modifications such as phosphorylation, represented by a “P”, may promote transcription, for example by allowing recruitment of protein partners that stabilize binding to DNA. Gene transcription often occurs in the presence of nucleosomes (brown) that organize naked DNA into higher order structures.

Cellular localization

The compartmentalization present in eukaryotic cells provides a simple mechanism to modulate DNA binding, for example, by restricting the TF molecule to the cytoplasm in the absence of a stimulus. The control of localization is common in TFs that rapidly relay cell receptor signals to the nucleus, as in the case of the NFAT proteins [5]. Activation of T-cell receptors causes a rise in intracellular Ca²⁺ concentrations, which in turn activates the calcineurin phosphatase.
Dephosphorylation of cytoplasmic NFAT by calcineurin results in translocation of the former to the nucleus, where it activates genes responsive to T-cell signalling [5]. In addition, ligand binding events and accompanying conformational changes can also affect the localization of TFs that respond to metabolites such as sugars and amino acids [10]. The galactose response in yeast involving Gal3, Gal4, and Gal80 is a prime example of this type of regulatory switch [56]. In the presence of galactose, Gal3 is able to interact with the transcriptional inhibitor Gal80, which under non-inducing conditions, represses Gal4. The interaction between Gal3 and Gal80 results in the translocation of the inhibitor to the cytoplasm, thereby activating the Gal4 targets that metabolize galactose [57].

Chromatin structure

In the nucleus, binding by TFs is also typically constrained by higher order chromatin structures that occlude many potentially available DNA sites. ATP-dependent nucleosome remodelling enzymes such as SWI/SNF are therefore crucial in allowing the appropriate enhancer and promoter elements to be free to associate with TFs [10]. These enzyme complexes shift the position of nucleosomes along DNA by changing the supercoiling of the helical strands [58]. In addition, histone chaperones and acetylases help in ‘reshuffling’ nucleosome structures either by facilitating their translocation or by weakening histone-DNA interactions [59]. Of note, some TFs, termed “pioneers”, are able to associate with nucleosome-bound DNA and can activate silent chromatin by recruiting the appropriate co-factors [60]. The Forkhead protein FoxA is one such case. The DBD of FoxA binds a surface of DNA not contacted by the nucleosome, while an additional C-terminal domain stabilizes the interaction by associating with the histones [61]. FoxA is then able to recruit proteins such as histone acetylases that further open the chromatin for transcription [61].
Post-translational modifications

PTMs regulate virtually every aspect of cellular function, including transcriptional activity. One of the most common and versatile modifications found to regulate TF function is phosphorylation, which involves the addition of a phosphoryl group to the side chains of tyrosine, threonine, or most commonly, serine. As a result, neutral residues become negatively charged (~ -2) and bulkier, with various consequences on protein structure and protein-protein interactions [62]. These include conformational changes, co-factor recruitment, altered ligand/DNA binding properties, protein degradation, and distinct cellular localization, to name a few (see [63] for review). A delicate balance between the action of "writer" kinases and "eraser" phosphatases that add or remove phosphoryl groups provides a refined layer of regulation of protein activity in the cell. This is highlighted by the more than 500 kinases and roughly 200 phosphatases encoded in the human genome [64].

Phosphorylation of the cyclic AMP response element (CRE)-binding protein (CREB) at Ser133 is a well-known example of regulation via phosphorylation. Full transcriptional activity of CREB is only accomplished when this particular residue is modified, and multiple cellular signaling events can converge at this site [65]. Phosphorylation of additional residues along CREB, such as Ser142, can alter the effect of the Ser133 modification [65]. Thus, the combination of multiple PTMs with distinct consequences on protein activity results in highly regulated function.

Another common PTM occurring in TFs is the addition of the small ubiquitin-like modifier, SUMO. In contrast to phosphorylation, SUMOylation involves the covalent attachment of a small protein (~ 12 kDa) to a lysine sidechain. Akin to the addition of ubiquitin, SUMOylation involves the action of E1 activating, E2 conjugating, and E3 ligating enzymes [66].
Unlike ubiquitination, however, which often targets TFs for degradation through the proteasome pathway, SUMOylation is mostly associated with transcriptional inhibition through recruitment of co-repressors such as histone deacetylases and Daxx [67, 68]. In the case of the latter, SUMO-interacting domains have been identified that likely mediate interactions with many SUMO-modified TFs [69]. The addition of SUMO likely results in the formation of large protein-protein interaction interfaces that recruit co-factors involved in transcriptional regulation.

Besides the above-mentioned examples, TFs can be modified by a myriad of other PTMs, such as acetylation, O-GlcNAcylation, methylation, and lipidation, to name only a few. Finally, although not strictly considered a PTM, TFs are also sensitive to intracellular cues such as pH and reductive potential, which may alter the protonation or oxidation state of side chains. For example, two members of the Pax family of TFs, Pax5 and Pax8, have been shown to be sensitive to oxidation, which alters their ability to bind DNA by changing the redox state of cysteine residues [70].

Autoinhibition

The activity of a protein is regulated by autoinhibition if removal of one region of the polypeptide chain causes an increase in the activity of another, such as ligand binding or catalysis (reviewed in [71]). In the case of eukaryotic TFs, autoinhibitory mechanisms may regulate DNA-binding affinity or transactivation potential. Key to autoinhibition is the spatial proximity of the regulatory elements, which provides “on-site” control of protein activity [71]. This feature enables very rapid responses to cellular cues, as it does not depend on the diffusion of trans-acting molecular effectors. Instead, TFs that are autoinhibited may quickly become active/inactive upon disruption/reinforcement of the autoinhibitory mechanism, for example due to the presence or absence of PTMs.
Multiple members of the ETS family of transcription factors have now been shown to be autoinhibited by regions flanking the DBD using a combination of steric and allosteric mechanisms [72-74]. This additional layer of regulation results in specific and refined transcriptional activity.

Protein partnerships

A hallmark of eukaryotic gene expression is combinatorial control by multiple TFs. Partnerships among TFs can influence both DNA-binding affinity and specificity by providing stabilizing contacts and inducing conformational changes of the TF proteins and the DNA [32, 75, 76]. In fact, many in vivo promoter and enhancer sites for TFs do not conform well to their in vitro sequence motifs defined using isolated DBDs [32, 76]. For instance, an in vivo investigation using ChIP-seq of the zinc-finger protein KLF3 compared DNA-binding by the isolated DBD, the full-length protein, and a mutant lacking the ability to recruit an associated co-repressor [77]. The authors found that the types of sequences bound and the location of these binding sites along the gene (e.g. at promoters or introns) varied significantly among the three versions of the protein.

A well-characterized example of combinatorial binding is the B-cell specific interaction of Pax5 and Ets1 on the mb-1 promoter DNA [12, 75, 78, 79]. Neither TF is able to recognize this binding site with particularly high affinity. However, key contacts made by the N-terminal β-hairpin of Pax5 change the positioning of Tyr395 in the DNA recognition helix of Ets1, thereby enabling non-canonical base-specific contacts with the mb-1 promoter. The change in conformation of this residue allows Ets1 to recognize a distinct DNA sequence that does not fully conform to its known motif, thereby altering binding specificity. In addition, the cooperative interaction among all three molecules increases the stability of this complex and the affinity of both Pax5 and Ets1 for DNA.
A key determinant of TF partnerships is the DNA molecule itself [76]. The spacing and relative positioning of recognition motifs greatly influence the types of interactions that can be made in the presence of multiple TFs. In a study by Jolma and co-workers [76], the available structures of ternary TF complexes were analyzed to determine the basis for altered specificity found in TF pairs. This study found that in ~ 95 % of cases, the DNA molecule mediated the contacts necessary for complex formation, highlighting its role as a molecular scaffold. Thus, the cellular context provides a richness of interactions between DNA and co-factors that ultimately direct assembly of a stable PIC capable of transcription at unique binding sites.

In summary, TFs are crucial in specifying the genes that are transcribed in a cell. These proteins use their DBDs to locate target sites on available chromatin, in a process that must be tightly regulated to ensure cellular viability. Studying these mechanisms in detail will therefore help our understanding of gene regulation and diseases that may result when these fail.

1.3 Research questions and goals

The roles of TF proteins are intricately connected to their biophysical properties. The overarching goal of my thesis is to understand how the structural and dynamic properties of eukaryotic TFs determine their DNA-binding and regulatory mechanisms. For this purpose, I characterized two regulatory TFs that are crucial in the normal function of immune cells. In Chapter 2, I focused on the DNA binding mechanisms of Pax5, a protein that is required in B-cell development, and which is commonly mutated in B-cell malignancies. In Chapter 3, I investigated Ets1, a transcription factor whose function is crucial in T-cell activation. In this case, I studied the role of an intrinsically disordered region of this protein in regulating DNA association.
My aim was to understand the underlying mechanisms used by these two proteins to associate with DNA in a controlled manner.

1.3.1 Investigating the biophysical properties of human Pax5 and its DNA-binding mechanisms

Pax5 belongs to a highly conserved family of developmental TFs characterized by an N-terminal DBD of ~128 amino acids called the Paired domain (PD) [80]. In humans, this small family consists of nine members (Pax1-9) with crucial roles in embryogenesis and cell differentiation [81]. The DBD of Pax5 is bipartite, containing N- and C-terminal subdomains that cooperate to bind DNA (Figure 1.5). This type of DBD architecture lends itself to unique binding properties, since separate regions can contribute differentially to association with DNA.

One important question my research sought to address was the relative ability of each subdomain to associate with DNA in a specific versus non-specific manner. While conducting these studies, I found that the flexible N-terminal subdomain had very specific DNA sequence requirements for association, relative to the more rigid and promiscuous C-terminal subdomain [40]. The distinct dynamic and DNA-binding properties of the subdomains prompted me to question the underlying basis for these differences. Thus, I also conducted biophysical studies designed to determine the mechanisms behind the observed differences in DNA binding by the Pax5 PD subdomains. In addition, I was interested in whether functionally mapped domains of Pax5 outside the PD could adopt stable secondary structures, or whether the remainder of Pax5 was mostly disordered. Answers to these questions are provided in Chapter 2 of my thesis.

Figure 1.5: Domain organization of Pax5. A schematic of the relative position of different domains in the human Pax5 protein is shown.
This includes the bipartite N-terminal DNA-binding Paired domain (PD), the octapeptide motif (OP), a region of homology to homeodomains (HD), a transactivation domain (TAD), and a C-terminal inhibitory module (IM). Approximate amino acid boundaries are provided above the cartoon and are derived from multiple previous studies (reviewed in [82]). The PD of Pax5 is composed of N-terminal and C-terminal helical subdomains joined by a linker of ~ 20 residues. The crystal structure of the PD of Pax5 bound to DNA and Ets1 is shown (PDB: 1MDM, the Ets1 molecule was removed for simplicity).

1.3.2 Understanding the DNA-binding autoinhibitory mechanism of Ets1

Ets1 is an important transcriptional regulator and proto-oncogene highly expressed in lymphocytes [83]. Multiple PTMs have been identified in Ets1 that fine-tune its activity (Figure 1.6) (reviewed in [83]). Upon activation of T-cells, for example, intracellular calcium concentrations increase, leading to the activation of calmodulin-dependent kinase II (CaMKII). This kinase phosphorylates Ets1 at serine residues present in the intrinsically disordered serine-rich region (SRR) preceding the DNA-binding ETS domain. Phosphorylation of the SRR reduces DNA binding by Ets1 dramatically; however, the structural basis for this mechanism was only partly understood. The goal of my studies on Ets1 was to elucidate the basis for the interaction between the phosphorylated SRR and the ETS domain, and thereby better explain how the function of Ets1 is regulated by disordered sequences adjacent to its DNA-binding domain. Answers to these questions are provided in Chapter 3 of my thesis.

Figure 1.6: Domain organization of Ets1 and selected PTMs. Shown is the domain organization of the full-length Ets1 molecule. The Pointed (PNT) domain functions in transcriptional activation by mediating protein-protein interactions [84]. The C-terminal ETS domain is responsible for DNA binding.
Phospho-acceptor sites are indicated by the letter P and are shown at approximate mapped sites along the protein. Ets1 activity is downregulated by at least two processes (red): CaMKII phosphorylation in the SRR, which decreases DNA binding, and modification with SUMO, which recruits transcriptional co-repressors. In contrast, phosphorylation by the MAP kinases and the presence of partners such as Pax5 and Runx1 activate transcriptional activity (shown in green). For simplicity, other PTMs identified in Ets1, including ubiquitination and acetylation, are not shown [83]. The model of full-length Ets1 was generated by Dr. Cameron Mackereth using the published coordinates for the PNT domain (PDB: 2JV3) and the inhibited ETS domain (PDB: 1R36). For illustrative purposes, the remaining residues were simply built as an extended polypeptide without any additional energy minimization. I made minor color changes to this model to generate the figure using PyMOL [27].

Chapter 2: Biophysical characterization of Pax5 and its DNA-binding mechanisms

2.1 Overview

The eukaryotic transcription factor Pax5 or B-cell specific activator protein (BSAP) is central to B-cell development and has been implicated in a large number of cellular malignancies resulting from loss- or gain-of-function mutations. The DNA-binding Paired domain (PD), in particular, is a hot spot for chromosomal rearrangements and disease-associated genetic changes. In this chapter, I characterize the human Pax5 protein and its DNA-binding mechanisms, using a combination of NMR spectroscopy, isothermal titration calorimetry (ITC), circular dichroism (CD) spectroscopy, and molecular dynamics (MD) simulations.

I found that the PD folds as two independent helical bundle subdomains separated by a conformationally disordered linker. The N-terminal ~ 30 residues are also disordered as determined by chemical shift analysis and amide relaxation experiments.
The two subdomains of the PD differ in their dynamics and DNA-binding properties. The C-terminal subdomain (CTD) is ~ 10-fold more protected from amide hydrogen exchange (HX) than the N-terminal subdomain (NTD). MD simulations support the dynamic nature of the NTD, and highlight motions of the DNA recognition helix. In spite of being more dynamic, the NTD is resistant to chemical and thermal denaturation relative to the CTD, as measured by CD spectroscopy. These and other observations described in this chapter point to a flexible NTD subdomain which behaves as a “molten globule”. This stands in contrast to the more rigid CTD.

Upon binding DNA, the dynamic properties of the PD become significantly dampened. In particular, the N-terminal residues become folded, and the highly conserved linker region becomes rigid. I also found that DNA binding by the NTD is highly specific, relative to the CTD, as evidenced by the ability of the NTD to discriminate cognate versus non-specific DNA-binding sites. These interactions of the NTD with cognate DNA are not solely reliant on electrostatic contributions and are driven by large favorable enthalpy changes that offset comparable losses in entropy. On the other hand, the more rigid CTD primarily depends on electrostatic effects in order to recognize both cognate and non-specific DNA. HX experiments and MD simulations indicate a greater reduction in motions of the NTD than the CTD upon DNA binding. The inverse relationship between subdomain dynamics and specificity suggests that conformational plasticity is needed for Pax5 to form high affinity interactions with its cognate DNA sites. In addition, the distinct behaviors of the NTD and CTD may enable efficient searching of non-specific genomic DNA by Pax5 while also retaining specificity for functional regulatory sites.
The remainder of Pax5 appears to be intrinsically disordered by NMR chemical shift analysis, and does not adopt any stable secondary structure in the absence of protein partners. In addition, the putative partial homeodomain of Pax5 does not interact with DNA, and may instead be a site for protein-protein interactions. These data, in combination with sequence-based structure predictions, point to regions of Pax5 outside the PD being natively unstructured and serving as docking surfaces for other components of the transcriptional machinery.

2.2 Introduction

2.2.1 Structural insight from the Paired box (Pax) family of proteins

The Pax family of transcription factors controls tissue patterning and organ formation during early development, and is found across the animal kingdom, from sponges to humans [85-87]. Nine Pax genes have been identified in mammals, with critical roles in embryogenesis and cell differentiation, as demonstrated by a number of congenital syndromes associated with mutations in pax gene loci [86, 87]. For example, mice deficient in Pax3 do not survive gestation and fail to develop the circulatory system [80]. In humans, mutations in this gene can cause Waardenburg syndrome, a rare genetic disorder characterized by hearing deficiencies and abnormal melanocyte formation [80]. Similarly, other Pax genes have been found to be crucial in the formation of the eyes (Pax6), the kidney (Pax2), the thyroid (Pax8), and the thymus (Pax1) (reviewed in [80]).

The defining feature of this ancient family of transcription factors is the highly conserved N-terminal DNA-binding region of ~128 amino acids known as the Paired domain (PD) [12, 85-88]. The Pax proteins can be divided into four subgroups based on the presence of additional conserved features, such as the homeodomain (HD) and octapeptide (OP) motif, believed to be present in the ancestral Paired gene (Figure 2.1) [85, 86, 89].
The second subgroup, for example, includes Pax5 and contains the OP motif and a region of partial homology to the HD [89].

Figure 2.1: Domain organization and subgroups of the nine identified mammalian Pax proteins. The nine paralogs can be subdivided into four groups (I-IV) based on amino acid sequence homology [85]. All family members share a highly conserved DNA-binding PD consisting of ~ 128 residues. Shown are schematic representations of the relative positioning of regions of high sequence similarity found across the family. The asterisks (*) highlight domains of human Pax proteins that have been characterized structurally, and are shown in Figures 2.2 and 2.3. The transactivation domains (TADs) have not been included for simplicity. However, they are found at the C-termini of the proteins, and can vary in length. The cartoon representations are not drawn to scale. Pax proteins differ in the number of amino acids and the spacing between the regions shown.

Regions of high sequence similarity in the Pax family of transcription factors such as the PD have been structurally characterized by NMR spectroscopy and X-ray crystallography. The PDs of human Pax5, human Pax6, and D. melanogaster Paired were crystallized in complex with DNA (Figure 2.2) [12, 90, 91], whereas the unbound PD of human Pax8 was characterized using NMR spectroscopy (Figure 2.2) [92]. These structural studies showed that the PDs of Pax proteins are composed of a bipartite DNA-binding domain consisting of two helical subdomains joined by a linker of ~ 20 residues. Superposition of these structures showed very high structural conservation, with ~ 1.5 Å backbone root-mean-square deviations (RMSDs) between the PDs of Pax5 and Pax6. This is not surprising given the ~ 75 % amino acid sequence identity between these two domains.
Helix-turn-helix (HTH) motifs in the N-terminal subdomain (NTD, also called PAI) and C-terminal subdomain (CTD, also called RED) bind the major groove of DNA at sites separated by approximately one turn of the phosphodiester backbone. Additional minor groove contacts are provided by a short N-terminal β-hairpin and adjacent loops, as well as the linker.

At the start of my thesis research, only the PDs of Pax6 and Pax8 had been characterized in the absence of DNA. Early studies using CD spectroscopy indicated that these PDs had low α-helical content, which increased upon DNA binding [93, 94]. In contrast, using NMR spectroscopy, Codutti et al. [92] demonstrated that the unbound Pax8 PD adopts a stable conformation similar to that of other PD proteins studied in complex with DNA [12, 90, 91]. However, the authors noted weak tertiary contacts in the helical subdomains of Pax8, and the absence of the N-terminal β-hairpin found in crystals of PD/DNA complexes (Figure 2.2). Thus, we initially lacked consistent insights into the biophysical properties and structural dynamics of the PDs in the Pax family, as well as their DNA-binding mechanisms.

Figure 2.2: Paired domains characterized to date by X-ray crystallography and NMR spectroscopy. The structures of paralogous PDs (sequence conservation > 60%) of Paired (fruit fly), Pax5, Pax6, and Pax8 (human) were determined in their DNA-bound or free forms [12, 90-92]. These structures showed that the PD is a conserved bipartite DNA-binding domain composed of NTD and CTD subdomains. The helices forming the subdomains are labeled, along with the β-hairpin motif where present. The crystal structure of Pax5 also contains the transcriptional partner Ets1 contacting the NTD and additional DNA, omitted here for clarity.

In addition to the N-terminal PDs which define the family, Pax proteins belonging to groups III and IV also contain a HD that is involved in DNA binding and protein interactions [95].
The three-dimensional (3D) structures of three Pax HDs have been described thus far [95, 96], and are shown in Figure 2.3a. These structures, which superimpose very closely, form a three-helix bundle using the helix-turn-helix (HTH) motif characteristic of HDs [97]. In the case of Pax5 and members of the same subfamily, there is a region of high sequence similarity only to the first helix of these domains (Figure 2.3b).

Figure 2.3: Homeodomains of Pax proteins characterized to date. The HDs in this family are structurally similar and exhibit a characteristic helix-turn-helix (H2-H3) motif. (a) Left and middle panels: X-ray crystallographic structures of the human Pax3 and fruit fly Paired HDs in complex with DNA [95, 96]. Although the DNA strands are not shown for ease of comparison and clarity, the recognition helix H3 binds the major groove of DNA. Right panel: NMR spectroscopically derived structure of the HD of human Pax6 determined in the absence of DNA. All structures superimpose closely, indicating that the overall HD fold does not change significantly upon binding DNA. (b) The full-length protein sequences of five human transcription factors from diverse families known to contain HDs were aligned with that of human Pax5. The numbers below the alignment correspond to Pax5 residues. Although residues within the ~ 220-250 region of Pax5 share some apparent homology with helix 1 (H1) of the HD, there is little sequence relationship to the helix-turn-helix H2 and H3. Residues fully (orange) or moderately (yellow) conserved in 5 or more members are highlighted.

However, the homology is only partial and does not extend to the DNA-recognition helix. Therefore, at the start of my work it was unclear whether, in the case of Pax5, this segment was able to adopt a defined structure and recognize DNA, and if not, what its role in transcriptional regulation was.
Aside from the PD and HD regions, other conserved features of the Pax family have not been structurally characterized to date. The OP or eh1 motif is a short (7-8 residues) region present in some members of the Pax family, including Pax5, as well as other families of HD-containing transcription factors [98]. This motif has been shown to interact with the Groucho/TLE family of transcriptional repressors in several species [99]. How this interaction may contribute to transcriptional silencing is not well understood. However, there is evidence that upon recruitment, Groucho proteins interact with histone deacetylases (HDACs), thus localizing DNA-modifying proteins that contribute to a repressed chromatin state [99].

Finally, Pax proteins contain proline-rich TADs at their C-termini. Secondary structure prediction algorithms suggest that these domains are intrinsically disordered, save for small regions with α-helical propensity. This is supported by the relatively high content of proline and polar residues (e.g. serine and threonine) in the TAD of Pax5, residues known to promote disorder in protein regions (more details about intrinsic disorder in section 3.2.1). However, the structural and biophysical mechanisms mediating the recruitment of proteins promoting transcriptional activation and silencing by the TADs of Pax proteins remain unknown.

2.2.2 Pax5

Pax5 or B-cell specific activator protein (BSAP) was first identified for its DNA-binding ability and expression in immature B-cells [100]. In cooperation with Pax2, Pax5 is important in the formation of the central nervous system during mouse embryonic development [101-104]. In adult organisms, Pax5 drives differentiation and maturation of B-cells, serving both as an activator of B-cell-specific genes (e.g., CD19, mb-1, and Blnk) and a repressor of alternative developmental programs [105-108].
As such, Pax5 is considered a “commitment” factor, without which cells retain the potential to develop along alternative hematopoietic pathways (e.g. to become T-cells) [109].

Not surprisingly, aberrant Pax5 activity has been implicated in a large number of B-cell malignancies and other non-hematopoietic cancers, where it can act as an oncogene or tumor suppressor depending on the cellular context (reviewed in [108]). In the latter case, a reduction in Pax5 function can lead to B-cell acute lymphoblastic leukemia [110-112], whereas re-establishment of Pax5 expression ameliorates the malignant phenotype [113].

Genome-wide DNA-binding studies have been crucial in understanding the function of Pax5 in B-cell development. A study published in 2012 identified ~20,000 and ~15,000 DNA-binding sites for Pax5 in pro-B and mature B cells, respectively, corresponding to ~ 40 % of the total number of nuclease hypersensitive regions identified in these cells [114]. However, only ~ 360 genes were found to change in expression level by more than 4-fold upon deletion of Pax5 [114]. These data indicated that, in most cases, DNA binding by Pax5 is not sufficient to alter gene expression levels and other regulatory factors are needed. Partnerships with proteins such as Ets1 and Grg4, as well as specific DNA promoters that allow high-affinity complex formation, must be critical for ensuring proper activation and repression of Pax5 gene targets in vivo [12, 89, 115, 116].

This chapter focuses on the DNA-binding PD of Pax5, which is a hotspot for disease mutations. Interestingly, the PD is retained in at least 15 chromosomal rearrangements resulting in Pax5 fusion proteins [109]. In addition, a study found that the vast majority of mutations in mouse models of B-cell acute lymphoblastic leukemia were located in the PD [117]. These mutations are expected to disrupt DNA-binding activity of Pax5, thereby preventing complete B-cell differentiation and leading to tumorigenesis.
Previous studies investigating DNA binding by Pax5 have shown that the consensus site of the Pax5 PD is extended (~ 15-20 bases) and somewhat degenerate and thus not well defined (Figure 2.4) [81, 114, 118]. This is consistent with the structures of Pax proteins in complex with DNA shown above, in which several regions of the PD are in close proximity to the DNA. However, the relative contributions of the two subdomains and the linker to DNA binding and specificity by Pax proteins were unclear and seemed to be context dependent. For example, in the case of Paired from fruit fly, the NTD appeared to be functionally dominant in the activation of genes involved in embryonic patterning, since deletion of the CTD had no phenotypic effect in vivo [119]. In addition, mutations in the DNA-binding domain affecting fly viability could be rescued with the NTD region of the protein alone [90]. Although a few reports agreed on the dominance of the NTD in DNA binding by Paired [120, 121], more recently, the CTD was found to be necessary for proper mating response in flies [122]. The importance of this subdomain in cooperating for target DNA recognition had also been recognized for Pax5 [81], Pax6 [123], and Pax8 [94, 124]. Therefore, the individual subdomains seemed to have different roles in associating with DNA. However, given the similar structures and DNA recognition modes of the NTD and CTD subdomains, the reasons underlying these differences were unknown.

In this chapter, I used NMR spectroscopy and other biophysical techniques, as well as MD simulations (in collaboration with Mr. Florian Heinkel), to investigate the structure, dynamics, and stability of the PD of Pax5, as well as regions encompassing the partial HD and the putative TAD. In addition, I used NMR spectroscopy and ITC to conduct DNA-binding studies and relate these biophysical properties to function.
My main findings, presented below, advance our understanding of the underlying biophysical mechanisms used by Pax5 to bind DNA and regulate gene expression.

2.3 Results

2.3.1 Biophysical properties of the DNA-binding Paired domain of Pax5

In solution, the PD of Pax5 folds as two independent helical subdomains

I initially collected 15N-HSQC spectra of three bacterially-expressed fragments of this transcription factor: Pax51-92 containing the NTD and flanking regions, Pax576-149 containing the linker and CTD, and Pax51-149 spanning the entire PD (Figure 2.4a). The two subdomain constructs showed dispersed amide peaks characteristic of folded structures, as well as sharp signals exhibiting random coil 1HN shifts (~ 8 to 8.5 ppm), diagnostic of conformational disorder (Figure 2.4b). In addition, the corresponding amide 1HN-15N signals of the smaller protein fragments (Pax51-92 and Pax576-149) overlaid closely with those of Pax51-149 (Figure 2.4b), indicating that the NTD and CTD are structurally independent and can adopt the same fold whether in isolation or connected covalently to form the full-length PD. Consistent with this conclusion, the 15N-HSQC spectrum of 15N-labeled Pax51-92 was not perturbed upon addition of unlabeled Pax576-149 (not shown).

Figure 2.4: The subdomains of Pax51-149 fold as independent helical bundles. (a) Boundaries of the protein fragments used in this study. Colored rectangles indicate the NTD and CTD helical bundles. The DNA-binding consensus sequence of the PD according to the JASPAR CORE database [125] is also shown. Bases present in more than 50% of the sequences within this dataset are written explicitly, with those most conserved (> 75%) underlined. The x represents less than 50% preference for any particular base. (b) Overlaid 15N-HSQC spectra of the three Pax5 fragments collected at pH 6.5 and 25 °C. Residues Arg50, Glu113, and Ile83 are labeled as examples.
The close overlap of dispersed signals from the two subdomain fragments and the full-length PD indicates that the NTD and CTD are structurally independent. In contrast, the sharp signals with poorly dispersed 1HN shifts near 8 to 8.5 ppm arise from conformationally disordered amides flanking the helical bundles. (c) In the absence of DNA, the Pax5 PD consists of two independent helical bundle subdomains flanked by conformationally disordered residues. Shown are normalized α-helical and β-strand propensities per residue based on the MICS analysis [126] of backbone 13Cα/13Cβ/13CO/15N chemical shifts for Pax51-92 (yellow, left), Pax576-149 (red, right), and Pax51-149 (blue, bottom). The solid black lines denote the RCI-S2 values for each residue, with lower values indicating greater predicted conformational disorder. The top cartoon shows the secondary structure (black arrows for β-strands, rectangles for α-helices) of the PD observed in a crystallized Pax5/Ets-1/DNA complex (PDB: 1MDM).

The assigned chemical shifts of backbone 1HN, 15NH, 13Cα, 13Cβ, and 13CO nuclei in Pax51-92, Pax576-149, and Pax51-149 (assigned spectra shown in Appendix A.1-A.3) were used to calculate secondary structure propensities, as well as random coil index-derived squared order parameters (RCI-S2), according to the MICS algorithm [126]. This analysis revealed that in solution, the NTD and CTD of Pax5 form well-ordered 3-helix bundles (Figure 2.4c). The predicted secondary structures of the individual subdomains did not change in the context of the full PD, and agree closely with those observed in the crystal structure of the Pax5/Ets-1/DNA complex (PDB ID: 1MDM) [78]. However, the N-terminal ~ 30 residues of Pax51-92 and Pax51-149 lacked any persistent secondary structure and exhibited low RCI-S2 values, indicative of conformational flexibility.
Although residues 25-27 showed β-strand propensity, the overall chemical shift analysis indicates that the small β-hairpin (residues 19-21 and 25-27) seen in the crystal structure of the Pax5/Ets-1/DNA complex is dynamic and not stably formed in either free Pax51-92 or Pax51-149. This conclusion is supported by amide HX and 15N relaxation studies presented below. The linker residues in all three fragments were also found to be disordered, with random coil chemical shifts and hence low RCI-S2 parameters. Consistent with the spectral comparisons in Figure 2.4b, these data suggest that the two subdomains do not interact with one another, but rather behave as “beads-on-a-string” to form the full PD.

Upon binding DNA, the β-hairpin and linker region become ordered

To characterize the potential changes in structure and dynamics that occur upon DNA binding by the PD, I assigned the backbone chemical shifts of amide-protonated 2H/15N/13C-labeled Pax51-149 in complex with a 25 bp DNA duplex corresponding to the high affinity variant of the CD19 promoter, CD19-2_Ains (Figure 2.5 and Appendix A4) [81]. The majority of the amides in the protein exhibited chemical shift perturbations (CSPs) in the presence of DNA (Figure 2.5a, c). However, based on chemical shift analysis using the MICS algorithm, the α-helical content of the complex remained very similar to that of the free protein (Figure 2.5b). In contrast, the RCI-S2 values of the linker and the N-terminal residues increased to match those in the helical bundles. This indicates that the protein backbone in these regions becomes more ordered when bound to DNA (Figure 2.5b). In addition, the β-hairpin and linker regions showed slightly increased β-strand propensities. In the case of the linker, this most certainly reflects its restriction to an extended conformation and not the formation of a β-sheet.
However, in the case of the N-terminal region, residues Gln22 and Asn29, which are part of the β-hairpin turn and loop leading to helix 1, respectively, showed large increases in turn propensities (Figure 2.6). This suggests that the β-hairpin and adjacent turns are stabilized upon DNA binding and supports previous observations that this region makes crucial contacts with DNA [12, 90-92].

Amide CSPs due to binding of the CD19 DNA were mapped onto the crystal structure of the PD in the Pax5/Ets1/DNA ternary complex (Figure 2.5c, PDB: 1MDM) [12, 78]. The largest CSPs localized to amides in the β-hairpin and adjacent loops, the recognition helix (H3) of the NTD, the linker region, and the recognition helix (H6) of the CTD. This is in close agreement with the DNA contacts observed by X-ray crystallography. In particular, residues Gln22, Gly30, Val90, and Trp112, which exhibited very high CSPs, all contact the phosphate backbone or make hydrogen bonds with the bases of the DNA (Figure 2.5c).

The NTD is less protected from amide HX than the CTD

Protein dynamics play a critical role in DNA binding by transcription factors [127, 128]. Thus, I measured amide HX rates to study the local and global motions of the Pax5 PD in its free and DNA-bound states. Initially, lyophilized Pax51-149 was resuspended in D2O at pH* 7.0 and 25 °C and its protonation levels were monitored by NMR spectroscopy. In the first 15N-HSQC spectrum, recorded only ~10 minutes after resuspension, the vast majority of the amide signals were absent (Figure 2.7a). Nearly all of the remaining ~ 13 residues with detectable signals localized to the helical regions of the CTD (Figure 2.7b). Although amides within the unstructured N-terminal and linker regions were expected to exchange rapidly under these conditions, it was surprising that the entire NTD also exhibited little HX protection.

Figure 2.5: The β-hairpin and linker become ordered upon binding DNA.
(a) Comparison of 15N-HSQC spectra of Pax51-149 free (blue) and bound to the CD19 DNA (orange). Numerous amide signals, including those from residues flanking the helical bundles, change chemical shifts in the presence of DNA. (b) The backbone 13Cα/13Cβ/13CO/15N/1HN chemical shifts of free and CD19-bound Pax51-149 were assigned (Appendix A4). Shown are the normalized α-helical and β-strand propensities, as well as RCI-S2 values, per residue based on these shifts using the MICS algorithm [126]. The top cartoon indicates the secondary structure of the PD as in Figure 2.4. Although the secondary structure of the PD does not change upon DNA binding, the β-hairpin and linker regions become substantially more ordered as evidenced by their increased RCI-S2 values. (c) The amide 1HN-15N chemical shift perturbations (CSPs, ppm) resulting from DNA binding are plotted (top) and mapped onto the cartoon representation of the PD (bottom, from PDB ID: 1MDM). Blank values correspond to prolines or amides with unassigned signals. Residues with values above 0.3 ppm (dashed line) are highlighted in green, and those for which there is no information or the CSP is below 0.3 ppm are in blue. The largest CSPs localize to amides within the DNA recognition helices H3 and H6, the linker, and the β-hairpin region.

Figure 2.6: The β-hairpin structure is stabilized upon binding CD19 DNA. (a) Type I and (b) type II turn propensities obtained from a MICS analysis of backbone chemical shifts [126] show that residues Gln22 and Asn29 flanking the β-hairpin adopt chemical shifts consistent with the ordering and stabilization of this small secondary structure. Gln22 is located in the loop between the two β-strands and Asn29 forms a turn into helix 1 of the NTD.

Figure 2.7: The CTD of Pax51-149 is more protected from amide HX than the NTD. (a) Pax51-149 was lyophilized and resuspended in D2O to monitor the decay in amide 1HN-15N signal intensity due to protium-deuterium exchange.
Shown are overlaid 15N-HSQC spectra of Pax51-149 in H2O buffer (open blue, pH 6.5, 25 °C) and ~ 10 min after resuspension in D2O (solid cyan, pH* 7.0 and 25 °C). Peaks in the latter are assigned. (b) Amides that have not fully exchanged after this time localize primarily to the CTD and are highlighted in cyan on a cartoon representation of the PD derived from the Pax5/Ets-1/DNA crystal structure (PDB: 1MDM). (c) Upon binding DNA, the entire PD becomes more protected against HX. Protection factors (PFs) for free Pax51-149 obtained from either slow protium-deuterium exchange at pH* 6.0 and 15 °C or fast CLEANEX-PM protium-protium exchange at pH 5.6, 6.3, or 8.0 and 25 °C were plotted in blue or cyan (the latter highlighting those most protected from HX in panel (a)). In grey are the corresponding PFs of the PD in the DNA-bound state, measured at pH* 6.60 and 15 °C using protium-deuterium exchange. Missing data correspond to prolines, residues with overlapping amide chemical shifts, or residues with exchange rate constants outside of the measurable experimental ranges of the two approaches. The top cartoon indicates the secondary structure of the PD.

To better quantitate the HX behavior of Pax51-149, protium-deuterium exchange experiments were repeated at pH* 6.0 and 15 °C. The reduced sample pH* and temperature enabled measurements of HX rate constants for many amides (~ minutes-hours timescale). In parallel, I conducted magnetization transfer CLEANEX-PM experiments at pH 5.6, 6.3 and 8.0 (25 °C) to measure more rapid protium-protium HX (~ seconds timescale). The fit exchange rate constants for each backbone amide, kHX-obs, were compared to those predicted for Pax51-149 (kHX-pred) as a random coil polypeptide under the same conditions [129-131]. The calculated amide protection factors PF = kHX-pred/kHX-obs from HX and/or CLEANEX-PM experiments are shown in Figure 2.7c.
In accordance with their random coil chemical shifts and hence low RCI-S2 values, amides in the linker and terminal regions had PFs ~ 1. This indicates that these residues (which include those forming the β-hairpin when DNA-bound) do not adopt stable hydrogen-bonded conformations. In contrast, residues forming helices in the N- and C-terminal subdomains had PFs of ~ 100 and ~ 1000, respectively. Assuming hydrogen exchange in the bimolecular kinetic limit (EX2 conditions) and that the most protected amides in each subdomain exchange via global unfolding [132], these PFs correspond to ΔG°unfolding = RT ln(PF) values of only ~ 2.6 kcal/mol and ~ 4.0 kcal/mol for the N- and C-terminal subdomains, respectively (Table 2.1). Thus, relative to well-folded globular proteins with PFs typically in the range of 10^4 to 10^7 [74, 133, 134], neither subdomain is highly stable against fluctuations leading to HX. Nevertheless, the CTD of Pax51-149 is more protected than the NTD.

Upon binding DNA, the entire PD becomes more protected from HX

To gain complementary dynamic insights, I also performed HX experiments on the Pax51-149/CD19 complex at pH* 6.60 and 25 °C. Relative to the unbound state, amides throughout the protein became significantly more protected from exchange (Figure 2.7c). Of particular note, residues in the β-hairpin and linker regions had PFs ~ 10^3 in the bound state, versus ~ 1 in the free state, indicating the stabilization of intra- or intermolecular hydrogen bonds. This is consistent with the structure of the PD in the Pax5/Ets1/DNA ternary complex. For example, the amides of Leu23 and Phe27 in the β-hairpin donate hydrogen bonds to Asn21 and Gly19, respectively, whereas those in the linker region (e.g. Gly85) interact with DNA.
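The conversion of the free-state protection factors quoted above into apparent unfolding free energies can be checked with a minimal Python sketch. It assumes EX2 exchange via global unfolding and a temperature of 15 °C (the condition of the slow protium-deuterium measurements on the free protein). The function names and the example rate constants are hypothetical and serve only as illustration.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def protection_factor(k_pred, k_obs):
    """PF = k_pred / k_obs (random-coil rate constant over observed rate constant)."""
    return k_pred / k_obs


def dg_unfolding_from_pf(pf, temp_c):
    """Apparent unfolding free energy (kcal/mol) from dG = RT ln(PF), valid under EX2."""
    temp_k = temp_c + 273.15
    return R_KCAL * temp_k * math.log(pf)


# Hypothetical rate constants (same units) that would give a PF of ~100.
example_pf = protection_factor(k_pred=10.0, k_obs=0.1)
print(f"example PF = {example_pf:.0f}")

# Order-of-magnitude PFs reported for the most protected helical amides.
for subdomain, pf in (("NTD", 1e2), ("CTD", 1e3)):
    dg = dg_unfolding_from_pf(pf, temp_c=15.0)
    print(f"{subdomain}: PF ~ {pf:.0e} -> dG_unfolding ~ {dg:.1f} kcal/mol")
```

With PFs of ~ 100 and ~ 1000 this gives approximately 2.6 and 4.0 kcal/mol, in line with the values stated in the text. Each 10-fold change in PF adds only RT ln(10), or about 1.3 kcal/mol, at this temperature.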
Furthermore, many amides throughout the NTD and CTD helical bundles of the Pax51-149/CD19 complex had PFs > 10^7, representing dramatic increases of at least 10^5- or 10^4-fold, respectively, relative to the free protein. These data hint that the stabilities of the helical bundles become similar when bound to cognate DNA. However, due to the very slow exchange behavior of the PD/DNA complex, I can only provide minimum PFs for the most protected amides. Therefore, I cannot rule out that the subdomains have different stabilities in the bound state. In addition, exchange could occur within the bound state and/or via transiently unbound forms of the protein. Nevertheless, these HX measurements demonstrate that Pax51-149 is dramatically stabilized upon binding a cognate DNA sequence. Studies using molecular dynamics (MD) simulations below provide further insight into the dynamic properties of the subdomains in their DNA-bound state.

The NTD is more stable to chemical and thermal denaturation than the CTD

The HX experiments described above revealed that the subdomains have distinct biophysical properties. To further examine this, I used CD spectroscopy to conduct thermal and chemical denaturation studies. As shown in Figure 2.8a, I found that both the NTD and CTD had CD spectra characteristic of α-helical structures with ellipticity minima at 222 nm, as well as random coil character. The latter I attribute to the linker and β-hairpin regions, which are unfolded in the DNA-free state. I used the CD signal at 222 nm to monitor the unfolding of the subdomains as a function of temperature. Heat denaturation of both the NTD and CTD was a reversible process with little hysteresis (not shown). Surprisingly, the NTD had a significantly higher midpoint unfolding temperature (TM) of 67 ± 0.5 °C relative to the CTD, which had a TM of 58 ± 0.5 °C (Figure 2.8b, Table 2.1).
However, the unfolding transition of the CTD was more cooperative than that of the NTD, which showed a more gradual loss in α-helical content as the temperature was increased. The fitted ΔH0unfold values for the NTD and CTD were 30 ± 1 kcal/mol and 41 ± 2 kcal/mol, respectively, indicating that the NTD requires less heat to denature, in spite of its higher TM. These values are affected by the change in heat capacities upon unfolding, which were not determined (see below), and therefore represent estimates of ΔH0unfold.

In order to obtain a ΔG0unfold under conditions to match the HX experiments (15 °C), it is necessary to know the difference in the heat capacities of the folded and unfolded states of the proteins (ΔCp0). This can often be extracted from thermal denaturation curves measured as a function of sample pH [135]. Unfortunately, over the pH range of 5.6 to 7.8, the TM and ΔH0unfold values of the subdomains did not change significantly, thus precluding determination of their ΔCp0 values. Although not pursued during my thesis studies, differential scanning calorimetry is an alternative approach for obtaining these thermodynamic parameters.

In parallel, I studied protein stability using guanidinium hydrochloride (GuHCl) as a chemical denaturant (Figure 2.8c). The midpoint concentrations of GuHCl required to obtain equilibrium populations of 50 % unfolded protein were found to be ~ 2.7 M and ~ 1.9 M for the NTD and CTD, respectively. The ΔG°unfold values of the NTD and CTD were calculated to be 2.9 ± 0.1 kcal/mol and 2.4 ± 0.1 kcal/mol, respectively, by extrapolating ΔGunfold measurements at each denaturant concentration to GuHCl-free conditions (Figure 2.8c, Table 2.1). Therefore, the NTD was also more stable than the CTD to chemical denaturation, as measured by loss in secondary structure.
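The linear extrapolation described above can be illustrated with a short calculation. This is a minimal sketch and not the analysis used for Figure 2.8c: it assumes a two-state model in which ΔGunfold decreases linearly with [GuHCl], takes the extrapolated ΔG0unfold values quoted above together with the slopes of those lines (the m-values listed in Table 2.1), and checks that the implied midpoints, ΔG0unfold/m, agree with the ~ 2.7 M and ~ 1.9 M read from the denaturation curves. The function names and the default temperature of 25 °C are assumptions made for illustration.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def delta_g_unfold(dg0, m_value, denaturant_molar):
    """Linear extrapolation model: dG([D]) = dG0 - m * [D], in kcal/mol."""
    return dg0 - m_value * denaturant_molar

def midpoint(dg0, m_value):
    """Denaturant concentration (M) at which dG = 0, i.e. 50 % unfolded."""
    return dg0 / m_value

def fraction_unfolded(dg0, m_value, denaturant_molar, temp_k=298.15):
    """Two-state fraction unfolded, using K_unfold = exp(-dG / RT).
    temp_k defaults to 25 C, which is an assumption of this sketch."""
    dg = delta_g_unfold(dg0, m_value, denaturant_molar)
    k_unfold = math.exp(-dg / (R_KCAL * temp_k))
    return k_unfold / (1.0 + k_unfold)

# (dG0 in kcal/mol, m-value in kcal/mol-M), taken from Table 2.1
subdomains = {"NTD": (2.9, 1.06), "CTD": (2.4, 1.24)}

for name, (dg0, m_value) in subdomains.items():
    print(f"{name}: implied midpoint = {midpoint(dg0, m_value):.2f} M GuHCl")
```

Under these assumptions the implied midpoints are about 2.7 M for the NTD and 1.9 M for the CTD, so the tabulated ΔG0unfold and m-values are mutually consistent with the midpoints reported in the text.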
The m-value of unfolding [136], which is equal to the negative slope of ΔGunfold versus [GuHCl], was larger for the CTD than for the NTD (1.24 kcal/mol-M versus 1.06 kcal/mol-M). This value is typically related to the change in accessible surface area of the protein upon unfolding [136]. The fact that the m-value is greater in the case of the CTD may indicate that it has a more extensive hydrophobic core than the NTD. This is in agreement with qualitative observations of side chain packing in the crystal structures, which show that the CTD has more aliphatic side chains mediating internal contacts (not shown).

The ΔG0unfold values determined for the NTD by HX and by chemical denaturation are comparable (Table 2.1), suggesting that they are reporting the same conformational equilibrium. In contrast, the ΔG0unfold value determined for the CTD by HX appears anomalously high. This is somewhat difficult to reconcile, and may imply that the predominantly unfolded state of the CTD still contains some residual structure under the HX experimental conditions that leads to increased protection from exchange.

In summary, whereas the NTD requires higher temperatures and denaturant concentrations to unfold, the transition is broader and less cooperative than that observed for the CTD. These results suggest that the NTD is more dynamic than the CTD, and are consistent with the NMR-derived HX data showing that the NTD readily undergoes conformational fluctuations to enable solvent contact by its amide hydrogens [40].

Figure 2.8: The NTD of Pax5 is more resistant to heat and chemical denaturation than the CTD. (a) CD spectra of purified Pax51-92 and Pax576-149 proteins (10 µM) in NMR sample buffer (see Methods) at pH 6.5 and 25 °C. Both subdomains exhibited characteristic CD spectra of helical and disordered regions. (b) The CD signal at 222 nm was monitored as the sample temperature was increased gradually from 25 °C to 95 °C.
The resulting curves were fit to determine the indicated TM values and the enthalpies of denaturation, ΔH0unfold (Table 2.1). Relative to the NTD, the CTD unfolds at a lower TM, yet in a more cooperative manner and with a higher ΔH0unfold. (c) The CTD is also more sensitive to GuHCl denaturation. The ellipticity at 222 nm was monitored as a function of denaturant concentration (left panel). The midpoint GuHCl unfolding concentrations, [GuHCl]50%, were ~ 1.9 M for the CTD and ~ 2.7 M for the NTD. A plot of ΔGunfold at each denaturant concentration (right panel), derived from the fraction unfolded, allows extrapolation to the ΔG0unfold under non-denaturing conditions. From these calculations, the ΔG0unfold values at 0 M GuHCl are estimated to be 2.9 and 2.4 kcal/mol for the NTD and CTD, respectively.

Table 2.1: Midpoint unfolding temperatures and thermodynamic parameters of unfolding of the subdomains of Pax51-149 as determined by CD and HX.

Pax5 subdomain | TM (°C) a | ΔH0unfold (kcal/mol) a | ΔG0unfold (kcal/mol) b | m-value (kcal/mol-M) b | ΔG0unfold (kcal/mol) c
NTD | 67.0 ± 0.5 | 30 ± 1 | 2.9 ± 0.1 | 1.06 ± 0.04 | 2.6 ± 0.4
CTD | 58.0 ± 0.5 | 41 ± 2 | 2.4 ± 0.1 | 1.24 ± 0.06 | 4.0 ± 0.5

a Derived from CD-monitored heat denaturation experiments. Errors were estimated from goodness of fit using GraphPad Prism.
b Derived from extrapolation of ΔGunfold values to 0 M GuHCl using CD spectroscopy. The errors correspond to standard deviations derived from linear regression analysis.
c Derived from HX measurements, assuming the EX2 limit [137], and by averaging the largest protection factors found in the α-helices. Errors were estimated using the standard deviation of these values.

Amide 15N relaxation experiments describe sub-nanosecond timescale motions in the PD of Pax5

To characterize the sub-nanosecond timescale dynamics of the PD, I collected amide 15N relaxation data (T1, T2, and heteronuclear NOE) of Pax51-149 in the absence and presence of DNA (Figure 2.9).
In its free form, amides throughout the helical bundle subdomains had relatively uniform T1 and T2 lifetimes and heteronuclear 15N-NOE values of ~ 0.75. This is indicative of well-defined helical structures and limited motions of the 1HN-15N bonds on the sub-nanosecond timescale [138]. On the other hand, amides within the N-terminal ~ 30 residues and linker regions of free Pax51-149 showed distinctly long amide T2 lifetimes and low or negative heteronuclear NOE values. Together with their random coil chemical shifts, low RCI-S2 values, and PFs ~ 1, these data clearly demonstrate that these regions of Pax51-149 are conformationally disordered on the sub-nanosecond timescale in the absence of DNA. In contrast, when bound to DNA, the relaxation properties of the linker and the N-terminal residues more closely match those of the helical subdomains. Thus, in contrast to the "beads-on-a-string" behavior of free Pax51-149, the entire PD becomes well-ordered when in complex with DNA.

Figure 2.9: Sub-nanosecond timescale motions of Pax5 using amide 15N relaxation experiments. Shown are (a) T2, (b) T1 and (c) heteronuclear NOE relaxation data of free (blue) and DNA-bound (orange) Pax51-149, confirming that the linker and N-terminal residues are conformationally mobile in the absence of DNA, with unusually long T2 lifetimes and low or negative NOE values. However, these regions become ordered when bound, with relaxation parameters similar to those of the NTD and CTD helical amides. Asterisks denote clipped histogram bars for residues Asn148 and Gln149 with T2 (and NOE) values of 0.44 s (-0.6) and 0.80 s (-0.8) for the free protein and 0.80 s (-0.3) and 1.2 s (-0.8) for the complex, respectively.

Fitting the relaxation data (T1, T2, 15N-NOE) for the most ordered amides in Pax51-149 to the model-free formalism using Tensor2 [139] yielded rotational correlation times for isotropic global tumbling of 8.9 and 8.2 ns for the NTD and CTD, respectively.
Although the assumption of isotropic rotation is an oversimplification, these values are consistent with the similar masses of the NTD and CTD regions within the ~ 17 kDa Pax51-149 protein fragment, and the fact that the two subdomains are separated by a flexible linker. Theoretical calculations for the separate subdomains based on the crystal structure (PDB: 1MDM) using hydroNMR [140] yielded correlation times of 4.3 and 5.2 ns for the NTD and CTD, respectively. On the other hand, the full PD was predicted to have a correlation time of 19 ns. These data are consistent with the free PD having rotational diffusion motions between those of two completely independent subdomains (~ 4 - 5 ns) and a rigid body (i.e. the PD in the DNA-bound conformation, ~ 19 ns). In the bound state we obtained a correlation time of 24 ns for a near axially symmetric prolate ellipsoid, with the z-axis lying along the DNA. This value is compatible with the increased mass of ~ 32 kDa for the 1:1 Pax51-149/DNA complex.

Molecular dynamics simulations shed light on the dynamic properties of the subdomains

To better understand the dynamic nature of the subdomains, we ran MD simulations on the core helical bundle structures without the flexible termini, Pax534-77 (NTD) and Pax592-142 (CTD). These simulations were based on the crystal structure of Pax5 determined in complex with DNA and Ets1 (PDB: 1MDM) [12], after removal of both the DNA and Ets1 molecules, followed by energy minimization. The simulations were run for 920 ns and the free subdomains remained stable throughout (not shown). Not surprisingly, we found that the average Amber B-factor (ABF) for backbone Cα atoms was lower for those in helices and higher for those in the interconnecting loops of the subdomains (Figure 2.10). The ABF value is related to the mean-squared deviation of atomic positions during the simulation, and is an indication of motions involving the backbone Cα atom.
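To illustrate how a B-factor-like quantity relates to positional fluctuations, the sketch below converts the mean-squared fluctuation of a single atom into an isotropic B-factor using B = (8π²/3)⟨Δr²⟩. That this is exactly the convention behind the ABF values reported here is an assumption (it is the usual definition of an isotropic B-factor); the toy coordinates and the function names are hypothetical, and the frames are assumed to have been superimposed beforehand.

```python
import math

def mean_squared_fluctuation(positions):
    """Mean-squared deviation (A^2) of one atom from its average position.

    positions: list of (x, y, z) tuples in A, one per pre-aligned frame.
    """
    n_frames = len(positions)
    mean = [sum(p[k] for p in positions) / n_frames for k in range(3)]
    total = 0.0
    for p in positions:
        total += sum((p[k] - mean[k]) ** 2 for k in range(3))
    return total / n_frames

def b_factor(msf):
    """Isotropic B-factor (A^2) from a mean-squared fluctuation (A^2)."""
    return (8.0 * math.pi ** 2 / 3.0) * msf

def rmsf_from_b(b_value):
    """Root-mean-squared fluctuation (A) corresponding to a B-factor (A^2)."""
    return math.sqrt(3.0 * b_value / (8.0 * math.pi ** 2))

# Hypothetical four-frame trajectory of a single C-alpha atom
toy_frames = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
msf = mean_squared_fluctuation(toy_frames)
print(f"MSF = {msf:.2f} A^2, B = {b_factor(msf):.1f} A^2")
print(f"B = 15 A^2 corresponds to an RMSF of {rmsf_from_b(15.0):.2f} A")
```

Under this convention, a B-factor of 15 Å^2 corresponds to a root-mean-squared fluctuation of roughly 0.75 Å.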
Interestingly, the DNA recognition helix H3 of the NTD was found to be more dynamic on this timescale, relative to the recognition helix H6 in the CTD. While most residues in H6 had an ABF value below 15 Å^2, residues in H3 were mostly above this value. As discussed below, the subdomains have different DNA-binding specificities, and the differences in the nanosecond timescale motions of the recognition helices observed by MD simulations may provide clues as to why.

Figure 2.10: MD simulations shed light on the dynamics of the Pax5 subdomains. Simulations for the isolated subdomains were run for 920 ns each using AMBER 14 [141], in the absence of DNA. (a) The average Amber B-factor (ABF) of the Cα atoms for each residue was plotted as a function of residue number for Pax534-77 (NTD) and Pax592-142 (CTD). The helical boundaries of the subdomains are shown in yellow and red rectangles above the graph for the N-terminal and C-terminal regions, respectively. As expected, the helices of the subdomains exhibited lower fluctuations than the interconnecting loops. Interestingly, the ABF values of the NTD recognition helix H3 are larger than those of the corresponding H6 in the CTD. The dotted line represents an ABF value of 15 Å^2, and is shown as a visual aid for both subdomains. (b) The ABFs are mapped onto the crystal structures of the subdomains. Greater thickness and brighter colors in the cartoon representation of the helices indicate larger ABFs.

In addition, we performed cross-correlation analysis of the motions of these Cα atoms to examine how the helices of the subdomains couple their movements [142]. In this analysis, if two atoms move in exactly the same direction, they have a perfectly correlated value of +1. In contrast, atoms moving in opposite directions are defined as having a correlation value of -1.
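The +1/-1 convention above can be made concrete with a small example. The following sketch assumes the commonly used normalized covariance of atomic displacements, C_ij = ⟨Δr_i · Δr_j⟩ / sqrt(⟨Δr_i²⟩⟨Δr_j²⟩); whether the analysis cited as [142] uses precisely this form is not restated in this section, so it should be read as an illustration only. The toy trajectories and the function name are hypothetical.

```python
import math

def cross_correlation(traj_i, traj_j):
    """Normalized cross-correlation of the displacements of two atoms.

    traj_i, traj_j: lists of (x, y, z) tuples, one per pre-aligned frame.
    Returns a value between -1 (anti-correlated) and +1 (correlated).
    """
    n_frames = len(traj_i)
    mean_i = [sum(p[k] for p in traj_i) / n_frames for k in range(3)]
    mean_j = [sum(p[k] for p in traj_j) / n_frames for k in range(3)]
    dot = sq_i = sq_j = 0.0
    for p, q in zip(traj_i, traj_j):
        d_i = [p[k] - mean_i[k] for k in range(3)]  # displacement of atom i
        d_j = [q[k] - mean_j[k] for k in range(3)]  # displacement of atom j
        dot += sum(a * b for a, b in zip(d_i, d_j))
        sq_i += sum(a * a for a in d_i)
        sq_j += sum(b * b for b in d_j)
    return dot / math.sqrt(sq_i * sq_j)

# Hypothetical trajectories: atom_a oscillates along x
atom_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
# atom_b moves in step with atom_a, 5 A away
atom_b = [(5.0, 0.0, 0.0), (6.0, 0.0, 0.0), (5.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
# atom_c moves in the opposite direction to atom_a
atom_c = [(5.0, 0.0, 0.0), (4.0, 0.0, 0.0), (5.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
# atom_d oscillates along y, perpendicular to atom_a
atom_d = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (0.0, -1.0, 0.0)]

print(cross_correlation(atom_a, atom_b))  # in-step motion
print(cross_correlation(atom_a, atom_c))  # opposite motion
print(cross_correlation(atom_a, atom_d))  # perpendicular motion
```

In this toy case the in-step pair gives +1, the opposing pair gives -1, and the perpendicular pair gives 0, which is the intermediate situation that the correlation maps display between the two extremes.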
We found that in the CTD, the helices forming the subdomain had motions that were more strongly coupled, as indicated by the relatively large number of residues exhibiting positive correlation within the same helices (Figure 2.11a). However, in the case of the NTD, the positive correlations within helices were not as pronounced, particularly for longer-range (4 - 5 residues) associations. In fact, the recognition helix H3 of the NTD is divided into two distinct regions by this analysis, due to a kinking motion occurring at Gly70 (Figure 2.11b). As a result, the N- and C-terminal portions of the recognition helix H3 move in different directions. The flexible nature of the glycine residue, with its wider range of allowed backbone torsion angles, may facilitate this motion. In contrast, the CTD does not contain any glycine residues in its sequence. Altogether, the dynamic behaviors of the subdomains, in particular those observed in the DNA recognition helices H3 and H6, are distinct on the timescale observed by the MD simulations.

Figure 2.11: Cross-correlation analysis of the isolated subdomains explores coupled motions of the NTD and CTD. (a) The cross-correlation of the Cα atoms for each residue was plotted as a function of residue number in the NTD (left) and the CTD (right). The color gradient from dark red to dark blue corresponds to correlation values linearly scaled to the maximum and minimum values from +1 (positively correlated) to -1 (negatively correlated). Motions in helices of the NTD are more poorly correlated than those in the CTD, as indicated by the number and extent of positive correlations in these regions. The helical regions are indicated with black boxes in each case. (b) The DNA recognition helix H3 of the NTD undergoes a kinking movement mediated by Gly70, which likely results in the negative correlation observed between the N- and C-terminal portions of the helix.
Upon binding DNA, the dynamic properties of the subdomains are differentially dampened

To study the DNA-bound complex by MD, I chose the high-affinity CD19 DNA sequence as used for NMR studies above (Table 2.2, Figure 2.5). This simulation was run for 920 ns and allowed investigation of the motions in the DNA-bound state of Pax5 (Figure 2.12). Consistent with the NMR studies described above, we found that motions throughout the PD became significantly dampened upon association with DNA. In addition, we found that motions in the NTD were reduced to a greater extent than those in the CTD, relative to the free state of the subdomains (Figure 2.12). This finding indicates that upon binding, the NTD undergoes greater changes in structural dynamics, and is consistent with thermodynamic measurements of DNA binding (below).

Finally, we also analyzed how motions in the PD are correlated in the DNA-bound state (Figure 2.13). Overall, we found that motions in all helices were more strongly coupled relative to those in the free subdomains, which is consistent with increased structural definition in the complex. For example, motions within the helices forming the subdomains, including helix H3 of the NTD, had strong positive correlations. This is indicative of reduced dynamics in the subdomains and consistent with all the experimental results shown in this chapter. The analysis also highlights regions in the PD that become specifically coupled in the DNA-bound state. For example, the loop separating strands S1 and S2 in the β-hairpin had strong positive correlations with the C-terminus of H2. Upon folding, the β-hairpin is in close proximity to this region (Figures 2.12b and 2.13), suggesting that H2 may contribute to the stability of this structure, aside from favorable contributions due to contacts with DNA. In addition, the N-terminus of the linker and the loop leading to H1 are positively correlated and also found in close proximity in the DNA-bound state (Figures 2.12b and 2.13).
In summary, the results in this section shed light on the structural and dynamic properties of the free and DNA-bound PD of Pax5, and highlight the changes that occur upon binding. I also showed that the two helical subdomains have distinct dynamic and biophysical properties, which, as explored below, are related to their distinct DNA-binding behaviors.

Figure 2.12: MD simulations investigating the DNA-bound state of the PD of Pax5. A 920 ns MD simulation of the DNA-bound Pax5 PD (residues 19-142) complex was run. Motions throughout the PD, and in particular the NTD, become significantly dampened upon formation of the complex. (a) In blue and orange are the Amber B-factors (ABFs) of Cα atoms of the isolated free subdomains and the DNA-bound full-length PD, respectively, plotted as a function of the residue number. The free subdomain boundaries are as in Figure 2.10. In the bound state, Pax5 residues 30-74 (NTD) and 92-141 (CTD) are shown. The linker was omitted from the analysis for ease of comparison with the free subdomains. (b) The motions are plotted graphically in the model of the Pax5 PD/CD19 complex. Greater thickness and brighter colors in the cartoon representation of the helices indicate larger ABFs. The secondary structure elements are indicated. Dashed circles highlight regions with coupled motions in the DNA-bound state shown in Figure 2.13.

Figure 2.13: Cross-correlation analysis of the DNA-bound state of the Pax5 PD. The cross-correlation of motions in the PD/CD19 complex for residues 19-142 was plotted as a function of the residue number. As in Figure 2.11, the color gradient from dark red to dark blue corresponds to correlation values ranging from +1 (positively correlated) to -1 (negatively correlated). This analysis highlights possible stabilizing contacts between regions of the PD specific to the bound state.
For example, the β-hairpin region is positively correlated to the C-terminus of H2, hinting at possible stabilizing contacts in the DNA-bound state. In addition, the loop leading to H1 and the N-terminus of the linker are positively correlated. These regions are indicated as dashed circles in the figure, and highlighted in the cartoon structure shown in Figure 2.12 above. The secondary structure elements are indicated with black boxes as in Figure 2.11.

2.3.2 Mechanisms of DNA binding by Pax5

Both subdomains of Pax51-149 contribute to overall binding affinity towards CD19 DNA

To determine the relative contribution of each subdomain to DNA binding, I measured the equilibrium dissociation constants (KD values) of Pax51-149, Pax51-92, and Pax576-149 for the full CD19 DNA or its half-sites, CD19-N and CD19-C (Figure 2.14 and Table 2.2). The latter were defined based on approaches including sequence comparisons, mutagenesis, and chemical modification studies from previous reports [81, 143]. Using an electrophoretic mobility shift assay (EMSA), the KD value of Pax51-149 for CD19 DNA was determined to be 5 ± 2 nM (Table 2.3, Figure 2.15). In contrast, the individual Pax51-92 and Pax576-149 fragments bound their respective half-site DNAs with KD values of 2.7 ± 0.9 µM and 13 ± 3 µM, respectively, as measured by ITC (Table 2.3, Figure 2.16). The change of more than 3 orders of magnitude in the KD value for Pax51-149 versus those for the two fragments indicates that both contribute to the overall affinity for the full-length CD19 DNA. This is expected for multivalent interactions involving the bipartite PD and an extended DNA sequence. Surprisingly, Pax51-92 did not measurably bind the CD19-C half-site, whereas Pax576-149 bound both the CD19-N (12 ± 1 µM) and CD19-C (13 ± 3 µM) half-sites with similar moderate affinities (Table 2.3, Figure 2.16).
As discussed below, the DNA-binding specificity of Pax5 appears to be set by the NTD, which discriminates between cognate and non-specific binding sites.

Figure 2.14: Schematic representation of PD protein segments used for DNA-binding studies. Shown diagrammatically are the different fragments of the Pax5 PD used to dissect its DNA-binding mechanisms. In blue is the full-length Paired domain (Pax51-149). Regions including only the N-terminal subdomain (NTD) or C-terminal subdomain (CTD) are shown in yellow and red, respectively. The top cartoon indicates the secondary structural elements found in the bound state of the PD.

Table 2.2: Oligonucleotides used for DNA-binding studies.

Name | Sequence (top strand) | Sequence (bottom strand) | Length
CD19 a | 5'-CGGTGGTCACGCCTCAGTGCCCCAT | 3'-GCCACCAGTGCGGAGTCACGGGGTA | 25
CD19-N a | 5'-GGTGGTCACGCC | 3'-CCACCAGTGCGG | 12
CD19-N+ a | 5'-GGTGGTCACGCCTCAGTG | 3'-CCACCAGTGCGGAGTCAC | 18
CD19-C a | 5'-TCAGTGCCCCAT | 3'-AGTCACGGGGTA | 12
mb-1 b | 5'-GTGCCGGAGATGGGCTCCAGTGGCCCT | 3'-CACGGCCTCTACCCGAGGTCACCGGGA | 27
mb-1-N b | 5'-GAGATGGGCTC | 3'-CTCTACCCGAG | 12
mb-1-C b | 5'-GCTCCAGTGGCC | 3'-CGAGGTCACCGG | 12
GTCACTCAG | 5'-GTTTTCCAAAACCTTTTCCAAAAG | 3'-CAAAAGGTTTTGGAAAAGGTTTTC | 24
G(T)5(A)5C | 5'-GTTTTTAAAAAC | 3'-CAAAAATTTTTG | 12
T(G)5(C)5A | 5'-TGGGGGCCCCCA | 3'-ACCCCCGGGGGT | 12

a Corresponding to CD19-2_Ains [81]. The approximate half-sites for the NTD and CTD are denoted as -N and -C, respectively. Methylated G residues that disrupt binding are highlighted in bold blue [81].
b Corresponding to the mb-1 promoter sequence in the Pax5/Ets-1/DNA complex [12]. Base pairs contacted directly in the major groove by Pax5 are highlighted in bold red.

Figure 2.15: Quantification of the interaction between Pax51-149 and full-length CD19 DNA. (Left) EMSA used to quantify the interaction.
The lower band corresponds to free Alexa Fluor 647 fluorescently-labeled CD19 DNA and the upper band corresponds to the Pax51-149/CD19 complex. The resulting binding curve derived from this gel is shown on the right. The fraction bound was calculated as the ratio of band intensity of the Pax51-149/DNA complex to total DNA band intensities, and plotted as a function of total protein concentration (right). Two independent data sets, each measured twice on separate gels, were fit to a simple 1:1 binding model and the results were averaged to yield the reported KD value ± standard deviation of 5 ± 2 nM. The assay was carried out in 20 mM MES, 100 mM NaCl, 6 mM MgCl2, 6 mM DTT, 0.2 mM EDTA, 200 µg/mL bovine serum albumin and 10 % glycerol at pH 6.5 and 4 °C.

Figure 2.16: The subdomains exhibit different sequence preferences for CD19 half-sites. The DNA-binding subdomains of Pax5 have similar affinities for their corresponding CD19 half-sites. However, the CTD can associate with both half-sites, whereas the NTD only binds its cognate CD19-N sequence. The equilibrium dissociation constants for (a) Pax51-92 and (b) Pax576-149 towards the CD19-N and CD19-C half-sites were measured by ITC experiments. The top panels contain the raw buffer-corrected data with the heats produced at each injection of concentrated protein into DNA solution. The bottom panels show the corresponding Wiseman plots of integrated heats. The curves were fit with Origin to a simple 1:1 binding model and the resulting values for dissociation constant (KD), enthalpy change (ΔH0), and entropy change (ΔS0) are indicated on the graph. The stoichiometry was set to 1 by adjusting the protein concentration of the subdomains. This allowed reliable KD determinations (see Methods). Addition of Pax51-92 to CD19-C DNA yielded very small heat changes and the titration data were not fit.
These measurements were carried out in 20 mM MES, 100 mM NaCl, 0.5 mM EDTA, 2 mM DTT, and 6 mM MgCl2 at pH 6.5 and 25 °C.

DNA binding was also monitored by 15N-HSQC experiments. Pax51-149 bound the CD19 DNA sequence in the slow exchange limit (bound spectrum shown in Figure 2.5a and Appendix A4). That is, the exchange rate constant between the free and bound states of the protein (kex = kon[DNA] + koff) was much smaller than the chemical shift difference between these two states, |Δω| (reviewed in [138]). As a result, the 1HN-15N signals of the free protein disappeared upon titration with DNA, while new signals from the bound protein concomitantly appeared. Such tight binding effectively precludes the determination of a KD value by NMR spectroscopy. However, it is consistent with the KD value of 5 nM measured by EMSA. Upon titration of Pax51-92 with its CD19-N half-site, many amides exhibited binding in the intermediate-slow exchange regime (Figure 2.17a). With kex ≲ |Δω|, this behaviour is characterized by moderate shifting and significant broadening of the 1HN-15N signals (in most cases to the point of disappearance), followed by sharpening and reappearance with new chemical shifts over the course of the titration. Such exchange broadening also precluded the extraction of a KD value. However, given the weak affinity measured by ITC (KD ~ 3 µM) for the Pax51-92/CD19-N interaction (Table 2.3), severe broadening was somewhat unexpected. As explained below, this may be the result of large chemical shift changes (|Δω|), and relatively slow association and dissociation kinetics (kon and koff). In contrast, Pax576-149 bound the CD19-C half-site in the fast-intermediate exchange regime where kex ≳ |Δω| (Figure 2.17b). This is characterized by moderate broadening and progressive changes of amide signals from their unbound position as DNA is initially added, followed by the sharpening of amide signals at the bound chemical shift position as saturation is reached.
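In the fast exchange regime, titrations of this kind are commonly analysed by treating the observed chemical shift change as the population-weighted average of the free and bound shifts. The sketch below illustrates that idea with the standard quadratic solution of a 1:1 binding equilibrium. It is an assumption that a model of this form applies to the data here, and the protein concentration, the maximal shift change, and the function names are hypothetical; only the KD of 13 µM is taken from the ITC result quoted above for Pax576-149 and CD19-C.

```python
import math

def fraction_bound(protein_total, dna_total, kd):
    """Fraction of protein in complex for P + D <-> PD (1:1 binding).

    All three arguments must share the same concentration units.
    """
    b = protein_total + dna_total + kd
    complex_conc = (b - math.sqrt(b * b - 4.0 * protein_total * dna_total)) / 2.0
    return complex_conc / protein_total

def observed_shift_change(protein_total, dna_total, kd, delta_max):
    """Fast-exchange shift change: population-weighted fraction of delta_max."""
    return delta_max * fraction_bound(protein_total, dna_total, kd)

protein_um = 100.0    # hypothetical total protein concentration, in uM
kd_um = 13.0          # ITC-derived KD for Pax576-149/CD19-C, in uM
delta_max_ppm = 0.5   # hypothetical shift change at saturation, in ppm

for molar_ratio in (0.0, 0.5, 1.0, 2.0, 4.0):
    dna_um = molar_ratio * protein_um
    shift = observed_shift_change(protein_um, dna_um, kd_um, delta_max_ppm)
    print(f"DNA:protein = {molar_ratio:.1f}, shift change = {shift:.3f} ppm")
```

Fitting measured shift changes to a curve of this shape, with KD and the maximal shift change as free parameters, is one way in which a dissociation constant can be extracted from a fast-exchange titration.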
In reasonable agreement with ITC data, fitting of the NMR-monitored titration curves (not shown) yielded a KD value of 26 ± 5 µM for Pax576-149 with the CD19-C half-site (Table 2.3).

Figure 2.17: The subdomains of Pax51-149 exhibit different binding properties for DNA half-sites. Shown are 15N-HSQC-monitored titrations of the CD19 half-sites into (a) 15N-labeled Pax51-92 and (b) Pax576-149, as well as the mb-1 promoter half-sites into (c) 15N-labeled Pax51-92 and (d) Pax576-149. Pax51-92 bound CD19-N in the slow-intermediate exchange regime, such that some amide signals initially shifted and broadened to disappearance, and then reappeared with new chemical shifts, shown enclosed in dotted circles. In contrast, Pax51-92 bound the mb-1-N half-site weakly, exhibiting signal broadening, but no substantial chemical shift changes even after addition of more than 3 molar equivalents of DNA. Pax576-149 bound both half-sites in the fast exchange limit, allowing determination of the KD values listed in Table 2.3. The molar ratios of protein:DNA are indicated by the color codes, and selected assignments are provided. Solid arrows indicate the amide chemical shift changes from the free to the bound states. Dotted arrows are included for amides that initially disappear and reappear outside the spectral window shown.

Table 2.3: Equilibrium dissociation constants (KD values) for Pax5-DNA interactions.

Protein | CD19 (25 bp) | mb-1 (27 bp) | GTCACTCAG (24 bp)
Pax51-149 | 5 ± 2 nM b | 2.4 ± 0.5 nM e | binding detected d, f

Protein | CD19-N | CD19-N+ | CD19-C | mb-1-N | G(T)5(A)5C | T(G)5(C)5A
Pax51-92 | 2.7 ± 0.9 µM c, h | 0.95 ± 0.1 µM c, g | weak c, d | weak d | weak d | weak d
Pax51-77 | N.D. | 2.4 ± 0.2 µM c, g | N.D. | N.D. | N.D. | N.D.
Pax532-92 | 85 ± 30 µM d; 63 ± 0.9 µM c, g | 34 ± 4 µM c, g | N.D. | N.D. | N.D. | N.D.

Protein | CD19-N | CD19-C | mb-1-C | G(T)5(A)5C | T(G)5(C)5A
Pax576-149 | 10 ± 1 µM c, g, h | 13 ± 3 µM c, h; 26 ± 5 µM d | 380 ± 120 µM d | 380 ± 95 µM d | 155 ± 20 µM d

a Sequences listed in Table 2.2.
b Determined by EMSA.
c Determined by ITC at 25 °C.
d Determined by NMR spectroscopy at 25 °C; weak binding is estimated to be > 500 µM.
e From Fitzsimmons et al. [79] using a 34 bp DNA duplex.
f Exchange broadening precluded the estimation of a KD value.
g Derived from one ITC measurement. Errors represent goodness of fit.
h These values differ somewhat from those previously published in Perez-Borrajero, C. et al., 2016, J. Mol. Biol. because a more reliable method of fitting the ITC data was employed after publication (see Methods), and more measurements of these interactions were conducted.
N.D.: not determined

My results also indicated that, whether separated or linked, the NTD and CTD bound the CD19 half-sites via the same general interfaces as identified in the Pax5/Ets-1/DNA ternary complex by X-ray crystallography (Figure 2.18) [12]. Although I did not assign the spectrum of the Pax51-92/CD19-N complex, new dispersed signals appearing around 9 to 10 ppm upon addition of DNA matched those with the largest CSPs in the N-terminal region of the Pax51-149/CD19 complex (Figure 2.5a, 2.18a, and Appendix A4). Furthermore, a plot of the relative loss of amide signal intensities from the free state of Pax51-92 upon addition of CD19-N indicated that residues throughout the helical bundle and the preceding β-hairpin were perturbed (Figure 2.18a).

In the case of Pax576-149, most assignments for the bound state could be obtained by tracking 1HN-15N signals over the course of the titration. Amides showing the largest CSPs upon binding CD19-C also matched those most affected in the complex of Pax51-149 with CD19 (Figure 2.18b). Not surprisingly, the linker residues showed greater CSPs in the context of the full PD rather than the subdomains. For example, Lys87 had a CSP value of 0.82 ppm in the Pax51-149/CD19 complex (Figure 2.5c), but only 0.49 ppm in the Pax576-149/CD19-C complex (Figure 2.18b).
Thus, the high affinity of the PD in Pax51-149 for CD19 DNA arises from the combined binding of the NTD and CTD to their respective half-sites, augmented by positioning the ordered linker to lie along the intervening DNA minor groove.

Figure 2.18: The NTD and CTD of Pax5 contact specific and non-specific DNAs using similar binding interfaces. (a) Shown are the changes in unbound state signal intensities upon addition of ~ 1 molar equivalent of the indicated DNAs to Pax51-92. The dilution-corrected intensities are relative to initial reference spectra (I1:1/Iref). The amides affected by DNA span the helical bundle of Pax51-92, including the β-hairpin. Signals from many amides had different chemical shifts when bound to CD19-N, and hence unbound signals decreased to near baseline values. The mb-1-N and palindromic sequences had more moderate effects, yet the most perturbed amides still clustered near helix H3. (b) The CSPs of Pax576-149 after addition of the various DNAs to near saturation are plotted as a function of residue number. Residues in helix H4 and the preceding linker, helix H5, and the recognition helix H6 showed the greatest CSPs, thus defining the DNA-binding interface. Addition of CD19-C caused the largest CSPs, followed by the palindromic sequence T(G)5(C)5A. During the titrations with these sequences, we were unable to track residues Val90, Trp112, Ile138, and Arg140 due to the large chemical shift differences of their free versus bound forms. This is consistent with their behavior in the Pax51-149/CD19 complex and highlights the importance of these residues in contacting the DNA.

The subdomains of Pax51-149 contribute differently towards binding mb-1 DNA

The mb-1 promoter is a well-characterized binding site for the partnership of Pax5 and Ets-1 [12, 79, 144-146].
Although this DNA sequence does not conform closely to the Pax5 consensus sequence [81], upon addition of the 27 bp mb-1 DNA duplex to Pax51-149, I also observed slow exchange behavior in 15N-HSQC spectra (bound spectrum shown in Figure 2.19). Thus, consistent with a KD ~ 2.4 nM reported by Fitzsimmons et al. [79], Pax51-149 can bind this sequence with high affinity, even in the absence of Ets-1. I also investigated the binding of the mb-1 half-sites to the Pax5 fragments by NMR spectroscopy (Table 2.3). Pax51-92 interacted very weakly with the mb-1-N half-site, showing only small CSPs even in the presence of a 3-fold molar excess of DNA (Figure 2.17c). Due to these small spectral changes, I was unable to determine a KD value for the Pax51-92/mb-1-N interaction. Nevertheless, 1HN-15N intensity losses were observed for amides throughout the helical bundle, and the effect was most pronounced for those clustering near helix H3 (Figure 2.18a). This suggests that the recognition helix H3 is also involved in the weak association of Pax51-92 with mb-1-N. In contrast, Pax576-149 bound the mb-1-C half-site in the fast exchange regime (kex ≫ |Δω|) with no significant line broadening (Figure 2.17d). A KD value of 380 ± 120 µM was obtained by fitting the titration data (Table 2.3). Amides showing the largest CSPs mapped to the same surface of Pax576-149 affected by addition of the CD19-C half-site (Figure 2.18b), indicating that the CTD uses a common binding interface to interact with the mb-1 and CD19 DNAs. Although the CD19-C and mb-1-C sequences are more similar to one another than are the CD19-N and mb-1-N sequences (Table 2.2), the overall behaviours of the NTD and CTD towards these half-site DNAs are strikingly different. In particular, the NTD shows greater DNA sequence discrimination than the CTD.

Figure 2.19: Pax51-149 binds specific and non-specific DNAs.
Shown are overlaid 15N-HSQC spectra of 15N-labeled Pax51-149, free or in the presence of the DNA duplexes CD19, mb-1, and the 24 bp pseudo-palindrome G(T)4(C)2(A)4(C)2(T)4(C)2(A)4G (Table 2.2). The DNA to protein molar ratios were 1:1, 3:1, and 3.6:1, respectively. Binding occurred in the slow exchange limit for specific complexes involving CD19 and mb-1, whereas exchange broadening was observed over the course of the titration with the pseudo-palindromic sequence (Table 2.3). The spectra of the specific complexes are similar, showing well dispersed amide signals in the 9 to 10 ppm region not seen in the free protein. However, the pseudo-palindromic complex also showed chemical shift perturbations. This demonstrates that Pax51-149 associates with this non-specific DNA sequence, albeit more weakly than with specific DNAs. Although only the spectrum of Pax51-149/CD19 was assigned, representative signals within the dashed ovals likely arise from the same amides in all three complexes (Pax51-149/CD19, Pax51-149/mb-1, and Pax51-149/pseudo-palindrome) and localize to the CTD. On the other hand, 1HN-15N signals within boxes are similar in the specific Pax51-149/CD19 and Pax51-149/mb-1 complexes, but absent in the non-specific Pax51-149/pseudo-palindrome complex. These correspond mostly to amides in the NTD and linker regions. Thus, the CTD appears to bind all three DNAs similarly, whereas the NTD discriminates between specific and non-specific DNAs. All spectra were collected at reduced salt concentration (20 mM MES, 20 mM NaCl, 2 mM DTT, 0.5 mM EDTA, pH 6.5).

Contribution of the β-hairpin and linker to DNA binding

To assess the contribution of the N-terminal β-hairpin region of the PD to DNA-binding affinity, I also expressed 15N-labeled Pax532-92 lacking the residues required to form this hairpin structure. 
The amide 1HN-15N chemical shifts of this protein were almost identical to those of Pax51-92, indicating that the helical bundle remained stable in the absence of residues 1 to 31, which are disordered (not shown). Upon addition of CD19-N DNA to Pax532-92, the residues perturbed by binding were similar to those seen with Pax51-92 (Figure 2.20). However, the exchange behavior became faster, consistent with weaker affinity for DNA. Fitting the NMR-monitored titration curves yielded a KD of 85 ± 30 µM (Table 2.3). This is ~ 30-fold larger than the value of 2.7 ± 0.9 μM measured by ITC for Pax51-92 binding to the CD19-N DNA, and thus the N-terminal β-hairpin region of the PD indeed contributes to its net DNA binding affinity.

Figure 2.20: Deletion of the β-hairpin weakens DNA binding by the NTD. Shown are 15N-HSQC-monitored titrations of 15N-labeled Pax532-92 with (a) CD19-N and (b) non-specific palindromic DNA. Pax532-92 bound CD19-N in the near fast exchange limit, allowing the determination of a KD value of 85 ± 30 μM (averaged from the fit titration curves of 10 residues; Table 2.3). In contrast, Pax51-92 bound this specific half-site DNA with higher affinity (2.7 ± 0.9 μM) and in the intermediate-slow exchange regime (Figure 2.17). Both Pax51-92 and Pax532-92 only weakly interacted with the non-specific DNA, and thus the β-hairpin does not impair binding. In these spectra, the molar ratios of protein:DNA are indicated by the color codes, and selected assignments are provided. Solid arrows indicate the amide chemical shift changes from the free to the bound states. Dotted arrows are included for amides that initially disappear and reappear outside the spectral window shown.

To explore the contribution of the linker separating the subdomains to affinity, I chose a sequence with 6 additional base pairs relative to the CD19-N half-site. This sequence, denoted CD19-N+, guaranteed enough space for association of the extended linker (Table 2.2). 
Using ITC, I measured the KD of association of this DNA duplex to be 0.95 µM, 2.4 µM, and 34 µM for Pax51-92, Pax51-77 (lacking the linker), and Pax532-92 (lacking the β-hairpin), respectively (Table 2.3, Figure 2.21). Therefore, the linker and β-hairpin increase affinity for the CD19-N+ site by ~ 2.5-fold and ~ 35-fold, respectively. This is consistent with the ~ 30-fold increase in affinity afforded by the β-hairpin for the shorter CD19-N site (above). In addition, this suggests that the linker is not a passive spacer between the subdomains, but also contributes to DNA binding even in the absence of the CTD.

Figure 2.21: Contribution of the linker and β-hairpin to DNA binding. The equilibrium dissociation constants of Pax51-92, Pax51-77, and Pax532-92 towards the CD19-N+ DNA duplex were measured by ITC experiments (Table 2.3). The linker, deleted in Pax51-77, contributes ~ 2.5-fold to affinity, while the β-hairpin, deleted in Pax532-92, contributes ~ 35-fold. The raw ITC data were treated as in Figure 2.16. These measurements were carried out in 20 mM MES, 100 mM NaCl, 0.5 mM EDTA, 2 mM DTT, and 6 mM MgCl2 at pH 6.5 and 25 °C.

The two subdomains of Pax51-149 differ in non-specific DNA binding

Since the vast majority of genomic DNA does not correspond to Pax5 regulatory sites, I also investigated PD binding to non-specific DNA sequences by NMR spectroscopy. I initially used a 24 bp pseudo-palindrome (GT4C2A4C2T4C2A4G, Table 2.2) for studies with Pax51-149. Exchange broadening occurred over the course of the titration of unlabeled DNA duplex into 15N-labeled Pax51-149, demonstrating that the PD is able to bind DNA sequences that do not conform to a consensus site (bound spectrum shown in Figure 2.19). This broadening also precluded spectral assignments and the estimation of a KD value (Table 2.3). 
Nevertheless, in the presence of a 3.6-fold molar excess of this pseudo-palindromic DNA duplex, several amides had dispersed chemical shifts similar to those observed in the spectra of the CD19 and mb-1 complexes (Figure 2.19). Although only tentatively assigned, these amides were mostly located in the CTD. In contrast, the dispersed signals from NTD amides seen with the two specific complexes were absent in the spectrum of the non-specific complex (Figure 2.19). This indicates that the CTD primarily mediates binding of Pax51-149 to the pseudo-palindromic DNA.

To further dissect the contribution of the subdomains to non-specific DNA binding, I chose two simple 12 bp palindromic sequences for titrations with Pax51-92 and Pax576-149 (Table 2.2). Surprisingly, Pax51-92 exhibited minimal CSPs upon addition of a nearly 3-fold molar excess of either the 5’G(T)5(A)5C3’ or 5’T(G)5(C)5A3’ duplexes (Figure 2.22a). Accordingly, I estimate KD values greater than 500 µM for Pax51-92 with either of these palindromes (Table 2.3). Nevertheless, residues throughout the helical bundle region showed patterns of reduced amide signal intensities similar to those seen with the CD19-N and mb-1-N DNAs (Figure 2.18a). Therefore, Pax51-92 interacts with non-specific and specific DNA via the same canonical DNA-binding interface. This conclusion is also supported by the observation that the few amide signals with clear CSPs upon titration with either palindrome, such as Cys64, Ser61, and Leu69, map to this interface. Addition of the T(G)5(C)5A palindromic duplex to Pax532-92 lacking the β-hairpin region also did not result in any substantial spectral perturbations (Figure 2.22b). Parenthetically, this eliminates the formal possibility that residues 1-31 inhibit DNA binding by the NTD.

Figure 2.22: In contrast to the CTD, the NTD subdomain of Pax51-149 only weakly interacts with non-specific DNA.  
The addition of the two palindromic DNAs to (a, b) 15N-labeled Pax51-92 and (c, d) Pax576-149 was monitored by 15N-HSQC experiments. The protein:DNA molar ratios are indicated by the color codes, and selected assignments are provided. Pax51-92 was only modestly perturbed by the presence of excess DNA, with small amide intensity changes plotted in Figure 2.18a. On the other hand, Pax576-149 bound both palindromes in fast exchange and exhibited large amide CSPs, as shown in Figure 2.18b. Solid arrows show the direction of the amide chemical shift perturbation from free to bound states. Dotted arrows indicate amide shifts that cannot be tracked or reappear outside the spectral window shown. Fitting the latter titrations yielded the KD values in Table 2.3.

Unlike Pax51-92, Pax576-149 bound both palindromic DNAs in the fast exchange limit and exhibited large CSPs (Figure 2.22c, d). Fitting the titration data of Pax576-149 yielded KD values of 380 ± 95 µM for G(T)5(A)5C and 155 ± 20 µM for T(G)5(C)5A (Table 2.3). In addition, the patterns of CSPs exhibited by Pax576-149 upon addition of the two palindromic DNAs were similar to those due to binding the mb-1-C and CD19-C half-sites (Figure 2.18b), and the magnitude of the CSP correlated with the strength of the interaction (CD19-C > T(G)5(C)5A > mb-1-C ~ G(T)5(A)5C). This indicates that the CTD also binds specific and non-specific DNA using the same interface, encompassing the N-terminal portion of helix H4 and the entire recognition helix H6. As well, the KD values for these non-specific palindromic DNAs were only ~ 10- to 20-fold higher than for the cognate CD19-C half-site. Thus, in contrast to the NTD, the CTD exhibits only modest sequence specificity.

Thermodynamic parameters of DNA binding by the subdomains

In addition to KD (or equivalently, ΔG0), ITC provides a direct measure of the enthalpy change (ΔH0) and a calculated entropy change (ΔS0) for a binding equilibrium. 
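The relationships among these quantities can be illustrated with a short numerical sketch. It assumes the standard expressions ΔG0 = RT ln(KD) (with KD in molar units and a 1 M standard state) and ΔG0 = ΔH0 - TΔS0, and estimates ΔCp0 as the unweighted least-squares slope of ΔH0 versus temperature. The function names and the choice of R = 1.987 cal/mol-K are illustrative assumptions, not the fitting procedure of the ITC software; the input numbers are the rounded KD and ΔH0 entries of Table 2.4, so the output should only approximate the tabulated ΔS0 values and the fitted ΔCp0 values.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K); assumed value


def free_energy(kd_um, temp_c):
    """Standard binding free energy (kcal/mol) from dG = RT ln(KD).

    KD is given in micromolar and converted to molar (1 M standard state).
    """
    temp_k = temp_c + 273.15
    return R_KCAL * temp_k * math.log(kd_um * 1e-6)


def entropy(kd_um, dh_kcal, temp_c):
    """Entropy change (cal/mol-K) from dG = dH - T*dS."""
    temp_k = temp_c + 273.15
    return 1000.0 * (dh_kcal - free_energy(kd_um, temp_c)) / temp_k


def slope(xs, ys):
    """Unweighted least-squares slope of ys against xs."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


def heat_capacity(rows):
    """Heat capacity change (cal/mol-K) as the slope of dH versus temperature."""
    temps = [temp_c for temp_c, _, _ in rows]
    enthalpies = [dh for _, _, dh in rows]
    return 1000.0 * slope(temps, enthalpies)


# (temperature in deg C, KD in uM, dH in kcal/mol), rounded values from Table 2.4
TABLE_2_4 = {
    "NTD / CD19-N": [(15, 1.1, -27.0), (25, 2.7, -30.0), (35, 6.3, -33.0)],
    "CTD / CD19-C": [(15, 7.0, -5.1), (25, 13.0, -10.0), (35, 19.0, -13.0)],
}

if __name__ == "__main__":
    for name, rows in TABLE_2_4.items():
        for temp_c, kd_um, dh in rows:
            dg = free_energy(kd_um, temp_c)
            ds = entropy(kd_um, dh, temp_c)
            print(f"{name} {temp_c} C: dG = {dg:.1f} kcal/mol, dS = {ds:.0f} cal/mol-K")
        print(f"{name}: dCp = {heat_capacity(rows):.0f} cal/mol-K")
```

Because ΔS0 is obtained by difference, any uncertainty in KD and ΔH0 propagates into it, which is why it is described as a calculated rather than a directly measured quantity.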
Along with heat capacity changes (ΔCp0) from the temperature dependence of ΔH0, these thermodynamic parameters can help provide insights into factors such as conformational transitions and the burial of hydrophobic groups that underpin binding affinity and specificity [147, 148]. Therefore, I used this technique to investigate the association of the Pax5 subdomains with their respective CD19 half-sites. As summarized in Table 2.4 and Figure 2.23, the Pax5 subdomains showed very distinct thermodynamic behaviors. In the case of the NTD, binding is driven by large favorable enthalpy changes that counteract large entropy losses. In the case of the CTD, the compensating entropy and enthalpy contributions are smaller and, at 15 °C, binding of the CD19-C duplex is even slightly entropically favorable. As will be discussed below, this is consistent with a view that the more dynamic NTD undergoes a greater loss of conformational entropy upon DNA binding than does the more rigid CTD.

Figure 2.23: Thermodynamics of DNA binding by the Pax5 subdomains. (a) The temperature-dependent thermodynamic parameters for the binding of Pax51-92 (NTD; yellow) and Pax576-149 (CTD; red) to their corresponding CD19 DNA half-sites are shown in bar graph format (20 mM MES, 100 mM NaCl, 0.5 mM EDTA, 2 mM DTT, 6 mM MgCl2 at pH 6.5). (b) The heat capacity changes upon binding, ΔCp0, were calculated from the slopes of ΔH0 versus temperature plots. Linear regression analysis was used to estimate the error in the fit ΔCp0 values, and these are indicated on the plots.

Table 2.4: Temperature-dependent thermodynamic parameters for the Pax5 subdomains binding their CD19 half-site DNAs.    
Pax5 subdomain / DNA    Temperature (°C)    KD * (µM)       ΔH0 (kcal/mol)     ΔS0 (cal/mol-K)
NTD / CD19-N            15                  1.1 ± 0.1 a     -27.0 ± 0.2 a      -67 a
                        25                  2.7 ± 0.9 b     -30 ± 1 b          -75 ± 5 b
                        35                  6.3 ± 0.1 a     -33.0 ± 0.2 a      -84 a
CTD / CD19-C            15                  7.0 ± 0.5 a     -5.1 ± 0.1 a       6 a
                        25                  13 ± 3 b        -10 ± 1 b          -10 ± 4 b
                        35                  19.0 ± 0.7 a    -13 ± 0.2 a        -21 a

a Derived from one measurement. Errors correspond to goodness of fit for ΔH0 only.
b The averages and standard deviations of five measurements.
* Determined at pH 6.5, 100 mM NaCl.

Note: the stoichiometry value (N) was set to 1 by floating the protein concentrations of the subdomains. This analysis provided reliable fit parameters (see Methods).

For both subdomains, increasing temperature from 15 °C to 35 °C resulted in increasing KD values (Table 2.4, Figure 2.23). This, of course, is expected for exothermic binding equilibria. Heat capacity changes (ΔCp0) of -305 ± 10 and -405 ± 9 cal/mol-K for Pax51-92 (NTD) and Pax576-149 (CTD) binding to their respective CD19 half-sites were calculated from linear plots of ΔH0 versus temperature. This approach assumes that ΔCp0 is approximately constant over the small temperature range studied. Although difficult to interpret mechanistically, these large negative ΔCp0 values are consistent with those measured for other transcription factor-DNA binding interactions [149-151].

Ionic strength dependence of DNA binding by the Pax5 subdomains

Analyzing the dependence of the association constant (KA = 1/KD) on ionic strength is useful for dissecting the contributions of ionic and other contacts to DNA binding by transcription factors [152]. Therefore, I used ITC to measure the affinities of Pax51-92 and Pax576-149 for their respective CD19 half-sites as a function of NaCl concentration, while keeping the MgCl2 concentration constant at 6 mM. As expected, increasing the ionic strength resulted in weaker DNA binding (Figure 2.24a, Table 2.5). 
Furthermore, the NTD and CTD exhibited similar dependencies on the NaCl concentration, with log(KA) versus log[NaCl] plots having slopes of -3.1 ± 0.5 and -3.4 ± 0.5, respectively (Figure 2.24b). This slope, which corresponds to the net number of counter-ions released upon binding [152], indicates that the subdomains make a similar number of electrostatic contacts with their DNA half-sites. Although the inclusion of divalent Mg2+ ions in the binding reactions may confound the analysis due to their favorable interactions with DNA, such a result is consistent with the similar predicted isoelectric points of the subdomains (10.2 and 10.6 for the NTD and CTD, respectively). In addition, both subdomains have similar numbers of positively charged side chains in close proximity to the DNA phosphodiester backbone in the crystal structure of bound Pax5 (PDB: 1MDM).

However, the salt-independent component contributing to affinity is larger for the NTD, relative to the CTD, as evidenced by the different y-axis intercepts. Stated equivalently, the interaction between the CTD and CD19-C was significantly weaker at NaCl concentrations of 400 mM (KD ~ 1800 µM), relative to the NTD and CD19-N pair under the same conditions (KD ~ 120 µM) (Figure 2.24a, Table 2.5). At even higher ionic strengths (i.e. 500 mM NaCl), binding by the CTD was not detected, whereas the NTD measurably bound its DNA half-site (not shown). Electrostatic and ionic contacts therefore appear to be crucial in driving DNA recognition by the CTD. In contrast, although similarly affected by the salt concentration, DNA binding by the NTD also relies on additional contributions. Parenthetically, a comparison of 15N-HSQC spectra of the subdomains at 100 mM NaCl and 500 mM NaCl shows that both are well folded under these conditions, and thus these differences are not due to factors such as salt-induced denaturation (Appendix B).    
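The log(KA) versus log[NaCl] analysis described above can be sketched as a simple linear fit. The snippet below is a minimal illustration, assuming an unweighted least-squares fit to the rounded KD values listed in Table 2.5 (including the estimated ~ 1800 µM value for the CTD at 400 mM NaCl); the function names are hypothetical. Because the inputs are rounded and unweighted, the fitted slopes are expected to fall near, but not exactly on, the reported values of -3.1 ± 0.5 and -3.4 ± 0.5.

```python
import math


def linear_fit(xs, ys):
    """Return (slope, intercept) of an unweighted least-squares line."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    fit_slope = sxy / sxx
    return fit_slope, y_mean - fit_slope * x_mean


def salt_dependence(nacl_mm, kd_um):
    """Fit log10(KA) against log10([NaCl]), both expressed in molar units."""
    log_salt = [math.log10(c * 1e-3) for c in nacl_mm]
    log_ka = [-math.log10(kd * 1e-6) for kd in kd_um]  # KA = 1/KD
    return linear_fit(log_salt, log_ka)


# NaCl concentrations (mM) and rounded KD values (uM) from Table 2.5
NACL_MM = [100, 200, 300, 400]
KD_UM = {
    "NTD / CD19-N": [2.7, 8.7, 41, 120],
    "CTD / CD19-C": [13, 70, 320, 1800],  # the 400 mM value is an estimate
}

if __name__ == "__main__":
    for name, kds in KD_UM.items():
        fit_slope, intercept = salt_dependence(NACL_MM, kds)
        print(f"{name}: slope = {fit_slope:.2f}, intercept = {intercept:.2f}")
```

With concentrations in molar units, the intercept is the extrapolated log(KA) at 1 M NaCl, so a larger intercept for the NTD corresponds to a larger salt-independent contribution to affinity.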
Figure 2.24: The electrostatic contributions to DNA binding by the subdomains of Pax5. (a) The KD values for the Pax5 subdomains with their respective CD19 half-sites were measured with ITC as a function of ionic strength, set by the concentration of NaCl (20 mM MES, 0.5 mM EDTA, 2 mM DTT, 6 mM MgCl2 at pH 6.5 and 25 °C). The NTD has higher affinity for its half-site and relies more on non-electrostatic contacts than the CTD, for which DNA binding is almost completely abolished in 400 mM NaCl. Error bars correspond to the standard deviation of at least two measurements (see table below). (b) A plot of log(KA) versus log[NaCl] estimates the net number of counter-ions released upon binding. Both subdomains depend roughly equally on the ionic strength of the solution. However, the non-ionic contribution to binding, represented by the y-intercept, is stronger in the case of the NTD. The error estimates were derived from linear regression analysis of the data points shown.

Table 2.5: Ionic strength-dependent thermodynamic parameters for the Pax5 subdomains binding their CD19 half-site DNAs.

Pax5 subdomain / DNA    [NaCl] (mM)    KD * (µM)      ΔH0 (kcal/mol)    ΔS0 (cal/mol-K)
NTD / CD19-N            100            2.7 ± 0.9 a    -30 ± 1 a         -75 ± 5 a
                        200            8.7 ± 0.2 a    -28 ± 0.5 a       -72 ± 2 a
                        300            41 ± 4 a       -26 ± 1 a         -66 ± 3 a
                        400            120 ± 12 a     -23 ± 0.5 a       -60 ± 2 a
CTD / CD19-C            100            13 ± 3 a       -9.5 ± 1 a        -10 ± 4 a
                        200            70 ± 15 a      -8.6 ± 0.5 a      -9.5 ± 1 a
                        300            320 ± 80 a     N.D. b            N.D. b
                        400            ~ 1800 b       N.D. b            N.D. b

a The averages and standard deviations of at least two measurements by ITC.
b Due to weak binding, these values were either estimated or not determined (N.D.).
* Determined at pH 6.5, 25 °C.

Note: the stoichiometry value (N) was set to 1 by floating the protein concentrations of the subdomains. This analysis provided reliable fit parameters (see Methods).

Collectively, the results presented in this section show that the two subdomains have very distinct DNA-binding properties. 
The NTD recognizes specific DNA through enthalpically favorable contacts that offset relatively large entropic losses. These could reflect structural rearrangements and a dampening of dynamics accompanying complex formation. In contrast, DNA recognition by the CTD is much less specific and dependent primarily on electrostatic contacts. The distinct recognition mechanisms of the subdomains suggest they may have different roles in localizing the PD to DNA (more details in the Discussion). For example, the CTD likely allows for general localization of the PD on DNA, whereas the NTD sets the specificity for regulatory sites.

2.3.3 Beyond the Paired domain of Pax5

The partial homeodomain of Pax5 is disordered and does not bind DNA

Residues ~ 220-250 of human Pax5 share homology with the first helix of the HDs present in other Pax factors (Figure 2.3) [86]. However, this does not extend to the second and third helices that form the DNA-binding HTH motif. Furthermore, sequence-based secondary structure prediction algorithms indicate that this region of the protein should be either disordered or have low α-helical content (not shown). To experimentally determine whether Pax5 contains a fully folded HD, I expressed Pax5210-286, including the region of homology and adjacent residues, in Escherichia coli (E. coli). This protein fragment formed insoluble inclusion bodies. Upon isolation under denaturing conditions and subsequent removal of GuHCl, the protein was prone to aggregation. However, in slightly acidic solutions (pH ~ 5), the protein fragment was sufficiently soluble for NMR spectroscopic studies. I found Pax5210-286 to be disordered in the conditions tested (20 mM MES, 50 mM - 200 mM NaCl, 1 mM DTT, 1 mM EDTA, pH 5.1 - 6.5), as evidenced by the relatively narrow distribution of 1HN signals in its 15N-HSQC spectrum (Figure 2.25). The number of amide peaks present was consistent with the number of non-proline residues in Pax5210-286. 
I also observed that, while many peaks had comparable peak intensities, some residues were exchange-broadened, resulting in relatively weak signals. This indicated a lack of monodispersity in the sample, and is suggestive of some intra- or intermolecular self-association under the conditions used for these NMR measurements.

Figure 2.25: The putative partial HD of Pax5 is intrinsically disordered. The 15N-HSQC spectrum of 15N-labeled Pax5210-286 shows a narrow distribution of amide proton chemical shifts, indicating this region of the protein is intrinsically disordered.

Although isolated Pax5210-286 does not adopt a stably folded conformation, it is possible that folding is induced in the presence of DNA or other protein partners. To test this, I added stoichiometric amounts of DNA duplex corresponding to the CTD mb-1 half-site to 15N-labeled Pax5210-286. A lack of any 15N-HSQC spectral perturbations indicates that Pax5210-286 does not associate with DNA non-specifically, at least in the absence of other factors. Although the putative HD might fold in the presence of a yet unknown specific DNA sequence (other than the mb-1 half-site), most DNA-binding domains at least weakly bind non-specific DNA sequences with μM-mM affinity, and such binding is readily detectable through NMR spectral perturbations.

I also tested the possibility that this region of Pax5 associates with the DNA-binding PD in a regulatory fashion. This idea was based on the observation that disordered regions often participate in intramolecular regulatory interactions with other domains within a protein [153]. However, addition of excess unlabeled Pax51-149 to 15N-labeled Pax5210-286 did not result in significant changes to the spectrum of the latter species (Figure 2.26a).

Finally, I tested the reported interaction between residues 626-740 of the Death associated protein 6 (Daxx) and this HD region [154]. 
Upon addition of unlabeled Daxx626-740 to 15N-labeled Pax5210-286, no significant spectral perturbations were detected (Figure 2.26b). This indicates that the reported interaction may be dependent on additional regions of the two proteins, and/or rely on post-translational modifications. In support of this latter hypothesis, Pax5 contains one high-confidence consensus SUMOylation site at Lys257 [155], and the C-terminal region of Daxx (residues ~ 720 - 740) has a well characterized SUMO-interacting motif (SIM) [156, 157]. Therefore, it is possible that the reported interaction involves the SIM of Daxx with a SUMO attached to Pax5.

In conclusion, despite partial sequence similarity with HD proteins, residues 210-286 of Pax5 are intrinsically disordered in vitro and do not appear to interact directly with DNA, the Pax5 PD, or the C-terminal region of Daxx. The function of this region of Pax5 thus remains to be established.

Figure 2.26: The partial homeodomain of Pax5 does not interact with the Pax5 PD or a C-terminal fragment of Daxx. (a) 15N-HSQC spectra of 15N-labeled Pax5210-286 in the absence (green) and presence (orange) of excess unlabeled Pax51-149, corresponding to the PD. No significant changes were observed, indicating these segments of the protein are unlikely to make intramolecular contacts in the context of native Pax5. (b) The similar 15N-HSQC spectra of 15N-labeled Pax5210-286 in the absence (green) and presence (orange) of excess unlabeled Daxx626-740 show that these protein fragments also do not interact.

The transactivation domain of Pax5 is disordered in vitro

To determine if other regions of Pax5, besides the N-terminal PD, were amenable to structural studies, I also expressed and purified Pax5151-391. This fragment spans all regions C-terminal to the PD (Figure 2.27). This protein fragment was found to be very prone to aggregation, even at relatively low concentrations (~ 50 µM). 
I therefore used a fractional factorial buffer screen, similar to that described in [158], to determine additives and conditions that would improve protein solubility. In general, chaotropic additives that promote disorder reduced aggregation. These include GuHCl, arginine, and magnesium salts. In contrast, buffer additives such as glycerol, sucrose, and polyethylene glycol, which are thought to promote hydrophobic contacts [159], had detrimental effects on the quality of the sample. In addition, I found that buffering the protein solution at pH values 1 - 2 units away from neutral reduced aggregation. With this knowledge, I was able to collect a 15N-HSQC spectrum of Pax5151-391 under mildly denaturing conditions at pH 5.8 (Figure 2.27). The majority of the amide peaks had poor 1HN dispersion, indicative of conformational disorder. The number of peaks and uniform intensity of the signals were a good indication that the sample was monodisperse under these conditions. A small fraction of the peaks exhibited more dispersed 1HN signals, hinting that some regions may sample non-random conformations. A more detailed analysis, including sequence-specific main chain 1H, 13C, and 15N chemical shift assignments, would be needed to identify these regions and analyze their secondary structures.

Figure 2.27: Pax5151-391 is predominantly disordered under mildly denaturing conditions. The 15N-HSQC spectrum of 15N-labeled Pax5151-391 shows narrowly distributed 1HN amide signals indicative of an overall lack of persistent secondary structural elements under these conditions.

Finally, I also expressed and purified Pax5300-391, spanning the TAD and a putative IM [160] found at the C-terminus of the protein. The TAD of Pax5 (residues 304-358) belongs to a class of TADs, rich in proline residues, that are thought to mediate interactions with members of the basal transcriptional machinery [161]. 
Although the specific proteins that are recruited by the TAD of Pax5 are not known, the histone acetyltransferase CBP is suspected to interact with Pax5 via such a TAD [116]. This construct, which was rich in polar residues, was well-behaved and not prone to aggregation. The 15N-HSQC spectrum of Pax5300-391 under non-denaturing conditions shows that this region of the protein is predominantly disordered in vitro (Figure 2.28). This was not surprising, given the 15N-HSQC spectrum of the longer Pax5151-391 fragment.

Figure 2.28: The proline-rich transactivation domain of Pax5 is predominantly disordered under native conditions. The 15N-HSQC spectrum of 15N-labeled Pax5300-391 shows narrowly distributed 1H amide signals indicative of a lack of secondary structural elements under these conditions.

Overall, these data indicate that the entire C-terminal region of Pax5, including the homeodomain homology region and the TAD, is mainly disordered, and any potential helical structures are likely small or transient. This is consistent with the results of sequence-based algorithms that predict secondary structures and intrinsic disorder. In addition, I identified boundaries and conditions suitable for future study of regions of Pax5 that are disordered, yet may fold upon interactions with additional components of the transcription machinery.

2.4 Discussion

2.4.1 Structure of the Pax5 Paired domain and changes upon binding DNA

Based on backbone chemical shift, 15N relaxation, and amide HX measurements, Pax51-149, encompassing the PD and adjacent residues, folds as two independent 3-helix bundles separated by a conformationally disordered linker. Similar to Pax8 [92], free Pax5 lacks any stable secondary structure in the region encompassing its first ~ 30 residues, including the β-hairpin found in crystal structures of PDs bound to DNA. 
This result is consistent with MD simulations of Pax6 and DNA, which predict that the β-hairpin structure requires stabilizing contacts provided by DNA [92]. Given the small number of residues involved in the formation of the β-hairpin and the lack of tertiary contacts or disulfide bonds, this result was not unexpected. The MD simulations presented herein also suggest that contacts with the C-terminus of H2 in the NTD may help stabilize this hairpin, as evidenced by the positively correlated motions of these regions.

Motions throughout the PD become dampened upon formation of a high affinity complex with DNA, as judged by 15N relaxation, CSP analysis, HX measurements, and MD simulations. For example, the PD exhibits an increase in protection factors of at least 3 orders of magnitude in its bound state. In addition, the linker and N-terminal β-hairpin acquire dampened sub-nanosecond timescale motions comparable with the helices present in the subdomains, indicating the formation of specific contacts with DNA that reduce flexibility on this timescale. This is consistent with KD measurements using ITC and NMR, as well as CSP analysis quantifying the relative contributions of these regions of the PD towards association with DNA. In addition, MD simulations demonstrated that the dynamic subdomains, and in particular the DNA-recognition helices, lose flexibility upon binding, further contributing to our understanding of the structural and dynamic changes that occur upon association.

2.4.2 Stability of the subdomains of Pax5

Although well folded, I found that the helical bundles of the PD exhibited small PFs in the range of only ~ 10^2 to 10^3. Thus, both the NTD and CTD are dynamic and readily undergo conformational fluctuations detectable by HX. More importantly, the subdomains differ in the extent of their dynamic behavior, with the CTD helices having PFs ~ 10-fold greater than the NTD.     
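To put these protection factors on an energy scale, the short sketch below converts a PF into an apparent free energy of the opening transition using the standard EX2-limit relation ΔG0 = RT ln(PF). The function name, the value of R, and the temperature of 25 °C are assumptions made for illustration only; the conversion is meaningful only to the extent that exchange follows the EX2 limit.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K); assumed value


def free_energy_from_pf(protection_factor, temp_c=25.0):
    """Apparent opening free energy (kcal/mol) in the EX2 limit: dG = RT ln(PF).

    The default temperature of 25 C is an assumption for illustration.
    """
    temp_k = temp_c + 273.15
    return R_KCAL * temp_k * math.log(protection_factor)


if __name__ == "__main__":
    for pf in (1e2, 1e3):
        print(f"PF = {pf:.0e}: dG = {free_energy_from_pf(pf):.2f} kcal/mol")
    # A 10-fold difference in PF corresponds to RT ln(10)
    print(f"10-fold PF difference: {free_energy_from_pf(10.0):.2f} kcal/mol")
```

Under these assumptions, PFs of 10^2 to 10^3 correspond to roughly 2.7 to 4.1 kcal/mol, and a 10-fold difference in PF between the subdomains corresponds to about 1.4 kcal/mol.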
In complementary CD-monitored denaturation studies, the less protected NTD was found to require higher temperature and denaturant concentration to cause secondary structure loss, relative to the CTD. This result was somewhat surprising, as I expected the more dynamic NTD to also globally unfold more readily. However, the NTD denatured over a broader range of temperatures and GuHCl concentrations than the CTD, indicating a less cooperative unfolding transition, and hinting at the presence of intermediates along the unfolding pathway. The dynamic NTD may readily sample transient conformational states that are susceptible to HX, leading to low protection factors that do not report on the fully unfolded form of the subdomain. One way to help determine whether the subdomains have unfolding intermediates is to compare the ΔH0unfold determined calorimetrically (using differential scanning calorimetry) with the value determined using the van’t Hoff analysis [162]. Discrepancies between these two values would suggest the absence of a thermodynamic two-state transition between the folded and unfolded subdomains [162].

On a more quantitative level, I found that the extrapolated ΔG0unfold values of the subdomains obtained from HX and CD measurements do not compare very well (Table 2.1). These differences were not due to the pH of the measurements, as both types of experiments were conducted over a range of pH from ~ 5.5 to 8, which did not seem to affect stability (not shown). Discrepancies in the ΔG0unfold values obtained from the two methods have been observed for other systems and may have a number of origins [163]. A key caveat when interpreting these data in terms of stability is that the mechanisms of exchange and unfolding of these subdomains, either under non-denaturing conditions (examined by HX), or under increasing denaturing conditions (examined by CD), are not known and may be different for each subdomain. 
For example, the most protected amides may exchange through sub-global, rather than global fluctuations, thus leading to an underestimation of the overall stability of a protein. Alternatively, a protein may have residual structure in its unfolded state, which could increase amide HX protection and lead to an overestimation of ΔG0unfold. Also, exchange may not be following the EX2 limit that must be assumed in order to calculate ΔG0unfold derived from HX measurements [137]. Although I saw a clear pH dependence in the rate of exchange of the subdomains (therefore ruling out the EX1 limit), this relationship was not strictly first-order with respect to the hydroxide ion concentrations, as would be expected in the strict EX2 limit for a base-catalyzed reaction (not shown).

In addition, it is unknown whether either subdomain contains intermediates in the unfolding pathway, which would lead to inaccuracies in the extrapolation of ΔG0unfold from GuHCl denaturing curves. Although on a different timescale, MD simulations showed that motions in the helices of the NTD were coupled to a smaller degree than those in the CTD. Strikingly, Gly70 in the recognition helix H3 seems to facilitate a kinking motion that disrupts the local secondary structure. Conceivably, one of these intermediates in the NTD could involve partial unfolding of the C-terminus of H3, facilitated by a disruption in the hydrogen bond pattern. In some systems, unfolding intermediates have been shown to be present to a significant extent in the transition region and result in inaccurate stability estimations from these denaturation studies [135, 163].

Furthermore, proteins tend to be more stable in D2O solutions used for HX measurements of the most protected amides, relative to H2O used for CD spectroscopy measurements. Depending on the protein, these isotope effects can account for ~ 0.7 - 1.7 kcal/mol in increased stability measured in D2O relative to H2O solutions [164-167]. 
This phenomenon is not expected to influence the subdomains differentially, due to their similar size and overall structure. However, I did not determine stabilities of the subdomains in D2O by CD spectroscopy-monitored thermal denaturation studies.

In summary, multiple factors such as the solvent used for measurements, the validity of assuming a cooperative two-state transition between folded and unfolded states without intermediates, and the assumption that the EX2 limit of exchange for both subdomains holds, could all account for the apparent discrepancy in stability values calculated by HX and CD spectroscopy. Nevertheless, the NTD is clearly more dynamic than the CTD, exhibiting lower protection factors, larger linewidths in 15N-HSQC spectra, and a broader transition from the folded to the unfolded states.

The reasons for these distinct biophysical properties are not obvious, as both subdomains have similar helical bundle folds and charge distributions. However, the NTD and CTD do not show any sequence similarity, and their tertiary structures do not superimpose closely (backbone RMSD of ~ 2 Å in PDB ID: 1MDM). In addition, the helices that form the NTD versus CTD bundles differ in length in all PD structures reported to date. In particular, helix H2 is relatively short and predominantly polar, and does not contribute substantially to the hydrophobic core of the NTD subdomain. In contrast, residues Ile99, Ile114, and Leu118, which were found to be among the most protected from HX in Pax51-149, make hydrophobic and van der Waals contacts that likely contribute to the more rigid CTD. As well, the loops separating the helices in each subdomain are longer in the CTD, which may allow for better positioning of the helices relative to each other in order to maximize tertiary contacts and minimize exposed hydrophobic surfaces. As discussed below, the distinct nature of the subdomains may have important implications in their DNA-binding mechanism.
2.4.3 The Pax5 subdomains contribute differently to DNA binding

The PD makes extensive contacts with DNA. Upon binding the CD19 duplex, amides within the recognition helices of the NTD and CTD showed substantial 1HN-15N chemical shift changes. The N-terminal β-hairpin and flanking loops, as well as the interdomain linker, also exhibited large amide shift perturbations accompanied by reduced sub-nanosecond timescale mobility, indicative of structural ordering. Amides within the β-hairpin, linker region, and both helical bundles also become markedly more protected from HX. These changes are consistent with the X-ray crystallographic structure of the Pax5/Ets-1/mb-1 ternary complex, in which the two recognition helices dock within the major groove of DNA, while the β-hairpin/loop and intervening linker residues provide minor groove contacts.

My studies also showed that both the NTD and CTD contribute to the net affinity of Pax5 for the CD19 DNA. The separate subdomains exhibited similar (~ 3 to 13 µM) dissociation constants for their respective CD19-N and -C half-sites. As expected for multivalent interactions, the binding affinity is much stronger (~ 5 nM) for the intact PD with the full-length CD19 DNA. Surprisingly, the NTD interacted very weakly with all other DNA sequences tested. Because most DNA-binding modules have at least some weak affinity for random sequences due to electrostatic interactions with the phosphodiester backbone, this result was rather unexpected [168]. In contrast, the CTD bound the two CD19 half-sites with ~ 10 µM affinities, and bound the mb-1-C and two palindromic DNAs with KD values between ~ 150 and 400 µM. Thus, unlike the NTD, the CTD showed only modest sequence specificity and was able to measurably interact with all DNA sequences tested. In agreement with these results, previous studies noted more variability in the DNA sequences bound by the CTD relative to the NTD [81, 114].
It is also interesting to note that one of the structures of the Pax5/Ets-1/DNA complex determined by X-ray crystallography (PDB ID: 1K78) contained an extra CTD bound to a pseudo-consensus site present in the DNA duplex [12].

The reason behind the disparity in DNA recognition by the NTD, which binds only specific DNA sequences, and the CTD, which associates with DNA rather indiscriminately, is further discussed below. Of note, both subdomains have similar predicted isoelectric points (10.2 and 10.0 for Pax51-92 and Pax576-149, respectively) [169], and a similar number of positive charges at their DNA-binding interfaces. Also, in the Pax5/Ets-1/mb-1 complex, both subdomains provide a comparable number of base-specific contacts, such as those involving His62 and Asn29 in the N-terminal region and Ser133 and Arg137 in the C-terminal region. Most other contacts are to the DNA phosphodiester backbone, hinting that indirect readout of the sequence-dependent DNA shape may be important for specificity [38]. The β-hairpin and following loop, which make key contacts with DNA, are also likely involved in setting the specificity of the N-terminal fragment of the PD for various sequences.

2.4.4 Relationship between protein dynamics and DNA-binding specificity

As described in this chapter, MD simulations on the nanosecond to microsecond timescale of the free subdomains predicted that the recognition helix H3 of the NTD has greater backbone mobility than all other subdomain helices. Experimentally, the subdomains did not exhibit detectable motions on the millisecond to microsecond timescale, as probed using Carr-Purcell-Meiboom-Gill (CPMG) relaxation dispersion NMR experiments (not shown). However, the HX and CD measurements discussed above showed that the NTD undergoes more conformational fluctuations than the CTD in the absence of DNA. Upon binding DNA, motions in the CTD and, even more so, the NTD become significantly dampened.
This is evidenced by a larger decrease in ABFs observed in the bound state of the NTD relative to the CTD by MD simulations.

In addition, ITC experiments showed that the enthalpic and entropic components of binding by the NTD to its DNA half-site were larger than those of the CTD to its site, indicating the formation of energetically favorable contacts with DNA that accompany significant losses in entropy. Intuitively, changes in conformation that result in unfavorable entropic losses (e.g. a reduction in protein dynamics) might be driven by high enthalpies of binding, which are in turn related to very specific contacts with DNA. Therefore, these data collectively point to distinct mechanisms of DNA binding by the Pax5 subdomains.

The dynamic NTD seems to undergo the largest structural rearrangements upon binding, which results in β-hairpin folding and dampening of motions, accompanied by a favorable enthalpy of binding. In the case of the CTD subdomain, greater rigidity may be important in maintaining the orientation of positively charged residues involved in making non-specific contacts. To test this hypothesis, structural characterization of the Pax5 PD in the absence of DNA is required. However, this was beyond the timeframe of my thesis research. Nevertheless, in support of this idea, the helical regions of the NTDs of DNA-bound Pax5 (X-ray crystallography, PDB ID: 1MDM) and highly homologous DNA-free Pax8 (NMR spectroscopy, PDB ID: 2K27) do not align as well as their CTD helices. In addition, Pax8 has a poorly-defined helix H3 that adopts only ~ 1.5 turns in its free state [92]. These observations, in combination with the findings that the NTD is highly specific in recognizing DNA sequences, indicate that there may be a correlation between the extent of conformational changes upon binding and increased DNA-binding specificity.
The increased dynamics and flexibility of the NTD may reduce the time spent in a DNA binding-competent conformation, which impacts both the thermodynamics and kinetics of DNA complex formation, and therefore its specificity and affinity for non-specific and cognate DNA sequences.

Folding and increased structural definition upon association with DNA have been observed in many transcription factors [18, 149, 170]. In particular, the stabilization of α-helices coupled to complex formation, as observed for example in helix H3 of the Pax5 NTD, seems to be a common feature. A recent study comparing the structures of free and DNA-bound proteins in 90 cases where high-resolution data were available for both states found that larger conformational changes upon complex formation were associated with greater specificity [171]. One possible explanation comes from an investigation of p53 complexed with DNA, which found that although the affinity of this protein for specific versus non-specific DNA only differs by ~ 10 fold, the kinetics of binding were very different [172]. The authors proposed that structural rearrangements might alter the rates of association and dissociation, and thereby be responsible for the differential recognition of DNA sequences [172].

Although I did not measure binding kinetics directly, my data also suggest that the two subdomains exhibit different binding kinetics (kon and koff). For example, even though both Pax51-92 and Pax576-149 had similar dissociation constants for their respective CD19 half-sites (~ 3 and 13 µM, respectively), they exhibited markedly different exchange behaviors between their free and bound states. In the case of the NTD-containing fragment, binding occurred in the intermediate-slow exchange regime (kex ≲ |Δω|), whereas the CTD-containing fragment exhibited mostly fast exchange (kex ≳ |Δω|). This effect cannot be attributed to differences in |Δω| values alone, because several amide signals in the NTD (e.g.
Arg50 and Leu69) exhibited small chemical shift changes yet displayed slow exchange behavior. Since the KD values (= koff/kon) are comparable for both fragments, this indicates that the exchange rate constant kex (= kon[DNA] + koff) must be smaller for Pax51-92 than Pax576-149. More strikingly, upon deletion of the β-hairpin residues at the N-terminus of the PD, the KD value increased from ~ 13 µM to ~ 80 µM, yet this subdomain still showed exchange broadening in the presence of CD19-N (Figure 2.20), indicating that the exchange kinetics are slower for the NTD in spite of its weaker equilibrium binding relative to the CTD. Therefore, greater conformational changes in the NTD of Pax5 may be correlated with greater sequence specificity requirements due to slower exchange rates between the free and bound states.

On a final note, the NTD, which carries two exposed cysteine residues in helices H1 and H2 near the DNA-binding interface, was found to be very sensitive to oxidation and prone to aggregation (not shown). Addition of the reducing agent dithiothreitol (DTT) partly reversed this aggregation and restored the original protein fold. In contrast, Pax576-149, which has one cysteine residue in the loop between helices H5 and H6, did not readily aggregate under similar conditions. Several studies have linked PD proteins, including Pax5, to redox regulation of transcriptional activity, whereby only the reduced forms of the protein can bind to DNA. In its oxidized state, intramolecular disulfide bonds prevent formation of the DNA complex [70, 92, 173]. In the case of Pax5, this process seems to be mediated by the redox modulator APE/Ref-1 [70]. It is conceivable that the dynamic nature of the NTD plays a role in this mechanism, allowing the cysteine residues to be more susceptible to redox-induced modifications. In addition, the fact that the β-hairpin folds upon binding DNA also means that the interactions it mediates (e.g.
with Ets1 [12, 78]) should only occur in the context of DNA-bound Pax5. Therefore, interactions with protein partners involving the NTD are likely very specific and context dependent. Altogether, this is consistent with a regulatory role of the NTD and nearby β-hairpin, not only in setting DNA-binding specificity, but also in redox sensing and in mediating protein-protein interactions that fine-tune the activity of Pax5.

2.4.5 General model of DNA recognition by the PD and implications in biology

It is well established that sequence-specific transcription factors also associate with non-specific DNA through electrostatic interactions with the phosphodiester backbone [168]. However, in the case of the Pax5 PD, the specific and non-specific DNA recognition functions seem to be found in two structurally independent subdomains. The lack of detectable binding to non-specific DNA by the NTD is somewhat perplexing, as discussed above. The distinct roles of the NTD and CTD in DNA binding are summarized in a model of DNA recognition in Figure 2.29. The more rigid CTD likely provides the initial low affinity non-specific contacts to localize Pax5 on accessible regions of genomic DNA. Its promiscuous binding may also readily enable one-dimensional sliding to rapidly scan nearby sites [42, 174]. In contrast, the more dynamic NTD provides the needed discrimination between cognate and non-specific sites. Additionally, the “beads-on-a-string” architecture of the NTD and CTD may facilitate “monkey-bar” intersegmental DNA transfer [55, 175], as observed in multi-domain transcription factors such as Oct-1 [54] and Egr-1 [176].

Figure 2.29: Cartoon model of the proposed DNA-binding mechanism by the PD of Pax5. The more stable CTD subdomain readily associates with non-specific DNA sequences. In the presence of a specific cognate DNA sequence, the dynamic NTD also binds.
Along with interactions from the β-hairpin and linker, this yields a high-affinity Pax5/DNA complex and may facilitate the “monkey-bar” mechanism used by other bipartite transcription factors such as Oct-1.

An interesting question arising from these studies is whether the distinct roles of the subdomains might be exploited by the cell for increased functionality. One might predict that alternative splicing events in the PD would have significant effects on the DNA binding properties of Pax proteins. Indeed, Pax8 can undergo a natural splicing event that results in the insertion of a serine residue between the native amino acids Gly63 and Arg64 that form part of the DNA recognition helix H3. This insertion abrogates binding by the NTD of Pax8 and therefore changes DNA specificity by the PD [124]. Of note, the equivalent residues in Pax5 are Gly70 and Arg71, which create the kink in the free Pax5 NTD observed by MD simulations. This is consistent with the notion that helix H3 must fold properly upon binding DNA in order to provide energetically favorable contacts. Similarly, a disease mutation in the human Pax6 gene implicated in ocular abnormalities causes the disruption of normal splicing events in this gene [123]. As a result, an alternative protein variant containing a 14-residue insertion in the NTD abrogates normal DNA binding activity, leading to defects in normal eye function [123]. In the case of Pax5, naturally occurring isoforms present in B-cells have distinct transactivation properties [177, 178]. In particular, two isoforms that result in truncated variants of Pax5 are distinguished by the presence or absence of the NTD and have opposite effects on the activation potential of the full-length protein [178]. Interestingly, the variant including the NTD suppresses Pax5 function in a dominant negative manner, whereas the variant lacking the NTD promotes its activity [178].
Although the underlying mechanisms for these observations are unknown, these results are consistent with my observations that the NTD is involved in setting the specificity for binding sites, and therefore likely competes for regulatory Pax5 binding sites. In contrast, the CTD may weakly recruit proteins involved in transcriptional activation non-specifically.

Overall, the studies described in this chapter provide useful insight into the possible effects of alternative splicing events and mutations involving the subdomains of Pax proteins. Due to their distinct biophysical properties and DNA-binding behaviors, changes affecting either subdomain are predicted to have markedly differing effects.

2.5 Materials and methods

2.5.1 Expression and purification of Pax5 fragments

The genes encoding all Pax5 fragments were cloned from the full-length Pax5 gene (NCBI Gene ID: 5079) into the pET28-MHL vector (Addgene, plasmid #26096) using NdeI and HindIII restriction sites. This vector encodes an N-terminal His6 affinity tag followed by a TEV cleavage site. Unlabeled proteins were expressed in E. coli BL21 (λDE3) cells grown in lysogeny broth (LB) media, whereas isotopically labeled proteins were produced using M9 minimal media supplemented with 3 g/L 13C6-glucose and/or 1 g/L 15NH4Cl as the sole carbon and nitrogen sources, respectively.

Uniformly 2H/13C/15N-labeled Pax51-149 was produced in M9 minimal media, using a protocol modified from that published for preparing deuterated proteins [179]. Briefly, a 25 mL starter culture was grown to OD600 ~ 0.6 in LB media (H2O) at 37 °C. The cells were then collected by centrifugation and resuspended in 75 mL of M9 media prepared with 99% D2O. Protonated additives for the M9/D2O media were dissolved in D2O and lyophilized prior to use. The bacterial culture was allowed to reach OD600 ~ 0.6 and diluted 4-fold with fresh M9/D2O media. This was repeated until reaching the final culture volume of 1 L.
Expression of Pax5 fragments was induced at OD600 ~ 0.6 with 0.5 mM IPTG, followed by growth at 30 °C for 4-16 hours. After centrifugation, the cell pellet was frozen at -80 °C, then later thawed, resuspended in denaturing buffer (4 M GuHCl, 20 mM sodium phosphate, 0.5 M NaCl, 20 mM imidazole, pH 7.4) and sonicated to ensure complete lysis. Denaturation also led to full amide protonation in the otherwise uniformly 2H/13C/15N-labeled Pax51-149 protein sample. The cleared supernatant was applied to a Ni2+-NTA HisTrap HP column (GE Healthcare). In cases where the protein was soluble under non-denaturing conditions, the column was washed with 20 mM sodium phosphate, 0.5 M NaCl, 20 mM imidazole at pH 7.4 to allow on-column refolding, followed by elution with 20 mM sodium phosphate, 0.5 M NaCl, and 1 M imidazole at pH 7.4. In the case of aggregation-prone Pax5210-286, the protein was eluted under denaturing conditions, and dialyzed against native buffer (20 mM MES, pH 5.1, 200 mM NaCl).

The appropriate fractions were pooled and the His6 affinity tag was cleaved with TEV protease during a dialysis step against 50 mM Tris-HCl, 1 mM DTT at pH 8.0. Three (or two in the case of Pax51-92) non-native amino acid residues (Gly-His-Met) remained at the N-terminus of each construct. The uncleaved protein and His6 tag products were removed using a HisTrap HP column. A subsequent size-exclusion chromatography (Superdex 75, GE Healthcare) or high performance liquid chromatography (HPLC) step was used to increase sample purity and for buffer exchange. Unless noted otherwise, the final buffer used for NMR experiments contained 20 mM MES, 200 mM NaCl, 2 mM DTT, and 0.5 mM EDTA at pH 6.5. In the case of ITC and NMR-monitored DNA binding titrations, 6 mM MgCl2 was included and the NaCl concentration was decreased to 100 mM. Samples were concentrated with 3 kDa or 10 kDa MWCO centrifugal filters (EMD Millipore).
Protein concentrations were determined by ultraviolet absorbance at 280 nm using predicted molar absorptivities based on protein sequence [169].

2.5.2 DNA oligonucleotides

The sequences of oligonucleotides used in this study are summarized in Table 2.2. All single-stranded DNA oligonucleotides were purchased from Integrated DNA Technologies. Non-palindromic complementary strands were mixed in a 1:1 ratio based on the quantities reported by the vendor. All duplexes were annealed by heating to ~ 95 °C in sample buffer (20 mM MES, 100 mM NaCl, 0.5 mM EDTA, 2 mM DTT, 6 mM MgCl2 at pH 6.5) for 10 min, followed by slow cooling to room temperature. The resulting double stranded DNAs were purified using size-exclusion chromatography with sample buffer (Superdex 75, GE Healthcare) to remove single-stranded DNA and adjust the salt concentration. In the case of the CD19 duplex used for EMSA, one strand (5'-CGGTGGTCACGCCTCAGTGCCCCAT-3') was 5'-labeled with Alexa Fluor 647 for detection. The two strands were mixed and annealed as described above. Concentrations of the purified dsDNA were determined by ultraviolet absorbance at 260 nm using predicted molar absorptivities [180]. The quality of several duplex DNAs was confirmed using 1H-NMR spectroscopy. In the resulting spectra, the expected number of guanine and thymine imino proton resonances was observed as single peaks between ~ 12 and 15 ppm.

2.5.3 General NMR spectroscopy methods

NMR experiments were performed using cryoprobe-equipped Bruker Avance III 500, 600 or 850 MHz spectrometers. Unless noted otherwise, protein samples were concentrated to 0.3 - 0.8 mM in 95% NMR sample buffer (20 mM MES, 200 mM NaCl, 2 mM DTT, 0.5 mM EDTA, pH 6.5) with 5% lock D2O and data were collected at 25 °C. The spectra were processed and analyzed using NMRPipe [181] and Sparky [182]. The backbone (13Cα, 13Cβ, 13CO, 15N, and 1HN) chemical shifts were assigned using standard 1H-13C-15N scalar correlation experiments [183].
In the case of amide-protonated 2H/13C/15N-labeled Pax51-149 in complex with the 25 bp CD19 DNA duplex, the NaCl concentration was lowered to 20 mM and TROSY-based pulse sequences with 2H-decoupling were employed [184, 185]. Secondary structure propensity and RCI-S2 calculations were carried out using MICS [126].

Amide 15N relaxation

Amide 15N relaxation data (T1, T2, heteronuclear NOE) for Pax51-149, both free and in complex with the CD19 DNA duplex, were collected using the Bruker Avance III 600 MHz spectrometer with standard pulse sequences [186]. In the case of the DNA complex, the NaCl concentration was lowered to 20 mM to improve spectral quality and increase cryoprobe sensitivity. The T1 and T2 curves for well-resolved amide signals (peak intensity versus relaxation time) were fit to single exponential decays using Sparky [182]. Errors were estimated using a Monte Carlo approach. The heteronuclear 1H-15N NOE values were calculated as the ratios of the peak intensities in the NOE spectrum (5 s relaxation delay followed by 3 s of 1H irradiation) versus a control reference spectrum (8 s delay without 1H irradiation). The resulting data were fit with Tensor2 [139] to obtain global tumbling correlation times and, in the case of CD19-bound Pax51-149, an anisotropic rotational diffusion tensor. Selected structural coordinates were taken from the PDB file 1MDM for fitting of the diffusion tensor and for the calculations of rotational correlation times with hydroNMR [140].

HX rate constants and protection factors

To obtain the HX rate constants of slowly exchanging amides, Pax51-149 in 500 µL of NMR buffer (pH 6.5) was lyophilized. The dry protein sample was then resuspended in 500 µL of D2O and immediately placed in the spectrometer. 15N-HSQC spectra were collected at 25 °C in succession every 5 minutes, starting ~ 5 minutes after resuspension. However, only a few amide signals were detected in the first spectrum collected.
Therefore, the experiment was repeated using NMR buffer at pH 5.5 and 15 °C to decrease the exchange rate and enable quantitation of the HX rates for a greater number of amides. Protium-deuterium exchange rate constants for residues with well-resolved 1HN-15N signals were obtained using Sparky [182]. Peak intensities were fit to the equation

$$I_t = I_0\, e^{-k_{\text{HX-obs}}\, t}$$

where It is the observed peak height at time t after resuspension in D2O, I0 is the fit initial height, and kHX-obs is the fit exchange rate constant. In the case of the Pax51-149/CD19 complex, a similar procedure was performed at pH* 6.60 and 25 °C. The initial 5-min HSQC spectrum was collected ~ 12 minutes after resuspension. The last spectrum was collected approximately a month later and still contained many amide signals which had not decayed sufficiently to obtain a reliable kHX-obs. Therefore, only a lower PF limit is provided in Figure 2.7 for these residues with kHX-obs < 1.8 × 10⁻⁷ s⁻¹.

To obtain kHX-obs values for rapidly-exchanging amides for the free PD, I employed a CLEANEX-PM sequence [187] using mixing times from 4 to 160 ms at 25 °C. Pax51-149 protein samples in NMR buffer at pH 5.6, 6.3, and 8.0 were used to maximize the number of residues with detectable protium-protium exchange. The resulting growth curves were fit to the equation

$$\frac{I_t}{I_0} = \frac{k_{\text{HX-obs}}}{k_{\text{HX-obs}} + R_1}\left\{1 - e^{-(k_{\text{HX-obs}} + R_1)\, t}\right\}$$

where It is the amide peak height for transfer time t, I0 is the corresponding height in the reference spectrum (without 1H saturation and with a 12 s relaxation delay), and R1 is the effective transverse relaxation rate constant. The fit kHX-obs values were scaled by a factor of 1.4 to account for the reduced magnetization of water.

Predicted exchange rate constants, kHX-pred, for a random coil polypeptide with the Pax51-149 sequence under the corresponding solvent and temperature conditions were obtained using the server program Sphere [129]. Protection factors (PFs) were calculated as PF = kHX-pred/kHX-obs.
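As an illustration of the protium-deuterium exchange analysis described above, the following Python sketch estimates kHX-obs from a decay of peak heights and converts it to a protection factor. It is a minimal sketch under stated assumptions, not the fitting routine used in Sparky: the function names are hypothetical, the decay is fit by a simple log-linear regression rather than a nonlinear fit, and the conversion ΔG = RT ln(PF) is the standard relation that applies only if exchange follows the EX2 limit. The numbers in the example are synthetic and are not data from this work.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def fit_exponential_decay(times_s, intensities):
    """Estimate (I0, k) for I(t) = I0 * exp(-k * t).

    Uses an unweighted least-squares line through ln(I) versus t.
    This is a simplification of a nonlinear fit and assumes that
    every intensity is strictly positive.
    """
    n = len(times_s)
    log_i = [math.log(i) for i in intensities]
    mean_t = sum(times_s) / n
    mean_y = sum(log_i) / n
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_s, log_i))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return math.exp(intercept), -slope


def protection_factor(k_pred, k_obs):
    """PF = kHX-pred / kHX-obs, as defined in the text."""
    return k_pred / k_obs


def delta_g_ex2(pf, temperature_k):
    """Free energy (kcal/mol) from a PF, ASSUMING the EX2 limit holds."""
    return R_KCAL * temperature_k * math.log(pf)


# Synthetic example: one spectrum every 300 s, hypothetical rate constants.
times = [300.0 * i for i in range(1, 21)]
peaks = [100.0 * math.exp(-1.0e-4 * t) for t in times]
i0_fit, k_obs = fit_exponential_decay(times, peaks)
pf = protection_factor(k_pred=1.0, k_obs=k_obs)  # k_pred = 1.0 s^-1 is made up
dg = delta_g_ex2(pf, temperature_k=288.15)       # 15 °C
print(f"k_obs = {k_obs:.3e} s^-1, PF = {pf:.3g}, dG = {dg:.2f} kcal/mol")
```

With real spectra, noise in the late, low-intensity points would argue for a weighted or nonlinear fit; the log-linear form is used here only to keep the sketch dependency-free.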
Results from the protium-deuterium and CLEANEX exchange experiments were combined to obtain PFs for many residues in Pax51-149. In cases where two or more reliable PF values were available from complementary measurements, these were averaged. This merging of PFs is based on the assumption that exchange occurs in the commonly observed EX2 limit and that the stability of Pax51-149 does not change substantially between sample pH values of 5.6 and 8.0 and between 15 °C and 25 °C [132].

NMR-monitored DNA-binding titrations

DNA duplexes and Pax5 protein fragments were prepared in 20 mM MES, 100 mM NaCl, 0.5 mM EDTA, 2 mM DTT, 6 mM MgCl2, at pH 6.5. Small aliquots of concentrated DNA (~ 1.5 mM) were added in a step-wise manner to 15N-labeled Pax51-92, Pax532-92, Pax576-149 and Pax51-149, initially at ~ 0.35 mM. 15N-HSQC spectra were recorded at each titration point. For all well-resolved residues exhibiting fast-exchange behavior, CSP values were calculated as

$$\Delta\delta = \left[(0.14\,\Delta\delta_N)^2 + (\Delta\delta_H)^2\right]^{1/2}$$

where ΔδN and ΔδH are the changes in chemical shift for 15N and 1HN, respectively. These values were plotted as a function of DNA added. The resulting titration curves were then fit with GraphPad Prism to the equation for a simple 1:1 binding isotherm

$$\Delta\delta_i = \Delta\delta_{sat}\,\frac{\left([P]_{T,i} + [D]_{T,i} + K_D\right) - \sqrt{\left([P]_{T,i} + [D]_{T,i} + K_D\right)^2 - 4[P]_{T,i}[D]_{T,i}}}{2[P]_{T,i}}$$

where [P]T,i and [D]T,i are the total, dilution-adjusted concentrations of labeled protein and unlabeled species, respectively, at each titration point i, and Δδsat is the CSP at saturation. The DNA concentration was treated as a variable and the fit macroscopic KD values include possible binding at multiple independent sites and orientations within the duplex oligonucleotides. The fit DNA concentrations were ~ 2 fold greater than the measured DNA concentration for titrations of Pax576-149 with the mb-1-C and the two palindromic duplexes, but not for the CD19-C duplex.
This difference may reflect possible errors in the determination of protein and DNA concentrations using predicted molar absorptivities, and/or a stoichiometry other than 1:1 for DNA sequences that bind more weakly. However, in all cases, the amide 1HN and 15N chemical shifts changed in a co-linear fashion with added DNA, indicating a two-state macroscopic equilibrium between free protein and an ensemble of bound states. The KD values obtained individually for 10 amide residues with the most reliably fit titration curves were averaged to obtain the mean KD value ± standard deviation as reported in Table 2.3.

2.5.4 CD spectroscopy

CD experiments were performed using a JASCO J-810 spectrometer. Unless otherwise stated, purified protein samples were diluted to 10 µM for all measurements in 20 mM MES, 100 mM NaCl, 2 mM DTT, 0.5 mM EDTA, 6 mM MgCl2, at pH 6.5. Spectra were recorded in the range 200-280 nm, in a 1 mm quartz cuvette at 25 °C.

For thermal denaturation studies, the cuvette was filled to capacity (~ 400 µL) to minimize evaporative changes in concentration at higher temperatures. A water bath accessory station was used to increase and decrease the temperature of the cuvette in 0.1 °C steps, while recording the CD signal at 222 nm. A total of 701 points were collected from 25 °C to 95 °C. The Tm and ΔH0unfold values were obtained from fits of the resulting thermal denaturation curves according to

$$\varepsilon_{222}(T) = \varepsilon_U \left( \frac{e^{-\frac{\Delta H^0_{unfold}}{RT} + \frac{\Delta H^0_{unfold}}{RT_m}}}{1 + e^{-\frac{\Delta H^0_{unfold}}{RT} + \frac{\Delta H^0_{unfold}}{RT_m}}} \right)$$

where ε222 is the ellipticity at 222 nm in arbitrary units, T is the measured temperature at each point, εU is the ellipticity at 222 nm of the unfolded state obtained by averaging the last 10 to 20 points on the curve, R is the gas constant, and ΔH0unfold and Tm are the fit enthalpies of unfolding and midpoint unfolding temperatures, respectively. This equation assumes a 2-state folding/unfolding equilibrium with a constant ΔH0unfold over the transition region.
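To make the shape of this two-state expression concrete, the short Python sketch below evaluates it over the same temperature grid (25 °C to 95 °C in 0.1 °C steps). It is only an illustration: the function names are hypothetical, the ΔH0unfold and Tm values are made up rather than taken from Table 2.1, and the gas constant is expressed in kcal mol⁻¹ K⁻¹ on the assumption that ΔH0unfold is given in kcal/mol.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def fraction_unfolded(temp_k, dh_unfold, tm_k):
    """Two-state unfolded fraction K / (1 + K), with
    K = exp(-dH/(R*T) + dH/(R*Tm)) and a temperature-independent dH."""
    exponent = -dh_unfold / (R_KCAL * temp_k) + dh_unfold / (R_KCAL * tm_k)
    k_unfold = math.exp(exponent)
    return k_unfold / (1.0 + k_unfold)


def ellipticity_222(temp_k, eps_u, dh_unfold, tm_k):
    """Signal model from the text: the unfolded fraction scaled by eps_U."""
    return eps_u * fraction_unfolded(temp_k, dh_unfold, tm_k)


# 701 points from 25 °C to 95 °C in 0.1 °C steps (hypothetical parameters).
temps_k = [298.15 + 0.1 * i for i in range(701)]
curve = [ellipticity_222(t, eps_u=1.0, dh_unfold=40.0, tm_k=333.15)
         for t in temps_k]

# A smaller dH gives a broader, less cooperative transition around Tm.
sharp = fraction_unfolded(323.15, dh_unfold=40.0, tm_k=333.15)
broad = fraction_unfolded(323.15, dh_unfold=20.0, tm_k=333.15)
print(f"fraction unfolded 10 K below Tm: dH=40 -> {sharp:.3f}, dH=20 -> {broad:.3f}")
```

In this model the apparent width of the transition is set entirely by ΔH0unfold, which is why a broad transition yields a small fitted van’t Hoff enthalpy.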
In addition, this equation does not take into account temperature-dependent changes in the baselines, which were assumed to have small effects. The data were fit using GraphPad Prism.

For chemical denaturation studies, recrystallized GuHCl (kindly provided by Dr. Fred Rosell) was used to make 19 samples of 10 µM protein in the presence of 0 to 5.8 M GuHCl. The final concentration of GuHCl was determined by refractive index, measured using a polarimeter with CD buffer as the reference. The fraction of protein unfolded at each point was calculated according to

$$f_U = \frac{\varepsilon_{222} - \varepsilon_N}{\varepsilon_U - \varepsilon_N}$$

where ε222 is the ellipticity at 222 nm at each point, εN is the ellipticity of the native state obtained by averaging the first ~ 3 values of the curve, and εU is the ellipticity of the unfolded state obtained in a similar manner. The ΔGunfold at each point in the transition was obtained according to

$$\Delta G_{unfold} = -RT \ln\left(\frac{f_U}{1 - f_U}\right)$$

The resulting line was fit using least squares regression, and the ΔG0unfold under native conditions was obtained by extrapolating to 0 M GuHCl according to the equation ΔGunfold = ΔG0unfold - m[GuHCl].

2.5.5 Electrophoretic mobility shift assay (EMSA)

Fluorescently labeled CD19 duplex was annealed as described above. Samples were prepared using 10 nM labeled CD19 duplex in the presence of 0.08 nM - 120 µM Pax51-149 in 20 mM MES, 100 mM NaCl, 6 mM MgCl2, 6 mM DTT, 0.2 mM EDTA, 200 µg/mL bovine serum albumin, and 10% glycerol at pH 6.5. After incubation at room temperature for 15 min, followed by cooling to 4 °C, the samples were resolved on native 13% polyacrylamide gels and run in 0.5 X TBE buffer (45 mM Tris, 45 mM boric acid, 1 mM EDTA, pH 8.3) at 200 V and 4 °C for 1.5 hours. The DNA was visualized using a Typhoon Imager (GE Healthcare) and quantified with ImageJ. The fraction of DNA bound at each titration point, fb,i, was calculated as DNAbound/(DNAfree + DNAbound) using the respective fluorescent band intensities.
The resulting titration curves from two independent experiments were fit with GraphPad Prism to the expression

$$f_{b,i} = \frac{\left([P]_{T,i} + [D]_{T,i} + K_D\right) - \sqrt{\left([P]_{T,i} + [D]_{T,i} + K_D\right)^2 - 4[P]_{T,i}[D]_{T,i}}}{2[D]_{T,i}}$$

where fb,i is the fraction bound at each titration point i, and [P]T,i and [D]T,i are the total protein and DNA concentrations. The reported results are the average KD values ± the standard deviation of the two experiments.

2.5.6 Isothermal titration calorimetry

Experiments were performed using a MicroCal ITC200 (GE Healthcare) at 15-35 °C. Both the DNA and protein samples were prepared in 20 mM MES, 100 mM NaCl, 0.5 mM EDTA, 2 mM DTT, 6 mM MgCl2 at pH 6.5, unless otherwise specified. The sample cell contained the DNA duplex at concentrations between 50 and 100 µM, whereas the syringe contained Pax51-92, Pax532-92, Pax51-77, or Pax576-149 at ~ 1.4 mM. The following settings were used: 20 or 25 injections, 1000 r.p.m. stirring speed, reference power set to 10, initial delay 150 to 200 s, 2 s filter period, 1.5 to 1.8 µL of titrant per injection, and 100 to 120 s delays between each point. The first injection (0.5 µL) was excluded from the analysis. Quantification of the results and data analysis were performed using Origin software (GE Healthcare). To account for heats of dilution, the experiment was repeated with buffer in the sample cell and protein at the same concentration in the syringe. The heats of dilution were fit by linear regression, and the interpolated values were subtracted from each point in the titration experiment. The baseline-corrected binding isotherms were fit to a simple 1:1 model to yield the reported thermodynamic parameters and errors.

Initially, the protein and DNA concentrations were measured and used to derive the fit ΔH0bind, ΔS0bind, stoichiometry (N), and KD values reported in Perez-Borrajero, C. et al., 2016, J. Mol. Biol. This type of analysis is standardly used in fitting ITC binding isotherms.
However, after publication I found that abnormally high stoichiometry values (~2-3) were common for Pax51-92, suggesting that the protein concentration might be overestimated. In addition, the size of the DNA duplex used (12 bp) should not allow enough space for more than one Pax5 molecule to bind. I also found Pax51-92 to be prone to aggregation, and the stoichiometry value calculated using this standard analysis varied between protein samples. As well, the extinction coefficient of Pax51-92 is relatively small (4470 M-1 cm-1), which could contribute to inaccurate protein concentration measurements. For all these reasons, I decided to rely on the DNA concentration measurement, which is more dependable due to high extinction coefficients, and fix the binding stoichiometry to 1. With this method, I was able to obtain reproducible KD values for the interaction between the Pax5 subdomains and DNA, ranging from 1.9 to 4 µM in the case of Pax51-92, and 10 to 18 µM in the case of Pax576-149 for their respective half-sites. The average and standard deviation of five measurements are reported in Table 2.3, and vary somewhat from those previously published.

2.5.7 Molecular dynamics simulations

Simulations of the NTD (Pax534-77) and the CTD (Pax592-142) subdomains were based on the crystal structure of the PD in complex with the mb-1 promoter DNA and Ets1 (PDB: 1MDM) after removal of the DNA and Ets1 molecules. In the case of the DNA-bound state, a model of the Pax519-142/CD19 complex was made by substituting the mb-1 duplex DNA bases with the appropriate bases found in the high affinity CD19 DNA sequence (Table 2.2). This was accomplished using the mutagenesis function in PyMOL [27]. Protonation states of residue side chains were predicted using PROPKA3.1 [188] at a pH of 7. The systems were solvated in an explicit, periodic TIP3P cuboid water box, and neutralized with 3 Cl- ions (Pax534-77 and Pax592-142) and 31 Na+ ions (Pax519-142/CD19).
The final number of atoms was 12493, 12284 and 41062 for Pax534-77, Pax592-142 and Pax519-142/DNA, respectively. The corresponding box dimensions were 47.6 Å × 49.8 Å × 52.8 Å, 49.2 Å × 49.9 Å × 50.0 Å and 61.3 Å × 95.7 Å × 69.9 Å. The energy of the system was then minimized by using 5000 steps of steepest descent minimization of just the solvent with the protein coordinates fixed, followed by 10000 steps for all atoms including the protein. Subsequently, the systems were heated slowly from 0 to 300 K over 50 picoseconds. A one nanosecond equilibration step was then performed before the beginning of the production run simulations.

The simulation used an integration step size of 2 femtoseconds, and consisted of 920 ns of Langevin dynamics in the modified AMBER ff14SB all-atom force field using the PMEMD module in AMBER14 [141]. During the simulation, the pressure and temperature were kept constant at 1 atm and 300 K, respectively. The SHAKE algorithm was used to constrain bond lengths of atoms attached to hydrogens. Long-range electrostatic interactions were accounted for using the particle-mesh Ewald sum. In addition, long-range non-bonded interactions at a distance of > 10 Å were not considered. Backbone (N, Cα, CO) RMSD time courses were calculated from the trajectories aligned to the starting crystal structures using CPPTRAJ, a module of AMBER14. This module was further used to calculate the per-residue backbone (N, Cα, CO) Amber B-factors, which were mapped onto the structures and visualized using PyMOL [27]. A normalized cross-correlation analysis of Cα atoms was performed using CARMA [189].

2.5.8 Accession numbers

The NMR spectral assignments for Pax51-149 alone and in complex with CD19 DNA have been deposited in the BioMagResBank under accession numbers 26730 and 26731, respectively.
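The 1:1 binding expression used to fit the EMSA data in Section 2.5.5 can be illustrated with a short script. This is a minimal sketch and not the GraphPad Prism procedure used in this work: it assumes a simple grid search on log10(KD), and the KD of 50 nM and the four-fold protein dilution series are hypothetical values chosen only to generate synthetic, noise-free data. Only the 10 nM duplex concentration is taken from the text.

```python
import math


def fraction_bound(p_total, d_total, kd):
    """Fraction of DNA bound for 1:1 binding (all inputs in the same units)."""
    b = p_total + d_total + kd
    return (b - math.sqrt(b * b - 4.0 * p_total * d_total)) / (2.0 * d_total)


def sum_sq_residuals(kd, p_totals, d_total, observed):
    """Sum of squared differences between model and observed fractions."""
    return sum((fraction_bound(p, d_total, kd) - f) ** 2
               for p, f in zip(p_totals, observed))


def fit_kd(p_totals, d_total, observed,
           log_lo=-3.0, log_hi=6.0, rounds=6, points=41):
    """Least-squares KD from a grid search on log10(KD).

    Each round scans an evenly spaced grid and then narrows the window
    to one grid step either side of the best point.
    """
    best = log_lo
    for _ in range(rounds):
        step = (log_hi - log_lo) / (points - 1)
        grid = [log_lo + i * step for i in range(points)]
        best = min(grid, key=lambda x: sum_sq_residuals(
            10.0 ** x, p_totals, d_total, observed))
        log_lo, log_hi = best - step, best + step
    return 10.0 ** best


# Synthetic demonstration (concentrations in nM).
D_TOTAL_NM = 10.0    # labeled duplex concentration stated in Section 2.5.5
TRUE_KD_NM = 50.0    # hypothetical value, for illustration only
protein_nM = [0.08 * 4 ** i for i in range(11)]   # hypothetical dilution series
simulated = [fraction_bound(p, D_TOTAL_NM, TRUE_KD_NM) for p in protein_nM]

fitted_kd_nM = fit_kd(protein_nM, D_TOTAL_NM, simulated)
print(f"fitted KD = {fitted_kd_nM:.2f} nM")
```

With real band intensities, `simulated` would be replaced by the measured fractions bound, and the two independent experiments would be fit separately before averaging, as described above.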
Chapter 3: The biophysical basis of phosphorylation-enhanced DNA-binding autoinhibition in Ets1

3.1 Overview

The eukaryotic transcription factor Ets1 is regulated by an intrinsically disordered serine-rich region (SRR) that transiently associates with the adjacent ETS domain to inhibit DNA binding. Autoinhibition is progressively reinforced by calmodulin-dependent kinase II (CaMKII) phosphorylation of multiple serines within the SRR. In this chapter, I combined NMR spectroscopy and X-ray crystallography to determine the physicochemical basis for the phosphorylation-enhanced “fuzzy” interactions of the SRR with the ETS domain.

For these studies, I used a synthetic phosphopeptide corresponding to the wild-type SRR region (residues 279-295 in Ets1). I found that the SRR peptide interacts in trans with a well-defined region of the ETS domain, encompassing the recognition helix H3, the N-terminus of H1, and flanking regions including an inhibitory helical bundle. This interaction does not occur when the ETS domain is bound to a high-affinity canonical DNA duplex, and supports a competitive steric mechanism of DNA-binding inhibition. In addition, I found that increasing the hydrophobic, but not necessarily the aromatic character, of peptides corresponding to the SRR promoted the interaction between the SRR and the ETS domain. Furthermore, the interaction is dependent on the sequence of the SRR, and not simply on its overall amino acid composition. In addition, the affinity of the SRR peptide for the ETS domain decreases with increasing ionic strength, and in the absence of phosphorylation at two key serine residues (pSer282 and pSer285), is weakened by ~ 10-fold. Thus, the interaction is driven by the hydrophobic effect and enhanced electrostatically by phosphorylation of serines adjacent to the SRR aromatic residues.
Although the wild-type SRR peptide is predominantly disordered in its free state, residues surrounding the phosphoserines and aromatics exhibit backbone nuclear Overhauser effect (NOE) patterns distinct from those in corresponding peptides lacking the aromatic residues and phosphate modifications. Using a combination of intra- and intermolecular NOE-derived distance restraints, I calculated a structural ensemble of the SRR peptide/ETS domain complex. This ensemble helps explain how the SRR mediates Ets1 autoinhibition by sterically blocking the DNA-binding interface of the ETS domain. Key features of the NMR-derived ensemble were confirmed by an X-ray crystallographic structure of a high-affinity fluorinated SRR mimic bound to a domain-swapped ETS domain dimer. Finally, I also found that the Ets1 SRR phosphopeptide binds to the ETS domain of distantly-related PU.1, suggesting that features of the ETS domain mediating autoinhibition are conserved in this family of transcription factors. Together, these data refine a model of steric autoinhibition of Ets1 by an intrinsically disordered region, and help explain DNA-binding regulation.

3.2 Introduction

3.2.1 Intrinsically-disordered regions

Between 35 and 50% of eukaryotic proteins contain intrinsically disordered regions (IDRs) which, due to their amino acid composition, do not adopt well-defined three-dimensional structures under physiological conditions [190]. These regions have a high proportion of disorder-promoting amino acids, such as Ala, Arg, Gly, Gln, Ser, Glu, Lys, and Pro [191]. The amino acid bias present in IDRs results in highly flexible regions of proteins, which possess a rich variety of biophysical properties (reviewed in [191-193]). For example, IDRs characterized by a high net charge are classified as polyelectrolytes [194].
These tend to form extended conformational ensembles thought to result from charge-charge repulsion, and may function to provide specific (but dynamic) spacing between structured domains [193, 194]. Other IDRs are roughly neutral (polyampholytes), and are predicted to form relatively compact structural ensembles [195].

The amino acid composition, their repeating pattern, charge distribution, and in general the position along the primary sequence all give IDRs particular chemical properties that can be exploited for biological functions [193]. Broadly speaking, these functions can be divided into three non-exclusive categories: i) scaffolding and recruitment of protein partners, ii) regulation via PTMs, and iii) conformational variability and adaptability [196]. Consistent with these functions, IDRs have been found to be enriched in scaffolding [197] and network hub protein complexes [198], where they aid in combining signals from different pathways.

IDRs exploit a large number of binding mechanisms to associate with protein partners. The resulting complexes can be viewed to lie along a continuum of dynamic characteristics, from well-defined to highly heterogeneous [199]. At one end of the spectrum are IDRs that become well-folded upon association with a co-factor, as has been observed in the case of α-helical molecular recognition motifs (α-MoRFs) [199, 200]. The motions found in these complexes are relatively small due to stabilizing contacts linked with secondary structure formation. At the other end of the spectrum are so-called “fuzzy” complexes that maintain a highly flexible nature upon association [193, 199, 201]. These are characterized by relatively weak and transient interactions that interconvert rapidly among energetically-similar conformations [199].

IDRs are particularly notable in TFs [22, 23] and proteins involved in control of key cellular processes such as the cell cycle (reviewed in [193, 201, 202]).
It is estimated that ~ 80 to 95 % of eukaryotic transcription factors possess long stretches of IDRs [203]. For example, the TADs of many eukaryotic TFs are disordered. These function by providing low affinity multivalent interaction surfaces that localize members of the basal transcriptional machinery [19]. Examples include the TADs of p53, c-Myb, and NFAT5 [204-206]. In addition, structured domains in transcription factors, such as those involved in DNA-binding, can be “decorated” by adjacent IDRs that control activity. For instance, disordered and flexible regions flanking the ETS domains of ETV1, ETV4, ETV5, and Ets1 inhibit DNA association through steric and allosteric mechanisms [72, 73].

As hinted above, PTMs often occur in disordered regions [62], and those present in TFs are not exempt. The conformationally flexible polypeptide chain allows access by "writer" and "eraser" enzymes that catalyze side chain modifications, as well as by "reader" proteins that recognize the presence or absence of the resulting PTMs [62]. This enables the reversible modulation of protein function by IDRs [62]. In particular, IDRs present in TFs are commonly the sites of phosphorylation, acetylation, methylation, O-GlcNAcylation, ubiquitination, SUMOylation, and other such PTMs [191, 203]. The subtle calibration of TF activity is particularly important in transcription factors responding to cellular stimuli [62]. The tumor suppressor factor p53, for instance, contains a disordered C-terminal region that alters its lifetime, its DNA-binding affinity, and facilitates recruitment of co-factors [207]. Phosphorylation, acetylation, and ubiquitination in this domain of p53 modify the existing DNA-binding ability, and allow diverse intracellular signals to converge upon one central protein [207].
Another interesting feature of IDRs is that they tend to be part of regions that are naturally removed upon splicing, thereby contributing to multiple protein isoforms and increased functional variability [22, 208, 209]. For example, of the seven isoforms of the tumor suppressor BRCA1 for which there is structural information, six are splice variants lacking regions contained in the intrinsically-disordered center domain, with the remaining isoform being full-length BRCA1 [209]. The bias towards splicing sites is true of the majority of proteins with known isoforms and intrinsic disorder analyzed by Romero and colleagues [209].

Currently, a number of experimental and computational techniques such as NMR spectroscopy, SAXS, and MD simulations have been used to describe the conformational ensembles sampled by IDRs [204, 210, 211]. A notable study in which all three techniques were combined was performed by Wells and colleagues describing the structure of p53 in a ternary complex with DNA and the Taz2 domain of the co-activator acetyltransferase CBP [205]. In this work, the authors used X-ray and NMR-derived structures of globular domains, in combination with residual dipolar coupling (RDC), SAXS, and binding measurements, to derive tertiary and quaternary structural models of the arrangements of the different molecules in the complex. Nevertheless, this study required the integration of multiple approaches and many years of experimental work to define the “parts” of the system [205]. In spite of great progress in the field, the flexibility found in IDRs makes a detailed description of the populated conformations, both before and after association with other proteins, still a challenging task, particularly for large systems [211].

3.2.2 Ets1

Ets1 is one of the best-studied members of the ETS family of eukaryotic transcriptional regulators, which comprises 28 paralogs in humans [212].
It is mainly expressed in lymphocytes and has crucial functions in the development and maintenance of cells involved in immunity, such as B-cells, T-cells, and Natural Killer cells [83, 213]. Multiple cellular signaling pathways converge on dynamic and disordered regions of Ets1, leading to its activation or inhibition [83].

Ets1 contains a conserved winged helix-turn-helix ETS domain that defines the ETS family. This domain binds a 9 - 12 bp DNA sequence with a highly conserved 5’-GGA(A/T)-3’ core flanked by more variable bases (reviewed in [212]). The recognition α-helix H3 of the ETS domain (i.e. the second helix of the helix-turn-helix motif) inserts into the major groove of DNA, allowing direct hydrogen bonding between two invariant arginine sidechains and two guanines (Figure 3.1a). Additional contacts to the phosphodiester backbone are provided by the N-terminus of H1, the turn between H2 and H3, and the wing between β-strands S3 and S4.

Four helices appended to the ETS domain of Ets1 reduce its affinity for DNA by ~ 2-fold [212]. This inhibitory module (IM) is composed of N-terminal helices HI-1 and HI-2 and C-terminal helices H4 and H5 interfaced with H1 of the ETS domain (Figure 3.1a). Importantly, the module is distal to the DNA-binding surface. In an allosteric response to association with either specific or non-specific DNA, the marginally-stable helices HI-1 and HI-2 unfold, thereby contributing an energetic penalty to attenuate DNA-binding [214, 215].

Ets1 is further regulated by the SRR, a region composed of ~ 50 residues preceding HI-1. Previously, our group showed that the SRR both stabilizes the IM and sterically inhibits DNA binding by transiently associating with the recognition helix H3 of the ETS domain [216, 217]. This results in a combined ~ 20-fold attenuation of DNA-binding. The SRR is intrinsically disordered and is phosphorylated by CaMKII in response to calcium signaling [218, 219].
An increasing number of phosphoserines in this region causes DNA binding to become progressively weaker (up to ~ 1000-fold), thus tuning autoinhibition in a “rheostat” or “dimmer switch” manner [216]. Removal of the regulatory layer provided by the IM and the SRR through alternative splicing, mutation (as seen with the oncoviral v-Ets) or protein partnerships, results in increased affinity for DNA [212].

Previous work by our group revealed that both phosphoserines and flanking aromatic residues in the SRR contribute synergistically to Ets1 autoinhibition [72]. Although considerable progress has been made, due to the dynamic nature of the interaction between the disordered SRR and the ETS domain, the physicochemical basis of autoinhibition remained incompletely defined. Using a combination of NMR spectroscopy and X-ray crystallography, I demonstrate that the SRR interacts with a surface of the ETS domain that overlaps the DNA-recognition helix H3 and contacts the inhibitory helices. This interaction is driven by a combination of the hydrophobic effect provided by aromatic residues and favorable electrostatic contributions provided by phosphoserines and adjacent aspartate/glutamate residues. Phosphorylation of serine residues in the SRR peptide does not induce any persistent secondary structure. However, amino acids surrounding the phosphoserines exhibit more restricted motions in the presence of the aromatic residues. I present a detailed model of the ETS domain/SRR interaction and discuss its implications towards the regulation of Ets1. Finally, I found that the Ets1 SRR can also bind the ETS domain of PU.1, a divergent paralog that does not exhibit autoinhibition. This raises the intriguing possibility that the Ets1 SRR may be capable of inhibiting DNA binding by other members of the ETS family in trans.
3.3 Results

3.3.1 Phosphate-enhanced hydrophobic effect in Ets1 autoinhibition

The SRR peptide interacts with a surface region overlapping the ETS domain DNA-recognition interface

To characterize the region of Ets1 that associates with the SRR, I used a trans system similar to that previously described [72]. This consists of an unlabeled peptide corresponding to a truncated SRR (residues 279-295), along with 15N/13C-labeled Ets1301-440 containing the IM and ETS domain (Figure 3.1b). I denote the peptide, with blocked termini and phosphorylated serine residues 282 and 285, as WT2P (Table 3.1). Although the full SRR encompasses ~ 50 residues (244-300), a partial SRR with these two CaMKII phospho-acceptor serines recapitulates most of the inhibition seen with wild-type Ets1, and is well-suited for NMR spectroscopic studies [217].

Upon progressive addition of the WT2P peptide to Ets1301-440, a large number of amide signals in 15N-HSQC spectra of the protein shifted in the fast exchange regime (Figure 3.2). This indicates relatively weak binding to the peptide under conditions of moderate ionic strength (300 mM NaCl). As a complementary approach, I also monitored changes in the 13C-HSQC spectra of Ets1301-440 upon peptide titration (Figure 3.3). Overall, the 1H-13C shift perturbations of Cα, Cβ, methyl and aromatic moieties were relatively small, indicating that the tertiary structure of the protein is not significantly perturbed upon binding. Importantly, residues exhibiting the largest 1H-13C and 1H-15N CSPs overlapped closely, and collectively define the SRR-interaction interface of the ETS domain as an extended surface encompassing the recognition helix H3, as well as H1 and portions of the IM (Figure 3.1). This same interface was identified from a comparison of the 15N-HSQC spectra of Ets1301-440 versus 2PEts1279-440, with the latter species containing the phosphorylated SRR in cis [217].
Furthermore, of the 24 residues exhibiting 1H-15N CSP values > 0.05 ppm, one-third were in H3, and over half were hydrophobic (Val, Leu, Ile, Tyr, and Trp). Therefore, in support of a steric model of autoinhibition [72, 217], the SRR associates with a hydrophobic patch on the Ets1 ETS domain that overlaps its DNA-recognition interface.

Figure 3.1: The SRR interacts with a well-defined surface of the ETS domain encompassing the recognition helix H3 and flanking regions.
(a) The ETS domain binds in the major groove of DNA using a wHTH motif. The recognition helix H3 contains a tyrosine and two arginines that make key hydrogen bonds with the core GGA bases. In the DNA-bound state, the inhibitory helices HI-1 and HI-2 are unfolded (adapted from PDB: 1MDM). (b) The trans system consists of unlabeled, phosphorylated peptide corresponding to Ets1 residues 279-295 (red) along with 15N/13C-labeled Ets1301-440, containing the core ETS domain (dark blue) and inhibitory helices (light blue). (c, d) Amide 1H-15N CSPs of Ets1301-440 upon addition of the WT2P peptide at 5.3-fold molar excess in 300 mM NaCl (~ 80 % saturation, see Table 3.1). The largest changes (> 0.05 ppm, dashed line and highlighted in red on the NMR-derived structure of free Ets1, PDB: 1R36) cluster around the recognition helix H3, the loop leading to S3, and the N-terminus of H1. (e) Cα, Cβ, methyl and aromatic moieties of Ets1301-440 were also monitored upon addition of the WT2P peptide to a 1:1 molar ratio in 100 mM NaCl (~ 85 % saturation, Figure 3.3). Residues with resolved signals showing a 1H-13C CSP > 0.05 ppm are identified with red spheres.

Figure 3.2: Amide chemical shift perturbations in Ets1301-440 upon addition of the WT2P peptide.
Overlaid 15N-HSQC spectra of 15N-labeled Ets1301-440 recorded upon titration with small aliquots of the unlabeled WT2P peptide (Table 3.1).
The peptide causes a large number of amide signals in the protein to shift in fast exchange relative to the chemical shift timescale. Residues exhibiting particularly large amide CSPs are labeled, and correspond to those in H3 (~ 392-399) and nearby regions (see Figure 3.1c, d). Under the sample conditions (300 mM NaCl, KD ~ 140 µM), the final protein:peptide molar ratio of 1:5.3 corresponds to ~ 80 % saturation of the Ets1301-440/WT2P complex.

Figure 3.3: 1H-13C chemical shift perturbations in Ets1301-440 upon addition of the WT2P peptide.
Overlaid 13C-HSQC spectra of 15N/13C-labeled Ets1301-440 in the absence (blue) and presence of 0.5 (purple) and 1.0 (red) molar equivalents of the unlabeled WT2P peptide. Peaks in the methyl (top), Cα (middle) and aromatic (bottom) regions shift in the fast exchange regime. Overall, the peptide caused relatively small spectral perturbations, indicating that the Ets1301-440 structure remained essentially unchanged. Residues exhibiting the greatest perturbations are labeled and highlighted in Figure 3.1e. Due to spectral overlap, not all residues exhibiting large CSPs were assigned. Under these conditions (100 mM NaCl, KD ~ 10 µM), the Ets1301-440/WT2P complex is ~ 85 % saturated at the final titration point.

Increased hydrophobicity strengthens the interaction between the SRR and the ETS domain

Within the SRR region, there are four aromatic residues shown previously to be crucial for the phosphorylation-enhanced interaction with the ETS domain of Ets1 [72]. Mutation of these residues to alanine significantly impaired DNA-binding autoinhibition in the intramolecular context of Ets1279-440, and also weakened the intermolecular association of the SRR peptide to the ETS domain [72]. Are these tyrosine and phenylalanine residues important due to their hydrophobic character, aromatic character, or both? How do they cooperate with the adjacent phosphorylated serine residues?
To answer these questions, and better understand the nature of the ETS domain/SRR interaction, I expanded the trans system to include peptides which retained the overall pattern of the wild-type SRR, including phosphorylation sites, but had alternative residues at the four aromatic positions. This resulted in a series of peptide variants with a wide range of hydrophobic and aromatic properties (Table 3.1).

Titrations of the SRR peptide variants into 15N-labeled Ets1301-440 were monitored with 15N-HSQC spectra, allowing KD determination (Figure 3.4). In all cases, residues in the protein exhibiting large amide CSPs clustered around the same region bound by the WT2P peptide (Figure 3.5a, compare with Figure 3.1c). This indicates that the general features of the SRR peptide are sufficient to direct binding to a common interface on the ETS domain. Fitting of the titration data (Figure 3.4) yielded the equilibrium dissociation KD values summarized in Table 3.1.

Table 3.1: Sequence dependence of the SRR peptide interactions with Ets1301-440.

Peptide name   Sequence^a                Relative hydrophobicity^b   Charge^c   KD (µM)^d
5fPhe2P*       RVPS^PFDS^PFDFEDFPAALW    1                           -6.5       9 ± 6
Trp2P*         RVPS^PWDS^PWDWEDWPAALW    0.88                        -6.5       26 ± 8
Leu2P*         RVPS^PLDS^PLDLEDLPAALW    0.88                        -6.5       43 ± 5
Phe2P*         RVPS^PFDS^PFDFEDFPAALW    0.90                        -6.5       60 ± 15
WT2P*          RVPS^PYDS^PFDYEDYPAALW    0.65                        -6.5       130 ± 35
WT2P           RVPS^PYDS^PFDYEDYPAAL     -                           -6.5       140 ± 20
Val2P*         RVPS^PVDS^PVDVEDVPAALW    0.69                        -6.5       250 ± 50
Ala2P*         RVPS^PADS^PADAEDAPAALW    0.37                        -6.5       930 ± 170
WT0P*          RVPS^0YDS^0FDYEDYPAALW    -                           -3         1200 ± 200
Trp0P*         RVPS^0WDS^0WDWEDWPAALW    -                           -3         N.D.
Clus2P*        RPDS^PDES^PDYPYVALYAFW    -                           -6.5       N.D.

a Residues in bold correspond to the four aromatic positions substituted in the peptide variants. The presence or absence of phosphorylation at Ser282 and Ser285, as found in CaMKII-modified Ets1, is indicated by 2P or 0P, respectively. The pentafluoro-phenylalanines in 5fPhe2P* are denoted as F. Clus2P* is a peptide with clustered charged and hydrophobic residues.
The * denotes a non-native C-terminal tryptophan used for quantitation.
b Hydrophobicity calculated relative to the 5fPhe2P* peptide, set to 1 (see Methods).
c Approximate net charge of the peptides, with blocked termini, at pH 6.5, assuming -1.5 for pSer, -1 for Glu/Asp, and +1 for Arg.
d Determined from 15N-HSQC monitored titrations in 300 mM NaCl, 20 mM MES, pH 6.5, 28 °C. N.D.: not determined. Addition of Trp0P* and Clus2P* peptides caused protein aggregation.

Figure 3.4: Determination of the dissociation constants (KD) between the SRR peptide variants and Ets1301-440 from 15N-HSQC monitored titrations.
Plotted are the amide 1H-15N CSP values of Leu341 (●) and Leu389 () as a function of volume of peptide added to 15N-labeled Ets1301-440 (300 mM NaCl, 20 mM MES, pH 6.5, 28 °C). The KD values reported in Table 3.1 represent the mean ± standard deviation of the individual fit KD values (solid curves) for 10 such amides. The quantities of protein (initially ~ 280 µM in ~ 400 µL) and peptide (~ 2.5 mM stock solution) varied in each experiment and thus the molar ratios of peptide to protein are not exactly comparable among the graphs. Data points outside the x-axis maximum of 400 µL are not shown for the Val2P* and WT0P* titrations.

Increasing the hydrophobic nature of the peptides resulted in tighter binding to Ets1301-440 in the order 5fPhe2P* > Trp2P* > Leu2P* > Phe2P* > WT2P* > Val2P* > Ala2P*. Therefore, hydrophobicity, and not strictly aromaticity, is important for the interaction (Figure 3.5b). For example, the Leu2P* peptide bound the ETS domain with greater affinity than the WT2P* peptide, even though the phenylalanine and three tyrosine residues in the latter were all replaced by leucines.
The fact that the exact sequence was not important in localizing the peptide, and that multiple variants had relatively high affinity for the ETS domain, also indicated that the association between the SRR and Ets1301-440 is driven by the hydrophobic effect, rather than van der Waals interactions with more stringent stereochemical constraints. Finally, since 5fPhe is very hydrophobic, yet has reduced aromatic ring electron density, I concluded that π-cation interactions do not dictate formation of the SRR/ETS domain complex [220].

I also tested a peptide with the same amino acid composition as the WT2P*, but with the negatively charged pSer, Asp, and Glu residues in one contiguous block, followed by the hydrophobic residues. Upon addition of this clustered peptide (Clus2P*), Ets1301-440 aggregated, and I was unable to determine a KD value for the interaction (Table 3.1). Nevertheless, this result suggests that functional interaction between the ETS domain and the SRR is dependent on the sequence pattern, and not simply the amino acid composition, of the phosphopeptide. Similarly, addition of the Trp0P* peptide, which is very hydrophobic and lacks phosphorylation, also caused protein aggregation, thus precluding a KD determination (Table 3.1).

Figure 3.5: Increased hydrophobicity strengthens the interaction between the ETS domain and the SRR phosphopeptides.
(a) Addition of phosphorylated peptide variants to 15N-labeled Ets1301-440 caused large CSPs in the same region of the protein as affected by the WT2P peptide (300 mM NaCl; cf. Figure 3.1c). Shown are 1H-15N CSP values for representative cases of the Ala2P* (cyan, 1:14.5 protein:peptide ratio, ~ 60 % saturation), Trp2P* (orange, 1:5, ~ 95 %) and 5fPhe2P* (black, 1:4.5, ~ 98 %) peptide variants.
(b) The NMR-derived KD values obtained from titrations with the phosphorylated peptides plotted as a function of hydrophobicity relative to the 5fPhe2P* peptide, which was arbitrarily assigned a value of 1. The scale was based on side chain hydrophobicity values for the 20 standard amino acids [221], and predicted partition coefficients (octanol/water) of Phe relative to 5fPhe [222, 223]. The dashed line is drawn as a visual aid to highlight the trend. This overall trend does not change with other hydrophobicity scales (not shown).

Electrostatic interactions with phosphoserines are also important for SRR association to the ETS domain

The absence of phosphorylation in the WT0P* peptide resulted in a ~ 10-fold decrease in binding affinity to Ets1301-440 (Table 3.1). This is consistent with previous studies showing that phosphorylation at Ser282 and Ser285 is important for binding of the SRR peptide, as well as for autoinhibition of Ets1 [72, 216, 217]. To gain further insights into the nature of these interactions, I carried out NMR-monitored titrations as a function of NaCl concentration. Increasing ionic strength caused binding of the phosphorylated WT2P* peptide to Ets1301-440 to be weakened significantly (Table 3.2). This highlights an electrostatic contribution to their interaction. Based on the slope of a plot of log KA (where KA = 1/KD) versus log [NaCl], I estimated that the net equivalent of two to three counter-ions is released upon association of this SRR peptide with the ETS domain (Figure 3.6) [152].

In addition to two phosphoserines, the SRR contains three aspartates, one glutamate, and one arginine that could contribute to electrostatic binding. To confirm that the phosphate groups are involved in the association with the ETS domain, the changes in the 31P signals of the WT2P peptide upon titration with Ets1301-440 were also monitored.
These signals were assigned to pSer282 and pSer285 via 3JPH scalar couplings with the phosphoserine 1Hβ nuclei (Figure 3.7a). As shown in Figure 3.7b, the 31P signals deriving from the phosphoserine residues in both major (trans Val280-Pro281) and minor (cis) conformers of the peptide shifted downfield upon addition of the protein, with pSer285 exhibiting the largest CSP.

Table 3.2: Increasing ionic strength weakens the Ets1301-440/WT2P* interaction.

Name     Peptide sequence          [NaCl] (mM)   KD (µM)
WT2P*    RVPS^PYDS^PFDYEDYPAALW    50            ~ 2^a
                                   200           33 ± 9
                                   300           130 ± 35

a Deviation from the fast exchange limit and estimated tighter binding precluded reliable fitting of the 15N-HSQC monitored titration data.

Figure 3.6: The interaction between the SRR peptide and the ETS domain is dependent on ionic strength.
Titrations of the WT2P* peptide into 15N-labeled Ets1301-440 were carried out under conditions of differing ionic strength (0.05 M NaCl, 0.2 M NaCl, 0.3 M NaCl). From the resulting binding isotherms, the association constants (KA) were calculated. Based on the slope of log (KA) versus log [NaCl], an estimated 2-3 ionic contacts are being made between the two molecules [152]. The error estimates were obtained from linear regression analysis.

Figure 3.7: The phosphate groups on the WT2P peptide are involved in the interaction with Ets1301-440.
The changes in 31P chemical shifts arising from the phosphate groups in pSer282 and pSer285 were monitored upon addition of unlabeled Ets1301-440 to the WT2P peptide. (a) The 31P-HSQC spectrum of the WT2P peptide in the absence of Ets1 (5 mM MES, 50 mM NaCl, pH 6.5, 28 °C) is shown. The peaks were assigned using correlations between the phosphoserine phosphate and 1Hβ nuclei via 3JPH couplings. The * identify signals arising from a minor population of the peptide with a cis Val280-Pro281 amide.
(b) The 31P signals of the pSer282 and pSer285 phosphate groups in the WT2P peptide shift downfield upon addition of Ets1301-440. A small increase in sample pH from 6.41 to 6.51 during the titration was also observed, as evidenced by the small changes in the signal from inorganic phosphate. Control pH titrations confirmed that the perturbations of the phosphoserine signals were due primarily to protein binding, with only minor contributions from the change in sample pH (not shown).

Thus pSer282 and pSer285 indeed interact with the ETS domain. The observed chemical shift changes could arise directly from changes in the magnetic environments of the 31P nuclei due, for example, to hydrogen bonding or salt bridge formation with the protein. However, based on coarse reference 31P NMR spectra of the WT2P peptide recorded as a function of pH, under the experimental conditions of pH ~ 6.5 the phosphoserines exist in an acid-base equilibrium with roughly equal populations of mono- and di-anionic species. Furthermore, the downfield shifts in their 31P signals induced by protein binding suggest changes in these populations to favor the fully deprotonated species [224].

Collectively, these results show that the interaction between the SRR and the ETS domain of Ets1 occurs in a well-defined and specific region of the protein encompassing the DNA-recognition helix and nearby regions. In addition, the SRR uses a combination of electrostatic and hydrophobic-driven effects to associate with the DNA-binding interface of the ETS domain.

The SRR remains disordered upon binding the ETS domain

In previous studies of several Ets1 fragments, the NMR signals from the SRR region were assigned [72, 217]. Using CSPs and paramagnetic relaxation enhancement data, the intramolecular interaction interface with the ETS domain was coarsely mapped. However, no unambiguous 1H-1H NOE interactions between nuclei in the SRR and the ETS domain were detected.
Along with additional data, including 15N relaxation measurements, this indicated that the SRR only transiently bound the ETS domain to mediate autoinhibition. However, over the course of this current study, it became apparent that under conditions of low ionic strength, the WT2P peptide bound Ets1301-440 with low µM affinity (Table 3.2). Such conditions were not used with the above-mentioned Ets1 fragments as they tended to aggregate over the extended time periods needed for detailed NMR studies. The tighter binding of the current peptide models provided the opportunity to pursue more detailed structural studies.

Using double-filtered 1H-1H NOESY and TOCSY experiments, I assigned the majority of proton chemical shifts of the unlabeled WT2P peptide bound to 15N/13C-labeled Ets1301-440, whose signals were suppressed by the isotope filters. Overall, the 1H chemical shifts of the peptide in its free and bound forms were similar, indicating that it does not stably fold upon binding to adopt a well-defined conformation (Figure 3.8).

Figure 3.8: The SRR peptide does not undergo large changes in backbone chemical shifts upon binding. Shown are the 1HN/1Hα regions of (a) 1H-1H TOCSY and (b) 1H-1H NOESY spectra (150 ms mixing time) of the WT2P peptide in its free and Ets1301-440 bound states (1:1 molar ratio WT2P:Ets1301-440, 100 mM NaCl, 28 °C, ~ 85% saturation). For the bound forms, the signals from the 13C/15N-labeled protein were suppressed by filtering in both dimensions. These spectra allowed nearly complete assignment of all 1H signals arising from the peptide. Overall, the small changes in backbone chemical shifts upon addition of protein demonstrate that the peptide does not fold into any persistent conformation upon binding. In the bound state, the NOESY crosspeaks are stronger due to slower tumbling of the peptide-protein complex.
Note that signals from residues at the N-terminus of the peptide, such as R279 and pS282, remain relatively sharp in the bound state and exhibit small CSPs (see also Figure 3.9). In contrast, residues near the center of the peptide, such as pS285 and F286, become noticeably broader and undergo larger CSPs.

More quantitatively, I compared the 1Hα chemical shifts of the free and bound WT2P peptides versus the random coil chemical shifts predicted for this peptide sequence with the Camcoil program [225], corrected for phosphorylation effects (see Methods) [224] (Figure 3.9a). Differences between observed and expected random coil chemical shifts are called secondary shifts and are very sensitive indicators of secondary structure propensities [226, 227]. Overall, both the free and bound peptides had small secondary 1Hα chemical shifts, most of which fell within the +/- 0.1 ppm cutoff used by Wishart et al. for defining secondary structures via the chemical shift index (CSI) analysis [226]. This confirms the above-mentioned observations that the peptide is disordered and does not stably fold upon association. A more detailed analysis of the conformational features of the SRR peptides will be presented in section 3.3.3, below.

In addition, I calculated the 1Hα and 1HN chemical shift changes for the WT2P peptide upon binding the ETS domain. Residues ~ 283 - 291, and pS285 and F286 in particular, exhibited the largest CSPs (Figure 3.9b). Such CSPs may arise from conformational changes or from other factors, such as altered electric field effects or solvation. Regardless, this indicates that residues in the center of the peptide are the most perturbed upon contacting Ets1301-440. On the other hand, residues at the N-terminus (279-282) and C-terminus (293-294) had relatively smaller changes in their backbone chemical shifts. This is consistent with previous results implicating the central aromatic and negatively charged residues in binding the ETS domain [72].
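The secondary shift comparison described above can be illustrated with a minimal Python sketch, which subtracts predicted random coil 1Hα shifts from observed values and lists residues that fall outside the +/- 0.1 ppm CSI cutoff. The chemical shift values, residue labels, function names, and the optional per-residue offset (standing in for a phosphorylation correction) are illustrative assumptions and are not data or code from this work.

```python
from typing import Dict, List, Optional

CSI_CUTOFF_PPM = 0.1  # empirical 1H-alpha cutoff used in CSI analysis


def secondary_shifts(
    observed: Dict[str, float],
    random_coil: Dict[str, float],
    offsets: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Observed shift minus (random coil shift + optional offset), in ppm."""
    offsets = offsets or {}
    return {
        res: round(obs - (random_coil[res] + offsets.get(res, 0.0)), 3)
        for res, obs in observed.items()
        if res in random_coil
    }


def outside_cutoff(
    deltas: Dict[str, float], cutoff: float = CSI_CUTOFF_PPM
) -> List[str]:
    """Residues whose secondary shift magnitude exceeds the cutoff."""
    return [res for res, delta in deltas.items() if abs(delta) > cutoff]


# Hypothetical 1H-alpha shifts (ppm) for four placeholder residues.
observed = {"res_1": 4.52, "res_2": 4.60, "res_3": 4.75, "res_4": 4.58}
random_coil = {"res_1": 4.55, "res_2": 4.64, "res_3": 4.62, "res_4": 4.56}

deltas = secondary_shifts(observed, random_coil)
flagged = outside_cutoff(deltas)
print(deltas)
print(flagged)
```

In this toy input only one placeholder residue exceeds the cutoff; a predominantly disordered peptide would be expected to give few or no flagged residues.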
Figure 3.9: Changes in the WT2P peptide upon binding the ETS domain. The backbone chemical shifts (1HN, 1Hα) of the WT2P peptide in its free and Ets1301-440-bound states were assigned and used to determine chemical shift changes. (a) Secondary structure shifts of 1Hα atoms of the free and bound WT2P peptide relative to random coil chemical shifts determined using Camcoil [225] and corrected for phosphorylation effects (see Methods). Bound refers to a complex composed of a 1:1 molar ratio of Ets1301-440/WT2P peptide at 100 mM NaCl. Under these conditions, the complex is ~ 85% saturated. Most residues do not deviate greatly from random coil chemical shifts, even in the bound state, indicating that the peptide does not fold upon binding. The dashed lines correspond to the empirical cutoff value of +/- 0.1 ppm used by Wishart et al. [226] for CSI analysis. (b) The differences between the free and bound states were plotted as a function of residue number for 1HN (white) and 1Hα (black) atoms. Residues 283-291 undergo the largest chemical shift changes upon association and define the interaction interface. The peptide sequence found in the WT2P peptide is shown above in red, with phosphorylation sites marked with a “P”.

The SRR does not associate with the DNA-bound ETS domain

To test whether the WT2P peptide could compete for binding with DNA in trans, I carried out titration studies using the WT2P peptide and a previously-described high-affinity Ets1/DNA complex [39]. No changes in the 15N-HSQC spectrum of the labeled protein were observed upon addition of a 3-fold molar excess of WT2P peptide to a pre-formed 1:1 15N-labeled Ets1301-440/DNA complex (Figure 3.10a). Thus, under these conditions, the SRR peptide is unable to compete with DNA for ETS domain binding. Upon addition of cognate DNA to the preformed Ets1301-440/WT2P complex, amide 1H-15N signals moved in slow exchange to the corresponding chemical shifts of the DNA-bound state (Figure 3.10b).
Thus, the WT2P peptide does not associate with other parts of the ETS domain in the DNA-bound state, and DNA-binding precludes interaction with the WT2P peptide. These results are consistent with the fact that the DNA duplex binds with nM affinity to Ets1301-440 [39], whereas the measured interaction strength between the WT2P peptide and Ets1301-440 is ~ 130 µM under these conditions. In addition, the data support a steric mechanism of autoinhibition, in which association of the ETS domain with the SRR and with DNA are mutually exclusive events. To be an effective steric regulator for specific DNA sequences that have high affinity for the ETS domain, the SRR likely needs to be in cis (connected to the protein), thereby allowing an increase in the effective concentration of the disordered region on the ETS domain, and reinforcing the autoinhibitory effect of the helical IM composed of HI-1, HI-2, H4, and H5 [219].

Figure 3.10: Cognate DNA competes with the WT2P peptide for binding to Ets1301-440. The WT2P peptide and cognate DNA duplex were added to 15N-labeled Ets1301-440 and the changes were tracked with 15N-HSQC spectra. (a) Right: Starting with free Ets1301-440 (blue), cognate DNA was added in small increments to make a 1:1 protein:DNA complex (yellow). The Trp 15Nε1 indole signals shifted in slow exchange from the free to the DNA-bound state. Left: Excess WT2P peptide was then added (red). No further changes were observed, indicating that, under these conditions and concentrations, the WT2P peptide cannot displace bound DNA. (b) Right: The reverse experiment. The WT2P peptide was added in molar excess (red). The peaks shifted in fast exchange from the free (blue) to the peptide-bound state (red). Left: Subsequently, the DNA duplex was added in small increments (yellow). The amide peaks moved in slow exchange from the peptide-bound to the DNA-bound state, indicating displacement of the peptide by the DNA.
Under these conditions (300 mM NaCl), the KD of the WT2P peptide for Ets1 is ~ 130 µM, whereas DNA binds with nM affinity [39]. The molar ratios of each macromolecule present in the sample are indicated below the legend. The 1Hε1-15Nε1 signals of W375 and W338 are labeled in each spectrum. For clarity, intermediate spectra along the titration series are not shown, and hence fast and slow exchange behavior are indicated by the straight and curved arrows, respectively.

3.3.2 Structural models of steric inhibition by the SRR

NMR-based models of the Ets1301-440/WT2P complex

Three-dimensional filtered-edited NOESY experiments were used to selectively observe intermolecular 1H-1H NOEs between the 15N/13C-labeled Ets1301-440 and the unlabeled WT2P peptide. These experiments detect NOEs between a 1H nucleus directly bonded to a 12C/14N (present in the unlabeled WT2P peptide) and a 1H nucleus directly bonded to a 13C/15N, with the latter resolved by the shift of the bonded 13C/15N nucleus. The signals obtained were weak due to the dynamic nature of the complex and the relatively low sample concentrations (200 - 300 µM) required to maintain solubility (Figure 3.11). However, with long collection times (~ 3 days), I detected ~ 60 intermolecular NOE crosspeaks between the peptide and the protein. Intramolecular 1H-1H NOEs, such as those arising from the buried sulfur-bonded 1H of Cys350 and neighboring labeled atoms, were excluded by collecting control experiments in the absence of WT2P peptide. To assign the intermolecular protein-peptide NOEs, I used 13C-HSQC and 15N-HSQC titrations to extend the previously reported chemical shifts of free Ets1301-440 to those of the bound protein [215]. In parallel, the chemical shifts of the bound peptide were obtained from double-filtered 1H-1H TOCSY and NOESY spectra (Figure 3.8).
Due to the relatively large number of aromatic residues present in both the protein and the peptide (14 and 4, respectively) along with spectral overlap, some NOE signals were either not assigned or assigned ambiguously. In total, I obtained 15 unambiguous and 17 ambiguous intermolecular 1H-1H NOEs to use as restraints for building a model of the WT2P/Ets1301-440 complex (Table 3.3).

Table 3.3: Intermolecular NOE crosspeaks used as CYANA distance restraints for NMR structure calculations.

Unambiguous
No.   Protein     Peptide
1     L337 Hδ2    F286 Hζ
2     R391 HN     F286 Hζ
3     G392 HN     F286 Hζ
4     Y395 Hε     F286 Hβ
5     Y395 HN     F286 Hβ
6     I401 Hδ1    F286 Hβ
7     I401 Hδ1    Y291 Hδ
8     I401 Hδ1    Y291 Hε
9     I401 Hδ1    L295 Hδ
10    I401 Hγ2    L295 Hδ
11    L418 Hδ2    L295 Hδ
12    L421 Hδ1    L295 Hδ
13    L421 Hδ2    Y291 Hε
14    L421 Hδ2    A293 Hβ
15    L421 Hδ2    L295 Hδ

Ambiguous
No.   Protein     Peptide
1     T330 Hγ     P281 Hβ / P292 Hβ
2     L337 Hδ2    Y283 Hε / Y288 Hε
3     M384 Hε     Y283 Hδ / Y288 Hδ
4     K388 HN     F286 Hζ / F286 Hδ
5     L389 HN     Y283 Hδ / Y288 Hδ
6     G392 HN     Y283 Hε / Y288 Hε
7     Y395 HN     Y283 Hδ / F286 Hζ / Y288 Hδ
8     Y395 HN     Y283 Hε / Y288 Hε / Y291 Hε
9     Y395 Hε     Y283 Hδ / Y288 Hδ
10    Y396 HN     Y283 Hδ / Y288 Hδ
11    I401 Hδ1    P281 Hβ / P292 Hβ
12    I401 Hδ1    R279’ Hγ / L295 Hβ
13    L421 Hδ1    P281 Hβ / P292 Hβ
14    L421 Hδ1    A294 Hα / R279’ Hα
15    L421 Hδ1    R279’ Hγ / L295 Hβ
16    L421 Hδ1    A294 Hβ / A293’ Hβ
17    L421 Hδ2    P281 Hβ / P292 Hβ

Figure 3.11: Filtered-edited 3D 1H-15N/13C-1H NOESY spectrum of 15N/13C-labeled Ets1301-440 with bound unlabeled WT2P peptide. Shown are selected regions of the spectrum containing intermolecular NOE crosspeaks between 1H on the peptide and (a) 13C-bonded or (b) 15N-bonded 1H on the protein. Overall, the signals tended to be weak. However, multiple consistent NOEs between the peptide and the protein were detected, including those involving the sidechains of Ile401 and Leu421 as well as amides in the recognition helix H3. These intermolecular NOEs have been assigned and are listed in Table 3.3.
Ile401 in Ets1301-440 exhibited the largest 1H-15N CSPs upon binding the WT2P peptide (Figure 3.1c) and also gave rise to the most intermolecular crosspeaks between the peptide and the protein (Figure 3.11, Table 3.3). Other hydrophobic residues near Ile401, including Leu418, and in particular Leu421, also gave rise to unambiguous intermolecular NOEs to residues at the C-terminus of the WT2P peptide including Ala293 and Leu295. Overall, this is consistent with the CSP analysis discussed above (Figure 3.1), and implicates helix H4 of the IM in also mediating favorable hydrophobic-driven contacts with the SRR. It is important to note that this positions the peptide with its C-terminus near the N-terminus of Ets1301-440, as required when the two molecules form a continuous polypeptide in the context of the native protein.

Residues in the DNA recognition helix H3, such as Leu389, Arg391, Gly392, Tyr395, and Tyr396, also exhibited ambiguous and unambiguous NOE crosspeaks to aromatic residues in the peptide (Figure 3.11, Table 3.3), supporting the steric model of autoinhibition. Tyr291 showed the most line broadening and distinct chemical shifts upon binding the ETS domain. NOEs between Tyr291 and Ile401/Leu421 position this side chain in the hydrophobic pocket of the ETS domain found between Trp338 and Ile401. This is consistent with the particularly high CSPs of these residues upon peptide binding, as well as the presence of a 5fPhe side chain at this site in a crystal structure (see below). Unambiguous NOEs also place Phe286 of the SRR peptide near the DNA-binding residues Leu393 and Tyr395. Finally, Tyr283 and Tyr288 gave rise to about half of all ambiguous NOEs detected. This ambiguity is due to degeneracy in the 1Hδ and 1Hε chemical shifts of these residues.
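The statement that Tyr283 and Tyr288 account for about half of the ambiguous NOEs can be checked with a short tally of Table 3.3. The Python sketch below is only a bookkeeping aid: the list transcribes the candidate peptide residues of each ambiguous restraint by residue number (primed entries such as R279’ are recorded by number only), and the variable and function names are hypothetical.

```python
from typing import Iterable, List, Tuple

# Candidate peptide residues for each ambiguous restraint in Table 3.3,
# in table order. The protein partner is noted in the comment.
AMBIGUOUS_NOES: List[Tuple[int, ...]] = [
    (281, 292),       # 1  T330 Hγ
    (283, 288),       # 2  L337 Hδ2
    (283, 288),       # 3  M384 Hε
    (286,),           # 4  K388 HN (F286 Hζ or Hδ)
    (283, 288),       # 5  L389 HN
    (283, 288),       # 6  G392 HN
    (283, 286, 288),  # 7  Y395 HN
    (283, 288, 291),  # 8  Y395 HN
    (283, 288),       # 9  Y395 Hε
    (283, 288),       # 10 Y396 HN
    (281, 292),       # 11 I401 Hδ1
    (279, 295),       # 12 I401 Hδ1
    (281, 292),       # 13 L421 Hδ1
    (294, 279),       # 14 L421 Hδ1
    (279, 295),       # 15 L421 Hδ1
    (294, 293),       # 16 L421 Hδ1
    (281, 292),       # 17 L421 Hδ2
]


def count_involving(
    restraints: Iterable[Tuple[int, ...]], residues: Iterable[int]
) -> int:
    """Count restraints whose candidate list includes any given residue."""
    targets = set(residues)
    return sum(1 for candidates in restraints if targets & set(candidates))


n_tyr = count_involving(AMBIGUOUS_NOES, (283, 288))
fraction = n_tyr / len(AMBIGUOUS_NOES)
print(n_tyr, len(AMBIGUOUS_NOES), round(fraction, 2))
```

With this transcription, 8 of the 17 ambiguous restraints list Tyr283 and/or Tyr288 as candidates.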
To obtain structural models of the interaction between Ets1301-440 and the WT2P peptide, I combined published upper distance restraints of the free Ets1301-440 ensemble [213] (PDB: 1R36) with the restraints described above for CYANA structural calculations (see Methods). The resulting ensemble is shown in Figure 3.12. These models show that the SRR peptide adopts a loosely-defined S-shaped conformation, with turns of the peptide backbone driven by a relatively large number of NOEs between the aromatic residues on the peptide (Y283, F286, Y288, Y291) and the aromatic residues in H3 (Table 3.3; Figures 3.11 and 3.12). In particular, residues pSer282 and pSer285 were found to be involved in a turn of the peptide backbone defined by NOEs between the adjacent aromatic residues on the peptide (e.g. Y283, F286, and Y288) and the aromatic residues in H3. This places several negatively charged residues (pS282, D284, pS285, D287, E289, D290) facing the outside of the DNA-binding helix H3 (Figure 3.12b). In addition, the models positioned the C-terminal L295 of the SRR peptide in close proximity to the N-terminal K301 of the ETS domain, as would occur in the context of the native protein (dashed lines in Figure 3.12).

Analysis of the biophysical properties of the amino acids showed good surface complementarity between the two molecules (Figure 3.12c). Although the exact positioning of the side chains is variable, hydrophobic/aromatic residues on the WT2P peptide occupy a hydrophobic patch of Ets1 proximal to H3 composed of residues I401, W338, L418, L421, Y395, Y396, and W375. In addition, the negatively charged residues on the peptide are localized along a positively charged region of H3 composed of residues K379, K381, K388, R391 and R394.
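The residue grouping behind this complementarity analysis can be sketched in a few lines of Python, using the residue classes given in the Figure 3.12c caption. The peptide dictionary is assembled from the residue names used in this chapter and assumes that the WT2P peptide spans residues 279-295; the names `WT2P`, `classify`, and `group_residues` are hypothetical, and the sketch only sorts residues into classes rather than evaluating any structural property.

```python
from typing import Dict, Iterable, List

# WT2P residues by number; "pS" denotes phosphoserine.
# Assumption: the peptide spans residues 279-295 as named in the text.
WT2P: Dict[int, str] = {
    279: "R", 280: "V", 281: "P", 282: "pS", 283: "Y", 284: "D",
    285: "pS", 286: "F", 287: "D", 288: "Y", 289: "E", 290: "D",
    291: "Y", 292: "P", 293: "A", 294: "A", 295: "L",
}

# Residue classes as listed in the Figure 3.12c caption.
HYDROPHOBIC = set("AVILPCMYWF")
NEGATIVE = {"D", "E", "pS"}
POSITIVE = {"R", "K", "H"}


def classify(residue: str) -> str:
    """Assign a residue type to one of the caption's classes."""
    if residue in NEGATIVE:
        return "negative"
    if residue in POSITIVE:
        return "positive"
    if residue in HYDROPHOBIC:
        return "hydrophobic"
    return "other"


def group_residues(
    peptide: Dict[int, str], skip: Iterable[int] = (279, 280, 281)
) -> Dict[str, List[int]]:
    """Group residue numbers by class, skipping the dynamic N-terminus."""
    skipped = set(skip)
    groups: Dict[str, List[int]] = {
        "hydrophobic": [], "negative": [], "positive": [], "other": []
    }
    for number in sorted(peptide):
        if number not in skipped:
            groups[classify(peptide[number])].append(number)
    return groups


groups = group_residues(WT2P)
print(groups)
```

With Arg279, Val280, and Pro281 excluded, the negative class contains the six residues named above (pS282, D284, pS285, D287, E289, D290) and the remaining residues all fall in the hydrophobic/aromatic class.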
Overall, this is consistent with the observations that the hydrophobic character of the peptide is important in increasing affinity for the ETS domain and that electrostatic interactions, including those with the phosphoserines, are also involved in binding. It is worth stressing that, despite my best efforts to obtain higher resolution NMR models of the complex, the NOE restraints were sparse and consistent with a dynamic (fuzzy) complex. Thus, I cannot rule out the possibility that the peptide is binding in multiple orientations and conformations. To gain further structural insight into this interaction, we conducted complementary X-ray crystallographic studies.

Figure 3.12: NMR-derived models of the Ets1301-440/WT2P complex. (a) Shown is the CYANA calculated structural ensemble obtained from experimental NOE restraints. The WT2P peptide (red) is bound along a surface extending from the IM to the recognition helix H3. The dashed lines represent the connection between the SRR and the ED/IM as would be expected in the cis context of the native protein. (b) The top-ranked structural model calculated by CYANA is shown in more detail, with residues that are hydrophobic/aromatic highlighted in lime green. (c) Analysis of the surface complementarity between the ETS domain and the CYANA-derived structural ensemble. Hydrophobic residues (A, V, I, L, P, C, M, Y, W, F) on the peptide (top) and ETS domain (bottom) are colored in lime green; negatively charged residues (D, E, pS) in red, and positively charged (R, K, H) residues in blue. The dynamic N-terminal residues of the peptide Arg279, Val280, and Pro281 have been excluded from the analysis. The same view is shown in both cases, and highlights the complementarity in the positions of hydrophobic and charged residues on the molecules.

Crystal structure of Ets1301-440 in complex with the high-affinity 5fPhe2P* peptide

In collaboration with Ms.
Chang Sheng-Huei Lin in the Murphy group, I co-crystallized Ets1301-440 in complex with the peptide mimic 5fPhe2P*. This doubly-phosphorylated peptide, with penta-fluorophenylalanines at the four aromatic positions and a non-native C-terminal tryptophan for quantification has the highest affinity for the ETS domain (9 µM at 300 mM NaCl). The crystallization drops were prepared at low ionic strength (75 - 100 mM NaCl) to favor binding. Although this caused protein aggregation, diffraction-quality crystals were obtained and the structure of the complex was solved at 2.00 Å resolution by molecular replacement. The data collection parameters and refinement statistics are provided in Table 3.4.   Within the peptide-protein complex, the Ets1301-440 molecules formed domain swapped dimers (Figure 3.13a). As observed in previous crystals of Ets1 (e.g. PDB: 1MD0 [78]), this domain swapping occurs as helix HI-1 of one monomer aligns with helix H4 of a second Ets1 monomer to effectively form an extended helix. Residues 281-296 of the peptide were clearly defined in the electron density map (Figure 3.13b). I also found the N-terminus of the peptide to be near HI-1, in contrast to the NMR-derived models, which clearly position the C-terminal L295 in the WT2P peptide at this position. This result may be due to the inherent symmetry of the peptide sequence, which has a pseudo-palindromic repeating pattern of aromatic and negatively-charged residues in the center of the sequence (Table 3.1). The peptide variant may bind in two energetically-similar orientations, and the crystallization conditions may favor one over the other.   Nevertheless, the crystal structure recapitulated the interaction features observed in solution for the ETS1301-440/WT2P complex by NMR spectroscopy and provided additional insight (Figure 3.13c). The aromatic 5fPhe residues mediate close contacts to aromatic residues in H3 in the ETS domain. 
For instance, Trp338, Tyr395, and Tyr396 in the ETS domain stack with residues 5fPhe 283, 286, and 288 in the peptide, respectively, with distances ~ 4 Å between the planar ring faces (Figure 3.13c). These residues of the ETS domain also gave rise to particularly high CSPs upon addition of the WT2P peptide, and a number of intermolecular NOEs involving the aromatic residues (Figure 3.1c, Table 3.3). In addition, the negatively charged residues Glu289 and Asp290 observed in the crystal structure are involved in a turn of the peptide backbone that places the adjacent hydrophobic residues 5fPhe288 and 291 in close proximity and facing the protein (Figure 3.13c). In the NMR-derived models, the negatively charged Asp284 and pSer285 residues are also involved in a turn of the peptide backbone, with the adjacent Tyr283 and Phe286 facing the protein.

Comparison of structures of Ets1 in its free versus DNA- and peptide-bound states revealed that the rotamer conformations of Tyr395 differ between these (Figure 3.13d). In both the DNA- and peptide-bound states, Tyr395 faces the same direction away from the protein core, while in its free form, Tyr395 is found in an alternative conformation closer to the protein interface. Therefore, Tyr395 is important in mediating both DNA-binding and inhibitory functions through a change in conformation relative to its free protein state.

Overall, the crystallographic analysis demonstrates that clustering of hydrophobic/aromatic residues in the SRR peptide variants, aided by a turn involving negatively-charged residues, promotes association with the DNA recognition interface of the ETS domain.

Table 3.4: Data collection and refinement statistics for the Ets1301-440/5fPhe2P* complex.
Data collection
    Wavelength (Å)            0.9795
    Space group               P3221
    Cell dimensions
        a, b, c (Å)           75.33, 75.33, 115.33
        α, β, γ (°)           90, 90, 120
    Resolution (Å)            65.23-2.03 (2.03-2.00)
    Rmerge                    0.090 (1.174)
    I/σI                      12.7 (1.8)
    Completeness (%)          100 (100)
    Multiplicity              11.1 (11.2)
    CC1/2                     0.998 (0.762)
Refinement
    Resolution (Å)            56.78-2.00 (2.03-2.00)
    No. of reflections        26,181
    Rwork/Rfree               0.2194/0.2631
Ramachandran
    Favored (%)               99.26
    Allowed (%)               0.74
    Outliers (%)              0
Average B-factor (Å²)
    Macromolecules            52.05
    Water                     53.90
RMSDs
    Bond lengths (Å)          0.004
    Bond angles (°)           0.66
No. of atoms
    Protein+peptide           4570
    Water                     174
    Ions (SO4²⁻)              3

*Values in parentheses correspond to the highest resolution shell.

Figure 3.13: Crystal structure of the Ets1301-440/5fPhe2P* complex. (a) Shown are two asymmetric units related by a two-fold axis rotation, represented by the central black circle. The ETS domain crystallized as a domain swapped dimer in which helix H4 of one monomer is aligned with helix HI-1 of a second monomer. Two peptide molecules (red) are sandwiched between four ETS1301-440 monomers (cyan and grey) for a net 1:2 stoichiometry. (b) The peptide residues 281-296 are clearly defined in the crystal structure. The electron density map of the peptide (Fo-Fc) is shown in dark blue, contoured at 1σ. The N-terminal Arg279 and Val280 were not observed. These residues are also very dynamic in solution as observed by NMR spectroscopy. (c) Close-up of the peptide-protein interface. Residues W338, Y396, and Y395 in the ETS domain associate with 5fF283, 5fF286, and 5fF288 in the peptide, respectively. Although not labeled, the Hα protons of Gly392 are also in close proximity to the ring of 5fF291, consistent with NMR observations that this residue directly contacts the SRR.
The negatively charged residues face the outside of the protein, and Glu289/Asp290 are involved in a turn of the peptide backbone that clusters adjacent aromatic residues in close proximity towards the protein interface. (d) The DNA-binding residue Tyr395 seems to exist in two conformations depending on whether the ETS domain is in isolation (lime, PDB: 3WTZ and 1MD0), or bound to DNA (grey, PDB: 2NNY) or the 5fPhe2P* peptide (cyan). The DNA strands are shown in cartoon form in grey. Residue 5fPhe288 that stacks with Tyr395 is shown in red.

3.3.3 Intrinsic properties of the SRR peptide and the effects of phosphorylation

Depending on the amino acid composition, intrinsically disordered regions may sample preferred conformations that are important for function. These preferences may also be altered by PTMs. Earlier in this Chapter, I presented an analysis of the 1Hα shifts of the WT2P peptide, and concluded that it is predominantly disordered and does not stably fold upon association with Ets1301-440. To better understand the sequence dependence of the interactions of the SRR with the ETS domain, I assigned at natural abundance the backbone chemical shifts (1HN, 1Hα, 13Cα, 13Cβ, 13CO) of the WT2P, WT0P*, and Ala2P* peptides in the absence of protein. These were used to predict the secondary structure populations using the δ2D algorithm [228]. Since phosphoserines have not been parameterized for δ2D, I corrected the chemical shifts of pSer282 and pSer285 by assuming a simple additive offset due to phosphorylation of serine (see Methods). As expected from previous studies of similar peptide models [72], WT2P, WT0P*, and Ala2P* were all predicted to be disordered, with random coil and polyproline type II (PPII) population values of ~ 70 % and ~ 25 %, respectively (Figure 3.14a), and very small contributions from α-helix or β-strand secondary structures (not shown).
Thus, these SRR peptides are intrinsically disordered and neither phosphorylation nor alanine substitutions of the aromatic residues induce any significant conformational changes detectable by chemical shift measurements.

As an alternative method of assessing secondary structure propensities, I also analyzed the patterns of NOESY crosspeaks between main chain protons in neighboring residues of the three peptides. In α-helices, sequential residues i-1 and i have strong 1HN(i-1)-1HN(i) NOE correlations, and weak 1Hα(i-1)-1HN(i) correlations relative to the intra-residue 1Hα(i)-1HN(i) correlations [229]. In extended conformations, such as β-strands, 1Hα(i-1)-1HN(i) crosspeaks are stronger than 1Hα(i)-1HN(i), and 1HN(i-1)-1HN(i) are not observed. I found that in all three peptides, the NOE intensities were strongest for adjacent 1Hα(i-1)-1HN(i) correlations, with approximately twice the intensity of 1HN(i)-1Hα(i) correlations (Figure 3.14b). The 1HN(i-1)-1HN(i) correlations were very weak in all three peptides, having on average ~ 10% of the peak intensity of 1Hα(i-1)-1HN(i) correlations (not shown). This result is consistent with the chemical shift analysis above, confirming the unstructured nature of the peptides.

Although disordered, the WT2P peptide exhibited stronger backbone NOE correlations of all three types in regions surrounding the modification sites (residues ~281-287) than both the Ala2P* and WT0P* peptides (Figures 3.14b, c, 3.15, and 3.16). This was not due to differences in experimental conditions, as the C-terminal residues exhibited similar NOE intensity patterns in all three peptides. Because all three types of NOE correlations were stronger in the WT2P peptide relative to the WT0P* peptide, I concluded that the overall secondary structure propensities did not change upon phosphorylation.
However, the distinct pattern surrounding the modification sites suggests that the WT2P peptide may have more restricted motions than if it lacked either the phosphate groups or the aromatic residues. This is consistent with previous findings that phosphorylation decreases the sub-nanosecond timescale motions of the SRR [72, 217]. Also of note, residues ~287-291 consisting of the sequence DYEDY had detectable 1HN(i-1)-1HN(i) and large 1Hα(i-1)-1HN(i) correlations in all versions of the peptides (Figures 3.14-3.16). Therefore, the combination of aromatic and negatively-charged residues may generally restrict torsion angle rotations. As a brief background explanation for this conclusion, the interproton NOE depends on both the separation (1/r^6) of the nuclei and the correlation time for their relative tumbling within the magnetic field. In the fast tumbling limit, the NOE is positive, whereas in the slow tumbling limit, it is negative. For the SRR peptides, the NOEs are negative (i.e., having the same phase as the diagonal in a NOESY spectrum). In this slow tumbling limit, the intensity of the NOE crosspeak will decrease with faster motions and increase with slower motions.

Figure 3.14: The unbound SRR peptide is predominantly unstructured and exhibits modest changes in NOE patterns upon phosphorylation. (a) The normalized secondary structure populations of the WT2P (red), Ala2P* (cyan), and WT0P* (grey) peptides (major population) were predicted from main chain chemical shifts (1HN, 1Hα, 13Cα, 13Cβ, 13CO) using the δ2D algorithm [228]. The populations for PPII (top panel) and random coil (bottom) conformations are plotted; the α-helix and β-strand values were very small and are not shown. All three peptides exhibited similar coil and PPII propensities, indicating they are disordered and that no predominant conformational changes result from phosphorylation or alanine substitutions.
(b) The integrated peak volumes resulting from 1Hα(i)-1HN(i) (top) and 1Hα(i-1)-1HN(i) (bottom) NOE correlations are plotted as a function of the residue number for the three peptides, and were normalized to 1 (see Methods). The 1Hα(i)-1HN(i) correlation intensities were ~ 40 - 50 % as strong as 1Hα(i-1)-1HN(i) correlations, consistent with the presence of disordered peptides. The color code is the same as in (a). The data were obtained from 2D 1H-1H NOESY spectra recorded with 250 ms mixing times at 28 °C (see Figure 3.15). NOE intensities of geminal Hβ nuclei in residues P281 and P292 were within 5% across the three peptide spectra, and thus integrated volumes were not corrected to account for differences between spectra. Gaps in the plot correspond to prolines, which lack a 1HN(i), and residues with highly overlapping crosspeaks. Stronger correlations observed for residues ~ 281-287 in the WT2P peptide relative to both the Ala2P* and WT0P* peptides indicate that phosphorylation may restrict backbone motions. (c) Although very weak, 1HN(i-1)-1HN(i) correlations between sequential amides were also detected in all three peptides. This is indicated diagrammatically as rectangles (see also Figure 3.16). More sequential 1HN(i-1)-1HN(i) NOEs were found in regions with aromatic and negatively charged residues, indicating restricted backbone motions around these amino acids. The dashed outline in the case of the D284 and pS285 residues of the Ala2P* peptide indicates ambiguity in assignments.

Together, these data indicate that phosphorylation has a small effect in altering the conformational ensemble of the SRR peptide in its free state. Chemical shift analysis indicated that all three peptides were mostly unstructured, exhibiting similar coil and PPII populations.
NOE correlation analysis was consistent with these results, and also suggested that serine phosphorylation in combination with the aromatic residues may promote restriction of torsional angle rotations. Whether this bias affects binding energy or kinetics upon association with Ets1 is unknown, and difficult to examine experimentally. However, it is conceivable that a more dynamically restricted SRR ensemble reduces the entropic penalty of binding to the ETS domain.

Figure 3.15: The unbound SRR peptides are predominantly disordered

Shown are the 1Hα/1HN regions of NOESY spectra (250 msec mixing time, 28 °C) of the WT2P (red), Ala2P* (cyan), and WT0P* (grey) peptides. The 1Hα(i)-1HN(i) NOE correlation peaks of residues pS285 and E289 are indicated, as well as the 1Hα(i-1)-1HN(i) correlations of A288/F288 and D284. The peak intensities corresponding to 1Hα(i-1)-1HN(i) correlations are approximately twice as large as those corresponding to 1Hα(i)-1HN(i) correlations (see Figure 3.14b for quantification). Although E289 exhibited similar NOE intensity patterns in all three peptides, pS285 had significantly smaller correlation peaks in the WT0P* peptide variant. This suggests that phosphorylation has a small effect on the conformations and/or dynamics of the peptides near residue 285.

Figure 3.16: Distinct 1HN(i-1)-1HN(i) NOE correlation patterns among the SRR peptide variants

Shown are the amide (1HN/1HN) regions of NOESY spectra of the WT2P (red), Ala2P* (cyan), and WT0P* (grey) peptide variants. The 1HN(i-1)-1HN(i) correlation peaks are very weak relative to 1Hα(i-1)-1HN(i) correlations (not shown). However, the correlation peaks detected follow the same pattern as the 1Hα(i)-1HN(i) and 1Hα(i-1)-1HN(i) correlations. That is, residues surrounding the phosphorylation sites have stronger crosspeak intensities. Residues with detectable 1HN(i-1)-1HN(i) correlations are labeled, and are shown diagrammatically in Figure 3.14c.
Assignments in parentheses are ambiguous due to close overlap in chemical shifts of D284 1HN and A286 1HN in the Ala2P* peptide.

3.3.4 The SRR peptide can associate with distantly-related PU.1

PU.1 is a member of the ETS family of transcription factors involved in myeloid and B-cell development [230]. Relative to Ets1, PU.1 is the most evolutionarily-distant member of the ETS proteins by sequence phylogeny, and has not been reported to exhibit DNA-binding autoinhibition [212]. I used NMR-monitored titrations to test whether the Ets1 WT2P peptide could also bind the ETS domain of PU.1. Indeed, addition of this peptide caused changes in the amide chemical shifts of PU.1 in the fast exchange regime, indicating that the Ets1 SRR is able to associate weakly with PU.1 (Figure 3.17a). Fewer residues were perturbed than in 15N-labeled Ets1301-440, and the perturbations were smaller (compare with Figure 3.1c).

This result was somewhat unexpected as PU.1 lacks many of the residues that mediate the Ets1-SRR interaction. PU.1 residues with the highest chemical shift perturbations upon addition of the WT2P peptide were clustered around the loop between S3 and S4 (the wing) (Figure 3.17b, c). This region of PU.1 contains the sequence 246VKKKL250, which is very positively charged, and involved in DNA binding [231]. Therefore, the interaction between the negatively charged WT2P and PU.1167-272 may be mediated by electrostatic contacts at this site. A second non-contiguous site of PU.1 was also perturbed upon addition of the WT2P peptide, as shown by the large CSP exhibited by residue Tyr175, and to a lesser extent Glu242. The equivalent residues in Ets1301-440 are Trp338 and Ile401, respectively, which exhibit the highest CSPs upon association with the peptide (Figure 3.1c). This indicates the presence of a second binding mode more akin to the one found in Ets1.
Also of note, Tyr175 in PU.1 and the equivalent in Ets1, Trp338, are found at the N-terminus of H1, a helix that links DNA binding and autoinhibition in Ets1 by contacting both the DNA phosphate backbone and the helical IM [232].

The presence of more than one binding site precluded a reliable determination of KD between the WT2P peptide and PU.1. However, CSP analysis indicates that residues in the “wing” of PU.1 bind to the SRR more weakly than residues Y175 and E242 in the second binding site, as shown by the extent of CSP saturation in the presence of > 4-fold excess WT2P peptide (Figure 3.17d). Therefore, the SRR peptide of Ets1 seems to contact PU.1 at two sites: a lower affinity site, bound via electrostatic contacts, and a higher affinity site resembling that described in detail for Ets1. These results indicate that the features enabling association of the phosphorylated Ets1 SRR with the ETS domain are conserved among distantly-related family members, although the interaction is weaker in PU.1.

Figure 3.17: The Ets1 WT2P peptide binds to the ETS domain of PU.1 at two distinct sites.

(a) Addition of the Ets1 WT2P peptide to 15N-labeled PU.1167-272, encompassing the ETS domain, caused changes in amide chemical shifts in fast exchange on the NMR timescale, indicating relatively weak binding under conditions of moderate ionic strength (300 mM NaCl). Selected residues are labeled with arrows showing the direction of chemical shift changes upon addition of > 4-fold excess WT2P peptide to PU.1. (b) 1H-15N CSPs upon addition of 4.4-fold excess WT2P peptide are plotted as a function of residue number in PU.1167-272. The top cartoon shows the secondary structural elements found in PU.1. Residues with CSP > 0.05 include those in the “wing” between S3 and S4, as well as the N-terminus of H1 and H3. Gaps in the graph correspond to residues that could not be unambiguously assigned.
(c) Residues with CSP > 0.05 are mapped onto the NMR-derived structure of PU.1 (Desmond Lau, PhD thesis) and highlighted in red. Two non-contiguous interfaces contact the WT2P peptide, exemplified by Y175/E242 on one surface, and M230/K248/L250 on the other. (d) Amides in these two binding interfaces report different titration curves, indicative of different affinities for the peptide. For example, Y175 and E242 show CSP values close to saturation upon addition of > 4-fold excess of WT2P, whereas K248 and L250 do not. Thus, the peptide appears to bind with higher affinity to the former interface spanning helices H1 and H3.

3.4 Discussion

3.4.1 Hydrophobic amino acids promote intermolecular interactions

Previously, autoinhibition of the ETS domain by the SRR was found to be dependent on the presence of aromatic residues near the phosphorylation sites. However, the roles of the aromatic residues in contributing to association of the SRR with the ETS domain were not fully established. Using a series of SRR peptide variants, I found that the strength of the interaction correlated with the hydrophobic character of the residues at positions 283/286/288/291, but not necessarily their aromatic character. In addition, the NMR-based models and the crystal structure place the Tyr/Phe/5fPhe residues facing towards the hydrophobic patch surrounding helix H3 and away from solvent. Therefore, the aromatic residues in the SRR contribute to autoinhibition by clustering favorably on a complementary hydrophobic surface of the ETS domain.

These results also suggest that the interaction of the SRR with the ETS domain is relatively resistant to mutations at these four aromatic positions, since replacement with other hydrophobic residues does not impair binding. For example, an SRR peptide containing a combination of leucine and valine residues at these positions is predicted to have a KD similar to that of the wild-type peptide.
Why, then, is there an apparent repeating pattern of S-Φ-D/E residues (Φ = aromatic) in the SRR of all known Ets1 and Ets2 homologs [72]? One possibility is that these regions are important for the specificity and affinity of interactions involving regulatory kinases, phosphatases, and/or other binding partners. Although only one of the five mapped CaMKII phosphorylation sites on the SRR actually has the consensus sequence for this kinase (R-X-X-S/T), there could be other, currently uncharacterized proteins that recognize these motifs, thereby explaining the sequence conservation. For example, phosphorylation of Ser282 has been reported to provide a binding site for COP-1, a ubiquitin ligase component, whereas phosphorylation of Tyr283 by Src family tyrosine kinases decreases COP-1 binding to prevent the ubiquitin-mediated degradation of phosphorylated Ets1 [233]. Finally, it is possible that the electronic properties of tyrosine residues are useful in keeping the SRR more solvated through favorable water contacts, and reduce the likelihood of protein aggregation [234]. These reasons may explain the unusual enrichment of aromatic residues in this intrinsically disordered region.

3.4.2 The role of phosphorylation

We and others have found that phosphorylation of the SRR greatly enhances autoinhibition by promoting association of the SRR with the ETS domain [212]. In this chapter, this is confirmed by a ~10-fold decrease in the binding affinity of the SRR peptide for the ETS domain upon removal of the phosphate groups on S282 and S285. Furthermore, 31P-monitored titrations showed that the 31P nuclei of pS282, and even more so pS285, are perturbed in the presence of Ets1301-440. Also, consistent with electrostatic interactions, increasing ionic strength weakened binding of the SRR peptide to the ETS domain.
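To put the ~10-fold change in affinity noted above on an energy scale, the standard relation ΔΔG = RT ln(KD,2/KD,1) can be evaluated at the temperature of the NMR experiments (28 °C). The short sketch below is only illustrative: the function name is hypothetical, and the fold-change of 10 is the approximate value quoted in the text rather than a fitted quantity.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def ddg_from_kd_ratio(kd_ratio, temp_c):
    """Return the change in binding free energy (kcal/mol) for a fold-change in KD.

    kd_ratio is KD(variant) / KD(reference); temp_c is the temperature in Celsius.
    """
    temp_k = temp_c + 273.15
    return R_KCAL * temp_k * math.log(kd_ratio)


# Approximate 10-fold weaker binding on dephosphorylation, at the 28 C used for NMR
ddg = ddg_from_kd_ratio(10.0, 28.0)
print(f"ddG = {ddg:.2f} kcal/mol")
```

Under these assumptions, a 10-fold difference in KD corresponds to roughly 1.4 kcal/mol, that is, a modest energetic contribution of the kind expected for a small number of weak electrostatic contacts.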
In the NMR-derived model of the Ets1301-440/WT2P complex, the phosphoserines, aspartates, and glutamates of the SRR are in the general proximity of a continuous positively charged surface on Ets1 composed of K379, K381, K388, R391, and R394. This is consistent with an electrostatic role of the phosphoserines in promoting binding. However, it must be noted that no intermolecular NOE restraints involving these negatively charged residues were confidently detected, and thus their positions in the calculated ensembles are a consequence of satisfying the observed restraints to adjacent aromatic chains. In addition, the domain swapped dimers present in the crystallographic structure of the Ets1301-440/5fPhe2P* complex exhibited a 2:1 stoichiometry and alternative binding orientation relative to the NMR-derived ensemble. Nevertheless, the phosphoserines were positioned near complementary charged regions of the proteins in the crystallographic structure.

In this chapter, as well as in previous studies, it is seen that both aromatic (or hydrophobic) residues and phosphoserines contribute to the SRR/ETS domain interaction. What is the basis for this apparent synergy? A simple explanation is that complex formation results from the cooperative reinforcement of many individually weak interactions. Structurally, clustering of aromatic residues on a hydrophobic surface of the ETS domain requires that the phosphoserines, aspartates, and glutamates be positioned along an S-shaped turn in the SRR and thus adjacent to a positively charged surface of the ETS domain. Conversely, positioning of these negatively charged residues requires that the intervening aromatic groups are clustered on the ETS domain.
Furthermore, I found that although phosphorylation did not alter the secondary structure propensities of the free SRR peptide variants, it increased the intensity of NOE correlation peaks between adjacent residues near the modification sites, but only in the presence of aromatic residues. This suggests that the combination of aromatic and negatively charged residues may restrict backbone torsional rotation to some extent, and also promote ETS domain binding.

Collectively, the NMR and crystallographic data suggest that the negative charges on the peptide resulting from phosphorylation contribute to the hydrophobic clustering of the aromatic residues by promoting structurally and thermodynamically coupled turns necessary for electrostatic interactions with the ETS domain and by restricting SRR mobility. This positioning of the negatively charged phosphoserine residues on the outside of the protein along the DNA-recognition helix sterically and electrostatically blocks DNA binding and thus leads to autoinhibition.

3.4.3 The “fuzzy” nature of the interaction

In this study, I found that substitution of the aromatic side chains with other hydrophobic groups did not dramatically impact the association between the SRR peptide and the ETS domain (Table 3.1, Figure 3.5). From a biophysical point of view, this indicates that the interaction does not depend greatly on the precise geometry of the side chains at these positions, but instead, on hydrophobic-driven contacts between aliphatic side chains. This supports the idea that the interaction is “loose”, and that specific distances and angles between atoms are not required for the association. In addition, the peptide does not undergo large changes in chemical shifts upon binding. For instance, the chemical shifts of the aromatic residues Y283 and Y288 were indistinguishable, even at high magnetic field strength (850 MHz).
The most distinct aromatic chemical shifts arose from residue Y291, which is located in a hydrophobic binding pocket near I401. The chemical shifts of the aspartates and prolines (e.g. D284/D287/D290 and P281/P292) were also highly degenerate in the bound state. This could be due to weak binding, the dynamic nature of the complex, and fast exchange between multiple energetically-similar conformations. Finally, weak and sparse NOE crosspeaks, in spite of great efforts to promote the interaction, also suggest a highly dynamic system. Collectively, these observations support the notion that the interaction is fuzzy and that multiple transiently populated conformations are possible. Of course, the use of a trans system, in which the SRR region and ETS domain of Ets1 are not continuous, partly contributes to the observed conformational heterogeneity, by making the interaction effectively weaker, and allowing more binding modes. However, previous studies on the cis system also did not observe folding upon binding, and also reported highly dynamic properties of the SRR region [215].

3.4.4 The mechanisms of DNA-binding regulation in Ets1

The DNA-binding function of Ets1 is regulated by regions adjacent to the ETS domain. Previous work by our group and others showed that an increasing number of phosphorylation events in the SRR results in both increased binding to the ETS domain recognition helix and progressive dampening of protein motions [216]. This is accompanied by stabilization of the helical IM (composed of HI-1, HI-2, H4, and H5), thereby favoring “closed” rigid conformations of the ETS domain with relatively weak affinity for DNA [219]. Based on this collective evidence, a coupled steric and allosteric mechanism of DNA-binding autoinhibition in Ets1 has emerged. Comparison of Ets1 complexes bound to DNA and the SRR peptide reveals extensive overlap of the two interaction interfaces (Figure 3.18).
The DNA and SRR peptide molecules interact with similar key residues in the ETS domain mediating both types of association, including those in the loop leading to H1 (e.g. T330), the N-terminus of H1 (e.g. W338), and the DNA-binding helix H3 (e.g. Y395). These findings are supported by CSP analysis, direct NOE contacts, and the crystal structure, which place aromatic/hydrophobic residues in the SRR peptides in close proximity to a hydrophobic patch along the ETS domain. Competition studies of binding to the ETS domain by the WT2P peptide and DNA further support that these molecules cannot associate simultaneously. Therefore, binding of the ETS domain to the SRR peptide and DNA are mutually exclusive events. In addition, I found that the SRR peptide makes contacts with residues in the helical IM of Ets1. NOEs detected between the C-terminal A293/A294/L295 in the SRR peptide and L418/L421 in the ETS domain place the peptide in close proximity to helix H4. As well, L421 has one NOE correlation peak to a proline residue, ambiguously assigned as P281 or P292 in the SRR. In the crystal structure, P281 present in the 5fPhe2P* peptide is in close proximity to L418 and L421. Therefore, aliphatic/hydrophobic residues in the SRR seem to be mediating favorable contacts with the leucine residues in H4, and may contribute to the stability of the helical IM. This is consistent with findings that the SRR-driven autoinhibition is dependent on a well-formed IM [219], and that phosphorylation of the SRR increases the energy required to unfold the ETS domain [217].

Figure 3.18: Ets1 DNA-binding regulation through inhibitory and activating protein sequences.

Shown are aligned NMR and X-ray crystallography derived structures of Ets1 in different states. (a) In the absence of DNA, the IM of Ets1, comprised of helices HI-1, HI-2, H4 and H5, is well folded [39, 215] (PDB: 1R36).
In its unphosphorylated state, the SRR does not associate with the ETS domain strongly (this study and [72, 216, 217, 219]). (b) Upon phosphorylation, the SRR (red) binds the ETS domain at the DNA-interacting interface. This results in a decrease in DNA-binding affinity via combined steric and allosteric mechanisms of inhibition (this study and [72, 216, 217]). Shown is the NMR-derived ensemble of the interaction between Ets1301-440 and the WT2P peptide, with the phosphoserine side chains in stick form. Five residues connecting the SRR peptide (Ets1279-295) to the ETS domain (Ets1301-440) are shown as red dashes to represent the natural protein context. (c) Ets1 exists in a conformational equilibrium between the free (a) and DNA-bound states (c). Association with DNA is accompanied by unfolding of helices HI-1 and HI-2 (PDB: 1MDM). In this crystal structure, HI-2 appears folded; however, in solution, this is not the case. (d) Activating protein sequences such as those found in Runx1 (shown in green) shift the equilibrium by stabilizing the Ets1 DNA-bound state (PDB: 4L0Z) [235].

Because I did not observe large changes in 13C-HSQC spectra of Ets1 upon complex formation with the SRR, I also concluded that the overall fold in the presence of the SRR is not significantly altered, and the helical module comprised of HI-1, HI-2, H4, and H5 remains intact. These findings are supported by comparison of the crystal structures of Ets1 determined in its free (PDB: 1MD0) [78] and peptide-bound forms (not shown). In summary, my models of the interaction of the SRR peptide with the ETS domain help explain the dual roles of the SRR in masking the DNA binding helix H3 through a steric mechanism and stabilizing the helical IM.

Under the experimental conditions used, the WT2P peptide was not able to displace a high-affinity DNA duplex for binding to the ETS domain. This is consistent with their differing affinities (nM versus µM) for the ETS domain.
The use of a trans system results in a reduced effective local concentration of the SRR and uncouples the steric and allosteric mechanisms of autoinhibition. Nevertheless, these results show that the SRR does not contact the DNA-bound ETS domain. Therefore, the SRR likely inhibits DNA binding by slowing the rate of association of DNA with the ETS domain, rather than by disrupting the DNA-bound state, consistent with kinetics studies of inhibited and uninhibited Ets1 [236].

Finally, it is worth noting that the interaction between the Ets1-interacting domain (EID) found in Runx1 and Ets1 also involves favorable aliphatic-mediated contacts with helix H4 in the inhibitory module (Figure 3.18). In the presence of Runx1, DNA-binding affinity and transcriptional activation by Ets1 increase [235]. Runx1 residues F194, L198, and L201 in the EID form a hydrophobic interface that mediates association with Ets1 residues L421 and L422 in helix H4. In the crystal structure, the marginally stable inhibitory helices HI-1 and HI-2 are not present, consistent with their unfolding upon binding DNA [214, 235]. This shows that helix H4 can be involved in both inhibitory and activating mechanisms by the SRR and EID, respectively.

Overall, these data point to a model of Ets1 regulation in which inhibitory and activating sequences change the conformational equilibrium between the free and DNA-bound states (Figure 3.18). The phosphorylated SRR promotes closed conformations incompatible with association with DNA, via steric and allosteric mechanisms. On the other hand, the EID activates Ets1 by stabilizing its association with DNA through a tethering mechanism. This is supported by previous studies demonstrating that the combined action of the EID and the SRR has an intermediate effect on DNA binding by Ets1, relative to the presence of the EID or phosphorylated SRR alone [235].
3.5 Materials and methods

3.5.1 Expression and purification of Ets1301-440

Expression and purification of Ets1301-440 was performed as previously described [72]. The gene encoding Ets1301-440 was cloned into the pET28a expression vector using BamHI and NcoI restriction sites. Protein was produced by heterologous expression in freshly transformed E. coli HMS174 cells. The cells were grown with shaking at 37 °C to OD600 ~ 0.6, then cooled to 30 °C, and induced at OD600 ~ 0.9 with a final concentration of 0.4 mM IPTG. Unlabeled protein was produced in LB media, whereas isotopically-labeled protein was produced using M9 minimal media supplemented with 3 g/L 13C6-glucose and/or 1 g/L 15NH4Cl as the sole carbon and nitrogen sources, respectively. The cell cultures were supplemented with 35 mg/L of kanamycin and 1 x trace metal mix [237]. The cells were harvested 2 hours (LB) or 4 hours (M9 minimal) post-induction, and frozen at -80 °C prior to lysis. The cells were thawed and resuspended in 40 mL of binding buffer (50 mM sodium citrate, 50 mM NaCl, 1 mM TCEP, pH 5.4) per liter of culture, supplemented with 0.5 x protease inhibitor cocktail (Roche). The cell mixture was lysed at 4 °C by 4 - 5 passages through an EmulsiFlex-C5 homogenizer (Avestin). The lysate was cleared by centrifugation, filtered to remove cellular debris, and applied to tandem Fast-Flow Q-sepharose and Fast-Flow SP-sepharose (GE Healthcare) ion exchange columns. After extensive washing of both columns with binding buffer, the anion exchange column was disconnected, and the cation exchange column was eluted with 50 mM sodium citrate, 1 M NaCl, 1 mM TCEP, pH 5.4, over a gradient of 5 column volumes (~ 150 mL). The purest fractions containing Ets1301-440 were combined, and run through a Superdex-75 gel filtration column (GE Healthcare) for buffer exchange and increased purity.
For production of protein crystals, Ets1301-440 was further purified using an analytical mono-S cation exchange column (GE Healthcare) equilibrated with 20 mM MES, 50 mM NaCl, pH 6.0 and eluted over 10 column volumes with a gradient to 20 mM MES and 1 M NaCl, pH 6.0. For NMR studies, the final Ets1301-440 buffer consisted of 20 mM MES, 50 - 300 mM NaCl, 5 mM DTT, and 0.5 mM EDTA at pH 6.50. For X-ray crystallography, the final sample buffer consisted of 10 mM MES, 50 - 100 mM NaCl, and 1 mM TCEP at pH 6.50.

3.5.2 Expression and purification of PU.1167-272

The cDNA encoding PU.1167-272, containing the core ETS domain and an appended C-terminal helix H4 of PU.1, was cloned into pET28-MHL (Desmond Lau, PhD thesis). This vector includes an N-terminal His6 affinity tag followed by a TEV cleavage site. Protein was produced by heterologous expression in freshly transformed E. coli BL21 (λDE3) cells grown at 37 °C in minimal M9 media supplemented with 1 g/L of 15NH4Cl, 1 x trace metal mix [237], and 35 mg/L kanamycin. At OD600 ~ 0.4, the cells were cooled to 30 °C and expression was induced at OD600 ~ 0.6 using a final concentration of 1 mM IPTG. The cells were harvested ~ 16 hours post-induction, and the pellet was frozen at -80 °C until required. The pellet was resuspended in 40 mL of denaturing binding buffer (20 mM sodium phosphate, 500 mM NaCl, 40 mM imidazole, 4 M guanidinium hydrochloride, pH 7.4) per liter of culture. The cell suspension was lysed by sonication with cooling in an ice-water bath. The lysate was cleared by centrifugation and filtering, then applied to a Ni2+-NTA HisTrap HP column (GE Healthcare). His-tagged PU.1167-272 was eluted in one step with 100 % elution buffer (20 mM sodium phosphate, 500 mM NaCl, 1 M imidazole, 4 M guanidinium hydrochloride, pH 7.4). The resulting protein sample was dialyzed against 2 L of refolding buffer (50 mM sodium phosphate, 500 mM NaCl, 0.5 mM EDTA, pH 6.5).
Soluble protein was separated by centrifugation and natively-folded PU.1167-272 was further purified using a Superdex-75 gel filtration chromatography column as described above. The final NMR sample buffer consisted of 20 mM MES, 300 mM NaCl, 5 mM DTT, 0.5 mM EDTA at pH 6.50.

3.5.3 Serine-rich-region (SRR) peptides

Peptides corresponding to the SRR region of Ets1 (residues 279 to 295, see Table 3.1) were purchased from ABI scientific at 95% purity. The peptides were modified with N-terminal acetylation and C-terminal amidation to avoid charged termini. The majority of the peptides contained a C-terminal non-native tryptophan residue to facilitate quantification by UV-absorbance spectroscopy using predicted molar absorptivities [169]. The presence of this residue did not significantly change the dissociation constant relative to the strictly wild-type sequence (Table 3.1), although it slightly increased the CSPs observed in NMR-monitored titrations (not shown). For these titrations, the lyophilized peptides were resuspended in NMR buffer (20 mM MES, 50 mM NaCl, 5 mM DTT, 0.5 mM EDTA, pH 6.50). The pH was adjusted to ~ 6.5 with NaOH, and the samples were dialyzed against 2 L of NMR buffer at the desired NaCl concentrations (Tables 3.1 and 3.2) using Float-A-Lyzer (Spectrum Labs) dialysis devices with MWCO of 100-500 Da. To ensure buffer matching, both the peptide and the protein used for titration studies were dialyzed for 48 hours at 4 °C in the same container. For studies on the free SRR peptides, the samples were dialyzed against 2 L of 20 mM sodium phosphate, 50 mM NaCl, pH 6.5 at 4 °C. For crystallography, the 5fPhe2P* peptide was dialyzed against 2 L of 10 mM MES, 100 mM NaCl, 1 mM TCEP at pH 6.50.

3.5.4 DNA oligonucleotides

The complementary oligonucleotides corresponding to a specific Ets1 binding site [39] spanning 12 base pairs, 5’-CAGCCGGAAGTG-3’ and 5’-CACTTCCGGCTG-3’, were purchased from Integrated DNA Technologies.
The oligonucleotides were resuspended in NMR sample buffer (20 mM MES, 300 mM NaCl, 5 mM DTT, 0.5 mM EDTA, pH 6.50), mixed in a 1:1 molar ratio based on quantitation by UV-absorbance spectroscopy with predicted molar absorptivities, heated to 95 °C, and slowly cooled to allow duplex DNA annealing. The sample was then run through a Superdex-75 gel filtration column (GE Healthcare) to remove impurities and single-stranded DNA, as well as for improved buffer matching. The purest fractions were concentrated to ~ 1.3 mM, and mixed with 15N-labeled Ets1301-440 in the presence or absence of the WT2P peptide during NMR-monitored titrations.

3.5.5 NMR spectroscopy

NMR experiments were performed at 28 °C using cryoprobe-equipped Bruker Avance III 600 or 850 MHz spectrometers. The spectra were processed using NMRPipe [181] and NMRFAM-Sparky [238].

Ets1301-440/WT2P complex assignments

The chemical shifts of Ets1301-440 were previously published and were used to assign the protein in its free state [215]. The shifts of the peptides were assigned using a combination of double filtered 2D 1H-1H NOESY (150 msec mixing time) and 2D 1H-1H TOCSY spectra. To detect intermolecular NOEs between the unlabeled WT2P peptide and 15N/13C-labeled Ets1301-440, a filtered-edited three dimensional (3D) 1H-15N/13C NOESY spectrum was recorded. This filtered-edited experiment unambiguously detects NOEs between a 1H nucleus directly bonded to a 12C/14N (present in the unlabeled WT2P peptide) and a 1H nucleus directly bonded to a 13C/15N, with the latter resolved by the shift of the bonded 13C/15N nucleus. The resulting NOE restraints were used as upper distance limits for structure calculations using CYANA [239] (see below).

CYANA calculations

Intermolecular NOEs between 15N/13C-labeled Ets1301-440 and the WT2P peptide were assigned as described above and used for CYANA (v. 3.97) docking calculations [239].
The input files consisted of upper distance restraint (.upl) and sequence files. The .upl file included 3382 upper distance limits to calculate the Ets1301-440 structure. These restraints have been published [215] and are available at the Biological Magnetic Resonance Data Bank entry number 5991 [240]. In addition, 32 ambiguous and non-ambiguous intermolecular NOE restraints were included as additional upper distance limits, set to 6 Å. Although potentially useful, I did not consider the relative NOE intensities from the filtered-edited 1H-15N/13C NOESY spectrum because of degenerate chemical shifts present in the WT2P peptide. Included were also intramolecular peptide-peptide NOE restraints derived from the double filtered 1H-1H NOESY spectra described above, which were assigned automatically with the CYANA noeassign function. Although mostly short range, this resulted in an additional 114 intramolecular (peptide-peptide) restraints included in the calculations. The sequence file contained 140 residues of the Ets1 IM/ED (301-440), a linker of 13 dummy residues, and 17 residues corresponding to the SRR (279-295). This allowed unbiased positioning of the peptide on the ETS domain. In addition, the CYANA library was modified to include phosphoserine residues, which were obtained from the DYANA library [241]. Water refinement was not included in this particular protocol; however, previous ensembles refined using NMRe [242] did not change the structure significantly. The resulting ensemble of 20 structures output by CYANA was visually examined using PyMol [27] and 18 of these models with consistent backbone positions are shown in Figure 3.12.

Assignments of the free SRR peptide variants and secondary structure analysis

The chemical shifts of the SRR peptide variants were assigned using a combination of two-dimensional (2D) natural abundance 13C-HSQC, 13C-HMBC, 1H-1H NOESY, and 1H-1H TOCSY experiments.
This allowed unambiguous chemical shift assignments of all available 1Hα, 1HN, 13Cα, 13Cβ, and 13CO in the peptide residues, with the exception of Tyr291, for which only the 1HN, 13Cβ, and 13CO chemical shifts were assigned due to ambiguity. The 13CO of Leu295 was excluded from secondary structure analysis because of unusual shifts resulting from C-terminal amidation in the WT2P peptide. Chemical shifts for the non-native tryptophan residue present in the WT0P* and Ala2P* peptides were also excluded for ease of comparison. The same type and number of chemical shifts determined for all three peptides were used for secondary structure population analysis with the program δ2D version 2.0.0 [228]. The chemical shifts of phosphorylated serine residues were corrected to account for expected effects of the phosphate group. To do this, I compared the random coil chemical shifts of serine to those of phosphoserine at pH 6.50 [224, 227], and applied the following corrections to the experimentally determined chemical shifts: 13Cα +0.1 ppm, 13Cβ -2.15 ppm, 13CO -1.06 ppm, and 1HN -0.82 ppm. The α-helix and β-strand population values resulting from δ2D were negligible and are not shown in Figure 3.14. 1H-1H NOESY experiments of the free peptides were recorded on an 850 MHz spectrometer (250 msec mixing time). The peak intensities of 1Hα(i-1)-1HN(i) and 1Hα(i)-1HN(i) NOE correlations resulting from these experiments were determined with the integrate (it) function in NMRFAM-Sparky [238]. The resulting values were divided by the strongest NOE intensity value, corresponding to the 1Hα(F286)-1HN(D287) correlation, in order to normalize to 1, and were plotted on the same vertical scale in Figure 3.14.

Chemical shift perturbation analysis and dissociation constant determinations

15N/13C-labeled Ets1301-440 and unlabeled SRR peptide samples were prepared in 20 mM MES, 50-300 mM NaCl, 5 mM DTT, and 0.5 mM EDTA at pH 6.50.
The protein and peptide samples were typically concentrated to ~250 µM and ~2 mM, respectively. For NMR-monitored titrations, the peptide was added in small increments to Ets1(301-440), and 13C-HSQC and/or 15N-HSQC spectra were recorded at each point in the titration. Amide chemical shifts changed co-linearly with increasing peptide concentration and, for the most part, this occurred in fast exchange on the NMR timescale. The 1H-15N and 1H-13C CSPs were calculated according to equations (1) and (2), respectively.

\[ \mathrm{CSP}_{^{1}\mathrm{H}\text{-}^{15}\mathrm{N}} = \left[ (0.14\,\Delta\delta_{\mathrm{N}})^{2} + (\Delta\delta_{\mathrm{H}})^{2} \right]^{1/2} \tag{1} \]

\[ \mathrm{CSP}_{^{1}\mathrm{H}\text{-}^{13}\mathrm{C}} = \left[ (0.3\,\Delta\delta_{\mathrm{C}})^{2} + (\Delta\delta_{\mathrm{H}})^{2} \right]^{1/2} \tag{2} \]

$\Delta\delta_{\mathrm{N}}$, $\Delta\delta_{\mathrm{C}}$, and $\Delta\delta_{\mathrm{H}}$ are the changes in chemical shift for 15N, 13C, and 1H, respectively. The ten residues exhibiting the largest 1H-15N CSPs and in fast exchange in each titration were used to fit a 1:1 binding isotherm using GraphPad Prism and obtain KD values according to equation (3).

\[ \Delta\delta_{i} = \Delta\delta_{\mathrm{sat}} \, \frac{\left([P]_{T,i} + [p]_{T,i} + K_{D}\right) - \sqrt{\left([P]_{T,i} + [p]_{T,i} + K_{D}\right)^{2} - 4\,[P]_{T,i}\,[p]_{T,i}}}{2\,[P]_{T,i}} \tag{3} \]

$[P]_{T,i}$ and $[p]_{T,i}$ are the total concentrations of labeled protein and unlabeled SRR peptide, respectively, adjusted for dilution effects at each point i. $\Delta\delta_{\mathrm{sat}}$ is the CSP at saturation. The protein and peptide concentrations were calculated by measuring UV absorbance at 280 nm under native conditions and using the following extinction coefficients: ε = 35410 M^-1 cm^-1 (Ets1(301-440)), ε = 4470 M^-1 cm^-1 (WT2P), ε = 9970 M^-1 cm^-1 (WT2P*, WT0P*), and ε = 5500 M^-1 cm^-1 (5fPhe2P*, Val2P*, Leu2P*, Ala2P*, and Phe2P*). The ten fitted KD values for each titration were averaged, and the mean value and standard deviation are reported in Table 3.1.

Phosphate NMR and 31P-monitored titrations

The protein and peptide samples were concentrated to 480 µM. One-dimensional (31P and 1H) and 2D 31P-HSQC spectra of the free WT2P peptide were collected to allow assignment of the pSer282 and pSer285 signals. Unlabeled Ets1(301-440) was then added in small increments, and a 1D 31P spectrum was collected at each point.
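As an illustration of the CSP and KD calculations in the preceding subsection, the following Python sketch implements the concentration determination from A280 (Beer-Lambert law), equations (1) and (2), and the 1:1 binding isotherm of equation (3). It is a minimal sketch under stated assumptions and not the procedure used in this work: the fits reported here were done in GraphPad Prism, whereas the sketch substitutes a simple log-spaced grid search over KD with a closed-form least-squares solution for the saturation CSP. All function names, the grid bounds, and the 1 cm path length default are hypothetical, and concentrations are assumed to be in mol/L.

```python
import math


def concentration_molar(a280, epsilon, path_cm=1.0):
    """Beer-Lambert law: c = A / (epsilon * l), in mol/L."""
    return a280 / (epsilon * path_cm)


def csp_hn(delta_n, delta_h):
    """Equation (1): combined 1H-15N chemical shift perturbation (ppm)."""
    return math.hypot(0.14 * delta_n, delta_h)


def csp_hc(delta_c, delta_h):
    """Equation (2): combined 1H-13C chemical shift perturbation (ppm)."""
    return math.hypot(0.3 * delta_c, delta_h)


def isotherm(p_tot, l_tot, kd, delta_sat):
    """Equation (3): CSP expected for a 1:1 complex at one titration point.

    p_tot and l_tot are the total labeled protein and unlabeled peptide
    concentrations (already corrected for dilution); p_tot must be nonzero.
    """
    s = p_tot + l_tot + kd
    return delta_sat * (s - math.sqrt(s * s - 4.0 * p_tot * l_tot)) / (2.0 * p_tot)


def fit_kd(p_tots, l_tots, csps, kd_lo=1e-6, kd_hi=1e-1, steps=400):
    """Grid-search least-squares fit of (KD, delta_sat) for one residue.

    For each trial KD the model is linear in delta_sat, so the best
    delta_sat has a closed form; the KD with the smallest sum of squared
    residuals is returned together with its delta_sat.
    """
    best = None
    for k in range(steps + 1):
        kd = kd_lo * (kd_hi / kd_lo) ** (k / steps)
        frac = [isotherm(p, l, kd, 1.0) for p, l in zip(p_tots, l_tots)]
        delta_sat = sum(f * y for f, y in zip(frac, csps)) / sum(f * f for f in frac)
        sse = sum((delta_sat * f - y) ** 2 for f, y in zip(frac, csps))
        if best is None or sse < best[0]:
            best = (sse, kd, delta_sat)
    return best[1], best[2]
```

In this scheme, `fit_kd` would be called once per residue, and the resulting KD values averaged to give a mean and standard deviation, mirroring the per-residue averaging described in the text. A proper nonlinear regression would also report parameter uncertainties, which the grid search does not.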
The signals shifted downfield in fast exchange (Figure 3.7). A small change in pH (from 6.41 to 6.51) was measured, and is consistent with the slight downfield shift of the phosphate buffer signal over the course of the titration. Control pH titrations showed that chemical shift changes in the 31P signals of pSer282 and pSer285 due to pH were smaller than those due to protein binding (not shown).

3.5.6 Hydrophobicity scale determination

The relative hydrophobic character of each peptide was calculated by considering the additive effect at the four substituted sites based on reported literature values for the twenty standard amino acids at pH 7.0 [221]. These values were: 41 for Ala, 63 for Tyr, 76.4 for Val, 97 for Trp and Leu, and 100 for Phe. This led to combined hydrophobicity values of 164, 290, 306, 388, and 400, respectively, by adding the contributions of each amino acid. These were normalized to 1 relative to the hydrophobicity of pentafluorophenylalanine. The predicted partition coefficients for N-Fmoc-L-phenylalanine and N-Fmoc-pentafluoro-L-phenylalanine were used to obtain a value of 0.9 for the hydrophobicity of the phenylalanine residue relative to the pentafluorophenylalanine residue [222, 223]. The final relative values obtained for the peptide variants were 0.37 (Ala2P*), 0.65 (WT2P*), 0.69 (Val2P*), 0.88 (Leu2P*, Trp2P*), 0.90 (Phe2P*), and 1 (5fPhe2P*).

3.5.7 Crystallization and structure determination

Purified Ets1(301-440) was mixed with the 5fPhe2P* peptide at a 1:1.2 ratio to form the complex in 10 mM MES, 75-100 mM NaCl, and 1 mM TCEP at pH 6.50. As a negative control, replicate crystallization drops were set up with Ets1(301-440) alone. Crystals of the complex grew within 3 days by sitting drop vapor diffusion with reservoir solutions containing 100 mM HEPES, 0.16-0.2 M Li2SO4, and 16-26% PEG 3350 at pH 7.1-8.9.
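Returning briefly to the hydrophobicity scale of section 3.5.6, the normalization can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions: the per-residue and combined values and the 0.9 Phe to pentafluoro-Phe ratio are taken from the text, while the dictionary keys and function names are hypothetical. The wild-type combined value (290) is used as quoted rather than recomputed. The output agrees with the reported relative values to within about 0.01; the Leu/Trp entry evaluates to roughly 0.87 rather than the reported 0.88, presumably because of rounding.

```python
# Per-residue hydrophobicity values at pH 7.0 quoted in the text.
PER_RESIDUE = {"Ala": 41.0, "Tyr": 63.0, "Val": 76.4,
               "Trp": 97.0, "Leu": 97.0, "Phe": 100.0}

# Combined four-site values quoted in the text (wild type taken as given).
COMBINED = {"Ala": 164.0, "WT": 290.0, "Val": 306.0,
            "Leu/Trp": 388.0, "Phe": 400.0}

# Hydrophobicity of Phe relative to pentafluoro-Phe, from the text.
PHE_RELATIVE_TO_5FPHE = 0.9


def combined_value(residues):
    """Additive hydrophobicity of the residues at the substituted sites."""
    return sum(PER_RESIDUE[name] for name in residues)


def relative_hydrophobicity(combined, phe_ratio=PHE_RELATIVE_TO_5FPHE):
    """Scale combined values so that the pentafluoro-Phe variant equals 1.

    The reference is chosen so that the all-Phe variant comes out at
    `phe_ratio`, i.e. reference = combined["Phe"] / phe_ratio.
    """
    reference = combined["Phe"] / phe_ratio
    scale = {name: value / reference for name, value in combined.items()}
    scale["5fPhe"] = 1.0
    return scale
```

For example, `combined_value(["Ala"] * 4)` reproduces the value of 164 quoted for the all-alanine variant, and `relative_hydrophobicity(COMBINED)` returns the normalized scale.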
Two distinct crystal morphologies were observed: rhomboid (space group P3221 at pH 7.1-8.5) and needle-like (tetragonal crystal system at pH 8.5-8.9); see also Appendix C. Optimization of the growth conditions for the rhomboid morphology by inclusion of 4.5% ethylene glycol led to diffraction-quality crystals. Cryo-protection was achieved by soaking the crystal in 35% PEG 3350 while keeping the concentrations of the remaining components constant. Subsequently, the crystals were flash frozen in liquid nitrogen and stored for ~5 days prior to data collection. A native data set was collected to 2.00 Å resolution at the Canadian Macromolecular Crystallography Facility 08B1-1 beamline, using a wavelength of 0.98-1.00 Å. The data were processed using the iMosflm [243] and CCP4 Aimless [244] programs. The structure of the complex was solved by molecular replacement (MR) with Phaser-MR [245] using the structure of Ets1 in a domain-swapped dimer (PDB: 1MD0) as the search model. The initial MR solution was used as a starting point for direct refinement using phenix.refine [246, 247] and manual rebuilding with Coot [248].

Chapter 4: Concluding remarks

Regulatory transcription factors fine-tune the gene expression patterns required for cell differentiation, development, and homeostasis. Many disease processes result from even small changes in these factors due to genetic mutations or chemical modifications. The overarching goal of this thesis was to understand the regulatory and DNA-binding mechanisms of two model systems, Pax5 and Ets1, and thereby contribute to our current understanding of their transcriptional roles. Implicit in my studies is the premise that the in vivo functions of these proteins are intricately connected to their in vitro biophysical properties.
4.1 The dual roles of the DNA-binding subdomains of Pax5

4.1.1 Summary, significance, and potential applications

Pax5 drives the differentiation of uncommitted, pluripotent cells of the lymphoid lineage into fully mature B-cells (reviewed in [109]). This process is accomplished by the activation of tissue-specific genes, such as those involved in forming the B-cell receptor, concomitantly with the repression of lineage-inappropriate targets. Around one-third of oncogenic mutations implicated in B-cell acute lymphoblastic leukemia (B-ALL) involve the Pax5 gene [110], highlighting its role in restricting cellular proliferation. In addition, the activity of Pax5 is dependent on its bipartite PD, which is retained in the vast majority of Pax5-derived fusion oncoproteins [249]. Therefore, the PD is involved in both normal and oncogenic processes, and understanding the biophysical basis of its DNA binding will enhance our comprehension of Pax5 function.

One of the goals of my studies on Pax5 was to understand the changes that occur upon DNA binding by the PD. Using NMR spectroscopy and complementary biophysical methods, I quantified the dynamic and structural rearrangements that take place upon association of the PD with a high-affinity DNA binding site. In addition, I teased apart the relative contributions of different regions of the PD to DNA binding affinity and specificity. One of the key findings of these studies was that the two subdomains have distinct behaviors. Specifically, the NTD is highly dynamic in the absence of DNA and recognizes only a relatively small subset of the sequences tested. Upon binding, however, the NTD undergoes the largest change in conformational dynamics, as evidenced most clearly by MD simulations and supported by NMR and ITC experiments. These observations point to a model of DNA recognition by the NTD in which favorable contacts to specific DNA bases require changes in conformation and dynamics.
In contrast, the more rigid CTD is able to associate promiscuously with a broader range of DNA sequences. Consistent with non-specific recognition of the negatively-charged DNA backbone, binding by the CTD depends more exclusively on electrostatic effects than does binding by the NTD. These results led me to a model of DNA association by the PD of Pax5 in which the CTD provides low affinity non-specific contacts with available sites that serve to generally localize the protein to DNA. The NTD sets the specificity required to recognize cognate target sites in a stable manner (i.e. with high affinity). Therefore, the findings in Chapter 2 shed light on how the distinct roles of the PD subdomains form a functional unit that enables Pax5 to search genomic DNA efficiently, while retaining specificity for regulatory sites. In addition, these results will prove useful in understanding disease mutations involving the PD of Pax5, and the role of protein isoforms in transcriptional activation. Given the high degree of conservation in the PD of Pax genes across the animal kingdom, these features likely hold true for other members of this transcription factor family. Finally, from a broader perspective, my findings add to a growing body of evidence indicating that greater conformational changes accompanying DNA binding are associated with a higher degree of binding specificity [171].

One potential longer term application of our current knowledge of this system is the specific modulation of Pax5 transcriptional activity, for example by using small molecules that target either subdomain, as has been accomplished for Pax2 [250]. As discussed in Chapter 2, protein isoforms of Pax5 can vary according to the presence or absence of the NTD region. More strikingly, these isoforms have opposite effects on transcriptional activity [177, 178].
My results suggest that small molecules that block DNA binding by associating with the NTD or the CTD would have dramatically different effects on the activity of Pax5. Specifically, impairing DNA binding by the NTD should affect gene-specific recognition, but is not expected to change the general association of Pax5 with available chromatin. In contrast, I predict that a reduction in the function of the CTD would dramatically slow down localization of Pax5 on regulatory promoters, and weaken its transactivation potential.

4.1.2 Limitations, outstanding questions, and future studies

My research focused on the changes that occur in the Pax5 protein upon binding DNA. However, I did not determine a detailed three-dimensional structure of the unbound PD, as would be required to more fully understand how its conformational dynamics are linked to DNA binding. Also, I did not directly examine the DNA molecule to which Pax5 bound. Some TFs, such as TBP, cause large structural changes in the standard B-form DNA structure [25]. Similarly, specific binding of the CD19-N half site by the NTD may be accompanied by distinct conformations of the DNA molecule. One simple approach to gain complementary insights into base-specific contacts by the NTD and CTD would be to use 1D 1H-NMR spectra to monitor changes in the imino proton signals of the DNA upon titration with Pax5. These nuclei have very distinct downfield chemical shifts (~12-15 ppm) that are sensitive to hydrogen bonding between complementary base pairs and with protein sidechains. Thus, the imino protons should serve as sensitive reporters of changes induced in DNA upon binding by the NTD and CTD of Pax5. In addition, one could determine the structures of the DNA molecules in their free and Pax5-bound states by NMR, or use residual dipolar couplings (RDCs) to gain insight into conformational changes that occur upon binding [251].
In addition, although my observations in Chapter 2 suggest the NTD may associate with and dissociate from DNA more slowly than the CTD, I did not directly measure DNA-binding kinetics. NMR approaches including paramagnetic relaxation enhancement (PRE), combined with single molecule fluorescence and surface plasmon resonance (SPR) measurements, would be very useful in understanding the DNA search process by Pax5. Such studies would establish whether the CTD indeed allows rapid “scanning” of available sites on chromatin. By way of example, NMR studies conducted on Oct1, a TF containing a bipartite DNA-binding domain architecture, showed that the distinct rates of association by its two subdomains facilitate the search process through a “monkey-bar” mechanism of intersegmental transfer [54].

Finally, the structural basis for transcriptional activation by Pax5 and the mechanisms of recruitment of members of the basal transcriptional machinery are currently unknown. Preliminary NMR experiments on the TAD of Pax5 are useful starting points to identify how it may interact with chromatin remodeling proteins such as CBP/p300, known to be recruited by Pax5 [154].

4.2 Regulation of Ets1 function by an intrinsically-disordered region

4.2.1 Summary, significance, and potential applications

The transcriptional activity of Ets1 is sensitive to several cellular inputs, including calcium signalling [218, 219]. Upon T-cell stimulation, for example, intracellular calcium concentration rises, resulting in activation of CaMKII [252-254]. This leads to CaMKII-dependent phosphorylation of up to five mapped serine residues in the intrinsically-disordered SRR of Ets1 [219, 255]. Increasing numbers of these PTMs in the SRR progressively weaken the association of the ETS domain with DNA via the autoinhibitory mechanisms described in Chapter 3 [72, 215-217, 219].
The decrease in the DNA-binding activity of Ets1 significantly changes the expression profile of genes that are under its control. For example, certain pro-inflammatory genes expressed in T-helper cells, such as the IL-17 cytokine, are negatively regulated by Ets1 [256]. The decrease in DNA binding by Ets1 and the clearing of Ets1 mRNA accompanying T-cell activation [257] are predicted to unleash pro-inflammatory events leading to increased immune activity [256]. Thus, understanding the biophysical mechanisms connecting calcium signalling to changes in the DNA-binding activity of Ets1 will help our understanding of the molecular basis for gene expression changes following T-cell activation. The same argument applies to any of the numerous transcriptional networks involving Ets1.

The findings presented in Chapter 3 provide structural and biophysical insight into how the SRR promotes DNA-binding autoinhibition, and how this is increased by phosphorylation of two key CaMKII serine targets. I found that the role of the four aromatic residues in the SRR is, at least partially, due to their hydrophobic nature. Hydrophobicity promotes association of the SRR and the IM/ETS domain through favorable contacts between aromatic and aliphatic side chains in these regions. The observation that the aromatic residues could be substituted with different types of aliphatic amino acids without loss in binding affinity supports previous observations that the interaction is transient, dynamic, and “fuzzy” [72, 217]. These types of interactions are commonly observed in proteins that require rapid, reversible regulation of protein activity [258], as would be expected in transcription factors like Ets1 upon T-cell activation. In addition, I found that the phosphoserine residues tend to remain solvent exposed and, although not involved in any persistent salt-bridges, generally occupy a positively charged surface on Ets1 rich in arginines and lysines.
This electrostatic contribution to binding is supported by the dependency of the interaction on ionic strength. Together, these results also hint at a “salting-out” mechanism of association of the SRR and IM/ETS domain, whereby the phosphate groups promote the hydrophobic clustering of adjacent aromatic residues. This facilitates binding to a hydrophobic surface of the ETS domain surrounded by positively charged residues.

In collaboration with colleagues at UBC, I used NMR spectroscopy and X-ray crystallography to determine the first detailed structural models of the SRR bound to the IM/ETS domain. These ensembles conclusively show that the SRR acts both as a steric modulator of Ets1 by blocking the DNA-binding interface, and as an allosteric effector that stabilizes the IM against DNA-induced unfolding. Therefore, we can now explain how the actions of the SRR and the IM are connected as one, combining the steric and allosteric mechanisms of autoinhibition to repress DNA binding by Ets1.

From a broader biophysical perspective, the studies described in Chapter 3 add to our current understanding of “fuzzy” interactions involving IDRs, a relatively new field in “un”structural biology. Specifically, my results provide insight into how the combination of charged and aromatic/hydrophobic residues may promote transient association of IDRs with protein surfaces having complementary physicochemical features.

The results presented in Chapter 3 have potential applications in efforts aimed at targeting Ets1 therapeutically. Ets1 is a proto-oncogene involved in many different types of carcinomas [83]. In some cellular contexts, a reduction of its transcriptional activity could prove useful in treating cancers characterized by an overexpression of Ets1 [83]. Complementary NMR spectroscopy and X-ray crystallography approaches provide lower and higher resolution information about the IM/ETS domain-SRR peptide interaction in different physical contexts (i.e.
in solution and in a crystal). This information is invaluable for in silico screening approaches aimed at finding small molecule compounds that reinforce the SRR-mediated autoinhibition. Because SRR-mediated autoinhibition is only found in the Ets1 and Ets2 members, it is an attractive target for specifically regulating these two proteins, while avoiding all other family members sharing the conserved DNA-binding ETS domain. In addition, this approach would specifically alter the regulatory mechanism mediated by the SRR, thereby eluding some of the common pitfalls in choosing a therapeutic target (e.g. lack of specificity or complete loss of protein function).

4.2.2 Limitations, outstanding questions, and future studies

Previous work in our lab was unable to confidently detect NOE contacts between the SRR and the IM/ETS domain in Ets1(279-440), and thus their interaction interface was only coarsely mapped via CSP and PRE measurements [217]. These studies were carried out under conditions of moderate ionic strength, which was necessary to maintain protein solubility yet weakened the intramolecular association. Along with the relatively large number of tyrosine and phenylalanine residues in the IM/ETS domain and the SRR (a total of 18), this made structural characterization of the SRR-mediated autoinhibition system very challenging. The trans system described in Chapter 3 had the advantage of allowing selective NMR experiments that specifically detect contacts between the unlabeled SRR and the 15N/13C-labeled IM/ETS domain. However, only a portion of the SRR was studied and the effective interaction strength was much weaker than in the natural protein context. This resulted in a potential increase in the heterogeneity of conformations sampled by the SRR, as well as a loss in the coupling of the steric and allosteric mechanisms of DNA-binding autoinhibition.
One way to characterize the cis-interaction of the full SRR with the ETS domain involves the use of relatively new methods in protein chemistry that allow the efficient covalent linkage of protein segments into a continuous polypeptide chain. In particular, proximity-based Sortase A-mediated ligation seems well-suited for this system [259, 260]. Sortase A could be used to fuse 13C/15N-labeled SRR (residues 248-295) to the unlabeled ETS domain or vice versa. Such selective labeling would lead to spectral simplification and enable filtered-edited approaches to confidently define how the SRR impinges upon the IM and ETS domain in an intramolecular manner. This would help address questions of the following nature. How do the additional phosphorylation modifications of the native SRR contribute to autoinhibition? Do the remaining residues associate with the same surface region of Ets1, or are there additional surfaces of the ETS domain mediating interactions with the SRR? How dynamic is the association of the SRR under conditions that promote the interaction (i.e. lower ionic strength)? A small disadvantage is that the trans-peptidation reaction would result in the addition of 5 non-native residues, LPXTG (where X is any residue), between the SRR and the IM/ETS domain. However, this is the same number of residues (PNHKP) missing in the trans system described in Chapter 3, in which the SRR peptide spans Ets1 residues 279-295, and the IM/ETS domain corresponds to residues 301-440.

In parallel to structural characterization, I used the CABS-DOCK modelling tool to predict how the truncated SRR peptide (residues 279-295) might localize on the IM/ETS domain. Out of the 10 models provided by this predictor, 7 were consistent with the general interface defined by my experimental observations (not shown). This gave me confidence in using this approach to gain insight into how the full-sized SRR might associate with the IM/ETS domain.
I therefore used CABS-DOCK to predict the binding site of residues 248-278, which can also be phosphorylated and increase autoinhibition [216]. Out of the ten models predicted by the program, five were consistent with a location that naturally extends the SRR interaction interface determined experimentally in Chapter 3 (Figure 4.1).

Figure 4.1: The full-length SRR region may extend the ETS domain binding interface. The complete SRR region includes an additional ~35 residues with CaMKII phospho-acceptor serines (Ets1(244-278), light violet). Although the truncated SRR used in my studies (Ets1(279-295), red) recapitulates DNA-binding autoinhibition [217], Sortase-mediated protein ligation may allow studies involving the full-length region. The bottom model shows 4 of the 10 predicted binding sites for a peptide corresponding to Ets1 residues 248-278 (light violet). The remaining 6 models overlapped the region shown experimentally to be occupied by Ets1(279-295), and were therefore excluded. The CABS-DOCK software only allows a maximum of 30 residues and so residues 244-248 were not included. Also shown are 5 of the 20 CYANA-derived models (see Figure 3.12) which place the N-terminus of Ets1(279-295) in close proximity to the modeled residues Ets1(248-278), as would be expected in the native context. Overall, this analysis suggests that the full-length SRR may further “wrap around” the core ETS domain to yield increased autoinhibition.

Another puzzling question that I was not able to fully address is why there is a striking pattern of conserved aromatic residues adjacent to serine and aspartate/glutamate residues in the SRR regions of Ets1 and Ets2 homologs [72], given that my studies predict aliphatic residues such as leucine could fulfill the same “hydrophobic” role. A few possible explanations were offered in section 3.4.1.
One is that the specific electronic and biophysical properties of tyrosine and phenylalanine are ideal for the SRR function and stability of the protein. For example, I found that certain SRR peptide variants promoted protein aggregation more than others, in particular the Trp2P* and 5fPhe2P* versions. The combination of tyrosine and phenylalanine residues may be “just hydrophobic enough” to maintain function without promoting protein misfolding. Of note, the SRR of Ets2 contains a mapped CaMKII phosphoacceptor site consisting of the sequence S-L-L/V-D. The leucine and valine residues at this site are expected to have a similar role as the tyrosine and phenylalanine residues found in the Ets1 SRR. However, the majority of the mapped phosphorylation sites across Ets1 and Ets2 homologs contain aromatic residues [72].

Another possibility is that the SRR residues have additional roles beyond serving as CaMKII phosphoacceptor sites and inhibiting DNA binding by the ETS domain. For example, the SRR could provide recognition sites for important protein-protein interactions and post-translational modifications that also regulate Ets1 function. The SRR residues Val280 and Pro281, for instance, are part of a COP1 degron that allows polyubiquitination of Ets1 and its subsequent degradation [233]. Phosphorylation of Tyr283 in the SRR by the tyrosine kinase Src disrupts the interaction between this degron and COP1, thereby stabilizing Ets1 [233]. In contrast, phosphorylation of Ser282 increases COP1 binding. The negative charge afforded by phosphorylation likely mimics the canonical COP1 recognition sequence involving aspartates or glutamates at the phosphoserine site, thereby promoting binding of COP1 and polyubiquitination [233]. A few studies indicate that upon T-cell activation, Ets1 function may be reduced in parallel at the mRNA and protein levels [233, 257], and in addition, have its DNA-binding function impaired [219].
Modifications of serine and tyrosine residues in the SRR likely serve to fine-tune these mechanisms that integrate multiple regulatory signaling pathways.

Finally, the Ets1 SRR is able to associate weakly with the DNA-binding domain of PU.1, a distantly-related ETS transcription factor. This finding may not be biologically relevant, and may only be a product of conserved surface features of the ETS domains. However, the full-length SRR is ~50 residues. Conceivably, the SRR could function to inhibit co-factors in trans that possess positively-charged side chains in the vicinity of hydrophobic patches. This would require multiple ETS domains (or other partners) to be in close proximity, for example in the context of homo- or heterodimers. If the SRR associates with multiple proteins in this way, it could also help explain the preservation of aromatic residues at these positions.

In summary, future studies are still needed to fully understand the biological roles of the SRR, how they are linked to the biophysical properties of this IDR, and the cellular signals that are integrated in this region.

References

1. Levine, M. and R. Tjian, Transcription regulation and animal diversity. Nature, 2003. 424(6945): p. 147-51. 2. Latchman, D.S., Eukaryotic Transcription Factors. Biochemical Journal, 1990. 270(2): p. 281-289. 3. Wade, J.T. and K. Struhl, The transition from transcriptional initiation to elongation. Current Opinion in Genetics & Development, 2008. 18(2): p. 130-136. 4. Proudfoot, N.J., Transcriptional termination in mammals: Stopping the RNA polymerase II juggernaut. Science, 2016. 352(6291): p. aad9926. 5. Brivanlou, A.H. and J.E. Darnell, Jr., Signal transduction and the control of gene expression. Science, 2002. 295(5556): p. 813-8. 6. Allen, B.L. and D.J. Taatjes, The Mediator complex: a central integrator of transcription. Nat Rev Mol Cell Biol, 2015. 16(3): p. 155-66. 7. Krishnamurthy, S. and M. Hampsey, Eukaryotic transcription initiation.
Curr Biol, 2009. 19(4): p. R153-6. 8. Hsin, J.P. and J.L. Manley, The RNA polymerase II CTD coordinates transcription and RNA processing. Genes Dev, 2012. 26(19): p. 2119-37. 9. Guenther, M.G., et al., A chromatin landmark and transcription initiation at most promoters in human cells. Cell, 2007. 130(1): p. 77-88. 10. Becker, P.B. and J.L. Workman, Nucleosome remodeling and epigenetics. Cold Spring Harb Perspect Biol, 2013. 5(9). 11. Vaquerizas, J.M., et al., A census of human transcription factors: function, expression and evolution. Nat Rev Genet, 2009. 10(4): p. 252-63. 12. Garvie, C.W., J. Hagman, and C. Wolberger, Structural studies of Ets-1/Pax5 complex formation on DNA. Mol Cell, 2001. 8(6): p. 1267-76. 13. Reinhardt, H.C. and B. Schumacher, The p53 network: cellular and systemic DNA damage responses in aging and cancer. Trends Genet, 2012. 28(3): p. 128-36. 14. Wan, F. and M.J. Lenardo, The nuclear signaling of NF-kappaB: current knowledge, new insights, and future perspectives. Cell Res, 2010. 20(1): p. 24-33. 15. Lopez, R.G., et al., TEL is a sequence-specific transcriptional repressor. J Biol Chem, 1999. 274(42): p. 30132-8. 16. Furney, S.J., et al., Structural and functional properties of genes involved in human cancer. BMC Genomics, 2006. 7: p. 3. 17. Cox, P.M. and C.R. Goding, Transcription and cancer. Br J Cancer, 1991. 63(5): p. 651-62. 18. Frankel, A.D. and P.S. Kim, Modular Structure of Transcription Factors - Implications for Gene-Regulation. Cell, 1991. 65(5): p. 717-719. 19. Hahn, S. and E.T. Young, Transcriptional regulation in Saccharomyces cerevisiae: transcription factor regulation and function, mechanisms of initiation, and roles of activators and coactivators. Genetics, 2011. 189(3): p. 705-36. 20. Brent, R. and M. Ptashne, A eukaryotic transcriptional activator bearing the DNA specificity of a prokaryotic repressor. Cell, 1985. 43(3 Pt 2): p. 729-36. 21.
Smeenk, L., et al., Molecular role of the PAX5-ETV6 oncoprotein in promoting B-cell acute lymphoblastic leukemia. EMBO J, 2017. 36(6): p. 718-735. 22. Liu, J., et al., Intrinsic disorder in transcription factors. Biochemistry, 2006. 45(22): p. 6873-88. 23. Minezaki, Y., et al., Human transcription factors contain a high fraction of intrinsically disordered regions essential for transcriptional regulation. J Mol Biol, 2006. 359(4): p. 1137-49. 24. Connolly, K.M., et al., Major groove recognition by three-stranded beta-sheets: affinity determinants and conserved structural features. J Mol Biol, 2000. 300(4): p. 841-56. 25. Bewley, C.A., A.M. Gronenborn, and G.M. Clore, Minor groove-binding architectural proteins: structure, function, and DNA recognition. Annu Rev Biophys Biomol Struct, 1998. 27: p. 105-31. 26. Ippel, H., et al., The solution structure of the homeodomain of the rat insulin-gene enhancer protein isl-1. Comparison with other homeodomains. J Mol Biol, 1999. 288(4): p. 689-703. 27. Schrodinger, L., The PyMOL Molecular Graphics System. 2015. 28. Berman, H.M., et al., The Protein Data Bank. Nucleic Acids Res, 2000. 28(1): p. 235-42. 29. Jayaram, B. and T. Jain, The role of water in protein-DNA recognition. Annu Rev Biophys Biomol Struct, 2004. 33: p. 343-61. 30. Nadassy, K., S.J. Wodak, and J. Janin, Structural features of protein-nucleic acid recognition sites. Biochemistry, 1999. 38(7): p. 1999-2017. 31. Seeman, N.C., J.M. Rosenberg, and A. Rich, Sequence-specific recognition of double helical nucleic acids by proteins. Proc Natl Acad Sci U S A, 1976. 73(3): p. 804-8. 32. Smith, N.C. and J.M. Matthews, Mechanisms of DNA-binding specificity and functional gene regulation by transcription factors. Curr Opin Struct Biol, 2016. 38: p. 68-74. 33. Jen-Jacobson, L., L.E. Engler, and L.A. Jacobson, Structural and thermodynamic strategies for site-specific DNA binding proteins. Structure, 2000. 8(10): p. 1015-23. 34.
Rice, P.A., Protein-nucleic acid interactions. 2008, Cambridge, UK: RSC Publishing. 397. 35. De, S., et al., Steric Mechanism of Auto-Inhibitory Regulation of Specific and Non-Specific DNA Binding by the ETS Transcriptional Repressor ETV6. Journal of Molecular Biology, 2014. 426(7): p. 1390-1406. 36. Lamber, E.P., et al., Regulation of the transcription factor Ets-1 by DNA-mediated homo-dimerization. EMBO J, 2008. 27(14): p. 2006-17. 37. Rohs, R., et al., The role of DNA shape in protein-DNA recognition. Nature, 2009. 461(7268): p. 1248-53. 38. Rohs, R., et al., Origins of Specificity in Protein-DNA Recognition. Annual Review of Biochemistry, Vol 79, 2010. 79: p. 233-269. 39. Desjardins, G., et al., Conformational Dynamics and the Binding of Specific and Nonspecific DNA by the Autoinhibited Transcription Factor Ets-1. Biochemistry, 2016. 55(29): p. 4105-18. 40. Perez-Borrajero, C., M. Okon, and L.P. McIntosh, Structural and Dynamics Studies of Pax5 Reveal Asymmetry in Stability and DNA Binding by the Paired Domain. Journal of Molecular Biology, 2016. 428(11): p. 2372-2391. 41. Kalodimos, C.G., R. Boelens, and R. Kaptein, Toward an integrated model of protein-DNA recognition as inferred from NMR studies on the Lac repressor system. Chem Rev, 2004. 104(8): p. 3567-86. 42. Clore, G.M., Exploring translocation of proteins on DNA by NMR. Journal of Biomolecular Nmr, 2011. 51(3): p. 209-219. 43. Redding, S. and E.C. Greene, How do proteins locate specific targets in DNA? Chem Phys Lett, 2013. 570. 44. Brackley, C.A., M.E. Cates, and D. Marenduzzo, Effect of DNA conformation on facilitated diffusion. Biochem Soc Trans, 2013. 41(2): p. 582-8. 45. Blainey, P.C., et al., Nonspecifically bound proteins spin while diffusing along DNA. Nature Structural & Molecular Biology, 2009. 16(12): p. 1224-U34. 46. Vonhippel, P.H. and O.G. Berg, Facilitated Target Location in Biological-Systems. Journal of Biological Chemistry, 1989. 264(2): p. 675-678. 47.
Hippel, P.H.V., et al., Nonspecific DNA Binding of Genome Regulating Proteins as a Biological-Control Mechanism .1. Lac Operon - Equilibrium Aspects. Proceedings of the National Academy of Sciences of the United States of America, 1974. 71(12): p. 4808-4812. 48. Vonhippel, P.H. and O.G. Berg, On the Specificity of DNA-Protein Interactions. Proceedings of the National Academy of Sciences of the United States of America, 1986. 83(6): p. 1608-1612. 49. Halford, S.E., An end to 40 years of mistakes in DNA-protein association kinetics? Biochem Soc Trans, 2009. 37(Pt 2): p. 343-8. 50. Yesudhas, D., et al., Proteins Recognizing DNA: Structural Uniqueness and Versatility of DNA-Binding Domains in Stem Cell Transcription Factors. Genes (Basel), 2017. 8(8). 51. Komazin-Meredith, G., et al., Hopping of a processivity factor on DNA revealed by single-molecule assays of diffusion. Proc Natl Acad Sci U S A, 2008. 105(31): p. 10721-6. 52. Bonnet, I., et al., Sliding and jumping of single EcoRV restriction enzymes on non-cognate DNA. Nucleic Acids Res, 2008. 36(12): p. 4118-27. 53. Li, G.W. and J. Elf, Single molecule approaches to transcription factor kinetics in living cells. FEBS Lett, 2009. 583(24): p. 3979-83. 54. Doucleff, M. and G.M. Clore, Global jumping and domain-specific intersegment transfer between DNA cognate sites of the multidomain transcription factor Oct-1. Proceedings of the National Academy of Sciences of the United States of America, 2008. 105(37): p. 13871-13876. 55. Vuzman, D. and Y. Levy, The "Monkey-Bar" Mechanism for Searching for the DNA Target Site: The Molecular Determinants. Israel Journal of Chemistry, 2014. 54(8-9): p. 1374-1381. 56. Egriboz, O., F. Jiang, and J.E. Hopper, Rapid GAL gene switch of Saccharomyces cerevisiae depends on nuclear Gal3, not nucleocytoplasmic trafficking of Gal3 and Gal80. Genetics, 2011. 189(3): p. 825-36. 166  57. Traven, A., B. Jelicic, and M. Sopta, Yeast Gal4: a transcriptional paradigm revisited. EMBO Rep, 2006. 
7(5): p. 496-9. 58. Varga-Weisz, P., ATP-dependent chromatin remodeling factors: nucleosome shufflers with many missions. Oncogene, 2001. 20(24): p. 3076-85. 59. Burgess, R.J. and Z. Zhang, Histones, histone chaperones and nucleosome assembly. Protein Cell, 2010. 1(7): p. 607-12. 60. Zaret, K.S. and S.E. Mango, Pioneer transcription factors, chromatin dynamics, and cell fate control. Curr Opin Genet Dev, 2016. 37: p. 76-81. 61. Cirillo, L.A., et al., Opening of compacted chromatin by early developmental transcription factors HNF3 (FoxA) and GATA-4. Mol Cell, 2002. 9(2): p. 279-89. 62. Bah, A. and J.D. Forman-Kay, Modulation of Intrinsically Disordered Protein Function by Post-translational Modifications. Journal of Biological Chemistry, 2016. 291(13): p. 6696-6705. 63. Whitmarsh, A.J. and R.J. Davis, Regulation of transcription factor function by phosphorylation. Cell Mol Life Sci, 2000. 57(8-9): p. 1172-83. 64. Ardito, F., et al., The crucial role of protein phosphorylation in cell signaling and its use as targeted therapy (Review). Int J Mol Med, 2017. 40(2): p. 271-280. 65. Shaywitz, A.J. and M.E. Greenberg, CREB: a stimulus-induced transcription factor activated by a diverse array of extracellular signals. Annu Rev Biochem, 1999. 68: p. 821-61. 66. Gareau, J.R. and C.D. Lima, The SUMO pathway: emerging mechanisms that shape specificity, conjugation and recognition. Nat Rev Mol Cell Biol, 2010. 11(12): p. 861-71. 67. Gill, G., Post-translational modification by the small ubiquitin-related modifier SUMO has big effects on transcription factor activity. Curr Opin Genet Dev, 2003. 13(2): p. 108-13. 68. Gill, G., Something about SUMO inhibits transcription. Curr Opin Genet Dev, 2005. 15(5): p. 536-41. 69. Santiago, A., et al., Identification of two independent SUMO-interacting motifs in Daxx: evolutionary conservation from Drosophila to humans and their biochemical functions. Cell Cycle, 2009. 8(1): p. 76-87. 70. 
Tell, G., et al., An 'environment to nucleus' signaling system operates in B lymphocytes: redox status modulates BSAP/Pax-5 activation through Ref-1 nuclear translocation. Nucleic Acids Res, 2000. 28(5): p. 1099-105. 167  71. Pufall, M.A. and B.J. Graves, Autoinhibitory domains: modular effectors of cellular regulation. Annu Rev Cell Dev Biol, 2002. 18: p. 421-62. 72. Desjardins, G., et al., Synergy of aromatic residues and phosphoserines within the intrinsically disordered DNA-binding inhibitory elements of the Ets-1 transcription factor. Proc Natl Acad Sci U S A, 2014. 111(30): p. 11019-24. 73. Currie, S.L., et al., Structured and disordered regions cooperatively mediate DNA-binding autoinhibition of ETS factors ETV1, ETV4 and ETV5. Nucleic Acids Res, 2017. 45(5): p. 2223-2241. 74. Coyne, H.J., et al., Autoinhibition of ETV6 (TEL) DNA Binding: Appended Helices Sterically Block the ETS Domain. Journal of Molecular Biology, 2012. 421(1): p. 67-84. 75. Pufall, M.A. and B.J. Graves, Ets-1 flips for new partner Pax-5. Structure, 2002. 10(1): p. 11-4. 76. Jolma, A., et al., DNA-dependent formation of transcription factor pairs alters their binding specificity. Nature, 2015. 527(7578): p. 384-8. 77. Burdach, J., et al., Regions outside the DNA-binding domain are critical for proper in vivo specificity of an archetypal zinc finger transcription factor. Nucleic Acids Res, 2014. 42(1): p. 276-89. 78. Garvie, C.W., et al., Structural analysis of the autoinhibition of Ets-1 and its role in protein partnerships. Journal of Biological Chemistry, 2002. 277(47): p. 45529-45536. 79. Fitzsimmons, D., et al., Highly cooperative recruitment of Ets-1 and release of autoinhibition by Pax5. J Mol Biol, 2009. 392(2): p. 452-64. 80. Chi, N. and J.A. Epstein, Getting your Pax straight: Pax proteins in development and disease. Trends in Genetics, 2002. 18(1): p. 41-47. 81. Czerny, T., G. Schaffner, and M. 
Busslinger, DNA sequence recognition by Pax proteins: bipartite structure of the paired domain and its binding site. Genes Dev, 1993. 7(10): p. 2048-61. 82. Cobaleda, C., et al., Pax5: the guardian of B cell identity and function. Nat Immunol, 2007. 8(5): p. 463-70. 83. Dittmer, J., The role of the transcription factor Ets1 in carcinoma. Semin Cancer Biol, 2015. 35: p. 20-38. 84. Mackereth, C.D., et al., Diversity in structure and function of the Ets family PNT domains. J Mol Biol, 2004. 342(4): p. 1249-64. 168  85. Underhill, D.A., PAX proteins and fables of their reconstruction. Crit Rev Eukaryot Gene Expr, 2012. 22(2): p. 161-77. 86. Underhill, D.A., Genetic and biochemical diversity in the Pax gene family. Biochem Cell Biol, 2000. 78(5): p. 629-38. 87. Stuart, E.T. and P. Gruss, PAX: developmental control genes in cell growth and differentiation. Cell Growth Differ, 1996. 7(3): p. 405-12. 88. Barr, F.G., Chromosomal translocations involving paired box transcription factors in human cancer. Int J Biochem Cell Biol, 1997. 29(12): p. 1449-61. 89. Eberhard, D. and M. Busslinger, The partial homeodomain of the transcription factor Pax-5 (BSAP) is an interaction motif for the retinoblastoma and TATA-binding proteins. Cancer Res, 1999. 59(7 Suppl): p. 1716s-1724s; discussion 1724s-1725s. 90. Xu, W., et al., Crystal structure of a paired domain-DNA complex at 2.5 A resolution reveals structural basis for Pax developmental mutations. Cell, 1995. 80(4): p. 639-50. 91. Xu, H.E., et al., Crystal structure of the human Pax6 paired domain-DNA complex reveals specific roles for the linker region and carboxy-terminal subdomain in DNA binding. Genes Dev, 1999. 13(10): p. 1263-75. 92. Codutti, L., et al., The solution structure of DNA-free Pax-8 paired box domain accounts for redox regulation of transcriptional activity in the pax protein family. J Biol Chem, 2008. 283(48): p. 33321-8. 93. 
Epstein, J., et al., Identification of a Pax paired domain recognition sequence and evidence for DNA-dependent conformational changes. J Biol Chem, 1994. 269(11): p. 8355-61. 94. Pellizzari, L., G. Tell, and G. Damante, Co-operation between the PAI and RED subdomains of Pax-8 in the interaction with the thyroglobulin promoter. Biochemical Journal, 1999. 337: p. 253-262. 95. Wilson, D.S., et al., High-Resolution Crystal-Structure of a Paired (Pax) Class Cooperative Homeodomain Dimer on DNA. Cell, 1995. 82(5): p. 709-719. 96. Birrane, G., A. Soni, and J.A.A. Ladias, Structural Basis for DNA Recognition by the Human PAX3 Homeodomain. Biochemistry, 2009. 48(6): p. 1148-1155. 97. Banerjee-Basu, S. and A.D. Baxevanis, Molecular evolution of the homeodomain family of transcription factors. Nucleic Acids Research, 2001. 29(15): p. 3258-3269. 98. Burglin, T.R. and M. Affolter, Homeodomain proteins: an update. Chromosoma, 2016. 125(3): p. 497-521. 169  99. Jennings, B.H. and D. Ish-Horowicz, The Groucho/TLE/Grg family of transcriptional co-repressors. Genome Biology, 2008. 9(1). 100. Barberis, A., et al., A novel B-cell lineage-specific transcription factor present at early but not late stages of differentiation. Genes Dev, 1990. 4(5): p. 849-59. 101. Urbanek, P., et al., Complete block of early B cell differentiation and altered patterning of the posterior midbrain in mice lacking Pax5/BSAP. Cell, 1994. 79(5): p. 901-12. 102. Urbanek, P., et al., Cooperation of Pax2 and Pax5 in midbrain and cerebellum development. Proc Natl Acad Sci U S A, 1997. 94(11): p. 5703-8. 103. Bouchard, M., P. Pfeffer, and M. Busslinger, Functional equivalence of the transcription factors Pax2 and Pax5 in mouse development. Development, 2000. 127(17): p. 3703-13. 104. Pfeffer, P.L., M. Bouchard, and M. Busslinger, Pax2 and homeodomain proteins cooperatively regulate a 435 bp enhancer of the mouse Pax5 gene at the midbrain-hindbrain boundary. Development, 2000. 127(5): p. 1017-28. 105. 
Nutt, S.L., et al., Commitment to the B-lymphoid lineage depends on the transcription factor Pax5. Nature, 1999. 401(6753): p. 556-562. 106. Delogu, A., et al., Gene repression by Pax5 in B cells is essential for blood cell homeostasis and is reversed in plasma cells. Immunity, 2006. 24(3): p. 269-281. 107. Schebesta, A., et al., Transcription factor Pax5 activates the chromatin of key genes involved in B cell signaling, adhesion, migration, and immune function. Immunity, 2007. 27(1): p. 49-63. 108. O'Brien, P., et al., The Pax-5 gene: a pluripotent regulator of B-cell differentiation and cancer disease. Cancer Res, 2011. 71(24): p. 7345-50. 109. Medvedovic, J., et al., Pax5: a master regulator of B cell development and leukemogenesis. Adv Immunol, 2011. 111: p. 179-206. 110. Mullighan, C.G., et al., Genome-wide analysis of genetic alterations in acute lymphoblastic leukaemia. Nature, 2007. 446(7137): p. 758-764. 111. Kuiper, R.P., et al., High-resolution genomic profiling of childhood ALL reveals novel recurrent genetic lesions affecting pathways involved in lymphocyte differentiation and cell cycle progression. Leukemia, 2007. 21(6): p. 1258-1266. 112. Shah, S., et al., A recurrent germline PAX5 mutation confers susceptibility to pre-B cell acute lymphoblastic leukemia. Nature Genetics, 2013. 45(10): p. 1226-U179. 170  113. Liu, G., et al., Pax5 Loss Imposes a Reversible Differentiation Block in B-Progenitor Acute Lymphoblastic Leukemia. Experimental Hematology, 2014. 42(8): p. S46-S46. 114. Revilla-i-Domingo, R., et al., The B-cell identity factor Pax5 regulates distinct transcriptional programmes in early and late B lymphopoiesis. Embo Journal, 2012. 31(14): p. 3130-3146. 115. Eberhard, D., et al., Transcriptional repression by Pax5 (BSAP) through interaction with corepressors of the Groucho family. EMBO J, 2000. 19(10): p. 2292-303. 116. He, T., et al., Histone acetyltransferase p300 acetylates Pax5 and strongly enhances Pax5-mediated transcriptional activity. 
J Biol Chem, 2011. 286(16): p. 14137-45. 117. Dang, J.J., et al., PAX5 is a tumor suppressor in mouse mutagenesis models of acute lymphoblastic leukemia. Blood, 2015. 125(23): p. 3609-3617. 118. Czerny, T. and M. Busslinger, DNA-Binding and Transactivation Properties of Pax-6 - 3 Amino-Acids in the Paired Domain Are Responsible for the Different Sequence Recognition of Pax-6 and Bsap (Pax-5). Molecular and Cellular Biology, 1995. 15(5): p. 2858-2871. 119. Cai, J.X., et al., Dissection of the Drosophila Paired Protein - Functional Requirements for Conserved Motifs. Mechanisms of Development, 1994. 47(2): p. 139-150. 120. Treisman, J., E. Harris, and C. Desplan, The Paired Box Encodes a 2nd DNA-Binding Domain in the Paired Homeo Domain Protein. Genes & Development, 1991. 5(4): p. 594-604. 121. Bertuccioli, C., et al., In vivo requirement for the paired domain and homeodomain of the paired segmentation gene product. Development, 1996. 122(9): p. 2673-2685. 122. Li, L., P. Li, and L. Xue, The RED domain of Paired is specifically required for Drosophila accessory gland maturation. Open Biology, 2015. 5(2). 123. Epstein, J.A., et al., 2 Independent and Interactive DNA-Binding Subdomains of the Pax6 Paired Domain Are Regulated by Alternative Splicing. Genes & Development, 1994. 8(17): p. 2022-2034. 124. Kozmik, Z., T. Czerny, and M. Busslinger, Alternatively spliced insertions in the paired domain restrict the DNA sequence specificity of Pax6 and Pax8. Embo Journal, 1997. 16(22): p. 6793-6803. 125. Sandelin, A. and W.W. Wasserman, Constrained binding site diversity within families of transcription factors enhances pattern discovery bioinformatics. Journal of Molecular Biology, 2004. 338(2): p. 207-215. 171  126. Shen, Y. and A. Bax, Identification of helix capping and b-turn motifs from NMR chemical shifts. J Biomol NMR, 2012. 52(3): p. 211-32. 127. 
Gryk, M.R., et al., Flexibility of DNA binding domain of trp repressor required for recognition of different operator sequences. Protein Science, 1996. 5(6): p. 1195-1197. 128. Ogata, K., et al., The cavity in the hydrophobic core of Myb DNA-binding domain is reserved for DNA recognition and trans-activation. Nature Structural Biology, 1996. 3(2): p. 178-187. 129. Zhang, Y.-Z., Protein and peptide structure and interactions studied by hydrogen exchange and NMR, in Structural Biology and Molecular Biophysics. 1995, University of Pennsylvania: Pennsylvania, USA. p. 406. 130. Bai, Y., et al., Primary structure effects on peptide group hydrogen exchange. Proteins, 1993. 17(1): p. 75-86. 131. Connelly, G.P., et al., Isotope effects in peptide group hydrogen exchange. Proteins, 1993. 17(1): p. 87-92. 132. Krishna, M.M.G., et al., Hydrogen exchange methods to study protein folding. Methods, 2004. 34(1): p. 51-64. 133. Best, R.B. and M. Vendruscolo, Structural interpretation of hydrogen exchange protection factors in proteins: Characterization of the native state fluctuations of C12. Structure, 2006. 14(1): p. 97-106. 134. Skinner, J.J., et al., Protein hydrogen exchange: Testing current models. Protein Science, 2012. 21(7): p. 987-995. 135. Pace, C.N., Determination and analysis of urea and guanidine hydrochloride denaturation curves. Methods Enzymol, 1986. 131: p. 266-80. 136. Myers, J.K., C.N. Pace, and J.M. Scholtz, Denaturant m values and heat capacity changes: relation to changes in accessible surface areas of protein unfolding. Protein Sci, 1995. 4(10): p. 2138-48. 137. Clarke, J. and A.R. Fersht, An evaluation of the use of hydrogen exchange at equilibrium to probe intermediates on the protein folding pathway. Fold Des, 1996. 1(4): p. 243-54. 138. Kleckner, I.R. and M.P. Foster, An introduction to NMR-based approaches for measuring protein dynamics. Biochimica Et Biophysica Acta-Proteins and Proteomics, 2011. 1814(8): p. 942-968. 139. 
Dosset, P., et al., Efficient analysis of macromolecular rotational diffusion from heteronuclear relaxation data. Journal of Biomolecular Nmr, 2000. 16(1): p. 23-28. 172  140. de la Torre, J.G., M.L. Huertas, and B. Carrasco, HYDRONMR: Prediction of NMR relaxation of globular proteins from atomic-level structures and hydrodynamic calculations. Journal of Magnetic Resonance, 2000. 147(1): p. 138-146. 141. D.A. Case, V.B., J.T. Berryman, R.M. Betz, Q. Cai, D.S. Cerutti, T.E. Cheatham, III, T.A. Darden, R.E. Duke, H. Gohlke, A.W. Goetz, S. Gusarov, N. Homeyer, P. Janowski, J. Kaus, I. Kolossváry, A. Kovalenko, T.S. Lee, S. LeGrand, T. Luchko, R. Luo, B. Madej, K.M. Merz, F. Paesani, D.R. Roe, A. Roitberg, C. Sagui, R. Salomon-Ferrer, G. Seabra, C.L. Simmerling, W. Smith, J. Swails, R.C. Walker, J. Wang, R.M. Wolf, X. Wu and P.A. Kollman, AMBER 14. 2014: University of California, San Francisco. 142. Sethi, A., et al., Dynamical networks in tRNA: protein complexes. Proceedings of the National Academy of Sciences of the United States of America, 2009. 106(16): p. 6620-6625. 143. Adams, B., et al., Pax-5 encodes the transcription factor BSAP and is expressed in B lymphocytes, the developing CNS, and adult testis. Genes Dev, 1992. 6(9): p. 1589-607. 144. Fitzsimmons, D., et al., Pax-5 (BSAP) recruits Ets proto-oncogene family proteins to form functional ternary complexes on a B-cell-specific promoter. Genes Dev, 1996. 10(17): p. 2198-211. 145. Wheat, W., et al., The highly conserved beta-hairpin of the paired DNA-binding domain is required for assembly of Pax-Ets ternary complexes. Mol Cell Biol, 1999. 19(3): p. 2231-41. 146. Fitzsimmons, D., et al., Highly conserved amino acids in Pax and Ets proteins are required for DNA binding and ternary complex assembly. Nucleic Acids Res, 2001. 29(20): p. 4154-65. 147. Cooper, A., Heat capacity effects in protein folding and ligand binding: a re-evaluation of the role of water in biomolecular thermodynamics. Biophys Chem, 2005. 
115(2-3): p. 89-97. 148. Bergqvist, S., et al., Heat capacity effects of water molecules and ions at a protein-DNA interface. J Mol Biol, 2004. 336(4): p. 829-42. 149. Spolar, R.S. and M.T. Record, Jr., Coupling of local folding to site-specific binding of proteins to DNA. Science, 1994. 263(5148): p. 777-84. 150. Gallagher, K. and K. Sharp, Electrostatic contributions to heat capacity changes of DNA-ligand binding. Biophys J, 1998. 75(2): p. 769-76. 151. Spolar, R.S., J.R. Livingstone, and M.T. Record, Jr., Use of liquid hydrocarbon and amide transfer data to estimate contributions to thermodynamic functions of protein folding 173  from the removal of nonpolar and polar surface from water. Biochemistry, 1992. 31(16): p. 3947-55. 152. Privalov, P.L., A.I. Dragan, and C. Crane-Robinson, Interpreting protein/DNA interactions: distinguishing specific from non-specific and electrostatic from non-electrostatic components. Nucleic Acids Research, 2011. 39(7): p. 2483-2491. 153. Yeon, J.H., et al., Systems-wide Identification of cis-Regulatory Elements in Proteins. Cell Syst, 2016. 2(2): p. 89-100. 154. Emelyanov, A.V., et al., The interaction of Pax5 (BSAP) with Daxx can result in transcriptional activation in B cells. Journal of Biological Chemistry, 2002. 277(13): p. 11156-11164. 155. Zhao, Q., et al., GPS-SUMO: a tool for the prediction of sumoylation sites and SUMO-interaction motifs. Nucleic Acids Research, 2014. 42(W1): p. W325-W330. 156. Lin, D.Y., et al., Role of SUMO-interacting motif in Daxx SUMO modification, subnuclear localization, and repression of sumoylated transcription factors. Molecular Cell, 2006. 24(3): p. 341-354. 157. Chang, C.C., et al., Structural and Functional Roles of Daxx SIM Phosphorylation in SUMO Para log-Selective Binding and Apoptosis Modulation. Molecular Cell, 2011. 42(1): p. 62-74. 158. Willis, M.S., et al., Investigation of protein refolding using a fractional factorial screen: a study of reagent effects and interactions. 
Protein Sci, 2005. 14(7): p. 1818-26. 159. Moelbert, S., B. Normand, and P. De Los Rios, Kosmotropes and chaotropes: modelling preferential exclusion, binding and aggregate stability. Biophys Chem, 2004. 112(1): p. 45-57. 160. Dorfler, P. and M. Busslinger, C-terminal activating and inhibitory domains determine the transactivation potential of BSAP (Pax-5), Pax-2 and Pax-8. EMBO J, 1996. 15(8): p. 1971-82. 161. Williamson, M.P., The structure and function of proline-rich regions in proteins. Biochem J, 1994. 297 ( Pt 2): p. 249-60. 162. Zhou, Y., C.K. Hall, and M. Karplus, The calorimetric criterion for a two-state process revisited. Protein Sci, 1999. 8(5): p. 1064-74. 163. Mayne, L. and S.W. Englander, Two-state vs. multistate protein unfolding studied by optical melting and hydrogen exchange. Protein Sci, 2000. 9(10): p. 1873-7. 174  164. Makhatadze, G.I., G.M. Clore, and A.M. Gronenborn, Solvent isotope effect and protein stability. Nat Struct Biol, 1995. 2(10): p. 852-5. 165. Huyghues-Despointes, B.M., J.M. Scholtz, and C.N. Pace, Protein conformational stabilities can be determined from hydrogen exchange rates. Nat Struct Biol, 1999. 6(10): p. 910-2. 166. Kuhlman, B. and D.P. Raleigh, Global analysis of the thermal and chemical denaturation of the N-terminal domain of the ribosomal protein L9 in H2O and D2O. Determination of the thermodynamic parameters, deltaH(o), deltaS(o), and deltaC(o)p and evaluation of solvent isotope effects. Protein Sci, 1998. 7(11): p. 2405-12. 167. Efimova, Y.M., et al., Stability of globular proteins in H2O and D2O. Biopolymers, 2007. 85(3): p. 264-73. 168. Marcovitz, A. and Y. Levy, Frustration in protein-DNA binding influences conformational switching and target search kinetics. Proceedings of the National Academy of Sciences of the United States of America, 2011. 108(44): p. 17957-17962. 169. Gasteiger, E., et al., ExPASy: the proteomics server for in-depth protein knowledge and analysis. Nucleic Acids Research, 2003. 31(13): p. 
3784-3788. 170. Alber, T., Protein-DNA Interactions - How Gcn4 Binds DNA. Current Biology, 1993. 3(3): p. 182-184. 171. Andrabi, M., K. Mizuguchi, and S. Ahmad, Conformational changes in DNA-binding proteins: Relationships with precomplex features and contributions to specificity and stability. Proteins-Structure Function and Bioinformatics, 2014. 82(5): p. 841-857. 172. Petty, T.J., et al., An induced fit mechanism regulates p53 DNA binding kinetics to confer sequence specificity. Embo Journal, 2011. 30(11): p. 2167-2176. 173. Tell, G., et al., Redox potential controls the structure and DNA binding activity of the paired domain. J Biol Chem, 1998. 273(39): p. 25062-72. 174. Von Hippel, P.H. and O.G. Berg, Facilitated Target Location in Biological-Systems. Journal of Biological Chemistry, 1989. 264(2): p. 675-678. 175. Vuzman, D., A. Azia, and Y. Levy, Searching DNA via a "Monkey Bar" Mechanism: The Significance of Disordered Tails. Journal of Molecular Biology, 2010. 396(3): p. 674-684. 176. Zandarashvili, L., et al., Asymmetrical roles of zinc fingers in dynamic DNA-scanning process by the inducible transcription factor Egr-1. Protein Science, 2012. 21: p. 94-95. 175  177. Zwollo, P., et al., The Pax-5 gene is alternatively spliced during B-cell development. Journal of Biological Chemistry, 1997. 272(15): p. 10160-10168. 178. Lowen, M., G. Scott, and P. Zwollo, Functional analyses of two alternative isoforms of the transcription factor Pax-5. Journal of Biological Chemistry, 2001. 276(45): p. 42565-42574. 179. Gardner, K.H. and L.E. Kay, The use of 2H, 13C, 15N multidimensional NMR to study the structure and dynamics of proteins. Annu Rev Biophys Biomol Struct, 1998. 27: p. 357-406. 180. Tataurov, A.V., Y. You, and R. Owczarzy, Predicting ultraviolet spectrum of single stranded and double stranded deoxyribonucleic acids. Biophysical Chemistry, 2008. 133(1-3): p. 66-70. 181. 
Delaglio, F., et al., NMRPipe: a multidimensional spectral processing system based on UNIX pipes. J Biomol NMR, 1995. 6(3): p. 277-93. 182. Goddard, T.D. and D.G. Kneeler, Sparky 3rd Edition. 1999. 183. Sattler, M., J. Schleucher, and C. Griesinger, Heteronuclear multidimensional NMR experiments for the structure determination of proteins in solution employing pulsed field gradients. Progress in Nuclear Magnetic Resonance Spectroscopy, 1999. 34(2): p. 93-158. 184. Salzmann, M., et al., TROSY in triple-resonance experiments: New perspectives for sequential NMR assignment of large proteins. Proceedings of the National Academy of Sciences of the United States of America, 1998. 95(23): p. 13585-13590. 185. Yang, D. and L.E. Kay, Improved 1HN-detected triple resonance TROSY-based experiments. J Biomol NMR, 1999. 13(1): p. 3-10. 186. Farrow, N.A., et al., Backbone Dynamics of a Free and a Phosphopeptide-Complexed Src Homology-2 Domain Studied by N-15 Nmr Relaxation. Biochemistry, 1994. 33(19): p. 5984-6003. 187. Hwang, T.L., et al., Application of phase-modulated CLEAN chemical EXchange spectroscopy (CLEANEX-PM) to detect water-protein proton exchange and intermolecular NOEs. Journal of the American Chemical Society, 1997. 119(26): p. 6203-6204. 188. Olsson, M.H.M., et al., PROPKA3: Consistent Treatment of Internal and Surface Residues in Empirical pK(a) Predictions. Journal of Chemical Theory and Computation, 2011. 7(2): p. 525-537. 176  189. Glykos, N.M., Software news and updates. Carma: a molecular dynamics analysis program. J Comput Chem, 2006. 27(14): p. 1765-8. 190. Dunker, A.K., et al., Intrinsic protein disorder in complete genomes. Genome Inform Ser Workshop Genome Inform, 2000. 11: p. 161-71. 191. Uversky, V.N., A decade and a half of protein intrinsic disorder: biology still waits for physics. Protein Sci, 2013. 22(6): p. 693-724. 192. Habchi, J., et al., Introducing protein intrinsic disorder. Chem Rev, 2014. 114(13): p. 6561-88. 193. 
van der Lee, R., et al., Classification of intrinsically disordered regions and proteins. Chem Rev, 2014. 114(13): p. 6589-631. 194. Mao, A.H., et al., Net charge per residue modulates conformational ensembles of intrinsically disordered proteins. Proc Natl Acad Sci U S A, 2010. 107(18): p. 8183-8. 195. Das, R.K. and R.V. Pappu, Conformations of intrinsically disordered proteins are influenced by linear sequence distributions of oppositely charged residues. Proc Natl Acad Sci U S A, 2013. 110(33): p. 13392-7. 196. Gsponer, J. and M.M. Babu, The rules of disorder or why disorder rules. Prog Biophys Mol Biol, 2009. 99(2-3): p. 94-103. 197. Cortese, M.S., V.N. Uversky, and A.K. Dunker, Intrinsic disorder in scaffold proteins: Getting more from less. Progress in Biophysics & Molecular Biology, 2008. 98(1): p. 85-106. 198. Dunker, A.K., et al., Flexible nets. The roles of intrinsic disorder in protein interaction networks. FEBS J, 2005. 272(20): p. 5129-48. 199. Tompa, P. and M. Fuxreiter, Fuzzy complexes: polymorphism and structural disorder in protein-protein interactions. Trends Biochem Sci, 2008. 33(1): p. 2-8. 200. Vacic, V., et al., Characterization of molecular recognition features, MoRFs, and their binding partners. Journal of Proteome Research, 2007. 6(6): p. 2351-2366. 201. Dyson, H.J., Expanding the proteome: disordered and alternatively folded proteins. Quarterly Reviews of Biophysics, 2011. 44(4): p. 467-518. 202. Dyson, H.J. and P.E. Wright, Intrinsically unstructured proteins and their functions. Nature Reviews Molecular Cell Biology, 2005. 6(3): p. 197-208. 203. Liu, J.G., et al., Intrinsic disorder in transcription factors. Biochemistry, 2006. 45(22): p. 6873-6888. 177  204. Shammas, S.L., A.J. Travis, and J. Clarke, Remarkably Fast Coupled Folding and Binding of the Intrinsically Disordered Transactivation Domain of cMyb to CBP KIX. Journal of Physical Chemistry B, 2013. 117(42): p. 13346-13356. 205. 
Wells, M., et al., Structure of tumor suppressor p53 and its intrinsically disordered N-terminal transactivation domain. Proceedings of the National Academy of Sciences of the United States of America, 2008. 105(15): p. 5762-5767. 206. DuMond, J., et al., An intrinsically disordered region of the transcription factor, NFAT5, becomes more ordered with an increase in osmolality. Faseb Journal, 2014. 28(1). 207. Laptenko, O., et al., The Tail That Wags the Dog: How the Disordered C-Terminal Domain Controls the Transcriptional Activities of the p53 Tumor-Suppressor Protein. Trends in Biochemical Sciences, 2016. 41(12): p. 1022-1034. 208. Buljan, M., et al., Tissue-specific splicing of disordered segments that embed binding motifs rewires protein interaction networks. Mol Cell, 2012. 46(6): p. 871-83. 209. Romero, P.R., et al., Alternative splicing in concert with protein intrinsic disorder enables increased functional diversity in multicellular organisms. Proceedings of the National Academy of Sciences of the United States of America, 2006. 103(22): p. 8390-8395. 210. Krieger, J.M., et al., Conformational recognition of an intrinsically disordered protein. Biophys J, 2014. 106(8): p. 1771-9. 211. Sormanni, P., et al., Simultaneous quantification of protein order and disorder. Nat Chem Biol, 2017. 13(4): p. 339-342. 212. Hollenhorst, P.C., L.P. McIntosh, and B.J. Graves, Genomic and biochemical insights into the specificity of ETS transcription factors. Annu Rev Biochem, 2011. 80: p. 437-71. 213. Garrett-Sinha, L.A., Review of Ets1 structure, function, and roles in immunity. Cell Mol Life Sci, 2013. 70(18): p. 3375-90. 214. Petersen, J.M., et al., Modulation of transcription factor Ets-1 DNA binding: DNA-induced unfolding of an alpha helix. Science, 1995. 269(5232): p. 1866-9. 215. Lee, G.M., et al., The structural and dynamic basis of Ets-1 DNA binding autoinhibition. J Biol Chem, 2005. 280(8): p. 7088-99. 216. 
Pufall, M.A., et al., Variable control of Ets-1 DNA binding by multiple phosphates in an unstructured region. Science, 2005. 309(5731): p. 142-145.
217. Lee, G.M., et al., The affinity of Ets-1 for DNA is modulated by phosphorylation through transient interactions of an unstructured region. J Mol Biol, 2008. 382(4): p. 1014-30.
218. Liu, H. and T. Grundstrom, Calcium regulation of GM-CSF by calmodulin-dependent kinase II phosphorylation of Ets1. Molecular Biology of the Cell, 2002. 13(12): p. 4497-4507.
219. Cowley, D.O. and B.J. Graves, Phosphorylation represses Ets-1 DNA binding by reinforcing autoinhibition. Genes & Development, 2000. 14(3): p. 366-376.
220. Davis, M.R. and D.A. Dougherty, Cation-pi interactions: computational analyses of the aromatic box motif and the fluorination strategy for experimental evaluation. Phys Chem Chem Phys, 2015. 17(43): p. 29262-70.
221. Monera, O.D., et al., Relationship of Sidechain Hydrophobicity and alpha-Helical Propensity on the Stability of the Single-stranded Amphipathic alpha-Helix. Journal of Peptide Science, 1995. 1(5): p. 319-329.
222. National Center for Biotechnology Information, Fmoc-L-phenylalanine. 2017.
223. National Center for Biotechnology Information, Fmoc-pentafluoro-L-phenylalanine. 2017.
224. Bienkiewicz, E.A. and K.J. Lumb, Random-coil chemical shifts of phosphorylated amino acids. Journal of Biomolecular NMR, 1999. 15(3): p. 203-206.
225. De Simone, A., et al., Accurate Random Coil Chemical Shifts from an Analysis of Loop Regions in Native States of Proteins. Journal of the American Chemical Society, 2009. 131(45): p. 16332-+.
226. Wishart, D.S., B.D. Sykes, and F.M. Richards, The Chemical-Shift Index - a Fast and Simple Method for the Assignment of Protein Secondary Structure through NMR Spectroscopy. Biochemistry, 1992. 31(6): p. 1647-1651.
227. Wishart, D.S. and A.M. Nip, Protein chemical shift analysis: a practical guide. Biochemistry and Cell Biology-Biochimie Et Biologie Cellulaire, 1998. 76(2-3): p. 153-163.
228. Camilloni, C., et al., Determination of Secondary Structure Populations in Disordered States of Proteins Using Nuclear Magnetic Resonance Chemical Shifts. Biochemistry, 2012. 51(11): p. 2224-2231.
229. Wuthrich, K., M. Billeter, and W. Braun, Polypeptide secondary structure determination by nuclear magnetic resonance observation of short proton-proton distances. J Mol Biol, 1984. 180(3): p. 715-40.
230. Scott, E.W., et al., Requirement of transcription factor PU.1 in the development of multiple hematopoietic lineages. Science, 1994. 265(5178): p. 1573-7.
231. Kodandapani, R., et al., A new pattern for helix-turn-helix recognition revealed by the PU.1 ETS-domain-DNA complex. Nature, 1996. 380(6573): p. 456-60.
232. Wang, H., L.P. McIntosh, and B.J. Graves, Inhibitory module of Ets-1 allosterically regulates DNA binding through a dipole-facilitated phosphate contact. J Biol Chem, 2002. 277(3): p. 2225-33.
233. Lu, G., et al., Phosphorylation of ETS1 by Src family kinases prevents its recognition by the COP1 tumor suppressor. Cancer Cell, 2014. 26(2): p. 222-34.
234. Gonzalez Nelson, A.C., et al., Increasing prion propensity by hydrophobic insertion. PLoS One, 2014. 9(2): p. e89286.
235. Shrivastava, T., et al., Structural basis of Ets1 activation by Runx1. Leukemia, 2014. 28(10): p. 2040-2048.
236. Wang, S., et al., Mechanistic heterogeneity in site recognition by the structurally homologous DNA-binding domains of the ETS family transcription factors Ets-1 and PU.1. J Biol Chem, 2014. 289(31): p. 21605-16.
237. Studier, F.W., Protein production by auto-induction in high-density shaking cultures. Protein Expression and Purification, 2005. 41(1): p. 207-234.
238. Lee, W., M. Tonelli, and J.L. Markley, NMRFAM-SPARKY: enhanced software for biomolecular NMR spectroscopy. Bioinformatics, 2015. 31(8): p. 1325-1327.
239. Guntert, P., Automated NMR structure calculation with CYANA. Methods Mol Biol, 2004. 278: p. 353-78.
240. Ulrich, E.L., et al., BioMagResBank. Nucleic Acids Res, 2008. 36(Database issue): p. D402-8.
241. Craft, J.W., Jr. and G.B. Legge, An AMBER/DYANA/MOLMOL phosphorylated amino acid library set and incorporation into NMR structure calculations. J Biomol NMR, 2005. 33(1): p. 15-24.
242. Ryu, H., et al., NMRe: a web server for NMR protein structure refinement with high-quality structure validation scores. Bioinformatics, 2016. 32(4): p. 611-3.
243. Battye, T.G., et al., iMOSFLM: a new graphical interface for diffraction-image processing with MOSFLM. Acta Crystallogr D Biol Crystallogr, 2011. 67(Pt 4): p. 271-81.
244. Winn, M.D., et al., Overview of the CCP4 suite and current developments. Acta Crystallogr D Biol Crystallogr, 2011. 67(Pt 4): p. 235-42.
245. McCoy, A.J., et al., Phaser crystallographic software. J Appl Crystallogr, 2007. 40(Pt 4): p. 658-674.
246. Afonine, P.V., et al., Towards automated crystallographic structure refinement with phenix.refine. Acta Crystallogr D Biol Crystallogr, 2012. 68(Pt 4): p. 352-67.
247. Adams, P.D., et al., PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr D Biol Crystallogr, 2010. 66(Pt 2): p. 213-21.
248. Emsley, P., et al., Features and development of Coot. Acta Crystallogr D Biol Crystallogr, 2010. 66(Pt 4): p. 486-501.
249. Huret, J.L., et al., Atlas of Genetics and Cytogenetics in Oncology and Haematology in 2013. Nucleic Acids Research, 2013. 41(D1): p. D920-D924.
250. Grimley, E., et al., Inhibition of Pax2 Transcription Activation with a Small Molecule that Targets the DNA Binding Domain. ACS Chem Biol, 2017. 12(3): p. 724-734.
251. MacDonald, D. and P. Lu, Residual dipolar couplings in nucleic acid structure determination. Curr Opin Struct Biol, 2002. 12(3): p. 337-43.
252. Lewis, R.S., Calcium signaling mechanisms in T lymphocytes. Annu Rev Immunol, 2001. 19: p. 497-521.
253. Joseph, N., B. Reicher, and M. Barda-Saad, The calcium feedback loop and T cell activation: how cytoskeleton networks control intracellular calcium flux. Biochim Biophys Acta, 2014. 1838(2): p. 557-68.
254. Liu, J.O., Calmodulin-dependent phosphatase, kinases, and transcriptional corepressors involved in T-cell activation. Immunol Rev, 2009. 228(1): p. 184-98.
255. Rabault, B. and J. Ghysdael, Calcium-induced phosphorylation of ETS1 inhibits its specific DNA binding activity. J Biol Chem, 1994. 269(45): p. 28143-51.
256. Moisan, J., et al., Ets-1 is a negative regulator of Th17 differentiation. J Exp Med, 2007. 204(12): p. 2825-35.
257. Bhat, N.K., et al., Reciprocal expression of human ETS1 and ETS2 genes during T-cell activation: regulatory role for the protooncogene ETS1. Proc Natl Acad Sci U S A, 1990. 87(10): p. 3723-7.
258. Sharma, R., et al., Fuzzy complexes: Specific binding without complete folding. FEBS Lett, 2015. 589(19 Pt A): p. 2533-42.
259. Mao, H., et al., Sortase-mediated protein ligation: a new method for protein engineering. J Am Chem Soc, 2004. 126(9): p. 2670-1.
260. Wang, H.H., et al., Proximity-Based Sortase-Mediated Ligation. Angew Chem Int Ed Engl, 2017. 56(19): p. 5349-5352.

Appendices

Appendix A: Assigned 15N-HSQC spectra of 15N-labeled Pax5 fragments.

A.1 Pax5(1-92). Spectrum collected at pH 6.5 and 25 °C in NMR sample buffer (see Methods). Regions within dashed boxes are expanded for clarity.

A.2 Pax5(76-149). Spectrum collected at pH 6.5 and 25 °C in NMR sample buffer.

A.3 Pax5(1-149). The spectrum was collected at pH 6.5 and 25 °C in NMR sample buffer. Regions within dashed boxes are expanded for clarity.

A.4 Pax5(1-149)/DNA. A TROSY-based HSQC pulse sequence was used for the DNA complex, which contained a high-affinity DNA sequence of 25 bp (CD19-2_Ains, as described in Methods). The spectrum was collected at pH 6.5 and 25 °C in NMR sample buffer using deuterated Pax5.

Appendix B: The subdomains remain folded under conditions of high ionic strength.

Appendix C: Crystal morphologies of the Ets1(301-440)/5fPhe2P* complex.