UBC Theses and Dissertations

Regulation of Sxy in Haemophilus influenzae and other Pasteurellaceae Volar, Milica 2006

Full Text

REGULATION OF SXY IN HAEMOPHILUS INFLUENZAE AND OTHER PASTEURELLACEAE

by

MILICA VOLAR

B.Sc., The University of Belgrade, 2004

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES (Zoology)

THE UNIVERSITY OF BRITISH COLUMBIA

December 2006

© Milica Volar, 2006

Abstract

Natural competence is the ability of bacteria to take up exogenous DNA from their surroundings. Haemophilus influenzae, a member of the Pasteurellaceae family, tightly regulates competence development, inducing uptake only when cells are starved for both sugars and nucleotides. Two proteins are involved in activation of competence gene transcription: Sxy and CRP. While CRP is a global regulator of genes involved in sugar metabolism, Sxy regulates a small regulon composed primarily of competence genes. Competence levels are intimately connected with the abundance of Sxy protein in the cell; elucidating the regulation of the sxy gene is therefore imperative for understanding how and why H. influenzae becomes naturally competent. Previous studies found that strains carrying point mutations predicted to weaken sxy RNA secondary structure overproduce Sxy and are hypercompetent, while strains with mutations predicted to strengthen this structure do not produce Sxy and are non-competent. This suggested that the secondary structure of sxy mRNA limits the gene's expression in the wild type under non-inducing conditions. I used bioinformatics and biochemical techniques to investigate whether sxy is regulated via its RNA secondary structure. Nuclease mapping confirmed that sxy RNA folds into a stable secondary structure and that the point mutations in hyper- and non-competent sxy mutants weaken or strengthen this structure, respectively.
I also examined whether the regulation of sxy is conserved in Pasteurellaceae by comparing the predicted secondary structures of RNA in sxy homologues. The results suggested that the RNA secondary structure is important in regulation of sxy only in H. influenzae strains and is not conserved in other Pasteurellaceae. However, all of the examined species had long intergenic regions upstream of sxy homologues, suggesting that additional regulatory element(s) are present. Based on the results in the thesis, I propose two models of sxy regulation: in one, sxy mRNA secondary structure inhibits translation initiation, and in the other this structure allows binding of ligand(s) and acts as a riboswitch.

TABLE OF CONTENTS

Abstract
Table of contents
List of tables
List of figures
List of abbreviations
Acknowledgments
CHAPTER 1: INTRODUCTION
1.1 Distribution of natural competence
1.2 Induction of competence development in H. influenzae
1.3 Competence genes and regulation of competence in H. influenzae
1.4 sxy as a competence gene and hypercompetence mutation isolation
1.5 Regulation of sxy
1.6 RNA secondary structure could be regulating sxy expression
1.7 Specific objectives of the thesis
CHAPTER 2: MATERIALS AND METHODS
2.1 General methods
2.1.1 Strains, plasmids and culture methods
2.1.2 Template preparation
2.2 RNA methods
2.2.1 RNA preparation
2.2.2 RNA secondary structure mapping
2.3 Bioinformatics analysis
2.3.1 DNA sequence analyses
2.3.2 sxy RNA secondary structure predictions
2.3.3 Covariance analysis of sxy RNA in Pasteurellaceae
2.3.4 Analysis of intergenic regions upstream of sxy in Pasteurellaceae
CHAPTER 3: RESULTS
3.1 Phylogenetic comparative analysis in Pasteurellaceae
3.1.1 Regions upstream of sxy gene in Pasteurellaceae
3.1.2 Phylogenetic comparative analysis of sxy and its homologues
3.1.3 Covariance analysis in H. influenzae strains
3.2 RNA secondary structure mapping
3.2.1 sxy RNA folds into a stem loop structure
3.2.2 Enzymatic mapping of sxy RNA supports computationally predicted structure
3.2.3 Secondary structure of sxy-1 mRNA
3.2.4 Secondary structure of sxy-7 mRNA
CHAPTER 4: DISCUSSION
4.1 Co-variance analysis
4.1.1 Pasteurellaceae could carry a regulatory element in the region upstream of sxy homologues
4.1.2 Regulation of sxy is probably not conserved in Pasteurellaceae
4.1.3 RNA secondary structure could be important in regulation of sxy in H. influenzae only
4.2 RNA secondary structure mapping
4.2.1 5' RNA secondary structure may limit the expression of sxy
4.2.2 Is the described sxy RNA secondary structure present in in vivo conditions?
4.3 Possible mechanisms of sxy regulation
4.3.1 sxy expression could be regulated via translation initiation by sxy RNA secondary structure
4.3.2 sxy could be regulated by a riboswitch
4.4 Conclusion
References
Appendix

LIST OF TABLES

Table 2.1 Bacterial strains used in this study
Table 2.2 Plasmids used in this study
Table 3.1 sxy homologues in Pasteurellaceae
Table 3.2 Specific cleavage sites for RNases used for in vitro mapping of sxy, sxy-1 and sxy-7 RNA
Table A.1 Accession numbers of annotated genomes used in the study

LIST OF FIGURES

Figure 1.1 Model of competence development in H. influenzae
Figure 1.2 Transformation frequencies of wild type, sxy-1 and sxy-7 mutants
Figure 1.3 Region of H. influenzae KW20 chromosome with sxy and its upstream region
Figure 2.1 Construction of pGEMsxy-7
Figure 3.1 Comparison of intergenic regions upstream of sxy in Pasteurellaceae
Figure 3.2 ClustalW alignment of sxy and its homologues in Pasteurellaceae
Figure 3.3 RNA secondary structure predictions of sxy and its homologues in Pasteurellaceae
Figure 3.4 Phylogenetic comparative analysis of five fully sequenced H. influenzae
Figure 3.5 Sample analysis
Figure 3.6 Nuclease mapping of sxy mRNA
Figure 3.7 Secondary structure of sxy mRNA in vitro
Figure 3.8 Summary of base-pairing in sxy RNA secondary structure
Figure 3.9 Nuclease mapping of wild type sxy, sxy-1 and sxy-7 RNAs
Figure 3.10 Secondary structures of sxy, sxy-1 and sxy-7 RNA in vitro
Figure 3.11 Summary of base-pairing in the sxy, sxy-1 and sxy-7 RNA secondary structures
Figure 4.1 Possible mechanisms of regulation of sxy expression
Figure A.1 Representative gel pictures
Figure A.2 Cleavage intensities of residues in sxy, sxy-1 and sxy-7 RNA

LIST OF ABBREVIATIONS

Amp      ampicillin
ATP      adenosine 5'-triphosphate
BHI      brain heart infusion (rich culture medium)
bp       base pair
cAMP     3',5' cyclic adenosine monophosphate
CRP      cAMP receptor protein
DNA      deoxyribonucleic acid
DNase    deoxyribonuclease
E value  expected value (BLAST)
EtOH     ethanol
MIV      "M four", H. influenzae defined starvation medium
mRNA     messenger RNA
NAD      nicotinamide adenine dinucleotide
NaOAc    sodium acetate
NCBI     The National Center for Biotechnology Information
NTPs     ribonucleoside 5'-triphosphates
ORF      open reading frame
PCR      polymerase chain reaction
RBS      ribosome binding site
RNA      ribonucleic acid
RNase    ribonuclease
sBHI     BHI supplemented with hemin and NAD
TIGR     The Institute for Genomic Research
UTP      uridine 5'-triphosphate

Acknowledgments

I would like to thank my supervisor, Rosemary Redfield, for guiding and supporting me both morally and financially to think about beautiful processes that occur in the world of bacteria. I would also like to thank my supervisory committee, George A. Mackie and Naomi Fast, for expert guidance and for very open and useful conversations and suggestions. Very special thanks to my lab mates, past and present: Andrew Cameron, Lindsay Wilson, Heather Maughan and Minghiu Yang, with whom I shared many coffees in an attempt to understand sex in bacteria. Many others also helped me with my project throughout my program, to whom I am very grateful: Arina Omer, Maria Zago, Janet Hankins and Brett McLeod, for teaching me the RNA techniques; Bryony Williams together with the rest of the Keeling lab, for teaching me molecular biology techniques. I am deeply grateful to my families, both the one in Belgrade and the one in Waterloo, for sharing all of the good and bad moments that I went through in these two years. To my Bojan, who learned, with me, about sxy homologues, for his endless love and support. To my grandfather, Vojkan Pejovic, who taught me, among so many other things, to think critically, ask questions and nourish my imagination. I owe special thanks to Bryony Williams and Andrea Basler for a number of deep and interesting conversations throughout grad school and for being honest friends.
Finally, I would like to thank Dragana Cvetkovic and Oliver Stojkovic, who introduced me to the world of evolution and molecular biology and who encouraged me to continue thinking about these spectacular issues.

Chapter 1: Introduction

Bacteria have various ways to cope with changes in the natural environment. One such change is nutrient depletion, which induces natural competence development in a number of bacterial species, making them able to bind and take up exogenous DNA from their environment. This energetically expensive process is regulated in most bacteria (for a review see (14,32)). Therefore, learning more about the signals that induce competence is important for understanding not only the mechanism of natural competence development, but also its function and evolution. In Haemophilus influenzae, competence development is regulated by two proteins: Sxy and CRP. The focus of this thesis will be on the post-transcriptional regulation of sxy expression in H. influenzae. Additionally, comparative phylogenetic analysis was employed to investigate whether the proposed regulatory mechanism is conserved in other species within the Pasteurellaceae family.

1.1 Distribution of natural competence

Natural competence is the ability of bacteria to take up exogenous DNA from their environment (14,32). DNA uptake is advantageous because it provides a cell with nucleotides, which are energetically expensive to synthesize de novo. It can also alter the cell's genotype if the DNA escapes degradation and recombines with the bacterial chromosome in a process called natural transformation.

Natural competence is broadly distributed across more than 40 Gram-positive and Gram-negative bacterial species, but the distribution is sporadic, making its origin and evolution difficult to understand. Except for Neisseria gonorrhoeae, where natural competence is constitutive, the induction of competence in transformable bacteria is physiologically regulated (4).
Mechanisms of natural competence are best understood in Haemophilus influenzae, Neisseria gonorrhoeae (both Gram-negative), Streptococcus pneumoniae and Bacillus subtilis (Gram-positive). H. influenzae is a member of the Pasteurellaceae family within the γ-proteobacteria subdivision. This clinically important species is a commensal in the human respiratory tract that causes a wide range of diseases, including meningitis and eye, ear and lower respiratory tract infections. H. influenzae becomes transformable under very specific conditions (15,29); hence, defining the signals that induce competence is crucial in understanding how and why this mechanism evolved.

1.2 Induction of competence development in H. influenzae

In Haemophilus influenzae competence is tightly regulated. Since it requires synthesis of a range of new proteins, cells can afford to become competent only under certain environmental conditions where DNA uptake will be useful. In a laboratory environment, genes involved in competence development in H. influenzae are usually induced by transferring exponentially growing cells from a nutritionally rich medium (usually sBHI) to a starvation medium (MIV) (for methods see (27)). This nutritional restriction results in an increase in transformation frequency from ~10^-8 in non-induced cells to ~10^-3 in fully induced cells. An increase in transformation frequency can also be observed in cells entering the stationary phase when grown only in sBHI. This is explained by nutritional depletion of the medium and, eventually, starvation of the cells. In this case the transformation frequency is always much less than that seen after induction in MIV (usually ~10^-2).

With these observations in mind, it is important to determine which signals cells sense when transferred to starvation medium, and by what specific mechanism genes are induced under competence-inducing conditions. In H. influenzae, two intracellular signals are required for competence development: a nucleotide signal and an energy signal (described in detail in the following section). During an energy shortage, the production of cAMP in the cell increases, resulting in transcription of genes involved in sugar metabolism (38). The second regulatory signal, essential for maximal competence induction, is a limiting concentration of nucleotides in the cell. Macfadyen (16) showed that supplementing MIV with nucleotides prevents transcription of competence genes, indicating that purine ribonucleotide depletion induces competence development in H. influenzae.

1.3 Competence genes and regulation of competence in H. influenzae

Under competence-inducing conditions about 25 genes are significantly up-regulated in H. influenzae. These genes are organized into 13 transcriptional units whose transcription is activated by two proteins, CRP and Sxy (29).

Although essential for induction of competence genes, CRP (cAMP receptor protein) is not a competence-specific regulator. CRP is well described in E. coli, where it acts as a global regulator of the genes involved in the cell's response to sugar depletion. In H. influenzae, when the cAMP concentration rises in the cell, CRP is activated by binding cAMP. The resulting conformational change in CRP allows it to bind at CRP sites in the promoter regions of competence and sugar metabolism genes, inducing their transcription (3,13).

sxy, on the other hand, is a competence-specific gene: mutants carrying sxy null mutations are non-transformable but fully viable, indicating that its only role is in natural competence. Conversely, constitutive expression of Sxy in H. influenzae results in elevated transformation levels under non-inducing conditions (33). sxy is located between the recA and rrnA genes in the chromosome and encodes a small basic protein.
Expressed early in competence development, Sxy induces transcription of competence genes involved in DNA binding, uptake and processing (29). Although it is speculated that Sxy is an activator of transcription, the specific mechanism by which it acts is still unknown. Comparative analyses have shown that sxy is present in three γ-proteobacterial families: Pasteurellaceae, Vibrionaceae and Enterobacteriaceae (2), and that it plays essential roles in competence in some of the species examined (23).

Figure 1.1 illustrates the steps in natural competence development in H. influenzae. Production of Sxy protein is one of the first steps in competence induction (29) (see Figure 1.1, Step 1). In parallel, due to the lack of preferred sugars, cAMP concentrations rise in the cell. CRP binds cAMP (Fig. 1.1, Step 2) and induces the transcription of competence genes by binding to CRP sites in the promoter regions (10). Recently, two types of CRP sites with distinct consensus sequences have been described: CRP-N and CRP-S (29). Transcription of the genes that carry CRP-N sites in their promoter regions is activated by binding CRP (shown in red, Step 3), while competence genes that have CRP-S sites are transcribed only in the presence of both CRP and Sxy (shown in blue, Step 4) (3). Finally, the expression of competence genes in H. influenzae allows cells to bind, take up and process exogenous DNA (Step 5).

1.4 sxy as a competence gene and hypercompetence mutation isolation

sxy was first linked to natural competence in 1991. After performing EMS mutagenesis of H. influenzae, Redfield (1991) isolated first one (28) and later four additional mutant strains (unpublished) that were shown to be competent early in exponential growth, the phase when cells are usually not competent. All five point mutations were mapped to the region upstream of the recA gene.
Since these strains were "always in the mood" for taking up exogenous DNA, the gene was named sxy and (after additional mutants were identified) the mutations were numbered from 1 to 5 (sxy-1 to sxy-5). The phenotypes of all five mutants are the same, but quite distinct from the wild type. When grown in sBHI medium the rate of spontaneous transformation is 100-1000 times higher in the mutants throughout growth (see Fig. 1.2). Due to the elevated transformation frequency the strains are referred to as hypercompetent mutants. Surprisingly, normal induction of competence (by transferring cells in exponential phase to starvation medium (MIV)) produces the same transformation frequencies in wild type and hypercompetent mutants (Fig. 1.2). The distinct response of these strains to the different induction conditions raises questions about how the cells sense the signals from the environment, and how sxy expression responds to those signals.

Figure 1.1. Model of competence development in H. influenzae. For specific details refer to the text.

Figure 1.2. Transformation frequencies of wild type (KW20), sxy-1 and sxy-7 mutants. The cells were grown in sBHI and transferred to MIV to induce competence. The transformation frequencies were measured for the wild type (purple), sxy-1 mutant (pink) and sxy-7 mutant (blue) (adapted from L.A. Bannister's PhD thesis (2)).

1.5 Regulation of sxy

Mapping of the point mutations in sxy hypercompetent mutants revealed that they are located in two distinct groups (Figure 1.3 A & B). Both the sxy-1 and sxy-2 mutations map to the beginning of the coding region: the sxy-1 mutation results in an amino acid change (Val19 -> Ile) while sxy-2 is a silent mutation (Gln17). In contrast, the sxy-3, sxy-4 and sxy-5 mutations were mapped to adjacent positions 38, 37 and 36 bp upstream of the sxy start codon, respectively (Figure 1.3 C).
Zulty and Barcak (40) mapped the transcriptional start of sxy RNA and characterized this gene's promoter region. Using primer extension analysis they showed that sxy mRNA has a 51-nucleotide 5' untranslated region. Thus all the sxy hypercompetence mutations alter the sxy mRNA, but only sxy-1 changes the protein sequence. To ensure that the hypercompetence phenotype is a result of substitutions in only the sxy gene, the same mutations were produced by directed mutagenesis and transferred into the non-mutagenized background of H. influenzae KW20, where they were found to give the same phenotype (2).

1.6 RNA secondary structure could be regulating sxy expression

The position of the mutations, most of which lie outside the coding sequence, implied that the untranslated region of sxy RNA has a regulatory role and suggested that sxy could be regulated by the secondary structure of its mRNA. The untranslated region was predicted (by eye) to base pair with the beginning of the coding region, forming a 16 bp stem (see Figure 1.3 B and C).

Allele | Mutation | Position | Amino acid change
sxy-1 | G -> A | +55 | Val19 -> Ile
sxy-2 | G -> A | +51 | Silent (Gln17)
sxy-3 | C -> T | -38 | 5' untranslated
sxy-4 | T -> C | -37 | 5' untranslated
sxy-5 | G -> T | -36 | 5' untranslated
sxy-6 | CG -> TA | -38, +55 | Val19 -> Ile
sxy-7 | CT -> GA | -32, -31 | 5' untranslated

Figure 1.3. A. Region of the H. influenzae KW20 chromosome with sxy and its upstream region. B. Enlarged region with the sxy untranslated region showing the position of point mutations and transcription initiation sites. Mutations causing hypercompetence are shown in red and mutations resulting in a non-competent phenotype are shown in blue. C. Predicted secondary structure of the 5' end of sxy RNA including the positions of the point mutations, ribosomal binding site and the translational start (adapted from (2)).
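The numbering used for the alleles in Figure 1.3 can be checked with a few lines of Python. The sketch below is illustrative only and is not part of the original analysis. It assumes that +1 is the first base of the start codon, that -1 is the base immediately upstream (no position 0), and that the start codon counts as codon 1; the function name `describe_position` is hypothetical. Under these assumptions +55 falls on the first base of codon 19 and +51 on the third base of codon 17, which agrees with the Val19 and Gln17 assignments, while positions -38 to -31 lie inside the 51-nt untranslated region.

```python
UTR_LENGTH = 51  # 5' UTR length reported by Zulty and Barcak (40)


def describe_position(pos: int) -> str:
    """Describe a position numbered relative to the sxy translational start.

    Assumed convention (not stated explicitly in the text): +1 is the first
    base of the start codon, -1 is the base just upstream, and there is no 0.
    """
    if pos == 0:
        raise ValueError("position 0 does not exist in this numbering")
    if pos < 0:
        if -pos > UTR_LENGTH:
            return "upstream of the transcription start"
        return "5' untranslated region"
    codon = (pos - 1) // 3 + 1          # start codon is codon 1
    base_in_codon = (pos - 1) % 3 + 1   # 1, 2 or 3
    return f"codon {codon}, base {base_in_codon}"


if __name__ == "__main__":
    for allele, positions in {
        "sxy-1": [55], "sxy-2": [51], "sxy-3": [-38],
        "sxy-6": [-38, 55], "sxy-7": [-32, -31],
    }.items():
        print(allele, [describe_position(p) for p in positions])
```

A third-base change, as in sxy-2, is the position most likely to be silent, which fits the table.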
Additionally, computer analysis using software that predicts RNA folding from sequence (Mfold (20)) gave very similar results (2). Since Sxy is an essential activator of competence genes and each of the hypercompetence substitutions was found to disrupt the base-pairing in the predicted stem region, it was hypothesized that destabilization of the stem causes overexpression of Sxy, producing the hypercompetent phenotype in the mutant strains. Similarly, in the wild type, it was hypothesized that sxy expression is limited during early log phase by the secondary structure of sxy mRNA (2).

To test whether the stem in the secondary structure limits sxy expression, Laura Bannister created two additional mutant strains, sxy-6 and sxy-7 (Fig. 1.3 B & C). The sxy-6 mutant was designed to restore pairing in the stem region by introducing compensatory mutations combining the sxy-3 and sxy-1 genotypes. The sxy-7 strain carries two mutations designed to form two additional base pairs in the stem region, replacing the 2 bp bubble (Fig. 1.3 C). As reported in L. Bannister's thesis (2), analysis of the sxy-6 and sxy-7 phenotypes revealed that they are non-transformable (Figure 1.2). Additionally, in competence-inducing conditions both of these strains have almost no sxy transcript (measured by Northern analysis) or protein (measured by β-galactosidase expression from a lacZ fusion) (2). This analysis provided direct genetic evidence that the stem in the mRNA secondary structure is biologically important in the regulation of sxy.

Using RNA secondary structure mapping techniques, Bannister analyzed the folding of sxy and sxy-1 RNA in vitro. The goal was to test whether these RNAs form the predicted stem structure when folded, and to determine whether the known mutations affect the folding.
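The compensatory logic behind the sxy-6 allele can be made concrete with a small Python sketch. It rests on an assumption that the text implies but does not state outright: that position -38 pairs with position +55 in the stem, which is why combining the sxy-3 (C -> T at -38) and sxy-1 (G -> A at +55) substitutions is expected to restore pairing. The function names and the allele dictionary are hypothetical, and the classification covers only this single pair, not the stability of the whole stem.

```python
WATSON_CRICK = {frozenset("AU"), frozenset("GC")}
WOBBLE = {frozenset("GU")}


def to_rna(base: str) -> str:
    """Convert a DNA base letter to its RNA equivalent (T becomes U)."""
    return base.upper().replace("T", "U")


def pair_type(base_a: str, base_b: str) -> str:
    """Classify two opposing bases as Watson-Crick, wobble or mismatch."""
    pair = frozenset((to_rna(base_a), to_rna(base_b)))
    if pair in WATSON_CRICK:
        return "Watson-Crick"
    if pair in WOBBLE:
        return "wobble"
    return "mismatch"


# Bases at (-38, +55), taken from the allele table in Figure 1.3.
# Assumption: these two positions face each other in the stem.
ALLELES = {
    "wild type": ("C", "G"),
    "sxy-3": ("T", "G"),
    "sxy-1": ("C", "A"),
    "sxy-6": ("T", "A"),
}

if __name__ == "__main__":
    for name, (upstream, downstream) in ALLELES.items():
        print(f"{name}: {pair_type(upstream, downstream)}")
```

On this reading each single mutation replaces a G-C pair with a weaker wobble pair or a mismatch, and the double mutant returns to a Watson-Crick pair (A-U), which is what "compensatory" means here.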
RNA secondary structure mapping with nucleases (reviewed in (9)) allows identification of RNase-sensitive sites in folded RNA, using enzymes with preferences for specific residue types and pairing states. Bannister used RNases CL3, T1 and T2, and these enzymes revealed unpaired As, Gs, Cs and Us in sxy and sxy-1 RNA. The results showed that in both RNAs the secondary structure is similar to the one predicted by Mfold and that the predicted stem region forms in vitro (2). Although genetic and protein expression data strongly suggest that the sxy-1 mutation acts by destabilizing the secondary structure, the mapping analysis did not provide any evidence that the sxy-1 secondary structure differs from that of the wild type. Further investigation of this issue via nuclease mapping is required.

1.7 Specific objectives of the thesis

The focus of this thesis is the role of RNA secondary structure in the regulation of the H. influenzae sxy gene. Using phylogenetic comparative analysis, a common regulatory element was sought in the RNAs of H. influenzae sxy and its homologues within the Pasteurellaceae family. Additionally, the secondary structure of sxy RNA was examined by enzymatic mapping using RNases that differ in specificity from the ones previously used by Bannister. Two single-strand-specific RNases (RNase T1 and RNase A) and an RNase specific for double-stranded regions were used in the mapping. To test whether the mutations affect RNA folding, the secondary structures of a hypercompetent sxy-1 mutant and a non-competent sxy-7 mutant were determined.

Finally, whether the regulation of sxy homologues in Pasteurellaceae is conserved and how the gene might be regulated is discussed.

Chapter 2: Materials and methods

2.1 General methods

2.1.1 Strains, plasmids and culture methods

The bacterial strains and plasmids used in this study are listed in Tables 2.1 and 2.2. All H. influenzae strains are derivatives of the original Rd strain (1).
Methods for culturing E. coli and H. influenzae cells are described elsewhere (27). Ampicillin concentrations used for selection of resistant E. coli and H. influenzae strains were 100 µg/ml and 5.0 µg/ml, respectively.

2.1.2 Template preparation

Plasmids pGEMsxy, pGEMsxy-1 and pGEMsxy-7 were used as templates for in vitro preparation of RNAs. Plasmids pGEMsxy and pGEMsxy-1 were constructed by L.A. Bannister (2). Both were constructed by amplifying the full untranslated and coding sxy sequence (position -51 to +678 relative to the translational start) from genomic DNA of H. influenzae KW20 or the sxy-1 mutant strain (RR699). Amplicons were digested and cloned into the ApaI and EcoRI restriction sites of pGEM7 (Sigma), adjacent to the T7 promoter (see Fig. 2.1 C).

I constructed pGEMsxy-7 by first amplifying the sxy sequence from the sxy-7 mutant strain RR854 (see Fig. 2.1), with ApaI and EcoRI sites introduced on the primers. The insert sequence started at the transcriptional start site (-51) and ended at position +272 in the coding region. The digested and purified sxy-7 insert was cloned into the ApaI and EcoRI sites of pGEM7. The insert sequences were confirmed by sequence analysis using Bannister's Primer 10 (2). Genomic DNAs were isolated by the phenol-chloroform method (31). Plasmid DNAs were isolated using a Qiagen plasmid mini kit.

Table 2.1. Bacterial strains used in this study

Strain | Genotype | Source or reference
H. influenzae Rd
KW20 | Wild type | (37)
RR854 | sxy-7 | (2)
E. coli
DH5α | F- φ80lacZΔM15 Δ(lacZYA-argF)U169 deoR recA1 endA1 hsdR17(rk-, mk+) phoA supE44 thi-1 gyrA96 relA1 | (11)
JM109 | F- φ80lacZΔM15 Δ(lacZYA-argF)U169 deoR recA1 endA1 hsdR17(rk-, mk+) phoA supE44 thi-1 gyrA96 relA1 pGEM7-Zf- (AmpR) | Promega
RR2077 | F- φ80lacZΔM15 Δ(lacZYA-argF)U169 deoR recA1 endA1 hsdR17(rk-, mk+) phoA supE44 thi-1 gyrA96 relA1 pGEMsxy-7 (AmpR) | This study

Table 2.2. Plasmids used in this study

Plasmid | Insert | Source or reference
pGEM7-Zf- | pBR322 derivative (AmpR) vector | Promega
pGEMsxy | sxy sequence, insert including full untranslated region cloned into ApaI and EcoRI sites of pGEM7 (AmpR) | (2)
pGEMsxy-1 | sxy-1 sequence, insert including full untranslated region cloned into ApaI and EcoRI sites of pGEM7 (AmpR) | (2)
pGEMsxy-7 | truncated sxy-7 sequence (to position +272), insert including full untranslated region cloned into ApaI and EcoRI sites of pGEM7 (AmpR) | This study

Primers shown in Figure 2.1 A:
1) pAltersxyF (Primer 13 from Bannister, 1999 (2)), carrying an ApaI site: 5' AAAGGGCCCCAGAAGTACTTCTACTGACTC 3'
2) pAltersxyR: 5' CATGAATTCGTAAAATCTGATCAGAAAGTGC 3'

Figure 2.1. Construction of pGEMsxy-7. A. Location and sequence of primers pALTERsxyF and pALTERsxyR used for sxy-7 amplification. B. Cloning of sxy-7. The full UTR sequence (51 bp) together with the first 272 bp of the coding sequence of sxy-7 was amplified by PCR using chromosomal DNA as template. The product was digested with ApaI and EcoRI (sites in the primers), purified and cloned into digested pGEM to be expressed under the T7 promoter. C. Position of the T7 promoter in pGEMsxy plasmids and the sxy transcription start (arrow). Nucleotides that are underlined represent the left end of the sxy insert.

2.2 RNA methods

2.2.1 RNA preparation

Wild type sxy, sxy-1 and sxy-7 RNAs were prepared by in vitro transcription (MEGAscript T7 kit, Ambion) from plasmids linearized at position +272, resulting in 326 nt long run-off transcripts.
Both pGEMsxy and pGEMsxy-1 were linearized with SnaBI (they have the full coding sequence cloned into the plasmid) while pGEMsxy-7 was linearized with EcoRI (since it carries only a partial sxy-7 coding sequence). The resulting RNAs were purified from the transcription mix, first by DNase treatment using a DNA-free kit (Ambion) and then by a spin column (RNeasy kit, Qiagen). At this point each RNA sample was quantified by spectrophotometry, and quality was assessed both by agarose gel and by A260/280 ratio. Next, the RNAs (~20 pmol) were dephosphorylated in 100 µl reactions at 37°C for 2 hours using 0.5 U of calf intestinal alkaline phosphatase (Roche). RNAs were extracted with an equal volume of phenol-chloroform and precipitated overnight at -20°C with 0.3 M NaOAc and 2.5 volumes of 95% EtOH. After 30 minutes of centrifugation at 3,780 x g at 4°C the pellets were rinsed with 200 µl of 80% EtOH, centrifuged for 30 minutes (at 3,780 x g at 4°C), bench dried for 10 minutes and dissolved in 20 µl of nuclease-free water. Dephosphorylated RNAs (~10 pmol) were 5' end-labeled in 50 µl reactions using 20 U of T4 polynucleotide kinase (New England Biolabs) and at least 20 pmol of [γ-32P]ATP (6000 Ci/mmol, 250 µCi, GE Amersham). Labeling reactions were allowed to proceed for 1 hour at 37°C and then the enzyme was inactivated by a 10 minute incubation at 70°C. Unincorporated nucleotides were removed with a spin column (RNeasy kit, Qiagen), and RNA was eluted in 30 µl of nuclease-free water.

2.2.2 RNA secondary structure mapping

Prior to partial nuclease digestion, the RNAs were denatured for 5 minutes at 95°C and then allowed to refold for 15 minutes at 37°C. End-labeled RNAs (~2 pmol for each reaction) were then digested with RNases A, V1 and T1 (Ambion). Both partially digested RNAs and control RNAs (ladders) were prepared following the manufacturer's directions.
The RNase concentration in digestions was considered to be optimal if the enzyme cut, on average, less then once per R N A molecule in the reaction. This was empirically determined by choosing conditions that left a substantial amount of undigested R N A in the well after the digestion. Prior to enzyme addition control samples were incubated for 5 minutes at 65°C in I X Sequencing buffer which allowed denaturation of R N A s . The fol lowing enzyme concentrations were used for both < partially digested and control sample preparations: for RNase A : 0.005U/pl, for TI: 0.05U/pl and for V I : O.OOlU/pl. After addition of enzyme, samples were incubated for 15 minutes at room temperature before 20pl of Inactivation/Precipitation buffer (Ambion) was added. A n overnight precipitation at-20°C was followed by centrifugation at maximum speed (3,780 x g) at room temperature. The pellets were rinsed with 20pl of 80% E t O H , centrifuged at 18 maximum speed at room temperature, air-dried and resuspended in 7 pi of R N A loading buffer (Ambion). R N A fragments were resolved on 5%, 8% or 10% denaturing (8M urea) polyacrylamide gels at 900V and 12mA. Gel dimensions were: 0.2mm thick and 400mm long. Alkal ine digested end-labeled R N A was used as a ladder to help in assigning the bands in the gels to a specific residue in the R N A sequence. Gels were visualized by Phospholmager and the intensities of the bands were quantified using ImageQuant 5.2. The assigned intensities for the particular residues in sxy, sxy-l and sxy-1 R N A s were normalized by the intensities of residue +27 (for residues in lanes loaded with R N A partially digested with RNase A ) and +29 (for residues in lanes loaded with R N A partially digested with RNase Tl ) . Three or more replicates were used in comparisons of the same residue in R N A s of sxy, sxy-l and sxy-1. 2.3 Bioinformatics analysis 2.3.1 DNA sequence analyses The nucleotide sequences of sxy and its annotated homologues (H. 
ducreyi 35000HP, Pasteurella multocida PM70 and Mannheimia succiniciproducens MBEL55E) were obtained from The Institute for Genomic Research (TIGR, http://www.tigr.org). Sequences of the unfinished Actinobacillus succinogenes 130Z, A. pleuropneumoniae, A. actinomycetemcomitans HK1651 and H. influenzae 86-028NP genomes were obtained from the Laboratory for Genomics & Bioinformatics, University of Oklahoma (http://microgen.ouhsc.edu/project home.htm) and the unfinished sequence of H. somnus 129PT was obtained from http://www.jgi.doe.gov. The sxy sequence of H. influenzae strain 10810 was retrieved from the Sanger Institute (http://www.sanger.ac.uk/) while the sequences of the other two completed H. influenzae strains (R2846 and R2866) were obtained from the University of Washington Genome Centre (http://www.genome.washington.edu/uwgc/). In non-annotated genomes, sxy homologues were detected by searching the nucleotide sequences with the Haemophilus influenzae Rd KW20 Sxy amino acid sequence, using the BLAST search available on the NCBI web server. The amino acid and nucleotide sequence alignments of sxy and its homologues were done in ClustalX or ClustalW (35) using the default settings. The alignments were further refined by eye using MacClade (v 4.0 PPC).

2.3.2 sxy RNA secondary structure predictions

All RNA secondary structure predictions were done using Mfold (version 3.2) developed by Zuker et al (39)(20) (http://www.bioinfo.rpi.edu/applications/mfold/rna/). The sequences of sxy and its homologues used for this analysis contained the full UTR region (51 nt) and 60 nt of the coding region. All Mfold parameters were kept at default values.

2.3.3 Covariance analysis of sxy RNA in Pasteurellaceae

Phylogenetic comparison of sxy and its homologues was performed by submitting the RNA sequences to Mfold (39)(20) and visually inspecting the resulting folded sequences for similarities in their secondary structures. sxy in H.
influenzae strain KW20 served as a reference for choosing the beginning and end of the RNAs to be folded in sxy homologues where information on the length (or existence) of the untranslated region was not available. For all of the examined RNAs the same settings and conditions were used as for the sxy secondary structure predictions, with position +1 representing the translational start annotated by the sequencing centre.

2.3.4 Analysis of intergenic regions upstream of sxy in Pasteurellaceae

The sequences of the intergenic regions upstream of sxy in annotated Pasteurellacean genomes were retrieved from TIGR. In non-annotated species, open reading frames (ORFs) and non-coding (intergenic) regions were identified using the program Sequence Analysis (http://informagen.com/SA/). The parameters were left at default values except for the minimum ORF length, which was set to 50 amino acids. The amino acid sequences of the ORFs found upstream of sxy homologues were searched against the OMNIOME genome sequences using BLAST on the TIGR web server to determine gene identities and functions. Putative roles were determined for all ORFs except the one in A. actinomycetemcomitans, where the available sequence was truncated.

Chapter 3: Results

3.1 Phylogenetic comparative analysis in Pasteurellaceae

3.1.1 Regions upstream of sxy gene in Pasteurellaceae

BLAST searches by L. Bannister identified sxy homologues in several Pasteurellaceae, Enterobacteriaceae and Vibrionaceae species. More Pasteurellacean genomes have been sequenced since, and all of these species were found to have sxy homologues (3). To test whether this gene shares a common regulatory mechanism in Pasteurellaceae, I compared the predicted mRNA secondary structures of sxy homologues. Table 3.1 lists the sequenced Pasteurellaceae species and the sxy homologues that were included in this study.
To determine whether other Pasteurellaceae have the potential to carry a regulatory element in their RNAs (like the one present in H. influenzae KW20), I first examined the intergenic regions upstream of sxy homologues in eight species from this family. Figure 3.1.A shows a comparison of the genome regions spanning sxy homologues and the closest upstream open reading frame. The lengths of the intergenic regions upstream of sxy in Pasteurellaceae range from 319 bp in H. influenzae to 1491 bp in P. multocida, with the others all ranging from 407 bp to 607 bp. The upstream genes are diverse: they are involved in different cellular processes such as recombination (recA in H. influenzae) and translation (fusA in A. succinogenes), or they code for ribosomal proteins or RNAs (rpsJ in H. ducreyi and A. pleuropneumoniae, and 16S rRNA in P. multocida). The genes found upstream of sxy homologues are transcribed divergently from sxy, except in H. somnus where the genes are parallel. The diversity in the upstream genes suggests that major rearrangements have occurred in this chromosomal region.

Species (abbreviation)                    Gene            E value    Protein length (aa)
H. influenzae (H. influen)                HI0601          -          217
H. ducreyi (H. ducreyi)                   HD1985          1.2e-14    211
H. somnus (H. somnus)                     Hsom2370        2e-35      216
A. pleuropneumoniae (A. pleuro)           Aple2132        5e-15      218
A. actinomycetemcomitans (A. actino)      Not annotated   7e-34      211
A. succinogenes (A. succino)              Not annotated   6e-37      219
P. multocida (P. multo)                   PM1558          2.5e-36    213
M. succiniciproducens (M. succini)        MS2301          9.3e-30    217

Table 3.1. sxy homologues in Pasteurellaceae. This table shows the species included in the study, the abbreviated species names that will be used throughout this thesis, each gene's primary annotation, BLAST E values against H. influenzae sxy and the Sxy protein length.

The lengths of intergenic sequences upstream of sxy were compared to the average length of the intergenic regions in each Pasteurellaceae.
The average intergenic lengths were determined by dividing the number of nucleotides in the total intergenic sequence by the number of coding genes in the genome (TIGR). It should be noted that this calculation gives only a rough approximation of the average intergenic length, since it does not consider the operon organization of the genes or the presence of RNA genes in the intergenic regions. The comparison showed that the average intergenic region in each Pasteurellaceae species is shorter than 150 nt, making the sequences upstream of sxy homologues unusually long and suggesting that a conserved regulatory element could be present in these regions.

Figure 3.1. Comparison of intergenic regions upstream of sxy in Pasteurellaceae. View of the regions upstream of sxy and its Pasteurellacean homologues. Numbers represent lengths of the intergenic and coding regions in bp. A. Intergenic regions upstream of sxy homologues in Pasteurellaceae. B. Region upstream of sxy in H. influenzae strains. The analysis was incomplete for H. influenzae R2846 and A. actinomycetemcomitans due to the end of contig (black bars).

3.1.2 Phylogenetic comparative analysis of sxy and its homologues

Phylogenetic comparative (covariance) analysis is a method for inferring RNA secondary structure by comparing the sequences of a set of functionally identical RNAs from related species. The analysis is based on the premise that RNAs with identical function will fold into a similar secondary structure despite differences in their primary sequences. In other words, if a substitution arises in a functionally important region of an RNA, a second mutation that restores base-pairing would be positively selected to preserve the secondary structure.
These pairs of substitutions in the RNA sequences of related species are referred to as covarying residues. The first step in covariance analysis is to find a conserved motif in the aligned sequences that can later be used as a reference for choosing the start of the RNA sequence to be computationally folded. RNA secondary structure predictions for phylogenetic analysis can be made using two distinct classes of software. First, common RNA secondary structures can be obtained with software that makes a prediction based on the alignment, inferring the possible pairing and taking into account the co-varying bases. This analysis, however, requires conserved motifs throughout the alignment (5). Alternatively, the predictions can be made for each sequence separately by minimising the free energy (ΔG) of the folded sequence, using programs such as Mfold or RNAdraw. These RNA secondary structure predictions (outputs) can then be compared for similarities. The ΔG is an approximation of the stability of the predicted RNA secondary structure: structures predicted to be more stable have more negative values.

This particular analysis was based on the hypothesis that if the stem-loop structure present in H. influenzae sxy RNA is important in this gene's regulation, it will be conserved in different Pasteurellaceae species. Since stem region(s) with multiple co-varying base-pairs are likely to exist in a native RNA structure, the RNAs of sxy homologues should have a stem region that is very similar to the one present in H. influenzae, if their regulation is similar. More specifically, I would expect the stem region to be of a similar length and with few co-varying bases, if the hypothesis is correct.
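The idea of covarying residues can be made concrete with a short sketch. The Python below is an illustration only and is not the procedure used in this thesis, where the predicted structures were compared by eye. Given two aligned RNA sequences and the paired columns of a reference structure, it labels each pair as conserved, covarying (both bases changed, pairing kept), a single change that keeps pairing (for example G-C to G-U), or disrupted. The function name, the toy sequences and the pair list are all hypothetical.

```python
# Watson-Crick pairs plus the G-U wobble pair, which RNA helices tolerate.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def classify_pairs(ref, other, paired_columns):
    """Label each base pair of the reference by what happened in `other`.

    ref, other      -- aligned RNA sequences of equal length (no gaps assumed)
    paired_columns  -- (i, j) 0-based column indices paired in the reference
    """
    if len(ref) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    labels = {}
    for i, j in paired_columns:
        changed = (ref[i] != other[i]) + (ref[j] != other[j])
        can_pair = (other[i], other[j]) in PAIRS
        if changed == 0:
            labels[(i, j)] = "conserved"
        elif not can_pair:
            labels[(i, j)] = "disrupted"
        elif changed == 2:
            labels[(i, j)] = "covarying"
        else:
            labels[(i, j)] = "single change, pairing kept"
    return labels


# Toy hairpin (hypothetical data): columns 0-3 pair with columns 10-7.
reference = "GCAUAAAAUGC"
homologue = "GGAUAAAAUCC"  # the C-G pair at columns (1, 9) has become G-C
stem = [(0, 10), (1, 9), (2, 8), (3, 7)]
labels = classify_pairs(reference, homologue, stem)
```

In this toy case only the pair at columns (1, 9) is labelled as covarying; a stem supported by several such pairs across species would count as evidence for a conserved structure.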
In order to examine the level of conservation and to determine whether there is a site (better than the translational start) that could be used as a reference for the downstream mRNA secondary structure predictions, the sxy amino acid and nucleotide sequences from Pasteurellacean genomes were aligned. The amino acid alignment showed that the Sxy sequence is not well conserved, especially at the beginning of the coding region (see Figure 3.2). However, several conserved motif blocks were present in the alignment, suggesting that the function of the gene is conserved throughout Pasteurellaceae. The nucleotide sequences upstream of sxy homologues could not be aligned, and the sequences were too divergent to identify a single conserved motif that could later be used as a reference in the comparative secondary structure analysis. Due to the absence of conserved motifs in the nucleotide alignment of sxy and its homologues, I decided to use software that predicts individual secondary structures for the covariance analysis. The reference for choosing the length of the sequences was the putative sxy mRNA secondary structure in H. influenzae, where the genetic evidence strongly suggests that the -50 to +60 region is biologically relevant to this gene's regulation.

Figure 3.2. ClustalW alignment of sxy and its homologues in Pasteurellaceae. The colors show conserved amino acid groups in the sequence alignment, with small hydrophobic residues shown in red, acidic residues in blue, basic in magenta and hydroxyl, amine and basic in green. Dots represent gaps in the sequence alignment.

All of the RNA sequences were 110 nucleotides long and the translation start was used as an anchor point. The RNA sequences were submitted to Mfold, software that predicts the RNA secondary structures with the lowest free energy from the nucleotide sequence. This software usually provides more than one output, each with a different predicted thermodynamic stability. For each of the RNAs, Mfold produced alternative structures with different ΔGs, except for H. influenzae where there was only one.
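Anchoring every sequence on the translation start amounts to cutting the same -50 to +60 window out of each genome. A minimal sketch of that step is given below, assuming the coding-strand DNA is available as a string and the index of the start codon is known; the function names and the toy sequence are hypothetical, and the thesis does not state how the windows were actually extracted.

```python
def rna_window(dna, start_index, upstream=50, downstream=60):
    """Return the RNA spanning -upstream..+downstream around a start codon.

    dna          -- coding-strand DNA sequence, 5' to 3'
    start_index  -- 0-based index of the first base of the start codon (+1)
    There is no position 0: -1 is the base immediately 5' of +1.
    """
    begin = start_index - upstream
    end = start_index + downstream
    if begin < 0 or end > len(dna):
        raise ValueError("window runs off the end of the sequence")
    return dna[begin:end].upper().replace("T", "U")


def position_label(offset, upstream=50):
    """Convert a 0-based offset within the window to a -50..+60 position."""
    position = offset - upstream
    return position if position < 0 else position + 1


# Toy input (hypothetical): 50 filler bases, a start codon, 57 filler bases.
toy_dna = "C" * 50 + "ATG" + "A" * 57
window = rna_window(toy_dna, start_index=50)
```

With the default arguments the window is 110 nucleotides long, matching the length submitted to Mfold, and `position_label` converts gel or string offsets back to the numbering used in the text.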
The predicted secondary structures were compared visually, looking for the presence of a stem region similar to the one in H. influenzae in length and in position relative to the translation start. The strongest secondary structure was predicted for H. influenzae (ΔG = -23.06 kcal/mol). Most of the other structures were predicted to be much weaker, with two of them having ΔGs close to -15 kcal/mol (M. succini and A. succino) and the rest having ΔGs weaker (less negative) than -10 kcal/mol (see Figure 3.3). The first round of analysis included only the thermodynamically strongest predicted structures; however, because no structural similarities were seen, the analysis was expanded to include the rest of the outputs. Additionally, the analysis included structures produced by changing the length of the submitted RNA sequence, sequentially adding nucleotides and looking for any alternative similar structures that might arise. None of these analyses yielded an obvious common secondary structure. Although stem regions were present in several predicted structures, the lack of co-varying sites suggested that the RNA secondary structure, and thus its function, is not conserved in the sxy homologues.

Since Mfold predicted at least one secondary structure for all of the RNAs, I tested the folding of 110 nucleotides of both a random RNA sequence and a sequence from the recA coding region. Secondary structure is not expected to be important in either sequence, so these RNAs should be unstructured. However, Mfold predicted folding in both sequences, with ΔGs even stronger than for sxy RNA, suggesting that this algorithm may not be the best choice for the phylogenetic comparative analysis. To test whether the predicted foldings are the same using slightly different algorithms, I compared the outputs for the RNAs of sxy homologues produced by RNAdraw (http://www.rnadraw.com/), the Vienna server (http://www.tbi.univie.ac.at/~ivo/RNA/) and Kinefold (http://kinefold.curie.fr/).
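The thesis does not say how the random 110-nucleotide control was generated. One possible way to build such negative controls is sketched below in Python: a uniformly random RNA of a given length, and a shuffled copy of a query that keeps its base composition. The function names and the seed are hypothetical; the query string is the -50 to -1 region shown in the Figure 3.4 alignment, written as RNA. The sequences produced would then be submitted to the folding program in the same way as the real ones.

```python
import random
from collections import Counter


def random_rna(length, rng, weights=None):
    """Draw a random RNA; `weights` optionally sets A, C, G, U frequencies."""
    return "".join(rng.choices("ACGU", weights=weights, k=length))


def shuffled_control(seq, rng):
    """Shuffle a sequence, preserving its length and base composition."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


rng = random.Random(0)  # fixed seed (arbitrary) so the controls are reproducible
control = random_rna(110, rng)

query = "GAAGUACUUCUACUGACUCUUUUAAAAUAAUUAUUCAUUGGAGGUUUAAU"
shuffled = shuffled_control(query, rng)
same_composition = Counter(shuffled) == Counter(query)
```

A shuffled control is often preferred over a uniform random one because predicted folding energies depend on base composition, so comparing like with like makes an unexpectedly strong ΔG easier to interpret.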
Except for Kinefold, all of the other programs predicted the same structures as Mfold did. Kinefold secondary structure predictions, although slightly different, did not reveal any additional structural similarities in the examined RNAs (data not shown).

Figure 3.3. RNA secondary structure predictions of sxy and its homologues in Pasteurellaceae. Sequences of sxy and its homologues from eight fully sequenced Pasteurellaceae were folded in Mfold. All sequences extended from -50 to +60 where +1 is the translational start (red bar). Only the structures with the lowest ΔG predictions are shown. Panel labels legible in the figure include A. succino (ΔG = -12.2), P. multo (ΔG = -8.6), A. actino (ΔG = -6.3) and A. pleuro (ΔG = -8.41).

3.1.3 Covariance analysis in H. influenzae strains

In the absence of any evidence for structural conservation of sxy homologues in other Pasteurellaceae species, the sxy amino acid and nucleotide sequences of the five available H. influenzae strains were aligned (H. influenzae KW20, R2846, R2866, 86-028NP and 10810). The sxy coding and upstream intergenic sequences are nearly identical, with only six substitutions in the coding region (two of which are silent) and none in the first 50 nucleotides of the upstream sequence (see Figure 3.4 A). The most divergent sequence is that of R2846. The length of the intergenic region upstream of sxy is 319 bp in strains KW20 and 10810, and about 370 bp in strains R2866, R2846 and 86-028NP. As shown in Figure 3.4 B, all the strains except R2846 had an identical -50 to +60 sequence and thus were predicted to fold identically. Although mutations are present only in the R2846 sxy sequence, they do not necessarily alter sxy RNA folding in this strain: both C39 -> U and A50 -> G conserve base-pairing, while the U27 -> G change is in a loop. However, the secondary structure produced by Mfold showed stem Ia+Ib to be shorter by one base pair.
Additionally, the substitutions are predicted to change the secondary structure of sxy RNA in this strain by increasing the number of base-pairs and producing an additional stem region. The ΔG of the R2846 secondary structure was predicted to be stronger than in the rest of the examined strains. The very high sequence similarity and the restriction to only five H. influenzae strains make this analysis uninformative as a comparative phylogenetic analysis. Perhaps an increasing number of sequenced genomes and more reliable RNA secondary structure prediction algorithms may soon allow a more detailed phylogenetic analysis of sxy in Pasteurellaceae.

Figure 3.4. Phylogenetic comparative analysis of five fully sequenced H. influenzae strains. A. Nucleotide alignment of sxy sequences (the untranslated and partial coding region) from different H. influenzae strains, showing the position of substitutions in strain R2846 (in grey). The amino acid sequence of the KW20, R2866, 10810 and 86-028NP strains is shown below the alignment and the two changes in the R2846 strain are indicated below. The untranslated region (-50 to -1) is identical in all five strains: GAAGTACTTCTACTGACTCTTTTAAAATAATTATTCATTGGAGGTTTAAT. B. sxy RNA secondary structure predictions in H. influenzae strains. Sequences of sxy in five H. influenzae strains were folded in Mfold. The sequences extended from -50 to +60 where +1 is the translational start (red bars). The nucleotide substitutions in the R2846 strain are shown in grey. Panel labels: Hi KW20, Hi 86-028NP, Hi 10810 and Hi R2866 (ΔG = -23.06); Hi R2846 (ΔG = -25.5).

3.2 RNA secondary structure mapping

3.2.1 sxy RNA folds into a stem loop structure

Bannister used the sxy-6 and sxy-1 mutants to show that restoring base-pairing or creating additional base-pairing in the stem region of sxy RNA drastically reduces transformation frequency, suggesting that sxy is likely regulated by its RNA secondary structure. To test whether and how sxy mRNA naturally folds, I used in vitro nuclease mapping of RNA secondary structure. The method is based on RNase cleavage of specific residues in folded RNA molecules. Different RNases have different specificities, with some cutting only double-stranded regions and others cutting 3' of specific single-stranded residues (see Table 3.2). Using several different RNases in the analysis gives insight into the RNA secondary structure under in vitro conditions. For RNases that are single-strand specific, partial digestion of folded RNAs identifies the single-stranded nucleotides in the secondary structure, while digestion of denatured RNAs (control or ladder samples) provides information on all the nucleotides in the sequence that are recognized and cut by the RNase.
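The relationship between a ladder lane and a structure lane can be sketched in a few lines of Python. The example below is illustrative only: it lists the positions a single-strand-specific RNase would be expected to cut in fully denatured RNA, using the -50 to -1 region reported in Figure 3.4 A (written as RNA) and the specificities described in this section. The function names and the set of "paired" positions are hypothetical and are not taken from the mapping data.

```python
# Upstream region (-50 to -1) as RNA, from the Figure 3.4 A alignment.
UTR = "GAAGUACUUCUACUGACUCUUUUAAAAUAAUUAUUCAUUGGAGGUUUAAU"

# Bases 3' of which each single-strand-specific RNase cuts.
SPECIFICITY = {"T1": "G", "A": "CU"}


def ladder_positions(rna, enzyme, first_position=-50):
    """Positions cut in denatured RNA, i.e. the expected ladder bands.

    Only valid for windows that do not cross the translation start,
    because the numbering has no position 0.
    """
    targets = SPECIFICITY[enzyme]
    return [first_position + i for i, base in enumerate(rna) if base in targets]


def unpaired_sites(rna, enzyme, paired, first_position=-50):
    """Ladder positions that a proposed structure leaves single-stranded."""
    return [p for p in ladder_positions(rna, enzyme, first_position)
            if p not in paired]


t1_ladder = ladder_positions(UTR, "T1")
# -> [-50, -47, -36, -11, -10, -8, -7]

# Hypothetical structure in which the three 5'-most Gs are base-paired.
expected_t1_cuts = unpaired_sites(UTR, "T1", paired={-50, -47, -36})
```

Bands present in the ladder list but weak or absent in the structure lane are the candidates for base-paired or otherwise protected residues.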
For example, RNase T1 in the control sample would cut at all of the Gs in the sequence (since all of them are single-stranded in denatured RNA), while in a folded RNA this enzyme would cut only at Gs that are single-stranded. It should be noted that in vitro mapping analysis provides information on the average secondary structure of a thermally dynamic RNA population (rather than a set of RNA molecules in identical conformations). Interactions between residues involved in secondary structure formation are dynamic, resulting in different RNA conformations. The mapping data for a given residue therefore represent the probability that the residue is base-paired under the specific conditions and over the time of the digestion.

RNase    Specific cleavage site
A        3' of ss Cs and Us
V1       base-paired nucleotides
T1       3' of ss Gs

Table 3.2. Specific cleavage sites for RNases used for in vitro mapping of sxy, sxy-1 and sxy-7 RNA.

RNA for these analyses is usually prepared by 5' end labeling with 32P, but other methods have also been employed (9). This is followed by partial digestion of the folded molecules, and the resulting fragments are resolved on denaturing sequencing gels. The percentage of polyacrylamide used in the gels depends on the size of the RNA fragments to be resolved (each gel type usually gives resolution of about 100 nucleotides). An alkaline ladder, commonly used in nuclease mapping, is prepared by exposing end-labeled RNA to alkaline conditions for a limited time, which results in nonselective partial cleavage of all phosphodiester bonds and produces a band (in a gel) for every residue. This ladder is used to facilitate the assignment of the bands in the gels to the residues in the RNA sequence.

The decision on the residue status (base-paired or not) is made by comparing the band intensities in the experimental lane (i.e.
conditions of limited digestion) to those in the ladder lane; this provides information on how strongly each of the sites in the RNA is cleaved (see Figure 3.5). For example, if the examined band is less intense in the experimental lane than in the control, the corresponding residue in the RNA is classified as weakly cleaved and is more likely to be base-paired. Classification of this type allows a more objective analysis with this relatively subjective method. Additionally, for sxy, sxy-1 and sxy-7 RNA the bands were quantified with ImageQuant and the intensities for particular residues were plotted to aid in making comparisons (Fig. A.2 in the Appendix).

The RNases used in the mapping of sxy RNA secondary structure are shown in Table 3.2. The analysis provided information on single-stranded (ss) Gs, ssCs, ssUs and base-paired nucleotides in the sequence. Other potentially informative enzymes are no longer commercially available (9). The gels were prepared to resolve the first 110 nucleotides at the 5' end of the sxy transcript. The analysis I summarize below is based on more than 10 digestions and over 30 gels (for each RNA species); only two gel pictures will be presented (additional representative gel pictures are presented in the Appendix).

RNase T1 digestion. RNase T1 specifically cuts at single-stranded Gs; thus, the stronger the cleavage in the folded RNA, the higher the proportion of that G residue that is unpaired in the population of RNA molecules. Lane 3 in Fig 3.6 shows the cleavage pattern for folded sxy RNA digested with RNase T1. Lane 2 in the same figure is the T1 ladder, where unfolded RNA was digested with the same amount of enzyme.

Figure 3.5. Sample analysis. Segment of a gel (T1 ladder and T1 partial digest lanes, with strongly, moderately and weakly cleaved bands marked) illustrating different cleavage intensities in the "structure lane".
The decision on the nucleotide status (the right side of the figure) is made by comparing the band intensities in the ladder and structure lanes.

In the denatured RNA (lane 2, Fig 3.6) RNase T1 cut at all of the Gs present in the sxy sequence, giving bands of very similar intensities. In folded RNA (relative to the control RNA) very strong RNase T1 cleavage was seen at G-10, G-8, G-7, G+25 and G+29, and moderate cleavage was observed at G-11 and G+64. Some of the sites (G-36, G+3, G+12, G+13, G+16, G+31, G+35, G+42, G+46, G+51 and G+55) were cleaved more weakly than in the control, suggesting that these positions are either double-stranded or inaccessible in the folded RNA. The bands corresponding to residues +18, +58 and +59 were not present in the structure lane; however, corresponding bands were observed in the T1 ladder lane, implying that these residues are protected from cleavage in the folded RNA (Fig. 3.6).

RNase A digestion. Pancreatic RNase A specifically cuts at unpaired pyrimidines in folded RNA. Unlike RNase T1, which cleaved at all of the Gs in the sxy sequence, RNase A was unable to cut some of the single-stranded Us and Cs in the denatured sxy RNA (see Fig 3.6, ladder lane 4). However, as with RNase T1, the cleavage sites and the relative band intensities were reproducible over different digestions performed under the same conditions. Lane 5 in Figure 3.6 shows the cleavage pattern of sxy RNA partially digested by RNase A, while lane 4 in the same figure is an RNase A ladder. The following nucleotides were cut strongly in folded sxy RNA relative to the ladder: U-28, U-[illegible], U-4, U+23 and U+27. Residues at sites U-[illegible] and U+21 were moderately cut, while some sites were cut very weakly (U-[illegible], U-40, U-19, C-15, U+6, U+8, U+19, U+44, U-23, C+49, U+53 and U+56).

Figure 3.6. Nuclease mapping of sxy mRNA.
5' end-labeled sxy RNA was subjected to nuclease digestion (see section 2.2.2) and the resulting fragments were resolved on an 8% sequencing gel. Lanes 1, 2 and 4 were loaded with ladders: lane 1, alkaline ladder; lane 2, RNase T1 ladder; lane 4, RNase A ladder. Lanes 3, 5 and 6 are "experimental" lanes loaded with sxy RNA partially digested with RNase T1 (lane 3), RNase A (lane 5) and RNase V1 (lane 6). The numbers on the left are base positions in the sxy sequence relative to the translational start; the sequence itself is given on the right.

RNase V1 digestion. RNase V1, unlike RNases T1 and A, is structure specific but not sequence specific, preferentially cleaving at any base-paired or stacked nucleotides. No ladder is shown in Fig 3.6 because this enzyme does not cut denatured RNA. The lack of a ladder makes classification of the sites recognized by this enzyme more subjective than for samples digested with RNases T1 or A. However, the RNase V1 digest results were reproducible. Lane 6 in Figure 3.6 shows an RNase V1 cleavage pattern. Several segments in folded sxy RNA were cut by RNase V1 to various extents: U-43 to A-36, U-33 to A-27, U-19 to C-15, U+2 to U+5, A+20 to U+21 and C+39 to U+41. RNase V1 shows strong specificity for nucleotides at positions -32 and -29. Bands of moderate intensity corresponded to nucleotides at positions -41, -31, -30, -29, -28, -16, -15, +21 and +49, while the rest of the above-mentioned nucleotides were weakly cut (see Fig. 3.6). RNase V1 also cut folded sxy RNA at positions -22, +34 and +49. The first two residues were cut very weakly while the last one was moderately cut.

3.2.2 Enzymatic mapping of sxy RNA supports its computationally predicted structure

Figures 3.7 and 3.8 present summaries of the data produced by nuclease mapping, superimposed on the thermodynamically most favorable structure predicted by Mfold.
Overall, the enzymatic mapping provided evidence that the sxy transcript folds into a structure very similar to the one predicted by Mfold and that stem Ia+Ib, predicted to regulate sxy expression, is present under in vitro conditions.

Figure 3.7. Secondary structure of sxy mRNA in vitro. 5' end-labeled sxy RNA was partially digested with RNase T1, RNase A or RNase V1 in separate reactions. Resulting fragments were resolved on 5%, 8% and 10% sequencing gels. The residues were categorized as strong, moderate or weak cut sites, depending on the corresponding band intensities in the gels. The data were superimposed on the Mfold-predicted secondary structure of sxy RNA. Legend: cleavage by RNase T1 (ssGs)*, RNase A (ssCs and ssUs)* and RNase V1 (ds) is marked as strong, moderate, weak or not cut. *The cleavage intensity is relative to the corresponding ladder lanes.

Figure 3.8. Summary of base-pairing in sxy RNA secondary structure. The cleavage results from different RNase digestions were summarized and superimposed on the Mfold-predicted secondary structure of sxy RNA. Legend: evidence that the residue is strongly base-paired, partly base-paired or single-stranded.

Although the experimental data for most of the residues support the predicted structure, there are several discrepancies between the theoretical and experimental foldings. The RNase V1 cleavage pattern supports the Mfold prediction that U-28 and the flanking residues in stem Ib are base-paired, but moderately strong cleavage at this position by RNase A suggests that this residue is single-stranded.
These contradictory data imply that this position is very dynamic in the secondary structure and that it is probably single-stranded in most of the molecules in the RNA population. Although predicted to be unpaired, both U+8 and G+42 are only weakly cleaved, by RNase A and T1 respectively. This may be because these residues lie in a very short stretch of single-stranded nucleotides between two long stems, which may be physically inaccessible to the RNases. In the predicted secondary structure, stem region I is interrupted by a small bubble with four unpaired residues (C-32, U-31, U+49 and C+50). Although predicted to cut at all four, RNase A cuts only two of these residues (U+49 and C+50), and only very weakly, suggesting that these positions are mostly base-paired. Pairing of these bases is also supported by the RNase V1 cleavage pattern: residues U-31 and U+49 were cut moderately and C-32 was cut strongly by RNase V1.

3.2.3 Secondary structure of sxy-1 mRNA

The sequence of sxy-1 differs from the wild type sequence by a single G to A transition at position +55. This change was hypothesized to increase Sxy expression by disrupting base-pairing in the stem region. The Mfold prediction shows that the substitution in sxy-1 would affect its secondary structure only by opening a small bubble in stem region Ia. This, however, is predicted to strongly decrease the stability of stem Ia+Ib, changing its ΔG from -13.3 (in wild type sxy RNA) to -7.3 kcal/mol. To check whether sxy-1 RNA folds in the same manner as the wild type RNA, I mapped ~110 bases at the 5' end of the transcript. The sxy-1 mutation is predicted to make the residues at positions -38 and +55 single-stranded, and RNase A is predicted to detect this change by cutting at residue C-38 in the folded sxy-1 RNA.

The analysis of sxy-1 RNA revealed a cleavage pattern very similar to that of wild type sxy RNA. Except for three sites, G-36, G+58 and G+59, which were cut very weakly by RNase T1, no wild type bands were missing (Fig. 3.9). However, when the relative band intensities were compared, certain residues appeared more sensitive to digestion in sxy-1 RNA. It should be noted that the partial digestions for both RNA types were performed under the same conditions and at the same time to minimize experimental variation. Below, I discuss all of the sites whose cleavage differs between sxy and sxy-1 RNAs (see Figure 3.9).

Residues at positions +51, +58 and +59 were more susceptible to RNase T1 in sxy-1 RNA than in the wild type, while those at -28, -19, +6, +8, +44, +49, +53 and +56 were more susceptible to RNase A digestion. It should be noted that the residues positioned in the loops (see Fig. 3.9; for example residues G-8, G-7 or G+25) were cleaved with the same intensity in both sxy and sxy-1 RNA. This provides additional confidence that the differences in cleavage intensity at some sites in sxy and sxy-1 RNA reflect actual structural changes rather than differences in conditions.

Nuclease mapping did not provide any data on the additional bubble (residues -32, -31, +48 and +49) predicted to form in folded sxy-1 RNA due to the mutation (see Fig. 3.10). Pairing between the nucleotides at positions -38 and +55 could not be directly detected because of the specificity of the enzymes used in this analysis. Although C-38 was predicted to be unpaired in sxy-1 RNA, this position was never cut (nor was it cut in wild type sxy) in any RNase A digested samples (not even in the denatured RNA). Overall, nuclease mapping of sxy-1 indicated that its RNA folds into a structure very similar to the one present in sxy RNA.
Although the experimental data failed to produce evidence for the predicted additional bubble introduced by the point mutation, they suggest that stem I in the sxy-1 RNA secondary structure is generally less stable than the one in the wild type (see Fig. 3.11).

Figure 3.9. Nuclease mapping of wild type sxy, sxy-1 and sxy-7 RNAs. 5' end-labeled RNAs were subjected to nuclease digestion (see section 2.2.2) and the resulting fragments were resolved on an 8% sequencing gel. Lanes 1, 2 and 6 were loaded with ladders: lane 1, alkaline ladder; lane 2, RNase T1 ladder; lane 6, RNase A ladder. Lanes 3, 4 and 5 were loaded with RNAs partially digested with RNase T1: lane 3, wt sxy RNA; lane 4, sxy-1 RNA; lane 5, sxy-7 RNA. Lanes 7, 8 and 9 were loaded with RNAs partially digested with RNase A: lane 7, wt sxy; lane 8, sxy-1; lane 9, sxy-7. The numbers on the sides are base positions in the sxy sequence relative to the translational start. (Gel image not reproduced.)

Figure 3.10. Secondary structures of sxy, sxy-1 and sxy-7 RNA in vitro. 5' end-labeled RNAs were partially digested with RNase T1 or RNase A in separate reactions. The resulting fragments were resolved on 5%, 8% or 10% sequencing gels. The residues were categorized as strong, moderate or weak cut sites, depending on the corresponding intensities in the gels. The data from seven independent digestions (for each enzyme) were summarized and superimposed on the Mfold-predicted secondary structures of sxy, sxy-1 and sxy-7 RNA. The mutated sites are circled: +55 (G->A) in sxy-1, and -31 (T->A) and -32 (C->G) in sxy-7. Legend: cleavage is scored relative to the ladder lanes as strong, moderate, weak or not cut, for RNase T1 (single-stranded Gs) and RNase A (single-stranded Cs and Us). (Structure diagrams not reproduced.)

Figure 3.11. Summary of base-pairing in the sxy, sxy-1 and sxy-7 RNA secondary structures. The cleavage results from different RNase digestions were summarized and superimposed on the Mfold-predicted secondary structures of the three RNA species. Legend: residues are marked according to the evidence that they are strongly base-paired, partly base-paired or single-stranded. (Structure diagrams not reproduced.)

3.2.4 Secondary structure of sxy-7 mRNA

The sxy-7 sequence carries two substitutions at positions -32 and -31, designed to increase base-pairing in the stem region by eliminating the bubble (2). Mfold predicts that the resulting perfect 18 bp stem will decrease ΔG in this region from -13.3 to -20.6 kcal/mol.
To check whether sxy-7 folds into the predicted structure, I mapped sxy-7 RNA enzymatically under the same conditions used for the wild type and sxy-1 RNAs (see section 2.2.2 for details). Both RNase T1 and RNase A are predicted not to cut the stem region in the folded sxy-7 RNA, and the control (denatured) sxy-7 RNA should have an additional cut site (G-32) when compared to the wild type sxy RNA. Mapping data for sxy-7 RNA showed that its secondary structure is very similar to the one present in the wild type sxy. As predicted, G-32 was cut in the control and not cut in the folded RNA, suggesting that this residue is base-paired (data not shown; see Fig. A1 in the Appendix). C+49, predicted to base-pair with G-32, was only weakly cut, suggesting that pairing does occur under in vitro conditions. Unfortunately, experimental data were not available for either the A-31 or the U+48 site because of the specificities of the RNases used in the analysis. Residue U+48, predicted to base-pair with A-31, was not cut by RNase A, even under denaturing conditions. Except for residues U-28, U+8 and G+42, all of the sxy-7 mapping data support the predicted structure. These three residues are cleaved with the same intensity in the wild type RNA (Fig. 3.11), suggesting that the substitutions did not affect this region of the sxy-7 structure. Finally, the data confirm that under in vitro conditions the stem region Ia+b exists in the sxy-7 RNA and that, except for U-28, all of the nucleotides in this region are paired (see Fig. 3.11). This implies that stem Ia+b in sxy-7 RNA is stronger than stem Ia+Ib in wild type sxy, supporting the hypothesis that RNA secondary structure limits this gene's expression.

Overall, RNase mapping data for sxy, sxy-1 and sxy-7 provided evidence that their RNAs fold, in agreement with the Mfold predictions, into a similar stem-loop structure under in vitro conditions.
Additionally, my data suggest that the point mutations present in sxy-1 and sxy-7 affect the stability of their RNA secondary structures rather than changing the way they fold.

Chapter 4: Discussion

4.1 Co-variance analysis

4.1.1 Pasteurellaceae could carry a regulatory element in the region upstream of sxy homologues

The comparative analysis I have done shows that sxy has homologues in all sequenced Pasteurellacean species. In addition, the phenotype of the strain with compensatory mutations, designed by combining the sxy-1 and sxy-3 hypercompetence mutations in H. influenzae, strongly implies that sxy expression is limited by its RNA secondary structure (2). If the regulation of sxy and its homologues is conserved in Pasteurellaceae, we would expect the upstream intergenic regions to be similar in length to the 5' untranslated region of H. influenzae sxy RNA and to carry a similar regulatory element. Surprisingly, my analysis showed that in every species the intergenic region upstream of the sxy homologue is much longer than the average intergenic region of that species, exceeding 500 nt in most species (the average is about 120 nt in H. influenzae and less than 150 nt in the other annotated Pasteurellaceae). The intergenic regions upstream of sxy homologues are very likely true non-coding regions, since no tRNAs, rRNAs or sRNAs were predicted in these regions and a high number of stop codons is present throughout the sequences.

Long intergenic regions are unusual for bacterial chromosomes, whose compactness is essential for rapid cell division and DNA replication (19, 26). Selective pressure on bacterial chromosomes keeps the genomes compact, with a minimal length and number of non-coding regions (19).
Considering the unusual lengths of the intergenic regions upstream of sxy and its homologues, it is important to ask whether their length and structure indicate a role in regulation and reveal more about the function of these regions.

One explanation for the length of the intergenic regions is that Pasteurellacean sxy genes carry one or more additional regulatory elements upstream of the gene. Interestingly, sxy in H. influenzae and two other Pasteurellaceae is predicted to include CRP sites in its promoter regions ((40), A. Cameron, personal communication). However, none of these sites has a strong CRP consensus binding sequence, and no experimental evidence is yet available. Testing the expression of Sxy in a strain with the putative CRP site deleted (without otherwise affecting the sxy 5' untranslated RNA sequence) would show whether this portion of the intergenic region is important in sxy regulation.

Another explanation is that the intergenic regions upstream of sxy in Pasteurellaceae actually carry non-coding RNAs (ncRNAs) in their sequences. This recently described class of RNA is involved in regulation of gene expression by complementary binding to transcripts (8). ncRNAs are found in all three domains of life, but despite the availability of many sequenced genomes, their number and diversity are still poorly understood. Because both experimental evidence and reliable detection algorithms are lacking, it is believed that most ncRNAs are still unidentified. For example, over 60 ncRNAs have been experimentally detected in the E. coli genome, but comparative analysis predicts more than 200 ncRNAs in this genome, suggesting that most of them are still uncharacterized (21). Although ncRNAs could account, in part, for the length of the intergenic regions upstream of sxy homologues in Pasteurellaceae, there is still neither bioinformatic nor experimental evidence that they exist in the H. influenzae genome.
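The length comparison made in this section (the region upstream of sxy against the typical intergenic length of the same genome) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene names and coordinates are invented toy values, not taken from any annotation; all genes are treated as lying on the forward strand of a single replicon; and the helper name intergenic_lengths is hypothetical.

```python
from statistics import mean


def intergenic_lengths(genes):
    """Map each gene name to the length of the gap upstream of it.

    `genes` is a list of (name, start, end) tuples with 1-based inclusive
    coordinates. Assumption: every gene is on the forward strand, so the
    gap "upstream" of a gene is the gap to the preceding gene.
    """
    ordered = sorted(genes, key=lambda g: g[1])
    gaps = {}
    for upstream, downstream in zip(ordered, ordered[1:]):
        gap = downstream[1] - upstream[2] - 1
        if gap > 0:  # overlapping or abutting genes leave no intergenic region
            gaps[downstream[0]] = gap
    return gaps


# Toy coordinates, invented for illustration only.
TOY_GENES = [
    ("geneA", 1, 900),
    ("geneB", 1021, 2000),
    ("sxy", 2501, 3200),
    ("geneC", 3321, 4000),
]

if __name__ == "__main__":
    gaps = intergenic_lengths(TOY_GENES)
    others = [length for name, length in gaps.items() if name != "sxy"]
    print("upstream of sxy:", gaps["sxy"], "nt")
    print("mean of other intergenic regions:", mean(others), "nt")
```

With real data the coordinates would come from the genome annotation, and strand would have to be taken into account when deciding which gap lies upstream of a gene.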
4.1.2 Regulation of sxy is probably not conserved in Pasteurellaceae

Since sxy has homologues in three gamma-proteobacterial families and is essential for competence development in both H. influenzae and V. cholerae (28), it is parsimonious to expect that this gene is regulated in the same manner in all of the Pasteurellacean species. To address this question, I looked for conserved motifs in the alignments of sxy and its homologues and compared the predicted secondary structures of their RNAs. The phylogenetic comparative analysis revealed no common secondary structure in the RNAs of sxy homologues in Pasteurellaceae; in the following paragraphs I discuss some possible implications.

One explanation is that the RNAs of sxy homologues can assume a functionally similar secondary structure in the presence of an RNA binding protein. Although the RNAs of sxy homologues are predicted to fold in different ways, it is possible that an RNA binding protein binds to their untranslated regions and organizes them into a common secondary structure to modulate sxy expression. Some RNA binding proteins, such as Hfq or CsrA, are known to affect gene expression by changing their abundance in the cell in a growth-dependent manner (30, 34, 36). Although both hfq and csrA are present in the H. influenzae genome, there is no experimental evidence that they play a role in regulation of the sxy gene.

Another explanation is that structures like that of H. influenzae sxy RNA are not present in the RNAs of other Pasteurellaceae simply because the same mode of regulation is not conserved. The regulation of natural competence is very fluid, and different taxa have found different ways to regulate competence (14). It is possible that other Pasteurellaceae species have altered the regulation of sxy expression to adapt to their specific environmental conditions.
If this is true, the lack of common structures in the RNAs of sxy homologues would suggest that the specific stem-loop structure in H. influenzae evolved recently in this species to regulate sxy expression.

Finally, it is possible that the method I used to look for a common structure in Pasteurellaceae was too crude to detect fine structural similarities that might be present in the RNAs of sxy homologues. One limitation of the analysis was its reliance on individual structures produced by Mfold and on visual inspection for similarities. Mfold is commonly used for RNA secondary structure prediction; it is based on free energy minimisation of the RNA fold, so it produces the most stable structure for a given sequence. This is good for general RNA secondary structure prediction; however, it might not be the optimal choice for covariance analysis, because the algorithm does not take into account information from covarying sites. A better choice would be a software package such as RNAlign (5) or RAGA (24), which combine comparative analysis with thermodynamic considerations. However, these require the presence of conserved motifs throughout the input sequences, and such motifs are lacking in the alignment of sxy and its homologues.

Another limitation of the analysis is that, except for H. influenzae, there is no experimental evidence that sxy homologues have significant untranslated regions in their RNA. Phylogenetic comparative analyses are usually done by comparing the secondary structure predictions of aligned RNA sequences from related species in which the length of the untranslated region has been determined experimentally. In this study, however, the comparative analysis was based only on the characteristics of H. influenzae sxy RNA, with the working assumption that the RNAs of sxy homologues would have untranslated regions of similar lengths.
Mapping the transcriptional start positions of sxy homologues in Pasteurellaceae, starting with A. pleuropneumoniae or A. actinomycetemcomitans (both shown to be naturally competent), would provide a better basis for the phylogenetic comparative analysis.

4.1.3 RNA secondary structure could be important in regulation of sxy in H. influenzae only

In the absence of a common RNA secondary structure in Pasteurellaceae, the phylogenetic comparative analyses were repeated using the sxy sequences of five H. influenzae strains. Nucleotide alignment revealed very high sequence conservation, which makes the alignment insufficiently informative for this type of analysis. However, comparison of the sxy RNA secondary structures from different H. influenzae strains with the corresponding transformation frequencies suggests that the regulatory mechanism could be shared among the competent strains. Preliminary data (H. Maughan, personal communication) showed that both H. influenzae R2866 and 86-028NP are competent, which is consistent with the identity of their sxy sequences to the one in KW20. Strain R2844 is not transformable (H. Maughan, personal communication) and has two substitutions that change the amino acid sequence. The nucleotide substitutions present in the R2846 sequence are predicted (by Mfold) to increase base-pairing in the mRNA, which results in a structure with longer stem regions and thus a more stable secondary structure in this strain (see Figure 3.4). Since these long double-stranded regions precede the translational start, it is likely that the secondary structure limits the translation of the sxy mRNA, resulting in a non-competent phenotype.

Alternatively, the nucleotide substitutions present in the R2844 sequence do not necessarily alter the folding of sxy RNA in this strain: two of the substitutions conserve base-pairing and the third is positioned in a loop (Fig. 3.4), allowing the RNA to assume the same secondary structure as in KW20.
In this scenario, the sxy RNA of the non-competent R2844 strain would have the same secondary structure as in the competent H. influenzae strains, suggesting that the substitutions present in R2844 could act by disrupting Sxy protein function. This, however, is very unlikely, since there is only one mutation that causes a significant amino acid change (Gln17->Arg) in the sxy sequence. In addition, this substitution is located at the beginning of the coding sequence, a region without conserved motifs in the Pasteurellaceae sxy alignment (see Fig. 3.2).

Alternatively, the non-competent phenotype of R2846 can be explained by the fact that its sequence is very diverged from that of KW20 and other competent strains (the mean sequence diversity is over 3% (22)). Therefore, the lack of competence in R2844 could also be due to mutations in other competence genes. Comparing the expression profiles of KW20 and R2846 using microarray analysis could reveal which gene or genes are responsible for the non-transformable phenotype.

4.2 RNA secondary structure mapping

4.2.1 5' RNA secondary structure may limit the expression of sxy

The RNase mapping analysis supported the hypothesis that the expression of sxy is limited by its RNA secondary structure. The majority of the experimental data from all three enzymes supports the structure predicted by Mfold, suggesting that the stem region Ia+Ib does form under in vitro conditions.

Although most of the data agree with the predicted structure, discrepancies do exist. Residues C-32, U-31 and C+49 are predicted to form a small bubble in the stem region, yet all are moderately to strongly cut by the double-strand-specific RNase V1. Likewise, residues G+42 and U+8, predicted to be single-stranded, are only weakly cut by the single-strand-specific RNases T1 and A. Contradictory evidence also exists for residues at positions -28 and +21.
Both of these residues are cleaved by RNase V1 (double-strand specific) as well as by RNase A (specific for single-stranded pyrimidines).

Despite these discrepancies, the experimental data do not necessarily contradict the computationally predicted structure. RNase V1 has specificity for double-stranded regions in RNA, but it also cuts single-stranded nucleotides that are in stacked conformations or that interact in tertiary structures (9), which could explain the cleavage at positions -28, -31, -32, +21, +48 and +49. However, in previous results by L. Bannister, residue C+49 was cut in both folded sxy and sxy-1 RNA by RNase CL3 (specific for single-stranded Cs). These contradictory data illustrate one of this method's limitations: modest variation in reaction conditions may alter the digestion pattern. Residues U+8 and G+42 are both positioned between two long double-stranded regions in the predicted structure, and it is likely that steric hindrance (physical blockage) makes these nucleotides inaccessible to RNase A or T1 digestion.

Overall, the sxy-1 RNA mapping data supported the hypothesis that the point mutation in the sxy-1 mutant has a destabilizing effect on the secondary structure and therefore results in sxy overexpression. Surprisingly, the data showed that the effect of the point mutation is very broad and that it weakens stem Ia+Ib in sxy RNA globally. This global destabilization of stem Ia+Ib could facilitate translation of sxy RNA, explaining the increased levels of Sxy protein and the hypercompetent phenotype of the sxy-1 mutant. This point will be discussed in greater detail in section 4.3.1.

Mapping data for sxy-7 RNA also supported the predicted secondary structure. However, except for the evidence of G-32:C+49 base-pairing, it provided no direct additional evidence that this region is more strongly paired than it is in the sxy RNA secondary structure.
Is the effect of this single additional base pair strong enough to fully inhibit sxy expression and therefore competence induction? My analysis indirectly showed that the residues flanking the G-32:C+49 base pair are paired (i.e. resistant to single-strand-specific RNases), suggesting that the mutations may act by stabilizing stem Ia+Ib in general. It is possible that the longer and more stable stem region present in sxy-7 RNA limits translation and therefore prevents competence. In addition, Bannister's expression data showed that both sxy RNA and protein levels are drastically decreased in the sxy-7 mutant (2).

To test further whether sxy expression is limited by its RNA secondary structure, directed mutagenesis of stem Ia+Ib could be performed to determine which residues are the most important for sxy regulation. Also, nuclease mapping of sxy RNA secondary structure should be performed in other hypercompetent and non-competent sxy mutants to confirm that the hypercompetence mutations act by global destabilization of stem Ia+Ib.

4.2.2 Is the described sxy RNA secondary structure present in in vivo conditions?

Although RNase mapping provides insight into the status of each residue in folded RNA, it also has obvious limitations. The results of an in vitro RNase mapping analysis are known to depend on reaction conditions (9). For example, a change in pH, temperature or divalent ion concentration in the reaction can alter the RNA cleavage pattern. Additionally, cleavage at a specific position in an RNA can change its secondary structure and promote new, artifactual cuts in the same molecule. These "secondary" cuts, although weaker than the primary ones, can only be identified by comparing the cleavage patterns of 5' and 3' labeled RNA molecules (9). With these concerns in mind, one can ask whether the sxy RNA structure mapped in vitro by RNases is real, whether it exists in vivo, and to what extent the reaction conditions affect the folding of the RNA.
Several lines of evidence support the conclusion that sxy RNA folds similarly to the predicted structure. First, restoration and addition of base pairs in the predicted stem Ia+Ib of sxy RNA results in a non-competent phenotype (2), providing very strong genetic evidence that the stem forms in vivo and has a function in sxy regulation. Additionally, the nuclease mapping done with seven different enzymes (by both L. Bannister and me) supported the predicted sxy secondary structure and confirmed that stem Ia+Ib exists. Although the RNase mapping data in general fit the predicted sxy RNA secondary structure well, it would seem possible that an alternate structure exists. However, upon closer examination, the alternate structures predicted by Mfold at both 25 and 37°C did not fit the data as well as the thermodynamically most favorable one (produced at 37°C, data not shown).

4.3 Possible mechanisms of sxy regulation

How is sxy regulated? We know that addition of ribonucleotides (or their derivatives) to the medium decreases transformation frequency (16) by directly affecting sxy expression (A. Cameron, personal communication). The secondary structure of sxy RNA is likely to limit the translation of the sxy message: mutations that weaken the structure strongly increase Sxy expression and have only a small impact on RNA levels. Additionally, strains carrying mutations that introduce base-pairing in the sxy RNA secondary structure produce almost no transcript or Sxy protein (A. Cameron, personal communication; (2)). In the following sections, I discuss two possible mechanisms of sxy regulation in H. influenzae.

4.3.1 sxy expression could be regulated via translation initiation by sxy RNA secondary structure

Translational control via alteration of RNA secondary structure is a common mechanism of gene regulation in bacteria (6). Gene expression in this case is limited by sequestering the translation initiation region within RNA secondary structure (7).
The translation initiation region, which includes the ribosome binding site (RBS) and the translational start codon, is bound by ribosomes in unstructured (single-stranded) mRNA, which is subsequently translated. Alternatively, protection of the ribosome binding site and start codon by RNA secondary structure prevents ribosomes from binding and initiating translation of the RNA.

It is possible that nucleotide concentration in the cell directly regulates sxy expression by changing the rate of transcription elongation and the accessibility of the ribosome binding site or the start codon in sxy RNA. In this model, under non-inducing conditions (early log phase in sBHI), the high abundance of nucleotides in the cell allows steady transcription elongation and rapid progression of RNA polymerase (Fig. 4.1 A). This fast elongation of the sxy message allows formation of a strong secondary structure that blocks the ribosome from binding and initiating translation. Under inducing conditions (late log phase in sBHI), transcription elongation may proceed at a lower rate or stall due to nucleotide depletion in the cell (Fig. 4.1 B). Pausing of RNA polymerase due to a low concentration of precursors could prevent formation of the secondary structure in the sxy transcript and allow a ribosome to bind the RBS in sxy RNA.

One piece of evidence that supports this model is the increase in both sxy RNA and protein abundance in hypercompetent sxy mutants. In these strains, destabilization of the RNA secondary structure may allow ribosome binding and initiation of translation even under non-inducing conditions. It has previously been shown experimentally that translational efficiency is directly related to the stability of RNA secondary structure in the region of the RBS and the translation start codon (6). de Smit (1994) showed that a single point mutation in the RNA sequence can drastically alter its secondary structure and thus protein expression.
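To make the link between stem stability and ribosome access more concrete, the following Python sketch applies a simple two-state assumption (the stem is either fully closed or fully open) to the Mfold ΔG values reported in sections 3.2.3 and 3.2.4. It is an illustration rather than a calculation from this work: the function name fraction_open is hypothetical, the ΔG values are treated as kcal/mol at 37°C, and co-transcriptional folding, ribosome binding energy and any protein factors are ignored.

```python
import math

R_KCAL_PER_MOL_K = 0.0019872  # gas constant in kcal/(mol*K)


def fraction_open(delta_g, temp_c=37.0):
    """Two-state estimate of the fraction of molecules with the stem open.

    `delta_g` is the folding free energy in kcal/mol (negative = stable).
    Assumption: only two states exist, closed and open, at equilibrium.
    """
    rt = R_KCAL_PER_MOL_K * (temp_c + 273.15)
    k_fold = math.exp(-delta_g / rt)  # equilibrium constant for closing
    return 1.0 / (1.0 + k_fold)


# Predicted stabilities of stem Ia+Ib quoted in the Results chapter.
STEM_DG = {"wild type": -13.3, "sxy-1": -7.3, "sxy-7": -20.6}

if __name__ == "__main__":
    wt = fraction_open(STEM_DG["wild type"])
    for name, dg in STEM_DG.items():
        ratio = fraction_open(dg) / wt
        print(f"{name}: dG = {dg} kcal/mol, open fraction vs wild type = {ratio:.3g}")
```

Under these assumptions the sxy-1 stem is open far more often than the wild type stem and the sxy-7 stem far less often, which matches the direction of the hypercompetent and non-competent phenotypes. The absolute values should not be over-interpreted, since the two-state picture is a strong simplification.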
4.3.2 sxy could be regulated by a riboswitch

Riboswitches are cis-regulatory elements found in untranslated regions of RNAs (18). They control gene expression by binding small molecules (ligands) with high specificity, without protein interaction. Depending on the concentration of the ligand in the cell, the RNA folds into alternate structures that affect the transcription or translation of the downstream gene (25). Nine distinct riboswitch classes have been described, and each has a distinct consensus sequence, ligand specificity and a secondary structure that is highly conserved across different bacterial taxa (17). Four putative riboswitch sequences have been detected in the H. influenzae genome by bioinformatic analysis, but none of them is similar to the sxy RNA in nucleotide sequence or secondary structure (www.sanger.ac.uk/cgi-bin/Rfam/). However, production of both sxy mRNA and protein is directly limited by purine nucleotides (A. Cameron, personal communication), and sxy RNA folds into a complex stem-loop structure, suggesting that sxy expression could be riboswitch-regulated.

In this model, sxy RNA would directly sense a change in ligand (i.e. nucleotide) concentration, and binding would alter its secondary structure and its expression. Under non-inducing conditions (early log phase in sBHI), high ligand concentration would result in formation of a translation-inhibiting sxy RNA secondary structure (Fig. 4.1 C), while under inducing conditions the reduced concentration of the ligand would allow folding into an alternate structure that permits translation (Fig. 4.1 D).

This model can be tested by performing RNase mapping analysis of sxy RNA secondary structure in both the presence and the absence of different potential ligands. Alteration of the secondary structure can be detected by comparing the results from these two conditions.
Since nucleotides cannot be transported into the cell without being dephosphorylated in the periplasm (12), a range of concentrations of purine nucleosides and bases should be tested first.

Although this model provides a reasonable explanation of sxy regulation, it also has some limitations. First, my analysis suggested that the RNAs of sxy homologues lack both a common secondary structure and conserved motifs in their sequences. Additionally, the sxy sequence does not match any of the consensus sequences of the described riboswitch classes. However, the bioinformatics programs currently used for riboswitch detection also have limitations. These programs identify new riboswitch sequences in different genomes by comparing them to the conserved motifs and secondary structures of the known riboswitch classes. Thus, it is possible that some riboswitch sequences have evolved recently to accommodate the specific life styles of particular bacteria, but that none of them have yet been recognized by the current detection methods.

Conclusion

To summarize, the RNA mapping analysis that I did provided support for the hypothesis that sxy expression is limited by its RNA secondary structure. Additionally, the phylogenetic comparative analysis suggested that the RNA secondary structure is important only in regulation of sxy in H. influenzae strains and that it is not conserved in other Pasteurellaceae. Further experiments should be conducted to determine the precise regulation and role of this major competence regulator.

References:

1. Alexander, H.E., and Leidy, G. 1951. Determination of Inherited Traits of H. influenzae by Desoxyribonucleic Acid Fractions Isolated From Type-Specific Cells. J Exp Med 93:345-359.
2. Bannister, L.A. 1999. An RNA Secondary Structure Regulates Sxy Expression and Competence Development in Haemophilus influenzae. PhD Thesis.
3. Cameron, A.D.S., and Redfield, R.J. 2006. Non-Canonical CRP Sites Control Competence Regulons in Escherichia coli and Many Other Gamma-Proteobacteria. Nucleic Acids Research, in press.
4. Chen, I., and Dubnau, D. 2004. DNA Uptake During Bacterial Transformation. Nat Rev Microbiol 2:241-249.
5. Corpet, F., and Michot, B. 1994. RNAlign Program: Alignment of RNA Sequences Using Both Primary and Secondary Structures. Comput Appl Biosci 10:389-399.
6. de Smit, M.H., and van Duin, J. 1990. Control of Prokaryotic Translational Initiation By mRNA Secondary Structure. Prog Nucleic Acid Res Mol Biol 38:1-35.
7. de Smit, M.H., and van Duin, J. 1994. Control of Translation By mRNA Secondary Structure in Escherichia coli. A Quantitative Analysis of Literature Data. J Mol Biol 244:144-150.
8. Eddy, S.R. 2001. Non-Coding RNA Genes and the Modern RNA World. Nat Rev Genet 2:919-929.
9. Ehresmann, C., Baudin, F., Mougel, M., Romby, P., Ebel, J.P., and Ehresmann, B. 1987. Probing the Structure of RNAs in Solution. Nucleic Acids Res 15:9109-9128.
10. Gwinn, M.L., Ramanathan, R., Smith, H.O., and Tomb, J.F. 1998. A New Transformation-Deficient Mutant of Haemophilus influenzae Rd With Normal DNA Uptake. J Bacteriol 180:746-748.
11. Hanahan, D. 1985. Techniques for Transformation of E. coli. Glover, D.M. (ed). IRL Press Ltd.
12. Landick, R., Turnbough, C.L., and Yanofsky, C. 1996. Transcription Attenuation. In Neidhardt, F.C. (ed), Escherichia coli and Salmonella, 1263-1286.
13. Lawson, C.L., Swigon, D., Murakami, K.S., Darst, S.A., Berman, H.M., and Ebright, R.H. 2004. Catabolite Activator Protein: DNA Binding and Transcription Activation. Curr Opin Struct Biol 14:10-20.
14. Lorenz, M.G., and Wackernagel, W. 1994. Bacterial Gene Transfer By Natural Genetic Transformation in the Environment. Microbiol Rev 58:563-602.
15. Macfadyen, L.P. 2000. Regulation of Competence Development in Haemophilus influenzae. J Theor Biol 207:349-359.
16. MacFadyen, L.P., Chen, D., Vo, H.C., Liao, D., Sinotte, R., and Redfield, R.J. 2001. Competence Development in Haemophilus influenzae is Regulated By the Availability of Nucleic Acid Precursors. Mol Microbiol 40:700-707.
17. Mandal, M., and Breaker, R.R. 2004. Adenine Riboswitches and Gene Activation By Disruption of a Transcription Terminator. Nat Struct Mol Biol 11:29-35.
18. Mandal, M., and Breaker, R.R. 2004. Gene Regulation By Riboswitches. Nat Rev Mol Cell Biol 5:451-463.
19. Maniloff, J. 1996. The Minimal Cell Genome: "on Being the Right Size". Proc Natl Acad Sci USA 93:10004-10006.
20. Mathews, D.H., Sabina, J., Zuker, M., and Turner, D.H. 1999. Expanded Sequence Dependence of Thermodynamic Parameters Improves Prediction of RNA Secondary Structure. J Mol Biol 288:911-940.
21. Mattick, J.S., and Makunin, I.V. 2006. Non-Coding RNA. Hum Mol Genet 15 Spec No 1:R17-29.
22. Meats, E., Feil, E.J., Stringer, S., Cody, A.J., Goldstein, R., Kroll, J.S., Popovic, T., and Spratt, B.G. 2003. Characterization of Encapsulated and Noncapsulated Haemophilus influenzae and Determination of Phylogenetic Relationships By Multilocus Sequence Typing. J Clin Microbiol 41:1623-1636.
23. Meibom, K.L., Blokesch, M., Dolganov, N.A., Wu, C.Y., and Schoolnik, G.K. 2005. Chitin Induces Natural Competence in Vibrio cholerae. Science 310:1824-1827.
24. Notredame, C., O'Brien, E.A., and Higgins, D.G. 1997. RAGA: RNA Sequence Alignment By Genetic Algorithm. Nucleic Acids Res 25:4570-4580.
25. Nudler, E., and Mironov, A.S. 2004. The Riboswitch Control of Bacterial Metabolism. Trends Biochem Sci 29:11-17.
26. Ochman, H., and Davalos, L.M. 2006. The Nature and Dynamics of Bacterial Genomes. Science 311:1730-1733.
27. Poje, G., and Redfield, R.J. 2002. General Methods for Culturing Haemophilus influenzae, and Transformation of Haemophilus influenzae. In Herbert, M. (ed), Molecular Methods for Haemophilus influenzae.
28. Redfield, R.J. 1991. Sxy-1, a Haemophilus influenzae Mutation Causing Greatly Enhanced Spontaneous Competence. J Bacteriol 173:5612-5618.
29. Redfield, R.J., Cameron, A.D., Qian, Q., Hinds, J., Ali, T.R., Kroll, J.S., and Langford, P.R. 2005. A Novel CRP-Dependent Regulon Controls Expression of Competence Genes in Haemophilus influenzae. J Mol Biol 347:735-747.
30. Romeo, T. 1998. Global Regulation By the Small RNA-Binding Protein CsrA and the Non-Coding RNA Molecule CsrB. Mol Microbiol 29:1321-1330.
31. Sambrook, J., and Russell, D.W. 2001. Molecular Cloning (3rd ed).
32. Solomon, J.M., and Grossman, A.D. 1996. Who's Competent and When: Regulation of Natural Genetic Competence in Bacteria. Trends Genet 12:150-155.
33. Stuy, J.H. 1989. Cloning and Characterization of the Haemophilus influenzae Rd Rec-1+ Gene. J Bacteriol 171:4395-4401.
34. Takayama, K., and Kjelleberg, S. 2000. The Role of RNA Stability During Bacterial Stress Responses and Starvation. Environ Microbiol 2:355-365.
35. Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F., and Higgins, D.G. 1997. The Clustalx Windows Interface: Flexible Strategies for Multiple Sequence Alignment Aided By Quality Analysis Tools. Nucleic Acids Research 24:4876-4882.
36. Vasil'eva, I.M., and Garber, M.B. 2002. The Regulatory Role of the Hfq Protein in Bacterial Cells. Mol Biol (Mosk) 36:970-977.
37. Wilcox, K.W., and Smith, H.O. 1975. Isolation and Characterization of Mutants of Haemophilus influenzae Deficient in an Adenosine 5'-Triphosphate-Dependent Deoxyribonuclease Activity. J Bacteriol 122:443-453.
38. Wise, E.M.J., Alexander, S.P., and Powers, M. 1973. Adenosine 3':5'-Cyclic Monophosphate as a Regulator of Bacterial Transformation. Proc Natl Acad Sci USA 70:471-474.
39. Zuker, M. 2003. Mfold Web Server for Nucleic Acid Folding and Hybridization Prediction. Nucleic Acids Res 31:3406-3415.
40. Zulty, J.J., and Barcak, G.J. 1995.
Identification of a D N A Transformation Gene Required for C o m l 0 1 A + Expression and Supertransformer Phenotype in Haemophilus influenzae. Proc Natl Acad Sci USA 92:3616-3620. 73 Appendix Species Accession number H.influenzae K W 2 0 L42023.1 H.influenzae 86-028NP CP000436.1 H. ducreyi AE017143.1 H. somnus CP000436.1 P. multocida AE004439.1 M. succiniciproducens AE016827.1 Table A . l . Accession numbers of annotated genomes used in the study wt s x y s x y - 1 s x y - 7 Figure A . l Representative gel pictures . End-labeled sxy, sxy-l and sxy-l R N A s were partially digested with RNases and the resulting fragments were resolved in a sequencing gel. A . Mapping of .sxy R N A secondary structure. Lanes 1,2 and 3 were loaded with ladders: 1-alkaline ladder, 2 & 3-TI ladder. Lanes 4 and 5 were loaded with folded sxy R N A partially digested with RNase TI (lane 4) and RNase V I (lane 5). B. Mapping of .sxy-l and sxy-7 R N A secondary structures. Lanes 1-6 were loaded with partially digested sxy-l R N A and lanes 7-15 were loaded with partially digested sxy-7 R N A . Lanes 1,4,5,7,8,12 &13 represent sequencing ladders: 1,8 & 9-TI ladder, 4,5,12 &13 RNase A ladder and 7-alkaline ladder. Samples partially digested with RNase TI are in lanes 2,3,10 &11 and samples partially digested with RNase A are in lanes 6, 14 &15. 76 Cleavage intensities of residues in sxy , sxy -1 and sxy- 7 RNA secondary structure • sxy • sxy-1 • sxy-7 II T ! * i i • 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 Residue position (relative to translation start) Figure A.2 . Relative cleavage intensities for residues in sxy, sxy-l and sxy-7 R N A . 
