{"Affiliation":[{"label":"Affiliation","value":"Science, Faculty of","attrs":{"lang":"en","ns":"http:\/\/vivoweb.org\/ontology\/core#departmentOrSchool","classmap":"vivo:EducationalProcess","property":"vivo:departmentOrSchool"},"iri":"http:\/\/vivoweb.org\/ontology\/core#departmentOrSchool","explain":"VIVO-ISF Ontology V1.6 Property; The department or school name within institution; Not intended to be an institution name."},{"label":"Affiliation","value":"Microbiology and Immunology, Department of","attrs":{"lang":"en","ns":"http:\/\/vivoweb.org\/ontology\/core#departmentOrSchool","classmap":"vivo:EducationalProcess","property":"vivo:departmentOrSchool"},"iri":"http:\/\/vivoweb.org\/ontology\/core#departmentOrSchool","explain":"VIVO-ISF Ontology V1.6 Property; The department or school name within institution; Not intended to be an institution name."}],"AggregatedSourceRepository":[{"label":"AggregatedSourceRepository","value":"DSpace","attrs":{"lang":"en","ns":"http:\/\/www.europeana.eu\/schemas\/edm\/dataProvider","classmap":"ore:Aggregation","property":"edm:dataProvider"},"iri":"http:\/\/www.europeana.eu\/schemas\/edm\/dataProvider","explain":"A Europeana Data Model Property; The name or identifier of the organization who contributes data indirectly to an aggregation service (e.g. 
Europeana)"}],"Campus":[{"label":"Campus","value":"UBCV","attrs":{"lang":"en","ns":"https:\/\/open.library.ubc.ca\/terms#degreeCampus","classmap":"oc:ThesisDescription","property":"oc:degreeCampus"},"iri":"https:\/\/open.library.ubc.ca\/terms#degreeCampus","explain":"UBC Open Collections Metadata Components; Local Field; Identifies the name of the campus from which the graduate completed their degree."}],"Creator":[{"label":"Creator","value":"Awram, Peter Alan","attrs":{"lang":"en","ns":"http:\/\/purl.org\/dc\/terms\/creator","classmap":"dpla:SourceResource","property":"dcterms:creator"},"iri":"http:\/\/purl.org\/dc\/terms\/creator","explain":"A Dublin Core Terms Property; An entity primarily responsible for making the resource.; Examples of a Contributor include a person, an organization, or a service."}],"DateAvailable":[{"label":"DateAvailable","value":"2009-07-27T23:44:02Z","attrs":{"lang":"en","ns":"http:\/\/purl.org\/dc\/terms\/issued","classmap":"edm:WebResource","property":"dcterms:issued"},"iri":"http:\/\/purl.org\/dc\/terms\/issued","explain":"A Dublin Core Terms Property; Date of formal issuance (e.g., publication) of the resource."}],"DateIssued":[{"label":"DateIssued","value":"2000","attrs":{"lang":"en","ns":"http:\/\/purl.org\/dc\/terms\/issued","classmap":"oc:SourceResource","property":"dcterms:issued"},"iri":"http:\/\/purl.org\/dc\/terms\/issued","explain":"A Dublin Core Terms Property; Date of formal issuance (e.g., publication) of the resource."}],"Degree":[{"label":"Degree","value":"Doctor of Philosophy - PhD","attrs":{"lang":"en","ns":"http:\/\/vivoweb.org\/ontology\/core#relatedDegree","classmap":"vivo:ThesisDegree","property":"vivo:relatedDegree"},"iri":"http:\/\/vivoweb.org\/ontology\/core#relatedDegree","explain":"VIVO-ISF Ontology V1.6 Property; The thesis degree; Extended Property specified by UBC, as per https:\/\/wiki.duraspace.org\/display\/VIVO\/Ontology+Editor%27s+Guide"}],"DegreeGrantor":[{"label":"DegreeGrantor","value":"University 
of British Columbia","attrs":{"lang":"en","ns":"https:\/\/open.library.ubc.ca\/terms#degreeGrantor","classmap":"oc:ThesisDescription","property":"oc:degreeGrantor"},"iri":"https:\/\/open.library.ubc.ca\/terms#degreeGrantor","explain":"UBC Open Collections Metadata Components; Local Field; Indicates the institution where thesis was granted."}],"Description":[{"label":"Description","value":"C. crescentus is a Gram-negative bacterium that possesses an hexagonal\r\narray called the S-layer that covers the entire outer surface of the bacterium.\r\nThis array is composed of an estimated 60 000 copies of the 98 kDa protein\r\nRsaA. RsaA secretion is directed by a C-terminal secretion signal located in the\r\nlast 82 amino acids of the protein. Once RsaA is secreted from the cell, it\r\nassembles into the S-layer and attaches to the outer membrane via a specific\r\nspecies of smooth lipopolysaccharide (S-LPS). The mechanisms required for the\r\nsecretion of RsaA and the synthesis of the S-LPS were examined in this thesis.\r\nTn5 mutagenesis of wildtype C. crescentus demonstrated the presence of\r\ntwo genes, rsaD and rsaE, 3' of the rsaA gene that were required for transport of\r\nRsaA. These genes were isolated and are capable of complementing the Tn5\r\nmutations 3' of RsaA in trans. The resulting proteins of rsaD and rsaE belong to\r\nthe type I secretion family that uses three components: an ATP Binding\r\nCassette-transporter (RsaD), a Membrane Fusion Protein (RsaE) and an outer\r\nmembrane protein (OMP), to secrete proteins through both membranes of Gram-negative\r\nbacteria. The OMP, RsaF, of the Rsa system was found by screening\r\nthe partial Caulobacter genome sequence for sequence identity to other type I\r\nOMPs. The gene for RsaF is found 5 kb 3' of rsaE. 
Deletion of the N-terminus\r\nor C-terminus of RsaF prevents the Rsa secretion mechanism from functioning.\r\nThe secretion of the S-layer subunits in a number of other Caulobacter\r\nspecies was also examined. A partial ORF from FWC27 with 44.6% identity to\r\nRsaA was isolated. In addition, the ABC-transporter components from FWC6,\r\nFWC8 and FWC39 were isolated. These components were >95% identical to\r\n\r\nRsaD. These results were used to explore the evolutionary relationships\r\nbetween the different Caulobacter species.\r\nEighteen Tn5 mutations resulting in the inability of the S-layer to attach to\r\nthe surface of the bacterium were also isolated. Southern blot analysis\r\ndemonstrated that twelve of these insertions were linked to the Rsa transporters.\r\nThe Tn5 insertion points were isolated and sequenced allowing identification of\r\nseveral putative genes involved in S-LPS synthesis from the Caulobacter\r\ngenome sequence. A total of twelve open reading frames (ORFs) were found by\r\nTn5 mapping and two more were found 3' of rsaE. Six of these putative genes\r\nmay code for proteins involved in the synthesis of sugar residues including five\r\nthat make perosamine. Five of the genes appear to be glycosyltransferases\r\ninvolved in forming the linkages between sugar residues in the O-antigen. One\r\nof the genes appears to be a repressor, while the remaining genes are\r\nunidentified. 
These data suggest that the major component of the O-antigen is\r\nperosamine and that a number of different linkages are made between the\r\nperosamine residues.","attrs":{"lang":"en","ns":"http:\/\/purl.org\/dc\/terms\/description","classmap":"dpla:SourceResource","property":"dcterms:description"},"iri":"http:\/\/purl.org\/dc\/terms\/description","explain":"A Dublin Core Terms Property; An account of the resource.; Description may include but is not limited to: an abstract, a table of contents, a graphical representation, or a free-text account of the resource."}],"DigitalResourceOriginalRecord":[{"label":"DigitalResourceOriginalRecord","value":"https:\/\/circle.library.ubc.ca\/rest\/handle\/2429\/11365?expand=metadata","attrs":{"lang":"en","ns":"http:\/\/www.europeana.eu\/schemas\/edm\/aggregatedCHO","classmap":"ore:Aggregation","property":"edm:aggregatedCHO"},"iri":"http:\/\/www.europeana.eu\/schemas\/edm\/aggregatedCHO","explain":"A Europeana Data Model Property; The identifier of the source object, e.g. the Mona Lisa itself. This could be a full linked open date URI or an internal identifier"}],"Extent":[{"label":"Extent","value":"24165005 bytes","attrs":{"lang":"en","ns":"http:\/\/purl.org\/dc\/terms\/extent","classmap":"dpla:SourceResource","property":"dcterms:extent"},"iri":"http:\/\/purl.org\/dc\/terms\/extent","explain":"A Dublin Core Terms Property; The size or duration of the resource."}],"FileFormat":[{"label":"FileFormat","value":"application\/pdf","attrs":{"lang":"en","ns":"http:\/\/purl.org\/dc\/elements\/1.1\/format","classmap":"edm:WebResource","property":"dc:format"},"iri":"http:\/\/purl.org\/dc\/elements\/1.1\/format","explain":"A Dublin Core Elements Property; The file format, physical medium, or dimensions of the resource.; Examples of dimensions include size and duration. 
Recommended best practice is to use a controlled vocabulary such as the list of Internet Media Types [MIME]."}],"FullText":[{"label":"FullText","value":"Analysis of the S-layer Transporter Mechanism and Smooth Lipopolysaccharide Synthesis in Caulobacter crescentus by Peter Alan Awram B.Sc., The University of British Columbia, 1992 A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES (Department of Microbiology and Immunology) We accept this thesis as conforming to the required standard THE UNIVERSITY OF BRITISH COLUMBIA November 1999 \u00a9 Peter Alan Awram, 1999 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. The University of British Columbia Vancouver, Canada Abstract C. crescentus is a Gram-negative bacterium that possesses an hexagonal array called the S-layer that covers the entire outer surface of the bacterium. This array is composed of an estimated 60 000 copies of the 98 kDa protein RsaA. RsaA secretion is directed by a C-terminal secretion signal located in the last 82 amino acids of the protein. Once RsaA is secreted from the cell, it assembles into the S-layer and attaches to the outer membrane via a specific species of smooth lipopolysaccharide (S-LPS). 
The mechanisms required for the secretion of RsaA and the synthesis of the S-LPS were examined in this thesis. Tn5 mutagenesis of wildtype C. crescentus demonstrated the presence of two genes, rsaD and rsaE, 3' of the rsaA gene that were required for transport of RsaA. These genes were isolated and are capable of complementing the Tn5 mutations 3' of RsaA in trans. The resulting proteins of rsaD and rsaE belong to the type I secretion family that uses three components: an ATP Binding Cassette-transporter (RsaD), a Membrane Fusion Protein (RsaE) and an outer membrane protein (OMP), to secrete proteins through both membranes of Gram-negative bacteria. The OMP, RsaF, of the Rsa system was found by screening the partial Caulobacter genome sequence for sequence identity to other type I OMPs. The gene for RsaF is found 5 kb 3' of rsaE. Deletion of the N-terminus or C-terminus of RsaF prevents the Rsa secretion mechanism from functioning. The secretion of the S-layer subunits in a number of other Caulobacter species was also examined. A partial ORF from FWC27 with 44.6% identity to RsaA was isolated. In addition, the ABC-transporter components from FWC6, FWC8 and FWC39 were isolated. These components were >95% identical to RsaD. These results were used to explore the evolutionary relationships between the different Caulobacter species. Eighteen Tn5 mutations resulting in the inability of the S-layer to attach to the surface of the bacterium were also isolated. Southern blot analysis demonstrated that twelve of these insertions were linked to the Rsa transporters. The Tn5 insertion points were isolated and sequenced allowing identification of several putative genes involved in S-LPS synthesis from the Caulobacter genome sequence. A total of twelve open reading frames (ORFs) were found by Tn5 mapping and two more were found 3' of rsaE. 
Six of these putative genes may code for proteins involved in the synthesis of sugar residues including five that make perosamine. Five of the genes appear to be glycosyltransferases involved in forming the linkages between sugar residues in the O-antigen. One of the genes appears to be a repressor, while the remaining genes are unidentified. These data suggest that the major component of the O-antigen is perosamine and that a number of different linkages are made between the perosamine residues. TABLE OF CONTENTS Abstract ii Table of Contents iv List of Tables v List of Figures vi List of Abbreviations vii List of Species abbreviations viii Acknowledgements ix CHAPTER 1 Introduction 1 CHAPTER 2 Materials and Methods 16 CHAPTER 3 Secretion of RsaA 24 CHAPTER 4 Identification of the Outer Membrane Protein Component of the RsaA Transport Complex 38 CHAPTER 5 Identification of the S-layer Subunit and Transporter genes in Freshwater Caulobacter species 53 CHAPTER 6 Identification of Genes involved in the Synthesis of the O-antigen of C. crescentus 68 CHAPTER 7 Conclusion and Future Considerations 92 Bibliography 
97 Appendix 1 RAT fragment - rsaADEF and lpsABCDEF 113 Appendix 2 ATC15252 S-layer subunit and transporter genes 122 Appendix 2 Sequences of lpsGHIJK, orf1 and orf2 129 List of Tables Table 2-1 Strains and Plasmids used in this study 17 Table 2-2 Primers used for PCR for this report 19 Table 5-1 Differences between the Rsa genes found in lab strains 55 Table 5-2 FWC species secreting alkaline protease 56 Table 5-3 Southern Blot Banding patterns of different FWC species 57 Table 5-4 BLAST alignment of RsaA with itself 62 Table 6-1 Southern blot of Shedder mutants using EcoRI 71 Table 6-2 Southern blot of Shedder mutants using SstI 72 Table 6-3 List of shedder mutants 73 Table 6-4 Deduced proteins involved in O-antigen synthesis 77 Table 6-5 Characteristics of the S-LPS synthesis genes 79 List of Figures Figure 1-1 Shed S-layer from C. crescentus 2 Figure 1-2 Developmental cycle of C. crescentus 3 Figure 1-3 3-Dimensional reconstruction of the S-layer 5 Figure 1-4 Type I secretion system 9 Figure 2-1 Plasmids containing NA1000 chromosomal DNA 18 Figure 3-1 Colony Immunoblot 25 Figure 3-2 S-layer negative Tn5 insertions 26 Figure 3-3 Complementation of Tn5 mutants with rsaA 27 Figure 3-4 Genes 3' of rsaA 28 Figure 3-5 ClustalW alignment of ABC-transporters 29 Figure 3-6 ClustalW alignment of MFPs 30 Figure 3-7 Complementation of transport deficient mutants 32 Figure 3-8 Expression of prtB in C. crescentus 34 Figure 4-1 Alignment of OMP components 41 Figure 4-2 The two possible OMPs 43 Figure 4-3 Comparison of possible Rsa OMP components 45 Figure 4-4 DNA surrounding rsaA 48 Figure 4-5 OMPs similar to Rsa(973) 49 Figure 4-6 AprA secretion from C. crescentus 50 Figure 5-1 Alignment of partial FWC RsaD genes 58 Figure 5-2 ClustalW alignment of FWC 27 60 Figure 5-3 Dendrogram of FWC species 66 Figure 6-1 Shed S-layer from C. 
crescentus 69 Figure 6-2 Colony Immunoblot 69 Figure 6-3 S-LPS of shedding Tn5 mutants 70 Figure 6-4 S-LPS synthesis genes linked to rsaA 75 Figure 6-5 Genes interrupted by Tn5 insertions in shedder mutants 76 Figure 6-6 ClustalW alignment LpsB 81 Figure 6-7 ClustalW Alignment of LpsD and LpsE 82 Figure 6-8 ClustalW alignment of LpsF 83 Figure 6-9 ClustalW alignment of LpsJ 85 Figure 6-10 ClustalW alignment of LpsK 87 Figure 6-11 Perosamine Synthesis Pathway 90 List of Abbreviations ABC ATP-Binding Cassette ATP adenosine triphosphate BLAST Basic Local Alignment Search Tool C-terminus carboxy terminus DNA deoxyribonucleic acid EDTA ethylene diaminetetra-acetic acid EGTA ethylene glycol-bis(\u03b2-aminoethyl ether) N,N,N',N'-tetraacetic acid G+C guanosine and cytosine content of DNA FWC freshwater Caulobacter HCl hydrochloric acid KDO ketodeoxy octulosonic acid kDa kilodalton Km kanamycin LPS lipopolysaccharide min minute MFP membrane fusion protein mg milligram ml millilitre \u00b5l microlitre \u00b5g microgram NaCl sodium chloride NeuNAc N-acetyl neuraminic acid (sialic acid) NAD nicotinamide adenine dinucleotide NMR nuclear magnetic resonance N-terminus amino terminus NTG 1-methyl-3-nitro-1-nitrosoguanidine O-antigen antigenic determinant found on the outside of cell consisting of repeating units of oligosaccharides ORF open reading frame OMP outer membrane protein PAGE polyacrylamide gel electrophoresis PCR polymerase chain reaction PYE peptone yeast extract RNA ribonucleic acid S-layer surface layer S-LPS smooth lipopolysaccharide of C. crescentus SDS sodium dodecyl sulphate Sm streptomycin Tc tetracycline Tm melting temperature of two strands of DNA TIGR The Institute for Genomic Research Tris Tris (hydroxymethyl) methylamine UV ultra violet light 
List of Species Abbreviations B. pertussis Bordetella pertussis B. melitensis Brucella melitensis C. crescentus Caulobacter crescentus C. fetus Campylobacter fetus C. jejuni Campylobacter jejuni E. chrysanthemi Erwinia chrysanthemi E. coli Escherichia coli P. aeruginosa Pseudomonas aeruginosa S. enterica Salmonella enterica S. marcescens Serratia marcescens V. cholerae Vibrio cholerae R. meliloti Rhizobium meliloti R. leguminosarum Rhizobium leguminosarum Acknowledgements I would like to acknowledge myself for persevering throughout this process. I would further like to thank my wonderful girlfriend Bianca Kuipers for being supportive during this time. I would like to thank John Nomellini and Stephen Walker for helpful suggestions and thoughtful insights and the occasional gel along the way. I would also like to thank all my 'partners in pain' that started with me. Chapter 1 Introduction This thesis focuses on the secretion and attachment of the S-layer of Caulobacter crescentus. S-layers are not well understood and have not been studied extensively even though they are found on a wide range of prokaryotes (Messner and Sleytr, 1992; Sleytr et al., 1993; Sleytr and Sara, 1997). Consequently, there is a need for basic research to describe these structures. Despite this lack of study, some research has been done on the commercial aspects of S-layers (Sleytr et al., 1997a). The research presented here is applicable to both of these areas. It is of general interest to know the methods of secretion and attachment of the S-layer and this information can also be applied to the commercial aspects of S-layers. Evidence is presented that the S-layer subunit of C. crescentus is secreted by a type I secretion mechanism and that the S-layer subunits of a number of other Caulobacter species are probably secreted by an almost identical type I mechanism. Also presented are several putative proteins involved in the synthesis of the O-antigen that support the predicted composition of the O-antigen as being a polymer of a 4,6-dideoxy-4-amino-hexose with complex linkages (Walker et al., 1994; Smit unpublished). 
Furthermore, these data suggest that the 4,6-dideoxy-4-amino-hexose is perosamine and that a number of glycosyltransferases provide complex linkages between the perosamine residues. The S-layer of C. crescentus can be used as a biotechnology vehicle. The S-layer is a 2-dimensional array made from approximately 60 000 copies of the protein, RsaA (Smit et al., 1981). This layer covers the entire outer surface of the bacterium and makes up about 10% of the cell's protein. Therefore, RsaA must be secreted, passing through both membranes, from the Gram-negative cell. An uncleaved C-terminal secretion signal directs this secretion of RsaA (Bingle et al., 1999; Bingle et al., 1996; Bingle et al., 1997b; Bingle and Smit, 1994). Once secreted, the S-layer is attached to the outer membrane via the smooth lipopolysaccharide (S-LPS) (Walker et al., 1994). If the S-LPS is disrupted or absent the S-layer detaches from the membrane and aggregates into particles that are up to 90% pure RsaA making it easy to collect large amounts of relatively pure protein (Fig. 1-1). It has been found that the N-terminus of RsaA contains the attachment domain and a C-terminal Ca2+ binding domain is responsible for aggregation of the protein (Bingle et al., 1997b). To produce recombinant proteins it is desirable to produce large quantities that are easily isolated from the rest of the cellular protein. The properties of the C. crescentus S-layer and secretion apparatus allow this. Figure 1-1. Shed S-layer from C. crescentus. EM photograph of S-layer shed from a strain with defective S-LPS. (Photo courtesy John Smit) The C-terminal secretion signal and Ca2+ binding domain can be fused to a desired protein and recombinant proteins can then be secreted from C. crescentus by the RsaA secretion signal. The proteins aggregate together in the medium where they can be filtered away from the cells. 
This process has been shown to be viable and recombinant proteins have been expressed and purified from C. crescentus (Bingle et al., 1997a). S-layers also have other uses such as the expression of epitopes in S-layers to be used for recombinant vaccines. Another aspect that is being examined is to use the regular arrays formed by the S-layer as templates for the deposition of metal or silicon atoms to allow creation of circuitry finer than is allowed by current integrated circuit etching technology. It would also be possible to use the arrays as surface supports to which biologically active molecules could be attached (Sara and Sleytr, 1996a; Sara and Sleytr, 1996b; Sleytr et al., 1997a; Sleytr et al., 1997b; Sleytr and Sara, 1997). Obviously, all these uses could be applied to the S-layer of C. crescentus. To increase the utility of C. crescentus S-layers for such applications it is vital to understand how the RsaA protein is secreted and attached to the surface. For example, it is necessary to understand the conformation of the protein when it is passing through the secretion apparatus. This will determine what kind of foreign proteins or epitopes can be secreted and are capable of forming aggregates using the RsaA secretion pathway. To answer some of these questions this thesis examines the RsaA secretion and S-LPS synthesis pathways. C. crescentus is a Gram-negative, motile eubacterium found in soil and aquatic environments including drinking water. The non-pathogenic bacterium derives its name from the crescent shape of the cells. C. crescentus undergoes a dimorphic developmental life cycle (for reviews see Brun et al., 1994; Gober and Marques, 1995; Poindexter, 1981; Shapiro, 1976; Shapiro and Losick, 1997) during which it switches between a motile (swarmer) phase and a sessile stalked phase (Fig. 1-2). In both phases the bacterium is completely covered by the S-layer (Smit et al., 1981). 
In the swarmer phase the cell expresses a single flagellum, pili and a holdfast (an adhesin) at one pole. When the cell differentiates into the stalked form, it loses the flagellum and a stalk (containing no cytoplasm) grows out from the cell envelope keeping the holdfast on its tip. Stalked cells divide and produce a swarmer cell with the flagellum being created at the pole furthest from the stalked cell. Most of the current research on C. crescentus focuses on the developmental process resulting in these two different forms and the development of the flagellum (Brun et al., 1994; Roberts et al., 1996; Shapiro and Losick, 1997). Figure 1-2. Developmental cycle of C. crescentus. Sessile cells attached to the surface via the holdfast bud off swarmer cells which move to a new location where they lose their flagellum and grow a stalk to attach to the surface again. (Figure courtesy Ian Bosdet.) S-layers are two-dimensional arrays that cover the outside surface of many prokaryotes. C. crescentus is one of many species of bacteria covered with a crystalline protein surface layer (S-layer) (Boot and Pouwels, 1996; Sleytr and Messner, 1983; Sleytr and Sara, 1997; Smit et al., 1981). Thousands of copies of nearly always a single protein or glycoprotein self-assemble into a crystalline-like lattice (Sleytr and Messner, 1983). The S-layers described so far have subunits ranging in size from 30 to 220 kDa (Messner and Sleytr, 1992). Although a large number of bacteria have been found to have S-layers, enteric bacteria, the most studied group, lack them, and consequently S-layers have not been studied much (Hovmoller et al., 1988; Sleytr and Messner, 1988). For reviews on S-layers see Beveridge et al., 1997; Sleytr, 1992; Sleytr and Messner, 1983. S-layers typically make up 10% of the protein in a cell and thus represent a large energy expenditure by the cell (Sleytr and Messner, 1983). 
Many bacteria have been found to lose their S-layers when there is no environmental pressure for maintenance, such as during sub-culturing in the laboratory, showing that S-layers are not essential for growth (Blaser et al., 1985; Borinski and Holt, 1990; Luckevich and Beveridge, 1989; Stewart and Beveridge, 1980). Considering the energy expenditure, the function of the S-layer must be required for survival in the normal environment of the bacterium. It is presumed that most S-layers have a protective barrier role because the pore-like structures formed by the layer likely act as molecular sieves and prevent the entry of molecules, such as proteases and lytic enzymes, larger than the pore (Sleytr and Messner, 1983) as shown by several cases (Koval and Hynes, 1991; Sleytr, 1976). In addition, some infectious bacteria use their S-layers to adhere to and invade the cells of other organisms (Blaser et al., 1988; Messner and Sleytr, 1992; Munn et al., 1982). It has been demonstrated that the S-layer of C. crescentus protects it from a Bdellovibrio-like organism (Koval and Hynes, 1991), but the S-layer also acts as a receptor for the bacteriophage \u03c6Cr30 (Edwards and Smit, 1991) showing that the S-layer also allows C. crescentus to be infected by a parasite. S-layers have common features, such as an acidic pI, an absence of cysteine residues and a high number of hydroxylated amino acids. Subunits are held together and to the surface by noncovalent (hydrophobic, ionic, hydrogen or polar) bonds (Koval and Murray, 1984; Messner and Sleytr, 1992; Sleytr and Messner, 1983). Despite these similarities, there is very little sequence similarity among S-layer proteins (Gilchrist et al., 1992; Messner and Sleytr, 1992), suggesting that S-layers may have arisen by convergent evolution. The S-layer of C. crescentus is composed of the protein RsaA. Six copies of RsaA form a ring-like subunit (Fig. 
1-3) that interconnects with other subunits to form a two-dimensional hexagonal array (Smit et al., 1992). The gene for RsaA has been cloned (Smit and Agabian, 1984) and sequenced (Gilchrist et al., 1992). N-terminal protein sequencing of the mature RsaA polypeptide has shown that only the initial N-formyl methionine is cleaved, leaving a mature polypeptide of 1025 residues with a molecular weight of 98 kDa (Fisher et al., 1988; Gilchrist et al., 1992). The S-layer is anchored to the cell surface via a noncovalent interaction between the N-terminus of the protein and a specific smooth LPS in the outer membrane (Walker et al., 1994). Ca2+ is required for the proper crystallization of RsaA into the S-layer and its removal using EGTA disrupts S-layer structure (Nomellini et al., 1997; Walker et al., 1994). Figure 1-3. 3-Dimensional reconstruction of the S-layer. The arrow indicates a single C-shaped RsaA monomer. (Figure from Smit et al., 1992). RsaA is a true secreted protein. RsaA must pass through both the inner and outer membranes to form the S-layer on the outer surface of the bacterium. As there is a large amount of RsaA (10 to 12% of the cellular protein), an efficient secretion system or a large number of transport complexes are required to secrete the protein during the 105 min generation time. Linker mutagenesis of RsaA has shown that the extreme N-terminus is required for surface attachment while the C-terminus is required for secretion. Further, deletion and hybrid protein analyses have indicated that secretion of RsaA relies on an uncleaved C-terminal secretion signal located within the last 82 amino acids of the RsaA protein (Bingle et al., 1999; Bingle et al., 1996; Bingle et al., 1997a; Bingle et al., 1997b; Bingle and Smit, 1994). 
The presence of an uncleaved C-terminal secretion signal usually indicates secretion by a type I system (Binet et al., 1997; Salmond and Reeves, 1993) rather than a type II, III or IV system. Most Gram-positive bacterial S-layers have been shown to use the General Secretion Pathway (GSP) or Sec-dependent pathway (Pugsley, 1993) for export (Messner and Sleytr, 1992; Sleytr and Messner, 1988; Sleytr et al., 1993; Sleytr and Sara, 1997), whereas S-layer proteins in Gram-negative bacteria are secreted using a type II system (Boot and Pouwels, 1996) which also employs the GSP to transport the S-layer subunit across the inner membrane. Recently, it has been shown that the S-layer of Campylobacter fetus is secreted by a type I mechanism (Thompson et al., 1998) and an S-layer-like protein in Serratia marcescens with significant similarity to RsaA has been shown to use a type I secretion mechanism (Kawai et al., 1998). In addition to the secretion signal, the C-terminal portion of RsaA also contains repeats of a glycine and aspartic acid-rich region which are thought to bind calcium ions (Gilchrist et al., 1992) and result in the aggregation of free RsaA in the medium. Such Ca2+-binding motifs are found in most proteins secreted by type I systems (Binet et al., 1997) and consist of a glycine\/aspartate rich GGXGXD motif that repeats 4-36 times (Welch, 1991). C. crescentus has two groups of three repeats separated by 12-16 residues containing this motif. Interestingly, there are no obvious repeat regions in the S-layer of C. fetus (Thompson et al., 1998). It has been suggested that these motifs are important for the proper presentation of the secretion signal to the ABC transporter (Duong et al., 1996; Letoffe and Wandersman, 1992; Sutton et al., 1996). Thus, in the case of RsaA, the glycine and aspartate rich repeats may function (along with Ca2+) both in maintaining the crystalline structure of the S-layer and in the secretion of the S-layer protein itself. 
There are four described Gram-negative bacterial transport systems. These systems have been named type I through type IV. The type I system requires 3 proteins that are thought to form a pore through the inner and outer membranes allowing the protein to be secreted. This is the method by which RsaA is secreted and it is discussed in depth below. Type II systems use the GSP for export across the inner membrane and then use a complex of 12-14 proteins for secretion to the outside of the bacterium. The secretion substrates contain classical Sec-dependent N-terminal signal sequences that direct transport across the inner membrane by the Sec pathway (Pugsley, 1993). Proteins are transported across the cytoplasmic membrane in an unfolded state and then fold in the periplasm. This folding is necessary as the components for secretion seem to recognize the secondary or tertiary structure of the substrate as no sequence similarity has been found (Lu and Lory, 1996). Both ATP hydrolysis as well as proton motive forces appear to be required for secretion of the substrate (Feng et al., 1997; Letellier et al., 1997). For a review of type II secretion systems see Russel, 1998. The auto-secreting proteins, such as the IgA proteases, like the type II secreted proteins, use the GSP to cross the inner membrane. These proteins have an N-terminal signal sequence and a C-terminal pro-sequence. They are exported across the cytoplasmic membrane by the Sec dependent pathway in the usual manner with cleavage of the N-terminus signal sequence. The pro-sequence then forms a pore in the outer membrane through which the rest of the protein passes. Once the protein is outside, autocatalytic cleavage of the pro-sequence occurs, releasing the protease from the cell (Pohlner et al., 1987). Type III secretion has only been found in pathogens and is used to deliver bacterial proteins into the host cytoplasm to alter the host's metabolism to the advantage of the bacterium. 
Type III systems are the most complex of the secretion systems, involving more than 20 proteins. The proteins form a needle-like structure that spans the inner and outer membrane (Kubori et al., 1998). Before secretion can occur, the bacterium must make contact with the host cell. Secretion seems to be directed by the mRNA. It is thought that the mRNA forms a hairpin loop that obscures the translation start signal until the 5' region of the mRNA interacts with the secretion apparatus (Anderson and Schneewind, 1997). A signal recognition protein may mediate this process. Secretion is therefore coupled with translation. ATP hydrolysis appears to be required for secretion, as components of type III systems are capable of hydrolyzing ATP in vitro (Eichelberg et al., 1994). The substrate may then pass through the needle structure to the outside of the cell, though this has not been proven. For reviews of type III secretion see Anderson and Schneewind, 1999; Galan and Collmer, 1999.

Type IV secretion systems have only recently been discovered and are not well understood. This transport pathway, like the type III, has so far been found exclusively in pathogens. The type IV system seems to have been designed to transport DNA, though the Bordetella pertussis Ptl system only transports proteins (Weiss et al., 1993). At least 9 proteins are involved in the transport process and their functions are not well understood. There are usually two proteins containing nucleotide-binding motifs that appear to be the transporting units that hydrolyze ATP to effect transport. It is not known whether the substrate is transported in a one-step process, in which the substrate bypasses the periplasm, or a two-step process, in which the substrate is first transported to the periplasm and a second transport process then secretes the protein. For a review of type IV secretion see Burns, 1999.

RsaA is secreted by a type I mechanism.
The goal of this thesis was to elucidate the secretion mechanism of RsaA. Initial indications suggested that it was a type I secretion mechanism (i.e., a C-terminal secretion signal and the presence of glycine/aspartate-rich repeats) and data are presented here directly demonstrating that RsaA is secreted by a type I mechanism. Figure 1-4 shows the predicted structure of the C. crescentus membrane and also serves as a general model of a type I mechanism. The best described type I secretion systems are those required for the secretion of Escherichia coli α-hemolysin (HlyA), Erwinia chrysanthemi metalloproteases (PrtB) and Pseudomonas aeruginosa alkaline protease (AprA) (Binet et al., 1997; Salmond and Reeves, 1993). A type I secretion apparatus requires three components (Delepelaire and Wandersman, 1991). One component, the ABC transporter, is embedded in the inner membrane and contains an ATP-binding cassette (ABC). It has been shown that this component recognizes the C-terminal signal sequence of the substrate protein and hydrolyzes ATP during the transport process (Binet and Wandersman, 1995; Koronakis et al., 1993). Another component, the membrane fusion protein (MFP), is anchored in the inner membrane and appears to span the periplasm (Dinh et al., 1994). The remaining component is an outer membrane protein (OMP) that has been shown to interact with the MFP. It is thought that these three components form a channel that extends from the cytoplasm through the two membranes to the outside of the cell (Akatsuka et al., 1997; Hwang et al., 1997). The substrate may pass through this channel (probably in an unfolded state) to the outside of the cell. In many cases, the genes for all three transport components are found immediately adjacent to the substrate gene(s) (Duong et al., 1992; Letoffe et al., 1990).
In other type I systems, only the ABC-transporter and MFP genes are next to the substrate gene (Letoffe et al., 1994b; Mackman et al., 1985). The Rsa genes are organized like the latter and the OMP gene is not adjacent to the ABC-transporter and MFP genes. Recently, it was determined that the OMP gene is separated from the MFP gene by only five ORFs and a distance of 5 kb in the Rsa system. There are also instances where the substrate gene is separate from the secretion genes (Finnie et al., 1998; Scheu et al., 1992).

Figure 1-4. Type I secretion system. Diagram of the hypothetical membrane architecture of C. crescentus showing the predicted type I secretion mechanism of RsaA.

As shown in Figure 1-4, analysis of the ABC-transporters suggests that the protein components work in multimers of at least 2. Some members of the ABC-transporter family, such as P-glycoprotein, contain two almost identical domains in tandem, each with its own membrane-spanning and ABC region (Sheps et al., 1996). Association of two ABC transporters has been shown for monomeric ABC-transporters (Davidson and Nikaido, 1991). The proteins may work in pairs so that one ATP is hydrolyzed for transport and a second ATP is hydrolyzed to return the complex to the resting conformation. It is also possible that the proteins work in tandem and small sequential conformational changes in each separate protein push the proteins along (Welsh, 1998). Recent work indicated that while the ABC-transporters may work as a dimer, the MFP may work as a hexamer and the OMP as a trimer (Holland, 1999; Koronakis et al., 1997). The ABC-transporter family is very large and the type I secretion systems make up only a small portion of it. ABC-transporters are found in all forms of life and are sufficient to transport a substrate across a single membrane. There is significant sequence similarity among the ABC-transporters, even between eukaryotic and prokaryotic genes.
The eukaryotic P-glycoprotein shares close to 50% conserved amino acids with many of the bacterial ABC-transporters such as HlyB and PrtD over the entire length of the protein (Croop, 1998; Sheps et al., 1996). Mammalian P-glycoproteins actually have more sequence identity to these prokaryotic transporters than to proteins considered to belong to the P-glycoprotein family. ABC-transporters are also involved in the import of substrates, as in the Mal transporter, where maltose is transported across the inner membrane (for reviews see Boos and Shuman, 1998; Ehrmann et al., 1998; Nikaido, 1994). The basic monomeric ABC-transporter consists of 2 domains. One domain, usually N-terminal and consisting of six to eight membrane-spanning segments, recognizes the substrate and forms the pore through the membrane. The other domain contains the ABC region, which provides the energy for transport from the hydrolysis of ATP. The ABC domain is highly conserved and consists of about 215 amino acids; within this region there are four distinct motifs. Like all ATPases, ABC-transporters contain Walker A or P-loop (consensus GXXGXGK[ST])¹ and Walker B (hhhhD)¹ motifs, which are directly involved in ATP binding and hydrolysis (Walker et al., 1984), but they are immediately followed by a specific ABC-transporter motif (LSGGQ[QRK]QR)¹ (Bairoch, 1992; Gorbalenya and Koonin, 1990) which is thought to be involved in energy transduction (Hyde et al., 1990). A fourth motif has recently been identified in a majority of E. coli and Saccharomyces cerevisiae ABC-transporters (Decottignies and Goffeau, 1997; Linton and Higgins, 1998). This fourth motif is hhhhH¹ followed by a charged residue and is found approximately 30 amino acids C-terminal of the aspartic acid in the Walker B motif.

¹ X denotes any amino acid; h denotes a hydrophobic amino acid; brackets indicate alternative amino acids at a single position.
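The motif notation used above (X for any amino acid, h for a hydrophobic amino acid, brackets for alternatives) maps directly onto regular expressions. The sketch below is illustrative only: the set of residues treated as hydrophobic is an assumption, since the text does not list them, and the function names and toy sequence are hypothetical rather than taken from any real transporter.

```python
import re

# Assumption: which residues count as hydrophobic is not specified in the text.
HYDROPHOBIC = 'AVILMFWY'

# Motifs exactly as written in the text.
MOTIFS = {
    'Walker A': 'GXXGXGK[ST]',
    'Walker B': 'hhhhD',
    'ABC signature': 'LSGGQ[QRK]QR',
}

def motif_to_regex(motif):
    # X -> any residue, h -> hydrophobic class; bracket groups are already valid regex.
    parts = []
    for ch in motif:
        if ch == 'X':
            parts.append('.')
        elif ch == 'h':
            parts.append('[' + HYDROPHOBIC + ']')
        else:
            parts.append(ch)
    return re.compile(''.join(parts))

def scan_motifs(protein):
    # Map each motif name to the 1-based start positions of its non-overlapping matches.
    seq = protein.upper()
    return {name: [m.start() + 1 for m in motif_to_regex(pattern).finditer(seq)]
            for name, pattern in MOTIFS.items()}

# Hypothetical toy sequence used only to exercise the function.
toy = 'MKGSPGSGKSTLLRLSGGQRQRVAIALILDEAT'
print(scan_motifs(toy))
```

Because hhhhD is short and permissive, hits for it are only meaningful when they occur in the expected position relative to the other motifs.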
No one has so far been able to make a 3-dimensional crystal of the complete ABC-transporter from which the structure could be determined. However, the ABC domain has been crystallized from two proteins (Armstrong et al., 1999; Hung et al., 1998), showing that the ABC forms an L with 2 arms; arm 1 binds ATP and arm 2 interacts with the membrane-spanning domain. It is thought that hydrolysis of ATP causes a conformational change in arm 2 which transfers the energy to the membrane-spanning domain, possibly through the ABC-transporter motif found at the end of arm 2, and the conformational change in the membrane-spanning domain results in transport of the substrate (Welsh, 1998). The MFP is characterised by a single hydrophobic transmembrane domain in the N-terminus that sits in the inner membrane. A hydrophilic domain spans the periplasm and the C-terminus consists of β-sheet that may interact with the outer membrane component (Dinh et al., 1994). The MFP family contains the conserved motif [LIVM]XXG[LM]XXX[STGAV]X[LIVMT]X[LIVMT][GE]X[KR]X[LIVMFYW][LIVMFYW]X[LIVMFYW][LIVMFYW][LIVMFYW]¹ (PROSITE: PDOC00469). The OMP sits in the outer membrane and interacts with the MFP. Of the known OMPs, only TolC, from the α-hemolysin transporter, has been studied extensively. It has been found that three smooth LPS synthesis genes are required for secretion of α-hemolysin. It is likely that the smooth LPS is required for proper insertion of TolC in the membrane (Stanley et al., 1993; Wandersman and Letoffe, 1993). Two-dimensional crystals of TolC have been examined using electron microscopy and show that TolC forms a trimer. It also appears that a portion of the C-terminus is located in the periplasm (Koronakis et al., 1997).
TolC contains a sequence of 44 amino acids in the middle of the protein that is highly similar to a sequence in HlyD (the MFP); these sequences are required for transport and can be interchanged and still allow transport (Schulein et al., 1994). Thus, TolC is thought to provide the essential function of linking the transporter complex to the external environment. While members of the ABC-transporter family secrete a huge range of substrates, from Ca²⁺ ions to cancer drugs to proteins, the type I secretion subfamily has been found to secrete only proteins. The specific features required for secretion of a protein by a type I system are not known, except that the secretion signal is located in approximately the last 60 amino acids of the C-terminus of the protein (Mackman et al., 1985). As little as 15 amino acids of the C-terminus of the protease, PrtG, from E. chrysanthemi still allows secretion, although this is only 1% as efficient. It was found that substrates can be secreted by closely related type I systems (Binet and Wandersman, 1996; Letoffe et al., 1994a; Letoffe et al., 1994b), but only if there is more than 25% amino acid identity between the ABC-transporters of the systems (Delepelaire and Wandersman, 1990; Fath et al., 1991). No sequence similarity is found among the secretion signals of the different substrate proteins; however, in the proteases, lipases and NodO a conserved motif of a negatively charged amino acid followed by several hydrophobic amino acids has been found at the end of the C-terminus (Binet et al., 1997). The C-terminal signal sequence of α-hemolysin was extensively mutagenized, but few individual amino acids were found to affect secretion (Kenny et al., 1992). Because of this lack of sequence similarity and of identified important residues, it is thought that the secretion signal relies on secondary structure to initiate transport.
NMR and circular dichroism studies of the C-terminus of PrtG, HasA (the heme acquisition protein from Serratia marcescens), HlyA (the hemolysin from E. coli) and LktA (the leukotoxin from Pasteurella haemolytica) have shown that there are two α-helices in the C-terminus (Wolff et al., 1997; Wolff et al., 1994; Yin et al., 1995). Mutation of these α-helical regions in HlyA and LktA showed that the secretion signal appears to bind to a pocket in the ABC-transporter and induce a conformational change that causes transport to occur (Zhang et al., 1998). Presented in this thesis is evidence that all three components of a type I secretion system have been found in C. crescentus and that these components are required for the secretion of RsaA. They have greatest similarity to the protease type I secretion systems from P. aeruginosa and E. chrysanthemi, and the proteases from these systems can be secreted by the Rsa system.

The S-layer subunits from other Caulobacter species appear to be secreted by type I systems.

Several FWC species with S-layers have been isolated from a wide number of aquatic sources (MacRae and Smit, 1991; Walker et al., 1992). The subunits of these S-layers react with anti-RsaA antibody and their smooth LPS reacts with antibody raised against the smooth LPS of NA1000. The S-layer subunits from these FWC species range in size from 100 to 193 kDa and can be removed from the bacterium's surface using low pH or EGTA (Walker et al., 1992). Portions of the genome of the FWC species with S-layers hybridize to the rsaA gene while the genomes of FWC species without S-layers do not (MacRae and Smit, 1991). It is shown in Ch. 5 that the protease AprA from P. aeruginosa was expressed and secreted in some of these FWC species. These facts suggest that type I secretion mechanisms secrete the S-layer subunits in the FWC species.
Since the FWC species secrete S-layer subunits that vary widely in size, comparing these subunits and their corresponding secretion systems should help determine how the mechanisms work, which parts of the protein are essential for secretion and which parts provide specificity. With these goals in mind, procedures are reported here for the characterisation of the S-layer subunit, ABC-transporter and MFP genes from various FWC species.

The S-layer is attached to the surface of C. crescentus using a species of smooth LPS.

The outer membrane of Gram-negative bacteria contains phospholipids, proteins and LPS (Nikaido and Vaara, 1985). In many cases, including C. crescentus, there is also an extracellular polysaccharide (EPS) (Ravenscroft et al., 1991); the S-layer is external to all of these molecules (although the EPS may pass through the S-layer). Smooth LPS (S-LPS) is a major component of the outer membrane of Gram-negative bacteria and consists of three regions. The lipid A moiety is the endotoxic part of LPS and is anchored in the outer leaflet of the outer membrane. The core, a branched-chain oligosaccharide linked to ketodeoxyoctulosonic acid (KDO), is attached to the lipid A molecule. Extending from the core is the O-antigen, which contains a repeating linkage of oligosaccharides (Schnaitman and Klena, 1993). It has been found in C. crescentus that the S-LPS anchors the S-layer to the cell surface via a noncovalent interaction with the N-terminus of RsaA. Immunolabelling showed that the S-LPS is completely occluded by the S-layer (Walker et al., 1994). Isolation and characterization of the S-LPS showed that the core sugars and fatty acids are identical to those of the rough LPS and that the O-antigen is of a homogeneous length, unlike the variable-length S-LPS found in many enteric bacteria.
Previous reports (Walker et al., 1994) indicated that the O-antigen was composed of a 4,6-dideoxy-4-amino-hexose, a 3,6-dideoxy-3-amino-hexose and glycerol, but recent results (Smit, unpublished) indicate that glycerol is a contaminant of the S-LPS isolation procedure, and that the 3,6-dideoxy-3-amino-hexose assignment is likely due to a co-purifying polymer. It therefore seems possible that the O-antigen is composed solely of a 4,6-dideoxy-4-amino-hexose. Anomeric traces found by analysis of proton NMR spectra indicate that the linkages between the 4,6-dideoxy-4-amino-hexose residues are not identical, implying the involvement of a larger number of glycosyltransferases than needed for a simple polymer with only one kind of linkage. These data are consistent with the information presented in this thesis. I have found a number of S-LPS synthesis genes, indicating that C. crescentus may make perosamine, a 4,6-dideoxy-4-amino-hexose, and that perosamine is likely a component of the S-LPS. A number of glycosyltransferases were also found, as would be expected given that several transferases would be required to produce the different linkages that result in the different anomeric proton traces found by proton NMR. Evidence is presented in this thesis demonstrating how RsaA is secreted and how the S-LPS, involved in attachment of the S-layer, is synthesized. Three genes encoding the ABC-transporter, MFP and OMP of a type I secretion system required for secretion of RsaA in C. crescentus are described. A type I secretion system is also required for secretion of the S-layer subunits of other FWC species. The genes required for the secretion of RsaA and the synthesis of S-LPS are linked, leading to the discovery of a number of putative genes involved in the synthesis of the S-LPS required for S-layer attachment. Additional genes involved in synthesis of the S-LPS were discovered by Tn5 mutagenesis.
Chapter 2

Materials and Methods

Strains, plasmids and growth conditions. All strains, libraries and plasmids used in this study are listed in Table 2-1. Plasmids with NA1000 DNA inserts are listed in Figure 2-1. The E. coli strains DH5α, JM109 or RB404 were used for all E. coli cloning manipulations. E. coli was grown at 37°C in Luria broth (1% tryptone, 0.5% NaCl, 0.5% yeast extract), with 1.2% agar for plates. C. crescentus strains were grown at 30°C in PYE medium (0.2% peptone, 0.1% yeast extract, 0.1% CaCl2, 0.2% MgSO4, with 1.2% agar for plates). When appropriate, ampicillin was used at 100 µg/ml, streptomycin at 50 µg/ml and kanamycin at 50 µg/ml in both C. crescentus and E. coli; tetracycline was used at 0.5 µg/ml and 10 µg/ml, and chloramphenicol at 2 µg/ml and 20 µg/ml, in C. crescentus and E. coli, respectively.

Recombinant DNA manipulations. Standard methods of DNA manipulation and isolation were used (Sambrook et al., 1989). Electroporation of C. crescentus was performed as previously described (Gilchrist and Smit, 1991). Southern blot hybridizations were done according to the membrane manufacturer's instructions (Amersham Hybond-N). Southern blot analysis allowing up to 30% mismatch between the probe and chromosomal DNA was performed in an identical manner except that the hybridization step was performed at 50°C instead of 65°C. Blots were washed twice for 15 min at room temperature with 2X SSPE (0.18 M NaCl, 0.01 M NaPO4, 0.001 M EDTA, pH 8.0), 0.1% SDS, and once for 15 min at 50°C with 1X SSPE, 0.1% SDS. Radiolabeled probes were made by nick translation following the DNase/DNA Pol manufacturer's instructions (GIBCO/BRL). Chromosomal DNA was isolated as previously described (Yun et al., 1994). PCR products were generated using the primers listed in Table 2-2. PCR was performed using Taq polymerase (BRL), following the manufacturer's suggested protocols.
Table 2-1. Strains and plasmids used in this study.
Columns: strain or plasmid; relevant characteristics; reference or source.
E. coli JM109: recA1, endA1, gyrA96, thi, hsdR17, supE44, relA1, Δ(lac-proAB), λ⁻, [traD36, proAB, lacIq, lacZΔM15]
E. coli RB404: F⁻ dam-3, dam-6, metB1, galK2, galT22, lacY1, thi-1, tonA31, tsx-78, mtl-1, supE44
E. coli DH5α: recA1, endA1, gyrA96, thi, hsdR17, supE44, relA1, Δ(lacZYA-argF)U196
C. crescentus NA1000, JS1001, JS1003, JS3001, JS4000
[The remaining table entries and Figure 2-1 are not recoverable from the extracted text.]

Annealing temperatures (Ta) 2°C below the melting temperature (Tm) of the primers were used. Extension times (te) were based on 60 sec/1000 bp of DNA. General PCR parameters were 95°C - 30 sec, Ta - 30 sec, 72°C - te. The vector pBSKS+ was cut at the EcoRV site and T-tailed (Holton and Graham, 1991) and the PCR product was ligated into this vector.

Cloning of chromosomal DNA adjacent to Tn5 insertions: Chromosomal DNA of the Tn5 mutant was cut with BamHI, SalI or XmaI. BamHI fragments were cloned directly into the BamHI site of pTZ18 vectors. A second method used for isolating the chromosomal DNA adjacent to the Tn5 insertions involved an inverse PCR method developed by V. Martin (Martin and Mohn, 1999).
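The PCR cycling rule stated above (annealing 2°C below the primer melting temperature, extension at 60 sec per 1000 bp) can be written as a short calculation. This is a sketch under stated assumptions: the function name is hypothetical, the melting temperatures are taken as inputs because the text does not say how they were calculated, and using the lower Tm of a primer pair is an assumption.

```python
def cycling_program(primer_tms_c, product_bp):
    # Annealing temperature: 2 deg C below the primer Tm.
    # Assumption: for a primer pair, the lower of the two Tm values is used.
    ta = min(primer_tms_c) - 2
    # Extension time: 60 sec per 1000 bp of expected product.
    te = 60 * product_bp / 1000
    # Each step is (name, temperature in deg C, duration in sec).
    return [
        ('denature', 95, 30),
        ('anneal', ta, 30),
        ('extend', 72, te),
    ]

# Hypothetical example: primers with Tm 66 and 64 deg C, 1500 bp product.
for step in cycling_program([66, 64], 1500):
    print(step)
```

The number of cycles is deliberately left out because it is not given in the text.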
Table 2-2. Primers used for PCR for this report.
Columns: PCR product; forward primer name-sequence (5'-3'); reverse primer name-sequence (5'-3').

RAT5
Forward: RsaD-A-CGGAATCGCGCTACGCGCTGG
Reverse: RsaE1-GGGAGCTCGAAGGGTCCTGA

Degenerate primers for RsaF search
Forward: F60-(GC)CG(GC)(AGT)(GC)(GTC)(GC)(GC)(GC)(CT)T(CG)CT(CG)CC(CG)CAGCT(CG)G
Forward: FB110-CT(GC)(CA)(GC)CAG(AC)C(GC)AC)T(GC)TTCGAC
Reverse: IF340-GCCGCC(CG)(CGT)(TAG)(GA)(TA)A(GC)A(GT)(GC)GG(GC)AG(GC)(TCG)(TA)(CG)T
Reverse: IFB415-CTG(TC)TC(GC)GC(GC)(AT)(CT)(GC)AG(GC)ACGTC

Inverse PCR to obtain chromosomal DNA next to Tn5 insertion
Forward: Tn5 universal-GGTTCCGTTCAGGACGGCTAC
Reverse: Tn5Xma1-AGGCAGCAGCTGAACCAA
Reverse: Tn5Sal1-ATGCCTGCAAGCAATTCG

Degenerate primers for amplification of internal portion of RsaD homologues in FWC species
Forward: RD43B-TA(TC)ATGCT(GC)CAGGT(GC)TAT(GC)ACCGIG
Reverse: IRD477B-C(GC)A(GT)(GC)CGCTG(GC)CGCTGGCCGC

Unsuccessful PCR of rsaF (1984)
Forward: RsaF140-GCGGTCGAGCAGGGGGTGCT
Reverse: RsaFIEND-ACGAATCCTTGCGCGCCTTGG

Amplification of pUC type vectors
Forward: TZ1920-GAGGCCTAGTACTCTGTCAGACCAAGTTTACTCATA
Reverse: TZI1060-GAGGCCTACTCTTCCTTTTTCAATATTATTGAA

Amplification of gcc1984 (numbers correspond to bp in contig)
Forward: Gcc1984-28-CGCTCTACACCGGCGGTCGCGCCAGCGC
Forward: Gcc1984-1407-GCCGGAACCCGAACCTGAACCGGTGTCG
Reverse: Gcc1984-I1200-GGAGCTCTGGCGCCCCACCAGGGACGCGTAGAACG
Reverse: Gcc1984-I2143-GTGGTCGGTGCCCGGCAGCCACAGGG

Amplification of gcc973 (numbers correspond to bp in contig)
Forward: Gcc973-1600-GGAATCCATGTCACATGGGAAGAGACGGTCCGCCGT
Reverse: Gcc973-I2310-GCTGGCGCCCCACCAGGGACGCGTAGAACG

Construction of plasmid vectors that replicate in C. crescentus: The plasmid pBBR5 was constructed from the plasmids pBBR1MCS (Kovach et al., 1994) and pHP45Ω-Tc (Fellay et al., 1987). The Ω-Tc fragment from pHP45Ω-Tc was removed using HindIII and the ends were blunted using T4 polymerase. A 0.3 kbp portion of the Cmʳ gene was removed from pBBR1MCS by cutting with DraI and replaced with the blunted Ω-Tc fragment, producing a Tcʳ broad host range vector that replicates in C. crescentus.
The plasmid pBBR3 was constructed in an identical manner except that the plasmid pHP45Ω-Sm (Fellay et al., 1987) was used to provide a Smʳ marker. Both these plasmids were constructed by John Nomellini.

Construction of vectors that replicate only in E. coli: The vector pTZ18U(CHE) was constructed by amplification of all of pTZ18U except the Apʳ gene using the primers TZ1920 and TZI1060, which were designed with StuI sites. The PCR product was cut with StuI and a Cmʳ gene (Morales et al., 1991) with blunt ends was inserted into the site.

Tn5 mutagenesis. Tn5 mutagenesis was accomplished using the narrow host range (ColE1 replicon) plasmid pSUP2021 (Simon et al., 1983), which is not maintained in C. crescentus. The plasmid was introduced by electroporation and 20,000 colonies that were streptomycin and kanamycin resistant were pooled and frozen at -70°C; aliquots were used for subsequent screening. Southern blot analysis of chromosomal DNA isolated from the Tn5 library was used to assess the randomness of insertions. Hybridization with a Tn5 probe, pUC8neoR, indicated that while there were some hot spots of Tn5 integration, the Tn5 insertions were randomly distributed throughout the chromosome (data not shown).

SDS-PAGE and Western blot analysis. Proteins and S-LPS were isolated from C. crescentus as previously described (Walker et al., 1992; Walker et al., 1994). SDS-polyacrylamide gel electrophoresis (PAGE) and Western immunoblot analysis were performed as previously described (Walker et al., 1992). After transfer of proteins to nitrocellulose, the blots were probed with polyclonal antibody and antibody binding was visualized using goat anti-rabbit serum coupled to horseradish peroxidase and colour-forming reagents (Smit and Agabian, 1984). To detect C. crescentus whole cells synthesizing an S-layer, a colony blot assay was used (Bingle et al., 1997a).
Briefly, cell material was transferred to nitrocellulose by pressing the membrane onto the surface of an agar plate. The membrane was air dried for 10 to 15 min, washed in a blocking solution (3% skim milk powder, 20 mM Tris (pH 8.0), 0.9% NaCl) with vigorous agitation on a rotary shaker and then processed in the standard fashion (Bingle et al., 1997a). Surface protein from C. crescentus cells was extracted using pH 2.0 HEPES buffer as described by Walker et al. (1992). To compare the amounts of surface protein extracted from different mutants, equal amounts of cells growing at log phase were harvested and equal amounts of the protein extract were loaded on the protein gel. SDS-PAGE and Western blotting were performed according to standard procedures (Sambrook et al., 1989).

Isolation of cosmids containing rsaA, rsaD and rsaE. The NA1000 and JS4000 cosmid libraries were probed with radiolabeled rsaA, using the plasmid pUC9 rsaAΔNΔC. Five cosmids were isolated from the NA1000 library and four from the JS4000 library. Southern blot analysis of the cosmids hybridizing to the probe was used to determine which cosmids contained DNA 3' of rsaA. An 11.7 kb SstI-EcoRI fragment containing rsaA plus 7.3 kb of 3' DNA was isolated from one of the NA1000 cosmids and cloned into the SstI-EcoRI site of pBSKS+; the resulting plasmid was named pRAT1. The 3' end of the cloned fragment consisted of 15 bp of pLAFR5 DNA containing Sau3A, SmaI and EcoRI sites. BamHI fragments from the NA1000 cosmid were subcloned into the BamHI site of vector pTZ18R for sequencing. The 3' end fragment was subcloned using BamHI and EcoRI into pTZ18R. The 5' end fragment was subcloned using SstI-HindIII into pTZ18R. A cosmid containing the rsaA, rsaD and rsaE genes was isolated from the JS4000 cosmid library and pieces were subcloned as BamHI fragments in pTZ18U for sequencing.
HindIII/BamHI fragments containing the rsaA gene were cloned directly from the genome of JS4000 and JS3001 by isolating bands of the correct size from an agarose gel and ligating to pUC8. Colonies were probed with rsaA from NA1000 for plasmids containing the correct insert. These clones were subcloned in three pieces as HindIII/ClaI, ClaI/EcoRV and EcoRV/BamHI fragments into pUC type vectors. ClaI sites for cloning were generated in the vector by cutting with BamHI and filling in the 5' overhangs with Klenow fragment. Ligation of the blunt ends then produces a ClaI site.

Isolation of FWC S-layer subunit genes. FWC27 chromosomal DNA was digested with BamHI and PstI. The digested DNA was ligated to a pTZ19U vector also digested with BamHI and PstI. A portion of the ligation mixture was electroporated into E. coli JM109 and allowed to incubate at 37°C for 1 hour in 1 ml of Luria broth. The mixture was divided evenly, spread on 10 agarose plates and incubated overnight. The colonies were adsorbed to sterile filter paper (Whatman 541) and then lysed by soaking the filter paper in 0.5 M NaOH for 5 min. The filter paper was neutralized by soaking it twice in 1 M Tris-HCl (pH 7.0) for 5 min. The filter was then soaked in 0.5 M Tris-HCl (pH 7.0), 1.5 M NaCl, washed with 70% EtOH and baked at 80°C for 2 hours. The filters were then probed with pUC8neoR using the Southern blot hybridization procedure allowing 30% mismatch (see above).

Nucleotide sequencing and sequence analysis. Sequencing was performed on a DNA sequencer (Applied Biosystems model 373). After use of universal primers, additional sequence was obtained by 'walking along' the DNA using 15-20 bp primers based on the acquired sequence. DNA was sequenced in both directions for all original sequence; thereafter, DNA was only sequenced in both directions when ambiguities were found.
Nucleotide and amino acid sequence data were analyzed using Geneworks and MacVector software (Oxford Molecular Group) and the NCBI BLAST e-mail server using the BLAST algorithm (Altschul et al., 1990). Primers were designed with the help of MacVector and Amplify 1.2 (Engles, 1993). Protein alignments were generated using the ClustalW algorithm as implemented by the MacVector software using the default settings. The sequences for NA1000 rsaADEF and lpsABCDEF were submitted to GenBank and can be accessed as AF06235. The sequences for JS3001 rsaA and JS4000 rsaADE can be accessed using the accession numbers AF193063 and AF193064. Preliminary sequence data of the C. crescentus genome was obtained from The Institute for Genomic Research through the website at http://www.tigr.org. Signal peptide predictions were made using the SignalP web server (http://www.cbs.dtu.dk/services/SignalP/) (Nielsen et al., 1997).

Chapter 3

Secretion of RsaA

Introduction

The major purpose of my thesis was to elucidate the transport pathway of RsaA. The strain NA1000 was chosen for these studies because rsaA had originally been isolated from NA1000 and it is this gene that has been sequenced and used for all recombinant manipulations in the Smit Lab. In addition, a number of useful mutants, with and without S-layers, have been derived from NA1000. The lack of a cleaved secretion signal, the presence of calcium repeats, the absence of a periplasmic intermediate and a C-terminal secretion signal indicated that RsaA was probably transported using a type I secretion system (Bingle et al., 1999; Bingle et al., 1996; Bingle et al., 1997a; Bingle et al., 1997b; Bingle and Smit, 1994), in which case other proteins would be required for secretion.

Results and Discussion

C. crescentus was screened for genes involved in the secretion of the S-layer subunit, RsaA.
Since a type I secretion system uses 3 main proteins to form the transport mechanism, it was necessary to devise a method for finding the genes coding for the components by screening for the loss of RsaA secretion. Unfortunately, there is no easy method to detect the presence of RsaA on the exterior of a colony, as there is for α-hemolysin or the metalloproteases, which can be detected using blood or skim milk plates (Mackman et al., 1985; Wandersman et al., 1987). Previous research had shown that the lytic phage φCR30 could only infect C. crescentus when an S-layer was present (Edwards and Smit, 1991). This phage was isolated using the strain CB15BE, a derivative of ATCC 19089, as is NA1000. When the phage was used to lyse NA1000 cells with an S-layer using an moi of 10⁴, it was found that spontaneous mutants occurred at a high frequency of approximately 10⁻⁵. When these mutants were examined, it was found that approximately 15% had lost their S-layer while the remaining 85% still retained their S-layer and were susceptible to re-infection. Evidently, the phage was not lysing all the bacteria with an S-layer, since these bacteria still behaved like the wildtype strain. In the bacteria that no longer had an S-layer, RsaA secretion was restored if a plasmid carrying the rsaA gene was expressed inside the bacterium (data not shown). It seems that the rsaA gene is a more likely target for mutation when selection pressure against the S-layer is applied. This is in agreement with the observation that many bacteria lose their S-layers during sub-culturing in the laboratory environment. This method was discarded in favour of a colony immunoblot assay, which was much more labour intensive but did not have a high background. For the colony immunoblot assay, two polyclonal primary antibodies were used: α-RsaA (Walker et al., 1992) and α-S-LPS (Walker et al., 1994).
α-RsaA reacts with RsaA and α-S-LPS reacts with the smooth LPS required for the anchoring of the S-layer to the surface of the bacterium (Walker et al., 1994). When α-RsaA was used, colonies with an S-layer reacted with the antibody and appeared as a spot on the blot (Fig. 3-1). It was also found that a 'halo' could be detected around colonies when the S-layer could not anchor to the cells (e.g., cells with a defective S-LPS). The halo occurred when shed S-layer diffused away from the colony and was detected by α-RsaA as a ring around the colony (Fig. 3-1). When α-S-LPS was used, the antibody reacted to exposed S-LPS only when the cells of a colony lacked an S-layer; S-layer blocks the binding of α-S-LPS. RsaA appears to be completely degraded when it is not secreted (Bingle et al., 1996; Bingle and Smit, 1994); therefore, cell lysis during this procedure and release of unsecreted RsaA was not a concern. Using this method, it was possible to differentiate between cells secreting RsaA, cells secreting and shedding S-layer and cells without an S-layer.

Figure 3-1. Colony immunoblot. Example of an immunoblot using α-RsaA against colonies demonstrating the different phenotypes exhibited. Colonies shown: NA1000 (wildtype), JS1003 (S-layer negative), JS1001 (S-LPS negative).

Identification of Tn5 mutants lacking an S-layer. A pooled NA1000 Tn5 library was screened for S-layer negative mutants using the Western colony immunoblot assay. In total, 9,000 colonies from the pooled Tn5 mutant library were screened using α-S-LPS antibody and 22,000 colonies were screened using α-RsaA. Eighteen Tn5 S-layer negative mutants were found. SDS-PAGE and Western blot analysis of whole cell lysates and culture supernatants confirmed that no S-layer was found in or on the cells or in the culture supernatant of these mutants (data not shown).
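The way the α-RsaA colony immunoblot was read, as described above, amounts to a small decision rule. The sketch below only restates that rule; the function and argument names are hypothetical, and treating a halo as decisive regardless of the colony spot is an assumption.

```python
def classify_colony(spot, halo):
    # spot: the colony itself reacts with the anti-RsaA antibody.
    # halo: a ring of anti-RsaA signal surrounds the colony (shed S-layer).
    if halo:
        # Assumption: a halo is treated as decisive even if a spot is also seen.
        return 'secreting and shedding S-layer'
    if spot:
        return 'S-layer present'
    return 'S-layer negative'

# The three phenotypes distinguished in the screen.
for observation in [(True, False), (True, True), (False, False)]:
    print(observation, classify_colony(*observation))
```

Colonies falling in the last class are the candidates that were then checked by SDS-PAGE and Western blot analysis.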
One mutant, B12, on further examination was found to have an S-layer and was kept for use as a random Tn5 mutation control. Twenty-six Tn5 mutants with a shedding phenotype were also isolated during the screening and are described in Ch. 6. Identification of Tn5 mutants defective in RsaA secretion. Several possible Tn5 insertion events, in addition to those in secretion genes, could result in an S-layer negative phenotype. To eliminate Tn5 insertions in the rsaA gene, Southern blot analysis was performed on the S-layer negative mutants. Eleven of the mutants contained Tn5 insertions in rsaA and were not further characterised. Five mutants, B5, B9, B13, B15 and B17, contained insertions in the DNA immediately 3' of rsaA and one mutant, B2, had a Tn5 integration elsewhere on the chromosome (Fig. 3-2). These six mutants represented possible RsaA translocator mutants. Figure 3-2. S-layer negative Tn5 insertions. Graphical representation of the positions of Tn5 insertions from mutants that no longer secreted RsaA. B = BamHI, H = HindIII, S = SstI. Triangles indicate Tn5 insertion points. To determine whether the loss of S-layer was caused by a mutation affecting regulation of the gene, rsaA was expressed in the mutants under the control of a lacZ promoter, using the plasmid pRK415rsaAΔPK. This construct restored RsaA production in JS1003 and B1, mutants with an interrupted rsaA gene, although wildtype RsaA expression levels were not reached. None of the five mutants with a Tn5 insertion in the DNA immediately 3' of rsaA secreted RsaA when rsaA was expressed in trans in this manner (Fig. 3-3). In contrast, the one mutant (B2) where the Tn5 insertion was not adjacent to the rsaA gene produced an S-layer when complemented with the plasmid pRK415rsaAΔPK. This indicates that the B2 insertion was not in a gene involved in RsaA secretion. 
B2 may have an interruption in a gene responsible for regulation of RsaA production or, possibly, the Tn5 insertion mutation does not eliminate secretion and a second mutation in rsaA was responsible for the loss of secretion. Figure 3-3. Complementation of Tn5 mutants with rsaA. Protein was extracted from the surface of the Tn5 mutants and JS1003 carrying the plasmid pRK415rsaAΔPK, which expresses RsaA under control of the lac promoter, and from wildtype and rsaA knockout mutants that did not contain any plasmid, to demonstrate differences in expression. Equal amounts of surface extracts were loaded on the gel and a Western performed using polyclonal antibody against RsaA. Lanes 2 through 10 are surface extractions from cells containing the plasmid pRK415rsaAΔPK, indicated by (ΔPK). The lanes are as follows: 1, purified RsaA; 2, JS1003(ΔPK); 3, B9(ΔPK); 4, B13(ΔPK); 5, B1(ΔPK) (a Tn5 insertion in rsaA); 6, B5(ΔPK); 7, B15(ΔPK); 8, B17(ΔPK); 9, B2(ΔPK); 10, B12(ΔPK) (a random Tn5 insertion); 11, JS1003 (rsaA-); 12, NA1000 (wildtype). The arrow indicates wildtype RsaA. Isolation and sequencing of DNA near rsaA. A previously constructed cosmid library was used to isolate an 11.8 kb DNA fragment containing rsaA plus 7.3 kb of 3' DNA. This fragment was cloned into pBSKS+, forming the plasmid pRAT1, and sequenced to search for translocator genes. An open reading frame (ORF) was found 5' of rsaA, confirming earlier results (Fisher et al., 1988), and 5 ORFs were found 3' of rsaA (Fig. 3-4). A search of sequence databases showed that the two ORFs immediately 3' of rsaA encoded proteins with significant similarity to the ABC transporter and membrane fusion proteins (MFP) of two type I secretion systems: the alkaline protease transport system of P. aeruginosa (Guzzo et al., 1990) and the metalloprotease transport system of E. 
chrysanthemi (Letoffe and Wandersman, 1992) (Figs. 3-5, 3-6). The first ORF was 1734 bp long and started 246 bp after the termination codon of rsaA. This ORF was predicted to code for a 578 amino acid protein with a predicted molecular weight of 62.0 kDa and pI of 9.02. Alignments of the predicted amino acid sequence show that the putative protein is 46% identical and 69% similar to AprD from P. aeruginosa and 33% identical and 62% similar to PrtD from E. chrysanthemi. The gene was designated rsaD because of this similarity (Fig. 3-5). RsaD exhibits several N-terminal hydrophobic domains that may be transmembrane regions and a possible ATP binding site in the C-terminal half of the protein. The predicted protein contains Walker A, Walker B, and ABC signature motifs as well as the newly discovered E. coli motif (hhhhH). These motifs are highlighted in Fig. 3-5. Figure 3-4. Genes 3' of rsaA. Graphic showing the ORFs found after sequencing the plasmid pRAT1 (rsaA, rsaD, rsaE and LPS synthesis genes). B = BamHI, E = EcoRI, H = HindIII, S = SstI. 
Figure 3-5. ClustalW alignment of ABC-transporters. Alignment of RsaD with AprD (Accession number CAA05795), PrtD (AAB03671), HasD (CAA57069) and LipB (BAA08631), which are the most closely related ABC transporters. The green box surrounds the Walker A motif, the blue box surrounds the Walker B motif, the red box surrounds the ABC motif and the yellow box surrounds the fourth ABC transporter motif recently discovered in most E. coli ABC transporters. 
Figure 3-6. ClustalW alignment of MFPs. Alignment of RsaE with AprE (Accession number CAA45856), PrtE (CAA37343), HasE (CAA57067) and LipC (BAA08632), which are the most closely related MFPs. RsaD was predicted to have an insertion signal sequence consistent with insertion of the RsaD protein in the cytoplasmic membrane. The second ORF started 68 bp after rsaD, contained 1308 bp and encoded a protein of 436 residues with a predicted molecular weight of 48.4 kDa and pI of 6.59. Alignment of the predicted protein shows that the sequence is 28% identical and 50% similar to AprE from P. aeruginosa and 29% identical and 52% similar to PrtE from E. chrysanthemi. The gene was designated rsaE because of this similarity (Fig. 3-6). The deduced protein sequence of rsaE was predicted to have a typical N-terminal insertion signal sequence that would direct it to the inner membrane. Possible ribosome binding sites were found 7 bp and 8 bp upstream of the ATG initiation codon for rsaD and rsaE, respectively. 
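The sizes quoted for the two ORFs are consistent with simple codon arithmetic. The following is a minimal Python sketch, not part of the original analysis. It assumes the quoted lengths (1734 bp and 1308 bp) count sense codons only, which the text does not state explicitly, and the function names are hypothetical.

```python
def residues_from_orf(length_bp):
    # Assumption: length_bp counts sense codons only. If the stop codon
    # were included, the protein would be one residue shorter.
    if length_bp % 3 != 0:
        raise ValueError('ORF length must be a multiple of 3')
    return length_bp // 3


def mean_residue_mass_da(mass_kda, n_residues):
    # Average mass per residue in daltons, a quick plausibility check
    # on a predicted molecular weight.
    return 1000.0 * mass_kda / n_residues


# Values taken from the text: (ORF length in bp, predicted mass in kDa).
for name, bp, kda in (('rsaD', 1734, 62.0), ('rsaE', 1308, 48.4)):
    n = residues_from_orf(bp)
    print(name, n, round(mean_residue_mass_da(kda, n), 1))
```

Both predicted proteins come out near 110 Da per residue, in the range usually seen for proteins, so the reported lengths and masses agree with each other.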
There was no indication of a promoter immediately 5' of either rsaD or rsaE, but there was a putative rho-independent terminator immediately after the stop codon of rsaE, suggesting that they may be part of a polycistron which includes rsaA. It has been found in the type I secretion systems secreting E. coli α-hemolysin and E. chrysanthemi metalloprotease that the genes are part of an operon consisting of the substrate and the transport genes. It seems likely that transcription of the Rsa genes is similar. Three more ORFs were found 3' of rsaE. None of these ORFs encoded proteins similar to the third component of type I secretion systems. Instead, these ORFs encoded proteins similar to those involved in synthesis of perosamine, a dideoxyaminohexose (see Ch. 6). The chromosomal DNA near the B1, B2, B5, B9, B13, B15 and B17 Tn5 insertions was isolated and sequenced to determine the Tn5 insertion point. It was found that the B1 Tn5 interrupts rsaA, as expected from the Southern blot analysis. B5 and B13 are identical insertions interrupting the N-terminus of RsaD while B17 is located 22 amino acids from the C-terminus. B9 and B13 are Tn5 insertions in rsaE. The sequence interrupted by the B2 Tn5 insertion has no sequence similarity to any known proteins. Complementation of the secretion-defective Tn5 mutants. To demonstrate that the Tn5 insertions were responsible for the secretion defect, the mutations were complemented in trans. First, the cosmid 17A7, containing the entire Rsa locus, was introduced into the mutants. All attempts at complementation using this cosmid were unsuccessful, including an attempt to restore RsaA production in JS1003 (which contains an inactivated rsaA gene). Since RsaA production in JS1003 can be restored with other plasmids containing rsaA, it is believed that expression of the genes was too low for complementation. 
A PCR product containing the genes rsaD and rsaE was generated and cloned into a suitable expression vector; the result was named pRAT5:pRK415 (see Ch. 2). This plasmid was introduced into the Tn5 mutants B15 and B17. With this plasmid, mutant B17 secreted RsaA while the B15 mutant did not (Fig. 3-7A). Figure 3-7. Complementation of transport deficient mutants using rsaD and rsaE. Westerns of surface extracted protein using anti-S antibody. A) Lanes are as follows: 1, B17 (DE); 2, B15 (DE); 3, B1 (DE); 4, B17 (17A7); 5, B15 (17A7); 6, JS1003; 7, NA1000. (DE) indicates that the cells carried the plasmid pRAT5:pRK415 containing the genes rsaD and rsaE. (17A7) indicates that the cells carry the cosmid 17A7 containing the entire Rsa operon. Equal amounts of surface extract were loaded in all lanes. The arrow indicates full length RsaA. B) Lanes are as follows: 1, B1 (DE); 2, B5 (DE); 3, B9 (DE); 4, B15 (DE); 5, B17 (DE); 6, NA1000. (DE) indicates that the cells carry the plasmid pRAT5:pBBR5 expressing the genes rsaD and rsaE. Equal amounts of surface extract were loaded in all lanes except lane 6, where only one quarter of the amount loaded in the other lanes was used. The arrow indicates full length RsaA. To address the problems with B15 complementation, a new tetracycline-resistant (Tc^r) broad host range vector, pBBR5, was constructed. It was hoped that this vector would have a higher copy number and give higher expression of the Rsa genes, alleviating the problems encountered when using pRK415 or pLAFR5 (the cosmid vector). In the resulting constructs a lac promoter is used for transcription of the rsaD and rsaE genes in pRAT5:pBBR5 and the rsaA, rsaD and rsaE genes in pRAT4AH:pBBR5. 
When pRAT5:pBBR5 was introduced into the mutants B1, B5, B9, B15 and B17, Western blot analysis showed that the mutants with defective rsaD or rsaE genes expressed RsaA on the surface while the rsaA mutant B1 did not (Fig. 3-7B). When pRAT4AH:pBBR5 was expressed in the same mutants, RsaA was only found on the surface of the B1 and B17 mutants (data not shown). The ability to complement the Tn5 insertions in rsaD and rsaE using pRAT5:pBBR5 expressing rsaD and rsaE in trans indicates that these genes are responsible for the secretion of RsaA. The lack of complementation in some cases was probably the result of lower expression of the Rsa genes. It was necessary to use Tc to maintain the vectors, as Tn5 confers kanamycin and streptomycin resistance, but C. crescentus does not tolerate Tc well. When cells carrying the Tc resistance marker are exposed to even low levels of Tc (0.5 ng\/ml), they appear anomalous by microscopy. The cells are often severely elongated and there are few motile cells. It was difficult to grow cultures carrying Tc^r plasmids with the Rsa genes to densities high enough to extract sufficient protein to be seen on the Western blot. It seems probable that the Tc was causing membrane abnormalities and that these factors contributed to lower expression of the Rsa genes with all the plasmids. The cosmid 17A7 has only 1-2 copies per cell and, similarly, pRAT5:pRK415 would be maintained at 2-3 plasmids per cell (Keen et al., 1988). Preliminary experiments with pBBR5 suggest that it has a much higher copy number than either pLAFR5 or pRK415 based vectors, which would result in higher expression of any genes that pBBR5 carries (data not shown). Expression levels would also be affected by the promoter transcribing the genes. The lac promoter transcribes at higher levels than the wildtype rsaA promoter (Yap et al., 1994). 
In addition, in the cosmid and pRAT4AH:pBBR5, rsaD and rsaE are either transcribed by their wildtype promoter or as part of the rsaA transcript as described above. In either case, less transcript would be produced than from the lacZ promoter of pRAT5:pBBR5. These data suggest why complementation occurred only in some cases. The plasmid pRAT5:pBBR5 (strong promoter and high copy number) produced the highest levels of RsaD and RsaE, allowing full complementation of all the transport mutants, while the cosmid 17A7 (weaker promoter and low copy number) produced the lowest levels and could not complement any of the mutants. The plasmids pRAT5:pRK415 (strong promoter and low copy number) and pRAT4AH:pBBR5 (weak promoter and high copy number) probably make an intermediate amount of protein that is only enough to complement the mutant B17. This mutant may differ from the others because the Tn5 insertion is only 22 amino acids from the C-terminus [...] interrupted (Fig. 3-8). Figure 3-8. Expression of prtB in C. crescentus. PrtB was expressed in all the [...] plates containing 1% skim milk. Halos around colonies indicate that active PrtB is being secreted. Note that NA1000 and B12 cells are producing RsaA as well as PrtB and the halos surrounding these colonies are smaller. B12 represents a random Tn5 mutant control. Smaller zones of clearing are seen around the wildtype strain, NA1000, and the S-layer producing B12 (representing a random Tn5 insertion unrelated to secretion), as compared to JS1003 or B1, where the rsaA gene has been interrupted, suggesting that there was competition between RsaA and PrtB for the secretion machinery, further supporting the supposition that RsaD and RsaE are parts of a type I secretion mechanism. 
Identical results were found when aprA was expressed in the Tn5 mutants (data not shown). Summary Analysis of the region 3' of rsaA revealed the presence of two genes (rsaD and rsaE) encoding proteins with significant sequence similarity to components of the type I secretion systems used by P. aeruginosa and E. chrysanthemi to secrete two different extracellular proteases (Duong et al., 1992; Wandersman et al., 1990). Because interruption of rsaD and rsaE eliminated secretion of RsaA and the defects could be restored by complementation, it was apparent that their gene products make up part of the RsaA translocator machinery. When these results were reported (Awram and Smit, 1998), it was the first example of an S-layer that is secreted using a type I secretion system. 
Before then, S-layers had only been found to be secreted by a type II system (Messner and Sleytr, 1992; Sleytr et al., 1993). It is now known that a protein with amino acid sequence similarity to RsaA is secreted by the S. marcescens type I secretion system (Kawai et al., 1998). In addition, the C. fetus S-layer protein is secreted by a type I secretion mechanism. The C. fetus S-layer shares several features with that of C. crescentus. It is produced by a free-living Gram-negative bacterium, is hexagonally packed, anchors to the cell surface via its N-terminus to a particular species of LPS (Bingle et al., 1997b; Dworkin et al., 1995; Walker et al., 1992) and so far has the greatest similarity of any S-layer protein to RsaA (Gilchrist et al., 1992). The genes for the ABC transporter and the MFP components of type I secretion systems are generally found in an operon that includes the transported protein (Binet et al., 1997; Salmond and Reeves, 1993). In this respect, then, the organization of the rsaA, rsaD and rsaE genes was not surprising. In contrast, the gene encoding the outer membrane protein component of type I secretion systems may or may not be closely linked to the other secretion genes. The third component of the Rsa transporter has now been found 5 kb 3' of rsaE and is described in Ch. 4. A potential Rho-independent terminator sequence is located after the rsaA coding region (Gilchrist et al., 1992). This predicted terminator results in a predicted transcript that closely matches the size of a transcript found using Northern blot analysis (Fisher et al., 1988). In this study, no obvious indications of a promoter were found immediately 5' of either the rsaD or rsaE genes, suggesting that transcription of rsaD and rsaE is similar to transcription of the hlyA, hlyB and hlyD genes of E. coli, where a similar Rho-independent terminator is found after the hlyA gene and terminates most transcripts at this point. 
An anti-terminator, RfaH, can prevent termination and, when it does, a larger transcript including the hlyB and hlyD genes is made (Leeds and Welch, 1996). This transcript is difficult to detect because it has a short half-life, and an analogous transcript in C. crescentus may have been missed in the Northern blot analysis. Transcription of the E. chrysanthemi protease secretion genes appears to be accomplished by a similar method (Letoffe et al., 1990) and it is postulated that the same is true for the Rsa operon. A transcription pattern like this may account for the reduced expression found in the JS1003 and B1 mutants when they are complemented with rsaA. The kanamycin fragment interrupting rsaA in JS1003 does not have a transcription terminator and transcription may continue through to the end of rsaE, resulting in a transcript 1.5 kb longer than the wildtype, which would likely be less stable and result in fewer transport complexes. In B1, it is likely that rsaD and rsaE are transcribed from one of the Tn5 promoters, resulting in decreased amounts of transcript and, in turn, of transport complexes. Type I secretion systems can be grouped into families. The RTX toxins, such as α-hemolysin (E. coli) and leukotoxin (P. hemolytica), comprise one family while extracellular proteases (e.g. AprA, PrtB) and lipase from S. marcescens constitute another (Binet et al., 1997). Within the families there is high sequence similarity, and functional secretion mechanisms can be constructed using components from different members without a dramatic drop in protein transport. Because it has been demonstrated that AprA and PrtB can be secreted from C. crescentus in active form, and the Rsa transport proteins share higher sequence similarity with the protease transporters than with those of the RTX toxins, RsaA can presumably be grouped with the protease family of type I secretion systems. 
Chapter 4: Identification of the Outer Membrane Protein Component of the RsaA Transport Complex. Introduction. The gene encoding the OMP component of the RsaA secretion machinery proved difficult to isolate since it was not found immediately 3' of the MFP, as in many other type I systems. This difficulty has also been found with most of the other type I secretion systems where the OMP is separated from the rest of the transporter complex. In fact, the OMP has only been found in 2 other cases of this type: TolC, required for transport of α-hemolysin in E. coli (Wandersman and Delepelaire, 1990), and HasF, part of the heme transporter in S. marcescens (Binet and Wandersman, 1996). In both of these cases the experimenters had simple, efficient screens to look for mutants. Several different strategies were considered to find the OMP component. As none of the original S-layer negative Tn5 mutants interrupted the OMP gene, and considering the number of mutants screened, it was believed that the NA1000 Tn5 library did not contain the mutant. The Tn5 library may not have been complete or a Tn5 insertion in the OMP gene may have been lethal. If a Tn5 insertion was lethal, there was no point in screening another Tn5 library. It seemed possible that a point mutant with reduced secretion, but without a lethal phenotype, could be constructed. Since a UV\/NTG point mutant library had previously been made by others, it was decided that this library could be screened for an OMP mutant. Alternatively, a functional type I system could be reconstructed, as was done in E. coli using hasDE, the ABC-transporter and MFP genes from S. marcescens, and the OMP gene, tolC (Binet and Wandersman, 1996). This secretion apparatus was capable of secreting the S. marcescens heme-acquisition protein, HasA, as well as AprA and PrtB. The S. marcescens OMP gene, hasF, was then isolated by expressing a protease along with hasDE in an E. 
coli tolC mutant along with a plasmid library of S. marcescens chromosomal DNA, and screening for the presence of protease secretion on skim milk plates. It was hoped that a similar method would be capable of identifying the Rsa OMP gene. A third option for finding the OMP was to screen by similarity to OMP components from other bacteria. There are two ways to approach this. One method is to search the genome of C. crescentus for DNA fragments hybridizing to the genes for OMP components. The other is to compare the sequences of different OMP components to find regions of similarity and design primers with degenerate sequences for PCR amplification of a portion of the OMP DNA sequence that can be used to isolate the complete gene by hybridization. All of these approaches were attempted and are summarized below, but none worked. The OMP gene was eventually found using the partial C. crescentus genome sequence provided by The Institute for Genomic Research (TIGR). Two partial ORFs with similarity to OMP components from other bacteria were found in this sequence data and this information was used to devise strategies to clone the complete sequence and to test which of the two ORFs was a legitimate OMP gene involved in the secretion of RsaA. Results and Discussion. Screening libraries for OMP mutants defective in secretion. Since the original immunoblot assay was very labour intensive, attempts were made to develop a new screening method for finding secretion deficient mutants. The proteases AprA and PrtB are secreted by type I transporters and can be secreted by the Rsa secretion machinery, allowing skim milk plates to be used for rapid screening. Therefore, vectors carrying these genes were designed for screening the libraries. The plasmid pBBR3AprA:pRAT5 was constructed and consists of the aprA gene and the rsaDE genes under the control of separate lacZ promoters. 
The plasmid pBBR3PrtB:pRAT5 is identical to pBBR3AprA:pRAT5 except that the aprA gene is replaced with prtB. When these plasmids were introduced into the UV\/NTG mutant library, no secretion of AprA or PrtB was observed. The rsaDE genes had originally been included in the plasmid to exclude rsaDE mutants from being found during the screening process, but since the plasmid did not work the approach was dropped. When the plasmids pBBR3AprA and pBBR3PrtB were used to express their respective proteases in the UV\/NTG mutant library, a large number of colonies failed to show secretion of the proteases. When some of these colonies were examined, it was found that they were still capable of secreting RsaA. This was an unexpected result, as expression of the proteases in NA1000 results in protease secretion from >99.9% of colonies. It was concluded that these proteases are not tolerated well by C. crescentus and could not be used as a screen. In agreement with this was the observation that C. crescentus colonies expressing the proteases could not be sub-cultured after growing for 5 days, while normally C. crescentus can be sub-cultured even after several weeks. It appeared that the proteases were killing the bacteria (see Ch. 5 for further discussion about protease expression in Caulobacter species). Without a rapid screening method, it was decided to drop screening of mutant libraries in favour of the other approaches. Searching for the OMP using complementation systems. If a complementation system was going to succeed in finding the OMP component, it was necessary to determine if a functional system could be constructed using the C. crescentus transporter components. In many other type I systems the components can be interchanged with components from other bacterial systems and allow heterologous secretion. 
To determine if the Rsa system would work in a similar manner, plasmids expressing RsaD and RsaE were introduced into bacterial hosts along with OMP components from several different bacterial systems. The plasmids pBBR3AprA:pRAT5, pBBR3PrtB:pRAT5 and pRAT4AH were constructed and express either a protease or rsaA along with rsaD and rsaE. These plasmids were introduced into E. coli tolC+ alone or with either of the plasmids pBBR1AprF and pBBR1PrtF, which express OMP components from the Apr and Prt systems. None of these strains secreted either the protease or RsaA (data not shown). Since E. coli is an enteric microorganism and C. crescentus is a free-living groundwater bacterium, their outer membranes are quite different. It is possible that the Rsa transport complex was unable to assemble in the membrane of E. coli. Rhizobium meliloti and Rhizobium leguminosarum are groundwater bacteria living in environments similar to that of C. crescentus and likely have a membrane resembling that of C. crescentus. In addition, the type I secretion systems Nod and Prs, with similarity to the Rsa secretion machinery, have been found in R. leguminosarum (Finnie et al., 1998; Scheu et al., 1992). In R. leguminosarum, as in the Rsa system, the OMP gene of the Prs secretion system has not been found close to the other transport genes; it is expected to be elsewhere on the chromosome and could possibly complement the Rsa machinery. With this in mind, pBBR3AprA:pRAT5, pBBR3PrtB:pRAT5 and pRAT4AH were expressed in R. meliloti and R. leguminosarum. Again, none of the constructs expressed the proteases or RsaA. Further experiments were tried by introducing pBBR3AprA:pRAT5, pBBR3PrtB:pRAT5 and pRAT4AH along with pBBR1AprF and pBBR1PrtF, in various combinations, in the Rhizobium species. In no case was secretion of RsaA or the protease found (data not shown). Sequence similarity to other OMP genes was used to search for the Rsa OMP gene. Southern blots of C. 
crescentus chromosomal DNA were probed with the OMP genes aprF and prtF under conditions allowing 30% mismatch. No hybridization of these probes to C. crescentus DNA was found (data not shown), demonstrating that this method could not be used. A sequence alignment of OMP components revealed areas of sequence identity among the different proteins. The protein sequences of the OMPs from a number of closely related type I transport systems (with OMP genes that are both linked and unlinked to the other transporter genes) were aligned (Fig. 4-1). The OMP HasF was given the highest priority in the comparison because it is from the type I system with an unlinked OMP gene most closely related to the Rsa system. Areas of significant homology were examined for the purpose of designing degenerate primers to amplify a portion of the OMP gene using PCR. Four areas, shown in Fig. 4-1, were chosen for making primers. The primers were designed by taking the consensus amino acid sequence and using the codon preferences of C. crescentus to determine the DNA sequence. The design process was governed by the suggestions in Colnaghi et al., 1996; Maser and Kaminsky, 1998; and Tobin et al., 1997. Figure 4-1. Alignment of OMP components. Arrows are placed above regions of similarity that were used to design degenerate primers. The arrows are colour coded according to the primer they were used to create (see legend). A variety of conditions, as well as different combinations of the primers, were used to amplify fragments from NA1000 chromosomal DNA (see Ch. 2). When the PCR conditions resulted in a product, multiple bands were always seen. Three DNA fragments of the expected size were gel purified and cloned. Sequencing of these products revealed similarity to 23S RNA, poly(3-hydroxybutyrate) biosynthesis genes and NADH dehydrogenase genes. 
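The reason for fixing codons according to host preference can be illustrated by counting how many distinct DNA sequences encode a short peptide under the standard genetic code. This is a minimal Python sketch added for illustration. The peptides used are arbitrary examples and are not the primer regions of Fig. 4-1, and the function name is hypothetical.

```python
# Number of codons per amino acid in the standard genetic code
# (61 sense codons in total).
CODONS_PER_RESIDUE = {
    'A': 4, 'R': 6, 'N': 2, 'D': 2, 'C': 2, 'Q': 2, 'E': 2, 'G': 4,
    'H': 2, 'I': 3, 'L': 6, 'K': 2, 'M': 1, 'F': 2, 'P': 4, 'S': 6,
    'T': 4, 'W': 1, 'Y': 2, 'V': 4,
}


def primer_degeneracy(peptide):
    # Product of the codon counts: the number of different DNA
    # sequences that could encode the peptide.
    total = 1
    for residue in peptide.upper():
        total *= CODONS_PER_RESIDUE[residue]
    return total


# Arbitrary illustrative peptides, not taken from the OMP alignment.
print(primer_degeneracy('MWDEK'))  # low degeneracy
print(primer_degeneracy('LSR'))    # high degeneracy for only 3 residues
```

Choosing a single preferred codon per residue, as described above, collapses this product toward 1 at the cost of possible mismatches with the true gene sequence.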
The primers appeared to be amplifying undesired DNA sequences and as a result these experiments were abandoned.

Two candidates for the Rsa OMP gene were identified in the preliminary Caulobacter genome data.

As all other attempts had failed to identify the OMP gene, contact was made with The Institute for Genomic Research (TIGR), who provided preliminary sequence data from the Caulobacter genome. FASTA searches (Pearson et al., 1997) of this database produced two contigs with similarity to known OMP components. Contig gcc_973 contains an ORF coding for the first 225 amino acids of a possible OMP component with a G+C content of 65.3%. Examination of the DNA 5' of this ORF revealed that this ORF is 5 kb 3' of the rsaE gene and there are 5 intervening ORFs that likely code for S-LPS synthesis proteins (Fig. 4-2). This ORF has been designated rsaF(973).

Figure 4-2. The two possible OMPs, rsaF(973) and rsaF(1984). A) The figure shows that rsaF(973) is located 5 kb downstream of rsaE. B) rsaF(1984) is located adjacent to a gene coding for valyl tRNA synthetase. The location of the gcc contigs is shown with black bars. B, BamHI; C, ClaI; S, SstI.

The deduced amino acid sequence of rsaF(973) had greatest similarity to TolC with 26.1% identity and 52.2% similarity over the 225 amino acids coded by gcc_973 (Fig. 4-3). Contig gcc_1984 has a G+C content of 67% and contains an ORF coding for the last 384 amino acids of a possible OMP. This ORF has been designated rsaF(1984). 3' of rsaF(1984) is an ORF coding for valyl tRNA synthetase (Fig. 4-2). The coding sequence of rsaF(1984) had greatest similarity to the HasF OMP with 26.8% identity and 48.5% similarity (Fig. 4-3). The G+C content of these two ORFs is comparable to C. crescentus's 67%, suggesting that neither is a recent genetic acquisition.
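The G+C comparison made above is a simple base count. A minimal sketch, assuming the sequence is supplied as a plain string (the function name is illustrative and not taken from any software used in this work):

```python
def gc_content(seq):
    """Percentage of G and C among the unambiguous bases of a DNA sequence."""
    counted = [b for b in seq.upper() if b in 'ACGT']
    if not counted:
        raise ValueError('sequence contains no A, C, G or T')
    return 100.0 * sum(b in 'GC' for b in counted) / len(counted)
```

Ambiguous bases and gap characters are ignored rather than counted, so the value is a percentage of the called bases only.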
These two contigs overlap with 59.6% identity over a region of 344 bp, indicating that they are not part of the same ORF, but suggesting that one arose by gene duplication of the other (Fig. 4-3). Once sequence was available it was assumed that it would be relatively simple to obtain both complete genes. This did not prove to be the case. Using these sequences, primers were designed to amplify portions of rsaF(973) and rsaF(1984) that could then be used as probes to isolate the complete genes. These primers had melting temperatures (Tm) between 58°C and 62°C and did not appear to have any hairpin loops or secondary priming sites when analyzed using primer analysis and design programs. Primers of this size and Tm have been used routinely for PCR amplification of C. crescentus DNA with excellent results. These primers produced products of the expected size, but when cloned and sequenced the products were identical to the C. crescentus DNA gyrase and glutamate permease genes. Suspecting that there may be something peculiar about the structure of the DNA around the rsaF genes, it was decided to attempt to isolate the DNA of the adjacent regions. Since the start of rsaF(973) is found in the genome 1.5 kb 3' of sequences cloned into pRAT1, a 2 kb BamHI-EcoRI fragment was sub-cloned from pRAT1 and designated pRAT HI (B/E). To amplify a fragment of DNA close to the rsaF(1984) gene, new primers were made to amplify a 736 bp region 3' of rsaF(1984). These primers were designed with Tm of 70°C and were 26-28 bp long.

A. BlastX comparison of gcc_973

   Sequences producing High-scoring Segment Pairs               High Score   Smallest P(N)
1. gi|72556    outer membrane protein tolC, E. coli                  92        4.0e-11
2. gi|3080540  (D49826) LipD [Serratia marcescens]                  115        7.4e-07
3. gi|4826418  (Y19002) PrtF protein [Erwinia amylovora]            115        1.0e-06
4. gi|281563   agglutination protein, Pseudomonas putida             61        3.4e-05

B. BlastX comparison of gcc_1984

   Sequences producing High-scoring Segment Pairs               High Score   Smallest P(N)
1. gi|1405817  (X98513) HasF ABC exporter outer membrane            154        1.0e-23
2. gi|135980   OUTER MEMBRANE PROTEIN TOLC PRECURSOR, E. coli       159        1.2e-23
3. gi|3080540  (D49826) LipD [Serratia marcescens]                  126        8.3e-23
4. gi|4826418  (Y19002) PrtF protein [Erwinia amylovora]            111        4.2e-21

C. Overlap of gcc_973 and gcc_1984

gcc_973  CAGACCTCGACCCTCTCTCTGAGCCAGAGCCTCTACACCAACGGTCGTTTCTCGGCCCGC
gcc_1984                             CGCTCTACACCGGCGGTCGCGCCAGCGCGGGC

gcc_973  CTGGCGGGTGTCGAGGCGCAGATCAAGGCCGCGCGCGAGAACCTGCGCCGCATCGAGATG
gcc_1984 GTCAGCCCCGCTGAAGCCGACGTGCTGTCTGCGCGGGAAGGTCTTCGCGCGGTCGAGCAG

gcc_973  GACCTGCTGGTCCGCGTGACCAACGCCTATATCTCGGTGCGCCGCGACCGCGAGATCCTG
gcc_1984 GGGGTGCTGGTCAGCGTCGTCCAGGCCTATGTCGACGTGCGCCGAGACCAGGAACGCCTG

gcc_973  CGGATCAGCCAAGG-CGGTGAAGCCTGGCTGCAGAAGCAATTGAAGGACACCGAGGACAA
gcc_1984 CGCATC-GCCAAGGAAAACGTCGCGGTTCTGCAGCGCCAGCTCGAAGAATCGAACGCTCG

gcc_973  GTACAGCGTCCGTCAGGTGACCTTGACCGACGTGCAGCAGGCCAAGGCCCGCCTGGCGTC
gcc_1984 CTTCGACGTGGGTGAGATCACCCGGACGGACGTCGCCCAGTCTCAGGCGCGCTTGGCTTC

gcc_973  GGCCAGCACTCAGGTGGCGAACGCCCAGGCGCAGCTGAATGTCAGCGTAGCGTTCTACGC
gcc_1984 GGCCAAGGCCAGCCTGTCGGGCGCCCAGGCCCAGTTGGAAGTCAGCCGCGCCTCCTACGC

gcc_973  GTCCCTGGTGGGGCGCCAGCCGGAGAC
gcc_1984 TGCGGTGGTCGGTCAAACGCCCGGCGAACTGGCTCCCGAGCCGAGCTTGGCCGGACTGCT

Figure 4-3. Comparison of possible Rsa OMP components. A) Closest similar proteins to the ORF from gcc_973. B) Closest similar proteins to the ORF from gcc_1984. C) Comparison of gcc_973 to gcc_1984. Note that the matches for gcc_1984 are more significant (smaller P(N) values) than those for gcc_973 because the gcc_1984 contig contains a larger portion of the ORF.

PCR using these primers produced a product of the expected size that was successfully cloned and the resulting plasmid was called pBSKS-gcc1984.
When sequenced, the product proved to be the correct fragment. The NA1000 cosmid library was probed with pRAT HI (B/E) and pBSKS-gcc1984. A number of cosmids hybridized to pRAT HI (B/E), but all proved to contain only DNA 5' of rsaF(973) and it was concluded that rsaF(973) was not located within the NA1000 cosmid library. The cosmid, 7A22, hybridized to pBSKS-gcc1984. Southern blots of the cosmid showed that pBSKS-gcc1984 hybridized to a 5.5 kb BamHI band. Several attempts were made to subclone this fragment and while the surrounding fragments could be cloned, it was not possible to subclone the fragment containing rsaF(1984). Yet another approach was taken to isolate the rsaF genes. The plasmids pRAT HI (B/E) and pBSKS-gcc1984 will not replicate in C. crescentus and could be forced to integrate into the genome by homologous recombination. The plasmid pBSKS-gcc1984 was not successfully integrated into the chromosome, but pRAT HI (B/E) was, giving NA1000::pRAT HI (B/E). Chromosomal DNA from NA1000::pRAT HI (B/E) was partially digested with BamHI and ligated under conditions promoting the circularization of the DNA fragments. The ligation mix was electroporated into E. coli and plated on selective medium, which allowed only the growth of cells carrying the plasmid pRAT HI (B/E) together with chromosomal DNA adjacent to the integration points that had circularized during the ligation. The 14 kb plasmid, pTZ19UASSm973Bcirc, was isolated in this manner. Restriction mapping and Southern blotting of this plasmid showed that the insert consisted of DNA from 2.5 kb 5' to 5.5 kb 3' of rsaF(973). Fragments of this plasmid were sub-cloned and sequenced, including a fragment containing the N-terminus of RsaF(973), but it proved impossible to subclone and sequence the entire rsaF(973) from this plasmid. This is not the first example of DNA from C. crescentus that has proved impossible to subclone. A 6.6 kb fragment, containing the holdfast genes involved in C.
crescentus attachment, has proven resistant to the subcloning efforts of several graduate students and postdoctoral fellows (Smit, unpublished). Fortuitously, one of the shedder Tn5 mutants, F11 (see Ch. 6), contains a Tn5 insertion 400 bp 5' of the rsaF(973) ORF. Using primers that hybridize to the Tn5 it was possible to use an inverse PCR method (Martin and Mohn, 1999) to isolate and clone two fragments of DNA containing rsaF(973). Plasmid pCR2.1F11SalI contains the DNA from the F11 Tn5 insertion to the SalI site 1.1 kb 3' of rsaF(973). The other, pCR2.1F11XmaI, contains the DNA from the F11 Tn5 insertion to the XmaI site 2.0 kb 3' of rsaF(973). Again, both of these clones proved difficult to isolate. Large amounts of product were obtained from the PCR, but cloning of these fragments only produced one clone of pCR2.1F11SalI and two clones of pCR2.1F11XmaI. Usually when cloning products in this manner a minimum of 50 clones and as many as 300 clones can be expected. E. coli carrying these plasmids grow slowly and appear distended and malformed when observed by phase contrast light microscopy. It is possible that the inserts in these plasmids are not identical to wildtype NA1000 chromosomal DNA sequences, but contain mutations generated by inaccuracies in the Taq polymerase amplification. It may be that the majority of the PCR product is lethal when introduced into E. coli, but that some of the PCR product, containing mutations in rsaF(973) that make the product less toxic, could be cloned in E. coli. The sequence of the insert from pCR2.1F11SalI was assembled together with sequence from the plasmid pTZ19UASSm973Bcirc and the TIGR genome (Fig. 4-4, Appendix I). The RsaF(973) sequence from pCR2.1F11SalI showed considerable similarity to other OMPs. The highest degree of sequence similarity was to E. coli TolC with 25.2% identity and 48.6% similar amino acids. The OMPs AprF and PrtF from P. aeruginosa and E.
chrysanthemi were not as similar (Fig. 4-5). Analysis of the sequence of RsaF(973) revealed the presence of a predicted signal sequence encompassing the first 32 amino acids and the presence of β-strands capable of forming a β-barrel structure typical of outer membrane proteins.

Comparison of RsaF(973) to the protein databases

     Document ID   Accession    Protein  Species                        High Score  Smallest P(N)
 1.  gi|3860786    (AJ235270)   TolC     Rickettsia prowazekii             160       5.7e-23
 2.  gi|882565     (028377)     n/a      Escherichia coli                  103       5.1e-17
 3.  gi|135980     (X54049)     TolC     Escherichia coli                  103       6.9e-17
 4.  gi|3080540    (D49826)     LipD     Serratia marcescens               115       1.9e-16
 5.  gi|2495191    (U25178)     TolC     Salmonella enteritidis             90       1.4e-14
 6.  gi|4826418    (Y19002)     PrtF     Erwinia amylovora                 115       3.2e-13
 7.  gi|281563     (M64540)     n/a      Pseudomonas putida                 99       1.3e-11
 8.  gi|72556      (X00016)     TolC     Escherichia coli (partial)         92       1.4e-11
 9.  gi|1405817    (X98513)     HasF     Serratia marcescens                90       3.4e-11
10.  gi|4838370    (AF121772)   NatC     Neisseria meningitidis            111       3.2e-10
11.  gi|4115627    (AB015053)   PrtF     Pseudomonas fluorescens            92       1.0e-09
12.  gi|117799     (X14199)     CyaE     Bordetella pertussis               87       1.9e-09
13.  gi|3493599    (AF064762)   ZapD     Proteus mirabilis                  94       5.9e-09
14.  gi|4063019    (AF083061)   TliF     Pseudomonas fluorescens            85       1.1e-08
15.  gi|2983554    (AE000721)   n/a      Aquifex aeolicus                  108       1.6e-08
16.  gi|416635     (X64558)     aprF     Pseudomonas aeruginosa             86       5.3e-08
17.  gi|5759289    (AF175720)   n/a      Porphyromonas gingivalis           66       6.7e-06
18.  gi|5759287    (AF175719)   n/a      Porphyromonas gingivalis           83       0.00017
19.  gi|1653357    (D90913)     n/a      Synechocystis sp.                  70       0.00018
20.  gi|3646415    (AJ007827)   EprF     Pseudomonas tolaasii               78       0.00024
21.  gi|3184190    (AB011381)   OprM     Pseudomonas aeruginosa             74       0.00035
22.  gi|5091481    (AF031417)   TtgC     Pseudomonas putida                 66       0.00043
23.  gi|3914250    (L23839)     OprK     Pseudomonas aeruginosa             74       0.0011
24.  gi|95600      (S12527)     PrtF     Erwinia chrysanthemi               80       0.0015

Figure 4-5. BLASTX search showing OMPs similar to RsaF(973). Lines 1 and 2 are predicted from ORFs found in genome sequences. OMPs from type I systems with the greatest similarity to RsaD and RsaE are underlined. The P(N) value gives the probability of the match arising by chance.

Was either of RsaF(973) or RsaF(1984) the OMP component involved in secretion of RsaA?

Sequence similarity was not enough to show that either or both of the genes coded for the OMP. One approach to determine this was to construct knockout mutants of these ORFs and determine if this prevented secretion. The plasmids pTZ19UASSmANAC-RsaF(973) and pTZ18U(CHE)ANAC-RsaF(1984) were constructed to perform the required integration events. Both plasmids consisted of internal portions of the respective genes without the N-terminal and C-terminal regions. These constructs required only a single recombination event to accomplish the knockout. A single cross-over would produce two copies of the gene, one with an N-terminal deletion and one with a C-terminal deletion, neither of which would be expected to function. To make the pTZ18U(CHE)ANAC-RsaF(1984), it was still necessary to generate a PCR product containing the coding sequence of rsaF(1984). New primers were created using the primer selection methods provided by the MacVector software. The resulting primers were 26 and 28 bp long and had Tm of 71-73°C. Once again the PCR process proved difficult. A PCR product could not be generated at any annealing temperature higher than 55°C, considerably lower than the predicted Tm. When a product was generated, contaminating bands were always present and could not be eliminated by changes in the PCR conditions.
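For orientation, primer melting temperatures such as those quoted above can be approximated from base composition alone. The sketch below uses two widely quoted rules of thumb, the Wallace rule (intended for short oligonucleotides) and a basic G+C formula for longer primers. It is an assumption that either resembles the calculation performed by the primer design software used here, which may rely on a different model and will give different values; the function names are illustrative.

```python
def base_counts(primer):
    """Return (G+C count, A+T count) for an unambiguous DNA primer."""
    primer = primer.upper()
    if not primer or set(primer) - set('ACGT'):
        raise ValueError('primer must contain only A, C, G and T')
    gc = primer.count('G') + primer.count('C')
    return gc, len(primer) - gc


def tm_wallace(primer):
    """Wallace rule: 2 degrees C per A/T plus 4 degrees C per G/C."""
    gc, at = base_counts(primer)
    return 2 * at + 4 * gc


def tm_basic(primer):
    """Basic G+C approximation: 64.9 + 41 * (G+C - 16.4) / length."""
    gc, at = base_counts(primer)
    return 64.9 + 41.0 * (gc - 16.4) / (gc + at)
```

Neither approximation accounts for salt concentration, primer concentration or secondary structure in a G+C-rich template, so a large gap between a predicted Tm and the highest workable annealing temperature is not by itself informative.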
Instead, the band of the expected size was gel purified and cloned, giving the plasmid pCR2.1rsaF(1984), which was then used for constructing the deletion clone pTZ18U(CHE)ANAC-RsaF(1984). The plasmids pTZ19UASSmANAC-RsaF(973) and pTZ18U(CHE)ANAC-RsaF(1984) were electroporated into the strains NA1000 and JS4000. JS4000 is a strain of C. crescentus that cannot make RsaA, but has functional rsaDE genes virtually identical to those of NA1000 (see Ch. 5). Knockouts were only obtained in the strain JS4000 and not NA1000, resulting in the mutants JS4000rsaF(973) and JS4000rsaF(1984). When AprA was expressed in these mutants, AprA was not secreted by JS4000rsaF(973), but was by JS4000rsaF(1984) (Fig. 4-6). From these data it was concluded that RsaF(973) is the OMP of the RsaA secretion system. To confirm that RsaF(973) was required for secretion, the clone pBBR3AprA:pCR2.1F11SalI, expressing AprA and RsaF(973), was created. This construct could not be made in E. coli. This may be because both of the separate plasmids were toxic, but sublethal. Together the toxic effects may be lethal. The plasmid was obtained by introducing the ligation mix directly into the knockout strain of RsaF(973). No AprA was secreted by this strain, as the plasmid pBBR3AprA:pCR2.1F11SalI was unable to complement the knockout. Despite this, it is still believed that RsaF(973) is the OMP of the RsaA secretion system.

Figure 4-6. AprA secretion from C. crescentus. Panels: NA1000 (wildtype); JS4000 (S-layer neg.); JS4000 rsaF(973); JS4000 rsaF(1984). AprA was expressed in all bacteria using pBBR3AprA on skim milk plates. Zones of clearing around the colonies indicate secretion of AprA. Deletion of rsaF(973) interrupts secretion of AprA while interruption of rsaF(1984) does not interrupt secretion.

Summary

This portion of the project was exceptionally arduous because the rsaF genes appeared to be toxic in E. coli.
This would explain much of the difficulty encountered, such as why the NA1000 cosmid library did not contain rsaF(973), why the TIGR genome sequence database does not contain a complete rsaF gene sequence, and why it proved difficult to isolate the genes. The lack of colonies resulting from the cloning of the rsaF(973) PCR products also suggests a toxic effect. All other attempts to isolate the rsaF genes on a fragment of DNA smaller than 7 kb failed, presumably because the smaller inserts were lethal. This suggests that the rsaF genes are lethal to E. coli and the clones obtained contain mutations that make the insert less toxic. As mentioned above, this presumed toxicity may explain why the partial TIGR genome sequence contained only partial ORFs of the rsaF genes. Other analysis of the TIGR sequence suggests that greater than 80% of the C. crescentus genome is represented (see Ch. 6). Given that, the sequence reported here for rsaF(973) may differ from the wildtype sequence. Such a mutant rsaF(973) gene in the plasmid pCR2.1F11SalI may not produce a protein that functions correctly. This would explain why this plasmid was tolerated in E. coli while other constructs appeared to be lethal and would explain why the plasmid pBBR3AprA:pCR2.1F11SalI failed to complement the RsaF(973) knockout. It is unlikely that the phenotype of the rsaF(973) knockout is caused by a polar mutation because the gene 3' of rsaF(973) is transcribed in the opposite orientation. Even given the failure to complement the knockout, the results presented here indicate that RsaF(973) is the OMP required for secretion of RsaA. The function of rsaF(1984) is not known. The entire ORF was never cloned and sequenced so it was not possible to determine if an entire ORF coding for an OMP exists. The sequence identity between the two rsaF ORFs suggests that one may be a gene duplication of the other and that rsaF(1984) is no longer functional.
Another possibility is that there is a second type I secretion system in C. crescentus (though it is not known what it might transport) that uses RsaF(1984) as the OMP component. Determining the function of rsaF(1984) represents a future project.

Chapter 5

Identification of the S-layer subunit and transporter genes in Freshwater Caulobacter species

Introduction

The Smit laboratory strain culture collection contains numerous strains that have been isolated from locales around the world and are designated FWC (freshwater Caulobacter) species (MacRae and Smit, 1991). Analysis of these FWC species showed that not all have an S-layer (Walker et al., 1992). There seems to be a geographical as well as evolutionary distinction between these species (Abraham et al., 1999; MacRae and Smit, 1991). No FWC with an S-layer has been found in Europe, though admittedly only a small fraction of the FWC species were isolated from European sources, while FWC species with and without S-layers were found in North America. The evolutionary relationships between the different FWC species have recently been examined by 16S rDNA sequencing, profiling of restriction fragments of 16S-23S rDNA interspacer regions, lipid analysis, immunological profiling and salt tolerance characteristics to organize the taxonomy of 76 different strains (Abraham et al., 1999). It was demonstrated that all of the FWC species with S-layers are much more closely related to one another than to the species without S-layers, and the non-S-layer FWC species have been reclassified as the genus Brevundimonas instead of Caulobacter. Therefore S-layers are a characteristic of Caulobacter species. The S-layers of the Caulobacter species have been previously examined. The S-layer subunits range in size from 100 kDa (comparable to NA1000) to 193 kDa and can be removed by a low pH or EGTA extraction method.
All the putative S-layer proteins react with antibody raised against RsaA (though most often to a lesser extent) and most also produce a polysaccharide that reacts with antibody against the S-LPS responsible for attachment of the S-layer in NA1000 (Walker et al., 1992). It was also shown that these FWC species will hybridize with an rsaA probe under conditions that would allow up to 30% mismatch (MacRae and Smit, 1991). This suggests that the S-layer subunits on these other FWC species are similar to RsaA and may also be secreted by a type I secretion mechanism. Two strains have been used predominantly for the examination of the S-layer in C. crescentus. NA1000 is a variant of the ATCC 19089 strain, whose genome is being sequenced by TIGR. It is from NA1000 that the rsaA gene and rsaD and rsaE, the genes responsible for secretion of RsaA, were isolated (see Ch. 3). The second strain used in the Smit lab is JS4000, a lab variant of the ATCC 15252 strain that spontaneously lost its S-layer during culturing, and is being used for expression of recombinant proteins secreted using the NA1000 rsaA gene. The S-layer gene from JS4000 has been cloned and expressed in E. coli, where it produces a 40,000 molecular weight protein in inclusion bodies (Bingle et al., 1999). ATCC 15252 has an S-layer protein that appears to be identical to RsaA as determined by size and antibody reactivity, yet other characteristics of the bacterium (i.e., cell appearance, growth rates), 16S rRNA sequencing (Stahl et al., 1992) and RFLP mapping of the genome (B. Ely, pers. comm.) showed that it is different from NA1000. Preliminary investigations of these S-layers, begun in order to determine the differences between the S-layer subunits and their associated transport systems, are presented here and have now been taken over by Mihai Iuga.
It is hoped that analysis of these other S-layer systems will provide insight into the transport mechanisms by showing what changes in the transporters are required to transport the different sized subunits.

Results and Discussion

The S-layer subunit, ABC-transporter and Membrane Forming Unit proteins of JS4000 and NA1000 Caulobacter species are virtually identical.

The S-layer genes from both JS4000 and JS3001, a shedding derivative of ATCC 15252, were cloned and sequenced (see Ch. 2) and have few differences when compared to the sequence of the NA1000 rsaA. In a few places the guanosine (G) and cytosine (C) residues are reversed (i.e., GC instead of CG), but these are in regions of high G+C content and appear to be errors in the original sequencing of rsaA (Gilchrist et al., 1992), as the partial Caulobacter genome sequence from TIGR supports my sequencing results. The sequence for NA1000 was amended accordingly. The mutation in the JS4000 sequence that truncates the S-layer protein consists of a guanosine base that has been deleted from codon 357, which causes a termination codon to be read at codon 359. These differences are listed in Table 5-1. The rsaD and rsaE genes from JS4000 have been isolated from a cosmid library (see Ch. 2) and were sequenced. These genes are almost identical to the NA1000 genes. The differences between the strains are summarized in Table 5-1.

                             ATCC 19089     ATCC 15252     ATCC 15252
Protein  Position            NA1000         JS4000         JS3001
RsaA     aa 358-359-360      Gln-Asn-Leu    Gln-Thr-None   Gln-Asn-Leu
RsaA     aa 475              Val            Ile            Val
RsaA     aa 860              Thr            Ser            Thr
RsaD     aa 298              Asn            Thr            ND
RsaE     aa 131-132          Ser-Gln        Arg-Leu        ND

Table 5-1. Differences between the Rsa genes found in lab strains. Deduced amino acid sequence differences between the RsaA, RsaD and RsaE proteins of three common lab strains of C. crescentus. ND, not determined.

The S-layers of FWC species are probably transported by a type I secretion system.

The alkaline protease, AprA, from P. aeruginosa is secreted by the RsaA secretion machinery (see Ch. 3).
AprA was successfully secreted in selected strains covering the range of S-layer subunit sizes, demonstrating that these strains also had type I secretion mechanisms (Table 5-2). AprA secretion varied among the FWC species. While in NA1000 all the colonies containing the aprA gene secreted AprA, not all FWC colonies did. While some species (e.g., FWC 19) showed full penetrance (all colonies expressed AprA), in other FWC species as few as 10% of the colonies secreted AprA when the aprA gene was expressed (e.g., FWC 32). It is not known why only some colonies secreted AprA. P. aeruginosa also expresses an inhibitor that binds to AprA and prevents proteolytic activity inside the cell. As the inhibitor is not expressed with aprA in the FWC species, AprA may have a toxic effect on Caulobacter cells and there may be selective pressure to eliminate it from the cells. Cells not secreting AprA may have found a way to prevent expression of the gene. NA1000 and some of the FWC species may be better able to tolerate the toxicity than other species.

Species   AprA secretion   Penetrance* (%)   Subunit size
NA1000    ++               >99.9             98 kDa
JS4000    ++               >99.9             98 kDa
FWC 8     ++               80                122 kDa
FWC 9     +                >99.9             133 kDa
FWC 17    +                78                106 kDa
FWC 19    +                >99.9             108 kDa
FWC 28    +                45                106 kDa
FWC 32    +                10                133 kDa
FWC 39    +                80                193 kDa
FWC 42    +                10                181 kDa

Table 5-2. FWC species secreting alkaline protease. ++ represents 70-100% of the NA1000 secretion level; + represents 20-69% of the NA1000 secretion level. * Penetrance was the percentage of colonies expressing AprA.

FWC species with similar subunit sizes have similar Southern blot banding patterns.

To further characterise the FWC species, Southern blot analysis was performed using probes to rsaA and rsaDE. These blots were performed under conditions that would allow up to 30% mismatch. The results are summarized in Table 5-3.
Caulobacter      Subunit      Fragment size when probed            Fragment size when probed
species          size (kDa)   with rsaD and rsaE (enzyme1)         with rsaA (enzyme1)
NA1000, JS3000   98           >20 kb (EcoRI), 7.1 kb (HindIII)     1.1 kb (HindIII)
FWC 17           106          3.5 kb (EcoRI), 5 kb (HindIII)       4.3 kb (EcoRI)
FWC 18           131          ND2                                  7.0 kb (BamHI)
FWC 19           108          3.5 kb (EcoRI)                       4.4 kb (EcoRI)
FWC 28           106          3.5 kb (EcoRI)                       4.3 kb (EcoRI)
FWC 31           106          3.5 kb (EcoRI)                       4.3 kb (EcoRI)
FWC 42           181          10 kb (EcoRI)                        8.0 kb (EcoRI)

Table 5-3. Comparison of Southern blot banding patterns of different FWC species. Chromosomal digests with the enzyme specified were probed with either rsaA or rsaDE. 1 Enzyme that chromosomal DNA was cut with for Southern blot analysis. 2 Not determined.

Analysis of the Southern blot data suggests that the S-layer subunits and transporters can be grouped according to size. All of the FWC species with subunits ranging from 106-108 kDa have identical Southern banding patterns, while all the other FWC species with different subunit sizes have different banding patterns. The ability of the rsaDE genes to hybridize to the chromosome of the differing FWC species suggests that the S-layer subunit is being secreted by a type I transporter. With this in mind, methods were devised for isolating the genes involved.

The ABC-transporter subunits were isolated from several different FWC species.

The sequence identity between ABC transporters among different type I systems is the most significant of the 3 transporter components. Using the sequence identity between the ABC-transporters aprD (P. aeruginosa), prtD (E. chrysanthemi) and rsaD (NA1000), degenerate primers were designed to amplify a central portion of the ABC transporter using PCR. Using these primers it was possible to amplify, clone and sequence fragments of the ABC transporter from FWC6, FWC8 and FWC39.

Figure 5-1. ClustalW alignment of partial RsaD genes from FWC species (sequences shown: NA1000, JS4000, FWC8, FWC6 and FWC39). Identical residues have dark shading. Similar residues are shaded lightly. The line underneath the alignment is the consensus sequence.

PCR products were not successfully generated from FWC17, FWC26, FWC28, FWC29 and FWC41.
Multiple bands were generated from FWC27 and FWC42, but I was unable to clone any of the fragments. Evidently, the PCR strategy selects for the ABC-transporters most closely related to the NA1000 gene. This suggests that even though the S-layer subunit of FWC6 is 181 kDa and that of FWC39 is 193 kDa, their transporters are still closely related to those of FWC8, with a subunit of 122 kDa, and NA1000, with a subunit of 98 kDa; this was confirmed by sequencing (Fig. 5-1). Curiously, FWC species with small subunit sizes close to that of NA1000 failed to generate PCR products, suggesting that the sequences of their ABC-transporters have diverged further from the NA1000 sequence. Analysis of the sequences showed little division between the FWC species according to subunit size. In some places along the deduced protein sequence, the transporters of smaller subunits are more similar to one another than to the transporters of larger subunits, while in others the transporters of differing subunit sizes are more similar to one another (Fig. 5-1).
A method for screening the chromosomes of FWC species for the S-layer subunit and S-layer transport genes was devised (see Ch. 2). Using this method, part of the S-layer subunit gene of FWC27 was isolated. FWC27 has an S-layer subunit size of 145 kDa. Comparison with NA1000 reveals a considerable difference in the sequences of these proteins (Fig. 5-2). A BLAST alignment of the RsaA and FWC27 sequences (Altschul et al., 1990) shows that the proteins are 44.6% identical and 61.5% similar over 130 amino acids.
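The identity and similarity percentages quoted for such pairwise alignments can be illustrated with a minimal sketch. It assumes that similarity counts identical residues plus conservative substitutions, as the Table 6-4 legend states. The substitution groups, the function name alignment_stats and the two toy sequences are hypothetical; they are not the BLAST or ClustalW scoring scheme and not the RsaA or FWC27 sequences.

```python
# Conservative-substitution groups: an illustrative assumption only,
# not the scoring matrix used by BLAST or ClustalW.
SIMILAR_GROUPS = ['ILMV', 'FWY', 'KRH', 'DE', 'NQ', 'ST', 'AG']


def alignment_stats(seq_a, seq_b):
    '''Return (percent identity, percent similarity) for two aligned sequences.

    Both sequences must already be aligned (equal length, '-' for gaps).
    Percentages are taken over all alignment columns, gapped ones included.
    '''
    if len(seq_a) != len(seq_b):
        raise ValueError('aligned sequences must have equal length')
    columns = identical = similar = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == '-' and b == '-':
            continue  # a column that is empty in both rows carries no information
        columns += 1
        if a == '-' or b == '-':
            continue  # gap column: counted in the length, never as a match
        if a == b:
            identical += 1
        elif any(a in group and b in group for group in SIMILAR_GROUPS):
            similar += 1
    if columns == 0:
        raise ValueError('alignment has no informative columns')
    pct_identity = 100.0 * identical / columns
    pct_similarity = 100.0 * (identical + similar) / columns
    return pct_identity, pct_similarity


# Toy alignment (hypothetical): 8 columns, 4 identical, 3 conservative, 1 gap.
print(alignment_stats('MKT-AILD', 'MRTSAVLE'))  # (50.0, 87.5)
```

Because similarity includes the identical positions, it can never be lower than identity, which matches the pattern of the paired values reported in this chapter.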
[Figure 5-2: alignment rows for NA1000 (RsaA) and FWC27 with the consensus line, positions 1 to 200.]
Figure 5-2. ClustalW alignment of FWC27 with the first 200 amino acids of RsaA. Identical residues have dark shading. Similar residues have light shading. Identical and similar residues are boxed. The line underneath the sequences is the consensus sequence.
The sequence of RsaA contains repeating amino acid sequence elements. Sequence analysis of RsaA has revealed that portions of the sequence exhibit considerable similarity to other portions of the molecule. Table 5-4 shows the similarity of the Ca2+-binding domain of RsaA to sequences closer to the N-terminus. These similar units do not appear to be uniform in size and appear to consist of segments of 60 to 90 amino acids, but the exact sizes have not been determined. These segments may represent a complete structural domain (i.e., α-helix or β-strand) that is replicated along the length of the protein, but further analysis is required to confirm this. As Table 5-4A shows, alignments of RsaA against different portions of itself can yield as much as 28% identical amino acids.
Furthermore, the Expect numbers, which represent the likelihood of the match occurring by chance in a random sequence database of the current size, are very small. Table 5-4B shows the other hits in the database to the same portion of RsaA. The Sap proteins from C. fetus are the S-layer proteins with the greatest identity to RsaA. HlyA from Aquifex aeolicus and the hypothetical protein from Rhodobacter capsulatus both contain the calcium-binding motifs found in proteins secreted by type I systems, which leads to higher identity. As the Expect numbers show, the identity of RsaA to other portions of itself is greater than would be found by chance in the sequence database. This repetitive nature is also seen at the DNA level (data not shown). It must be taken into account that the composition of RsaA (26% threonine and serine) leads to more repetitive sequences than would be expected by chance. This explains why a low Expect number occurs with alignments to a membrane glycoprotein from Equine herpesvirus, which also contains a high number of threonine and serine residues. It is only at Expect numbers of 1.8e-08, much higher than the best Expect number of 6e-14 for RsaA against itself, that random proteins begin to show identity. Overall, the repetition found here is greater than could be expected by chance and suggests that RsaA evolved by duplicating structural portions of the molecule to form a larger protein.
Table 5-4. BLAST alignment of RsaA with itself.
A
pir||A48995 paracrystalline surface layer protein RsaA - Caulobacter crescentus, Length = 1026
Alignment 1: Query 1-300, Sbjct 721-1020; Score = 573 bits (1461), Expect = e-163; Identities = 300/300 (100%), Positives = 300/300 (100%)
Alignment 2: Query 2-297, Sbjct 360-662; Score = 78.4 bits (190), Expect = 6e-14; Identities = 85/318 (26%), Positives = 133/318 (41%), Gaps = 37/318 (11%)
Alignment 3: Query 4-299, Sbjct 409-752; Score = 66.3 bits (159), Expect = 3e-10; Identities = 94/361 (26%), Positives = 143/361 (39%), Gaps = 80/361 (22%)
Alignment 4: Query 2-299, Sbjct 172-427; Score = 66.0 bits (158), Expect = 3e-10; Identities = 85/301 (28%), Positives = 121/301 (39%), Gaps = 46/301 (15%)
Alignment 5: Query 12-300, Sbjct 230-488; Score = 62.8 bits (150), Expect = 3e-09; Identities = 77/293 (26%), Positives = 125/293 (42%), Gaps = 38/293 (12%)
[The residue-by-residue Query and Sbjct lines of each alignment are illegible in the extracted text.]
B
Sequences producing High-scoring Segment Pairs | High Score | Smallest Probability P(N)
1. gi 477427 RsaA - Caulobacter crescentus | 1461 | 1.1e-187
2. gi 2120535 SapB - Campylobacter fetus | 154 | 9.5e-17
3. gi 2120536 SapA - Campylobacter fetus | 108 | 1.1e-11
4. gi 2114323 membrane glycoprotein - Equine herpesvirus 1 | 153 | 1.5e-11
5. gi 94640 SapA - Campylobacter fetus | 100 | 1.4e-10
6. gi 2983562 HlyA - Aquifex aeolicus hemolysin protein | 132 | 9.9e-09
7. gi 2114321 membrane glycoprotein - Equine herpesvirus 1 | 130 | 1.8e-08
8. gi 3128319 hypothetical protein - Rhodobacter capsulatus | 98 | 4.3e-08
9. gi 2606019 envelope glycoprotein - Equine herpesvirus 4 | 127 | 4.7e-08
10. gi 4063042 glycoprotein - Cryptosporidium parvum | 125 | 8.7e-08
11. gi 790694 epimerase - Azotobacter vinelandii | 111 | 4.1e-07
12. gi 3128317 hypothetical protein - Rhodobacter capsulatus | 102 | 1.4e-06
13. gi 790692 epimerase - Azotobacter vinelandii | 109 | 1.4e-06
Table 5-4. BLAST alignment of RsaA with itself. A) Portions of the sequence of RsaA exhibit considerable sequence similarity to other portions of the molecule. Query represents the 300 amino acid segment of RsaA from 721-1020. Sbjct represents the entire sequence of RsaA. Numbers alongside the sequence indicate amino acid positions. The line between the Query and Sbjct lines indicates identical amino acids with the appropriate letter code and similar amino acids with a '+'. Identities refers to the number of identical amino acids shared between the sequences. Positives refers to the combined number of identical and similar amino acids shared between the sequences. Expect gives the likelihood of the sequence alignment occurring by chance considering the current size of the sequence databases. B) Result of the BLAST search showing the closest matches to amino acids 721-1020. P(N) numbers are almost identical to Expect numbers for Expect numbers < 0.001 (Altschul et al., 1990).
Phylogenetic analysis of the FWC species has shown that the FWC species can be divided into five branches. The phylogenetic study of Abraham et al. (1999) shows that two branches, B and D, of the Caulobacter phylogenetic tree contain only species with small (100-108 kDa) S-layer subunits (Fig. 5-3). FWC19, FWC28 and FWC31 belong to one of these branches and FWC17 belongs to the other. These are the four strains with identical Southern blot banding patterns (Table 5-3), suggesting that the S-layers and associated transporters of these two branches are more closely related to each other than to those of the other three branches. The three other branches show no correlation between subunit size and evolutionary distance, as they have S-layer subunit sizes ranging from small (102 kDa) to large (193 kDa).
In addition, the species FWC6, FWC8 and FWC39, from which the ABC-transporter proved easiest to amplify by degenerate PCR, all belong to different branches. This may simply reflect the conserved nature of the ABC-transporter. It may also be that the larger S-layers evolved separately from one another, and that the similarities between ABC-transporters transporting large subunits (similarities not found in ABC-transporters transporting small subunits) represent convergent evolution required to accommodate secretion of a larger subunit.
[Figure 5-3: dendrogram; horizontal axis Linkage Distance, 0.00 to 0.15; group labels A to E and a branch labelled 'FWC without S-layers'. Leaves in printed order: FWC34 (110), FWC41 (137), FWC11 (108), FWC45 (140), *FWC42 (181), FWC26 (140), FWC32 (133), FWC16 (151), FWC7 (177), FWC29 (124), *FWC6 (181), *FWC28 (106), FWC22 (107), *FWC31 (106), FWC20 (108), FWC25 (105), *FWC19 (108), *FWC8 (122), FWC33 (110), FWC35 (102), FWC44 (106), *FWC17 (106), *FWC39 (193), FWC9 (133), FWC12 (133), *JS3000 (98), *NA1000 (98).]
Figure 5-3. Dendrogram derived from Caulobacter glycolipid content (adapted from Abraham et al., 1999). The FWC species have been organized into 5 groups with a linkage difference of more than 0.05. * species examined in this study. Numbers in brackets refer to the size of the S-layer subunit in kDa.
Summary
The evolutionary relationships of the S-layer subunits and associated transporters of the different FWC species have been examined here. These results are still preliminary and more work needs to be done to substantiate these conclusions. While keeping this in mind, I will hypothesize on the evolutionary relationships that the data presented here suggest. The repetitive nature of RsaA suggests how the different sizes of S-layers could have arisen among the different FWC species.
The larger S-layer subunits from strains such as FWC39 and FWC41 may contain even more repeats to account for their greater bulk. Larger S-layer subunits might arise from a duplication of DNA within the gene for the subunit. The phylogenetic analysis of the FWC species by Abraham and colleagues shows little correlation between evolutionary relatedness and S-layer subunit size (Fig. 5-3). While groups B and D contain only smaller S-layer subunits, other groups contain a range of sizes. The most pronounced difference in subunit size is found in group E, between the species with the largest (FWC39) and the smallest (NA1000/JS3000) subunits, yet these bacteria are very closely related according to glycolipid content. Thus, it seems that the large S-layer subunits arose independently. The identical amino acid changes seen in the ABC-transporters of strains with large S-layer subunits suggest that these changes may be required for transporting a large subunit. Further analysis of these differences is required before anything conclusive can be determined, and is of great interest since this information would help determine the factors that must be considered when designing recombinant proteins for secretion. In reviewing all current data, I hypothesize that the progenitor of the six branches of FWC species had a small (106-108 kDa) S-layer subunit and that the two branches consisting solely of small S-layer subunits represent the FWC species most closely related to the progenitor. The S-layer subunits of the FWC species in the other four branches may have altered their sizes more recently. The repetitive nature of the S-layer sequence may have assisted in the duplication of sequence segments by allowing slippage during gene replication to create larger S-layer subunits. Smaller subunits such as the 98 kDa NA1000 subunit may have resulted from deletion of repeated units.
It may be that, to accommodate subunits of different sizes, the ABC-transporter components must be changed at specific residues to allow secretion of larger subunits. If convergent evolution resulted in the similarities found between the large subunit transporters here, then these similarities will indicate which portions of the protein are involved in transport of the larger subunit. I believe that analysis of the S-layer subunits and transporters in this manner will allow a much greater understanding of the type I secretion systems.
Chapter 6
Identification of genes involved in the synthesis of the O-Antigen of C. crescentus
Introduction
The S-LPS of C. crescentus is responsible for attachment of the S-layer to the surface of the bacterium. Disruption of proper O-antigen formation in the S-LPS causes the RsaA molecules to slough off or 'shed' from the surface and assemble into sheets (Fig. 6-1). The S-LPS has been isolated and analyzed from S-layer negative NA1000 mutants (Walker et al., 1994) and has the same core and lipid composition as the rough LPS (Ravenscroft et al., 1992). Further analysis (Smit, unpublished) has revealed that the O-antigen of the S-LPS appears to be a homopolymer of a 4,6-dideoxy-4-amino-hexose. Mass spectrometry indicates that the O-antigen has a mass consistent with forty of these hexose units. This homopolymer is unusual in that a number of different anomeric proton signals are found when it is analyzed by proton NMR, suggesting that the individual sugar units may not all be linked in the same manner. Presented in this report is evidence that this 4,6-dideoxy-4-amino-hexose is, most likely, the sugar perosamine. Perosamine is not commonly found in O-antigens, and only a few species, including Vibrio cholerae, Brucella melitensis and E. coli O157, contain perosamine residues (Stroeher et al., 1995; Wang and Reeves, 1998).
In addition, a number of glycosyltransferases have been found which may be the basis for the different linkages making up the homopolymeric O-antigen.
Figure 6-1. Shed S-layer from C. crescentus. EM photograph of S-layer shed from a strain with defective S-LPS. (Photo courtesy John Smit)
Figure 6-2. Colony immunoblot. Example of an immunoblot demonstrating the different phenotypes exhibited by mutants.
Results and Discussion
Several Tn5 mutants producing altered S-LPS were found. The screen used to detect transport-deficient mutants also detected S-LPS mutants in the NA1000 Tn5 library. On plates, these mutants exhibit a 'halo' of RsaA protein diffusing out from the colonies that can be easily distinguished with an immunoblot from bacterial colonies not shedding the S-layer (Fig. 6-2). This method was used to isolate a total of 26 'shedders' with altered S-LPS from the NA1000 Tn5 library (Fig. 6-3).
Figure 6-3. S-LPS of shedding Tn5 mutants. Silver stained polyacrylamide gel of S-LPS extracts from representative NA1000 shedder Tn5 mutants. NA1000 shows the wildtype form of S-LPS. JS100 is a spontaneous shedder mutant with a defective S-LPS. The large dark band at the bottom is the rough LPS.
Southern blot analysis of these mutants has shown that mutants F1-F22 consisted of 16 different Tn5 insertions (data not shown). Further Southern blot characterisation showed that F8 was not a proper Tn5 insertion, since the banding pattern was incorrect when probed with Tn5. Southern blots probed with the coding sequence of rsaA showed that the rsaA band in the mutant F21 was not the same as wildtype. This suggested that the Tn5 mutation did not result in the shedding phenotype, but that instead a second mutation resulting in a deletion of the rsaA gene was responsible (data not shown).
To further characterise these mutants, Southern blot analysis using EcoRI and SstI was performed on their chromosomal DNA. Neither of these enzymes cuts Tn5, so they can be used to determine whether the Tn5 insertions are linked. The Southern blots were probed with a portion of Tn5 and the banding patterns are summarized in Tables 6-1 and 6-2. The results showed that the majority of these mutants have identical banding patterns (groups C and I) and are linked. Of the remaining mutants, F10 and F22 appear to be linked, while F3 and F9 are not linked to any of the others (Tables 6-1 and 6-2). Four of these mutants (F23-F26) were isolated at a later date and were not characterised by Southern blot.
Southern blot analysis of chromosomal DNA digested using EcoRI
Columns: Mutant | Group A, 8.1 kb | Group B, 15 kb | Group C, 23 kb | Group D, 30 kb | Group E, 35 kb
Rows (one X per mutant; the column of each X is not recoverable from the extracted text): F1, F2, F3, F4, F6, F9, F10, F11, F12, F14, F15, F19, F20, F22
Table 6-1. Compilation of Southern blot data from EcoRI digestion of shedder mutant chromosomal DNA. EcoRI does not cut Tn5. The Southern blots were probed with a fragment of Tn5. Mutants are grouped according to the band size seen on the Southern blots.
Southern blot analysis of chromosomal DNA digested using SstI
Columns: Mutant | Group F, 9.3 kb | Group G, 14 kb | Group H, 18 kb | Group I, 20 kb | Group J, 21 kb | Group K, 23 kb
Rows (one X per mutant; the column of each X is not recoverable from the extracted text): F1, F2, F3, F4, F6, F9, F10, F11, F12, F14, F15, F19, F20, F22
Table 6-2. Compilation of Southern blot data from SstI digestion of shedder mutant chromosomal DNA. SstI does not cut Tn5. The Southern blots were probed with a fragment of Tn5. Mutants are grouped according to the band size seen on the Southern blot.
Half of the Tn5 and the associated chromosomal DNA from a representative of each of these 16 groups and from F23-F26 was cloned by one of two methods. The majority of Tn5 insertions were cloned by cutting the chromosomal DNA with BamHI.
This cuts the Tn5 in half, but leaves the kanamycin resistance gene intact. This DNA was ligated into a pUC-based vector and selected on kanamycin, giving an insert with Tn5 sequence on one side and chromosomal DNA on the other. A few mutants proved resistant to this technique and were cloned using an inverse PCR method developed by V. Martin (Martin and Mohn, 1999). Sequencing off the end of the Tn5 revealed the insertion site, and this sequence was used to search the partial TIGR C. crescentus genome library for the DNA surrounding the insertion site. All of the Tn5 insertion sites were found in the partial genome sequence. Open reading frames (ORFs) were determined using the sequence from the partial genome and analyzed for C. crescentus codon preference. These ORFs were used to search the known protein databases for similar proteins using the BLAST algorithm (Altschul et al., 1990). The genes (see footnote 2) interrupted by the Tn5 insertions were characterised using these data (Table 6-3).
Tn5 mutant group | Similarity to known proteins | Location* | ORF designation
F1, F7 | regulator and transcription repressor LacI | gcc 433 | lpsI
F2 | perosamine synthetase, RfbE - V. cholerae | RAT I | lpsC
F3 | nucleotide sugar epimerase/dehydratase | gcc 1444 | lpsK
F4, F5 | similarity to mannosyl transferase WbaZ - E. coli | RAT I | lpsD
F6 | methyl-accepting chemotaxis receptor | gcc 648 | orf1
F9, F13, F17 | phosphomannomutase, RfbB - V. cholerae | gcc 227 | lpsG
F10 | none; downstream of kpsT-like ORF (O-antigen transporter) | gcc 279 | orf2
F11 | similarity to mannosyl transferase (rfb region) | gcc 973 | lpsE
F12 | similarity to mannosyl transferase WbaZ from E. coli | RAT I | lpsD
F14, F16 | mannose-6-phosphate isomerase | gcc 506 | lpsH
F15, F18 | similarity to mannosyl transferase WbaZ from E. coli | RAT I | lpsD
F19 | similarity to mannosyl transferase WbaZ from E. coli | RAT I | lpsD
F20 | similarity to mannosyl transferases | gcc 395 | lpsF
F22 | none; downstream of kpsT-like ORF (O-antigen transporter) | gcc 1290 | orf2
F23 | phosphomannomutase | gcc 227 | lpsG
F24 | galactosyl-1-phosphate transferase, WlaH - C. jejuni | gcc 2537 | lpsJ
F25 | mannose-6-phosphate isomerase | gcc 506 | lpsH
F26 | rhamnosyl transferase | gcc 2218 | lpsL
Table 6-3. List of shedder mutants. ORFs with similarity to sugar modification enzymes have been given an lps designation. * Location gives either the contig (gcc) found in the partial Caulobacter genome or shows that the gene was found in the RAT I fragment 3' of rsaE and had been sequenced while looking for the third translocator protein, RsaF.
Footnote 2. For clarity the ORFs will be referred to as genes and the corresponding deduced protein sequences as proteins, even though it is acknowledged that neither assumption has been proven.
The S-LPS synthesis genes are genetically linked to the RsaA transport genes. Analysis of the DNA sequence around the rsaA transporter complex (see Ch. 3 and Ch. 4) revealed five ORFs between rsaE and rsaF(973) whose coding sequences have significant similarity to S-LPS synthesis enzymes, and one further ORF 3' of rsaF(973). The first ORF encodes a protein with similarity to GDP-D-mannose dehydratase (Currie et al., 1995; Stroeher et al., 1995), the second a protein with similarity to UDP-N-acetylglucosamine acyltransferases (Canter Cremers et al., 1989; Vuorio et al., 1994) and the third a protein with similarity to perosamine synthetase (Bik et al., 1996; Stroeher et al., 1995). The fourth and fifth proteins have similarity to mannosyltransferases (Drummelsmith and Whitfield, 1999; Rocchetta et al., 1998). These five ORFs have been designated lpsA, lpsB, lpsC, lpsD and lpsE (Fig. 6-4). Another ORF, lpsF, was found 3' of rsaF(973), and also has similarity to glycosyl transferases (Kido et al., 1998).
Since the S-LPS is required for attachment of the S-layer, it is not surprising that some of the genes involved in S-LPS synthesis are physically near rsaA and the transport genes. Smooth LPS genes have also been implicated in the proper formation of the transport complex in some type I secretion systems (Wandersman and Letoffe, 1993). It is thought that smooth LPS is required for proper insertion of the OMP into the outer membrane. Sequencing of the Tn5 insertions in the shedders has shown that F2 is located within lpsC and that the four different insertions F4, F12, F15 and F19 are located within lpsD. The presence of four different Tn5 insertions in lpsD suggests that the Tn5 mutations are the cause of the shedding phenotype and that this gene plays a role in S-LPS synthesis. In addition, F11 is found in lpsE and F20 is found in lpsF. Most of the remaining Tn5 insertions are also in genes that have similarity to smooth LPS synthesis genes (Fig. 6-5, Table 6-4). Two of these insertions interrupt genes with similarity to glycosyltransferases. Four Tn5 insertions are found in genes that have been implicated in pathways for the production of GDP-4-keto-6-D-deoxymannose, a precursor of GDP-L-fucose and GDP-perosamine. One insertion appears in a gene with similarity to transcription regulators. Two other insertions are in unknown genes.
[Figures 6-4 and 6-5: content illegible in the extracted text.]
Caulobacter protein | Similar protein | Organism | Function | Identity/Similarity (%) | Accession
LpsA | GCA | Pseudomonas aeruginosa | GDP-mannose dehydratase | 65.2/88.6 | Q51366
LpsA | RfbB | Synechocystis species | GDP-mannose dehydratase | 55.2/83.4 | P72586
LpsA | GMD | Escherichia coli | GDP-mannose dehydratase | 55.7/85.0 | P32054
LpsA | GMD | Escherichia coli O157 | GDP-mannose dehydratase | 55.7/84.9 | O85339
LpsB | YvfD | Bacillus subtilis | Serine O-acetyltransferase | 47.2/83.1 | P71063
LpsB | WlaI | Campylobacter jejuni | Serine O-acetyltransferase | 37.9/83.4 | O86157
LpsB | NeuD | Escherichia coli | acetyltransferase | 32.4/77.2 | Q46674
LpsB | WbdR | Escherichia coli O157 | N-acetyltransferase | 30.3/72.2 | O85344
LpsC | SpsC | Synechocystis species | Spore coat polysaccharide synthesis | 50.0/86.1 | P73981
LpsC | Mth334 | Methanobacterium thermoautotrophicum | Perosamine synthetase | 46.4/82.4 | O26434
LpsC | RfbE | Escherichia coli O157 | Perosamine synthetase | 45.4/82.4 | O07894
LpsC | RfbE | Vibrio cholerae | Perosamine synthetase | 42.3/80.1 | Q06953
LpsD | WbaZ-1 | Archaeoglobus fulgidus | Mannosyl transferase | 24.3/69.8 | O30192
LpsD | Mth332 | Methanobacterium thermoautotrophicum | LPS biosynthesis | 24.5/68.6 | O26432
LpsD | ORF18.9 | Salmonella enterica | Mannosyl transferase | 19.6/62.0 | Q00483
LpsD | ExpE4 | Sinorhizobium meliloti | (none given) | 25.0/40.7 | P96434
LpsE | ORF18.9 | Salmonella enterica | Mannosyl transferase | 26.5/89.7 | Q00483
LpsE | WbaZ-2 | Archaeoglobus fulgidus | Mannosyl transferase | 24.5/64.6 | O29649
LpsE | WbaZ-1 | Methanobacterium thermoautotrophicum | Mannosyl transferase | 24.2/66.5 | O30192
LpsF | WbdA | Escherichia coli | Mannosyl transferase | 19.4/66.2 | O66234
LpsF | AF0617 | Archaeoglobus fulgidus | LPS biosynthesis protein | 24.8/69.9 | O29638
LpsF | Mth370 | Methanobacterium thermoautotrophicum | LPS biosynthesis protein, RfbU-like | 29.0/65.7 | O26470
LpsG | AlgC | Pseudomonas aeruginosa | phosphomannomutase | 36.0/57.4 | P26276
LpsG | PGM | Neisseria gonorrhoeae | phosphomannomutase | 32.9/50.6 | P40390
LpsG | PmmA | Mycobacterium | phosphomannomutase | 38.0/54.2 | O86374
LpsG | PGM | Neisseria meningitidis | phosphomannomutase | 35.0/53.5 | P40391
LpsH | XanB | Xanthomonas campestris | Phosphomannose isomerase | 38.3/71.7 | P29956
LpsH | ManC | Yersinia enterocolitica | Mannose-1-phosphate guanyltransferase | 33.2/64.0 | Q56874
LpsH | RfbM | Escherichia coli | Mannose-1-phosphate guanyltransferase | 32.6/65.9 | Q59427
LpsI | CcpA | Bacillus megaterium | Catabolite control protein | 34.9/74.2 | P46828
LpsI | CcpA | Bacillus subtilis | Catabolite control protein | 33.1/74.5 | P25144
LpsI | DegA | Bacillus subtilis | Degradation activator | 33.1/74.9 | P37947
LpsI | LacI | Bacillus subtilis | LacI repressor like protein | 30.0/72.9 | O34396
LpsJ | LpsB1 | Rhizobium etli | galactosyltransferase | 59.7/71.0 | O34301
LpsJ | CapM | Staphylococcus aureus | unknown | 45.7/79.6 | P95706
LpsJ | RfbW | Vibrio cholerae | galactosyltransferase | 47.2/79.8 | Q56624
LpsJ | PssA | Rhizobium leguminosarum | galactosyltransferase | 34.6/69.2 | Q52856
LpsK* | WlaL | Campylobacter jejuni | amino sugar epimerase | 43.8/79.6 | O86159
LpsK* | BplL | Bordetella pertussis | LPS biosynthesis | 31.0/64.4 | Q45387
LpsK* | LpsB2 | Rhizobium etli | dTDP-glucose 4,6-dehydratase | 25.9/39.4 | O34302
LpsK* | CAPD | Bacillus subtilis | unknown | 26.5/69.9 | P72370
LpsL | CPS23FV | Streptococcus pneumoniae | Rhamnosyltransferase | 29.8/51.7 | O86159
LpsL | CPS23FI | Streptococcus pneumoniae | LPS biosynthesis | 29.8/51.7 | AAC69532
LpsL | ORF51x5 | Vibrio anguillarum | unknown | 26.7/45.0 | O31012
Table 6-4. Deduced proteins involved in O-antigen synthesis and their homologues. BLAST and FASTA alignments were used to determine identity and similarity. Percentage similarity represents identical amino acids and conserved substitutions. * incomplete ORF
As shown by Southern blotting, the Tn5 insertions F1, F2, F4, F6, F11, F12, F14, F15, F19 and F20 are linked. Figure 6-4 shows that the Tn5 insertions F2, F4, F11, F12, F15 and F20 are linked to the RsaA transporter genes. F1, F6 and F14 must be linked as well, but it was not possible to assemble the DNA sequence of this linkage. In addition, of the four mutants not characterised by Southern analysis, F23 is in the same ORF as F9, and F25 is in the same ORF as F14.
The other two mutants, F24 and F26, were not obviously linked to any of the other insertions.
Analysis and proposed function of individual proteins involved in S-LPS synthesis. A total of 14 ORFs associated with the formation of the S-LPS were found (Table 6-4). Four of these ORFs are incomplete. The characteristics of these ORFs are summarized in Table 6-5. All of the ORFs start with an ATG codon except lpsH, which starts with a TTG. Sequence similarity and codon preference indicate that the TTG is the most probable start codon for lpsH. Using the C. crescentus promoter consensus for biosynthetic genes (Malakooti et al., 1995), possible promoters were found 31 bp and 99 bp 5' of lpsG, 52 bp 5' of lpsH, 204 bp 5' of lpsI, 154 bp 5' of lpsJ and 63 bp 5' of lpsK. In some clusters of smooth LPS genes, the G+C content of the cluster differs from the G+C content of the bacterium, suggesting recent acquisition of the genes (Fallarino et al., 1997; Fry et al., 1998; Stroeher et al., 1995). The G+C content of these ORFs is consistent with the average C. crescentus content of 67%.
ORF | Translation start | Size (aa) | Predicted mass (kDa) | pI | G+C %
lpsA | - | 325 | 36.3 | 6.2 | 65.1
lpsB | - | 215 | 21.4 | 8.5 | 69.3
lpsC | [illegible] | 346 | 37.8 | 5.9 | 63.1
lpsD | - | 346 | 39.1 | 5.7 | 65.2
lpsE | - | 345 | 38.2 | 5.8 | 65.8
lpsF | [illegible] | 430 | 47.0 | 7.5 | 69.1
lpsG | - | >469 | ND | 5.0 | 65.8
lpsH | [illegible] | 434 | 45.5 | 4.6 | 67.4
lpsI | [illegible] | 356 | 38.7 | 6.3 | 65.4
lpsJ | - | 187 | 20.5 | 10.5 | 66.0
lpsK | - | >459 | ND | 10.4 | 69.8
lpsL* | ND | >336 | ND | 5.5 | 68.7
orf1* | ND | >352 | ND | 6.4 | 73.8
orf2 | [illegible] | 316 | 34.1 | 10.2 | 72.5
[In the Translation start column, '[illegible]' marks a sequence that is garbled in the extracted text and '-' marks a row for which no sequence was extracted.]
Table 6-5. Characteristics of the putative S-LPS synthesis genes. Start codons are in bold. Putative Shine-Dalgarno sequences are underlined. * incomplete ORF. ND, not determined because the ORF is incomplete.
LpsA resembles GDP-mannose 4,6-dehydratases. The start codon for lpsA is 143 bp 3' of rsaE.
No promoter matching the consensus sequence was found upstream of lpsA, as would be expected if there is a terminator after rsaE (see Ch. 3). The LpsA sequence has up to 65.2% identity and 88.6% similarity over its entire length to GDP-mannose 4,6-dehydratases from P. aeruginosa and E. coli (Table 6-4). These enzymes convert GDP-mannose to GDP-4-keto-6-deoxymannose (Stevenson et al., 1996) as part of polysaccharide biosynthetic pathways. One example of this is the synthesis of perosamine in V. cholerae and E. coli O157. The significant similarity to GDP-mannose 4,6-dehydratases suggests that this is also the function of LpsA, although no Tn5 insertion was found in the gene.

LpsB is similar to N-acetyltransferases. The gene lpsB follows lpsA by 2 bp, suggesting that these genes are transcriptionally coupled. The protein encoded by the gene shows significant similarity to WlaI from C. jejuni and NeuD from E. coli (Table 6-4). WlaI is involved in the synthesis of the O-antigen (Fry et al., 1998), while the function of NeuD is not clear, but it is thought to be involved in NeuNAc transfer (Annunziato et al., 1995). These proteins also show some similarity to the LpxA proteins from E. coli and S. enterica. The LpxA proteins are UDP-N-acetylglucosamine O-acetyltransferases that are involved in the first step of Lipid A biosynthesis and have 24 to 26 unique hexapeptide motifs starting with an isoleucine, leucine or valine residue often followed by a glycine (Vaara, 1992; Vuorio et al., 1994). LpsB, WlaI and NeuD contain several of these hexapeptide repeats (Fig. 6-6). The protein WbdR from E. coli O157 also contains these hexapeptide repeats and has 72.2% sequence similarity to LpsB. WbdR is thought to be an N-acetyltransferase which converts GDP-perosamine to GDP-N-acetyl perosamine (Wang and Reeves, 1998). Since the data in this chapter suggest that the genes involved in perosamine synthesis in E. coli O157 are also present in C.
crescentus, LpsB may acetylate GDP-perosamine like WbdR.

Figure 6-6. ClustalW alignment of LpsB. Alignment of LpsB with WlaI from C. jejuni (Accession CAA72358) and NeuD from E. coli (ACC43301). Asterisks mark the hexapeptide motifs found in glycosyl transferase. Identical and similar residues are boxed.

LpsC appears to be a perosamine synthetase. The gene encoding LpsC starts 74 bp 3' of lpsB, but no promoter sequence was found between lpsB and lpsC. LpsC has considerable identity over its entire length to the rfbE and per gene products that are thought to synthesize perosamine (Table 6-4). These proteins likely catalyze the conversion of GDP-4-keto-6-D-deoxymannose to GDP-perosamine (4-amino-4,6-dideoxymannose) in V. cholerae and E. coli O157 (Stroeher et al., 1995; Wang and Reeves, 1998) and show similarity to two classes of pyridoxal-binding proteins involved in the synthesis of amino sugars similar to perosamine. The perosamine synthetic pathway has not been proven chemically, but the proteins suspected in the synthesis of perosamine are the only highly similar proteins involved in O-antigen synthesis found in common between Vibrio cholerae and E.
coli O157, supporting these predictions (Wang and Reeves, 1998). Based on the similarity to these genes, it is likely that LpsC is a perosamine synthetase.

Figure 6-7. ClustalW alignment of LpsD and LpsE with WbaZ genes from E. coli (Accession AAD21571) and S. enterica (X61917) and WbaZ homologues from A. fulgidus (AAB91187). Identical and similar residues are boxed. Identical residues have dark shading. Similar residues have light shading. The consensus sequence is located below the alignment.
LpsD and LpsE resemble glycosyltransferases. The gene for LpsD follows lpsC by 6 bp and the gene for LpsE follows lpsD by 13 bp, suggesting that all three genes are part of a polycistron. Both LpsD and LpsE have significant similarity to the WbaZ proteins (Fig 6-7). These proteins also have similarity to the RfbU-related proteins, but size and amino acid similarity indicate that the WbaZ-like proteins are a separate family. WbaZ is a known mannosyltransferase in S. enterica (Liu et al., 1993). It seems likely that LpsD and LpsE function to link perosamine monomers to the O-antigen, with each providing a different form of linkage.

LpsF is similar to perosamine transferases. The gene for LpsF is separated from lpsABCDE by rsaF and is transcribed in the opposite orientation. LpsF, like LpsD and LpsE, appears to be a mannosyltransferase, but has greater similarity to the RfbU family. The similarity to mannosyltransferases is much less than that seen with LpsD and LpsE, but it does have significant similarity to the C-terminus of the E. coli mannosyltransferases WbdB and WbdA (Kido et al., 1998; Sugiyama et al., 1998) and to RfbU from V. cholerae (Wang and Reeves, 1998). RfbU from V. cholerae is known to transfer a perosamine residue onto the growing O-antigen chain. These proteins contain a signature motif that is also found in LpsF (Fig 6-8). This motif consists of the sequence EX[XF]GXXXXE[AG] with a serine preceding the motif by 3 to 5 residues (Geremia et al., 1996; Rocchetta et al., 1998). Again, it seems likely that LpsF acts to add perosamine residues onto the O-antigen.
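The motif search described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the method used in this work: the function name find_motifs, the regular-expression reading of EX[XF]GXXXXE[AG] (X taken as any residue, so the [XF] position is unconstrained), and the reading of the serine rule as a serine lying 3 to 5 positions before the first glutamate are all assumptions. The example sequence is a toy fragment modelled on the RfbU (V. cholerae) row of Fig 6-8, not a verified database sequence.

```python
import re

# Assumed regex reading of the motif EX[XF]GXXXXE[AG]; X is any residue,
# so the [XF] position places no constraint on the match.
MOTIF = re.compile(r'E.{2}G.{4}E[AG]')


def find_motifs(seq, ser_min=3, ser_max=5):
    '''Return (start, matched_text, has_serine) for each motif hit in seq.

    has_serine is True when an S lies ser_min to ser_max positions before
    the motif start (an assumed reading of the 3 to 5 residue rule).
    '''
    hits = []
    pos = 0
    while True:
        m = MOTIF.search(seq, pos)
        if m is None:
            break
        start = m.start()
        # Residues at offsets ser_max..ser_min upstream of the motif start.
        window = seq[max(0, start - ser_max):max(0, start - ser_min + 1)]
        hits.append((start, m.group(), 'S' in window))
        pos = start + 1  # step by one so overlapping hits are not skipped
    return hits


# Toy fragment (hypothetical, for illustration only).
toy = 'LHLLSKAFVFPSHLREAFGISLIEAMYCKA'
for hit in find_motifs(toy):
    print(hit)
```

Reporting the serine check as a separate flag, rather than folding it into the pattern, keeps hits visible for sequences where the upstream region falls in an alignment gap, as appears to be the case for LpsF in Fig 6-8.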
Figure 6-8. ClustalW alignment of LpsF with a number of known mannosyl transferases. The mannosyl transferase motif is boxed. The conserved serine is marked with *. RfbU - Vibrio cholerae (Accession Y07788), RfbU - E. coli (BAA31838), WbdA, WbdB - E. coli (D43637). Identical and similar residues are boxed. Identical residues have dark shading. Similar residues have light shading. The consensus sequence is located below the alignment.

LpsG is similar to phosphomannomutases. Two Tn5 insertion mutants had an interrupted lpsG gene. The lpsG gene does not appear to be linked to any of the other lps genes (Table 6-1 and Table 6-2). This protein has very high identity along its entire length to a number of phosphomannomutase enzymes, suggesting that this is the function of LpsG (Table 6-4). Phosphomannomutase converts mannose-6-phosphate to mannose-1-phosphate and is one of the enzymes implicated in perosamine synthesis (Stroeher et al., 1995; Wang and Reeves, 1998).

LpsH may have a dual function as a phosphomannoisomerase and mannose-1-phosphate guanyltransferase. Two shedder mutants have Tn5 insertions within lpsH that result in loss of proper O-antigen production. It was not possible to link this gene with the RsaA transport genes using the TIGR Caulobacter genome sequence, but Southern analysis showed that lpsH is linked (Table 6-1 and Table 6-2).
LpsH has significant identity over its entire length to a large family of enzymes that have dual functions as a phosphomannoisomerase and mannose-1-phosphate guanyltransferase (Table 6-4). Both functions are required for the synthesis of perosamine (Stroeher et al., 1995) and are probably also performed by LpsH in C. crescentus. These functions are split up in E. coli O157 into the manA and manC genes (Wang and Reeves, 1998).

LpsI has similarity to the LacI repressor family. The Tn5 insertion in mutant F1 interrupts lpsI. Southern blot analysis indicated that this insertion is linked to the Rsa locus. This insertion has a different phenotype from every other shedder Tn5 insertion. Analysis of the O-antigen by SDS-PAGE and silver staining reveals that a lower amount of O-antigen is produced by this mutant. Analysis of LpsI indicates that the highest degree of identity is with CcpA, the catabolite control protein in Bacillus subtilis. CcpA represses carbohydrate utilization enzymes such as α-amylase and acetyl coenzyme A synthetase and has a positive regulatory effect on excess carbon excretion proteins such as acetate kinase (Henkin et al., 1991). Lower sequence identity is found to a number of LacI repressor-like proteins (Table 6-4). Analysis of the genes adjacent to lpsI revealed the presence of analogues of glucokinase, 6-phosphogluconate dehydratase and glucose-6-phosphate-1-dehydrogenase, enzymes involved in basic metabolic pathways. This positioning suggests that LpsI may regulate the transcription of these genes. If LpsI has a repressor effect on these enzymes it could slow the production of O-antigen, as glucose-6-phosphate would tend not to be shunted into the perosamine synthetic pathway. Instead, it would be used for energy production in central metabolism.

LpsJ is similar to galactosyl transferases. The Tn5 insertion F24 interrupts a gene with sequence similarity to several galactosyl transferases (Fig. 6-9).
These enzymes appear to transfer the first sugar residue (usually a galactose) to undecaprenol phosphate, the lipid precursor.

Figure 6-9. ClustalW alignment of LpsJ with putative galactosyltransferases. RfbW - V. cholerae (Accession Y07788), LpsB1 - R. etli (U56723), WlaH - C. jejuni (CAA72357), WlbG - Bordetella pertussis (X90711). Identical and similar residues are boxed. Identical residues have dark shading. Similar residues have light shading. The consensus sequence is located below the alignment.

RfbW is one of these enzymes and its
sequence is 47.2% identical and 79.8% similar to LpsJ over 144 amino acids. RfbW is involved in the synthesis of the perosamine homopolymer making up the O-antigen of V. cholerae O1 (Fallarino et al., 1997), suggesting that RfbW may transfer the first perosamine to the lipid precursor. In C. crescentus, LpsJ may initiate the formation of the O-antigen by attaching the first sugar residue (probably a perosamine) to the undecaprenol phosphate.

LpsK has sequence similarity to amino sugar synthesis enzymes. The mutant F3 has an interruption in lpsK. It was only possible to determine the sequence for the 5' end of lpsK from the TIGR genome. The partial sequence of LpsK is similar to a number of large proteins, usually consisting of over 600 amino acids, suggesting that approximately 150 amino acids are missing from the C-terminus of the LpsK coding sequence (Fig 6-10). There is still considerable similarity, especially in the middle of the protein, to WlaL, RfbV and WlbL from C. jejuni, V. cholerae O1 and B. pertussis. These proteins contain 5 hydrophobic, predicted transmembrane domains in the N-terminus. The central portion contains an NAD-binding site and is homologous to UDP-glucose-4-epimerases. Two motifs have been implicated in binding of NAD in these proteins, GXGXXG and GAGGSIG (Fallarino et al., 1997). As seen in Fig 6-10, the second motif is found in all the proteins, but the first only occurs in RfbV and WlbL, suggesting that not all members of this family contain this motif. The C-terminal 300 amino acids of these proteins have identity with dTDP-glucose 4,6-dehydratases (Bechthold et al., 1995; Linton et al., 1995). These proteins are usually associated with synthesizing amino 6-deoxy and dideoxy sugars involved in LPS synthesis or extracellular polysaccharides and probably perform multiple functions to account for the 3 domains. LpsK was not found linked to the other O-antigen synthesis genes.
This may indicate that LpsK is involved in the synthesis of a core sugar, possibly the terminal core sugar. Interruption of this gene may prevent attachment of the O-antigen to the core, resulting in the observed shedding phenotype.

Figure 6-10. ClustalW alignment of LpsK.
The first NAD binding motif is underlined. The second NAD motif is boxed. Only RfbV and WlbL contain the first motif. Only a partial sequence of LpsK has been deduced and the alignment is truncated after the LpsK sequence. RfbV - V. cholerae (Accession Y07788), WlaL - C. jejuni (CAA72360), WlbL - B. pertussis (X90711).

LpsL may be a glycosyltransferase. The mutant F26 has an insertion in lpsL. This gene is 5' to an ORF with similarity to exsG, which was implicated in extracellular polysaccharide synthesis (Becker et al., 1995). The LpsL amino acid sequence is 29.8% identical and 51.7% similar over a range of 87 amino acids to a putative rhamnosyl transferase in Streptococcus pneumoniae (Table 6-4). Rhamnose is a 6-deoxy derivative of mannose, as is perosamine, suggesting that LpsL may be another perosamine transferase.

The functions of some of the Tn5-interrupted genes are still unidentified. The Tn5 insertions F22 and F10 interrupt an ORF with no identity to any known protein. However, 5' of this ORF is an ORF corresponding to an ABC-2 transporter. These transporters are known to transport extracellular polysaccharides and O-antigens through the cytoplasmic membrane (Whitfield, 1995). Unlike the ABC transporters of the type I secretion systems, the ABC and transmembrane domains consist of separate proteins. It is possible that the ORF interrupted by F10 and F22 represents the transmembrane protein part of the ABC-2 transporter, but hydropathy analysis does not suggest that this protein contains transmembrane segments. The ABC-2 transporters are often found adjacent to genes involved in polysaccharide synthesis; therefore it may be that the ORF interrupted by the F10 and F22 mutants is also involved in polysaccharide synthesis. The Tn5 insertion F6 interrupts orf1, which has similarity to a chemotaxis receptor (Ward et al., 1995).
CheY, a chemotaxis regulator, is found linked to a number of O-antigen synthesis genes with similarity to lpsJ, lpsB, lpsC and lpsK in C. jejuni. It may be that the genes involved in chemotaxis are found close to the O-antigen synthesis genes in C. crescentus and that the F6 insertion has a polar effect on downstream S-LPS genes. It is also possible that this ORF has nothing to do with LPS synthesis and the Tn5 insertion may not cause the shedding phenotype. Instead, a second mutation may cause the altered phenotype.

Summary

As stated at the beginning of the chapter, it seems likely that the S-LPS of C. crescentus is a fixed length homopolymer of approximately forty 4,6-dideoxy-4-amino-hexose residues. Proton NMR anomeric traces suggest that the linkages between the hexose residues may not all be identical. Several of the genes discussed in this chapter are similar to genes found in the synthesis of perosamine in V. cholerae and E. coli O157 (Stroeher et al., 1995; Wang and Reeves, 1998) and, as perosamine is a 4,6-dideoxy-4-amino-hexose, it seems likely that the O-antigen of C. crescentus consists of perosamine residues. All of the enzymes responsible for perosamine synthesis can be found in the lps genes listed above. Four enzymes are involved in converting fructose-6-phosphate to perosamine (Fig. 6-11). The first enzyme in the pathway described by Stroeher et al. (1995) is a phosphomannoisomerase, RfbA. Mutants F25 and F14 are located in lpsH, whose product has significant similarity to RfbA. The second step in the pathway is performed by the enzyme RfbB, a phosphomannomutase. Two Tn5 mutants, F9 and F23, are in the gene for LpsG, an enzyme with considerable similarity to RfbB. The third step in the pathway is catalyzed by RfbA. RfbD, a GDP-mannose 4,6-dehydratase, catalyses the fourth reaction. No Tn5 insertion has been found in a gene with similarity to RfbD, but the coding sequence of the C.
crescentus gene immediately 3' of rsaE, lpsA, shows considerable similarity to RfbD. The last step of the process requires RfbE, the perosamine synthetase. LpsC presumably fulfills this role in C. crescentus, and the shedding mutant F2 has a Tn5 insertion in the lpsC gene. Two more genes need to be considered as part of the perosamine pathway in C. crescentus (Fig. 6-11). Bacteria using the Embden-Meyerhof-Parnas pathway require phosphoglucoisomerase as part of the pathway leading into the bottom half of glycolysis, but C. crescentus uses the Entner-Doudoroff glycolytic pathway instead (Riley and Kolodziej, 1976) and as such would not normally be expected to have the enzyme phosphoglucoisomerase for converting glucose-6-phosphate to fructose-6-phosphate. However, C. crescentus requires phosphoglucoisomerase if it makes perosamine by the pathway described here (Fig 6-11). None of the Tn5 hits were found in such a gene, so the TIGR Caulobacter genome was searched for a phosphoglucoisomerase analogue and one was found in contig gcc_2205.

[Figure 6-11]

A second enzyme, glucokinase, is required for converting glucose to glucose-6-phosphate. A glucokinase analogue was found next to the F1 Tn5 insertion in the potential repressor lpsI. From the position of lpsI it may be deduced that LpsI has a regulatory effect on the synthesis of glucokinase. Interruption of lpsI by the F1 insertion may alter the expression of glucokinase, which in turn would affect perosamine synthesis, resulting in the phenotype seen in the F1 mutant (less O-antigen). These data suggest that C.
crescentus contains all the genes necessary for the synthesis of perosamine. Furthermore, 5 separate Tn5 insertions in 3 of the ORFs cause loss of O-antigen synthesis, strengthening the argument that perosamine makes up the O-antigen of the S-LPS. Six of the Tn5 insertions appear to be in glycosyltransferases (lpsD, lpsE, lpsF, lpsJ, and lpsL) (Fig. 6-4). This would be expected, since proton NMR suggests there are a number of different linkages between the sugars in the O-antigen. The similarities of LpsJ to galactosyltransferases, which transfer the initial sugar to the lipid precursor, suggest that LpsJ may initiate the first addition of a sugar to the undecaprenol phosphate. The S-LPS chemical composition suggests that this first sugar is a perosamine, but it is possible that it is galactose. Galactose is found in the core and it is possible that traces found during analysis of the O-antigen would be attributed to contamination from the core. LpsK may be involved in the synthesis of a sugar residue. As all the enzymes for the synthesis of perosamine are accounted for in the other lps genes, LpsK may synthesize an unidentified sugar in the O-antigen (possibly an initial galactose linked by LpsJ) or a sugar in the LPS core. O-antigens are elongated at either the reducing terminus or the non-reducing terminus. If the O-antigen elongates at the reducing terminus, individual sugars are 'flipped' across the cytoplasmic membrane by a flippase enzyme and the O-antigen is assembled in the periplasm. If synthesis of the O-antigen occurs at the non-reducing terminus, the chain elongates in the cytoplasm and an ABC-2 transporter is required to transport the O-antigen chain across the cytoplasmic membrane (Whitfield, 1995). If the ABC-2 transporter upstream of the F10 and F22 insertions is involved in the transport of the O-antigen, it suggests that the O-antigen is elongated by polymerization at the non-reducing terminus.
The O-antigen would then be transported through the cytoplasmic membrane by the ABC-2 transporter, where it would then be transferred to the lipid-A core. While it has not been proven that any of the ORFs listed here are required for O-antigen synthesis, the presence of multiple Tn5 insertions in some of the ORFs confirms that the Tn5 is responsible for causing the defective S-LPS phenotype and the interrupted ORF is very likely a gene involved in making the S-LPS.

Chapter 7

Conclusions and Future Considerations

The attachment and secretion of the S-layer appear to be linked, although RsaA can be secreted even when the S-LPS is defective and the S-layer cannot attach to the surface. While searching for the secretion components, genes involved in the synthesis and assembly of the S-LPS were found linked to the transport complex. In prokaryotes, genetic linkage often implies linkage of the function. In this case, the most obvious link is that the S-LPS is required for attachment of the S-layer. Since C. crescentus is a non-pathogenic bacterium, the only apparent function for the S-LPS is to allow attachment of the S-layer to the outer membrane. As such, it seems likely that the bacterium coordinates production of the S-layer and S-LPS and that clustering of the genes allows better control. Similar linkages between the S-LPS and S-layer translocation have been found in Acinetobacter sp. and Aeromonas salmonicida (Belland and Trust, 1985; Thorne et al., 1976). A linkage between type I secretion systems and S-LPS has also been found. Three genes involved in the synthesis of the smooth LPS have also been implicated in the secretion of α-hemolysin from E. coli (Stanley et al., 1993; Wandersman and Letoffe, 1993). It is suspected that these genes are required for the proper insertion of the OMP component in the outer membrane.

RsaA is secreted by a type I secretion mechanism.
All three main components of this system have been found and all are linked to the rsaA gene, although the OMP gene is separated from the others by 5 kb. These genes are similar to a number of other type I secretion mechanisms. The highest similarity was found to systems secreting proteases and lipases from P. aeruginosa, E. chrysanthemi and S. marcescens. The identity between these systems is high enough that the proteases, AprA and PrtB, were successfully secreted by the RsaA transport machinery. The genetic arrangement of the RsaA transporter genes is unusual. Typically, either all three genes are on either side of the substrate gene or the OMP gene is unlinked to the rest of the genes. In the RsaA transport system, 5 genes are found between the MFP and the OMP, an arrangement that has not been found before. These 5 genes appear to be required for the synthesis of the O-antigen. Another unusual finding was the presence of a homologous ORF of the OMP component found elsewhere in the genome. This homologue has 60% identity to rsaF, but is not required for the secretion of RsaA. The function of this homologue, and even whether the gene produces a functional protein, remains to be discovered. RsaA accounts for a large portion of the cellular protein (10 to 12%). As far as can be determined, the RsaA secretion machinery secretes a larger fraction of total cell protein than any other known type I secretion mechanism. This high level of protein production is apparently necessary to keep the cell completely covered with S-layer at all times and is similar to the levels noted for other bacterial S-layer proteins (Messner and Sleytr, 1992). This means that the RsaA secretion machinery is either more efficient than that of other type I secretion systems or that a larger number of transport complexes exist in the membranes or a combination of both factors.
This question is an important one to answer from a fundamental research perspective, to address such things as what makes a secretion apparatus more efficient. It is also important because some current research is engaged in evaluating the potential of the S-layer protein secretion system for the secretion of heterologous proteins and peptides in a biotechnological context (Bingle et al., 1997a; Bingle et al., 1997b), where increased levels of secretion have obvious utility. Now that the genes involved in the transport of RsaA have been discovered, it will be possible to address such issues. For example, gene duplications of the transporter genes can be made to see if more copies of the transporter components increase secretion. In addition, with the genes in hand it will be possible to produce and isolate the individual components and make antisera against them. Antibodies can then be used to assess the amount of protein present in the cell.

Most of the genes involved in O-antigen synthesis are linked to the transporter genes. In addition to the O-antigen synthesis genes mentioned above, a number of other genes involved in O-antigen synthesis have been found by Tn5 mutagenesis. While the linkage pattern of these genes was not as obvious, Southern blot analysis showed that the majority of the Tn5 insertions found were linked to the transporter genes as well. However, it was not demonstrated that all of the Tn5 insertions were as closely linked to the transporters. As the Southern analysis of the Tn5 insertions only used two restriction enzymes, further analysis may prove that these other genes are also linked. Usually, all the genes involved in the synthesis of the O-antigen are linked on a 20-30 kb fragment of DNA. Sequencing further, past lpsF, should reveal other genes involved in O-antigen synthesis, possibly including genes not found here by Tn5 mutagenesis.

Perosamine appears to be the major component of the O-antigen.
Analysis of the O-antigen showed that it is composed of a 4,6-dideoxy-4-amino-hexose, of which perosamine is an example. It was shown in this report that all the genes required for the synthesis of perosamine are found in the genome of C. crescentus. Furthermore, three of these genes were disrupted by transposon mutagenesis, leading to an altered O-antigen. It is reasonable to conclude from these data that perosamine is the 4,6-dideoxy-4-amino-hexose seen in the chemical analysis of the O-antigen.

Several glycosyltransferases are involved in the synthesis of the O-antigen. NMR analysis of the O-antigen revealed a number of different anomeric proton signals, suggesting that there are several different linkages between the sugar residues. This implies the presence of multiple glycosyltransferases to produce these linkages. A number of Tn5 insertions altering the O-antigen were found in genes with similarity to mannosyltransferases. Since perosamine is a derivative of mannose, the transferases are probably highly similar, and this has been found with the perosamine transferase, RfbV from E. coli O157 (see Ch. 6). One Tn5 insertion interrupts a gene with similarity to galactosyltransferases that transfer the first sugar residue to the lipid precursor of the O-antigen. It may be that this enzyme, LpsJ, transfers a galactose to the lipid precursor as a first step in the growing O-antigen. Alternatively, since perosamine is an isomer of galactose, a perosamine may be the first residue of the O-antigen chain. Galactose may have been missed in the analysis of the O-antigen since it is also found in the core and a slightly increased level, relative to other core sugars, would have gone unnoticed.

Several other genes involved in the proper formation of the smooth LPS have also been found. One, lpsK, may be involved in synthesis of a core or O-antigen sugar. Another, lpsI, appears to code for a transcription repressor that affects smooth LPS production.
Tn5 insertions interrupting O-antigen synthesis were found in two ORFs with no similarity to any known proteins. Two of these insertions are 3' of an ORF coding for an ABC-2 transporter. ABC-2 transporters export O-antigens and extracellular polysaccharides. If this is the ABC-2 transporter that exports the O-antigen, it suggests that the O-antigen is synthesized in the cytoplasm by addition of sugar residues to the non-reducing terminus. The information provided here should assist in determining the correct structure of the S-LPS and may also allow the attachment site(s) of the O-antigen to RsaA to be determined. A number of possibilities present themselves for future steps in the analysis of the S-LPS. The first obvious step is to isolate the DNA containing the genes lpsGHIJKL and determine how closely they are linked. Sequencing of this DNA may reveal other genes involved in O-antigen synthesis and possibly synthesis of the core (for example, LpsK may be involved in synthesis of a core sugar and the DNA surrounding it may contain the remaining synthesis genes). The other obvious experiment is to knock out LpsA and LpsB and confirm that they are involved in the synthesis of the O-antigen. There may be more genes involved in the synthesis of the O-antigen that were not found when screening the Tn5 library. For example, interruptions of O-antigen synthesis genes that did not result in complete detachment of the S-layer may have been missed by the screen. An example of this might be enzymes that transfer sugar residues but are not involved in the attachment process. The S-layer lies very close to the outer membrane of the bacterium, as seen in electron micrographs (Smit et al., 1981; Smit et al., 1984). If the O-antigen consisted of a single chain, it would be 40 residues long; long enough to span the distance between the S-layer and outer membrane numerous times. 
This suggests either that the S-layer attaches to several points along the chain (Fig. 1-4) or that the O-antigen has multiple branches. Selective mutation of the various transferases, or use of the existing Tn5 mutants, should allow one to determine which of these possibilities is correct by analyzing the differently sized O-antigens that are produced. Summary RsaA, the S-layer subunit of C. crescentus, is transported by a type I secretion system involving three proteins: an ABC transporter, a periplasm-spanning Membrane Forming Protein and an outer membrane protein. It was shown that a number of other FWC species also contain type I secretion systems that probably secrete the S-layer subunit. The evolutionary relationships of these type I secretion systems and the S-layer subunit genes were examined. A number of genes involved in the synthesis of the smooth LPS were found. Some of these genes code for enzymes involved in the synthesis of perosamine, the likely major component of the O-antigen. Other genes code for the glycosyltransferases that link the sugar residues of the O-antigen to each other. Bibliography Abraham, W., Strompl, C., Meyer, H., Lindholst, S., Moore, E.R.B., Christ, R., Vancanneyt, M., Tindall, B.J., Bennasar, A., Smit, J. and Tesar, M. (1999) Phylogeny and polyphasic taxonomy of Caulobacter species. Proposal of Maricaulis gen. nov. with Maricaulis maris (Poindexter) comb. nov. as the type species and emended description of the genera Brevundimonas and Caulobacter. Int. J. Syst. Bacteriol., 49, 1053-1073. Akatsuka, H., Binet, R., Kawai, E., Wandersman, C. and Omori, K. (1997) Lipase secretion by bacterial hybrid ATP-binding cassette exporters: molecular recognition of the LipBCD, PrtDEF, and HasDEF exporters. J. Bacteriol., 179, 4754-4760. Alley, M.R., Gomes, S.L., Alexander, W. and Shapiro, L. (1991) Genetic analysis of a temporally transcribed chemotaxis gene cluster in Caulobacter crescentus. Genetics, 129, 333-341. Altschul, S.F.
, Gish, W., Miller, W., Myers, E.W. and Lipman, D.J. (1990) Basic local alignment search tool. J. Mol. Biol., 215, 403-410. Anderson, D.M. and Schneewind, O. (1997) A mRNA signal for the type III secretion of Yop proteins by Yersinia enterocolitica. Science, 278, 1140-1143. Anderson, D.M. and Schneewind, O. (1999) Type III machines of Gram-negative pathogens: injecting virulence factors into host cells and more. Curr. Opin. Microbiol., 2, 18-24. Annunziato, P.W., Wright, L.F., Vann, W.F. and Silver, R.P. (1995) Nucleotide sequence and genetic analysis of the neuD and neuB genes in region 2 of the polysialic acid gene cluster of Escherichia coli K1. J. Bacteriol., 177, 312-319. Armstrong, S., Zhang, H., Tabernero, L., Hermodson, M. and Stauffacher, C. (1999) Powering the ABC transporter: The crystallographic structure of the ATP-binding cassette, RbsA. ATP-Binding Cassette Transporters: From Multidrug Resistance to Genetic Disease. FEBS Advanced Lecture Course, Gosau, Austria, p. 3. Awram, P. and Smit, J. (1998) The Caulobacter crescentus paracrystalline S-layer protein is secreted by an ABC transporter (type I) secretion apparatus. J. Bacteriol., 180, 3062-3069. Bairoch, A. (1992) PROSITE: a dictionary of sites and patterns in proteins. Nuc. Acids Res., 20 Suppl, 2013-2018. Bechthold, A., Sohng, J.K., Smith, T.M., Chu, X. and Floss, H.G. (1995) Identification of Streptomyces violaceoruber Tu22 genes involved in the biosynthesis of granaticin. Mol. Gen. Genet., 248, 610-620. Becker, A., Kuster, H., Niehaus, K. and Puhler, A. (1995) Extension of the Rhizobium meliloti succinoglycan biosynthesis gene cluster: identification of the exsA gene encoding an ABC transporter protein, and the exsB gene which probably codes for a regulator of succinoglycan biosynthesis. Mol. Gen. Genet., 249, 487-497. Belland, R.J. and Trust, T.J. (1985) Synthesis, export, and assembly of Aeromonas salmonicida A-layer analyzed by transposon mutagenesis. J. 
Bacteriol., 163, 877-881. Beveridge, T.J., Pouwels, P.H., Sara, M., Kotiranta, A., Lounatmaa, K., Kari, K., Kerosuo, E., Haapasalo, M., Egelseer, E.M., Schocher, I., Sleytr, U.B., Morelli, L., Callegari, M.L., Nomellini, J.F., Bingle, W.H., Smit, J., Leibovitz, E., Lemaire, M., Miras, I., Salamitou, S., Beguin, P., Ohayon, H., Gounon, P., Matuschek, M. and Koval, S.F. (1997) Functions of S-layers. FEMS Microbiol. Rev., 20, 99-149. Bik, E.M., Bunschoten, A.E., Willems, R.J., Chang, A.C. and Mooi, F.R. (1996) Genetic organization and functional analysis of the otn DNA essential for cell-wall polysaccharide synthesis in Vibrio cholerae O139. Mol. Microbiol., 20, 799-811. Binet, R., Letoffe, S., Ghigo, J.M., Delepelaire, P. and Wandersman, C. (1997) Protein secretion by Gram-negative bacterial ABC exporters - a review. Gene, 192, 7-11. Binet, R. and Wandersman, C. (1995) Protein secretion by hybrid bacterial ABC-transporters: specific functions of the membrane ATPase and the membrane fusion protein. EMBO J., 14, 2298-2306. Binet, R. and Wandersman, C. (1996) Cloning of the Serratia marcescens hasF gene encoding the Has ABC exporter outer membrane component: a TolC analogue. Mol. Microbiol., 22, 265-273. Bingle, W.H., Awram, P., Nomellini, J.F. and Smit, J. (1999) The Secretion Signal of C. crescentus S-layer Protein is Located Within the C-Terminal 82 Amino Acids of the Molecule, submitted J. Bacteriol. Bingle, W.H., Le, K.D. and Smit, J. (1996) The extreme N-terminus of the Caulobacter crescentus surface-layer protein directs export of passenger proteins from the cytoplasm but is not required for secretion of the native protein. Can. J. Microbiol., 42, 672-684. Bingle, W.H., Nomellini, J.F. and Smit, J. (1997a) Cell-surface display of a Pseudomonas aeruginosa strain K pilin peptide within the paracrystalline S-layer of Caulobacter crescentus. Mol. Microbiol., 26, 277-288. Bingle, W.H., Nomellini, J.F. and Smit, J. 
(1997b) Linker mutagenesis of the Caulobacter crescentus S-layer protein: toward a definition of an N-terminal anchoring region and a C-terminal secretion signal and the potential for heterologous protein secretion. J. Bacteriol., 179, 601-611. Bingle, W.H. and Smit, J. (1994) Alkaline phosphatase and a cellulase reporter protein are not exported from the cytoplasm when fused to large N-terminal portions of the Caulobacter crescentus surface (S)-layer protein. Can. J. Microbiol., 40, 777-782. Blaser, M.J., Smith, P.F. and Kohler, P.F. (1985) Susceptibility of Campylobacter isolates to the bactericidal activity of human serum. J. Infect. Dis., 151, 227-235. Blaser, M.J., Smith, P.F., Repine, J.E. and Joiner, K.A. (1988) Pathogenesis of Campylobacter fetus infections. Failure of encapsulated Campylobacter fetus to bind C3b explains serum and phagocytosis resistance. J. Clin. Invest., 81, 1434-1444. Boos, W. and Shuman, H. (1998) Maltose\/maltodextrin system of Escherichia coli: transport, metabolism, and regulation. Microbiol. Mol. Biol. Rev., 62, 204-229. Boot, H.J. and Pouwels, P.H. (1996) Expression, secretion and antigenic variation of bacterial S-layer proteins. Mol. Microbiol., 21, 1117-1123. Borinski, R. and Holt, S.C. (1990) Surface characteristics of Wolinella recta ATCC 33238 and human clinical isolates: correlation of structure with function. Infect. Immun., 58, 2770-2776. Brent, R. and Ptashne, M. (1980) The lexA gene product represses its own promoter. Proc. Natl. Acad. Sci. U. S. A., 77, 1932-1936. Brun, Y.V., Marczynski, G. and Shapiro, L. (1994) The expression of asymmetry during Caulobacter cell differentiation. Annu. Rev. Biochem., 63, 419-450. Burns, D.L. (1999) Biochemistry of type IV secretion. Curr. Opin. Microbiol., 2, 25-29. Canter Cremers, H., Spaink, H.P., Wijfjes, A.H., Pees, E., Wijffelman, C.A., Okker, R.J. and Lugtenberg, B.J. (1989) Additional nodulation genes on the Sym plasmid of Rhizobium leguminosarum biovar viciae. 
Plant Mol. Biol., 13, 163-174. Colnaghi, R., Pagani, S., Kennedy, C. and Drummond, M. (1996) Cloning, sequence analysis and overexpression of the rhodanese gene of Azotobacter vinelandii. Eur. J. Biochem., 236, 240-248. Croop, J.M. (1998) Evolutionary relationships among ABC transporters. Methods Enzymol., 292, 101-116. Currie, H.L., Lightfoot, J. and Lam, J.S. (1995) Prevalence of gca, a gene involved in synthesis of A-band common antigen polysaccharide in Pseudomonas aeruginosa. Clin. Diagn. Lab. Immunol., 2, 554-562. Davidson, A.L. and Nikaido, H. (1991) Purification and characterization of the membrane-associated components of the maltose transport system from Escherichia coli. J. Biol. Chem., 266, 8946-8951. Decottignies, A. and Goffeau, A. (1997) Complete inventory of the yeast ABC proteins. Nature Genet., 15, 137-145. Delepelaire, P. and Wandersman, C. (1990) Protein secretion in gram-negative bacteria. The extracellular metalloprotease B from Erwinia chrysanthemi contains a C-terminal secretion signal analogous to that of Escherichia coli alpha-hemolysin. J. Biol. Chem., 265, 17118-17125. Delepelaire, P. and Wandersman, C. (1991) Characterization, localization and transmembrane organization of the three proteins PrtD, PrtE and PrtF necessary for protease secretion by the gram-negative bacterium Erwinia chrysanthemi. Mol. Microbiol., 5, 2427-2434. Dinh, T., Paulsen, I.T. and Saier, M.H., Jr. (1994) A family of extracytoplasmic proteins that allow transport of large molecules across the outer membranes of gram-negative bacteria. J. Bacteriol., 176, 3825-3831. Drummelsmith, J. and Whitfield, C. (1999) Gene products required for surface expression of the capsular form of the group 1 K antigen in Escherichia coli (O9a:K30). Mol. Microbiol., 31, 1321-1332. Duong, F., Lazdunski, A., Cami, B. and Murgier, M. 
(1992) Sequence of a cluster of genes controlling synthesis and secretion of alkaline protease in Pseudomonas aeruginosa: relationships to other secretory pathways. Gene, 121, 47-54. Duong, F., Lazdunski, A. and Murgier, M. (1996) Protein secretion by heterologous bacterial ABC-transporters: the C-terminal secretion signal of the secreted protein confers high recognition specificity. Mol. Microbiol., 21, 459-470. Dworkin, J., Tummuru, M.K.R. and Blaser, M.J. (1995) A lipopolysaccharide-binding domain of the Campylobacter fetus S-layer protein resides within the conserved N-terminus of a family of silent and divergent homologs. J. Bacteriol., 177, 1734-1741. Edwards, P. and Smit, J. (1991) A transducing bacteriophage for Caulobacter crescentus uses the paracrystalline surface layer protein as a receptor. J. Bacteriol., 173, 5568-5572. Ehrmann, M., Ehrle, R., Hofmann, E., Boos, W. and Schlosser, A. (1998) The ABC maltose transporter. Mol. Microbiol., 29, 685-694. Eichelberg, K., Ginocchio, C.C. and Galan, J.E. (1994) Molecular and functional characterization of the Salmonella typhimurium invasion genes invB and invC: homology of InvC to the F0F1 ATPase family of proteins. J. Bacteriol., 176, 4501-4510. Fallarino, A., Mavrangelos, C., Stroeher, U.H. and Manning, P.A. (1997) Identification of additional genes required for O-antigen biosynthesis in Vibrio cholerae O1. J. Bacteriol., 179, 2147-2153. Fath, M.J., Skvirsky, R.C. and Kolter, R. (1991) Functional complementation between bacterial MDR-like export systems: colicin V, alpha-hemolysin, and Erwinia protease. J. Bacteriol., 173, 7549-7556. Fellay, R., Frey, J. and Krisch, H. (1987) Interposon mutagenesis of soil and water bacteria: a family of DNA fragments designed for in vitro insertional mutagenesis of gram-negative bacteria. Gene, 52, 147-154. Feng, J.N., Russel, M. and Model, P. (1997) A permeabilized cell system that assembles filamentous bacteriophage. Proc. Natl. Acad. Sci. U. S. 
A., 94, 4068-4073. Finnie, C., Zorreguieta, A., Hartley, N.M. and Downie, J.A. (1998) Characterization of Rhizobium leguminosarum exopolysaccharide glycanases that are secreted via a type I exporter and have a novel heptapeptide repeat motif. J. Bacteriol., 180, 1691-1699. Fisher, J.A., Smit, J. and Agabian, N. (1988) Transcriptional analysis of the major surface array gene of Caulobacter crescentus. J. Bacteriol., 170, 4706-4713. Fry, B.N., Korolik, V., ten Brinke, J.A., Pennings, M.T., Zalm, R., Teunis, B.J., Coloe, P.J. and van der Zeijst, B.A. (1998) The lipopolysaccharide biosynthesis locus of Campylobacter jejuni 81116. Microbiology, 144, 2049-2061. Galan, J.E. and Collmer, A. (1999) Type III secretion machines: bacterial devices for protein delivery into host cells. Science, 284, 1322-1328. Geremia, R.A., Petroni, E.A., Ielpi, L. and Henrissat, B. (1996) Towards a classification of glycosyltransferases based on amino acid sequence similarities: prokaryotic alpha-mannosyltransferases. Biochem. J., 318, 133-138. Gilchrist, A., Fisher, J.A. and Smit, J. (1992) Nucleotide sequence analysis of the gene encoding the Caulobacter crescentus paracrystalline surface layer protein. Can. J. Microbiol., 38, 193-202. Gilchrist, A. and Smit, J. (1991) Transformation of freshwater and marine caulobacters by electroporation. J. Bacteriol., 173, 921-925. Gober, J.W. and Marques, M.V. (1995) Regulation of cellular differentiation in Caulobacter crescentus. Microbiol. Rev., 59, 31-47. Gorbalenya, A.E. and Koonin, E.V. (1990) Superfamily of UvrA-related NTP-binding proteins. Implications for rational classification of recombination\/repair systems. J. Mol. Biol., 213, 583-591. Guzzo, J., Murgier, M., Filloux, A. and Lazdunski, A. (1990) Cloning of the Pseudomonas aeruginosa alkaline protease gene and secretion of the protease into the medium by Escherichia coli. J. Bacteriol., 172, 942-948. Henkin, T.M., Grundy, F.J., Nicholson, W.L. and Chambliss, G.H. 
(1991) Catabolite repression of alpha-amylase gene expression in Bacillus subtilis involves a trans-acting gene product homologous to the Escherichia coli lacI and galR repressors. Mol. Microbiol., 5, 575-584. Holland, I.B. (1999) personal communication. Holton, T.A. and Graham, M.W. (1991) A simple and efficient method for direct cloning of PCR products using ddT-tailed vectors. Nuc. Acids Res., 19, 1156. Hovmoller, S., Sjogren, A. and Wang, D.N. (1988) The structure of crystalline bacterial surface layers. Prog. Biophys. Mol. Biol., 51, 131-163. Hung, L.W., Wang, I.X., Nikaido, K., Liu, P.Q., Ames, G.F. and Kim, S.H. (1998) Crystal structure of the ATP-binding subunit of an ABC transporter. Nature, 396, 703-707. Hwang, J., Zhong, X. and Tai, P.C. (1997) Interactions of dedicated export membrane proteins of the colicin V secretion system: CvaA, a member of the membrane fusion protein family, interacts with CvaB and TolC. J. Bacteriol., 179, 6264-6270. Hyde, S.C., Emsley, P., Hartshorn, M.J., Mimmack, M.M., Gileadi, U., Pearce, S.R., Gallagher, M.P., Gill, D.R., Hubbard, R.E. and Higgins, C.F. (1990) Structural model of ATP-binding proteins associated with cystic fibrosis, multidrug resistance and bacterial transport. Nature, 346, 362-365. Kawai, E., Akatsuka, H., Idei, A., Shibatani, T. and Omori, K. (1998) Serratia marcescens S-layer protein is secreted extracellularly via an ATP-binding cassette exporter, the Lip system. Mol. Microbiol., 27, 941-952. Keen, N.T., Tamaki, S., Kobayashi, D. and Trollinger, D. (1988) Improved broad-host-range plasmids for DNA cloning in gram-negative bacteria. Gene, 70, 191-197. Kenny, B., Taylor, S. and Holland, I.B. (1992) Identification of individual amino acids required for secretion within the haemolysin (HlyA) C-terminal targeting region. Mol. Microbiol., 6, 1477-1489. Kido, N., Sugiyama, T., Yokochi, T., Kobayashi, H. and Okawa, Y. 
(1998) Synthesis of Escherichia coli O9a polysaccharide requires the participation of two domains of WbdA, a mannosyltransferase encoded within the wb* gene cluster. Mol. Microbiol., 27, 1213-1221. Koronakis, V., Hughes, C. and Koronakis, E. (1993) ATPase activity and ATP\/ADP-induced conformational change in the soluble domain of the bacterial protein translocator HlyB. Mol. Microbiol., 8, 1163-1175. Koronakis, V., Li, J., Koronakis, E. and Stauffer, K. (1997) Structure of TolC, the outer membrane component of the bacterial type I efflux system, derived from two-dimensional crystals. Mol. Microbiol., 23, 617-626. Kovach, M.E., Phillips, R.W., Elzer, P.H., Roop, R.M. and Peterson, K.M. (1994) pBBR1MCS: a broad-host-range cloning vector. Biotechniques, 16, 800-802. Koval, S.F. and Hynes, S.H. (1991) Effect of paracrystalline protein surface layers on predation by Bdellovibrio bacteriovorus. J. Bacteriol., 173, 2244-2249. Koval, S.F. and Murray, R.G. (1984) The isolation of surface array proteins from bacteria. Can. J. Biochem. Cell Biol., 62, 1181-1189. Kubori, T., Matsushima, Y., Nakamura, D., Uralil, J., Lara-Tejero, M., Sukhan, A., Galan, J.E. and Aizawa, S.I. (1998) Supramolecular structure of the Salmonella typhimurium type III protein secretion system. Science, 280, 602-605. Leeds, J.A. and Welch, R.A. (1996) RfaH enhances elongation of Escherichia coli hlyCABD mRNA. J. Bacteriol., 178, 1850-1857. Letellier, L., Howard, S.P. and Buckley, J.T. (1997) Studies on the energetics of proaerolysin secretion across the outer membrane of Aeromonas species. Evidence for a requirement for both the protonmotive force and ATP. J. Biol. Chem., 272, 11109-11113. Letoffe, S., Delepelaire, P. and Wandersman, C. (1990) Protease secretion by Erwinia chrysanthemi: The specific secretion functions are analogous to those of Escherichia coli alpha-haemolysin. EMBO J., 9, 1375-1382. Letoffe, S., Ghigo, J.M. and Wandersman, C. 
(1994a) Iron acquisition from heme and hemoglobin by a Serratia marcescens extracellular protein. Proc. Natl. Acad. Sci. U. S. A., 91, 9876-9880. Letoffe, S., Ghigo, J.M. and Wandersman, C. (1994b) Secretion of the Serratia marcescens HasA protein by an ABC transporter. J. Bacteriol., 176, 5372-5377. Letoffe, S. and Wandersman, C. (1992) Secretion of CyaA-PrtB and HlyA-PrtB fusion proteins in Escherichia coli: Involvement of the glycine-rich repeat domain of Erwinia chrysanthemi protease B. J. Bacteriol., 174, 4920-4927. Linton, K.J. and Higgins, C.F. (1998) The Escherichia coli ATP-binding cassette (ABC) proteins. Mol. Microbiol., 28, 5-13. Linton, K.J., Jarvis, B.W. and Hutchinson, C.R. (1995) Cloning of the genes encoding thymidine diphosphoglucose 4,6-dehydratase and thymidine diphospho-4-keto-6-deoxyglucose 3,5-epimerase from the erythromycin-producing Saccharopolyspora erythraea. Gene, 153, 33-40. Liu, D., Haase, A.M., Lindqvist, L., Lindberg, A.A. and Reeves, P.R. (1993) Glycosyl transferases of O-antigen biosynthesis in Salmonella enterica: identification and characterization of transferase genes of groups B, C2, and E1. J. Bacteriol., 175, 3408-3413. Lu, H.M. and Lory, S. (1996) A specific targeting domain in mature exotoxin A is required for its extracellular secretion from Pseudomonas aeruginosa. EMBO J., 15, 429-436. Luckevich, M.D. and Beveridge, T.J. (1989) Characterization of a dynamic S layer on Bacillus thuringiensis. J. Bacteriol., 171, 6656-6667. Mackman, N., Nicaud, J.M., Gray, L. and Holland, I.B. (1985) Identification of polypeptides required for the export of haemolysin 2001 from E. coli. Mol. Gen. Genet., 201, 529-536. MacRae, J.D. and Smit, J. (1991) Characterization of caulobacters isolated from wastewater treatment systems. Appl. Environ. Microbiol., 57, 751-758. Malakooti, J., Wang, S.P. and Ely, B. 
(1995) A consensus promoter sequence for Caulobacter crescentus genes involved in biosynthetic and housekeeping functions. J. Bacteriol., 177, 4372-4376. Martin, V.J. and Mohn, W.W. (1999) An alternative inverse PCR (IPCR) method to amplify DNA sequences flanking Tn5 transposon insertions. J. Microbiol. Methods, 35, 163-166. Maser, P. and Kaminsky, R. (1998) Identification of three ABC transporter genes in Trypanosoma brucei spp. Parasitol. Res., 84, 106-111. Mead, D.A., Szczesna-Skorupa, E. and Kemper, B. (1986) Single-stranded DNA 'blue' T7 promoter plasmids: a versatile tandem promoter system for cloning and protein engineering. Protein Eng., 1, 67-74. Messner, P. and Sleytr, U.B. (1992) Crystalline bacterial cell-surface layers. In Rose, A.H. and Tempest, D.W. (eds.), Advances in Microbial Physiology. Academic Press, London, Vol. 33, pp. 213-275. Morales, V.M., Backman, A. and Bagdasarian, M. (1991) A series of wide-host-range low-copy-number vectors that allow direct screening for recombinants. Gene, 97, 39-47. Munn, C.B., Ishiguro, E.E., Kay, W.W. and Trust, T.J. (1982) Role of surface components in serum resistance of virulent Aeromonas salmonicida. Infect. Immun., 36, 1069-1075. Nielsen, H., Engelbrecht, J., Brunak, S. and von Heijne, G. (1997) Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng., 10, 1-6. Nikaido, H. (1994) Maltose transport system of Escherichia coli: an ABC-type transporter. FEBS Lett., 346, 55-58. Nikaido, H. and Vaara, M. (1985) Molecular basis of bacterial outer membrane permeability. Microbiol. Rev., 49, 1-32. Nomellini, J.F., Kupcu, S., Sleytr, U.B. and Smit, J. (1997) Factors controlling in vitro recrystallization of the Caulobacter crescentus paracrystalline S-layer. J. Bacteriol., 179, 6349-6354. Pearson, W.R., Wood, T., Zhang, Z. and Miller, W. (1997) Comparison of DNA sequences with protein sequences. Genomics, 46, 24-36. 
Pohlner, J., Halter, R., Beyreuther, K. and Meyer, T.F. (1987) Gene structure and extracellular secretion of Neisseria gonorrhoeae IgA protease. Nature, 325, 458-462. Poindexter, J.S. (1981) The caulobacters: ubiquitous unusual bacteria. Microbiol. Rev., 45, 123-179. Pugsley, A.P. (1993) The complete general secretory pathway in gram-negative bacteria. Microbiol. Rev., 57, 50-108. Ravenscroft, N., Walker, S.G., Dutton, G.G. and Smit, J. (1991) Identification, isolation, and structural studies of extracellular polysaccharides produced by Caulobacter crescentus. J. Bacteriol., 173, 5677-5684. Ravenscroft, N., Walker, S.G., Dutton, G.S. and Smit, J.K. (1992) Identification, isolation, and structural studies of the outer membrane lipopolysaccharide of Caulobacter crescentus. J. Bacteriol., 174, 7595-7605. Riley, R.G. and Kolodziej, B.J. (1976) Pathway of glucose catabolism in Caulobacter crescentus. Microbios, 16, 219-226. Roberts, R.C., Mohr, C.D. and Shapiro, L. (1996) Developmental programs in bacteria. Curr. Top. Dev. Biol., 34, 207-257. Rocchetta, H.L., Burrows, L.L., Pacan, J.C. and Lam, J.S. (1998) Three rhamnosyltransferases responsible for assembly of the A-band D-rhamnan polysaccharide in Pseudomonas aeruginosa: a fourth transferase, WbpL, is required for the initiation of both A-band and B-band lipopolysaccharide synthesis [published erratum appears in Mol Microbiol 1998 Dec;30(5):1131]. Mol. Microbiol., 28, 1103-1119. Russel, M. (1998) Macromolecular assembly and secretion across the bacterial cell envelope: type II protein secretion systems. J. Mol. Biol., 279, 485-499. Salmond, G.P. and Reeves, P.J. (1993) Membrane traffic wardens and protein secretion in gram-negative bacteria. Trends Biochem. Sci., 18, 7-12. Sambrook, J., Fritsch, E.F. and Maniatis, T. (1989) Molecular cloning: a laboratory manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. Sara, M. and Sleytr, U.B. 
(1996a) Biotechnology and biomimetic with crystalline bacterial cell surface layers (S-layers). Micron, 27, 141-156. Sara, M. and Sleytr, U.B. (1996b) Crystalline bacterial cell surface layers (S-layers): from cell structure to biomimetics. Prog. Biophys. Mol. Biol., 65, 83-111. Scheu, A.K., Economou, A., Hong, G.F., Ghelani, S., Johnston, A.W. and Downie, J.A. (1992) Secretion of the Rhizobium leguminosarum nodulation protein NodO by haemolysin-type systems. Mol. Microbiol., 6, 231-238. Schnaitman, C.A. and Klena, J.D. (1993) Genetics of lipopolysaccharide biosynthesis in enteric bacteria. Microbiol. Rev., 57, 655-682. Schulein, R., Gentschev, I., Schlor, S., Gross, R. and Goebel, W. (1994) Identification and characterization of two functional domains of the hemolysin translocator protein HlyD. Mol. Gen. Genet., 245, 203-211. Shapiro, L. (1976) Differentiation in the Caulobacter cell cycle. Annu. Rev. Microbiol., 30, 377-407. Shapiro, L. and Losick, R. (1997) Protein localization and cell fate in bacteria. Science, 276, 712-718. Sheps, J.A., Zhang, F. and Ling, V. (1996) Phylogenetic Analysis of Members of the ABC transporter superfamily. In Rothman, S.R. (ed.) Membrane Protein Transport. JAI Press, Greenwich, Connecticut, Vol. 3, p. 81. Simon, R., Priefer, U. and Puhler, A. (1983) A broad host range mobilization system for in vivo genetic engineering: transposon mutagenesis in Gram negative bacteria. Bio\/technology, 1, 784-790. Sleytr, U.B. (1976) Self-assembly of the hexagonally and tetragonally arranged subunits of bacterial surface layers and their reattachment to cell walls. J. Ultrastruct. Res., 55, 360-377. Sleytr, U.B., Bayley, H., Sara, M., Breitwieser, A., Kupcu, S., Mader, C., Weigert, S., Unger, F.M., Messner, P., Jahn-Schmid, B., Schuster, B., Pum, D., Douglas, K., Clark, N.A., Moore, J.T., Winningham, T.A., Levy, S., Frithsen, I., Pankovc, J., Beale, P., Gillis, H.P., Choutov, D.A. and Martin, K.P. (1997a) Applications of S-layers. 
FEMS Microbiol. Rev., 20, 151-175. Sleytr, U.B. and Messner, P. (1983) Crystalline surface layers on bacteria. Annu. Rev. Microbiol., 37, 311-339. Sleytr, U.B. and Messner, P. (1988) Crystalline surface layers in procaryotes. J. Bacteriol., 170, 2891-2897. Sleytr, U.B., Messner, P., Pum, D. and Sara, M. (1993) Crystalline bacterial cell surface layers. Mol. Microbiol., 10, 911-916. Sleytr, U.B., Pum, D. and Sara, M. (1997b) Advances in S-layer nanotechnology and biomimetics. Adv. Biophys., 34, 71-79. Sleytr, U.B. and Sara, M. (1997) Bacterial and archaeal S-layer proteins: structure-function relationships and their biotechnological applications. Trends Biotechnol., 15, 20-26. Smit, J. and Agabian, N. (1984) Cloning of the major protein of the Caulobacter crescentus periodic surface layer: detection and characterization of the cloned peptide by protein expression assays. J. Bacteriol., 160, 1137-1145. Smit, J., Engelhardt, H., Volker, S., Smith, S.H. and Baumeister, W. (1992) The S-layer of Caulobacter crescentus: three-dimensional image reconstruction and structure analysis by electron microscopy. J. Bacteriol., 174, 6527-6538. Smit, J., Grano, D.A., Glaeser, R.M. and Agabian, N. (1981) Periodic surface array in Caulobacter crescentus: fine structure and chemical analysis. J. Bacteriol., 146, 1135-1150. Stahl, D.A., Key, R., Flesher, B. and Smit, J. (1992) The phylogeny of marine and freshwater caulobacters reflects their habitat. J. Bacteriol., 174, 2193-2198. Stanley, P.L., Diaz, P., Bailey, M.J., Gygi, D., Juarez, A. and Hughes, C. (1993) Loss of activity in the secreted form of Escherichia coli haemolysin caused by an rfaP lesion in core lipopolysaccharide assembly. Mol. Microbiol., 10, 781-787. Stevenson, G., Andrianopoulos, K., Hobbs, M. and Reeves, P.R. (1996) Organization of the Escherichia coli K-12 gene cluster responsible for production of the extracellular polysaccharide colanic acid. J. Bacteriol., 178, 4885-4893. Stewart, M. 
and Beveridge, T.J. (1980) Structure of the regular surface layer of Sporosarcina ureae. J. Bacteriol., 142, 302-309. Stroeher, U.H., Karageorgos, L.E., Brown, M.H., Morona, R. and Manning, P.A. (1995) A putative pathway for perosamine biosynthesis is the first function encoded within the rfb region of Vibrio cholerae O1. Gene, 166, 33-42. Sugiyama, T., Kido, N., Kato, Y., Koide, N., Yoshida, T. and Yokochi, T. (1998) Generation of Escherichia coli O9a serotype, a subtype of E. coli O9, by transfer of the wb* gene cluster of Klebsiella O3 into E. coli via recombination. J. Bacteriol., 180, 2775-2778. Sutton, J.M., Peart, J., Dean, G. and Downie, J.A. (1996) Analysis of the C-terminal secretion signal of the Rhizobium leguminosarum nodulation protein NodO; a potential system for the secretion of heterologous proteins during nodule invasion. Mol. Plant Microbe Interact., 9, 671-680. Thompson, S.A., Shedd, O.L., Ray, K.C., Beins, M.H., Jorgensen, J.P. and Blaser, M.J. (1998) Campylobacter fetus surface layer proteins are transported by a type I secretion system. J. Bacteriol., 180, 6450-6458. Thorne, K.J., Oliver, R.C. and Glauert, A.M. (1976) Synthesis and turnover of the regularly arranged surface protein of Acinetobacter sp. relative to the other components of the cell envelope. J. Bacteriol., 127, 440-450. Tobin, M.B., Peery, R.B. and Skatrud, P.L. (1997) Genes encoding multiple drug resistance-like proteins in Aspergillus fumigatus and Aspergillus flavus. Gene, 200, 11-23. Vaara, M. (1992) Eight bacterial proteins, including UDP-N-acetylglucosamine acyltransferase (LpxA) and three other transferases of Escherichia coli, consist of a six-residue periodicity theme. FEMS Microbiol. Lett., 76, 249-254. Vieira, J. and Messing, J. (1982) The pUC plasmids, an M13mp7-derived system for insertion mutagenesis and sequencing with synthetic universal primers. Gene, 19, 259-268. Vuorio, R., Harkonen, T., Tolvanen, M. and Vaara, M. 
(1994) The novel hexapeptide motif found in the acyltransferases LpxA and LpxD of lipid A biosynthesis is conserved in various bacteria. FEBS Lett., 337, 289-292. Walker, J.E., Saraste, M. and Gay, N.J. (1984) The unc operon. Nucleotide sequence, regulation and structure of ATP-synthase. Biochim. Biophys. Acta, 768, 164-200. Walker, S.G., Karunaratne, D.N., Ravenscroft, N. and Smit, J. (1994) Characterization of mutants of Caulobacter crescentus defective in surface attachment of the paracrystalline surface layer. J. Bacteriol., 176, 6312-6323. Walker, S.G., Smith, S.H. and Smit, J. (1992) Isolation and comparison of the paracrystalline surface layer proteins of freshwater caulobacters. J. Bacteriol., 174, 1783-1792. Wandersman, C., Delepelaire, P. and Letoffe, S. (1990) Secretion processing and activation of Erwinia chrysanthemi proteases. Biochimie, 72, 143-146. Wandersman, C. and Letoffe, S. (1993) Involvement of lipopolysaccharide in the secretion of Escherichia coli alpha-haemolysin and Erwinia chrysanthemi proteases. Mol. Microbiol., 7, 141-150. Wang, L. and Reeves, P.R. (1998) Organization of Escherichia coli O157 O antigen gene cluster and identification of its specific genes. Infect. Immun., 66, 3545-3551. Ward, M.J., Bell, A.W., Hamblin, P.A., Packer, H.L. and Armitage, J.P. (1995) Identification of a chemotaxis operon with two cheY genes in Rhodobacter sphaeroides. Mol. Microbiol., 17, 357-366. Weiss, A.A., Johnson, F.D. and Burns, D.L. (1993) Molecular characterization of an operon required for pertussis toxin secretion. Proc. Natl. Acad. Sci. U. S. A., 90, 2970-2974. Welch, R.A. (1991) Pore-forming cytolysins of gram-negative bacteria. Mol. Microbiol., 5, 521-528. Welsh, M.J. (1998) The ABC of a versatile engine. Nature, 396, 623-624. Whitfield, C. (1995) Biosynthesis of lipopolysaccharide O antigens. Trends Microbiol., 3, 178-185. Wolff, N., Delepelaire, P., Ghigo, J.M. and Delepierre, M. 
(1997) Spectroscopic studies of the C-terminal secretion signal of the Serratia marcescens haem acquisition protein (HasA) in various membrane-mimetic environments. Eur. J. Biochem., 243, 400-407. Wolff, N., Ghigo, J.M., Delepelaire, P., Wandersman, C. and Delepierre, M. (1994) C-terminal secretion signal of an Erwinia chrysanthemi protease secreted by a signal peptide-independent pathway: proton NMR and CD conformational studies in membrane-mimetic environments. Biochemistry, 33, 6792-6801. Yanisch-Perron, C., Vieira, J. and Messing, J. (1985) Improved M13 phage cloning vectors and host strains: nucleotide sequences of the M13mp18 and pUC19 vectors. Gene, 33, 103-119. Yap, W.H., Thanabalu, T. and Porter, A.G. (1994) Influence of transcriptional and translational control sequences on the expression of foreign genes in Caulobacter crescentus. J. Bacteriol., 176, 2603-2610. Yin, Y., Zhang, F., Ling, V. and Arrowsmith, C.H. (1995) Structural analysis and comparison of the C-terminal transport signal domains of hemolysin A and leukotoxin A. FEBS Lett., 366, 1-5. Yun, C., Ely, B. and Smit, J. (1994) Identification of genes affecting production of the adhesive holdfast of a marine caulobacter. J. Bacteriol., 176, 796-803. Zhang, F., Sheps, J.A. and Ling, V. (1998) Structure-function analysis of hemolysin B. Methods Enzymol., 292, 51-66.
Appendix 1 RAT fragment-rsaADE, lpsABCDE, rsaF, lpsF
LOCUS       NA1000RATX 16458 bp DNA BCT 07-OCT-1999
DEFINITION  Caulobacter crescentus sstl, S-layer subunit (rsaA),
            ABC-transporter (rsaD), Membrane Forming Unit (rsaE), putative
            GDP-mannose-4,6-dehydratase (LpsA), putative acetyltransferase
            (LpsB), putative perosamine synthetase (LpsC), putative
            mannosyltransferase (LpsD), putative mannosyltransferase (LpsE),
            Outer membrane protein (rsaF), and putative perosamine
            transferase (LpsE) genes, complete cds.
ACCESSION   NA1000RATX
VERSION
KEYWORDS
SOURCE      Caulobacter crescentus.
  ORGANISM  Caulobacter crescentus
            Bacteria; Proteobacteria; alpha subdivision; Caulobacter group;
            Caulobacter.
REFERENCE   1 (bases 1230 to 2387)
  AUTHORS   Fisher,J.A., Smit,J. and Agabian,N.
  TITLE     Transcriptional analysis of the major surface array gene of
            Caulobacter crescentus
  JOURNAL   J. Bacteriol. 170 (10), 4706-4713 (1988)
  MEDLINE   89008089
REFERENCE   2 (bases 1336 to 4645)
  AUTHORS   Gilchrist,A., Fisher,J.A. and Smit,J.
  TITLE     Nucleotide sequence analysis of the gene encoding the Caulobacter
            crescentus paracrystalline surface layer protein
  JOURNAL   Can. J. Microbiol. 38 (3), 193-202 (1992)
  MEDLINE   93007489
REFERENCE   3 (bases 1 to 16458)
  AUTHORS   Awram,P. and Smit,J.
  TITLE     The Caulobacter crescentus paracrystalline S-layer protein is
            secreted by an ABC transporter (type I) secretion apparatus
  JOURNAL   J. Bacteriol. 180 (12), 3062-3069 (1998)
  MEDLINE   98292737
REFERENCE   4 (bases 1 to 16458)
  AUTHORS   Awram,P.A. and Smit,J.K.
  TITLE     Identification of Genes involved in the Synthesis of the Smooth
            Lipopolysaccharide
  JOURNAL   Unpublished
REFERENCE   5 (bases 1 to 16458)
  AUTHORS   Awram,P.A.
  TITLE     Direct Submission
  JOURNAL   Submitted (07-OCT-1999) Microbiology and Immunology, University of
            British Columbia, 300-6174 University Blvd, Vancouver, BC V6T 1Z3,
            Canada
FEATURES             Location\/Qualifiers
     source          1..16458
                     \/organism=\"Caulobacter crescentus\"
                     \/strain=\"NA1000\"
     gene            complement(227..799)
                     \/gene=\"sstl\"
                     \/note=\"unknown\"
     CDS             complement(227..799)
                     \/gene=\"sstl\"
                     \/note=\"unknown\"
                     \/codon_start=1
                     \/transl_table=11
                     \/product=\"Sstl\"
                     \/translation=\"MAAQVLSFFQRSPRYAPQPADWSQQELAEFYRVESALIRAGIRV
                     GTDRGLSDENEPWFVFYRADDGEVVIHFARIDGEYLIAGPAYEEIARGFDFTSLVRNL
                     VARHPLIRRSDSGSNLSVHPAALLVAVVGTAFFKTGEARAAETGQSNATSGHNRPVLL
                     SSSSNASLNDRCRAGRLPAARLCLGATAGQ\"
     gene            1443..4523
                     \/gene=\"rsaA\"
     CDS             1443..4523
                     \/gene=\"rsaA\"
                     \/citation=[1]
                     \/citation=[2]
                     \/codon_start=1
                     \/transl_table=11
                     \/product=\"S-layer subunit\"
                     \/translation=\"MAYTTAQLVTAYTNANLGKAPDAATTLTLDAYATQTQTGGLSDA
                     AALTNTLKLVNSTTAVAIQTYQFFTGVAPSAAGLDFLVDSTTNTNDLNDAYYSKFAQE
                     NRFINFSINLATGAGAGATAFAAAYTGVSYAQTVATAYDKIIGNAVATAAGVDVAAAV
                     AFLSRQANIDYLTAFVRANTPFTAAADIDLAVKAALIGTILNAATVSGIGGYATATAA
                     MINDLSDGALSTDNAAGVNLFTAYPSSGVSGSTLSLTTGTDTLTGTANNDTFVAGEVA
                     GAATLTVGDTLSGGAGTDVLNWVQAAAVTALPTGVTISGIETMNVTSGAAITLNTSSG
                     VTGLTALNTNTSGAAQTVTAGAGQNLTATTAAQAANNVAVDGGANVTVASTGVTSGTT
                     TVGANSAASGTVSVSVANSSTTTTGAIAVTGGTAVTVAQTAGNAVNTTLTQADVTVTG
                     NSSTTAVTVTQTAAATAGATVAGRVNGAVTITDSAAASATTAGKIATVTLGSFGAATI
                     DSSALTTVNLSGTGTSLGIGRGALTATPTANTLTLNVNGLTTTGAITDSEAAADDGFT
                     TINIAGSTASSTIASLVAADATTLNISGDARVTITSHTAAALTGITVTNSVGATLGAE
                     LATGLVFTGGAGADSILLGATTKAIVMGAGDDTVTVSSATLGAGGSVNGGDGTDVLVA
                     NVNGSSFSADPAFGGFETLRVAGAAAQGSHNANGFTALQLGATAGATTFTNVAVNVGL
                     TVLAAPTGTTTVTLANATGTSDVFNLTLSSSAALAAGTVALAGVETVNIAATDTNTTA
                     HVDTLTLQATSAKSIVVTGNAGLNLTNTGNTAVTSFDASAVTGTGSAVTFVSANTTVG
                     EVVTIRGGAGADSLTGSATANDTIIGGAGADTLVYTGGTDTFTGGTGADIFDINAIGT
                     STAFVTITDAAVGDKLDLVGISTNGAIADGAFGAAVTLGAAATLAQYLDAAAAGDGSG
                     TSVAKWFQFGGDTYVVVDSSAGATFVSGADAVIKLTGLVTLTTSAFATEVLTLA\"
     gene            4766..6502
                     \/gene=\"rsaD\"
     CDS             4766..6502
                     \/gene=\"rsaD\"
                     \/note=\"ABC-transporter of RsaA type I secretion system\"
                     \/codon_start=1
                     \/transl_table=11
                     \/product=\"ABC-transporter\"
                     \/translation=\"MFKRSGAKPTILDQAVLVARPAVITAMVFSFFINILALVSPLYM
                     LQVYDRVLTSRNVSTLIVLTVICVFLFLVYGLLEALRTQVLVRGGLKFDGVARDPIFK
                     SVLDSTLSRKGIGGQAFRDMDQVREFMTGGLIAFCDAPWTPVFVIVSWMLHPFFGILA
                     IIACIIIFGLAVMNDNATKNPIQMATMASIAAQNDAGSTLRNAEVMKAMGMWGGLQAR
                     WRARRDEQVAWQAAASDAGGAVMSGIKVFRNIVQTLILGGGAYLAIDGKISAGAMIAG
                     SILVGRALAPIEGAVGQWKNYIGARGAWDRLQTMLREEKSADDHMPLPEPRGVLSAEA
                     ASILPPGAQQPTMRQASFRIDAGAAVALVGPSAAGKSSLLRGIVGVWPCAAGVIRLDG
                     YDIKQWDPEKLGRHVGYLPQDIELFSGTVAQNIARFTEFESQEVIEAATLAGVHEMIQ
                     SLPMGYDTAIGEGGASLSGGQRQRLALARAVFRMPALLVLDEPNASLDQVGEVALMEA
                     MKRLKAAKRTVIFATHKVNLLAQADYIMVINQGVISDFGERDPMLAKLTGAAPPQTPP
                     PTPPPAPLQRVQ\"
     gene            6570..7880
                     \/gene=\"rsaE\"
     CDS             6570..7880
                     \/gene=\"rsaE\"
                     \/note=\"MFP of RsaA type I secretion system\"
                     \/codon_start=1
                     \/transl_table=11
                     \/product=\"Membrane Forming Unit\"
                     \/translation=\"MKPPKIQRPTDNFQAVARIGYGIIALTFVGLLGWAAFAPLDSAV
                     IANGVVSAEGNRKTVQHLEGGMLAKILVREGEKVKAGQVLFELDPTQANAAAGITRNQ
                     YVALKAMEARLLAERDQRPSISFPADLTSQRADPMVARAIADEQAQFTERRQTIQGQV
                     DLMNAQRLQYQSEIEGIDRQTQGLKDQLGFIEDELIDLRKLYDKGLVPRPRLLALERE
                     QASLSGSIGRLTADRSKAVQGASDTQLKVRQIKQEFFEQVSQSITETRVRLAEVTEKE
                     VVASDAQKRIKIVSPVNGTAQNLRFFTEGAVVRAAEPLVDIAPEDEAFVIQAHFQPTD
                     VDNVHMGMVTEVRLPAFHSREIPILNGTIQSLSQDRISDPQNKLDYFLGIVRVDVKQL
                     PPHLRGRVTAGMPAQVIVPTGERTVLQYLFSPLRDTLRTTMREE\"
     gene            8020..8997
                     \/gene=\"LpsA\"
     CDS             8020..8997
                     \/gene=\"LpsA\"
                     \/codon_start=1
                     \/transl_table=11
                     \/product=\"putative GDP-mannose-4,6-dehydratase\"
                     \/translation=\"MAKTALITGVTGQDGAYLAKLLLEKGYTVHGMLRRSASADVIGD
                     RLRWIGVYDDIQFELGDLLDEGGLARLMRRLQPDEVYNLAAQSFVGASWDQPHLTGSV
                     TGLGTTNMLEAVRLECPQARFYQASSSEMYGLVQHPIQSETTPFYPRSPYAVAKLYAH
                     WMTVNYRESFGLHASAGILFNHESPLRGIEFVTRKVTDAVAAIKLGQQKTVDLGNLDA
                     KRDWGHAKDYVEAMWLMLQQETPDDYVVATGKTWTVRQMCEVAFAHVGLNYQDHVTIN
                     PKFLRPAEVDLLLGDPAKAKAKLGWEPKTTMQQMIAEMVDADIARRSRN\"
     gene            8997..9644
                     \/gene=\"LpsB\"
     CDS             8997..9644
                     \/gene=\"LpsB\"
                     \/codon_start=1
                     \/transl_table=11
                     \/product=\"putative acetyltransferase\"
                     \/translation=\"MSASLAIGGVVIIGGGGHAKVVIESLRACGETVAAIVDADPTRR
                     AVLGVPVVGDDLALPMLREQGLSRLFVAIGDNRLRQKLGRKARDHGFSLVNAIHPSAV
                     VSPSVRLGEGVAVMAGVAINADSWIGDLAIINTGAVVDHDCRLGAACHLGPASALAGG
                     VSVGERAFLGVGARVIPGVTIGADTIVGAGGVVVRDLPDSVLAIGVPAKIKGDRS\"
     gene            9716..10756
                     \/gene=\"LpsC\"
     CDS             9716..10756
                     \/gene=\"LpsC\"
                     \/codon_start=1
                     \/transl_table=11
                     \/product=\"putative perosamine synthetase\"
                     \/translation=\"MDTTWISSVGRFIVEFEKAFADYCGVKHAIACNNGTTALHLALV
                     AMGIGPGDEVIVPSLTYIASANSVTYCGATPVLVDNDPRTFNLDAAKLEALITPRTKA
                     IMPVHLYGQICDMDPILEVARRHNLLVIEDAAEAVGATYRGKKSGSLGDCATFSFFGN
                     KIITTGEGGMITTNDDDLAAKMRLLRGQGMDPNRRYWFPIVGFNYRMTNIQAAIGLAQ
                     LERVDEHLAARERVVGWYEQKLARLGNRVTKPHVALTGRHVFWMYTVRLGEGLSTTRD
                     QVIKDLDALGIESRPVFHPMHIMPPYAHLATDDLKIAEACGVDGLNLPTHAGLTEADI
                     DRVIAALDQVLV\"
     gene            10760..11797
                     \/gene=\"LpsD\"
     CDS             10760..11797
                     \/gene=\"LpsD\"
                     \/codon_start=1
                     \/transl_table=11
                     \/product=\"putative mannosyltransferase\"
                     \/translation=\"MRIVLLSSIVPFINGGARFIVEWLEEKLIEAGHEVERFYLPFVD
                     DPNEILHQIAAWRLMDLTQWCDRVICFRPPAYVVDHPNKVLWFIHHIRTFYDLWDTPY
                     RGMPDDAQHRAIRDNLRALDTQAISEARAVFTNSQVVADRLKAFNGLDATPLYPPIYQ
                     PERFSHTGYGDEIVAISRLEPHKRQALMIEAMQYVKSGVKLRLAGTASSAEYGRQLVK
                     MTHDLGVADRVILEDRWISEDEKADMLKQALAVAYLPKDEDSYGYPSLEGAHARKPVI
                     TTTDSGGVLELVEHGRNGLISAPDPRALAEQFDRLHADKAATAKMGTASLNRLAEMKI
                     DWSTVVERLTS\"
     gene            11808..12845
                     \/gene=\"LpsE\"
     CDS             11808..12845
                     \/gene=\"LpsE\"
                     \/codon_start=1
                     \/transl_table=11
                     \/product=\"putative mannosyltransferase\"
                     \/translation=\"MKVLVVNNAAPFQRGGAEELADHLVRRLNATPGVQSELVRVPFT
                     WEPAERLIEEMLISKGMRLYNVDRVIGLKFPAYLIPHHQKVLWLLHQFRQAYDLSEAG
                     QSHLDFDDTGRAVKAAIRAADNACFAECRKIYCNSPVTQNRLMKFNGVASQVLYPPLN
                     DGELFTGGEHGDYVFAGGRVAAGKRQHLLIEALALLPGSLRLVIAGPPENQAYADRLT
                     KLVEDLDLKDRVELRFGFHPREDIARWANGALICAYLPFDEDSVGYVTMEAFAAGKAV
                     LTVTDSGGLLEIVSADTGAVAEPTPQALAEALDRLTSDKARAISLGDAARRLWRDKNV
                     TWEETVRRLLD\"
     gene            12902..14485
                     \/gene=\"rsaF\"
     CDS             12902..14485
                     \/gene=\"rsaF\"
                     \/note=\"OMP of RsaA type I secretion system\"
                     \/codon_start=1
                     \/transl_table=11
                     \/product=\"Outer membrane protein\"
                     \/translation=\"MRVLSKVLSVRTSLIALAMAMAVVGRADLAHAETLAEAITAAYQ
                     SNPNIQAQRAAMRALDENYTQARSAYGLQASASVAEVYGWSKGVNAKNGVEAASQTST
                     LSLSQSLYTNGRFSARLAGVEAQIKAARENLRRIEMDLLVRVTNAYISVRRDREILRI
                     SQGGEAWLQKQLKDTEDKYSVRQVTLTDVQQAKARLASASTQVANAQAQLNVSVAFYA
                     SLVGRQPETLKPEPDIDGLPTTLDEAFNQAEQANPVLLAAGYTEKASRAGVAEARAQR
                     LFSVGARADYRNGSSTPYYARGGLREDTVNASITLTQPLFTSGQLNASVRQSIEENNR
                     DKLLMEDARRSMVLSVSQYWDSLVAARKSLVSLEEEMKANTIAFYGVREEERFALRST
                     IEVLNAQAELQNAQINFVRGRANEYVGRLHLLAQVGTLEVGNLAPGVQPYDPERNFRK
                     VRYRGALPTELIIGTFDKIALPLEPKKPAPGDTSPIRPPSSELPARPVSADKVTPPAS
                     MNDLPALTDDTPVQTAPRN\"
     gene            complement(14591..15880)
                     \/gene=\"LpsE\"
     CDS             complement(14591..15880)
                     \/gene=\"LpsE\"
                     \/codon_start=1
                     \/transl_table=11
                     \/product=\"putative perosamine transferase\"
                     \/translation=\"MTSRLLEIWRRLPTPIRRSAHVVAGAPRAALEALDKALAEHRHR
                     SAERTALARARRRAGPRGLSPTLPVTVIGFHSAVHGLGEGARMLARGFGDMGLGVRAL
                     DLSASVGFAAEIAPAYSSPDPDERGVTISHINPPELLRWARETEGRFLEGRRHIGYWA
                     WELEEVPSDWLPAFDFVDEVWTPSAFAADAIRRVAPRGVKVTPVPYPLYLNPRPQADR
                     QRFGLQDDRVVVLMAFDLRSTAQRKNPDAALRAFRDATVKATRPATLVCKVVGADLYP
                     ETFQALAAEVADDPSIRLLTDNLSAQDMAALTASSDIVLSLHRSEGYGLLLAEAIWLG
                     KPTLATGWSSNVEFMDPASSQFVDYRLVPVEGDGVIYRAGRWADADVGDAAEKLARMI
                     SDDAWRNTLAAATARNGHVSFNRDAWVAMTSARLPLT\"
BASE COUNT     2845 a   5542 c   5354 g   2717 t
ORIGIN
        1 gagctcaccg cgtgaaccgg cgtgttgtcg acggtctgag tcggcggggg gaggcgcgct
       61 ggccccgccg ccaaaaccac tgatgccggc cggcaaggcc gaggcgtcgc tgaagtcgag
      121 agccgcagcg gcggcggtca ggcccgagcg cacggccggg tccaggctcg cggcgtccac
      181 acggaagtcc gaggccagca gggcggcggc cagaaccagc accgcctcat tggccggcag
      241 ttgcaccgag gcataggcgg gctgctggaa gccgaccggc gcgacaccgg tcgttcaggc
      301 tggcgttcga gctgctgctg agcaggacgg ggcgattgtg gcccgaggtc gcattgctct
      361 ggccggtctc ggcggcgcgc gcttcgccgg tcttgaagaa ggcggtgccg accacggcca
      421 ccagcagggc ggcggggtga accgacaggt tgctgccgct gtcggagcgg cggatcagcg
      481 gatggcgggc gacgaggttg cgcaccaggc tggtgaagtc gaagccgcgc gcaatctctt
      541 cataggcggg gccggcgatc agatactcgc cgtcgatacg cgcaaagtgg atcaccacct
      601 cgccgtcgtc cgctcggtag aagacgaacc agggttcgtt ctcgtcgctc aggccgcgat
      661 cagtgccgac gcggatgccc gcgcggatca gggcgctctc gacacggtag aactcggcca
      721 gctcctgctg gctccagtcc gccggttgcg gcgcgtagcg cggcgagcgc tgaaagaacg
      781 acaggacctg tgcggccata cggcggaagc ttccccaagc ctaggtgaaa agccgacccc
      841 ccgtcggccc aaacacgcta gcagacgaca cgataaccga actagtcttc gctgaacagg
      901 atctcgtagg tgatcggatc atagaagcgg agcagttcgc gcacgaacgt cttctccgag
      961 acctccaggg ccttggccca gtcgcggtaa cggtccggcg gaatgcggcc gcgacccgtc
     1021 tccagctgag agatgaaggt gtaataatca gcgccgacct tagcggccag ctggcgttgc
     1081 gacaggccgg cggcctcgcg catctccttc agccagcggc caccttcgcg gcggaggtct
     1141 tgcacctccg aggcgctgcg gcgttgcggg ttaccataca ttataaagcc tcgcgcgttg
     1201 accgagggca ggagcgcggg cgcgctcact cacccgccag gtgaacagtc tttatatata
     1261 gcgcttttcg gcggggggta caaggaacgc tatataggaa tttgctgtac cggttagaaa
     1321 aatgctgtac ccctgaaatt cggctattgt cgacgtatga cgtttgctct atagccatcg
     1381 ctgctcccat gcgcgccact cggtcgcagg gggtgtggga ttttttttgg gagacaatcc
     1441 tcatggccta tacgacggcc cagttggtga ctgcgtacac caacgccaac ctcggcaagg
     1501 cgcctgacgc cgccaccacg ctgacgctcg acgcgtacgc gactcaaacc cagacgggcg
     1561 gcctctcgga cgccgctgcg ctgaccaaca ccctgaagct ggtcaacagc acgacggctg
     1621 ttgccatcca gacctaccag ttcttcaccg gcgttgcccc gtcggccgct ggtctggact
     1681 tcctggtcga ctcgaccacc aacaccaacg acctgaacga cgcgtactac tcgaagttcg
     1741 ctcaggaaaa ccgcttcatc aacttctcga tcaacctggc cacgggcgcc ggcgccggcg
     1801 cgacggcttt cgccgccgcc tacacgggcg tttcgtacgc ccagacggtc gccaccgcct
     1861 atgacaagat catcggcaac gccgtcgcga ccgccgctgg cgtcgacgtc gcggccgccg
     1921 tggctttcct gagccgccag gccaacatcg actacctgac cgccttcgtg cgcgccaaca
     1981 cgccgttcac ggccgctgcc gacatcgatc tggccgtcaa ggccgccctg atcggcacca
     2041 tcctgaacgc cgccacggtg tcgggcatcg gtggttacgc gaccgccacg gccgcgatga
     2101 tcaacgacct gtcggacggc gccctgtcga ccgacaacgc ggctggcgtg aacctgttca
     2161 ccgcctatcc gtcgtcgggc gtgtcgggtt cgaccctctc gctgaccacc ggcaccgaca
     2221 ccctgacggg caccgccaac aacgacacgt tcgttgcggg tgaagtcgcc ggcgctgcga
     2281 ccctgaccgt tggcgacacc ctgagcggcg gtgctggcac cgacgtcctg aactgggtgc
     2341 aagctgctgc ggttacggct ctgccgaccg gcgtgacgat ctcgggcatc gaaacgatga
     2401 acgtgacgtc gggcgctgcg atcaccctga acacgtcttc gggcgtgacg ggtctgaccg
     2461 ccctgaacac caacaccagc ggcgcggctc aaaccgtcac cgccggcgct ggccagaacc
     2521 tgaccgccac gaccgccgct caagccgcga acaacgtcgc cgtcgacggc ggcgccaacg
     2581 tcaccgtcgc ctcgacgggc gtgacctcgg gcacgaccac ggtcggcgcc aactcggccg
     2641 cttcgggcac cgtgtcggtg agcgtcgcga actcgagcac gaccaccacg ggcgctatcg
     2701 ccgtgaccgg tggtacggcc gtgaccgtgg ctcaaacggc cggcaacgcc gtgaacacca
     2761 cgttgacgca agccgacgtg accgtgaccg gtaactccag caccacggcc gtgacggtca
     2821 cccaaaccgc cgccgccacc gccggcgcta cggtcgccgg tcgcgtcaac ggcgctgtga
     2881 cgatcaccga ctctgccgcc gcctcggcca cgaccgccgg caagatcgcc acggtcaccc
     2941 tgggcagctt cggcgccgcc acgatcgact cgagcgctct gacgaccgtc aacctgtcgg
     3001 gcacgggcac ctcgctcggc atcggccgcg gcgctctgac cgccacgccg accgccaaca
     3061 ccctgaccct gaacgtcaat ggtctgacga cgaccggcgc gatcacggac tcggaagcgg
     3121 ctgctgacga tggtttcacc accatcaaca tcgctggttc gaccgcctct tcgacgatcg
     3181 ccagcctggt ggccgccgac gcgacgaccc tgaacatctc gggcgacgct cgcgtcacga
     3241 tcacctcgca caccgctgcc gccctgacgg gcatcacggt gaccaacagc gttggtgcga
     3301 ccctcggcgc cgaactggcg accggtctgg tcttcacggg cggcgctggc gctgactcga
     3361 tcctgctggg cgccacgacc aaggcgatcg tcatgggcgc cggcgacgac accgtcaccg
     3421 tcagctcggc gaccctgggc gctggtggtt cggtcaacgg cggcgacggc accgacgttc
     3481 tggtggccaa cgtcaacggt tcgtcgttca gcgctgaccc ggccttcggc ggcttcgaaa
     3541 ccctccgcgt cgctggcgcg gcggctcaag gctcgcacaa cgccaacggc ttcacggctc
     3601 tgcaactggg cgcgacggcg ggtgcgacga ccttcaccaa cgttgcggtg aatgtcggcc
     3661 tgaccgttct ggcggctccg accggtacga cgaccgtgac cctggccaac gccacgggca
     3721 cctcggacgt gttcaacctg accctgtcgt cctcggccgc tctggccgct ggtacggttg
     3781 cgctggctgg cgtcgagacg gtgaacatcg ccgccaccga caccaacacg accgctcacg
     3841 tcgacacgct gacgctgcaa gccacctcgg ccaagtcgat cgtggtgacg ggcaacgccg
     3901 gtctgaacct gaccaacacc ggcaacacgg ctgtcaccag cttcgacgcc agcgccgtca
     3961 ccggcacggg ctcggctgtg accttcgtgt cggccaacac cacggtgggt gaagtcgtca
     4021 cgatccgcgg cggcgctggc gccgactcgc tgaccggttc ggccaccgcc aatgacacca
     4081 tcatcggtgg cgctggcgct gacaccctgg tctacaccgg cggtacggac accttcacgg
     4141 gtggcacggg cgcggatatc ttcgatatca acgctatcgg cacctcgacc gctttcgtga
     4201 cgatcaccga cgccgctgtc ggcgacaagc tcgacctcgt cggcatctcg acgaacggcg
     4261 ctatcgctga cggcgccttc ggcgctgcgg tcaccctggg cgctgctgcg accctggctc
     4321 agtacctgga cgctgctgct gccggcgacg gcagcggcac ctcggttgcc aagtggttcc
     4381 agttcggcgg cgacacctat gtcgtcgttg acagctcggc tggcgcgacc ttcgtcagcg
     4441 gcgctgacgc ggtgatcaag ctgaccggtc tggtcacgct gaccacctcg gccttcgcca
     4501 ccgaagtcct gacgctcgcc taagcgaacg tctgatcctc gcctaggcga ggatcgctag
     4561 actaagagac cccgtcttcc gaaagggagg cggggtcttt cttatgggcg ctacgcgctg
     4621 gccggccttg cctagttccg gtggctatga tttagcggga ctggggggct tgctcacttt
     4681 ccgccacaat ttcgtggtcg agacggcgcc ttagttgtta ctgtacatgg ccgcgtcggt
     4741 tcgcgcggcg tcctgaaggc tcacaatgtt caagcgcagc ggcgcgaagc cgacgatcct
     4801 cgaccaggcc gtgctggtcg cccgcccggc ggtgatcacc gccatggtct tcagcttctt
     4861 catcaacatt ctggccctgg tcagcccgct gtacatgctg caggtctatg accgcgtgct
     4921 gaccagccgc aacgtttcga ccctgatcgt gttgacggtc atctgcgtct tcctgttcct
     4981 ggtctacggc ctgctcgagg cgctgcgcac ccaggtgctg gtgcgcggcg gtctgaagtt
     5041 cgacggcgtg gcccgggatc cgatcttcaa gtcggtgctg gactccacgc tcagccgcaa
     5101 gggcatcggc ggccaggcgt tccgcgacat ggaccaggtc cgagagttca tgaccggcgg
     5161 cctgatcgcc ttctgcgatg cgccctggac gccggtgttc gtcatcgtct cgtggatgct
     5221 gcacccgttc ttcggcatcc tggcgatcat cgcctgtatc atcatcttcg gcctggccgt
     5281 gatgaacgac aacgccacca agaacccgat ccagatggcc accatggcct cgatcgccgc
     5341 ccagaacgac gccggttcca ccctgcgcaa cgccgaggtc atgaaggcca tgggcatgtg
     5401 gggcggcctg caagcccgct ggcgcgcgcg ccgcgacgag caggtggcct ggcaggccgc
     5461 cgccagcgac gccggcggcg cggtgatgtc gggcatcaag gtgttccgca acatcgtcca
     5521 gaccctgatc ctgggcggcg gcgcctatct ggccatcgac ggcaagatct cggccggcgc
     5581 gatgatcgcc ggctcgatcc tggtcggccg cgccctggcg cccatcgagg gcgccgtggg
     5641 tcagtggaag aattatatcg gcgcgcgcgg cgcctgggat cgcctgcaga ccatgctgcg
     5701 cgaggaaaag agcgccgacg accacatgcc gctgcccgag ccgcgcggcg tgctgtcggc
     5761 cgaagccgcc tcgatcctgc cgccgggcgc gcaacagccg accatgcgcc aggccagctt
     5821 ccgtatcgac gccggcgccg cggtggccct tgtcggtccc agcgcggcgg gcaagtcctc
     5881 gctgctgcgc ggtatcgtcg gcgtctggcc ctgcgcggcg ggcgtcatcc gcctcgacgg
     5941 ctacgacatc aagcagtggg atcccgagaa gctgggtcgc cacgtcggct acctgccgca
     6001 ggacatcgag ctgttctcgg gcaccgtcgc ccagaacatc gcccgcttca ccgagttcga
     6061 atcgcaggaa gtcatcgagg ccgcgaccct ggcgggcgtg cacgagatga tccagagcct
     6121 gccgatgggc tatgacacgg cgatcggcga gggcggcgcc tcgctgtccg gcggccagcg
     6181 ccagcgcctg gccctggcgc gtgcggtgtt ccgcatgccg gccctgctgg tgctggacga
     6241 gccgaacgcc agcctcgacc aggtgggcga agtggcgctg atggaagcga tgaagcggct
     6301 taaggccgct aagcgcacgg tgatcttcgc cacccacaag gtgaacctgt tggcccaggc
     6361 cgactacatc atggtgatca accagggtgt gatcagcgac tttggcgaac gcgacccgat
     6421 gctggccaag ctgaccgggg ctgcgccgcc ccagacgccg ccgccgacgc cgccgcccgc
     6481 gccgttgcag cgcgtccagt aagcgccttc gtcagtccgc ctctcccttc ctggccgttt
     6541 cagaacgcgc ccatcaggct tgaatctcaa tgaagccccc caagatccag cgtccgacgg
     6601 acaacttcca ggctgtggcc cgtatcggct acggcatcat cgccctgacc tttgtcggtc
     6661 tgttgggctg ggccgcgttc gccccgctcg acagcgcggt gatcgccaac ggcgtcgtct
     6721 ccgccgaggg taatcgcaag accgtgcagc acctcgaagg cggcatgctg gccaagatcc
     6781 tggtccgcga aggcgagaag gtgaaggccg gccaggtgct gttcgagctg gacccgaccc
     6841 aggccaacgc cgccgccggc atcacccgca accagtacgt ggctttgaag gccatggaag
     6901 cgcgcctgct ggccgagcgc gaccagcgtc cgtccatcag cttccccgcc gacctgacca
     6961 gccagcgcgc cgatccgatg gtcgcccgcg ccatcgccga cgaacaggcc cagttcactg
     7021 agcgtcgcca gacgatccag ggccaggtcg acctgatgaa cgcccagcgt ttgcagtatc
     7081 agagcgagat cgagggcatc gaccgtcaga cccagggcct gaaggaccaa ctcggcttca
     7141 tcgaggacga gctgatcgac ctgcgtaagc tctatgacaa gggcctggtg ccccggccgc
     7201 gtctgctggc cctggagcgc gagcaggcct cgctgtcggg ctcgatcggc cgtctgaccg
     7261 cagaccgctc caaggccgtc cagggcgcct ctgacaccca gctcaaggtt cgccagatca
     7321 agcaggagtt cttcgagcag gtcagccaga gcatcaccga gacccgggtt cgcctggccg
     7381 aggtgaccga gaaggaggtc gtcgcctccg acgcccagaa gcggatcaag atcgtgtcgc
     7441 ccgtcaacgg cacggcgcag aacctgcgct tcttcaccga gggcgctgtc gttcgcgccg
     7501 ccgagccgct ggtcgacatc gcgcccgagg acgaggcctt cgtgatccag gcgcatttcc
     7561 agccgaccga tgtggacaat gtccatatgg gcatggtcac cgaagttcgg ctgccggcct
     7621 tccactcgcg ggaaatcccg atcctgaacg gcacgatcca gtcgctgtcg caggaccgca
     7681 tttccgatcc gcagaacaag ctcgactact tcctcgggat cgtgcgcgtg gacgtcaagc
     7741 agctgccgcc gcatctgcgc ggcagggtca ccgccggcat gccggcccag gtgatcgtgc
     7801 cgaccggcga gcgcaccgtg ctgcagtacc tgttctcgcc gctgcgagac accctgcgca
     7861 ccacgatgcg cgaggagtag ggcaaaggtt tcaagggcct gatttccaaa gcttttcgga
     7921 tgggcggcgc gggcgagtta aattcgccgg cgctgctttc cattcgcggg caatagtgta
     7981 gtcaggaccc ttcgttgtta ctggagtcag cggatacgca tggcgaaaac ggctttgatc
     8041 accggtgtga ccggtcagga cggggcgtac ctcgccaagc tgctgctgga gaagggttac
     8101 accgtccacg gcatgctgcg tcgctcggcc tcggccgatg tgatcggcga ccgcctgcgc
     8161 tggatcggcg tctatgacga catccagttc gagctgggcg acctcttgga cgagggcggt
     8221 ctggcgcgcc tgatgcggcg cctgcagccg gacgaggtct acaacctggc ggcccagagc
     8281 ttcgtcggcg cctcgtggga ccagccgcac ctgacgggct cggtgacggg cctgggcacg
     8341 accaacatgc tcgaagccgt gcgtctggaa tgcccgcagg cgcggttcta tcaggcctcg
     8401 tcgtccgaaa tgtacggtct ggtgcagcac ccgatccagt cggagacgac gccgttctat
     8461 ccccgctcgc cctatgcggt ggccaagctc tacgcccact ggatgacggt gaactatcgc
     8521 gagagctttg gcctgcacgc ctcggccggc atcctgttca accacgagag cccgctgcgc
     8581 ggcatcgagt tcgtgacccg caaggtcacc gacgcggtgg cggccatcaa gctgggtcag
     8641 caaaagaccg tcgatctggg caatctcgac gccaagcgcg actggggtca cgccaaggac
     8701 tatgtcgagg ccatgtggct gatgctgcag caggagacgc cggacgacta tgtggtcgcg
     8761 accggcaaga cctggaccgt gcgccagatg tgcgaagtgg ccttcgccca tgtcggcctg
     8821 aactatcagg accacgtgac gatcaatccg aagttcctgc gtccggcgga agtggacctg
     8881 ctgctgggcg atccggccaa ggccaaggcc aagctcggct gggaacccaa gacgaccatg
     8941 caacagatga tcgccgaaat ggtcgacgcc gacatcgcgc ggcgctcgcg caactgatga
     9001 gcgcttccct cgccatcggg ggcgtcgtca tcatcggcgg cggcggccac gccaaggtgg
     9061 tcatcgagag cctgcgggcc tgcggtgaga cggtggcggc cattgtcgat gcggatccga
     9121 cgcggcgcgc ggtgttgggc gttccggtag tgggcgatga cctggcgctg ccgatgcttc
     9181 gcgagcaggg gctgtccaga ctgttcgtgg cgatcggcga caaccggctg cgccagaagc
     9241 tgggccgcaa ggcgcgcgac cacggctttt cgctggtcaa cgccatccat ccctctgccg
     9301 tcgtttcgcc tagcgtacgt ctgggcgagg gggttgcggt gatggccggc gtcgcgatca
     9361 acgctgacag ctggatcggc gacctggcga tcatcaacac cggcgctgtt gtcgaccatg
     9421 actgccgcct gggcgcggcc tgccacctgg gacccgcctc ggccctggcc ggcggcgtat
     9481 ccgtgggaga gcgggctttt ctcggtgtcg gcgcccgggt catacctggc gtcacgatcg
     9541 gcgccgacac gatcgtcggc gccggcggtg tcgtcgtgcg cgaccttccg gactcggtcc
     9601 ttgcgatcgg cgtgccggcc aagatcaaag gagaccgttc gtgagtgacc tgccgcgcat
     9661 ttccgtcgcc gcgccgcgcc tcgacggcaa cgaacgtgac tatgtactcg aatgcatgga
     9721 cacgacctgg atctcgtcgg tcggacgctt catcgttgag ttcgaaaagg ccttcgccga
     9781 ctactgtggc gtcaagcacg cgatcgcctg caacaacggt acgaccgcct tgcacctggc
     9841 cctggtggcg atggggatcg gacccggcga cgaggtgatc gttccgagcc tgacctatat
     9901 cgcctcggcc aattcagtca cctattgcgg cgcgacgcct gtgctggtcg acaacgatcc
     9961 gcggaccttc aacctggacg ccgcgaagtt ggaggcgctg ataacgccgc gcacgaaggc
    10021 gatcatgccc gtgcacctct acggtcagat ttgcgacatg gatccgatcc tcgaagttgc
    10081 tcgcaggcat aacctgctcg tgatcgagga tgcggccgag gcggtgggcg cgacctaccg
    10141 gggcaagaag tcaggctcgc tgggcgactg cgccaccttc agcttcttcg gcaacaagat
    10201 catcaccacc ggcgagggcg ggatgatcac caccaatgat gatgacctgg cggccaagat
    10261 gcgcttgctg cgaggccagg gcatggatcc caaccgccgc tactggtttc cgatcgtcgg
    10321 cttcaattac cggatgacca acatccaggc cgcgatcggt ctggcgcagc tggagcgggt
    10381 cgacgaacac ctggccgcgc gcgaaagggt cgtgggctgg tacgagcaga agctggcgcg
    10441 cctgggcaat cgggtcacca agccccatgt cgcgctgacc ggtcgccacg tgttctggat
    10501 gtacactgtg cgcctgggcg agggcctttc caccacgcgc gatcaggtga tcaaggatct
    10561 cgacgcgttg ggcattgaga gccgtccggt gttccacccg atgcacatca tgccgcccta
    10621 tgcgcatctg gccacggatg atctgaagat cgccgaagcc tgcggggtcg acggcttgaa
    10681 cctgccgacc cacgcggggc tgactgaagc cgatatcgac cgtgtcatcg cggcgctcga
    10741 tcaggtgttg gtctagccga tgcgcatcgt cctgctgtcc tcgatcgtgc cgttcatcaa
    10801 cggcggcgcg cgcttcatcg tcgagtggct cgaggagaag ctgatcgagg ccggccacga
    10861 ggtcgagcgg ttctacctgc cgtttgtcga cgatccgaac gagatcctgc accagatcgc
    10921 cgcctggcgg ctgatggacc tgacccagtg gtgcgaccgg gtgatctgct tccggccgcc
    10981 ggcctatgtg gtggaccatc cgaacaaggt cttgtggttc atccaccaca tccgcacctt
    11041 ctacgacctg tgggacacgc cctatcgcgg catgcctgac gacgcgcagc accgggccat
    11101 ccgcgacaat ctccgcgcgc tcgacaccca ggcgatttcg gaagcccgcg cggtgttcac
    11161 caactcccag gtggtggccg accgcttgaa ggcgttcaac ggcctggacg ccacgccgct
    11221 gtatccgccg atctatcagc ccgagcgctt ttcccatacc ggctatggcg acgagatcgt
    11281 ggccatctcg cggctggagc cgcacaagcg tcaggccctg atgatcgagg ccatgcagta
    11341 cgtgaagagc ggcgtgaagc tgcgcctggc gggcacggcg tccagcgccg agtatggtcg
    11401 acagctggtc aagatgaccc acgacctggg cgtcgccgac cgggtcattc tcgaggatcg
    11461 ctggatcagc gaggacgaga aggccgatat gctgaaacag gctctggccg tggcctatct
    11521 gcccaaggac gaggacagct atggctatcc ttcgctggag ggcgctcacg cccgcaagcc
    11581 ggtgatcacc acgaccgact ccggcggggt gctggaactg gtcgagcatg gccgcaacgg
    11641 cttgatcagc gccccggacc cgcgcgcgct ggccgagcag ttcgaccgcc tgcacgctga
    11701 caaggctgcg acagccaaga tggggaccgc ctcgctgaac cgtctggccg agatgaagat
    11761 cgactggagc accgtcgtgg agcgcctgac ctcatgagaa cgcccgcatg aaggttctgg
    11821 tcgtcaacaa cgccgcgccg ttccaacgcg gcggcgccga ggagctggcc gaccatctgg
    11881 tccgccgcct gaacgccacg cccggcgtcc agtccgagct ggtgcgcgtg cccttcacct
    11941 gggagccggc cgagcgtctg atcgaggaga tgctgatctc caaggggatg cggctctaca
    12001 atgtggaccg ggtcattggc ctcaaatttc cggcctatct gatcccgcat caccaaaagg
    12061 tgctgtggct gctgcaccag ttccgtcagg cctacgacct gtccgaagcg ggccagagcc
    12121 atctggactt cgacgacacg ggcagggcgg tgaaggcggc gatccgcgcg gccgacaacg
    12181 cctgcttcgc cgagtgccgc aagatctact gcaactcgcc cgtcacccag aaccgcctga
    12241 tgaagttcaa cggggtcgcc agccaggtgc tctatccgcc gctgaacgac ggtgagctgt
    12301 tcaccggcgg cgagcatggc gactatgtct tcgcgggcgg ccgggtcgcg gcgggcaagc
    12361 gccagcacct gttgattgag gccctagcct tgctgcccgg cagt
c t g c g g c t g g t g a t c g 12421 ccggaccgcc ggagaaccag g c c t a t g c c g a c c g c c t g a c c a a g c t g g t c g a g g a t c t g g 12481 a t c t g a a g g a t c g c g t c g a g c t g c g g t t t g g c t t c c a t c c gcgcgaggac a t c g c c c g t t 12541 gggccaacgg g g c c c t g a t c t g c g c c t a t c t g c c c t t t g a c g a g g a t a g t g t a g g t t a c g 12 601 t c a c g a t g g a g g c c t t c g c c gcaggcaagg c c g t g c t g a c c g t g a c c g a c t c c g g c g g c c 12661 t g c t g g a g a t c g t c a g c g c g g a t a c c g g t g c g g t c g c c g a g c c c a c g c c g c a a g c c c t g g 12721 c c g a g g c g c t t g a t c g t t t g a c c t c g g a c a aggcgcgggc g a t a t c g c t g ggcgacgcgg 12781 c g c g c a g g c t atggcgcgac a a g a a t g t c a catgggaaga g a c g g t c c g c c g t c t t c t t g 12841 a t t a a g c c a c a a a c a t t g g g t t g a a g a c c a c g t t a a g a c g g g g t c g g c t a c a g t c t a g g a 12901 a a t g c g a g t g c t g t c g a a a g t t c t g t c c g t g c g a a c g t c t c t g a t c g c c t t g g c c a t g g c 12961 c a t g g c g g t c g t c g g t c g c g c t g a t c t c g c c c a c g c c g a g a c c t t g g c c g aggcgatcac 13021 c g c a g c c t a t c a g a g c a a t c c g a a t a t t c a ggcccaacgc g c c g c c a t g c g c g c g c t g g a 13081 c g a g a a c t a c a c c c a g g c c c g t t c g g c c t a t g g g c t g c a a g c c a g c g c c t c g g t c g c t g a 13141 g g t c t a t g g c t g g t c c a a g g g c g t c a a c g c caagaacggc g t c g a g g c c g ccagccagac 13201 c t c g a c c c t c t c t c t g a g c c a g a g c c t c t a c a c c a a c g g t c g t t t c t c g g c c c g c c t g g c 13261 g g g t g t c g a g g c g c a g a t c a aggccgcgcg c g a g a a c c t g c g c c g c a t c g a g a t g g a c c t 13321 g c t g g t c c g c g t g a c c a a c g c c t a t a t c t c g g t g c g c c g c gaccgcgaga t c c t g c g g a t 13381 cagccaaggc g g t g a a g c c t ggctgcagaa g c a a t t g a a g gacaccgagg a c a 
a g t a c a g 13441 c g t c c g t c a g g t g a c c t t g a c c g a c g t g c a gcaggccaag g c c c g c c t g g c g t c g g c c a g 13501 c a c t c a g g t g gcgaacgccc aggcgcagct g a a t g t c a g c g t a g c g t t c t a c g c g t c c c t 13561 ggtggggcgc cagccggaga c g c t g a a g c c t g a a c c c g a t a t t g a c g g c c t g c c t a c a a c 13621 c c t c g a c g a g g c g t t c a a t c aggccgaaca a g c c a a t c c g g t c c t g c t g g cggcgggcta 13681 caccgagaag g c c t c t c g c g c c g g c g t c g c cgaggcgcgg g c c c a g c g c c t g t t c t c g g t 13741 cggcgcgcgc g c g g a c t a t c g c a a t g g c t c c a g c a c g c c g t a c t a c g c g c g t g g c g g t c t 13801 gcgcgaggac a c c g t c a a c g c c t c g a t c a c c c t g a c c c a g c c g c t g t t c a c c a g c g g t c a 13861 g c t g a a c g c c t c g g t g c g g c a g t c g a t c g a ggagaacaac cgcgacaagc t g c t g a t g g a 13921 agacgcacgt c g c a g c a t g g t c c t g a g c g t c t c g c a g t a c t g g g a c a g c c t g g t g g c c g c 13981 gcggaagtcg c t g g t c a g c c tcgaagagga aatgaaggcc a a c a c g a t c g c c t t . 
c t a t g g 14041 ggtgcgcgaa g a a g a g c g t t t c g c g c t t c g c a g c a c g a t c g a a g t g c t g a acgcccaagc 14101 c g a a t t g c a g aacgcccaga t c a a t t t c g t ccgcgggcgc g c c a a c g a g t a t g t c g g t c g 14161 g c t c c a t c t t c t ggcgcagg t c g g c a c g c t t g a g g t c g g c a a t c t c g c t c c c g g c g t c c a 14221 g c c c t a c g a t c c t g a g c g t a a c t t c a g g a a g g t c c g g t a c c g c g g c g c t t t g c c g a c g g a 14281 g c t g a t c a t c g g a a c c t t c g a c a a g a t c g c c t t g c c g c t c gagcccaaga agccggcgcc 14341 gggggacacc t c g c c g a t c c g g c c g c c g t c gagcgaactg ccggccaggc c t g t t t c g g c 14401 cgacaaggtg acgccgccgg c g t c g a t g a a c g a t c t g c c c g c c c t g a c c g acgacacgcc 14461 g g t c c a g a c c g c g c c c c g c a a c t a g a g c c c t t t c c g a t c g g a t c g c c t c a a t c c g a t c g g 14521 atcaaagggg t t c t a t t t c a g a a g t t t a g a a c g t t c t t c g g t c a c c g t t t c g c a c g g a t c 14581 ggagaacgct c t a g g t c a g g ggcagccgcg c g c t c g t c a t c g c c a c c c a g g c g t c g c g g t 14641 tgaacgagac g t g a c c g t t c c g c g c c g t c g cggcggccaa g g t g t t g c g c c a g g c g t c g t 14701 c g g a g a t c a t ccgcgccagc t t t t c g g c g g c g t c g c c c a c a t c a g c g t c g g c c c a g c g c c 14761 c g g c g c g a t a g a t g a c g c c a t c g c c c t c c a cgggaaccag c c g a t a g t c g a c g a a c t g g c 14821 tggacgccgg g t c c a t g a a c t c c a c g t t c g a t g a c c a g c c t g t c g c c a g g g t g g g c t t g c 14881 c g a g c c a t a t c g c t t c g g c g agcagcaggc c g t a a c c c t c ggaccggtgc agggacagca 14941 c g a t g t c g c t g c t g g c g g t g agggcggcca t g t c c t g c g c c g a c a g g t t g t c c g t c a g a a 15001 gccggatgga c g g a t c g t c g g c c a c c t c g g cggcgagcgc t t g a a a g g t c t c g g g a t a g a 15061 g g t c c g c g c c g a c g a c t t t a cagacaaggg t t g c g g g a c g t g t c g 
c c t t g a c t g t g g c g t 15121 cccgaaaggc gcgcagcgcc g c g t c g g g a t t c t t g c g c t g g gcggtcgaa c g c a g a t c g a 15181 a g g c c a t c a g cacgacgacc c g g t c g t c c t gcaggccaaa g c g c t g g c g a t c g g c c t g c g 15241 g g c g c g g a t t gagatagagc gggtagggga c c g g c g t a a c c t t t a c g c c c cgcggcgcca 15301 cgcgacggat a g c g t c a g c g gcgaaggcgg agggggtcca g a c c t c g t c g a c g a a a t c g a 15361 aggcggggag c c a g t c g g a c g g g a c c t c t t c c a g c t c c c a g g c c c a g t a g c c g a t g t g c c 15421 g g c g g c c c t c caggaaacgg c c t t c g g t c t c c c g c g c c c a t c g c a g a a g c t c t g g g g g a t 15481 t g a t g t g c g a g a t c g t g a c c c c g c g t t c g t cgggatccgg t g a a g a a t a g gccggggcga 15541 t c t c g g c g g c gaagccgacg gacgccgaca ggtccaaggc g c g g a c c c c a a g g c c c a t g t 15601 c g c c a a a c c c gcgcgccagc a t c c g t g c g c c t t c g c c c a g a c c g t g c a c c g c g c t a t g g a 15661 agccgatgac c g t c a c c g g g a g c g t g g g t g a c a a a c c c c t gggaccggcg c g t c t g c g c g 15721 c g c g c g c c a a g g c g g t t c t t t c c g c a c t c c g g t g t c g g t g c t c g g c c a a g g c c t t g t c c a 15781 g c g c c t c c a g agcggcgcgc ggggcgccgg c g a c g a c a t g cgccgagcgc cggatgggcg 15841 tgggcaggcg g c g c c a t a t c tcaagcaggc g t g a g g t c a t gggcgatgca ggcgggcgag 15901 acgcatgggc g a c c g t a t a g c c g t t c a g g c cgagccgaca ccagcaaagc tgcgcggggc 15961 c g c g t c a g g c g a t c t t g t c a t g cgaaaggt t t g c a t g c a g c a a t g g c g g c c c a c t g c g c c 16021 gaggggggaa t a c g t g t c g a c t g a g t g g a g c g c c g g c t a t g t c a c g g a c g t c a a c t a t a c 16081 g t t t g g c t a t t a t g g c g a g t t g a a c c c g c t c c g c t g c c g c c t t c c g c t c c t g a c g g t c g g 16141 c c g c c a c g c t c c c a a g a t c g a g a a c g c c t g cgagctcggg t t t g g c c a g g g t c t t t c c g t 16201 c t c c a t c c 
a c g c c gcagccc agccggggat c a a c t g g t a c g g a a c c g a c t t c a a c c c c t c 16261 acaggcggcc t t c g c c g c a g a g a t g g t c c g t c t g t c g g g c gcggaggcca a g c t c t a c g a 16321 c g a g g c c t t c g c c g a g t t c t g t a a t c g c a a g g a c c t g c c g g a c t t c g a c t t c a t c g g g c t 16381 c c a c g g c a t c t g g a c c t g g a t c t c c g a c c a a a a t c g c c a c g t g c t g g t c g a c t t c a t t c g 16441 t c g c a a g c t c cgcccggg Appendix 2 ATC15252 S-layer subunit and transporter genes LOCUS JS3001A19 4255 bp mRNA BCT 07-OCT-1999 DEFINITION Caulobacter crescentus S-layer subunit (rsaA) and ABC-transporter (rsaD(partial)) mRNAs, complete cds. ACCESSION JS3001A19 VERSION KEYWORDS SOURCE Caulobacter crescentus. ORGANISM Caulobacter crescentus Bacteria; Proteobacteria; alpha subdivision; Caulobacter group; Caulobacter. REFERENCE 1 (bases 1 to 4255) AUTHORS Bingle,W.H., Awram,P.A., Nomellini,J.F. and Smit,J.K. TITLE The Secretion Signal of C. crescentus S-layer Protein is Located in the C-terminal 82 Amino Acids of the Molecule JOURNAL Unpublished REFERENCE 2 (bases 1 to 4255) AUTHORS Bingle,W.H., Awram,P.A., Nomellini,J.F. and Smit,J.K. TITLE Direct Submission JOURNAL Submitted (07-OCT-1999) Microbiology and Immunology, University of British Columbia, 300-6174 University Blvd,
Vancouver, BC V6T 1Z3, Canada FEATURES Location\/Qualifiers source 1..4255 \/organism=\"Caulobacter crescentus\" \/strain=\"JS3001\" gene 637..3717 \/gene=\"rsaA\" CDS 637..3717 \/gene=\"rsaA\" \/codon_start=1 \/transl_table=11 \/product=\"S-layer subunit\" \/translation=\"MAYTTAQLVTAYTNANLGKAPDAATTLTLDAYATQTQTGGLSDA AALTNTLKLVNSTTAVAIQTYQFFTGVAPSAAGLDFLVDSTTNTNDLNDAYYSKFAQE NRFINFSINLATGAGAGATAFAAAYTGVSYAQTVATAYDKIIGNAVATAAGVDVAAAV AFLSRQANIDYLTAFVRANTPFTAAADIDLAVKAALIGTILNAATVSGIGGYATATAA MINDLSDGALSTDNAAGVNLFTAYPSSGVSGSTLSLTTGTDTLTGTANNDTFVAGEVA GAATLTVGDTLSGGAGTDVLNWVQAAAVTALPTGVTISGIETMNVTSGAAITLNTSSG VTGLTALNTNTSGAAQTVTAGAGQNLTATTAAQAANNVAVDGGANVTVASTGVTSGTT TVGANSAASGTVSVSVANSSTTTTGAIAVTGGTAVTVAQTAGNAVNTTLTQADVTVTG NSSTTAVTVTQTAAATAGATVAGRVNGAVTITDSAAASATTAGKIATVTLGSFGAATI DSSALTTVNLSGTGTSLGIGRGALTATPTANTLTLNVNGLTTTGAITDSEAAADDGFT TINIAGSTASSTIASLVAADATTLNISGDARVTITSHTAAALTGITVTNSVGATLGAE LATGLVFTGGAGADSILLGATTKAIVMGAGDDTVTVSSATLGAGGSVNGGDGTDVLVA NVNGSSFSADPAFGGFETLRVAGAAAQGSHNANGFTALQLGATAGATTFTNVAVNVGL TVLAAPTGTTTVTLANATGTSDVFNLTLSSSAALAAGTVALAGVETVNIAATDTNTTA HVDTLTLQATSAKSIVVTGNAGLNLTNTGNTAVTSFDASAVTGTGSAVTFVSANTTVG EVVTIRGGAGADSLTGSATANDTIIGGAGADTLVYTGGTDTFTGGTGADIFDINAIGT STAFVTITDAAVGDKLDLVGISTNGAIADGAFGAAVTLGAAATLAQYLDAAAAGDGSG TSVAKWFQFGGDTYVVVDSSAGATFVSGADAVIKLTGLVTLTTSAFATEVLTLA\" gene CDS BASE COUNT ORIGIN 1 61 121 181 241 301 361 421 481 541 601 661 721 781 841 901 961 1021 1081 1141 1201 1261 1321 1381 1441 1501 1561 1621 1681 1741 1801 1861 1921 1981 2041 2101 2161 2221 2281 2341 2401 2461 2521 2581 2641 2701 722 a a a g c t t c c c c g a c a c g a t a a gcggagcagt g t a a c g g t c c a t c a g c g c c g c t t c a g c c a g c g g g t t a c c a c a c t c a c c c g a c g c t a t a t a t t g t c g a c g t cagggggtgt g t g a c t g c g t c t c g a c g c g t a a c a c c c t g a a c c g g c g t t g a a c g a c c t g a t c g a t c a a c c g g c g t t 
t c g t g c g a c c g c c g a t c g a c t a c c g a t c t g g c c g a t c g g t g g t t t c g a c c g a c a g g t t c g a c c c a c g t t c g t t g g g c g g t g c t g a c c g g c g t g a c t g a a c a c g t g c t c a a a c c g gcgaacaacg t c g g g c a c g a g c g a a c t c g a g t g g c t c a a a a c c g g t a a c t g c t a c g g t c g g c c a c g a c c g g a c t c g a g c g c g c g g c g c t c acgacgaccg a a c a t c g c t g a c c c t g a a c a a c g g g c a t c a c t g g t c t t c a a t c g t c a t g g g g t t c g g t c a t t c a g c g c t g 3960..4253 \/ g e n e = \" r s a D ( p a r t i a l ) \" 3960..4253 \/ g e n e = \" r s a D ( p a r t i a l ) \" \/ c o d o n _ s t a r t = l \/ t r a n s l _ t a b l e = l l \/ p r o d u c t = \" A B C - t r a n s p o r t e r \" \/translation=\"MFKRSGAKPTILDQAVLVARPAVITAMVFSFFINILALVSPLYM LQVYDRVLTSRNVSTLIVLTVICVFLFLVYGLLEALRTQVLVRGGLKFDGVARD\" 1512 c 1296 g 725 t a a g c c t a g g t c c g a a c t a g t t c g c g c a c g a ggcggaatgc a c c t t a g c g g c g g c c a c c t t t a c a t t a t a a ccaggtgaac g g a a t t t g c t a t g a c g t t t g g g g a t t t t t t a c accaacgc a c g c g a c t c a a g c t g g t c a a c c c c g t c g g c a c g a c g c g t a tggccacggg acgcccagac c t g g c g t c g a t g a c c g c c t t t c a a g g c c g c acgcgaccgc acgcggctgg t c t c g c t g a c cgggtgaagt g c a c c g a c g t c g a t c t c g g g c t t c g g g c g t t c a c c g c c g g t c g c c g t c g a c c a c g g t c g g gcacgaccac cggccggcaa c c a g c a c c a c c c g g t c g c g t ccggcaagat c t c t g a c g a c t g a c c g c c a c g c g c g a t c a c g t t c g a c c g c t c t c g g g c g a c g g t g a c c a a cgggcggcgc gcgccggcga acggcggcga a c c c g g c c t t gaaaagccga c t t c g c t g a a a c g t c t t c t c ggccgcgacc c c a g c t g g c g cgcggcggag a g c c t c g c g c a g t c t t t a t a g t a c c g g t t a c t c t a t a g c c t t g g g a g a c a c a a c c t c g g c aacccagacg cagcacgacg c g c t g g t c t g c t a c t c g a a g cgccggcgcc 
g g t c g c c a c c c g t c g c g g c c c g t g c g c g c c c c t g a t c g g c cacggccgcg c g t g a a c c t g c a c c g g c a c c c g c c g g c g c t c c t g a a c t g g c a t c g a a a c g g a c g g g t c t g c g c t g g c c a g cggcggcgcc c g c c a a c t c g cacgggcgct c g c c g t g a a c ggccgtgacg c a a c g g c g c t c g c c a c g g t c c g t c a a c c t g gccgaccgcc ggactcggaa c t c t t c g a c g c g c t c g c g t c c a g c g t t g g t t g g c g c t g a c c g a c a c c g t c cggcaccgac c g g c g g c t t c c c c c c c g t c g c a g g a t c t c g c g a g a c c t c c c g t c t c c a g c t t g c g a c a g g g t c t t g c a c c g t t g a c c g a g t a t a g c g c t t g a a a a a t g c t a t c g c t g c t c a t c c t c a t g g a a g g c g c c t g g g c g g c c t c t g c t g t t g c c a g a c t t c c t g g t t c g c t c a g g ggcgcgacgg g c c t a t g a c a g c c g t g g c t t a a c a c g c c g t a c c a t c c t g a a t g a t c a a c g t t c a c c g c c t g a c a c c c t g a g c g a c c c t g a g t g c a a g c t g a t g a a c g t g a a c c g c c c t g a a a c c t g a c c g a a c g t c a c c g g c c g c t t c g g a t c g c c g t g a a c c a c g t t g a g t c a c c c a a a g t g a c g a t c a a c c c t g g g c a t c g g g c a c g g a a c a c c c t g a g c g g c t g c t g a t c g c c a g c c a c g a t c a c c t g c g a c c c t c g t c g a t c c t g c a c c g t c a g c t g t t c t g g t g g g a a a c c c t c c g c c c a a a c a c t a g g t g a t c g a g g g c c t t g g t g a g a g a t g a c c g g c g g c c t t c c g a g g c g c ggcaggagcg t t c g g c g g g g g t a c c c c t g a c c a t g c g c g c c c t a t a c g a c a c g c c g c c a c cggacgccgc t c c a g a c c t a t c g a c t c g a c a a a a c c g c t t c t t t c g c c g c a g a t c a t c g g t c c t g a g c c g t c a c g g c c g c a c g c c g c c a c a c c t g t c g g a a t c c g t c g t c cgggcaccgc c c g t t g g c g a c t g c g g t t a c c g t c g g g c g c a c a c c a a c a c c c a c g a c c g c t c g c c t c g a 
c g c a c c g t g t c c c g g t g g t a c cgcaagccga c c g c c g c c g c c c g a c t c t g c g c t t c g g c g c g c a c c t c g c t c c c t g a a c g t a c g a t g g t t t t g g t g g c c g c c g c a c a c c g c g c g c c g a a c t t g g g c g c c a c c g g c g a c c c t c c a a c g t c a a g c g t c g c t g g g c t a g c a g a c g a t c a t a g a a c c c a g t c g c g a g g t g t a a t a c g c g c a t c t c t g c g g c g t t g cgggcgcgct ggtacaagga a a t t c g g c t a c a c t c g g t c g g g c c c a g t t g c a c g c t g a c g t g c g c t g a c c c c a g t t c t t c c a c c a a c a c c c a t c a a c t t c c g c c t a c a c g c a a c g c c g t c ccaggccaac t g c c g a c a t c g g t g t c g g g c c g g c g c c c t g g g g c g t g t c g caacaacgac c a c c c t g a g c g g c t c t g c c g t g c g a t c a c c cagcggcgcg c g c t c a a g c c gggcgtgacc g g t g a g c g t c g g c c g t g a c c c g t g a c c g t g c a c c g c c g g c c g c c g c c t c g c g c c a c g a t c c g g c a t c g g c c a a t g g t c t g c a c c a c c a t c cgacgcgacg t g c c g c c c t g ggcgaccggt gaccaaggcg g g g c g c t g g t c g g t t c g t c g c g c g g c g g c t 123 2761 2821 2881 2941 3001 3061 3121 3181 3241 3301 3361 3421 3481 3541 3601 3661 3721 3781 3841 3901 3961 4021 4081 4141 4201 c a a g g c t c g c a c g a c c t t c a acgacgaccg t c g t c c t c g g a t c g c c g c c a t c g g c c a a g t a c g g c t g t c a g t g t c g g c c a t c g c t g a c c g c t g g t c t a c a a t c a a c g c t a a a g c t c g a c c g c g g t c a c c c gacggcagcg g t t g a c a g c t g g t c t g g t c a a a c g t c t g a t gaggcggggt a t g a t t t a g c c g c c t t a g t t t g t t c a a g c g c g g c g g t g a t c g c t g t a c a t t c g t g t t g a c g c a c c c a g g t acaacgccaa c c a a c g t t g c t g a c c c t g g c c c g c t c t g g c c c g a c a c c a a c g a t c g t g g t c c a g c t t c g a a c a c c a c g g t g t t c g g c c a c c c g g c g g t a c t c g g c a c c t c t c g t c g g c a t t g g g c g c 
t g c g c a c c t c g g t c g g c t g g c g c c g c t g a c c a c c c t c g c c t a g c t t t c t t a t g gggactgggg g t t a c t g t a c cagcggcgcg c a c c g c c a t g g c t g c a g g t c g g t c a t c t g c g c t g g t g c g c c g g c t t c a c g g g t g a a t g t c c a a c g c c a c g c g c t g g t a c g c a c g a c c g c t gacgggcaac cgccagcgcc gggtgaagtc c g c c a a t g a c g g a c a c c t t c g a c c g c t t t c c t c g a c g a a c t g c g a c c c t g t g c c a a g t g g g a c c t t c g t c c t c g g c c t t c gcgaggatcg g g c g c t a c g c g g c t t g c t c a a t g g c c g c g t aagccgacga g t c t t c a g c t t a t g a c c g c g g t c t t c c t g t g g c g g t c t g a g c t c t g c a a c g g c c t g a c c g g g c a c c t c g g g t t g c g c t g g c a c g t c g a c a g c c g g t c t g a g t c a c c g g c a g t c a c g a t c c a c c a t c a t c g acgggtggca g t g a c g a t c a g g c g c t a t c g g c t c a g t a c c t t c c a g t t c g a g c g g c g c t g gccaccgaag c t a g a c t a a g g c t g g c c g g c c t t t c c g c c a c g g t t c g c g c t c c t c g a c c a t c t t c a t c a a t g c t g a c c a g t c c t g g t c t a a g t t c g a c g g \/\/ tgggcgcgac t t c t g g c g g c a c g t g t t c a a c t g g c g t c g a c g c t g a c g c t a c c t g a c c a a c g g g c t c g g c gcggcggcgc g t g g c g c t g g cgggcgcgga c c g a c g c c g c c t g a c g g c g c t g g a c g c t g c gcggcgacac a c g c g g t g a t t c c t g a c g c t a g a c c c c g t c c t t g c c t a g t c a a t t t c g t g g g c g t c c t g a g g c c g t g c t g c a t t c t g g c c c c g c a a c g t t c g g c c t g c t c c g t g g c c c g g ggcgggtgcg t c c g a c c g g t c c t g a c c c t g gacggtgaac g c a a g c c a c c caccggcaac t g t g a c c t t c t g g c g c c g a c c g c t g a c a c c t a t c t t c g a t t g t c g g c g a c c t t c g g c g c t t g c t g c c g g c c t a t g t c g t c c a a g c t g a c c c g c c t a a g c g t t c c g a a a g g t c c g g t g g c t gtcgagacgg a g g c t c a c a a g 
t c g c c c g c c c t g g t c a g c c t c g a c c c t g a gaggcgctgc g a t c c LOCUS JS4000RAT1 7493 bp DNA BCT 07-OCT-1999 DEFINITION Caulobacter crescentus S-layer subunit (rsaA(truncated)), ABC-transporter (rsaD), and Membrane Forming Unit (rsaE) genes, complete cds. ACCESSION JS4000RAT1 VERSION KEYWORDS SOURCE Caulobacter crescentus. ORGANISM Caulobacter crescentus Bacteria; Proteobacteria; alpha subdivision; Caulobacter group; Caulobacter. REFERENCE 1 (bases 1 to 7493) AUTHORS Bingle,W.H., Awram,P.A., Nomellini,J.F. and Smit,J.K. TITLE The Secretion Signal of the C. crescentus S-layer protein is Located within the C-Terminal 82 Amino Acids of the Molecule JOURNAL Unpublished REFERENCE 2 (bases 1 to 7493) AUTHORS Bingle,W.H., Awram,P.A., Nomellini,J.F. and Smit,J.K. TITLE Direct Submission JOURNAL Submitted (07-OCT-1999) Microbiology and Immunology, University of British Columbia, 300-6174 University Blvd, Vancouver, BC V6T 1Z3, Canada FEATURES Location\/Qualifiers source 1..7493 \/organism=\"Caulobacter crescentus\" \/strain=\"JS4000\" gene 637..1716 \/gene=\"rsaA(truncated)\" CDS 637..1716 \/gene=\"rsaA(truncated)\" \/note=\"The RsaA protein is truncated because of a deleted G base pair. A stop codon results after translation of 359 amino acids.\"
\/codon_start=1 \/transl_table=11 \/product=\"S-layer subunit\" \/translation=\"MAYTTAQLVTAYTNANLGKAPDAATTLTLDAYATQTQTGGLSDA AALTNTLKLVNSTTAVAIQTYQFFTGVAPSAAGLDFLVDSTTNTNDLNDAYYSKFAQE NRFINFSINLATGAGAGATAFAAAYTGVSYAQTVATAYDKIIGNAVATAAGVDVAAAV AFLSRQANIDYLTAFVRANTPFTAAADIDLAVKAALIGTILNAATVSGIGGYATATAA MINDLSDGALSTDNAAGVNLFTAYPSSGVSGSTLSLTTGTDTLTGTANNDTFVAGEVA GAATLTVGDTLSGGAGTDVLNWVQAAAVTALPTGVTISGIETMNVTSGAAITLNTSSG VTGLTALNTNTSGAAQTVTAGAGQT\" gene 3959..5695 \/gene=\"rsaD\" CDS 3959..5695 \/gene=\"rsaD\" \/codon_start=1 \/transl_table=11 \/product=\"ABC-transporter\" \/translation=\"MFKRSGAKPTILDQAVLVARPAVITAMVFSFFINILALVSPLYM LQVYDRVLTSRNVSTLIVLTVICVFLFLVYGLLEALRTQVLVRGGLKFDGVARDPIFK SVLDSTLSRKGIGGQAFRDMDQVREFMTGGLIAFCDAPWTPVFVIVSWMLHPFFGILA IIACIIIFGLAVMNDNATKNPIQMATMASIAAQNDAGSTLRNAEVMKAMGMWGGLQAR WRARRDEQVAWQAAASDAGGAVMSGIKVFRNIVQTLILGGGAYLAIDGKISAGAMIAG SILVGRALAPIEGAVGQWKTYIGARGAWDRLQTMLREEKSADDHMPLPEPRGVLSAEA ASILPPGAQQPTMRQASFRIDAGAAVALVGPSAAGKSSLLRGIVGVWPCAAGVIRLDG YDIKQWDPEKLGRHVGYLPQDIELFSGTVAQNIARFTEFESQEVIEAATLAGVHEMIQ SLPMGYDTAIGEGGASLSGGQRQRLALARAVFRMPALLVLDEPNASLDQVGEVALMEA MKRLKAAKRTVIFATHKVNLLAQADYIMVINQGVISDFGERDPMLAKLTGAAPPQTPP PTPPPAPLQRVQ\" gene 5763..7073 \/gene=\"rsaE\" CDS 5763..7073 \/gene=\"rsaE\" \/codon_start=1 \/transl_table=11 \/product=\"Membrane Forming Unit\" \/translation=\"MKPPKIQRPTDNFQAVARIGYGIIALTFVGLLGWAAFAPLDSAV IANGVVSAEGNRKTVQHLEGGMLAKILVREGEKVKAGQVLFELDPTQANAAAGITRNQ YVALKAMEARLLAERDQRPSISFPADLTRLRADPMVARAIADEQAQFTERRQTIQGQV DLMNAQRLQYQSEIEGIDRQTQGLKDQLGFIEDELIDLRKLYDKGLVPRPRLLALERE QASLSGSIGRLTADRSKAVQGASDTQLKVRQIKQEFFEQVSQSITETRVRLAEVTEKE VVASDAQKRIKIVSPVNGTAQNLRFFTEGAVVRAAEPLVDIAPEDEAFVIQAHFQPTD VDNVHMGMVTEVRLPAFHSREIPILNGTIQSLSQDRISDPQNKLDYFLGIVRVDVKQL PPHLRGRVTAGMPAQVIVPTGERTVLQYLFSPLRDTLRTTMREE\" BASE COUNT 1261 a 2627 c 2358 g 1247 t ORIGIN 1 a a g c t t c c c c a a g 
c c t a g g t gaaaagccga c c c c c c g t c g g c c c a a a c a c g c t a g c a g a c 61 g a c a c g a t a a c c g a a c t a g t c t t c g c t g a a c a g g a t c t c g t a g g t g a t c g g a t c a t a g a a 121 gcggagcagt t c g c g c a c g a a c g t c t t c t c c g a g a c c t c c a g g g c c t t g g c c c a g t c g c g 181 g t a a c g g t c c ggcggaatgc ggccgcgacc c g t c t c c a g c t g a g a g a t g a a g g t g t a a t a 241 a t c a g c g c c g a c c t t a g c g g c c a g c t g g c g t t g c g a c a g g c c g g c g g c c t c g c g c a t c t c 301 c t t c a g c c a g c g g c c a c c t t cgcggcggag g t c t t g c a c c t c c g a g g c g c t g c g g c g t t g 361 c g g g t t a c c a t a c a t t a t a a a g c c t c g c g c g t t g a c c g a g ggcaggagcg cgggcgcgct 421 c a c t c a c c c g ccaggtgaac a g t c t t t a t a t a t a g c g c t t t t c g g c g g g g ggtacaagga 125 481 a c g c t a t a t a g g a a t t t g c t g t a c c g g t t a g a a a a a t g c t g t a c c c c t g a a a t t c g g c t a 541 t t g t c g a c g t a t g a c g t t t g c t c t a t a g c c a t c g c t g c t c c c a t g c g c g c c a c t c g g t c g 601 cagggggtgt g g g a t t t t t t t t g g g a g a c a a t c c t c a t g g c c t a t a c g a c g g c c c a g t t g 661 g t g a c t g c g t acaccaacgc c a a c c t c g g c a a g g c g c c t g a c g c c g c c a c c a c g c t g a c g 721 c t c g a c g c g t a c g c g a c t c a aacccagacg g g c g g c c t c t cggacgccgc t g c g c t g a c c 781 a a c a c c c t g a a g c t g g t c a a cagcacgacg g c t g t t g c c a t c c a g a c c t a c c a g t t c t t c 841. 
a c c g g c g t t g c c c c g t c g g c c g c t g g t c t g g a c t t c c t g g t c g a c t c g a c c a c c a a c a c c 901 a a c g a c c t g a a c g a c g c g t a c t a c t c g a a g t t c g c t c a g g a a a a c c g c t t c a t c a a c t t c 961 t c g a t c a a c c tggccacggg cgccggcgcc ggcgcgacgg c t t t c g c c g c c g c c t a c a c g 1021 g g c g t t t c g t acgcccagac g g t c g c c a c c g c c t a t g a c a a g a t c a t c g g c a a c g c c g t c 1081 gcgaccgccg c t g g c g t c g a c g t c g c g g c c g c c g t g g c t t t c c t g a g c c g ccaggccaac 1141 a t c g a c t a c c t g a c c g c c t t c g t g c g c g c c a a c a c g c c g t t c a c g g c c g c t g c c g a c a t c 1201 g a t c t g g c c g t c a a g g c c g c c c t g a t c g g c a c c a t c c t g a a c g c c g c c a c g g t g t c g g g c 12 61 a t c g g t g g t t acgcgaccgc cacggccgcg a t g a t c a a c g a c c t g t c g g a c g g c g c c c t g 1321 t c g a c c g a c a acgcggctgg c g t g a a c c t g t t c a c c g c c t a t c c g t c g t c g g g c g t g t c g 1381 g g t t c g a c c c t c t c g c t g a c caccggcacc g a c a c c c t g a cgggcaccgc caacaacgac 1441 a c g t t c g t t g cgggtgaagt c g c c g g c g c t g c g a c c c t g a c c g t t g g c g a c a c c c t g a g c 1501 g g c g g t g c t g g c a c c g a c g t c c t g a a c t g g g t g c a a g c t g c t g c g g t t a c g g c t c t g c c g 1561 a c c g g c g t g a c g a t c t c g g g c a t c g a a a c g a t g a a c g t g a c g t c g g g c g c t g c g a t c a c c 1621 c t g a a c a c g t c t t c g g g c g t g a c g g g t c t g a c c g c c c t g a a c a c c a a c a c cagcggcgcg 1681 g c t c a a a c c g t c a c c g c c g g c g c t g g c c a a a c c t g a c c g c c a c g a c c g c c g c t c a a g c c g 1741 c g a a c a a c g t c g c c g t c g a c ggcggcgcca a c g t c a c c g t c g c c t c g a c g g g c g t g a c c t 1801 cgggcacgac c a c g g t c g g c g c c a a c t c g g c c g c t t c g g g c a c c g t g t c g g t g a g c g t c g 1861 c g a a c t c g a g 
c a c g a c c a c c acgggcgcta t c g c c g t g a c c g g t g g t a c g g c c g t g a c c g 1921 t g g c t c a a a c ggccggcaac g c c g t g a a c a c c a c g t t g a c gcaagccgac g t g a c c g t g a 1981 c c g g t a a c t c c a g c a c c a c g g c c g t g a c g g t c a c c c a a a c c g c c g c c g c c accgccggcg 2041 c t a c g g t c g c c g g t c g c g t c a a c g g c g c t g t g a c g a t c a c c g a c t c t g c c g c c g c c t c g g 2101 c c a c g a c c g c cggcaagatc g c c a c g g t c a c c c t g g g c a g c t t c g g c g c c g c c a c g a t c g 2161 a c t c g a g c g c t c t g a c g a c c g t c a a c c t g t cgggcacggg c a c c t c g c t c g g c a t c g g c c 2221 g c g g c g c t c t gaccgccacg c c g a c c g c c a a c a c c c t g a c c c t g a a c g t c a a t g g t c t g a 2281 cgacgaccgg c g c g a t c a c g gactcggaag c g g c t g c t g a c g a t g g t t t c a c c a c c a t c a 2341 a c a t c g c t g g t t c g a c c g c c t c t t c g a c g a t c g c c a g c c t g g t g g c c g c c gacgcgacga 2401 c c c t g a a c a t c t c g g g c g a c g c t c g c g t c a c g a t c a c c t c g c a c a c c g c t g c c g c c c t g a 2461 c g g g c a t c a c ggtgaccaac a g c g t t g g t g c g a c c c t c g g c g c c g a a c t g g c g a c c g g t c 2521 t g g t c t t c a c gggcggcgct g g c g c t g a c t c g a t c c t g c t gggcgccacg accaaggcga 2581 t c g t c a t g g g cgccggcgac g a c a c c g t c a c c g t c a g c t c g g c g a c c c t g g g c g c t g g t g 2641 g t t c g g t c a a cggcggcgac ggcaccgacg t t c t g g t g g c c a a c g t c a a c g g t t c g t c g t 2701 t c a g c g c t g a c c c g g c c t t c g g c g g c t t c g a a a c c c t c c g c g t c g c t g g c g c g g c g g c t c 2761 a a g g c t c g c a caacgccaac g g c t t c a c g g c t c t g c a a c t gggcgcgacg gcgggtgcga 2821 c g a c c t t c a c c a a c g t t g c g g t g a a t g t c g g c c t g a c c g t t c t g g c g g c t c c g a c c g g t a 2881 c g a c g a c c g t g a c c c t g g c c aacgccacgg g c a c c t c g g a c g 
t g t t c a a c c t g a c c c t g t 2941 c g t c c t c g g c c g c t c t g g c c g c t g g t a c g g t t g c g c t g g c t g g c g t c g a g a c g g t g a a c a 3001 t c g c c g c c a c cgacaccaac a c g a c c g c t c a c g t c g a c a c g c t g a c g c t g c a a g c c a c c t 3061 c g g c c a a g t c g a t c g t g g t g acgggcaacg c c g g t c t g a a c c t g a c c a a c accggcaaca 3121 c g g c t g t c a c c a g c t t c g a c gccagcgccg t c a c c g g c a c g g g c t c g g c t g t g a c c t t c g 3181 t g t c g g c c a a c a c c a c g g t g g g t g a a g t c g t c a c g a t c c g c g g c g g c g c t g g c g c c g a c t 3241 c g c t g a c c g g t t c g g c c a c c g c c a a t g a c a c c a t c a t c g g t g g c g c t g g c g c t g a c a c c c 3301 t g g t c t a c a c cggcggtacg g a c a c c t t c a cgggtggcac gggcgcggat a t c t t c g a t a 3361 t c a a c g c t a t c g g c a c c t c g a c c g c t t t c g t g a c g a t c a c c g a c g c c g c t g t c g g c g a c a 3421 a g c t c g a c c t c g t c g g c a t c t c g a c g a a c g g c g c t a t c g c t g a c g g c g c c t t c g g c g c t g 3481 c g g t c a c c c t g g g c g c t g c t g c g a c c c t g g c t c a g t a c c t g g a c g c t g c t g c t g c c g g c g 3541 acggcagcgg c a c c t c g g t t g c c a a g t g g t t c c a g t t c g g cggcgacacc t a t g t c g t c g 3601 t t g a c a g c t c ggctggcgcg a c c t t c g t c a g cggcgctga c g c g g t g a t c a a g c t g a c c g 3661 g t c t g g t c a c g c t g a c c a c c t c g g c c t t c g c c a c c g a a g t c c t g a c g c t c g c c t a a g c g a 3721 a c g t c t g a t c c t c g c c t a g g cgaggatcgc t a g a c t a a g a g a c c c c g t c t tccgaaaggg 3781 aggcggggtc t t t c t t a t g g g c g c t a c g c g c t g g c c g g c c t t g c c t a g t t c c g g t g g c t a 3841 t g a t t t a g c g ggactggggg g c t t g c t c a c t t t c c g c c a c a a t t t c g t g g tcgagacggc 3901 g c c t t a g t t g t t a c t g t a c a t g g c c g c g t c g g t t c g c g c g g c g t c c t g a a g 
g c t c a c a a t 3961 g t t c a a g c g c agcggcgcga agccgacgat c c t c g a c c a g g c c g t g c t g g t c g c c c g c c c 4021 g g c g g t g a t c a c c g c c a t g g t c t t c a g c t t c t t c a t c a a c a t t c t g g c c c t g g t c a g c c c 4081 g c t g t a c a t g c t g c a g g t c t a t g a c c g c g t g c t g a c c a g c c g c a a c g t t t c g a c c c t g a t 4141 c g t g t t g a c g g t c a t c t g c g t c t t c c t g t t c c t g g t c t a c g g c c t g c t c g a g g c g c t g c g 4201 c a c c c a g g t g c t g g t g c g c g g c g g t c t g a a g t t c g a c g g c g t g g c c c g g g a t c c g a t c t t 4261 c a a g t c g g t g c t g g a c t c c a c g c t c a g c c g caagggcatc ggcggccagg c g t t c c g c g a 4321 c a t g g a c c a g g t c c g a g a g t t c a t g a c c g g c g g c c t g a t c g c c t t c t g c g a t g c g c c c t g 4381 g a c g c c g g t g t t c g t c a t c g t c t c g t g g a t g c t g c a c c c g t t c t t c g g c a t c c t g g c g a t 4441 c a t c g c c t g c a t t a t c a t c t t c g g c c t g g c c g t g a t g a a c gacaacgcca c c a a g a a c c c 4501 g a t c c a g a t g g c c a c c a t g g c c t c g a t c g c cgcccagaac g a c g c c g g t t c c a c c c t g c g 4561 caacgccgag g t c a t g a a g g c c a t g g g c a t gtggggcggc c t g c a a g c c c g c t g g c g c g c 4621 gcgccgcgac gagcaggtgg c c t g g c a g g c cgccgccagc gacgccggcg g c g c g g t g a t 4681 g t c g g g c a t c a a g g t g t t c c g c a a c a t c g t c c a g a c c c t g a t c c t g g g c g g c g g c g c c t a 4741 t c t g g c c a t c gacggcaaga t c t c g g c c g g c g c g a t g a t c g c c g g c t c g a t c c t g g t c g g 4801 c c g c g c c c t g g c g c c c a t c g agggcgcggt gggccagtgg a a g a c c t a t a t c g g c g c g c g 4861 c g g c g c c t g g g a t c g t c t g c a g a c c a t g c t gcgcgaggaa aagagcgccg a c g a c c a c a t 4921 g c c g c t g c c c gagccgcgcg g c g t g c t g t c ggccgaagcc g c c t c g a t c c t g c c g c c g g g 4981 cgcgcaacag c c g a c c 
a t g c gccaggccag c t t c c g c a t c gacgccggcg c c g c g g t g g c 5041 c c t t g t c g g t cccagcgcgg cgggcaagtc c t c g c t g c t g c g c g g c a t c g t c g g c g t c t g 5101 g c c c t g c g c g g c c g g c g t c a t c c g c c t c g a c g g c t a c g a c a t c a a g c a g t g g g a t c c c g a 5161 g a a g c t g g g t c g c c a c g t c g g c t a c c t g c c gcaggacatc g a g c t g t t c t c g g g c a c c g t 5221 cgcccagaac a t c g c c c g c t t c a c c g a g t t c g a g t c g c a g g a a g t c a t c g aggccgcgac 5281 cctggcgggc gtgcacgaga t g a t c c a g a g c c t g c c g a t g g g c t a t g a t a c g g c g a t c g g 5341 cgagggcggc g c c t c g c t g t ccggcggcca gcgccagcgc c t g g c c c t g g c c c g c g c g g t 5401 g t t c c g c a t g c c g g c c c t g c t g g t g c t g g a cgagccgaac g c c a g c c t c g accaggtggg 54 61 cgaagtggcg c t g a t g g a a g cgatgaagcg g c t c a a g g c c gccaagcgca c g g t g a t c t t 5521 c g c c a c c c a c aaggtgaacc t g t t g g c c c a g g c c g a c t a c a t c a t g g t g a t c a a c c a g g g 5581 t g t g a t c a g c g a c t t t g g c g aacgcgaccc g a t g c t g g c c a a g c t g a c c g g g g c t g c g c c 5641 g c c c c a g a c g c c g c c g c c g a c g c c g c c g c c c g c g c c g t t g c a g c g c g t c c a g t a a g c g c c 5701 t t c g t c t g t c c g c c t c t c c c t t c c t g g c c g t t t c a g a a c g c g c c c a t c a g g c t t g a a t c t 57 61 c a a t g a a g c c c c c c a a g a t c c a g c g t c c g a c g g a c a a c t t c c a g g c t g t g g c c c g t a t c g 5821 g c t a c g g c a t c a t c g c c c t g a c c t t t g t c g g t c t g t t g g g c t g g g c c g c g t t c g c c c c g c 5881 t c g a c a g c g c g g t g a t c g c c a a c g g c g t c g t c t c c g c c g a g g g t a a t c g c a a g a c c g t g c 5941 a g c a c c t c g a aggcggcatg c t g g c c a a g a t c c t g g t c c g cgaaggcgag aaggtgaagg 6001 c c g g c c a g g t g c t g t t c g a g c t g g a c c c g a cccaggccaa c g c c g c c g c c g g c a t c 
a c c c 6061 g c a a c c a g t a t g t g g c g t t g aaggccatgg a a g c g c g c c t g c t g g c c g a g cgcgaccagc 6121 g t c c g t c c a t c a g c t t c c c c g c c g a c c t g a c c c g c c t g c g c g c c g a t c c g a t g g t c g c c c 6181 g c g c c a t c g c cgacgaacag g c c c a g t t c a c t g a g c g t c g c c a g a c g a t c cagggccagg 6241 t c g a c c t g a t gaacgcccag c g t t t g c a g t atcagagcga gatcgagggc a t c g a c c g t c 6301 agacccaggg cctgaaggac c a a c t c g g c t t c a t c g a g g a c g a g c t g a t c g a c c t g c g t a 6361 a g c t c t a t g a c a a g g g c c t g g t g c c c c g g c c g c g t c t g c t g g c c c t g g a g cgcgagcagg 6421 c c t c g c t g t c g g g c t c g a t c g g c c g t c t g a ccgcagaccg c t c c a a g g c c g t c c a g g g c g 6481 c c t c t g a c a c c c a g c t c a a g g t t c g c c a g a tcaagcagga g t t c t t c g a g c a g g t c a g c c 6541 a g a g c a t c a c cgagacccgg g t t c g c c t g g ccgaggtgac cgagaaggag g t c g t c g c c t 6601 c c g a c g c c c a gaagcggatc a a g a t c g t g t c g c c g g t c a a cgggacggcg c a g a a c c t g c 6661 g c t t c t t c a c cgagggcgct g t c g t t c g c g ccgccgagcc g c t g g t c g a c a t c g c g c c c g 6721 aggacgaggc c t t c g t g a t c c a g g c g c a c t t c c a g c c g a c c g a t g t g g a c a a t g t c c a t a 6781 t g g g c a t g g t c a c c g a a g t t c g g c t g c c g g c c t t c c a c t c gcgggaaatc c c g a t c c t g a 6841 acggcacgat c c a g t c t c t g t c g c a g g a c c g c a t t t c c g a t c c g c a g a a c a a g c t c g a c t 6901 a c t t c c t c g g g a t c g t g c g c g t g g a c g t c a a g c a g c t g c c g c c g c a t c t g cgcggcaggg 6961 t c a c c g c c g g c a t g c c g g c c c a g g t g a t c g t g c c g a c c g g cgagcgcacc g t g c t g c a g t 7021 a c c t g t t c t c g c c g c t g c g a g a c a c c c t g c g c a c c a c g a t gcgcgaggag t a g g t c a a a g 7081 g t t t c a a g g g c c t g a t t t c c a a a g c t t t t c ggatgggcgg cgcgggcgag c t 
a a a t t c g c 7141 c g g c g c t g c t t t c c a t t c g c g g gcaatagt g t a g t c a g g a c c c t t c g t t g t t a c t g g a g t 7201 cagcggatac gcatggcgaa a a c g g c t t t g a t c a c c g g t g t g a c c g g t c a ggacggggcg 7261 t a c c t c g c c a a g c t g c t g c t ggagaagggt t a c a c c g t c c a c g g c a t g c t g c g t c g c t c g 7321 g c c t c g g c c g a t g t g a t c g g c g a c c g c c t g c g c t g g a t c g g c g t c t a t g a c g a c a t c c a g 7381 t t c g a g c t g g g c g a c c t c t t ggacgagggc g g t c t g g c g c g c c t g a t g c g g c g c c t g c a g 7441 ccggatgagg t c t a c a a c c t ggcggcccag a g c t t c g t c g g c g c c t c g t g gga Appendix 3 Sequences of lpsGHIJK, orf1 and orf2 LOCUS gcc227 4883 bp mRNA BCT 15-OCT-1999 DEFINITION gcc227. ACCESSION gcc227 VERSION KEYWORDS SOURCE Caulobacter crescentus. ORGANISM Caulobacter crescentus Bacteria; Proteobacteria; alpha subdivision; Caulobacter group; Caulobacter. REFERENCE 1 (bases 1 to 4883) AUTHORS Awram,P.A. TITLE Analysis of the S-layer Transporter Mechanism and Smooth Lipopolysaccharide Synthesis in Caulobacter crescentus JOURNAL Unpublished REFERENCE 2 (bases 1 to 4883) AUTHORS Awram,P.A. TITLE Direct Submission JOURNAL Submitted (15-OCT-1999) UBC FEATURES Location\/Qualifiers source 1.
.4883 \/organism=\"Caulobacter crescentus\" \/strain=\"NA1000\" gene complement(1..1242) \/gene=\"orf3\" CDS complement(1..1242) \/gene=\"orf3\" \/codon_start=1 \/product=\"putative glycolipid transporter\" \/translation=\"MSAAASTPQEYKRLTQYEVDVICAKHDRLWSARMGGARAVFAFC DLSGLSVPGRNLCDADFTGAILVGCDLRKAKLDNANFYGADLQGADLTDASLRRADLR GSSLRGANLTGADMFEADLREGTIAAADRKEGYRVIEPTQREAFAAGANLSGANLERS RLSGIVATKADFSDAILKDAKLVRANLKQANFNGANLAGADLSGANLAGADLRNAVLV GAKTLSWNVNDTNMDGALTDKPSGTSVSDLPYEQMIADHARWIETGGGEGKPSVFDKA DLRNLRSVRGFNLTALSAKGSVFYGLDMEGVQMQGAQLDGADLRACNLRRADLRGARL KGAKLTGADLRDAQLGPLLIAADRLLPVDLTGAILTNADLARADLRQARMAGADVSRA NFTGAQLRDLDLTGAIRLAARG\" gene complement(1335..2048) \/gene=\"orf4\" CDS complement(1335..2048) \/gene=\"orf4\" \/codon_start=1 \/product=\"putative phosphoglycerate mutase\" \/translation=\"MPTLVLLRHGQSQWNLENRFTGWVDVDLTAEGEAQARKGGELIA AAGIEIDRLFTSVQTRAIRTGNLALDAAKQSFVPVTKDWRLNERHYGGLTGLNKAETA EKHGVEQVTIWRRSYDIPPPELAPGGEYDFSKDRRYKGASLPSTESLATTLVRVLPYW ESDIAPHLKAGETVLIAAHGNSLRAIVKHLFNVPDDQIVGVEIPTGNPLVIDLDAALK PTGARYLDDSRAEALPKVG\" gene 2421..3377 \/gene=\"orf5\" CDS 2421..3377 \/gene=\"orf5\" \/codon_start=1 \/product=\"putative sugar phosphate isomerase KpsF-like\" \/translation=\"MSAFNAVQVGRRVLAVEADALRVLADSLGEAFANAVETIFNAKG RVVCTGMGKSGHVARKIAATLASTGTQAMFVHPAEASHGDLGMIGPDDVVLALSKSGA GRELADTLAYAKRFSIPLIAMTAVADSPLGQAGDILLLLPDAPEGTAEVNAPTTSTTL QIALGDAIAVALLERRGFTASDFRVFHPGGKLGAMLRTVGDLMHGADELPLVAADAAM PDALLVMSEKRFGAVGVVDNAGHLAGLITXGDLRRHMDGLLTHTAGEVMTHAPLTIGP GALAAEALKVMNERRITVLFVVERERPVGILHVHDLLRAGVI\" gene 3477..4883 \/gene=\"lpsg\" CDS 3477..4883 \/gene=\"lpsg\" \/codon_start=1 \/product=\"phosphomannomutase\" \/translation=\"MFSSPRADLVPNTAAYENEALVKATGFREYDARWLFGPEINLLG VQALGLGLGTYIHELGQSKIVVGHDFRSYSTSIKNALILGLISAGCEVHDIGLALSPT AYFAQFDLDIPCVAMVTASHNENGWTGVKMGAQKPLTFGPDEMSRLKAIVLNAEFVER
DGGKLIRVQGEAQRYIDDVAKRASVTRPLKVIAACGNGTAGAFVVEALQKMGVAEWP MDTDLDFTFPKYNPNPEDAEMLHAMADAVRETGADLAFGFDGDGDRCGVVDDEGEEIF ADKIGLMLARDLAPLHPGAXFVVXVKSTGLYATDPILAQHGCKVIYWKTGHSYIKRKS AELGALAGFEKSGHFFMNGELGYGYDCGLTAAAAILAMLDRNPGVKLSDMRKALPVAF TSLTMSPHCGDEVKYGVVADVVKEYEDLFAAGGSILGRKITEVITVNGVRVHLEDGSW VLVRAS SNKPEVVVVVE S S\" BASE COUNT 807 a 1647 c 1587 g 837 t 5 o t h e r s ORIGIN 1 gccgcgcgcc g c c a g c c t g a t g g c g c c g g t c a g a t c c a g g t c g c g t a g c t g c g c g c c g g t 61 a a a g t t g g c g cgcgagacat ccgcgccggc c a t t c g g g c c t g c c g c a g g t cggcgcgcgc 121 gagatcggcg t t g g t a a g a a t c g c t c c c g t c a g a t c g a c c ggcagcaagc g g t c g g c t g c 181 gatgagcagc g g c c c c a g c t g t g c g t c a c g c a g g t c c g c t c c c g t c a g c t t g g c g c c c t t 241 c a a c c g c g c g c c t c g c a g g t cggcgcggcg c a g g t t g c a g g c t c g c a g a t c g g c g c c a t c 301 c a a c t g c g c g c c c t g c a t c t g c a c g c c t t c c a t g t c g a g g c c g t a g a a c a c c g a c c c c t t 361 cgccgacagg g c c g t g a g a t t g a a g c c t c g gacagatcgc a g a t t c c g c a g a t c g g c c t t 421 g t c g a a c a c c g a g g g c t t g c c c t c g c c g c c g c c g g t c t c g a t c c a t c g t g c g t g g t c g g c 481 g a t c a t c t g c t c a t a c g g c a ggtcagagac g c t t g t g c c c g a c g g c t t g t c g g t c a a c g c 541 g c c g t c c a t a t t g g t g t c g t t g a c g t t c c a ggacagggtc t t g g c c c c a a ccagcacggc 601 g t t g c g c a g a t c a g c g c c g g c g a g g t t a g c gcccgacaga t c g g c c c c c g c c a g g t t g g c 661 g c c g t t g a a a t t g g c c t g t t t g a g g t t g g c ccgaaccagc t t g g c g t c c t t c a g g a t a g c 721 g t c a c t g a a g t c c g c c t t c g t c g c g a c g a t gcccgacagg c g c g a a c g c t c g a g g t t g g c 781 gcccgacagg t t c g c g c c g g ccgcgaaggc t t c g c g t t g g g t c g g t t c g a t g a c g c g a t a 841 a c c t t c c t t g cggtcggcgg c g g c g a t c 
g t g c c c t c c c g c a g a t c g g c c t c g a a c a t g t c 901 g g c g c c g g t c a g g t t g g c g c c g c g c a g a c t g g a t c c g c g c a g g t c a g c c c gccgcaacga 961 g g c g t c g g t c aagtcggcgc c t t g c a g a t c c g c g c c a t a g a a g t t g g c g t t g t c c a g c t t 1021 g g c c t t t c g c aggtcacagc cgacgagaat ggcgccggtg a a a t c g g c g t c g c a g a g a t t 1081 gcgccccggc acggagaggc cagacaggtc gcagaacgcg aaaaccgcgc g c g c g c c c c c 1141 c a t c c g c g c c gaccacagac g g t c a t g c t t ggcgcagata a c g t c c a c t t c g t a c t g c g t 1201 c a g g c g c t t g t a t t c t t g t g g c g t g c t g g c g g c g g c g c t c a t g g c t c g c t g g g t a t c c c g 1261 a g t t a a c g g g a c a c c t t g c g cagagcaaaa t t a a c g c c g t g t t t c c a a c a c c t t c c g c t t 1321 a a g c t t t c g g c g t t t t a a c c a a c t t t g g g a a g c g c c t c a g c c c g a c t g t c g t c c a g g t a g 1381 cgcgcgccgg t c g g c t t c a a c g c a g c a t c c a g g t c g a t c a c c a g c g g a t t g c c g g t c g g a 1441 a t c t c c a c a c c g a c g a t c t g g t c g t c c g g c a c g t t g a a c a g a t g c t t g a c gatggcgcgc 1501 a g c g a g t t g c cgtgggcggc g a t c a g c a c a g t c t c a c c c g c c t t g a g a t g cggagcgatg 1561 t c g c t t t c c c agtagggcag aacgcgaacc a g c g t c g t c g c c a g g c t t t c c g t g c t g g g c 1621 a g g c t t g c g c c c t t g t a g c g g c g a t c c t t g c t g a a g t c g t a c t c g c c g c c cggcgccagc 1681 t c c g g c g g c g g g a t a t c g t a cgaacggcgc c a g a t g g t c a c c t g c t c g a c g c c g t g c t t t 1741 t c g g c g g t c t c a g c c t t g t t c a g g c c g g t c a g c c c g c c a t a g t g g c g c t c g t t c a g g c g c 1801 c a g t c c t t g g tcacggggac g a a g c t c t g c t t g g c g g c g t c c a g c g c c a g a t t g c c t g t g 130 1861 cggatggcgc g g g t c t g a a c cgaggtgaac 1921 a t c a g c t c g c c g c c c t t c c g g g c c t g a g c t 1981 c a a c c g g 
t g a a g c g g t t t t c c a g g t t c c a c 2041 g t c g g c a t c g g g c t t c c t t c ggagatcagg 2101 c t t c a g c g t c aagcgccgaa gcgaccaagg 2161 ccaggccgcc g c g c g t g c t a taggcgagcc 2221 g g a c t c a t c g c c g c c g c g c t g a t c g c c c t c 2281 c g c t c c t g g g g c c c c t t c g g c c a c a c g c c c 2341 a t t c a g c g g g aaaaggactc ggcgcgccgt 2401 g c c a t g g a t c a g c t t t c g c a a t g a g c g c c t 2461 t c g c c g t a g a a g c c g a t g c g c t g c g c g t c c 2521 a t g c g g t c g a g a c g a t c t t c aacgccaagg 2581 c g g g g c a t g t ggcgcggaag a t c g c c g c c a 2 641 t c g t c c a c c c cgccgaagcc t c g c a c g g c g 2701 t t c t g g c g c t g t c c a a g t c g ggcgccggcc 2761 a g c g c t t c t c g a t c c c g c t g a t c g c c a t g a 2821 cgggcgacat c c t g c t g c t g c t c c c c g a c g 2881 c g a c c a c g t c g a c c a c c c t g c a g a t c g c g c 2941 agcggcgcgg c t t c a c c g c c a g c g a c t t c c 3001 c t a t g c t g c g c a c g g t c g g c g a c c t g a t g c 3061 ccgacgccgc c a t g c c c g a c g c t t t a c t g g 3121 g c g t c g t t g a t a a c g c g g g t c a c c t g g c c g 3181 a c a t g g a t g g g c t g c t g a c c c a c a c c g c c g 3241 t c g g c c c c g g c g c c c t g g c g gctgaagcgc 3301 t g c t t t t c g t c g t c g a g c g c gaacgccccg 3361 g c g c g g g t g t g a t c t a g g t c a c a t c g a a a c 3421 g t c g t g c c g c g t c c g c a c g g c t a a a g c c a t 3481 t c t c c t c g c c c c g c g c c g a t c t g g t t c c g a 3541 tcaaggcgac g g g c t t t c g c gagtacgacg 3601 t c c t g g g c g t g c a g g c c c t g g g c c t g g g t c 3661 c g a a g a t c g t g g t c g g c c a t g a c t t c c g c t 3721 t c c t g g g g c t g a t c a g c g c c ggctgcgagg 3781 c c g c c t a t t t c g c c c a g t t c g a c c t c g a c a 3841 acaacgaaaa c g g c t g g a c c ggcgtgaaga 3901 ccgacgagat g a g c c g c c t c a a g g c c a t c g 3961 gcggcaagct g a t c c g c g t g cagggcgagg 4021 g c g c c a g c g 
t c a c c c g t c c c c t g a a g g t g a 4081 c c t t c g t g g t c g a g g c c c t g cagaagatgg 4141 a c c t c g a c t t c a c c t t c c c c a a g t a c a a t c 4201 c g a t g g c t g a c g c t g t c c g t gagacgggcg 42 61 g c g a c c g c t g c g g t g t g g t c gatgacgagg 4321 t g a t g c t g g c g c g c g a c c t g g c c c c g c t a c 4381 agtcgacggg c c t m t a c g c c a c c g a t c c g a 4441 actggaagac cggccacagc t a c a t c a a g c 4501 g c t t c g a a a a gagcggccac t t c t t c a t g a 4561 g c c t g a c c g c cgccgcggcc a t c c t g g c c a 4621 c g g a c a t g c g c a a g g c c c t g c c g g t g g c c t 4 681 gtg a c g a g g t gaagtacggc g t c g t c g c c g 4741 ccgccggcgg t t c g a t c c t g ggccgcaaga 4801 g g g t g c a c c t ggaggacggc t c c t g g g t c c 4861 t c g t c g t c g t ggtcgaaagc age aageggtega t c t c g a t g c c ggccgcagcg t c g c c c t c a g c g g t g a g g t c c a c a t c a a c c t g g c t t t g g c c a t g g c g c a g caggacgagc g a a a t g t c a g ggcgggctaa g g c c a g c c c g a c c c g a t c g g c c c g g t g c g c g c c c c c t c c c a t g e c t g a t e g c a t c t t c a t g e c t c t g a t g t c t c t c g t c t ggccccaggg ccagggcgac g t t c a g c a g a c c c c g g a g a t gaaggccaag egggacgagg eggegaaaaa g g c t g t c g a g t e a a e g c t g t t c a g g t g g g c c g c c g c g t c c t c g c a g a c t c getgggegag g c c t t c g c c a g t c g c g t c g t c t g c a c g g g c a t g g g c a a g t c c c t c g c t t c gaccggcacc c a g g e g a t g t a c c t g g g c a t gategggeca gacgacgtgg gcgaactggc c g a c a c c c t g g c c t a c g c t a c c g c c g t g g c c g a c a g c c c g c t e g g t c a g g cgcccgaggg gaeggeggaa g t g a a c g c c c tgggegaege g a t c g c t g t g g e t c t g e t g g g c g t c t t c c a c cccggcggc a a g c t c g g c g acggcgccga t g a g c t t c c c c t g g t c g c c g t e a t g a g e g a a a a g c g t t t c ggegeggtcg-g c t t g a t c a c g k a c g g t g a t c t g c g t c g a c g c g a g g t c a t g a c g c a c g c t 
c c c c t g a c c a t g a a g g t t a t gaacgagegg c g g a t c a c c g t e g g c a t t e t a c a t g t g c a c g a c c t g c t t c t t t g e a a a a c c t t g t c a t g c g a a c g e g c t a a c c a t c t c a a ctgaagegag c c t t c a a t g t a t a c g g c c g c c t a c g a a a a c g a a g c c c t g g c g c g c t g g c t g t t t g g g c c g g a g a t c a a t c t g g g a a c c t a t a t c c a c g a a c t g g g c c a a t e g t a t t c g a e c t c g a t c a a g a a c g c c c t g a t g c a c g a c a t t g g c c t g g c c c t g t c g c c c a t c c c g t g c g t c j g c c a t g g t c acggccagcc t g g g c g c c c a g a a g c c g c t g a c c t t c g g c c tgetgaaege c g a g t t c g t c gagegegatg c c c a g c g c t a t a t c g a c g a c g t g g c c a a g c t c g c c g c c t g cggcaacggc acggccggcg g t g t c g c t g a g g t c g t g c c g a t g g a c a c c g c c a a c c c c g a agacgecgag a t g c t g c a c g c g g a c c t g g c g t t e g g c t t e gaeggegacg gcgaggagat c t t c g c c g a c aagateggee a t c c g g g c g c g r e y t t e g t c g t g r a t g t g a t c c t g g c c c a g c a c g g c t g c a a g g t g a t c t geaagagege ggagctgggc g c c c t g g c c g aeggegaget g g g c t a t g g c t a c g a c t g e g t g c t g g a c c g c a a t c c c g g c g t g a a g c t g t t c a c c t c c c t g a c c a t g a g c c c g c a c t g c g a c g t g g t g a a ggaatacgag g a c c t g t t c g t c a c c g a g g t g a t c a e g g t c aacggcgtgc t g g t c c g c g c c t c g t c c a a c aageccgagg LOCUS gcc506 DEFINITION gcc506. 8012 bp mRNA BCT 15-OCT-1999 ACCESSION VERSION KEYWORDS SOURCE ORGANISM REFERENCE AUTHORS TITLE JOURNAL REFERENCE AUTHORS TITLE JOURNAL FEATURES source gene CDS gene CDS gene gcc506 C a u l o b a c t e r c r e s c e n t u s . C a u l o b a c t e r c r e s c e n t u s B a c t e r i a ; P r o t e o b a c t e r i a ; a l p h a s u b d i v i s i o n ; C a u l o b a c t e r group; C a u l o b a c t e r . 1 (bases 1 t o 8012) Awram,P.A. 
Analysis of the S-layer Transporter Mechanism and Smooth Lipopolysaccharide Synthesis in Caulobacter crescentus Unpublished 2 (bases 1 to 8012) Awram,P.A. Direct Submission Submitted (15-OCT-1999) UBC Location\/Qualifiers 1..8012 \/organism=\"Caulobacter crescentus\" \/strain=\"NA1000\" complement(1845..3860) \/gene=\"orf6\" complement(1845..3860) \/gene=\"orf6\" \/codon_start=1 \/product=\"putative transketolase\" \/translation=\"MRVRPSRSPAKHIKTEAPMPVSPIKMADAIRVLSMDAVHKAKSG HQGMPMGMADVATVLWGKFLKFDASKPDWADRDRFVLSAGHGSMLLYSLLHLTGFKAM TMKEIENFRQWGALTPGHPEVHHTPGVETTTGPLGQGLATAVGMAMAEAHLAARYGSD LVDHRTWVIAGDGCLMEGVSHEAISIAGRLKLSKLTVLFDDNNTTIDGVATIAETGDQ VARFKAAGWAVKVVDGHDHGKIAAALRWATKQDRPTMIACKTLISKGAGPKEGDPHSH GYTLFDNEIAASRVAMGWDAAPFTVPDDIAKAWKSVGRRGAKVRKAWEAKLAASPKGA DFTRAMKGELPANAFEALDAHIAKALETKPVNATRVHSGSALEHLIPAIPEMIGGSAD LTGSNNTLVKGMGAFDAPGYEGRYVHYGVREFGMAAAMNGMALHGGIIPYSGTFLAFA DYSRAAIRLGALMEARVVHVMTHDSIGLGEDGPTHQPVEHVASLRAIPNLLVFRPADA VEAAECWKAALQHQRTPSVMTLSRQKTPHVRTQGGDLSAKGAYELLAAEGGEAQVTIF ASGTEVGVAVAARDILQAKGKPTRVVSTPCWELFDQQPAAYQAAVIGKAPVRVAVEAG VKMGWERFIGENGKFIGMKGFGASAPFERLYKEFGITAEAVAEAALA\" 4281..6041 \/gene=\"orf7\" 4281.
.6041 \/gene=\"orf7\" \/codon_start=1 \/product=\"putative NH(3)-dependent NAD(+) synthetase\" \/translation=\"MIVVGGPLRDAGRLYNTAIVIQGGKVLGVVPKSFLPNYREFYER RWFTPGAGLTGKTLTLAGQTVPFGTDILFRGEGVAPFTVGVEICEDVWTPTPPSTAQA LAGAEILLNLSASNITIGKSETRRLLCASQSSRMIAAYVYSAAGAGESSTDLAWDGHV DIHEMGALLAETPRFSTGPAWTFADVDVQRLRQERMRVGSFGDAMALSPASTPFRIVP FAFDAPEGDLALARPIERFPFTPSDPARLRENCYEAYNIQVQGLARRLEASGLKKLVI GISGGLDSTQALLVAAKAMDQLGLPRSNILAYTLPGFATSDRTKSNAWALMKAMAVTA AELDIRPAATQMLKDLDHPFGRGEAVYDVTFENVQAGLRTDYLFRLANHNAALVVGTG DLSELALGWCTYGVGDHMSHYNPNCGAPKTLIQHLIRFVAHSGDVGAETTALLDDILA TEISPELVPGEAVQATESFVGPYALQDFNLYYMTRYGMAPSKIAFLAWSAWHDADQGG WPVGLPDNARRAYDLPEIKRWLELFLKRFFANQFKRSAVPNGPKISSGGALSPRGDWR MPSDATADAWLAELRTNAPI\" 6121..7446 \/gene=\"lpsH\" CDS 6121..7446 \/gene=\"lpsH\" \/codon_start=1 \/product=\"putative mannose-6-phosphate isomerase\" \/translation=\"VWGQDLAAIYPVILCGGSGTRLWPASRSDHPKQFLKLVSDRSSF QETVLRVKDIPGVAEVVVVTGEAMVGFVSEQTAEIGAWATILVEPEARDSAPAVAAAA AYVEAQDPAGVVLMLAADHHIAQPEIFQQAALTATKAAEQGYIVTFGVQPTVPATGFG YIRPGAPLLDGSVREVAAFVEKPDQATAERYLLEGYLWNSGNFAFQAATLLGEFETFE PSVAAAAKACVAGLQLEAGIGRLDREAFAQAKKISLDYAIMERTQKAAVAPAAFAWSD LGAWDAIWEASTRDGDGNAQTGDVDLHGSSNVLVRSTGPYVGVIGVNDIVVVAEPDAV LVCHRKDSQAVKTLVDGLKAKGRSIASRKSASPNGTETLVSTDGFDVELRRVPAGETL MLPVSTLQVLEGVIEMDGDVYAAGAIIALDDSVQARAIGAATLLVTKPR\" BASE COUNT 1292 a 2635 c 2741 g 1342 t 2 others ORIGIN 1 g c t t g a a g c t tgcgcgaggc g a c c g c c a c g g c c g c g a t g a gcaggagcgc g c c a a a c c a c 61 a g g g c g t t g g ccaggggtgc a t c g g g a t t g g c c t g c c g g g c c a c g a t g g g gatgaccagg 121 agatagccga c g t a c g g c g t c g c t g c g g c g a t g g t c g c t g agcgaaggcc g c c g t g g a c g 181 a t g a c g a c g t t c a t c a g c g c cccggccagc a g c a t g c c g c c c a a g g t c g g cgcgaaccgg 241 g c g t c g g a c g cgaacagggg a a t c g c c a g g a a t c c g a a g a cggcggaggt tagaaagggc 301 agggcgatga cggagaagcg
gcgccagcga t c a g g g c t g c t c t g c g g t c g c t t g a t g a g c 361 t g a a g c g c c a g c c a g g t c t c caggaactgg a t c g c g g t g t a g a g c g c t c c c c a g a t c c a a 421 a c c c a g c g c a agccaacgac gaggtgcagg g t c g c g c a g a t c g c a a a g g t cgagctgagg 481 cgcagatagg t a t t g a c g c g tcgcgaggcg a c a g c g c g t t c c a g g t c a a t a t c c t t c g t c 541 t c a g c g c t c a t g g g c g c t c c g c c c c c c c c c aaggagatcg c a g a g c a a t c c c t t a c c c a a 601 t a a c c c t a g t cacgaggggc taagaggaaa tgaaagacca g c g t t t c g t c a a c g g c t t g g 661 g t g t c c g g c g cgcaccggaa g a t t t t t g c g c g t t a c t g c c a t t g g g g c a g g t t t t g g g g c 721 a t a a a a t c c c a a t a a t g c c g a c c c g c g c t a g t t c g g c g g c ccgaacgcgc g t g c g a c a t c 781 g g c t g g g t c c a c g c c c t c c a g t t c g a g a c g c t t t a g c c g c g c g g t t t c g c c a g c c g c c a g 841 t c g t a c g g c c gagcraggaa t c t t c a g g g t c t t t g c c a g a aaggcgatga g g g c c g c a t t 901 c g c c g c g c c c tcgacgggcg gactggcgac c c t c a c c t t c agatagaggc g g c c g t c g g c 961 g t c a a g c g c c c a a c c c t c g g c g g c g t c t c g c c c g c c c c t c ggggtcaggc gcaccacgag 1021 c g t c a c c g c c a c g c c g t c a g ccaagaagag c g a t c a g c g t g c c t t g a a g c gcaggcagca 1081 g a t a g t t c t g c acgcccgaa a t g a t c a g c a gaacaacgat c g g g c t g a t g t c g a c g c c g c 1141 cgagcgacgg g a t c a g g c g t tggaacgggc gaagcaccgg t c c g g t c a c g c g a t c c a g g a 1201 a a t c c a g c a c c t g g t a c a c a g c g g t g t t g c g g c g g t t g a t c a c g t c g a a c gcgaccagcc 12 61 ag c t g a g g a t c g c c g a g a t g a c g a t g g c c c accaaaggag gctgagcagg ccgccgagga 1321 tgaagaaaac g a a t t g a a t g a t g g c g g t c a t g g g t t t c c c g t c c g g a c t t t c c g c t t c t t 1381 g c c g c g c c a c g c g c g c g c t g gcaagcaagg a g c c g c t t g a c g a g c g c g 
c t t t g g g g c c g t 1441 a t c t c c g c c g c t c c t t g g g g c c g t a g c t c a g a t g g t a g a g c g c c t c g t t c gcaatgagga 1501 ggtcaggggt t c g a t t c c c c t c g g c t c c a c c a a g g a c c t c c c t t a g c c t c g a t c a c g a c c 1561 c g a t c g g a t c g c c c a t c g g a c t c c g t c c g c a t t t t c t t g c c t a a c g t c g c g t g c g c c c c t 1621 g c g a c g a c t c g c c a c a g c c t g g g t t c g a g g aacagcggcg c c t c g c c g g c a a g a t c g g c c 1681 a g g g t c c c c a t c g c t g t g g g gcaggcatgg ggccagggtc aggcgcgagc c t g c t c t g t a 1741 g g c g a c c a t c a g g t g t t t c t g t t t c g g a c c g c g c t a t c g a t c c g c c c t g g c c c t t a t t c t 1801 gcggtcgggg g c g g a g c t c c g g a c t c c a c c cccgcgcagc g a t t t c a g g c c a g a g c c g c t 1861 t c g g c c a c g g c t t c a g c g g t gatgccgaac t c t t t a t a c a g g c g c t c g a a cggagccgag 1921 gcgccgaagc c c t t c a t g c c g a t g a a c t t g c c g t t c t c g c c a a t g a a g c g c t c c c a g c c c 1981 a t c t t g a c g c c c g c t t c g a c ggcgacgcgc a c c g g g g c c t t g c c g a t g a c g g c g g c c t g g 2041 taggcggcgg g c t g c t g a t c g a a c a g c t c c cagcagggcg t g g a g a c c a c gcgggtcggc 2101 t t g c c c t t g g c c t g c a g g a t g t c g c g c g c g gcgacggcga c g c c g a c c t c g g t g c c c g a g 2161 gcgaagatcg t c a c c t g c g c c t c g c c g c c c t c g g c c g c c a g c a g c t c g t a g g c g c c c t t g 2221 gccgacaagt c g c c g c c c t g ggtgcggacg t g c g g g g t c t t c t g g c g c g a c a g g g t c a t c 2281 accgacggcg t g c g t t g a t g t t g c a g g g c c g c c t t c c a g c a c t c g g c g g c c t c g a c g g c a 2341 t c g g c c g g a c ggaagaccag c a g g t t c g g a a t g g c g c g c a a g c t g g c a a c g t g c t c g a c c 2401 g g c t g g t g g g t g g g a c c g t c t t c g c c g a g a c c g a t g g a g t c g t g g g t c a t c a c g t g g a c g 2461 a c g c g g g c c t c c a t c a g 
g g c gcccaggcgg a t g g c c g c g c g g c t g t a g t c ggcgaaggcc 2521 aggaaggtgc ccgaataggg g a t g a t c c c g ccgtgcaggg c c a t g c c g t t c a t g g c c g c g 133 2581 g c c a t g c c g a a c t c a c g c a c g c c a t a g t g g acatagcggc c t t c g t a g c c g g g c g c g t c g 2641 a a c g c g c c c a t g c c c t t g a c c a g g g t g t t g t t c g a g c c g g t c a g g t c g g c c g a g c c g c c g 2701 a t c a t c t c g g g g a t c g c c g g g a t c a g g t g c t c c a g g g c c g agccggagtg gacgcgggtg 2761 g c g t t g a c c g g c t t g g t c t c c a g g g c c t t g g c g a t g t g g g c g t c c a g c g c c t c g a a g g c g 2821 t t c g c c g g c a g c t c g c c c t t c a tggcgcgg g t g a a g t c g g c c c c c t t g g g cgaggcggcc 2881 a g c t t g g c c t c c c a g g c c t t g c g g a c c t t g gcgccgcgac g g c c g a c g c t c t t c c a g g c c 2941 t t g g c g a t g t c g t c g g g c a c ggtgaagggc g c a g c g t c c c a g c c c a t g g c cacgcgcgag 3001 g c g g c g a t c t c g t t g t c g a a c a g g g t g t a g c c g t g g c t g t gggggtcgcc t t c c t t g g g g 3061 c c c g c g c c c t t c g a g a t c a g c g t c t t g c a c g c g a t c a t g g t c g g g c g g t c c t g c t t g g t g 3121 g c c c a g c g c a gggccgcagc g a t c t t g c c g t g g t c g t g g c c g t c g a c g a c c t t g a c c g c c 3181 cagccggcgg c c t t g a a g c g c g c g a c c t g g t c g c c g g t c t c g g c g a t g g t g g c c a c c c c g 3241 t c g a t g g t g g t g t t g t t g t c gtcgaagagg a c c g t c a g c t t c g a g a g c t t caggcggccg 3301 g c g a t g c t g a t c g c c t c a t g g c t g a c g c c c t c c a t c a g g c a t c c g t c g c c g g c g a t c a c c 3361 c a g g t g c g g t ggtcgacgag g t c a g a g c c g tagcgggcgg c c a g g t g c g c c t c g g c c a t g 3421 g c c a t g c c g a cggcggtggc c a g g c c c t g g cccagcggac c g g t c g t g g t c t c g a c g c c g 3481 g g c g t g t g a t g c a c t t c c g g gtggcccggg g t c a g c g c c c c c c a c t g a c g g a a g t 
t c t c g 3541 a t c t c c t t c a t c g t c a t g g c c t t g a a g c c g g t c a g a t g c a gcagggaata g a g c a g c a t c 3601 gagccgtgac cggccgacag cacgaagcgg t c g c g g t c g g c c c a g t c a g g c t t a g a c g c g 3661 t c g a a t t t c a g g a a c t t g c c c c a t a g g a c c g t c g c c a c g t c g g c c a t g c c c a t c g g c a t g 3721 c c c t g g t g g c c g g a c t t c g c c t t g t g c a c g g c g t c c a t g g agaggacgcg g a t c g c g t c g 3781 g c c a t c t t g a tgggcgaaac gggcatgggg g c t t c c g t c t t t a t a t g t t t t g c a g g g c t g 3841 cgcgaggggc g c a c t c g c a t gggaaccccc gggggtcaac ccgcgcaggg cggctaaggc 3901 cgtgcggaac g c c t g t c g t g g a c g c t a t g c a a c a t c g t c c g t c g g c g t t a taggtggagg 3961 c a g a c t t a t g c t c a g t t t c a g c a t a a c c c g gaaaggccgc t c c c t t g g g t a g t c c g t c g t 4021 t c t t c t c g c c c t a c c g t c a c g g t t t c g t c c gggtcgcgac c g c c g t t c c g aaggtcaagc 4081 t g g c g g a t c c c g c c g c c a a t g c t c a g a a c g t c g t g g c t c t ggcccgcgag g c c c a t g c g g 4141 agggcgtggc t g t g g t c g t g t t c c c g g a a c t g g g g c t g a c g g g c t a c a c g a t c g a c g a c c 4201 t t c t g c a g c a a g a g g c g t t g ctggacgcgg t t g a g g c c g c g a t c g c c a c c c t g a c c g a g g 4261 ccagcgcagg c c t g g c g c c g a t g a t c g t g g t c g g a g g t c c g c t g c g c g a c gcaggccgcc 4321 t c t a c a a t a c c g c g a t c g t c a t c c a g g g c g g c a a g g t g c t g g g c g t g g t c c c g a a a a g c t 4381 t c c t g c c c a a c t a t c g c g a g t t c t a c g a g c g t c g c t g g t t cacgccgggc g c c g g c t t g a 4441 caggcaagac c c t g a c c c t g gccggccaga c c g t t c c g t t cgggaccgac a t t c t g t t c c 4501 ggggcgaggg c g t c g c c c c g t t c a c g g t g g g c g t c g a g a t c t g c g a g g a t g t c t g g a c c c 4561 c g a c c c c g c c c a g c a c c g c c c a g g c c t t g g cgggggccga g a t c c t g c t g a a c c t g 
t c g g 4621 c c a g c a a c a t c a c c a t c g g c aagtccgaaa c g c g g c g t c t g c t c t g c g c c a g c c a g t c g t 4681 c g c g g a t g a t c g c g g c c t a t g t c t a t t c g g cggccggcgc gggcgagagc t c g a c c g a c c 4741 t g g c c t g g g a c g g c c a t g t c g a t a t t c a c g agatgggcgc g c t g c t c g c c gagaccccgc 4801 g g t t t t c g a c gggcccggcc t g g a c c t t c g c c g a t g t g g a c g t c c a g c g c c t t c g g c a g g 4861 agcggatgcg c g t c g g c a g c t t c g g c g a c g c c a t g g c g t t a t c g c c g g c c t c g a c c c c g t 4921 t c c g g a t c g t t c c g t t c g c c t t t g a c g c g c ccgagggcga c c t g g c g c t g gcccggccga 4981 t c g a a c g c t t t c c c t t c a c g c c g t c c g a c c c a g c c a g g c t gcgcgagaac t g c t a c g a g g 5041 c c t a c a a c a t c c a g g t c c a g ggcctggcgc g g c g c c t c g a g g c t t c g g g t c t c a a g a a g c 5101 t c g t c a t c g g t a t t t c c g g c g g g c t c g a c t c c a c c c a g g c t c t g c t g g t g gcggccaagg 5161 c c a t g g a c c a g c t g g g c c t g ccgcgcagca a c a t c c t g g c c t a c a c t c t g c c g g g c t t t g 5221 c g a c g t c c g a t c g c a c c a a g t c c a a c g c c t g g g c g c t g a t gaaggcgatg g c c g t c a c c g 5281 c c g c c g a g c t c g a t a t c c g g cccgcagcga c c c a g a t g c t c a a g g a c c t c g a c c a c c c g t 5341 tcgggcgcgg cgaggcggtc t a t g a c g t c a c c t t c g a g a a t g t g c a g g c c g g c c t g c g a a 5401 c c g a c t a t c t g t t c c g t c t g g c c a a c c a c a a c g c c g c c c t g g t c g t c g g c acgggggacc 54 61 t g t c g g a g c t ggcgctgggc t g g t g c a c c t a c g g c g t c g g c g a c c a c a t g a g c c a c t a c a 5521 a c c c c a a c t g c g g t g c g c c c aagacgctga t c c a g c a c c t g a t c c g c t t c g t g g c c c a t t 5581 c g g g t g a c g t cggcgccgag a c c a c g g c t c t g c t g g a c g a c a t c c t c g c g a c c g a g a t c t 5641 c g c c g g a g c t g g t g c c c g g c g a g g c 
g g t t c aggcgaccga g a g c t t c g t c g g c c c c t a c g 5701 c c c t g c a g g a c t t c a a t c t c t a c t a c a t g a c c c g c t a c g g c a t g g c g c c g t c c a a g a t c g 57 61 c c t t c c t g g c c t g g a g c g c c t g g c a t g a c g ccgaccaggg c g g c t g g c c c g t c g g c c t g c 5821 ccgacaacgc t c g c c g c g c c t a c g a c c t g c c t g a g a t c a a g c g c t g g c t g g a g c t g t t c c 5881 t g a a g c g g t t c t t c g c c a a c c a g t t c a a g c g c t c g g c t g t acccaacggg c c g a a a a t c t 5941 cgtcgggcgg c g c g t t g t c g ccgcgggggg a c t g g c g c a t g c c g t c g g a t gcgacagccg 6001 a t g c c t g g c t ggcggaactg c g c a c a a a t g c g c c g a t t t g a g g a a a a c t c t t c g t t a c a g 6061 c t c t g t t t g c c t t t a a g c a g t c g a c g c a a t a g c a c c c g a t aaggggcgaa g a c t a a g a c t 6121 gtgtggggac a a g a c t t g g c t g c g a t c t a t c c g g t a a t c c t g t g t g g c g g c t c g g g c a c c 6181 c g c c t c t g g c c c g c a t c g c g gagcgaccat c c c a a a c a g t t c c t t a a a c t c g t g a g c g a t 6241 c g g t c c t c c t tccaggagac t g t c c t g c g g g t g a a g g a t a t t c c g g g t g t ggccgaggtg 6301 g t c g t c g t g a ccggcgaggc g a t g g t c g g g t t t g t g t c c g agcagaccgc cgagatcggc 6361 g c c t g g g c c a c a a t c c t g g t cgaacccgag g c t c g c g a c a gcgcgccggc cgtggcggcg 6421 gcggcggcct a t g t c g a g g c c c a g g a t c c g g c c g g c g t c g t g t t g a t g c t ggccgccgac 6481 c a c c a c a t c g c c c a g c c c g a a a t c t t c c a g c a g gccgccc t c a c c g c c a c taaggcggcc 6541 gagcagggct a t a t c g t c a c g t t c g g g g t t cagccgacgg t c c c g g c g a c c g g c t t t g g t 6601 t a t a t c c g c c c t g g c g c g c c g c t t c t g g a t g g t t c g g t g c g t g a g g t c g c c g c c t t c g t c 6661 gagaagcccg accaggcgac cgccgagcgc t a t c t t c t g g a a g g c t a t c t ctggaacagc 6721 g g c a a t t t c g c g t t c c a g g c g g c g a c c t 
t g c t g g g c g a g t t c g a g a c c t t t g a a c c g t c g 6781 g t c g c c g c c g ccgccaaggc g t g c g t g g c c g g c c t g c a g c tggaggccgg c a t c g g c c g c 6841 c t g g a t c g c g a g g c c t t c g c ccaggccaag a a g a t c t c g c t c g a c t a c g c c a t c a t g g a g 6901 c g c a c c c a g a a g g c c g c t g t c g c c c c t g c g g c g t t c g c c t g g t c g g a c c t t g g g g c c t g g 6961 g a c g c g a t c t gggaggcctc c a c c c g c g a c ggcgacggta a c g c c c a g a c gggcgacgtc 7021 g a c t t g c a c g g c t c g t c c a a t g t t c t g g t g c g c t c g a c g g g t c c c t a t g t c g g c g t g a t c 7081 ggggtcaacg a c a t c g t c g t cgtggccgag cccgacgcgg t g c t g g t c t g c c a t c g c a a g 7141 gacagccagg cggtgaagac c c t g g t c g a t ggcctgaagg ccaagggccg c t c c a t c g c c 7201 t c g c g c a a g a g c g c c t c g c c gaacgggacc g a g a c c c t g g t c t c g a c c g a c g g c t t c g a t 7261 g t g g a g t t g c g t c g c g t a c c ggcgggagag a c c t t g a t g c t g c c g g t a t c g a c g c t t c a g 7321 g t g c t g g a a g g c g t g a t c g a gatggacggc g a c g t c t a t g c t g c g g g c g c g a t c a t c g c c 7381 t t g g a c g a c t c g g t t c a g g c t c g g g c g a t c ggggcggcga c c t t g c t g g t cacgaagccg 7441 c g t t g a t c a c c c g g t c c a t c t c c a g g a t c g ccaaggcgat a t g g t a g a a c g a g c t g g c c g 7501 ggacgggctc t t c g a t a a a g g t t t c g t c g g g c t g g t a c t t g t c g c g c c a g aggccgggga 7561 t c g c c g t g c g cagataggcc a t c a g g c c c t cagcggcggc g g c g g c c a t g t c c c a g t a g c 7621 g c g c t t g g c c g g t g a t c t c c gccgccagca c g g c c g c c t t g a t c c g c t c g g t t t g g g g c c 7681 acagtcgggc g c c g t c g t c g t g g g t c g a g a a a t c a t c g a g c a g g g c g t t g a t c g c c a c g c 7741 c t c g c g a t a g g t c a a c g c c g t g g g t c t c g g c g t c a t c g a t c a t g c g c a a g g c c g c c g c c g 7801 t a g c g t c g g c gcggccggca 
agctggcccc agcgcatcag cagccagccc cattcgaact 7861 gatgccccgg ctcgcagatc cgacccgcga cgcctggcgc ggggttccag tcgaggtcga 7921 agaactcacg gatctggccg ctgggcgcgt gaatgaacct ggagagcgcc agttcggcga 7981 tctcgtcggn cagggtgcgc cagatcgggt cc \/\/ LOCUS gcc433 9041 bp mRNA BCT 15-OCT-1999 DEFINITION gcc433. ACCESSION gcc433 VERSION KEYWORDS SOURCE Caulobacter crescentus. ORGANISM Caulobacter crescentus Bacteria; Proteobacteria; alpha subdivision; Caulobacter group; Caulobacter. REFERENCE 1 (bases 1 to 9041) AUTHORS Awram,P.A. TITLE Analysis of the S-layer Transporter Mechanism and Smooth Lipopolysaccharide Synthesis in Caulobacter crescentus JOURNAL Unpublished REFERENCE 2 (bases 1 to 9041) AUTHORS Awram,P.A.
TITLE Direct Submission JOURNAL Submitted (15-OCT-1999) UBC FEATURES Location\/Qualifiers source 1..9041 \/organism=\"Caulobacter crescentus\" \/strain=\"NA1000\" gene 913..2295 \/gene=\"orf8\" CDS 913..2295 \/gene=\"orf8\" \/codon_start=1 \/product=\"putative Glucose-6-Phosphate 1-Dehydrogenase\" \/translation=\"MLLPSLYFLELDRLLPHDLRIIGVARADHDAASYKALVREQLGK RATVEEAVWNRLAARLDYVPANITSEEDTKKLAERIGAHGTLVIFFSLSPSLYGPACQ ALQAAGLTGPNTRLILEKPLGRDLESSKATNAAVAAVVDESQVFRIDHYLGKETVQNL TALRFANVLFEPLWDRSTIDHVQITIAETEKVGDRWPYYDEYGALRDMVQNHMLQLLC LVAMEAPSGFDPDAVRDEKVKVLRSLRPFTKETVAHDTVRGQYVAGVVEGGARAGYVE EVGKPTKTETFVAMKVAIDNWRWDGVPFFLRTGKNLPDRRTQIVVQFKPLPHNIFGPA TDGELCANRLVIDLQPDEDISLTIMNKRPGLSDEGMRLQSLPLSLSFGQTGGRRRIAY EKLFVDAFRGDRTLFVRRDEVEQAWRFIDGVSAAWEEASIEPAHYAAGTWGPQSAQGL ISPGGRAWKA\" gene 2298..2996 \/gene=\"orf9\" CDS 2298..2996 \/gene=\"orf9\" \/codon_start=1 \/product=\"putative 6-phosphogluconolactonase\" \/translation=\"MPFTPIKLEAFGSREDLYDAAASVLVGALTTAVARHGRVGFAAT GGTTPAPVYDRMATMTAPWDKVTVTLTDERFVPATDASSNEGLVRRHLLVGEAAKASF APLFFDGVSHDESARKAEAGVNAATPFGVVLLGVGPDGHFASLFPGNPMLDQGLDLAT DRSVLAVPPSDPAPDLPRLSLTLAALTRTDLIVLLVTGAAKKALLDGDVDPALPVAAI LKQDRAKVRILWAE\" gene 2997..4811 \/gene=\"orf10\" CDS 2997..4811 \/gene=\"orf10\" \/codon_start=1 \/product=\"putative phosphogluconate dehydratase\" \/translation=\"IAMSLNPVIADVTARIVARSKDSRAAYLANMDRAIENQPGRAKL SCANWAHAFAASPGVDKLRALDPNAPNIGIVSAYNDMLSAHQPLEAYPALIKDAARDV GATAQFAGGVPAMCDGVTQGRPGMELSLFSRDVIAMATAVALTHDAFDSALYLGVCDK IVPGLVIGALTFSHLPALFVPAGPMTSGLPNSEKARIRALYAEGKVGREELLAAESAS YHGPGTCTFYGTANTNQMLMELMGFHLPGSAFVHPNTPLREALVKESARRVAAVTNKG NEFIPVGRMIDEKSFVNGVVGLMATGGSTNLALHIIAMAAAAGVQLTLEDLDDISKAT PLLARVYPNGSADVNHFQAAGGMAFVIRELLKAGLVHEDVQTIAGAGLSLYAKEPVLE DGMLTWRDGAHESLDPAIVRPVSDPFSKEGGLRLMAGNLGRGVMKISAVKPEHHVIEA PCAVFQEQEDFIAAFKRGELDRDVVVVVRFQGPSANGMPELHNLSPSISVLLDRGHKV 
ALVTDGRMSGASGKTPAAIHVTPEAAKGGPLAYVQDGDVIRVNAETGELKIMVDEATL LARTPANVPASKPGFGRELFGWMRSGVGAADAGASVFA\" gene 5856..6926 \/gene=\"lpsl\" CDS 5856..6926 \/gene=\"lpsl\" \/codon_start=1 \/product=\"putative repressor similar to LacI\" \/translation=\"MAKYSPKRANRTGEGRKLSAKVTIHDVARESGVSIKTVSRVLNR EPNVKADTRDRVQAAVAALHYRPNISARSLAGAKAYLIGVFFDNPSPGYVTDVQLGAI ARCRQEGFHLIVEPIDSTADVEDQVAPMLTTLRMDGVILTPPLSDHPVVLAALEREGV AYVRIAPGDDFDRAPWVSMDDRLAAYEMTKHLVDLGHKDIAFIVGHPDHGASHRRHQG FLDAMRDSGLRVRDDRVAQGWFSFRSGFEAAEKLLGGADRPTAIFASNDDMALGVMAV ANRLRLDVPTQLSVAGFDDTPGAKITWPQLTTVRQPIHAMAGAAADMLMQGVEREEGA PPPSRLLDFELVVRESTGPASH\" gene 7224..9041 \/gene=\"orf11\" CDS 7224..9041 \/gene=\"orf11\" \/codon_start=1 \/product=\"putative 1,4-B-D-glucan glucohydrolase\" \/translation=\"MLPRRFAFASALALTIACGSAGVVLAQTPPNATANPAVWPMSAS PAAITDAKTEAFIAQLMSRMTVEEKVAQTIQADGASITPEELKKYRLGSVLVGGNSAP DGNDRASPQRWIEWIRAFRAAALDKRGDRQEIPIIFGVDAVHGHNNVVGATIFPHNVG LGAAHEPDLIRRIGEVTAKEMAATGADWTFGPTVAVPRDSRWGRAYEGYGENPEIVKA YSGPMTLGLQGALEAGKPLAAGRVAGSAKHFLADGGTENGRDQGDAKISEADLVRLHN AGYPPAIEAGILSVMVSFSSWNGVKHTGNKSLLTDVLKERMGFEGFVVGDWNAHGQVE GCSNTSCAQAYNAGMDMMMAPDSWKGLYDNTLAQVKAGQIPMARIDDAVRRILRVKVK AGLFEDKRPLEGKLELLGAPEHRAVAREAVRKSLVLLKNEGVLPLKSSARVLVAGDGA DDIGKASGGWTLTWQGTGNKNSDFPHGQSIYAGVAEAVKAGGGSAELSVSGDFKQKPD VAIVVFGENPYAEFQGDITSIEYQAGDKRDLALLKKLKAAGIPVVSVFLSGRPLWTNP ELNASDAFVAAWLPGSEGGGVADVLVGDKAGKPRHDFQGK\" BASE COUNT 1492 a 3016 c 3052 g 1480 t 1 others ORIGIN 1 ggacgcgctg aacaccatca acgcgacccg cacgacgatc ctgccgcccg ccctgaacgc 61 cgagcgtcgt ctggctaccg cgcgcgccct gatgggtctg ggacgctatg acgcggcgct 121 ggatctggtc gaaaccgaca ccagccgtga cgggcaggag atccgcggcg agatcgcttg 181 gaagcaacgc gcctggccgg ccgccggcgc gctctatgag cggtccctgg gcgaccg
c t t 241 caggacaggc ggggcgctca gcgcggccga ggaggcgagg t t g c t g c g c g c c t c g g t g g c 301 c t a c a g t c t c g c c g a t g a c g acgcggcgct c g g t c g c c t g c g c g c c c g c t g g t c g g g c t t 361 c a t c g a g a c a gccagcaacc c t g a a g g t c t t c g c g t a g c g c t a c a g g g c a t g t c g a t g g g 421 c g c g g t g t c c g c g t c c g a c t t t g g a c g g g t t t c c g c c g a c a a c g a g g c c t t c a a t g g c t g 481 g a t c g g t c g c c t c a a g g a a c g t t t c c g g a c aggacaaccg g c c g g g t c a c c c g c c c g c g c 541 gggcggctag c g c t c g c t g c g g t g a t c g c a g g t t c g c g c g c g t t a g a a g g gcggcagggc 601 a a g g t t c g c c tgaaagcgcg ctcaagaggc t t t a t g g a a g g c a g g g c g t t t c c g a a t g g c 661 c c c g g g a a t c c c t c a a a c a g t t c c t c t g a a a c c g g a t t g g t c t t a t g g c g ggcgccgggc 721 g t t t g g c c t a g t t a g a g c c c cgccgggcgc c g a t c c g a g g gcggcgaagc g c g g t c g c t t 781 g c c t t t c c c g c g c a g g c a t g t c a t g a c a a c g t t g t c a t a g ggtggacgac a t t g g c t a a g 841 aacaacgacg tcggcgagaa cggccgcgaa g t c t t g g t g c t g c t g g g c g g agcgggcgat 901 t t g g c c c t c c g g a t g c t g c t g c c t t c t c t g t a t t t c c t g g a g c t c g a c c g a c t g c t g c c g 961 c a c g a t c t g c g g a t c a t t g g c g t c g c c c g g g c c g a c c a t g acgcggccag c t a c a a g g c g 1021 c t g g t c c g c g agcaactggg caagcgcgcg acagtggagg a g g c g g t t t g g a a t c g c c t c 1081 gccgcgcgcc t c g a c t a c g t g c c t g c g a a c a t c a c c a g t g aggaagacac c a a g a a g c t g 1141 gccgaacgga t c g g t g c g c a t g g c a c g c t g g t c a t c t t c t t c t c g c t g t c g c c c a g c c t c 1201 t a c g g c c c g g c t t g c c a g g c t t t g c a g g c c g c c g g c c t g a cggggcccaa c a c g c g c t t g 1261 a t c c t c g a a a a g c c g c t t g g c c g c g a t c t c g a a a g c t c c a aggccaccaa c g c c g c c g t c 1321 g c 
c g c t g t g g tcgacgagag c c a a g t g t t c c g c a t c g a c c a c t a t c t g g g caaggaaacc 1381 g t c c a g a a c c t g a c g g c c c t g c g c t t c g c c a a c g t g c t g t t c g a g c c c c t g t g g g a t c g c 1441 a g c a c g a t c g a c c a t g t g c a g a t c a c c a t c gccgagaccg aaaaggtcgg c g a c c g c t g g 1501 c c c t a c t a c g acgaatacgg c g c g c t g c g g g a c a t g g t g c a g a a c c a c a t g c t g c a a c t g 1561 c t g t g t c t g g t c g c c a t g g a a g c g c c c t c a g g c t t c g a t c c c g a t g c g g t gcgcgacgag 1621 aaggtcaagg t g c t g c g c t c c c t g c g g c c c t t c a c c a a g g a g a c c g t g g c c c a c g a c a c c 1681 g t g c g t g g c c a g t a c g t c g c c g g t g t g g t c gagggcggcg c g c g c g c t g g c t a t g t c g a g 1741 gaagtgggca a g c c c a c c a a gaccgagact t t c g t g g c c a t g a a g g t c g c g a t c g a c a a c 1801 t g g c g t t g g g a c g g c g t g c c g t t c t t c c t g c gcaccggca a g a a c c t g c c ggaccgccgc 1861 a c c c a g a t c g t c g t c c a g t t c a a g c c t t t g c c g c a c a a c a t c t t c g g t c c ggcgaccgat 1921 g g c g a g c t g t gcgccaaccg c c t a g t c a t c g a c c t g c a g c cggacgaaga c a t c t c g c t g 1981 a c g a t c a t g a a c a a g c g t c c g g g t c t c t c g gacgagggca t g c g a c t g c a g t c g c t g c c g 2041 c t g t c g c t g t c g t t t g g c c a gaccggcggg c g c c g t c g c a t c g c t t a c g a a a a g c t g t t c 2101 g t c g a c g c c t t c c g c g g c g a c c g t a c g c t g t t c g t g c g t c g c g a t g a g g t cgagcaggcc 2161 t g g c g c t t c a t c g a c g g c g t c t c g g c g g c c tgggaagagg c c a g t a t c g a accggcgcac 137 2221 t a t g c g g c g g g c a c c t g g g g a c c g c a g t c c gcccagggcc t g a t c t c g c c cggcggccga 2281 gcctggaagg c c t g a g c a t g c c c t t c a c g c c c a t c a a g c t c g a a g c a t t t g g g t c c c g c g 2341 a g g a c c t c t a t g a c g c g g c c g c c t c g g t t c t g g t c g g c g c 
t t t g a c g a c g g c g g t c g c t c 2401 g t c a c g g c a g g g t c g g c t t c g c c g c c a c c g gcggcacgac gccggcgccg g t c t a t g a c c 2461 gcatggcgac c a t g a c c g c c c c c t g g g a c a a g g t c a c g g t c a c g c t c a c c gacgagcgct 2521 t t g t t c c c g c c a c c g a c g c c agcagcaatg a g g g t c t g g t g c g t c g c c a c c t g c t c g t g g 2581 gcgaggcggc c a a g g c c t c g t t c g c g c c g c t g t t c t t c g a cggcgtgagc cacgacgaga 2641 gcgcgcgcaa ggccgaggcg g g c g t c a a t g c c g c c a c c c c g t t c g g c g t c g t t c t c c t g g 2701 gcgtggggcc ggatgggcat t t c g c t t c g c t g t t t c c g g g c a a t c c g a t g c t g g a t c a g g 2761 g t c t g g a c c t c g c c a c c g a c c g t t c g g t g c t g g c c g t g c c g c c c a g c g a t c c c g c g c c g g 2821 a c c t c c c a c g c c t g a g c c t g a c c c t g g c c g c c c t g a c c c g c a c c g a c c t g a t c g t g c t g c 2881 t g g t c a c c g g cgcggccaag a a a g c t t t g t tggacggcga c g t t g a t c c g g c c c t g c c g g 2941 t c g c c g c c a t t c t g a a a c a g gaccgcgcca a g g t c c g c a t c c t c t g g g c g g a g t a g a t c g 3001 c c a t g a g c c t g a a t c c c g t c a t c g c c g a c g t c a c c g c c c g g a t c g t g g c g cgcagcaagg 3061 acagccgcgc g g c c t a t c t c g c c a a c a t g g a t c g g g c g a t cgagaaccag ccggggcgcg 3121 c c a a g c t g t c c t g c g c c a a c t g g g c c c a c g c c t t c g c c g c c t c g c c g g g c g t c g a c a a g c 3181 t c c g t g c t c t g g a tccgaac gcgccgaaca t c g g c a t c g t c t c g g c c t a t a a t g a c a t g c 3241 t g t c a g c c c a c c a g c c g c t g g a a g c c t a t c c c g c g c t g a t caaggacgcc gcccgggacg 3301 tgggcgcgac c g c c c a g t t c gccggcgggg t g c c g g c c a t g t g c g a c g g t g t c a c c c a g g 3361 g c c g t c c c g g c a t g g a g c t g t c g c t g t t c t c g cgcgacgt g a t c g c c a t g g c g a c c g c c g 3421 t g g c c c t g a c c c a t g a c g c c t t c g a c t c g 
g c g c t g t a t c t g g g c g t c t g c g a c a a g a t c g 3481 t g c c g g g c c t g g t g a t c g g c g c a c t g a c c t t c a g c c a t c t g c c c g c c c t g t t c g t g c c c g 3541 c c g g c c c g a t g a c c t c g g g c c t g c c c a a c a gcgagaaggc c c g c a t c c g c g c g c t c t a c g 3601 ccgagggcaa g g t c g g t c g t gaggaactgc tggcggccga gagcgccagc t a t c a t g g c c 3661 c g g g c a c c t g c a c c t t c t a t ggcacggcca a c a c c a a c c a g a t g c t g a t g g a g c t g a t g g 3721 g c t t c c a t t t g c c t g g c t c g g c c t t c g t c c a t c c c a a c a c g c c g c t g c g t g a g g c c c t g g 3781 t c a a g g a a t c c g c c c g c c g c g t g g c t g c g g t g a c c a a c a a gggcaatgaa t t c a t c c c g g 3841 t c g g c c g g a t gatcgacgag a a g t c g t t c g t c a a c g g c g t g g t c g g g t t g a t g g c g a c c g 3901 g c g g c t c g a c c a a c c t g g c g c t g c a c a t c a t c g c c a t g g c c g c c g c t g c g g g c g t g c a a c 3961 t g a c c c t c g a agacctggac g a t a t c t c c a , a g g c c a c g c c g c t g c t g g c g c g c g t c t a t c 4021 c g a a c g g t t c g g c c g a c g t g a a c c a c t t c c aggccgccgg c g g c a t g g c t t t c g t g a t c c 4081 g t g a g c t g c t gaaggcgggt c t a g t g c a c g a a g a c g t c c a g a c g a t c g c g ggcgccggcc 4141 t g t c g c t g t a cgcgaaggaa c c g g t g c t c g aggacggcat g c t g a c c t g g c g t g a c g g c g 4201 c t c a c g a g a g t c t g g a t c c c g c c a t c g t g c g g c c g g t c t c c g a c c c g t t c agcaaggaag 42 61 g c g g c c t g c g c c t g a t g g c g g g c a a t c t g g gccgcggcgt g a t g a a g a t c t c g g c c g t g a 4321 agcccgagca c c a c g t g a t c gaggcgccgt g c g c c g t g t t ccaggaacag g a a g a c t t c a 4381 t c g c c g c t t t caagcgcggc g a g c t g g a t c g c g a c g t g g t c g t g g t g g t c c g c t t c c a g g 4441 g g c c g t c c g c c a a c g g c a t g c c t g a a c t g c a t a a c c t g t c g c c g t c g a t c t c g g t 
g t t g c 4501 t g g a t c g c g g t c a c a a g g t g g c c c t g g t c a ccgacggccg c a t g t c c g g c g c c t c t g g c a 4561 agacgcccgc c g c c a t c c a c g t g a c g c c g g aagcggccaa gggcgggccg c t g g c c t a t g 4 621 t c c a g g a c g g c g a t g t g a t c c g c g t c a a t g ccgagaccgg ggaactgaag a t c a t g g t g g 4 681 acgaggcgac c c t g c t c g c c c g g a c c c c c g c g a a c g t c c c g g c g t c c a a g c c g g g c t t t g 4741 gccgggaact g t t t g g a t g g a t g c g g t c g g gggtcggcgc ggccgacgcc g g c g c c t c c g 4801 t c t t t g c t t g aggaagcgct a g g t c a t g g a c g g c a a t c a c agcggcgggc t c g g c c t c g t 4861 c g g c g a c a t c ggcggtacga a c g c c c g c t t c g c c c t g g t c g a g t t c g a c g g t c a g g a c c c 4921 g c g c c t g a t c gagccgacgg c c t a t a g g g g cgaggactac ggcacggccg aggacgccat 4 981 c g a g g a g t a t c t c c g c a a g g t c g g t g t c a a g c a t c c t g a c caggcggtgg t c g c t g t g g c 5041 c g g g c c g a t c g a c c a c g g t c a g g t c c a c a t g a c c a a t c t g gactggcgga t c t c c g a g g a 5101 c g g c c t g c g c cgcgcaggcg g t t t t c g g a a c g c c a a g c t g a t c a a c g a c t t c a c c g c c c a 5161 g g c g c t g g c c gcgccgcgcg t t g g c c c t a a g g a c c t g c g c c a g a t c g g c g a a t t g c c g a c 5221 gtcgggggag g g c g a t c t g g c g a t c c t g g g t c c a g g c a c c g g c t t c g g c g t c g c a g g c c t 5281 t g t c c g t c g c c a t g g c c a g g a g a t c c c g c t ggccaccgag g g t g g t c a c g t c g c c t t c g c 5341 g c c g g t c g a t gacgtcgaga t c g a g g t g c t c c g c g c c c t g a c c c g g c g c c tggacggcgg 5401 t c g g g t g t c g gtcgagcgga t c c t g t c g g g t c c c g g c a t g g a g g a c c t c c a t g t g g a t c t 54 61 ggcggccgct gaagggcgcg g t g t c g a g g c g c t g a c c g c c a a g c a g a t c a ccgagcgggc 5521 cgtagagggc t g c g c c g a c a g c c t g g c g a c g g t g a a c c g t t t c t g c g c c a t c c t g g g c t c 5581 
aacggcgggc g a c a t c g c t c t g a c c t t g g g cgcacgcggc g g t g t t t t c a t c g c c g g c g g 5641 c a t c g c a c c a c g c a t c a t c g a c a t t c t g g a 5701 caaggggcgt c t g t c c g g c t t c a c c c g t t c 57 61 c a c c g c c c t g a t c g g t g c g g c g g t g g c g c t 5821 g t c g t t t t c t t g g g g c t c g c c a t g a c a g c c 5881 cgaatcggac cggcgaggga c g g a a a t t g a 5941 gcgagagcgg a g t c t c g a t c a a g a c c g t c t 6001 a g g c t g a c a c c c g t g a t c g c g t g c a g g c c g 6061 t c t c g g c c c g t a g c c t g g c g ggggcgaagg 6121 c c a g c c c c g g c t a c g t c a c c g a t g t g c a g c 6181 g g t t c c a t c t g a t c g t c g a g c c g a t c g a c t 6241 c g a t g c t g a c g a c g t t g c g c atggacggcg 6301 c g g t c g t t c t g g c g g c g c t t gagcgggaag 6361 a c g a t t t c g a c c g t g c g c c g t g g g t c a g c a 6421 c c a a g c a t c t g g t c g a t c t g ggccacaagg 6481 a c g g c g c t t c g c a c c g g c g t c a t c a g g g g t 6541 g t g t t c g t g a t g a t c g t g t g gcgcagggct 6601 ccgagaagct gctgggcggc gcggatcgac 6661 t g g c g c t g g g c g t c a t g g c g g t c g c c a a t c 6721 c a g t c g c c g g c t t c g a c g a c acgccgggag 6781 t t c g c c a a c c g a t c c a c g c c atggccggag 6841 agcgggaaga gggcgcgccc c c g c c a t c g c 6901 a g t c c a c c g g c c c a g c g t c g c a c t g a c g a g 6961 t a t t c c t c t t g a c a g c g c t g t c c g a t c g a c 7021 cgaaaaggcg gcgtcaggag gaaacgccat 7081 c g c c c t g t g c g c a c g a t c a c t c c c c t c c a a 7141 cgcggacacc ccgcaagcgc cgcgccgagc 7201 g g a c g c c c c t t c a a g g a t c g c c c a t g c t g c 7261 c c c t g a c g a t c g c c t g t g g a t c g g c c g g c g 7321 c t g c a a a c c c c g c c g t t t g g c c g a t g t c g g 7381 c c g a g g c c t t c a t c g c c c a g c t g a t g a g c c 7441 c c a t c c a g g c c g a t g g c g c c t c g a t c a c g c 7501 c g g t g c t g g t cggcggcaac t c a g c g c c g g 7561 g g a t c g a a t 
g g a t c c g c g c c t t c c g c g c g g 7621 a a a t c c c g a t c a t c t t c g g c g t c g a c g c c g 7681 c g a t c t t c c c g c a c a a t g t c ggcctgggcg 7741 t c g g c g a g g t g a c c g c t a a g gaaatggccg 7801 c g g t c g c c g t g c c t c g c g a t t c a c g c t g g g 7861 c g g a g a t c g t g a a g g c c t a t t c g g g c c c g a 7921 ccggcaagcc g c t g g c g g c c ggccgcgtgg 7981 g t g g c a c c g a gaatggccgc gaccagggcg 8041 g t c t g c a c a a c g c c g g c t a c ccgccggcga 8101 c g t t c t c c a g ctggaacggg g t c a a g c a c a 8161 tgaaggagcg c a t g g g c t t t g a g g g c t t c g 8221 t c g a g g g c t g cagcaacacc a g c t g c g c c c 8281 t g g c t c c c g a cagctggaag g g c c t y t a c g 8341 a g a t c c c c a t ggcgcggatc g a c g a t g c c g 8401 c c g g c c t g t t cgaggacaag c g g c c t t t g g 8461 agcaccgggc c g t g g c g c g c gaggcggtgc 8521 g c g t g c t g c c gctgaagagc t c g g c t c g t g 8581 t t g g c a a g g c c t c g g g c g g t t g g a c c c t g a 8641 a c t t c c c g c a c g g c c a g t c g a t c t a t g c a g 8701 gcagcgcgga a c t g t c g g t t t c g g g c g a t t 8761 t g t t c g g c g a g a a c c c c t a c g c c g a g t t c c 8821 c t g g c g a c a a g c g t g a c c t g g c g c t g c t g a 8881 t g t c g g t g t t c c t g a g c g g c c g g c c c c t g t 8941 c c t t t g t c g c g g c g t g g c t g c c c g g c t c g g 9001 gcgacaaggc gggtaagccg c g c c a c g a c t gaagagcccg t t c c g c g a g c g c t t c g a c a g g a t c c c g a c g c a c g t g a t c c t g c a t c c g c a cacgccggag g g c c g t g c g g c g g t g t c g t a t t g t c a t g g c t a a g t a c t c g ccgaagcgag gcgccaaagt c a c g a t c c a c g a c g t g g c c c c g c g t g t c c t g a a t c g c g a g c c c a a c g t c a c g g t a g c t g c g c t g c a c t a t c g c c c c a a t a c c t a t c t g a t c g g c g t t t t c t t c g a c a a c c t c g g c g c c a t c g c c c g t t g c cggcaggaag cgaccgccga t g t c g a g g a t c a g g t c g c g c t g a t c c t g a c c 
c c g c c g c t c a g c g a t c a t c g g g t g g c c t a t g t g c g c a t c g c c c c a g g c g t g g a t g a t c g g c t g g c c g c c t a c g a g a t g a a c a t c g c c t t t a t t g t a g g g c a c c c c g a c c t c c t c g a t g c a a t g c g c g a c a g c g g c c t g c g g t t t t c g t t t c g c t c g g g c t t c g a g g c g g cgacggcgat c t t c g c c t c g a a c g a t g a c a g c t t g c g g c t t g a c g t t c c t a c t c a a c t g t cgaagataac c t g g c c t c a g c t c a c c a c g g cggccgccga c a t g c t g a t g c a a g g c g t c g g g c t c c t g g a c t t c g a a c t c g t c g t g c g g g t g c g c g a t t g g c a a g g t g g t a c c g c a a t c a g t g a g a t c g c a t a g a t c a a g a c a g t c g c c a g a c c t c g c c t g g c c a g t c c a g c g g c t g t c g g c g g a c c t g a a g c c g c t t a t c t t c g c t c g g g c c t t t c c t t t a a a c a c c g c t c c g c c c g a c c g c g c c g t t t c g c c t t c g c t t c c g c c c t g g t g g t c c t g g c c c a g a c g c c g c c g a a c g c c a c t a g t c c a g c c g c c a t c a c c gacgccaaga g g a t g a c c g t cgaggagaag g t c g c c c a g a ccgaggaact gaagaagtac c g g t t g g g a t acggcaatga c c g c g c c a g c c c g c a g c g c t c c g c g c t g g a caagcgcggc gaccggcagg t g c a t g g t c a c a a c a a c g t c g t g g g c g c c a cagcgcacga g c c c g a c c t g a t c c g t c g t a ccaccggggc g g a c t g g a c c t t t g g t c c g a g c c g c g c c t a t g a g g g c t a t ggcgagaatc t g a c c c t g g g gctgcagggg gcgctggaag cgggctcggc c a a g c a c t t c c t c g c c g a t g acgcgaagat c t c c g a g g c c g a t c t g g t g c t t g a a g c c g g c a t c c t g t c g g t g a t g g t c t ccggcaacaa a a g c c t g c t g a c c g a c g t g c t c g t c g g c g a c t g g a a c g c c cacggccagg a g g c t t a t a a c g c c g g c a t g g a c a t g a t g a a c a a c a c c t t ggcgcaggtg aaggccgggc t t c g c c g c a t c c t g c g a g t c aaggtcaagg agggcaagct g g a g c t c c t c g g c g c g c c t g g c a a 
a t c g c t g g t g c t g c t g aagaacgaag t g c t g g t c g c cggagacggc gccgacgaca cctggcaggg caccggcaac aagaacagcg g c g t c g c g g a ggccgtgaaa gccggcggcg t c a a g c a g a a g c c c g a c g t g g c g a t c g t t g agggcgacat c a c c a g c a t c g a g t a t c a g g a g a a g c t c a a ggctgcgggc a t t c c g g t g g ggaccaaccc c g a a c t c a a c g c g t c c g a c g agggcggcgg c g t g g c c g a c g t t c t g g t c g t c c a g g g c a a g \/\/ LOCUS gcc2537 1177 bp mRNA BCT 15-OCT-1999 DEFINITION gcc2537. ACCESSION gcc2537 VERSION KEYWORDS SOURCE Caulobacter crescentus. ORGANISM Caulobacter crescentus Bacteria; Proteobacteria; alpha subdivision; Caulobacter group; Caulobacter. REFERENCE 1 (bases 1 to 1177) AUTHORS Awram,P.A. TITLE Analysis of the S-layer Transporter Mechanism and Smooth Lipopolysaccharide Synthesis in Caulobacter crescentus JOURNAL Unpublished REFERENCE 2 (bases 1 to 1177) AUTHORS Awram,P.A. 
TITLE Direct Submission JOURNAL Submitted (15-OCT-1999) UBC FEATURES Location\/Qualifiers source 1..1177 \/organism=\"Caulobacter crescentus\" \/strain=\"NA1000\" gene complement(1..1177) \/gene=\"lpsj\" CDS complement(441..998) \/gene=\"lpsj\" \/codon_start=1 \/product=\"putative galactosyl-1-phosphate transferase\" \/translation=\"TGDAPCGQHRDDGADQDPQIEQKRAPAQIFGVKGDLVGDRQLVS PIDLRPPRHAGTQGVNAGCTARGDQVILIEQSRPGSDQAHVTDEHAPELGQLIETELA HQAADRRQPLIRIVEKVGGHLGRIDAHGAKLRHRKQRRGAPHALRPVETRPRRSQPHK RPQHGQRQDQDRQPDRCSQHIKHTLH\" BASE COUNT 169 a 387 c 416 g 205 t ORIGIN 1 gccgaagccg acggcaattg gagttggcgt ggtctcctga cgggcatcga 61 ccgggcgatg cggggctggg tgagaagctc gccgcagaga tcaccgccat 121 ccccacgctc tgttgccgcg cggacggatc gacgagatcg ccgccgcgct 181 gacggagagg gcgcgcgaga ggtggtcaag cgcctggcgc cggccgcgat 241 gtccgtcgcc tgttctccga cgcgacgctg aagggcaaca ccgagcgctt 301 tatgcgggca tgatcgacga ggcggccggc caggatcgcg aaggcttcct 361 ctgctgtcct ccgacgctgg gcgggcctat ctgctgctcg acgcggcgag 421 gcctaggccg agccggctga atgaagcgta tgtttgatgt gctggctgca 481 tggcggtcct ggtcctgccg ctggccgtgc tgtgggcgct tgtgcggctg 541 ggcctggtct ctactggtcg cagcgcgtgg ggcgcgcctc ggcgctgttt 601 agtttcgcac catgcgcatc gatacgcccg aggtggccac ccaccttctc 661 atcaatggct gacgccgatc ggcggcctga tgcgcaagct cagtctcgat 721 agctctggag cgtgctcgtc ggtcacatga gcctggtcgg accccggccg 781 atcaggatga cttgatcgcc gcgcgccgtg cagc
cggcgt tgacgccttg 841 tgacggggtg ggcgcagatc aatgggcgag acgagttgtc gatcgccgac 901 ttgacgccga atatctgcgc cggcgctcgc ttctgttcga tctgcgggtc 961 ccgtcatccc ggtgctgacc gcacggggcg tcacccgtta gccgagatcg 1021 tttccagcgc cgaaacgaag cggggatcgc cggccagcga cgccgggaac 1081 gggccaggaa ggcggcgacg tctgcgctgg cctctcccgt ggccgccagg 1141 ggaggcgatc t t g c a g c g g g t c g a c g a t g g c t t c a c c g a g c a g c t c g gggcatcgac c c a g a c g c g t c c g t c g c t t g c c t c a a g c g c g c t c g c c g c c c g g c g a t c t g gcgatcgggc a c t t c g c c g g c c g a t g c c g a g a c a a t c c t g g a g t t g c c c c g c t c t g t t c a c g t c c c g g c g a a g g t c g c c c c t g g t c a g c a g c g t a g g c c t a c c g a g t c c a c c c a g a g c c a \/\/ LOCUS gcc1444 2031 bp mRNA BCT 15-OCT-1999 DEFINITION gcc1444. ACCESSION gcc1444 VERSION KEYWORDS SOURCE Caulobacter crescentus. ORGANISM Caulobacter crescentus Bacteria; Proteobacteria; alpha subdivision; Caulobacter group; Caulobacter. REFERENCE 1 (bases 1 to 2031) AUTHORS Awram,P.A. TITLE Analysis of the S-layer Transporter Mechanism and Smooth Lipopolysaccharide Synthesis in Caulobacter crescentus JOURNAL Unpublished REFERENCE 2 (bases 1 to 2031) AUTHORS Awram,P.A. 
TITLE Direct Submission JOURNAL Submitted (15-OCT-1999) UBC FEATURES Location\/Qualifiers source 1..2031 \/organism=\"Caulobacter crescentus\" \/strain=\"NA1000\" gene 3..569 \/gene=\"orf15\" CDS 3..569 \/gene=\"orf15\" \/codon_start=1 \/product=\"putative molybdenum cofactor biosynthesis protein\" \/translation=\"GEAIRLSPQGDDAQAIASAVSPAPVDVIVTIGGASVGDHDLVKP ALRTLGLALSVETVAVRPGKPTWSGRLPDGRRVVGLPGNPASALVCAELFLRPLLAAL TGAAPDIRLIPAGLAAPLPAGGPREHWMRAALSTDPDGRVVATPFPDQDSSLVSVFAR ADALLRRRPGAPPAATGEWDVLPLRRG\" gene 658..2031 \/gene=\"lpsK\" CDS 658..2031 \/gene=\"lpsK\" \/codon_start=1 \/product=\"putative nucleotide sugar epimerase\/dehydratase protein\" \/translation=\"MGHAGKIATHVLLAFVALLAGRYLVIDMPFTRDTLLQATLYGLA AFIVELAFRVERAPWRFVSATDHLRLLRSAVLTAAAFLVITRLTHPGIDGGLRTVAGA ALIQAALLSALRVIRRSLHERMLLDSVLRLGPASMHPALPRLLIIGSASEAEAFLRAP AGLGERYAPIGVVSPLDRETGDELRGVCVLGSIADFDSVLARLRDSGLSPAAILFLTD SAMSTFGAERLGRLKTEGVRLLRRHGVVEMGAAANTPQLREISIEELLSRPPVRLDPE PVRALVSGRRVLVTGAGGSIGSELCRQIAASGCAHLTMVDASEYNLFHIEREIAERHP LLSRREALCDVRDAARVQRVFTEMKPDIIFHAAALKHVTLVENHPCEGVRTNVLGTRN VAVAAKACGAAHLALISTDKAVAPTSVMGAAKRVAEAVARQYGGGGDMRVSIVRFGNV LGSAGSVV\" BASE COUNT 255 a 722 c 703 g 350 t 1 others ORIGIN 1 gcggtgaggc gatccgactt tccccgcagg gcgacgacgc ccaggcgatc gccagcgccg 61 tttcgcccgc gcccgtcgac gtcatcgtca cgatcggcgg cgcctcggtc ggcgaccatg 121 acctggtcaa acccgcactc cgaacgctgg gccttgcgct ttcggtcgag acggtcgccg 181 tgcgccccgg caagccgacc tggagcgggc ggttgccgga cggtcgccgc gtggtgggtc 241 tgccaggaaa cccggcctcg gcgctggtgt gcgcggaact cttcctgcgg cctctgctgg 301 cggctctcac gggcgcggcg ccggatatcc gcctcattc
c c g c g g g c t t g g c c g c c c c g c 361 t t c c g g c g g g cggaccgcgg g a g c a t t g g a t g c g c g c c g c g c t g t c g a c g g a t c c g g a c g 421 g g c g a g t c g t c g c g a c a c c c t t c c c c g a t c a g g a t t c c t c t c t g g t c a g c g t g t t c g c g c 481 g c g c c g a t g c t c t g c t a c g g c g a c g g c c t g g c g c g c c c c c t g c g g c g a c g g g c g a g g t t g 541 t c g a t g t t c t g c c g c t c c g g cgcggctgaa accgcgacgg c a t a g a a t t g a c g t g c t a a g 601 c c c g g a t t t g a g t t c g c c g g g c g t g a c c c g a c c t t c a c c g c t t c a g a g g t t c g t t t c a t g 661 gggcatgcag gaaagatcgc g a c c c a c g t t c t g c t g g c c t t c g t g g c c c t g c t g g c c g g t 721 c g c t a t c t c g t c a t c g a c a t g c c g t t c a c g cgggacacgc t g c t t c a g g c g a c c c t g t a c 781 g g c c t c g c a g c a t t c a t c g t g g a g t t g g c t t t c c g g g t g g agcgggcccc g t g g c g c t t c 841 g t c t c g g c c a c c g a c c a c c t g c g a c t t c t c c g c t c g g c c g t c c t g a c g g c g g c g g c g t t c 901 c t g g t c a t t a c c c g c c t g a c c c a t c c a g g c a t c g a c g g t g g c c t g c g c a c c g t g g c c g g c 961 g c g g c c c t g a t c c a g g c g g c g c t g c t g t c g gcgctgcggg t g a t c c g g c g g a g c c t g c a t 1021 gagcgaatgc t g c t c g a t t c g g t g c t g c g c c t t g g s c c c g c c t c g a t g c a t c c g g c g c t g 1081 c c g c g c c t g c t g a t c a t c g g c t c g g c c t c c gaggccgaag c c t t c c t g c g c gcgccggcc 1141 g g g c t t g g c g a a c g t t a c g c c c c g a t c g g c g t g g t c t c g c c g c t c g a c c g cgagaccggc 1201 g a t g a a c t g c g c g g c g t c t g c g t t c t g g g c t c g a t c g c c g a t t t c g a c a g c g t g c t g g c c 1261 c g t c t g c g c g a c a g c g g c c t g t c g c c g g c c g c g a t c c t g t t c c t c a c c g a c a g c g c g a t g 1321 a g c a c c t t c g gcgccgagcg t c t g g g c c g c t t g a a g a c g g 
aaggcgtgcg c c t g c t g c g c 1381 cgccacggcg t g g t c g a g a t gggcgcggcg g c c a a c a c c c c c c a g c t g c g c g a g a t c a g c 1441 atcgaggaac t c t t g a g c c g g c c g c c t g t c c g a c t g g a t c cagagccggt t c g c g c g c t g 1501 g t g t c c g g t c g a c g g g t g c t ggtgacaggc gcggggggca g c a t c g g t t c c g a g c t c t g c 1561 c g t c a g a t c g ccgccagcgg c t g c g c c c a t c t g a c c a t g g t c g a c g c c t c c g a a t a c a a c 1621 c t g t t c c a c a tcgagcgcga g a t c g c c g a g cggcacccgc t c c t c t c g c g t c g t g a g g c g 1681 c t c t g c g a c g t c c g c g a c g c c g c c c g c g t c c a g c g t g t c t t c a c g g a g a t gaagccggac 1741 a t c a t c t t c c a c g c t g c g g c g c t g a a g c a t g t c a c g c t g g t g g a g a a c c a c c c c t g c g a g 1801 g g c g t c c g c a c c a a t g t g c t gggcacccgc a a c g t g g c c g t c g c c g c c a a g g c c t g c g g c 1861 g c g g c g c a t c t c g c c t t g a t c t c g a c g g a c a a g g c c g t c g c g c c g a c c a g c g t g a t g g g c 1921 gcggccaagc g t g t c g c c g a ggccgtggcg c g t c a g t a c g gcggcggcgg c g a c a t g c g g 1981 g t c a g c a t c g t g c g c t t t g g c a a t g t g c t g g g c t c g g c c g g a t c g g t c g t a \/\/ LOCUS DEFINITION ACCESSION VERSION KEYWORDS SOURCE ORGANISM REFERENCE AUTHORS TITLE JOURNAL REFERENCE AUTHORS TITLE JOURNAL FEATURES source gcc2218 gcc2218. gcc2218 2142 bp mRNA BCT 15-OCT-1999 gene C a u l o b a c t e r c r e s c e n t u s . C a u l o b a c t e r c r e s c e n t u s B a c t e r i a ; P r o t e o b a c t e r i a ; a l p h a s u b d i v i s i o n ; C a u l o b a c t e r group; C a u l o b a c t e r . 1 (bases 1 t o 2142) Awram,P.A. A n a l y s i s o f the S - l a y e r T r a n s p o r t e r Mechanism and Smooth L i p o p o l y s a c c h a r i d e S y n t h e s i s i n C a u l o b a c t e r c r e s c e n t u s U n p u b l i s h e d 2 (bases 1 t o 2142) Awram,P.A. 
D i r e c t S u b m i s s i o n S u b m i t t e d (15-OCT-1999) UBC L o c a t i o n \/ Q u a l i f i e r s 1. .2142 \/organism=\"Caulobacter c r e s c e n t u s \" \/strain=\"NA1000\" complement(3..719) 142 \/gene=\"orf14\" CDS complement(3..719) \/gene=\"orf14\" \/ c o d o n _ s t a r t = l \/product=\"unknown e x s G - l i k e \" \/trans1ation=\"SCGQAHAFGERRAQREDQARGREVEHRLAAEVPRQALMHHLGAE PVARGRPRQGGPALFAPDQGQKPRSGRLVDVPFDRDPALGGREGAMARGVGDQLVDGH VHRHRRLGAEGDGRALDLEPSRNVVGEGRQGALQNLLQLRTSPGAAGQQLVRLRERQD PALEDVCKGLGRRGRAQGLAGDRLHDRQGVLHAVIQLAQXEVTVLERGGEVVIETPAL QGRGRGARDHLQLAQHLGRRI\" gene 1134..2138 \/gene=\"IpsL\" CDS 1134..2138 \/gene=\"IpsL\" \/ c o d o n _ s t a r t = l \/ p r o d u c t = \" p u t a t i v e g l y c o s y l t r a n s f e r a s e \" \/translation=\"EPEFRRIDGEGGVVPARHGLARLGPVHGDRQLLAELHGVDGERL VGLGFDRIEPGLAVGGQDVADVVAAEVADPRLHLHQPVQRHHLAALRREQGVVQARAE LAGHQTRPREETRLLVPAQGLGAHQGEAGLLGQALDLGLVFELAEGLGRAHAAPADEQ HVGVTALGEAQGLALKVPAQEVVEHIVAVQHVRQQGRADLDGFVARVVGDDLLVGQHH PHDARHIAGGGPGQQQRRGDPRRVQGADHEQLRVHLGQPVTPDEVEVGDHEGPVEPVR HRRVEIPGLARNVVLVPVLDVIGIGALGVDELVVQLQVRAGAFGHHAFGHQQVDIGRV E\" BASE COUNT 324 a 723 c 757 g 337 t 1 o t h e r s ORIGIN 1 a t g a t c c g c c g c c c a a g g t g c t g c g c a a g c t g c a g g t g g t cgcgcgcgcc a c g g c c t c g g 61 c c c t g g a g a g c g g g c g t c t c a a t g a c t a c c t c g c c g c c t c g c t c g a g c a c g g t c a c t t c m 121 t g c t g c g c g a g c t g g a t c a c cgcgtgaaga a c a c c c t g g c g a t c g t g c a g t c g a t c a c c c 181 g c c a g a c c c t gcgcacgacc c c g t c g c c c g a g g c c t t t g c a g a c g t c t t c gagagccgga 241 t c c t g g c g c t c t c g c a g g c g c a c g a g t t g c t g a c c c g c c g c g c c t g g g g a c g t c c g c a a c 301 tggaggagat t c t g c a a c g c a c c c t g g c g a c c t t c g c c g a c g a c g t t g c g g g a c g g t t c g 361 aggtccaggg c c c g g c c g t c g c c t t c a g c g ccgagacggc g g t g t c g g t g c a c a t g g c c a 421 t c c a c 
g a g c t g a t c g c c a a c gccgcgcgcc a t g g c g c c c t c t c g a c c c c c caaggccggg 481 t c g c g a t c g a a t g g a a c g t c gacaaggcga c c g c t c c g g g g c t t c t g a c c c t g a t c t g g c 541 gcgaacaggg cgggcccgcc c t g t c g g g g c c g c c c a c g c g c a a c g g g t t c g g c t c c a a g a 601 t g g t g c a t c a gggcctggcg c g g g a c c t c g gcggccaggc g g t g c t c g a c t t c g c g c c c a 661 c g g g c c t g g t c t t c a c g c t g cgcgcgccgc t c t c c g a a c g c a t g a g c c t g g c c g c a t g a a 721 g a c c g c c c g g g t c a t g a t c g t c g a g g a t g a g g c c c t g g t g g c c a t g a t g g t c g a g g a c a t 781 g c t c g g c g a c a t g g g g t g t g aggtggccgg c t c g t t c g g c g c c g t c g a c g c c g c c c t g g c 841 c t g g c t g c g c g a t c a t c c c t cgcccgacgg c g c g g t g c t g g a c g t c a a t a t c g g c g g c g a 901 g a t g g t g t t t c c g g t c g c c g a a c g c c t g c g cgagcagggc g t g c c g t t c g t c t t c g c c a c 961 c g g c t a t g g c g a c c t g c c g c g c g c g g g c t t c g a g t c g g t g c a g g t g c t g g c c a a g c c g a t 1021 caacgccggc g c g c t g c g c c t g g c c g t c g a g c g c t t c c g g a t c g g c t a g a a c c c c t t g g c 1081 c t g g g a t g g g a t c a t c c a a a g c g t t c a a a a gggcgctgga t g t c t g a t c t t t a g a g c c t g 1141 a g t t c c g g c g t a t c g a c g g g gaagggggcg t c g t c c c a g c c c g c c a t g g c c t t g c g c g c c 1201 t g g g c c c g g t c c a c g g c g a t c g c c a g c t t c t g g c g g a a c t c c a c g g g g t c g a t g g t g a a c 1261 g g c t g g t t g g c c t g g g g t t c g a t c g a a t a g agccgggcct g g c c g t c g g c gggcaggatg 1321 t a g c c g a c g t cgtggcggcg g a a g t t g c c g a t c c g c g a c t g c a c c t c c a c c a g c c a g t t c 1381 agcgacacca c c t g g c a g c c ctgcggcgag agcagggcgt t g t g c a g g c c cgagccgaac 1441 t c g c c g g c c a ccagacgcgc ccgcgagaag a g a c g c g c c t g c t g g t c c c a 
g c c c a g g g t c 1501 tcggggcgca ccagggtgag g c c g g c c t c c tgggccaggc c c t c g a t c t c g g c c t c g t t t 1561 t c g a g c t c g c ggaaggactg gg c c g t g c g c a c g c c g c c c c ggctgacgaa c a g c a t g t c g 1621 g c g t c a c c g c c c t t g g t g a a g c t c a a g g c c t c g c g c t c a a g g t t c c a g c g caggaagtcg 1681 t g g a a c a c a t a g t t g c g g t t c a g c a t g t g c ggcagcaggg c c g c g c c g a t c t c g a c g g c t 1741 t c g t c g c g c g g g t c g t a g g t g a c g a t c t c c t g g t c g g g c a g c a t c a t c c g c a c g a t g c g c 1801 g c c a c a t a g c cgggggcggt ccggggcagc agcagcggcg c g g t g a t c c c c g c c g c g t t c 1861 agggcgcgga t c a c g a a c a g c t t a g g g t a c a t c t c g g t c a g c c a g t g a c c ccagacgaag 143 1921 t t g a a g t g g g t g a t c a c g a a g g c c c g g t c g 1981 c a g g c c t g g c g c g c a a t g t c g t g c t g g t a c 2041 t c g g c g t a g a c g a g c t g g t c g t c c a g c t t c 2101 c g t t t g g c c a c c a g c a g g t c gacatcgggc a g c c g g t g c g t c a c c g g c g c g t c g a a a t c c c a g t g c t g g a c g t a a t a g g g a t a g g c g c c c aggtgcgcgc c g g a g c c t t c g g c c a c c a c g gcgtggagat eg LOCUS DEFINITION ACCESSION VERSION KEYWORDS SOURCE ORGANISM REFERENCE AUTHORS TITLE JOURNAL REFERENCE AUTHORS TITLE JOURNAL FEATURES sour c e gcc648 gec 648. gcc648 2699 bp mRNA BCT 15-OCT-1999 gene CDS gene CDS C a u l o b a c t e r c r e s c e n t u s . C a u l o b a c t e r c r e s c e n t u s B a c t e r i a ; P r o t e o b a c t e r i a ; a l p h a s u b d i v i s i o n ; C a u l o b a c t e r group; C a u l o b a c t e r . 1 (bases 1 t o 2699) Awram,P.A. A n a l y s i s o f the S - l a y e r T r a n s p o r t e r Mechanism and Smooth L i p o p o l y s a c c h a r i d e S y n t h e s i s i n C a u l o b a c t e r c r e s c e n t u s U n p u b l i s h e d 2 (bases 1 t o 2699) Awram,P.A. 
D i r e c t S u b m i s s i o n S u b m i t t e d (15-OCT-1999) UBC L o c a t i o n \/ Q u a l i f i e r s 1. .2699 \/organism=\"Caulobacter c r e s c e n t u s \" \/strain=\"NA1000\" 1..1056 \/gene=\"orf1\" 1..1056 \/gene=\"orf1\" \/ c o d o n _ s t a r t = l \/ p r o d u c t = \" p u t a t i v e chemotaxis r e c e p t o r p r o t e i n \" \/translation=\"RPVIAPGRTDDQDQVITVLSEQFKALAAGDLTARVDVVFSERYG HVRDEFNAAMTKLGQVMDEISMAAGGLGESSDEVARVSQHLSRGAGRQALDLHGARAA LQKVGAAAGRGVDGLRRVTEAAAGLRIDAASARRSVREAVGSIAEVEQSALRISQAAA LFDEVAQQANVLSLIADVEGARGGEGXGPFQAVAADKMRVLAERASGAAREIKGVTAA NSAQVSRCARLMDAASASFGGMASRITQIDGLVSGLAKSAQEQAHGLRAVDEAVDRAD DIAQTHADQVDEAAAVTGRLIEEAESLIQAASPFRAHVVSRPASRPEPARAGHHAPAG NAVARAHARIAAYARPR\" 1060..2577 \/gene=\"orf1\" 1060..2577 \/gene=\"orf1\" \/ c o d o n _ s t a r t = l \/ p r o d u c t = \" p u t a t i v e h i p p u r a t e h y d r o l a s e p r o t e i n \" \/translation=\"MLCHPGKRVALVRDPGAASAALPQSLGPGSTPGFRRGSAGMTQD ISVRGGGGGEHVRRSCDSRNPRPSMKSLFAASALALLIATAAQAGPLNVPATQKVISA QLDRDYPALEALYKDIHAHPELGFQEVETAKKLAAQMRALGFTVTEGVGKTGVVAVLK NGEGPKVLIRTELDGLPMQEKSGLAWASQATATWNGEKVFVAHACGHDIHMAAWVGAA RQLVAMKAKWKGTLVFVAQPSEETVRGARAMLDDGLWDKIGGKPDYGFALHVGSGPXG EVYYKAGVLTSTSDGLDITFNGRGGHGSMPSATIDPVLMAARFTVDVQSVISREKDPS AFGVVTVGSIQAGSAGNIIPDKARVRGTIRTQDNAVREKILDGVRRTVKAVTDMAGAP PADLKLTPGGKMVVNDAALTDRTAVVFKAAFGARAVAQDKPGSASEDYSEFVLAGVPS VYFAIGGSDPAELAKAKAEGREPPVNHSPYFAPVAEPTIRTGVEAMTLAVLNVLK\" 144 BASE COUNT 396 a 935 c 984 g . 
381 t 3 o t h e r s ORIGIN 1 c g c c c c g t g a t c g c g c c g g g ccgcaccgac g a t c a g g a t c a g g t g a t c a c c g t g c t g t c c 61 g a g c a g t t c a a g g c c c t g g c ggccggcgac c t g a c c g c c c g c g t c g a t g t g g t g t t c a g c 121 g a g c g c t a t g g c c a c g t c c g c g a c g a g t t c aacgcggcga t g a c c a a g c t gggccaggtc \u2022181 atggacgaga t c t c c a t g g c ggctggcggg c t g g g c g a g t c t t c g g a c g a ggtggcgcgc 241 g t c t c g c a g c a t c t g t c g c g cggcgcgggg c g t c a g g c c t t g g a t c t g c a c g g t g c g c g g 301 g c g g c g c t g c agaaggtggg cgcggccgcc gggcggggcg t g g a c g g g c t g c g c c g c g t c 361 accgaagccg c c g c c g g c c t g c g c a t c g a c gccgccagcg c c c g c c g t t c g g t g c g t g a g 421 gcggtggggt cgatcgcgga ggtcgagcag a g c g c c t t g c g c a t c a g c c a ggccgccgcc 481 c t g t t c g a c g a g g t g g c t c a gcaggccaat g t c c t g t c c t t g a t c g c c g a cgtcgagggc 541 gcgcggggcg grgagggcgm g g g g c c c t t c c a g g c c g t c g c c g c t g a c a a g a t g c g c g t c 601 c t g g c c g a g c gggcctcggg cgcggcgcga gagatcaagg gcgtgacggc c g c c a a t t c g 661 g c g c a g g t c t cgcggtgcgc g c g g c t g a t g g a c g c c g c c t c g g c c t c g t t c g g c g g c a t g 721 g c g t c c a g g a t c a c c c a g a t c g a c g g t c t g g t g t c g g g c c t g g c c a a g t c cgcccaggag 781 c a g g c c c a t g g c c t g c g c g c c g t c g a c g a g gcggtggacc gggccgacga t a t c g c c c a g 841 a c c c a t g c c g a c c a g g t c g a cgaggccgcg g c g g t c a c c g g c c g c t t g a t cgaggaggcc 901 gagagcctga t c c a g g c c g c c a g t c c t t t c c g c g c c c a t g t g g t t t c g c g c c c g g c g t c g 961 cggcccgaac cagcccgcgc c g g c c a t c a c g c g c c c g c c g g c a a c g c c g t g g c c c g c g c c 1021 c a c g c c c g c a t c g c c g c c t a t g c g c g a c c c cgctagggga t g c t g t g t c a t c c c g g a a a g 1081 c g t g t a g c g c t t g t c c g g g a 
cccaggggcg gcaagcgcgg c g c t c c c g c a g t c c c t g g g t 1141 c c c g g c t c t a c c c c c g g c t t t c g c c g g g g t tcggccggga t g a c a c a g g a t a t t t c t g t c 1201 aggggtggcg gaggtggcga g c a t g t g c g t c g a t c c t g c g a c t c a c g a a a c c c g a g a c c c 1261 t c c a t g a a g t c c c t g t t c g c c g c c t c g g c t c t c g c c c t g c t g a t c g c c a c c g c c g c c c a g 1321 g c c g g g c c g t t g a a c g t g c c cgccacgcag a a g g t g a t c a g c g c c c a g c t cgaccgcgac 1381 t a t c c g g c g c t g g a g g c g c t gtacaaggac a t c c a c g c c c a c c c c g a g c t c g g c t t c c a g 1441 gaggtcgaga ccgccaagaa g c t g g c c g c g cagatgcggg c g c t g g g c t t c a c c g t c a c c 1501 gagggcgtcg gcaagaccgg c g t g g t g g c g g t g c t g a a g a acggcgaggg c c c c a a g g t g 1561 c t g a t c c g c a ccgagctgga c g g c c t g c c g atgcaggaaa a g t c g g g c c t g g c c t g g g c c 1621 agtcaggcga c c g c c a c c t g gaacggcgag a a g g t c t t c g t c g c c c a t g c c t g c g g c c a c 1681 g a c a t c c a c a t g g c c g c c t g g g t g g g t g c g gcccgccagc t g g t g g c g a t gaaggccaaa 1741 tggaagggca c g c t g g t t t t c g t g g c c c a g c c c t c g g a g g a g a c g g t t c g cggggcccgc 1801 g c c a t g c t g g a c g a c g g t c t gtgggacaag a t c g g c g g c a a g c c c g a c t a c g g c t t t g c g 18 61 c t g c a c g t c g g t t c g g g t c c gkccggcgag g t c t a t t a c a aggccggcgt c c t g a c c t c g 1921 a c c t c g g a t g g c c t g g a c a t c a c c t t c a a c ggccggggcg g g c a c g g c t c g a t g c c c t c g 1981 g c c a c c a t c g a c c c g g t g c t g a t g g c c g c c c g c t t c a c c g t c g a c g t g c a g a g c g t g a t c 2041 agccgcgaga a g g a c c c g t c g g c c t t c g g c g t g g t g a c g g t c g g c t c g a t ccaggcgggc 2101 a g c g c c g g t a a c a t c a t c c c cgacaaggcc cgggtgcgcg g c a c g a t c c g cacccaggac 2161 a a c g c c g t g c gcgagaagat c c t c g a c g g c g t g c g c c g 
c a cggtgaaggc ggtgaccgac 2221 a t g g c c g g c g c c c c g c c c g c c g a c c t g a a a c t g a c c c c g g gcggcaagat g g t g g t c a a t 2281 g a t g c g g c c c t g a c c g a t c g cacggcggtg g t g t t c a a g g c c g c c t t c g g g g c c c g c g c c 2341 gtggcgcagg acaagccggg c t c g g c g t c c g a g g a c t a t t c g g a a t t c g t g c t g g c c g g c 2401 g t g c c g t c g g t c t a c t t c g c c a t c g g t g g c t c g g a c c c c g c c g a g c t c g c caaggccaag 2461 gccgaaggcc g t g a g c c g c c g g t c a a c c a c t c g c c g t a c t t c g c g c c c g t ggccgagccg 2521 a c g a t c c g c a cgggggtgga ggcgatgacc c t g g c g g t g c t g a a t g t g t t g a a g t g a c c c 2581 t t c t c c c c t t gcgggagaag g t g t c g c c g g aggcgacgga t g a g g g g t t t c t c g g c c t c g 2641 ccgcgcgacc c c t c a a c c g a c c c g c t a c g c g g g c c a c c c t c t c c c g c a a a gggagaagg \/\/ LOCUS DEFINITION ACCESSION VERSION KEYWORDS SOURCE ORGANISM g c c l 2 9 0 g c c l 2 9 0 . g c c l 2 9 0 2109 bp mRNA BCT 15-OCT-1999 C a u l o b a c t e r c r e s c e n t u s . C a u l o b a c t e r c r e s c e n t u s REFERENCE AUTHORS TITLE JOURNAL REFERENCE AUTHORS TITLE JOURNAL FEATURES source gene CDS gene CDS BASE COUNT ORIGIN 1 B a c t e r i a ; P r o t e o b a c t e r i a ; a l p h a s u b d i v i s i o n ; C a u l o b a c t e r group; C a u l o b a c t e r . 1 (bases 1 t o 2109) Awram,P.A. A n a l y s i s o f the S - l a y e r T r a n s p o r t e r Mechanism and Smooth L i p o p o l y s a c c h a r i d e S y n t h e s i s i n C a u l o b a c t e r c r e s c e n t u s U n p u b l i s h e d 2 (bases 1 t o 2109) Awram,P.A. D i r e c t S u b m i s s i o n S u b m i t t e d (15-OCT-1999) UBC L o c a t i o n \/ Q u a l i f i e r s 1. .2109 \/organism=\"Caulobacter c r e s c e n t u s \" \/strain=\"NA1000\" complement(822. . 
1769) \/ g e n e = \" o r f l 2 \" complement(822..1769) \/gene=\"orf12\" \/ c o d o n _ s t a r t = l \/product=\"Unknown i n t e r r u p t s O-antigen s y n t h e s i s \" \/translation=\"MSRLPPGLKTGRDVSVTGVDAAGRTVLTARDGDPQMVWTPTREE RKALRAAKAVRIDVKLEAVEGKLVGPALYADWGDGFSEDSYARLKAGPDGWFASLPAR SFQLNGVRLDPSEGACAFTVEALTVTRIGDLGRDPRGLRGAAIQALKPMLGPLRGPAG AAWRRGRALLAKGRVARPAGRDEGAVGATYAHAIAVSRNLRSPHYAAPIAAPITLPAE APKVVAFYLPQFHPFPENDTWWGKGFTEWTNVSKAQPQFLGHYQPRLPADLGFYDLVS ARCWPSRWTWPRARASTPSASTTTGSPESAFWNGRWICS\" complement(17 66..2107) \/ g e n e = \" o r f l 3 \" complement(17 66. .2107) \/ g e n e = \" o r f l 3 \" \/ c o d o n _ s t a r t = l \/ p r o d u c t = \" p o l y s a c c h a r i d e t r a n s p o r t e r k p s T - l i k e \" \/ t r a n s l a t i o n ^ ' PAELSGLGDFLHLPVRTYSAGMLARLMFTVATVFEADILVLDEW LSAGDAAFVQKAAQRMHRMVEDAKIVVMATHDHDLVQRVCNRVCELQGGKIXFLGSXE DWLAYRETQAA\" 310 a . 719 c 767 g 311 t 2 o t h e r s g t c g g c g c g c t t g g c g a a c t g g c t c t g t g c c t c g g c g a t g a t c g g a t g g g c g t t g g t c a g 61 gcgcggcagc c a g g c g c t g a gcgcggtgcg g g t g g c g t g c a g a t a g c c g t ggccgaacca 121 c c g g t c g g g c t c g a g a t a g g c g c c t t c g c c c c a c t c g t t c c a g g c g t t g a cgaacaccag 181 c g c c t c g c c c t t g g g a t g c c g g g c c t c g g c g t g c t t c a g c g c c c c c g a c a gccagccgaa 241 a t a g c t t t c c g g a t c g g c g t t g t g g a a c g c c a c c c c g g c c c a g g g c t t g c g g g c c t g g t t 301 g t c c c a g c c g ggcatgacgc cgggcacgaa ggcggccgga a c c t g c t c c a g c t c a t c c a g 361 c t t g t g g c g g gccacggcgg g a t a g t c a t a g a c c t t g c c g g t g a a g c c g g c g t g c a g c g g 421 c g t g a c c c g g t t g g t g a t c t c g c c c t c g a c a a t g g c g t g c ggcgggaagt c g a c g a t g c c 481 g t c g a a g c c g t g g c c g g c a t a g t c c t g g a a gccgaaggcg g t g g t g c a c a gcaggtgcag 541 c t c g c c c a g c c c c a t g g c c c g c g c c t g a t c 
gcgccagcgc t c a g t g g t g g c c t t g g c g t c 601 gggcaggatc tcgg g g c g g t a c a g c a a t a g g a g c g g c t t g c c g c t c a c c c g c a g a t a g cg 661 c g g g t c g c g c a t g t a g c g c g c c a g g t c c t c gaacaccgcg c g g t c g t c c t gcggcgagtg 721 c t c c t g g c c c a t c a g g a t g t c g c t c t c g t c g c c g t c c c a g c g g c g g g t c c a g t t c t c a t t 781 ggcccagcag agggcgaagg g c a g g t c c a g g c t c g g a t c g t t c a g g a a c a g a t c c a g c g g 841 c c g t t c c a g a a g g c g c t t t c cggcgaacca g t a g t a g t g g aagcagaagg c g t g g a c g c c 901 c g c g c c c t t g g c c a g g t c c a c c t g c t g g g c c a g c a c c t c g c g c t g a c c a g g t c g t a g a a g 961 c c c a g a t c c g ccggcaggcg c g g c t g g t a g t g a c c c a g g a a c t g c g g c t g g g c c t t g g a g 1021 a c g t t g g t c c a c t c g g t g a a g c c c t t g c c c c a c c a g g t g t c a t t c t c c g g gaacggatga 1081 a a c t g c g g c a ggtagaaggc c a c c a c c t t g g g c g c t t c g g ccggcagggt gatcggggcg 146 1141 gcgatcgggg c g g c g t a g t g cgggctgcgc a g g t t g c g g g a g a c c g c g a t c g c g t g g g c g 1201 t a g g t c g c g c cgaccgcgcc c t c g t c g c g t c c t g c c g g a c g c g c c a c g c g t c c c t t g g c c 12 61 agcagcgccc g c c c g c g t c g ccaggccgcg ccggcggggc cgcgcaaggg g c c c a g c a t c 1321 g g c t t c a g c g c c t g g a t c g c cgcgccgcgc aggccgcgcg g a t c g c g g c c g a g a t c c c c g 1381 a t a c g g g t g a ccgtgagggc c t c g a c c g t g aaggcgcagg c g c c c t c g g a c g g g t c c a g c 1441 c g c a c g c c g t t c a g t t g g a a a c t g c g c g c c ggcagcgagg cgaaccagcc g t c c g g a c c g 1501 g c c t t c a g g c gggcgtagga a t c c t c g g a a a a g c c g t c g c c c c a g t c g g c gtagagcgcg 1561 gggccgacca g c t t g c c c t c g a c c g c c t c a a g c t t g a c g t c g a t c c g c a c c g c c t t g g c c 1621 gcccgcagcg c c t t g c g c t c t t c g c g g g t g g g c g t c c a g a c c a 
t c t g c g g g t c g c c g t c g 1681 cgggcggtca g g a c c g t g c g c c c g g c c g c a t c g a c g c c g g t g a c c g a c a c g t c g c g g c c g 1741 g t c t t c a g g c cgggcggcag g c g g c t c a t g c g g c c t g g g t t t c g c g g t a g g c c a g c c a g t 1801 c c t c g k t c g a gccgaggaag g s g a t c t t t c c g c c c t g c a g t t c g c a g a c g c g g t t g c a g a 1861 c c c g c t g g a c c a g g t c a t g g t c g t g g g t g g c c a t c a c c a c g a t c t t g g c g t c c t c g a c c a 1921 t c c g g t g c a t c c g c t g g g c g g c c t t c t g c a cgaaggcggc g t c g c c g g c g c t g a g c c a c t 1981 c g t c c a g c a c c a g g a t g t c g g c c t c g a a c a c g g t g g c c a c c g t g a a c a t c aggcgcgcca 2041 g c a t a c c g g c c g a a t a g g t g cgcaccggca ggtgcagaaa g t c g c c c a g g c c c g a t a a c t 2101 cggcggggg \/\/ LOCUS gcc 2205 2365 bp mRNA BCT 15-OCT-1999 DEFINITION gcc 2205. ACCESSION gcc 2205 VERSION KEYWORDS SOURCE C a u l o b a c t e r c r e s c e n t u s . ORGANISM C a u l o b a c t e r c r e s c e n t u s B a c t e r i a ; P r o t e o b a c t e r i a ; a l p h a s u b d i v i s i o n ; C a u l o b a c t e r group; C a u l o b a c t e r . REFERENCE 1 (bases 1 t o 2365) AUTHORS Awram,P.A. TITLE A n a l y s i s o f the S - l a y e r T r a n s p o r t e r Mechanism and Smooth L i p o p o l y s a c c h a r i d e S y n t h e s i s i n C a u l o b a c t e r c r e s c e n t u s JOURNAL U n p u b l i s h e d REFERENCE 2 (bases 1 t o 2365) AUTHORS Awram,P.A. 
TITLE D i r e c t S u b m i s s i o n JOURNAL S u b m i t t e d (15-OCT-1999) UBC FEATURES L o c a t i o n \/ Q u a l i f i e r s s ource 1..2365 \/organism=\"Caulobacter c r e s c e n t u s \" \/strain=\"NA1000\" gene complement(2..550) \/gene=\"orf16\" CDS complement(2..550) \/gene=\"orf16\" \/ c o d o n _ s t a r t = l \/ p r o d u c t = \" p u t a t i v e HOMODA h y d r o l a s e p r o t e i n \" \/translation=\"MRGLTISGVFAVLVLTASLAQAGEVTVDGRKVAYREWGGGERTL VMVSGLGDGAETFETVGPRLAQGWRVIAYDRAGYGGSADDPRVHDAERAEAELKGLLA ALKVRKPVLLGHSLGGVFAAHFAARNPGEVTGLVLEETRPTGFTAACKAKRMRGCAFP PLLKYAFPPGGRREVETLDRIER\" gene complement(559..2178) \/gene=\"pgi\" CDS complement(559..2178) \/gene=\"pgi\" \/ c o d o n _ s t a r t = l 147 \/ p r o d u c t = \" p u t a t i v e p h o s p h o g l u c o i s o m e r a s e \" \/translation=\"MADLDAAWTRLEAAAKAAGDKRIVEFFDAEPGRLDALTLDVAGL HLDLSKQAWDEAGLEAALDLAHAADVEGARARMFDGEAINSSEGRAVLHTXLRAPAGA DVKALGQPVMAEVDAVRQRMKAFAQXVRSGAIKGATGKPFKAILHIGIGGSDLGPRLL WDALRPVKPSIDLRFVANVDGAEFALTTADMDPEETLVMVVSKTFTTQETMANAGAAR AWLVAALGEQGANQHLAAISTALDKTAAFGVPDDRVFGFWDWVGGRYSLWSSVSLSVA VAAGWDAFQGFLDGGAAMDEHFRTAPLEQNAPVLVALAQIFNRNGLDRRARSVVPYSH RLRRLAAFLQQLEMESNGKSVGPDGQPAKRGTATVVFGDEGTNVQHAYFQCMHQGTDI TPMELIGVAKSDEGPAGMHEKLLSNLLAQAEAFMVGRTTDDVVAELTAKGVSDAEIAT LAPQRTFAGNRPSTLVLLDRLTPQTFGALIALYEHKTFVEGVIWGINSFDQWGVELGK VMANRILPELESGASGQHDPSTAGLIQRLKR\" BASE COUNT 367 a 849 c 783 g 364 t 2 o t h e r s ORIGIN 1 g c c g c t c a a t a c g g t c c a g c g t t t c g a c c t cgcggcgccc gcccggcgga a a c g c g t a t t 61 tgagcagcgg cgggaacgcg c a c c c g c g c a t a c g c t t a g c c t t a c a g g c c gcagtgaagc 121 c g g t c g g c c g g g t t t c c t c c agcacgaggc c c g t g a c c t c t c c c g g a t t g cgggccgcga 181 agtgggcggc gaacacgccg c c c a g c g a a t gccccagaag c a c g g g c t t t c g c a c c t t c a 241 a c g c c g c c a g c a g c c c c t t c a g c t c a g c c t c c g c c c g c t c g g c g t c g t g c acacgcggat 301 c a t c g g c g c t g c c g c c a t a g c c c 
g c c c g g t c a t a g g c g a t gacgcgccag c c c t g g g c c a 361 gccgggggcc g a c c g t c t c g a a c g t c t c g g c c c c g t c g c c a a g a c c g c t g a c c a t c a c c a 421 g g g t c c g c t c g c c a c c g c c c c a t t c g c g a t a a g c g a c c t t g c g c c c g t c g a c c g t c a c c t 481 c g c c g g c c t g cgccaacgag g c c g t c a g c a ccaggacggc gaaaacgccg c t g a t c g t c a 541 g c c c t c g c a t g g g a t c g c c t a g c g c t t c a g g c g c t g g a t c a a c c c t g c g g t c g a a g g g t c 601 a t g c t g g c c c gaagcgccgc t c t c c a g c t c cggcaggatg c g g t t c g c c a t c a c c t t g c c 661 c a g c t c g a c g c c c c a c t g g t c g a a g c t g t t g a t c c c c c a g a t c a c g c c c t cgacgaaggt 721 c t t g t g c t c a tagagggcga t c a g g g c g c c g a a g g t c t g g ggcgtcaggc ggtcgaggag 781 c a c c a g g g t c gagggccggt tgccggcgaa a g t t c g c t g c ggggccaggg t g g c g a t t t c 841 ggcgtcagag a c g c c c t t g g c c g t g a g c t c ggccacgaca t c g t c c g t g g t c c g c c c g a c 901 catgaaggcc t c g g c c t g g g c c a a g a g g t t cgagagcagc t t c t c g t g c a t g c c g g c c g g 961 g c c t t c g t c c g a c t t g g c g a c g c c g a t c a g c t c c a t c g g c g t g a t g t c g g t c c c c t g g t g 1021 c a t g c a c t g g a a a t a g g c g t g c t g a a c a t t g g t g c c t t c g t c g c c g a a c a c c a c c g t g g c 1081 c g t g c c g c g c t t g g c c g g c t g c c c g t c g g g gccgaccgac t t g c c g t t g c t c t c c a t c t c 1141 c a g c t g c t g g aggaaggcgg ccaggcggcg caggcggtgc g a g t a c g g c a cgaccgagcg 1201 ggcccggcgg t c c a g g c c g t t g c g a t t g a a g a t c t g g g c c agggccacca gcaccggcgc 12 61 a t t c t g c t c c agcggggcgg t g c g g a a g t g c t c a t c c a t g g c c g c g c c g c c g t c c a g g a a 1321 a c c c t g g a a c g c g t c c c a g c ccgcggcgac ggccaccgaa a g g c t g a c c g acgaccacag 1381 cgaatagcgg c c g c c g a c c c a g t c c c a g a a c c c g a a c a c g c g 
a t c g t c c g gcacgccgaa 1441 ggcggcggtc t t g t c c a g c g c g g t c g a g a t ggcggccaga t g c t g a t t g g c c c c c t g c t c 1501 g c c t a g g g c c g c c a c c a g c c aggcccgcgc cgcgccggcg t t g g c c a t g g t c t c c t g g g t 1561 c g t g a a g g t c t t g g a g a c c a c c a t g a c c a g g g t c t c t t c c g g g t c c a t g t cggcggtggt. 1621 cagggcgaac t c g g c g c c g t c g a c a t t g g c gacgaagcgc a g g t c g a t c g a c g g c t t c a c 1681 c g g t c g c a g g g c g t c c c a c a gcaggcgtgg g c c c a g g t c g c t g c c g c c g a t g c c g a t g t g 1741 c a g g a t c g c c t t g a a c g g c t t g c c g g t c g c g c c c t t g a t c g c c c c c g a a c gcacggmctg 1801 cgcgaaagcc t t c a t c c g c t ggcggacggc a t c g a c c t c g g c c a t g a c c g g c t g g c c c a g 1861 g g c c t t g a c g t c c g c t c c c g ccggagcgcg c a g g n c c g t a t g c a g c a c a g c c c g g c c t t c 1921 g g a c g a a t t g a t c g c c t c g c c g t c g a a c a t gcgggcccgg g c g c c c t c g a c a t c g g c c g c 1981 gtgggccaga tcgagcgcgg c c t c g a g a c c c g c c t c a t c c c a g g c c t g c t t g g a g a g a t c 2041 caggtgcagg c c g g c g a c g t c c a g g g t c a g agcgtcaagg c g t c c c g g c t c g g c g t c g a a 2101 g a a c t c g a c g a t a c g c t t g t cgcccgcagc c t t g g c g g c g g c t t c c a g g c gggtccaggc 2161 ggcgtcgaga t c g g c c a t g t c g t c c t c a c a g g t t t g g t a a a t c g c t g t t t acggacccgg 2221 g c t t a t c a a a caggcgccgc c g c g t c a t g g c a g a c c a a t g a c g t t t t t t g cgggagcccc 2281 c g a t g t c c t g t g a c g c c g t c g c c t t t t c c g c c a t g c t c t g g a t g g c g t c g t t c a a t c c g g 2341 agcagacgac c g g c c c c g c c c t c g c 148 ","attrs":{"lang":"en","ns":"http:\/\/www.w3.org\/2009\/08\/skos-reference\/skos.html#note","classmap":"oc:AnnotationContainer"},"iri":"http:\/\/www.w3.org\/2009\/08\/skos-reference\/skos.html#note","explain":"Simple Knowledge Organisation System; Notes 
are used to provide information relating to SKOS concepts. There is no restriction on the nature of this information, e.g., it could be plain text, hypertext, or an image; it could be a definition, information about the scope of a concept, editorial information, or any other type of information."}],"Genre":[{"label":"Genre","value":"Thesis\/Dissertation","attrs":{"lang":"en","ns":"http:\/\/www.europeana.eu\/schemas\/edm\/hasType","classmap":"dpla:SourceResource","property":"edm:hasType"},"iri":"http:\/\/www.europeana.eu\/schemas\/edm\/hasType","explain":"A Europeana Data Model Property; This property relates a resource with the concepts it belongs to in a suitable type system such as MIME or any thesaurus that captures categories of objects in a given field. It does NOT capture aboutness"}],"GraduationDate":[{"label":"GraduationDate","value":"2000-05","attrs":{"lang":"en","ns":"http:\/\/vivoweb.org\/ontology\/core#dateIssued","classmap":"vivo:DateTimeValue","property":"vivo:dateIssued"},"iri":"http:\/\/vivoweb.org\/ontology\/core#dateIssued","explain":"VIVO-ISF Ontology V1.6 Property; Date Optional Time Value, DateTime+Timezone Preferred "}],"IsShownAt":[{"label":"IsShownAt","value":"10.14288\/1.0089847","attrs":{"lang":"en","ns":"http:\/\/www.europeana.eu\/schemas\/edm\/isShownAt","classmap":"edm:WebResource","property":"edm:isShownAt"},"iri":"http:\/\/www.europeana.eu\/schemas\/edm\/isShownAt","explain":"A Europeana Data Model Property; An unambiguous URL reference to the digital object on the provider\u2019s website in its full information context."}],"Language":[{"label":"Language","value":"eng","attrs":{"lang":"en","ns":"http:\/\/purl.org\/dc\/terms\/language","classmap":"dpla:SourceResource","property":"dcterms:language"},"iri":"http:\/\/purl.org\/dc\/terms\/language","explain":"A Dublin Core Terms Property; A language of the resource.; Recommended best practice is to use a controlled vocabulary such as RFC 4646 
[RFC4646]."}],"Program":[{"label":"Program","value":"Microbiology and Immunology","attrs":{"lang":"en","ns":"https:\/\/open.library.ubc.ca\/terms#degreeDiscipline","classmap":"oc:ThesisDescription","property":"oc:degreeDiscipline"},"iri":"https:\/\/open.library.ubc.ca\/terms#degreeDiscipline","explain":"UBC Open Collections Metadata Components; Local Field; Indicates the program for which the degree was granted."}],"Provider":[{"label":"Provider","value":"Vancouver : University of British Columbia Library","attrs":{"lang":"en","ns":"http:\/\/www.europeana.eu\/schemas\/edm\/provider","classmap":"ore:Aggregation","property":"edm:provider"},"iri":"http:\/\/www.europeana.eu\/schemas\/edm\/provider","explain":"A Europeana Data Model Property; The name or identifier of the organization who delivers data directly to an aggregation service (e.g. Europeana)"}],"Publisher":[{"label":"Publisher","value":"University of British Columbia","attrs":{"lang":"en","ns":"http:\/\/purl.org\/dc\/terms\/publisher","classmap":"dpla:SourceResource","property":"dcterms:publisher"},"iri":"http:\/\/purl.org\/dc\/terms\/publisher","explain":"A Dublin Core Terms Property; An entity responsible for making the resource available.; Examples of a Publisher include a person, an organization, or a service."}],"Rights":[{"label":"Rights","value":"For non-commercial purposes only, such as research, private study and education. 
Additional conditions apply, see Terms of Use https:\/\/open.library.ubc.ca\/terms_of_use.","attrs":{"lang":"en","ns":"http:\/\/purl.org\/dc\/terms\/rights","classmap":"edm:WebResource","property":"dcterms:rights"},"iri":"http:\/\/purl.org\/dc\/terms\/rights","explain":"A Dublin Core Terms Property; Information about rights held in and over the resource.; Typically, rights information includes a statement about various property rights associated with the resource, including intellectual property rights."}],"ScholarlyLevel":[{"label":"ScholarlyLevel","value":"Graduate","attrs":{"lang":"en","ns":"https:\/\/open.library.ubc.ca\/terms#scholarLevel","classmap":"oc:PublicationDescription","property":"oc:scholarLevel"},"iri":"https:\/\/open.library.ubc.ca\/terms#scholarLevel","explain":"UBC Open Collections Metadata Components; Local Field; Identifies the scholarly level of the author(s)\/creator(s)."}],"Title":[{"label":"Title","value":"Analysis of the s-layer transporter mechanism and smooth lipopolysaccharide synthesis in caulobacter crescentus","attrs":{"lang":"en","ns":"http:\/\/purl.org\/dc\/terms\/title","classmap":"dpla:SourceResource","property":"dcterms:title"},"iri":"http:\/\/purl.org\/dc\/terms\/title","explain":"A Dublin Core Terms Property; The name given to the resource."}],"Type":[{"label":"Type","value":"Text","attrs":{"lang":"en","ns":"http:\/\/purl.org\/dc\/terms\/type","classmap":"dpla:SourceResource","property":"dcterms:type"},"iri":"http:\/\/purl.org\/dc\/terms\/type","explain":"A Dublin Core Terms Property; The nature or genre of the resource.; Recommended best practice is to use a controlled vocabulary such as the DCMI Type Vocabulary [DCMITYPE]. 
To describe the file format, physical medium, or dimensions of the resource, use the Format element."}],"URI":[{"label":"URI","value":"http:\/\/hdl.handle.net\/2429\/11365","attrs":{"lang":"en","ns":"https:\/\/open.library.ubc.ca\/terms#identifierURI","classmap":"oc:PublicationDescription","property":"oc:identifierURI"},"iri":"https:\/\/open.library.ubc.ca\/terms#identifierURI","explain":"UBC Open Collections Metadata Components; Local Field; Indicates the handle for item record."}],"SortDate":[{"label":"Sort Date","value":"2000-12-31 AD","attrs":{"lang":"en","ns":"http:\/\/purl.org\/dc\/terms\/date","classmap":"oc:InternalResource","property":"dcterms:date"},"iri":"http:\/\/purl.org\/dc\/terms\/date","explain":"A Dublin Core Elements Property; A point or period of time associated with an event in the lifecycle of the resource.; Date may be used to express temporal information at any level of granularity. Recommended best practice is to use an encoding scheme, such as the W3CDTF profile of ISO 8601 [W3CDTF]."}]}