UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

The organization, expression, function and evolution of some essential genes from the hyperthermophilic… Liao, Daiqing 1993

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata

Download

Media
831-ubc_1993_fall_liao_daiqing.pdf [ 7.84MB ]
Metadata
JSON: 831-1.0098821.json
JSON-LD: 831-1.0098821-ld.json
RDF/XML (Pretty): 831-1.0098821-rdf.xml
RDF/JSON: 831-1.0098821-rdf.json
Turtle: 831-1.0098821-turtle.txt
N-Triples: 831-1.0098821-rdf-ntriples.txt
Original Record: 831-1.0098821-source.json
Full Text
831-1.0098821-fulltext.txt
Citation
831-1.0098821.ris

Full Text

THE ORGANIZATION, EXPRESSION, FUNCTION AND EVOLUTION OF SOME ESSENTIAL GENES FROM THE HYPERTHERMOPHILIC EUBACTERIUM THERMOTOGA MARITIMA  by  DAIQING LIAO M.Sc., Peking University, 1986 B.Sc., Hunan University, 1983 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES DEPARTMENT OF BIOCHEMISTRY AND MOLECULAR BIOLOGY We accept this thesis as conforming to the required standard  THE UNIVERSITY OF BRITISH COLUMBIA JUNE, 1993  ©Daiqing Liao, 1993  141  National Library^Bibliothéque nationale of Canada^ du Canada Acquisitions and Bibliographic Services Branch  Direction des acquisitions et des services bibliographiques  395 Wellington Street Ottawa, Ontario K1A ON4  395, rue Wellington Ottawa (Ontario) KlA ON4  Your file Votre reference  Our file Notre reference  The author has granted an irrevocable non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of his/her thesis by any means and in any form or format, making this thesis available to interested persons.  L'auteur a accorde une licence irrevocable et non exclusive permettant a Ia Bibliothêque nationale du Canada de reproduire, preter, distribuer ou vendre des copies de sa these de quelque maniere et sous quelque forme que ce soit pour mettre des exemplaires de cette these a la disposition des personnes interessees.  The author retains ownership of the copyright in his/her thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without his/her permission.  L'auteur conserve Ia propriete du droit d'auteur qui protege sa these. Ni la these ni des extraits substantiels de celle-ci ne doivent etre imprimes ou autrement reproduits sans son autorisation.  ISBN 0-315-85403-0  Canada!  In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.  (Signature)  Department of  RroCileivit  The University of British Columbia Vancouver, Canada  Date  ^  DE-6 (2/88)  (727-, iqq3  ii  Abstract The hyperthermophilic eubacterium Thermotoga maritima grows optimally at 80°C near marine geothermal locales. Phylogenetic analyses based on various molecular sequences indicate that T. maritima and other hyperthermophilic prokaryotes have very deep phylogenetic placements; i.e., that they have diverged early from the ancestor of living organisms. Thus, studies on the biochemistry and molecular biology of hyperthermophilic organisms such as T. maritima may shed light on the early evolution of life, as well as enhance our understanding of life at high temperature. In this study, a 5,800-base-pair DNA fragment from the chromosome of T. maritima was cloned and sequenced. This fragment encodes five tRNAs, the ribosomal protein L33, an integral membrane protein, SecE, which is probably involved in protein translocation, the transcription factor NusG, four large subunit ribosomal proteins (L11, L1, L10 and L12), and the N-terminus of the RNA polymerase 0 subunit. The transcriptional patterns of this gene cluster were analyzed using S1 nuclease protection and primer extension techniques. The tRNA genes and the protein-encoding genes are cotranscribed, except the 13 gene, which is transcribed separately. The following regulatory sequence elements were identified in this cloned fragment: five promoters (Pi and P2 in front of the first and second methionine tRNAs, respectively, PLio in the L1-L10 intergenic space, PL12 at the end of the L10 gene, and PR in the L12-(3 intergenic region), a transcription attenuator upstream of the L10 gene, a transcription terminator located between the L12 and the (3 subunit gene of the RNA polymerase, and an autogenous translational regulation site (the Ll binding site) located upstream of the L11 gene. The transcription factor NusG encoded in this cluster exhibits 43% amino acid sequence identity when aligned to its E. coli counterpart; the alignment is interrupted by a 171-amino-acid-long insertion into the T. maritima protein. The  iii T. maritima NusG was overexpressed in E. coli, and the recombinant NusG protein  was purified. The NusG protein binds to DNA cooperatively, but nonspecifically. Two types of NusG-DNA complexes have been observed. The first type forms instantly and can be stained with ethidium bromide ("loose" complex); the second type forms more slowly, and is probably converted from the loose structure(s). The second type is probably more compact, as it can not be stained with ethidium bromide ("tight" complex). This protein binds to both ds- and ssDNA, but preferentially to dsDNA in a mixture of both DNA molecules. About 40 and 60 NusG monomers per kilobases (pairs) of ds- and ssDNA, respectively, are required to form cooperative NusG-DNA complexes. When a relatively large amount of NusG was added to an in vitro transcription assay, it appears to selectively suppress aberrant transcription initiation and termination, and at the same time, the production of specific transcripts is, at most, only marginally reduced. Available sequences that correspond to the E. coli ribosomal proteins L11, L1, L10 and L12 from eubacteria, archaebacteria and eukaryotes have been aligned, and the alignments were subjected to quantitative phylogenetic analysis. Eubacteria and eukaryotes each form a well-defined, coherent and non-overlapping group. Archaebacteria also form a coherent phylogenetic group by themselves, but the relationships between the major groups of archaebacteria and outgroups (eubacteria and eukaryotes) can not unambiguously be established. On the other hand, T. maritima does not appear as the deepest branch within the eubacterial kingdom;  however, this placement is less definitive.  iv  Table of Contents Abstract^  ii  Table of Contents ^  iv  List of Tables ^  viii  List of Figures ^  ix  Abbreviations ^  xi  Acknowledgments ^  xiii  I. Introduction ^  1  1.1 The origin of life ^  1  1.1.1 Prebiotic synthesis of the building blocks of life ^ 1 1.1.2 The living world before the first cell ^  3  1.1.3 The first cell ^  4  1.2 Hyperthermophilic prokaryotes ^  7  1.2.2 General properties of hyperthermophiles ^ 7 1.2.2 The molecular basis of thermophily ^  10  1.2.3 Phylogeny ^  15  1.3 The Ribosomes ^  16  1.4 Transcription ^  19  1.5 Objectives of this study ^  21  II. Materials and methods ^  23  2.1 Materials ^  23  2.2 Bacterial strains, plasmids, and media ^  24  V 2.3 Molecular biological techniques ^ 2.3.1 Gel electrophoresis ^  24 25  2.3.2 DNA restriction fragment preparation ^ 25 2.3.3 Ligation ^  26  2.3.4 Transformation ^  26  2.3.5 5' and 3' end-labeling of DNA fragments ^27 2.3.6 5' end-labeling of oligonucleotides ^  27  2.3.7 Labeling DNA probes by random priming method ^ 28 2.3.8 Southern blot hybridization and cloning procedures ^28 2.3.9 DNA sequencing ^  29  2.3.10 RNA transcript analysis ^  29  2.3.11 Northern blotting ^  32  2.4 Expression of T. maritima NusG in E. coli^  33  2.5 Purification of NusG ^  34  2.6 Immunization procedure ^  35  2.7 Enzyme-linked immunosorbent assay (ELISA) and Western blotting ^  36  2.8 DNA band-shift assays ^  37  2.9^Isolation of T. maritima ribosomes ^  38  2.10 In vitro transcription ^  39  2.11 Molecular sequences ^  40  2.12 Sequence alignments ^  40  2.13 Phylogenetic reconstruction ^  43  III. The organization and expression of essential transcription, translation component genes in the hyperthermophilic eubacterium Thermotoga maritima  44  vi 3.1 Introduction ^  44  3.2 Results and discussion ^  46  3.2.1 The tRNA gene cluster ^  53  3.2.2 tRNA processing ^  56  3.2.3 Characterization of transcripts derived from proteinencoding genes ^  60  3.2.4 mRNA secondary structure and function ^ 64 3.2.5 Transcription and translation initiation signals ^70 3.2.6 Protein homologies ^  71  3.2.7 Evolutionary implications ^  79  3.3 Summary ^  79  IV. The functions of the T. maritima NusG protein: DNA-binding activity and its role in transcription ^  82  4.1 Introduction ^  82  4.2 Results ^  84  4.2.1 Binding activity of NusG to duplex DNA ^ 86 4.2.2 Accessibility of the NusG-DNA complex to restriction by Taql ^  90  4.2.3 Binding activity of NusG to single-stranded DNA ^91 4.2.4 Competition between single-stranded and duplex DNA for NusG binding ^ 4.2.5 Role of NusG in transcription ^  94 94  4.2.6 Association of NusG with ribosomes in T. maritima ^ 100 4.3 Discussion ^  100  4.4 Summary ^  104  vii  V.  Molecular phylogenies based on the sequences of ribosomal proteins L11, L1, L10 and L12 ^  106  5.1 Introduction ^  106  5.2 Results and Discussion ^  107  5.2.1 Alignment and phylogeny of L11 proteins ^ 107 5.2.2 Alignment and phylogeny of L1 protein sequences ^ 109 5.2.3 The sequence alignments and phylogeny of L10 proteins ^ 114 5.2.4 The sequence alignments and phylogeny of L12 proteins ^ 120 5.2.5 Phylogenetic considerations ^ 5.3 Summary ^  128 130  VI. Conclusion and prospects ^  132  VII. References ^  134  viii  List of Tables Table 1 Taxonomy of hyperthermophilic prokaryotes ^  11  Table 2 E. coli strains used for cloning ^  25  Table 3 Oligonucleotides ^  30  Table 4 Duplex DNA fragments used for DNA band-shift assays ^ 38 Table 5 Organisms and their abbreviations from which the sequences of the ribosomal proteins L11, L1, L10 and L12 are available ^ 41 Table 6 T. maritima and E. coli protein homologies ^  73  ix List of Figures  Figure 1 A possible scheme for early evolution ^ Figure 2 A universal phylogenetic tree ^  8 17  Figure 3 Structure and organization of the L11, L1, L10 and L12 encoding regions from E. coli and T. maritima genomes ^ 47 Figure 4 Nucleotide sequence of the T. maritima 5.8 kb EcoRI genomic fragment^ Figure 5 Structure and processing of tRNAs ^  49 54  Figure 6 Mapping of transcript end sites in the tRNA-nusG region ^58 Figure 7 Characterization of transcripts from the SecE, NusG and ribosomal protein encoding genes ^  61  Figure 8 Structure and features of E. coli and T. maritima RNA transcripts ^ 65 Figure 9 Transcription initiation and translation initiation elements in T. maritima ^  72  Figure 10 Alignment of SecE protein sequences ^  75  Figure 11 Alignment of NusG protein sequences ^  77  Figure 12 Overexpression and purification of the T. maritima NusG protein ^85 Figure 13 The binding properties of NusG to linear duplex DNA ^87 Figure 14 The stoichiometry of NusG:duplex DNA complexes ^89 Figure 15 Susceptibility of NusG:duplex DNA complexes to restriction by Taql ^  92  Figure 16 The stoichiometry of NusG:single-stranded DNA complexes ^93 Figure 17 Competition between single-stranded and duplex DNA for NusG binding ^  95  Figure 18 In vitro transcription of DNA template containing a promoter and a terminator ^  97  x Figure 19 In vitro transcription of DNA template containing a promoter, a  terminator, and an attenuator ^  99  Figure 20 Alignment of the amino acid sequences of ribosomal protein L11 family, and the phylogenetic tree based on this alignment ^ 110 Figure 21 Alignment of the amino acid sequences of ribosomal protein L1 family, and the phylogenetic tree based on this alignment ^ 112  Figure 22 Alignment of the amino acid sequences of ribosomal protein L10 family, and the phylogenetic tree based on this alignment ^ 117 Figure 23 Alignment of the amino acid sequences of ribosomal protein L12 family ^  122  Figure 24 Phylogenetic tree inferred from the aligned L12 amino acid sequences ^  126  xi  Abbreviations A^Adenosine A600^Absorbance at 600 nm ATP^Adenosine 5'-triphosphate by^Base pair BSA^Bovine serum albumin C^Cytosine dATP^2'-deoxyadenosine-5'-triphosphate dCTP^21-deoxycytidine-5'-triphosphate ddATP^2',3'-dideoxyadenosine-5'-triphosphate ddCTP^2',3'-dideoxycytidine-5'-triphosphate ddGTP^2',3'-dideoxyguanosine-5'-triphosphate ddTTP^2',3'-dideoxythymidine-5'-triphosphate ddNTP^2',3'-dideoxyribonucleotide-5'-triphosphate (ddATP, ddCTP, ddGTP and ddTTP) dITP^2'-deoxyinosine-5'-triphosphate DNA^Deoxyribonucleic acid DNase^Deoxyribonuclease dNTP^2'-deoxyribonucleotide-5'-triphosphate (dATP, dCTP, dGTP and d I fP) ds^Double-stranded DTT^Dithiothreitol dTTP^2'-deoxythymidine-5'-triphosphate EDTA^Ethylenediamine tetraacetic acid G^Guanosine GTP^Guanosine-5'-triphosphate  xii IPTG^Isopropyl-13-D-thiogalactopyranoside kbp^Kilobase pairs kd^Kilodaltons MOPS^Morpholinopropane sulfonic acid m RN A^Messenger RNA NTP^Ribonucleotide-5'-triphosphate (ATP, CTP, GTP and UP) PAGE^Polyacrylamide gel electrophoresis PBS^Phosphate-buffered saline PCR^Polymerase chain reaction PIPES^1,4-piperazine-N,N1-bis[2-ethane sulfonic acid] RNA^Ribonucleic acid RNa se^Ribonuclea se rpm^Revolutions per minute rRNA^Ribosomal RNA S^Svedberg unit of sedimentation coefficient SDS^Sodium dodecyl sulphate ss^Single-stranded T^Thymidine Tmax^Maximum growth temperature Tris^Trihydroxymethylaminomethane tRNA^Transfer RNA X-gal^5-bromo-4-chloro-3-indoly1-13-D-galactoside  Acknowledgments I would like to thank Dr. Patrick Dennis for providing me the opportunity to work in his lab, and for his guidance, encouragement throughout this study. I am also indebted to Drs. Philip Bragg and Ross MacGillivray for their advice and support. I thank all my fellow workers in the Dennis's lab for their friendship and cooperation. Especially, I am grateful to Luc, Simon and Steve for critical reading this thesis. Many scientists have provided generous assistance. Dr. Wolfgang Zillig sent us T. maritima cells and the plasmid pUC-TB4. Dr. Peter Palm shared with us the purified DNA-dependent RNA polymerase from T. maritima. Dr. Jack Greenblatt provided purified E. coli NusG and antisera against it. Drs. Karl Stetter, Alap Subramanian, Roland Hartmann communicated manuscripts and data prior to publication. The Alfred Sloan Foundation awarded a fellowship to allow me to attend the third UCLA International School on Molecular Evolution. Most of all, I thank my wife Lisa for her unconditional support and enduring many lonely evenings and weekends with our daughter, Jennifer, while I was working in the lab. To them—Jennifer and Lisa—I am forever indebted, for making all this worthwhile. This thesis is dedicated to my parents, Mr. Wanjin Liao and Mrs. Liuliang Zhu.  1  I.^Introduction 1.1 THE ORIGIN OF LIFE 1.1.1 Prebiotic synthesis of the building blocks of life  According to current thinking, the solar system formed about 4.6 billion years ago from a cloud of gas and interstellar dust. The Earth was formed from one of the tiny planetesimals that were condensed from some of the cloud that did not fall into the Sun during the genesis of the solar system. The primitive atmosphere of the Earth resulted from outgassing of various gases from the interior of the Earth. Methane (CH4), ammonia (NH3), water (H20), and hydrogen (H2) were probably among the major constituents of the primitive atmosphere. Under the primitive Earth conditions (frequent lightning and volcanic eruptions), these gases reacted with each other, resulting in the prebiotic syntheses of amino acids, nucleic acid bases and sugars—the essential building blocks of life. These small molecules were probably assembled into larger molecules, such as nucleic acid precursors, short peptides and nucleic acids, by condensation reactions (for recent reviews, see Weiner, 1987; Pace, 1991). Laboratory experiments, which were thought to simulate primitive Earth conditions, demonstrated that many amino acids, hydrogen cyanide (HCN) and various aldehydes (RCHO) could be formed by refluxing a mixture of CH4, NH3, H2O and H2, and passing high voltage sparks through the gaseous phase. Further work showed that (i) refluxing a concentrated solution of HCN in ammonia led to the synthesis of nucleic acid base adenine; (ii) uracil can also be synthesized from HCN; and (iii) polymerization of formaldehyde (HCHO) can give rise to various sugars (reviewed by Weiner, 1987).  2 The conditions and pathways of prebiotic syntheses remain controversial. First, the exact conditions prevailing on the primitive Earth are still a matter of speculation. A more recent model of the early atmosphere suggested that the primitive atmosphere contained primarily CO2, CO and N2; that the surface temperature under such an atmosphere would have been 85°C, and that CH4 and NH3 were probably scarce (reviewed by Kasting, 1993). When life originated about 3.5 billion years ago, the atmosphere was likely to have been weakly reducing (mainly CO2 and N2, with traces of CO, H2, and reduced sulfur gases). In such an atmosphere, formaldehyde could still have been synthesized efficiently (Pinto et al., 1980), but formation of hydrogen cyanide (HCN), which is essential for syntheses of amino acids and nucleotides, would have been much more difficult (reviewed by Kasting, 1993). Many current theories regarding the origin of life assume that the primitive atmosphere was reducing. Explaining how HCN could have formed is, therefore, one of the major hurdles for these theories. However, prebiotic reactions could still have taken place in anaerobic locales, such as submarine hydrothermal vents, under oxidizing conditions (Wachtershauser, 1992). Alternatively, the biological precursor molecules could have been introduced by impacting comets or other planetesimals (Chyba et al., 1990). Secondly, many of the proposed prebiotic reaction pathways are mutually incompatible. For example, formation of various sugars requires high concentrations of formaldehyde in alkaline solution, but under such conditions, formaldehyde would react rapidly with the amino group of both nucleic acid bases and amino acids (Shapiro, 1988). Nonetheless, regional variations in conditions (temperature, pH, chemical composition, etc.) might have allowed incompatible reactions to occur in different locales and at different times on the primitive Earth.  3 1.1.2 The living world before the first cell  Prebiotic syntheses created a collection of basic building blocks of life and probably also short peptides and nucleic acids. Curiously, ribose is much more readily synthesized than deoxyribose under simulated prebiotic conditions (reviewed by Weiner, 1987). This may be the first indication of the primacy of RNA in the origin of life. The chemical properties of RNA seem to suit it to play central roles in the early history of life. First, it has inherent template properties, enabling it to self-replicate. Secondly, RNA can catalyze chemical reactions as illustrated by the self-splicing group I and II introns, the RNA component of RNase P, and the peptidyl transferase activity of the ribosomal RNA (rRNA) component of ribosomes (Noller et al., 1992). Most recently, it was shown that an engineered ribozyme is able to function effectively both as a catalyst and a template in selfcopying reactions (Green and Szostak, 1992). These and other key properties of RNA have led to the hypothesis that life was based on RNA during early stages of evolution and that DNA supplanted RNA as the dominant genetic material "relatively late" in the history of life (Crick, 1969; Gilbert, 1986). Naturally, because of the tremendous task of maintaining all metabolic reactions within a modern cell, it seems to be impossible to imagine that RNA catalysis alone would be enough to carry out the transition from an RNA world to a DNA world. Thus it seems likely that encoded protein synthesis evolved in an RNA world and preceded the advent of DNA. How protein synthesis could have evolved in an RNA world was addressed in the "genomic tag model" of Weiner and Maizels (1987 and 1991). In this model, it was proposed that tRNAs would have been derived from the 3' terminal structures that tagged RNA genomes for replication by a replicase made of RNA; charging of this tRNA-like structure with an amino acid could have been selected to facilitate replication, whereas a variant RNA replicase may have given rise to the first tRNA synthetase that ran transfer a  charged amino acid to a 3' terminal tRNA-like structure. Thus, this model makes it  4 possible to select for two key components of the modern translation apparatus, tRNA and a tRNA synthetase, for reasons of replication. From this, it is not difficult to imagine any number of scenarios by which encoded protein synthesis might have arisen. In this way, the RNA world would gradually give rise to a ribonucleoprotein or RNP world early in evolution. In the RNP world, RNA might have retained essential catalytic activities, while proteins would play structural roles and enhance catalytic efficiency. Other proteins might have gained novel catalytic capabilities. Recent experimental observations reveal that ribosomal RNA may catalyze peptide-bond formation during protein synthesis (Noller et al., 1992), and that an engineered Tetrahymena ribozyme can catalyze the hydrolysis of an aminoacylester bond (Piccirilli et al., 1992). These results seem to be consistent with the "genomic tag model" for the origin of protein synthesis. The ribose 2'-OH group that aids RNA catalysis also renders the RNA chain particularly vulnerable to hydrolysis. As the complexity of the living systems increased, RNA became less suitable as the genetic material. Ribonucleoside diphosphate reductase was the key enzyme that made the transition to the DNA world possible. In the DNA world, right-handed double helix DNA serves as the genetic material, and some ribonucleoprotein complexes have been retained as important cellular components, such as the ribosomes and RNase P. 1.1.3 The first cell  Quantitative phylogenetic analyses of the sequences of many proteins and nucleic acids (RNA and DNA), especially small subunit ribosomal RNA, have indicated that living organisms evolved from a common primordial ancestor (not necessarily a cellular entity), and subsequently divided into three primary kingdoms: the eubacteria, the archaebacteria and the eukaryotes (Pace, et al., 1986; Woese and Olsen, 1986; Woese et al., 1990; Pace, 1991). (Woese et al. proposed that  the three kingdoms should be renamed as three "domains:" the Bacteria for  5 eubacteria, the Archaea for archaebacteria, and the Eucarya for eukaryotes [Woese et al., 1990; Wheelis et al., 1992].) The common ancestor of all modern life, dubbed the  progenote, may have come into being after a crude translation mechanism and a nucleic acid-based genetic system were devised. The macromolecules were most likely already enclosed within a lipid membrane at this stage of evolution (Woese, 1987). Because the differences in cell architecture and many molecular characteristics are profound among the three primary kingdoms, and the fundamental cellular functions (translation, transcription and replication and so on) seem unique in each line of descent, it is plausible that the common ancestor that led to the ancestors of archaebacteria, eubacteria and eukaryotes must have been very different from any modern organisms. Woese (1987) speculated that the progenote may have lacked most of the functions characteristic of the cells known today, and that its rudimentary machinery would have undergone significant refinement and augmentation in the descendant lineages. Studies on microfossils of bacteria and stromatolites (laminated mounds with structures of a type often associated with mats of microorganisms) suggested that bacteria-like cells already existed 3.5 billion years ago (Walter et al., 1980; Schopf and Walter, 1983; Knoll and Barghoon, 1985; Schopf, 1993). Photosynthetic eubacteria were among these ancient cells (Walter, 1983), which implies that the common eubacterial ancestor existed even earlier. Thus, the common ancestor would probably have evolved rather quickly into the ancestors of modern organisms in less than 1 billion years. The question of the nature of the first cell may be the most difficult in biology. Nonetheless, insights into this have been emerging from evolutionary studies of modern organisms. Since archaebacterial rRNAs have on average a substantially more slowly evolving rate than those of either eukaryotes or eubacteria, it was proposed that archaebacteria are more closely related to the common ancestor than the other two groups (Woese, 1987; Pace, 1991). A group of  6 hyperthermophilic archaebacteria such as Pyrodictium have evolved particularly slowly; thus they may be the closest living relatives of the first organisms. These hyperthermophiles share some unique properties, such as extremely high growthtemperatures (about 100°C), utilization of geochemical energy sources (e.g. sulfur, molecular hydrogen), and fixation of CO2 (reviewed by Pace, 1991; see below). Along this conjecture, it was further proposed recently that the first cells were probably chemo-autotrophic, thriving in a thermophilic and anaerobic environment containing iron-sulfur compounds (pyrite) (W5chtershauser, 1992). The sole reducing power for fixation of CO2 could have been provided by oxidative formation of pyrite (FeS2) from hydrogen sulfide (H2S) and ferrous (Fe 2 +) ion. Such an environment would also allow the propagation and accumulation of biological compounds, in which water activity is low; thus, macromolecular chemistry could take place (reviewed by Pace, 1991). In contrast, Cavalier-Smith (1991) argued that the first cells were probably gram-negative photoheterotrophic eubacteria with an outer membrane surrounding their plasma membrane. Archaebacteria and eukaryotes may have evolved from a Thermotoga-like eubacterium after it lost the murein-based cell wall. The archaebacterial ancestor evolved a new cell wall and isoprenoidal ether lipids, while the eukaryotic cells arose from this mutant eubacterium and evolved an internal cytoskeleton based on actin microtubules, which would allow the formation and evolution of endomembranes and nuclei. It is highly unlikely that these debates about what were the phenotypes of the first cells can be settled soon. Nonetheless, it is now generally accepted that life on Earth evolved from a single ancestor, which is represented by the root in a universal phylogeny. The position of the root appears to fall closer to the eubacterial lineage, while the archaebacterial and eukaryotic lineages might arise by a second split after their common ancestor evolved from the primitive ancestor (Gorgarten et al., 1989; Iwabe et al., 1989). However, the position of the root and the  7 origins of all three lineages, especially the eukaryotes, remain unresolved. It is possible that eukaryotes may have evolved from a branch within the archaebacterial lineages; one contention argues that "eocytes" (a group of hyperthermophilic archaebacteria, e.g. Sulfolobus) are the closest relatives of eukaryotes (Lake, 1988; Rivera and Lake, 1992). Figure 1 summarizes a possible scheme for early evolution.  1.2 HYPERTHERMOPHILIC PROKARYOTES 1.2.2 General properties of hyperthermophiles  Organisms with a maximal growth temperature (T max ) greater than 50°C are defined as thermophiles (Brock, 1986), while organisms with an optimal growth temperature around 80°C are generally classified as hyperthermophiles (Kristjansson and Stetter, 1992). Thermophilic microorganisms have been recognized since the late nineteenth century. Early research on thermophiles mostly centered on several species of Bacillus and other thermophilic eubacteria which were isolated from a wide range of both thermophilic and mesophilic (temperature lower than 50°C) environments. These organisms can grow at temperatures as high as 70°C. The recent discovery of many extremely thermophilic microorganisms with optimal growth temperatures above 80°C has brought much interest into the field of thermophilic research. Most of these extremely thermophilic organisms (or hyperthermophiles) were isolated from hot springs, solfataras, geothermal soils and submarine vents. Such hyperthermophiles are chiefly anaerobic chemolithoautotrophs and heterotrophs. Molecular hydrogen (H2) is often used as the main source of energy by some chemolithoautotrophs. It was discovered that most of the hyperthermophiles belong to the archaebacterial lineage. The hyperthermophiles in the archaebacterial lineage include some of the methanogens  8 EVOLUTIONARY STAGE  MOLECULAR AND CELLULAR EVENTS  PREBIOTIC^ SYNTHESIS^  SYNTHESIS OF ESSENTIAL BUILDING BLOCKS (Amino acids, bases, sugars, nucleosides, nucleotides, fatty acids, cofactors)  1  CONDENSATION OF BUILDING BLOCKS (Oligonucleotides, oligopeptides, lipids)  EVOLUTION OF PROGENOTE  Advent of membranes  FIRST RNA REPLICASE RNA world  RNA GENOMES (Distinction between genomic and functional RNA molecules? RNA splicing? Primitive metabolism?)  PEPTIDE-SPECIFIC RIBOSOMES DEFINE GENETIC CODE (Primitive IRNAs, rRNAs, aminoacyl-tRNA synthetases)  1  TEMPLATE-DEPENDENT TRANSLATION APPARATUS  RNP world  (True mRNAs)  DNA world  V  CELLULAR^ AND ORGANISMAL^ EVOLUTION^  TRANSCRIPTION AND REPLICATION OF SEGMENTED DOUBLE-STRANDED GENOMES  r  4, RNA GENOMES COPIED INTO DNA (Ribonucleoside diphosphate reductase, reverse transcriptase)  3,  PROGENOTE (DNA genome, thymidylate synthase, most genes have introns, slow growth, heterotrophic)  Common ancestor of archaebacteria and eukaryotes  Common ancestor of eubacteria (probably hyperthermophilic)  (Probably hyperthermophilic)  Selection for efficient growth, autotrophism, extra DNA lost  Selection for complexity, inefficient growth extra DNA tolerated  1  URKARYOTES  EUBACTERIA  Hyperthermophiles  Oxidative phosphorylation, photosynthesis  (Sulfur-metabolizing thermoacidophiles etc.)  (Eukaryotic nuclear lineage)  4,  ARCHAEBACTERIA  Methanogens and Halophiles MITOCHONDRIA^  SINGLE-CELLED EUKAROTES Multicellularity ^  CHLOROPLASTS  ANIMALS^PLANTS Introns retained, Introns retained, both heterotrophic^heterotrophic and autotrophic  •  Some introns retained in tRNA, rRNA and some protein genes  Figure 1 A possible scheme for early evolution (Modified after Darnell and Doolittle [1986].)  Most introns lost, Operons arise  9 and sulfur-metabolizing thermophiles. The extreme halophiles are usually mesophilic with only one species that can grow up to 55°C (Grant and Larsen, 1989). The hyperthermophilic methanogens such as Methanothermus fervidus are neutrophilic, strictly anaerobic autotrophs, growing on hydrogen and CO2 (Stetter et al., 1981). Sulfur-metabolizing archaebacteria are metabolically very diverse. There  are thermoacidophiles, which are obligate or facultative aerobes, growing optimally at pH 2 to 3. The thermoneutrophiles are usually obligate anaerobes, and grow between pH 5.5 and 7.0; some of them are the most thermophilic organisms known (T max =110°C) (Kristjansson and Stetter, 1992; Stetter, 1993). Sulfur-metabolizing hyperthermophiles have small, circular chromosomes. For example, the sizes of the circular chromosomes of Sulfolobus acidocaldarius and Thermococcus celer are 3.1 and 2.0 megabase pairs respectively (Noll, 1989; Yamagichi and Oshima, 1990). Seven species of hyperthermophilic eubacteria have been isolated (Huber and Stetter, 1992, Huber et al., 1992). Among them, six species are in the order Thermotogales. Some, such as Thermotoga neapolitana and Thermotoga thermarum, were isolated from terrestrial neutral hot springs, whilst other species  thrive in marine hydrothermal fields. Thermotoga maritima was the first hyperthermophilic eubacterium to be isolated, originally from a geothermal marine sediment at Vulcano, Italy, and subsequently within many marine hightemperature ecosystems around the world (reviewed by Huber and Stetter, 1992). It is a fermentative heterotroph, grows anaerobically at temperatures up to 90°C, and utilizes a wide range of carbon sources, such as ribose, glucose, xylose, cellulose, starch and glycogen. The organisms in the order Thermotogales are gram-negative, rod-shaped bacteria, which are about 2 to 5 pm in length and 0.5 to 0.6 pm in diameter. These eubacteria grow singly or in pairs; some form short chains and aggregates. Cells of both T. maritima and T. neapolitana show a characteristic "toga," a sheath - like  10 outer envelope, ballooning over the ends. The main protein constituent of the toga is a porin (reviewed by Huber and Stetter, 1992). Members of Thermotogales show some distinctive biochemical characteristics. All of them contain rare long-chain dicarboxylic acids, and the species in the genus Thermotoga contain a novel ether-lipid, 15,16-dimethy1-30glyceryloxytriacontanoic acid (De Rosa et al., 1988). Novel glycolipids were also identified in T. maritima cells (Monca et al., 1992). Both T. maritima and T. neapolitana are insensitive to the antibiotic rifampicin, although their RNA  polymerase is of the eubacterial type (Huber et al., 1986; Huber and Stetter, 1992). The ribosomes of T. maritima are insensitive to the aminoglycoside antibiotics (streptomycin, kanamycin, gentamycin, neomycin and paromomycin) (Londei et al., 1988).  In addition, the hyperthermophilic eubacterium Aquifex pyrophilus was identified recently in a hot marine sediment. It is strictly chemolithoautotrophic, and uses H2, thiosulfate, and elemental sulfur as electron donors, and oxygen and nitrate as electron acceptors (Huber et al., 1992). The currently known hyperthermophilic eubacteria and archaebacteria are summarized in Table 1. 1.2.2^The molecular basis of thermophily  Contemporary mesophilic organisms cannot survive at high temperatures, because their proteins are denatured upon exposure to heat. The consequences of heat damage to proteins include the unfolding of tertiary structures. Chemical modifications, such as deamination of asparagine and glutamine, hydrolysis of aspartic acid- and asparagine-containing peptide-bonds, as well as destruction of cystine bonds, also occur (Hensel, et al., 1992). Thus it is extraordinary that thermophilic microorganisms not only can survive, but can also grow and multiply at high temperatures. This has led to consiciprahJe interest in the  molecular bases of thermophily. Initially, the protein sequences from thermophilic  11  Table 1^Taxonomy of hyperthermophilic prokaryotes  Order  -  Species  Tmax (°C)  References  T. maritima T.^neapolitana  90 90  Thermotogales  T.^thermarum  84  "Aquificiales"  Thermosipho T. africanus* Fervidobacterium F. nodosum F.^islandicum Aquifex A. pyrophilus  77 80 80 95  Huber et al. (1986) Windberger et al. (1989) Windberger et al. (1989) Huber et al. (1989b) Patel et al. (1985) Huber et al. (1990) Huber et al. (1992b)  Sulfolobus  85  Brock et al. (1972)  87 86 75 85  Zillig et al. (1980) Grogan et al. (1990) Huber et al. (1991) Stetter (1993)  80 95 75  Huber et al. (1989a) Segerer et al. (1986) Brierley et al. (1983) Zillig et al. (1987) Segerer et al. (1991) Zillig et al. (1981) Stetter (1986)  Genus  I. Eubacteria Thermotoga  II. Archaebacteria  S. acidocaldarius S. solfataricus S. shibatae S.^metallicus* S. thurin giensis M. sedula A. infernus A. brierleyi* -  Sulfolobales  Metallosphaera Acidianus Desulfurolobus Stygiolobus Thermoproteus  D. ambivalens S. azoricus T. tenax T. neutro philus T.^uzoniensis  95 89 97 97  P. islandicum P. organo trophum T. pendens  103 103  BonchOsmolovskaya et al. (1990) Huber et al. (1987) Huber et al. (1987)  95 95  Zillig et al. (1983) Stetter (1986)  -  Thermoproteales Pyrobaculum  -  Thermofilum  T. librum  97  12  Table 1 (continued)  Desulfurococcus  D. mobilis D. mucosus D. saccha rovorans D. amylolyticus -  Desulfurococcales  Pyrodictiales  Staphylothermus S. marinus Pyrodictium P. occultum P. brockii P. abyssi Hyperthermus H. butylicus Thermodiscus T.^maritimus Thermococcus T. celer T.^litoralis T. stetteri  Thermococcales  Archaeoglobus "Archaeoglobales"  T. acidamino vorans T. tadjuricus P. furiosus P. woesii A. fulgidus A. profundus  A.^litho trophicus MethanoMethanothermus M. fervidus bacteriales M. sociabilis Methanococcus M.^thermolithotrophicus* Methanococcales M. jannaschii M. igneus "Methan-  opyrales"  -  Methanopyrus  M. kandleri  Zillig et al. (1982) Zillig et al. (1982) Stetter (1986)  97  BonchOsmolovskaya et al. (1985) Fiala et al. (1986) Stetter et al. (1983) Stetter et al. (1983) Pley et al. (1991) Zillig et al. (1990) Stetter (1986) Zillig et al. (1983) Neuner et al. (1990) Miroshnichenko et al. (1989) Stetter (1993)  98 110 110 110 108 98 93 98 98  -  Pyrococcus  95 97 97  96 94 103 103 92 92 100  Stetter (1993) Fiala et al. (1986) Zillig et al. (1987) Stetter et al. (1987) Burggraf et al. (1990a) Stetter (1993)  97 97 70  Stetter et al. (1981) Lauerer et al. (1986) Huber et al. 1982  86 91  Jones et al. (1983) Burggraf et al. (1990b) Kurr et al. (1991)  110  * extremely thermophilic organism (T max <80°C) (Table adapted after Stetter [1993]).  13 and mesophilic organisms were compared. Thermophilic bacilli were often used as models for many of these studies which now have accumulated useful information which has been utilized by biochemists to attempt to modify proteins to improve their thermostability (Clarke et al., 1986). Amino acid replacements that make proteins thermostable are often subtle, and frequently involve a small number of residues. For example, glutamic acid and arginine are favored over aspartic acid and lysine, respectively; cysteine and asparagine are often avoided (Robson and Pain, 1971; Hensel et al., 1992), and high levels of glycine and proline residues (allowing proteins to fold tighter, and therefore more stable, loops that connect the secondary structural-elements) were also observed in some thermal stable proteins (Davies et al., 1991; Watanabe et al., 1991), but definitive rules are lacking (Wedler and Merkler, 1985). It is generally agreed that the thermostability of proteins is primarily dependent on the threedimensional structure. Homologous proteins from thermophiles and mesophiles often share similar tertiary structures, but the thermostable proteins exhibit more extensive hydrogen bonding, hydrophobic interactions, and ionic bonding, as well as more stable a helices, from which helix-destabilizing residues are deleted (Perutz and Raidt, 1975; Davies et al., 1993). Comparative studies of three-dimensional structures among mesophilic and thermophilic proteins have yielded best insights into the factors that control the thermal stability of proteins. Perutz and Raidt (1975) concluded that the enhanced thermal stability in ferredoxins and hemoglobins were conferred by additional salt bridges and/or hydrogen bonds. On the other hand, these stabilizing forces may make thermostable proteins less flexible. For example, D-glyceraldehyde-3-phosphate dehydrogenase (GAPDH) from  T. maritima has a rigid structure at 25°C when compared to mesophilic GAPDH molecules: A temperature-dependent conformational change of the T. maritima enzyme has been detected, which is probably required  to activate it  to function at  the physiological temperatures of T. maritima (Wrba et al., 1990; Rehaber and  14 Jaenicke, 1992). A small structural reorganization in the glutamine synthetase of Bacillus stearothermophilus occurs at well below the temperature of enzyme  inactivation—such change may also be required for the activity of the enzyme (Magsanaga and Nosoh, 1974). Enzymes that do not have intrinsic thermostability at the optimal growth temperature of an organism may be stabilized by cell components such as membranes, cofactors and substrates, as well as a special intracellular micromolecular environment (reviewed by Sharp et al., 1992). Purified DNA polymerase from a Thermotoga species was moderately thermostable and exhibited a half-life of 3 min at 95°C, and 60 min at 50°C in the absence of substrates. Given the high growth temperature of this organism, it is probable that the substrates of the enzyme and other cellular components may be needed to enhance its thermostability (Simpson et al., 1990). Binding substrate and some metal ions has been shown to stabilize the glutamine synthetase from Bacillus caldolyticus at its physiological temperature (Wedler and Merkler, 1985; Merkler et al., 1988). A highly thermostable hydrogenase from T. maritima has been characterized, which shows some distinctive properties when compared with hydrogenases of mesophilic bacteria (Juszczak et al., 1991). For example, tungsten is needed for its activity and its iron-sulfur clusters may be different from those of mesophiles, although its amino acid composition is very similar to those of mesophilic hydrogenases. These chemical differences may account for their different thermostability. Other strategies, such as aggregation, are also employed for stabilization of proteins (Wedler and Hoffman, 1974). Genetically, it has been demonstrated that DNA from the thermophilic Bacilli strains can be used to transform mesophilic Bacillus subtilis, enabling it to  grow at higher temperatures. Because the transformants can produce some  thermostable variants of proteins that are normally thermolabile in the host cells, it was postulated that thermophily may be controlled by a small number of genes  15 affecting translation, which leads to the thermostable phenotype (reviewed by Sharp et al., 1992). For thermophiles that can grow at both high and low temperatures (a temperature span greater than 40°C), it was proposed that two sets of genes for some key enzymes are probably present, one for low and the other for high temperature growth, and that their expression might be regulated by temperature (Wiegel, 1990). The RNA components of the RNase P from thermophilic eubacteria, such as T. maritima, have remarkably high thermostability (Brown et al., 1993). Like  thermostable proteins, the ability of the RNase P RNAs to resist heat denaturation is to a large extent inherent in their nucleotide sequences. Several strategies are employed to improve thermal stability of these RNAs: (i) increasing the number of hydrogen bonds in helices (higher G+C-content, more G-C base-pairs in the secondary structure); (ii) minimizing destabilizing elements in the secondary and tertiary structures (helices with fewer bulged nucleotides, minimal lengths of connections between helices); (iii) increasing Watson-Crick base pairs at the bases of stem-loops; and (iv) avoiding alternative foldings (fewer alternative structures near the minimal free energy state or short sequence length) (Brown et al., 1993). 1.2.3 Phylogeny  Biological and geological evidence seems to indicate that the ancestor of eubacteria may be a hyperthermophile. First, phylogenetic analyses based on the sequences of small subunit ribosomal RNA, and subsequently various other molecules, such as translation elongation factors Tu and G, suggested that the deepest branches in the eubacterial kingdom are represented exclusively by slowly evolving hyperthermophiles including both T. maritima and A. pyrophilus (Achenbach-Richter et al., 1987; Bachleitner et al., 1989; Tiboni et al., 1991; Burggraf et al., 1992; Huber et al., 1992; Cousineau et al., 1992). Secondly, it has been proposed  that the hydrosphere was far hotter three billion years ago than it is now (Ernst,  16 1983). As noted above, eubacteria already existed at that time; those ancient bacteria on stromatolites were probably Ch/oroflexus-like organisms, the photosynthetic deep-branching thermophiles belonging to the order green gliding bacteria (Walter, 1983; Woese, 1987). Within the archaebacterial kingdom, hyperthermophiles also occupy very deep branches in the universal phylogenetic tree based on various analyses (Burggraf et al., 1992; Cousineau et al., 1992; Wheelis et al., 1992; Hasegawa et al., 1993). Therefore, it is plausible that the common ancestor of archaebacteria was also hyperthermophilic (Woese, 1987; Pace, 1991). Because of the proximity of the hyperthermophiles to the root of the universal phylogenetic tree, the common ancestor of all living organisms was probably thermophilic (Pace, 1991; Stetter, 1993). A universal phylogenetic tree including many hyperthermophiles is depicted in Figure 2.  1.3 THE RIBOSOMES Ribosomes are ribonucleoprotein complexes that use an mRNA template to align and polymerize amino acids into proteins. Each amino acid is carried on its cognate tRNA, which uses an anticodon to recognize a corresponding codon on the mRNA. Since proteins have more versatile catalytic power than RNA, the selective pressure for the early living systems to devise a mechanism to make protein must have been intense. As a result, the ribosomes likely arose in an RNA world (see discussion above). The primitive ribosome probably consisted mainly of RNA (Woese 1980). The first ribosomal proteins were likely to have been small peptides that interacted with rRNA to stabilize the structure and function of the rRNA (Weiner and Maizels, 1987 and 1991). These ribosomes were probably error-  prone, only able to make small polypeptides (Woese, 1987). Further evolution  ^  17  ^ Animals  Eukaryotes Plants  Microsporidia ^ Ciliates  Fungi  ^  Flagellates  Slime molds Diplomonads  Green non-sulfur bacteria  Eubacteria  ^  Archaebacteria Sulfolobus Desulfurococcus  Gram-positive^ bacteria^Thermotoga Purple bacteria  Pyrodictium Thermofilum  Flavobacteria^ ^ Cyanobacteria  Thermoproteus Pyrobaculum  r oloc^1.6.0 nottle  ett  Methanopyrus  6  ^  afna 9 ''  Aquifex  M. Igneus M. thermolithotrophicus  ° Methanobacterium  M.  Archaeoglobus  ^  vannielii^  Halococcus  Halobacterium  Progenote ^Methanoplanus Methanosarcina Methanospirillum  Figure 2^A universal phylogenetic tree The hyperthermophilic organisms within the tree are indicated by heavy lines (Figure  adapted after Stetter  [1993])  18 resulted in the development of a more efficient translational apparatus with many basic features of a modern ribosome, such as the arrangement as a two-subunit entity, by the progenote stage. After divergence of the three primary kingdoms, the efficiency and accuracy of ribosomes were further refined independently in each line of descent. The modern ribosome is an exceedingly sophisticated molecular machine comprised of a few species of RNA and over 50 different proteins: the eubacterial ribosome consists of 5S, 16S and 23S rRNA molecules and approximately 50 proteins; the eukaryotic counterpart is comprised of 5S, 5.8S, 18S, and 28S rRNA molecules and about 75 proteins; the archaebacterial ribosome contains 5S, 16S, and 23S rRNA molecules and 50 to 65 proteins. The ribosomal components are encoded on many operons in the chromosomes of eubacteria and archaebacteria. There are seven rRNA operons, and approximately 20 operons for the 52 different ribosomal proteins in Escherichia coli; each ribosomal protein is encoded by a single-copy gene. The ribosomal protein genes are mostly organized in clusters of one or more transcriptional units that often contain additional genes that encode proteins involved in DNA replication (e.g. DNA primase), transcription (e.g. subunits of RNA polymerase), and translation (e.g. tRNA molecules, translational elongation factors Tu and G). In other eubacteria, ribosomal component genes are usually arranged in a similar fashion. Although there is only a single rRNA operon in the chromosome of T. maritima, the organization of 16S, tRNA, 23S, 5S and tRNA is the same as in E. coli. In eukaryotes, ribosomal proteins are encoded by single or multicopy genes of monocistronic transcriptional units which are rarely clustered within the genome (reviewed by Planta et al., 1986; Warner, 1989). Archaebacterial genes encoding ribosomal components have been extensively studied (reviewed by Matheson et al., 1990; Wittmann-Liebold et al., 1990; Durovic et al., 1993a). The gene organization of ribosomal operons in archaebacteria often  resembles that of corresponding operons in eubacteria. For example, the gene  19 organizations of the L11, L1, L10 and L12 ribosomal protein clusters in some archaebacteria are identical to that of rif operon in E. coli, and the gene clustering patterns corresponding to the E. coli spc, S10, and str operons are very similar from eubacteria to archaebacteria (Matheson et al., 1990; Wittmann-Liebold et al., 1990). The E. coli rif operon, located at about 90 min in the chromosome, encodes four large subunit ribosomal proteins, i.e., L11, L1, L10 and L12, and two RNA polymerase subunits 03 and 0'). The ribosomal proteins encoded on this cluster are involved directly or indirectly in the GTPase activity on the large ribosomal subunit, and are required for the binding of extrinsic factors (e.g. EF-Tu and EF-G) to the ribosome (reviewed by Liljas, 1982). Since the homologous proteins in the GTPase domain have been identified in the eubacteria, the archaebacteria and the eukaryotes, it must have formed prior to the divergence of the three primary living kingdoms (reviewed by Shimmin et al., 1989; Shimmin, 1990). Many studies concerning the structure, function and evolution of these proteins have been reported (Liljas, 1982; Leijonmarck et al., 1987; Shimmin et al., 1989; Shimmin and Dennis, 1989; Matheson et al., 1990; Wittmann-Liebold et al., 1990; Shimmin, 1990; KOpke et al., 1992).  1.4 TRANSCRIPTION  Transcription is a cellular process in which RNA polymerase synthesizes RNA (rRNA, mRNA and tRNA) using a DNA template. RNA polymerase is one of the largest cellular protein complexes in the bacterial cell. In E. coli, the entire enzyme consists of four kinds of subunits with composition of oc21313'a. Like nearly all biological polymerization reactions, transcription takes place in three steps: initiation, elongation, and termination. Synthesis of RNA starts at a specific region, called a promoter, on the template. The a factors bind to RNA polymerase, enabling it to selectively initiate transcription at various kinds of promoters. The  20 typical eubacterial promoter sequences show two common motifs on the 5' side (upstream) of the transcriptional start site. These are denoted as the —10 sequence (Pribnow box), which has a consensus of TATAAT, and the —35 sequence with a consensus sequence of TTGACA. The distance between these two conserved elements is crucial; it is almost always between 16 and 18 nucleotides (McClure, 1985). Following initiation of transcription, the a factor spontaneously dissociates from the RNA polymerase, and transcription enters the elongation step. In the elongation phase, RNA polymerase stays bound to its template until a termination signal is reached. A few cellular proteins, such as NusA, NusB, NusG and NusE (ribosomal protein S10), play important roles in regulating transcription elongation and termination in E. co/i (Das, 1992; Roberts, 1993). The NusA protein reduces the rate of transcript elongation, enhances pausing by RNA polymerase at certain sites, and is important for transcription termination at other sites (Greenblatt, 1991; Das, 1992). The proteins NusB and S10 form heterodimers that interact specifically with a conserved sequence called box A within the leader region of ribosomal RNA transcripts (Mason et al., 1992). This interaction permits RNA polymerase to transcribe through Rho-dependent transcriptional terminators in the ribosomal RNA operons (Nodwell and Greenblatt, 1993). The E. coli NusG stimulates antitermination of transcription mediated by the N protein of bacteriophage X along with purified NusA, NusB and NusE. The NusG protein stabilizes the transcriptional elongation-complex N-NusA-RNA polymerase when transcribing a nut-containing template, and also facilitates efficient Rho-dependent termination  in vivo and in vitro (reviewed by Das, 1992; Sullivan and Gottesman, 1992; Whalen et al., 1992; Li et al., 1993). Finally, when the RNA polymerase reaches a stop signal, transcription is terminated and the a factor rejoins the core of the enzyme (Platt,  1986), since it has stronger intrinsic affinity to the free RNA polymerase than elongation factor NusA (reviewed by Greenblatt, 1992).  21 In eukaryotic cells, transcription occurs in the nucleus, whereas translation takes place outside the nucleus. There are three types of eukaryotic RNA polymerases: RNA polymerase I (Pol I), RNA polymerase II (Pol II), and RNA polymerase III (Pol III). Pol I transcribes the tandem array of genes for 18S, 5.8S, and 28S rRNAs. Pol II synthesizes precursors of mRNA and several small RNA molecules, such as the U1 snRNA of the spliceosomes. Pol III makes 5S rRNA and all of the tRNA molecules, and most of the small RNAs. The sequence elements that control transcription and their positions in eukaryotic genes are complex. For the Pol II promoters, the TATA box is located about 30-nucleotides 5' to the start site of transcription initiation. Additional upstream activation sequences, such as the CAAT box and the GC box, which are located further upstream, improve transcription efficiency. Because of the diversity of the control sequences in the eukaryotic genes, many proteins, called transcription factors, are required to regulate transcription (Sawadogo and Sentenac, 1990; Conaway and Conaway, 1991).  1.5 OBJECTIVES OF THIS STUDY  Early-branching and slow-evolving organisms like T. maritima may have higher likelihood of retaining ancestral characteristics than later-branching and more rapidly evolving organisms. Therefore, a detailed analysis of molecular sequences and biochemical data from hyperthermophilic organisms may reveal features of early evolution. The present investigation has the following objectives: (i) to provide basic data from T. maritima for evolutionary comparison studies; (ii) to give perspectives on evolution of translation and transcription apparatuses; and (iii) to investigate the regulation of gene expression in this hyperthermophilic organism. This thesis focuses on the biochemical and evolutionary analysis of a genomic region from T. maritima that corresponds to the E. coli rif region, which encodes several essential transcription and translation components including the  22 GTPase-domain proteins of the ribosome. The thesis first deals with the cloning and sequencing of this region, and the genomic organization and the expression patterns of the genes located in this cloned fragment (Part III). The next part presents the function of the transcription factor NusG (Part IV). The thesis then discusses quantitative phylogenetic analysis based on the sequences of ribosomal proteins L11, L1, L10, and L12 (Part V). Each chapter begins with an introduction with expanded information particularly relevant to the content discussed, which is not detailed in the general introduction (Part 1). Finally, a brief conclusion and some thoughts for the future are given.  23  II. Materials and methods 2.1 MATERIALS  Bacterial cell culture components: yeast extract, tryptone, and agar were purchased from Difco Laboratories, ampicilin was from Sigma Chemical Co. (Sigma), isopropyl-O-D-thiogalactopyranoside (IPTG) from GIBCO Bethesda Research Laboratories Life Technologies (BRL), 5-bromo-4-chloro-3-indoly1-13-Dgalactopyranoside (X-gal) from BRL or Biosynth AG. NTPs, dNTPs, and ddNTPs were obtained from either Pharmacia LKB Biotechnology Inc. (Pharmacia) or United States Biochemical Co. (USB, in the Sequenase kit, version 2). Radioactive [a- 32 NNTPs, [a- 32 P]dNTP, and [7- 32 PJATP were from Dupont NEN Research Products. Most restriction enzymes, and DNA and RNA modifying enzymes were purchased from Pharmacia or New England Biolabs (NEB). The E. coli DNAdependent RNA polymerase was from Boehringer Mannheim; partially purified DNA-dependent RNA polymerase of Thermotoga maritima was provided by Dr. Peter Palm (Max Planck Institute, Martinsried, Germany). Modified T7 DNA polymerase (Sequenase) and shrimp alkaline phosphatase (SAP) were from USB. Proteinase K and Moloney murine leukemia virus reverse transcriptase (MMLVRT) were from BRL. Exonuclease III and S1 nuclease were from Promega. Acrylamide and N,N'-methylene-bis-acrylamide were purchased from BioRad Laboratories (Bio-Rad); agarose (genetic technology grade) was from Schwartz/Mann Biotech. All other chemicals were purchased from Sigma and BDH. CM Sepharose CL-6B ion exchange resin was from Pharmacia.  Hybond-N nylon membrane was obtained from Amersham; nitrocellulose membrane was from Bio-Rad. Films for autoradiography (XRP-1 and XAR-5, for  24 radioactive nucleic acid) and photography (for ethidium bromide stained DNA or RNA gels and Coomassie blue stained protein gels) were from Eastman Kodak and Polaroid, respectively. All buffers containing Tris described below were brought to the appropriate pH by using concentrated HC1; the pH of EDTA stock solutions was adjusted with concentrated NaOH.  2.2 BACTERIAL STRAINS, PLASMIDS, AND MEDIA The E. co/i strains that were used for cloning are described in Table 2. YT (5 g yeast extract, 5 g NaC1, 8 g tryptone, per liter, pH adjusted to 7.5 with NaOH) was used for growing E. coli. The T. maritima MSB8 and the recombinant plasmid pUC-TB4 which contains a portion of the T. maritima rpoB gene and about 1 kb of upstream sequence were kindly provided by W. Zillig (Max Planck Institute, Martinsried, Germany). Plasmids pGEM series (Promega) and bacteriophage A.gt10 were used for cloning and sequencing. The T. maritima strain MSB8 was cultured at 75°C in MMS medium (Huber,  et al., 1986). It contains (per liter): 6.93 g NaCl; 1.75 g MgSO4 • 7H20; 1.38 g MgC12•6H20; 0.16 g KC1; 25 mg NaBr; 7.5 mg H3B03; 3.8 mg SrC12•6H20; 0.025 mg KI; 0.38 g CaC12; 0.5 g KH2PO4; 0.5 g Na2S; 2 mg (NH4)2Ni(SO4)2; 15 ml trace minerals (Balch et al., 1979); 1 mg resazurin; 5 g starch; pH 6.5 (adjusted with H2SO4).  2.3 MOLECULAR BIOLOGICAL TECHNIQUES General molecular biology experiments were performed according to protocols described in Sambrook et al. (1989) and Promega Protocols and Application  25  Guide (Titus, 1991).  Table 2^E. coli strains used for cloning  Strain^  Genotype  JM101^A(lac-proAB), supE, thi/F' /acI cIZAM15, traD36, proAB+  JM109^recA1, supE44, endA1, hsdR17, gyrA96, relA1, thi, A(lacproAB)/F'traD36, proAB± lacIq /acZAM15  DH5aF' A(lacZY A-argF), U169, endA1, recA1, hsdR17(rk mk+), deoR, thi-1, -  supE44, X , gyrA96, re/AVE' 4)80 diacZAM15 -  DH5a^A(lacZY A argF), U169, endA1, recA1, hsdR17(rk mk+), deoR, thi 1, -  -  -  supE44, X -, gyrA96, re/A1/F" 080 diacZAM15 LE392^F - , hsdR1Ark - mk+), supE44, supF58, lac Y1 or 0(lacIZY)6, galK2, galT22, metB1, trpR55, X -  BL21(DE3) F , ompT,rB me -  -  (DE3 is a X. derivitive that was inserted into the int gene of the  chromosome of the host [BL21]. This X fragment carries the T7 RNA polymerase gene under the control of lacUV5 promoter.)  ^2.3.1^Gel electrophoresis  Agarose (genetic technology grade) slab gels (0.8% or 1%) and polyacrylamide gels were run in 0.5X TBE buffer (45 mM Tris, 45 mM boric acid, 1 mM EDTA, pH 8.2) at various voltages. The gels were run in the presence of 0.25 tg/m1 ethidium bromide, or were stained with it after electrophoresis.  ^2.3.2^DNA restriction fragment preparation  Restricted DNA fragments were separated by gel electrophoresis; the bands were visualized by ethidium bromide staining or autoradiography (when the  26  fragments were labeled with 3 2 P), then excised from agarose or polyacrylamide gels. The DNA bands in agarose gels were isolated by either electroeluting into a dialysis tubing, or using the Sephaglas BandPrep Kit (Pharmacia). The DNA bands in polyacrylamide gels were electroeluted into dialysis tubing (in 0.5X TBE or lx AGB buffer [20 mM sodium acetate, 40 mM Tris, 1 mM EDTA, pH adjusted to 8.0 with glycial acetic acid]). The eluate was collected and treated with phenol/chloroform (1:1 volume ratio) and precipitated by 2.5 volumes of 95% ethanol. ^2.3.3^Ligation  For cohesive-end ligation, 40 fmoles of plasmid vector DNA and 1-3 fold molar excess of insert DNA were ligated in 10 41 of reaction mix containing 1X ligase buffer (20 mM Tris-HC1, pH 7.6, 5 mM MgC12, 5 mM DTT, 50n/[1,1 BSA), 0.1 unit of T4 ligase (Weiss unit). The incubation was carried out at room temperature for 3 h. Blunt-end ligation was carried out with molar ratio of vector to insert DNA of 1:4 in lx ligase buffer with 0.5 unit of T4 ligase at room temperature for 3-5 h. One-third to one-half of the ligation mixture was used for transformation. ^2.3.4^Transformation  The E. coli competent cells were prepared by the CaC12 method for transformation. The E. coli cells were grown in YT medium to an A600 of about 0.4 (1 cm path length). The cells were then collected by centrifugation, and resuspended in 50 mM CaC12 (0.5 volume of the original culture), and incubated on ice for 40 min. The cells were then centrifuged, and resuspended in 50 mM CaCl2, 15% glycerol (in about 0.1 volume of the original culture). The cells were either used freshly or stored in small aliquots at -70°C for later use. Competent cells were gently mixed with 2-4 fmoles of DNA, and incubated on ice for 30 min, heat-shocked at 42°C for 45 sec, then incubated on ice for 2 min. One ml of YT medium was then added to cells, which were then incubated with shaking at 37°C for 1 h. 0.1 ml of the  27 culture was directly plated on YT-agar containing appropriate antibiotics. The rest of the culture was centrifuged, resuspended in 0.2 ml of YT medium, and plated.  ^2.3.5^5' and 3' end-labeling of DNA fragments Restriction DNA fragments containing recessed 3' ends were end-labeled using Klenow fragment of the E. coli DNA polymerase I and appropriate [a32P]dNTP (specific activity of 3000 Ci/mmol, 10 mCi/ml). For 5' end-labeling, the DNA fragments were dephosphorylated with SAP at 37°C in solution containing 10 mM MgC12, 20 mM Tris-HC1, pH 8.0, for 1 h. The reaction mixture was then heated for 30 min at 65°C to denature SAP. Once the mixture was cooled down to room temperature, Tris-HC1 (pH 8.0), DTT and spermidine were added into the solution to 30 mM, 5 mM, and 0.1 mM final concentrations, respectively, along with T4 polynucleotide kinase (PNK) (0.1 unit) and 50 pCi [y-32P]ATP. The mixture was incubated for 30 min at 37°C. The labeled fragments were precipitated twice with 2.5 volumes of 95% ethanol. Radioactivity was measured by Cerenkov counting.  ^2.3.6^5' end-labeling of oligonucleotides Oligodeoxyribonucleotides (about 250 ng) were 5' end-labeled at 37°C for 40 min with 1 unit of T4 PNK and 50 gCi of [y-32P]ATP in 20 gl of kinase buffer (0.1 M Tris-HC1, pH 8.0, 5 mM DTT, 10 mM MgC12). The reaction  was stopped by adding 1  gl of 0.5 M EDTA (pH 8.0), and heating at 65°C for 5 min. Carrier tRNA (8 gg) and distilled water (80 gl) were added. The labeled oligonucleotides were then precipitated twice with 2.5 volumes of 95% ethanol in the presence of 0.3 M sodium acetate, and redissolved in 20-50 IA TE buffer (10 mM Tris-HC1, pH 7.5, 1 mM EDTA).  ^  28 2.3.7^Labeling DNA probes by random priming method (Feinberg and  Vogelstein, 1984) Solution containing about 0.1 pmol of DNA fragment (70 ng for 1 kb fragment) was boiled with 5 ill of random hexadeoxyribonucleotides (about 50 A260/ml) for 5 min, then chilled on ice immediately. Five lil of 10X buffer (0.5 M Bis-tris-HC1, pH 6.6, 0.2 M NaC1, 50 mM MgC12, 50 mM 13-mercaptoethanol), 1 p.1 each of dGTP, dCTP, dTTP (1 mM), 5 1.1.1 of [a-3 2 P]dATP (50 .tCi) and 2 units of Klenow fragment were added, and the labeling reaction was allowed to take place at room temperature for 3 h. One 11.1 of yeast tRNA (10 mg/ml) and 1 W of EDTA (0.5 M, pH 8.0) were added to the reaction mixture, which was then heated at 65°C for 10 min. The labeled probe was precipitated twice with 2.5 volumes of 95% ethanol in the presence of 0.3 M sodium acetate. The pellet was dried, and the radioactivity was counted by the Cerenkov method. ^2.3.8^Southern blot hybridization and cloning procedures  Genomic T. maritima DNA was isolated following a CsC1 gradient centrifugation procedure (Sambrook et al., 1989). For Southern blotting, genomic DNA was digested with restriction enzymes, electrophoresed through 0.7% agarose gel, transferred to Hybond N membrane (Amersham), and probed with radioactive restriction fragments. The fragments were labeled with [a- 32 P]dATP using the random primer method (see above; Feinberg and Vogelstein, 1984). Genomic EcoRI fragments were size fractionated on a 5% polyacrylamide gel and, following electroelution, fragments of 4 to 7 kb were ligated with the two arms of kgt10. The ligated DNA was packaged with Packagene extract (Promega). The in vitro packaged phage was used to infect E. coli strain LE392. The phage library was screened by plaque hybridization with the 2.2 kb XbaI-EcoRI fragment from plasmid pUC-TB4. Phage DNA from positive plaques was digested with restriction  enzymes. A 4.0 kb EcoRI-SacI and a 2.2 kb XbaI-EcoRI fragments were subcloned  29  into pGEM-7Zf(+) to yield pPD934 and pPD990. The two restriction fragments overlap for about 300 by (between XbaI and Sad sites). ^2.3.9^DNA sequencing  Bidirectional deletions of insert DNA were constructed in plasmids pPD934 and pPD990 using exonuclease III (Henikoff, 1984). These deletions were used to sequence both strands of the two overlapping clones. The deletions were sequenced by the dideoxy chain termination method employing either pUC/M13 forward or reverse primers or T7 or SP6 primers. Both single- and double-stranded DNA molecules were employed as templates (Sanger et al., 1980; and Zhang et al., 1988), and when necessary, 7-deaza-2'-deoxyguanosine 5'-triphosphate (c 7 dGTP) and 2'deoxyinosine 5'-triphosphate (dITP) were used to resolve ambiguities caused by GC compression. ^2.3.10^RNA transcript analysis  Total cellular RNA was isolated from exponential cultures of T. maritima using the boiling SDS lysis method (Dennis, 1985). Briefly, cells were rapidly cooled on ice, collected by centrifugation, resuspended, and lysed in an SDS-containing buffer (100°C, 15-30 sec); RNA was extracted with phenol, precipitated with ethanol, resuspended in TE (10 mM Tris-HC1, pH 8.0, 1 mM EDTA), and pelleted through a cushion of 5.7 M CsCI (Shimmin and Dennis, 1989). The RNA pellets were resuspended in TE and stored at —70°C until required. Primer extension analysis was carried out essentially as described by Yang et al. (1990) with minor modifications. Briefly, RNA (5-10 lig) was precipitated with 5'  end-labeled primer (1 ng; primers used are listed in Table 3), and resuspended in 20 ill of hybridization buffer (0.25 M KC1, 5 mM Tris-HC1, pH 8.3, and 0.5 mM EDTA, pH 8.0). The sample was then denatured at 85°C for 2 min, cooled gradually to 42°C,  and then incubated at this temperature for at least 2 h. Afterwards, the  30 Table 3^Oligonucleotides  Designation  Sequence (5'-3')  (A) Oligonucleotides for primer extension 3 oD1 GTCAACCTCGAACATCTT  Length Positions Transcripts Strand2 or genes 18  248-231  tRNAmetl-  antisense  tRNAmet 2  oD2 oD3 oD4 oD5  AACAATACGGCCCACCAG  18  1160-1143  nusG  antisense  GGCGGGTCCAACGGGTGG  18  2233-2216  L11  anti sense  GAAACCCAGGAAATCGG  17  3539-3522  L10  antisense  CCCTCGTCCTCCTACCGC  18  4609-4592  OD9  GTGTTTCCCGCCCCAGTATC  20  33-144  5S rRNA  anti sense  oD10 oD11 oD12 oD13 oD14 oD15 oD16  CCGACCACCCGGTTATGAG  19  155-136  tRNAme tl  anti sense  ACGACACGGTGATTATGAA  19  338-319  tRNAmet 2  antisense  CACAACCTACTGATTACAAA  20  433-414  tRNAthr  anti sense  TCTGCCAGCGGATTTACAG  19  519-501  tRNAtyr  anti sense  CGCAACCACCGGI•1•I'IGGAG  20  783-764  tRNAtrp  antisense  TTTCTGTTGCCTGGTCAG  18  3464-3447  L10  antisense  CGCTTCGATGATTTCATC  18  4026-4008  L12  anti sense  41  1512-1535 6  nusG  sense  26  1044-1061  nusG  sense  26  2105-2088  nusG  antisense  27  3443-3461  L10  sense  antisense  (B) Oligonucleotides for PCR 5  oD6  TAGAATTCATGAAAAAAAAATGGT ACGTGGTCGTTCAGACA  oD7  GAGAATTCATGAAGAAAAAGTGGT AC  oD8  ACAAGCTTTCACTCGA frl'ICTCCA  C  oD18  GTGAATTCTATGCTGACCAGGCAAC AG  01)19  CAAAGCTTTCATTCAGATT^11 CTC  26  3983-3966  L10  antisense  o D20  TTTCTAGAGTGAAAAAAGAATGTT  26  4396-4413  113  sense  26  4476-4459  Tf3  anti sense  27  3411-3393  AL10  anti sense  GC 0D21  AAAAGC 111 FGTACCACACTCTATT  T  oD22  TGAGATCTCCGGAGGGTCCACAAA AAG  31  Table 3 (continued) oD23  TAGGGCC=GTGGACCCTCCGG  27  3395-3413  AL10  sense  27  2602-2619  L1  sense  27  3303-3286  Ll  antisense  27  3999-4016  L12  sense  27  4368-4385  L12  antisense  1178-1155  nusG  antisense  GG  L1-5'  GGAGAATTCATGCCGAAGCACTCC AAG  L1-3'  TCGAA=TTACTCTITCAACAGA CT  L12-5'  TGTQAALECATGACGATTGATGAAA TC  L12-3'  AACAAGC1111ACTTCAGTTCCACT TC  (C) Oligonucleotide for site-directed mutagenesis oD17  GTAGATTCTGATCTC1'1'1'1CTTGTTG AAACTACCTCTTCAGGAATAACAA T  1. 2.  51  and 1715-1689  The position corresponds to that in Figure 4. Antisense indicates that the oligonucleotide is complementary to the coding strand or mRNA.  3.  The primers oD9 to oD14 were used for Northern blot hybridization to detect  tRNAs and 5S rRNA. 4. The position of the 5S rRNA corresponds to the mature form of the T. maritima 5S rRNA (Achenbach-Richter, personal communication). 5. The PCR primers have 5' extension that contains restriction endonuclease recognition sequence (underlined) and two more bases at the 5' end; the position indicated does not include the extension. The restriction endonuclease recognition sequences are: GAATTC, EcoRI; AAGCTT,  HindIll; TCTAGA, XbaI; GGGCCC, Apr& 6^Sequence TGGTAC (code for Trp and Tyr, respectively) in oD6 was inserted, thus the positions indicated do not include these six bases.  32 annealed sample was diluted to 125 ill in reaction mixture (50 mM Tris-HC1, pH 8.3, 3 mM MgC12, 75 mM KC1, 10 mM DTT, 2 mM each of dATP, dGTP, dTTP and dCTP, 10-20 units of RNase inhibitor [Pharmacia] and 400 units of MMLV reverse transcriptase [BRL], all final concentraton). The extension reaction was allowed to take place at 42°C for 60 min. The extension products were precipitated, redissolved in gel loading buffer (98% formamide, 20 mM EDTA, 0.05% bromophenol blue and 0.05% xylene cyanol FF), and electrophoresed on an 8% polyacrylamide, 8 M urea gel. The sequencing ladders generated with the same 5' end-labeled primers were run alonside the samples on the same gel. The 3' and 5' ends of in vivo tRNA and mRNA transcripts were analyzed by S1 nuclease protection analysis as previously described (Favaloro et al., 1980; Dennis, 1985; Downing and Dennis, 1987). The appropriate DNA fragment was 5' endlabeled with T4 polynucleotide kinase and [y- 32 P]ATP, or 3' end-labeled with the Klenow fragment of DNA polymerase I and the appropriate [a- 32 P]dNTP (see above). Total cellular RNA (5-10 lig) was hybridized with the end-labeled DNA fragment in 20 Ill of hybridization buffer (80% formamide, 40 mM PIPES, pH 6.8 [the pH was adjusted by concentrated NaOH], 0.4 M NaC1, and 1 mM EDTA) at 50°C for 3 h. Hybrids were treated with S1 nuclease (400 units/ml; 34°C for 30 min). The products were precipitated, resuspended, and electrophoresed on an 8% polyacrylamide, 8 M urea gel along with end-labeled DNA length markers. The length markers were generated by restriction enzyme digestion of plasmid DNA of known sequence; fragments were 3' end-labeled as described above.  2.3.11^Northern blotting Ten to 20 lig of total cellular RNA were denatured at 70°C for 10 min in the denaturing buffer (50 mM MOPS, 1 mM EDTA, 0.66 M formaldehyde and 40% [v/v]  formamide), electrophoresed through a 1.5% agarose horizontal slab gel made up in the denaturing buffer, and transferred to Hybond-N membranes by blotting with 20X  33 SSC (3 M NaC1, 0.3 M trisodium citrate) or 20X SSPE (3.6 M NaC1, 0.2 M sodium phosphate, 2 mM M EDTA, pH 7.7). The membranes were hybridized with radioactive DNA probes (generated by the random priming method, as described above) at 48°C overnight in hybridization solution (5X SSPE, 5X Denhardt's solution [100X Denhardt's solution contains 2% BSA, 2% Ficoll [Pharmacia], 2% polyvinylpyrollidone [Pharmacia], and 0.5% SDS), which were prehybridized in the same solution in the absence of the DNA probe for 1 to 5 h. The membranes were then washed with 2X SSPE containing 0.1% SDS twice at room temperature for 15 mM. Two subsequent washes were done with lx SSPE, 0.1% SDS for 10 min, and 0.25X SSPE, 0.1% SDS for 10 mM, respectively. The membranes were sealed in plastic bags, and exposed to Kodak XAR-5 film with an intensifying screen.  2.4 EXPRESSION OF T. MARITIMA NUSG IN E. COLI  The E. coli stains JM 101, JM109 and BL21(DE3)pLysS (Studier et al., 1990) were used for cloning and expression of NusG-encoding gene of T. maritima. The nusG was amplified by the polymerase chain reaction (PCR) using primers oD7 and oD8 (Table 3). The 5' primer oD7 contains the first 6 codons of the nusG and an eightnucleotide extension at its 5' end containing the EcoRI recognition sequence upstream of the ATG initiation codon. The 3' end primer oD8 is complementary to the last 6 codons of the nusG, and the eight-nucleotide extension at the 5' end of oD8 has a HindIII recognition site and two extra bases at the 5' end. The PCR was carried out in a 100 41 mixture containing 10 mM Tris-HC1, pH 8.3, 50 mM KC1, 1.0 mM MgC12, 50 mM dNTP, 100 mg/ml gelatin, 40 pmol of both primers, 2.5 units of Taq DNA polymerase (Pharmacia) and 10 pg plasmid pPD990 which contains the T. maritima nusG (Part III; Liao and Dennis, 1992). The reaction cycles were: cycle 1, 90 sec at 94°C, 30 sec at 55°C, and 90 sec at 72°C; cycles 2 to 31, 15 sec at 94°C, 30 sec at 55°C, and 90 sec at 72°C; cycle 32, 5 mM at 72°C. The amplified DNA fragment was  34 gel purified, digested by EcoRI and Hindi'', and cloned into the EcoRI and HindIII sites of pGEM-7Zf(+). The amplified sequences were then checked by DNA sequencing from several clones. The EcoRI—HindIII fragment with the correct NusG-coding sequence was cloned into the expression vector pKK223-3 (Pharmacia) to give plasmid pPD1077. To clone the NusG-coding sequence into T7 expression vector pET-3a (Studier et al., 1990 ), the pET-3a was cut with Ndel, and the nusG -containing pGEM-7Zf(+) was cut by EcoRI, and these sites were rendered blunt by treatment with nuclease 51 and Klenow fragment of the E. coli DNA polymerase I. These plasmids were then cut by BamHI. The linearized vector pET-3a and nusGcontaining fragment were recovered from gel, and ligated together to yield plasmid pPD1078. The E. coli strains JM101 and JM109 were transformed with pPD1077, and BL21(DE3)pLysS was transformed with pPD1078. These transformants exhibited a high level expression of the NusG protein of T. maritima.  2.5 PURIFICATION OF NUSG  The strain JM109 harboring pPD1077 or BL21(DE3)pLysS harboring pPD1078 were grown at 37°C in 2 liters of YT medium containing 100 µg/ml ampicilin. When the absorbance of the culture at 600 nm reached between 0.6 and 1.0, IPTG was added to each culture to a final concentration of 0.4 mM. Each culture was grown for an additional 3 h, and the cells were harvested at 4°C by centrifugation. The cells were washed with buffer A (50 mM Tris-HC1, pH 8.0, 0.35 M NaC1, 10 mM MgC12 and 1 mM EDTA) and resuspended in 50 ml buffer A. The cell suspensions were sonicated at 1 min intervals in an ice/water mixture for 8 min (the cells of BL21(DE3)pLysS/pPD1078 can also be lysed by a freezing-thawing cycle). The cell lysates were centrifuged at 27,000 xg for 25 min to remove cell debris. Then, 1 M NaC1 was added to the cleared cell lysates to a final concentration of 0.5 M. Streptomycin sulfate (20%, w/v) was added slowly to the cell lysates with stirring on  35  ice to a final concentration of 4%. The cell lysates were stirred further for 15 mM on ice. The solutions were then centrifuged at 18,800 xg for 10 mM. The supernatants were heated at 75°C for 30 min, and centrifuged at 10,800 xg for 15 min. Solid ammonium sulfate was added to the supernatants to a final concentration of 24% (w/w), and the supernatants were stirred on ice for 20 min. The supernatants were centrifuged at 15,900 xg for 15 min. The resulting pellet was dissolved in 15 ml of buffer B (25 mM sodium phosphate, pH 7.0, 50 mM NaC1), and dialyzed overnight at 4°C against the same buffer with several changes of buffer. After dialysis, the solution was clarified by centrifugation to remove insoluble materials. The cleared protein solution was applied to a column (2.0X15 cm) of CM Sepharose CL-6B (Pharmacia) which had been equilibrated with buffer B. The column was washed with 10 column volumes of buffer B, and eluted with a linear NaC1 gradient of 50300 mM in 25 mM sodium phosphate buffer, pH 7.0. The column profile was obtained by plotting A280 of each fraction against fraction number. The fractions across the peak were analyzed by SDS-PAGE (Laemmli, 1970). The fractions containing NusG were pooled and concentrated to 10- to 15-fold with a Centriprep10 (Amicon, Beverly, MA). The concentrated NusG solution was dialyzed overnight at 4°C against buffer B with several changes of buffer. All the purification steps were carried out at room temperature described above except otherwise specified. The protein concentration was determined by using BCA* protein assay reagent and procedure recommended by the manufacturer (Pierce, Rockford, IL).  2.6 IMMUNIZATION PROCEDURE  For primary immunization, four white rabbits were used; each was injected subcutaneously with 0.3 mg of the purified NusG in 0.3 ml of phosphate-buffered saline (PBS) and emulsified in an equal volume of Freud's complete adjuvant. After 4 weeks, each rabbit was given an intramuscular booster injection of 0.2 mg of  36 protein in PBS which was mixed with an equal volume of Freud's incomplete adjuvant. Two weeks later, the antisera were collected.  2.7 ENZYME-LINKED IMMUNOSORBENT ASSAY (ELISA) AND WESTERN BLOTTING  Antisera were tested for the presence of specific antibodies against NusG using ELISA. Purified NusG protein (30 ng) was absorbed onto the wells of a microtiter plate (MicroTest III assay plate, Becton Dickinson, Oxnard, CA) by drying overnight at 37°C. The plate was then washed three times with 0.05% Tween 20 in PBS, and blocked for 1 h with 1 % gelatin in PBS. Antisera were diluted 50 fold in 0.5 % gelatin-PBS, and added into the first well of each row, and serially diluted in each adjacent well. After incubation at room temperature for 1 h, secondary antibody (goat anti-rabbit immunoglobulin G-Horseradish peroxidase conjugate [BRL]), which was diluted 3000 fold in 0.1% gelatin-PBS, was added to each well and incubated for 1 h. After washing with 0.05% Tween 20-PBS, 100 pl substrate mixture was added into each well. The substrate mixture contained 1 mg/ml of 2,2'-azinobis(3-ethylbenzthiaziline-6-sulfonic acid) (Sigma) in substrate buffer (0.1 M Na2HPO4, 80 mM citric acid, pH 4.0, and 0.01% H202). After color development, the plate was read by using a BIO TEK microplate reader (Mandel Scientific, Guelph, Ontario) at 405 nm. For Western blotting, the proteins were separated by SDS-PAGE and electrotransferred to a nitrocellulose membrane (Bio-Rad). The NusG was detected with the antiserum against it (diluted 1:30,000 with 0.1% gelatin, 0.05% Tween-20 in TBS [20 mM Tris-HC1, 0.5 M NaCI, pH 7.5]). Membranes were blocked with 3 `)/0 gelatin in TBS for 1 h, and washed twice with washing buffer (0.05% Tween 20 in TBS). The membranes were then incubated with primary antibodies for 2 to 16 h. After washing twice with washing buffer, the membranes were incubated for 1 h  37  with secondary antibodies (goat anti-rabbit immunoglobulin G-alkaline phosphatase conjugate [BRL]), which was diluted 3,000 fold in 1% gelatin, 0.05% Tween - 20 in TBS. The membranes were washed subsequently with washing buffer, TBS, and substrate buffer (0.1 M Tris-HC1, 0.1 M NaC1, 50 mM MgC12, pH 9.5), respectively; the immunoreactive proteins were then detected by incubating membranes in the development solution (0.225 mg/ml BCIP [5-bromo-4-chloro-3-indolyl-phosphate ptoluidine salt] and 0.22 mg/ml NBT [nitroblue tetrazolium chloride], that were first solubilized in dimethylformamide [BRL], then diluted in substrate buffer).  2.8 DNA BAND SHIFT ASSAYS -  Double-stranded plasmid DNA and single-stranded M13 derivatives were purified by ethidium bromide-CsC1 ultracentrifugation (Sambrook et al., 1989). Plasmids were digested with restriction enzymes and linear fragments were isolated from agarose gels. Table 4 lists the sizes and the origins of the duplex fragments used in the experiments described in Part IV. For the band-shift assays, purified NusG protein was mixed with DNA and incubated at the stated temperatures for the indicated period of time. The standard assay was 65°C for 2 h. Unless otherwise indicated, a typical buffer was 33 mM NaCl and 17 mM sodium phosphate, pH 7.0. When necessary, binding reactions were terminated by freezing in a dry-ice ethanol bath, thawing at 0°C, and mixed with 5 Ill of electrophoresis loading solution (50% glycerol [v/v], 0.2% bromophenol blue, 0.2% xylene cyanol), and electrophoresed in either agarose or polyacrylamide gels in 0.5 X TBE. The gels were stained with ethidium bromide. In some experiments, 3' or 5' end-labeled DNA fragments were employed; complexes were visualized by autoradiography of the electrophoresis gels.  38 Table 4  Duplex DNA fragments used for DNA band-shift assays  Fragment  Size (kb)  Source  Description  Reference  XbaI—HindIII  5.2  pUC-TB4  Sequence that encodes part of the  W.^Zillig,^personal  13 subunit of the T. maritima RNA  communication  polymerase HindIll—  3.5  pGEM-L12  Linearized plasmid pGEM-7Zf(-)  Liao, unpublished  that contains the T. maritima  HindIll  L12-encoding gene HindIll—  3.0  pGEM-L1  EcoRI  The larger HindIII—EcoRI  Liao, unpublished  fragment of the plasmid pGEM7Zf(-) containing the T. maritima Ll-encoding gene  HindIll—  EcoRI  0.7  pGEM-L1  The smaller HindIII—EcoRI  Liao, unpublished  fragment of the above plasmid (the T. maritima L1-encoding sequence)  2.9 ISOLATION OF T. MARITIMA RIBOSOMES  The T. maritima cells were suspended gently in buffer I (10 mM Tris-HC1, pH 7.5, 10 mM MgC12, 6 mM P-mercaptoethanol, 30 mM NH4C1), and broken by two passes through a French press. The DNase I (RNase-free, 200 pig/m1) was added to the cell extract and incubated at 0°C for 10 min. The cell extract was then centrifuged at 15,000 xg for 30 min to remove cell debris. The supernatant was centrifuged twice for 5 h at 248,000 xg with a 50 Ti rotor. The pellet was dissolved in buffer I, and the absorbance of the solution at 260 nm was measured. Two ml portions of the solution were layered on 6-30% (w/v) linear sucrose gradients (35 ml) made up in buffer I, and centrifuged at 48,000 xg for 15 h at 10°C. The gradients were fractionated dlld the  absorbance of each fraction at  260  nm was monitored continuously. The  39  fractions containing the 70S ribosomes, and the 50S and 30S ribosomal subunits were appropriately pooled, and dialyzed against buffer II (same as buffer I, with only 0.3 mM MgC12) at 4°C overnight with several changes of buffer II. The ribosomes and the subunits were recovered by adding 1 volume of 95% ethanol (the concentration of MgC12 was raised to 10 mM before adding ethanol) and centrifugation at 10,800 xg for 20 min at 4°C. The pellets were dried and resuspended in small volume of buffer II.  2.10 IN VITRO TRANSCRIPTION  Various DNA templates (about 0.1 pmol) containing T. maritima promoter were incubated with T. maritima RNA polymerase (0.2 pmol, about 0. 1 p,g) in 5 pl of buffer containing 50 mM Tris-HC1 ( pH 9.0 at 25°C) and 50 mM NaCl at 75°C for 10 min to form the binary complex. The binary complex was then added to reaction mixture (preheated at 75°C for 1 min) containing 1 mM each of ATP, GTP, CTP, 0.4 mM of UTP, 80 nM of [a- 32 MTP (about 10 1.1Ci), 6 mM MgC12, 50 mM Tris-HC1 (pH 9.0 at 25°C), 0.05 µg/µl BSA, 12.5 mM sodium phosphate (pH 7.0 at 25°C) and 37 mM NaCl (all at final concentration) in 20 pl of total reaction volume, and incubated at 75°C for 20 min. When the NusG was included in the transcription assay, different amounts of the protein (between 0.01 lig and 10 p.g) were used. The transcription was stopped by adding 20 pl of stop buffer (0.6 M sodium acetate, 0.1 M EDTA, 0.2 mg/ml yeast RNA) and 60 pl of distilled water. The reaction mixture was extracted once with phenol/chloroform (1:1 v/v), and then the RNA was precipitated by 2.5 volumes of 95% ethanol. The precipitated RNA was dried, and resuspended in 10 Al of RNA loading buffer (90% deionized formamide, 50 mM Tris-HC1, pH 8.0, 1 mM EDTA, 0.025% xylene cyanol, 0.025% bromophenol blue). The transcripts were analyzed by electrophoresis through an 8% polyacrylamide, 8 M urea gel.  40  2.11 MOLECULAR SEQUENCES  The molecular sequences (nucleotide sequences and/or amino acid sequences) for ribosomal proteins L11, L1, L10 and L12 were obtained from sequence data banks (EMBL, GenBank and Swiss-Prot data banks) associated with the GeneWorks® package (IntelliGenetics, Inc., Mountain View, CA). Sequences unavailable from the data banks were obtained from the literature. The abbreviations used as organism identifiers in sequence alignments and phylogenetic trees and the reference for each sequence are listed in Table 5.  2.12 SEQUENCE ALIGNMENTS  The amino acid sequences of ribosomal proteins L11, L1, L10 and L12 from the eubacteria, the archaebacteria and the eukaryotes were aligned using the alignment algorithm in the GeneWork® package. The resulting alignments were visually inspected to minimize the alignment gaps and to maximize amino acid identities. In the cases of ribosomal proteins L10 and L12, the previous evolutionary models were consulted in order to preserve predicted structural features (Shimmin et al., 1989). The L12 alignments center on the conserved arginine-tryptophan residue at position 88. When required for analysis, nucleotide sequence alignments colinear to the depicted amino acid sequence alignments were used. Consensus of sequence alignments was determined visually by a flexible majority rule, where chemically similar amino acid residues at each alignment position were taken into consideration. For example, at position 279, in the five archaebacterial L10 proteins there are two Ds, one E, one K, and one T. Because of the chemical similarity between D and E, D was chosen as the consensus residue, even though it does not represent the majority residue at this position.  41 Table 5 Organisms and their abbreviations from which the sequences of the ribosomal proteins L11, L1, L10 and L12 are available Organism  Abbreviation Proteins  Reference  Eubacteria Bacillus^stearothermophilus  B st  Ll  Kimura et al. (1985)  L12  Garland et al. (1987)  Bacillus subtilis  Bsu  L12  Itoh and Wittman (1979)  Desulfovibrio vulgaris  Dvu  L12  Itoh and Otaka (1984)  Escherichia^coli  Eco  L11, Ll, L10, L12  Post et al. (1979)  Haloanerobium prevalens  Hpr  L12  Matheson et al. (1987)  Halophilic eubacterium  Heu  L12  Falkenberg et al. (1985)  Micrococcus lysodeikticus  Mly  L12  Itoh (1981)  Proteus vulgaris  Pvu  L11, Ll  Sor and Nomura (1987)  Rhodopseudomonas  Rsp  L12  Itoh and Higo (1983)  Serratia marscescens  Sma  L11, Ll  Sor and Nomura (1987)  Salmonella^typhimurium  Sty  L10, L12  Paton et al. (1990)  L12  Bartsch et al. (1982)  L11  Smooker et al. (1991)  NRCC 41227  sphaeroides  Spinacea^oleracea  Sol(c)  (chloroplast) Streptomyces griseus  Sgr  L12  Itoh (1982)  Streptomyces^virginiae  Sy i  L11  Okamoto et al. (1992)  Synechocystis sp. PCC 6803  Sec  L10,L12  Sibold and Subramanian (1990)  Thermotoga^maritima  Tina  L11,L1, L10, L12  Liao and Dennis (1992)  Asa  L121I (eL12')  Amons et al. (1979; 1982)  Eukaryotes Artemia^salina  L12I (eL12)  Dictyostelium discoideum  Ddi  L10 (P0)  Prieto, et al. (1991)  Drosophila^melanogaster  Dme  L10 (PO)  Kelley et al. (1989)  L1211 (rp21C),  Wigboldus, 1987; Qian et al.  L12I (rpAl)  (1987)  L1211 (P1)  Ferro and Reinach (1988)  Gallus gallus  Gga  42  Table 5 Homo sapiens  Hsa  (Continued) L10 (P0), L1211  Rich and Steiz (1987)  (P1), L12I (P2)  Mus musculus  Miry  L10(P0)  Krowczynska et al. (1989)  Rattus norvegicus  Rno  L10(P0)  Chan et al. (1989)  Rattus rattus  Rra  L1211 (P1),  Wool et al. (1990)  L12I (P2) Saccharomyces^cerevisiae  Schizosaccharomyces pombe  Sce  Spo  L11 (L12)  Suzuki et al. (1990)  L10 (P0), L12IA,  Newton et al. (1990)  L12IB, L12IIA,  Mitsui and Tsurugi (1988);  L12IIB  Remacha et al. (1988)  L11 (L15)  Pucciarelli et al. (1990)  L121 (A4), L12IB  Beltrame and Bianchi (1990)  (A2), L121I (Al), L12IIB (A3) Trypanosoma cruzi  Tcr  L12I (P2)  Schijman et al. (1991)  Tetrahymena^thermophila  Tth  L1211 (L37)  Hansen et al. (1991)  Halobacterium cutirubrum  Hcu  L11, Ll, L10, L12  Shimmin et al. (1989)  Halobacterium^halobium  Hha  L11, Ll, L10, L12  Itoh (1988)  Haloarcula marismortui  Hma  L11, Ll, L10, L12  Arndt and Weigel (1990)  Haloferax^volcanii  Hvo  L11, Ll, L10, L12  Shimmin and Dennis  Archaebacteria  (unpublished data) Methanococcus vannielli  Mva  Ll, L10, L12  Baier et al. (1990)  Sulfolobus acidocaldarius  Sac  L12  Matheson et al. (1989)  Sulfolobus solfataricus 2  Sso  L11, Li, L10, L12  Ramirez et al. (1989)  1. The protein designations used in this thesis are based on the sequence similarity to the E. coli L11, Ll, L10 and L12 proteins. The original nomenclatures are given in parentheses. 2. Recent data indicate that the organism used to clone these ribosomal protein genes was actually S. acidocaldarius and not S. solfataricus (Durovic, 1993b). Nonetheless, we have here retained the species designation of Ramirez, et al. (1989).  43 2.13 PHYLOGENETIC RECONSTRUCTION  Parsimony analysis of the aligned amino acid sequences using the heuristic and/or branch and bound tree search options and bootstrap analysis were carried out using PAUP (Swofford, 1993). When the heuristic tree search option was used, random addition of sequences with 10 replications was used to generate the parsimony tree. For bootstrap analysis of the L12 alignments, random addition of sequences with one replication was used because of limitation in computing capacity. The tree bisection-reconnection (TBR) algorithm was used in the heuristic tree searches (Swofford, 1993). The distance matrix methods were also employed to construct distance matrix trees using DNADIST, FITCH, KITSCH, and NEIGHBOR programs in the PHYLIP Package (Felsenstein, 1991).  44  III. The organization and expression of essential transcription, translation component genes in the hyperthermophilic eubacterium Thermotoga maritima 3.1 INTRODUCTION  Living organisms derive from a common primordial ancestor and divide into three easily recognizable kingdoms or lineages: the eubacteria, the archaebacteria and the eukaryotes (Woese and Olsen, 1986; Woese et al., 1990). In spite of this superficial understanding, our knowledge relating to the molecular features of the common ancestor and the precise origins and relationships between the three surviving kingdoms or lineages remain obscure (Woese, 1987; Woese and Olsen, 1986). Using an elegant approach involving the use of duplicated gene sequences, Iwabe et al. (1989) and Gogarten et al. (1989) suggested that the primordial ancestor, represented by the root of the universal phylogenetic tree, falls closest to the eubacterial domain and that the archaebacteria and eukaryotes were derived from a later splitting of a second independent lineage. Thermotoga maritima is an anaerobic and extremely thermophilic  eubacterium that has been isolated from geothermal ocean floor locales (Huber et al., 1986). Phylogenetic sequence analysis of 16S rRNA, and elongation factors Tu  and G indicates that T. maritima is slowly evolving and is a representative of the deepest branches within the eubacterial lineage (Achenbach-Richter et al., 1987; Bachleitner et al., 1989; Tiboni et al., 1991). These features—deep branching and slowly evolving—make it more likely that T. maritima has retained ancestral characteristics that might tend to be lost in later and more rapidly evolving branches. Characterization of the molecular features of T. maritima can potentially reveal information about the common ancestor and its relationship to  eubacteria and possibly also to archaebacteria and eukaryotes.  45 In this study, we have chosen to characterize a segment of the T. maritima genome that encodes equivalents to the Escherichia coli L11, L1, L10 and L12 large subunit ribosomal proteins. The analysis of these particular genes and the proteins they encode is judicious for a number of reasons. Their functional activities in protein synthesis are universally conserved, have been well characterized, and for L10, L11 and L12, amino acid sequences are available from several representative species within each of the three kingdoms (for a review see Shimmin et al., 1989). In E. coli and other organisms, a single copy of L10 and four copies of L12 assemble along with L11 to form a distinct stalk on the 50S subunit of the ribosome; this complex functions in factor binding and GTPase activities during the protein synthesis cycle (Strycharz et al., 1978; Egebjerg, et al., 1990; Ryan et al., 1991). Protein L1 binds to 23 S rRNA to form a shoulder opposite the stalk  on the 50S subunit and functions to stabilize peptidyl-tRNA binding to the P site of the ribosome (Lake and Strycharz, 1981; Draper, 1990). The transcriptional and autogenous translational regulation of the L11, L1, L10 and L12 genes and proteins has been extensively studied in E. coli (Fiil et al., 1980; Baughman and Nomura, 1983; Christiansen et al., 1984; Lindahl and Zingal, 1986; finks-Robertson and Nomura, 1987; Downing and Dennis, 1987, 1991). The four ribosomal protein genes along with the RNA polymerase 13 and 13' subunit genes form a complex operon that is transcribed from two promoters PL11 and Puo• The L12-13 intergenic space contains a transcription attenuator that plays an important role in regulating the expression of the 13 and (3' RNA polymerase subunit genes. In addition, the proximal ribosomal protein transcripts contain two well-characterized sites used for autogenous translational regulation. The first is a mimic of the L1 binding site in 23S rRNA and is located immediately in front of the L11 translation initiation codon. A deficiency in 23S rRNA production allows L1 protein to bind to the mRNA and block translation of the L11 and L1 cistrons. The long L1-L10 intergenic space contains a second control  46 region which binds L10 (or L10-L12 complex); protein binding is believed to switch the conformation of the mRNA to a structure which exhibits greatly reduced translational efficiency. The region upstream of the operon encoding ribosomal proteins and RNA polymerase in E. coli is occupied by four tRNA genes, tufB (one of two genes encoding the translation elongation factor Tu), and the short secE-nusG operon encoding two essential proteins involved respectively in protein export and in transcription termination-antitermination (An and Friesen, 1980; Schatz et al., 1990; Downing et al., 1990; Sullivan et al., 1992; Linn and Greenblatt, 1992). Our analysis indicates that T. maritima lacks a tufB gene in this region and that five tRNA genes, secE and nusG are cotranscribed with four genes for the ribosomal proteins L11, L1, L10 and L12. The downstream RNA polymerase genes are transcribed separately. 3.2 RESULTS AND DISCUSSION  The rif region of the E. coli chromosome contains a cluster of essential genes that encode components of the transcription-translation apparatus (Lindahl et al., 1975). Included are genes for the large ribosomal subunit proteins L11, L1,  L10 and L12 and the 13 and 13' subunits of RNA polymerase. To identify the region in the T. maritima genome that encodes the equivalent large subunit ribosomal proteins, genomic DNA was digested and probed by Southern hybridization with the 2.2 kb EcoRI fragment from E. coli (see Figure 3). At medium stringency, the probe hybridized to a single 5.8 kb EcoRI fragment. The T. maritima 5.8 kb fragment was shown to contain a single Xbal site located 2.2 kb from one end. The same 2.2 kb Xbal-EcoRI fragment was identified as the terminal part of a larger 5.0 kb Xbal-Hin dill genomic fragment present in the recombinant plasmid pUC-TB4. This 5.0 kb insert fragment was known to  NucLeotide Scale (kilobases) 0^  1  8.5  I^i^I  A  Escherichia coli T. Y.  GT Ti  ^  tufB^secE nusG^L11^Ll^L10^L12^/3 B  B  Thermotoga maritima  ^  PROBE  tRNAs M1 M2TY WsecE^nusG^Lll^Ll  B  L33  B  C  C  L10^L12 D^X S  B^H  pUC -TB4 ^ X.Tma5.8 pPD990 pPD934  Figure 3 Structure and organization of the L11, L1, L10 and L12 encoding regions from the E. coil and T. maritima genomes A. The structure of a 6-kb portion of the rif region at 89 min on the E. coli chromosome is depicted. Genes are shown as solid boxes and intergenic spaces are blank. The tRNA genes are identified as follows: Tu and TT are non-identical tRNAtYr genes; Y is a tRNAtYr gene and G is a tRNAg 1 Y gene. The 2.2-kb EcoRI fragment overlapping the L11, Ll, L10 and L12 ribosomal protein genes was used to probe genomic T. maritima DNA. B. The structure of the corresponding 5.8-kb portion of the T. maritima genome is depicted. The tRNA gene designations are: MI and M2 are non-identical tRNAmet genes; T is a tRNAtiir gene; Y is a tRNAtYr gene and W is a tRNAtrP gene. The secE, nusG genes, and the L33, L11, L1, L10 and L12 ribosomal protein genes are the equivalents or homologues to the corresponding E. coli genes. Some restriction enzyme sites used for generating probes and their positions within the nucleotide sequence are: E, EcoRI (position 1, 5783'; C, ail (1356, 2007); D, Dral (3070); X, Xbal (3567); S, Sad (3858) and H, Hin dill. Restriction fragments that have been cloned into A. ca r plasmid vectors are indicated.  48 encode the amino terminal portion of the RNA polymerase  0 subunit protein (W.  Zillig, personal communication). These Southern hybridization experiments therefore suggested that some or all the L11, L1, L10 and L12 equivalent ribosomal protein genes are, as in E. coli, located proximal to the subunit gene of the RNA polymerase 13 in the T. maritima genome. Using the 2.2 kb XbaI-EcoRI fragment from pUC-TB4 as a probe, we cloned the genomic 5.8 kb EcoRI fragment in Xgt10, but we were unable to subclone this EcoRI fragment into a number of different plasmid vectors. However, from the recombinant Xgt10, the 2.2 kb XbaI-EcoRI and the overlapping 4.0 kb EcoRI-SacI fragments were isolated and subcloned to give plasmids pPD934 and pPD990, respectively. The complete nucleotide sequences of the overlapping 4.0 kb EcoRISacI and 2.2 kb XbaI-EcoRI fragments yielded the sequence of the entire 5788 nucleotide long genomic EcoRI fragment (Figure 4). The sequence contains five tRNA genes, two short open reading frames encoding the equivalent ribosomal protein L33 and the secE genes, respectively, a long open reading frame designated nusG, four genes encoding the equivalents of the E. coli L11, L1, L10 and L12 large ribosomal  subunit proteins and as expected, the 5' portion of the open reading  frame encoding the equivalent of the 0 subunit protein of the E. coli RNA polymerase. Comparison of the content and location of genes between E. coli and T. maritima reveals both similarities and differences (Post et al., 1979; Downing et al., 1990; An and Friesen, 1980). First, the tRNAthr and  tRNAtYr genes of T. maritima  have the same anticodon as the thrT and tyrU genes of E. coli; the other tRNA genes show no correspondence (Figure 3). Second, the ribosomal protein L33 gene is located between genes for tRNAtYr and tRNAtrP, whereas in E. coli, the equivalent gene (rpmG) is clustered with rpmB encoding ribosomal protein L28; this gene cluster is located near 80 minute in the E. coli chromosome, about 45 kb  49  Figure 4^Nucleotide sequence of the T. maritima 5.8 kb EcoRI genomic fragment  The sequence of the 5788-nucleotide-long EcoRI fragment from the genome of T. maritima is illustrated. The five putative promoters P1, P2, PL10, PL12, and Pp are indicated above the sequence; the major start sites are denoted (•). Restriction sites used for transcript mapping studies are indicated above the sequence. The position of the five tRNAs (o, anticodon• • •) and the predicted amino acid sequences of the proteins encoded by genes on the fragment are depicted below the nucleotide sequence. Translation initiation sequences complementary to the 3' end of 16S rRNA are underlined.  ^ -  50 EcoRI^20^  40^  60^  80^  P1 •^  120  GAATTCTCGGATATTTTACGAGCATTTCCTTGATGGGATCTTTCTTCATGCTGATCACACTCCTTGACAACGGGGTTTTGTTAGAATATAATCTGATAGCGGTGTGGGCTCGTAGCTCAG tRNAmet1^00000000000000000 MspI  140  Mspi  160  Aval  180  200  220  240  TTGGCAGAGCGCCCGGCTCATAACCGGGTGGTCGGGGGTTCGAATCCTCCCGAGCCCACCAGTTCCTGAAGGAGAGCACGGCTCTCCTTATTATTTTAACACATCGTTCAAAGATGTTCG 000000000000000000....0000000000000000000000000000000000000000 260^ P2 •^ 300^ 320^ 340^ 360 AGGTTGACAAAGAAAAGCTCTGATAGTAAAATTAATGAACGGTCTTGGGCGGCGTAGCTCAGCGGCGAGAGCGGGTGATTCATAATCACCGTGTCGTGGGTTCGAGTCCCACCGCCGCCA tRNAmet2^000000000000000000000000000000000...0000000000000000000000000000000000000 380^ MspI^ 420^ 440^ 460^ Aval TAGGTCATCGGAAAGGAAATAGGGCCAGCGTAGCTCAACCGGTAGAGCGACTGATTTGTAATCAGTAGGTTGTGGGTTCGAGTCCCACCGCTGGCTCCAAAAGTATGTGGTGGGGTGccC tRNAthr 00000000000000000000000000000000000•••0000000000000000000000000000000000000000 ^00000000000000 tRNAtyr 500^  520^  540^  560^  580^  600 GAGTGGCCAAAGGGGGCGGACTGTAAATCCGCTGGCAGAATCTTCGGAGGTTCAAATCCTCCCCCCACCACCAGATTTTTTGAGAAAGGGTGGAAGATATGCGAGTGAAAGTGGCTCTGA 0000000000000000000000...000000000000000000000000000000000000000000000000 L33:50aa;MW=5744; pI=9.9MRVKVAL 660^ 700^ 620^ 640^ 680^ 720 AATGTTCTCAGTGCGGTAACAAGAACTACTACACCACAAGGAACAAGGACAAAAGAGCAAAGCTCGAACTGAGAAAGTACTGCCCAAAGTGCAACGCCCACACGATTCATACCGAAACGA K  CSQCGNKNYYTTRNKDKRAKLELRKYCPKCNAHTIHTET 740^  MspI^MspI 780^  MspI^  820^  840  AAGCGTAATCGCAGGGCCGTAGCTCAACTGGTAGAGCGCCGGTCTCCAAAACCGGTGGTTGCGGGTTCGAGTCCTGCCGGCCCTGCCATTTTTTGATCTGAGGGGGCATCGAGAATGGAG K A^00000000000000000000000000000000000-0000000000000000000000000000000000000000 SecE:65aa;MW=7314; pI=9.9 M E tRNAtrp 860^ 880^ 920^ 940^ 900^ 960 AAACTCCGAAAGTTCTTCAGGGAAGTCATCGCCGAAGCAAAGAAAATTTCCTGGCCCTCCCGAAAGGAGTTGCTCACTTCTTTTGGTGTTGTTCTCGTGATACTCGCTGTTACAAGTGTT KLRKFFREVIAEAKKISWPSRKELLTSFGVVLVILAVTsV 980  Aval  1000  1020  1040  1060  MspI  TATTTTTTTGTGCTTGATTTCATCTTCTCGGGAGTTGTGAGTGCGATTTTCAAAGCGCTGGGAATAGGATAAGGTGATAGGTGATGAAGAAAAAGTGGTACATAGTCCTTACTATGTCCG Y F F V L D F I F S G V V S A I F K A L G I G NusC: 353aa; M K K K W Y I V L T M S MW=40329; pI=9.0 1100^  1160^ 1180^ 1200 GT TACGAGGAAAAGGTTAAAGAAAATATCGAAAAGAAAGTCGAAGCCACCGGGATAAAAAATCTGGTGGGCCGTATTGTTATTCCTGAAGAGGTAGTTTTGGACGCCACCAGCCCTTCCG G  1120^  1140^  YEEKVKENIEKKVEATGIKNLVGRIVIPEEVVLDATSPS  1220^ 1240^ 1300^ 1260^ 1280^ 1320 AGAGGCTCATACTTTCTCCGAAGGCCAAATTACACGTGAACAATGGAAAAGATGTTAACAAAGGGGATTTGATAGCTGAAGAACCTCCTATTTATGCTCGAAGAAGCGGTGTGATCGTTG E  RLILSPKAKLHVNNGKDVNKGDLIAEEPPIYARRSGVIV 1340^  ClaI^  1380^  1400^  1420^  1440  ACGTGAAGAACGTCAGAAAGATTGTTGTGGAAACCATCGATAGGAAGTATACGAAGACGTATTACATTCCCGAGTCTGCGGGAATCGAGCCGGGTTTGAGGGTTGGAACGAAAGTGAAGC D  VKNVRKIVVETIDRKYTKTYYIPESAGIEPGLRVGTKVK 1460^  1480^  1500^  1520^  1540^  1560  AGGGACTGCCGCTTTCGAAAAACGAAGAGTACATCTGTGAACTGGATGGAAAGATCGTTGAGATAGAACGAATGAAAAAAGTGGTCGTTCAGACACCCGATGGTGAGCAGGACGTTTATT Q GLPLSKNEEYICELDGKIVEIERMKKVVVQTPDGEQDVY 1580  1600  1620  1640  1660  1680  ACATTCCTTTGGATGTTTTCGACAGGGATAGGATAAAAAAAGGAAAAGAAGTGAAACAGGGGGAAATGCTTGCGGAAGCCAGGAAGTTCTTCGCCAAGGTTTCGGGAAGAGTCGAAGTGG YIPLDVFDRDRIKKGKEVKQGEMLAEARKFFAKVSGRVEV 1700^ 1720^ 1780^ SmaI^ 1760^ 1800 TGGATTATTCAACAAGAAAAGAGATCAGAATCTACAAGACGAAAAGAAGAAAACTCTTCCCGGGTTATGTGTTCGTGGAAATGATCATGAACGATGAGGCCTACAATTTCGTTCGTTCCG ^  DYSTRKEIRIYKTKRRKLFPGYVFVEMIMNDEAYNEVRS 1820^  1840^  1860^  1880^  1900^  1920  TGCCATACGTTATGGGGTTTGTCAGTTCGGGAGGACAACCCGTTCCCGTAAAAGACAGAGAAATGAGACCTATTTTGAGACTCGCGGGCCTCGAAGAGTACGAAGAGAAGAAGAAACCTG ^  PYVMGFVSSGGQPVPVKDREMRPILRLAGLEEYEEKKKP  1960^ 1980^ 1940^ 2000^ClaI^2020^ 2040 TGAAGGTCGAACTCGGTTTCAAGGTTGGAGACATGGTGAAGATAATAAGCGGTCCCTTCGAAGATTTTGCGGGTGTTATAAAGGAAATCGATCCAGAGAGACAGGAATTGAAAGTAAACG ^  KVELGFKVGDMVKIISGPFEDFAGVIKEIDPERQELKVN  2060^ 2080^ 2100^ 2120^ 2140^ 2160 TAACTATATTCGGACGTGAAACTCCTGTTGTTCTTCATGTTTCTGAAGTGGAGAAAATCGAGTGAGAAAACGTGGGAGGAGGAATCCGCACCACGCATAGGGACGTTCGAACATGGCGAA ^  T  I  F  G  R  E  T  P  V  V  L  H  V  S  E  V  E  K  I  E^  Figure 4  L11: 141aa; W=15089; p1=9.6^M A K  ^  51 2180^ MspI^MspI^2220^ 2240^ 2260^ 2280 GAAAGTAGCGGCTCAGATTAAATTACAACTGCCTGCCGGAAAAGCCACGcCGGCTCCACCCGTTGGACCCGCCTTGGGTCAGCACGGTGTTAACATCATGGAGTTTTGTAAAAGGTTCAA KVAAQIKLQLPAGKATPAPPVGPALGQHGVNIMEFCKRFN 2320^ 2300^ 2340^ 2360^ 2380^ 2400 TGCCGAAACAGCGGATAAAGGAGGCATGATACTTCCTGTTGTTATCACAGTGTACGAAGACAAGTCGTTCACTTTCATCATCAAAACACCACCTGCTTCCTTCCTTCTCAAGAAAGGAGC AETADKAGMILPVVITVYEDKSFTFIIKTPPASFLLKKAA 2420^ 2440^ 2460^ 2480^ 2500^ 2520 GGGTATAGAGAAGGGTTCTTCCGAGCCAAAAAGAAAGATAGTTGGAAAAGTTACCAGAAAACAGATTGAAGAAATAGCGAAXACAAAGATGCCAGATTTGAACCCAAACAGCTTGGAAGG G  IEKGSSEPKRKIVGKVTRKQIEEIAKTKMPDLNANSLEA  2540^ 2560^ 2580^ 2600^ 2620^ 2640 AGCCATGAAGATCATTGAAGGAACCGCTAAGAGTATGGGAATAGAAGTAGTGGACTGATGTAACGGAAAGGAGGAGGCGCAATGCCGAAGCACTCCAAGAGGTATCTTGAAGCAAGGAAA A  M  K  I  I  E  G  T  A  K  S  M  G  I  E  V  V  D^LI: 233aa;^M  P  K  H  S  K  R  Y  L  E  A  R  K  MW=25934; pI=9.5 2660^  2680^  2700^  2720^  2740^  2760  CTGGTGGACAGAACAAAGTACTACGATCTTGACGAAGCCATAGAACTCGTTAAAAAAACTGCCACGGCGAAATTCGATGAAACGATAGAACTCCACATTCAAACTGGAATAGACTACAGG L VDRTKYYDLDEAIELVKKTATAKFDETIELHIQTGIDYR 2780^ 2800^ 2820^ 2840^ 2860^ 2880 AAACCTGAACAGCACATCAGAGGAACGATCGTGCTTCCACACGGGACAGGTAAGGAAGTCAAGGTTCTGGTGTTTGCCAAAGGTGAAAAGGCAAAAGAGGCTTTGGAAGCGGGCGCGGAT K PEQHIRGTIVLPHGTGKEVKVLVFAKGEKAKEALEAGAD 2900^ 2920^ 2940^ 2960^ 2980^ 3000 TACGTAGGAGCTGAGGATCTTGTAGAAAAAATAGAAAAAGAAGGTTTTCTCGATTTCGATGTGGCAATAGCCACACCTGATATGATGAGAATAATCGGAAGGCTCGGAAAGATTCTGGGA YVGAEDLVEKIEKEGFLDFDVAIATPDMMRIIGRLGKILG 3020^  3040^  3060^Dral^3080^  3100^  3120  CCAAGAGGTTTGATGCCATCGCCCAAATCTGGAACGGTGACTCAGGAAGTAGCAGAAGCGGTTAAAGAGTTTAAAAAAGGAAGAATCGAGGTCAGAACGGACAAAACTGGGAACATCCAC P RGLMPSPKSGTVTQEVAEAVKEFKKGRIEVRTDKTGNIH 3140^ 3160^ Fnu4HI^ 3200^ 3220^ 3240 ATACCCGTTGGTAAGAGGAGCTTCGATAACGAGAAACTGAAGGAAAACATAATCGCGGCAATAAAACAGATTATGCAGATGAAACCCGCAGGTGTGAAAGGACAGTTCATAAAAAAAGTG IPVGKRSEDNEKLKENIIAAIKQIMQMKPAGVKGQFIKKV 3260 MspI^3280^ 3300^PL10•^3320^ 3340 Apal^3360 GTTTTGGCTTCTACAATGGGACCCGGTATAAAATTGAATCTTCAGAGTCTGTTGAAAGAGTAAAGCAATCGAAAACTCAATAAGACGCCGTAGATGGCAGGGCCCGTGGGGTTAAAGATC ^  LASTMGPGIKLNLQSLLKE 3380^  3400^  3420^  3440^  3460^  3480  CTGCCGGAGGCGTCCCAGAAAGGTTCTATGACCTTTTTGTGGACCCTCCGGGGTCCACAAAATTTTTTTGGGAGGTGAATCCTTTGCTGACCAGGCAACAGAAAGAACTCATAGTTAAAG L10: 179aa;MW=20231; pI=9.1^M 3500^  3520^  3540^  L  T  R  Q  Q  K  E  L  I  V  3560^XbaI^3580^  K 3600  AAATGAGTGAAATATTCAAAAAGACATCGCTGATACTCTTTGCCGATTTCCTGGGTTTCACGGTAGCTGATCTCACCGAGCTTCGTTCTAGATTGAGAGAAAAGTACGGAGATGGAGCAA E MSEIFKKTSLILFADFLGFTVADLTELRSRLREKYGDGA 3620^  3640^  3660^  3680^PvuII^3700^  3720  GGTTCAGGGTTGTGAAGAACACTCTCTTGAATCTCGCTCTCAAGAACGCTGAGTACGAAGGTTACGAAGAATTTCTCAAGGGACCCACAGCTGTACTCTACGTCACTGAAGGAGACCCTG RFRVVKNTLLNLALKNAEYEGYEEFLKGPTAVLYVTEGDP 3740^  3760^  3780^  3800^  3820^  3840  TAGAAGCTGTCAAGATAATTTACAACTTTTACAAGGATAAGAAAGCGGATCTTTCGAGGCTCAAGGGTGGTTTCCTCGAAGGAAAGAAATTCACGGCAGAAGAAGTGGAAAACATTGCGA ^ EAVKIIYNEYKDKKADLSRLKGGFLEGKKFTAREVENIA Sad ^  3880^  3900MspI^3920^  3940^  3960  AACTCCCATCCAAAGAAGAGCTCTACGCTATGCTCGTTGGTCGTGTGAAAGCTCCGATTACCGGTCTTGTGTTTGCATTGAGTGGTATTTTGAGGAATCTCGTGTATGTGCTCAATGCTA K  LPSKEELYAMLVGRVKAPITGLVFALSGILRNLVYVLNA  4000^ PL12 •^3980^ 4020^ 4040^ 4080 4060^ TTAAAGAGAAAAAATCTGAATGATGGAGGTGTTTGAAGATGACGATTGATGAAATCATCGAAGCGATTGAGAAACTCACAGTTTCAGAGCTTGCAGAACTCGTGAAGAAGCTCGAAGACA I  K  E  K  K  S  E^L12: 128aa;^M  T  I  D  E  I  I  E  A  I  E  K  L  T  MW=13457; pI=4.7  Figure 4 (continued)  V  S  E  L  A  E  L  V  K  K  L  E  D  52  4100^ 4120^ 4140^MspI^9160^ 9180^ 4200 AATTTGGAGTGACTGCTGCTGCACCTGTGGCTGTCGCTGCTGCCCCAGTTGCTGGAGCAGCTGCCGGTGCCGCTCAGGAAGAAAAGACAGAGTTTGACGTCGTTTTGAAGAGCTTCGGCC K F G V T A A A P V A VA A A PV AGA A AGA A QE E K T E FDV V L K S F G 9220^ 4290 Hinfl^4260^ 9280^ 9300^ 4320 AGAACAAGATTCAGGTCATCAAAGTTGTCAGGGAAATCACCGGACTCGGTCTCAAGGAAGCCAAAGACCTCGTCGAAAAAGCCGGTTCACCCGATGCAGTCATTAAGAGCGGTGTTTCCA Q N K I QVI K V V R E I T GL GL K EA K DL V EK A G S FDA VI KS G V S 4390^ 9360^ 4380^ 4400^ 9920^ 9490 AAGAAGAGGCAGAAGAGATCAAGAAGAAACTCGAAGAAGCTGGTGCTGAAGTGGAACTGAAGTAAATTTTCGTTTGTGAAAAAAGAATGTTGCAACCCTGTACCGCCTCTTGCCGGTACA K E EA E EIK K K L E E A G A EVEL K 4460 PO •^•^9480^ 9500^ 9520^SmaI^9590^ 9560 GGGTTTTTGTGTTTTAATAAATAGAGTGTGGTACAAACGTTCTTCCTCACCATGTTGTTTCCTTTCCGTTCGATCCAAGCAAAACCCGGGAGAGAAATCCTGGGAGTTTCTTATTCCACA 9580^ 4600^ 9620^ 9640^ 9660^ 9680 TTGAGAGGTGAGAAAATGAAAGAGATCTCTTGCGGTAGGAGGACGAGGGTTTCTTTCGGCAAGAGCCGAGAGCCCCTGCCAATTCCAGACCTCGTGGAGATCCAGAAGAGTTCCTACCGA  3^MK  EIS CGR R T R VS FGK SR E P L PIPDL V E I Q K S S YR  Hinfl^9700^ 4720^ 9740^ 9760^ 9780^ 4800 ACATTCCTCGAAGAAGGTTTGCTTGAAGTCCTCAAGAAATTTTCTCCCATTTATTCGCAGGCGACCCGCTCAGATTTGAGAAAATCAGACAGAGGATTTGCTCTCGAGTTTGTTTCAACC N FLEEGL L E V L K K F S PI YSQA T RS DL R K ^ RD GE A L E F VS T 4820^ 9840^ 4860^ 9880^ 9900^ 4920 AGAACTGGAGAACCTGCCATCCATCCCCTTGAATGTAAAGCGAAGGGTCTAACCTACAGTOTTCCGATATATGCGACGGCTCGCCTTACCGACATGAAAAGCGGTGAGATGAAGGAAGAA R T G E P A I D PL E C K A K GL T YS V PI Y A T AR L T D M K SGEMK EE 4940^ 9960^ 4980^ 5000^ 5020^ 5090 GAAGTGTTCCTTGGCTACATTCCCTACATGACGGATCGTGGAACGTTCATAATAAACGGAGCAGAAAGGGTTGTAGTCAATCAGATAGTGGTTTCCCCAGGGCTTTACTTCTCGTCTGAG E VFL GYI PYMT DR G T F I ^I N G AN ^ER V V VNQIV VS PGL YES SE 5060^ 5080^ 5100^ 5120^ 5190^ 5160 TACATAGACAGAGAAGAATACGGCGGGTACTTTCTCCCTTCTCGAGGTGCATGGCTCGAAGTCATCCTCGATCCCTACGATGGAGTTCTTTACGCGGGCCTTGACGGAAAGAAGGTCAAC Y IDR E EYGGY F L PSR GA WLE VILD PYDGVL YAGL DGK K V N 5180^ 5200^ 5220^ 5290^ 5260^ 5280 CTTTTCCTCTTTCTGAAAACGATCGGTTACGAAAAAGATGAGGATATCCTCTCCCTTTATCCCACCTATCTGGATGCCGACGATGAAGACAGTCTCCTGCTCCACGTGGGCTCCATTCTG L FL F L K TIDY EKDEDIL^ PT Ty LDADDEDSLL L HVGS IL 5300^ 5320^ 5390^ 5360^ 5380^ 5900 CTCGAAGACATCTACGATGGTGGCAGGAAGATCGCTGAAAAATGGGATATCCTGACCAAAGATCTCGCGGAAAGGATTCTGATGATAGATGACATAAATCAGATAAAAATAGTTCATCCA L EDI Y DGGR K I A E K WDIL TKDL A SRIL MIDDINQI K I V HP 5920^ 5940^ 5460^ 5480^ 5500^ 5520 ATAGCTCAAAATACATTTGAAAAGATGCTGGAAGTGGTGTCTTCCTCGAGCGAAGAGGGAGAGGAAGAAGAGGAAAAGACAAAGATTTACGGTTTAAACGAAGTCACCGTTGTGGACGCA I A Q N T F E K ML E VVS S SS EEGEEEE E K T K I Y GLNE V T V V DA 5590^ 5560^ 5580^ 5600^ 5620^ 5640 ATATCTGGAAATTTTCAGGAGATTGCGACCCGAAGAACTTCCAAGAATAAACGCGGCAAAAAGGTATCTGCACGACCTCTTCTTCAATCCGGAAAGGTACGATCTTTCCGAGGTGGGAAGA Y L^ FR RR L R PEEL PR INA AK R YLHDL F F NP ER YDL SEVGR 5660^ 5680^ 5700^ 5720^ 5740^ 5760 TACAAAGTCAACGAAAGACTCAGAAACGCTTACATCAGGTACCTCATAGAGGTTGAAGGGGAAGATCCCGAAGAGGCGAGGAAGAAGGTTTACAACGAAACTTCTCTCGTTCTGAAACCA YKVNERL RNA Y IR Y L IE V EGEDPEE AR K K V YNE T S L V L K P 5780 EcoRI CTTGATATAGTCCTCGCTTCCAGAATTC L DIV V L A S R  Figure 4 (continued)  53 upstream of the L11, L1, L10, L12,13 and 13' gene cluster (course et al., 1986). Third, this region in T. maritima lacks genes or sequences related to tufB (EF-Tu) of E. co/i. The tufA gene of T. maritima was located and cloned from elsewhere in the  genome (Bachleitner et al., 1989). A second copy of this gene which would be equivalent to tufB of E. coli does not exist in the T. maritima genome. Fourth, the T. maritima secE gene encodes a polypeptide of 65 amino acid residues, which  show significant sequence identity to the carboxyl-terminal region of the E. coli SecE protein. Fifth, the nusG equivalent gene is nearly twice as large as the corresponding nusG of E. co/i; this is principally the result of an insertion of 513 nucleotides after codon 45 of the T. maritima nusG gene (see below). Finally, the arrangement and approximate size of the L11, L1, L10 and L12 equivalent ribosomal protein genes and the downstream RNA polymerase 13 subunit genes are well conserved between the two species. 3.2.1 The tRNA gene cluster  Four of the five tRNA genes, Metl,Thr,Tyr and Trp, encode full length molecules that include the 3' terminal CCA acceptor sequence and all can be folded into the universal clover leaf structure (Figure 5). In contrast, the tRNAmet 2 gene encodes a 73-nucleotide-long truncated tRNA. Although the  sequence ends with a CCA 3' terminus, the two C's are buried as part of the seven base-pair acceptor stem in the clover leaf structure. It seems likely that activation of this tRNA requires the post transcriptional addition of the terminal CCA acceptor sequence by a nucleotidyl terminal transferase. Alternatively, the tRNAmet 2 might exhibit an atypical folding pattern where the stem of the Tlif arm  is contracted from five to two base-pairs; this would result in the expansion of the variable loop from five to eight nucleotides and extension of the 3' terminal GCCA sequence above the seven base-pair acceptor stem. In this alternative configuration, the acceptor stem would contain a C•A mismatch at position 6.  ^  54  AG II U..3'  MET 1  ^  C C  C^ C  ^  fi■  A II A G..3' C  C MET 2^ A II A G• •3 '^ G 5' ..0 0G—C 5...13 G G—C 5 ' . .13 G G—C^ G—C 0—C^ G—C ?^C—G G—C C—G^ C—G G—C^ 0—C U—A G—C^ G—C C—G C OA C-0^ 0—C G—CG G—C U 0 A 13 A^ U^CCIICC uA A^ U^CACCC^ U^C C G U A^A G^C A^A^ A^1 1^G I 1111 LT^CUCG^1111 CI i C IG G I^C GG GGG 0 11000 LI u ^G C CII^G LT II U C ^111^ G AI C u^ G G ^ A G C 11 GC^ 1^ U U A C 0^G G^0^G 0 ^0 C C—G U 0 G—C 0 u 0—C o C—G G—C^ .0 0 C—G U—A G—C G—C A— U C G—C A II A II A A C A U C A II  L  AAA A• .3'^ C^ C^  AU II U. • 3 I C C THR^ ^ TYR^ TRP ^A A^ A 5 ' . .G G G—C 5 '..G UG—C 5 ' . .0 CA—II C—G G—C^ G—C C—G^ II — A^ G—C A— II^ G—C^ G—C G—C^ G—C^ C—G C—G^ G—C^ C-0 G^ G—C^A^ 0—C U^CACCC U^CCIICCG A G—CU^ A U^CGUCCUGA cAA A^ G G G U A^11• 11 G^ 11^ 111^ 11111^A^C AA AG C C C CUCG CG GII GG G g^ uca GGAGG^ GCG G C U U^C 0 1 ^1 uII^ c UU G^1111^U 0^ U^G^ M^ I LT^ II C u^ c A^ A 0 0 0^ GAGC^ LT U C^G^ 0^LT^ ^0 G G U A^G^G^ ,, A -Gu A A—U A A ^ GAGC C-0 U 0 0 AA C—G^ 0^0—C C-0^ C—G ^Q—A^0—C^ G—C ^ ^0—C^0—C G—C A—U^ Q—A A—II LT^A^ LT^A^ C^A U^A^ U^A^ U^A LT 0 T3^ ° U A^ C c A A G A U • • 3 '^ C^ C^  Figure 5 Structure and processing of tRNAs  The structures of the five tRNAs encoded on the 5.8 kb EcoRI fragment are depicted. The shaded nucleotides are present in the primary transcript but removed during tRNA processing and maturation. The tRNAmet 2 is unusual; it either requires CCA addition for activation or it has a very unusual structure with a mismatch (C•A) at position six in the acceptor stem. Both possibilities are illustrated.  55 There is no indication as to which, if either, of the two methionine tRNAs might serve as the initiator in the translation initiation process. The analysis of transcripts derived from the region containing the five tRNA genes was both complex and difficult because of rapid endonuclease cleavage at a large number of processing sites. Nonetheless, by using both S1 nuclease protection and primer extension analysis, it has been possible to identify (i) putative transcription initiation sites, (ii) regions of the primary transcript that represent major processing intermediates, and (iii) processing sites that generate mature tRNA 5' or 3' ends. The oligonucleotide primers used for primer extension are complementary to either coding sequences or to intergenic noncoding sequences. Generally, the coding region primers utilize the primary transcript or processing intermediates as template much more efficiently than mature tRNA molecules; presumably, this reflects the inability of the reverse transcriptase to denature secondary or tertiary structure or to read through modified tRNA bases. Nuclease S1 protection assays using 5' or 3' end labeled DNA fragments as probes detect both precursor and mature RNA transcripts. The conclusions from these experiments are summarized in Figure 6A, and some of the experimental results are illustrated in Figures 6B and C. To summarize, primary transcripts appear to be initiated from two putative promoters, P1 located in front of the tRNAmet 2  tRNAmeti  gene and P2 located in front of the  gene. No transcripts could be detected from the region in front of the  putative P1 promoter. Transcripts initiated from these promoter sites appear to extend through the distal tRNAtrP gene and on into the secE and nusG genes; rapid endonuclease processing results in the removal of the tRNA sequences from the extended leader region of these transcripts.  56 3.2.2 tRNA processing  The results of S1 nuclease protection experiments using the 5' end labeled 173-nucleotide-long EcoRI AvaI fragment, the 308-nucleotide-long AvaI —  fragment, the 508-nucleotide-long AvaI fragment and the 280-nucleotide-long  Mspl fragment are illustrated in Figure 6B. The two protected products of the 173 nucleotide long probe (Figure 6B i) are 72 and 66 nucleotides in length and correspond respectively to protection by (i) the primary transcript initiated at the Pi promoter and (ii)  the transcript that has been processed to generate the mature  5' end of tRNAmetl. Clearly, the amount of mature tRNA detected is much greater than the amount of primary transcript. The two most visible protection products of the 308-nucleotide-long probe are about 200 and 300 nucleotides in length (Figure 6B ii). These correspond respectively to protection by (i) the trailer sequence that is liberated following cleavage of the primary transcript at or near the 3' end of  the  tRNAmetl  sequence, and (ii) by primary transcripts or processed  intermediates with a 5' end at or near the beginning of the  tRNAmet 2 sequence.  The three major protection products obtained with the 508-nucleotide-long probe were about 435, 260 and 180 nucleotides in length (Figure 6B iii). These correspond respectively to protection by trailer sequences liberated by processing at or near (i) the 3' end of the tRNAtYr gene, (ii) the 5' end of the tRNAtrP gene, and (iii) the  3' end of the tRNAtrP gene. Finally, using the 5' end-labeled 280-  nucleotide-long Mspl fragment as probe, it was possible to demonstrate that transcripts exiting the  tRNAtrP gene are extended well into the secE and nusG  genes. The observed products, 280 and 270 nucleotides in length, resulted respectively from (i) full length protection of the probe by the primary transcript, and (ii) partial protection by the trailer liberated following processing at the 3' end of the  tRNAtrP sequence (Figure 6B iv). No other abundant transcripts with either  3' or 5' ends within the tRNAtrP-secE, and cfrE-nusG intergenic spaces were detected. Thus, the five tRNA genes are processed from the leader region of the  57 mRNA that extends into the nusG gene. These results were confirmed and extended using the corresponding 3' end-labeled AvaI fragments and other 5' and 3' end-labeled Mspl fragments as probes in S1 nuclease protection assays (data not shown). Some of the 5' transcript ends detected by S1 nuclease protection were confirmed and precisely positioned using primer extension analysis (Figure 6C). The primer oD10 is complementary to a sequence within the tRNAmetl gene (Table 3). The major extension product, terminating at the G residue at position 101, is five nucleotides in front of the tRNAmetl gene (Figure 6C i). This is the position where transcripts are initiated from the putative Pi promoter. Less abundant products with end sites at nucleotides 98, 99 and 106 were also apparent. The first two positions probably correspond to minor transcription initiation sites and the third corresponds to the 5' end of the mature tRNAmetl. It is likely that this oligonucleotide primes more efficiently on precursor than mature tRNA. The second primer oD1 is complementary to a region within the primary transcript between the mature tRNAmetl and tRNAmet2 sequences (Table 3). Four extension products with end sites at nucleotide positions 182, 164, 134 and 101 were detected (Figure 6C ii). The product with an end at position 182 most probably corresponds to priming on the trailer intermediate released following endonuclease cleavage at or immediately adjacent to the 3' end of the tRNAmett. This result implies that endonuclease incision occurs precisely at the end of the mature tRNA sequence and that there is no extensive exonuclease trimming required to produce the mature 3' tRNA end. Alternatively, the product may be due to termination of extension caused by secondary structure of the tRNA within the primary transcript. The next two products corresponding to reverse transcription stops at position 164 and 134 within the tRNAmetl structural sequence are presumably caused by impediments to elon• a is i  II  -  the Tyr loop and the second is near the base of the descending portion of the  58  Figure 6 Mapping of transcript end sites in the tRNA-nusG region (A) A detailed genetic map of the 1.2-kb region is illustrated with the five tRNA and the nusG genes indicated. The positions of the two putative promoters, P1 and P2, are indicated (r>. ) along with restriction sites used for making S1 nuclease protection probes: E, EcoRI; M, MspI; A, AvaI. The primary transcript is depicted below the map: •, represents putative transcription start sites; 1, and T, represent respectively the positions of detectable 5' and 3' ends generated during the excision and processing of the tRNA sequences. (B) The structures of four 5' end-labeled DNA fragments used as the probes of S1 protection assays are illustrated as rectangles on the left. Below each are the major protection products illustrated as lines (i—iv). The position in the nucleotide sequence (from Figure 4) used for end-labeling at the ends of the minus strand DNA probes and the ends of the protected products are indicated in parentheses. The length of the protected products in nucleotides (n) corresponds to the visible autoradiographic bands. The autoradiograms are illustrated at the right: S, molecular length standard; T, S1 protection using T. maritima RNA. For clarity, the controls using E. coli RNA and the DNA probe alone without RNA are not shown. (C) The autoradiograms of the primer extension experiments are illustrated. The primers used were (a) oD10, complementary to position 155-136 within the tRNAmeti gene, (b) oD1, complementary to position 248-231 in the Met1 Met2 —  intergenic space and (c) oD12, complementary to position 433-417 within the tRNAmet 2 gene (Table 3). The major extension stops are indicated and their  positions within the complementary DNA (+) strand nucleotide sequence are illustrated: •, strong stop; o, weak stop. The ladder (G, A, T, C) depicts the DNA (-) strand sequence; PE designates the lane containing the primer extension products.  59  Figure 6  60 anticodon stem. The longest product has an end corresponding to the transcription initiation site of the putative Pi promoter at position 101. The absence of detectable product with an end site corresponding to the 5' end of the mature tRNA (position 106) may indicate that 3' end processing normally precedes 5' end processing. The third primer oD11 is complementary to a sequence within the tRNAmet 2  (Figure 6C iii). Two extension products with ends at positions 279 and  281 were evident; these correspond to the 5' end sites of transcripts initiated at the putative P2 promoter immediately in front of the tRNAmet 2 gene. By using other primers, it has been possible to show that endonuclease processing at the 3' ends of the tRNAthr and tRNAthr appear to occur immediately adjacent to the CCA terminal sequence; extension products resulting from priming of the Thr and Tyr trailer sequences exhibited stops at nucleotide positions 460 and 554, respectively (data not shown). In S1 nuclease protection experiments, 3' end sites were detected in approximately the same positions. 3.2.3 Characterization of transcripts derived from protein encoding genes -  Transcripts entering the secE and nusG genes were efficiently extended through the L11 and L1 ribosomal protein genes and into the L1-L10 intergenic space (Figure 7 A). Both nuclease S1 and primer extension assays failed to reveal significant levels of transcripts with either 5' or 3' ends in or between these genes (data not shown); this implies that the region between nucleotides 820 and 3300 is devoid of internal promoters, terminators and major mRNA processing sites (that upon cleavage produce transiently stable intermediates) and that the secE, nusG, L11 and L1 cistrons are sequestered on a large polycistronic mRNA. In contrast, both read-through transcripts and transcripts with 3' or 5' ends within the L1-L10 intergenic space have been identified (Figures 7B i and ii and C Ty. The 3 and 5 ends were not generated by an endonuclease cleavage event 1-  1-  61  Figure 7 Characterization of transcripts from the protein-encoding genes (A) The genetic map illustrates the positions of the protein encoding genes (solid boxes). Restriction sites used to generate S1 probes are : M, Mspl; F, Fnu4HI; X, XbaI; P, PvuII; and H, Hinfl. The vertical arrows indicate the positions of putative regulatory signals on the DNA (or mRNA below): P, promoter; A, attenuator; T, terminator. (B) The structures of several 5' and 3' end-labeled DNA probe fragments used in S1 nuclease protection assays are illustrated as rectangles. Under each probe are the protection products (lines). Nucleotide positions corresponding to 5' or 3' sites of end-labeling on the minus DNA strand (in parentheses) and protected fragment lengths in nucleotides (n) for each of the probes and the corresponding protection products are indicated. The autoradiograms are illustrated below: S, molecular length standard; T, S1 protection using T. maritima RNA. For clarity, the controls using E. coli RNA and the end-labeled DNA probe alone are not illustrated. The probes used are as follows: (i) 3' labeled Fnu4HI-PvuII; (ii) 5' labeled XbalFnu4H1; (iii) 3' labeled MspI-MspI; (iv) 5' labeled MspI-MspI; (v) 3' labeled HinflHinfl; (vi) 5' labeled Hinfl-HinfI. (C) Primer extension was used to locate the transcription initiation sites for the putative L10 (i), L12 (ii) and 13 (iii) promoters. Positions of major (•) stops on the (+) DNA strand sequence are indicated. The primers used were (i) oD15 complementary to position 3464-3447, (ii) oD16 complementary to position 40264008, and (iii) oD5, complementary to position 4609-4592 (Table 3). The ladder (G, A, T, C) depicts the DNA (-) strand sequence, and PE depicts the products of the primer extension reaction. (D) Total RNA was separated by electrophoresis and probed with the Xbal-Smal fragment (nucleotide position 3567-4526) spanning the L10 and L12 genes. The fragments hybridized to 0.4 and 1.0 kb RNA and to larger RNA molecules.  62  Figure 7  63 because the 3' transcript end site at position 3426 is located 112 nucleotides downstream from the 5' transcript end site at position 3314. The 5' transcript end probably results from transcription initiation at a putative internal promoter, 1110, used to augment the expression of the downstream L10 and L12 genes. The 3' end site at position 3426 presumably results from transcript attenuation. The end site is located within a poly T stretch and is preceded by overlapping sequences with inverted repeat symmetry. The results from S1 protection experiments indicate that this structure mediates the termination of about 50% of the mRNA transcripts during exponential phase growth. Together, the L1-L10 intergenic promoter and attenuator elements probably play an important role in modulating expression of the downstream genes (see below). The L10-L12 intergenic space is only 19 nucleotides in length. In the ribosomes of E. coli and other organisms, the L12 protein is present in four copies per 50S subunit, whereas all other proteins including L11, L1 and L10 are stoichiometric and present in single copy (Dennis, 1974; Subramanian, 1975; Hardy, 1975). In E. coli, this four-fold excess of L12 is achieved through an illdefined translational control mechanism (Downing and Dennis, 1987; Petersen, 1990). In contrast to E. coli, a major 5' transcript end was mapped near the end of the L10 gene (position 3972-3974) and probably results from a transcription initiation event from a promoter element buried within the T. maritima L10 gene (Figure 7B iv and 7C ii). Transcripts from this promoter represent between onethird to one-half of the total L12 mRNA. Analysis of the L12-13 intergenic space indicates that few if any transcripts exiting the L12 gene are extended into the downstream RNA polymerase 0 subunit gene (Figures 7B v and vi, and C iii). Rather, the transcripts are efficiently terminated within a poly T stretch centered around position 4446 that is preceded by a region of inverted repeat symmetry. Expression of the RNA polymerasp e• subunit gene requires transcript reinitiation at a downstream promoter. The 5'  64 ends of these reinitiated transcripts have been located by primer extension assays at positions 4566 and 4571. The above nuclease protection and primer extension results suggest the presence of two internal promoters within the ribosomal protein gene cluster; these are used to augment the expression of downstream genes. One of these, Pilo, is located in the L1-L10 intergenic space and the other PL12 is located immediately in front of the L10-L12 intergenic space. The presence of an efficient transcription terminator immediately after the L12 gene results in the production of mono-, bi- and hexacistronic transcripts containing the L12 cistron with lengths of about 400, 1000 and greater than 3500 nucleotides, respectively. By probing northern RNA blots with a probe containing L10 and L12 genes, the 400 and 1000 nucleotide long transcripts along with a heterogeneous large transcript were identified (Figure 7D).  3.2.4 mRNA secondary structure and function Figure 8 summarizes and contrasts the transcription patterns and regulatory features of the secE, nusG, L11, L1, L10, L12 regions of the E. coli and T. maritima genomes. In E. coli, there are three non-overlapping transcription units: the  tRNA-tufB operon, the secE -nusG operon, and the L11, L1, L10 L12, 0, j3' operon (An and Friesen, 1980; Downing and Dennis, 1987; Downing et al., 1990). Because  of internal promoters, terminators and attenuators, a number of different primary transcripts are produced from the operon of the ribosomal proteins and RNA polymerase. In addition, the transcripts from all three operons contain potential endonuclease cleavage sites which increase further the number of detectable mRNA species. The ribosomal protein mRNAs of E. coli possess well-characterized translational control elements (reviewed by Lindahl and Zengel, 1986; JinksRobertson and Nomura, 1987). The site controlling L11 and Ll synthesis is a  65 Figure 8 Structure and features of E. coli and T. maritima RNA transcripts  (A) The genomic maps and transcripts produced for E. coli (top) and T. maritima (bottom) are illustrated. The positions of promoters (P), terminators (T), and attenuators (A) are indicated. Where transcription termination is substantially less than 100% at terminators or attenuators, the (percent) read-through is indicated. The 5' transcript ends resulting from initiation events are indicated (9--) along with sites of mRNA processing by known endonucleases (R3, RNaseIII; RE, RNaseE) (0X0). Protein binding (PB) autogenous translational control sites are boxed on the mRNAs. (B) Regions of RNA secondary structure that presumably serve a regulatory function in the T. maritima mRNA are illustrated. For comparison, the putative L1 protein binding site in the T. maritima 23S rRNA is presented. Also illustrated for comparison are a portion of the L10 autogenous translational control region and the 0 attenuator structure from the E. coli RNA transcripts; mutational substitutions resulting in L10, L12 translation defective (^) or translation constitutive (>) phenotypes are indicated. The designation (T) is represented by (U) in RNA transcripts.  A CLEOTIDE SCALE (KILOBASES) 0^ 1^ 2^ 3^ 4^ 5^ 6^8.5 I^  I ^ I ^ I ^I ^ I ^ I  Es•herichia coli Tu iu GTTT^ tufB  secE nusG  PEG  II PLii  TB  R3 X  Th rmotoga maritima  TEG  L10^L12^13  I PL10  :Q  •^1  X X tRN ENDOS  L11^Ll -== II  TL11 (50%)^A13 (80%)  REV PBL 1 0  tRNAs  1 M2 T Y W secE^nusG  L11  L1  L10^L12  L33 P2^  PL10 AL10 (50%) PL12  TI3 r3  •  110■■■■■•  V tRNA ENDOS  ^  PBL1  Figure 8 A  I Mk.  MM-  ..  •—^  PB L 1  REX  LIM  R3  y xV  68 mimic of the 23S rRNA binding site for protein L1 and is located immediately preceding the L11 cistron on the mRNA (Baughman and Nomura, 1983). Because of translational coupling, once L1 binds to the mRNA, translation of the downstream L1 cistron is also blocked (Thomas and Nomura, 1987). The E. coli L10 regulatory site located in the middle of the long L1—L10 intercistronic space is more complex and less well understood (Fiil et al., 1980; Johnsen et al., 1982; Christiansen et al., 1984). The L10 protein (or an L10-(L12)4 complex) has been shown to bind to a segment of the mRNA about 100 nucleotides in length near the middle of the intercistronic space. A region of interrupted inverted-repeat symmetry immediately adjacent to the L10 protein binding site was shown to be extremely sensitive to nucleotide substitution (see Figure 8), and is believed to be a crucial component in the on/off switching of mRNA translation; both translation defective and translation constitutive mutants have been characterized (Fiil et al., 1980; Christiansen et al., 1984). In T. maritima, the secE and nusG genes are cotranscribed with the ribosomal protein genes, and the tRNAs are processed from the leader region of this polycistronic mRNA transcript. The distal RNA polymerase subunit genes would appear to form a separate operon, to be transcribed from a promoter, Pp, located in the L12-13 intergenic space. Potential sites related to those cleaved by RNaseIII and RNaseE in E. coli (Arraiano et al., 1988, King and Schlessinger, 1987) have not yet been identified in the T. maritima transcripts containing secE, nusG and ribosomal protein genes. It is possible, nonetheless, that such endonuclease sites do occur and are used to trigger rapid degradation of mRNA sequences. If the products formed by an endonuclease cleavage are rapidly degraded, they would escape detection in the nuclease protection and primer extension assays used here. In many of our S1 protection experiments, autoradiographic bands of low intensity are apparent.  These bands, representing minor protection products with 5' and 3' ends falling  69 within generally nondescript sequences, probably represent transiently-stable degradation intermediates and have for simplicity not been emphasized in this study. Examination of the T. maritima nucleotide sequence of the secE, nusG, L11, L1, L10, L12, and p genes and intergenic spaces has revealed a number of potentially important regions that could form regulatory structures within an mRNA transcript (Figure 8 B). The first is in the short nusG-L11 intergenic space and forms a bipartite helical structure immediately preceding the L11 translation initiation site. The region exhibits primary sequence and secondary structural similarity to the L1 binding site within the 23S rRNA (Achenbach-Richter, personal communication). By analogy with E. coli, this site is probably used to mediate translational regulation by protein L1. A direct comparison of the L1-L10 intergenic spaces of E. coli and T. maritima failed to reveal any nucleotide sequence similarity. The transcription  termination signal that has been identified is referred to as an attenuator because (i) it functions at about 50% efficiency during exponential phase growth, and (ii) it possesses structural features which suggest that the termination frequency can be modulated. The second inverted-repeat within this element is characteristic of eubacterial Rho-independent terminators; when this structure is allowed to form in the nascent transcript, termination would occur. The first inverted-repeat within this element exhibits structural and possibly functional similarity to an E. coli repeat implicated in the on/off switching of mRNA translation (Fiil et al.,  1980; Christiansen et al., 1984; see Figure 8B). If the on/off switch exists in T. maritima, it would appear to control transcript termination by either allowing or  preventing the formation of the second terminator hairpin in the mRNA. The efficient terminator in the L12-(3 intergenic space was identified and defined by nuclease S1 protection assays. It is a typical eubacterial Rhoindependent terminator, consisting of a single inverted-repeat followed by a  70 stretch of T residues. In E. coli, the analogous structure is more complex and functions as an attenuator, terminating approximately 80% of the transcripts that exit from the L12 gene and extend into the intergenic space (Downing and Dennis, 1991). Recent in vitro transcription experiments suggest that antitermination at this attenuator may be stimulated by E. coli NusA and NusG proteins (Linn and Greenblatt, 1992). The termination frequency at this E. coli attenuator is adjustable and functions to control the level of production of the 0 and DI subunits of RNA polymerase. In T. maritima, it seems that transcription of the upstream ribosomal protein genes and the downstream f3 and IT RNA polymerase subunit genes are dissociated; transcription of the RNA polymerase 13 gene requires initiation at the Pp promoter which partially overlaps with the termination sequence of the ribosomal protein operon. 3.2.5 Transcription and translation initiation signals  A total of five putative transcription initiation sites were located by primer extension and nuclease protection assays within the tRNA, secE, nusG, ribosomal protein and RNA polymerase gene cluster. The sequences preceding these five sites were examined in order to identify conserved features which might constitute elements of a T. maritima promoter (Figure 9A). The following features emerge: (i) All promoters exhibit one or more start sites located within a region up to eight nucleotides in length. (ii) Centered about ten nucleotides upstream from the major start site is an AT-rich sequence that probably corresponds to the E. coli Pribnow box sequence (TATAAT); the T. maritima consensus derived here is TAWAAT (W, nucleotide A or T, Hoopes and McClure, 1987). (iii) A second conserved element possibly corresponding to the E. con -35 sequence (TTGACA) was also identified; the T. maritima consensus was TTGAC(A /G). In the E. coli promoter, the spacing between the -10 and -35  71 elements is critical and usually limited to a range between 16 and 19 nucleotides. In T. maritima, the spacing appears to be more variable, ranging from 18-25 nucleotides. The functional significance of either of these elements and their spacing relative to the transcription start site in the T. maritima promoter will be established only by a more detailed in vitro or genetic analysis. The designation of open reading frames and the location of the translation initiation codons were based upon two criteria. The first is alignment of predicted amino acid sequences with the sequences of known E. coli proteins. The second is the proximity of a potential translation initiation codon to a sequence exhibiting complementarity to the 3' end of 16S rRNA (Gold and Stromo, 1987). All the protein-encoding genes analyzed in this study contain a five to nine nucleotide long region of 16S rRNA-mRNA complementarity (Figure 9B). For seven of the genes, L33, secE, nusG, L11, L1, L12 and 0, ATG is used as the translation initiation codon; the L10 gene apparently uses the unusual codon, TTG. It is uncertain whether or not this unusual initiation codon plays a role in translational regulation of the L10 cistron. 3.2.6 Protein homologies  The amino acid sequences of the L33, SecE, NusG, L11, L1, L10 and L12 proteins predicted from the nucleotide sequence of the T. maritima 5.8 kb genomic fragment were aligned to the E. coli protein sequences, and the sequence identity of each protein pair was calculated (Table 6). This comparison indicates quite clearly that the predicted T. maritima proteins are the homologues of the corresponding E. coli proteins; amino acid identities range from 32 to 66 percent for the seven protein pairs. The protein pairs exhibiting the largest proportion of amino acid replacements are often affected more frequently by insertion deletion events.  72 A PROMOTER ALIGNMENTS -10^START SITE  -35  ..CC ..GG ..AA ..TT ..TC ..TC CONSENSUS^..  TTGAC TTGAC TTGAA TTGAG TTGCC TTGCC TTGAC  P1 P2 PL10 PL12 P/31 13 132  AACGGGGTTTTGTTAGAA AAAGAAAAGCTCTGATAG TCTTCAGAGTCTGTTGAAAGAG GAATCTCGTGTGTGTGCTCAATGC GGTACAGGGTTTTTGTGT GGTACAGGGTTTTTGTGTTTTA ^ 18-24 NUC ^  B RIBOSOMAL BINDING SITES L33  ..TG  secE  ..CT  AGAAAGGG  TATAAT TAAAAT TAAAGC TATTAA TTTAAT ATAAAT TAWAAT  CTGATAGC G TAATGAAC G A AATCG AGAGAAA A AAATAGA^G AGAGTGTG G 5-8 NUC. G  GTGTG.. GTCTT.. AAACT.. AATCT.. TGTGG.. TACAA.. ^  (99) (279) (3312) (3972) (4466) (4471)  TRANSLATION INITIATION CODON^POSITION  TGGAAGAT ATG..  (579)  GAGGGGG^CATCGAGAA ATG..  (835)  TGAT  — 7— AAGGAGGTG  nusG  ..AT  L11  ..AG  Li  ..CG  L10  ..TG  GGAGGTGA  L12  ..AT  GGAGGTG  Q  ..GA  GGAGGT GAAAGGAGG  GAGGTGA  ATG..  (1044)  TCGAAC ATG..  (2153)  AGGCGCA ATG..  (2602)  ATCCTT TTG..  (3444)  TTTGAAG ATG..  (3999)  GAAA ATG..  (4577)  Ho TCTTTCCTCCACTAGG....16S rRNA  Figure 9 Transcription initiation and translation initiation elements in T. maritima  A. The nucleotide sequences (DNA) overlapping the putative transcription initiation sites are listed. The promoter designations are presented on the left and the major start nucleotide is given on the right. The sequences are aligned at the start site, the —10 element and the —35 element. The two equally intense start sites of the P promoter appear to use a common —35 element and overlapping —10 elements; both possibilities are listed. W stands for either nucleotide A or T. B. The nucleotide sequences (DNA) overlapping the translation initiation sites of the L33, secE, nusG, L11, L1, L10, L12 and fi genes are aligned at the translation initiation codon. The upstream sequences are spaced such that they align in a complementary way with the sequence of the 3'-end of 16S rRNA presented at the bottom (Illustrated is the DNA sequence of the rRNA gene). The complementary nucleotides equivalent to the E. coli Shine-Dalgarno ribosome binding sequences are overlined.  73  Table 6  Protein  T. maritima and E. coli protein homologies  Common positions )  Internal Gaps2  Identical amino acids 3  L33  48  4  20  (42%)  SecE  63  0  NusG L11 L1 L10 L12  175 141 232 147 119  5 1 1 8 4  20 75  (32%) (43%)  87 116 65 78  (62%) (50%) (44%) (66%)  1^The common positions are the number of positions where the two protein sequences each contain an amino acid in the alignment generated almost entirely by the alignment algorithm in the Gene Works® package using the default parameters. The alignments of T. maritima and E. coli SecE and NusG proteins were first generated by the computer program, then visually adjusted to maximize the amino acid identity. The NusG alignment results in an insertion of 171 amino acid residues after a position 45 in T. maritima NusG sequence. 2^Internal gaps are the number of gaps required in one or the other sequences to maintain maximal alignment. Presumably, these are the result of deletion or insertion mutations during evolution. 3^Identical amino acids are the number of common positions where both sequences contain the same amino acid residue. The percent identical  ain;no acIcL; i,; ificli,ated in parentheses.  74 In E. coli, secE gene is located 5' to the nusG gene (Downing et al., 1990). The E. coli secE encodes an essential integral membrane protein with three membrane-spanning stretches, which plays an important role in protein export (Schatz et al., 1989). In T. maritima, a short open reading frame immediately preceding nusG encodes a polypeptide of 65 amino acid residues, of which 36 are hydrophobic. This polypeptide can be aligned to the carboxyl terminus of the E.  coli SecE protein without alignment gap and with an amino acid identity of 32% (Figure 10). Thus, this polypeptide is likely the homologue of SecE in T. maritima. Presumably the N-terminal 28 amino acids are present on the cytoplasmic side of the plasma membrane; the central stretch of 19 amino acids spans the membrane bilayer, and the C-terminal 18 amino acids are localized on the periplasmic side (Schatz et al., 1989; Figure 10). Interestingly, short open reading frames that encode polypeptides of 60 and 84 amino acid residues were also found in the published sequences containing  nusG gene from Thermus thermophilus (Heinrich et al., 1992) and Streptomyces virginae (Okamoto et al., 1992), respectively. The two proteins can also be aligned with the T. maritima and E. coli SecE proteins (Figure 10), with 22% (T. maritima-  T. thermophilus), 23% (E. coli—T. thermophilus), 27% (S. virginae—T. thermophilus), 31% (T. maritima—S. virginae), and 25% (E. coli—S. virginae) amino acid identity, respectively. Taken together, these data indicate that T.  maritima and T. thermophilus have very small SecE proteins, whereas the S. virginae SecE protein has a highly charged N-terminal extension which does not show significant sequence identity to any part of the N-terminus of the elongated  E. coli SecE. All three SecE proteins probably have only one membrane-spanning segment (Figure 10). The alignment shows a few highly conserved residues including a tryptophan residue at alignment position 107 (Figure 10). If the T.  maritima SecE is ancestral, the N-terminal extension in the E. coli SecE may be a recent innovation. Genetic and biochemical experiments showed that the  75  Tma-SecE Eco-SecE Svi-SecE Tth-SecE Tma-SecE Eco-SecE Svi-SecE Tth-SecE Tma-SecE Eco-SecE Svi-SecE Tth-SecE  10^20^30^40^50 MSANTEAQGS GRGLEAMKWV VVVALLLVAI VGNYLYRDIM LPLRALAVVI  50  60^70^80^90^100 100 ^MEK LRKFFREVIA KGKA TVAFAREART LIAAAGGVAL LTT ^ 13 14 ^ MPDAEDE TREKKARKGG KRGKKGPLGR LALFYRQIVA MFAR LIRYFQEARA ^ 37 • • ^ 140 150 110^Transmembrane region KKI KELLTSFGVV LVILAVTSVY FFVLDFIFSG VVSAIFKALG IG 127 RKV -- 65 QETLHTTLIV AAVTAVMSLI LWGLDGILVR LVSFITGLRF ^ K NQLTTYTTVV IVFVVIMIGL VTVIEFGFEK AIKFVFG--- ^60 84 LARV EQVVEGTQAI LLFTLAFMVY LGLYITVFRF LIGLLR---•• • • • •  1R  Figure 10 Alignment of the SecE protein sequences The Sequences of the SecE proteins from T. maritima (Tma-SecE), E. coli (Eco-SecE), S. virginiae (Svi-SecE) and T. thermophilus (Tth-SecE) are aligned. The residues that are  invariant in all four sequences are boxed; the regions with conservative substitutions are indicated with "*." The transmembrane region is marked by a heavy overline.  76 carboxyl-terminal region containing the third membrane-spanning segment is sufficient for E. coli SecE function (Schatz et al., 1991; Nishiyama et al., 1992). The NusG protein of E. coli is essential for cell viability, and was shown to assemble into an RNA polymerase elongation complex and to participate in transcription termination-antitermination related activities (Swindle et al., 1988; Downing et al., 1990; Linn and Greenblatt, 1992; Sullivan et al., 1992; Li et al., 1992 and 1993). The E. coli protein is relatively small (M r 20,508), whereas the T . maritima protein is much larger (M r 40,329). The increase in size results almost  entirely from a single large insertion of 171 amino acid residues after position 45. An extensive search of the protein data base has failed to reveal any other protein sequence highly related to the sequence of the inserted region. The NusG sequences from Synechocystis Sp. PCC 6803 (Schimidt and Subramanian, personal communication) and Thermus thermophilus (Heinrich et al., 1992) have been determined; like E. coli they lack this insertion. The NusG protein from Streptomyces virginiae has 319 amino acid residues (M r 34,676; Okamoto et al.,  1992); surprisingly, this increase in size is due to an extension of about 110 amino acid residues at its amino terminus, and this extension has a high proportion of acidic residues (24 Glu residues and 15 Asp residues). The carboxyl terminus is similar to other NusG proteins (Figure 11). The NusG protein from S. virginiae was shown to function as a butyrolactone autoregulator receptor that may switch on expression of genes for antibiotic production and/or cytodifferentiation. An alignment of the five NusG protein sequences is illustrated in Figure 11. Antibodies prepared against the T. maritima NusG protein were shown to crossreact specifically with proteins of the expected sizes in extracts of E. coli, and T. thermophilus (T. Heinrich and R. Hartmann, personal communication). The T. maritima NusG has been expressed in E. coli; the purified recombinant NusG was  shown to have DNA-binding activity (see Part V).  77  Figure 11 Alignment of the NusG protein sequences The sequences of the NusG proteins from T. maritima (Tma-NusG), E. coli (Eco-NusG), Synechocystis Sp. PCC 6803 (Sec-NusG), S. virginiae (Svi-NusG) and T. thermophilus (Tth-NusG) are aligned. The residues that are conserved in either all five sequences or four of the five sequences and a conservative substitution in the fifth sequence are boxed. The N-terminal extension of about 110 amino acid residues in the NusG protein from S. virginiae, which does not show significant sequence identity to any part of the alignment, is omitted in the alignment. The NusG sequence of Synechocystis Sp. PCC 6803 is kindly provided by Dr. Subramanian before publication.  ^  78 10^20  30  40  50 KKVEATGIK EHIKLHNME QRIHTLDVA QRAVSLNVE KRIKAFGLQ  Tma-NusG Eco-NusG Sec-NusG Svi-NusG Tth-NusG  ^MKKK ^ MS----EAP----KKR MSFTDDQSPVAEQNKKTPSEG -PVDPIQALREELRLL---PG MSI E  Tma-NusG Eco-NusG Sec-NusG Svi-NusG Tth-NusG  NLVGRIVI DLFGE DRILQVEI EFIYQAE DKIFQVLI  Tma-NusG Eco-NusG Sec-NusG Svi-NuSG Tth-NusG  110^120^130^140^150 PIYARRSGVIVDVKNVRKIVVETIDRKYTKTYYIPESAGIEPGLRVGTKV ^ ^ ^ ^  Tma-NusG Eco-NusG Sec-NusG Svi-NusG Tth-NusG Tma-NusG Eco-NusG Sec-NusG Svi-NusG Tth-NusG  IVLTMS QAFS 126 1 • VQVPS IHTY•^1.1 1' VHTL to .  OW  60^70^80^90^100 LDATSPSERLILSPKAKLHVNNGKDVNKGDLIAEEP  160^170^180^190^200 KQGLPLSKNEEYICELDGKIVEIERMKKVVVQTPDGEQDVYYIPLDVFDG ^ ^ ^ ^ ^ 210^220^230 240^250 RDRIKKGKEVKQGEMLAEARKFFAKVSGRVEVVDYST EIRI KRRK EI^ GQR-- SERK ^ ^ KI QGEE ^ QIl GER-- RQN s11 VVRK ^ E sl GGK--IT  260 270 280 Tma-NusG -LFPGYVFV ^ E•I ^IMND Eco-NusG -F PGYVLV I ^D ^A Sec-NusG KI PGYVLI IMDD ^ D • II Svi-NusG KL-PGYVLV DLTN ^ Tth-NusG KLFPGYLFI DLGDEEEPNE4 Tma-NusG Eco-NusG Sec-NuSG Svi-NusG Tth-NusG  310 --PVPVKD --PAPISD VLPMPLSH IVKM-LAP--PVPLSP  Tma-NusG Eco-NusG Sec-NuSG Svi-NusG  -GPFEDFAGVI ID -GPFADFNGVV D -GPFKDFEGDVI S DGPFATLQATI IN  290^300 S-GGQ ^ TSDR^ SEQKRHYGRGRGH -AYDPYPLTLDE -GMR  320^330^340^350 RPILRLAGLEEYEEKKKPVKVELGFKVGD IIS---DAIMNRLQ--QVGDKPRP-K-TL-FEPGEuVND---r ERIFRHVD--E-QEPVVKIDMEIGDHI-- T--LS---QEKAAKAAAEEAGLPAPAVKRTIEV-LDF T-GDSVTVT RHILEVSGL--LGKKEAP-KAQVAFREGD4VVS----  360  370  380  QELKVNVTIFGRETP SRLKVSVSIFGROPPVE SKLKALLSIFGRETPVE SKKVKGLVEIFGRETPVE  Figure 11  390^400  79 The ribosomal protein L33 gene was identified between tRNAtYr and tRNAtrP genes (positions 579 to 728, see Figure 4). The T. maritima L33 has 50 amino acid residues, and is basic (pI=9.9). This protein shows sequence identities of 42% to the E. coli ribosomal protein L33 (Table 6) and 47% to the L33 of tobacco chloroplast. The utilization of codons at the 1150 positions in the L33, secE, nusG, L11, L1, L10 and L12 genes was analyzed. As previously observed, there are few restrictions on codon usage, although TTA and CTA (leucine), and CGC, CGA and CGG (arginine) codons are either not used or used infrequently (Tiboni et al., 1991). In general, this pattern is similar to that used by E. coli; the major difference is that E. coli prefers CGT and CGC, whereas T. maritima prefers the unrelated but synonymous AGA and AGG triplets to encode arginine. 3.2.7 Evolutionary implications  In archaebacteria and eukaryotes, the L10 and L12 equivalent proteins exhibit features which clearly distinguish them from the equivalent eubacterial proteins (Shimmin et al., 1989; Newton et al., 1990). In all these respects, the T. maritima L10 and L12 proteins are typically eubacterial. If the ancestor of T. maritima branched from other eubacteria very near the position of the universal  common ancestor (i.e., the root of the universal tree), it would suggest that the primordial state of the L10 and L12 proteins was eubacterial, and that the domain rearrangements and duplications now apparent in the archaebacterial-eukaryotic proteins occurred early in the lineage leading to these now well separated domains. This proposal differs from a previous model which suggested that the archaebacterial-eukaryotic L10 and L12 structures were primordial and that the rearrangements occurred within the lineage leading to eubacteria (Shimmin et al., 1989).  80 3.3 SUMMARY A 5788-nucleotide-long EcoRI fragment from the genome of T. maritima, identified by cross-hybridization to the L11, L1, L10 and L12 ribosomal protein gene sequences from E. coli, was cloned and sequenced. This fragment encodes five tRNAs (tRNA metl, anticodon complementary to AUG;  tRNAmet 2 ,  AUG; tRNAthr,  ACA; tRNAtYr, UAC; tRNAtrP, UGG), a membrane protein SecE, which is putatively involved in protein translocation process, the transcription termination-antitermination factor NusG, the five 50S subunit ribosomal proteins L33, L11, L1, L10 and L12, and the amino-terminal portion of the RNA polymerase f3 subunit. The five tRNA genes, the L33, secE, nusG genes, and the L11, L1, L10 and L12 genes form a complex transcription unit. Transcripts appear to be initiated from an upstream promoter, P1, located in front of the and from three internal promoters: tRNAmet2  P2  tRNAmetl  gene  is located immediately in front of the  gene; Puo is near the beginning of the L1—L10 intergenic space, and Pm  is at the end of the L10 gene sequence. The tRNA sequences are excised from the leader regions of the Pi and P2 initiated transcripts. Three putative but potentially important regulatory sequences were identified within this operon: an L1 translational control site, a transcription attenuator, and a strong Rhoindependent terminator. The strong terminator located distal to the L12 gene overlaps a fifth promoter,  Pp,  which is used to initiate transcripts of the  downstream RNA polymerase f3 subunit gene. The T. maritima secE encodes a protein with 65 amino acids, which can be aligned to the C-terminus of the elongated E. coli SecE protein (127 amino acids, and three membrane-spanning segments). In the alignment, the amino acid identity is 32%. A sequence analysis indicates that the small SecE of T. maritima has one putative transmembrane segment in the central region of the protein. The T. maritima NusG protein exhibits 43% amino acid sequence identity when aligned to the E. coli counterpart;  81 the alignment is interrupted by a 171 amino acid long insertion into the T.  maritima protein after residue 45 that is absent in the E. coli and other eubacterial NusG proteins.  82  IV. The functions of the T. maritima NusG protein: DNA-binding activity and its role in transcription  4.1 INTRODUCTION  A group of proteins called Nus factors endow the DNA-dependent RNA polymerase of Escherichia coli with the ability to read through both factordependent and factor-independent transcription termination signals on the template DNA. These Nus factors function in a multimeric complex with the RNA polymerase that is assembled shortly after transcription initiation (for review, see Das, 1992; Roberts, 1993). Two of the factors, NusB and NusE (ribosomal protein S10), bind to RNA polymerase as well as a box A RNA sequence and tether the transcript at this site to the elongating RNA polymerase (Nodwell and Greenblatt, 1993). The NusA binds directly to the RNA polymerase and slows the transcription elongation, causing the RNA polymerase to pause at specific sites in many transcriptional units (Yager and von Hippel, 1987). This factor is of fundamental importance in facilitating the interaction with the X antitermination protein N, during bacteriophage propagation (Whalen et al., 1988; Das, 1992). The N protein binds to a helical box B sequence that is adjacent to box A on X transcripts. Together, these RNA motifs constitute X nut sites (N utilization sites); the interactions between nut, N, NusA, NusB and NusE strengthen the tethering of the nascent transcript to the elongating polymerase and allow the complex to read through potential termination sites on the DNA template (Rosenberg et al., 1978; Friedman and Olson, 1983; Mason and Greenblatt, 1991; Das, 1992; Nodwell and Greenblatt, 1993). The NusG protein also binds to the elongating RNA polymerase and is believed to be essential for Rho-dependent transcription termination events  83 (Sullivan and Gottesman, 1992; Li et al., 1992, 1993). The termination factor Rho is an RNA-dependent ATPase with RNA helicase activity (Brennan et al., 1987);  if  unimpeded, it causes destabilization of the RNA-DNA duplex within the transcription bubble at the site of elongation and triggers the release of the nascent transcript when polymerase is paused at a Rho-dependent termination site (Morgan et al., 1983, 1984). The activity of Rho is mediated through an interaction with NusG in the elongation complex (Li et al., 1993). During the transcription of X DNA, the interaction of Rho and NusG is effectively blocked by the NusA-N protein association (Li et al., 1993).  The in vivo depletion of NusG has been shown  to result in the suppression of termination at a number of different Rho-dependent termination signals; other potentially deleterious effects of NusG depletion have not been observed (Sullivan and Gottesman, 1992; Sullivan et al., 1992). Other experiments suggest that NusG may also play a role in the regulation of termination at an attenuator site located upstream of genes encoding the 13 and 13' subunits of RNA polymerase in E. coli (Linn and Greenblatt, 1992). The E. coli gene encoding the 181 amino acid long NusG protein is essential for cell viability and is located in the secE-nusG bicistronic operon (Downing et al., 1990). The secE gene encodes a transmembrane protein that is an essential component of the protein translocation system (Schatz et al., 1989). This operon is part of a larger  cluster of essential genes that encode the translation factor EF-TuB,  the ribosomal proteins L11, L1, L10 and L12, and the f3 and (3' subunits of RNA polymerase (Post et al., 1979; An and Friesen, 1980). We  recently cloned and characterized a complex transcription unit  containing the nusG gene from the hyperthermophilic eubacterium Thermotoga maritima (Part III; Liao and Dennis, 1992). The operon contains five tRNA genes and ribosomal protein L33 gene in the 5' mRNA leader, a diminutive  version of  secE and an enlarged version of nusG, and in the distal position, the rp1KAJL genes  encoding ribosomal proteins L11, L1, L10 and L12. The NusG of T. maritima has  84 353 amino acid residues with molecular weight of 40 kilodaltons; within this sequence there is a large 171 amino acid long insertion after residue 45 that is absent in the corresponding NusG homologs from E. coli and other eubacteria. We have observed that the T. maritima NusG protein has a generalized DNA-binding activity. In this part, the characterization of this activity and the role of NusG in transcription are described.  4.2 RESULTS The nusG coding region was amplified with PCR from the T. maritima genomic clone pPD990 using EcoRI and Hin dill containing oligonucleotide primers. The amplified fragment was cloned, sequenced to verify amplification fidelity, and finally recloned into the expression vectors pKK223-3 and pET-3a to give pPD1077 and pPD1078, respectively. In plasmid pPD1077, the T. maritima  nusG gene is under the control of the tac promoter and in plasmid pPD1078, the gene is under the control of a T7 RNA polymerase promoter. In the E. coli strains harboring these expression plasmids, synthesis of the T. maritima NusG protein was tightly regulated and high level expression was dependent on IPTG induction. The expression of the T. maritima NusG protein from plasmid pPD1077 induced by the addition of IPTG to mid-log phase cells is illustrated in Figure 12A. To purify the protein, the cell lysate was heated to 75°C for 30 min and the heat labile E. coli proteins were removed by centrifugation. The protein was further purified by adsorption onto a CM Sepharose column and eluted with a linear gradient of NaCl. The eluted protein was virtually homogenous as judged by SDSPAGE (Figure 12B). Using polyclonal antibodies produced against the purified recombinant protein, we have shown by Western blotting that the single crossreacting protein in T. maritima cell extracts has the same size as judged by electrophoretic mobility as the recombinant protein produced in E. colt. This  85  A L4 1 2 kd^M 1 2 3 4 5 6 7 0 9 10 11 12 13 14 15 16 97.4 — 66.2^'OM 42.7 -a.ow iliim■  wialOSIDNO 00 -411-NueG  31.0 --44.  "  21.5^fip 14.4 -+  60  21.5 --a. • 14.4 --a-  NA  .  Figure 12 Overexpression and purification of the T. maritima NusG protein  (A) Lysates of JM109 (lane 1) and JM109/pPD1077 grown in the presence of IPTG (lane 2) were electrophoresed on a 12% polyacrylamide-SDS gel. (B) Aliquots of material at various stages of NusG purification are visualized on a 16% polyacrylamide-SDS gel. The lanes are: (1) supernatant after heating cell extract to 75°C for 30 min and centrifugation; (2) CM Sepharose CL-6B column flow through; (3,4) column washes with buffer B; (5-16) column fractions eluted with a linear gradient (50-300 mM) of NaCl; not all the column fractions are shown. For each gel, a molecular weight standard (M) was included. The size of the protein standards is in kilodaltons (kd).  86 indicates that the length of the designated T. maritima nusG open reading frame is correct and that the encoded protein is nearly twice the size of the homologous NusG protein of E. coli (20 kilodaltons). Antibodies raised against the E. coli protein have been shown to cross-react with the authentic and the recombinant T. maritima NusG proteins (data not shown).  4.2.1 Binding activity of NusG to duplex DNA  The highly purified recombinant NusG protein of T. maritima binds to double-stranded DNA non-specifically. The salt and temperature dependencies and the time-course of binding were examined using a 5.2 kb linear DNA fragment (Figure 13). These reactions contained about 0.03 pmoles of DNA fragment and 12 pmoles of protein; this corresponds to about 80 molecules of NusG monomer per kb of duplex DNA. At salt concentrations between 5 and 300 mM NaC1, at incubation temperatures between 0° and 80°C, and at times greater than 30 sec, NusG forms a complex with DNA that is retained at the origin and unable to penetrate the agarose gel matrix. Two types of complexes were observed based on ethidium bromide accessibility. At both low (5-20 mM) and high salt concentrations (160-300 mM) the complexes were stained with ethidium bromide, whereas at intermediate salt concentrations (40-100 mM) the complexes were not; these complexes are referred to as "loose" and "tight," respectively. In 50 mM salt, tight complexes are formed only at temperatures above 37°C. Even at 65°C, the initial protein-DNA complex found is loose and the transition to tight occurs only after about 30 min. The "tight" complex can be visualized at the origin by autoradiography when labeled DNA fragments are used (unpublished data). We have investigated the stoichiometry of NusG binding to duplex DNA under the following conditions. The buffer was 33 mM NaC1, 17 mM sodium phosphate, pH 7.0, and the incubation conditions were for two hours at 65°C. For  87  A  ^  ^ B C ^ NaCI (mM) Temp (°C)^Time (min)  64 Cl) 0) Z in 0 0 0 0 0 0 0 0  g  c8  a  i .4, co 0 ,. 0. . .. , 0 IA 0  r•i ri 01 el^PI VI LA CO CO  U)  E‘  IA r4 V 10 CICO  8 c:^ •  .-. ,,,,,  e. .  Figure 13 The binding properties of NusG protein to linear duplex DNA  The salt and temperature dependence and the kinetics of binding of NusG to a 5.2 kb linear fragment of duplex DNA was investigated. Typically, a 15 ill reaction mixture contained 0.03 pmoles of DNA fragment and 12 pmoles of NusG monomer. For the salt-dependence assays (A), the buffer was 5 mM sodium phosphate, pH 7.0, and the incubations were for two hours at 65°C. For the temperature-dependence experiments (B), the reaction buffer contained 33 mM NaC1 and 17 mM sodium phosphate, pH 7.0, and incubation was for 2 hours at the indicated temperatures. For the time kinetics assays (C), the reaction buffer was 33 mM NaCl and 17 mM sodium phosphate, pH 7.0, and the incubation temperature was 65°C for the indicated times. Incubations were terminated by freezing samples in ethanol dry-ice bath. The samples were thawed, mixed with loading solution, and immediately electrophoresed on a 1.0% agarose gel and stained with ethidium bromide. The molecular length standard (MLS, bacteriophage X, DNA digested by restriction enzyme Psti {X PstI], and EcoRI and Hind111 {? E-H}, respectively) and control DNA (CONT, the position is denoted with an arrow) lanes are indicated.  88 these studies we used either a long DNA fragment (5.2 kb), a short fragment (0.7 kb) or a mixture of a long and a short fragment (3.0 kb and 0.7 kb). The results obtained with the mixture of the two fragments are illustrated in Figure 14. In the first experiment, a constant amount of DNA (150 ng; 0.06 pmoles of each fragment) was titrated with increasing amounts of NusG protein (1.25 to 25 pmoles). At a protein concentration above 5 pmoles per assay, both of the input DNA fragments are fully sequestered at the electrophoretic origin in the tight complexes which are inaccessible to ethidium bromide staining. At protein concentrations below 2.5 pmoles of protein per assay, there is no evidence of any protein-DNA interaction. These results suggest that protein binding to DNA is cooperative, and that the critical protein concentration necessary for complete complex formation in this experiment was about 10 pmoles per assay; this corresponds to 44 molecules of NusG protein per kb of DNA. In separate experiments using the 5.2 kb or the 0.7 kb fragment, we estimated the critical amount of protein required for complete complex formation was 33 and 41 molecules per kb of DNA (data not shown). These and numerous other experiments indicate that the protein has no sequence specificity and exhibits no discernible preference for long versus short DNA fragments. In the second experiment (Figure 14B), a constant amount of NusG protein (500 ng; 12.5 pmoles) was titrated against an increasing amount of the equal molar mixture of the 3.0 and 0.7 kb DNA fragments (0.02 to 0.32 pmoles of each fragment). At DNA fragment concentrations below 0.08 pmoles per assay, virtually all the DNA is sequestered in a tight complex and retained at the electrophoretic origin. Again, this corresponds to about 42 molecules of NusG monomer per kb of target DNA. At higher DNA concentrations, fragments not complexed with the protein are visible. In experiments using a 5.2 or a 0.7 kb fragment, we estimated that the target DNA is totally sequestered when the amount of protein exceeds 21 monomers per kb of DNA.  89  A ^NusG  (pmol)^dsDNA (pmol)  °)  X  Cl  .  o  Hca  ca tn 0 LA 0 1/1 X^ .  v-.1^Cq  ei o a  qt CO Z 0 0 0 rq^Cq  mcv  0^•^•^•^•^•^•^• •^• rj000000  00  0  3.0  0.7  Figure 14 The stoichiometry of NusG:duplex DNA complexes  Assays were carried out in 15 IA volumes containing 33 mM NaC1 and 17 mM sodium phosphate, pH 7.0 and incubations were for two hours at 65°C. In (A), the amount of DNA was constant (0.06 pmoles each of a 0.7 and a 3.0 kb fragment) and the amount of NusG protein was increased from 1.25 to 25 pmoles. In (B), the amount of NusG protein was constant (12.5 pmoles) and the amount of DNA was increased from 0.02 to 0.32 pmoles of each fragment. The reactions were electrophoresed through a 1.0% agarose gel and stained with ethidium bromide. The molecular length standard (MLS, PstI and 7‘., E-H) and control DNA (CONT; 0.4 pmoles of each fragment) are indicated. The arrows indicate the positions of 3.0 kb and 0.7 kb DNA fragments.  90 4.2.2 Accessibility of the NusG-DNA complex to restriction by Taql  The state of the complex between NusG protein and duplex DNA was further investigated using accessibility to restriction endonuclease digestion. The restriction enzyme TaqI has a temperature optimum at 65°C and can efficiently cleave DNA over a wide range of salt concentrations. In this assay, we incubated duplicate samples of linear DNA at 65°C in buffers containing between 10 and 150 mM NaC1 in the absence or presence of NusG protein (80 monomers per kb of DNA). After two hours, TaqI endonuclease was added and digestion was continued for 1.5 hours. One of the duplicates was electrophoresed directly on a standard agarose gel and the other was mixed with a small amount of SDS (0.2% final concentration) and run on a gel containing 0.2% SDS (Figures 15A and B). This concentration of SDS is sufficient to disrupt the NusG-DNA complex so that the intact fragment or TaqI digestion products can be electrophoresed into the gel and visualized. In the native gel, it is apparent that the DNA fragment in samples not containing NusG protein is efficiently restricted by TaqI at salt concentrations below 100 mM NaCl. At 150 mM NaC1, TaqI cleaves less efficiently and restriction of the DNA is incomplete. In samples containing NusG protein and run on the native gel, the DNA is retained as a complex in the well because the number of monomers of protein per kilobase of DNA exceeds the critical value of 20-50. When the NusG-DNA complexes that had been digested with TaqI were dissociated with 0.2% SDS and run in an SDS gel, it can be seen that the DNA is protected from TaqI restriction by NusG binding at salt concentrations below 50 mM NaCl. At salt concentrations above 50 mM NaC1, TaqI can digest the DNA within the NusGDNA complex nearly to completion. At 150 mM NaC1, TaqI is able to digest DNA in the complex much more efficiently than it digests free DNA. This result suggests that at salt concentrations below 50 mM, the DNA-protein complex is either more compact or more static and therefore resistant or inaccessible to digestion by TaqI  91 endonuclease. We cannot explain the activation of Taql nuclease by NusG in high salt buffer. 4.2.3^Binding activity of NusG to single-stranded DNA  Next, we tested if NusG can bind to single-stranded as well as duplex DNA. We found that when we mixed NusG (2 pg; 50 pmoles) with single-stranded M13 DNA (200 ng; 0.085 pmoles) to give a ratio of about 80 monomers of protein per kb of single-stranded DNA, a complex that was retained in the well of the electrophoresis gel formed within 30 sec at 65°C. When incubated less than 30 min, the complexes were visible with ethidium bromide staining; with incubation times greater than 30 min, the complexes could no longer be stained. The complexes were at least partially dissociated by incubation with SDS (>0.1%) (data not shown). The stoichiometry of the interaction between NusG and single-stranded DNA was examined by titrating a constant amount of single-stranded M13 DNA (0.84 pmoles) with various amounts of NusG (1.25 to 160 pmoles per assay). At protein concentration above 20 pmoles, all the DNA was sequestered in complexes and retained at the electrophoretic origin. The protein concentration required to achieve complete complex formation was 40 pmoles; this corresponds to about 66 monomers of NusG per thousand bases of single-stranded DNA. In the reciprocal experiment, a constant amount of NusG (25 pmoles) was titrated with an increasing amounts of single-stranded DNA. At DNA concentrations of 0.63 pmoles per assay and below, virtually all the DNA was sequestered; this corresponds to greater than 54 NusG monomers per thousand bases of DNA (Figure 16).  92  A^Native gel^  B  NaC1 (mM) 10 25 50 75 100 150 E.  NaC1 (mM) 10 25 50 75 100 150 E.  NuaG^- + - + - + - + - + - +  0.2‘)/0 SDS gel  ^ 8^ NuaG^  + + - + - 0  Figure 15 Susceptibility of NusG:duplex DNA complexes to restriction by Taql  In duplicate experiments, NusG protein (1 lig; 25 pmoles) was mixed with a 3.5 kb linear DNA fragment (0.09 pmoles) in 5 mM sodium phosphate buffer (pH 7.0) containing between 10 and 150 mM NaCl. The sample volume was 19.5 tl and the first incubation was for two hours at 65°C. At the end of the first incubation, 0.5 tl mixture containing 1 unit of TaqI endonuclease and sufficient Mg+ 2 to bring the final concentration to 3.0 mM was added to each sample. The second incubation was at 65°C for 1.5 hours. One sample from each duplicate was mixed with standard loading solution, run on a standard agarose gel (1.0%) (A), and the other was mixed with loading solution containing SDS (0.2% final concentration) and run on an SDS-containing 1.0% agarose gel (0.2% final concentration) (B). The arrow indicates the position of the control 3.5 kb DNA linear fragment. The 3.5 kb DNA fragment is cleaved by TaqI to produce a 2.0 kb (o) and a number of other smaller subfragments.  93  A NusG (pmol)  B  ssDNA (pmol)  Cq 1.0  8 -111P-  r-1 Cq tf1 1.0 CI V 1.0 OD 0 Cl r-1 C`q U 0 0 0 0 0 0 0 0 0 r-1 1-4 r-1 r-1 ri Cl .11 OD ‘.0^ .^.^. r-1 0 0 0 0 0 0 0 0  M  Figure 16 The stoichiometry of NusG:single-stranded DNA complexes  Assays were carried out in 15 1.11 volumes containing 33 mM NaC1 and 17 mM sodium phosphate, pH 7.0, and incubations were for two hours at 65°C. In (A), the amount of DNA was constant (0.084 pmoles) and the amount of NusG was increased from 1.25 to 160 pmoles. In (B), the amount of NusG protein was constant (25 pmoles) and the amount of DNA was increased from 0.021 to 0.168 pmoles. The position of free DNA is indicated by the arrow. The lane designated MIS is a molecular length standard (X PstI).  94 ^4.2.4^Competition between single-stranded and duplex DNA for NusG binding  The results above suggest that somewhat more NusG protein is required to fully complex single-stranded DNA (>50 monomers per thousand bases) than to fully complex duplex DNA (<50 monomers per thousand base pairs). Furthermore, the transition between free DNA and complex occurred over a narrow concentration range for duplex DNA and over a broader concentration range for single-stranded DNA (Figures 19 and 21). In fact, at intermediate NusG concentrations, a thin trail of ethidium bromide stainable material is apparent between the band of free DNA and the electrophoretic origin (Figure 16). This material presumably represents intermediates or incomplete complexes. We then investigated the preference of NusG for single-stranded or duplex DNA in a competition experiment, where equal molar amounts of a 3.5 kb duplex DNA fragment and a 7.2 kb single-strand M13 DNA (0.085 pmoles of each) were mixed together and titrated against increasing amounts of NusG protein (1.25-50 pmoles). At a protein concentration above 10 pmoles per assay, all of the 3.5 kb duplex DNA was sequestered in tight complexes and retained at the electrophoretic origin (Figure 17). At the same time, at least some single-stranded DNA remained visible in the uncomplexed or partially complexed state at a protein concentration of 40 pmoles per assay (Figure 17). These results indicate that NusG binds preferentially and more efficiently to duplex DNA. 4.2.5 Role of NusG in transcription  Since NusG functions as a transcriptional factor in E. coli, and the T. maritima NusG binds to DNA, the next question we asked concerned the specific  role of T. maritima NusG in transcription. A number of DNA templates were constructed for in vitro transcription assays. A 170 by EcoRI—Aval fragmen t containing the promoter P1 (see Part III) was isolated and cloned into plasmid  95  Figure 17 Competition between single-stranded and duplex DNA for NusG binding  Assays were carried out in 15 ill volumes containing 33 mM NaC1 and 17 mM sodium phosphate, pH 7.0, and incubations were for two hours at 65°C. Each assay contained 0.085 pmoles of a 3.5 kb duplex DNA fragment and 0.085 pmoles of single-stranded M13 DNA. The amount of NusG protein was increased from 1.25 pmoles to 50 pmoles per assay. After incubation, the samples were electrophoresed through a 1.0% agarose and stained with ethidium bromide. Lane designations are MLS, molecular length standard (X, PstI); SS + DS, single-stranded plus duplex DNA without protein; DS, duplex DNA without protein; SS, single-stranded DNA without protein.  96 pGEM3-Zf(+) between EcoRI and Aval sites of the multiple cloning site. A 93 by fragment containing the terminator To was obtained by PCR amplification using the 5' primer oD20 and the 3' primer oD21 (Table 3). The product was cloned into the Xbal and HindIII sites downstream of the P1 promoter fragment. (Primers oD20 and  oD21 have Xbal and HindIII recognition sequences respectively.) The resulting EcoRI-HindIII 271 fragment containing the promoter Pi and the terminator To was used for the in vitro transcription assay with partially purified DNA-dependent RNA polymerase from T. maritima; the results of this experiment are shown in Figure 18. The major bands correspond to (i) a 135-nucleotide-long transcript initiated at or near the mapped starting site (nucleotide 101, Figure 4) and terminated at To (denoted as Term), and (ii) the 170-nucleotide-long transcripts initiated at the same starting site and extended to the end of the linearized template (denoted as RT). In the absence (lane 1) and presence of small amounts of NusG (lanes 2-4, Figure 18), other RNA species are easily recognizable, and probably result from non-specific initiation and termination events. However, at a relatively higher NusG concentration (12.5 ilM, lane 5), only specific transcripts can be seen (Term and RT); the intensity of these bands appears to be reduced. These results also indicate that about 30% of transcripts escape termination. In contrast, in vivo transcript mapping suggested that the terminator 113 was nearly 100% efficient in eliciting termination of transcription from upstream promoters (Part III; Liao and Dennis, 1992). This discrepancy suggests that additional protein factor(s) may be required for efficient transcription termination. Alternatively, sequence context of the terminator, the tertiary structure of the genome, or the intracellular environment may be important for efficient termination. In the second set of in vitro transcription experiments, a potential transcription attenuator (ALio) normally located between the L1 and L10 genes (the L10 attenuator, see Part III) was integrated into the DNA templates. On the first such template, the L10 attenuator (A) was arranged between the promoter P1 and  97  Figure 18^In vitro transcription of DNA template containing a promoter and a terminator  The promoter P1 (P) and terminator To (T) that were identified in the cloned gene cluster from T. maritima genome (Part III) were fused together (details in the text), and a 271 nucleotide long DNA fragment was used for an in vitro transcription assay with purified RNA polymerase from T. maritima. Major transcripts are indicated (RT and Term), which are also depicted below the template with lines, and the 5' end of the transcripts is marked with a solid circle (•). Lane M shows DNA size markers (3' end-labeled restriction fragments); the sizes are (from top, bp): 190, 147 and 110. The concentrations of T. maritima NusG are (.1M): 0 (lane 1), 0.0125 (lane 2), 0.125 (Lane 3), 1.25 (Lane 4) and 12.5 (Lane 5); which are corresponding to approximately 0, 0.6, 6, 60, and 600 monomers of NusG per template molecule.  98 the terminator Ti3 (Figure 19). Specific transcripts generated on this template were those initiated at the promoter and terminated at  AL10 (Att,  about 160-nucleotide-  long), terminated at Tp (Term, about 370-nucleotide-long), or extended to the end of the linearized template (RT, about 410-nucleotide-long). Non-specific transcripts were also apparent (lanes 1-4, Figure 19). Addition of NusG has similar effect on transcription of this template; at a relatively high concentration (12.5 11M, lane 5 of Figure 19), non-specific transcripts were greatly suppressed, whereas the specific transcripts (RT, Term and Att) were largely unaffected. To check if NusG binds to the DNA template under the in vitro transcription assay condition, DNA band-shift assays using this template were conducted. The DNA-NusG complexes that were retained at the origin of electrophoresis were observed when the protein was added to amounts corresponding to those used for lanes 4 and 5 in the transcription assay (1.25 and 12.5 1.1M, respectively). No complexes were detected at lower protein concentrations. Thus, the NusG protein binds to the template DNA cooperatively. However, more monomers of NusG are required to form the protein-DNA complexes under the in vitro transcription assay condition than that shown above (about 200 monomers of NusG per thousand base-pair of template DNA; data not shown). To verify if Att (about 160-nucleotide-long) corresponds to transcripts that are terminated at the AL10, two additional templates were used (Figure 19). Both contain the promoter Pi followed by the AL10, and the distance between the two sequence elements were kept constant (same as on template 1, Figure 19). Templates 2 and 3 are 410 and 760 by long, respectively. The major transcripts are the read-through products (pointed by arrows), and the transcripts ended at the AL10 site (Att, Figure 19).  Figure 19^In vitro transcription of DNA templates containing a promoter, an attenuator, and a terminator  Templates that contain the promoter P1 (P), the attenuator ALio (A) and the terminator Tp (T) (Temp 1), and that carry P1 (P) and Allo (A) (Temp 2 and Temp 3) were used for in vitro transcription assays with purified RNA polymerase from T. maritima. (All three sequence elements were from the cloned gene cluster from the T. maritima genome [see Part III].) Major transcripts are illustrated (RT and Term, and Att), which are also indicated below each template with lines, and the 5' end of the transcripts is marked with a solid circle (•). Lanes 1 to 5 show transcripts synthesized on template 1, and the lanes marked with Temp 2 and Temp 3 show the RNA products generated on templates 2 and 3, respectively. Arrow points to the read-through products (RT) in lanes Temp 2 and Temp 3. Lane M shows DNA size markers (3' end-labeled restriction fragments); the sizes are (from top, bp): 765, 577 489, 457, 404, 360, 328, 281, 255, 240, 190, and 147. The concentrations of T . maritima NusG are (µM): 0 (lane 1), 0.0125 (lane 2), 0.125 (Lane 3), 1.25 (Lane 4) and 12.5 (Lane 5); these are corresponding to approximately 0, 1, 10, 100, 1,000 monomers of NusG per template molecule (0.5 kb).  100 4.2.6 Association of NusG with ribosomes in T. maritima  To determine the possible location of NusG within the T. maritima cell, the polyclonal antibodies against purified NusG protein were used to detect whether it is associated with the ribosomes of T. maritima. The 70S ribosomes and the 50S and 30S ribosomal subunits from the T. maritima cells were purified by sucrose gradient centrifugation. Western blotting analysis detected a protein band with molecular weight of about 40 kilodaltons in both supernatant and pellet fractions after centrifugation at 248,000 xg. The pellet corresponds to the cellular particles larger than 20S. Thus, the NusG protein seems to be associated with the ribosomes in T. maritima cell. About 25%-50% of NusG partitioned with the pellet. The pellet was then dissolved and centrifuged through a sucrose gradient. The fractions corresponding to the 70S particle, 50S and 30S ribosomal subunits were collected and pooled. Western blotting analysis of these fractions indicated that NusG protein was present in all fractions. However, the amounts of NusG diminished in the 50S and 30S subunit fractions after another round of sucrose gradient centrifugation, which seems to suggest that NusG is not an integral part of the ribosome (data not shown).  4.3 DISCUSSION  There is strong biochemical and genetic evidence to suggest that in E. coli, Nus factors form a multimeric complex with core RNA polymerase during transcriptional elongation (for reviews, see Das, 1992; Roberts, 1993). The complex is endowed with the ability to read through both factor-dependent and factorindependent transcription-termination signals. NusG is an important component of this complex; it binds to and interacts with both core RNA polymerase and Rho  101 factor. The interaction with Rho is necessary for Rho-mediated termination (Mason and Greenblatt, 1992; Sullivan and Gottesman, 1992; Li et al., 1992, 1993). In E. coli, the nusG gene is essential for cell viability; the in vivo depletion of NusG appears to affect only Rho-mediated termination (Downing et al., 1990; Sullivan and Gottesman, 1992). The gene encoding NusG is located in the secEnusG bicistronic operon and is adjacent to the rplKAJLrpoBC operon. Nuclease S1  protection assays indicate that the secE-nusG transcripts are about five to ten-fold less abundant than the ribosomal protein encoding transcripts (Downing et al., 1990; Downing and Dennis, 1991). Using a quantitative Western blotting assay, Li et al. (1993) estimated that there are about ten thousand copies of NusG per cell. This  value is about equal to the number of core RNA polymerase and about five-fold below the number of ribosomes per cell (Bremer and Dennis, 1987). The NusG homologous gene, designated vbrA, has recently been cloned and characterized from Streptomyces virginiae (Okamoto et al., 1992). Unlike other NusG homologs, the S. virginiae protein contains extra 125 amino acids at its amino terminus. The protein was first characterized because of its ability to bind butyrolactone, an autoregulator that triggers secondary metabolism and the production of the antibiotic virginiamycin in this organism. The unique amino terminal domain has a high proportion of acidic residues (35%) and contains four copies of the tetrapeptide Glu-Glu-Ala-Ala. It was suggested but not demonstrated that this acidic domain, absent from other NusG proteins, is the butyrolactone receptor domain. If this is correct, it suggests that the protein not only senses the presence of the butyrolactone but also activates genes responsible for secondary metabolism and antibiotic production. The gene activation may be mediated through the termination-antitermination activity of NusG. If so, it substantiates the idea that the NusG protein is an important element in determining the termination properties of the transcription complex. Alternatively, this protein  102 might have either a specific or general DNA-binding activity and thereby either directly or indirectly influence transcription initiation. The T. maritima protein contains a 171-amino-acid-long insertion after residue 45 that is not present in any of the other eubacterial NusG proteins that have been examined to date. There is some indication that this insertion may be the result of a partial gene duplication, but the evidence for this is not compelling and the situation is clearly more complicated (unpublished observations). The T. maritima NusG contains a relatively high content of charged amino acids (121 of  353 residues), but is only moderately basic (57 acidic and 64 basic residues; pI = 9.0) (Liao and Dennis, 1992). The charged residues are evenly distributed along the length of the molecule. The relationship between this 171 amino acid long insertion and the DNAbinding activity of the protein remains uncertain. Three approaches have been used with limited success to address this issue. First, the insert sequence as well as the entire sequence were compared to all entries in the protein sequence data base. The T. maritima NusG exhibited homology only to other eubacterial NusG proteins, whereas the insert failed to exhibit significant sequence identity to any other sequence including proteins with known DNA-binding activity. Second, we constructed a deletion version of the T. maritima NusG that is missing the entire 171 amino acid insertion. When expressed, the deletion protein was detectable by Western blotting but failed to accumulate in appreciable amounts and therefore could not be purified (unpublished results). Finally, we examined the E. coli NusG protein for its ability to bind DNA; no complexes were detected at 37°C using the same conditions employed here for the T. maritima protein. These results indicate that the T. maritima NusG has a DNA-binding activity not present in the homologous E. coli protein; the activity may or may not be located within or related to the insertion unique to this protein.  103 Our results indicate that the protein binds cooperatively to DNA in vitro; the stoichiometry is about 20-40 monomers per thousand base pairs of DNA. In T. maritima, the secE and nusG genes are cotranscribed with the downstream  ribosomal protein genes (Part III; Liao and Dennis, 1992). If the nusG cistron is efficiently translated, the stoichiometry of NusG and ribosomes should be about 1:1, and the protein is likely to be in substantial excess over the amount of core RNA polymerase. Thus, it is possible that the amount of the protein may be five to tenfold higher than in E. coli. If the genome size and composition of a T. maritima cell is similar to that of E. coli, there may be as many as 50,000 copies of NusG per cell; this would correspond to about five monomers per thousand base pairs of DNA. Two different low molecular weight histone-like proteins, associated with the nucleoid of E. coli have been purified and characterized (reviewed by Drlica and Rouviêre-Yaniv, 1987). The first, protein HU, is highly basic, resembling eukaryotic histones, and plays an important role in maintaining chromosome structure and superhelicity (Rouviere-Yaniv et al., 1979; Broyles and Pettijohn, 1986). The HU protein is very abundant (with 20,000 to 100,000 copies per cell; Broyles and Pettijohn, 1986). It binds to DNA more tightly than to RNA, as well as to ssDNA more strongly than to dsDNA; moreover, it has the propensity to associate with the ribosome (Dijk et al., 1983), but the biological significance of these properties is not clear (Gualerzi et al., 1986). Interestingly, we found that the T. maritima NusG seems to be also in association with the ribosomes, and bind to RNA (data not shown). However, interactions between ribosomes and cellular proteins appear to be very complicated and nature of these interactions is not well understood. The second, protein H1, is neutral rather than basic and binds as a dimer once every 400 base pairs to duplex DNA. The abundance of H1 in the bacterial nucleiod is second only to that of HU (estimated about 20,000 copies per cell) (for review, see Higgins et al., 1990). Mutations in the osmZ gene encoding the H1 protein are pleiotropic and influence the expression of many genes in a non-  104 uniform fashion, presumably by affecting superhelicity and overall topology of the nucleoid (Hulton et al., 1990). Both HU and H1 generally inhibit transcription by E. coli RNA polymerase; however, they can also enhance transcription of certain  templates (Drlica and Rouviêre-Yaniv, 1987; Higgins et al., 1990). At the present time, it is still uncertain if the NusG protein of T. maritima, in addition to being a transcription factor, also plays a role similar to HU and H1 in maintaining the structure of the nucleoid in this hyperthermophilic eubacterium. The T. maritima NusG seems to play an important role in transcription; at a protein concentration of about 2,000 monomers per thousand base pairs of template DNA, the generally high background of aberrant initiation and termination was essentially eliminated, and at the same time, specific transcription initiation and termination are maintained. It is possible that at high temperature, the RNA polymerase is very active; aberrant initiation and termination events are frequent, which would result in high level synthesis of useless transcripts. To prevent this, the NusG protein acquired new functions that make RNA polymerase more faithful. The NusG may stay bound to the DNA template, the RNA transcripts, and even the RNA polymerase throughout the transcription process. These interactions would prevent premature release of the transcripts and dissociation of the RNA polymerase from the template. On the other hand, these interactions may inevitably cause slow-down in RNA synthesis. 4.4 SUMMARY The NusG protein of T. maritima has 353 amino acid residues with an apparent molecular weight of 40 kilodaltons, twice as large as the E. coli counterpart, due to a large insertion in the central part of the sequence. It is a basic protein (pI=9.0). The T. maritima NusG was expressed in E. coli and the recombinant NusG purified. The purified NiisG hag non-specific DNA binding  105 activity; it binds DNA cooperatively. We estimated by DNA band-shift assays that about 40 NusG monomers per kilobase pairs of dsDNA are needed to form NusGdsDNA complexes. The number of NusG monomers required per kilobase ssDNA is higher (about 60) for the formation of NusG-ssDNA complexes. Two types of NusG-DNA complexes have been observed: the first type forms instantly, can be stained with ethidium bromide ("loose" complex); the second type forms more slowly, probably is converted from the structure(s) of the first type complex. The second type may be more compact, since it can not be stained with ethidium bromide ("tight" complex). The protein binds to both ds- and ssDNA, but preferentially to dsDNA in a mixture of both DNA molecules. It seems to play important roles in transcription in T. maritima. In vitro transcription assays using purified DNA-dependent RNA polymerase from T. maritima suggested that at relatively high NusG concentration, NusG appears to suppress aberrant transcription; however, the production of specific transcripts is largely unaffected. The sequence elements identified in the cloned gene cluster are functional in the in vitro transcription system. The promoter Pi was used as promoter element. The transcripts synthesized under the control of this promoter are consistent with the mapped transcripts in vivo. Transcription attenuation was also observed at the attenuator ALE). Transcription termination at the terminator To is faithful, but transcription did not stop fully at this site in vitro, whereas it functions at nearly 100% efficiency in vivo.  106  V. Molecular phylogenies based on the sequences of ribosomal proteins L11, L1, L10 and L12 5.1 INTRODUCTION  Ribosomes are subcellular particles that play a structural and functional role in the template directed synthesis of protein. Ribosomes were already present in the common primordial ancestor, and their basic structural and functional features have been preserved in all its diverse descendants. As a result, the macromolecular components of the ribosome, especially the small subunit ribosomal RNA, have been useful chronometers to measure evolutionary relationships among extant organisms (Pace, et al., 1986; Pace, 1991). In the E. coli ribosome, a pentameric complex, consisting of four copies of protein L12 and a single copy of protein L10, binds cooperatively along with another protein, L11, to a region in 23S rRNA between nucleotides 1030 and 1120 (Dijk et al., 1979; Egebjerg et al., 1990; Ryan et al., 1991). This interaction produces a distinct and easily recognizable stalk on the large ribosomal subunit. This structure is essential for the binding of the extrinsic factors EF-Tu and EF-G and participates in conformational rearrangements of the ribosome that are accompanied by the hydrolysis of GTP (reviewed by Liljas, 1982; Shimmin et al., 1989). Quaternary complexes similar to the E. coli (L12)4-L10-L11-rRNA complex are structurally and functionally conserved in the ribosomes of archaebacteria and eukaryotes (Beauclerk et al., 1985; El-Baradi et al., 1987; Uchiumi et al., 1987; Cassiano et al., 1990). A fourth protein, L1, binds to large subunit RNA between nucleotides 2100 and 2200 (Branlant et al., 1981). It functions to stabilize peptidyl-tRNA binding to the ribosome P site and participates indirectly in the factor dependent GTP hydrol sis Subramania  .11 0. ea  •21  an• er, •83).  107 In E. co/i, the genes encoding L11, L1, L10 and L12 form a complex transcription unit that also contains the genes for the two large subunits of RNA polymerase. It was somewhat surprising to find that the clustering of the genes encoding these four ribosomal proteins was conserved not just in eubacteria but also in a range of distantly related archaebacterial species including Halobacterium cutirubrum (Shimmin and Dennis, 1989), Haloferax volcanii (Shimmin and  Dennis, unpublished results), Haloarcula marismortui (Arndt and Weigel, 1990), and Sulfolobus solfataricus (Ramirez et al., 1989). In eukaryotes, these genes are not linked (Newton et al., 1990) and the L12 gene has undergone a very ancient duplication that possibly predates the earliest eukaryotic organism. In this part, available L11, L1, L10 and L12 gene and protein sequences from eubacterial, archaebacterial and eukaryotic organisms are aligned and analyzed. We observed that for each of the gene-protein analyses, there is strong coherence for grouping organisms into the three primary kingdoms (or domains): eubacteria, archaebacteria and eukaryotes. That is, the gene or protein sequences of organisms from within any one of the three domains are more closely related to each other than they are to sequences from the other two domains. The patterns of divergence for the L11, L10 and L12 proteins between eubacteria, archaebacteria and eukaryotes are surprisingly dissimilar considering their intimate physiological interactions on the ribosome.  5.2 RESULTS AND DISCUSSION 5.2.1 Alignment and phylogeny of L11 proteins  There are five eubacterial sequences and one chloroplast sequence, which is encoded by the nuclear genome, available for ribosomal protein L11. They align from end to end with only two gaps in the alignment at positions 2-5 and 53  108 (Figure 20). The high degree of amino acid sequence identity among these five sequences clearly suggests that the chloroplast sequence is of eubacterial origin. The three available archaebacterial L11 protein sequences can be easily accommodated to this alignment. The archaebacterial proteins retain seven of the eight proline residues that are conserved in the eubacterial alignment at positions 24, 26, 27, 30, 60, 79 and 98; an eighth proline at position 80 has been replaced only in the S. solfataricus sequence. The archaebacterial Lii proteins are further characterized by a shorter amino terminus and by a 25-32 amino acid long extension at the carboxyl terminus when compared to the eubacterial L11 sequences. The proteins designated "L15" from S. cerevisiae (Pucciarelli et al., 1990) and "L12" from R. rattus (Suzuki et al., 1990) are homologs. They align end-to-end without gaps and are identical at 115 of the 165 positions. Based upon (i) immunological cross-reactivity (Juan-Vidales, et al., 1983), (ii) a limited degree of amino acid sequence similarity, and (iii) a common binding site within mouse 28S rRNA (El-Baradi et al., 1987), these eukaryotic proteins have been implicated as homologues of the Lii protein of E. coli. The eukaryotic L11 sequences can be accommodated in the alignment by the inclusion of only two additional internal gaps (positions 66 and 77). Of the seven positions where proline is conserved in the archaebacterial and eubacterial proteins, only two (positions 30 and 79) are retained in the eukaryotic proteins (Figure 20A). The phylogenetic relationships between the eleven L11 protein sequences were analyzed using PAUP (Figure 20B). The eubacteria were contained within a well-defined domain. The location and branching order of three species within this domain, Streptomyces virginiae, spinach chloroplast and T. maritima, are not rigorously defined. The two eukaryotic L11 sequences form another well defined branch that originates from the S. solfataricus linea:e within  II • •  group. If the ancestral root of the tree is located near or within the eubacterial  109 domain (below the position of the arrow in Figure 20B), then the archaebacteria would appear to be monophyletic but not holophyletic. (A group of taxa are said to be holophyletic only when they not only are descended from a single ancestral species, but represent all the descendants of that ancestor. In the tree shown in Figure 20B, archaebacteria share a common ancestor with eukaryotes; thus by strictest definition, archaebacteria are only considered being a monophyletic group.) However, bootstrap analysis indicates that the positioning of S. solfataricus relative to eukaryotes is tenuous. In the DNA parsimony tree (and in all other distance method trees), the archaebacteria are both monophyletic and holophyletic; the bootstrap confidence for this arrangement was 0.82. 5.2.2 Alignment and phylogeny of Li protein sequences  There are five eubacterial and six archaebacterial L1 equivalent protein sequences available (Figure 21A). Although the proportion of conserved amino acid residues within the L1 family is relatively high, the alignment is interrupted by gaps at approximately 15 different positions. Many of these gaps, particularly the five gaps located beyond amino acid position 125, clearly differentiate the archaebacterial proteins from the eubacterial proteins. Deletion-insertion events are generally rare and their co-occurrence in multiple sequence alignments is a strong indication of common ancestry. In E. coli, protein L1 binds to nucleotides 2100-2200 of the E. coli 23S rRNA (Branlant et al., 1981). The sequence and secondary structure of this binding domain within large subunit rRNA of archaebacteria and eukaryotes are highly conserved and the E. coli protein can protect these sites in vitro from ribonuclease digestion (Zimmerman et al., 1980; Gourse et al., 1981). In E. coli protein L1 is also an autogenous regulator of translation of the mRNA containing the L11, Li, L10 and L12 cistrons. A region within the leader of the mRNA exhibits primary sequence and secondary structural similarity to the authentic L1 binding domain in  110 Figure 20^Alignment of the amino acid sequences of the ribosomal protein L11 family, and phylogenetic tree based on this alignment A. The L11 proteins are from five eubacteria, one chloroplast, three archaebacteria, and two eucaryotes. The leader peptide required for import of the chloroplast L11 protein into the organelle is not included in the alignment. The numbers indicate the common alignment positions. The species names and their abbreviations are listed in Table 5 (pp. 41-42). B. A parsimony analysis of aligned sequences was carried out using PAUP with branch and bound search option, and the parsimonious tree is illustrated. There are 130 informative sites; all of them are included in the parsimony analysis. The consistency index for the tree is 0.886. The numbers indicate percent confirmation of grouping of species to the right of the node by bootstrapping analysis with 2000 replications. The arrow indicates that possible root of the universal tree is located between the eubacteria and archaebacteria-eukaryotes.  111  A 10^20^30^40^50^60^70^80^90^100 SceL11^MPPKFDPNEVKYLYLRAVGGEVGASAALAPKIGPLGLSPKKVGEDIAKATKE-FKGIKVYVQLKI-QNRQ-AAASV-VPSASSLVITALKEPPRDRKKDK RraL11^MPPKFDPNEIKVVYLRCTGGEVGATSALAPKIGPLGLSPKKVGDDIAKATGD-WKGLRITVKLTI-QNRQ-AQIEV-VPSASALIIKALKEPPRDRKKQK SsoLll^MPTKT ^ IKIMVEGGSAKPGPPLGPTLSQLGLNVQEVVKKINDVTAQ-FKGMSVPVTIEIDSSTKKYDIKVGVPTTTSLLLKAINAQEPSGDPAH HcuLll^MAE T ^ IEVLVAGGQADPGPPLGPELGPTPVDVQAVVQEINDQTEA-FDGTEVPVTIEYEDDGS-FSIEVGVPPTAALVKDEAGFDTGSGEPQE HmaL11^MAG-T ^ IEVLVPGGEANPGPPLGPELGPTPVDVQAVVQEINDQTAA-FDGTEVPVTVKYDDDGS-FEIEVGVPPTAELIKDEAGFETGSGEPQE Sol(c)L11 KA----KKVIGVIKLALEAGKATPAPPVGPALGSKGVNIMAFCKDYNARTAD-KPGEVIPVEITVEDDKS-FTFILKTPPASVLLLKASGAEKGSKDPQM EcoL11^MA----KKVQAYVKLQVAAGMANPSPPVGPALGQQGVNIMEECKAFNAKTDSIEKGLPIPVVITVYADRS-FTFVTKTPPAAVLLKKAAGIKSGSGKPNK SmaLll^MA----KKVQAYVKLQVAAGMANPSPPVGPALGQQGVNIMEECKAFNAKTDSIEKGLPIPVVITVYSDRS-FTFVTKTPPAAVLLKKAAGIKSGSGKPNK PvuLll^MA----KKVQAYIKLQVSAGMANPSPPVGPALGQQGVNIMEFCKAFNAKTESVEKGLPIPVVITVIADRS-FTFVTKTPPAAVLLKKAAGVKSGSGKPNK SviLll^MPPK-KKKVTGLIKLQIKAGAANPAPPVGPALGQHGVNIMEECKAYNAATES-QRGMVVPVEITVYDDRS-FTFITKTPPAARLILKNAGIEKGSGEPHK TmaL11^MA----KKVAAQIKLQLPAGKATPAPPVGPALGQHGVNIMEECKRFNAETAD-KAGMILPVVITVYEDKS-FTFIIKTPPASELLKKAAGIEKGSSEPKR 110^120^130^140^150^160^170^180 SceL11^NVKHSGNIQLDEIIEIARQMRDKSFGRTLASVTKEILGTAQSVGCRVDFKNPHDIIEGINAGEIEIPEN ^ PraL11^NIKHNGNITFDEIVNIARQMRHRSLARELSGTIKEILGTAQSVGCNVDGRHPHDIIDDINSGAVECPAS ^ SsoL11^---KIGNLDLEQIADIAIKKKPQLSAKTLTAAIKSLLGTARSIGITVEGKDPKDVIKEIDQGKYNDLLTNYEQKWNE-AEG HcuL11^---FVADLSIEQLKTIAEQKKPDLLAYDARNAAKEVAGTCASLGVTIEGEDARTFNERVDDGDYDDVLGD ^ ELAAA HmaL11^---FVADLSVDQVKQIAEQKHPDLLSYDLTNAAKEVVGTCTSLGVTIEGENPREFKERIDAGEYDDVFAA ^ E AQA Sol(c)L11 --EKVGKITIDQLRGIATEKLPDLNCTTIESAMRIIAGTAANMGIDID---PPILVKKKKEVIF EcoL11^--DKVGKISRAQLQEIAQTKAADMTGADIEAMTRSIEGTARSMGLVVED ^ SmaL11^--DKVGKVTRAQVREIAETKAADMTGSDVEAMTRSIEGTARSMGLVVED ^ PvuLll^--EKVGKITSAQVREIAETKAADLTGADVEAMMRSIAGTARSMGLVVED ^ SviLll^--TKVAKLTAAQVKEIAELKMPDLNANDIDAAVKIIAGTARSMGVTVEG ^ TmaLll^--KIVGKVTRKQIEEIAKTKMPDLNANSLEAAMKIIEGTAKSMGIEVVD ^  B  112 Figure 21^Alignment of the amino acid sequences of the ribosomal protein L1 family, and the phylogenetic tree based on this alignment  A. The Ll proteins from five eubacteria, and six archaebacteria are aligned. The numbers indicate the common alignment positions. The species names and their abbreviations are listed in Table 5 (pp. 41-42). B. A parsimony analysis of the L1 sequences was carried out using PAUP with heuristic and branch and bound tree search options, and one of the two shortest trees found with 627 steps is depicted. The other tree differs only in the position of S. solfataricus, which is indicated by a dash line; the branch length of this  alternative lineage is arbitrary. There are 176 informative sites; all of them are included in the parsimony analysis. The consistency index for the shortest tree is 0.900. The numbers indicate percent confirmation of grouping of species to the right of the node by bootstrapping analysis with 2000 replications. The arrow indicates that possible root of the universal tree is located between the eubacteria and archaebacteria.  ^  113  A^10^20^30^40^50^60^70^80^90^100 HcuLl HhaLl HmaLl HvoLi MvaLl SsoLl  MADNDIE-EAVAR-ALEDAPQR ^ NFRETVDLAVNLRDLDLNDPSQRVDEGVVLPSGTGQETQIVVFADGETAV-RADDVADDVLDE MADNDIE-EAVAR-ALEDAPQR ^ NFRETVDLAVNLRDLDLNDPSQRVDEGVVLPSGTGQETQIVVFADGETAV-RADDVADDVLDE MADQEIE-NAVSR-ALEDAPER ^ NFRETVDLAVNLRDLDLNDPSNRVDESVVLPAGTGQETTIVVFAEGETAL-RAEEVADDVLDE MAD-TIV-DAVSR-ALDEAPGR ^ NFRETVDLAVNLRDLDLNDPSKRVDESIVLPSGTGQDTQIVVFATGETP---AEDAADEVLGP MDSAQIQ-KAVKE-ARTRKPR-NFTQSVDLIV ^ NFTQSVDLIVNLKELDLTRPENRLKEQIVLPSGKGKDTKIAVIAKGDLAA-QAAEMGLTVIRQ MKKVLAD-KESLIEALK----LALSTEYNV KR ^NFTQSVEIILTFKGIDMKKGDLKLREIVPLPKQPSKAKRVLVVPSFEQLEYAKKASPNVVITR  BstL1 EcoLl SmaL1 PvuLl TmaLl  MPKVDKKYLEALK-LVDRSKAYPIAQAIEIVKKTNVAKFDATVEVAFRL-GVDPKKACQQIRGAVVLPHGTGKVARVLVFAKGEKAK-EAEAAGADYVGMAKLTKRMRVI-REKVDATKQYDINEAIALLKELATAKFVESVDVAVNL-GIDARKSDQNVRGATVLPHGTGRSVRVAVETQGANAE-AAKAAGAELVGMAKLTKRMRVI-RDKVDATKQYDITEAIALLKELATAKFVESVDVAVNL-GIDARKSDQNVRGATVLPHGTGRSVRVAVFTQGANAE-AAKAAGAELVGMAKLTKRMRNI-REKVEVTKQYEIAEAVALLKELATAKFVESVDVAVNL-GIDARKSDQNVRGATVLPHGTGRSVRVAVFAQGANAE-AAKEAGAELVGMPKHSKRYLEA-RKLVDRTKYYDLDEAIELVKKTATAKFDETIELHIQT-GIDYRKPEQHIRGTIVLPHGTGKEVKVLVFAKGEKAK-EALEAGADYVG-  ^110^120^130^140^150^160^170^180^190^200 HcuLl DDLSDLADDTDAAKDLADETDFFVAE ^APMMQDIVGALGQVLGPRGKMPTPLQPDD--DVVDTVNRMKNT-VQIRSRDRRTFHTRVGAEDMSAEDI HhaLl DDLSDLADDTDAAKDLADETDFFVAE ^ APMMQDIVGALGQVLGPRGKMPTPLQPDD--DVVDTVNRMKNT-VQIRSRDRRTFHTRVGAEDMSAEDI HmaLl DELEELGGDDDAAKDLADDTDFFIAE ^KGLMQDIGRYLGTVLGPRGKMPEPLDPDD--DVVEVIERMKNT-VQLRSGERRTFHTRVGAEDMSAENI HvoLl DELEDFGDDTDAAKDLADETDFFVAE ^AGLMQDIGRYLGTVLGPRGKMPTPLQPAD--DVVETVNRMYNT-VQLRTRDRRTFHTRVGEDDMTPDEI MvaLl EELEELGKNKKAAKRIANEHGFFIAQ ^ADMMPLVGKSLGPVLGPRGKMPTPLPGNA--NLAPLVARFKKT-VAINTRDKSLFQVYIGTEAMSDEEI SsoLl EELQKLQGQKRPVKKLAIQNEWFLIN ^ QESMALAGRILGPALGPRGKFPTPLPNTA--DISEYINRFKRS-VIVKTKDQPQVQVFIGTEDMKPEDL BstL1 EcoLl SmaLl PvuLl TmaL1  D TEY ^ INK --IQQGWFDFDVVVATPDMMGEVGK-LGRIIGPKGLMPNPKTGTVTFDVAKAVQEIKAGKVEYRVDKAGNIHVPIGKVSFDMEKL --MEDL ^ ADQ - IKKGEMNFDVVIASPDAMRVVGQ-LGQVLGPRGLMPNPKVGTVTPNVAEAVKNAKAGQVRYRNDKNGIIHTTIGKVDFDADKL --MEDL ^ AEQ --IKKGEMNFDVVIASPDAMRVVGQ-LGQISGPRGLMPNPKVGTVTPNVAEAVKNAKAGQVRYRNDKNGIIHTTIGKVDFDADKL --MDDL ^ AAK- VKAGEMDFDVVIASPDAMRVVGQ-LGQILGPRGLMPNPKVGTVTPNVAEAVKNAKAGQVRYRNDKNGIIHTTIGKVVSTKHKL --AEDL ^ VEK-I-EKEGFLDFDVAIATPDMMRIIGR-LGKILGPRGLMPSPKSGTVTQEVAEAVKEFKKGRIEVRTDKTGNIHIPVGKRSFDNEKL  HcuLl HhaLl HmaLl HvoL1 MvaLl SsoLl  210^220^230^240 ASNIDVIMRRLHANLEKGP--LNVDSVYVKTTMGPAVEVA ^ ASNIDVIMRRLHANLEKGP--LNVDSVYVKTTMGPAVEVA ^ ADNIDVILRRLHADLEKGP--LNIDTVYVKTTMGPAMEVA ^ ARTSNVIVRRLEATLEKGP--LNIDSVYVKTTMGPSVEVPA ^ AANAEAILNVVAKKYEKGL--YHVKSAFTKLTMGAAAPISK ^ AENAIAVLNAIENKA-KVE--TNLRNIYVKTTMGKAVKVKRA  BstL1 EcoLl SmaLl PvuLl TmaLl  KENFAAVYEAIIKAKPAAAKGTYVKNVTITSTMGPGIKVDPTTV-AVAQ KENLEALLVALKKAKPTQAKGVYIKKVSISTTMGAGVAVDQAGLSASVN KENLEALLVALKKAKPSQAKGMYIKKVSLSTTMGAGVAIDQSGLSAAAN KENLEALLVALKKAKPSAAKGVYIKKVSLSTTMGAGVAIDQASLSATVKENIIAAIKQIMQMKPAGVKGQFIKKVVLASTMGPGIKLNLQSL-LK-E  B  SsoL1 — — — — SsoL1 MvaL1  1001  53  HcuL1  65^HhaL1 100  ^ HmaL1 ^ HvoL1 EcoL1 SmaL1  100  — PvuLl  100 69  ^ BstL1  ^TuaLl  Figure 21  114 23S rRNA. Any deficiency in the production of rRNA results in L1 protein accumulation; the excess protein binds to the structural mimic on the mRNA and prevents translation of the L11 and L1 cistrons (Dean and Nomura, 1980; Yates and Nomura, 1981; Baughman and Nomura, 1981; Thomas and Nomura, 1987; Kearney and Nomura, 1987). Similar mimics of the Li rRNA binding site have been identified in the mRNAs of other eubacterial, as well as halophilic and methanogenic archaebacterial species (Sor and Nomura, 1987; Shimmin and Dennis, 1989; Baier et al., 1990; Liao and Dennis, 1992). Thus, both structural and regulatory features of the Ll family of proteins are conserved within eubacteria and at least some groups of archaebacteria. The eukaryotic homologue to protein Li has not been identified. The PAUP analysis of the Li protein sequences produced two equally parsimonious trees that group eubacteria and archaebacteria in separate and well resolved domains. The two trees differ only in their placement of S. solfataricus; in the first case it branches separately and somewhat closer to eubacteria (solid branch position in Figure 21B; 53% bootstrap confirmation), and in the second case it branches with M. vannielli (stippled branch) and separately from the halophilic L1 sequences. Distance and DNA parsimony methods position S. solfataricus and M. vannielli together although the grouping is tenuous.  5.2.3 The sequence alignments and phylogeny of L10 proteins Between eubacteria and archaebacteria, the L10 proteins are in general less conserved than are the L11 and L1 proteins. However, because of domain conservation within L10 proteins, a reasonable alignment can be achieved with little difficulty. By using L10 sequences from the archaebacterial species H. cutirubrum and S. solfataricus as "bridges," Shimmin et al. (1989) demonstrated  that the eukaryotic "PO" proteins are actually homolo ues of the bac proteins.  115 The sequence alignment of the L10 protein family from four eubacteria, five archaebacteria, and six eukaryotes is illustrated in Figure 22A. Amino acid identity among all the L10 proteins is highest within the amino terminal 121 residues. The most conspicuous feature is the presence of several highly conserved basic residues at alignment positions 17 (lys), 51 (arg), 68 (lys or arg), 74 (lys or arg), and 121 (lys). There are also many positions in this region which have high incidence of hydrophobic residues. These features suggest that secondary structures in this domain may be highly similar if not identical and that this domain may be involved in rRNA binding (Gudkov et al., 1980; Pettersson, 1979; Mitsui et al., 1989). It is difficult to align with certainty the carboxyl domain of the eubacterial L10 sequences beyond position 121 with the eukaryotic and archaebacterial sequences. Nonetheless, the sequence RNLVYVLNAI of T. maritima L10 near the carboxyl end is highly similar to the archaebacterial sequences around position 240 (e.g. RNL-SV-NAA in H. cutirubrum; Figure 22C). This sequence was used as a starting point to achieve the depicted alignment between positions 173 and 248. The archaebacterial and eukaryotic proteins exhibit a carboxyl terminal extension of approximately 80-100 residues that is clearly not present in the eubacterial protein. This extension is characterized in part by a cluster of charged amino acids (approximately position 320-359). In the eukaryotic proteins, this charged region is preceded by an alanine-proline rich region that is either shortened in, or absent from, the archaebacterial proteins. It has been suggested that these features are a result of a partial duplication of the L12 gene that has been fused to the end of the L10 gene (Shimmin et al., 1989). Within any species of archaebacteria or eukaryote, substantial sequence identity is always apparent between the carboxyl termini of the respective L10 and L12 proteins. For example, the identical sequences at the carboxyl terminus of the L10 and L12 proteins from S. solfataricus are ••QAAEKKEEKKEEEKK ,LSSLFG and from human are  ••KEESEESD(D/E)DMGFGLFD.  116 The carboxyl terminal four to six amino acid residues for the four eubacterial L10 proteins contains a high proportion of charged acidic or basic residues. This region is possibly the functional analog to the region of high charge density within archaebacterial and eukaryotic L10 proteins. In the depicted alignment these residues are somewhat arbitrarily placed at positions 343-348. The analysis of the L10 protein sequences by PAUP yields six equally parsimonious tree configurations. These six trees divide into the two types designated tree 1 and tree 2 in Figure 22D. The L10 proteins from human, rat and mouse are identical except for a few conservative amino acid replacements and a single deletion in the rat protein at position 324. The three subtypes within the type 1 and type 2 trees result from the rearrangement of these closely related mammalian L10 sequences. The type 1 and type 2 trees differ from each other in two respects: the first is the branching order within the eubacterial domain and the second is the positioning of S. solfataricus. In the type 1 tree, Synechocystis is the deepest branch within the eubacteria, and the eukaryotes branch from the S. solfataricus lineage within the archaebacterial group. In the type 2 tree, Synechocystis and T. maritima group together within the eubacteria and the eukaryotes branch from the methanogen halophile lineage within the archaebacterial group. Neither of these two positions for the origin of the eukaryotic domain is supported by bootstrapping. And again, if the root of the tree is within the eubacterial domain (below the position of the arrow in Figure 22D) the archaebacteria appear monophyletic but not holophyletic. Some regions of the L10 protein alignment are less certain than others. When positions 249-369, representing the region of uncertain alignment, were excluded from parsimony analysis, the shortest trees found exhibited a topology identical to the two types of tree illustrated in Figure 23D. When onl ali pose ions to 121 were used for parsimony analysis, the branch pattern within the  117  Figure 22^Alignment of amino acid sequences of the ribosomal protein L10 family, and the phylogenetic tree based on the L10 alignment  A. The L10 ribosomal proteins from five eubacteria, five archaebacteria and six eucaryotes are aligned. In the eukaryotes, these proteins were previously designated "P0." The species names and their abbreviations are as in Table 5 (pp. 41-42). B. The consensus of the alignment was generated manually by majority rule. When majority is not evident at an alignment position, chemically similar amino acid residues were considered to determine the consensus. Question mark (?) indicates that there is no simple consensus at such positions. C. Alignment of the L10 sequences of T. maritima and H. cutirubrum at positions 239-248.  D. A parsimony analysis of the aligned sequences was carried out using PAUP with the branch and bound tree search option, and two of the six equally shortest trees are illustrated (Tree 1 and Tree 2). Four other alternative trees arise because of regrouping among the three mammalian species (Hsa, Rno and Mmu). There are 289 informative sites; all were included in the parsimony analysis. The consistency indices for the two trees are 0.849. The arrow indicates that possible root of the universal tree is located between the lineages of eubacteria and archaebacteriaeukaryotes. The numbers indicate percent confirmation of grouping of species to the right of the node by bootstrapping analysis with 1000 replications.  ^  118  A DdiL10 DmeL10 HsaL10 MmuL10 RnoL10 SceL10  ^ 10^20^30^40^50 60^70^80^90^100 MSGAGS-K ^ RKKLFIEKATKLETTYDKMIVAEADFVGSSQLQKIRKSIR-- --GIGAVLMGKKTMIRKVIRDLAD--SKPELDALNTYLKQNTC M--VRENKAA ^ WKAQYFIKVVELFDEFPKCFIVGADNVGSKQMQNIRTSLR-- --GLAVVLMGKNTMMRKAIRGHLE--NNPQLEKLLPHIKGNVG M--PREDRAT^ WKSNYFLKIIQLLDDYPKGFIVGADNVGSKQMQQIRMSLR-- --GKAVVLMGKNTMMRKAIRGHLE--NNPALEKLLPHIRGNVG M--PREDRAT ^ WKSNYFLKIIQLLDDYPKCFIVGADNVGSKQMQQIRMSLR-- --GKAVVLMGKNTMMRKAIRGHLE--NNPALEKLLPHIRGNVG M--PREDRAT ^ WKSNYFLKIIQLLDDYPKGFIVGADNVGSKQMQQIRMSLR-- --GKAVVLMGKNTMNRKAIRGHLE--NNPALEKLLPHIRGNVG MGGIRE-K ^ KAEYFAKLREYLEEYKSLFVVGVDNVSSQQMHEVRKELR-- --GRAVVLMGKNTMVRRAIRGELS--DLPDFEKLLPFVKGNVG  HcuL10 HhaL10 HmaL10 MvaL10 SsoL10  M-SAEEQRTTEEVPEWKRQEVAELVDLLETYDSVGVVNVTGIPSKQLQDMRRGLH----GQAAVRMSRNTLLVRALEEAGD ^GLDTLTEYVEGEVG M-SAEEQRTTEEVPEWKRQEVAELVDLLETYDSVGVVNVTUIPSKQLQDMRRGLH----GQAALRMSRNTLLVRALEEAGD ^ GLDTLTEYVEGEVG M-SAESERKTETIPEWKQEEVDAIVEMIESYESVGVVNIAGIPSRQLQDMRRDLH----GTAELRVSANTLLERALDDVDD ^GLEDLNGYITGQVG MIDAKSEHK---IAPWKIEEVNALKELLKSANVIALIDMMEVPAVQLQEIRDKIR----DQMTLKMSRNTLIKRAVEEVAEETGNPEFAKLVDYLDKGAA M-IGLAVTTTKKIAKWKVDEVAELTEKLKTHKTIIIANIEGFPADKLHEIRKKLR----GKADIKVTKNNLENIALKNAGY ^ DTKLFESYLTGPNA  EcoL10 SecL10 styLio TmaL10  MALNLQD MGRTREN MALNLQD M-LTRQQ  DdiL10 DmeL10 HsaL10 MmuL10 RnoL10 SceL10  110^120^130^140^150^160^170^180^190^200 IIECKDNIAEVXRVINTQ--RVGAPAKAGVFAPNDVIIPAGPTGMEPTQ-TSFLQDLKIATKINRGQIDIVNEVEIIKTGQKVGASEATLLQKLNIKPFT FVFTKGDLAEVRDKLLES--KVRAPARPGAIAPLHVIIPAQNTGLGPEK-TSFFQALSIPTKISKGTIEIINDVPILKPGDKVGASEATILNMLNISPFS FVFTKEDLTEIRDMLLAN--KVPAAARAGAIAPCEVTVPAQNTGLGPEK-TSFFQALGITTKISRGTIEILSDVQLIKTODKVGASEATLINMLNISPFS FVFTKEDLTEIRDMLLAN--KVPAAARAGAIAPCEVTVPAQNTGLGPEK-TSFFQALGITTKISRGTIEILSDVQLIKTGDKVGASEATLLNMLNISPFS FVFTKEDLTEIRDMLLAN--KVPAAARAGAIAPCEVTVPAQNTGLGPEK-TSFFQALGITTKISRGTIEILSDVQLIKTGDKVGASEATLLNMLNISPFS FVFTNEPLTEIKNVIVSN--RVAAPARAGAVAPEDIWVRAVNTGMEPGK-TSFFQALGVPTKIARGTIEIVSDVKVVDAGNKVGQSEASLLNLLNISPFT  HcuL10 HhaL10 HmaL10 MvaL10 SsoL10  LVATNDNPFGLYQQLENS--KTPAPINAGEVAPNDIVVPEGDTGIDPGPFVGELQTIGANARIQEGSIQVLDDSVVTEEGETVSDDVSNVLSELGIEPKE LVATNDNPFGLYQQLENS--KTPAPINAGEVAPNDIVVPEGDTGIDPGPFVGELQTIGANARIQEGSIQVLDDSVVTEEGETVSDDVSNVLSELGIEPKE LIGTDDNPFSLFQELEAS--KTPAPIGAGEVAPNDIVIPEGDTGVDPGPFVGELQSVGADARIQEGSIQVLSDSTVLDTGEEVSQELSNVLNELGIEPKE IVVTEMNPFKLEKTLEES--KSPAPIKGGAIAPCDIEVKSGSTGMPPGPFLSELKAVGIPAAIDKGKIGIKEDKVVAKEGDVISPKLAVVLSALGIKPVT FIFTDTNPFELQLFLSKF--KLKRYALPGDKADEEVVVPAGDTGIAAGPMLSVEGKLKIKTKVQDGKIHILQDTTVAKPGDEIPADIVPILQKLGIMPVY  KQAIVAEVSEVAKGALSAVVADSRGVTVDKMTELRKAGRE -AGVYMRVVRNTLLRRAVEGT ^ PFECLKDAFVGPTL KATVISDVQELFQDAQMTVIIDYQGLWAEITDLRNRLRP- LGGTCKIAKNTLVRRALAGQ-E ^ AWSPMEEFLTGTTA KQAIVAEVSEVAKGALSAVVADSRGVTVDKMTELRKAGRE- AGVYMRVVRNTLLRRVVEGT ^ QFECLKDTFVGPTL KELIVKEMSEIFKKTSLILFADFLGETVADLTELRSRLREKYGDGARFRVVRNTLLRRAVENA ^ EYEGYEEFLKGPTA  EcoLlO IAYSMEHP-GAAARLFKEFAK ^ ^SecL10 ILVLKEDL-GGAIKAYKKFQK ^ ^StyL10 IAYSMEHP-GAAARLFKEFAK ^ ^TmaL10 VLYVTEGDPVEAVKIIYNFYK ^  ANAKFEVKAAAFEGELIPASQIDRLDTK- KTELRGGVLEGKSLTQADVEAIANAKFEVKAAAFEGELIPASQIDRLDKKADLSRLKGGFLEGKKFTAEEVENI-  DdiL10 DmeLiO HsaL10 MmuL10 RnoL10 SceL10  210^220^230^240^250^260^270^280 YGLEPKIIYDAGACYSPS---ISEEDLINKFKQGIFNIAAI-SL-EIGYPTVASIPHSVMNAFKNLLAISFETSYTFD ^ YCLIVNQVYDSGSIFSPEILDIKPEDLRAKFQWVANLAAV-CL-SVGYPTIASAPHSIANGEKNLLAIAATTEVEFK ^ FGLVIQQVFDNGSIYNPEVLDITEETLHSRFLEGVRNVASV-CL-QIGYPTVASVPHSIINGYKRVLALSVETDYTFP ^ FGLIIQQVFDNGSIYNPEVLDITEQALHSRFLEGVRNVASV-CL-QIGYPTVASVPHSIINGYKRVLALSVETEYTFP ^ FGLIIQQVFDNGSIYSPEVLDITEQALHTRFLEGVANVASV-CL-QIGYPTVASVPHSIINGYKRVLALSVETDYTFP ^ FGLTVVQVYDNGQVFPSSILDITDEELVSHFVSAVSTIASI-SL-AIGYPTLPSVGHTLINNYKDLLAVAIAASYHYP ^  HcuL10 HhaL10 HmaL10 MvaL10 SsoL10  VGLDLRGVESEGVLFTPEELEIDVDEYRADIQSAAASARNL-SV-NAAYPTERTAPDLIAKGRGEAKSLGLQASVESPDLADDLVSKADAQVRALAAQID VGLDLRGVESEGVLFTPEELEIDVDEYRADIQSAAASARNL-SV-NAAYPTERTAPDLIAKGRGEAKSLGLQASVESPDLADDLVSKADAQVRALAAQID VGLDLRAVFADGVLFEPEELELDIDEYRSDIQAAAGRAFNL-SV-NADYPTATTAPTMLQSDRGNAKSLALQAAIEDPEVVPDLVSKADAQVRALASQID VGLNVLGVYEEGVIYTSDVLRIDEEEFLGKLQKAYTNAFNL-SV-NAVIPTSATIETIVQKAFNDAKAVSVESAFITEKTADAILGKAHAQMIAVA-KLA VKLNIKIAYDNGVIIPGDKLSINLDDYTNEIRKAHINAFAV-AT-EIAYPEPKVLE--FTATKAMRNALALASEIGYIWETAQAVETKAVMKAYAAVAS  EcoL10 ATL---PTYEE-AIAR-LMATMKEASAGKLVRTLA-AVRD A ^ SecL10 GDL---PSKEQ-LMGQ-IAGGIN-ALATKIALGIKEVPASVARGLQHV StyL10 ATL---PTYEE-AIAR-LMATMKEASAGKLVRTLA-AVRD A TmaL10 AKL---PSKEE-LYAM-LVGRVK-APITGLVFALSGILRNLVYVLNAI 310^320^330^340^350^360 DdiL10 PVR AAP^SAAAPRAAA -KKVVVE- EKK ^ EESD ^ DDMGMG-LFDDmeL10 DPSKFAAAA----SASAAPAAGGATEKKEEA ^ KKPESESEEED ^ DDMGFG-LFDHsaL10 DPSAFVAAAPVAAATTAAPAAAAAPA-KVEA^ K- -EESEESD ^ EDMGFG-LFDMmuL10 DPSAFAAAAPAAAATTAAPAAAAAPA-KAEA^ K -EESEESD ^ EDMGFG-LFDRnoL10 DPSAFAAAAPLAAATTLAPAAAA PA KVEA ^ K -EESEESD ^ EDMGFG-LFDSceL10 NPEKYAAA ^APAATSAASGDAAPA- -EEAAAEEEEESD ^ DDMGFG-LFDHcuL10 HhaL10 HmaL10 MvaL10 SsoL10 EcoL10 SecL10 StyLlO aL10  DEDALPEELQDVDAPAAPAGGEADTTADEQS-DETQASE-ADDADDSADDDDDDDGNAGAEGLGEMFGDEDALPEELQDVDAPAAPAGGEADTTADEQS-DETQASE-ADDADDSDDDDDDDDGNAGAEGLGEMFGDEEALPEELQGVEADVATEEPTDDQDDDTASEDDADADDAAEEADDDDDDDED AGDALGAMF-GDEALDDDLKEQISSSAVVATEEAP-KAETKKE- -EKK ^ EEAA ^ PAAGLGLLF-ISGKV--DLGVQIQAQPQVSEQAA-EKKEEKKEE--EKKGP--SEEEI ^GGGLSSLFGG KEAA DDKE ^ KEAA KEKKSE  Figure 22  290^300 AAEKFKSAAA-AA EATTIKEY---IK LAEKVKAF---LA LTEKVKAF---LA LAEKVKAF---LA EIEDLVDR---IE  ^  119 B  CONSENSUS  ^10^20^30^40^50^60^70^80^90^100 Eukary M?GPREDRAT WKSNYFLKIIQLLDDYPKCFIVGADNVGSKQMQQIRMSLR----GKAVVLMGKNTMMRKAIRGHLE--NNPALEKLLPHIRGNVG Archae MISAE?ERTTEEIPEWK?EEVAELVELLETYDSVGVVNI?GIPSKQLQDMRR?LH----GQA?LRMSRNTLL?RALEEAGDETGNPGLD?L?EY1EGEVG Eubact MALTRQD KQAIvAEvsEvfKGALsAvvADsRGvTvAKmTELRI(RLREKYGAGVYmRvvRNTLLRRAvEGT - E ?FECLEEFLVGPTA 110^120^130^140^150^160^170^180^190^200 Eukary FVFTKEDLTEIRDMLLAN -- KVPAAARAGAIAPCEVTVPAQNTGLGPEK-TSFFQALGITTKISRGTIEILSDVQLIKTGDKVGASEATLLNMLNISPFS Archae LV?TDDNPF?LFQQLENSKLKTPA?INAGEVAPNDIVVPEGDTGIDPGPFVGELQTVGANARIQEGSIQVLDDSVV?EEGE?VSDDLSNVLSELGIEFTE Eubact ILYSMEHPPGAAAKLFKEFAK DKANAKFELKGGALEGKLITASQVERI^210^220^230^240^250^260^270^280^290^300 Eukary FGLIIQQVFDNGSIYSPEVLDITEEDLHSRFLEGVANVASV-CL-QIGYPTVASVPHSIINGYKRVLALSVETEYTFP LAEKVKAFAA-LA Archae VGLDLRGVF?EGVLFTPEELEIDVDEYR?DIQ?AA??AFNL-SV-NAAYPT?RTAPTLI?K7RGEAKSL?LQA7IESPDLADDLVSKADAQVRALAAQID Eubact ATLPS---YEE-LIAR-LMGTMKEASATKLVRTLA?AVRDLAYVLNAI 310^320^330^340^350^360 Eukary DPSAFAAAAPLAAATTAAPAAAAAPAKKVEA? EKK??EESEESD EDMGFG-LFDArchae DEEALPEELQDVDA??A????EAD???DEQSKDETQA?E?ADDADD?DDDDDDDDGNAGAEGLG?MFGG Eubact KEKEAE  C 240 HcuL10 RNL-SV-NAA TmaL10 RNLVYVLNAI  D  Tree 1^  Tree 2 DdiL10  DdiL10 DmeLlO  99  75  ^ DmeL10  99  75 100  MmuL10  67  MmuLlO  67  RnoL10  RnoL10  ^ SceL10  ^ SceL10  ^ SsoL10 100  100  100  rcuL10  100  57 99  ^ HmaL10  ^ HmaL10  MvaL10 ^ SsoL10  MvaL10  r  100  100 lEcoL10  tEcoL10 StyL10  90  ^ TmaL10  L StyL10 SecL10  SecL10  TmaL10  Figure 22 (continued)  rcuL10 HhaL10  59  HhaL10  59  HsaL10  [HsaL10 100  120 eukaryotic lineage was not well defined and branching within the archaebacterial group was reorganized: halophiles were closer to eukaryotes, M. vannielli was closer to eubacteria, and S. solfataricus was between the two (data not shown). 5.2.4 The sequence alignments and phylogeny of L12 proteins  In spite of the major structural discontinuity that occurs between eubacterial L12 sequences and archaebacterial-eukaryotic L12 sequences, biochemical and genetic evidence strongly suggests that all L12 proteins are homologous. First, the organization of the genes encoding ribosomal proteins L11, L1, L10 and L12 is maintained in organisms as divergent as eubacteria and archaebacteria; the L12 gene is always located at the end of the L11, L1, L10, L12 tetragenic cluster. Second, ribosomes from all organisms contain multiple copies of the L12 protein. As a group, these L12 proteins are very acidic, alanine and proline rich, and similar in size, ranging between about 110-120 amino acids in length. Four copies of the L12 protein along with a single copy of L10 form a distinct stalk on the large ribosome subunit that functions in factor-dependent GTP hydrolysis and mediates structural rearrangements of the ribosome during the protein synthesis cycle. Third, pentameric complex L10-(L12)4 from E. coli recognizes a similar site on the archaebacterial ribosome (StOffler-Meilicke and Staffler, 1991). Furthermore, E. coli L12 can form an active hybrid with yeast core ribosomes from which the acidic proteins have been removed (Sanchez-Madrid et al., 1981). In eukaryotic organisms, there are two distinct L12 proteins that have been described. These have been designated as type I and type II (or "P2" and "P1," respectively; Amons et al., 1979 and 1982; Rich and Steitz, 1987; Shimmin et al., 1989; Newton et al., 1990). In the yeast lineage that includes S. cerevisiae and S. pombe, each of the two genes has been reduplicated to give types IA, IB, IIA and IIB  (Newton et al., 1990; Beltrame and Biarchi, 1990).  121 The alignment of twelve eubacteria and one chloroplast, seven archaebacterial and nine type I and ten type II eukaryotic proteins of the L12 family is illustrated in Figure 23A. All but one of the eukaryotic type II proteins contain a conserved tryptophan at position 88; this aligns to a conserved arginine in the type I, the archaebacterial and the eubacterial L12 proteins. It is interesting and perhaps significant that the extension at the amino terminus of type II proteins shows some sequence similarity to the amino terminus of the eubacterial L12 proteins (alignment position 1-18). Another salient feature of all L12 proteins, especially the archaebacterial and eukaryotic proteins is the highly charged carboxyl terminus. The alignment reflects this feature. The two large alignment gaps near the Cterminus within the eubacterial L12 sequences are located within the loops connecting 13 sheet [B] and a helix [C], and a helix [C] and  R sheet [C] respectively  (according to the crystal structure of the C terminal domain of E. coli L12 protein; Leijonmarck et al., 1980). Consequently deletions (or insertions) in these regions could be accommodated without dramatically altering the overall protein structure. In eukaryotic and archaebacterial species, the L12 carboxyl terminal sequences are preceded by an alanine-proline rich region and exhibit substantial similarity to the carboxyl terminus of protein L10 (see above). Eubacterial L12 proteins have a similar alanine-proline rich region, but it is located more proximally to the aminoterminus in the protein at position 39-60. In all the proteins, these alanine-proline regions are believed to be highly flexible and to serve as "hinges" between two distinct domains (Leijonmarck et al., 1980 and 1987; Shimmin et al., 1989). The relocation of this hinge to a more amino terminal position in eubacterial L12 proteins cannot be easily explained. Recent biochemical studies on the S.  solfataricus L12 protein have concluded that the amino and carboxyl terminal domains of the protein are functionally equivalent to the corresponding amino and • • -^IIII"^•  supports a colinear alignment. To simplify visualization and comparison, a  122 Figure 23^Alignment of the amino acid sequences of the ribosomal protein L12 family  A.  The L12 equivalent protein sequences from thirteen eubacteria, seven  archaebacteria, and nineteen eukaryotes were aligned. The eukaryotic proteins divide into two types designated as type I and type II. The species names and their abbreviations are as in Table 5 (pp. 41-42). B. The consensus of the alignment was generated manually by majority rule. When majority was not evident at an alignment position, chemically similar amino acid residues were considered to determine the consensus. Question mark (?) indicates that there was no simple consensus at such positions.  123  A 10^20^30^40^50^60^70^80 AsaLl2II MA S-K DELAC ^ VYAALILL-DDDVDITTEKVN ^ TILRAAGVSVE DmeLl2II M^STK-AELAS ^ VYASLILV DDDVAVTGEKIN ^ TILKAANVEVE GgaLl2II MA---SVS E LAC ^ IYSALILH DDEVTVTEDKIN ^ ALIKAAGVNVE HsaLl2II MA---SVS-E LAC ^ IYSALILH DDEVTVTEDKIN ^ ALIKAAGVNVE ALIKAAGVNVE RraLl2II MA---SVS-E LAC ^ IYSALILH DDEVTVTEDKIN ^ SceLl2IIA M -S T-E SAL ^ SYAALILA DSEIEISSEKLL ^ TLTNAANVPDE TITKAAGANVD SceL12IIB M -SDS- II ^ SFAAFILA DAGLEITSDNLL ^ SpoLl2IIA M- -SAS E-LAT ^ SYSALILA-DEGIEITSDKLL ^ SLTKAANVDVE SLTKAANVDVE SpoLl2IIB M- SAS E LAT ^ SYSALILA DEGIEITSDKLL ^ TthL12II M^STT-E IEKVVKGA SYSALLLN DCGLPITAANIA ^ ALFKTAKLNGH AsaL12I DmeLl2I HsaLl2I RraL12I SceLl2IA SceL12IB SpoL12IA SpoL12IB TcrL12I  ^ MRYVAAYLLAALSGNADPSTADIE ^ ^ MRYVAAYLLAVLGGKDSPANSDLE ^ ^ MRYVASYLLAALGGNSSPSAKDIK ^ ^ MRYVASYLLAALGGNSNPSAKDIK ^ ^ MKYLAAYLLLVQGGNAAPSAADIK ^ ^ MKYLAAYLLLN AAGNTPDATKIK ^ ^ MKYLAAYLLLTVGGKQSPSASDIE ^ ^ MKYLAAYLLLTVGGKDSPSASDIE ^ ^ MKYLAAYALVGLSGG-TPSKSAVE ^  KILSSVGIECN KILSSVGVEVD KILDSVGIEAD KILDSVGIEAD AVVESVGAEVD AILESVGIEIE SVLSTVGIEAE SVLSTVGIEAE AVLKAAGVPVD  HcuL12 HhaL12 HvoL12 HmaL12 MvaL12 SacL12 SsoL12  ^ ^ ^ ^ ^ ^ ^  GVLEAAGVDVE GVLEAAGVDVE AVLEAAGVDVE DVLDAAGVDVE AVLVAGGIEAN NVLSAAGITVD NVLSAAGITVD  MEYVYAALILN-EADEELTEDNIT ^ MEYVYAALILN EADEELTEDNIT ^ MEYVYAALILN ESDEEVNEENIT ^ MEYVYAALILN EADEEINEDNLT ^ MEYIYAALLLN SANKEVTEEAVK ^ MEYIYASLLLH-AAKKEISEENIK ^ MEYIYASLLLH-AAKKEISEENIK ^  BstL12^M ^ TKEQIIEAVKNMTVLELNELVK-AIEEEFGVTAAAPVVVAGGAAAGA ^ EAAAEKTEFDVILADA-GAQKIKVIK BsuL12^MA^LNIEEIIASVKEATVLELNDLVK-AIEEEFGVTAAAPVAVAGGAAAGGAA ^ EE EFDLILAGA-GSQKIKVIK DvuL12^M---SSITKEQVVEFIANMTVLELSEFIK-ELEEKEGVSAAAPAMMAVAAGPA EAAPAEEEKTEFDVILKAA-GANKIGVIK EcoL12^M----SITKDQIIEAVAAMSVMDVVELIS-AMEEKFGVSAAAPVAVAAGPV ^ EAAEEKTEFDVILKAA-GANKVAVIK HeuL12^MA^LTQEDIINAVAEMSVMEVAELVS AMEEKFGVSAAAAVVAGPGGG ^ EA-EEAEEQTEFDLVLTSA-GEKKVNVIK HprL12^M ^NKEEIMSAIEEMSVLELSELVE-DLEEKFGVSAAAPVAVAGGAA-GAGAAA----EEKSEEDVFLADI-GGKKIKVIK M1yL12^M ^NKEQILEAIKAMTVLELNDLVK AIEEEFGVTAAAPVVA GGAAAAA ^ EEKTEFDVVLASA-GAEKIKVIK RspLl2^MA ^DLNKLAEDIVGLTLLEAQELKT-ILKDKYGIEPAAGGAVMMAGPAAGAAA--PAEEEKTEFDVGLTDAAGANKINVIK SecL12^M---SAAT-DQILEQLKSLSLLEASELVK-QIEEAFGVSAAAPVGGMVMAAAAAAPA--EAAEEKTEFDVILEEVPADKKIAELK SgrL12^MA---KLSQDDLLAQFEEMTLIELSEFVK-AFEEKFDVTAAAAVAVAGPAAGGAPA EEAEQD-EFDVILTGA-GEKKIQVIK Sol(c)L12 MAVEAPEKIEQLGTQLSGLTLEEARVLVD-WLQDKLGVSAASFAPAAAVAAPGAPADAAPAVEEKTEFDVSIDEVPSNARISVIK StyL12^M^SITKDQIIEAVSAMSVMDVVELIS AMEEKFGVSAAAAVAVAAGPA ^ EAAEEKTEFDVILKAA-GANKVAVIK TmaL12^M ^ TIDEIIEAIEKLTVSELAELVK-KLEDKFGVTAAAPVAVAAAPVAGAAAGAA--QEEKTEFDVVLKSF-GQNKIQVIK  Figure 23  124  A (continued) 150^160^170 KEEKKEEKKEESEE--EDEDMGFGLFD ES KKEEKKKEEESDQSDDDMGFGLFD EEKKEEEKKEESEE--SDDDMGFGLFD EEKKVEAKKEESEE--SDDDMGFGLFD EEKKVEAKKEESEE--SEDDMGFGLFD EKEEEEAKEE ^ SDDDMGFGLFD EEEKEEEAAEE ^ SDDDMGFGLFD EEKEEAKEEEE ^ SDEDMGFGLFD EEQKEEAKEEEE ^ SDEDMGFGLFD KKEEPKKEEPKKEEPKEEETDMDMG-DLFG  AsaL12II DmeLl2II GgaLl2II HsaLl2II RraLl2II SceLl2IIA SceLl2IIB SpoLl2IIA SpoLl2IIB TthL12II  90^100^110^120^130^140 PYWPGLFTKALEGL-DL-KSMITN----VGSGVGAAPAAGGAAAA ^TEA PAA PYWPGLFAKALEAI-NV-KDLITN----IGSGVGAAPAGGAAPAAAAAAPAA ^ PFWPGLFAKALANI-DI-GSLICN ^VGAGGGAPAAAAPAGGAAPAGGGAAPA ^ PFWPGLFAKALANV-NI-GSLICN ^VGAGGPAPAAGAAPAGGPAPATAAAPA ^ PFWPGLFAKALANV-NI-GSLICN ^VGAGGPAPAAGAAPAGGPAPSAAAAPA ^ NIWADIFAKALDGQ-NL-KDLLVN----F-SAGAAAPAGVAGGVAGG-EAGEAEA ^ NVWADVYAKALEGK-DL-KEILSG----FHNAGPVAGAGAASGAAAAGGDAAA ^ PIWATIFAKALEGK-DL-KELLLN^IGSGAGAAPVAGGAAAPAAA DGEAPA ^ PIWATIFAKALEGK-DL-KELLLN----IGSAAAAPAAGGAGAPAAAAGGEAAA ^ ETTFKTFEDFLKTN-PI-TNYIGA----IGGSAPAAASSAPA ^  AsaLl2I DmeLl2I HsaLl2I RraLl2I SceLl2IA SceLl2IB SpoL12IA SpoLl2IB TcrL12I  PSQLQKVMNELKGK-DL-EALIAEGQTKLASMPTGGAPAAAAGGAATA-PAA ^ EAKEAKKEEKKEESEE--EDEDMGFGLFD AERLTKVIKELAGK-SI-DDLIKEGREKLSSMPVGGGGAVAAADAAPAAAAGG ^DKKEAKKEEKKEESES--EDDDMGFALFE DDRLNKVISELNGK-NI-EDVIAQGIGKLASVPAGGAVAVSAAPGSAAPAAGSAPAAA----EEKKDEKKEESEE--SDDDMGEGLED DERLNKVISELNGK-NI-EDVIAQGVGKLASVPAGGAVAVSAAPGSAAPAAGSAPAAA ^EEKKDEKKEESEE--SDDDMGFGLFD EARINELLSSLEGK-GS-LEEIIAEGQKKFATVPTGGA--SSAAAGAAGAAAGGDAAA ^EEEKEEEKEE ^ SDDDMGFGLFD DEKVSSVLSALEGK-SV-DELITEGNEKLAAVPAAGPASA ^GG AAAAGGDAAA ^EEEKEEEAAEE----SDDDMGFGLFD AERVESLISELNGK-NI-EELIAAGNEKLSTVPSAGAVATPAAGGAAGAEATS AA ^ EEAKEEEAAEE----SDEDMGFGLFD SERIETLINELNGK-DI-DELIAAGNEKLATVPTGGAASAAPAAAAGGAAPAA ^ EEAAKEEAKEEEE--SDEDMGFGLFD PSRVDALFAEFAGK-DF-DTVCTEGKSKLVGGVTRPNAATASAPTAAAAASSGAAAPAAAA EEE ^EDDDMGFGLFD  HcuL12 HhaL12 HvoL12 HmaL12 MvaL12 SacL12 SsoL12  ESRAKALVAALEDV-DI-EEAVEE-- --AAAAPAAAPAASGSDDEAAADDGDDDEEA-DADEAAEAEDAGDDDDEEPSGEGLG-DLFG ESRAKALVAALEDV-DI-EEAVEE-- --AAAAPAAAPAASGSDDEAAADDGDDDEEA-DADEAAEAEDAGDDDDEEPSGEGLG-DLFG ESRVKALVAALEDV-DI-EEAIET -AAAAPAPAAGGSAGGEVEAADDDDEED-A-EEEAADEGGDDDGDDDEEADGEGLG-ALFG ESRVKALVAALEDV-DI-EEAVDQ-- --AAAAPVPASGGAAAPAEGDADEADEADEEAEEEAADDGGDDDDDEDDEASGEGLG-ELFG DARVKALVAALEGV-DI-AEAIAK-- --AAIAPVAAAAPVAAAAAPA ^ EVKKEE-KKEDTT-AAAAAGLG-ALFM EVRLKAVAAALEEV-NI-DEILKT-- --ATAMPVAAVAAPAGQQTQQAA ^ EKKEEKKEEEKKGPSE-EEIGGGLS-SLFG ^ EVRLKAVAAALEEV-NI-DEILKT ATAMPVAAVAAPAGQQTQQAA ^ EKKEEKKEEEKKGPSE-EEIGGGLS-SLFG  BstL12 BsuL12 DvuL12 EcoL12 HeuL12 HprL12 MlyL12 RspL12 SecL12 SgrL12 Sol(c)L12 StyL12 TmaL12  VVR-EITGLGLKEAKDLVDNTPKP----IKEGIA VVR-EITGLGLKEAKELVDNTPKP KEGIA VVR-ALTGLGLKEAKDKVDGAPST----LKEAVS AVR-GATGLGLKEAKDLVESAPAA^LKEGVS VVR-EITGLGLKEAKAAVDGAPAT ^LKEGMS AVR-ELTGLGLKEAKGVVDDAPGN----VKEGLS VVR-EITGLGLKEAKEVVDNAPKA----LKEGVS EVR-AITGLGLKEAKDLVE-AGGK----VKEAVA VVR-TITGLGLKEAKELVESTPKA----IKEATG VVR-ELTSLGLKEAKDLVDGTPKP----VLEKVA AVR-ALTSLGLKEAKELIEGLPKK LKEGVS AVR-GATGLGLKEAKDLVESAPAA----LKEGVS VVR-EITGLGLKEAKDLVEKAGSPDAV-IKSGVS  B  KEEAEEIKAALEE ^ AGAKVEIKKEEAEELKAKLEE ^ VGASVEVKKEEAEEAKKQLVE ^ AGAEVEVKKDDAEALKKALEE ^ AGAEVEVKKEDGDEAKTKLEE ^ AGASVELKKEDAEEMKEKLEE ^AGATVELKKDEAEEIKAKLEE ^ VGASVEVKKADAEAMKKK LEE ^AGAKVELKKDDAEAIKKQIEE ^AGGKAAVKKEAAEKAAESLKA ^AGASVEVKKDDAEDAKKQLED ^ AGAKVSIVKDDAEALKKSLEE ^AGAEVEVKKEEAEEIKKKLEE ^AGAEVELK-  consensus  Eukaryll Eukaryl Archae Eubact  10^20^30^40 50^60^70^80 MA---SVS-EELACVVKGA--SYSALILA-DDEVEITSDKIN ^ TLTKAAGVNVE ^ MKYLAAYLLL?LGGN?SPSASDIE ^ KILSSVGIEAD ^ MEYVYAALILN-EADEEITEENIT ^ ?VLEAAGVDVE MAVESSLTKEQIIEAIKEMTVLELNELVK-ALEEKEGVSAAAPVAVAGGAAAGAAAAAAEAAEEKTEEDVILA?APGANKIKVIK  EukaryII Eukaryl Archae Eubact  90^100^110^120^130^140^150^160^170 PFWPGLFAKALEG?-DL-KELI?N----IGGGGGAAAAGAAAAAAAAAG?AAAPAA--KKEEEKKEEAKKEESEEEESDDDMGFGLFD DERL?KVISELNGK-DI-EELIAEGNEKLASVPTGGAAAVAAAPGAAAAAAGGA?AAAAEAKEEKKEEKKEESEE--SDDDMGEGLED ESRVKALVAALEDV-DI-EEAVET AAAAPVAA?AASAGDDEQAADDGDEDEEAAEEDEAKEEEDKKDDDDEEASGEGLG-DLFG VVR-EITGLGLKEAKDLVDGAPKADAV-LKEGVS ^ KEDAEEIKKKLEE ^AGASVEVK-  Figure 23 (continupri)  125 consensus of the eukaryotic type I and II, the archaebacterial and the eubacterial L12 proteins are aligned in Figure 23B. It should be stressed here that in any alignment (and in particular this L12 alignment) the assumption of common ancestry of each amino acid at a given alignment position is less than certain. That is, alignments simply reflect a guess, hopefully a best guess, of common ancestry at every position. The phylogenetic relationships among the L12 family of protein sequences were determined using parsimony (Figure 24) and distance matrix methods (not shown). Because of the uncertainty in generating a reliable alignment between eubacterial and archaebacterial-eukaryotic L12 sequences, we first determined the phylogenies of eubacteria, archaebacteria and eukaryotes separately, and then for comparison we determined the "universal" phylogeny. In general, the branch patterns within the eukaryotic, archaebacterial and eubacterial groups were essentially identical in the "universal" tree and the three individual trees. The universal parsimony tree (shown in Figure 24) and a Fitch-Margoliash distance tree (not shown) both indicated that the eubacterial sequences form a single coherent group that is confirmed by bootstrap analysis. However, the branching order within this group is not substantiated by bootstrap analysis. The archaebacterial L12 sequences also appear to form a coherent group that is both mono- and holophyletic. By bootstrap resampling, the confirmation of this grouping was 57% for the protein alignment and 58% for the corresponding nucleic acid alignment analyzed by PAUP (data not shown). In contrast, the eukaryotic L12 sequences clearly resolve into two groups corresponding to the type I and type II proteins. This distinct division implies that the duplication of the L12 gene occurred very early in the eukaryotic lineage.  126  Figure 24^Phylogenetic tree inferred from the aligned L12 amino acid sequences  The phylogenetic tree was constructed by using PAUP with heuristic tree search option. Illustrated is the majority rule consensus of the 14 equally shortest trees. There are 147 informative sites in the alignment; all of them were used for parsimony analysis. The consistency index is 0.598. When the first 18 alignment positions, and the flexible hinge regions (position 43 to 74 for eubacteria and 119146 for archaebacteria and eucaryotes) were excluded from analysis, 20 shortest trees were found; the majority rule consensus of these trees has essentially the same topology as the tree shown here. The numbers refer to the percent confirmation of grouping of the species to the right of the node by bootstrap analysis with 100 replications. Only values greater than 50% are labeled. The abbreviations of the species names are as in Table 5 (pp. 41-42). The numbers indicate percent confirmation of grouping of species to the right of the node by bootstrapping analysis with 1000 replications. The arrow indicates that possible root of the universal tree is located between the lineages of eubacteria and archaebacteriaeukaryotes.  ^ ^  127  79 ^ AsaLl2II ^ DmeLl2II 50 E LGgaLl2II 100 HsaLl2II 56 100^RraLl2II 00 SceL12IIA 70 SpoLl2II 87 ^Sp0L12IIB 70 ^ SceL12IIB ^ TthL12II ^ AsaLl2I ^87 ^ DmeLl2I 52 100 HsaLl2I L RraLl2I 87 SpoL12IA SpoLl2IB ^ TcrL12I 64 ^ SceL12IA ^ SceL12IB 100 HcuLl2 100 I HhaL12 85 1--• HvoL12 ^ HmaL12 57 ^ MvaL12 100 SacL12 SsoL12  r  r--  BstL12 L.E.- BsuL12 MlyL12 ^ HprL12 ^ HeuL12 ^ SgrL12 ^ TmaL12 ^ DvuL12 100 EcoL12 L.- StyL12 ^ SecL12 ^ RspL12 Sol(c)L12  r  100  Figure 24  128 5.2.5 Phylogenetic considerations  The alignment and phylogenetic analysis presented above using L11, L1, L10 and L12 protein sequences generally support the concept that organisms divide into three distinct and well defined groups: eubacteria, archaebacteria and eukaryotes. The ribosomal protein sequences from member species within a group are in most cases more similar to each other based on amino acid identity than to the sequences from species outside the group. Furthermore, numerous deletions, insertions or structural rearrangements in these ribosomal protein sequences confirm this three part delineation and demarcation. If the root in these ribosomal protein based trees is near or within the eubacterial domain, then it is clear that the archaebacteria appear monophyletic, originating from a common ancestor that is distinct from eubacteria. The origin of the eukaryotes is more problematic. They appear to originate as a distinct branch either outside of the archaebacterial group as suggested by the L12 protein phylogeny or alternatively from within the archaebacterial group as suggested by the L11 and L10 protein phylogenies. Although ribosomal proteins at first glance might be considered good candidates for phylogenetic analysis, in reality there are some inherent problems. First, they are relatively small proteins and second, divergence and structural rearrangements often make alignments difficult and ambiguous. Because of these limitations, the origin of the eukaryotic lineage either from within or outside of the archaebacterial group cannot be statistically substantiated. Phylogenetic analyses of rRNA sequences and translational elongation factors Tu and G sequences suggest that the hyperthermophilic eubacterium T. maritima is a representative of deep branching lineages within the eubacterial  group (Achenbach-Richter et al., 1987; Bachleitner et al., 1989; Tiboni et al., 1991). Representatives of deep branching lineages within the archa yperthermophilic. This has led to the suggestion that the ancestor of eubacteria  129 and archaebacteria (i.e., the common ancestor represented as the root of the universal tree) was hyperthermophilic (Achenbach-Richter et al., 1987; Pace, 1991; Burggraf et al., 1992; Stetter, 1993). This would place the position of the root either deep within the eubacterial or archaebacterial groups or somewhere between the two groups. This situation may be clarified by previous analyses of translational elongation factors and subunits of ATPase which have placed the root somewhere between eubacteria and archaebacteria (Iwabe et al., 1989; Gogarten et al., 1989). In contrast to the rRNA and the elongation factors Tu and G based phylogenetic analysis (Achenbach-Richter et al., 1987; Bachleitner et al., 1989; Tiboni et al., 1991), our analysis using L11, L1, L10 and L12 ribosomal protein sequences are  less definitive in the placement of T. maritima within the eubacteria. The resolution of our trees is limited by the relatively small size of these proteins and in some cases by the limited number of sequences available for analysis. The tree for the L12 protein, containing thirteen eubacterial sequences, is virtually devoid of resolution that is confirmable by bootstrap analysis. In the L11 tree, the mesophile S. virginiae appears to branch more deeply than T. maritima. These observations  seem to suggest that different molecules, although they are all components of the protein synthesis apparatus, can diverge to some extent independently and give rise to incongruent phylogenies. The "true" organismal phylogeny will hopefully become apparent from a consensus of molecular phylogenies. Lake et al. have suggested that the eukaryotic lineage arose as a branch from the sulfur-metabolizing thermophilic lineage (i.e., the "eocytes" or Crenarchaeota) within the archaebacterial group (Lake, 1988, Rivera and Lake, 1992). Other analyses indicate that eukaryotic lineage originated outside of the archaebacterial kingdoms (Pace et al., 1986; Woese et al., 1990). Our data neither confirm nor refute either of these two positionings. However, our analysis clearly highlights the major discontinuity that separates archaebacterial and eukar sequences. The sequence (amino acid identity) and structure (deletion, insertion  130 and rearrangements) of ribosomal proteins from organisms within a group (i.e., eubacteria, archaebacteria or eukaryotes) are clearly more similar to each other than to the sequence and structure of the proteins from organisms outside the group.  5.3 SUMMARY  Available sequences that correspond to the E. coli ribosomal proteins L11, L1, L10 and L12 from eubacteria, archaebacteria and eukaryotes have been aligned. The alignments were analyzed qualitatively for shared structural features and for conservation of deletions or insertions. The alignments were further subjected to quantitative phylogenetic analysis. Eubacteria and eukaryotes each form welldefined, coherent and non-overlapping groups, and the holophyly of these two groups is supported by bootstrap resampling. Archaebacteria also form a coherent phylogenetic group by themselves, but the relationships between the major groups of archaebacteria (extreme halophiles, methanogens, and sulfur-metabolizing thermophiles) and outgroups (eubacteria and eukaryotes) can not be established in this study. In particular, the positioning of S. solfataricus (a sulfur-metabolizing hyperthermophile) is conflicting in various trees, and remains unresolved in any tree based on these ribosomal protein sequences. T. maritima does not appear as the deepest branch in any of the presented trees, which may indicate that the evolutionary rate of the ribosomal protein genes is different from that of the other genes, such as rRNA genes. However, the phylogenetic placement of T. maritima in these trees is less definitive, probably due to relatively short sequences of these proteins. The degree of diversity of the four proteins between the three groups is not uniform. L11 is the most conserved protein of these four ribosomal proteins; thus, the alignment of the L11 family is less ambigiFou =s7 For the L12 proteins and the L10 ---  131  proteins, the archaebacterial and eukaryotic proteins are more similar to each other, whereas the eubacterial proteins are very different. In eukaryotes there are paralogous genes that encode type I and type II L12 proteins; for some features, the type I protein is more similar to the archaebacterial L12 than is the type II protein. The eukaryotic L1 equivalent protein has yet to be identified. These data indicate that the evolutionary divergence of even closely associated components of the translation apparatus can be remarkably dissimilar, especially when compared between the eubacteria, archaebacteria and eukaryotes.  132  VI. Conclusion and prospects Eubacteria represent a large collection of diverse microbial species that utilize a wide variety of metabolic strategies to exploit different ecological habitats. Most of our understanding of the molecular biology and biochemistry of this group of organisms comes from the study of E. coli, a mesophilic and facultatively anaerobic heterotroph. Our knowledge of eubacteria will be greatly enhanced by characterization of other species that are more representative of the range of eubacterial biochemistry. Hyperthermophilic eubacteria are examples of the diversity of this microbial world. These organisms can grow at unusually high temperatures and the importance of their biochemical characterization has been widely appreciated by the scientific community and the biotechnology industry (Pace, 1991; Kristjansson and Stetter, 1992). Because the deep branches in a universal phylogenetic tree are exclusively occupied by the hyperthermophiles, it is plausible that the last common cellular ancestor of all life was a hyperthermophile (Pace, 1991; Stetter, 1993). Thus, detailed study of these organisms could potentially reveal ancestral characteristics that may have already been lost in other species, as well as the biochemical mechanisms for high temperature growth. Furthermore, the thermostable enzymes serve as better biocatalysts in industry, and have opened new possibilities in biotechnology, such as the well-known PCR technique. Proteins from thermophilic sources will, therefore, continuously be exploited in the future for technological purpose. The results presented in this thesis provide a perspective on the biochemistry and molecular biology of the hyperthermophilic eubacterium T. maritima. The genomic organization of the secE, nusG, ribosomal protein genes  L11, L1, L10 and L12, and the 13 subunit gene of the RNA polymerase in T. maritima  133 is the same as that in E. coli and other eubacteria. Thus, it is reasonable to assume that this arrangement already existed in the ancestor of eubacteria. However, the expression patterns of these genes in T. maritima are very different from that in E. coli. The E. coli rif region is divided into three transcriptional units, whereas most  of the genes in the T. maritima fragment are cotranscribed. The control mechanisms for gene expression are also different. The regulation of L10 and L12 expression in E. coli is through a complicated mechanism at the translational level, while the expression of these two genes in T. maritima appears to be modulated through a mechanism of transcriptional attenuation. On the other hand. some common strategies for control of gene expression are probably used. For example, both the E. coli rif operon and the cloned T. maritima fragment contain the L1 autogenous regulatory sites. While the primary structures of the ribosomal proteins L11, L1, L10, and L12 are largely preserved, the sequences of the T. maritima SecE and NusG are very dissimilar to their respective counterparts in E. coli. Furthermore, the transcription factor NusG in T. maritima functions as a  DNA-binding protein, which may be very important for regulation of gene expression in a hyperthermophilic environment, whereas the E. coli protein apparently lacks DNA-binding activity. These features exemplify the common and different biochemical characteristics within the domain of eubacteria. Common properties may be inherited from their ancestor, whereas different characteristics may be necessary for adaptation to diverse ecological niches. In summary, our understanding of the microbial world and of the fundamentals of life and its evolution can be enriched by further biochemical characterization of hyperthermophiles; future research in this field will be both rewarding and merited.  134 VII. References Achenbach-Richter, L., Gupta, R., Stetter, K. 0., and Woese, C. R. (1987) Were the original eubacteria thermophiles? System. Appl. Microbiol., 9: 34-39. Amons, R., Pluijms, W., Kriek, J., and M011er, W. (1982) The primary structure of proteins eL12'/eL2'-P from the large subunit of Artemia salina ribosomes. FEBS Lett.,146: 143-147. Amons, R., Pluijms, W., and Willer, W. (1979) The primary structure of ribosomal eL12/eL12-P from Artemia salina 80S ribosomes. FEBS Lett.,104: 85-89. An, G., and Friesen, J. (1980) The nucleotide sequence of tufB and four nearby tRNA structural genes of Escherichia coli. Gene, 12: 33-39. Arndlt, E., and Weigel, C. (1990) Nucleotide sequence of the genes encoding the L11, L1, L10 and L12 equivalent ribosomal proteins from the archaebacterium Halobacterium marismortui. Nucleic Acids Res. 18: 1285. Arraiano, C., Yancey, S., and Kushner, S. (1988) Stabilization of discrete mRNA breakdown products in ams rnb multiple mutants of Escherichia coli K-12. J. Bacteriol., 170: 4625-4633. Bacheitner, M., Ludwig, W., Stetter, K. 0., and Schleifer, K. H. (1989) Nucleotide sequence of the gene coding for the elongation factor Tu from extremely thermophilic eubacterium Thermotoga str operon. FEBS Lett., 57: 115-120. Baier, G., Piendl, W., Redl, B., and Stoeffler, G. (1990) Structure, organization and evolution of the L1 equivalent ribosomal protein gene of the archaebacterium Methanococcus vanielli. Nucleic Acids Res., 18: 719-724. Balch, W. E., Fox, G. E., Magrum, L. J., Woese, C. R., and Wolfe, R. S. (1979) Methanogens: reevaluation of a unique biological group. Microbiol. Rev. 43: 260296. Bartsch, M., Kimura, M., and Subramanian, A. R. (1982) Purification, primary structure, and homology relationships of a chloroplast ribosomal protein. Proc. Natl. Acad. Sci. LISA, 79: 6871-6875.  135 Baughman, G., and Nomura, M. (1983) Localization of the target site for translational regulation of the L11 operon and direct evidence for translational coupling in Escherichia coli. Cell, 34: 979-988 Beauclerk, A. A. D., Hummel, H., Holmes, D. J., Bock, A., and Cundliffe, E. (1985) Studies of the GTPase domain of archaebacterial ribosomes. Eur. J. Biochem., 151: 242-255. Beltrame, M., and Bianchi, M. E. (1990) A gene family for acidic ribosomal proteins in Schizosaccharomyces pombe: two essential and two non-essential genes. Mol. Cell. Biol., 10: 2341-2348. Bonch-Osmolovskaya, E. A., Miroshnichenko, M. L., Kostrikina, N. A., Chernych, N. A., and Zavarzin, G. A. (1990) Thermoproteus uzoniensis sp. nov., a new extremely thermophilic archaebacterium from Kamchatka continental hot springs. Arch. Microbiol. 154: 556-559. Bonch-Osmolovskaya, E. A., Slesarev, A. I., Miroshnichenko, M. L., Svetlichnaya, T. P., and Alexeyev, V. A. (1988) Characteristics of Desulfurosoccus amylolyticus n. sp. - a new extremely thermophilic archaebacterium isolated from thermal springs of Kamchatka and Kunashir island. Mikrobiologia, 57: 78-85. Bremer, B., and Dennis, P. (1987) Modulation of chemical composition and other parameters of cell by growth rate. In Escherichia coli and Salmonella typhimurium: cellular and molecular biology (Neidhardt, F. C., Ingraham, J. L., Low, K. B., Schaecher, M., and Umbarger, E., eds.), pp. 1527-11542. American Society for Microbiology, Washington, DC. Brennan, C. A., Dombroski, A. J., and Platt, T. (1987) Transcription termination factor p is an RNA-DNA helicase. Cell, 48: 945-952. Brierley, C. L., and Brierley, J. A. (1973) A chemoautotrophic and thermophilic microorganism isolated from an acid hot spring. Can. J. Microbiol., 19: 183-188. Brock, T. D. (1986) Introduction: an overview of the thermophiles. In Thermophiles: General Molecular and Applied Microbiology (Brock, T. D. ed.), pp. 1. John Wiley & Sons, New York.  136 Brock, T. D., Brock, K. M., Belly, R. T., and Weiss, R. L. (1972) Sulfolobus, a new genus of sulfur oxidizing bacteria living at low pH and high temperature. Arch. Microbiol. 84: 54-68. Brown, J. W., Haas, E. S., and Pace, N. R. (1993) Characterization of ribonuclease P RNAs from thermophilic bacteria. Nucleic Acids Res., 21: 671-679. Broyles, S., and Pettijohn, D. E. (1986) Interaction of the Escherichia coli HU protein with DNA. Evidence for formation of nucleosome-like structures with altered DNA helical pitch. J. Mol. Biol.,187: 47-61. Burggraf, S., Olsen, G. J., Stetter, K. 0., and Woese, C. R. (1992) A phylogenetic analysis of Aquifex pyrophilus. System. Appl. Microbiol., 15: 352-356. Burggraf, S., Jannasch, H. W., Nicolaus, B., and Stetter, K. 0. (1990a) Archaeoglobus profundus sp. nov., represents a new species within the sulfur-reducing archaebacteria. System. Appl. Microbiol.,13: 24-28. Burggraf, S., Fricke, H., Neuner, A., Kristjansson, J., Rouvier, P., Mandelco, L., Woese, C. R., and Stetter, K. 0. (1990b) Methanococcus igneus sp. nov., a novel hyperthermophilic methanogen from shallow submarine hydrothermal system. System. Appl. Microbiol., 13: 263-269. Casiano, C., Matheson, A. T., and Traut, R. R. (1990) Occurrence in the archaebacterium Sulfolobus solfataricus of a ribosomal protein complex corresponding to Escherichia coli (L7/L12)4•L10 and eukaryotic (P1)2/(P2)2•PO. J. Biol. Chem., 265: 18757-18761. Cavalier-Smith, T. (1991) The evolution of cells. In Evolution of Life: Fossils, Molecules and Culture (Osawa, S. and Honjo, T. eds.), pp. 271-304, SpringerVerlag, Tokyo. Chan, Y.-L., Paz, V., and Wool, I. G. (1989) The primary sequence of rat acidic ribosomal phosphoprotein P0. EMBL accession number X15096. Christiansen, T., Johnsen, M., Fiil, N., and Friesen, J. (1984) RNA secondary structure and translation inhibition: analysis of mutants in the rplJ leader. EMBO J., 3: 1609-1612  137 Chyba, C. F., Thomas, P. J., Brookshaw, L., and Sagan, C. (1990) Cometary delivery of organic molecules to the early Earth. Science, 249: 366-373. Clarke, A. R., Wigley, D. A., Chia, W. N., Barstow, D., Atkinson, T., and Holbrook, J. (1986) Site-directed mutagenesis reveals role of mobile arginine residue in lactate dehydrogenase catalysis, Nature , 324: 699-702. Conaway, J. W., and Conaway, R. C. (1991) Initiation of eukaryotic messenger RNA synthesis. J. Biol. Chem., 266: 17721-17724. Cousineau, B., Cerpa, C., Lefebvre, J., and Cedergren, R. (1992) The sequence of the gene encoding elongation factor Tu from chlamydia trachomatis compared with those of other organisms. Gene, 120: 33-41. Crick, F. H. C. (1968) The origin of the genetic code. J. Mol. Biol., 38: 367-379. Das, A. (1992) How the phage lamda N gene product suppresses transcription termination: communication of RNA polymerase with regulatory proteins mediated by signals in the nascent RNA. J. Bacteriol., 174: 6711-6716. Davies, G. J., Gamblin, S. J., Littlechild, J. A., and Watson, H. C. (1993) The structure of a thermally stable 3-phosphoglycerate kinase and a comparison with its mesophilic equivalent. Proteins,15: 383-389. Davies, G. J., Littlechild, J. A., Watson, H. C., and Hall, L. (1991) Sequence and expression of the gene encoding 3-phosphoglycerate kinase from Bacillus stearothermophilus. Gene, 109: 39-45. Dean, D., and Nomura, M. (1980) Feedback regulation of ribosomal protein gene expression in Escherichia coli. Proc. Natl. Acad. Sci. USA, 77: 3590-3594. Dennis, P. (1974) In vitro stability, maturation and relative differential synthesis rates of individual ribosomal proteins in Escherichia coli B/r. J. Mol. Biol., 88: 25-41. Dennis, P. (1985) Multiple promoters for transcription of ribosomal RNA gene cluster in Halobacterium cutirubrum. J. Mol. Biol., 186: 457-461  138 De Rosa M., Gambacorta A., Huber R., Lanzotti V., Nicolaus B., Stetter K. 0., and Trincone A. (1988) Lipid structure in Thermotoga maritima, J. Chem. Soc. Chem. Commun., 300. Dijk, J., White, S. W., Wilson, K. S., and Appelt, K. (1983) On the DNA binding protein II from Bacillus stearothermophilus. J. Biol. Chem., 258: 4003-4006. Dijk, J., Garret, R. A., and Muller, R. (1979) Studies on the binding of the ribosomal protein complex L7/L12-L10 and protein L11 to the end one third of the 23S RNA: a functional center of the 50S subunit. Nucleic Acids Res., 6: 2717-2729. Downing, W., and Dennis, P. (1987) Transcription products from the rp1KAJL-rpoBC gene cluster. J. Mol. Biol.,194: 609-620. Downing, W., Sullivan, S., Gottesman, M., and Dennis, P. (1990) Sequence and transcription pattern of the essential Escherichia coli secE-nusG operon. J. Bacteriol.,172: 1621-1627. Downing, W., and Dennis, P. (1991) RNA polymerase activity may regulate transcription initiation and attenuation in the rp/KAJLrpoBC operon in Escherichia coli. J. Biol. Chem., 266: 1304-1311. Draper, D. (1990) Structure and function of ribosomal protein-RNA complexes: thermodynamic studies. In The Ribosome: structure, function and evolution. (Hill, W. E., Dahlberg, A., Garrett, R., Moore, P., Schlessinger, D. and Warner, J. eds.), pp. 160-167, American Society for Microbiology, Washington, DC. Drlica, K. and Rouviere-Yaniv, J. (1987) Histone-like proteins of bacteria. Microbiol. Rev., 51: 301-319. Durovic, P., Liao, D., Mylvaganam, S., and Dennis, P. P. (1993a) The Evolution of Ribosomal Protein and Ribosomal RNA Operons: Coding Sequences, Regulatory Mechanisms and Processing Pathways. In The Translational Apparatus (Nierhaus, N. R., Subramanian, A. R., Erdmann, V. A., Franceschi, F., and Wittmann-Liebold, B., eds.), (in press), Plenum, New York and London. Durovic, P. (1993b) Characterization of a novel pathway for ribosomal RNA maturation in Sulfolobus acidocauldarius. Ph.D. thesis, University of British Columbia, Vancouver, B.C., Canada.  139 Egebjerg, J., Doutchwarie, S., Liljas, A., and Garrett, R. (1990) Characterization of the binding sites of protein L11 and the L10*(112)4 pentameric complex in the GTPase domain of 23 S ribosomal RNA from Escherichia coli. J. Mol. Biol., 213: 275-288. El-Baradi, T. A. L., de Regt, C. H. F., Einerhand, S. W. C., Teixido, J., Planta, R. J., Ballesta, J. P. G., and Raue, H. A. (1987) Ribosomal proteins EL11 from Escherichia coli and L15 from Saccharomyces cerevisiae bind to the same site in both yeast 26S and mouse 28S rRNA. J. Mol. Biol.,195: 909-917. Ernst, W. G. (1983) The early Earth and archean rock record. In Earth's Earliest Biosphere: its origin and evolution. (Schopf, J. W. ed.), pp. 41-52. Princeton University Press, Princeton, NJ. Falkenberg, P., Yaguchi, M., Roy, C. Zurker M., and Matheson, A. T. (1985) The primary structure of the ribosomal A-protein (L12) from the moderate halophile NRCC 41227. Biochem. Cell Biol., 64: 675-680. Favaloro, J. R., Treisman, R., and Kamen, R. (1980) Transcription maps of polyoma virus-specific RNA: analysis by two-dimensional nuclease S1 mapping. Methods Enzymol., 65: 718-749. Felsenstein, J. (1991) PHYLIP (Phylogeny Inference Package, version 3.4) (University of Washington). Feinberg, A. P., and Vogelstein, B. (1983) A technique for radiolabeling DNA restriction endonuclease fragments to high specific activity. Anal. Biochem., 132: 6-13. Ferro, J. A. and Reinach, F. C. (1988) The complete sequence of chicken-muscle cDNA encoding the acidic ribosomal protein P1. Eur. J. Biochem.,177: 513-516. Fiala, G., Stetter, K. 0. (1986) Pyrococcus furiosus sp. nov. representing a novel genus of marine heterotrophic archaebacteria growing optimally at 100°C. Arch. Microbiol., 145: 56-61. Fiala, G., Stetter, K. 0., Jannasch, H. W., Langworthy, T. A. and Madon, J. ( 1986) Staphylothermus marinus sp. nov. represents a novel genus of extremely thermophilic  • •^•  e I' •  System. Appl. Microbiol. 8: 106-113.  ••  e •e!  140 Fiil, N., Friesen, J., Downing, W., and Dennis, P. (1980) Post-transcriptional regulatory mutants in a ribosomal protein-RNA polymerase operon of E. coli. Cell, 19: 837-844. Friedman, D. I., and E. R. Olson, E. R. (1983) Evidence that a nucleotide sequence, "box A", is involved in the action of the NusA protein. Cell, 34: 143-149. Gabowski, D. T., Pieper, R. 0., Futscher, B. W., Deutsch, W. A., Erickson, L. C., and Kelley, M. R. (1992) Expression of ribosomal protein PO is induced by antitumor agents and increased in Mer human tumor cell lines. Carcinogenesis 13: 259263. -  Garland, W. G., Louie, K. A., Matheson, A. T., and Liljas, A. (1987) The complete amino acid sequence of the ribosomal 'A' protein (L12) from Bacillus stearothermophilus. FEBS Lett., 220: 43-46. Gilbert, W. (1986) The RNA world. Nature, 319: 618. Gogarten, J. P., Kibak, H., Dittrich, P., Taiz, L., Bowman, E. J., Bowman, B. J., Manolson, M. F., Poole, R. J., Date, T., Oshima, T., Konisch, J., Denda, K., and Yoshida, M. (1989) Evolution of the vacuolar H+-ATPase: Implications for the origin of eukaryotes. Proc. Natl. Acad. Sci. USA, 86: 6661-6665. Gold, L., and Stromo, G. (1987) Translational Initiation. In Escherichia coli and Salmonella typhimurium (Neidhardt, F., Ingraham, J., Low, K., Magasanik, B., Schaechter, M. and Umbarger, H., eds.), pp. 1302-1307, American Society for Microbiology, Washington, DC. Gourse, R. L., Sharrock, R. A., and Nomura, M. (1986) Control of ribosome synthesis in Escherichia coli. In Structure, Function, and Genetics of Ribosomes (Hardestry, B., and Kramer, G. eds.), pp.766-788, Springer-Verlag, New York, NY. Gourse, R. L., Thurlow, D. L., Gerbi, S. A., and Zimmerman, R. A. (1981) Specific binding of a prokaryotic ribosomal protein to a eukaryotic ribosomal RNA: implications for evolution and autoregulation. Proc. Natl. Acad. Sci. USA, 78: 2722-2726. e C' SOS^ Grant, W. D., and Larsen, H. (1989) Ex r Halobacteriales, ord. nov. In Bergey's Manual of Systematic Bacteriology (Staley,  141 J. T., Bryant, M. P., Pfennig, N., and Holt, J. G., eds.), 3: 2216, Williams and Wilkins, Baltimore. Green, R., and Szostak, J. W. (1992) Selection of a ribozyme that functions as a superior template in self-copying reaction. Science, 258: 1910-1915. Greenblatt, J. (1991) RNA polymerase-associated transcription factors. Trends Biochem. Sci.,16: 408-411. Grogan, D., Palm, P., and Zillig, W. (1990) Isolate B12, which harbours a virus-like element, represents a new species of archaebacterial genus Sulfolobus, Sulfolobus shibate, sp. nov. Arch. Microbiol. 154: 594-599. Gualerzi, C. 0., Losso, M. A., Lammi, M., Friedrich, K., Pawlik, R. T., Canonaco, M. A., Gianfranceschi, G., Pingoud, A., and Pon, C. L. (1986) Proteins from the prokaryotic nucleoid. Structural and functional characterization of the Escherichia coli DNA-binding proteins NS (HU) and H-NS. In Bacterial Chromatin (Guallerzi, C. 0., and Pon, C. L. eds.) pp. 101-134. Springer-Verlag, Berlin. Gudkov, A. T., Tumanova, L. G., Gongadze, G. H., and Bushow, V. N. (1980) Role of different regions of ribosomal proteins L7 and L10 in their complex formation and in the interaction with the ribosomal 50S subunit. FEBS Lett.,109: 34-38. Hansen, T. S., Andreasen, P. H., Dreisig, H., Hojrup, P., Nielsen, H., Engberg, J., and Kristiansen, K. (1991) Tetrahymena thermophila acidic ribosomal protein L37 contains an archaebacterial type C-terminus. Gene,105: 143-150. Hasegawa, M., Hashimoto, T., and Adachi, J. (1993) Origin and evolution of eukaryotes as inferred from protein sequence data. In The Origin and Evolution of Prokaryotic and Eukaryotic Cells (Hartman, H., and Matsuno, K., eds.), World Sci. Publ. (in press). Hardy, S. (1975) The stoichiometry of the ribosomal proteins in Escherichia coli. Mol. Gen. Genet.,140: 253-274. Heinrich, T., SchrOder, W., Erdmann, V. A., and Hartmann, R. K. (1992) Identification of the gene encoding transcription factor NusG of Thermus thermophiliis J Barteriol , 174. 7859-5863.  142 Henikoff, S. (1984) Unidirectional digestion with exonuclease III creates targeted breakpoints for DNA sequencing. Gene, 28: 351-359. Hensel, R., Jakob, I., Scheer, H., and Lottspeich, F. (1992) Proteins from hyperthermophilic archaea: stability towards covalent modification of the peptide chain. In The archaebacteria: biochemistry and biotechnology, Biochem. Soc. Symp. (Danson, M. J., Hough, D. W., and Lunt, G. G. eds.), 58: 127-133, Portland Press, London. Higgins, C., Hinton, J., Hutton, J., Owen-Hughes, T., Pavitt, G., and Seirafi, A. (1990) Protein Hi: a role for chromatin structure in the regulation of bacterial gene expression and virulence? Mol. Microbiol., 4: 2007-2012. Hoopes, B., and McClure, W. (1987) Strategies in regulation of transcription initiation. In Escherichia coli and Salmonella typhimurium (Neidhardt, F., Ingraham, J., Low, K., Magasanik, B., Schaechter, M. and Umbarger, H. eds.), pp. 1231-1240 American Society for Microbiology, Washington, DC. Huber, G., and Stetter, K. 0. (1991) Sulfolobus metallicus, sp. nov., a novel strictly chemolithoautotrophic thermophilic archaeal species of metal-mobilizers. System. Appl. Microbiol. 14: 372-378. Huber, G., Spinnler, C., Gambacorta, A., and Stetter, K. 0. (1989a) Metallosphaera sedula gen. and sp. nov. represents a new genus of aerobic, metal-mobilizing, thermoacidophilic archaebacteria. System. Appl. Microbiol.,12: 38-47. Huber, H., Thomm, M., KOnig, H., Thies, G., and Stetter, K. 0. (1982) Methanococcus thermolithotrophicus, a novel thermophilic lithotrophic methanogen. Arch. Microbiol. 132: 47-50. Huber, R., and Stetter K. 0. (1992a) The Thermophiles: hyperthermophilic and extremely thermophilic bacteria. In Thermophilic Bacteria (Kristjansson J. K. ed.), pp. 185-194, CRC Press Inc., Boca Raton, FL. Huber, R., Wilharm, T., Huber, D., Trincone, A., Burggraf, S., KOnig, H., Rachel, R., Rockinger, I., Fricke, H., and Stetter, K. 0. (1992b) Aquifex pyrophilus gen. sp. nov., represents a novel group of marine hyperthermophilic hydrogen-oxidizing bacteria. System. Appl. Microbiol.,15: 340-351.  143 Huber, R., Woese, C. R., Langworthy, T. A., Kristjansson, J. K., and Stetter, K. 0. (1990) Fervidobacterium islandicum sp. nov., a new extremely thermophilic eubacterium belonging to the "Thermotogales". Arch. Microbiol., 154: 105-111. Huber, R., Woese, C. R., Langworthy, T. A., Fricke, H., and Stetter, K. 0. (1989b) Thermosipho africanus gen. nov., represents a new genus of thermophilic eubacteria within the "Thermotogales". System. Appl. Microbiol. 12: 32-37. Huber, R., Kristjansson, J. K., and Stetter, K. 0. (1987) Pyrobaculum gen. nov. a new genus of neutrophilic , rod-shaped archaebacteria from continental solfataras growing optimally at 100°C. Arch. Microbiol., 149: 95-101. Huber, R., Langworthy, T. A., KOnig, H., Thomm, M. M., Woese, C. R. Sleytr, U. B. and Stetter, K. 0. (1986) Thermotoga maritima sp. nov. represents a new genus of unique extremely thermophilic eubacteria growing up to 90°C. Arch. Microbiol.,144: 324-333. Hulton, C., Seirafi, A., Hinton, J., Sidebotham, J., Waddell, L., Pavitt, G., OwenHughes, T., Spassky, A., Buc, H., and Higgins, C. (1990) Histone-like protein H1 (HNS), DNA supercoiling, and gene expression in bacteria. Cell, 63: 631-642. Itoh, T. (1988) Complete Nucleotide Sequence of the ribosomal "A" protein operon from the archaebacterium Halobacterium halobium. Eur. J. Biochem., 176: 297303. Itoh, T., and Otaka, E. (1984) Complete amino-acid sequence of an L7/L12-type ribosomal protein from Desulfovibrio vulgaris Miyazaki. Biochim. Biophys. Acta, 789: 229-233. Itoh, T. and Higo, K. I. (1983) Complete amino acid sequence of an L7/L12-type ribosomal protein from Rhodopseudomonas spheroides. Biochim. Biophys. Acta, 744: 105-109. Itoh, T., Sugiyama, M., and Higo, K. I. (1982) The primary structure of an acidic ribosomal protein from Streptomyces griseus. Biochim. Biophys. Acta, 701: 164172. Itoh, T. (1981) Primary structure of an acidic ribosomal protein from Micrococcus lysodeikticus_ FERS Lett , 127 67,70__ ^ .  144 Itoh, T., and Wittmann-Liebold, B. (1978) The primary structure of Bacillus subtilis acidic ribosomal protein B-L9 and its comparison with Escherichia coli proteins L7/L12. FEBS Lett., 96: 392-394. Iwabe, N., Kuma, K., Hasegawa, M., Osawa, S., and Miyata, T. (1989) Evolutionary relationship of archaebacteria, eubacteria and eukaryotes inferred from phylogenetic trees of duplicated genes. Proc. Natl. Acad. Sci. USA, 86: 9355-9359. Jinks-Robertson, S., and Nomura, M. (1987) Ribosomes and tRNA. In Escherichia coli and Salmonella typhimurium (Neidhardt, F., Ingraham, J., Low, K., Magasanik, B., Schaechter, M. and Umbarger, H. eds.), pp. 1358-1385, American Society for Microbiology, Washington DC. Johnsen, M., Christiansen, T., Dennis, P., and Fiils, N. (1982) Autogenous control: Ribosomal protein L10-L12 complex binds to the leader sequence of its mRNA. EMBO J., 1: 999-1004. Jones, W. J., Leigh, J. A., Mayer, F., Woese, C. R., and Wolfe, R. S. (1983) Methanococcus jannaschii sp. nov., an extremely thermophilic methanogen from submarine hydrothermal vent. Arch. Microbiol., 136: 254-261. Joyce, G. F. (1989) RNA evolution and the origin of life. Nature, 338: 217-224. Juan-Vidales, F., Sanchez-Madrid, F., Saenz-Robles, M. T., and Ballesta, J. P. G. (1983) Purification and characterization of two ribosomal proteins of Saccharomyces cerevisiae, homologies with proteins from eukaryotic and with bacterial protein ECL11. Euro. J. Biochem.,136: 275-281. Juszczak, A., Aono S., and Adams M. W. (1991) The extremely thermophilic eubacterium, Thermotoga maritima, contains a novel iron-dehydrogenase whose cellular activity is dependent upon tungsten. J. Biol. Chem., 266: 1383413841. Kasting, J. F. (1993) Earth's early atmosphere. Science, 259: 920-926. Kelly, M. R., Venugopal, S., Harless, J., and Deutsch, W. A. (1989) Antibody to a human DNA repair protein allows for cloning of a Drosophila cDNA that encodes an apurinic endonuclease. Mol. Cell. Biol., 9: 965-973.  145 Kearney, K. R., and Nomura, M. (1987) Secondary structure of the autoregulatory mRNA binding site of ribosomal protein L1. Mol. Gen. Genet., 210: 609-618. Kimura, M., Kimura, J., and Ashman, K. (1985) The complete primary structure of ribosomal proteins L1, L14, L15, L23, L24 and L29 from Bacillus stearothermophilus. Eur. J. Biochem., 150: 491-497. King, T., and Schlessinger, D. (1987) Processing of RNA transcripts. In Escherichia coli and Salmonella typhimurium (Neidhardt, F., Ingraham, J., Low, K., Magasanik, B., Schaechter, M. and Umbarger, H. eds.), pp. 703-718. American Society for Microbiology, Washington, DC. Knoll, A. H., and Barghoorn, E. S. (1985) Archean Microfossils showing cell division from the Swaziland system of south Africa. Science,198: 396-398. Kiipke, A. K. E., Leggatt, P. A., and Matheson, A. T. (1992) Structure function relationships in the ribosomal stalk proteins of archaebacteria. J. Biol. Chem., 167: 1382-1390. Kristjansson J. K., and Stetter K. 0. (1992) Thermophilic bacteria. In Thermophilic Bacteria (Kristjansson J. K. ed.), pp. 1-18, CRC Press Inc., Boca Raton, FL. Krowczynska, A.M., Coutts, M., Makrides, S., and Brawerman, G. (1989) The mouse homologue of the human acidic ribosomal protein P0: a highly conserved polypeptide that is under translational control. Nucleic Acids Res., 17: 64086408. Kurr, M., Huber, R., Kiinig, H., Jannasch, H. W., Fricke, H., Trincone, A., Kristjansson, J. K., and Stetter, K. 0. (1991) Methanopyrus kandleri, gen. and sp. nov represents a novel group of hyperthermophilic methanogens, growing at 110°C. Arch. Microbiol.,156: 239-247. Laemmli, U. K. (1970) Cleavage of structural proteins during the assembly of the head of the bacteriophage T4. Nature, 227: 680-685. Lake, J. A. (1990) Origin of the eucaryotic nucleus: rRNA sequences genotypical relate eocytes and eucaryotes. In The Ribosome: Structure, Function and Evolution (Hill, W. E., Dahlberg, A. Garrett, R. A., Moore, P. B., Schlessinger, D. and Warner, J. R. eds.), pp. 579 588, American Society for- Microbiology, Washington DC.  146 Lake, J. A. (1988) Origin of the nucleus determined by rate-invariant analysis of rRNA sequences. Nature, 331: 184-186. Lake, J. M., and Strycharz, W. A. (1981) Ribosomal proteins L1, L17, L27 localized at single sites on the large subunit by immune electron microscopy. J. Mol. Biol., 153: 979-992. Lauerer, G., Kristjansson, J. K., Langworthy, T. A., KOnig, H., and Stetter, K. 0. (1986) Methanothermus sociabilis sp. nov., a second species within the Methanothermaceae growing at 97°C. System. Appl. Microbiol., 8: 100-105. Leijonmarck, M., Liljas, A., and Subramanian, A. R. (1984) Computed spatial homology between the L12 protein of chloroplast ribosome and 1.7 A structure of Escherichia coli L12 domain. Biochem. Mt., 8: 69-76. Leijonmarck, M., Eriksson, S., and Liljas, A. (1980) Crystal structure of a ribosomal component at 2.6 A resolution. Nature, 286: 824-826. Leijonmarck, M., and Liljas, A. (1987) Structure of the C-terminal domain of the ribosomal protein L7/L12 from Escherichia coli at 1.7 A. J. Mol. Biol. 195: 555580. Li, J., Mason, S. W., and Greenblatt, J. (1993) Elongation factor NusG interacts with termination factor p to regulate termination and antitermination of transcription. Genes Dev., 7: 161-172. Li, J., Horwitz, R., McCrachen, S., and Greenblatt, J. (1992) NusG, a new Escherichia coli elongation factor involved in transcriptional antitermination by N protein of phage X. J. Biol. Chem., 267: 6012-6019. Liao, D., and Dennis, P. P. (1992) The organization and expression of essential translation, transcription component genes in the extremely thermophilic eubacterium Thermotoga maritima. J. Biol. Chem., 267: 22787-227976. Liljas, A. (1982) Structural studies of ribosomes. Prog. Biophys. Mol. Biol., 40: 161228.  147 Lindahl, L., Jaskunas, S., Dennis, P., and Nomura, M. (1975) Cluster of genes in Escherichia coli for ribosomal proteins, ribosomal RNA and RNA polymerase. Proc. Natl. Acad. Sci. USA, 72: 2743 2747. -  Lindahl, L., and Zengel, J. (1986) Ribosomal genes in Escherichia coli. Ann. Rev. Genet., 20: 297-326. Linn, T., and Greenblatt, J. (1992) The NusA and NusG proteins of Escherichia coli increase the in vitro read through frequency of a transcriptional attenuator preceding the gene for b subunit of RNA polymerase. J. Biol. Chem., 267: 14491454. Loomis, W. F., and Smith, D. W. (1990) Molecular phylogeny of Dictyostelium discoideum by protein sequence comparison. Proc. Natl. Acad. Sci. USA, 87: 9093-9097. Londei P., Altamura, S., Huber, R., Stetter K. 0., and Cammarano, P. (1988) Ribosomes of the extremely thermophilic eubacterium Thermotoga maritima are uniquely insensitive to the miscoding-inducing action of aminoglycoside antibiotics J. Bacteriol., 170: 4353-4360. Magsanaga, A. and Nosoh, Y. (1974) Conformational change with temperature and thermostability of glutamine synthetase from Bacillus stearothermophilus. Biochim. Biophys. Acta, 365: 208 211. -  Manca, M. C., Nicolaus, B., Lanzotti, V., Trincone, A., Gambacorta, A., PeterKatalinic, J., Egge, H., Huber, R., and Stetter, K. 0. (1992) Glycolipids from Thermotoga maritima, a hyperthermophilic microorganism belonging to bacterial domain. Biochim. Biophys. Acta,1124: 249 252. -  Marquis, D. M., Fahnestock, S. R., Henderson, E., Woo, D., Schwinge, S., Clark, M. W., and Lake, J. A. (1981) The L7/L12 stalk, a conserved feature of the prokaryotic ribosome, is attached to the large subunit through its N-terminus. J. Mol. Biol., 150: 121-132. Mason, S. W., Li, J., and Greenblatt, J. (1992) Direct interaction between two Escherichia coli transcription antitermination factors, NusG and ribosomal protein S10. J. Mol. Biol., 223: 55-66.  148 Mason, S. W., and Greenblatt, J. (1991) Assembly of transcription elongation complexes containing the N-protein of phage X and Escherichia coli elongation factors NusA, NusB, NusG, and S10. Genes Dev., 5: 1504-1512. Matheson, A. T., Auer, J., Ramirez, C., and Bock, A. (1990) Structure and Evolution of archaebacterial ribosomal proteins. In The Ribosome: Structure, Function and Evolution (Hill, W. E., Dahlberg, A. Garrett, R. A., Moore, P. B., Schlessinger, D. and Warner, J. R. eds.), pp. 617-635, American Society for Microbiology, Washington DC. Matheson, A. T., Louie, A. K. and Bock, A. (1988) The complete amino acid sequence of the ribosomal A protein (L12) from the archaebacterium Sulfolobus acidocaldarius. FEBS Lett., 231: 331-335. Matheson, A. T., Louie, K. A., Tak, B. D., and Zuker, M. (1987) The primary structure of ribosomal A-protein (L12) from the halophilic eubacterium Haloanaerobium praevalens. Biochimie, 69: 1013-1020. McCarroll, R., Olsen, G. J., Stahl, Y. D., Woese, C. R., and Sogin, M. L. (1983) Nucleotide sequence of the Dictyostelium discoideum small-subunit ribosomal ribonucleic acid inferred from the gene sequence: evolutionary implications. Biochemistry, 22: 5858-5868. McClure, W. R. (1985) Mechanism and control of transcription initiation in prokaryotes. Annu. Rev. Biochem., 54: 171-204. Merkler, D. J., Srikumar, K., Marches-Ragona, S. P., and Wedler, F. C. (1988) Aggregation and thermoinactivation of glutamine synthetase from an extreme thermophilic B. caldolyticus. Biochim. Biophys. Acta, 952: 101-114. Miroshnichenko, M. L., Bonch-Osmolovskaya, E. A., Neuner, A., Kostrikina, N. A., Chernych, N. A., and Alekseev, V. A. (1989) Thermococcus stetteri sp. nov., a new extremely thermophilic marine sulfur-metabolizing archaebacterium. System. Appl. Microbiol.,12: 257-262. Mitsui, K., Nakogawa, T., and Tsurugi, K. (1989) The gene and the primary Structure of acidic ribosomal protein AO from yeast Saccharomyces cerevisiae which show partial homology to bacterial ribosomal protein L10. J. Biochem. 160: 223-227.  149 Mitsui, K., and Tsurugi, K. (1988) cDNA and deduced amino acid sequence of acidic ribosomal protein AO from Saccharomyces cerevisiae. Nucleic Acids Res., 16: 3573-3573. Morgan, W. D., Bear, D. G., and Von Hippel, P. H. (1984) Specificity of release by Escherichia coli transcription termination factor Rho of nascent mRNA transcripts initiated at the X pR promoter. J. Biol. Chem., 259: 8664-8671. Morgan, W. D., Bear, D. G., and Von Hippel, P. H. (1983) p-dependent termination of transcription. 1. Identification and characterization of termination sites for transcription from the bacteriophage pR promoter. J. Biol. Chem., 258: 95539564. Neuner, A., Jannasch, H. W., Belkin, S., and Stetter, K. 0. (1990) Thermococcus litoralis sp. nov.: a new species of extremely thermophilic marine archaebacteria. Arch. Microbiol.,153: 205-207. Newton, C. H., Shimmin, L. C., Yee, J., and Dennis, P. P. (1990) A family of genes encode the multiple forms of the Saccharomyces cerevisiae ribosomal proteins equivalent to the Escherichia coli L12 protein and a single form to L10-equivalent ribosomal protein. J. Bacteriol.,172: 579-588. Nishiyama, K., Mizushima, S., and Tokuda, H. (1992) The carboxyl-terminal region of SecE interacts with SecY and is functional in the reconstitution of protein translocation activity in Escherichia coli. J. Biol. Chem., 267: 7170-7176. Nodwell, J. R., and Greenblatt, J. (1993) Recognition of box A antiterminator RNA by the E. coli antitermination factor NusB and ribosomal protein S10. Cell, 72: 261268. Noll, K. M. (1989) Chromosome map of the thermophilic  Thermococcus celer. J. Bacteriol.,171: 6720-6726.  archaebacterium  Noller, H. F., Hoffarth, V., and Zimniak, L. (1992) Unusual resistance of peptidyl transferase to protein extraction procedures. Science, 256: 1416-1419. Okamoto, S., Nihira, T., Kataoka, H., Suziki, A., and Yamada, Y. (1992) Purification and Molecular Cloning of a butyrolactone autoregulator receptor from Styep tomyces virginiae. J. Biol. Chem., 267: 1093-1098. ,  150 Pace, N. R. (1991) Origin of life-facing up to the new physical setting. Cell, 65: 531533. Pace, N. R., Olsen, G. J., and Woese, C. R. (1986) Ribosomal RNA phylogeny and the primary lines of evolutionary descent. Cell, 45: 325-326. Patel, B. K. C., Morgan, H. W., and Daniel, R. M. (1985) Fervidobacterium nodosum gen. nov. and spec. nov., a new chemoorganotrophic caldo-active, anaerobic bacterium. Arch. Microbiol.,141: 63-69. Paton, E. B., Woodmaska, M. I., Kroupskaya, I. V., Zhyvoloup, A. N., and Matsuka, G. K.. (1990) Evidence for the ability of L10 ribosomal proteins of Salmonella typhimurium and Klebsiella pneumoniae to regulate rplJL gene expression in Escherichia coli. FEBS Lett., 265: 129-132. Paton, E. B., Woodmaska, M. I., Kroupskaya, I. V., Zhyvoloup, A. N., and Matsuka, G. K. (1990) Evidence for the ability of L10 ribosomal proteins of Salmonella typhimurium and Klebsiella pneumoniae to regulate rplJL gene expression in Escherichia coli. FEBS Lett., 2675: 129-132. Paton, E. B., Zolotukhiu, S. B., Woodmaska, M. I., Kroupskaya, I. V., and Zhyvoloup, A. N. (1990) The nucleotide sequence of gene rplJ encoding ribosomal protein L10 of Salmonella typhimurium. Nucleic Acids Res.,18: 2824-2824. Pettersson, I. (1979) Studies on the RNA and protein binding sites of the E. coli ribosome protein L10. Nucleic Acids Res., 6: 2637-2646. Petersen, C. (1990) Escherichia coli ribosomal protein L10 is rapidly degraded when synthesized in excess of ribosomal protein L7/L12. J. Bacteriol., 172: 431-436. Perutz, M. F., and Raidt, H. (1975) Stereochemical basis of heat stability in bacterial ferredoxins and in haemoglobins A2. Nature, 255: 256-259. Piccirilli, J. A., McConnell, T. S., Zaug, A. J., Noller, H. F., and Cech, T. R. (1992) Aminoacyl esterase activity of Tetrahymena ribozyme. Science, 256: 1420-1424. Pinto, J. P., Gladstone, G. R., and Yung, Y. L. (1980) Photochemical production of formaldehyde in Earth's primitive atmosphere. Science, 210: 183-185.  151 Planta, R., Mager, W., Leer, R., Wondt, L., Raue, H., and El-Baradi, T. (1986) Structure and expression of ribosomal proteins in yeast. In Structure, Function and Genetics of Ribosomes (Hardesty, B., and Kramer, G., eds.), pp. 699-718. Springer-Verlag, New York, NY. Platt, T. (1986) Transcription termination and the regulation of gene expression. Annu. Rev. Biochem., 55: 339-372. Pley, U., Schipka, J., Gambacorta, A., Jannasch, H. W., Fricke, H., Rachel, R., and Stetter, K. 0. (1991) Pyrodictium abyssi sp. nov. represents a novel heterotrophic marine archaeal hyperthermophile growing at 110°C. System. Appl. Microbiol., 14: 245-2533. Portier, C., Dondin, L. Grunberg Manago, M., and Rignier, P. (1987) The first step in the functional inactivation of Escherichia coli polynucleotide phosphorylase messenger is a ribonuclease III processing at the 5' end. EMBO J., 6: 2165-2170. Post, L. E., Strycharz, G. D., Nomura, M., Lewis, H., and Dennis, P. P. (1979) Nucleotide sequence of the ribosomal protein gene cluster adjacent to the gene for RNA polymerase subunit in Escherichia coli. Proc. Natl. Acad. Sci. USA, 76: 1697-1701. Prieto, J., Candel, E., and Coloma, A. (1991) Nucleotide sequence of a cDNA encoding ribosomal protein PO in Dictyostelium discoideum. Nucleic Acids Res., 19: 1342. Pucciarelli, M. G., Remacha, M., Vilella, M. D., and Ballesta, J. P. G. (1990) The 26S rRNA binding ribosomal protein equivalent to bacterial protein L11 is encoded by unspliced duplicated genes in Saccharomyces cerevisiae. Nucleic Acids Res., 18: 4409-4416. Qian, S., Zhang, J.-Y., Kay, M. A., and Jacobs-Lorena, M. (1987) Structural analysis of the Drosophila rpAl gene, a member of the eukaryotic "A" type ribosomal protein family. Nucleic Acids Res.,15: 987-1003. Ramirez, C., Shimmin, L. C., Newton, C. H., Matheson, A. T., and Dennis, P. P. (1989) Structure and evolution of the L11, Ll, L10 and L12 equivalent ribosomal proteins in eubacteria, archaebacteria and eukaryotes. Can. J. Microbiol., 35: 234244.  152 Rehaber, V., and Jaenicke, R. (1992) Stability and reconstitution of D-glyceraldehyde3-phosphate dehydrogenase from the hyperthermophilic eubacterium Thermotoga maritima. J. Biol. Chem., 267: 10999-11006. Remacha, M., Saenz-Robler, M. T., Vilella, M. D., and Ballesta, J. P. G. (1988) Independent genes coding for three acidic proteins of the large ribosomal subunit from Saccharomyces cerevisiae. J. Biol. Chem., 263: 9094-9101. Rich, B. E., and Steitz, J. A. (1987) Human acidic ribosomal phosphoproteins P0, P1, and P2: analysis of cDNA clones, in vitro synthesis, and assembly. Mol. Cell. Biol., 7: 4065-4074. Rivera, M. C., and Lake, J. A. (1992). Evidence that eukaryotes and eocyte prokaryotes are immediate relatives. Science, 257: 74-76. Robson, E., and Pain, R. H. (1971) Analysis of code relating sequence to conformation in proteins: possible implications for the mechanism of formation of helical regions. J. Mol. Biol., 58: 237-259. Roberts, J. W. (1993) RNA and protein elements of E. coli and X transcription antitermination complexes. Cell, 72: 653-655. Rosenberg, M., Court, D., Shimatake, H., Brady, C., and Wulff, D. L. (1978) The relationship between function and DNA sequence in an incistronic regulatory region of phage X. Nature, 272: 414-422. Rouviêre-Yaniv, J., Yaniv, M., and Germond, J.-E. (1979) E. coli DNA binding protein HU forms nucleosome-like structure with circular double-stranded DNA. Cell,17: 265-274. Ryan, P. C., Lu, M., and Draper, D. E. (1991) Recognition of highly conserved GTPase center of 23 S ribosomal RNA by ribosomal protein L11 and the antibiotic thiostrepton. J. Mol. Biol., 221: 1257-1268. Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989) Molecular cloning: a laboratory manual, Cold Spring Harbor Laboratory, NY.  153 Sanger, F., Coulson, A. R., Barrell, B. G., Smith, A. J. H., and Roe, B. A. (1980) Cloning in single stranded bacteriophage as an aid to rapid DNA sequencing. J. Mol. Biol.,143: 61-178. Sanchez-Madrid, F., Vidales, F. J., and Ballesta, J. P. G. (1981) Functional role of acidic ribosomal proteins. Interchangeability of proteins from bacterial and eucaryotic cells. Biochemistry, 20: 3263 -3266. Sander, G. (1983) Ribosomal protein L1 from Escherichia coli. Its role in the binding of tRNA to the ribosome and in elongation factor G - dependent hydrolysis. J. Biol. Chem., 258: 10098-10102. Sawadogo, M., and Sentenac, A. (1990) RNA polymerase B (II) and general transcription factors. Annu. Rev. Biochem., 59: 711-754. Schatz, P., Bieker, K., Ottemann, K., Silhavy, T., and Beckwith, J. (1991) One of the three transmembrane stretches is sufficient for functioning of the SecE protein a component of the E. coli secretion machinery. EMBO J.,10: 1749-1757. Schatz, P., Riggs, P., Jacq, A., Fath, M., and Beckwith, J. (1989) The secE gene encodes an integral membrane protein required for protein export in E. coli. Genes Dev., 3: 1035-1044. Schijman, A. G., Dusetti, N. J., Vazquez, M. P., Lafton, S., Levy-Yeyati, P., and Levin, M. J. (1990) Nucleotide cDNA and complete deduced amino acid sequence of a Trypanosoma cruzi ribosomal P protein (P-JL5). Nucleic Acids Res., 18: 3399. Schopf, J. W. (1993) Microfossils of the early archean apex chert: new evidence of the antiquity of life. Science, 260: 640-646. Schopf, J. W., and Walter, M. R. (1983) Archean microfossile: new evidence of ancient microbes. In Earth's Earliest Biosphere: its origin and evolution (Schopf, J. W. ed.), pp. 214-239, Princeton University Press, Princeton, NJ. Segerer, A. H., Trincone, A., Gahrtz, M., and Stetter, K. 0. (1991) Stygiolobus azoricus gen. nov., sp. nov., represents a novel genus of anaerobic, extremely thermophilic archaebacteria of the order Sulfolobales. Mt. J. Syst. Bacteriol., 41: 495-501.  154 Segerer, A., Neuner, A., Kristjansson, J. K., and Stetter, K. 0. (1986) Acidianus brierleyi comb. nov.: facultatively aerobic, extremely acidophilic thermophilic sulfur-metabolizing archaebacteria. Int. J. Syst. Bacteriol., 36: 559-564. Shapiro, R. (1988) Prebiotic ribose synthesis: a critical analysis. Origin of life & Evolution of Biosphere, 18: 71-85. Sharp, R. J., Riley, P. W., and White, D. (1992) Heterotrophic thermophilic Bacili. In Thermophilic Bacteria (Kristjansson, J. K., ed.), pp. 19-50. CRC Press Inc., Boca Raton, FL. Shimmin, L. C. (1990) An archaebacterial ribosomal protein gene cluster. Ph.D. thesis, University of British Columbia, Vancouver, B.C., Canada. Shimmin, L. C., and Dennis, P. P. (1989) Characterization of the L11, L1, L10 and L12 equivalent ribosomal protein gene cluster of the halophilic archaebacterium Halobacterium cutirubrum. EMBO J., 8: 1225-1235. Shimmin, L. C., Ramirez, C., Matheson, A. T., and Dennis, P. P. (1989) Sequence alignment and evolutionary comparison of the L10 equivalent and L12 equivalent ribosomal protein from archaebacteria, eubacteria and eukaryotes. J. Mol. Evol., 29: 448-462. Sibold, C., and Subramanian, A. R. (1990) Cloning and characterization of the genes for ribosomal proteins L10 and L12 from Synechocystis Sp. PCC 6803: Comparison of gene clustering pattern and protein sequence homology between cyanobacteria and chloroplasts. Biochim. Biophys. Acta, 1050: 61-68. Simpson, H. D., Coolbear, T. Vermute, M., and Daniel, R. M. (1990) Purification and some properties of a thermostable DNA polymerase from a Thermotoga species Biochem. Cell Biol., 68: 1292-1296. Smooker, P. M., Schmidt, J., and Subramanian, A. R. (1991) The nuclear:organelle distribution of chloroplast ribosomal protein genes. Features of a cDNA clone encoding precursor of L11. Biochimie, 73: 845-851. Sor, F., and Nomura, M (1987) Cloning and DNA sequence determination of the L11 ribosomal protein of Serratia marcenscens and proteus vulgaris. Translational feedback regulation of Escherichia coli L11 operon by heterologous 11proteins.___Atlok -Gen.  155 Stetter, K. 0. (1993) Life at the upper temperature border. In Colloque  interdisciplinaire du comite national de la recherche scientifique, Frontiers of Life, Le Bloris Proceedings (Than Thanh Van, J. T., Mounolou, J. C., Schneider, J., and McKay, C. eds.), pp. 195-219, C55, Editions Frontiers, Gif-sur-Yvette.  Stetter, K. 0. (1988) Archaeoglobus fulgidus gen. nov., sp. nov.: a new taxon of extremely thermophilic archaebacteria. System. Appl. Microbiol.,10: 172-173. Stetter, K. 0., Lauerer, G., Thomm, M., and Neuner, A. (1987) Isolation of extremely sulfate reducers: evidence for a novel branch of archaebacteria. Science, 236: 822824. Stetter, K. 0. (1986) Diversity of extremely thermophilic archaebacteria. In Thermophiles: General, molecular and applied microbiology (Brock, T. D. ed.) pp. 40-74, John Wiley & Sons, New York, NY. Stetter, K. 0., Kiinig, H., and Stackebrandt, E. (1983) Pyrodictium gen. nov., a new genus of submarine disc-shaped sulphur reducing archaebacteria growing optimally at 105°C. System. Appl. Microbiol., 4: 535-551. Stetter, K. 0., Thomm, M., Winter, J., Wildgruber, G., Huber, M., Zillig, W., Janecovic D., Kiinig H., Palm P., and Wunderl S. (1981a) Methanothermus fervidus, sp. nov., a novel extremely thermophilic methanogen isolated from Icelandic hot spring. Zentralbl, Bakteriol. Hyg., Abstr. 1, orig c2, 166. Stetter, K. 0., Thomm, M., Winter, J., Wildgruber, G., Huber, H., Zillig, W., Janecovic, D., KOnig, H., Palm, P., and Wunderl, S. (1981b) Met hano t hermu s fervidus, sp. nov., a novel extremely thermophilic methanogen isolated from Icelandic hot spring. Zentralbl. Bakteriol. Hyg., Abstr. 1 Orig. C2, 166-178. Stiiffler-Meilicke, M., and Stiiffler, G. (1991) The binding site of ribosomal protein L10 in eubacteria and archaebacteria is conserved: recognition of chimeric 50S subunit. Biochemie, 73: 797-804. Strycharz, W. A., Nomura, M., and Lake, J. A. (1978) Ribosomal proteins L7/L12 localized at a single region of the large subunit by immune electron microscopy. J. Mol. Bio1.,126: 123-140.  156 Studier, F. W., Rosenberg, A. H., Dunn, J. J., and Dubendorff, J. W. (1990) Use of T7 RNA polymerase to direct expression of cloned genes. Methods Enzymol., 105: 60-89. Subramanian, A. R. (1975) Copies of protein L7 and L12 and heterogeneity of the large subunit of Escherichia coli ribosome. J. Mol. Biol., 95: 1-8. Subramanian, A. R., and Dabbs, E. R. (1980) Functional studies on ribosomes lacking protein L1 from mutant E. co/i. Eur. J. Biochem., 112: 425-430. Sullivan, S., and Gottesman M. E. (1992) Requirement for E. coli NusG protein in factor-dependent transcription termination. Cell, 68: 989-994. Sullivan, S., Ward, D., and Gottesman, M. (1992) Effect of Escherichia coli nusG function on N-mediated transcription antitermination. J. Bacteriol.,174: 13391224. Suzuki, K., Olvera, J., and Wool, I. G. (1990) The primary structure of rat ribosomal protein L12. Biochem. Biophys. Res. Commun.,172: 35-41. Swindle, J., Zylicz, M., Georgopoulos, C., Li, J., and Greenblatt, J., (1988) Purification and properties of the NusG protein of Escherichia coli. J. Biol. Chem., 263: 1022910325. Swofford, D. L. (1993) PAUP Version 3.1 (Laboratory of Molecular Systematics, Smithsonian Institution, Washington, DC). Thomas, M., and Nomura, M. (1987) Translational regulation of the L11 ribosomal protein operon of Escherichia coli: mutations that define the target site for repression by L1. Nucleic Acids Res., 15: 3085-3096. Tiboni, 0., Cantoni, R., Creti, R., Cammarano, P., and Sanangelantoni, A. M. (1991) Phylogenetic depth of Thermotoga maritima inferred from analysis of the fus gene: amino acid sequence of elongation factor G and organization of the Thermotoga str operon. J. Mol. Evol., 33: 142-151. Titus, D. E. (1991) Promega Protocols and Application Guide, Second Edition, Promega Corporation, Madison, WI.  157 Uchiumi, T., Wahba, A. J., and Traut, R. R. (1987) Topography and stoichiometry of acidic proteins in large ribosomal subunits from Artemia salina as determined by crosslinking. Proc. Natl. Acad. Sci. USA, 84: 5580-5584. Wachtershauser, G. (1992) Groundworks for an evolutionary biochemistry: the iron-sulphur world. Prog. Biophys. Mol. Biol., 58: 85-201. Walter, M. R., Buick, R., and Dunlop, J. S. R. ( 1980) Stromatolites 3,400-3,500 myr old from North Pole area, Western Australia. Nature, 284: 443-445. Walter, M. R. (1983) Archean stromatolites: evidence of the Earth's earliest benthos. In Earth's Earliest Biosphere: its origin and evolution (Schopf, J. W. ed.), pp. 187213, Princeton University Press, Princeton, NJ. Warner, J. R. (1989) Synthesis of ribosomes in Saccharomyces cerevisiae. Microbiol. Rev., 53: 256-271. Watanabe, K., Chishiro, K., Kitamura, K., and Suzuki, Y. (1991) Proline residues responsible for thermostability occur with high frequency in the loop regions of an extremely thermostable oligo-1,6-glucosidase from Basillus thermoglucosidasius KP1006. J. Biol. Chem., 266: 24878-24294. Wedler, F. C., and Hoffman, F. M. (1974) Glutamine synthetase of Bacillus stearothermophilus. II. Regulation and thermostability. Biochemistry, 13: 32153221. Wedler, F. C., and Merkler, D. J. (1985) Thermostabilization of B. caldolyticus glutamine synthetase by intrinsic and extrinsic factors. In Curr. Top. Cell Regul., 26: 263-280. Weiner, A. M., and Maizels, N. (1991) The genomic tag model for the origin of protein synthesis: Further evidence from the molecular fossil record. In Evolution of Life: Fossils, Molecules and Culture (Osawa, S. and Honjo, T. eds.), pp. 51-66. Springer-Verlag, Tokyo. Weiner, A. M., and Maizels, N. (1987) 3' terminal tRNA-like structures tag genomic RNA molecules for replication: Implications for the origin of protein synthesis. Proc. Natl. Acad. Sci. USA, 84: 7383-7387.  158 Weiner, A. M. (1987) The origin of life. In Molecular Biology of Gene (Watson, J. D., Roberts, J. W., Steitz, J. A., and Weiner, A. M. eds.), pp. 1098-1160. Benjamin Cummings, Menlo Park, CA. Whalen, W., Wolska, K., Devito, J., and Das, A. (1992) NusG is a multifunctional transcription factor that positively regulates both termination and antitermination. Abstract of the Cold Spring Harbor Conference on Bacteria and Phages, August, 1992, pp. 17. Whalen, W., and Das, A. (1988) NusA protein is necessary and sufficient in vitro for phage X N gene product to suppress a Rho-independent terminator placed downstream of nutL. Proc. Natl. Acad. Sci. USA, 85: 2494-2498. Wheelis M. L., Kandler 0., and Woese C. R. (1992) On the nature of global classification. Proc. Natl. Acad. Sci. USA, 89: 2930-2934. Wiegel, J. (1990). Temperature spans for growth: hypothesis and discussion. FEMS Microbiol. Rev., 75: 155-170. Wigboldus, J. 0. (1987) cDNA and deduced amino acid sequence of Drosophila vp21c, another "A" type ribosomal protein. Nucleic Acids Res., 15: 10064. Windberger, E., Huber, R., Trincone, A., Fricke, H., and Stetter, K. 0. (1989) Thermotoga thermarum sp. nov. and Thermotoga neapolitana occurring in African continental solfataric spring. Arch. Microbiol., 151: 506-512. Wittmann-Liebold, B., KOpke, A. K. E., Arndt, E., Kromer, W., Hatakeyama, T., and Wittmann, H.-G. (1990) Sequence comparison of ribosomal proteins and their genes. In The Ribosome: Structure, Function and Evolution (Hill, W. E., Dahlberg, A. Garrett, R. A., Moore, P. B., Schlessinger, D. and Warner, J. R. eds.), pp. 598-616. American Society for Microbiology, Washington DC. Woese, C., Kandler, 0., and Wheelis, M. (1990) Towards a natural system of organisms: Proposal for the domains Archaea, Bacteria, and Eucarya. Proc. Nat. Acad. Sci USA, 87: 4576-4579. Woese, C. R. (1987) Bacterial Evolution. Microbiol. Rev., 51: 221-270. Woese, C. R., and G. Olsen, (1986) Archaebacterial phylogeny: perspectives on the urkingdoms. System. Appl. Microbiol., 7: 161-177.  159 Woese, C. R. (1980) Just so stories and Rube Goldberg machines: speculations on the origin of the protein synthesis machinery. In Ribosomes: Structure, Function, and Genetics (Chamblis, G., Craven, G. R., Davies, J., Davis, K., Kahan, L. and Nomura, M. eds.), pp. 357-373. University Park Press, Baltimore. Wool, I. G., Endo, Y., Chan, Y.-L., and Gliick, A. (1990) Structure, function and evolution of mammalian ribosomes. In The Ribosome: Structure, Function and Evolution (Hill, W. E., Dahlberg, A., Garrett, R. A., Moore, P. B., Schlessinger, D. and Warner, J. R. eds.), pp. 203-214, American Society for Microbiology, Washington, DC. Wrba, A., Jaenicke, R., Huber, R., and Stetter, K. 0. (1990) Lactate dehydrogenase from the extremely thermophilic Thermotoga maritima. Eur. J. Biochem., 188: 195-201. Wrba, A., Schweiger, A., Schultes, V., Jaenicke, R., and Zavodszy, P. (1990) Extremely thermostable D-glyceraldehyde-3-phosphate dehydrogenase from Eubacteria Thermotoga maritima . Biochemistry, 29: 7584-7592. Yager, T. D., and von Hippel, P. H. (1987) Transcript elongation and termination in Escherichia coli. In Escherichia coli and Salmonella typhimurium: cellular and molecular biology (Neidhardt, F. C., Ingraham, J. L., Low, K. B., Schaecher, M., and Umbarger, E., eds.), pp. 1241-1275. American Society for Microbiology, Washington, DC. Yamagichi, A., and Oshima, T. (1990) Circular chromosomal DNA in the sulfurdependent archaebacterium Sulfolobus acidocaldarius. Nucleic Acids Res., 18: 1133-1136. Yates, J. L., and Nomura, M. (1981) Feedback regulation of ribosomal protein synthesis in E. coli: Localization of the mRNA target sites for repressor action of ribosomal protein L11. Cell, 24: 243-249. Yang, X-Y. H., Schulz, H., Elzinda, M., and Yang, S.-Y. (1991) Nucleotide sequence of the promoter and fadB gene of the fadBA operon and primary structure of multifunctional fatty acid oxidation protein from Escherichia coli. Biochemistry, 30: 6788-6795. Zhang, H., Scholl,  ••• .• IP^P. 41 • • '^•^•  sequencing as a choice for DNA sequencing. Nucleic Acids Res.,16: 1220.  160 Zillig, W., Holz, I., Janekovic, D., Klenk, H. P., Imsel, E., Trent, J., Wunderl, S., Forjaz, V. H., Coutinho, R., and Ferreira, T. (1990) Hyperthermus butylicus, a hyperthermophilic sulfur-reducing archaebacterium that ferments peptides. j. Bacteriol., 172: 3959-3965. Zillig, W., Holz, I., Klenk, H. P., Trent, J., Wunderl, S., Janecovic, D., Erwin, J., and Haas, B. (1987) Pyrpcoccus woeseii, sp. nov., an ultra-thermophilic marine archaebacterium representing a novel order, Thermococcals. System. Appl. Microbiol., 9: 62-70. Zillig, W., Yeates, S., Holz, I., Bock, A., Gropp, F., and Simon, G. (1986) Desulfurolobus ambivalens gen. nov., sp. nov., an autotrophic archaebacterium facultatively oxidizing or reducing sulfur. System. Appl. Microbiol., 8: 197-203. Zillig, W., Gierl, A., Schreiber, G., Wunderl, S., Janecovic, D., Stetter, K. 0., and Klenk, H. P (1983) The archaebacterium Thermophilum pendens represents a novel genus of thermophilic, anaerobic sulfur spring Thermoproteales. System. Appl. Microbiol., 4: 79-87. Zillig, W., Stetter, K. 0., Prangishvilli, D., Schafer, W., Wunderl, S., Janecovic, D., Holz, I., and Palm, P. (1982) Desulfurococcaceae, the second family of the extremely thermophilic, anaerobic, sulfur-respiring Thermoproteales. Zentralbl. Bakteriol. Hyg., Abstr. 1 Orig. C3, 304-317. Zillig, W., Stetter, K. 0., Schafer, W., Janecovic, D., Wunderl, S., Holz, I., and Palm, P. (1981) Thermoproteals: a novel type of extremely thermophilic anaerobic archaebacteria isolated from Icelandic solfataras. Zentralbl. Bakteriol. Hyg., Abstr.. 1 Orig. C2, 205-227. Zillig, W., Stetter, K. 0., Wunderl, S., Schulz, W., Priess, H., and Scholz, I. (1980) The Sulfolobus- "caldariella" group: taxonomy on the basis of structure of DNAdependent RNA polymerases. Arch. Microbiol., 125: 259-269. Zimmerman, R. A., Thurlow, D. L., Finn, R. S., March, T. L., and Ferrett, L. K. (1980) Conservation of specific protein RNA interactions in ribosomal evolution. In Genetics and Evolution of RNA polymerase, tRNA and ribosomes (Osasa, S., Ozeki, H., Uchida, H. and Yura, T. eds.), pp. 569-584, University of Tokyo Press, Tokyo.  

Cite

Citation Scheme:

        

Citations by CSL (citeproc-js)

Usage Statistics

Share

Embed

Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                        
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            src="{[{embed.src}]}"
                            data-item="{[{embed.item}]}"
                            data-collection="{[{embed.collection}]}"
                            data-metadata="{[{embed.showMetadata}]}"
                            data-width="{[{embed.width}]}"
                            async >
                            </script>
                            </div>
                        
                    
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0098821/manifest

Comment

Related Items