Open Collections

UBC Theses and Dissertations

The superoxide dismutase gene family in the halobacteria : structure, expression and evolution Joshi, Phalgun B. 1992

THE SUPEROXIDE DISMUTASE GENE FAMILY IN THE HALOBACTERIA:
STRUCTURE, EXPRESSION and EVOLUTION

by

PHALGUN B. JOSHI

B.Sc. (Hons.), University of Guelph, 1986

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE STUDIES
DEPARTMENT OF BIOCHEMISTRY
GENETICS PROGRAM

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
SEPTEMBER, 1992

© Phalgun B. Joshi, 1992

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

(Signature)

Department of Biochemistry
The University of British Columbia
Vancouver, Canada

ABSTRACT

The halophilic archaebacteria belong to a group of closely related organisms that evolved from the methanogens. Since the methanogens are strict anaerobes, the divergence of the aerobic halophiles from this lineage may have required the acquisition of protection against oxygen toxicity; a number of reactions involving oxygen produce highly reactive free-radical by-products. An example of such by-products is the superoxide radical, O₂⁻. Most aerobic organisms possess the enzyme superoxide dismutase (SOD) to protect against this free radical.

It has previously been shown that the halophile Halobacterium cutirubrum possesses a type of SOD that contains manganese. The genome of this organism contains two closely related genes: one that encodes the SOD enzyme, designated sod, and another sod-like gene designated slg. The two genes are 87% identical and their putative proteins are 83% identical.
For most genes that are homologous, the DNA sequence identity initially decreases more rapidly than the corresponding amino acid sequence identity of the proteins they encode. The reverse is seen for sod and slg, whose proteins are less similar than the genes themselves; this disparity can be attributed to the almost even distribution of substitutions between the three codon positions. This pattern of substitutions results in a high incidence of non-synonymous nucleotide substitutions. The two genes also differ in their response to exposure to paraquat, a generator of superoxide free radicals; mRNA levels from the sod gene were seen to be elevated whereas those from slg were unaffected.

To investigate whether other halophiles contain similar paralogous genes (i.e., products of gene duplication in an organism) and whether they exhibit similar patterns of evolutionary divergence and gene expression, sod genes from three different halophiles were isolated and characterized.

The number of copies of genes homologous to sod from Hb. cutirubrum varies from one in Haloarcula marismortui to two in Halobacterium sp. GRB and Haloferax volcanii. The pattern of substitutions between the paralogous genes in Hb. sp. GRB (sod and slg) is almost identical to that observed between the sod and slg genes in Hb. cutirubrum. In contrast, the sod1 and sod2 genes in Hf. volcanii are 99% identical. Comparison of the nucleotide sequences reveals that all the genes are related (identities vary from 76% to 99%) and form a coherent family. Within the entire family, substitutions in the first and second codon positions are much more frequent than expected; this results in a large number of amino acid substitutions in the encoded protein.

Both Hb. sp. GRB and Hf. volcanii contain one gene each that is induced by paraquat and one that is unaffected. The single sod gene in Ha.
marismortui is unaffected by paraquat treatment.

Comparison of the protein sequences encoded by the superoxide dismutase gene family in the halobacteria with sequences from other organisms has enabled the identification of halobacterial-specific residues.

Phylogenetic analyses were performed to determine the evolutionary relationship between the members of the sod gene family in the halobacteria. Results from these analyses, together with the comparison of upstream flanking sequences and response to paraquat, have enabled the postulation of possible evolutionary histories of the genes.

TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
ABBREVIATIONS
ACKNOWLEDGMENTS
INTRODUCTION
    Early Evolution
    Molecular Phylogeny And The Three Kingdom Classification
    Molecular Phylogeny Methods
    Archaebacteria
    Halophilic Archaebacteria
    Oxygen Toxicity And Superoxide Dismutases
    Superoxide Dismutases In Archaebacteria
    Aims Of This Study
MATERIALS AND METHODS
    Bacterial strains and growth conditions
    Isolation of DNA and RNA
    Preparation of radioactive probes
    Southern hybridization analysis
    Library construction
    DNA sequencing
    RNA transcript mapping
    Enzyme activity assay
    Sequence analysis
CHAPTER 1. Characterization of Paralogous and Orthologous members of the Superoxide Dismutase Gene Family from Genera of the Halophilic Archaebacteria
    1.1. INTRODUCTION
    1.2. LITERATURE CLARIFICATION AND NOMENCLATURE
    1.3. RESULTS AND DISCUSSION
        1.3.1. Gene Isolation
        1.3.2. Sequence determination and alignment
        1.3.3. Transcript characterization and regulation by paraquat
        1.3.4. Transcription and translation signals
        1.3.5. Regulation by paraquat
    1.4. SUMMARY
CHAPTER 2. The Family of Superoxide Dismutase Proteins from Halophilic Archaebacteria: Structure, function and evolution
    2.1. INTRODUCTION
    2.2. RESULTS AND DISCUSSION
        2.2.1. Amino acid and nucleotide sequence alignments
        2.2.2. Signature amino acid residues in SOD
        2.2.3. Nucleotide sequence divergence
        2.2.4. Phylogenetic analysis
        2.2.5. The consensus tree
    2.3. SUMMARY
CONCLUSIONS
FUTURE RESEARCH PROSPECTS
REFERENCES

LIST OF FIGURES

Figure 1    Universal phylogenetic trees determined from the comparison of SS rRNA sequences.
Figure 2    Identification of sod-like sequences by Southern hybridization
Figure 3    Aligned nucleotide and amino acid sequences of the sod gene family and the proteins they encode.
Figure 4    Nucleotide sequences of the 5' and 3' flanking regions of the halophilic sod genes
Figure 5    Nuclease S1 and primer extension mapping of the 5' and 3' transcripts from the sod family of genes.
Figure 6    Alignment of the 5' flanking regions of the non-inducible sod family of genes.
Figure 7    Amino acid sequence alignments of SOD proteins
Figure 8    Relative rates of nucleotide substitutions in the sod and slg genes of Hb. cutirubrum
Figure 9    Phylogenetic relationships between members of the halophilic sod gene family or the proteins they encode
Figure 10   The consensus tree illustrating phylogenetic relationships between the halophilic sod genes (or proteins)

LIST OF TABLES

Table 1.    Characteristics of the Primary kingdoms.
Table 2.    Transcripts from the sod family of genes.
Table 3.    Signature amino acid residues in eubacterial and halophilic SODs
Table 4.    Comparison of paralogous and orthologous gas vacuole genes
Table 5.    Comparison of halophilic sod genes and proteins

ABBREVIATIONS

A600        absorbance at 600 nanometers
A           adenine (in nucleic acids)
Ala or A    alanine
AMV         avian myeloblastosis virus
Arg or R    arginine
Asn or N    asparagine
Asp or D    aspartic acid
ATP         adenosine triphosphate
bp          base pairs
C           cytosine
Cys or C    cysteine
dATP        deoxyadenosine triphosphate
dCTP        deoxycytidine triphosphate
dGTP        deoxyguanosine triphosphate
DNA         deoxyribonucleic acid
dTTP        deoxythymidine triphosphate
EDTA        ethylenediaminetetraacetic acid
G           guanine (in nucleic acids)
g           grams
Glu or E    glutamic acid
Gln or Q    glutamine
Gly or G    glycine
His or H    histidine
Ile or I    isoleucine
kbp         kilobase pairs
kcal        kilocalories
Leu or L    leucine
Lys or K    lysine
M           moles per liter
Met or M    methionine
mg          milligram
ml          millilitre
mRNA        messenger ribonucleic acid
µg          microgram
µl          microlitre
ng          nanogram
nt(s)       nucleotide(s)
Phe or F    phenylalanine
Pro or P    proline
RNA         ribonucleic acid
RNAse       ribonuclease
S           Svedberg unit
SDS         sodium dodecyl sulfate
Ser or S    serine
SOD         superoxide dismutase
T           thymine
Thr or T    threonine
Tris        tris(hydroxymethyl)aminomethane
Trp or W    tryptophan
Tyr or Y    tyrosine
u           units of enzyme
Ura or U    uracil
Val or V    valine

ACKNOWLEDGEMENTS

I would like to thank my supervisor Dr. Patrick P. Dennis for his help and support during the entire course of graduate work. I also thank Drs. Caroline Astell and Rosemary Redfield for their advice and help during the same period of time. A special thanks goes to Lawrence Shimmin for sharing his wisdom, patience and enthusiasm, and for introducing me to the joys of "B" movies. Thanks also go to my colleagues, Peter, Janet, Damien, Shanthy, Deidre, Luc, Daiqing and Josephine for their cooperation and friendship in the lab. I thank Dr. W.
Robert McMaster for his help during the compilation of this thesis.

Finally and most importantly, I thank my parents, my brothers and sisters, and my wife, Alison, for their unconditional support and encouragement during the course of this work.

INTRODUCTION

Early Evolution

How life began and how it subsequently evolved into the vast array of extant forms presently inhabiting earth has intrigued scientists for decades. Until recently most clues to the history of cellular evolution originated from a combination of geological and paleontological sources. For example, the oldest fossils of any life form, the stromatolites from Warrawoona in Western Australia, place the origin of life at approximately 3.5 × 10⁹ years ago (Walter, 1983). These and other fossils show characteristics similar to depositions (mats) formed by present-day cyanobacteria. Larger and more complex cells first appear as fossils in depositions dated to about 2.1 × 10⁹ years ago. These cells were of sizes comparable to the smallest present-day eucaryotes. Eucaryotes are distinct from the procaryotes; they contain organelles and a nuclear membrane that surrounds a complex genome. From the fossil data it had been concluded that evolution was a linear process beginning with simple procaryotic cells and eventually producing a larger and more complex eucaryotic cell.

It was also recognized that the procaryotes contained organisms, now grouped together and called the archaebacteria, that seemed unusual in their ability to survive in extreme environments. Although these organisms showed peculiar structural and physiological characteristics they were not considered different from the bacteria.

The dogma of linear evolution from simple procaryotes to complex eucaryotes prevailed prior to the early 1970s. It was generally regarded that the "demarcation between the eukaryotic and prokaryotic organisms (was) the largest and most profound single discontinuity in the contemporary biological world" (Stanier et al., 1976).
In the latter part of the decade this idea was radically altered due to results obtained from molecular phylogenetic analyses by Carl Woese and his colleagues.

Molecular Phylogeny And The Three Kingdom Classification

The study of evolutionary relationships among organisms using macromolecular sequence data is termed molecular phylogeny. In the mid 1960s, Zuckerkandl and Pauling (1965) had suggested the use of macromolecules as molecular chronometers; the comparison of primary structures of proteins and nucleic acids that are derived from a common ancestor and separated by speciation could provide clues to their phylogenetic relatedness. It is now generally accepted that an enormous number of nucleic acid (and polypeptide) sequences produce a nearly identical phenotype. Genetic variability unaccompanied by significant phenotypic change is not subject to selective pressure and therefore is a good measure of evolutionary relatedness. In spite of the fact that the real situation is more complex, macromolecular sequences are being used to estimate phylogenetic relationships in the presence of uncertainty using a probabilistic model of evolution (Felsenstein, 1988).

The evolutionary history of macromolecules or organisms (termed taxonomical units) is often presented graphically in the form of dendrograms or "trees". The branches of the trees denote the evolutionary relationship between the tips of the branches, or external nodes, which represent the different taxonomical units. The branch points, or internal nodes, represent the ancestral states.

One of the best molecular chronometers is a constituent of the ribosome, the principal component in the process of translation, in which messenger RNAs are decoded into polypeptides. Ribosomes are composed of a large and a small subunit designated according to their sedimentation coefficients (S values): the eucaryotes mostly contain 60S and 40S subunits and the archaebacteria and the eubacteria mostly contain 50S and 30S subunits.
Each subunit consists of one or more RNAs (rRNA) and a large number of proteins (r-proteins); for example, in the eubacterium Escherichia coli, the small subunit (SS) contains 16S rRNA and 21 r-proteins and the large subunit (LS) contains 23S and 5S rRNAs and 31 r-proteins. The SS rRNA is a very useful molecular chronometer because: 1) it is present in all cellular life forms, 2) it contains many independent positions that range from highly conserved to highly variable domains, and 3) it is easily sequenced (Woese, 1987).

The analysis and comparison of SS rRNA sequences resulted in an unexpected finding. The earliest life form, designated the progenote, apparently branched (prior to the appearance of the cyanobacteria 3.5 × 10⁹ years ago) into three independent lineages, designated urkingdoms: the eubacteria, the eucaryota, and the archaebacteria (Woese, 1987). Subsequent endosymbiotic association between a primitive eucaryote and the bacteria accounted for the introduction of organelles into the former, which gradually evolved into the modern eucaryotic cell (Margulis, 1970). The archaebacteria appear to be distinct although they share some properties with the eubacteria and some with the eucaryotes (Table 1). [It has recently been suggested that the terms eubacteria, archaebacteria and eucaryotes be changed to Bacteria, Archaea and Eucarya respectively to avoid the implication that the eubacteria and the archaebacteria are more closely related than either one to the eucaryotes (Woese et al., 1990).]

The universal phylogenetic tree based on SS rRNA sequence comparison is depicted in Figure 1A. This tree is unrooted because the point that corresponds to the universal ancestor is undetermined.

Further comparison of sequences from other macromolecules such as ribosomal proteins and translational elongation factors suggests that the archaebacteria and the eucaryotes are more closely related to each other than to the eubacteria (Woese, 1990).
Figure 1B shows a modified tree illustrating this relationship. The position of the root of the tree was determined by comparing the genes for the translation elongation factors Tu and G, and for the α and β subunits of ATPase. Both are pairs of paralogous genes (i.e. two genes of a common ancestry in an organism that have arisen by a duplication event) that diverged from each other before the emergence of the three organismal lineages from their common ancestor (Iwabe et al., 1989). The root of a tree constructed by comparing the sequences of one member of the paralogous pair can be determined if the other member is used as an outgroup.

TABLE 1: Characteristics of the Primary kingdoms.
Distinguishing features of the eubacteria, archaebacteria and eucaryota are listed. Abbreviations are chloramphenicol (CM), anisomycin (Ani), kanamycin (Kan), pseudouracil (Ψ), α-amanitin (Ama) and rifampin (Rif).

Characteristic              Eubacteria                     Archaebacteria                   Eucaryota
Cellular organization       anucleate                      anucleate                        nucleated with organelles
Genome size (bp)            5 × 10⁵ - 5 × 10⁶              5 × 10⁵ - 10⁷                    1.5 × 10⁷ - 3 × 10¹¹
Membrane lipids             ester linked, straight chain   ether linked, branched chain     ester linked, straight chain
Cell walls                  peptidoglycan                  various but not peptidoglycan    various or none
Ribosomes
  rRNA                      5S, 16S, 23S                   5S, 16S, 23S                     5S, 5.8S, 18S, 28S
  diphtheria toxin          insensitive                    sensitive                        sensitive
  antibiotic sensitivity    CMˢ Aniᴿ Kanˢ                  CMᴿ Aniˢ Kanᴿ                    CMᴿ Aniˢ Kanᴿ
Transfer RNA
  TΨC loop                  TΨCG                           1-methyl ΨCG                     TΨCG
  1-methyl adenine          absent                         present                          present
  initiator tRNA            5' monophosphate               5' triphosphate                  5' monophosphate
  initiator amino acid      N-formyl methionine            methionine                       methionine
RNA polymerase
  number of types           1                              1                                3
  subunits                  5                              6-13                             12 or greater
  antibiotic sensitivity    Amaᴿ Rifˢ                      Amaᴿ Rifᴿ                        Ama: (Pol II)ˢ (Pol I+III)ᴿ; Rifᴿ
mRNA                        uncapped                       uncapped                         7-methyl G cap and polyadenylation

Figure 1: Universal phylogenetic trees determined from the comparison of SS rRNA sequences.
(A): An unrooted tree based on a matrix derived from distances between pairs of aligned SS rRNA sequences (Woese, 1987).
(B): Tree from (A) above, modified to contain a root which corresponds to the universal ancestor from which all extant life forms ultimately diverged. The position of the root was derived by comparing sequences of duplicated paralogous genes (two genes that have resulted from a duplication event in a given organism) for translation elongation factors Tu and G and the genes for the α and β subunits of ATPase (Iwabe et al., 1989). These duplicated genes diverged from each other before the three primary lineages evolved from the common ancestor. This rooting strategy utilizes one member of the pair of paralogous genes as an outgroup for the comparison of members of the other gene.

[Figure 1, panels A and B: in each tree the three branches are labelled EUBACTERIA, ARCHAEBACTERIA and EUCARYOTES.]

Molecular Phylogeny Methods

The refinement of techniques for obtaining sequences of biological macromolecules during the last 30 years has resulted in an exponential increase in the sequence data set available for phylogenetic analysis. Numerous methods allow the statistical comparison of these sequences; associated with each method are specific variables and assumptions (reviewed by Felsenstein, 1988). No single method is guaranteed to infer the phylogenetic tree closest to the "true tree" (i.e. the tree that represents the correct phylogenetic history). It is possible, however, that certain evolutionary relationships can be inferred with reasonable certainty if they consistently appear from all or most of the methods of analysis employed.

The most commonly used analyses for inferring phylogenies are parsimony and distance methods. The principle of parsimony methods is minimum evolution: trees that require the smallest number of changes between sequences are assumed to be closest to the true tree and are therefore preferred (Fitch, 1971; Fitch and Ferris, 1974).
The method involves first assembling the sequences (nucleic acid or protein) into an alignment such that every position along the length of the alignment contains homologous members. Often, it is necessary to introduce gaps within the sequences in order to optimize the alignment. The gaps represent insertion or deletion events in the sequences that are assumed to have occurred at these sites since their divergence from the common ancestor. Next, "informative sites" on the sequence alignment are determined. A site is considered to be informative only if, taken in isolation, it favours one specific evolutionary pathway (tree) describing the relationship of the members at the site. A site where it is not possible to assign a specific tree as being the most likely is considered to be uninformative and is not factored into the analysis. The minimum number of substitutions at each informative site is then calculated and a tree describing these substitutions is assigned to this site. Finally, the incidence of each of the trees over all the informative sites is calculated; the one that occurs most frequently is considered to be the most parsimonious and best representative of the evolutionary relationship of the sequences under consideration (Felsenstein, 1988).

In distance methods, the evolutionary tree is derived from matrices of pairwise differences, usually expressed as the proportion of sites differing between sequences. An example of such a method is the neighbour-joining method of Saitou and Nei (1987). This method works as follows: First, the different sequences are aligned as described above. The distances (differences) between all pairs are then computed and a matrix is constructed. Next, all the sequences are placed on a star-shaped dendrogram with no internal nodes (i.e., all taxa are placed at the end of individual branches which all originate from a single point).
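
The distance matrix from which such a method starts can be illustrated with a short sketch. The Python below is not part of the original analysis. It assumes the simplest distance measure mentioned above, the proportion of differing sites, and makes the further assumption that sites carrying a gap in either sequence are skipped. The function names (p_distance, distance_matrix) and the four toy sequences are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of compared sites that differ between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # assumption: sites with a gap in either sequence are ignored
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differing / compared


def distance_matrix(alignment):
    """Symmetric matrix of pairwise p-distances, stored as nested dictionaries."""
    names = list(alignment)
    matrix = {name: {name: 0.0} for name in names}
    for x, y in combinations(names, 2):
        d = p_distance(alignment[x], alignment[y])
        matrix[x][y] = d
        matrix[y][x] = d
    return matrix


# Hypothetical toy alignment (not real sod or slg sequences).
toy_alignment = {
    "seqA": "ATGGCCAAGT",
    "seqB": "ATGGCTAAGT",
    "seqC": "ATGACTAAGA",
    "seqD": "ATG-CTCAGA",
}

if __name__ == "__main__":
    matrix = distance_matrix(toy_alignment)
    for name, row in matrix.items():
        print(name, {other: round(d, 3) for other, d in row.items()})
```

In this toy matrix the smallest off-diagonal entry identifies the pair of sequences that a distance method would treat as most similar. Corrections for multiple substitutions at a single site are deliberately left out of the sketch.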
To determine the first internal node, the two most similar sequences are taken from the distance matrix and joined to the rest of the tree to give the shortest total branch length. The two joined sequences are then considered to be "neighbours" and merged into a single unit by averaging the distance between them. The distance of this average unit to each of the other taxa is then computed and a second matrix is compiled treating the first neighbour pair as a single sequence. Repeating the process then allows the assignment of two more sequences as neighbours, which are also merged and subsequently considered as a single unit. This procedure is continued until no more internal branching is possible. This analysis results in a tree that infers the overall evolutionary history of all the sequences (or species) that are under consideration.

Since the methods described above are statistical, there are inevitable errors that occur. An example of such errors is the underestimation of distances between sequences due to multiple mutations at single sites. In order to estimate the uncertainty in the overall tree elucidation the data are subjected to resampling analysis. The most widely used resampling method in phylogenetic analyses is "bootstrapping" (Felsenstein, 1988). This consists of resampling with replacement; randomly selected samples in the set (corresponding to sites in aligned sequences) are analyzed and returned to the set so that some sites of the sequences under comparison are sampled more than once and some are never sampled. If such resampling is performed frequently enough, it is possible to determine the confidence limits (P values) and hence the significance of any inferred evolutionary branch.

The Archaebacteria

The archaebacteria are composed of two major lineages: 1) the extreme thermophiles and 2) the methanogen-halophiles (Woese, 1987). Woese et al. (1990) have recently suggested the terms Crenarchaeota and Euryarchaeota to describe the two respective lineages.
The extreme thermophiles are quite uniform in their phenotype; most species are anaerobic (an exception is Sulfolobus, which is aerobic), utilize sulfur as an energy source and grow optimally at extremely high temperatures (70°-110°C).

Based on SS rRNA sequence comparisons, the methanogens and the halophiles are more closely related to each other than either group is to the sulfur-dependent extreme thermophiles (Woese, 1987). The methanogens are obligate anaerobes that possess a unique mode of energy metabolism that generates methane. They can utilize H₂, formate, acetate or methylamines with the aid of coenzymes (e.g. coenzyme M) that are not found in other organisms (Jones et al., 1987). They occupy extremely diverse habitats and have been found wherever anaerobic biodegradation of organic compounds occurs, including freshwater and marine sediments, digestive tracts of animals, and anaerobic waste digesters of sewage treatment plants (Jones et al., 1987).

Halophilic Archaebacteria

Analysis of SS rRNA indicates that the halophilic archaebacteria have emerged from one group of the methanogens (Woese, 1987). It has been proposed that this transition involved adaptations to a high-salt and oxygen-containing environment (Woese, 1987). Halophilic methanogens, which may represent evolutionary intermediates between the methanogenic and halophilic lineages, have recently been described (Paterek and Smith, 1988; Liu et al., 1990). The combination of halophilicity and aerobiosis is the most significant phenotype that separates the halophiles from all the other archaebacteria. The requirement for salt varies from 2 to 4 M NaCl; these organisms are found in salterns and along natural high salt-containing bodies of water including the Dead Sea in Jordan and Lakes Natron and Magadi in East Africa.

Halophiles exhibit variable morphological shapes; they can be pleomorphs (Haloarcula marismortui) (Oren et al., 1988), box-shaped (Halobacterium sp.
GN), rod-shaped (Halobacterium salinarium and its relatives including Hb. cutirubrum and Hb. halobium) or disc-shaped (Haloferax volcanii) (Mullakhanbhai and Larsen, 1975). They are often deep red in colour due to the presence of a membrane pigment, bacterioruberin. The members of the genus Halobacterium possess a purple membrane containing bacteriorhodopsin, a light-activated proton pump used to make ATP.

The high salt environment of the halophiles poses a number of problems which are ingeniously solved by the organisms. First, the high ionic exterior places an osmotic pressure on the cell. This is balanced by a corresponding increase in the internal salt concentration; potassium ions are actively taken up, raising the intracellular concentration to as high as 5 M (Lanyi, 1979).

Salt, especially at high concentrations, is destructive to biological macromolecules whose optimal configurations are based on weak, non-covalent interactions. 'Salting-out' compounds such as NaCl and KCl stabilize both the intra- and inter-molecular hydrophobic interactions among non-polar residues. This strengthening of hydrophobic bonds causes proteins to assume a more tightly folded conformation and induces intermolecular aggregation. To reduce such effects, halophiles possess proteins containing a smaller number of non-polar amino acids (Lanyi, 1974).

Another effect of a high intracellular salt concentration is the decreased availability of water for the solvation of the proteins. In order to compete effectively for available water molecules, halophilic proteins contain a high proportion of charged residues (Lanyi, 1974; May and Dennis, 1989). The acidic amino acids, aspartate and glutamate, are preferred because they bind at least twice as many water molecules per side chain compared to other amino acids (Saenger, 1987).

Although the halophilic archaebacteria grow in high concentrations of NaCl, the interior of the cells contains a high concentration of KCl.
The K⁺ ions are actively imported, raising the internal KCl concentration to as high as 5 M. Presence of K⁺ is preferred over Na⁺ probably because the hydration number of K⁺ is half that of Na⁺; consequently, less water is sequestered by the K⁺ ions (Lanyi, 1974).

The genomes of the halophilic archaebacteria are extremely complex. The genomes of the purple membrane-containing halophiles, for example, consist of two major fractions. About 80% of the genome is composed of a chromosomal fraction that is 68% G+C and contains mostly single-copy sequences. The rest of the genome is heterogeneous and contains a variety of different covalently-closed circular plasmid DNAs as well as A+T-rich islands from the chromosome (Ebert et al., 1984; Pfeifer and Betlach, 1985). In Hb. halobium a large number of repetitive elements, some of which have been shown to be insertion elements, are present in both chromosomal and extra-chromosomal fractions of its genome (Sapienza and Doolittle, 1982). These repetitive elements are the source of numerous rearrangements, making the genome highly unstable.

Members of the genus Halobacterium contain purple membranes and are physiologically and biochemically the best studied of the halophilic archaebacteria. However, due to their unstable genomes they have not been as amenable to genetic manipulations. Instead Haloferax volcanii, which possesses a more stable genome, has been exploited and an efficient transformation system has been developed (Cline et al., 1989). For genetic manipulation, a number of E. coli/Hf. volcanii shuttle vectors have also been constructed (Lam and Doolittle, 1989; Holmes and Dyall-Smith, 1990; Holmes et al., 1991).

Because of the uniformity in 16S rRNA sequence divergence, it has been difficult to reconstruct the phylogenetic history within the halophilic archaebacteria.
The data suggest rapid radiation to form the major genera within a short period of time (Mylvaganam and Dennis, 1992).

Oxygen Toxicity And Superoxide Dismutases

The appearance of the cyanobacteria in the earth's biosphere 3.5 to 4.0 × 10⁹ years ago set into motion a chain of events that radically changed the existing anaerobic atmosphere. The cyanobacteria were the earliest appearing organisms capable of utilizing solar energy to derive reducing equivalents from water. This process, photosynthesis, generated molecular oxygen (O₂) as a byproduct. Initially, oxygen was sequestered by the ferrous "sink" within the earth's mantle. About 2.0 × 10⁹ years ago the oxidation of ferrous to ferric compounds was complete and O₂ release into the atmosphere began (Chapman and Schopf, 1983). The continual release of O₂ has changed the earth's atmosphere from a reducing anaerobic to an oxidizing aerobic composition.

The presence of O₂ in the biosphere enabled other organisms to evolve a mode of respiration that utilized O₂ as a terminal electron acceptor. This mode is energetically more efficient than either fermentation or respiration using nitrate and sulphate as terminal electron acceptors.

Although oxygen provides advantages to organisms able to use it, its presence and use in respiration generates highly reactive free-radical derivatives. These free radicals react randomly, and often catastrophically, with biomolecules including nucleic acids, proteins and lipids (as reviewed by Cadenas, 1989). Why these radicals are generated becomes obvious upon examination of the atomic structure of molecular oxygen.

Dioxygen, O₂, contains two unpaired electrons in its outer molecular orbital. When O₂ oxidizes another atom or molecule it accepts two electrons of the same spin. Since most molecules contain a pair of electrons of anti-parallel spin in their outer orbitals, the reduction of O₂ is restricted to one electron at a time.
The primary intermediate in the univalent reduction of O₂ is the superoxide radical O₂⁻:

    O₂ + e⁻ → O₂⁻

Further reduction of the superoxide radical then gives rise to hydrogen peroxide (H₂O₂):

    O₂⁻ + 2H⁺ + e⁻ → H₂O₂

The combined presence of the superoxide radical and hydrogen peroxide, through catalysis by a metal group, gives rise to the Haber-Weiss reaction and results in one of the most reactive and destructive of all free radicals, the hydroxyl radical (OH•) (Haber and Weiss, 1934):

    O₂⁻ + H₂O₂ + H⁺ → OH• + H₂O + O₂

This is, however, not the full explanation of oxygen toxicity since superoxide has been shown to be involved in damage independently of the presence of H₂O₂ (Fridovich, 1986a).

In nature there are numerous enzymatic and non-enzymatic sources of superoxide. The enzymatic sources include reactions catalyzed by xanthine oxidase, galactose oxidase and several flavin dehydrogenases (Fridovich, 1978). Non-enzymatic sources include the auto-oxidation of thiols and hydroquinones. The photolysis of molecular oxygen by solar radiation can also yield superoxide. As a result, even organisms not utilizing oxygen are potentially at risk.

Most organisms protect themselves from the destructive effects of oxygen radicals. Several modes of protection exist: 1) the presence of enzymes such as cytochrome oxidase which contain paramagnetic transition metals that enable the multivalent reduction of O₂ without releasing either O₂⁻ or H₂O₂ (Antonini et al., 1970), 2) the absence of thiol and other groups on proteins that are especially sensitive to oxidation and 3) the presence of protective enzymes capable of converting reactive free radicals to less reactive species.

Superoxide dismutase (SOD) is one of the most important of these protective enzymes (Fridovich, 1986b). It contains a prosthetic metal atom which catalyses the transfer of an electron from one superoxide anion to another, producing molecular oxygen and hydrogen peroxide.
The latter is then converted to water through the action of catalases or peroxidases.

$$\mathrm{2\,O_2^- + 2H^+ \xrightarrow{SOD} H_2O_2 + O_2}$$

$$\mathrm{H_2O_2 + H_2O_2 \xrightarrow{catalase} 2H_2O + O_2}$$

$$\text{or}\quad \mathrm{H_2O_2 + RH_2 \xrightarrow{peroxidase} 2H_2O + R}$$

where RH₂ is a reductant other than H₂O₂ (e.g. formate).

Depending on the type of prosthetic metal group, SODs can be divided into two classes: 1) the Cu/Zn type (containing both Cu and Zn ions) found in the cytosol of eucaryotic cells and in a small number of eubacterial organisms and 2) the Fe or Mn type (containing only one or both of Fe and Mn ions) found in most eubacteria, archaebacteria and the organelles of eucaryotic cells (Puget and Michelson, 1974; Steinman, 1982 and Bannister and Parker, 1985). The bovine erythrocyte Cu/Zn SOD was the first of these enzymes to be purified (McCord and Fridovich, 1969). It is a dimer of subunits of about 16,000 daltons. In contrast, the Fe- and Mn-SOD enzymes exist as homo-dimers or tetramers of subunits of about 20,000 daltons (Steinman, 1982). They show identity in both their primary sequence and tertiary structure (Stallings et al., 1984 and 1985) and are distinct from the Cu/Zn type in their tertiary structure and in their sensitivity to inhibitors (Steinman, 1982).
These dissimilarities suggest that the Cu/Zn and the Mn and Fe enzymes arose by convergent evolution from two separate origins.

The importance of superoxide dismutase activity to organisms is supported by: 1) the separate origins for the Cu/Zn and the Fe and Mn enzymes, 2) the ubiquitous occurrence of SODs in aerobic organisms (Imlay and Fridovich, 1991), 3) the high degree of conservation of primary and secondary structures within each enzyme class throughout nature, indicative of a crucial function (Steinman, 1982), 4) the induction of Mn SOD by exposure of E. coli to increased levels of O₂ or superoxide generators such as paraquat, and 5) the increased sensitivity to oxygen radicals by mutants of E. coli and Drosophila that lack SOD (Carlioz and Touati, 1986; Phillips et al., 1989).

Superoxide Dismutases In Archaebacteria

All archaebacteria investigated to date appear to possess the Mn- or Fe-containing SOD enzyme. Methanogens and the thermophile Thermoplasma possess the Fe-containing SOD (Kirby et al., 1981; Takao et al., 1991; Searcy and Searcy, 1981). The Mn-type of the enzyme occurs in the halophiles Hb. halobium and Hb. cutirubrum (Salin and Oesterhelt, 1988; May and Dennis, 1987).

Genes encoding SOD from a methanogen and a halophile have been cloned and sequenced (Takao et al., 1990; May and Dennis, 1989). These genes show high sequence identity to each other and to genes encoding Fe- and Mn-SODs in other organisms (33%-42%) (Takao et al., 1990).

In Hb. cutirubrum, the sod gene encodes a protein with superoxide dismutase activity. This protein has been purified to near homogeneity and characterized with respect to its metal content (Mn), size (25,000 daltons), subunit composition (tetramer), optimal salt requirement, susceptibility to specific inhibitors (resistant to azide and sensitive to inactivation by H₂O₂) and inducibility by paraquat (May and Dennis, 1989). In addition to sod, a second homologous gene designated slg (sod like gene) was detected within the genome of Hb.
cutirubrum (May and Dennis, 1990). The slg gene is actively transcribed but its protein product has not been identified. The expression of the sod gene is enhanced by paraquat whereas the slg gene is unresponsive.

At the nucleotide level sod and slg exhibit 87% identity whereas the proteins they encode exhibit 83% amino acid identity. This observation is unusual. Normally when duplicated genes begin to diverge, most of the nucleotide substitutions occur at the third codon position and are of the synonymous type (i.e. no amino acid replacement occurs at the position specified by the codon). Such changes are more easily fixed in the population because they are often not subject to significant negative selection. Non-synonymous mutations are usually subject to stronger negative selection and are fixed far less frequently in the population. As a consequence, nucleotide identity between the two sequences degenerates more rapidly than the amino acid identity between the two proteins. Only much later, when a substantial number of first and second position changes accumulate, does the amino acid identity fall below the nucleotide identity. This relationship between the nucleotide and amino acid identities has been shown from the analysis of sequences of duplicated genes and their protein products and can be represented graphically (R. F. Doolittle, personal communication):

[Figure: identity (%) of DNA and protein plotted against time (arbitrary units)]

The point at which the nucleotide and amino acid identities converge (the crossover point) represents the stage in the divergence of duplicated genes when non-synonymous substitutions begin to outnumber synonymous ones. From the analysis of sequences of numerous duplicated genes and the proteins they encode, this crossover has been shown to be around 60% (R. F. Doolittle, personal communication). Beyond this point, amino acid identity changes faster than the corresponding nucleotide identity.
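The distinction drawn above between synonymous and non-synonymous substitutions, and between nucleotide and amino acid identity, can be made concrete with a short sketch. The Python below is illustrative only: it assumes two already aligned, gap-free, in-frame coding sequences and the standard genetic code, it classifies each differing codon as a whole (a simplification, not a full substitution model), and the function names and toy sequences are hypothetical rather than taken from the sod or slg data.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering (an assumption of this sketch).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate a gap-free coding sequence whose length is a multiple of three."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def percent_identity(a, b):
    """Percentage of aligned positions at which two equal-length sequences match."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def compare_coding(a, b):
    """Summarise the divergence of two aligned coding sequences (hypothetical helper)."""
    if len(a) != len(b) or len(a) % 3:
        raise ValueError("sequences must be aligned, gap-free and in frame")
    synonymous = nonsynonymous = 0
    by_position = [0, 0, 0]  # differences at codon positions 1, 2 and 3
    for i in range(0, len(a), 3):
        ca, cb = a[i:i + 3], b[i:i + 3]
        if ca == cb:
            continue
        for k in range(3):
            by_position[k] += ca[k] != cb[k]
        # Per-codon classification: same amino acid means synonymous.
        if CODON_TABLE[ca] == CODON_TABLE[cb]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return {
        "nucleotide_identity": percent_identity(a, b),
        "amino_acid_identity": percent_identity(translate(a), translate(b)),
        "synonymous_codons": synonymous,
        "nonsynonymous_codons": nonsynonymous,
        "differences_by_codon_position": by_position,
    }


# Toy sequences, invented for illustration only.
result = compare_coding("ATGTCAGGTGAA", "ATGTCGGCTGAA")
print(result)
```

In this toy pair the third-position change (TCA to TCG) is synonymous, while the second-position change (GGT to GCT) replaces glycine with alanine. The amino acid identity (75%) therefore falls below the nucleotide identity (about 83%), which is the direction of the difference reported for sod and slg.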
Some examples of this relationship are 1) the trpG genes of Escherichia coli and Salmonella typhimurium (DNA and protein identities of 83.0% and 95.0%, respectively; Ochman and Wilson, 1987), 2) the amy2 and amy3 genes of Drosophila (DNA and protein identities of 98.6% and 99.6%, respectively; Boer and Hickey, 1986) and 3) the amylase genes of Drosophila and mouse (DNA and protein identities of 57.0% and 55.4%, respectively; Boer and Hickey, 1986).

The sod and slg genes of Hb. cutirubrum are unusual in that their crossover point is above 83% and the amino acid identity is lower than the nucleotide identity. This is the result of the almost equal distribution of substitutions between the three codon positions. Most of these substitutions are non-synonymous and result in amino acid replacements in the protein. The effect of this is a higher degree of difference in amino acid identity between the respective proteins compared to the nucleotide differences between their DNA. It was also found that transversions outnumber transitions; transversions in the third codon position are more likely than transitions to be non-synonymous changes.

In summary, sod and slg are paralogous genes (duplicated genes within a species) that are 87% identical at the nucleotide level. The 5' and 3' flanking regions are totally devoid of any sequence similarity. The expression of the sod gene is enhanced by oxygen radicals whereas the slg gene is unresponsive. The two proteins are only 83% identical at the amino acid level. These observations suggested that the divergence of sod and slg was being driven by strong selection for diverse function (May and Dennis, 1990).

Aims Of This Study

The genome of the extreme halophile, Hb. cutirubrum, has been shown to possess two paralogous genes, sod and slg (May and Dennis, 1990).
These genes are differentially regulated and show an unusual pattern of divergence.

The aims of the present study were:

1) To determine if other species of halophilic archaebacteria also possess similar paralogous genes and if they exhibit patterns of gene expression and divergence similar to those seen in the Hb. cutirubrum sod and slg genes.

2) To determine the phylogenetic relationship of these orthologous (genes separated by a speciation event) and paralogous members of the superoxide dismutase gene family both within and between genera of the halophilic archaebacteria.

3) To determine the phylogenetic relationship between the superoxide dismutases from the halophilic archaebacteria and from non-halophilic organisms.

These studies provide useful insight into the mechanisms of molecular evolution in the halobacteria and important clues to the evolution of one form of protection against oxygen toxicity, the presence of the enzyme superoxide dismutase.

MATERIALS AND METHODS

Bacterial strains and growth conditions

All halobacterial strains were grown in enriched high salt media that were adjusted to pH 6.8-7.0 and sterilized by autoclaving. Hb. cutirubrum and Hb. sp. GRB were grown in medium which contained per liter 231 g NaCl, 24.6 g MgSO₄·7H₂O, 2.2 g KCl, 1.35 g sodium citrate, 3 g yeast extract and 5 g tryptone (Bayley, 1971). For Hf. volcanii, the medium contained per liter 125 g NaCl, 45 g MgCl₂·7H₂O, 10 g MgSO₄·7H₂O, 10 g KCl, 1.34 g CaCl₂·2H₂O, 3 g yeast extract and 5 g tryptone (Daniels et al., 1986). For Ha. marismortui, the medium contained per liter 206 g NaCl, 36 g MgSO₄·7H₂O, 0.37 g KCl, 0.5 g CaCl₂·2H₂O, 0.013 mg MnCl₂ and 5 g yeast extract (Oren et al., 1988). All cultures were grown at 37-40°C in rotary incubators.

Paraquat was added to early log phase cultures (A600 = 0.2) to a final concentration between 1 and 2.5 mM. Incubation was continued for a further 24 h before cells were harvested for RNA preparation.
In the presence of paraquat, the growth rate is substantially reduced and during the 24 h incubation, the cultures did not reach stationary phase.

The Escherichia coli strains DH5α and JM101 were used for plasmid propagation and generation of single stranded phagemids, respectively. The helper phage, R408, was used in conjunction with pGEM (Promega) and pEMBL (Dente and Cortese, 1987) phagemids. The E. coli strain KW251 was used as host for propagating the Ha. marismortui genomic library constructed in λGEM-11 (Promega). All strains were grown in YT medium and, when required, the antibiotics ampicillin and kanamycin were added to final concentrations of 100 and 50 µg per ml, respectively. For plate assays of β-galactosidase, IPTG (isopropyl-thio-galactoside) and X-gal (5-bromo-4-chloro-3-indolyl-β-D-galactoside) were added to 50 µM and 0.005% (w/v), respectively. To induce the receptor for phage λ, strain KW251 was grown in medium supplemented with 0.2% maltose and 10 mM MgSO₄.

Isolation of DNA and RNA

Total DNA was isolated from stationary phase cultures of halobacteria as described by Schnabel et al. (1982). Following equilibrium centrifugation in EtBr-CsCl density gradients, the DNAs were extracted with isopropanol, dialyzed against TE buffer (10 mM Tris, 1 mM EDTA, pH 8.0) and stored at -20°C. Plasmid DNA was isolated by the alkaline lysis method (Maniatis et al., 1982). Total cellular RNA was isolated using the boiling SDS-lysis method described by Chant and Dennis (1986). Following ethanol precipitation, RNA was resuspended in TE buffer and stored at -20°C.

Preparation of radioactive probes

Restriction fragments were dephosphorylated with calf intestinal alkaline phosphatase as described by Maniatis et al. (1982). Fragments and oligonucleotides were 5' end-labelled with T4 polynucleotide kinase and γ-³²P-ATP. Restriction fragments containing 3' recessed ends were 3' end-labelled using the appropriate α-³²P-dNTP and the Klenow fragment of DNA polymerase I.
Uniform labeling of restriction fragments was achieved using the random primer method (Feinberg and Vogelstein, 1982).

Southern hybridization analysis

Genomic, cosmid, plasmid or λ DNA was digested with restriction enzymes, separated on agarose gels and blotted onto nylon membrane (Southern, 1975). The filters were prehybridized at an appropriate temperature in 5 x Denhardt's solution (1 x Denhardt's is 0.2% BSA, 0.2% Ficoll, 0.2% polyvinyl pyrrolidone), 5 x SSPE (1 x SSPE is 0.18 M NaCl, 10 mM Na₃PO₄, 1 mM EDTA, pH 7.7) and 0.1% SDS for one hour prior to the addition of radioactive probe. Hybridization was carried out for 6-12 h. Filters were washed for 15 min, twice at 20°C in 2 x SSPE, 0.1% SDS and once each at the hybridization temperature in 1 x SSPE, 0.1% SDS, and 0.1 x SSPE, 0.1% SDS. The filters were then subjected to autoradiography.

Library construction

Genomic DNA from Ha. marismortui was partially digested with Sau3AI to yield fragments primarily in the 12-18 kb range. The Sau3AI fragment ends were partially filled with dATP and dGTP and ligated to XhoI arms of λGEM-11 which were partially filled with dTTP and dCTP. The recombinant molecules were packaged into phage particles and propagated in the E. coli host strain KW251. Plaque lifts on Hybond N filters (Amersham) were screened by hybridization with a radioactive probe as recommended by the supplier.

DNA sequencing

For sequencing, short fragments were subcloned into phagemids and sequenced directly. For larger fragments, phagemid clones containing overlapping unidirectional deletions were generated using Exonuclease III (Henikoff, 1987). Single stranded templates were generated using helper phage R408 and were purified as described (Sanger et al., 1977). Double stranded DNA templates were prepared according to Zhang et al. (1988).
The sequencing reactions were carried out using either the Klenow fragment of DNA polymerase I or T7 DNA polymerase.

RNA transcript mapping

Nuclease S1 protection was used to identify 5' or 3' mRNA transcript end sites (Dennis, 1985). Briefly, either 5' or 3' end-labelled restriction fragment probes (10^5-10^6 dpm) were hybridized to 5-10 µg of total RNA and the hybrids were digested with S1 nuclease. The DNA fragments protected from digestion by RNA were separated on 8% polyacrylamide-8 M urea sequencing gels and detected by autoradiography. Appropriate molecular length size standards were used to determine the sizes of protected fragments.

The 5' ends of mRNA transcripts were precisely located by primer extension analysis using specific 5' end-labelled oligonucleotide primers (Neuman, 1987). Molecular length markers were generated using the same 5' end-labelled oligonucleotides as primers in DNA sequencing reactions using appropriate template DNA.

Enzyme activity assay

Superoxide dismutase activity was assayed as previously described (Markland, 1977; May and Dennis, 1987). Protein was assayed by the method of Lowry as modified by Peterson (1977).

Sequence analysis

The relative rate test (Sarich and Wilson, 1973) was used to estimate the number of substitutions that have occurred in the sod and the slg genes of Hb. cutirubrum since their divergence from a common ancestral sequence. This is accomplished by comparison to an outgroup sequence. In this instance, the sod1 sequence of Hf. volcanii and the sod sequence of Ha. marismortui were used in two separate tests. The test involves solution of simultaneous equations:

$$AO = \frac{AB + AC - BC}{2}$$

$$BO = \frac{AB + BC - AC}{2}$$

$$CO = \frac{AC + BC - AB}{2}$$

where A, B and C represent the two related and the outgroup gene sequences and AB, AC and BC are the number of nucleotide differences in the pairwise comparisons of the three sequences.
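The three simultaneous equations above can be evaluated directly once the pairwise difference counts are known. The following Python sketch is illustrative only: the function names, the toy sequences and the example counts are hypothetical and are not values from this study.

```python
def pairwise_differences(x, y):
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(x, y))


def relative_rate_test(ab, ac, bc):
    """Solve AO, BO and CO from the pairwise difference counts AB, AC and BC.

    A and B are the two related sequences, C is the outgroup and O is the
    common ancestor of A and B.
    """
    ao = (ab + ac - bc) / 2
    bo = (ab + bc - ac) / 2
    co = (ac + bc - ab) / 2
    return ao, bo, co


# Hypothetical difference counts, chosen only to show the arithmetic.
ao, bo, co = relative_rate_test(ab=10, ac=14, bc=12)
print(ao, bo, co)

# The same calculation starting from toy aligned sequences (invented data).
seq_a = "ACGTACGTAC"
seq_b = "ACGTACGTTT"
seq_c = "TCGAACGTAC"
print(relative_rate_test(
    pairwise_differences(seq_a, seq_b),
    pairwise_differences(seq_a, seq_c),
    pairwise_differences(seq_b, seq_c),
))
```

A useful consistency check is that the branch lengths add back to the observed counts: AO + BO = AB, AO + CO = AC and BO + CO = BC.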
The calculated values of AO and BO estimate the number of substitutions that have occurred in sequence A and sequence B since their divergence from the common ancestral sequence O. The value of CO represents the number of substitutions separating sequence O from sequence C. These relationships are depicted graphically in Figure 8.

Parsimony analysis was carried out using the PAUP analysis package devised by David Swofford (Illinois Natural History Survey, Champaign, IL, U.S.A.). Neighbour joining analysis was carried out using the application from the Clustal V analysis package by Des Higgins (European Molecular Biology Laboratory, Heidelberg, Germany). The actual operation of the software was performed by Dan Fieldhouse at York University on a Sun Sparc Workstation. For bootstrap resampling (Felsenstein, 1988) 100 and 1000 repetitions, respectively, were carried out in the PAUP and Neighbour joining analysis.

CHAPTER 1

Characterization of paralogous and orthologous members of the superoxide dismutase gene family from genera of the halophilic archaebacteria

1.1. INTRODUCTION

Evolution is driven by a plethora of molecular processes. For example, genome expansion occurs by duplication of genes or sequences; random mutational processes introduce variability into these sequences which ultimately either lead to fixation and diversification or to elimination. Two homologous genes are paralogous if they are derived from a duplication event and orthologous if they are derived from a speciation event. Homologous sequences (both orthologous and paralogous) can also participate in recombination and/or gene conversion events which have the effect of maintaining homogeneity and minimizing apparent divergence. From a mechanistic point of view, charting the evolutionary diversification of paralogous and orthologous members of a gene family within a closely related group of organisms should be an informative process to visualize.
For this purpose, the superoxide dismutase (SOD) family of genes from four species representing three genera of halophilic archaebacteria has been characterized and compared.

Halophilic archaebacteria are a group of aerobic or microaerobic organisms that evolved from a strictly anaerobic and non-halophilic methanogen ancestor (Woese, 1987). The adaptation to high salt environments was achieved by raising the intracellular salt concentration to near saturation and altering macromolecular structure to function at high salinity (Lanyi, 1979). (Halophilic species of eubacteria are unrelated and fundamentally different from halophilic archaebacteria; they utilize active pumping mechanisms to maintain a low intracellular salt concentration and, because of this fundamental difference, are not relevant to the present study.) The additional adaptation of halophilic archaebacteria to an aerobic metabolism almost certainly has enhanced the importance of protective enzymes such as superoxide dismutase which are used to dissipate highly reactive oxygen containing radicals. The SOD enzyme dissipates the highly reactive superoxide radical (O₂⁻) by catalyzing the dismutation to peroxide (O₂²⁻) and diatomic oxygen (O₂) (McCord and Fridovich, 1969). In eubacteria, the response to oxidative stress is a complex and highly regulated process that only recently has begun to be analyzed (Demple and Amabile-Cuevas, 1991; Fee, 1991).

The genus Halobacterium, represented by Hb. cutirubrum and Hb. sp. GRB, is distinguished from other halobacteria by production of transmembrane proton and chloride pumping proteins: bacteriorhodopsin and halorhodopsin. (Hb. cutirubrum and its immediate relative Hb. halobium are independent isolates of a single species, Hb. salinarium; Hb. sp. GRB is a separate species; Ebert, 1984). The other two organisms examined in this study are Haloferax volcanii (Mullakhanbhai and Larsen, 1975) and Haloarcula marismortui (Oren et al., 1988).

The genome of Hb.
cutirubrum contains two paralogous genes designated sod and slg (superoxide dismutase like gene; May and Dennis, 1990). The sod gene encodes a protein with SOD activity; this protein has been purified to near homogeneity and has been well characterized (May and Dennis, 1987). The slg gene is actively transcribed but its protein product has not been identified. The regulation and the pattern of nucleotide sequence differences between the two genes are remarkable. The level of both sod gene mRNA and SOD protein is elevated in the presence of paraquat, a generator of superoxide anions, whereas the slg gene mRNA is not affected. At the nucleotide level, the two genes exhibit 87% sequence identity whereas the proteins they encode exhibit only 83% amino acid identity. The distribution of mutations is nearly even between first, second, and third codon positions and the majority of nucleotide substitutions cause amino acid replacement in the proteins. Transversions outnumber transitions. The 5' and 3' flanking regions of the two genes exhibit no sequence similarity. It is presumed that oxygen toxicity has played an important selective role in the divergence between these two paralogous genes and the orthologous sod genes of related halophilic species.

In this study, genes of the superoxide dismutase family from three related halophilic genera have been cloned and sequenced. For each gene, the putative transcription start and stop sites have been determined and the regulatory response of each gene to paraquat has been characterized. In the accompanying paper, the nucleic acid and protein sequences of the superoxide dismutase family are subjected to a detailed phylogenetic analysis.

1.2. LITERATURE CLARIFICATION AND NOMENCLATURE

As indicated above, Hb. cutirubrum and Hb. halobium are independent isolates of a single species, Hb. salinarium. Salin et al. (1988) have published the sequence of a putative "sod" gene from Hb. halobium.
Examination of this sequence indicates that it corresponds to the slg gene of Hb. cutirubrum (i) since it is virtually identical in sequence both in the coding and flanking regions and (ii) since the predicted protein it encodes differs at eight of 26 positions from the N terminal sequence of the purified SOD protein of Hb. halobium and Hb. cutirubrum (Salin et al., 1988; May and Dennis, 1989). Furthermore, the published nucleotide sequence of this putative "sod" gene appears to contain a number of errors. The first is a reading frame shift resulting from a deletion of a G residue at position 271 and an insertion of a C residue at position 368 (all numbering according to Figure 3). The second is a dinucleotide inversion at position 530-531 that generates an ACG Thr codon in place of the AGC serine codon. The predicted amino acid sequence in the region of the frameshift exhibits no detectable similarity to other SOD proteins from either eubacteria or archaebacteria, whereas the corrected sequence exhibits 46% identity to the SOD protein from Bacillus stearothermophilus and 91% and 100% identity to the SOD and SLG proteins of Hb. cutirubrum. None of the other genes from the halophilic sod family exhibits this reading frame shift. The serine codon at position 529-531 overlaps an AluI (AGCT) restriction site in the DNA; the presence of this site has been confirmed by restriction analysis (data not shown). Takao et al. (1989) have also characterized this region in Hb. halobium. They, too, designate the gene as sod, although it shows 100% identity, within both the coding and flanking regions, to the slg gene. Therefore, the genes that have been cloned from Hb. halobium do not encode the authentic SOD protein that has been purified and extensively characterized (Salin and Oesterhelt, 1988; May and Dennis, 1987). Rather, they encode the related SLG protein. The sequence as published by Salin et al. (1988) almost certainly contains the three errors as indicated above.

1.3. RESULTS AND DISCUSSION

1.3.1. Gene Isolation

The sod gene of Hb. cutirubrum is 600 nucleotides in length and contained completely within a 1127 bp genomic Sau3AI restriction fragment. This fragment is located within a larger 2.8 kbp PstI genomic fragment; the related but unlinked slg gene is contained within a 1.7 kbp PstI genomic fragment. The 1.1 kbp Sau3AI fragment was used to probe a PstI digest of genomic DNA from Hb. cutirubrum and Hb. sp. GRB (Figure 2A). The probe exhibited intense hybridization to a single 2.8 kbp fragment in both digests and less intense hybridization to 1.7 and 1.6 kbp fragments in the Hb. cutirubrum and Hb. sp. GRB digests, respectively. The larger fragments contain the orthologous sod genes and the smaller fragments contain the orthologous slg genes. A smaller 1.1 kbp Sau3AI fragment that encoded the entire sod gene of Hb. sp. GRB was contained within the 2.8 kbp PstI fragment. The 1.1 kbp Sau3AI and 1.6 kbp PstI fragments, containing the sod and slg genes from Hb. sp. GRB, were cloned into the BamHI site of pGEM7zf(+) and the PstI site of pGEM5zf(+) to give pPD 1041 and pPD 1042, respectively.

An ordered library of overlapping cosmids representing more than 95% of the Hf. volcanii genome has been constructed (Charlebois et al., 1990). The 1.1 kbp Sau3AI fragment from Hb. cutirubrum that contains the sod gene was used to probe this library. Two non-overlapping cosmids, 564 and 461, that hybridized to the probe fragment were identified. (The latter is a derivative of cosmid B56; Schalkwyk, L., personal communication.) Genomic DNA from Hf. volcanii and the DNAs from these two cosmids were digested with Sau3AI and reprobed with the 1.1 kbp sod fragment from Hb. cutirubrum (Fig. 2B). The probe exhibited equally intense hybridization to fragments of 4.7 and 2.7 kbp in length. The larger fragment, contained within cosmid 461, mapped to pHV4, a 690 kbp megaplasmid of Hf. volcanii. The smaller fragment, contained within cosmid 564, mapped to the main 2.4 Mbp chromosome. The 4.7 and 2.7 kbp Sau3AI fragments were cloned into the BamHI site of plasmid pUC8 and pEMBL18(+), respectively, to give pPD 1038 and pPD 1039.

Figure 2: Identification of sod-like sequences by Southern hybridization. Restriction enzyme digests of genomic or cosmid DNAs were probed using the 1.1 kbp Sau3AI fragment containing the authentic sod gene from Hb. cutirubrum. A: Genomic DNA was obtained from Hb. cutirubrum (Hcu) and Hb. sp. GRB (GRB), digested with PstI and probed at intermediate stringency with radioactive probe. B: Genomic DNA from Hf. volcanii and from the Hf. volcanii cosmid clones 564 and 461 were digested with Sau3AI and probed at intermediate stringency. C: Genomic DNA from Ha. marismortui (Hma) was digested with a number of different restriction enzymes and probed at intermediate and high stringency. The enzymes used are indicated above each lane.

Genomic DNA from Ha. marismortui was digested with a number of different restriction enzymes. The fragments were separated and probed by Southern hybridization with the sod containing 1.1 kbp Sau3AI fragment from Hb. cutirubrum (Fig. 2C). In all digests, the probe hybridized to only a single restriction fragment; this implies that Ha. marismortui, unlike the other halophilic species examined, contains only a single gene belonging to the superoxide dismutase family. A partial Sau3AI genomic library of Ha. marismortui was constructed in λGEM-11. Three thousand plaques from the library, representing approximately 10 genome equivalents of DNA, were screened by hybridization to the 1.1 kbp Sau3AI fragment from Hb. cutirubrum. Six positive plaques were identified. The insert DNAs in the six clones were shown to be overlapping and all contain an identical 1.9 kbp Sau3AI fragment that hybridized to the Hb. cutirubrum probe. When the filter containing the genomic digests (illustrated in Figure 2C) was stripped and reprobed with the 1.9 kbp Ha.
marismortui Sau3AI fragment at low as well as high stringency, an identical pattern of hybridization was observed. This confirms the existence of only a single gene of the sod family in the genome of Ha. marismortui. The 1.9 kbp Sau3AI fragment was cloned into the BamHI site of the vector pGEM7zf(+) to give plasmid pPD 1040.

1.3.2. Sequence determination and alignment

The nucleotide sequences of the sod family of paralogous gene pairs from Hb. sp. GRB and Hf. volcanii and the single gene from Ha. marismortui were determined. The sequences of the coding regions of these genes along with the Hb. cutirubrum sod and slg sequences are aligned in Figure 3. Only positions that differ from the well characterized Hb. cutirubrum sod sequence are indicated; capitals indicate non-synonymous codon changes resulting in amino acid replacement and lower case indicates synonymous codon changes. Five of the genes are 600 nucleotides in length and encode 200 amino acid long proteins. The sod2 gene of Hf. volcanii contains a three nucleotide long deletion which removes codon three and the Ha. marismortui sod gene contains a nine nucleotide insertion after codon four. A cursory examination of the aligned gene sequences and the proteins they encode indicates that an unexpectedly large proportion of the nucleotide substitutions are non-synonymous and result in amino acid replacements in the respective proteins. Furthermore, the substitutions are often clustered; this is particularly evident between nucleotide positions 568 and 603. These features are examined in more detail in the next chapter.

Figure 3: Aligned nucleotide and amino acid sequences of the sod gene family and the proteins they encode. The nucleotide sequences of genes of the sod family are aligned to the Hb. cutirubrum sequence beginning at the ATG translation initiation codon. Species abbreviations are Hcu, Hb. cutirubrum; GRB, Hb. sp. GRB; Hvo, Hf. volcanii; Hma, Ha. marismortui.
Genes with demonstrable superoxide dismutase activity are designated sod and those with an unknown or undefined activity are designated slg. The corresponding proteins are SOD and SLG, respectively. Nucleotides identical to the Hb. cutirubrum sod sequence are represented by a dash (-); deleted nucleotides are represented by a dot (•). Similarly, amino acids identical to the Hb. cutirubrum SOD protein sequence are represented by a dash (-) and deleted amino acids by a dot (•). In the nucleotide alignments (excluding the translation initiation and termination codons), large letters represent non-synonymous substitutions that result in amino acid replacements and small letters represent synonymous substitutions which do not change the amino acid. The positions of restriction sites used in preparing probes for S1 nuclease protection experiments are overlined: AvaII, 514; XmaIII, 134; MspI, 188; EcoRI, 343 and BssHI, 447.

[Figure 3: sequence alignment]

As expected, the orthologous sod gene pair and slg gene pair from Hb. cutirubrum and Hb. sp. GRB are nearly identical; the pairs differed by three (positions 147, 150 and 308) and five (positions 80, 146, 147, 155 and 408) nucleotide substitutions, respectively. This implies that Hb. cutirubrum and Hb. sp. GRB are indeed closely related and that most of the divergence between the sod and the slg genes of Hb. cutirubrum predated the species separation of Hb. cutirubrum from Hb. sp. GRB. The paralogous sod1 and sod2 genes of Hf. volcanii are also nearly identical in sequence. They differ only by the deletion of codon three from the sod2 gene and by three nucleotide substitutions in codon 2 that convert one serine codon (TCA, sod1) to the unrelated serine codon (AGC, sod2).

The general pattern of serine codon utilization in the seven aligned sequences is noteworthy. Of the 20 positions where serine is represented two or more times, nine use both the TCN and AGY serine codons (e.g., see nucleotide positions 181-3 and 184-6). This implies that convergent evolution may have occurred at some or all of these positions (Brenner, 1988). At position 184-6, the TCN and AGY serine codons could have been derived by separate single nucleotide substitutions in an ancestral ACN threonine codon.
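The codon argument above can be checked mechanically. The Python sketch below is illustrative only and assumes the standard genetic code. It confirms that every TCN serine codon is at least two substitutions away from every AGY serine codon, and it lists the codons that lie one substitution from both members of an example pair. The helper names are hypothetical.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering (an assumption of this sketch).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def hamming(x, y):
    """Number of nucleotide positions at which two codons differ."""
    return sum(a != b for a, b in zip(x, y))


def single_step_intermediates(start, end):
    """Codons exactly one substitution away from both start and end (hypothetical helper)."""
    return {
        codon: aa
        for codon, aa in CODON_TABLE.items()
        if hamming(codon, start) == 1 and hamming(codon, end) == 1
    }


serine = [codon for codon, aa in CODON_TABLE.items() if aa == "S"]
tcn = [c for c in serine if c.startswith("TC")]  # TCT, TCC, TCA, TCG
agy = [c for c in serine if c.startswith("AG")]  # AGT, AGC

# Smallest number of substitutions separating the two serine codon families.
min_distance = min(hamming(x, y) for x in tcn for y in agy)
print(min_distance)

# Possible one-step intermediates between TCT and AGT.
print(single_step_intermediates("TCT", "AGT"))
```

For the TCT and AGT pair the only one-step intermediates are ACT (threonine) and TGT (cysteine), which is consistent with the suggestion that both serine codon types could descend from an ancestral threonine codon by separate single substitutions.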
The phylogenetic origin of the two serine codons at other positions is obscure. As an alternative to convergence, the substitutions at two adjacent nucleotide positions within the serine codon could be explained by mutagenic repair of a single UV-induced cyclobutane pyrimidine dimer (Hsia et al., 1989). That is, AGY-TCY interconversion is possible in a single mutagenic event. Of the eleven remaining positions using only a single serine codon type, four are positions where only two serines occur in the alignment, and these occur in genes with high sequence similarity (e.g., see nucleotide positions 112-4 and 538-40 where the paired serines occur in the orthologous slg genes of Hb. cutirubrum and Hb. sp. GRB and the paralogous sod1 and sod2 genes of Hf. volcanii, respectively).

The 5' and 3' sequences flanking the members of the sod gene family are illustrated in Fig. 4. The 5' flanking sequences are aligned at the ATG translation initiation codons and numbered as a continuation of the coding sequence in a negative direction. The 3' flanking sequences are aligned at the termination codons, TAA or TGA, and numbered as a continuation of the coding sequence in a positive direction. It has been previously noticed that the sequences flanking the paralogous sod and slg genes of Hb. cutirubrum are unrelated (May and Dennis, 1990). This is also true of the paralogous sod and slg genes of Hb. sp. GRB, as well as the virtually identical sod1 and sod2 genes of Hf. volcanii. In contrast, the flanking regions of the orthologous sod gene pair and the slg gene pair of Hb. cutirubrum and Hb. sp. GRB are virtually identical (Fig. 4). This supports the presumption that the initial duplication to produce the progenitor sod and slg genes in the common ancestor was an ancient event and that the species separation of Hb. cutirubrum and Hb. 
sp. GRB was much more recent.

Figure 4: Nucleotide sequences of the 5' and 3' flanking regions of the halophilic sod genes. Above, (A), the 5' flanking sequences of the sod family of genes are aligned beginning at the ATG translation initiation codon (position +1) and are numbered in a negative direction. The sequences are grouped according to the similarity of their coding regions. Identical nucleotides in the pairwise alignments of the paralogous genes are indicated by dots (•). The heavy under- and overlines indicate box A-like sequences and the arrowheads indicate prominent transcription initiation sites. Where present, complementarity to the 3' end of 16S rRNA (...AUCACCUCCU-OH) is indicated by a light under- or overline. Below, (B), the 3' flanking sequences are aligned beginning at the respective translation termination codons (positions 610-612). Again, identities in the pairwise alignments are indicated by dots (•). The T tract sequences located at or near the site of transcription termination are under- or overlined.

[Figure 4 sequence panels not reproduced.]

In the total of 527 nucleotides of 5' and 3' flanking sequence available for the two orthologous sod and slg gene pairs of Hb.
cutirubrum and Hb. sp. GRB, there are only two nucleotide differences; one is in the 5' flanking region of the sod gene pair at position -125 and the other is in the 3' flanking region of the slg gene pair at position 616 (Fig. 4).
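As a rough check on the comparison drawn in this passage between flanking and coding regions, the counts can be converted to differences per site. The sketch below uses the two flanking differences in 527 nucleotides given above and the eight coding substitutions in 1200 nucleotides reported for the same gene pairs (Figure 3). The function name is illustrative, and no correction for multiple substitutions at a site is attempted.

```python
def differences_per_site(differences, sites):
    """Observed nucleotide differences divided by the number of sites compared."""
    if sites <= 0:
        raise ValueError("sites must be positive")
    return differences / sites


flanking = differences_per_site(2, 527)    # 5' and 3' flanking regions
coding = differences_per_site(8, 1200)     # coding regions, same gene pairs
ratio = coding / flanking

print(f"flanking: {flanking:.4f} per site")   # flanking: 0.0038 per site
print(f"coding:   {coding:.4f} per site")     # coding:   0.0067 per site
print(f"coding / flanking: {ratio:.2f}")      # coding / flanking: 1.76
```

With such small counts the ratio of about 1.8 is best read as "roughly double" rather than as a precise estimate.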
Within the 1200 nucleotides of coding sequence for the same two gene pairs, there are eight nucleotide substitutions (Figure 3, positions 80, 146, 147, 147, 150, 155, 308, 408). From this somewhat limited data set it is apparent that the sod and slg coding sequences are still accumulating nucleotide differences, and the rate of accumulation in the coding regions is about double that observed for the non-coding flanking regions. At least half of the substitutions (4 of 7; position 147 is ambiguous) within the coding regions are non-synonymous and have resulted in amino acid replacements.

1.3.3. Transcript characterization and regulation by paraquat

The transcripts derived from the sod and slg genes of Hb. cutirubrum have been characterized by primer extension, S1 nuclease protection analysis and 7-methyl-G capping (May and Dennis, 1990). The sod transcript is initiated two nucleotides in front of the ATG translation initiation codon whereas the slg gene transcript is initiated 13 nucleotides in front of the initiation codon (Fig. 4, positions -2 and -13, respectively). The transcripts terminate in the poly T sequences that are centred 38 and 21 nucleotides beyond the respective termination codons of the sod and slg genes at positions 650 and 633, respectively (Fig. 4). Not surprisingly, the 5' and 3' end sites of the Hb. sp. GRB sod and slg gene transcripts were located at the same respective positions, and the expression of the sod gene, but not the slg gene, was enhanced by the addition of paraquat to the culture (Fig. 5A).

The location of the 5' and 3' end sites of the transcripts derived from the sod1 and sod2 genes of Hf. volcanii was more difficult to determine because of the perfect identity of the two sequences beyond position 9 of the coding region.
To locate the 5' transcript end sites, a 355 bp MspI fragment derived from the sod1 region and a partially homologous 302 bp MspI fragment derived from the sod2 region were used; both fragments overlap the respective initiation codons and terminate at the common MspI site at nucleotide position 188 within the coding regions. The two fragments were 5' end labelled on the (-) DNA strand at position 190 and hybridized to total Hf. volcanii RNA. Following digestion with S1 nuclease, the 355 bp sod1 probe yielded protected products of about 196 and 172 nucleotides in length (Fig. 5B). The longer product corresponds to protection by the homologous sod1 gene transcript and has a 5' end site near nucleotide position -15 in front of the ATG translation initiation codon. The shorter product corresponds to protection by the heterologous sod2 gene transcript and has a 5' end near nucleotide position 9; this is the boundary of sequence divergence between the sod1 and sod2 genes. The 302 bp probe from the sod2 gene yielded protection products of about 197 and 172 nucleotides in length. Again, the longer product results from protection of the probe by the homologous sod2 transcript with a 5' end site at or near position -19 in front of the ATG translation initiation codon. The shorter product results from protection by the heterologous sod1 gene transcript and has a 5' end site near position 9 within the coding region. Finally, treatment of the bacterial culture with paraquat results in an increase in the amount of sod1 gene transcript relative to the amount of sod2 gene transcript. For this regulatory feature, the sod1 and sod2 genes of Hf. volcanii are related respectively to the sod and slg genes of Hb. cutirubrum.

Figure 5: Nuclease S1 and primer extension mapping of the 5' and 3' ends of transcripts from the sod family of genes. Lane abbreviations are: F, end labeled S1 nuclease protection probe; 1, S1 or primer extension protection products generated using exponential RNA; 2, S1 or primer extension products generated using RNA from paraquat treated cells; M, molecular length markers; G, A, T and C, the products of the four DNA chain termination sequencing reactions.

A: Total RNA was prepared from exponential and paraquat treated cultures of Hb. sp. GRB. (i) For extension analysis, primers complementary to position 236-219 of the sod gene and 140-121 of the slg gene were used. (ii) For 3' end mapping of the sod gene transcript, a 462 bp AvaII-EcoRI fragment was 3' end labelled at position 517 in the (-) DNA strand and hybridized to RNA from exponential and paraquat treated cultures. Because of their size, the labelled probes do not appear within the autoradiogram window.

B: Total cellular RNA was prepared from exponential and paraquat treated Hf. volcanii cultures. (i) For 5' mapping, partially identical 355 and 302 bp MspI fragments from the sod1 and sod2 genes were 5' end labelled in the (-) strand at position 190 and used as probes in S1 nuclease protection assays. (ii) For 3' end mapping, partially identical 514 bp EcoRI and 1.2 kbp EcoRI-BamHI fragments from the sod1 and sod2 genes were 3' end labelled in the (-) strand at position 346-347 and used as probes in S1 nuclease protection assays. (iii) For primer extension, the sod1 and sod2 specific primers were used to generate reverse transcription products and to generate the corresponding DNA sequence ladders. The sequence of the sod1 and sod2 genes overlapping the ATG translation initiation codons is presented under the autoradiogram and the regions of primer complementarity and positions of extension product termination are illustrated.

C: Total RNA was prepared from exponential and paraquat treated Ha. marismortui cultures. (i) For 5' end mapping, plasmid pPD1040 was digested with BamHI, 5' end labelled in the (-) strand at position 138 and used as probe for S1 nuclease protection assays. (ii) For 3' end mapping, a BssHI-BamHI fragment from plasmid pPD1040 was 3' end labeled in the (-) strand at position 450 and used as probe in S1 nuclease protection assays. This fragment contains the 3' terminal portion of the insert and the entire vector of plasmid pPD1040. Because of their size, the 5' and 3' end labeled probes do not appear within the window of the autoradiogram. (iii) For primer extension, a Ha. marismortui specific oligonucleotide primer complementary to the (+) strand DNA at position 106 to 88 was used to generate extension products and a DNA sequence ladder.

To confirm the above results, primer extension assays were carried out using separate primers capable of discriminating between the sod1 and sod2 gene transcripts (Fig. 5Biii). The sod1 specific 18 mer oligonucleotide primer was complementary to the (+) strand of the sod1 gene between nucleotides 29 and 2 (numbering according to Fig. 3); extension products terminate at nucleotide positions -15 and -16. The sod2 specific 18 mer oligonucleotide was complementary to the (+) strand of the sod2 gene between nucleotides 28 and -3; the major extension products with this primer terminate at nucleotide positions -19 and -20. Again, paraquat treatment serves to enhance the amount of the sod1 gene transcript as assayed by the abundance of extension product.

The 5' end of the transcript from the single sod gene in Ha. marismortui was located using plasmid pPD1040 linearized at the XmaIII site and 5' end labeled in the (-) DNA strand at position 138 of the coding sequence (Fig. 5C). The fragment protected from S1 nuclease digestion was about 153 nucleotides in length and corresponds to a 5' transcript end site at or near position -15 in the 5' flanking sequence. This position was confirmed by primer extension using an 18 mer oligonucleotide complementary to position 106 to 89 within the coding region.
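The conversion from protected fragment length to a 5' end coordinate for the Ha. marismortui transcript can be written out explicitly. The following is a minimal sketch, assuming that the labelled nucleotide itself lies at the stated position, that numbering is contiguous between the label and the transcript end (no alignment gaps), and that, as in Fig. 4, position +1 is followed upstream by -1 with no position 0. The function name is illustrative.

```python
def five_prime_end(label_position, protected_length):
    """Coordinate of the 5' transcript end implied by an S1 protection product.

    label_position:   coding-sequence coordinate of the 5' end-labelled nucleotide
    protected_length: length in nucleotides of the protected fragment
    Coordinates run ..., -2, -1, +1, +2, ... (there is no position 0).
    """
    if label_position < 1 or protected_length < 1:
        raise ValueError("positions and lengths must be positive")
    if protected_length <= label_position:
        # The fragment ends inside the coding region.
        return label_position - protected_length + 1
    # The fragment extends past +1 into the 5' flanking sequence.
    return -(protected_length - label_position)


# Ha. marismortui sod: label at position 138, protected fragment of about 153 nt.
print(five_prime_end(138, 153))   # -15
```

Because the fragment length is only known to within a few nucleotides, the result should be read as "at or near" the computed position, as stated in the text.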
Surprisingly, paraquat treatment did not seem to enhance the amount of mRNA; this implies that the regulation of the Ha. marismortui sod gene is similar to that of the sod2 gene of Hf. volcanii and the slg genes of Hb. cutirubrum and Hb. sp. GRB. The response of SOD enzyme activity to paraquat was also measured in Ha. marismortui. The activity was 3.0 units per mg of protein in the absence of paraquat and 3.2 units per mg of protein after 24 hrs of paraquat treatment. Thus, in Ha. marismortui, paraquat is apparently not an inducer of either sod mRNA or SOD enzyme activity.

The 3' end sites of transcripts from the sod family of genes from Hb. sp. GRB, Hf. volcanii and Ha. marismortui were located by S1 nuclease protection of 3' end labeled restriction fragment probes originating within the coding sequences and terminating in the 3' flanking regions. In all cases, 3' transcript end sites were located near or within T tract sequences ranging in length from 5 to 7 nucleotides. For most of the genes, these sequences are within 40 nucleotides of the translation termination codon; for the sod1 gene of Hf. volcanii, the T tract is located at position 749, about 140 nucleotides from the translation termination codon (see Fig. 4). Examination of the coding regions indicates that there are no tracts of three or more T residues in the (+) strand sequence. This absence may be a reflection of the role T tracts play in the process of transcription termination.

The results of the S1 nuclease protection and primer extension analysis indicate that all seven members of the sod gene family in halophilic archaebacteria that have been analyzed are transcribed almost exclusively as monocistronic mRNAs. These mRNAs range in length from about 630 to 750 nucleotides. This result was independently confirmed by northern hybridization using gene specific restriction fragments as probes (data not shown). It has previously been reported that the photolyase gene of Hb. halobium (and Hb. 
cutirubrum) is positioned immediately upstream of the slg gene and that photolyase transcripts extend into the slg gene to produce a long bicistronic transcript (Takao et al., 1989; May and Dennis, 1990). Less than 5 percent of the slg gene transcripts are present in this larger species. A sequence related to the 3' end of the photolyase gene was also detected in front of the slg gene of Hb. sp. GRB but it has not been examined in detail. A related sequence has not been found in the region upstream of any of the sod genes of Hf. volcanii or Ha. marismortui.

1.3.4. Transcription and translation signals

Most archaebacterial promoters contain a moderately conserved hexanucleotide box A sequence (consensus TTTAWA) centred about 25 nucleotides in front of the transcription initiation site (Reiter et al., 1990). In addition, some promoters, including the rRNA promoters of halophilic archaebacteria, contain a box B (consensus AYGCGAA) that overlaps the start site (Dennis, 1985). The promoters of the halophilic sod genes seem to retain the box A-like sequence but apparently lack the box B-like sequence (see Fig. 4).

The precise mechanism for identifying the AUG translation initiation codon in the mRNA of halophilic archaebacteria has not yet been analyzed in detail. The transcripts from the sod family of genes are all predominantly or exclusively monocistronic and all contain relatively short 5' untranslated leader sequences of 2-19 nucleotides (Table 2). In all seven transcripts, the first AUG is utilized for translation initiation. This is a feature that is characteristic of most eucaryotic mRNAs (Kozak, 1978).

Translation initiation in eubacteria is facilitated by the presence of a purine-rich sequence centred about ten nucleotides in front of the translation initiation codon. This sequence forms a complementary structure with the highly conserved pyrimidine-rich sequence at the 3' end of 16S rRNA (Shine and Dalgarno, 1974). In these organisms, this 16S sequence is 5'...AUCACCUCCU-OH.
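The complementarity referred to here can be illustrated by deriving the mRNA sequence that would pair with the 3' end of 16S rRNA and testing the ribosome binding motifs listed in Table 2 against it. This is a minimal sketch that considers Watson-Crick pairs only (G-U pairs are ignored) and uses illustrative function names; it is not the procedure used in the original analysis.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# 3' terminal sequence of 16S rRNA, written 5' to 3' as in the text.
RRNA_3P_END = "AUCACCUCCU"


def reverse_complement(rna):
    """Reverse complement of an RNA sequence (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


# mRNA sequence (5' to 3') that would pair perfectly with the rRNA 3' end.
ANTI_3P_END = reverse_complement(RRNA_3P_END)


def pairs_with_rrna(motif):
    """True if the motif (DNA or RNA letters) occurs within the pairing sequence."""
    return motif.upper().replace("T", "U") in ANTI_3P_END


print(ANTI_3P_END)                # AGGAGGUGAU
for motif in ("GGAGG", "GGA", "GA"):
    print(motif, pairs_with_rrna(motif))
```

All three motifs are contained in the pairing sequence, but a two- or three-nucleotide match such as GA or GGA is expected frequently by chance, so only the five-nucleotide GGAGG match carries much weight on its own.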
The leaders of the Hf. volcanii sod2 and Ha. marismortui sod gene transcripts contain the complementary ribosome binding sequence GGAGG; the Hb. cutirubrum and Hb. sp. GRB slg gene transcripts contain only the three-nucleotide-long GGA complementary sequence. The Hf. volcanii sod1 gene transcript has only a GA complementary sequence and the Hb. cutirubrum and Hb. sp. GRB sod gene transcripts have leaders too short to exhibit complementarity (Table 2). These observations suggest that halophilic archaebacteria may use one or a combination of both the eucaryotic scanning and eubacterial 16S rRNA complementarity mechanisms to initiate translation.

Table 2: Transcripts from the sod family of genes.

Organism          Gene   5' leader(1)   3' trailer(1)   RBS(2)   Response to paraquat(3)
Hb. cutirubrum    sod    2 n            36 n            -        +
                  slg    13 n           18 n            GGA      -
Hb. sp. GRB       sod    2 n            36 n            -        +
                  slg    13 n           18 n            GGA      -
Hf. volcanii      sod1   15 n           140 n           GA       +
                  sod2   19 n           18 n            GGAGG    -
Ha. marismortui   sod    15 n           17 n            GGAGG    -

(1) The approximate length in nucleotides (n) of the 5' leader and 3' trailer of the respective mRNA transcripts was determined by S1 nuclease protection and/or primer extension analysis. The Hb. cutirubrum sod transcript has been shown to possess a 5' triphosphate end (May and Dennis, 1989).
(2) The ribosome binding site (RBS) within the transcript 5' leaders complementary to the pyrimidine sequence at the 3' end of the 16S rRNA is indicated.
(3) Genes that exhibit elevated levels of mRNA and SOD enzyme activity in the presence of paraquat are indicated by (+); unaffected genes are indicated by (-).

1.3.5. Regulation by paraquat

The drug paraquat is a generator of superoxide anions and is known to reduce the growth rate and cause a selective induction of SOD activity in Hb. cutirubrum (Hassan and Fridovich, 1977; May and Dennis, 1987). Primer extension, S1 nuclease protection and northern hybridization experiments demonstrate that the amount of sod mRNA increases concomitantly with the increase in enzyme activity (May and Dennis, 1989).
The four- to five-fold increase is gradual and occurs over a period of 24 hours. Presumably, the increase in the amount of sod mRNA results from increased transcription initiation, although effects on mRNA stability cannot be completely ruled out. The relatively slow increase in sod mRNA suggests that neither paraquat nor superoxide radicals are the direct inducers; rather, the induction is more likely to result from the indirect accumulation of chemical damage caused by the increase in superoxide radical concentration. Surprisingly, the amount of the slg gene mRNA was not increased in the presence of paraquat.

When examined in the presence of paraquat, the expression of the sod gene of Hb. sp. GRB and the sod1 gene of Hf. volcanii was increased whereas the expression of the slg gene of Hb. sp. GRB, the sod2 gene of Hf. volcanii, and the sod gene of Ha. marismortui was not affected (see above and Table 2). If the property of inducibility had a single ancestral origin for these genes and if it were mediated at the level of transcription initiation, one might expect to see conservation of a regulatory sequence element in the 5' flanking regions of the regulated genes of Hb. cutirubrum (or Hb. sp. GRB) and Hf. volcanii. When these regions were compared, no similarity was apparent even when gaps were introduced into the alignments. This implies either that the property of paraquat inducibility does not reside in the 5' flanking region or that the regulatory element is subtle and not easily detectable. If paraquat regulation is mediated through mRNA stabilization, one might expect to see unique sequence or structural elements in the 3' flanking region of the mRNAs; these 3' regions exhibit neither sequence similarity nor distinctive structural features.

The question of ancestral relationship between the 5' (and 3') flanking regions of the paraquat responsive Hb. cutirubrum sod gene and the paraquat responsive Hf. volcanii sod1 gene remains uncertain.
Either they are related, and nucleotide substitutions, insertions, and deletions have long since obliterated any significant sequence similarity, or they are not related. The latter might imply that at least one of the two genes was generated by reverse transcription of mRNA and reinsertion into the genome at a new location. This would not explain the common pattern of regulation by paraquat, however.

When the 5' flanking regions of the genes not induced by paraquat were examined, a substantial amount of hyphenated sequence similarity was detected. In Figure 6, the identical 5' flanking regions of the Hb. cutirubrum and Hb. sp. GRB slg genes are aligned to the corresponding regions of the Hf. volcanii sod2 gene and the Ha. marismortui sod gene. In the first 55 nucleotides (position -1 to -55), the pairwise sequences are between 54 and 61 percent identical, and 40 percent of the nucleotides are conserved in all three sequences. The alignments are interrupted only by the insertion of a single A residue into the Ha. marismortui sequence at position -34. Notably, two of the blocks of conservation within this region are the box A promoter element (-44 to -39) and the potential ribosome binding sequence (-14 to -10). The significance of other conserved nucleotides is not known.

Figure 6: Alignment of the 5' flanking regions of the non-inducible sod family of genes. The 5' flanking regions of the non-inducible Hb. cutirubrum slg, Hb. sp. GRB slg, Hf. volcanii sod2 and Ha. marismortui sod genes are aligned with the ATG translation initiation codons. The Hb. cutirubrum and Hb. sp. GRB sequences are identical; nucleotide identities between the sequences are indicated by dots (•) and the single gap required to maintain alignment is indicated by a dash (-). Upper case in the consensus indicates perfect conservation whereas lower case indicates conservation in only two of the three sequences. At the bottom, the 5' flanking region of the inducible sodA gene of E. coli (Takeda and Avila, 1986) is illustrated. Identities with the Ha. marismortui sequence are indicated by dots and gaps required to maintain the alignment are indicated by dashes. The E. coli sequence exhibits slightly fewer identities with the other halophilic sequences.

[Figure 6 alignment panel not reproduced.]

The downstream coding region for these sequences (beginning at position +1) exhibits about 77% nucleotide sequence identity in the same pairwise comparisons, whereas the 3' flanking regions beyond the translation termination codons exhibit no significant homology outside of the T tract termination sequence. Similarly, the sequences upstream of position -55 exhibit no significant similarity. The translation termination codon of the upstream photolyase gene in Hb. cutirubrum and Hb. sp. GRB occurs at position -67; the equivalent gene is not present at the corresponding position in the Hf. volcanii and Ha. marismortui sequences. These comparisons suggest that the promoter regions (defined by the 55 nucleotides of 5' flanking sequence) of the slg gene of Hb. cutirubrum and Hb. sp. GRB, the sod2 gene of Hf. volcanii and the sod gene of Ha. 
marismortui are related and that sequence divergence within this region is only about two-fold more rapid than within the coding regions of the corresponding genes. Unfortunately, these comparisons shed little light on the differences between paraquat inducible and non-inducible genes and the mechanisms of induction.

The 5' flanking region of the non-inducible sod gene of Ha. marismortui was used to search the GenBank database; the highest similarity detected was to the 5' flanking region of the Escherichia coli sodA gene (Takeda and Avila, 1986). This is surprising for a non-coding region given (i) the evolutionary distance between eubacteria and halophilic archaebacteria and (ii) the fact that the E. coli sodA gene is inducible by paraquat. The 5' flanking regions of other eubacterial sod genes fail to exhibit this degree of sequence similarity. The significance of this limited sequence similarity is uncertain.

In only one case, Hf. volcanii, has the genomic context of the paralogous gene pairs been examined. The uninducible sod2 gene is linked to the rrnA operon on the 2.4 mbp chromosome whereas the inducible sod1 gene was mapped to the 690 kbp megaplasmid. As yet, no other genes have been mapped to this megaplasmid (Charlebois et al., 1991). Perfect identity is exhibited between the sod1 and sod2 genes beginning at codon four and extending to the translation termination codon at the end of the genes. This would seem to imply that the sequences of the two genes are being maintained by recombination or gene conversion mechanisms.

1.4. SUMMARY

Four species, representing three genera of halophilic archaebacteria, were examined for the presence of genomic sequences that encode proteins of the superoxide dismutase (SOD) family. Three species, Halobacterium cutirubrum, Halobacterium sp. GRB, and Haloferax volcanii, contain duplicated (paralogous) genes of the sod family; a fourth species, Haloarcula marismortui, contains only a single gene.
These seven genes were cloned and sequenced, and their transcripts were characterized by northern hybridization, S1 nuclease protection, and primer extension.

The expression of one of the two genes in Hb. cutirubrum, Hb. sp. GRB and Hf. volcanii was shown to be elevated in the presence of paraquat, a generator of superoxide radicals. The other genes, including the single gene from Ha. marismortui, exhibited no elevated expression in the presence of paraquat.

The 5' and 3' flanking regions of all the genes contain recognizable promoter and terminator elements that are appropriately positioned relative to the 5' and 3' transcript end sites.

Between genera, the orthologous paraquat responsive genes exhibit no sequence similarity in either their 5' or 3' flanking regions whereas the orthologous non-responsive genes exhibit limited sequence similarity but only in the 5' flanking region. Within the coding region, the two paralogous genes of Hf. volcanii are virtually identical (99.5%) in spite of the absence of similarity in the flanking regions. In contrast, the paralogous genes of Hb. cutirubrum and Hb. sp. GRB are only about 87% identical. In the alignment of all seven sequences, there are nine codon positions where both the TCN and AGY serine codons are utilized; some or all of these may well be examples of convergent evolution.

CHAPTER 2

The family of superoxide dismutase proteins from halophilic archaebacteria: structure, function and evolution

2.1. INTRODUCTION

Evolution has produced two unrelated proteins with superoxide dismutase activity. One is a Cu/Zn enzyme and is confined almost exclusively to the cytoplasm of eucaryotic organisms. The second contains Fe or Mn as metal ion cofactor and is found in eubacteria, in the eubacterial derived eucaryotic mitochondria and in archaebacteria (Steinman, 1982; May and Dennis, 1987; Takao et al., 1991; Kirby et al., 1981).
These enzymes catalyze the dismutation of the reactive superoxide anion (O2-) to molecular oxygen and peroxide and thus protect organisms from chemical damage and inactivation (Fridovich, 1978, 1986a).

Halophilic archaebacteria are a collection of related organisms that evolved from an anaerobic methanogen ancestor and that now grow in nature either aerobically or microaerobically in high salt environments (Woese, 1987). Superoxide dismutase activity has been detected in at least two representatives of the ancestral methanogen group and in numerous halophilic species (Kirby et al., 1981; Takao et al., 1991). The SOD enzyme has been purified and extensively characterized from both Halobacterium cutirubrum and its immediate relative, Hb. halobium; the enzyme contains Mn as a metal ion cofactor and is related to the Mn or Fe type enzymes of eubacteria (May and Dennis, 1987; Salin and Oesterhelt, 1988).

The sod gene, encoding the authentic SOD protein, was cloned from the genome of Hb. cutirubrum and its expression was shown to be elevated in the presence of paraquat, a compound that generates superoxide radicals (May and Dennis, 1989; May et al., 1989). In addition, a second related gene, slg, was also cloned and characterized. Although the slg gene is actively transcribed, its mRNA is not elevated by paraquat and as yet no enzymatic activity has been associated with the protein it encodes (May and Dennis, 1990). The pattern of nucleotide sequence divergence between sod and slg is highly unusual; silent third codon position substitutions are relatively infrequent and most substitutions are consequently nonsynonymous, resulting in amino acid replacement in the respective proteins. This pattern suggests that the SLG protein and possibly also the SOD protein are under intense and perhaps fluctuating selection for new and divergent function.
This selection is almost certainly influenced by the concentration of the superoxide radical and possibly also by the composition and concentration of intra- and extracellular cations.

To extend this analysis, additional genes of the sod family have been cloned and characterized from other halophilic archaebacteria. In this chapter, the deduced halophilic protein sequences were compared to other Mn and Fe SODs, and residues that characterize and distinguish these halophilic SODs were identified.

2.2. RESULTS AND DISCUSSION

2.2.1. Amino acid and nucleotide sequence alignments

The deduced amino acid sequences of the SOD family of proteins from halophilic archaebacteria are aligned in Figure 7. To facilitate comparison with SOD sequences from other nonhalophilic organisms, the numbering system of Parker and Blake (1988) has been adopted. The three extra residues in the Haloarcula marismortui sequence near the amino terminus are accommodated between positions 2 and 3.

The most striking features of the halophilic SOD sequences are their high degree of sequence similarity and their high proportion of acidic amino acids. In pairwise comparison, the amino acid sequences are between 76 and 100% identical and 125 of the 199 common residues (62%) are conserved in all seven sequences (see the halophilic consensus sequence in Fig. 7). In contrast, any one of the seven halophilic proteins exhibits only about 35-40% sequence identity with eubacterial and eucaryotic mitochondrial SODs and about 40-45% identity to the SOD of Methanobacterium thermoautotrophicum. At the DNA level, 64 of the 125 codon positions specifying amino acids conserved in the halophilic SOD proteins use single unique triplets whereas the remaining 61 are degenerate (see chapter 1). Especially interesting is the use of both the TCN and AGY Ser codons at three positions (-1, 19 and 168) where Ser is conserved in all seven halophilic proteins. Also striking are the 74 positions of amino acid variability.
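The identity and conservation figures quoted in this paragraph are of the kind obtained by counting matching columns in an alignment. The following is a minimal sketch of such counts. The three short sequences are hypothetical and serve only to exercise the functions, the gap symbol "-" is an assumption of the sketch (in Figure 7 a dash denotes identity and an open circle denotes a deletion), and the function names are illustrative.

```python
def percent_identity(a, b, gap="-"):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        identical += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


def fully_conserved_columns(rows, gap="-"):
    """Number of columns in which every sequence carries the same residue."""
    return sum(
        1 for column in zip(*rows)
        if gap not in column and len(set(column)) == 1
    )


# Hypothetical toy alignment, not taken from Figure 7.
toy = ["MSEVELP-LPY",
       "MSDHELPALPY",
       "MSEHELP-LPY"]

print(percent_identity(toy[0], toy[1]))   # 80.0
print(fully_conserved_columns(toy))       # 8
```

Applied to the real alignment, counts of this kind correspond to the 125 conserved and 74 variable positions that together make up the 199 common residues.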
At nearly half of these positions, connection between the encoding triplets requires two or more nonsynonymous nucleotide substitutions. For example, at position 125 aspartic acid (GAC), serine (AGC and TCG) and alanine (GCG) all appear.

Figure 7: Amino acid sequence alignments of SOD proteins
The seven halophilic SOD sequences are aligned using the numbering system of Parker and Blake (1988). Abbreviations are: Hcu, Hb. cutirubrum; GRB, Hb. sp. GRB; Hvo, Hf. volcanii; Hma, Ha. marismortui; Mth, M. thermoautotrophicum; Eco, E. coli. The three extra residues near the amino terminus of the Ha. marismortui SOD are accommodated between positions 2 and 3. The Hb. cutirubrum sequence is given in its entirety; within the halophilic group, amino acids identical to the Hb. cutirubrum residues are indicated by a dash (-). The halophilic consensus sequence illustrates only residues conserved in all seven sequences; non-conserved residues are indicated by small dots. The halospecific signature residues are indicated by large dots above the halophile consensus sequence. Deletions required to maintain alignments are indicated by open circles (o). At the end of the alignment, the molecular weight, the number of acidic and basic amino acid residues, and the pI of the respective proteins, as determined from their primary sequences, are indicated. The M. thermoautotrophicum and E. coli sequences were obtained from Takao et al. (1991) and Takeda and Avila (1986).

[Alignment panel: Hcu SOD, GRB SOD, Hcu SLG, GRB SLG, Hvo SOD1, Hvo SOD2, Hma SOD, halophilic consensus, Mth Fe SOD, Eco Mn SOD.]

Protein     M.W.     Acidic   Basic   pI
Hcu SOD     22386    38       7       4.17
GRB SOD     22111    38       7       4.16
Hcu SLG     22209    36       6       4.17
GRB SLG     22122    35       6       4.17
Hvo SOD1    22185    39       6       4.17
Hvo SOD2    22352    36       8       4.20
Hma SOD     22771    41       9       4.16
Mth SOD     21066    35       23      3.26
Eco SOD     23012    25       23      6.78

2.2.2. Signature amino acid residues in SOD

Comparison of the X-ray diffraction structures of the Bacillus stearothermophilus Mn SOD and a number of eubacterial and mitochondrial Mn and Fe SOD sequences resulted in the identification of 39 positions of high sequence conservation that appeared to be important in the structure or activity of the protein (Parker and Blake, 1988). These residues are listed in Table 3A. Four of these positions, His26, His81, Asp175 and His179, are essential for binding the metal ion cofactor (Stallings et al., 1984). In addition, the residues His30, His31, Tyr34, Trp85, Trp133 and Tyr181 have been implicated in the formation of a hydrophobic pocket around the active site and in enzyme catalysis (Stallings et al., 1985). The remaining residues appear to have structural rather than functional roles. Of these 39 positions, 35 are conserved in the Fe SOD from M. thermoautotrophicum (Takao et al., 1991). In the halophilic SODs, only 26 of these positions are absolutely conserved; three, at residues 5, 39 and 131, are partially conserved; one, at position 130, is deleted; and the remaining nine are substituted with the same amino acid in all seven sequences. None of the eleven positions of partial or complete replacement affect residues implicated in formation of the active site or in enzyme catalysis.

The SODs from halophilic archaebacteria contain 30 unique and conserved amino acid residues that are not found in any other SODs at the corresponding alignment positions (Table 3B). Seven of these represent total replacement of eubacterial signature residues. In addition, there are nine positions where the single methanogenic and seven halophilic SODs exhibit a single amino acid not found in any known eubacterial SOD. These comparisons indicate in a qualitative way that the halophilic SOD proteins are almost as different from the SOD of their immediate methanogen relative as they are from the SODs of their more distant eubacterial cousins. Furthermore, the substitutions specific to the halophiles almost certainly have a substantial effect on the overall structure of the protein but leave the hydrophobic active site and the four metal ion binding residues intact.

Table 3: Signature amino acid residues in eubacterial and halophilic SODs
1. Amino acid positions in the protein alignments are presented in Figure 7. * indicates the four residues required for metal ion binding; ** indicates residues implicated in active site formation and/or catalysis.
2. The signature amino acid residues present in virtually all eubacterial (Eubact.) SODs are indicated (Parker and Blake, 1988; Smith and Doolittle, 1992).
3. Meth.: the single SOD sequence from the methanogenic archaebacterium M. thermoautotrophicum (Takao et al., 1991). Halo.: the seven halophilic archaebacterial SOD sequences. (+) denotes that the eubacterial signature residue is conserved in either the single methanogenic sequence or in all seven halophilic protein sequences. When the eubacterial signature residue is not conserved, the replacement(s) are indicated. The horizontal dashes connecting the Methanogen and Halophile columns highlight the same amino acid replacement in both archaebacterial groups. (o) denotes a gap (no amino acid) at this position in the protein alignment (see Figure 7).
4. The signature amino acid residues that occur in either halophilic archaebacterial or halophilic and methanogenic archaebacterial SOD proteins but not in any known eubacterial or mitochondrial SOD proteins are indicated. (V) denotes that the amino acid replacement in the eubacterial sequences is variable; where an amino acid is indicated in the eubacterial column, it represents a eubacterial signature residue (see Table 3A). (-) denotes that the signature residue in the seven halophilic SODs has been replaced in the methanogen sequence. The replacements can be identified in the alignment in Figure 7. The horizontal dashes connecting columns denote conservation of signature residues between the eubacterial and the methanogenic or between the methanogenic and the halophilic sequences. (o) denotes a gap (no amino acid) at this position in the protein alignment (see Figure 7).

A. Amino acid residues conserved in eubacterial SOD proteins

Position   Eubact.(2)   Meth.(3)    Halo.(3)
4          Leu          +           +
5          Pro          +           Pro/Asp
7          Leu          +           +
13         Ala          +           +
14         Leu          +           +
16         Pro          +           +
26 *       His          +           +
29         Lys          +           Thr
30 **      His          +           +
31 **      His          +           +
34 **      Tyr          +           +
35         Val          +           +
39         Asn          +           Asn/Gln
80         Asn          Leu         Leu
81 *       His          +           +
85 **      Trp          +           +
88         Leu          Met   ---   Met
101        Gly          +           +
106        Ala          Tyr         Arg
107        Ile          +           +
111        Phe          +           +
112        Gly          +           +
130        Gly          +           o
131        Ser          +           Ser/Ala/Gly
133 **     Trp          +           +
146        Leu          +           +
170        Pro          Ile         +
175 *      Asp          +           +
177        Trp          +           +
178        Glu          +           +
179 *      His          +           +
180        Ala          +           Ser
181 **     Tyr          +           +
182        Tyr          +           +
187        Asn          +           Pro
192        Tyr          +           Phe
197        Trp          +           Phe
201        Asn          +           Asp
202        Trp          +           +

B. Amino acid residues conserved in halophilic SOD proteins

Eubact.   Meth.(4)   Halo.(4)   Position
V                    Glu        20
V         Arg        Arg        24
V                    Trp        25
Lys       Lys        Thr        29
V                    Ala        46
V                    Asn        48
V         Arg        Arg        49
V         -          Ser        63
V         Ala        Ala        67
V                    Thr        72
V                    His        73
V         Leu        Leu        80
V         Met        Met        88
Ala       Tyr        Arg        106
V         -          Tyr        114
V                    Trp        117
V         Ala        Ala        128
V                    Leu        135
V         Tyr        Tyr        138
V                    Asn        148
V         Val        Val        151
V                    Asp        152
V         His        His        154
V                    Gly        157
V                    Ala        158
V                    Leu        159
V                    Trp        160
V                    Ser        168
V                    His        169
Ala       Ala        Ser        180
V         -          Gly        186
Asn       Asn        Pro        187
V         -          Gly        190
Tyr       Tyr        Phe        192
Trp       Trp        Phe        197
V                    Glu        198
Asn       Asn        Asp        201
o         o          Phe        216
o         o          Glu        217

Virtually all proteins from halophilic archaebacteria (in comparison to proteins from nonhalophilic organisms) exhibit a high content of acidic amino acids (Lanyi, 1974). These residues, usually located on the surface of the protein, are thought to sequester water molecules and create a hydration shell that allows the protein to function within the high ionic strength intracellular environment of the cell. For example, in the eubacterial SOD protein, the acidic and basic residues are equally prevalent whereas halophilic SODs contain more acidic residues and proportionately fewer basic residues.
The pIs of these seven halophilic SODs range from 4.16 to 4.20 (Figure 7). One might predict that of the 30 halophile-specific signature residues listed in Table 3B, a substantial proportion would involve replacement of Lys or Arg by either Asp or Glu. That is, many of these substitutions should reflect the general adaptation of the enzyme to function in high salt. Surprisingly, this appears not to be the case. The 30 halophile-specific substitutions increase the number of acidic residues by only three (positions 20, 152 and 201) and an additional acidic residue (position 217) is generated within the halophile specific extension at the carboxy terminus of the protein. Only position 20 involves a lysine/arginine (E. coli/M. thermoautotrophicum) replacement by glutamic acid in the halophilic proteins.

In about 70% of the alignment positions where acidic residues are found in one or more of the halophilic SODs, an acidic residue is also found in at least one other non-halophilic SOD. Of the remaining positions, about half exhibit a basic residue in at least one non-halophilic sequence. Thus, the high acidity of the halophilic SODs has been achieved by insertion of Asp and Glu residues most often at positions where charged residues can be tolerated in non-halophilic SODs without adversely affecting enzyme activity. These sites are presumed to be on the surface of the protein molecule and well removed from the site of catalysis.

2.2.3. Nucleotide sequence divergence

When orthologous or paralogous gene sequences initially begin to diverge during evolution, mutational differences usually accumulate most rapidly in the third codon position and most substitutions are synonymous. As a consequence, nucleotide identity between two gene sequences initially degenerates more rapidly than the amino acid identity between the two corresponding proteins. Only later, when a substantial number of first and second position changes accumulate, does the amino acid identity fall below the nucleotide identity.
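The contrast between nucleotide identity and amino acid identity described above can be made concrete with a short calculation. The following is a minimal sketch in Python and not the procedure used in this study: it assumes two aligned, gap-free coding sequences in the same reading frame and the standard genetic code. The three short sequences (ANCESTOR, EARLY, LATE) and the function name compare_coding are hypothetical and were chosen only for illustration.

```python
from itertools import product

# Standard genetic code, written in the conventional TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical 10-codon sequences, for illustration only.
ANCESTOR = "ATGGCTGAACTGCCGTACGACGCGCTCGAG"
EARLY = "ATGGCCGAACTCCCGTATGACGCGCTCGAA"  # silent third-position changes only
LATE = "ATGTCTGACCTGCAGTACAGCGCGCTCGAG"   # mostly first/second-position changes


def compare_coding(seq_a, seq_b):
    """Compare two aligned, gap-free coding sequences of equal length."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame and of equal length")
    by_position = [0, 0, 0]   # differences at codon positions 1, 2 and 3
    synonymous_codons = 0     # codons that differ but encode the same residue
    replacement_codons = 0    # codons that differ and change the residue
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue
        for k in range(3):
            if codon_a[k] != codon_b[k]:
                by_position[k] += 1
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous_codons += 1
        else:
            replacement_codons += 1
    n_sites = len(seq_a)
    n_codons = n_sites // 3
    return {
        "nucleotide_identity": 100 * (n_sites - sum(by_position)) / n_sites,
        "amino_acid_identity": 100 * (n_codons - replacement_codons) / n_codons,
        "by_codon_position": tuple(by_position),
        "synonymous_codons": synonymous_codons,
        "replacement_codons": replacement_codons,
    }


if __name__ == "__main__":
    for name, seq in (("EARLY", EARLY), ("LATE", LATE)):
        print(name, compare_coding(ANCESTOR, seq))
```

In the EARLY sequence all four differences fall at third codon positions and are silent, so the amino acid identity remains above the nucleotide identity. In the LATE sequence most differences fall at first and second codon positions and the amino acid identity drops below the nucleotide identity.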
Typical of this situation are the paralogous p-vac and c-vac genes of Hb. halobium and the orthologous c-vac gene of Haloferax mediterranei (Horne et al., 1988; Englert, 1990). For these, the DNA nucleotide identity is less than the amino acid identity and the vast majority of substitutions are synonymous and occur at the third codon position (Table 4).

In contrast, the halophilic sod genes do not conform to this expectation. The majority of substitutions are of the non-synonymous type and occur with an unexpectedly high frequency at the first and second codon positions (Table 5). Because of this, the amino acid identity of the proteins for most pairwise comparisons is less than the nucleotide identity of the genes.

Within the halophilic sod genes, substitutions by transversion outnumber transitions by almost two to one. The significance of this bias is uncertain. Compared to transitions, transversions have less of an effect on base composition and when they occur in the third codon position, they are more likely to be non-synonymous. This bias is not reflected in all genes from halophilic archaebacteria, however. For example, in the vac and 16S rRNA genes, transitions are more prevalent than transversions (Englert, 1990; May and Dennis, 1990; Mylvaganam and Dennis, 1992). Since the gas vacuole protein sequences in halophiles are highly conserved (Table 4), nonsynonymous transversion mutations in the third codon position would appear to be subject to strong negative selection and therefore less likely to become fixed in the population. Many of the 16S rRNA substitutions occur at compensatory positions within regions of RNA secondary structure. Transition mutations allow these compensatory changes to proceed through stable G:U base pair intermediates and therefore could account for the transitional bias.

TABLE 4: Comparison of paralogous and orthologous gas vacuole genes
1. Matrix comparison of the plasmid and chromosome encoded gas vacuole protein encoding genes (p-vac and c-vac, respectively). The species designations are: Hha, Hb. halobium and Hme, Hf. mediterranei.
2. Each entry in the matrix consists of three rows. The first row indicates the DNA (nucleotide) sequence identity/protein (amino acid) sequence identity, each expressed as a percent. The second row indicates the distribution of nucleotide substitutions in the comparison between first/second/third codon positions. The third row indicates the number of transition (Ts) substitutions/transversion (Tv) substitutions. These designations are abbreviated in the boxed entry.

Species     Hha          Hha
gene        p-vac        c-vac

Hha c-vac   84.7/97.4                 Boxed entry: Nuc/AA
            2/0/33                                 1st/2nd/3rd
            21/14                                  Ts/Tv

Hme c-vac   86.4/96.1    85.5/98.7
            3/1/27       1/1/33
            20/11        20/13

TABLE 5: Comparison of halophilic sod genes and proteins
1. Matrix comparisons of the members of the halophilic SOD encoding genes. Abbreviations are as in the legend to Figure 7.
2. Matrix entries are as described in the legend to Table 4 (Nuc/AA; 1st/2nd/3rd; Ts/Tv).

Species     Hcu         Hcu         GRB         GRB         Hvo         Hvo
gene        sod         slg         sod         slg         sod1        sod2

Hcu slg     87.2/82.5
            25/21/31
            31/46

GRB sod     99.5/99.5   87.3/82.0
            0/1/2       25/22/29
            2/1         31/45

GRB slg     86.7/81.0   99.2/98.5   86.7/80.5
            25/24/31    0/3/2       25/25/30
            33/47       3/2         33/47

Hvo sod1    81.0/80.0   76.3/72.0   80.7/79.5   76.0/70.5
            25/28/61    41/30/71    25/29/62    41/33/70
            34/80       42/100      35/81       45/99

Hvo sod2    80.9/80.4   77.1/72.4   80.6/79.9   76.7/70.9   99.5/100
            26/29/59    39/29/69    26/30/60    39/32/68    1/1/1
            34/80       42/95       35/81       45/94       0/3

Hma sod     77.7/74.5   77.2/76.0   78.0/75.0   76.3/74.5   76.3/78.0   76.2/78.4
            36/35/63    37/30/70    36/34/62    37/33/72    29/26/87    30/27/85
            62/72       57/80       61/71       60/82       45/97       45/97

When this high proportion of nonsynonymous substitutions was first observed between the paralogous sod and slg genes of Hb. cutirubrum, it was suggested that the cause might be intense selection at the molecular level for new protein function (May and Dennis, 1990).
Since the sod gene encodes the authentic superoxide dismutase activity, it was imagined that the superfluous slg gene was being used to generate a new and different enzymatic activity. That is, the sod gene was being conserved whereas the slg gene was being subjected to frequent nucleotide substitution.

The presence of additional sequences from the halophilic sod gene family sheds some light on this situation. Clearly, not just the sod and slg genes of Hb. cutirubrum and Hb. sp. GRB, but all of the genes in the collection when compared pairwise appear to exhibit this unusual pattern of divergence. When the relative rate test (Sarich and Wilson, 1973) was applied to the paralogous sod and slg genes of Hb. cutirubrum (or Hb. sp. GRB) using either the Hf. volcanii sod1 gene or the Ha. marismortui sod gene as the outgroup, it was quite clear that both sod and slg have accumulated substitutions at a substantial rate (Figure 8). Relative to the sod gene of Ha. marismortui, the sod and slg genes of Hb. cutirubrum have accumulated mutations at essentially the same rate whereas relative to the sod1 gene of Hf. volcanii the slg gene has accumulated substitutions at twice the rate of the sod gene. This implies that both the sod and the slg genes and probably all the genes in this halophilic family have been subjected to intense but variable selective pressures that have resulted in frequent amino acid replacements in each of the respective proteins. These replacements are confined to 74 of the 199 common amino acid positions (Figure 7). For the nearly identical paralogous sod1 and sod2 genes of Hf. volcanii, the forces and processes maintaining sequence homogeneity are not understood.

Figure 8: Relative rates of nucleotide substitutions in the sod and slg genes of Hb. cutirubrum
The relative rate test is described in the Materials and Methods section. (I) depicts the generalized situation where A and B are contemporary gene sequences; O is the common ancestral sequence of A and B, and C is the outgroup sequence. The lengths of the three branches connecting at O are proportional to the number of nucleotide substitutions (in parentheses) separating O from A, B and C. These are calculated using the three equations listed in the Methods. The distances (in nucleotide substitutions) AB, BC and AC can be obtained from Table 5. (II) For the Hcu slg (A), Hcu sod (B), Hvo sod1 (C) calculations, AB, BC and AC are 77, 114 and 142, respectively. (III) For the Hcu slg (A), Hcu sod (B), Hma sod (C) calculations, AB, BC and AC are 77, 134 and 137, respectively.
[Branch lengths shown in the figure: (II) Hcu slg 52.5, Hcu sod 24.5, Hvo sod1 89.5; (III) Hcu slg 40, Hcu sod 37, Hma sod 97.]

2.2.4. Phylogenetic analysis

The two methods, maximum parsimony (Fitch, 1971) and neighbour joining (Saitou and Nei, 1987), were used to analyze the phylogenetic relationships among the halophilic sod genes and among the proteins they encode (Figure 9). The nucleotide and protein alignments used for the analysis were co-linear and are depicted in Figure 3 and in Figure 7. The use of the sod gene sequence from M. thermoautotrophicum as an outgroup allows for the positioning of the ancestral halophilic gene (or root) within the tree.

These phylogenetic analyses have been evaluated for significance at each branch point by using the commonly employed bootstrap re-sampling technique (Felsenstein, 1985). The consistency values indicated are the proportion in 100 or more resamplings for exclusive grouping of the taxonomic units (genes or proteins) within that branch of the tree, separate from all the other taxonomic units represented on the tree.
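The branch lengths shown in Figure 8 can be reproduced from the three pairwise distances quoted in its legend. The sketch below is illustrative only: the three equations are listed in the Methods, which are not reproduced in this chapter, so the usual three-taxon formulae are assumed here, and the names relative_rate_branches and FIGURE_8_DISTANCES are hypothetical.

```python
def relative_rate_branches(ab, bc, ac):
    """Return the branch lengths (OA, OB, OC) of a three-taxon tree.

    A and B are the two genes being compared, C is the outgroup and O is
    the common ancestor of A and B. Assumed formulae (additive distances):
        OA = (AB + AC - BC) / 2
        OB = (AB + BC - AC) / 2
        OC = (AC + BC - AB) / 2
    """
    oa = (ab + ac - bc) / 2
    ob = (ab + bc - ac) / 2
    oc = (ac + bc - ab) / 2
    return oa, ob, oc


# Pairwise substitution counts (AB, BC, AC) quoted in the legend to Figure 8.
FIGURE_8_DISTANCES = {
    "II: A = Hcu slg, B = Hcu sod, C = Hvo sod1": (77, 114, 142),
    "III: A = Hcu slg, B = Hcu sod, C = Hma sod": (77, 134, 137),
}

if __name__ == "__main__":
    for label, (ab, bc, ac) in FIGURE_8_DISTANCES.items():
        oa, ob, oc = relative_rate_branches(ab, bc, ac)
        print(f"{label}: OA={oa}, OB={ob}, OC={oc}, OA/OB={oa / ob:.2f}")
```

With the legend values, these formulae give branch lengths of 52.5 (slg), 24.5 (sod) and 89.5 (outgroup) when Hf. volcanii sod1 is the outgroup, and 40, 37 and 97 when Ha. marismortui sod is the outgroup, in agreement with the values drawn in Figure 8. The slg to sod ratio is about 2.1 in the first case and about 1.1 in the second.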
A consistency value greater than 0.95 is considered significant whereas values less than 0.95 are not necessarily significant.

Figure 9: Phylogenetic relationships between members of the halophilic sod gene family or the proteins they encode
Phylogenetic trees were constructed using maximum parsimony (A and B) or neighbour joining (C and D) methods. The neighbour joining method contains a correction for multiple substitutions occurring at a single position (Kimura 1980 and 1983). The lengths of the branches in the neighbour joining trees reflect evolutionary distance. The stippled branch leading to the outgroup is drawn at one-fifth scale. Trees A and C are based on DNA sequence alignments and trees B and D are based on protein sequence alignments. The numbers preceding each branch are bootstrap consistency values and indicate the proportion of replications that group all the taxonomic units within the branch and exclude all other taxonomic units represented in the tree. These consistency values were computed from 100 or more bootstrap resamplings. Abbreviations are as in Figure 7.
[Panels: A. Nucleotide parsimony; B. Amino acid parsimony; C. Nucleotide neighbour joining; D. Amino acid neighbour joining.]

The nucleotide parsimony, amino acid parsimony, and nucleotide neighbour joining trees exhibit identical topologies; the sequences from Halobacteria form a coherent group with the two sod genes (or proteins) on one branch and the two slg genes (or proteins) on another. The single sod gene (or protein) from Ha. marismortui branches early and the two nearly identical sod genes (or proteins) from Hf.
volcanii branch later from the lineage leading to the Halobacteria. The bootstrap consistency values for most of the branch points are greater than 0.95. Only the amino acid neighbour joining tree exhibits a slightly altered topology. Here, the Hf. volcanii protein branches early from the lineage leading to Ha. marismortui rather than the lineage leading to Halobacteria. The bootstrap consistency for this grouping is 0.62.

In all these analyses, the M. thermoautotrophicum sequence was clearly and unambiguously identified as the outgroup. This methanogen protein is apparently an Fe containing SOD (Takao et al., 1991) whereas the halophilic proteins are most likely all Mn containing enzymes (May and Dennis, 1987). The difference in metal ion certainly accounts for some of the amino acid differences between the methanogenic and halophilic proteins. When the E. coli Mn-sod gene and protein sequences were included, the parsimony analysis grouped the halophiles together but was unable to reliably distinguish between Escherichia coli and M. thermoautotrophicum as the outgroup (not shown). This reinforces the impression that the halophilic proteins as a group are unique and different from all other SOD proteins (see Table 3).

2.2.5. The consensus tree

The consensus derived from these phylogenetic analyses is illustrated in Figure 10. The branch points represent either speciation events, designated S1, S2 and S3, or divergence of paralogous gene sequences, designated PSD1 and PSD2. The first speciation, S1, separates the lineage leading to Haloarcula spp. from other halophiles. The second event, S2, splits Haloferax spp. from Halobacterium spp., and the third more recent event, S3, splits Hb. cutirubrum from Hb. sp. GRB.

The first paralogous sequence divergence event, PSD1, represents the commencement of divergence between the paralogous sod and slg genes within the Halobacterium branch. The majority of the differences that have accumulated between these two sequences predate S3, the speciation of Hb. cutirubrum and Hb. sp. GRB. The second, PSD2, is meant to depict the minor differences that exist between the sod1 and sod2 genes of Hf. volcanii.

Figure 10: The consensus tree illustrating phylogenetic relationships between the halophilic sod genes (or proteins)
The consensus tree was obtained from the four constructed trees illustrated in Figure 9. Speciation events are designated S1, S2 and S3. The commencement of paralogous gene sequence divergence is indicated by PSD1 and PSD2. (A) The positions of gene duplication and acquisition of inducibility by paraquat are indicated. Double lines indicate the presence of paralogous genes of identical coding sequence; solid lines are non-inducible genes; stippled lines are paraquat inducible genes. (B) The alternative scenario where the halophilic ancestor already possesses duplicated sod genes is depicted.
[Labels in the figure: (A) ancestral state, gene duplication, gain of inducibility; (B) ancestral state, deletion of inducible gene.]

Other features characteristic of halophilic sod genes can be superimposed on this diagram. One is sod gene copy number and another is response to paraquat. Both the methanogen ancestor represented by M. thermoautotrophicum and the early branching Ha. marismortui have only a single sod gene (Takao et al., 1991 and chapter 1). This suggests that the halophilic ancestor may also have had only a single sod gene in its genome. If this scenario is correct, the position for duplication would be early in the branch leading to Halobacterium and Haloferax. In the extant species examined that possess two sod related genes, one gene is inducible by paraquat and the other gene is not inducible by paraquat (see chapter 1 and May and Dennis, 1990).
This implies that shortly after duplication and prior to the separation of Halobacterium and Haloferax, one of the duplicated genes acquired, by an unknown mechanism, the property of inducibility. The Ha. marismortui branch retains the ancestral single copy non-inducible state (Figure 10A).

There is a second alternative scenario that is equally likely. The ancestral halophile might have already possessed duplicated sod genes, one of which was inducible and the other non-inducible by paraquat. To reach the current state would require simply the loss of the paraquat inducible gene from the branch leading to Ha. marismortui. This possibility is depicted in Figure 10B. Other more complex explanations are not considered here.

Although partially satisfying, these models for the evolution of sod related sequences within halophilic archaebacteria fail to explain a number of observations. The first is the absence of any detectable sequence similarity in the 5' flanking region of the sod related genes that were induced by paraquat (see Figure 5). This is especially enigmatic since the 5' flanking sequences of the uninducible genes, which possibly have a deeper evolutionary origin (see Figure 10A), nonetheless exhibit easily identifiable sequence similarity. Second, the sod1 and sod2 genes of Hf. volcanii, although virtually identical within their coding regions, exhibit no detectable similarity in their flanking regions. This must mean that coding sequence homogeneity is maintained by concerted evolution and probably involves recombination or gene conversion type events; these events apparently do not involve or include either the 5' or 3' flanking regions.

In general, the flanking regions of homologous genes, excluding conserved regulatory elements, accumulate nucleotide substitutions more rapidly than coding regions. This is in general true for the halophilic sod gene family.
Comparison between genera indicates that only the non-inducible sod genes exhibit similarity and this is confined to a 50-55 nucleotide long region that contains the transcriptional promoter. Even here, however, sequence identity is substantially less than in the coding regions. This expected pattern is reversed within the two species of Halobacterium, Hb. cutirubrum and Hb. sp. GRB. Although the sample size is somewhat restricted, substitutions appear to be almost two fold more frequent in the coding than in the non-coding region.

In summary, nucleotide sequence divergence of the sod gene family from halophilic archaebacteria exhibits many unusual and remarkable features. These features indicate quite clearly that evolutionary processes are not uniform and predictable. Rather, they are complex and involve subtle but intense selection that is presumably exerted at the level of protein structure-function; somehow, this selection influences processes at the level of DNA that are only now becoming apparent from sequence data analysis. These processes include duplication, generation of variability by mutation, fixation or elimination of mutations by drift or selection, and recombination or gene conversion.

2.3 Summary

The protein sequences of seven members of the superoxide dismutase (SOD) family from halophilic archaebacteria have been aligned and compared with each other and with the homologous Mn and Fe SOD sequences from eubacteria and the methanogenic archaebacterium Methanobacterium thermoautotrophicum. Of 199 common residues in the SOD proteins from halophilic archaebacteria, 125 are conserved in all seven sequences and 64 of these are encoded by single unique triplets.
The 74 remaining positions exhibit a high degree of variability and for almost half of these the encoding triplets are connected by at least two nonsynonymous nucleotide substitutions.

The majority of nucleotide substitutions within the seven genes are nonsynonymous and result in amino acid replacement in the respective protein; silent third codon position (synonymous) substitutions are unexpectedly rare. Halophilic SODs contain 30 specific residues that are not found at the corresponding positions of the methanogenic or eubacterial SOD proteins. Seven of these are replacements of highly conserved amino acids in eubacterial SODs that are believed to play an important role in the three dimensional structure of the protein. Residues implicated in formation of the active site, catalysis, and metal ion binding are conserved in all Mn and Fe SODs.

Molecular phylogenies based on parsimony and neighbour joining methods coherently group the halophile sequences but surprisingly fail to distinguish between the Mn SOD of E. coli and the Fe SOD of M. thermoautotrophicum as the outgroup.

These comparisons indicate that as a group the SODs of halophilic archaebacteria have many unique and characteristic features. At the same time, the patterns of nucleotide substitution and amino acid replacement indicate that these genes and the proteins they encode continue to be subject to strong and changing selection. This selection may be related to the presence of oxygen radicals and the intra- and extracellular composition and concentration of metal cations.

CONCLUSIONS

The orthologous and paralogous members of the superoxide dismutase gene family in the halophilic archaebacteria were cloned and sequenced. The number of copies of these genes present in the genome varies from one in Ha. marismortui to two in Hf. volcanii and Hb. sp. GRB. All the genes show high sequence identity to the sod and slg genes of Hb.
cutirubrum.

The responses to paraquat by the five genes examined in this study are not uniform; mRNA levels from Hb. sp. GRB sod and Hf. volcanii sod1 were found to increase (as reported previously for Hb. cutirubrum sod) whereas those of Hb. sp. GRB slg, Hf. volcanii sod2 and Ha. marismortui sod did not (as reported previously for Hb. cutirubrum slg). In Ha. marismortui, the presence of a single uninducible sod gene suggests that this organism may possess sufficient enzymatic SOD activity to counter the effects of intracellular superoxide radicals without the need to increase intracellular levels of the enzyme. An alternative possibility is that non-enzymatic SOD activity, similar to that reported for free Mn and Mn-complexes (Archibald and Fridovich, 1981; Rush et al., 1991), may exist in this organism.

Analyses of the amino acid sequences of the putative protein products of the sod and slg genes indicate that the proteins are typical of other halobacterial proteins; they contain an excess of acidic residues over basic residues. Comparison of the halobacterial SOD sequences with the Mn/Fe SODs from other organisms has identified 30 residues that are specific for the seven halobacterial proteins. Only five of these residues are acidic, indicating that the acidic nature of the halobacterial proteins is due to the independent accumulation of acidic residues at sites where such residues are tolerated in other non-halobacterial SODs. This convergent evolution is one of the ways halobacterial proteins have evolved to withstand the high intracellular salt concentration without drastically compromising the overall structure-function of the proteins.

The comparison of the different sod and slg genes both within and between species of the halobacteria shows an unusual pattern of substitutions; there is no strong bias for third codon position substitutions.
As a result, non-synonymous substitutions tend to be retained more frequently than synonymous substitutions and a more rapid change in amino acid sequence identity compared to the nucleotide identity is observed. This is in contrast to the situation of other duplicated genes where, in the initial stages of divergence, nucleotide identity changes more rapidly than the amino acid identities of the proteins they encode (R. F. Doolittle, personal communication). It is also observed that transversions outnumber transitions in almost all pairwise comparisons of the members of the sod gene family. The significance of this bias is uncertain.

May and Dennis (1990) suggested that this pattern of substitutions reflects diverged, or diverging, protein functions. The results from the present study show that this pattern of substitutions also occurs between pairs of orthologous genes of almost certainly conserved function in different halophiles. It is difficult to establish whether this pattern is due to positive selection or random drift. It is possible that each protein is under strong selection for optimal activity in different (or changing) micro-environments. The environments might be related to the undetectable difference in internal and external salt compositions and concentrations.

In addition to this unusual pattern of substitutions, the halobacteria also possess, as seen in vac genes in Hb. halobium and Hb. mediterranei, the usual pattern that is found between closely related genes in the eubacteria and the eucaryotes. In these organisms, a strong bias exists for substitutions in the third codon position and transitions outnumber transversions. The pressures and mechanisms responsible for different patterns of mutation within different regions of the genome are currently not understood.

A consensus phylogenetic tree has been derived from the parsimony and neighbour joining analyses.
When characteristic features of the genes of the halophilic sod family are superimposed onto the consensus tree, two scenarios describing the evolution of the genes can be formulated. The ancestral halophile might have possessed duplicated genes, one of which was inducible and the other non-inducible by paraquat. The present state would then result from the simple loss of the inducible gene from the branch leading to Ha. marismortui. Alternatively, the halophilic ancestor possessed only a single sod gene which was unaffected by paraquat. A gene duplication event accompanied by acquisition of paraquat inducibility early in the branch leading to Halobacterium and Haloferax would result in the presence of paralogous inducible and non-inducible genes in these genera.

This work has shown that nucleotide sequence divergence of the sod gene family from halophilic archaebacteria exhibits many unusual and remarkable features. These features indicate quite clearly that evolutionary processes are complex and involve subtle selection that is presumably exerted at the level of protein structure-function. This selection reflects itself in changes at the DNA level that are only now becoming apparent from sequence data analysis. These processes include duplication, generation of variability by mutation, fixation or elimination of mutations by drift or selection, and recombination or gene conversion.

FUTURE RESEARCH PROSPECTS

1) What is the function of the proteins that are encoded by the slg genes in Hb. cutirubrum and Hb. sp. GRB? It is possible that slg encodes SOD activity which was lost during purification or not detectable with the assay employed. In order to determine if slg does indeed encode SOD, sod⁻ Hb. cutirubrum strains could be created by homologous recombination and assayed for SOD activity. In addition, complementation of sod⁻ slg⁻ strains with an expression plasmid containing the slg gene could be attempted.
The construction of a sod⁻ and/or slg⁻ genotype by homologous recombination, using appropriate deletions, is now feasible with the availability of transformation systems for halobacteria.

2) What are the signals for paraquat inducibility in the halophilic sod genes? It may be possible to localize putative regulatory regions by deletion, by site-specific mutational analysis of the flanking regions, and by construction of hybrids of paraquat-inducible and non-paraquat-inducible genes. For this study, Hb. cutirubrum or Hb. sp. GRB sod/slg genes with appropriate modifications of the flanking regions contained on a shuttle plasmid could be utilized. Such constructs can be transformed into Hf. volcanii and the expression of the heterologous sod gene can be monitored by transcript analysis. By using appropriate constructs and controls, the paraquat-responsive regions could then be delineated.

3) How are the other sod genes in the archaebacteria related in their primary structure and their regulation by paraquat? To answer this question, sod genes from other archaebacteria would have to be cloned. Since Mn-SOD polypeptides from different organisms have universally conserved regions, it should be possible to clone sod genes using degenerate oligonucleotides for amplification by the polymerase chain reaction. Subsequent phylogenetic analysis using a larger number of sequences will provide more insight into the evolution of the superoxide dismutase genes in the archaebacteria.

REFERENCES

Antonini, E., Brunori, M., Greenwood, C., and Malmstrom, B. G. 1970. Catalytic mechanism of cytochrome oxidase. Nature. 228:936-937.
Archibald, F. S., and Fridovich, I. 1981. Manganese and defenses against oxygen toxicity in Lactobacillus plantarum. J. Bacteriol. 145:442-451.
Bayley, S. T. 1971. Protein synthesis systems from halophilic bacteria. Methods Mol. Biol. 1:89-100.
Bannister, J. V., and Parker, M. W. 1985.
The presence of a copper/zinc superoxide dismutase in the bacterium Photobacterium leiognathi: a likely case of gene transfer from eukaryotes to prokaryotes. Proc. Natl. Acad. Sci. 82:149-152.
Boer, P. H., and Hickey, D. A. 1986. The alpha amylase gene in Drosophila melanogaster: nucleotide sequence, gene structure and expression motifs. Nucl. Acid. Res. 14:8399-8410.
Brenner, S. 1988. The molecular evolution of genes and proteins: a tale of two serines. Nature. 334:428-430.
Cadenas, E. 1989. Biochemistry of oxygen toxicity. Annu. Rev. Biochem. 58:79-110.
Carlioz, A., and Touati, D. 1986. Isolation of superoxide dismutase mutants of Escherichia coli; Is superoxide dismutase necessary for aerobic life? EMBO J. 5:623-630.
Chant, J., and Dennis, P. P. 1986. Archaebacteria: transcription and processing of ribosomal RNA sequences in Halobacterium cutirubrum. EMBO J. 5:1091-1097.
Chapman, D. J., and Schopf, J. W. 1983. Biological and biochemical effects of the development of an aerobic environment. In J. W. Schopf (Ed.) Earth's Earliest Biosphere: Its Origin and Evolution. Princeton Univ. Press: Princeton, N.J.
Charlebois, R. L., Schalkwyk, L. C., Hofman, J. D., and Ford Doolittle, W. 1991. Detailed physical map and set of overlapping clones covering the genome of the archaebacterium Haloferax volcanii DS2. J. Mol. Biol. 222:509-524.
Cline, S. W., Lam, W. L., Charlebois, R. L., Schalkwyk, L. C., and Ford Doolittle, W. 1989. Transformation methods for halophilic archaebacteria. Can. J. Microbiol. 35:148-152.
Daniels, C. J., McKee, A. H. Z., and Doolittle, W. F. 1984. Archaebacterial heat-shock proteins. EMBO J. 3:745-749.
Demple, B., and Amabile-Cuevas, C. F. 1991. Redox redux: The control of oxidative stress responses. Cell. 67:837-839.
Dennis, P. P. 1985. Multiple promoters for the transcription of ribosomal RNA gene cluster in Halobacterium cutirubrum. J. Mol. Biol. 186:457-461.
Dente, L., and Cortese, R. 1983. pEMBL: A new family of single-stranded plasmids for sequencing DNA.
Methods in Enzymology. 155:111-118.
Ebert, K., Goebel, W., and Pfeifer, F. 1984. Homologies between heterogeneous extrachromosomal DNA populations of Halobacterium halobium and four new halobacterial isolates. Mol. Gen. Genet. 194:91-97.
Englert, C., Horne, M., and Pfeifer, F. 1990. Expression of the major gas vesicle protein gene in the halophilic archaebacterium Haloferax mediterranei is modulated by salt. Mol. Gen. Genet. 222:225-232.
Fee, J. A. 1991. Regulation of sod genes in Escherichia coli: relevance to superoxide dismutase function. Mol. Microbiol. 1:2599-2610.
Feinberg, A., and Vogelstein, B. 1982. A technique for radiolabelling DNA restriction endonuclease fragments to high specific activity. Anal. Biochem. 132:6-13.
Felsenstein, J. 1988. Phylogenies from molecular sequences; Inference and reliability. Annu. Rev. Genet. 22:521-565.
Fitch, W. M. 1971. Toward defining the course of evolution; Minimal change for a specific tree topology. Syst. Zool. 20:406-416.
Fitch, W. M., and Ferris, S. D. 1974. Evolutionary trees with minimum nucleotide replacements from amino acid sequences. J. Mol. Evol. 1:263-278.
Fridovich, I. 1978. The biology of oxygen radicals. Science. 201:875-880.
Fridovich, I. 1986a. Biological effects of the superoxide radical. Arch. Biochem. Biophys. 211:1-8.
Fridovich, I. 1986b. Superoxide dismutases. Adv. Enzymol. 58:68-97.
Haber, F., and Weiss, J. 1934. The catalytic decomposition of hydrogen peroxide by iron salts. Proc. R. Soc. London Ser. A. 147:332-351.
Hassan, H. M., and Fridovich, I. 1977. Regulation of the synthesis of superoxide dismutase in Escherichia coli: Induction by methyl viologen. J. Biol. Chem. 252:7667-7672.
Henikoff, S. 1987. Exonuclease III generated deletions for DNA sequence analysis. Promega Notes. Promega Corp. No. 8. (pp. 1-3).
Hickey, D. A., Bally-Cuif, L., Abukashawa, S., Payant, V., and Benkel, B. F. 1991. Concerted evolution of duplicated protein-coding genes in Drosophila. Proc. Natl. Acad. Sci. 88:1611-1615.
Holmes, M.
L., and Dyall-Smith, M. L. 1990. A plasmid vector with a selectable marker for halophilic archaebacteria. J. Bacteriol. 172:756-761.
Holmes, M. L., Nuttall, S. D., and Dyall-Smith, M. L. 1991. Construction and use of halobacterial shuttle vectors and further studies on Haloferax DNA gyrase. J. Bacteriol. 173:3807-3813.
Horne, M., Englert, C., and Pfeifer, F. 1988. Two genes encoding gas vacuole proteins in Halobacterium halobium. Mol. Gen. Genet. 213:459-464.
Hsia, H., Lebkowski, J., Leong, P., Calos, M., and Miller, J. 1989. Comparison of ultraviolet irradiation induced mutagenesis of the lacI gene in Escherichia coli and in human 293 cells. J. Mol. Biol. 205:103-113.
Imlay, J. A., and Fridovich, I. 1991. Assays of metabolic superoxide products in Escherichia coli. J. Biol. Chem. 266:6957-6965.
Iwabe, N., Kuma, K., Hasegawa, M., Osawa, S., and Miyata, T. 1989. Evolutionary relationship of archaebacteria, eubacteria, and eukaryotes inferred from phylogenetic trees of duplicated genes. Proc. Natl. Acad. Sci. 86:9355-9359.
Jones, W. J., Nagle, D. P., and Whitman, W. B. 1987. Methanogens and the diversity of archaebacteria. Microbiol. Rev. 51:135-177.
Kimura, M. 1980. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111-120.
Kimura, M. 1983. The neutral theory of molecular evolution. Cambridge University Press, Cambridge.
Kirby, T. W., Lancaster, J. R., and Fridovich, I. 1981. Isolation and characterization of the iron-containing superoxide dismutase of Methanobacterium bryantii. Arch. Biochem. Biophys. 210:140-148.
Kozak, M. 1978. How do eucaryotic ribosomes select initiation regions in messenger RNA? Cell. 15:1109-1123.
Lam, W. L., and Ford Doolittle, W. 1989. Shuttle vectors for the archaebacterium Halobacterium volcanii. Proc. Natl. Acad. Sci. USA. 86:5478-5482.
Lanyi, J. 1974. Salt dependence of proteins from extremely halophilic bacteria. Bacteriol. Rev. 38:272-290.
Lanyi, J. 1979.
Physicochemical aspects of salt dependence in halobacteria. In M. Shilo (Ed.) Strategies of microbes in extreme environments. (pp. 93-107). Verlag Chemie: Weinheim, FRG.
Liu, Y., Boone, D., and Choy, C. 1990. Methanohalophilus oregonense sp. nov., a methylotrophic methanogen from an alkaline, saline aquifer. Int. J. Syst. Bacteriol. 40:111-116.
Maniatis, T., Fritsch, E. F., and Sambrook, J. 1982. Molecular cloning: a laboratory manual. Cold Spring Harbor Laboratory: Cold Spring Harbor, N.Y.
Margulis, L. 1970. Origin of eucaryotic cells. Yale University Press: New Haven, CT.
Marklund, S., and Marklund, G. 1974. Involvement of the superoxide anion radical in the autoxidation of pyrogallol and a convenient assay for superoxide dismutase. Eur. J. Biochem. 41:469-474.
May, B. P., and Dennis, P. P. 1987. Superoxide dismutase from the extremely halophilic archaebacterium Halobacterium cutirubrum. J. Bacteriol. 169:1417-1422.
May, B. P., and Dennis, P. P. 1989. Evolution and regulation of the gene encoding superoxide dismutase from the archaebacterium Halobacterium cutirubrum. J. Biol. Chem. 264:12253-12258.
May, B., and Dennis, P. P. 1990. Unusual evolution of a superoxide dismutase-like gene from the extreme halophilic archaebacterium Halobacterium cutirubrum. J. Bacteriol. 172:3725-3729.
May, B. P., Tam, P., and Dennis, P. P. 1989. The expression of the superoxide dismutase gene in Halobacterium cutirubrum and Halobacterium volcanii. Can. J. Microbiol. 35:171-175.
McCord, J. M., and Fridovich, I. 1969. Superoxide dismutase. An enzymic function for erythrocuprein (hemocuprein). J. Biol. Chem. 244:6049-6055.
Mullakhanbhai, M. F., and Larsen, H. 1975. Halobacterium volcanii spec. nov.; a Dead Sea halobacterium with a moderate salt requirement. Arch. Microbiol. 104:207-214.
Mylvaganam, S., and Dennis, P. P. 1992. Sequence heterogeneity between the two genes encoding 16S rRNA from the halophilic archaebacterium Haloarcula marismortui. Genetics. 130:399-410.
Neuman, A. 1987.
Specific accessory sequences in Saccharomyces cerevisiae introns control assembly of pre-mRNA into spliceosomes. EMBO J. 6:3833-3839.
Ochman, H., and Wilson, A. C. 1987. Evolution in bacteria: Evidence for a universal substitution rate in cellular genomes. J. Mol. Evol. 26:74-86.
Oesterhelt, D. 1985. Light driven proton pumping in halobacteria. BioScience. 25:18-21.
Oren, A., Lau, P. P., and Fox, G. E. 1988. The taxonomic status of Halobacterium marismortui from the Dead Sea; a comparison with Halobacterium vallismortis. System. Appl. Microbiol. 10:251-258.
Parker, M. W., and Blake, C. C. 1988. Iron- and manganese-superoxide dismutases can be distinguished by analysis of their primary structures. FEBS Letters. 229:377-382.
Paterek, J. R., and Smith, P. H. 1988. Methanohalophilus mahii gen. nov., sp. nov., a methylotrophic halophilic methanogen. Int. J. Syst. Bacteriol. 111:122-123.
Peterson, G. L. 1977. A simplification of the protein assay method of Lowry et al. which is generally more applicable. Anal. Biochem. 83:346-356.
Pfeifer, F., and Betlach, M. 1985. Genome organization in Halobacterium cutirubrum: A 70 kb island of more (AT) rich DNA in the chromosome. Mol. Gen. Genet. 191:449-455.
Phillips, J. P., Campbell, S. D., Michaud, D., Charbonneau, M., and Hilliker, A. J. 1989. Null mutation of copper/zinc superoxide dismutase in Drosophila confers hypersensitivity to paraquat and reduced longevity. Proc. Natl. Acad. Sci. USA. 86:2761-2765.
Puget, K., and Michelson, A. M. 1974. Isolation of a new copper-containing superoxide dismutase bacteriocuprein. Biochem. Biophys. Res. Commun. 51:830-838.
Reddy, K. J., Webb, R., and Sherman, L. A. 1990. Bacterial RNA isolation with one hour centrifugation in a table top ultracentrifuge. BioTechniques. 1:250-251.
Reiter, W. D., Palm, P., Voos, W., Kaniecki, J., Grampp, B., Schulz, W., and Zillig, W. 1987. Putative promoter elements for the ribosomal RNA genes of the thermoacidophilic archaebacterium Sulfolobus sp. strain B12. Nucleic Acids Res.
11:5581-5595.
Rush, J. D., Maskos, Z., and Koppenol, W. 1991. The superoxide dismutase activities of two higher valent manganese complexes, MnIV desferrioxamine and MnIII cyclam. Arch. Biochem. Biophys. 289:97-102.
Saenger, W. 1987. Structure and dynamics of water surrounding biomolecules. Ann. Rev. Biophys. Chem. 93-114.
Saitou, N., and Nei, M. 1987. The neighbour joining method; A new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406-425.
Salin, M. L., and Oesterhelt, D. 1988. Purification of a manganese-containing superoxide dismutase from Halobacterium halobium. Arch. Biochem. Biophys. 260:806-810.
Salin, M. L., Duke, M. V., Oesterhelt, D., and Din-Pow, M. 1988. Cloning and determination of the nucleotide sequence of the Mn-containing superoxide dismutase Halobacterium halobium. Gene. 70:153-159.
Sanger, F., Nicklen, S., and Coulson, A. R. 1977. DNA sequencing with chain terminating inhibitors. Proc. Natl. Acad. Sci. USA. 74:5463-5467.
Sapienza, C., and Ford Doolittle, W. 1982. Unusual physical organization of the Halobacterium genome. Nature. 295:384-389.
Sarich, V. M., and Wilson, A. C. 1973. Generation time and genomic evolution in primates. Science. 179:1144-1147.
Schiavone, J. R., and Hassan, H. M. 1987. Biosynthesis of superoxide dismutase in eight prokaryotes: effects of oxygen, paraquat and an iron chelator. FEMS Microbiol. Letters. 42:33-38.
Schnabel, H., Zillig, W., Pfaffle, M., Schnabel, R., Michel, H., and Delius, H. 1982. Halobacterium halobium phage H1. EMBO J. 1:87-92.
Searcy, K. B., and Searcy, D. G. 1981. Superoxide dismutase from the archaebacterium Thermoplasma acidophilum. Biochimica et Biophysica Acta. 670:39-46.
Shine, J., and Dalgarno, L. 1974. The 3' terminal sequence of Escherichia coli 16S ribosomal RNA: complementarity to nonsense triplets and ribosome binding sites. Proc. Natl. Acad. Sci. USA. 71:1342-1346.
Southern, E. M. 1975. Detection of specific sequences among DNA fragments separated by gel electrophoresis. J. Mol. Biol.
98:503-517.
Stallings, W. C., Pattridge, K. A., Strong, R. K., and Ludwig, M. L. 1984. Manganese and iron superoxide dismutases are structural homologs. J. Biol. Chem. 259:10695-10699.
Stallings, W. C., Pattridge, K. A., Strong, R. K., and Ludwig, M. L. 1985. The structure of manganese superoxide dismutase from Thermus thermophilus HB8 at 2.4-Å resolution. J. Biol. Chem. 260:16424-16432.
Stanier, R. Y., Adelberg, E. A., and Ingraham, J. 1976. In The Microbial World. (pp. 65). Prentice-Hall, Inc.: Englewood Cliffs, NJ.
Steinman, H. M. 1982. Superoxide dismutases: Protein chemistry and structure-function relationships. In Superoxide Dismutases. Vol. 1. (pp. 11-68). L. W. Oberly (Ed.) CRC Press: Boca Raton, FL.
Takao, M., Kobayashi, T., Oikawa, A., and Yasui, A. 1989. Tandem arrangement of photolyase and superoxide dismutase genes in Halobacterium halobium. J. Bacteriol. 171:6323-6329.
Takao, M., Oikawa, A., and Yasui, A. 1990. Characteristics of a superoxide dismutase gene from the archaebacterium Methanobacterium thermoautotrophicum. Arch. Biochem. Biophys. 283:210-216.
Takeda, Y., and Avila, H. 1986. Structure and gene expression of the E. coli Mn-superoxide dismutase gene. Nucl. Acid. Res. 14:4577-4589.
Walter, M. R. 1983. Archean stromatolites; Evidence of the earth's earliest benthos. (pp. 187-212). In J. W. Schopf (Ed.) Earth's Earliest Biosphere: Its Origin and Evolution. Princeton University Press: Princeton, NJ.
Woese, C. R. 1987. Bacterial evolution. Microbiol. Rev. 51:221-271.
Woese, C. R., Kandler, O., and Wheelis, M. L. 1990. Towards a natural system of organisms: Proposal for the domains Archaea, Bacteria, and Eucarya. Proc. Natl. Acad. Sci. 87:4576-4579.
Wu, J., and Weiss, B. 1991. Two divergently transcribed genes, soxR and soxS, control a superoxide response regulon of Escherichia coli. J. Bacteriol. 173:2684-2871.
Zhang, Q., and Yonei, S. 1991. Induction of manganese-superoxide dismutase by membrane-binding drugs in Escherichia coli. J. Bacteriol.
173:3488-3491.
Zuckerkandl, E., and Pauling, L. 1965. Molecules as documents of evolutionary history. J. Theor. Biol. 8:357-366.

