Open Collections

UBC Theses and Dissertations

Single gene circles in dinoflagellate chloroplast genomes : characterization and phylogeny Zhang, Zhaoduo 2000

SINGLE GENE CIRCLES IN DINOFLAGELLATE CHLOROPLAST GENOMES: CHARACTERIZATION AND PHYLOGENY

by

ZHAODUO ZHANG

B.Sc., Huazhong Normal University, 1983
M.Sc., Hebei Normal University, 1988

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES, DEPARTMENT OF BOTANY

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
July 2000
© Zhaoduo Zhang, 2000

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

The University of British Columbia
Vancouver, Canada

Abstract

Chloroplast DNA was isolated from the peridinean dinoflagellate Heterocapsa triquetra on CsCl gradients and used to construct three plasmid libraries. Complete sequencing of the chloroplast 16S and 23S ribosomal RNA genes and eight chloroplast protein genes revealed that each gene is located alone on a separate minicircle: "one gene - one circle". Each circle has an unusual tripartite non-coding 9G-9A-9G region (a putative replication origin), which is highly conserved among the ten circles. This organization differs radically from that of the chloroplast genomes (120-200 kb) of higher plants and algae.

Five aberrant minicircles that have the tripartite 9G-9A-9G region were also sequenced. However, instead of a complete chloroplast gene, each aberrant circle consists of several short fragments from two or three of four chloroplast genes: psbA, psbC, 16S rRNA and 23S rRNA. Comparison of the sequences of the five circles indicated that all five are related and could have evolved by differential deletions and duplications from four common ancestral unigenic circles. The aberrant circles probably have no function and are selfish DNA in the chloroplast of H. triquetra.

To investigate the generality of minicircular chloroplast genes, genomic DNA from fourteen other dinoflagellates of five orders was hybridized with chloroplast gene probes. Minicircles were detected in eight species. Chloroplast 23S rRNA and psbA genes were amplified from five species by PCR; sequencing the PCR products confirmed that they are located on minicircles. Sequence comparison showed that the chloroplast genes are conserved among different species; the non-coding region of the circles is conserved within each species, but differs between species. The presence of minicircles in five orders suggests that unigenic minicircles evolved in the ancestor of peridinean dinoflagellates. However, DNA blots and PCR amplification of chloroplast genes from several other dinoflagellates showed that chloroplast genes might be present in large DNA molecules outside the five orders.

Phylogenetic analyses were carried out using sequences of ribosomal RNA genes, sequences of individual proteins, and seven concatenated protein sequences. Quartet puzzling, maximum likelihood, maximum parsimony, neighbor joining and LogDet trees were constructed. Inter-site rate variation and invariant sites were allowed for quartet puzzling and neighbor joining. All psbA and 23S rRNA trees showed peridinin-containing dinoflagellate chloroplasts as monophyletic. In protein trees they are related to those of chromists and red algae. In 23S rRNA trees dinoflagellates are the sisters of sporozoans (apicomplexans), suggesting that dinoflagellate chloroplasts may be related to sporozoan plastids.
The branches of the dinoflagellates and sporozoans are very long, and the possibility of long-branch artifacts cannot be ruled out. All the trees fit the idea that dinoflagellate chloroplasts originated from red algae by a secondary endosymbiosis.

TABLE OF CONTENTS

Abstract
Table of contents
List of Tables
List of Figures
Acknowledgements

Chapter 1 Introduction
1.1 Dinoflagellates
1.1.1 General characteristics
1.1.2 Dinoflagellate chloroplasts
1.1.3 Heterocapsa triquetra
1.1.4 Alveolates
1.2 Chloroplast characters of major algal groups
1.2.1 Pigments
1.2.2 Chloroplast structure
1.3 Chloroplast genomes
1.3.1 Physical properties of chloroplast DNA
1.3.2 Genome structure
1.3.3 Gene content
1.3.4 Phylogenetic implications
1.3.5 The mystery of dinoflagellate chloroplast genome
1.4 Origin and Evolution of chloroplasts
1.4.1 Chloroplast origin: primary and secondary endosymbiosis
1.4.2 Molecular evidence for secondary endosymbiosis
1.4.3 Origin of dinoflagellate chloroplasts
1.5 Objectives

Chapter 2 Materials and methods
2.1 Strains
2.2 Culturing of dinoflagellates
2.3 Isolation and purification of chloroplast DNA from Heterocapsa triquetra
2.4 Total DNA isolation
2.5 Total RNA isolation from Heterocapsa triquetra
2.6 Southern blot
2.7 Northern blot
2.8 Construction of plasmid libraries
2.9 Plasmid DNA preparation
2.10 Polymerase chain reactions
2.11 DNA sequencing
2.12 Analysis of DNA sequences
2.13 Phylogenetic analyses

Chapter 3 Separation of chloroplast DNA
3.1 Chloroplast genes psbA and psbD are present in the satellite DNA of Heterocapsa
3.2 Chloroplast psbA gene is not present in the satellite DNA of Amphidinium
3.3 Discussion
3.3.1 Separation of dinoflagellate chloroplast DNA using CsCl gradient
3.3.2 Unusual structure of dinoflagellate chloroplast genes

Chapter 4 Characterization of chloroplast genes in Heterocapsa triquetra: "one gene - one circle"
4.1 Random sequencing of clones from plasmid libraries of satellite DNA
4.2 Identification of ten chloroplast genes
4.3 Contig assembly in Gap 4 of Staden
4.4 The initially puzzling structure of a chloroplast psbA clone
4.5 psbA is on a minicircle, as is the 23S rRNA gene
4.6 Each gene is on a minicircular chromosome: "one gene-one circle"
4.7 Characteristics of chloroplast genes
4.7.1 The most divergent chloroplast genes ever sequenced
4.7.2 Starts and stops
4.8 Tripartite non-coding 9G-9A-9G region
4.9 RNA blot
4.10 Chloroplast genome size

Chapter 5 Minicircles with jumbled chloroplast gene fragments
5.1 General characteristics
5.2 Aberrant minicircles are circular molecules
5.3 Jumbled chloroplast gene fragments
5.4 The 9G-9A-9G region
5.5 Other non-genic regions
5.6 Discussion
5.6.1 Minicircular chloroplast DNA molecules in other organisms
5.6.2 Chloroplast DNA fragments in mitochondrial and nuclear genomes
5.6.3 Selfishness of the aberrant minicircles
5.6.4 Origin of the aberrant minicircles

Chapter 6 Generality of single gene circles in dinoflagellates
6.1 Introduction
6.2 Southern blots of genomic DNA of fourteen dinoflagellates
6.3 PCR amplification of chloroplast psbA and 23S rRNA gene from dinoflagellates
6.4 Comparison of non-coding regions from five dinoflagellates
6.5 Possible function of the non-coding region of minicircular chloroplast genes
6.6 Origin and significance of unigenic circles in dinoflagellates

Chapter 7 Phylogeny of dinoflagellate chloroplast genes
7.1 Introduction
7.2 The dinoflagellate 23S rRNA sequences are chloroplast genes
7.3 Relationships among the chloroplast 23S rRNA genes
7.4 Chloroplast 16S rRNA may not be reliable in phylogenetic analysis
7.5 Phylogenetic analyses of individual protein sequences
7.6 Phylogenetic analyses of seven concatenated protein sequences
7.7 Discussion
7.7.1 Peridinean dinoflagellate chloroplasts probably originated by secondary endosymbiosis
7.7.2 Relationships among the dinoflagellates
7.7.3 Ancestral peridinean dinoflagellates probably had chloroplasts with minicircles
7.7.4 Dinoflagellate chloroplasts might be related to Sporozoa plastids
7.7.5 Accelerated evolution of dinoflagellate chloroplast genes

References

List of Tables

Table 1.1 Characteristics of the major algal groups
Table 1.2 Chloroplast genomes completely sequenced
Table 1.3 Chloroplast genes of completely sequenced chloroplast genomes
Table 1.4 Number of genes on the sequenced chloroplast genomes
Table 4.1 Features of nine chloroplast genes and their 9G-9A-9G region from H. triquetra
Table 4.2 Indels in H. triquetra chloroplast protein genes
Table 4.3 Shine-Dalgarno like sequence in protein genes of H. triquetra
Table 4.4 Direct repeats on single gene circles of H. triquetra
Table 5.1 Characteristics of chloroplast gene fragments on circles 1-5 and their coordinates on single gene circles of H. triquetra
Table 5.2 Repeated sequences on aberrant minicircles
Table 6.1 Summary of DNA blots probed with chloroplast psbA and 23S rRNA gene in fifteen dinoflagellates
Table 6.2 Dinoflagellate chloroplast 23S rRNA and psbA primers
Table 6.3 PCR amplification of chloroplast genes from various dinoflagellates
Table 6.4 Characteristics of minicircular psbA and 23S rRNA gene of five dinoflagellates
Table 6.5 psbA and 23S rRNA gene partially sequenced in four dinoflagellates
Table 7.1 Base composition of chloroplast and cyanobacterial 23S rRNA sequences

List of Figures

Figure 1.1 General characters of a generalized peridinean dinoflagellate
Figure 1.2 Axenic Heterocapsa triquetra (CCMP 449)
Figure 1.3 Origin of chloroplast by endosymbiosis
Figure 3.1 Separation of satellite and major band DNA from H. triquetra on a CsCl gradient
Figure 3.2 Electrophoresis and Southern blot of satellite and major band DNA of Heterocapsa triquetra
Figure 3.3 Separation of two satellite DNA bands from Heterocapsa triquetra
Figure 3.4 Separation and Southern blot of satellite and major DNA band from Heterocapsa pygmaea
Figure 3.5 Separation and Southern blot of two satellite DNA bands from Heterocapsa pygmaea
Figure 3.6 Separation and Southern blot of satellite and major band DNA from Amphidinium carterae
Figure 4.1 Flowchart of constructing three plasmid libraries from the satellite DNA of Heterocapsa triquetra
Figure 4.2 Random sequencing and sequence assembly
Figure 4.3 Ten chloroplast gene contigs assembled from sequences of different clones
Figure 4.4 Unexpected "Confused" structure of a psbA clone
Figure 4.5 Two possible psbA gene structures
Figure 4.6 psbA gene contig and its circularized minicircle
Figure 4.7 Southern blot of satellite DNA and total DNA of Heterocapsa triquetra
Figure 4.8 23S rRNA gene contig and minicircle
Figure 4.9 Confirmation of psbA and 23S rRNA minicircle by inverse PCR
Figure 4.10 Structure of ten chloroplast single gene circles
Figure 4.11 DNA and protein sequence alignment of a psaA gene segment
Figure 4.12 Alignment of a very conserved segment of psbA gene
Figure 4.13 Indels in proteins encoded by H. triquetra chloroplast genes
Figure 4.14 Protein sequence alignment of chloroplast ribosomal small subunit 14
Figure 4.15 Putative starts (bold) and the N terminus of eight chloroplast protein genes in H. triquetra
Figure 4.16 Putative promoter region on D4 region of ten chloroplast genes
Figure 4.17 Structure and sequences of the tripartite 9G-9A-9G region of chloroplast gene minicircles
Figure 4.18 Sequence alignment of the variable region of D2 and D3 from ten chloroplast genes
Figure 4.19 DNA blot probed with the non-coding 9G-9A-9G region of psaA minicircle
Figure 4.20 Secondary structure of the tripartite non-coding region 9G-9A-9G of the minicircular psbA of H. triquetra
Figure 4.21 Hybridization of Heterocapsa triquetra RNA with chloroplast gene probes
Figure 5.1 Five minicircles with jumbled chloroplast gene fragments and the coordinates of the fragments on normal chloroplast genes
Figure 5.2 Alignment of the sequences of repeated regions
Figure 5.3 Sequence alignment of 9G and 9A cores in selfish minicircles and their related chloroplast gene circles
Figure 5.4 Sequence alignment of D2 and D3 of selfish and chloroplast gene minicircles
Figure 5.5 Structure (a) and sequence of S2 (b) of circle 2-5
Figure 5.6 Origin of selfish circles from four chloroplast genes
Figure 6.1 Southern blots of uncut total genomic DNA of dinoflagellates
Figure 6.2 Structure of the 23S rRNA and psbA minicircle of H. triquetra
Figure 6.3 Comparison of the structure of psbA and 23S single gene circle in five different dinoflagellates
Figure 6.4 Structure of non-coding region of single gene circles of five peridinean dinoflagellates
Figure 6.5 Consensus sequences of the non-coding region of five dinoflagellates
Figure 6.6 Sequences and secondary structure of inverted repeats
Figure 7.1 Maximum likelihood tree of 23S rRNA sequences of chloroplasts, mitochondria and bacteria
Figure 7.2 Maximum parsimony tree of 23S rRNA sequences of chloroplasts, mitochondria and bacteria
Figure 7.3 Gamma distribution neighbor joining tree of chloroplast 23S rRNA sequences
Figure 7.4 LogDet tree of 23S rRNA sequences using
Figure 7.5 Quartet Puzzling tree of 23S rRNA sequences
Figure 7.6 Neighbor joining tree of 23S rRNA sequences (HKY85, equal rates)
Figure 7.7 Gamma distribution neighbor joining tree of chloroplast 23S rRNA sequences
Figure 7.8 LogDet tree of chloroplast 23S rRNA sequences
Figure 7.9 Maximum likelihood tree of chloroplast 16S rRNA
Figure 7.10 Quartet puzzling tree of chloroplast 16S rRNA sequences
Figure 7.11 Neighbor joining tree of psbA protein sequences
Figure 7.12 Neighbor joining tree of psbA gene sequences
Figure 7.13 Neighbor-joining tree of seven concatenated protein sequences
Figure 7.14 Schematic relationship between selected dinoflagellates and other alveolates

ACKNOWLEDGEMENT

There are a few people I want to acknowledge. I thank Drs Margaret Beaton and Ken-ichiro Ishida for helpful discussion, for help with bench techniques, and for computer programs for sequence editing and phylogenetic analyses. I wish to thank Elena Filek for helping me with random sequencing of the clones, Ema Chao, Xiaonan Wu and Lengtuo Deng for help with PCR and sequencing techniques, and Qing Wang for her advice on RNA preparation and RNA blots. I also wish to thank Juan Saldarriaga for his genomic DNA of several dinoflagellate species, and to express my gratitude to Dr. Griffith for permission to use his laboratory and equipment. I greatly appreciated the comments of Drs Mary Berbee, Carl Douglas and Thomas Cavalier-Smith on this dissertation. I thank Dr. Beverley Green for taking the time to read several versions of this dissertation. I also wish to thank Drs Beverley Green and Thomas Cavalier-Smith for their encouragement and financial support throughout this study, and for their very helpful comments and discussion in the preparation of several manuscripts. I wish to thank Drs Patrick Keeling and F. J. R. Taylor for their helpful comments in the preparation of the manuscript. I thank my wife, Sufang Zhang, and my son, Shangye Zhang, for their support throughout this study.

To the memory of my parents
Zhang Gong Yuan
Chen Xi Yin

Chapter 1 Introduction

Ten chloroplast genomes, six from higher plants and four from algae, had been completely sequenced when I started this project. Individual chloroplast gene sequences had also been reported from many species. Dinoflagellates were the only major algal group with no reported chloroplast gene sequences. Despite the evidence that a dinoflagellate satellite DNA contained chloroplast genes (Boczar et al. 1991), previous attempts to obtain chloroplast gene sequences had been uniformly unsuccessful. This was an obvious gap in the study of the origin and evolution of chloroplasts, and was one of the last frontiers in the area. My objectives were to clone and sequence the chloroplast genes of a peridinean dinoflagellate, Heterocapsa triquetra, and to carry out phylogenetic analyses using chloroplast gene and protein sequences. This project should provide information on the chloroplast gene structure and chloroplast genome organization of the dinoflagellates, and on the origin and evolution of dinoflagellate chloroplasts.

1.1 Dinoflagellates

1.1.1 General characteristics

Dinoflagellates are a very diverse group of unicellular eukaryotic organisms with approximately 2,000 living species, about half of them photosynthetic. Photosynthetic dinoflagellates are important primary producers in marine and freshwater ecosystems, both as free-living algae and as symbionts within corals, while some are notorious for causing toxic "red tides" and killing fish (Dodge 1984; Taylor 1990). Typical dinoflagellates have two flagella: the transverse flagellum, which encircles the cell and causes it to rotate, and the longitudinal flagellum, which beats posteriorly and drives the cell forward (Taylor 1990) (Figure 1.1).
Dinoflagellates have a complex cell cortical structure comprising membrane-bounded alveoli, in which cellulose thecal plates are found in many species. They can be divided into naked genera, which have no thecal plates, and armored genera, which have them (Sze 1993). Dinoflagellates have an unusual nucleus known as a dinokaryon, with permanently condensed chromosomes (Soyer and Haapala 1974c) that usually lack histones (Rizzo 1981; Raikov 1982). The dinoflagellate nucleus contains enormous amounts of DNA, ranging from 3 pg/cell in Amphidinium to 200 pg/cell in Gonyaulax (Rizzo 1987), much higher than the 0.1-0.2 pg/cell found in other algae.

Figure 1.1 General characters of a generalized dinoflagellate (Taylor 1990). (a) Ventral view. (b) Longitudinal section. Abbreviations: E, epicone; H, hypocone; G, girdle; SU, sulcus; LF, longitudinal flagellum; TF, transverse flagellum; MT, mitochondrion; AV, amphiesmal vesicle; V, vacuome; NU, nucleus; PL, plastid; PS, sac pusule; PY, pyrenoid; AX, axoneme; SS, striated strand.

1.1.2 Dinoflagellate chloroplasts

Typical dinoflagellate chloroplasts contain peridinin-chlorophyll a/c light-harvesting pigments (Jeffrey et al. 1975), and the peridinin (a xanthophyll) gives dinoflagellates the reddish-brown color associated with "red tides". Dinoflagellate chloroplasts differ from all other chloroplasts (except those of euglenoids) in being surrounded by an envelope of three membranes (Dodge 1975), rather than two membranes like red algal, glaucophyte and green plant chloroplasts, or four membranes like chromist, chlorarachniophyte and sporozoan (e.g. Toxoplasma) plastids. Like the chloroplasts of red algae, euglenoids, green algae and higher plants, dinoflagellate chloroplasts have no ribosomes on the cytoplasmic surface of their outer membrane. Dinoflagellate chloroplasts do not contain starch grains, and differ from those of chromists in lacking the girdle thylakoid (Schnepf 1992).
Dinoflagellate chloroplasts are cup-shaped, peripheral and reticulate, or multilobed and stellate with a single pyrenoid (Bibby and Dodge 1974), and contain single or multiple chloroplast nucleoids (DNA-containing areas) (Coleman 1985). The lamellar system of the dinoflagellate chloroplast usually consists of stacks of three thylakoids.

A few dinoflagellates have chloroplasts that do not contain the pigment peridinin. The pigment composition and the presence of additional membranes suggest that the chloroplasts of these dinoflagellates were derived by tertiary endosymbioses from haptophytes (e.g. Gymnodinium breve; Delwiche 1999), cryptomonads (e.g. Dinophysis acuminata; Schnepf and Elbrachter 1988; Vesk et al. 1996) and prasinophytes (e.g. Lepidodinium viride; Watanabe et al. 1990). In the dinoflagellate Peridinium foliaceum the endosymbiont retains its own nucleus in addition to its chloroplasts, and phylogenetic analyses of nuclear 18S rRNA sequences indicated that the endosymbiont is a diatom (Chesnick et al. 1997).

1.1.3 Heterocapsa triquetra

Heterocapsa triquetra (Figure 1.2) is an armored, biflagellated peridinean dinoflagellate that has a peridinin-containing chloroplast with a single-stalked pyrenoid (Dodge and Crawford 1968). H. triquetra is a member of the genus Heterocapsa, order Peridiniales. Heterocapsa is at the base of the evolutionary tree drawn on the basis of morphological characteristics (Taylor 1990). Phylogenetic analysis of dinoflagellate nuclear 18S rRNA sequences suggested that Heterocapsa might be a derived dinoflagellate (Saunders et al. 1997). However, 18S rRNA trees containing more taxa than Saunders' tree suggested that the position of Heterocapsa within the dinoflagellates is unresolved (Saldarriaga et al. 2000, in prep.).

Figure 1.2 Axenic Heterocapsa triquetra (CCMP 449) (courtesy of Dr. Ishida). (a) Ventral view. (b) Dorsal view.
1.1.4 Alveolates

The alveolates, or infrakingdom Alveolata, consist of three phyla: Dinozoa, Ciliophora and Sporozoa (Apicomplexa) (Cavalier-Smith 1993). The key character shared by the three phyla is the presence of cortical alveoli, or derivatives of cortical alveoli, beneath the cell membrane (Figure 1.1b). The cortical alveoli are called amphiesmal vesicles in dinoflagellates and alveoli in ciliates. Phylogenetic analyses of nuclear 18S rRNA showed a monophyletic group consisting of dinoflagellates, ciliates and sporozoans, supporting the idea that the dinoflagellates are closely related to ciliates and sporozoans (Gajadhar et al. 1991; Cavalier-Smith 1993).

Almost half of the living dinoflagellates are photosynthetic, whereas no ciliate or sporozoan is photosynthetic. Nevertheless, a 35 kb plastid genome containing a number of genes was found in the sporozoan Plasmodium falciparum (Gardner et al. 1988), although none of these genes encodes a photosynthetic protein (Wilson et al. 1996). A similar plastid genome was found in Toxoplasma gondii and other sporozoan species (Kohler et al. 1997). Sporozoan plastids have an envelope of four membranes, whereas dinoflagellate chloroplasts have an envelope of three membranes. This suggests that the plastids of Sporozoa and dinoflagellates followed two different evolutionary pathways. More data are needed to understand the origin and evolution of the plastids in Sporozoa and dinoflagellates. There is no report of a relic chloroplast or chloroplast genes in ciliates. However, chloroplast-like genes were found in the mitochondrial genome of the ciliate Paramecium, although it was not clear whether those genes are transcribed or have functional products (Pritchard et al. 1989).

1.2 Chloroplast characters of major algal groups

1.2.1 Pigments

The traditional classification of the algal groups was based on their pigment content.
This resulted in three main taxonomic groups: the Rhodophyta (red algae) with chlorophyll a and phycobilisomes, the Chlorophyta (green algae) with chlorophyll a and b, and the Chromophyta (colored algae) with chlorophyll a and c and xanthophylls such as fucoxanthin, vaucheriaxanthin and peridinin (Sze 1993). Chlorophyll a and b are also used by the chloroplasts of terrestrial plants and chlorarachniophytes. This classification does not fit the large diversity of the algal world very well, since some algae cannot be placed in any one of the three groups. Cavalier-Smith (1993) therefore grouped all chlorophyll c-containing algae except the dinoflagellates into the Chromista (Cryptophyta, Heterokonta and Haptophyta).

The characteristics of the algal taxa, classified by their chlorophyll type and plastid structure, are shown in Table 1.1 (Durnford 1995, Ph.D. thesis).

Table 1.1 Characteristics of the major algal groups

Taxon | Antenna Chl | Chlp membs | Alternate name
Glaucophyta | a, pc, apc | 2 | Glaucophyceae
Rhodophyta | a, pc, pe, apc | 2 | Red algae
Chlorophyta | a, b | 2 | Green algae
Dinophyta | a, c2 | 3 | Dinoflagellates
Euglenophyta | a, b | 3 | Euglenoids
Chlorarachniophyta | a, b | 4 | Green amoeba (nm)
Cryptophyta | a, c1, c2, pe, pc | 4 | Cryptomonads (nm)
Heterokonta | a, c1, c2, c3 | 4 | Heterokonts
Haptophyta | a, c1, c2, c3 | 4 | Haptophytes

Abbreviations: Chl, chlorophyll; a, chlorophyll a; b, chlorophyll b; c1-3, chlorophyll c1-3; pc, phycocyanin; pe, phycoerythrin; apc, allophycocyanin; pec, phycoerythrocyanin; Chlp membs, chloroplast membranes; na, not applicable; nm, nucleomorph.

1.2.2 Chloroplast structure

Chloroplasts are photosynthetic organelles that have their own genome and the entire machinery essential for photosynthesis and gene expression (Sugiura 1995; Reith 1995). Chloroplast structure differs among terrestrial plants and the algal groups.
The chloroplasts of red algae, green algae and higher plants have an envelope of two membranes, whereas the chloroplasts of chromists (haptophytes, heterokonts and cryptophytes) and chlorarachniophytes have an envelope of four membranes. The chloroplasts of red algae have separate thylakoid membranes, with phycobilisomes on the surface of the thylakoid membranes. In the chloroplasts of green algae and higher plants, the thylakoid membranes are arranged as grana and thylakoid lamellae. In the chloroplasts of haptophytes and heterokonts the thylakoids appear in groups of three, whereas in cryptomonads they appear in groups of two (Sze 1993). The chloroplasts of cryptophytes and chlorarachniophytes are unique in having, between the outer two membranes and the inner two membranes, a small nucleus (nucleomorph) believed to be the relic of the original photosynthetic eukaryotic symbiont that was engulfed by a eukaryotic host (McFadden 1990).

Dinoflagellate and euglenoid chloroplasts differ from all others in having an envelope of three membranes, and dinoflagellate chloroplasts have the unique pigment peridinin as well as chlorophyll c, neither of which is present in euglenoids (Table 1.1). The chloroplast thylakoids of dinoflagellates are arranged in groups of three, while those of euglenoids are arranged in groups of two or three.

1.3 Chloroplast genomes

1.3.1 Physical properties of chloroplast DNA

The demonstration of unique DNA in the chloroplasts of Chlamydomonas (Sager et al. 1963) led to intensive studies of chloroplast genomes in photosynthetic organisms. Chloroplast DNA was first isolated as circular molecules from Euglena (Manning et al. 1971), and the first procedure for isolating a high proportion of intact molecules was developed in higher plants using CsCl gradient centrifugation (Kolodner and Tewari 1975). The basic approach for isolating intact chloroplast DNA is to break the cells mechanically, isolate intact chloroplasts from the homogenate by differential centrifugation, and wash the chloroplasts in Tris/EDTA solution to remove contaminating nuclear DNA. The chloroplasts are then treated with detergent and proteolytic enzymes (pronase or proteinase K), and the chloroplast DNA is purified by centrifugation of the lysate on a CsCl/ethidium bromide gradient. Two DNA fractions are obtained from the gradient: covalently closed circular DNA, which is ideal for further experimental purposes such as cloning and sequencing, and a "main band" DNA, which is a mixture of linear and relaxed chloroplast DNA and nuclear DNA (Herrmann 1982).

The chloroplast DNA of the majority of algae and higher plants is AT-rich (~70%). The bis-benzimide dye Hoechst 33258, which binds preferentially to AT-rich DNA, has been widely used for separating chloroplast DNA from total genomic DNA in algae (Chesnick and Cattolico 1993; Douglas 1988) and higher plants (Szeto et al. 1981; Rogers et al. 1988).

Circular chloroplast genomes are usually approximately 120 to 200 kb in size and contain 100-250 genes and open reading frames (Downie and Palmer 1992; Reardon and Price 1995; Sugiura et al. 1998; Turmel et al. 1999). The population of chloroplast DNA molecules in a given species is generally homogeneous. When the chloroplast DNA of a given species is digested with a restriction endonuclease and electrophoresed on an agarose gel, it gives a band pattern consisting of DNA fragments of various sizes. Restriction endonucleases have been used to map chloroplast genomes (Bedrook and Bogorad 1976) and to determine their size.
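The size determination mentioned above rests on simple arithmetic: cutting a circular molecule at n sites gives n fragments whose lengths add up to the length of the circle. The following is a minimal sketch of that idea, not a method taken from this thesis. The function names, the genome length and the cut positions are all hypothetical and chosen only for illustration.

```python
def circular_fragment_lengths(genome_length, cut_positions):
    """Fragment lengths produced by cutting a circular molecule.

    genome_length and cut_positions are hypothetical inputs; positions are
    taken modulo the genome length because the molecule is circular.
    """
    cuts = sorted(set(p % genome_length for p in cut_positions))
    if not cuts:
        return [genome_length]  # an uncut circle remains one molecule
    # Each fragment runs from one cut site to the next one.
    fragments = [b - a for a, b in zip(cuts, cuts[1:])]
    # The last fragment wraps around the arbitrary origin of the circle.
    fragments.append(genome_length - cuts[-1] + cuts[0])
    return sorted(fragments, reverse=True)


def estimate_genome_size(fragment_lengths):
    """Estimate the size of the molecule as the sum of its fragments."""
    return sum(fragment_lengths)


# Hypothetical 150 kb circle with four recognition sites.
fragments = circular_fragment_lengths(150_000, [10_000, 42_000, 95_000, 130_000])
print(fragments)
print(estimate_genome_size(fragments))
```

On a real gel, fragments of equal length migrate together as a single band, so summing the visible bands can underestimate the true size unless band intensities are taken into account.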
1.3.2 Genome structure

Substantial progress has been made in sequencing chloroplast genomes in the past four years, and nine more chloroplast genomes of different species have been sequenced (Table 1.2). Comparisons among the chloroplast genome sequences indicated that the similarities are far more abundant than the differences (Reardon and Price 1995).

A notable feature of the chloroplast genomes of most plants and algae is the presence of two duplicate regions in reverse orientation, known as the inverted repeats (IR). The inverted repeats usually range from 6 to 76 kb in length (Palmer 1985; Reardon and Price 1995) and contain genes related to gene expression, such as ribosomal RNA (rRNA), transfer RNA (tRNA) and ribosomal protein genes. The inverted repeats separate the chloroplast genome into a large single-copy region (LSC) and a small single-copy region (SSC). Changes in the length of the inverted repeats are the major cause of size variation among chloroplast genomes. The chloroplast genome of the black pine Pinus thunbergii has two greatly reduced inverted repeats of 495 bp, presumably due to incomplete loss of the ancestral large inverted repeats (Tsudzuki et al. 1992; Wakasugi et al. 1994). The residual inverted repeats contain only one tRNA gene (trnI) and the 3' part (83 bp) of psbA; the rRNA genes are in the small single-copy region (SSC). Although the inverted repeats are present in the chloroplast genomes of most land plants, the chloroplast genomes of pea, broad bean and alfalfa have only one copy of the corresponding region (Koller et al. 1980). It was suggested that the two inverted repeats were present in the common ancestor of land plants, and that one inverted repeat was lost during evolution in some legume species (Palmer 1985).

Table 1.2 Chloroplast genomes completely sequenced

No. | Species | Gene No. | Size (bp) | (A+T)% | Accession | Reference
Algae
1 | Cyanophora paradoxa* | 150 | 135,599 | 69.5 | U30821 | Stirewalt et al. 1995
2 | Cyanidium caldarium | 189 | 164,921 | 67.3 | AF022186 | Gleockner et al. 1999
3 | Porphyra purpurea* | 179 | 191,028 | 67 | U38804 | Reith and Munholland 1995
4 | Odontella sinensis* | 132 | 119,704 | 68.2 | Z67753 | Kowallik et al. 1995
5 | Guillardia theta | 144 | 121,524 | 67 | AF041468 | Douglas et al. 1999
6 | Euglena gracilis* | 93 | 143,172 | 73 | X70810 | Hallick et al. 1993
7 | Chlorella vulgaris | 105 | 150,613 | 68.4 | AB001684 | Wakasugi et al. 1997
8 | Nephroselmis olivacea | 123 | 200,799 | 57.9 | AF137379 | Turmel et al. 1999
9 | Mesostigma viride | 126 | 118,360 | 69.9 | NC002186 | Lemieux et al. 2000
Sporozoa
10 | Plasmodium falciparum | 49 | 34,682 | 86.9 | X95275/6 | Wilson et al. 1996
11 | Toxoplasma gondii | 55 | 34,996 | 78.6 | U87145 | Kissinger et al. 1999
Plants
12 | Marchantia polymorpha* | 92 | 121,024 | 71.2 | X04465 | Ohyama et al. 1986
13 | Pinus thunbergii* | 103 | 119,707 | 61.5 | D17510 | Wakasugi et al. 1994
14 | Nicotiana tabacum* | 106 | 155,939 | 62.2 | Z00044 | Shinozaki et al. 1986
15 | Arabidopsis thaliana | 111 | 154,478 | 63.7 | NC000932 | Sata et al. 1999
16 | Spinacia oleracea | 110 | 150,725 | 63.2 | NC002202 | Schmitz-Linneweber et al. 2000
17 | Oryza sativa* | 92 | 134,525 | 61 | X15901 | Hiratsuka et al. 1989
18 | Zea mays* | 112 | 140,387 | 61.5 | X86563 | Maier et al. 1995
19 | Epifagus virginiana* | 45 | 70,028 | 64 | M81884 | Wolfe et al. 1992

*, chloroplast genomes that had been completely sequenced when this project started. The number of chloroplast genes does not include ORFs, ycfs and pseudogenes. Each duplicate gene in the IR region is counted only once. The inverted repeats are present in all the chloroplast genomes listed except Porphyra, Euglena and Chlorella.

Like the chloroplast DNA of land plants, the chloroplast genomes of the green algae Mesostigma viride (Lemieux et al. 2000) and Nephroselmis olivacea (Turmel et al. 1999) have inverted repeats. However, the chloroplast genome of the green alga Chlorella vulgaris has lost one copy of the two inverted repeats (Wakasugi et al. 1997).
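The inverted-repeat layout described in this section can be made concrete with a short sketch. It assumes the usual quadripartite arrangement, in which the genome length is LSC + SSC + 2 x IR and one repeat copy is the reverse complement of the other when both are read on the same strand. The function names and the toy sequences are hypothetical; only the Pinus thunbergii figures (a 119,707 bp genome with 495 bp repeats) are taken from the text and Table 1.2.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_inverted_repeat(copy_a, copy_b):
    """True if copy_b, read on the same strand, is copy_a inverted."""
    return reverse_complement(copy_a) == copy_b.upper()


def genome_size(lsc, ssc, ir):
    """Genome length for a quadripartite layout: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


def single_copy_total(genome, ir):
    """Combined length of LSC and SSC, given the genome and IR lengths."""
    return genome - 2 * ir


# Toy sequences (hypothetical) showing inverted versus direct repeats.
print(is_inverted_repeat("GATTACA", "TGTAATC"))
print(is_inverted_repeat("GATTACA", "GATTACA"))

# Pinus thunbergii, using the genome and repeat sizes quoted in the text.
print(single_copy_total(119_707, 495))
```

Because each repeat copy counts twice in the total, a change of x bp in the repeat length changes the genome length by 2x bp, which is why expansion or contraction of the repeats has such a large effect on genome size.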
The inverted repeats were rearranged and partially duplicated in the chloroplast genome of Euglena gracilis as a tandem array of three complete and one partial ribosomal RNA gene clusters (Hallick et al. 1993). The chloroplast genomes of the non-green algae Odontella sinensis (Kowallik et al. 1995) and Cyanophora paradoxa (Stirewalt et al. 1995) also contain the two inverted repeats. In the chloroplast genome of the red alga Porphyra purpurea, the two repeats appear as direct repeats rather than inverted repeats, and contain only rRNA genes. The two direct repeats are not identical, with 41 bp difference out of 4,280 (0.85 percent) positions (Reith and Munholland 1995). The inverted repeats of the plastid genome of the cryptomonad Guillardia theta are similar to the repeats of Porphyra in size (4.9 kb) and in gene content (only rRNA genes) (Douglas and Penny 1999). This is consistent with the hypothesis that the chloroplast of Guillardia was derived from a red alga (Douglas et al. 1991).

Another important feature of the chloroplast genomes is that some functionally related chloroplast genes appear as clusters that are conserved among the chloroplast genomes of different species. The number of genes present in each cluster differs between species (Stoebe and Kowallik 1999). Clusters of the psaA and psaB genes, the ribosomal protein genes and the rRNA genes are a few examples, and will be discussed in section 1.3.3.

The organization of the relic plastid genomes of the sporozoans Plasmodium and Toxoplasma, as well as that of the chloroplast genome of the nonphotosynthetic parasitic plant Epifagus virginiana, is similar to that of other organisms with respect to the inverted repeats and gene clusters. The plastid genomes of Plasmodium, Toxoplasma and Epifagus all have the two inverted repeats, the ribosomal RNA gene cluster and the ribosomal protein gene cluster.
However, the plastid genomes of Plasmodium (35 kb), Toxoplasma (35 kb) and Epifagus (70 kb) are much smaller than the chloroplast genomes (120 to 200 kb) of algae and higher plants (Wilson et al. 1997; Kissinger et al. 1999; Wolfe et al. 1994). The relic plastid genomes have lost all the photosynthetic genes, and have fewer expression system genes (see 1.3.3).

Chloroplast genomes in land plants each have a few introns. The chloroplast genomes of the red alga Porphyra and the diatom Odontella do not have any introns, and that of the glaucophyte Cyanophora contains one intron in the trnL (UAA) gene. The chloroplast genome of Euglena has 149 introns that represent 38% of the whole genome.

1.3.3 Gene content

The nineteen completely sequenced chloroplast genomes vary in genome size and gene content (Table 1.2). Chloroplast genomes contain genes whose products function mainly in two processes: photosynthesis and gene expression (Table 1.3). The number of genes with known functions varies from 45 in the nonphotosynthetic parasitic plant Epifagus virginiana to 189 in the red alga Cyanidium caldarium. The plastid genomes of P. falciparum, T. gondii and E. virginiana contain far fewer genes than those of algae and higher plants (Table 1.2). The following description is based on the nineteen completely sequenced chloroplast genomes; the references are given in Table 1.2 unless otherwise specified.

Expression system genes

Ribosomal RNA genes

Two ribosomal RNA genes, the 16S and 23S rRNA genes, are present in all the chloroplast genomes and are associated with the 30S and 50S subunits of the chloroplast 70S ribosomes respectively. The 4.5S and 5S rRNA genes, which encode small rRNA molecules, are also present in the chloroplast genomes of land plants, including the parasitic plant E. virginiana. Of these small rRNA genes, only the 5S rRNA gene is present in the chloroplast genomes of algae, and no small rRNA gene is present in the sporozoan plastid genomes (Table 1.3).
rRNA genes usually appear as a cluster in the inverted repeats, or in the derivatives of the inverted repeats in Porphyra, Euglena and Chlorella, and are organized as 16S-23S-4.5S-5S in land plants, 16S-23S-5S in algae and 16S-23S in sporozoans.

Transfer RNA genes

The chloroplast genomes of photosynthetic species contain at least 30 transfer RNA (tRNA) genes, except that the chloroplast genome of Odontella has 29 tRNA genes (Table 1.3, 1.4). These tRNA species are enough for all 20 amino acids in protein synthesis. The nonphotosynthetic plant Epifagus has 21 tRNA genes for 14 different amino acids, and lacks tRNA species for 6 amino acids: alanine, cysteine, glycine, lysine, threonine and valine (Wolfe et al. 1992). Plasmodium and Toxoplasma have 25 and 33 tRNA genes respectively, which encode tRNAs for 20 amino acids and are enough for translation of the protein genes on their plastid genomes (Wilson et al. 1996). tRNA genes are scattered all over the chloroplast genomes.

Ribosomal protein genes

Chloroplast ribosomes contain about 60 ribosomal proteins. The chloroplast genomes have various numbers of ribosomal protein genes, from 16 genes in Toxoplasma to 47 genes in Cyanidium (Table 1.3, 1.4). Some ribosomal protein genes appear as a cluster. The ribosomal protein gene cluster is present in all the chloroplast genomes, including that of the parasitic plant Epifagus and those of the sporozoans Plasmodium and Toxoplasma. The organization of the ribosomal protein cluster is conserved among different species, but the number of genes in the cluster differs between species. For example, the ribosomal protein cluster of Toxoplasma is the smallest and consists of 9 genes, while the cluster of Porphyra is the largest and has 29 genes including rpoA and tufA.

Table 1.3 Chloroplast genes of completely sequenced chloroplast genomes

Expression system genes
Gene        Product                            Notes
23S rDNA    23S rRNA
16S rDNA    16S rRNA
5S rDNA     5S rRNA                            in algae, land plants
4.5S rDNA   4.5S rRNA                          in land plants
trnA-UGC    Ala-tRNA (UGC)
trnR-ACG    Arg-tRNA (ACG)
trnR-UCU    Arg-tRNA (UCU)
trnR-CCG    Arg-tRNA (CCG)
trnN-GUU    Asn-tRNA (GUU)
trnD-GUC    Asp-tRNA (GUC)
trnC-GCA    Cys-tRNA (GCA)
trnQ-UUG    Gln-tRNA (UUG)
trnE-UUC    Glu-tRNA (UUC)
trnG-GCC    Gly-tRNA (GCC)
trnG-UCC    Gly-tRNA (UCC)
trnH-GUG    His-tRNA (GUG)
trnI-GAU    Ile-tRNA (GAU)
trnI-CAU    Ile-tRNA (CAU)
trnL-UAA    Leu-tRNA (UAA)
trnL-CAA    Leu-tRNA (CAA)
trnL-UAG    Leu-tRNA (UAG)
trnK-UUU    Lys-tRNA (UUU)
trnfM-CAU   fMet-tRNA (CAU)
trnM-CAU    Met-tRNA (CAU)
trnF-GAA    Phe-tRNA (GAA)
trnP-UGG    Pro-tRNA (UGG)
trnS-GGA    Ser-tRNA (GGA)
trnS-UGA    Ser-tRNA (UGA)
trnS-GCU    Ser-tRNA (GCU)
trnT-GGU    Thr-tRNA (GGU)
trnT-UGU    Thr-tRNA (UGU)
trnW-CCA    Trp-tRNA (CCA)
trnY-GUA    Tyr-tRNA (GUA)
trnV-GAC    Val-tRNA (GAC)
trnV-UAC    Val-tRNA (UAC)
rps1        30S r-protein CS1                  in Porphyra
rps2        CS2
rps3        CS3
rps4        CS4
rps5        CS5                                in 1-5,10,11
rps6        CS6                                in 1-5
rps7        CS7
rps8        CS8                                not in Toxoplasma
rps9        CS9                                in algae
rps10       CS10                               in 1-5
rps11       CS11
rps12       CS12
rps13       CS13                               in 1-5
rps14       CS14                               not in 10,11
rps15       CS15                               not in 1-8,10,11,19
rps16       CS16                               not in 6-8,10-13,19
rps17       CS17                               in 1-5,10,11
rps18       CS18                               not in 10,11
rps19       CS19                               not in 16
rps20       CS20                               in 1,3,4,5
rpl1        50S r-protein CL1                  in 1-5
rpl2        CL2                                not in 12, pseudo in 19
rpl3        CL3                                in 1-5
rpl4        CL4                                in 2-5,10,11
rpl5        CL5                                in algae
rpl6        CL6                                in 1-5,10,11
rpl7        CL7                                in 1
rpl9        CL9                                in 3
rpl11       CL11                               in 1-5,11
rpl12       CL12                               in 2-8,19
rpl13       CL13                               in 2-5
rpl14       CL14                               pseudo in 19
rpl16       CL16
rpl18       CL18                               in 1-5
rpl19       CL19                               in 1-5,7-9
rpl20       CL20                               not in 10,11
rpl21       CL21                               in 1-5,12
rpl22       CL22                               not in 7,8,10,11,19
rpl23       CL23                               not in 1,11,16, pseudo in 19
rpl24       CL24                               in 2-5
rpl27       CL27                               in 2-5
rpl28       CL28                               in 1-3
rpl29       CL29                               in 2-5
rpl31       CL31                               in 2-5
rpl32       CL32                               in 1,10-12,14,17,19
rpl33       CL33                               not in 6-8,10,11
rpl34       CL34                               in 1-5
rpl35       CL35                               in 1-5
rpl36       CL36                               not in 12,14
rpoA        RNA polymerase subunit alpha       not in 6,10,11, pseudo in 19
rpoB        RNA polymerase subunit beta        not in 19
rpoC1       RNA polymerase subunit beta'       not in 19
rpoC2       RNA polymerase subunit beta"       not in 11,19
tufA        elongation factor Tu               not in land plants
tsf         elongation factor Ts               in 2,3,5
infA        initiation factor 1                in 7,8,9,12,13,16,17,18,19
infB        initiation factor 2                in 3,5
infC        initiation factor 3                in 2,3

Photosystem genes
Gene        Product                            Notes
rbcL        Rubisco large subunit              pseudogene in 19
rbcS        Rubisco small subunit              in 1-5
psaA        PSI P700 chl a apoprotein A1
psaB        PSI P700 chl a apoprotein A2
psaC        PSI iron-sulfur center
psaD        PSI reaction center subunit        in 2-5
psaE        PSI reaction center subunit        in 1-5
psaF        PSI reaction center subunit        in 1-5
psaI        PSI reaction center subunit        in 1-5,7-9,13-16,18
psaJ        PSI reaction center subunit        in 1-5,7-9,13-16,18
psaK        PSI reaction center subunit        in 2,3,5
psaL        PSI reaction center subunit        in 2-5
psaM        PSI reaction center subunit        in 1-7,9,13
psbA        PSII D1 protein                    pseudogene in 19
psbB        PSII CP-47 apoprotein              pseudogene in 19
psbC        PSII CP-43 apoprotein
psbD        PSII D2 protein
psbE        Cytochrome b559 alpha
psbF        Cytochrome b559 beta
psbH        PSII 10 kDa phosphoprotein         not in 12
psbI        PSII I-protein (4.8 kDa)           not in 12,14
psbJ        PSII J-protein                     not in 12,17,18
psbK        PSII K-protein                     not in 12,14
psbL        PSII L-protein                     not in 12
psbM        PSII M-protein                     not in 2-6,12,16,17
psbN        PSII N-protein                     not in 12
psbT        PSII T-protein (3 kDa)             not in 6,12,17,18
psbV        cytochrome c550                    in 1-5
psbW        PSII W-protein (13 kDa)            in 1-5
psbX        PSII X-protein (4.1 kDa)           in 1,3-5
petA        b6f complex apocytochrome f        not in 6
petB        b6f complex apocytochrome b6
petD        b6f complex subunit IV             not in 6
petF        Ferredoxin                         in 1-5
petG        b6f complex subunit V              not in 12
petJ        b6f complex cytochrome c553        in 2,3
petL        b6f complex subunit VI (3.5 kDa)   in 1,7,8,9,14
petX        b6f complex subunit                in 1
atpA        ATP synthase subunit CF1 alpha     pseudogene in 19
atpB        ATP synthase subunit CF1 beta      pseudogene in 19
atpD        ATP synthase subunit CF1 delta     in 1-5
atpE        ATP synthase subunit CF1 epsilon
atpF        ATP synthase subunit CF0 I
atpG        ATP synthase subunit CF0 II        in 1-5
atpH        ATP synthase subunit CF0 III
atpI        ATP synthase subunit CF0 IV        not in 1
ndhA        NADH dehydrogenase chain 1         in 8,9,14-18
ndhB        NADH dehydrogenase chain 2         in 8,9,14-18
ndhC        NADH dehydrogenase chain 3         in 8,9,14-18
ndhD        NADH dehydrogenase chain 4         in 8,9,14-18
ndhE        NADH dehydrogenase chain 4L        in 8,9,14-18
ndhF        NADH dehydrogenase chain 5         in 8,9,14-18
ndhG        NADH dehydrogenase chain 6         in 8,9,14-18
ndhH        NADH dehydrogenase subunit         in 8,9,14-18
ndhI        NADH dehydrogenase chain I         in 8,9,14,15,16,18
ndhJ        NADH dehydrogenase chain J         in 9,14,15,16,18
ndhK        NADH dehydrogenase chain K         in 8,9,14,16,18

The numbers in the "Notes" column refer to the completely sequenced chloroplast genomes: 1, Cyanophora paradoxa; 2, Cyanidium caldarium; 3, Porphyra purpurea; 4, Odontella sinensis; 5, Guillardia theta; 6, Euglena gracilis; 7, Chlorella vulgaris; 8, Nephroselmis olivacea; 9, Mesostigma viride; 10, Plasmodium falciparum; 11, Toxoplasma gondii; 12, Marchantia polymorpha; 13, Pinus thunbergii; 14, Nicotiana tabacum; 15, Arabidopsis thaliana; 16, Spinacia oleracea; 17, Oryza sativa; 18, Zea mays; 19, Epifagus virginiana. For genes other than tRNA genes, the Notes column indicates the genomes in which a gene is present or absent; a blank entry means that the gene is present in every genome. Abbreviations: r-protein, ribosomal protein. PSI, photosystem I. PSII, photosystem II. Cyt, cytochrome. Chl, chlorophyll.
Translation factor genes

Two elongation factor genes, tufA that encodes elongation factor Tu (EF-Tu), and tsf that encodes elongation factor Ts (EF-Ts), are present in the chloroplast genomes of the red algae (Porphyra and Cyanidium) and the cryptomonad Guillardia. The tufA gene is also present in the chloroplast genomes of Cyanophora, Odontella, Euglena, the green algae (Chlorella, Nephroselmis and Mesostigma) and the Sporozoa (Plasmodium and Toxoplasma). The tufA gene is not present in the chloroplast genomes of land plants, and is known to have been transferred to the nucleus (Baldauf and Palmer 1990a; Baldauf et al. 1990b).

Other translation-related genes in the chloroplast genomes are infA, infB and infC, which encode initiation factors 1, 2 and 3 respectively. The land plants Marchantia, Pinus, Spinacia, Zea, Oryza and Epifagus, and the green algae Chlorella, Nephroselmis and Mesostigma have the infA gene, the cryptomonad Guillardia has the infB gene, and the red algae Porphyra and Cyanidium have two initiation factor genes, infB and infC. The sporozoans Plasmodium and Toxoplasma, the algae Cyanophora, Odontella and Euglena, and the higher plants Arabidopsis and Nicotiana have no initiation factor gene.

RNA polymerase genes

Four RNA polymerase genes, rpoA, rpoB, rpoC1 and rpoC2, which encode RNA polymerase subunits α, β, β′ and β″ respectively, are present in the chloroplast genomes of algae (except Euglena) and land plants (except Epifagus). Epifagus has only a pseudo rpoA gene, which is presumably nonfunctional. Euglena and the sporozoan Plasmodium have rpoB, rpoC1 and rpoC2 but lack rpoA. Similarly, Toxoplasma has rpoB and rpoC1 but does not have rpoA and rpoC2. rpoB, rpoC1 and rpoC2 form a cluster, rpoB/C1/C2, while rpoA is in the ribosomal protein cluster.

Table 1.4 Number of genes on the sequenced chloroplast genomes

Columns 1-7 are genes for expression, columns 8-13 are genes for the photosynthetic system: (1) rRNA, (2) tRNA, (3) r-protein, (4) tufA/tsf [note 1], (5) inf [note 2], (6) rpo, (7) dnaB/rne [note 3], (8) PSI, (9) PSII, (10) Cyt b6f, (11) ATPase, (12) NADH, (13) Rubisco, (14) Others, (15) ORFs & ycfs.

No.  Species                 (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14) (15)
Algae
1    Cyanophora paradoxa      3  36  37   1   0   4   0   8  17    7    7    0    2   28   32
2    Cyanidium caldarium      3  30  44   2   1   4   1  11  15    6    8    0    2   62   41
3    Porphyra purpurea        3  37  47   2   2   4   2  11  16    6    8    0    2   39   62
4    Odontella sinensis       3  29  44   1   0   4   1  10  16    5    8    0    2    9   38
5    Guillardia theta         3  30  44   2   1   4   2  11  16    5    8    0    2   16   28
6    Euglena gracilis         3  39  21   1   0   3   0   4  12    2    6    0    1    1    8
7    Chlorella vulgaris       3  33  21   1   1   4   0   6  14    5    6    0    1   10   57
8    Nephroselmis olivacea    3  38  21   1   1   4   1   5  14    5    6   10    1   13    9
9    Mesostigma viride        3  37  24   1   1   4   0   6  14    5    6   11    1   13   13
Sporozoa
10   Plasmodium falciparum    2  25  17   1   0   3   0   0   0    0    0    0    0    1    8
11   Toxoplasma gondii        2  33  16   1   0   2   0   0   0    0    0    0    0    1    6
Plants
12   Marchantia polymorpha    4  35  19   0   1   4   0   3   6    3    6    8    1    2   36
13   Pinus thunbergii         4  37  20   0   1   4   0   6  14    4    6    0    1    6   51
14   Nicotiana tabacum        4  36  19   0   0   4   0   5  12    5    6   11    1    3   24
15   Arabidopsis thaliana     4  37  21   0   0   4   0   5  14    4    6   11    1    4   16
16   Spinacia oleracea        4  37  19   0   1   4   0   5  13    5    6   11    1    4   18
17   Oryza sativa             4  29  20   0   1   4   0   3  12    4    6    8    1    0   42
18   Zea mays                 4  39  21   0   1   4   0   5  13    4    6   11    1    3   36
19   Epifagus virginiana      4  21  16   0   1   0   0   0   0    0    0    0    0    3    4

Note: Gene number does not include ORFs (open reading frames), ycfs (ycf is a system of temporary gene designations assigned to conserved reading frames in chloroplast genomes, Hallick & Bairoch 1994), pseudogenes and duplicate genes (e.g., genes in the IR region).
1. Cyanidium, Porphyra and Guillardia have both tufA and tsf; the other species have only tufA.
2. Cyanidium has infC, Porphyra has infB and infC, Guillardia has infB and the rest have infA.
3. Porphyra and Guillardia have rne (encodes RNase E) and dnaB (encodes replication helicase subunit).
Photosynthetic system genes

Since the parasitic plant Epifagus and the sporozoans Plasmodium and Toxoplasma are not photosynthetic and have lost all their photosynthetic genes, the following description is based on the completely sequenced chloroplast genomes of the photosynthetic species unless otherwise specified.

Photosystem II genes

The thylakoid membranes have four major complexes: photosystem I (PSI), photosystem II (PSII), the cytochrome b6f complex and ATP synthase. Seventeen components of PSII are encoded in the chloroplast genomes (Table 1.3). The number and the type of photosystem II genes differ between species (Table 1.3, 1.4). Six of the 17 genes, psbA, psbB, psbC, psbD, psbE and psbF, are present in all the chloroplast genomes sequenced. Marchantia has only these six genes, the minimum number of PSII genes present in the known chloroplast genomes (Table 1.4). The chloroplast genome of Cyanophora is the only one that contains all 17 photosystem II genes. Porphyra, Odontella and Guillardia have 16 photosystem II genes, and lack only psbM. The cluster psbC/psbD is present in all the chloroplast genomes of the photosynthetic species. Epifagus has two pseudogenes, ψpsbA and ψpsbB.

Photosystem I genes

Eleven photosystem I genes, psaA, psaB, psaC, psaD, psaE, psaF, psaI, psaJ, psaK, psaL and psaM, which encode various components of photosystem I, are present in the chloroplast genomes (Table 1.3). psaA and psaB encode the reaction center protein P700 chlorophyll a apoprotein subunits A1 and A2 respectively. psaA and psaB are present in all the chloroplast genomes and appear as a cluster. The psaC gene encodes the iron-sulfur center of photosystem I and is also present in all the sequenced chloroplast genomes. The chloroplast genomes of Marchantia and Oryza have only three PSI genes, psaA, psaB and psaC, the minimum number of photosystem I genes found in the chloroplast genomes (Table 1.4).
The red algae Cyanidium and Porphyra and the cryptomonad Guillardia have all eleven genes, and Odontella has ten genes (it lacks only psaK). psaD is present only in the chloroplast genomes of the non-green algae Cyanidium, Porphyra, Odontella and Guillardia, but is in the nucleus in Cyanophora, the green algae and the land plants.

Cytochrome b6f complex genes

Eight genes, petA, petB, petD, petF, petG, petJ, petL and petX, for components of the cytochrome b6f complex, are present in the chloroplast genomes (Table 1.3). petA, petB, petD and petG are in all the genomes, except that Euglena has only petB and petG, and Marchantia has only petA, petB and petD. The chloroplast genome of Cyanophora has seven genes, the maximum number present in a chloroplast genome, and lacks only petJ. petB and petD cluster with psbB and psbH in the green lineage.

ATP synthase genes

There are two components to the ATP synthase: CF1 and CF0. CF1 consists of five different subunits (alpha, beta, gamma, delta and epsilon). The chloroplast genes atpA, atpB, atpD and atpE encode the subunits alpha, beta, delta and epsilon respectively. atpC (encoding the subunit gamma) is always in the nucleus. CF0 consists of subunits I, II, III and IV, which are encoded by the genes atpF, atpG, atpH and atpI respectively (Table 1.3). The non-green algae Cyanidium, Porphyra, Odontella and Guillardia have all eight genes. Cyanophora has seven ATP synthase genes (it lacks atpI), while all the other species have six chloroplast genes (they lack atpD and atpG). ATP synthase genes tend to form clusters, appearing as atpB/atpE in all the species, atpA/atpD/atpF/atpG/atpH in Cyanophora, atpA/atpD/atpF/atpG/atpH/atpI in the non-green algae, and atpA/atpF/atpH/atpI in Euglena, the green algae and land plants. The plastid genome of Epifagus has two pseudogenes, ψatpA and ψatpB.
ndh genes

Eleven ndh genes, ndhA, ndhB, ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ and ndhK, which encode NADH dehydrogenase subunits, are present in the chloroplast genomes (Table 1.3). The land plants Nicotiana, Spinacia and Zea, and the green alga Mesostigma have all eleven ndh genes; Arabidopsis (lacks ndhK) and the green alga Nephroselmis (lacks ndhJ) have ten ndh genes, Marchantia has 6 ndh genes and Oryza (lacks ndhI, ndhJ and ndhK) has 8 ndh genes (Table 1.3, 1.4). It is interesting that Pinus does not have any ndh genes. Cyanophora, Euglena, the non-green algae Cyanidium, Porphyra, Odontella and Guillardia, and the green alga Chlorella have no ndh genes in their chloroplast genomes. ndh genes tend to form a cluster, which is conserved in higher plants as ndhD/psaC/ndhE/ndhG/ndhI/ndhA/ndhH, and with minor changes in green algae.

Rubisco subunit genes

Ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco) is the major stromal protein in chloroplasts, and consists of eight identical large subunits (L) and eight identical small subunits (S). The large subunit is encoded by the rbcL gene, and the small subunit is encoded by the rbcS gene. The chloroplast genomes of the non-green algae, Cyanophora, Cyanidium, Porphyra, Odontella and Guillardia, have both rbcL and rbcS genes. Euglena, the green algae and land plants have only the rbcL gene in their chloroplast genomes, and rbcS has been transferred to the nucleus. Dinoflagellates have a nuclear-encoded type II Rubisco (Morse et al. 1995; Rowan et al. 1996), probably derived by lateral gene transfer from a proteobacterium (Palmer and Delwiche 1998).

Other genes and ORFs

In addition to the expression system genes and photosynthetic system genes, there are a number of genes with various functions in the chloroplast genomes. The majority of these genes are found only in the chloroplast genomes of non-green algae (Table 1.4).
For example, in Porphyra these genes encode proteins involved in the biosynthesis of phycobilisomes (cpcA, cpcB, cpcG, cpeA, cpeB, pbsA and hemA), amino acids (argB, carA, gltB, ilvB, ilvH, trpA, trpG), fatty acids (accA, accB, accD, acpA, fabH), chlorophyll (chlB, chlI, chlL and chlN), carotenoids (preA) and thiamine (thiG). The chloroplast genome of Porphyra also has genes encoding enzymes involved in the biosynthesis of NAD (nadA) and carotenoids (crtE). It also has genes that encode proteins for nitrogen assimilation (glnB, hesB), redox reactions (trxA, ftrB), protein transport (secA, secY), glycolysis (pgmA), proteases (clpC, ftsH), chaperonins (dnaK, GroEL, GroES), and a cell division protein (ftsW) (Reith 1995). Some of these genes are also found in Cyanophora, Cyanidium, Odontella and Guillardia, but very few of them are in the green algae and land plants. An example is the clpP gene, encoding the protease ATP-binding subunit, which is present in green algae and land plants (except Euglena, Marchantia and Oryza).

All the chloroplast genomes sequenced have various open reading frames (ORFs). Some ORFs are conserved between organisms; for example, ORF 470 was found in the plastid genomes of Plasmodium, Porphyra and Odontella (Wilson et al. 1996).

1.3.4 Phylogenetic implications

Chloroplast genome organization and phylogenetic analyses of chloroplast genes provide some evidence for the relationships among chloroplasts from different organisms. The size, structure, gene content and linear order of genes are conserved in the chloroplast genome among land plants (Palmer 1991; Palmer and Stein 1986) and algae (Sugiura 1995; Reith 1995), suggesting that any change in structure, arrangement, or content of the chloroplast genome may have significant phylogenetic implications (Downie and Palmer 1992).
The organization of chloroplast genes has been used to examine the evolutionary relationships of chloroplasts among different organisms, for example the gene clusters for ATPase subunits (Kowallik 1993) and for the Rubisco large and small subunits (rbcL/S) (Douglas and Durnford 1989). Sequences of individual chloroplast genes have also been used in phylogenetic analyses, such as tufA (elongation factor Tu) (Kohler et al. 1997; Ishida et al. 1997), atpB and 16S rRNA (Douglas and Turner 1991b; Morden et al. 1992; McFadden et al. 1995). Recently, phylogenetic analysis of concatenated sequences of chloroplast proteins provided more reliable information on the origin and evolution of chloroplasts (Martin et al. 1998; Turmel et al. 1999). Phylogenetic analysis of my data will be described in Chapter 7.

1.3.5 The mystery of the dinoflagellate chloroplast genome

Despite evidence that chloroplast DNA in typical dinoflagellates consists of a large number of scattered nucleoids (Coleman 1985) and that dinoflagellate satellite DNA separated on CsCl gradients apparently contains chloroplast genes (Boczar et al. 1991), previous attempts to characterize typical dinoflagellate chloroplast genes were uniformly unsuccessful. Dinoflagellate plastid genomes may be aberrant, as suggested by the discovery that dinoflagellates have a nuclear-encoded type II Rubisco (Morse et al. 1995; Rowan et al. 1996; Palmer 1996), probably derived by lateral gene transfer from a proteobacterium (Palmer and Delwiche 1998), rather than a plastid-encoded type I Rubisco. Form II Rubisco is oxygen sensitive and present in anaerobic proteobacteria, and is thus very different from the chloroplast-encoded form I Rubisco present in all other photosynthetic eukaryotes. However, not a single chloroplast gene sequence from any dinoflagellate had been reported at the time when I started this project.
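The idea of analysing concatenated sequences, mentioned in section 1.3.4, can be illustrated with a minimal sketch. It is not the procedure used in the cited studies or in this thesis; the function name, the taxon labels and the toy alignments are hypothetical. The sketch assumes each gene has already been aligned separately and pads taxa that lack a gene with gap characters.

```python
# Minimal sketch: join per-gene alignments into one concatenated alignment.
# Assumption: each alignment maps taxon name -> aligned sequence, and all
# sequences within one alignment have the same length.

def concatenate_alignments(alignments, missing="-"):
    """Concatenate alignments taxon by taxon, padding absent genes with gaps."""
    taxa = sorted(set().union(*alignments))
    parts = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in one alignment differ in length")
        width = lengths.pop()
        for taxon in taxa:
            # A taxon missing from this gene gets a block of gap characters.
            parts[taxon].append(aln.get(taxon, missing * width))
    return {taxon: "".join(blocks) for taxon, blocks in parts.items()}


if __name__ == "__main__":
    # Hypothetical toy protein alignments, for illustration only.
    gene_one = {"taxonA": "MTA", "taxonB": "MTS"}
    gene_two = {"taxonA": "GK", "taxonC": "GR"}
    for name, seq in concatenate_alignments([gene_one, gene_two]).items():
        print(name, seq)
```

Padding with gaps keeps every taxon in the matrix, at the cost of missing data for genes that a genome lacks; restricting the analysis to genes shared by all taxa is the stricter alternative.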
1.4 Origin and Evolution of chloroplasts

1.4.1 Chloroplast origin: primary and secondary endosymbiosis

It has been generally accepted that plastids arose through one or more endosymbiotic events in which a photosynthetic prokaryote (cyanobacterium) was engulfed by a eukaryotic cell (Gray et al. 1982). Current evidence suggests that all chloroplasts probably originated from a single primary (prokaryotic/eukaryotic) endosymbiotic event (Cavalier-Smith 1995; Reith 1995; Palmer and Delwiche 1998; Moreria et al. 2000; Cavalier-Smith 2000) (Figure 1.3a). The chloroplasts with an envelope of four membranes might have been derived from existing chloroplasts by various independent secondary endosymbioses, in which different photosynthetic algae with two-membrane chloroplasts were engulfed by various eukaryotic hosts (Figure 1.3b). Thus, the four-membrane chloroplasts in chlorarachniophytes, cryptomonads, heterokonts, haptophytes and sporozoans (apicomplexans) might have originated by secondary endosymbioses.

1.4.2 Molecular evidence for secondary endosymbiosis

The strongest evidence supporting secondary endosymbiosis is the work on the cryptomonad Guillardia theta (Douglas et al. 1991) and several chlorarachnions (McFadden et al. 1994; 1995; Ishida et al. 1998). Both Guillardia theta and the chlorarachnions have a nucleomorph (relict nucleus) between the outer two membranes and the inner two membranes, and have two nuclear types of small subunit rRNA gene: one is nucleomorph specific, the other is nucleus specific. The phylogenetic tree of 18S rRNA revealed that the nucleomorph 18S rRNA of the cryptomonad is closely related to the nuclear 18S rRNA of two red algae, Gracilariopsis sp. and Gracilaria tikvahiae, but distantly related to the cryptomonad nuclear 18S rRNA, indicating that a red alga-like organism was the endosymbiont in the secondary endosymbiosis that gave rise to the cryptophytes (Douglas et al. 1991).
Cryptomonad nuclear rRNA did not group with any members of the chromist algae, indicating that the chromophytes may not have arisen from chromist algae by loss of the nucleomorph, but might have originated independently from red alga-like algae, thus supporting the polyphyletic origin of eukaryotic algae. The cryptomonad nuclear 18S rRNA grouped with that of Acanthamoeba castellanii, suggesting that the host involved in the secondary endosymbiosis that produced the cryptophytes was a protozoan. In situ hybridization indicated that one nuclear-type rRNA is expressed by the nucleus and the other by the nucleomorph, also suggesting that the nucleomorph is a foreign genome (Gilson and McFadden 1995). Phylogenetic analyses of both 16S rRNA and rbcL sequences (McFadden et al. 1995) and of elongation factor Tu sequences (Ishida et al. 1997) suggested that the chlorarachnion chloroplast is most closely related to those of green algae and euglenoids, supporting the idea that the chlorarachnion resulted from a secondary endosymbiotic event.

[Figure 1.3 Origin of chloroplast by endosymbiosis. (a) Primary endosymbiosis: cyanobacterium and eukaryote, giving a photosynthetic eukaryote. (b) Secondary endosymbiosis: photosynthetic eukaryote and non-photosynthetic eukaryote, giving a photosynthetic eukaryote. N, nucleus; cp, chloroplast; Nm, nucleomorph.]

1.4.3 Origin of dinoflagellate chloroplasts

The origin of typical peridinin-containing dinoflagellate chloroplasts has been much debated (Cavalier-Smith 1995; Palmer and Delwiche 1998). One view (Gibbs 1981) suggested that they are the result of a secondary endosymbiosis like those of chromists and chlorarachneans, but with the loss of one of the four membranes. Another view suggested that they might have arisen by the same primary endosymbiosis as the red and green algal and higher plant chloroplasts, but with the retention of the host's endocytotic vacuolar membrane (Cavalier-Smith 1982).
Several authors pointed out that molecular data, especially chloroplast rRNA sequences and chloroplast DNA sequences, were needed to help us understand their evolution (Cavalier-Smith 1995; Reardon and Price 1995; Palmer 1996). The obvious difficulty in the field of plastid evolution of dinoflagellates is how to test the two hypotheses. Molecular phylogenetic analyses, which attempt to extract evolutionary information from sequence data, have become the most commonly used tools. However, the number and choice of species used, the correctness of the sequence alignment, and the analytical method can all influence phylogenetic topology. In addition, such analyses only generate phylogenetic trees for an individual gene or protein, and do not necessarily reflect the evolution of an organelle or organism. Only if the analysis of multiple genes or proteins results in similar trees in most cases can one assume a similar evolutionary pattern for the organelle or organism under study (Reith 1995).

1.5 Objectives

The objectives of this study were to investigate the organization of the chloroplast genome of the peridinean dinoflagellate Heterocapsa triquetra, as well as the origin and evolution of dinoflagellate chloroplasts, using the sequences of chloroplast genes. This study involved separation of the chloroplast DNA from Heterocapsa triquetra followed by cloning, sequencing and phylogenetic analysis of the chloroplast genes. There were two reasons to choose Heterocapsa triquetra for this study. First, results from the dinoflagellate chosen should have general implications for the dinoflagellates. Heterocapsa triquetra is a peridinin-containing dinoflagellate, so the organization of the chloroplast genome and the structure of the chloroplast genes of H. triquetra should have general implications for the peridinean dinoflagellates.
Dinoflagellates having non-peridinin-containing chloroplasts have probably been derived from other algal groups by a variety of tertiary endosymbioses (Palmer and Delwiche 1998). Second, a satellite DNA that contained chloroplast genes was separated from the non-axenic Heterocapsa pygmaea (Boczar et al. 1991). I started with Heterocapsa pygmaea and obtained a satellite DNA. However, PCR amplification of the chloroplast 16S rRNA gene from the satellite DNA using chloroplast universal primers gave a bacterial 16S rRNA sequence, indicating that the satellite DNA was contaminated with bacterial DNA. In order to eliminate the contamination with bacterial DNA, I chose axenic Heterocapsa triquetra for this project.

Chapter 2 Materials and methods

2.1 Strains

Dinoflagellate cultures obtained from the Provasoli-Guillard National Center for Culture of Marine Phytoplankton (CCMP) and the North East Pacific Culture Collection (NEPCC) were Heterocapsa triquetra (CCMP 449), Heterocapsa pygmaea (CCMP 1490), Heterocapsa niei (CCMP 447), Amphidinium carterae (CCMP 1314), Heterocapsa rotundata (NEPCC D680, formerly known as Katodinium rotundata) (Hansen 1995), Scrippsiella trochoidea (NEPCC D602), Protoceratium reticulatum (NEPCC D535, formerly known as Gonyaulax grindleyi) (Hansen et al. 1996/7), Prorocentrum micans (NEPCC D443), Thoracosphaera heimii (NEPCC D670), Thecadinium inclinatum (NEPCC D682), Adenoides eludens (NEPCC D683) and Gyrodinium galatheanum (NEPCC D55R).

2.2 Culturing of dinoflagellates

Dinoflagellate strains were cultured in F/2 medium (CCMP) with alternating 12 hour periods of light and dark at 18-20°C in a Sherer growth chamber, on a rotary shaker (Orbit) at 125 rpm. For large scale DNA preparation, dinoflagellates were cultured in 2 l flasks; for DNA minipreps, they were cultured in 250 ml flasks.

2.3 Isolation and purification of chloroplast DNA from Heterocapsa triquetra

The following protocol was based on the method of Boczar et al. (1991) with modifications.
Ten liters of late logarithmic cells of axenic H. triquetra were harvested by centrifugation for 10 minutes at 4,000 rpm in a Sorvall RC-5B superspeed centrifuge with a GSA rotor. An equal volume of glass beads (Ø = 0.5 mm) and 2 volumes of lysis buffer (50 mM Tris-HCl, 100 mM EDTA, 100 mM NaCl, pH 8.0) were added to the cell pellet. Cells were broken by vortexing four times at maximum speed for one minute. SDS (Bio-Rad) was added to give a 2% solution, and proteinase K (Boehringer) was added to a concentration of 300 µg/ml. After incubation in a 50°C water bath for 1 hour with several gentle inversions, three phenol/chloroform extractions were carried out. CsCl (1.0 g/ml) was added to the supernatant, and the final refractive index was adjusted to 1.3990 using an ABBE-3L refractometer (Bausch & Lomb). The intercalating dye bis-benzimide (Hoechst 33258, Sigma) was added to a concentration of 100 µg/ml prior to loading into 5.1 ml ultracentrifuge tubes (Beckman). The DNA-CsCl-Hoechst 33258 solution was centrifuged in a vertical rotor (VTi 80, Beckman) for 24 hours at 220,000 g (55,000 rpm). The satellite DNA and major DNA bands were pooled, and further centrifuged in the same rotor for 20 hours at 220,000 g. Hoechst 33258 was removed by six extractions with isopropanol saturated with CsCl-TE buffer (10 mM Tris-HCl, 1 mM EDTA, pH 8.0). DNA was precipitated by adding 2-2.5 volumes of 95% ethanol to the CsCl-DNA solution, air dried, dissolved in dH2O and stored at -20°C.
2.4 Total DNA isolation
Total DNA was extracted from the other dinoflagellate species using the protocol above. Total DNA of Amphidinium carterae, Prorocentrum micans and Heterocapsa pygmaea was further purified on a CsCl gradient. Glass beads were not used for A. carterae.
Alternatively, in order to save time, total DNA was isolated from small-volume dinoflagellate cultures using the DNeasy Plant Mini Kit (Qiagen) according to the manufacturer's protocol and was used as template for PCR, because total DNA prepared by phenol/chloroform extractions did not work in PCR reactions. The total DNA prepared using the kit worked well in PCR reactions.
2.5 Total RNA isolation from Heterocapsa triquetra
The above method for DNA separation was modified and used for total RNA preparation. Cells were harvested, broken by vortexing with glass beads, and incubated with SDS (Bio-Rad) and proteinase K (Boehringer) at 50°C for 1 hour. After three extractions with phenol/chloroform, total nucleic acids were precipitated with 95% ethanol and dissolved in 6 ml dH2O containing 0.1% diethyl pyrocarbonate (DEPC). RNA was selectively precipitated overnight at 0°C by adding 2 ml of 8 M LiCl (final concentration 2 M) and collected by centrifugation for 20 minutes at 8,000 rpm at 4°C, and the pellet was washed with 2 M LiCl. The RNA pellet was air dried and dissolved in 500 µl DEPC-dH2O (Sambrook et al. 1989).
2.6 Southern blot
DNA (1-2 µg/lane) was electrophoresed on a 0.9% agarose gel (Bio-Rad) in 1 X Tris-borate-EDTA electrophoresis buffer using a 1 kb ladder (Hartley and Donelson 1980) as a marker. The agarose gel was treated in denaturing buffer (1.5 M NaCl, 0.5 M NaOH) for 20 minutes, rinsed in distilled water for 2 minutes and soaked in neutralizing buffer (1.5 M NaCl, 1 M Tris-HCl, pH 7.5) for 20 minutes with gentle shaking. The DNA was then transferred from the gel onto nylon membranes (Amersham) by capillary action using 10 X SSC buffer (1.5 M NaCl, 0.15 M Na citrate, pH 7.0) for >16 hours. DNA was fixed onto the nylon membrane by incubating for 1-1.5 hours at 80°C (Sambrook et al. 1989). Probes were labeled using a Random Primer DNA Labeling System (GibcoBRL) and [α-32P]dCTP (Amersham) according to the protocol recommended by the manufacturer.
20-25 ng of probe DNA was denatured in 23 µl dH2O at 95°C for 10 minutes and immediately cooled on ice; then 15 µl Random Primers Buffer Mixture, 2 µl of 0.5 mM dATP, 0.5 mM dGTP and 0.5 mM dTTP, 1 µl Klenow Fragment and 5 µl [α-32P]dCTP (50 µCi) were added, mixed completely and incubated for 1 hour at room temperature. The labeled probe was denatured for 10 minutes at 95°C and quenched on ice immediately for 5 minutes. Membranes were prehybridized in Church buffer (0.25 M Na2HPO4, 1 mM EDTA, 7% SDS, pH 7.2) for three hours at 50-55°C. The Church buffer was changed prior to adding the labeled denatured probes. Hybridization was carried out for >16 hours at 50-55°C. Membranes were washed three times (20 minutes each) at low stringency (1 X SSC/0.1% SDS) or at high stringency (0.1 X SSC/0.1% SDS) at 55°C, as required, and then exposed to Kodak X-Omat film at -80°C.
2.7 Northern blot
Total RNA was quantified using a DU-64 spectrophotometer (Beckman). An appropriate amount of total RNA was mixed with RNA sample buffer, heated at 75°C for 10 minutes and quenched on ice immediately for at least 5 minutes. A 1.2% agarose-formaldehyde gel was made with 1 X MOPS buffer and 22% (v/v) formaldehyde (37%). After loading the RNA samples (12 µg/lane), the gel was run in 1 X MOPS buffer. RNA was blotted onto a nylon membrane by capillary action using 20 X SSC for 20 hours and fixed onto the membrane by baking for 2 hours at 80°C (Sambrook et al. 1989). All the equipment used for northern blotting and hybridization was treated with 0.1% DEPC-dH2O for >12 hours prior to use.
2.8 Construction of plasmid libraries
2 µg of satellite DNA from H. triquetra was treated with 0.04 unit Sau3A (Pharmacia) in 10 µl of 1 X One-Phor-All buffer at 37°C for 8 minutes, and the reaction was stopped at 65°C for 5 minutes.
The partially digested satellite DNA was run on a 1.0% low melting agarose (GibcoBRL) gel containing 200 ng/ml ethidium bromide in 1 X Tris-acetate-EDTA electrophoresis buffer (Sambrook et al. 1989). DNA fragments of > 0.5 kb, 1.6-2 kb, 2-5 kb, 5.0-12 kb and > 12 kb were cut out of the gel and spun in Eppendorf tubes. Six volumes (v/w) of TE buffer were added and the tubes were incubated at 70°C until the agarose was dissolved completely. The tubes were vortexed briefly prior to chilling at -20°C overnight or -80°C for 30 minutes, thawed at room temperature and centrifuged for 1 minute in an Eppendorf centrifuge, and the pellet was discarded. 1/10 volume of sodium acetate (pH 5.2) and 2 volumes of ethanol (100%) were added to the supernatant, mixed thoroughly, and held at -20°C for 30 minutes before precipitating the DNA. 10 µl ligation reactions, each containing 1 X One-Phor-All buffer, 20-50 ng Sau3A digested DNA fragments, 50 ng pUC18 vector digested with BamHI (Pharmacia), 1 unit ligase and 1 mM ATP, were carried out at 4°C for 12 hours.
The following protocol was recommended by the manufacturer and used for transformation. The Epicurian Coli Ultracompetent Cells (Stratagene) were thawed on ice and mixed gently by hand, and 100 µl aliquots were transferred into prechilled 15 ml Falcon 2059 polypropylene tubes. 2.0 µl of β-mercaptoethanol (provided) was added to each aliquot. The tubes containing the cells were swirled gently to mix the contents, incubated on ice for 10 minutes, and swirled gently every 2 minutes. 30-50 ng of ligated DNA was added to the tubes containing the cells. The tubes were swirled gently, incubated on ice for 30 minutes, heated for 30 seconds in a 42°C water bath and immediately quenched on ice for 2 minutes. 0.9 ml of preheated (42°C) NZY+ broth was added to each tube, and the tubes were incubated at 37°C for 1 hour with shaking at 250 rpm.
180 µl of the culture from each tube was spread on one plate containing ampicillin (100 µg/plate), X-gal (1600 µg/plate) and IPTG (800 µg/plate). The plates were incubated at 37°C overnight. Three plasmid libraries were made, from satellite DNA fragments of > 0.5 kb, 1.6-2 kb and 2-3 kb.
2.9 Plasmid DNA preparation
Plasmid DNA was prepared according to the protocol of Sambrook et al. (1989) with minor modifications. Single colonies were grown in glass tubes containing 1.5 ml Terrific Broth (1.2% bacto-tryptone, 2.4% bacto-yeast extract, 0.4% glycerol (v/v), 17 mM KH2PO4, 72 mM K2HPO4, ampicillin 100 µg/ml) at 37°C for 12-14 hours with shaking at 250 rpm. The cells were transferred to a 1.5 ml Eppendorf tube and spun in a microcentrifuge for 2 minutes, and the cell pellet was resuspended completely in 100 µl solution A (50 mM glucose, 10 mM EDTA, 25 mM Tris-HCl pH 8.0). Then 200 µl of freshly prepared solution B (0.2 M NaOH, 1% SDS) was added and mixed by several gentle inversions. 150 µl of solution C (3 M potassium acetate, pH 4.8) was added; the tube was vortexed briefly and centrifuged for 5 minutes. Two volumes of 95% ethanol were added to the supernatant; the tube was vortexed, held for 2 minutes at room temperature and centrifuged for 5 minutes. The pellet was washed with 70% ethanol, air dried, suspended in RNase (20 µg/ml) and used directly for sequencing.
2.10 Polymerase chain reactions
Polymerase chain reactions (PCR) were carried out in a GeneAmp PCR System 9600 (Perkin-Elmer) thermal cycler according to the following protocol: 94°C for 30 seconds and paused for 2.15 minutes, followed by 35 cycles of 94°C for 30 seconds, 50°C or 55°C for 30 seconds, and 72°C for 2 minutes. After the final cycle the PCR reaction was kept at 72°C for 5 minutes. Each reaction (50 µl) contained 0.2 mM dNTP, 1 X PCR buffer, 0.1-1.0 µg template DNA, 50 nmol of each primer, 2.0 or 2.5 mM MgCl2, and 1.5-2.5 units Taq polymerase (Sigma or Rose), as required.
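As a rough check on the cycling program in section 2.10, the programmed hold time can be added up. The sketch below is only an illustration: the variable and function names are hypothetical, ramp times between temperatures are ignored, and the "paused for 2.15 minutes" step is left out because its exact duration is not clear from the text.

```python
# Hold times of the PCR program in section 2.10, in seconds.
# Assumptions (not from the thesis): ramp times are ignored and the
# ambiguous "paused for 2.15 minutes" step is excluded.
INITIAL_DENATURATION_S = 30        # 94 C for 30 seconds
CYCLES = 35
CYCLE_STEPS_S = {
    "denaturation_94C": 30,
    "annealing_50_or_55C": 30,
    "extension_72C": 120,
}
FINAL_EXTENSION_S = 5 * 60         # 72 C for 5 minutes


def programmed_hold_seconds():
    """Sum the hold times of the program, excluding ramps and the pause."""
    per_cycle = sum(CYCLE_STEPS_S.values())
    return INITIAL_DENATURATION_S + CYCLES * per_cycle + FINAL_EXTENSION_S


if __name__ == "__main__":
    total = programmed_hold_seconds()
    print(f"{total} s = {total / 60:.1f} min of programmed hold time")
```

Under these assumptions the 35 cycles account for almost all of the run time; the real run would be somewhat longer because of ramping and the initial pause.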
PCR products were purified from 1.0% low melting agarose gels as described above, or using a Prep-A-Gene purification kit (Bio-Rad), or using a GFX PCR DNA and gel band purification kit (Amersham-Pharmacia Biotech) according to the manufacturer's protocol. Purified PCR products were used directly in sequencing reactions. PCR products from Protoceratium reticulatum were cloned into the pCR-TOPO vector using a TOPO TA Cloning kit (Invitrogen) according to the manufacturer's instructions.
2.11 DNA sequencing
Sequencing reactions were set up using a dye terminator cycle sequencing kit (Applied BioSystems, FS-Taq or BigDye). Each sequencing reaction (10 µl) contained 2-3 µl FS-Taq or BigDye, 100-150 ng plasmid DNA or 20-30 ng DNA amplified by PCR, 3-5 nmol primer and the appropriate quantity of dH2O. Sequencing reactions were carried out in a Perkin-Elmer GeneAmp 9600 using the ABI cycle sequencing protocol: 25 cycles of 94°C for 5 seconds, 50°C for 5 seconds and 60°C for 4 minutes. The sequenced samples were precipitated by adding 1/10 volume of sodium acetate (pH 5.2) and 2 volumes of 95% ethanol, quenched on ice for 10 minutes, centrifuged for 20 minutes, air dried and analyzed on an ABI 373 or ABI 377 automatic sequencer. Primers M13 and M13R were used for sequencing the two ends of each insert of the randomly picked clones. Specific primers for each gene were also designed for walking along the clones or sequencing PCR products.
2.12 Analysis of DNA sequences
The Staden package (http://www.mrc-lmb.cam.ac.uk/pubseq) was installed on a Sun Microsystems (Solaris) workstation and was used to analyze sequences of the randomly picked clones. A directory (HTSD) specific for this project was created in my home directory (/export/home/zhaoduo/HTSD), and files containing sequence traces of certain clones from the ABI 373 or ABI 377 were transferred to HTSD. Vector sequences and ambiguous readings in each sequence were removed using Trev in Pregap of Staden.
After opening Gap 4 in the command tool, a database HTdata was created in directory HTSD using option "new" of File in Gap 4, and the edited sequences were imported into HTdata using option "normal shotgun assembly" of Assembly in Gap 4. Each imported sequence would either show up as a new contig in Contig Selector if there was no homologous sequence in the database, or join an existing contig by overlapping the homologous region (Figure 4.2). Alternatively, contigs were generated using option "find internal joins" of View in Gap 4, which gives a matrix containing numerous dots, each dot representing a homologous region between two sequences that could form a new contig if the homology is convincing. The contigs were further edited by correcting possible errors among the homologous sequences and checked by viewing the sequence traces. Text formats of DNA sequences were generated from each contig by using option "calculate a consensus-normal" in File of Gap 4. After giving a filename for the output of a contig, a file containing the sequence text for that contig was produced in HTSD. The text file could be opened in GDE for analyses such as sequence alignment and phylogenetic studies, and the text format sequences were also used to search for homologues in databases using BLAST (http://www.ncbi.nlm.nih.gov/BLAST).
2.13 Phylogenetic analyses
Alignments of 23S rRNA sequences of chloroplasts, mitochondria and bacteria were obtained from the rRNA database (http://rrna.uia.ac.be). 23S rRNA sequences of Guillardia theta (AF041468), Plasmodium falciparum (X95275-6), Plasmodium berghei (U79732), Toxoplasma gondii (U87145), and 23S rRNA sequences of the eight dinoflagellates, H. triquetra, H. pygmaea, H. niei, H. rotundata, A. carterae, P. reticulatum, S. trochoidea and T. heimii, were aligned manually with those from the rRNA database in GDE (ver. 2.2) (Smith et al. 1994).
Three masks were used in phylogenetic analyses: mask1 (1316 bp from very conserved regions), mask2 (1885 bp, less stringent than mask1), and mask3 (2033 bp, the positions from mask2 plus adjacent regions where a few taxa have small deletions). Chloroplast 16S rRNA gene sequences of Guillardia theta (AF041468), Plasmodium falciparum (X95275-6), Toxoplasma gondii (U87145) and the dinoflagellate H. triquetra were aligned to those from the rRNA database (http://rrna.uia.ac.be) in GDE (Smith et al. 1994).
For the 16S rRNA and 23S rRNA gene sequences, maximum likelihood trees were constructed with global rearrangement and four different jumbles using fastDNAml (Olsen et al. 1994). Quartet puzzling trees were constructed using Tree-PUZZLE (ver. 3.1) with 1,000 puzzling steps (HKY 85 substitution model), in which rate heterogeneity (invariable sites and among-site rate variation) was taken into account (8 gamma rates + 1 invariable) (Strimmer and von Haeseler 1996). The following trees were constructed using PAUP 4.0 with the heuristic search option (Swofford 1999). Maximum parsimony trees were constructed by optimizing the characters with accelerated transformation (ACCTRAN). LogDet trees (LogDet/paralinear distance) and neighbor joining trees (HKY 85 distance, gamma distribution rate with shape parameter estimated from PUZZLE) were optimized by minimum evolution, with the starting tree(s) obtained via neighbor joining, and branches were swapped by tree-bisection-reconnection (TBR).
Plastid protein sequence alignments were retrieved from 134.169.70.80/ftp/pub/incoming (Martin et al. 1998). Protein sequences from seven genes (psbA, psbB, psbC, psaA, psaB, petB and atpA) of H. triquetra were aligned to the corresponding retrieved alignments, and concatenated to give an alignment of 3,302 amino acids in GDE (Smith et al. 1994). Maximum parsimony and neighbor joining trees were constructed using PHYLIP (ver. 3.5, Felsenstein 1993).
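The concatenation step described above was done in GDE; the following is only a minimal sketch of the same idea, with a hypothetical function name and toy sequences that are not data from this study. It assumes each per-gene alignment is a mapping from taxon name to an aligned sequence of equal length, and it pads taxa missing from a gene with gap characters (a design choice made here for illustration).

```python
def concatenate_alignments(alignments, gene_order):
    """Join per-gene alignments end to end, one concatenated row per taxon.

    alignments: {gene: {taxon: aligned_sequence}} (hypothetical layout).
    Taxa missing from a gene are padded with '-' for that gene's width.
    """
    taxa = sorted({taxon for gene in gene_order for taxon in alignments[gene]})
    parts = {taxon: [] for taxon in taxa}
    for gene in gene_order:
        block = alignments[gene]
        widths = {len(seq) for seq in block.values()}
        if len(widths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to equal length")
        width = widths.pop()
        for taxon in taxa:
            parts[taxon].append(block.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy example only; real alignments would be read from files.
toy = {
    "psbA": {"taxonA": "MTA", "taxonB": "MSA"},
    "psbB": {"taxonA": "GLPW", "taxonB": "GLPF"},
    "petB": {"taxonA": "KV"},
}
combined = concatenate_alignments(toy, ["psbA", "psbB", "petB"])
for taxon, seq in combined.items():
    print(taxon, seq)
```

Every row of the result has the same length, namely the sum of the individual alignment widths, which is how seven protein alignments can add up to a single 3,302-column data set.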
In an attempt to resolve the origin of dinoflagellate chloroplasts and the phylogenetic relationships among the dinoflagellate chloroplasts, psbA DNA and protein sequences obtained from various dinoflagellates in this project were aligned with those of other photosynthetic organisms and used in phylogenetic analysis. Phylogenetic analyses of psbA nucleotides (only the first and second nucleotides of each codon were used) and amino acid sequences were carried out using PHYLIP (ver. 3.5, Felsenstein 1993). Parsimony trees were constructed using global rearrangement. Neighbor joining trees were constructed using the PAM matrix (for protein) or Kimura distances (for DNA). The input order of taxa was jumbled. Bootstrap analyses were based on replicates for 16S rRNA, 23S rRNA and psbA data sets, 100 replicates for the concatenated protein sequences (3,302 amino acids).
Chapter 3 Separation of chloroplast DNA
One strategy widely used for preparation of chloroplast DNA from algae and higher plants is to extract DNA from intact chloroplasts isolated by differential centrifugation after breaking the cells (Herrmann 1982). In an attempt to isolate intact chloroplasts from Heterocapsa triquetra, a mortar and pestle, a Yeda press and glass bead vortexing were used to break the cells, but only broken chloroplasts (mainly chloroplast lobes) were obtained after centrifugation on sucrose or Percoll gradients. Isolation of chloroplasts from the naked Amphidinium carterae (without thecal plates) also failed, despite the use of a very gentle lysis method. Since typical peridinean dinoflagellates have reticulate chloroplasts (Bibby and Dodge 1974), it is probably impossible to isolate such reticulate chloroplasts intact, because they would be easily broken when the cells are broken.
Another widely used method for preparation of chloroplast DNA is to separate the chloroplast DNA from nuclear DNA by CsCl gradient centrifugation (Douglas 1988).
Usually, the AT content of chloroplast DNA is higher than that of nuclear DNA, i.e., chloroplast DNA is AT-rich, so that the density of chloroplast DNA is different from that of nuclear DNA. After a period of centrifugation on a CsCl gradient, chloroplast DNA forms a satellite DNA band and the nuclear DNA forms a major DNA band. By binding specifically to AT-rich DNA molecules, Hoechst 33258 reduces the density of AT-rich DNA on CsCl gradients, and enhances the density difference between AT-rich and GC-rich DNA molecules. Hoechst 33258 has been widely used for isolating chloroplast DNA (AT-rich) from nuclear DNA (GC-rich) (Chesnick and Cattolico 1993).
3.1 Chloroplast genes psbA and psbD are present in the satellite DNA of Heterocapsa
3.1.1 Heterocapsa triquetra
Axenic Heterocapsa triquetra total DNA was separated into a satellite DNA band and a major DNA band on a CsCl gradient in the presence of Hoechst 33258 (Figure 3.1); both the satellite DNA band and the major DNA band were recovered by side puncturing using a Precision Glide needle (16G1). The refractive index of the satellite and major band DNA solutions was adjusted to 1.3980-1.3990 by adding CsCl-lysis buffer (1 g/ml), and the solutions were further centrifuged for 20 hours on a CsCl gradient using the same rotor. The satellite DNA and major band DNA were recovered, precipitated with 95% ethanol, dissolved in deionized H2O and quantified using a DU-64 spectrophotometer (Beckman). 1 µg satellite DNA and 2 µg major band DNA were treated with EcoRI at 37°C for 2 hours and electrophoresed on a 0.9% agarose gel.
After treatment with a restriction enzyme, chloroplast DNA (the satellite DNA band) of higher plants (Herrmann 1982) and algae (Douglas 1988; Chesnick and Cattolico 1993) obtained by CsCl gradient usually shows a distinct restriction band pattern. However, a band pattern similar to that of algae and higher plants was not observed in H. triquetra satellite DNA treated with EcoRI. Surprisingly, both uncut and EcoRI treated satellite DNA appeared as smears (Figure 3.2a). The uncut and EcoRI treated major band DNA also appeared as a smear; however, the major band DNA has more DNA molecules of high molecular weight than the satellite DNA does (Figure 3.2a).
Figure 3.1 Separation of satellite and major band DNA from Heterocapsa triquetra on a CsCl gradient in the presence of Hoechst 33258. S: satellite DNA band. M: major DNA band.
Figure 3.2 Electrophoresis and Southern blot of satellite and major band DNA of Heterocapsa triquetra. a. Satellite and major band DNA on agarose gel. b. Southern blot probed with psbA gene. c. Southern blot probed with psbD gene. S: uncut satellite DNA. M: uncut main band DNA. S* & M*: DNA treated with EcoRI.
In order to see whether the satellite DNA or the major band DNA contains chloroplast genes, probes of the spinach chloroplast psbA and psbD genes were hybridized with the DNA blot from the gel. Both psbA and psbD hybridized only to the satellite DNA. psbA labeled a 2.6 kb band and a faint 2.1 kb band on the uncut lane, but only a 2.1 kb band on the EcoRI treated lane (Figure 3.2b). psbD labeled a 3.8 kb band on the uncut lane, and a 2.4 kb band on the EcoRI treated lane (Figure 3.2c). No signal was observed in the major band DNA lanes for the psbA and psbD gene probes.
When H. triquetra cells were vortexed longer (10 X 50 seconds rather than the usual 4 X 50 seconds), two satellite DNA bands (instead of a single band) were observed after 24 hours of centrifugation at 220,000 g on a CsCl gradient in the presence of Hoechst 33258 (Figure 3.3a). Satellite DNA bands 1 and 2 were run on a 1% agarose gel (Figure 3.3b), blotted onto nylon membrane and probed with the spinach chloroplast psbA gene. The psbA gene labeled only DNA of satellite 2, suggesting that only satellite 2 contains the chloroplast psbA gene (Figure 3.3c).
The DNA content of satellite 1 is unknown; probes of mitochondrial genes and nuclear genes are needed for further investigation.
Figure 3.3 Separation of two satellite DNA bands from Heterocapsa triquetra. a, Satellite and major band DNA on CsCl gradient in the presence of Hoechst 33258. b, Electrophoresis of uncut satellite and major band DNA on 1% agarose gel. c, Southern blot of the satellite and major band DNA probed with spinach psbA gene. S1 and S2: satellite DNA. M1 and M2: major band DNA.
3.1.2 Heterocapsa pygmaea
The Heterocapsa pygmaea culture is non-axenic and was previously identified as Glenodinium (Boczar et al. 1991). On a CsCl gradient, in the presence of Hoechst 33258, Heterocapsa pygmaea total DNA was separated into a satellite and a major DNA band (Figure 3.4a). The satellite and major band DNA fractions were recovered, treated with the restriction enzyme EcoRI and electrophoresed on a 1% agarose gel, and a band pattern was observed (Figure 3.4b). However, the band pattern was not very clear and the sum of the fragment sizes did not add up to the approximately 120 kb calculated in an earlier study (Boczar et al. 1991). Hybridization with the spinach psbA gene labeled two bands of 2.2 kb and 2.4 kb on the lane containing EcoRI treated satellite DNA, and two faint bands of 4 kb and 5 kb on the lane containing the major band DNA (Figure 3.4c).
When the refractive index of the DNA-CsCl-Hoechst 33258 solution was adjusted to 1.3995 instead of 1.400, and the solution was centrifuged in a VTi 80 rotor rather than a VTi 65 rotor, two satellite DNA bands were obtained (Figure 3.5a). Two satellite DNA bands were also obtained when the refractive index of the DNA-CsCl-Hoechst solution was 1.3990 and it was centrifuged for 24 hours at 48,000 rpm in the VTi 65 rotor. Agarose electrophoresis of EcoRI treated satellite DNA 1 and 2 both showed three plasmid-like DNA bands of approximately 2.6 kb, 2.9 kb and 3.1 kb (Figure 3.5b).
Hybridization with the spinach psbA gene labeled 2.2 kb and 2.4 kb bands on lanes of both satellite 1 and satellite 2 treated with EcoRI (Figure 3.5c). However, hybridization with the spinach psbD gene labeled only one band of 2.2 kb on lanes with DNA of satellites 1 and 2 (Figure 3.5d). In both DNA blots, neither the psbA nor the psbD gene labeled the main band DNA.
Figure 3.4 Separation and Southern blot of satellite and major DNA band from Heterocapsa pygmaea. a, Satellite and major band DNA on CsCl gradient in the presence of Hoechst 33258. b, Electrophoresis of satellite and major band DNA treated with EcoRI on 1% agarose gel. c, Southern blot probed with spinach psbA gene. S: satellite DNA. M: major band DNA. bA: spinach psbA probe.
Figure 3.5 Separation and Southern blot of two satellite DNA from Heterocapsa pygmaea. a, Satellite and major band DNA on CsCl gradient in the presence of Hoechst 33258. b, Electrophoresis of satellite and major band DNA on 1% agarose gel. c, Southern blot probed with spinach psbA gene. d, Southern blot probed with spinach psbD gene. S1*, S2*: satellite DNA treated with EcoRI. M: uncut major band DNA. M*: major band DNA treated with EcoRI.
3.2 Chloroplast psbA gene is not present in the satellite DNA of Amphidinium
A satellite and a major DNA band were separated from axenic Amphidinium carterae total DNA, again on a CsCl gradient in the presence of Hoechst 33258 (Figure 3.6a). When the satellite DNA was treated with EcoRI and electrophoresed on an agarose gel, 19 restriction bands were observed, which added up to a total of approximately 45 kb (Figure 3.6b). The main band DNA had a high molecular weight on both the uncut DNA lane and the EcoRI treated lane, suggesting that the major band DNA could not be digested by EcoRI.
When the spinach psbA gene was used to probe the DNA blot, it did not label the satellite DNA, but labeled a 2.4 kb band, a 2.6 kb band and a smear around 1.6 kb on the main band DNA (Figure 3.6c), indicating that the psbA gene is present in the major band DNA.
Figure 3.6 Separation and Southern blot of satellite and major band DNA from Amphidinium carterae. a, Satellite and major band DNA on CsCl gradient in the presence of Hoechst 33258. b, Electrophoresis of satellite DNA and major band DNA treated with EcoRI on 1% agarose gel. c, Southern blot probed with spinach psbA. S: satellite DNA. M: major band DNA. bA: spinach psbA probe.
3.3 Discussion
3.3.1 Separation of dinoflagellate chloroplast DNA using CsCl gradients
Satellite DNA and major band DNA were obtained after CsCl gradient centrifugation of total DNA of Heterocapsa triquetra and Heterocapsa pygmaea. Hybridization with spinach chloroplast genes labeled only satellite DNA, suggesting that the chloroplast genes are present in the satellite DNA of Heterocapsa species, and could be separated from nuclear DNA by CsCl gradient centrifugation.
A satellite and a major DNA band were also obtained from total DNA of Amphidinium carterae. However, on the DNA blot probed with the chloroplast psbA gene, the probe did not label the satellite DNA. Instead, it labeled the major band DNA (Figure 3.6), suggesting that the chloroplast psbA gene was not present in the satellite DNA but in the major band with the nuclear DNA. Therefore Amphidinium carterae chloroplast genes could not be separated from nuclear DNA by CsCl gradient because they both have similar GC contents. However, chloroplast DNA was separated from nuclear DNA as a satellite in Amphidinium operculatum by CsCl gradient centrifugation (Barbrook and Howe 2000), suggesting that the chloroplast DNA density may differ among species of the genus Amphidinium.
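The dependence of CsCl separation on base composition can be made concrete with a small calculation. The sketch below is not part of the methods used in this study: it assumes the widely used empirical relationship for double-stranded DNA in neutral CsCl without dye, density = 1.660 + 0.098 x (GC fraction) g/cm3, commonly attributed to Schildkraut, Marmur and Doty. The function names and toy sequences are hypothetical, and the additional density shift caused by Hoechst 33258 binding is not modelled.

```python
def gc_fraction(sequence):
    """Fraction of G and C among the unambiguous bases of a DNA sequence."""
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T")
    return sum(b in "GC" for b in bases) / len(bases)


def buoyant_density(gc):
    """Approximate CsCl buoyant density (g/cm3) of dsDNA without dye.

    Assumed empirical relationship, not taken from the thesis.
    """
    return 1.660 + 0.098 * gc


# Toy sequences for illustration only.
at_rich = "ATATATGCAT"
gc_rich = "GCGCGCATGC"
for name, seq in (("AT-rich", at_rich), ("GC-rich", gc_rich)):
    gc = gc_fraction(seq)
    print(f"{name}: GC = {gc:.2f}, density ~ {buoyant_density(gc):.4f} g/cm3")
```

Two DNA populations with the same GC fraction get the same predicted density under this relationship, which is the situation proposed above for the psbA gene and nuclear DNA of Amphidinium carterae.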
On DNA blots probed with the spinach psbA and psbD genes, only the satellite band DNA of Heterocapsa was labeled, strongly suggesting that the satellite DNA of the two Heterocapsa species contains at least the chloroplast psbA and psbD genes, which encode proteins D1 and D2, respectively. The D1 and D2 proteins are two key components of the reaction center of photosystem II, and are present in all the characterized chloroplast genomes of photosynthetic eukaryotic organisms. Although some of the chloroplast genes have moved to the nucleus (Baldauf and Palmer 1990a; Baldauf et al. 1990b; Martin and Herrmann 1998), the chloroplast genes maintained on large circular chloroplast DNA have similar base composition and usually are AT-rich. Separation of chloroplast DNA from nuclear DNA using CsCl gradients in the presence of Hoechst 33258 is based on the difference in their base composition. Therefore, it is reasonable to argue that all the chloroplast genes of H. triquetra or H. pygmaea should have similar base composition and are probably present in the satellite DNA, while the chloroplast genes of Amphidinium carterae are present in the major band DNA.
3.3.2 Unusual structure of dinoflagellate chloroplast genes
Usually chloroplast DNA isolated from algae and higher plants is high in molecular weight and shows restriction bands when treated with restriction enzymes (Herrmann 1982; Douglas 1988). However, uncut satellite DNA unexpectedly appeared as a smear after electrophoresis on an agarose gel, and bands of 2.1-3.8 kb were labeled when probed with the spinach psbA and psbD genes (Figures 3.2, 3.3). Their association with discrete bands suggested that the psbA or psbD genes were present on independent, small DNA molecules. If psbA or psbD were on large DNA molecules, glass bead vortexing of Heterocapsa triquetra cells would randomly break the large chloroplast DNA molecules and generate different sized DNA fragments that contained the psbA or psbD gene.
After random breakage, psbA or psbD should label fragments of varying size on the uncut satellite DNA instead of the consistent 2.6 kb or 3.8 kb bands. The identity of other DNA fragments in the smear is unknown and will be discussed in Chapter 4.
Two satellite DNA bands, rather than one, were obtained when Heterocapsa triquetra cells were vortexed 2.5 times longer than usual. When the cells were vortexed vigorously, an extra satellite DNA band could be obtained. The psbA and psbD gene probes did not hybridize with DNA of satellite 1. Instead they hybridized only with DNA of satellite 2, suggesting that only satellite 2 contains the psbA gene. Satellite 1 might originate from AT-rich DNA fragments of the nuclear, mitochondrial or chloroplast genome generated by intensive breakage of the cells. More probes of nuclear, mitochondrial or chloroplast genes are needed to investigate the origin of satellite 1.
The uncut satellite DNA of Heterocapsa pygmaea was of a high molecular weight, unlike the smear from Heterocapsa triquetra. Although the satellite DNA treated with restriction enzymes yielded restriction bands, psbA and psbD gene probes labeled only small bands on uncut satellite DNA (not shown) and on restriction enzyme treated satellite DNA (Figure 3.4). Two satellite DNA bands were obtained when the speed of CsCl gradient centrifugation was decreased. Restriction band patterns of the two satellite DNA bands were the same when they were treated with EcoRI. Again, a DNA blot probed with chloroplast genes labeled small bands (Figure 3.5). psbA labeled two bands slightly different in size in all the satellite DNA, suggesting that there were probably two small molecules of different size containing the psbA gene (confirmed later by sequencing, see Chapter 6). psbA and psbD labeled the same size bands for the two satellite DNAs (Figure 3.5), suggesting that the two satellite DNAs may have the same chloroplast gene content.
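The random-breakage argument made earlier in this section can be illustrated with a small simulation. This is only a sketch under stated assumptions, not an analysis performed in this study: the 150,000 bp "large genome" is a hypothetical size, the gene position and the number of breaks are arbitrary, the 2,151 bp circle size is the psbA circle from Table 4.1, and the 1,047 bp gene length is derived from its 348-codon reading frame plus a stop codon. All function names are hypothetical.

```python
import random


def fragment_with_gene(circle_len, gene_start, gene_len, n_breaks, rng):
    """Size of the linear fragment carrying the intact gene after random breaks.

    Returns None when a break falls inside the gene. Requires n_breaks >= 1
    and a gene that does not span position 0 of the circle.
    """
    gene_end = gene_start + gene_len
    breaks = sorted(rng.randrange(circle_len) for _ in range(n_breaks))
    if any(gene_start < b < gene_end for b in breaks):
        return None
    # Nearest break on each side of the gene, wrapping around the circle.
    left = max((b for b in breaks if b <= gene_start),
               default=breaks[-1] - circle_len)
    right = min((b for b in breaks if b >= gene_end),
                default=breaks[0] + circle_len)
    return right - left


def simulate(circle_len, gene_start, gene_len, n_breaks, trials=1000, seed=0):
    """Collect gene-bearing fragment sizes over many independent breakages."""
    rng = random.Random(seed)
    sizes = (fragment_with_gene(circle_len, gene_start, gene_len, n_breaks, rng)
             for _ in range(trials))
    return [s for s in sizes if s is not None]


# Hypothetical large circular genome sheared at 30 random sites.
large = simulate(150_000, 50_000, 1_047, n_breaks=30)
# Minicircle of 2,151 bp opened by a single random break.
mini = simulate(2_151, 500, 1_047, n_breaks=1)

print("large genome: fragment sizes from", min(large), "to", max(large), "bp")
print("minicircle: distinct fragment sizes", sorted(set(mini)))
```

In this toy model a gene on a large molecule ends up on fragments of many different sizes, which would hybridize as a smear, whereas a gene on a minicircle always appears on a molecule of one size, consistent with the discrete bands seen on the blots.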
In an attempt to obtain chloroplast genes from the satellite, chloroplast 16S rRNA gene primers were used to amplify the 16S rRNA gene of H. pygmaea by PCR. The primers yielded one product of the expected size. However, sequencing the PCR product revealed that it was a bacterial 16S rRNA gene, indicating that the satellite DNA of H. pygmaea is contaminated with bacterial DNA.
Although the satellite DNA of Amphidinium carterae gave 19 restriction bands when it was treated with EcoRI, a psbA gene probe did not label the satellite DNA, but labeled two small bands on the major band DNA (Figure 3.6). This suggested that the psbA gene is GC-rich and could not be separated from nuclear DNA on a CsCl gradient, and that the psbA gene is on a small molecule. Probably most of the DNA molecules in the major band came from nuclear DNA that has the same base composition as that of the chloroplast psbA gene. Where the satellite DNA originated is not clear. Probes of mitochondrial and nuclear genes are needed to identify the origin of the satellite DNA. However, chloroplast DNA was successfully separated from a different strain of Amphidinium carterae by centrifugation on a sucrose gradient, which separates DNA based on the size of the molecules. DNA blots probed with chloroplast genes indicated that the chloroplast genes were in the upper fraction that contains small molecules, suggesting that chloroplast DNA molecules are small molecules (Roger Hiller, personal communication), which is consistent with the conclusion from my DNA blots.
Chapter 4 Characterization of chloroplast genes in Heterocapsa triquetra: "One gene - one circle"
4.1 Random sequencing of clones from plasmid libraries of satellite DNA
DNA blots probed with spinach psbA and psbD genes suggested that the chloroplast genes of H. triquetra are present in the satellite DNA (see Chapter 3). The satellite DNA was partially digested using Sau3A and electrophoresed on 1% low melting agarose gels.
DNA fragments from three size fractions (>0.5 kb, 1.6-2 kb and 2-3 kb) were used to make three plasmid (pUC18) libraries (Figure 4.1). A total of 124 clones were randomly picked from the three libraries (83, 40 and 1 clones from the >0.5 kb, 1.6-2 kb and 2-3 kb libraries respectively), and each was sequenced from both ends using the universal primers M13 and M13R. All the sequences were imported into the database HTdata in Gap 4 of Staden, in the directory HTSD, for further analysis.

Figure 4.1 Flowchart of constructing three plasmid libraries from the satellite DNA of Heterocapsa triquetra. 1, uncut satellite DNA. 2 & 3, satellite DNA partially digested with Sau3A. pUC18, plasmid pUC18 vector. [Flowchart steps: satellite DNA; Sau3A partial digestion; size fractionation (>0.5 kb, 1.6-2 kb, 2-3 kb); ligation into pUC18; transformation (Stratagene ultracompetent cells); >0.5 kb, 1.6-2 kb and 2-3 kb libraries.]

4.2 Identification of ten chloroplast genes

Using BLAST searches (http://www.ncbi.nlm.nih.gov/BLAST), homologues of ten chloroplast genes were identified: two ribosomal RNA genes (23S rRNA and 16S rRNA) and eight chloroplast protein genes (psaA, psaB, psbA, psbB, psbC, atpA, petB and rps14) (Table 4.1). The protein genes identified include a ribosomal small subunit protein gene (rps14), and at least one polypeptide gene for each of the four major membrane-protein assemblies of thylakoids: photosystem I (psaA, psaB), photosystem II (psbA, psbB and psbC), the proton-driven ATPase (atpA), and photosynthetic electron transport (petB, encoding cytochrome b6).

Table 4.1 Features of ten H. triquetra chloroplast genes and their 9G-9A-9G region

Gene        No. of   Protein   Circle   Protein   Start   Stop    D1     D2     D3     D4
            clones   of        (bp)     (aa)      codon   codon   (bp)   (bp)   (bp)   (bp)
psaA        2        PS I      3005     732       ATA*    TAA     70     47     112    120
psaB        3        PS I      3121     776       ATA**   TAG     29     75     103    126
psbA        5        PS II     2151     348       TTG     TAA     173    66     92     316
psbB        3        PS II     2286     505       ATA     TAA     53     66     83     99
psbC        1        PS II     2330     460       ATA*    TAG     97     36     118    239
atpA        2        ATPase    2444     452       ATA     TAA     290    36     113    190
petB        1        Cyt b6f   2204     219       ATG     TAA     815    65     90     117
rps14       1        rps14     2012     72        ATG     TAA     62     215    94     174
23S rRNA#   8        -         3027     -         -       -       116    86     99     152
16S rRNA    1        -         2563     -         -       -       367    66     106    98

* or TTG; ** or ATT; # there appear to be 3 related circles: two have deletions in the tripartite region making them 25 bp and 35 bp shorter than the third. The D1 and D4 regions of the 16S and 23S rRNA genes were estimated based on sequence alignment. There are three ORFs on the rps14 circle: ORF1 (942-1049 bp), ORF2 (1128-1448 bp) and ORF3 (1731-1949 bp), encoding 35, 106 and 72 amino acids respectively. The D1 and D4 regions of rps14 are the region between 9GL and the stop of ORF3 (rps14), and the region between the start of ORF1 and 9GR, respectively (also see Figure 4.10).

Out of the 124 clones randomly picked from the three libraries, 53 clones (43%) contained sequences homologous to chloroplast genes; the other 71 clones (57%) consisted of sequences with no statistically significant homologues in the databases. Almost all the sequences from the 124 clones are AT-rich; their AT composition ranges from 50% to 60%. Except for the petB and rps14 genes, redundancy was observed for the eight other chloroplast genes among the sequences of the 53 clones. However, redundancy was not found among the sequences from the 71 clones, except that clone HT48 was identical to HT83, and clone H43 was identical to H123. The random sequencing strategy was stopped at this stage.
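The ORF sizes quoted in the note to Table 4.1 can be checked directly from the ORF coordinates. The sketch below is only an illustration: it assumes the coordinates are 1-based and inclusive and that each span includes the stop codon, neither of which is stated explicitly in the text. The function name is hypothetical.

```python
def orf_amino_acids(start: int, end: int) -> int:
    """Number of residues encoded by an ORF.

    Assumption (not stated in the thesis): coordinates are 1-based,
    inclusive, and the span includes the stop codon.
    """
    length_bp = end - start + 1
    if length_bp % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return length_bp // 3 - 1  # subtract the stop codon


# ORF coordinates on the 2,012 bp rps14 circle, from the note to Table 4.1.
ORFS = {"ORF1": (942, 1049), "ORF2": (1128, 1448), "ORF3": (1731, 1949)}

ORF_LENGTHS = {name: orf_amino_acids(*span) for name, span in ORFS.items()}

for name, residues in ORF_LENGTHS.items():
    print(f"{name}: {residues} aa")
```

Under these assumptions the three spans (108, 321 and 219 bp) correspond to 35, 106 and 72 amino acids, matching the values in the table note.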
To get complete sequences for each of the identified chloroplast genes, primers were specifically designed to walk each gene on both strands. Primers were also designed to amplify fragments by PCR in order to fill the gaps that were not covered by clone sequences. All the sequences of clones and PCR-amplified fragments were imported into Gap 4 of Staden and assembled (Figure 4.2).

Figure 4.2 Random sequencing and sequence assembly. [Flowchart steps: the 1.6-2 kb, 2-3 kb and >0.5 kb libraries; sequencing the two ends of randomly picked clones; assembling the sequences in Gap 4 of Staden; contig generated in Gap 4 of Staden by overlapping the ends.]

4.3 Contig assembly in Gap 4 of Staden

Using the "normal shotgun assembly" and "find internal joins" options in Gap 4 of Staden, fifteen contigs of 2-3.5 kb were generated from sequences of the 51 clones and PCR products related to those clones (Figure 4.3). A highly conserved tripartite non-coding 9G-9A-9G region was found in all fifteen contigs. The 9G-9A-9G region consists of three highly conserved cores (two 9G cores, each with a run of 9 Gs in the center, and a 9A core with a run of 9 As in the center) and variable regions between the cores. The databases contained no sequence homologous to the 9G-9A-9G region.

Only one contig, of 1 kb, was generated from the 71 clones that have no significant homologues in the databases; it was assembled from the sequences of three clones (H4K, H33K and H16). As almost all these sequences do not form contigs with each other or with the chloroplast genes, they are probably non-specific contamination of the satellite band by AT-rich nuclear DNA of relatively high sequence complexity. The tripartite 9G-9A-9G region was not found in the sequences from the 71 clones.

Ten complete chloroplast genes were identified on ten contigs by BLAST searching for homologues in the databases, and each gene is on an independent contig (Figure 4.3). The other five contigs contained only fragments of chloroplast genes (see Chapter 5). Two different genes were never found on the same contig, not even those that are adjacent in all other chloroplast genomes, such as psaA and psaB, or 16S rRNA and 23S rRNA.

Figure 4.3 Ten contigs assembled from sequences of different clones. Contigs shown: (1) psaA (3,256 bp, overlapped 251 bp); (2) psaB (3,171 bp, overlapped 50 bp); (3) psbA (2,350 bp, overlapped 199 bp); (4) psbB (2,409 bp, overlapped 123 bp); (5) psbC (2,605 bp, overlapped 275 bp); (6) atpA (2,539 bp, overlapped 96 bp); (7) petB (2,243 bp, overlapped 39 bp); (8) 16S rRNA (2,774 bp, overlapped 211 bp); (9) 23S rRNA (3,446 bp, overlapped 419 bp); (10) rps14 (2,016 bp, overlapped 4 bp). Solid lines represent sequences of the basic clone with the clone name above the bold line (e.g. 23 and 23R represent the two end sequences from clone 23 using M13 and M13R). Thin solid lines represent the sequences from walking the basic clone. Bold dotted lines represent sequences from other clones. Thin dotted lines represent sequences of PCR products. N: N terminus of the protein. C: C terminus of the protein. Primers used for walking the basic clone or for PCR amplification are shown below each contig as short arrows.

4.4 The initially puzzling structure of a chloroplast psbA clone

The structure of clone HT23, which contains psbA gene sequence, initially appeared quite confusing (Figure 4.4). The two ends of the clone were each part of the psbA gene, in the same orientation, and were separated by the tripartite non-coding 9G-9A-9G region. If the two ends of the clone were connected, a complete psbA gene was formed. The structure of this psbA clone could result from only two possible arrangements of the psbA gene: identical tandem repeats consisting of a psbA gene and its adjacent non-coding 9G-9A-9G region, or a minicircle containing a psbA gene and the non-coding 9G-9A-9G region (Figure 4.5). Structures similar to that of the psbA clone were also found in clones containing the chloroplast psaB, psbB, atpA, 16S rRNA and 23S rRNA genes.

Figure 4.4 Unexpected "confused" structure of a psbA clone.
Each end is part of the psbA gene, and the two ends together form a complete psbA gene. M13 & M13R, universal primers. pUC18, vector. N & C, N-terminus and C-terminus of the psbA protein. 9G-9A-9G, non-coding tripartite region. 9G, run of 9G motif. 9A, run of 9A motif. Arrows indicate the length and orientation of the psbA gene fragments. The number under each arrow indicates the number of amino acids deduced from the psbA gene fragment.

Figure 4.5 Two possible psbA gene structures that could give the puzzling psbA clone. (a) Tandem repeats of the psbA gene. (b) The psbA gene is on a minicircle. [Diagram labels: Sau3A sites; 9G-9A-9G region; pUC18; psbA fragments of 115 aa and 233 aa; complete psbA of 348 aa.]

4.5 psbA is on a minicircle, as is the 23S rRNA gene

Contig assembly showed that the two ends of the psbA contig have overlapping sequences (Figures 4.3, 4.6), suggesting that the psbA gene and its adjacent non-coding region are circularly permuted on different clones. In principle, a circular contig could result either from a minicircle containing only the gene and its adjacent non-coding region, or from identical tandem repeats of the gene with an intervening non-coding region. Both are consistent with the interpretation of the psbA clone (Figures 4.4, 4.5). Hybridization of a spinach psbA probe with EcoRI-cut satellite DNA from H. triquetra labeled only a band at 2.1 kb (Figure 4.7 lane 2), in agreement with the 2,151 bp size and single EcoRI site found by sequencing (Figure 4.6, Table 4.1). If this gene existed as a tandem repeat within a larger molecule, digestion should have given fragments of at least two different sizes unless the number of copies was very large. When uncut H. triquetra total DNA was hybridized with the spinach psbA gene, two bands of about 2.6 kb and 1.3 kb were labeled (Figure 4.7 lane 3), suggesting that chloroplast psbA is on a small molecule. A psbA gene probe also labeled the 2.6 kb band on uncut satellite DNA (Figure 4.7 lane 1).
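The idea that a contig with overlapping ends can be closed into a circle can be checked arithmetically: the circle size should equal the contig length minus the end overlap. The sketch below, offered as an illustration rather than as the method actually used in Gap 4, compares the contig lengths and overlaps labelled in Figure 4.3 with the circle sizes in Table 4.1. The helper names and the toy sequence are hypothetical.

```python
# Contig length and end overlap (bp), as labelled in the Figure 4.3 panels.
CONTIGS = {
    "psaA": (3256, 251), "psaB": (3171, 50), "psbA": (2350, 199),
    "psbB": (2409, 123), "psbC": (2605, 275), "atpA": (2539, 96),
    "petB": (2243, 39), "16S rRNA": (2774, 211), "23S rRNA": (3446, 419),
    "rps14": (2016, 4),
}

# Circle sizes (bp), as listed in Table 4.1.
TABLE_4_1 = {
    "psaA": 3005, "psaB": 3121, "psbA": 2151, "psbB": 2286, "psbC": 2330,
    "atpA": 2444, "petB": 2204, "16S rRNA": 2563, "23S rRNA": 3027,
    "rps14": 2012,
}


def circle_size(contig_bp: int, overlap_bp: int) -> int:
    """Size of the circle obtained by merging the overlapping contig ends."""
    return contig_bp - overlap_bp


def end_overlap(seq: str, min_len: int = 4) -> int:
    """Length of the longest prefix of seq that recurs exactly as its suffix."""
    for k in range(len(seq) - 1, min_len - 1, -1):
        if seq[:k] == seq[-k:]:
            return k
    return 0


# Genes whose computed circle size differs from the tabulated one.
MISMATCHES = {
    gene: (circle_size(*CONTIGS[gene]), TABLE_4_1[gene])
    for gene in CONTIGS
    if circle_size(*CONTIGS[gene]) != TABLE_4_1[gene]
}
print(MISMATCHES)

# Hypothetical toy contig: an 8-base end repeat flanking 12 unique bases.
TOY_CONTIG = "ATGCCGTA" + "TTTGGGCCCAAA" + "ATGCCGTA"
TOY_CIRCLE = TOY_CONTIG[: len(TOY_CONTIG) - end_overlap(TOY_CONTIG)]
print(len(TOY_CIRCLE))
```

With these figures, nine of the ten contigs reproduce the tabulated circle size exactly; atpA comes out at 2,443 bp against 2,444 bp in Table 4.1, a 1 bp difference that the source does not explain.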
Probably the 2.6 kb and 1.3 kb bands on the uncut total DNA blot represent the relaxed circular and the supercoiled forms of the psbA minicircle respectively, while the 2.1 kb band on the EcoRI-cut satellite DNA blot represents the linear psbA molecule generated by EcoRI digestion of the minicircle.

Figure 4.6 psbA gene contig and its circularized minicircle. (a) psbA gene contig generated from sequences of 5 clones and 1 PCR product. (b) psbA minicircle circularized from the contig by overlapping the end sequences. Solid lines with numbers: sequences of clones; each number above the line represents a clone. Solid lines without numbers: sequences from walking clone 23. Dotted lines: sequences of the PCR product amplified from primer pair bA6/bA7. Grey rectangle: overlapping sequences of the linearized contig. N: N-terminus of the psbA protein. C: C-terminus of the psbA protein. bA1, bA5, bA6 & bA7: psbA gene primers. 9G-9A-9G: tripartite non-coding regions.

Figure 4.7 Southern blot of satellite DNA and total DNA of Heterocapsa triquetra. Lanes 1-3, probed with spinach psbA. Lanes 4-5, probed with the H. triquetra 23S rRNA gene. Lane 1, uncut satellite DNA; lane 2, EcoRI-treated satellite DNA; lanes 3-4, uncut total DNA; lane 5, BglII-treated total DNA; lane 6, ethidium bromide stain of lane 4.

Figure 4.8 23S rRNA gene contig and its circularized minicircle. (a) 23S rRNA contig from sequences of 8 clones and three PCR products. (b) 23S rRNA minicircle circularized from the contig. Solid lines with numbers: sequences of clones; each number above the line is the name of the clone. Solid lines without numbers: sequences from walking clone 100. Dotted lines: sequences of three PCR products. Grey rectangle: overlapping sequences of the linearized contig. Short arrows and numbers beneath the arrows: 23S rRNA gene primers. 9G-9A-9G: tripartite non-coding regions.

Like the psbA contig, the 23S rRNA gene contig has overlapping sequences at each end, and could be circularized to a minicircle (Figure 4.8). When uncut total H. triquetra DNA was hybridized with a 0.7 kb probe from the H. triquetra 23S rRNA gene, it gave strong bands at about 2.1 and 4.3 kb, and a weaker band at 3.4 kb (Figure 4.7 lane 4). The 2.1 and 4.3 kb bands disappeared after BglII digestion, giving two new bands of 0.6 and 2.6 kb (Figure 4.7 lane 5), which sum to 3.2 kb, in reasonable agreement with the 3,027 bp predicted for a linear monomer from sequencing (Table 4.1). The linear band is slightly smaller after digestion because of two closely spaced BglII sites (Figure 4.8). The three bands in uncut DNA therefore probably correspond (in decreasing size order) to relaxed monomeric circles, linear monomers, and supercoiled circles. The weak hybridization at high molecular weight (>12 kb) is probably non-specific cross-reaction with nuclear 28S rRNA genes, since it does not decrease after digestion with BglII. However, in the ethidium bromide-stained lane of the gel used for the total DNA blot, no DNA bands were visible in the regions labeled by the psbA or 23S rRNA gene probes (Figure 4.7 lane 6).

In an attempt to demonstrate that the psbA and 23S rRNA genes are each on a minicircular chromosome, inwardly and outwardly directed PCR primer pairs specific for the psbA and 23S rRNA genes were used to amplify DNA from uncloned genomic DNA. Both primer pairs bA7/bA6 and bA1/bA5 gave the products expected for a circular molecule (Figure 4.9), which was consistent with the results of hybridization of uncut genomic DNA using the psbA probe (Figure 4.7). Similarly, the primer pairs 23S1/23S4 and 23S2/23S3 gave products of the size predicted for a circular contig (or a minicircle), indirectly consistent with the DNA blot. Therefore, the 23S rRNA gene is also on a minicircle (Figure 4.9).
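The restriction-digest reasoning used above (a circle with a single site gives one band of full circle size, whereas a tandem array inside a larger molecule gives at least two band sizes) can be made concrete with a small sketch. The 2,151 bp unit size is taken from the text; the site offset, copy number and flank lengths are hypothetical values chosen only for illustration, and the function names are not from any library.

```python
def digest_circle(length: int, sites: list[int]) -> list[int]:
    """Fragment sizes (bp) after cutting a circle at the given positions."""
    cuts = sorted({p % length for p in sites})
    if not cuts:
        return [length]  # an uncut circle stays in one piece
    n = len(cuts)
    # Distance from each cut to the next, wrapping around the circle;
    # a single cut yields one full-length linear molecule.
    return sorted((cuts[(i + 1) % n] - cuts[i]) % length or length
                  for i in range(n))


def digest_linear(length: int, sites: list[int]) -> list[int]:
    """Fragment sizes (bp) after cutting a linear molecule."""
    cuts = [0] + sorted(set(sites)) + [length]
    return [b - a for a, b in zip(cuts, cuts[1:]) if b > a]


UNIT = 2151          # psbA circle size from Table 4.1
SITE_OFFSET = 400    # hypothetical position of the single EcoRI site

# Model 1: minicircle with one EcoRI site.
CIRCLE_FRAGMENTS = digest_circle(UNIT, [SITE_OFFSET])

# Model 2: three tandem copies inside a larger linear molecule
# (hypothetical 5,000 bp and 7,000 bp flanks).
LEFT, COPIES, RIGHT = 5000, 3, 7000
TANDEM_SITES = [LEFT + i * UNIT + SITE_OFFSET for i in range(COPIES)]
TANDEM_FRAGMENTS = digest_linear(LEFT + COPIES * UNIT + RIGHT, TANDEM_SITES)

print(CIRCLE_FRAGMENTS)   # one fragment of the full circle size
print(TANDEM_FRAGMENTS)   # unit-length fragments plus two junction fragments
```

In the circular model the fragment sizes always sum to the circle size, which is the same check applied in the text to the 0.6 and 2.6 kb BglII fragments of the 23S rRNA circle. In the tandem model the junction fragments have sizes that depend on the flanking DNA, so a probe would be expected to label more than one band size unless the copy number is large enough for the unit-length band to dominate.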
Sequences from both PCR products and clones could be integrated into a single circular contig both for the psbA gene and for the 23S rRNA gene (Figures 4.6, 4.8). The results of the inverse PCR also suggested that the unique arrangement of the psbA or 23S rRNA gene was neither chimaeric nor a cloning artifact. Although identical tandem repeats of the psbA gene with its adjacent non-coding region could also have given the same PCR products with the two primer pairs, the combined evidence from DNA blots and PCR indicates that the psbA gene is on a minicircle and not in a tandem repeat. Therefore, the psbA gene is on a minicircle, and the 23S rRNA gene is also on a minicircle.

Figure 4.9 Confirmation of psbA and 23S rRNA gene minicircles by inverse PCR. (a) psbA minicircle. (b) PCR products amplified from primer pairs bA1/bA5 and bA6/bA7 using genomic DNA as template. (c) 23S rRNA gene minicircle (3,027 bp). (d) PCR products amplified from primer pairs 23S1/23S4 and 23S1/23S2.

4.6 Each gene is on a minicircular chromosome: "one gene - one circle"

All fifteen contigs have overlapping sequences at their two ends, and each contig could be circularized to a minicircle (Figures 4.3, 4.10; also see Figure 5.1). Inwardly and outwardly directed PCR primer pairs were used to amplify DNA from uncloned genomic DNA, and gave products of the size predicted for each minicircle. The sequences and lengths of these PCR products were all consistent with a circular organization, and the sequences could be integrated into each circular contig.

In all known chloroplast genomes, some chloroplast genes are consistently adjacent, existing as gene clusters or operons, and are co-transcribed. Such adjacent genes include psaA and psaB, and the 16S rRNA and 23S rRNA genes. In order to detect such universal chloroplast gene operons in H. triquetra, PCR reactions were carried out using one primer from psaA and one from psaB, as well as one primer from the 16S rRNA gene and one from the 23S rRNA gene. No product resulted from these reactions. This suggested that in the chloroplast genome of H. triquetra the psaA and psaB genes are not adjacent, and that the 16S rRNA and 23S rRNA genes are not adjacent either. A large DNA molecule with multiple genes or gene operons could not be detected in the chloroplast genome of H. triquetra. Instead, each chloroplast gene appeared on a separate minicircle: "one gene - one circle".

Figure 4.10 Structure of ten chloroplast single gene circles. The stippled region on each circle represents the coding region for a specific gene. Gene name and the size of the circle are shown inside each circle. N, N-terminus. C, C-terminus. The 9G-9A-9G plus D1-D4 are the tripartite non-coding region.

4.7 Characteristics of chloroplast genes

4.7.1 The most divergent chloroplast genes ever sequenced

Comparing sequences of H. triquetra chloroplast genes with those of other organisms showed that H. triquetra has the most divergent chloroplast genes ever sequenced. The following analysis is based on the protein genes; the 16S and 23S rRNA genes will be discussed in Chapter 7.

In general, sequences close to the 3' end (or C terminus of the protein sequence) of H. triquetra chloroplast genes are fairly conserved, while sequences near the 5' end (or N terminus of the protein sequence) of these genes are more divergent than sequences of other regions. Comparison with chloroplast gene sequences from other organisms indicated that some protein genes of H. triquetra are not very conserved at the DNA level, but their protein sequences are conserved, e.g. the psaA and psaB genes (Figure 4.11). Others, such as the psbA and psbB genes, are conserved at both the DNA level and the protein level (Figure 4.12).
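The relationship between the DNA and protein rows of Figures 4.11 and 4.12 can be checked by translating the Heterocapsa DNA segments as printed in those figures. The sketch below assumes the standard genetic code is adequate for these internal codons (the unusual start codons in Table 4.1 are not involved here); the function and constant names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1, ignoring whitespace."""
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3  # drop any incomplete trailing codon
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


# Heterocapsa psbA segment, DNA positions 601-669 as printed in Figure 4.12.
PSBA_SEGMENT = (
    "GGTGTTGCTG CTGTCTTCGG TGGTTCACTC TTCTCAGCAA "
    "TGCATGGTTC TCTCGTTACT TCTTCACTC"
)

# First 60 nt of the Heterocapsa psaA segment printed in Figure 4.11(a).
PSAA_SEGMENT = (
    "TTAGGTACAG CTGATTTCAT GGTCCATCAT ATCCATGCTT TCACTATTCA CTGTACTCTC"
)

print(translate(PSBA_SEGMENT))
print(translate(PSAA_SEGMENT))
```

Both translations reproduce the Heterocapsa protein rows shown in the figures, which is a quick consistency check on the extracted DNA rows. Comparing DNA-level and protein-level identity across species would need the other rows of the alignments, which are not reproduced here.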
Figure 4.11 DNA and protein sequence alignment of a psaA gene segment. (a) DNA sequences corresponding to the protein segment. (b) Protein sequences. Bold line is potential transmembrane region VIII. Dots indicate nucleotides (in a) or amino acids (in b) identical to Synechocystis. Numbers are the positions on the DNA/protein of Porphyra purpurea. [Aligned organisms: Heterocapsa, Synechocystis, Cyanophora, Porphyra, Guillardia, Odontella, Chlorella, Chlamydomonas, Euglena, Marchantia, Pinus, Nicotiana, Zea mays, Oryza. Only the Heterocapsa rows are recoverable from the extracted text:
(a) psaA DNA, numbering from 1582:
TTAGGTACAG CTGATTTCAT GGTCCATCAT ATCCATGCTT TCACTATTCA CTGTACTCTC
TTAATTTTAA TGAAGGGTGT TCTTTATTCT AGAAGCTCTA GATTAGTTTC AGATAAGTTA
GAATTAGGTT TCCGTTATCC ATGTGATGGT CCTGGTAGAG GCGGTACTTG TCAG
(b) psaA protein, numbering from 528:
LGTADFMVHH IHAFTIHCTL LILMKGVLYS RSSRLVSDKL ELGFRYPCDG PGRGGTCQ]

Figure 4.12 Alignment of a very conserved segment of the psbA gene on transmembrane helix IV (192-218* amino acid) and its downstream partial sequence. Protein sequences of this region are the same in all the 14 organisms. Dots indicate nucleotides identical to that of the Synechocystis DNA sequence. Numbers are the positions on the DNA/protein of Porphyra purpurea. [Aligned organisms as in Figure 4.11. Only the Heterocapsa DNA row and the shared protein row are recoverable from the extracted text:
psbA DNA, positions 601-669:
GGTGTTGCTG CTGTCTTCGG TGGTTCACTC TTCTCAGCAA TGCATGGTTC TCTCGTTACT TCTTCACTC
Protein, positions 201-223:
GVAAVFGGSL FSAMHGSLVT SSL]

Indels (insertions or deletions) of different sizes were found in various regions in all ten chloroplast genes of H. triquetra. Most indels in protein genes were at loops between potential transmembrane regions; however, indels were also found at the C terminus and the N terminus of some protein genes (Table 4.2). On the loop between transmembrane regions IV and V, PsaA and PsaB have 16 and 10 amino acid deletions respectively.
PsaA has a 10 amino acid insertion on the loop between transmembrane regions VI and VII, while PsaB has a 43 amino acid insertion between transmembrane regions II and III (Figure 4.13). PsbA, however, has a 16 amino acid deletion at its C terminus, exactly like a deletion in the Euglena PsbA (Figure 4.13). Short indels were also found at various regions of PsaA, PsaB, PsbA and other proteins (Table 4.2).

Short indels were also found in the 9G-9A-9G region of three related clones of the 23S rRNA gene. Clones H26 and H6K both have a 35 bp deletion in D1; HT33 has three deletions (9 bp in D2, 10 bp in D3, and 6 bp in D4, the last of which is also present in clone HT104), making them 35 bp, 25 bp and 6 bp shorter than clone HT100 respectively. This suggests that heterogeneous 23S rRNA gene minicircles exist in the chloroplast genome of H. triquetra, probably originating by recombination (unequal exchange) of the tripartite region between different copies of the same single gene circle.

Three open reading frames (ORFs) with a putative ATG start and a TAA or TGA (ORF1) stop were found on a 2,012 bp minicircle that had the tripartite 9G-9A-9G region (Figure 4.10). The translations of ORF1, ORF2 and ORF3 were 35, 106 and 72 amino acids respectively. ORF1 and ORF2 had no significant homologues in the databases, and ORF3 had homologues in the database with low E values. A BLAST search suggested that ORF3 is related to small subunit ribosomal protein 14 of either the mitochondrion or the chloroplast. Alignment of the putative H. triquetra rps14 protein sequence with those of other organisms showed that it is extremely divergent (Figure 4.14). It

[Figure residue: rotated protein sequence alignments on the following pages, apparently Figures 4.13 and 4.14; the aligned sequences and captions are not recoverable from the extracted text.]

Table 4.2 Indels in H. 
triquetra chloroplast protein genes

Gene    Insertion                        Deletion
        size (aa)   position             size (aa)   position
psaA    2           71                   14          1-14 (N terminus)
        1           186                  2           101-111
        10          420                  1           243
        1           626                  2           279-280
        2           752 (C terminus)     16          316-331
                                         2           517-518
psaB    11          N terminus           10          310-319
        5           108                  1           483-493
        43          158                  4           730-733 (C terminus)
        3           205
        1           294
        3           399
        2           693
psbA    4           4                    16          345-360 (C terminus)
psbB    1           419                  1           84
        1           C terminus           1           290
                                         1           293
                                         1           347
                                         1           412
psbC    1           157                  31          N terminus
        1           160
        1           219
        1           225
petB    4           157
atpA    5           71                   41          N terminus
        1           251                  5           460-464
        12          C terminus
rps14                                    9           6-14
                                         3           33-35
                                         2           39-40
                                         14          C terminus

The position numbers are the sites of the chloroplast proteins of Porphyra purpurea, which are used as a marker to indicate the occurrence of indels on the same proteins of H. triquetra.

Figure 4.14 Protein sequence alignment of the putative small subunit 14 ribosomal protein (rps14). C, chloroplast. M, mitochondria. [Alignment not reproduced. Sequences aligned, in order: Heterocapsa, Porphyra, Guillardia, Odontella, Chlorella, Astasia, Euglena, Marchantia, Pinus, Nicotiana, Oryza, Dictyostelium, Nephroselmis, Prototheca, Cafeteria, Marchantia, Reclinomonas, Vicia, Arabidopsis, Oryza.]

has a 14 amino acid deletion at its C terminus, a 9 amino acid deletion close to its N terminus, and 3 and 2 amino acid deletions in the center. The four deletions make it 28 amino acids shorter than the rps14 proteins (100 amino acids) of other organisms.
Although BLAST searches indicated that ORF3 is more similar to mitochondrial rps14 than to chloroplast rps14, the presence of the tripartite 9G-9A-9G region on this minicircle suggests that it is of chloroplast origin.

4.7.2 Starts and stops

Only petB has ATG at the presumptive start site expected from alignment of the deduced protein sequence with plastid and cyanobacterial homologues. Determining the start site for the other six protein genes was more difficult, because they have no ATG anywhere near the presumptive N-terminus. psbA has ATG in H. pygmaea, H. niei and H. rotundata (see Chapter 6), but H. triquetra has TTG at this exact position (Figure 4.15), suggesting that TTG is the start codon, as in Chlorella plastid infA (Wakasugi et al. 1997). psaA and psbC have a TTG at putative start sites, but both also have a nearby ATA, used as a start site in some mitochondria (Boore and Brown 1994). psbB has ATA at the exact presumptive start position, and psaB has ATA a short distance upstream (11 amino acids) of the putative start site; even the petB gene has an ATA adjacent to and upstream of its initial ATG (Figure 4.15). atpA has ATA downstream (41 amino acids) of the putative start site; however, in the same reading frame, it also has ATG 15 amino acids downstream of the putative start site but 81 bp (26 amino acids plus a stop TAA) upstream of the ATA.
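The start-codon comparison above can be retraced with a short script. The sketch below is an illustration only and was not part of the original analysis: it translates the N-terminal sequences of Figure 4.15 with the standard genetic code (an assumption; RNA editing or a non-standard code would change the result) and lists the in-frame positions of the three candidate initiation codons discussed here (ATG, TTG and ATA). The function names are hypothetical.

```python
# Standard genetic code in the conventional TCAG order (assumed here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Candidate initiation codons discussed in the text.
START_CANDIDATES = {"ATG", "TTG", "ATA"}

# N-terminal sequences transcribed from Figure 4.15 (blocks of 10 nt).
N_TERMINI = {
    "psaA": "ATAAAATTGT TTTTCAGATA TGTCAATAGT AGAGTCTGGT CTCAAGCAGG CTCAAGTCAC",
    "psaB": "ATATCGCTTT TAGATGGTAG AATTTTAGGT TTTACAACTC ATTCTGATTC ATTTGTATCT",
    "psbA": "TTGAAGAATA CTTTCAACAC TTCTAACGTT TTCGCTTCAG CTTATAGCTT CTGGGGTTCT",
    "psbB": "ATAAGATTAC CTTGGTTCAG AGTTCACATC GTTATTCTTA ACGATCCAGG CCGTCTTATT",
    "psbC": "ATACGTATTT CATGTTTGAA AAAGCGAACT TTAATCGGAT CTAGATATTC TTGGTGGTCA",
    "atpA": "ATAGCATTCA TCGGTGAAGT TTTCAGAATC TGTGCAATGG GCCTTAGCGA AAGTAGCTTC",
    "petB": "ATAATGGGCT TCATTTATGA TTGGTCCGAA GAGCGTTTAG AGATTCAGTC AATTGCTGAT",
    "rps14": "ATGAGTTTAT ATTCAGCTCA TAAGAGTGGT TATACAGTCA AGGGTAGTTC ATCTATTCTT",
}


def codons(seq):
    """Split a sequence into in-frame codons, ignoring blanks and any trailing partial codon."""
    seq = "".join(seq.split()).upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def translate(seq):
    """Translate in frame 1 using the standard genetic code."""
    return "".join(CODON_TABLE[c] for c in codons(seq))


def start_candidates(seq):
    """Return (codon number, codon) for every in-frame ATG, TTG or ATA."""
    return [(n, c) for n, c in enumerate(codons(seq), start=1)
            if c in START_CANDIDATES]


for gene, seq in N_TERMINI.items():
    print(gene, translate(seq), start_candidates(seq))
```

Only in-frame codons are reported, so the output says nothing about which candidate is actually used; that still requires the transcript-level evidence discussed in the text.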
It is possible that atpA used ATG as start, and that RNA editing has turned TAA into an amino acid. Another possibility is that GGT is the start of atpA, and the first codon of that reading frame. 5' RACE, 3' RACE and cDNA sequencing are needed to investigate the possibility of RNA editing in the expression of the chloroplast genes in H. triquetra.

psaA    ATAAAATTGT TTTTCAGATA TGTCAATAGT AGAGTCTGGT CTCAAGCAGG CTCAAGTCAC
        I K L F F R Y V N S R V W S Q A G S S H
psaB    ATATCGCTTT TAGATGGTAG AATTTTAGGT TTTACAACTC ATTCTGATTC ATTTGTATCT
        I S L L D G R I L G F T T H S D S F V S
psbA    TTGAAGAATA CTTTCAACAC TTCTAACGTT TTCGCTTCAG CTTATAGCTT CTGGGGTTCT
        L K N T F N T S N V F A S A Y S F W G S
psbB    ATAAGATTAC CTTGGTTCAG AGTTCACATC GTTATTCTTA ACGATCCAGG CCGTCTTATT
        I R L P W F R V H I V I L N D P G R L I
psbC    ATACGTATTT CATGTTTGAA AAAGCGAACT TTAATCGGAT CTAGATATTC TTGGTGGTCA
        I R I S C L K K R T L I G S R Y S W W S
atpA    ATAGCATTCA TCGGTGAAGT TTTCAGAATC TGTGCAATGG GCCTTAGCGA AAGTAGCTTC
        I A F I G E V F R I C A M G L S E S S F
petB    ATAATGGGCT TCATTTATGA TTGGTCCGAA GAGCGTTTAG AGATTCAGTC AATTGCTGAT
        I M G F I Y D W S E E R L E I Q S I A D
rps14   ATGAGTTTAT ATTCAGCTCA TAAGAGTGGT TATACAGTCA AGGGTAGTTC ATCTATTCTT
        M S L Y S A H K S G Y T V K G S S S I L

Figure 4.15 Putative starts (bold) and the N terminus of eight chloroplast protein genes in H. triquetra.

psaA       TTATTTAAAA TGTTATGGAT AGATTTTTAG TATTTATGGC TTTATTTAGT TTGTTG
psaB       TTATTTAAAA TGTTATGGAT AGATTTCTAG TATTTT-AAT AGT-TCGTTA AAAAAG
psbA       TTATTTAAAA TGTTATGGAT AGATTTCTAG TATTTTTGGC TTTATTGGTG ATTAAG
psbB       TTGGTTAAAA TGTTCTGGAT AGATTTTTAG TATTTTTGGT TTT-ATGTTC GTTTTC
psbC       TTATTTAAAA TGTTATGGCA ACCTCAACCG AAGGGTGGGG TGA-GATTTC TAGTAT
atpA       TTGTAGAAAA TGTTGTGGAT AGATTTCTAG TATTTGGATA AGT-TTGTTA AAAAAG
petB       TGTTTGAAAA TGTTATGGAT AGATTTCTAG TATTTTGATT ATTACTGTTT GGTAGG
rps14      TAGTTAAAAA TGTTGTGGAT AGATTTCTAG TATTTCAACC TTCGGCACCC TCCCCG
23S rRNA   TTAGTTAAAA TTGTAGAAAT GTTGGTGATG -GTGATGGTG TTTGAATGTT TGAAAA
16S rRNA   TTGTTTAAAA TGTTATGGAT AGATTTCTAG TATTTATGGC TTGATAGTAA TGTTAG

Figure 4.16 Putative promoter region (bold) on D4 region of ten chloroplast genes.

The N terminus of all the protein genes and the 5' end of the sense strand of both rRNA genes have the same orientation with respect to the 9G-9A-9G region (Figure 4.10). Chloroplast protein genes in other organisms have a TATA box-like promoter region upstream of the start site (Sugiura 1995). However, H. triquetra chloroplast genes do not have a similar promoter region (TATA box) upstream of the start site, but they have a conserved, very AT-rich region just downstream of the right 9G region (Figure 4.16). This region could be a prime candidate for the promoter, unless the promoter is within the conserved 9G-9A-9G region.

In prokaryotes, the Shine-Dalgarno (SD) sequence (GGAGG) approximately seven nucleotides upstream of the start codon, and the anti-SD sequence (CCTCC) near the 3' end of the 16S rRNA, form the translation initiation complex.
Although anti-SD sequences at the 3' end of the 16S rRNA gene in cyanobacteria and chloroplasts are very conserved, the putative SD sequences of many chloroplast mRNAs differ significantly in size, nucleotide sequence and distance to the start site (Kaneko et al. 1996; Betts and Spremulli 1994). It is unclear if chloroplast translation initiation requires a Shine-Dalgarno sequence (Sugiura et al. 1998). A GGTGG consensus motif, complementary to CCACC at the 3' end of the 16S rRNA sequence, is found at variable distances (-8 to -124 bp) upstream of the putative start codons of several chloroplast genes of H. triquetra (Table 4.3).

Table 4.3 Shine-Dalgarno like sequence in protein genes of H. triquetra

Gene        SD like sequence   Position
psaA        GTCGG              -36
psaB        GTTGG              -36
psbA        GGTGG              -127
psbB        TGAGG              -22
psbC        GGTCG              -38
atpA        GGTGT              -23
petB        GGTAG              -42
rps14       GGTGT              -8
Consensus   GGTGG

Six protein genes in H. triquetra use TAA, and the other two use TAG as stop codon; no TGA codons are used either as stops or as alternative amino-acid codons as in many mitochondria (Table 4.1). This relative rarity of stop codons with a G is usual in AT-rich genomes.

4.8 Tripartite non-coding 9G-9A-9G region

The 9G-9A-9G region consists of three very conserved cores (one 9A core and two related 9G cores) and much less conserved D2 and D3 regions between the cores (Figure 4.17). The central core (9A core) comprises 188 highly conserved nucleotides centered on a motif of nine As (9A), and a 19 bp inverted repeat (consensus sequence: ATCTATCTATCATACCACCAAAGGTGGTATTATAGATAGAT) just 20 bp downstream of the 9A motif (Figure 4.17). The 9A core is flanked by non-identical but closely related 9G cores, 9GL and 9GR, each consisting of 135 conserved nucleotides centered on a run of nine Gs (9G). The 9GL and the 9GR cores have 107 nucleotides out of 135 bp in common, and the sequence of 9GL is more conserved than that of 9GR among different minicircles (Figure 4.17b). In addition to the two highly related 9G cores (direct repeats) and the 19 bp inverted repeats, short direct repeats were also found in D4 of psbA, D3 of psbC, D1 of 16S and D2 of 23S rRNA gene minicircles (Table 4.4).

Figure 4.17 Structure and sequences of the tripartite 9G-9A-9G region of chloroplast gene minicircles. (a) Diagram of the 9G-9A-9G region that consists of three cores (9GL, 9GR and 9A core). (b) Sequence alignment of the 9GL and 9GR cores. (c) Consensus sequences of the 9GL and 9GR core of ten chloroplast genes. (d) Sequence alignment of the 9A core (the inverted repeat of 19 bp downstream of 9A on psaA is bolded). (e) Consensus sequence of the 9A cores of ten chloroplast genes. In the consensus sequences, upper-case letters are nucleotides that are identical in the ten genes; lower-case are nucleotides that are conserved in >50% of the sequences. Dots are nucleotides identical to those of the psaA minicircle sequence. Runs of 9G and 9A are bold. [Panels (a), (b) and (d) are not reproduced; the consensus sequences of panels (c) and (e) follow.]

(c) 9GL core consensus
AAATCCTgAT AAATTTCACT TTTCTCAGTa CTTTTCCCCG GTAAAAGGGG GGGGGTGTCT GCGATTTCAA
AGTGGAGTCC CAAAcGCATg TctGGAATAT ATGAGGagAa GTTATTTTCT CAGATATTct CAGAT

(c) 9GR core consensus
aAATCCTGAT AAATTTCACT TTTCTCAGTA CTTTTCCCCG GTAAAAGGGG GGGGGTGTCT GCGATTTCAA
AGTGGAGTCC CAAAcGCATg TctGGAATAT ATGAGGagga gtAAAATTcT GggAAgtttT agaaa

(e) 9A core consensus
aAAAGTGCAT TTTTGCACGA TTTTaagAAA ACACATGCAA TTTGCCTTGg AttttggccA agCCGCAAGT
CCTTTAAAAA AAAACAcCTC GTGAAatGTt ttCtATCTAT CTATcATACC ACCtttGGTG GTATtATAGA
TAGATtGctA ggATT-CACG AGTTGAAtAA AAAAGGGGAA aaTaATTtt

Table 4.4 Direct repeats on single gene circles of H. triquetra

Gene       Size (bp)   No. of repeats   Position   Sequence
psbA       10          3.4              D2         AGATATTTGA
           10          3.5              D4         GGTTATTATC
psbB       10          3                D2         ATTCTCAGAT
psbC       13          3.3              D3         GGGGCTCAGATTT
           16          6.4              D4         TCGAACATTCATTTCT
16S rRNA   21          2                D4         AAAGTCAATGTCAACACCACC
           33          2                D4         CTTTGATGAACGAAGTTGGACAAGGATTCAAAT
23S rRNA   10          6.9              D2         TTCTCAGATA

Comparison of all 9G-9A-9G regions of ten H. triquetra chloroplast minicircles showed that the 9GL, 9A and 9GR core regions are strongly conserved, although some single base insertions and substitutions have occurred (Figure 4.17). In contrast, the D2 region is more variable, with different numbers of a decanucleotide repeat (Figure 4.18a). In the 23S rRNA circle, six decanucleotide repeats are followed by a seventh with a 2 nucleotide substitution, and the first two repeats are shared by all the ten gene circles. In the psbA circle, two decanucleotide repeats are followed by a third with a 3 nucleotide substitution. Interestingly, petB and psaB share an identical 42-nucleotide sequence (boxed in Figure 4.18a), and upstream of this 42 bp is a very conserved region of 20 bp shared by all ten genes (bold). Short DNA sequences shared between genes were also observed; for example, psbA and psbB have the same 17 bp (boxed).
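As a side check on the 19 bp inverted repeat of the 9A core described at the start of this section, the two arms of the consensus can be compared by reverse complementation. The sketch below is illustrative only and assumes the 41 nt consensus exactly as printed in the text (a 19 nt arm, a 3 nt spacer and a second 19 nt arm); the helper names are hypothetical. The same helper also confirms the complementarity of the SD-like motif GGTGG and the CCACC sequence of the 16S rRNA mentioned earlier.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def arm_mismatches(seq: str, arm: int) -> list:
    """1-based positions in the left arm that do not pair with the right arm."""
    left, right = seq[:arm], seq[-arm:]
    expected = reverse_complement(right)
    return [i + 1 for i, (a, b) in enumerate(zip(left, expected)) if a != b]


# Consensus of the inverted repeat as printed in the text.
CONSENSUS = "ATCTATCTATCATACCACCAAAGGTGGTATTATAGATAGAT"
ARM = 19  # arm length given in the text

print("length:", len(CONSENSUS))
print("spacer:", CONSENSUS[ARM:-ARM])
print("unpaired positions in left arm:", arm_mismatches(CONSENSUS, ARM))
print("reverse complement of GGTGG:", reverse_complement("GGTGG"))
```

With the consensus as printed, the two arms pair at 18 of 19 positions, so the repeat is nearly but not perfectly palindromic at the consensus level. Individual minicircles may differ from the consensus.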
Figure 4.18 Sequence alignment of the variable region of D2 (a) and D3 (b) from ten chloroplast genes. Direct repeats are in square brackets and gene conversions are boxed; the conserved sequences are bold for D2 and underlined for D3. [Alignment not reproduced.]

Although three genes share a 14-nucleotide variant in the D3 region (Figure 4.18b), and others share small DNA fragments, the D3 region as a whole is more conserved than the D2 region (Figure 4.18b).
Such variants shared by two or more genes are indicative of gene conversion, which is probably the main homogenizing force between the 9G-9A-9G regions.

Since all the minicircles so far identified in H. triquetra have the tripartite non-coding 9G-9A-9G region, it is reasonable to argue that all these minicircles came from the same compartment, the chloroplast, although this needs to be confirmed by in situ hybridization using a probe containing the 9G-9A-9G region. Hybridization with a probe containing the 9G-9A-9G region of the psaA minicircle labeled several bands (1.2-8 kb) only on the satellite DNA lane (Figure 4.19), suggesting that all the minicircles containing the 9G-9A-9G region are present in the satellite DNA. Probably those bands of >4 kb labeled by the probe represented dimers or trimers generated by recombination between different minicircles, and those labeled bands of <4 kb resulted from single minicircles. The observation that none of the 15 minicircles sequenced is smaller than 2 kb, and that most of the ten genes as well as the five aberrant minicircles are around 2 kb, suggests that minicircle size is somehow constrained. Perhaps a certain minimum size of a DNA minicircle (e.g. 2 kb) is required for basic survival functions such as replication and transcription (also see Chapter 5). Possibly the very thick bands of approximately 2.4 kb and 1.3 kb represent open circular and supercoiled forms respectively of 2 kb minicircles that seemed to be abundant (Figure 4.1, also see Figure 5.1).

Figure 4.19 DNA blot probed with the non-coding 9G-9A-9G region of psaA minicircle. T, total DNA. S, satellite DNA. M, main band DNA. C, control of the 9G-9A-9G probe. The DNA used was not treated with any restriction enzyme.

The entire tripartite non-coding region of each H. triquetra gene can be folded into an elaborate secondary structure with many long hairpins and small unpaired loops using DNA fold (http://mfold.wustl.edu/~folder/dna) (Figure 4.20), but many alternative structures of similar thermodynamic stability are possible. However, most are minor combinations of a small set of conserved hairpins. In all nine genes certain patterns recur: in each the 9A motif forms a large unpaired region near the base of a conserved long hairpin resulting from the 19 bp inverted repeats, and the 9G regions form one side of a base-paired stem near the ends of two other hairpins. Short repeats and AT-rich sequences with a high potential for hairpins occur in the replication origins of chloroplast DNA in Euglena (Schlunegger and Stutz 1984) and Chlamydomonas (Wu et al. 1986). As regions with a high capacity for secondary structure but little or no primary sequence conservation are very typical of prokaryotic replication origins, the tripartite regions are good candidates for the replicon origins of the unigenic circles. Another function for this region may be to bind the circles to a larger structure, e.g. a membrane, to mediate DNA segregation. The minicircle organization would make this necessary to avoid gene loss during chloroplast division, unless their copy number was exceptionally high.

Figure 4.20 Secondary structure of the tripartite non-coding region 9G-9A-9G of the minicircular psbA gene of H. triquetra. 9GL and 9GR as well as 9A show the run of 9G and 9A respectively. 9GL and 9GR each form part of a hairpin, while the 9A is part of an unpaired loop.

4.9 RNA blot

Total RNA blots probed with spinach psaA and petB genes as well as H. triquetra psbA and 23S rRNA genes showed that these chloroplast genes were expressed in mRNAs of the size expected for monocistronic transcripts (Figure 4.21). The largest band is psaB, with an approximate size of 2.4 kb, only slightly larger than the minimum expected for a protein of 776 amino acids.
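The size argument for the psaB transcript can be made explicit with a small calculation. The sketch below is a worked illustration, not part of the original analysis: it assumes three nucleotides per codon plus one stop codon and ignores untranslated regions, and it takes the band size as the approximate 2.4 kb quoted above. The function name is hypothetical.

```python
def min_transcript_nt(n_amino_acids: int, stop_codon: bool = True) -> int:
    """Minimum coding length in nucleotides for a protein of the given length."""
    n_codons = n_amino_acids + (1 if stop_codon else 0)
    return 3 * n_codons


PSAB_AA = 776        # protein length quoted in the text
OBSERVED_KB = 2.4    # approximate band size quoted in the text

coding_nt = min_transcript_nt(PSAB_AA)
print("minimum coding length (nt):", coding_nt)  # 2331
print("remaining for untranslated sequence (nt):",
      round(OBSERVED_KB * 1000 - coding_nt))
```

Because the band size is only approximate, the remainder should be read as an order-of-magnitude figure: the transcript has little room for anything beyond the psaB coding sequence itself.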
There is no transcript large enough to encode both psaA and psaB, supporting the idea that these transcripts originate from single gene circles and not from a larger DNA molecule carrying the typical bicistronic operon. Similar conclusions can be drawn for the petB, psbA and 23S rRNA transcripts. This does not rule out the possibility that a larger circle carrying more than one gene exists, since monocistronic messages could still be transcribed from such a circle.

Figure 4.21 Hybridization of Heterocapsa triquetra RNA with chloroplast gene probes. 1. Spinach psaB. 2. Spinach petB. 3. H. triquetra psbA. 4. H. triquetra 23S rRNA.

4.10 Chloroplast genome size

In addition to the two ribosomal RNA genes (16S and 23S rRNA) and eight protein genes (Table 4.1), a DNA blot showed that the psbD gene is in the satellite DNA of H. triquetra (Chapter 2), which means that so far only eleven chloroplast genes are known in the chloroplast genome of H. triquetra. The protein-coding genes identified include at least one polypeptide from each of the four major membrane-protein assemblies of thylakoids and a putative ribosomal protein gene (Table 4.1). Five minicircular chloroplast genes that also include at least one gene for the four major membrane-protein assemblies of thylakoids have been sequenced in the dinoflagellate Amphidinium operculatum (Barbrook and Howe 2000). Two of them that have not been found in H. triquetra, petD and atpB, encode subunit IV of the Cyt b6f complex and the β subunit of ATPase respectively. Two more genes, petD and psbE, were found in H. triquetra (Filek and Green, unpublished). This makes a total of 14 genes in dinoflagellates. In both species, conspicuously absent are any genes for tRNA and RNA polymerase as well as the majority of soluble proteins such as ribosomal proteins.

In a search for other chloroplast genes in H.
triquetra, DNA blots were probed with Guillardia theta chloroplast elongation factor Tu (tufA), and RNA polymerase II (rpoB) genes and spinach atpB/E gene (not shown). At low stringency the tufA probe labeled a high molecular weight band on the uncut total genomic DNA, in both the satellite and major band DNA lanes, but at high stringency these bands were very weak or disappeared. The bands observed might be an artifact, as it is unlikely (but not impossible) that the tufA gene is present in both satellite and major band DNA. The rpoB and atpB/E gene probes labeled nothing at either high or lower stringency. These results suggested that the tufA, rpoB and atpB/E genes are probably too divergent to be detected by the probes used. However, rpoB might be totally absent as in most mitochondria. Most mitochondria have replaced the eubacterial RNA polymerase by a nuclear-encoded virus-type RNA polymerase (Gray et al. 1998). Since in land plants such a nuclear-encoded polymerase is known to be present in chloroplasts and transcribes some genes (Cermakian et al. 1997), it is quite possible that in dinoflagellates this is the only chloroplast RNA polymerase and that the plastid encoded version has simply been lost. The absence of ribosomal proteins (except rps14) and tRNAs, though unprecedented for chloroplast genomes, is well known in some mitochondria (Gray et al. 1998). In an attempt to find tRNA genes, tRNAscan (http://www.genetics.wustl.edu/eddy/tRNAscan-SE) and gene search in Gap4 of Staden were used for all the sequences of the 124 clones; however, no tRNA gene was found.

Because both H. triquetra and A. operculatum have minicircular genes for different components of the ATPase and Cyt b6f assembly, it is probable that each of them has maintained only one gene for each assembly of thylakoids, which could be the minimum complement of chloroplast genes for photosynthetic function.
These results, and the earlier evidence for loss of plastid-encoded Rubisco (Morse et al. 1995; Rowan et al. 1996), tend to suggest that the chloroplast genomes of H. triquetra and A. operculatum both have a reduced complement of genes. However, although further randomly picked clones have revealed only copies of most of the ten genes, I have probably not yet cloned every plastid gene. Southern blotting using a spinach psbD probe suggests that the psbD gene is present as small molecules in H. triquetra, and in Amphidinium carterae (Hiller and Green, unpublished). A more exhaustive search for additional plastid genes is needed before I can be sure that dinoflagellate chloroplast genomes have indeed undergone as extensive genome reduction as have some mitochondria (Feagin 1992).

Chapter 5 Minicircles with jumbled chloroplast gene fragments

5.1 General characteristics

In addition to the ten single gene circles in the chloroplast genome of H. triquetra described in Chapter 4, five aberrant minicircles with the tripartite non-coding 9G-9A-9G region were completely sequenced (Figure 5.1a, Table 5.1). Aberrant minicircles were generated by circularizing five contigs that each resulted from overlapping the sequences of several clones. However, none of them contains a complete chloroplast gene; instead each contains fragments of chloroplast genes 16S rRNA, 23S rRNA, psbA and psbC. Each fragment on an aberrant minicircle and the corresponding sequence on the normal minicircular chloroplast gene have the same orientation with respect to the 9G-9A-9G region (Figure 5.1, Table 5.1); therefore all the chloroplast gene fragments on the aberrant circles have the same polarity. The size of the five aberrant circles is around 2 kb, which is smaller than the average size (~2.57 kb) of the known functional single-gene circles.
5.2 Aberrant minicircles are circular molecules

To rule out the possibility that aberrant circle 1 was a cloning artifact or chimera, inverse PCR reactions amplifying genomic DNA were carried out using primer pairs 23S3/16S1 and 23S4/16S2. The products were of the size predicted for a circular molecule (Figure 5.1a). The sequences of the PCR products could be integrated into the sequence of circle 1, which further confirmed that circle 1 is a circular molecule. PCR using primers specific for circles 2-4 and directed inwardly as well as outwardly (Figure 5.1a) gave products of the sizes expected for the circular contigs (data not shown).

Figure 5.1 Structure of aberrant and normal minicircles, (a) and (b). (a) Jumbled chloroplast gene fragments on five minicircles. The gene name and the number in brackets are consistent with those in Table 5.1 and in (b); each color represents a fragment of a particular chloroplast gene. All fragments on each circle have the same order and orientation as their coordinates on the original single gene circles (see b). *9G and 8GR are related to 9GR, and #9G is related to 9GL. 0 is the start site of 9GL. Primer pairs 16S1/23S3 and 16S2/23S4 were used in the inverse PCR that confirmed that circle 1 is a genuine minicircle. Primer pairs 16S2/23S7, 99C2/23S7, 116C1/116C4, 99C4/23S7, 99C2/99C5 and 99C1/99C4 on circles 2-4 gave PCR products of the expected sizes. The large homologous segments that show a family relationship among the five circles are indicated by lines (continuous, dashed or dotted) labeled with Roman numerals within each circle. (b) Coordinates of the chloroplast gene fragments of circles 1-5 on the minicircles of the 23S rRNA, 16S rRNA, psbA and psbC genes. Five regions of homology (A, B, C, D and E) are indicated by dotted lines, and the nucleotides included are numbered at each end. Numbers beside the fragments correspond to those on the five selfish circles (see (a) and Table 5.1). The short fragments labeled by Greek letters (e.g. α, β and γ) beside large ones represent duplications (repeats) (also see Figure 5.2), and gaps within the large fragments represent deletions on the selfish circles. The solid arrows on the fragments of circles 1, 2 and 3 homologous to the B region of 23S rRNA mark the insertion sites of the 16S fragments (C region). The arrows on the fragments homologous to the E region mark the location of the 11 bp insertion on circles 3-5.

[Table: contents not recoverable from the extracted text.]

Since the satellite DNA was digested with Sau3A (also see Chapter 4), the DNA fragments used to make the plasmid libraries should have "GATC" at each end. Any artifactual clones should therefore have Sau3A sites (GATC) at the junctions of chloroplast gene fragments. However, no Sau3A sites were found at the junctions of chloroplast gene fragments in the aberrant circles. Instead, all Sau3A sites were found within the chloroplast gene fragments. Because the minicircular chloroplast genes, including psbA, psbC, 16S rRNA and 23S rRNA, were not cloning artifacts (also see Chapter 4), it is reasonable to argue that the chloroplast gene fragments containing Sau3A sites were also not cloning artifacts. Furthermore, all the jumbled chloroplast gene fragments were derived from the same five regions of the four chloroplast genes, and Sau3A sites were not found at the junctions of these fragments. Therefore the five aberrant circles are unlikely to be cloning artifacts or chimeras.
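The junction argument above can be expressed as a small check. The following is only an illustrative sketch, not the procedure used in this work: the function names, the toy sequences and the junction coordinates are hypothetical, and a junction at position j is assumed to mean the boundary between bases j-1 and j of a circular sequence. A site is reported if it overlaps or directly abuts a junction, which is where a Sau3A ligation artifact would be expected to leave GATC.

```python
def circular_window(seq: str, center: int, flank: int) -> str:
    """Bases center-flank .. center+flank-1 of a circular molecule."""
    n = len(seq)
    return "".join(seq[(center + i) % n] for i in range(-flank, flank))


def junctions_with_site(seq: str, junctions: list[int], site: str = "GATC") -> list[int]:
    """Return the junction positions that a recognition site overlaps or abuts.

    A junction j is the boundary between seq[j-1] and seq[j]. The window of
    len(site) bases on each side contains every site copy that ends at,
    spans, or starts at the junction.
    """
    seq = seq.upper()
    flank = len(site)
    return [j for j in junctions if site in circular_window(seq, j, flank)]


# Hypothetical toy circle: one GATC starting exactly at junction 10, none at 24.
toy = "ACGTTACGTA" + "GATC" + "TTAGCATTAG" + "CCATGCAAGT"
print(junctions_with_site(toy, [10, 24]))   # junction 10 is flagged, 24 is not

# Hypothetical circle in which GATC spans the origin (…GA | TC…).
wrapped = "TCAAAAAAGA"
print(junctions_with_site(wrapped, [0]))
```

Applied to a real circle, an empty result for all fragment junctions would correspond to the observation reported above, namely that every Sau3A site lies within a gene fragment rather than at a junction.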
5.3 Jumbled chloroplast gene fragments

Using "BLAST two sequences" (http://www.ncbi.nlm.nih.gov/BLAST), the sequence of each aberrant circle was compared with the sequences of the ten minicircles that each consist of a single chloroplast gene and the tripartite 9G-9A-9G region (Chapter 4). The alignments showed that each aberrant minicircle contains fragments that are highly related (>93% identity) to segments from two or three of the four chloroplast genes psbA, psbC, 16S rRNA and 23S rRNA (Figure 5.1, Table 5.1). The chloroplast gene fragments appeared to originate from five regions of the normal circles: the A and B regions of the 23S rRNA gene, the C region of the 16S rRNA gene, the D region of psbA and the E region of psbC (Figure 5.1b). Fragments derived from each of the five regions were present in three or four circles: the A, B or C region in circles 1, 2 and 3, the E region in circles 3, 4 and 5, and the D region in circles 1, 2, 4 and 5. The overlapping components suggest that different circles containing fragments from the same region are related. In fact, all five circles might be related if all the fragments are taken into account.

Comparing the fragments with the homologous regions on the original chloroplast gene circles, we see evidence of duplications and deletions, although the number, size and position of the fragments from each region differ in each of the five aberrant circles (Figure 5.1, Table 5.2). Fragments derived from four of the regions (all except C) were either separated by gaps (deletions) or duplicated at various sites (Greek letters in Figure 5.1b). These characteristics of the jumbled fragments are clearly shown by the coordinates of the chloroplast gene fragments on the four original gene minicircles (Figure 5.1b, Table 5.1).
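The claim that all five circles are linked through shared regions can be checked by tabulating which regions each pair of circles has in common. The sketch below is illustrative only. It uses the memberships stated in this chapter for the B, C, D and E regions; the A region is left out because the text does not spell out its membership unambiguously, and the variable and function names are assumptions.

```python
from itertools import combinations

# Region -> circles carrying a fragment from that region (as stated in the text).
# The A region is omitted: its membership is not stated unambiguously.
REGION_TO_CIRCLES = {
    "B": {1, 2, 3},      # 23S rRNA
    "C": {1, 2, 3},      # 16S rRNA
    "D": {1, 2, 4, 5},   # psbA
    "E": {3, 4, 5},      # psbC
}


def shared_regions(region_to_circles: dict[str, set[int]]) -> dict[tuple[int, int], list[str]]:
    """For every pair of circles, list the regions represented on both."""
    circles = sorted(set().union(*region_to_circles.values()))
    return {
        (a, b): sorted(
            region
            for region, members in region_to_circles.items()
            if a in members and b in members
        )
        for a, b in combinations(circles, 2)
    }


pairs = shared_regions(REGION_TO_CIRCLES)
for pair, regions in pairs.items():
    print(pair, regions)
```

With these memberships every pair of circles shares at least one region, which is the sense in which the five circles form a single family.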
For example, fragment 5 (791 bp), extending from *9G to the 9GL of circle 1, is highly related to the corresponding segment of the 23S rRNA minicircle, with 98% identity if the gaps and duplications are not taken into account (Table 5.1). When the sequence of fragment 5 was aligned with that of the corresponding region of 23S rRNA, it showed three gaps of 10, 5 and 23 bp and two repeats of 116 bp, the second of which has a deletion of 24 bp (α on 23S rRNA in Figure 5.1b; Figure 5.2a). Fragment 6 (337 bp) on circle 2 is highly related to the B region of the 23S rRNA gene, with 99% identity. It has no deletion but three repeats of 29 bp, and the second 29 bp repeat has a duplication of 6 bp (β, γ on 23S rRNA in Figure 5.1b; Figure 5.2b). Similarly, fragment 2 on circle 3 has a gap of 26 bp, two repeats of 39 bp (δ on 23S in Figure 5.1b) and two repeats of 25 bp (ε on 23S in Figure 5.1b) at different locations.

[Table: contents not recoverable from the extracted text.]

Although the 16S rRNA gene fragments on circles 1-3 do not have internal gaps or duplications, on circle 3 the fragment has a deletion of 118 bp at the 5' end (16S in Figure 5.1b). The 16S segment actually appears as an insertion at the same site of the 23S fragment on circles 1-3 (arrows on 23S in Figure 5.1b). The 3' ends of the 16S fragments all have 3 bp (TTC) homologous to the junction site of the 23S segments at the 5' end of the insertion. The integration of the 16S rRNA fragment into circles 1-3 probably occurred before the divergence of circles 1, 2 and 3, since it is unlikely that the same fragment was integrated into three circles independently.

The E region of the psbC gene covers part of the fifth transmembrane helix and the following loop region. The sequences of the psbC fragments on circles 3, 4 and 5 are identical to the sequence of the E region of the normal psbC gene with the exception of the deletions and duplications (Figure 5.1, Table 5.1).
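Identity values quoted "if the gaps and duplications are not taken into account" can be illustrated with a small calculation over gap-free alignment columns. The sketch below is not the BLAST computation used in this work. The sequences are the 29 bp repeat units shown in Figure 5.2b; the placement of the 6-column gap in the reference is an assumption made for illustration, and the function name is hypothetical.

```python
def identity_ignoring_gaps(ref: str, query: str, gap: str = "-") -> tuple[int, int]:
    """Return (matches, compared columns), skipping any column with a gap."""
    if len(ref) != len(query):
        raise ValueError("aligned sequences must have equal length")
    matches = compared = 0
    for r, q in zip(ref.upper(), query.upper()):
        if r == gap or q == gap:
            continue  # gaps and inserted (duplicated) bases are not scored
        compared += 1
        matches += r == q
    return matches, compared


# 23S rRNA positions 1560-1588 and the repeats on circle 2 (Figure 5.2b).
REF = "ATCTTATGAG" "CTAA" "ATTGTCTATG" "TATGT"
C2_FIRST = "ATCTTATGAG" "CTAA" "ATTATCTATA" "TATGT"

# Second repeat (beta) carries a 6 bp duplication; the gap position is assumed.
REF_GAPPED = "ATCTTATGAG" "------" "CTAA" "ATTGTCTATG" "TATGT"
C2_BETA = "ATCTTATGAG" "TATGAG" "CTAA" "ATTGTCTATG" "TATGT"

for name, ref, query in [("first", REF, C2_FIRST), ("beta", REF_GAPPED, C2_BETA)]:
    m, n = identity_ignoring_gaps(ref, query)
    print(f"{name}: {m}/{n} = {100 * m / n:.1f}%")
```

Under this convention a duplicated stretch lowers the identity only through substitutions inside the aligned columns, not through the extra bases themselves.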
The psbC fragment on circle 3 has a deletion of 165 bp, whereas the psbC fragments on circles 4 and 5 have two and three repeats, respectively, of 33 bp, which encode 11 amino acids of the psbC protein (κ on psbC in Figure 5.1b; square brackets in Figure 5.2c). BLAST searches of the translation products of the psbC segments on circles 3, 4 and 5 revealed two fragments matching the psbC protein sequence: a small one of 13 amino acids and a large one of 51, 117 and 128 amino acids on circles 3, 4 and 5, respectively (Figure 5.2c). The two psbC protein fragments actually result from a reading frame shift caused by an insertion of 11 bp at the 40th bp from the 5' end of the psbC fragments (arrow on psbC in Figure 5.1b). The large fragments all have the start ATG and the stop TAA, but the small segments have the start ATG (circle 3) or TAA (circle 3 and 5), and the stop TGA.

(a)
23S rRNA (1081) TAACATTGTT AATTGTTATT AAAGCCCGAA TGCTACCGAT CTAACTATCA TTAAGGTTTT
C1(5)    (1905) TAACATTGTT TATTGTTATT AAAGCCCGAA TGCTACCGAT CTAACTATCA TTAAGGTTTT
C1(α)    (2038) TAACATTGTT TATTGTTATT AAAGCCCGAA TGCTATCGAT CTAACTATC- ----------
23S rRNA        TCTGCTGGTT GAGTATACTA AAGATTAAAA TGTAAAGGGA TACACACCTG CCATTC (1206)
C1(5)           TCTGCTGGTT GAGTATACTA AAGATTAAAA TGTAAAGGGA TACACACCTG CCATTC (2020)
C1(α)           ---------- ---TATACTA AAGATTAAAA TGTAAAGGGA TACACACCTG CCATTC (2129)

(b)
23S rRNA (1560) ATCTTATGAG ------CTAA ATTGTCTATG TATGT (1588)
C2(6)    (1639) ATCTTATGAG ------CTAA ATTATCTATA TATGT (1667)
C2(β)    (1668) ATCTTATGAG TATGAGCTAA ATTGTCTATG TATGT (1702)
C2(γ)    (1703) ATCTTATGAG ------CTAA ATTGTCTATG TATGT (1731)

(c)
psbC (259)       LAYSLSALSL MGFIAAVYAW YNNTAYPSEF YGPTGPEASQ AQGFTFLVRD QKLGIKVASS
Circle 3         ---------- -----MVNRR VNNTAYPSEF YGPTGPEASQ AQGFTFLVRD QKLGIKVASS
Circle 4         ---------- -----MVNRR VNNTAYPSEF YGPTGPEASQ AQGFTFLVRD QKLGIKVASS
Circle 5         ---------- -----MVNRR VNNTAYPSEF YGPTGPEASQ AQGFTFLVRD QKLGIKVASS
Circle 3 (short) MRNHQGKLSL MGFIAAVYAW STEG
Circle 4 (short) HTRTFGKLSL MGFIAAVYAW STEG
Circle 5 (short) HTRTFGKLSL MGFIAAVYAW STEG

psbC             QGPTALAKYL MRSPSGEVIF GGETMRFWSV QGG------- ---------- -----WVEPL
Circle 3         QGPTALAKY- ---------- ---------- ---------- ---------- ----------
Circle 4         QGPTALAKYL MRSPSGEVIF SGETMRFWSV QGGETMRFWS VQGG------ -----WVEPL
Circle 5         QGPTALAKYL MRSPSGEVIF SGETMRFWSV QGGETMRFWS VQGGETMRFW SVQGGWVEPL

psbC             RTSFGLDIYK IQSDIQSWQE RRAAEYMTHA PLGALNSVGG (369)
Circle 3         ---------- ---------- ------MTHD SRTTTLSFGR
Circle 4         RTSFGLDIYK IQSDIQSWQE RRAAEYMTHD SRTTTLSFGR
Circle 5         RTSFGLDIYK IQSDIQSWQE RRAAEYMTHD SRTTTLSFGR

(d)
psbA (1)  LKNTFNTSNV FASAYSFWGS FIGFILSTSN ---------- ---------- -RLYIGWFGI LMF (42)
Circle 4  LKNTFNTSNV FASAYSFWGS FIGFILSTSN ---------- ---------- -RLYILLIVL LHS
Circle 5  LKNTFNTSNV FASAYSFWGS FIGFILSTSN VFASAYSFWG SFIGFILSTS NRLYILLIVL LHS

Figure 5.2 Alignment of the sequences of repeated regions. (a) The 116 bp repeats on circle 1; the second repeat (α) has a deletion of 24 bp. (b) Sequence alignment of the three repeats (29 bp) on fragment 23S (6) of circle 2, showing that 6 bp of the second repeat (β) were duplicated (bold). (c) Alignment of the amino acid sequences deduced from circles 3, 4 and 5 and the corresponding region of the normal psbC protein of H. triquetra. The short peptides (bold names) and the large ones are the two peptides resulting from the reading frame shift caused by the insertion of 11 bp; the repeats are marked κ. (d) Alignment of the amino acid sequences deduced from circles 4 and 5 and the corresponding region of the psbA protein of H. triquetra; the repeat is marked φ. Square brackets and bold letters represent the repeats. The numbers indicate the nucleotide positions of the repeats (a, b) and the amino acid positions of normal psbC (c) and psbA (d) of H. triquetra.

The psbA(1) fragments on circles 1 and 2 and the psbA(2) fragments on circles 4 and 5 were derived from the D4 region of the psbA minicircle; the psbA(1) fragments on circles 4 and 5 cover both the D4 region and the 5' end of the psbA gene (Figure 5.1, Table 5.1). The D4 region was duplicated in circles 4 and 5, but the copies do not appear as direct repeats.
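The repeat counts and the frame-shift argument for the psbC fragments can be illustrated with a short sketch. It is not part of the original analysis. The peptide sequences are those of the middle block of Figure 5.2c; the 11-residue unit and its phase (GETMRFWSVQG) are an assumption read off that alignment, since a tandem repeat can be written in several equivalent phases, and the function names are hypothetical.

```python
def max_tandem_copies(seq: str, unit: str) -> int:
    """Largest number of back-to-back exact copies of `unit` found in `seq`."""
    if not unit:
        raise ValueError("unit must be non-empty")
    best = 0
    for start in range(len(seq)):
        copies, pos = 0, start
        while seq.startswith(unit, pos):
            copies += 1
            pos += len(unit)
        best = max(best, copies)
    return best


def shifts_frame(indel_bp: int) -> bool:
    """An insertion or deletion shifts the reading frame unless it is a multiple of 3."""
    return indel_bp % 3 != 0


UNIT = "GETMRFWSVQG"  # assumed phase of the 11-residue (33 bp) repeat
PSBC = "QGPTALAKYLMRSPSGEVIFGGETMRFWSVQGGWVEPL"
CIRCLE_4 = "QGPTALAKYLMRSPSGEVIFSGETMRFWSVQGGETMRFWSVQGGWVEPL"
CIRCLE_5 = "QGPTALAKYLMRSPSGEVIFSGETMRFWSVQGGETMRFWSVQGGETMRFWSVQGGWVEPL"

for name, seq in [("psbC", PSBC), ("circle 4", CIRCLE_4), ("circle 5", CIRCLE_5)]:
    print(name, max_tandem_copies(seq, UNIT))

for size in (11, 33, 165):
    print(size, "bp:", "frame shift" if shifts_frame(size) else "frame preserved")
```

The 33 bp repeats and the 165 bp deletion are multiples of three and so leave the reading frame intact, whereas the 11 bp insertion does not, which is consistent with the two separate peptides described above.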
The translation products of the psbA fragments on circles 4 and 5 match 34 amino acids at the N terminus of the D1 protein, with the addition of a 21 amino acid (63 bp) duplication on circle 5 (φ on psbA in Figure 5.1b; Figure 5.2d). All the fragments derived from the D region of psbA have small deletions. Considering the similarity of the psbA fragments among the four circles, circle 1 is closely related to circle 2, and circle 4 is more closely related to circle 5 than to circles 1 and 2.

The presence of fragments derived from the same region of the normal chloroplast genes in different minicircles indicates a shared evolutionary origin. Thus circles 1, 2, 4 and 5 are related since they all have fragments derived from the D region of the psbA gene. Similarly, circles 3, 4 and 5 are related because they have fragments derived from the E region of the psbC gene, and circles 1, 2 and 3 are related since they all have fragments derived from the B region of 23S rRNA and the C region of 16S rRNA (Figure 5.1). Therefore the five aberrant minicircles are all related to each other when all the jumbled chloroplast gene fragments are taken into account.

The close relationship among the five circles is also revealed by the organization of the larger gene fragments, indicated by the dotted or dashed lines and Roman numerals inside each circle (Figure 5.1a). The chloroplast gene fragments from the 9GR to the extra 9G, and the organization of these fragments, are identical on circle 1 (I) and circle 2 (I') except for a few indels (Figure 5.1a). Similarly, the organization of the chloroplast gene fragments on circle 4 (III, IV') is highly related to that on circle 5 (III', IV'') except that there are more repeated sequences on circle 5 than on circle 4 (Figure 5.2, 5.4). Circle 3 appears to be a bridge between circles 1 and 2 and circles 4 and 5. On the one hand, in the region from 8GR to the extra 9G on circle 3, I'' is related to that of circles 1 (I) and 2 (I'), and II' is related to that of circle 2 (II).
On the other hand, circle 3 is closely related to circles 4 and 5, since IV, IV' and IV'' on the three circles are identical once repeats and indels are allowed for.

Deletions or duplications are present in almost all the chloroplast gene fragments on the aberrant circles but are absent in their homologues on the chloroplast genes (Figure 5.1b). This suggests that deletions and duplications occurred more frequently in the jumbled chloroplast gene fragments than in their homologues in the normal chloroplast genes of H. triquetra.

5.4 The 9G-9A-9G region

The tripartite 9G-9A-9G region is highly conserved among the single gene circles and the aberrant minicircles (Figure 5.3), although variants are present in the regions (D1, D2, D3 and D4) between the cores. However, comparison of the sequences of the 9G cores indicated that the 9GL of circles 2, 3 and 4 are closely related to one another but are otherwise the most divergent 9GL among all the minicircles in H. triquetra. The 9GL of circles 1 and 5 are identical to the 9GL of the psbC, 16S rRNA and 23S rRNA single gene circles. The 9GR of circles 4 and 5 are identical to the 9GR of the psbC and 16S rRNA single gene circles, while the 9GR of circles 1 and 2 are identical to one another but are otherwise the most divergent of all the minicircle 9GR regions (bold, Figure 5.3a). The presumed 9GR of circle 3 has a core with a run of 8 G instead of 9 G. Sequence alignment indicated that the 8G core is highly related to 9GR except for a few mutations (Figure 5.3a). As for the 9A region, circle 1 is similar to the 16S rRNA and psbC single gene minicircles, circle 5 is similar to the 23S rRNA and psbA single gene minicircles, but circles 2, 3 and 4 are more similar to each other than to the other circles (Figure 5.3b).
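Candidate 9G, 8G and 9A cores can be located in a sequence by searching for homopolymer runs of a minimum length. The sketch below is purely illustrative: the toy sequence is hypothetical and is not a minicircle sequence, the function name is an assumption, and the sequence is treated as linear, so a core spanning the chosen origin of a circular contig would be missed.

```python
import re


def homopolymer_runs(seq: str, base: str, min_len: int) -> list[tuple[int, int]]:
    """Return (start, length) for every run of `base` at least `min_len` long."""
    pattern = re.compile(f"{re.escape(base)}{{{min_len},}}")
    return [(m.start(), len(m.group())) for m in pattern.finditer(seq.upper())]


# Hypothetical toy sequence with a 9G run, a 9A run and an 8G run.
toy = "ACT" + "G" * 9 + "TTCT" + "A" * 9 + "CTAC" + "G" * 8 + "TCA"

print(homopolymer_runs(toy, "G", 8))   # both G cores, including the 8G variant
print(homopolymer_runs(toy, "G", 9))   # only the full 9G core
print(homopolymer_runs(toy, "A", 9))   # the 9A core
```

Lowering the threshold from 9 to 8 is what allows a variant core such as the 8G run on circle 3 to be picked up alongside the canonical 9G cores.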
[Figure 5.3: sequence alignments of the 9G cores (a) and the 9A cores (b) of the single gene circles and aberrant minicircles; not recoverable from the extracted text.]

The 9G-9A-9G region of each aberrant circle may have undergone reorganization. Both 9G cores on circles 2, 3 and 4 are more similar to the 9GR than to the 9GL of the single gene minicircles (Figure 5.3a). The 9G and 9A cores on some aberrant circles probably originated from different single gene circles. For example, on circle 1 the 9GL is identical to the 9GL of the 16S rRNA, 23S rRNA and psbC single gene circles, but the 9GR is different from the 9GR of these three single gene circles. More interestingly, circles 1, 2 and 3 each have an extra 9G core (Figure 5.1a). Sequence comparison showed that a segment of 791 bp on circle 1 including the extra 9G has 98% identity to the 9GR and the neighboring 23S rRNA gene on the single gene minicircle (Figure 5.1, Table 5.1). Further, the extra 9G of circle 1 is identical to the 9GR of the 23S rRNA circle.
The extra 9G of circle 2 is highly related to the 9GR of the 23S rRNA gene circle, and a large segment (211 bp) of circle 2 including the extra 9G has 95% identity to a fragment including the 9GR of the 23S rRNA gene minicircle. However, the extra 9G of circle 3 is closely related to the 9GL of circles 1 and 5 and to the 9GL of the minicircles with the 16S rRNA, 23S rRNA, psbA and psbC genes (Figure 5.3a). A possible explanation for the observed similarity is that the extra 9G region on circles 1 and 2 originated from the 9GR region of 23S rRNA before the divergence of the two circles, while the extra 9G of circles 1 and 2 on the one hand and of circle 3 on the other might have originated independently, from the 9GR or 9GL regions of different minicircles.

In addition to the highly conserved sequences of the 9A and 9G cores, homologous sequences were found in D2, D3 and D4 (downstream of 9GR) of the five aberrant minicircles (Figure 5.4). Circles 3 and 4 share 148 bp in their D2 region (boxed in Figure 5.4a), consisting of two 41 bp repeats with a few substitutions, and the two repeats make their D2 longer than the D2 regions of the other minicircles (Figure 5.1a, Table 5.2). Of the shared 148 bp, 54 bp are also present in circle 2 (boxed in Figure 5.4a). Short homologous variants were also found in the D2 region: 20 bp shared between circle 1 and the 23S rRNA circle, 30 bp between circle 5 and the 16S rRNA circle, and 33 bp between circle 1 and circle 2 (Figure 5.4a). All the short variants shared between two or more circles are indicative of gene conversion, which is probably the main homogenizing force between the 9G-9A-9G regions.

[Figure 5.4: sequence alignments of the D2 (a) and D3 (b) regions of the aberrant minicircles and single gene circles; not recoverable from the extracted text.]

Both the D3 and D4 regions are more conserved between different circles than the D2 regions. Three highly conserved motifs are present in D3 of the aberrant minicircles and the chloroplast gene minicircles (underlined in Figure 5.4b). In the D4 region, a highly conserved region downstream of 9GR was found among the aberrant circles as well as the ten single gene circles; it is a possible candidate for the promoter of the single gene circles (not shown). It would be very interesting to see whether the fragments on the aberrant circles are transcribed; however, since the same fragments are present in the aberrant circles as well as in the normal chloroplast gene circles, an RNA blot might not reveal which circle produced the transcripts. cDNA libraries, or sequencing of RT-PCR products specific for the aberrant minicircles, are needed in order to investigate the function of the aberrant minicircles at the transcription level.

5.5 Other non-genic regions

In addition to the 9G-9A-9G region and the extra 9G cores, there were two major non-genic regions ("spacers") that have no significant homologues in the databases: S1 and S2 (Figure 5.1a). S1 on circle 1 (94 bp) consists of two direct repeats of 33 bp and a third repeat that retains only 18 bp at the 5' end.
The sequence of S1 on circle 2 (76 bp) is identical to that on circle 1 except that the second 33 bp repeat on circle 2 has a deletion of 18 bp at the 5' end (Table 5.2). Although the sizes of S2 are 236, 307, 422 and 499 bp on circles 2, 3, 4 and 5, respectively, sequence alignment showed that S2 is highly conserved among the four circles (Figure 5.1a, 5.5). Sequence analysis showed that S2 on each circle consists of four motifs: motifs 2, 3 and 4 are highly conserved among the four circles, whereas motif 1 on circles 3, 4 and 5 is almost identical but is totally different from that on circle 2 (Figure 5.5a). The size differences of S2 among the four circles are due to the copy number of motif 3 (19 bp): 1, 1, 7 and 11 copies of the motif are present on circles 2, 3, 4 and 5, respectively (Figure 5.5a). Motif 3 is also present, with a few nucleotide substitutions, in circle 1 and in the 16S rRNA, 23S rRNA and psbA minicircles (Figure 5.5b). Motif 4 is a highly conserved motif of approximately 40 bp found in the D1 of the 16S rRNA, 23S rRNA and psbA single gene minicircles; in the corresponding S2 region of the aberrant minicircles; and also, although with a few substitutions, in the region upstream of the extra 9G on circles 1-3 (Figure 5.5b). Motif 4 is not present in the D1 of the other minicircular chloroplast genes. Motifs 2, 3 and 4 on circle 2 have several deletions (3, 1 and 6 bp) and substitutions compared to those on circles 3, 4 and 5.

The similarities of the non-genic regions among the aberrant circles suggest that the chloroplast gene fragments on these circles did not originate independently. Instead, the chloroplast gene fragments as well as the non-genic regions on the aberrant minicircles are evolutionarily related. The homology between the S2 region of the aberrant minicircles and the D1 of the 16S rRNA, 23S rRNA and psbA minicircles suggests that they are also related (Figure 5.5).
(b)
16S rRNA   GTCAACCTCG GTTGGCAATT ATTATTGTCT CA-GTTTT-C TGAGCCTCCT AGATTTTACC
psbA       TTTCTCCTCA GTTT-CAATT ATTATTC    GATGTTTTTC TCACTTTCCT AGATTTTACC
23S rRNA   AAACTCCTCA GTTT-TAATT ATTATTC    GATGTTTTTC TCACTTTCCT AGATTTTACC
circle 1   ATACTCCTCA GTTT-CTCTT ATTATTCT-- CA-GTTTTTC T          -GATGTTGCC
circle 2   ATTCTCCTCA GTTT-TCATT ATTCTTC    GATGTTTTTC TTCACTTCCT AGATTTTCCT
circle 3   ATTCTACTCA TTTT-CAATT ATTATTC    GATGTTTTTC TTCACTTCCT AGATTTTAAC
circle 4   ATTCTACTTA TTTT-CAATT ATTATTCTCT CA-GTTTT-C TGAGATTCCT AGATTTTCCT
circle 5   ATTCTACTTA TTTT-CAATT ATTATTCTCT CA-GTTTT-C TGAGATTCCT AGATTTTCCT
circle 1*  GGACTCCTCA GCTT-CTCTT ATTATTC    GATATTTTTC TCATCTTCCT AGATTTTCCC
circle 2*  GGACTCCTCA GCTT-CTCTT ATTATTC    GATATTTTTC TCATCTTCCT AGATTTTGCC
circle 3#  ATTTTGCTGG ATTT-CACTT ATTATTGTCT CA-GTTTT-C TGAGCTTCCT AGATGTTGCC

Figure 5.5 Structure (a) and sequence (b) of S2 of circles 2-5. (a) The structure was drawn based on sequence alignment. Motif 1 (102 bp) is almost identical among circles 3-5 except that it has a 3 bp substitution and a 1 base insertion on circle 5. Motif 2 (153 bp) is highly related among the four circles. It is identical between circles 4 and 5, but has 8 bp substitutions on circle 3 and three deletions of 3 bp, 1 bp and 6 bp on circle 2 (indicated by arrows). Motif 3 (19 bp) has 1, 1, 7 and 11 copies on circles 2-5, respectively. Motif 4 (34-38 bp), immediately upstream of 9GL, is highly homologous among the four circles and is identical within each of two pairs of circles (see b; circles 2 and 3, circles 4 and 5). The region on the 16S rRNA, 23S rRNA and psbA circles corresponding to motifs 1 and 2 is the D1 region, which is not homologous to the motifs on the aberrant circles and is not conserved between different genes. (b) Sequence alignment of motifs 3 and 4 and the corresponding regions of circle 1 and the 16S rRNA, 23S rRNA and psbA circles, as well as the regions immediately upstream of *9G and #9G. The repeat sequences of 19 bp are boxed.
5.6 Discussion

5.6.1 Minicircular chloroplast DNA molecules in other organisms

Minicircular DNA molecules have been reported from chloroplasts in Acetabularia (Green 1976; Ebert et al. 1985), Euglena (Heizmann et al. 1982) and several other algae. The circular plasmid-like DNA from Euglena and Acetabularia hybridized with chloroplast DNA, suggesting that the plasmid DNA might be related to the chloroplast genome, although the sequences and functions of these plasmids were unknown. In the diatom Cylindrotheca fusiformis, two plasmids, pCf1 (4.27 kb) and pCf2 (4.08 kb), were found and both hybridized with chloroplast DNA, but pCf2 also hybridized with nuclear DNA (Jacobs et al. 1992). Complete sequencing of pCf1 and pCf2 revealed that both contain ORFs with significant similarity to resolvases (site-specific recombinases), suggesting that the plasmids probably have coding functions (Hildebrand et al. 1992). A heterogeneous population of plasmid DNA was also observed in the green alga Ernodesmis verticillata. Restriction fragments of these plasmids were cloned and sequenced (La Claire et al. 1998). Many cloned fragments had multiple directly repeated sequences, and four of the clones had deduced amino acid sequences with 13, 56, 50 and 31 amino acids homologous to psaB, psbB, psbC and psbF respectively. However, it was not clear whether these segments of chloroplast genes were from the same plasmid or from different ones. Eleven amino acids (QERRAAEYMTH) at the N terminus of the psbC fragment of E. verticillata are also found close to the C terminus of the psbC segment on circles 4 and 5. These are the only homologous amino acids found between the aberrant minicircles in H. triquetra and the plasmid DNA in E. verticillata. Therefore the plasmids described from several algae might have originated from chloroplast DNA, although none of those sequenced so far (except for the diatom plasmid) consists of a complete chloroplast gene.
Furthermore, no tripartite non-coding region similar to the 9G-9A-9G of the minicircles in H. triquetra was characterized in the plasmids of those algae. The psbA, 16S rRNA and 23S rRNA gene fragments on the aberrant minicircles of H. triquetra were not found in the plasmids of C. fusiformis and E. verticillata; the psaB, psbB and psbF fragments in the plasmids of E. verticillata were not detected in the aberrant minicircles of H. triquetra either. The lack of shared sequence regions suggests that the plasmid-like DNAs in different algae originated from chloroplast genomes independently, after the divergence of these organisms.

5.6.2 Chloroplast DNA fragments in mitochondrial and nuclear genomes

Many chloroplast-derived DNA fragments have been found in the mitochondrial genomes of higher plants (Palmer 1990), and are maintained in the non-coding regions of mitochondrial genomes. However, such fragments were not detected in the mitochondrial DNA of Marchantia (Oda et al. 1992) and Chlamydomonas (Boer and Gray 1991; Vahrenholz et al. 1993), so the presence of such sequences appears to be a characteristic of the mitochondrial genomes of higher plants (Watanabe et al. 1994). In contrast, there has been no report of a mitochondrial DNA fragment in a chloroplast genome. Chloroplast gene fragments appeared more frequently in nuclear genomes than in mitochondrial genomes, and were mainly present in introns and flanking regions of nuclear genes or appeared as transcribed nuclear pseudogenes (Blanchard and Schmidt 1995). The integration of chloroplast gene fragments has changed the nuclear genomic composition and gene structure, and the presence of the integrated fragments in the non-coding regions as well as the flanking regions of the coding genes might have influenced gene expression patterns (Blanchard and Schmidt 1995). Unfortunately the influences of these integrations have not been experimentally assayed.
Since no consistent pattern has been found in the sequence junctions of fragments derived from chloroplast genes and fragments derived from mitochondrial as well as nuclear genes, transposable elements (Zulla et al. 1991), homologous recombination (Pichersky et al. 1991) and non-homologous recombination (Sun and Callis 1993) have each been proposed to be involved in the integration of the chloroplast fragments into the mitochondrial and nuclear genomes (Blanchard and Schmidt 1995). Analysis of the fragment sequences and the integration sites of chloroplast-derived fragments in the mitochondrial genomes of plants has failed to detect any similarity with transposable element type insertions, suggesting that transposable elements might not have been involved in the integration of chloroplast gene fragments into the mitochondrial DNA and the nuclear genomes (Pichersky 1990). In my study, transposable elements were also not detected in the chloroplast gene fragments in the aberrant minicircles. Therefore transposable elements may not have been involved in the origin of the jumbled chloroplast gene fragments in the aberrant minicircles. The homologous recombination model or "Nomad DNA" hypothesis (Pichersky et al. 1991) was based on the analysis of numerous chloroplast DNA fragments in the nuclear genome of higher plants (Pichersky 1990; Pichersky et al. 1991). It was suggested that chloroplast gene fragments form heteroduplexes at one end because of fortuitous short sequence similarities between the fragment end and the integration site, while the other end ligates to the host DNA directly. However, since most fragments lack homology to the new sites at the other end, they might randomly integrate into the mitochondrial and nuclear genomes simply by the end-joining process used in repairing DNA breaks (Roth and Wilson 1988; Blanchard and Schmidt 1995).
The homologous recombination proposed in generating sequence fragments is a specific case of a random integration process. It seems that none of the hypotheses can satisfactorily explain why and how those DNA fragments migrated from one organelle to another, and how the jumbled chloroplast DNA fragments in aberrant circles were derived from normal chloroplast genes. Recently, the transfer of mitochondrial DNA fragments to yeast chromosomes during the repair of double-strand breaks in haploid mitotic cells was demonstrated experimentally, which suggests that the integration of mitochondrial DNA into the yeast genome is an ongoing process (Ricchetti et al. 1999).

5.6.3 Selfishness of the aberrant minicircles

Selfish DNAs are DNA sequences that are widely distributed in a genome but have no known function, and they include repetitive DNA, introns and DNA sequences between genes (Doolittle and Sapienza 1980; Orgel and Crick 1980). It is unlikely that any of the gene fragments on the aberrant minicircles is functional; on the other hand, the chloroplast genes from which the fragments are derived function normally and are transcribed (Chapter 4). Even if the fragments were transcribed, which could not easily be determined, it is hard to envisage a function for a few tiny rRNA fragments. Most of the protein gene fragments lack start codons or have internal stop codons. The high frequency of deletions indicates that the aberrant circles are under strong selection for small size. The high conservation of the 9G-9A-9G regions contrasts markedly with the fragmented and jumbled gene fragments of these chimeric circles. This conservation strongly suggests that the 9G-9A-9G regions are essential for the replication and persistence of the circles. The conserved 9G-9A-9G region may be sufficient for replication and transmission from generation to generation because the rest of the aberrant circle DNA appears to be entirely devoid of function.
If this is so, then these tiny circles are genetic parasites of the chloroplast replication machinery and it is reasonable to regard them as examples of selfish DNA.

5.6.4 Origin of the aberrant minicircles

A model is proposed to address the possible origin of the aberrant minicircles (Figure 5.6). The shared orientation of all the chloroplast gene fragments and the high degree of sequence identity between fragments on different aberrant circles suggest that the aberrant minicircles might have originated from two heterodimers of single chloroplast gene circles followed by numerous deletions and duplications (Figure 5.6).

Figure 5.6 Hypothetical scheme for the origin of selfish circles from four chloroplast genes. (a) Integration of the C region of 16S rRNA into the B region of the 23S rRNA gene by illegitimate recombination resulted in a hybrid rRNA circle. (b) A heterodimer of psbA and psbC could have generated circles 4 and 5. The hybrid rRNA circle and the dimers recombined to form a super-hybrid circle consisting of all five regions that generated circles 1-3 by numerous deletions and duplications. Solid curves represent deletions generating circle 1, 2 or 3, as indicated by the numbers beside the curves. Dotted arrows represent integration of the C region and part of the B region into another location, as seen on circles 1 and 2.

The extremely conserved 9G-9A-9G region may be a hotspot for recombination between minicircles that could result in unstable intermediates (dimers, trimers or even tetramers) with multiple 9G-9A-9G regions. Circles 4 and 5 are almost identical apart from indels in the chloroplast gene fragments and in the non-coding region (Figures 5.1, 5.2, 5.4). Circles 4 and 5 might have originated from a heterodimer of the psbA and psbC genes, followed by numerous deletions and duplications (Figure 5.6b).
Because two sets of 9G-9A-9G (thus with four 9G and two 9A cores) might have increased the probability of recombination at the 9G-9A-9G region, such a "heterodimer" might have been stabilized by deleting the extra 9G-9A-9G including its neighboring regions, thereby generating a "hybrid" minicircle consisting of the D and E regions. Further numerous deletions and duplications resulted in circles 4 and 5 (Figure 5.6b). Circles 1 and 2 are very closely related, and both might have originated from a dimer containing the 16S and 23S rRNA genes, which further generated a "hybrid" minicircle by deleting part of the extra 9G-9A-9G and integrating the C region (16S fragment) into the B region. Alternatively, the integration of the 16S fragment into the B region might have occurred via a mechanism similar to the "Nomad DNA" model (Pichersky et al. 1991). The 3' end of the 16S fragment (TTC) could form a heteroduplex with the 23S at the integration site because of the homologous sequence (TTC), while the other end ligated directly to the host DNA. The 16S and 23S rRNA "hybrid" minicircle might have recombined with the psbA and psbC hybrid circle at the 9G-9A-9G and formed a "super-hybrid" circle containing regions A, B, C, D and E of the four chloroplast genes. Further deletions and duplications of the "super-hybrid" circle could have generated aberrant minicircles 1, 2 and 3 (Figure 5.6a). On the other hand, the likelihood that the chloroplast gene fragments on the aberrant circles originated via recombination between the coding regions of different genes is quite low, since there is no homologous sequence among the five regions of the chloroplast genes. Assuming that the 9G-9A-9G region contains the replication origin, each minicircle is a replicon (Zhang et al. 1999). The presence of an extra 9G-9A-9G region might be disastrous for an intermediate consisting of two, three or four minicircular chloroplast genes.
Normal chloroplast genomes (120-200 kb) can have one replication origin (Schlunegger and Stutz 1984) or a few origins (Ohyama 1992). Possibly single gene circles as well as aberrant minicircles are so small (2-3 kb) that they need only one replication origin. Any extra replication origin would compete for replicase and related elements; therefore the dimer, trimer or tetramer might have to get rid of the extra 9G-9A-9G in order to survive (replicate). Although three of the five aberrant minicircles have an extra 9G core, none of them has an extra 9A core, suggesting that the 9A region is responsible for the normal replicating function, and that survival is contingent on deletion of any extra 9A core. A transposable element usually has short direct repeats at the two ends, and is the major force for generating jumbled DNA fragments (Li 1997). The model proposed (Figure 5.6) can account for the origin of the jumbled chloroplast gene fragments on the aberrant circles in the absence of a transposable element. Perhaps dimers, trimers or even tetramers that contain two, three or four single gene circles are present in the satellite DNA. Hybridization of the uncut satellite DNA using a 9G-9A-9G probe revealed three faint bands of approximately 3.5, 5.2 and 7.0 kb in addition to multiple bands of 1.0-3.3 kb that were from single gene circles (see Figure 4.19). Because they were larger than the single gene minicircles, the faint bands may have been from intermediate forms of single gene circles. The faintness of the bands also suggests that the copy number of such intermediates is low. Since the chloroplast gene fragments on the aberrant minicircles of H. triquetra are not similar to the fragments on plasmids from C. fusiformis and E. verticillata or to fragments in the mitochondrial and nuclear genomes, the origin of the chloroplast gene fragments might be different.
The chloroplast gene fragments in the mitochondrial and nuclear genomes, as well as those fragments on the plasmids, might have originated independently from the large circular chloroplast DNA of each species.

Chapter 6 Generality of single gene circles in dinoflagellates

6.1 Introduction

Each chloroplast gene in the peridinean dinoflagellate Heterocapsa triquetra is on a separate minicircle (Chapter 4, Zhang et al. 1999). Unigenic chloroplast gene circles are also present in Amphidinium operculatum (Barbrook and Howe 2000), a dinoflagellate that is distantly related to H. triquetra. Since dinoflagellates are a very diverse group consisting of more than two thousand species, it would be interesting to know how general this unusual chloroplast gene organization is. To assess the generality of single gene circles in the dinoflagellates, spinach psbA and H. triquetra 23S rRNA probes were hybridized to Southern blots of uncut total genomic DNA from fourteen photosynthetic dinoflagellates of five orders (Figure 6.1). Complete or partial chloroplast psbA and 23S rRNA gene sequences were amplified from eight dinoflagellates by PCR using degenerate primers and were sequenced.

Figure 6.1 Southern blots of uncut total genomic DNA of dinoflagellates. I, IV: Ethidium bromide stained genomic DNA electrophoresed on agarose gel. II, V: DNA blots probed with spinach psbA. III, VI: DNA blots probed with H. triquetra 23S rRNA. Lanes in panels I-III: PM, PR, AC, HR, HP, HT; lanes in panels IV-VI: HN, AC, TI, AE, ST, TH. HT, Heterocapsa triquetra; HP, Heterocapsa pygmaea; HR, Heterocapsa rotundata; HN, Heterocapsa niei; AC, Amphidinium carterae; PR, Protoceratium reticulatum; PM, Prorocentrum micans; TI, Thecadinium inclinatum; AE, Adenoides eludens; ST, Scrippsiella trochoidea; TH, Thoracosphaera heimii.

6.2 Southern blots of genomic DNA of fourteen dinoflagellates

In the species of Heterocapsa (H. pygmaea, H. niei and H. rotundata) and Amphidinium (A. carterae), psbA exclusively labeled the 1.5-6 kb region where no DNA could be seen after ethidium bromide staining (II and V of Figure 6.1). This result is consistent with circular DNA molecules in the same size range as in H. triquetra. The faster and slower-migrating bands strongly labeled by the psbA probe in each of these samples probably correspond to supercoiled and relaxed monomeric circles respectively. Interestingly, psbA labeled four bands from the uncut genomic DNA of H. pygmaea: 3.0, 2.7, 1.4 and 1.2 kb, i.e., two faster and two slower migrating bands (lane HP on II of Figure 6.1). Similarly, psbA labeled four bands (one very faint) of uncut genomic DNA of H. niei (lane HN on V of Figure 6.1). These results suggest that there are two dissimilar sized psbA minicircles in H. pygmaea and H. niei, and PCR amplification of the psbA gene confirmed the presence of the two dissimilar sized minicircles, which result from various indels in the non-coding region (see 6.3). In Amphidinium, psbA only labeled a band around 3.0 kb that probably represents the relaxed form of the psbA minicircle (lane AC on II of Figure 6.1). However, in another DNA blot, psbA labeled three bands which probably correspond to the relaxed, linear and supercoiled forms of the psbA minicircle (lane AC on V of Figure 6.1). Similarly, hybridization with the H. triquetra 23S rRNA gene probe labeled the 2-6 kb region for Amphidinium, Protoceratium and all Heterocapsa species (III and VI of Figure 6.1). The Protoceratium and Heterocapsa species have three bands corresponding (in decreasing size order) to relaxed monomeric circles, linear monomers (weak, except in Protoceratium) and supercoiled monomeric circles. The H. triquetra 23S rRNA probe labeled four bands of uncut genomic DNA of H. niei (lane HN on VI of Figure 6.1), suggesting that there are two dissimilar sized 23S rRNA circles in H. niei.
PCR amplification of the 23S rRNA gene confirmed that there are two dissimilar sized minicircles which share the same coding region but differ in the non-coding region (see 6.3). The weak hybridization of 23S rRNA at high molecular weight is probably due to nonspecific cross-reaction with the nuclear 28S rRNA gene, or to chloroplast DNA trapped in the high molecular weight DNA. Amphidinium has a single low molecular weight band around 3.5 kb (lane AC on III of Figure 6.1); this may be the relaxed form of unigenic 23S minicircles. However, in another blot 23S rRNA labeled two bands which probably correspond to the relaxed and supercoiled chloroplast 23S rRNA circles (lane AC on VI of Figure 6.1). Hybridization of uncut genomic DNA of A. asymmetricum, A. compactum and A. rhynchocephalum with psbA and 23S rRNA also labeled the region of 1.6 to 4.0 kb (data not shown). These results suggest that the minicircular psbA and 23S rRNA genes are not restricted to H. triquetra, and are present both in dinoflagellates closely related to H. triquetra and in those distantly related to it. Therefore the minicircular chloroplast genes might have been present in the ancestor of peridinean dinoflagellates. In the dinoflagellate Protoceratium reticulatum, psbA weakly labeled three faint bands of approximately 2.8 kb, 4.0 kb and 6.0 kb but strongly labeled a high molecular weight band corresponding to the bulk DNA (lane PR on II of Figure 6.1). This is in apparent contrast to the three bands strongly labeled by the 23S rRNA probe (lane PR on III of Figure 6.1). Whether the heavy labeling by psbA at high molecular weight is due to trapping of minicircles in the bulk DNA or indicates the presence of the psbA gene on a larger chromosome needs further investigation. The faintly labeled bands probably represent minicircular psbA DNA. Probably psbA minicircles and large DNA molecules containing psbA genes are both present in the chloroplast genome of P. reticulatum.
Both psbA and 23S rRNA probes significantly labeled high molecular weight bands in the genomic DNA blots of six dinoflagellates: Thecadinium inclinatum (Gonyaulacales), Prorocentrum micans and Adenoides eludens (Prorocentrales), Thoracosphaera heimii (Thoracosphaerales) and, weakly, Scrippsiella trochoidea (Peridiniales) (Figure 6.1, Table 6.1). Similar high molecular weight label was also observed on Gyrodinium galatheanum (Gymnodiniales) (data not shown). These results suggest that the psbA and 23S rRNA genes are on large DNA molecules in these dinoflagellates, not on minicircular chromosomes like those in H. triquetra. The weak high molecular weight bands suggest that psbA and 23S rRNA are divergent in P. micans and S. trochoidea. The psbA and 23S rRNA probes labeled two bands, of >12 kb and approximately 7 kb, for A. eludens, and of >12 kb and approximately 10 kb for T. heimii. The two bands might represent a relaxed form and a supercoiled form of the circular chloroplast genome in the two dinoflagellates. Therefore the chloroplast genomes of A. eludens and T. heimii should be smaller than those of algae and higher plants (120-200 kb), but larger than the single gene circles of H. triquetra (Zhang et al. 1999) and A. operculatum (Barbrook and Howe 2000). It would be very interesting to investigate the chloroplast genomes of these two dinoflagellates; probably the gene content and organization of their chloroplast genomes would be another surprise, like the single gene circles.

Table 6.1 Summary of DNA blots probed with chloroplast psbA and 23S rRNA genes in fifteen dinoflagellates

Order              Species                         Strain No.     Minicircle  Sequence
Peridiniales       Heterocapsa triquetra           CCMP 449       yes         complete
                   Heterocapsa pygmaea             CCMP 1490      yes         complete
                   Heterocapsa niei                CCMP 447       yes         complete
                   Heterocapsa rotundata           NEPCC D680     yes         complete
                   #Scrippsiella trochoidea        NEPCC D620     no          partial
Gymnodiniales      Amphidinium carterae            CCMP 1314      yes         complete
                   Amphidinium asymmetricum*       NEPCC D067     yes         n.d.
                   Amphidinium compactum*          NEPCC D081     yes         n.d.
                   Amphidinium rhynchocephalum*    UTEX LB 1946   yes         n.d.
                   #Gyrodinium galatheanum*        NEPCC 55R      no          n.d.
Gonyaulacales      Protoceratium reticulatum**     NEPCC D535     yes         complete
                   #Thecadinium inclinatum*        NEPCC D682     no          n.d.
Prorocentrales     #Prorocentrum micans            NEPCC D443     no          partial
                   #Adenoides eludens*             NEPCC D683     no          n.d.
Thoracosphaerales  #Thoracosphaera heimii          NEPCC D670     no          partial

*: data from DNA blots. **: psbA might be on a minicircle or on a large DNA molecule. #: Results of DNA blot, inverse PCR and analysis of the sequences of the PCR products are consistent and suggest that the psbA and 23S rRNA genes in these species are on large DNA molecules.

6.3 PCR amplification of chloroplast psbA and 23S rRNA genes from dinoflagellates

Using H. triquetra specific primers (bA6/bA7) and degenerate primers (DbA2/DbA6 or DbA6/DbA7) (Table 6.2), PCR amplification of the coding region of the psbA gene was carried out, and each primer pair gave a single product of the predicted size from eight dinoflagellates in addition to H. triquetra (Figure 6.2, Table 6.3). Sequencing the PCR products confirmed that they are psbA (Tables 6.4, 6.5).

Table 6.2 Dinoflagellate chloroplast 23S rRNA and psbA primers

Name   Sequence
23S1   5'GGCTGTAACTATAACGGTCC3'
23S2   5'CCATCGTATTGAACCCAGC3'
23S3   5'ATAAGTGGTTGTAGAAGAAAG3'
23S4   5'TAATTCTTTCTTCTACAACCAC3'
D23S1  5'YTACYCWAGGGWTAACAG3'
D23S2  5'TTMWATSTTTCATGCAGG3'
bA6    5'GCAAGATCAAGTGGGAAGTTG3'
bA7    5'GCTCCACCAGTCGATATTG3'
bA1    5'CCAAGAGCTTCCCAAACTG3'
bA5    5'CAACTTCCCACTTGATCTTGC3'
DbA6   5'GTTGTGAGCGTTACGTTCRTGCATNACYTC3'
DbA7   5'ATCTTCGCTCCACCAGTTGAYATHGAYGG3'
DbA2   5'GGTCAAGGTTCTTTCTCTGAYGGNATGCC3'
DbA1   5'GGCATACCATCAGAGAATCWNCCYTGNCC3'
DbA5   5'GTTAGTACAATGGCTTTCAAYYTNAAYGG3'

Table 6.3 PCR amplification of chloroplast genes from various dinoflagellates

Primer pairs (S: specific; D: degenerate): (1) 23S1/23S2, S; (2) 23S2/23S3, S; (3) 23S1/23S4, S; (4) D23S1/D23S2, D; (5) bA6/bA7, S; (6) bA1/bA5, S; (7) DbA6/DbA7, D; (8) DbA2/DbA6, D; (9) DbA1/DbA5, D.

Organism                   (1)  (2)  (3)  (4)  (5)  (6)  (7)  (8)  (9)
Amphidinium carterae       Yes  Yes  No   Yes  No   No   Yes  Yes  Yes
Protoceratium reticulatum  Yes  Yes  No   Yes  No   No   No   Yes  No
Prorocentrum micans        No   No   No   No   No   No   Yes  Yes  No
Scrippsiella trochoidea    Yes  Yes  No   No   No   No   Yes  Yes  No
Thoracosphaera heimii      Yes  No   No   No   No   No   Yes  Yes  No

Heterocapsa triquetra, H. pygmaea, H. rotundata and H. niei (listed first in the original table): five entries each, all Yes; the remaining cells are blank and their column positions are not preserved in this text version.

Yes: PCR worked and sequencing confirmed that the products are chloroplast genes. No: PCR did not work. Blank: PCR was not tried.

Figure 6.2 Structure of the 23S rRNA (3,027 bp) and psbA (2,151 bp) minicircles of H. triquetra. Gray area represents coding region; 9G-9A-9G is the tripartite non-coding region. Primers were designed based on H. triquetra gene sequences except for the degenerate primers D23S1, D23S2, DbA1, DbA2, DbA5, DbA6 and DbA7, which were designed based on the chloroplast gene sequences of various organisms including H. triquetra.

Table 6.4 Characteristics of minicircular psbA and 23S rRNA genes of five dinoflagellates

Organism        Gene      Size (bp)  Start  Stop  (A+T)%  D1 (bp)  D2 (bp)  D3 (bp)  D4 (bp)
H. triquetra    psbA      2,151      TTG    TAA   60      173      66       92       316
                23S rRNA  3,027                   67      116#     86       99       152#
H. pygmaea      psbA1     2,195      ATG    TAA   62      327      282      139      116
                psbA2     2,421                           509      172      110      301
                23S rRNA  2,793                   67      147#     66       35       144#
H. niei         psbA      2,311      ATG    TAA   62      (78)299  231      111      100
                23S rRNA  ~3,400*                 67      237 and (78)57 (columns not preserved)
H. rotundata    psbA      2,298      ATG    TAA   61      507      144      39       119
                23S rRNA  3,365                   65      324#     148      (18)38   131#
A. carterae     psbA      2,311      ATG    TAG   54      507      86       526
                23S rRNA  2,651                   55      146#     32       394#
P. reticulatum  23S rRNA  3,772**                 63      n.d.     n.d.     n.d.     n.d.

*: not complete yet. **: One clone of the PCR products has a 67 bp insert in the non-coding region, and the 90 bp downstream of the 67 bp insert is different from that of other clones and PCR products, suggesting that heterogeneous minicircles of the 23S rRNA gene are present in Protoceratium reticulatum. #: estimated from the sequence alignment. Numbers in brackets are sizes of the D1* or D3* in Figure 6.3a. psbA1 and psbA2 represent the small and large circles respectively.

Table 6.5 Partial psbA and 23S rRNA gene sequences in four dinoflagellates

Organism                                   Gene      Size (bp)  (A+T)%
Scrippsiella trochoidea (Peridiniales)     psbA      775        59
                                           23S rRNA  1,401      66
Prorocentrum micans (Prorocentrales)       psbA      795        58
Thoracosphaera heimii (Thoracosphaerales)  psbA      813        61
                                           23S rRNA  430        67
Protoceratium reticulatum (Gonyaulacales)  psbA      455        65

In order to get complete minicircular psbA genes, inverse PCR using outward primers (bA1/bA5, DbA1/DbA5) was carried out. It yielded one product from H. rotundata and A. carterae, and the sequences of the PCR products could be assembled into a circle in each species (Figure 6.3). However, primer pair bA1/bA5 gave two PCR products of 1.7 and 1.9 kb from H. pygmaea, and sequencing the two products revealed that the length difference between the two circles is 226 bp, suggesting that the size polymorphism of the H. pygmaea psbA gene is in the non-coding region. PCR amplification of genomic DNA using primer pair bA1/bA5 also gave two products slightly different in size from H.
niei (data not shown), suggesting that there are two dissimilar sized minicircles of psbA in H. niei that are identical in the coding region but differ in the non-coding region. These results are consistent with those of the DNA blots (lane HP on II, HN on V of Figure 6.1). Similarly, the 23S rRNA gene was amplified by PCR using inwardly directed primers (23S1/23S2, 23S2/23S3) from seven dinoflagellates and sequenced (Figure 6.2, Tables 6.2, 6.3, 6.4, 6.5). Inverse PCR using outward primers (23S1/23S4, D23S1/D23S2) yielded one product from H. pygmaea, H. rotundata, A. carterae and P. reticulatum. 23S rRNA minicircles were obtained from these dinoflagellates by assembling the sequences of the PCR products (Figures 6.2, 6.3). PCR amplification of genomic DNA using the outwardly directed primer pair 23S1/23S4 gave two products slightly different in size from H. niei (data not shown), suggesting that there are two dissimilar sized minicircles of the 23S rRNA gene, differing in the non-coding region, in H. niei. This is consistent with the results of the DNA blots (lane HN on VI of Figure 6.1). Heterogeneous minicircles of the 23S rRNA gene, resulting from indels in the non-coding region, were also observed in H. triquetra (Chapter 4) and P. reticulatum (data not shown).

Figure 6.3 Comparison of the structure of psbA and 23S single gene circles in five different dinoflagellates. (a) Minicircular chloroplast genes in four dinoflagellate species. Grey areas represent coding regions; shaded patterns represent cores in non-coding regions; identical cores have the same pattern of shading. (b) Structure of the coding region of the 23S rRNA gene in H. triquetra and P. reticulatum. Dotted lines represent the corresponding fragments of 23S rRNA in H. triquetra and P. reticulatum. Locations of primers used in PCR amplification of the 23S rRNA circle from P. reticulatum are shown.
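The primer logic used in this section can be checked with a short sketch. Assuming the primer sequences as transcribed in Table 6.2 (the transcription itself may carry errors), it tests whether bA5 is the reverse complement of bA6, in which case bA5 anneals at the bA6 site but points away from the bA6/bA7 amplicon, as expected for an inverse PCR primer on a circular template. It also counts how many distinct oligonucleotides each degenerate primer represents under the standard IUPAC ambiguity codes. The function names are hypothetical helpers introduced here for illustration.

```python
# Primer sequences (5'->3') as transcribed from Table 6.2; treat as assumptions.
PRIMERS = {
    "bA6": "GCAAGATCAAGTGGGAAGTTG",
    "bA5": "CAACTTCCCACTTGATCTTGC",
    "D23S1": "YTACYCWAGGGWTAACAG",
    "D23S2": "TTMWATSTTTCATGCAGG",
    "DbA6": "GTTGTGAGCGTTACGTTCRTGCATNACYTC",
    "DbA7": "ATCTTCGCTCCACCAGTTGAYATHGAYGG",
}

# Standard IUPAC nucleotide codes: symbol -> bases it can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement of every IUPAC symbol (for example R = A/G pairs with Y = T/C).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement of a (possibly degenerate) DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def degeneracy(seq):
    """Number of distinct unambiguous oligonucleotides encoded by a primer."""
    count = 1
    for symbol in seq:
        count *= len(IUPAC[symbol])
    return count


if __name__ == "__main__":
    same_site = reverse_complement(PRIMERS["bA6"]) == PRIMERS["bA5"]
    print("bA5 is the reverse complement of bA6:", same_site)
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, degeneracy {degeneracy(seq)}")
```

With the sequences as transcribed, bA5 is the exact reverse complement of bA6. The degeneracy count is a rough guide to how dilute any single primer species is in a degenerate mix, which is one reason a degenerate pair may fail where a specific pair works.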
The 23S rRNA minicircle of Protoceratium reticulatum (3,772 bp) is the largest minicircle sequenced from dinoflagellates so far (Figure 6.3b). The sequence of the coding region of P. reticulatum is highly related to that of H. triquetra, with >88% identity. However, the two parts (the light and dark gray regions) of the 23S rRNA gene have been transposed relative to one another in the direction of transcription compared to the 23S rRNA gene of H. triquetra and other organisms (Figure 6.3b). The psbA gene of P. reticulatum may be on both minicircular chromosomes (three faintly labeled bands) and large molecules (strongly labeled high molecular weight band), as the DNA blot revealed (lane PR on II of Figure 6.1), but PCR amplification of the psbA minicircle has not yet been successful. Probably the chloroplast genome of P. reticulatum consists of both minicircles and larger DNA molecules containing the chloroplast psbA gene. Using inwardly and outwardly directed 16S rRNA primer pairs, PCR amplification gave one product for each primer pair (not sequenced), suggesting that the 16S rRNA gene is also on a minicircle. Therefore single gene circles are not restricted to H. triquetra. They are also present in dinoflagellates closely related to H. triquetra (H. pygmaea, H. niei and H. rotundata) and in some distantly related to it (Amphidinium and Protoceratium reticulatum) (Figure 6.3). This suggests that chloroplast gene minicircles might have been present in the ancestor of the peridinean dinoflagellates. Numerous DNA blots hybridized with other chloroplast gene probes are needed to tell whether the other genes are also on minicircular chromosomes in these dinoflagellates.
Furthermore, Southern blots of uncut total genomic DNA from representatives of various other dinoflagellate orders need to be hybridized with psbA and 23S rRNA probes to provide substantial data on the generality of minicircular chloroplast genes in the dinoflagellates. However, minicircular chloroplast genes might not be present in Scrippsiella trochoidea, Prorocentrum micans and Thoracosphaera heimii, as indicated by the DNA blots. Although the coding regions of chloroplast genes have been partially amplified from S. trochoidea (psbA, 23S rRNA), T. heimii (psbA, 23S rRNA), P. micans (psbA) and P. reticulatum (psbA), inverse PCR to amplify the non-coding region of the minicircular chloroplast genes yielded no products from these dinoflagellates (Tables 6.3 and 6.4). The results of inverse PCR are consistent with those of the genomic DNA blots, and with the interpretation that the chloroplast genes are probably present on large DNA molecules in those species.

6.4 Comparison of non-coding regions from five dinoflagellates

Sequence comparison indicated that the chloroplast genes psbA and 23S rRNA are both conserved within the dinoflagellates, although they are the most divergent chloroplast genes ever sequenced (Chapter 7). However, the non-coding region of the two minicircles is conserved within each species but completely different between species (Figures 6.3, 6.4, 6.5). The size of all these single chloroplast gene circles is around 2-4 kb (Figure 6.3, Table 6.4). The tripartite non-coding region (9G-9A-9G) of H. triquetra is highly conserved among the minicircles, including the selfish circles, and might function in the replication of the minicircles (Chapters 4, 5). However, the non-coding regions of the minicircular psbA and 23S rRNA genes of the close relatives (H. pygmaea, H. niei and H. rotundata) and distant relatives (A. carterae and P. reticulatum) of H. triquetra are totally different from the 9G-9A-9G region of H.
triquetra (Figures 6.3, 6.4, 6.5). In H. pygmaea, the tripartite non-coding region consists of three identical cores of 94 bp, each with a run of 5G at its center, and is not homologous to the 9G or 9A core of H. triquetra (Figures 6.4, 6.5). In H. rotundata, the non-coding region has three cores of 111 bp (with a run of 6G at its center), 196 bp and 97 bp (both with a run of 6T at the center) that are related, since they have 97 bp identical (Figures 6.4, 6.5). However, the non-coding region of the 23S rRNA minicircle of H. rotundata has an additional 6T core of 196 bp, separated from the first one by 18 bp; thus the 23S rRNA minicircle has a quadripartite non-coding region while the psbA minicircle has a tripartite one (Figure 6.3). In H. niei, both the psbA and 23S rRNA minicircles have quadripartite non-coding regions consisting of a core of 169 bp with a run of 6T at its center and three identical cores of 90 bp, each with a run of 7G at its center (Figures 6.3, 6.4, 6.5). However, the location of the third 7G core (90 bp) differs between the two circles: it follows the second 7G core in the 23S rRNA minicircle but is separated from the second 7G core by the coding region in the psbA circle (Figure 6.3). In strong contrast to the tripartite or quadripartite structure of the Heterocapsa species, the non-coding region of the psbA and 23S rRNA genes in Amphidinium carterae is bipartite, consisting of a large core of 144 bp and a small core of 48 bp (Figures 6.4 and 6.5). However, the two cores in A. carterae are more divergent between the psbA and 23S rRNA circles than those of the Heterocapsa species. Sequence comparison of the psbA minicircle of A. carterae (CCMP 1314) with the psbA of A. operculatum (Barbrook and Howe 2000) revealed that they are identical, including the non-coding region. Sequence alignments showed that the non-coding regions of the psbA and 23S rRNA circles of A. carterae are highly related to those of the five circles (petD, atpB, psaA, psbA and psbB) of A. operculatum, and contain a bipartite region (Figure 6.4). Moreover, the complete psbA (AF206672, direct submission) of the strain A. carterae CS-21 (CSIRO Culture Collection, Hobart, Australia) is not identical to the psbA minicircle of the strain A. carterae CCMP 1314. These results suggest that A. carterae (CCMP 1314) and A. operculatum are highly related and probably the same species, and that A. carterae (CCMP 1314) and A. carterae (CS-21) are not the same species. Other data, such as the sequences of their nuclear 18S rRNA genes and other chloroplast gene minicircles, are needed to reach a conclusion.

[Figure 6.4: schematic not reproduced. Core labels legible in the original: HT 135 bp (9G); HP three cores of 94 bp (5G); HN 169 bp (6T), 90 bp (7G), 90 bp (7G); HR 111 bp (6G), 196 bp (6T), 97 bp (6T'); AC 144 bp (5A), 48 bp (3A).]

Figure 6.4 Structure of the non-coding region of single gene circles of five peridinean dinoflagellates, based on at least two genes for each species. Each rectangle represents a conserved core; rectangles with the same color are highly related. D1-D4 are variable regions between the conserved cores. N and C are the N terminus and C terminus of the psbA protein. Non-coding regions are highly conserved within species but very divergent between species. HT, Heterocapsa triquetra; HP, Heterocapsa pygmaea; HN, Heterocapsa niei; HR, Heterocapsa rotundata; AC, Amphidinium carterae. HN*: both psbA and the 23S rRNA gene have a third core of 90 bp at a different location with respect to the other cores (see text and Figure 6.3). HR*: the 23S rRNA gene has two 196 bp cores, the second separated from the first by 18 bp.

[Figure 6.5: aligned core sequences not reproduced. Sequence labels legible in the original: HT 9G, HP 5G, HN 7G, AC 5A.]

Figure 6.5 Consensus sequence of cores of the non-coding region of five dinoflagellates. Sequences of cores vary between species but are very conserved within each species, and each core contains short strings of the same nucleotide. HT, Heterocapsa triquetra. HP, Heterocapsa pygmaea. HR, Heterocapsa rotundata. HN, Heterocapsa niei. AC, Amphidinium carterae. 9G, 5G, 6T, 7G and 5A are runs of identical nucleotides within each core and are used arbitrarily by the author as core names.

HT 5' ATCTATCTATCATACCACCTTTGGTGGTATTATAGATAGAT 3'
HN 5' GTAATTACCATTTACCCTTTAAAGGGTAAATGGTAATTAC 3'
HR 5' CATACATACATACATACATACCCCCTTTAGGGGGGGTATGTATGTATGTATGTATG 3'

Figure 6.6 Sequences and secondary structure of inverted repeats (IR, bolded) in the non-coding region. (a) IR of H. triquetra in the 9A core. (b) IR of H. niei. (c) IR of H. rotundata in the 6G core.

Since the 23S rRNA minicircle is the only one sequenced from P. reticulatum, it was not possible to determine any cores in its non-coding region. Sequence comparison showed the non-coding region of 23S rRNA minicircle of P.
reticulatum is not homologous to those described; thus the non-coding regions of all the minicircles sequenced are unrelated between species (Figure 6.5), in strong contrast to the coding regions, which are highly conserved between species. However, the non-coding regions of all five dinoflagellates are AT rich and contain various short motifs consisting of runs of a single nucleotide (A, T, G or C). The pattern of these motifs is unrelated between species (Figure 6.5). Furthermore, the regions between the cores (i.e. D1, D2, D3 and D4 in Table 6.4) are divergent between the psbA and 23S rRNA genes, although various short identical motifs indicative of gene conversion were found in these regions (Chapter 4). In H. pygmaea, the non-coding region of the larger psbA circle is 226 bp longer than that of the smaller psbA circle, and the 226 bp appear as insertions mainly in the D1 and D4 regions (Table 6.4). Thus the extremely conserved cores in the non-coding region of each species are probably hotspots for recombination between different circles. Extensive gene conversion between the non-coding regions of the minicircular chloroplast genes is probably the main homogenizing force. Inverted repeats of 41, 40 and 46 bp were found in the non-coding regions of H. triquetra, H. niei and H. rotundata, respectively (Figure 6.6). The inverted repeats are in the 9A core (188 bp) of H. triquetra, the 169 bp core of H. niei and the 6G core (111 bp) of H. rotundata, respectively. None of these cores has a duplicate in the non-coding region. Since inverted repeats can form hairpins and might have a replication function in the chloroplast genomes of Euglena (Schlunegger et al. 1984) and Chlamydomonas (Wu et al. 1986), the inverted repeats may also be involved in replication of the minicircular chloroplast genes in dinoflagellates.
6.5 Possible function of the non-coding region of minicircular chloroplast genes

Like the tripartite non-coding region of the H. triquetra circles (Figure 4.20), the non-coding region of the minicircles of H. pygmaea, H. niei, H. rotundata, A. carterae and P. reticulatum can each be folded into an elaborate secondary structure with various hairpins and unpaired loops using DNA fold (http://mfold.wustl.edu/~folder/dna). For the majority of the circles, several alternative structures of similar thermodynamic stability are possible, and they all share the characteristics that each core is part of a hairpin and that the region between the cores could form either hairpins or unpaired loops (data not shown). The capacity of the non-coding regions to form hairpins may therefore be important for function, but the primary structure cannot be. Thus the non-coding region of the minicircles from all the species may have the same function as that proposed for the 9G-9A-9G region of H. triquetra: serving as the replication origin of the unigenic circles and mediating DNA segregation by binding the circles to a membrane (see Chapter 4).

6.6 Origin and significance of unigenic circles in dinoflagellates

Minicircles (plasmids) reported earlier from chloroplasts had undefined sequences (Jacobs et al. 1992) or contained fragments of chloroplast genes (La Claire et al. 1998). The organization of chloroplast genes as single gene circles in the dinoflagellates is altogether different from these, and they are the only germ-line unigenic chromosomes known in chloroplasts (Zhang et al. 1999, Barbrook and Howe 2000). The origin of these unique unigenic circles from a conventional multigenic circular genome may have begun with a sudden transposition of replication origins to sites between every gene, followed by homologous intrachromosomal recombination to generate separate unigenic circles in the dinoflagellate. Each circle generated would have been able to replicate immediately.
The presence of minicircular chloroplast genes in several distantly related dinoflagellates suggests that the minicircles might have originated in the ancestral dinoflagellate. In contrast, DNA blots and inverse PCR also revealed that the chloroplast psbA and 23S rRNA genes might be present in large DNA molecules in other distantly related dinoflagellates. Some dinoflagellates may still have the conventional multigenic circular chloroplast genome; alternatively, some dinoflagellates may have transferred the minicircular genes to the nucleus. Whether the two genes are located on one DNA molecule needs further investigation. Although DNA blots and PCR amplification of chloroplast genes suggested that minicircular chloroplast genes are present in quite a few dinoflagellates, I cannot rule out the possibility that these dinoflagellates also contain a very low copy number of chloroplast DNA molecules with multiple chloroplast genes. However, the likelihood of maintaining two different genomes within the same organelle should be low. First, there is no convincing reason why dinoflagellates should keep two different chloroplast genomes. Second, if the proposed hypothesis for generating minicircles is right, the original chloroplast genome consisting of many genes should have been destroyed when the minicircles originated. The origin of the minicircular chloroplast genes in dinoflagellates is the largest change ever detected in the organization of the chloroplast genome. This unique organization raises many questions for future study, such as the replication and segregation of the minicircular chloroplast genes and the mechanism of copy number control for the different circles.

Chapter 7 Phylogeny of dinoflagellate chloroplast genes

7.1 Introduction

In Chapters 4 and 6, I showed that some chloroplast genes of many peridinin-containing dinoflagellates are on individual minicircles, which has been confirmed for several genes of A.
operculatum by Barbrook and Howe (2000). The minicircles are extremely different from the chloroplast genomes of other photosynthetic organisms, which are single, large circular molecules (around 120-200 kb) containing approximately 140 to 250 genes (Sugiura 1995; Reith 1995; Turmel et al. 1999). In having minicircles, the dinoflagellates differ greatly from their non-photosynthetic sister group, the parasitic sporozoans (e.g. malaria parasites like Plasmodium falciparum), which have relict plastids with more than fifty different genes on a single circular genome of around 35 kb (Wilson et al. 1996). Sporozoans (apicomplexans) and dinoflagellates are grouped with protalveolate flagellates (e.g. Perkinsus, Colpodella) and ciliates as a major protist group known as alveolates (Cavalier-Smith 1991, 1993a, 1998), because their membrane-bound cortical alveoli are a key shared feature. The grouping of the alveolates is also supported by other ultrastructural characters as well as by nuclear 18S rRNA gene sequences (Gajadhar et al. 1991; Cavalier-Smith 1993a) and actin sequences (Reece et al. 1997). However, as the relict sporozoan plastids have lost all photosynthetic genes, the relationship between dinoflagellate chloroplasts and sporozoan plastids is still not clear. Previous reports suggested that sporozoan plastids may be related to red algal and chromistan plastids (Williamson et al. 1994; Blanchard and Hicks 1999; McFadden and Waller 1997), which favored the view that sporozoan and dinoflagellate plastids may be directly related (Palmer 1992; Cavalier-Smith 1999). However, trees of the protein synthesis elongation factor tufA suggested that sporozoan plastids might have originated from a green alga (Kohler et al. 1997). A key question, therefore, is whether the plastids of dinoflagellates and Sporozoa are related through common descent from their alveolate common ancestor (Palmer 1992; Cavalier-Smith 1999) or were acquired by independent endosymbiotic events (Kohler et al.
1997). Phylogenetic analyses of the chloroplast gene sequences I have obtained from several closely and distantly related dinoflagellates should provide important information on the origin of the dinoflagellate chloroplasts as well as on the relationship between sporozoan plastids and dinoflagellate chloroplasts.

7.2 The dinoflagellate 23S rRNA sequences are chloroplast genes

Dinoflagellate 23S rRNA genes are extremely divergent, so it is very difficult to align some regions with other chloroplast 23S rRNAs. The most conserved part (~0.7 kb), close to the 3' end of the 23S rRNA, is easy to align. The region upstream (~1.3 kb) is very divergent between dinoflagellates and other organisms but very similar among the Heterocapsa species. The P. reticulatum 23S rRNA gene has an insertion of 166 bp (approximately 160 bp upstream of the 3' terminus) that is not present in any other chloroplast 23S rRNA gene. At the time when the 23S rRNA trees (described below) were constructed, I did not realize that the P. reticulatum 23S rRNA gene has been split and flipped (see Figure 6.3b of Chapter 6), and only the large fragment was properly aligned with the 23S rRNA sequences of other organisms. The flipped region is in the more variable region close to the 5' end of the 23S rRNA gene and is only about one fourth of the whole gene, so it may not have affected the tree topology. However, it would shorten this branch of the 23S rRNA trees if the flipped region were aligned and used to reconstruct the trees. All maximum likelihood and parsimony trees gave two clearly resolved groups: a cyanobacteria/chloroplast group and a non-cyanobacterial bacteria/mitochondria group in which the mitochondria grouped with the α-proteobacteria (Figures 7.1, 7.2). In all other trees (LogDet, neighbor joining and quartet puzzling), chloroplasts and cyanobacteria grouped together with high bootstrap support, but mitochondria did not group with the α-proteobacteria (Figures 7.3, 7.4, 7.5).
In all the trees, the dinoflagellates form a single group within the chloroplast cluster, indicating that dinoflagellate 23S rRNA genes are chloroplast genes, not mitochondrial genes, and that dinoflagellate peridinin-containing plastids are monophyletic. However, the dinoflagellate branches are extremely long, approximately twice that of Plasmodium and 1.5 times longer than the longest mitochondrial branch, Neurospora (Figures 7.1, 7.2). There was no significant difference among the trees constructed using three different masks differing in stringency (see methods). All the maximum likelihood trees have two major anomalies inconsistent with previous work: the paraphyly rather than holophyly of Sporozoa, and the grouping of euglenoids with the alveolates rather than with the green algae (Figure 7.1), from which their plastids originated (Turmel et al. 1999). In Turmel et al.'s (1999) maximum likelihood tree constructed from 37 concatenated protein sequences (7,449 amino acids), Euglena is solidly within the green algal group with 90% bootstrap support.

[Figure 7.1: tree image not reproduced.]

Figure 7.1 Maximum likelihood tree of 23S rRNA sequences (1,885 bp) of chloroplasts, mitochondria and bacteria (ln likelihood = -49685.99168). Dinoflagellates form a monophyletic group within the chloroplast clade. Scale bar indicates 0.1 changes per base pair.

The parsimony tree of 23S rRNA sequences had the same branching order for the plastids as the maximum likelihood tree, except that heterokonts were also incorrectly paraphyletic (Figure 7.2). In neighbor joining (Figure 7.3, gamma distribution, α = 0.77), maximum likelihood and parsimony trees (Figures 7.1, 7.2), Plasmodium artifactually groups with dinoflagellates, not with the other sporozoan Toxoplasma. A similar systematic error (grouping of Plasmodium with dinoflagellates rather than Theileria) was seen on a mitochondrial cytochrome oxidase I (cox1) maximum likelihood tree (Inagaki et al. 1998). It seemed possible that these errors in tree topology were caused by base composition bias (Lockhart et al. 1994). The AT content of the Plasmodium plastid genome is extremely high (A+T = 86.9%) (Wilson et al. 1996).
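The base composition figures discussed here, and the AT% column of Table 7.1, are simple proportions. As a minimal sketch (not the procedure used in the thesis; the function name and the toy sequences are hypothetical), AT content can be computed from an aligned sequence by counting only unambiguous bases and ignoring gaps:

```python
def at_percent(seq):
    """Percentage of A+T (or A+U) among the unambiguous bases of a sequence.

    Gap characters and ambiguity codes such as N are skipped, which is an
    assumption of this sketch rather than a documented convention.
    """
    bases = [b for b in seq.upper() if b in "ACGTU"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    at_count = sum(1 for b in bases if b in "ATU")
    return 100.0 * at_count / len(bases)


# Toy sequences, for illustration only.
example_plain = at_percent("ATATATGC")
example_gapped = at_percent("AAT-GN")
```

When taxa differ strongly in this statistic, tree-building methods that assume a stationary base composition can group sequences by composition rather than by ancestry, which is the kind of bias considered in this section.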
The Plasmodium plastid 23S rRNA genes have a higher AT% than those of dinoflagellates, Toxoplasma and euglenoids (Table 7.1), which in turn are higher than those of green plants, red algae and chromists. When a LogDet tree was constructed, Plasmodium and Toxoplasma did form a clade with 93% bootstrap support, and the sporozoans formed the sister group of dinoflagellates with 100% bootstrap support (Figure 7.4). However, the LogDet tree placed both euglenoids and alveolates below the rest of the chloroplasts and grouped the mitochondria with the plastid/cyanobacterial clade, not with the α-proteobacteria.

[Figure 7.2: tree image not reproduced.]

Figure 7.2 Maximum parsimony tree of 23S rRNA sequences (1,885 bp) of chloroplasts, mitochondria and bacteria. Dinoflagellates form a monophyletic group. Bootstraps are from 500 replicates. M, mitochondria.

[Figure 7.3: tree image not reproduced.]

Figure 7.3 Gamma distribution (HKY85, α = 0.77) neighbor joining tree of 23S rRNA sequences (1,885 bp) with bootstrap values as percentages of 500 replicates. Scale bar corresponds to 0.1 changes per base pair.

[Figure 7.4: tree image not reproduced.]

Figure 7.4 LogDet tree of 23S rRNA sequences using a heuristic search. Holophyletic relationships within Sporozoa as well as between the Sporozoa and dinoflagellates are shown. Bootstraps are from 500 replicates. D, dinoflagellates.

Table 7.1 Base composition of chloroplast and cyanobacteria 23S rRNA sequences

Group            Species                      Sites (bp)   AT%
Green plants     Marchantia polymorpha        1878         47
                 Zea mays                     1879         45
                 Pisum sativum                1875         46
                 Pinus thunbergiana           1880         45
Green algae      Chlamydomonas reinhardtii    1882         50
                 Chlorella vulgaris           1867         49
Euglenoids       Astasia longa                1883         60
                 Euglena gracilis             1878         55
Sporozoa         Toxoplasma gondii            1863         64
                 Plasmodium falciparum        1873         76
                 Plasmodium berghei           1870         74
Dinoflagellates  Amphidinium carterae         1768         54
                 Protoceratium reticulatum    1853         64
                 Heterocapsa triquetra        1877         67
                 Heterocapsa pygmaea          1877         68
                 Heterocapsa niei             1876         68
                 Heterocapsa rotundata        1876         67
Heterokonts      Pylaiella littoralis         1878         51
                 Odontella sinensis           1879         52
Cryptomonad      Guillardia theta             1883         50
Red algae        Porphyra purpurea            1882         49
                 Palmaria palmata             1881         50
Glaucophyte      Cyanophora paradoxa          1883         48
Cyanobacteria    Anacystis nidulans           1879         46
                 Synechocystis sp.            1880         48

Errors in tree topology can also result from highly unequal evolutionary rates in different taxa (Felsenstein 1978; Olsen 1987; Hillis et al. 1994). Rate variation among sites is almost universal in molecular evolution (Van de Peer et al. 1996), but it was not taken into account by the program fastDNAml used for making the maximum likelihood tree (Figure 7.1). To see if this was a problem, quartet puzzling trees were constructed in which rate heterogeneity was considered (8 gamma categories). The puzzling tree gave the same topology as the LogDet tree for the Sporozoa and dinoflagellates, with 52% puzzling step support for the sporozoan clade and 55% for the alveolate grouping of Sporozoa and dinoflagellates, but the euglenoids still incorrectly grouped with the alveolates (Figure 7.5). The gamma distribution neighbor joining tree (α = 0.77) was overall similar to the maximum likelihood tree, but worse in that euglenoids and heterokonts also incorrectly appeared paraphyletic (Figure 7.3).
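The idea behind the LogDet transformation used for Figure 7.4 can be sketched in a few lines. This is an illustration only: it uses one common normalized form of the distance (often called the paralinear distance), and whether the software used for the trees in this chapter applies exactly this normalization is an assumption. The function names and the toy sequences are hypothetical.

```python
import math

BASES = "ACGT"


def det(m):
    """Determinant by Laplace expansion along the first row (fine for 4 x 4)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, value in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * value * det(minor)
    return total


def logdet_distance(x, y):
    """Normalized LogDet (paralinear) distance between two aligned sequences.

    F[i][j] is the proportion of sites with base i in x and base j in y.
    d = -(1/4) * ln( det(F) / sqrt(prod(fx) * prod(fy)) ), where fx and fy
    are the base frequencies of x and y; identical sequences give d = 0.
    """
    sites = [(a, b) for a, b in zip(x, y) if a in BASES and b in BASES]
    if not sites:
        raise ValueError("no comparable sites")
    f = [[0.0] * 4 for _ in range(4)]
    for a, b in sites:
        f[BASES.index(a)][BASES.index(b)] += 1.0 / len(sites)
    fx = [sum(row) for row in f]
    fy = [sum(f[i][j] for i in range(4)) for j in range(4)]
    d = det(f)
    norm = math.sqrt(math.prod(fx) * math.prod(fy))
    if d <= 0 or norm == 0:
        raise ValueError("LogDet distance undefined for this pair")
    return -0.25 * math.log(d / norm)


# Toy aligned sequences, for illustration only.
SEQ_A = "ACGTACGTAACCGGTT"
SEQ_B = "ACGTACGTAACCGGTA"

same = logdet_distance(SEQ_A, SEQ_A)
diff = logdet_distance(SEQ_A, SEQ_B)
```

Because the distance is built from the full 4 x 4 divergence matrix rather than from a single substitution model with fixed base frequencies, it remains tree-additive even when base composition differs between lineages, which is why it is a natural check on composition-driven groupings.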
If differences in evolutionary rate were not taken into account and rates were assumed to be equal among taxa, a neighbor joining tree incorrectly grouped all the long-branch taxa (the Sporozoa, the dinoflagellates and most of the mitochondria), apparently as the result of a long-branch artifact (Figure 7.6).

[Figure 7.5: tree image not reproduced.]

Figure 7.5 Quartet puzzling tree of 23S rRNA sequences. Holophyletic relationship between Sporozoa and dinoflagellates is shown. Numbers at the branches are percentage values of 1,000 puzzling steps. D, dinoflagellates.

[Figure 7.6: tree image not reproduced.]

Figure 7.6 Neighbor joining tree of 23S rRNA sequences (HKY85, equal rates). Long-branched mitochondrial sequences group with the long-branched chloroplast sequences of sporozoans and dinoflagellates, an obvious long-branch artifact. D, dinoflagellates.

7.3 Relationships among the chloroplast 23S rRNA genes

In an attempt to resolve the topology of the chloroplast cluster more clearly, especially the relationship between the sporozoans and the dinoflagellates, phylogenetic trees were constructed using chloroplast and cyanobacterial rRNA sequences only. The sporozoans form a monophyletic and holophyletic group with bootstrap support of 84% in the neighbor joining tree (gamma distribution parameter α = 0.74) (Figure 7.7) and 94% in the LogDet tree (Figure 7.8). Sporozoa is the sister group of dinoflagellates in both trees with >98% bootstrap support.

[Figure 7.7: tree image not reproduced.]

Figure 7.7 Gamma distribution (HKY85, α = 0.74) neighbor joining tree of chloroplast 23S rRNA sequences. Bootstraps are percentages of 500 replicates. The α value is from Tree puzzling analysis with 1 invariable + 8 gamma categories. Inset shows the relationship among the dinoflagellates in a tree constructed from partial 23S rRNA gene sequences.

[Figure 7.8: tree image not reproduced.]

Figure 7.8 LogDet tree of chloroplast 23S rRNA sequences (1,885 bp). Bootstrap values are percentages of 500 replicates. Inset is a tree of partial 23S rRNA sequences, showing relationships within eight dinoflagellates.

However, maximum likelihood, maximum parsimony and quartet puzzling trees gave the same artifactual grouping as Figures 7.1 and 7.2, with the sporozoan Plasmodium as the sister group to dinoflagellates (data not shown). In quartet puzzling trees in which rate variation among sites was taken into account, the inclusion of invariable sites did not affect tree topology, although the α values were different when the invariable sites were not taken into account (data not shown).
In all the trees, the euglenoids Euglena and Astasia were still in the red algae/chromists group, misleadingly supported by high bootstrap values (Figures 7.7, 7.8). On mitochondrial cox1 trees (Inagaki et al. 1998), the Euglenozoa also artifactually clustered with Miozoa, the phylum including dinoflagellates and Sporozoa (Cavalier-Smith 1999). Neighbor joining trees (not shown) were also constructed with TREECON considering transversions only (van de Peer and de Wachter 1994), but the tree topology was not significantly different from those of Figures 7.7 and 7.8. Phylogenetic trees (quartet puzzling, neighbor joining, LogDet and maximum parsimony, not shown) were also constructed for chloroplast 23S rRNA genes omitting one of the three long-branched groups (dinoflagellates, sporozoans or euglenoids). The remaining two groups then grouped as sisters (e.g. dinoflagellates and sporozoans) within the chl a/c-red algal cluster. If two groups were omitted, the remaining one was still within the chl a/c-red algal cluster (not shown).

7.4 Chloroplast 16S rRNA may not be reliable in phylogenetic analysis

BLAST searches of the databases (http://www.ncbi.nlm.nih.gov/BLAST) for sequences homologous to the 16S rRNA gene showed that the first significant hit was the sporozoan (e.g. Plasmodium) plastid 16S rRNA. Alignment of the H. triquetra 16S rRNA sequence with those from algae and higher plants confirmed that it is a chloroplast 16S rRNA gene, since it contains quite a few highly conserved motifs also found in the chloroplast 16S rRNA of other organisms. However, it is even more divergent than the sporozoan 16S rRNA gene sequence, and is thus the most divergent chloroplast 16S rRNA gene ever sequenced. Maximum likelihood analysis using the dinoflagellate H. triquetra 16S rRNA gene also showed artifactual grouping of H. triquetra with Plasmodium to the exclusion of Toxoplasma (Figure 7.9).
The LogDet tree (not shown) correctly recovered the sporozoan clade, as the 23S tree did, but the long-branched Prototheca moved to become the sister of the Sporozoa. Prototheca also remained with the Sporozoa on gamma distribution neighbor joining trees (not shown). Prototheca, which is a colorless green alga, should have grouped with the Chlorella/Nanochlorum clade. In such gamma trees, H. triquetra grouped with the long-branched chlorarachneans, but the position of their artifactual clade was unstable, depending on whether the heuristic search option (PAUP 4.0) was used or not. The relative position of the three chromist groups (cryptomonads, heterokonts and haptophytes) differed between the trees. In the quartet puzzling tree, H. triquetra did group with Plasmodium, with 60% support out of 1,000 puzzling steps, to the exclusion of Toxoplasma (Figure 7.10), and its branch was almost ten times as long as that of Plasmodium. Because of their lack of robustness and the systematically biased grouping of the long-branched Prototheca, alveolate, euglenoid and chlorarachnean clades, 16S rRNA trees cannot be relied upon for the accurate reconstruction of chloroplast phylogeny.
Figure 7.9 Maximum likelihood tree of chloroplast 16S rRNA (1,247 bp, ln likelihood = -18490). Bootstrap values are the percentages of 500 replicates for separate gamma distribution neighbor joining (α = 0.46, above) and maximum parsimony trees (below). Scale bar corresponds to 0.1 changes per base pair.

Figure 7.10 Quartet puzzling tree of chloroplast 16S rRNA sequences (1,247 bp). Numbers at the branches are percentages from 1,000 puzzling steps.

7.5 Phylogenetic analyses of individual protein sequences

In contrast to the 16S and 23S rRNA genes, the deduced amino acid sequences of all seven proteins are very easily aligned (except for the N-terminus) with the corresponding proteins from other organisms. The dinoflagellate proteins were always the most divergent ones in the alignments (see Chapter 4). Protein trees are less subject to base composition bias than nucleotide sequence trees (Foster et al. 1997).
Phylogenetic analyses were carried out by neighbor joining and parsimony for the seven protein sequences individually. In all the trees, the Heterocapsa branches were many times longer than those of any of the other plastid or cyanobacterial sequences, and Heterocapsa grouped with chromists (represented by Odontella and Guillardia) and red algae (Porphyra) (data not shown). However, the branching patterns were not consistent and bootstrap values were low in many cases. The contradictions among the phylogenetic analyses of individual proteins may result from insufficient evolutionary information in a single sequence (Gray et al. 1998).

Among the seven protein genes, the psbA gene is the most conserved at both the DNA and protein sequence level. Since I have sequenced the psbA gene from nine dinoflagellate species, and several dinoflagellate psbA sequences became available in the database while my thesis work was in progress (Takishita and Uchida 1999; Barbrook and Howe 2000), extensive phylogenetic analyses were carried out for the psbA gene. Neighbor joining and maximum parsimony trees of psbA protein (309 amino acids) showed two clusters: a green plant/euglenoid (chl a/b) cluster, and a red alga/chromist/dinoflagellate cluster (Figure 7.11). Euglena groups with Chlorella in the green plant cluster; in the red cluster, dinoflagellates form a monophyletic group that is a sister group of heterokonts with 53% bootstrap support.

Figure 7.11 Neighbor joining tree of psbA protein sequences (309 amino acids). Bootstrap values are expressed as percentages of 500 replicates for neighbor joining (above) and maximum parsimony (below). The following dinoflagellate psbA protein sequences were from GenBank: H. triquetra (M-AP7, AB025587), A. catenella (TN7, AB025590), A. tamarense (OF151, AB025589), A. carterae (NIES-331, AB025586), P. micans (NIES-12, AB025585) and L. polyedra (AB025588). Inset is a neighbor joining tree of partial psbA protein sequences (261 amino acids) showing phylogenetic relationships among the dinoflagellates; the arrow indicates the position of P. reticulatum when its partial psbA protein sequence (151 amino acids) is included. Scale bar corresponds to 0.1 changes per amino acid. The numbers or letters in brackets represent strain numbers.
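The bootstrap percentages quoted for these trees come from resampling alignment columns with replacement and asking how often a grouping reappears. The sketch below shows that resampling step only. It is a simplified illustration, not the procedure used in the thesis: the tree-building step is replaced by a hypothetical distance comparison, and the sequences and names are invented for the example.

```python
import random

def p_distance(a, b):
    """Proportion of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}

def bootstrap_support(alignment, grouping_holds, replicates=500, seed=1):
    """Percentage of replicates in which grouping_holds(replicate) is true."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(replicates)
               if grouping_holds(bootstrap_replicate(alignment, rng)))
    return 100.0 * hits / replicates

# Hypothetical toy alignment; A and B are meant to be close relatives.
toy = {
    "A": "MTAILERRESASLWGRFCNW",
    "B": "MTAILERRENSSLWARFCEW",
    "C": "MTATLEKRQSTSLWERFCSW",
}

def a_groups_with_b(aln):
    # Stand-in for "A and B are sisters in the tree built from aln".
    return p_distance(aln["A"], aln["B"]) < p_distance(aln["A"], aln["C"])

print(bootstrap_support(toy, a_groups_with_b))
```

In a real analysis each replicate would be passed to a tree-building program, and support would be counted per branch rather than per pairwise comparison.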
Maximum likelihood and neighbor joining trees were also constructed using DNA sequences of the psbA gene with the exclusion of the third nucleotide of each codon. They showed a similar topology to that of the psbA protein in that dinoflagellates formed a monophyletic group (Figure 7.12). However, green algae, red algae, chromists and higher plants also formed a monophyletic group, contrary to the trees of 23S rRNA and psbA protein sequences. When the third nucleotide of each codon was included in the phylogenetic analysis, the topology of the neighbor joining tree was the same as when it was excluded (Figure 7.12), except that Euglena moved to the base of the dinoflagellate group (not shown). These results suggest that the psbA gene sequence may not be very useful in phylogenetic analysis, since it is so conserved that it contains few evolutionarily informative characters.

Figure 7.12 Neighbor joining tree of psbA gene sequences (1st and 2nd nucleotide of each codon). The dinoflagellates form a monophyletic group, but the red algae and chromists that group with dinoflagellates in other trees form a group with green algae and higher plants here. D, dinoflagellates.

7.6 Phylogenetic analyses of seven concatenated protein sequences

In order to get more reliable trees, the seven protein sequences were concatenated to give an alignment of 3,302 amino acids that should contain more evolutionary information. The concatenated dinoflagellate sequences were compared to concatenated sequences from plastids of thirteen other species (Martin et al. 1998). Phylogenetic analyses show that H. triquetra grouped with the diatom (Odontella), cryptomonad (Guillardia) and red alga (Porphyra), with 87% bootstrap support from neighbor joining and 51% from parsimony (Figure 7.13). This tree also shows that peridinin-containing dinoflagellate chloroplasts belong to the red algal/chromist lineage, although the branching order within this lineage is unclear. Thus the majority of the single protein trees and the concatenated protein sequence trees both indicate that dinoflagellate plastids are related to those of chromists and red algae.

Figure 7.13 Neighbor joining tree of seven concatenated protein sequences. Numbers on the branches are bootstrap values greater than 50% from neighbor joining (above) and maximum parsimony analyses (below). Scale represents 0.1 substitutions per amino acid.
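The concatenation step described in section 7.6 can be sketched as follows. This is a minimal illustration under stated assumptions: each gene alignment is held as a dictionary from taxon name to aligned sequence, taxa missing from a gene are padded with gap characters, and the short sequence fragments are hypothetical rather than taken from the thesis data.

```python
def concatenate_alignments(alignments, gap="-"):
    """Join per-gene alignments end to end, one long sequence per taxon.

    alignments is a list of dicts mapping taxon name to aligned sequence.
    A taxon missing from one gene is padded with gap characters so that
    all concatenated sequences keep the same length.
    """
    taxa = sorted(set().union(*(aln.keys() for aln in alignments)))
    parts = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, gap * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}

# Hypothetical fragments standing in for two of the seven proteins.
gene_one = {"Heterocapsa": "MTA", "Porphyra": "MTA"}
gene_two = {"Heterocapsa": "GRDQ"}
combined = concatenate_alignments([gene_one, gene_two])
print(combined)
```

Padding missing genes with gaps keeps every taxon in the matrix; an alternative design would be to drop taxa that lack any of the genes.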
7.7 Discussion

7.7.1 Peridinean dinoflagellate chloroplasts probably originated by secondary endosymbiosis

The origin of the dinoflagellate peridinin-containing chloroplasts has been much debated. The two contradictory views are that the chloroplasts originated through primary endosymbiosis (Cavalier-Smith 1982), or through secondary endosymbiosis (Gibbs 1981). Since the two hypotheses were both based on morphological and structural data, molecular data such as chloroplast gene sequences should be very important for testing which hypothesis is reasonable. The phylogenetic analyses of ribosomal RNA genes, psbA and other individual protein sequences are consistent with the analysis of the seven concatenated H. triquetra chloroplast proteins (Figure 7.13) in indicating that peridinean dinoflagellate chloroplasts, like chromist chloroplasts, may have been derived from a red alga by secondary endosymbiosis. This is also consistent with the conclusion from the phylogenetic analysis of psbA protein sequences (Takishita and Uchida 1999). Therefore peridinin-containing dinoflagellate plastids most probably evolved not directly from a cyanobacterium (Cavalier-Smith 1982), but rather, like the related chromistan chloroplasts, by secondary symbiogenesis.

The evolution of dinoflagellate chloroplasts is complicated, since a few aberrant dinoflagellates have pigments other than the typical carotenoid peridinin. Because of their diverse pigment composition and the presence of additional membranes (Palmer and Delwiche 1998), those non-peridinin chloroplasts may be derived from haptophytes (e.g. Gymnodinium breve: Delwiche 1999, Tengs et al. 2000), cryptomonads (Dinophysis: Schnepf and Elbrachter 1988) and prasinophytes (Lepidodinium: Watanabe et al. 1990) through tertiary endosymbioses. Furthermore, those with chloroplasts from diatoms (Chesnick et al. 1997) also showed the diatom nuclei and mitochondria within the endosymbiont.
However, analyses of both 23S rRNA and psbA sequences confirm that these tertiary chloroplasts are not relevant to the origin of the monophyletic group of peridinin-containing dinoflagellate chloroplasts. The tertiary chloroplasts in those organisms imply that photosynthetic unicellular eukaryotes can be engulfed by dinoflagellates and live within the host as endosymbionts; thus endosymbiotic acquisition of chloroplasts in some dinoflagellates is a continuing process.

7.7.2 Relationships among the dinoflagellates

Since the chloroplast gene trees have fewer dinoflagellate taxa than trees for nuclear 18S rRNA (Saunders et al. 1997, Figure 7.14), they are not as good as those 18S rRNA trees for resolving the branching order within the dinoflagellates. However, the branching order in the gamma distribution neighbor joining tree (Figure 7.7, where Amphidinium and Protoceratium formed a group that is the sister group of Heterocapsa species) is consistent with that of 18S rRNA trees (Saunders et al. 1997). Most of the other trees suffer from the long branch problem. In the trees in Figures 7.1, 7.4, 7.5 and 7.6, the long-branched Amphidinium is at the base, and in Figure 7.11 the dinoflagellate taxa are ordered according to the length of their branches; these are probably the results of evolutionary rate differences between taxa. Therefore the rapidly evolving dinoflagellate chloroplast genes (see below), as well as differences in evolutionary rate between taxa, make them less suitable than nuclear genes for reconstructing the phylogeny of these organisms.

Figure 7.14 Schematic relationship between selected dinoflagellates and other alveolates. The branching order follows nuclear 18S rRNA trees (Saunders et al. 1997, Gunderson et al. 1998, Saldarriaga et al. in prep.). Dinoflagellates with bold names have been screened for minicircles. Dinoflagellates with peridinin-containing plastids are marked P; those without such plastids are labelled with an asterisk; several replaced them with differently pigmented plastids from other eukaryotic algae (replacement plastid sources are: D, diatoms; H, haptophytes; G, green algae; C, cryptomonads). Dinoflagellates known to have single gene circles are underlined. Non-photosynthetic species or orders are marked with #. Chloroplasts with both peridinin and minicircles were probably present in the ancestral peridinean (position 3), but the endosymbiotic event where a red algal plastid was acquired by their ancestors may have taken place earlier (at positions 1 or 2).
Taxa and annotations shown, in figure order: Heterocapsa (P), Peridinium balticum (*D), Scrippsiella (P), Thoracosphaera (P), Prorocentrum (P), Adenoides (P), Gymnodinium (*H), Lepidodinium (*G), Dinophysis (*C), many heterotrophs (*), Amphidinium (P), Pyrocystis (P), Lingulodinium (P), Protoceratium (P), Alexandrium (P), Crypthecodinium, #Noctiluca, #Syndinea, #Perkinsus, #Sporozoa, Ciliates. Branch labels: GPP, AG, Peridiniea, dinoflagellates, ancestral dinoflagellate, ancestral alveolate; one node is annotated "ancestor got a plastid with peridinin, minicircles and triple envelopes".

7.7.3 Ancestral peridinean dinoflagellates probably had chloroplasts with minicircles

Phylogenetic analyses of 23S rRNA and psbA protein sequences confirm that dinoflagellates are a monophyletic group, indicating that peridinin-containing dinoflagellate chloroplasts all have a common ancestry. My data (see Table 6.1) include two major dinoflagellate lineages that diverged at the base of the Peridinea in most published 18S rRNA trees (Saunders et al. 1997): Heterocapsa/Thoracosphaera/Scrippsiella/Prorocentrum/Adenoides (GPP complex, i.e. Gymnodiniales/Peridiniales/Prorocentrales) on the one hand, and Amphidinium/Protoceratium (AG complex, i.e. Amphidinium/Gonyaulacales) on the other (Figure 7.14). However, the topology of the latest 18S rRNA trees with 32 dinoflagellate taxa, which is more reliable than previous trees containing fewer taxa, suggests that all the dinoflagellates are sisters since their branches are short (Saldarriaga et al.
in preparation). Whichever version of the 18S rRNA trees is correct, dinoflagellates containing minicircles are scattered all over the 18S rRNA trees, including Pyrocystis lunula, a species in a different order, Pyrocystales (Stoebe and Kowallik 1999). Therefore, the latest common ancestor of the peridinean dinoflagellates had a peridinin-containing plastid, and had minicircular plastid genes. The chloroplast genome was probably fragmented into single gene minicircles in the ancestral peridinean dinoflagellate (see Chapter 6); such fragmentation of a typical large chloroplast genome into separate single gene minicircles would have been a complex event. However, my preliminary results suggest that the dinoflagellate genera Prorocentrum, Thoracosphaera and Scrippsiella may have larger DNA molecules containing chloroplast gene(s), instead of 2-3 kb chloroplast DNA minicircles. Such large chloroplast DNA could have originated from minicircular chloroplast genes by duplications, by recombination between different circles, or by transfer of chloroplast genes to the nucleus, since it is unlikely that minicircles evolved independently in the Heterocapsa and Amphidinium/Protoceratium lineages. Approaches such as PCR amplification of genomic DNA using primer pairs specific for chloroplast genes that are co-transcribed in other organisms, RT-PCR amplification of chloroplast genes, and screening of cDNA libraries for chloroplast genes should be applied in future to investigate the large DNA molecules containing chloroplast genes.

7.7.4 Dinoflagellate chloroplasts might be related to Sporozoa plastids

A second key question is whether sporozoan and peridinean dinoflagellate plastids have a common or an independent secondary origin (Palmer 1992; Cavalier-Smith 1999).
Dinoflagellates group with sporozoans within the red algal/chromists group in all our 23S rRNA trees with strong bootstrap support, suggesting that the dinoflagellate chloroplasts are related to the sporozoan plastids. However, the possibility that the grouping of dinoflagellates and sporozoans is a "long branch artifact" cannot be ruled out (Felsenstein 1978; Philippe and Laurent 1999). Some mitochondrial 23S rRNA gene branches are longer than those of sporozoan plastids (Figures 7.1-7.5), but they did not group with dinoflagellate plastids, which have the longest branches in most trees. Dinoflagellate chloroplast sequences do group with those long-branching mitochondrial sequences in neighbor joining trees if rate variation among sites is not taken into account (Figure 7.6). This shows the importance of using phylogenetic methods that take account of such rate variation (Yang 1996). However, taking into account rate variation among sites, including invariant sites, in the puzzling analyses did not change the sister relationship of dinoflagellates and sporozoans. Grouping of distantly related species might also result from similar base compositions (Lockhart et al. 1994, 1996), but LogDet analysis, which corrects for base composition bias, did not change the grouping of dinoflagellates and sporozoans. However, in all the rRNA trees the dinoflagellate and sporozoan branches are so long that one should be cautious when interpreting them, because none of the methods tried could move the euglenoids and Prototheca to the green group. However, when the long-branched taxa Euglena and Prototheca were excluded, dinoflagellates and Sporozoa still grouped with the red algal/chl a/c clade, which favors a red algal rather than a green algal ancestry for both dinoflagellate and sporozoan plastids.
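As an illustration of how a LogDet distance allows for differing base compositions, the sketch below computes one common normalized form of the distance for a pair of aligned sequences: d = -(1/4)[ln det F - (1/2) ln(det Πx det Πy)], where F is the 4 x 4 matrix of joint base frequencies and Πx, Πy are diagonal matrices of the base frequencies of the two sequences. This is a hedged sketch, not the implementation used by the programs in this chapter; the function names and the example sequences are hypothetical.

```python
import math

BASES = "ACGT"

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for i in range(n):
        pivot = max(range(i, n), key=lambda r: abs(a[r][i]))
        if abs(a[pivot][i]) < 1e-15:
            return 0.0
        if pivot != i:
            a[i], a[pivot] = a[pivot], a[i]
            det = -det
        det *= a[i][i]
        for r in range(i + 1, n):
            factor = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= factor * a[i][c]
    return det

def logdet_distance(seq_x, seq_y):
    """Normalized LogDet distance between two aligned nucleotide sequences."""
    counts = [[0.0] * 4 for _ in range(4)]
    n_sites = 0
    for x, y in zip(seq_x.upper(), seq_y.upper()):
        if x in BASES and y in BASES:  # skip gaps and ambiguity codes
            counts[BASES.index(x)][BASES.index(y)] += 1.0
            n_sites += 1
    if n_sites == 0:
        raise ValueError("no comparable sites")
    f = [[c / n_sites for c in row] for row in counts]
    freq_x = [sum(row) for row in f]                             # row sums
    freq_y = [sum(f[i][j] for i in range(4)) for j in range(4)]  # column sums
    det_f = determinant(f)
    det_pi = math.prod(freq_x) * math.prod(freq_y)
    if det_f <= 0.0 or det_pi <= 0.0:
        raise ValueError("LogDet distance is undefined for these sequences")
    return -0.25 * (math.log(det_f) - 0.5 * math.log(det_pi))

# Hypothetical sequences differing at one site out of twenty.
seq_a = "ACGT" * 5
seq_b = "ACGT" * 4 + "ACGA"
print(round(logdet_distance(seq_a, seq_b), 4))
```

The normalization by the base frequencies makes the distance of a sequence to itself zero, so that unequal base composition alone does not inflate the distance.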
The fact that dinoflagellates and Sporozoa were at the same position on both the ribosomal RNA trees, irrespective of whether only one or both groups were included in the tree, indicates that their overall position is not simply a consequence of their two long branches attracting each other. In almost all the trees, the dinoflagellates and sporozoans are within the red algae/chromists group, while green plants and glaucophytes are more distant. The only exception is the LogDet tree of 23S rRNA sequences (Figure 7.4), where dinoflagellates and sporozoans are separated from all other groups. This is consistent with the evidence from conserved gene clusters that suggested that sporozoan plastids originated from a red alga, as revealed by the similarity of ORF 470 (Williamson et al. 1994) and ribosomal protein clusters (Stoebe and Kowallik 1999) between the sporozoan Plasmodium falciparum and a red alga. The congruence between our trees and the gene-cluster data makes a green algal origin of sporozoan plastids, as was inferred from phylogenetic analyses of the tufA gene (Kohler et al. 1997), unlikely.

7.7.5 Accelerated evolution of dinoflagellate chloroplast genes

Why are the dinoflagellate branches so long? In other words, why did the chloroplast genes in dinoflagellates evolve so rapidly? One possibility is that rapid evolution was an indirect consequence of the small size of minicircles (Chapters 4, 6). Rates of substitution from mildly deleterious mutations might be greater in small molecules such as single gene circles, each of which is a replicon; it is probable that the strength of stabilizing selection against mutation on such single gene circles is smaller than that on multigenic circular molecules (e.g. chloroplast DNA). The effect of a mutation on any minicircle should be very obvious: if it is harmful the minicircle will be deleted, and if it is mildly deleterious the minicircle may replicate, and thus the mutation will be maintained.
However, some mutations in a gene of a multigenic molecule may affect not only the gene itself but also its neighboring genes (e.g. its co-transcribed genes). Thus, any harmful mutation on a multigenic circular chloroplast genome is under high selection pressure, and the probability of maintaining such a mutation on a large molecule should be lower than that on a small molecule.

Another possibility is that the accelerated evolution is the result of Muller's ratchet. The gradual accumulation of deleterious mutations in small asexual populations is expected to result in an irreversible decline in fitness (Muller 1964; Felsenstein 1974). If a substantial proportion of mutations are mildly deleterious, then small populations show increased rates of sequence evolution. Recombination or sex could produce progeny with a reduced number of deleterious mutations in small populations (Muller 1964; Pamilo et al. 1987; Lynch et al. 1995). However, organelles and endosymbiotic bacteria are asexual and are expected to be particularly vulnerable to the effects of Muller's ratchet (Lynch 1996; Moran 1996; Brynnel et al. 1998). Since each dinoflagellate (a unicellular organism) has only one chloroplast, there is no recombination between different chloroplasts for any chloroplast gene. This is in contrast to other unicellular algae and higher plants, in which each cell contains multiple chloroplasts and recombination between chloroplasts for chloroplast genes is possible. Therefore the dinoflagellate chloroplasts are expected to accumulate mildly deleterious mutations, and the chloroplast gene sequences should have evolved very rapidly. Alternatively, the rapidly evolved chloroplast genes could be the result of a DNA repair system that may not be able to correct every mutation in the dinoflagellate chloroplast genes.
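The logic of Muller's ratchet can be made concrete with a small simulation. The sketch below is a toy model under stated assumptions, not an analysis from this thesis: a fixed-size asexual population, new deleterious mutations per individual drawn from a Poisson distribution, multiplicative selection, and no back mutation or recombination. All parameter names and values are hypothetical.

```python
import math
import random

def poisson(rng, lam):
    """Draw a Poisson-distributed integer (Knuth's method; fine for small lam)."""
    if lam <= 0:
        return 0
    limit = math.exp(-lam)
    k = 0
    product = rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k

def ratchet(pop_size, mut_rate, sel_coeff, generations, seed=0):
    """Return the smallest mutation load in the population in each generation.

    sel_coeff must satisfy 0 <= sel_coeff < 1. Each individual is represented
    only by its number of deleterious mutations (its load).
    """
    rng = random.Random(seed)
    loads = [0] * pop_size
    min_loads = []
    for _ in range(generations):
        # Fitness declines multiplicatively with the number of mutations.
        weights = [(1.0 - sel_coeff) ** k for k in loads]
        parents = rng.choices(loads, weights=weights, k=pop_size)
        # Offspring inherit the parental load plus any new mutations.
        loads = [k + poisson(rng, mut_rate) for k in parents]
        min_loads.append(min(loads))
    return min_loads

history = ratchet(pop_size=20, mut_rate=0.5, sel_coeff=0.01, generations=200)
print(history[0], history[-1])
```

Because offspring can never carry fewer mutations than their parent in this model, the smallest load in the population can only stay the same or increase; each increase is one irreversible "click" of the ratchet, and clicks are more frequent in small populations.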
Interestingly, the sporozoan mitochondrial genome is also reduced in gene content to just three polypeptides (Feagin 1992), and in trees for the mitochondrial gene cytochrome oxidase I (cox1) the branches for the alveolates (dinoflagellates, sporozoans and ciliates) are also very long (Inagaki et al. 1998). This similarity between chloroplast trees and mitochondrial trees raises the possibility that genes in small genomes might have experienced accelerated rates of evolution.

References

1. Aldrich, J. & Cattolico, R. A. (1981) Isolation and characterization of chloroplast DNA from the marine chromophyte, Olisthodiscus luteus: electron microscopic visualization of isomeric molecular forms. Plant Physiol 68: 642-647.
2. Ayala, F. J. (1999) Molecular clock mirages. BioEssays 21: 71-71.
3. Backert, S., Nielsen, B. L. & Borner, T. (1987) The mystery of the rings: structure and replication of mitochondrial genomes from higher plants. Trends Plant Sci 2: 477-483.
4. Baldauf, S. L. & Palmer, J. D. (1990a) Evolutionary transfer of the chloroplast tufA gene to the nucleus. Nature 344: 262-265.
5. Baldauf, S. L., Manhart, J. R. & Palmer, J. D. (1990b) Different fates of the chloroplast tufA gene following its transfer to the nucleus in green algae. Proc Natl Acad Sci USA 87: 5317-21.
6. Barbrook, A. C. & Howe, C. J. (2000) Minicircular plastid DNA in the dinoflagellate Amphidinium operculatum. Mol Gen Genet 263: 152-158.
7. Bedbrook, J. R. & Bogorad, L. (1976) Endonuclease recognition sites mapped on Zea mays chloroplast DNA. Proc Natl Acad Sci USA 73: 4309-4319.
8. Betts, L. & Spremulli, L. L. (1994) Analysis of the role of the Shine-Dalgarno sequence and mRNA secondary structure on the efficiency of translational initiation in the Euglena gracilis chloroplast atpH mRNA. J Biol Chem 269: 26456-63.
9. Bibby, B. T. & Dodge, J. D. (1974) The fine structure of chloroplast nucleoid in Scrippsiella sweeneyae (Dinophyceae). J Ultrastructure Research 48: 153-161.
10. Blanchard, J. L.
& Schmidt, G. W. (1995) Pervasive migration of organellar DNA to the nucleus in plants. J Mol Evol 41: 397-406.
11. Blanchard, J. L. & Hicks, J. S. (1999) The apicomplexan plastid is not derived from within the chlorophycean/charophycean green plastid lineage. J Euk Microbiol 46: 367-375.
12. Boczar, B. A., Liston, J. & Cattolico, R. A. (1991) Characterization of satellite DNA from three marine dinoflagellates: Glenodinium sp. and two members of the toxic genus, Protogonyaulax. Plant Physiol 97: 613-618.
13. Boer, P. H. & Gray, M. W. (1988) Scrambled ribosomal RNA gene pieces in Chlamydomonas reinhardtii mitochondrial DNA. Cell 55: 399-411.
14. Boer, P. H. & Gray, M. W. (1991) Short dispersed repeats localized in spacer regions of Chlamydomonas reinhardtii mitochondrial DNA. Curr Genet 19: 309-312.
15. Boore, J. L. & Brown, W. M. (1994) Complete DNA sequence of the mitochondrial genome of the black chiton, Katharina tunicata. Genetics 138: 423-443.
16. Brown, D. D., Wensink, P. C. & Jordan, E. (1972) A comparison of the ribosomal DNA's of Xenopus laevis and Xenopus mulleri: the evolution of tandem genes. J Mol Biol 63: 57-73.
17. Brynnel, E. U., Kurland, C. G., Moran, N. A. & Andersson, S. G. (1998) Evolutionary rates for tuf genes in endosymbionts of aphids. Mol Biol Evol 15: 574-682.
18. Camacho, J. P. M., Sharbel, T. F. & Beukeboom, L. W. (2000) Phil Trans R Soc Lond B 355: 163-178.
19. Cavalier-Smith, T. (1982) The origins of plastids. Biol J Linn Soc 17: 289-306.
20. Cavalier-Smith, T. (1987) The origin of cells: a symbiosis between genes, catalysts, and membranes. Cold Spring Harbor Symp Quant Biol 52: 805-824.
21. Cavalier-Smith, T. (1991) Cell diversification in heterotrophic flagellates. In: The biology of free living heterotrophic flagellates (eds Patterson DJ, Larsen J). Oxford University Press, pp. 113-131.
22. Cavalier-Smith, T. (1993a) Kingdom protozoa and its 18 phyla. Microbiol Rev 57: 953-994.
23.
Cavalier-Smith, T. (1993b) Evolution of the eukaryotic genome. In: The eukaryotic genome (eds Broda P, Oliver SG, Sims P). Cambridge University Press, pp. 333-385.
24. Cavalier-Smith, T. (1998) A revised six-kingdom system of life. Biol Rev 73: 203-266.
25. Cavalier-Smith, T. (1995) In: Biodiversity and Evolution (eds Arai, R., Kato, M. & Doi, Y.). The National Science Museum Foundation, Tokyo, pp. 75-114.
26. Cavalier-Smith, T., Chao, E. E. & Allsopp, M. T. E. P. (1995) Ribosomal RNA evidence for chloroplast loss within Heterokonta: pedinellid relationships and a revised classification of ochristan algae. Arch Protistenk 145: 209-220.
27. Cavalier-Smith, T., Chao, E. E., Thompson, C. & Hourihane, S. (1996) Oikomonas, a distinctive zooflagellate related to chrysomonads. Arch Protistenk 146: 273-279.
28. Cavalier-Smith, T. et al. (1996) Cryptomonad nuclear and nucleomorph 18S rRNA phylogeny. Eur J Phycol 31: 315-328.
29. Cavalier-Smith, T. (1999) Principles of protein and lipid targeting in secondary symbiogenesis: euglenoid, dinoflagellate and sporozoan plastid origins and the eukaryote family tree. J Euk Microbiol 46: 347-366.
30. Cavalier-Smith, T. (2000) Membrane heredity and early chloroplast evolution. Trends Plant Sci 5: 174-182.
31. Cermakian, N., Ikeda, T. M., Miramontes, P., Lang, B. F., Gray, M. W. & Cedergren, R. (1997) On the evolution of the single-subunit RNA polymerase. J Mol Evol 48: 671-681.
32. Chesnick, J. M. & Cattolico, R. A. (1993) Isolation of DNA from eukaryotic algae. Methods Enzymol 224: 168-176.
33. Chesnick, J. M., Kooistra, W. H., Wellbrock, U. & Medlin, L. K. (1997) Ribosomal RNA analysis indicates a benthic pennate diatom ancestry for the endosymbionts of the dinoflagellates Peridinium foliaceum and Peridinium balticum (Pyrrhophyta). J Euk Microbiol 44: 314-320.
34. Coleman, A. W. (1985) Diversity of plastid configuration among classes of eukaryotic algae. J Phycol 21: 1-16.
35. Delwiche, C. F.
(1999) Tracing the thread of plastid diversity through the tapestry of life. Am Nat 154: S164-S177.
36. Dodge, J. D. (1963) Chromosomes in some marine dinoflagellates. Bot Mar 5: 121-127.
37. Dodge, J. D. & Crawford, R. M. (1968) Fine structure of the dinoflagellate Amphidinium carteri Hulburt. Protistologica 4: 231-242.
38. Dodge, J. D. (1975) A survey of chloroplast ultrastructure in Dinophyceae. Phycologia 14: 253-263.
39. Dodge, J. D. (1984) Dinoflagellate evolution. In: Dinoflagellates. Academic Press, pp. 481-522.
40. Doolittle, W. F. & Sapienza, C. (1980) Selfish genes, the phenotype paradigm and genome evolution. Nature 284: 601-603.
41. Douglas, S. E. (1988) Physical mapping of the plastid genome from the chlorophyll c-containing alga, Cryptomonas Φ. Curr Genet 14: 0-0.
42. Douglas, S. E. & Durnford, D. G. (1989) The small subunit of ribulose-1,5-bisphosphate carboxylase is plastid-encoded in the chlorophyll c-containing alga Cryptomonas Φ. Plant Mol Biol 13: 13-20.
43. Douglas, S. E., Murphy, C. A., Spencer, D. F. & Gray, M. W. (1991a) Cryptomonad algae are evolutionary chimeras of two phylogenetically distinct unicellular eukaryotes. Nature 350: 148-151.
44. Douglas, S. E. & Turner, S. (1991b) Molecular evidence for the origin of plastids from a cyanobacterium-like ancestor. J Mol Evol 33: 267-273.
45. Douglas, S. E. & Penny, S. L. (1999) The plastid genome of the cryptophyte alga, Guillardia theta: complete sequence and conserved synteny groups confirm its common ancestry with red algae. J Mol Evol 48: 236-244.
46. Downie, S. R. & Palmer, J. D. (1992) Use of chloroplast DNA rearrangements in reconstructing plant phylogeny. In: Molecular Systematics of Plants. Chapman and Hall, New York, pp. 14-35.
47. Ebert, C., Tymms, M. & Schweiger, H. (1985) Homology between 4.3-µm minicircular and plastomic DNA in chloroplast of Acetabularia cliftonii. Mol Gen Genet 200: 187-192.
48. Feagin, J. E.
(1992) The 6-kb element of Plasmodium falciparum encodes mitochondrial cytochrome genes. Mol Biochem Parasitol 52: 145-148.
49. Felsenstein, J. (1974) The evolutionary advantage of recombination. Genetics 78: 737-56.
50. Felsenstein, J. (1978) Cases in which parsimony or compatibility methods will be positively misleading. Syst Zool 27: 401-410.
51. Felsenstein, J. (1993) PHYLIP (phylogeny inference package) Version 3.57c. Distributed by the author, Department of Genetics, University of Washington, Seattle.
52. Fensome, R. A., Taylor, F. J. R., Norris, G., Sarjeant, W. A. S., Wharton, D. I. & Williams, G. L. (1993) A classification of living and fossil dinoflagellates. Ed: Van Couvering, Sheridan Press.
53. Foster, P. G., Jermiin, L. S. & Hickey, D. A. (1997) Nucleotide composition bias affects amino acid content in proteins coded by animal mitochondria. J Mol Evol 44: 282-288.
54. Gajadhar, A. A., Marquardt, W. C., Hall, R., Gunderson, J., Ariztia-Carmona, E. V. & Sogin, M. L. (1991) Ribosomal RNA sequences of Sarcocystis muris, Theileria annulata and Crypthecodinium cohnii reveal evolutionary relationships among apicomplexans, dinoflagellates and ciliates. Mol Biochem Parasitol 45: 147-154.
55. Gardner, M. J., Bates, P. A., Ling, I. T., Moore, D. J., McCready, S., Gunasekera, M. B., Wilson, R. J. & Williamson, D. H. (1988) Mitochondrial DNA of the human malarial parasite Plasmodium falciparum. Mol Biochem Parasitol 31: 11-17.
56. Gibbs, S. P. (1981) The chloroplasts of some groups of algae may have evolved from endosymbiotic eukaryotic algae. Ann N Y Acad Sci 361: 193-207.
57. Gilson, P. & McFadden, G. I. (1995) The chlorarachniophyte: a cell with two different nuclei and two different telomeres. Chromosoma 103: 635-641.
58. Gloeckner, G., Rosenthal, A. & Valentin, K. (1999) Cyanidium caldarium chloroplast, complete genome (direct submission).
59. Gray, M. W. & Doolittle, W. F. (1982) Has the endosymbiont hypothesis been proven? Microbiol Rev 46: 1-42.
60. Gray, M . W., Lang, B. F., Cedergren, R., Golding, G. B., Lemieux, C., Sankoff, D., Tunnel, M . , Brossard, N . , Delage, E., Littlejohn, T. G., Plante, I., Rioux, P., Saint-Louis, D., Zhu, Y . & Burger, G. (1998) Genome structure and gene content in protist mitochondrial DNAs. Nucleic Acids Res 26: 865-878. 61. Green, B. R. (1976) Covalently closed minicircular D N A associated with Acetabularia chloroplasts. Biochim Biophys Acta 447: 156-66. 62. Gunderson, J. H. , Goss, S. H. & Coats, D. W. (1999) The phylogenetic position of Amoebophrya sp. infecting Gymnodinium sanguineum. JEukaryot Microbiol 46: 194-197'. 63. Hallick, R. B., Hong, L. Drager, R. G. Favreau, M . R. Montfort, A . Orsat, B. Spielmann, A . & Stutz, E. (1993) Complete sequence of Euglena gracilis chloroplast DNA. Nucleic Acids Res 21: 3537-3544. 64. Hallick, R. B. & Bairoch, A . (1994) Proposals for naming of chloroplast genes. III. Nomenclature for open reading frames encoded in chloroplast genomes. Plant Mol BiolReptr 19 (2 Suppl.): S29-S30. 65. Hansen, G. (1995) Analysis of the thecal plate pattern in the dinoflagellate Heterocapsa rotundata (Lohmann) comb. Nov. (=Katodinium rotundata (Lohmann) Loeblich). Phycologia 34: 166-170. 194 66. Hansen, G., Moestrup, 0 & Roberts, K. R. (1996/7) Light and electron microscopical observations on Protoceratium reticulatum (Dinophyceae). Arch Protistenkd 147: 381-391. 67. Hartley, J. L. & Donelson, J. E. (1980) Nucleotide sequence of the yeast plasmid. Nature 286: 860-865. 68. Heizmann, P. Ravel-Chapuis, P. & Nigon,V. (1982) Minicircular D N A having sequence homologies with chloroplast D N A in a bleached mutant of Euglena gracillis. Curr Genet 6: 119-122. 69. Herrmann, R. G. (1982) The preparation of circular D N A from plastids. In "Methods in Chloroplast Molecular Biology" (eds. M . Edelman, R. B. Hallick and N. -H . Chua). Elsevier, Amsterdam, pp. 259-280. 70. Hildebrand, M . , Hasegawa, P., Ord, R. W., Thorpe V. S., Glass C. A . & Volcani B. E. 
(1992). Nucleotide sequence of diatom plasmids: identification of open reading frames with similarity to site-specific recombinases. Plant Mol Biol 19: 759-770. 71 .Hillis D M , Huelsenbeck JP, Cunningham CW (1994) Application and accuracy of molecular phylogenies. Science 264: 671-677. 72.Hiratsuka, J., H . Shimakda, R. Whittier, T. Ishibashi, M . Sakamoto, M . Mori, C. Kondo, Y . Honji, C. Sun, B. -Y. Meng, A . K . Y . - Q . L i , Y . Nishizawa, A . Hirai, K . Shinozaki and M . Suguira (1989). The complete sequence of the rice (Oryza sativa) chloroplast genome: Intermolecular recombination between distinct tRNA genes accounts for a major plastid D N A inversion during the evolution of cereals. Mol Gen Genet 111: 185-194. 195 73.Inagaki, Y . , Hayashi-Ishimaru, Y . , Ehara, M . , Igarashi, I. & Ohama, T. (1997) Algae or protozoa: phylogenetic position of euglenophytes and dinoflagellates as inferred from mitochondrial sequences. J Mol Evol 45: 295-300. 74.1shida K , Cao Y , Hasegawa M , Okada N , Hara Y . (1997) The origin of chlorarachniophyte plastids, as inferred from phylogenetic comparisons of amino acid sequences of EF-Tu. J Mol Evol 45: 682-687. 75.1shida K , Green, B. R. & Cavalier-Smith, T. (1998) Diversification of a chimaeric algal group, the chlorarachniophytes: phylogeny of nuclear and nucleomorph small-subunit rRNA genes. Mol Biol Evol 16:321-331. 76 Jacobs, J. D., Ludwig, J. R., Hildebrand, M . , Kukel, A. , Feng, T-Y., Ord, R. W. & Volcani, B. E. (1992) Characterisation of two circular plasmids from the marine diatom Cylindrotheca fusiformis: plasmids hybridise to chloroplast and nuclear DNA. Mol Gen Genet 233: 302-310. 77Jeffrey, S.W., Sielicki, M . & Haxo, F. T. (1975) Chloroplast pigment patterns in dinoflagellates. JPhycol 111: 374-384. 78. Jones, R. N . (1995) New Phytol 131: 411-434. 79. Jones, R. N . (1985) in The Evolution of genome size, ed. Cavalier-Smith, T. (Wiley, London), pp. 397-425. 80. Kaneko, T., Matsubayashi, T. Sugita, M . 
& Sugiura, M . (1996) Physical and gene maps of the unicellular cyanobacterium Synechococcus sp. strain PCC6301 genome. Plant Mol Biol 31: 193-201. 81. Kimura M (1983) The neutral theory of molecular evolution. Cambridge University Press, pp. 196 82. Kissinger, J. C , Donald, R. G., Moulton, A . L., Aiello, D. P., Lang-Unnasch, N . & Roos, D. S. (1999) Toxoplasma gondii chloroplast, complete genome (direct submission). 83. Koller, B. & H . Delius (1980). Vicia faba chloroplast D N A has only one set of ribosomal R N A genes as shown by partial denaturation mapping and R-loop analysis. Mol Gen Genet 178: 261-269. 84. K6hler, S., Delwiche, C. F., Denny, P. W., Tilney, L. G., Webster, P., Wilson, R. J. M . , Palmer, J. D. & Roos, D. S. (1997) A plastid of probable green algal origin in apicomplexan parasites. Science 275: 336-342. 85. Kolodner, R. D. & Tewari, K. K. (1975) The molecular size and conformation of the chloroplast D N A from higher plants. Biochim Biophys Acta 402: 372-390. 86. Kowallik K . V . (1992). Origin and evolution of plastids from chlorophyll-a+c-containing algae: suggested ancestral relationships to red and green algal plastids. In: Origins of Plastids. New York, Chapman and Hall. 223-263. 87. Kowallik K . V . , B. Stoebe, I. Schaffran & U . Freier (1995). The chloroplast genome of a chlorophyll a+c-containing alga, Odontella sinensis. Plant Mol Biol Reptr 13: 336-342. 88. La Claire II, J. W., Loudenslager, C M . & Zuccarello, G.C. (1998) Characterization of novel extrachromosomal D N A from giant celled marine green algae. Curr Genet 34: 204-211. 89. Lemieux, C , Otis, C. and Turmel, M . (2000) Ancestral chloroplast genome in Mesostigma viride reveals an early branch of green plant evolution. Nature 403: 649-652. 197 90. L i , W.-H. (1997) Molecular Evolution. Sinauer Associates, Sunderland, M A . 91. Lidholm, J. & Gustafsson, P. 
(1991) The chloroplast genome of gymnosperm Pinus contorta: a physical map and a complete collection of overlapping clones. Curr Genet 20: 161-166. 92. Lockhart, P. J., Steel, M . A. , Hendy, M . D. & Penny, D. (1994) Recovering evolutionary trees under a more realistic model of sequence evolution. Mol Biol Evol 11: 503-513. 93. Lockhart, P. J., Larkum, A . W. D., Steel, M . A. , Waddell, P. J. & Penny, D. (1996). Evolution of chlorophyll and bacteriochlorophyll: the problem of invariant sites in sequence analysis. Proc Natl Acad Sci USA 93: 1930-9134. 94. Lynch, M . , Conery, J. & Burger, R. (1995) Mutational meltdowns in sexual populations. Evolution 49: 1067-1080. 95. Lynch, M . (1996) Mutation accumulation in transfer RNAs: molecular evidence for Muller's ratchet in mitochondrial genomes. Mol Biol Evol 13: 209-220. 96. Maier, R. M . , Neckermann, K . , Igloi, G. I. & Kossel, H . (1995) Complete sequence of the maize chloroplast genome: Gene content, hotspots of divergence and fine tuning of genetic information by transcript editing. J Mol Biol 251: 614-628. 97. Manning, J. E., Wolstenholme, D. R., Ryan, R. S., Hunter, J. I. & Richards, O. C. Circular chloroplast D N A from Euglena gracilis. Proc Natl Acad Sci USA 68: 1169-1173 (1971). 98. Martin, W., Stoebe, B. , Goremykin, V. , Hansmann, S., Hasegawa, M . & Kowallik, K . V. (1998) Gene transfer to the nucleus and the evolution of chloroplasts. Nature 393: 162-165. 198 99. Martin, W. & Henrnann, R. G. (1998) Gene transfer from organelles to the nucleus: how much, what happens, and Why? Plant Physiol 118: 9-17. 100. Maslov, D. A. , Avila, H . A. , Lake, J. A . & Simpson, L. (1994) Evolution of R N A editing in kinetoplastid protozoa. Nature 368: 345-348. 101. Maynard Smith, J. & Szathmary, E. (1993) The origin of chromosomes I. Selection for linkage. J Theor Biol 164: 437-446. 102. McFadden, G. I. (1990) Evidence that cryptomonad chloroplasts evolved from photosynthetic eukaryotic endosymbionts. 
J Cell Science 95: 303-308. 103. McFadden, G. I., Gilson, P. R., Hofmann, C. J., Adcock, G. J. & Maier, U . G. (1994) Evidence that an amoeba acquired a chloroplast by retaining part of an engulfed eukaryotic alga. Proc Natl Acad Sci USA 91: 3690-3694. 104. McFadden, G. I., Gilson, P. & Waller, R. F. (1995) Molecular phylogeny of chlorarachniophytes based on the plastid rRNA and rbcL sequences. Archiv Protistenk 145:231-239. 105. McFadden, G. I. & Waller, R. F. (1997) Plastids in parasites of humans. BioEssays 19: 1033-1040. 106. Medlin, L. K. , Cooper, A. , Hi l l , C , Wrieden, S. & Wellbrock, U . (1995) Phylogenetic position of the Chromista plastids based on small subunit rRNA coding regions. Curr Genet 28: 560-565. 107. Moran, N . A . (1996) Accelerated evolution and Muller's rachet in endosymbiotic bacteria. Proc Natl Acad Sci USA 1996 93: 2873-2878. 108. Morden, C. W., Delwiche, C. F., Kuhsel, M . & Palmer, J. D. (1992) Gene phylogenies and the endosymbiotic origin of plastids. Biosystems 28: 75-90. 199 109.Moreira, D., Le Guyader, H. & Phillippe, H. (2000) The origin of red algae and the evolution of chloroplasts. Nature 40: 69-72. HO.Morse D., Salois, P., Markovic, P. & Hastings, J. W. (1995) A nuclear encoded form II RuBisCo in dinoflagellates. Science 268: 1622-1624. 11 l.Muller, H . J. (1964) The relation of recombination to mutational advance. Mutate Res 1: 2-9. 112.Norman, J. E., Gray, M . W. (1997) The cytochrome oxidase subunit 1 gene (coxl) from the dinoflagellate, Crypthecodinium cohnii. FEBS Lett 413: 333-8. 113.Oda. K. , Yamato, K. , Ohta, E., Nakamura, Y . , Takemura. M . , Nozato, N . , Akashi, K. , Kanegae, T., Ogura, Y . , Kohchi, T. et al (1992) Gene organization deduced from the complete sequence of liverwort Marchantia polymorpha mitochondrial DNA. A primitive form of plant mitochondrial genome. J Mol Biol 223: 1-7. 114.0hyama, K. , Fukuzawa, H. , Kohchi, T., Shirai, H. , Sano, T., Sano, S., Umesono, K , Shiki, Y . , Takeuchi, M . 
, Chang, Z., Aota, S., Inokuchi, H. and Ozeki, H (1986). Chloroplast gene organization deduced from complete sequence of liverwort Marchantia polymorpha. Nature 322: 572-574. 115.01sen, G. J. (1987) Earliest phylogenetic branching: comparing rRNA-based evolutionary trees inferred with various techniques. Cold Spring Harbor Symp, Quant Biol 52: 825-837 116.01sen, G. J., Matsuda, H. , Hagstrom, R. & Overbeek, R. (1994}fastDNAml: a tool for construction of phylogenetic trees of D N A sequences using maximum likelihood. Comp App Biosci 10: 41-48. 200 117.0rgel, L. E. & Crick, F. H . C. (1980) Selfish DNA: the ultimate parasite. Nature 284: 604-607. 118. Palmer JD (1985) Comparative organization of chloroplast genomes. Ann Rev Genet 19: 325-354. 119. Palmer, J. D. & Stein, D. B. (1986) Conservation of chloroplast genome structure among vascular plants. Curr Genet 10: 823-833. 120. Palmer JD (1990) Contrasting modes and tempos of genome evolution in land plant organelles. Trends Genet 6: 115-120. 121. Palmer, J. D. (1991) Plastid chromosomes: structure and evolution. In: Cell Culture and Somatic Cell Genetics in Plants, Vol . 7, The Molecular Biology of Plastids. New York, Academic Press. pp5-53. 122. Palmer, J. D. (1992) Green ancestry of malarial parasites? Current Biology 2: 318-320. 123 .Palmer, J. D. (1996) Rubisco surprises in dinoflagellates. The plant cell 8: 343-345. 124. Palmer, J. D. & Delwiche, C. F. (1998) The origin and evolution of plastids and their genomes. In: Molecular systematics of plants II. (eds Soltis, DE, Soltis, PS, Doyle, JJ) Kluwer, Norwall, M A pp. 375-409. 125. Pamilo, P., Nei, N . & L i . , W.-H. (1987) Accumulation of mutations in sexual and asexual populations. Genet Res. 49: 135-146. 126. Pichersky, E. (1990) Nomad DNA—a model for movement and duplication of D N A sequences in plant genomes. Plant Mol Biol 15: 437-448. 201 127. Pichersky, E., Logsdon, J. M . Jr, McGrath, J. M . & Stasys, R. 
A (1991) Fragments of plastid D N A in the nuclear genome of tomato: prevalence, chromosomal location, and possible mechanism of integration. Mol Gen Genet 225: 453-458. 128. Philippe, H . & Laurent, J. (1999) How good are deep phylogenetic trees? Curr Opin Genet DevS: 616-623. 129. Pritchard, A. E., Venuti, S. E., Chalambor, M . A. , Sable, C. L. and Cummings, D. J. (1989) A n unusual region of Paramecium mitochondrial D N A containing chloroplast-like genes. Gene 7 8 : 121-134. BO.Pritchard, A . E., Seilhamer, J. J., Mahalingam, R., Sable, C. L. , Venuti, S. E. (1990) Nucleotide sequence of the mitochondrial genome of Paramecium. Nucleic Acids Res 1 8 : 173-80 131. Raikov, I. B. (1982) The protozoan nucleus. In: Cell Biol. Monographs Vol . 9 New York : Springer-Verlag. 132. Raven, P. H. (1970) A multiple origin for plastids and mitochondria. Science 1 6 9 : 641-646. 133. Reardon, E. M . & Price, C. A . (1995) Plastid genomes of three non-green algae are sequenced. Plant Mol Biol Reptr 1 3 : 320-326. 134. Reece, K. S., Siddall, M . E., Burreson, E. M . & Graves, J. E. (1997) Phylogenetic analysis of Perkinsus based on actin gene sequences. JParasitol 8 3 : 417-23. 135. Reith, M . (1995) Molecular biology of rhodophyte and chromophyte plastids. Ann Rev Plant Physiol Plant Mol Biol 4 6 : 549-575. 136. Reith, M . & J. Munholland (1995). Complete nucleotide sequence of the Porphyra purpurea chloroplast genome. Plant Mol Biol Reptr 1 3 : 333-345. 202 137. Ricchetti, M . , Fairhead, C. & Dujon, B. (1999) Mitochondrial D N A repairs double-strand breaks in yeast chromosomes. Nature 4 0 2 : 96-100. 138. Rizzo, P. J. (1981) Comparative aspects of basic chromatin proteins in dinoflagelaltes. Biosystems 14 : 433-443. 139. Rizzo, P. J. (1987) Biochemistry of the dinoflagellate nucleus. In: The biology of dinoflagellate (ed: F. J. R. Taylor), pp. 143-173. 140. Rochaix, J. D. (1982) Isolation of chloroplast D N A from Chlamydomonas reinhardii. 
In "Methods in Chloroplast Molecular Biology" (M. Edelman, R. B. Hallick and N. -H . Chua, eds.). Elsevier, Amsterdam, pp. 295-301. 141. Rogers, S. O., Honda, S. & Bendich, A. J. (1988) Variation in the ribosomal R N A genes among individuals of Vicia faba. Plant Mol Biol 6: 339-345. 142. Rowan, R., Whitney, S. W., Fowler, A. & Yellowlees, D. (1996) Rubisco in marine symbiotic dinoflagellates: form II enzymes in eukaryotic oxygenic phototrophs, encoded by a nuclear multi-gene family. Plant Cell 8: 539-553. 143.Sager, R. & Ishida, M . R. (1963) Chloroplast D N A in Chlamydomonas. Proc Natl Acad Sci USA 5 0 : 725-730. 144.Sambrook, J. Fritsch, E. & Maniatis, T. (1989) Molecular cloning: A laboratory manual. Cold Spring Harbor, New York. 145.Sato, S., Nakamura, Y . , Kaneko, T., Asamizu, E. & Tabata, S. (1999) Complete structure of the chloroplast genome of Arabidopsis thaliana. DNA Res. 6: 283-290. 146.Saunders, G. W., Hi l l , D. R. A. , Sexton, J. P. & Andersen, R. A . (1997) Small-subunit ribosomal R N A sequences from selected dinoflagellates: testing classical 203 evolutionary hypotheses with molecular systematic methods. Plant Syst Evol [Suppl.] 11: 237-259. 147.Schlunegger, B. & Stutz, E. (1984) The Euglena gracilis chloroplast genome: structure features of a D N A region possibly carrying the single origin of D N A replication. Curr Genet 8: 629-634. 148.Schnepf, E. & Elbrachter, M . (1988) Cryptophycean-like double membrane-bound chloroplast in the dinoflagellate, Dinophysis Ehrenb.: evolutionary, phylogenetic and toxicological implications. Botanica Acta 101: 196-203. 149.Schmitz-Linneweber, C , Alcaraz, J. P., Cottet, A . , Maier, R. M . , Herrmann, R. G. & Mache, R. (2000) Spinacia oleracea chloroplast, complete genome (direct submission). 150.Shinozaki, K. , M . Ohme, M . Tanaka, T. Wakasugi, N . Hayashida et al. (1986).The complete nucleotide sequence of the tobacco chloroplast genome: its gene organization and expression. EMBO JS: 2034-2049. 
151.Smith, S. W. Overbeek, R., Woese, C. R., Gilbert, W. & Gillevet, P. M . (1994) The genetic data environment and expandable GUI for multiple sequence analysis. Comp AppBiosci 10: 671-675. 152.Stirewalt, V . L. , Michalowski, C. B., Loffelhardt, W., Bohnert, H . J. & Bryant, D. A . (1995) Nucleotide sequences of the cyanelle genome from Cyanophora paradoxa. Plant Mol Biol Reptr 13: 327-332. 153.Stoebe, B. & Kowallik, K. V . (1999) Gene-cluster analysis in chloroplast genomics. Trends genet 15: 344-347. 204 154.Strimmer, K . & von Haeseler, A . (1996) Quartet puzzling:a quartet maximum-likelihood method for reconstructing tree topologies. Mol Biol Evol 13: 964-969. 155.Soyer, M . O. & Haapala, O. K . (1974c) Division and function of dinoflagellate chromosomes. JMicroscopie 19: 137-146. 156.Sugiura, M . (1995) The chloroplast genome. Essays Biochem 30: 49-57. 157.Sugiura, M . , Hirose, T. & Sugita, M . (1998) Evolution and mechanism of translation in chloroplasts. Annu Rev Genet 32: 437-459. 158.Sun, C. W. & Callis, J. (1993) Recent stable insertion of mitochondrial D N A into an Arabidopsis polyubiquitin gene by nonhomologous recombination. Plant Cell 5: 97-107. 159.Swofford, D. L. (1999) Phylogenetic analysis using parsimony (and other methods) (PAUP 4.0b) (test version) 160.Sze. P. (1993). Introduction-algal characteristics and diversity. A Biology of the Algae (second edition), pp.1-18. 161.Szeto, W. W., Jones, J. D. G., Zimmerman, J. L. , Mcintosh, L. & Ausubel, F. M . (1981) Hoechst 33258-cesium chloride gradient sedimentation of plant DNA. PMB Newsletter 2: 97-99. 162. Takishita, K. & Uchida, A . (1999) Molecular cloning and nucleotide sequence analysis of psbA from the dinoflagellates: origin of the dinoflagellate plastid. Phycol Res 47: 207-216. 163. Taylor, F. J. R. 1990. Phylum Dinoflagellata, In: Handbook of protoctista (Eds. L. Margulis, J. O. Corliss, M . Melkonian, D.J.Chapman), Jones and Bartlett publishers, Boston, pp. 419-437. 205 164. 
Tengs, T., Dahlberg, O. J., Shalchian-Tabrizi, K. , Klaveness, D., Rdui, K. , Delwiche, C. F. & Jakobsen, K . S. Phylogenetic analyses indicated that the 19' Hexanoyloxy-fucoxanthin-containing dinoflagellates have tertiary plastids of haptophyte origin. Mol Biol Evol 17: 718-729 (2000). 165. Tsudzuki, J., Nakashima, K. , Tsudzuki, T., Hiratsuka, J., Shibata, M . , Wakasugi, T. & Sugiura, M . (1992) Chloroplast D N A of black pine retains a residual inverted repeat lacking rRNA genes: nucleotide sequences of trnQ, trnK, psbA, trnl and trnH and the absence of rps\6. Mol Gen Genet 232: 206-214. 166. Turmel, M . , Otis, C. & Lemieux, C. (1999a) The complete chloroplast D N A sequence of the green alga Nephroselmis olivacea: insights into the architecture of ancestral chloroplast genomes. Proc Natl Acad Sci USA 96: 10248-10253. 167. Vahrenholz, C , Riemen, G., Pratje, E., Dujon, B. & Michaelis (1993) Miochondrial D N A of Chlamydomonas reinhardtii: the structure of the ends of the linear 15.8-kb genome suggests mechanism for D N A replication. Curr Genet 24: 241-247. 168. Van de Peer Y . & de Wachter, R. (1994) TREECON for windows: a software package for the construction and drawing of evolutionary trees for the Microsoft Windows environment. Comp Appl Biosci 10: 569-570. 169. Van de Peer Y . , Van de Auwera, G. & De Wachter, R. (1996) The evolution of stramenopiles and alveolates as derived by "substitution rate calibration" of small ribosomal subunit RNA. J Mol Evol 42: 201-210. 170. Vesk, M . , Dibbayawaan, T. P. & Vesk. P. A . (1996) Immunogold localization of phycoeythrin in chloroplasts of Dinophysis acuminata and D. forth (Dinophysiales, Dinophyta). Phycologia 35: 234-238. m .Watanabe, M . M . , Suda, S., Inouye, I., Sawaguchi, T. & Chihara, M . (1990) Lepidodinium viride gen sp. nov. (Gyninodiniales, Dinophyta), a green dinoflagellate with chlorophyll a- and b-containing endosymbiont. JPhycol 26: 741-751. 172. Wakasugi, T., Tsudzuki, J., Ito, S., Nakashima, K. 
, Tsudzuki, T. and Sugiura, M . (1994) Loss of all ndh genes as determined by sequencing the entire chloroplast genome of the black pine Pinus thunbergii. Proc Nati Acad Sci USA 91: 9794-9798.' 173. Wakasugi, T., Nagai, T., Kapoor, M . , Sugita, M . , Ito, M . , Ito, S., Tsudzuki, J., Nakashima, K. , Tsudzuki, T., Suzuki, Y . , Hamada, A. , Ohta, T., Inamura, A. , Yoshinaga, K. & Sugiura, M . (1997) Complete nucleotide sequence of the chloroplast genome from the green alga Chlorella vulgaris: the existence of genes possibly involved in chloroplast division. Proc Nad Acad Sci USA 94: 5967-5972. 174. Waller, R. F., Keeling, P. J., Donald, R. G., Striepen, B., Handman, E., Lang-Unnasch, N . , Cowman, A . F., Besra, G. S., Roos, D. S. & McFadden, G. I. (1998) Nuclear-encoded proteins target to the plastid in Toxoplasma gondii and Plasmodium falciparum. Proc Natl Acad Sci USA 98: 12352-12357. 175. Watanabe, N . , Nakazono, M . , Kanno, A . , Tsutsumi, N . & Hirai, A . (1994) Evolutionary variations in D N A sequences transferred from chloroplast genomes to mitochondrial genomes in the Gramineae. Curr Genet 26: 512-518. 176. Whatley, J. M . , John, P. & Whatley, F. R. (1979) From extracellular to intracellular: the establishment of mitochondria and chloroplasts. Proc Roy Soc Lond B 204: 165-187. 177. Williamson, D. H. , Gardner, M . J., Preiser, P., More, D. J., Rangachari, K . & Wilson, R. J. M . (1994) The evolutionary origin of the 35 kb circular D N A of Plasmodium falciparum: new evidence supports a possible rhodophyte ancestry. Mol Gen Genet 243: 249-252. 178. Wilson, R. J. M . , Denny, P. W., Preiser, P. R., Rangachari, K. , Roberts, K. , Roy, A. , Whyte, A. , Strath, M . , Moore, D. J., Moore, P. W. & Williamson, D. H . (1996) Complete gene map of the plastid like D N A of the malaria parasite Plasmodium falciparum. JMol Biol 261: 155-172. 179. Wolfe, K. H. , Morden, C. W. & Palmer, J. D. 
(1992) Function and evolution of a minimal plastid genome from a nonphotosynthetic parasitic plant. Proc Natl Acad Sci USA 89: 10648-10652. 180. Wu, M . , Lou, J. K. , Chang, D. Y . Chang, C. H . & Nie, Z. Q. (1986) Structure and function of a chloroplast D N A replication origin of Chlamydomonas reinhardtii. Proc Natl Acad Sci USA 83: 6761-6765. 181. Yang Z (1996) Among-site rate variation and its impact on phylogenetic analyses. Tree 11: 367-372. 182. Zhang, Z., Green, B. R. & Cavalier-Smith, T. (1999) Single gene circles in dinoflagellate chloroplast genomes. Nature 400: 155-159. 183. Zulla, S., Leang, C. S., Slighton, J. L., Hadler, H. I. & Eisenstadt, J. M . (1991) Mitochondrial D-loop sequences are integrated in the rat nuclear genome. JMol Biol 221: 1223-1235. 208 
