Open Collections

UBC Theses and Dissertations

The evolutionary implications of diplonemids and their spliceosomal introns Qian, Qing 2000

Full Text

THE EVOLUTIONARY IMPLICATIONS OF DIPLONEMIDS AND THEIR SPLICEOSOMAL INTRONS

by

QING QIAN

B.Sc., Hangzhou University, 1996

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

in

THE FACULTY OF GRADUATE STUDIES

Department of Botany

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA

September 2000

© Qing Qian, 2000

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of ________
The University of British Columbia
Vancouver, Canada

Abstract

The phylum Euglenozoa consists of three main groups: euglenoids, kinetoplastids and diplonemids (Simpson 1997). This phylum is unique in having three types of introns: nuclear trans-spliced "introns", conventional nuclear introns and "aberrant" introns. To determine the evolutionary history of the introns in this phylum, it is important to know both the general distribution of intron types within the phylum and its likely phylogeny. The nuclear genomes of euglenoids are known to contain all three types of introns, while only trans-spliced and conventional introns have been found in kinetoplastids. However, nothing is known about diplonemid introns, and the phylogenetic placement of diplonemids within the Euglenozoa is uncertain.
Therefore, I looked for nuclear introns in diplonemids by sequencing four nuclear protein-coding genes (actin, alpha-tubulin, beta-tubulin and GAPDH) from different diplonemids. I found 11 introns in nine of the twenty-nine newly obtained diplonemid nuclear protein-coding genes. They all have conventional 5'-GT-AG-3' splice sites, but differ from well-studied conventional eukaryotic introns (mammalian introns) in several details. I added these nuclear encoded diplonemid sequences to the tubulin, actin and GAPDH alignments and then constructed global phylogenetic trees based on these protein alignments. The tubulin trees and the actin tree disagree on whether the diplonemids are closer to kinetoplastids (tubulin trees) or to euglenoids (actin tree). Taking these results together, I postulate that the GT-AG conventional introns were present in the euglenozoan ancestor and were largely lost in kinetoplastids and euglenoids. The "aberrant" intron is very likely a derived character restricted to euglenoids. The trans-spliced discontinuous "intron" is an ancestral character of this phylum, and it is highly likely that it will be found in diplonemids as well.

The phylogenetic position of the four newly sequenced diplonemid GAPDH sequences was unexpected. None of the four diplonemid GAPDH sequences branches with those of other euglenozoa. Instead, three of the four branch with the gap3 of cyanobacteria with 100% bootstrap support, indicating a lateral gene transfer from bacteria to eukaryotes, and one branches in an uncertain position among other eukaryotic GAPDH sequences.
TABLE OF CONTENTS

Abstract
List of Tables
List of Figures
Acknowledgements

CHAPTER I  Introduction
1.1  The phylum Euglenozoa and its phylogeny
1.2  Intron types in the Euglenozoa
1.3  Diplonemid GAPDH

CHAPTER II  Materials and Methods
2.1  Strains and culture conditions
2.2  DNA extraction procedures
2.3  PCR conditions
2.4  Cloning of amplified fragments
2.5  DNA sequencing
2.6  Sequence alignment and phylogenetic analyses

CHAPTER III  Results
3.1  Sequences for nuclear encoded genes from diplonemids
3.2  Diplonemid introns
3.3  Phylogeny of the Euglenozoa
3.4  Lateral gene transfer indicated by GAPDH phylogeny

CHAPTER IV  Discussion
4.1  Phylogeny of the Euglenozoa
4.2  Possible origins of the intron-types in the Euglenozoa
4.3  Features of diplonemid introns
4.4  Evolutionary origin of diplonemid GAPDH

References
Addendum

List of Tables

1  Twenty-nine nuclear encoded genes from nine diplonemids
2  K-H test of the positions of diplonemids within Euglenozoa in the actin tree
3  K-H test of the positions of diplonemids within Euglenozoa in the alpha-tubulin tree
4  K-H test of the positions of diplonemids within Euglenozoa in the beta-tubulin tree

List of Figures

1  The two-step cis-splicing
2  The two-step trans-splicing
3  The secondary stem-loop structure of an intron in the rbcS gene from Euglena gracilis
4  An alignment of four amino-acid sequences of actin
5  An alignment of twelve amino-acid sequences of alpha-tubulin
6  An alignment of fifteen amino-acid sequences of beta-tubulin
7  An alignment of six amino-acid sequences of GAPDH
8  Alignment of eleven diplonemid introns from 5'-end to 3'-end
9  Neighbor-joining tree based on actin protein sequences of various eukaryotes
10  Neighbor-joining tree based on alpha-tubulin protein sequences of various eukaryotes
11  Neighbor-joining tree based on beta-tubulin protein sequences of various eukaryotes
12  Phylogeny of diverse eukaryotes and prokaryotes based on GAPDH protein sequences
13  GAPDH phylogeny of protein sequences of prokaryotes and some eukaryotes
14  GAPDH phylogeny of protein sequences of eukaryotes and some bacteria
15  Three possible topologies for the internal phylogeny of the Euglenozoa

Acknowledgements

I would like to thank everyone who inspired and helped me to produce this thesis. On the inspiration side there stand, first, both of my supervisors, Dr. Tom Cavalier-Smith and Dr. Patrick Keeling. I am very grateful to Dr. Tom Cavalier-Smith for his research funding that allowed me to pursue this project, for his high critical standards, and also for his insightful guidance about phylogeny and evolution. After Tom left for a faculty position at Oxford University in England, Dr. Patrick Keeling became my main supervisor. I enjoyed working in his lab and trying new techniques.
I thank Dr. Patrick Keeling for his kindness in sharing his knowledge with me, especially regarding the very complicated phylogenetic analysis and the GAPDH phylogeny, and also for his generosity in providing me with almost all the protein alignments and degenerate primers utilized in this project. My committee members, Dr. Carl Douglas and Dr. Martin Adamson, kindly gave me comments and suggestions during the progress of my project. On the help side, I would first like to thank Dr. Naomi Fast for her careful reading of the manuscripts of my thesis and her help in preparing the presentation of my project. I would also like to thank Dr. Ken Ishida and Juan Saldarriaga, who gave me valuable suggestions on my thesis. During the writing and re-writing process, I learned a lot about presenting my thoughts more strongly and precisely. I owe special thanks to Ema Chao for generously providing me with the genomic DNAs from several diplonemids, and to Dr. Alexandra Marinets for showing me the basic molecular lab techniques in the beginning. I couldn't possibly have finished my thesis without the support of my dear parents and the various help of my friends, especially my roommate Tanya Hooker. Finally, my thanks go to my friend Jens Happe, for his persistent encouragement over the last two years.

CHAPTER I: Introduction

1.1 The phylum Euglenozoa and its phylogeny

Cavalier-Smith (1981) first formally established the phylum Euglenozoa by grouping kinetoplastids and euglenoids together based on a list of shared characteristics, including mitochondria with discoid cristae, paraxial rods, non-tubular mastigonemes (flagellar hairs) and closed mitosis with an endonuclear spindle (Cavalier-Smith 1981). The first electron microscopic observations of diplonemids (Triemer et al. 1990) suggested the addition of diplonemids to the phylum Euglenozoa.
Simpson (1997) further proposed two potential synapomorphies uniting the phylum Euglenozoa: flagellar root pattern and paraxial rod substructure. The unique pattern of flagellar root organization in the Euglenozoa is a system of three microtubular roots: two roots closely associated with the outside of each basal body and one originating between the basal bodies. In addition, the paraxial rods of kinetoplastids, euglenoids and diplonemids share a distinctive substructure: the paraxial rod of the dorsal/anterior flagellum has a cylindrical cross-sectional appearance, while the rod in the ventral/recurrent flagellum is squarer in cross-section, with a three-dimensional latticework substructure. Another new addition to the phylum Euglenozoa is Postgaardi mariagerensis, a recently described organism that is covered by rod-shaped bacteria and has two thickened flagella inserting into an anterior pocket. A recent ultrastructural study of Postgaardi mariagerensis (Simpson et al. 1996/97) made a strong case for its inclusion within the Euglenozoa because it also shares the two major synapomorphies proposed by Simpson for the Euglenozoa (Simpson 1997).

While the Euglenozoa share several synapomorphies, each euglenozoan group also displays distinct features of its own. The euglenoids are identified by the presence of a pellicle, a system of glycoprotein strips that lies under the plasma membrane and is supported by sub-pellicular microtubules (Triemer et al. 1991b). This group includes both photosynthetic euglenoids (e.g. Euglena) and non-photosynthetic euglenoids (e.g. Entosiphon). The photosynthetic euglenoids have attracted the attention of many researchers because of their intriguing chloroplasts, which are surrounded by three membranes instead of two.
It is now clear that their chloroplasts are of secondary endosymbiotic origin, which means that a colourless euglenoid acquired its chloroplast by engulfing a green algal cell (Gibbs 1978).

The kinetoplastids are the euglenozoans that harbor one or more kinetoplasts (DNA-rich bodies) in their mitochondria (Lee et al. 1985; Opperdoes 1987). This group includes major disease-causing genera. For example, the genera Trypanosoma and Leishmania include serious human pathogens that cause African 'sleeping sickness', South American Chagas disease, and leishmaniasis in tropical and subtropical areas (Lee et al. 1985; Opperdoes 1987). This group also includes free-living flagellates such as Bodo (Lee et al. 1985).

Diplonemids are represented by only two genera, Diplonema and Rhynchopus, which are grouped on the basis of their very similar ultrastructural organization (Schnepf et al. 1994). They have neither kinetoplasts nor a pellicle of glycoprotein strips, but do possess a distinctive feeding apparatus composed of vanes with fuzzy coats, as well as giant, flat mitochondrial cristae (Triemer et al. 1990; Triemer et al. 1991a; Triemer et al. 1991b; Simpson 1997). Diplonemids do not have chloroplasts, are not human pathogens, and live in either fresh-water or marine environments (Schnepf et al. 1994; Triemer et al. 1990).

Postgaardi mariagerensis lacks a euglenoid pellicle and possesses mitochondria without kinetoplasts or cristae. So, although it is part of the phylum Euglenozoa, it is neither a euglenoid nor a kinetoplastid. As far as its feeding apparatus is concerned, P. mariagerensis has no vanes or supporting rods, but only the MTR (a complex of reinforcing microtubules) to support its feeding apparatus. This distinction indicates that Postgaardi mariagerensis is not a diplonemid either (Simpson et al. 1996/97; Simpson 1997).

In short, there are many data based on light and electron microscopy that distinguish among the four groups of the Euglenozoa.
Data are particularly abundant for euglenoids and kinetoplastids. Although these structural characters are helpful in revealing phenotypic similarities, they are not as helpful for inferring phylogeny. Molecular sequences, on the other hand, are much more suitable for the latter task. However, they are not available at all from Postgaardi mariagerensis and are extremely limited for diplonemids compared to euglenoids and kinetoplastids. In fact, no nuclear protein-coding gene has been characterized from diplonemids so far. The only molecular sequences available at the onset of this study were the small subunit ribosomal RNA (SSU rRNA) genes from two diplonemids (Diplonema papillatum and Diplonema sp.) and a partial sequence of the mitochondrial gene for cytochrome c oxidase subunit I (Cox I protein) from one diplonemid (Diplonema papillatum) (Maslov et al. 1999). Maslov et al. (1999) performed a molecular phylogenetic study using these sequences in order to analyze the phylogenetic position of diplonemids within the phylum Euglenozoa. In their phylogenetic analyses, Diplonema was shown to be a sister-group of either kinetoplastids (in trees inferred with the maximum-likelihood method) or euglenoids (in trees inferred with the parsimony and distance methods). In either case, however, the affinity was not well supported by bootstrap analysis, and the differences between the best tree and the alternative trees were not significant. It remains unclear how diplonemids are related to euglenoids and kinetoplastids. In molecular trees, this may be due to two weaknesses in the phylogenetic analyses conducted by Maslov et al. (1999):
1) The very small sample size. All the SSU gene trees were based on only seventeen taxa, including two diplonemid sequences, and all the Cox I protein trees were based on only seven taxa, including one diplonemid sequence.
2) The mitochondrial Cox I protein phylogeny can be unreliable due to the fast evolution of euglenozoan mitochondrial genes.

In this study, I have characterized diplonemid nuclear encoded genes for actin, alpha-tubulin, beta-tubulin and GAPDH, in order to construct novel phylogenetic trees and try to resolve the phylogenetic position of diplonemids within the Euglenozoa.

1.2 Intron types in the Euglenozoa

Three types of introns in euglenoids and/or kinetoplastids

Three types of introns occur in euglenoids and/or kinetoplastids: conventional 'GT-AG' spliceosomal introns, trans-spliced or discontinuous 'introns', and "aberrant" introns. I will describe each of the three types and their distribution within the Euglenozoa in the following three sections.

GT-AG spliceosomal introns

GT-AG spliceosomal introns are abundant in higher eukaryotes. Genes in most eukaryotes are transcribed into pre-mRNAs that include introns. Only when all the introns in the pre-mRNAs are excised will the mature mRNAs be transported from the nucleus to the cytosol, where translation takes place. The precise removal of the introns from the primary RNA transcripts is a critical step in gene expression in all eukaryotic cells. In general, it is a two-step catalytic process aided by a group of small nuclear ribonucleoprotein particles (snRNPs) together called the spliceosome. The spliceosome is mainly composed of five snRNPs (U1, U2, U5 and U4/U6), and assembles on the precursor messenger RNA through RNA-RNA, RNA-protein and protein-protein interactions. The first step in cis-splicing is the cleavage of the 5' splice site by the formation of a 2'-5' phosphodiester bond between an adenosine within the intron and the guanosine residue at the 5' end of the intron. This generates a free 5' exon and an intermediate RNA in a lariat structure.
The second step involves the cleavage of the 3' splice site, the ligation of the 5' exon and the 3' exon, and the release of the intron in a lariat structure (Fig. 1). Since the introns are removed before expression of the gene, most intron sequences accumulate mutations during evolution more rapidly than the flanking exons. The only highly conserved sequences within the intron are those required for intron removal or for recognition during formation of the spliceosome. In particular, the 'GT' at the 5' end and the 'AG' at the 3' end of an intron are almost invariant. Mutational studies have shown that disrupting either the GT at the 5' splice site or the AG at the 3' splice site can block or reduce the rate of both steps of cis-splicing (Sharp 1987). After the consensus 5' and 3' splice junction sequences, the next most conserved regions are the branchpoint region and the region between the branchpoint and the 3' splice site (Sharp 1987; Umen et al. 1995). The branchpoint site is where the lariat intermediate forms after the first step of cis-splicing. During this first step, the 2' hydroxyl of an adenosine in the branchpoint site attacks the phosphodiester bond between the guanosine at the 5' terminus of the intron and the adjacent exon nucleotide. This leads to the release of the 5' exon and the formation of a 2'-5' phosphodiester bond between the branchpoint adenosine and the guanosine at the 5' terminus of the intron. In yeast, the branchpoint region is strictly maintained. It has the consensus sequence 5'-UACUAACA-3' (the adenosine participating in the formation of the 2'-5' phosphodiester bond is the last adenosine of the UACUAAC motif). In mammals, the branchpoint region is less conserved, but the region between the branchpoint and the 3' splice site is a conserved polypyrimidine tract, and is one of the essential recognition sites for the binding of splicing factors (Sharp 1987; Tazi et al. 1986; Gerker et al. 1986; Umen et al. 1995).

Fig. 1 The two-step cis-splicing. Filled square represents 5' exon and open square represents 3' exon. Intron is represented by black line. The consensus dinucleotides at either end of the intron are marked as GU and AG. The branchpoint adenosine is marked as A. The dashed line between G and A represents the 2'-5' phosphodiester bond formed after the first step of the splicing. See text for detailed description.

Conventional GT-AG spliceosomal introns have been found in both green and colourless euglenoids, although they are rare in both. In Euglena, only three GT-AG introns have been found so far, all in the fibrillarin gene of Euglena gracilis (Breckenridge et al. 1999). In the colourless euglenoid Entosiphon sulcatum, one spliceosomal intron has been found in a beta-tubulin gene (Ebel et al. 1999). In kinetoplastids, only two GT-AG cis-spliced introns have been discovered, very recently, in the poly(A) polymerase (PAP) genes of Trypanosoma brucei and Trypanosoma cruzi (Mair et al. 2000).

Trans-spliced discontinuous introns

Trans-splicing is also a post-transcriptional RNA-splicing process. The distinguishing difference between cis- and trans-splicing is that, in trans-splicing, the two exons flanking the discontinuous intron are on two different pre-messenger RNA molecules (Agabian 1990; Blumenthal et al. 1988; Nilsen 1995) (Fig. 2). However, trans-splicing is not an entirely novel RNA-splicing process; it is regarded as the splicing of a discontinuous GT-AG spliceosomal intron. Trans-splicing is similar to GT-AG spliceosomal cis-splicing in three fundamental ways. First, the discontinuous 'intron' also has consensus GT and AG dinucleotide sequences at either end.
Second, the chemistry of trans-splicing involves two transesterification reactions, with the discontinuous 'intron' forming a Y-branched intermediate that is structurally analogous to the cis-splicing lariat (Blumenthal et al. 1988; Agabian 1990; Nilsen 1995) (Fig. 2). Third, cis- and trans-splicing share at least three small nuclear ribonucleoprotein particles: the U2, U4 and U6 snRNPs (Agabian 1990; Nilsen 1995). Around eighty percent of the mRNAs are trans-spliced in both the green euglenoid Euglena gracilis (Tessier et al. 1991) and the colourless euglenoid Entosiphon sulcatum (Ebel et al. 1999), whereas in kinetoplastids all known mRNAs are trans-spliced before they are translated into proteins (Agabian 1990; Nilsen 1994; Laird 1989). Outside kinetoplastids and euglenoids, trans-splicing has been reported only in metazoan worms, such as nematodes (e.g. Caenorhabditis elegans) (Blumenthal et al. 1988; Agabian 1990; Nilsen 1989; Nilsen 1994) and flatworms (e.g. trematodes) (Davis 1997; Nilsen 1995), but the process certainly evolved independently in these animals and in euglenozoa.

Fig. 2 The two-step trans-splicing. The filled square represents the spliced leader RNA and the open square represents the recipient exon RNA. The discontinuous "intron" is represented by black line. The consensus dinucleotides at either end of the discontinuous "intron" are marked as GU and AG. See text for detailed description.

"Aberrant" introns

A third type of intron, here simply called "aberrant" introns, has also been found in the genome of Euglena gracilis. In general, these introns have three distinctive features (Tessier et al. 1992; Muchhal et al. 1994; Henze et al. 1995).
First, these introns do not have any consensus sequences at their borders. Second, they form an unusually stable stem-loop secondary structure in the pre-mRNA (Fig. 3) (Tessier et al. 1992; Muchhal et al. 1994), and further secondary structures (stem-loop structures) are observed in the "aberrant" introns of the cytosolic GAPDH gene of Euglena gracilis (Henze et al. 1995). Third, they are usually flanked by short (2- to 4-bp) repeats (Tessier et al. 1992; Muchhal et al. 1994; Henze et al. 1995). "Aberrant" introns have only been reported from Euglena gracilis. They are found in three nuclear encoded genes: 14 introns in the gene for the light-harvesting chlorophyll a/b binding proteins of photosystem II (LHCPII) (Muchhal et al. 1994); 12 introns in the rbcS genes, which encode the small subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase (Tessier et al. 1992); and four introns in the gene for cytosolic, glycolytic glyceraldehyde-3-phosphate dehydrogenase (GAPDH) (Henze et al. 1995).

Fig. 3 The secondary stem-loop structure of an intron in the rbcS gene from Euglena gracilis (Tessier et al. 1992). The arrows point at the two cleavage sites of the intron. This intron does not have consensus dinucleotides (GT-AG) at either end. Two stretches of nucleotides at the 5' and 3' ends of the intron can base-pair to each other, usually with several nucleotides at the 3' end of the intron displaced by two adjacent nucleotides from the 3' exon.

Introns in diplonemids?

Nothing is known about the intron distribution in diplonemids, the third major group of the Euglenozoa, because no nuclear protein-coding genes from any diplonemid have ever been sequenced, and their relationship to other Euglenozoa is unclear.
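As a hedged illustration of the border criterion that separates conventional from "aberrant" introns in this section, the sketch below tests whether a candidate intron begins with GT and ends with AG. The function name and the toy sequences are hypothetical and are not taken from the thesis data; GT-AG ends are a necessary feature of a conventional spliceosomal intron, not proof of one.

```python
def has_gt_ag_boundaries(intron: str) -> bool:
    """Return True if the sequence starts with GT (GU in RNA) and ends with AG."""
    seq = intron.strip().upper().replace("U", "T")  # accept DNA or RNA input
    # The two dinucleotides must not overlap, so require at least four bases.
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


# Hypothetical toy sequences, for illustration only.
candidates = {
    "conventional_like": "GTAAGTTCTTTACTAACTTTTTTCCTTAG",
    "aberrant_like": "ACCTTGGATTCCAAGGTC",
}

for name, seq in candidates.items():
    print(name, has_gt_ag_boundaries(seq))
```

A check of this kind only screens candidates; distinguishing intron types in practice also depends on comparing genomic and cDNA sequences and on features such as the branchpoint region.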
In order to determine the origins of nuclear cis-splicing, trans-splicing and the "aberrant" introns in the Euglenozoa, we need to know the distribution of these characters in all three main groups, and the internal phylogeny of the phylum. Diplonemids, as the third and most poorly studied major group of this phylum, are an important part of this puzzle: the lack of known protein-coding gene sequences from diplonemids is a gap in our understanding of intron distribution, and it hinders phylogenetic analyses aimed at determining the branching order within the Euglenozoa.

The objective of the first part of this thesis, therefore, is to determine the evolutionary history of the intron types in the Euglenozoa. To achieve this goal, I sequenced several nuclear protein-coding genes from diplonemids. By so doing, I sought to determine whether these nuclear encoded genes contain introns and, if they do, what kind of introns they possess. I also added my new diplonemid protein sequences to protein alignments and constructed phylogenetic trees, hoping to resolve the internal branching order of the three major groups of the phylum Euglenozoa. Then, based on the possible internal phylogeny, I attempt to infer the origins of the three intron types within the phylum Euglenozoa.

The nuclear encoded genes chosen for this study were actin, alpha-tubulin and beta-tubulin. These proteins are basic components of the cytoskeleton and are universally present in eukaryotic cells. These genes are good candidates because they have been widely used as phylogenetic markers, and sequences are available from many phylogenetically distinct groups, including both euglenoids and kinetoplastids. In addition, these nuclear encoded genes often contain one or more introns in higher eukaryotes.

1.3 Diplonemid GAPDH

The second part of my thesis focuses on a phylogenetic analysis of glyceraldehyde-3-phosphate dehydrogenase (GAPDH). GAPDH was also chosen for analysis because it is well sampled and has a well-known intron distribution among extant eukaryotes. GAPDH is a central enzyme of carbon metabolism. The phylogeny of GAPDH is complex, resulting from a complicated evolutionary history that includes gene duplications, endosymbiotic gene replacements and lateral gene transfers (Martin et al. 1993; Henze et al. 1995; Liaud et al. 1997). In the global GAPDH trees constructed previously, GAPDH sequences can always be divided into two distinct clades, GapC and GapA/B. The GapC clade mainly represents the cytosolic GAPDH enzymes of eukaryotes. The GapC enzymes are generally involved in glycolysis in the cytosol. The reaction catalysed by the GapC enzyme is catabolic, and this enzyme is NAD-specific. However, the gap1 genes of proteobacteria and cyanobacteria are also included in the GapC clade. The reason for this is unclear so far. Conversely, the GapA/B clade mainly represents the GAPDH enzymes of bacteria. One exception to the eubacterial nature of the GapA/B clade is the inclusion of GAPDH from the eukaryotic phylum Parabasalia. Markos et al. (1993) and Viscogliosi et al. (1998) suggested that the close association of the GAPDH sequences from parabasalids with bacterial GAPDH sequences indicated a bacterial origin of the GAPDH genes in Parabasalia, most likely through a lateral gene transfer from a bacterium to the ancestor of this phylum. In addition, the GapA/B clade also includes the nuclear-encoded, chloroplast-targeted GAPDH sequences from photosynthetic eukaryotes, which are also bacterial in nature because of the cyanobacterial origin of the chloroplast.
Indeed, in the GapA/B clade of global GAPDH trees, the nuclear-encoded, plastid-targeted GAPDH genes of photosynthetic eukaryotes are always closely related to the gap2 of cyanobacteria (considered to be the free-living relatives of chloroplasts) (Martin et al. 1993; Henze et al. 1995; Liaud et al. 1997). The chloroplast GapA/B enzyme is involved in the Calvin cycle. The reaction catalysed by this enzyme is anabolic, and the substrate of this enzyme can be either NAD or NADP. Clermont et al. (1993) demonstrated that the amino acid at position 32 of GAPDH plays an essential role in determining the relative specificity for NAD or NADP as its substrate. In most catabolic, NAD-specific cytosolic GAPDH enzymes this position is aspartic acid (D), whereas in the anabolic GAPDH enzymes that accept both NAD and NADP this position is occupied by a non-acidic amino acid, for instance alanine (A) in the chloroplast-targeted GAPDH of Euglena gracilis. This is because there is an electrostatic repulsion between the negatively charged carboxyl group of an acidic amino acid (Asp32) and the negatively charged 2'-phosphate of NADP.

The phylogeny of GAPDH in the phylum Euglenozoa is very complicated. It has been shown that there are two distantly related GAPDH genes in both euglenoids (chloroplast and cytosolic GAPDH genes) (Martin et al. 1993; Henze et al. 1995) and kinetoplastids (glycosomal and cytosolic GAPDH genes) (Michels et al. 1991; Michels et al. 1992). The chloroplast GAPDH gene of Euglena gracilis is closely related to those of higher photosynthetic eukaryotes and to the gap2 of the cyanobacteria that gave rise to chloroplast-targeted GAPDH genes in higher photosynthetic eukaryotes (Martin et al. 1993; Henze et al. 1995). The cytosolic GapC of Euglena gracilis has been shown to be closely related to the glycosomal GAPDH genes of kinetoplastids.
Glycosomes are unique microbodies o f kinetoplastids, which harbor most enzymes o f the glycolytic pathway and are often thought to be o f endosymbiotic origin (Opperdoes 1987; Borst et al. 1989; Opperdoes et al. 1989; Michels et al. 1994). This Euglena cytosol/kinetoplastids glycosomes clade branches basally to the GapC clade (Henze et al. 1995; Liaud et al. 1997). Most G A P D H in kinetoplastids is found in glycosomes (Opperdoes 1987; Borst et al. 1989; Opperdoes et al. 1989; Michels et al. 1991; Michels et al. 1992). However, Trypanosoma brucei and Leishmania mexicana possess a second, distinct cytosolic G A P D H enzyme in addition to the glycosomal form (Michels et al. 1991; Michels et al. 1992). These two cytosolic G A P D H enzymes are extraordinarily closely related to E.coli gapl (=gapA) (Michels et al. 1991, Henze et al. 1995). Michels et al. (1992) have further proved that more distantly related kinetoplastids, such as a bodonid Trypanoplasma borelli, only have the typical glycosomal G A P D H enzyme. This strongly supports the speculation that the Euglena cytosol/kinetoplastid glycosome clade represents the original G A P D H form to the phylum Euglenozoa, and a horizontal gene transfer, perhaps from a y-purple bacterium related to E. coli, resulted i n the cytosolic G A P D H in Trypanosoma and Leishmania after their ancestor diverged from the Bodonids (Michels et al. 1992; Michels et al. 1994; Henze et al. 1995; Liaud et al. 1997). H o w the G A P D H sequences o f diplonemids fit into this picture is entirely unknown. Outstanding questions include: how many types o f G A P D H genes are there i n diplonemids? Where, in the global G A P D H tree, are they going to branch? Since it is generally thought that the Euglena cytosolAinetoplastid glycosome clade represents the original G A P D H form 14  to the phylum Euglenozoa (Michels et al. 1992; Michels et al. 1994; Henze et al. 1995; Liaud et al. 
1997), the positions of the GAPDH sequences from diplonemids in the GAPDH tree may either confirm this speculation or possibly reveal new relationships between diplonemid GAPDH sequences and those of eukaryotes or prokaryotes.

CHAPTER II: Materials and Methods

2.1  Strains and culture conditions

Axenic cultures of Diplonema ambulator (ATCC 50223), Diplonema papillatum (ATCC 50162), Diplonema sp. 3 (new strain) (ATCC 50225), Diplonema sp. 4 (ATCC 50232) and Rhynchopus sp. 3 (ATCC 50231) were obtained from the ATCC (American Type Culture Collection). Cultures were maintained in four 150 x 15 mm sterilized, disposable plastic petri dishes (FISHER) in ATCC Culture Medium 1728, enriched Isonema medium (ATCC 1405 HESNW Medium) with 10% heat-inactivated horse serum (Sigma Cat. # H1270) added aseptically just before use. (Detailed recipes are given at the end of this section.) Cultures were incubated at room temperature. After significant growth was observed by light microscopy, cells were harvested by centrifugation at 2000 x g, 4°C, for 10 minutes. Genomic DNA of Diplonema sp. 2 (ATCC 50224), Diplonema sp. 3 (ATCC 50231), Diplonema sp. 4 (ATCC 50232), Rhynchopus sp. 1 (ATCC 50226), and Rhynchopus sp. 2 (ATCC 50229) was provided by Ema Chao.

ATCC Medium 1405:
  Natural seawater                      1.0 L
  Enrichment Solution (see below)      10.0 ml
  Vitamin Solution (see below)          1.0 ml
Two-month-old seawater was filter-sterilized and all components were combined aseptically.

Enrichment Solution:
  EDTA·2H2O                     0.553 g
  NaNO3                         4.667 g
  Na2SiO3·9H2O                  3.000 g
  Sodium glycerophosphate       0.667 g
  H3BO3                         0.380 g
  Fe(NH4)2(SO4)2·6H2O           0.234 g
  FeCl3·6H2O                    0.016 g
  MnSO4·4H2O                    0.054 g
  ZnSO4·7H2O                    7.3 mg
  CoSO4·7H2O                    1.6 mg
  Distilled water               1.0 L
Na2SiO3 was neutralized with 1 N HCl. All ingredients were combined in the order listed. This solution was filter-sterilized.
Vitamin Solution:
  Thiamine            0.1 g
  Vitamin B12         2.0 mg
  Biotin              1.0 mg
  Distilled water     1.0 L
This solution was filter-sterilized.

2.2  DNA extraction procedures

Cell pellets of diplonemids were resuspended in 1.5 ml of CTAB solution (4% (w/v) CTAB (hexadecyltrimethylammonium bromide, SIGMA H-5882), 100 mM MES (SIGMA M-8250), 1.4 M NaCl and 1% 2-mercaptoethanol) pre-heated to 65°C. The mixture was incubated at 65°C for 30 minutes to allow for digestion and lysis. DNA was then gently extracted (to avoid extensive shearing) from the mixture with an equal volume of chloroform/isoamyl alcohol (24:1). DNA was precipitated from the aqueous phase by adding 2/3 volume of isopropyl alcohol and incubating overnight at 4°C. DNA was collected the following day by successive centrifugation of 1.5 ml aliquots in the same 1.5 ml tube at maximum speed (usually 12500 rpm) for 2.5 minutes. The DNA pellet was washed twice with 95% ethanol and twice with 70% ethanol to remove salt before the pellet was air-dried and resuspended in TE (10/1 Tris/EDTA, pH 8.0).

2.3  PCR conditions

Degenerate PCR primers for alpha-tubulin, beta-tubulin, actin and glyceraldehyde-3-phosphate dehydrogenase (GAPDH) were designed by Dr. Patrick Keeling based on the conserved amino acid sequences at the extreme amino- and carboxyl-termini of the corresponding protein. The conserved regions at the N- and C-termini of the four proteins used for primer design are given below:

Alpha-tubulin gene,  N-terminus  5'-QVGNAGWE-3'
                     C-terminus  5'-WYVGEGM-3'
Beta-tubulin gene,   N-terminus  5'-GQCGNQ-3'
                     C-terminus  5'-MDEMEFT-3'
Actin gene,          N-terminus  5'-EKMTQLMFE-3'
                     C-terminus  5'-VHRKCF-3'
GAPDH gene,          N-terminus  5'-KVGINGFG-3'
                     C-terminus  5'-WYDNEWGYS-3'
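The step from a conserved peptide to a degenerate primer can be illustrated with a minimal sketch. It assumes the standard genetic code and the usual IUPAC ambiguity symbols; the function names `back_translate` and `degeneracy` are hypothetical, and this is not the procedure actually used to design the primers in this study.

```python
from math import prod

# IUPAC-degenerate codon for each amino acid under the standard genetic code.
# Assumption: six-codon amino acids (L, R, S) use the common over-inclusive
# consensus codons, which also match a few codons of other amino acids.
DEGENERATE_CODON = {
    "A": "GCN", "C": "TGY", "D": "GAY", "E": "GAR", "F": "TTY",
    "G": "GGN", "H": "CAY", "I": "ATH", "K": "AAR", "L": "YTN",
    "M": "ATG", "N": "AAY", "P": "CCN", "Q": "CAR", "R": "MGN",
    "S": "WSN", "T": "ACN", "V": "GTN", "W": "TGG", "Y": "TAY",
}

# Number of concrete bases represented by each IUPAC symbol.
IUPAC_SIZE = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "M": 2, "K": 2, "S": 2, "W": 2,
    "H": 3, "B": 3, "V": 3, "D": 3,
    "N": 4,
}


def back_translate(peptide: str) -> str:
    """Return the fully degenerate DNA sequence encoding a peptide."""
    return "".join(DEGENERATE_CODON[aa] for aa in peptide.upper())


def degeneracy(oligo: str) -> int:
    """Count how many distinct sequences a degenerate oligonucleotide covers."""
    return prod(IUPAC_SIZE[base] for base in oligo.upper())


if __name__ == "__main__":
    core = back_translate("QVGNAGWE")  # alpha-tubulin N-terminal region
    print(core, degeneracy(core))
```

Run as a script, this prints `CARGTNGGNAAYGCNGGNTGGGAR 2048`. Compared with this fully degenerate back-translation, the TUBA1 primer as listed carries GGY rather than GGN at the second glycine codon, omits the final wobble base, and has extra bases at its 5' end.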
The degenerate primer pairs used for these four nuclear encoded genes were:

Alpha-tubulin gene,
  TUBA1   5'-TCC GAA TTC ARG TNG GNA AYG CNG GYT GGG A-3'
  TUBA2   5'-CGC GCC ATN CCY TCN CCN ACR TAC CA-3'
Beta-tubulin gene,
  TUBB1   5'-GCC TGC AGG NCA RTG YGG NAA YCA-3'
  TUBB2   5'-TCC TCG AGT RAA YTC CAT YTC RTC CAT-3'
Actin gene,
  actF2   5'-GAG AAG ATG CAN CAR ATH ATG TTY GA-3'
  actR1   5'-GGC CTG GAA RCA YTT NCG RTG NAC-3'
GAPDH gene,
  gap1F   5'-CCA AGG TCG GNA THA AYG GNT TYG G-3'
  gap1R   5'-CGA GTA GCC CCA YTC RTT RTC RTA CCA-3'

Amplification of DNA was carried out using standard methods. Typically, 250 ng of diplonemid genomic DNA was used as a template in 50 µl reactions, with each primer at 10 µM, 0.25 units of Taq polymerase, 2.5 mM dNTPs and reaction buffer (Gibco BRL). Cycle parameters were 94°C/30 sec and pause 2.15 min (1x); 94°C/30 sec, 50°C/30 sec, 72°C/2 min (30x); and 72°C/5 min (1x).

2.4  Cloning of amplified fragments

An aliquot of 5 µl of each 50 µl PCR reaction was run on an agarose gel (0.7-0.8% agarose), together with the DNA molecular weight marker (1 Kb DNA ladder), to check the size of the product. If the product had the expected size, the remaining portion of the reaction was run on another agarose gel (0.7-0.8% agarose), and the fragment of interest was isolated from the gel using either the Prep-a-gene kit (BIO-RAD) or the GeneClean II kit (BIO 101 BIO/CAN SCIENTIFIC). Isolated fragments were ligated into the pCR 2.1-TOPO T-tailed vector as specified by the manufacturer's protocol (Invitrogen). After 5 minutes of incubation at room temperature, 'One Shot Competent TOPO 10' Escherichia coli cells were transformed with the ligated plasmids following the manufacturer's protocol (Invitrogen).
The cells were plated on selective LB medium containing 50 µg/ml ampicillin and 40 µl of 40 mg/ml X-gal and incubated overnight at 37°C. The presence of X-gal allowed for 'blue-white screening', where colonies containing vectors with inserts appear white, whereas colonies containing 'empty vectors' appear blue. In order to determine the sizes of the inserts within the plasmids, 6-10 or more white colonies per cloning reaction were chosen and screened either by restriction analysis (digestion with EcoRI) or by PCR amplification of the inserts using the M13 Forward (-20) and M13 Reverse primers. On average, six white colonies per cloning reaction, each containing a plasmid with an insert of the expected size, were cultured overnight in individual tubes containing liquid LB medium with 50 µg/ml ampicillin. Plasmid DNA with the expected size of PCR insert was isolated using either the standard alkaline lysis (miniprep) method (Sambrook et al. 1989) or the Perfect prep Plasmid DNA Kit following the manufacturer's protocol (Eppendorf).

2.5  DNA sequencing

Automated sequencing using the dideoxy method was employed to obtain the sequences of cloned PCR products. The full-length sequences of genes were obtained using a primer-walking strategy for alpha- and beta-tubulin genes. (Walking primer sequences are given below.) Regions sequenced only on a single strand were confirmed by two independent sequences.

The forward strands of the alpha-tubulin genes of Diplonema sp. 2 and Diplonema sp. 3 were sequenced using the following oligonucleotide primer:
  TUAA32   GCG GCG AAC AAC TAC GC
The reverse strands of the alpha-tubulin genes of Diplonema sp. 3, Diplonema sp. 4, Rhynchopus sp. 1 and Rhynchopus sp. 2 were sequenced using the following primer:
  TUAA41   GGC AGC ACG CCA TGT AC
The forward strand of the beta-tubulin gene of Rhynchopus sp.
1 was sequenced using the following oligonucleotide primer:
  TUBB32   GGT GCG GGG AAC AAC TG
The reverse strands of the beta-tubulin genes of Diplonema sp. 3, Rhynchopus sp. 1 and Rhynchopus sp. 2 were sequenced using the following oligonucleotide primer:
  TUBB42   GAC TTG ATG TTG TTC GGG
The forward strands of the beta-tubulin genes of Diplonema sp. 2, Diplonema sp. 3, Diplonema sp. 4 and Rhynchopus sp. 1 were sequenced using the following oligonucleotide primer:
  TUBB3    GGA GCT GGT AAC AAC TGG
The reverse strands of the beta-tubulin genes of Diplonema sp. 2, Diplonema sp. 3, Diplonema sp. 4 and Rhynchopus sp. 1 were sequenced using the following oligonucleotide primer:
  TUBB4    CTT GAT GTT GTT TGG AAT C
The forward strand of the beta-tubulin gene of Rhynchopus sp. 2 was sequenced using the following oligonucleotide primer:
  TUBB33   GGT GCG GGC AAC AAC TG

2.6  Sequence alignment and phylogenetic analyses

The nature of the obtained sequences was confirmed by BLAST searches against the GenBank database. Introns were tentatively identified as insertions that could not be aligned to the amino acid sequences of the same gene from other organisms. The sequences were then imported into the Sequencher 3.1.1 software package, where contigs were assembled. Once the contigs were complete, they were translated into amino acid sequences using the DNA Strider 1.2 program. By comparing the inferred amino acid sequences with the corresponding protein alignments, introns were positively identified by the presence of canonical GT-AG boundaries. Introns were removed, and the nucleotide sequences were then translated into amino acid sequences. All the inferred amino acid sequences were added to the corresponding protein alignments.
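The intron check and removal described above can be sketched in a few lines. This is only an illustration under stated assumptions: intron coordinates are taken as already known (0-based, half-open), translation uses the standard genetic code, and the function names and the toy sequence are invented for the example rather than taken from Sequencher or DNA Strider.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def has_gt_ag(genomic: str, start: int, end: int) -> bool:
    """True if genomic[start:end] begins with GT and ends with AG."""
    intron = genomic[start:end].upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


def splice(genomic: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns given as (start, end) intervals, 0-based and half-open."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        if not has_gt_ag(genomic, start, end):
            raise ValueError(f"intron {start}-{end} lacks GT-AG boundaries")
        exons.append(genomic[position:start])
        position = end
    exons.append(genomic[position:])
    return "".join(exons)


def translate(cds: str) -> str:
    """Translate complete codons; codons with unknown bases become 'X'."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


if __name__ == "__main__":
    # Invented toy sequence: two exons around one GT...AG intron.
    toy_genomic = "ATGGCC" + "GTAAGTTTTCAG" + "AAATGG"
    print(translate(splice(toy_genomic, [(6, 18)])))
```

Raising an error when a putative intron lacks GT-AG boundaries mirrors the two-stage logic in the text: an insertion is only tentatively an intron until its boundaries have been checked.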
Amino acid sequence alignments of alpha-tubulin (436 amino acids), beta-tubulin (428 amino acids) and actin (373 amino acids) that included broad samplings of eukaryotes, and of GAPDH (290 amino acids) from a wide range of both eukaryotes and prokaryotes, were provided by Dr. Patrick Keeling and Dr. Naomi Fast. Amino acid sequences from diplonemids were added to these four alignments. Regions in the alignments that did not appear optimal were subsequently adjusted manually using a text editor.

Phylogenetic analyses were performed on the aligned protein datasets using a distance method. PUZZLE version 4.0.2 (Strimmer and von Haeseler 1996) was used to calculate maximum likelihood distances between pairs of sequences. The distance matrices were corrected by the JTT substitution frequency matrix with amino acid usage estimated from the data and site-to-site rate variation modeled on a gamma distribution with eight rate categories (except for the GAPDH alignment with 100 taxa). The gamma distribution parameter alpha was estimated from each dataset. Trees were constructed from the distance matrices with the neighbor-joining (NJ) algorithm using the BioNJ program (Gascuel 1997). One hundred bootstrap resamplings of the data were generated by the SEQBOOT program implemented in the Phylip 3.572 package (Felsenstein 1993). One hundred distance matrices were inferred from the 100 resampled alignments by PUZZLE version 4.0.2 (using the settings described above but without the gamma distribution), using the shell script puzzleboot (by M. Holder & A. Roger). One hundred trees were generated by analyzing the 100 distance matrices with the neighbor-joining (NJ) method using BioNJ. The bootstrap majority-rule consensus tree was constructed using the CONSENSE program from the Phylip 3.572 package.
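The resampling step of the bootstrap can be sketched as follows. This is a minimal illustration of drawing alignment columns with replacement, not the SEQBOOT implementation; the function names, the seed parameter and the toy alignment are assumptions made for the example.

```python
import random


def bootstrap_alignment(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Resample alignment columns with replacement, keeping taxa and length."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    # Draw the same column indices for every taxon so columns stay intact.
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


def bootstrap_replicates(alignment: dict[str, str], n_replicates: int = 100,
                         seed: int = 0) -> list[dict[str, str]]:
    """Generate n_replicates pseudo-replicate alignments (seeded for repeatability)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


if __name__ == "__main__":
    # Invented toy alignment; real input would be the protein alignments.
    toy = {"taxonA": "MKVLAG", "taxonB": "MKILAG", "taxonC": "MRVLSG"}
    replicates = bootstrap_replicates(toy, n_replicates=100, seed=1)
    print(len(replicates), replicates[0])
```

Each pseudo-replicate would then go through the same distance calculation and tree-building steps as the original alignment, and the resulting trees would be summarized as a majority-rule consensus.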
Alternative internal topologies of the phylum Euglenozoa in the alpha- and beta-tubulin and actin trees were tested statistically using the Kishino-Hasegawa (K-H) method (Kishino and Hasegawa 1989). This method evaluates the standard error of the difference in ln likelihood between alternative topologies; that is, it allows one to test whether a tree topology with higher likelihood is significantly preferred over others with lower likelihood. All K-H tests were performed using PUZZLE version 4.0.2 with gamma-distributed rates and user-defined trees (the parameters used were based on the first input tree). For the present studies, differences in log likelihood greater than 1.96 standard errors (corresponding to a 95% confidence interval) were considered significant (Kishino and Hasegawa 1989).

CHAPTER III: Results

3.1  Sequences for nuclear encoded genes from diplonemids

I sequenced genes for alpha- and beta-tubulin, actin and GAPDH from nine different diplonemids in this study. I obtained twenty-nine sequences in total: ten alpha-tubulin sequences from eight different diplonemids, thirteen beta-tubulin sequences from eight different diplonemids, two actin sequences from two different diplonemids and four GAPDH sequences from three different diplonemids. (See Table 1 for a summary.) The predicted amino acid sequences inferred from the nucleotide sequences for these twenty-nine nuclear encoded genes are given in Fig. 4-Fig. 7. Lengths of these sequences (with neither intron nor PCR primer sequences included) are approximately: 733 nt for actin, 1153 nt for alpha-tubulin, 1162 nt for beta-tubulin, and 904-949 nt for GAPDH.

Actin gene sequences

A band close to the expected size (777 nt) was obtained from Diplonema ambulator. A band larger than the expected size was obtained from Diplonema sp. 3.
These two amplified products were cloned and sequenced, and BLAST searches against the GenBank database confirmed that both encoded actin. BLAST searches also revealed that both sequences contain introns: two in the actin gene of Diplonema sp. 3 (80 nt and 176 nt in length) and one in the actin gene of Diplonema ambulator (40 nt in length). The length of each of the two predicted protein sequences is 244 amino acids, representing about two-thirds of a complete actin sequence.

Table 1. Twenty-nine nuclear encoded genes from nine diplonemids. Numbers indicate copy/copies sequenced from one specific diplonemid.
diplonemid  α-tubulin  β-tubulin
Diplonema sp. 2 Diplonema sp. 3 Diplonema sp. 3 new Diplonema sp. 4 Diplonema ambulator Diplonema papillatum Rhynchopus sp. 1 Rhynchopus sp. 2 Rhynchopus sp. 3
1 1 2 1  1 1 2 3 2  1 1 1 2  1 1 2
Actin  GAPDH
1  2 1  1  1

The deduced amino acid sequences of Diplonema sp. 3 and Diplonema ambulator were aligned with the homologous region of the actin sequences from 63 other eukaryotes. Figure 4 shows a representative alignment including actin sequences from Diplonema sp. 3, Diplonema ambulator, Euglena gracilis and Trypanosoma cruzi. The two diplonemid sequences were very similar to each other, with only 3 amino acid differences over the total length of 244 amino acids (sequence differences were calculated by PAUP version 4.0). In addition, they were similar to the sequences of other euglenozoans: when the two diplonemid actin sequences were compared to that of Euglena gracilis, only 37 of the 244 amino acids were different, whereas sequence differences between diplonemids and kinetoplastids were higher (64-71 of the 244 amino acids).

Alpha-tubulin gene sequences

PCR products of the expected size (1197 nt) were obtained from Diplonema sp. 2, Diplonema sp. 3, Diplonema sp. 3 (new strain), Diplonema papillatum, Rhynchopus sp. 1 and Rhynchopus sp. 3. PCR products larger than the intronless size were obtained from Diplonema sp. 4 and Rhynchopus sp. 2. The above PCR products were cloned, and both strands of two independent clones from each source were sequenced. BLAST searches against the GenBank database confirmed that all of them were alpha-tubulin sequences. The results of the BLAST searches also revealed that there was one intron in each of the two alpha-tubulin clones from Diplonema sp. 4 (109 nt) and Rhynchopus sp. 2 (126 nt). In addition, by comparison, I found that the two alpha-tubulin clones from both Diplonema sp.
3 (new strain) and Rhynchopus sp. 3 were slightly different from each other. The two sequences from Diplonema sp. 3 (new strain) vary at several nucleotides and two amino acids, and those of Rhynchopus sp. 3 vary at several nucleotides but not at any amino acid. These differences indicate that two different alpha-tubulin genes were sequenced from these two diplonemids, but only one gene was sequenced from each of the remaining six diplonemids (see Table 1). The sequence of the PCR product was 1153 nt in length (excluding primer and intron sequences) for each of the nine alpha-tubulin sequences, recovering more than 80% of a complete intronless alpha-tubulin sequence. However, for the alpha-tubulin sequence of Rhynchopus sp. 2, I was unable to sequence about 120 nt (see Fig. 5).
The inferred translation of 384 amino acids for each of the ten alpha-tubulins from diplonemids (with no primer sequences) was aligned with those from a sampling of 54 other eukaryotic taxa. Figure 5 shows a small sampling of this alignment, including the ten diplonemid alpha-tubulin sequences as well as those of Euglena gracilis and Trypanosoma cruzi. The sequence differences among the ten diplonemid sequences were slight (only 0-15 amino acid differences over the total length of 384 amino acids). The sequence differences between diplonemids and Euglena gracilis were 18 to 39 amino acids. Similarly, the sequence differences between diplonemids and kinetoplastids were 23 to 44 amino acids.

Beta-tubulin gene sequences

PCR products of the expected size (1200 nt) were obtained from Diplonema sp. 2, Diplonema sp. 3, Diplonema sp. 3 (new strain), Diplonema ambulator, Rhynchopus sp. 1, Rhynchopus sp. 2 and Rhynchopus sp. 3. PCR products larger than the expected size were obtained from Diplonema sp. 4 and Diplonema sp. 2. All the above PCR products were cloned and both strands of several clones from each were sequenced. Again, BLAST searches of these sequences confirmed that each of them was beta-tubulin, and corresponded to about 86% (387-388 amino acids) of a full-length beta-tubulin gene. In addition, the sequences from three independent clones from Diplonema sp. 4 showed that each was a slightly different copy of beta-tubulin (1-2 amino acid differences).
The sequences from
3 c o r ^ C i C ^ Q j C ^ D j O j D j Q j S D j g D j M U cococouicococococorowrcccoxitrt  ciioicsqqQQQQQ^Qqhw HrjnrjiLiUcr-oimoHiNn^in  H  CN  U U  rH  CN  oo  ro O UO H H  CN  co  "rH r-H i-H r-H CD - H m r o c N r o r o r o ^ i ^ j i ' r 4 i 3 H 3 C N O O D , t i Q , Q , c i D , D j Q , D , E D , g (j, IH JH c o c o c o c o c o c o c o c o c o r c c o r o c o x i c i i r  J  S C « Q ; Q Q Q C 3 ^ Q Q C t i Q Q E H  ,  CM hi U H P >H hi  < Cd <  61 01 61 61 61 61 61 01  Z z z z Z z Z P D p p D p P H H H H H H H eg J. U U u U u U U u U CO u U U 01 01 01 01 01 01 Cd Cd 61 01 Cd 61 td  z Z p O H H U  CO [>  01  9 9hlj 9 9Wj 9hi. 9 rH. 9 hi.9 9 9 9 rH, 9 h 9Hj 9 IH,  Z z Z z z z z z z z z z z Cd 01 Cd 61 61 01 61 Z Cd 61 01 Z Cd 61 61 61 01  >>>>>>>>>>>>>>>  rl hi hi •1 hi hi hi hi hi hi hi hi hi hi hi a a a a o o o o o o o a o o a X X X X X X X X X X H H H hH hH H H H H H H H H J> [> CO CO CO CO CO CO CO CO CO CO CO CO CO CO CO hi hi hi hi hi S hi hi hi hi hi hi hi hi hi hi H H H H H H H H H H H H H H H ft < H H ft ft < < < s s s < P Z Z ZZ Z Z > H zM Oi z a, CM CM CM CM CM CM CM CM C M CM CM CM C 01 01 01 01 01 Cd 01 01 01 01 Cd 61 61 Cd 01  h<C h<C  CN  O O  H  H  hUhUhUhuCuCuCuCuCuCuCuCuCuCuCu > > > > > > > > > > > > > H >  O HJ  H  <<  o o o o o o o o o o o o o o o  o rH CM CO HJ UU U U CJ CJCJOrrj CJ O rrj U -rH M rH r-H QJ TH n n r s m r o n ^ H / ^ 3 H 3 C N CJ O  H  >>>>>>>>>>>>>>> > ft ft ft ft ft ft ft ft ft < < < ft ft rt CO CO CO CO CO CO CO CO CO CO CO CO CO CO CO >> > hi> hi > hi > hi > hi >> > hi >> > hi > hi hi hi hi hi hi > hi hi X X X X X X X X X X X Z X Z X z z z z X z Z z z Z Z z z Cd  O O o O O o o o O o o O O o o CO CO CO CO CO CO CO CO CO CO CO CO CO CO CO  WA WA  s Cu Cu X X CO CO o O hH H S  RR  > > > O O O X X X Cd Cd Id P P R CO D O CO CO C > > H > > Cd Cd Cd  > > gX XX X  WA WA  Cd s s S Cu Cu Cu X X X CO CO CO o o O H HH  R  O > o X Cd  XX  WA  > Cd Cd Cd Cd Cd Cd Cd Cd  P > o X Cd P CO > >  EV  RR RD  > > > o o o X X X Cd Cd Cd RO RCO 
CRO C >  a > o X Cd a CO  fl D P D D PQ P CO CO CO CO CO CO CO CO CO CO CO CO CO CO CO > > E. £ > > CM CM CM E; Ss Ss CO CO CO CM CM CM CM CM CM CM CM CM CM CM CM CM CM CM CO CO CO CO CO CO CO CO CO CO CM CM C M C M C M C M CO CO H H H M l—l M M hH M CM CM CM CM CM CM Ml— H H H M > > > > > > > > >> > H > > > CO CO CO CO CO CO CO CO CO CO CO CO CO CO CO Cu Cu Cu Cu Cu Cu Cu Cu Cu Cu Cu Cu Cu Cu Cu EH EH EH EH EH EH EH EH EH EH EH H EH EH EH > > > E S E h l E E E S S E E EE E S E E E E E E S E S E E H H H H > H S H S H H HS OS OS OS o s o S o S o S o S o S o S o S o S OS OS OS R R R R R D R D D P P R DRR  SD SD  MMMMI-iMMMMrHMMMMM o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o CMCMCMCMCMCMCMOjCMOjQjCMQjajCM  R D a R DD R  Cu Cu Cu Cu Cu OJ OJ Cu OJ Cu Cu Cu Cu Cu Cu O O O O o O O O O O o O O O O >>>>>>>> >>> >>E > hi hi hi hi hi hi hi hi hi hi hi hi hi E hi Cu Cu Cu Cu Cu Cu Cu Cu OJ OJ Cu OJ OJ Oi OJ Cu Cu Cu Cu Cu Oi OJ Cu Cu Cu Cu Cu Cu Oi Cu X X X X X X X X X X X X X X X hi hi hi hi hi hi hi hi hi hi hi •1 hi hi hi OS OS 0! OS OS OS OS OS OS OS OS OS OS OS OS CM CM OJ OJ OJ CM CM Oi CM CM CM CM CM CM CM Cu Cu Cu Cu Cu Cu Cu Cu Cu Cu fcj Cu CU OJ Oi CM OJ OJ OJ CM CM OJ CM CM CM CM CM CM CM CM H H 1 HH H H H H M H H H > H hi hi I hi hi hi hi hi hi hi hi hi hi hi Z Z I Z Z Z Z Z z Z Z Z Z Z  MC  CU6JCUCUOIOIOICU0ICUCUCUCUCUOJ  EH EH EH EH EH EH EH EH EH EH H EH EH EH EH  ;DE  Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q CMOJOJCMCMCMCMCHCMCMCMCMCMCMCM  CNCNi^CNCNCN(NrsJCNM(NCNCNf>I(N  VM VM VM  Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z  iflLnuiinuiiniiiiiiLf)L/)iiii/>i/>or] Lni/iini/lLnini/iinLnLnLnLnLnr>o  VM  HHHHrHrHHHrHHrHHHHH  AD AD  jiinuiLnLnirnjiLfii/iLni/ii/iinon y)kDy)VDy)iovDiiHDvoijovovDcorH  VM VM  LnminLniiii/iLniriLjnininLnLnoro r--r^r-r-r>r>r-r-r>i^r>r^r>o>r-]  M  I CN U U  i CN  U U  H  u  u  rH O CN r o HJ  rorocNrororo-rji'rji'H  CO  o  HJ  3 »j  H  -rH rH <U -H 3 C NCJ CJ . 
3 *  0,0,0,0,0,0,0,0,0,1= D j E 0,VH IH COC0C0C0C0C0CQC0to 3rQ^ Qhj| r  W f  1  r^os'iSl-iQQQQC^C^QQE^Cij  HiNrl^i^nDhcrjmoHMrlH/ii-|  31  L O LDLD in in in cn in LO UO UO UO o CN ^ ^fl <H< LD <H< "H TrJH .H. coo ro ro ro ro ro ro ro ro ro ro no ro ro ro CN LO  1  L^l^l^i>i>l^r^[^l^I^C^I>COCNI>  cococococococococoaooococo rjt rroforororororororororororo^rro  O CJ 0 CJ cj cj OCJ COO CJ CJ  crj agXft ftft ftft ftft ftft ftft ftftXftftXftftXftft Xftft Xftft ft ft ft ft C^M" HM" HM"  a eg  CM H H H  H  HH  1—1 H H H H H H  H H  H H  CH  Q Q Pp P P P P P P P P p p P U U U CJ CJ CJCJ CJ CJ M H M H 1—1 H H H 1—1 H H H >O> cj C cjO CO CO CC OO CO CO COcoCO COa,CO> COC CO CO CO CC OO CO CO CO CO COCOCOCOCO COCO X X X X X X X X X X X X X X X HI—1 H H i-i \-t \-t i-t v-^ H H H H H 2 2 2 2 2 22 22 2 2 § 2 Z g 2 g 2 2 §2 g2 g g ftH ftH Hft ft ft ft ft ft ft ftft ftft ft ft l—l i-^ t-t t-i H t-^ t-i H 1—1 1—1 H i—i 3 3 33 3 3 3 3 3 33 33 3 3 Cd Cd Ca d Cfl Cd Cd CM Cd CC dd CdCd Cd Cd > > >> > > > > > > > > >u HCu > C u Cu CC uu Cu Cu Cu Cu CuCuCuCuC Cu CH >< >H >H CH >H ,>• >< >H C" C" C" >i CO CO CC OO CO CO CO CO COCOCOCOCOCO CO CO CO CC OOCO CO CO CO COCOCOCOCOCO CO XXX X X x x x X2X §X 2 X 222 2 2 2 2 2 2 22 22 2 2 isi a a a a p p pP p p p p p pp PP P P ESS ESS: ' a a aa a a c Cd Cd Cd Cd Cd Cd C  uuu  a Cd ix > a p Cd Cd Cd Cd Cd Cd PP Cu Cu Cd Cd CJ CJ Cd Cd Cd Cd Cd Cd H > EH EH  ft ft  PR aa >H >H  aa aa C* >H Cd Cd CO CO > > p p R R 2 2 CO CO Cd Cd  96 £  Q  Cd Cd Cd C [H H H E CO CO CO c CU CU Oi Pj I CJ CJ CJ CJ I OJ PJ OJ OJ I Cu Cu Cu Cu I p pp s : a a a a a a a a aaac c CJ CJ ft < r< < < < ft < ft ft ft t  "2  ft ft  a ca EH EH Cu Cu Cd Cd E E Cd Cd CJ CJ CJCJ CJ CJO O O CJ CJ CJS ER EP C d C d C d Cd Cd CC dd Cd CfCld Cd Cd CJ CJ CJ CJ CJCJ CJ CJCJ CJ CJCJ CJ UCJ Cd Cd Cd EH EH EH EH EH H EH EH EH EH EH EH >H >H >H >H CH >H CH >H >H >H CH C" cj CJ cj 3 3 33 3 33 3 33 3 3  uuu uuu u u u  EH EH EH EH EH EH EH EH EH EH EH E pp C* > C* I >H C* >H > OJ OJ H >H >H >H 
>H OJ OJ OJ  PPP  2? ^  P PP P PP •  C OJ OCJ J OJ OJ OO JJ CJ CJ CJCJ CJ CJ CJ CJ C CC X X XCJXCJCC Cr; C G OJ OJOJ OJ OO OJ OO JJ OJ OS c JJ ft ft ft ft ft ft ftftftPftPt C P P ft P R R R PR C CO COP O CO CC OO CO CftOOftCOft C>O c ft ft C U U ft ft ft ft ft ft U UU C S E E E E E E E EEEESS.ES  cc cc cc c  CN  CC CC CC CC CC CC CC CC CC CC CC CC H EH EH C- >H >H  p pp p pp p pp p pp 3 3 3 CC Cu XCuXCC uu XCuXCCC u CC Cu Cu Cu X X p pp u HM HH HM X X X Cu Cu CM-H HH HH OJ OJ OJ uOJJCO uJ CO uJ OJ OJOJOJOJOJOO JOJJCO OJOJOJOJOJOJOJOJ Xu Cu OJ OJ OJ CuCuCuCuCuCuCuCu Cu Cu CC uu XCuX C J CU E SOJSOE  uuu uuu  ft ft  ft ft ft ft ft ft f  R P PP P P C Cu Cu CC uu Cu Cu C I CO CO CO CO CO : ' a a aa a a c ' a a aa a a c I EH EH EH EH EH EH E  p p pp p p • Cd Cd CfW l Cfl Cfl c ft ft ft ftftftc • > EH> EH> EH > > > ; E I I  PPP•  '.ft sC  f Cu Cu OJ C  EH EH EH EH EH EH EH EH EH EH EH H  P  it  EH EH EH  Cu Cu CC uu Cu Cu Cu CC uu Cu CC uu Cu Cu Cu a a a a a a a aa a a a a a a cd a a d Cd Cd Cd Cd Cd Cd Cd Cf Cld cd CflC CO CO CC OO CO CO COCO OO >CO>CJ>CO COCO CC > J OJ OJ OJ > > > fe is > X>X> O x x o; x u Cu x x x OJ OJ OJ OJ X O uJ Cu CC uuECuE C S S Cu Cu CX u X X X XC S E E Cu Cu Cu Cu ECuE S Cfl Cd Cd Cd Cd Cfl Cfl Cd aCda Ca d E E S Sa E a CdaCda Cad a Cd Cdo a a a a a  CH  C  CO  >H C« C" >H >H >H f a a a aC" a a a a a ac a aa aa a a aaa aac CO CO CO O CO CO c CJ CJCO COCO CO CO C CJ CJ c OJ OJ u cj oOJ OJ CJ OJ c CO COu cj cj J CO CO c CO H EH OJ OJOJ OJ OJ O O CQ CO EH COH EH E P p CO COC ft ftEH EH EH EH EH EH P P P H ft ft P P P P P P ft ft ft C >H >H >H  ft ftft ft ft ftft ft < ft ft ft ft  < ft f  U 0  CN H CN ro u H CN H U U U U U ra  -H rH  uu  OJ -H  forodnnn^ci'^ciHCifNOO  ciio^Q^r^ci ci'lf Q'§ J  1^  2  H CN UU UU  SH O  H CN ro w U U U ni  r o r o i N n n r o H f ^ ^ C3H 3 CN  OJ •  CJ CJ  3 ro  0 , 0 , 0 , ^ ^ 0 , ^ 0 , 0,11 ti, In H LOOCOWIOOCOCOWCCJCQItCOrOCOXJCOO  nwwwmwwnoicTjKraw-'H'Cji  C l j o j o 4 Q Q Q q Q H Q ' «  CU^Q4QQQHMQQ^  
Hdco^inLoMooioHMro^m  Hrjci^LOLcMumoHiNroci'in x-i r-i ^-t r-t H  r-{  _g  ~  ap ca  H H H H H H  32  two independent clones from Diplonema ambulator also represented two different betatubulin sequences (3 amino acid differences). The sequences from two independent clones from Diplonema sp. 3 (new strain), and Rhynchopus sp. 3, respectively, revealed two different beta-tubulin gene sequences, however the differences were only detectable at the nucleotide level, not at the amino acid level. From each o f the rest four diplonemids, the two independent sequences from two independent clones are identical (Table 1). One intron was found in each o f the three different beta-tubulin sequences from Diplonema sp. 4 (140 nt, 126 nt and 149 nt in length, respectively) and in the beta-tubulin gene oi Diplonema sp. 2 (71 nt). The inferred amino acids for each o f the thirteen beta-tubulins from diplonemids (with no primer sequences) were aligned with those from 46 other eukaryotic taxa. Figure 6 shows a small part o f this alignment, which includes not only the thirteen new diplonemid beta-tubulin sequences, but also the sequences from Euglena gracilis and Trypanosoma brucei. A l l o f the thirteen diplonemid-sequences were very similar to each other (0-15 amino acid differences over a total o f 387 residues). The sequence difference between diplonemids and Euglena gracilis (23-44 amino acids) was similar to the sequence difference between diplonemids and kinetoplastids (39-54 amino acids). G A P D H gene sequences P C R products o f the expected size (around 1000 nt) were obtained from Diplonema sp. 3 and Rhynchopus sp. 3. P C R products o f both expected size and larger than expected size were obtained from Diplonema sp. 2. A l l these P C R products were cloned and both strands o f several clones were sequenced from each. B L A S T searches o f these sequences confirmed that they all encoded G A P D H , recovering over 90% o f a full-length G A P D H  33  gene. 
In addition, the sequence of the larger PCR product from Diplonema sp. 2 contained two intron sequences (77 nt and 120 nt in length).

The deduced amino acid sequences of the four diplonemid GAPDH genes (with no PCR primer sequences) were aligned with those from 96 other taxa, including both eukaryotes and prokaryotes. During the alignment process, I found it difficult to align the four newly obtained diplonemid GAPDH sequences with those of any other Euglenozoa. However, it was comparatively easy to align the three intron-lacking GAPDH sequences (from Diplonema sp. 3, Rhynchopus sp. 3 and Diplonema sp. 2) with Anabaena variabilis gap3 (I will refer to these three diplonemid sequences as "gap1"), and the second Diplonema sp. 2 GAPDH with the GapC of cryptomonads (I will refer to this diplonemid sequence as "gap2"). Figure 7 is a representation of this alignment.

Comparison among the four diplonemid GAPDH sequences reveals that Diplonema sp. 3, Rhynchopus sp. 3 and Diplonema sp. 2 (gap1) are far more similar to each other than to Diplonema sp. 2 (gap2). The sequence differences among the three gap1 sequences are 81-93 amino acids over a total length of 301-316 amino acids, whereas the differences between these three and Diplonema sp. 2 gap2 are 187-196 amino acids. Further pairwise comparisons showed that the differences between the three gap1 sequences and Anabaena variabilis gap3 are only 148-162 amino acids, whereas those between these three diplonemid GAPDH sequences and E. gracilis GapC, T. brucei GapC and L. mexicana GapC are 195-207 amino acids. Meanwhile, the differences between Diplonema sp. 2 (gap2) and the GapC of two cryptomonads (Pyrenomonas salina GapC and Guillardia theta GapC) are only 115-118 amino acids, whereas those between Diplonema sp. 2 (gap2) and E. gracilis GapC, T. brucei GapC and L. mexicana GapC are 129-152 amino acids. In addition, Diplonema sp. 3, Rhynchopus sp. 3 and Diplonema sp. 2 (gap1) share five unusual insertions (see Fig. 7), which are not found in the gap2 of Diplonema sp. 2, in the GAPDH of any other member of the Euglenozoa, in Anabaena variabilis gap3, in the GapC of cryptomonads, or in the GAPDH genes of most other prokaryotes and eukaryotes.

[Fig. 7: alignment of GAPDH amino acid sequences, including the four diplonemid sequences (gap1 and gap2)]

In summary, all the diplonemid actin, alpha- and beta-tubulin sequences are very similar to each other. Diplonemid actin sequences show more resemblance to euglenoid sequences than to kinetoplastid sequences, and diplonemid alpha- and beta-tubulin sequences are nearly as similar to those from euglenoids as to those from kinetoplastids. On the other hand, among the four newly obtained diplonemid GAPDH sequences, three are very similar to each other but very different from the remaining one.
In addition, none of the four is particularly similar to any other euglenozoan GAPDH sequence; instead, three are more similar to the gap3 of cyanobacteria and one is more similar to the GapC of cryptomonads.

3.2  Diplonemid introns

Position of diplonemid introns

As mentioned previously, introns were found in several of the PCR products. They were tentatively identified as insertions in genes that could not be aligned to the amino acid sequences of the same gene from other organisms by BLAST searching. After the contigs were complete, introns were confirmed by the presence of the canonical GT-AG cleavage sites (see Materials and Methods). They were found in all three phases: an intron found between two codons is termed a phase 0 intron; an intron found between the first and second nucleotides of a codon is termed a phase 1 intron; and an intron found between the second and third nucleotides of a codon is termed a phase 2 intron.

I have characterized 11 introns in nine of the 29 nuclear-encoded genes from diplonemids. The amino acids at which the corresponding introns occur are highlighted in bold in the alignments (Fig. 4-Fig. 7). I found two introns in the actin gene from Diplonema sp. 3 and one intron in the actin gene from Diplonema ambulator. These three introns are all phase 0 introns (Fig. 4). There is one intron in each of the alpha-tubulin genes from Diplonema sp. 4 and Rhynchopus sp. 2. In Diplonema sp. 4, the intron is a phase 1 intron, whereas in Rhynchopus sp. 2 the intron is a phase 0 intron in a different position (Fig. 5). I found one intron in the beta-tubulin gene of Diplonema sp. 2; it is a phase 1 intron. In Diplonema sp. 4, there are three different introns in three different copies of the gene. All three introns are in the same position and the same phase (phase 2) (Fig. 6).
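The phase definition given in this section amounts to taking the number of coding nucleotides upstream of the intron modulo 3. The following is a minimal Python sketch of that rule; the function name `intron_phase` and the example positions are hypothetical illustrations and are not taken from the diplonemid genes.

```python
def intron_phase(coding_nt_before: int) -> int:
    """Return the phase (0, 1 or 2) of an intron.

    coding_nt_before is the number of coding nucleotides, counted from the
    first nucleotide of the reading frame, that lie upstream of the intron.
    (Hypothetical helper for illustration; not part of the thesis methods.)
    """
    if coding_nt_before < 0:
        raise ValueError("coding_nt_before must be non-negative")
    # 0 -> between two codons, 1 -> after the first nucleotide of a codon,
    # 2 -> after the second nucleotide of a codon.
    return coding_nt_before % 3


# Illustrative positions only (not measured from any gene in this study).
for upstream in (30, 31, 32):
    print(upstream, "coding nt upstream -> phase", intron_phase(upstream))
```

Counting only coding nucleotides means that any introns further upstream do not shift the phase of later ones.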
I found two introns in one copy of the GAPDH gene from Diplonema sp. 2 (gap2). They are both phase 0 introns.

Diplonemid intron characterization

The 5' and 3' ends of the 11 introns were aligned (Fig. 8). All have the consensus GT and AG boundaries expected of canonical spliceosomal introns. The six highly conserved nucleotides at the 5' splice sites of the diplonemid introns are GTRTGY, which closely correspond to the six conserved nucleotides at the 5' splice site of a classical GT-AG mammalian intron: the only difference lies in the fourth position, where a T residue is present in diplonemid introns rather than an A residue as in mammalian introns. All eleven diplonemid introns end with the consensus nucleotides CAG, just as classical spliceosomal introns do. Interestingly, however, the twelve nucleotides preceding the final CAG are mostly C or A in all eleven diplonemid introns (see Fig. 8). In ten of the eleven diplonemid introns (all except the intron in the alpha-tubulin gene of Rhynchopus sp. 2), A residues are present at least five times each and T residues are either completely absent or present only once or twice each. The C residues are also present around five times each on average. This contrasts with classical GT-AG mammalian introns, which contain a polypyrimidine tract in this region. Moreover, in the introns of the alpha- and beta-tubulin genes from Diplonema sp. 4, continuous 'CA' repeats were observed.

[Fig. 8: alignment of the 5' and 3' ends of the eleven diplonemid introns]

The branchpoint region in a classical GT-AG intron is usually closer to the 3' splice site than to the 5' splice site. More specifically, this region generally lies 15-40 nt upstream of the 3' splice site (Umen et al. 1995). In yeast, the branchpoint consensus sequence is strictly conserved: 5'-TACTAACA-3' (Umen et al. 1995). In contrast, the branchpoint region is only loosely conserved in the introns of mammals: 5'-YNYTRACN-3' (Umen et al. 1995).
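Consensus strings such as GTRTGY and YNYTRACN use IUPAC ambiguity codes (R = A or G, Y = C or T, N = any nucleotide), so searching an intron for them is a pattern-matching problem. The sketch below shows one way to do this in Python; the helper names `consensus_to_regex` and `find_sites` are assumptions made for illustration and do not describe how the searches in this study were carried out. The two octamers tested are the ones reported in this section.

```python
import re

# IUPAC nucleotide codes used in the consensus strings quoted in the text.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]",    # purine
    "Y": "[CT]",    # pyrimidine
    "N": "[ACGT]",  # any nucleotide
}


def consensus_to_regex(consensus: str) -> str:
    """Translate an IUPAC consensus string into a regular-expression string."""
    return "".join(IUPAC[base] for base in consensus.upper())


def find_sites(consensus: str, sequence: str) -> list:
    """Return 0-based start offsets of all matches, including overlapping ones."""
    # A zero-width lookahead lets finditer report overlapping matches.
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]


MAMMAL_BRANCHPOINT = "YNYTRACN"
YEAST_BRANCHPOINT = "TACTAACA"

# Octamers reported in this section for Diplonema sp. 4 alpha-tubulin
# and for the first intron of Diplonema sp. 2 gap2.
for octamer in ("TGTTGACT", "TCCTGACC"):
    print(octamer,
          "mammalian consensus:", bool(find_sites(MAMMAL_BRANCHPOINT, octamer)),
          "yeast consensus:", bool(find_sites(YEAST_BRANCHPOINT, octamer)))
```

Converting an offset returned by `find_sites` into a position relative to the 5' or 3' cleavage site requires the intron boundaries, which are not reproduced here.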
The branchpoint consensus sequence from yeast introns was not observed in any of the eleven diplonemid introns, but the branchpoint consensus sequence from mammalian introns was observed six times in five of the eleven diplonemid introns. However, it was observed four times in four different introns either at positions +4 to +11 relative to the 5' cleavage site (three introns in three different copies of the Diplonema sp. 4 beta-tubulin gene) or at positions +9 to +16 (one intron in the alpha-tubulin gene of Diplonema sp. 4). Both regions are highly unlikely to be real branchpoint sites, since they are too close to, or even overlap with, the 5' consensus splice sites of the introns. The mammalian branchpoint sequence was also observed between positions -23 and -16 relative to the 3' cleavage site (TGTTGACT) in the intron from the Diplonema sp. 4 alpha-tubulin gene, and between positions -36 and -29 (TCCTGACC) in the first intron (closest to the 5' end of the gene) of the Diplonema sp. 2 gap2.

3.3  Phylogeny of the Euglenozoa

In addition to looking for introns in diplonemids, I constructed three protein phylogenetic trees with the newly obtained diplonemid sequences in an attempt to determine the phylogenetic position of diplonemids within the phylum Euglenozoa.

Actin phylogeny

An actin phylogeny was constructed from 373 alignable characters from a total of 65 eukaryotic taxa using distance and neighbor-joining methods (Fig. 9). Most of the phylogenetically distinct eukaryotic groups, including land plants, green algae, animals, fungi, heterokonts and alveolates, are recovered in the actin tree (Fig. 9). The two new diplonemid sequences are closely related to each other and form a clade with 100% bootstrap support. In fact, the whole phylum Euglenozoa (shaded in Fig. 9), consisting of three major groups (diplonemids, euglenoids and kinetoplastids), is well supported by this tree (91% bootstrap value).
Furthermore, this actin tree also strongly suggests that the two diplonemid sequences are more closely related to the euglenoid sequences than to the kinetoplastid sequences. The node uniting diplonemids with euglenoids (to the exclusion of kinetoplastids, node A in Fig. 9) is well supported by bootstrap (79%).

In an effort to test the likelihood of alternative positions for diplonemids within the phylum Euglenozoa, Kishino-Hasegawa (K-H) tests were carried out on the actin data. In this case, I tested two alternative positions for the diplonemids. In one alternative, the diplonemids branch with the kinetoplastids, so the internal topology of the phylum becomes ((diplonemids, kinetoplastids), euglenoids) or ((D, K), E). The other possible position of diplonemids is at the base of this phylum, so the internal topology becomes (D, (K, E)). These two alternative positions of diplonemids are indicated by open circles in Fig. 9. The K-H tests found that these two alternative topologies of Euglenozoa were not significantly worse than the original topology ((D, E), K) at the 5% significance level (Table 2).

In conclusion, the actin tree strongly supports the inclusion of diplonemids within the phylum Euglenozoa, but the close association of diplonemids with euglenoids to the exclusion of kinetoplastids is supported only by bootstrap, and not by the K-H tests.

Fig. 9 Neighbor-joining tree based on actin protein sequences of various eukaryotes, including two new sequences of diplonemids (in bold). This unrooted BioNJ tree was constructed by calculating maximum-likelihood (ML) distances between pairs of sequences. Values on selected branches indicate neighbor-joining bootstrap support greater than 50%, and the bootstrap values of particular interest are in bold. Scale bar indicates amino acid substitutions per site. Node A is the last common ancestor of diplonemid and euglenoid sequences and node B is the last common ancestor of the phylum Euglenozoa (shaded). Alternative positions for the diplonemids were assessed with Kishino-Hasegawa tests at the nodes marked with open circles. The two alternatives were not rejected at the 5% level.

Alpha-tubulin phylogeny

An alpha-tubulin phylogeny was constructed from 436 alignable characters from a total of 64 eukaryotic taxa using distance and neighbor-joining methods. The resulting alpha-tubulin tree (Fig. 10) supports most of the major eukaryotic groups, including alveolates, green algae, red algae, land plants, diplomonads, fungi, animals and parabasalia. The 10 diplonemid sequences from this study form a single group with a very high bootstrap value (96%). The phylum Euglenozoa (euglenoids, kinetoplastids and diplonemids) also forms a single clade, which is supported at 64% by bootstrap.

When the internal phylogeny of the Euglenozoa is considered, the alpha-tubulin tree tells a different story from the actin tree. The alpha-tubulin phylogeny favors diplonemids being closer to kinetoplastids than to euglenoids. However, the node uniting diplonemids and kinetoplastids (node A in Fig. 10) is poorly supported (40% bootstrap).
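As a rough illustration of how a K-H result such as the one reported for the actin data (Table 2) can be read, the sketch below compares each log-likelihood difference with its standard error. It assumes the commonly used normal approximation, under which a tree is judged significantly worse at the 5% level when the difference exceeds 1.96 standard errors; the thesis does not state the exact criterion, so this threshold and the function name `kh_significantly_worse` are assumptions.

```python
def kh_significantly_worse(delta_lnl: float, std_err: float, z: float = 1.96) -> bool:
    """Return True if a tree is judged significantly worse than the best tree.

    Assumption (not stated in the thesis): the log-likelihood difference is
    significant at the 5% level when it exceeds z = 1.96 standard errors.
    """
    return delta_lnl > z * std_err


# (topology, log-likelihood difference from the best tree, S.E.), as read
# from Table 2 for the actin data.
actin_alternatives = [
    ("(K,(D,E))", 3.22, 5.84),
    ("(E,(D,K))", 4.99, 5.10),
]

for topology, delta, se in actin_alternatives:
    verdict = "significantly worse" if kh_significantly_worse(delta, se) else "not rejected"
    print(f"{topology}: difference = {delta:.2f}, 1.96 x S.E. = {1.96 * se:.2f} -> {verdict}")
```

Under this assumed criterion, both differences are smaller than their thresholds, which agrees with the "no" entries in the last column of Table 2.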
To test the strength of this position for the diplonemids, I performed K-H tests on two alternative positions for diplonemids (marked by two open circles in Fig. 10). The results showed that the alternative topologies for the phylum Euglenozoa, (D, (E, K)) and (K, (D, E)), were not significantly worse than (E, (D, K)) at the 5% significance level, as suggested by the low bootstrap values in the original alpha-tubulin tree (Table 3).

Fig. 10 Neighbor-joining tree based on alpha-tubulin protein sequences of various eukaryotes, including ten new sequences of diplonemids (in bold). This unrooted BioNJ tree was constructed by calculating maximum-likelihood (ML) distances between pairs of sequences. Values on selected branches indicate neighbor-joining bootstrap support greater than 50%, and the bootstrap values of particular interest are in bold. Scale bar indicates amino acid substitutions per site. Node A is the last common ancestor of diplonemids and kinetoplastids and node B is the last common ancestor of the phylum Euglenozoa (shaded). Alternative positions for diplonemids were assessed with Kishino-Hasegawa tests at nodes marked with open circles. The alternatives were not rejected at the 5% level.

In conclusion, although diplonemids are placed closer to kinetoplastids than to euglenoids in the alpha-tubulin tree, this placement is not supported by either bootstrap or K-H tests.

Beta-tubulin phylogeny

A beta-tubulin phylogeny was constructed from 428 alignable characters from a total of 59 eukaryotic taxa using distance and neighbor-joining methods. The beta-tubulin tree (Fig. 11) also supports most of the common eukaryotic groups: land plants, green algae, alveolates, heterokonts, animals, fungi and diplomonads. The thirteen new diplonemid sequences obtained from this study also branch together with 100% bootstrap support. However, in this tree, a heterolobosean sequence (Naegleria gruberi) branches within the Euglenozoa, specifically with euglenoids. The close association of the beta-tubulin sequences from Naegleria gruberi and euglenoids was also indicated by the previously constructed global beta-tubulin tree (Keeling et al. 1996).
In both actin and alpha-tubulin trees, the Heterolobosea form a separate phylogenetic group closest to the phylum Euglenozoa. The inclusion of a member of a different phylogenetic group may cause the low support (less than 50%) for the whole group (the Euglenozoa and Naegleria gruberi, node B in Fig. 11). Furthermore, the suspiciously close association of Naegleria gruberi and euglenoids may obscure the real phylogenetic relationship of diplonemids to kinetoplastids and euglenoids. As for the phylogenetic placement of the diplonemids within the phylum Euglenozoa, this beta-tubulin tree agrees with the alpha-tubulin tree in placing the diplonemids with the kinetoplastids, in contrast to the actin tree. The bootstrap value for the node uniting diplonemids and kinetoplastids (node A in Fig. 11) is 64%, which is higher than the bootstrap value indicated by the alpha-tubulin tree, but is still relatively low.

Fig. 11 Neighbor-joining tree based on beta-tubulin protein sequences of various eukaryotes, including thirteen new sequences of diplonemids (in bold). This unrooted BioNJ tree was constructed by calculating maximum-likelihood (ML) distances between pairs of sequences. Values on selected branches indicate neighbor-joining bootstrap support greater than 50%, and the bootstrap values of particular interest are in bold. Scale bar indicates amino acid substitutions per site. Node A is the last common ancestor of diplonemids and kinetoplastids and node B is the last common ancestor of the phyla Euglenozoa (shaded) and Heterolobosea (represented by Naegleria gruberi in this tree). Alternative positions for diplonemids were assessed with the Kishino-Hasegawa test at nodes with open circles (when Naegleria gruberi branches together with euglenoids) and filled circles (when Naegleria gruberi was moved out of the phylum Euglenozoa, as suggested by a dashed line). The five alternatives were not rejected at the 95% confidence level.

[Fig. 11: tree image. Labelled groups: diplonemids, kinetoplastids, euglenoids, Heterolobosea, plants, green algae, cryptomonads, alveolates, Cercozoa, heterokonts, animals, fungi, diplomonads, parabasals. Scale bar: 0.1.]

Considering the phylogenetic position of Naegleria gruberi within the Euglenozoa, and that its close association with euglenoids might affect the phylogenetic placement of diplonemids within this phylum, I did K-H tests on two alternative positions for the diplonemids with Naegleria gruberi branching with euglenoids within the phylum Euglenozoa ((D, (E, K)) and ((D, E), K), marked by open circles), and on three alternative positions for the diplonemids with Naegleria gruberi constrained outside the phylum Euglenozoa ((E, (D, K)), (D, (E, K)) and ((D, E), K), marked by filled circles). None of these alternative topologies was rejected at the 5% level (Table 4).

Tree            log L       difference   S.E.    Significantly worse
1 (K,(D,E))     -8698.30    3.22         5.84    no
2 (E,(D,K))     -8700.06    4.99         5.10    no
3 (D,(E,K))     -8695.08    0.00                 best tree

Table 2. Kishino-Hasegawa test of the positions of diplonemids within Euglenozoa in the actin tree. D-diplonemids, E-euglenoids, K-kinetoplastids.

Tree            log L       difference   S.E.    Significantly worse
1 (E,(D,K))     -8439.05    5.04         4.84    no
2 (D,(E,K))     -8434.01    0.00                 best tree
3 (K,(D,E))     -8437.58    3.57         5.57    no

Table 3. Kishino-Hasegawa test of the positions of diplonemids within Euglenozoa in the alpha-tubulin tree. D-diplonemids, E-euglenoids, K-kinetoplastids.

Tree            log L       difference   S.E.    Significantly worse
1 (E,(D,K))     -6424.70    0.00                 best tree
2 (D,(E,K))     -6436.15    11.45        8.51    no
3 (K,(D,E))     -6434.70    10.00        8.90    no
4 (E,(D,K))     -6425.10    0.41         10.26   no
5 (D,(E,K))     -6436.54    11.84        13.89   no
6 (K,(D,E))     -6438.95    14.26        13.46   no

Table 4. Kishino-Hasegawa test of the positions of diplonemids within Euglenozoa in the beta-tubulin tree. Trees 1-3 are the topologies of Euglenozoa with Naegleria gruberi branching with euglenoids. Trees 4-6 exclude Naegleria gruberi from the Euglenozoa. D-diplonemids, E-euglenoids, K-kinetoplastids.

In conclusion, beta-tubulin also supports a closer relationship between diplonemids and kinetoplastids than between diplonemids and euglenoids. However, this relationship is not supported by either bootstrap or K-H tests. Moreover, the validity of the phylogenetic position of diplonemids within the Euglenozoa suggested by the beta-tubulin tree is called into question by the inclusion of a member of a separate phylogenetic group within the Euglenozoa.

To summarize, among the three protein phylogenetic trees constructed in this study, the actin tree shows the strongest bootstrap support, not only for the phylum Euglenozoa, but also for the phylogenetic placement of diplonemids within this phylum.
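As an illustration only, the K-H comparisons in Table 4 can be re-checked with a short sketch. It assumes the commonly used decision rule that an alternative tree is significantly worse at the 5% level when its log-likelihood difference from the best tree exceeds 1.96 standard errors; the text does not state which criterion the software applied, and the names `KHRow`, `significantly_worse` and `summarize` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class KHRow:
    tree: str          # tree label as printed in Table 4
    delta_lnl: float   # log-likelihood difference from the best tree
    se: float          # standard error of that difference


def significantly_worse(row: KHRow, z: float = 1.96) -> bool:
    """Assumed rule: reject the alternative if the difference exceeds z standard errors."""
    return row.delta_lnl > z * row.se


# Values copied from Table 4 (beta-tubulin); the best tree (Tree 1) has no S.E.
TABLE4 = [
    KHRow("2 (D,(E,K))", 11.45, 8.51),
    KHRow("3 (K,(D,E))", 10.00, 8.90),
    KHRow("4 (E,(D,K))", 0.41, 10.26),
    KHRow("5 (D,(E,K))", 11.84, 13.89),
    KHRow("6 (K,(D,E))", 14.26, 13.46),
]


def summarize(rows):
    """Map each tree label to True if it would be rejected under the assumed rule."""
    return {row.tree: significantly_worse(row) for row in rows}


if __name__ == "__main__":
    for label, rejected in summarize(TABLE4).items():
        print(label, "rejected" if rejected else "not rejected")
```

Under this assumed rule none of the five alternatives is rejected, in line with the "no" entries in Table 4.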
The alpha-tubulin tree has little power to resolve the internal phylogeny of the phylum Euglenozoa, and the beta-tubulin tree does not support the Euglenozoa as a monophyletic phylum.

3.4  Lateral gene transfer indicated by GAPDH phylogeny

On the basis of a comparison of 290 alignable amino-acid positions of GAPDH from 100 taxa, a BioNJ tree was constructed (Fig. 12) using a distance and neighbor-joining analysis. This global tree includes not only diverse eukaryotic groups but also diverse prokaryotic groups. The resulting tree revealed a very complex picture of GAPDH gene evolution (Fig. 12) and recovered the basic relationships of the two separate classes of GAPDH sequences, GapC and GapA/B (divided by a dashed line in Fig. 9), typical of GAPDH phylogeny (Michels et al. 1991; Martin et al. 1993; Henze et al. 1995; Liaud et al. 1997).

The GapC clade (above the dashed line) includes the cytosolic GAPDH from most eukaryotes. In published global GAPDH trees, the GapC sequences of most eukaryotes form a sub-clade with unresolved internal relationships. This is also indicated in my global GAPDH tree by the lack of bootstrap support for the backbone of the GapC sub-clade. Moreover, my global GAPDH tree also shows that the gap1 sequences from a group of proteobacteria (including the gapA (=gap1) from E. coli) and a group of cyanobacteria are basal to this eukaryotic crown sub-clade, but separated from GapA/B by the GapC sequences from another eukaryotic phylum, Heterolobosea.

The GapA/B clade (below the dashed line) includes the GAPDH genes from most bacteria and the plastid-targeted GAPDH genes from photosynthetic eukaryotes. The eukaryotic plastid-targeted GapA/B sequences form a subclade that is closely related to the gap2 sequences from cyanobacteria, in keeping with the cyanobacterial origin of chloroplasts.
In order to carry out a more careful phylogenetic analysis than was feasible with 100 taxa (I did not perform the gamma-distribution correction on the distance matrices inferred from the 100-taxon GAPDH alignment), I constructed two smaller BioNJ GAPDH trees based on two sub-alignments. Both alignments retained the 290 alignable characters. One includes all 39 taxa in the GapA/B clade (below the dashed line) of the larger GAPDH tree (Fig. 13) and the other includes all 61 taxa in the GapC clade (above the dashed line) of the larger GAPDH tree (Fig. 14). The two smaller GAPDH trees have essentially the same branching order as the global GAPDH tree.

Fig. 12 Phylogeny of diverse eukaryotes and prokaryotes based on GAPDH protein sequences, including the four new sequences of diplonemids (in bold). This unrooted BioNJ tree was constructed by calculating maximum-likelihood (ML) distances between pairs of sequences. Values on selected branches indicate neighbor-joining bootstrap support greater than 50%, and the bootstrap values of particular interest are in bold. The dashed line divides the two classes of GAPDH: GapC (above the dashed line) and GapA/B (below the dashed line). Scale bar indicates amino acid substitutions per site. The five shaded regions include all the members of the phylum Euglenozoa in this tree. Node A unites diplonemid and cyanobacterial sequences, and node B unites a second copy of GAPDH of Diplonema sp. 2 and two sequences of cryptomonads.

[Fig. 12: tree image. Labelled groups include fungi, animals, cryptomonads, red algae, plants, alveolates, diplomonads, green algae, proteobacteria, kinetoplastids, euglenoids, Heterolobosea, cyanobacteria, Bacillus, firmicutes, diplonemids, Thermotogales/Thermus and Parabasalia.]

Fig. 13 GAPDH phylogeny of protein sequences of prokaryotes and some eukaryotes, including three new sequences of diplonemids (in bold). This unrooted BioNJ tree was constructed by calculating maximum-likelihood (ML) distances between pairs of sequences. Values on selected branches indicate neighbor-joining bootstrap support greater than 50%, and the bootstrap values of particular interest are in bold. For technical details see Material and Methods. Scale bar indicates amino acid substitutions per site. The number in front of the species name indicates a particular copy of the GAPDH from that species. The two shaded regions include all the members of the phylum Euglenozoa in this tree. Node A unites diplonemid sequences (Rhynchopus sp. 3, Diplonema sp. 2 gap1, Diplonema sp. 3), cyanobacterial gap3 sequences and one proteobacterial GAPDH sequence.

Fig. 14 GAPDH phylogeny of protein sequences of eukaryotes and some bacteria, including one new sequence from a diplonemid (in bold). This unrooted BioNJ tree was constructed by calculating maximum-likelihood (ML) distances between pairs of sequences. Values on selected branches indicate neighbor-joining bootstrap support greater than 50%. For technical details see Material and Methods. Scale bar indicates amino acid substitutions per site. The number in front of the species name indicates a particular copy of the GAPDH from that species. The four shaded regions include all the members of the phylum Euglenozoa in this tree. Node B unites Diplonema sp. 2 gap2 and the GapC of cryptomonads.

The phylogeny of the phylum Euglenozoa based on GAPDH sequences is far more complicated than those based on actin, alpha-tubulin or beta-tubulin sequences. The various euglenozoan GAPDH genes do not branch together as they do in these other trees: instead, euglenozoan sequences are scattered all over the GAPDH tree (see shaded regions of Fig. 12).
As in previous analyses (e.g. Michels et al. 1991; Martin et al. 1993; Henze et al. 1995; Liaud et al. 1997), the Euglena cytosolic/kinetoplastid glycosomal clade is basal to GapC, and to the gapA (=gap1) from proteobacteria and the gap1 from cyanobacteria. The cytosolic GAPDH genes of Leishmania mexicana and Trypanosoma brucei are extraordinarily close to Escherichia coli gapA (=gap1). The chloroplast GapA of Euglena gracilis branches at the base of the clade comprising the plastid-targeted GapA/B sequences of photosynthetic eukaryotes and the gap2 of cyanobacteria.

The phylogenetic positions of the four diplonemid GAPDH sequences from this study are intriguing. None of the four is closely related to any of the GAPDH sequences from other euglenozoans. Three of the four diplonemid GAPDH sequences (Rhynchopus sp. 3, Diplonema sp. 3, and one copy of the GAPDH sequence from Diplonema sp. 2) form a group (gap1) with 100% bootstrap support. This group, surprisingly, branches with the gap3 from cyanobacteria and one proteobacterial gap (Rhodobacterium). The union of Diplonema sp. 2, Diplonema sp. 3, and Rhynchopus sp. 3 gap1 with these bacterial GAPDHs is robust (supported by a 100% bootstrap value). The most reasonable explanation for this unusual association between prokaryotic and eukaryotic genes is interkingdom lateral gene transfer, much like that suggested to explain the extraordinary affinity between the L. mexicana and T. brucei GapC genes and the E. coli gapA (Michels et al. 1991).

The second copy of GAPDH from Diplonema sp. 2 (gap2) is not closely related to the other three diplonemid sequences, nor is it closely related to the other euglenozoan GAPDH sequences. Instead, it branches weakly with cytosolic GAPDH sequences from cryptomonads, which branch with animals and fungi in Fig. 12 and Fig. 14, with very low bootstrap support (31%).
The low bootstrap support makes the phylogenetic placement of the second copy of GAPDH from Diplonema sp. 2 very tentative, but it is clear that this copy is not related to the diplonemid gap1 genes.

CHAPTER IV:  Discussion

In an effort to address questions about the evolutionary history of introns in the phylum Euglenozoa, I have sequenced twenty-nine nuclear-encoded genes from nine different diplonemids. I discovered eleven introns in nine of the twenty-nine genes. I have also inferred phylogenetic trees from three of these protein genes (actin, alpha- and beta-tubulin), including the new diplonemid sequences, in an attempt to reconstruct the evolutionary history of the euglenozoan introns. In order to gain a better understanding of the GAPDH phylogeny of the phylum Euglenozoa, I also constructed a global GAPDH tree, including four sequences from diplonemids. The resulting phylogenetic positions of the diplonemid sequences were unexpected, which makes the GAPDH phylogeny of the Euglenozoa even more intriguing.

4.1  Phylogeny of the Euglenozoa

The actin, alpha- and beta-tubulin trees constructed in this study confirm that diplonemids represent a third group in the phylum Euglenozoa, along with euglenoids and kinetoplastids (there are no molecular data on Postgaardi), as previously proposed on the basis of morphological (Triemer et al. 1990; Triemer et al. 1991b; Simpson 1997) and molecular phylogenetic evidence (Maslov et al. 1999). While some phylogenetic relationship between diplonemids and the other two euglenozoan groups seems certain, the phylogenetic relationships among the three groups have never been clear. There are three possible topologies for a tree of three lineages (Fig. 15). The topology of the actin tree I constructed with the two diplonemid sequences obtained in this study suggested that diplonemids are more closely related to euglenoids than to kinetoplastids.
On the other hand, phylogenetic analyses of alpha- and beta-tubulin, including many new diplonemid sequences, weakly support a different topology, in which diplonemids are more closely related to kinetoplastids than to euglenoids. The third topology, in which diplonemids are at the base of the phylum and euglenoids and kinetoplastids are closer to each other, has not been supported by any phylogenetic tree constructed in this study or in previous studies (Maslov et al. 1999).

In order to assess the reliability of the different topologies suggested by different protein trees, I did bootstrap analysis (for details, see Results). Bootstrap analysis for the actin tree gives 79% support for the union of diplonemids and euglenoids, while neither the alpha- nor the beta-tubulin tree gives strong bootstrap support for the node uniting the diplonemids and kinetoplastids (only 40% and 64%, respectively). Moreover, the reliability of the beta-tubulin tree is questionable because the phylum Euglenozoa is not holophyletic in it: the last common ancestor of the three major euglenozoan groups (diplonemids, kinetoplastids and euglenoids) is also an ancestor of Naegleria gruberi. It is well known that Naegleria gruberi belongs to a related but phylogenetically distinct group, the Heterolobosea. The separation of the Heterolobosea and the Euglenozoa is supported by nearly all known molecular phylogenies, including the actin and alpha-tubulin trees I constructed. Taken together, the topology of the actin tree, in which diplonemids are closer to euglenoids, is probably more reliable than that of the tubulin trees. This conclusion agrees with the topologies suggested by distance and parsimony trees based on the sequences of the SSU rRNA gene and the CoxI (cytochrome c oxidase subunit I) protein, but differs from maximum likelihood trees based on the same molecules (Maslov et al. 1999).
Maximum likelihood has proven to be a powerful method of phylogenetic reconstruction. However, as it is a complicated and computationally heavy process, the number of taxa that can be sampled in maximum likelihood analyses is very limited, especially when the data are composed of protein sequences. The limited sampling raises doubts as to whether the phylogenetic positions of diplonemids in the maximum likelihood trees of Maslov et al. (1999) are reliable. Moreover, none of the phylogenetic positions of diplonemids suggested by the maximum likelihood trees (Maslov et al. 1999) is well supported by bootstrap analysis: in the maximum likelihood CoxI protein tree, the bootstrap support for the diplonemid/kinetoplastid branch is very low (56%), and in the maximum likelihood SSU rRNA tree it is even lower, less than 50%.

Compared with the phylogenetic analysis conducted by Maslov et al. (1999), the phylogenetic analysis I performed has three advantages. First, the sample is comparatively large: the number of diplonemid sequences is considerably increased, especially in the alpha- and beta-tubulin trees (10 and 13 diplonemid sequences, respectively). Second, the number of outgroups is also greatly increased: all three protein trees (actin, alpha-tubulin and beta-tubulin) contain a variety of phylogenetically distant groups. Thus, I may have avoided the bias caused by the choice of outgroups. Different choices of outgroup sequences may lead to variable support for the ingroup phylogeny. This was indeed noticed by Maslov et al. (1999) in their phylogenetic analysis of the Euglenozoa: in the maximum likelihood analysis of SSU rRNA, they found that the choice of Giardia lamblia and Vairimorpha necatrix as outgroups greatly increased the bootstrap support for the association of diplonemids and kinetoplastids, compared to the bootstrap support obtained when choosing Physarum polycephalum and Saccharomyces cerevisiae as outgroups.
However, they pointed out that this seemingly high bootstrap support might be caused by the biased nucleotide composition or fast substitution rates in Giardia lamblia and Vairimorpha necatrix. Third, my analyses were performed at the amino acid level, rather than on alignments of DNA sequences as in the SSU rRNA analyses conducted by Maslov et al. (1999). This is also an advantage because an amino acid substitution may be more evolutionarily informative than a nucleotide substitution, especially when the nucleotide change is synonymous.

On the other hand, although the phylogenetic analysis conducted in this study probably favours a closer association of diplonemids with euglenoids, in each case the difference between the best tree and the alternative trees was not significant at the 95% confidence level, as inferred by the K-H test (see Results). In order to further assess the reliability of the union of diplonemids and euglenoids, it may be helpful to perform a combined analysis, in which actin, alpha-tubulin and beta-tubulin sequences are combined into a single alignment and a phylogenetic tree is constructed from this alignment. In addition, it might be helpful to try different combinations of outgroups chosen from the actin, alpha- and beta-tubulin trees in maximum likelihood analyses with the newly obtained protein sequences of diplonemids.

4.2  Possible origins of the intron-types in the Euglenozoa

As mentioned before, three types of introns (conventional GT-AG spliceosomal introns, trans-spliced discontinuous introns, and "aberrant" introns) have been reported in the phylum Euglenozoa.
Since conventional spliceosomal introns are seemingly rare in the Euglenozoa, trans-spliced discontinuous introns are rarely found outside this phylum, and the "aberrant" introns are unique to photosynthetic euglenoids, it would be interesting to determine the distribution of intron types in the third major lineage of the Euglenozoa, the diplonemids. In this section, I discuss the possible origins of the three types of intron based on the available information on the distribution of the intron types in the Euglenozoa, and on the internal phylogeny of this phylum discussed in the preceding section.

GT-AG spliceosomal introns

Among the twenty-nine newly sequenced nuclear-encoded genes (actin, alpha-tubulin, beta-tubulin and GAPDH), eleven GT-AG introns were found in nine genes. Thus, GT-AG introns seem to be frequently present in the actin, alpha-tubulin, beta-tubulin and GAPDH genes of diplonemids. As mentioned in the Introduction, conventional GT-AG introns are very rare in euglenoids and altogether absent from the actin and tubulin genes of Euglena gracilis. The apparent rarity of GT-AG introns in euglenoids could be due to limited sequence sampling. When more nuclear genes from different euglenoids are examined, more GT-AG introns could be discovered. In fact, three GT-AG spliceosomal introns have recently been reported in the fibrillarin gene of Euglena gracilis (Breckenridge et al. 1999), in addition to one GT-AG spliceosomal intron in a beta-tubulin gene of Entosiphon sulcatum (Ebel et al. 1999). Because GT-AG spliceosomal introns have been detected in the nuclear genes of most eukaryotes, including the closest relatives of the Euglenozoa, the phylum Heterolobosea (Remillard et al. 1995), it is reasonable to think that GT-AG spliceosomal introns already existed in the ancestor of the Euglenozoa.
Thus, the rarity of GT-AG spliceosomal introns in kinetoplastids and euglenoids is likely due to a high frequency of intron loss.

Trans-spliced discontinuous introns

Trans-splicing occurs abundantly in the post-transcriptional processing of pre-mRNA in both kinetoplastids and euglenoids (see Introduction), but no information is available on whether this process is present in diplonemids. However, by combining the internal phylogeny and the known distribution of trans-splicing within the phylum Euglenozoa, it is possible to make predictions regarding the origin of this unusual process.

There are three possible topologies to describe the phylum Euglenozoa (Fig. 15). In my phylogenetic analysis based on actin, alpha- and beta-tubulin sequences, only two of the three topologies were ever recovered (Fig. 15 A and Fig. 15 B), while the third possible topology, which places diplonemids at the base of the Euglenozoa (Fig. 15 C), was supported neither by my protein trees (actin, alpha- and beta-tubulin trees) nor by any other phylogenetic analysis conducted so far (Maslov et al. 1999). Either of the two plausible euglenozoan phylogenies, that which unites diplonemids with euglenoids or, alternatively, with kinetoplastids, implies that trans-splicing arose in the common ancestor of all Euglenozoa. If this is true, then all three major groups of Euglenozoa, including diplonemids, should contain trans-spliced, discontinuous introns. On the other hand, if one accepts the third topology of this phylum (with diplonemids basal) and then considers the known distribution of trans-splicing within Euglenozoa, there are two possible origins of trans-splicing (Fig. 15 C): it originated either in the common ancestor of this phylum or, alternatively, in the common ancestor of euglenoids and kinetoplastids after the separation of the diplonemid lineage.
Since the third topology of the Euglenozoa is not supported in any of the phylogenetic analyses, trans-splicing is highly likely to be an ancestral character of the phylum Euglenozoa, and should therefore also be found in diplonemids.

"Aberrant" introns

In nine of the twenty-nine nuclear-encoded genes sequenced from diplonemids, I discovered eleven GT-AG introns. None of them resembles the "aberrant" introns unique to Euglena gracilis. In kinetoplastids, over 4000 protein sequences are available in GenBank at present, and none of them contains any such "aberrant" intron either. Therefore, it is tempting to speculate that "aberrant" introns are a derived character unique to euglenoids. They are perhaps unique to photosynthetic euglenoids, or they may even be specific to Euglena gracilis, since all thirty "aberrant" introns reported so far are from Euglena gracilis: 26 from two different nuclear-encoded chloroplast-targeted genes and four from the nuclear-encoded, cytosolic GAPDH gene (Tessier et al. 1992; Muchhal et al. 1994; Henze et al. 1995).

TOPOLOGY   MOLECULAR EVIDENCE
A          Actin (distance); SSU rRNA, COI (distance and parsimony) (Maslov et al. 1999)
B          Tubulins (distance); SSU rRNA, COI (maximum likelihood) (Maslov et al. 1999)
C          None

Fig. 15 Three possible topologies (A, B, C) for the internal phylogeny of the Euglenozoa. E-euglenoids; D-diplonemids; K-kinetoplastids. Arrows point at the most likely origin of either "aberrant" introns or trans-splicing. "?" indicates the uncertainty of the most likely origin of trans-splicing between the two sites: either before or after the divergence of the diplonemid lineage from other euglenozoans.
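The topology argument made above for trans-splicing can be sketched in a few lines. The sketch assumes, as the argument itself does, that trans-splicing arose only once and is homologous in euglenoids (E) and kinetoplastids (K); under that assumption the trait must date back at least to the last common ancestor of E and K. The nested-tuple encoding of the Fig. 15 topologies and the function names are hypothetical conveniences, not part of any analysis in this study.

```python
# Rooted three-taxon topologies of Fig. 15, written as nested tuples.
TOPOLOGIES = {
    "A": ("K", ("D", "E")),   # diplonemids with euglenoids
    "B": ("E", ("D", "K")),   # diplonemids with kinetoplastids
    "C": ("D", ("E", "K")),   # diplonemids basal
}

ANCESTRAL = "common ancestor of all Euglenozoa"
AMBIGUOUS = "common ancestor of all Euglenozoa, or of euglenoids + kinetoplastids only"


def leaves(tree):
    """Return the set of taxon labels below a node."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def mrca_is_root(tree, a, b):
    """True if taxa a and b sit in different subtrees of the root."""
    return not any({a, b} <= leaves(child) for child in tree)


def trans_splicing_origin(tree):
    """Latest possible single origin of a trait known from E and K."""
    if mrca_is_root(tree, "E", "K"):
        return ANCESTRAL
    return AMBIGUOUS


if __name__ == "__main__":
    for name, tree in TOPOLOGIES.items():
        print(name, "->", trans_splicing_origin(tree))
```

Only topology C leaves the point of origin open, which is why the prediction for diplonemids depends on rejecting that topology.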
4.3  Features of diplonemid introns

Although there were no "aberrant" introns in any of the 29 newly sequenced diplonemid genes, the eleven GT-AG introns from diplonemids do share four unusual features when compared with conventional GT-AG introns, especially those of mammals, which are the best studied.

First, although the lengths of these introns are not unusual among protist introns, they are relatively short when compared with the GT-AG spliceosomal introns of mammals. They range in size from 40 to 149 nt, whereas the GT-AG spliceosomal introns in mammals generally range from 80 to 10000 nucleotides or more.

Second, the 5' splice consensus sequence of a typical diplonemid intron differs by one nucleotide from that of a typical mammalian intron. As observed by previous researchers, the consensus sequences of a mammalian GT-AG spliceosomal intron are G/GURAGY at the 5' splice site and CAG/ at the 3' splice site (a slash marks the cleavage site; R represents purine; Y represents pyrimidine; N can be any nucleotide: Umen et al. 1995). In diplonemid introns, the 5' splice site consensus is G/GURUGY, while the 3' splice site consensus is the same as that of mammalian introns. The consensus sequences at the 5' and 3' splice sites of the eleven diplonemid introns are thus very similar to those of mammalian introns, the only difference being the fourth position of the intron at the 5' splice site: it is a U in the diplonemid introns but an A in animal introns. This is consistent with a previous finding in euglenoids: the conserved sequences at the 5' splice sites of all the euglenoid GU-AG conventional introns found so far (three introns in the fibrillarin gene of Euglena gracilis and one intron in the beta-tubulin gene of Entosiphon sulcatum) also have this single nucleotide substitution (Breckenridge et al. 1999; Ebel et al. 1999).
In animal introns, the consensus region at the 5' splice site is recognised through complementary base pairing by U1 snRNA (Sharp 1987). It has been shown that in euglenoids, the highly conserved 5' extremity of the U1 sequence contains one compensatory substitution (U to A) at the fourth position (Ebel et al. 1999; Breckenridge et al. 1999). Therefore, one would expect an analogous compensatory change at the 5' extremity of the U1 snRNA of diplonemids.

A third unusual feature is that no conventional branchpoint site can be clearly identified in these diplonemid introns. I searched the 11 diplonemid introns for the branchpoint consensus sequences of both mammalian introns (5'-YNYURACN-3') and yeast introns (5'-TACTAACA-3'); in each consensus the branchpoint adenosine is the A directly 5' of the conserved C near the 3' end (Umen et al. 1995). As mentioned in the Results, the branchpoint consensus sequence of yeast introns was not present in any of the 11 diplonemid introns. On the other hand, the branchpoint consensus sequence of mammalian introns was observed six times in five of the 11 diplonemid introns (twice in one intron). However, in four of the six cases, this sequence was observed either between positions +4 and +11 or between positions +9 and +16 (relative to the 5' cleavage site) of the introns. These can hardly be true branchpoint sites, since they are too close to the 5' splice sites of the introns. The branchpoint site is known to be closer to the 3' splice site than to the 5' splice site of an intron, usually 15-40 nucleotides upstream of the 3' splice site (Umen et al. 1995). It is highly unlikely that the branchpoint site would be present so close to the 5' splice site, or even overlapping with the 5' splice site consensus sequence. In the remaining two cases, which come from two different diplonemid genes, the branchpoint consensus sequence was observed once between positions -23 and -16 and once between positions -36 and -29.
These two sites are also unlikely to be true branchpoint sites, for two reasons. First, if they represented real branchpoint sites in diplonemid introns, then they should be observed in the other nine diplonemid introns as well. Second, the mammalian branchpoint consensus sequence is highly degenerate. Among the eight nucleotides YNYTRACN, only three are specific, so the chance of finding such a run of eight nucleotides within any stretch of sequence is comparatively high (with equal base frequencies, a random eight-nucleotide window matches with probability (1/2)^3 x (1/4)^3 = 1/512). In short, the branchpoint consensus sequence of a diplonemid intron may be different from that of a mammalian intron or a yeast intron.
Analysis of the 14 alignable nucleotides at the 3' splice sites of the eleven diplonemid introns reveals another unusual feature. The 11-nucleotide regions preceding ACAG/ in diplonemid introns are generally CA-rich (see Results). In a conventional GT-AG intron, especially in mammals, there is a polypyrimidine tract between the branchpoint region and the 3' splice site (Umen and Guthrie 1995). Previous experiments have demonstrated that the polypyrimidine tract in mammalian introns provides recognition sites for a splicing factor (PSF) and a negative regulatory factor, pyrimidine tract binding protein (PTB) (Gerke and Steitz 1986; Tazi et al. 1986; Singh et al. 1995). The binding of PSF to the polypyrimidine tract is essential for both splicing steps (Gerke and Steitz 1986; Tazi et al. 1986; Singh et al. 1995; Umen and Guthrie 1995). It has also been demonstrated that PSF has strong RNA-sequence preferences (Singh et al. 1995). PTB acts as a negative regulator of splicing by binding to the pyrimidine tract and thus preventing the binding of PSF (Singh et al. 1995). The CA-rich regions adjacent to the consensus CAG/ at the 3' splice site raise the possibility that the role of this region in diplonemid introns is different from its role in other introns.
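The contrast between a CA-rich region and a polypyrimidine (CT-rich) tract can be illustrated with a simple base count. The following is only a sketch: the function names, the 0.75 threshold and the three 11-nucleotide example sequences are assumptions made for illustration and are not taken from the diplonemid data.

```python
from collections import Counter

def base_composition(region):
    """Count A, C, G and T in a sequence (U is treated as T)."""
    counts = Counter(region.upper().replace("U", "T"))
    return {base: counts.get(base, 0) for base in "ACGT"}

def classify_tract(region, threshold=0.75):
    """Label a region CA-rich, CT-rich or mixed.

    The threshold is an arbitrary illustrative cut-off, not a published value.
    """
    comp = base_composition(region)
    total = sum(comp.values())
    if total == 0:
        return "mixed"
    ca = (comp["C"] + comp["A"]) / total
    ct = (comp["C"] + comp["T"]) / total
    if ca >= threshold and ca > ct:
        return "CA-rich"
    if ct >= threshold and ct > ca:
        return "CT-rich"
    return "mixed"

# Hypothetical 11-nucleotide regions, invented for this example.
examples = {
    "ca_like": "CACCAACACAC",
    "ct_like": "CTTCCTTTCTC",
    "mixed_like": "GATCGATCGAT",
}
labels = {name: classify_tract(seq) for name, seq in examples.items()}
for name, label in labels.items():
    print(name, base_composition(examples[name]), label)
```

Because C contributes to both fractions, the classification depends on the balance of A against T among the remaining residues, which is the distinction drawn in the text.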
This CA-rich region is absent at the 3' splice site of the GT-AG intron in the beta-tubulin gene from the colourless euglenoid Entosiphon sulcatum, where a typical polypyrimidine tract is present instead (Ebel et al. 1999). Among the three GU-AG introns in the fibrillarin gene from Euglena gracilis (Breckenridge et al. 1999), introns A and C have CT-rich tracts rather than CA-rich tracts at their 3' splice sites, while intron B seems to have a weakly CA-rich tract: four A, four C, two T and two G residues preceding the CAG/. If introns with both CA-rich and polypyrimidine tracts can exist in the same pre-mRNA transcript, then it is possible that a splicing factor in euglenoids could recognise both the CA-rich tract and the CT-rich tract at the 3' splice site. It is also possible that the splicing factor in diplonemids has the same dual function, since the 11-nucleotide region preceding ACAG/ in one of the 11 diplonemid introns is clearly CT-rich (Alpha-rh2 in Fig. 8).
In summary, introns seem to be more common in nuclear-encoded genes of diplonemids than in those of either euglenoids or kinetoplastids. The eleven diplonemid introns discovered in this study are all GT-AG introns. However, they differ from classical GT-AG spliceosomal introns in four ways: 1) They are short. 2) Nearly all the diplonemid introns possess a T residue at the fourth position at their 5' splice sites. 3) They lack the branchpoint consensus sequence of both mammalian introns and yeast introns. 4) There is a CA-rich region of 12 nucleotides preceding CAG/ at the 3' splice site of a typical diplonemid intron. All these differences suggest that the spliceosomes in diplonemids might be slightly different from the comparatively well-studied spliceosomes of other eukaryotes.
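The branchpoint search discussed in this section, and the argument that a degenerate consensus will often match by chance, can be sketched as follows. This is a minimal illustration and not the procedure actually used in the thesis: the function names are hypothetical, the toy intron is invented, and the chance estimate assumes equal frequencies of the four bases.

```python
import re
from fractions import Fraction

# IUPAC codes appearing in the consensus sequences (U is treated as T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "N": "ACGT",
}

MAMMALIAN = "YNYURACN"  # mammalian branchpoint consensus quoted in the text
YEAST = "TACTAACA"      # yeast branchpoint consensus quoted in the text

def find_motif(intron, motif):
    """Return 1-based start positions of all matches, counted from the 5' end."""
    seq = intron.upper().replace("U", "T")
    pattern = "".join("[" + IUPAC[code] + "]" for code in motif.upper())
    # A zero-width lookahead lets overlapping matches be reported too.
    return [m.start() + 1 for m in re.finditer("(?=" + pattern + ")", seq)]

def position_from_3prime(start, intron_length):
    """Convert a 1-based position to a negative offset (last nucleotide = -1)."""
    return start - intron_length - 1

def chance_per_window(motif):
    """Probability that a random window matches, assuming equal base frequencies."""
    p = Fraction(1)
    for code in motif.upper():
        p *= Fraction(len(IUPAC[code]), 4)
    return p

def expected_chance_hits(intron_length, motif):
    """Expected number of chance matches in an intron of the given length."""
    windows = max(intron_length - len(motif) + 1, 0)
    return windows * chance_per_window(motif)

# Hypothetical 20-nt intron, invented for this example (not a diplonemid sequence).
toy_intron = "GTATGTCTCTAACAAAACAG"
hits = find_motif(toy_intron, MAMMALIAN)
print("mammalian consensus starts at:", hits)
print("relative to the 3' end:", [position_from_3prime(h, len(toy_intron)) for h in hits])
print("yeast consensus starts at:", find_motif(toy_intron, YEAST))
print("chance per window:", chance_per_window(MAMMALIAN), chance_per_window(YEAST))
for length in (40, 149):  # shortest and longest intron sizes reported in the text
    print(length, "nt:", float(expected_chance_hits(length, MAMMALIAN)))
```

Under the equal-frequency assumption the mammalian consensus matches a random window with probability 1/512, against 1/65536 for the fully specified yeast sequence, which is the sense in which the mammalian consensus is easy to find by chance.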
4.4  Evolutionary origin of diplonemid GAPDH
The phylogenetic positions of the four diplonemid GAPDH sequences obtained in this study (see Fig. 12) are unexpected, as none of the diplonemid sequences branches with any of the three known groups of euglenozoan sequences (the kinetoplastid glycosome/Euglena cytosol clade, the chloroplast GapA of Euglena gracilis, or the Leishmania mexicana and Trypanosoma brucei cytosolic clade). Instead, the gap1 sequences of three diplonemids (Rhynchopus sp. 3, Diplonema sp. 3 and Diplonema sp. 2) branch within the GapA/B clade, specifically with the gap3 genes of cyanobacteria, while the Diplonema sp. 2 gap2 sequence branches with the cytosolic GAPDH genes of eukaryotes, specifically with those from cryptomonads. The three gap1 sequences are much more similar to each other than to the gap2 sequence from Diplonema sp. 2 (for details, see Results). In addition, they share five insertions that are found neither in the gap2 sequence from Diplonema sp. 2 nor in any other GAPDH sequence examined (see Fig. 7 and Results).
I suggest that the sequence differences between the three gap1 sequences and the gap2 sequence were largely caused by their different evolutionary histories, rather than being evolutionary consequences of different localizations or different functions within the cell. This is because it seems likely that both types of GAPDH in diplonemids are NAD-specific, possibly playing roles in catabolic glycolysis in the cytosol. As mentioned in the Introduction, the amino acid at position 32 of GAPDH is considered an important indicator of the relative specificity of GAPDH for NAD or NADP as a substrate.
The amino acid at position 32 of Diplonema gap2 is aspartic acid (D), the same as in nearly all other NAD-specific cytosolic GAPDH enzymes found in different prokaryotes and eukaryotes, while in the diplonemid gap1 sequences the same position is occupied by glutamic acid (E), which is also shared by gap3 in A. variabilis (see Fig. 7). Comparative studies of the substrate-binding properties of various mutants have suggested that replacing Asp32 (D32) by glutamic acid (E) will not compromise activity with NAD, but that both residues prevent activity with NADP (Clermont et al. 1993). This means that both types of GAPDH in diplonemids are likely NAD-specific. Since cytosolic GAPDH is NAD-specific and chloroplast GAPDH uses both NAD and NADP, it is likely that both diplonemid gap1 and gap2 perform the same role (catabolic) in the same location (cytosol).
In the global GAPDH tree (Fig. 12), the three gap1 sequences of three different diplonemids (Diplonema sp. 2, Diplonema sp. 3 and Rhynchopus sp. 3) group robustly with the cyanobacterial gap3 clade, with 100% bootstrap support. It is unlikely that the diplonemid gap1 genes come from a cyanobacterial contaminant, since these three gap1 sequences were isolated from three independently grown axenic cultures (see Materials and Methods). Contamination would therefore require three related but different cyanobacteria to have contaminated the three diplonemid cultures, which is highly unlikely. Since it is not contamination, lateral gene transfer is the only way to explain why diplonemids have eubacterial genes, and the close and strongly supported relationship with cyanobacterial gap3 suggests that diplonemids acquired their gap1 from a cyanobacterium through horizontal gene transfer. Lateral gene transfer has been cited previously to explain unusual associations observed in GAPDH phylogeny.
In addition to the GAPDH genes of parabasalids mentioned in the Introduction, another case is the extraordinarily close relationship between the cytosolic GapC sequences of T. brucei and L. mexicana and the E. coli gap1 sequence. Michels et al. (1991) and Martin et al. (1993) have postulated that the ancestor of the trypanosome lineage received this gene by a prokaryote-to-eukaryote lateral gene transfer from an E. coli-like ancestor relatively recently in evolution. Those kinetoplastids that separated early in evolution from the trypanosome lineage (such as the bodonid Trypanoplasma borelli) possess only the glycosomal GAPDH gene, and lack the cytosolic genes found in "higher" kinetoplastids (Michels et al. 1992). Therefore, Henze et al. (1995) concluded that the genes for cytosolic GAPDH in kinetoplastids provide evidence for an evolutionarily recent gene transfer.
The second copy of GAPDH from Diplonema sp. 2 (gap2) differs considerably from the three diplonemid gap1 genes in sequence and appears from the phylogeny to be unrelated to them (see Results). In the GAPDH tree (Fig. 12), this gap2 from Diplonema sp. 2 falls into the eukaryotic crown taxa (the GapC clade), and branches specifically with the cytosolic GAPDH genes from cryptomonads. These are in turn closely related to those of animals and fungi, but very distant from the other diplonemid GAPDH sequences and from the GAPDH sequences of either kinetoplastids or euglenoids. The association of gap2 with the cryptomonad GAPDH sequences is very weak (31% bootstrap support); however, it is clear that the diplonemid gap2 sequence is not related to any other euglenozoan GAPDH sequence. The origin of this GAPDH and its evolutionary relationships to other euglenozoan GAPDH sequences are difficult to predict, since the phylogenetic position of Diplonema sp. 2 gap2 is so tentative.
However, the presence of a Diplonema GAPDH sequence within the eukaryotic crown taxa (GapC sub-clade) raises a tantalising question: could this be descended from the original GapC of the Euglenozoa, which is now lost in both euglenoids and kinetoplastids?
In summary, three points can be inferred from this GAPDH phylogeny of the phylum Euglenozoa: 1) The GAPDH phylogeny of the phylum Euglenozoa is complex. There are two distinct types of GAPDH enzymes in each of the three major groups: euglenoids, kinetoplastids and diplonemids. Apart from the cytosolic GAPDH of Euglena and the glycosomal GAPDH of kinetoplastids, which group together, the remaining four types of GAPDH sequences (chloroplast GapA in Euglena gracilis, cytosolic GAPDH in trypanosomes, cyanobacteria-related GAPDH in diplonemids, and the Diplonema sp. 2 gap2) are scattered all over the global GAPDH tree. 2) The extraordinarily close association of the Diplonema sp. 2, Diplonema sp. 3 and Rhynchopus sp. 3 GAPDH sequences with the gap3 sequences of cyanobacteria suggests an inter-domain horizontal gene transfer from a prokaryotic to a eukaryotic genome. 3) A different copy of diplonemid GAPDH (Diplonema sp. 2 gap2) branches with the GapC sequences of other eukaryotes, and may represent the ancestral euglenozoan GAPDH.

References
Agabian, N. 1990. Trans splicing of nuclear pre-mRNAs. Cell 61:1157-60.
Biesecker, G., J. I. Harris, J. C. Thierry, J. E. Walker, and A. J. Wonacott 1977. Sequence and structure of D-glyceraldehyde 3-phosphate dehydrogenase from Bacillus stearothermophilus. Nature 266:328-33.
Blumenthal, T., and J. Thomas 1988. Cis and trans mRNA splicing in C. elegans. TIG 4:305-8.
Borst, P., and B. W. Swinkels 1989. The evolutionary origin of glycosomes: how glycolysis moved from cytosol to organelle. In Evolutionary tinkering in gene expression (Grunberg-Manago, M., Clark, B. F., Zachau, H. G., eds.) pp. 163-74, Plenum Publishing Corporation, New York.
Breckenridge, D. G., Y. Watanabe, S. J. Greenwood, M. W. Gray, and M. N. Schnare 1999. U1 small nuclear RNA and spliceosomal introns in Euglena gracilis. Proc Natl Acad Sci USA 96:852-6.
Cavalier-Smith, T. 1981. Eukaryote kingdoms: seven or nine? Biosystems 14:461-81.
Clermont, S., C. Corbier, Y. Mely, D. Gerard, A. Wonacott, and G. Branlant 1993. Determinants of coenzyme specificity in glyceraldehyde-3-phosphate dehydrogenase: role of the acidic residue in the fingerprint region of the nucleotide binding fold. Biochemistry 32:10178-84.
Davis, R. E. 1997. Surprising diversity and distribution of spliced leader RNAs in flatworms. Mol Biochem Parasitol 87:29-48.
Ebel, C., C. Frantz, F. Paulus, and P. Imbault 1999. Trans-splicing and cis-splicing in the colourless Euglenoid, Entosiphon sulcatum. Curr Genet 35:542-50.
Farmer, M. A., and R. E. Triemer 1988. Flagellar systems in the euglenoid flagellates. Biosystems 21:283-91.
Felsenstein, J. 1993. PHYLIP (phylogeny inference package). Distributed by the author, Department of Genetics, University of Washington, Seattle, Version 3.57c.
Gascuel, O. 1997. BioNJ: an improved version of the NJ algorithm based on a simple model of sequence data. Mol Biol Evol 14:685-95.
Gerke, V., and J. A. Steitz 1986. A protein associated with small nuclear ribonucleoprotein particles recognizes the 3' splice site of premessenger RNA. Cell 47:973-84.
Gibbs, S. P. 1978. The chloroplasts of Euglena may have evolved from symbiotic green algae. Can J Bot 56:2883-9.
Henze, K., A. Badr, M. Wettern, R. Cerff, and W. Martin 1995. A nuclear gene of eubacterial origin in Euglena gracilis reflects cryptic endosymbioses during protist evolution. Proc Natl Acad Sci USA 92:9122-6.
Keeling, P. J., and W. F. Doolittle 1996. Alpha-tubulin from early-diverging eukaryotic lineages and the evolution of the tubulin family. Mol Biol Evol 13:1297-305.
Kishino, H., and M. Hasegawa 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J Mol Evol 29:170-9.
Laird, P. W. 1989. Trans splicing in trypanosomes--archaism or adaptation? Trends Genet 5:204-8.
Lee, J. J., S. H. Hutner, and E. C. Bovee 1985. Order 11. Kinetoplastida, pp. 141-55, in An illustrated guide to the protozoa. Allen Press, Lawrence, USA.
Liaud, M. F., U. Brandt, M. Scherzinger, and R. Cerff 1997. Evolutionary origin of cryptomonad microalgae: two novel chloroplast/cytosol-specific GAPDH genes as potential markers of ancestral endosymbiont and host cell components. J Mol Evol 44 Suppl 1:S28-37.
Mair, G., H. Shi, H. Li, A. Djikeng, H. O. Aviles, J. R. Bishop, F. H. Falcone, C. Gavrilescu, J. L. Montgomery, M. I. Santori, L. S. Stern, Z. Wang, E. Ullu, and C. Tschudi 2000. A new twist in trypanosome RNA metabolism: cis-splicing of pre-mRNA. RNA 6:163-9.
Markos, A., A. Miretsky, and M. Muller 1993. A glyceraldehyde-3-phosphate dehydrogenase with eubacterial features in the amitochondriate eukaryote, Trichomonas vaginalis. J Mol Evol 37:631-43.
Martin, W., H. Brinkmann, C. Savonna, and R. Cerff 1993. Evidence for a chimeric nature of nuclear genomes: eubacterial origin of eukaryotic glyceraldehyde-3-phosphate dehydrogenase genes. Proc Natl Acad Sci USA 90:8692-6.
Maslov, D. A., S. Yasuhira, and L. Simpson 1999. Phylogenetic affinities of Diplonema within the Euglenozoa as inferred from the SSU rRNA gene and partial COI protein sequences. Protist 150:33-42.
Michels, P. A., and V. Hannaert 1994. The evolution of kinetoplastid glycosomes. J Bioenerg Biomembr 26:213-9.
Michels, P. A. M., F. R. Opperdoes, V. Hannaert, E. A. C. Wiemer, S. Allert, and N. Chevalier 1992. Phylogenetic analysis based on glycolytic enzymes. Belg Journ Bot 125:164-73.
Michels, P. A., M. Marchand, L. Kohl, S. Allert, R. K. Wierenga, and F. R. Opperdoes 1991. The cytosolic and glycosomal isoenzymes of glyceraldehyde-3-phosphate dehydrogenase in Trypanosoma brucei have a distant evolutionary relationship. Eur J Biochem 198:421-8.
Muchhal, U. S., and S. D. Schwartzbach 1994. Characterization of the unique intron-exon junctions of Euglena gene(s) encoding the polyprotein precursor to the light-harvesting chlorophyll a/b binding protein of photosystem II. Nucleic Acids Res 22:5737-44.
Nilsen, T. W. 1995. trans-splicing: an update. Mol Biochem Parasitol 73:1-6.
Nilsen, T. W. 1994. Unusual strategies of gene expression and control in parasites. Science 264:1868-9.
Nilsen, T. W. 1989. Trans-splicing in nematodes. Exp Parasitol 69:413-6.
Opperdoes, F. R. 1987. Compartmentation of carbohydrate metabolism in trypanosomes. Annu Rev Microbiol 41:127-51.
Opperdoes, F. R., and P. A. M. Michels 1989. Biogenesis and evolutionary origin of peroxisomes. In Organelles in eukaryotic cells: molecular structure and interactions (Tager, J. M., Azzi, A., Papa, S. and Guerrieri, F., eds) pp. 187-95, Plenum Publishing Corporation, New York.
Remillard, S. P., E. Y. Lai, Y. Y. Levy, and C. Fulton 1995. A calcineurin-B-encoding gene expressed during differentiation of the amoeboflagellate Naegleria gruberi contains two introns. Gene 154:39-45.
Sambrook, J., E. F. Fritsch, and T. Maniatis 1989. Small-scale preparations of plasmid DNA. In Molecular cloning (a laboratory manual, second edition) pp. 1.25-1.32, Cold Spring Harbor Laboratory Press, USA.
Schnepf, E. 1994. Light and electron microscopical observations in Rhynchopus coscinodiscivorus spec. nov., a colorless, phagotrophic Euglenozoon with concealed flagella. Arch Protistenkd 144:63-74.
Sharp, P. A. 1987. Splicing of messenger RNA precursors. Science 235:766-71.
Simpson, A. G. B. 1997. The identity and composition of the Euglenozoa. Arch Protistenkd 148:318-28.
Simpson, A. G. B., D. H. J. Van, C. Bernard, H. R. Burton, and D. J. Patterson 1996/97. The ultrastructure and systematic position of the euglenozoon Postgaardi mariagerensis Fenchel et al. Arch Protistenkd 147:213-25.
Singh, R., J. Valcarcel, and M. R. Green 1995. Distinct binding specificities and functions of higher eukaryotic polypyrimidine tract-binding proteins. Science 268:1173-6.
Strimmer, K., and A. von Haeseler 1996. Quartet puzzling: a quartet maximum likelihood method for reconstructing tree topologies. Mol Biol Evol 13:964-9.
Tazi, J., C. Alibert, J. Temsamani, I. Reveillaud, G. Cathala, C. Brunel, and P. Jeanteur 1986. A protein that specifically recognizes the 3' splice site of mammalian pre-mRNA introns is associated with a small nuclear ribonucleoprotein. Cell 47:755-66.
Tessier, L. H., R. L. Chan, M. Keller, J. H. Weil, and P. Imbault 1992. The Euglena gracilis rbcS gene contains introns with unusual borders. FEBS Lett 304:252-5.
Tessier, L. H., M. Keller, R. L. Chan, R. Fournier, J. H. Weil, and P. Imbault 1991. Short leader sequences may be transferred from small RNAs to pre-mature mRNAs by trans-splicing in Euglena. EMBO J 10:2621-5.
Triemer, R. E., and M. A. Farmer 1991a. An ultrastructural comparison of the mitotic apparatus, feeding apparatus, flagellar apparatus and cytoskeleton in euglenoids and kinetoplastids. Protoplasma 164:91-104.
Triemer, R. E., and M. A. Farmer 1991b. The ultrastructural organization of the heterotrophic euglenoids and its evolutionary implications, pp. 183-204. In Patterson, D. J., and J. Larsen (ed.), The biology of free-living heterotrophic flagellates. Clarendon Press, Oxford.
Triemer, R. E., and D. W. Ott 1990. Ultrastructure of Diplonema ambulator Larsen & Patterson (Euglenozoa) and its relationship to Isonema. Europ J Protistol 25:316-20.
Umen, J. G., and C. Guthrie 1995. The second catalytic step of pre-mRNA splicing. RNA 1:869-85.
Viscogliosi, E., and M. Muller 1998. Phylogenetic relationships of the glycolytic enzyme, glyceraldehyde-3-phosphate dehydrogenase, from parabasalid flagellates. J Mol Evol 47:190-9.

Addendum
Spliced leader sequences have recently been isolated from Diplonema papillatum and Diplonema sp. by D. A. Campbell, University of California at Los Angeles (unpublished data, personal communication with Dr. Patrick Keeling). This discovery strongly supports my prediction that trans-splicing is an ancestral characteristic of the phylum Euglenozoa.
