"Science, Faculty of"@en . "Botany, Department of"@en . "DSpace"@en . "UBCV"@en . "Qian, Qing"@en . "2009-07-13T23:20:52Z"@en . "2000"@en . "Master of Science - MSc"@en . "University of British Columbia"@en . "The phylum Euglenozoa consists of three main groups: euglenoids, kinetoplastids\r\nand diplonemids (Simpson 1997). This phylum is unique in having three types of introns:\r\nnuclear trans-spliced \"introns\", nuclear conventional and \"aberrant\" introns. In order to\r\ndetermine the evolutionary history of the introns in this phylum, it is very important to know\r\nthe general distributions of intron types within the phylum, and the likely phylogeny of the\r\nphylum.\r\nThe nuclear genomes of euglenoids are known to contain all three types of introns,\r\nwhile only trans-spliced and conventional introns have been found in kinetoplastids.\r\nHowever, nothing is known about diplonemid introns, and the phylogenetic placement of\r\ndiplonemids within the Euglenozoa is uncertain. Therefore, I looked for nuclear introns in\r\ndiplonemids by sequencing four nuclear protein-coding genes (actin, alpha-tubulin, betatubulin,\r\nand GAPDH) from different diplonemids. I found 11 introns in nine of the twentynine\r\nnewly obtained diplonemid nuclear protein-coding genes. They all have conventional\r\n5'-GT-AG-3' splicing sites, but differ from well-studied eukaryotic conventional introns\r\n(mammalian introns) in several details.\r\nI have added these nuclear encoded sequences from diplonemids to the tubulin, actin\r\nand GAPDH alignments and then made global phylogenetic trees based on these protein\r\nalignments. The discrepancy between the tubulin trees and actin tree is whether the\r\ndiplonemids are closer to kinetoplastids (tubulin trees) or euglenoids (actin tree).\r\nTaken together, I postulate that the GT-AG conventional introns were present in the\r\neuglenozoan ancestor and were largely lost in kinetoplastids and euglenoids. 
The \"aberrant\"\r\nintron is very likely a derived character restricted to euglenoids. The trans-spliced\r\ndiscontinuous \"intron\" is an ancestral character to this phylum and it is highly likely that it\r\nwill be found in diplonemids as well.\r\nThe phylogenetic position of the four newly sequenced diplonemid GAPDH\r\nsequences turned out to be very interesting. None of the four diplonemid GAPDH sequences\r\nbranch with those of other euglenozoa. Instead, three of the four diplonemid-sequences\r\nbranch with the gap3 of cyanobacteria with 100% bootstrap support, indicating a lateral gene\r\ntransfer from bacteria to eukaryotes, and one GAPDH sequence branches in an uncertain\r\nposition with other eukaryotic GAPDH sequences."@en . "https://circle.library.ubc.ca/rest/handle/2429/10769?expand=metadata"@en . "4928857 bytes"@en . "application/pdf"@en . "THE EVOLUTIONARY IMPLICATIONS OF DIPLONEMIDS AND THEIR SPLICEOSOMAL INTRONS by QING QIAN B.Sc., Hangzhou University, 1996 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES Department of Botany We accept this thesis as conforming to the required standard THE UNIVERSITY OF BRITISH COLUMBIA September 2000 \u00A9 Qing Qian, 2000 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. 
Department of The University of British Columbia Vancouver, Canada
Abstract
The phylum Euglenozoa consists of three main groups: euglenoids, kinetoplastids and diplonemids (Simpson 1997). This phylum is unique in having three types of introns: nuclear trans-spliced \"introns\", nuclear conventional and \"aberrant\" introns. In order to determine the evolutionary history of the introns in this phylum, it is very important to know the general distributions of intron types within the phylum, and the likely phylogeny of the phylum. The nuclear genomes of euglenoids are known to contain all three types of introns, while only trans-spliced and conventional introns have been found in kinetoplastids. However, nothing is known about diplonemid introns, and the phylogenetic placement of diplonemids within the Euglenozoa is uncertain. Therefore, I looked for nuclear introns in diplonemids by sequencing four nuclear protein-coding genes (actin, alpha-tubulin, beta-tubulin, and GAPDH) from different diplonemids. I found 11 introns in nine of the twenty-nine newly obtained diplonemid nuclear protein-coding genes. They all have conventional 5'-GT-AG-3' splicing sites, but differ from well-studied eukaryotic conventional introns (mammalian introns) in several details. I have added these nuclear encoded sequences from diplonemids to the tubulin, actin and GAPDH alignments and then made global phylogenetic trees based on these protein alignments. The discrepancy between the tubulin trees and actin tree is whether the diplonemids are closer to kinetoplastids (tubulin trees) or euglenoids (actin tree). Taken together, I postulate that the GT-AG conventional introns were present in the euglenozoan ancestor and were largely lost in kinetoplastids and euglenoids. The \"aberrant\" intron is very likely a derived character restricted to euglenoids. 
The trans-spliced discontinuous \"intron\" is an ancestral character to this phylum and it is highly likely that it will be found in diplonemids as well. The phylogenetic position of the four newly sequenced diplonemid GAPDH sequences turned out to be very interesting. None of the four diplonemid GAPDH sequences branch with those of other euglenozoa. Instead, three of the four diplonemid-sequences branch with the gap3 of cyanobacteria with 100% bootstrap support, indicating a lateral gene transfer from bacteria to eukaryotes, and one GAPDH sequence branches in an uncertain position with other eukaryotic GAPDH sequences.
TABLE OF CONTENTS
Abstract
List of Tables
List of Figures
Acknowledgements
CHAPTER I Introduction
1.1 The phylum Euglenozoa and its phylogeny
1.2 Intron types in the Euglenozoa
1.3 Diplonemid GAPDH
CHAPTER II Materials and Methods
2.1 Strains and culture conditions
2.2 DNA extraction procedures
2.3 PCR conditions
2.4 Cloning of amplified fragments
2.5 DNA sequencing
2.6 Sequence alignment and phylogenetic analyses
CHAPTER III Results
3.1 Sequences for nuclear encoded genes from diplonemids
3.2 Diplonemid introns
3.3 Phylogeny of the Euglenozoa
3.4 Lateral gene transfer indicated by GAPDH phylogeny
CHAPTER IV Discussion
4.1 Phylogeny of the Euglenozoa
4.2 Possible origins of the intron-types in the Euglenozoa
4.3 Features of diplonemid introns
4.4 Evolutionary origin of diplonemid GAPDH
References
Addendum
List of Tables
1 Twenty-nine nuclear encoded genes from nine diplonemids
2 K-H test of the positions of diplonemids within Euglenozoa in the actin tree
3 K-H test of the positions of diplonemids within Euglenozoa in the alpha-tubulin tree
4 K-H test of the positions of diplonemids within Euglenozoa in the beta-tubulin tree
List of Figures
1 The two-step cis-splicing
2 The two-step trans-splicing
3 The secondary stem-loop structure of an intron in the rbcS gene from Euglena gracilis
4 An alignment of four amino-acid sequences of actin
5 An alignment of twelve amino-acid sequences of alpha-tubulin
6 An alignment of fifteen amino-acid sequences of beta-tubulin
7 An alignment of six amino-acid sequences of GAPDH
8 Alignment of eleven diplonemid introns from 5'-end to 3'-end
9 Neighbor-joining tree based on actin protein sequences of various eukaryotes
10 Neighbor-joining tree based on alpha-tubulin protein sequences of various eukaryotes
11 Neighbor-joining tree based on beta-tubulin protein sequences of various eukaryotes
12 Phylogeny of diverse eukaryotes and prokaryotes based on GAPDH protein sequences
13 GAPDH phylogeny of protein sequences of prokaryotes and some eukaryotes
14 GAPDH phylogeny of protein sequences of eukaryotes and some bacteria
15 Three possible topologies for the internal phylogeny of the Euglenozoa
Acknowledgements
I would like to thank everyone who inspired and helped me to produce this thesis. On the inspiration side there stand, first, both of my supervisors, Dr. Tom Cavalier-Smith and Dr. Patrick Keeling. I am very grateful to Dr. Tom Cavalier-Smith for his research funding that allowed me to pursue this project, for his high critical standards, and also for his insightful guidance about phylogeny and evolution. After Tom left for a faculty position at Oxford University in England, Dr. Patrick Keeling became my main supervisor. I enjoyed working in his lab and trying new techniques. I thank Dr. 
Patrick Keeling for his kindness in sharing his knowledge with me, especially for the very complicated phylogenetic analysis and the GAPDH phylogeny, and also for his generosity in providing me with almost all the protein alignments and degenerate primers utilized in this project. My committee members, Dr. Carl Douglas and Dr. Martin Adamson, kindly gave me comments and suggestions during the progress of my project. On the help side, I would first like to thank Dr. Naomi Fast, for her careful reading of the manuscripts of my thesis, and her help in the preparation for the presentation of my project. I would also like to thank Dr. Ken Ishida and Juan Saldarriaga who gave me valuable suggestions on my thesis. During the writing and re-writing process, I learned a lot about presenting my thoughts more strongly and precisely. I owe special thanks to Ema Chao for generously providing me with the genomic DNAs from several diplonemids, and Dr. Alexandra Marinets for showing me the basic molecular lab techniques in the beginning. I couldn't possibly have finished my thesis without the support from my dear parents, and various help from my friends, especially my roommate Tanya Hooker. Finally, my thanks go to my friend Jens Happe, for his persistent encouragement over the last two years.
CHAPTER I: Introduction
1.1 The Phylum Euglenozoa and its phylogeny
Cavalier-Smith (1981) first formally established the phylum Euglenozoa by grouping kinetoplastids and euglenoids together based on a list of shared characteristics, including: mitochondria with discoid cristae; paraxial rods; non-tubular mastigonemes (flagellar hairs); and closed mitosis with an endonuclear spindle. The first electron microscopic observations of diplonemids (Triemer et al. 1990) suggested the addition of diplonemids to the phylum Euglenozoa. 
Simpson (1997) further proposed two potential synapomorphies uniting the phylum Euglenozoa: flagellar root pattern and paraxial rod substructure. The unique pattern of flagellar root organization of the Euglenozoa is a system of three microtubular roots: two roots closely associated with the outside of each basal body and one originating between the basal bodies. In addition, the paraxial rods of kinetoplastids, euglenoids and diplonemids share a distinctive substructure: the paraxial rod of the dorsal/anterior flagellum has a cylindrical cross-sectional appearance, while the structure in the ventral/recurrent flagellum is squarer in cross-section with a three-dimensional latticework substructure. Another new addition to the phylum Euglenozoa is Postgaardi mariagerensis. It is a recently described organism that is covered by rod-shaped bacteria, and that has two thickened flagella inserting into an anterior pocket. A recent ultrastructural study of Postgaardi mariagerensis (Simpson et al. 1996/97) made a strong case for its inclusion within the Euglenozoa because it also shares the two major synapomorphies proposed by Simpson for the Euglenozoa (Simpson 1997). While the Euglenozoa share several synapomorphies, each euglenozoan group also displays distinct features of its own. The euglenoids as a subgroup are identified by the presence of a pellicle, a system of strips of glycoprotein that lies under the plasma membrane and is supported by sub-pellicular microtubules (Triemer et al. 1991b). This group includes both photosynthetic euglenoids (e.g. Euglena) and non-photosynthetic euglenoids (e.g. Entosiphon). The photosynthetic euglenoids have attracted the attention of many researchers because of their intriguing chloroplasts, which are surrounded by three membranes instead of two. 
It is now clear that their chloroplasts are of secondary endosymbiotic origin, which means that a colourless euglenoid acquired its chloroplast by engulfing a green algal cell (Gibbs 1978). The kinetoplastids are the euglenozoans that harbor one or more kinetoplasts (DNA-rich bodies) in their mitochondria (Lee et al. 1985; Opperdoes 1987). This group includes major disease-causing genera. For example, the genera Trypanosoma and Leishmania include serious human pathogens that cause African 'sleeping sickness', South American Chagas disease, as well as leishmaniasis in tropical and subtropical areas (Lee et al. 1985; Opperdoes 1987). In addition, this group also includes free-living flagellates such as Bodo (Lee et al. 1985). Diplonemids are represented by only two genera, Diplonema and Rhynchopus, which are grouped together on the basis of their very similar ultrastructural organization (Schnepf et al. 1994). They have neither kinetoplasts nor a pellicle of glycoprotein strips, but do possess a distinctive feeding apparatus composed of vanes with fuzzy coats, as well as giant, flat mitochondrial cristae (Triemer et al. 1990; Triemer et al. 1991a; Triemer et al. 1991b; Simpson 1997). Diplonemids do not have chloroplasts, are not human pathogens, and live in either fresh-water or marine environments (Schnepf et al. 1994; Triemer et al. 1990). Postgaardi mariagerensis lacks a euglenoid pellicle and possesses mitochondria without kinetoplast or cristae. So, although it is part of the phylum Euglenozoa, it is neither a euglenoid nor a kinetoplastid. As far as its feeding apparatus is concerned, P. mariagerensis has no vanes or supporting rods, but only the MTR (a complex of reinforcing microtubules) to support its feeding apparatus. This distinction indicates that Postgaardi mariagerensis is not a diplonemid either (Simpson et al. 1996/97; Simpson 1997). In short, there are many data based on light- and electron-microscopy to distinguish among the four groups of the Euglenozoa. 
Data are particularly abundant for euglenoids and kinetoplastids. Although these structural characters are helpful in revealing phenotypic similarities, they are not as helpful for inferring phylogeny. Molecular sequences, on the other hand, are much more suitable for the latter task. However, they are not available at all from Postgaardi mariagerensis and are extremely limited from diplonemids compared to euglenoids and kinetoplastids. In fact, no nuclear protein-coding gene has been characterized from diplonemids so far. The only available molecular sequences at the onset of this study were the sequences of small subunit ribosomal RNA (SSU rRNA) genes from two diplonemids (Diplonema papillatum and Diplonema sp.) and a partial sequence of the mitochondrial gene for cytochrome c oxidase subunit I (Cox I protein) from one diplonemid (Diplonema papillatum) (Maslov et al. 1999). Maslov and Simpson (1999) performed a molecular phylogenetic study using these sequences in order to analyze the phylogenetic position of diplonemids within the phylum Euglenozoa. In their phylogenetic analyses, Diplonema was shown to be a sister-group of either kinetoplastids (in trees inferred with the maximum-likelihood method), or euglenoids (in trees inferred with the parsimony and distance methods). In either case, however, the affinity was not well supported by bootstrap analysis and the differences between the best tree and the alternative trees were not significant. It remains unclear how diplonemids are related to euglenoids and kinetoplastids. In molecular trees, this may be due to two weaknesses in the phylogenetic analyses conducted by Maslov et al. (1999): 1) The very small sampling size. All the SSU gene trees were based on only seventeen taxa, including two diplonemid-sequences, and all the Cox I protein trees were based on only seven taxa, including one diplonemid-sequence. 
2) The mitochondrial Cox I protein phylogeny can be unreliable due to the fast evolution of euglenozoan mitochondrial genes. In this study, I have characterized diplonemid nuclear encoded genes for actin, alpha-tubulin, beta-tubulin and GAPDH, to construct novel phylogenetic trees to try to resolve the phylogenetic position of diplonemids within the Euglenozoa.
1.2 Intron types in the Euglenozoa
Three types of introns in euglenoids and/or kinetoplastids
Three types of introns occur in euglenoids and/or kinetoplastids: conventional 'GT-AG' spliceosomal introns, trans-spliced, or discontinuous, 'introns' and \"aberrant\" introns. I will describe each of the three types and their distributions within the Euglenozoa in the following three sections.
GT-AG spliceosomal introns
GT-AG spliceosomal introns are abundant in higher eukaryotes. Genes in most eukaryotes are transcribed into pre-mRNAs that include introns. Only when all the introns in the pre-mRNAs are excised will the mature mRNAs be transported from the nucleus to the cytosol where translation takes place. The precise removal of the introns from the primary RNA transcripts is a critical step in gene expression in all eukaryotic cells. In general, it is a two-step catalytic process aided by a group of small nuclear ribonucleoprotein particles (snRNPs) together called the spliceosome. The spliceosome is mainly composed of five snRNPs (U1, U2, U5 and U4/U6), and assembles on the precursor messenger RNA through RNA-RNA, RNA-protein, and protein-protein interactions. The first step in cis-splicing is the cleavage of the 5' splice site by the formation of a 2'-5' phosphodiester bond between an adenosine within the intron and the guanosine residue at the 5' end of the intron. This generates a free 5' exon and an intermediate RNA in a lariat structure. 
The second step involves the cleavage of the 3' splice site, the ligation of the 5' exon and the 3' exon and the release of the intron in a lariat structure (Fig. 1). Since the introns are removed before expression of the gene, most intron sequences accumulate mutations during evolution more rapidly than the flanking exons. The only highly conserved sequences within the intron are those required for intron removal or for recognition during formation of the spliceosome. In particular, the 'GT' at the 5' end and 'AG' at the 3' end of an intron are almost invariant. Mutational studies have shown that disrupting either the GT at the 5' splice site or the AG at the 3' splice site can block or reduce the rate of both steps of cis-splicing (Sharp 1987). In addition to the consensus 5' and 3' splice junction sequences, the next most conserved regions are the branchpoint region and the region between the branchpoint and the 3' splice site (Sharp 1987; Umen et al. 1995). The branchpoint site is where the lariat intermediate forms in the first step of cis-splicing. During this step, the 2' hydroxyl of an adenosine in the branchpoint site attacks the phosphodiester bond between the guanosine at the 5' terminus of the intron and the adjacent exon nucleotide. This leads to the release of the 5' exon and the formation of a 2'-5' phosphodiester bond between the branchpoint adenosine and the guanosine at the 5' terminus of the intron.
Fig. 1 The two-step cis-splicing. Filled square represents 5' exon and open square represents 3' exon. Intron is represented by black line. The consensus dinucleotides at either end of the intron are marked as GU and AG. The branchpoint adenosine is marked as A. The dashed line between G and A represents the 2'-5' phosphodiester bond formed after the first step of the splicing. See text for detailed description.
In yeast, the branchpoint region is strictly maintained. 
It has the consensus sequence 5'-UACUAACA-3' (the underlined adenosine is the adenosine participating in the formation of the 2'-5' phosphodiester bond). In mammals, the branchpoint region is less conserved, but the region between the branchpoint and the 3' splice site is a conserved polypyrimidine tract, and is one of the essential recognition sites for the binding of splicing factors (Sharp 1987; Tazi et al. 1986; Gerker et al. 1986; Umen et al. 1995). Conventional GT-AG spliceosomal introns have been found in both green and colourless euglenoids, although they are rare in both. In Euglena, only three GT-AG introns have been found so far, all in the fibrillarin gene of Euglena gracilis (Breckenridge et al. 1999). In the colourless euglenoid, Entosiphon sulcatum, one spliceosomal intron has been found in a beta-tubulin gene (Ebel et al. 1999). In kinetoplastids, only two GT-AG cis-spliced introns have been discovered, very recently, in the poly(A) polymerase (PAP) genes from both Trypanosoma brucei and Trypanosoma cruzi (Mair et al. 2000).
Trans-spliced discontinuous introns
Trans-splicing is also a post-transcriptional RNA-splicing process. The distinguishing difference between cis- and trans-splicing is that, in trans-splicing, the two exons flanking the discontinuous intron are on two different pre-messenger RNA molecules (Agabian 1990; Blumenthal et al. 1988; Nilsen 1995) (Fig. 2). However, trans-splicing is not an entirely novel RNA-splicing process; it is regarded as the splicing of a discontinuous GT-AG spliceosomal intron. Trans-splicing is similar to GT-AG spliceosomal cis-splicing in three fundamental ways. First, this discontinuous 'intron' also has consensus GT and AG dinucleotide sequences at either end. 
Second, the chemistry of trans-splicing involves two transesterification reactions, with the discontinuous 'intron' forming a Y-branched intermediate that is structurally analogous to the cis-splicing lariat (Blumenthal et al. 1988; Agabian 1990; Nilsen 1995) (Fig. 2). Third, cis- and trans-splicing share at least three small nuclear ribonucleoprotein particles: the U2, U4 and U6 snRNPs (Agabian 1990; Nilsen 1995).
Fig. 2 The two-step trans-splicing. The filled square represents the spliced leader RNA and the open square represents the recipient exon RNA. The discontinuous \"intron\" is represented by black line. The consensus dinucleotides at either end of the discontinuous \"intron\" are marked as GU and AG. See text for detailed description.
Around eighty percent of the mRNAs are trans-spliced in both the green euglenoid Euglena gracilis (Tessier et al. 1991) and the colourless euglenoid Entosiphon sulcatum (Ebel et al. 1999), whereas all known mRNAs are trans-spliced before they are translated into proteins in kinetoplastids (Agabian 1990; Nilsen 1994; Laird 1989). In addition to kinetoplastids and euglenoids, trans-splicing has only been reported in metazoan worms, such as nematodes (e.g. Caenorhabditis elegans) (Blumenthal et al. 1988; Agabian 1990; Nilsen 1989; Nilsen 1994) and flatworms (e.g. trematodes) (Davis 1997; Nilsen 1995), but the process certainly evolved independently in these animals and euglenozoa.
\"Aberrant\" introns
A third type of intron, here simply called \"aberrant\" introns, has also been found in the genome of Euglena gracilis. In general, these introns have three distinctive features (Tessier et al. 1992; Muchhal et al. 1994; Henze et al. 1995). 
First, these introns do not have any consensus sequences at their borders. Second, they form an unusual, stable stem-loop secondary structure in the pre-mRNA (Fig. 3) (Tessier et al. 1992; Muchhal et al. 1994), and further secondary structures (stem-loop structures) are observed in the \"aberrant\" introns in the cytosolic GAPDH gene of Euglena gracilis (Henze et al. 1995). Third, they are usually flanked by short (2- to 4-bp) repeats (Tessier et al. 1992; Muchhal et al. 1994; Henze et al. 1995). \"Aberrant\" introns have only been reported from Euglena gracilis. They are found in three nuclear encoded genes: 14 introns in the gene for the light harvesting chlorophyll a/b binding proteins of photosystem II (LHCPII) (Muchhal et al. 1994); 12 introns in the rbcS genes, which encode the small subunits of ribulose 1,5-bisphosphate carboxylase/oxygenase (Tessier et al. 1992); and four introns in the gene for cytosolic, glycolytic glyceraldehyde-3-phosphate dehydrogenase (GAPDH) (Henze et al. 1995).
Fig. 3 The secondary stem-loop structure of an intron in the rbcS gene from Euglena gracilis (Tessier et al. 1992). The arrows point at the two cleavage sites of the intron. This intron does not have consensus dinucleotides (GT-AG) at either end. Two stretches of nucleotides at the 5' and 3' ends of the intron can base-pair to each other, usually with several nucleotides at the 3' end of the intron displaced by two adjacent nucleotides from the 3' exon.
Introns in diplonemids?
Nothing is known about the intron distribution in diplonemids, the third major group of the Euglenozoa, because no nuclear protein-coding genes from any diplonemid have ever been sequenced, and their relationship to other Euglenozoa is unclear. 
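The sequence criteria described in this section can be turned into a simple first-pass check. The following is a minimal sketch, not the procedure used in this thesis: the function names and the toy sequences are hypothetical, and it tests only the terminal dinucleotides, so an intron that passes is merely a candidate conventional intron and one that fails is merely a candidate for the aberrant class.

```python
def has_gt_ag_boundaries(intron):
    '''Return True if the intron starts with GT (GU) and ends with AG.'''
    # Accept RNA or DNA spelling by mapping U to T.
    seq = intron.upper().replace('U', 'T')
    # An intron shorter than four bases cannot carry both dinucleotides.
    return len(seq) >= 4 and seq.startswith('GT') and seq.endswith('AG')


def classify_intron(intron):
    '''Label an intron by its terminal dinucleotides only (illustrative).'''
    if has_gt_ag_boundaries(intron):
        return 'GT-AG boundaries (candidate conventional intron)'
    return 'no GT-AG boundaries (candidate aberrant intron)'


# Hypothetical toy sequences, not data from this study.
print(classify_intron('GTAAGTCTTTTTCTCAG'))
print(classify_intron('CTAAGTCTTTTTCTCAC'))
```

A fuller check would also look for a branchpoint-like motif, a stem-loop between the intron ends, and short flanking repeats, none of which this sketch attempts.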
In order to be able to determine the origins of nuclear cis-splicing, trans-splicing, and the \"aberrant\" introns in the Euglenozoa, we need to know the distribution of these characters in all three main groups, and the internal phylogeny of the phylum. Diplonemids, as the third and most poorly studied major group of this phylum, are an important part of this puzzle, since the lack of known protein-coding gene sequences from diplonemids is a gap in our understanding of intron distribution, and hinders phylogenetic analyses to determine the branching order within the Euglenozoa. The objective of the first part of this thesis, therefore, is to determine the evolutionary history of the intron-types in the Euglenozoa. In order to achieve this goal, I sequenced several nuclear protein-coding genes from diplonemids. By so doing, I sought to determine whether these nuclear encoded genes contain introns, and if they do, what kind of introns they possess. Also, I added my new diplonemid protein-sequences to protein alignments and constructed phylogenetic trees, hoping to resolve the internal branching order of the three major groups of the phylum Euglenozoa. Then, based on the possible internal phylogeny, I attempt to infer the origins of the three intron types within the phylum Euglenozoa. The nuclear encoded genes chosen for this study were: actin, alpha-tubulin and beta-tubulin. These genes encode basic components of the cytoskeleton, which is universally present in eukaryotic cells. These genes are good candidates because they have been widely used as phylogenetic markers and sequences from many phylogenetically distinctive groups are available, including those from both euglenoids and kinetoplastids. In addition, these nuclear encoded genes often contain one or more introns in higher eukaryotes.
1.3 Diplonemid GAPDH
The second part of my thesis focuses on a phylogenetic analysis of glyceraldehyde-3-phosphate dehydrogenase (GAPDH). 
GAPDH was also chosen for analysis because it is well sampled and has a well-known intron distribution among extant eukaryotes. Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) is an enzyme of central carbon metabolism. The phylogeny of GAPDH is complex, resulting from a complicated evolutionary history that includes gene duplications, endosymbiotic gene replacements and lateral gene transfers (Martin et al. 1993; Henze et al. 1995; Liaud et al. 1997). In global GAPDH trees constructed previously, GAPDH sequences can always be divided into two distinct clades, GapC and GapA/B. The GapC clade mainly represents the cytosolic GAPDH enzymes of eukaryotes. The GapC enzymes are generally involved in glycolysis in the cytosol. The reaction catalysed by the GapC enzyme is catabolic and this enzyme is NAD specific. However, the gap1 genes from proteobacteria and cyanobacteria are also included in the GapC clade. The reason for this is unclear so far. Conversely, the GapA/B clade mainly represents the GAPDH enzymes of bacteria. One exception to the eubacterial nature of the GapA/B clade is the inclusion of GAPDH from the eukaryotic phylum Parabasalia. Markos et al. (1993) and Viscogliosi et al. (1998) suggested that the close association of the GAPDH sequences from parabasalids with bacterial GAPDH sequences indicated a bacterial origin of the GAPDH genes in Parabasalia, most likely by a lateral gene transfer from a bacterium to the ancestor of this phylum. In addition, the GapA/B clade also includes the nuclear-encoded, chloroplast-targeted GAPDH sequences from photosynthetic eukaryotes, which are also bacterial due to the cyanobacterial origin of the chloroplast. 
Indeed, in the GapA/B clade of global GAPDH trees, the nuclear-encoded, plastid-targeted GAPDH genes of photosynthetic eukaryotes are always closely related to the gap2 of cyanobacteria (considered to be the free-living relatives of chloroplasts) (Martin et al. 1993; Henze et al. 1995; Liaud et al. 1997). The chloroplast GapA/B enzyme is involved in the Calvin cycle. The reaction catalysed by this enzyme is anabolic and the enzyme can use either NAD or NADP as its cofactor. Clermont et al. (1993) demonstrated that the amino acid at position 32 of a GAPDH enzyme plays an essential role in determining its relative specificity for NAD or NADP. In most catabolic NAD-specific cytosolic GAPDH enzymes this position is aspartic acid (D), whereas in the anabolic GAPDH enzyme that can use both NAD and NADP, this position is occupied by a non-acidic amino acid, for instance alanine (A) in the chloroplast-targeted GAPDH of Euglena gracilis. This is because there is an electrostatic repulsion between the negatively charged carboxyl group of an acidic amino acid (Asp32) and the negatively charged 2'-phosphate of NADP. The phylogeny of GAPDH in the phylum Euglenozoa is very complicated. It has been shown that there are two distantly related GAPDH genes in both euglenoids (chloroplast and cytosolic GAPDH genes) (Martin et al. 1993; Henze et al. 1995) and kinetoplastids (glycosomal and cytosolic GAPDH genes) (Michels et al. 1991; Michels et al. 1992). The chloroplast GAPDH gene of Euglena gracilis is closely related to those of higher photosynthetic eukaryotes and the gap2 of the cyanobacteria that gave rise to chloroplast-targeted GAPDH genes in higher photosynthetic eukaryotes (Martin et al. 1993; Henze et al. 1995). The cytosolic GapC of Euglena gracilis has been shown to be closely related to the glycosomal GAPDH genes of kinetoplastids. 
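The position-32 rule of Clermont et al. (1993) summarised above lends itself to a small lookup. The sketch below is illustrative only: the function name is hypothetical, the residue is assumed to have already been read from an alignment at the position equivalent to residue 32, and treating glutamic acid like aspartic acid is an assumption of this sketch rather than something stated in the text.

```python
# Acidic residues. Only aspartic acid (D) is named in the text above;
# including glutamic acid (E) is an assumption of this sketch.
ACIDIC_RESIDUES = {'D', 'E'}


def predicted_cofactor_use(residue_at_32):
    '''Guess cofactor specificity from the one-letter residue at position 32.'''
    residue = residue_at_32.strip().upper()
    if len(residue) != 1 or not residue.isalpha():
        raise ValueError('expected a single one-letter amino-acid code')
    if residue in ACIDIC_RESIDUES:
        # An acidic side chain would repel the 2'-phosphate of NADP.
        return 'NAD-specific'
    return 'NAD or NADP'


print(predicted_cofactor_use('D'))  # aspartic acid
print(predicted_cofactor_use('A'))  # alanine, as in Euglena gracilis chloroplast GAPDH
```

The result is only a rule-of-thumb prediction from a single residue and says nothing about measured enzyme kinetics.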
Glycosomes are microbodies unique to kinetoplastids; they harbor most enzymes of the glycolytic pathway and are often thought to be of endosymbiotic origin (Opperdoes 1987; Borst et al. 1989; Opperdoes et al. 1989; Michels et al. 1994). This Euglena cytosol/kinetoplastid glycosome clade branches basally to the GapC clade (Henze et al. 1995; Liaud et al. 1997). Most GAPDH in kinetoplastids is found in glycosomes (Opperdoes 1987; Borst et al. 1989; Opperdoes et al. 1989; Michels et al. 1991; Michels et al. 1992). However, Trypanosoma brucei and Leishmania mexicana possess a second, distinct cytosolic GAPDH enzyme in addition to the glycosomal form (Michels et al. 1991; Michels et al. 1992). These two cytosolic GAPDH enzymes are extraordinarily closely related to E. coli gap1 (=gapA) (Michels et al. 1991; Henze et al. 1995). Michels et al. (1992) further showed that more distantly related kinetoplastids, such as the bodonid Trypanoplasma borreli, have only the typical glycosomal GAPDH enzyme. This strongly supports the speculation that the Euglena cytosol/kinetoplastid glycosome clade represents the original GAPDH form in the phylum Euglenozoa, and that a horizontal gene transfer, perhaps from a γ-purple bacterium related to E. coli, gave rise to the cytosolic GAPDH in Trypanosoma and Leishmania after their ancestor diverged from the bodonids (Michels et al. 1992; Michels et al. 1994; Henze et al. 1995; Liaud et al. 1997).

How the GAPDH sequences of diplonemids fit into this picture is entirely unknown. Outstanding questions include: how many types of GAPDH genes are there in diplonemids? Where in the global GAPDH tree do they branch? Since it is generally thought that the Euglena cytosol/kinetoplastid glycosome clade represents the original GAPDH form in the phylum Euglenozoa (Michels et al. 1992; Michels et al. 1994; Henze et al. 1995; Liaud et al.
1997), the positions of the GAPDH sequences from diplonemids in the GAPDH tree may either confirm this speculation or reveal new relationships between diplonemid GAPDH sequences and those of eukaryotes or prokaryotes.

CHAPTER II: Materials and Methods

2.1 Strains and culture conditions

Axenic cultures of Diplonema ambulator (ATCC 50223), Diplonema papillatum (ATCC 50162), Diplonema sp. 3 (new strain) (ATCC 50225), Diplonema sp. 4 (ATCC 50232) and Rhynchopus sp. 3 (ATCC 50231) were obtained from the ATCC (American Type Culture Collection). Cultures were maintained in four 150 x 15 mm sterilized, disposable plastic petri dishes (FISHER) in ATCC Culture Medium 1728, enriched Isonema medium (ATCC 1405 HESNW Medium), with 10% heat-inactivated horse serum (Sigma Cat. # H1270) added aseptically just before use. (Detailed recipes are given at the end of this section.) Cultures were incubated at room temperature. After significant growth was observed by light microscopy, cells were harvested by centrifugation at 2000 x g, 4°C, for 10 minutes. Genomic DNA of Diplonema sp. 2 (ATCC 50224), Diplonema sp. 3 (ATCC 50231), Diplonema sp. 4 (ATCC 50232), Rhynchopus sp. 1 (ATCC 50226), and Rhynchopus sp. 2 (ATCC 50229) was provided by Ema Chao.

ATCC Medium 1405:
Natural seawater                      1.0 L
Enrichment Solution (see below)      10.0 ml
Vitamin Solution (see below)          1.0 ml
Two-month-old seawater was filter-sterilized and all components were combined aseptically.

Enrichment Solution:
EDTA·2H2O                             0.553 g
NaNO3                                 4.667 g
Na2SiO3·9H2O                          3.000 g
Sodium glycerophosphate               0.667 g
H3BO3                                 0.380 g
Fe(NH4)2(SO4)2·6H2O                   0.234 g
FeCl3·6H2O                            0.016 g
MnSO4·4H2O                            0.054 g
ZnSO4·7H2O                            7.3 mg
CoSO4·7H2O                            1.6 mg
Distilled water                       1.0 L
Na2SiO3 was neutralized with 1 N HCl. All ingredients were combined in the order listed. This solution was filter-sterilized.
Vitamin Solution:
Thiamine                              0.1 g
Vitamin B12                           2.0 mg
Biotin                                1.0 mg
Distilled water                       1.0 L

2.2 DNA extraction procedures

Cell pellets of diplonemids were resuspended in 1.5 ml of CTAB solution (4% (w/v) CTAB (hexadecyltrimethylammonium bromide, SIGMA H-5882), 100 mM MES (SIGMA M-8250), 1.4 M NaCl and 1% 2-mercaptoethanol) pre-heated to 65°C. The mixture was incubated at 65°C for 30 minutes to allow for digestion and lysis. DNA was then gently extracted (to avoid extensive shearing) from the mixture with an equal volume of chloroform/isoamyl alcohol (24:1). DNA was precipitated from the aqueous phase by adding 2/3 volume of isopropyl alcohol and incubating overnight at 4°C. DNA was collected the following day by successive centrifugation of 1.5 ml aliquots in the same 1.5 ml tube at maximum speed (usually 12500 rpm) for 2.5 minutes. The DNA pellet was washed twice with 95% ethanol and twice with 70% ethanol to remove salt, then air-dried and resuspended in TE (10/1 Tris/EDTA, pH 8.0).

2.3 PCR conditions

Degenerate PCR primers for alpha-tubulin, beta-tubulin, actin and glyceraldehyde-3-phosphate dehydrogenase (GAPDH) were designed by Dr. Patrick Keeling based on the conserved amino acid sequences at the extreme amino- and carboxyl-termini of the corresponding protein. The conserved regions at the N- and C-termini of the four proteins used for primer design are given below:

Alpha-tubulin gene,  N-terminus  5'-QVGNAGWE-3'
                     C-terminus  5'-WYVGEGM-3'
Beta-tubulin gene,   N-terminus  5'-GQCGNQ-3'
                     C-terminus  5'-MDEMEFT-3'
Actin gene,          N-terminus  5'-EKMTQLMFE-3'
                     C-terminus  5'-VHRKCF-3'
GAPDH gene,          N-terminus  5'-KVGINGFG-3'
                     C-terminus  5'-WYDNEWGYS-3'
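The relationship between a conserved amino acid motif and a degenerate primer can be illustrated with a short Python sketch. It is only an illustration, not the procedure actually used to design the primers: the codon dictionary, the function names and the restriction to amino acids with a single IUPAC triplet are all assumptions made here. The motif is the alpha-tubulin N-terminal motif given above, and the comparison string is the 3' portion of the TUBA1 primer as printed in this section.

```python
from math import prod

# IUPAC nucleotide codes (only the subset needed here) and the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "H": "ACT", "N": "ACGT",
}

# One degenerate codon per amino acid under the standard genetic code.
# Assumption for this sketch: Leu, Arg and Ser (six codons each) are left out,
# because no single IUPAC triplet describes them exactly.
DEGENERATE_CODON = {
    "A": "GCN", "C": "TGY", "D": "GAY", "E": "GAR", "F": "TTY",
    "G": "GGN", "H": "CAY", "I": "ATH", "K": "AAR", "M": "ATG",
    "N": "AAY", "P": "CCN", "Q": "CAR", "T": "ACN", "V": "GTN",
    "W": "TGG", "Y": "TAY",
}


def back_translate(motif):
    """Return the fully degenerate nucleotide sequence encoding a peptide motif."""
    return "".join(DEGENERATE_CODON[aa] for aa in motif)


def degeneracy(oligo):
    """Number of distinct plain sequences represented by a degenerate oligo."""
    return prod(len(IUPAC[base]) for base in oligo)


def is_covered_by(specific, general):
    """True if each position of `specific` allows only bases that `general`
    also allows (compared over the length of the shorter sequence)."""
    return all(set(IUPAC[s]) <= set(IUPAC[g]) for s, g in zip(specific, general))


motif = "QVGNAGWE"                           # alpha-tubulin N-terminal motif
full = back_translate(motif)
tuba1_core = "CARGTNGGNAAYGCNGGYTGGGA"       # 3' portion of TUBA1 as printed

print(full, degeneracy(full))
print(tuba1_core, degeneracy(tuba1_core))
print(is_covered_by(tuba1_core, full))
```

In this sketch the printed TUBA1 core is a less degenerate subset of the full back-translation: it uses Y rather than N at one glycine codon and omits the final wobble base, both of which reduce the number of oligonucleotide species in the primer pool.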
The degenerate primer pairs used for these four nuclear encoded genes were:

Alpha-tubulin gene,
TUBA1   5'-TCCGAATTCARGTNGGNAAYGCNGGYTGGGA-3'
TUBA2   5'-CGCGCCATNCCYTCNCCNACRTACCA-3'
Beta-tubulin gene,
TUBB1   5'-GCCTGCAGGNCARTGYGGNAAYCA-3'
TUBB2   5'-TCCTCGAGTRAAYTCCATYTCRTCCAT-3'
Actin gene,
actF2   5'-GAGAAGATGCANCARATHATGTTYGA-3'
actR1   5'-GGCCTGGAARCAYTTNCGRTGNAC-3'
GAPDH gene,
gap1F   5'-CCAAGGTCGGNATHAAYGGNTTYGG-3'
gap1R   5'-CGAGTAGCCCCAYTCRTTRTCRTACCA-3'

Amplification of DNA was carried out using standard methods. Typically, 250 ng of diplonemid genomic DNA was used as a template in 50 µl reactions, with each primer at 10 µM, 0.25 units of Taq polymerase, 2.5 mM dNTPs and reaction buffer (Gibco BRL). Cycle parameters were 94°C/30 sec & pause 2.15 min (1x); 94°C/30 sec, 50°C/30 sec, 72°C/2 min (30x); and 72°C/5 min (1x).

2.4 Cloning of amplified fragments

An aliquot of 5 µl of each 50 µl PCR reaction was run on an agarose gel (0.7-0.8% agarose), together with a DNA molecular weight marker (1 Kb DNA ladder), to check the size of the product. If the product had the expected molecular weight, the remaining portion of the reaction was run on another agarose gel (0.7-0.8% agarose), and the fragment of interest was isolated from the gel using either the Prep-a-gene kit (BIO-RAD) or the GeneClean II kit (BIO 101 BIO/CAN SCIENTIFIC). Isolated fragments were ligated into the pCR 2.1-TOPO T-tailed vectors as specified by the manufacturer's protocol (Invitrogen).
After 5 minutes of incubation at room temperature, 'One Shot Competent TOPO 10' Escherichia coli cells were transformed with the ligated plasmids following the manufacturer's protocol (Invitrogen). The cells were plated on selective LB medium containing 50 µg/ml ampicillin and 40 µl of 40 mg/ml X-gal and incubated overnight at 37°C. The presence of X-gal allowed for 'blue-white screening', in which colonies containing vectors with inserts appear white, whereas colonies containing 'empty vectors' appear blue. To determine the sizes of the inserts within the plasmids, 6-10 or more white colonies per cloning reaction were chosen and analysed either by restriction digestion (with EcoRI) or by a PCR screen that amplified the inserts with the M13 Forward (-20) and M13 Reverse primers. On average, six white colonies per cloning reaction, each containing a plasmid with an insert of the expected size, were cultured overnight in individual tubes containing liquid LB medium with 50 µg/ml ampicillin. Plasmid DNA with the expected size of PCR insert was isolated using either the standard alkaline lysis (miniprep) method (Sambrook et al. 1989) or the Perfect prep Plasmid DNA Kit following the manufacturer's protocol (Eppendorf).

2.5 DNA sequencing

Automated sequencing using the dideoxy method was employed to obtain the sequences of cloned PCR products. The full-length sequences of the alpha- and beta-tubulin genes were obtained using a primer-walking strategy. (Walking primer sequences are given below.) Regions sequenced only on a single strand were confirmed by two independent sequences.

The forward strands of the alpha-tubulin genes of Diplonema sp. 2 and Diplonema sp. 3 were sequenced using the following oligonucleotide primer:
TUAA32   GCGGCGAACAACTACGC.
The reverse strands of the alpha-tubulin genes of Diplonema sp. 3, Diplonema sp. 4, Rhynchopus sp. 1 and Rhynchopus sp.
2 were sequenced using the following primer:
TUAA41   GGCAGCACGCCATGTAC.
The forward strand of the beta-tubulin gene of Rhynchopus sp. 1 was sequenced using the following oligonucleotide primer:
TUBB32   GGTGCGGGGAACAACTG.
The reverse strands of the beta-tubulin genes of Diplonema sp. 3, Rhynchopus sp. 1 and Rhynchopus sp. 2 were sequenced using the following oligonucleotide primer:
TUBB42   GACTTGATGTTGTTCGGG.
The forward strands of the beta-tubulin genes of Diplonema sp. 2, Diplonema sp. 3, Diplonema sp. 4 and Rhynchopus sp. 1 were sequenced using the following oligonucleotide primer:
TUBB3    GGAGCTGGTAACAACTGG.
The reverse strands of the beta-tubulin genes of Diplonema sp. 2, Diplonema sp. 3, Diplonema sp. 4 and Rhynchopus sp. 1 were sequenced using the following oligonucleotide primer:
TUBB4    CTTGATGTTGTTTGGAATC.
The forward strand of the beta-tubulin gene of Rhynchopus sp. 2 was sequenced using the following oligonucleotide primer:
TUBB33   GGTGCGGGCAACAACTG.

2.6 Sequence alignment and phylogenetic analyses

The nature of the obtained sequences was confirmed by BLAST searches against the GenBank database. Introns were tentatively identified as insertions in genes that could not be aligned to the amino acid sequences of the same gene from other organisms. The sequences were then imported into the Sequencher 3.1.1 software package, where contigs were assembled. Once the contigs were complete, they were translated into amino acid sequence using the DNA Strider 1.2 program. By comparing the inferred amino acid sequences with the corresponding protein alignments, introns were positively identified by the presence of canonical GT-AG boundaries. Introns were removed, then the nucleotide sequences were translated into amino acid sequences. All the inferred amino acid sequences were added to the corresponding protein alignments.
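The intron-handling steps described above (check for GT-AG boundaries, remove the intron, translate the spliced sequence) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Sequencher/DNA Strider workflow itself: the function names are hypothetical, intron coordinates are assumed to be already known from the alignment comparison, and the toy sequence is invented for the example and is not a diplonemid gene.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def has_gt_ag_boundaries(intron):
    """True if the candidate intron starts with GT and ends with AG."""
    return intron.startswith("GT") and intron.endswith("AG")


def splice(genomic, introns):
    """Remove introns given as sorted, non-overlapping (start, end) pairs
    (0-based, end exclusive). Raises ValueError if a candidate intron
    lacks canonical GT-AG boundaries."""
    exons, position = [], 0
    for start, end in introns:
        if not has_gt_ag_boundaries(genomic[start:end]):
            raise ValueError(f"no GT-AG boundaries at {start}-{end}")
        exons.append(genomic[position:start])
        position = end
    exons.append(genomic[position:])
    return "".join(exons)


def translate(coding):
    """Translate complete codons of a coding sequence, reading frame 1."""
    usable = len(coding) - len(coding) % 3
    return "".join(CODON_TABLE[coding[i:i + 3]] for i in range(0, usable, 3))


# Invented toy example: two short exons separated by a 12-nt GT...AG insertion.
toy_genomic = "ATGGCC" + "GTAAGTTTTCAG" + "AAATGG"
toy_introns = [(6, 18)]

spliced = splice(toy_genomic, toy_introns)
print(spliced)
print(translate(spliced))
```

Raising an error on a non-canonical boundary mirrors the two-stage logic of the text: an unalignable insertion is only a tentative intron until its GT-AG boundaries have been confirmed.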
Amino acid sequence alignments of alpha-tubulin (436 amino acids), beta-tubulin (428 amino acids) and actin (373 amino acids) that included broad samplings of eukaryotes, and of GAPDH (290 amino acids) from a wide range of both eukaryotes and prokaryotes, were provided by Dr. Patrick Keeling and Dr. Naomi Fast. Amino acid sequences from diplonemids were added to these four alignments. Regions in the alignments that did not appear optimal were subsequently adjusted manually using a text editor.

Phylogenetic analyses were performed on the aligned protein datasets using a distance method. PUZZLE version 4.0.2 (Strimmer and von Haeseler 1996) was used to calculate maximum likelihood distances between pairs of sequences. Distances were corrected with the JTT substitution matrix, with amino acid frequencies estimated from the data and site-to-site rate variation modelled on a gamma distribution with eight rate categories (except for the GAPDH alignment with 100 taxa). The gamma distribution parameter alpha was estimated from each dataset. Trees were constructed from the distance matrices with the neighbor-joining (NJ) algorithm using the BioNJ program (Gascuel 1997). One hundred bootstrap resamplings of the data were generated by the SEQBOOT program implemented in the Phylip 3.572 package (Felsenstein 1993). One hundred distance matrices were inferred from the 100 resampled alignments by PUZZLE version 4.0.2 (using the settings described above but without the gamma distribution), using the shell script puzzleboot (by M. Holder & A. Roger). One hundred trees were generated by analysing the 100 distance matrices with the neighbor-joining (NJ) method using BioNJ. The bootstrap majority-rule consensus tree was constructed using the CONSENSE program from the Phylip 3.572 package.
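The resampling step at the start of the bootstrap procedure can be sketched in Python. The sketch only illustrates the principle of sampling alignment columns with replacement; it is not SEQBOOT, and the function name, the fixed random seed and the three-taxon toy alignment are assumptions made for the example. Distance estimation, tree building and consensus construction are deliberately left out.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate of an alignment.

    `alignment` maps taxon names to equal-length aligned sequences.
    Columns are drawn with replacement until the replicate has the
    same length as the original alignment."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("sequences must be aligned to the same length")
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


# Invented toy alignment (not real data), three taxa by five sites.
toy_alignment = {
    "taxonA": "MKVLA",
    "taxonB": "MKILA",
    "taxonC": "MRVLG",
}

rng = random.Random(0)  # fixed seed so the replicates are reproducible
replicates = [bootstrap_alignment(toy_alignment, rng) for _ in range(100)]
print(len(replicates))
print(replicates[0])
```

Each replicate would then be passed through the same distance and neighbor-joining steps as the original alignment, and the percentage of replicate trees containing a given grouping is reported as its bootstrap support.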
Alternative internal topologies of the phylum Euglenozoa in the alpha- and beta-tubulin and actin trees were tested statistically using the Kishino-Hasegawa (K-H) method (Kishino and Hasegawa 1989). This method evaluates the standard error of the difference in ln likelihood between alternative topologies; that is, it allows one to test whether a tree topology with higher likelihood is significantly preferred over others with lower likelihood. All K-H tests were performed using PUZZLE version 4.0.2 with gamma-distributed rates and user-defined trees (the parameters used were based on the first input tree). For the present study, differences in log likelihood greater than 1.96 standard errors (corresponding to the 95% confidence level) were considered significant (Kishino and Hasegawa 1989).

CHAPTER III: Results

3.1 Sequences for nuclear encoded genes from diplonemids

I sequenced genes for alpha- and beta-tubulin, actin and GAPDH from nine different diplonemids in this study. I obtained twenty-nine sequences in total: ten alpha-tubulin sequences from eight different diplonemids, thirteen beta-tubulin sequences from eight different diplonemids, two actin sequences from two different diplonemids and four GAPDH sequences from three different diplonemids (see Table 1 for a summary). The predicted amino acid sequences inferred from the nucleotide sequences of these twenty-nine nuclear encoded genes are given in Fig. 4-Fig. 7. Lengths of these sequences (with neither intron nor PCR primer sequences included) are approximately 733 nt for actin, 1153 nt for alpha-tubulin, 1162 nt for beta-tubulin, and 904-949 nt for GAPDH.

Actin gene sequences

A band close to the expected size (777 nt) was obtained from Diplonema ambulator. A band larger than the expected size was obtained from Diplonema sp. 3. These two amplified products were cloned and sequenced, and BLAST searches against the GenBank database confirmed that both encode actin.
BLAST searches also revealed that both sequences contain introns: two in the actin gene of Diplonema sp. 3 (80 nt and 176 nt in length) and one in the actin gene of Diplonema ambulator (40 nt in length). The length of each of the two predicted protein sequences is 244 amino acids, representing about two-thirds of a complete actin sequence.

Table 1. Twenty-nine nuclear encoded genes from nine diplonemids. Numbers indicate the number of copies sequenced from each diplonemid; a dash indicates that no sequence was obtained.

Diplonemid                alpha-tubulin   beta-tubulin   Actin   GAPDH
Diplonema sp. 2                 1               1          -       2
Diplonema sp. 3                 1               1          1       1
Diplonema sp. 3 new             2               2          -       -
Diplonema sp. 4                 1               3          -       -
Diplonema ambulator             -               2          1       -
Diplonema papillatum            1               -          -       -
Rhynchopus sp. 1                1               1          -       -
Rhynchopus sp. 2                1               1          -       -
Rhynchopus sp. 3                2               2          -       1

[Figure 4. Alignment of actin amino acid sequences from Diplonema sp. 3, Diplonema ambulator, Euglena gracilis and Trypanosoma cruzi.]

The deduced amino acid sequences of Diplonema sp. 3 and Diplonema ambulator were aligned with the homologous region of the actin sequences from 63 other eukaryotes. Figure 4 shows a representative alignment including actin sequences from Diplonema sp. 3, Diplonema ambulator, Euglena gracilis and Trypanosoma cruzi. The two diplonemid sequences were very similar to each other, with only 3 amino acid differences over the total length of 244 amino acids (sequence differences were calculated by PAUP version 4.0). In addition, they were similar to the sequences of other euglenozoa: when the two diplonemid actin sequences were compared to that of Euglena gracilis, only 37 of the 244 amino acids were different, whereas sequence differences between diplonemids and kinetoplastids were higher (64-71 of the 244 amino acids).

Alpha-tubulin gene sequences

PCR products of the expected size (1197 nt) were obtained from Diplonema sp. 2, Diplonema sp. 3, Diplonema sp. 3 (new strain), Diplonema papillatum, Rhynchopus sp. 1 and Rhynchopus sp. 3. PCR products larger than the intronless size were obtained from Diplonema sp. 4 and Rhynchopus sp. 2. The above PCR products were cloned, and both strands of two independent clones from each source were sequenced. BLAST searches against the GenBank database confirmed that all of them were true alpha-tubulin sequences. The results of the BLAST searches also revealed that there was one intron in each of the two alpha-tubulin clones from Diplonema sp. 4 (109 nt) and Rhynchopus sp. 2 (126 nt).
In addition, by comparison, I found that the two alpha-tubulin clones from each of Diplonema sp. 3 (new strain) and Rhynchopus sp. 3 were slightly different from each other. The two sequences from Diplonema sp. 3 (new strain) vary at several nucleotides and two amino acids, and those of Rhynchopus sp. 3 vary at several nucleotides but not at any amino acid. These differences indicate that two different alpha-tubulin genes were sequenced from these two diplonemids, but only one gene was sequenced from each of the remaining six diplonemids (see Table 1). The sequence of the PCR product was 1153 nt in length (excluding primer and intron sequences) for each of the nine alpha-tubulin sequences, recovering more than 80% of a complete intronless alpha-tubulin sequence. However, for the alpha-tubulin sequence of Rhynchopus sp. 2, I was unable to sequence about 120 nt (see Fig. 5).

[Figure 5. Alignment of the ten diplonemid alpha-tubulin amino acid sequences with those of Euglena gracilis and Trypanosoma cruzi.]

The inferred translation of 384 amino acids for each of the ten alpha-tubulins from diplonemids (with no primer sequences) was aligned with those from a sampling of 54 other eukaryotic taxa. Figure 5 shows a small sampling of this alignment, including the ten diplonemid alpha-tubulin sequences as well as those of Euglena gracilis and Trypanosoma cruzi. The sequence differences among the ten diplonemid sequences were slight (only 0-15 amino acid differences over the total length of 384 amino acids). The sequence differences between diplonemids and Euglena gracilis were 18 to 39 amino acids. Similarly, the sequence differences between diplonemids and kinetoplastids were 23 to 44 amino acids.

Beta-tubulin gene sequences

PCR products of the expected size (1200 nt) were obtained from Diplonema sp. 2, Diplonema sp. 3, Diplonema sp. 3 (new strain), Diplonema ambulator, Rhynchopus sp. 1, Rhynchopus sp. 2 and Rhynchopus sp. 3.
PCR products larger than the expected size were obtained from Diplonema sp. 4 and Diplonema sp. 2. All of the above PCR products were cloned, and both strands of several clones from each were sequenced. Again, BLAST searches of these sequences confirmed that each of them was beta-tubulin and corresponded to about 86% (387-388 amino acids) of a full-length beta-tubulin gene. In addition, the sequences from three independent clones from Diplonema sp. 4 showed that each was a slightly different copy of beta-tubulin (1-2 amino acid differences). The sequences from
two independent clones from Diplonema ambulator also represented two different beta-tubulin sequences (3 amino acid differences). The sequences from two independent clones from each of Diplonema sp. 3 (new strain) and Rhynchopus sp. 3 revealed two different beta-tubulin gene sequences; however, the differences were detectable only at the nucleotide level, not at the amino acid level. For each of the remaining four diplonemids, the sequences from two independent clones were identical (Table 1). One intron was found in each of the three different beta-tubulin sequences from Diplonema sp. 4 (140 nt, 126 nt and 149 nt in length, respectively) and in the beta-tubulin gene of Diplonema sp. 2 (71 nt).

The inferred amino acid sequences of the thirteen diplonemid beta-tubulins (excluding primer sequences) were aligned with those from 46 other eukaryotic taxa. Figure 6 shows a small part of this alignment, which includes not only the thirteen new diplonemid beta-tubulin sequences, but also the sequences from Euglena gracilis and Trypanosoma brucei. All of the thirteen diplonemid sequences were very similar to each other (0-15 amino acid differences over a total of 387 residues).
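The ranges of amino acid differences quoted in this section are counts of mismatched columns between pairs of aligned protein sequences. A minimal sketch of such a count is given below. The function names and the three short sequences are purely illustrative (they are not the thesis data), and skipping columns that contain a gap is an assumption, since the text does not say how gaps were treated.

```python
from itertools import combinations


def count_differences(seq_a, seq_b, gap="-"):
    """Count aligned columns where two sequences differ.

    Columns with a gap in either sequence are skipped (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(
        1 for a, b in zip(seq_a, seq_b)
        if a != b and gap not in (a, b)
    )


def difference_range(alignment):
    """Return (min, max) pairwise differences for a {name: sequence} alignment."""
    counts = [
        count_differences(alignment[x], alignment[y])
        for x, y in combinations(alignment, 2)
    ]
    return min(counts), max(counts)


# Hypothetical toy alignment, for illustration only.
toy_alignment = {
    "seq1": "MREIVHIQAG",
    "seq2": "MREIVHIQGG",
    "seq3": "MRECV-IQAG",
}
low, high = difference_range(toy_alignment)
print(f"{low}-{high} amino acid differences")
```

Applied to a full alignment, the same pairwise loop would yield a range of the kind reported above, provided the same gap convention is used throughout.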
The sequence difference between diplonemids and Euglena gracilis (23-44 amino acids) was similar to that between diplonemids and kinetoplastids (39-54 amino acids).

GAPDH gene sequences

PCR products of the expected size (around 1000 nt) were obtained from Diplonema sp. 3 and Rhynchopus sp. 3. PCR products of both the expected size and a larger size were obtained from Diplonema sp. 2. All of these PCR products were cloned, and both strands of several clones from each were sequenced. BLAST searches of these sequences confirmed that they all encoded GAPDH, recovering over 90% of a full-length GAPDH gene. In addition, the sequence of the larger PCR product from Diplonema sp. 2 contained two introns (77 nt and 120 nt in length).

The deduced amino acid sequences of the four diplonemid GAPDH genes (excluding PCR primer sequences) were aligned with those from 96 other taxa, including both eukaryotes and prokaryotes. During alignment, I found it difficult to align the four newly obtained diplonemid GAPDH sequences with those of any other euglenozoa. However, it was comparatively easy to align the three intron-lacking GAPDH sequences (from Diplonema sp. 3, Rhynchopus sp. 3 and Diplonema sp. 2) with Anabaena variabilis gap3 (I will refer to these three diplonemid sequences as "gap1"), and the second Diplonema sp. 2 GAPDH with the GapC of cryptomonads (I will refer to this diplonemid sequence as "gap2"). Figure 7 is a representation of this alignment.

Comparison among the four diplonemid GAPDH sequences reveals that the gap1 sequences of Diplonema sp. 3, Rhynchopus sp. 3 and Diplonema sp. 2 are far more similar to each other than to Diplonema sp. 2 gap2. The sequence differences among the gap1 sequences of Diplonema sp. 3, Rhynchopus sp. 3 and Diplonema sp. 2 are 81-93 amino acids over a total length of 301-316 amino acids, whereas the sequence differences between these three and Diplonema sp.
2 gap2 are 187-196 amino acids. Further pairwise sequence comparisons showed that the sequence differences between the gap1 sequences (Diplonema sp. 3, Rhynchopus sp. 3 and Diplonema sp. 2) and Anabaena variabilis gap3 are only 148-162 amino acids, whereas those between these three diplonemid GAPDH sequences and the GapC of E. gracilis, T. brucei and L. mexicana are 195-207 amino acids. Meanwhile, the sequence differences between Diplonema sp. 2 gap2 and the GapC of two cryptomonads (Pyrenomonas salina and Guillardia theta) are only 115-118 amino acids, whereas those between Diplonema sp. 2 gap2 and the GapC of E. gracilis, T. brucei and L. mexicana are 129-152 amino acids. In
[...] either completely absent or present only once or twice each. The C-residues are also present around five times each on average. This contrasts with classical GT-AG mammalian introns, which contain a polypyrimidine tract in this region. Moreover, in the introns of the alpha- and beta-tubulin genes from Diplonema sp. 4, continuous 'CA' repeats were observed.

The branchpoint region in a classical GT-AG intron is usually closer to the 3' splice site than to the 5' splice site. More specifically, this region generally lies 15-40 nt upstream of the 3' splice site (Umen et al. 1995). In yeast, the branchpoint consensus sequence is strictly conserved: 5'-TACTAACA-3' (Umen et al. 1995).
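Searching an intron for a branchpoint consensus, and checking whether a match falls in the expected region upstream of the 3' splice site, can be sketched as follows. This is an illustration only, not the procedure used in the thesis: the function names are hypothetical, the example intron is invented, and treating a match as "in the window" when its last base lies 15-40 nt upstream of the 3' cleavage site is an assumed reading of the rule quoted above. Degenerate positions use the standard IUPAC codes (Y = C or T, R = A or G, N = any base), so the same code handles the strict yeast consensus and a degenerate one such as 5'-YNYTRACN-3'.

```python
import re

# IUPAC nucleotide codes needed for the consensus sequences discussed here.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus such as 'YNYTRACN' into a regex string."""
    return "".join(IUPAC[base] for base in consensus.upper())


def find_branchpoint_candidates(intron, consensus, window=(15, 40)):
    """Return (start, end, in_window) for every match, overlaps included.

    Positions are relative to the 3' cleavage site: the last intron base is -1.
    in_window is True when the last base of the match lies within `window`
    nt upstream of the 3' cleavage site (an assumed convention).
    """
    intron = intron.upper()
    n = len(intron)
    # A lookahead makes the search report overlapping matches as well.
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    hits = []
    for match in pattern.finditer(intron):
        start = match.start() - n
        end = start + len(consensus) - 1
        hits.append((start, end, window[0] <= -end <= window[1]))
    return hits


# Invented 45-nt intron: GT ... TGTTGACT ... AG (not a real sequence).
example_intron = "GT" + "A" * 20 + "TGTTGACT" + "A" * 13 + "AG"
for start, end, ok in find_branchpoint_candidates(example_intron, "YNYTRACN"):
    print(start, end, "within window" if ok else "outside window")
```

A match close to the 5' end of a long intron is reported with `in_window` set to False, which mirrors the reasoning that such positions are unlikely to be genuine branchpoints.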
In contrast, the branchpoint region is only loosely conserved in mammalian introns: 5'-YNYTRACN-3' (Umen et al. 1995). The yeast branchpoint consensus sequence was not observed in any of the eleven diplonemid introns, but the mammalian branchpoint consensus sequence was observed six times in five of the eleven diplonemid introns. However, four of these occurrences, in four different introns, lie either at positions +4 to +11 relative to the 5' cleavage site (three introns in three different copies of the Diplonema sp. 4 beta-tubulin gene) or at positions +9 to +16 (one intron in the alpha-tubulin gene of Diplonema sp. 4). Both regions are highly unlikely to be real branchpoint sites, since they are too close to, or even overlap, the 5' consensus splice sites of the introns. The mammalian branchpoint sequence was also observed between positions -23 and -16 relative to the 3' cleavage site (TGTTGACT) in the intron of the Diplonema sp. 4 alpha-tubulin gene, and between positions -36 and -29 (TCCTGACC) in the first intron (closest to the 5' end of the gene) of the Diplonema sp. 2 gap2.

3.3 Phylogeny of the Euglenozoa

In addition to looking for introns in diplonemids, I constructed three protein phylogenetic trees with the newly obtained diplonemid sequences in an attempt to determine the phylogenetic position of diplonemids within the phylum Euglenozoa.

Actin phylogeny

An actin phylogeny was constructed from 373 alignable characters from a total of 65 eukaryotic taxa using distance and neighbor-joining methods (Figure 9). Most of the phylogenetically distinct eukaryotic groups, including land plants, green algae, animals, fungi, heterokonts and alveolates, are recovered in the actin tree (Fig. 9). The two new diplonemid sequences are closely related to each other and form a clade with 100% bootstrap support. In fact, the whole phylum Euglenozoa (shaded in Fig.
9), consisting of three major groups (diplonemids, euglenoids and kinetoplastids), is well supported by this tree (91% bootstrap value). Furthermore, this actin tree also strongly suggests that the two diplonemid sequences are more closely related to the euglenoid sequences than to the kinetoplastid sequences. The node uniting diplonemids with euglenoids to the exclusion of kinetoplastids (node A in Fig. 9) is well supported by bootstrap (79%).

Fig. 9 Neighbor-joining tree based on actin protein sequences of various eukaryotes, including two new sequences of diplonemids (in bold). This unrooted BioNJ tree was constructed by calculating maximum-likelihood (ML) distances between pairs of sequences. Values on selected branches indicate neighbor-joining bootstrap support greater than 50%, and the bootstrap values of particular interest are in bold. Scale bar indicates amino acid substitutions per site. Node A is the last common ancestor of diplonemid and euglenoid sequences, and node B is the last common ancestor of the phylum Euglenozoa (shaded). Alternative positions for the diplonemids were assessed with Kishino-Hasegawa tests at the nodes marked with open circles. The two alternatives were not rejected at the 5% level.

In an effort to test the likelihood of alternative positions for diplonemids within the phylum Euglenozoa, Kishino-Hasegawa tests were carried out on the actin data. In this case, I tested two alternative positions for the diplonemids. In one alternative, the diplonemids branch with the kinetoplastids, so the internal topology of the phylum becomes ((diplonemids, kinetoplastids), euglenoids), or ((D, K), E). The other possible position of diplonemids is at
the base of this phylum, so the internal topology becomes (D, (K, E)). These two alternative positions of diplonemids are indicated by open circles in Fig. 9. The K-H tests found that these two alternative topologies of Euglenozoa were not significantly worse than the original topology ((D, E), K) at the 5% significance level (Table 2). In conclusion, the actin tree strongly supports the inclusion of diplonemids within the phylum Euglenozoa, but the close association of diplonemids with euglenoids to the exclusion of kinetoplastids is supported only by bootstrap, and not by the K-H tests.

Alpha-tubulin phylogeny

An alpha-tubulin phylogeny was constructed from 436 alignable characters from a total of 64 eukaryotic taxa using distance and neighbor-joining methods. The resulting alpha-tubulin tree (Fig. 10) supports most of the major eukaryotic groups, including alveolates, green algae, red algae, land plants, diplomonads, fungi, animals and parabasalia. The 10 diplonemid sequences from this study form a single group with a very high bootstrap value (96%). The phylum Euglenozoa (euglenoids, kinetoplastids and diplonemids) also forms a single clade, which is supported at 64% by bootstrap.
When the internal phylogeny of the Euglenozoa is considered, the alpha-tubulin tree tells a different story from the actin tree. The alpha-tubulin phylogeny favors diplonemids being closer to kinetoplastids than to euglenoids. However, the node uniting diplonemids and kinetoplastids (node A in Fig. 10) is poorly supported (40% bootstrap). To test the strength of this position for the diplonemids, I carried out K-H tests on two alternative positions for diplonemids (marked by two open circles in Fig. 10). The results showed that the alternative topologies for the phylum Euglenozoa, (D, (E, K)) and (K, (D, E)), were not significantly worse than (E, (D, K)) at the 5% significance level, as suggested by the low bootstrap values in the original alpha-tubulin tree (Table 3).

Fig. 10 Neighbor-joining tree based on alpha-tubulin protein sequences of various eukaryotes, including ten new sequences of diplonemids (in bold). This unrooted BioNJ tree was constructed by calculating maximum-likelihood (ML) distances between pairs of sequences. Values on selected branches indicate neighbor-joining bootstrap support greater than 50%, and the bootstrap values of particular interest are in bold. Scale bar indicates amino acid substitutions per site. Node A is the last common ancestor of diplonemids and kinetoplastids, and node B is the last common ancestor of the phylum Euglenozoa (shaded). Alternative positions for diplonemids were assessed with Kishino-Hasegawa tests at nodes marked with open circles. The alternatives were not rejected at the 5% level.
In conclusion, although the alpha-tubulin tree places diplonemids closer to kinetoplastids than to euglenoids, this placement is not supported by either bootstrap or K-H tests.
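The "significantly worse" verdicts of the K-H tests can be read as a comparison between each log-likelihood difference and its standard error. The sketch below assumes the usual normal approximation, in which a topology is rejected at the 5% level when its difference from the best tree exceeds 1.96 standard errors; the text does not state which criterion the software applied, so this threshold is an assumption. The function name `kh_summary` and the data layout are illustrative, and the numbers are those reported for the alpha-tubulin data in Table 3.

```python
def kh_summary(trees, z_crit=1.96):
    """Summarise a Kishino-Hasegawa comparison.

    trees: list of (topology, log_likelihood, std_err), where std_err is the
    standard error of the difference from the best tree (None for the best).
    Returns (topology, difference, verdict) for each tree.
    """
    best = max(lnl for _, lnl, _ in trees)
    rows = []
    for topology, lnl, std_err in trees:
        diff = best - lnl
        if diff == 0:
            rows.append((topology, 0.0, "best tree"))
        else:
            # Assumed criterion: reject when the difference exceeds
            # z_crit standard errors (normal approximation, 5% level).
            worse = diff > z_crit * std_err
            rows.append((topology, round(diff, 2), "yes" if worse else "no"))
    return rows


# Values from Table 3 (alpha-tubulin): topology, log L, S.E. of the difference.
TABLE_3 = [
    ("(E,(D,K))", -8439.05, 4.84),
    ("(D,(E,K))", -8434.01, None),
    ("(K,(D,E))", -8437.58, 5.57),
]

for topology, diff, verdict in kh_summary(TABLE_3):
    print(topology, diff, verdict)
```

Under this criterion a difference of about 5 log-likelihood units with a standard error near 5 is well short of significance, which is consistent with none of the alternative topologies being rejected.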
Beta-tubulin phylogeny

A beta-tubulin phylogeny was constructed from 428 alignable characters from a total of 59 eukaryotic taxa using distance and neighbor-joining methods. The beta-tubulin tree (Fig. 11) also supports most of the common eukaryotic groups: land plants, green algae, alveolates, heterokonts, animals, fungi, and diplomonads. The thirteen new diplonemid sequences obtained in this study also branch together with 100% bootstrap support. However, in this tree, a heterolobosean sequence (Naegleria gruberi) branches within the Euglenozoa, specifically with euglenoids. The close association of the beta-tubulin sequences from Naegleria gruberi and euglenoids was also indicated by the previously constructed global beta-tubulin tree (Keeling et al. 1996). In both the actin and alpha-tubulin trees, the Heterolobosea form a separate phylogenetic group closest to the phylum Euglenozoa. The inclusion of a member of a different phylogenetic group may cause the low support (less than 50%) for the whole group (the Euglenozoa and Naegleria gruberi, node B in Fig. 11). Furthermore, the suspiciously close association of Naegleria gruberi and euglenoids may obscure the real phylogenetic relationship of diplonemids to kinetoplastids and euglenoids.

As for the phylogenetic placement of the diplonemids within the phylum Euglenozoa, this beta-tubulin tree agrees with the alpha-tubulin tree in placing the diplonemids with the kinetoplastids, in contrast to the actin tree. The bootstrap value for the node uniting diplonemids and kinetoplastids (node A in Fig. 11) is 64%, which is higher than the bootstrap value indicated by the alpha-tubulin tree, but is still relatively low.

Fig. 11 Neighbor-joining tree based on beta-tubulin protein sequences of various eukaryotes, including thirteen new sequences of diplonemids (in bold). This unrooted BioNJ tree was constructed by calculating maximum-likelihood (ML) distances between pairs of sequences. Values on selected branches indicate neighbor-joining bootstrap support greater than 50% and the bootstrap values of particular interest are in bold. Scale bar indicates amino acid substitutions per site. Node A is the last common ancestor of diplonemids and kinetoplastids and node B is the last common ancestor of the phyla Euglenozoa (shaded) and Heterolobosea (represented by Naegleria gruberi in this tree). Alternative positions for diplonemids were assessed with the Kishino-Hasegawa test at nodes with open circles (when Naegleria gruberi branches together with euglenoids) and filled circles (when Naegleria gruberi was moved out of the phylum Euglenozoa, suggested by a dashed line). The five alternatives were not rejected at the 95% confidence level.

[Figure: beta-tubulin tree (image not reproduced). Group labels: diplonemids, kinetoplastids, euglenoids, Heterolobosea, plants, green algae, cryptomonads, alveolates, Cercozoa, heterokonts, animals, fungi, diplomonads, parabasals. Scale bar: 0.1.]

Tree            log L       difference   S.E.    Significantly worse
1 (K,(D,E))     -8698.30    3.22         5.84    no
2 (E,(D,K))     -8700.06    4.99         5.10    no
3 (D,(E,K))     -8695.08    0.00                 best tree

Table 2. Kishino-Hasegawa test of the positions of diplonemids within Euglenozoa in the actin tree. D, diplonemids; E, euglenoids; K, kinetoplastids.

Tree            log L       difference   S.E.    Significantly worse
1 (E,(D,K))     -8439.05    5.04         4.84    no
2 (D,(E,K))     -8434.01    0.00                 best tree
3 (K,(D,E))     -8437.58    3.57         5.57    no

Table 3. Kishino-Hasegawa test of the positions of diplonemids within Euglenozoa in the alpha-tubulin tree. D, diplonemids; E, euglenoids; K, kinetoplastids.

Tree            log L       difference   S.E.    Significantly worse
1 (E,(D,K))     -6424.70    0.00                 best tree
2 (D,(E,K))     -6436.15    11.45        8.51    no
3 (K,(D,E))     -6434.70    10.00        8.90    no
4 (E,(D,K))     -6425.10    0.41         10.26   no
5 (D,(E,K))     -6436.54    11.84        13.89   no
6 (K,(D,E))     -6438.95    14.26        13.46   no

Table 4. Kishino-Hasegawa test of the positions of diplonemids within Euglenozoa in the beta-tubulin tree. Trees 1-3 are topologies in which Naegleria gruberi branches with euglenoids; Trees 4-6 exclude Naegleria gruberi from the Euglenozoa. D, diplonemids; E, euglenoids; K, kinetoplastids.
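The "Significantly worse" column of Table 4 can be checked by hand. The following is a minimal Python sketch, assuming the commonly used criterion that an alternative topology is rejected at the 5% level when its log-likelihood difference from the best tree exceeds 1.96 standard errors. The threshold and the function name are assumptions made for illustration; the thesis does not state the exact criterion its software applied. The numbers are the difference and S.E. values from Table 4.

```python
# Assumed two-sided 5% threshold: reject when difference > 1.96 * S.E.
Z_95 = 1.96

# (tree number, topology, log-likelihood difference, standard error), Table 4.
# Tree 1 is the best tree and therefore has no S.E. to test.
TABLE_4 = [
    ("2", "(D,(E,K))", 11.45, 8.51),
    ("3", "(K,(D,E))", 10.00, 8.90),
    ("4", "(E,(D,K))", 0.41, 10.26),
    ("5", "(D,(E,K))", 11.84, 13.89),
    ("6", "(K,(D,E))", 14.26, 13.46),
]


def significantly_worse(difference, std_error, z=Z_95):
    """Return True if the alternative tree is rejected under the assumed criterion."""
    return difference > z * std_error


for tree, topology, difference, std_error in TABLE_4:
    verdict = "yes" if significantly_worse(difference, std_error) else "no"
    print(f"Tree {tree} {topology}: diff/S.E. = {difference / std_error:.2f} -> {verdict}")
```

Under this assumed threshold none of the five alternatives is rejected, in agreement with the "no" entries in the table.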
Considering the phylogenetic position of Naegleria gruberi within the Euglenozoa, and that its close association with euglenoids might affect the phylogenetic placement of diplonemids within this phylum, I did K-H tests on two alternative positions for the diplonemids with Naegleria gruberi branching with euglenoids within the phylum Euglenozoa ((D,(E,K)) and ((D,E),K), marked by open circles), and on three alternative positions for the diplonemids with Naegleria gruberi constrained outside the phylum Euglenozoa ((E,(D,K)), (D,(E,K)) and ((D,E),K), marked by filled circles). None of these alternative topologies was rejected at the 5% level (Table 4).

In conclusion, the beta-tubulin tree also suggests a closer relationship between diplonemids and kinetoplastids than between diplonemids and euglenoids. However, this relationship is not supported by either bootstrap or K-H tests. Moreover, the validity of the phylogenetic position of diplonemids within the Euglenozoa suggested by the beta-tubulin tree is called into question by the inclusion of a member of a separate phylogenetic group within the Euglenozoa.

To summarize, among the three protein phylogenetic trees constructed in this study, the actin tree shows the strongest bootstrap support, not only for the phylum Euglenozoa, but also for the phylogenetic placement of diplonemids within this phylum. The alpha-tubulin tree shows little ability to resolve the internal phylogeny of the phylum Euglenozoa, and the beta-tubulin tree does not support the Euglenozoa as a monophyletic phylum.

3.4 Lateral gene transfer indicated by GAPDH phylogeny

On the basis of a comparison of 290 alignable amino-acid positions of GAPDH from 100 taxa, a BioNJ tree was constructed (Fig. 12) using a distance and neighbor-joining analysis. This global tree includes not only diverse eukaryotic groups but also diverse prokaryotic groups. The resulting tree revealed a very complex picture of GAPDH gene evolution (Fig. 12) and recovered the basic relationships of the two separate classes of GAPDH sequences, GapC and GapA/B (divided by a dashed line in Fig. 12), typical of GAPDH phylogeny (Michels et al. 1991; Martin et al. 1993; Henze et al. 1995; Liaud et al. 1997).

The GapC clade (above the dashed line) includes the cytosolic GAPDH from most eukaryotes. In published global GAPDH trees, the GapC of most eukaryotes form a sub-clade, with unresolved relationships. This is also indicated in my global GAPDH tree by the lack of bootstrap support for the backbone of the GapC sub-clade. Moreover, my global GAPDH tree also shows that the gap1 sequences from a group of proteobacteria (including the gapA (=gap1) from E. coli) and a group of cyanobacteria are basal to this eukaryotic crown sub-clade, but separated from GapA/B by the GapC sequences from another eukaryotic phylum, Heterolobosea.

The GapA/B clade (below the dashed line) includes the GAPDH genes from most bacteria and the plastid-targeted GAPDH genes from photosynthetic eukaryotes. The eukaryotic plastid-targeted GapA/B sequences form a sub-clade that is closely related to the gap2 sequences from cyanobacteria, in keeping with the cyanobacterial origin of chloroplasts.

In order to do a careful phylogenetic analysis that would be impossible on 100 taxa (I did not perform the gamma-distribution correction on the distance matrices inferred from the 100-taxon GAPDH alignment), I constructed two smaller BioNJ GAPDH trees based on two sub-alignments. Both alignments retained the 290 alignable characters. One includes all 39 taxa in the GapA/B clade (below the dashed line) of the larger GAPDH tree (Fig. 13) and the other includes all 61 taxa in the GapC clade (above the dashed line) of the larger GAPDH tree (Fig. 14). The two smaller GAPDH trees have essentially the same branching order as the global GAPDH tree.

Fig. 12 Phylogeny of diverse eukaryotes and prokaryotes based on GAPDH protein sequences, including the four new sequences of diplonemids (in bold). This unrooted BioNJ tree was constructed by calculating maximum-likelihood (ML) distances between pairs of sequences. Values on selected branches indicate neighbor-joining bootstrap support greater than 50% and the bootstrap values of particular interest are in bold. Dashed line divides the two classes of GAPDH: GapC (above the dashed line) and GapA/B (below the dashed line). Scale bar indicates amino acid substitutions per site. The five shaded regions include all the members of the phylum Euglenozoa in this tree. Node A unites diplonemid and cyanobacterial sequences, and node B unites a second copy of GAPDH of Diplonema sp. 2 and two sequences of cryptomonads.

[Figure: global GAPDH tree (image not reproduced; taxon labels are not recoverable from the extracted text).]

Fig. 13 GAPDH phylogeny of protein sequences of prokaryotes and some eukaryotes, including three new sequences of diplonemids (in bold). This unrooted BioNJ tree was constructed by calculating maximum-likelihood (ML) distances between pairs of sequences. Values on selected branches indicate neighbor-joining bootstrap support greater than 50% and the bootstrap values of particular interest are in bold. For technical details see Material and Methods. Scale bar indicates amino acid substitutions per site. The number in front of the species indicates a particular copy of the GAPDH from that species. The two shaded regions include all the members of the phylum Euglenozoa in this tree. Node A unites diplonemid sequences (Rhynchopus sp. 3, Diplonema sp. 2 gap1, Diplonema sp. 3), cyanobacterial gap3 sequences and one proteobacterial GAPDH sequence.

Fig. 14 GAPDH phylogeny of protein sequences of eukaryotes and some bacteria, including one new sequence of diplonemids (in bold). This unrooted BioNJ tree was constructed by calculating maximum-likelihood (ML) distances between pairs of sequences. Values on selected branches indicate neighbor-joining bootstrap support greater than 50%. For technical details see Material and Methods. Scale bar indicates amino acid substitutions per site. The number in front of the species indicates a particular copy of the GAPDH from that species. The four shaded regions include all the members of the phylum Euglenozoa in this tree. Node B unites Diplonema sp. 2 gap2 and the GapC of cryptomonads.

The phylogeny of the phylum Euglenozoa based on GAPDH sequences is a lot more complicated than those based on actin, alpha-tubulin or beta-tubulin sequences. The various euglenozoan GAPDH genes do not branch together as they do in these other trees: instead, euglenozoan sequences are scattered all over the GAPDH tree (see shaded regions of Fig. 12). As with previous analyses (e.g. Michels et al. 1991; Martin et al. 1993; Henze et al. 1995; Liaud et al. 1997), the Euglena cytosolic/kinetoplastid glycosomal clade is basal to GapC and to the gapA (=gap1) from proteobacteria and the gap1 from cyanobacteria. The cytosolic GAPDH genes of Leishmania mexicana and Trypanosoma brucei are extraordinarily close to Escherichia coli gapA (=gap1).
The chloroplast GapA of Euglena gracilis branches at the base of the clade comprising the plastid-targeted GapA/B sequences of photosynthetic eukaryotes and the gap2 of cyanobacteria.

The phylogenetic positions of the four diplonemid GAPDH sequences from this study are intriguing. None of the four is closely related to any of the GAPDH sequences from other euglenozoa. Three of the four diplonemid GAPDH sequences (Rhynchopus sp. 3, Diplonema sp. 3, and one copy of the GAPDH sequence from Diplonema sp. 2) form a group (gap1) with 100% bootstrap support. This group, surprisingly, branches with the gap3 from cyanobacteria and one proteobacterial gap (Rhodobacterium). The union of the Diplonema sp. 2, Diplonema sp. 3, and Rhynchopus sp. 3 gap1 sequences with these bacterial GAPDHs is robust (supported by a 100% bootstrap value). The most reasonable explanation for this unusual association between prokaryotic and eukaryotic genes is interkingdom lateral gene transfer, much like that suggested to explain the extraordinary affinity between the L. mexicana and T. brucei GapC genes and the E. coli gapA (Michels et al. 1991).

The second copy of GAPDH from Diplonema sp. 2 (gap2) is not closely related to the other three diplonemid sequences, nor is it closely related to the other euglenozoan GAPDH sequences. Instead, it weakly branches with cytosolic GAPDH sequences from cryptomonads, which branch with animals and fungi in Fig. 12 and Fig. 14, with very low bootstrap support (31%). The low bootstrap support makes the phylogenetic placement of the second copy of GAPDH from Diplonema sp. 2 very tentative, but it is certain that it is not related to the diplonemid gap1 genes.

CHAPTER IV: Discussion

In an effort to address questions about the evolutionary history of introns in the phylum Euglenozoa, I have sequenced twenty-nine nuclear-encoded genes from nine different diplonemids.
I discovered eleven introns in nine of the twenty-nine genes. I have also inferred phylogenetic trees from these protein genes (actin, alpha- and beta-tubulin), including the new diplonemid sequences, to attempt to reconstruct the evolutionary history of the euglenozoan introns. In order to gain a better understanding of the GAPDH phylogeny of the phylum Euglenozoa, I also constructed a global GAPDH tree, including four sequences from diplonemids. The resulting phylogenetic positions of the diplonemid sequences were unexpected, which makes the GAPDH phylogeny of the Euglenozoa even more intriguing.

4.1 Phylogeny of the Euglenozoa

The actin, alpha- and beta-tubulin trees constructed in this study confirm that diplonemids represent a third group in the phylum Euglenozoa, along with euglenoids and kinetoplastids (there are no molecular data on Postgaardi), as previously proposed on the basis of morphological (Triemer et al. 1990; Triemer et al. 1991b; Simpson 1997) and molecular phylogenetic evidence (Maslov et al. 1999). While some phylogenetic relationship between diplonemids and the other two euglenozoan groups seems certain, the phylogenetic relationships among the three groups have never been clear. There are three possible topologies for a tree of three lineages (Fig. 15). The topology of the actin tree I constructed with the two diplonemid sequences obtained in this study suggests that diplonemids are more closely related to euglenoids than to kinetoplastids. On the other hand, phylogenetic analyses of alpha- and beta-tubulin, including many new diplonemid sequences, weakly support a different topology, in which diplonemids are more closely related to kinetoplastids than to euglenoids. The third topology, in which diplonemids are at the base of the phylum, and euglenoids and kinetoplastids are closer to each other, has not been supported by any phylogenetic tree constructed in this study or in previous studies (Maslov et al. 1999).
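The statement that three lineages admit exactly three rooted topologies can be made concrete with a minimal Python sketch. The function name and the Newick-like string format are illustrative choices, not anything used in the thesis; D, E and K stand for diplonemids, euglenoids and kinetoplastids, as in Tables 2-4.

```python
from itertools import combinations


def rooted_triples(taxa):
    """Return the rooted topologies for exactly three lineages.

    Each topology is written as (outgroup,(sister1,sister2)): choosing which
    two lineages are sisters fully determines the tree.
    """
    if len(taxa) != 3:
        raise ValueError("expected exactly three lineages")
    trees = []
    for first, second in combinations(taxa, 2):
        outgroup = next(t for t in taxa if t not in (first, second))
        trees.append(f"({outgroup},({first},{second}))")
    return trees


print(rooted_triples(["D", "E", "K"]))
```

Each sister pair gives one tree, so the three pairs (D,E), (D,K) and (E,K) correspond to diplonemids grouping with euglenoids, diplonemids grouping with kinetoplastids, and diplonemids branching at the base.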
In order to assess the reliability of the different topologies suggested by the different protein trees, I did bootstrap analyses (for details, see Results). Bootstrap analysis for the actin tree gives 79% support for the union of diplonemids and euglenoids, while neither the alpha- nor the beta-tubulin tree gives strong bootstrap support for the node uniting the diplonemids and kinetoplastids (only 40% and 64%, respectively). Moreover, the reliability of the beta-tubulin tree is questionable because the phylum Euglenozoa is not holophyletic: the last common ancestor of the three major euglenozoan groups (diplonemids, kinetoplastids and euglenoids) is also an ancestor of Naegleria gruberi. It is well known that Naegleria gruberi belongs to a related but phylogenetically distinct group, the Heterolobosea. The separation of the Heterolobosea and the Euglenozoa is supported by nearly all known molecular phylogenies, including the actin and alpha-tubulin trees I constructed.

Taken together, the topology of the actin tree, in which diplonemids are closer to euglenoids, is probably more reliable than that of the tubulin trees. This conclusion agrees with the topologies suggested by distance and parsimony trees based on the sequences of the SSU rRNA gene and the Cox I (cytochrome c oxidase subunit I) protein, but differs from maximum likelihood trees based on the same molecules (Maslov et al. 1999). Maximum likelihood has proven to be a powerful method of phylogenetic reconstruction. However, as it is a complicated and computationally heavy process, the sampling size in maximum likelihood methods is very limited, especially when the data are composed of protein sequences. The limited sampling size raises doubts as to whether the phylogenetic positions of diplonemids in the maximum likelihood trees of Maslov et al. (1999) are reliable. Moreover, none of the phylogenetic positions of diplonemids suggested by the maximum likelihood trees (Maslov et al. 1999) is well supported by bootstrap analysis: in the maximum likelihood Cox I protein tree, the bootstrap support for the diplonemid/kinetoplastid branch is very low (56%), and in the maximum likelihood SSU rRNA tree it is even lower, less than 50%.

Compared with the phylogenetic analysis conducted by Maslov et al. (1999), the phylogenetic analysis I performed has three advantages. First, the sampling size is comparatively large: the number of diplonemid species is significantly increased, especially in both the alpha- and beta-tubulin trees (10 and 13 diplonemid species, respectively). Second, the number of outgroups is also greatly increased: all three protein trees (actin, alpha-tubulin and beta-tubulin) contain a variety of phylogenetically distant groups. Thus, I may have avoided the bias caused by the choice of outgroups. Different choices of outgroup sequences may lead to variable support for the ingroup phylogeny. This was indeed noticed by Maslov et al. (1999) in their phylogenetic analysis of the Euglenozoa: in the maximum likelihood analysis of SSU rRNA, they found that the choice of Giardia lamblia and Vairimorpha necatrix as outgroups greatly increased the bootstrap support for the association of diplonemids and kinetoplastids, compared to the bootstrap support obtained when choosing Physarum polycephalum and Saccharomyces cerevisiae as outgroups. However, they pointed out that this seemingly high bootstrap support might be caused by the biased nucleotide composition or fast substitution rates in Giardia lamblia and Vairimorpha necatrix. Third, my analyses were performed at the amino acid level, rather than on alignments of DNA sequences as in the SSU rRNA analyses conducted by Maslov et al. (1999). This is also an advantage because a substitution of an amino acid may be more evolutionarily informative than a substitution of a nucleotide, especially when the change of a nucleotide is synonymous.
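The point about synonymous changes can be illustrated with a minimal Python sketch. It uses only four codons of the standard genetic code, and the table and function names are hypothetical helpers for illustration. A nucleotide change that leaves the amino acid unchanged is invisible in a protein alignment, whereas any amino acid difference reflects at least one non-synonymous change.

```python
# Partial standard genetic code: only the codons needed for this example.
CODON_TO_AA = {
    "GCT": "Ala",
    "GCC": "Ala",
    "GAT": "Asp",
    "GAA": "Glu",
}


def classify_substitution(codon_a, codon_b):
    """Classify a codon change as identical, synonymous or non-synonymous."""
    if codon_a == codon_b:
        return "identical"
    same_residue = CODON_TO_AA[codon_a] == CODON_TO_AA[codon_b]
    return "synonymous" if same_residue else "non-synonymous"


print(classify_substitution("GCT", "GCC"))  # third-position change, Ala -> Ala
print(classify_substitution("GAT", "GAA"))  # third-position change, Asp -> Glu
```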
On the other hand, although the phylogenetic analysis conducted in this study probably favours a closer association of diplonemids with euglenoids, in each case the difference between the best tree and the alternative trees was not significant at the 95% confidence level, as inferred by the K-H test (see Results). In order to further assess the reliability of the union of diplonemids and euglenoids, it may be helpful to perform a combined analysis, in which actin, alpha-tubulin and beta-tubulin sequences are combined into a single alignment, and a phylogenetic tree is constructed from this alignment. In addition, it might be helpful to try different combinations of outgroups chosen from the actin, alpha- and beta-tubulin trees in maximum likelihood analyses with the newly obtained protein sequences of diplonemids.

4.2 Possible origins of the intron-types in the Euglenozoa

As mentioned before, three types of introns (the conventional GT-AG spliceosomal intron, the trans-spliced discontinuous intron, and the "aberrant" intron) have been reported in the phylum Euglenozoa. Since conventional spliceosomal introns are seemingly rare in the Euglenozoa, trans-spliced discontinuous introns are rarely found outside this phylum, and the "aberrant" introns are unique to photosynthetic euglenoids, it would be interesting to determine the distribution of intron types in the third major lineage of the Euglenozoa, the diplonemids. In this section, I discuss the possible origins of the three types of intron based on the available information on the distribution of the intron types in the Euglenozoa, and on the internal phylogeny of this phylum discussed in the preceding section.

GT-AG spliceosomal introns

Among the twenty-nine newly sequenced nuclear-encoded genes (actin, alpha-tubulin, beta-tubulin and GAPDH), eleven GT-AG introns were found in nine genes.
Thus, GT-AG introns seem to be frequently present in the actin, alpha-tubulin, beta-tubulin and GAPDH genes of diplonemids. As mentioned in the Introduction, conventional GT-AG introns are very rare in euglenoids and altogether absent from the actin and tubulin genes of Euglena gracilis. The apparent rarity of GT-AG introns in euglenoids could be due to limited sequence sampling. When more nuclear genes from different euglenoids are examined, more GT-AG introns could be discovered. In fact, three GT-AG spliceosomal introns have recently been reported in the fibrillarin gene of Euglena gracilis (Breckenridge et al. 1999), in addition to one GT-AG spliceosomal intron in a beta-tubulin gene of Entosiphon sulcatum (Ebel et al. 1999). Because GT-AG spliceosomal introns have been detected in the nuclear genes of most eukaryotes, including the closest relatives of the Euglenozoa, the phylum Heterolobosea (Remillard et al. 1995), it is reasonable to think that GT-AG spliceosomal introns already existed in the ancestor of the Euglenozoa. Thus, the rarity of GT-AG spliceosomal introns in kinetoplastids and euglenoids is likely due to a high frequency of intron loss.

Trans-spliced discontinuous introns

Trans-splicing occurs abundantly in the post-transcriptional processing of pre-mRNA in both kinetoplastids and euglenoids (see Introduction), but no information is available on whether this process is present in diplonemids. However, by combining the internal phylogeny and the known distribution of trans-splicing within the phylum Euglenozoa, it is possible to make predictions regarding the origin of this unusual process.

There are three possible topologies to describe the phylum Euglenozoa (Fig. 15). In my phylogenetic analysis based on actin, alpha- and beta-tubulin sequences, only two of the three topologies were ever recovered (Fig. 15 A and Fig. 15 B), while the third possible topology, which places diplonemids at the base of the Euglenozoa (Fig. 15 C), was supported neither by my protein trees (actin, alpha- and beta-tubulin trees) nor by any other phylogenetic analysis conducted so far (Maslov et al. 1999). Either of the two plausible euglenozoan phylogenies, whether it unites diplonemids with euglenoids or with kinetoplastids, implies that trans-splicing arose in the common ancestor of all Euglenozoa. If this is true, then all three major groups of Euglenozoa, including diplonemids, should contain trans-spliced, discontinuous introns. On the other hand, if one accepts the third topology of this phylum (with diplonemids basal) and then considers the known distribution of trans-splicing within the Euglenozoa, there are two possible origins of trans-splicing (Fig. 15 C): it originated either in the common ancestor of this phylum or, alternatively, in the common ancestor of euglenoids and kinetoplastids after the separation of the diplonemid lineage. Since the third topology of the Euglenozoa is not supported in any of the phylogenetic analyses, trans-splicing is highly likely to be an ancestral character of the phylum Euglenozoa, and is therefore expected to be found in diplonemids as well.

"Aberrant" introns

In nine of the twenty-nine nuclear-encoded genes sequenced from diplonemids, I discovered eleven GT-AG introns. None of them resembles the "aberrant" introns unique to Euglena gracilis. In kinetoplastids, over 4000 protein sequences are available in GenBank at present, and none of them contains any such "aberrant" intron either. Therefore, it is tempting to speculate that "aberrant" introns are a derived character unique to euglenoids. They are perhaps unique to photosynthetic euglenoids, or they may even be specific to Euglena gracilis, since all thirty "aberrant" introns reported so far are from Euglena gracilis: 26 from two different nuclear-encoded chloroplast-targeted genes and four from the nuclear-encoded, cytosolic GAPDH gene (Tessier et al. 1992; Muchhal et al. 1994; Henze et al. 1995).

Topology   Molecular evidence
A          Actin (distance); SSU rRNA, COI (distance and parsimony) (Maslov et al. 1999)
B          Tubulins (distance); SSU rRNA, COI (maximum likelihood) (Maslov et al. 1999)
C          None

Fig. 15 Three possible topologies (A, B, C) for the internal phylogeny of the Euglenozoa. E, euglenoids; D, diplonemids; K, kinetoplastids. Arrows point at the most likely origin of either "aberrant" introns or trans-splicing. "?" indicates the uncertainty of the most likely origin of trans-splicing between the two sites: either before or after the divergence of the diplonemid lineage from other euglenozoans.

4.3 Features of diplonemid introns

Although there were no "aberrant" introns in any of the 29 newly sequenced diplonemid genes, the eleven GT-AG introns from diplonemids do share four unusual features when compared with conventional GT-AG introns, especially those of mammals, which are the best studied. First, although the lengths of these introns are not unusual among protist introns, they are relatively short when compared with the GT-AG spliceosomal introns of mammals. They range in size from 40 to 149 nt, whereas the GT-AG spliceosomal introns of mammals generally range from 80 to 10000 nucleotides or more. Second, the 5' splice consensus sequence of a typical diplonemid intron differs by one nucleotide from that of a typical mammalian intron. As observed by previous researchers, the consensus sequences of a mammalian GT-AG spliceosomal intron are G/GURAGY at the 5' splice site and CAG/ at the 3' splice site (a slash marks the cleavage site; R represents purine; Y represents pyrimidine; N can be any nucleotide; Umen et al. 1995).
In diplonemid introns, the 5' splice site consensus is G/GURUGY while the 3' splice site consensus is the same as that of mammalian introns. The consensus sequences at the 5' and 3' splice sites of the eleven diplonemid introns are thus very similar to those of the introns in mammals, the only difference being the fourth position at the 5' splice site. It is a U in the diplonemid intron but an A in the animal intron. This is consistent with a previous finding in euglenoids: the conserved sequences at the 5' splice sites of all the euglenoid GU-AG conventional introns found so far (three introns in the fibrillarin gene of Euglena gracilis and one intron in the beta-tubulin gene of Entosiphon sulcatum) also have this single nucleotide substitution (Breckenridge et al. 1999; Ebel et al. 1999). In animal introns, the consensus region at the 5' splice site is recognised through complementary base pairing by U1 snRNA (Sharp 1987). It has been shown that in euglenoids, the highly conserved 5' extremity of U1 sequences contains one complementary substitution (U to A) at the fourth position (Ebel et al. 1999; Breckenridge et al. 1999). Therefore, one would expect an analogous compensatory change at the 5' extremity of the U1 snRNA of diplonemids.

A third unusual feature is that no conventional branchpoint site can be clearly identified in these diplonemid introns. I searched for the branchpoint consensus sequences of both mammalian introns (5'-YNYURACN-3') and yeast introns (5'-TACTAACA-3') in the 11 diplonemid introns (the branchpoint adenosine is underlined; Umen et al. 1995). As mentioned in the Results, the branchpoint consensus sequence of yeast introns was not present in any of the 11 diplonemid introns. On the other hand, the branchpoint consensus sequence of mammalian introns was observed six times in five of the 11 diplonemid introns (twice in one intron).
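A search of this kind can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually used in the thesis: the helper names are hypothetical, the example intron is an invented sequence rather than one of the eleven diplonemid introns, and the chance calculation assumes equal frequencies of the four bases.

```python
import re

# IUPAC codes needed for the two consensus motifs (RNA alphabet).
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U", "R": "AG", "Y": "CU", "N": "ACGU"}


def to_rna(seq):
    """Upper-case a sequence and write T as U so DNA and RNA inputs both work."""
    return seq.upper().replace("T", "U")


def find_motif(intron, motif):
    """Return 1-based start positions (from the 5' end) of all motif matches."""
    classes = "".join("[" + IUPAC[base] + "]" for base in to_rna(motif))
    pattern = re.compile("(?=" + classes + ")")  # lookahead allows overlapping hits
    return [m.start() + 1 for m in pattern.finditer(to_rna(intron))]


def chance_per_position(motif):
    """Probability that a random position matches, assuming equal base frequencies."""
    prob = 1.0
    for base in to_rna(motif):
        prob *= len(IUPAC[base]) / 4
    return prob


# Invented 29-nt intron, for illustration only.
EXAMPLE_INTRON = "GUAUGUCCAACACUAACACCACACAACAG"

print(find_motif(EXAMPLE_INTRON, "YNYURACN"))  # mammalian branchpoint consensus
print(find_motif(EXAMPLE_INTRON, "TACTAACA"))  # yeast branchpoint consensus
print(chance_per_position("YNYURACN"))
```

Because only three of the eight positions in YNYURACN are fixed, the per-position chance of a match under the equal-frequency assumption is 1/512, which gives a rough sense of how often the motif can turn up by chance in a set of short introns.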
But, in four of the six cases, this branchpoint consensus sequence was observed either between positions +4 and +11 or between positions +9 and +16 (relative to the 5' cleavage site) of the introns. These can hardly be true branchpoint sites, since they are too close to the 5' splice sites of the introns. We know that the branchpoint site is closer to the 3' splice site than to the 5' splice site of an intron, usually 15-40 nucleotides upstream of the 3' splice site (Umen et al. 1995). It is highly unlikely that the branchpoint site would be so close to the 5' splice site or even overlap with the 5' splice site consensus sequence. In two introns, this branchpoint consensus sequence was observed once between positions -23 and -16 and once between positions -36 and -29 in a different diplonemid gene. These two sites are also unlikely to be true branchpoint sites, for two reasons. First, if they represented real branchpoint sites in diplonemid introns, then they should be observed in the other nine diplonemid introns as well. Second, the branchpoint consensus sequence of mammalian introns is relatively redundant. Among the eight nucleotides YNYTRACN, only three are specific. So, the chance of finding such a run of eight nucleotides within a stretch of sequence is comparatively high. In short, the branchpoint consensus sequence of a diplonemid intron may be different from that of a mammalian intron or a yeast intron.

Analysis of the 14 alignable nucleotides at the 3' splice sites of the eleven diplonemid introns reveals another unusual feature. The 11-nucleotide regions preceding ACAG/ in diplonemid introns are generally CA-rich (see Results). We know that in a conventional GT-AG intron, especially in mammals, there is a polypyrimidine tract between the branchpoint region and the 3' splice site (Umen et al. 1995).
Previous experiments have demonstrated that the polypyrimidine tract in mammalian introns provides recognition sites for a splicing factor (PSF) and a negative regulatory factor, the polypyrimidine tract binding protein (PTB) (Gerke and Steitz 1986; Tazi et al. 1986; Singh et al. 1995). The binding of PSF to the polypyrimidine tract is essential for both splicing steps (Gerke and Steitz 1986; Tazi et al. 1986; Singh et al. 1995; Umen and Guthrie 1995). It has also been demonstrated that PSF has strong RNA-sequence preferences (Singh et al. 1995). PTB acts as a negative regulator of splicing by binding to the polypyrimidine tract and thus preventing PSF from binding to it (Singh et al. 1995). The CA-rich region adjacent to the consensus CAG/ at the 3' splice site raises the possibility that the role of this region in diplonemid introns differs from its role in other introns.

This CA-rich region is absent at the 3' splice site of the GT-AG intron in the beta-tubulin gene from the colourless euglenoid Entosiphon sulcatum, where, instead, a typical polypyrimidine tract is present (Ebel et al. 1999). Among the three GU-AG introns in the fibrillarin gene from Euglena gracilis (Breckenridge et al. 1999), introns A and C have CT-rich tracts rather than CA-rich tracts at their 3' splice sites, while intron B seems to have a weakly CA-rich tract: four A, four C, two T and two G residues preceding the CAG/. If introns with CA-rich tracts and introns with polypyrimidine tracts can exist in the same pre-mRNA transcript, then it is possible that a splicing factor in euglenoids recognises both the CA-rich tract and the CT-rich tract at the 3' splice site. It is also possible that the splicing factor in diplonemids has the same dual function, since the 11-nucleotide region preceding ACAG/ in one of the 11 diplonemid introns is clearly CT-rich (Alpha-rh2 in Fig. 8).
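The CA-rich versus CT-rich distinction drawn above comes down to base composition, which can be tallied as in the following sketch. The function name tract_richness is hypothetical, and no threshold for calling a tract rich is implied. The example string only reproduces the composition reported above for intron B of the Euglena gracilis fibrillarin gene (four A, four C, two T and two G); it is not the actual nucleotide order.

```python
from collections import Counter
from fractions import Fraction


def tract_richness(tract):
    # Fractions of C+A and of C+T (pyrimidine) residues in the tract preceding CAG/.
    seq = tract.upper().replace('U', 'T')
    if not seq:
        raise ValueError('empty tract')
    counts = Counter(seq)
    n = len(seq)
    return {
        'CA': Fraction(counts['C'] + counts['A'], n),
        'CT': Fraction(counts['C'] + counts['T'], n),
    }


# Composition-only stand-in for intron B: the order of the bases is NOT the real sequence.
intron_b_like = 'AAAACCCCTTGG'
richness = tract_richness(intron_b_like)
print(richness['CA'], richness['CT'])
```

For this composition the C+A fraction is 2/3 and the C+T fraction is 1/2, in line with the description of the tract as only weakly CA-rich.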
In summary, introns seem to be more common in the nuclear encoded genes of diplonemids than in those of either euglenoids or kinetoplastids. The eleven diplonemid introns discovered in this study are all GT-AG introns. However, they differ from classical GT-AG spliceosomal introns in four ways: 1) They are short. 2) Nearly all the diplonemid introns possess a T residue at the fourth position of their 5' splice sites. 3) They do not have the branchpoint consensus sequence of either mammalian or yeast introns. 4) There is a CA-rich region of 12 nucleotides preceding CAG/ at the 3' splice site of a typical diplonemid intron. All these differences suggest that the spliceosomes of diplonemids might be slightly different from the comparatively well-studied spliceosomes of other eukaryotes.

4.4 Evolutionary origin of diplonemid GAPDH

The phylogenetic positions of the four diplonemid GAPDH sequences obtained in this study (see Fig. 12) are unexpected, as none of the diplonemid sequences branch with the three other known groups of euglenozoan sequences (the kinetoplastid glycosome/Euglena cytosol clade, the chloroplast GapA of Euglena gracilis, or the Leishmania mexicana and Trypanosoma brucei cytosolic clade). Instead, the gap1 sequences of three diplonemids (Rhynchopus sp. 3, Diplonema sp. 3 and Diplonema sp. 2) branch within the GapA/B clade, specifically with the gap3 genes of cyanobacteria, while the Diplonema sp. 2 gap2 sequence branches with the cytosolic GAPDH genes of eukaryotes, specifically with those from cryptomonads. The three gap1 sequences are much more similar to each other than to the gap2 sequence from Diplonema sp. 2 (for details, see Results). In addition, they share five insertions that are found neither in the gap2 sequence from Diplonema sp. 2 nor in any other GAPDH sequence examined (see Fig. 7 and Results).
I suggest that the sequence differences between the three gap1 sequences and the gap2 sequence were largely caused by their different evolutionary histories, rather than being the evolutionary consequences of different localizations or different functions within the cell. This is because it seems likely that both types of GAPDH in diplonemids are NAD-specific, possibly playing roles in catabolic glycolysis in the cytosol. As mentioned in the Introduction, the amino acid at position 32 of GAPDH is considered an important indicator of the relative specificity of the enzyme for NAD or NADP as a substrate. The amino acid at position 32 of Diplonema gap2 is aspartic acid (D), the same as in nearly all other NAD-specific cytosolic GAPDH enzymes found in different prokaryotes and eukaryotes, while in the diplonemid gap1 sequences the same position is occupied by glutamic acid (E), which is also shared by gap3 in A. variabilis (see Fig. 7). Comparative studies of the substrate-binding properties of various mutants have suggested that replacing Asp32 (D32) with glutamic acid (E) does not compromise activity with NAD, and that both residues prevent activity with NADP (Clermont et al. 1993). This means that both types of GAPDH in diplonemids are likely NAD-specific. Since cytosolic GAPDH is NAD-specific whereas chloroplast GAPDH can use both NAD and NADP, it is likely that diplonemid gap1 and gap2 both perform the same role (catabolic) in the same location (cytosol).

In the global GAPDH tree (Fig. 12), the three gap1 sequences of three different diplonemids (Diplonema sp. 2, Diplonema sp. 3 and Rhynchopus sp. 3) unite robustly with the cyanobacterial gap3 clade, with 100% bootstrap support.
It is unlikely that the diplonemid gap1 genes come from a cyanobacterial contaminant, since these three gap1 sequences were isolated from three independently grown axenic cultures (see Materials and Methods). Three related but different cyanobacteria would have had to contaminate the three diplonemid cultures, which is highly unlikely. Since contamination can be ruled out, lateral gene transfer is the only way to explain why diplonemids have eubacterial genes, and the close and strongly supported relationship with cyanobacterial gap3 suggests that diplonemids acquired their gap1 from a cyanobacterium through horizontal gene transfer.

Lateral gene transfer has been invoked previously to explain unusual associations observed in GAPDH phylogeny. In addition to the GAPDH genes of parabasalids mentioned in the Introduction, another case is the extraordinarily close relationship between the cytosolic GapC sequences of T. brucei and L. mexicana and the E. coli gap1 sequence. Michels et al. (1991) and Martin et al. (1993) have postulated that the ancestor of the trypanosome lineage received this gene by a prokaryote-to-eukaryote lateral gene transfer from an E. coli-like ancestor relatively recently in evolution. Those kinetoplastids that separated early in evolution from the trypanosome lineage (such as the bodonid Trypanoplasma borelli) possess only the glycosomal GAPDH gene, and lack the cytosolic genes found in \"higher\" kinetoplastids (Michels et al. 1992). Therefore, Henze et al. (1995) concluded that the genes for cytosolic GAPDH in kinetoplastids provide evidence for an evolutionarily recent gene transfer.

The second copy of GAPDH from Diplonema sp. 2 (gap2) differs considerably in sequence from the three diplonemid gap1 genes and appears from the phylogeny to be unrelated to them (see Results). In the GAPDH tree (Fig. 12), this gap2 from Diplonema sp.
2 falls into the eukaryotic crown taxa (the GapC clade), and branches specifically with the cytosolic GAPDH genes from cryptomonads. These are in turn closely related to those of animals and fungi, but very distant from the other diplonemid GAPDH sequences and from the GAPDH sequences of both kinetoplastids and euglenoids. The association of gap2 with the cryptomonad GAPDH sequences is very weak (31% bootstrap support); however, it is clear that the diplonemid gap2 sequence is not related to any other euglenozoan GAPDH sequence. The origin of this GAPDH and its evolutionary relationships to other euglenozoan GAPDH sequences are difficult to determine, since the phylogenetic position of Diplonema sp. 2 gap2 is so tentative. However, the presence of a Diplonema GAPDH sequence within the eukaryotic crown taxa (GapC sub-clade) raises a tantalising question: could this be descended from the original GapC of the Euglenozoa, which is now lost in both euglenoids and kinetoplastids?

In summary, three points can be inferred from this GAPDH phylogeny of the phylum Euglenozoa: 1) The GAPDH phylogeny of the phylum Euglenozoa is complex. There are two distinct types of GAPDH enzymes in each of the three major groups: euglenoids, kinetoplastids and diplonemids. Except for the cytosolic GAPDH in Euglena and the glycosomal GAPDH in kinetoplastids, the remaining four types of GAPDH sequences (chloroplast GapA in Euglena gracilis, cytosolic GAPDH in trypanosomes, cyanobacteria-related GAPDH in diplonemids, and the Diplonema sp. 2 gap2) are scattered all over the global GAPDH tree. 2) The extraordinarily close association of the Diplonema sp. 2, Diplonema sp. 3 and Rhynchopus sp. 3 GAPDH sequences with the gap3 sequences of the cyanobacteria suggests an inter-domain horizontal gene transfer from a prokaryotic to a eukaryotic genome. 3) A different copy of diplonemid GAPDH (Diplonema sp.
2 gap2) branches with the GapC sequences of other eukaryotes, and may represent the ancestral euglenozoan GAPDH.

References

Agabian, N. 1990. Trans splicing of nuclear pre-mRNAs. Cell 61:1157-60.
Biesecker, G., J. I. Harris, J. C. Thierry, J. E. Walker, and A. J. Wonacott 1977. Sequence and structure of D-glyceraldehyde 3-phosphate dehydrogenase from Bacillus stearothermophilus. Nature 266:328-33.
Blumenthal, T., and J. Thomas 1988. Cis and trans mRNA splicing in C. elegans. TIG 4:305-8.
Borst, P., and B. W. Swinkels 1989. The evolutionary origin of glycosomes: how glycolysis moved from cytosol to organelle. In Evolutionary tinkering in gene expression (Grunberg-Manago, M., Clark, B. F., Zachau, H. G., eds.) pp. 163-74, Plenum Publishing Corporation, New York.
Breckenridge, D. G., Y. Watanabe, S. J. Greenwood, M. W. Gray, and M. N. Schnare 1999. U1 small nuclear RNA and spliceosomal introns in Euglena gracilis. Proc Natl Acad Sci USA 96:852-6.
Cavalier-Smith, T. 1981. Eukaryote kingdoms: seven or nine? Biosystems 14:461-81.
Clermont, S., C. Corbier, Y. Mely, D. Gerard, A. Wonacott, and G. Branlant 1993. Determinants of coenzyme specificity in glyceraldehyde-3-phosphate dehydrogenase: role of the acidic residue in the fingerprint region of the nucleotide binding fold. Biochemistry 32:10178-84.
Davis, R. E. 1997. Surprising diversity and distribution of spliced leader RNAs in flatworms. Mol Biochem Parasitol 87:29-48.
Ebel, C., C. Frantz, F. Paulus, and P. Imbault 1999. Trans-splicing and cis-splicing in the colourless Euglenoid, Entosiphon sulcatum. Curr Genet 35:542-50.
Farmer, M. A., and R. E. Triemer 1988. Flagellar systems in the euglenoid flagellates. Biosystems 21:283-91.
Felsenstein, J. 1993. PHYLIP (phylogeny inference package). Distributed by the author, Department of Genetics, University of Washington, Seattle, Version 3.57c.
Gascuel, O. 1997.
BioNJ: an improved version of the NJ algorithm based on a simple model of sequence data. Mol Biol Evol 14:685-95.
Gerke, V., and J. A. Steitz 1986. A protein associated with small nuclear ribonucleoprotein particles recognizes the 3' splice site of premessenger RNA. Cell 47:973-84.
Gibbs, S. P. 1978. The chloroplasts of Euglena may have evolved from symbiotic green algae. Can J Bot 56:2883-9.
Henze, K., A. Badr, M. Wettern, R. Cerff, and W. Martin 1995. A nuclear gene of eubacterial origin in Euglena gracilis reflects cryptic endosymbioses during protist evolution. Proc Natl Acad Sci USA 92:9122-6.
Keeling, P. J., and W. F. Doolittle 1996. Alpha-tubulin from early-diverging eukaryotic lineages and the evolution of the tubulin family. Mol Biol Evol 13:1297-305.
Kishino, H., and M. Hasegawa 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J Mol Evol 29:170-9.
Laird, P. W. 1989. Trans splicing in trypanosomes - archaism or adaptation? Trends Genet 5:204-8.
Lee, J. J., S. H. Hutner, and E. C. Bovee 1985. Order 11. Kinetoplastida, pp. 141-55, in An illustrated guide to the protozoa. Allen Press, Lawrence, USA.
Liaud, M. F., U. Brandt, M. Scherzinger, and R. Cerff 1997. Evolutionary origin of cryptomonad microalgae: two novel chloroplast/cytosol-specific GAPDH genes as potential markers of ancestral endosymbiont and host cell components. J Mol Evol 44 Suppl 1:S28-37.
Mair, G., H. Shi, H. Li, A. Djikeng, H. O. Aviles, J. R. Bishop, F. H. Falcone, C. Gavrilescu, J. L. Montgomery, M. I. Santori, L. S. Stern, Z. Wang, E. Ullu, and C. Tschudi 2000. A new twist in trypanosome RNA metabolism: cis-splicing of pre-mRNA. RNA 6:163-9.
Markos, A., A. Miretsky, and M. Muller 1993.
A glyceraldehyde-3-phosphate dehydrogenase with eubacterial features in the amitochondriate eukaryote, Trichomonas vaginalis. J Mol Evol 37:631-43.
Martin, W., H. Brinkmann, C. Savonna, and R. Cerff 1993. Evidence for a chimeric nature of nuclear genomes: eubacterial origin of eukaryotic glyceraldehyde-3-phosphate dehydrogenase genes. Proc Natl Acad Sci USA 90:8692-6.
Maslov, D. A., S. Yasuhira, and L. Simpson 1999. Phylogenetic affinities of Diplonema within the Euglenozoa as inferred from the SSU rRNA gene and partial COI protein sequences. Protist 150:33-42.
Michels, P. A., and V. Hannaert 1994. The evolution of kinetoplastid glycosomes. J Bioenerg Biomembr 26:213-9.
Michels, P. A. M., F. R. Opperdoes, V. Hannaert, E. A. C. Wiemer, S. Allert, and N. Chevalier 1992. Phylogenetic analysis based on glycolytic enzymes. Belg Journ Bot 125:164-73.
Michels, P. A., M. Marchand, L. Kohl, S. Allert, R. K. Wierenga, and F. R. Opperdoes 1991. The cytosolic and glycosomal isoenzymes of glyceraldehyde-3-phosphate dehydrogenase in Trypanosoma brucei have a distant evolutionary relationship. Eur J Biochem 198:421-8.
Muchhal, U. S., and S. D. Schwartzbach 1994. Characterization of the unique intron-exon junctions of Euglena gene(s) encoding the polyprotein precursor to the light-harvesting chlorophyll a/b binding protein of photosystem II. Nucleic Acids Res 22:5737-44.
Nilsen, T. W. 1995. Trans-splicing: an update. Mol Biochem Parasitol 73:1-6.
Nilsen, T. W. 1994. Unusual strategies of gene expression and control in parasites. Science 264:1868-9.
Nilsen, T. W. 1989. Trans-splicing in nematodes. Exp Parasitol 69:413-6.
Opperdoes, F. R. 1987. Compartmentation of carbohydrate metabolism in trypanosomes. Annu Rev Microbiol 41:127-51.
Opperdoes, F. R., and P. A. M. Michels 1989. Biogenesis and evolutionary origin of peroxisomes. In Organelles in eukaryotic cells: molecular structure and interactions (Tager, J. M., Azzi, A.
, Papa, S., and Guerrieri, F., eds.) pp. 187-95, Plenum Publishing Corporation, New York.
Remillard, S. P., E. Y. Lai, Y. Y. Levy, and C. Fulton 1995. A calcineurin-B-encoding gene expressed during differentiation of the amoeboflagellate Naegleria gruberi contains two introns. Gene 154:39-45.
Sambrook, J., E. F. Fritsch, and T. Maniatis 1989. Small-scale preparations of plasmid DNA. In Molecular cloning (a laboratory manual, second edition) pp. 1.25-1.32, Cold Spring Harbor Laboratory Press, USA.
Schnepf, E. 1994. Light and electron microscopical observations in Rhynchopus coscinodiscivorus spec. nov., a colorless, phagotrophic Euglenozoon with concealed flagella. Arch Protistenkd 144:63-74.
Sharp, P. A. 1987. Splicing of messenger RNA precursors. Science 235:766-71.
Simpson, A. G. B. 1997. The identity and composition of the Euglenozoa. Arch Protistenkd 148:318-28.
Simpson, A. G. B., D. H. J. Van, C. Bernard, H. R. Burton, and D. J. Patterson 1996/97. The ultrastructure and systematic position of the euglenozoon Postgaardi mariagerensis, Fenchel et al. Arch Protistenkd 147:213-25.
Singh, R., J. Valcarcel, and M. R. Green 1995. Distinct binding specificities and functions of higher eukaryotic polypyrimidine tract-binding proteins. Science 268:1173-6.
Strimmer, K., and A. von Haeseler 1996. Quartet puzzling: a quartet maximum likelihood method for reconstructing tree topologies. Mol Biol Evol 13:964-9.
Tazi, J., C. Alibert, J. Temsamani, I. Reveillaud, G. Cathala, C. Brunel, and P. Jeanteur 1986. A protein that specifically recognizes the 3' splice site of mammalian pre-mRNA introns is associated with a small nuclear ribonucleoprotein. Cell 47:755-66.
Tessier, L. H., R. L. Chan, M. Keller, J. H. Weil, and P. Imbault 1992. The Euglena gracilis rbcS gene contains introns with unusual borders. FEBS Lett 304:252-5.
Tessier, L. H., M. Keller, R. L. Chan, R. Fournier, J. H. Weil, and P.
Imbault 1991. Short leader sequences may be transferred from small RNAs to pre-mature mRNAs by trans-splicing in Euglena. EMBO J 10:2621-5.
Triemer, R. E., and M. A. Farmer 1991a. An ultrastructural comparison of the mitotic apparatus, feeding apparatus, flagellar apparatus and cytoskeleton in euglenoids and kinetoplastids. Protoplasma 164:91-104.
Triemer, R. E., and M. A. Farmer 1991b. The ultrastructural organization of the heterotrophic euglenoids and its evolutionary implications, pp. 183-204. In Patterson, D. J., and J. Larsen (ed.), The biology of free-living heterotrophic flagellates. Clarendon Press, Oxford.
Triemer, R. E., and D. W. Ott 1990. Ultrastructure of Diplonema ambulator Larsen & Patterson (Euglenozoa) and its relationship to Isonema. Europ J Protistol 25:316-20.
Umen, J. G., and C. Guthrie 1995. The second catalytic step of pre-mRNA splicing. RNA 1:869-85.
Viscogliosi, E., and M. Muller 1998. Phylogenetic relationships of the glycolytic enzyme, glyceraldehyde-3-phosphate dehydrogenase, from parabasalid flagellates. J Mol Evol 47:190-9.

Addendum

Spliced leader sequences have recently been isolated from Diplonema papillatum and Diplonema sp. by D. A. Campbell, University of California at Los Angeles (unpublished data, personal communication with Dr. Patrick Keeling). This discovery strongly supports my prediction that trans-splicing is an ancestral characteristic of the phylum Euglenozoa. "@en . "Thesis/Dissertation"@en . "2000-11"@en . "10.14288/1.0089614"@en . "eng"@en . "Botany"@en . "Vancouver : University of British Columbia Library"@en . "University of British Columbia"@en . "For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use."@en . "Graduate"@en . "The evolutionary implications of diplonemids and their spliceosomal introns"@en . "Text"@en . "http://hdl.handle.net/2429/10769"@en .