UBC Theses and Dissertations

Characterization of RNA viruses from the coastal waters of British Columbia Culley, Alexander Ian 2007

CHARACTERIZATION OF RNA VIRUSES FROM THE COASTAL WATERS OF BRITISH COLUMBIA

by

ALEXANDER IAN CULLEY

B.Sc., University of Oregon, 1993
M.Sc., Moss Landing Marine Laboratories, 2000

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES (Botany)

THE UNIVERSITY OF BRITISH COLUMBIA

January 2007

© Alexander Ian Culley, 2007

Abstract

RNA viruses are major pathogens of animals and plants and include viruses that are of enormous economic and public-health concern. In the ocean, RNA viruses infect organisms from bacteria to whales, but RNA virus communities in the sea remain essentially unknown. Although what we know of marine RNA viruses is restricted to a limited number of isolates, emerging data suggest that RNA viruses might be more abundant, and are likely more ecologically important, than has been suggested. Therefore the hypothesis of this dissertation is that RNA viruses comprise a detectable and diverse fraction of the marine virus community. Towards testing this premise, the research objectives were to sequence a marine RNA virus isolate, Heterosigma akashiwo RNA virus (HaRNAV), evaluate the diversity of picorna-like viruses, a superfamily of positive-sense single-stranded (ss) RNA viruses, and construct whole-genome shotgun libraries to characterize two complete RNA virus assemblages. The results of all three studies underline the novelty of the marine RNA virus community. For example, HaRNAV is related to picorna-like viruses, but does not belong within any currently defined virus family and has therefore been classified in the Marnaviridae, a newly established virus family.
Furthermore, on the basis of analysis of RNA-dependent RNA polymerase sequences amplified from marine virus communities from the Strait of Georgia, a diverse array of picorna-like viruses exists in the ocean. All of the sequences amplified were divergent from known picorna-like viruses, and fell within four monophyletic groups. Finally, analysis of reverse transcribed whole-genome shotgun libraries revealed a diverse assemblage of RNA viruses, including a broad group of marine picorna-like viruses and distant relatives of viruses infecting arthropods and higher plants. Moreover, the genomes of several hitherto undiscovered viruses were completely assembled. These data are among the first characterizations of the in situ marine RNA virus community and represent a preliminary step in the elucidation of their role in the marine environment. The discovery of novel groups of viruses that are significantly divergent from established taxa should be of interest to virologists, oceanographers, and microbial ecologists.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
List of Symbols and Abbreviations
Acknowledgements
Dedication
Co-Authorship Statement

Chapter I. Introduction to marine RNA viruses
  1.1 Background
    1.1.1 Importance of marine viruses
    1.1.2 Most marine viruses are bacteriophages
    1.1.3 A majority of marine viruses have DNA genomes
    1.1.4 Marine RNA bacteriophages
    1.1.5 Virus taxonomy
    1.1.6 Relevant molecular methods in marine virology
    1.1.7 Introduction to RNA virology
    1.1.8 Marine RNA viruses
    1.1.9 RNA viruses that infect marine protists
  1.2 Thesis theme
  1.3 Table
  1.4 References

Chapter II. Genome sequence and characterization of a virus (HaRNAV) related to picorna-like viruses that infects the marine toxic bloom-forming alga Heterosigma akashiwo
  2.1 Introduction
  2.2 Results
    2.2.1 Features of the HaRNAV genome sequence
    2.2.2 Determination of HaRNAV genome polarity
    2.2.3 Analysis of HaRNAV structural proteins
    2.2.4 Comparisons of HaRNAV proteins and putative protein domains to other virus sequences
    2.2.5 Phylogenetic analyses of HaRNAV proteins and putative protein domains
  2.3 Discussion
  2.4 Materials and Methods
    2.4.1 Purification of virus particles
    2.4.2 Determination of HaRNAV genomic sequence
    2.4.3 Protein sequencing
    2.4.4 Nucleotide and protein sequence analyses
    2.4.5 Sequence alignments
    2.4.6 Phylogenetic tree construction and presentation
  2.5 Tables and Figures
  2.6 References

Chapter III. High diversity of unknown picorna-like viruses in the sea
  3.1 Introduction
  3.2 Results and Discussion
  3.3 Materials and Methods
    3.3.1 Sample collection and preparation
    3.3.2 RT-PCR
    3.3.3 Cloning and sequencing
  3.4 Tables and Figures
  3.5 References

Chapter IV. Metagenomic analysis of coastal RNA virus communities
  4.1 Introduction
  4.2 Results and Discussion
  4.3 Materials and Methods
    4.3.1 Station description
    4.3.2 Virus concentration
    4.3.3 RNase treatment and extraction
    4.3.4 DNase 1 treatment
    4.3.5 Universal rRNA PCR
    4.3.6 cDNA synthesis
    4.3.7 Second-strand synthesis
    4.3.8 Adapter addition
    4.3.9 Column chromatography
    4.3.10 Adapter-targeted PCR
    4.3.11 Cloning & Sequencing
    4.3.12 Sequence fragment classification
    4.3.13 Contig assembly
    4.3.14 Bias
    4.3.15 Phylogenetic analyses
    4.3.16 cDNA synthesis for picorna-like RdRp RT-PCR and DGGE
    4.3.17 PCR with degenerate primers
    4.3.18 DGGE
    4.3.19 Accession numbers
  4.4 Tables and Figures
  4.5 References

Chapter V. The complete genomes of three viruses assembled from shotgun libraries of marine RNA virus communities
  5.1 Introduction
  5.2 Results and Discussion
    5.2.1 Jericho Pier site
    5.2.2 Strait of Georgia site
  5.3 Materials and Methods
    5.3.1 Station description
    5.3.2 Virus concentration method
    5.3.3 Whole-genome shotgun library construction
    5.3.4 5' and 3' RACE
    5.3.5 PCR
      5.3.5.1 Closing gaps in the assembly
      5.3.5.2 Environmental screening
  5.4 Tables and Figures
  5.5 References

Chapter VI. Conclusions
  6.1 Concluding remarks
    6.1.1 Recapitulation
    6.1.2 Bias
      6.1.2.1 Bias associated with sample collection and extraction of RNA
      6.1.2.2 Bias associated with RT
      6.1.2.3 Bias associated with PCR
      6.1.2.4 Bias associated with cloning
      6.1.2.5 Bias associated with WGS library construction
    6.1.3 Significance of the research
  6.2 Figure
  6.3 References

List of Tables

Table 1.1 RNA viruses that infect marine protists
Table 2.1 Primers used for cDNA synthesis, PCR and RT-PCR
Table 2.2 N-terminal sequences of proteins from purified HaRNAV particles
Table 2.3 Summary of viruses used in phylogenetic analyses
Table 3.1 Sequence details
Table 4.1 Characterization of sampling sites. The location is given in degree decimal format. A chlorophyll a value was not available (n.a.) for the SOG sample. We did not observe a bloom at either station during sampling
Table 4.2 Identification of the top tBLASTx matches (E value < 0.001, n = 92) of environmental sequences from JP and SOG libraries with the GenBank database. A number in bold indicates the highest percentage of matches in each sample, and (-) indicates the virus family, genus or species was not present
Table 4.3 Classification of significant tBLASTx matches (E value < 0.001, n = 92) to viral sequences into protein categories
Table 4.4 Sequences used in phylogenetic analyses
Table 5.1 Comparison of base composition between dicistronic picorna-like viruses
Table 5.2 JP genome survey sample sites. A "+" indicates amplification and "-" indicates no amplification occurred, "n.a." indicates the data are not available and "S" means the sample was taken from the surface
Table 5.3 Virus sequence details
Table 5.4 PCR primers used to complete the three genome sequences. Primers JP-A 5 and 6 and JP-B 6 and 7 (shown in bold) were used in the environmental survey

List of Figures

Figure 2.1 Analysis of the HaRNAV genome sequence for open reading frames, and coverage of the genome by PCR and cloning
Figure 2.2 Sequence of the 5' untranslated region of the HaRNAV genome
Figure 2.3 Demonstration that the HaRNAV genome is positive-stranded
Figure 2.4 Analysis of structural proteins from HaRNAV particles
Figure 2.5 Representation of the predicted HaRNAV polyprotein
Figure 2.6 Alignment of HaRNAV sequences with sequences from other viruses
Figure 2.7 Phylogenetic analysis of RNA-dependent RNA polymerase domain protein sequences
Figure 2.8 Phylogenetic analysis of concatenated (putative) helicase/RdRp/VP3-like capsid protein sequences
Figure 3.1 Maximum-likelihood tree of RdRp sequences from environmental amplicons and representative viruses from picorna-like virus families
Figure 4.1 Composition of the JP (outer circle, n = 216) and the SOG (inner circle, n = 61) libraries
Figure 4.2 Comparison of the general genomic organization of the RNA virus genomes assembled from the JP and SOG libraries (JP and SOG) with representative viruses from the (A) proposed order Picornavirales (Christian et al. 2005) and the (B) family Tombusviridae and genus Umbravirus
Figure 4.3 Bayesian maximum likelihood trees of aligned RdRp amino acid sequences from the JP RNA virus community and representative members of the proposed order Picornavirales (Christian et al. 2005)
Figure 4.4 Bayesian maximum likelihood trees of aligned RdRp amino acid sequences from the SOG virus library and representative viruses from the Tombusviridae and Umbravirus genus
Figure 5.1 Analysis of genomes for possible open reading frames
Figure 5.2 Map of the Strait of Georgia, British Columbia, Canada with station locations
Figure 5.3 Bayesian maximum likelihood trees of aligned RdRp amino acid sequences from JP-A and JP-B and representative members of the proposed order Picornavirales
Figure 5.4 Bayesian maximum likelihood trees of aligned concatenated helicase, RdRp and VP3-like capsid amino acid sequences from JP-A and JP-B and other picorna-like viruses
Figure 5.5 Bayesian maximum likelihood trees of aligned RdRp amino acid sequences from SOG and members of the family Tombusviridae and unassigned genus Umbravirus
Figure 6.1 Clade of marine picorna-like virus RdRp sequences from Figure 4.3

List of Symbols and Abbreviations

A  adenine
aa  amino acid
D  aspartic acid
bp  base-pair
BLAST  basic local alignment search tool
CO2  carbon dioxide
contig  contiguous segment of overlapping sequence fragments
C  cytosine
°C  degrees Celsius
DGGE  denaturing gradient gel electrophoresis
DNA  deoxyribonucleic acid
dNTP  deoxyribonucleoside triphosphate
dia  diameter
DTT  dithiothreitol
dsDNA  double-stranded DNA
dsRNA  double-stranded RNA
EDTA  ethylenediaminetetraacetic acid
E value  expected value
FRP  Fraser River plume
G  guanine/glycine
h  hour
JP  Jericho Pier
IRES  internal ribosome entry site
kbp  kilobasepairs
kDa  kilodalton
kPa  kilopascal
LASL  linker amplified shotgun library
l  litre
MgCl2  magnesium chloride
mg  milligram
min  minute
ml  millilitre
mM  millimolar
mm  millimeter
M  molar
ng  nanogram
nM  nanomolar
nm  nanometer
NCBI  National Center for Biotechnology Information
nt(s)  nucleotide(s)
oligo  oligonucleotide
ORF  open reading frame
ppt  parts per thousand
PHACCS  phage communities from contig spectrum
PAUP  phylogenetic analysis using parsimony
pmol  picomole
poly(A)  polyadenylate
PCR  polymerase chain reaction
PVDF  polyvinylidene fluoride
PSI-BLAST  position-specific iterated BLAST
PFGE  pulsed field gel electrophoresis
RACE  rapid amplification of cDNA ends
RT  reverse transcription
RT-PCR  reverse transcription-polymerase chain reaction
RNA  ribonucleic acid
RdRp  RNA-dependent RNA polymerase
s  second
ssDNA  single-stranded DNA
ssRNA  single-stranded RNA
SDS-PAGE  sodium dodecyl sulphate polyacrylamide gel electrophoresis
km2  square kilometers
SOG  Strait of Georgia
T  thymine
TEM  transmission electron microscope
TAE  tris acetate EDTA
TBE  tris borate EDTA
unid.  unidentified
U  units/uracil
UTR  untranslated region
V  version
v/v  volume per volume
w/v  weight per volume
WGS  whole-genome shotgun
~  approximately
µg  microgram
µl  microlitre
µm  micrometer
µM  micromolar
-  negative-sense
+  positive-sense
'  prime
x g  times gravity

Acknowledgements

I thank Amanda Toperoff for her unwavering support, friendship, creative input and eternal optimism and my family for their encouragement and patience.
Thanks to my advisor Curtis Suttle for his guidance, ingenuity and proficient editing and to Andrew Lang for demonstrating the delicate balance of persistence, precision, faith and resiliency required in molecular biology. I offer my gratitude to the members of the Suttle laboratory past and present for lively scientific discourse, assistance in sample collection and buoyant camaraderie, in particular, Amy Chan, Caroline Chenard, Jessie Clasen, Andre Comeau, Matt Fischer, Emma Hambly, Janice Lawrence, Pascal Loret, Jerome Payet, Nina Nemcek, Cindy Short, Steven Short and Vera Tai. I am grateful to Debbie Adam, Mary Berbee, Patrick Keeling, Keizo Nagasaki, D'Ann Rochon, Helene Sanfacon and Lee Taylor for their assistance during the publication process, and my committee members, Francois Jean, Patrick Keeling, Bill Mohn and Curtis Suttle for their valuable input. My work could not have been completed without the financial support of the Department of Botany, the Department of Earth and Ocean Sciences, the University of British Columbia and the Natural Sciences and Engineering Research Council of Canada.

Dedication

to blue

Co-Authorship Statement

In Chapter II, Andrew Lang designed and directed the sequencing effort. I participated in all aspects of the sequencing of the HaRNAV genome with the exception of N-terminal protein sequencing of structural proteins. My primary contribution to data analyses was to the phylogenetic analysis of conserved putative proteins. My contributions to the manuscript included a discussion of the phylogenetic results and participation in manuscript preparation and revision.
In Chapters III, IV and V, I designed and performed the research, analyzed the data and was the lead author of the manuscripts. Andrew Lang made significant contributions to methods development and participated in the preparation and revision of the manuscripts. As the research supervisor, Curtis Suttle was involved in the conceptualization and design of the research and in manuscript preparation and revision, but was not involved in the execution of the research.

Chapter I. Introduction to marine RNA viruses

1.1 Background

1.1.1 Importance of marine viruses

Viruses are the most numerous biological entities in the ocean, typically present at concentrations of tens of billions of free virions per liter of seawater (Wommack & Colwell 2000). The virus community, or virioplankton, is composed of a morphologically (Weinbauer 2004) and genetically (Edwards & Rohwer 2005) diverse array of pathogens that infect organisms from every level of the marine food web, ranging from cetaceans (Bracht et al. 2006) to microbes (Weinbauer 2004). Viruses play significant ecological roles in the marine environment: they control the population structure of planktonic communities (Wommack et al. 1999), act as direct agents of mortality (Fuhrman & Noble 1995) and mediate horizontal gene transfer (Jiang & Paul 1998), and thereby ultimately influence the way nutrients cycle in the ocean. Moreover, there are data indicating that the ocean can act as a reservoir for the transmission of viruses that cause disease in humans and terrestrial plants and animals (Munn 2006).

As well as being abundant and ubiquitous, viruses are agents of mortality in the ocean.
On a community level, changing the concentration of viruses in seawater can affect prokaryote abundance (Proctor & Fuhrman 1990) and phytoplankton biomass (Suttle et al. 1990). Fuhrman and Noble (1995) demonstrated that viral lysis and zooplankton grazing could contribute equally to mortality in marine prokaryote communities, and Fischer et al. (2006) estimated that virus-induced lysis accounted for, on average, an order of magnitude more prokaryote mortality than grazing in marine sediments. Nevertheless, investigations of viruses in the water column (Pedros-Alio et al. 2000) and sediment (Filippini et al. 2006) found the relative contribution of viruses to be insignificant, demonstrating that the impact of viruses can be variable. Virus-induced mortality of prokaryotes is on average 25% in oceanic and 58% in coastal waters (Weinbauer 2004); however, estimates range from 0 to greater than 100% and clearly suffer from poorly constrained assumptions (Suttle 2005).

Viruses have also been implicated in the termination of plankton blooms. For example, blooms of a strain of Vibrio natriegens were terminated by the addition of a natural virus community (Hennes et al. 1995). The marine coccolithophorid Emiliania huxleyi is capable of forming blooms in temperate waters that cover upwards of 10,000 km2 (Dunigan et al. 2006). E. huxleyi cells are armored in calcium carbonate scales and thus the demise of these immense blooms results in a significant flux of carbon from the surface to deeper waters (Dunigan et al. 2006). In several cases, viruses have been identified as the primary agent of E. huxleyi bloom termination (Bratbak et al. 1993, Wilson et al. 2002).
Viruses have been recognized in greater than 50 species of algae; however, the influence of viral infection on these organisms remains obscure (Van Etten et al. 2002).

Viruses are responsible for the demise of multicellular organisms as well. Ostreid herpesvirus 1 (OsHV1) infects several species of bivalves and appears to be responsible for sudden die-offs in populations of cultured abalone and juvenile Pacific oysters (Friedman et al. 2005). The lethal disease white-spot syndrome is caused by WSSV (white-spot syndrome virus). Epidemics of WSSV have decimated shrimp populations in Asia and prompted a worldwide effort to contain and eradicate the virus (Flegel 2006).

It has been hypothesized that viruses sustain diversity in host populations by culling the most abundant populations of successful competitors and therefore making niche space available for less competitive species to occupy (Fuhrman & Suttle 1993). Suttle (1992) observed shifts in the composition of a primary producer community with the addition of a natural virus community. The close coupling of changes in the population structure of virus, phytoplankton and bacterial communities suggests a relationship between viral activity and host diversity (Larsen et al. 2001). An indirect effect of virus-induced mortality on host diversity was observed by Van Hannen et al. (1999), who remarked that the termination of a bloom of cyanobacteria due to viral lysis corresponded with a dramatic shift in the heterotrophic bacterial community composition, possibly due to the pulse of available organic material liberated by viral activity.

It is likely that virus-mediated gene transfer, via transduction and/or transformation, is an important mechanism of host evolution in the marine environment. Weinbauer et al. (2003) estimated that on average 35% of marine prokaryotes from the Mediterranean Sea harboured a viable, inducible prophage. Additional averaged estimates of percent lysogeny in the marine prokaryotic community range from 3 to 30% (Weinbauer 2004). Jiang and Paul (1998) calculated that 1.3 x 10^14 transduction events per year occurred between phages and host communities in the Tampa Bay estuary. Moreover, there are data indicating that transduction can occur between marine bacteria of different genera (Chiura 1997). Analyses of a rapidly increasing number of microbial genome sequences suggest that gene exchange between virus and host is a common occurrence. For example, cyanophages contain homologs of essential components of the photosynthetic apparatus of Synechococcus (Mann et al. 2003) and Prochlorococcus (Sullivan et al. 2006). Based on sequence analysis, Zeidner et al. (2005) concluded that exchange of these genes between virus and host has occurred on numerous occasions. In an analysis of PBCV (Paramecium bursaria chlorella virus), the type species for the Phycodnaviridae, a family of large, double-stranded (ds) DNA viruses common in the marine environment, Iyer et al. (2006) identified 46 and 40 genes of eukaryotic and prokaryotic origin, respectively. There are no data evaluating the importance of transformation in the marine environment. Nevertheless, Jiang and Paul (1995) calculated that viral lysis was responsible for up to 37% of dissolved DNA in the water column. Whether this pool is available for uptake and integration by microbes is unknown.

An essential step in understanding the global cycling of nutrients such as carbon is the elucidation of nutrient cycles in the ocean.
Microbes are uniquely equipped to convert nutrients from one form to another and therefore play a vital role in marine biogeochemical cycles (DeLong & Karl 2005). Because viruses infect marine microbes, it is assumed that viruses play a role in the cycling of nutrients in the ocean. Wilhelm and Suttle (1999) postulated that from 6 to 26% of carbon resulting from photosynthesis is diverted back into a dissolved form due to viral lysis and that the overall effect of viral activity is to augment the rate of movement of nutrients from particulate organic matter to dissolved organic matter, diverting nutrients from higher trophic levels back into the microbial fraction (Suttle 2005). Gobler et al. (1997) observed a detectable increase in bioavailable nutrients during the termination of a phytoplankton bloom due to viral lysis. Moreover, products of viral lysis were shown to be responsible for increases in bacterial production (Middelboe 2000). In oligotrophic environments where infusions of allochthonous nutrients are infrequent, viral lysis may represent a significant pathway for phosphate recycling (Middelboe et al. 1996). However, Stoderegger & Herndl (1998) found that the fragments of bacterial cells that are a result of viral lysis are largely unavailable for incorporation by the microbial community. The virioplankton itself represents the second largest pool of carbon in the ocean, orders of magnitude greater than marine protists (Weinbauer 2004). Direct consumption of viral particles has been documented (Gonzalez & Suttle 1993); however, Wilhelm & Suttle (1999) estimated that only 1% of the viral community is removed due to grazing. The products of viral lysis form large colloids (Shibata et al. 1997) that may increase the rate at which organic matter is advected out of the photic zone, effectively reducing the amount of nutrients available to primary producers.

The study of marine viruses is justified based on the indisputable fact that viruses infect marine organisms and must therefore have some influence on the marine ecosystem. As discussed above, viral infection can have direct and indirect effects on marine organisms that result in changes in microbial food webs, population structure and biogeochemical cycling. The continued characterization of virus communities and isolates on a molecular level may lead to new insights into virology, including virus evolution. Before our understanding of marine viruses on a community level can improve, however, we must better characterize the individual viruses that comprise the virioplankton. For example, we lack data on fundamental areas of research such as infection, reproduction and extra- and intracellular persistence.

1.1.2 Most marine viruses are bacteriophages

Viruses are obligate parasites and therefore the composition of the virus community generally reflects the composition of the host community (this is of course a generalization as, among other factors, community composition is affected by virus burst size and decay rate). In the marine environment, prokaryotes are on average an order of magnitude more abundant than eukaryotes and thus the majority of marine viruses are believed to be phages (Cochlan et al. 1993). Several independent lines of evidence support this contention. In a wide range of marine environments, the greatest spatial and temporal covariance of viral abundance is with prokaryote abundance and prokaryote activity, suggesting that prokaryotes are the hosts of most marine viruses (Wommack & Colwell 2000).
It should be noted that these data do not discriminate between bacteria and archaea. Whole genome shotgun libraries of marine coastal DNA virus communities demonstrated that, of the environmental sequences with homologues in the NCBI database, up to 90% were most similar to bacteriophages (Edwards & Rohwer 2005). Nevertheless, a majority of sequences were unidentifiable and thus the composition of the host community remains uncertain. The majority of marine DNA virus genomes are between 30 and 60 kbp in size, a size range characteristic of bacteriophages (Steward et al. 2000). To this point, the genome size range of marine RNA viruses remains unknown.

1.1.3 A majority of marine viruses have DNA genomes

Most marine viruses are believed to have DNA genomes. Of the greater than 5000 bacteriophages isolated, 96% are tailed and have dsDNA genomes (Ackerman 2000). Moreover, a majority of marine phage isolates belong to the Myoviridae, a family of viruses with contractile tails and dsDNA genomes. Although none infect marine hosts, all archaeal phage isolates to date have linear or circular dsDNA genomes (Prangishvili et al. 2006). Estimates of the proportion of tailed phages (and thus viruses assumed to have dsDNA genomes) from in situ viral communities based on transmission electron microscopy (TEM) range from less than 50% (Wommack et al. 1992) to approximately 90% (Demuth et al. 1993), although it is likely that these results underestimate the proportion of tailed phages due to improper sample preparation and staining (Weinbauer 2004).

1.1.4 Marine RNA bacteriophages

Of the hundreds of marine bacteriophages that have been characterized (Børsheim 1992), only two have RNA genomes (Lewin 1963, Hidaka 1971).
Lewin (1963) described tailed, rod-like particles with RNA genomes approximately 200 nm in length that lysed isolates of the marine flexibacterium Saprospira grandis. Subsequently, Hidaka (1971) isolated and characterized (Hidaka & Ichida 1976) a single-stranded (ss) RNA virus named 06N-58P from coastal Japanese waters that infected a marine strain of Pseudomonas. 06N-58P has a narrow host range, capable of infecting only one of several Pseudomonas strains challenged (Hidaka & Ichida 1976). The viral particle has an envelope, is icosahedral in shape and approximately 60 nm in diameter.

Viruses with DNA genomes also dominate classified groups of viruses that infect prokaryotes. Of the 38 established genera of bacteriophages, 3 include viruses with RNA genomes. These genera fall into two families, the Cystoviridae and the Leviviridae, neither of which includes viruses with marine hosts (http://www.ncbi.nlm.nih.gov/ICTVdb/Ictv/index.htm). Viruses in the Cystoviridae have a segmented dsRNA genome and infect several phytopathogenic (pathogenic to plants) species of Pseudomonas (Mertens 2004). Virions of the Leviviridae are small (~25 nm in diameter) icosahedrons that encapsulate a positive ssRNA genome (Bollback & Huelsenbeck 2004). The hosts of viruses in the Leviviridae appear to be restricted to gram-negative bacteria associated with sewage (Bollback & Huelsenbeck 2004).

A comparison of epifluorescence counts of total viruses using a universal nucleic acid stain (YOPRO-1) and one specific to dsDNA (DAPI) suggests that ~90% of the community have dsDNA genomes (Weinbauer et al. 1997).
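The stain comparison above reduces to a ratio of two particle counts. The following is a minimal sketch of that arithmetic, not the procedure used by Weinbauer et al. (1997): the counts, the function names and the detection-efficiency parameter are hypothetical and chosen only for illustration. The second function shows how the apparent dsDNA fraction would shift if the universal stain revealed only part of the non-dsDNA particles.

```python
def dsdna_fraction(dapi_count: float, yopro_count: float) -> float:
    """Apparent dsDNA fraction: dsDNA-specific count over total count."""
    if yopro_count <= 0:
        raise ValueError("total (universal stain) count must be positive")
    return dapi_count / yopro_count


def corrected_dsdna_fraction(dapi_count: float, yopro_count: float,
                             other_detection_efficiency: float) -> float:
    """dsDNA fraction after correcting for under-detected non-dsDNA particles.

    other_detection_efficiency is a hypothetical parameter (0 < e <= 1):
    the share of non-dsDNA particles the universal stain actually reveals.
    """
    if not 0 < other_detection_efficiency <= 1:
        raise ValueError("detection efficiency must be in (0, 1]")
    observed_other = yopro_count - dapi_count          # particles seen only by the universal stain
    true_other = observed_other / other_detection_efficiency
    return dapi_count / (dapi_count + true_other)


if __name__ == "__main__":
    # Hypothetical counts in particles per litre (tens of billions per litre overall).
    dapi, yopro = 9e9, 1e10
    print(f"apparent dsDNA fraction:  {dsdna_fraction(dapi, yopro):.2f}")
    print(f"corrected (e = 0.5):      {corrected_dsdna_fraction(dapi, yopro, 0.5):.2f}")
```

With these illustrative inputs the apparent fraction is 0.90, and halving the assumed detection efficiency for other particles lowers it to about 0.82.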
However, YOPRO-1 appears to stain RNA viruses weakly (author, unpublished data), which would result in an underestimate of the contribution of RNA viruses to total virus abundance.

These data have led to the assumption that RNA phages comprise an insignificant fraction of the marine bacteriophage community (Steward 1992, Weinbauer and Suttle 1997), although this supposition has not been directly tested, and the data leave room for uncertainty. For example, it is now well established that the microbes in culture do not reflect the immense diversity of the natural marine prokaryote community (Rappe & Giovannoni 2003). Because the isolation of a virus is dependent on the availability of a host, estimates of viral diversity are limited by host cultivability; thus, the characteristics of marine viral isolates may not accurately reflect the characteristics of the natural community. Additionally, 06N-58P, the marine RNA phage characterized by Hidaka and Ichida (1976), is very unstable, being susceptible to modest changes in temperature, pH and salinity. If these are general characteristics of marine RNA phages, they may not survive standard marine virus collection techniques.

1.1.5 Virus taxonomy

The nature of viruses and viral evolution preclude a classification scheme modeled on classical Linnaean taxonomy (Condit 2001). Specifically, viruses likely have multiple origins and, therefore, a single common ancestor does not exist. Recombination and reassortment in viruses occur frequently, resulting in viruses with polyphyletic genomes. Temperate viruses are subject to radically different evolutionary pressures depending on whether they are in an integrated or lytic phase of reproduction (Ball 2004).
In order to accommodate these characteristics (among others), virus taxonomy is based on a nonsystematic, polythetic, hierarchical system in which viruses are classified by comparing a collection of equivalent properties, where the set of properties can change in different taxonomic branches (Condit 2001). Viruses are generally classified by virion morphology (e.g. capsid symmetry and size), virion physical properties (e.g. genome structure and antigenic properties), biological properties (e.g. replication strategy and pathogenicity) and, most recently, genetic analyses (Condit 2001). In some cases, phylogenies of groups of viruses based on single genes are congruent with established taxa. For example, the RNA-dependent RNA polymerase (RdRp) is one of the few proteins conserved among almost all RNA viruses with the exception of retroviruses (Koonin 1991). Phylogenies based on the RdRp are routinely used to group RNA viruses into species, genera and families, although groupings higher than the family level are unreliable (Zanotto et al. 1996).

1.1.6 Relevant molecular methods in marine virology

Marine virologists have used specific primers and the polymerase chain reaction (PCR) to target evolutionarily informative genes in order to explore the richness of a variety of virus groups in the ocean. In this approach, a fragment of the target molecule is amplified from a community of extracted viral nucleic acids by PCR. The diversity of amplicons in this reaction can be assessed by several methods, including cloning of amplicons and sequencing (Cottrell & Suttle 1995), cloning followed by a comparison of insert restriction enzyme digestion patterns (Chen et al.
1996), separation of products on a denaturing gel and comparison of community fingerprints (Short & Suttle 2002) and endonuclease digestion of fluorescently end-labelled PCR products followed by the generation of community profiles on an automated sequence analyzer (Wang & Chen 2004).

A PCR-based approach was first used to investigate the diversity of viruses in the family Phycodnaviridae, a group of large dsDNA viruses that infect algae (Chen et al. 1996). Targeting the viral DNA polymerase, this research revealed a vast amount of genetic variation that was not represented in cultures and showed that very similar sequences were distributed on a global scale (Short & Suttle 2002). A subsequent temporal study in British Columbia showed that the algal virus community is remarkably stable, even while the host community is undergoing dramatic shifts in composition (Short & Suttle 2003). Schroeder et al. (2003) used primers targeting a capsid gene and PCR to track the dynamics of Emiliania huxleyi viruses (EhV) during the termination of an Emiliania huxleyi bloom. Fuller et al. (1998) developed a PCR-based approach targeting g20, a capsid gene conserved in a subset of myoviruses. Investigations based on this method revealed the incredible diversity present in myovirus communities (Zhong et al. 2002). Yet, despite this diversity, nearly identical sequences were recovered from virus communities ranging from Arctic waters to freshwater catfish ponds (Short & Suttle 2005). Similarly, genetically indistinguishable podovirus sequences have been found in a wide range of environments (Breitbart et al. 2004a).

The chimeric nature of viral genomes and the multiple ancestries of their constituent genes not only complicate the classification of viruses (Lawrence et al.
2002), but also limit the utility of a single-gene approach to investigating environmental virus diversity. It is therefore essential that the targeted molecular marker reflects a meaningful biological relationship. Furthermore, a single-gene approach to characterizing diversity may not be a viable option with some groups of viruses (Hendrix et al. 1999). However, viral genes that constitute a "core genome" and are resistant to lateral transfer may represent attractive targets (Jain et al. 1999).

As well as targeting molecular markers, marine virologists have used PCR to construct whole-genome shotgun (WGS) libraries of natural virus communities. Breitbart et al. (2002) used a linker-amplified shotgun library (LASL) approach to construct a metagenomic library of two coastal DNA phage communities. In the LASL method, 200 L of seawater were pre-filtered and concentrated using tangential flow filtration. The phage fraction was purified from a cesium chloride gradient and the viral DNA extracted and sheared. After end-repairing the sequence fragments, linkers were added and PCR conducted with primers targeting sites specific to these linkers. Amplicons were subsequently cloned and sequenced (Breitbart et al. 2002). The LASL approach effectively overcame obstacles particular to environmental phage communities, including low concentration of nucleic acids per viral genome (sub-femtogram), modified viral DNA and genes lethal to transformed cells during cloning (Edwards & Rohwer 2005). Subsequently, the LASL method was used to examine phage communities from marine sediment (Breitbart et al. 2004b) and human (Breitbart et al. 2003) and equine (Cann et al. 2005) feces. Analysis of these libraries demonstrated that most of the sequence fragments are novel.
The results of a model based on the observed overlap of sequence fragments estimated that the number of different viral genotypes ranged from approximately 1000 in the fecal communities (Breitbart et al. 2003, Cann et al. 2005) to one million in the marine sediment (Breitbart et al. 2004b), suggesting that phage communities are some of the most diverse on the planet (Edwards & Rohwer 2005).

1.1.7 Introduction to RNA virology

The genetic material of RNA viruses can be single-stranded or double-stranded molecules of ribonucleic acid. Single-stranded RNA viruses are further classified by the polarity of their genomes. During replication, the genome of a positive-sense RNA virus is translated directly, while the genome of a negative-sense RNA virus is first converted to positive-sense RNA by an RNA polymerase (Prescott et al. 1993). RNA virus genomes can occur in segments where each segment encodes one protein, or in a single molecule that is translated into a polyprotein from the entire genome (Roizman & Palese 1996). Although some RNA virions (complete viral particles) have other constituents such as an envelope, all RNA viruses have a nucleocapsid core composed of RNA surrounded by a protein shell called a capsid. Capsid morphology is generally icosahedral or helical; however, there are many exceptions (Prescott et al. 1993). Of the ~3600 virus species characterized, ssRNA viruses are the most diverse, followed by dsDNA, dsRNA and ssDNA viruses, respectively. However, these data are greatly influenced by the focus of virology on pathogens of humans and economically important organisms (Villareal 2005).

1.1.8 Marine RNA viruses

RNA viruses of every classification that infect a diversity of host organisms have been isolated from the ocean.
For example, marine birnavirus (MABV) has been isolated from a variety of bivalves from Japanese waters (Suzuki & Nojima 1999) and has been responsible for significant losses in populations of cultivated pearl oysters (Kitamura et al. 2001). Marine RNA viruses have also been associated with several species of crustaceans. For example, positive-sense, single-stranded RNA viruses have been found in species of penaeid shrimp (Mari et al. 2002, Sritunyalucksana et al. 2006), including Taura syndrome virus (TSV). Infection by this virus can be fatal to the host organism, and the virus has several variants that are widespread in North American waters (Erickson et al. 2005). Moreover, dsRNA viruses have been observed in crustaceans, including several species of crab (Pappalardo et al. 1986, Zhang et al. 2004). The negative-sense ssRNA rhabdoviruses and paramyxoviruses are major pathogens of fish (Hoffmann et al. 2005). Viral haemorrhagic septicaemia virus (VHSV) and infectious haematopoietic necrosis virus (IHNV) are negative-sense ssRNA viruses that cause disease in trout and salmon. Double-stranded RNA reoviruses infect fish including Chinook, Chum and Coho Salmon, Striped Bass, and Turbot (Mertens 2004), and retroviral sequences have been amplified from sharks and pufferfish (Herniou et al. 1998). The positive-sense ssRNA betanodaviruses can cause encephalopathy (alterations in brain function) in multiple species of wild and farmed fish (Gomez et al. 2004).

RNA virus infection is also common in marine mammals. Phocine distemper virus (PDV) is a negative-sense ssRNA morbillivirus that has decimated European seal populations over the past two decades (Barrett et al. 2003).
Cetacean morbillivirus (CMV) and its variants have been isolated from whales, dolphins and porpoises (Rima et al. 2005), and CMV pathology is frequently present in stranded animals (Taubenberger et al. 2000). RNA viruses such as caliciviruses (Smith 2000) and influenza viruses (Van Bressem et al. 1999) are found in many marine mammal populations, although infections tend not to be lethal.

Although there are numerous examples of RNA viruses from the ocean, most known ones infect marine animals, which make up a small fraction of the living biomass in the ocean and which are unlikely to be the major hosts for the natural RNA virioplankton.

1.1.9 RNA viruses that infect marine protists

Assuming that marine RNA phages are rare (see section 1.1.4), the most likely major hosts of RNA viruses in the ocean are the diverse, abundant and ecologically and economically important marine protists. The first RNA virus reported to lyse a eukaryotic phytoplankter is HaRNAV (Heterosigma akashiwo RNA virus), a ssRNA virus that infects the unicellular photosynthetic marine flagellate Heterosigma akashiwo (Tai et al. 2003). The population dynamics of H. akashiwo are of special economic interest as blooms of this alga are responsible for extensive fish kills worldwide, affecting the aquaculture industry in particular (Smayda 1998). HaRNAV was isolated from the southern Strait of Georgia, British Columbia. It has a non-enveloped, icosahedral capsid that is 25 nm in diameter and composed of five major proteins, and it contains a genome of ~9 kbp (Tai et al. 2003).
Characteristics of HaRNAV infection include the presence of viral crystalline arrays, swelling of the endoplasmic reticulum and vacuolation of the cytoplasm, pathology that is similar to that of established groups of positive-sense ssRNA viruses (Tai et al. 2003). HaRNAV's host range is restricted to H. akashiwo strains isolated from the North East Pacific (Tai et al. 2003), complementing other research showing that most marine viruses are strain-specific (Tomaru et al. 2004). The characterization of the genome sequence of HaRNAV can be found in Chapter II.

RNA viruses are known to infect and lyse ecologically important marine organisms including a diatom (Nagasaki et al. 2004a) and a dinoflagellate (Tomaru et al. 2004). The first virus isolate shown to infect a diatom is the positive-sense ssRNA virus RsRNAV, which infects Rhizosolenia setigera (Nagasaki et al. 2004a), a diatom common in temperate coastal waters and a recurring member of the fall and spring blooms (Graham & Wilcox 2000). Diatoms are globally distributed and incredibly diverse, encompassing approximately 12,000 recognized species (Graham & Wilcox 2000). They are an integral part of biogeochemical cycling in the ocean and typically dominate the phytoplankton in nutrient-rich waters (Graham & Wilcox 2000). RsRNAV has an icosahedral, non-enveloped capsid (32 nm in diameter). The positive-sense 8.9 kbp genome has a polyadenylate [poly(A)] tail and is polycistronic, encoding three major structural proteins (Nagasaki et al. 2004a). The latent period of RsRNAV infection is two days, culminating in lysis, at which point 1000 to 3000 virions are released (Nagasaki et al. 2004a). The impact of RsRNAV on R.
setigera populations is unknown; however, the existence of RsRNAV indicates that RNA viruses are agents of mortality of marine diatoms and hence likely affect diatom population dynamics.

Heterocapsa circularisquama RNA virus (HcRNAV) is a non-enveloped, positive-sense ssRNA virus approximately 30 nm in diameter that infects Heterocapsa circularisquama (Tomaru et al. 2004), a harmful bloom-forming dinoflagellate responsible for the mass mortality of shellfish in Japanese waters (Matsuyama et al. 1999). Dinoflagellates are a diverse group of protists that include autotrophs, mixotrophs, osmotrophs (i.e. they are capable of the direct uptake of dissolved organic compounds) and parasites. They are found in most aquatic habitats and are vital primary producers, primarily in coastal waters (Graham & Wilcox 2000). The HcRNAV genome is 4.4 kbp, lacks a poly(A) tail and contains two open reading frames (ORFs). ORF1 is proximal to the 5' end and has identifiable protease and RdRp motifs, and ORF2 encodes a single structural protein estimated to be 38 kDa in size (Nagasaki et al. 2005). Phylogenetic analysis based on translated RdRp alignments suggests HcRNAV is related to viruses from the Luteoviridae, Barnaviridae and Tetraviridae, but falls outside these families (Nagasaki et al. 2005). In an additional study, Nagasaki et al. (2004b) demonstrated that during the peak of a bloom in Ago Bay, Japan, a remarkable 88% of H. circularisquama cells contained virus-like particles resembling HcRNAV, which suggests that HcRNAV plays a significant role in H. circularisquama bloom termination.

In addition to infecting ecologically important eukaryotic phytoplankton, RNA viruses infect heterotrophic protists.
SssRNAV (Schizochytrium single-stranded RNA virus) is a ssRNA virus that infects the thraustochytrid Schizochytrium sp. (Takao et al. 2005). Thraustochytrids are osmotrophic marine fungoid protists found in a wide range of aquatic habitats, where they serve as important decomposers (Kimura et al. 1999). The genome of SssRNAV is positive-sense, 9018 bp in length and has a poly(A) tail. The viral genome codes for two putative polyproteins: a non-structural polyprotein proximal to the 5' end that includes helicase, protease and RdRp domains, and a structural polyprotein that encodes three major and two minor structural proteins (Takao et al. 2006). Although SssRNAV has a similar genome organization to viruses in the family Dicistroviridae, subgenomic RNAs are present during SssRNAV replication but not during dicistrovirus replication. Moreover, phylogenetic analysis strongly supports the placement of SssRNAV outside established families of RNA viruses (Takao et al. 2006).

The isolation of Micromonas pusilla reovirus (MpRV) demonstrated that dsRNA viruses are capable of infecting a marine primary producer as well. Micromonas pusilla is a flagellated marine phytoplankter identified as the most abundant picoeukaryote (< 2 µm) in oceanic and coastal regions (Not et al. 2004). The MpRV virion is 75 nm in diameter, contains five major proteins (Brussaard et al. 2004) and does not possess the projections (i.e. 'turrets') characteristic of some viral genera in the family Reoviridae (Attoui et al. 2006). The genome is composed of eleven segments of dsRNA that total ~25.5 kbp. Putative cell attachment, capsid and non-structural proteins, including a polymerase, were identified based on significant similarity to sequences in the NCBI database (Attoui et al. 2006).
Like MpRV, some rotavirus and aquareovirus genomes have eleven segments; however, phylogenies generated from alignments of the MpRV polymerase with representative viruses of the eleven genera of the family Reoviridae, as well as several unique genomic features including an unusually long segment 1 and novel 5' and 3' terminal sequences, suggest that MpRV belongs in a new genus (Attoui et al. 2006). Table 1.1 provides a synopsis of the newly discovered RNA viruses that infect marine protists.

1.2 Thesis theme

The information presented above suggests that RNA viruses are more abundant, diverse and ecologically important in the sea than has generally been assumed; however, this hypothesis has never been tested. The overall theme of this dissertation is therefore the characterization of the natural RNA virioplankton in the marine environment. Toward this end, the second chapter is an analysis of the complete genomic sequence of HaRNAV, the first positive-sense ssRNA virus isolated that infects a marine protist. The presence of a persistent and widespread marine RNA virus prompted an investigation of the marine RNA virus community, a component of the marine environment almost completely uncharacterized. Thus, the third chapter discusses the results of research assessing the diversity of picorna-like viruses, a group of positive-sense ssRNA viruses with similar genome features and sharing conserved regions in the RdRp, from marine virus communities. In chapter four, randomly reverse-transcribed whole-genome shotgun sequencing is used to characterize the diversity of two complete marine RNA virus assemblages. These virus communities were heavily dominated by different genotypes with small genome sizes, allowing the complete assembly of the genomes from three previously unknown viruses.
The complete genome sequences of these viruses are analyzed in chapter five. The concluding chapter summarizes the findings of the dissertation, examines methodological bias and discusses the general importance and implications of this research.

1.3 Table

Table 1.1 RNA viruses that infect marine protists

Virus                                   Acronym   Host                 Genome   Size (kbp)   Reference
Heterocapsa circularisquama RNA         HcRNAV    H. circularisquama   + ss     4.4          Tomaru et al. 2004
Heterosigma akashiwo RNA                HaRNAV    H. akashiwo          + ss     8.6          Tai et al. 2003
Micromonas pusilla reo-                 MpRV      M. pusilla           ds       25.5         Brussaard et al. 2004
Rhizosolenia setigera RNA               RsRNAV    R. setigera          + ss     8.9          Nagasaki et al. 2004a
Schizochytrium single-stranded RNA      SssRNAV   Schizochytrium sp.   + ss     9.0          Takao et al. 2005

1.4 References

Ackermann, H. W. 2001. Frequency of morphological phage descriptions in the year 2000. Archives of Virology 146: 843-857.
Attoui, H., F. M. Jaafar, M. Belhouchet, P. De Micco, X. De Lamballerie, and C. P. D. Brussaard. 2006. Micromonas pusilla reovirus: a new member of the family Reoviridae assigned to a novel proposed genus (Mimoreovirus). Journal of General Virology 87: 1375-1383.
Ball, L. A. 2004. Introduction to universal virus taxonomy, p. 3-9. In C. M. Fauquet, M. A. Mayo, J. Maniloff, U. Desselberger and L. A. Ball [eds.], Virus Taxonomy: VIIIth Report of the International Committee on Taxonomy of Viruses. Academic Press.
Barrett, T., P. Sahoo, and P. D. Jepson. 2003. Seal distemper outbreak 2002. Microbiology Today 30: 162-164.
Borsheim, K. Y. 1993. Native marine bacteriophages. FEMS Microbiology Letters 102: 141-159.
Bracht, A. J., R. L. Brudek, R. Y. Ewing, C. A. Manire, K. A. Burek, C. Rosa, K. B. Beckmen, J. E. Maruniak, and C. H. Romero. 2006.
Genetic identification of novel poxviruses of cetaceans and pinnipeds. Archives of Virology 151: 423-438.
Bratbak, G., J. K. Egge, and M. Heldal. 1993. Viral mortality of the marine alga Emiliania huxleyi (Haptophyceae) and termination of algal blooms. Marine Ecology-Progress Series 93: 39-48.
Breitbart, M., J. H. Miyake, and F. Rohwer. 2004a. Global distribution of nearly identical phage-encoded DNA sequences. FEMS Microbiology Letters 236: 249-256.
Breitbart, M., B. Felts, S. Kelley, J. M. Mahaffy, J. Nulton, P. Salamon, and F. Rohwer. 2004b. Diversity and population structure of a near-shore marine-sediment viral community. Proceedings of the Royal Society of London Series B-Biological Sciences 271: 565-574.
Breitbart, M., I. Hewson, B. Felts, J. M. Mahaffy, J. Nulton, P. Salamon, and F. Rohwer. 2003. Metagenomic analyses of an uncultured viral community from human feces. Journal of Bacteriology 185: 6220-6223.
Breitbart, M., P. Salamon, B. Andresen, J. M. Mahaffy, A. M. Segall, D. Mead, F. Azam, and F. Rohwer. 2002. Genomic analysis of uncultured marine viral communities. Proceedings of the National Academy of Sciences of the United States of America 99: 14250-14255.
Brussaard, C. P. D., A. A. M. Noordeloos, R. A. Sandaa, M. Heldal, and G. Bratbak. 2004. Discovery of a dsRNA virus infecting the marine photosynthetic protist Micromonas pusilla. Virology 319: 280-291.
Cann, A. J., S. E. Fandrich, and S. Heaphy. 2005. Analysis of the virus population present in equine feces indicates the presence of hundreds of uncharacterized virus genomes. Virus Genes 30: 151-156.
Chen, F., C. A. Suttle, and S. M. Short. 1996.
Genetic diversity in marine algal virus communities as revealed by sequence analysis of DNA polymerase genes. Applied and Environmental Microbiology 62: 2869-2874.
Chiura, H. X. 1997. Generalized gene transfer by virus-like particles from marine bacteria. Aquatic Microbial Ecology 13: 75-83.
Cochlan, W. P., J. Wikner, G. F. Steward, D. C. Smith, and F. Azam. 1993. Spatial-distribution of viruses, bacteria and chlorophyll-a in neritic, oceanic and estuarine environments. Marine Ecology-Progress Series 92: 77-87.
Condit, R. C. 2001. Principles of virology, p. 19-51. In D. M. Knipe, P. M. Howley, D. E. Griffin, R. A. Lamb, M. A. Martin, B. Roizman and S. E. Straus [eds.], Fields Virology. Lippincott Williams & Wilkins.
Cottrell, M. T., and C. A. Suttle. 1995. Genetic diversity of algal viruses which lyse the photosynthetic picoflagellate Micromonas pusilla (Prasinophyceae). Applied and Environmental Microbiology 61: 3088-3091.
DeLong, E. F., and D. M. Karl. 2005. Genomic perspectives in microbial oceanography. Nature 437: 336-342.
Demuth, J., H. Neve, and K. P. Witzel. 1993. Direct electron microscopy study on the morphological diversity of bacteriophage populations in Lake Plußsee. Applied and Environmental Microbiology 59: 3378-3384.
Dunigan, D. D., L. A. Fitzgerald, and J. L. Van Etten. 2006. Phycodnaviruses: A peek at genetic diversity. Virus Research 117: 119-132.
Edwards, R. A., and F. Rohwer. 2005. Viral metagenomics. Nature Reviews Microbiology 3: 504-510.
Erickson, H. S., B. T. Poulos, K. F. J. Tang, D. Bradley-Dunlop, and D. V. Lightner.
2005. Taura syndrome virus from Belize represents a unique variant. Diseases of Aquatic Organisms 64: 91-98.
Filippini, M., N. Buesing, Y. Bettarel, T. Sime-Ngando, and M. O. Gessner. 2006. Infection paradox: High abundance but low impact of freshwater benthic viruses. Applied and Environmental Microbiology 72: 4893-4898.
Fischer, U. R., C. Wieltschnig, A. K. T. Kirschner, and B. Velimirov. 2006. Contribution of virus-induced lysis and protozoan grazing to benthic bacterial mortality estimated simultaneously in microcosms. Environmental Microbiology 8: 1394-1407.
Flegel, T. W. 2006. Detection of major penaeid shrimp viruses in Asia, a historical perspective with emphasis on Thailand. Aquaculture 258: 1-33.
Friedman, C. S., R. M. Estes, N. A. Stokes, C. A. Burge, J. S. Hargove, B. J. Barber, R. A. Elston, E. M. Burreson, and K. S. Reece. 2005. Herpes virus in juvenile Pacific oysters Crassostrea gigas from Tomales Bay, California, coincides with summer mortality episodes. Diseases of Aquatic Organisms 63: 33-41.
Fuhrman, J. A., and C. A. Suttle. 1993. Viruses in marine planktonic systems. Oceanography 6: 51-63.
Fuhrman, J. A., and R. T. Noble. 1995. Viruses and protists cause similar bacterial mortality in coastal seawater. Limnology and Oceanography 40: 1236-1242.
Fuller, N. J., W. H. Wilson, I. R. Joint, and N. H. Mann. 1998. Occurrence of a sequence in marine cyanophages similar to that of T4 g20 and its application to PCR-based detection and quantification techniques. Applied and Environmental Microbiology 64: 2051-2060.
Gobler, C. J., D. A. Hutchins, N. S. Fisher, E. M. Cosper, and S. Sanudo-Wilhelmy. 1997.
Release and bioavailability of C, N, P, Se, and Fe following viral lysis of a marine Chrysophyte. Limnology and Oceanography 42: 1492-1504.
Gomez, D. K., J. Sato, K. Mushiake, T. Isshiki, Y. Okinaka, and T. Nakai. 2004. PCR-based detection of betanodaviruses from cultured and wild marine fish with no clinical signs. Journal of Fish Diseases 27: 603-608.
Gonzalez, J. M., and C. A. Suttle. 1993. Grazing by marine nanoflagellates on viruses and virus-sized particles: Ingestion and digestion. Marine Ecology-Progress Series 94: 1-10.
Graham, L. E., and L. W. Wilcox. 2000. Algae. Prentice-Hall, Inc.
Hendrix, R. W., M. C. M. Smith, R. N. Burns, M. E. Ford, and G. F. Hatfull. 1999. Evolutionary relationships among diverse bacteriophages and prophages: All the world's a phage. Proceedings of the National Academy of Sciences of the United States of America 96: 2192-2197.
Hennes, K., and M. Simon. 1995. Significance of bacteriophages for controlling bacterioplankton growth in a Mesotrophic Lake. Applied and Environmental Microbiology 61: 333-340.
Herniou, E., J. Martin, K. Miller, J. Cook, M. Wilkinson, and M. Tristem. 1998. Retroviral diversity and distribution in vertebrates. Journal of Virology 72: 5955-5966.
Hidaka, T., and K. Ichida. 1976. Properties of a marine RNA-containing bacteriophage. Memoirs of the Faculty of Fisheries, Kagoshima University 25: 77-89.
Hidaka, T. 1971. Isolation of marine bacteriophages from seawater. Bulletin of the Japanese Society of Scientific Fisheries 37: 1199-1206.
Hoffmann, B., M. Beer, H. Schutze, and T. C. Mettenleiter. 2005.
Fish rhabdoviruses: Molecular epidemiology and evolution. Current Topics in Microbiology and Immunology 292: 81-117.
Iyer, L. A., S. Balaji, E. V. Koonin, and L. Aravind. 2006. Evolutionary genomics of nucleo-cytoplasmic large DNA viruses. Virus Research 117: 156-184.
Jain, R., M. C. Rivera, and J. A. Lake. 1999. Horizontal gene transfer among genomes: The complexity hypothesis. Proceedings of the National Academy of Sciences of the United States of America 96: 3801-3806.
Jiang, S. C., and J. H. Paul. 1998. Gene transfer by transduction in the marine environment. Applied and Environmental Microbiology 64: 2780-2787.
Jiang, S. C., and J. H. Paul. 1995. Viral contribution to dissolved DNA in the marine-environment as determined by differential centrifugation and kingdom probing. Applied and Environmental Microbiology 61: 317-325.
Kimura, H., T. Fukuba, and T. Naganuma. 1999. Biomass of thraustochytrid protoctists in coastal water. Marine Ecology-Progress Series 189: 27-33.
Kitamura, S. I., S. J. Jung, and S. Suzuki. 2000. Seasonal change of infective state of marine birnavirus in Japanese pearl oyster Pinctada fucata. Archives of Virology 145: 2003-2014.
Koonin, E. V. 1991. The phylogeny of RNA-dependent RNA polymerases of positive-strand RNA viruses. Journal of General Virology 72: 2197-2206.
Larsen, A., T. Castberg, R. A. Sandaa, C. P. D. Brussaard, J. Egge, M. Heldal, A. Paulino, R. Thyrhaug, E. J. Van Hannen, and G. Bratbak. 2001. Population dynamics and diversity of phytoplankton, bacteria and viruses in a seawater enclosure. Marine Ecology-Progress Series 221: 47-57.
Lawrence, J. G., G. F. Hatfull, and R. W.
Hendrix. 2002. Imbroglios of viral taxonomy: Genetic exchange and failings of phenetic approaches. Journal of Bacteriology 184: 4891-4905.
Lewin, R. A. 1963. Rod-shaped particles in Saprospira. Nature 198: 103-104.
Mann, N. H., A. Cook, A. Millard, S. Bailey, and M. Clokie. 2003. Marine ecosystems: bacterial photosynthesis genes in a virus. Nature 424: 741.
Mari, J., B. T. Poulos, D. V. Lightner, and J. R. Bonami. 2002. Shrimp Taura syndrome virus: genomic characterization and similarity with members of the genus Cricket paralysis-like viruses. Journal of General Virology 83: 915-926.
Matsuyama, Y., T. Uchida, and T. Honjo. 1999. Effects of harmful dinoflagellates, Gymnodinium mikimotoi and Heterocapsa circularisquama, red-tide on filtering rate of bivalve molluscs. Fisheries Science 65: 248-253.
Mertens, P. 2004. The dsRNA viruses. Virus Research 101: 3-13.
Middelboe, M. 2000. Bacterial growth rate and marine virus-host dynamics. Microbial Ecology 40: 114-124.
Middelboe, M., N. O. G. Jorgensen, and N. Kroer. 1996. Effects of viruses on nutrient turnover and growth efficiency of non-infected marine bacterioplankton. Applied and Environmental Microbiology 62: 1991-1997.
Munn, C. B. 2006. Viruses as pathogens of marine organisms - from bacteria to whales. Journal of the Marine Biological Association of the United Kingdom 86: 453-467.
Nagasaki, K., Y. Shirai, Y. Takao, H. Mizumoto, K. Nishida, and Y. Tomaru. 2005. Comparison of genome sequences of single-stranded RNA viruses infecting the bivalve-killing dinoflagellate Heterocapsa circularisquama. Applied and Environmental Microbiology 71: 8888-8894.
Nagasaki, K., Y. Tomaru, N.
Katanozaka, Y. Shirai, K. Nishida, S. Itakura, and M. Yamaguchi. 2004a. Isolation and characterization of a novel single-stranded RNA virus infecting the bloom-forming diatom Rhizosolenia setigera. Applied and Environmental Microbiology 70: 704-711.
Nagasaki, K., Y. Tomaru, K. Nakanishi, N. Hata, N. Katanozaka, and M. Yamaguchi. 2004b. Dynamics of Heterocapsa circularisquama (Dinophyceae) and its viruses in Ago Bay, Japan. Aquatic Microbial Ecology 34: 219-226.
Not, F., M. Latasa, D. Marie, T. Cariou, D. Vaulot, and N. Simon. 2004. A single species, Micromonas pusilla (Prasinophyceae), dominates the eukaryotic picoplankton in the western English channel. Applied and Environmental Microbiology 70: 4064-4072.
Pappalardo, R., J. Mari, and J. R. Bonami. 1986. Tau-(tau) virus infection of Carcinus mediterraneus - Histology, cytopathology, and experimental transmission of the disease. Journal of Invertebrate Pathology 47: 361-368.
Pedros-Alio, C., J. I. Calderon-Paz, and J. M. Gasol. 2000. Comparative analysis shows that bacterivory, not viral lysis, controls the abundance of heterotrophic prokaryotic plankton. FEMS Microbiology Ecology 32: 157-165.
Prangishvili, D., R. A. Garrett, and E. V. Koonin. 2006. Evolutionary genomics of archaeal viruses: Unique viral genomes in the third domain of life. Virus Research 117: 52-67.
Prescott, L. M., J. P. Harley, and D. A. Klein. 1993. Microbiology, 2 ed. Wm. C. Brown.
Proctor, L. M., and J. A. Fuhrman. 1990. Viral mortality of marine bacteria and cyanobacteria. Nature 343: 60-62.
Rappe, M. S., and S. J. Giovannoni. 2003. The uncultured microbial majority.
Annual Review of Microbiology 57: 369-394.
Rima, B. K., A. M. J. Collin, and J. A. P. Earle. 2005. Completion of the sequence of a cetacean morbillivirus and comparative analysis of the complete genome sequences of four morbilliviruses. Virus Genes 30: 113-119.
Roizman, S. G., and P. Palese. 1996. Multiplication of Viruses: An Overview, p. 101-111. In B. N. Fields, D. M. Knipe and P. M. Howley [eds.], Fields Virology. Lippincott-Raven.
Schroeder, D. C., J. Oke, M. Hall, G. Malin, and W. H. Wilson. 2003. Virus succession observed during an Emiliania huxleyi bloom. Applied and Environmental Microbiology 69: 2484-2490.
Shibata, A., K. Kogure, I. Koike, and K. Ohwada. 1997. Formation of submicron colloidal particles from marine bacteria by viral infection. Marine Ecology-Progress Series 155: 303-307.
Short, S. M., and C. A. Suttle. 2002. Sequence analysis of marine virus communities reveals that groups of related algal viruses are widely distributed in nature. Applied and Environmental Microbiology 68: 1290-1296.
Short, S. M., and C. A. Suttle. 2003. Temporal dynamics of natural communities of marine algal viruses and eukaryotes. Aquatic Microbial Ecology 32: 107-119.
Short, C. M., and C. A. Suttle. 2005. Nearly identical bacteriophage structural gene sequences are widely distributed in both marine and freshwater environments. Applied and Environmental Microbiology 71: 480-486.
Smayda, T. J. 1998. Ecophysiology and Bloom Dynamics of Heterosigma akashiwo (Raphidophyceae), p. 113-131. In D. M. Anderson, A. D. Cembella and G. M. Hallegraeff [eds.], Physiological Ecology of Harmful Algal Blooms. Springer-Verlag.
Smith, A. 2000. Aquatic Virus Cycles, p. 447-491. In C. Hurst [ed.], Viral Ecology. Academic Press.
Sritunyalucksana, K., S. Apisawetakan, A. Boon-Nat, B. Withyachumnarnkul, and T. W. Flegel. 2006. A new RNA virus found in black tiger shrimp Penaeus monodon from Thailand. Virus Research 118: 31-38.
Steward, G.
F., J. L. Montiel, and F. Azam. 2000. Genome size distributions indicate variability and similarities among marine viral assemblages from diverse environments. Limnology and Oceanography 45: 1697-1706.
Steward, G. F., J. Wikner, W. P. Cochlan, D. C. Smith, and F. Azam. 1992. Estimation of virus production in the sea: 2. Field results. Marine Microbial Food Webs 6: 79-90.
Stoderegger, K., and G. J. Herndl. 1998. Production and release of bacterial capsular material and its subsequent utilization by marine bacterioplankton. Limnology and Oceanography 43: 877-884.
Sullivan, M. B., D. Lindell, J. A. Lee, L. R. Thompson, J. P. Bielawski, and S. W. Chisholm. 2006. Prevalence and evolution of core photosystem II genes in marine cyanobacterial viruses and their hosts. Plos Biology 4: 1344-1357.
Suttle, C. A. 2005. Viruses in the sea. Nature 437: 356-361.
Suttle, C. A. 1992. Inhibition of photosynthesis in phytoplankton by the submicron size fraction concentrated from seawater. Marine Ecology Progress Series 87: 105-112.
Suttle, C. A., A. M. Chan, and M. T. Cottrell. 1990. Infection of phytoplankton by viruses and reduction of primary productivity. Nature 347: 467-469.
Suzuki, S., and M. Nojima. 1999. Detection of a marine birnavirus in wild molluscan shellfish species from Japan. Fish Pathology 34: 121-125.
Tai, V., J. E. Lawrence, A. S. Lang, A. M. Chan, A. I. Culley, and C. A. Suttle. 2003. Characterization of HaRNAV, a single-stranded RNA virus causing lysis of Heterosigma akashiwo (Raphidophyceae). Journal of Phycology 39: 343-352.
Takao, Y., K. Mise, K. Nagasaki, T. Okuno, and D. Honda. 2006. Complete nucleotide sequence and genome organization of a single-stranded RNA virus infecting the marine fungoid protist Schizochytrium sp. Journal of General Virology 87: 723-733.
Takao, Y., K. Nagasaki, K. Mise, T. Okuno, and D. Honda. 2005.
Isolation and characterization of a novel single-stranded RNA virus infectious to a marine fungoid protist, Schizochytrium sp. (Thraustochytriaceae, labyrinthulea). Applied and Environmental Microbiology 71: 4516-4522.
Taubenberger, J. K., M. M. Tsai, T. J. Atkin, T. G. Fanning, A. E. Krafft, R. B. Moeller, S. E. Kodsi, M. G. Mense, and T. P. Lipscomb. 2000. Molecular genetic evidence of a novel morbillivirus in a long-finned pilot whale (Globicephalus melas). Emerging Infectious Diseases 6: 42-45.
Tomaru, Y., N. Katanozaka, K. Nishida, Y. Shirai, K. Tarutani, M. Yamaguchi, and K. Nagasaki. 2004. Isolation and characterization of two distinct types of HcRNAV, a single-stranded RNA virus infecting the bivalve-killing microalga Heterocapsa circularisquama. Aquatic Microbial Ecology 34: 207-218.
Van Bressem, M. F., K. Van Waerebeek, and J. A. Raga. 1999. A review of virus infections of cetaceans and the potential impact of morbilliviruses, poxviruses and papillomaviruses on host population dynamics. Diseases of Aquatic Organisms 38: 53-65.
Van Etten, J. L., M. V. Graves, D. G. Muller, W. Boland, and N. Delaroque. 2002. Phycodnaviridae - large DNA algal viruses. Archives of Virology 147: 1479-1516.
Van Hannen, E. J., G. Zwart, M. P. Van Agterveld, H. J. Gons, J. Ebert, and H. J. Laanbroek. 1999. Changes in bacterial and eukaryotic community structure after mass lysis of filamentous cyanobacteria associated with viruses. Applied and Environmental Microbiology 65: 795-801.
Villarreal, L. P. 2005. Viruses and the Evolution of Life. ASM Press.
Wang, K., and F. Chen. 2004. Genetic diversity and population dynamics of cyanophage communities in the Chesapeake Bay. Aquatic Microbial Ecology 34: 105-116.
Weinbauer, M. G. 2004. Ecology of prokaryotic viruses. FEMS Microbiology Reviews 28: 127-181.
Weinbauer, M. G., I. Brettar, and M. G. Hofle. 2003.
Lysogeny and virus-induced mortality of bacterioplankton in surface, deep, and anoxic marine waters. Limnology and Oceanography 48: 1457-1465.
Weinbauer, M. G., and C. A. Suttle. 1997. Comparison of epifluorescence and transmission electron microscopy for counting viruses in natural marine waters. Aquatic Microbial Ecology 13: 225-232.
Weinbauer, M. G., S. W. Wilhelm, C. A. Suttle, and D. R. Garza. 1997. Photoreactivation compensates for UV damage and restores infectivity to natural marine viral communities. Applied and Environmental Microbiology 63: 2200-2205.
Wilhelm, S. W., and C. A. Suttle. 1999. Viruses and nutrient cycles in the sea. Bioscience 49: 781-788.
Wilson, W. H., G. Tarran, and M. V. Zubkov. 2002. Virus dynamics in a coccolithophore-dominated bloom in the North Sea. Deep-Sea Research Part II-Topical Studies in Oceanography 49: 2951-2963.
Wommack, K. E., and R. R. Colwell. 2000. Virioplankton: Viruses in aquatic ecosystems. Microbiology and Molecular Biology Reviews 64: 69-114.
Wommack, K. E., J. Ravel, R. T. Hill, J. S. Chun, and R. R. Colwell. 1999. Population dynamics of Chesapeake Bay virioplankton: Total-community analysis by pulsed-field gel electrophoresis. Applied and Environmental Microbiology 65: 231-240.
Wommack, K. E., R. T. Hill, M. Kessel, R. C. E., and R. R. Colwell. 1992. Distribution of viruses in the Chesapeake Bay. Applied and Environmental Microbiology 58: 2965-2970.
Zanotto, P. M. D., M. J. Gibbs, E. A. Gould, and E. C. Holmes. 1996. A reevaluation of the higher taxonomy of viruses based on RNA polymerases. Journal of Virology 70: 6083-6096.
Zeidner, G., J. P. Bielawski, M. Shmoish, D. J. Scanlan, G. Sabehi, and O. Beja. 2005. Potential photosynthesis gene recombination between Prochlorococcus and Synechococcus via viral intermediates. Environmental Microbiology 7: 1505-1513.
Zhang, S., Z. Shi, J. Zhang, and J. R. Bonami. 2004. Purification and characterization of a new reovirus from the Chinese mitten crab, Eriocheir sinensis. Journal of Fish Diseases 27: 687.
Zhong, Y., F. Chen, S. W. Wilhelm, L. Poorvin, and R. E. Hodson. 2002. Phylogenetic diversity of marine cyanophage isolates and natural virus communities as revealed by sequences of viral capsid assembly protein gene G20. Applied and Environmental Microbiology 68: 1576-1584.

Chapter II. Genome sequence and characterization of a virus (HaRNAV) related to picorna-like viruses that infects the marine toxic bloom-forming alga Heterosigma akashiwo

A version of this chapter has been published: Lang, A.S., A.I. Culley, and C.A. Suttle. 2004. Genome sequence and characterization of a virus (HaRNAV) related to picorna-like viruses that infects the marine toxic bloom-forming alga Heterosigma akashiwo. Virology 320:206-217.

2.1 Introduction

For many years, viruses or virus-like particles have been reported from numerous taxa representing nearly all the classes of eukaryotic algae (reviewed in Van Etten et al. 1991, Van Etten & Meints 1999). The first report of a virus isolate that infected a marine photosynthetic protist was in 1979 (Mayer & Taylor 1979), but it was not until 10 years later that viruses infecting phytoplankton were readily isolated from seawater (Suttle et al. 1990). Subsequently, there were numerous examples of viruses isolated from the marine environment that infected eukaryotic phytoplankton (Castberg et al.
2002, Cottrell & Suttle 1991, Jacobsen et al. 1996, Nagasaki and Yamaguchi 1997, Sandaa et al. 2001, Suttle and Chan 1995, Tarutani et al. 2001). Although these viruses infect a variety of distantly related taxa, they are morphologically remarkably similar (Suttle 2000). All are large polyhedrons that contain double-stranded DNA ranging from 130 to 560 kbp and appear to belong in the family Phycodnaviridae.

Recently, several different types of viruses have been isolated that infect Heterosigma akashiwo (Raphidophyceae), a unicellular phototrophic marine flagellate that is common in temperate coastal waters and which forms toxic blooms that can kill fish (Taylor 1990). These viruses include HaV, which appears to belong within the Phycodnaviridae (Nagasaki & Yamaguchi 1997), HaNIV, which forms paracrystalline arrays within the H. akashiwo nucleus (Lawrence et al. 2001), and HaRNAV, a single-stranded RNA virus that assembles within the cytoplasm of infected cells (Tai et al. 2003). HaRNAV particles appear to have icosahedral symmetry and are approximately 25 nm in diameter (Tai et al. 2003). Sequence analysis of the putative RNA-dependent RNA polymerase (RdRp) domain from HaRNAV shows it is related to the picorna-like virus superfamily (Culley et al. 2003).

Here, we report the complete genome sequence of HaRNAV. Analysis of the genome reveals that this virus is related to viruses from the picorna-like superfamily of viruses (the Picornaviridae, Caliciviridae, Dicistroviridae, Sequiviridae, Comoviridae and Potyviridae families, Liljas et al. 2002), but does not belong in any of these currently defined families.
The HaRNAV putative nonstructural protein domains and capsid proteins show a mosaic pattern of relationships with sequences from viruses in these families. The organization of HaRNAV structural proteins appears to be the same as in the Dicistroviridae, although the overall genome architecture is different from these viruses. Based on our sequence comparisons and analyses of genome structure, we argue that HaRNAV defines a new family (Marnaviridae) of picorna-like viruses.

2.2 Results

2.2.1 Features of the HaRNAV genome sequence

We have determined the complete nucleotide sequence of the HaRNAV genome (GenBank accession number AY337486). The genome is 8587 nucleotides (nts) long, plus a poly(A) tail, which is in close agreement with the predicted size of 8.6 kbp, based on the analysis of the previously published denaturing gel of the HaRNAV genome (Tai et al. 2003). The genome contains a 7743-nt open reading frame (ORF) (Figure 2.1) that is predicted to encode a protein of 2581 amino acids. No other ORFs that could encode proteins larger than 60 amino acids are predicted (Figure 2.1). Assuming we have correctly predicted the start of the polyprotein, the 5' and 3' untranslated regions (UTRs) are 483 and 361 nts long, respectively, accounting for 9.8% of the genome. The protein sequence predicted by the large ORF contains conserved sequence domains from RNA viruses, and the sequences found in the HaRNAV structural proteins (see below).

We used an oligo(dT) primer as part of the 3' RACE system (see Materials and Methods) to clone the 3' end of the genome. This suggests that there is a 3' poly(A) tail, although previous experiments suggested its absence (Tai et al. 2003).
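The genome lengths quoted in this section are mutually consistent, which can be verified with a few lines of arithmetic. The sketch below is an illustration only and not part of the published analysis; the constant and function names are hypothetical, and the numbers are copied from the text.

```python
# Illustrative check of the genome arithmetic quoted in the text.
# All names are hypothetical; the numbers are taken from the text.
GENOME_NT = 8587  # genome length, excluding the poly(A) tail
ORF_NT = 7743     # large open reading frame
UTR5_NT = 483     # 5' untranslated region
UTR3_NT = 361     # 3' untranslated region


def codons(orf_nt: int) -> int:
    """Number of complete codons in an ORF of the given length."""
    if orf_nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return orf_nt // 3


def utr_percent(utr5: int, utr3: int, genome: int) -> float:
    """Percentage of the genome occupied by the two untranslated regions."""
    return 100.0 * (utr5 + utr3) / genome
```

Under these figures, `codons(ORF_NT)` gives the 2581 residues of the polyprotein, `utr_percent(UTR5_NT, UTR3_NT, GENOME_NT)` rounds to 9.8, and the three segments sum to the full genome length.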
A region near the 3' end of the genome (nts 8420-8445) contains 22 out of 26 bases that are U, which could form a secondary structure with the poly(A) tail thereby giving the impression that the genome does not have a poly(A) tail. The addition of DMSO to the first strand cDNA synthesis during the 3' RACE procedure (see Materials and Methods) may have helped to disrupt any secondary structure. All five of the 3' RACE clones gave the same sequence for the end of the genome.

We used a 5' RACE approach to clone the 5' end of the HaRNAV genome. Wu et al. (2002) found that secondary structure in the 5' end of the Perina nuda picorna-like virus (PnPV) genome interfered with the 5' RACE procedure. We encountered the same phenomenon, and sequenced nine clones to determine the 5' end of the HaRNAV sequence (Figure 2.2). This revealed a potential stem-loop structure from nts 5 to 39 in which 14 of 15 bases are capable of hybridizing and forming a loop of four bases (Figure 2.2). This is very similar to the PnPV sequence where a predicted 13-base stem-loop structure occurs from nts 11 to 38 of the genome (Wu et al. 2002). Analysis of the HaRNAV 5' UTR with the web-based version of the RNA secondary structure prediction program mfold 3.0 (Mathews et al. 1999, Zuker et al. 1999) predicted a large amount of secondary structure in the 5' UTR (not shown) including the potential stem-loop structure mentioned above. Secondary structure in this region may be important for replication of the RNA as found for poliovirus (Andino et al. 1993).
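A statement such as "14 of 15 bases are capable of hybridizing" amounts to counting complementary positions between the two arms of a candidate hairpin. The following minimal sketch shows that count for a simple hairpin without bulges. It is not the mfold algorithm, the function name is hypothetical, G-U wobble pairs are assumed to count as pairable, and the test sequence mentioned below is invented rather than taken from the HaRNAV genome.

```python
# Toy sketch: count pairable positions in a candidate RNA hairpin.
# Assumes a simple hairpin (5' arm + loop + 3' arm) with no bulges.
PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),  # wobble pairs, counted here by assumption
}


def stem_pairs(seq: str, stem_len: int, loop_len: int) -> int:
    """Return how many stem positions can pair across the hairpin."""
    if len(seq) != 2 * stem_len + loop_len:
        raise ValueError("sequence length does not match stem/loop sizes")
    five_arm = seq[:stem_len]
    three_arm = seq[stem_len + loop_len:]
    # Position i of the 5' arm faces position -(i + 1) of the 3' arm.
    return sum((a, b) in PAIRS for a, b in zip(five_arm, reversed(three_arm)))
```

For the invented sequence `GGCAUCUUCGGAAGCC` (a 6-base stem around a 4-base loop), `stem_pairs(seq, 6, 4)` counts five pairable positions out of six, because one A faces another A.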
Other potential secondary structures that were predicted closer to the putative start of the polyprotein are likely important as part of an internal ribosome entry site (IRES) for polyprotein translation (reviewed in Hellen & Sarnow 2001, Martinez-Salas et al. 2001, Sarnow 2003). A pyrimidine-rich stretch of sequence wherein 22 of 29 bases are pyrimidines occurs from nts 447 to 475, ending eight bases upstream of the predicted start codon of the large ORF (Figure 2.2); such pyrimidine-rich sequences are conserved in picornavirus IRESs (Hellen & Sarnow 2001, Pestova et al. 1991).

There are two notable repeats in the genome sequence. One pair of repeats involves sequences in the proposed 5' and 3' UTRs. The 136-nt region from nts 312 to 448 shares 123 identical bases with the 137-nt sequence from nts 8265 to 8401 (90% identity). These repeated sequences might have a function in RNA replication or polyprotein translation. It has been suggested that circularization of viral mRNAs may be important for aiding polyprotein translation (reviewed in Martinez-Salas et al. 2001). These repeats could be involved in this function through RNA-RNA or RNA-protein-RNA interactions. RNA secondary structure analysis with mfold predicted the sequence from bases 312 to 448 would fold on itself; therefore these repeats are theoretically capable of interacting directly.

The other set of repeats is an overlapping repeat in the predicted coding region. The 95 base sequence from nts 1052 to 1147 shares 87 identical bases with the sequence between nts 1124 and 1218 (91.6% identity). This creates a 31-amino-acid self-overlapping repeat where 27 of 31 amino acids are identical (87% identity).
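The identity percentages for the two repeats follow directly from the counts of identical positions, and the self-overlap of the coding-region repeat follows from its coordinates. The sketch below reproduces both calculations; it is an illustration only, the function names are hypothetical, and intervals are assumed to be 1-based and inclusive.

```python
def percent_identity(matches: int, aligned_length: int) -> float:
    """Percent identity given identical positions and aligned length."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return 100.0 * matches / aligned_length


def overlap(a: tuple, b: tuple) -> int:
    """Number of positions shared by two 1-based, inclusive intervals."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start + 1)
```

With the counts from the text, `percent_identity(87, 95)` rounds to 91.6 and `percent_identity(27, 31)` to 87. Under inclusive counting, the two copies of the coding-region repeat (nts 1052-1147 and 1124-1218) share 24 positions.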
It is difficult to speculate on a potential function for this repeated sequence because we have no indication of a possible function for that region of the polyprotein.

2.2.2 Determination of HaRNAV genome polarity

It was previously shown that the HaRNAV genome is a single-stranded RNA molecule (Tai et al. 2003). We performed separate first-strand cDNA synthesis reactions with two primers, 263P11 and 263P12 (see Materials and Methods; Table 2.1). The primer 263P12 will bind to and initiate first-strand cDNA synthesis from a positive-stranded RNA molecule, whereas 263P11 will bind to and initiate first-strand cDNA synthesis from a negative-stranded molecule. After treatment with RNase H, these first-strand cDNA reactions were used as templates for PCR with these two primers. A PCR product was obtained only with the first-strand reaction that was performed with 263P12 (Figure 2.3), showing that the genome is positive stranded.

2.2.3 Analysis of HaRNAV structural proteins

We performed N-terminal sequence analyses of protein bands (Figure 2.4; Table 2.2) from purified virus particles. These sequences were found in the amino acid sequence predicted by the large ORF (Figure 2.1), which therefore encodes the viral structural proteins. The approximate locations of the N-termini within the polyprotein sequence are shown in Figure 2.5. Table 2.2 gives the apparent molecular weights of the protein bands based on SDS-PAGE (Figure 2.4) and the predicted molecular weights based on the genome sequence (using the N-termini as guides for the boundaries between proteins in the polyprotein; Figure 2.5).
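A predicted molecular weight of this kind is usually obtained by summing average residue masses over the sequence between two boundaries and adding one water molecule. The text does not say which tool produced the values in Table 2.2, so the sketch below is only an assumed, generic version of the calculation; the function name is hypothetical and the masses are standard average residue masses in daltons.

```python
# Average residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0152  # one water per chain (free N- and C-termini)


def average_mass_da(protein: str) -> float:
    """Average molecular mass of an unmodified peptide chain, in Da."""
    if not protein:
        raise ValueError("empty sequence")
    # A KeyError here signals a non-standard residue code.
    return sum(RESIDUE_MASS[aa] for aa in protein.upper()) + WATER
```

Dividing the result by 1000 gives kDa for comparison with SDS-PAGE estimates. Post-translational modifications and anomalous gel migration are not modelled.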
The theoretical molecular weights of the proteins are close to the apparent sizes based on their migration within the gel, but four of five proteins migrated at sizes slightly larger than predicted (Table 2.2). Comparison of the sequences around the protein boundaries (Table 2.2) did not reveal a clear pattern, preventing the identification of a potential consensus protease recognition site. However, three out of five processing sites are on the N-terminal side of a serine residue, and two of the sites share five identical residues (ST-SEI). The lack of a recognizable pattern at the cleavage sites could indicate that more than one protease is involved in processing the polyprotein.

As reported previously (Tai et al. 2003), HaRNAV particles purify in different layers on a sucrose gradient, and particles from these two layers give different protein banding patterns (Figure 2.4). This difference in protein composition may explain why particles from the upper layer are noninfectious while particles in the lower band can infect H. akashiwo (Tai et al. 2003). There are several major differences between particles from the upper and lower layers. In the protein gel (Figure 2.4), there is a large amount of protein Band 1 in the particles from the upper layer and comparatively little of this protein in particles from the lower layer, while Bands 5 and 7 are comparatively stronger bands in particles from the lower layer. The reduction of protein Band 1 and coincident intensification of protein Band 5 are well explained by the N-terminal sequencing results and the genome sequence. The N-terminus of protein Band 5 is located approximately 7 kDa away from the N-terminus of protein Band 1 in the C-terminal direction (i.e. downstream; Figure 2.5).
Therefore, it appears that protein Band 5 may result from processing of protein Band 1 during maturation of the viral particles. There is a small amount of the Band 1 protein in the lower sucrose layer (Figure 2.4) and this protein has the same N-terminus (first six residues determined, sequence SEIVEY) as the protein Band 1 in the upper sucrose layer. It is possible that not all of protein Band 1 gets processed or that some "immature" noninfectious particles were in the lower sucrose sample. Similarly, the N-terminus of protein Band 7 is located approximately 4 kDa C-terminally from the N-terminus of protein Band 6 (Figure 2.5). Protein Band 7 may arise from secondary processing of the Band 6 protein or from an alternative cleavage of a precursor protein; both proteins are present in mature capsids. The protein from Band 3 was analyzed by mass spectrometry and the resulting peptide sequences were found in the same region of the polyprotein corresponding to protein Band 2 (Figure 2.5), indicating that Band 3 is a different version of the protein in Band 2. The region of the gel containing protein Bands 4 and 5 (Figure 2.4) was also analyzed by mass spectrometry and the resulting peptide sequences were found in the Bands 2 and 5 regions of the polyprotein (Figure 2.5).

2.2.4 Comparisons of HaRNAV proteins and putative protein domains to other virus sequences

Analysis of the HaRNAV predicted polyprotein sequence by BLAST searches of the NCBI database (Altschul et al. 1997) revealed the presence of a conserved RNA-dependent RNA polymerase (RdRp) domain, a conserved RNA virus RNA helicase domain, and conserved picorna-like virus capsid protein domains (Figure 2.5). Discussions of the individual protein and putative protein domain sequences are below.
We used the putative HaRNAV RdRp protein domain sequence corresponding to conserved regions I-VIII (Koonin & Dolja 1993) in a BLAST search of the GenBank database. The most similar sequences found in this search were from a variety of viruses from the Comoviridae, Sequiviridae, and Dicistroviridae families. The top-scoring sequence was from tomato ringspot virus (ToRSV; Comoviridae), and an alignment of the HaRNAV sequence (regions I-VIII) with the corresponding region from ToRSV showed that they are 29% identical. Another high-scoring alignment was with acute bee paralysis virus (ABPV; Dicistroviridae), and alignment of this sequence with HaRNAV (regions I-VIII) showed 30% identity. Alignment of the sequences from ToRSV and ABPV showed that these are 30% identical. Therefore, the putative HaRNAV RdRp domain sequence is as similar to virus sequences from two different picorna-like families as these sequences are to each other. An alignment of these three sequences is shown (Figure 2.6). For comparison, we aligned the three RdRp sequences, individually, with the RdRp sequence from tobacco etch virus (TEV: Potyviridae) whose RdRp sequence has been shown to tree outside the other picorna-like virus families in phylogenetic analyses (Koonin & Dolja 1993). The HaRNAV, ToRSV, and ABPV sequences are 19%, 23%, and 21% identical to the TEV sequence, respectively, which is lower than the HaRNAV, ToRSV, and ABPV sequences are to each other.

We used the HaRNAV sequence that showed similarity to a central region of a conserved RNA virus RNA helicase domain (residues 524-686 of the polyprotein) for a BLAST search of the NCBI database.
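The RdRp comparison above rests on six pairwise identities, and the argument is that every pair among HaRNAV, ToRSV and ABPV is more similar than any pair involving the TEV outgroup. The sketch below tabulates the quoted percentages and makes that comparison explicit. It is an illustration only; the table layout and function name are hypothetical.

```python
# Pairwise RdRp identities (%) as quoted in the text.
IDENTITY = {
    frozenset({"HaRNAV", "ToRSV"}): 29,
    frozenset({"HaRNAV", "ABPV"}): 30,
    frozenset({"ToRSV", "ABPV"}): 30,
    frozenset({"HaRNAV", "TEV"}): 19,
    frozenset({"ToRSV", "TEV"}): 23,
    frozenset({"ABPV", "TEV"}): 21,
}


def split_by_outgroup(table: dict, outgroup: str) -> tuple:
    """Separate ingroup-only identities from those involving the outgroup."""
    ingroup = [v for pair, v in table.items() if outgroup not in pair]
    versus_outgroup = [v for pair, v in table.items() if outgroup in pair]
    return ingroup, versus_outgroup
```

Calling `split_by_outgroup(IDENTITY, "TEV")` returns the ingroup identities (29, 30 and 30) and the outgroup identities (19, 23 and 21); the lowest ingroup value exceeds the highest outgroup value, which is the pattern the text describes.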
Similar to the results found for the putative RdRp domain, this showed the HaRNAV sequence is most similar to sequences from viruses in the Picornaviridae, Dicistroviridae, and Comoviridae families. The top-scoring sequence was from human rhinovirus 14 (HRV14; Picornaviridae), followed by sequences from several other rhinoviruses. An alignment of the sequences over the complete region that was input into the BLAST showed the HRV14 sequence is 23% identical to HaRNAV. The next highest scoring sequence is ABPV, which is 21% identical to HaRNAV and 24% identical to HRV14. An alignment of the three sequences is shown (Figure 2.6).

Using the experimentally determined N-termini as boundaries, the individual structural proteins were used for BLAST searches of the NCBI database. A search with the sequence from residues 1776 to 1989 of the polyprotein (corresponding to protein Bands 6 and 7; Figures 2.4 and 2.5) showed that this capsid protein is most similar to capsid proteins from viruses in the Dicistroviridae family. There were also lower-scoring alignments with viruses from the Picornaviridae and Caliciviridae families. The best scoring match was with the VP2 protein from Cricket paralysis virus (CrPV; Dicistroviridae); the two sequences were 24% identical over the BLAST-aligned residues. Similarly, the capsid protein sequence from residues 2060 to 2317 (corresponding to Band 5; Figures 2.4 and 2.5) is most similar to the VP3 protein from CrPV, and the top nine scoring sequences were from insect picorna-like viruses. This search showed that this protein is also similar to capsid proteins from viruses in the Picornaviridae and Sequiviridae families.
Interestingly, when the CrPV VP3 sequence is subjected to BLAST, the HaRNAV protein scores as high as Kashmir bee virus and higher than Taura syndrome virus, both of which are members of the same family as CrPV. An alignment of the HaRNAV and CrPV sequences (Figure 2.6) shows the HaRNAV protein is 24% identical to the CrPV protein. A BLAST search with the capsid protein sequence from residues 2318 to 2581 (corresponding to protein Band 2; Figures 2.4 and 2.5) showed that this protein has less similarity to other viral proteins than the two previously discussed capsid proteins. For this protein, the initial BLAST search returns some low-scoring matches to Dicistroviridae capsid proteins, but the second iteration of a PSI-BLAST search finds the CrPV VP1 protein sequence with a high-scoring alignment, suggesting this is the HaRNAV VP1 homologue. A search with the sequence between residues 1990 and 2059 did not return any matches to sequences in the NCBI database.

2.2.5 Phylogenetic analyses of HaRNAV proteins and putative protein domains

We constructed phylogenetic trees in attempts to evaluate the evolutionary relationship of HaRNAV to other viruses. We first compared the relationship of the putative HaRNAV RdRp domain to RdRp sequences from picorna-like viruses. The alignments included residues 1362-1619 of the HaRNAV predicted polyprotein that represent the conserved regions I-VIII (Koonin & Dolja 1993), and the corresponding regions from the other viruses.
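The residue ranges quoted in Sections 2.2.4 and 2.2.5 can be collected into a simple coordinate map of the polyprotein, which makes it easy to confirm that the four structural segments tile the C-terminal region without gaps. The sketch below is an illustration only; the labels, names and helper functions are hypothetical, and coordinates are 1-based and inclusive as in the text.

```python
POLYPROTEIN_AA = 2581

# (label, first residue, last residue), as quoted in the text.
NONSTRUCTURAL = [
    ("helicase-like region", 524, 686),
    ("RdRp regions I-VIII", 1362, 1619),
]
STRUCTURAL = [
    ("Bands 6/7 (VP2-like)", 1776, 1989),
    ("no database match", 1990, 2059),
    ("Band 5 (VP3-like)", 2060, 2317),
    ("Band 2 (VP1-like)", 2318, 2581),
]


def length(first: int, last: int) -> int:
    """Number of residues in a 1-based, inclusive range."""
    return last - first + 1


def is_contiguous(segments: list) -> bool:
    """True if each segment starts right after the previous one ends."""
    return all(
        nxt[1] == cur[2] + 1 for cur, nxt in zip(segments, segments[1:])
    )


def extract(polyprotein: str, first: int, last: int) -> str:
    """Slice a sequence using 1-based, inclusive coordinates."""
    return polyprotein[first - 1:last]
```

With these coordinates the structural segments are 214, 70, 258 and 264 residues long, they are contiguous, and the last one ends at residue 2581, the final residue of the predicted polyprotein.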
We chose Potyviridae sequences [from tobacco etch virus (TEV) and barley yellow mosaic virus (BaYMV)] as the outgroup because RdRp sequences from these viruses are in a separate lineage relative to other picorna-like viruses (Koonin & Dolja 1993) with which HaRNAV has significant sequence similarity (see above). The maximum likelihood (Fig. 2.7) and neighbor-joining (not shown) trees resolved the Caliciviridae, Picornaviridae, Comoviridae, and Sequiviridae sequences into their families, but not the Dicistroviridae. Within the Dicistroviridae, the viruses we included from the Cripavirus genus (DCV, CrPV, TSV, ABPV, RhPV, BQCV, and HiPV) constitute a clade, but with poor support and bootstrap values. The viruses from the Iflavirus genus (IFV, SbV, and PnPV) in this family did not resolve into a clade. The maximum likelihood tree (Figure 2.7) does not suggest a close relationship between the HaRNAV sequence and any of the established families, although the neighbor-joining tree (not shown) placed HaRNAV within the Cripavirus genus of the Dicistroviridae family with a low bootstrap value of 53. Both trees support the placement of the HaRNAV sequence within the picorna-like group, relative to the Potyviridae sequences. Interestingly, if the HaRNAV sequence is excluded from the alignments and phylogenetic analyses, the Cripavirus clade has much higher maximum likelihood support and neighbor-joining bootstrap values (75 and 90, respectively; not shown).
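Bootstrap values such as those quoted above are obtained by resampling alignment columns with replacement, rebuilding the tree from each replicate, and reporting the percentage of replicates in which a clade is recovered. The sketch below shows only the resampling and the final percentage. The tree-building step is omitted, the text does not name the software used, and all function names and the toy alignment in the usage note are hypothetical.

```python
import random


def bootstrap_alignment(alignment: dict, rng: random.Random) -> dict:
    """Return one bootstrap replicate: columns resampled with replacement.

    The same column indices are applied to every sequence so that
    homologous positions stay aligned.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {
        name: "".join(seq[i] for i in cols)
        for name, seq in alignment.items()
    }


def bootstrap_support(clade_recovered: list) -> float:
    """Percentage of replicates in which the clade of interest appeared."""
    if not clade_recovered:
        raise ValueError("no replicates supplied")
    return 100.0 * sum(clade_recovered) / len(clade_recovered)
```

For example, `bootstrap_alignment({"a": "MKTL", "b": "MRTL"}, random.Random(0))` yields a replicate of the same dimensions, and a clade recovered in three of four replicates has `bootstrap_support([True, True, False, True])` equal to 75.0.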
Similarly, we constructed phylogenetic trees with part of the HaRNAV putative helicase domain (residues 524-686 of the polyprotein) and the putative RdRp domain sequences concatenated into one, and the corresponding sequences from other picorna-like viruses. Sequences from the Potyviridae were not included because they contain helicase sequences with different conserved motifs (Koonin & Dolja 1993). These trees (not shown) did not group the HaRNAV sequence with any other virus family. In addition to the family groups supported in the RdRp analyses (Figure 2.7), these trees supported independent clades for both the Cripaviruses and Iflaviruses with high (>80) maximum likelihood support and neighbor-joining bootstrap values. The same approach was used to analyze concatenated (putative) helicase/RdRp/VP3-like sequences. Because of greater sequence divergence in the VP3 proteins across the picorna-like virus families, we were able to include fewer families in these analyses. The trees we generated with these sequences (Figure 2.8) supported the same relevant clades as the previous analyses. They also do not support a strong relationship between HaRNAV and the other viruses.

2.3 Discussion

Sequence analyses show that HaRNAV is closely related to viruses from the picorna-like superfamily of viruses. HaRNAV particles are icosahedral with a diameter of approximately 25 nm (Tai et al. 2003), a size and structure consistent with picorna-like viruses (other than the Potyviridae). The predicted 5' UTR contains conserved picorna-like sequences and putative structural features. The capsids comprise three major structural protein sequences that are recognizably similar to known picorna-like capsid proteins.
The organization of the individual structural proteins, as indicated by sequence similarities, is similar to that found in the Dicistroviridae. However, the overall structure of the genome and phylogenetic analyses indicate that HaRNAV does not belong within any of the established picorna-like virus families. Therefore, we propose that HaRNAV is the first member of a new virus family (Marnaviridae), which most likely falls within the picorna-like superfamily. It is not surprising that this virus belongs to a previously unknown family because it is the first described ssRNA virus that infects a photosynthetic protist. It seems likely that as more effort is spent looking for these types of viruses in microorganisms and in marine environments, more novel groups of viruses will be found. Indeed, four new putative groups of picorna-like viruses have been postulated based on RdRp gene sequences amplified from natural communities of marine viruses (Culley et al. 2003).

The HaRNAV genome structure is most like that found in potyviruses (e.g. tobacco etch virus; Allison et al. 1986) in that the putative nonstructural protein domains are located at the N-terminus and the structural proteins are at the C-terminus of a single large polyprotein encoded on a monopartite genome (Figure 2.5). However, the HaRNAV capsid structure is icosahedral, whereas potyvirus capsids are filamentous. Viruses from the Caliciviridae also have a similar genome structure, but they encode more than one polyprotein (e.g. feline calicivirus, Carter et al. 1992). None of the database searches or phylogenetic analyses suggested that HaRNAV is closely related to viruses from either of these families.
The Cripaviruses in the Dicistroviridae and the unassigned insect picorna-like virus, Acyrthosiphon pisum virus (APV), also encode their structural proteins in the 3' region of the genome (Johnson & Christian 1998, Mari et al. 2002, Moon et al. 1998, Nakashima et al. 1999, Sasaki et al. 1998, Van der Wilk et al. 1997, van Munster et al. 2002). However, these viruses encode the capsid and nonstructural polyproteins in distinct ORFs. The organization of the individual structural protein units within the capsid polyprotein region of HaRNAV appears to be similar to the organization in the Dicistroviridae. This is based on the sequence relationships between the individual HaRNAV capsid proteins and the CrPV capsid proteins (Figure 2.5), for which the crystal structure has been determined (Tate et al. 1999). N-terminal sequencing data show that there is a cleavage generating the VP3-like protein from a larger protein present in the noninfectious particles (Figure 2.5, Table 2.2). The smaller fragment released by this cleavage may be a VP4-like protein, although we have no evidence that it is associated with mature capsids, and it is not recognizably similar to any other sequences when used for a BLAST search of the NCBI database. In the picornaviruses, VP4 is cleaved from the N-terminus of VP0 to generate VP4 and VP2, whereas in the Dicistroviridae, VP4 is cleaved from the N-terminus of VP0 to generate VP4 and VP3. Similar to both of these families, HaRNAV capsids comprise three large structural protein sequences (although some proteins appear to occur in multiple bands as discussed below). One aspect of HaRNAV capsid protein structure appears analogous to structural protein processing in P.
nuda picorna-like virus (PnPV). In this virus, multiple capsid protein bands are generated from the same protein region because proteins of different apparent molecular weights share the same N-termini (Wu et al. 2002). These authors speculated that these proteins shared N-termini but were processed differently at their C-termini. We found that different HaRNAV structural proteins contain sequences from the same protein region but have different N-termini. Analysis of the N-termini of the protein Bands 6 and 7 (Figure 2.4) shows that these two bands comprise the HaRNAV VP2-like sequences, as indicated by similarity with the CrPV VP2 protein. Analysis by mass spectrometry of capsid proteins from infectious particles in the region of the gel containing Bands 4 and 5 (Figure 2.4) showed that this region contains a mixture of two proteins (from the VP3- and VP1-like regions of the polyprotein, Figure 2.5). The VP1-like sequences were also found in the analysis of Band 3 (Figure 2.4) by mass spectrometry. Therefore, three protein bands (2, 3, and 4; Figure 2.4) contain sequences from the VP1-like region of the polyprotein. It is possible that these proteins differ by processing at the N-terminus, C-terminus, or by another post-translational modification. Two of the three bands appear larger by SDS-PAGE (33 and 32 kDa) than predicted from the genome sequence (29 kDa), as measured from the known N-terminus of the largest band to the end of the polyprotein sequence (Table 2.2, Figure 2.5). An explanation of these observations is that amino acids are not removed from the largest version of the protein to generate the smaller versions, but rather that there are post-translational additions to the proteins.
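The comparison above between apparent (SDS-PAGE) and predicted masses can be checked roughly from the coordinates alone. The sketch below is a back-of-envelope estimate, not the composition-based calculation from DNA Strider used in this study: it assumes an average residue mass of about 110 Da, a common rule of thumb, and the function name is hypothetical.

```python
AVERAGE_RESIDUE_MASS_KDA = 0.110  # rule-of-thumb average; an assumption, not a measured value


def estimate_mass_kda(n_terminus: int, c_terminus: int) -> float:
    """Rough mass of the polyprotein segment n_terminus..c_terminus (1-based, inclusive)."""
    if n_terminus < 1 or c_terminus < n_terminus:
        raise ValueError("invalid residue coordinates")
    n_residues = c_terminus - n_terminus + 1
    return n_residues * AVERAGE_RESIDUE_MASS_KDA


# Band 2: N-terminus at residue 2318, running to the end of the 2581-residue polyprotein.
band2_estimate = estimate_mass_kda(2318, 2581)
# Difference between the apparent 33 kDa band and this estimate.
apparent_excess = 33.0 - band2_estimate
```

For the 264 residues from 2318 to 2581 this gives about 29 kDa, in line with the predicted value, so the 33 kDa band runs roughly 4 kDa heavier than the sequence alone would suggest.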
Glycosylation is one possible post-translational modification that has been observed with other algal virus structural proteins (Friess-Klebl et al. 1994, Wang et al. 1993), but experiments need to be done to test this hypothesis.

The HaRNAV protein sequences show a mosaic pattern of relationships to picorna-like virus sequences. However, sequences from the Dicistroviridae gave higher scoring matches when the HaRNAV structural proteins were used for database searches. This, and the apparently conserved structural protein organization within the polyprotein, may reflect a closer evolutionary relationship of HaRNAV with these viruses than with the other families of picorna-like viruses. Overall, however, our analyses do not suggest that HaRNAV belongs in any of the picorna-like families. This is based on the lack of a consistent pattern of sequence relationships between HaRNAV and the other picorna-like virus families, and the overall genome organization of HaRNAV in comparison with the other families. Outside of the regions indicated in Figure 2.5, none of the predicted HaRNAV protein sequence shows any recognizable similarity to known protein (viral or other) sequences.

We have determined and analyzed the genome sequence of the ssRNA virus, HaRNAV, infecting the marine unicellular photosynthetic alga H. akashiwo. To our knowledge, this is the first genome sequence reported for a positive-stranded RNA virus from an alga or other protist. Our analyses of the genome sequence and structural protein composition indicate that HaRNAV is most closely related to viruses from the picorna-like superfamily.
The evidence suggests that HaRNAV is the first representative of a new virus family, which shares the defining characteristics of the picorna-like virus superfamily.

2.4 Materials and Methods

2.4.1 Purification of virus particles

H. akashiwo cultures were grown and infected with HaRNAV (isolate SOG263) as described (Tai et al. 2003). Virus particles were purified from 1.5 l of culture lysate by centrifugation. The lysate was first cleared by centrifugation at 4000 x g for 1 h. The supernatant was then centrifuged for 5 h at 108 000 x g to pellet the viruses, which were then resuspended in a total volume of 200 µl of 50 mM Tris (pH 7.6). These samples were centrifuged for 2 min at 4000 x g to remove large material and the resulting supernatant layered on top of a linear 5-35% sucrose (w/v) gradient (in 50 mM Tris; pH 7.6). The gradient was centrifuged for 3 h at 50 000 x g in a Beckman (Palo Alto, USA) SW-40 rotor and the viral bands purified from the gradient as described (Tai et al. 2003).

2.4.2 Determination of HaRNAV genomic sequence

RNA was purified from the viruses using the viral RNA kit (Qiagen, Mississauga, Canada) according to the manufacturer's instructions. The first strand of cDNA synthesis was performed with Superscript II RNase H- (Invitrogen, Burlington, Canada) according to manufacturer's instructions using either oligo(dT)12-18 (Amersham Pharmacia Biotech, Piscataway, USA) or d(N)10T as primers. After treatment with RNase H (Invitrogen), this cDNA was used as template for PCR with Platinum Taq (Invitrogen) using either d(N)10T or viral-specific primers (Table 2.1) or a combination of d(N)10T and a viral-specific primer.
PCR reactions with viral specific primers (400 nM each primer, 2 mM MgCl2, 400 nM each dNTP) were run as follows: 95 °C for 60 s, followed by 30 cycles of 95 °C for 45 s, 54 °C for 45 s, and 72 °C for 1 min per kilobase of expected product, and a final incubation at 72 °C for 5 min. PCR reactions with d(N)10T [1 mM d(N)10T, 400 nM viral specific primer (if applicable), 4 mM MgCl2, 400 nM each dNTP] were run as follows: 95 °C for 60 s, followed by 35 cycles of 95 °C for 45 s, 40 °C for 2 min and 72 °C for 3 min, and a final cycle of 72 °C for 5 min. PCR products were purified with the QIAquick PCR purification system (Qiagen) and ligated with pGEM-T (Promega, Madison, USA) directly when only viral-specific primers were used, or digested with restriction enzymes (MboI, RsaI, HaeIII, or AluI; New England Biolabs, Mississauga, Canada) before ligation with pUC19 (Vieira and Messing 1982) when d(N)10T was used.

The 5' and 3' ends of the viral genome were cloned using the 5' and 3' RACE systems (Invitrogen) according to manufacturer's instructions, with the following exceptions. Viral RNA was treated with 400 pg/pl proteinase K (Sigma-Aldrich, Oakville, Canada) for 60 min at 37 °C and subsequently extracted with phenol and precipitated with ethanol before 5' RACE, as described (Johnson & Christian 1998). To reduce secondary structure in the RNA, first-strand cDNA synthesis was done in the presence of 5% dimethyl sulfoxide (DMSO; Sigma-Aldrich) for 3' RACE, and in the presence of 4% DMSO and at 50 °C for the 5' RACE.
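Because the extension time in the viral-specific-primer protocol scales with the expected product length (1 min per kilobase), the cycling program differs from reaction to reaction. The sketch below writes that protocol out as data and sums the programmed hold times. It is illustrative only: the class and function names are hypothetical, ramp times between temperatures are ignored, and rounding the extension time up to a whole second is an assumption.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: int   # block temperature in degrees Celsius
    seconds: int  # hold time in seconds


def specific_primer_program(product_bp: int, cycles: int = 30) -> list[Step]:
    """Cycling program for viral-specific primers: extension at 1 min per kilobase."""
    if product_bp <= 0:
        raise ValueError("expected product length must be positive")
    extension = math.ceil(product_bp * 60 / 1000)
    program = [Step(95, 60)]  # initial denaturation
    for _ in range(cycles):
        program += [Step(95, 45), Step(54, 45), Step(72, extension)]
    program.append(Step(72, 300))  # final incubation, 5 min
    return program


def hold_time_seconds(program: list[Step]) -> int:
    """Total programmed hold time, excluding ramping between temperatures."""
    return sum(step.seconds for step in program)
```

For a 1 kb product the programmed holds add up to 4860 s (81 min); a 2.5 kb product needs 7560 s.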
Primers 263P8 and 263P15 were used as the first strand and nested viral-specific primers, respectively, for amplifying the 5' end. Primer 263P1 was used as the viral-specific primer for the 3' RACE procedure. A total of nine clones from the 5' end and five clones from the 3' end were sequenced. DNA sequencing was carried out using the universal M13 primers for cloned fragments, and virus-specific primers for cloned fragments and PCR products. Sequencing reactions were done with BigDye version 3.0 (Applied Biosystems, Foster City, USA) and analyzed by the University of British Columbia Nucleic Acid and Protein Service (NAPS) Facility (Vancouver, Canada). The genome sequence has been deposited in the GenBank database and assigned accession number AY337486.

2.4.3 Protein sequencing

For N-terminal protein sequencing, purified viral particles were subjected to SDS-PAGE and transferred onto a polyvinylidene difluoride membrane (Bio-Rad, Mississauga, Canada) according to the manufacturer's recommendations, and the N-terminal sequences were determined at the University of British Columbia NAPS Facility. For mass spectrometry protein sequencing, purified viral particles were subjected to SDS-PAGE and Coomassie blue-stained protein bands were excised and sent to the University of Victoria Genome British Columbia Proteomics Center (Victoria, Canada) for analysis by nanospray-quadrupole-time of flight (ESI-Q-TOF) mass spectrometry.

2.4.4 Nucleotide and protein sequence analyses

Identification of potential coding regions and predictions of protein molecular weights were done with DNA Strider version 1.2 (Marck 1988).
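The kind of open-reading-frame scan referred to above can be sketched in a few lines. This is not DNA Strider's algorithm, only a minimal illustration under stated assumptions: it scans the three forward frames of the given strand, starts an ORF at the first AUG/ATG after the previous stop, reports a 1-based start position, excludes the stop codon from the length, and ignores ORFs that run off the end of the sequence. The function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> tuple[int, int, int]:
    """Return (start, length_nt, n_residues) of the longest forward-strand ORF.

    start is 1-based; length_nt excludes the stop codon; (0, 0, 0) means no ORF found.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    best = (0, 0, 0)
    for frame in range(3):
        orf_start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if orf_start is None and codon == "ATG":
                orf_start = i
            elif orf_start is not None and codon in STOP_CODONS:
                length = i - orf_start
                if length > best[1]:
                    best = (orf_start + 1, length, length // 3)
                orf_start = None
    return best
```

The same arithmetic links the figures reported for the HaRNAV genome: an ORF of 7743 nt corresponds to 7743 / 3 = 2581 codons.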
Analysis of RNA sequences for potential secondary structure was done with the web-based version of the program mfold v3.0 (Mathews et al. 1999, Zuker et al. 1999). Database searches were performed with the BLAST algorithm (Altschul et al. 1997).

2.4.5 Sequence alignments

The viruses from which sequences were used for the phylogenetic analyses are listed with their GenBank sequence accession numbers in Table 2.3. Each protein sequence group was aligned using CLUSTAL X v1.81 (Thompson et al. 1997) with the BLOSUM series protein weight matrix (Henikoff & Henikoff 1992). The complete sequence alignments are available from the authors upon request.

2.4.6 Phylogenetic tree construction and presentation

The protein sequence alignments were transformed into maximum likelihood distances using TREE-PUZZLE v5.0 (Strimmer & von Haeseler 1996) and 25000 puzzling steps. The default multiple substitution matrix chosen by TREE-PUZZLE was used [Variable Time (VT), Muller and Vingron 2000]. Trees were plotted using TreeView v1.6.5 (Page 1996). For the RdRp tree, sequences from the Potyviridae (TEV and BaYMV) were used for an outgroup because they are in a separate lineage relative to other picorna-like virus RdRp sequences (Koonin & Dolja 1993). Neighbor-joining trees were constructed with PAUP* v4.0 (Swofford 2000), and bootstrap values calculated based on percentages of 1000 replicates are shown on the trees.

2.5 Tables and Figures

Table 2.1 Primers used for cDNA synthesis, PCR and RT-PCR.
Primer   Sequence (5'-3')         Location    Strand based on
263P1    CTCGCTCAACAGGTACACAA     7623-7642   +
263P2    CTTCCCGCATTCAGTTCG       1986-2003   -
263P4    AAAAGTGATGATGTTTGAAGAC   2298-2319   +
263P5    CCGATGTAGAAGTGGGTAGAT    6434-6454   -
263P6    CGATTTTGTGAGCATTGGG      8139-8157   +
263P7    AGCACCCGTAACTTTTCACTGT   2048-2069   -
263P8    GGGTCTAAATCACCACTAACTG   1241-1262   -
263P9    TGGTGATTTGGCTTCTATTT     5916-5935   +
263P10   ACAACTTTCCATACCACCCTC    5043-5063   -
263P11   TGGTACTGCGTGGTTTTACT     3069-3088   +
263P12   ATTTCCGCCGATCTGATT       4281-4298   -
263P13   TACCTACGAGTGTGGAAAATG    3986-4006   +
263P14   CTCTGGTTTGTTGGCGG        7794-7810   +
263P15   TTTTCTGCCTGCTTGACG       591-608     -
263P16   GGTCCGCCGCAAACATCA       666-683     +
263P17   GCCTGTCACCAACTACAAAAT    3662-3682   -
263P18   GGTTGATTGGTGCTTGG        7954-7970   -
263P19   TGGATTCTACACGCAAAGTT     485-504     +

Table 2.2 N-terminal sequences of proteins from purified HaRNAV particles

Band [a]   Protein mol wt (kDa) [b]   N-terminal sequence [c]   Position of N terminus in polyprotein   Theoretical protein mol wt (kDa) [d]   Sequence at cut site
1          39                         SEIVEYXKGEHXGGD [e]       1990                                    36                                     PTST-SEIV
2          33                         SEIISESGADPTLVL [e]       2318                                    29                                     FVST-SEII
5          29                         SRPDLLGAPEPFVPR [e]       2060                                    29                                     LFGY-SRPP
6          26                         S(or T)ETLCN [f]          1776                                    24                                     EKLL-TETL
7          24                         VDGDLASILSAPRTV [e]       1810                                    20                                     RPGE-VDGD

[a] As labeled in Figure 2.4
[b] Based on SDS-PAGE
[c] X indicates it was not possible to discern the amino acid identity at this position
[d] Based on the genome sequence data and using the determined N-termini as boundaries
[e] Only the first 15 residues of the determined sequence are shown
[f] Only the first 6 residues were determined

Table 2.3 Summary of viruses used in phylogenetic analyses

Virus                              Abbreviation   Accession number
Acute bee paralysis virus          ABPV           NC_002548
Acyrthosiphon pisum virus          APV            NC_003780
Barley yellow mosaic virus         BaYMV          NC_002990
Black queen cell virus             BQCV           NC_003784
Carnation mottle virus             CarMV          NC_001265
Cowpea mosaic virus                CPMV           NC_003549
Cricket paralysis virus            CrPV           NC_003924
Drosophila C virus                 DCV            NC_001834
Feline calicivirus                 FCV            NC_001481
Heterosigma akashiwo RNA virus     HaRNAV         AY337486
Himetobi P virus                   HiPV           NC_003782
Human rhinovirus 14                HRV            NC_001490
Human poliovirus                   PV             NC_002058
Infectious flacherie virus         IFV            NC_003781
Parsnip yellow fleck virus         PYFV           NC_003628
Perina nuda picorna-like virus     PnPV           NC_003113
Rabbit hemorrhagic disease virus   RHDV           NC_001543
Rhopalosiphum padi virus           RhPV           NC_001874
Rice tungro spherical virus        RTSV           NC_001632
Sacbrood virus                     SbV            NC_002066
Taura syndrome virus               TSV            NC_003005
Tobacco etch virus                 TEV            NC_001555
Tomato ringspot virus              ToRSV          NC_003840

Figure 2.1 Analysis of the HaRNAV genome sequence for open reading frames, and coverage of the genome by PCR and cloning. (A) Analysis for possible open reading frames. For each reading frame (labeled on the left of the figure), potential start codons (AUG) are shown with a half-height line and stop codons (UGA, UAA, and UAG) are shown by full-height lines. Reading frame 1 contains a large open reading frame starting at position 484 that is 7743 nts long and predicted to encode a 2581-amino-acid residue protein. (B) Coverage of the HaRNAV genome with PCR, RACE, and cDNA clones. The approximate locations of primers used (Table 2.1) are shown as arrowheads that reflect the direction of priming for each primer. PCR products obtained with viral-specific primers following first-strand cDNA synthesis (PCR), the 5' and 3' RACE clones (RACE), and cDNA clones obtained by PCR involving d(N)10T primers (cDNA) are shown as lines. Details on the generation of the fragments/clones are given in Materials and Methods.

Figure 2.2 Sequence of the 5' untranslated region of the HaRNAV genome. The first 495 nts of the genome sequence are shown. The arrowheads above the sequence mark the locations found as ends in nine 5' RACE clones, and the numbers above the arrowheads indicate the number of clones that ended at each position. The location of the stem of a predicted stem-loop structure at the 5' end is underlined; 14 of the 15 underlined bases can hybridize. The large open reading frame predicted to encode the single HaRNAV polyprotein begins at position 484 and the first four predicted amino acid residues are shown below the nucleotide sequence. A pyrimidine-rich region upstream of the predicted start codon, possibly important for polyprotein translation (Pestova et al. 1991), is double-underlined.

Figure 2.3 Demonstration that the HaRNAV genome is positive-stranded. First strand cDNA synthesis was performed with a primer that would bind to a positive-stranded genome (+) or a primer that would bind to a negative-stranded genome (-), followed by PCR with both primers (see Materials and Methods). A product is only visible in the + lane, indicating the genome is positive-stranded.

Figure 2.4 Analysis of structural proteins from HaRNAV particles. HaRNAV particles from the noninfectious (upper layer) and the infectious (lower layer) samples obtained during sucrose gradient purification were subjected to SDS-PAGE. Bands discussed in the manuscript are labeled (1-7).

Figure 2.5 Representation of the predicted HaRNAV polyprotein. The N- and C-termini are indicated on the ends. The locations of conserved protein domains are indicated within the polyprotein box: RNA virus helicase (H), protease (P) and RNA-dependent RNA polymerase (RdRp), structural proteins (VP3 and VP1).
The location of N-termini found by N-terminal sequencing of the HaRNAV structural proteins are shown by heavy vertical black lines and labeled above by their corresponding protein band number (Figure 2.4, Table 2.2). The theoretical molecular weights (kDa) of proteins between the N-termini are shown below the lines underneath the polyprotein box. The regions of the polyprotein sequence that contain the peptide sequences found by mass spectrometry sequencing of proteins are indicated by the lines above the polyprotein box and labeled with their corresponding band number from Figure 2.4.

[Figure 2.6 alignment panels: A. RNA-dependent RNA polymerase (HaRNAV, ToRSV, ABPV); B. helicase (HaRNAV, HRV14, ABPV); C. capsid VP3 (HaRNAV, CrPV)]

Figure 2.6 Alignment of HaRNAV sequences with sequences from other viruses. Alignments were done with CLUSTAL X v1.81 (Thompson et al. 1997), and the presence of positively scoring groups is indicated above the aligned sequences as defined in CLUSTAL: "*" indicates a fully conserved residue, ":" indicates full conservation of a strong group, and "." indicates full conservation of a weak group. The accession numbers for virus sequences used are given in Table 2.3. (A) Alignment of putative RNA-dependent RNA polymerase protein sequences (corresponding to conserved regions I-VIII; Koonin and Dolja 1993). The alignment comprises residues 1362-1619 of the predicted HaRNAV polyprotein, residues 1691-1950 of the ToRSV polyprotein, and residues 1565-1840 of the ABPV replicase polyprotein. (B) Alignment of putative RNA helicase domain sequences.
The alignment comprises residues 524-686 of the predicted HaRNAV polyprotein, residues 1207-1344 of the HRV14 polyprotein, and residues 522-675 of the ABPV replicase polyprotein. (C) Alignment of viral capsid protein sequences. The HaRNAV VP3-like protein sequence (residues 2060-2317 of the predicted sequence, corresponding to protein Band 5; Figures 2.4 and 2.5) was aligned with residues 341-622 of the CrPV structural polyprotein sequence (corresponding to the VP3 capsid protein; Tate et al. 1999).

[Figure 2.7 tree; labeled groups: Iflaviruses, Dicistroviridae, Caliciviridae, Picornaviridae, Sequiviridae, Comoviridae, Potyviridae; scale bar 0.1]

Figure 2.7 Phylogenetic analysis of RNA-dependent RNA polymerase domain protein sequences. Virus classifications are indicated, and the accession numbers and abbreviations for the viruses are listed in Table 2.3. The tree is based on maximum likelihood distances and the Potyviridae sequences from tobacco etch virus (TEV) and barley yellow mosaic virus (BaYMV) were used as an outgroup (outgroup clade support values are 71/100, see Materials and Methods). Support values based on 25000 puzzling steps are shown above the branches. Bootstrap values (percentages based on 1000 replicates) for branches that are supported by >50% by neighbor-joining analysis are labeled below the branches (a dash indicates there was no corresponding branch supported by >50% in the neighbor-joining tree). The maximum likelihood distance scale bar is shown.

[Figure 2.8 tree; labeled groups: Dicistroviridae, Picornaviridae, Sequiviridae; scale bar 0.1]

Figure 2.8 Phylogenetic analysis of concatenated (putative) helicase/RdRp/VP3-like capsid protein sequences. Virus classifications are indicated, and the accession numbers and abbreviations for the viruses are listed in Table 2.3.
The unrooted tree is based on maximum likelihood distances and the maximum likelihood distance scale bar is shown. Support values based on 25000 puzzling steps are shown for the branches followed by bootstrap values (percentages based on 1000 replicates) from the neighbor-joining analysis.

2.6 References

Allison, R., R. E. Johnston, and W. G. Dougherty. 1986. The nucleotide sequence of the coding region of tobacco etch virus genomic RNA: evidence for the synthesis of a single polyprotein. Virology 154: 9-20.
Altschul, S. F., T. L. Madden, A. A. Schaffer, J. H. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25: 3389-3402.
Andino, R., G. E. Rieckhof, P. L. Achacoso, and D. Baltimore. 1993. Poliovirus RNA synthesis utilizes an RNP complex formed around the 5' end of viral RNA. EMBO Journal 12: 3587-3598.
Carter, M. J., I. D. Milton, J. Meanger, M. Bennett, R. M. Gaskell, and P. C. Turner. 1992. The complete nucleotide sequence of a feline calicivirus. Virology 190: 443-448.
Castberg, T., R. Thyrhaug, A. Larsen, R. A. Sandaa, M. Heldal, J. L. Van Etten, and G. Bratbak. 2002. Isolation and characterization of a virus that infects Emiliania huxleyi (Haptophyta). Journal of Phycology 38: 767-774.
Cottrell, M. T., and C. A. Suttle. 1991. Wide-spread occurrence and clonal variation in viruses which cause lysis of a cosmopolitan, eukaryotic marine phytoplankter, Micromonas pusilla. Marine Ecology Progress Series 78: 1-9.
Culley, A. I., A. S. Lang, and C. A. Suttle. 2003. High diversity of unknown picorna-like viruses in the sea. Nature 424: 1054-1057.
Friess-Klebl, A.
K., R. Knippers, and D. G. Muller. 1994. Isolation and characterization of a DNA virus infecting Feldmannia simplex (Phaeophyceae). Journal of Phycology 30: 653-658.
Hellen, C. U. T., and P. Sarnow. 2001. Internal ribosome entry sites in eukaryotic mRNA molecules. Genes and Development 15: 1593-1612.
Henikoff, S., and J. G. Henikoff. 1992. Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences of the United States of America 89: 10915-10919.
Jacobsen, A., G. Bratbak, and M. Heldal. 1996. Isolation and characterization of a virus infecting Phaeocystis pouchetii (Prymnesiophyceae). Journal of Phycology 32: 923-927.
Johnson, K. N., and P. D. Christian. 1998. The novel genome organization of the insect picorna-like virus Drosophila C virus suggests this virus belongs to a previously undescribed virus family. Journal of General Virology 79: 191-203.
Koonin, E. V., and V. V. Dolja. 1993. Evolution and taxonomy of positive-strand RNA viruses: implications of comparative-analysis of amino-acid-sequences. Critical Reviews in Biochemistry and Molecular Biology 28: 375-430.
Lawrence, J. E., A. M. Chan, and C. A. Suttle. 2001. A novel virus (HaNIV) causes lysis of the toxic bloom-forming alga Heterosigma akashiwo (Raphidophyceae). Journal of Phycology 37: 216-222.
Liljas, L., J. Tate, T. Lin, P. Christian, and J. E. Johnson. 2002. Evolutionary and taxonomic implications of conserved structural motifs between picornaviruses and insect picorna-like viruses. Archives of Virology 147: 59-84.
Marck, C. 1988. "DNA Strider": a "C" program for the fast analysis of DNA and protein sequences on the Apple Macintosh family of computers. Nucleic Acids Research 16: 1829-1836.
Mari, J., B. T. Poulos, D. V. Lightner, and J. R. Bonami. 2002. Shrimp Taura syndrome virus: genomic characterization and similarity with members of the genus Cricket paralysis-like viruses. Journal of General Virology 83: 915-926.
Martinez-Salas, E., R. Ramos, E. Lafuente, and S. L. De Quinto. 2001. Functional interactions in internal translation initiation directed by viral and cellular IRES elements. Journal of General Virology 82: 973-984.
Mathews, D. H., J. Sabina, M. Zuker, and D. H. Turner. 1999. Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. Journal of Molecular Biology 288: 911-940.
Mayer, J. A., and F. J. R. Taylor. 1979. Virus which lyses the marine nanoflagellate Micromonas pusilla. Nature 281: 299-301.
Moon, J. S., L. L. Domier, N. K. Mccoppin, C. J. D'arcy, and H. Jin. 1998. Nucleotide sequence analysis shows that Rhopalosiphum padi virus is a member of a novel group of insect-infecting RNA viruses. Virology 243: 54-65.
Muller, T., and M. Vingron. 2000. Modeling amino acid replacement. Journal of Computational Biology 7: 761-776.
Nagasaki, K., and M. Yamaguchi. 1997. Isolation of a virus infectious to the harmful bloom causing microalga Heterosigma akashiwo (Raphidophyceae). Aquatic Microbial Ecology 13: 135-140.
Nakashima, N., J. Sasaki, and S. Toriyama. 1999. Determining the nucleotide sequence and capsid-coding region of Himetobi P virus: a member of a novel group of RNA viruses that infect insects. Archives of Virology 144: 2051-2058.
Page, R. D. M. 1996. TREEVIEW: An application to display phylogenetic trees on personal computers.
Computer Applications in the Biosciences 12: 357-358.
Pestova, T. V., C. U. T. Hellen, and E. Wimmer. 1991. Translation of Poliovirus RNA: role of an essential cis-acting oligopyrimidine element within the 5' nontranslated region and involvement of a cellular 57-kilodalton protein. Journal of Virology 65: 6194-6204.
Sandaa, R. A., M. Heldal, T. Castberg, R. Thyrhaug, and G. Bratbak. 2001. Isolation and characterization of two viruses with large genome size infecting Chrysochromulina ericina (Prymnesiophyceae) and Pyramimonas orientalis (Prasinophyceae). Virology 290: 272-280.
Sarnow, P. 2003. Viral internal ribosome entry site elements: Novel ribosome-RNA complexes and roles in viral pathogenesis. Journal of Virology 77: 2801-2806.
Sasaki, J., N. Nakashima, H. Saito, and H. Noda. 1998. An insect picorna-like virus, Plautia stali intestine virus, has genes of capsid proteins in the 3' part of the genome. Virology 244: 50-58.
Strimmer, K., and A. von Haeseler. 1996. Quartet puzzling: A quartet maximum-likelihood method for reconstructing tree topologies. Molecular Biology and Evolution 13: 964-969.
Suttle, C. A. 2000. Viral infection of cyanobacteria and eukaryotic algae, p. 248-286. In C. Hurst [ed.], Viral Ecology. Academic Press.
Suttle, C. A., and A. M. Chan. 1995. Viruses infecting the marine Prymnesiophyte Chrysochromulina spp.: Isolation, preliminary characterization and natural abundance. Marine Ecology Progress Series 118: 275-282.
Suttle, C. A., A. M. Chan, and M. T. Cottrell. 1990. Infection of phytoplankton by viruses and reduction of primary productivity. Nature 347: 467-469.
Swofford, D. 2000. PAUP*: Phylogenetic Analysis Using Parsimony and other Methods 4.0. Sinauer Associates.
Tai, V., J. E. Lawrence, A. S. Lang, A. M. Chan, A. I. Culley, and C. A. Suttle. 2003. Characterization of HaRNAV, a single-stranded RNA virus causing lysis of Heterosigma akashiwo (Raphidophyceae). Journal of Phycology 39: 343-352.
Tarutani, K., K. Nagasaki, S.
Itakura, and M. Yamaguchi. 2001. Isolation of a virus infecting the novel shellfish-killing dinoflagellate Heterocapsa circularisquama. Aquatic Microbial Ecology 23: 103-111.
Tate, J., L. Liljas, P. Scotti, P. Christian, T. W. Lin, and J. E. Johnson. 1999. The crystal structure of Cricket paralysis virus: the first view of a new virus family. Nature Structural Biology 6: 765-774.
Taylor, F. J. R. 1990. Red tides, brown tides, and other harmful algal blooms: the view into the 1990s, p. 527-533. In E. Graneli, B. Sundstroem, L. Edler and D. M. Anderson [eds.], Toxic Marine Phytoplankton. Elsevier.
Thompson, J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, and D. G. Higgins. 1997. The CLUSTAL_X Windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Research 25: 4876-4882.
Van der Wilk, F., A. M. Dullemans, M. Verbeek, and J. Vandenheuvel. 1997. Nucleotide sequence and genomic organization of Acyrthosiphon pisum virus. Virology 238: 353-362.
Van Etten, J. L., and R. H. Meints. 1999. Giant viruses infecting algae. Annual Review of Microbiology 53: 447-494.
Van Etten, J. L., L. C. Lane, and R. H. Meints. 1991. Viruses and virus-like particles of eukaryotic algae. Microbiology and Molecular Biology Reviews 55: 586-620.
Van Munster, M., A. M. Dullemans, M. Verbeek, J. Van Den Heuvel, A. Clerivet, and F. Van Der Wilk. 2002. Sequence analysis and genomic organization of Aphid lethal paralysis virus: a new member of the family Dicistroviridae. Journal of General Virology 83: 3131-3138.
Vieira, J., and J. Messing. 1982. The pUC Plasmids, an M13mp7-derived system for insertion mutagenesis and sequencing with synthetic universal primers. Gene 19: 259-268.
Wang, I. N., Y. Li, Q. D. Que, M. Bhattacharya, L. C. Lane, W. G. Chaney, and J. L. Van Etten. 1993. Evidence for virus-encoded glycosylation specificity.
Proceedings of the National Academy of Sciences of the United States of America 90: 3840-3844.
Wu, C. Y., C. F. Lo, C. J. Huang, H. T. Yu, and C. H. Wang. 2002. The complete genome sequence of Perina nuda picorna-like virus, an insect-infecting RNA virus with a genome organization similar to that of the mammalian picornaviruses. Virology 294: 312-323.
Zuker, M., D. H. Mathews, and D. H. Turner. 1999. Algorithms and thermodynamics for RNA secondary structure prediction: a practical guide, p. 11-43. In J. Barciszewski and B. F. C. Clark [eds.], RNA Biochemistry and Biotechnology. Kluwer Academic Publishers.

Chapter III. High diversity of unknown picorna-like viruses in the sea

A version of this chapter has been published: Culley, A. I., A. S. Lang, and C. A. Suttle. 2003. High diversity of unknown picorna-like viruses in the sea. Nature 424: 1054-1057.

3.1 Introduction

Viruses are extremely abundant and geochemically significant agents of microbial mortality in the ocean (Fuhrman 1999, Wilhelm & Suttle 1999). They comprise a morphologically and genetically diverse array of pathogens, some of which infect heterotrophic bacteria and cyanobacteria, as well as photosynthetic and non-photosynthetic protists (Suttle 2000, Wommack & Colwell 2000, Mann 2003). On the basis of a variety of evidence, marine viral communities have been assumed to consist almost entirely of dsDNA viruses; consequently, little effort has been made to examine natural communities of RNA viruses. There are data, however, to indicate that marine RNA viruses are also important. For example, rhabdoviruses and paramyxoviruses are negative-sense, ssRNA viruses that are major pathogens of fish (Bernard & Bremont 1995) and marine mammals (Van Bressem et al. 1999), respectively. In addition, picorna-like viruses have been isolated that are pathogens of penaeid shrimp (Mari et al. 2002), seals (Smith 2000) and whales (Smith 2000).
3.2 Results and Discussion

We used available sequences in GenBank to design degenerate primers that target the highly conserved RdRp sequence in picorna-like viruses. These primers in conjunction with RT-PCR were used to assay for the presence of picorna-like viruses in the coastal waters of British Columbia, Canada. With the exception of retroviruses, all RNA viruses encode an RdRp, which is essential for replication. Within the RdRp protein sequence, several motifs have been identified which are homologous among diverse species of RNA viruses (Poch et al. 1989, Koonin & Dolja 1993). Groups based on alignments of these conserved regions are congruent with presently defined RNA virus families (Koonin 1991, Zanotto et al. 1996). Families of picorna-like viruses classified on the basis of RdRp sequence data are congruent with families of picorna-like viruses classified according to virus structure, host and epidemiology. Consequently, sequence analysis can be used to infer the relationship of sequences from natural viral communities to known families of picorna-like viruses. Viruses were concentrated from seawater using ultrafiltration (Suttle et al. 1991) in the spring and summer of 1996, 1997 and 2000. Viral RdRp sequences were amplified from 13 of the 21 viral communities examined. Amplification occurred in samples collected from oceanographically diverse environments, including anthropogenically influenced sites, estuarine environments, stations with a well-mixed water column and pristine, highly stratified fjords. Preliminary analysis of the environmental sequences by BLAST (Altschul et al. 1997) searches of the GenBank database gave high similarities to several picorna-like virus family RdRp sequences as well as an unidentified viral sequence from a Chinese clam homogenate (Kingsley et al. 2002). These results suggested that picorna-like viruses are prevalent in a variety of marine waters.
To confirm the amplified products originated from picorna-like viruses, a selection of the amplicons were sequenced, and along with representative sequences from known picorna-like virus families, used to construct phylogenetic trees. All sequenced environmental PCR products translated into continuous amino acid sequences that contained the signature positive-stranded ssRNA virus RdRp motif GDD (Kamer & Argos 1984). Phylogenetic analysis of the RdRp fragment amplified in this study resolved all established picorna-like families (Figure 3.1). Strikingly, none of the environmental sequences fell within established families of picorna-like viruses, but rather into four previously unknown and distantly related groups that we refer to as A, B, C, and D. Our results suggest that at least two and possibly all four of these groups represent new families of RNA viruses. Phylogenetic trees constructed with both maximum-likelihood and neighbor-joining methods group the environmental sequences outside established picorna-like virus families (Figure 3.1). However, low bootstrap support prevents us from drawing any further conclusions regarding the evolutionary relationship between groups B, C, and D. Interestingly, a single sample (JP800) collected from English Bay, which is adjacent to the Strait of Georgia and the city of Vancouver, contained representatives from three of the four novel clades, indicating that a high diversity of picorna-like viruses exists even within a single water sample. Sequence identity among the clades of environmental picorna-like virus sequences was low, ranging from 38.9% to 54.6% nucleotide identity and 21.8% to 52.9% identity when translated to amino acids. In contrast, sequences within each clade were highly conserved and ranged from 97.7% to 100.0% and 95.9% to 100.0% identical on nucleotide and amino-acid levels, respectively. 
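The check described above, that each amplicon translates into a continuous amino acid sequence containing the GDD motif, can be illustrated with a minimal sketch. It is not the procedure used in this study: the function names, the restriction to the three forward reading frames and the example sequence in the usage note are assumptions made for illustration.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, frame=0):
    """Translate a nucleotide string in one forward frame (0, 1 or 2).

    Codons containing ambiguous bases are reported as 'X'.
    """
    dna = dna.upper().replace("U", "T")
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def open_frames_with_motif(dna, motif="GDD"):
    """Return the forward frames that have no stop codon and contain the motif."""
    hits = []
    for frame in range(3):
        protein = translate(dna, frame)
        if "*" not in protein and motif in protein:
            hits.append(frame)
    return hits
```

For the made-up sequence `ATGGGTGATGATAAA`, `open_frames_with_motif` returns `[0]`, because only the first frame is free of stop codons and encodes GDD.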
Interestingly, within group A, although no sequences were identical on a nucleotide level, six were identical when translated into amino acids; this suggests that the observed nucleotide differences may be real and not due to methodological error. Although four groups of picorna-like viruses were discovered in this study, one of these groups (group A) contains a virus (HaRNAV, Tai et al. 2003) that causes the lysis of Heterosigma akashiwo, a toxic-bloom-forming alga that is responsible for major fish deaths in temperate waters (Smayda 1998). Nucleotide sequences from HaRNAV were 98.8% to 99.7% identical to sequences from group A, implying that there are a number of viruses closely related to HaRNAV that belong within this group. Interestingly, we were able to amplify sequences belonging to group A from samples collected in different locations, seasons and years, suggesting that HaRNAV-like viruses are reoccurring and widely distributed in the Strait of Georgia. Our results indicate that there is a diverse but previously unknown community of picorna-like viruses that are persistently occurring and widespread in the ocean. The fact that sequences from four stations resulted in at least two novel families of picorna-like viruses suggests that the diversity of these viruses in the ocean is high. Branch lengths between groups B, C, and D are similar to those among families and genera of known picorna-like viruses (Figure 3.1), suggesting that these groups are representative of at least three novel genera within a new family, or, in fact, may represent three previously unknown families. However, bias is undoubtedly associated with each step of the methods used in this research, including concentration and extraction of the RNA virus community, primer design, cDNA synthesis and PCR amplification and cloning (Von Wintzingerode et al. 1997).
Thus it is more than likely that these results are an underestimate of marine picorna-like virus richness (see Chapter VI for a more detailed discussion of methodological bias). Viruses are obligate pathogens that generally remain infectious for a relatively short time in natural marine waters (Suttle & Chen 1992). The repeated amplification of picorna-like virus sequences from the same geographic location in water samples collected over a four-year period implies persistent viral production and therefore infection. The significant, high degree of identity between a sequence amplified from clams harvested from Asian waters and marine group B sequences from this study suggests that picorna-like viruses are likely to be present in a wide range of marine environments. Furthermore, the few isolates of marine picorna-like viruses infect ecologically and economically important organisms. These include HaRNAV (Tai et al. 2003), which infects the red tide-forming, fish-killing raphidophyte Heterosigma akashiwo, Taura syndrome virus (TSV), which infects penaeid shrimp (Mari et al. 2002), an intensely farmed, important member of the marine food web, and San Miguel Sea Lion viruses (SMSVs), which infect pinnipeds such as the California Sea Lion (Zalophus californianus) (Smith et al. 1973). Ultimately these newly identified viruses should be isolated, sequenced in full and their hosts identified. In the terrestrial environment, picorna-like viruses infect a wide variety of organisms and are responsible for several important diseases; it seems likely that they will prove to be important players in marine ecosystems as well.

3.3 Materials and Methods

3.3.1 Sample collection and preparation

Viruses from 40 to 200 liters of seawater were concentrated from stations in and adjacent to the Strait of Georgia, British Columbia, between May and August during 1996, 1997 and 2000 aboard the CCGS Vector as described (Suttle et al. 1991).
Two milliliters from each concentrated viral community was pelleted by ultracentrifugation and RNA extracted from resuspended pellets using Trizol-LS (Invitrogen, Burlington, Canada) as per the manufacturer's protocol.

3.3.2 RT-PCR

The degenerate primers RdRp 1 (positive-strand, 5'-GGA/GGAC/TTACAG/CCIA/GA/TTTTGAT-3') and RdRp 2 (5'-A/CACCCAACG/TA/CCG/TCTTG/CAA/GA/GAA-3') were designed from an alignment of the putative RdRp sequences from several picorna-like viruses in the NCBI database. To confirm the identity of environmental PCR products, the 454-base pair (bp) target fragment includes a highly conserved amino acid motif (GDD) characteristic of the RdRp of positive-strand RNA viruses (Kamer & Argos 1984). Complementary DNA was synthesized with SuperScript II RNase H- Reverse Transcriptase (Invitrogen) with the reagents provided using 5 µl of extracted RNA and the primer RdRp 2. Subsequently, PCR was performed with RdRp 1 and RdRp 2 primers. PCR products were separated on a 1.5% agarose gel and bands of the appropriate size (approximately 500 bp) were excised. Washed agarose plugs were used as the template in a second round of PCR with the RdRp primer set. Positive and negative controls were done in parallel for the entire procedure. These RdRp fragments were either directly cloned or separated using denaturing gradient gel electrophoresis (DGGE). DGGE was conducted using 25% to 40% linear denaturant gradient, 7% to 8% linear polyacrylamide gradient gels, as described (Short & Suttle 2002). DGGE bands were excised, re-suspended and amplified in a third round of PCR with RdRp primers.

3.3.3 Cloning and sequencing

Second-round PCR fragments or re-amplified excised DGGE bands were cloned in the pGEM-T (Promega) vector using the manufacturer's protocol. Recombinants containing the cloned insert were identified using PCR with universal -21 M13 (5'-GTTTTCCCAGTCACGACGTTGTA-3') and M13R (5'-CAGGAAACAGCTATGACC-3') primers.
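The degeneracy of the two RdRp primers given in section 3.3.2 can be worked out with a short sketch. It assumes that each slash in the primer notation joins the single base before it to the single base after it, and that inosine (I) counts as one base. The function names are hypothetical.

```python
import math


def parse_slash_primer(primer):
    """Split a slash-notation primer into one string of allowed bases per position.

    Assumption: a slash joins the base just read with the single base that
    follows it, so 'GGA/GG' is read as G, G, (A or G), G.
    """
    positions = []
    i = 0
    while i < len(primer):
        options = primer[i]
        i += 1
        while i < len(primer) and primer[i] == "/":
            options += primer[i + 1]
            i += 2
        positions.append(options)
    return positions


def degeneracy(primer):
    """Number of distinct oligonucleotides represented by the primer."""
    return math.prod(len(options) for options in parse_slash_primer(primer))


# Primer strings as printed in section 3.3.2, without the 5' and 3' labels.
RDRP_1 = "GGA/GGAC/TTACAG/CCIA/GA/TTTTGAT"
RDRP_2 = "A/CACCCAACG/TA/CCG/TCTTG/CAA/GA/GAA"
```

Under these assumptions both primers are 21 positions long, RdRp 1 represents 32 distinct sequences and RdRp 2 represents 128.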
Cloned, second-round PCR products were screened by restriction endonuclease digestion. PCR products from plasmids with DGGE bands and second-round PCR inserts with unique digestion patterns were sequenced. PCR fragments were sequenced at the University of British Columbia Nucleic Acid and Protein Service Facility (Vancouver, Canada). Conserved regions of translated sequences were aligned with CLUSTAL X v1.81 (Thompson et al. 1997) and then transformed into maximum-likelihood distances using the WAG matrix (Whelan & Goldman 2001) in TREE-PUZZLE v5.0 (Schmidt et al. 2002) and 25000 puzzling steps. Neighbor-joining bootstrap values were calculated based on 1000 replicates using FITCH v.3.6 (Fitch & Margoliash 1967). The name, acronym and accession number of viruses used in Figure 3.1 are: Aichi virus (AiV), NC_001918; broad bean wilt virus 2 (BBWV2), AB013615; cowpea mosaic virus (CPMV), NC_003549; Cricket paralysis virus (CrPV), NC_003924; Drosophila C virus (DCV), NC_001834; encephalomyocarditis virus (EMCV), NC_001479; equine rhinitis B virus (ERBV), NC_003983; foot-and-mouth disease virus (FMDV), NC_004004; human rhinovirus A (HRV), NC_001490; maize chlorotic dwarf virus (MCDV), NC_003626; parsnip yellow fleck virus (PYFV), NC_003628; poliovirus (PV), NC_002058; porcine teschovirus (PTV), NC_003985; potato virus Y (PVY), NC_001616; rabbit haemorrhagic disease virus (RHDV), NC_001543; rice tungro spherical virus (RTSV), NC_001632; ryegrass mosaic virus (RGMV), NC_001814; Sapporo virus (SV), U65427; sweet potato mild mottle virus (SPMMV), NC_003797; swine vesicular exanthema virus (VESV), NC_002551; Taura syndrome virus (TSV), NC_003005; tomato ringspot virus (ToRSV), NC_003840; wheat streak mosaic virus (WSMV), NC_001886. We determined an error rate of 0.3% to 0.9% based on five independent amplifications using RT-PCR of a picorna-like virus isolate, implying that sequences less than 99.0% identical are probably different.
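The 99.0% identity cut-off implied by the measured error rate can be expressed as a small sketch. How gapped alignment columns were treated is not stated in the text, so skipping them here is an assumption, as are the function names.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity between two aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are skipped; this is an
    assumption of the sketch, not a documented choice of the study.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def probably_different(seq_a, seq_b, threshold=99.0):
    """True when identity falls below the threshold implied by the error rate."""
    return percent_identity(seq_a, seq_b) < threshold
```

With a 0.9% upper bound on the error rate, two reads of the same template are expected to be at least about 99% identical, so only pairs below that value are treated as distinct sequences.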
3.4 Tables and Figures

Table 3.1 Sequence details

Name | Sample collection date (mm/dd/yyyy) | Sampling site | Latitude | Longitude | Depth (m) | Sample volume (L)
FRP896-1 | 08/26/1996 | Fraser River Plume | 49° 08' N | 123° 31' W | | 200
FRP896-2 | 08/26/1996 | Fraser River Plume | 49° 08' N | 123° 31' W | | 200
FRP896-3 | 08/26/1996 | Fraser River Plume | 49° 08' N | 123° 31' W | | 200
FRP896-4 | 08/26/1996 | Fraser River Plume | 49° 08' N | 123° 31' W | | 200
FRP896-5 | 08/26/1996 | Fraser River Plume | 49° 08' N | 123° 31' W | | 200
FRP896-6 | 08/26/1996 | Fraser River Plume | 49° 08' N | 123° 31' W | | 200
FRP896-7 | 08/26/1996 | Fraser River Plume | 49° 08' N | 123° 31' W | | 200
JP500-1 | 05/26/2000 | Jericho Pier | 49° 16' N | 123° 12' W | 0 | 40
JP700-1 | 07/13/2000 | Jericho Pier | 49° 16' N | 123° 12' W | 0 | 40
JP800-1 | 08/17/2000 | Jericho Pier | 49° 16' N | 123° 12' W | 0 | 40
JP800-2 | 08/17/2000 | Jericho Pier | 49° 16' N | 123° 12' W | 0 | 40
JP800-3 | 08/17/2000 | Jericho Pier | 49° 16' N | 123° 12' W | 0 | 40
JP800-4 | 08/17/2000 | Jericho Pier | 49° 16' N | 123° 12' W | 0 | 40
JP800-5 | 08/17/2000 | Jericho Pier | 49° 16' N | 123° 12' W | 0 | 40
JP800-6 | 08/17/2000 | Jericho Pier | 49° 16' N | 123° 12' W | 0 | 40
JP800-7 | 08/17/2000 | Jericho Pier | 49° 16' N | 123° 12' W | 0 | 40
JP800-8 | 08/17/2000 | Jericho Pier | 49° 16' N | 123° 12' W | 0 | 40
JP800-9 | 08/17/2000 | Jericho Pier | 49° 16' N | 123° 12' W | 0 | 40
JP800-10 | 08/17/2000 | Jericho Pier | 49° 16' N | 123° 12' W | 0 | 40
JP800-11 | 08/17/2000 | Jericho Pier | 49° 16' N | 123° 12' W | 0 | 40
[Figure 3.1: phylogenetic tree. Tip labels as printed: HaRNAV AY285768, JP800-11 AY285767, FRP896-7 AY285753, JP500-1 AY286754, JP800-4 AY285760, JP800-8 AY285764, FRP896-2 AY285748, FRP896-3 AY285749, FRP896-4 AY285750, FRP896-5 AY285751, FRP896-6 AY285752, JP800-1 AY285757, JP700-1 AY285755, JP800-7 AY285763, JP800-6 AY285762, FRP896-1 AY285747, JP800-2 AY285758, JP800-3 AY285759, JP800-5 AY285761, JP800-9 AY285765, JP800-10 AY285766, JP700-2 AY285766; CrPV, DCV, TSV, HRV, PV, ERBV, FMDV, EMCV, PTV, SV, VESV, RHDV, RTSV, CPMV, ToRSV, PVY, SPMMV, WSMV. Family labels: Dicistroviridae, Picornaviridae, Caliciviridae, Sequiviridae, Comoviridae, Potyviridae.]

Figure 3.1 Maximum-likelihood tree of RdRp sequences from environmental amplicons and representative viruses from picorna-like virus families (see Methods for complete virus names). Viruses from the Potyviridae, which contain RdRp sequences from a different lineage (Koonin & Dolja 1993), were used as an outgroup. Family names and group letters are shown. Environmental amplicons from coastal British Columbia are labeled by a two or three letter station designation, month, year, group sequence number and NCBI database accession number (SSSMYY-AA, see Table 3.1 for details). TREE-PUZZLE support values are shown for relevant nodes in boldface followed by bootstrap values based on neighbor-joining analysis. N indicates there was no corresponding node in the neighbor-joining tree. The maximum likelihood distance scale bar indicates a distance of 0.1.

3.5 References

Altschul, S. F., T. L. Madden, A. A. Schaffer, J. H. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25: 3389-3402.
Bailey, L., A. J. Gibbs, and R. D. Woods. 1964. Sacbrood virus of the larval honey bee (Apis mellifera Linnaeus). Virology 23: 425-429.
Bernard, J., and M. Bremont. 1995. Molecular biology of fish viruses - a review. Veterinary Research 26: 341-351.
Fitch, W. M., and E. Margoliash. 1967. Construction of phylogenetic trees. Science 155: 279-284.
Fuhrman, J. A. 1999. Marine viruses and their biogeochemical and ecological effects. Nature 399: 541-548.
Kamer, G., and P. Argos. 1984. Primary structural comparison of RNA-dependent polymerases from plant, animal and bacterial viruses. Nucleic Acids Research 12: 7269-7282.
Kingsley, D. H., G. K. Meade, and G. P. Richards. 2002. Detection of both hepatitis A virus and Norwalk-like virus in imported clams associated with food-borne illness. Applied and Environmental Microbiology 68: 3914-3918.
Koonin, E. V. 1991. The phylogeny of RNA-dependent RNA polymerases of positive-strand RNA viruses. Journal of General Virology 72: 2197-2206.
Lazarowitz, S. G. 2001. Plant Viruses, p. 533-598. In D. M. Knipe and P. M. Howley [eds.], Fields Virology. Lippincott, Williams & Wilkins.
Liljas, L., J. Tate, T. Lin, P. Christian, and J. E. Johnson. 2002. Evolutionary and taxonomic implications of conserved structural motifs between picornaviruses and insect picorna-like viruses. Archives of Virology 147: 59-84.
Mann, N. H. 2003. Phages of the marine cyanobacterial picophytoplankton. FEMS Microbiology Reviews 27: 17-34.
Mari, J., B. T. Poulos, D. V. Lightner, and J. R. Bonami. 2002. Shrimp Taura syndrome virus: genomic characterization and similarity with members of the genus Cricket paralysis-like viruses. Journal of General Virology 83: 915-926.
Pallansch, M. A., and R. P. Roos. 2001. Enteroviruses: Polioviruses, Coxsackieviruses, Echoviruses, and Newer Enteroviruses, p. 723-775. In D. M. Knipe and P. M. Howley [eds.], Fields Virology. Lippincott, Williams & Wilkins.
Poch, O., I. Sauvaget, M. Delarue, and N. Tordo. 1989. Identification of four conserved motifs among the RNA-dependent polymerase encoding elements. EMBO Journal 8: 3867-3874.
Schmidt, H. A., K. Strimmer, M. Vingron, and A. Von Haeseler. 2002.
TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18: 502-504.
Short, S. M., and C. A. Suttle. 2002. Sequence analysis of marine virus communities reveals that groups of related algal viruses are widely distributed in nature. Applied and Environmental Microbiology 68: 1290-1296.
Smayda, T. J. 1998. Ecophysiology and Bloom Dynamics of Heterosigma akashiwo (Raphidophyceae), p. 113-131. In D. M. Anderson, A. D. Cembella and G. M. Hallegraeff [eds.], Physiological Ecology of Harmful Algal Blooms. Springer-Verlag.
Smith, A. 2000. Aquatic Virus Cycles, p. 447-491. In C. Hurst [ed.], Viral Ecology. Academic Press.
Suttle, C. A. 2000. The ecological, evolutionary and geochemical consequences of viral infection of cyanobacteria and eukaryotic algae, p. 248-286. In C. J. Hurst [ed.], Viral Ecology. Academic Press.
Suttle, C. A., A. M. Chan, and M. T. Cottrell. 1991. Use of ultrafiltration to isolate viruses from seawater which are pathogens of marine phytoplankton. Applied and Environmental Microbiology 57: 721-726.
Tai, V., J. E. Lawrence, A. S. Lang, A. M. Chan, A. I. Culley, and C. A. Suttle. 2003. Characterization of HaRNAV, a single-stranded RNA virus causing lysis of Heterosigma akashiwo (Raphidophyceae). Journal of Phycology 39: 343-352.
Thompson, J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, and D. G. Higgins. 1997. The CLUSTAL_X Windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Research 25: 4876-4882.
Van Bressem, M. F., K. Van Waerebeek, and J. A. Raga. 1999. A review of virus infections of cetaceans and the potential impact of morbilliviruses, poxviruses and papillomaviruses on host population dynamics. Diseases of Aquatic Organisms 38: 53-65.
Von Wintzingerode, F., U. B. Gobel, and E. Stackebrandt. 1997. Determination of Microbial Diversity in Environmental Samples: Pitfalls of PCR-based rRNA analysis.
FEMS Microbiology Reviews 21: 213-229.
Whelan, S., and N. Goldman. 2001. A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Molecular Biology and Evolution 18: 691-699.
Wilhelm, S. W., and C. A. Suttle. 1999. Viruses and nutrient cycles in the sea. Bioscience 49: 781-788.
Wommack, K. E., and R. R. Colwell. 2000. Virioplankton: viruses in aquatic ecosystems. Microbiology and Molecular Biology Reviews 64: 69-114.
Zanotto, P. M. D., M. J. Gibbs, E. A. Gould, and E. C. Holmes. 1996. A reevaluation of the higher taxonomy of viruses based on RNA polymerases. Journal of Virology 70: 6083-6096.

Chapter IV. Metagenomic analysis of coastal RNA virus communities

A version of this chapter has been published: Culley, A. I., A. S. Lang, and C. A. Suttle. 2006. Metagenomic analysis of coastal RNA virus communities. Science 312: 1795-1798.

4.1 Introduction

High mutation rates and short generation times cause RNA viruses to exist as dynamic populations of genetic variants that are capable of using multiple host species (Drake & Holland 1999). In the oceans, the largest ecosystem on Earth, RNA viruses infect ecologically and economically important organisms at all trophic levels including heterotrophic bacteria (Borsheim 1993, see Chapter I section 1.1.8), fish (Kim et al. 2005), crustaceans (Mari et al. 2002), and marine mammals (Rima et al. 2005). Recently, a series of previously unknown RNA viruses have been characterized that infect marine phytoplankton. These include positive-sense single-stranded (ss)RNA viruses (HaRNAV and HcRNAV) that lyse the toxic bloom-formers Heterosigma akashiwo and Heterocapsa circularisquama (Lang et al. 2004, Nagasaki et al. 2005), a positive-sense ssRNA virus (RsRNAV) that infects the diatom Rhizosolenia setigera (Nagasaki et al.
2004), and a double-stranded (ds)RNA virus (MpRNAV) with a genome organization similar to reoviruses that infects the cosmopolitan species Micromonas pusilla (Brussard et al. 2004). Despite the apparent importance of RNA viruses to marine organisms, almost nothing is known about natural communities of RNA viruses in the sea. The most tantalizing evidence that the diversity of RNA viruses in the sea extends well beyond what has been revealed in culture comes from a study that used gene-specific primers to target a subset of picorna-like viruses (Culley et al. 2003). The work showed that these positive-sense ssRNA viruses are persistent, widespread and diverse members of marine virus communities. Cultivation-independent genomic approaches have recently been used to characterize entire microbial (DeLong & Karl 2005, DeLong et al. 2006) and bacteriophage (Breitbart et al. 2002, Breitbart et al. 2003) assemblages from a diversity of ecosystems. For this study we used randomly reverse-transcribed whole-genome shotgun sequencing to characterize the diversity of uncultivated marine RNA virus assemblages.

4.2 Results and Discussion

Natural virus communities were concentrated from English Bay at Jericho Pier (JP) and the Strait of Georgia (SOG), British Columbia, Canada (Table 4.1). RNA was extracted from the purified virus fraction, reverse-transcribed into cDNA, and used to construct libraries representative of the natural RNA virus communities (see sections 4.3.12 and 4.3.13 for details). Few sequence fragments [37 and 19% for JP and SOG, respectively (Figure 4.1)] showed significant similarity [tBLASTx (Altschul et al. 1997) E value < 0.001] to sequences in the National Center for Biotechnology Information (NCBI) database and no similarity to sequences from the Sargasso Sea microbial metagenome (Venter et al. 2004). In contrast, 90% of Sargasso Sea microbial sequence fragments are significantly similar to sequences in the NCBI database (Edwards & Rohwer 2005).
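The E value criterion used above can be illustrated with a minimal sketch that computes the percentage of sequence fragments with at least one significant hit. It assumes that the search results are available as standard 12-column tab-separated BLAST output, in which the E value is the eleventh field. The output format used in this study is not stated, and the function names and read identifiers are hypothetical.

```python
def queries_with_significant_hit(tabular_text, max_evalue=0.001):
    """Return the set of query IDs with at least one hit below the E value cut-off.

    Expects 12-column tab-separated BLAST output:
    query, subject, % identity, length, mismatches, gap opens,
    q. start, q. end, s. start, s. end, E value, bit score.
    """
    significant = set()
    for line in tabular_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query_id = fields[0]
        evalue = float(fields[10])
        if evalue < max_evalue:
            significant.add(query_id)
    return significant


def percent_significant(tabular_text, total_queries, max_evalue=0.001):
    """Percentage of all sequence fragments that have a significant hit."""
    hits = queries_with_significant_hit(tabular_text, max_evalue)
    return 100.0 * len(hits) / total_queries
```

The denominator is the total number of fragments searched, not the number of fragments that appear in the output, because fragments without any hit are absent from tabular BLAST results.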
These results imply that most RNA viruses in the sea are distantly related to known viruses and that their genetic diversity is much less explored relative to that of the prokaryotic community. Sequence similarity (tBLASTx E value < 0.001) in our samples revealed 98% of sequences belonged to positive-sense ssRNA viruses that infect eukaryotes. The one exception was a sequence with similarity to a dsRNA virus. No RNA phages were detected, supporting arguments that most marine phages have DNA genomes (Weinbauer 2004) and that the predominant hosts of marine RNA viruses are eukaryotes. In addition, no sequences were similar to retroviral or negative-sense ssRNA viruses. Our results are minimum estimates of the richness of marine viral communities because some viruses may have been excluded by our sampling methods (see section 6.1.1 for a discussion of sampling bias). Nonetheless, we observed sequences resembling those of tombusviruses (Lommel et al. 2004), umbraviruses (Taliansky et al. 2004) and nanoviruses (Vetten et al. 2004), all of which are viruses that have not previously been reported from aquatic environments (Figure 4.1 and Table 4.2). Most sequences with significant matches to known sequences (77%) were similar to viral genes with known functions, which is not surprising given the limited number of genes encoded by RNA viruses and their relatively small average genome size (Table 4.3). The sequence fragments from the two aquatic viral communities were assembled by using a minimum match percentage of 98% and an overlap of 20 base pairs (bp), the most stringent settings given the total introduced error of the RNA virus shotgun library construction method. Simulations demonstrated that these parameters correctly reassembled the genomes of different strains of the same species of RNA viruses from a random assortment of sequence fragments.
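The two assembly thresholds described above, read here as at least 98% matching bases over an overlap of at least 20 bp, can be illustrated with a minimal sketch. It only compares ungapped suffix and prefix overlaps between two fragments, which is a simplification and not the assembly program used in this study. The function names are hypothetical.

```python
def best_overlap(left, right, min_overlap=20, min_match_pct=98.0):
    """Length of the longest suffix of `left` that overlaps a prefix of `right`.

    An overlap is accepted when it is at least `min_overlap` bases long and at
    least `min_match_pct` percent of its aligned bases match (no gaps allowed).
    Returns 0 when no acceptable overlap exists.
    """
    longest = min(len(left), len(right))
    for length in range(longest, min_overlap - 1, -1):
        suffix = left[-length:]
        prefix = right[:length]
        matches = sum(a == b for a, b in zip(suffix, prefix))
        if 100.0 * matches / length >= min_match_pct:
            return length
    return 0


def merge_fragments(left, right, **thresholds):
    """Join two fragments into one contig, or return None if they do not overlap."""
    length = best_overlap(left, right, **thresholds)
    if length == 0:
        return None
    return left + right[length:]
```

With these settings an overlap shorter than 50 bp must match perfectly, because a single mismatch in 49 bases already lowers the match percentage below 98%.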
After assembly, 50% of JP and 36% of SOG sequence fragments overlapped with other sequence fragments and formed contiguous segments (contigs) of overlapping sequence fragments. In the JP library, 66% of the overlapping sequence fell within four large contigs, which were subsequently assembled into two complete viral genomes that are similar in structure (the JP genome organization schematic shown in Figure 4.2 is representative of both viruses) to each other but that differ from most other known picorna-like viruses (Figure 4.2). In contrast, over 90% of the remaining 14 contigs were formed from three sequence fragments or fewer, indicating that the JP RNA libraries are composed of two very abundant genotypes and others that were relatively rare. Similarly, the genotypic composition of the SOG library was also uneven, with 59% of the sequence fragments forming a single contig that contained most of a novel viral genome, including the 3' untranslated region (UTR), the structural proteins, and all eight conserved regions of the replicase (Koonin & Dolja 1993) (Fig. 4.2). All the remaining sequences fell into contigs composed of two or fewer fragments. Attempts to quantify the structure and diversity of the two RNA virus communities with Phage Communities from Contig Spectrum (PHACCS) (Angly et al. 2005), an online tool designed to estimate the diversity of phage communities on the basis of the frequency of overlapping sequence fragments from whole-genome shotgun libraries, failed primarily owing to the disproportionate contribution of sequence fragments from a small number of dominant genotypes to the total number of contigs in both RNA virus libraries.

The complete genomes assembled from the JP and SOG genomic libraries do not fall within any of the established families of RNA viruses. The JP genomes appear to be dicistronic single molecules of positive-sense ssRNA about 9 kbp in size (Figure 4.2).
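The unevenness described above can be checked with simple arithmetic on the contig distributions listed in section 4.3.13. The sketch below is an illustration: the dictionaries transcribe those distributions (contig size mapped to number of contigs), while the function and key names are hypothetical.

```python
# Contig size (number of fragments) -> number of contigs, from section 4.3.13.
JP_SPECTRUM = {23: 1, 20: 1, 18: 1, 11: 1, 6: 1, 4: 2, 3: 1, 2: 9, 1: 109}
SOG_SPECTRUM = {13: 1, 5: 1, 2: 2, 1: 39}


def summarize(spectrum):
    """Summarize a contig spectrum as fragment counts and simple ratios."""
    total = sum(size * count for size, count in spectrum.items())
    overlapping = sum(
        size * count for size, count in spectrum.items() if size > 1
    )
    largest = max(spectrum)
    return {
        "total_fragments": total,
        "overlapping_fragments": overlapping,
        "fraction_overlapping": overlapping / total,
        "largest_contig_share_of_overlapping": largest / overlapping,
    }


for name, spectrum in (("JP", JP_SPECTRUM), ("SOG", SOG_SPECTRUM)):
    print(name, summarize(spectrum))
```

With these inputs the totals are 216 (JP) and 61 (SOG) fragments, the overlapping fractions round to 0.50 and 0.36, and the largest SOG contig holds about 0.59 of the overlapping SOG fragments.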
The JP genomes have characteristics similar to viruses in the proposed order Picornavirales (Christian et al. 2005), including the gene order of putative nonstructural genes, a poly(A) tail, a similar G + C content, and core regions of sequence similarity. However, phylogenetic analysis based on aligned RNA-dependent RNA polymerase (RdRp) amino acid sequences placed the JP genomes definitively outside the family Dicistroviridae (Figure 4.3), the only dicistronic family of viruses in the proposed order Picornavirales. Instead, the sequences fell within a well-supported clade that included HaRNAV, RsRNAV, and SssRNAV, suggesting that they share a common ancestry with viruses that infect marine protists (Figure 4.3). Phylogenies based on alignments of RdRp sequences from RNA viruses were congruent with established family assignments (Culley et al. 2003, Koonin & Dolja 1993) and hence provided a means of classifying unknown RNA virus sequences from the environment.

Like the JP genomes, the SOG genome appears to be from a positive-sense ssRNA virus. BLASTp searches and phylogenetic analyses (Figure 4.4), as well as genomic features such as a putative polymerase domain interrupted by an in-frame termination codon and the absence of obvious helicase motifs (Figure 4.2) (Lommel et al. 2004), indicated similarity to viruses in the family Tombusviridae and the unassigned Umbravirus genus, which infect flowering plants. However, unlike these viruses, the SOG genome had no detectable movement protein (on the basis of sequence similarity) and is therefore unlikely to be from a virus that infects a terrestrial plant.

In the JP sample, 97% of the significant sequence matches were to viruses in the proposed order Picornavirales (Christian et al. 2005) (Figure 4.1 and Table 4.2). Of these, 43% were most similar to HaRNAV, which was first isolated from British Columbia waters (Tai et al.
2003) and which is the lone genome in the database for a picorna-like, phytoplankton-infecting RNA virus. Although the sequences were divergent from HaRNAV, the results suggest that related viruses were important members of the RNA virus community at the JP site. The second most frequent top scoring matches were to picorna-like virus RdRp sequence fragments amplified from the coastal waters of British Columbia (Culley et al. 2003), followed by matches to an array of Picornavirales sequences, including viruses infecting higher plants (apple latent spherical virus), arthropods (Taura syndrome virus), and mammals (foot-and-mouth disease virus) (Table 4.2). Nonetheless, the sequences were notably divergent from others in the database, showing that the marine viruses were distantly related to known RNA viruses. One sequence fragment was most similar to a rotavirus sequence, indicating that dsRNA viruses were also likely present. A significant match (tBLASTx E value = 3 × 10⁻²⁰) to the RdRp of Sclerophthora macrospora virus A, an unclassified positive-sense ssRNA mycovirus with a unique genome organization (Yokoi et al. 2003), further illustrates the genetic novelty of marine RNA viruses.

In contrast, in the SOG sample, 73% of the significant sequence matches and the largest contig containing the highest number of sequence fragments were similar to sequences from the Tombusviridae (Figure 4.1 and Table 4.2). Known members of this family infect higher plants and have positive-sense ssRNA genomes greater than 5.5 kbp in size (Lommel et al. 2004). Also present in the community were sequences similar to viruses in the genera Umbravirus and Nanovirus (Figure 4.1 and Table 4.2).

The SOG and JP assemblages had little similarity in community composition. Picorna-like virus RdRp sequences were amplified from the JP site but not the SOG sample (Culley et al. 2003).
tBLASTx searches among sequence fragments from both libraries resulted in 7% of SOG and 8% of JP having significant similarity (E value < 0.001) with each other. Numerous factors may have affected the composition of the JP and SOG virus communities, including salinity [12 parts per thousand (ppt) for JP versus 27 ppt for SOG], interannual variability (JP and SOG samples were collected in 2000 and 2004, respectively), and depth of sampling (JP is a surface sample whereas SOG was collected at 11 m; see Table 4.1 for additional station characteristics). An indirectly shared characteristic between the samples was that 4% of JP and 9% of SOG sequence fragments had significant similarity to the same cripavirus (KBV), although these sequences had no demonstrable homology. Our results demonstrate that marine RNA virus communities are distantly related to established groups of viruses. Both Bayesian (Altekar et al. 2004) and neighbor-joining (Swofford 2002) phylogenetic analyses of RdRp sequences strongly supported the occurrence of a distinct clade of marine picorna-like viruses. The only known viruses in this clade are HaRNAV, RsRNAV and SssRNAV (Fig. 4.3), all of which infect marine photosynthetic protists; hence, it seems likely that the environmental sequences were also from viruses that infect eukaryotic phytoplankton. Moreover, the large differences between the communities show that the RNA virus populations can differ greatly between two locations (i.e. they are not the same everywhere). The congruence between RdRp sequences and the established taxonomy of picorna-like viruses (Figure 4.3) suggests that the environmental RdRp sequences likely originate from 10 previously unknown genera of positive-sense ssRNA viruses. A second well-supported clade included two sequences that were related to sequences from viruses in the genus Cripavirus (Figure 4.3), a group of viruses known only to infect arthropods. 
This suggests these environmental sequences may have originated from viruses that infect marine arthropods. Phylogenetic analyses of RdRp sequences from the SOG library and representative members of the family Tombusviridae and genus Umbravirus indicated that the environmental sequences did not belong within established genera (Fig. 4.4) and clearly supported the existence of other marine RNA viruses that are only distantly related to extant taxa.

Our analyses suggest the existence of a diverse group of RNA viruses that included sequences related to viruses known to infect marine protists. Compared with the intensive sequencing required to characterize marine prokaryotic communities (Venter et al. 2004), the relatively small genome size of RNA viruses makes the construction of whole-genome shotgun libraries a realistic approach to rapidly survey the diversity of RNA virus communities. In conjunction with the isolation and the sequencing of individual RNA viruses, genomic surveys of RNA virus assemblages are an important step towards a greater understanding of the diversity and ecological impact of these pathogens in the ocean.

4.3 Materials and Methods

4.3.1 Station description

Seawater samples were collected from two stations: JP (Jericho Pier), a site in English Bay adjacent to the city of Vancouver, British Columbia, and SOG (Strait of Georgia), a site in the Strait of Georgia, which separates the mainland of British Columbia from Vancouver Island (Table 4.1). Although English Bay opens directly to the Strait of Georgia, the JP station is heavily influenced by freshwater from the Fraser River, whereas the SOG site is more influenced by water from the Pacific Ocean (Leblond 1983) (Table 4.1).

4.3.2 Virus Concentration

Concentrated virus communities were produced as described by Suttle et al. (1991).
Briefly, large volumes of seawater from JP (40 L) and SOG (60 L) were sequentially filtered through glass-fiber (nominal pore size 1.2 µm) and 0.45 µm pore-size Durapore PVDF (polyvinylidene fluoride) membranes (Millipore, Cambridge, Canada) to remove eukaryotic plankton and most prokaryotes. The remaining virus-sized particulate material in the filtrate was concentrated approximately 200-fold through a tangential flow filter cartridge (Millipore) with a 30 kDa molecular-weight cutoff. Remaining bacteria were removed by filtering each concentrate twice through a 0.22 µm pore-size Durapore PVDF membrane (Millipore). Virus-sized particles in each concentrate were pelleted by ultracentrifugation (5 h, 113,000 × g, 4 °C). Pellets were resuspended in sterile buffer (50 µM Tris HCl, pH 7.8) overnight at 4 °C.

4.3.3 RNase treatment and extraction

Before extraction, the concentrated lysates were treated with RNase (Roche, Mississauga, Canada) to remove non-encapsidated RNA (1 U RNase, 30 min incubation at 37 °C). Total nucleic acids were extracted with the QIAamp MinElute Virus Spin Kit (Qiagen, Mississauga, Canada) according to the manufacturer's instructions. All concentrations cited are final concentrations.

4.3.4 DNase I treatment

Total extracted viral nucleic acids were incubated with 1 U DNase I (Invitrogen, Burlington, Canada) and 1× DNase I buffer for 15 min at room temperature to remove DNA from the sample. The reaction was terminated by adding 2.5 mM EDTA and incubating for 15 min at 65 °C.

4.3.5 Universal rRNA PCR

An aliquot of purified nucleic acids from each sample was used in a PCR reaction with universal 16S primers to confirm the absence of bacterial nucleic acids in the sample using the following conditions: 1× Platinum Taq buffer (Invitrogen), 1.5 mM MgCl2, 0.2 mM each dNTP, 0.2 µM each universal rRNA primers GM3F (5'-AGAGTTTGATCCTGGC-3') and 907R (5'-CCGTCAATTCCTTTGAGTTT-3') (Nübel et al.
1997), 1 U of Platinum Taq DNA polymerase and 2 µl of template in a final volume of 50 µl. The following thermocycler conditions were used: 94 °C for 2 min, followed by 30 cycles of 94 °C for 60 s, 55 °C for 60 s and 72 °C for 90 s, and a final extension stage of 10 min at 72 °C. Aliquots of the amplification products were electrophoresed in a 1.5% agarose gel in 0.5× TBE buffer (45 mM Tris-borate, 1 mM EDTA [pH 8.0]). Gels were stained with ethidium bromide and visualized under UV illumination.

4.3.6 cDNA synthesis

The construction of a random shotgun clonal library from a community of viral RNA genomes uses some reagents and procedures found in the Superscript choice system for cDNA synthesis (Invitrogen). The overall approach is based on the Linker Amplified Shotgun Library method described at http://www.sci.sdsu.edu/PHAGE/LASL/ and first used to produce shotgun libraries of marine phages (Breitbart et al. 2002). To randomly synthesize cDNA from each extracted community of RNA virus genomes, 100 ng of random hexamers were added per sample and each reaction was heated to 70 °C for 10 min and immediately put on ice. 1× RT buffer, 10 mM DTT and 500 µM of each dNTP were added and the reaction incubated at 37 °C. After 2 min, 1 U of Superscript II reverse transcriptase (Invitrogen) was added and cDNA synthesis was performed at 37 °C for 60 min.

4.3.7 Second-strand synthesis

Double-stranded cDNA fragments were synthesized from each first strand synthesis reaction using nick translational replacement of genomic RNA (Okayama & Berg 1982) with the following conditions: 1× second strand buffer (Invitrogen), 250 µM of each dNTP, 10 U of E. coli DNA Ligase (Invitrogen), 40 U of E. coli DNA polymerase (Invitrogen) and 2 U of E. coli RNase H (Invitrogen) were added directly to each first strand reaction in a final volume of 150 µl. This reaction was incubated at 16 °C for 2 h.
To ensure products were blunt-ended, 10 U of T4 DNA polymerase (Invitrogen) was added to each reaction and incubated for an additional 5 min at 16 °C. Finally, EDTA was added to a final concentration of 30 mM to each sample to terminate the reaction.

4.3.8 Adapter addition

Blunted, double-stranded products from each sample were extracted with phenol:chloroform:isoamyl alcohol (25:24:1) in preparation for adapter ligation. EcoRI (NotI) (Invitrogen) adapters were added in the following reaction: 18 µl double-stranded cDNA sample resuspended in DEPC-treated water, 1× adapter buffer, 14 mM DTT, 200 µg/ml adapters and 100 U/ml T4 DNA ligase (Invitrogen). This reaction was incubated for 18 h at 16 °C and then heat-inactivated by a 10 min incubation at 70 °C.

4.3.9 Column chromatography

To increase the probability of cloning larger fragments, samples were size fractionated with a Sephacryl column (Invitrogen) according to the manufacturer's instructions. Fractions theoretically greater than 600 bp were EtOH precipitated and re-suspended in water in preparation for PCR.

4.3.10 Adapter-targeted PCR

Attempts to clone products directly failed due to insufficient insert concentration and therefore an amplification step was necessary. PCR was performed using a primer targeting a site on the EcoRI (NotI) adapter. The conditions were as follows: 1× Platinum Taq buffer, 3 mM MgCl2, 0.2 mM of each dNTP, 0.8 µM of primer (5'-CGGCCGCGTCGAC-3'), and 2.5 U of Platinum Taq DNA polymerase in a final volume of 50 µl. To reduce the chances of PCR-generated errors in early cycles (Rohwer et al. 2001), each PCR for each size fraction was divided into 5 smaller reactions of 10 µl each and thermo-cycled with the following conditions: 94 °C for 75 s, followed by 25 cycles of 94 °C for 45 s, 55 °C for 45 s and 72 °C for 150 s, and a final extension for 10 min at 72 °C. Completed reactions were pooled and the products purified with a PCR Minelute cleanup kit (Qiagen).
4.3.11 Cloning & Sequencing

Products from each purified reaction were cloned with a TOPO TA Cloning Kit (Invitrogen) as per the manufacturer's protocol. Clones were analyzed by PCR with the vector-specific primers T3 (5'-ATTAACCCTCACTAAAGGGA-3') and T7 (5'-TAATACGACTCACTATAGGG-3'). After electrophoresis, the remaining PCR products from reactions demonstrating inserts greater than 600 bp were purified with a PCR Minelute cleanup kit (Qiagen) and sequenced using Applied Biosystems BigDye v3.1 Terminator Chemistry. Sequencing services were provided by University of British Columbia's Nucleic Acid and Protein Service Facility (Vancouver, Canada).

4.3.12 Sequence fragment classification

Clones were sequenced with the T3 forward primer, resulting in 247 sequences in JP and 108 in SOG. These sequences were compared against GenBank with tBLASTx (Altschul et al. 1997) and the Sargasso Sea environmental metagenome (Venter et al. 2004) with BLASTx. A sequence was considered significantly similar if BLAST E values were < 0.001. Based on the sequence with the tBLASTx hit with the lowest E value, sequences were classified into one of three biological categories: viral, bacterial or unknown. tBLASTx searches between the two libraries were performed with the stand-alone BLAST tools available at http://www.ncbi.nlm.nih.gov/BLAST/download.shtml. Despite no amplification in either sample with universal rRNA primers (see above), BLAST searches showed that some clones were significantly similar to bacterial sequences (11% of the JP sequences and 37% of the SOG clones). A similar percentage of bacteria-like sequences were present in shotgun RNA libraries from the RNA virus community of human feces (Zhang et al. 2006). It is possible that in seawater some RNA and DNA is bound to virus-size particles, but is resistant to enzyme digestion.
Almost all of these "contaminating" sequences had significant similarity to bacterial 16S or 23S ribosomal RNA sequences with E values < 1 × 10⁻¹⁰⁰. Furthermore, only sequences similar to bacteria had significant similarity with sequences in the Sargasso Sea metagenome, while sequences classified as viral and unknown showed no similarity to sequences in this database whatsoever. Sequences from JP and SOG identified as bacterial had an average % G+C distinctly higher than the average % G+C for sequences in the unknown and viral categories. Because our objective was to characterize the RNA virus community alone, all sequences identified as bacterial based on the above criteria were removed from subsequent analyses. This resulted in 216 and 61 sequences classified as viral or unknown in the JP and SOG libraries, respectively.

4.3.13 Contig assembly

Sequence fragments were assembled into overlapping segments using Sequencher v4.5 (Gene Codes, Ann Arbor, U.S.A.) based on a minimum match percentage of 98 and a minimum bp overlap of 20. These are the most stringent conditions feasible given the error introduced during the process of library construction. The distribution of contigs for the JP community was: one contig of 23 sequence fragments, one contig of 20, one contig of 18, one contig of 11, one contig of 6, two contigs of 4, one contig of 3, 9 contigs of two and 109 contigs of one. The contig distribution for the SOG community was: one contig of 13 sequence fragments, one contig of 5, two contigs of two and 39 contigs of one. In a simulation with sequence fragments from a variety of taxonomic levels of picorna-like viruses, we found that these assembly conditions correctly assembled genomes of different strains of RNA viruses belonging to the same species.

4.3.14 Bias

See Chapter VI, section 6.1.1 (Bias), for a discussion of bias in the library construction method.

4.3.15 Phylogenetic analyses

The other viruses used in phylogenetic analyses are listed in Table 4.4.
Translated sequences of viruses were aligned using CLUSTAL X v1.83 with the Gonnet series protein matrix (Thompson et al. 1997). Alignments were transformed into likelihood distances with MrBayes v3.1.1 (Altekar et al. 2004) and 250000 generations. Neighbor-joining trees were constructed with PAUP v4.0 (Swofford 2002), and bootstrap values calculated based on percentages of 10000 replicates.

4.3.16 cDNA synthesis for picorna-like RdRp RT-PCR and DGGE

The degenerate primers RdRp 1 (5'-GGA/GGAC/TTACAG/CCIA/GA/TTTTGAT-3') and RdRp 2 (5'-A/CACCCAACG/TA/CCG/TCTTG/CAA/GAyGAA-3') described in Culley et al. (2003) (Chapter III) were used to assay JP and SOG libraries for picorna-like viruses. cDNA was synthesized by combining RNA virus template, 0.1 pmol primer RdRp 2, 0.5 mM of each dNTP and incubating for 5 min at 65 °C. After cooling on ice, 1× first strand buffer, 5 mM DTT, 40 U RNaseOUT and 200 U of Superscript III RT (Invitrogen) were added and the reaction held at 50 °C for 50 min followed by 70 °C for 15 min to inactivate the enzyme. To degrade the remaining RNA template, 1 U of RNase H was added and the sample incubated at 37 °C for 20 min.

4.3.17 PCR with degenerate primers

The cDNA template was used in PCR with RdRp 1 and RdRp 2 primers. Each reaction consisted of 1× Platinum Taq buffer, 3.0 mM MgCl2, 0.2 mM each dNTP, 1.0 µM each RdRp 1 and RdRp 2, and 1 U Platinum Taq DNA polymerase in a final volume of 50 µl. The following thermocycler conditions were used: 94 °C for 75 s, followed by 40 cycles of 94 °C for 45 s, 50 °C for 45 s and 72 °C for 1 min, and a final extension stage of 5 min at 72 °C. PCR products were separated on a 1.5% agarose gel in 0.5× TBE buffer. To produce enough product for DGGE, bands of the appropriate size (approximately 500 bp) were excised with a sterile pipette, suspended in 0.5× TBE and heated to 80 °C for 10 min.
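The degenerate primers in section 4.3.16 are written in a slash notation in which bases separated by "/" are alternatives at a single position. As a minimal sketch of how such a primer can be read, the code below counts the number of distinct oligonucleotides each primer represents. It rests on stated assumptions: "I" is treated as inosine (a single non-degenerate position), the lower-case "y" in RdRp 2 is read as the IUPAC code Y (C or T), and the primer strings are copied from the text with extraction spaces removed. The function names are hypothetical.

```python
# Standard IUPAC ambiguity codes for nucleotides.
IUPAC = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def parse_slash_primer(primer):
    """Split a primer in slash notation into one string of alternative
    symbols per position, e.g. 'GA/GT' -> ['G', 'AG', 'T']."""
    seq = primer.upper()
    positions = []
    i = 0
    while i < len(seq):
        options = [seq[i]]
        i += 1
        # Each '/' joins the following symbol to the current position.
        while i + 1 < len(seq) and seq[i] == "/":
            options.append(seq[i + 1])
            i += 2
        positions.append("".join(options))
    return positions


def degeneracy(primer):
    """Number of distinct sequences encoded by a slash-notation primer."""
    total = 1
    for options in parse_slash_primer(primer):
        bases = set()
        for symbol in options:
            if symbol in "ACGTI":        # plain base or inosine
                bases.add(symbol)
            elif symbol in IUPAC:        # expand ambiguity code
                bases.update(IUPAC[symbol])
            else:
                raise ValueError(f"unexpected symbol {symbol!r} in primer")
        total *= len(bases)
    return total


RDRP_1 = "GGA/GGAC/TTACAG/CCIA/GA/TTTTGAT"
RDRP_2 = "A/CACCCAACG/TA/CCG/TCTTG/CAA/GAyGAA"
print(degeneracy(RDRP_1), degeneracy(RDRP_2))
```

Read this way, RdRp 1 has five two-fold degenerate positions (32 variants) and RdRp 2 has seven (128 variants); the second figure depends on the assumed reading of "y".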
Aliquots of washed agarose plugs were used in a second round of PCR with the RdRp primer set using slightly more stringent PCR conditions (1.5 mM MgCl2) and 25 cycles with the above thermocycling protocol. Positive and negative controls were done in parallel for the entire procedure.

4.3.18 DGGE

Picorna-like virus RdRp fragments were separated with DGGE as described by Short & Suttle (2003). We ran gels with 30 to 50% linear denaturing gradients and 7 to 8% polyacrylamide for 16 h in 1× TAE buffer (40 mM Tris, 20 mM sodium acetate, 1 mM EDTA [pH 8.5]) at 80 V and 60 °C in a D-code electrophoresis system (Bio-Rad Laboratories, Hercules, U.S.A.). The gels were then stained in 0.1× SYBR Gold (Molecular Probes, Eugene, U.S.A.) for 4 h. Selected bands were excised, re-suspended in 1× TAE, heated to 80 °C for 10 min and amplified in a third round of PCR with RdRp primers as described above. PCR products from each reaction were purified with a PCR Minelute cleanup kit and cloned with a TOPO TA Cloning Kit (Invitrogen) as per the manufacturer's protocol. Inserts were sequenced in one direction.

4.3.19 Accession numbers

Sequences have been deposited in GenBank with accession numbers DX420985-DX421142 and DQ439712-DQ439732.

4.4 Tables and Figures

Table 4.1 Characterization of sampling sites. The location is given in degree decimal format. A chlorophyll a value was not available (n.a.) for the SOG sample. We did not observe a bloom at either station during sampling.

Parameter                        JP               SOG
Date (mm/dd/yyyy)                06/29/2000       07/13/2004
Location (Latitude, Longitude)   49.27, -123.20   49.86, -124.60
Depth (m)                        < 1              11
Temperature (°C)                 18               14
Salinity (ppt)                   12               27
Volume collected (L)             40               60
Chlorophyll a (µg L⁻¹)           3.0              n.a.
Tide                             Ebb              Flood

Table 4.2 Identification of the top tBLASTx matches (E value < 0.001, n = 92) of environmental sequences from JP and SOG libraries with the GenBank database.
A number in bold indicates the highest percentage of matches in each sample, and (-) indicates the virus family, genus or species was not present.

Family/Genus      JP (% total)  SOG (% total)  Virus species                 JP (% total)  SOG (% total)
Comoviridae       2             -              Apple latent spherical        1             -
                                               Bean pod mottle               1             -
Dicistroviridae   31            9              Acute bee paralysis           1             -
                                               Aphid lethal paralysis        5             -
                                               Black queen cell              5             -
                                               Cricket paralysis             2             -
                                               Drosophila C                  6             -
                                               Kashmir bee                   4             9
                                               Plautia stali intestine       1             -
                                               Rhopalosiphum padi            4             -
                                               Solenopsis invicta            1             -
                                               Taura syndrome                1             -
Marnaviridae      43            -              Heterosigma akashiwo RNA      43            -
Nanoviridae       -             9              Subterranean clover stunt     -             9
Picornaviridae    2             -              Foot-and-mouth disease        2             -
Reoviridae        2             -              Human rota-                   2             -
Tombusviridae     -             72             Hibiscus chlorotic ringspot   -             27
                                               Galinsoga mosaic              -             18
                                               Tobacco necrosis A            -             18
                                               Pea stem necrosis             -             9
Umbravirus        -             10             Groundnut rosette             -             10
Unclassified      20            -              Sclerophthora macrospora A    1             -
                                               unid. Chinese clam assoc.     6             -
                                               unid. picorna-like grp B      4             -
                                               unid. picorna-like grp C      10            -

Table 4.3 Classification of significant tBLASTx matches (E value < 0.001, n = 92) to viral sequences into protein categories.

Protein classification          % of total viral hits
RNA-dependent RNA polymerase    39
Capsid                          33
Unidentified structural         16
Unidentified nonstructural      7
Helicase                        3
RNA binding protein             1
Replication initiator protein   1

Table 4.4 Sequences used in phylogenetic analyses.
Virus Group      Virus Acronym  Full Name                                 NCBI Accession #
Cheravirus       ALSV           Apple latent spherical virus              NC_003787
                 CRLV           Cherry rasp leaf virus                    NC_006271
Comoviridae      BBWV1          Broad bean wilt virus 1                   NC_005289
                 CPMV           Cowpea mosaic virus                       NC_003549
                 TRSV           Tobacco ringspot virus                    NC_005097
Dicistroviridae  ABPV           Acute bee paralysis virus                 NC_002548
                 CrPV           Cricket paralysis virus                   NC_003924
                 DCV            Drosophila C virus                        NC_001834
                 PSIV           Plautia stali intestine virus             NC_003779
                 TSV            Taura syndrome virus                      NC_003005
                 TrV            Triatoma virus                            NC_003783
Iflavirus        DWV            Deformed wing virus                       NC_004830
                 KV             Kakugo virus                              NC_005876
                 VDV            Varroa destructor virus                   NC_006494
Marnaviridae     HaRNAV         Heterosigma akashiwo RNA virus            NC_005281
Picornaviridae   EMCV           Encephalomyocarditis virus                NC_001479
                 ERBV           Equine rhinitis B virus 1                 NC_003983
                 FMDV           Foot-and-mouth disease virus A            NC_011450
                 HRV            Human rhinovirus 89                       NC_001617
                 PV             Poliovirus                                NC_002058
Sadwavirus       SDV            Satsuma dwarf virus                       NC_003785
                 NIMV           Navel orange infectious mottling virus    AB_022887
Sequiviridae     PYFV           Parsnip yellow fleck virus                NC_003628
                 RTSV           Rice tungro spherical virus               NC_001632
Tombusviridae    CaRMV          Carnation mottle virus                    NC_001265
                 CRSV           Carnation ringspot virus                  NC_003530
                 MCMV           Maize chlorotic mottle virus              NC_003627
                 OCSV           Oat chlorotic stunt virus                 NC_003633
                 PMV            Panicum mosaic virus                      NC_002598
                 PoLV           Pothos latent virus                       NC_000939
                 TBSV           Tomato bushy stunt virus                  NC_001554
                 TNV-A          Tobacco necrosis virus A                  NC_001777
Umbravirus       CMoMV          Carrot mottle mimic virus                 NC_001726
                 GRV            Groundnut rosette virus                   NC_003603
                 PEMV-2         Pea enation mosaic virus-2                NC_003853
                 TBTV           Tobacco bushy top virus                   NC_004366
Unclassified     SssRNAV        Schizochytrium single-stranded RNA virus  NC_007522
                 RsRNAV         Rhizosolenia setigera RNA virus           AB243297
                 B              Unidentified picorna-like virus JP700-1   AY_285755
                 C              Unidentified picorna-like virus JP800-1   AY_285758
                 D              Unidentified picorna-like virus JP700-2   AY_285756
Environmental    JP-A           Environmental sequence JP.418.600-5465    DQ439729
                 JP-B           Environmental sequence JP.418.600-4289    DQ439728
                 5d             Environmental sequence JP.418C.600-5      DX421064
                 6d             Environmental sequence JP.418C.600-6      DX421065
                 9d             Environmental sequence JP.418C.600-9      DX421066
                 11d            Environmental sequence JP.418C.600-11     DX421067
                 16d            Environmental sequence JP.418C.600-16     DX421068
                 20d            Environmental sequence JP.418C.600-20     DX421069
                 32             Environmental sequence JP.418D.600-32     DX421081
                 62             Environmental sequence JP.418A.600-62     DX420998
                 162            Environmental sequence JP.418D.600-162    DX421094
                 1743           Environmental sequence JP.418.600-1743    DQ439724
                 SOG-A          Environmental sequence SOG.658.704-3093   DQ439732
                 399            Environmental sequence SOG.658.704-399    DX421139

[Figure 4.1: legend categories: Comoviridae, Dicistroviridae, Marnaviridae, Nanoviridae, Picornaviridae, Tombusviridae, Reoviridae, Umbravirus, Unclassified, Unknown]

Figure 4.1 Composition of the JP (outer circle, n = 216) and the SOG (inner circle, n = 61) libraries. The top tBLASTx matches of sequences from JP and SOG with the GenBank non-redundant database (E value < 0.001) are categorized by taxonomic group. Virus families or genera are in different shades of grey. The Comoviridae, Dicistroviridae, Marnaviridae, and Picornaviridae are families in the proposed order Picornavirales (Christian et al. 2005). The percent values for each virus group in each library are shown. The identification of the individual viruses from each taxonomic group can be found in Table 4.2.

[Figure 4.2: genome organization schematics. Panel A: Marnaviridae, Dicistroviridae, Picornaviridae, Iflavirus, Sequiviridae, Cheravirus, Sadwavirus, Comoviridae, JP. Panel B: Tombusvirus, Umbravirus, SOG.]

Figure 4.2 Comparison of the general genomic organization of the RNA virus genomes assembled from the JP and SOG libraries (JP and SOG) with representative viruses from the (A) proposed order Picornavirales (Christian et al. 2005) and the (B) family Tombusviridae and genus Umbravirus. Genomes are shown from 5' to 3', where conserved RNA virus protein domains are labeled as Hel = helicase; Pro = protease; RdRp = RNA-dependent RNA polymerase; IGR = intergenic region; MP = movement protein; and An indicates the presence of a poly(A) tail. Figure 4.2-A genomes are approximately 10 kbp in size while Figure 4.2-B genomes are approximately 5 kbp in size. The "JP" schematic represents the genome organization of both assembled RNA viruses JP-A and JP-B. The characteristic read-through stop codon of the Tombusviridae replicase (represented by a divided RdRp) and the -1 frame shift of the Umbravirus replicase (represented by a staggered RdRp) are also shown (B). Unlabeled regions in gray refer to sequence that codes for proteins of unknown function.

Figure 4.3 Bayesian maximum likelihood trees of aligned RdRp amino acid sequences from the JP RNA virus community and representative members of the proposed order Picornavirales (Christian et al. 2005). Bayesian clade credibility values are shown for relevant nodes in boldface followed by bootstrap values based on neighbor-joining analysis. JP-A and JP-B are from the assembled environmental genomes. The Bayesian scale bar indicates a distance of 0.1. Environmental sequence numbers followed by a "d" are from excised denaturing gradient gel electrophoresis bands. See Table 4.4 for complete virus names, virus classification and sequence accession numbers.
Figure 4.4 Bayesian maximum likelihood trees of aligned RdRp amino acid sequences from the SOG virus library and representative viruses from the Tombusviridae and Umbravirus genus. Bayesian clade credibility values are shown for relevant nodes in boldface, followed by bootstrap values based on neighbor-joining analysis. SOG is from the assembled environmental genome. The Bayesian scale bar indicates a distance of 0.1. See Table 4.4 for complete virus names, virus classification and sequence accession numbers.

4.5 References

Altekar, G., S. Dwarkadas, J. P. Huelsenbeck, and F. Ronquist. 2004. Parallel metropolis coupled Markov chain Monte Carlo for bayesian phylogenetic inference. Bioinformatics 20: 407-415.
Altschul, S. F., T. L. Madden, A. A. Schaffer, J. H. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25: 3389-3402.
Angly, F., B. Rodriguez-Brito, D. Bangor, P. Mcnairnie, M. Breitbart, P. Salamon, B. Felts, J. Nulton, J. Mahaffy, and F. Rohwer. 2005. PHACCS, an online tool for estimating the structure and diversity of uncultured viral communities using metagenomic information. BMC Bioinformatics 6.
Borsheim, K. Y. 1993. Native marine bacteriophages. FEMS Microbiology Letters 102: 141-159.
Breitbart, M., B. Felts, S. Kelley, J. M. Mahaffy, J. Nulton, P. Salamon, and F. Rohwer. 2004. Diversity and population structure of a near-shore marine-sediment viral community. Proceedings of the Royal Society of London Series B-Biological Sciences 271: 565-574.
Breitbart, M., P. Salamon, B. Andresen, J. M. Mahaffy, A. M. Segall, D. Mead, F. Azam, and F. Rohwer. 2002. Genomic analysis of uncultured marine viral communities. Proceedings of the National Academy of Sciences of the United States of America 99: 14250-14255.
Brussaard, C. P. D., A. A. M. Noordeloos, R. A. Sandaa, M. Heldal, and G. Bratbak. 2004. Discovery of a dsRNA virus infecting the marine photosynthetic protist Micromonas pusilla. Virology 319: 280-291.
Christian, P., Fauquet, C. M., Gorbalenya, A. E., King, A. M. G., Knowles, N., Legall, O., Stanway, G. 2005. A proposed Picornavirales Order. In C. M. Fauquet [ed.], Microbes in a changing world. International Unions of Microbiological Societies.
Culley, A. I., A. S. Lang, and C. A. Suttle. 2003. High diversity of unknown picorna-like viruses in the sea. Nature 424: 1054-1057.
Delong, E. F., and D. M. Karl. 2005. Genomic perspectives in microbial oceanography. Nature 437: 336-342.
Delong, E. F., C. M. Preston, T. Mincer, V. Rich, S. J. Hallam, N. U. Frigaard, A. Martinez, M. B. Sullivan, R. Edwards, B. R. Brito, S. W. Chisholm, and D. M. Karl. 2006. Community genomics among stratified microbial assemblages in the ocean's interior. Science 311: 496-503.
Drake, J. W., and J. J. Holland. 1999. Mutation rates among RNA viruses. Proceedings of the National Academy of Sciences of the United States of America 96: 13910-13913.
Edwards, R. A., and F. Rohwer. 2005. Viral metagenomics. Nature Reviews Microbiology 3: 504-510.
Kim, D. H., H. K. Oh, J. I. Eou, H. J. Seo, S. K. Kim, M. J. Oh, S. W. Nam, and T. J. Choi. 2005. Complete nucleotide sequence of the hirame rhabdovirus, a pathogen of marine fish. Virus Research 107: 1-9.
Koonin, E. V., and V. V. Dolja. 1993. Evolution and taxonomy of positive-strand RNA viruses - implications of comparative-analysis of amino-acid-sequences. Critical Reviews in Biochemistry and Molecular Biology 28: 375-430.
Lang, A. S., A. I. Culley, and C. A. Suttle. 2004. Genome sequence and characterization of a virus (HaRNAV) related to picorna-like viruses that infects the marine toxic bloom-forming alga Heterosigma akashiwo. Virology 320: 206-217.
Leblond, P. H. 1983. The Strait of Georgia - functional-anatomy of a coastal sea. Canadian Journal of Fisheries and Aquatic Sciences 40: 1033-1063.
Lommel, S. A., Martelli, G. P., Rubino, L., Russo, M. 2004. Tombusviridae, p. 907-936. In C. M. Fauquet, Mayo M. A., Maniloff, J., Desselberger, U., Ball, L. A. [eds.], Virus Taxonomy. Eighth Report of the International Committee on Taxonomy of Viruses. Elsevier.
Mari, J., B. T. Poulos, D. V. Lightner, and J. R. Bonami. 2002. Shrimp Taura syndrome virus: genomic characterization and similarity with members of the genus Cricket paralysis-like viruses. Journal of General Virology 83: 915-926.
Nagasaki, K., Y. Shirai, Y. Takao, H. Mizumoto, K. Nishida, and Y. Tomaru. 2005. Comparison of genome sequences of single-stranded RNA viruses infecting the bivalve-killing dinoflagellate Heterocapsa circularisquama. Applied and Environmental Microbiology 71: 8888-8894.
Nagasaki, K., Y. Tomaru, N. Katanozaka, Y. Shirai, K. Nishida, S. Itakura, and M. Yamaguchi. 2004. Isolation and characterization of a novel single-stranded RNA virus infecting the bloom-forming diatom Rhizosolenia setigera. Applied and Environmental Microbiology 70: 704-711.
Nübel, U., F. Garcia-Pichel, and G. Muyzer. 1997. PCR primers to amplify 16S rRNA genes from cyanobacteria. Applied and Environmental Microbiology 63: 3327-3332.
Okayama, H., and P. Berg. 1982. High-efficiency cloning of full-length cDNA. Molecular and Cellular Biology 2: 161-170.
Rima, B. K., A. M. J. Collin, and J. A. P. Earle. 2005. Completion of the sequence of a cetacean morbillivirus and comparative analysis of the complete genome sequences of four morbilliviruses. Virus Genes 30: 113-119.
Rohwer, F., V. Seguritan, D. H. Choi, A. M. Segall, and F. Azam. 2001. Production of shotgun libraries using random amplification. Biotechniques 31: 108-118.
Short, S. M., and C. A. Suttle. 2003. Temporal dynamics of natural communities of marine algal viruses and eukaryotes. Aquatic Microbial Ecology 32: 107-119.
Suttle, C. A., A. M. Chan, and M. T. Cottrell. 1991. Use of ultrafiltration to isolate viruses from seawater which are pathogens of marine phytoplankton. Applied and Environmental Microbiology 57: 721-726.
Swofford, D. 2000. PAUP*: Phylogenetic Analysis Using Parsimony and other Methods 4.0. Sinauer Associates, Inc.
Tai, V., J. E. Lawrence, A. S. Lang, A. M. Chan, A. I. Culley, and C. A. Suttle. 2003. Characterization of HaRNAV, a single-stranded RNA virus causing lysis of Heterosigma akashiwo (Raphidophyceae). Journal of Phycology 39: 343-352.
Taliansky, M. E., Robinson, D. J., Waterhouse, P. M., Murant, A. F., De Zoeten, G. A., Falk, B. W., Gibbs, M. J. 2004. Umbravirus, p. 901-906. In C. M. Fauquet, Mayo M. A., Maniloff, J., Desselberger, U., Ball, L. A. [eds.], Virus Taxonomy. Eighth Report of the International Committee on Taxonomy of Viruses. Elsevier.
Thompson, J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, and D. G. Higgins. 1997. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Research 25: 4876-4882.
Venter, J. C., K. Remington, J. F. Heidelberg, A. L. Halpern, D. Rusch, J. A. Eisen, D. Y. Wu, I. Paulsen, K. E. Nelson, W. Nelson, D. E. Fouts, S. Levy, A. H. Knap, M. W. Lomas, K. Nealson, O. White, J. Peterson, J. Hoffman, R. Parsons, H. Baden-Tillson, C. Pfannkoch, Y. H. Rogers, and H. O. Smith. 2004. Environmental genome shotgun sequencing of the Sargasso Sea. Science 304: 66-74.
Vetten, H. J., Chu, P. W. G., Dale, J. L., Harding, R., Hu, J., Katul, L., Kojima, M., Randles, J. W., Sano, Y., Thomas, J. E. 2004. Nanoviridae, p. 343-352. In C. M. Fauquet, Mayo M. A., Maniloff, J., Desselberger, U., Ball, L. A. [eds.], Virus Taxonomy. Eighth Report of the International Committee on Taxonomy of Viruses. Elsevier.
Weinbauer, M. G. 2004. Ecology of prokaryotic viruses. FEMS Microbiology Reviews 28: 127-181.
Yokoi, T., S. Yamashita, and T. Hibi. 2003.
The nucleotide sequence and genome organization of Sclerophthora macrospora virus A. Virology 311: 394-399.
Zhang, T., M. Breitbart, W. H. Lee, J. Q. Run, C. L. Wei, S. W. L. Soh, M. L. Hibberd, E. T. Liu, F. Rohwer, and Y. J. Ruan. 2006. RNA viral community in human feces: Prevalence of plant pathogenic viruses. PLoS Biology 4: 108-118.

Chapter V. The complete genomes of three viruses assembled from shotgun libraries of marine RNA virus communities

A version of this chapter will be submitted for publication: Culley, A.I., A.S. Lang, and C.A. Suttle. The complete genomes of three viruses assembled from shotgun libraries of marine RNA virus communities.

5.1 Introduction

Based on a variety of evidence, marine viral communities have been assumed to consist entirely of dsDNA bacteriophages (Weinbauer 2004) (see Chapter I, section 1.1 for more details). However, RNA viruses of every classification have been isolated from the ocean, although the in situ RNA virus community remains largely uncharacterized. For example, the identities and hosts of the most abundant marine RNA viruses from any location are still unknown. Although there are several examples of RNA viruses that infect marine animals (e.g. Smith 2000), these organisms represent a very small portion of the organisms in the sea; therefore it is unlikely that viruses infecting these organisms make up a significant fraction of the natural RNA virioplankton. It is more likely that the dominant RNA viruses infect the diverse and abundant marine protists. For example, RNA viruses have recently been isolated that infect a diatom (Nagasaki et al. 2004), dinoflagellate (Tomaru et al. 2004), and marine fungoid protist (Takao et al. 2005).

Previous research investigated the diversity of marine picorna-like viruses, a "superfamily" of positive-sense single-stranded (ss)RNA viruses that have similar genome features and several conserved protein domains (Chapter III, Culley et al. 2003).
Analysis of RNA-dependent RNA polymerase (RdRp) sequences amplified from marine virus communities demonstrated that picorna-like viruses are present and persistent in a diversity of marine environments. Furthermore, phylogenetic analyses showed that none of the environmental sequences fell within established virus families (Chapter III, Culley et al. 2003).

In a recent study, reverse-transcribed whole-genome shotgun libraries were used to characterize two marine RNA virus communities (Chapter IV, Culley et al. 2006). Positive-sense ssRNA viruses that are distant relatives of known RNA viruses dominated the libraries. One RNA virus library (JP) was characterized by a diverse, monophyletic clade of picorna-like viruses, while the second library (SOG) was dominated by viruses related to members of the family Tombusviridae and genus Umbravirus (Chapter IV, Culley et al. 2006). Moreover, in both libraries, a high percentage of sequence fragments contributed to a handful of contiguous segments of sequence (contigs). Specifically, in the SOG sample 59% of the sequence fragments that formed overlapping contigs fell into one segment. Similarly, 66% of JP sequence fragments contributed to only four contigs (Chapter IV, Culley et al. 2006). Using a PCR-based approach to increase the amount of sequence for each dominant contig resulted in the assembly of three complete viral genomes. This chapter analyzes these three marine RNA virus genomes and investigates their similarities to, and differences from, representative genotypes of established viral taxa.

5.2 Results and Discussion

5.2.1 Jericho Pier site

The JP-A genome is a single molecule of linear positive-sense ssRNA, 9212 nt in length. The genome has a 568 nt 5' untranslated region (UTR) followed by 2 predicted open reading frames (ORFs) of 5131 nt (ORF 1, nt position 569 to 5700) and 3044 nt (ORF 2, nt position 5841 to 8885) separated by an intergenic region (IGR) of 139 nt (Figure 5.1).
ORF 2 is followed by a 3' UTR of 327 nt (nt position 8886 to 9213) and a poly(A) tail (Figure 5.1). The base composition of JP-A is 27.1% A, 19.4% C, 22.0% G, and 31.6% U; this results in a G+C content of 41%, a percentage similar to other polycistronic picorna-like viruses (Table 5.1).

Comparison of the protein sequence predicted to be encoded by ORF 1 of JP-A to known viral sequences shows that it contains conserved sequence motifs characteristic of a type III viral helicase (aa residues 641 to 756), a 3C-like cysteine protease (aa residues 1288 to 1314) and a type I RdRp (aa residues 1561 to 1802) (Koonin and Dolja 1993, Figure 5.1). BLASTp (Altschul et al. 1997) searches of the NCBI database with the ORF 1 sequence showed significant inferred amino-acid sequence similarities (E value < 0.001) to nonstructural protein motifs of several viruses, including members of the families Dicistroviridae (Drosophila C virus), Marnaviridae (HaRNAV), Comoviridae (Cowpea mosaic virus) and the unassigned genus Iflavirus (Kakugo virus). The top matches for ORF 1 were to RsRNAV [E value = 3e-119, identities = 302/908 (33%)], a newly sequenced, unclassified positive-sense ssRNA virus that infects the widely distributed diatom Rhizosolenia setigera (Nagasaki et al. 2004), HaRNAV [E value = 2e-32, identities = 156/624 (25%)] and Drosophila C virus [E value = 1e-29, identities = 148/603 (24%)], a positive-sense ssRNA virus that infects fruit flies. Comparison of the protein sequence predicted to be encoded by ORF 2 of JP-A to known viral sequences shows that it has significant similarities to the structural proteins of viruses from the families Dicistroviridae (Drosophila C virus), Marnaviridae (HaRNAV), and the genus Iflavirus (Varroa destructor virus 1).
The sequences that are most similar to ORF 2 of JP-A were the structural protein regions of RsRNAV [E value = 6e-78, identities = 212/632 (33%)], HaRNAV [E value = 6e-68, identities = 187/607 (30%)] and SssRNAV [E value = 2e-49, identities = 241/962 (25%)].

JP-B is also likely from a positive-sense ssRNA virus. The 8839 nt genome consists of a 5' UTR of 766 nt followed by two predicted ORFs of 4850 nt (ORF 1, nt position 767 to 5617) and 2786 nt (ORF 2, nt position 5843 to 8629) separated by an IGR of 224 nt (nt position 5618 to 5842, Figure 5.1). The 3' UTR is 209 nt long and is followed by a poly(A) tail. The base composition of the genome is 30.8% A, 17.9% C, 19.7% G, and 31.6% U. As with JP-A, the resulting G+C content of 38% is comparable to that observed in other dicistronic picorna-like viruses (Table 5.1).

The position of core sequence motifs conserved among positive-sense ssRNA viruses and BLAST searches of the NCBI database with the translated JP-B genome suggest that nonstructural proteins are encoded by ORF 1, and the structural proteins are encoded by ORF 2. We identified conserved sequence motifs in ORF 1 characteristic of a type III viral helicase (aa residues 587 to 700), a 3C-like cysteine protease (aa residues 1141 to 1168) and a type I RdRp (aa residues 1402 to 1667) (Koonin and Dolja 1993) (Figure 5.1). BLASTp (Altschul et al. 1997) searches of the NCBI database showed that ORF 1 has significant similarities (E value < 0.001) to nonstructural genes from positive-sense ssRNA viruses from a variety of families, including the Comoviridae (Peach rosette mosaic virus), Dicistroviridae (Taura syndrome virus), Marnaviridae (HaRNAV), Sequiviridae (Rice tungro spherical virus) and Picornaviridae (Avian encephalomyelitis virus).
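The G+C percentages quoted for JP-A and JP-B follow directly from the reported base compositions. As a quick check, the minimal Python sketch below sums the C and G percentages given in the text and rounds to the nearest whole percent. The dictionary layout and the function name `gc_percent` are illustrative choices, not part of the original analysis.

```python
# Base compositions (% of each nucleotide) as reported in the text.
BASE_COMPOSITION = {
    "JP-A": {"A": 27.1, "C": 19.4, "G": 22.0, "U": 31.6},
    "JP-B": {"A": 30.8, "C": 17.9, "G": 19.7, "U": 31.6},
}


def gc_percent(composition):
    """Return the G+C content (%) rounded to the nearest whole percent."""
    return round(composition["G"] + composition["C"])


for genome, composition in BASE_COMPOSITION.items():
    print(genome, gc_percent(composition))
```

With the values above this gives 41 for JP-A and 38 for JP-B, matching the figures in the text and in Table 5.1.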
The top-scoring matches were to an RdRp sequence fragment from RsRNAV [E value = 2e-69, identities = 232/854 (27%)] and a partial picorna-like virus RdRp from an unidentified virus [E value = 2e-40, identities = 85/150 (56%)] amplified from the same JP station during an earlier study (Chapter III, Culley et al. 2003). Significant similarities to ORF 2 include the structural genes of viruses from the families Dicistroviridae (Rhopalosiphum padi virus), Marnaviridae (HaRNAV) and Picornaviridae (Human parechovirus 2), as well as the unclassified genus Iflavirus (Ectropis obliqua picorna-like virus). The top-scoring matches were to the capsid protein precursor regions of RsRNAV [E value = 9e-88, identities = 244/799 (30%)], HaRNAV [E value = 8e-60, identities = 180/736 (24%)] and SssRNAV [E value = 1e-40, identities = 156/588 (26%)].

Several viruses in the family Dicistroviridae have genomes that contain internal ribosome entry sites (IRESs) (Jan and Sarnow 2002, Nishiyama et al. 2003, Cevallos and Sarnow 2005, Czibener et al. 2005), which raises the question of whether an IRES is present in JP-A or JP-B, given their apparently similar dicistronic genome organization. Structures within the IRES position the genome on the ribosome, actuating elongation even in the absence of known canonical initiation factors (Jan and Sarnow 2002). For example, Taura syndrome virus (TSV), a marine dicistrovirus, has an IRES located in the IGR that directs the synthesis of the structural proteins (Cevallos and Sarnow 2005). Although secondary structure elements characteristic of dicistrovirus IGR-IRESs (Hatakeyama et al. 2006) were not found in the JP genomes, both genome sequences have extensive predicted secondary structure in the 5' UTRs and IGRs, suggestive of IRES function. Moreover, start codons in a favorable Kozak context [i.e. conserved sequences upstream of the start codon that are thought to play a role in initiation of translation (Kozak 1986)] were not found in the JP genomes. However, IRES structures can vary greatly between viruses and there is clearly a large evolutionary distance among these viruses (see below); therefore, predicted IRES elements must be confirmed experimentally in dicistronic constructs. It seems likely that these viruses use similar mechanisms to initiate translation of the ORF 2 genes.

We used RT-PCR to assess the distribution and persistence of the JP-A and JP-B viruses in situ. Amplification with specific primers that target each of these viruses occurred in samples from throughout the Strait of Georgia and the west coast of Vancouver Island, and in every season and tidal state at Jericho Pier (Figure 5.2, Table 5.2). These results suggest that JP-A and JP-B are widespread and persistent in these coastal waters and can be detected in both marine and estuarine waters.

It has long been recognized that several other groups of small, positive-sense ssRNA viruses share many characteristics with viruses in the family Picornaviridae. Recently, Christian et al. (2005) proposed creating an order (the Picornavirales) of virus families (Picornaviridae, Dicistroviridae, Marnaviridae, Sequiviridae and Comoviridae) and unassigned genera (Iflavirus, Cheravirus, and Sadwavirus) that have picornavirus-like characteristics. Viruses in the proposed order have genomes with a protein covalently attached to the 5' end and a 3' poly(A) tail; a conserved order of non-structural proteins (helicase-VPg-proteinase-RdRp); regions of high sequence similarity in the helicase, proteinase and RdRp; post-translational protein processing during replication; and an icosahedral capsid with a unique "pseudo-T3" symmetry. They infect only eukaryotes.
Although the capsid morphology, presence of a 5' terminal protein and replication strategy are unknown, signature genomic features and phylogenetic analysis suggest that the JP viruses fall within the proposed order Picornavirales. Both JP genomes encode the conserved core aa motifs and have the non-structural gene order characteristic of viruses in the Picornavirales. Furthermore, both JP genomes have a poly(A) tail and G+C content commensurate with viruses in the Picornavirales. Bayesian trees (Altekar et al. 2004) based on alignments of conserved RdRp domains (Koonin and Dolja 1993) (Figure 5.3), as well as concatenated (putative) Hel/RdRp/VP3 capsid-like protein sequences (Figure 5.4), of the JP genomes and representative members of the Picornavirales resolve established taxa within the Picornavirales and provide strong support for a clade comprised of viruses (HaRNAV, RsRNAV and SssRNAV) that infect marine protists. Within this clade, RsRNAV, JP-A and JP-B have the most characteristics in common. For example, they have the same order of structural and non-structural genes, they are polycistronic, and phylogenetic analyses indicate they are (relatively) closely related. Whether JP-A and JP-B infect host organisms related to Rhizosolenia setigera remains unclear, although the inclusion of the JP genomes within this clade and the fact that protists are the most abundant eukaryotes in the sea suggest that both JP viruses likely have a protist host.

5.2.2 Strait of Georgia site

The SOG genome has features characteristic of a positive-sense ssRNA virus. The genome is 4449 nt long and is comprised of a 5' UTR of 211 nt followed by three putative ORFs (nt position 212 to 1229, nt position 1232 to 2863 and nt position 2864 to 4231), and is terminated with a 3' UTR of 218 nt. A poly(A) tail was not detected. Another putative ORF, located at nt position 49 to 786, is in an alternative reading frame relative to the ORFs discussed above (Figure 5.1).
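Whether two ORFs share a reading frame can be checked from their start coordinates alone: with 1-based positions, two ORFs on the same strand are in the same frame when (start - 1) mod 3 is equal. The short Python sketch below applies this to the SOG ORF start positions exactly as reported above. The labels and the function name `reading_frame` are illustrative assumptions, not names used in the original analysis.

```python
# 1-based start positions of the putative SOG ORFs, as reported in the text.
ORF_STARTS = {
    "ORF 1": 212,
    "ORF 2": 1232,
    "ORF 3": 2864,
    "alternative ORF": 49,
}


def reading_frame(start):
    """Return the forward reading frame (0, 1 or 2) of a 1-based start position."""
    return (start - 1) % 3


frames = {name: reading_frame(start) for name, start in ORF_STARTS.items()}
for name, frame in frames.items():
    print(f"{name}: frame {frame}")
```

With the reported coordinates, ORFs 1 to 3 all fall in frame 1 and the ORF starting at nt 49 falls in frame 0, which agrees with the description of it as being in an alternative reading frame.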
The G+C content of the SOG genome is 52%. The only conserved motifs we identified in the SOG genome were the eight conserved motifs of the RdRp (Koonin and Dolja 1993) (aa residues 563 to 817, nt positions 1687 to 2451) (Figure 5.1). tBLASTx (Altschul et al. 1997) searches with the remainder of the genome sequence showed no significant matches (E value < 0.001) to sequences in the NCBI database (including the five environmental metagenomes currently deposited). BLASTp searches with the putative RdRp resulted in significant similarities (E value < 0.001) to RdRp sequences from positive-sense ssRNA viruses from the family Tombusviridae and the unassigned genus Umbravirus. The sequence with the most similarity to SOG was from Olive latent virus 1 [E value = 3e-66, identities = 180/508 (35%)]. This virus belongs to the genus Necrovirus in the family Tombusviridae, which has a host range restricted to higher plants (Lommel et al. 2004). SOG is also significantly similar to the Carrot mottle mimic virus sequence [E value = 6e-66, identities = 178/492 (36%)], a member of the unclassified genus Umbravirus whose known members infect only flowering plants. Although the SOG putative RdRp sequence has similarity to the RdRp of viruses from the family Tombusviridae and genus Umbravirus, the remaining SOG sequence has no detectable similarity to any other known sequence.

A Bayesian maximum likelihood tree based on alignments of the SOG RdRp with the available Umbravirus sequences and representative members of the Tombusviridae indicates that the SOG genome forms a well-supported clade (Bayesian clade support value of 100) with the single member of the genus Avenavirus, Oat chlorotic stunt virus (OCSV) (Figure 5.5). Additionally, the presence of an amber stop codon (nt position 1230 to 1232) separating ORF 1 and 2 of the SOG genome (Figure 5.1) resembles the in-frame termination codon characteristic of the replicase of viruses in 7 of the 8 genera of the Tombusviridae (White and Nagy 2004).
This division of the replicase of the Tombusviridae by a termination codon is thought to be part of a translational read-through gene expression strategy (White and Nagy 2004). Other similarities to the Tombusviridae include the absence of an obvious helicase motif and the 5' proximal relative position of the RdRp (Lommel et al. 2004) within the genome. However, unlike viruses in the Tombusviridae, there is no recognizable sequence for conserved movement or capsid proteins in the SOG genome. The absence of a movement protein suggests that the SOG virus does not infect a higher plant. Our inability to identify structural genes may indicate that, like the umbraviruses, the SOG genome does not encode capsid proteins. However, it is more likely that the structural proteins of the SOG genome have no sequence similarity to those currently in the NCBI database.

Our analyses suggest that a persistent and possibly dominant population of novel dicistronic picorna-like viruses is an important component of the RNA virioplankton in coastal waters. Nevertheless, as exemplified in the SOG genome, other marine RNA virus assemblages appear to contain viruses whose detectable sequence similarity with established groups of viruses is limited to only the most conserved RdRp genes. As we work towards the ultimate goal of understanding the ecological role of marine viruses, the next challenge with these data, and in marine virus metagenomic research in general, is to affiliate each assembled genome with a specific virion and host.

5.3 Materials and Methods

5.3.1 Station description

The shotgun libraries were constructed from seawater samples collected from two stations: JP (Jericho Pier), a site in English Bay adjacent to the city of Vancouver, British Columbia, and SOG (Strait of Georgia), located in the central Strait of Georgia next to Powell River, B.C. (Figure 5.2, Chapter IV section 4.3.1 and Table 4.1).
The locations of the stations where one or both of the JP genomes were detected are shown in Figure 5.2. Details for each station are listed in Table 5.2 and Table 4.1. In summary, samples were collected from sites throughout the Strait of Georgia, including repeated sampling from the JP site during different seasons, and from the west coast of Vancouver Island in Barkley Sound.

5.3.2 Virus concentration method

Concentrated virus communities were produced as described by Suttle et al. (1991). Twenty to 60 liters of seawater from each station were pre-filtered through glass fiber filters (nominal pore size 1.2 µm) and then 0.45 µm pore-size Durapore PVDF (polyvinylidene fluoride) membranes (Millipore, Cambridge, Canada) to remove particulates, including eukaryotic plankton and most prokaryotes. This filtrate was subsequently concentrated approximately 200-fold through a tangential flow filter cartridge (Millipore) with a 30 kDa molecular cutoff, essentially resulting in the concentration of the 2 to 450 nm size fraction of seawater. Remaining bacteria were removed by filtering each viral concentrate two times through a 0.22 µm Durapore PVDF membrane (Millipore). Virus-sized particles in each virus concentrate were pelleted via ultracentrifugation (5 h at 113 000 x g at 4 °C). Pellets were resuspended overnight at 4 °C in sterile buffer (50 µM Tris chloride).

5.3.3 Whole-genome shotgun library construction

A detailed description of the whole-genome shotgun library construction protocol can be found in Culley et al. (2006, Chapter IV).
Briefly, before extraction, concentrated viral lysates were treated with RNase (Roche, Mississauga, Canada) and then extracted with a QIAamp Minelute Virus Spin Kit (Qiagen, Mississauga, Canada) according to the manufacturer's instructions. An aliquot of each extract was used in a PCR reaction with universal 16S primers to ensure samples were free of bacteria. To isolate the RNA fraction, samples were treated with DNase I (Invitrogen, Burlington, Canada) and used as templates for reverse transcription with random hexamer primers. Double-stranded cDNA fragments were synthesized from ssDNA with Superscript III reverse transcriptase (Invitrogen) using nick translational replacement of genomic RNA (Okayama and Berg 1982). After degradation of overhanging ends with T4 DNA polymerase (Invitrogen), adapters were attached to the blunted products with T4 DNA ligase (Invitrogen). Subsequently, excess reagents were removed and cDNA products were separated by size with a Sephacryl column (Invitrogen). To increase the amount of product for cloning, size fractions greater than 600 bp were amplified with primers targeting the adapters. Products from each PCR reaction were purified and cloned with the TOPO TA Cloning system (Invitrogen). Clones were screened for inserts by PCR with vector-specific primers. Insert PCR products greater than 600 bp were purified and sequenced at the University of British Columbia's Nucleic Acid and Protein Service Facility (Vancouver, Canada).

Sequence fragments were assembled into overlapping segments using Sequencher v 4.5 (Gene Codes, Ann Arbor, U.S.A.) based on a minimum match percentage of 98 and a minimum overlap of 20 bp. Sequences were compared against the NCBI database with tBLASTx (Altschul et al. 1997). A sequence was considered significantly similar if BLAST E values were < 0.001. The details for viruses used in phylogenetic analyses are listed in Table 5.3.
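The significance rule stated above (E value < 0.001) amounts to a simple filter over BLAST hits. The Python sketch below applies it to the three top ORF 1 matches reported for JP-A in section 5.2.1, plus one made-up non-significant hit included only to show the filter rejecting something. The tuple layout, the names and the weak hit are assumptions for illustration; they are not the format of any BLAST output and not the procedure used in this study.

```python
E_VALUE_CUTOFF = 0.001  # significance threshold used in this chapter

# (subject, E value, identical residues, alignment length)
hits = [
    ("RsRNAV", 3e-119, 302, 908),
    ("HaRNAV", 2e-32, 156, 624),
    ("Drosophila C virus", 1e-29, 148, 603),
    ("hypothetical weak hit", 0.5, 20, 90),  # invented for illustration only
]


def significant_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Keep hits whose E value is below the cutoff, best (lowest) first."""
    kept = [hit for hit in hits if hit[1] < cutoff]
    return sorted(kept, key=lambda hit: hit[1])


for subject, e_value, identical, length in significant_hits(hits):
    # Truncating reproduces the whole-number percentages quoted in the text.
    percent_identity = 100 * identical // length
    print(subject, e_value, f"{identical}/{length}", f"{percent_identity}%")
```

Only the three reported matches pass the cutoff, with identities of 33%, 25% and 24% respectively.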
Virus protein sequences were aligned using CLUSTAL X v 1.83 with the Gonnet series protein matrix (Thompson et al. 1997). Alignments were transformed into likelihood distances with MrBayes v3.1.1 (Altekar et al. 2004) run for 250 000 generations. Neighbor-joining trees were constructed with PAUP v4.0 (Swofford 2002), and bootstrap values were calculated based on percentages of 10 000 replicates.

5.3.4 5' and 3' RACE

The 5' and 3' ends of the environmental viral genomes were cloned using the 5' and 3' RACE systems (Invitrogen) according to the manufacturer's instructions. 3' RACE with the SOG genome required the addition of a poly(A) tract with poly(A) polymerase (Invitrogen) before cDNA synthesis. cDNA was synthesized directly from extracted viral RNA from the appropriate library. Three clones of each 5' and 3' end were sequenced.

5.3.5 PCR

5.3.5.1 Closing gaps in the assembly

PCR with primers targeting specific regions of the two JP environmental genomes was used to verify the genome assembly, increase sequencing coverage and reconfirm the presence of notable genome features. The template for these reactions was the amplified and purified PCR product from the JP and SOG shotgun libraries. Table 5.4 lists the sequence and genome position of the primers used. The standard PCR conditions were reactions with 1 U of Platinum Taq DNA polymerase (Invitrogen) in 1x Platinum Taq buffer, 1.5 mM MgCl2, 0.2 mM of each dNTP, and 0.2 µM of each primer (Table 5.4), in a final volume of 50 µl. Thermocycler conditions were as follows: activation of the enzyme at 94 °C for 1 min 15 s, followed by 30 cycles of denaturation at 94 °C for 45 s, annealing at 50 °C for 45 s and extension at 72 °C for 1 min. The reaction was terminated after a final extension stage of 5 min at 72 °C. PCR products were purified with a PCR Miniature cleanup kit (Qiagen) and sequenced directly with both primers.
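As a rough planning aid, the programmed hold times in the standard PCR protocol above can be added up. The Python sketch below does that arithmetic only. It ignores ramping between temperatures, so real run times on any instrument will be longer, and the variable and function names are illustrative rather than taken from the original work.

```python
# Hold times (seconds) from the standard PCR protocol described above.
ACTIVATION_S = 75             # 94 °C for 1 min 15 s
CYCLE_STEPS_S = [45, 45, 60]  # denaturation, annealing, extension
CYCLES = 30
FINAL_EXTENSION_S = 5 * 60    # 72 °C for 5 min


def programmed_time_s(activation, cycle_steps, cycles, final_extension):
    """Total programmed hold time in seconds, excluding temperature ramps."""
    return activation + cycles * sum(cycle_steps) + final_extension


total_s = programmed_time_s(ACTIVATION_S, CYCLE_STEPS_S, CYCLES, FINAL_EXTENSION_S)
print(f"{total_s} s = {total_s / 60:.2f} min")  # 4875 s = 81.25 min
```

The 30 cycles account for 4500 s of the 4875 s total, so the cycle count dominates the run time far more than the activation or final extension steps.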
5.3.5.2 Environmental screening

To assess the temporal and geographic distribution of the JP genomes, extracted RNA from viral concentrates was screened using the Superscript III One-step RT-PCR System with Platinum Taq DNA Polymerase (Invitrogen) with primers JP-A 5 and 6 and JP-B 6 and 7 (Table 5.4). The template for the reactions was DNase I treated viral RNA extracted with a QIAamp Minelute Virus Spin Kit (Qiagen) according to the manufacturer's instructions. Each reaction consisted of RNA template, 1x reaction mix, 0.2 µM of each primer and 1 µl RT/Platinum Taq mix in a volume of 50 µl. Reactions were incubated for 30 min at 50 °C, then immediately heated to 94 °C for 45 s, followed by 35 cycles of denaturation at 94 °C for 15 s, annealing at 50 °C for 30 s and extension at 68 °C for 1 min. After a final extension step at 68 °C for 5 min, RT-PCR products were analyzed by agarose gel electrophoresis. Products were sequenced to verify that the correct target had been amplified.

5.4 Tables and Figures

Table 5.1 Comparison of base composition between polycistronic picorna-like viruses

Genome     A      C      G      U      % G+C
JP-A       27.1   19.4   22.0   31.6   41
JP-B       30.8   17.9   19.7   31.6   38
ABPV       35.7   15.4   20.1   28.9   36
ALPV       31.3   19.4   19.2   30.2   39
BQCV       29.2   18.5   21.6   30.6   40
CrPV       32.61  18.4   20.9   28.1   39
DCV        29.9   16.3   20.4   33.4   37
HiPV       29.2   18.7   20.9   31.2   39
KBV        33.8   17.5   20.2   28.6   38
PSIV       31.3   17.0   19.4   32.3   36
RhPV       30.0   18.6   20.2   31.2   39
RsRNAV     31.2   16.7   19.5   32.5   36
SINV-1     32.9   18.3   20.5   28.2   39
SssRNAV    24.2   26.1   23.6   26.0   50
TSV        28.0   20.2   23.0   28.8   43
TrV        28.7   16.1   19.8   35.4   36
Average    30.4   18.4   20.7   30.5   39

Table 5.2 JP genome survey sample sites. A "+" indicates amplification and "-" indicates no amplification occurred, "n.a." indicates the data are not available and "S" means the sample was taken from the surface.

Station name  Station location (B.C., Canada)  Date (mm/dd/yy)  Location (Lat., Long.)  Depth (m)  Temp (°C)  Salinity (ppt)  JP-A PCR  JP-B PCR
JP            Jericho Pier                     04/28/00         49.27, -123.20          S          9          26              +         +
JP            Jericho Pier                     06/15/00         49.27, -123.20          S          14         12              +         +
JP            Jericho Pier                     06/29/00         49.27, -123.20          S          17         12              +         +
JP            Jericho Pier                     07/06/00         49.27, -123.20          S          16         13              +         +
JP            Jericho Pier                     07/13/00         49.27, -123.20          S          18         8               -         -
JP            Jericho Pier                     07/27/00         49.27, -123.20          S          18         11              +         +
JP            Jericho Pier                     08/17/00         49.27, -123.20          S          18         18              +         +
JP            Jericho Pier                     09/14/00         49.27, -123.20          S          15         19              +         +
JP            Jericho Pier                     09/21/00         49.27, -123.20          S          15         16              -         +
JP            Jericho Pier                     09/28/00         49.27, -123.20          S          14         21              +         +
JP            Jericho Pier                     11/23/00         49.27, -123.20          S          8          27              +         +
JP            Jericho Pier                     02/15/01         49.27, -123.20          S          7          27              +         +
JP            Jericho Pier                     06/14/01         49.27, -123.20          S          15         13              +         +
SEC           Sechelt Inlet                    07/06/03         49.69, -123.84          4          13         26              -         +
TEA           Teakearne Inlet                  07/07/03         50.19, -124.85          5          13         28              +         -
QUA           Quadra Island                    07/07/03         50.19, -125.14          3          13         28              -         -
ARR           Arrow Pass                       07/09/03         50.72, -126.67          2          10         31              +         +
IEC           Imperial Eagle Channel           06/20/99         48.87, -125.21          7          n.a.       n.a.            +         -
TRE           Trevor Channel                   06/28/99         48.97, -125.16          S          n.a.       n.a.            +         +
BAM           Bamfield Inlet                   07/06/99         48.81, -125.16          S          n.a.       n.a.            +         +
NUM           Numukamis Bay                    07/12/99         48.90, -125.01          8          n.a.       n.a.            +         +

Table 5.3 Virus sequence details

Virus group      Virus acronym  Full name                                  NCBI accession #
Cheravirus       ALSV           Apple latent spherical virus               NC_003787
Cheravirus       CRLV           Cherry rasp leaf virus                     NC_006271
Comoviridae      BBWV1          Broad bean wilt virus 1                    NC_005289
Comoviridae      CPMV           Cowpea mosaic virus                        NC_003549
Comoviridae      TRSV           Tobacco ringspot virus                     NC_005097
Dicistroviridae  ABPV           Acute bee paralysis virus                  NC_002548
Dicistroviridae  BQCV           Black queen cell virus                     NC_003784
Dicistroviridae  CrPV           Cricket paralysis virus                    NC_003924
Dicistroviridae  DCV            Drosophila C virus                         NC_001834
Dicistroviridae  HiPV           Himetobi P virus                           NC_003782
Dicistroviridae  PSIV           Plautia stali intestine virus              NC_003779
Dicistroviridae  TSV            Taura syndrome virus                       NC_003005
Dicistroviridae  TrV            Triatoma virus                             NC_003783
Iflavirus        DWV            Deformed wing virus                        NC_004830
Iflavirus        IFV            Infectious flacherie virus                 NC_003781
Iflavirus        KV             Kakugo virus                               NC_005876
Iflavirus        PnPV           Perina nuda picorna-like virus             NC_003113
Iflavirus        SbV            Sacbrood virus                             NC_002066
Iflavirus        VDV            Varroa destructor virus                    NC_006494
Marnaviridae     HaRNAV         Heterosigma akashiwo RNA virus             NC_005281
Picornaviridae   AiV            Aichi virus                                NC_001918
Picornaviridae   EMCV           Encephalomyocarditis virus                 NC_001479
Picornaviridae   ERBV           Equine rhinitis B virus 1                  NC_003983
Picornaviridae   FMDV           Foot-and-mouth disease virus A             NC_011450
Picornaviridae   HAV            Hepatitis A virus                          NC_001489
Picornaviridae   HPeV           Human parechovirus                         NC_001897
Picornaviridae   HRV            Human rhinovirus 14                        NC_001490
Picornaviridae   PV             Poliovirus                                 NC_002058
Picornaviridae   PTV            Porcine teschovirus 1                      NC_003985
Sadwavirus       SDV            Satsuma dwarf virus                        NC_003785
Sadwavirus       NIMV           Navel orange infectious mottling virus     AB022887
Sequiviridae     MCDV           Maize chlorotic dwarf virus                NC_003626
Sequiviridae     PYFV           Parsnip yellow fleck virus                 NC_003628
Sequiviridae     RTSV           Rice tungro spherical virus                NC_001632
Tombusviridae    CaRMV          Carnation mottle virus                     NC_001265
Tombusviridae    CRSV           Carnation ringspot virus                   NC_003530
Tombusviridae    MCMV           Maize chlorotic mottle virus               NC_003627
Tombusviridae    OCSV           Oat chlorotic stunt virus                  NC_003633
Tombusviridae    PMV            Panicum mosaic virus                       NC_002598
Tombusviridae    PoLV           Pothos latent virus                        NC_000939
Tombusviridae    TBSV           Tomato bushy stunt virus                   NC_001554
Tombusviridae    TNV-A          Tobacco necrosis virus A                   NC_001777
Umbravirus       CMoMV          Carrot mottle mimic virus                  NC_001726
Umbravirus       GRV            Groundnut rosette virus                    NC_003603
Umbravirus       PEMV-2         Pea enation mosaic virus-2                 NC_003853
Umbravirus       TBTV           Tobacco bushy top virus                    NC_004366
Unclassified     RsRNAV         Rhizosolenia setigera RNA virus            AB243297
Unclassified     SssRNAV        Schizochytrium single-stranded RNA virus   NC_007522

Table 5.4 PCR primers used to complete the three genome sequences. Primers JP-A 5 and 6 and JP-B 6 and 7 (marked with an asterisk) were used in the environmental survey.

Genome  Primer  Sequence (5'-3')            Location (bp)  Strand primer is based on
JP-A    1       TTATTGCTAAGGCTGAAAGTCT      2596-2617      +
JP-A    2       ATCCATTTTCTACCAACTTCAC      3467-3484      -
JP-A    3       TCGTCGGGAAGATGGC            3764-3779      +
JP-A    4       GAAGCCTGCCACATCAAT          4285-4300      -
JP-A    5*      ATGGTGGCAGTATGGTCG          5552-5569      +
JP-A    6*      CACTGGTATTCTTTGATTTTGAT     6165-6185      -
JP-A    7       TTGTGGATGATTCTGAACTTG       6881-6901      +
JP-A    8       AAAATCGTCTCCAGCAGC          7863-7878      -
JP-A    9       TTGCTCCTTATGCTCCTCA         7943-7961      +
JP-A    10      GAAGGTTCTGGTGTTTATTTGTA     8881-8901      -
JP-B    1       CAATCATACCCCTGAGTTTAGA      213-234        +
JP-B    2       AGTCTCAACAACACCCAAGC        1058-1077      -
JP-B    3       CCCGATTTTCTGTATGTTTTAG      1397-1418      +
JP-B    4       ACCAACGACCAACTTAGCC         2076-2094      -
JP-B    5       GCGAAATGAAAAGGAGAAG         2646-2664      +
JP-B    6*      CGCTCTCGGACATAACAAA         3150-3168      -
JP-B    7*      CCGTTTTCCGTTACATTGA         3666-3684      +
JP-B    8       TTTTACCAACCTTAGCCTTCT       4240-4260      -
JP-B    9       GCTTCTTACTAAATCAATCCTTCTA   5521-5545      +
JP-B    10      GCTAAAGTACAACCATAGAAAAATG   6416-6440      -
SOG     1       ATACTTCTTCCCGCATCAG         378-398        +
SOG     2       TCCTTGGAATCGCTTGTTGT        771-790        -
SOG     3       CGTCGGGTCGTCTAAAAC          1021-1040      +
SOG     4       CAGGCTTCTGAGGTGTGG          1464-1481      -
SOG     5       GACTCCAACACAACAAATCG        2716-2737      +
SOG     6       GAGACAGGACAAGCGTTATG        3160-3179      -

Figure 5.1 Analysis of genomes for possible open reading frames. In the ORF maps created with DNA Strider (Marck 1988), for each reading frame, potential start codons (AUG) are shown with a half-height line and stop codons (UGA, UAA, and UAG) are shown by full-height lines. Putative genes (Hel = helicase, Pro = protease, RdRp = RNA-dependent RNA polymerase) and genomic features (UTR = untranslated region, IGR = intergenic region) are noted below each genome.
See text for more detail.

Figure 5.2 Map of the Strait of Georgia, British Columbia, Canada, with station locations. JP-A and JP-B were detected at 5/9 stations. The SOG station was not assayed for JP-A or JP-B. See Table 5.2 for additional information.

Figure 5.3 Bayesian maximum likelihood trees of aligned RdRp amino acid sequences from JP-A and JP-B and representative members of the proposed order Picornavirales. Bayesian clade credibility values are shown for relevant nodes in boldface followed by bootstrap values based on neighbor-joining analysis. The Bayesian scale bar indicates a distance of 0.1. See Table 5.3 for complete virus names and accession numbers.

Figure 5.4 Bayesian maximum likelihood trees of aligned concatenated helicase, RdRp and VP3-like capsid amino acid sequences from JP-A and JP-B and other picorna-like viruses. Bayesian clade credibility values are shown for relevant nodes in boldface followed by bootstrap values based on neighbor-joining analysis. The Bayesian scale bar indicates a distance of 0.1. See Table 5.3 for complete virus names and accession numbers.

Figure 5.5 Bayesian maximum likelihood trees of aligned RdRp amino acid sequences from SOG and members of the family Tombusviridae and unassigned genus Umbravirus. Bayesian clade credibility values are shown for relevant nodes in boldface followed by bootstrap values based on neighbor-joining analysis. The Bayesian scale bar indicates a distance of 0.1. See Table 5.3 for complete virus names and accession numbers.

5.5 References

Altekar, G., S. Dwarkadas, J. P. Huelsenbeck, and F. Ronquist. 2004. Parallel metropolis coupled Markov chain Monte Carlo for Bayesian phylogenetic inference. Bioinformatics 20: 407-415.
Altschul, S. F., T. L. Madden, A. A. Schaffer, J. H. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25: 3389-3402.
Cevallos, R. C., and P. Sarnow. 2005. Factor-independent assembly of elongation-competent ribosomes by an internal ribosome entry site located in an RNA virus that infects penaeid shrimp. Journal of Virology 79: 677-683.
Christian, P., C. M. Fauquet, A. E. Gorbalenya, A. M. G. King, N. Knowles, O. Legall, and G. Stanway. 2005. A proposed Picornavirales order. In C. M. Fauquet [ed.], Microbes in a changing world. International Unions of Microbiological Societies.
Culley, A. I., A. S. Lang, and C. A. Suttle. 2003. High diversity of unknown picorna-like viruses in the sea. Nature 424: 1054-1057.
Culley, A. I., A. S. Lang, and C. A. Suttle. 2006. Metagenomic analysis of coastal RNA virus communities. Science 312: 1795-1798.
Czibener, C., D. Alvarez, E. Scodeller, and A. V. Gamarnik. 2005. Characterization of internal ribosomal entry sites of Triatoma virus. Journal of General Virology 86: 2275-2280.
Hatakeyama, Y., N. Shibuya, T. Nishiyama, and N. Nakashima. 2004. Structural variant of the intergenic internal ribosome entry site elements in dicistroviruses and computational search for their counterparts. RNA 10: 779-786.
Jan, E., and P. Sarnow. 2002. Factorless ribosome assembly on the internal ribosome entry site of Cricket paralysis virus. Journal of Molecular Biology 324: 889-902.
Koonin, E. V., and V. V. Dolja. 1993. Evolution and taxonomy of positive-strand RNA viruses - implications of comparative-analysis of amino-acid-sequences. Critical Reviews in Biochemistry and Molecular Biology 28: 375-430.
Kozak, M. 1986. Point Mutations Define a Sequence Flanking the AUG Initiator Codon That Modulates Translation by Eukaryotic Ribosomes. Cell 44: 283-292.
Lommel, S. A., G. P. Martelli, L. Rubino, and M. Russo. 2004. Tombusviridae, p. 907-936. In C. M. Fauquet, M. A. Mayo, J. Maniloff, U. Desselberger, and L. A. Ball [eds.], Virus Taxonomy. Eighth Report of the International Committee on Taxonomy of Viruses. Elsevier.
Marck, C. 1988. "DNA Strider": a "C" program for the fast analysis of DNA and protein sequences on the Apple Macintosh family of computers. Nucleic Acids Research 16: 1829-1836.
Nagasaki, K., Y. Tomaru, N. Katanozaka, Y. Shirai, K. Nishida, S. Itakura, and M. Yamaguchi. 2004. Isolation and characterization of a novel single-stranded RNA virus infecting the bloom-forming diatom Rhizosolenia setigera. Applied and Environmental Microbiology 70: 704-711.
Nishiyama, T., H. Yamamoto, N. Shibuya, Y. Hatakeyama, A. Hachimori, T. Uchiumi, and N. Nakashima. 2003. Structural elements in the internal ribosome entry site of Plautia stali intestine virus responsible for binding with ribosomes. Nucleic Acids Research 31: 2434-2442.
Okayama, H., and P. Berg. 1982. High-efficiency cloning of full-length cDNA. Molecular and Cellular Biology 2: 161-170.
Smith, A. 2000. Aquatic Virus Cycles, p. 447-491. In C. Hurst [ed.], Viral Ecology. Academic Press.
Suttle, C. A., A. M. Chan, and M. T. Cottrell. 1991. Use of ultrafiltration to isolate viruses from seawater which are pathogens of marine phytoplankton. Applied and Environmental Microbiology 57: 721-726.
Swofford, D. 2000. PAUP*: Phylogenetic Analysis Using Parsimony and other Methods 4.0. Sinauer Associates, Inc.
Takao, Y., K. Nagasaki, K. Mise, T. Okuno, and D. Honda. 2005. Isolation and characterization of a novel single-stranded RNA virus infectious to a marine fungoid protist, Schizochytrium sp. (Thraustochytriaceae, Labyrinthulea). Applied and Environmental Microbiology 71: 4516-4522.
Thompson, J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, and D. G. Higgins. 1997. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Research 25: 4876-4882.
Tomaru, Y., N. Katanozaka, K. Nishida, Y. Shirai, K. Tarutani, M. Yamaguchi, and K. Nagasaki. 2004.
Isolation and characterization of two distinct types of HcRNAV, a single-stranded RNA virus infecting the bivalve-killing microalga Heterocapsa circularisquama. Aquatic Microbial Ecology 34: 207-218.
White, K. A., and P. D. Nagy. 2004. Advances in the molecular biology of tombusviruses: gene expression, genome replication, and recombination. Progress in Nucleic Acid Research and Molecular Biology 78: 187-226.

Chapter VI. Conclusions

6.1 Concluding remarks

6.1.1 Recapitulation

The primary aim of this dissertation was to characterize RNA viruses in the ocean, a previously undescribed component of the marine microbial community. I approached this task from three perspectives: a single isolate (HaRNAV, Chapter II), a specific taxon of viruses (Picornavirales, Chapter III) and two assemblages of viral genomes (stations JP and SOG, Chapter IV, Chapter V). These data are some of the first characterizations of the richness of the in situ marine RNA virus community.

6.1.2 Bias

The methods developed in this dissertation are based on reverse transcription (RT) of RNA into cDNA and polymerase chain reaction (PCR) amplification of mixed templates. PCR, in combination with improvements in cloning, sequencing and bioinformatics, has resulted in, among other noteworthy contributions, the identification and classification of thousands of microbes that are not in culture (Rappe & Giovannoni 2003). However, bias associated with every step of a PCR-based assay, including sample collection, nucleic acid extraction, PCR amplification and sequence analysis of PCR amplicons, can contribute to an inaccurate portrayal of the community under examination (Von Wintzingerode et al. 1997). Formatting requirements, particularly in Chapters III and IV, prevented the inclusion of a comprehensive discussion of methodological bias. I have therefore attempted to address this topic in the following sections.
6.1.2.1 Bias associated with sample collection and extraction of RNA

The introduction of bias can occur during routine sample collection. For example, during the production of viral concentrates from seawater, samples are typically pre-filtered to remove host organisms (Suttle et al. 1991). However, pre-filtration can result in the removal of viruses larger than the filter pore-size, the loss of viruses adsorbed to the material captured on the filter and the destruction of viruses that are particularly delicate. Moreover, marine virus decay rates can be on the order of hours (Weinbauer 2004) and thus the extended time (often greater than 4 hr) required to concentrate viruses from large volumes of seawater may alter the community composition.

Once the viral concentrate has been collected, extraction of nucleic acids from environmental samples requires a delicate balance between lysing the most recalcitrant members of the community and avoiding damage (e.g. through shearing or degradation) to the extracted nucleic acids. RNA presents an additional challenge because it is less stable than DNA and susceptible to degradation by RNases (Von Wintzingerode et al. 1997). Furthermore, difficulties can arise during the isolation of RNA when contaminating DNA resists removal by enzymatic degradation.

6.1.2.2 Bias associated with RT

Reverse transcriptase synthesizes DNA from an RNA template. The enzyme lacks 3' to 5' proofreading exonuclease activity, contributing to its relatively high error rate in comparison to other DNA polymerases (Yang et al. 2002). For example, Superscript II reverse transcriptase (Invitrogen) is approximately 13 times more error prone than Platinum Taq polymerase (Invitrogen) (Roberts et al. 1988). Factors such as the concentration of target RNA, the amount of template secondary structure and priming conditions including annealing temperature can significantly affect the precision, efficiency and production of the RT reaction (Stahlberg et al. 2001).
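To give a sense of scale for the fidelity difference described above, the following is a minimal sketch rather than a measurement from this work. It assumes that misincorporations are independent and occur at a constant per-base rate during a single synthesis pass. Only the 13-fold ratio comes from the text; the baseline rate (BASELINE_RATE) and the template lengths are hypothetical values chosen for illustration.

```python
def error_free_fraction(per_base_error_rate: float, length_nt: int) -> float:
    """Probability that one copy of length_nt bases contains no errors,
    assuming independent errors at a constant per-base rate."""
    if not 0.0 <= per_base_error_rate <= 1.0:
        raise ValueError("per_base_error_rate must be between 0 and 1")
    if length_nt < 0:
        raise ValueError("length_nt must be non-negative")
    return (1.0 - per_base_error_rate) ** length_nt


# Hypothetical baseline error rate (errors per base) for the DNA polymerase.
BASELINE_RATE = 1e-5
# Ratio quoted in the text: RT is roughly 13 times more error prone.
RT_FOLD = 13

if __name__ == "__main__":
    for length in (500, 1000, 9000):
        dna = error_free_fraction(BASELINE_RATE, length)
        rt = error_free_fraction(BASELINE_RATE * RT_FOLD, length)
        print(f"{length:>5} nt  polymerase: {dna:.3f}  RT: {rt:.3f}")
```

Under these assumptions the error-free fraction falls off exponentially with template length, so a fixed fold difference in per-base fidelity matters more for genome-length cDNA than for short amplicons.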
6.1.2.3 Bias associated with PCR

Amplification of environmental targets with PCR can result in differential amplification, the formation of chimeras and heteroduplexes and artifacts from DNA polymerase error, among other biases (Von Wintzingerode et al. 1997, Kanagawa 2003). Polz and Cavanaugh (1998) found that PCR with degenerate primers did not maintain the original ratio of template after 25 cycles and that templates with GC-rich priming sites were preferentially amplified. Moreover, Suzuki and Giovannoni (1996) observed that in reactions with greater than 35 cycles, a 1:1 ratio of products occurred regardless of the initial ratio of target sequences.

PCR can also produce artifacts. Chimeras, molecules formed from parts of two different sequences, can comprise approximately 10% of PCR amplified products (Choi et al. 1994) and appear to increase in frequency with cycle number (Von Wintzingerode et al. 1997). The DNA polymerase in PCR can mis-incorporate bases during amplification resulting in sequencing artifacts. Eckert and Kunkel (1991) calculated Taq error rates as high as 3 x 10^-3 and Acinas et al. (2005) identified Taq DNA polymerase error as the primary cause of artifacts during the construction of rRNA clone libraries.

6.1.2.4 Bias associated with cloning

The cloning of amplified products can be another significant source of bias. Factors that may lead to cloning bias include the expression of deleterious genes (a significant concern with phages), a decrease in cloning efficiency with increasing insert size, the formation of heteroduplexes and inappropriate antibiotic resistance (Kanagawa 2003). For example, Rainey et al. (1994) observed significant differences in community composition between clone libraries constructed with different cloning systems from the same sample.
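The drift of product ratios toward 1:1 described in Section 6.1.2.3 can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the model used by the cited authors. It assumes that each template's per-cycle amplification efficiency declines as its own product accumulates, following efficiency = 1 / (1 + amount / saturation). The function name simulate_pcr, the saturation constant and the starting amounts are all hypothetical.

```python
def simulate_pcr(initial_a: float, initial_b: float, cycles: int,
                 saturation: float = 1000.0) -> list:
    """Return the A:B product ratio after each cycle (index 0 = starting ratio).

    Toy assumption: each template's per-cycle efficiency falls as its own
    product accumulates, efficiency = 1 / (1 + amount / saturation).
    """
    a, b = initial_a, initial_b
    ratios = [a / b]
    for _ in range(cycles):
        # Abundant templates saturate sooner, so they gain proportionally less.
        a += a / (1.0 + a / saturation)
        b += b / (1.0 + b / saturation)
        ratios.append(a / b)
    return ratios


if __name__ == "__main__":
    ratios = simulate_pcr(initial_a=10.0, initial_b=1.0, cycles=35)
    for cycle in (0, 10, 25, 35):
        print(f"cycle {cycle:>2}: A:B = {ratios[cycle]:.2f}")
```

In this toy model the initially abundant template always remains the more abundant product, but the ratio shrinks with every cycle. That is one way to see why clone frequencies from high-cycle reactions are a poor guide to the relative abundance of templates in the original sample.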
Although PCR-based methods can be effective in characterizing the richness of a community as well as the identity and phylogeny of its members, estimates of evenness are dubious due to the biases discussed above. In Chapter III, a degenerate RT-PCR assay targeting a region of the RdRp conserved among picorna-like viruses resulted in the discovery of a diverse array of picorna-like viruses from samples collected from the Strait of Georgia (Figure 3.1). The advantages of this method are that it is relatively simple and inexpensive (compared to a metagenomic approach, for example). However, as discussed above, methodological biases limit the application of this approach to the examination of the richness of an RNA viral community only. Nevertheless, targeting an evolutionarily informative, conserved gene with degenerate RT-PCR is an approach that can be adapted to interrogate the virioplankton for additional taxa of viruses such as reoviruses, luteoviruses and retroviruses.

6.1.2.5 Bias associated with WGS library construction

The whole-genome shotgun (WGS) library method employed in Chapter IV was based on the linker amplified shotgun library (LASL) protocol described in Breitbart et al. (2002). The LASL method appears to produce a random representation of the original template with an error rate below 1% and no detectable chimeras (Rohwer et al. 2001). A test LASL from a mixed community of previously sequenced vibriophages demonstrated that the average number of clones that contributed to any base is relatively constant over the entire length of the genomes in the original sample (Seguritan et al. 2003). Furthermore, a linear relationship existed between size of sequence overlap and number of contigs (r^2 = 0.93), suggesting that sequence fragments were generated from the original templates randomly (http://www.sci.sdsu.edu/PHAGE/LASL/index.htm).
Of the 60450 bp generated in the test library, 332 erroneous bases were detected, resulting in an error rate of 0.55%. Moreover, no chimeras were produced in approximately 1000 sequences (http://www.sci.sdsu.edu/PHAGE/LASL/index.htm). The performance of LASL under the test conditions above is impressive; however, future experiments should include an evaluation of the method with a more complex viral community and to a greater depth of sequencing.

Research published by Zhang et al. (2006) used a method similar to the WGS approach in Chapter IV to examine the RNA viral community in human feces. After extracting total RNA from the viral fraction, RT was conducted with random primers, followed by strand-displacement second-strand synthesis. The double-stranded DNA products were digested with an endonuclease, followed by adapter ligation and PCR with primers targeting sites on the adapters. Amplicons between 500 and 1000 bp were then cloned and sequenced (Zhang et al. 2006). However, no analysis of bias was described.

Several different methods have been developed to accurately interrogate the transcriptome of single cells (Peano et al. 2006). These methods share several of the same challenges as the examination of natural communities of RNA viruses, beginning with very low concentrations of starting RNA template. Surprisingly, those methods that include an exponential amplification step after the addition of primer sites to the target template (like the WGS method described in Chapter IV) generally introduced less bias than other approaches such as linear RNA amplification (Iscove et al. 2002, Subkhankulova & Livesey 2006).
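Returning to the LASL test library figures quoted earlier in this section, the arithmetic can be reproduced with a short sketch. The error rate is simply 332 / 60450. For the chimera result, the sketch also computes an upper bound on the true chimera frequency that is still compatible with observing none, which is an addition for illustration and not an analysis from the cited test. It assumes exactly 1000 independent sequences (the text says "approximately 1000"), and both function names are hypothetical.

```python
def observed_rate(events: int, trials: int) -> float:
    """Fraction of trials in which the event was observed."""
    if trials <= 0 or not 0 <= events <= trials:
        raise ValueError("require trials > 0 and 0 <= events <= trials")
    return events / trials


def zero_event_upper_bound(trials: int, confidence: float = 0.95) -> float:
    """One-sided upper bound on the event rate when zero events were seen.

    Solves (1 - p) ** trials = 1 - confidence for p, assuming the trials
    are independent.
    """
    if trials <= 0 or not 0.0 < confidence < 1.0:
        raise ValueError("require trials > 0 and 0 < confidence < 1")
    return 1.0 - (1.0 - confidence) ** (1.0 / trials)


if __name__ == "__main__":
    # Values quoted in the text: 332 erroneous bases in 60450 bp.
    print(f"base error rate: {observed_rate(332, 60450):.2%}")
    # Assumed sample size of exactly 1000 sequences with no chimeras.
    print(f"chimera rate upper bound: {zero_event_upper_bound(1000):.2%}")
```

The second figure is a reminder that "no chimeras detected" in a sample of this size is still compatible with a low, non-zero chimera frequency, which supports the call above for testing at greater sequencing depth.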
Were funding and access to virus isolates no obstacle, a direct way to evaluate error and bias in the RNA virus WGS method would be to construct a test library from a mixed community of characterized RNA virus isolates, including retro-, double-stranded, positive- and negative-sense single-stranded representatives, in known proportions. It is uncertain, however, whether the results of this exercise would be applicable to an unknown assortment of RNA viral genomes. Nevertheless, the relatively small genome size of RNA viruses makes the construction of whole-genome shotgun libraries a realistic approach to rapidly survey the diversity of marine RNA virus communities. Additionally, this method may be modified to characterize RNA virus populations from a variety of samples such as blood, aerosols and sewage effluent.

6.1.3 Significance of the research

A significant finding of this work was the identification of marine RNA viruses with positive-sense single-stranded genomes that were distantly related to established virus taxa. This research suggests that there is a persistent and widespread clade of novel picorna-like viruses in the Strait of Georgia, which is comprised of environmental sequences and RNA virus isolates that infect a diversity of protist hosts. For example, HaRNAV infects a photoautotrophic raphidophyte (Tai et al. 2003), while SssRNAV infects a heterotrophic thraustochytrid (Takao et al. 2005, Figure 6.1). In contrast to the marine DNA virus community, which appears to be composed primarily of phages (Edwards & Rohwer 2005), the viral sequences identified in this study are homologous to sequences from RNA viruses that infect eukaryotes; however, a majority of the viral sequences have no significant homology to any known viruses. Because methodological bias was not quantified, it is possible that the marine RNA virus community includes presently unidentifiable RNA bacteriophages.
Nevertheless, virus abundance is linked to host abundance; hence, if the marine RNA virioplankton is primarily composed of viruses that infect eukaryotes, it is likely that RNA viruses comprise a small percentage of the total virioplankton. However, as exemplified by the recent isolation of RsRNAV (Nagasaki et al. 2004) and a suite of RNA viruses that infect multicellular organisms from shrimp (Mari et al. 2002) to salmonids (Bernard & Bremont 1995) to sea lions (Smith et al. 2000), it may be that the greatest impact of RNA viruses is that they affect the abundance and population structure of ecologically important marine organisms.

The preceding chapters have provided a preliminary glimpse into a previously uncharacterized component of marine microbial communities and may be of interest to a wide range of scientists. For example, virologists will be interested in the existence of previously unknown families of RNA viruses; biological oceanographers will be interested because RNA viruses have not previously been considered important players in the ocean; microbial ecologists should be intrigued by the discovery of an avenue of previously unexplored microbial diversity in the ocean; scientists interested in emergent pathogens will likely find the discovery of picorna-like viruses of unknown pathogenicity exciting; phycologists will be interested to learn that picorna-like viruses are pathogens of algae. However, more data and better techniques are required before we can examine important topics such as the role of marine RNA viruses in biogeochemical cycling and the evolutionary interrelationships among taxa of RNA viruses. Future research should include the refinement of methods used to characterize RNA virus communities (e.g.
virus community collection and metagenomic library construction), determination of the composition and structure of RNA virus assemblages from a greater diversity of aquatic environments, and the continued sequencing of RNA virus isolates. From these data, quantitative molecular techniques can be used to investigate the dynamics of individual RNA virus-host systems in situ and may ultimately lead to a broader understanding of marine RNA virus ecology.

6.2 Figure

Figure 6.1 Clade of marine picorna-like virus RdRp sequences from Figure 4.3. Bayesian clade credibility values are shown for relevant nodes in boldface followed by bootstrap values based on neighbor-joining analysis. JP-A and JP-B are from the assembled environmental genomes. The Bayesian scale bar indicates a distance of 0.1. See Table 4.4 for complete virus names and sequence accession numbers.

6.3 References

Acinas, S. G., R. Sarma-Rupavtarm, V. Klepac-Ceraj, and M. F. Polz. 2005. PCR-induced sequence artifacts and bias: Insights from comparison of two 16S rRNA clone libraries constructed from the same sample. Applied and Environmental Microbiology 71: 8966-8969.
Bernard, J., and M. Bremont. 1995. Molecular biology of fish viruses - a review. Veterinary Research 26: 341-351.
Breitbart, M., P. Salamon, B. Andresen, J. M. Mahaffy, A. M. Segall, D. Mead, F. Azam, and F. Rohwer. 2002. Genomic analysis of uncultured marine viral communities. Proceedings of the National Academy of Sciences of the United States of America 99: 14250-14255.
Choi, B. K., B. J. Paster, F. E. Dewhirst, and U. B. Gobel. 1994. Diversity of cultivable and uncultivable oral spirochetes from a patient with severe destructive periodontitis. Infection and Immunity 62: 1889-1895.
Eckert, K. A., and T. A. Kunkel. 1990. High fidelity DNA-synthesis by the Thermus-aquaticus DNA-polymerase. Nucleic Acids Research 18: 3739-3744.
Edwards, R. A., and F. Rohwer. 2005. Viral metagenomics. Nature Reviews Microbiology 3: 504-510.
Iscove, N. N., M. Barbara, M. Gu, M. Gibson, C. Modi, and N. Winegarden. 2002. Representation is faithfully preserved in global cDNA amplified exponentially from sub-picogram quantities of mRNA. Nature Biotechnology 20: 940-943.
Kanagawa, T. 2003. Bias and artifacts in multi-template polymerase chain reaction (PCR). Journal of Bioscience and Bioengineering 96: 317-323.
Malboeuf, C. M., S. J. Isaacs, N. H. Tran, and B. Kim. 2001. Thermal effects on reverse transcription: Improvement of accuracy and processivity in cDNA synthesis. BioTechniques 30: 1074-1085.
Mari, J., B. T. Poulos, D. V. Lightner, and J. R. Bonami. 2002. Shrimp Taura syndrome virus: genomic characterization and similarity with members of the genus Cricket paralysis-like viruses. Journal of General Virology 83: 915-926.
Nagasaki, K., Y. Tomaru, N. Katanozaka, Y. Shirai, K. Nishida, S. Itakura, and M. Yamaguchi. 2004. Isolation and characterization of a novel single-stranded RNA virus infecting the bloom-forming diatom Rhizosolenia setigera. Applied and Environmental Microbiology 70: 704-711.
Peano, C., M. Severgnini, I. Cifola, G. De Bellis, and C. Battaglia. 2006. Transcriptome amplification methods in gene expression profiling. Expert Review of Molecular Diagnostics 6: 465-480.
Polz, M. F., and C. M. Cavanaugh. 1998. Bias in template-to-product ratios in multi-template PCR. Applied and Environmental Microbiology 64: 3724-3730.
Rainey, F. A., N. Ward, L. I. Sly, and E. Stackebrandt. 1994. Dependence on the taxon composition of clone libraries for PCR amplified, naturally occurring 16S rDNA, on the primer pair and the cloning system used. Experientia 50: 796-797.
Rappe, M. S., and S. J. Giovannoni. 2003. The uncultured microbial majority. Annual Review of Microbiology 57: 369-394.
Roberts, J. D., K. Bebenek, and T. A. Kunkel. 1988. The accuracy of reverse-transcriptase from HIV-1. Science 242: 1171-1173.
Rohwer, F., V. Seguritan, D. H. Choi, A. M. Segall, and F. Azam. 2001.
Production of shotgun libraries using random amplification. BioTechniques 31: 108-118.
Seguritan, V., I. W. Feng, F. Rohwer, M. Swift, and A. M. Segall. 2003. Genome sequences of two closely related Vibrio parahaemolyticus phages, VP16T and VP16C. Journal of Bacteriology 185: 6434-6447.
Smith, A. 2000. Aquatic Virus Cycles, p. 447-491. In C. Hurst [ed.], Viral Ecology. Academic Press.
Stahlberg, A., J. Hakansson, X. Xian, H. Semb, and M. Kubista. 2004. Properties of the reverse transcription reaction in mRNA quantification. Clinical Chemistry 50: 509-515.
Subkhankulova, T., and F. J. Livesey. 2006. Comparative evaluation of linear and exponential amplification techniques for expression profiling at the single-cell level. Genome Biology 7: R18.
Suttle, C. A., A. M. Chan, and M. T. Cottrell. 1991. Use of ultrafiltration to isolate viruses from seawater which are pathogens of marine phytoplankton. Applied and Environmental Microbiology 57: 721-726.
Suzuki, M. T., and S. J. Giovannoni. 1996. Bias caused by template annealing in the amplification of mixtures of 16S rRNA genes by PCR. Applied and Environmental Microbiology 62: 625-630.
Tai, V., J. E. Lawrence, A. S. Lang, A. M. Chan, A. I. Culley, and C. A. Suttle. 2003. Characterization of HaRNAV, a single-stranded RNA virus causing lysis of Heterosigma akashiwo (Raphidophyceae). Journal of Phycology 39: 343-352.
Takao, Y., K. Nagasaki, K. Mise, T. Okuno, and D. Honda. 2005. Isolation and characterization of a novel single-stranded RNA virus infectious to a marine fungoid protist, Schizochytrium sp. (Thraustochytriaceae, Labyrinthulea). Applied and Environmental Microbiology 71: 4516-4522.
Von Wintzingerode, F., U. B. Gobel, and E. Stackebrandt. 1997. Determination of microbial diversity in environmental samples: pitfalls of PCR-based rRNA analysis. FEMS Microbiology Reviews 21: 213-229.
Weinbauer, M. G. 2004. Ecology of prokaryotic viruses. FEMS Microbiology Reviews 28: 127-181.
Zhang, T., M. Breitbart, W. H. Lee, J. Q. Run, C. L. Wei, S. W. L. Soh, M. L. Hibberd, E. T. Liu, F. Rohwer, and Y. J. Ruan. 2006. RNA viral community in human feces: Prevalence of plant pathogenic viruses. PLoS Biology 4: 108-118.
