Sequence and organization of the nonstructural proteins and noncoding regions of melon necrotic spot… Rivière, Carol Jeanne 1989

SEQUENCE AND ORGANIZATION OF THE NONSTRUCTURAL PROTEINS AND NONCODING REGIONS OF MELON NECROTIC SPOT VIRUS

By

Carol Jeanne Riviere

B. Sc. Ag. (Plant Science), University of Alberta, 1976
LL. B., University of British Columbia, 1982

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE
in
THE FACULTY OF GRADUATE STUDIES
PLANT SCIENCE

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
August 1989
© Carol Jeanne Riviere, 1989

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Plant Science
The University of British Columbia
Vancouver, Canada
Date: August 30, 1989

Abstract

Melon necrotic spot virus (MNSV) is an isometric plant virus with a monopartite, positive-sense, single-stranded RNA genome. This thesis reports the nucleotide sequence of regions of the MNSV genome which are noncoding or which encode nonstructural proteins, as well as the number and approximate sizes of the subgenomic RNAs produced by MNSV during infection. These data show that the MNSV genome comprises at least 4,262 nucleotides encoding, in two different reading frames, three or four nonstructural proteins for which there is some evidence of in vivo expression. These proteins, listed in order of their location on the genome from the 5' terminus, have molecular weights of ca. 29,000 (p29), 89,000 (p89), 7,000 (p7) and 14,000 (p14).
The p29 and p89 proteins are probably translated from genomic-length RNA, while p7 and p14 are probably translated from a ca. 1.9 kb subgenomic RNA. A second subgenomic RNA of ca. 1.6 kb is the likely template for translation of the 3' proximal, 42 kDa coat protein. Expression of p89 and p14 requires read-through of the amber termination codons of p29 and p7, respectively. MNSV p89 is the putative viral replicase; its read-through domain contains the GDD motif characteristic of the RNA-dependent RNA polymerases of several plant and animal viruses. The MNSV replicase shows very little amino acid sequence similarity with the replicases of viruses from either of the two established virus supergroups. The MNSV polymerase, however, shows a high degree of similarity with the polymerases of maize chlorotic mottle virus as well as carmo-, tombus- and luteoviruses, which have been suggested to form a third virus supergroup. The role of p29 is unknown, but it presumably functions in replication. The functions of p7 and p14 are also unknown, but may be related to virus transport.

Although MNSV shows amino acid sequence similarity in some of its proteins with viruses in several plant virus groups, it shows the most extensive similarities with members of the carmovirus group. MNSV also closely resembles these viruses in the number, sizes and genomic organization of its proteins as well as in its probable translation strategy. For these reasons, as well as previously reported physico-chemical similarities with the carmoviruses, MNSV should be classified as a member of the carmovirus group.

Table of Contents

Abstract
List of Tables
List of Figures
List of Abbreviations
Acknowledgement
1 Introduction
  1.1 Plant Viruses
    1.1.1 Characteristics
    1.1.2 Genome Types
    1.1.3 Translation Strategies
  1.2 MNSV
    1.2.1 Biological Properties
    1.2.2 Chemical and Physical Properties
    1.2.3 Relationships with Other Viruses
  1.3 Thesis Objectives
2 Materials and Methods
  2.1 Nucleotide Sequencing
    2.1.1 Preparation of Subclones for Sequencing
    2.1.2 Sequencing Procedures
    2.1.3 Storage and Analysis of Sequence Data
  2.2 Northern Blot Analysis of Single-Stranded RNA from MNSV-Infected Plants
    2.2.1 Purification of Virion RNA
    2.2.2 Purification of Single-Stranded RNA from MNSV-Infected Plants
    2.2.3 Northern Blots of Total Leaf Single-Stranded RNA
3 Results and Discussion
  3.1 Sequencing Strategy
    3.1.1 5' Terminus
    3.1.2 3' Terminus
    3.1.3 Central Genomic Region
    3.1.4 Nucleotide Sequence Data
  3.2 Nucleotide Sequence of the MNSV Genome
  3.3 Genomic Organization
    3.3.1 MNSV Genomic Organization
    3.3.2 Comparison of the MNSV Genomic Organization with Those of Related Viruses
  3.4 Translation Strategy
    3.4.1 MNSV Translation Strategy
    3.4.2 Comparison of the MNSV Translation Strategy with Those of Related Viruses
  3.5 Amino Acid Sequence Comparisons of MNSV Nonstructural Proteins
    3.5.1 p29/p89
    3.5.2 The Central Genomic Region of MNSV
  3.6 Summary and Conclusions
Appendices
A Protocols
  A.1 Recovery of DNA Fragments from Low Gelling Temperature Agarose Gels
  A.2 Large Scale Plasmid Purification Using Alkaline Lysis and LiCl Precipitation
  A.3 Removing Extensions from Restriction Fragments Using Mung Bean Nuclease
  A.4 Isolation of Total ssRNA from Leaves
  A.5 Purification of MNSV from Infected Cucumber Plants
B Nucleotide Sequence Data
Bibliography

List of Tables

3.1 Nucleotide Sequences of the 3' Termini of MNSV cDNA Clones Mapping to the 3' End of the Genome
3.2 Percent Amino Acid Sequence Identity Among MNSV p29 and the Analogous Proteins CarMV p27, TCV p28 and MCMV p50
3.3 Percent Amino Acid Sequence Identity Among the Read-through Domains of MNSV p89 and the Analogous Proteins CarMV p86, TCV p88, MCMV p111, CNV p92 and BYDV p99
3.4 Percent Amino Acid Sequence Identity Among MNSV p7A and the Analogous Proteins CarMV p7, TCV p8 and MCMV p9
3.5 Percent Amino Acid Sequence Identity Among MNSV p7B and the Analogous Regions of CarMV, TCV and MCMV

List of Figures

2.1 Genomic Location and Restriction Maps of MNSV cDNA Clones
3.2 Sequencing Strategy
3.3 Nucleotide Sequence of the MNSV Genome
3.4 Open Reading Frames of MNSV Genomic RNA in both the Positive and Negative Sense
3.5 Genomic Organization of MNSV
3.6 Comparison of the Genomic Organizations of MNSV, CarMV, TCV and CNV
3.7 Northern Blot of MNSV-Specific ssRNAs Generated During Infection of Cucumber
3.8 Orientation of MNSV Subgenomic RNAs Relative to the MNSV Genome
3.9 Comparison of the Translation Strategies of MNSV, CarMV and TCV
3.10 Dot Matrix Comparisons of the Amino Acid Sequence of MNSV p89 with Those of the Analogous Proteins CarMV p86, TCV p88, MCMV p111, CNV p92 and BYDV p99
3.11 Alignment of the Amino Acid Sequences of MNSV p29 and the Analogous Proteins CarMV p27, TCV p28 and MCMV p50
3.12 Alignment of the Amino Acid Sequences of the Read-Through Domain of MNSV p89 and the Analogous Proteins CarMV p86, TCV p88, MCMV p111, CNV p92 and BYDV p99
3.13 Alignment of Amino Acids Surrounding the GDD Sequence of MNSV p89 with Conserved Amino Acids Characteristic of Many Viral Polymerases
3.14 Comparison of the Central Genomic Regions of MNSV, CarMV, TCV and MCMV
3.15 Alignment of the Amino Acid Sequences of MNSV p7A and the Analogous Proteins CarMV p7, TCV p8 and MCMV p9
3.16 Alignment of the Amino Acid Sequences of MNSV p7B and the Analogous Regions of CarMV, TCV and MCMV
3.17 Comparison of the Genomic Organizations and Regions of Amino Acid Sequence Similarity Between MNSV and Several Other Plant Viruses

List of Abbreviations

BMV     brome mosaic virus
bp      base pair(s)
BYDV    barley yellow dwarf virus
BSA     bovine serum albumin
CarMV   carnation mottle virus
CNV     cucumber necrosis virus
Da      dalton(s)
DEC     Digital Equipment Corporation
ds      double-stranded
EDTA    ethylenediaminetetraacetic acid
kb      kilobase(s)
kDa     kilodalton(s)
LB      Luria-Bertani
MCMV    maize chlorotic mottle virus
MNSV    melon necrotic spot virus
ORF     open reading frame
PEG     polyethylene glycol
SDS     sodium dodecyl sulfate
SND     Scientific Numeric Database Service
ss      single-stranded
SSC     15 mM sodium citrate, 150 mM sodium chloride
TBSV    tomato bushy stunt virus
TCA     trichloroacetic acid
TCV     turnip crinkle virus
Td      denaturation temperature
TE      10 mM Tris-HCl pH 7.5, 1 mM EDTA
TNE     100 mM Tris-HCl pH 7.5, 100 mM NaCl, 10 mM EDTA
TMV     tobacco mosaic virus
Tris    tris hydroxymethyl aminomethane

Acknowledgement

I would like to thank my advisor Dr.
D'Ann Rochon for excellent supervision while I was carrying out this project and preparing my thesis. I am also grateful to the Vancouver Research Station of Agriculture Canada for providing the facilities and funding necessary to carry out my experimental work. I would also like to express my appreciation to Dr. Joan McPherson for acting as my Departmental advisor and to Drs. Brian Holl, Victor Runeckles, Michael Shaw and R. Antony Warren for serving on my committee. Finally, I would like to thank my husband, Rob Cameron, for assistance with computer analyses and text processing and, more importantly, for his continued support and encouragement throughout my M.Sc. program.

Chapter 1
Introduction

1.1 Plant Viruses

1.1.1 Characteristics

A virus has been defined as "... a set of one or more nucleic acid template molecules, normally encased in a protective coat or coats of protein or lipoprotein, which is able to organize its own replication only within suitable host cells" [Matthews, 1981, p. 11].

Most plant viruses consist primarily of one or more molecules of nucleic acid surrounded by a protein coat made up of repeating subunits of one or a few types of capsid protein. The capsids of some plant viruses are enveloped by a lipid bilayer. Plant virus particles may be isometric (spherical), rod-shaped or bacilliform and range in size from ca. 50-500 nm.

1.1.2 Genome Types

Plant virus genomes may be mono-, bi- or multipartite, i.e., composed of one, two or more molecules of linear or circular nucleic acid. Most multipartite viruses are also multicompartmented, i.e., each genome segment is separately encapsidated. The genome may be composed of either DNA or RNA, which may be either single-stranded (ss) or double-stranded (ds). Single-stranded genomes may be of either positive or negative polarity. A positive-sense ssRNA genome is itself translatable and, therefore, such genomic RNA alone is usually infectious.
Negative-sense ssRNA must be transcribed into its positive-sense complement before it can be translated, so such genomic RNA on its own is not infectious. The majority of plant viruses characterized to date (76.6%) have positive-sense ssRNA genomes [Zaitlin and Hull, 1987] ranging in size from ca. 1.3 x 10^6 to 4.7 x 10^6 daltons (Da), or 3.8-13.8 kilobases (kb) [Matthews, 1982].

1.1.3 Translation Strategies

The ss, positive-polarity RNA viruses which comprise the majority of known plant viruses all contain at least one multicistronic genome segment. Expression of the internal cistrons of such RNAs by a plant host's metabolic machinery poses a problem because plant translation systems have evolved to translate the monocistronic mRNAs normally produced by eukaryotes. A eukaryotic ribosome generally binds to the 5' end of an mRNA, moves along the transcript in a 5' to 3' direction until it encounters the first initiation codon which is in an appropriate translation initiation context [Kozak, 1987, Lütcke et al., 1987], translates from this point to the first termination codon, then falls off the transcript. Thus, if a multicistronic transcript is translated in this manner, only the 5' proximal gene would be expressed; any cistrons located 3' to the first terminator would not be translated.

Plant viruses have evolved several mechanisms to enable expression of the internal cistrons of their multicistronic transcripts by eukaryotic hosts [Dougherty and Hiebert, 1985, Joshi and Haenni, 1984]. These mechanisms include segmentation of the viral genome, polyprotein processing, read-through (suppression) of "leaky" termination codons and production of 3' coterminal subgenomic RNAs. Individual viruses generally use more than one of these translation strategies.

Segmentation of the genome, used by bi- and multipartite viruses, results in the production of genome segments with different genes in the 5' proximal translatable position.
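The scanning behaviour described in Section 1.1.3 can be illustrated with a small sketch. It is a deliberate simplification and not part of the thesis methods: the function name `first_orf` and the example sequence are hypothetical, and the initiation-context requirement is ignored, so the first AUG is always treated as the start codon.

```python
# Simplified scanning model: the ribosome starts at the 5' end, initiates at
# the first AUG it meets, and stops at the first in-frame termination codon.
# Assumption: initiation context is ignored (any AUG is accepted).
STOP_CODONS = {"UAA", "UAG", "UGA"}

def first_orf(mrna):
    """Return (start, end) indices of the 5'-proximal ORF, or None if no AUG."""
    start = mrna.find("AUG")
    if start == -1:
        return None
    # Step through the transcript one codon at a time, in frame with the AUG.
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return (start, i + 3)
    return (start, len(mrna))  # no in-frame terminator found

# Hypothetical two-cistron transcript: AUG AAA UAG, then a second AUG CCC UAA.
transcript = "GGAUGAAAUAGCCAUGCCCUAA"
orf = first_orf(transcript)
```

In this toy transcript only the first cistron is returned; the second AUG lies 3' to the first terminator and is never reached, which is the problem the viral strategies listed in this section address.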
In polyprotein processing, the entire viral transcript is translated as one large protein, which is post-translationally cleaved into several smaller proteins.

With read-through of leaky termination codons, the 5' proximal gene is "punctuated" by a termination codon which is followed by an in-frame, open coding region. Translation sometimes ends at this terminator to produce a smaller protein, but sometimes the terminator is read through to produce a larger protein. Such suppression of termination codons has been shown to occur with several viruses in vitro, e.g., tobacco mosaic virus (TMV) [Pelham, 1989] and carnation mottle virus (CarMV) [Harbison et al., 1985].

Lastly, internal cistrons may be expressed via the production of subgenomic RNAs. During infection, different, shorter than full-length, ss positive-sense RNAs are produced, each with a different internal cistron at the 5' end. All subgenomic RNAs examined to date are 3' co-terminal, i.e., they all contain the same 3' end as the viral genome from which they are derived. In addition, all plant virus subgenomic RNAs have been found to be colinear with their genomic templates, i.e., no spliced plant RNA virus subgenomic RNAs have been found. Production of subgenomic RNAs not only enables expression of internal cistrons; it may also be an important mechanism for regulating the timing of expression of different viral proteins and for amplifying the production of proteins required in large amounts, such as coat proteins.

1.2 MNSV

1.2.1 Biological Properties

Natural Host Range and Symptomatology

Melon or muskmelon necrotic spot virus (MNSV) was first described by Kishi [Kishi, 1960], who isolated the virus from greenhouse melons in Japan.
Outbreaks of the virus have since been reported in greenhouse-grown muskmelons in California [Gonzalez-Garza et al., 1979], cucumbers grown on rockwool in greenhouses and in soil in the Netherlands [Bos et al., 1984], cucumbers grown on rockwool in greenhouses in the U.K. [Thomas and Tomlinson, 1985], melons and possibly watermelons grown in soil under plastic houses in Crete [Avgelis, 1985] and, most recently, in greenhouse-grown muskmelons in Sweden [Ryden and Persson, 1986]. Symptoms described in these naturally occurring infections vary somewhat depending on the species and cultivar of the plant and the isolate of MNSV involved, but are generally characterized by necrotic lesions on infected leaves [Hibi and Furuki, 1985]. Similarly, whether the infection is localized or systemic also seems to vary depending on the combination of plant species and cultivar, and viral isolate, as well as other factors such as environmental conditions [Hibi and Furuki, 1985]. Systemic infections are much less common than localized ones, and occur only sporadically even in appropriate plant/virus combinations [Hibi and Furuki, 1985]. In infected plantings, incidence of the disease can be as high as 45% [Bos et al., 1984]. MNSV can significantly decrease plant growth [Hibi and Furuki, 1985] and in some cases may even cause plant death [Avgelis, 1985].

Experimental Host Range and Symptomatology

As well as having a very narrow natural host range, MNSV, unlike many other plant viruses, also has a very narrow experimental host range [Hibi and Furuki, 1985]. Extensive experimental host range studies have been conducted [Gonzalez-Garza et al., 1979, Bos et al., 1984], but almost all MNSV isolates tested to date apparently infect only certain members of the Cucurbitaceae [Hibi and Furuki, 1985]. The one exception is the Japanese isolate of MNSV, which was found to produce a localized infection in Vigna unguiculata ssp. sesquipedalis [Hibi and Furuki, 1985].
None of the virus isolate/plant species combinations tested to date appears to produce a systemic infection consistently [Hibi and Furuki, 1985], which can make purifying large amounts of the virus relatively difficult.

Transmission

Natural transmission of MNSV to both melons and cucumbers has been shown to occur primarily through the soil via motile zoospores of the chytrid fungus Olpidium radicale, formerly called Olpidium cucurbitacearum [Hibi and Furuki, 1985]. Virus particles adhere to the zoospores' external surfaces, but are not internalized [Hibi and Furuki, 1985]. Hydroponic growing conditions appear to provide an ideal environment for spread of the virus by this vector.

MNSV has also been shown to be experimentally transmissible (8-13%) by certain cucumber beetles (Diabrotica spp.) [Coudriet et al., 1979]. Although these beetles are commonly found in many commercial melon fields [Coudriet et al., 1979], there is no evidence as yet that any natural outbreaks of MNSV have been due to beetle transmission. Transmission of MNSV by the aphid Aphis gossypii has been tested, but the results were negative [Kishi, 1966].

Seed transmission of MNSV has been reported (10-15% [Kishi, 1966], 1-6% [Gonzalez-Garza et al., 1979] and 22.5% [Avgelis, 1985]). It has been suggested that one outbreak of the disease in melons was due to infected seed [Gonzalez-Garza et al., 1979], but since the investigators apparently did not check for the presence of Olpidium radicale, involvement of the fungal vector in this outbreak cannot be ruled out. The investigators in this case noted that neither MNSV nor its symptoms could be detected in field surveys of commercially grown melons, even though the virus was found in commercial seed stocks [Gonzalez-Garza et al., 1979]. Infected seed alone may, therefore, be insufficient to cause an outbreak of the disease, at least under field conditions.

Control

To date, MNSV has been primarily a problem in hydroponically grown greenhouse cucumbers and melons, probably due to the prevalence of Olpidium under these conditions [Hibi and Furuki, 1985]. The vector, and therefore spread of the virus, can be effectively controlled in these circumstances by addition of the surfactant Agral to nutrient solutions [Tomlinson and Faithfull, 1985]. In soil, the vector, and thus the virus, can be controlled by soil sterilization with steam or methyl bromide [Hibi and Furuki, 1985].

1.2.2 Chemical and Physical Properties

MNSV consists of a single spherical particle ca. 30 nm in diameter [Hibi and Furuki, 1985]. Its capsid is composed of a single type of polypeptide, originally reported to have a molecular weight of ca. 46,000 on the basis of its electrophoretic mobility in SDS polyacrylamide gels [Bos et al., 1984]. Subsequent nucleotide sequence data suggest the MNSV coat protein has a molecular weight of 41,840 [Pot, 1987, Riviere et al., 1989]. MNSV has a monopartite, single-stranded (ss), positive-sense RNA genome reported to have a molecular weight of ca. 1.5 x 10^6 as determined by its electrophoretic mobility in non-denaturing polyacrylamide gels [Hibi and Furuki, 1985].

1.2.3 Relationships with Other Viruses

There are a large number of small, spherical plant viruses with ca. 30 nm particles and ssRNA genomes of ca. 1.5 x 10^6 Da which, until recently, have been poorly characterized and classified [Morris and Carrington, 1988]. Early work tried to group MNSV with one or more of these viruses on the basis of serological relatedness and/or shared biological properties. MNSV, however, was not found to be serologically related to any of the viruses it was tested against [Hibi and Furuki, 1985]. Despite a lack of serological relatedness, it was suggested, even until recently, that MNSV should be grouped with cucumber necrosis virus (CNV) [Hibi, 1986].
This was suggested not only because these viruses share the chemical and physical properties described above, but also because they are both natural pathogens of cucumber and are both vectored by the soil fungus Olpidium radicale [Hibi and Furuki, 1985, Dias and McKeen, 1972].

It has recently been determined that CNV is a member of the tombusvirus group [Rochon and Tremaine, 1988]. Comparison of the number, and absolute and relative sizes, of the dsRNAs produced during infection by MNSV and several other small, spherical plant viruses suggested that MNSV might be a member of the carmovirus group [Pot, 1987, Riviere et al., 1989], a plant virus group related to, but separate from, the tombusvirus group. Nucleotide sequence data, however, showed that the MNSV coat protein more closely resembles the coat proteins of tombusviruses than of carmoviruses sequenced to date [Pot, 1987, Riviere et al., 1989]. There was, therefore, still some uncertainty as to how MNSV should be classified when the present study was initiated.

1.3 Thesis Objectives

Complementary DNA (cDNA) clones representing more than 95% of the MNSV genome had been prepared and the nucleotide sequence of the MNSV coat protein had been determined and analysed [Pot, 1987, Riviere et al., 1989] before this study was initiated. The objectives of this thesis project were to:

1. Determine the nucleotide sequence of the rest of the MNSV genome, which would entail sequencing the regions of the genome encoding nonstructural proteins, as well as any noncoding regions, such as those normally found in intercistronic regions and at the 5' and 3' termini of viral genomes;
2. Determine the number and sizes of subgenomic RNAs produced during infection of plants by MNSV;
3. Analyse the MNSV nucleotide sequence to determine the number, location and size of noncoding and potential coding regions in the MNSV genome;
4. Suggest which of these potential coding regions are most likely expressed in vivo, by determining which coding regions have a plausible means for expression, and by comparing the genomic organization and deduced amino acid sequences of potential MNSV proteins with those of other viruses;
5. Propose possible functions for the probable MNSV proteins based on comparisons of their amino acid sequences with those of proteins of known function; and
6. Use the information obtained from analysing the MNSV nucleotide sequence and the MNSV subgenomic RNAs to determine the plant virus group to which MNSV belongs.

Chapter 2
Materials and Methods

Certain procedures are commonly used in molecular biology and form one or more of the steps in the methods described or referenced below. These procedures include ethanol precipitation of DNA or RNA, purification of nucleic acid preparations by extraction with organic solvents, restriction enzyme digests, agarose gel electrophoresis of nucleic acids, ethidium bromide staining and UV photography of such gels, dephosphorylating the termini of DNA fragments, ligations, transformations and growth and storage of E. coli cultures. Steps involved in carrying out these procedures have been described extensively [Maniatis et al., 1982]. Detailed information on handling and preparing reagents required for these procedures and for the methods described below can also be found in this reference [Maniatis et al., 1982].

2.1 Nucleotide Sequencing

2.1.1 Preparation of Subclones for Sequencing

Purified DNA preparations of two pUC13 plasmid derivatives, pMNS17A and pMNS01A, containing cDNA inserts corresponding to more than 95% of the MNSV genome were prepared by a former student and were used to prepare subclones for sequencing. Fig. 2.1 shows the orientation of these inserts relative to each other and to the MNSV genome, and shows restriction enzyme maps of these inserts.

[Figure 2.1 (image not reproduced): MNSV RNA and the cDNA clones pMNS17A and pMNS01A, with the subclones Bst17A-14 (Ori 1) and Bst17A-E3 (Ori 2), drawn against a scale running from 0 to 4266.]

Figure 2.1: Genomic Location and Restriction Maps of MNSV cDNA Clones. pMNS17A and pMNS01A (represented by the heavier solid lines) are the original MNSV cDNA clones provided for this thesis project. Their restriction maps, as previously determined [Pot, 1987], are shown. Bst17A-14 and Bst17A-E3 (represented by thinner solid lines) are the two major subclones derived from pMNS17A. These subclones were used to prepare nested deletion subclones to sequence this area of the genome in both directions. The (BamHI) site shown near the 5' end of the MNSV RNA was deduced from nucleotide sequence data obtained by dideoxynucleotide sequencing of virion RNA. Existence of this site was not verified by restriction enzyme mapping because a cDNA clone representing this area of the genome was not found. The scale at the bottom is numbered in base pairs.

Restriction Enzyme Fragment Subclones

Subclones of pMNS01A and certain areas of pMNS17A were prepared by restriction enzyme digestion of plasmids containing the full-length insert of these plasmids, or of plasmids containing shorter inserts derived from pMNS01A or pMNS17A by previous subcloning. Digests were electrophoresed through non-denaturing, 1% low gelling temperature agarose gels, which were ethidium bromide stained and visualized briefly under long wavelength UV light. Bands containing the desired restriction fragments were excised from the gel and the DNA recovered from the gel pieces by a modification (see App. A.1 for detailed protocol) of the "freeze-squeeze" method [Tautz and Renz, 1983].
Alternatively, ordinary agarose gels were used and the DNA recovered from excised gel pieces using Geneclean™ (BioCan) according to the manufacturer's instructions. The latter method is based on selectively binding DNA to finely ground glass and later eluting it [Vogelstein and Gillespie, 1979]. In some cases, restriction fragments had "sticky ends" which were not compatible with any of the restriction sites in the sequencing vector's multiple cloning site, where the fragment had to be inserted for sequencing. In these cases, the fragments were blunt-ended using a single-strand-specific nuclease (mung bean nuclease), in a modification (see App. A.3 for detailed protocol) of a published method [Hammond and D'Alessio, 1986].

Purified restriction fragments were ligated to suitably digested, calf intestinal phosphatase treated Bluescribe™ or Bluescript™ (Stratagene) phagemid vectors. The ligation mixture was used to transform Escherichia coli (E. coli) DH5α cells (Bethesda Research Laboratories), which had previously been made competent by treatment with CaCl2 [Morrison, 1979].

The transformation mixture was plated on Luria-Bertani (LB) media containing ampicillin and Bluogal™ (Bethesda Research Laboratories), a chromogenic substrate of β-galactosidase. Since the vectors used contain a gene for ampicillin resistance and, as part of the multiple cloning site, the N-terminal portion of the β-galactosidase gene, the use of ampicillin and Bluogal greatly aided in selecting E. coli which had been transformed by recombinant plasmids. Following overnight incubation at 37°C, single white colonies (those most likely to contain the desired plasmids) were used to individually inoculate 3 ml aliquots of liquid LB media containing ampicillin.
These cultures were incubated overnight at 37°C with vigorous shaking, then used to isolate small amounts ("mini-preps") of plasmid DNA by an alkaline lysis procedure [Maniatis et al., 1982]. Restriction digests were used to determine whether plasmids contained the desired inserts and, where inserts were not force-cloned, to determine the orientation of the insert. Plasmid mini-preps containing desired inserts were used as templates for sequencing as described below.

Nested Deletion Subclones

The MNSV cDNA insert in pMNS17A was excised by PstI digestion and recloned, essentially as described above, into Bluescript, a vector designed to aid in the generation of nested deletion subclones for sequencing a long insert in one direction. Although several recombinants were examined, they were all found to contain the insert in only one orientation, even when a second attempt was made to reclone the pMNS17A insert into Bluescript. One plasmid containing the insert in this orientation (Ori 1) was used to generate nested deletion subclones in one direction and was named Bst17A-14 (see Fig. 2.1). To generate nested deletion subclones of most of the insert in the other direction, the large central EcoRI fragment of Bst17A-14 was purified and subcloned into Bluescript. The clone Bst17A-E3 (see Fig. 2.1) contained this fragment in the desired orientation and was used to generate nested deletion subclones in the opposite orientation (Ori 2).

Large amounts of plasmid DNA from Bst17A-14 and Bst17A-E3 were prepared by an alkaline lysis procedure followed by incubation in 2 M LiCl, which precipitates high molecular weight ssRNA [Baltimore, 1966]. Details of this procedure are given in App. A.2. Where necessary, plasmid DNA was further purified by isopycnic centrifugation in CsCl density gradients containing ethidium bromide [Maniatis et al., 1982, Garger et al., 1983].
Nested deletions were generated from each preparation of plasmid DNA using the conditions recommended by the supplier of the Bluescript vector (Stratagene), which are based on the method described by Henikoff [Henikoff, 1984]. This method involves digesting the plasmid with two different restriction enzymes such that an opening is made between one end of the insert and a primer binding site for dideoxynucleotide sequencing. The restriction enzymes are selected to leave the plasmid side of the opening with a 3' extension and the insert side of the opening with either a 5' extension or a blunt end. Exonuclease III is then used to degrade one strand of the insert from the 3' end. This enzyme will only act on double-stranded DNA with a 5' extension or a blunt end and will not degrade DNA with a 3' extension. Thus, once the plasmid has been digested as described above, the plasmid side of the opening is protected from exonuclease III digestion, so the enzyme degrades only the insert, and in only one direction. In addition, exonuclease III degrades DNA at a uniform rate. Conditions can be chosen so that it degrades the DNA at a rate of roughly 200 nucleotides per minute. By removing aliquots at one-minute intervals, one can obtain nested deletions of the insert varying by approximately 200 nucleotides in length, which is normally the minimum distance expected to be covered in a single sequencing reaction. Mung bean nuclease, a single-strand-specific nuclease, is then used to blunt the ends of the deleted plasmid so it can be recircularized with ligase and used to transform competent E. coli. Transformants are selected and screened as described above. Generally, several transformants from each timepoint must be tested to find deletions allowing access to the entire insert.
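The timing arithmetic behind the nested deletion series can be sketched as follows. This is only an illustration under stated assumptions: the function name `deletion_timepoints` and the 1,000-nucleotide insert are hypothetical, and the rate is taken as exactly 200 nucleotides per minute, whereas the text gives this figure only as a rough value.

```python
# Illustrative sketch: expected extent of exonuclease III digestion in each
# aliquot, assuming a perfectly uniform rate (an idealisation).
def deletion_timepoints(insert_length_nt, rate_nt_per_min=200, interval_min=1):
    """Return a list of (minutes, approx. nucleotides removed) per aliquot."""
    timepoints = []
    minute = interval_min
    while True:
        # Digestion cannot remove more than the insert itself.
        removed = min(rate_nt_per_min * minute, insert_length_nt)
        timepoints.append((minute, removed))
        if removed >= insert_length_nt:
            break
        minute += interval_min
    return timepoints

# Hypothetical 1,000-nucleotide insert sampled at one-minute intervals.
series = deletion_timepoints(1000)
```

For this hypothetical insert the series contains five aliquots, each deleted roughly 200 nucleotides further than the last. In practice the deletion endpoints within an aliquot vary, which is why several transformants per timepoint are screened.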
2.1.2 Sequencing Procedures

Double-stranded DNA Sequencing

Sequencing Method

The dideoxyribonucleotide chain termination method [Sanger et al., 1977] was used to sequence double-stranded plasmid DNA templates [Korneluk et al., 1985]. Initially, sequencing was carried out using the Klenow fragment of DNA polymerase I (Klenow) under conditions recommended by the supplier of the sequencing vectors used (Stratagene). Later in the project, a modified T7 DNA polymerase, Sequenase™ (United States Biochemical Corporation), was used instead of Klenow, under conditions described by Toneguzzo et al. [Toneguzzo et al., 1988]. Sequence ambiguities were resolved by changing the polymerase used, by increasing the reaction temperature to decrease the amount of secondary structure the enzyme had to overcome, or by using dGTP analogues (7-deaza-dGTP with Klenow [Barr et al., 1986] or dITP with Sequenase [Tabor and Richardson, 1987]), and/or by adding formamide to the sequencing gel [Martin, 1987] to decrease secondary structure causing electrophoretic artifacts. [α-32P]-dATP (ICN Biomedicals, Inc.) was used as the radioactive label for sequencing.

Sequencing Gels

A Sequi-Gen™ (Bio-Rad Laboratories) apparatus was used to cast and run sequencing gels. The particular model used produced gels which were 50 cm long by 21 cm wide. Gels were cast using wedge spacers (C. B. S. Scientific Company Incorporated) to produce a gel varying in thickness from 0.2 mm at the top to 0.6 mm at the bottom. Such wedge gels allow more sequence to be read than a gel of the same length and uniform thickness. Gels were prepared according to the recommendations of the manufacturer of the sequencing apparatus, except that gels were often poured the night before being used, rather than the minimum 3 hr before use suggested by the manufacturer. Gels generally contained 6% polyacrylamide and 7.7 M urea as a denaturant.

"Sharkstooth" combs (0.25 mm thick, Bio-Rad Laboratories) producing 24 wells (fine teeth) or 48 wells (double-fine teeth) were used to form sample-loading wells. Gels were run according to the manufacturer's recommendations, which included pre-electrophoresing for 1 hr at a constant power setting resulting in a surface gel temperature of 45°C, followed by electrophoresis of the samples at a constant power setting which increased the surface gel temperature to 50°C. After electrophoresis, gels were transferred to Whatman 3MM paper and dried under vacuum at 80°C for 1 hr to improve the resolution of gel autoradiographs. The dried gels were autoradiographed overnight at room temperature without intensifying screens.

Gels containing formamide were prepared, run and handled somewhat differently, as described by Martin [Martin, 1987]. Formamide was deionized using ion-exchange resin as described [Maniatis et al., 1982]. Gels contained 6% polyacrylamide, 7 M urea and 40% formamide. Increased amounts of the polymerization initiator TEMED (tetramethylethylenediamine) were used to overcome the inhibitory effect formamide has on the polymerization process. Formamide also significantly reduces the electrophoretic rate, so gels were run at higher voltages than usual. Formamide gels were run in a hood to minimize the operator's exposure to any volatilized formamide. After electrophoresis, gels were soaked in 5% acetic acid/5% methanol to remove as much formamide as possible, since formamide prevents gels from drying completely. After this treatment, gels were rinsed briefly with distilled H2O, dried and autoradiographed as described above.

RNA Sequencing

Selected portions of MNSV genomic RNA were sequenced using specific oligonucleotide primers and genomic RNA as template for dideoxynucleotide sequencing. MNSV genomic RNA for sequencing was provided, or prepared as described in Section 2.2.1.
Specific oligodeoxyribonucleotide sequencing primers, 17 nucleotides in length, were purchased in order to sequence the MNSV RNA. The sequences of these primers were chosen to maximize the chance that they would hybridize to the MNSV genomic RNA at only one position, slightly 3' to the area to be sequenced. Primers were purified by reversed-phase partition chromatography according to the manufacturers' instructions as adapted from their previously published method [Atkinson and Smith, 1984]. The concentrations of solutions containing the purified primers were estimated by measuring their OD260. Purified primers were stored in solution at -20°C until used.

Dideoxynucleotide sequencing using MNSV RNA as template was carried out using avian myeloblastosis virus (AMV) reverse transcriptase, which has an RNA-dependent DNA polymerase function. Conditions used for sequencing were essentially those recommended by Stratagene for sequencing RNA templates. The only significant modification to this method was that the temperature used to anneal the primer to the RNA was significantly higher than the room temperature suggested by Stratagene. This temperature was increased to try to minimize the number of extraneous bands on the autoradiograph which might result from non-specific priming [Geliebter, 1987]. The optimum annealing temperature for each primer was estimated, as suggested by Geliebter [Geliebter, 1987], as being 5°C below the denaturation temperature (Td), i.e., the temperature in °C at which one half of the primer/template complexes are dissociated. Td was calculated on the basis of each primer's specific sequence using the formula Td = 4(G + C) + 2(A + T) [Suggs et al., 1981], where G, C, A and T are the numbers of the respective nucleotides in the primer. Depending on the primer's Td, annealing was usually carried out at 42-45°C.

Sequencing gels for RNA dideoxynucleotide sequencing were prepared, run and autoradiographed as for DNA sequencing.
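As a worked illustration of the Td rule above, the following Python sketch computes Td and the suggested annealing temperature (Td minus 5°C) for a primer. The function names and the 17-nucleotide example primer are hypothetical; the actual primer sequences used in this work are not reproduced here.

```python
def td_celsius(primer):
    """Denaturation temperature from Td = 4(G + C) + 2(A + T)."""
    seq = primer.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at != len(seq):
        raise ValueError("primer must contain only A, C, G and T")
    return 4 * gc + 2 * at


def annealing_temp_celsius(primer, offset=5):
    """Annealing temperature estimated as Td minus a fixed offset (5 degrees C)."""
    return td_celsius(primer) - offset


# Hypothetical 17-mer with 8 G/C and 9 A/T residues.
example_primer = "ATGCATGCATGCATGCA"
print(td_celsius(example_primer), annealing_temp_celsius(example_primer))
```

For a 17-mer the rule reduces to Td = 34 + 2(G + C), so primers with 7 or 8 G/C residues give annealing temperatures of 43°C and 45°C, consistent with the 42-45°C range quoted above.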
2.1.3 Storage and Analysis of Sequence Data

Autoradiographs of sequencing gels were read by eye throughout the project. Initially, data were stored and analysed using a microcomputer version of software developed by Pustell [Pustell and Kafatos, 1984] and marketed by International Biotechnologies, Inc. While using this software, raw sequence data were written out by hand, then typed into the computer. Overlaps in the sequences of different subclones were found by printing out the sequence and restriction enzyme sites of each subclone and comparing these by eye. Later in the project a more sophisticated microcomputer sequence analysis system, the Gene-Master™ (Bio-Rad Laboratories) system, was used. With this system, a digitizer was used to enter raw sequence data directly from the autoradiograph into the computer and a "shotgun handler" program was used to find overlaps in the sequences of different subclones automatically.

Analysis of the final nucleotide and amino acid sequence data was carried out using both the Gene-Master system and programs and databases available through the National Research Council's Scientific Numeric Database Service (SND), which operates on a Digital Equipment Corporation (DEC) VAX 780 running VMS. Dot matrix comparisons of amino acid sequences were made using the Gene-Master version of DIAGON [Staden, 1982a]. All dot matrix comparisons used the proportional match method, usually with a homology score of 350 and a span of 30. Pairwise alignments of amino acid sequences were generated using versions of the Needleman-Wunsch algorithm [Needleman and Wunsch, 1970] from the Gene-Master system and the University of Wisconsin Genetics Computer Group (UWGCG) package [Devereux et al., 1984] available from SND.
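For readers unfamiliar with the Needleman-Wunsch algorithm, a minimal Python sketch of global pairwise alignment follows. It is not the Gene-Master or UWGCG implementation: the function name and the match/mismatch/gap scores are illustrative assumptions, and a simple identity score stands in for the protein similarity matrix a real analysis would use.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


# Toy example with hypothetical sequences.
print(needleman_wunsch("ACGT", "AGT"))
```

The dynamic-programming table guarantees the best score under the chosen scoring scheme; when several alignments tie, the traceback shown here returns just one of them.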
Multiple sequence alignments were made by a progressive alignment procedure which uses the Needleman-Wunsch algorithm iteratively to generate its alignments [Feng and Doolittle, 1987]. The ensemble of programs required for this method was kindly provided by Drs. D-F. Feng and R. F. Doolittle of the University of California, San Diego, and adapted to an Agriculture Canada DEC VAX running VMS by Mr. W. Ronald of the Vancouver Research Station. In all cases where a Needleman-Wunsch algorithm was used, amino acid sequence similarity was assessed by the algorithm using a modified Protein Mutation Matrix [Dayhoff et al., 1978].

Protein and nucleotide sequence databases were searched by computer to determine whether any of the MNSV nonstructural proteins show significant amino acid sequence similarity with proteins of known function. Searches were conducted through the SND system using the FASTA program [Pearson and Lipman, 1988]. Databases searched included the protein sequence libraries of the NBRF (Release 20.0, 3/89), Swiss-Prot (Release 10.0, 3/89) and PseqIP (Release 6.0, 7/88), as well as the nucleotide sequence libraries of the NBRF (Release 35.0, 5/89), GenBank (Release 59.0) and EMBL (Release 19.0, 5/89). Nucleotide sequence databases were searched for amino acid sequence similarity to MNSV proteins using a special version of FASTA called TFASTA. This program translates a nucleotide sequence in all six reading frames and searches for amino acid sequence similarity between a query protein sequence and the translated nucleotide sequence. The significance of the best matches found between MNSV proteins and proteins in these databases was assessed using the RDF2 program [Pearson and Lipman, 1988].
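The six-frame translation step that TFASTA performs before comparing sequences can be illustrated with a short Python sketch. This is not TFASTA itself: the function names are hypothetical, the standard genetic code is assumed, and the similarity search that follows translation is omitted. The example fragment is the start of the first open reading frame as it appears in the MNSV sequence.

```python
from itertools import product

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(seq):
    """Return the six translations: frames +1, +2, +3, then -1, -2, -3."""
    dna = seq.upper().replace("U", "T")
    reverse = dna.translate(COMPLEMENT)[::-1]
    return [translate(strand[offset:])
            for strand in (dna, reverse) for offset in range(3)]


# Fragment from the 5' end of the first MNSV open reading frame.
for frame in six_frames("AUGGAUACUGGUUUGAAAUUU"):
    print(frame)
```

The first frame printed reproduces the deduced amino acids M D T G L K F shown above the nucleotide sequence at the start of that open reading frame.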
2.2 Northern Blot Analysis of Single-Stranded RNA from MNSV-Infected Plants

2.2.1 Purification of Virion RNA

Plants used for MNSV Propagation

Because of its very narrow host range (see Section 1.2.1), the strain of MNSV used could only be propagated in cucumber or melon. All virus preparations used in this study were propagated in cucumber. Initially, the cultivar Straight 8 was used, but when its susceptibility to powdery mildew became a problem, a powdery mildew resistant cultivar, Poinsett76S, was used. Plants were grown under greenhouse conditions in pasteurized soil. Supplemental light and heating were necessary for adequate plant growth and acceptable virus yields except during the summer months.

Virus Inoculum

The Dutch isolate of MNSV, originally obtained from D. Z. Maat, was provided by J. H. Tremaine for use throughout this study. MNSV-infected cucumber cotyledon and leaf tissue which had been stored at -20°C was used as the initial inoculum source. As an extra precaution to try to ensure purity, the inoculum was purified by several local lesion passages before inoculating large numbers of cucumber plants for virus purification. When fresh MNSV-infected cucumber tissue was used as inoculum, tissue was used within a few days of the appearance of local lesions, before the lesions became obviously necrotic.

Inoculation Procedure

MNSV-infected leaf tissue was thoroughly ground in autoclaved 10 mM potassium phosphate buffer, pH 7.0, using a baked mortar and pestle. The resulting slurry was rub-inoculated, using small autoclaved sponge pads, onto cucumber cotyledons previously dusted with fine carborundum. Plants were generally inoculated when their cotyledons were between half and fully expanded, but before the primary leaf had emerged. This could be anywhere from 7 to 11 days after seeding, depending on greenhouse conditions.
Local lesions appeared on inoculated cotyledons 2 to 5 days after inoculation, depending on greenhouse conditions. Systemic infection was rare.

Virus Purification

Since infection was generally limited to local lesions on inoculated cotyledons, only this tissue was harvested for purification. In rare cases where infection had gone systemic, all leaf tissue was harvested. Tissue was harvested 4 to 7 days after inoculation depending on the development of symptoms, but always before tissue became too necrotic. Virus was purified from harvested tissue immediately, as storage before purification apparently reduces yield [Tremaine, 1989].

Virus was purified by a modification of the pH 5 method [Tremaine et al., 1983]. Details of this modified method are given in App. A.5. Essentially, tissue is ground in a pH 5.0 buffer which precipitates most plant proteins but leaves the virus in solution. Virus is precipitated from the clarified solution using polyethylene glycol (PEG). The resuspended PEG precipitate is centrifuged through a CsCl gradient to yield a very pure virus preparation, which is then dialysed against the buffer of choice. Virus purity and integrity were assessed by electron microscopy and by measuring the absorption spectrum between 220 nm and 320 nm of solutions containing purified virus. Virus yield was estimated from such solutions by measuring their OD260. Yields varied widely between batches of tissue.

Purification of Virion RNA

Virion RNA was isolated by incubation of purified virus in 10 mM ethylenediaminetetraacetic acid (EDTA) on ice for 2 min, followed by extraction with phenol/chloroform in the presence of 1% sodium dodecyl sulfate (SDS). Purity and concentration of RNA were initially estimated spectrophotometrically.
Purified virion RNA was also electrophoresed through denaturing agarose gels containing methyl mercuric hydroxide [Bailey and Davidson, 1976], stained with ethidium bromide, and visualized and photographed under UV light to confirm the purity, integrity and concentration of the RNA.

2.2.2 Purification of Single-Stranded RNA from MNSV-Infected Plants

Total ssRNA was isolated from mock-inoculated and MNSV-infected cucumber cotyledons by a modification of the method of Siegel [Siegel et al., 1976]. A detailed protocol is provided in App. A.4. Briefly, this method involves grinding the cotyledons to a fine powder under liquid nitrogen in a mortar and pestle, followed by repeated extractions with phenol/chloroform to recover total nucleic acid. High molecular weight ssRNA is separated from the total nucleic acid by precipitation with 2 M LiCl. Purity, integrity and concentration of the leaf ssRNA recovered were assessed as described above for virion RNA.

2.2.3 Northern Blots of Total Leaf Single-Stranded RNA

Electrophoresis and Blotting

MNSV virion RNA, total leaf ssRNA from mock-inoculated and MNSV-infected cucumber cotyledons, and DNA size standards were denatured with glyoxal and electrophoresed under conditions which maintain glyoxal-treated nucleic acids in their denatured state [McMaster and Carmichael, 1977]. The portion of the gel containing the DNA size standards was treated with 50 mM NaOH to remove glyoxal from the DNA, then neutralized, stained with ethidium bromide and photographed under UV light. The remaining nucleic acids were transferred from the gel onto a positively charged nylon membrane (Zeta-Probe™, Bio-Rad Laboratories) using alkaline Northern blotting conditions [Vrati et al., 1987].
Hybridization and Autoradiography

Blots were pre-hybridized for at least 1 hr at 42°C, with continuous agitation, in hybridization buffer consisting of 50% deionized formamide, 5% sodium dextran sulphate, 1 M NaCl, 50 mM Tris-HCl pH 7.5 (tris(hydroxymethyl)aminomethane adjusted to pH 7.5 with HCl), 0.2% bovine serum albumin (BSA), 0.2% polyvinylpyrrolidone, 0.2% Ficoll, 0.1% sodium pyrophosphate and 250 µg/ml sheared salmon sperm DNA.

The two MNSV cDNA clones pBst17A-14 and pMNS01A (see Fig. 2.1) were used to probe these blots. pBst17A-14 and pMNS01A were labelled with 32P-dATP by nick-translating [Rigby et al., 1977] the entire plasmid. The progress of the labelling reaction was monitored by precipitating small aliquots of the reaction mixture using trichloroacetic acid (TCA), then measuring the amount of label incorporated into the TCA-precipitable material using liquid scintillation counting [Maniatis et al., 1982]. When the probe was labelled to the desired extent, it was purified by spun-column chromatography to remove unincorporated label [Maniatis et al., 1982].

Blots were hybridized in 20-25 ml of the same solution used for pre-hybridization. The labelled probe was denatured by adjustment to 0.1 M NaOH and subsequent boiling for 5 min, then added to the hybridization solution to a final concentration of ca. 1 ng probe/ml of hybridization solution, which was equivalent to ca. 10^6 cpm/ml of hybridization solution. Blots were hybridized overnight at 42°C with continuous agitation.

Following hybridization, blots were rinsed briefly with 2× SSC plus 0.1% SDS at room temperature, then washed for 15 min at 55-60°C in each of three washes: first in 2× SSC plus 0.1% SDS, second in 0.5× SSC plus 0.1% SDS and last in 0.2× SSC (no SDS). 1× SSC consists of 15 mM sodium citrate plus 150 mM sodium chloride. Washed blots were autoradiographed initially with two intensifying screens for 2 hr at -70°C.
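The dilution arithmetic implied by the wash series and probe concentration above can be checked with a short Python sketch. The names `SSC_1X_MM`, `ssc_composition` and `probe_required_ng` are hypothetical helpers introduced only for this illustration; the concentrations are those stated in the text.

```python
# Composition of 1x SSC as stated in the text (millimolar).
SSC_1X_MM = {"sodium citrate": 15.0, "sodium chloride": 150.0}


def ssc_composition(strength):
    """Solute concentrations (mM) of SSC at the given strength, e.g. 0.5 for 0.5x."""
    return {solute: conc * strength for solute, conc in SSC_1X_MM.items()}


def probe_required_ng(volume_ml, ng_per_ml=1.0):
    """Mass of probe (ng) needed for a given hybridization volume."""
    return volume_ml * ng_per_ml


for strength in (2, 0.5, 0.2):
    comp = ssc_composition(strength)
    print(f"{strength}x SSC: {comp['sodium citrate']:g} mM citrate, "
          f"{comp['sodium chloride']:g} mM NaCl")

# Probe needed for the 20-25 ml hybridization volumes used.
print(probe_required_ng(20), probe_required_ng(25))
```

At about 1 ng/ml and 10^6 cpm/ml, the implied specific activity of the probe is roughly 10^6 cpm/ng, i.e. about 10^9 cpm/µg, and 20-25 ng of probe is needed per blot.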
To obtain better resolution for photographs, blots were subsequently autoradiographed without screens, overnight at room temperature.

Chapter 3

Results and Discussion

3.1 Sequencing Strategy

Fig. 3.2 illustrates the strategy used to sequence the noncoding regions and areas encoding potential nonstructural proteins of MNSV.

3.1.1 5' Terminus

The sequence was determined in both directions, except at the extreme 5' end of the genome. The 5' proximal 178 nucleotides of the genome were not included in the cDNA clones used for sequencing and were therefore determined in one direction only, using a specific oligonucleotide primer and genomic RNA as a template for dideoxynucleotide sequencing. It is not possible to determine the ultimate 5' nucleotide using this method because chain elongation in all four sequencing reactions terminates at this point, producing a band in all four lanes of the autoradiogram. The 5' ultimate nucleotide is, therefore, designated as "N" in Fig. 3.3, which shows the nucleotide sequence of the MNSV genome.
[Figure 3.2: diagram of the MNSV RNA, 5' to 3', on a scale of 0 to 4267 nucleotides, showing the position, direction and extent of each oligonucleotide primer (Oligo 1-4), restriction enzyme subclone and nested deletion subclone (derived from Bst17A-14 and Bst17A-E3) used for sequencing.]

Figure 3.2: Sequencing Strategy. Regions of the genome sequenced using specific oligonucleotide primers and RNA as template (Oligo 1-4) or using restriction enzyme subclones are indicated by arrows, with the length of the arrowshaft indicating the length of the region sequenced using a particular primer or subclone. Nested deletion subclones are indicated by a distinct symbol; the region sequenced from each of these subclones is indicated by the thicker part of the arrowshaft. For each primer used or subclone sequenced, the direction of the arrowhead indicates the sense of the sequence and the following information is indicated: A/B/C, where A is the name of the subclone or primer, B is the number of times the sequence was determined using this subclone or primer, and C is the number of nucleotides determined using this subclone or primer.
Subclones used to sequence the region encoding the coat protein (nucleotides 2815-3987 inclusive) are not shown. The scale at the bottom is in nucleotides.

1 NGAUUACUCUAGCCGGAUCCCCGACUCUCUUAU^ 75
  M D T G L K F L V S G G L A T S S V I R K
76 UAUAGGUU^ GCAAUGGAUACUGGUUUGAAAUUUCTO^ 150
  V S A V S S L D S S L P S S S I L S A I H G S W T
151 GUGAGUGCUGUGAGUUCAUUGGAUUCGUCCCUUCCUUCAUCAUCUAUAUUAUCUGCCAUCCAUGGGUCUUGGACU 225
  S A I S H D C S K I A K V A A I V G I G Y L G V R
226 AGUGCUAUCAGCCACGAUUGUAGUAAGAUUGCCAAGGUUGCCGCCAUAGUUGGGAUUGGUUAUCUUGGGGUUAGG 300
  I G A A W C R R T P G I T H S I I T Y G E E V V E
301 AUUGGUGCCGCUUGGUGCCGCCGUACUCCCGGAAUAACGAAUUCCAUAAUCACCUAUGGGGAAGAAGUGGUUGAG 375
  q V K V D I D E D A E E E S D I G E E I V V G T I
376 CAAGUGAAGGUAGAUAUUGAUGAAGAUGCUGAAGAGGAGUCCGAUAUUGGUGAGGAAAUUGUGGUUGGUACGAUA 450
  G I G I H T N V K P E V R A K R R H R S R P F I K
451 GGUAUUGGUAUACACACAAACGUCAACCCUGAAGUUCGAGCUAAGCGCAGACAUAGAUCGAGGCCAUUCAUCAAG 525
  K I V N L T K N H F G G C P D S S K S N V M A V S
526 AAGAUCGUGAAUUUAACGAAGAAUCACUUCGGUGGAUGCCCCGACUCUAGUAAAUCGAACGUCAUGGCUGUAAGU 600
  K F V Y E Q C K Q H N C L P H Q T R L I M S I A V
601 AAAUUCGUUUAUGAACAAUGUAAACAGCACAAUUGUCUUCCACAUCAAACGAGAUUGAUCAUGAGUAUUGCAGUU 675
  P L V L S P D M Y D I S S K A L L K S E I L T E N
676 CCAUUGGUGUUAAGUCCCGACAUGUACGACAUUUCCAGCAAAGCUCUGCUAAACAGCGAGAUAUUGACAGAAAAC 750
  R A T L D R L K T L D G W L T H L V C H P L S A K
751 AGAGCCACGCUGGACCGCCUCAAAACUCUCGACGGGUGGCUAACACACUUGGUGUGCCACCCCCUUAGCGCGAAG 825
  A W R R A I D N L C G L P D W K A F K L V N + G C
826 GCUUGGAGGCGGGCAAUUGACAACUUGUGUGGUCUUCCAGAUUGGAAGGCUUUCAAGUUGGUCAACUAGGGGUGC 900
  L E E L A G F C T S V R R G T H P D M T E F P Q D
901 CUGGAGGAGCUCGCUGGGUUCUGUACUUCGGUACGGAGAGGGACACACCCAGACAUGACCGAGUUUCCUCAGGAU 975
  R P I K T R K L Y C L G G V G T S V K F N V H N N
976 CGUCCCAUUAAGACACGCAAACUGUAUUGUUUAGGGGGAGUUGGAACUAGCGUGAAGUUCAACGUGCACAAUAAC 1050
  S L A N L R R G L V E R V F F V E N D K K E L E P
1051 UCUCUAGCUAACCUUCGGCGCGGUCUAGUUGAGCGCGUUUTJCUUUGUUGAAAAUGAUAAGAAGGAACUGGAGCCU 1125
  A P K P L S G A F D R L T W F R R K L H S I V G T
1126 GCCCCUAAACCUCUUAGUGGUGCGUUUGAUCGCUUAACUUGGUUUCGUCGGAAACUCCAUAGUAUUGUGGGUACU 1200
  H S S I S P G Q F L D F Y T G R R R T I Y E G A V
1201 CAUUCCAGUAUUAGUCCAGGUCAGUUCUUGGACUUCUAUACUGGCAGGAGGCGCACGAUUUAUGAAGGUGCUGUG 1275
  K S L E G L S V Q R R D A Y L K T F V K A E K I N
1276 AAAUCGUUGGAGGGGUUAAGUGUUCAACGAAGGGAUGCCUAUCUGAAAACGUUUGUUAAAGCGGAGAAGAUUAAU 1350
  T T K K P D P A P R V I Q P R M V R Y I V E V . G R
1351 ACCACUAAGAAACCUGACCCAGCUCCGCGGGUUAUACAACCGAGGAACGUAAGAUACAACGUUGAGGUUGGUCGU 1425
  Y L R R F E H Y L Y R G I D E I W H G P T I I K G
1426 UAUCUACGUAGGUUUGAGCAUUACCUCUAUCGAGGAAUUGACGAAAUCUGGAAUGGCCCCACCAUAAUAAAAGGA 1500
  Y T V E Q I G K I A R D A W D S F V S P V A I G F
1501 UACACUGUCGAGCAAAUUGGGAAAAUCGCCCGUGACGCAUGGGACUCCUUCGUUAGUCCUGUAGCAAUCGGAUUU 1575
  D M K R F D Q H V S S D A L K W E H S V Y L D A F
1576 GACAUGAAAAGGUTJCGACCAACAUGUAUCCUCCGACGCUCUUAAAUGGGAACAUAGUGUWAUCUUGACGOJUUU 1650
  C H D S Y L A E L L K W Q L V M K G V G Y A S D G
1651 UGCCACGACUCAUAUCUUGCAGAAUUGUUGAAGUGGCAAUUAGUUAAUAAGGGUGUUGGGUAUGCUAGUGAUGGA 1725
  M I K Y K V D G C R M S G D M N T A M G N C L I A
1726 AUGAUUAAAUAUAAGGUUGAUGGGUGCCGGAUGAGUGGUGACAUGAAUACAGCUAUGGGUAACUGUUUGAUUGCC 1800
  C A I T H D F F R S R G I R A R L M K N G D D C V
1801 UGUGCCAUCACGCAUGAUUUCUUCCGUAGUCGUGGUAUCAGGGCGCGUUUGAUGAACAAUGGUGAUGACUGUGUC 1875
  V I C E K E C A A V V K A D M V R H W R Q F G F Q
1876 GUAAUAUGCGAAAAAGAAUGUGCCGCGGUGGUUAAAGCCGACAUGGUAAGGCACUGGAGACAAUUCGGGUUUCAA 1950
  C E L E C D A E I F E Q I E F C Q M R P V Y D G E
1951 UGCGAACUCGAAUGCGAUGCAGAAAUCUUCGAGCAAAUUGAGUUUUGUCAAAUGCGGCCUGUGUACGACGGGGAA 2025
  K Y V M V R N P L V S L S K D S Y S V G P W N G I
2026 AAAUAUGUGAUGGUACGGAAUCCCUUGGUUAGCCUAUCCAAAGAUUCCUACUCAGUCGGCCCUUGGAAUGGAAUC 2100
  N H A R K W V H A V G L C G L S L T G G I P V V Q
2101 AACCAUGCACGCAAGUGGGUCAAUGCAGUUGGCUUGUGUGGCUUAUCCCUCACUGGUGGAAUUCCUGUUGUCCAA 2175
  S Y Y N M M I R M T Q S V N S S G I L R D V S F A
2176 AGUUAUUAUAAUAUGAUGAUCCGCAACACUCAGUCCGUGAACAGTOCUGGCAUACUTJCGCGAUGUCAGUUUUGCU 2250
  S G F R E L A R L G N R K S G A I S E D A R F S F
2251 AGUGGAUUTJCGGGAGUUAGCGCGAUTJGGGUAACAGGAAAAGUGGUGCCAUAUCUGAAGACGCCCGUUUUAGCUUU 2325
  Y L A F G I T P D L Q R A M E S D Y D A H T I E W
2326 UAUCUCGCAUUUGGCAUUACUCCAGAUUUACAACGUGCCAUGGAAAGUGACUAUGAUGCUCAUACUAUAGAGUGG 2400
  G F V P Q G N P R i q P I S W T L N E L * M D S q R T V E L T N P
2401 GGUUUCGUGCCCCAGGGAAAUCCUAGAAUACAGCCAAUCUCAUGGACUCUCAACGAACUGUAGAAUUAACUAAUC 2475
  R G R S K E R G D S G G K Q K N S M G R K I A N D
2476 CUCGGGGAAGAAGUAAAGAACGUGGUGACAGCGGGGGAAAACAGAAGAACUCAAUGGGGCGAAAGAUAGCCAAUG 2550
  A I S E S K Q G V M G A S T Y I A D K I K V T I H
2551 AUGCUAUCUCUGAAUCGAAGCAAGGAGUUAUGGGUGCCAGCACAUACAUUGCUGAUAAAAUUAAGGUGACUAUUA 2625
  F N F * C M A C C R C D S S P G D Y S G A L L I L
2626 ACuTOAAWJUuU„GUGUAUGGCTOGTOG 2700
  F I S F V F F Y I T S L S P q G K T Y V H H F D S
2701 UAUOTAUCUCAuTJUGUWUCUUWAU 2775
  S S V K T Q Y V G I S T N G D G * M A M V K R I N N L P T
2776 GUUCUUCCGUUAAAACACAAUACGUUGGCAUCUCUACAAAUGGCGAUGGUUAAACGCAUUAAUAAUUUACCCACA 2850
  V K L A K Q A L P L L A N P K L V N K A I D V V P
2851 GUGAAGCUUGCUAAGCAGGCUCUACCCCUGCUUGCGAAUCCUAAACUUGUAAAUAAAGCUAUAGAUGUGGUUCCU 2925
  L V V Q G G R K L S K A A K R L L G A Y G G N I S
2926 UUGGUCGUCCAAGGUGGUCGGAAAUUGUCCAAGGCUGCUAAGCGGUUGCUTJGGCGCUUAUGGAGGCAACAUUUCG 3000
  Y T E G A K P G A I S A P V A I S R R V A G M K P
3001 UACACUGAGGGUGCCAAACCGGGUGCAAUAUCAGCUCCUGUCGCUAUUAGUCGGCGAGUGGCUGGUAUGAAGCCU 3075
  R F V R S E G S V K I V H R E F I A S V L P S S D
3076 AGGUUTJGUCAGAUCUGAAGGAUCUGUGAAGAUAGUUCAUAGGGAGUUUAUUGCCUCUGUUOJUCCUUCGAGUGAU 3150
  L T V N H G D V N I G K Y R V N P S N N A L F T W
3151 CUCACUGUCAAUAAUGGUGAUGUCAAUAUCGGUAAGUAUAGAGUCAAUCCUAGUAAUAACGCUUUAUUCACCUGG 3225
  L Q G Q A Q L Y D M Y R F T R L R I T Y I P T T G
3226 CUUCAGGGACAAGCUCAACUAUAUGAUAUGUACAGAUUUACUCGGCUCCGUAUCACCUACAUUCCUACUACCGGA 3300
  S T S T G R V S L L W D R D S Q D P L P I D R A A
3301 UCCACUUCCACGGGUCGUGUCUCUCUCCUCUGGGAUAGAGAUUCUCAGGACCCCCUCCCUAUAGACCGUGCUGCC 3375
  I S S Y A H S A D S A P W A E N V L V V P C D N T
3376 AUUAGCUCUUAUGCUCAUUCCGCUGAUUCAGCGCCUUGGGCUGAGAAUGUUCUAGUGGUCCCAUGUGACAAUACG 3450
  W R Y M N D T N A V D R K L V D F G Q F L F A T Y
3451 UGGAGGUACAUGAAUGAUACCAAUGCUGUCGACCGGAAGUUGGUUGAUUUUGGGCAGUUCUUAUUCGCUACUUAU 3525
  S G A G S T A H G D L Y V E Y A V E F K D P q P I
3526 UCUGGUGCUGGUAGCACCGCCCAUGGUGAUCUUUAUGUUGAGUAUGCUGUAGAAUUUAAGGACCCCCAGCCUAUC 3600
  A G M V C M F D R L V S L S E V G S T I K G V K Y
3601 GCUGGGAUGGUAUGUAUGUUUGAUCGCUUGGUCUCUCUUUCCGAAGUUGGAUCCACUAUCAAGGGUGUCAAUUAC 3675
  I A D R D V I T T G G H I G V N I H I P G T Y L V
3676 AUUGCUGAUCGUGAUGUGAUAACUACUGGGGGUAAUAUUGGUGUUAACAUCAAUAUUCCCGGGACUUAUCUCGUC 3750
  T I V L K A T S I G P L T F T G N S K L V G N S L
3751 ACGAUTJGUUCUUAAUGCTACAUCGAUUGGUCCCCUCACCUTJCACUGGUAAUUCUAAACOTGUAGGCAACAGUCOT 3825
  N L T S S G A S A L T F T L N S T G V P N S S D S
3826 AAUCUUACCAGCAGUGGUGCAUCUGCUCUUACGUUCACCCUUAACUCCACCGGUGUGCCCAACAGUAGCGAUUCU 3900
  S F S V G T V V A L T R V R M T I T R C S P E T A
3901 UCAUUCUCUGUGGGUACCGUUGUUGCCUUGACUAGGGUGCGUAUGACGAUCACUCGCUGCUCUCCAGAAACUGCU 3975
  Y L A *
3976 UACCUCGCCUAAUUUGAUUUACUGCACUCCAAAUCCGGUCUCCCUUGUUCCUACCUGUUCUCAGCCUGAUAUCUG 4050
4051 UUCUGGUGUCCUAUAGGCGUCCUUGUCGUGUGUAGUGCGGUCUGGCUAACCGUAAUGGCGUAUCGGCUUGGAUUU 4125
4126 CCGAUGAUTJUGGCUCCGGGAUGUACGACAUAGCUGAAGAUGGUUGGAGUUUGGUGGACCACCGCUAGCAAAAUAC 4200
4201 ACUCUGUGUGGGGCGUGCUAGUGGAUAGUCAUGUAUGUUUGAGAUGGGUUAUAGGCCCAUCC(An) 4262
     CUUU(An) 4266
     CGCCU(An) 4267

Figure 3.3: Nucleotide Sequence of the MNSV Genome. The nucleotide sequence is written as RNA. Numbers indicate nucleotide positions. Nucleotide number one is represented by an "N" because its identity is unknown. Where the nucleotide sequence differs among clones, i.e., at the 3' termini, all sequences are shown. The deduced amino acid sequence is written above the corresponding nucleotide sequence, using the one-letter code. Asterisks represent termination codons. The region encoding the coat protein (nucleotides 2815-3987) was sequenced independently of this thesis project [Pot, 1987, Riviere et al., 1989].

3.1.2 3' Terminus

The sequence of the pUCM01A subclone originally used to sequence the 3' terminus could be read unambiguously up to the polyadenylate tail added to the MNSV RNA before cDNA synthesis. The sequence of this clone suggested that the 3' terminal sequence of MNSV was CCCUUU(An). The ultimate 3' nucleotide is represented by An because it is not possible, by the sequencing method used, to determine how many, if any, of the adenines which appear at this point on the sequencing autoradiogram are part of the viral sequence, as opposed to being part of the poly(A) tail added to the viral RNA during the cloning procedure.
The 3' terminal sequence of pUCM01A does not match well with the 3' terminal sequences of related viruses, which end in either CC or CC(An) [Guilley et al., 1985, Carrington et al., 1989, Hillman et al., 1989, Rochon and Tremaine, 1989, Nutter et al., 1989]. This suggested that pUCM01A might not contain the 3' end of MNSV. To resolve this uncertainty, four other original MNSV cDNA clones were identified which mapped to the 3' terminus: pUCM39A, pUCM40A, pUCM06F and pUCM14F (see Fig. 3.2). Subclones of each of these clones were prepared to enable the extreme 3' end of each to be sequenced in at least one direction.

Restriction mapping and sequence data from these four clones suggest they may actually represent only two different cDNA clones, i.e., pUCM39A and pUCM40A may represent one clone and pUCM06F and pUCM14F may represent a second clone. Both members of each pair of clones appear to have the same length insert, identical restriction maps, identical nucleotide sequences over the short region sequenced from each and even approximately the same number of A residues added during polyadenylation. Because of this, pUCM39A/40A and pUCM06F/14F are grouped together in the following discussion.

The 3' terminal sequences obtained from pUCM01A, pUCM39A/40A and pUCM06F/14F are aligned in Table 3.1. A vertical line is drawn through the sequence immediately 3' to the sequence common to all 3 clones.

Clone Name     Direction Sequenced   Nucleotide Sequence (Positive sense)
pUCM01A        (+)                   GGCCCAUCC | CUUU(An)
pUCM39A/40A    (+)                   GGCCCAUCC | CGCCU(An)
pUCM40A        (-)                   GGCCCAUCC | CGCCU(An)
pUCM06F/14F    (+)                   GGCCCAUCC | (An)

Table 3.1: Nucleotide Sequences of the 3' Termini of MNSV cDNA Clones Mapping to the 3' End of the Genome

As Table 3.1 shows, each clone has a different 3' terminal sequence. The 3' terminal sequence of pUCM06F/14F is merely a subset of the sequences obtained from pUCM01A and pUCM39A/40A.
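The comparison summarized in Table 3.1 amounts to finding the longest sequence shared by all clones and the clone-specific extension that follows it. A minimal Python sketch, using the sequences from the table with the poly(A) tract omitted (the variable names are illustrative only):

```python
import os

# 3' terminal sequences from Table 3.1, positive sense, poly(A) omitted.
termini = {
    "pUCM01A": "GGCCCAUCCCUUU",
    "pUCM39A/40A": "GGCCCAUCCCGCCU",
    "pUCM06F/14F": "GGCCCAUCC",
}

# os.path.commonprefix compares plain strings character by character.
shared = os.path.commonprefix(list(termini.values()))

# Whatever follows the shared sequence is the clone-specific 3' extension.
extensions = {name: seq[len(shared):] for name, seq in termini.items()}

print("shared:", shared)
for name, ext in extensions.items():
    print(f"{name}: {ext or '(none)'}")
```

The shared sequence ends in CC, and only pUCM01A and pUCM39A/40A carry additional nucleotides beyond it.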
The 3' termini of pUCM01A and pUCM39A/40A, however, show definite differences. pUCM39A/40A has one more nucleotide (U) at the 3' end, and the identity of the three preceding nucleotides differs between these two clones, i.e., UUU in pUCM01A and GCC in pUCM39A/40A.

Such variability in the 3' termini of the original MNSV cDNA clones may be due to a cloning artifact. Similarly, the 3' end of the RNA molecule from which pUCM06F/14F was synthesized may simply have been slightly degraded before being polyadenylated. Alternatively, the variability observed in these three clones may reflect natural variability in the 3' terminus of the MNSV genome. Such variability in plant viruses has not been reported, but it is interesting to speculate on mechanisms by which it could arise and on whether genomic RNAs with such variable 3' termini could be functional.

One or both of the following mechanisms could be involved in creating or maintaining variable 3' termini in MNSV genomic RNAs. Firstly, variable ends could be due to errors in replication by the viral RNA-dependent RNA polymerase, which is known to have a high error rate and to lack an editing function [Holland et al., 1982]. Several rounds of replication would probably be necessary to result in the extensive differences between pUCM01A and pUCM39A/40A if this is the only mechanism involved. A second mechanism which could create such variability is the non-templated (posttranscriptional) addition of different sequences to the initially uniform end of each molecule. Enzymes have been identified which catalyse such non-templated additions to nucleic acids [Rao et al., 1989], e.g., telomere terminal transferase (telomerase) catalyses the addition of tandem repeats of the sequence TTGGGG to the termini of eukaryotic chromosomes. Similarly, tRNA nucleotidyltransferase catalyses the addition of CCA to the 3' termini of tRNA in a non-templated fashion.
In fact, host tRNA nucleotidyltransferase has been implicated in the posttranscriptional addition of a 3'-terminal A residue to the CC found at the 3' termini of newly synthesized brome mosaic virus (BMV) RNAs [Miller et al., 1986]. The original or unmodified 3'-terminal sequence of MNSV RNA may be CC or CCA since this is the sequence nearest the 3' end which is common to the three clones examined and since these sequences are commonly found at the 3' termini of ssRNA viruses. The different 3' termini of MNSV could then be generated by the non-templated addition of sequences like the CUUU(An) of pUCM01A and the CGCCU(An) of pUCM39A/40A.

Therefore, it seems that mechanisms exist for creating 3' terminal variability. One must still consider, however, whether such molecules could be functional in replication since the 3' termini of genomic RNAs are intimately involved in replication, specifically in the initiation of minus-strand synthesis [Miller et al., 1986]. RNAs with variable 3' ends would have to be able to be replicated in order to become sufficiently numerous in the population of genomic RNA to show up in cDNA prepared from such RNA.

Experiments with BMV provide evidence for and against the idea that MNSV RNAs with the 3' termini observed in the three MNSV cDNA clones could be replicated. It has been shown in BMV that the penultimate C of its CCA terminus is required for replication, but that the addition of at least some short 3' extensions to this terminus does not interfere with replication in vitro. Furthermore, initiation of replication occurs at the correct position despite the presence of such extensions [Miller et al., 1986]. Thus, replication of MNSV RNA may be able to initiate from the C nearest the 3' end which is shared by all three MNSV cDNA clones, despite the variable 3' extensions found in pUCM01A and pUCM39A/40A.
Studies of BMV replication in vivo, however, suggest that such variable 3' extensions are removed and progeny RNAs have only CC or CCA at their 3' termini [Rao et al., 1989]. This would argue against the type of variable 3' termini found in the MNSV cDNA clones.

On the evidence available to date, therefore, one cannot decide conclusively whether the 3' terminal variability observed in the MNSV cDNA clones reflects natural variability in MNSV genomic RNAs or is merely due to some sort of cloning artifact. Although it might have been possible to determine the 3' terminal sequence of MNSV by direct RNA sequencing [Simoncsits et al., 1977, Donis-Keller, 1980] and some attempts were made to do this, it was felt that the time, difficulty and expense involved placed further attempts beyond the scope of this project.

3.1.3 Central Genomic Region

As indicated in Fig. 3.2, the sequence of the central region of the MNSV genome was initially determined from cDNA subclones. Since this area showed significant differences from the analogous regions of related viruses, the sequence of 418 nucleotides in this region (nucleotides 2416-2833 inclusive) was confirmed using specific oligonucleotide primers and MNSV RNA as template for dideoxynucleotide sequencing. A great deal of variability can exist in a population of virion RNA because of the high error rate and lack of an editing function in the viral RNA-dependent RNA polymerase which replicates the genome of RNA viruses [Holland et al., 1982]. A particular cDNA clone prepared from a population of virion RNA represents the sequence of only one possible variant, which could even be a non-functional molecule. The sequence determined using virion RNA as template should represent the predominant sequence of the RNA population in that region, which should be the functional or "true" sequence.
No significant differences were found, however, in the sequence of the central region of the MNSV genome as determined using cDNA subclones compared to using virion RNA as template.

3.1.4 Nucleotide Sequence Data

App. B shows a "shotgun handler" alignment of all the useable sequence data generated during this project. This alignment indicates the data obtained each time a particular subclone was sequenced, the orientation of the subclones, the number of times each nucleotide was determined (2 to 14 times depending on the difficulty of ascertaining the sequence in the area) and the consensus sequence derived from the total sequence data.

3.2 Nucleotide Sequence of the MNSV Genome

Fig. 3.3 shows the nucleotide sequence of the MNSV genome and the deduced amino acid sequences of probable MNSV proteins. The sequence of the coat protein gene (nucleotides 2815 to 3987 inclusive) was determined before and concurrently with work on this project [Pot, 1987, Riviere et al., 1989]. The entire MNSV genome consists of at least 4262 nucleotides. This agrees well with the previously reported molecular weight of the MNSV genome of 1.5 x 10^6 (ca. 4.4 kb) as determined by non-denaturing polyacrylamide gel electrophoresis [Hibi and Furuki, 1985] and previous estimates of the genome length as 4.3 kb by denaturing agarose gel electrophoresis [Pot, 1987].

3.3 Genomic Organization

3.3.1 MNSV Genomic Organization

Fig. 3.4 shows all possible open reading frames (ORFs) deduced from the MNSV sequence, in all 3 reading frames of both the RNA orientated as shown in Fig. 3.3 (positive sense) and its complement (negative sense). Examination of the subgenomic RNAs produced by MNSV during infection (see Section 3.4) as well as comparisons of the genomic organization and amino acid sequences of MNSV proteins with those of related viruses (see Section 3.3.2 and Section 3.5, respectively) suggest that up to five of the ORFs indicated in Fig.
3.4 may actually be expressed during MNSV infection. The genomic organization of MNSV would, therefore, appear as shown in Fig. 3.5.

A 5' proximal noncoding region of 87 nucleotides precedes the first AUG codon found in the MNSV sequence. This initiator lies in reading frame 1 and is followed by an open region which encodes a protein of 29,228 Da (p29). This ORF terminates with an amber codon at nucleotides 892 to 894 inclusive, but reading frame 1 remains open after this terminator for another 1,566 nucleotides. If read-through of this terminator occurs, a protein of ca. 88,683 Da (p89) would be produced. MNSV p89 also terminates in an amber terminator, but read-through of this terminator is unlikely since a second termination codon (UAA) occurs almost immediately thereafter (nucleotides 2467-2469).

Overlapping p89 at the 3' end by 19 nucleotides is an ORF in reading frame 3 encoding a 7,104 Da protein (p7A). This ORF also terminates in an amber codon at nucleotides 2637 to 2639 inclusive, and the same frame is open for another 186 nucleotides after the amber codon. If read-through of this terminator occurs, a protein of ca. 13,915 Da (p14) would be produced. In addition, the putative readthrough area has an in-frame initiation codon only 3 nucleotides after the amber terminator of p7A raising the possibility that this ORF is independently expressed to produce a 6,589 Da protein (p7B). There is, however, much less evidence to support expression of p7B in vivo than to support expression of the other proteins shown in Fig. 3.5.

Figure 3.4: Open Reading Frames of MNSV Genomic RNA in both the Positive and Negative Sense
Start codons are represented by vertical lines above the main horizontal line. Only the first AUG of an ORF is indicated. Termination codons are represented by vertical lines below the main horizontal line. Only the first termination codon 3' to a start codon is indicated. Numbers following "p" indicate the approximate sizes (in kDa) of proteins encoded by ORFs believed to be expressed by MNSV. Frames 1-3 show all possible ORFs from a three frame translation of the MNSV genome in the positive sense. Frames 4-6 show all possible ORFs from a three frame translation of the reverse complement of the genomic sequence. All frames are written 5' to 3' with respect to the ORFs shown. The scale at the bottom is in nucleotides, numbered with respect to the positive-sense strand.

Figure 3.5: Genomic Organization of MNSV
Horizontal lines at the 5' and 3' termini represent noncoding regions. Rectangles represent probable coding regions. The ORF encoding p7B is represented by a dashed rectangle to indicate its expression is less likely than the other ORFs shown. Numbers following "p" indicate the approximate sizes (in kDa) of probable proteins. Other numbers indicate nucleotide positions. The nucleotide positions of start (AUG) and termination (UAG or UAA) codons are indicated.

MNSV p14 and p7B terminate with the same ochre terminator, UAA, at nucleotides 2826 to 2828. Overlapping the 3' end of these proteins by 11 nucleotides is an ORF in reading frame 1 which encodes the 41,480 Da coat protein (p42), which has previously been described [Pot, 1987, Riviere et al., 1989]. The coat protein gene terminates in an ochre terminator, UAA, at nucleotides 3985 to 3987 which is followed by a 3' proximal noncoding region of at least 276 nucleotides.
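The coordinates quoted in this section can be cross-checked with simple arithmetic. The sketch below is illustrative only and is not the software used in the thesis: the dictionary `ORFS` and the helper names are hypothetical, each ORF is given as (first nucleotide of the AUG, last coding nucleotide before its terminator), and the end of p89 is an assumption inferred from the statement that frame 1 stays open for 1,566 nucleotides after the p29 terminator.

```python
# (first nucleotide of AUG, last coding nucleotide before the terminator)
ORFS = {
    "p29": (88, 891),                  # amber codon at 892-894
    "p89": (88, 894 + 1566),           # assumed end: 1,566 nt past the p29 terminator
    "p7A": (2442, 2636),               # amber codon at 2637-2639
    "p14": (2442, 2639 + 186),         # frame open 186 nt past the p7A terminator
    "p7B": (2639 + 3 + 1, 2639 + 186), # AUG begins 3 nt after the p7A terminator
    "p42": (2815, 3984),               # ochre codon at 3985-3987
}


def frame(start):
    """Reading frame (1, 2 or 3) of an ORF whose first nucleotide is `start`."""
    return (start - 1) % 3 + 1


def codons(start, end):
    """Number of codons between start and end inclusive (read-through codons included)."""
    length = end - start + 1
    if length % 3:
        raise ValueError("coordinates do not span a whole number of codons")
    return length // 3


def overlap(a, b):
    """Number of nucleotides shared by two (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


for name, (start, end) in ORFS.items():
    print(f"{name}: nt {start}-{end}, frame {frame(start)}, {codons(start, end)} codons")

print("p89/p7A overlap:", overlap(ORFS["p89"], ORFS["p7A"]), "nt")
print("p14/p42 overlap:", overlap(ORFS["p14"], ORFS["p42"]), "nt")
```

Under these assumptions the frames come out as 1 for p29, p89 and p42 and 3 for p7A, p14 and p7B, and the two overlaps agree with the 19 and 11 nucleotides stated in the text.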
3.3.2 Comparison of the MNSV Genomic Organization with Those of Related Viruses

Fig. 3.6 compares the genomic organization of MNSV, as described above, to those of the two carmoviruses CarMV [Guilley et al., 1985] and turnip crinkle virus (TCV) [Carrington et al., 1989] and the tombusvirus CNV [Rochon and Tremaine, 1989]. These comparisons show that MNSV shares similarities in genomic organization with both carmo- and tombusviruses, but resembles the carmoviruses more closely overall. This was expected from previous comparisons of dsRNA species generated by these viruses during infection [Pot, 1987, Riviere et al., 1989].

Figure 3.6: Comparison of the Genomic Organizations of MNSV, CarMV, TCV and CNV
Rectangles represent probable coding regions. ORFs encoding MNSV p7B and TCV p9 are represented by dashed rectangles to indicate their expression is less likely than the other ORFs shown. Inclusion of TCV p9 is based on this author's interpretation of the nucleotide sequence data published for TCV. Numbers following "p" indicate the approximate sizes (in kDa) of probable proteins. Arrows indicate the locations of amber termination codons (UAG).

The 5' portion of MNSV has a genomic organization similar to that of both the tombus- and carmoviruses sequenced to date, i.e., after a short 5' noncoding region all of these viruses have an ORF encoding a protein of ca. 30 kDa terminating with an amber codon which may be read-through to produce a protein of ca. 90 kDa. This read-through protein has been tentatively identified as the viral RNA-dependent RNA polymerase in all of these viruses (see Section 3.5.1).
Downstream from the ORFs encoding their polymerases, distinct differences exist in the genomic organizations of the tombus- and carmoviruses. MNSV clearly resembles the carmo- rather than the tombusviruses in respect of these differences. Two tombusviruses have been sequenced in this area: CNV and tomato bushy stunt virus (TBSV) [Hillman et al., 1989]. Both tombusviruses have a centrally located coat protein gene followed by two nested ORFs encoding proteins of 20 kDa and 21 kDa. MNSV and the carmoviruses have their coat protein genes at the 3' end of the genome and have at least one small ORF centrally located between the putative polymerase and coat protein genes. MNSV and the carmoviruses have no ORFs in locations analogous to the two 3' proximal nested ORFs of the tombusviruses, and the tombusviruses have no ORFs in locations analogous to the small, centrally located ORFs of MNSV and the carmoviruses. Using the criterion of genomic organization, therefore, MNSV seems to be more closely related to the carmo- than to the tombusviruses.

As shown in Fig. 3.6, the major differences in genomic organization between MNSV and the carmoviruses, CarMV and TCV, exist in the central region of the genome. These differences may explain some of the biological differences between these viruses. All 3 viruses have at least 2 short open regions in this area, but these are arranged differently in each virus.

In CarMV, the first carmovirus sequenced, the putative viral polymerase, p86, terminates in an amber codon and the same reading frame (Frame 1) remains open after this amber codon for another 315 nucleotides. Thus, if two amber read-through events occur (firstly to read through the amber terminator of p27 to produce p86 and secondly to read through the amber terminator of p86) a 98 kDa protein could be produced.
One report of in vitro translation of CarMV has shown the production of a protein of 100 kDa, which the authors suggest corresponds to the above 98 kDa protein [Harbison et al., 1985]. In addition to this open area in Frame 1, CarMV also has a small ORF in Frame 2 which could encode a 7 kDa protein, p7. This ORF also terminates in an amber codon. The p7 ORF overlaps slightly with the p86 ORF (ca. 25 nucleotides) and the rest of p7 lies completely within the area which also encodes the second read-through domain of p98. This latter domain overlaps the coat protein ORF slightly (ca. 8 nucleotides). Thus, potentially, there is no noncoding region in the central genomic region of CarMV.

The central region of TCV has a slightly different organization. Its putative replicase, p88, terminates in an ochre terminator (UAA), and the replicase frame, Frame 3, contains other stop codons shortly thereafter so that read-through of this frame is unlikely. Instead, TCV is reported [Carrington et al., 1989] to contain one small ORF in this area (Frame 2), which encodes an 8 kDa protein. Analysis of the TCV sequence data, however, reveals that another small ORF occurs in Frame 3 of this region which encodes a 9 kDa protein. No evidence (such as a subgenomic RNA mapping to this region or a 9 kDa in vitro translation product) has been reported that this 9 kDa protein is produced by TCV. It is, therefore, represented by a rectangle composed of dashed lines in subsequent figures. The 5' end of the p8 ORF overlaps the read-through domain of p88 slightly (ca. 31 nucleotides) and the 3' end of p8 overlaps the 5' end of the p9 ORF for a longer distance (ca. 77 nucleotides). The p9 ORF overlaps the coat protein ORF by ca. 8 nucleotides. Thus, as with CarMV, the entire central area of the TCV genome is occupied by potential coding regions.

The genomic organization of the central region of MNSV shows similarities to and differences from both CarMV and TCV.
Like CarMV, the putative replicase of MNSV (p89) terminates in an amber codon. But like TCV, it is unlikely that a second read-through event occurs in MNSV since there are other in-frame terminators soon after this amber codon. In Frame 3 of the central region, MNSV has two small ORFs, each of which could encode a 7 kDa protein (p7A and p7B). The ORF encoding p7A terminates in an amber codon, with one in-frame amino acid between this amber codon and the methionine of p7B. Thus, these proteins could be separately expressed as two 7 kDa proteins or, as seems more likely, read-through could produce a single 14 kDa protein. The 5' end of the p7A ORF overlaps the p89 ORF slightly (ca. 19 nucleotides) and the 3' end of p14/p7B overlaps the coat protein ORF (ca. 11 nucleotides). Thus, as in CarMV and TCV, the entire central region of the MNSV genome is occupied by potential coding regions.

3.4 Translation Strategy

3.4.1 MNSV Translation Strategy

Based on comparison with related viruses, MNSV appears to use at least two of the translation strategies commonly used by viruses to enable expression of internal cistrons, i.e., production of subgenomic RNAs and read-through of "leaky" termination codons.

Fig. 3.7 shows a Northern blot of total ssRNA from MNSV-infected cucumber hybridized to a radioactively labelled probe derived from either pBST17A-14 or pMNS01A. As shown in Fig. 2.1, these probes correspond to approximately the 5' half and the 3' half of the MNSV genome, respectively.

Figure 3.7: Northern Blot of MNSV-Specific ssRNAs Generated During Infection of Cucumber
Blots were probed with either 32P-labelled pBST17A-14 (a) or pMNS01A (b). Lane 1 contains MNSV virion RNA, lane 2 contains total ssRNA from uninfected cucumber cotyledons, lane 3 contains total ssRNA from MNSV-infected cucumber cotyledons, and lane 4 contains MNSV virion RNA plus total ssRNA from uninfected cucumber cotyledons. Sizes of MNSV-specific ssRNA species are indicated on the right in kb (4.3, 1.9 and 1.6).

Fig. 3.7 shows that the 5' half probe hybridizes to only one ssRNA species of ca. 4.3 kb which closely approximates the size of full-length MNSV genomic RNA. The 3' half probe, pMNS01A, hybridizes to 3 ssRNA species. The largest of these also corresponds in size to full-length genomic RNA, while the 2 shorter RNAs are probably subgenomic RNAs generated during infection.

As discussed in Section 1.1.3, all subgenomic RNAs examined from plant viruses have been found to be 3' coterminal and colinear with the viral genome. The Northern blot of MNSV-specific ssRNAs shown in Fig. 3.7 suggests this is also the case for MNSV subgenomic RNAs and that these subgenomics are orientated relative to the MNSV genome as shown in Fig. 3.8.

Figure 3.8: Orientation of MNSV Subgenomic RNAs Relative to the MNSV Genome
Rectangles represent probable coding regions. The ORF encoding p7B is represented by a dashed rectangle to indicate its expression is less likely than the other ORFs shown. Numbers following "p" indicate the approximate sizes (in kDa) of probable proteins. Numbers at the right indicate the sizes of MNSV-specific ssRNA species in kb. Other numbers indicate nucleotide positions. Arrows indicate the locations of amber termination codons (UAG) proposed to be read-through.

Comparing the genomic locations of the MNSV subgenomic RNAs with the genomic locations of the probable proteins expressed by MNSV (Fig. 3.3) suggests that these proteins are expressed via the translation strategy illustrated in Fig. 3.8. The genomic length ssRNA could serve as template for expression of the 5' proximal p29 and/or p89. The p89 protein could only be produced from this transcript if the amber (UAG) terminator of p29 (nucleotides 892-894 inclusive) is read-through. The larger subgenomic RNA (Subgenomic 1), which is ca. 1.9 kb in length, could serve as template for expressing p7A and/or p14. Again, p14 would only be expressed if the intervening amber termination codon (nucleotides 2637-2639) is suppressed. The smaller subgenomic RNA (Subgenomic 2), which is ca. 1.6 kb in length, could serve as template for the 42 kDa coat protein.

3.4.2 Comparison of the MNSV Translation Strategy with Those of Related Viruses

As illustrated in Fig. 3.9, the translation strategy proposed for MNSV is very similar to those proposed for the carmoviruses CarMV and TCV. These viruses also produce two subgenomic RNAs, and the sizes of their subgenomics [Carrington and Morris, 1984, CarMV] [Carrington et al., 1987, TCV] are very similar to those of MNSV. The actual genomic locations of the CarMV subgenomics have been determined by primer extension analysis [Carrington and Morris, 1986, CarMV] [Carrington et al., 1987, TCV]. This has shown that the CarMV subgenomics are located relative to its genome in locations analogous to those proposed for the MNSV subgenomics. Immunoprecipitation of the in vitro translation product of the smaller subgenomic RNA of CarMV [Carrington and Morris, 1985] has shown that this subgenomic acts as template for the viral coat protein. Similar evidence supports the idea that the smaller subgenomic of TCV acts as template for its coat protein [Carrington et al., 1987]. It is proposed that, as in CarMV and TCV, the smaller 1.6 kb MNSV subgenomic RNA is the template for the coat protein.
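The placement of the MNSV subgenomic RNAs can be illustrated with a rough calculation. This is a sketch under stated assumptions rather than a result from the thesis: it takes the subgenomics to be exactly 3' coterminal with a 4,262 nucleotide genome, treats the gel estimates of 1.9 kb and 1.6 kb as exact lengths, and uses a hypothetical helper `five_prime_end`. Since the gel sizes are approximate and the genome length is a minimum, the resulting positions are indicative only.

```python
GENOME_LENGTH = 4262  # minimum genome length reported for MNSV


def five_prime_end(size_nt, genome_length=GENOME_LENGTH):
    """First nucleotide of a 3'-coterminal RNA that is size_nt long."""
    return genome_length - size_nt + 1


# Approximate sizes from the Northern blot and the first AUG proposed to be used
SUBGENOMICS = {
    "Subgenomic 1": (1900, "p7A", 2442),
    "Subgenomic 2": (1600, "p42", 2815),
}

for name, (size, orf, aug) in SUBGENOMICS.items():
    start = five_prime_end(size)
    leader = aug - start  # nucleotides upstream of the AUG on this RNA
    print(f"{name}: begins near nt {start}, about {leader} nt upstream of the {orf} AUG")
```

With these nominal sizes, each subgenomic RNA would begin a short distance upstream of the ORF it is proposed to express, which is consistent with the orientation shown in Fig. 3.8.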
Similarly, in vitro translation of the larger subgenomic RNA of CarMV [Carrington and Morris, 1986] produces a protein close to the molecular weight predicted for the small, centrally located CarMV p7 ORF [Carrington et al., 1987]. It is proposed that the larger subgenomic RNA of MNSV (ca. 1.9 kb) also acts as template for the expression of its small, centrally located ORFs, p7A and/or p14.

Figure 3.9: Comparison of the Translation Strategies of MNSV, CarMV and TCV
Rectangles represent probable coding regions. ORFs encoding MNSV p7B and TCV p9 are represented by dashed rectangles to indicate their expression is less likely than the other ORFs shown. In particular, no obvious means of translation exists for either of these proteins. Inclusion of TCV p9 is based on this author's interpretation of the nucleotide sequence data published for TCV. Numbers following "p" indicate the approximate sizes (in kDa) of probable proteins. Numbers at the right indicate the sizes (in kb) of virus-specific ssRNA species from which the protein(s) indicated are thought to be translated. Arrows indicate the locations of termination codons proposed to be readthrough as part of the viruses' translation strategies. Coat stands for coat protein.
[Figure labels: MNSV 4.3 kb/F1, 1.9 kb/F3, 1.6 kb/F1; CarMV 4.0 kb/F1, 1.7 kb/F3, 1.5 kb/F2; TCV 4.1 kb/F3, 1.7 kb/F2, 1.5 kb/F1]

As shown in Fig. 3.8, CarMV and TCV have amber terminators in locations analogous to the amber codon which punctuates MNSV p89. It has been shown in CarMV that this amber terminator can be suppressed in in vitro translation, allowing expression of the CarMV p86 [Harbison et al., 1985]. There is also evidence that this occurs in TCV [Dougherty and Kaesberg, 1981].
These findings support the proposal that MNSV p89 may be expressed by read-through of a leaky termination codon.

Comparison with CarMV and TCV lends less support to the idea that the amber terminator of MNSV p7A is read-through to produce p14. The reading frames of TCV p8 and CarMV p7, the analogous proteins to MNSV p7A (see Section 3.5.2), are open for only a short distance after their termination codons (175 and 108 nucleotides respectively). Furthermore, the in vitro translation studies of CarMV and TCV to date provide little evidence to suggest that read-through of CarMV p7 or TCV p8 occurs. These studies, however, have not been specifically designed to determine whether such read-through occurs.

There are, however, other reasons to think that nucleotides 2640 to 2825 of the MNSV genome may be expressed as protein, i.e., either as the read-through area of p14 or independently as p7B. Firstly, this area shares amino acid sequence similarity with proteins which could be expressed from analogous regions of CarMV and TCV (see Section 3.5.2). Secondly, the centrally-located p9 protein of maize chlorotic mottle virus (MCMV), an unclassified spherical virus, shares amino acid sequence similarity with MNSV p7 (see Section 3.5.2) and MCMV p9 has a very long potential read-through protein, p32.7 [Nutter et al., 1989]. Finally, if neither MNSV p7B nor the read-through portion of MNSV p14 is expressed, there would be a fairly large region in the middle of the MNSV genome which lacks a coding function. This would be unusual for a small RNA virus where intercistronic regions are usually very short.

If this area of MNSV does have a coding function, it is more likely that it is expressed by read-through of the p7A terminator to produce p14, than by independent expression of p7B. A separate subgenomic RNA for p7B was not found when MNSV-specific ssRNA from infected cucumber was examined (see Fig. 3.7).
It is possible that such a subgenomic exists but could not be differentiated from the subgenomics for p7A or the coat protein under the electrophoretic conditions used. It is also possible that a p7B subgenomic was not detected because it is only produced at very low levels or under certain conditions. If, however, such a subgenomic does not exist, then expression of p7B would seem to require some unusual translation mechanism such as internal initiation, which has been proposed but not demonstrated to occur in some plant viruses [Joshi and Haenni, 1984].

3.5 Amino Acid Sequence Comparisons of MNSV Nonstructural Proteins

The deduced amino acid sequences of putative MNSV nonstructural proteins were compared to the amino acid sequences of proteins of other small, isometric, positive-sense, ssRNA plant viruses to help determine possible relationships between MNSV and these viruses. In addition, MNSV protein sequences were also compared to the amino acid sequences of proteins of known function by computerized searches of several protein and nucleotide sequence databases, to help determine possible functions of the MNSV nonstructural proteins.

3.5.1 p29/p89 Comparisons with Other Plant Viruses

Fig. 3.10 shows dot matrix comparisons of MNSV p29/p89 with the analogous proteins of several other plant viruses. These comparisons show that MNSV p29 (the pre-readthrough domain of p89) shares amino acid sequence similarity with the analogous proteins of the carmoviruses CarMV and TCV, and the unclassified virus MCMV, but shares no discernible similarity with the analogous proteins of the tombusvirus CNV or the luteovirus barley yellow dwarf virus (BYDV) [Miller et al., 1988]. The dot matrix comparisons show, however, that the read-through domain of MNSV p89 shares a high degree of amino acid sequence similarity with the analogous proteins of all these viruses.

Figure 3.10: Dot Matrix Comparisons of the Amino Acid Sequence of MNSV p89 with Those of the Analogous Proteins CarMV p86, TCV p88, MCMV p111, CNV p92 and BYDV p99
Arrows indicate the direction from the amino- to the carboxyterminus of each protein. Triangles indicate the approximate locations of UAG codons separating the two domains of MNSV p89, CarMV p86, TCV p88 and MCMV p111. The triangle in the BYDV sequence indicates the approximate location of the frame-shift region separating the two domains of BYDV p99. Comparisons were made by the proportional match method, using a span of 31 and a similarity score of 350 (see Section 2.1.3).

Fig. 3.11 shows an alignment of the amino acid sequences of MNSV p29 and the analogous proteins of the carmoviruses (CarMV p27 and TCV p28) and of the unclassified virus MCMV (the carboxy terminal region of MCMV p50). Table 3.2 shows the percent amino acid sequence identity between pairs of these proteins as aligned in Fig. 3.11. Of the four virus groups compared, MNSV p29 shares the highest degree of amino acid sequence similarity with the analogous proteins of the carmoviruses. This supports the idea that MNSV is most closely related to this group of viruses.

Table 3.2: Percent Amino Acid Sequence Identity Among MNSV p29 and the Analogous Proteins CarMV p27, TCV p28 and MCMV p50

         CarMV   TCV     MCMV
MNSV     21.4    23.6    18.3
CarMV    -       27.5    17.9
TCV      -       -       18.9

Percent amino acid sequence identity is given by

\[
\text{Percent identity} = \frac{\text{Number of identical residues shared by the two sequences compared}}{\text{Average length of the two sequences, without gaps}} \times 100
\]

Only the carboxyl terminal 268 amino acids of MCMV p50 are included in these calculations.
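The percent identity measure defined for Table 3.2 can be sketched in a few lines. This is an illustration only, not the program used for the thesis: the function name `percent_identity` is hypothetical, it assumes the two sequences have already been aligned to equal length with "-" marking gaps, and the two short sequences below are invented for the example rather than taken from Fig. 3.11.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Identical residues divided by the average ungapped length, times 100."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Count columns where both sequences carry the same residue (gaps never count)
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    length_a = len(aligned_a) - aligned_a.count(gap)
    length_b = len(aligned_b) - aligned_b.count(gap)
    average_length = (length_a + length_b) / 2
    return 100.0 * identical / average_length


# Invented toy alignment (one-letter amino acid code), for illustration only
toy_a = "MDAR-TGLK"
toy_b = "MDSRQTG-K"
print(percent_identity(toy_a, toy_b))
```

Dividing by the average ungapped length, rather than by the alignment length, keeps gaps introduced to optimize the alignment from lowering the score twice.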
Figure 3.11: Alignment of the Amino Acid Sequences of MNSV p29 and the Analogous Proteins CarMV p27, TCV p28 and MCMV p50
Amino acid sequences are denoted using the one-letter code. Dashes indicate gaps introduced to optimize the alignment. Amino acids shared by two or more of the proteins compared are boxed. Only the carboxyl terminal 268 amino acids of MCMV p50 are included in the alignment.
Amino acid sequence comparisons of the read-through domain of MNSV p89 also support this conclusion. Fig. 3.12 shows an alignment of the deduced amino acid sequence of the read-through domain of MNSV p89 with the analogous proteins of CarMV, TCV, MCMV, CNV and BYDV. Table 3.3 shows the percent amino acid sequence identity between pairs of these proteins as aligned in Fig. 3.12. This comparison shows that all these viruses, although many are from different groups, share a very high degree of amino acid sequence identity in this protein. As with MNSV p29, however, the read-through domain of MNSV p89 shares the highest degree of sequence identity with the two carmoviruses (49.7% with CarMV and 53.2% with TCV; average identity 51.5%), a slightly lesser degree of identity with the unclassified virus MCMV (51.0%), less identity with the tombusvirus CNV (38.7%) and the least identity with the luteovirus BYDV (33.2%).

Table 3.3: Percent Amino Acid Sequence Identity Among the Read-through Domains of MNSV p89 and the Analogous Proteins CarMV p86, TCV p88, MCMV p111, CNV p92 and BYDV p99

         CarMV   TCV     MCMV    CNV     BYDV
MNSV     49.7    53.2    51.0    38.7    33.2
CarMV    -       50.0    44.1    38.5    32.2
TCV      -       -       51.0    38.9    32.4
MCMV     -       -       -       40.3    31.8
CNV      -       -       -       -       31.3

Percent amino acid sequence identity is given by

\[
\text{Percent identity} = \frac{\text{Number of identical residues shared by the two sequences compared}}{\text{Average length of the two sequences, without gaps}} \times 100
\]
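The ranking described for the read-through domain can be reproduced from the MNSV row of Table 3.3. The sketch below is illustrative; the dictionary names and the grouping labels are assumptions made for the example, and only the percentages come from the table.

```python
from statistics import mean

# MNSV row of Table 3.3 (percent identity of the p89 read-through domain)
IDENTITY_TO_MNSV = {
    "CarMV": 49.7,
    "TCV": 53.2,
    "MCMV": 51.0,
    "CNV": 38.7,
    "BYDV": 33.2,
}

# Virus groups as described in the text
GROUPS = {
    "carmoviruses": ["CarMV", "TCV"],
    "MCMV (unclassified)": ["MCMV"],
    "tombusvirus": ["CNV"],
    "luteovirus": ["BYDV"],
}

# Average identity to MNSV within each group
group_means = {
    group: mean(IDENTITY_TO_MNSV[virus] for virus in members)
    for group, members in GROUPS.items()
}

# Highest average identity first
ranked = sorted(group_means.items(), key=lambda item: item[1], reverse=True)
for group, value in ranked:
    print(f"{group:20s} {value:.2f}")
```

The carmovirus group ranks first only on its average: taken individually, MCMV (51.0%) lies between TCV (53.2%) and CarMV (49.7%).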
Figure 3.12: Alignment of the Amino Acid Sequences of the Read-Through Domain of MNSV p89 and the Analogous Proteins CarMV p86, TCV p88, MCMV p111, CNV p92 and BYDV p99 (alignment graphic omitted). Amino acid sequences are denoted using the one-letter code. Dashes indicate gaps introduced to optimize the alignment. Amino acids shared by four or more of the proteins compared are boxed.

Comparisons with Proteins of Known Function

Many plant viruses have two 5' proximal ORFs arranged in a manner analogous to the MNSV p29 and p89, i.e., the 5' proximal ORF is smaller, terminates, usually with an amber codon, and can be read through to produce a larger protein [Dougherty and Hiebert, 1985]. In many of these viruses, the protein produced by read-through has been identified as the putative viral replicase [Goldbach, 1986]. It has been suggested that in at least some of these viruses, the replicase protein has two functional domains, corresponding to the two ORFs from which it is translated [Goldbach, 1987]. Conserved sequence motifs have been associated with each of these functional domains.

Read-through Domain of MNSV p89

The more highly conserved domain of the replicase in most viruses is the read-through domain, which has been identified as encoding the RNA-dependent RNA polymerase function of many viral replicases [Kamer and Argos, 1984]. This polymerase domain is characterized in most viruses by a highly conserved amino acid triplet glycine-aspartate-aspartate (GDD), surrounded by several less-conserved hydrophobic residues [Kamer and Argos, 1984, Argos, 1988]. As shown in Figure 3.13, the read-through domain of the MNSV p89 contains the GDD sequence and other conserved residues characteristic of viral polymerases. Based on the size of MNSV p89, its genomic location and organization and the fact that it contains sequence motifs characteristic of viral polymerases, it seems likely that MNSV p89 is the viral replicase, and that its read-through domain comprises the viral polymerase.

MNSV p29

Little is known about the pre-readthrough portion of the viral replicase.
It is not known with certainty in what form(s) the pre-readthrough region may be functional, i.e., it may be translated as a separate protein ending at the termination codon which separates it from the read-through domain, or it may only be functional as the pre-readthrough domain of the larger replicase. The latter seems unlikely, since in this case there would be no reason for the termination codon found at the end of the pre-readthrough domain.

Figure 3.13: Alignment of Amino Acids Surrounding the GDD Sequence of MNSV p89 with Conserved Amino Acids Characteristic of Many Viral Polymerases

            *       *                  ***
         * ***  *  **                  ***
MNSV    CRMSGDMNTAMGNC -- 19 aa -- IMNNGDDCVV
CarMV   CRMSGDMNTALGNC -- 16 aa -- LINNGDDCVL
TCV     CRMSGDVNTALGNC -- 17 aa -- LINNGDDCVL
MCMV    KRMSGDMNISLGNC -- 19 aa -- LINNGDDNVL
CNV     CRMSGDINTSLGNY -- 20 aa -- LANCGDDCVL
BYDV    HRMSGDINTSMGNK -- 19 aa -- LCNNGDDCVI
TMV     QRKSGDVTTFIGNT -- 18 aa -- GAFCGDDSLL
AlMV    QRRTGDALTYLGNT -- 18 aa -- VVASGDDSLI
BMV     QRRTGDAFTYFGNT -- 18 aa -- AIFSGDDSLI
SNBV    HMKSGMFLTLFVNT -- 20 aa -- AAFIGDDNII
FMD     GHPSGCSATSIINT -- 24 aa -- MISYGDDIVV
Polio   GMPSGCSGTSIFNS -- 24 aa -- MIAYGDDVIA
CPMV    GIPSGFPMTVIVNS -- 33 aa -- LVTYGDDNLI

Sequence data for MNSV were taken from this thesis. Sequence data for the viruses from CarMV to BYDV were taken from the references given in the text. Sequence data for the remaining viruses, TMV, alfalfa mosaic virus (AlMV), brome mosaic virus (BMV), sindbis virus (SNBV), foot-and-mouth disease virus (FMD), polio virus and cowpea mosaic virus (CPMV), were taken from Rochon and Tremaine, 1989. Amino acid sequences are denoted using the one-letter code. The notation "-- no. aa --" indicates the number of intervening amino acids separating aligned areas of a sequence. A double asterisk indicates that residues are identical in all 13 viruses examined; a single asterisk indicates that residues are identical in 9 to 12 of the viruses examined.
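The conservation marks described in the legend of Figure 3.13 can be recomputed directly from the aligned residues. The Python sketch below is illustrative only: the dictionary `FIG_3_13`, the function `conservation_marks` and the mark characters (`#` for identical in all sequences, `*` for identical in 9 to 12 of them) are assumptions made for this example, and the sequences are copied from the figure as printed.

```python
from collections import Counter

# Residues flanking the GDD triplet, copied from Figure 3.13
# (left and right aligned blocks for each virus).
FIG_3_13 = {
    "MNSV": ("CRMSGDMNTAMGNC", "IMNNGDDCVV"),
    "CarMV": ("CRMSGDMNTALGNC", "LINNGDDCVL"),
    "TCV": ("CRMSGDVNTALGNC", "LINNGDDCVL"),
    "MCMV": ("KRMSGDMNISLGNC", "LINNGDDNVL"),
    "CNV": ("CRMSGDINTSLGNY", "LANCGDDCVL"),
    "BYDV": ("HRMSGDINTSMGNK", "LCNNGDDCVI"),
    "TMV": ("QRKSGDVTTFIGNT", "GAFCGDDSLL"),
    "AlMV": ("QRRTGDALTYLGNT", "VVASGDDSLI"),
    "BMV": ("QRRTGDAFTYFGNT", "AIFSGDDSLI"),
    "SNBV": ("HMKSGMFLTLFVNT", "AAFIGDDNII"),
    "FMD": ("GHPSGCSATSIINT", "MISYGDDIVV"),
    "Polio": ("GMPSGCSGTSIFNS", "MIAYGDDVIA"),
    "CPMV": ("GIPSGFPMTVIVNS", "LVTYGDDNLI"),
}


def conservation_marks(rows, partial_min=9):
    """Return one mark per alignment column.

    '#' : the same residue occurs in every row
    '*' : the most frequent residue occurs in at least `partial_min` rows
    ' ' : anything less conserved
    """
    total = len(rows)
    marks = []
    for column in zip(*rows):
        most_common_count = Counter(column).most_common(1)[0][1]
        if most_common_count == total:
            marks.append("#")
        elif most_common_count >= partial_min:
            marks.append("*")
        else:
            marks.append(" ")
    return "".join(marks)


left_marks = conservation_marks([left for left, _ in FIG_3_13.values()])
right_marks = conservation_marks([right for _, right in FIG_3_13.values()])
print(left_marks)
print(right_marks)
```

The threshold of 9 is taken from the figure legend and only makes sense for a set of 13 sequences; for a different number of sequences `partial_min` would need to be chosen again.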
It is possible that the pre-readthrough region is expressed in both forms, possibly at different levels, and under different conditions or at different stages of the viral life-cycle. It may have different functions depending on the form in which it is expressed.

The function of the pre-readthrough domain of viral replicases is also not known with certainty. In most positive-sense ssRNA viruses examined to date, a region which is usually located N-terminal to the polymerase domain of the replicase has been found to contain certain conserved sequences found in various nucleotide binding proteins [Gorbalenya et al., 1988, Hodgman, 1988, Gorbalenya et al., 1989]. These regions have, therefore, been referred to as nucleotide binding domains. The actual role of this domain in RNA replication is not known with certainty, but it has been suggested that it may act as a helicase with possibly two activities [Gorbalenya et al., 1988]. Firstly, it may function during replication to unwind the double-stranded replicative forms of ssRNA viruses in order to make the template strand accessible to the viral polymerase. Secondly, it may function in recombination between RNA genomes.

It has previously been reported that the pre-readthrough regions of the carmoviruses CarMV and TCV do not contain the conserved sequences characteristic of a nucleotide binding domain [Carrington et al., 1989]. The pre-readthrough region of the tombusvirus CNV also lacks this motif [Rochon, 1989] as do the analogous regions of MCMV and the luteovirus BYDV. MNSV, which shares significant amino acid sequence similarity with these viruses in the polymerase domain of the replicase (see Fig. 3.12), also resembles these viruses in that its pre-readthrough region, p29, lacks a nucleotide binding motif. In fact, none of the coding regions of MNSV contains such a motif. It has been suggested that viruses lacking this motif may have differences in their replication strategy compared
to viruses which contain a nucleotide binding domain [Carrington et al., 1989].

Since the MNSV p29 lacks a nucleotide binding domain, its amino acid sequence was compared with the sequences of proteins in various protein data banks to see if any significant similarities existed which might indicate a possible function or mode of action for this protein. Unfortunately, no significant similarities were found.

Replicase Sequences in Virus Taxonomy

Taxonomists have used amino acid sequence similarity in the polymerase domain of viral replicases as an important criterion in grouping positive-sense ssRNA viruses into two major supergroups, the picornaviruses and the Sindbis-like or alphaviruses [Goldbach, 1986, Strauss and Strauss, 1988]. Each supergroup contains plant and animal viruses, as well as viruses with very different genomic organizations and translation strategies. All viruses within a supergroup, however, share significant amino acid sequence similarity with each other in at least their polymerase domains.

It has previously been reported that the polymerase domains of the carmo- and tombusviruses sequenced to date, as well as the luteovirus barley yellow dwarf (BYDV), do not share significant amino acid sequence similarity with the polymerases of viruses from either of the established supergroups, but share a very high degree of sequence similarity with each other [Rochon and Tremaine, 1989]. It has been suggested, therefore, that these viruses may compose a third viral supergroup [Rochon and Tremaine, 1989]. On this basis, it is proposed that MNSV be classified in this third supergroup. The MNSV polymerase shares no discernible amino acid sequence similarity with the polymerases of viruses belonging to either of the two established supergroups, but shares a high degree of sequence similarity with the polymerases of all the viruses proposed as members of the new supergroup (see Fig. 3.12).
It is also notable that these viruses are unusual among positive-sense ssRNA plant viruses in that the pre-readthrough domains of their replicases lack a nucleotide binding motif [Gorbalenya et al., 1989]. As noted above, it has been suggested that this may indicate that these viruses have a distinctive strategy of replication [Carrington et al., 1989]. This may also explain why their polymerases are so different from those of other positive-sense ssRNA viruses and provides an additional reason for classifying these viruses in a third supergroup.

Modular Evolution

The two domains of the MNSV replicase exhibit very different degrees of amino acid sequence similarity in comparisons with the analogous regions of related plant viruses. This illustrates the idea that different functional domains or modules of a viral genome can evolve at very different rates [McClure et al., 1988, Zimmern, 1988]. The high degree of sequence similarity among the polymerase domains of MNSV, CarMV, TCV, CNV, MCMV and BYDV strongly suggests that this domain arose from an ancestor common to all these viruses and has been highly conserved in all of them since then. Alternatively, one or more of these viruses may have only recently acquired this type of polymerase domain by recombination [Zimmern, 1988]. In contrast, the pre-readthrough domains of the replicases of several of these viruses show no discernible amino acid sequence similarity, suggesting this domain of the replicase has either arisen or evolved in a very different manner from the polymerase domain, e.g., the pre-readthrough domain of all of these viruses may have arisen from a common ancestral sequence, but diverged rapidly in some of them. Alternatively, recombination may have joined very different pre-readthrough domains to the common polymerase domains of some of these viruses.
The possibility that viruses evolve, at least partly, through recombination of functional modules has recently been discussed [Zimmern, 1988].

3.5.2 The Central Genomic Region of MNSV

Potential Proteins

As discussed previously and as illustrated in Fig. 3.6, the central genomic region of MNSV, between the end of p89 and the beginning of the 42 kDa coat protein, encodes three potential proteins: p7A, p7B and p14. MNSV p7A and p7B are in the same frame and are separated by only three nucleotides. The ORF encoding p7A terminates in an amber codon. Read-through of this terminator would effectively join p7A and p7B to form p14. Since p7B also has its own start codon, it is also possible, though unlikely, that p7B is independently expressed (see Section 3.4.2).

Amino Acid Sequence Comparisons with Related Viruses

The tombusviruses do not have a central genomic region resembling that of MNSV, nor do any of the tombusvirus proteins, including the 3' proximal nested ORFs encoding p20 and p21, exhibit discernible amino acid sequence similarity with any of the three potential proteins encoded by the central genomic region of MNSV. As discussed previously, however, the carmoviruses, CarMV and TCV, show definite similarities in the organization of potential proteins encoded by their central genomic regions (see Fig. 3.6). Some of these proteins also share significant amino acid sequence similarities with potential MNSV proteins in this region. Although MNSV and the carmoviruses share overall similarity in their central genomic region, they also exhibit significant differences in this region which may account for some of their biological differences. The centrally located p9 of MCMV also shares amino acid sequence similarity with MNSV p7A and the analogous proteins of CarMV and TCV. Fig.
3.14 illustrates the major similarities and differences in organization and areas of amino acid sequence similarity in the central genomic regions of MNSV, CarMV, TCV and MCMV.

Figure 3.14: Comparison of the Central Genomic Regions of MNSV, CarMV, TCV and MCMV (diagram omitted). Rectangles represent probable coding regions. Rectangles shaded in the same way indicate amino acid sequences which share at least a discernible level of sequence similarity in dot matrix comparisons. ORFs encoding MNSV p7B and TCV p9 are represented by dashed rectangles to indicate their expression is less likely than the other ORFs shown. Inclusion of TCV p9 is based on this author's interpretation of the nucleotide sequence data published for TCV. The amino acid sequence represented by an open dashed rectangle in MCMV is not an ORF (it has no start codon), but shows sequence similarity with the other 3 viruses as indicated. Numbers following "p" indicate the approximate sizes (in kDa) of probable proteins. Arrows indicate the locations of termination codons thought to be read through. F1-F3 indicate the reading frame from which each amino acid sequence is translated.

The highest degree of amino acid sequence similarity is seen among MNSV p7A,
CarMV p7, TCV p8 and MCMV p9. These sequences are aligned in Fig. 3.15. Table 3.4 shows the percent amino acid sequence identity between pairs of these proteins as aligned in Fig. 3.15. As seen in comparisons of other MNSV proteins, MNSV p7A most closely resembles the analogous proteins of the carmoviruses, suggesting a closer relationship with this group. In addition, subgenomic RNAs allowing expression of MNSV p7A, CarMV p7 and TCV p8 have been detected, but a separate subgenomic RNA for expression of MCMV p9 has not. It has been suggested that MCMV p9 may be expressed by internal initiation [Nutter et al., 1989].

Figure 3.15: Alignment of the Amino Acid Sequences of MNSV p7A and the Analogous Proteins CarMV p7, TCV p8 and MCMV p9 (alignment graphic omitted). Amino acid sequences are denoted using the one-letter code. Dashes indicate gaps introduced to optimize the alignment. Amino acids shared by two or more of the proteins compared are boxed.

A lesser degree of amino acid sequence similarity exists among MNSV p7B (or the carboxy terminal half of p14), the 11 kDa second read-through region of CarMV p98, TCV p9 (which, like CarMV p98, is in the same frame as the replicase), and a region of MCMV which is also in-frame with the replicase, but which is probably not expressed (see Fig.
3.14). The similarity between some of these regions is so low that it is only seen in dot matrix comparisons (not shown) using less stringent conditions than were normally used (see Section 2.1.3). These sequences are aligned in Fig. 3.16.

Table 3.4: Percent Amino Acid Sequence Identity Among MNSV p7A and the Analogous Proteins CarMV p7, TCV p8 and MCMV p9

         CarMV   TCV     MCMV
MNSV     31.7    26.3    15.2
CarMV    -       36.1    12.8
TCV      -       -       13.2

Percent amino acid sequence identity is given by

(number of identical residues shared by the two sequences compared) / (average length of the two sequences, without gaps) x 100

Figure 3.16: Alignment of the Amino Acid Sequences of MNSV p7B and the Analogous Regions of CarMV, TCV and MCMV (alignment graphic omitted). Amino acid sequences are denoted using the one-letter code. Dashes indicate gaps introduced to optimize the alignment. Amino acids shared by two or more of the proteins compared are boxed. Asterisks indicate termination codons.

Table 3.5 shows the percent amino acid sequence identity between pairs of these sequences as aligned
in Fig. 3.16.

Table 3.5: Percent Amino Acid Sequence Identity Among MNSV p7B and the Analogous Regions of CarMV, TCV and MCMV

         CarMV   TCV     MCMV
MNSV     17.8    12.3    21.9
CarMV    -       35.3    12.9
TCV      -       -       8.2

Percent amino acid sequence identity is given by

(number of identical residues shared by the two sequences compared) / (average length of the two sequences, without gaps) x 100

These figures show that definite amino acid sequence similarity exists in analogous regions of these four viruses, suggesting that the sequence in this region is being conserved for some reason. The pressure for sequence conservation may be at the amino acid or nucleotide sequence level. The latter could occur if, e.g., this region plays a role in controlling gene expression such as serving as a promoter for expression of the adjacent coat protein gene. However, no obvious nucleotide sequence conservation was found in this region, even at its 3' end where the level of amino acid sequence conservation is highest. This suggests that conservation in this region is occurring at the amino acid sequence level. Such conservation, however, would require that these four regions be expressed as proteins and that these proteins have similar functions. In this case, this would imply that the four regions would be independently expressed as small proteins of similar size. However, there is little evidence that these regions are expressed at all and even less evidence that they are expressed as proteins of similar size.

In CarMV, this region has been expressed in vitro as a second read-through of the 5' proximal ORF [Harbison et al., 1985]. This region of CarMV also contains two in-frame start codons, either of which could initiate translation of a small separate protein, but a separate subgenomic RNA for expression of such a protein has not been found [Carrington and Morris, 1986].
The analogous region in MNSV has two potential modes of expression, i.e., as a read-through protein of p7A or as a separate protein. The latter seems unlikely even though this region contains an in-frame start codon because no corresponding subgenomic RNA has been found. Similarly, expression of TCV p9 is questionable although it too contains an in-frame start codon, because no corresponding subgenomic RNA has been found [Carrington et al., 1987]. It is possible that with all three of these viruses, a subgenomic RNA allowing separate expression of this region does exist but has not been detected. Alternatively, some novel mode of expression, such as internal initiation [Joshi and Haenni, 1984], may occur.

Expression of the analogous region in MCMV seems even less likely than in these other three viruses. This region could not be expressed by read-through of the replicase because two in-frame stop codons occur between the replicase and this region. Similarly, expression as a separate protein is impossible because no in-frame start codon exists in this area. The only way of expressing this area would seem to be as an extension of MCMV p9 by a frameshift mechanism as is suggested to occur in BYDV [Miller et al., 1988]. This seems unlikely not only because frameshifting appears to be a rarely used translation mechanism, but also because MCMV p9 has a very long potential read-through protein, p32.7. In addition, as noted previously, expression of MCMV p9 itself is questionable because a subgenomic RNA enabling its expression has not been detected [Nutter et al., 1989].

Given that expression of this region as protein is questionable in all four viruses and that the observed amino acid sequence conservation does not appear to arise from conservation at the nucleotide sequence level, it is difficult to explain why the amino acid sequence is so conserved in this region among these four viruses.
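The reading-frame arguments used above (an in-frame start codon is needed for separate initiation, and more than one in-frame stop codon upstream argues against simple read-through) can be checked mechanically once a sequence is in hand. The Python sketch below is hypothetical throughout: `TOY_RNA` is an invented string, not MNSV or MCMV sequence, and the function names are assumptions made for illustration.

```python
# Stop codons by their traditional names; amber (UAG) is the one
# discussed in the text as a read-through terminator.
STOP_CODONS = {"UAG": "amber", "UAA": "ochre", "UGA": "opal"}
START_CODON = "AUG"


def codons(rna, frame_start=0):
    """Yield (position, codon) for every complete codon in one reading frame."""
    for pos in range(frame_start, len(rna) - 2, 3):
        yield pos, rna[pos:pos + 3]


def in_frame_stops(rna, frame_start=0):
    """Positions and identities of stop codons in the chosen frame."""
    return [(pos, c) for pos, c in codons(rna, frame_start) if c in STOP_CODONS]


def in_frame_starts(rna, frame_start=0):
    """Positions of AUG codons in the chosen frame."""
    return [pos for pos, c in codons(rna, frame_start) if c == START_CODON]


def stops_before(rna, region_start, frame_start=0):
    """In-frame stop codons lying upstream of a region of interest."""
    return [(pos, c) for pos, c in in_frame_stops(rna, frame_start)
            if pos < region_start]


# Invented example: a short ORF ending in UAG, followed in the same frame
# by a second ORF that begins with its own AUG.
TOY_RNA = "AUGGCUAAAUAGAUGGCAGGAUAA"
DOWNSTREAM_REGION_START = 12

print(in_frame_stops(TOY_RNA))
print(in_frame_starts(TOY_RNA))
print(len(stops_before(TOY_RNA, DOWNSTREAM_REGION_START)))
```

A scan of this kind only shows which modes of expression a sequence permits; whether a region is actually expressed still depends on evidence such as the detection of a corresponding subgenomic RNA.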
Amino Acid Sequence Comparisons to Determine Possible Functions

The functions of the protein(s) encoded by the central genomic region of MNSV are unknown. It has been suggested that the CarMV p7 and TCV p8 might be involved in transport [Guilley et al., 1985, Carrington et al., 1989], although no amino acid sequence similarity has been found between these proteins and known or putative viral transport proteins. In an attempt to obtain some clue as to the functions of the protein(s) encoded by the central genomic region of MNSV, their amino acid sequences were compared with those of proteins in several protein and nucleotide sequence databanks (see Section 2.1.3). Unfortunately, no significant similarities were found.

3.6 Summary and Conclusions

Number, Size, Genomic Organization and Proposed Translation of MNSV Nonstructural Proteins

Of the several ORFs encoded by the MNSV genome, probably only four or five are expressed in vivo as proteins. These proteins, listed in order of their location on the MNSV genome from the 5' terminus, have molecular weights of ca. 29,000 (p29), 89,000 (p89), 7,000 (p7), 14,000 (p14), and 42,000 (p42). The p29 and p89 proteins are probably translated from genomic length RNA, while p7A and p14 are probably translated from the larger of two subgenomic RNAs produced during infection by MNSV. The smaller subgenomic RNA probably serves as template for the 42 kDa coat protein. Expression of p89 and p14 would require read-through of the termination codons of p29 and p7A, respectively.

Probable Functions of MNSV Nonstructural Proteins

MNSV p89 is almost certainly the viral replicase, with the read-through portion of this protein comprising the RNA-dependent RNA polymerase domain. MNSV p29 and/or the pre-readthrough domain of p89 presumably function in replication, but their mode of action is unknown.
The functions of p7A and p14 are also unknown, but may be related to virus transport.

MNSV Replication

MNSV may use a replication strategy different from those used by viruses in the alpha- and picornavirus supergroups. This is suggested by the fact that the MNSV replicase lacks the nucleotide binding domain found in these viruses, and has a polymerase which is very different in amino acid sequence from the polymerases of these viruses. MNSV resembles the carmoviruses, tombusviruses, MCMV and the luteovirus BYDV in these respects, and should probably be classified with these viruses in a new, third virus supergroup.

Relationships Between MNSV and Other Plant Viruses

Fig. 3.17 compares the genomic organizations of MNSV, the carmoviruses CarMV and TCV, the tombusvirus CNV, the unclassified virus MCMV and the luteovirus BYDV. It also indicates regions of amino acid sequence similarity among these viruses. Although MNSV shares some regions of amino acid sequence similarity with each of these viruses, its nonstructural proteins share the greatest sequence similarity with those of the carmoviruses. MNSV also closely resembles these viruses in the number, size and genomic organization of its probable proteins, as well as in their likely translation strategy. Based on these molecular similarities, as well as on previously reported physical, chemical and biological similarities to various carmoviruses, it is proposed that MNSV be classified as a member of the carmovirus group. It has previously been reported that the MNSV coat protein more closely resembles the coat proteins of tombusviruses than of other carmoviruses [Pot, 1987, Riviere et al., 1989]. The fact that all other molecular criteria indicate that MNSV is a carmovirus illustrates the danger of using coat protein characteristics as a major criterion in classifying viruses.
Figure 3.17: Comparison of the Genomic Organizations and Regions of Amino Acid Sequence Similarity Between MNSV and Several Other Plant Viruses (diagram omitted). Rectangles represent probable coding regions. ORFs encoding MNSV p7B and TCV p9 are represented by dashed rectangles to indicate their expression is less likely than the other ORFs shown. Inclusion of TCV p9 is based on this author's interpretation of the nucleotide sequence data published for TCV. Rectangles shaded in the same way indicate proteins which share at least a discernible level of amino acid sequence similarity in dot matrix comparisons. Numbers following "p" indicate the approximate sizes (in kDa) of probable proteins. Arrows indicate the locations of termination codons thought to be read through.

Appendix A

Protocols

A.1 Recovery of DNA Fragments from Low Gelling Temperature Agarose Gels (adapted from Tautz and Renz, 1983)

1. Electrophorese DNA through a low gelling temperature agarose gel.
2. Visualize briefly over a long wavelength UV light (365 nm) and excise band of interest.
3. Weigh gel slice in a pre-weighed 1.5 ml microfuge tube and add an equivalent volume of elution solution (0.3 M NaOAc pH 7.0, 1 mM EDTA), e.g., add 100 µl of solution to a 0.1 g gel slice. The solution should cover the gel slice; if necessary, cut gel slice into two or more pieces, but do not crush gel.
4. Equilibrate gel slice in elution solution in the dark, at room temperature for 15-45 min.
The smaller the gel fragment and the shorter the DNA fragment, the shorter the equilibration time should be, to minimize leaching of the DNA into the solution.
5. Remove as much of the elution solution as possible and discard.
6. Freeze gel slice in liquid nitrogen or in an ethanol/dry ice bath for 2 min.
7. Spin in a microfuge at room temperature for 5 min. This crushes the gel slice and squeezes the DNA out of it.
8. Remove supernatant containing DNA, minimizing carryover of agarose, and save in a separate 1.5 ml microfuge tube.
9. Add an equivalent volume of elution solution to the crushed gel. The volume of the gel piece will be significantly reduced from its original volume, so reduce the volume of elution solution accordingly.
10. Heat the mixture to 65°C for 2-3 min.
11. Freeze, centrifuge and save supernatant as above.
12. Repeat steps 9-11 once or twice more.
13. Determine volume of combined supernatants. Add 1/100 this volume of 1 M MgCl2 and 2.5 volumes of ice-cold, 99% or absolute ethanol.
14. Incubate in an ethanol/dry ice bath or a -70°C freezer for 15-30 min to precipitate DNA.
15. Pellet DNA in a microfuge at 4°C for 10 min. Wash pellet twice with 70% ethanol.
16. Dry pellet and resuspend in a suitable buffer, such as TE (10 mM Tris-HCl pH 7.5, 1 mM EDTA).
17. If large amounts of agarose remain in the sample, spin in microfuge at room temperature for 1 min. Carefully transfer supernatant to a clean microfuge tube and discard agarose pellet.

Notes

1. When excising desired band from agarose gel, minimize exposure to UV light to prevent nicking of the DNA.
2. The volume of elution buffer used in Steps 9-12 can be reduced slightly, e.g., to allow precipitation of DNA from the combined supernatants in one tube to maximize recovery. If a very large gel slice (or several gel slices) is used, it should be split between several tubes and treated as separate samples.
3.
Lengthy freezing and centrifuging should not be necessary to precipitate DNA from the combined supernatants because residual agarose acts as a carrier.
4. DNA yields using this method are usually between 50-70%. Recovered DNA is suitable for nick-translation and cloning.

A.2 Large Scale Plasmid Purification Using Alkaline Lysis and LiCl Precipitation

1. Inoculate a small volume of Luria-Bertani broth containing 50 µg ampicillin/ml from a single E. coli colony containing the desired plasmid. Incubate at 37°C, overnight, with vigorous shaking.
2. Inoculate 200 ml of Luria-Bertani broth containing 50 µg ampicillin/ml with 100 µl of the overnight culture from Step 1. Incubate as in Step 1.
3. Centrifuge in a GSA rotor at 7,500 rpm at 4°C for 10 min.
4. Discard the supernatant. Resuspend the pellet in 5 ml of lysozyme solution (see Notes). Incubate at room temperature for 10 min.
5. Add 10 ml of 0.2 N NaOH plus 0.5% SDS. Incubate on ice for 5 min.
6. Add 7.5 ml of 3 M KOAc pH 4.8. Incubate on ice for 10-30 min.
7. Centrifuge in a SS34 rotor at 7,500 rpm at 4°C for 10 min.
8. Transfer the supernatant to a clean tube. Discard the pellet. Add 2.5 volumes of ethanol (see Notes) to the supernatant. Incubate 15 min at -70°C.
9. Centrifuge in a SS34 rotor at 7,500 rpm at 4°C for 10 min.
10. Discard the supernatant. Resuspend the pellet in 6 ml TE (10 mM Tris-HCl pH 7.5, 1 mM EDTA) plus 2 ml of 8 M LiCl. This results in a final concentration of 2 M LiCl, which precipitates high molecular weight ssRNA [Baltimore, 1966]. Incubate on ice for 15 min.
11. Centrifuge in a SS34 rotor at 7,500 rpm at 4°C for 10 min.
12. Transfer the supernatant to a clean tube. Discard the large white pellet. Add 2 volumes of ethanol to the supernatant. Incubate 15 min at -70°C.
13. Centrifuge in a SS34 rotor at 10,000 rpm at 4°C for 10 min.
14. Discard the supernatant. Resuspend the pellet in 1 ml TE. Divide between two 1.5 ml microfuge tubes.
15.
Extract each tube with 250 µl phenol plus 250 µl chloroform (CHCl3; see Notes). Transfer each aqueous phase to a clean microfuge tube.
16. Back extract each organic phase with 250 µl H2O. Add the aqueous phases to the aqueous phases from Step 15.
17. Extract each aqueous phase with an equal volume of chloroform. Transfer each aqueous phase to a clean microfuge tube.
18. Repeat Step 17 until the interface between the organic and aqueous phases is removed.
19. To each of the final aqueous phases, add 1/10 volume of 2 M NaOAc pH 5.8 and an equal volume of isopropanol. Incubate at room temperature for 15 min.
20. Spin in a microfuge at room temperature for 15 min.
21. Discard the supernatant. Dry the pellets. Resuspend each pellet in 200 µl TE.
22. To each tube, add 50 µl RNase A (1 mg/ml) plus 250 units of RNase T1. Incubate at 37°C for 30 min.
23. Add 250 µl TE to each tube so that the final volume of each is ca. 500 µl. Extract each tube with organic solvents as in Steps 15-18.
24. Precipitate the plasmid DNA as in Steps 19-20.
25. Discard the supernatant. Wash the pellet with 70% ethanol. Dry the pellet.
26. Resuspend the pellet in 100 µl TE. Estimate the concentration of DNA spectrophotometrically. Electrophorese an aliquot of the DNA through an agarose gel, stain with ethidium bromide and photograph to check the integrity of the plasmid and the yield.
27. Store at 4°C.

Notes

1. Lysozyme solution consists of 10 mM EDTA, 25 mM Tris-HCl pH 8.0, 50 mM glucose, 12 mg lysozyme/ml (add lysozyme just before use or store in lysozyme solution at -20°C).
2. Throughout this protocol, unless otherwise specified, ethanol means ice-cold, 99% or absolute ethanol.
3. Throughout this protocol, chloroform (CHCl3) means a 24:1 mixture (volume:volume) of chloroform:octanol.

A.3 Removing Extensions from Restriction Fragments Using Mung Bean Nuclease (adapted from Hammond and D'Alessio, 1986).

1.
Digest DNA with desired restriction enzymes.
2. Heat kill these enzymes, or phenol extract and ethanol precipitate the digestion mixture. If the latter method is used, determine the concentration of digested DNA remaining.
3. For each 1 µg of digested DNA to be blunt-ended, use 30 units of mung bean nuclease and a total reaction volume of 50 µl with the following additional components at the final concentrations listed:
   • 30 mM NaOAc pH 5.0
   • 50 mM NaCl
   • 1 mM ZnCl2
   • 5% glycerol.
4. Recover the DNA by ethanol precipitation, or if desired, phenol extract before precipitation.

Notes

1. Less than or greater than 1 µg of DNA can be used in the blunting reaction, but the amount of enzyme and the reaction volume should be adjusted accordingly.
2. The NaOAc/NaCl/ZnCl2/glycerol reaction solution can be made up as a 5X stock and stored at -20°C.

A.4 Isolation of Total ssRNA from Leaves (adapted from Siegel et al., 1976).

1. Harvest 1-2 g of leaf tissue. Wash with distilled H2O and blot dry.
2. Grind leaves to a fine powder under liquid nitrogen, using a mortar and pestle.
3. Transfer powdered leaf to a 30 ml glass centrifuge tube. Add 3-5 volumes of 10X TNE (100 mM Tris-HCl pH 7.5, 100 mM NaCl, 10 mM EDTA) containing 0.2% SDS and 5% β-mercaptoethanol.
4. Add an equal volume of phenol:chloroform (CHCl3; see Notes) (1:1). Mix well using a pasteur pipette.
5. Centrifuge in a SS34 rotor at 8,000 rpm at 4°C for 5 min.
6. Transfer the aqueous phase to a clean centrifuge tube. Back extract the organic phase with 1/2 volume 10X TNE containing 0.2% SDS and 5% β-mercaptoethanol.
7. Centrifuge as in Step 5. Combine aqueous phases.
8. Extract aqueous phases with an equal volume of phenol:CHCl3 (1:1). Centrifuge as in Step 5 and save the aqueous phase.
9. Extract the aqueous phase with an equal volume of CHCl3. Centrifuge as in Step 5 and save the aqueous phase.
10.
To the aqueous phase, add 0.1 volumes of 2 M NaOAc pH 5.8 and 2-2.5 volumes of ethanol (see Notes).
11. Incubate at -20°C ca. 1 hour or until a flock appears (can leave overnight).
12. Centrifuge in a SS34 rotor at 8,000 rpm at 4°C for 10 min.
13. Discard the supernatant. Dry the pellet. Resuspend in 3 ml of 1X TNE.
14. Add 1 ml 8 M LiCl. Incubate on ice for 30-60 min or until a flock appears.
15. Centrifuge as in Step 12.
16. Save the supernatant and the pellet. The supernatant contains DNA, dsRNA and tRNA. The pellet contains high molecular weight ssRNA.
17. Resuspend the pellet in 500 µl H2O. Add 1/10 volume 2 M NaOAc pH 5.8 and 2-2.5 volumes of ethanol. Divide volume between 2 microfuge tubes.
18. Incubate at -70°C for 30 min.
19. Spin in a microfuge at 4°C for 15 min. Wash the pellet with 70% ethanol. Dry the pellet.
20. Resuspend the pellet in 200 µl H2O. Estimate the concentration of RNA by determining the OD260 of a small aliquot. Check the integrity and concentration of the RNA by electrophoresis.
21. Store the aqueous solution of RNA at -70°C.
22. To the supernatant from Step 16, add 1/10 volume 2 M NaOAc pH 5.8 and 2-2.5 volumes of ethanol.
23. Precipitate and recover pellet as in Steps 18-19.
24. Resuspend the pellet in 500 µl 1X TNE. Estimate the concentration and integrity of nucleic acid as in Step 20. Store at -70°C.

Notes

1. All steps should be carried out at 4°C using autoclaved reagents and containers.
2. Using more than the recommended 1-2 g of leaf tissue does not increase yield.
3. Wherever chloroform is required, use 24 parts of chloroform : 1 part octanol. Unless specified otherwise, ethanol means ice-cold 99% or absolute ethanol.
4. When isolating RNA from virus-infected leaves, RNA from mock-inoculated (uninfected) leaves should also be isolated to use as a control in subsequent experiments.

A.5 Purification of MNSV from Infected Cucumber Plants (adapted from Tremaine et al., 1983).

1.
Weigh tissue and grind in a Waring blender with 1-2 volumes (ml of buffer:g of tissue) of 0.1 M NaOAc buffer pH 5.0, 5 mM β-mercaptoethanol. Add moderate amounts of tissue and buffer to the blender at a time. Grind until the mixture forms a fine slurry.
2. Strain slurry through cheesecloth using a funnel. Squeeze as much of the green solution through the cheesecloth as possible.
3. Incubate at 4°C for 1 hr. In the low pH buffer used (pH 5.0), most plant proteins will precipitate but the virus will remain in solution.
4. Centrifuge in a GSA rotor at 10,000 rpm at 4°C for 15 min.
5. Save the supernatant. Discard the green pellet.
6. Measure the volume of the supernatant and add an amount of polyethylene glycol (PEG) equivalent to 8% of this volume, slowly and with stirring. Stir at 4°C for 1-2 hrs (can leave stirring overnight).
7. Centrifuge as in step 4.
8. Discard the clear, brown supernatant. Save the greenish-yellow pellet.
9. Resuspend the pellet, gently but thoroughly, in 0.1 M NaOAc buffer pH 5.0, using 1/10 the volume of buffer used in step 1. The pellet can be very difficult to resuspend. Cotton swabs and/or a dounce homogenizer work well.
10. Measure the total volume of the solution containing the resuspended pellet. Add 0.4542 g CsCl/ml of this solution to purify the virus through buoyant density gradient centrifugation (the buoyant density of MNSV in CsCl is 1.33-1.34 g/cm3).
11. Centrifuge at 35,000 rpm for 15 hrs at 20°C in a Ti50.2 rotor (or under equivalent conditions).
12. After centrifugation, any green particulates will have formed a solid mat near the top of the tube. The rest of the solution should be a clear, golden color with the virus forming a white band near the centre of the tube. A fainter white band may be visible near the top of the tube which may contain empty MNSV spheres.
13.
Remove the virus band, being careful not to contaminate it with any of the green material near the top of the tube.
14. Dialyze the virus overnight at 4°C using at least 1 l of buffer and changing this buffer at least once. The buffer chosen depends on how the virus is to be used. Use 1X phosphate buffered saline (PBS) if the virus is to be used as an immunogen. Use 0.01 M NaOAc buffer pH 5.0, 0.1 M NaCl if the virus is to be used to prepare RNA or is to be stored for long periods of time.
15. Quantify the virus spectrophotometrically and store at 4°C or -20°C.

Notes

1. Virus should be purified from infected tissue as soon as tissue is harvested. Storing tissue before purification significantly reduces yield.

Appendix B

Nucleotide Sequence Data

This appendix shows a "shotgun handler" alignment of all the useable sequence data generated during this project. It also includes sequence data for the coat protein gene (nucleotides 2815-3987, inclusive) determined previously and concurrently with this project [Pot, 1987; Riviere et al., 1989]. The alignment indicates the data obtained each time a particular subclone was sequenced, the direction in which each subclone was sequenced (Sense), the number of times each nucleotide was determined and the consensus sequence. The sequence is written as DNA. Ambiguities are indicated using the Staden Uncertainty Code [Staden, 1982b] outlined below. Dashes indicate either a nucleotide which was too ambiguous to be read (see Staden Uncertainty Code) or a padding character inserted to optimize the alignment.

Symbol   Meaning
1        probably C
2        probably T
3        probably A
4        probably G
D        probably C, possibly CC
V        probably T, possibly TT
B        probably A, possibly AA
H        probably G, possibly GG
K        probably C, possibly C-
L        probably T, possibly T-
M        probably A, possibly A-
N        probably G, possibly G-
R        A or G
Y        C or T
5        A or C
6        G or T
7        A or T
8        G or C
-        A or C or G or T

Staden Uncertainty Code
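As a rough illustration of how the uncertainty symbols in the table above map onto bases, the following Python sketch replaces each "probably X" symbol with X and marks positions that have no single preferred base. The function name decode_read, the "?" placeholder and the sample read are assumptions made for this illustration only. They are not part of the shotgun handler program used in this work, and the sketch ignores the "possibly" alternatives (a doubled base or a padding character).

```python
# Symbols that name a single most likely base, transcribed from the
# Staden Uncertainty Code table above.
PROBABLE = {
    "1": "C", "2": "T", "3": "A", "4": "G",
    "D": "C", "V": "T", "B": "A", "H": "G",
    "K": "C", "L": "T", "M": "A", "N": "G",
}

# Symbols that give no single preferred base (two-way ambiguities and "-").
UNRESOLVED = {"R", "Y", "5", "6", "7", "8", "-"}


def decode_read(read, placeholder="?"):
    """Return the most likely base at each position of a gel read.

    Plain A/C/G/T are kept, "probably X" symbols become X, and symbols
    with no single preferred base become `placeholder` (an assumption
    of this sketch, not a convention of the original program).
    """
    decoded = []
    for symbol in read:
        if symbol in ("A", "C", "G", "T"):
            decoded.append(symbol)
        elif symbol in PROBABLE:
            decoded.append(PROBABLE[symbol])
        elif symbol in UNRESOLVED:
            decoded.append(placeholder)
        else:
            raise ValueError(f"unknown symbol in read: {symbol!r}")
    return "".join(decoded)


if __name__ == "__main__":
    # Hypothetical short read, not taken from the alignment.
    print(decode_read("4A2TAC1G"))
```

A decoded read of this kind is only a per-read best guess; the consensus in the alignment is built from all reads covering a position, which this sketch does not attempt.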
Nucleotide Sequence Data. 81 Clone Archive Gel Sense Name No. No. Sense 10 20 30 40 50 Ol igol 117 101 — -4A2TACTCTAGC1G4ATCC11GACTCTCTTATTTCCTTAAGTTAGTTCG Oligol 132 115 — -4A2TACTCTAGC1G4ATDCCCGACTCTCTTATTTCCTTAAGTTAGTTCG C O N S E N S U S -GATTACTCTAGCCGGATCCCCGACTCTCTTATTTCCTTAAGTTAGTTCG 60 70 80 90 100 Ol igol 117 101 — TGTATTGATTATCTGTCTTGATCAGTATAGGTTAGCAATGGATACTGGTT Oligol 132 115 — TGTATTGATTATCTGTCTTGATCAGTATAGGTTAGCAATGGATACTGGTT Oligol 131 114 — ACTGGT2 C O N S E N S U S TGTATTGATTATCTGTCTTGATCAGTATAGGTTAGCAATGGATACTGGTT 110 120 130 140 150 Ol igol 117 101 — TGAAATTTCTTGTLTCTGGGGGTTTAGCCACCTCATCTGTTATTAGG33A Oligol 132 115 — TGAAATTTCTTGTTTCTGGGGGTTTAGCCACCTCATCTGTTATTAGG33A Oligol 131 114 — TGBAATTTCTTGTTLCTGGGGGTTTAGCKACCT8ATCTGTTATTAGGAAA C O N S E N S U S TGAAATTTCTTGTTTCTGGGGGTTTAGCCACCTCATCTGTTATTAGGAAA 160 170 180 190 200 Ol igol 117 101 — GTGAGTGCTG24342TCATTG4AT214-CCC Oligol 132 115 — GTGAGLGCTGTGAGTTCATTGGATTCGTCCCTTCCTTC Oligol 131 114 — GTGAGTGCTGTGAGTTCATTGGATT1NLCCCTTCCTTCMLC 11BGRNGI 109 96 — 1CCTTCCTTCATCATCTATATT A P 4 S K G I 104 91 + 1CCTTCCTTCATCATCTATATT 9 x l B R P N 5 7 — CCTTCCTTCATCATCTATATT l l x B R P N 3 5 — TTCCTTCATCATCTATATT C O N S E N S U S GTGAGTGCTGTGAGTTCATTGGATTCGTCCCTTCCTTCATCATCTATATT 210 220 230 240 250 11BGRNGI 109 96 — ATCTGCCATCCATGGGTCTTGGACTAGTGCTATCAGCCACGATTGTAGTA A P 4 S K G I 104 91 + ATCTGCCATCCATGGGTCTTGGACTAGTGCTATCAGCCACGATTGTAGTA 9 x l B R P N 5 7 — ATCTGCCATCCATGG4-CTTGGACTA4TGCTATCAGCC3C4ATT4TA4T3 l l x B R P N 3 5 — ATCTGCCATCCATGG4-CTTGGACTAGTGCTATCAGCCACGATTGTAGTA C O N S E N S U S ATCTGCCATCCATGGGTCTTGGACTAGTGCTATCAGCCACGATTGTAGTA 260 270 280 290 300 11BGRNGI 109 96 — AGATTGCCAAGGTTGC1GCCATAGTTGGGATTGGTTATCTTGGGGTTAGG A P 4 S K G I 104 91 + AGATTGCCAAGGTT4CCGCCATAGTTGGGATTGGTTATCTTGGGGTTAGG 9 x l B R P N 5 7 — 343TTGCC33G4TTGCCGCCATA l l x B R P N 3 5 — AGATTGCCAAGGTTGCCGCCATAGTTGGG3TTGGTTATCTTGGG4-TAGG 821BDRPN 4 6 — TAG4 C O N S E N S U S AGATTGCCAAGGTTGCCGCCATAGTTGGGATTGGTTATCTTGGGGTTAGG 
310 320 330 340 350 11BGRNGI 109 96 — ATTGGTGCCGCTTGGTGCCGCCGTACTCCCGGAATAACGAATTCCATAAT A P 4 S K G I 104 91 + ATTGGT4CCGCTTGGTGCCGCCGTACTCCCGGAATAACGAATTCCATAAT l l x B R P N 3 5 — ATTGGTGCCGCTTGGTGCCGCCGTACTCCCGGAATAACGAATTCCATAAT 821BDRPN 4 6 — A2TH-VG1CGC2TH-VGCCGCCGTACTCCCGGAATAACGAATTCCATAAT E3RP35S 45 32 + G33TTC1ATAAT E 3 R P 47 30 + GAATTICATAAT E3T3 1 31 + AATTCCATAAT C O N S E N S U S ATTGGTGCCGCTTGGTGCCGCCGTACTCCCGGAATAACGAATTCCATAAT 360 370 380 390 400 11BGRNGI 109 96 — CACCTATGGGG A P 4 S K G I 104 91 + CACCTAT44GGAAGAA-2GGTTGAGCA3GTGA3GGTAGATATTGATGAAG l l x B R P N 3 5 — CACCTATGGGG 821BDRPN 4 6 — CACCTATG4G43AGAAGTGGTTGAGCAA4TGAAGGTAGATATTGATGAAG E3RP35S 45 32 + CACCT5TGGGGA3GAAGT44TTGBGCAAGTGAAGGTAGATATT4ATGAAG E 3 R P 47 30 + CACCT5TGGGGAAGAAGTGGTTGAGCAAGTGAAGGTAGATATTGATGAAG E3T3 1 31 + CA1CTATGGGGAAGAARTGGTTGAG133GTGAAGGTAGATATTGATGAAG C O N S E N S U S CACCTATGGGGAAGAAGTGGTTGAGCAAGTGAAGGTAGATATTGATGAAG Appendix B. Nucleotide Sequence Data 82 410 420 430 440 450 A P 4 S K G I 104 91 + ATGCTGA3GAG4AGTC 8 2 1 B D R P N 4 6 — ATGCTGAAGAGGAGTCCGATATTGGTGAGGAAATTGTGGTTGGTACGATA E3RP35S 45 32 + ATGC24AAGA4GAGTCCGATATTGGTGAGGAAATTGTGGTTGGTACG3TA E 3 R P 47 30 + ATGCTGAAGAGGAGTCCGATATTGGTGAGGAAATTGTGGTTGGTACGATA E3T3 1 31 + ATGCTGAAGAGHAGTC14ATATTGGTGAGGAAATTGTGGTTGGTACGATA 821BT3N 6 8 — A4AG4AGTCCGATA2TG4TGAG4-AA2T4TGG2TGGTACGATA 10x2T3N 7 9 — TTGTGGTTGGT3CGATA C O N S E N S U S ATGCTGAAGAGGAGTCCGATATTGGTGAGGAAATTGTGGTTGGTACGATA 460 470 480 490 500 821BDRPN 4 6 — GGTATTGGTATACACACAAACGTCAACCCTGAAGTTCGANKTAAGCGCAG E3RP35S 45 32 + GGTATTGGTATACADACA E 3 R P 47 30 + GGTATTGGTATACACACA33CGTDA31CCTGA3GTTCGAGCTAAGCGCAG E3T3 1 31 + GGTATTGGTATACACACA33CGTCA3CCCT4AAGTTCGAGCTAAGCGCAG 821BT3N 6 8 — G4TA-VG4TATACRCACAAACGTCAACCCTGAAGTTCGAGCTAAGCGCAG 10x2T3N 7 9 — GGTATTGGTATACACACAAACGTCAACCCTGAAGTTCGANKTAAGCGCAG C O N S E N S U S GGTATTGGTATACACACAAACGTCAACCCTGAAGTTCGAGCTAAGCGCAG 510 520 530 540 550 821BDRPN 4 6 — 
ACATAGATCGAGGCCATTCATCAAGAAGATCGTGAATTTAACGAAGAATC E 3 R P 47 30 + ACATAGATCG3GG1CATTCATCA3G33G3TCGTGA3T E3T3 1 31 + ACATAGATCGAGGC1ATTCATCA3GA3GATDGT4A3TTTA3CG 821BT3N 6 8 — RCATAGATCGAGGCCA2TCATCAAGAAGA7CGTGAATTTAACGAAGAATC 10x2T3N 7 9 — ACATAGATCGAGGCCATTCATCAAGAAGATCGTGAATTTAACGAAGAATC 3x6CRPGI 100 88 + TGAATTTAACGAAGAATC 3x6ARP 8 1 + TGAATTTAACGAAGAATC 3x6ARP2 56 48 + GAATTTAACGAAGAATC C O N S E N S U S ACATAGATCGAGGCCATTCATCAAGAAGATCGTGAATTTAACGAAGAATC 560 570 580 590 600 821BDRPN 4 6 — ACTTCGGTGGATGCCCCGACTCTAGTAAATCG 821BT3N 6 8 — ACTTCG4TG4ATG 10x2T3N 7 9 — ACTTCGGTGGATGCCCCGACTCTAGTAAATCGAACGTCATGG 3x6CRPGI 100 88 + ACTTCGGTGGATGCCCCGACTCTAGTAAATCGAACGTCATGGCTGTAAGT 3x6ARP 8 1 + ACTTCGGTGGAT4CCCCGACTCTAGTAAATCGAACGTCATGGCTGYAAGT 3x6ARP2 56 48 + ACTTCGGTGGAT4CCCCGACTCTAGTAAATCGAACGTCATGGCTNTAAGT 7 x 6 B R P N 12 10 — C4ACTCTAGTAAATCGAACGTCATG4CTGTAAGT l x 2 R P 85 78 + TAAGT C O N S E N S U S ACTTCGGTGGATGCCCCGACTCTAGTAAATCGAACGTCATGGCTGTAAGT 610 620 630 640 650 3x6CRPGI 100 88 + AAATTCGTTTATGAACAATGTAAACAGCACAATTGTCTTCCACATCAAAC 3x6ARP 8 1 + AAATTCGTTTATGAACAATGTAAACAGCACAATTGTCTTCCACATCAAAC 3x6ARP2 56 48 + AAATTCGTTTATGAACAATGTAAACAGCACAATTGTCTTCCACATCAAAC 7 x 6 B R P N 12 10 — AAATTCGTTTATGAACAATGTAAACAGCACAATTGTCTTCCACATCAAAC l x 2 R P 85 78 + AAATTCGTTTAT4AA1A37GTAAACAGCAC3ATTGTCT7CCACATCAAAC l x 6 B R P H 9 2 + CAGCACAATTGTCTTCCACATCAAAC ES4BT7N 13 11 — C2TCCACATCA3AC C O N S E N S U S AAATTCGTTTATGAACAATGTAAACAGCACAATTGTCTTCCACATCAAAC 660 670 680 690 700 3x6CRPGI 100 88 + GAGATTGATCATGAGTATTGCAGTTCCATTGGTGTTAAGTCCCGACATGT 3x6ARP 8 1 + GAGATTGATCATGAGTATTGCAGTTCCATTGGTGTTAAGTCCCGACATGT 3x6ARP2 56 48 + GAGATTGATCATGAGTATTGCAGTTCCATTGGTGTTAAGTCCCGACATGT 7 x 6 B R P N 12 10 — GAGA2TGATCATGAGTATTGCAGTTCCATTGGTGTTAAGTCCCGACATGT l x 2 R P 85 78 + GAGATTGATCATGAGTATTRCAGTYCCA2TGGTGTTA3GTCCCG3CATGT l x 6 B R P H 9 2 + GAGATTGATCATGAGTATTGCAGTTCCATTGGTGTTAAGTCCCGACATGT ES4BT7N 13 11 — GAGAV-GATCAT4AGTA2TGCAG2TCCA2TG4TGTTAA4TCCCGACATGT lx6T3 10 3 + 
GATCATGAGTATTGCAGTTCCATTGGTGTTAAG-CCCGACATGT C O N S E N S U S GAGATTGATCATGAGTATTGCAGTTCCATTGGTGTTAAGTCCCGACATGT Appendix B. Nucleotide Sequence Data 83 710 720 730 740 750 3x6CRPGI 100 88 + ACGACATTTCCAGCAAAGCTCTGCTAAACAGCGAGATATTGACAGAAABC 3x6ARP 8 1 + ACGAC3TTTCCAGCAAAG1TCTGCTAAACAGCGAGATATTGACA4BAA-C 3x6ARP2 56 48 + ACGACATTTCCAGCAAAGCTCTGCTAAACA4CGAGATATTGACAG33AAC 7x6BRPN 12 10 — AKNACATTTCCANKAAANKTCTGCTAAACAGCGAGATATTGA lx2RP 85 78 + ACG3CATTT1CAGCA33GCTCTGCTA33CA lx6BRPH 9 2 + ACGACATTTCCAGCAAAGKTCTGCTAAACAGCGAGATATTGACAGAAAAC ES4BT7N 13 11 — ACGACATTTCCANKAAAHKTCTGCTAAACAGCGAGATATTGACAGAAAAC lx6T3 10 3 + ACGACATTTCCAGCAAAGCTCTGCTAAACAGCGAGATATTGACAGAAAAC CONSENSUS ACGACATTTCCAGCAAAGCTCTGCTAAACAGCGAGATATTGACAGAAAAC 760 770 780 790 800 3x6CRPGI 100 88 + AGAG1CACGCTGGACCGCCTCAAABCTCTCGACGGHTGGCTA 3x6ARP 8 1 + AGAGD-ACGCTGGA1CG1CTCAAB-C7CTCGAD4GGTH-CTA3CACACTT 3x6ARP2 56 48 + AGAGC lx6BRPH 9 2 + AGA4CCACGCTGGACCGCCTCAAAACTCTCGACGGGTGGCTAACACACTT ES4BT7N 13 11 — AGAGCCACGC2GGACCGCCTCAAAACTCTCGACGGGTGGCTAACACACTT lx6T3 10 3 + AGAGCCA1G1TGGA1CG1CT1AA331T-T-GA1GGGTGGCTA31A-A 2x2RP 11 4 + CTAACACACTT CONSENSUS AGAGCCACGCTGGACCGCCTCAAAACTCTCGACGGGTGGCTAACACACTT 810 820 830 840 850 3x6ARP 8 1 + GGTGTG lx6BRPH 9 2 + GGTGTGCCA1CCCCTTAGCGCGAAGNKTTGGAGGCGGGCAATTGACAACT ES4BT7N 13 11 — -GTGTGCCACCCCCTTAGCGCGAAGGCTTGGAGGCGGGCAATTGACAACT 2x2RP 11 4 + GGTGTGCC311CCCTTAGCGCGAAGNKTTGGAGGCGGGC7ATTGACAACT CONSENSUS GGTGTGCCACCCCCTTAGCGCGAAGGCTTGGAGGCGGGCAATTGACAACT 860 870 880 890 900 lx6BRPH 9 2 + TGTGTGGTCTTCCAGATTGGAAGGCTTTCAAGTTGGTCAACTAGGG4TGC ES4BT7N 13 11 — TGTGTGGTCTTCCAGATTGGAAGGCTTTCAAGTTGGTCA 2x2RP 11 4 + TGTGTGGTCTTCCAGATTGGAAGGCTTTCAAGTTGGTCAACTAGGGGTGC 5x4DRPN 14 33 — TG4TC2TC1AGA2TG4AAG4C22TCAAG2TG4TCAACTAGGG4-GC 3xlRPVN2 21 40 + C CONSENSUS TGTGTGGTCTTCCAGATTGGAAGGCTTTCAAGTTGGTCAACTAGGGGTGC 910 920 930 940 950 lx6BRPH 9 2 + CTGGAH-AGCTCGCT4GGV-CTGTBCT2CG4TACG4AGA4GG 2x2RP 11 4 + CTGGAGGAGCTCGCTGGGTTCTGTACTTCGGTAYGGAGAGGGAC7C71CC 5x4DRPN 14 33 — 
CTGGAGGAGCTCGCTGGGTTCTGTACTTCGGTACGGAGAGGGACACACCC 3xlRPVN2 21 40 + CTGGAGGAGCTCGCTGGGTTCTGTACTTCGGTACGGAGAGGGACACACCC SEA1MGI 126 108 + GTTCTGTACTTCGGTACGGAGAGGGACACACCC 823BDRPN 57 49 — TACG4AGAH-GACACACCC SEBT7GI 113 99 + GACACACCC 823ERNGI 92 82 — A1ACACC1 CONSENSUS CTGGAGGAGCTCGCTGGGTTCTGTACTTCGGTACGGAGAGGGACACACCC 960 970 980 990 1000 2x2RP 11 4 + AGACATGACCGAG2TTCCTCAGG7TCGT1CC3TTAAGACAT1G 5x4DRPN 14 33 — AGACATGACCGAGTTTCCTCAGGATCGTCCCATTAAGACACGCAAACTGT 3xlRPVN2 21 40 + AGACATGACCGANTTTCCTCAGGATCGTCCCATTAAGACACGCAAACTGT SEA1MGI 126 108 + AGACATGACCGAGTTTCCTCAGGATCGTCCCATTAAGAC3C4CAAACTGT 823BDRPN 57 49 — AGACATGACCGAGTTTCCTCAGGATCGTCCCATTAAGACACGCAAACTGT SEBT7GI 113 99 + AGACATGACC4AGTTTCCTCAGGATC-TCCCATTAAGACAC4CAAACTGT 823ERNGI 92 82 — AGACATGACCGAGTTTCCTCAGGATCGTCC13TTAAGAC3-G-AAACTGT 5x4T3N 15 34 — 22TCCTCAG4ATC4TCCCATTAAGACAC4C-33CT-T CONSENSUS AGACATGACCGAGTTTCCTCAGGATCGTCCCATTAAGACACGCAAACTGT Appendix B. Nucleotide Sequence Data 84 1010 1020 1030 1040 1050 5 x 4 D R P N 14 33 — A-TGTTTAGGGG43GTTGGAACTAGCGTGAAGTTCAACGTGCACAATAAC 3 x l R P V N 2 21 40 + ATTGTTTAGGGGGAGTTGGAACTAGC-TGAAGTTCAACGTGCACA3TAAC S E A 1 M G I 126 108 + ATTGTTTAGGGGGAGTTGGAACTAGCGTGAAGTTCAAC-TGCACAATAAC 823BDRPN 57 49 — A2TGTTTAGGGG-3GTTGGAACTAGCGTGAAGTTCAACGTGCACAATAAC S E B T 7 G I 113 99 + ATTGTTTAG4GGGAGTTGGAACTAGCGTGAAGTTCAAC8TGCACAATAAC 823ERNGI 92 82 — ATTGTTTAGG444AGTTGGAACTAG1GTGAAGTT1AACGTGCA1AATAAC 5x4T3N 15 34 — ATT-TTTAGGGG—42TG4AACTAGCGTGAAG221AAC4TGCACAATAAC 823BT3N 16 12 — TGTTTAGGGH-34TTG43ACTAGCGTG3AG221AACGTGCACAATAAC 2x3RPN 17 13 — C3ACGTGCACAAT3AC C O N S E N S U S ATTGTTTAGGGGGAGTTGGAACTAGCGTGAAGTTCAACGTGCACAATAAC 1060 1070 1080 1090 1100 5 x 4 D R P N 14 33 — TCTCTAGCTAAC-TTCGGCGCGGTCTAGTTGAGCGCGTTTTCTTTG2TGA 3 x l R P V N 2 21 40 + TCTCTAGCTAACCTTC S E A 1 M G I 126 108 + TCTCTAGCTAACCTTCG4C-C-GTCTAGTTGAGC-CGTTTTCTTTGTTGA 823BDRPN 57 49 — TCTCTAGCTAAC-TTCGGCGCGGTCTAGTTGAGCGCGTTTTCTTTGV-GA S E B T 7 G I 113 99 + 
TCTCTAGCTAACCTTCG-C-1GGTCTAGTTGAGC4CGTTTTCTTTGTTGA 823ERNGI 92 82 — TCTCTAGCTAAC1TTCGG-GCGGTCTAGTTGAGCG-GTTTTCTTTGTTGA 5x4T3N 15 34 — TCTCTAGCTA5C-TTCG4CGCG4TCTA4TT4A4C4C 823BT3N 16 12 — TCTCTAGCTA5C-TTC4GCGCGGTCTAGTTGAGCGCGTTTTCTTTGTTGA 2x3RPN 17 13 — TCTCTAGCTAAD-2TCG4CGCG4TCTAG2TGAGCGCG2TTTC22TG2TGA C O N S E N S U S TCTCTAGCTAACCTTCGGCGCGGTCTAGTTGAGCGCGTTTTCTTTGTTGA 1110 1120 1130 1140 1150 5x4DRPN 14 33 — AAATGATAAGAAGGAACTGGAGCCTGCCCCTAAACCT S E A 1 M G I 126 108 + AAATGATAAGAAGGAACTGGAG1CTGCCCCTAAACCTCTTAGTGG24CGT 823BDRPN 57 49 — AAATGATAAGAAGGAACTGGAGCCTGCCCCTAAACCTCTTAGTGGTGCGT S E B T 7 G I 113 99 + AAATGATAAGAAGGAACTGGA41CTGCCCCTAAACCTCTTAGTGG2-DGT 823ERNGI 92 82 — AAATGATAAGAAGGAACTGGAGCCTGCCCCTAAACCTCTTAGTGGTG1GT 823BT3N 16 12 — AAATGATAAGAAGGAACTGGAGCCTGCC11TAAACCTCTTAGTGGTGC 2x3RPN 17 13 — AAATGATAAGAAGGAACTGGAGCCTGCCCCTAAACCTCTTAGTGGTGCGT 4x5RP 46 39 + CCCTAAACCTCTTAGT4GTRCG7 4x5RP2 25 16 + CCCTAAACCTCTTAGTGGTGCHT 4x3ARPHI 23 35 + CCCTAAACCTCTTAGT4GTRCNT 4 x 3 A R P L O 22 15 + CCCTAAACCTCTTAGTGGTGCGT 4x5DT3 54 50 + CGT C O N S E N S U S AAATGATAAGAAGGAACTGGAGCCTGCCCCTAAACCTCTTAGTGGTGCGT 1160 1170 1180 1190 1200 S E A 1 M G I 126 108 + TTGATCG1TTA3CT2GG 823BDRPN 57 49 — TTGATCGCTTAACTTGG S E B T 7 G I 113 99 + TTGAT141TTAACTTGG2TT1GL-GGAA3CTC-ATAGTATTGTGGGTACT 823ERNGI 92 82 — TTGATCGCTTAACTTGG 2x3RPN 17 13 — 2TGATCGCTTAACTTGGTTTCGTCGGAAACTCCATAGTATTGTGGG2ACT 4x5RP 46 39 + TTGATCRCT7A3 GGTT75G71GGAAACTCCATAGTAT-GTGGGLNAC 4x5RP2 25 16 + TTGATK41TTA3-22GGTLTCG2CGGAAACTCCATAGTALTGTGGGTACT 4x3ARPHI 23 35 + TTGATKNKT2A31TTGGTLTCNT1G4AAACTCCATAGTALTGTGGGLACT 4 x 3 A R P L O 22 15 + TTGATKNKTTA3 GGTTTCGTCGGAAACTCCATAGTALTGTGGGLACT 4x5DT3 54 50 + TTGATCGCTTAACTTGGTTTCGTCGGAAACTCCATAGTATTGTGGGTACT 7x7DRP2FN 18 14 — CGTCG4AA3CTCCATAGTA2TGTGG4TACT 616A3NGI 99 87 — GGTACT 6 x l 6 R P N 48 41 — CT C O N S E N S U S TTGATCGCTTAACTTGGTTTCGTCGGAAACTCCATAGTATTGTGGGTACT Appendix B. 
Nucleotide Sequence Data 85 1210 1220 1230 1240 1250 S E B T 7 G I 113 99 + CATTD-AGTATTAGTD-A4GT 2x3RPN 17 13 — CATTCCAGTATTAGTCCAGGTCAGTTCTTGGACTTCTATACTGGCAGGAG 4x5RP 46 39 + 5ATTCCAGTATTA42CCARG2CA47T5T7GGACTTCTATB1TGGCAGGAG 4x5RP2 25 16 + 5ATTCCAGTATTAGTCCAGGTCAGTTCTTGGACTTCTATACTGGCAGGAG 4x3ARPHI 23 35 + KATTCCANTATT342CCMKGTCAGTTCT2GGACTTCTALA1TGGCAGGAG 4 x 3 A R P L O 22 15 + 5ATTCCAGTATTAGTCCAGGTCAGTTCLTGGACTTCTATA1TGGCAGGAG 4x5DT3 54 50 + CATTCCAGTATTAGTCCAGGTCAGTTCTTGGACTTCTATACTGGCAGGAG 7x7DRP2FN 18 14 — CATTCCA4TATTAGTCCAGGTCAGTTCTTGGACTTCTATACTGGCAGGAG 616A3NGI 99 87 — CA2T1CAGTA2TAGTC1AG4TCAG2TC2TG4AC2TCTATACTG4CAG4AG 6xl6RPN 48 41 — CAT-D-AGTA2TAG2D-AH-TCAG2TC2TG4AC2TCTATACTG4CAG4AG C O N S E N S U S CATTCCAGTATTAGTCCAGGTCAGTTCTTGGACTTCTATACTGGCAGGAG 1260 1270 1280 1290 1300 2x3RPN 17 13 — GCGCACGAT2TATGAAGGTGCTGTGAAATCGTTGGAGGGG 4x5RP 46 39 + GCGCA143TTTATGAAGGTGCTGTGAAATCG 4x5RP2 25 16 + GCGCAC432TTATGAAGGTGCTGTGAAATCGTTGGAG4GGTTAAGTGTTC 4x3ARPHI 23 35 + GCGCAC-3TT2A2GAAG4TGCTGT8AAAT1G7TGGAGGG87TAAGYGTT1 4 x 3 A R P L O 22 15 + GCGCAC—TTTATGAAGGTGCTGTGAAAT-G2TGGAGGG4TTAAG2GTTC 4x5DT3 54 50 + GCGCACGATTTATGAAGGTGCTGTGAAATCGTTGGAGGGGTTAAGTGTTC 7x7DRP2FN 18 14 — GCGCACGATTTATGAAGGTGCTGTGAAATCGTTGGAGGGG2TAA-LGTTC 616A3NGI 99 87 — GCGC3CGATTTATGAAGGTGCTGTGAAATCGTTGGAGGGGTTAAGTGTTC 6 x l 6 R P N 48 41 — 4CGCACGA22TATGAAG4TGCTGTGAA32CG—44A444G-TAAGTG2TC C O N S E N S U S GCGCACGATTTATGAAGGTGCTGTGAAATCGTTGGAGGGGTTAAGTGTTC 1310 1320 1330 1340 1350 4x5RP2 25 16 + A3CGAAGGGATG1CTATCTGAAAACG 4x3ARPHI 23 35 + 33C4A3NGGATGCCTATCTGAAAACRTTTGTTAAAGC 4 x 3 A R P L O 22 15 + AACGABGGGATGCCTATCTGA3AACGTTTGTTAAAGCGGAGAAGATTAA2 4x5DT3 54 50 + AACGAAGGGATGCCTATCTGAAAACGTTTGTTAAAGCGGAGAAGATTAAT 7x7DRP2FN 18 14 — AACGAAGGGATGCCTATCTGAAAACGTTTGTTAAAGCGGAGAAGATTAAT 616A3NGI 99 87 — B-143AGGGATGCCTATCTGAAAACGTTTGTTAAAGCGGAGAAGATTAAT 6xl6RPN 48 41 — AACGAAGG4ATGCCTATCTGAAAACGTTTGTTAAAGCGGAGAAGATTAAT C O N S E N S U S AACGAAGGGATGCCTATCTGAAAACGTTTGTTAAAGCGGAGAAGATTAAT 1360 
1370 1380 1390 1400 4 x 3 A R P L O 22 15 + A1CADTAAGA331CTG31CCAGCT1CG 4x5DT3 54 50 + ACCACTAAGAAACCTGACCCAGCTCCNKGG-TTATACA31CGAGGAACGT 7x7DRP2FN 18 14 — ACCACTAAGAAACCTGACCCANKTCCGCGGGLT 616A3NGI 99 87 — ACCACTAAGAAACCTGACCCAGCTCCGCGGGTTATACAACCGAGGAACGT 6xl6RPN 48 41 — ACCACTAAGAAACCTGAD-CAKNTCCGCGGGTTATACAACCGAGGAACGT l x 7 A R P 2 F 49 42 + ACTAAGAAACCTGACCCAGCTCCGCGGGTTATACAACCGAGGAACGT lx7ARP 35 26 + CMNCTCCNKGG-TTATACAACCGAGGAACGT C O N S E N S U S ACCACTAAGAAACCTGACCCAGCTCCGCGGGTTATACAACCGAGGAACGT 1410 1420 1430 1440 1450 4x5DT3 54 50 + AAGATACA3CG2TGAGG2T4GTCG2TATCTACGTAGG22TGAGCA2TA1C 616A3NGI 99 87 — AAGATACAACGTTGAGGTTGGTCGTTATCTACGTAGGTTTGAGCATTACC 6xl6RPN 48 41 — AAGATACAACGTTGAGGTTGGTCGTTATCTACGTAGGTTTGAGCALLMK-l x 7 A R P 2 F 49 42 + AAGATACAACGTTGAGGTTGGTCGTTATCTACGTAGGTTTGAGCATTACC l x 7 A R P 35 26 + AAGATACAACGTTGAGGTTGGTCGTTATCTACGTAGGTT2GAGCATTACC C O N S E N S U S AAGATACAACGTTGAGGTTGGTCGTTATCTACGTAGGTTTGAGCATTACC 1460 1470 1480 1490 1500 4x5DT3 54 50 + TCTATCGA4G 616A3NGI 99 87 — TCTATCGAGGAATTGACGAAATCTGGAATGGCCCCACCATAATAAAAGGA 6xl6RPN 48 41 — LCTATCGAGGAATTGACGAAATCTGGAATGGCCCCACCATAATAAAAGGA l x 7 A R P 2 F 49 42 + TCTATCGAGGAATTGACGAAATCTGGAATG4CCCCACCATAATAAAAGGA l x 7 A R P 35 26 + TCTATCGAGGAATTGACGAAATCTGGAATG-CCCCACCATAATAAAAGGA 5 x 3 E R P N 26 17 AG4A C O N S E N S U S TCTATCGAGGAATTGACGAAATCTGGAATGGCCCCACCATAATAAAAGGA Appendix B. 
Nucleotide Sequence Data 86 1510 1520 1530 1540 1550 616A3NGI 99 87 — T 6 x l 6 R P N 48 41 — TACACTGTCGA l x 7 A R P 2 F 49 42 + TACA1TGTCGA4CAAATTGGGAAAATCG1CCGTGACGCATGGGACT1CTT l x 7 A R P 35 26 + TACACTGTC43GCAAATTGGGAAAATC4CCCGTGACGCATGGGACTCCTT 5 x 3 E R P N 26 17 — TACACTGTCGAGCAAA2TGG4AAAATCGCCCGTGACGCATGG4ACTCCTT C O N S E N S U S TACACTGTCGAGCAAATTGGGAAAATCGCCCGTGACGCATGGGACTCCTT 1560 1570 1580 1590 1600 l x 7 A R P 2 F 49 42 + CGTTAGTCCTGTAGCB-TCG4AT22GACATGA333G42TCGACCAACATG l x 7 A R P 35 26 + CGTTAGTCCTGTAGCAATCGGATTTGACATGAAAAGGTTCGACCAACATG 5 x 3 E R P N 26 17 — CGTTAGTCCTGTAGCAATCGGATTTGACATGAAAAGGTTCGACCAACATG 5x3T3N 27 18 — TGACATGAAAAGG2TCGACCAACAT4 C O N S E N S U S CGTTAGTCCTGTAGCAATCGGATTTGACATGAAAAGGTTCGACCAACATG 1610 1620 1630 1640 1650 l x 7 A R P 2 F 49 42 + TAT1CT1CGACG1TCT2A33TG44B-CATAGTGT22ATCT2GACGKTTTV l x 7 A R P 35 26 + TATCCT1CGACGCTCTTAAAT4GGA3CATAGT 5 x 3 E R P N 26 17 — TATCCTCCGACGCTCTTAAATGGGAACATAGTGTTTATCTTGACGCTTTT 5x3T3N 27 18 — TATCCTCCGACGCTCTTAAATGGGAACATAGTGTTTATC2TGACGCTTTT C O N S E N S U S TATCCTCCGACGCTCTTAAATGGGAACATAGTGTTTATCTTGACGCTTTT 1660 1670 1680 1690 1700 l x 7 A R P 2 F 49 42 + -G1CACGACTC 5 x 3 E R P N 26 17 — TGCCACGACTCATATCTTGCAGAATTGTTGAAGTGGCAATTAGTTAATAA 5x3T3N 27 18 — TGCCACGAl-CATATCTTGCAGAATTGTTGAAGTGGCAATTAGTTAATAA 4x2RP2 38 28 + CGACTCATATCTTGCAGAATTGTTGAAGTGGCAATTAGTTAATAA 2x3RP 39 29 + ATATCTTGCAGAATTGT2GAAGT4GCAATTAGTTAATAA 4x2RP 37 27 + GAATTGTTGAAGTGGCAATTAGTTAATAA 5 x l D 3 N G I 107 94 — 2TAG2TAATAA C O N S E N S U S TGCCACGACTCATATCTTGCAGAATTGTTGAAGTGGCAATTAGTTAATAA 1710 1720 1730 1740 1750 5 x 3 E R P N 26 17 — GGG2GTTGGGTATGCTAGTGATGGAATGATTAAATATAAGGTTGATGGGT 5x3T3N 27 18 — GGGTGTTGGG2ATGCTAGTGATGGAATGATTAAATATAAGGTTGATGGGT 4x2RP2 38 28 + GGGTGTTGGGTATGCTAGTGATGGAATGATTAAATATAAGGTTGATGGGT 2x3RP 39 29 + GGG24TT4GGTATGCTAGTGATGGAATGATTAAATATAAGGTTGATGGGT 4x2RP 37 27 + GGGTGTTGGGTATGCTAGTGATGGAATGATTAAATATAAGGTTGATGGGT 5 x l D 3 N G I 107 94 — 
H-GTGTTGG-TATGCTAGTGATGGAATGATTAAATATAAGGTTGATGGGT
CONSENSUS GGGTGTTGGGTATGCTAGTGATGGAATGATTAAATATAAGGTTGATGGGT

1760 1770 1780 1790 1800
5x3ERPN 26 17 - GCCGGATGAGTGGTGACATGAAT
5x3T3N 27 18 - GCCGGATGAGTGGTGA1AL4AATA-AGCTA
4x2RP2 38 28 + GCCGGATGAGTNGTGACATGAATACAGCTATGGGTAACTRTTTGATTGCC
2x3RP 39 29 + GCCGGATGAGTGGTGACATGAATACAGCTATGGGTAACT4TTTGATTG1C
4x2RP 37 27 + GCCGGATGAGTGGTGACATGAATACAGCTATGGGTAACTGTTTGATTGCC
5x1D3NGI 107 94 - GCCGGATGAGTGGTGACATGAATACAGCTATGGGTAACTGTTTGATTGCC
5x1DRPN 28 19 - CG4ATGA4TG4TGACATGAATACAGCTATGGGYA3CTGT2TGA2TGCC
5x1T3N 29 20 - TG4TGACATGAATACA4CTMTGG4-MACTGV2TGATTGCC
CONSENSUS GCCGGATGAGTGGTGACATGAATACAGCTATGGGTAACTGTTTGATTGCC

1810 1820 1830 1840 1850
4x2RP2 38 28 + T4TGC1ATCACGCATGATTTCTTCCGTAGTCGTGGTATCAGGGCGCGTTT
2x3RP 39 29 + TGTG1CATCACGCATGATTTCTT1CGTAGTCGTGGTA213GGGCGCGTTT
4x2RP 37 27 + TGTGCCATCACGCATGATTTCTTCCGTAGTCGTGGTATCAGGGCGCGTTT
5x1D3NGI 107 94 - TGTGCCATCACGCATGATTTCTTCCGTAGTCGTGGTATCAGGG1G1GTTT
5x1DRPN 28 19 - TGTGCCATCACGCATGATTTCTTCCGTAGTCGTGGTATCAGGGC4 TT
5x1T3N 29 20 - T-TGCCATCACGCATGA22TCTTCCGTAGTCGTG4TATCAGGG1 TTT
4x15BRGI 125 110 + C4TGGTATCAGG4CGCGTTT
CONSENSUS TGTGCCATCACGCATGATTTCTTCCGTAGTCGTGGTATCAGGGCGCGTTT

1860 1870 1880 1890 1900
4x2RP2 38 28 + GATGA3CA3T4GTGATGACTGTGTCGTA3TATGCGA33B-GA3TGTG1CG
2x3RP 39 29 + GATGA3CA3TGGTGATGACTGTGTCGTAATATGCGA
4x2RP 37 27 + GATGA3CA3TGGTGAT4ACTGTGTCGTAATATGCGA
5x1D3NGI 107 94 - GATGAACAATGGTGATGACTGTGTCGTAATATGCGAAAAAGAATGTGCCG
5x1DRPN 28 19 - GATGAACA-TGGTGATGACTGTGTCGTAATATGCGAAAAAGAATGTGCCG
5x1T3N 29 20 - GATGA3 TGGTGATGACTGTGTCGTAATATGCGAAAAAGAATGTGCCG
4x15BRGI 125 110 + GATGAACAATGGTGATGACTGTGTCGTAATATGCGAAAAAGAAT4T4CCG
4x15T3GI 114 100 + 4GT4AT4AC2GTGTC4TAATAT4CGAAAAAGAA--T-CC4
E3T7NG 119 102 - GTGATGACTGTGTCGT-ATATGCGB-AAAGB-TGTGC1G
5x4RP2A 40 36 + GATGACLGTRTCGTAATATGCGAAAAAGAATGTGCCG
CONSENSUS GATGAACAATGGTGATGACTGTGTCGTAATATGCGAAAAAGAATGTGCCG

1910 1920 1930 1940 1950
4x2RP2 38 28 + CGGT4GTTA
5x1D3NGI 107 94 - CGGTGGTTAAAGCCGACATGGTAAGGCACTGGAGACAATTCGGGTTTCAA
5x1DRPN 28 19 - CGGTGGTTAAAGCCGACATGGTAAGGCACTGGAGACAATTCGGGTTTCAA
5x1T3N 29 20 - CGGTGGTTAAA41CGACATGGTAAGGCACTGGAGACAATTCGGG2TTCAA
4x15BRGI 125 110 + CGGTGGTTAAAGCCGACATGGTAAG4CACTGGAGACAATTCGGGTTTCAA
4x15T3GI 114 100 + C4GTGGTTAAAGCCGACATGGTAAG4CACTGGAGACAATTCGGGTTTCAA
E3T7NG 119 102 - C4GT4G2T3AAG1CGACAT4GT3A4GCACTG4A4AC3A2TC4GG2TTCAA
5x4RP2A 40 36 + CGGTGGTTAAAGCCGACALMGTAAGGCACTGGAGACA3TTCGGGTTTCAA
SECRPNGI 120 103 - CH-GV-TCAA
E3T7N 30 21 - TCAA
CONSENSUS CGGTGGTTAAAGCCGACATGGTAAGGCACTGGAGACAATTCGGGTTTCAA

1960 1970 1980 1990 2000
5x1D3NGI 107 94 - TG1GAACTCGAATGCGATGCAG
5x1DRPN 28 19 - TGCGAACTCGAATNKGATGCAGAAATC
5x1T3N 29 20 - TGCGAA
4x15BRGI 125 110 + T4CGAACTCGAAT4C-ATGCAGAAATCTTCGAGCAAATTGAGTTTT4TCA
4x15T3GI 114 100 + TGCGAACTCGAAT4CGAT4CAGAAATCTTCGAGCAAATTGAGTTTT4TCA
E3T7NG 119 102 - TGCGAACTCGAATGCGATGCAGAAATCTTCGAGCAAATTGAGTTTTGTCA
5x4RP2A 40 36 + TRCGAACTCGA3TRCGMTGCAGAAATCV-CGMGCAAA
SECRPNGI 120 103 - TGCGAACTCGAATGCGATGCAGA3ATCTTCGAGCAAATTGAGTTTTGTCA
E3T7N 30 21 - T4CGAACTCGAATGCGATGCAGAAATC2TCGAGCARA2TGAG226TGTCA
CONSENSUS TGCGAACTCGAATGCGATGCAGAAATCTTCGAGCAAATTGAGTTTTGTCA

2010 2020 2030 2040 2050
4x15BRGI 125 110 + AAT-C--CCTGTGTAC--C-GGGAAAAATATGTGATGGTAC8GAAV1CCT
4x15T3GI 114 100 + AAT4C4GCCTGTGTAC4AC4GGGAAAAATATGTGATGGTACGGAATCCCT
E3T7NG 119 102 - AATGCGGCCTGTGTACGACGGGGA-A-ATATGTGATGGTA1GGAATCCCT
SECRPNGI 120 103 - AATGCGGCCTGTGTACGACGGG4A3A3ATATGTGATGGTACGGAATCCCT
E3T7N 30 21 - AATGCGGC124TGTACGACGGGGAAAAATATGTGATGGTACGGAATCCCT
2x5DRPN 32 23 - ATATGTGATGGTACGGAATCCCT
4x7ARPGI 103 90 + CCT
4x7RP 43 37 + CCT
4x7RP2 55 47 + CCT
CONSENSUS AATGCGGCCTGTGTACGACGGGGAAAAATATGTGATGGTACGGAATCCCT

2060 2070 2080 2090 2100
4x15BRGI 125 110 + V4GTTAGCCTATCCAAA
4x15T3GI 114 100 + TGGTTAGCCTATCCAAAGATTCCTACTCAGTCGGCCCTTGGAATGGAATC
E3T7NG 119 102 - TGGTTAGCCTATCCAAAGATTCCTACTCAGTCGG1CCTTGGAATGGAATC
SECRPNGI 120 103 - TGGTTAGCCTATCCAAAGATTCCTACTCAGTCGGCCCTTGGAATGGAATC
E3T7N 30 21 - TGGTTAGCCTATCCAAAGATTCCTACTCAGTCGGCKCTTGGAATGGAATC
2x5DRPN 32 23 - TGGTTAGCCTATCCAAAGATTCCTACTCA-TCGGCCCTTGGAATGGAATC
4x7ARPGI 103 90 + TGGTTAGCCTATCCAAAGATTCCTACTCAGTC44C1CTTGGAATGGAATC
4x7RP 43 37 + TGGTTAGCCTATCCAAAGATTCCTACTCAGT144CCCTTGGAATGGAATC
4x7RP2 55 47 + TGGTTAGCCTATCCAAAGATTCCTACTCAGTCGGCCCTTGGAATGGAATC
303A7NGI 128 111 - 1CTATCCAAAGATTCCTACTCAGTCGG11CTTGGAATGGAATC
2x5T3N 31 22 - CTCAGTCG4CCC2TG4AATG4AATC
1x1DRPN 34 25 - G411C2TG4AAVG4AATC
CONSENSUS TGGTTAGCCTATCCAAAGATTCCTACTCAGTCGGCCCTTGGAATGGAATC

2110 2120 2130 2140 2150
4x15T3GI 114 100 + AACCATGCACGCAAGTG4GTCAATGCAGTTGGCTTGTGTG4CTTATC
E3T7NG 119 102 - AACCATGCACG1AAGTGGGTCAATGCAGTTGGCTTGTGTGGCTTATCCCT
SECRPNGI 120 103 - AACCATGCACG-AAGTGGGTCAATGCAGTTGGCTTGTGTGGCTTATCCCT
E3T7N 30 21 - AACCATGCACGCRAGTGGGTCAATGCAGTTGGKTTGTGTGG8TTATCCCT
2x5DRPN 32 23 - AACCATGCACGCAAGTGGGTCAATGCAGTTGGCTTGTGTGGCTTATCCCT
4x7ARPGI 103 90 + AACCAT4CACGCAAGTGGGTCAATGCAGTTGGCTTGTGT4GCTTAT1CCT
4x7RP 43 37 + AACCATGCAYGCAA--GG-TCAATGCAGTTGGCTTGTGTGGCTTATCCCT
4x7RP2 55 47 + AACCATGCACGCAAGLHGGTCAATGCAGTTGGCTTGTGTGGCTTATCCCT
303A7NGI 128 111 - AACCATGCACGCAAGTGGGTCAATGCAGTTGGCTTGTGTGGCTT
2x5T3N 31 22 - AACCATGCACGCAAGTGG4TCAATGCAG2TG4C2TGTGTGGCTTATCCCT
1x1DRPN 34 25 - AACCATGCACGCAA42GG4-CAATGCAG2TG4C2TGTGTG4CTTATCCCT
1x1FRNGI 110 97 - TCAATGCAGTTGGC2TGTGT4G1TTATCCCT
1x1ARPN 33 24 - 2T4T4TG4CTTATCCCT
CONSENSUS AACCATGCACGCAAGTGGGTCAATGCAGTTGGCTTGTGTGGCTTATCCCT

2160 2170 2180 2190 2200
E3T7NG 119 102 - CACTGGTGGAATTC
SECRPNGI 120 103 - CACTGGTGGAATTC
E3T7N 30 21 - CACTGGTGGAATTC
2x5DRPN 32 23 - CACTGGTGGAATTCCTGTTGTCCAAAGTTATTATAATATGATGATCCGCA
4x7ARPGI 103 90 + CACTGGTGGAATTC
4x7RP 43 37 + CACT--TGGAATTC
4x7RP2 55 47 + CACTGGTGGAATTC
2x5T3N 31 22 - CACTGGTGGAA2TCCTG2TGTCC33AGTTATTATAATATGATGATCCGCA
1x1DRPN 34 25 - CACTGGTGGAATTCCTG2TGTCCAAAGTTATTATAATATGATGATCCGCA
1x1FRNGI 110 97 - CACTGGTGGAATTCCTGTTGTCCAAAGTTATTATAATATGATGATCCGCA
1x1ARPN 33 24 - CACTG4TG43ALTCCTGTTGTCCA3AGTTATTATAATATGATGATCCGCA
1x1ERNGI 102 89 - ACTGGTGGAATTCCTGTTGTCCAAAGTTATTATAATATGATGATCCG1A
EE2BT3 44 38 + GAATTCCTGTTGTCCAAAGTTATTATAATATGATGATCCGCA
M301D7GI 98 83 + GTTATTATAATATGATGATCCGCA
M301AT7 50 43 + TATTATAATATGATGATCCGCA
M301T7 52 44 + A2AATATGATGATCCGCA
CONSENSUS CACTGGTGGAATTCCTGTTGTCCAAAGTTATTATAATATGATGATCCGCA

2210 2220 2230 2240 2250
2x5DRPN 32 23 - ACACTCAGTCCGTGAACAGTTCTGGCATACTTCGCGATGTCAGTTTTGC-
2x5T3N 31 22 - ACACTCAGTCCGTGAACAGTTCTGGCATACTTCGCGATGTCAGTTTTGCT
1x1DRPN 34 25 - ACACTCAGTCCGTGAACAGTTCTGGCATACTTCGCGATGTCAGTTTTGCT
1x1FRNGI 110 97 - ACACTCAGTCCGTGAACAGTTCTGGCATACTTCGCGATGTCAGTTTTGCT
1x1ARPN 33 24 - ACACTCAGTCCGTGAACAGTTCTGGCATACTTCGCGATGTCAGTTTTGCT
1x1ERNGI 102 89 - ACACTCAGTCCGTGAACAGTTCTGGCATACTTCGCGATGTCAGTTTTGCT
EE2BT3 44 38 + ACACTCAGTCCGTGAAC34TTCTGGCATACTTCGCGATGTC3GTTTTGCT
M301D7GI 98 83 + ACACTCAGTCCGTGAACAGTTCTGGCATACTTCGCGATGTCAGTTTTGCT
M301AT7 50 43 + ACACTCA4TCCGTGAACA4TTCTGGCATACTTCGCGATGTCAGTTTTGCT
M301T7 52 44 + ACACTCAGTCCGTGAACAGTTCTGGCATACT2CGCGATGTCAGTTTTGCT
CONSENSUS ACACTCAGTCCGTGAACAGTTCTGGCATACTTCGCGATGTCAGTTTTGCT

2260 2270 2280 2290 2300
2x5DRPN 32 23 - AGTGGATTTCGGGAGTTAGCGCGATTGGGTAACAGGA
2x5T3N 31 22 - AGTGGATTTCGG
1x1DRPN 34 25 - AGTGGATTTCGGGAGTTAGCGCGATTGGGTAACAGGAAAMGTGGTGCCAT
1x1FRNGI 110 97 - AGTGGATTTCGGGAGTTAGCGCGATTGGGTAACAGGAAAAGTGGTGCCAT
1x1ARPN 33 24 - AGTGGAT2TCGGG3GTTAGCGCGATTGGG2AACAGGAAAAGTGGTGCCAT
1x1ERNGI 102 89 - AGTGGATTTCGGGAGTTAGCG1GATTGGG2AACAGGAAAAGTGGTGCCAT
EE2BT3 44 38 + AGTGGATTTCGGGAGTTAGCGCGATTGGGTAACAGGAAAAGTGGTGCCAT
M301D7GI 98 83 + AGTGGATTTCGGGAGTTAGCGCGATTGGGTAACAGGAAAAGTGGTGCCAT
M301AT7 50 43 + AGTGGATTTCGGGAGTTAGCGCGATTGGGTAACAGGAAAAGTGGTGCCAT
M301T7 52 44 + AGTGGATTTCGGGAGTTAGCGCGATTGGGTAACAGGAAAAGTGGTGCCAT
M2001T3N 58 51 - TGGATTTCGGGAGTTAGCGCGATTGGGTAACAGGAAAAGTGGTGCCAT
CONSENSUS AGTGGATTTCGGGAGTTAGCGCGATTGGGTAACAGGAAAAGTGGTGCCAT

2310 2320 2330 2340 2350
1x1DRPN 34 25 - ATCTGAAGACGCCCGTTTTAGCTTTTALKLKGCATTTGGC
1x1FRNGI 110 97 - ATCTGAAGACGCCCGTTTTAGCTTTTATCTCGCATTTGGC
1x1ARPN 33 24 - ATCTGAAGACGCCCGTTTTAGCTTTTALKTCGCATTTG
1x1ERNGI 102 89 - ATCTGAAGA5G GTTT
EE2BT3 44 38 + ATCTGAAGACG1CCGTTTTAGCTTTTATCTCGCATTTGGCATTACTCCAG
M301D7GI 98 83 + ATCTGAAGAC-CCCGTTTTAGCTTTTATCTCGCATTTGGCATTACTCCAG
M301AT7 50 43 + ATCTGAAGACGCCCGTTTTAGCTTTTATCTCGCATTTGGCATTACTCCAG
M301T7 52 44 + ATCTGAAGACG1CCG2222AGCTTTTATCTCGC7222GGCATTACTCCAG
M2001T3N 58 51 - ATCTGAAGACGCCCGTTTTAGCTTTTATCTCGCATTTGGCATTACTCCAG
CONSENSUS ATCTGAAGACGCCCGTTTTAGCTTTTATCTCGCATTTGGCATTACTCCAG

2360 2370 2380 2390 2400
EE2BT3 44 38 + ATTTACA3CGTGCCATGGA33GTGACTATGATGCTCATTCTATBGAGTGG
M301D7GI 98 83 + ATTTACAACG2GCCATGGAAAGTGACTATGAT4CTCATACTATAGAGTGG
M301AT7 50 43 + ATTTACAACGTGCCATGGAAAGTGACTATGATGCTCATACTATAGAGT4G
M301T7 52 44 + A
M2001T3N 58 51 - ATTTACAACGTGCCATGGAAAGTGACTATGATGCTCATACTATAGAGTGG
301CT3NS 124 107 - CTATAGAGTG4
CONSENSUS ATTTACAACGTGCCATGGAAAGTGACTATGATGCTCATACTATAGAGTGG

2410 2420 2430 2440 2450
EE2BT3 44 38 + GGTTTCGT-1CD-AG44A
M301D7GI 98 83 + GGTTTCG24C1CCAGGGAAATCCTAGAATACAGCCAATCTCATGGACTCT
M301AT7 50 43 + GG2TTCGT-C1D-A4G4AAAT1CTAGA3TACAG1CA3TCTCAT4GACTCT
M2001T3N 58 51 - GGTTTCGTGCCCCAGGGAAATCC
301CT3NS 124 107 - 4GT2TCGTGCCC1AGG4AAATCCTAGAATACAGCCAATCTCATG4ACTCT
Oligo2 139 121 - GG33-TCCTAGAATACAGCCAATCTCATGGACTCT
M301AT3N 51 46 - ATCCTAGAATACAGC13ATCTCATGGACTCT
Oligo2 140 122 - TCCTAGAATACAGCCAATCTCATGGACTCT
M2101T7 59 52 + CTCATGGACTCT
CONSENSUS GGTTTCGTGCCCCAGGGAAATCCTAGAATACAGCCAATCTCATGGACTCT

2460 2470 2480 2490 2500
M301D7GI 98 83 + CA3CGA3CTGTAGAATTA3CTA3TC
M301AT7 50 43 + CA3CGA3CTGTBGB-2TA3CTA3TD-TC44G-A3GA3GTA33GB-CGTG-
301CT3NS 124 107 - CAACGAACTGTAGAATTAACTAATCCTCGGGGAAGAAGTAAAGAACGTGG
Oligo2 139 121 - CAACG33CTGTAGAATTAACTAATCCTCGGGGAAGA7GTAAAGAACGTGG
M301AT3N 51 46 - CAACGAACVGTAGAATTAACTAATCCTCGGG4-AGAAGTAAAGAACGTG4
Oligo2 140 122 - CAACGAACTGTAGAATTAACT7ATCCTCGGGGAAGAAGTAAAGA3CGTGG
M2101T7 59 52 + CAACGAACTGTAGAATTAACTAATCCTCGGGGAAGAAGTAAAGAACGTGG
M301T3N 53 45 - TG4
CONSENSUS CAACGAACTGTAGAATTAACTAATCCTCGGGGAAGAAGTAAAGAACGTGG

2510 2520 2530 2540 2550
M301AT7 50 43 + TGACA
301CT3NS 124 107 - TGACAGCGGGG4AAAACAGAAGAACTCAATGGGG1GAAAGATAGCCAATG
Oligo2 139 121 - TGACAGCGGGGGAAAACAGAAGAACTCAATGGGGCGA3AG3TAGC
M301AT3N 51 46 - TGACAGCGG4G4BAAACAGAAGAACTCAATGGG4-GAAAGATAGCCAATG
Oligo2 140 122 - TGACAGCGGGGGAAAACAGAAGAACYCAALGGGGCGAAACATAGCC
M2101T7 59 52 + TGACAGCGGGGGAAAACAGAAGAACTCAATGGGGCGAAAGATAGCCAATG
M301T3N 53 45 - TGACAGCGGGG4AAAACAGAAGAACTCAATGGG4CGAAAGATAGCCAATG
Oligo3 142 123 - AACAGAAGAACTCAATGGGGCGAAAGATAGCCAATG
Oligo3 141 126 - T44GGCGAAAGAT-RCCAATG
CONSENSUS TGACAGCGGGGGAAAACAGAAGAACTCAATGGGGCGAAAGATAGCCAATG

2560 2570 2580 2590 2600
301CT3NS 124 107 - ATGCTATCTCTGAATCGAAGCAAGGAGTTATGGGTGCCAGCACATACATT
M301AT3N 51 46 - ATGCTATCTCTGAATCGAAGCAAGGAGTTATGGG2GCCAGCACATACATT
M2101T7 59 52 + ATGCTATCTCTGAATCGAAGCAAGGAGTTATGGGTGC1AGCACATACATT
M301T3N 53 45 - ATGCTATCTCTGAATCGAAGCAAGGAGTTATGGGTGCCAGCACATACATT
Oligo3 142 123 - ATGCTATCTCTGAATCGAAGCAAGGAGTTATGGGTGCCAGCACATACATT
Oligo3 141 126 - ATGCTATCTCTGAATCGAAGCAAGGAGTTATGGGTGCCAGCACATACATT
CONSENSUS ATGCTATCTCTGAATCGAAGCAAGGAGTTATGGGTGCCAGCACATACATT

2610 2620 2630 2640 2650
301CT3NS 124 107 - GCTGATAAAATTAAGGTGACTATTAACTTTAATTTTTAGTGTATGGCTTG
M301AT3N 51 46 - GCTGATAAAATTAAGGTGACTATTAACTTTAATTTTTAGTGTATGGCTTG
M2101T7 59 52 + GCTGATAAAATTAAGGTGACTATTAACTTTAATTTTTAGTGTATGGCTTG
M301T3N 53 45 - GCTGATAAAATTAAGGTGACTATTAACTTTAATTTTTAGTGTATGGCTTG
Oligo3 142 123 - GCTGATAAAATTAAGGTGACTATTAACTTT7AT2TTTAGTGTATGGCTTG
Oligo3 141 126 - GCT4ATAAAATTAAGGTGACTATT7ACT777727TTTAGTGTATGGCTTG
M1902N3G 94 84 - TTAATTTTTAG2GTBTGGCT2G
Oligo4 144 125 - T-ATTTTTAGTGTATGGCTTG
M1901T3N 60 53 - TAGTGTATG4C22G
CONSENSUS GCTGATAAAATTAAGGTGACTATTAACTTTAATTTTTAGTGTATGGCTTG

2660 2670 2680 2690 2700
301CT3NS 124 107 - TTGCCGTTGTGAC
M301AT3N 51 46 - TTGCCGTTGTGACTCCAGCCCCGGG
M2101T7 59 52 + TTG1CGTTGTGACT1CAG1CCC44GGATTACTCT4GAGCA2TGC2TATA2
M301T3N 53 45 - TTGCCGTTGTGACTCCAGCCCCGGG
Oligo3 142 123 - TTGCCGTTGTGAC
Oligo3 141 126 - TTGCCGTTGTGACTCCAGCCCC
M1902N3G 94 84 - TTG--GTTGTGACTCCAG-111GGG43TTACTCTGGAGCATTG1TTATAT
Oligo4 144 125 - TTGC-GTTGTGACTC1RGCCD-44GGATTACTCTGGAGCATTGCT2ATAT
M1901T3N 60 53 - 2TG1CG2TGTGACTCCAGCCCCGGGG-TTACTCTGGAGCA2TGC6TATAT
1304B7GI 95 85 + GGGATT3CTCTGGAGCATTGCTTATAT
M1304T7 61 54 + CTCTGGAGCATTNKTTATAT
CONSENSUS TTGCCGTTGTGACTCCAGCCCCGGGGATTACTCTGGAGCATTGCTTATAT

2710 2720 2730 2740 2750
M2101T7 59 52 + TA2TTATCTCA22TG2222C
M1902N3G 94 84 - TATTTATCTCATTTGTTTTCTTTTATATTACCTCGCTTAG115G1AAGGA
Oligo4 144 125 - TATTTATCTCATTTG2222-T2TTATATTACCTYGCTTAGCCCGCAAGGA
M1901T3N 60 53 - TATTTALCTC3226G2TT2CTTTTALATTACCTCGCTTAGCCCGC5AGGA
1304B7GI 95 85 + TATTTATCTCATTTNTTTTCTTTTATATTACCTCGCTTAGCCCGCAAGGA
M1304T7 61 54 + TATTTATCTCATTTGTTTTCTTTTATATTACCTCGCTTAGCCCGCAAG-A
Oligo4 143 124 - TTATATTACCT-GCTTAGCCCGCAAGGA
M102T3N 62 55 - TALATTACC2CGCTTAGCCCGC5AGGA
CONSENSUS TATTTATCTCATTTGTTTTCTTTTATATTACCTCGCTTAGCCCGCAAGGA

2760 2770 2780 2790 2800
M1902N3G 94 84 - AATACTTACGTTCATCACTTCGATAGTTCTTCCGTTAAAACACAATACGT
Oligo4 144 125 - AATACTTACGTTCATCACTTCGATAGTTCTTCCGTTAAAACACAATACGT
M1901T3N 60 53 - AATACTTACG2TCATCAC2TCGALAGT2CTTCCGTTAAAACACAATACGT
1304B7GI 95 85 + AAT3CTT3C6TTCATCACTT14AT3GTTCTTYCGTTAAAACACAATAC6T
M1304T7 61 54 + AATACTTACGTTCATCACTTCGATAGTTCTTCCGTTAAAACACA3TACGT
Oligo4 143 124 - AATACTTACGTTCATCACTTCGATAGTTCTTCCGTTAAAACACAATACGT
M102T3N 62 55 - AATACTTACG2TCATCACCLKGATAGV-C2TCCGTTAAAACACAA2ACHT
M1504A3N 66 59 - AACACAATAC4-
CONSENSUS AATACTTACGTTCATCACTTCGATAGTTCTTCCGTTAAAACACAATACGT

2810 2820 2830 2840 2850
M1902N3G 94 84 - TGG5ATCTCTACAAATGGCGATG
Oligo4 144 125 - TGGCATCTCTACAAATGGCGATGGTT-AACGCA
M1901T3N 60 53 - TGGCATCTCTACAAATG4CG3TGGTT
1304B7GI 95 85 + T66CATCTCT3CAAAT4GC4AT4GTTAA7C4CATTAATAATT
M1304T7 61 54 + TGGCATCTCTACA3AT4GCGAT4GTTAAACGCATTAATAATTTA1CCACA
Oligo4 143 124 - TGGCATCTCTACAAATGGCGATGGT2-AACGCA
M102T3N 62 55 - TG
M1504A3N 66 59 - TH-CATCTCTAC3AATH-CGATG42T3AACGCA2TAATAAV-TACD-AC3
1504A3NS 108 95 - GCATCTCTAC3AATG4CGATG4V-3AACGCB2TAATAA2TTACCCACA
1504CRNS 121 104 - TCTCTACB-ATH-CGAT4G2T3AACGCBTTAATAA2TTA1CCACA
CONSENSUS TGGCATCTCTACAAATGGCGATGGTTAAACGCATTAATAATTTACCCACA

2860 2870 2880 2890 2900
M1304T7 61 54 + GTGA3GCTTGCTAAGCAGGCTCTA1C11TGCTTGCGA3T1CTA33CTTGT
M1504A3N 66 59 - 4TGAAGC2TGCTAAGCAG4CTCTRCCCCTGC2TNCGAATCCTAAAC2TGT
1504A3NS 108 95 - GTGAAGCTTGCTAAGCAH-CTCTACCCCTGCT2GCGAATCCTAAACTTGT
1504CRNS 121 104 - GTGAAGCTTGCTAAGCA4GCTCTACCCCTGCTTGCGAATCCTAAACTTGT
1103C3GI 130 113 + AGCAGGCTCTACCCCTGCTTGCGAATCCTAAACTTGT
M1103B3F 65 58 + CCCTGCTTGCGAATCCTAAACYTGT
M901T3 63 56 + TTGCGAATCCTAAACTTGT
M1103BT3 64 57 + GCGAATCCTAAACTTGT
CONSENSUS GTGAAGCTTGCTAAGCAGGCTCTACCCCTGCTTGCGAATCCTAAACTTGT

2910 2920 2930 2940 2950
M1304T7 61 54 + A33TA33GCTATAG3TGT4GT21CT224GTCGT1C3-4GTH-TCG
M1504A3N 66 59 - AAATAAAGCTATAGAT8TG44TCCTTTGGTCGTCCAAG-TGGTCGGAAAV
1504A3NS 108 95 - AAATAAAGCTATAGATGTGGTTCCTTTGGTCGTCCAAGGTGGTCGGAAAT
1504CRNS 121 104 - AAATAAAGCTATAGATGTGGTTCCTTTGGTCGTCCAAGGTGGTCGGAAAT
1103C3GI 130 113 + AAATAAAGCTATAGATGTGGTTCCTTT44TCGTCCAAGGTGGTCGGAAAT
M1103B3F 65 58 + AAATAAAGCTATAGA2GTGGTTCCTTTGGTCGTCCAAGGTGGTCGGAAAT
M901T3 63 56 + AAATAAAGCTATAGATGTGGTTC8TTTGGTCGTC8AAGGTGGTCGGAAAT
M1103BT3 64 57 + AAATAAAGCTATAGATGTGGTTCCTTTGGTCGTCCAAGG2GGTCGGAAAT
1504B3NF 67 60 - AGCTATAGATGTG4V-CCTV-G4TCGTCCAAGGTGGTCGGAAA2
CONSENSUS AAATAAAGCTATAGATGTGGTTCCTTTGGTCGTCCAAGGTGGTCGGAAAT

2960 2970 2980 2990 3000
M1504A3N 66 59 - -GTCCAAGGCTGCTAAGCGGLTGCTTGGCGC2TATGGAGGCAACATTTCG
1504A3NS 108 95 - TGTCCAAGGC 2AAGCGGTTGCTTGG1G1TTATGGAGGCAACATTTCG
1504CRNS 121 104 - TGTCCAAGGCTGCTAAGCGGTTGCTTGG1G1TTATGGAGGCAACATTTCG
1103C3GI 130 113 + TGTCCAAG-CTGCTAAGCGGTTGCTTGGC-CTTATGGAGGCAACATTTCG
M1103B3F 65 58 + TGTCCAAGGCTGCTAANKGGTTGCTTGGCGCTTATGGAGGCAACATTTCG
M901T3 63 56 + TGTC8AAGGCTGCTAA8-GGTTKKTTGG14CTTATGGAG4CAACATTTCG
M1103BT3 64 57 + TGTCCAAGGCTGCTAANKGGTTGCTTGGCGCTTATGGAG4CAACATTTCG
1504B3NF 67 60 - TGTCCAAGGCTGCTAAGCGGTTGCTTGGCGCTTATGGAGGCAACATTTCG
CONSENSUS TGTCCAAGGCTGCTAAGCGGTTGCTTGGCGCTTATGGAGGCAACATTTCG

3010 3020 3030 3040 3050
M1504A3N 66 59 - TACACTGAGGGTGCCAAACCGGGTGCAATATCAGCTCCTGTCGCTATTAG
1504A3NS 108 95 - TACACTGAGGGTGCCAAACCGGGTGCAATATCAGCTCCTGTCGCTATTAG
1504CRNS 121 104 - TACACTGAGGGTGCCAAACCGGGTGCAATATCAGCTCCTGTCGCTATTAG
1103C3GI 130 113 + TAC3CTGAGGGT4CCAAACCGGGTGCAATATCA4CTCCTGTCGCTATTAG
M1103B3F 65 58 + TACACTGAGGGTGCCAAACCGGGTGCA3TATCA4CTCCTGTCGCTATTAG
M901T3 63 56 + TACA1TGAGGGT418AAA1CGGGTGCAATATCA--T1CTGTCG1TATTAG
M1103BT3 64 57 + TACACTGAGGGTGCCAAACCGGGTGCA3TATCA4CTCCTGTCGC2ATTAG
1504B3NF 67 60 - TACACTGAGGGTGCCAAACCGGGTGCAATATCAGCTCCTGTCGCTATTAG
CONSENSUS TACACTGAGGGTGCCAAACCGGGTGCAATATCAGCTCCTGTCGCTATTAG

3060 3070 3080 3090 3100
M1504A3N 66 59 - TCGGCGAGTGG1TGGTATGAAGCCKAGG
1504A3NS 108 95 - TCGGCGAGTGGCTGGTATGAAGCCTAGGTTTGTCAGA
1504CRNS 121 104 - TCGGCGAGTGGCTGGTATGAAGCCTAGGTTTGTCA
1103C3GI 130 113 + TCGGCGAGTGGCTGGTATGAAGCCTAGGT
M1103B3F 65 58 + TCGGCGAGTGGCTGGTATGA3G1CTAGGT22GTCAGATCTGA3GGATCTG
M901T3 63 56 + TCGGCGAGT4GCTGGTATGA3G1CTAGG22TGTCAGA2CTGA3GGA2CTG
M1103BT3 64 57 + TCGGCGAGTGGCTGGTATGAAGCCTAGGTTTGTCAGATCVGA3GGATCTG
1504B3NF 67 60 - TCGGCGAGTGGCTGGTATGAAGCCTAGGTTTGTCAGATC
M901T7N 68 61 - G4C2G4TATGAAGCCTAG4--TGTCAG32C2G3AG4A2CTG
M709T3N 69 67 - GTCAG3TCTG3AG4BTCTG
M1402AT7 87 80 + GA2CTGAAGGATCTG
M1402T7 70 62 + GG3TCT4
CONSENSUS TCGGCGAGTGGCTGGTATGAAGCCTAGGTTTGTCAGATCTGAAGGATCTG

3110 3120 3130 3140 3150
M1103B3F 65 58 + TGA3GATAGT
M901T3 63 56 + TGA3GATAGT2CATAGGGAGT
M1103BT3 64 57 + TGAAGATAGTTCATAGGGAGTTTATTG
M901T7N 68 61 - TG3AGATAG2TCATAGG4AGTTTA2TGCC2CTG2TC2TCC2TCGAGTGAT
M709T3N 69 67 - TGAAGATAG2TCATA4G4AG22TA2TGCCTCTGTTCTTCCTTCGA4TGAT
M1402AT7 87 80 + TGAAGATAGTTCATAGGGAGTTTATTGCCTCTGTTCTTCCTTCGAGTGAT
M1402T7 70 62 + TGA3GATAGTTCATAGGGAGTTTATTGCCTCTGTTCTTCCTTCGAGTG7T
CONSENSUS TGAAGATAGTTCATAGGGAGTTTATTGCCTCTGTTCTTCCTTCGAGTGAT

3160 3170 3180 3190 3200
M901T7N 68 61 - CTCACTGTKAATAATGGTGATGTCAATATCGGTAAGTATAGAGTCAATCC
M709T3N 69 67 - CTCAC2GTCAATAATGGTGA2GTCAATATCGGTAAGTATAGAGTCAATCC
M1402AT7 87 80 + C2C3CTGTCAATAATGGTGATGTC3ATATCGGTAAGTATAGAGTCB-TCC
M1402T7 70 62 + CTCACTGTCAATA7TGGTG7TGTCAATATCGGTAAGTATAGA6TCAATCC
CONSENSUS CTCACTGTCAATAATGGTGATGTCAATATCGGTAAGTATAGAGTCAATCC

3210 3220 3230 3240 3250
M901T7N 68 61 - TAGTAATAACGCTTTATTCMCCTGGCTTCAGGGACAAGCTCAACTATATH
M709T3N 69 67 - TAGTAATAACGCTTTATTCACCTG4CTTCAGGGACAAGCTCAACTATATG
M1402AT7 87 80 + TAGTAATAACGCTTTATTDACCTGGCTTCB4GGACA3GCTCAACTATATG
M1402T7 70 62 + TAGTAATAACGCTTTATTCACCTGGCTTCAGGG3C73GCTC73CTATATG
CONSENSUS TAGTAATAACGCTTTATTCACCTGGCTTCAGGGACAAGCTCAACTATATG

3260 3270 3280 3290 3300
M901T7N 68 61 - ATATGTACAGATTTACTCGGCTCCGTATCACCTACATTCCTACT
M709T3N 69 67 - ATATGT
M1402AT7 87 80 + ATATGTACAGATTTACTCGGCT1CGTATCA1CTACATT
M1402T7 70 62 + ATATGTAC
M709C7GI 112 98 + ACAGATTTAC2CGGCTCCGTATCACCTACATTCCTACTACCGGA
1103N7AD 88 81 - 1CGTATCACCTACAV-CCTACTACCG4A
CONSENSUS ATATGTACAGATTTACTCGGCTCCGTATCACCTACATTCCTACTACCGGA

3310 3320 3330 3340 3350
M709C7GI 112 98 + TCCACTTCCACGGGTCGTGTCTCTCTCCTCTGGGATAGAGATTCTCAGGA
1103N7AD 88 81 - TC1ACTTC1ACGG4TC4TRTCTCTCTCCTCTGG4ATAGAGATTCTCAG4A
M802A7GI 89 63 + CTTCCACGGGTCGTGTCTCTCTCCTCTGGGATAGAGATTCTCAGGA
M802AT7 90 79 + CACGGGTCGTGTCTCTCTCCTCTGGGATAGAGAT7CTCAGGA
CONSENSUS TCCACTTCCACGGGTCGTGTCTCTCTCCTCTGGGATAGAGATTCTCAGGA

3360 3370 3380 3390 3400
M709C7GI 112 98 + CCCCCTCCCTATAGACCGTGCTGCCATTAGCTCTTATGCTCATTCCGCTG
1103N7AD 88 81 - CCCCCTCCCTATAG3CC4TGCTG--ATTAGCTCTTATGCTCATTCCG1TG
M802A7GI 89 63 + CCCCCTCCCTATAGACCGTGCTGCCATTAGCTCTTATGCTCATTCCGCTG
M802AT7 90 79 + CCCCDTCCCTATAGACC4TGCTGCCATT3GCTCTTATGCTCATTCCGCTG
CONSENSUS CCCCCTCCCTATAGACCGTGCTGCCATTAGCTCTTATGCTCATTCCGCTG

3410 3420 3430 3440 3450
M709C7GI 112 98 + ATTCAGCGCCTTGGGCTGAGAATGT2CTAGTGGT1CCATGTGACAATACG
1103N7AD 88 81 - ATTCAG1NKCTTGGGCTGAGAA7GTTCTAGTGGTCCCATGTGACAATACG
M802A7GI 89 63 + ATTCAGCGCCTTGGGCTGAGAATGTTCTAGTGGTCCCATGTGACAATACG
M802AT7 90 79 + ATTCANKGCCTTG4NKTGAGAATGTTCT3GTGGTCCCATGTGACAA2ACG
CONSENSUS ATTCAGCGCCTTGGGCTGAGAATGTTCTAGTGGTCCCATGTGACAATACG

3460 3470 3480 3490 3500
M709C7GI 112 98 + TGGAGGT
1103N7AD 88 81 - TGGAGGTACATGAATGAT3CC33TGCTGTCG3
M802A7GI 89 63 + TGGAGGTACATGAATGATACCAATGCTGTCGACCGGAAGTTGGTTGATTT
M802AT7 90 79 + TGGAGGTAC3YGAATGATACCAATGCTGYCGACCGGAAGTTGGTTGATTT
802D3NGI 129 112 - GTACATGAATGATACCAATGCTGTCGACCGRAAGTTGGTTGATTT
M711T7 75 69 + A2GATA-CAATG1TGTCGACCGGAAGTTGGTTGATTT
M802T3N 74 68 - CTG2CGACCG4AAG2TG42TGA222
M711BT7 76 70 + TGTCGACCGGAAGTTGGTTGATTT
CONSENSUS TGGAGGTACATGAATGATACCAATGCTGTCGACCGGAAGTTGGTTGATTT

3510 3520 3530 3540 3550
M802A7GI 89 63 + TGGGCAGTTCTTATTCGCTACTTATTCTGGTGCTGGTAGCACCGCCCATG
M802AT7 90 79 + TGG6CAGTTCTTATTCGCTACTTATTCTGGYGCTGGTAGCMK-G1CCATG
802D3NGI 129 112 - TGGGCAGTTCTTATTCGCTACTTATTCTGGTGCTGGTAGCAC-GCCCATG
M711T7 75 69 + TGGGCAGTTCTTATTCGCTACTTATTCTGGTGCTGGTAGDA--4CCCATG
M802T3N 74 68 - T4G4CAGTTCTTATTCGCTACTTATTCTG-TGCTGGTAGCACCGCC1ATG
M711BT7 76 70 + TGGGCAGTTCTTATTCGCTACTTATTCTGGTGCTGGTAGCMK-GCCCATG
M1304A3N 73 65 - CGCTAC2TA2TCTG-TGCTG4TAGCACCGCC1ATG
M1304T3N 72 64 - C2G-2GC2G4TANKA1CGD-CA2G
CONSENSUS TGGGCAGTTCTTATTCGCTACTTATTCTGGTGCTGGTAGCACCGCCCATG

3560 3570 3580 3590 3600
M802A7GI 89 63 + GTGA2C
M802AT7 90 79 + GTGATCTTTATGT2GAGTATGCTGTAGA3TTTA3GGACCC
802D3NGI 129 112 - GTGATCTTTATGTTGAGTATGCTGTAGAATTTAAGGA1-1-1AGCCTATC
M711T7 75 69 + GTGATCT2TATGTTGAGTATGCTGTAGAATTTAAGGA1CCCCA6YCTATC
M802T3N 74 68 - GTGATCTTTATGTTGAGTATGCTGTAGAATTTAAGGACKC--AGCCTATC
M711BT7 76 70 + GTGATCTTTATGTTGAGTATGCTGTAGAATTTAAGGMCCCCCAGCCTATC
M1304A3N 73 65 - 4TGATCTTTATG2TGAGTATGCTGTAGAATTTAAGGACCD--AGCCTATC
M1304T3N 72 64 - -LGA2C22TATG2TGAGTATGCTGTAGAA6TTAAGGACKC--AGCCTATC
CONSENSUS GTGATCTTTATGTTGAGTATGCTGTAGAATTTAAGGACCCCCAGCCTATC

3610 3620 3630 3640 3650
802D3NGI 129 112 - GCTGGGATGGTATGTATGTTTGATCGCTTGGTCTCTCTTTCCGAAGTTGG
M711T7 75 69 + GCT4G4ATGGTATGTATGTTTGATCG1TTGGTCTCTCT2T1CGA3G224G
M802T3N 74 68 - GCTGGGATGGTATGTATGTTTGATCGCTTGGTCTCTCTTTCCGAAGTTGG
M711BT7 76 70 + GCTGG-ATGGTATGTATGTTTGATCGCTTGGTCTCTCT22CCGAAGTTGG
M1304A3N 73 65 - GCTGGGATGGTATGTATGT2TGATCGCTTGGTCTCTCTTTCCGAAGTTGG
M1304T3N 72 64 - GCTGG4ATG4TATGTATGT2TGATCGCTTGGTCTCTCTTTCCGAAGTTGG
CONSENSUS GCTGGGATGGTATGTATGTTTGATCGCTTGGTCTCTCTTTCCGAAGTTGG

3660 3670 3680 3690 3700
802D3NGI 129 112 - ATCC
M711T7 75 69 + AT1CA1TATCA34G4TGTDA3TTA1A22GCTGATCGTGAT4TGATA
M802T3N 74 68 - ATCC
M711BT7 76 70 + ATCCA1TATCA3GGGTGTCB-2TACA2TGCTGATCGTGATGTGATA3CTA
M1304A3N 73 65 - ATCCACTATCAAGGGTGTCAATTACATTGCTGATCGTGATGTGATAACTA
M1304T3N 72 64 - ATCCACTATCAAGGGTGTCAATTACATTGCTGATCGTGATGTGATAACTA
M107T7 78 75 + GGGTGTCA3TTACATTGCTGATCGTGATGTGATAACTA
M1072T7 77 66 + T4TCAATTACATTGCTGATCGTGATGTGATAACTA
M107F7GI 122 105 + AATTAC3TTGCTGATCGTGATGTGATAACTA
CONSENSUS ATCCACTATCAAGGGTGTCAATTACATTGCTGATCGTGATGTGATAACTA

3710 3720 3730 3740 3750
M711BT7 76 70 + CT4GGH-TA3TA
M1304A3N 73 65 - CTGGGG42AATATTGGTGTTAACATCAATATTCCCGGG
M1304T3N 72 64 - CTGGGGGTAATATTGGTGTTAACATCAATATTCCCGGG
M107T7 78 75 + CT4GGGGTAATATTGGTGTTAACBTCA-TATTCCCGGGACTTATCTCGTC
M1072T7 77 66 + CT4GGGGTAATATTGGTGTTAACATCA3TATT1CC4GGACTTATCTCGTC
M107F7GI 122 105 + CTGGGGGTAATATTGGTGTTAACATCAATATTCCCGGGACTTATCTCGTC
M711AT3N 80 72 - 4TAATAV-H-TGTTAACATCAATATTCCCGG4ACTTATCTCGTC
1601T7NG 127 109 - CCC4GGAC2TATCTCGTC
1601T3G 123 106 + TC
CONSENSUS CTGGGGGTAATATTGGTGTTAACATCAATATTCCCGGGACTTATCTCGTC

3760 3770 3780 3790 3800
M107T7 78 75 + ACGATTGTTCTTA3TGCTACA2CGAT2GGT1CD-T-A1CT
M1072T7 77 66 + ACGATTGTTCTTAATGCTACATCGATTGGT1CC1T1ACCT2CACT4GTAA
M107F7GI 122 105 + ACGATTGTTCTTAATGCTACATCGATTGGTCCCCTCACCTTCACTGGTAA
M711AT3N 80 72 - ACGAV-GTTCTTAATGCTACATCGATTGGTCCCCTCACC22CACTGGTAA
1601T7NG 127 109 - ACGA2TG2TC2T3ATGCTACATCGA2TG4TCCCCTCACCTTCACTGGTAA
1601T3G 123 106 + ACGATTGTTCTTAATGCTACATCGATTGGTCCCCTCACCTTCACTGGTAA
M711T3N 79 71 - TGCTACATCG3TTG4TCCCCTC3CCTTC3CTGGT3A
CONSENSUS ACGATTGTTCTTAATGCTACATCGATTGGTCCCCTCACCTTCACTGGTAA

3810 3820 3830 3840 3850
M1072T7 77 66 + T2CTAAACT2GTAGGCA3CBGTCT2AATCTTA1CAGCB4T4GT4CATCT
M107F7GI 122 105 + TTCTAAACTTGTAGGCAACAGTCTTAATCTTACCAGCAGTGGTGCATCT-
M711AT3N 80 72 - TTCTAAAC2TGTAGGCAACAGTCTTAATCTTACCAGCAGTGGTGCATCTG
1601T7NG 127 109 - TTCTAAACTTGTAGGCAACAGTCTTAATCTTACCAGCAGTGGTGCATCTG
1601T3G 123 106 + TTCTAAACTTGTAGGCAACAGTCTTAA2CTTACCAGCAGTGGTGCATCTG
M711T3N 79 71 - 2T1TA3AC2TGTAGGC3ACAGTCTT33TCTTA1CAG13GTGGTGC3TCTG
CONSENSUS TTCTAAACTTGTAGGCAACAGTCTTAATCTTACCAGCAGTGGTGCATCTG

3860 3870 3880 3890 3900
M107F7GI 122 105 + CTCTTACGTTCACCCTLA3CTD-AC1GG2GT-CD-A3CAGTAGCGAT
M711AT3N 80 72 - CTCTTACGTTCACCCTTAACTCCACCGGTHTGCCCAACAGTAGCGATTCT
1601T7NG 127 109 - CTCTTACGTTCACCCTTAACTCCACCGGTGTGCCCAACAGTAGCGATTCT
1601T3G 123 106 + CTCTTACGTTCACCCTTAACTCCACCGGTGTGCCCAACAGTAGCGATTCT
M711T3N 79 71 - CTCTT3CGTTC3CCCTTAACTCC3CCGGTGTGCCCAACAGTAGCGATTCT
M1603T7N 81 76 - G4TGTG1C1AA1AGTAGCGATTCT
CONSENSUS CTCTTACGTTCACCCTTAACTCCACCGGTGTGCCCAACAGTAGCGATTCT

3910 3920 3930 3940 3950
M711AT3N 80 72 - TCATTCTCTGTGGGT
1601T7NG 127 109 - TCATTCTCTGTGGGTACCGTTGTTGCCTTGACTAGGGTG1GTATGACGAT
1601T3G 123 106 + TCATTCTCTGTGGGTACCGTTGTTGCCTTGACTAGGGTGCGTATGACGAT
M711T3N 79 71 - TC3TTCTCTGTGGGT
M1603T7N 81 76 - TCATTCTCTGTGG4TACCGTTGTTGCCTTGA1TAGGGTGCGTATGACGAT
M702T7 82 73 + 2GCCTTGACTAGGGTGCGTATGACGAT
M702E7GI 106 93 + CCTTGACTAGGGTGCGTAT4ACGAT
CONSENSUS TCATTCTCTGTGGGTACCGTTGTTGCCTTGACTAGGGTGCGTATGACGAT

3960 3970 3980 3990 4000
1601T7NG 127 109 - CACTCGCTGCTCTCCAGAAACTGCTTACCTCGCCTAATTTGATTTACTGC
1601T3G 123 106 + CACTCGCTGCTCTCCAGAAACTGCTTACCTCGCCTAAV-TGA2TTACTGC
M1603T7N 81 76 - CACTCGCTGCTCTCYAGAAACTGCTTACCTCGCCTAATTTGATTTACTGC
M702T7 82 73 + CA CTGCTCTCCAGAAACTGCTTACCTCGCCTAATTTGATTTACTGC
M702E7GI 106 93 + CAC21-CTGCTCTCCAGAAACTGCTT3CCTCGCCTAATTTGATTTAC2-C
CONSENSUS CACTCGCTGCTCTCCAGAAACTGCTTACCTCGCCTAATTTGATTTACTGC

4010 4020 4030 4040 4050
1601T7NG 127 109 - ACTCCAAATCCGGTCTCCCTTGTTC
1601T3G 123 106 + ACT1CA3ATD-GGTCTD-CT
M1603T7N 81 76 - ACTCCAAMTCCG4TCTCCCTTGTTCCTACCTGTTCTCAGCCTGAT
M702T7 82 73 + ACTCCAAATCCGGTCTC1CTTGTTCCTA1CTGTTCTCAGCCTGATATCTG
M702E7GI 106 93 + ACTCCAAAT-C4GTCTCCCTT-TTCCTACCTGTTCTCA4CCTGATATCTG
1806N3GI 105 92 - TCT11C2TGTTCCTACCTGTTCTCAGCCTGATATCTG
14F3EST3 134 116 + TG
CONSENSUS ACTCCAAATCCGGTCTCCCTTGTTCCTACCTGTTCTCAGCCTGATATCTG

4060 4070 4080 4090 4100
M702T7 82 73 + TTCTGGTGTCCTATAGGCGTCCT2GTCGTGTGTAGTGCGGTCT4GCTAA1
M702E7GI 106 93 + TT1TGGTGTCCTAT3GGCGTCCTT4TCGTGTGTAGT4C4G2CTGGCTAA1
1806N3GI 105 92 - TTCT4GTGTCCTATAH-1GTCCTTGTCGTGTGTAGTGCGGTCTGGCTAAC
14F3EST3 134 116 + TTCTGGTGTCCTATAGGCGTCCTTGTCGTGTGTAGTGCGGTCTGGCTAAC
39A3EST3 135 117 + GGTGTCCTATAGGCGTCCTTGTCGTGTGTAGTGCGGTCTGGCTAAC
40A3EST3 136 120 + GTGTCCTATAGGC-TCCTTGTCGTGTGTAGT4C-GTCTGGCTAAC
M1801T3N 84 77 - CCTATA-GC42C12TGTC-TRTGTAGTGCH-TCTG4CTAAC
6F1ESCMP 138 119 + AGGC-TCCTTGTCGTGTG2AGTGC4GTCTGGCT3AC
CONSENSUS TTCTGGTGTCCTATAGGCGTCCTTGTCGTGTGTAGTGCGGTCTGGCTAAC

4110 4120 4130 4140 4150
M702T7 82 73 + CGTAAT4GCGTATCGGCTT4GA2221CGATGA
M702E7GI 106 93 + CGT3AT-4CGTATCG-1TTGGAT2T1CGATGA
1806N3GI 105 92 - CGTAATGGCGTATCGG1TTGGATTTCCGATGATTTGGCTCCGGGATGTAC
14F3EST3 134 116 + CGTAACGGCGTATCGGCTTGGATTTCCGATGATTTGGCTCCGGGATGTAC
39A3EST3 135 117 + CGTAACGGCGTATCGGCTTGGATTTCCGATGATTTGGCTCCGGGATGTAC
40A3EST3 136 120 + CGTAAC44C4TATCG-CTTGGATTTCCGATGATTTG4CTCCGGGATGTAC
M1801T3N 84 77 - C4TAATH-C4TATCG4D2TGGATTTCCGATGATTTGGCTCCGGGATGTAC
6F1ESCMP 138 119 + CG2AACGGCGTATCGGCTTGGATTTCCGATGATTTGGCTCCGGGATGTAC
40A2ESRC 137 118 - GCTCCGGGATGTAC
M1702CT3 83 74 + AC
M1702D3I 97 86 + AC
CONSENSUS CGTAA-GGCGTATCGGCTTGGATTTCCGATGATTTGGCTCCGGGATGTAC

4160 4170 4180 4190 4200
1806N3GI 105 92 - GACATAGCTGAAGATGGTTGGAGTTTGGTGGACCACCGCTAGCAAAATAC
14F3EST3 134 116 + GACATAGCTGAAGATGGTTGGAGTTT-4 ACCACCGCTAGCAAAATAC
39A3EST3 135 117 + GACATAGCTGAAGATGGTTGGAGTTT4GT--ACCACCGCTAGCAAAATAC
40A3EST3 136 120 + GACATAGCTGAAGATGGTT4GAGTTT ACCACCGCTAGCAAAATAC
M1801T3N 84 77 - GACATAGCT4AAGATGGTTGGANTLTGGTGGACCACCGCTAGCAAAATAC
6F1ESCMP 138 119 + GACATAGCTGAAGATGGTTGGAGTTT-4T-GACCACCGCTAGCAAAATAC
40A2ESRC 137 118 - 4ACATAGCTGAAGATGGTTGGAGTTTGGTGGACCACCGCTAGCAAAATAC
M1702CT3 83 74 + GACATAGCTGAAGA7GG77GNAG7T7GG-GGACCACCR5TAGCAAAA7A5
M1702D3I 97 86 + 6ACATAGCTGAAGAT4GTTGGAGTT244T4GACCACCGCTAGCAAAATAC
CONSENSUS GACATAGCTGAAGATGGTTGGAGTTTGGTGGACCACCGCTAGCAAAATAC

4210 4220 4230 4240 4250
1806N3GI 105 92 - ACTCTGTGTGGGG1GTGCTAGTGGATAGTCATGTATGTTTGAGATGGGT2
14F3EST3 134 116 + 3CTCTGTGTGGG4C4TGCTAGTGGATAGTCATGTATGTTTGAGATGGGTT
39A3EST3 135 117 + 3CTCTGTGTGGG4C-T4CTAGTGGATAGTCATGTATGTTTGAGATGGGTT
40A3EST3 136 120 + -CTCTG2GTGGG-C-T-CTAGTGGATAGTCATGTATGTTTGAGAT4GGTT
M1801T3N 84 77 - ACTCTGTGTGGGGC4TGCTAGTGGATAGTCAT8TATGTTTGAGATGGGTT
6F1ESCMP 138 119 + ACTCTGTGTGGG4C4TGCTAGTGGATAGTCATGTATGTTTGAGATGGGTT
40A2ESRC 137 118 - ACTCTGTGTGGGGCGTGCTAGTGGATAGTCATGTATGTTTGAGATGGGTT
M1702CT3 83 74 + ACT5TGTGTGGGGCGTGCTAG7GGBTAG7CBTG7ATGTT7GAGATGGGTY
M1702D3I 97 86 + ACTCTGTG244G4C424CTAGTGGAT7GTCATGTAT4TT24AGAT-GGTT
CONSENSUS ACTCTGTGTGGGGCGTGCTAGTGGATAGTCATGTATGTTTGAGATGGGTT

4260 4270 4280 4290 4300
1806N3GI 105 92 - ATAGG
14F3EST3 134 116 + ATAGGCCCATCCA
39A3EST3 135 117 + ATAG4CCCATCCC-CCTA
40A3EST3 136 120 + AT3G4CCCAT-CC-CCTA
M1801T3N 84 77 - ATAGG
6F1ESCMP 138 119 + ATAG4CCCATCCA
40A2ESRC 137 118 - ATAGGCCCATCCCGCCTA
M1702CT3 83 74 + ATAGRCCCATCCCT7TA
M1702D3I 97 86 + AT3G4CCCA2YCCTTTA
CONSENSUS ATAGGCCCATCC--C--A

Bibliography

[1] Argos, Patrick. (1988). A sequence motif in many polymerases. Nucleic Acids Research, 16(21):9909-9916.
[2] Atkinson, T. and Smith, M. (1984). Purification of oligonucleotides obtained by small scale (0.2 µmol) automated synthesis by gel electrophoresis. In Gait, M. J., editor, Oligonucleotide Synthesis: A Practical Approach, pages 35-81, IRL Press, Oxford.
[3] Avgelis, A. (1985). Occurrence of melon necrotic spot virus in Crete (Greece).
Phytopathologische Zeitschrift, 114(4):365-372.
[4] Bailey, James M. and Davidson, Norman. (1976). Methylmercury as a reversible denaturing agent for agarose gel electrophoresis. Analytical Biochemistry, 70:75-85.
[5] Baltimore, David. (1966). Purification and properties of poliovirus double-stranded ribonucleic acid. Journal of Molecular Biology, 18:421-428.
[6] Barr, Philip J., Thayer, Richard M., Laybourn, Paul, Najarian, Richard C., Seela, Frank, and Tolan, Dean R. (1986). 7-deaza-2'-deoxyguanosine-5'-triphosphate: enhanced resolution in M13 dideoxy sequencing. BioTechniques, 4(5):428-432.
[7] Bos, L., van Dorst, H. J. M., Huttinga, H., and Maat, D. Z. (1984). Further characterization of melon necrotic spot virus causing severe disease in glasshouse cucumbers in the Netherlands and its control. Netherlands Journal of Plant Pathology, 90(2):55-69.
[8] Carrington, J. C. and Morris, T. J. (1984). Complementary DNA cloning and analysis of carnation mottle virus RNA. Virology, 139:22-31.
[9] Carrington, J. C. and Morris, T. J. (1985). Characterization of the cell-free translation products of carnation mottle virus genomic and subgenomic RNAs. Virology, 144:1-10.
[10] Carrington, J. C. and Morris, T. J. (1986). High resolution mapping of carnation mottle virus-associated RNAs. Virology, 150:196-206.
[11] Carrington, J. C., Morris, T. J., Stockley, P. G., and Harrison, S. C. (1987). Structure and assembly of turnip crinkle virus IV — analysis of the coat protein gene and implications of the subunit primary structure. Journal of Molecular Biology, 194:265-276.
[12] Carrington, James C., Heaton, Louis A., Zuidema, Douwe, Hillman, Bradley I., and Morris, T. J. (1989). The genome structure of turnip crinkle virus. Virology, 170:219-226.
[13] Coudriet, D. L., Kishaba, A. N., and Carroll, J. E. (1979). Transmission of muskmelon necrotic spot virus in muskmelons by cucumber beetles. Journal of Economic Entomology, 72(4):560-561.
[14] Dayhoff, M. O., Schwartz, R. M., and Orcutt, B. C. (1978). A model of evolutionary change in proteins. In Dayhoff, M. O., editor, Atlas of Protein Sequence and Structure, Volume 5, Supplement 3, pages 345-352, National Biomedical Research Foundation, Washington, D.C.
[15] Devereux, John, Haeberli, Paul, and Smithies, Oliver. (1984). A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Research, 12(1):387-395.
[16] Dias, H. F. and McKeen, C. D. (1972). Cucumber necrosis virus. AAB Descriptions of Plant Viruses, (82).
[17] Donis-Keller, Helen. (1980). Phy M: an RNase activity specific for U and A residues useful in RNA sequence analysis. Nucleic Acids Research, 8(14):3133-3142.
[18] Dougherty, William G. and Hiebert, Ernest. (1985). Genome structure and gene expression of plant RNA viruses. In Davies, J. W., editor, Molecular Plant Virology, Volume II, pages 23-81, CRC Press, Boca Raton, Florida.
[19] Dougherty, William G. and Kaesberg, Paul. (1981). Turnip crinkle virus RNA and its translation in rabbit reticulocyte and wheat embryo extracts. Virology, 115:45-56.
[20] Feng, Da-Fei and Doolittle, Russell F. (1987). Progressive sequence alignment as a prerequisite to correct phylogenetic trees. Journal of Molecular Evolution, 25:351-360.
[21] Garger, S. J., Griffith, O. M., and Grill, L. K. (1983). Rapid purification of plasmid DNA by a single centrifugation in a two-step cesium chloride-ethidium bromide gradient. Biochemical and Biophysical Research Communications, 117(3):835-842.
[22] Geliebter, Jan. (1987). Dideoxynucleotide sequencing of RNA and uncloned cDNA. Focus, 9(1):5-8.
[23] Goldbach, R. W. (1986). Molecular evolution of plant viruses. Annual Review of Phytopathology, 24:289-310.
[24] Goldbach, R. (1987). Genome similarities between plant and animal viruses. Microbiological Sciences, 4(7):197-202.
[25] Gonzalez-Garza, R., Gumpf, D. J., Kishaba, A. N., and Bohn, G. W. (1979). Identification, seed transmission, and host range pathogenicity of a California isolate of melon necrotic spot virus. Phytopathology, 69(4):340-345.
[26] Gorbalenya, Alexander E., Koonin, Eugene V., Donchenko, Alexei P., and Blinov, Vladimir M. (1988). A conserved NTP-motif in putative helicases. Nature, 333:22.
[27] Gorbalenya, Alexander E., Blinov, Vladimir M., Donchenko, Alexei P., and Koonin, Eugene V. (1989). An NTP-binding motif is the most conserved sequence in a highly diverged monophyletic group of proteins involved in positive strand viral replication. Journal of Molecular Evolution, 28:256-268.
[28] Guilley, H., Carrington, J. C., Balazs, E., Jonard, G., Richards, K., and Morris, T. J. (1985). Nucleotide sequence and genome organization of carnation mottle virus RNA. Nucleic Acids Research, 13(18):6663-6677.
[29] Hammond, Alan W. and D'Alessio, James M. (1986). Removal of 5' extensions with mung bean nuclease using positive selection plasmids. Focus, 8(4):4-6.
[30] Harbison, S. A., Davies, J. W., and Wilson, T. M. A. (1985). Expression of high molecular weight polypeptides by carnation mottle virus DNA. Journal of General Virology, 66:2597-2604.
[31] Henikoff, Steven. (1984). Unidirectional digestion with exonuclease III creates targeted breakpoints for DNA sequencing. Gene, 28:351-359.
[32] Hibi, Tadaaki. (1986). Physical and chemical properties of several plant viruses. In International Symposium on Virus Diseases of Rice and Leguminous Crops in the Tropics, pages 86-91, Tropical Agriculture Research Centre, Tsukuba, Ibaraki, Japan.
[33] Hibi, T. and Furuki, I. (1985). Melon necrotic spot virus. AAB Descriptions of Plant Viruses, (302).
[34] Hillman, Bradley I., Hearne, Patrick, Rochon, D'Ann, and Morris, Thomas J. (1989). Organization of tomato bushy stunt virus genome: characterization of the coat protein gene and the 3' terminus. Virology, 169:42-50.
[35] Hodgman, T. C. (1988). A new superfamily of replicative proteins. Nature, 333:22-23, 578.
[36] Holland, John, Spindler, Katherine, Horodyski, Frank, Grabau, Elizabeth, Nichol, Stuart, and VandePol, Scott. (1982). Rapid evolution of RNA genomes. Science, 215:1577-1585.
[37] Joshi, Sadhna and Haenni, Anne-Lise. (1984). Plant RNA viruses: strategies of expression and regulation of viral genes. FEBS Letters, 177(2):163-174.
[38] Kamer, Gregory and Argos, Patrick. (1984). Primary structural comparison of RNA-dependent polymerases from plant, animal and bacterial viruses. Nucleic Acids Research, 12(18):7269-7282.
[39] Kishi, K. (1960). Annals of the Phytopathological Society of Japan, 25:237-238.
[40] Kishi, Kunihei. (1966). Necrotic spot of melon, a new virus disease. Annals of the Phytopathological Society of Japan, 32(3):138-144.
[41] Korneluk, Robert G., Quan, Frank, and Gravel, Roy A. (1985). Rapid and reliable dideoxy sequencing of double-stranded DNA. Gene, 40:317-323.
[42] Kozak, Marilyn. (1987). At least six nucleotides preceding the AUG initiator codon enhance translation in mammalian cells. Journal of Molecular Biology, 196:947-950.
[43] Lütcke, H. A., Chow, K. C., Mickel, F. S., Moss, K. A., Kern, H. F., and Scheele, G. A. (1987). Selection of AUG initiation codons differs in plants and animals. The EMBO Journal, 6(1):43-48.
[44] Maniatis, T., Fritsch, E. F., and Sambrook, J. (1982). Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, New York.
[45] Martin, Richard. (1987). Overcoming DNA sequencing artifacts: stops and compressions. Focus, 9(1):8-9.
[46] Matthews, R. E. F. (1981). Plant Virology. Academic Press, New York, second edition.
[47] Matthews, R. E. F. (1982). Classification and nomenclature of viruses — fourth report of the international committee on taxonomy of viruses. Intervirology, 17, No. 1-3.
[48] McClure, M. A., Johnson, M. S., Feng, D.-F., and Doolittle, R. F. (1988). Sequence comparisons of retroviral proteins: Relative rates of change and general phylogeny. Proceedings of the National Academy of Sciences U.S.A., 85:2469-2473.
[49] McMaster, Gary K. and Carmichael, Gordon C. (1977). Analysis of single- and double-stranded nucleic acids on polyacrylamide and agarose gels by using glyoxal and acridine orange. Proceedings of the National Academy of Sciences U.S.A., 74(11):4835-4838.
[50] Miller, W. Allen, Bujarski, Jozef J., Dreher, Theo W., and Hall, Timothy C. (1986). Minus-strand initiation by brome mosaic virus replicase within the 3' tRNA-like structure of native and modified RNA templates. Journal of Molecular Biology, 187:537-546.
[51] Miller, W. A., Waterhouse, P. M., and Gerlach, W. L. (1988). Sequence and organization of barley yellow dwarf virus genomic RNA. Nucleic Acids Research, 16(13):6097-6111.
[52] Morris, T. J. and Carrington, J. C. (1988). Carnation mottle virus and viruses with similar properties. In Koenig, Renate, editor, The Plant Viruses, Volume 3, pages 73-112, Plenum Publishing Corporation, New York.
[53] Morrison, D. A. (1979). Transformation and preservation of competent bacterial cells by freezing. In Methods in Enzymology, Volume 68, pages 326-331, Academic Press.
[54] Needleman, Saul B. and Wunsch, Christian D. (1970). General method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 48:443-453.
[55] Nutter, R. C., Scheets, K., Panganiban, L. C., and Lommel, S. A. (1989). The complete nucleotide sequence of the maize chlorotic mottle virus genome. Nucleic Acids Research, 17(8):3163-3177.
[56] Pearson, William R. and Lipman, David J. (1988). Improved tools for biological sequence comparison. Proceedings of the National Academy of Sciences U.S.A., 85:2444-2448.
[57] Pelham, Hugh R. B. (1978). Leaky UAG termination codon in tobacco mosaic virus RNA. Nature, 272:469-471.
[58] Pot, Jerina. (1987). Molecular characterization of melon necrotic spot virus and comparison of the genome with the genomes of four similar viruses.
Master's thesis, Agricultural University of Wageningen, The Netherlands. [59] Pustell, J. and Kafatos, F. C. (1984). Convenient and adaptable package of computer programs for DNA protein sequence management, analysis and homology determi-nation. Nucleic Acids Research, 12:643-655. [60] Rao, A. L. N., Dreher, T. W., Marsh, L. E., and Hall, T. C. (1989). Telomeric function of the tRNA-like structure of brome mosaic virus RNA. Proceedings of the National Academy Of Sciences U. S. A., 86:5335-5339. Bibliography 101 [61] Rigby, Peter W. J., Dieckmann, Marianne, Rhodes, Carl, and Berg, Paul. (1977). Labelling deoxyribonucleic acid to high specific activity in vitro by nick translation with DNA polymerase I. Journal of Molecular Biology, 113:237-251. [62] Riviere, C. J., Pot, J., Tremaine, J. H., and Rochon, D. M. (1989). Coat protein of the carmovirus melon necrotic spot is more similar to those of tombusviruses than to carmoviruses. Accepted for publication in Journal of General Virology. [63] Rochon, D'Ann M. (1989). Personal communication. [64] Rochon, D'Ann and Tremaine, Jack H. (1988). Cucumber necrosis virus is a member of the tombusvirus group. Journal of General Virology, 69:395-400. [65] Rochon, D'Ann M. and Tremaine, Jack H. (1989). Complete nucleotide sequence of the cucumber necrosis virus genome. Virology, 169:251-259. [66] Ryden, Kerstin and Persson, Paula. (1986). Nekrosflacksjuka hos melon — en ny virussjukdom i Sverige (melon necrotic spot virus — a new virus disease in Sweden). Vdxtskyddnotiser, 50(4-5):130-132. [67] Sanger, F., Nicklen, S., and Coulson, A. R. (1977). DNA sequencing with chain-terminating inhibitors. Proceedings of the National Academy Of Sciences U. S. A., 74:5463-5467. [68] Siegel, Albert, Hari, V., Montgomery, Ilene, and Kolacz, Kathryn. (1976). A mes-senger RNA for capsid protein isolated from tobacco mosaic virus-infected tissue. Virology, 73:363-371. [69] Simoncsits, A., Brownlee, G. G., Brown, R. S., Rubin, J. R., and Guilley, H. 
(1977). New rapid gel sequencing method for RNA. Nature, 269:833-836. [70] Staden, Rodger. (1982a). An interactive graphics program for comparing and align-ing nucleic acid and amino acid sequences. Nucleic Acids Research, 10(9):2951—2961. [71] Staden, Roger. (1982b). Automation of the computer handling of gel reading data produced by the shotgun method of DNA sequencing. Nucleic Acids Research, 10(15):4731-4751. [72] Strauss, James H. and Strauss, Ellen G. (1988). A messenger RNA for capsid protein isolated from tobacco mosaic virus-infected tissue. Annual Review of Microbiology, 42:657-683. [73] Suggs, Sidney V., Hirose, Tadaaki, Miyake, Tetsuo, Kawashima, Eric H., John-son, Merrie Jo, Itakura, Keiichi, and Wallace, R. Bruce. (1981). Use of synthetic oligodeoxyribonucleotides for the isolation of specific cloned DNA sequences. In Developmental Biology Using Purified Genes, pages 683-693, Academic Press. Bibliography 102 [74] Tabor, Stanley and Richardson, Charles C. (1987). DNA sequence analysis with a modified bacteriophage T7 DNA polymerase. Proceedings of the National Academy Of Sciences U. S. A., 84:4767-4771. [75] Tautz, Diethard and Renz, Manfred. (1983). An optimized freeze-squeeze method for the recovery of DNA fragments from agarose gels. Analytical Biochemistry, 132:14-19. [76] Thomas, Barry J. and Tomlinson, J. A. (1985). Studies on melon necrotic spot virus and its vector — occurrence and transmission by the fungus Olpidium radi-cals In Annual Report Glasshouse Crops Research Institute 1984, pages 108-114, Littlehampton, Sussex, U. K. [77] Tomlinson, J. A. and Faithfull, Elizabeth M. (1985). Melon necrotic spot virus disease of cucumber. In National Vegetable Research Station Annual Report 1984, pages 88-89, Wellesbourne, Warwick, U. K. [78] Toneguzzo, F., Glynn, S., Levi, E., Mjolsness, S., and Hayday, A. (1988). Use of a chemically modified T7 DNA polymerase for manual and automated sequencing of supercoiled DNA. Bio Techniques, 6(5):460-469. 
[79] Tremaine, J. H. (1989). Personal communication. [80] Tremaine, J. H., Ronald, W. P., and McGauley, E. M. (1983). Effects of sodium dextran sulfate on some isometric plant viruses. Phytopathology, 73(9):1241-1246. [81] Vogelstein, Bert and Gillespie, David. (1979). Preparative and analytical purification of DNA from agarose. Proceedings of the National Academy Of Sciences U. S. A., 76(2):615-619. [82] Vrati, Sudhanshu, Mann, David A., and Reed, Ken C. (1987). Alkaline north-ern blots: transfer of RNA from agarose gels to Zeta-Probe™ membrane in dilute NaOH. Molecular Biology Reports, l(3):l-4. [83] Zaitlin, Milton and Hull, Roger. (1987). Plant virus-host interactions. Annual Review of Plant Physiology, 38:291-315. [84] Zimmern, David. (1988). Evolution of RNA viruses. In Domingo, Esteban, ed-itor, RNA Genetics, Volume II: Retroviruses, Viroids, and RNA Recombination, pages 211-240, CRC Press, Boca Raton, Florida. 
