UBC Theses and Dissertations

Characterization of an 11S legumin-like storage protein gene from the gymnosperm Picea glauca
Márquez García, Magdalena Ivonne, 1994

CHARACTERIZATION OF AN 11S LEGUMIN-LIKE STORAGE PROTEIN GENE FROM THE GYMNOSPERM PICEA GLAUCA

by

MAGDALENA IVONNE MARQUEZ GARCIA

B.Sc. (Biol.), National Autonomous University of Mexico (UNAM), 1987

A thesis submitted in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

THE FACULTY OF GRADUATE STUDIES

GENETICS

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA

May 1994

© Magdalena Ivonne Márquez-García, 1994

In presenting this thesis in partial fulfillment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

(Signature) ____________________

Department of _________________

The University of British Columbia
Vancouver, Canada

Date

Abstract

The amino acid sequence homologies of seed storage proteins in all seed plants, including gymnosperms, suggest that they evolved from a common ancestor. Seed storage protein genes have been extensively studied in angiosperms; however, no data regarding the structural organization of the genes in gymnosperms are available. This is the first report of a gymnosperm seed storage protein gene. A λ genomic clone containing a Picea 11S legumin gene was isolated and characterized. The organization of the gene was found to be similar to that of angiosperm legumin genes. The nucleotide sequence contains five exons and four introns. The number of introns differs from that in angiosperms; however, the position of the first three introns is highly conserved, as it is among angiosperms. A deletion was found in the third exon. The possibility of this deletion being a cloning artifact is discussed.
The deduced amino acid sequence is 509 amino acids long and is 98.7% identical to a previously characterized legumin cDNA from Picea glauca. Amino acid comparisons of the legumin genes of Picea and other species showed the presence of highly conserved sequences. Putative regulatory sequences were found in the 5′ flanking sequence of the Picea 11S legumin gene by comparisons between the Picea 11S legumin promoter and angiosperm SSP promoters.

Table of Contents

Abstract
List of Tables
List of figures
Acknowledgment
1 INTRODUCTION
2 LITERATURE REVIEW
2.1. SEED STORAGE PROTEIN CHARACTERISTICS AND CLASSIFICATION
2.2. GLOBULIN PROTEINS IN GYMNOSPERMS
2.3. GLOBULIN STRUCTURE
2.4. DOMAIN ORGANIZATION
2.5. SYNTHESIS AND DEPOSITION OF SEED STORAGE PROTEINS DURING ANGIOSPERM AND GYMNOSPERM DEVELOPMENT
2.6. GENE REGULATION
2.6.1. ABA REGULATION
2.7. SSP ARE MEMBERS OF MULTIGENE FAMILIES
2.8. GENE STRUCTURE
2.9. REGULATORY SEQUENCES
2.10. TISSUE SPECIFICITY AND TEMPORAL REGULATION
2.11. THE ROLE OF CIS-ACTING ELEMENTS AND CONSERVED MOTIFS ON THE REGULATION OF LEGUMIN GENE EXPRESSION
2.12. DNA-BINDING PROTEINS
3 MATERIAL AND METHODS
3.1. 11S LEGUMIN GENOMIC DNA ISOLATION
3.2. RANDOM LABELING
3.3. λ-GENOMIC DNA CHARACTERIZATION
3.4. λ-DNA PREPARATION
3.5. EXTRACTION OF λ-DNA
3.6. CsCl DNA PURIFICATION
3.7. SOUTHERN BLOT
3.8. MAPPING THE λ-GENOMIC DNA
3.9. PLASMID DNA CLONING
3.10. VECTOR PREPARATION
3.11. LIGATION OF VECTOR AND INSERT DNA
3.12. COMPETENT CELLS PREPARATION
3.13. ISOLATION OF PLASMID DNA BY ALKALI METHOD
3.14. GENERATION OF UNIDIRECTIONAL DELETION CONSTRUCTS FOR SEQUENCING
3.15. SEQUENCING METHODOLOGY
3.16. SEQUENCING GELS AND ELECTROPHORESIS
3.17. PRIMER EXTENSION
4 RESULTS
4.1. IDENTIFICATION OF A GENOMIC CLONE CONTAINING THE 11S LEGUMIN GENE
4.2. 11S LEGUMIN CODING REGION
4.3. STRUCTURAL ORGANIZATION OF THE PICEA 11S LEGUMIN GENE
4.4. PICEA GLAUCA 11S LEGUMIN AMINO ACID SEQUENCE
4.4.1. AMINO ACID COMPARISONS REVEAL CONSERVATION OF HIGHLY CONSERVED SEQUENCES
4.5. THE PICEA GLAUCA LEGUMIN PROMOTER REGION
4.5.1. PUTATIVE REGULATORY SEQUENCES
5 DISCUSSION
6 REFERENCES

List of Tables

Table 4.1. Amino acid composition of the Picea 11S legumin protein
Table 4.2. Percentage of amino acid identity among legumin proteins: a) Picea glauca versus gymnosperms; b) Picea versus dicots and monocots
Table 4.3. Putative regulatory sequences in the Picea glauca 11S legumin promoter

List of figures

Figure 1.1. Pathway for synthesis and processing of 11S seed storage globulins
Figure 4.1. Restriction enzyme digests and Southern analysis of two λ-clones, (1) λI5H-1 and (2) λI5H-2, containing the 11S legumin gene from Picea glauca
Figure 4.2. Restriction digests and Southern analysis of λ genomic clone containing a Picea glauca 11S legumin gene
Figure 4.3. Restriction map of the λ-genomic spruce clone λI5H-1
Figure 4.4. Nucleotide sequence of genomic DNA clones S3.7 and E2.8 from Picea glauca 11S legumin storage protein, and deduced amino acid sequence
Figure 4.5. Comparison of 11S legumin genes from Picea glauca and angiosperm subfamilies A and B
Figure 4.6. Comparison of 11S legumin intron flanking sequences
Figure 4.7. Deduced amino acid sequence of spruce 11S legumin
Figure 4.8. Amino acid alignment of 11S legumin proteins from Picea, Pseudotsuga, Pinus strobus, cotton (goshi), .vioat (orysa), Arabidopsis, pea, sunflower (helianthinin)
Figure 4.9. Determination of the transcription start site (+1)
Figure 4.10. Location of putative regulatory sequences on the promoter of Picea 11S legumin gene

Acknowledgments

After three years of work it is difficult to list all those to whom I owe thanks. So I will begin at the beginning. I could not have undertaken these studies without assurance of financial support. I thank the Government of Canada Award Program and the CONACyT Awards Program, and I am grateful for the loan received from Banco de Mexico.
I am perhaps most indebted to BC Research for financial support, for the use of facilities and for friends and colleagues. Special thanks to Ben, who offered me his supervision and great help, and to Craig, who taught me as much as I could take. I thank John Carlson, my academic supervisor, for accepting me as a student, and especially for his help at the end of my thesis.

I would like to thank my professors for sharing their knowledge, even though I had a hard time at the beginning due to my difficulties with English.

Very special thanks to my parents and to all my brothers and sisters, whose care and moral support accompanied me to the end. To Ian, who at the end of this journey brought so much happiness to my life.

To Stephanie, Melody and Sheila, for their cultural and language teachings. To my “Latin Family” in Vancouver: Victor and Lety, Nilda, Celia, Jaime, Lynn, Trini, Jorge M. and Jorge CH., Ivan, Oliva, Ricardo, Gloria, Andrea and Oscar, especially for their care, and for their financial support when I needed it most.

Chapter 1

INTRODUCTION

Seed storage proteins (SSP) are an important constituent of angiosperm and gymnosperm seeds, and of the spores of more distantly related ferns. They provide nutrients for the germination and post-germination processes necessary for the propagation of the species. Seeds contain 10 to 50% protein, most of which is storage protein (Shotwell and Larkins, 1989). These proteins have been extensively studied in angiosperms because of their economic importance, but few studies have concentrated on gymnosperm storage proteins. Even though angiosperms and gymnosperms diverged from one another 330 million years ago, the function and characteristics of SSP remain the same in both groups. It has been demonstrated that during the maturation of spruce somatic embryos the pattern of accumulation of SSP is similar to that of zygotic embryos (Flinn et al, 1991b).
Redenbaugh et al (1986) have proposed that the quality of somatic embryos depends on the extent of storage protein accumulation. It has been proposed that conifer storage proteins represent useful biochemical markers for developmental studies in zygotic and somatic embryogenesis (Flinn et al, 1991b; Flinn et al, 1993). At the amino acid level some SSP sequences have been highly conserved among monocots and dicots. Three recent papers have shown that gymnosperms also share some of these regions of conserved sequence (Newton et al, 1992; Hager et al, 1992; Leal and Misra, 1993). However, no data regarding SSP gene regulation have been published. To date only two gymnosperm cDNA sequences have been published, both of them from conifer seeds (Newton et al, 1992; Leal and Misra, 1993). Two recent papers suggested, from analyses at the mRNA level, that SSPs in conifers are transcriptionally regulated (Leal and Misra, 1993; Flinn et al, 1993). These studies on gymnosperms have provided some important information regarding amino acid sequences, mRNA stability and transcription. Nevertheless, nothing is known about the structure and regulation of these genes in gymnosperms.

Plant seed storage proteins are abundant and their synthesis and accumulation are developmentally regulated. These characteristics make them an ideal model system to study gene expression and, in particular, to compare and contrast the structure and organization of the genes, the mechanisms of gene regulation and the evolutionary relatedness between angiosperms and gymnosperms.
However, the genes encoding these proteins in gymnosperms have not been isolated and their structural organization is unknown. Characterization of these genes would allow one to determine whether the SSP genes are structurally similar (exons/introns) between angiosperms and gymnosperms, and whether the cis-acting regulatory elements are similar in both groups.

The development of cDNA libraries, genomic DNA libraries and a high quality embryogenesis system for interior spruce at B.C. Research provides an opportunity to study gene regulation during embryo development in spruce. The contributions of this study include: 1) the characterization of the white spruce 11S legumin gene by sequencing of the coding region and comparison of the structure of the gene to other legumin genes; 2) deduction of the amino acid sequence and comparison to other legumin protein sequences; 3) cloning and sequencing of the promoter of the gene and identification of putative regulatory elements that could be important in the temporal and spatial regulation of the gene.

To date this is the first report that provides direct information on a complete SSP gene in gymnosperms. The sequence of the spruce 11S legumin gene and the comparisons with homologous genes in angiosperms provide information about structure and gene organization. It also provides data regarding putative elements that may play a role in the regulation of the gene.

Chapter 2

LITERATURE REVIEW

2.1 SEED STORAGE PROTEIN CHARACTERISTICS AND CLASSIFICATION

Storage proteins from seeds are classified into albumins, globulins, glutelins and prolamins (Shotwell and Larkins, 1989) on the basis of their distinguishable physicochemical characteristics. These SSP are soluble in water (albumins), salt solutions (globulins), acid or alkali solutions (glutelins) or aqueous alcohol (prolamins) (Bewley and Black, 1985). The predominant proteins in cereals are prolamins and glutelins.
Cereal storage proteins occur predominantly in the endosperm and are limiting in lysine. In dicot seeds the most abundant storage proteins are globulins and albumins, which occur in the cotyledons and are limiting in methionine and cysteine. Within the globulin group, two major forms of salt-soluble proteins are resolved from one another on the basis of sedimentation characteristics, and fall into two size classes, 11S and 7S. Both are insoluble at pH 4.7 in 0.2 M NaCl. Because globulin proteins have been best characterized in legumes, the 11S and the 7S fractions are referred to as legumins and vicilins, respectively. However, other trivial names derived from the genus of the plant are also used. The legumin fraction has a sedimentation constant of 11-13S and a molecular weight of 360 kD, and is composed of six identical subunits of 60 kD. The legumin subunits, which are not glycosylated, have two components: the acidic, or α, subunit of 40 kD and the basic, or β, subunit of 20 kD. These subunits are covalently linked by a single disulfide bond. The vicilin fraction has a 7-9S value and a molecular weight of 180 kD. It is made up of three major subunits, α, α′ and β, of 76 kD, 72 kD and 53 kD.

2.2 GLOBULIN PROTEINS IN GYMNOSPERMS

Studies based on solubility characteristics in conifer seed have shown the presence of globulin (also referred to as crystalloid protein) and albumin type proteins (Flinn et al, 1991a; Stabel et al, 1990; Hakman et al, 1990; Misra and Green, 1990; Green et al, 1991).

Stabel et al (1990) described 3 major storage proteins in Picea abies during somatic embryogenesis. They found accumulation of storage proteins of 42, 33 and 22 kD in mature embryos and degradation upon onset of germination. Hakman et al (1990) described an additional storage protein of 28 kD, and showed that mature embryos contain more storage proteins than immature embryos. This indicates that, as in angiosperms, synthesis and accumulation of storage proteins in gymnosperms occur during embryo maturation.
Storage proteins with similar molecular weights have been described in several Pinus species (Gifford, 1988) and in Picea glauca. These findings suggest that seed storage proteins may be conserved among conifers.

Comparing interior spruce SSP from different embryo stages, Flinn et al (1991a) found by SDS-PAGE that the 41, 33, 24 and 23 kD storage proteins accumulated only in mature somatic and zygotic embryos. These proteins correspond to the storage proteins found in protein bodies isolated from mature seed embryos of interior spruce. The amounts of these storage proteins are modulated by ABA (see section 2.6.1, ABA regulation). Misra and Green (1990) have shown that in the mature seed of white spruce, 70% of the total protein content corresponds to crystalloid proteins, the major storage proteins (35 kD range). Other studies have shown similar results in other conifers such as Pinus, Norway spruce and Douglas fir (Gifford, 1988; Misra and Green, 1991).

Interior spruce globulins (35, 33, 24 and 22 kD) and albumins (41 kD) accumulate at different developmental stages (Flinn et al, 1991b). Albumin-like protein accumulates at later stages of cotyledon maturation, whereas the rest of the storage proteins, as in legumes, start accumulating during early embryo maturation. By two-dimensional electrophoresis, the storage proteins appeared to be composed of various isoforms (Flinn et al, 1991a). By PAGE analysis under non-reducing conditions, Flinn et al (1991a) found a 55-57 kD protein.
The characteristic pattern of storage proteins under reducing conditions was composed of 33, 24 and 22 kD proteins, but no proteins of 55-57 kD, suggesting that disulfide linkages exist between the 33 kD and the 24 and 23 kD proteins, analogous to legumins in angiosperms.

Allona et al (1992) have shown that the SSP content in Pinus pinaster differs from that of other conifers, in that glutelins represent 70% of total protein content while globulins and albumins constitute 26% and 4%, respectively. In this study the authors compared the structure and amino acid sequence of the glutelin protein with those of other plants and concluded that these glutelins are homologous to the 11S legumins. There are two basic differences between the P. pinaster glutelins and the 11S legumin proteins: a) their extraction requires alkali solution (similar to rice glutelins) and b) the larger subunit is basic, whereas it is acidic in the rest of the 11S proteins. These data agree with the results of Jensen and Lixue (1991), in which, of 31 species of Pinaceae studied, all except the 12 Abies species were shown to contain 11S legumins. The Abies species lack 11S legumins but have glutelin-like proteins instead. There are no amino acid sequence data from Abies to compare with Pinus pinaster (Allona et al, 1992) or to define the homology between them. Jensen and Lixue (1991) suggest that the absence of legumin-type proteins in Abies species may be related to the shorter period of viability of these seeds compared to Picea or Pinus.

Legumin-like proteins in seeds of Ginkgo biloba have been reported (Jensen and Berthold, 1989). A 50 kD dimer separates into 28 and 21 kD subunits that are linked by disulfide bonds. The molecular weight, charge, subunit properties and heterogeneity correspond to the legumin-like protein characteristics reported for angiosperms.
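The inference of disulfide-linked subunits rests on simple mass bookkeeping: the bands seen under reducing conditions should add up to the band seen without reduction. The following is a minimal sketch of that check, using only the apparent masses quoted in the text. The function names and the 1 kD tolerance are illustrative assumptions, and gel-based masses are estimates rather than exact values.

```python
# Apparent masses in kD as quoted in the text (SDS-PAGE estimates, not exact masses).
SPRUCE_LARGE = 33
SPRUCE_SMALL = [24, 23, 22]
SPRUCE_NONREDUCED_RANGE = (55, 57)

GINKGO_SUBUNITS = (28, 21)
GINKGO_DIMER = 50


def pair_masses(large, smalls):
    """Mass of each large + small pair, assuming one subunit of each kind per dimer."""
    return [large + small for small in smalls]


def within(value, low, high, tolerance=0):
    """True if value lies in [low - tolerance, high + tolerance]."""
    return low - tolerance <= value <= high + tolerance


spruce_pairs = pair_masses(SPRUCE_LARGE, SPRUCE_SMALL)
spruce_ok = all(within(m, *SPRUCE_NONREDUCED_RANGE) for m in spruce_pairs)

ginkgo_sum = sum(GINKGO_SUBUNITS)
# Allow 1 kD of slack (a hypothetical tolerance) for gel-based estimates.
ginkgo_ok = within(ginkgo_sum, GINKGO_DIMER, GINKGO_DIMER, tolerance=1)

print(spruce_pairs, spruce_ok)
print(ginkgo_sum, ginkgo_ok)
```

Each spruce pairing falls inside the 55-57 kD window, and the two Ginkgo subunits sum to within 1 kD of the reported dimer, which is consistent with one large and one small chain per disulfide-linked unit.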
It has also been demonstrated that the fern Onoclea sensibilis (Templeman and DeMaggio, 1990) contains both globulin storage proteins, 7S and 11S, which are comparable to those reported by Templeman et al (1987) for Matteuccia struthiopteris. The fern Osmunda cinnamomea also contains globulin storage proteins of 5.5S and 11.3S (Templeman and DeMaggio, 1990). The fact that globulin storage proteins share similarities between all plant groups suggests a strong conservation of seed storage proteins during the evolution of seed plants.

2.3 GLOBULIN STRUCTURE

Globulins have been extensively studied in both cereal and dicot seeds. In cereals they are not an important component, whereas in most dicots they account for as much as 80% of the total seed protein. In gymnosperms, it has been shown that globulins and albumins are the major component of seeds [Picea abies (Stabel et al, 1990), Picea species (Gifford, 1988; Flinn et al, 1991a and 1991b; Roberts et al, 1990), several Pinus species (Gifford, 1988; Allona et al, 1992), Douglas fir, Norway spruce (Misra and Green, 1991), Ginkgo biloba (Jensen and Berthold, 1989), and fern species (Templeman and DeMaggio, 1990)].

The vicilin polypeptides are best characterized from various legume seeds (Nielsen, 1989). They are isolated from dilute salt extracts of seed meal as trimers with molecular weights around 180 kD and contain random combinations of non-identical subunits. Each trimer has one or two N-linked glycosyl groups. The primary translation products of the 7S genes are modified co-translationally and post-translationally. The proteins derive from preproteins of 70 kD that, after losing their signal peptide, are cleaved to produce the high molecular weight species (51 kD) and a smaller polypeptide of 20 kD.

Legumin polypeptides have also been best characterized in legume seeds, particularly from soybean and pea (Nielsen et al, 1989).
Based on electron microscopy of sunflower 11S protein (helianthinin), Richelet and co-workers (1980) concluded that each complex is composed of six subunits arranged in two trimers (Nielsen et al, 1989). Similar results were found for rape seed 11S globulin using x-ray scattering (Plietz et al, 1983). The hexamer has a molecular weight of 360 kD. Subunits in the hexamer are not glycosylated and need not all be identical. Different forms of the 11S subunits are part of the multimember families present in several species. Each subunit has two polypeptide components, one with an acidic and the other with a basic isoelectric point. The two components are linked by a single disulfide bond. Legumin subunits in soybean (Nielsen, 1986) can be separated into two groups. Subunits in group I have uniform apparent molecular weight and contain more sulfur than members of group II. Subunits in the same group are 88% to 90% homologous; however, homology among members of different groups is 40% to 50%. The differences between the different members can also be observed at the gene level (see section 2.8).

2.4 DOMAIN ORGANIZATION

A relationship of predicted domain organization between 7S and 11S globulins has been proposed, based on amino acid and physical characteristics for soybean, pea and French bean (Argos et al, 1985). Domain I is the NH2 terminus, which differs significantly between 7S and 11S. Domain II contains common regions, and domain III is the COOH-terminal half and is highly conserved. Nielsen (1986) proposed that the hydrophobic and most highly conserved domain is domain III. Argos et al (1985) proposed that the single disulfide bond between domains I and III plays an important role in maintaining the conformation of the subunit. Evolution of the vicilin and legumin families from a common precursor has been proposed based on amino acid sequence comparisons (Gibbs et al, 1989). The presence of hypervariable regions between domains II and III accounts for the size differences between the two globulins.
The insertions within these regions vary in length, consist largely of repeated aspartate and glutamate residues, are very acidic and are predicted to exist in a helical conformation (for review see Shotwell and Larkins, 1989). By comparing amino acid sequences from different species it has been shown that there are repeats of 8 to 38 amino acids corresponding to the hypervariable region at the end of domain II. These repeats contain a high proportion of polar, mainly acidic residues. Although these characteristics are a common structural feature, the inserts can vary in length, amino acid composition and location within and between species.

2.5 SYNTHESIS AND DEPOSITION OF SEED STORAGE PROTEINS DURING ANGIOSPERM AND GYMNOSPERM DEVELOPMENT

The synthesis of storage protein in seeds is regulated during development. In general, gene expression starts at the end of the mitotic phase and finishes at the end of seed maturation, when seed desiccation takes place. Along with the increase in storage protein formation, proliferation of the rough endoplasmic reticulum takes place (Muntz, 1989). Seed storage proteins are synthesized on membrane-bound cytoplasmic polysomes (Bollini and Chrispeels, 1979) and transferred from their site of synthesis to protein bodies (Nielsen et al, 1989; Bewley and Black, 1985). The mechanism of protein sorting remains unknown. Seeds contain more than one class of storage protein and each has a characteristic temporal accumulation pattern. Despite the differences between angiosperm and gymnosperm embryo development, synthesis and deposition into storage organs are quite similar (Fig 2.1).

Storage globulins are synthesized by membrane-bound polysomes as precursor polypeptides with an NH2-terminal signal sequence. The signal peptide directs the translocation of the nascent polypeptide into the lumen of the endoplasmic reticulum and is co-translationally removed.
Soon after translation is complete the globulin precursors are assembled into trimers within the endoplasmic reticulum and then transported to vacuoles via the Golgi apparatus. Once in the vacuole, the 11S precursors are cleaved into acidic and basic polypeptides, which remain linked by disulfide bonds. After proteolytic processing, the 11S-type trimers assemble into hexamers. Vacuoles subdivide to form protein bodies for the accumulation of storage proteins. Double-labeling of storage proteins of pea has shown that some protein bodies contain both 7S and 11S globulin proteins (Craig and Millerd, 1981). Microscopic analysis of protein bodies from nearly mature embryos of interior spruce (Picea glauca/Picea engelmannii) showed that both globulin proteins were present in the same organelles (Flinn et al, 1991b). Protein bodies are confined to the cotyledon or the triploid endosperm cells in angiosperms. In contrast, the storage tissue of gymnosperm seeds is haploid and homologous to the prothallium of heterosporous ferns. It develops independently and before fertilization of the egg cell (Jensen and Berthold, 1989). Protein bodies have been identified in mature and near-mature seeds and have rarely been reported at very immature stages in angiosperms or gymnosperms.

Many storage proteins undergo post-translational modifications during deposition to convert them to the correct size (Muntz, 1989). The primary translation products of the legumin genes undergo co- and post-translational modifications. A signal sequence that has a hydrophobic component is removed during the synthesis of the precursors, while cleavage to form the acidic and basic polypeptides probably occurs in protein bodies. In angiosperms, cleavage has been reported always to occur between an asparagine and a glycine, with the latter becoming the N-terminus of the basic polypeptide.
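The Asn-Gly cleavage rule can be illustrated with a short sketch that locates candidate Asn-Gly (one-letter code NG) positions in a prolegumin sequence and splits the chain there. The sequence below is a made-up toy string, not a real legumin, and the function names are hypothetical. A real precursor may contain several NG dipeptides, only one of which is the processing site, so this finds candidates only.

```python
def asn_gly_sites(protein):
    """Return 0-based indices i where residue i is Asn (N) and residue i + 1 is Gly (G)."""
    return [i for i in range(len(protein) - 1) if protein[i:i + 2] == "NG"]


def cleave(protein, site):
    """Split after the Asn at `site`, so that Gly becomes the N-terminus of the second chain."""
    return protein[:site + 1], protein[site + 1:]


# Toy sequence (an assumption for illustration only).
toy_prolegumin = "QQRSENGLEETICS"

sites = asn_gly_sites(toy_prolegumin)
acidic_chain, basic_chain = cleave(toy_prolegumin, sites[0])

print(sites)
print(acidic_chain, basic_chain)
```

In this toy case the first chain ends in Asn and the second begins with Gly, matching the pattern reported for angiosperm legumins.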
However, recent data on the legumin-like protein of the gymnosperm Ginkgo biloba have shown that there is an Asn residue at the N-terminus of the basic subunit (Hager et al, 1992).

Fig 2.1 Pathway for synthesis and processing of 11S seed storage globulins. Taken from Shotwell and Larkins, 1989. (In the figure the white areas represent the α subunits, and the black areas the β subunits. RER = rough endoplasmic reticulum.) For each step the figure shows the subunit structure, the oligomer composition and the intracellular compartment (RER, Golgi, protein bodies):
1. Synthesis of preproglobulin
2. Removal of signal peptide
3. Disulfide bond formation
4. 1st assembly into 8S trimers
5. Transport to protein body via Golgi
6. Proteolytic processing
7. 2nd assembly into 11S hexamers

2.6. GENE REGULATION

Genes encoding seed storage proteins have been the subject of intensive studies aimed at understanding gene expression. Seed storage proteins are encoded by a diverse gene set that is highly regulated during the plant life cycle (Goldberg et al, 1989). Seed storage protein genes are expressed in a tissue-specific and developmentally regulated manner and are therefore an excellent system for studying the control of gene expression. The extent to which interactions between the embryo and surrounding tissues regulate development remains uncertain, as do the signals that form the basis of these interactions. However, many attempts to elucidate these processes have been made in angiosperms, especially in the past 10 years. The isolation and characterization of mRNAs and their corresponding cDNAs encoding seed storage proteins have produced a vast amount of information regarding amino acid sequences, number of genes, temporal and spatial regulation, and in many cases data about the structure of the genes themselves.
There is not much information published concerning the genes for SSPs in gymnosperms. The cDNA sequence for the vicilin gene of spruce (Newton et al, 1992), the cDNA sequences for legumin and albumin genes of spruce (Newton, in preparation) and the cDNA sequence for the legumin gene in Douglas fir (Leal and Misra, 1993) have led to interesting comparisons between angiosperms and gymnosperms at this level, which permit speculation about the evolution of these proteins. However, no data from genomic DNA clones have been published; therefore the organization of these genes in gymnosperms remains unknown.

2.6.1 ABA REGULATION

Although the regulation of storage protein genes is influenced by the developmental stage of the seed/embryo, the details of how this occurs are not known (Baumlein et al, 1991). Information on phytohormone action during developmental events has provided a better understanding of embryo development. It has been shown that the phytohormone abscisic acid (ABA) mediates a number of important physiological processes in plants (Finkelstein et al, 1985; Mundy and Chua, 1988). The mode of action of the hormone via receptors and/or transduction pathways remains obscure. Evidence at the physiological level in monocots and dicots indicates that ABA plays a major role in the control of embryo maturation and suppression of precocious germination (Mundy and Chua, 1988). It has also been demonstrated that ABA plays an important role in the proper regulation of gymnosperm embryo maturation (Redenbaugh et al, 1986). Roberts et al (1990) and Flinn et al (1991b) have shown that including ABA during the maturation of spruce somatic embryos results in accumulation of storage proteins and suppression of premature germination.
Globulin proteins, including legumin and vicilin, as well as albumins are accumulated in response to ABA.

ABA has been found to regulate storage protein accumulation during embryogenesis at the transcriptional level in seeds of diverse species of angiosperms (Kuhlemeier et al, 1987; Mundy and Chua, 1988). Exogenous ABA increases precocious accumulation of seed storage protein mRNAs in immature embryos in angiosperms (Finkelstein et al, 1985) and in gymnosperms (Roberts et al, 1990). It has been demonstrated that in the case of legumin, both protein and mRNA accumulate in response to ABA at a specific developmental time in angiosperms (Finkelstein et al, 1985) and gymnosperms (Roberts et al, 1990). Sorbitol treatments produced an increase in ABA that preceded the accumulation of storage protein mRNA, suggesting that osmotically induced ABA stimulates storage protein expression in rapeseed (Wilen et al, 1990). Interior spruce zygotic and somatic embryos have been shown to accumulate legumin, vicilin and albumin storage protein mRNAs from the cotyledon stage to the late embryo maturation stage in the presence of ABA (Flinn et al, 1993). The amount of protein accumulated and the transcript levels of these storage proteins in somatic embryos were dependent on ABA concentration (Roberts et al, 1990; Flinn et al, 1993). Stimulation of storage protein accumulation in excised zygotic embryos by osmotic stress has been demonstrated (Finkelstein et al, 1985). In response to osmotic stress ABA levels in vegetative tissues are increased (Skriver and Mundy, 1990), and it has been suggested that osmotic effects on embryo development are mediated via increased ABA levels. Vicilin and legumin proteins accumulated in broad bean cotyledons in response to high osmoticum (18% sucrose). It has been suggested that the effects of ABA can be triggered by osmotic stress in rice (Bostock and Quatrano, 1992). Recently Flinn et al (1993) have demonstrated that osmotic stress induced storage protein and storage protein transcript accumulation in somatic embryos.
Combined treatment with fluridone (an inhibitor of endogenous ABA biosynthesis) and high osmoticum inhibited the synthesis of storage proteins. These results suggest that, as in angiosperms (Bostock and Quatrano, 1992), gymnosperms may have an ABA pathway that is induced by stress.

2.7 SSP ARE MEMBERS OF MULTIGENE FAMILIES

As in other eukaryotic genomes, multigene families are characteristic of plants. Genes encoding globulin storage proteins (vicilins and legumins) belong to multigene families (Ellis et al, 1988; Heim et al, 1989), varying from a few to as many as 20 members. Hybridization experiments and cDNA sequence analysis have confirmed that legumin multigene families are divided into two subfamilies, A and B (Baumlein, 1986; Dure III, 1988; Turner et al, 1993; Shotwell and Larkins, 1989; Breen and Crouch, 1992; Depigny-This et al, 1992; Wang et al, 1987; Takaiwa et al, 1991; Shotwell et al, 1990; Pang et al, 1988). The amino acid identity between the two subfamilies is 40 to 50%, whereas between members of the same subfamily it is about 80%. RFLP experiments have also confirmed the presence of multigene families in some species (Pich and Schubert, 1993; Domoney and Casey, 1985; Domoney et al, 1986).

De Pace et al (1991) have shown by in situ hybridization that genes encoding the 2 legumin subfamilies in Vicia faba are arranged in two clusters: the genes encoding legumin A are located on the long arm of the two shortest subtelocentric chromosome pairs, whose centromere is in a less terminal position; those coding for legumin B are located on the non-satellited arm of the longer submetacentric pair. Casey et al (1988) have also shown that the two legumin genes of Pisum sativum are located on two different chromosome pairs.

2.8 GENE STRUCTURE

Genes for 11S legumin subunits share common features. The coding region is approximately 2.7 kb, including 2 or 3 introns (Shirsat et al, 1989; Nielsen et al, 1989) in subfamilies B and A, respectively.
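Intron boundaries in a gene like this can be sanity-checked against the canonical splice-site pattern, in which a nuclear intron begins with GT and ends with AG. The sketch below tests that pattern on a toy gene. The sequence, the coordinates and the function names are assumptions made for illustration and are not taken from any legumin gene.

```python
def follows_gt_ag(intron):
    """True if the intron starts with GT and ends with AG (case-insensitive)."""
    s = intron.upper()
    return len(s) >= 4 and s.startswith("GT") and s.endswith("AG")


def check_introns(gene, introns):
    """Check each intron, given as 0-based half-open (start, end) coordinates in `gene`."""
    return [follows_gt_ag(gene[start:end]) for start, end in introns]


# Toy gene: exon 1 + intron + exon 2 (all invented for illustration).
exon1, intron1, exon2 = "ATGGCC", "GTAAGTTTTCAG", "GGATAA"
toy_gene = exon1 + intron1 + exon2
toy_introns = [(len(exon1), len(exon1) + len(intron1))]

results = check_introns(toy_gene, toy_introns)
print(results)
```

A check of this kind only confirms the two dinucleotides at each boundary. It says nothing about whether intron positions are conserved between genes, which requires aligning the coding sequences.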
The introns are of variable sizes, 70 bp to 600 bp; however, their positions are well conserved for soybean, broad bean, pea, and oilseed rape (Baumlein et al, 1986; Rodin et al, 1992; Sims and Goldberg, 1989). In angiosperms the positions of introns 1 and 2 of subfamily B correspond to the positions of introns 2 and 3 of subfamily A genes (Baumlein et al, 1986). All the intron/exon junctions follow the GT/AG rule for eukaryotes.

2.9 REGULATORY SEQUENCES

Recently, attention has been given to the study of the 5' flanking sequences and to the structural and functional analysis of the upstream region that regulates seed storage protein genes. The use of transgenic plants, such as tobacco, Petunia and Arabidopsis, to investigate control sequences regulating seed storage protein gene expression has revealed an evolutionary conservation of regulatory processes. This includes tissue specificity and temporal regulation of the genes as well as correct regulation and processing of mRNAs and proteins (transient signal cleavage and glycosylation) (Bustos et al, 1991).

2.10 TISSUE SPECIFICITY AND TEMPORAL REGULATION

Fusion experiments of globulin genes to reporter genes and their subsequent introduction into transgenic plants have demonstrated that SSPs can be expressed in the correct size and composition only in mature seeds. Shirsat et al (1989) used a T-DNA construct containing a 3.4 Kb pea LegA fragment fused to a nos reporter gene, which was introduced into tobacco plants via Agrobacterium. They demonstrated that the 3.4 Kb fragment contains all of the information necessary for seed specificity and correct processing of the primary transcript and the legumin precursor. Ellis et al (1988) showed that a 1.2 Kb upstream sequence of the pea LegA gene was able to direct synthesis of the legumin protein in transgenic tobacco.
Shirsat et al (1989) showed by deletion analysis of LegA, and transient expression in transgenic tobacco plants, that transgenic legumin protein was only present in seeds and absent in leaf tissues. Baumlein et al (1991) cloned a 4.7 Kb fragment of LegB from Vicia faba containing the coding region, 2.4 Kb of upstream sequence and 0.3 Kb of 3' sequence, and showed that it was functional after transfer into transgenic tobacco plants and was only expressed in seed tissue. Deletion analyses of legumin genes have defined regions important for high levels of expression. Partially deleted promoter fragments of LegB were inserted in a vector plasmid (pGV18O) in front of the nptII gene and transferred into tobacco via Agrobacterium. Expression was detected by nptII enzyme activity and by in situ hybridization to an antisense RNA probe. The results revealed that, as for the pea LegA gene (Ellis et al, 1988), about 1.2 Kb of the LegB flanking sequence is enough to confer high levels of expression. The possibility of minor positive elements further upstream was suggested. A construct containing only 0.2 Kb of the upstream sequence resulted in a dramatic reduction of nptII activity. Shirsat et al (1989) showed that a 97 bp 5' fragment of pea LegA which contains the CAAT and TATA boxes was not sufficient to induce expression. However, synthesis increased with increasing length of the 5' flanking sequence, suggesting that additional cis elements must be involved. An interesting question arose from these results: what are the DNA sequences involved in the temporal and spatial regulation of these proteins?
To address this question different approaches have been used: a) sequence analysis of legumin promoter regions to define cis-acting sequences; b) in vitro mutagenesis of specific DNA motifs; and c) mobility shift assays to test the binding of nuclear factors or known transcription factors.

2.11 THE ROLE OF CIS-ACTING ELEMENTS AND CONSERVED MOTIFS ON THE REGULATION OF LEGUMIN GENE EXPRESSION

In the search for specific motifs involved in the regulation of seed storage protein gene expression, several putative sequences have been found. The role of these sequences in transcriptional regulation has been studied and in some cases confirmed. The legumin box is a highly conserved sequence of 28 bp, TCCATAGCCATGCAAGCTGCAGATGTC, present in all legumes studied to date (Riggs et al, 1989; Shirsat et al, 1990; Ericson et al, 1991). Such sequences are also referred to as RY repeats in storage protein genes other than legumin (Dickinson et al, 1988). A 549 bp 5' flanking sequence containing the CAAT box, the TATA box and the legumin box could direct legumin synthesis, suggesting the involvement of the legumin box in regulation of gene expression (Shirsat et al, 1989). This was also suggested by the absence of expression when using a 97 bp 5' sequence (as mentioned above) which contained only 12 bp of the legumin box. Since the legumin box is present not only in legumin genes but in all seed storage protein genes (Riggs et al, 1989; Chamberland et al, 1992), it has been suggested that this sequence has a role in the regulation of tissue specificity. Many attempts have been made to elucidate the function of this sequence. Baumlein et al (1991) observed a 10-fold reduction of expression when using a 200 bp 5' sequence containing the legumin box, and argued that the presence of the legumin box within this sequence plays no role in the high level of expression in developing seeds. Whether its function depends on other cis elements is not clear.
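Sequence analysis of a promoter for a motif such as the legumin box can be sketched in a few lines of Python. This is only an illustration, not the software used in the studies cited: the function names and the toy promoter string are assumptions made for this example, and the motif is copied exactly as printed above. The sketch reports exact matches on both DNA strands.

```python
# Minimal sketch: exact-match search for a motif on both DNA strands.
# LEGUMIN_BOX is copied as printed in the text; the helper names and
# the toy promoter below are hypothetical and for illustration only.

LEGUMIN_BOX = "TCCATAGCCATGCAAGCTGCAGATGTC"


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    complement = str.maketrans("ACGT", "TGCA")
    return seq.upper().translate(complement)[::-1]


def find_motif(sequence, motif):
    """Return sorted (0-based position, strand) pairs for exact matches.

    A palindromic motif is reported once per strand at the same position.
    """
    sequence = sequence.upper()
    hits = []
    for strand, pattern in (("+", motif.upper()), ("-", reverse_complement(motif))):
        start = sequence.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = sequence.find(pattern, start + 1)  # step by 1 to keep overlaps
    return sorted(hits)


if __name__ == "__main__":
    toy_promoter = "AAAT" + LEGUMIN_BOX + "GGG"  # hypothetical test sequence
    print(find_motif(toy_promoter, LEGUMIN_BOX))
```

Exact matching is the simplest choice; a real promoter comparison would normally tolerate mismatches, which this sketch deliberately leaves out.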
However, other studies have suggested that the legumin box plays an important role as an enhancer of gene expression (Lelievre et al, 1992). By comparing the expression of a construct containing the full Gy2 glycinin promoter from soybean with that of the same promoter without the leg-box, Lelievre et al (1992) observed a ten-fold reduction when the element was not present, suggesting that the leg-box has a role in regulating the level of expression of the gene.

Chamberland et al (1992) have shown that the legumin box plays an important role in β-conglycinin transcription. In the soybean β-conglycinin gene there are two well defined legumin boxes, and the mutation of both resulted in a ten-fold reduction in the transcription of the gene.

Three other regulatory elements closely related to the consensus sequence in cereal glutelin genes, TG(T/A/C)AAA(G/A)(G/T), were reported in pea legA between positions -1203 and -549 of the 5' flanking region (Shirsat et al, 1989). This sequence has been implicated in the expression of storage protein genes by nuclear DNA-binding protein experiments as well as by promoter analysis.

2.12 DNA-BINDING PROTEINS

An important step in the signal transduction pathway linking stimulus perception to alterations of eukaryotic gene expression is the binding of nuclear proteins, i.e., trans-acting factors, to specific cis-elements located primarily on sequences 5' to the gene coding region. Evidence accumulated to date indicates that sensitive regulation of transcription in a cell-type or developmentally specific manner is achieved by a multiplicity of interactions between promoter and enhancer sequences and trans-acting factors with either stimulatory or repressive functions (Meakin and Gatehouse, 1991).

Cis-acting elements controlling seed-specific expression have been identified in maize zein, wheat glutenin, barley hordein, oilseed rape napin, soybean lectin, conglycinin and french bean phaseolin genes (Jordano et al, 1989, and references therein).
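Degenerate cis-element consensus sequences, such as the glutelin-like TG(T/A/C)AAA(G/A)(G/T) quoted in section 2.11, can be searched for by converting the parenthesised alternatives into a regular expression. The following Python sketch is an assumption-laden illustration: the function names and the test sequence are hypothetical, and only the standard `re` module is used.

```python
import re

# Consensus copied from the text; alternatives are written as (X/Y/Z).
GLUTELIN_LIKE_CONSENSUS = "TG(T/A/C)AAA(G/A)(G/T)"


def consensus_to_regex(consensus):
    """Turn '(T/A/C)'-style alternatives into regex classes like '[TAC]'."""
    compact = consensus.replace(" ", "").upper()
    return re.sub(
        r"\(([ACGT/]+)\)",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        compact,
    )


def find_consensus(sequence, consensus):
    """Return (0-based start, matched text) for every match, overlaps included."""
    # The lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    toy_sequence = "CCTGTAAAGTCCTGCAAAATGG"  # hypothetical test sequence
    print(consensus_to_regex(GLUTELIN_LIKE_CONSENSUS))
    print(find_consensus(toy_sequence, GLUTELIN_LIKE_CONSENSUS))
```

The sketch searches one strand only; scanning the reverse complement as well would be a natural extension.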
Conserved elements have been postulated to play an important role in activation of gene transcription through the binding of trans-acting nuclear proteins.

Examining sequence-specific DNA-protein interactions by DNA-protein binding and mobility shift assays, Shirsat et al (1990) demonstrated that nuclear proteins strongly bound the -549 bp flanking sequence. However, a truncated -124 bp LegA construct fragment containing the complete leg-box sequence with 6 additional 5' bases did not bind nuclear proteins.

DNA footprinting experiments demonstrated interaction of a nuclear protein from pea seed (LABF1) with the -549 to -316 fragment of the LegA 5' flanking region (Meakin and Gatehouse, 1991). Gel retardation assays showed specific interaction between two LegA promoter fragments (-540 to -316 and -833 to -584) and pea seed nuclear proteins. The promoter sequence of LegA between -316 and +40 did not form stable complexes with seed nuclear protein. Developmental regulation and tissue specificity of the interaction between nuclear proteins and the legA promoter were demonstrated by gel retardation assays. The nuclear protein binding the promoter region had a molecular weight of 84 - 116 KD, determined by elution and renaturation of protein from SDS-PAGE. Its function as a DNA-binding protein was confirmed by competition assays. Meakin and Gatehouse (1991) showed the tissue specificity and developmental regulation of this binding factor. Extracts from peas during development were tested with a probe consisting of the -549 to -316 LegA promoter fragment. The factor was detected in seed extracts 12 to 19 days after anthesis (DAA). The 15 DAA seed extract interacted strongly with the probe. No pea leaf extracts recognized the probe. The evidence that LABF1 was seed specific and that its binding activity was temporally correlated with synthesis of mRNA (Thompson and Larkins, 1989) suggested that it may act as a transcriptional enhancer.
Since a low level of transcription occurs when LABF1 protein is not detectable, transcriptional enhancement rather than induction was suggested.

In studies on sunflower helianthinin gene expression, Jordano et al (1989) detected nuclear proteins that bind an A/T rich region upstream of the helianthinin promoter. Binding competition experiments showed that sunflower embryonic and somatic nuclear proteins bound to the french bean phaseolin gene and to the carrot DcG3 embryo-specific genes, suggesting that binding activities are conserved between plant species. In the same report the authors showed that the sequence containing the protein binding site, fused to a reporter gene (GUS) and driven by a truncated CaMV 35S promoter, enhanced expression in seeds of transgenic tobacco plants.

Elements that bind nuclear proteins in other species are mostly A/T rich, and do not show any particular sequence conservation, e.g. sunflower helianthinin (Jordano et al, 1989), Pha in Phaseolus vulgaris (Riggs et al, 1989), soybean lectin (Jofuku et al, 1987), and two soybean globulin genes (Kitamura et al, 1990; Itoh et al, 1993).

Despite its highly conserved sequence, no proteins binding to the legumin box have been detected (Meakin and Gatehouse, 1991; Shirsat et al, 1990; Itoh et al, 1993). Riggs et al (1989) suggested that, as an alternative to regulation by soluble proteins, the CATGCATG motif may form a Z-DNA structure in vivo. One possibility is that after an activator protein binds to the upstream region (-549 to -316), the CATGCATG motif may adopt an altered conformation that enhances the recognition by or passage of transcriptional complexes. Itoh et al (1993) also suggested that, if the leg box is a binding site for nuclear factors, these factors may be very unstable, or they may require other factors binding at a different site for interaction with these motifs, as found for yeast mating type regulatory proteins.

Chapter 3
MATERIAL AND METHODS

3.1. 11S LEGUMIN GENOMIC DNA ISOLATION

The spruce 11S legumin cDNA XI5H was obtained from Dr. Craig Newton at B.C. Research. The cDNA was labeled and used as a probe to isolate the white spruce 11S legumin λ-genomic clone. The EMBL3 Eastern white spruce total genomic λ-library was constructed by L. DeVerno (PFNI). DNA isolated from a partial Sau3A digest was cloned into a BamHI site.

3.2. RANDOM LABELING

20 ng of XI5H cDNA or E2.8 DNA in 11 µl H2O were heated at 100 °C for 5 min. to separate the double-stranded DNA. After heating, the DNA sample was placed on ice and all the labeling components were added (2 µl 10X labeling buffer, 2 µl dNTPs (2 mM each G, T, C), 1 µl pN6, 1 µl BSA (1 mg/ml), 1 µl 0.1 M DTT, 2 µl α-32P-dATP (5000 Ci/mmol; Dupont), 1 unit of Klenow enzyme (BRL)). The labeling reaction was allowed to proceed at room temperature overnight. The probe was then precipitated twice with half a volume of 7.5 M NH4OAc and 2.5 volumes of cold 95% ethanol, using tRNA (2 mg/ml) as carrier (-80 °C; 30 min.). The sample was centrifuged at 12,000 rpm for 15 min. at 4 °C. The pellet was dried at room temperature and resuspended in 100 µl water. 1 µl of sample was used to verify 32P-dATP incorporation in a liquid scintillation counter.

3.3. λ-GENOMIC DNA CHARACTERIZATION

Three λ clones (XI5H-1, XI5H-2, XI5H-3) provided by Dr. Craig Newton were used to characterize the genomic DNA. λ-phage dilutions (lO-lO plaques/ml of SM (50 mM Tris-HCl pH 7.5, 100 mM NaCl, 10 mM MgSO4)) were mixed with 0.1 ml of ER1647 bacterial host and incubated at 37 °C in 10 ml Falcon tubes for 20 min. Following the incubation, 3 ml of TB top agar (10 g/l tryptone, 5 g/l NaCl, 6 g/l agar) were added and the mixture was plated on TB plates, incubated at 37 °C for 7 hr and placed at 4 °C overnight. One nylon hybridization filter (0.45 µm) was placed on top of each plate for 2 min. to allow phage to adsorb to the filter.
Filters were peeled off the plates and placed DNA side up on 3M Whatman paper soaked in denaturing solution (1.5 M NaCl; 0.5 M NaOH) for 5 min., followed by neutralizing solution (1.5 M NaCl; 0.5 M Tris-HCl pH 7.5) for another 5 min. Filters were air dried on Whatman paper for 20 min. and exposed to UV light for 7 min. to denature DNA. Filters were placed in 10 ml prehybridization solution (5X SSPE; 1% SLS; 0.1% Na-pyrophosphate; 200 µg/ml denatured salmon sperm DNA) for 2 hr at 65 °C, the radiolabeled cDNA probe (see above) was added, and hybridization proceeded overnight at 65 °C. Filters were washed twice with 2X SSC pH 7 (0.15 M NaCl, 0.015 M Na-citrate) and 0.1% SDS for 30 min. at 65 °C, then air dried at room temperature and exposed to Kodak X-ray film with an intensifying screen (-80 °C, overnight). Following overnight exposure the film was developed. The positive plaques were identified by aligning filters to the X-ray film. Then, by aligning filters to agar plates, three single phage plaques were identified, picked, transferred to culture tubes with 1 ml SM each, and shaken for 2 hr at room temperature. Dilutions of phage were made in SM, incubated for 20 min. with bacterial host and plated on TB agar plates. After overnight incubation at 37 °C, the number of plaque forming units (pfu/ml) was determined.

3.4. λ-DNA PREPARATION

5 x 10^9 pfu/ml of XI5H-1 and XI5H-2 were added to a 10 ml Falcon tube containing 10 ml of ER1647 culture grown overnight in SM media (5 x 10^8 cells/ml) containing 4 ml of SM, mixed by inversion and placed at 37 °C for 20 min. Lysates were added to 250 ml TB media and shaken at 2000 rpm (37 °C; 5 hr). 5 ml of chloroform were added to each 250 ml of lysed culture and shaken at 37 °C for 10 min. 1 µg/ml each of DNase I and RNase were added to the lysates and incubated for 30 min. at room temperature (RT), followed by the addition of NaCl to 1 M final concentration, dissolved by swirling, and placed on ice for 1 hr. Lysates were centrifuged (11,000 rpm; 10 min.; 4 °C).
25 g of PEG 8000 (10% final concentration) were added to the supernatant, dissolved by stirring (RT), cooled on iced water and placed in the cold room overnight. After centrifuging (11,000 rpm; 10 min.), pellets were resuspended in 8 ml SM media. 8 ml of chloroform were added and samples centrifuged (3000 rpm; 15 min.; 4 °C), the aqueous phase was recovered, and bacteriophage particles were collected by centrifugation (25,000 rpm; 2 hr; 4 °C) and resuspended in 0.5 ml SM (4 °C; overnight; rocking platform).

3.5. EXTRACTION OF λ-DNA

The bacteriophage solution was gently resuspended in SM media, and EDTA (0.5 M), 5 µl proteinase K (5 mg/ml) and 25 µl SDS 10% were added and incubated for 1 hr at 56 °C. This step was followed by two chloroform:phenol (1:1) extractions and one chloroform extraction. DNA in the aqueous phase was precipitated with half a volume of 7.5 M NH4OAc and 1 volume of isopropanol at -20 °C overnight. DNA was pelleted by centrifugation (14,000 rpm, 20 min.), washed with 70% ethanol, dried, resuspended in TE buffer (10 mM Tris-HCl pH 8.0, 1 mM EDTA) and CsCl purified. A 10 µl sample was digested with EcoRI and HindIII restriction enzymes following the instructions of the supplier (Pharmacia), and run on a 0.8% agarose gel in TEA buffer containing EtBr (0.5 µg/ml).

3.6. CsCl DNA PURIFICATION

λ-DNA and plasmid DNA (E2.8 and S3.7) were CsCl purified. CsCl plasmid DNA preparations were used for the generation of deletion constructs, for Southern blot hybridization, and for sequencing reactions.

After the DNA extraction step, λ or plasmid nucleic acid pellets resuspended in 2.4 ml TE buffer were purified by equilibrium sedimentation in a cesium chloride-ethidium bromide (CsCl-EtBr) gradient. 4.2 g of CsCl and 400 µl of 10 mg/ml EtBr were added to the plasmid solution, which was centrifuged (6000 rpm; RT) in a JA-21 Beckman centrifuge for 10 min.
A Ti70.1 quick-seal ultracentrifuge tube was partially filled with 8 ml of light cesium chloride solution (63 g/100 ml TE), and the DNA plasmid solution was placed at the bottom of the tube. The tube was filled with cesium solution, balanced, sealed and centrifuged for 18 hr (40,000 rpm; 20 °C) in a Ti 70 Beckman ultracentrifuge rotor. After centrifugation, the tube was protected from light, and the position of the plasmid band was determined by exposing the tube to UV light. The lower DNA band was removed from the tube using a 1 ml syringe with a wide bore needle. Three volumes of TE were added and the DNA was extracted 4 times with an equal volume of water-saturated isobutanol. The lower aqueous phase was transferred to a 30 ml Corex tube and precipitated with 3 volumes of cold 95% ethanol overnight at 20 °C. The sample was centrifuged (15,000 rpm; 30 min.), and the pellet was rinsed with cold ethanol, dried at room temperature, and resuspended in TE buffer.

3.7. SOUTHERN BLOT

After digesting 5 µg of DNA each with EcoRI, HindIII, SalI, and combinations of them, following conditions from Promega, samples were loaded on a 1% agarose gel in TBE (0.89 mM Tris-base, 0.89 mM boric acid, 20 µM EDTA) containing 0.5 µg/ml EtBr and run for 4 hr at 75 V/H. The gel was washed in HCl solution (21.5 ml/l) for 30 min., in 3 M NaCl, 1 M NaOH three times for 10 min., denatured in 1 M NH4OAc three times for 10 min., and blotted to a nitrocellulose filter for 4 hr at room temperature. After blotting, the filter was allowed to dry at room temperature and the DNA was fixed for 5 min. under UV light. The filter was then hybridized to the labeled cDNA probe using the same method described above.

3.8. MAPPING THE λ-GENOMIC DNA

In order to generate a map of the legumin genomic DNA, restriction digestions of the λ-DNA and plasmid DNA were performed. In all cases CsCl-purified DNA was used. 5 µg of DNA were digested with each of the following enzymes or combinations of them: EcoRI, SalI, PstI, BamHI, HindIII.
All enzymes were obtained from Promega and restriction digests were carried out as suggested by the supplier. The resulting fragments were separated according to size by electrophoresis through a 0.8% agarose gel cast in 0.5x TBE containing EtBr (0.5 µg/ml). λ-DNA size markers were loaded next to DNA samples and were used as a reference to determine sizes. After electrophoresis was completed the gel was photographed under UV light. The sizes of the DNA fragments were determined manually, by measuring migration distances and referring to the size markers, and also by computer scanning the negative of the photograph through a Sparc 1 (Sun) scanner. After pictures were taken, gels were blotted and hybridized to the cDNA probe as described above.

3.9. PLASMID DNA CLONING

The pEMBL and pGEM-3Z expression systems were convenient cloning vectors because of their multiple cloning sites and their usefulness for deletion experiments and sequencing reactions. Both the DNA fragment and the vector DNA were digested with the same restriction enzyme to produce compatible ends for cloning. λ-DNA was digested with EcoRI or SalI and the products were visualized in 0.8% agarose gels. An EcoRI fragment of 2.8 Kb (E2.8) and a SalI fragment of 3.7 Kb (S3.7) were gel purified (using the Prep-A-Gene kit, Promega) and cloned in the pEMBL vector (EcoRI 2.8 Kb fragment) or the pGEM-3Z vector (SalI 3.7 Kb fragment).

3.10. VECTOR PREPARATION

pEMBL or pGEM vector (10 µg) was digested with EcoRI or SalI as needed, following the Promega instructions to obtain complete digestion, then treated with calf intestinal alkaline phosphatase (CIAP) (0.01 u/pmol ends in 100 µl; 37 °C; 1 hr) to remove 5' phosphate groups and prevent recircularization of the vector during ligation. The CIAP reaction was stopped by adding 2 µl of 0.5 M EDTA. Vector DNA was phenol/chloroform extracted once, the aqueous phase was extracted with chloroform:isoamyl alcohol (24:1), and DNA was precipitated with 0.5 volumes of 7.5 M ammonium acetate and 2 volumes of 95% ethanol (-80 °C, 30 min.).
DNA pellets were collected by centrifugation (12,000 rpm; 10 min.), washed with 95% ethanol, dried and resuspended in H2O. The DNA concentration was determined by absorption spectroscopy (A260).

3.11. LIGATION OF VECTOR AND INSERT DNA

Vector DNA and insert DNA were mixed at 1:1 and 1:3 molar ratios in ligase mix (1 µl ligase 5x buffer, 1 unit DNA ligase, 1 µl 10 mM ATP, H2O to 10 µl) for overnight reaction at room temperature. After ligation reactions were complete, plasmid DNA was transformed into SURE™ competent cells (see competent cells below).

Aliquots of 50 µl of competent cells were thawed on ice and 2.5 µl of the plasmid ligation reaction were added and incubated for 15 min. on ice. To increase transformation efficiency a heat shock at 37 °C for 1 min. was performed, followed by 2 min. on ice. 200 µl of LB medium were added to each tube and incubation at 37 °C was allowed for 45 to 60 min. Cells were plated on LB (10 g/l Bacto-tryptone, 5 g/l Bacto-yeast extract, 5 g/l NaCl, pH 7) plates containing 50 µg/ml ampicillin, 10 µl IPTG (1 M), and 50 µl X-Gal (20 mg/ml) for 14 - 16 hr. Recombinant white colonies were selected and single bacterial colonies were inoculated into 2 ml LB medium containing 50 µg/ml ampicillin and incubated for 8 - 14 hr., followed by miniprep plasmid DNA isolation procedures.

3.12. COMPETENT CELLS PREPARATION

1 ml YT/Mg (20 g/l bacto-tryptone, 5 g/l yeast extract, 5 g/l NaCl, 2.5 g/l MgSO4) was inoculated with 1 SURE™ (Stratagene) colony and grown to mid-log phase. Bacteria were then added to 100 ml warm YT/Mg in a 500 ml flask and grown to A600 = 0.6. Bacterial cells were chilled on ice, pelleted by centrifugation (3,500 rpm, 15 min., 2 °C) and gently resuspended in 40 ml of cold TfBI (30 mM KOAc, 50 mM MnCl2, 100 mM KCl, 10 mM CaCl2, 15% glycerol). The bacterial suspension was centrifuged as above, resuspended in 5 ml of cold TfBII (10 mM Na-MOPS pH 7.0, 75 mM CaCl2, 10 mM KCl, 15% glycerol), aliquoted, frozen in liquid nitrogen and stored at -70 °C.

3.13. ISOLATION OF PLASMID DNA BY ALKALI METHOD

Plasmid DNA was isolated by the alkali lysis procedure described by Maniatis et al (1982). 1.5 ml of the plasmid cultures were centrifuged at 12,000 g for 2 min. in microfuge tubes. The bacterial pellets were resuspended by vortexing in 100 µl ice-cold lysis buffer (25 mM Tris-HCl pH 8.0, 10 mM EDTA, 50 mM glucose), followed by the addition of freshly prepared solution II (0.2 N NaOH, 1% SDS), mixing by inversion and incubating for 2 min. at RT. 150 µl of ice-cold solution III (potassium acetate pH 4.8) were added and mixed by inversion, incubated for 5 min. on ice, and centrifuged at 12,000 g for 5 min. The supernatant was separated and one volume of phenol:chloroform (1:1) was added, vortexed for 1 min. and centrifuged for 5 min. at 12,000 g. The upper aqueous phase was precipitated with 2.5 volumes of ethanol (95%) for 5 min. at RT. DNA was pelleted by centrifugation at 12,000 g for 10 min., washed with 70% ethanol, vacuum dried, and resuspended in 50 µl TE buffer.

3.14. GENERATION OF UNIDIRECTIONAL DELETION CONSTRUCTS FOR SEQUENCING

The Erase-a-Base™ system (Promega) was used for the construction of subclones containing progressive deletions of the legumin gene and promoter, to facilitate the sequence analysis. The system is based on the use of exonuclease III to digest DNA from a 5' protruding or blunt end, while leaving a 4 base 3' protruding end or an α-phosphorothioate filled end intact. The digestion produced a series of deletions of increasing size that were exposed to S1 nuclease, which removed the single-stranded tails remaining from the Exo III digestion. S1 nuclease was neutralized and heat inactivated. Klenow DNA polymerase was added to the reaction to generate blunt ends, which were ligated to circularize the deletion-containing plasmids. Half of each reaction was used to transform SURE™ (Stratagene) competent cells.
After transformation 4 to 10 colonies from each deletion time point were selected and plasmid preparations were performed, followed by restriction enzyme digestion to determine the samples to be sequenced.

Two CsCl-purified DNA inserts were used for sequencing purposes: E2.8 (containing the complete coding region and 0.7 Kb of 3' flanking sequence) and S3.7 (containing 1.4 Kb of the promoter sequence and 2.3 Kb of the coding region). Bacteria containing the E2.8 or S3.7 insert were grown overnight in 250 ml of YT broth (8 g/l bacto-tryptone, 5 g/l bacto-yeast extract, 5 g/l NaCl) with ampicillin added at 100 µg/ml. Following the incubation at 37 °C, nucleic acids were isolated using the alkaline lysis method.

3.15. SEQUENCING METHODOLOGY

The Promega fmol™ sequencing system was used to sequence the 11S legumin gene. The fmol system uses Taq DNA polymerase, which is stable at 95 °C and replicates DNA at 70 °C, and allows use of a thermocycling apparatus (Twin Block™ system, ERICOMP). Three different primers (27mer legumin specific (5'-GCCTAGGCGTTAATTGTCATAGACGTA-3'), 24mer Forward, 20mer Reverse) were end labeled (10 pmoles primer, 10 pmoles γ-ATP, 10X T4 buffer (500 mM Tris-HCl pH 7.5, 100 mM MgCl2, 50 mM DTT, 1 mM spermidine), 5 units T4 polynucleotide kinase, 30 min., 37 °C). The kinase was inactivated at 90 °C for 2 min. and the labeled primers were then used for sequencing purposes. 1-2 µl template DNA were mixed with 4.5 µl of sequencing buffer (250 mM Tris-HCl pH 9.0, 10 mM MgCl2), 1.5 µl labeled primer, H2O to 18 µl, and 1 µl Taq DNA polymerase (5 u/µl).
For each set of reactions, 4 µl of the enzyme/primer/template mix were added to each 0.5 ml Eppendorf tube containing 1 µl of one of the four d/ddNTP mixes (G [40 µM 7-deaza dGTP, 40 µM dATP, 40 µM dTTP, 40 µM dCTP, 60 µM ddGTP], A [40 µM 7-deaza dGTP, 40 µM dATP, 40 µM dTTP, 40 µM dCTP, 700 µM ddATP], T [40 µM 7-deaza dGTP, 40 µM dATP, 40 µM dTTP, 40 µM dCTP, 1200 µM ddTTP], C [40 µM 7-deaza dGTP, 40 µM dATP, 40 µM dTTP, 40 µM dCTP, 400 µM ddCTP]). One drop of mineral oil was added to each tube, and the tubes were spun for 2 sec. and placed in the thermal cycler preheated at 95 °C for 2 min. The PCR program used for the sequencing reactions was as follows: 95 °C for 30 sec. (denaturation), 60 °C for 30 sec. (annealing), 70 °C for 1 min. (extension), for 30 cycles in total. After reactions were completed, 3 µl of stop solution (10 mM NaOH, 95% formamide, 0.05% bromophenol blue, 0.05% xylene cyanole) were added to each tube. Samples were heated for 2 min. at 70 °C just before loading on sequencing gels.

3.16. SEQUENCING GELS AND ELECTROPHORESIS

5 ml of 50% Long Ranger solution (J.T. Baker), 21 g urea, 6 ml 10x TBE and 25 ml H2O were mixed and filtered. 25 µl TEMED and 250 µl of 10% ammonium persulfate were added and the solution was transferred to a 50-60 ml syringe and injected between the sequencing gel glass plates. After gel polymerization (1-2 hr), sequencing reactions were loaded. Electrophoresis was performed using 0.6X TBE running buffer at 30 watts for 3-6 hrs. Once electrophoresis was completed, the plates were separated and the gel was transferred onto Whatman 3M paper, covered with Saran wrap, vacuum dried at 80 °C for 1 hr and exposed to Kodak X-ray film overnight.

3.17 PRIMER EXTENSION

mRNA from white spruce proembryo and mature embryo stages was obtained from Dr. Dave Cyr (BC Research).
3 µg of proembryo and mature embryo mRNAs were used per reaction. Proembryo mRNA was used as a control and no RNA was used as a negative control.

Three samples of mature embryo mRNA, one sample of proembryo mRNA and one sample with no RNA were each mixed with 10 ng of previously labelled 27mer primer (see primer labelling) in 0.3 M NaCl and heated at 80 °C for 60 sec. Immediately afterwards, the mature embryo mRNA samples were placed at 42 °C, 55 °C and 65 °C for 15 min. Proembryo and no-RNA mixtures were placed at 55 °C for 15 min. All samples were removed and the reverse transcriptase mixture (0.1 M KCl, 0.1 M Tris pH 8.5, 0.1 mg/ml BSA, 0.01 M DTT, 0.01 M MgCl2, 0.05 u/ml RNase inhibitor, 250 µM dNTPs, 0.5 units reverse transcriptase enzyme) was added; reactions were allowed to proceed for 1 hr at 42 °C. Reactions were stopped by the addition of RNase A (37 °C, 10 min.) followed by ethanol precipitation (2 vol. 95% ethanol and 1/2 vol. 7.5 M NH4Ac) at -80 °C for 20 min. Samples were redissolved in sequencing running buffer (see sequencing methodology) and run alongside S3.7 sequencing reactions using the same 27mer primer.

Chapter 4
RESULTS

4.1 IDENTIFICATION OF A GENOMIC CLONE CONTAINING THE 11S LEGUMIN GENE

The EMBL3 Eastern white spruce λ-genomic library was screened with the spruce XI5H cDNA probe (11S legumin-like) (C. Newton, unpublished results). Three plaques (-1, -2, -3) that strongly hybridized to the probe were selected. A partial restriction map was obtained for each DNA sample, followed by a Southern blot hybridization using the 5' end of the X15H cDNA as a probe. Phages XI5H-2 and XI5H-3 showed the same restriction pattern and are therefore considered to be identical clones (data not shown). Only one was used for the Southern blot. Fig 4.1 shows restriction digests and Southern blot of λ clones XI5H-1 and -2. The two clones exhibited different restriction patterns. Nevertheless, both hybridized to the probe, although very weakly for XI5H-2, suggesting the presence of at least two members of the legumin family in Picea glauca.
Since XI5H-1 strongly hybridized to the probe, it was selected for further characterization. A restriction map for λ X15H-1 was constructed which showed that this clone has an insert of 17.9 Kb containing a 2.8 Kb EcoRI fragment (E2.8) and a 3.7 Kb SalI fragment (S3.7) that strongly hybridize to the cDNA probe. The λ XI5H-1 restriction map is shown in figure 4.3. Three fragments from XI5H-1 were subcloned in E. coli sequencing vectors: E2.8 was subcloned in a pEMBL vector, S3.7 in a pGEM-3Z vector, and S4.7, which contained 3' flanking sequence, in a pUC9 vector. After subcloning, the 3 plasmid DNA samples were CsCl purified. In order to obtain a full-length genomic DNA sequence, E2.8 and S3.7 were selected. Both of them strongly hybridized to the probe and they were large enough to contain a complete 11S gene based on legumin genes in angiosperms (Ellis et al, 1988; Nielsen et al, 1989). These two fragments were sequenced by the PCR method, and only when the sequence was not clear was the Sequenase kit from USB used for confirmation. The E2.8 DNA was only partially sequenced. E2.8 has a SalI site approximately 0.3 Kb from the 3' end. The E2.8 DNA was sequenced from the 5' end to the SalI site, 2350 bp in length, containing 18 bp of 5' flanking region, 483 bp of 3' non-coding region and an open reading frame of 1867 bp. The S3.7 clone contains 1.4 Kb of promoter region, 1867 bp of coding region and 483 bp of 3' non-coding sequence.

Fig 4.1. Restriction enzyme digests (A) and Southern hybridization (B) of λ clones (1-5) XI5H-1 and (6-10) XI5H-2 presumably containing the 11S legumin gene from Picea glauca. 3 µg of λ-DNA were digested with EcoRI (lanes 1, 2, 6 and 7), HindIII (lanes 3, 5, 8 and 10) or EcoRI/HindIII (lanes 4 and 9), separated in 0.8% agarose gels (A) in the presence of EtBr, and (B) blotted on Hybond N and hybridized to the 32P-labeled 5' cDNA probe. Bordering lanes show DNA markers.

Fig 4.2. Restriction digests and Southern analysis of genomic clone (XI5H-1) containing the Picea glauca 11S legumin gene. 2 µg of λ-DNA were digested with EcoRI (lane 1); EcoRI/HindIII (lane 2); EcoRI/SalI (lane 3); EcoRI/PstI (lane 4); HindIII (lane 5); HindIII/PstI (lane 6); HindIII/SalI (lane 7); PstI (lane 8); SalI (lane 9); or SalI/PstI (lane 10), separated in 0.8% agarose gels (A) in the presence of EtBr, and (B) blotted on Hybond N and hybridized to the 32P-labeled E2.8 probe. λ-DNA markers are shown.

Fig 4.3. Restriction map of the λ-genomic spruce clone XI5H-1. The clones that strongly hybridize to the spruce 11S legumin cDNA are shown (E2.8 and S3.7). S3.7 and E2.8 contain a deletion 77 bp long, marked as a solid bar across them. The cDNA is 1738 bp in length and is shown in the figure for comparison. The empty bars inserted in the cDNA sequence correspond to the introns. The line at the bottom represents the sequence obtained from the three regions; numbers correspond to length in bp. Note that the restriction map does not correspond to the cDNA. LA = λ left arm; RA = λ right arm; E = EcoRI; S = SalI; H = HindIII; B = BamHI.
[Fig 4.3 map labels: Promoter | Coding Region | 3' sequence; lengths as printed: 450 | 1867 | 483]

4.2. 11S LEGUMIN CODING REGION

In order to sequence the complete coding region, nested deletion experiments using the Promega Erase-a-Base system were performed. All deletion products were re-ligated and the ligation reactions were used to transform SURE™ competent cells. White colonies (containing inserts) rather than blue colonies were selected and grown in YT media, and DNA was extracted as detailed in Material and Methods. 2 µg of DNA of each sample were digested with EcoRI and separated in agarose gels to select samples to be sequenced. All sequence data were compared to the cDNA sequence using the NA-align and NA-compare programs. These comparisons showed a 98.7% homology between the cDNA and gDNA, with the exception of a 77 bp deletion. Figure 4.4 gives the complete sequence of the gene including 5' and 3' flanking sequences.
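A comparison of this kind, between an aligned cDNA and genomic sequence, can be sketched in Python as follows. This is not the NA-align or NA-compare software named above: the function names, the toy sequences and the choice to skip gapped columns (so that a deletion is reported separately from substitutions) are all assumptions made for this illustration.

```python
# Minimal sketch: percent identity and substitutions between two
# pre-aligned sequences of equal length, with '-' marking gaps.
# Gapped columns are skipped, which is an assumption of this example.


def percent_identity(aligned_a, aligned_b):
    """Percent of ungapped alignment columns in which both bases agree."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip insertions/deletions
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def substitutions(aligned_a, aligned_b):
    """List (0-based alignment column, base in a, base in b) for mismatches."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(aligned_a.upper(), aligned_b.upper()))
        if a != "-" and b != "-" and a != b
    ]


if __name__ == "__main__":
    cdna = "ACGT-ACGT"   # hypothetical aligned cDNA fragment
    gdna = "ACGTTACGA"   # hypothetical aligned genomic fragment
    print(percent_identity(cdna, gdna))
    print(substitutions(cdna, gdna))
```

Listing the mismatched columns alongside the percentage makes it straightforward to check afterwards which codon position each substitution falls in.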
Ten nucleotides in the genomic DNA differ from the cDNA. Five of these substitutions are in the third position of codons; four of them do not change the amino acid, while one replaces a serine with an arginine. Another three substitutions are in the second position of codons and result in a change to a similar amino acid. One nucleotide that differs from the cDNA in the first position results in a change from an arginine to a cysteine. None of these changes is found in any of the highly conserved regions, and they therefore do not greatly affect amino acid homologies with other legumin genes.

AATATTAACA TTAAAAAT TTATGTAGGA ATATTTAAGC CAATAAAAA TATAAATATT 60
TAAGTAATAA AAAATAAAAA ATATAAAATT TAAGTAATAA ATTTTTTCCT CGTGGAACGT 120
ATTTTTTCCT CGTTAGATGT GAACACATAC ATTGACAGCA GCATTTCCTT AAACAAACAC 180
TCAACTTT ACACGTCGAA TCGTACGACA TTACACGACA CGCCGGAGAG TAGCCGCATC 240
ACACGTGATG AAGATTCCCT TTGGCCTTAA GCCCATGTGG CTCTCAGGAG TAGATATAGC 300
+1
CTTAATCATA TCGCCCTTCG CATGCTATAA_AGCTAATAAT ATTCAACAAC AGCAGGGAC 360
CAGCCTGTGT ATAAAAACAC GAAGAAGCAT CTAGGAATTC AAAACGAAGC AAGAGAAATG 420
AAGGGGAAGA TGATGAGATC AGCGCGTTGT CCACTGATGC AGATACTGTT AATTGCCTCT 480
= H Q ILL IA S 8
GCCTGCTTTC TTTTTCTCTC CCTGTCACT GTATCACCTG TAACTGCAAT TTCCCAGCAA 540
A C F L F L S L S T V S P V T A I S Q Q 28
AGJJGAGGAA GAGGTCGTCG TTACGATGAG CAATCATCGT CATGTCGGAG GCTGCGGCGG 600
R R G R G R R Y D E Q S S S C R R L R R 48
CTAP&GCGCCC ACGAACCGTC TGAATCGGAG ACGATAAGAT CGGATGGTGG CACCTTCGAA 660
L S A H E P S E S E T I R S D G G T F E 68
TTGTCCACTG GAGAGACAA CGAGGARTTA GAGTGCGCAG GCGTTGCCTT CTTCAGAAAG 720
L ST GE D N EEL E CA G VA F FR K 88
ACGATCGAAA GCAACGCCAT CTTGTTGCCC CGATATCCCA GCGCCGATCT GTTGCTTTAC 780
TIE SN AX L L P R Y P SAD L L L Y 108
GTTGTCCGAG GTAGGTTAAT ACATGATTGT GTATGGCACA TGATTGCCTA AAATTGTCAT 840
V V R intron 1 111
TATAATTGTG TATGCAG-TG AGGGCAGACT GGGAATTGTT TTCCCCGGAT GTCCGGAGAC 899
GE G R I, G XV F PG C PET 126
TTTCAGAGAT CATTCCTCGT TTCAAGGGCG ATCAGGCAC AGATCAGAGG GACGACGGGA 959
F RD H S S F Q G R SR H R SE G R RE 146
GGMGAGGAA GAGGAAGAAG AGGACTCAAG TCAGAGGTG AGGCGAGTGA GGAGAGGAGA 1019
E E E HE E ED SS Q K V R R V R R GD 166
CGTAATAGCG ATATTTGCAG GAGCAGCCTA CTGGTCGTAC AACGAPGGCA ACGAGCCTCT 1079
VIA IF A GA A Y W S Y ND G NE P L 186
CCAAATCGTA GGCATTGCCG ACACATCCAG CCGTCGAAAT CAGGGCCGCA GCAGGAGTTA 1149
Q IV G IA D T S S R RN Q G R SR S Y 206
CCGCGTAAGA ATCCCGACCA ATTAACTAAT AATCATCTTC AGTTATATTA TAGATTTTTT 1199
R intron2 207
CGTTTCTTTT ATAGTTGATT GATGGGGTAG AGATATATAC ATGTACAGCC CTTCTCTTTG 1259
P F S L 211
GCTGGGCCAG GCTCATCATC TCGTCGTGAG GAGGGAGARG GA.AGCAAG AGGAATTGGG 1319
A C P C S S S R R E H C E C K G R C I C 231
AGTAATATTT TTGCAGGTTT TAGCACTCGC ACTTTGGCTG AAACATTGGG GGTGGAGATT 1379
SN I FAG F S TR TL A ET L G yE I 251
GAAACTGCAA GGAAGCTTCA AGAGAATCAG CAATCGCGAC TGTTTGCGAG CGTTGAPCGG 1439
ETA R IC L Q ENQ Q SR L FAR V ER 271
GGCCAACGAC TGAGCTTACC CCGCCCTCGA TCTCGCTCTC GCTCTCCTTA CGAGACGGAG 1499
C Q R L S L P C P R S R S R S P Y E R E 291
ACTGAGAGGG ATGATGTTGC TGGTGGATTG CAGGGATATT AT?CATCTGG AGATGACAAT 1559
T ER D D VA CCL Q C Y Y SS C DEN 311
GGCCTTGAAG AGCTTCTGTG CCCACTGCGT CTAAAGCACA ATGCTGACAA TCCCGAGGAT 1619
GV E EL V C P L R VK H NA DN P ED 331
CCCGATCTCT ACGTAAGAGA TGGGGGACGA TTGAATAGAC TCAACCGCTT CCTTCCT 1679
A D V Y V R D C C R L N R V N R F K L P 351
GTACTCAACT ATTTCAGATT ACGAGCCGAG ACCCTTGTTC TCCACCCGGT AAGCAATAAC 1739
V L K Y L R L GA H R V V L HP 367
TTTTTATTCG CTTCACTTAA TGTCAATTTT CAAGTCCAGT GAATGAATTA ATCTGGTTGC 1799
intron 3
AGAGAGCATC GTGTGTTCCT TCGTGGAGGA TGAACGCGCA TGGCATAATG TACGTGACGA 1859
--R AS CV P SW R M NA H GIN Y VT 386
GAGGGGAGGG GAGATTGAG GTGGTGGGAG ACGAGGCAG GAGCGTGTTT CATGCGCGTG 1919
R GE G RI E V V C D E CR S V F D CR 406
TGAGAGAGCG TCAGTTCATC GTCATTCCCC A?TCTACGC ACTCATCAAA CAGGCAGGAG 1979
V RE C Q F I VIP Q F Y A V 1K Q AG 426
GCCAGCGCTT TGAGTGGATA ACGTTCACAA CATCGCACAT GTAAGTATAA CATAATTAGC 2039
CE G F E WI T F T T S D I 440
ATTGCACATG TCATGTACTG ATTGTTATAC TCATCATAAC TGGTATGCAT CTCAGTTCTT 2099
intron4 S 441
TCCAGTCGTT TTTGGCGGGA AGGCATCAC TTTTGAAGGC ATGCCGGAG GAGTG?TGA 2159
F Q S F L A C R Q S V L K A N P H E V L 461
GTGCCGCTTA CAGGATGGAC CGAACTGAG TCCGTCAGA TATGAGTAAC AGAGATGCG 2219
S A A Y R M D R T E V R Q I N S N R E C 481
ACACCCTCAT TCTGCCTCCA TCATCCCT?G GACGTGACCA AGAACAACAG CACAACATCA 2279
D T I, I L P P S S L G R D Q E Q Q H N I 501
CATCTCTTCTGCACCAAGTGGAgggcgtt tgaatgaatattatggaataaggcgt t t
T S L L H Q V E - 509
gaatgaatattatcgagacagctctctgcttcacgcggtgtcctgtttgcgctgcatggttcggcttagtagctagctacccaatattacaataaaacaatgataaggctgtaatagatattataataaggatgttgctttctatgtgtctacaatttcgatggaactttctccattatattcacatgcagctacgccctcagcgttttcgttttctccatatttccaaattccatcccaaagttataaaaca.tttgacgtgatttatatagcaaactcttttcacatggagcatgtcattaatgacctgggtttgtattaatattcttatcaaattaagaaaacactaccacatcggtcaaacattgtag

Fig. 4.4 Nucleotide sequence of genomic DNA clones S3.7 and E2.8 from the Picea glauca 11S legumin storage protein gene, and deduced amino acid sequence. The +1 site and the beginning of the cDNA (=) are indicated. Putative regulatory sequences are underlined. The amino acid sequence is printed below the nucleotide sequence. The nucleotide and amino acid sequences corresponding to the deletion are printed in italic characters. The coding region is printed in bold characters. Introns are indicated. The amino acid sequence is explained in fig 4.7. In the 3' non-coding region the first stop codon and the polyadenylation signals are underlined.

4.3. STRUCTURAL ORGANIZATION OF THE PICEA 11S LEGUMIN GENE

In order to align the cDNA to the gDNA, four gaps corresponding to introns were introduced in the cDNA sequence, and one gap corresponding to a putative deletion was introduced in the gDNA. The sequence of the E2.8 plasmid contains the complete coding region of the 11S legumin gene from Picea glauca. The coding region is interrupted by four short introns of 67, 104, 74 and 75 nt, respectively. The Picea gene contains one more intron than subfamily A legumin genes and two more introns than subfamily B legumin genes from angiosperms.
The first two introns are located in the region of the gene encoding the acidic or α protein subunit, and the last two are located in the region encoding the basic or β protein subunit. Introns 1, 2 and 3 correspond in position to introns 1 to 3 of subfamily A legumin genes (figure 4.5). All four introns are flanked by canonical border sequences (fig 4.6) and no direct repeats were detected in their sequence. A remarkable feature of plant introns is their elevated AT content compared to the surrounding exons. Dicot introns have an average AT content of 72%, versus 56% in monocots. The A/T content of the Picea introns is on average 68.1% (67.7, 71.1, 67.5 and 64.0% for introns 1 to 4, respectively).

[Figure 4.5: schematic of three genes. Picea: TATA box, introns 1 to 4, aataaaa. Subfamily A: TATA box, introns 1 to 3, aataaaa. Subfamily B: TATA box, introns 2 and 3, aataaaa.]

Fig 4.5 Comparison of 11S legumin genes from Picea glauca and angiosperm subfamilies A and B. The introns are numbered 1-4, and exons are shaded. The TATA box and polyadenylation signals (aataaaa) are indicated. The introns interrupt the coding regions at the same relative positions in each gene, but are variable in size.

Eukaryote consensus   GTA/QAG-/G

              5' border   Size      3' border
INTRON 1
  Picea       GTAGGT       67 nt    GTATGCAG
  Radish      GTCCAT      134 nt    TTTTATAG
  Soybean A   GTCCAT      237 nt    ATGAATAG
  Soybean B
  Pea B
INTRON 2
  Picea       GTAAGA      104 nt    ATGTACAG
  Radish      GTAATC       96 nt    TGCTGTAG
  Soybean A   GTGAGA      282 nt    CTTGGCAG
  Soybean B   GTGAGC      270 nt    CTTGGCAG
  Pea B       GTAAGT       75 nt    TTTTTCAG
INTRON 3
  Picea       GTAAGC       74 nt    GGTTGCAG
  Radish      GTAAGT      265 nt    TCTTTCAG
  Soybean A   GTACGT      624 nt    TGGTGCAG
  Soybean B   GTACGT      395 nt    CGATGCAG
  Pea B       GTACGT       80 nt    ATATGC
INTRON 4
  Picea       GTAAGT       75 nt    CATCTCAG
  Radish
  Soybean A
  Soybean B
  Pea B

Fig 4.6. Comparison of 11S legumin intron flanking sequences. The 11S legumin gene introns are flanked by canonical border sequences. The size of each intron is indicated in nucleotides. Radish and Soybean A correspond to subfamily A legumin genes, and Soybean B and Pea B to subfamily B genes.

The gap introduced when aligning the cDNA to the gDNA revealed a deletion in the genomic DNA. The 77 nucleotide sequence is located in the region encoding the α subunit. Among the 26 deduced amino acids of this sequence, 4 are highly conserved among legumin genes but absent from the genomic DNA. The deletion shifts the sequence out of frame. No other deletions or stop signals were found in the rest of the gene. The fact that this is the only deletion in the sequence, together with the fact that the positions of three of the four introns are conserved relative to angiosperms, suggests that the deletion may be an artifact of the subcloning. A HindIII restriction site is located in the cDNA within the sequence that is absent from the genomic clone. To examine the possibility of a subcloning artifact, the other clone, the S3.7 plasmid DNA, was tested for the presence of the HindIII site (see fig 4.2). S3.7 contains a 2.3 Kb fragment that overlaps E2.8; however, it does not have the HindIII site. The HindIII site is not present in the λ-DNA either, as corroborated by enzymatic digestion and mapping of the λ-DNA (fig 4.2). These results suggest that this could be a cloning artifact that arose during phage propagation. If this is a real deletion in the original genome and this gene is a pseudogene, this would have been the first deletion accumulated in the gene, and probably a very recent event, based on the high sequence conservation with the cDNA. The possibility of an artifact arising from cloning is difficult to explore because it would require re-screening the library and obtaining exactly the same gene.

4.4. PICEA GLAUCA 11S LEGUMIN AMINO ACID SEQUENCE

Because of the deletion present in the third exon, two different approaches can be taken to analyze the amino acid sequence using the genomic DNA nucleotide sequence: a) deduce the amino acid sequence from the genomic DNA that includes the deletion, or b) substitute the deleted region in the genomic DNA with the corresponding cDNA nucleotide sequence and deduce the amino acid sequence. For the purpose of this study the second approach will be used, because if this deletion is real, it would be the first accumulated deletion, which means that the rest of the gene remains intact and is therefore suitable for comparison. Also, the rest of the gene (promoter, coding region, intron positions and intron/exon junctions) is consistent when compared to angiosperms. Furthermore, since there are no other gymnosperm SSP gene sequences, this presents a good opportunity to study gene structure and to compare the gene and its product to genes and proteins from angiosperms. The possibility of this deletion being a cloning artifact is also to be considered.

If approach "a" is followed, the deduced amino acid sequence is truncated and the protein is 244 amino acids in length. The 77 nucleotide deletion is in the third exon, in an area of highly conserved amino acids. However, by substituting the deleted area with the corresponding cDNA sequence, the product of the gene is a typical 11S legumin protein comparable to other legumin proteins. The deduced amino acid sequence codes for a protein of 509 amino acids (Fig 4.7). The ATG initiation codon was determined by comparison to other legumin amino acid sequences. The first 24 amino acids at the N-terminus correspond to a very hydrophobic region, the same size as those reported for GluA and GluB from rice (Takaiwa et al, 1991; Misra and Leal, 1993), and may represent the signal peptide. The cleavage site was estimated to lie between alanine-24 and isoleucine-25 by comparison to the XI5H cDNA and to the Douglas fir sequences.
The deduced molecular weight of the precursor is 57.18 kD and that of the co-translationally processed protein is 54.4 kD. The post-translational cleavage of the legumin subunit precursors into α and β polypeptides is carried out by a protease that cleaves between asparagine and glycine residues (Scott et al, 1992). The cleavage site for processing the precursors of 11S legumin proteins in angiosperms is highly conserved, and the sequence surrounding this site is also well conserved. Recently it has been shown by cDNA sequence that the predicted cleavage site in the Douglas fir legumin-like gene is also between asparagine and glycine (Leal and Misra, 1993). The legumin gene of the gymnosperm Ginkgo biloba, however, has an asparagine residue at the N-terminus of the β subunit instead of glycine (Hager et al, 1992). By comparison to other legumin sequences, the spruce amino acid sequence from 311 to 320 corresponds to the conserved sequence, with the cleavage site between the asparagine and the glycine at positions 311 and 312, respectively. The invariant cysteine residue at position 7 of the β subunit has been shown to be involved in the formation of a disulfide bridge linking the α and β subunits of 11S legumins in angiosperms and in Ginkgo biloba (Hager et al, 1992). In the spruce 11S legumin protein the putative cysteine is located at position 318 of the precursor protein. The other cysteine that may be involved in the formation of the bridge is at position 123 of the α subunit, as determined by comparison to other legumin proteins. As a result of the cleavage, the α subunit is a polypeptide of 311 amino acids with a molecular weight of 34.9 kD and the β subunit is 198 amino acids in length with a molecular weight of 22.2 kD.
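The subunit numbering given above can be checked with a short sketch. The Python below is an illustration only and not part of the original analysis: the function name split_precursor and the placeholder sequence are hypothetical, and the placeholder merely reproduces the stated positions (Asn-311, Gly-312, Cys-318) rather than the real Picea sequence.

```python
def split_precursor(precursor: str, asn_position: int) -> tuple[str, str]:
    """Split a precursor after the Asn at the given 1-based position.

    Raises ValueError unless the residues flanking the cut are Asn (N)
    and Gly (G), the pair described in the text.
    """
    if not 1 <= asn_position < len(precursor):
        raise ValueError("cleavage position is outside the sequence")
    if precursor[asn_position - 1] != "N" or precursor[asn_position] != "G":
        raise ValueError("no Asn-Gly pair at the requested position")
    return precursor[:asn_position], precursor[asn_position:]


# Hypothetical 509-residue placeholder (NOT the real sequence): Asn at 311,
# Gly at 312 and Cys at 318, with alanine used as filler everywhere else.
placeholder = "A" * 310 + "N" + "G" + "A" * 5 + "C" + "A" * 191

alpha, beta = split_precursor(placeholder, 311)
print(len(alpha), len(beta))  # lengths of the two cleavage products
print(beta[6])                # residue 7 of the second product (precursor position 318)
```

With these positions the two products are 311 and 198 residues long, and residue 7 of the second product coincides with precursor position 318, which agrees with the numbering used in the text.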
The amino acid composition of the deduced protein is shown in table 4.1.

MQILLIASCFLFLSLSTVSPVTAISQQRRGRGRRYDEQS S ScRBLRRLS 50
AHEPSESETIRSDGGTFELSTGEDNEELECAGVAFFRKTIESNAILLPRY 100
PSADLLLYVVRGEGRLGIVFPETFRDHSSFQGRSRHRSEGREEEEE 150
EEED S SQKVRRVRGDVIAl FAGAAYWSYNDGNEPI1Q VGIADTS SRNQ 200
GRSRSYRPFSIAGPGSSSRBEEGEGKGRGIGSNIFAGFSTRTI1AETL VE 250
IETAKLQENQQSRLFARVERGQRLSLPGPRSRSRSPYERETERDDVAGG 300
LQYYS SGGDENVEELIRVKHNDNPEDADVYVRDGGRLNRVNRFKL 350
PVLKYLRLGAERVVLHPRASCVPSWRAHGIMYVTRGEGRIEVVGDEGR 400
SVFDGRVREGQFIVIPQFYAVIKQAGGEGFEWI TFTTSDI SFQSFLAGRQ 450
SVLKMPEEVLSAAYRRTEVRQIMSNRECDTLILPPSSLGRDQEQQHN 500
ITSLIHQVE 509

Fig. 4.7 Spruce 11S legumin deduced amino acid sequence. The protein sequence is in capital letters. The putative signal peptide at the N-terminus is underlined. The predicted cleavage site for the α and β subunits is indicated. The two cysteines predicted to be involved in the formation of the disulfide bridge are circled.

Arginine        11.7%     Aspartic acid    3.9%
Serine           9.7%     Phenylalanine    3.9%
Glutamic acid    9.3%     Proline          3.7%
Glycine          9.0%     Asparagine       3.2%
Leucine          8.1%     Tyrosine         2.5%
Valine           6.8%     Histidine        1.8%
Alanine          5.7%     Lysine           1.8%
Isoleucine       5.2%     Cysteine         1.6%
Glutamine        4.3%     Methionine       1.6%
Threonine        3.9%     Tryptophan       0.5%

Table 4.1 Amino acid composition of the Picea 11S legumin protein.

4.4.1. AMINO ACID COMPARISONS REVEAL PRESENCE OF HIGHLY CONSERVED SEQUENCES

The deduced amino acid sequence was compared to legumin amino acid sequences from angiosperms (monocots and dicots) and to two gymnosperm sequences. The sequence was compared to the EMBL protein data banks (PCGENE release CDPROT18, Intelligenetics, Inc. CA) using the PCOMPARE and PCLUSTAL programs (fig 4.8). The results in table 4.2 (a and b) are shown in terms of percentage of identity. The results show 69.2% identity between spruce and Douglas fir, 68.2% between spruce and pine, and 63.9% between Douglas fir and pine.
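As an illustration of how a percentage of identity can be obtained from two aligned sequences, the following Python sketch counts identical residues over the aligned columns. It is a minimal sketch under stated assumptions: the function name percent_identity is hypothetical, columns containing a gap in either sequence are skipped, and the exact definition used by the PCGENE programs may differ. The two short sequences are invented examples, not data from this study.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues between two aligned sequences.

    Assumption: columns where either sequence has a gap are ignored, so
    the denominator is the number of columns with a residue in both rows.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Invented toy alignment: 6 ungapped columns, 5 of them identical.
identity = percent_identity("MKT-LLA", "MRTALLA")
print(round(identity, 1))
```

Choosing a different denominator (for example the length of the shorter sequence, or the full alignment length including gaps) changes the result, so the definition should be stated whenever identity values from different sources are compared.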
The percentage of identity between spruce and dicots varies from 28.8% to 34.5%, and among dicots from 39.8% between different subfamilies to 65.8% within the same subfamily. Similarly, the identity between spruce and monocots is 31.1% to 34.5%. The percentage of identity between monocots and dicots varies from 35.1% to 40.5%.

[Figure 4.8: amino acid alignment of eight 11S legumin sequences; final block shown.]

LEG1_PICEA   TLXLPPSS- - - -LGRDQEQQHNITSLLHQVE 509
LEG1_TSUGA   TLILPPSP RHQRDIE-- -S-RVQVE 507
LEG1_PINUS   ?LILPPSSSLSGSGRYQDQQQNVTSLLVQVA 488
LEG1_GOSHI   VSVFSP R-QGSQQ 516
GLU1_ORYSA   PGAFPIQYKSY-- --QDVYNAAESS 499
12S1_ARATH   PL?HSSGP--ASYGRP---RVAPA 472
LEGA_PEA     ----NPFKFLVPA-RESENRASA 517
11S3_HELAN   VLFAPSFS RGQGIRASR 493

Fig. 4.8. Amino acid alignment of 11S legumin proteins from Picea, Pseudotsuga, Pinus strobus, cotton (goshi), rice (orysa), Arabidopsis, pea and sunflower (helianthinin). The alignment was done on 8 protein sequences using the CLUSTAL program of PCGENE. Character to show that a position in the alignment is perfectly conserved: '*'; character to show that a position is well conserved: '.'.

PERCENTAGE OF IDENTITY AT THE AMINO ACID LEVEL

a)
          D. Fir   Pine
Spruce    69.2%    68.2%
D. Fir             63.9%

b)
          Soy2     PeaA     Cot2     Arab12   Rice1    Rice2
Spruce    29.7%    28.8%    33.7%    34.5%    34.5%    31.1%
Soy2               65.8%    41.2%    39.8%    37.1%    37.5%
PeaA                        41.2%    41.3%    35.1%    36.1%
Cot2                                 44.1%    40.5%    36.9%
Arab12                                        39.6%    38.6%
Rice1                                                  62.9%

Table 4.2 Percentage of identity at the amino acid level between a) Picea glauca and other gymnosperms; b) Picea versus dicots and monocots.

4.5. THE PICEA GLAUCA LEGUMIN PROMOTER REGION

The restriction map and the sequence of S3.7 indicate that this clone contains 1.4 Kb of 5' flanking region. The cap site was determined by primer extension using a 27-mer primer constructed from the 27 nucleotides at the beginning of the cDNA. mRNA from three different embryo developmental stages was assayed. Scanning for eukaryotic promoter elements with the EUKPROM program of PCGENE gave the same positions for the start site and the TATA box as were determined by primer extension. The start site (+1) was determined to be 97 nucleotides upstream of the ATG (fig 4.9 and 4.10) and 35 nucleotides downstream from the putative TATA box (fig 4.10).

4.5.1. PUTATIVE REGULATORY SEQUENCES

Sequences responsible for the regulation of legumin gene expression have been described in angiosperms, and it has been shown that the first 600 nt of the 5' flanking region are sufficient to give a level of regulated expression almost equal to that of the complete promoter (Shirsat et al, 1989; Itoh et al, 1993). In order to investigate the presence of similar elements that could be regulating gene expression in spruce, 450 nucleotides of the promoter region were sequenced. Sequences homologous to known conserved motifs were found in the 5' flanking sequence, and the results are shown in figure 4.10 and table 4.3. The cap site and TATA box were determined by scanning for eukaryotic promoter elements using the PCGENE program. The putative TATA box is located 35 bp from the cap site. One ACACA element, which has been described as an important element for seed specific expression of albumin genes, was found at position -208. Two legumin boxes with the consensus sequence CATGCAT, or RY repeat, are located at -40 and -87, respectively, and are similar to those present in the upstream region of other seed storage protein genes (Depigny-This et al, 1992). Close to the RY repeat, at position -110, one ABRE element (ABA regulatory element) was found, and two other ABRE elements are present at -138 and -160. These ABRE elements were first described in wheat (Marcotte et al, 1989) and rice (Mundy J. et al, 1990) and have been shown to bind transcription factors. The ABRE consensus sequence is included within the G-Box sequence. Four G-Box elements are present in this sequence, three of which contain ABRE elements, at positions -132, -137 and -109, and another one at position -159. An AGATGT element, recently identified in the sunflower promoter region as a sequence that binds nuclear proteins (Nunberg et al, 1994), is present at position -216 of the Picea promoter. AGATGT elements are thought to be involved in binding nuclear factors and enhancing expression of sunflower helianthinin.
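A motif search of the kind described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the PCGENE procedure used in the study: the function name find_motifs and the toy upstream sequence are hypothetical, only exact matches on the given strand are reported, and positions are given for the first base of each match, with the base immediately upstream of the cap site numbered -1 (an assumed convention).

```python
def find_motifs(upstream: str, motifs: list[str]) -> dict[str, list[int]]:
    """Locate exact motif matches in the sequence upstream of the cap site.

    `upstream` is assumed to end at position -1, so the base at 0-based
    index i has position i - len(upstream). Overlapping matches are kept.
    """
    upstream = upstream.upper()
    hits: dict[str, list[int]] = {}
    for motif in motifs:
        positions = []
        start = upstream.find(motif)
        while start != -1:
            positions.append(start - len(upstream))
            start = upstream.find(motif, start + 1)  # allow overlapping hits
        hits[motif] = positions
    return hits


# Invented 18-base toy sequence, NOT the Picea promoter.
toy_upstream = "TTCATGCATGGACACAGG"
hits = find_motifs(toy_upstream, ["CATGCAT", "ACACA", "AGATGT"])
print(hits)
```

A fuller search would also scan the reverse complement and allow degenerate consensus symbols; both are left out here to keep the sketch short.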
An A/T rich region is present between nucleotides -223 and -340. A/T rich regions have been implicated in the binding of nuclear factors (Meakin and Gatehouse, 1991; Itoh et al, 1993; Nunberg et al, 1994).

Because of their high similarity to promoter elements described in angiosperm SSP genes, these conserved sequences are likely to be involved in Picea SSP regulatory functions, which have yet to be characterized. However, the expression of a specific gene depends on a combination of regulatory elements and a specific complement of trans-acting factors. Table 4.3 shows the possible roles of the Picea elements compared to elements in other SSP genes.

Fig. 4.9. Determination of the transcription start site (+1) by primer extension. mRNAs from spruce mature embryo (lanes 3-5) and pre-embryo (lane 2) stages were assayed. No RNA (lane 1) was used as a negative control. The arrow indicates the +1 site, and the sequence is shown at the right of the S3.7 DNA sequence. The TATA box sequence is also shown.

[Figure 4.10: 11S LEGUMIN PROMOTER. Elements shown: A/T (-223 to -340); AGATGT (-216); ACACA (-208); ABRE/G-BOX (-159, -137, -132, -109); LEG BOX (RY) (-87, -40); TATA (-35); ATG; aataaa.]

Fig. 4.10. Location of putative regulatory sequences on the promoter of the Picea 11S legumin gene. Numbers on the figure are relative to the cap site. Abbreviations are referred to in the text.

PUTATIVE REGULATORY SEQUENCES ON THE PICEA LEGUMIN PROMOTER

CONSENSUS SEQUENCE: TATA
  POSITION IN PICEA: -35
  FUNCTION: TATA-binding; basal transcription
  BINDING FACTOR: TFIID1,2
  REFERENCE: Chamberland et al (1992)

CONSENSUS SEQUENCE: LEG BOX, RY repeat (CATGCAT)
  POSITION IN PICEA: -40, -87
  FUNCTION: Tissue specificity; enhancer of expression
  BINDING FACTOR:
  REFERENCE: Lelievre et al (1992); Chamberland et al (1992)

CONSENSUS SEQUENCE: ABRE/G-BOX (ACACGTG and ACACGTCG, ACACGACA)
  POSITION IN PICEA: -109, -132, -137, -159
  FUNCTION: ABA responsive element; expression enhancer
  BINDING FACTOR: EmB-1; GcBT-1
  REFERENCE: Mundy et al (1990); Guiltinan et al (1990)

CONSENSUS SEQUENCE: ACACA
  POSITION IN PICEA: -208
  FUNCTION: Albumin seed specific expression
  BINDING FACTOR:
  REFERENCE:

CONSENSUS SEQUENCE: AGATGT
  POSITION IN PICEA: -216
  FUNCTION: Tissue specificity; legumin expression enhancer
  BINDING FACTOR: Seed nuclear proteins from soybean
  REFERENCE: Nunberg et al (1994)

CONSENSUS SEQUENCE: A/T
  POSITION IN PICEA: -223 to -340
  FUNCTION: Seed specific expression; enhancer of early embryogenesis
  BINDING FACTOR: LABF1; nuclear factors
  REFERENCE: Meakin and Gatehouse (1991); Itoh et al (1993)

Table 4.3 Putative regulatory sequences in the Picea glauca 11S legumin promoter. Consensus sequences and positions refer to the P. glauca promoter sequence. The function and binding factor columns refer to other seed storage protein promoter sequences, from the literature.

Chapter 5
DISCUSSION

The coding region of the XI5H-1 Picea glauca 11S legumin gene characterized in the present study exhibits 98.7% homology with the cDNA clone XI5H, which was used to isolate this genomic DNA. The important features of the coding region are indicated in figure 4.4. Comparison of the genomic and cDNA sequences indicates the presence of four introns (fig 4.4 and 4.5) and a short deletion within the coding region of the Picea 11S legumin gene.

Based on amino acid identity, the existence of two legumin subfamilies (A and B) has been shown in angiosperms. Each subfamily has a characteristic number of introns: two for B and three for A (Boulter et al, 1987). The positions of the two introns of subfamily B genes correspond to the positions of the second and third introns of subfamily A genes.
The position of the introns is highly conserved among legumin genes, as are the surrounding sequences (Galau et al, 1991). This study shows that the 11S legumin gene from Picea has one or two extra introns compared to angiosperm genes. The positions of introns one to three correspond to the positions of introns one to three of subfamily A. This result demonstrates that the conservation of intron positions extends to gymnosperms. Since this is the first report of a genomic clone of a legumin seed storage protein gene from a gymnosperm, it can be speculated, based on the number of introns, that this gene is a member of a different subfamily; however, the possibility that Picea also contains type A or B genes cannot be excluded. Moreover, the conservation of intron positions (one to three) suggests that these genes may have derived from a common ancestor.

The possibility that the Picea 11S legumin gene belongs to a different subfamily is also suggested by the percentage of homology at the amino acid level with angiosperm sequences. The percentage of identity between subfamilies A and B in angiosperms of the same or different species is about 40%. The percentage of identity among members of subfamily A, of the same or different species, is 65% to 85%. The percentage of identity among dicots is higher than it is between dicots and monocots. The analysis of the predicted amino acid sequences from this study shows that the percentage of identity between Picea and dicots or between Picea and monocots is similar (30-34%). Since the percentage of identity between Picea and subfamily A is not different from that between Picea and subfamily B, this could suggest that they belong to three different subfamilies. Nevertheless, the differences may be the result of evolution. The percentage of identity among the three gymnosperms Picea, Pinus and Pseudotsuga is around 70%. This high percentage suggests that they may belong to the same subfamily. However, no data are available at the genomic DNA level for Pinus and Pseudotsuga, so comparisons regarding intron number cannot be made.

Homologous regions among legumin amino acid sequences are dispersed throughout the molecules, suggesting that the similarity is due to divergence from a common ancestor (Negoro et al 1985). The ancient origin of legumin genes is supported by the existence of homologous sequences expressed in the spores of some fern species (Templeman et al, 1988). Furthermore, the presence of legumin proteins in gymnosperms such as Ginkgo biloba and conifer species supports the hypothesis of a common ancestor. On the basis of structural and immunological criteria it has been speculated that the genes encoding legumin proteins in monocots and dicots have evolved from a common ancestral gene (Negoro et al 1985; Borroto and Dure 1987).

Recent molecular evolution data strongly suggest that the separation of the monocot and dicot lineages took place in the late Carboniferous (300 million years ago). The divergence of conifers from angiosperms is estimated at 330 million years ago, and the earliest seed plants are of Upper Devonian-Lower Carboniferous age (360 million years ago) (Martin et al, 1993). The results of this study show that the legumin gene from Picea glauca is structurally similar to angiosperm genes. Even though the number of introns is different, the position of three of the four introns is conserved between the gymnosperm Picea glauca and angiosperms. The results also show that at the amino acid level the protein shares common regions, with 30-34% identity to angiosperm legumin proteins and 70% identity with other gymnosperms. This is the first report showing that the organization of the legumin gene is highly similar between conifers and angiosperms.
The notable conservation of genes and proteins supports the hypothesis of a common ancestor.

The extent of conservation among 11S proteins may be based on functional constraints on evolutionary divergence, i.e., the post-translational processing events, including disulfide bond formation, proteolytic cleavage, the signal for moving through the ER, the information for moving to the Golgi apparatus and the effective accumulation of the proteins in protein bodies (Borroto and Dure, 1987; Negoro et al 1985).

Besides the introns, another difference found between the genomic clone and the cDNA is a 77 bp deletion. The deletion resides in the third exon, a highly conserved region. The deletion moves the sequence out of frame; therefore the predicted amino acid product would be a truncated protein. The deletion may be a cloning artifact. However, this possibility is difficult to explore because it would be necessary to re-screen the library and obtain the same particular gene. The number of clones needed for a library to include a specific DNA sequence from a genome of known size is given by the formula N = ln(1-p)/ln(1-f), where p is the desired probability of the library containing any particular DNA sequence, f is the fraction of the genome represented by each fragment, and N is the number of cloned fragments, and therefore the necessary number of recombinant clones. For Picea glauca the number of recombinants required to have any given DNA sequence is 2.30 x 10^6. Another important consideration is that legumin genes belong to multigene families. The number of gene family members in Picea has not been characterized. Nevertheless, results from this study show the presence of at least two members of the legumin family (fig 4.1).
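The library-size formula N = ln(1-p)/ln(1-f) can be evaluated directly, as in the Python sketch below. The function name clones_required is hypothetical, and the insert and genome sizes in the usage example are placeholder values chosen only for illustration; they are not the values behind the figure quoted in the text for Picea glauca.

```python
import math


def clones_required(p: float, f: float) -> float:
    """Evaluate N = ln(1 - p) / ln(1 - f).

    p: desired probability that a given sequence is in the library.
    f: fraction of the genome carried by a single cloned fragment
       (average insert size divided by genome size).
    """
    if not (0.0 < p < 1.0 and 0.0 < f < 1.0):
        raise ValueError("p and f must both lie strictly between 0 and 1")
    # log1p(-x) computes ln(1 - x) accurately even when x is very small,
    # which is the usual case for f.
    return math.log1p(-p) / math.log1p(-f)


# Placeholder values for illustration only (assumed, not from the thesis):
insert_size_bp = 15_000   # hypothetical average insert size
genome_size_bp = 3.0e9    # hypothetical haploid genome size
n = clones_required(p=0.99, f=insert_size_bp / genome_size_bp)
print(math.ceil(n))       # round up to a whole number of clones
```

Because f is small, N is roughly proportional to genome size divided by insert size, which is why very large conifer genomes call for libraries of millions of recombinants.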
It is important to mention that efforts to determine the number of members of the legumin family were unsuccessful.

However, because the deletion is the only one found within the Picea sequence, and it is not bordered by direct repeats or palindromic sequences, the possibility of it being a cloning artifact has to be considered. In animal, bacterial and plant genomes it has been shown that there is a close correlation between direct repeats and naturally occurring deletions upon cloning into E. coli, which led the authors to favor 'slipped mispairing' as a mechanism to explain deletion formation (Heim et al, 1989). Another feature suggesting that this deletion is a cloning artifact is that the rest of the gene has no other modifications and the promoter region appears to be intact. The amino acid identity, if the deletion is replaced by the corresponding sequence from the cDNA, is very high within gymnosperms (70%) and around 35% with angiosperms. The intron positions and the intron border sequences also suggest that the gene is functional, since they follow the rules for eukaryotic intron/exon flanking sequences. However, the possibility that this is the first deletion event accumulated in the Picea 11S legumin gene, and that the gene is therefore a very recent pseudogene, cannot be entirely discounted. Nevertheless, even if the deletion is real, since it is probably the first deletion occurring in this gene, to judge from the high sequence homology to the cDNA, the rest of the gene would still conserve the original characteristics and can be used for comparisons. The results of the present work are important for understanding the structural organization of the legumin gene, since there are no data, other than those presented here, regarding genomic SSP DNA in gymnosperms. Since the cDNA sequence is available (C. Newton, unpublished results), the deleted sequence was replaced by the corresponding cDNA sequence for the purpose of comparison to other legumin genes.
All the amino acid comparisons were performed after translating the gDNA with the substituted sequence.

Other important features of the Picea glauca 11S legumin gene are found in the promoter region (Fig 4.4 and Table 4.3). The results of the present work show the presence of putative regulatory sequences in the 5' flanking sequence of the gene. The presence of basic transcriptional sequences such as the TATA box, and of specific sequences like the highly conserved legumin box, ABRE elements, ACACA elements, G-boxes and A/T rich regions, supports the hypothesis that the Picea 11S legumin gene may be functional and raises the possibility that gymnosperms have regulatory sequences similar to those of angiosperms. The presence in Picea of various sequences that are highly conserved and important for transcription in angiosperm SSP genes strongly suggests a transcriptional role. Some of these putative sequences have been identified as enhancers or binding elements for transcription factors in a variety of seed specific genes (Weissing and Kahl, 1991). The presence of these conserved sequences suggests that the gene is functional. The objectives of this thesis do not include the functional characterization of the promoter region. However, it would be very interesting to test whether the promoter sequence is functional. The functionality of the promoter could be examined by dissecting the gene promoter, fusing it to a reporter gene, and determining in a transgenic system whether it directs the expression of the reporter gene. Recent advances in genetic transformation of plants have made possible the transfer of chimeric genes into conifer genomes (Duchense and Charest, 1991).
The somatic embryogenesis system developed for conifers at B.C. Research and the use of microprojectile DNA delivery may overcome some of the limitations for transgenic tree recovery (Ellis et al., 1991).

The results of the present study showed that all the structural elements of the Picea glauca 11S legumin gene are similar to those of angiosperms, with the exception of an additional intron. The extra intron and the amino acid identity suggest that the Picea gene belongs to a legumin subfamily different from angiosperm subfamilies A or B. Another possibility is that angiosperm genes lost one or two introns during evolution and the Picea gene is more similar to the ancestral gene. The conservation of legumin genes from fern species to seed plants (gymnosperms and angiosperms) strongly suggests an evolutionary relationship that may have significant consequences for the application of genetic technology, largely developed in dicot angiosperms (e.g., tobacco, Arabidopsis), to economically important species such as conifers. The promoter region also exhibits all putative regulatory elements associated with angiosperm SSPs. While it is not possible to rule out the presence of regulatory domains unique to gymnosperms, it is likely that cis-acting sequences are common between gymnosperms and angiosperms. This study implies that tissue-specific regulation could be achieved in transgenic conifers using angiosperm promoters.

Chapter 6

REFERENCES

Argos P., Narayana S.V.L. and Nielsen N.C. (1985) Structural similarity between legumin and vicilin storage proteins from legumes. EMBO J. 4:1111-1117.

Allona I., Casado R. and Aragoncillo C. (1992) Seed storage protein from Pinus pinaster Ait.: homology of major components with 11S proteins from angiosperms. Plant Sci. 87:9-18.

Baumlein H., Wobus U., Pustell J. and Kafatos F.C. (1986) The legumin gene family: structure of a B-type gene of Vicia faba and a possible legumin gene specific regulatory element. Nucleic Acids Res. 14:2707-2720.

Baumlein H., Boerjan W., Nagy I., Panitz R., Inze D. and Wobus U. (1991) Upstream sequences regulating legumin gene expression in heterologous transgenic plants. Mol. Gen. Genet. 225:121-128.

Bewley J.D. and Black M. (1985) Seeds: germination, structure and composition. Edited by J.D. Bewley and M. Black. Plenum Press, New York, pp 1-28.

Bollini R. and Chrispeels M.J. (1979) Characterization and subcellular localization of vicilin and phytohemagglutinin, the two major reserve proteins of Phaseolus vulgaris. Planta 142:291-298.

Borroto K. and Dure III L. (1987) The globulin seed storage proteins of flowering plants are derived from two ancestral genes. Plant Mol. Biol. 8:113-131.

Bostock R.M. and Quatrano R.S. (1992) Regulation of Em gene expression in rice. Plant Physiol. 98:1356-1363.

Boulter D., Evans N.I., Ellis R.J., Shirsat A., Gatehouse J.A. and Croy R.R.D. (1987) Differential gene expression in the development of Pisum sativum. Plant Physiol. Biochem. 25:283-289.

Breen J.P. and Crouch M.L. (1992) Molecular analysis of a cruciferin storage protein gene family of Brassica napus. Plant Mol. Biol. 19:1049-1055.

Bustos M.M., Begum D., Kalkan F.A., Battraw M.J. and Hall T.C. (1991) Positive and negative cis-acting DNA domains are required for spatial and temporal regulation of gene expression by a seed storage protein promoter. EMBO J. 10:1469-1479.

Casey R., Domoney C., Ellis N. and Turner S. (1988) The structure, expression and arrangement of legumin genes in pea. Biochem. Physiol. Pflanzen 183:173-180.

Chamberland S., Daigle N. and Bernier F. (1992) The legumin boxes and the 3' part of a soybean β-conglycinin promoter are involved in seed gene expression in transgenic tobacco plants. Plant Mol. Biol. 19:937-949.

Craig S. and Millerd A. (1981) Pea seed storage proteins: immunocytochemical localization with protein A-gold by electron microscopy. Protoplasma 105:333-339.

De Pace C., Delre V., Scarascia Mugnozza G.T., Maggini F., Cremonini R., Frediani M. and Cionini P.G. (1991) Legumin of Vicia faba major: accumulation in developing cotyledons, purification, mRNA characterization and chromosomal location of coding genes. Theor. Appl. Genet. 83:17-23.

Depigny-This D., Raynal M., Aspart L., Delseny M. and Grellet F. (1992) The cruciferin gene family in radish. Plant Mol. Biol. 20:467-479.

Dickinson C.D., Evans R.P. and Nielsen N.C. (1988) RY repeats are conserved in the 5' flanking sequence of legumin protein genes. Nucleic Acids Res. 16:371.

Domoney C. and Casey R. (1985) Measurements of gene number for seed storage protein genes in Pisum. Nucleic Acids Res. 13:687-699.

Domoney C., Ellis T.H.N. and Davies D.R. (1986) Organization and mapping of legumin genes in Pisum. Mol. Gen. Genet. 202:280-285.

Dure III L. (1988) Characteristics of storage proteins of cotton. JAOCS 66:356-359.

Duchesne L.C. and Charest P.J. (1991) Transient expression of the β-glucuronidase gene in embryogenic callus of Picea mariana following microinjection. Plant Cell Rep. 10:191-194.

Ellis J.R., Shirsat A.H., Hepher A., Yarwood J.N., Gatehouse J.A., Croy R.R.D. and Boulter D. (1988) Tissue specific expression of a pea legumin gene in seeds of Nicotiana plumbaginifolia. Plant Mol. Biol. 10:203-214.

Ellis D.D., McCabe D., Russell D., Martinell B. and McCown B.H. (1991) Expression of inducible angiosperm promoters in a gymnosperm Picea glauca. Plant Mol. Biol. 17:19-27.

Ericson M.L., Muren E., Gustavsson H.O., Josefsson L.G. and Rask L. (1991) Analysis of the promoter region of napin genes from Brassica napus demonstrates binding of nuclear proteins in vitro to a conserved motif. Eur. J. Biochem. 197:741-746.

Finkelstein R.R., Tenbarge K.M., Shumway J.E. and Crouch M.L. (1985) Role of ABA in maturation of rapeseed embryos. Plant Physiol. 78:630-636.

Flinn B.S., Roberts D.R., Webb D.T. and Sutton B.C.S. (1991a) Storage protein changes during zygotic embryogenesis in interior spruce. Tree Physiol. 8:71-81.

Flinn B.S., Roberts D.R. and Taylor I.E.P. (1991b) Evaluation of somatic embryos of interior spruce. Characterization of developmental regulation of storage proteins. Physiol. Plant. 82:624-632.

Flinn B.S., Roberts D.R., Newton C.H., Cyr D.R., Webster F.B. and Taylor I.E.P. (1993) Storage protein gene expression in zygotic and somatic embryos of interior spruce. Physiol. Plant. 89:719-730.

Galau G.A., Wang H.Y.C. and Hughes W. (1991) Sequence of the Gossypium hirsutum D-genome alloallele of legumin A and its mRNA. Plant Physiol. 97:1268-1270.

Gibbs P.E.M., Strongin K.B. and McPherson A. (1989) Evolution of legume seed storage proteins - A domain common to legumins and vicilins is duplicated in vicilins. Mol. Biol. Evol. 6:614-623.

Gifford D.J. (1988) An electrophoretic analysis of the seed proteins from Pinus monticola and eight other species of pine. Can. J. Bot. 66:1808-1812.

Goldberg R.B., Barker S.J. and Perez-Grau L. (1989) Regulation of gene expression during plant embryogenesis. Cell 56:149-160.

Green M.J., McLeod J.K. and Misra S. (1991) Characterization of Douglas fir protein body composition by SDS-PAGE and electron microscopy. Plant Physiol. Biochem. 29:49-55.

Guiltinan M.J., Marcotte W.R. and Quatrano R.S. (1990) A plant leucine zipper protein that recognizes an abscisic acid response element. Science 250:267-270.

Hager K.P., Jensen U., Gilroy J. and Richardson M. (1992) The N terminal amino acid sequence of the β subunit of the legumin like protein from seeds of Ginkgo biloba. Phytochemistry 31:523-525.

Hakman I., Stabel P., Engstrom P. and Erikson T. (1990) Storage protein accumulation during zygotic and somatic embryo development in Picea abies (Norway spruce). Physiol. Plant. 80:441-445.

Heim U., Schubert R., Baumlein H. and Wobus U. (1989) The legumin gene family: structure and evolutionary implications of Vicia faba B-type genes and pseudogenes. Plant Mol. Biol. 13:653-663.

Itoh Y., Kitamura Y., Arahira M. and Fukazawa C. (1993) Cis-acting regulatory regions of the soybean seed storage 11S globulin gene and their interactions with seed embryo factors. Plant Mol. Biol. 21:973-984.

Jensen U. and Lixue C. (1991) Abies seed protein profile divergent from other Pinaceae. Taxon 40:435-440.

Jensen U. and Berthold H. (1989) Legumin-like proteins in gymnosperms. Phytochemistry 28:1389-1394.

Jordano J., Almoguera C. and Thomas T. (1989) A sunflower helianthinin gene upstream sequence ensemble contains an enhancer and sites of nuclear protein interaction. Plant Cell 1:855-866.

Jofuku D.K., Okamuro J.K. and Goldberg R.B. (1987) Interaction of an embryo DNA binding protein with a soybean lectin gene upstream region. Nature 328:734-737.

Kitamura Y., Arahira M., Itoh Y. and Fukazawa C. (1990) The complete nucleotide sequence of soybean glycinin A2B1 gene. Nucleic Acids Res. 18:4245.

Kuhlemeier C., Green P.J. and Chua N. (1987) Regulation of gene expression in higher plants. Ann. Rev. Plant Physiol. 38:221-257.

Leal I. and Misra S. (1993) Molecular cloning and characterization of a legumin-like storage protein cDNA of Douglas fir seeds. Plant Mol. Biol. 21:708-715.

Lelievre J., Oliveira L.O. and Nielsen N.C. (1992) 5'-CATGCAT-3' elements modulate the expression of glycinin genes. Plant Physiol. 98:387-391.

Maniatis T., Fritsch E.F. and Sambrook J. (1982) Molecular cloning. A lab manual. Cold Spring Harbor Lab Press, N.Y.

Martin W., Lydiate D. and Brinkmann H. (1993) Molecular phylogenies in angiosperm evolution. Mol. Biol. Evol. 10:140-162.

Marcotte W.R., Bayley C.C. and Quatrano R.S. (1988) Regulation of a wheat promoter by ABA in rice protoplasts. Nature 335:454-457.

Marcotte W.R., Russell S.L. and Quatrano R.S. (1989) ABA responsive sequences from the Em gene of wheat. Plant Cell 1:969-976.

Meakin P.J. and Gatehouse J.A. (1991) Interaction of seed nuclear proteins with transcriptionally enhancing region of pea leg A gene promoter. Planta 183:471-477.

Misra S. and Green M.J. (1990) Developmental gene expression in conifer embryogenesis and germination. I. Seed proteins and protein body composition of mature embryo and megagametophyte of white spruce (Picea glauca (Moench) Voss.). Plant Sci. 68:163-173.

Misra S. and Green M.J. (1991) Developmental gene expression in conifer embryogenesis and germination. II. Crystalloid protein synthesis in the developing embryo and megagametophyte of white spruce (Picea glauca (Moench) Voss.). Plant Sci. 78:61-71.

Mundy J. and Chua N. (1988) ABA and water stress induce the expression of a novel rice gene. EMBO J. 7:2279-2286.

Muntz K. (1989) Intracellular protein sorting and the formation of protein reserves in storage tissue cells of plant seeds. Biochem. Physiol. Pflanzen 185:315-335.

Negoro T., Momma T. and Fukazawa C. (1985) A cDNA clone encoding a glycinin A1a subunit precursor of soybean. Nucleic Acids Res. 13:6719-6731.

Newton C.H., Flinn B.S. and Sutton B.C.S. (1992) Vicilin-like seed storage proteins in the gymnosperm interior spruce (Picea glauca/engelmannii). Plant Mol. Biol. 20:315-322.

Nielsen N. (1986) Structural relationships between 7S and 11S legume globulins, in Molecular biology of seed storage proteins. Edited by Shannon L. and Chrispeels M.

Nielsen N.C., Dickinson C.D., Cho T.J., Thanh V.H., Scallon B.J., Fischer R.L., Sims T.L., Drews G.N. and Goldberg R.B. (1989) Characterization of the glycinin gene family of soybean. Plant Cell 1:313-328.

Nunberg A.N., Zhuwen L., Bogue M.A., Vivekananda J., Reddy A.V. and Thomas T. (1994) Developmental and hormonal regulation of sunflower helianthinin genes: proximal promoter sequences confer regionalized seed expression. Plant Cell 6:473-486.

Pang P.P., Pruitt R.E. and Meyerowitz E.M. (1988) Molecular cloning, genomic organization, expression and evolution of 12S seed storage proteins of Arabidopsis thaliana. Plant Mol. Biol. 11:805-820.

Pich U. and Schubert I. (1993) Polymorphism of legumin genes in inbred lines of Vicia faba. 112:342-350.

Plietz P., Damaschun G., Muller J.J. and Schlesier B. (1983) In "The biochemistry of plants. Molecular biology". (Stumpf P.K. and Conn E.E. Editors, Vol 15) pp 297-345.

Riggs D.C., Voelker T.A. and Chrispeels M.J. (1989) Cotyledon nuclear proteins bind to DNA fragments harboring regulatory elements of phytohemagglutinin genes. Plant Cell 1:609-621.

Roberts D.R., Flinn B.S., Webb D.T., Webster F.B. and Sutton B.C.S. (1990) ABA and IBA regulation of maturation and accumulation of storage proteins in somatic embryos of interior spruce. Physiol. Plant. 78:355-360.

Redenbaugh K., Paasch B.D., Nichol S.W., Kossler M., Viss P.R. and Walker K.A. (1986) Somatic seeds: Encapsulation of asexual plant embryos. Bio-tech 4:797-801.

Rodin J., Sjodahl S., Josefsson L. and Rask L. (1992) Characterization of Brassica napus gene encoding a cruciferin subunit: estimation of sizes of cruciferin gene families. Plant Mol. Biol. 20:559-563.

Scott M.P., Jung R., Muntz K. and Nielsen N.C. (1992) A protease responsible for post-translational cleavage of a conserved Asn-Gly linkage in glycinin, the major seed storage protein in soybean. Proc. Natl. Acad. Sci. USA 89:659-662.

Shirsat A.H., Meakin P.J. and Gatehouse J.A. (1990) Sequences 5' to the conserved 28 bp Leg box element regulate the expression of pea seed storage protein gene leg A. Plant Mol. Biol. 15:685-693.

Shirsat A., Wilford N., Croy R. and Boulter D. (1989) Sequences responsible for the tissue specific promoter activity of a pea legumin gene in tobacco. Mol. Gen. Genet. 215:326-331.

Shotwell M.A. and Larkins B.A. (1989) The biochemistry and molecular biology of seed storage proteins, in Marcus A. The biochemistry of plants vol. 15. Acad. Press Inc.

Shotwell M.A., Boyer S.K., Chestnut R.S. and Larkins B.A. (1990) Analysis of seed storage proteins of oat. J. Biol. Chem. 265:9652-9658.

Sims T.L. and Goldberg R.B. (1989) The glycinin Gy1 gene from soybean. Nucleic Acids Res. 17:4386.

Skriver K. and Mundy J. (1990) Gene expression in response to ABA and osmotic stress. Plant Cell 2:503-512.

Stabel P., Erikson T. and Engstrom P. (1990) Changes in protein synthesis upon cytokinin-mediated adventitious bud induction and during seedling development in Norway spruce Picea abies. Plant Physiol. 92:1174-1183.

Takaiwa F., Oono K. and Kato A. (1991) Analysis of the 5' flanking region responsible for the endosperm-specific expression of a rice glutelin chimeric gene in transgenic tobacco. Plant Mol. Biol. 16:49-58.

Templeman T.S., DeMaggio A.E. and Stetler D.A. (1987) Biochemistry of fern spore germination: globulin storage protein in Matteuccia struthiopteris L. Plant Physiol. 85:343-349.

Templeman T.S., Stein D.B. and DeMaggio A.E. (1988) A fern spore storage protein is genetically similar to the 1.7S seed storage protein of Brassica napus. Biochem. Genet. 26:595-603.

Templeman T.S. and DeMaggio A.E. (1990) Biochemistry of fern spore proteins: globulin storage proteins in Onoclea sensibilis and Osmunda cinnamomea. Amer. J. Bot. 77:284-287.

Thompson G.A. and Larkins B.A. (1989) Structural elements regulating zein gene expression. BioEssays 10:108-113.

Turner L., Hellens R.P., Lee D. and Ellis T.H.N. (1993) Genetic aspects of the organization of legumin genes in pea. Plant Mol. Biol. 22:101-112.

Wang C., Shastri K., Wen L., Huang J., Sonthayanon B., Muthukrishnan S. and Reeck G.R. (1987) Heterogeneity in cDNA clones encoding rice glutelin. FEBS Lett. 222:135-138.

Weissing K. and Kahl G. (1991) Towards an understanding of plant gene regulation. The action of nuclear factors. Nature 46:1-11.

Wilen R.W., Mandel R.M., Pharis R.P., Holbrook L.A. and Moloney M.M. (1990) Effects of abscisic acid and high osmoticum on storage protein gene expression in microspore embryos of Brassica napus. Plant Physiol. 94:875-881.

