UBC Theses and Dissertations

The isolation and molecular characterization of a 2S albumin gene from Picea glauca
McInnis, Stephanie Marie
1998

THE ISOLATION AND MOLECULAR CHARACTERIZATION OF A 2S ALBUMIN GENE FROM PICEA GLAUCA

by

STEPHANIE MARIE MCINNIS

B.Sc., The University of Victoria, 1985

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE STUDIES
(Department of Plant Science)

We accept this thesis as conforming

THE UNIVERSITY OF BRITISH COLUMBIA
March 1998
© Stephanie Marie McInnis, 1998

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Plant Science
The University of British Columbia
344-2357 Main Mall
Vancouver, Canada V6T 1Z4
Date:

ABSTRACT

2S albumins are a class of small seed storage proteins (SSPs) found widely among the Dicotyledonae and related to the prolamin SSPs of the Monocotyledonae. A genomic clone (PG2S) and related pseudogene with homology to the dicot 2S albumins were sequenced from the gymnosperm, Picea glauca. Northern blot analysis of developing spruce somatic embryos indicated embryo-specific expression of the conifer 2S albumin gene, in a pattern consistent with a seed storage protein. A translational fusion between 2.3 kb of the 5'-flanking region of PG2S and the uidA (β-glucuronidase) reporter gene was constructed to explore promoter function.
Two distal deletions were also created, resulting in promoters deleted to a 5' position of -653 and -117, respectively, relative to the site of transcriptional initiation. The two larger constructs were used in the stable transformation of tobacco (Nicotiana tabacum cv. Xanthi), resulting in embryo-specific expression of the uidA gene in the developing tobacco embryo from long heart stage to embryo maturity, with maximal expression in the torpedo stage embryo. Under control of the PG2S promoter, GUS expression was not detected in any other tobacco tissues. No differences in pattern or strength of expression were observed between the constructs in tobacco. The three PG2S promoter : uidA reporter gene constructs were transiently expressed by microprojectile bombardment in a developmental series of interior spruce (Picea glauca/engelmannii) somatic embryos (proembryo, stage 2, early cotyledonary, mature and partially-dried mature), germinants and pollen. Transient expression of the 2.3 kb promoter construct mirrored 2S albumin mRNA levels in the proembryo, stage 2, early cotyledonary and mature embryo, as well as in germinants. In contrast, high levels of transient expression were observed in partially-dried mature embryos and in pollen, in which low and no 2S albumin mRNA, respectively, was detected. The promoter deletions gave reduced levels of transient expression, but were not altered in seed-specificity. Putative regulatory motifs identified within the promoter were: ACGT, TGCA, CATG, CANNTG, and CCAC(C). Despite evolutionary distance, the 2S albumin promoter from spruce functions in a developmentally regulated, tissue-specific manner in an angiosperm; cis-elements and their co-ordinate trans-acting factors are conserved between gymnosperms and angiosperms.
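The putative regulatory motifs listed above are short enough to be located with a simple pattern search. The following is a minimal sketch, not the method used in the thesis: it assumes that N stands for any base and that the bracketed C in CCAC(C) is optional, it searches one strand only, and the function name and example sequence are made up for illustration.

```python
import re

# Motifs as listed in the abstract. Assumptions: N = any base,
# and the bracketed C in CCAC(C) is optional.
MOTIF_PATTERNS = {
    "ACGT": "ACGT",
    "TGCA": "TGCA",
    "CATG": "CATG",
    "CANNTG": "CA[ACGT]{2}TG",
    "CCAC(C)": "CCACC?",
}


def find_motifs(sequence):
    """Return {motif name: [0-based start positions]} for one strand."""
    seq = sequence.upper()
    hits = {}
    for name, pattern in MOTIF_PATTERNS.items():
        # A lookahead reports overlapping occurrences as well.
        regex = re.compile("(?=(" + pattern + "))")
        hits[name] = [match.start() for match in regex.finditer(seq)]
    return hits


# Hypothetical sequence, not taken from the PG2S promoter.
example = "TTACGTGCATGCCACCAACATGTG"
for motif, positions in find_motifs(example).items():
    print(motif, positions)
```

For the made-up sequence above, CATG is found at positions 7 and 18, and the occurrence at 18 also satisfies CANNTG. A real promoter scan would normally also search the reverse complement.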
TABLE OF CONTENTS

Abstract
Table of Contents
List of Abbreviations
List of Tables
List of Figures
Acknowledgments

CHAPTER ONE - Introduction
1.1 Seed Storage Proteins
1.2 2S Albumin Super Family
    1.2.1 2S Albumin-Related Genes in the Monocotyledonae
    1.2.2 2S Albumins in the Dicotyledonae
    1.2.3 2S Albumin Proteins
    1.2.4 Protein Secondary Structure
    1.2.5 Processing of the Precursor Protein
    1.2.6 Amino Acid Homology
    1.2.7 Practical Applications of 2S Albumin Genes and Proteins
    1.2.8 2S Albumins Identified as Allergens
    1.2.9 2S Albumins with Anti-Pathogenic Activity
1.3 Seed-Specific Expression
1.4 Conifer Seed Storage Proteins and 2S Albumins
1.5 2S Albumin Promoter Studies in Dicotyledonae
1.6 Characterisation of a Picea glauca 2S Albumin Genomic Clone

CHAPTER TWO - Methods and Materials
2.1 General Microbiology Techniques
    2.1.1 Lambda Bacteriophage Protocols
        Preparation of Plating Bacteria Working Stock
        Bacteriophage Multiplication
        Calculation of Bacteriophage Titre
    2.1.2 Isolation of DNA and RNA
        Lambda Bacteriophage DNA
        Escherichia coli Plasmid Mini-prep
        Agrobacterium tumefaciens Plasmid Mini-prep
        Large Scale Plasmid Isolation by Alkaline Lysis
        Cesium Chloride Gradient DNA Purification
        Spruce Genomic DNA
        Spruce RNA Purification
        Tobacco Genomic DNA
    2.1.3 Gel Electrophoresis and Analysis of Nucleic Acids
        Restriction Digest Reactions
        Gel Electrophoresis - DNA
        Southern Blot
        Gel Electrophoresis - RNA
        Northern Blot
    2.1.4 Labelling and Hybridization of Probes
        32P Labelling of Oligonucleotide Probes
        Hybridization of Southern Blots with Oligonucleotide Probes
        32P Labelling by Random Primer Method
        Hybridization of Southern Blots with Randomly Labelled Probes
        Hybridization of Northern Blots
    2.1.5 Ligation Reactions
    2.1.6 Heat Shock Transformation of Competent E. coli Cells
    2.1.7 Generation of Unidirectional Deletions by Exo III Digestion
    2.1.8 2S Albumin Promoter / GUS Fusions
        Isolation of the 2S Albumin Promoter
        Construction of the GUS Fusion Vectors
        Introduction of Binary Vectors into Agrobacterium tumefaciens
2.2 Sequencing
    2.2.1 Radioactive Labeling of Sequencing Primers
    2.2.2 Sequencing Extension/Termination Reactions
    2.2.3 Sequencing Gels - 1.2X TBE
    2.2.4 Sequencing Gels - Formamide
    2.2.5 Assembly of Sequences
2.3 Confirmation of Intron by Polymerase Chain Reaction
2.4 Plant Methods
    2.4.1 Interior Spruce Embryogenic Cultures
    2.4.2 Tobacco Plants
2.5 Gene Expression Experiments
    2.5.1 Transient Expression Experiments
        Interior Spruce Developmental Stages - Target Preparation
        DNA Preparation for Microprojectile Bombardment
        Microprojectile Bombardment
    2.5.2 Stable Expression Experiments
        Tobacco Transformation
        Generation of T1 Plants
    2.5.3 Assessment of β-glucuronidase Activity
        Histochemical GUS Assay
        Fluorometric GUS Assay
        Protein Quantification
        Calculation of β-glucuronidase Enzyme Activity
2.6 Data Analysis and Statistics

CHAPTER THREE - Gene Structure
3.1 Characterisation of λ3.2
    3.1.1 Pseudogene (Ψ2S)
3.2 Picea glauca 2S Albumin (PG2S)
    3.2.1 5' Flanking Sequence
    3.2.2 Intron Sequence
    3.2.3 Comparison of PG2S with Related Conifer cDNAs
3.3 Amino Acid Sequence
3.4 2S Albumin Promoter : uidA Translational Fusions

CHAPTER FOUR - Gene Expression
4.1 Expression of the Native Gene
4.2 Transient Expression in Picea glauca Developmental Stages
4.3 Stable Expression in Tobacco

CHAPTER FIVE - Discussion and Conclusions
5.1 2S Albumin Pseudogene Ψ2S
5.2 Characterisation of a Spruce 2S Albumin Intron
    5.2.1 Promoter Motifs within the 2S Albumin Intron
    5.2.2 Would the Ancestral 2S Albumin Gene Contain an Intron?
5.3 Translation of PG2S
    5.3.1 2S Albumin Cysteine Framework
    5.3.2 2S Albumin Secondary Structure
    5.3.3 Variation in Amino Acid Sequence Among the Conifer 2S Albumins
5.4 Predicted Processing of the Mature 2S Albumin Protein
    5.4.1 Secretory Signal Sequence
    5.4.2 Amino terminal Processed Fragment
    5.4.3 Small Subunit
    5.4.4 Internal Processed Fragment
    5.4.5 Large Subunit and C-Terminal Processed Fragment
    5.4.6 Amino Acid Content of the Predicted Mature Protein
5.5 Similarity to Calmodulin Antagonists
5.6 PG2S Promoter Region
    5.6.1 2S Albumin Developmental Pattern of Expression is Conserved
    5.6.2 Conserved Sequence Motifs
    5.6.3 Cereal-like Promoter Motifs
    5.6.4 ABA Response Elements
    5.6.5 Putative enhancer elements not recognised in tobacco
5.7 Transient Expression
5.8 Conclusions

References Cited
Appendices
    Appendix A 2S Albumins and Related Genes
    Appendix B Buffers and Solutions
               Bacterial and Plant Tissue Culture Media
    Appendix C List of Suppliers
    Appendix D BLAST Sequence Alignment Results
    Appendix E Prediction of Protein Secondary Structure

List of Abbreviations

A          adenine
ABA        abscisic acid
ANOVA      analysis of variance
APS        ammonium persulphate
ATPF       amino terminal processed fragment
ATP        adenosine triphosphate
dATP       deoxyadenosine triphosphate
BA         6-benzylaminopurine
BSA        bovine serum albumin
C          cytosine
cDNA       complementary deoxyribonucleic acid
CDPK       Ca dependent protein kinase
CPM        counts per minute
CTAB       cetyl trimethylammonium bromide
CTPF       carboxy terminal processed fragment
2,4-D      2,4-dichlorophenoxyacetic acid
DEPC       diethyl pyrocarbonate
DNA        deoxyribonucleic acid
DTT        dithiothreitol
EDTA       ethylenediaminetetraacetic acid
EMBL3      lambda phage strain
ER         endoplasmic reticulum
ESM        embryogenic suspensor mass
FAA        formalin acetic acid fixative
G          guanine
GUS        β-glucuronidase
HEPES      N-2-hydroxyethylpiperazine-N'-2'ethanesulfonic acid
Hg         mercury
IBA        3-indolebutyric acid
IPF        internal processed fragment
IPTG       isopropyl β-D-thiogalactopyranoside
LB         Luria-Bertani broth
LM         Litvay's media
mRNA       messenger ribonucleic acid
MOPS       3-[N-morpholino]propanesulfonic acid
MS         Murashige and Skoog medium
MUG        4-methylumbelliferyl β-D-glucuronide
4-MU       4-methylumbelliferone
MW         molecular weight
n          number of individual measurements
NAA        α-naphthaleneacetic acid
NPTII      neomycin phosphotransferase
dNTP       deoxyribonucleoside triphosphate
d/ddNTP    deaza/dideoxyribonucleoside triphosphate
OD         optical density
OPAP       One-Phore-All Plus™ 10X buffer
PCR        polymerase chain reaction
PEG        polyethylene glycol
pfu        plaque forming units
PNFI       Petawawa National Forestry Institute
PNK        polynucleotide kinase
Pu         purine
Py         pyrimidine
RNA        ribonucleic acid
RNase      ribonuclease
SDS        sodium dodecyl sulphate
SE         standard error
SLS        sodium N-lauroyl sarcosine
SSC        saline-sodium citrate buffer
SSPE       saline-sodium phosphate-EDTA buffer
T          thymine
TAE        tris-acetate-EDTA buffer
TBE        tris-borate-EDTA buffer
TE         Tris-EDTA buffer
TEMED      N,N,N',N'-tetramethyl-ethylenediamine
tRNA       transfer RNA
UBC        University of British Columbia
X-Gal      5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside
X-Gluc     5-bromo-4-chloro-3-indolyl-β-D-glucuronide
YEP        yeast extract - peptone medium
YT         yeast extract - tryptone medium

List of Tables

Table 1: Dicot 2S Albumins
Table 2: Transgenic Plants Transformed with 2S Albumin Coding Regions
Table 3: Synthetic Oligonucleotides
Table 4: Matches from BLAST Sequence Alignments
Table 5: Sequence with Similarity to Cereal motifs
Table 6: Amino Acid Composition (%) of PG2S
Table 7: Percent Identity between PG2S and Conifer 2S Albumin cDNAs
Table 8: Matrix of Pairwise Similarity of Conifer 2S Albumins

List of Figures

Figure 1: Dicot 2S Albumin Precursor Protein
Figure 2: Plasmid Constructs
Figure 3: Spruce Somatic Embryo Developmental Stages
Figure 4: λ3.2 Restriction Map
Figure 5: Picea glauca 2S albumin pseudogene Ψ2S
Figure 6: 2S Albumin genomic clone from Picea glauca
Figure 7: Alignment of Ψ2S with the functional 2S Albumin
Figure 8: 2S albumin promoter sequence including the uidA fusion junction
Figure 9: Alignment of 2S Albumin Proximal Promoters
Figure 10: Evidence of Intron within the Spruce 2S Albumin Gene Family
Figure 11: Comparison of the Spruce 2S Albumin genomic and cDNA clones
Figure 12: White Spruce 2S Albumin coding region
Figure 13: Graphical Representation of 2S Albumin Amino Acid Statistics
Figure 14: Plot of Predicted Hydropathicity
Figure 15: Prediction of the 2S Albumin Precursor Protein Secondary Structure
Figure 16: 2S Albumin Super Family Alignment
Figure 17: Alignment of Conifer 2S Albumin Amino Acid Sequences
Figure 18: Dendrogram of the Conifer 2S Albumin Alignment
Figure 19: Spruce 2S Albumin Promoter Constructs
Figure 20: Northern Blot of the Spruce 2S Albumin Gene
Figure 21: Transient GUS Expression in Spruce Somatic Embryos and Germinants
Figure 22: Transient GUS Expression in Spruce Pollen
Figure 23: Expression Pattern of the Spruce 2S Albumin Promoter in Developing Tobacco Seeds
Figure 24: Transformed Tobacco Embryos stained for GUS expression
Figure 25: Relative Strength of pBIN2S and pBIN700
Figure 26: Position of Signal Sequence Cleavage Relative to Hydrophobicity
Figure 27: Conserved Motifs of the Proximal Promoter

ACKNOWLEDGEMENTS

I would like to acknowledge all the support that I have received from my friends and family over the many years it has taken to complete this work. Thanks Mom, Dad and Darryl. Thanks also to the "gang" who are my extended family: Barbara Zatyko, Jane Webster, Lisa Spellacy, Dianne McDonald, Susanna Grimes and Cheryl Dyck.
Thank you for believing in me and encouraging me, and for giving me a good shove when I needed it. Thank you Gertjan, I'm not sure I could have finished this without you. Special thanks to Dr. Craig Newton, who taught me practically everything I know in the lab, and to Dr. Dave Ellis, who taught me to have confidence in the results. The transient expression experiments could not have been completed without the assistance of Dr. Pierre Charest and the staff at the Petawawa National Forestry Institute, especially Yvonne Devantier. I would also like to thank Dr. Carl Douglas for the use of the fluorometer. This research was financially supported in part by the B.C. Science Council G.R.E.A.T. awards and by B.C. Research.

CHAPTER ONE

Introduction

1.1 Seed Storage Proteins

Seed storage proteins have been extensively studied because of their nutritional importance for humans and livestock. These proteins are synthesised during embryo development, generally stored in protein bodies and utilised as a carbon and nitrogen source during seedling germination (reviewed by Bewley and Black, 1994 and Shewry et al., 1995). They have also proven useful as an experimental system to explore gene regulation, as they are expressed at high levels, are strictly regulated in a temporal and tissue-specific manner, and are responsive to environmental and biochemical signals such as abscisic acid (ABA), dehydration and plant nutritive status (reviewed by Morton et al., 1995). It is likely that seed storage proteins share a common ancestor with the fern spore storage proteins, based on amino acid homology between fern spore storage proteins and angiosperm 11S legumin, 7S vicilin and 2S albumin seed storage proteins (Rodin and Rask, 1990, Templeman et al., 1988). Seed storage proteins do not share homology with the vegetative storage proteins which plants accumulate in leaves, stems, and roots (Staswick, 1989), although exceptions do exist, as Beardmore et al.
(1996) found sequence similarity at the protein level, based on proteolytic digestion patterns and antigenic cross-reactivity, between 36 kDa seed and bark storage proteins in poplar. The pod storage proteins of legumes, a type of vegetative storage protein, also show little homology to seed storage proteins, although they accumulate and are mobilized as a nitrogen source to support concurrent seed development (Zhong et al., 1997).

Seed storage proteins were originally classified based on their solubility in defined solutions and methods of extraction (Osborne, 1924). By definition, albumins are soluble in water, globulins in saline, prolamins in alcohol/water, and glutelins in alkali solutions. Storage proteins are also classified based on their sedimentation coefficients, S (Svedberg constant). The main classes of seed storage proteins found in the Dicotyledonae are the 7S vicilins and the 11S legumins, both classed as globulins, and the 2S albumins. In the Monocotyledonae, the dominant class of storage protein consists of the prolamins, though other classes of seed storage proteins are found in lesser amounts (reviewed by Shewry et al., 1995). Seed protein profiles of gymnosperms, though not as well studied, generally resemble those of dicots (Gifford and Tolley, 1989, Misra and Green, 1990, Hakman et al., 1990, Allona et al., 1992, Flinn et al., 1993, Hakman, 1993).

Seed storage proteins are synthesised in large quantities, either by high levels of transcription and translation of a few genes, or by lower levels of expression from multiple genes (Bewley and Black, 1994). These genes are normally only expressed, and the proteins only accumulate, in the seed tissues. In dicot species the storage tissues are located in the embryo, or in both the embryo and endosperm (Thomas, 1993), whereas monocots store the majority of their protein reserves in the endosperm.
In gymnosperms, seed storage proteins have been isolated from both the embryo and the megagametophyte, the haploid maternal nutritive tissue (Bewley and Black, 1994). Upon germination, seed storage proteins are catabolized by proteolytic enzymes, creating a free amino acid pool that is utilised by the germinant until it is able to grow autotrophically.

The general pattern for seed storage protein synthesis involves translation of mRNA in the cytoplasm, followed by targeting of the precursor protein to the endoplasmic reticulum (ER) by a signal sequence, which is cleaved as the peptide enters the ER lumen. The precursor protein usually undergoes further processing in the ER and is then either exported to specialised vacuoles, which become protein bodies, or excreted from the ER in Golgi vesicles (see review by Shewry et al., 1995). In both cases, after leaving the ER, the preprotein is further processed to form the mature protein.

1.2 2S Albumin Super Family

2S albumins are small seed storage proteins ranging in size from 9 to 15 kDa and are the most abundant seed storage protein in many dicot species (Youle and Huang, 1981). Related proteins have also been identified in monocots, fern spores and gymnosperms. The 2S albumins are grouped as a super family based on: (1) amino acid sequence, specifically the strict conservation of eight cysteine residues; (2) protein structure, usually a small and a large subunit joined by disulphide bonds; and (3) seed-specific expression. The 2S albumin and related gene sequences currently found in the NIH GenBank and SwissProt databanks are listed in Appendix A. There are 100 sequences from the Dicotyledonae, 113 entries from the Monocotyledonae, 3 amino acid sequences from the fern Matteuccia struthiopteris and 9 nucleotide sequences (including two from this work) from two conifer species. Not all 2S-size albumins are members of the super family, however.
Examples of unrelated seed storage proteins of similar size and solubility are narbonin from Vicia narbonensis (Nong et al., 1995), PA1 and PA2 from pea (Higgins et al., 1986 and 1987), and AmA1 from Amaranthus spp. (Raina and Datta, 1992). A diverse group of proteins that are not seed storage proteins, but that have a very similar cysteine framework, was reviewed by Yasuda et al. (1997). These proteins, unlike the 2S albumins, tend to be hydrophobic and have high proline or glycine content in their N-terminal regions. Several of this group are expressed in roots and stems, and are also induced by wounding. Interestingly, some of this group are also known to be expressed during embryogenesis: HyPRP from maize (Jose-Estanyol et al., 1992), DC2.15 from carrot (Aleith and Richter, 1990) and SOYBN from soybean (Odani et al., 1987). A tentative case may be made for an evolutionary relationship between the 2S albumin seed storage proteins and these diverse proteins, based on the shared framework of cysteine residues.

1.2.1 2S Albumin-Related Genes in the Monocotyledonae

Kreis et al. (1985) initially proposed an evolutionary relationship between the dicot 2S albumins and the HMW (high molecular weight) prolamins, sulphur-rich prolamins, α-amylase and trypsin inhibitors, and Bowman-Birk protease inhibitors, based on the conservation of three cysteine-containing regions within these proteins. The Bowman-Birk type protease inhibitors, which act against the catalytic activity of trypsin and chymotrypsin found in the digestive tracts of herbivorous insect larvae, are found in both monocot and dicot species. Although Bowman-Birk type inhibitors show sequence homology with the 2S albumins, they are not always seed-specific and some, in fact, are wound inducible (reviewed by Birk, 1985). This divergent group is therefore not dealt with in this review.
The evolutionary trend in cereal protein sequences is the insertion of blocks of more or less repetitive amino acid sequence into an ancestral protein framework related to the dicot 2S albumins. Cereal prolamins are not processed into separate subunits and consequently are larger and less soluble than the dicot 2S albumins (reviewed by Shewry et al., 1995). Not all of the cereal seed storage proteins are larger than the related dicot proteins, however; the rice allergenic proteins are slightly smaller (Adachi et al., 1993). 2S albumin proteins have also been identified as a major storage protein in somatic embryos of the monocot oil palm (Morcillo et al., 1997). The oil palm 2S albumins are cysteine-rich, but are monomeric and apparently lack disulphide bonds.

The conifer seed storage proteins thus far characterised tend to resemble dicot proteins more closely than those of monocots, due most likely to the explosive evolutionary divergence of the cereals (Shutov et al., 1995, Flinn et al., 1993, Dong and Dunstan, 1996). For this reason, the majority of this introduction deals with dicot 2S albumins.

1.2.2 2S Albumins in the Dicotyledonae

The mature dicot 2S albumin proteins generally consist of a small (3 - 4 kDa) and a large (7 - 9 kDa) subunit joined by disulphide bridges. These proteins have been identified in numerous dicot families (Table 1). Additional unpublished sequences from other species can also be found in the SwissProt and GenBank databases (Appendix A). Some of the 2S albumin proteins characterised have "common" names, e.g., napin from Brassica napus, arabidin from Arabidopsis thaliana, mabinlin from Capparis masaikai, conglutin-δ from Lupinus angustifolius.
The large and small subunits of the mature protein are generally cleaved from a single precursor peptide (Arabidopsis thaliana - Krebbers et al., 1988; Bertholletia excelsa - Altenbach et al., 1986, De Castro et al., 1987; Brassica napus - Crouch et al., 1983, Josefson et al., 1987, Scofield and Crouch, 1987, Ericson et al., 1986, Baszcynski and Fallis, 1990; Sinapis alba - Menendez-Arias et al., 1987; Lupinus angustifolius - Gayler et al., 1990; Gossypium hirsutum - Galau et al., 1992).

Table 1: Dicot 2S Albumins

Family | Species | Reference
Brassicaceae | Brassica napus, B. rapa, B. oleracea, B. campestris, B. nigra, B. juncea, and B. carinata | Monsalve and Rodriguez, 1990; Dasgupta et al., 1995
Brassicaceae | B. napus | Lonnerdahl et al., 1972
Brassicaceae | B. napus | Monsalve R.I. et al., 1991a
Brassicaceae | B. campestris | Dasgupta S. and Mandal R.K., 1991
Brassicaceae | Raphanus sativus | Laroche et al., 1984; Monsalve et al., 1994; Laroche-Raynal M. and Delseny M., 1986; Raynal et al., 1991
Brassicaceae | Sinapis alba | Menendez-Arias et al., 1988
Brassicaceae | Sinapis arvensis | Svendsen et al., 1994
Fabaceae | Lupinus angustifolius; L. albus | Gayler et al., 1990; Salmanowicz and Weder, 1997
Fabaceae | Medicago sativa | Coulter and Bewley, 1990
Euphorbiaceae | Ricinus communis | Li et al., 1977; Sharief and Li, 1982
Cucurbitaceae | Cucurbita spp. | Hara-Nishimura, 1993
Cucurbitaceae | Luffa cylindrica | Ishihara et al., 1997
Cucurbitaceae | Momordica charantia | Li, 1977
Lecythidaceae | Bertholletia excelsa | Ampe et al., 1986; Sun et al., 1987
Lecythidaceae | Lecythis zabucajo, Couroupita quianensis | Sun et al., 1996
Asteraceae | Helianthus annuus | Allen et al., 1987; Kortt and Caldwell, 1990
Papaveraceae | Papaver somniferum | Srinivas and Rao, 1987
Malvaceae | Gossypium hirsutum | Youle and Huang, 1979

The sunflower (Helianthus annuus) 2S albumins are an exception to the rule, since the small and large subunits remain uncleaved, resulting in a monomeric mature protein (Kortt and Caldwell, 1990, Kortt et al., 1991, Anisimova et al., 1995).
Also, two genes from sunflower (Allen et al., 1987, Thoyts et al., 1996) and one from castor bean (Ricinus communis) (Irwin et al., 1990, Godinho da Silva et al., 1996) have been identified which encode two expressed 2S albumin proteins in tandem. At least one sunflower cDNA (SFA8) encodes a "single" 2S albumin (Kortt et al., 1991) similar to the majority of the dicot 2S albumins. Sequencing of genomic 2S albumin genes has revealed that the majority of dicot 2S albumins are without introns, with the exception of Brazil nut (Bertholletia excelsa) (Gander et al., 1991) and sunflower (Allen et al., 1987). All of the related monocot genes sequenced to this point also lack introns.

1.2.3 2S Albumin Proteins

Mature 2S albumin proteins have been shown to be localised within the cell in specialised membrane-bound organelles known as protein bodies (A. thaliana - De Clercq et al., 1990a, Bertholletia excelsa - Altenbach et al., 1986, Medicago sativa - Coulter and Bewley, 1990), though a sunflower 2S albumin protein has been found to be associated with lipid bodies (Thoyts et al., 1996). Krochko et al. (1994) observed the abnormal accumulation of alfalfa 2S protein in the cytoplasm rather than in the protein bodies of somatic embryos which were deficient in 2S and 11S proteins compared to zygotic embryos.

The 2S albumin protein fraction generally consists of several variant isoforms within the seed (Ampe et al., 1986, Ishihara et al., 1997, Anisimova et al., 1995, Kortt and Caldwell, 1990, Irwin et al., 1990, Gayler et al., 1990, Coulter and Bewley, 1990, Monsalve et al., 1994, Gehrig et al., 1996), and the pattern of isoform expression has been used as a polymorphic character to determine relatedness among and within species (Salmanowicz and Przybylska, 1994, Salmanowicz, 1995, Anisimova et al., 1995, Przybylska and Zimniak-Przybylska, 1995). 2S albumin genes are found in multigene families: five copies in A.
thaliana (Krebbers et al., 1988, van de Klei et al., 1993), at least five copies in Brazil nut (Gander et al., 1991), ten to sixteen copies in B. napus (Josefson et al., 1987, Scofield and Crouch, 1987), eight to twelve copies in Raphanus sativus (Raynal et al., 1991) and four copies in castor bean (Irwin et al., 1990). The isoforms are apparently the product of these multigene families (Krebbers et al., 1988, Muren and Rask, 1996, Gonzalez de la Pena et al., 1996), though examples of alternate proteolytic processing have been observed at the amino-terminal peptide (D'Hondt et al., 1993b) and at the carboxy-terminus (Monsalve and Rodriguez, 1990; Monsalve et al., 1991, Muren et al., 1995, Godinho da Silva Jr. et al., 1996, Gehrig et al., 1996). As well, as mentioned above, in sunflower and castor bean a single mRNA has been shown to code for two complete proteins.

1.2.4 Protein Secondary Structure

Disulphide bridges are formed within the ER prior to removal of the linker peptide joining the large and small subunits (Crouch et al., 1983, Coulter and Bewley, 1990, Monsalve et al., 1991b, Hara-Nishimura, 1993) and are important in stabilisation of the folded structure of the napin protein (Schwenke et al., 1988). The interchain and intrachain positions of the disulphide bonds have been mapped in Lupinus albus (Salmanowicz and Weder, 1997), Lupinus angustifolius (Lilley and Inglis, 1986), the sunflower 2S albumin SFA8 (Egorov et al., 1996), B. napus napin (Gehrig and Biemann, 1996) and mabinlin from Capparis masaikai (Nirasawa et al., 1993). Based on the above research, the pattern of disulphide bonds is conserved among the dicots (Figure 1) and consists of interchain bonds between the first Cys residue of the small chain and the third Cys of the large chain, and between the second Cys of the small chain and the first Cys of the large chain. Intrachain bonds occur in the large chain between the second Cys and the fifth Cys residue, and between the fourth Cys and the sixth Cys.
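The bonding pattern just described can be written down as a small data structure, which makes it easy to check that every conserved cysteine takes part in exactly one bridge. The sketch below is illustrative only: the split of the eight cysteines into two on the small chain and six on the large chain is inferred from the bond list in the text, and the names used are hypothetical.

```python
# Each cysteine is identified as (chain, ordinal position among the Cys
# residues of that chain), following the description in the text.
DISULPHIDE_BONDS = [
    (("small", 1), ("large", 3)),  # interchain
    (("small", 2), ("large", 1)),  # interchain
    (("large", 2), ("large", 5)),  # intrachain
    (("large", 4), ("large", 6)),  # intrachain
]


def bond_type(bond):
    """Classify a bond by whether its two cysteines lie on the same chain."""
    (chain_a, _), (chain_b, _) = bond
    return "intrachain" if chain_a == chain_b else "interchain"


def check_pairing(bonds, n_small=2, n_large=6):
    """Return True if every expected cysteine is used in exactly one bond.

    n_small and n_large are assumptions inferred from the bond list
    (two Cys in the small chain, six in the large chain, eight in total).
    """
    expected = {("small", i) for i in range(1, n_small + 1)}
    expected |= {("large", i) for i in range(1, n_large + 1)}
    used = [cys for bond in bonds for cys in bond]
    return len(used) == len(set(used)) and set(used) == expected


print([bond_type(b) for b in DISULPHIDE_BONDS])
print(check_pairing(DISULPHIDE_BONDS))
```

With the four bonds listed, all eight conserved cysteines are accounted for, two bridges linking the chains and two lying within the large chain.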
Egorov et al. (1996) found that related monocot 2S albumin-like proteins, the alpha and gamma gliadins, have a pattern of disulphide bonds very similar to that seen in the dicot proteins, but that the pattern is less conserved in the related wheat α-amylase inhibitors. The secondary structure of a napin protein has been determined by NMR and the tertiary structure calculated, resulting in a predicted structure consisting of five helices and a C-terminal loop (Rico et al., 1996).

[Figure 1 image not reproduced; recoverable labels: small subunit, large subunit, NH2, COOH, Cys, CTPF, di-sulphide bridges]

Figure 1: Dicot 2S Albumin Prepropeptide. Representation of the initial product of translation. The ER (endoplasmic reticulum) signal sequence targets the precursor protein to the ER. A, B and C are regions of somewhat conserved sequence which contain the eight highly conserved cysteine residues. V is the variable region, which has little homology between proteins. ATPF is the amino-terminal processed fragment, IPF is the internal processed fragment or "linker" region, and CTPF is the carboxy-terminal processed fragment. The disulphide bridges are formed in the lumen of the ER, prior to the precursor protein being transported to the protein body, where the processed fragments are removed.

1.2.5 Processing of the Precursor Protein

Comparisons between mature 2S albumin protein amino acid sequences and cDNA sequences have revealed that the initial product of translation is extensively processed, resulting in removal of approximately 20% of the amino acid sequence (De Castro et al., 1987, Ericson et al., 1986, Krebbers et al., 1988). The first step in the processing of the 2S albumin precursor protein is the removal of an ER signal sequence as it enters the lumen of the ER from the cytoplasm. The disulphide bonds form between the conserved cysteine residues within the lumen of the ER.
The propeptide is then targeted to the vacuole, where the amino-terminal peptide, and an internal linker region located between the small and large subunits, are removed, along with a few amino acids from the carboxy-terminus of the protein precursor (Figure 1) (see review by Shewry et al., 1995, Ampe et al., 1986, Krebbers et al., 1988, D'Hondt et al., 1993b, Muren and Rask, 1996, Altenbach et al., 1986, De Castro et al., 1987, Thoyts et al., 1996). Gayler et al. (1990) found that in Lupinus angustifolius the conglutin-δ precursor protein lacks an amino-terminal propeptide, i.e. the signal peptide is directly attached to the small subunit. In addition, the carboxy-terminus of the large subunit was not removed.

The extensive processing undergone by the precursor protein to give the mature protein, and the somewhat conserved nature of the fragments removed in processing, suggest that these regions are necessary or have some function. Detailed studies of the proteolytic processing of 2S albumin precursor peptides from Arabidopsis and B. napus have revealed that deletion or mutation of the processed fragments has little effect on targeting of the protein to protein bodies or on formation of the mature protein (Muren et al., 1995, D'Hondt et al., 1993b). However, deletion or modification of the Arabidopsis 2S albumin internal linker propeptide resulted in less efficient processing of the precursor protein, as significant amounts of the precursor remained unprocessed (D'Hondt et al., 1993b). This may not be due to structural changes within the precursor protein, since the presence of the napin propeptides was not found to significantly alter the conformation of the precursor protein in comparison to the mature napin (Muren et al., 1996). No function has yet been assigned to these propeptides, though the removal of the amino-terminal and linker regions appears metabolically wasteful. However, Saalbach et al.
(1996) have determined that the processed four amino acid carboxy-terminal fragment (IAGF) from the Brazil nut 2S albumin is an essential part of a 20 amino acid carboxy-terminal sequence which targets the 2S albumin precursor protein to the vacuole in transgenic tobacco leaves. An integral membrane protein, BP-80, has been identified which may be a vacuolar targeting receptor, as it binds the Brazil nut 2S albumin carboxy-terminal fragment, as well as the amino-terminal sequences of two other proteins targeted to the vacuole (Kirsch et al., 1996). The enzymes involved in the proteolytic processing of the 2S albumin proteins from the precursor to the mature form have only partially been characterised. The lack of conserved amino acid motifs within the processed regions of the peptide which would act as recognition sites for a proteolytic enzyme led Muren et al. (1995) to hypothesise that the proteolytic enzyme had low coding sequence specificity or that the proteolytic enzyme may recognise sequences which were not located at the exact site of processing. Alternatively, the prepropeptide may be processed by multiple enzymes recognising multiple motifs (Muren et al., 1995). Monsalve et al. (1990) hypothesise the existence of a beta-turn specific endoprotease in B. napus, based on the cleavage of the precursor protein at sites that are tetra-peptides with high beta-turn probabilities. One of the proteases involved in proteolytic processing is believed to be an aspartic proteinase, which cleaves at multiple sites (preferring Phe-Asp, but also recognising Asp-Met, Asp-Asp, and Asp-Ser) within the processed fragments in both Arabidopsis and B. napus 2S storage proteins (Muren and Rask, 1996, D'Hondt et al., 1993a). Hara-Nishimura et al. (1995) isolated an asparaginyl endopeptidase, a type of cysteine proteinase, from the vacuole of castor bean and soybean.
This enzyme was able to process 2S albumins and an 11S globulin by cleaving peptide bonds at the carboxy-terminal side of an exposed asparagine. The mature 2S albumin protein is not glycosylated in A. thaliana (Krebbers et al., 1988) or Lupinus angustifolius (Gayler et al., 1990). Phosphorylation of serine residues by a calcium dependent protein kinase (CDPK) has been observed to occur in both the small and large chain of specific kohlrabi napin proteins (Neumann et al., 1996a, 1996b). A subset of napin-like proteins, characterised in radish (Polya et al., 1993), bitter melon and castor bean (Neumann et al., 1996c), is also phosphorylated at specific small subunit serine residues by the same kinase. Though the significance of this phosphorylation is unknown, Neumann et al. (1996c) suggest that it may be related to plant defensive mechanisms, as other cysteine-rich proteins (Bowman-Birk protease inhibitors, lipid transfer proteins, and γ-thionins) which are known to be involved in defence against pathogens are also phosphorylated by CDPK.

1.2.6 Amino Acid Homology

Amino acid sequences within the 2S albumin super family are quite variable, with the exception of the eight highly conserved cysteine residues which form the disulphide bridges that maintain the secondary structure of the mature protein. Amino acid identities of 2S albumins within the Brassicaceae are in the order of 70% (86% for processed regions of the protein and 66% for mature regions of the protein), and between Brassica and Brazil nut the identity drops to 22-24% (Gander et al., 1991). Identity between the mature Brazil nut and castor bean proteins is only 36% (Gander et al., 1991) and 44% between the cotton Mat5A, Brazil nut and Arabidopsis amino acid sequences (Galau et al., 1992).
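Identity figures such as those quoted above depend on how the comparison is scored. The following is a minimal Python sketch of one common convention, assuming the two sequences have already been aligned (gaps written as "-") and counting only columns in which neither sequence has a gap. The function name and the toy sequences are hypothetical illustrations, not real 2S albumin sequences, and the cited studies may have used different conventions.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over ungapped columns of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0  # columns where both sequences have a residue
    matches = 0   # compared columns with identical residues
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment (not real 2S albumin sequences).
toy_a = "MAKLCC-QQLR"
toy_b = "MSKLCCEQHLR"
print(percent_identity(toy_a, toy_b))
```

Counting gapped columns as mismatches instead would lower the reported value, which is one reason identity percentages from different studies are not always directly comparable.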
Sequence similarity at the nucleotide level between members of a 2S albumin gene family was found to be 85% within the genus Brassica (Dasgupta et al., 1995) and 88% within the genus Lupinus (Salmanowicz and Weder, 1997). Antigenic cross-reactivity is another measure of protein similarity. Cross-reactivity was found between a 2S albumin isolated from Brassica campestris and other similar sized seed proteins in the Brassicaceae, but no cross-reactivity was observed with seed storage proteins of mung bean and tobacco (Dasgupta and Mandal, 1991). Monsalve and Rodriguez (1990) also found that antibodies raised against the mustard allergen Sin a I recognised 2S albumins from B. napus, B. rapa, and B. oleracea. Coulter and Bewley (1990) found only one instance of serological cross-reactivity between an antibody raised against an alfalfa 2S albumin and other legume seed storage protein extracts, indicating low amino acid homology among the legume 2S albumins.

1.2.7 Practical Applications of 2S Albumin Genes and Proteins

Many 2S albumin proteins are sulphur-rich due to high cysteine and methionine content relative to other seed storage proteins. A body of research has developed based on identifying 2S albumins with nutritional qualities of interest from different species, as well as modifying previously characterised 2S albumin genes for increased nutritive value. Plant Genetic Systems NV (Brussels, Belgium) has patented a process for increasing the nutritional content of plants by modifying 2S albumin genes (US5589615, 1997). Amino acid sequence comparisons among the dicot 2S albumin proteins indicate that the region located between the fourth and fifth Cys residues of the large subunit has a high degree of variability (Krebbers et al., 1993) (Figure 1). This "variable region" has since been shown to tolerate the addition of small biologically active peptides, as well as the addition or substitution of nutritionally important amino acids (reviewed in Table 2).
Genetic engineering of 2S albumin coding regions between species results in correct processing of the precursor peptide, as well as targeting to protein bodies, though levels of protein accumulation are dependent upon the promoter used to drive expression (reviewed in Habben and Larkins, 1995). Forage legumes have been engineered for foliar expression of 2S albumin proteins by placing the coding region under the control of a 35S CaMV promoter. This results in low levels of protein accumulation throughout the plant, with protein being targeted to leaf mesophyll vacuoles (Saalbach et al., 1994). Chimeric 2S albumins with added ER retention signals have been used to prevent transport of the protein to the leaf vacuole where proteases may degrade the protein (Tabe et al., 1993, Tabe et al., 1995 and Khan et al., 1996). An anti-sense napin gene under the control of the napin promoter resulted in transgenic B. napus seeds with reduced or no napin, a concomitant increase in the amount of the seed storage protein cruciferin and altered fatty acid composition (Kohno-Murase et al., 1994).

Table 2: Transgenic Plants Transformed with 2S Albumin Coding Regions

2S albumin coding region from | Modification | Transgenic expression | Reference
Arabidopsis thaliana | addition of a neuro-peptide to the variable region | A. thaliana and B. napus | Vandekerckhove et al., 1989
Arabidopsis thaliana | - | seed-specific in N. tabacum | De Clercq et al., 1990a
Arabidopsis thaliana, Bertholletia excelsa | addition of Met residues to the variable region | seed-specific in A. thaliana, B. napus, and N. tabacum | De Clercq et al., 1990b
Arabidopsis thaliana | addition of Lys residues and 28 amino acid peptide from Xenopus, to the variable region | seed-specific in Brassica napus | Krebbers et al., 1991
Arabidopsis thaliana | addition of Lys residues to the variable region | seed-specific in A. thaliana | Conceicao and Krebbers, 1994
Bertholletia excelsa | - | seed-specific in N. tabacum | Altenbach et al., 1989
Bertholletia excelsa | signal peptide from a soybean lectin gene | seed-specific in B. napus | Guerche et al., 1990
Bertholletia excelsa | - | seed-specific in B. napus | Altenbach et al., 1992
Bertholletia excelsa | - | constitutive expression in N. tabacum and Vicia narbonensis | Saalbach et al., 1994
Bertholletia excelsa | - | constitutive and seed-specific expression in N. tabacum and Vicia narbonensis | Saalbach et al., 1995a
Bertholletia excelsa | - | constitutive and seed-specific expression in N. tabacum and Vicia narbonensis | Saalbach et al., 1995b
Bertholletia excelsa | - | seed-specific in B. napus | Denis et al., 1995a, 1995b, 1996
Bertholletia excelsa | addition of 5 Trp residues to the variable region; single substitutions: Leu to Trp or Arg 80 to Trp | constitutive expression in N. tabacum | Marcellino et al., 1996
Bertholletia excelsa | - | seed-specific in Vicia narbonensis | Pickardt et al., 1995
Bertholletia excelsa | - | seed-specific in Glycine max | Nordlee et al., 1996
Bertholletia excelsa | - | constitutive expression in Phaseolus vulgaris | Aragao et al., 1996
Bertholletia excelsa, Lecythis zabucajo | addition of Met residues in the variable region | constitutive and seed-specific in Solanum tuberosum and N. tabacum | Sun et al., 1996
Bertholletia excelsa | - | transiently expressed under constitutive and seed-specific promoters in Arachis hypogaea embryos | Lacorte et al., 1997
Brassica juncea | - | constitutive and seed-specific expression in N. tabacum | Ghosh et al., 1995
Brassica napus | modified 3' UTR for identification of mRNA | seed-specific in Brassica napus | Radke et al., 1988
Brassica napus | - | seed-specific in N. tabacum | Stayton et al., 1991
Helianthus annuus | 3' addition of an ER retention signal (amino acids: SEKDEL) | seed-specific in Pisum sativum and Lupin; foliar expression in lucerne and Trifolium subterraneum | Tabe et al., 1993
Helianthus annuus | 3' addition of an ER retention signal peptide | constitutive and foliar expression in Medicago sativa | Tabe et al., 1995
Helianthus annuus | 3' addition of an ER retention signal | foliar expression in Trifolium subterraneum | Khan et al., 1996
Helianthus annuus | - | seed-specific in Lupinus angustifolius | Molvig et al., 1997
Pisum sativum | - | seed-specific in N. tabacum and B. napus | Stayton et al., 1991

2S albumin proteins have also been expressed in prokaryotic and eukaryotic expression systems. Gonzalez de la Peña et al. (1996) expressed the mustard allergen Sin a I as a small and large subunit joined by the internal processed fragment in Escherichia coli. Though 98% of the chimeric protein was located in insoluble inclusion bodies, a soluble fraction was purified which shared many characteristics of the purified mature protein, suggesting correct three-dimensional folding. D'Hondt et al. (1993b) expressed the 2S albumin protein from Arabidopsis in yeast, but found that the precursor protein was not processed to the mature form. In contrast, Pal and Biswas (1995) reported the expression in yeast of a second Arabidopsis 2S albumin under the control of its own promoter, which was correctly processed and localised in vacuolar bodies. The precursor form of napin has also been expressed in a baculovirus system, where the signal peptide was cleaved from the prepronapin by the insect cells, but no further processing occurred (Muren and Rask, 1996). The mabinlins comprise a group of 2S albumin proteins isolated from the seeds of Capparis masaikai with the unusual property of being able to elicit a sweet taste (Liu et al., 1993).
Individual mabinlin isoforms have been characterised as being 400 times sweeter than sucrose, and heat stable at commercial processing temperatures (Nirasawa et al., 1994, Sun et al., 1996). These proteins therefore have potential as "low calorie" sweeteners for the processed food industry (Nirasawa et al., 1993). Structural analysis of the mabinlin isoforms in relation to their food processing characteristics has revealed that "sweetness" is lost upon reduction of the disulphide bridges (Nirasawa et al., 1993) and that differential heat stability is associated with a single amino acid variation in the large subunit (Nirasawa et al., 1994). Another group of 2S albumins with potential uses in industry has been identified from sunflower. Certain sunflower 2S albumins may be useful as emulsifiers or foaming agents (Gueguen et al., 1996).

1.2.8 2S Albumins Identified as Allergens

A number of the 2S albumin proteins characterised from various plant families have been shown to be allergenic, such as Bra j IE from Brassica juncea (Gonzalez de la Peña et al., 1991, Monsalve et al., 1993) and Sin a I from mustard (Menendez-Arias et al., 1988, Dominguez et al., 1990, Gonzalez de la Peña et al., 1996). The allergenicity of these two proteins is associated with the presence of a histidine residue in the large chain, which acts as an epitope for antibody recognition (Monsalve et al., 1993). In addition, conformational epitopes have been detected in the mustard allergen, Sin a I (Menendez-Arias et al., 1990). Other 2S albumins cited as being allergens have been characterized in rapeseed (Brassica napus) (Monsalve et al., 1997), castor bean (Ricinus communis) (Youle and Huang, 1978b, Thorpe et al., 1988, Machado and Godinho Da Silva Jr., 1992), soybean (Glycine max) (Moroz and Yang, 1980, Burks et al., 1988) and cotton (Gossypium hirsutum) (Youle and Huang, 1979).
Within the Monocotyledonae, members of the α-amylase/trypsin inhibitor family have been characterized as allergens in rice (Adachi et al., 1993, Nakase et al., 1996), barley (Barber et al., 1989) and wheat (Gomez et al., 1990). Nordlee et al. (1996) found that the 2S albumin from Brazil nut is a major allergen in individuals allergic to the nut and that transgenic soybeans which expressed the Brazil nut 2S albumin coding region would also induce an allergic reaction. A patient allergic to Brazil nut was shown to be specifically sensitive to both the 2S albumin and 12S legumin Brazil nut seed storage proteins, as well as having a significant reaction to presumably related proteins in hazel nut and mustard (Bartolome et al., 1997). The unusual stability of 2S albumin proteins may allow them to cross from the digestive tract into the blood stream and elicit an IgE-mediated immune reaction (Gonzalez de la Peña et al., 1996). Inhalation has also been shown to elicit an allergic response (Barber et al., 1989, Gomez et al., 1990, Monsalve et al., 1997). The potential for allergenicity shown by 2S albumins has important implications for their use in genetic engineering to improve nutritive quality (Nestle, 1996).

1.2.9 2S Albumins with Anti-Pathogenic Activity

Some 2S albumins may have a dual function as both seed storage protein and anti-fungal protein within the seed (Terras et al., 1992, 1993a). The antifungal activity of radish (Raphanus sativus) and B. napus 2S albumins is due to their ability to render the membranes of fungal hyphae excessively permeable (Terras et al., 1993b). Both of these dicot 2S albumins, as well as three related barley seed proteins (a trypsin inhibitor, and two Bowman-Birk-type inhibitors), act synergistically to increase the fungal inhibition of a purified barley thionin in vitro (Terras et al., 1993b). Kreis et al.
(1985) proposed that dicot 2S albumin storage proteins and the cereal trypsin and α-amylase inhibitors, which have insecticidal properties, were related based on amino acid homologies of three domains within these proteins. Numerous endosperm specific cereal α-amylase and/or trypsin inhibitors have been characterized. These proteins are similar in size to the dicot 2S albumins and contain ten conserved cysteine residues, eight of which align with the conserved cysteines characteristic of 2S albumins (Garcia-Olmedo et al., 1987, Rasmussen and Johansson, 1992). However, comparison of secondary structure between the dicot 2S albumins and α-amylase inhibitors from wheat and Indian finger millet indicates that the pattern of disulphide bonds is only partially conserved between the two groups (Egorov et al., 1996). 2S albumin small and large subunits from various species (Brassica napus, Momordica charantia, Ricinus communis, Raphanus sativus, Sinapis alba) have been identified as calmodulin antagonists by Polya et al. (1993) and Neumann et al. (1996a, 1996b, and 1996c). These researchers theorise that the calmodulin inhibitory activity may be involved with anti-fungal activity.

1.3 Seed-specific expression

The general pattern of expression of 2S albumin genes is seed-specific, with low expression during the early stages of embryo development, which increases to a high level during cotyledon development, and then declines or is turned off at seed maturity. Napin mRNA first appears during the late heart stage (Fernandez et al., 1991), and can also be detected during later developmental stages in the endosperm (Höglund et al., 1991, DeLisle and Crouch, 1989). In the mature seed of Arabidopsis thaliana, 2S albumin protein is localised within the embryo and vestigial endosperm (De Clercq et al., 1990a).
Differential expression within the seed by individual members of 2S albumin gene families has been observed in Arabidopsis (Guerche et al., 1990), Raphanus sativus (radish) (Laroche-Raynal and Delseny, 1986), and B. napus (Blundy et al., 1991). In Bertholletia excelsa (Brazil nut), where the hypocotyl is the main storage tissue of the embryo, 2S albumin mRNA is not detected until late embryo development, in stages 3 and 4 (Gander et al., 1991). In Ricinus communis (castor bean), 2S albumins accumulate in the endosperm during later embryo developmental stages and continue to accumulate during the desiccation stage (Irwin et al., 1990). Coulter and Bewley (1990) found that the Medicago sativa (alfalfa) 2S albumin protein began to accumulate in the early cotyledonary embryo stage and was degraded 72 hours post-germination. In a few cases, napin expression has also been associated with pollen development. A napin promoter fused to an exotoxin A gene from Pseudomonas aeruginosa resulted in male sterile transgenic tobacco as well as blockage of embryo formation at the stage of napin accumulation (Koning et al., 1992). Interestingly, transgenic B. napus plants containing the same construct had viable pollen, suggesting that the napin promoter functioned differently in tobacco. Boutilier et al. (1994) further identified a subfamily of napin clones expressed in B. napus microspores induced to undergo embryogenesis. Expression was biphasic, occurring as a response to induction and again later as storage proteins began to accumulate in the microspore derived embryos.

1.4 Conifer Seed Storage Proteins and 2S Albumins

Small (10 kDa to 18 kDa) seed storage proteins have been identified in several gymnosperm species including Pinus contorta (Lammer and Gifford, 1989), Picea glauca/engelmannii (Flinn et al., 1993), Picea abies (Hakman, 1993), Pinus pinaster (Allona et al., 1992 and 1994) and Pinus taeda (King and Gifford, 1997).
Other research examining conifer seed storage protein profiles does not show proteins smaller than 14.4 kDa (e.g. Groome et al., 1991); however, because of their small size, these proteins could have run off the bottom of SDS-PAGE gels and been missed. Allona et al. (1994) characterized four low molecular weight globulins from Pinus pinaster, and found that each consisted of a small and large subunit joined by disulphide bridges, that they had high Arg and Glx (Glu and Gln combined) contents and that they probably contained the same number of cysteine residues as dicot 2S albumins. Flinn et al. (1993) identified a 15 kDa seed storage protein in Picea glauca/engelmannii, consisting of large and small subunits joined by disulphide bonds. A cDNA clone (GenBank accession X63193 - Newton, 1991) coding for a putative 20 kDa peptide had earlier been isolated from late cotyledonary spruce (Picea glauca/engelmannii) somatic embryos. This clone had homology to the dicot 2S albumins and was expressed during the same embryo developmental stages as the protein. Five other conifer cDNA clones showing amino acid homology to the 2S albumin super family have been sequenced from Pinus strobus (GenBank accessions X62433, X62434, X62435 and X62436; Rice and Kamalay, 1991) and Picea glauca (GenBank accession L47745; Dong and Dunstan, 1995, Dong and Dunstan, 1996). Flinn et al. (1993) found that 2S albumin mRNA accumulated in interior spruce (Picea glauca/engelmannii) zygotic embryos, as well as somatic embryos cultured on 40 µM abscisic acid (ABA) maturation medium. 2S albumin mRNA was first detected at the early cotyledonary stage and continued through to embryo maturity. In addition, it was observed that somatic embryos maturing on sub-optimal levels of ABA tended to germinate precociously and that the amount of 2S albumin mRNA declined as precocious germination proceeded.
The authors indicated that 2S albumin expression was up-regulated in response to ABA, and to high osmoticum caused by the addition of mannitol. Conversely, Dong and Dunstan (1996) found that a P. glauca 2S albumin cDNA (PgEMB25), though showing essentially the same pattern of developmental regulation as that observed by Flinn et al. (1993), was not up-regulated in response to ABA or the osmoticum PEG (polyethylene glycol) in P. glauca somatic embryo suspension cultures.

1.5 2S Albumin Promoter Studies in Dicotyledonae

The upstream regulatory regions, or promoters, of seed storage protein genes are of particular interest because they contain regions which are thought to mediate the high levels and precise timing of gene expression, as well as embryo and tissue specificity within the seed. Discrete regions, referred to as cis elements, are thought to interact with nuclear proteins known as transcription, or trans-acting, factors. Some of these regions are involved in the transcriptional response of genes to ABA, desiccation and plant nutritional levels (see review by Thomas, 1993, Bewley and Black, 1994 and Morton et al., 1995). Promoter function has been explored using three main strategies. The primary method used to elucidate function is the fusion of the promoter to a reporter gene, followed by the transient or stable transformation of cells, isolated tissues or whole plants. This allows the pattern and relative strength of expression of a promoter to be measured and/or visualised, depending on the reporter gene used. The reporter gene most commonly used is uidA encoding the enzyme β-glucuronidase (Jefferson et al., 1987).
Other reporter genes which are suitable but used less frequently are bar, encoding phosphinothricin acetyltransferase; CAT, encoding chloramphenicol acetyl transferase; lux, encoding luciferase; and various toxin genes which encode products lethal to the cell such as diphtheria toxin A (DT-A), Pseudomonas aeruginosa exotoxin A or RNase. Lethal reporter genes are used to identify extremely low levels of expression and to pinpoint the earliest moment of expression by causing cell ablation, i.e. death of any cell which expresses the gene product. Further dissection of promoter function may be accomplished by progressive deletion of the promoter or by mutagenesis of specific elements within the promoter. Regions may also be duplicated or exchanged between promoters. A variation of this technique is to fuse regions of interest, sometimes repeated in tandem, to a minimal promoter to explore the minimal amount of sequence information necessary to produce a specific pattern of expression. Quantification of response to environmental or biochemical signals, such as dehydration or ABA (reviewed by Quatrano et al., 1993), is also possible using such gene constructs. As well, mutants of a signal transduction pathway may be characterized by transformation with promoter constructs suspected to be downstream of or to be affected by the mutation. A second, indirect method used to identify potentially important regions within the promoter is sequence comparison. Within the coding regions of genes, conservation of amino acid sequence among members of a gene family is generally considered to be an indication of conserved function over evolutionary time. Similarly, attempts have been made to identify conserved promoter elements having functional significance by comparing 5' flanking regions of genes which are related evolutionarily, or are co-ordinately regulated, or which respond to the same signals (environmental or biochemical).
The relative importance of these conserved elements to promoter function can then be confirmed by their mutation or deletion from promoter/reporter gene fusions. A third strategy for identifying cis elements is to locate the regions of a promoter where nuclear proteins bind by using DNase footprinting or gel retardation protocols. Transcription factors are nuclear proteins which bind specific DNA sequences, thereby activating or repressing gene expression. Transcription factors act as links in signal transduction pathways. They are themselves regulated and can co-ordinate sets of genes by their binding to common promoter elements. The function of 2S albumin promoters isolated from Arabidopsis thaliana (De Clercq et al., 1990a, Conceicao and Krebbers, 1994), Brassica napus (Radke et al., 1988, Koning et al., 1992, Stalberg et al., 1993, Ellerstrom et al., 1996, Stalberg et al., 1996), and Bertholletia excelsa (Brazil nut) (Grossi de Sa et al., 1994, Vincentz et al., 1997) has been explored by deletion studies investigating gain and loss of expression in transgenic tobacco, as well as transgenic Arabidopsis and B. napus. These studies have revealed a general picture in which dicot 2S albumin promoters direct developmentally-regulated, seed-specific expression in heterologous dicot species. However, differences in expression patterns have been observed for the same construct in different species. For example, in transgenic tobacco the AT2S1 gene from Arabidopsis is expressed at an earlier stage of seed development than in Arabidopsis (De Clercq et al., 1990a). Similarly, the B. napus napA gene is expressed in the tobacco endosperm at an earlier stage than in B. napus (Stalberg et al., 1993). Koning et al. (1992) observed the differential expression of a napin promoter/lethal gene fusion which arrested pollen development in transgenic tobacco but not B. napus.
Despite these differences in expression, their high expression and seed-specificity have made 2S albumin promoters popular candidates to direct transgene expression in the seed of B. napus (Stayton et al., 1991, Voelker et al., 1996, Roeckel et al., 1997), tobacco (Ghosh et al., 1995, Roeckel et al., 1997) and Arabidopsis (Broun and Somerville, 1997). The hormone abscisic acid (ABA), as well as the process of dehydration, has been implicated in the correct maturation of somatic embryos in several plant species (Kermode, 1995). Eight hundred basepairs of a napin promoter is sufficient to down-regulate expression of the β-glucuronidase (GUS) reporter gene in response to dehydration in immature transgenic tobacco seeds (Jiang et al., 1995). GUS expression was also shown to be further down-regulated upon imbibition of the prematurely dehydrated seed. These authors also found that the 3' flanking region of the napin gene had no effect on the pattern of gene expression, but appeared to decrease absolute levels of expression when included in gene constructs. Further research revealed that the napin promoter was up-regulated in response to 10 µM ABA, but could be rendered insensitive to ABA by dehydration (Jiang et al., 1996), suggesting that hormonal control of storage protein accumulation may be quite complex. Koornneef et al. (1989) found that Arabidopsis plants which were double mutants lacking endogenous ABA (aba) production as well as being insensitive to ABA (abi3) did not accumulate 2S or 12S storage proteins, whereas single mutants homozygous for aba, abi1, or abi3, and the double mutant aba,abi1, accumulated normal amounts of seed storage proteins. Developmental regulation of an Arabidopsis 2S albumin promoter GUS fusion was found to be independent of morphological development of the embryo in studies which crossed transgenic A. thaliana plants with plants which had mutations in embryo morphology (emb mutants) (Devic et al., 1996).
Eight different emb mutants, arrested at various points in embryo development, showed correct temporal induction of AT2S:GUS expression when compared to wild-type embryos despite their mutant morphology. Emb mutants did, however, have higher GUS expression in the endosperm compared to wild-type seeds, due most likely to the lack of cotyledon development in the mutant embryos. Promoter function is complex and not easily dissected. Elements have been identified which are necessary but not sufficient on their own for seed-specific expression, suggesting that these promoter elements somehow interact with other elements in determining patterns of gene expression. Sequential deletion of 2S albumin promoters in a 5' to 3' direction generally results in decreased levels of expression, although seed specificity is maintained until approximately the -150 basepair position, where expression ceases or seed-specificity is lost. This pattern holds true for 2S albumin promoters from Brazil nut (Bertholletia excelsa), which retained seed specificity to the -49 position (Vincentz et al., 1997), for Brassica napus, which lost seed specificity at -152 (Radke et al., 1988, Stalberg et al., 1993, Ellerstrom et al., 1996, Stalberg et al., 1996) and in Arabidopsis, where two promoters were still seed-specific at -250 and -270 (Conceicao and Krebbers, 1994). The region between -1101 and -309 of the napin promoter contains a "negative" element (or elements), as removal of this region increases napin expression (Stalberg et al., 1993). A region responsible for cotyledonary expression was delineated by the exchange of regions between two differentially expressed A. thaliana 2S albumin promoters (Conceicao and Krebbers, 1994). Internal deletions and rearrangements have also been used to locate regions of the B. napus napin promoter responsible for changes in temporal and endosperm specific expression (Ellerstrom et al., 1996).
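The coordinate convention behind a 5' deletion series can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the promoter is taken to be a string whose last base is position -1 relative to the transcription start, the function name is hypothetical, and the 400 bp toy sequence is a placeholder rather than any real 2S albumin promoter. The endpoints used in the example echo positions mentioned above.

```python
def five_prime_deletions(promoter, endpoints):
    """Return truncated promoters keyed by their 5' endpoint.

    promoter  : upstream sequence whose last base is position -1 (assumption)
    endpoints : negative positions, e.g. -152 keeps the 152 bases nearest the start
    """
    constructs = {}
    for end in endpoints:
        if end >= 0 or -end > len(promoter):
            raise ValueError("endpoint %d lies outside the promoter" % end)
        # A negative slice start keeps only the bases from that position to -1.
        constructs[end] = promoter[end:]
    return constructs


# Hypothetical 400 bp placeholder sequence (not a real promoter).
toy_promoter = "ACGT" * 100
series = five_prime_deletions(toy_promoter, [-309, -152, -49])
for end in sorted(series):
    print(end, len(series[end]))
```

Each construct in such a series would then be fused to a reporter gene, so that loss of expression or of seed specificity between two consecutive endpoints localises a functional element to the interval separating them.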
Sequence comparison within gene families and between species has been used to identify conserved motifs in the 5' flanking regions (Gander et al., 1991, Dasgupta et al., 1993, Adachi et al., 1993, Conceicao and Krebbers, 1994, Stalberg et al., 1996). Some of these putatively conserved elements have been observed interacting with nuclear proteins, or are located adjacent to the site of protein binding (Ericson et al., 1991, Gustavsson et al., 1991, Grossi de Sa et al., 1994, Nakase et al., 1996b, Vincentz et al., 1997). Nakase et al. (1996b) showed that the promoters of three seed-specific rice genes, a rice allergenic protein gene (a member of the 2S albumin super-family), a glutelin gene and a prolamin gene, interacted with the same transcription factors during competitive binding assays. Vincentz et al. (1997) showed that opaque-2 regulatory proteins from the monocots Zea mays and Coix lacryma-jobi bound to specific G-box core sequences within a Brazil nut (Bertholletia excelsa) 2S albumin promoter and, in addition, up-regulated expression of that promoter in a transient expression assay.

1.6 Characterisation of a Picea glauca 2S albumin genomic clone

Research on the seed storage proteins of Picea glauca (Moench) Voss was initiated by Dane Roberts and Barry Flinn at the Forest Biotechnology Centre - BC Research (Roberts et al., 1990a and 1991; Flinn et al., 1991a, Flinn et al., 1991b and Flinn et al., 1993). As part of his doctoral thesis at the University of British Columbia (1992), Flinn identified 11S legumin and 7S vicilin proteins and characterised their accumulation within developing zygotic and somatic Picea glauca and P. glauca (Moench) Voss/engelmannii Parry embryos (Flinn et al., 1991a, Flinn et al., 1991b, and Flinn et al., 1993).
In order to isolate cDNA clones encoding these storage proteins, cDNA libraries made from proembryos and early cotyledonary stage somatic embryos were screened by differential hybridization by Craig Newton (Forest Biotechnology Centre - BC Research Inc., 3650 Wesbrook Mall, Vancouver, B.C., V6S 2L2). Thirty-seven clones were isolated that were uniquely expressed in the early cotyledonary stage somatic embryos, the stage at which storage proteins were beginning to accumulate (Flinn et al., 1991a). Cross-hybridization experiments and sequencing identified a single cDNA clone encoding a legumin and five vicilin cDNA clones (Newton et al., 1992). Interestingly, the majority of cDNA clones (21/37) were found to encode a small protein which had homology with the dicot 2S albumins (Dr. C. Newton, unpublished, Forest Biotechnology Centre - BC Research Inc., 3650 Wesbrook Mall, Vancouver, B.C., V6S 2L2). Four of the twenty-one cDNA clones were sequenced and found to differ, especially in the 5' and 3' untranslated regions. This result indicated that the white spruce 2S albumin gene family had at least four members, or at the very least 2 copies per haploid genome. Further work confirmed the presence of a 15 kDa protein in embryos which accumulated earlier than the legumin and vicilin proteins, and which was up-regulated in maturing somatic spruce embryos by ABA and high osmoticum (Flinn et al., 1993). Flinn found that the 15 kDa seed storage protein was located in protein bodies in interior spruce zygotic and somatic embryos (1993) and the megagametophyte, along with the spruce legumin and vicilin storage proteins (unpublished, B. Flinn, Genesis Research and Development Corp. Ltd., P.O. Box 50, Auckland, N.Z.). Flinn attempted to directly sequence the protein but was unsuccessful as the N-terminal sequence was blocked.
A white spruce genomic library was subsequently screened using the 2S albumin cDNA clone II5G001 (GenBank accession X63193) as a probe, and three out of twenty-three positive clones were sequenced (Dr. C. Newton, unpublished, Forest Biotechnology Centre - BC Research Inc., 3650 Wesbrook Mall, Vancouver, B.C., V6S 2L2). Two of the sequenced genomic clones were found to be pseudogenes, containing an insertion in one case and an in-frame stop codon in the other; in addition they had only 70% identity with the original cDNA clone. Northern hybridization with gene-specific oligonucleotide primers indicated that these genomic clones were not expressed or were expressed at very low levels in developing white spruce somatic embryos (Dr. C. Newton, unpublished, Forest Biotechnology Centre - BC Research Inc., 3650 Wesbrook Mall, Vancouver, B.C., V6S 2L2). These data indicated that, in addition to the expressed members of the white spruce 2S albumin gene family, there are also non-functional pseudogene members of the family. Newton re-screened the genomic library using a synthetic oligonucleotide probe that was specific to the highly expressed 2S albumin cDNA clone II5G001 and did not cross-hybridize with any of the 23 initial lambda clones. Two lambda clones were isolated during the re-screening of the genomic library. One of these clones, A3.2, became the object of my thesis research.

When this work was begun, there was little information available in general on conifer seed storage proteins and no information on conifer gene promoter function. The white spruce 2S albumin gene was particularly interesting because it was a member of a gene family known to be expressed at high levels during specific stages of seed development and was possibly co-ordinately regulated, along with the legumin and vicilin seed storage proteins.
There was extensive research in the literature on angiosperm 2S albumin proteins and genes with which to compare and contrast the conifer homologue. Using the white spruce 2S albumin gene as a model of a developmentally regulated, tissue-specific gene, I hoped to explore whether there were fundamental differences between gymnosperm and angiosperm regulatory sequences. There was also the possibility that by dissecting this promoter, cis-elements responsible for the tissue specificity and high levels of expression of the 2S albumin protein could be identified. Such elements might be useful in the genetic engineering of conifers, especially if significant differences existed between angiosperm and gymnosperm regulatory sequences.

It was unknown whether a gymnosperm promoter would function to direct the expression of a reporter gene in the angiosperm tobacco. Even within the Angiosperms, sufficient differences exist between the Monocotyledonae and the Dicotyledonae in the cellular machinery that introns are not always spliced correctly between groups (Simpson and Filipowicz, 1996) and heterologous promoters are not always functional (Connelly et al., 1994). An alternative to the stable transformation of tobacco plants is the transient expression of promoter:reporter gene constructs by microprojectile bombardment into plant tissues. Whether developmental regulation of a gene could occur during transient expression was also unknown. This work represents the first example of the stable expression of a reporter gene under control of a gymnosperm promoter in an angiosperm (tobacco) and shows that, with the proper controls, questions about the tissue-specificity and the developmental regulation of promoter constructs can be answered in a transient expression system.

The objectives of this thesis were to:
• Isolate a genomic clone encoding a functional gymnosperm 2S albumin gene.
• Sequence this gene and compare it with related angiosperm sequences.
• Characterise the function of the promoter in a homologous gymnosperm (white spruce somatic embryos) and heterologous angiosperm (tobacco) through promoter-GUS reporter gene fusions.

In the course of this research, two regions with homology to the highly expressed 2S albumin cDNA clone were identified and sequenced from the lambda clone A3.2. One of the putative 2S albumin genes was identified as a pseudogene, but the second gene was apparently functional. The functional genomic clone was found to contain an intron, in contrast to most dicot 2S albumin genomic sequences, which lack introns. Comparison of the sequence from the proximal promoter region (+62 to -400) of the Picea glauca 2S albumin gene with those from dicot 2S albumins revealed conserved motifs characteristic of the dicot 2S albumins. Northern analysis confirmed that the spruce 2S albumin clone was expressed in a seed-specific manner and showed a pattern of expression typical of seed storage protein genes.

The promoter of the putatively functional white spruce 2S albumin gene (PG2S) was translationally fused to the uidA (GUS) reporter gene. Two deletions were made to the full length promoter, one reducing it from 2.3 kb to approximately 700 basepairs, and the second reducing it to 179 basepairs. Function of the gymnosperm promoter in an angiosperm background was explored by stably transforming tobacco with the full length and 700 basepair truncated promoter constructs. Ease of transformation and larger seed size influenced the choice of tobacco over Arabidopsis as our model transgenic plant. The Picea glauca 2S albumin promoter constructs directed GUS expression specifically to the tobacco embryo from heart stage to embryo maturity, with no expression in the tobacco endosperm.

Promoter function was also observed in a homologous system by transiently expressing the three promoter constructs in different developmental stages of spruce somatic embryos, somatic germinants and pollen.
The embryos produced by somatic embryogenesis have levels of storage protein (Flinn et al., 1991a, and Flinn et al., 1993) and lipids (Cyr et al., 1991) comparable to zygotic embryos and are able to germinate with high frequency (Webster et al., 1990). The choice of somatic embryos as an experimental system was influenced by the fact that spruce somatic embryos can be harvested in large numbers at specific stages of development, which is necessary for the preparation of biolistic targets. It would have been prohibitive to dissect sufficient zygotic embryos for statistical comparison between treatments. In addition, the early stages of conifer zygotic embryos would only be available for brief periods of time, as pollination occurs once a year and embryo development occurs between May and September (Owens and Molder, 1984).

Transient expression using the spruce 2S albumin promoter constructs was accomplished using a biolistic device, the DuPont PDS/He 1000 Gene Gun. Transient expression mirrored expression of the native gene as seen by Northern blot analysis, with the exception of high levels of transient expression observed in mature embryos which had been partially dried and in germinating white spruce pollen grains. The pattern of 2S albumin transient expression indicates the presence of elements which enhance expression located between 2.3 kb and 700 basepairs. Seed specificity appears to be retained even in the smallest (179 basepair) 2S albumin construct.

This work shows the relationship of the spruce genomic clone to dicot 2S albumin sequences previously characterised. In addition to sharing structural similarities based on the conservation of the eight cysteine residue framework within the protein, motifs within the promoter of this gene are conserved in sequence and function.
Though the putative regulatory motifs identified are small, not conserved in position exactly between the gymnosperm and angiosperm sequences, and are not even highly conserved among the dicot 2S albumins, they provide sufficient information to direct the proper developmental and tissue specific expression of a reporter gene controlled by a spruce 2S albumin promoter in tobacco. This suggests that in addition to the conservation of cis elements between the gymnosperms and the angiosperms, the trans-acting factors with which they interact must also be conserved.

CHAPTER TWO

Methods and Materials

Two genomic clones, A2.1 and A3.2, were isolated from a white spruce genomic library using a synthetic oligonucleotide II5G.1 (Table 3) (Newton, unpublished; Dr. Craig Newton - Forest Biotechnology Centre - BC Research Inc., 3650 Wesbrook Mall, Vancouver, B.C., V6S 2L2). The genomic library (PNFI-X-88), prepared by Linda Deverno, Petawawa National Forestry Institute, Chalk River, Ont., consisted of partially digested Sau3A white spruce genomic DNA packaged in the lambda bacteriophage vector EMBL3 (Frischauf et al., 1983). The sequence contained in the oligonucleotide II5G.1 served to differentiate between high and low expressing members of the 2S albumin cDNA family (C. Newton, unpublished, Forest Biotechnology Centre - BC Research Inc., 3650 Wesbrook Mall, Vancouver, B.C., V6S 2L2). By using this oligonucleotide it was hoped that the genomic clones recovered would be ones that were expressed at high levels in the developing Picea glauca seed. The II5G series of sequencing primers, II5G.1 to II5G.5 (Table 3), were designed and used by Craig Newton to sequence several 2S albumin cDNA clones in both directions. Subsequently, I used them in sequencing the genomic clones contained in A3.2, as well as for the hybridization of Southern blots of subcloned DNA fragments.
Table 3: Oligonucleotide Sequencing Primers

PRIMER                     SEQUENCE (5' to 3')          Tm [4]
II5G.1 [1]                 GAACCATTTGAGCGTCAGCC         62 °C
II5G.2 [1]                 CAGCATCTCTCCGATGG            54 °C
II5G.3 [1]                 TAGCGAGAGTGGCGTTG            54 °C
II5G.4 [1]                 ATGGGTGTCTTTTCCCCTTC         60 °C
II5G.5 [1]                 CACTTAAAACTGCTGCCCGT         60 °C
GUS fusion junction [2]    TCACGGGTTGGGGTTTCTAC         62 °C
pUC/M13 Forward [3]        CGCCAGGGTTTTCCCAGTCACGAC     78 °C
pUC/M13 Reverse [3]        TCACACAGGAAACAGCTATGAC       64 °C

1. Nucleic Acid - Protein Service Lab (U.B.C., Vancouver, BC)
2. Clontech Laboratories (Palo Alto, CA)
3. Promega (Madison, WI)
4. Tm = 2(A+T) + 4(G+C), Wallace et al., 1979.

2.1 General Molecular Biology Techniques

2.1.1 Lambda Bacteriophage Protocols

Preparation of Plating Bacteria Working Stock

Plating bacteria were prepared as described in Sambrook et al. (1989). The Escherichia coli strain ER1647, taken from a 15% glycerol stock stored at -80 °C, was streaked onto TB medium (Appendix B, page 187) and grown at 37 °C overnight. A single colony was picked and used to inoculate a 250 ml flask containing 50 ml TB broth plus 0.2% maltose (from a 20% filter-sterilised stock), and incubated overnight at 37 °C on an orbital shaker at 250 rpm. After 16 hours, the bacterial culture was divided into two 50 ml conical centrifuge tubes and centrifuged at 2500 rpm for 10 minutes in a Beckman GP benchtop centrifuge (Beckman Instruments Inc.). The supernatant was removed and the cells were resuspended in filter-sterilised 10 mM MgSO4 (approximately 5 to 7 ml) to give an optical density reading (OD) at 600 nm of 2.0, equivalent to a bacterial concentration of 1.6 x 10^9 cells/ml. Pooled aliquots were stored at 4 °C, up to 3 weeks prior to use.
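The melting temperatures in Table 3 follow the rule given in its footnote 4, Tm = 2(A+T) + 4(G+C). As a minimal sketch of that arithmetic (the function name `wallace_tm` and the dictionary `TABLE_3_PRIMERS` are illustrative names, not part of the original work), the table values can be recomputed from the primer sequences:

```python
def wallace_tm(sequence: str) -> int:
    """Return the melting temperature in degrees Celsius from the rule
    Tm = 2(A+T) + 4(G+C) quoted in footnote 4 of Table 3."""
    # Remove any whitespace left over from typesetting and normalise case.
    seq = "".join(sequence.split()).upper()
    unexpected = set(seq) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters in primer: {sorted(unexpected)}")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Primer sequences copied from Table 3 (dictionary name is hypothetical).
TABLE_3_PRIMERS = {
    "II5G.1": "GAACCATTTGAGCGTCAGCC",
    "II5G.2": "CAGCATCTCTCCGATGG",
    "II5G.3": "TAGCGAGAGTGGCGTTG",
    "II5G.4": "ATGGGTGTCTTTTCCCCTTC",
    "II5G.5": "CACTTAAAACTGCTGCCCGT",
    "GUS fusion junction": "TCACGGGTTGGGGTTTCTAC",
    "pUC/M13 Forward": "CGCCAGGGTTTTCCCAGTCACGAC",
    "pUC/M13 Reverse": "TCACACAGGAAACAGCTATGAC",
}

if __name__ == "__main__":
    for name, seq in TABLE_3_PRIMERS.items():
        print(f"{name}: {wallace_tm(seq)} °C")
```

This rule is a rough estimate intended for short oligonucleotides; it takes no account of salt or primer concentration.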
Bacteriophage Multiplication

Single plaques were used to produce pure phage stocks by the plate lysate method, essentially as described in Sambrook et al. (1989). A single plaque was harvested, using a sterile 200 µl plastic pipette tip, and transferred into a sterile 10 ml culture tube containing 0.3 ml of ER1647 host bacteria suspension. Bacteriophage and host were incubated without agitation at 37 °C for 30 minutes. TB top agar (TB medium containing 0.6% agar) was melted in a microwave oven on low power and cooled to 50 °C. Under sterile conditions, 6.5 ml of the TB top agar was pipetted into the culture tube. The agar was added so that it ran down the side of the culture tube, forming no bubbles in the media. The mixture was immediately poured onto a room temperature 140 mm diameter TB petri plate and quickly swirled to spread the soft agar evenly. Plates were incubated in an upright position for 6 to 8 hours at 37 °C, until plaques were visible.

Phage were recovered by adding 10 ml SM buffer (Appendix B, page 184) to each plate, with gentle agitation for several hours on an orbital shaker at 4 °C. The liquid was collected using a sterile Pasteur pipette and transferred to 50 ml conical centrifuge tubes. A further 3 ml of fresh SM was added to each plate, swirled, and placed at an angle for 15 minutes until the liquid had drained from the plate's surface. This was added to the first 10 ml of SM collected. A 0.4% volume of chloroform was added to the pooled SM, and the mixture was vortexed briefly and centrifuged at 4000 rpm for 10 minutes at 4 °C, to kill residual bacteria. Phage stocks were stored at 4 °C.

In order to increase bacteriophage titre, initial stocks were diluted 1:10 and 1:100 with SM buffer. In a sterile 10 ml culture tube, 0.3 ml of diluted phage was gently mixed with 0.3 ml of ER1647 working stock and incubated at 37 °C for 20 to 30 minutes without agitation.
Three replicates from each dilution were prepared, working from the highest dilution to the lowest, and plated using the method given above. Plates were incubated in an upright position for 6 to 8 hours at 37 °C until plaques had grown so that their edges touched. Phage particles were collected in SM buffer as above.

Calculation of Bacteriophage Titre

A serial dilution of phage stocks was done with SM buffer in 1 ml total volume to calculate titre. Diluted phage (100 µl) were mixed with an equivalent volume of the plating bacteria, ER1647, in a 10 ml disposable culture tube, and incubated at 37 °C for 30 minutes. Melted TB top agar was cooled to 50 °C, 2.5 ml added to the culture, and the mixture poured onto a 90 mm TB petri plate. The plate was swirled to evenly disperse the top layer. Plates were incubated in an upright position at 37 °C until plaques were visible in the lawn of ER1647 (approximately 6 hours). Plaques were then counted, and pfu/ml calculated:

pfu/ml = number of plaques in 0.1 ml x dilution factor x 10.

Phage stocks were most useful when the titre was around 10^10 or 10^11 plaque forming units (pfu) per ml.

2.1.2 Isolation of DNA and RNA

Lambda Bacteriophage DNA

To produce large numbers of phage for DNA purification, 5 ml of ER1647 culture (approximately 5 x 10^9 bacteria) was mixed with 0.5 ml of phage stock (approximately 6 x 10^8 pfu) diluted with 4 ml of SM and incubated at 37 °C for 20 minutes without agitation. The host / phage mixture was added to 250 ml of pre-warmed TB in a 1 litre Erlenmeyer flask and placed on an orbital shaker for 3 to 4 hours at 250 rpm, until bacterial lysis was visible. At this point, 5 ml of chloroform were added and the flask returned to the orbital shaker for another 10 minutes to further lyse the bacteria. The culture was cooled to room temperature and 25 µl of pancreatic DNase I (Sigma-Aldrich Canada, Ltd.) and RNase A (Sigma) were added to give a final concentration of 1 µg/ml.
The flask was incubated at room temperature for 30 minutes, then 14.6 g of NaCl was added (1 M NaCl final concentration) and dissolved by gentle swirling. The solution was placed on ice for 25 minutes, then decanted into a 250 ml centrifuge bottle, taking care to leave the chloroform behind in the bottom of the glass flask. Bacterial debris was removed by centrifugation at 11,000 g for 10 minutes at 4 °C in a Beckman J2-21 centrifuge, JA-14 rotor (Beckman Instruments Inc.). The supernatant was poured into a clean flask and 25 g of solid polyethylene glycol, MW 8000 (PEG 8000), was added to make a 10% solution. PEG was dissolved at room temperature by slow stirring on a magnetic stir plate. The PEG / phage solution was placed on ice at 4 °C for one hour or overnight, then centrifuged at 11,000 g for 10 minutes at 4 °C to recover the phage particles. The pellet was drained and resuspended in 4 ml of SM. Suspended phage particles were transferred to a 50 ml sterile conical centrifuge tube, 4 ml of chloroform were added and the mixture vortexed for 30 seconds. This solution was centrifuged for 15 minutes at 4 °C in a Beckman GP benchtop centrifuge. The aqueous phase was transferred to a thin walled ultra-centrifuge tube (Beckman Ultra-Clear™, 14 x 89 mm, Beckman). Phage particles were collected by ultra-centrifugation at 25,000 rpm for 2 hours at 4 °C, using the swinging bucket rotor, SW41, in a Beckman L8-70M Ultracentrifuge (Beckman Instruments Inc.). The glassy pellet was resuspended overnight in 1 ml of SM by gently agitating the centrifuge tube at 4 °C on an orbital shaker. Persistent lumps were dispersed by slowly pipetting up and down. Phage particles were lysed to separate the viral protein coat from the DNA, by adding 25 µl 0.5 M EDTA (pH 8.0), 2.5 µl of Proteinase K (20 mg/ml stock) (Sigma) and 50 µl of 10% sodium dodecyl sulfate (SDS), mixed by inversion and incubated at 56 °C for 1 hour.
The mixture was cooled to room temperature and divided between two 1.5 ml microfuge tubes. An equal volume of Tris equilibrated phenol (Appendix B, page 185) was added, the solution extracted, and the aqueous phase removed to a fresh tube, leaving behind the phenol and white interface layer. The solution was extracted twice more with 1:1 phenol/chloroform, then once with chloroform alone. The aqueous phases were combined in one tube. DNA was precipitated by the addition of half a volume of 3 M ammonium acetate (pH 7.0) and 2.5 volumes of 95% ethanol, and mixed by inversion. The DNA precipitated at room temperature for 30 minutes before being centrifuged at 10,000 g for 20 minutes. The pellet was washed with 95% ethanol and air dried. The DNA pellet was redissolved in 1 ml of TE (pH 7.0) overnight.

Escherichia coli Plasmid Mini-prep (based on Holmes and Quigley, 1981)

Small amounts (approximately 20 µg) of plasmid DNA were purified by boiling mini-prep to check the results of ligation reactions and to supply double stranded DNA template for sequencing reactions. Half of a 3 ml overnight E. coli culture was placed in a microfuge tube, spun down for 1 minute and the supernatant discarded. The remainder of the original culture was held at 4 °C. The pellet was resuspended in 0.3 ml of STET buffer (Appendix B, page 185) by vigorous vortexing. Bacteria were lysed by adding 25 µl of a fresh lysozyme (Sigma) solution (10 mg/ml in TE pH 8.0), mixed by inversion and floated (uncapped) in a boiling water bath for 50 seconds. Immediately following, the microfuge tubes were capped and centrifuged for 10 minutes at 14,000 rpm. The gelatinous pellet was removed with a sterile toothpick and discarded. DNA in the supernatant was precipitated with a 1/2 volume of 7.5 M ammonium acetate (150 µl) and 3 volumes of 95% ethanol (900 µl), mixed by inversion and placed at -20 °C for 30 minutes or more.
The DNA was pelleted by centrifugation at 14,000 rpm for 10 minutes and washed with ice cold 70% ethanol before being air dried and resuspended in 40 µl TE (or distilled water).

Agrobacterium tumefaciens Plasmid Mini-prep

A. tumefaciens was grown overnight at 28 °C in 5 ml of 523 medium (Appendix B, page 186) with the appropriate antibiotics. The culture was transferred to two 1.5 ml microfuge tubes and centrifuged for 60 seconds. The supernatant was discarded and the pellet washed with 1 ml Agro Wash solution (Appendix B, page 181) by vortexing, and centrifuging again. The wash solution was removed and the pellet resuspended in 100 µl of ice cold lysozyme solution (Appendix B, page 182). This mixture was incubated for 10 minutes at room temperature. Next, 200 µl of 1% SDS, 0.2 N NaOH solution was added, mixed by gently inverting and incubated on ice for 5 minutes. Ice cold potassium acetate solution (150 µl, pH 4.8) (Appendix B, page 183) was added and mixed by gently vortexing in an inverted position for 10 seconds. The tube was returned to ice for 5 minutes, then centrifuged for 5 minutes and the supernatant transferred to a new microfuge tube. The supernatant was extracted with an equal volume of phenol/chloroform and the aqueous phase removed to a clean microfuge tube. Plasmid DNA was precipitated with 2 volumes of room temperature ethanol for 2 minutes. The DNA pellet was collected by centrifugation (5 minutes at 14,000 rpm), washed with 70% ethanol twice and dried briefly in a Speed Vac, before being resuspended in 25 µl of TE.

Large Scale Plasmid Isolation by Alkaline Lysis

The DNA of certain plasmids was required in large amounts, for restriction mapping, subcloning, and for precipitating onto microprojectiles used in transient expression experiments. Such clones were grown overnight at 37 °C at 250 rpm in 500 ml of Terrific broth (Appendix B, page 187) containing the appropriate antibiotic(s). E.
coli cells were harvested by centrifugation in 250 ml centrifuge bottles using a Beckman JA-14 rotor at 5000 rpm, 4 °C for 10 minutes. The bacterial pellet was resuspended, by pipetting gently up and down, in 20 ml of freshly made buffered lysozyme solution (Appendix B, page 182) containing 10 mg/ml lysozyme (Sigma). The suspension was incubated at room temperature for 10 minutes, then 40 ml of freshly prepared alkaline SDS solution (Appendix B, page 181) was added to further lyse the bacteria. This mixture was placed on ice for 30 minutes. Ice cold 5 M potassium acetate (30 ml) was added, swirled to mix, incubated on ice for a minimum of 30 minutes or overnight, and then centrifuged at 10,000 rpm for 40 minutes at 4 °C in the JA-14 rotor. The supernatant was decanted into a clean flask and any white flakes removed using a cheesecloth filter. The DNA was precipitated from the supernatant by the addition of 2 volumes of 95% ethanol on ice or 0.6 volumes of isopropanol at room temperature, for a minimum of 30 minutes. The pellet was collected by centrifugation at 10,000 rpm for 40 minutes at 4 °C (if ethanol was used) or room temperature (if isopropanol was used). The supernatant was poured off, the pellet washed twice with 70% ethanol and air dried. The DNA pellet was washed from the walls of the centrifuge bottle, resuspended in 2.4 ml distilled water and further purified by cesium chloride gradient centrifugation.

Cesium Chloride Gradient DNA Purification

DNA from the alkaline lysis protocol, suspended in 2.4 ml distilled water, was mixed with 0.4 ml ethidium bromide (10 mg/ml) and 4.2 g of CsCl in a 15 ml Corex tube. The solution was heated slightly under hot running water to dissolve lumps of CsCl. Corex tubes were placed in thick rubber adapters for the JA-21 rotor and centrifuged at 6000 rpm at room temperature for 5 minutes.
A Quick-Seal™ ultracentrifuge tube (Beckman) was partially filled with 8 ml of light CsCl solution (63 g/100 ml) using a 10 ml syringe and needle. Using a long Pasteur pipette attached to an automatic pipettor, the CsCl/plasmid solution from the Corex tube was placed at the bottom of the ultracentrifuge tube under the light CsCl. Care was taken to avoid transferring the "roof" of protein floating in the Corex tube. Ultracentrifuge tubes were filled to the top with light CsCl and pairs of tubes balanced to within 0.01 g. The ultracentrifuge tubes were heat sealed, placed in a Ti70 fixed angle rotor and centrifuged at 40,000 rpm for 18 hours at 20 °C (acceleration and deceleration programs were both set at 1) in a Beckman L8-70M ultracentrifuge (Beckman Instruments Inc.). The plasmid band was the lower band of DNA visible in the ultracentrifuge tube. DNA bands were generally visible without using UV light. The top of the tube was punctured with an 18 gauge needle before removing the plasmid band with a 21 gauge needle attached to a 3 ml syringe. Water saturated isobutanol was used to remove the ethidium bromide from the DNA by mixing equal volumes by inversion, and discarding the alcohol phase until both phases became colourless. The aqueous phase, containing the DNA, was diluted with 2 volumes of distilled water and the DNA precipitated with 2 volumes of 95% ethanol at -20 °C for several hours in a 30 ml Corex tube. The DNA pellet was collected by centrifugation at 10,000 rpm for 15 minutes at 4 °C in a JA-21 rotor. The pellet was washed with 70% ethanol, air dried and resuspended in 1 ml of distilled water (or TE).

Spruce Genomic DNA

Interior spruce genomic DNA was prepared from mature somatic embryos of culture line W70. Thirty embryos (approximately 100 mg) were placed in a 1.5 ml microfuge tube and ground to a fine powder in liquid nitrogen using a metal pestle attached to a standard carpentry drill.
The powdered tissue was incubated at 65 °C for 15 minutes in 1 ml of CTAB Extraction buffer (Appendix B, page 181) and extracted with 1 volume of chloroform : isoamyl alcohol (24:1). The aqueous layer (approximately 600 µl) was removed to a fresh tube and the DNA precipitated with a 1/10 volume of 3 M ammonium acetate and 1 volume of isopropanol. After centrifugation at 7500 rpm in a microcentrifuge, the DNA pellet was washed with 600 µl of 70% ethanol, air dried, and resuspended in 300 µl of TE.

Spruce RNA Purification

Samples representative of the interior spruce tissues prepared for microprojectile bombardment were frozen in liquid nitrogen and stored at -80 °C. Total RNA was extracted using TRIZOL™ Reagent (a phenol and guanidine isothiocyanate solution, Gibco BRL) according to the supplier's instructions. RNase-free disposable plasticware and DEPC-treated solutions were used under sterile conditions to prevent the degradation of the RNA as it was being isolated. Fifty to one hundred milligrams of spruce tissue were ground in liquid nitrogen using a prechilled glass rod in a 1.5 ml screw cap microfuge tube. The sample was suspended in 1 ml TRIZOL™ reagent by vortexing and incubated at room temperature for 5 minutes. Samples were centrifuged at 14,000 rpm to pellet the insoluble cell wall fraction and the supernatant was transferred to a clean screw cap microfuge tube. Two hundred microlitres of chloroform (without additives) was added and each sample was vigorously shaken by hand for 15 seconds, before being set aside to incubate at room temperature for 3 minutes. The aqueous phase, which contained the RNA, was separated from the organic phase by centrifugation at 11,750 rpm for 15 minutes at 4 °C and transferred to a new microfuge tube. RNA was precipitated by the addition of 500 µl isopropyl alcohol, followed by a 10 minute incubation at room temperature, and collected by centrifugation at 11,750 rpm for 10 minutes at 4 °C.
The supernatant was removed and the gel-like pellet washed with 1 ml 75% ethanol (made with diethylpyrocarbonate (DEPC) treated distilled water, Appendix B, page 183). The pellet was vortexed, then centrifuged at 9500 rpm for 5 minutes at 4 °C, the supernatant discarded and the pellet air dried for 10 minutes. The RNA pellet was dissolved in 20 to 50 µl RNase-free 0.5% SDS by mixing gently with a pipette tip and incubating at 60 °C for 10 minutes. RNA concentrations of the samples were calculated by measuring light absorbance at 260 nm wavelength (A260) of 1 µl of sample diluted in 99 µl of distilled water.

RNA µg/ml = (A260 units measured) x (40 µg/ml RNA per A260 unit) x (dilution factor)

Tobacco Genomic DNA (Doyle and Doyle, 1990)

Four to six healthy tobacco leaves were picked from a single plant (Nicotiana tabacum cv. Xanthi) and ground to a fine powder using liquid N2 in a pre-chilled mortar and pestle. Ground leaf tissue was transferred to plastic scintillation vials, sitting in liquid N2, then stored at -80 °C. Residual liquid N2 was allowed to evaporate before the vials were capped. DNA extraction was begun by rapidly vortexing approximately 0.5 to 1 g of frozen powdered leaf tissue with 5 ml pre-heated (60 °C) CTAB II extraction buffer (Appendix B, page 181) in a 15 ml Corex tube. Samples were processed one at a time to avoid excessive thawing and degradation of DNA. The CTAB / leaf mixtures were incubated at 60 °C for 30 minutes. After the incubation period, an equal volume of chloroform : isoamyl alcohol (24:1) was used to extract the mixture by gentle inversion. Phases were separated by centrifugation at room temperature in the JA-20 rotor at 4000 rpm for 15 minutes. The top aqueous phase (about 5 ml) was removed to a clean 15 ml Corex tube, avoiding any floating fragments of plant tissue. Nucleic acids were precipitated from the aqueous phase by the addition of 3.3 ml chilled isopropanol.
Gentle rocking of the tube caused long white strands of nucleic acid to appear. A Pasteur pipette with the tip melted into a hook was used to collect the DNA / RNA strands, which were then placed into a microfuge tube containing 1 ml wash solution (10 mM ammonium acetate in 76% ethanol). The strands of nucleic acid were washed by rocking the microfuge tubes back and forth. The wash solution was changed and the rocking repeated until the DNA / RNA precipitate was colourless. The pellet was spun down at 10,000 rpm for 5 minutes, air dried and redissolved in 1 ml of TE. Insoluble material was removed by centrifugation. The nucleic acids were further purified by reprecipitating in 2 volumes distilled water, 1/2 volume ammonium acetate (7.5 M) and 2.5 volumes cold 95% ethanol in a 30 ml Corex tube. The solution was mixed by inversion and allowed to precipitate overnight at -20 °C. The pellet was recovered by centrifugation at 10,000 rpm at 4 °C for 15 minutes. The supernatant was discarded and the pellet air dried, before being resuspended in 500 µl TE buffer.

2.1.3 Gel Electrophoresis and Analysis of Nucleic Acids

Restriction Digest Reactions

Restriction digests of plant genomic or λ phage DNA consisted of 500 ng to 1 µg DNA, 3 µl 10X One-Phore-All Plus (OPAP) buffer (Pharmacia Biotech) (1 or 2 times strength depending on the manufacturer's recommendation for the restriction enzyme(s)), 0.5 µl bovine serum albumin (Fraction V, Sigma) (BSA) (1 mg/ml), 0.5 µl 1 M dithiothreitol (DTT), 3 µl 40 mM spermidine and 0.5 µl (approximately 1 unit) of the appropriate restriction enzyme(s), brought up to 30 µl with distilled water. Digests were incubated for at least 3 hours or overnight at 37 °C (or at the temperature specified for a particular enzyme).
Restriction digests of 0.5 to 1 µg plasmid DNA were successfully completed using 1 unit of restriction enzyme in an OPAP buffer solution (Pharmacia) at the strength recommended for that particular restriction enzyme.

Gel Electrophoresis - DNA

Two microlitres Ficoll tracking dye (Appendix B, page 182) were mixed with 20 µl restriction digest, loaded on a 0.8% agarose 1X TAE (Appendix B, page 185) gel containing 0.5 µg/ml ethidium bromide (approximately 10 µl of a 10 mg/ml stock), and run at 55 volts in 1X TAE buffer until the first band of the tracking dye approached the end of the gel. The first and last lanes of each gel contained size markers (usually lambda phage DNA digested with HindIII and / or HindIII/EcoRI) for the quantification of band size. DNA was visualised by placing the gel on a UV transilluminator (wavelength = 302 nm) (TM-36, UVP Inc.) to view the fluorescence of the ethidium bromide-DNA complexes. Photos (Polaroid 667 film, Polaroid) were taken through a Wratten 22 filter.

Southern Blot

Southern blots were prepared from agarose gels in which the DNA bands had been separated by electrophoresis. In order to depurinate the DNA, gels were placed in a Pyrex baking dish, covered with a solution of 0.25 M HCl, and swirled on an orbital shaker until the bands of tracking dye in the gel changed from shades of blue to green and yellow (15 to 20 minutes). The HCl solution was discarded and replaced with denaturing solution (1.5 M NaCl, 0.5 M NaOH). The gel was returned to the orbital shaker until the tracking dye bands had returned to the original blue colour (15 minutes). The denaturing solution was replaced with 1 M ammonium acetate, for three 15 minute washes. The gel was prepared for blotting by trimming excess agarose from the top and sides of the gel, and by marking the position of the first lane by cutting off the left bottom corner at an angle.
A "wick" (of the same length as the trimmed gel and 3 times the width of the gel stand) and 4 gel-sized pieces of Whatman 3MM blotting paper (Whatman) were cut and soaked in fresh 1 M ammonium acetate. A gel stand or inverted casting tray was placed in a large Pyrex baking dish, and the wick was placed over the gel stand with two edges of the blotting paper tucked under the legs of the gel stand. Fresh 1 M ammonium acetate was added to the dish to a point half way up the legs of the gel stand. Two pieces of the blotter were centred on the wick, the gel was placed on top, and a piece of Hybond membrane (Amersham Life Sciences Inc.), cut the same size as the gel and also pre-wetted in 1 M ammonium acetate, was laid over the gel. Air bubbles were carefully pressed out as the stack was assembled. A gasket was formed from four strips of plastic food wrap placed along the edges of the nylon membrane to the sides of the dish. The final two pieces of moist blotting paper were laid over top. The assembly was completed with half a package of paper towels (4 inches in height) weighted by a 10 x 10 cm glass plate and approximately 500 g of lead. DNA transfer proceeded overnight.

The stack was carefully disassembled and the Hybond membrane peeled from the flattened gel. The side of the membrane adjacent to the gel (the DNA side) was marked with pencil. Efficiency of DNA transfer was checked by placing the gel on the UV transilluminator and ascertaining that the ethidium bromide stained bands had transferred to the membrane. Trace amounts of agarose were rinsed from the blot and it was placed in the dark to air dry. DNA was cross-linked to the Hybond membrane by exposure to UV light (wavelength = 302 nm), DNA side down on a piece of Saran Wrap™ (Dow Chemical) on a UV transilluminator for no longer than 10 minutes. The blot was then wrapped in Saran Wrap™ and stored in the dark until used.
Gel Electrophoresis - RNA

Prior to gel electrophoresis of the RNA samples, the gel kit and casting tray were cleaned by soaking overnight in a 1% SDS solution, rinsed with tap water, followed by distilled water and ethanol, before air drying. The gel was prepared by dissolving 2 g agarose (Molecular Technology grade, Sigma) in 170 ml distilled water plus 20 ml 10X MOPS buffer (Appendix B, page 183) by boiling in a microwave. The agarose solution was cooled to 50 °C, 10 ml formaldehyde was added and the gel poured into the casting tray. Running buffer consisted of 1X MOPS made up in DEPC-treated distilled water (Appendix B, page 183). RNA stocks were thawed on ice. Five micrograms RNA were mixed with 6 µl formamide dye (Appendix B, page 182), 2 µl formaldehyde and 0.6 µl 10X MOPS, then heated to 65 °C for 10 minutes. The samples were then snap-cooled on ice and 1 µl of ethidium bromide (1 mg/ml in DEPC-treated distilled water) added. Samples were spun down in a microcentrifuge to mix the contents, and quickly loaded onto the gel. The gel was run at 85 volts for 2 hours.

Northern Blot

The RNA gel was placed in an RNase-free dish and rinsed with DEPC-treated distilled water for 2 minutes, then washed with 10X SSC (Appendix B, page 184) for 20 minutes with gentle agitation on an orbital shaker. Capillary transfer of the RNA to Hybond nitrocellulose membrane (Amersham) was accomplished essentially in the same manner as for Southern blotting except that the transfer buffer used was 10X SSC. The nitrocellulose membrane and blotting papers were pre-wetted in 10X SSC. Transfer occurred overnight, the Hybond membrane was air dried in the dark and the RNA cross-linked to the membrane by exposure to UV light for 5 minutes on a UV transilluminator (wavelength = 302 nm).
2.1.4 Labelling and Hybridization of Probes

32P Labelling of Oligonucleotide Probes

Synthetic oligonucleotide primers (Nucleic Acid-Protein Service Unit, Biotechnology Laboratory, University of British Columbia, Table 1) corresponding to the sequence of the 2S albumin cDNA clone (Dr. C. Newton, Forest Biotechnology Centre - BC Research Inc., 3650 Wesbrook Mall, Vancouver, B.C., V6S 2L2) were radioactively labelled to serve as probes. Small oligomers (17 to 24 basepairs in length) were end-labelled in a 10 µl reaction mix consisting of 3 pmoles of the oligonucleotide (19 ng of a 19mer, 20 ng of a 20mer, 22 ng of a 22mer, etc.), 3 µl (4.8 pmoles) of 32P-labelled γ-ATP (6000 Ci/mmol, 10.0 mCi/ml, DuPont NEN Research Products), 0.5 µl (5 units) of polynucleotide kinase (PNK) (Promega) in 1X PNK buffer. The labelling reaction was mixed, spun down briefly and incubated at 37 °C for 15 minutes. The reaction was stopped by 4 µl 0.5 M EDTA in addition to being heated at 98 °C for 2 minutes. The end-labelled oligonucleotide was precipitated by the addition of 90 µl TE, 1 µl yeast tRNA (1 mg/ml) (Sigma), half a volume of 7.5 M ammonium acetate, plus 3 volumes of 95% ethanol, at -80 °C for 30 minutes. The precipitate was recovered by centrifugation at 14,000 rpm for 15 minutes in a microcentrifuge. The supernatant was removed using a drawn out glass Pasteur pipette and discarded. The pellet was washed with ice cold 95% ethanol, centrifuged for 5 minutes and the supernatant discarded. The radioactivity of the supernatant was monitored and the washes repeated to remove unincorporated radioactive label. The probe was dried in a Speed Vac for 5 minutes and re-dissolved in 100 µl TE by heating at 70 °C for 10 minutes. Probes had an activity of 10^6 counts per minute (CPM) or more.
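The oligonucleotide masses quoted above (19 ng of a 19mer, 20 ng of a 20mer, 22 ng of a 22mer for 3 pmoles) can be reproduced with a simple molar conversion. The sketch below assumes an average mass of 330 g/mol per nucleotide of single-stranded DNA. That figure is a common rule of thumb and is not stated in the text, and the function name is illustrative.

```python
# Sketch: convert picomoles of a single-stranded oligonucleotide to nanograms.
AVG_NT_MASS = 330.0  # g/mol per nucleotide; assumed average, not from the text


def oligo_ng(pmol, length_nt, nt_mass=AVG_NT_MASS):
    """Mass in ng of `pmol` picomoles of an oligo `length_nt` nucleotides long.

    pmol x (g/mol) gives picograms; dividing by 1000 converts to nanograms.
    """
    return pmol * length_nt * nt_mass / 1000.0


if __name__ == "__main__":
    for n in (19, 20, 22):
        print(f"3 pmol of a {n}mer is about {oligo_ng(3, n):.1f} ng")
```

Under this assumption the mass in nanograms of 3 pmoles is numerically close to the oligomer length, which is consistent with the pattern of values given in the protocol.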
Hybridization of Southern Blots with Oligonucleotide Probes

The blot was pre-hybridized in 20 ml of pre-hyb solution (6X SSC (Appendix B, page 184), 1 mM EDTA, 0.5% SDS, 50 µg/ml yeast tRNA (2 mg/ml stock), 0.1% sodium pyrophosphate) in a hybridization jar of the Techne Hybridization oven (Techne Inc.), for a minimum of 4 hours or overnight, at the same temperature required for hybridization of the probe. Hybridization was carried out at 55 °C for oligonucleotide probes of 20 to 24 nucleotides in length; smaller probes were hybridized at 50 °C. The probe was added to the pre-hyb solution in the jar and hybridization proceeded overnight. The hybridization solution was poured off and the blot was washed (2X SSC, 1% SDS, 1 mM EDTA) three times in 20 ml washes of 15 minutes each until no further radioactivity was measured in the wash solution. Excess wash solution was blotted from the membrane before it was wrapped in Saran Wrap™, monitored by Geiger counter, placed in a film cassette and used to expose X-ray film (X-Omat™, Kodak). An intensifying screen (Cronex Lightning Plus, DuPont) was used and the cassette placed at -80 °C, if levels of radioactivity were low. Length of exposure time varied. Overnight exposures were usually adequate though blots could be re-exposed for longer periods of time. X-ray film was developed under safe-lights, by processing for 3 minutes in Kodak GBX developer, rinsing under tap water, then fixing for 2 to 3 minutes in Kodak GBX fixative, before a final half hour rinse under running tap water.

32P Labelling by Random Primer

The random primer method (based on Feinberg and Vogelstein, 1983 and 1984) was used to radioactively label larger double stranded pieces of DNA, such as portions of plasmids, cDNA clones, or polymerase chain reaction (PCR) products.
Template DNA (1 µl of <1 µg/µl) was mixed with 9.5 µl of distilled water in a 1.5 ml microfuge tube, then denatured by heating to 98 °C for 10 minutes in a PCR machine and snap-cooled on ice. Radioactive labelling of the template DNA was accomplished by the addition of 2 µl 10X labelling buffer (Appendix B, page 182), 2 µl of 1 mg/ml BSA, 2 µl 0.1 M DTT, 2 µl dNTP mix (2 mM each of G, T, C), 1 µl hexanucleotide random primers (Gibco BRL), 2 µl α-labelled 32P-dATP (3000 Ci/mmol, 10.0 mCi/ml, DuPont NEN Research Products), and 0.5 µl of Klenow fragment (Gibco BRL, Burlington, Ont). The labelling mix was incubated at room temperature for a minimum of 4 hours or overnight. The reaction was stopped and the labelled probe precipitated by the addition of 1 µl 0.5 M EDTA, 80 µl distilled water, 3 µl yeast tRNA (2 mg/ml), 50 µl 7.5 M ammonium acetate, and 375 µl ice cold ethanol. This mixture was placed at -80 °C for 20 minutes. The probe was collected by centrifugation at 14,000 rpm for 15 minutes in a microcentrifuge. The pellet was washed three times with ice cold 95% ethanol followed each time by a 5 minute centrifugation, until the supernatant was not significantly radioactive, indicating that unincorporated dNTPs had been washed away. The pellet was dried in a Speed Vac, and resuspended in 100 µl TE by heating to 65 °C. The probe was stored at -80 °C if not used immediately.

Hybridization of Southern Blots with Randomly Labelled Probes

The pre-hybridization solution appropriate for large probes consisted of 5 ml 20X SSPE (Appendix B, page 185), 2 ml 10% SLS, 2 g dextran sulphate, 1 ml sheared salmon sperm DNA (Appendix B, page 184) (20 mg/ml), and distilled water to give a final volume of 20 ml. The randomly labelled probe was heated to 98 °C for 10 minutes, then snap-cooled on ice to denature the double-stranded DNA before being added to the pre-hybridization solution.
Pre-hybridization and hybridization were carried out at 65 °C overnight. The blot was washed 3 times for 15 minutes each in 20 ml of wash solution (2X SSC, 1% SDS), and exposed to X-ray film as above.

Hybridization of Northern Blots

RNA blots were pre-hybridized in a Techne Hybridization oven (Techne Inc.) at 42 °C for 2 hours in a solution consisting of 10 ml formamide, 4 ml 20% SDS, 4 ml 2 M NaH2PO4 - 4 mM EDTA (pH 7.2) (Appendix B, page 183), and 160 µl BSA (50 mg/ml). The randomly labelled probe (a 517 bp fragment of the cDNA subclone II5G1001) was heated to 98 °C for 10 minutes, snap-cooled on ice, and added to the hybridization jar. Hybridization proceeded over a 20 hour period. RNA blots were washed three times for 15 minutes at 42 °C with 2X SSC, 0.1% SDS and once, with the same solution, at 65 °C for 10 minutes. The blot was exposed to X-ray film in the same manner as for Southern blots. The Northern blot was stripped with boiling 0.1% SDS and re-probed with a Picea glauca 28S ribosomal probe (CN-X6G) to confirm that equal amounts of total RNA had been loaded. Band size was quantified using NIH Image version 1.60 for Macintosh (http://RSB.Info.NIH.gov/NIH-IMAGE/).

2.1.5 Ligation Reactions

SalI and SalI/BamHI fragments from λ 3.2 which cross-hybridized with the II5G.1 primer (Table 1) were gel purified (Prep-A-Gene kit, Promega, Madison, WI) and subcloned into the sequencing vector pGEM-3Zf(+) (Promega). The vector was prepared to receive the fragments by restriction with the enzyme SalI or double digestion with both SalI and BamHI in the case of the SalI/BamHI fragment.
Ligation reactions consisted of 100 ng vector digested with the appropriate restriction enzymes, 25 ng insert (roughly a 1:3 molar ratio of vector to insert), plus 2 µl 5X ligase buffer (Gibco BRL), 1 µl 10 mM ATP, 1 unit T4 DNA ligase (Gibco BRL), and distilled water to 20 µl. Control reactions were prepared with no ligase and no insert DNA to test transformation efficiency. Ligation reactions were incubated overnight at room temperature.

2.1.6 Heat Shock Transformation of Competent E. coli Cells

Heat shock competent cells (E. coli DH5α) were stored in 50 µl aliquots at -80 °C. Aliquots of cells were thawed on ice. Ligation reactions were diluted with equal volumes of TE, and 10 µl added to 50 µl of competent cells with gentle mixing. The cell/DNA mixtures were placed on ice for 15 minutes, followed by a 1 minute heat shock at 37 °C and then returned to ice for 2 minutes. The cells were removed from the ice and 200 µl pre-warmed SOC medium (Appendix B, page 186) added before being incubated at 37 °C for 1 hour. Ten microlitres of 0.1 M β-D-isopropyl-thiogalactopyranoside (IPTG) and 50 µl of 5-bromo-4-chloro-3-indolyl-β-galactopyranoside (X-Gal) (20 mg/ml in dimethylformamide) were added to the cell mix, which was immediately spread-plated on room temperature YT medium containing 50 µg/ml ampicillin. The plates were incubated overnight and screened for white colonies (containing insertions) among the blue colonies (containing re-ligated vector). Single white colonies were picked, used to inoculate 3 ml YT broth with ampicillin 50 µg/ml, and incubated at 37 °C on an orbital shaker overnight. The size of the insert was checked using the boiling mini-prep plasmid DNA purification detailed previously (section 2.1.2), along with restriction digestion using the appropriate enzyme(s) to release the insert.
2.1.7 Generation of Unidirectional Deletions by Exonuclease Digestion

Following the Erase-a-Base™ protocol given by Promega (based on Henikoff, 1984) a series of deletion subclones was created 5' to 3' in relation to the coding region by exonuclease digestion of the clone p i lb-3 (a 6.5 kb SalI fragment containing the 2S albumin gene ligated into the vector pGEM-3Zf(+), Figure 2a). Cesium chloride purified p i lb-3 plasmid DNA was digested with SacI and BamHI, phenol/chloroform extracted and reprecipitated with 0.2 volume of 1 M NaCl and 2 volumes of 100% ethanol. After centrifugation for 10 minutes, the pellet was washed with 70% ethanol and air dried. The pellet was redissolved in 108 µl distilled water and 12 µl 10X Exo III buffer (Promega). Ten 0.5 ml microfuge tubes were labelled, placed on ice and 7.5 µl of S1 nuclease mix (Appendix B, page 184) added to each. Half (60 µl) of the DNA/ExoIII buffer mix was heated to 37 °C and 2 µl Exo III nuclease added. This enzyme digests along one strand of the DNA from a nick, a blunt end or 5' overhang such as, in this case, from BamHI. Three prime overhangs are resistant to ExoIII digestion, therefore the DNA adjacent to the SacI site is protected. Every 30 seconds after the addition of the ExoIII nuclease, 2.5 µl of the digest reaction was removed and added to one of the S1 nuclease mix tubes. After all the timed samples had been taken, the S1 tubes were incubated at room temperature for 30 minutes, to allow the S1 nuclease to digest the remaining single stranded DNA. The S1 nuclease was inactivated by adding 1 µl S1 Stop (0.3 M Tris base, 50 mM EDTA) and by heating to 70 °C for 10 minutes. The degree of serial digestion of the plasmid was ascertained by running 2 µl from each time point on a 0.8% 1X TAE agarose gel.
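The timed sampling scheme above (ten tubes, one 2.5 µl sample every 30 seconds) yields a ladder of progressively larger deletions. The Python sketch below estimates the nominal deletion size at each time point. The digestion rate of 450 bp per minute at 37 °C is an assumption used only for illustration and is not taken from the text; the actual rate depends on temperature, salt and enzyme lot, and the agarose gel check described above remains the real measure.

```python
# Sketch: nominal ExoIII deletion sizes for the timed sampling scheme above.
ASSUMED_RATE_BP_PER_MIN = 450.0  # assumption for illustration, not from the text


def deletion_series(n_tubes=10, interval_s=30, rate_bp_per_min=ASSUMED_RATE_BP_PER_MIN):
    """Return (time in seconds, expected bp removed) for each timed sample."""
    series = []
    for i in range(1, n_tubes + 1):
        t = i * interval_s  # the first sample is taken one interval after enzyme addition
        series.append((t, rate_bp_per_min * t / 60.0))
    return series


def volume_withdrawn(n_tubes=10, sample_ul=2.5):
    """Total volume removed from the digest across all time points."""
    return n_tubes * sample_ul


if __name__ == "__main__":
    for t, bp in deletion_series():
        print(f"{t:>4d} s: ~{bp:.0f} bp deleted")
    print(f"total sampled: {volume_withdrawn()} µl of the 62 µl digest")
```

Ten 2.5 µl samples consume 25 µl of the 62 µl digest (60 µl of DNA/buffer mix plus 2 µl enzyme), so the scheme leaves a comfortable margin for pipetting losses.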
The deleted plasmids were blunt-ended to prepare them for re-ligation by mixing 5 µl of each stopped reaction mix with 40 µl distilled water, 5 µl 10X OPAP buffer (Pharmacia) and 1 unit Klenow reagent (Gibco BRL), and incubating at 37 °C for 3 minutes. One microlitre dNTPs (0.125 mM each of G, A, T, C) was added to each tube and incubation continued a further 5 minutes. Plasmids were re-ligated by mixing 10 µl of the Klenow reaction mix with 4 µl 5X ligase buffer (Gibco BRL, Burlington, Ont), 2 µl 10 mM ATP, 4 µl distilled water and 2 units T4 DNA ligase (Gibco BRL). Ligations proceeded overnight at room temperature and competent E. coli DH5α cells were transformed as in section 2.1.6.

2.1.8 2S Albumin Promoter / GUS Fusions

Isolation of the 2S Albumin Promoter

The 2S albumin upstream region was isolated from the plasmid clone p i lb-3 (Figure 2a) as a polymerase chain reaction (PCR) product using the oligonucleotide II5G.1 and pUC forward sequencing primer (Promega) as PCR primers (Table 1). VentR DNA polymerase (New England Biolabs Inc.) with 3' to 5' proof-reading ability was used for primer extension to reduce the possibility of error introduction during DNA synthesis. The PCR mix consisted of 200 ng CsCl purified plasmid p i lb-3, 100 ng forward primer, 100 ng II5G.1 primer, 2.5 µl 10X Vent buffer, 2.5 µl 10X dNTPs (2 mM each of G, A, T, C), 1 unit VentR DNA polymerase and sterile distilled water to give 25 µl total volume. The amplified promoter fragment was digested with SstI and ligated into an SstI/SmaI site in pGEM3Zf(+) (Promega), creating pGEM2S (Figure 2b).

Construction of the GUS Fusion Vectors

To aid in vector construction, the promoter-less GUS genes from the binary vector series pBI101.1, pBI101.2, and pBI101.3 (Jefferson et al., 1987) were removed as EcoRI/HindIII fragments and placed in the polycloning site of pUC19, producing three vectors: pUC101.1, pUC101.2, and pUC101.3 (Figure 2c). These vectors are identical except that the GUS reading frame is shifted by the addition of one nucleotide in relation to the polylinker (Jefferson et al., 1987).

Figure 2: Plasmid Constructs
Plasmid maps are not to scale relative to each other, but within a map fragment sizes are comparable.
A. 5' and 3' untranslated and flanking regions of the Picea glauca 2S albumin genomic clone contained within the plasmid p i lb-3 are represented by open boxes. Shaded boxes denote exon 1 and 2 of the 2S albumin coding region. The intron is represented by a thin line. The intron and second exon are not drawn to scale. The cloning vector was pGEM3Zf(+) (Promega).
B. The 5' flanking region of the genomic clone contained in p i lb-3 was subcloned to create pGEM2S.
C. pUC101.1 consists of the uidA coding region and nos terminator from pBI101 (Jefferson et al., 1987) (shaded boxes labelled GUS and Nos respectively) cloned into the pUC19 vector.
D. p2SGUS is a translational fusion of the 2S albumin 5' flanking region, released as a BamHI fragment from pGEM2S, with the uidA coding region contained in pUC101.2.
E. Deletion of the 2S albumin 5' flank to position -653 was accomplished by restriction digestion of p2SGUS with XbaI and re-ligation to give p2S700.
F. p2SMIN was created by the restriction digestion of p2S700 with SphI and re-ligation, thereby reducing the 2S albumin 5' flank to position -117.
G. pBIN2S was created by cloning the 2.3 kb 5' flanking region and GUS reporter gene from p2SGUS as an EcoRI/HindIII fragment into the EcoRI/HindIII site of the binary vector pBIN19 (Bevan, 1984). NPTII denotes the neomycin phosphotransferase gene which encodes kanamycin resistance. LB and RB represent the left and right borders of the disarmed Ti plasmid.
H. pBIN700 was created by removal of the 5' flank:uidA translational fusion from p2S700 as an EcoRI/HindIII fragment and ligation into the EcoRI/HindIII site of pBIN19 (Bevan, 1984).

A translational fusion (p2SGUS, Figure 2d) was created by removing the 2S promoter from pGEM2S as a BamHI fragment and ligating it in front of the promoter-less GUS gene of pUC101.2. The 2S albumin promoter region was also ligated into the BamHI site of pBI101.3, creating a negative control (p2S+1GUS), due to the one nucleotide frameshift upstream of the GUS coding region. Fusion junctions were sequenced (see Methods and Materials section 2.2) using a GUS sequencing primer (Clontech) (Table 1).

The 5' flanking region was deleted distally from the full size of 2.3 kb to approximately 700 basepairs by digestion with XbaI and re-ligation to form the plasmid p2S700 (Figure 2e). A minimal promoter consisting of 117 basepairs of 2S albumin 5' flanking sequence (p2SMIN, Figure 2f) was formed by removing the distal region of the promoter by digestion with SphI and re-ligation.

The spruce 2S albumin promoter/GUS fusions of p2SGUS and p2S700 were excised as EcoRI/HindIII fragments by restriction digestion and were gel-purified (Prep-a-gene™, BioRad). These fragments (2.3 kb promoter:GUS fusion and 700 bp promoter:GUS fusion) were then ligated into the binary vector pBIN19 (Bevan, 1984) for use in Agrobacterium tumefaciens mediated transformation, creating binary vectors pBIN2S and pBIN700 respectively (Figure 2g and 2h). The binary vectors pBIN2S and pBIN700 were transformed by electroporation (Gene Pulser, BioRad) into E. coli strain DH5α electro-competent cells. Constructs were introduced into A.
tumefaciens strain EHA105 (Hood et al., 1993) via triparental mating.

Introduction of Binary Vectors into Agrobacterium tumefaciens

Three millilitre liquid cultures of the E. coli helper strain (HB101/pRK2013) (Ditta et al., 1990), the E. coli strain DH5α containing the binary vector (pBIN2S or pBIN700), and the recipient A. tumefaciens strain EHA105 (Hood et al., 1993) were grown overnight. The E. coli strains were cultured in LB medium containing 50 µg/ml kanamycin at 37 °C, while the disarmed A. tumefaciens strain was cultured at 28 °C in 523 medium containing rifamycin 50 µg/ml. The overnight bacterial cultures were diluted 1:10 and grown to mid-log phase (OD600 = 0.7). The helper and DH5α E. coli cultures were combined in a single culture tube and incubated without agitation at room temperature for 30 minutes. One millilitre of the A. tumefaciens culture was added to this, gently mixed and filtered through a sterile 0.2 µm filter held in a reusable Sarstedt syringe filter holder (Sarstedt Inc.). The filter was removed from the holder and placed bacterial side up on a plate of 523 medium without antibiotics at 28 °C for 24 hours. Bacterial cells were resuspended in 2 ml of sterile 15% glycerol by transferring the filter disk from the 523 medium to a 10 ml disposable culture tube and vortexing by hand. A dilution series was made in 1 ml sterile distilled water blanks from the initial bacterial suspension and 100 µl of each dilution was spread-plated on 925 Minimal medium (Appendix B, page 186) containing kanamycin and rifamycin at 50 µg/ml. The 10 dilution was also plated on 925 Minimal medium without antibiotics, to observe the viability of the bacterial strains. Plates were incubated at 28 °C overnight. Single colonies were picked and streaked onto selection medium (925 Minimal medium containing kanamycin and rifamycin 50 µg/ml) three times (in series) to ensure that there were no E. coli contaminating the transformed A. tumefaciens.
Mini-preps were done on the putatively transformed A. tumefaciens (EHA105/pBIN2S and EHA105/pBIN700) (as described in Methods and Materials section 2.1.2) to confirm that the plasmids had been taken up and had not undergone any deletions or gross rearrangements. Bacterial cultures were stored as 15% glycerol stocks at -80 °C.

2.2 Sequencing

Sequence was obtained by the Taq Polymerase Chain Reaction (PCR) sequencing method using the fmol™ Sequencing System (Promega), with modifications. Synthetic oligonucleotides (Table 1) specific to the cloning vector (pUC/M13 forward, and pUC/M13 reverse) (Promega) and to the 2S albumin cDNA clone II5G1001 (II5G.1, II5G.2, II5G.3, II5G.4, and II5G.5) (Nucleic Acid-Protein Service Unit) were used as sequencing primers.

2.2.1 Radioactive Labelling of Sequencing Primers

Primers were radioactively end-labelled by combining in a 0.5 ml microcentrifuge tube: 10 pmoles of sequencing primer (= 57 ng for a 17mer, = 67 ng for a 20mer, = 80 ng for a 24mer primer), 10 pmoles (= 5.0 µl of 5000 Ci/mmol at 10 µCi/µl) γ-32P-labelled ATP, 1 µl T4 PNK 10X buffer, 5 units T4 PNK (Promega), and sterile distilled water to a final volume of 10 µl. If the primer concentration was too dilute, the primer and γ-32P-ATP were dried down together in a Speed Vac, before being redissolved in 10 µl 1X T4 PNK buffer. The primer labelling mix was incubated at 37 °C for 15 minutes then inactivated by heating to 98 °C for 2 minutes. The primer was stored at -20 °C or used directly without further purification. Note that this differs from the protocol for end-labelling primers for use in Southern blot hybridization; the amount of primer labelled is increased, EDTA is omitted in stopping the reaction, and there is no need to remove unincorporated radioactive label.

2.2.2 Extension/Termination Reactions

Four 0.5 ml microfuge tubes were labelled G, A, T, C for each set of sequencing reactions.
One microlitre of the appropriate dNTP - d/ddNTP mix (Appendix B, page 181) was added to the bottom of each tube (i.e., 1 µl deoxy-adenosine triphosphate (dATP) - deaza/dideoxy-adenosine triphosphate (d/ddATP) mix into, e.g., tube A). The tubes were capped and stored on ice until needed. A primer/template DNA mixture was prepared which consisted of 2 µl double stranded DNA template (approximately 100 ng of super-coiled plasmid DNA prepared by mini-prep, Methods and Materials section 2.1.2), 2.2 µl 10X Taq Mg-free buffer (Promega), 1.7 µl 25 mM MgCl2, 1.5 µl 32P end-labelled primer, 1.0 µl regular grade Taq polymerase (Promega) and distilled water to 18 µl final volume. The primer/template mixture was assembled on ice, mixed gently with a pipette tip and spun down. Special care was taken not to introduce bubbles into the sequencing reaction when 4 µl of the primer/template mixture was placed on the inside wall of each d/ddNTP tube. The presence of any air bubbles would cause the sequencing reaction to fail. One drop (approximately 25 µl) of mineral oil was added to each tube and the reaction mix spun down briefly. The reaction tubes were transferred directly from ice into block 1 of the Ericomp Twin Block™ Thermal cycler (Ericomp), preheated to 95 °C.

The PCR cycle used for primers 17 nucleotides in length was 1 cycle of 95 °C for 2 minutes; 30 cycles of 94 °C for 30 seconds (denaturing), 45 °C for 30 seconds (annealing of the primer), 70 °C for 30 seconds (extension of the primer); finishing with 1 cycle of 70 °C for 5 minutes. The annealing temperature for primers 20 nucleotides in length was raised to 55 °C. For primers of 24 nucleotides or longer the annealing/extension step was combined as 70 °C for 30 seconds, i.e. the 55 °C annealing step was omitted. After the thermocycling program had been completed, 3 µl of sequencing stop buffer (Appendix B, page 184) was added and the tubes spun down.
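The three cycling regimes described above differ only in the annealing step, so they can be written as one small program table keyed on primer length. The sketch below is illustrative only. The text gives conditions for 17mers, 20mers and primers of 24 nucleotides or more; treating lengths of 18 to 19 like 17mers and 21 to 23 like 20mers is an assumption made here, and the computed times exclude temperature ramping.

```python
# Sketch: cycle-sequencing programs from the text, selected by primer length.
def sequencing_program(primer_len):
    """Return [(stage name, cycles, [(temp in °C, seconds), ...]), ...]."""
    if primer_len >= 24:
        # Combined annealing/extension step; separate annealing step omitted
        cycle = [(94, 30), (70, 30)]
    else:
        # Assumed thresholds between the lengths explicitly given in the text
        anneal = 55 if primer_len >= 20 else 45
        cycle = [(94, 30), (anneal, 30), (70, 30)]
    return [
        ("initial denaturation", 1, [(95, 120)]),
        ("cycling", 30, cycle),
        ("final extension", 1, [(70, 300)]),
    ]


def hold_time_minutes(program):
    """Total programmed hold time in minutes, excluding ramp time."""
    seconds = sum(cycles * sum(s for _, s in steps) for _, cycles, steps in program)
    return seconds / 60.0


if __name__ == "__main__":
    for n in (17, 20, 24):
        print(f"{n}mer: {hold_time_minutes(sequencing_program(n)):.0f} min of hold time")
```

Dropping the annealing step for long primers shortens each cycle from 90 to 60 seconds, which reduces the programmed hold time from 52 to 37 minutes over 30 cycles.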
Sequencing reactions were heated for 2 minutes at 70 °C before being loaded in a GATC-order on the sequencing gel or they could be stored at -20 °C.

2.2.3 Sequencing Gels - 1.2X TBE

The glass plates of the sequencing apparatus were thoroughly cleaned and the top plate (shorter one) silanized in the fume hood using Gel Slick™ (J.T. Baker Inc.) following the manufacturer's instructions. The plates were assembled silanized side in, with 0.4 mm spacers held in place by black stationery clamps. The bottom of the gel cassette was taped closed with yellow Scotch Brand electrical tape to a third the way up the cassette sides. Long Ranger™ (J.T. Baker Inc.) is a pre-mixed modified acrylamide monomer solution used as a substitute for acrylamide/bisacrylamide in the preparation of sequencing gels. A 1.2X TBE gel was prepared by dissolving 21 g urea in 6 ml 10X TBE (Appendix B, page 185), 5 ml Long Ranger™ concentrate and 25 ml deionized water. This solution was filtered through a Whatman No. 1 filter paper (Whatman) in a Buchner funnel, and then de-gassed. A "plug" was mixed for the bottom of the sequencing gel consisting of 1.5 ml of the gel solution prepared as above, 15 µl fresh 10% ammonium persulphate solution (APS), and 1.5 µl N,N,N',N'-tetramethyl-ethylenediamine (TEMED) mixed in a microfuge tube and pipetted down one side of the gel cassette and spread along the bottom. The plug had a thicker, faster polymerizing consistency, its purpose being to prevent leaks when the main part of the gel was poured. The cassette was placed upright while the plug was setting. After the plug had set, 250 µl 10% APS and 25 µl TEMED were added to the remainder of the gel solution. The solution was swirled to mix and then drawn up in a 60 cc syringe (without needle), taking care not to add bubbles.
An 18 gauge needle was attached to the syringe and the gel solution was poured along one side of the cassette, the angle of the glass plates and the speed of pouring being adjusted so that the gel solution poured in one smooth flow with no air bubbles. When the cassette was full, it was laid at a slight angle propped up by two rubber stoppers at the top sides, a sharks-tooth comb inserted (flat side into the gel) and clamped in place. Polymerization was complete within 1 to 2 hours. The top edge was wrapped in plastic wrap and the gel stored at 4 °C if the sequencing gel was not used that day.

The glass cassette was clamped into the sequencing apparatus and the upper and lower buffer chambers filled with 0.6X TBE running buffer. 1.2X TBE gels were run at 35 watts for 3 to 6 hours. The sharks-tooth comb was removed and its orientation reversed so that the teeth were level and barely pressed into the gel. The gel was pre-run for 15 minutes to bring it up to running temperature (45 °C) and the sample wells rinsed with running buffer before 1/3 to 1/2 volume of the pre-heated (70 °C) sequencing reactions were loaded. Runs of overlapping sequence were accomplished by loading the gel two or three times with the same reaction mix (in a new set of wells) at 1 1/2 hour intervals. Short sequencing runs were approximately 4000 volt hours (the first dye front, bromophenol blue, had just run off the gel by then); long runs were 8000 volt hours (the second dye front, xylene cyanol, was then about to run off the bottom of the gel).

At the completion of the run, the gel cassette was removed and allowed to cool to room temperature, before the spacers were removed and the top glass plate was lifted off. A piece of Whatman 3MM blotting paper (Whatman) cut to the size of the gel was smoothed over top, firmly pressed in place, and used to peel the gel from the lower glass plate.
The gel was overlaid with a piece of Saran Wrap™ so that there were no bubbles or wrinkles in the plastic. The sequencing gel was then vacuum dried in a gel drier at 80 °C for 1 hour, before being exposed to X-ray film (X-Omat™, Kodak) overnight.

2.2.4 Sequencing Gels - Formamide

G-C rich regions of the sequence were difficult to read due to apparent compression of the sequence on a 1.2X TBE sequencing gel. Formamide sequencing gels are strongly denaturing and act to clarify regions of compression by eliminating secondary structures in the DNA strand. Formamide gels consisted of 21 g urea dissolved in 20 ml formamide (Ultrapure Bioreagent, J.T. Baker Inc.), 8 ml Long Ranger™ (J.T. Baker Inc.) concentrate, 5 ml 10X TBE, polymerized with 400 µl 10% APS and 60 µl TEMED. The gel solution was warmed to dissolve the urea and then cooled to room temperature before the polymerizing agents were added. The plug for the formamide gel consisted of 1.5 ml of the above gel solution polymerized with 25 µl 10% APS and 2.5 µl TEMED. Formamide sequencing gels were run with 1X TBE running buffer at 35 watts for twice the length of time as 1.2X TBE sequencing gels. Formamide gels must be fixed after being run to prevent stretching and distortion. The top plate of the gel cassette was removed and the lower glass plate with the gel placed in a 20% ethanol, 10% acetic acid fixing solution for 15 minutes. The fixative was drained away and the gel dried as above.

2.2.5 Assembly of Sequences

The sequence was read from the developed X-ray films and entered by hand into the SEQUIN program of the PC/GENE software package (version 4.16, 1992, IntelliGenetics). This software package contains programs which were used to align and overlap sequences (ASSEMBGEL), as well as to identify potential eukaryotic promoter elements (EUKPROM), hairpin loops (HAIRPIN) and repetitive motifs (REPEAT) within the gene sequence.
The program NALIGN was used to compare nucleotide sequences by aligning them.

2.3 Confirmation of Intron by Polymerase Chain Reaction

PCR primers (II5G.4 and II5G.5, Table 1) were used to confirm the presence of an intron in the 2S albumin gene. The primer II5G.4 anneals 467 basepairs upstream from the 5' border of the intron; II5G.5 anneals 10 basepairs downstream of the 3' intron border. When the cDNA clone (II5G1001) was used as the template, the primer pair produced a 517 bp PCR product. Genomic DNA templates assayed were from the interior spruce tissue culture line W70, and from individual trees: EWS 1647, PG2, PG5, PG8, EK6 and EK46 (EWS = eastern white spruce, PG = Prince George interior spruce, EK = eastern Kootenay interior spruce). PCR reactions consisted of 1 µl DNA template (20 to 100 ng, RNA free), 100 ng each primer, 2.5 µl 10X Taq polymerase buffer, 1.5 µl 25 mM MgCl2, 2.5 µl 10X dNTPs, and 1 unit Taq DNA polymerase (Promega). The thermocycler (Ericomp) was set to hot start at 95 °C, then perform 30 repeats of a 30 second annealing step at 55 °C, a 1 minute chain extension step at 72 °C and a 30 second denaturing step at 94 °C. The program ended with one round of strand completion consisting of 1 minute 40 seconds at 55 °C plus 10 minutes chain extension at 72 °C.

2.4 Plant Methods

2.4.1 Interior Spruce Embryogenic Cultures

The interior spruce embryogenic culture line, W70, used in these experiments consistently produced large numbers of high quality somatic embryos. This culture line was initiated in 1987 (Roberts et al., 1991) from a seed of the open pollinated family PG118 (maternal parent), originally from the Prince George region of British Columbia. Interior spruce exists as a natural hybrid or introgression zone between Picea glauca [Moench] Voss (white spruce) and P. engelmannii Parry (Engelmann spruce) (Owens and Molder, 1984).
Interior spruce embryogenic cultures were maintained and matured as per Roberts et al. (1990a). The culture line was maintained in the dark on 1/2 LM medium (Appendix B, page 188) solidified with 0.6% noble agar (Difco). Embryogenic suspensor masses (ESM) were subcultured weekly by dividing into quarters and transferring to fresh medium. Somatic embryos were matured from ESM under a 16 hour photoperiod (25 to 35 µE m⁻² sec⁻¹). Maturation was initiated by transferring a 3/4 cm² section of ESM from maintenance medium to hormone free 1/2 LM charcoal medium (Appendix B, page 189) for one week. The ESM was then transferred to 1/2 LM maturation medium (Appendix B, page 189) containing 60 µM abscisic acid (ABA) and 1 µM indole-3-butyric acid (IBA) for four to six weeks, with subculture to fresh media every two weeks. Interior spruce embryos reached maturity, based on embryo size and morphological characters, after 4 to 5 weeks on maturation medium.

2.4.2 Tobacco Plants

Nicotiana tabacum cv. Xanthi plants used for Agrobacterium tumefaciens leaf disk transformation were supplied by Agriculture Canada, Pacific Agriculture Research Station (6660 NW Marine Dr., Vancouver, B.C.) as small soil grown plants with 4 to 6 leaves. Tobacco plants were grown in 5 inch diameter pots containing sterilized potting mix under high intensity lights (200 µE m⁻² sec⁻¹) for an eighteen hour photoperiod, to encourage flowering. Plants were fertilized every two weeks with a dilute solution (1.2 g/l) of 20-20-20 soluble fertilizer. Shoot tips were cut from selected tobacco transformants, surface sterilized (3 minutes in 70% ethanol, 5 minutes in 20% commercial bleach plus 0.1% Tween 20 (polyoxyethylene sorbitan monolaurate) (Sigma), 4 washes in sterile distilled water) and grown in GA-7 (Magenta Corp.) vessels on MS medium (Appendix B, page 191) containing 0.1 mM benzyladenine (BA), as a backup to the soil grown plants.
Tissue cultured tobacco shoots were transferred to fresh media every 5 weeks.

2.5 Gene Expression Experiments

2.5.1 Transient Expression

Interior Spruce Developmental Stages - Target Preparation

Proembryos, the earliest somatic embryo stage, consist of a transparent round embryo (10 to 40 cells) attached to a strand of suspensor cells (Figure 3a). Proembryos may be single or attached in ridges consisting of a file of embryos and suspensors attached laterally to one another. An ESM is essentially a mass of proembryos with the embryos on the surface of the ESM facing up from the medium and outward. Targets of proembryos were prepared by briefly suspending 7.5 g of ESM (interior spruce, line W70) in 37.5 ml of liquid 1/2 LM medium (Appendix B, page 189), then pipetting 1.5 ml of the suspension onto a Whatman #1 filter (5.5 cm in diameter) (Whatman) to form a thin, even layer. Filters were briefly blotted on sterile paper towel, then centred on 1/2 LM (Appendix B, page 189) petri plates solidified with 0.4% Gelrite® (Scott Laboratories, Inc.) rather than noble agar (Difco) for microprojectile bombardment.

Stage 2, also known as the globular stage embryo, is the first developmental stage visible during maturation of the spruce embryos. Stage 2 embryos consist of a yellow, opaque embryo, which can be round or bullet shaped, subtended by clear strands of suspensor cells (Figure 3b). Stage 2 embryos were picked individually from the surface of the ESM after one week on maturation medium. Twenty stage 2 embryos were laid in four rows of five on a 1 cm² piece of 53 µm nylon mesh. Each target was then centred on a 1/2 LM 60:1, 0.4% Gelrite (Appendix B, page 189) petri plate.

Stage 3 is the early cotyledonary stage of conifer embryo development, harvested from ESM after 3 weeks on maturation medium (Figure 3c).
This stage has also been referred to as the "flat head" stage of embryo development, as the ring of cotyledon primordia is just arising around the apical dome, giving the top of the embryo a flat appearance. Embryos were considered to be beyond stage 3 (in transition to maturity) when the tips of the cotyledons had elongated past the apical dome. Temporally, stage 3 is very brief as the cotyledons tend to elongate rapidly. Stage 3 embryos were arranged for microprojectile bombardment in the same manner as the stage 2 embryos.

Figure 3: Spruce Developmental Stages for Microprojectile Bombardment. A, proembryo; B, stage 2 embryo; C, stage 3 embryo (early cotyledonary stage); D, mature embryo; E, somatic germinants after 3 weeks on germination media; F, germinating white spruce pollen. The somatic embryo stages, germinants and pollen were analysed for expression of the endogenous 2S albumin gene by Northern blotting and were prepared as targets for microprojectile bombardment as described in the Methods.

The mature embryo is larger than the stage 3 embryo and the cotyledons have extended past the apical dome (Figure 3d). The cotyledons of somatic embryos have a tendency to spread out petal-like away from the apical dome; this differs from zygotic embryos, where the cotyledons tend to close over the apex. Mature embryos were harvested from the ESM after 4 to 5 weeks on maturation medium. Mature embryos were arranged for microprojectile bombardment as above, 20 embryos per target.

Partially-dried embryos were mature embryos that had undergone a three week high relative humidity treatment (Roberts et al., 1990b). The high relative humidity treatment involved placing squares of 53 µm mesh, each containing 20 to 25 mature embryos, in alternate empty wells of a sterile tissue culture 12-well plate (Becton Dickinson Labware), the 6 remaining wells being filled with 1 ml of sterile distilled water absorbed in a 1 cm piece of Kimpak™ (Seedboro Equipment).
The embryos were not supplied with nutrients during the partial drying treatment. The 12-well plates were sealed with parafilm (American National Can™), wrapped in aluminium foil and stored in the dark for 3 weeks. Partially dried embryos were prepared for microprojectile bombardment by transferring the nylon mesh from the 12-well plate to hormone free 1/2 LM plates (Appendix B, page 189) solidified with 0.4% Gelrite.

Partially-dried somatic embryos were germinated on 1/2 LM HF, solidified with 0.6% noble agar (Appendix B, page 189), in 500 ml Phytocon tubs (Sigma). Three week old germinants were removed from the medium and prepared for microprojectile bombardment by positioning them horizontally, 7 per target, in the centre of a 0.8% water agar plate (Appendix B, page 191) (Figure 3e).

White spruce (Picea glauca) pollen was harvested in May 1993 at the Petawawa National Forestry Institute in Chalk River, Ontario and stored desiccated at 4 °C. Pollen grains were suspended in sterile distilled water (0.2 mg/ml) and stirred constantly with a magnetic stir bar on a stir plate. Five millilitres of pollen suspension were vacuum filtered onto sterile 5 cm squares of Biotrans nylon membrane (ICN Biochemicals), using a sintered-glass Buchner funnel. Pollen targets were centred on petri plates containing 5% sucrose, 0.6% water agar (Appendix B, page 191) for bombardment. Sucrose in the pollen medium was necessary for germination of the pollen grains. Only germinating pollen grains are able to express GUS (Figure 3f).

DNA Preparation for Microprojectile Bombardment

The expression vectors p2SMTN, p2S700, p2SGUS (Figures 2 and 19) and pBM113kp (Marcotte et al., 1988) were transformed by heat shock (Methods and Materials section 2.1.5) into the E. coli strains JM101 and SURE® (Stratagene) respectively.
Plasmids were isolated by alkaline lysis (Methods and Materials section 2.1.2) and purified by cesium chloride gradient (Methods and Materials section 2.1.2). Vector DNA was quantified by absorbance at 260 nm and diluted to 1 µg/µl for use in transient expression studies. Vector DNA was precipitated onto 1.6 µm gold particles (BioRad) with CaCl2/spermidine (Klein et al., 1988). Ten microlitres of 1 µg/µl cesium chloride-purified plasmid DNA, 50 µl 2.5 M CaCl2, and 20 µl 100 mM spermidine free base (Sigma) were added, in order, to 25 µl of gold particles [3 mg/25 µl distilled water] in a 0.5 ml microfuge tube while being vortexed. The gold microprojectiles were incubated at room temperature for 10 minutes before being briefly centrifuged and washed twice in 100% ethanol. The vector-coated gold particles were resuspended in 50 µl of 100% ethanol and, while being vortexed, were dispensed as 5 µl aliquots onto the centre of each macrocarrier. Macrocarriers were loaded into the PDS 1000/He after the ethanol had evaporated. Gold microprojectiles were propelled into the spruce tissues by a burst of helium gas as the rupture disk gave way.

Microprojectile Bombardment

Twenty-three targets were prepared, as a group, for each embryo developmental stage (proembryo, stage 2 embryo, stage 3 embryo, mature embryo, somatic germinant and white spruce pollen). Targets were divided between the five expression vectors: five targets each for p2SGUS, p2S700, p2SMTN, and pBM113kp, and three targets for the negative control p2S+1. Targets for several experiments would be prepared in the morning and then bombarded in the late afternoon. Each experiment was replicated three times, usually on different days, for a total of 69 targets per experiment. In total, 5963 interior spruce somatic embryos were harvested for the transient expression experiments.
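The target allocation described above can be tallied in a few lines. The following is only an illustrative sketch: the dictionary and function names are hypothetical, and the numbers are taken directly from the text (five targets for each of four vectors, three for the negative control, three replicates).

```python
# Illustrative tally of bombardment targets for one developmental stage.
# Names are hypothetical; counts come from the Methods text.
TARGETS_PER_VECTOR = {
    "p2SGUS": 5,
    "p2S700": 5,
    "p2SMTN": 5,
    "pBM113kp": 5,
    "p2S+1": 3,  # negative control
}
REPLICATES = 3


def targets_per_replicate(allocation):
    """Number of targets prepared as one group for a single stage."""
    return sum(allocation.values())


def targets_per_experiment(allocation, replicates):
    """Total targets once the group has been replicated."""
    return targets_per_replicate(allocation) * replicates


if __name__ == "__main__":
    print(targets_per_replicate(TARGETS_PER_VECTOR))
    print(targets_per_experiment(TARGETS_PER_VECTOR, REPLICATES))
```

The sums agree with the 23 targets per group and 69 targets per experiment stated in the text.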
The DuPont PDS 1000/He (BioRad) was operated as per manufacturer's instructions at a vacuum of 25 inches Hg, with 1100 psi rupture disks, a gap distance of 3/8 inch, and an internal nested gap of 16 mm. Microprojectile bombardment, for all developmental stages except the somatic germinants, was carried out with the targets positioned at shelf level 2 (8.3 cm from the stopping mesh). Germinants were bombarded at shelf level 3, 5.1 cm from the stopping mesh. The vacuum flow rate was set at 0.75 and the vent flow rate was set at 0.35 on the PDS 1000/He. Forty-eight hours after bombardment, the interior spruce targets were histochemically assayed for transient GUS expression.

2.5.2 Stable Expression Experiments

Tobacco Transformation

Tobacco (Nicotiana tabacum cv. Xanthi) was transformed using a leaf disk method (Horsch et al., 1985). Agrobacterium tumefaciens cultures (EHA105/pBIN2S and EHA105/pBIN700) used for co-cultivation of tobacco were initiated by picking single colonies from YEP (Appendix B, page 187) kanamycin 50 mg/l agar plates and inoculating 3 ml of YEP broth plus rifamycin 50 mg/l, kanamycin 50 mg/l. The liquid culture was incubated on an orbital shaker at 28 °C overnight. The next morning, the cultures were diluted 1:10 with YEP broth plus antibiotics and grown to log phase (an OD 600 nm of between 0.7 and 0.9). Antibiotics were washed from the A. tumefaciens log phase cultures by centrifugation at 3900 rpm in a Beckman GP benchtop centrifuge to pellet the cells, followed by resuspension in 3 ml YEP, repeated 3 times. Cultures were diluted to 30 ml with YEP broth, and used for co-cultivation.

Leaves from young greenhouse grown plants (4 to 8 leaf stage) were removed and surface sterilized by soaking for 15 minutes in a solution of 10% commercial bleach and 0.1% Tween 20 (Sigma), followed by 4 washes with sterile distilled water. Avoiding major leaf veins, disks (10 mm diameter) were cut using a flame-sterilized cork borer.
Leaf disks were soaked in co-cultivation broth for 15 minutes, gently blotted dry on sterile paper towels and plated, abaxial side down, on Murashige and Skoog - shoot induction medium (MS-SIM) (Appendix B, page 191). Control leaf disks, which were not exposed to Agrobacterium, were prepared and plated on MS-SIM at the same time. The plates of tobacco leaf disks were placed under lights (25 to 30 µE/sec/m² light intensity, 16 hour photoperiod) in the tissue culture room. After 48 hours, A. tumefaciens was visible growing on the surface of the medium around the co-cultivated disks. The leaf disks were transferred to MS-SIM containing 50 mg/l kanamycin, 250 mg/l cefotaxime, and 500 mg/l carbenicillin. One control disk was placed with co-cultivated disks on each plate. Shoots began to grow from the cut edge of the co-cultivated disks after approximately 2 weeks. At 4 weeks, putatively transformed shoots were removed from the initial explant and rooted on hormone free MS containing the same antibiotics as above. Rooted plants were transplanted to sterile potting soil, and kept under low light (35 µE/sec/m²) and high humidity for 3 days before being transferred to high light, long day length (200 µE/sec/m², 18 hr photoperiod) to encourage flowering.

Generation of T1 Plants

Flowers of T0 plants were allowed to self pollinate and the seed collected. T1 seed was germinated on water agar containing 300 mg/l kanamycin. Germinants able to form green true leaves on this high level of antibiotic selection were transplanted to soil and grown under the same regime as the parent plants. T1 plants were allowed to set seed and then were cut back to encourage further growth and seed production. Two families for each construct (pBIN2S and pBIN700) were selected for further study, based on high levels of β-glucuronidase activity as determined by MUG assays of whole green seed capsules.
Expression of the 2.3 kb spruce promoter contained in the construct pBIN2S was studied in 8 offspring of the T0 parent plant 2S-4 and 11 offspring of T0 2S-12. The effect on stable expression of deletion of the 2S albumin promoter upstream from -653 was studied in the pBIN700 group of plants, consisting of 9 individuals originating from T0 plant 700-3 and 11 plants from the T0 plant 700-13.

2.5.3 Assessment of β-glucuronidase Activity

Tobacco leaves, roots, stems, whole flowers, seed and green capsule tissues were assayed for β-glucuronidase expression using both the histochemical (X-gluc) and the fluorescence (MUG) techniques. Tobacco seed capsules covering the range of embryo development were harvested; 1/2 of each capsule was placed in a vial of FAA fixative (Appendix B, page 182) and the remaining seeds were scraped into a microfuge tube, frozen in liquid nitrogen, and stored at -80 °C until β-glucuronidase activity was measured by MUG assay. Seeds fixed in FAA were dissected and scored for stage of embryo development. A developmental series of tobacco embryos was dissected from fresh seeds, histochemically stained for GUS expression and fixed in FAA. These embryos were passed through a dehydration series from FAA to FAA:acetone (1:1), to acetone:immersion oil (1:1), then photographed in immersion oil using a Zeiss light microscope.

Histochemical GUS Assay (Jefferson et al., 1987)

GUS histochemical stain reagent was made by dissolving 50 mg X-gluc (5-bromo-4-chloro-3-indolyl-β-D-glucuronide, Clontech) in 100 µl of dimethylformamide before being diluted in 100 ml 50 mM NaPO4 buffer (pH 7.0), 1% Triton X-100 (iso-octylphenoxy polyethoxy-ethanol, BDH Chemicals) to give a final concentration of 0.5 mg/ml (Jefferson, 1987). The X-gluc solution was made fresh or stored frozen at -20 °C.
Filter disks containing proembryos and white spruce pollen were placed on Whatman #1 filters which had been soaked in X-gluc solution, whereas larger spruce and tobacco tissues were simply immersed in microfuge tubes or small petri plates. All samples were incubated overnight at 37 °C in the dark. The number and location of blue loci on microprojectile bombarded tissues were observed using a dissecting microscope.

Fluorometric GUS Assay

Tobacco samples were either fresh or frozen in liquid nitrogen and stored at -80 °C before being assayed for β-glucuronidase activity. Tobacco seed (50 - 100 mg), 10 mm leaf disks, pollen and whole flowers were assayed by grinding each tissue in 500 µl MUG extraction buffer (50 mM NaHPO4 (pH 7.0), 10 mM DTT, 1 mM EDTA, 0.1% Triton X-100) in a 1.5 ml microfuge tube using a homogenizer with a pestle and a pinch of acid washed sand (BDH). Whole tobacco leaves, capsules and roots were ground to a fine powder using liquid nitrogen in a pre-chilled mortar and pestle. Approximately 200 mg of ground tissue were transferred to a 1.5 ml microfuge tube and suspended in 500 µl MUG extraction buffer. Insoluble plant material was removed by centrifugation at 13,000 rpm for 10 minutes. The supernatant was removed to a clean tube, 100 µl stored at -80 °C for total protein measurements (BioRad Protein assay, BioRad - see below) and 20 to 50 µl of the crude extract (extract volumes consistent within an experiment) assayed for β-glucuronidase activity. A typical reaction mix consisted of 40 µl sample extract, 745 µl MUG extraction buffer and 200 µl methanol (Kosugi et al., 1990), warmed to 37 °C, to which 20 µl of 50 mM 4-methylumbelliferyl glucuronide (MUG) stock was added (Jefferson et al., 1987). A control blank was made for each set of samples, with 40 µl distilled water replacing the sample extract.
Immediately after the addition of the MUG substrate, the microfuge tube was inverted to mix, and 100 µl of the reaction mix was removed and added to 900 µl Stop buffer (0.2 M Na2CO3) as the zero time measurement. The reaction mix was returned to 37 °C and the next 100 µl samples placed in 900 µl Stop buffer at 60 minutes, 2 hours and 3 hours. Fluorescence was read using an LS 50 Luminescence Spectrometer (Perkin-Elmer Cetus) and the Obey software package with FL Data manager and sipper mechanism. Excitation and emission wavelengths were set at 365 and 460 nm respectively. The spectrometer was calibrated with a dilution series of 7-hydroxy-4-methylcoumarin (4-MU) (Sigma) (0.1 µM, 0.5 µM, 1.0 µM, 5.0 µM, and 10.0 µM), and 100 µl of each standard was added to 900 µl of Stop buffer.

Protein Quantification

The Bio-Rad Protein Assay microtitre plate protocol (based on Bradford, 1976) was used to quantify total protein, for normalisation of β-glucuronidase activity between samples to activity per milligram total protein. Two replicates of 10 µl sample extract, as well as of a BSA standard (Bio-Rad) dilution series, were pipetted into a 96 well microtitre plate. Sample extracts and BSA standards were stored at -80 °C until measured. Standards were 0 mg/ml, 0.070 mg/ml, 0.141 mg/ml, 0.705 mg/ml, and 1.41 mg/ml BSA in sterile distilled water. Bio-Rad dye reagent concentrate was diluted 1:5 with distilled water and filtered through a Whatman #1 filter (Whatman). Two hundred microlitres of dilute dye reagent were added to each well and mixed by pipetting. The microtitre plate was incubated at room temperature for at least 5 minutes and then read at 595 nm using a microtitre plate reader (Titertek, Flow Laboratories, Inc.).
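Converting the 595 nm readings to protein concentration requires a standard curve built from the BSA dilution series. The sketch below shows one common way to do this, an ordinary least-squares line through the standards. It is an assumption made for illustration, not a description of how the plate data were actually processed, and the absorbance values are invented placeholders (real Bradford curves flatten at higher protein concentrations). Only the five standard concentrations come from the text.

```python
# Minimal standard-curve sketch for a Bradford-type assay.
# ASSUMPTIONS: a straight-line fit and the example absorbances below are
# hypothetical; only the BSA standard concentrations come from the Methods.
BSA_STANDARDS_MG_PER_ML = [0.0, 0.070, 0.141, 0.705, 1.41]


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def protein_mg_per_ml(a595_replicates, slope, intercept):
    """Average the replicate wells, then read concentration off the line."""
    mean_a595 = sum(a595_replicates) / len(a595_replicates)
    return (mean_a595 - intercept) / slope


# Hypothetical, perfectly linear absorbances used only to exercise the code.
example_a595 = [0.100 + 0.5 * c for c in BSA_STANDARDS_MG_PER_ML]
slope, intercept = fit_line(BSA_STANDARDS_MG_PER_ML, example_a595)

if __name__ == "__main__":
    print(protein_mg_per_ml([0.34, 0.36], slope, intercept))
```

Averaging the two replicate wells before interpolation mirrors the duplicate 10 µl samples described above.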
Calculation of β-glucuronidase Enzyme Activity

β-glucuronidase enzyme activity was calculated using the measured amount of 4-methylumbelliferone (4-MU) generated by the activity of the enzyme on the substrate 4-methylumbelliferyl β-D-glucuronide (MUG) (calculated from fluorescence measured over time), normalised to the amount of protein in a given sample. An example calculation is given below:

\[
\text{4-MU generated (nmol min}^{-1}\,\mu\text{l}^{-1}) = \frac{\left([\text{4-MU}]_{T_2} - [\text{4-MU}]_{T_1}\right)/(T_2 - T_1)}{(\text{vol. crude extract, }\mu\text{l})\,(1000)}
\]

\[
\text{Protein (mg}\,\mu\text{l}^{-1}) = \frac{(P_1 + P_2)/2}{10\ \mu\text{l}}
\]

\[
\beta\text{-glucuronidase activity (pmol min}^{-1}\,\text{mg}^{-1}) = \frac{\text{4-MU generated (nmol min}^{-1}\,\mu\text{l}^{-1}) \times 1000}{\text{protein (mg}\,\mu\text{l}^{-1})}
\]

[4-MU] = calculated concentration of 4-methylumbelliferone at a given time (pmoles/ml)
T1 and T2 = times when samples were taken (minutes)
P1 and P2 = concentrations of replicate protein samples (mg/ml)

2.6 Data Analysis and Statistics

Data were compiled and analysed using the statistics package SYSTAT, version 5 (SYSTAT Inc.). Transient expression was quantified as number of blue loci per target or per embryo and stable expression as pmoles of MUG generated per minute per milligram of total protein. The mean level of expression and standard error were calculated for each developmental stage or tissue type. An analysis of variance (ANOVA) was done on the data, as well as a pair-wise comparison of means (Fisher's least-significant-difference test) (Fisher, 1935) to confirm statistically significant differences between means.

CHAPTER THREE

Gene Structure

3.1 Characterisation of λ3.2

3.1.1 Pseudogene (Ψ2S)

Two positive lambda clones (λ2.1 and λ3.2) were isolated from a white spruce genomic library (PNFI-X-88) screened with the synthetic oligonucleotide II5G.1 (Table 3: Synthetic Oligonucleotides) (Dr. Craig Newton, Forest Biotechnology Centre - BC Research Inc., 3650 Wesbrook Mall, Vancouver, B.C., V6S 2L2). My research began with the restriction mapping of λ3.2.
In the course of mapping the lambda clone, Southern blots revealed the presence of two regions of sequence with homology to the 2S albumin cDNA (GenBank X63193) (Figure 4). A 1.7 kb SalI/BamHI restriction fragment of λ3.2 (subcloned into the sequencing vector pGEM and denoted p12a-3) was partially sequenced and the BamHI restriction site was found to be located in the putative coding region of the gene. This was unexpected as the original cDNA clone did not have a BamHI site. Further sequencing of p11b-2 (a 13 kb SalI fragment which contained the previous subclone) revealed that this sequence was a pseudogene containing stop codons in all reading frames, upstream of a 643 basepair insertion (Figure 5). The sequence of the Picea glauca 2S albumin pseudogene, Ψ2S, was submitted to GenBank, where it was assigned accession number U92078.

The insertion sequence was A/T rich and had 3 small inverted repeats at its 5' and 3' borders (5'-GCACA/TGTGC-3', 5'-AATTAATG/CATTAATT-3', and 5'-ACGCCGAA/TTCGGCGT-3'). The presence of inverted repeat sequences at both ends of the insertion indicates that this may be a type of transposable or viral element. In order to discover if this inserted sequence had homology to previously identified insertion elements, transposons, or retro-viral elements, the nucleotide sequence of the insertion was translated in all reading frames in both the forward and reverse directions. This resulted in six possible amino acid sequences, which were submitted to a BLAST search for comparison with known protein sequences (Altschul et al., 1990). Potential matches from the different translations are listed in Table 4. Amino acid sequence alignments from the BLAST searches and alignment statistics are recorded in Appendix D (page 195).
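The six-frame translation step described above is simple to reproduce. The sketch below is a generic illustration using the standard genetic code; it is not the software used in the thesis, and the function names and frame labels are hypothetical (the labels merely follow the "1 forward" to "3 reverse" naming of Table 4). Codons containing an ambiguous base such as N are reported as X.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq, frame=0):
    """Translate one reading frame; '*' marks stops, 'X' unknown codons."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(seq):
    """Translate three forward and three reverse-complement frames."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"{offset + 1} forward"] = translate(seq, offset)
        frames[f"{offset + 1} reverse"] = translate(rc, offset)
    return frames


if __name__ == "__main__":
    for label, peptide in six_frame_translations("ATGTGA").items():
        print(label, peptide)
```

The same `reverse_complement` helper can be used to check the inverted repeats quoted in the text: the reverse complement of GCACA is TGTGC, which is the second half of the first repeat.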
Figure 4: Restriction Map of λ3.2

Solid black bars above the λ3.2 restriction map indicate restriction fragments subcloned into sequencing vectors (p11b-3, p11b-2, and p12a-3). Right and left arms of the EMBL3 lambda cloning vector are not drawn to scale. Gene diagrams below λ3.2 represent regions of contiguous sequence. Horizontal arrows indicate location and direction of sequencing primers (1 = II5G.1, 2 = II5G.2, 3 = II5G.3, 4 = II5G.4, and 5 = II5G.5). II5G.1, II5G.3 and II5G.4 bind to both sequences. Vertical arrows mark restriction enzyme sites (B = BamHI, H = HindIII, S = SalI, X = XbaI). Striped areas denote regions of homology between the two genes. Location of the first in-frame stop codon is noted in the pseudogene as a solid bar above the "TGA". GenBank accession numbers are U92077 for the 2S albumin gene PG2S and U92078 for Ψ2S.

Figure 5: Picea glauca 2S albumin pseudogene Ψ2S

The pseudogene, Ψ2S (GenBank accession number U92078), was sequenced from the genomic clones p12a-3 and p11b-2. Highlighted sequence in the figure lacks homology with the functional white spruce 2S albumin gene (PG2S). The sequence from 845 to 1488 is a 643 bp insertion into a region homologous to the spruce 2S albumin exon 1. Solid arrows mark sequencing-primer binding sites (II5G.1, II5G.3 and II5G.4); arrowheads indicate the direction of priming. A dotted arrow marks the position where the primer II5G.2 annealed in the functional gene but does not bind in the pseudogene, due to the 5 nucleotide differences noted above the arrow. The putative initiation codon and stop codons (stops in all reading frames) are boxed, and a unique BamHI site is also highlighted. Small inverted repeats (A, B, and C) present in the insertion are underlined.
Figure 5: Picea glauca 2S Albumin Pseudogene Ψ2S [annotated nucleotide sequence; GenBank accession number U92078]

Table 4: Matches from BLAST Sequence Alignments

Insertion reading frame | Sequence matches | GenBank accession # | BLAST score | Identities | Positives
1 forward | none | | | |
2 forward | none | | | |
3 forward | capacitative Ca entry channel (Bos taurus) | X99792 | 59 | 15/38 (39%) | 20/38 (52%)
1 reverse | omega-Grammotoxin SIA (Grammostola spatulata = tarantula) | 451235 | 54 | 10/35 (28%) | 15/35 (42%)
1 reverse | F47C12.5 gene product (Caenorhabditis elegans) | U61946 | 56 | 9/20 (45%); 6/17 (35%) | 13/20 (65%); 10/17 (58%)
1 reverse | sequence 2 from Patent US 4920196 | 100054 | 33 | 7/8 (87%); 5/7 (71%) | 7/8 (87%); 7/7 (100%)
2 reverse | ADP, ATP carrier protein 2 (Arabidopsis thaliana) | P40941, X68592 | 60 | 15/40 (37%) | 17/40 (42%)
2 reverse | nucleotide translocator (Arabidopsis thaliana) | 1908224A | 60 | 15/40 (37%) | 17/40 (42%)
2 reverse | ADP, ATP carrier protein 1 (Arabidopsis thaliana) | P31167, X65549 | 59 | 15/40 (37%) | 17/40 (42%)
3 reverse | none | | | |

The second region of λ3.2 with homology to the cDNA, subclone p11b-3, was sequenced. During restriction mapping, the two related sequences were differentiated by hybridization with the end-labelled oligonucleotides II5G.5 (present in the cDNA and p11b-3) and II5G.1 (present in both p11b-2 and p11b-3). Orientation of the sequences in relation to each other was confirmed using the II5G sequencing primers (Table 3) in various combinations to produce PCR products. Presence or absence of amplified sequences was used to deduce the direction of the primer pair in relation to each other. No product was amplified by the primers II5G.4 or II5G.1 when used alone, confirming that the two genes were arranged facing in the same direction.
3.2 Picea glauca 2S Albumin (PG2S)

Sequencing of p11b-3 resulted in 1907 bp of contiguous 2S albumin genomic sequence; of this, 1063 bp was 5' flanking region, 490 bp was the first exon, 176 bp was a single intron, followed by a small 32 bp second exon and 146 bp 3' flanking sequence (Figure 6). The sequence was named PG2S and submitted to the GenBank database, where it was given accession number U92077. Nucleotide sequence homology between the 2S albumin genomic clone contained in p11b-3 and the pseudogene (Ψ2S - GB U92078) contained in p11b-2 was 84.7% and included 5' flanking sequence as well as coding region on both sides of the insertion in the pseudogene (Figure 7).

Previous work had identified a putative transcriptional start site (+1) as being 62 bases upstream of the translational start site (Craig Newton, unpublished; C. Newton - Forest Biotechnology Centre - BC Research Inc., 3650 Wesbrook Mall, Vancouver, B.C., V6S 2L2). The initiation codon of the 2S albumin genomic clone at position +62 (AGCAAUGGG) was surrounded by nucleotides which are similar to those of the plant consensus sequence for initiation (AACAAUGGC) (Lütcke et al., 1987). The EUKPROM program of PC/GENE (release 6.85) identified a TATA box (TCATAAATACAACAA) 37 nucleotides upstream of the transcriptional start site, with a CAAT box (CGCGGCCATTGG) 153 nucleotides upstream from the TATA box. Several possible RNA cap signals were also indicated at positions -16, -3 or +6. The polyA signal AATAAA was found at +808 and the polyA attachment site was determined to be at +883 by comparison with the cDNA clone.

3.2.1 5' Flanking Sequence

There were five sets of large direct repeats found in the 5' flanking sequence of PG2S (Figure 8). In three cases there are small stretches (1 to 3 nucleotides) of disparate nucleotides interrupting what would be a longer direct repeat.
Also notable was a 30 basepair sequence at -737 repeated completely in tandem once at -707 and again at -677 for the first 12 basepairs. This repeat is labelled as 3a through 3c in Figure 8. Located within this 30 basepair repeat unit was a 6 basepair inverted repeat (atgagcgagctagcaaagctcat) which has the potential to form a hairpin loop with -5.6 kcal free energy. Eleven motifs with similarity to repetitive elements found in cereal seed storage protein promoters were identified (Table 5).

Figure 6: 2S Albumin genomic clone from Picea glauca

The 2S albumin genomic clone was sequenced from the subclone p11b-3. The GenBank accession number assigned was U92077. The promoter sequence (-1001 to -1), single intron, the 3' UTR and 3' flanking sequence are written in lower case nucleotides. The 5' UTR is in uppercase from +1 to +62, as is the coding region from +62 to +551 and +728 to +759, with the translated amino acid sequence given below. The boxed nucleotides in the sequence are, from 5' to 3': a unique XbaI restriction site, the putative CAAT box, the unique SphI restriction site, the putative TATA box, 2 possible RNA cap signals, the initiation of transcription site (open box) and the putative initiation codon (ATG). The second open boxed C marks the site of translational fusion of the 2S albumin promoter to the gus gene. Within the intron sequence two potential hairpin loops are in bold face and possible intron branch sites are underlined. The stop codon (TGA), the poly-A signal, and the site of poly-A attachment are also highlighted.
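The segment lengths quoted in Section 3.2 can be cross-checked against the coordinates given in the Figure 6 caption. The short sketch below is only an arithmetic check: the dictionary layout and names are illustrative, and the intron coordinates (+552 to +727) are inferred here as the gap between the two coding segments rather than taken from the text.

```python
# Inclusive coordinate ranges from the Figure 6 caption (relative to +1).
coding_segments = {
    "exon 1": (62, 551),
    "exon 2": (728, 759),
}

# Assumption: the intron fills the gap between the two coding segments.
intron = (coding_segments["exon 1"][1] + 1, coding_segments["exon 2"][0] - 1)


def length(span):
    """Length in bp of an inclusive (start, end) range."""
    start, end = span
    return end - start + 1


exon1_len = length(coding_segments["exon 1"])
exon2_len = length(coding_segments["exon 2"])
intron_len = length(intron)

# Segment sizes as reported in Section 3.2 (bp).
reported = {
    "5' flanking": 1063,
    "exon 1": 490,
    "intron": 176,
    "exon 2": 32,
    "3' flanking": 146,
}
total = sum(reported.values())

print(exon1_len, intron_len, exon2_len, total)  # 490 176 32 1907
```

The coordinate-derived lengths agree with the reported exon and intron sizes, and the five reported segments sum to the 1907 bp of contiguous sequence.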
Figure 6: 2S Albumin genomic clone PG2S from Picea glauca

[Annotated nucleotide sequence of PG2S (GenBank U92077), numbered from -1001, with the translated amino acid sequence shown below the coding region.]

Translated amino acid sequence (173 residues):
MGVFSPSTTRLTLKWFSLSVALFLLLHWGIPSVDGHEDNMYGEEIQQQRRSCDPQRHPQR
LSSCRDYLERRREQPSERCCEELQRMSPQCRCQAIQQMLDQSLSYDSFMDSDSQEDTPLN
QRRRRRREGRGRDEEEVMERAAYLPNTCNVREPPRRCDIQRHSRYFMTGSSFK

Figure 7: Alignment of Ψ2S with the Functional 2S Albumin

Identity is 84.7% between Ψ2S and the functional 2S albumin gene, PG2S, with nine gaps inserted in Ψ2S and five gaps inserted in the 2S albumin sequence. Identical nucleotides are denoted with "|", gaps are shown as "-". The putative CAAT box, TATA box and initiation codon are highlighted, as are potential stop codons upstream of the insertion.
Even allowing for some uncertainty in the sequence, multiple stop codons are found in all reading frames (reading frames are noted above each stop codon) at the 5' end of the inserted sequence. GenBank accession numbers are U92077 for PG2S and U92078 for Ψ2S. Horizontal black arrows drawn above the insertion sequence indicate three short inverted repeats found near both ends: A = 5'-GCACA/TGTGC-3', B = 5'-AATTAATG/CATTAATT-3', and C = 5'-ACGCCGAA/TTCGGCGT-3'. AT-rich areas at both ends of the insertion are in bold typeface. The primer II5G.3 marks the furthest 3' point to which the pseudogene was sequenced.

Figure 7: Alignment of Ψ2S with the functional 2S Albumin Gene

[Pairwise nucleotide alignment of Ψ2S with the functional 2S albumin gene, ending at Ψ2S position +1681 (2S albumin +576). Across the sequence inserted in Ψ2S the 2S albumin coordinate remains at +347; a BamHI site lies within the insertion. The II5G.3 primer marks the 3' end of the Ψ2S sequence, beyond which the 2S albumin sequence continues.]

Figure 8: 2S albumin promoter sequence including the gus fusion junction

The unique XbaI and SphI restriction sites, which mark the 5' ends of the p2S700 and p2SMTN constructs respectively, are highlighted. Motifs resembling seed-specific elements found in cereal promoters are highlighted. Two putative G-box motifs, a CE1-like element, and a possible opaque-2 binding site are noted. Other putative promoter elements discussed in the text are in bold type and underlined. The putative CAAT box, TATA box and potential RNA cap signals are also highlighted. The site of initiation of transcription is marked with "+1" and the initiation codon is highlighted. Long direct repeats are underlined and numbered 1 through 5. Within direct repeat 3, inverted repeats with potential to form hairpin loops are indicated by bold type. At the promoter/GUS fusion junction +109 marks the last 2S albumin nucleotide, the highlighted sequence 3' to this derives from the pGEM and pBI101.2 polycloning sites, and the boxed ATG is the initiation of translation for the uidA gene.
Figure 8: 2S Albumin Promoter including GUS Fusion Junction

[Annotated sequence of the PG2S promoter from -1001 to the promoter/GUS fusion junction at +109, marking direct repeats 1a-1c, 2a-2b, 3a-3c, 4a-4b and 5a-5b, the XbaI, SphI and BamHI sites, the O-2 site, the G boxes, the CE1-like element, the CAAT and TATA boxes, the cap signals, +1 and the initiation codon.]

Table 5: Sequence with similarity to Cereal motifs

Position | Sequence
-997 | ATGGAAAAGa
-781 | ATTGAAAATt
-461 | ATTGGAAAAa
-431 | cGTtAAAAGc
-388 | taTGAAActt
-376 | gGCtAAAAGc
-302 | ATGTAAAGcc
-270 | gTGTAAAatG
-199 | tTTGAAAGac
-183 | ATTGGAAAca
+632* (located in intron) | ATGCAAAAtt

Consensus sequences
PG2S | A(t)T g/t g/t AAA(a)
cereal motif (1) | TG t/a/c AAA g/t
rice allergenic (2) | ATGCAAAA
-300 motif (3) | TGTAAAG

1 and 3. Reviewed in Morton et al., 1995. 2. Adachi et al., 1993.

An element similar to one of the Opaque-2 transcription factor binding sites of maize (GATGAPyPuTGPu) (Lohmer et al., 1991) was identified at -650 (GATGACATac). Other potential cis acting elements located in the 5' flanking region were two G box-like motifs (cttACGTcgt and tggACGTgga) positioned 132 and 23 basepairs upstream from a motif (atgCACCgc) which resembles the coupling element (CE1) identified by Shen and Ho (1996). Overlapping the putative coupling element is a TGCATGCA motif, similar to the nuclear protein binding USR1 and USR2 regions identified in a pea legumin gene (Howley and Gatehouse, 1997).

Alignment of 400 nucleotides of the spruce 2S albumin proximal promoter with several dicot 2S albumin promoters served to identify other small motifs (Figure 9). These were the G box core (ACGT) and its palindrome (TGCA), the 1/2 RY repeat (CATG), the E box (CANNTG) and a CCAC or CACC element, all of which are generally found in multiple copies in seed-specific promoters. The relative positions of these motifs were somewhat conserved between promoters, and the elements are often adjacent to each other or overlapping. No large regions of conserved sequence were observed.

Figure 9: Alignment of 2S Albumin Proximal Promoters

The 2S albumin promoter sequences were aligned at their TATA boxes. The sequences are numbered with the guanine of the initiation of translation codon as +1. G boxes (ACGT), 1/2 RY repeats (CATG), TGCA motifs, E boxes (CANNTG), and coupling elements (CACC or CCAC) are in bold type and underlined. Motifs similar to ones found in cereal promoters are underlined.
Sequences are 5' flanking regions from 2S albumin genes of Picea glauca, Brassica napus, Arabidopsis thaliana (2), Helianthus annuus, Gossypium hirsutum, Pisum sativum, Ricinus communis, two related Zea mays seed storage proteins, an allergenic protein from Oryza sativa, and the EM promoter of Triticum aestivum. GenBank accession numbers are: PG2S - U92077, BNANAPA - J02798, ATHAT2S1 - M22032, ATHAT2S2 - M22034, HAG5ALB2 - X06410, COTMAT5A - M86213, PEAALBUMI1 - M81864, RC2SALBG - X54158, MZEGLUT2E - M16066, ZMZEINPR - X63667, RICRAG1 - D11433, and TAEMG - X52103.

Figure 9: Alignment of 2S Albumin Proximal Promoters

[Multiple alignment of the proximal promoter sequences labelled PG2S, BNANAPA, ATHAT2S1, ATHAT2S2, BE2S1, BE2S2, HAG5ALB2, COTMAT5A, PEAALBUMI1, RC2SALBG, MZEGLUT2E, ZMZEINPR, RICRAG1 and TAEMG, aligned at the TATA box and running to the initiation codon.]

3.2.2 Intron Sequence

The putative splice sites of the single 176 bp intron followed the GT/AG rule and showed consensus at the splice sites with introns sequenced in angiosperms (Simpson et al., 1993) (Figures 6 and 12). The consensus sequence for angiosperm introns was given as AAG:GTAAGT at the 5' splice site, compared to the white spruce 2S albumin 5' sequence of CTC:GTAAGT, and as TGCAG:GT at the 3' splice site, compared to the white spruce sequence GACAG:GC. The AU content of the intron was 65%, well within the range given for dicots and monocots. Three possible internal branch sites fitting the URAY consensus (R = A or G, Y = C or T) (White et al., 1992) are located in a group starting 49 bp upstream from the acceptor site (taatgattaat). Two inverted repeats (aattatga cgta tcataatt and aaatga ttaatttaag tcattt) were found within the intron; these have the potential to form hairpin loops with free energies of -5.2 kcal and 0.6 kcal respectively. Several putative promoter elements were also located within the intron. These elements included two G box core motifs (ACGT), a 1/2 RY repeat (CATG) and a TGCA motif which was part of a larger element having similarity to the rice glutelin box (ATGCAAAA) (Adachi et al., 1993).

PCR was used to screen genomic DNA from several white spruce trees to determine if this intron were unique or if similar sized insertions were found in other members of the 2S albumin gene family. The primer pair II5G.4 and II5G.5 (Table 3) amplified a 517 bp PCR product when the cDNA clone II5G1001 was used as a template.
Genomic DNA from the white spruce tissue culture line W70 and other individual P. glauca trees resulted in bands of approximately 700 bp. This is within the range predicted: 517 bp from the cDNA plus 176 bp from an intron yields a PCR product of 693 bp. In some cases (EWS1647) there were faint bands larger and smaller than 700 bp, but none were the same size as the cDNA control (Figure 10). A Southern blot confirmed that the PCR products generated had homology to the 2S albumin cDNA.

Figure 10: Evidence of an Intron in Spruce 2S Albumin Genes

Genomic DNA isolated from individual spruce genotypes was used as a template for 2S albumin specific primers in a PCR reaction. PCR bands generated range in size, but are larger than the PCR product of the cDNA clone by approximately 200 bp. Shown is a negative image of a 1X TBE gel (1.1% agarose) stained with ethidium bromide containing the PCR products and molecular weight markers. Lanes 1 and 12 contain a 100 bp ladder (Promega). Bands in lanes 3 to 11 are PCR products of the primers II5G.4 and II5G.5 (5 µl of a 25 µl reaction mix). The DNA templates used in each reaction were: lane 2, no DNA; lanes 3 and 4, W70; lane 5, EWS1647; lane 6, PG2; lane 7, PG5; lane 8, PG8; lane 9, EK6; lane 10, EK46; lane 11, II5G1001 (white spruce 2S albumin cDNA clone, GenBank X63193). The template in lanes 3 and 4 was genomic DNA prepared from mature embryos of the white spruce tissue culture line W70. Templates in lanes 5 to 10 were genomic DNA isolated from buds of individual trees. EWS = eastern white spruce, PG = Prince George region, and EK = east Kootenay region.

Other supporting evidence for the presence of introns in the spruce 2S albumin genes comes from two P. glauca genomic clones sequenced by Craig Newton (unpublished, Dr. C. Newton - Forest Biotechnology Centre - BC Research Inc., 3650 Wesbrook Mall, Vancouver, B.C., V6S 2L2). These genes had low sequence similarity with the cDNA, II5G1001, but had intron-like sequences in the same relative position in the coding region when compared to PG2S (unpublished results). Based on comparison with the cDNA clone and conservation of the intron splice sites, there was a 227 bp intron present in the genomic clone II5G122001, and the second genomic clone GII5G8001 had the beginning of an intron, though sequencing of the 3' end of the gene was incomplete.

3.2.3 Comparison of PG2S with Related Conifer cDNAs

The genomic clone PG2S contained in the subclone p11b-3 was compared to the four largest P. glauca 2S albumin cDNA clones sequenced by Craig Newton (GenBank accession X63193 and unpublished; Dr. C. Newton - Forest Biotechnology Centre - BC Research Inc., 3650 Wesbrook Mall, Vancouver, B.C., V6S 2L2) (Figure 11). Excluding the intron, the genomic clone PG2S had 93.3% identity with the cDNA II5G1001, 90.7% identity with X11A001, 88.1% with III9H001 and 82.4% identity with X5H001 at the nucleotide level. Sequence differences between the most similar cDNA (II5G1001) and the genomic clone resulted in three synonymous substitutions, five substitutions in non-coding areas, and six amino acid substitutions (cDNA compared to genomic sequence: Phe26 > Leu, Asp57 > His, Ala117 > Thr, Glu132 > Asp, Ala135 > Val, Ser165 > Phe). There was also the addition of a sixth arginine in a string of five arginines at position 125 in the genomic sequence.

Figure 11: Comparison of the Spruce 2S Albumin genomic and cDNA clones

Identity at the nucleotide level between the 2S albumin genomic clone and the cDNA clones is: 93.3% with II5G1001, 90.7% with X11A001, 88.1% with III9H001, and 82.4% with X5H001. GenBank accession numbers are U92077 for the genomic 2S albumin clone and X63193 for the cDNA II5G1001. Identical nucleotides are indicated by dashes.
Light shaded boxes indicate neutral nucleotide changes. Dark shaded boxes indicate sequence differences resulting in amino acid substitutions. Open boxes indicate an insertion relative to the other sequences, and "*" indicates a gap placed to allow alignment due to an insertion. A triangle points to the position of the intron in the genomic clone. The amino acid translation of the genomic clone is above the nucleotide sequence. Amino acids which differ between the genomic and cDNA clones are written in bold type in brackets.

[Figure 11: Alignment of 2S Albumin Genomic and cDNA Clones. Alignment rows: genomic clone 11b-3 and cDNA clones II5G1001, X11A001, III9H001 and X5H001, with the amino acid translation of the genomic clone above the nucleotide sequence; final nucleotide positions are 907, 712, 689, 663 and 772, respectively. Legend: neutral nucleotide changes; nucleotide changes leading to amino acid substitution; nucleotide insertions; "*", gap placed due to an insertion or deletion; dash, homologous nucleotides.]

3.3 Amino Acid Sequence

Translation of the genomic sequence resulted in a 173 amino acid polypeptide with a predicted molecular weight of 20 kDa (Figures 6 and 12). The precursor protein was characterised using the PC/GENE program PRESIDUE to plot amino acid proportion along the sequence (Figure 13). The proportions of hydrophobic (Figure 13A), aromatic (Figure 13B), charged (Figure 13C) and polar residues (Figures 13D, E and F) were plotted. The average hydrophobicity of the precursor protein was calculated as 40%. The amino terminal region of the protein is highly hydrophobic from Tyr12 to Val33, due in part to a high concentration of leucine residues. The balance of the protein is hydrophilic except for two small regions of hydrophobicity around Leu99 and from Met138 to Arg151. The few aromatic residues present in the precursor protein tend to be clustered in the amino terminal region from residue 11 to 29 or are located in the last 9 residues of the carboxy terminus. The average proportion of aromatic residues (Phe, Trp or Tyr) was 15%. Charged amino acid residues were found to form three clusters within the amino acid sequence: from residue 31 to 87, from residue 108 to 141, and from residue 150 to 165. The amino and carboxy terminal regions had low proportions of charged amino acids. The average concentration of charged residues within the precursor protein was 25%. Polar neutral residues made up 35% of the precursor protein on average, while polar positive and polar negative residues made up 15% and 10%, respectively. At the amino terminal end of the peptide, polar neutral residues made up as much as 50% of the region between residues 6 and 12. Two other regions with approximately 50% concentration of polar neutral residues were identified between residues 42 and 54 and between residues 87 and 120.
The sequence Ser163, Thr, Gly, Ser, Ser at the carboxy terminus was also highly polar neutral in character. Polar positive residues (histidine, lysine and arginine) fell into three clusters within the peptide. From residue 52 to 78, the polar positive residues reached a maximum concentration of 35%. A sharp peak centred at residue 126 reached a 70% proportion due to a string of six arginine residues, and another small cluster of approximately 35% concentration was present between residues 156 and 165. Polar negative residues (aspartic and glutamic acid) were clustered in four sharp peaks which reached local concentrations of 35 to 50%, centred at residues 39, 78, 111 and 135.

Figure 12: White Spruce 2S Albumin Coding Region

The amino acid translation of the 2S albumin genomic clone is in uppercase letters below the nucleotide sequence. The 5' UTR and the two exons of the nucleotide sequence are written in uppercase; the intron and 3' UTR are written in lowercase. The initiation codon, stop codon, poly-A signal and poly-A attachment site are boxed and identified above the sequence. A putative eukaryotic secretory signal is highlighted from threonine-8 to aspartate-38, with the cleavage site denoted by "|". The proposed mature 2S albumin small subunit is highlighted from glycine-42 to arginine-72. The putative large subunit is from proline-75 to lysine-173. Amino acids strictly conserved in identity and position among dicot 2S albumins, as well as in related monocot seed storage proteins, are boxed. Two potential hairpin loops are highlighted in the intron sequence, as are three potential internal branch sites which conform to the URAY consensus sequence.

[Figure 12: annotated nucleotide sequence of the genomic clone, rows numbered +20 to +860, with the amino acid translation below. Labelled features: initiation codon, stop codon, poly-A attachment site. Legend: leader sequence with splice site denoted by "|"; small subunit; large subunit; conserved amino acids; putative branch sites.]

Figure 13: Graphical Representation of 2S Albumin Amino Acid Statistics

The proportion of specific sets of amino acids in an interval of 11 amino acids is plotted for the spruce 2S albumin protein using the PC/GENE program PRESIDUE. The x-axis represents the amino acid number along the sequence, the y-axis indicates proportion, and the horizontal dotted line marks the average value for the sequence.
A. Plot of the proportion of hydrophobic residues (Ala, Ile, Phe, Leu, Met, Pro, Val, and Trp).
B. Plot of the proportion of aromatic amino acids (Phe, Trp and Tyr).
C. Plot of the proportion of charged amino acids, positive and negative (Arg, Asp, Glu, His, and Lys).
D. Plot of the proportion of polar neutral residues (Cys, Gly, Asn, Gln, Ser, Thr, and Tyr).
E. Plot of polar positive residues (His, Lys, and Arg).
F. Plot of polar negative residues (Asp and Glu).

The predicted hydropathic index of the precursor peptide was plotted using the SOAP program of PC/GENE (Figure 14). A hydrophobic amino terminal region was predicted from residue 9 to 33, as well as a slightly hydrophobic carboxy terminus. The majority of the protein sequence was found to be hydrophilic in nature, with the exception of small regions which were slightly hydrophobic at residue 64, from 90 to 108, and from 139 to 146. The SOAP program predicted that the hydrophobic regions were either folded into the interior of the protein or were inserted into membranes. As well, a transmembrane region was predicted from Ser17 to Val33 by the same software. The grand average of the hydropathicities, or GRAVY score, was given as -11.06. Further characterisation of the precursor protein sequence to predict possible secondary structures was accomplished using the NOVOTNY program of PC/GENE; the results are shown in Figure 15. Propensity to form beta turns, beta sheets and alpha helices was plotted, as were concentrations of charged residues and hydrophobicity as calculated by the method of Rose and Roy (1980).
Secondary structure prediction by the method of Rost and Sander (1994), using the online neural network program Predict-Protein, gave a protein consisting of a loop region at the amino-terminal end followed by three helices, a second loop region, and a fourth helix, with the carboxyl terminus in an extended conformation (Appendix E, page 200). Secondary structure is expected to have an effect on the processing of the mature protein, as processing sites are either exposed or hidden from proteolytic processing enzymes (Hara-Nishimura et al., 1993; Monsalve et al., 1990). The relationship between the putative secondary structure of the precursor protein and the hypothetical structure of the mature spruce 2S albumin protein will be dealt with in the Discussion.

Eight cysteine residues, which are highly conserved in 2S albumin amino acid sequences, define three core domains of the protein (Figure 16). Alignment of the cysteine residues of the gymnosperm and angiosperm 2S albumin seed storage proteins reveals that a few other amino acids are also strongly conserved, namely Leu81, Leu83, Ala94, Ala141, Leu144, and Pro145 as numbered in the spruce amino acid sequence. As noted previously, amino acid homologies are low among the various 2S albumin proteins, though they have similarities such as high glutamine and glutamate content.

Based on comparison with dicot 2S albumin precursor proteins for which the processing is known, an attempt was made to identify the signal sequence, the amino terminal processed fragment, the internal processed fragment and the carboxy terminal fragment of the spruce 2S albumin protein. PC/GENE identified a possible eukaryotic secretory signal which meets the -3, -1 rule of von Heijne (1986) from Thr8 to Gly35 (TTRLTLKWFSLSVALFLLLHWGIPSVDG/HED), with cleavage before His36 Glu Asp (Figure 12).
Alternatively, based on comparison with the known dicot 2S albumin signal sequence, the site of cleavage may be at Ala21, and the amino terminal processed fragment may be cleaved at Tyr41. The spruce prepropeptide sequence differs from the angiosperm sequences in that it appears either to lack an internal processed fragment (IPF) or to have only a very small one (Figure 1 and Figure 16). Among dicot 2S albumins the IPF is normally removed from between the A and B domains to form the mature protein. Interestingly, whereas the putative IPF is smaller, the variable region between the conserved domains B and C is larger than is generally found in dicot sequences.

The amino acid content of the mature protein was calculated based on the estimated processing sites, which would result in a small subunit thirty amino acids in size from glycine 42 to arginine 72, and a large subunit consisting of 98 residues from glutamate 73 to serine 171. This putative mature protein would have a molecular weight of 15.6 kDa. The amino acid composition of this possible mature protein was compared with a directly sequenced Pinus pinaster 2S albumin protein (Allona et al., 1994), the dicot 2S albumin, napin (Ericson et al., 1986), and a monocot 2S albumin directly sequenced from oil palm (Morcillo et al., 1997) in Table 6.

Figure 14: Plot of Predicted Hydropathicity

Hydropathic index of the spruce 2S albumin precursor protein from amino acid 1 to 173, plotted using the SOAP program from PC/GENE. The interval size was 9 amino acids. Values above the -5 score are hydrophobic; values below -5 are hydrophilic. The grand average of the hydropathicity (GRAVY) for the 2S albumin peptide was calculated as -11.06.
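The windowed hydropathy profile and grand average described for Figure 14 can be illustrated with a short sketch. It assumes the Kyte and Doolittle (1982) scale, which is not necessarily the scale or scaling used by the SOAP program, so its values are not expected to match the figures reported in the thesis. The function names are hypothetical, and the sequence is the PG2S translation as it appears in the Figure 17 alignment with gaps removed.

```python
# Kyte-Doolittle hydropathy values (an assumed scale; SOAP may scale differently).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# PG2S precursor translation (173 residues), copied from the Figure 17 rows.
PG2S = (
    "MGVFSPSTTRLTLKWFSLSVAL"
    "FLLLHWGIPSVDGHEDNMYGEEIQQQRRSCDPQRHPQRLSSCRDYLERRR"
    "EQPSERCCEELQRMSPQCRCQAIQQMLDQSLSYDSFMDSDSQEDTPLNQ"
    "RRRRRREGRGRDEEEVMERAAYLPNTCNVREPPRRCDIQRHSRYFMTGSSFK"
)


def gravy(seq):
    """Grand average of hydropathy: the mean scale value over all residues."""
    return sum(KD[aa] for aa in seq) / len(seq)


def windowed_hydropathy(seq, window=9):
    """Return (centre residue number, mean hydropathy) for each full window."""
    half = window // 2
    profile = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        mean_value = sum(KD[aa] for aa in segment) / window
        profile.append((start + half + 1, mean_value))  # 1-based centre
    return profile


if __name__ == "__main__":
    print(f"length = {len(PG2S)}, GRAVY (Kyte-Doolittle) = {gravy(PG2S):.2f}")
    for centre, value in windowed_hydropathy(PG2S, window=9):
        if value > 1.5:  # arbitrary illustrative cut-off for hydrophobic windows
            print(centre, round(value, 2))
```

A window of 9 matches the interval size stated in the Figure 14 caption; the cut-off used when printing is only for illustration.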
[Figure 15 plot. Panels: beta turn, beta sheet, alpha helix, charged residues, hydrophobicity. Schematic of the precursor peptide below the plots: ER signal peptide, small subunit, large subunit.]

Figure 15: Prediction of the 2S Albumin Precursor Protein Secondary Structure

Secondary structure was predicted by the NOVOTNY program of PC/GENE. Possible proteolytic processing sites are marked with arrows and dashed lines, from left to right: ER signal peptide, amino-terminal processed fragment (ATPF), small subunit of the mature protein, internal processed site (or fragment), large subunit of the mature protein and carboxy-terminal processed fragment (CTPF). The conserved cysteine residues are marked by heavy black lines and dotted lines. A, B, and C denote the three conserved domains found in 2S albumin storage proteins. Beta turn, beta sheet and alpha helix are predicted by the methods of Chou and Fasman (1978), and hydrophobicity values are predicted by the method of Rose and Roy (1980).

Figure 16: 2S Albumin Super Family Alignment

An "*" marks amino acids that are conserved in all sequences; similar amino acids are also marked. Gaps were introduced where needed to improve the alignment. Sequences for which the processing is known have the signal peptide printed in blue, the amino terminal processed fragment in red, the mature protein small subunit in green, the internal processed fragment in purple, the mature protein large subunit in black and the carboxy terminal processed fragment in gold. HASF8 and RICRAG1 are monomeric. The amino acid translation of PG2S is aligned with a Pinus strobus 2S albumin translation from the cDNA PSALB3A (GB X62435), representative dicot 2S albumin sequences from A. thaliana (ATHAT2S1, GB M22032, Krebbers et al., 1988), B. napus (BNANAPA, GB J02798, Ericson et al., 1986), Brazil nut (BE2S1, GB X54490, De Castro et al., 1987), cotton (COTMAT5A, GB M86213), sunflower (HASF8, GB X56686, Kortt et al., 1991), Capparis masaikai (CAPMA*, GB P80351, Sun et al., 1996), pumpkin (CUCP2SA, GB D16560, Hara-Nishimura et al., 1993), walnut (JRU66866, GB U66866) and lupine (LACONGLD, GB X53523, Gayler et al., 1990), as well as a monocot, rice allergenic protein (RICRAG1, GB D11433, Adachi et al., 1993).
*The published Capparis masaikai amino acid sequence lacked the signal and amino terminal peptide (Sun et al., 1996).

Signal Peptide and Amino-terminal Processed Fragment
PG2S       MGVF-SPSTTRLTLKWFSLS VA-LFLLLHWGIPSVDGHEDNMY  41
PSALB3A    ERKKSNGCLS SPMS TRLTLKWVTLVAALLFVIHCS TP TVGAHEDMD  46
ATHAT2S1   MAN-KLFLVCA-LALCFLLTNASIYRTVVEFEEDDATN  36
BNANAPA    MAN-KLFLVSATLAFFFLLTNASIYRTVVEFDEDDATD  37
BE2S1      MA--KIS VAAAALLVLMALGHATAFRATVTTT VVEEEN  36
COTMAT5A   MA--KLAVYLATLALILFLANAS ITSVTVESEEN  32
HASF8      MAKFSIVFAAAGVLLLVAMAPVSEASTTTIITTIIEEN  38
2S2CAPMA*  *
CUCP2SA    MARLTSIIALFAVALLVADAYA RTTITTVEVEEN  33
JRU66866   -AALLVAL-LF-VAN--AAAA FRTTITTMEIDEDIDN  32
RICRA5     MASNKVVFSVLLLAVVSVLAATATMAE YH  29
LACONGLD   MAKLTILLIALVAALVLVVHTSA  23

Small Subunit and Internal Processed Fragment
PG2S       -GEEIQQQRRSCDPQRH-PQRLSSCRDYLERRR  72
PSALB3A    GETALQQQRRSCD PQRLSDCHDYLQRRR  74
ATHAT2S1   -PIGP-KMRK-CRKEFQKEQHLRACQQLMLQQARQGRSD EFDFEFFMEN  82
BNANAPA    -SAGPFRIPK-CRKEFQQAQHLRACQQ'WLHKQAMQSGGGPS'WTLDGEFDFEDDMEN  91
BE2S1      -QEE CREQMQRQQMLSHCRMYMRQQ-MEES PYQTM  69
COTMAT5A   -RDS CEQQIRKQAHLKHCQKY MEEE LGGE GSDN  64
HASF8      -PYGRGRTESGCYQQMEEAEMLNHCGMYLMKN LGER SQVS  77
2S2CAPMA   QLWRCQRQFLQHQRLRACQRFIHRRAQFGGQP ELEDEVEDDNDDEN  46
CUCP2SA    RQGREERCRQMSAREE-LRSCEQYLRQQSRDVLQMRGIEN-  72
JRU66866   PRRRGEGCREQIQ.RO.QNLNHC-Q.YLRQO.SRSG GYDEDN-  69
RICRA5     HQDQWYTRARCQPGMGYPMYSLPRCRALVKRQCRGSAAAAE  71
LACONGLD   FQS SKQ SCKRQLQ-QVNLRHCENHIAQRIQQQQEEEEDHALKRGIKHVIL  71

Large Subunit and Carboxy-terminal Processed Fragment
PG2S       EQPSERCCEELQRMSPQ-CRCQAIQQMLDQSLSYDSFMDSDSQE-DTPLNQ  121
PSALB3A    EQPSERCCEELQRMSPH-CRCRAIEQTLDQSLSFDSSTDSDSQD-GAPLNQ  123
ATHAT2S1   PQG-QQQEQLFQQCCNELRQEEPD-CVCPTLKOAAKAVRLQGQHQPMQ VRKI  132
BNANAPA    PQGPQQRPPLLQQCCNELHQEEPL-CVCPTLKGASKAVKQQIQQQGQQQGKQQMVSRI  147
BE2S1      PRRGMEPHMSECCEQLEGMDES-CRCEGLR-M MMMRMQQEEMQPRGE CjMRRMM  119
COTMAT5A   IAGGYIDSCCQQLEKMDTQ-CRCQGLRHA TMQQMQQMQGQMGSKQMREIM  113
HASF8      PRMREEDHKQLCCMQLKNLDEK-CMCPAI MMMLNEPMWIRMRDDQVMSM-  125
2S2CAPMA   QPRRPALRQCCNQLRQVDRP-CVCPVLRQAAQQVLQRQIIQGPQQLKRL  94
CUCP2SA    PWRREGGSFDECCRELKNVDEE-CRCDMLEEIAREE--QRQQARGQE GRQM  120
JRU66866   QRQHFRQCCQQLSO>DDEQ-CQCGLRQWRRQQ--QQQGLRGEEMEEM  113
RICRA5     QVRRD--CCRQLAAVDDSWCRCEAISHMLGGIYRELGAPDVGHPMSEVFR  119
LACONGLD   RHRSSQEYSEESEELDQCCEQLNELNSQRCQCRALQQIYESQSEQCE-GSQQEQQL  126

PG2S       RRRRRR-EGRGRDEEEVMERAAYLPNTCNVREPPRRCDIQRHSRYFMTGSSFK  173
PSALB3A    RRRRPE GRGRE E E EE EAVERAGE LPDRCNVRE S PRRCDIRRH SRYSIIG  172
ATHAT2S1   TQTAKH LPNVCDIPQVDV- CPFN IPS FPS F Y  162
BNANAPA    YQTATHLPKVCNIPQVSV-CPFQKTMPGPSY  178
BE2S1      RLAENIPSRCNLSPMR--CPMGG-SI--AGF  146
COTMAT5A   QKVTKKIMSECEMEPGR--CD-T--SR-SLI  138
HASF8      AHNLPIECNLMS-QP-CQM  142
2S2CAPMA   FDAARNLPNICNIPNIGA-CPFRAW  118
CUCP2SA    LQKARNLPSMCGIRP-QR-CD F  141
JRU66866   VOSARDLPNECGISS-QR-CEIRRS-WF  138
RICRA5     GCRRGDLERAAASLPAFCNVDIPNGGGGVCYWLARSGY  157
LACONGLD   -EQELEKLPRTCGFGPLRR-CDVN-PDEE  153

Table 6: Amino Acid Composition (%) of PG2S

Amino Acid   Picea glauca   Pinus pinaster   Brassica napus   Elaeis guineensis (Oil Palm)
             PG2S           2S albumin       napin            2S albumin
Ala          2.3            6.3              6.3              3.9
Arg          19.3           16.3             5.4              15.8
Asn          2.3            Asx(1) 5.2       1.8              Asx(1) 7.0
Asp          6.9            Asx(1)           0                Asx(1)
Cys          6.2            6.4              7.1              4.9
Gln          11.6           Glx(2) 23.1      23.2             Glx(2) 29.3
Glu          10.8           Glx(2)           3.6              Glx(2)
Gly          3.1            5.4              6.3              9.0
His          1.5            1.8              2.7              1.3
Ile          2.3            1.9              3.6              1.4
Leu          5.4            6                7.1              3.7
Lys          0              0.4              8.0              5.8
Met          3.8            0.2              2.7              0.7
Phe          1.5            0.1              2.7              1.3
Pro          6.2            5.5              11.6             4.9
Ser          9.3            7.3              5.4              4.1
Thr          2.3            0.1              3.6              3.0
Trp          0              not analysed     0.9              not analysed
Tyr          3.1            3.7              0.9              1.3
Val          1.5            2.2              5.4              2.4

1. Asx represents aspartate and asparagine combined.
2. Glx represents glutamate and glutamine combined.

The PC/GENE program CHARGEPRO predicts an isoelectric point of 7.86 for the precursor protein. The precursor protein processed into the mature form predicted above has a calculated isoelectric point of 7.87. The isoelectric points calculated for the predicted small and large subunits separately were 9.97 and 5.76, respectively.

GenBank contains five conifer 2S albumin cDNA sequences besides II5G1001, sequenced by Craig Newton. One was also isolated from Picea glauca (GenBank L47745, Dong and Dunstan, 1995), while the other four were isolated from Pinus strobus (GenBank accessions X62433, X62434, X62435, X62436; Rice and Kamalay, 1991). Identity at the nucleotide level between PG2S (sequence from +1 to +906 minus the intron) and the GenBank cDNA sequences ranged from 50.3 to 93.3 percent (Table 7).

Table 7: Percent Identity between PG2S and Conifer 2S Albumin cDNAs*

Species         cDNA       GenBank Accession   Identity (%)   Global Alignment Score
Picea glauca    II5G1001   X63193              93.3           2572
Picea glauca    PIAEMB25   L47745              65.3           1212
Pinus strobus   PSALB1A    X62433              50.3           754
Pinus strobus   PSALB2A    X62434              66.5           1192
Pinus strobus   PSALB3A    X62435              66.8           1174
Pinus strobus   PSALB4A    X62436              62.2           977

* Scoring matrix gap penalties: -12/-2

Alignment of the conifer 2S albumin amino acid translations in Figure 17 revealed that they share basic similarities in primary structure. The consensus length for alignment was calculated as 203 amino acids; the shared identity among the 10 sequences was 28.1% and the similarity was 24.1%. The matrix of the pair-wise similarity between individual sequences is shown in Table 8 and a dendrogram of the sequence alignment is shown in Figure 18.
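As an illustration of how a pair-wise percent identity can be computed from two rows of an alignment, the sketch below counts matching columns and divides by the number of columns in which at least one sequence has a residue. This is only one possible definition; the PC/GENE calculation and its denominator are not described here, so the sketch is not expected to reproduce the values in Table 7. The function name is hypothetical, and the example rows are the PG2S and X5H001 segments ending at residue 72 in Figure 17, not the full sequences.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent of compared alignment columns in which both rows share a residue.

    Columns where both rows carry a gap are skipped; a gap opposite a
    residue counts as a mismatch (an assumed convention).
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(row_a, row_b):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no columns to compare")
    return 100.0 * matches / compared


# One 50-column segment of the Figure 17 alignment (residues 23 to 72).
pg2s_block = "FLLLHWGIPSVDGHEDNMYGEEIQQQRRSCDPQRHPQRLSSCRDYLERRR"
x5h001_block = "FLLLHWGIPSVDGHEDNMYGEEIQQQRRSCDPQRDPQRLSSCRDYLERRR"

if __name__ == "__main__":
    print(percent_identity(pg2s_block, x5h001_block))
```

The two example rows differ at a single column (His in PG2S, Asp in X5H001), so the segment is far more similar than the full-length figures, which also span the divergent and gapped regions.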
Table 8: Matrix of Pair-wise Similarity of Conifer 2S Albumin Amino Acid Sequence (%)

           PG2S   PIAE25  PSALB1  PSALB2  PSALB3  PSALB4  X11A001  II5G1001  III9H001  X5H001
PG2S       -      51.7    49.8    45.8    49.8    41.4    80.3     79.3      77.8      79.8
PIAE25     51.7   -       43.3    37.9    51.7    42.4    51.7     51.7      50.2      52.2
PSALB1     49.8   43.3    -       59.1    42.9    42.4    52.7     51.7      51.7      52.2
PSALB2     45.8   37.9    59.1    -       43.3    40.9    46.8     45.8      46.3      46.3
PSALB3     49.8   51.7    42.9    43.3    -       63.1    52.7     53.2      51.2      53.2
PSALB4     41.4   42.4    42.4    40.9    63.1    -       44.8     45.3      43.3      45.3
X11A001    80.3   51.7    52.7    46.8    52.7    44.8    -        83.7      82.3      84.2
II5G1001   79.3   51.7    51.7    45.8    53.2    45.3    83.7     -         81.3      84.2
III9H001   77.8   50.2    51.7    46.3    51.2    43.3    82.3     81.3      -         81.8
X5H001     79.8   52.2    52.2    46.3    53.2    45.3    84.2     84.2      81.8      -

Figure 17: Alignment of Conifer 2S Albumin Amino Acid Sequences

PG2S is a Picea glauca genomic clone (GenBank accession number U90277). X11A001, X5H001, and III9H001 are unpublished P. glauca cDNAs sequenced by C. Newton (Forest Biotechnology Centre, BC Research Inc., 3650 Wesbrook Mall, Vancouver, B.C., V6S 2L2). II5G1001 (GenBank X63193) and PIAEMB25 (GenBank L47745) are also P. glauca cDNAs; PSALB1A to PSALB4A are Pinus strobus cDNAs (GenBank X62433, X62434, X62435, and X62436). Translations of the nucleotide sequences were aligned using the PC/GENE program CLUSTAL. Computation parameters: K-tuple value = 1, Gap penalty = 5, Window size = 10, Filtering level = 2.5, Open gap cost = 10, Unit gap cost = 10. The '*' indicates strict conservation of amino acid identity, '.' indicates a similar amino acid, and '-' marks a gap placed for alignment of the sequences.
Figure 17: Alignment of Conifer 2S Albumin Amino Acid Sequences

PG2S      MGVF-SPSTTRLTLKWFSLSVAL  22
X5H001    MGVF-SPSTTRLTLKWFSLSVAL  22
X11A001   MGVF-SPSTTRLTLKWFSLSVAL  22
II5G1001  MGVF-SPSTTRLTLKWFSLSVAL  22
III9H001  MGVF-SPSTTRLTLKWFSLSVAL  22
PIAEMB25  MGVFFSPTSTRLTLKWVSLGMAL  23
PSALB3A   ERKKSNGCLSSPMSTRLTLKWVTLVAAL  28
PSALB4A   LLSNVDEADAKM--GDLVAAV  19
PSALB1A   MKEIFNLSASAGSDYLSIFHRNEERKAMDVF-CPPTTRLALKWASLGVRL  49
PSALB2A   LHRNEERKAMGVF - S P LKWVS LGVAL  25

PG2S      FLLLHWGIPSVDGHEDNMYGEEIQQQRRSCDPQRHPQRLSSCRDYLERRR  72
X5H001    FLLLHWGIPSVDGHEDNMYGEEIQQQRRSCDPQRDPQRLSSCRDYLERRR  72
X11A001   FLLLHWGIPSVDGHEDNMYGEEIQQQRRSCDPQRDPQRLSSCRDYLERRR  72
II5G1001  FLLFHWGIPSVDGHEDNMYGEEIQQQRRSCDPQRDPQRLS SCRDYLERRR  72
III9H001  FLLLGWGIPSVDCHEDNMYGEEIQQQRRSCDPQRDPQRLSSCRDYFERRR  72
PIAEMB25  LLLLHWGTRTVDAHEDGLYGEEVQQQRRSCEQ QRLSSCREYLERPR  69
PSALB3A   LFVIHC STPTVGAHEDTMDGEALQQQRRS CD PQRLSDCHDYLQRRR  74
PSALB4A   LFLIHCSTPTVGAHEDTMDGEALQQQRRSCD PQRLSDCRDYLQRRR  65
PSALB1A   LVLLQWGTPTVDAREGVMYGEGLQQQRRSCD PQRLSECREYMEMRR  95
PSALB2A   LLLLQCATPTVDAREGVMYGEGLQQQRRSCD PQRLSQCRDYMEMRR  71

PG2S      EQPSERCCEELQRMSPQCRCQAIQQMLDQSLSYDSFMDSDSQE-DTPLNQ  121
X5H001    EQPSERCCEELQRMSPQCRCQAIQQMLDQSLSYDSFMDSDSQE-DAPLNQ  121
X11A001   EQPSERCCEELQRMSPQCRCQAIQQMLDQSLSYDSFMDSDSQE-DAPLNQ  121
II5G1001  EQPSERCCEELQRMS PQCRCQAIQQMLDQSLSYDSFMDSDSQE-DAPLNQ  121
III9H001  EQPSERCCEELQRMSPQCRCQAIQQMLDQSLSYDSFMDSDSQE-DAPLSQ  121
PIAEMB25  DQPSERCCEELQRMSPQCRCQAIQRTLE DVFMDSDSQD-GAPLNQ  113
PSALB3A   EQPSERCCEELQRMSPHCRCRAIEQTLDQSLSFDSSTDSDSQD-GAPLNQ  123
PSALB4A   EQPSERCCEELRRMSPHCRCRAIEQTLDQSLSFDLSLNS--QD-GAPLNQ  112
PSALB1A   EQPSERCCEQLERMSPQCRCRAIQQVLDQSQSYD LFMDSEAALNQ  140
PSALB2A   EQP SERC CEELERMSPQCRCRAIQQVLDQSHSYDSTTEDLSMDSDAALNQ  121

PG2S      RRRRRR-EGRGRDEEEVMERAAYLPNTCNVREPPRRCDIQRHSRYFMTGSSFK  173
X5H001    RRRRR--EGRGREEEEAMERAAYLPNTCNVREPPRRCDIQRHSRYSMTGSSFK  172
X11A001   RRRRR--EGRGREEEEAMERAAYLPNTCNVREPPRRCDIQRHSRYFMTGSSFK  172
II5G1001  RRRRR--EGRGREEEEAMERAAYLPNTCNVREPPRRCDIQRHSRYSMTGSSFK  172
III9H001  RRRRR--EGRGREEEEAMERAAS LPNTCNVREPPRRCDIQRHSRYFMTGS SFK  172
PIAEMB25  RRRQRRGQGRGMEEEEWRRAEE LPNTCNVRQ S PRRCD LQRH SRY SITDT S F -  165
PSALB3A   RRRRPE GRGREEEEEEAVERAGELPDRCNVRE S PRRCD IRRH SRY SIIG  172
PSALB4A   RRR--EGRGREEEEEEAVERAEELPDRCNVRESPRRCDIRRHSRYSIIGGT-S  162
PSALB1A   RRRR-ESRGRE-EAEEAEERAAYLPETCNIRQPPRRCDVQRRSRYFTSGSGF-  190
PSALB2A   RRRRRE SRGRE-E-EEAEERVRIPSRNLQRPPATRRCDVQRRSRYFTSGTGF-  171

[Figure 18 dendrogram. Leaf order: PG2S, X11A001, X5H001, II5G1001, III9H001, PIAEMB25, PSALB3A, PSALB4A, PSALB1A, PSALB2A.]

Figure 18: Dendrogram of the Alignment of Conifer 2S Albumins

The conifer 2S albumin genes are clustered in the dendrogram based on similarity in amino acid sequence. The alignment tree was formed by the CLUSTAL program using the UPGMA (Unweighted Pair Group Method with Arithmetic Mean) method of Sneath and Sokal (1973). GenBank accession numbers for the sequences are given in the text and Figure 17.

3.4 2S Albumin Promoter : uidA Translational Fusions

Three vectors were constructed to study spruce 2S albumin promoter function.
The vector p2SGUS consisted of the total length of the upstream region isolated from p11b-3 (approximately 2.3 kb), translationally fused at +109 of the 2S albumin coding region to the polycloning site upstream of the uidA gene in pBI101.2. The second vector, p2S700, was a deletion of p2SGUS from the distal end of the promoter to a unique XbaI site at -653. The third vector, p2SMIN, was a further deletion of the upstream regions of the promoter to an SphI site at position -117, 3' to the putative CAAT box (Figure 19).

The translational fusion between the +109 nucleotide of the 2S albumin coding region and the pBI101.2 polylinker 5' to the uidA initiation codon resulted in a 25 amino acid N-terminal addition to the β-glucuronidase enzyme (Figure 8, page 80). The first 16 amino acids of the chimeric protein originated from the spruce 2S albumin preprotein leader sequence, a single glycine was encoded by three guanines from the pGEM-3Zf(+) polylinker, and the final eight amino acids before the uidA initiation codon were encoded by nucleotides from the pBI101.2 polylinker. The addition of 25 amino acids to the β-glucuronidase enzyme did not appear to adversely affect activity: GUS expression in the seeds of stably transformed tobacco plants was comparable to that observed in seeds of the positive-control plants, which were transformed with the vector pBI121 (Jefferson et al., 1987) containing the uidA coding region under control of the cauliflower mosaic virus (CaMV) 35S promoter (data not shown).

[Figure 19 diagram: constructs p2SGUS, p2S700 and p2SMIN, each fused to GUS. Labelled sites include SalI, EcoRI and -117 bp.]

Figure 19: Spruce 2S Albumin Promoter Constructs

These diagrams represent the translational fusion between the 5' flanking region of PG2S and the uidA reporter gene (p2SGUS), and subsequent deletions (p2S700 and p2SMIN). The promoter region of p2SGUS and the coding region of the uidA gene are not drawn to scale.
Vertical arrows mark unique restriction sites and +1 marks the 2S albumin transcription initiation site. "2S atg" and "GUS atg" mark the initiation codons from the 2S albumin gene and the uidA gene, respectively. The positions of motifs with similarity to cereal seed specific motifs are marked by "*"s (listed in Table 5). Two putative G boxes (cttacgtcgt and tggacgtgg) are located upstream of a putative coupling element (atgcaccgc). The CAAT box (cgcggccattgg, -190) and TATA box (cataaatacaacaa, -36) are represented by a triangle and ellipse on the promoter diagram.

CHAPTER FOUR

Gene Expression

4.1 Expression of the Native Gene

Seed storage proteins are strictly regulated, being expressed at high levels in seed storage tissues (embryo, endosperm or megagametophyte) and, generally, nowhere else in the plant (Bewley and Black, 1994; Morton et al., 1995). Flinn et al. (1993) showed that Picea glauca/engelmannii 2S albumin mRNA began to accumulate in early cotyledonary stage somatic embryos, continued through to maturity, and was turned off by a partial-drying treatment. 2S albumin message was also observed during the early part of precocious germination of somatic embryos which had not undergone partial drying, but the message declined rapidly as this abnormal germination continued. The 2S albumin gene was not expressed in somatic germinants which had previously undergone the partial-drying treatment and which were germinating normally.

In order to compare expression of the native 2S albumin gene with the results of transient expression of 2S albumin promoter:GUS fusion vectors, endogenous activity of the 2S albumin gene was examined in a series of white spruce somatic embryo developmental stages, as well as in partially dried mature embryos and in white spruce pollen. A Northern blot containing 5 µg total RNA per lane was hybridized with a probe homologous to the first exon of the genomic sequence (Figure 20).
The probe, a 517 basepair PCR product of the primers II5G.4 and II5G.3, was amplified from the cDNA clone II5G1001. No signal was detected in the proembryo, but the 2S albumin mRNA began to appear in stage 2 embryos (globular), and was strongly expressed in stage 3 (early cotyledonary) and mature embryos. The messenger RNA declined, but was not completely absent, in partially dried embryos. No 2S albumin mRNA was detected in the germinating white spruce pollen. The Northern blot was stripped and re-probed with a constitutive white spruce 28S ribosomal probe (CN-X6G) to confirm that equivalent amounts of RNA were present in each lane. Band intensity of the 2S albumin RNA bands, corrected to normalise the amount of 28S rRNA per lane, indicated that the message increased four-fold from the stage 2 embryo (band intensity = 45) to the stage 3 (early cotyledonary) embryo (band intensity = 187). The relative amount of 2S albumin message decreased slightly in the mature embryo (band intensity = 152), and decreased a further five-fold in the partially dried embryos (band intensity = 31).

[Figure 20 blot panels: spruce 28S ribosomal probe; spruce 2S albumin cDNA probe.]

Figure 20: Northern Blot of the Spruce 2S Albumin Gene in RNA

Total RNA was extracted from the various somatic embryo developmental stages of spruce; five micrograms were loaded per lane on a 1% agarose, 1X MOPS, 5% formaldehyde gel. The gel was blotted and probed with a 32P-labeled 517 bp fragment of the spruce cDNA clone II5G1001, stripped and then re-probed with a spruce 28S ribosomal probe (CN-X6G).

4.2 Transient Expression in Picea glauca Developmental Stages

Gene constructs need not be stably integrated into the chromosomal DNA of an organism in order to be expressed; transient expression of reporter genes under the control of various promoters has been accomplished by electroporation and microprojectile bombardment.
This type of experiment has been used to compare the "strength" of different promoters or promoter constructs, as well as to identify promoter elements by comparison of the wild-type with a modified promoter sequence. Stable transformation of conifer somatic embryo cultures leading to the production of transformed plants has been achieved (Ellis et al., 1993), but is time consuming and arduous when compared to transformation of plants such as tobacco. It was not known whether transient expression of a 2S albumin seed storage protein promoter:GUS reporter gene construct bombarded into developing white spruce somatic embryos would exactly mirror expression of the native gene in the same tissues. The goal of this experiment was to transiently express a spruce 2S albumin promoter:GUS reporter gene construct in a series of somatic embryo developmental stages, germinants and pollen, and to compare expression with the levels of 2S albumin mRNA present in order to see if transient expression would be developmentally controlled.

[Figure 21 plots, panels A to F. Recoverable panel titles: Partially Dried Embryo, Mature Embryo, Germinant. x-axis: vectors used in bombardment (p2SGUS, p2S700, p2SMIN, pBM113kp, p2S+1).]

Figure 21: Transient GUS Expression in Spruce Somatic Embryos and Germinants

Results of the microprojectile bombardment of spruce somatic embryos and germinants with the white spruce 2S albumin:GUS reporter gene fusion constructs. Graphs are plots of the average number of β-glucuronidase expressing loci per target (graph A) or per individual embryo or germinant of a target (B-F) versus the promoter construct. Error bars within a graph with the same letter above denote means which are not statistically different from each other. Briefly, p2SGUS is 2.3 kb of 5' flank from the gene PG2S translationally fused to the uidA reporter gene, p2S700 is a 5' deletion to -653 of the previous construct and p2SMIN is a deletion to -117. The wheat EM promoter:GUS fusion construct of pBM113kp (Marcotte et al., 1988) was used for comparison, as it is known to be expressed at high levels in conifer somatic embryo cultures (Pierre Charest, pers. comm.; Dr. P. Charest, NRCan, Canadian Forest Service, 580 Booth St., Ottawa, Ont., K1A 0E4). The negative control is p2S+1, which has a single basepair frame shift between the 2.3 kb spruce 2S albumin promoter and the uidA coding region.

Under the control of 2.3 kb of the 2S albumin 5' region (p2SGUS), the β-glucuronidase enzyme was minimally active in proembryos compared to the EM promoter from wheat (pBM113kp), yet showed increasing activity through stage 2, stage 3, mature and partially dried embryos (Figure 21). The mean level of expression for the 15 proembryo targets bombarded was 17.0 GUS-expressing loci per target (SE ± 2.2) (Figure 21A). The expression vector p2S700, in which the 5' promoter region was deleted distally to leave only the proximal region (to -653), had a mean expression level of 29.9 (± 8.2) loci per filter disk. The minimal expression vector, p2SMIN, which is a further deletion of distal sequence to position -117 in the 2S promoter, had a mean level of expression of 23.7 (± 2.9) loci per target. The expression level of the wheat EM promoter (pBM113kp) was 87.4 (± 8.7) loci per proembryo target. Fisher's Pair-wise Comparison of Means found no difference between the mean levels of expression for the three spruce 2S albumin constructs in proembryo tissues, whereas the mean level of expression of pBM113kp was significantly different at a P value less than 0.05.
An analysis of variance of the data from three replicate experiments indicated that there were no significant differences between the experiments, yet confirmed there were significant differences between vectors (P=0.05). Intensity and size of GUS stained foci on the filter disks of proembryos did not differ between constructs, nor were differences or preferences in cell type expressing GUS observed.

Transient expression levels of the four vectors in the stage 2 (globular) spruce somatic embryos followed a similar pattern (Figure 21 B). There were no significant differences, at P < 0.05, among the three 2S albumin constructs. The mean level of expression for p2SGUS was 0.34 (± 0.07) loci per embryo, for p2S700 was 0.33 (± 0.06) loci per embryo, and for p2SMIN was 0.13 (± 0.05). The transient level of GUS expression with pBM113kp was significantly higher, with a mean value of 0.62 (± 0.08) loci per globular embryo. At a P value < 0.05, there were significant differences between experiments and between vectors. When each experiment was viewed separately, there were no significant differences between the 2S albumin constructs within an experiment. The GUS expressing spots on the stage 2 embryos tended to be intense and large in relation to the size of the embryo, which may have resulted in an under-estimation of the number of loci per embryo.

Stage 3 denotes the early cotyledonary stage in conifer embryos, in which a ring of cotyledon primordia is visible on the embryo head under 20X magnification. During the 48-hour period from microprojectile bombardment until assay for GUS expression, cotyledons elongated as stage 3 embryos continued to develop on maturation medium. Stage 3 is the point at which seed storage proteins begin to accumulate in both the zygotic and somatic P. glauca/engelmannii embryos (Flinn et al., 1993).
During this developmental stage, differences in transient expression levels between the three sizes of 2S albumin promoters became apparent (Figure 21 C). The mean level of expression for p2SGUS was 1.47 (± 0.21) blue loci per stage 3 embryo, which was not significantly different (P=0.213) from the mean value for pBM113kp (1.76 ± 0.25 loci per embryo) but was significantly greater than the transient levels of p2S700 (0.86 ± 0.13) and p2SMIN (0.23 ± 0.08 loci per embryo) (P=0.008 and P=0, respectively). p2S700 and p2SMIN were also significantly different from each other (P=0.008). Size and intensity of GUS-expressing loci also varied between vectors. The EM promoter produced the most intense staining, followed in descending order by p2SGUS, p2S700 and p2SMIN. The analysis of variance calculation pointed to significant differences between experiments and between vectors, but the experiment by treatment value was 0.269, which indicates that the trend was not significantly different between experiments (P < 0.05).

The pattern of expression observed when mature embryos were bombarded (Figure 21 D) was similar to that of the stage 3 embryos. The vector p2SGUS had the highest level of expression, 2.94 (± 0.37) loci per embryo, with p2S700 having the next highest level, 1.89 (± 0.26) loci per embryo (significantly different, P=0.002). The minimal promoter was a seventh as active as the full length promoter, with a mean transient expression level of 0.43 (± 0.16) loci per embryo. The mean values for the three albumin vectors were significantly different from each other at a P value of less than 0.05. The wheat promoter of pBM113kp had a mean level of transient expression of 1.84 (± 0.33), which was not statistically different from p2S700 (P=0.868). Analysis of variance of the mature embryo transient expression data found significant differences between experiments and vectors (P=0 in both cases).
The P value for experiment by vector variance was 0.186, which indicates that experimental trends were not significantly different. Intensity and size of GUS-expressing loci showed a general trend among the constructs. Mature embryos bombarded with pBM113kp had the largest, most intense spots, followed in descending order by p2SGUS, p2S700, and p2SMIN.

Partial drying of mature somatic embryos under a high relative humidity has the effect of increasing the percentage and quality of embryo germination (Roberts et al., 1990b). Mature white spruce embryos which had been partially dried in a high relative humidity treatment for three weeks showed a different pattern and higher levels of transient expression than the mature embryo targets when bombarded with the three 2S albumin constructs (Figure 21 E). Unlike the mature embryo targets, which were bombarded on maturation medium (Appendix B, page 189) containing 60 mM ABA and 1 mM IBA, partially-dried embryos were placed on hormone-free medium for bombardment, and assayed for GUS expression 48 hours post-bombardment. The highest level of transient expression was produced by the full-length 2S albumin promoter of p2SGUS (9.6 ± 1.9 blue loci per embryo). This was significantly different (P value less than 0.1) from the mean expression values for the smaller constructs (P=0.058 and P=0.022 for p2S700 and p2SMIN respectively). The construct p2S700 produced on average 6.0 (± 1.1) loci per embryo; p2SMIN, 5.3 (± 1.1) loci per embryo; and pBM113kp, 6.0 (± 0.6) loci per embryo. These levels of expression were not significantly different from each other at a P value of less than 0.1. The intensity and size of GUS-expressing regions was greatest in partially-dried embryos bombarded with pBM113kp. Embryos bombarded with p2SGUS and p2S700 were next most intensely stained and were approximately equal in appearance.
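The deletion-series means quoted so far can be restated as activity relative to the full-length promoter, which makes statements such as "a seventh as active" easy to check. The following Python sketch is illustrative only: the dictionary layout and the function name `relative_activity` are assumptions introduced here, the values are the mean loci per embryo quoted in the text, and the standard errors are ignored.

```python
# Mean GUS-expressing loci per embryo, as quoted in the text
# (stage 3, mature and partially dried embryo targets).
MEAN_LOCI = {
    "stage 3": {"p2SGUS": 1.47, "p2S700": 0.86, "p2SMIN": 0.23},
    "mature": {"p2SGUS": 2.94, "p2S700": 1.89, "p2SMIN": 0.43},
    "partially dried": {"p2SGUS": 9.6, "p2S700": 6.0, "p2SMIN": 5.3},
}


def relative_activity(stage, reference="p2SGUS"):
    """Express each construct's mean as a fraction of the reference construct."""
    means = MEAN_LOCI[stage]
    return {vector: value / means[reference] for vector, value in means.items()}


if __name__ == "__main__":
    for stage in MEAN_LOCI:
        fractions = relative_activity(stage)
        summary = ", ".join(f"{v}: {f:.2f}" for v, f in fractions.items())
        print(f"{stage}: {summary}")
```

For mature embryos the p2SMIN fraction is about 0.15 of p2SGUS, in line with the "a seventh as active" description; a ratio of means says nothing about significance, which the thesis assesses separately by analysis of variance.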
In general, embryos bombarded with p2SMIN had smaller, less intensely blue loci than those bombarded with the other vectors. The negative control vector, p2S+1, which has a single basepair frame shift of the uidA codon in relation to the 2S albumin promoter, was expressed in the partially-dried embryos (1.4 ± 0.7 loci per embryo). Expression was observed in 5 targets during three separate experiments. This was unusual, as GUS expression did not occur with this construct in any of the other spruce embryo developmental stages or in somatic germinants, but was also observed in germinating white spruce pollen grains (see below).

Seed storage protein genes are not active in normally germinating embryos, since at this point in the plant's life cycle seed proteins are being catabolised and the resultant amino acids synthesised into new proteins or shunted into various biosynthetic pathways. In germinants, the levels of transient expression of the 2S albumin promoters were not statistically different from zero (P=0.862 for p2SGUS, P=0.975 for p2S700, and P=0.877 for p2SMIN, when compared to the mean level of expression for p2S+1). Average levels of expression were 0.054 (± 0.31) loci per germinant for p2SGUS, 0.01 (± 0.01) loci per germinant for p2S700, and 0.048 (± 0.018) loci per germinant for p2SMIN. In comparison, pBM113kp produced relatively high levels of expression (1.73 ± 0.40 loci per germinant). Analysis of variance indicated no significant differences between experiments (P=0.463) but significant differences between vectors (P=0). GUS-expressing loci generated by the wheat EM promoter of pBM113kp were generally located in the cotyledons and hypocotyl of the germinants, but few loci were observed in the roots.
The individual somatic germinants which had GUS expressing cells under control of the 2S albumin promoters looked abnormal and had thickened hypocotyls or were vitrified, when compared to germinants not expressing GUS or to germinants expressing GUS under control of the EM promoter.

GUS expression was only observed in pollen grains which were in the process of germinating, in which case the entire pollen grain and germination tube would be stained blue. Pollen grains were germinated by plating them on water agar supplemented with 5% sucrose (Appendix B, page 191). White spruce pollen had high levels of GUS expression when bombarded with the 2S albumin promoters (Figure 22).

Figure 22: Transient GUS Expression in White Spruce Pollen. Bar graph of the average number of GUS expressing loci per pollen target. White spruce pollen grains were suspended in sterile distilled water (0.2 mg/ml) and 5 ml vacuum filtered onto nylon membrane prior to microprojectile bombardment. The vectors used for bombardment are described in the text and in Figure 21. [Bar graph not reproduced; x-axis: Vectors used in Bombardment (p2SGUS, p2S700, p2SMIN, pBM113kp, p2S+1).]

The 2.3 kb promoter, p2SGUS, produced the highest mean level of GUS expression, 175 (± 21.9) loci per filter disk. The distally-deleted 2S albumin promoter constructs, p2S700 and p2SMIN, also had high levels of expression (154 ± 16.5 and 122 ± 27.2, respectively). Fisher's Least-Significant-Difference test did not find any differences between these three means at a P value less than 0.05 (P=0.494 between p2SGUS and p2S700, P=0.100 between p2SGUS and p2SMIN, and P=0.329 between p2S700 and p2SMIN). The wheat EM promoter in pBM113kp also gave high levels of transient expression in pollen. The mean, 168 (± 22), was not significantly different from the other constructs (P values of 0.814, 0.653 and 0.157).
Analysis of variance of the pollen transient expression data indicated no significant difference between experiments (P=0.126) and no significant difference between vectors (P=0.359) when the negative control vector is excluded from the calculation. Only five pollen targets in two experiments were successfully bombarded with the negative control vector p2S+1, due to fungal contamination of replicate plates. Mean level of expression was 56.8 (± 14.6) loci per filter disk. Lack of sufficient replicate numbers for the negative control vector meant that it could not be included in the analysis of variance calculation. Dunnett's two sided test, as well as the Tukey HSD multiple comparison (both pair-wise comparisons), found that there were significant differences between the mean expression levels of p2S+1 and the other constructs at P values less than 0.05.

4.3 Stable Expression in Tobacco

Twenty-three pBIN2S and 21 pBIN700 T0 transformed tobacco plants were grown until they flowered and set seed. Southern blots of tobacco genomic DNA probed with a 750 bp fragment of the uidA gene coding region confirmed that these plants contained the gene. Analysis of vegetative tissues from the T0 tobacco plants for β-glucuronidase activity by X-gluc histochemical staining was negative, there being no expression observed in leaves, roots, stem cross-sections, vegetative buds, sepals, anthers or whole flowers. Mature and immature whole tobacco seeds from the T0 plants, incubated in X-gluc reagent, did not stain blue, but when whole seed capsules from the same plants were ground and tested for GUS activity using the MUG fluorescence assay, results were positive. The activity levels measured by fluorescence were not particularly low (i.e., not below levels at which one would expect to see X-gluc staining).
Further work revealed that the endosperm of the tobacco seeds was not expressing β-glucuronidase and that tobacco seeds had to be partially dissected to allow the X-gluc reagent to penetrate to the β-glucuronidase-expressing embryo.

Seeds from T0 parent plants were exposed to a high level of selection (300 mg/l kanamycin) while germinating on water agar. Four of the 23 pBIN2S transformed plants (2S-4, 2S-5, 2S-12, and 2S-22) and seven of the 21 pBIN700 transformants (700-3, 700-5, 700-9, 700-10, 700-13, 700-19, and 700-20) produced seedlings able to form true leaves on this high level of kanamycin. The T1 seedlings exhibiting this high level of NPTII activity were planted in soil and grown to seed set. Fluorometric MUG assays of T1 plants confirmed that there was no GUS activity in roots, leaves, stems, corollas, sepals, or pollen. β-glucuronidase activity was detected only in whole seed capsules.

Seed-specific β-glucuronidase activity was further explored in two families of sibling plants for each construct. Expression of the 2.3 kb spruce promoter of pBIN2S was studied in eight offspring of the T0 parent plant 2S-4 and eleven offspring of T0 2S-12. The effect on stable expression of deletion of the 2S albumin promoter upstream from -653 was studied in the pBIN700 group of plants, consisting of nine individuals originating from T0 plant 700-3 and eleven plants from the T0 plant 700-13. β-glucuronidase activity of the whole tobacco seed capsule was found to be correlated with embryo size within the developing seed, but was not correlated with capsule size, seed size, gene copy number, or spruce promoter construct. Analysis of gene copy number in the T1 plants by Southern blot indicated that GUS gene copy number varied between two and five copies per plant.
The MUG expression pattern for whole seed capsules was: no expression if seeds were unfertilized or embryos not visible upon dissection, no or very low expression if embryos were 0.1 mm in size (globular or early heart stage), increasing expression for embryos 0.3 to 0.6 mm in length (long heart to the torpedo stage), declining expression as the embryo reached 0.7 mm (late cotyledonary stage) and further decline in the level of expression as the embryo reached maturity. MUG assays of capsule tissues separated from the seeds revealed that only the seeds had GUS activity. Seed representative of the tobacco embryo development stages was harvested and fluorometrically assayed for β-glucuronidase activity (Figure 23).

Figure 23: Expression Pattern of a Spruce 2S Albumin Promoter in Developing Tobacco Seeds. The β-glucuronidase activity of whole tobacco seeds (T2 generation) assayed using the fluorometric MUG assay. The graph is the pooled results from several independent T1 plants transformed with either pBIN2S or pBIN700, as no difference was observed in strength or pattern of expression between the constructs. Stage of embryo development was determined visually by dissection of fixed seeds. Letters above error bars indicate statistical difference (i.e. "b" is statistically different from "c", but "c"s are not statistically different from each other). [Graph not reproduced; x-axis: embryo length (mm) in bins 0.1, 0.1-0.25, 0.25-0.45, 0.45-0.65, 0.65-0.75, 0.75-0.85, with stage labels No Embryo, Globular, Heart, Long Heart, Torpedo, Late Cotyledonary, Mature.]

The spruce promoter was not active in unfertilized tobacco seeds or in globular stage embryos; activity was low in heart stage embryos, but expression increased progressively to the torpedo stage and then declined as the embryo reached maturity. The pattern of expression was the same as the whole seed capsule results, but expression levels were less variable for a given embryo stage due to the exclusion of the non-expressing tissues.
X-gluc assays of dissected tobacco seed (Figure 24) revealed that while the endosperm did not express GUS, the tobacco embryos stained visibly blue after the heart stage and continued to stain through to embryo maturity. Initial expression was localised along the pro-vascular bundles and in the cotyledons of long heart staged embryos (Figure 24). There were no differences observed in the pattern of developmental expression between the two promoter constructs (pBIN2S and pBIN700). Strength of GUS expression, expressed as picomoles of 4-MU generated per minute per milligram total protein, varied between plants transformed with the same construct, as well as between pods on the same plant. Level of GUS expression was not correlated, either positively or negatively, with gene copy number. A comparison of GUS expression levels in seeds harvested 14 days after pollination (torpedo embryo stage) confirmed that there were no significant differences in GUS activity between the two constructs (Figure 25). The mean level of expression for torpedo stage embryos containing the full length promoter in pBIN2S was 2903 ± 602 pmol min⁻¹ mg⁻¹ (n=12); this is not statistically different from the mean level of expression of pBIN700, which was 2557 ± 545 pmol min⁻¹ mg⁻¹ (n=11).

Figure 24: Transformed Tobacco Embryos stained for GUS expression. Embryos are from T1 tobacco plants transformed with the spruce PG2S promoter:GUS fusion constructs (pBIN2S or pBIN700). Embryos were dissected from the seed, stained overnight for β-glucuronidase activity and fixed in FAA. GUS expression was first visible at the long heart stage. [Image not reproduced; panel labels: Globular, Heart, Long Heart, Torpedo, Late Cotyledonary, Mature.]

Figure 25: Relative Strength of pBIN2S and pBIN700. [Bar graph not reproduced; x-axis: 2S Albumin Promoter Constructs (pBIN2S, pBIN700).] Comparison of β-glucuronidase activity by MUG assay for tobacco seeds (T2) from several independent T1 transformants.
Seeds were assayed 14 days after pollination, when embryos were in the torpedo stage of development. There was no statistical difference between the expression levels generated by the two spruce promoter constructs, as denoted by the "a" above the error bar.

CHAPTER FIVE
Discussion and Conclusions

Seed storage proteins act primarily as a source of carbon and nitrogen for the germinating embryo, and in the case of 2S albumins also as a source of sulphur. As nutrient stores they do not, in most cases, have enzymatic activity. They are synthesised for storage and ultimately for catabolism. Hence the evolutionary constraints on changes to the protein structure tend to be looser than would be expected for an enzyme or a transcription factor. Characteristics which would be important for storage proteins may be optimal balance in amino acid composition for the germinant, determinants of secondary structure which would affect "packing" of the protein body, as well as sites necessary for processing, targeting and metabolism of the protein. The 2S albumins and related proteins show greater divergence in amino acid sequence than the legumin (Hager et al., 1995) and vicilin (Newton et al., 1992) storage proteins, which are also found throughout the gymnosperms and angiosperms. Gymnosperm legumins have 37 to 41% identity with angiosperm legumins and 54% sequence identity within the gymnosperms (Hager et al., 1995). A Picea glauca vicilin shares 28 to 38% identity with angiosperm vicilins (Newton et al., 1992). Identity between the conifer and dicot 2S albumins is very low and revolves around the conservation of the eight cysteine residues. Compared to these other storage proteins, 2S albumins also tend to be more heterogeneous within and between species.

The spruce sequence characterised in this research is classed as a 2S albumin based primarily upon two pieces of evidence.
One is the presence of a framework of cysteine residues which serves to define three domains within the 2S albumin protein. The second is the expression profile of the gene, which indicates that this is in fact a seed storage protein, expressed at high levels and restricted to the seed tissues of spruce as well as transgenic tobacco.

Previous research in our laboratory by Barry Flinn (Dr. B. Flinn, Genesis Research and Development Corp. Ltd., P.O. Box 50, Auckland, N.Z.) found that the white spruce 2S albumin protein was located in protein bodies within the parenchyma cells of zygotic and somatic embryos, as well as the megagametophyte. 2S albumin mRNA had also been isolated from white spruce embryos and megagametophytes (Flinn, unpublished). Craig Newton's unpublished research (Dr. C. Newton, Forest Biotechnology Centre - BC Research Inc., 3650 Wesbrook Mall, Vancouver, B.C., V6S 2L2) indicated that this protein was most likely encoded by a multigene family consisting of at least four functional genes, as shown by the cDNA cloning experiments, with the addition of possibly numerous non-functional pseudogenes, as shown by the screening of a white spruce genomic library. I attempted several times to quantify the size of the multigene family by Southern blot, but was unsuccessful, possibly due to the large size of the white spruce genome (9.6 x 10^9 kb), which would dilute the signal from any low copy number gene such as 2S albumin. In hindsight, an estimation of the size of the multigene family could have been arrived at by designing degenerate 2S albumin PCR primers which would anneal to the conserved amino acid sequence QPSERCCE at the 5' side and RRCD(V/I/L)(Q/R)R on the 3' side. Genes could be differentiated based on the size of the PCR product as well as on restriction polymorphisms, or could be directly sequenced.
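One practical consideration for degenerate primers of this kind is how many distinct nucleotide sequences could encode each conserved motif. The Python sketch below is an illustration under stated assumptions, not part of the original work: it uses the standard genetic code, counts every codon combination for the full motif, and the names `codon_count` and `coding_variants` are hypothetical. A real primer would normally be trimmed or redesigned to reduce this number.

```python
from math import prod

# Standard genetic code written in the conventional TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def codon_count(residues):
    """Count codons encoding any residue allowed at one motif position."""
    return sum(1 for aa in CODON_TABLE.values() if aa in residues)


def coding_variants(positions):
    """Number of distinct nucleotide sequences that can encode the motif."""
    return prod(codon_count(residues) for residues in positions)


# Motifs quoted in the text; alternatives at a position are grouped in one string.
FIVE_PRIME_MOTIF = list("QPSERCCE")
THREE_PRIME_MOTIF = ["R", "R", "C", "D", "VIL", "QR", "R"]

if __name__ == "__main__":
    print("QPSERCCE:", coding_variants(FIVE_PRIME_MOTIF))
    print("RRCD(V/I/L)(Q/R)R:", coding_variants(THREE_PRIME_MOTIF))
```

The six-codon residues (Ser, Arg, Leu) dominate the totals, which is why primer sites are usually chosen, where possible, to avoid them or to end before them.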
PCR priming from a site 3' of the intron would give increased polymorphism due to the intron sequence, but as the 3' end of the gene appears less conserved (at least between Picea glauca and Pinus strobus) the design of the primer becomes problematic. A degenerate primer based on the amino acid sequence YF/SMTGSSFK would anneal to a subset of the multigene family related to the high expressing cDNAs. Alternatively, a primer based in part on the sequence of the polyA signal (AAATAAN12) should anneal to all translated 2S albumin genes and perhaps some of the unexpressed gene family members.

5.1 2S Albumin Pseudogene Ψ2S

Two members of the white spruce 2S albumin multigene family, located approximately 4 kb apart, are reported in this work: the functional 2S albumin gene, PG2S, and the pseudogene, Ψ2S. Clustering of 2S albumin genes has also been observed in Arabidopsis (Krebbers et al., 1988). The occurrence of genes coding for the same protein adjacent to one another is generally thought to be an example of gene duplication. It is not known whether the homologous sequences began to diverge before the insertion event occurred, though it is known that non-functional genes have an increased rate of base substitutions (Li and Graur, 1991). The 642 bp insertion present in the pseudogene Ψ2S is AT rich (64%), but is not an intron as it lacks consensus 5' or 3' intron splice sites. The insertion occurs within the first third of the coding region near the amino-terminus and is preceded by two inframe stop codons. Multiple termination codons occur in all three reading frames prior to the insertion. Translation of the pseudogene would result in a truncated product only 48 amino acids in length. The nearest polyadenylation signal is 175 bp downstream from the termination codon, making translation of a truncated product unlikely.
To determine if this insertion is related to any known insertion elements, retrotransposons or transposable elements, the sequence was translated in all reading frames in both directions and a BLAST search done. The results are shown in Table 4 (page 72); seven matches were made with low BLAST scores (ranging from 33 to 60). None of the putative matches were to known transposable elements. In addition, there are no large direct- or inverted-repeat sequences at the insert's borders nor any potential open reading frames, whereas most transposable elements characterised to date contain these features (Wessler et al., 1995). Interestingly, there are three small (5 and 8 nucleotide) inverted-repeats imperfectly arranged at the 5' and 3' borders, as well as AT-rich regions. The presence of small inverted-repeat sequences and increased AT content are characteristics shared by MITEs (miniature inverted-repeat transposable elements), which also lack coding capacity (Bureau et al., 1996). MITEs are believed to be mobile, based on the presence of homologous insert sequences in various genes and on the absence of insertions in certain members of gene families (Bureau et al., 1996). The insert in the spruce pseudogene does not show any homology with the various MITE families characterised thus far (Susan Wessler, pers. comm. - Dr. S.R. Wessler, Dept. of Genetics and Botany, Life Sciences Bldg., University of Georgia, Athens, Georgia, USA, 30602). Without homology to a known mobile element there is not enough information to decide whether the small inverted repeats present at the borders are a coincidence, or if the insert is a MITE which has degraded border sequences similar to elements characterised by Bureau et al. (1996). Alternatively, the insert may be the result of a heterologous recombination event within the genome.

5.2 Characterisation of a Spruce 2S Albumin Intron

Most plant genes contain multiple introns (Simpson and Filipowicz, 1996).
The 2S albumin family of genes appears to be an exception to this, as only two dicot species have introns within their 2S albumin genes and introns are not found in related monocot genes. Introns occur in Brazil nut (Gander et al., 1991) and sunflower (Allen et al., 1987) 2S albumin genes. These genes appear to have independently acquired their single intron, because these species are not closely related and the introns are located at different positions within the coding region.

The putative intron in the spruce 2S albumin gene, PG2S, is a phase 1 intron located near the carboxy-terminus of the mature protein. The intron sequence is AT-rich (65%) and 175 bp in size. This falls close to the average range in size described for higher-plant introns reviewed by Simpson and Filipowicz (1996), who found that 65% are between 80 and 150 nucleotides in length with an average AU content of 70%. They note that the minimum intron size is 70 nt and that a few introns are as large as 2-3 kb. The spruce intron 5'-splice site (CTCGUAAGU) fits the eukaryotic consensus of AG:GUAAGU. In dicot introns, the frequency of occurrence of C at the -3 position in the splice site is 31%, T at the -2 position is 20%, and C at -1 is 2%. The putative intron's 3'-splice site (CAG:GC) also resembles the angiosperm consensus CAG:GU. C occurs 14% of the time at exon position 2 in dicot 3' splice sites. Simpson and Filipowicz (1996) point out that many naturally occurring introns do not exactly match the consensus sequences and this does not appear to affect the ability of the intron to be spliced.

Plant intron branch sites have not been definitively characterised, but they may be similar to the consensus for mammalian intron branch sites (CURAY). Only the A is crucial for function of the branch site; R denotes an A or G nucleotide and Y denotes a C or U.
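The consensus notation can be applied mechanically to candidate branch sites. The short Python sketch below is a hypothetical helper, not a method used in this thesis: it expands the IUPAC codes R and Y and counts how many positions of a five-nucleotide candidate depart from CURAY. The function name `mismatches` and the example candidates are assumptions for illustration.

```python
# IUPAC ambiguity codes needed for the CURAY consensus (RNA alphabet).
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG",  # purine
    "Y": "CU",  # pyrimidine
}


def mismatches(site, consensus="CURAY"):
    """Count positions where the candidate site departs from the consensus."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must be the same length")
    return sum(base not in IUPAC_RNA[code] for base, code in zip(site, consensus))


if __name__ == "__main__":
    # UUGAU and UUAAU are pentamers of the kind discussed for plant introns.
    for candidate in ("CUGAC", "UUGAU", "UUAAU"):
        print(candidate, mismatches(candidate))
```

A pentamer such as UUGAU differs from CURAY only at the first position while retaining the A at the fourth position, which the text identifies as the one base crucial for branch site function.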
Branch sites in eukaryotic introns are generally located 10 to 50 nucleotides upstream of the 3'-splice site. The intron in the spruce 2S albumin gene PG2S has three possible branch sites arranged in tandem (UUAAUGAUUAAU), beginning 50 nucleotides upstream from the 3'-splice site. The UUAAU sequence is similar to a functional plant intron branch site identified as UUGAU in the Rubisco activase gene (Simpson and Filipowicz, 1996).

PCR priming of 2S albumin genes from genomic DNA isolated from individual spruce genotypes resulted in PCR products that were larger than the amplification product of the cDNA clone by approximately 200 bp (Figure 10, page 88). This suggested that these genes all contain insertions of about the same size as the putative intron. Further evidence in support of the presence of an intron comes from Craig Newton's unpublished low-expressing genomic sequences, II5G122001 and GII5G8001, which appear to contain a 227 bp intron and the 5' end of an intron respectively (Dr. C. Newton, Forest Biotechnology Centre - BC Research Inc., 3650 Wesbrook Mall, Vancouver, B.C., V6S 2L2). A third piece of evidence arises from the alignment of the GenBank conifer 2S albumin sequences. The nucleotide sequence on either side of the position of the putative intron present in the genomic clone (i.e., TCTC/GCTAT) is well conserved in all ten of the cDNA sequences ("/" indicates the border between exon 1 and 2). A slight variation exists in two of the Pinus strobus cDNAs. In PSALB1A the second nucleotide of exon 2 is a G and in PSALB4A it is an A. If the intron present in the genomic sequence was unspliced, this would result in a cDNA clone with the sequence TCTCGTAAG at this point.

The intron present in the spruce PG2S gene contains two inframe stop codons.
The first termination codon is eleven codons into the intron, so that if the intron were not spliced but translated, the carboxyl end of the translated peptide would change from RYFMTGSSFK to RKSFNQRYQL. This would not substantially change the nature of the protein, though the strictly conserved Y and G in this region would be lost. Also, the polyadenylation signal and poly-A attachment site would be farther from the stop codon than is usual. A carboxy-terminal processing site might be lost, although it is not known if the spruce 2S albumin protein is processed at the C-terminal end. Despite the second exon being quite small, coding for only ten amino acids, the above evidence taken together supports the presence of an intron.

5.2.1 Promoter Motifs within the 2S Albumin Intron

Although the dicot 2S albumin gene introns were most likely acquired independently during evolution, they share some characteristics with the spruce intron. The dicot introns bracket the spruce intron in size: 86 and 88 bp for the Brazil nut genes and 190 bp for the sunflower gene. They are AT-rich: 65% for the spruce intron, 67 and 71.5% for the Brazil nut introns and 79% for the sunflower intron. The white spruce intron sequence contains elements similar to those found in promoters: two ACGT, a CATG and a TGCA motif contained within a larger element having homology to the rice glutelin box (ATGCAAAA). The intron in the Brazil nut BE2S1 gene has a similar element (ATGCGAAA), plus a single TGCA, whereas the intron in the related gene, BE2S2, contains two TGCA and one ACGT motif (Gander et al., 1991). The intron within the sunflower gene HAG5ALB2 contains two ATGAAAA motifs and a single ACGT (Allen et al., 1987). There are also two pairs of inverted repeats found within the PG2S intron sequence.
These inverted repeats have the potential to form secondary structures, but such structures have no known function in the splicing of the intron and are probably removed by the binding of proteins involved in intron splicing (Simpson and Filipowicz, 1996). Though the presence of introns has been shown to enhance gene expression by increasing the steady-state level of mRNA, they are not thought to necessarily enhance transcription as promoter sequences do (Simpson and Filipowicz, 1996, Koziel et al., 1996). Evidence exists that the interaction of proteins involved in the splicing out of the intron may act to stabilise the mRNA, improving its quantity and quality for translation (Koziel et al., 1996). Introns which have been shown to enhance transgene expression are located in the 5' transcribed region of the gene, do not contain known enhancer elements and must be spliced to be effective (Koziel et al., 1996). Other evidence indicates that introns somehow increase mRNA polyadenylation frequency (Simpson and Filipowicz, 1996). Interestingly, Bolle et al. (1996) showed that a light- and plastid-regulated gene (PsaD) required both the promoter and intron sequences for correct expression, but no promoter elements were identified within the intron. It is unknown whether the introns in the 2S albumin genes enhance expression of these proteins, but the high levels of expression associated with these genes and the presence of promoter-like motifs within the introns make this an interesting question.

5.2.2 Would the Ancestral 2S Albumin Gene Contain an Intron?

The modern conifers are not ancestral to the angiosperms but are a sister group, having arisen from a common progymnosperm ancestor (Stewart and Rothwell, 1993). The presence of an intron in the spruce 2S albumin gene does not necessarily predict that the ancestral gene from which the 2S albumin superfamily arose contained an intron.
Perhaps, as in the case of the Brazil nut and sunflower 2S albumin genes, the white spruce 2S albumin coding region received an intron as an independent evolutionary event. In each of the three species, the intron is located in a different position within the 2S albumin coding region. Comparison with two other families of seed storage protein genes (legumins and vicilins) suggests a tendency towards the progressive loss of introns going from gymnosperm to dicot to monocot. Gymnosperm legumin genes generally contain four introns, while the dicot legumin genes studied to date contain three or two introns (Hager et al., 1996). Gymnosperm and dicot vicilin genes contain five introns, while only four are present in monocot vicilin genes (McHenry et al., 1992). The relative positions and phase of the legumin and vicilin introns are conserved within the respective genes. Characterisation of 2S albumin gene sequences from a wider variety of gymnosperm species, or from more primitive plants, may reveal whether this gene family shows the same tendency towards loss of introns. This may be a general trend, as other conifer genes have been shown to contain introns where their dicot counterparts are intron-less (Sundas et al., 1993), and may partially explain the relatively larger genome size of gymnosperms.

5.3 Translation of PG2S

5.3.1 2S Albumin Cysteine Framework

Cysteine is a sulphur-containing amino acid, not usually found at high levels in the other classes of seed storage proteins (Bewley and Black, 1994). The arrangement of cysteine residues in dicot 2S albumins is strictly conserved (Figure 16, page 103). The small subunit of the mature protein contains two cysteines separated by either eleven or twelve amino acids. Three residues before the second cysteine of the small subunit is a highly conserved leucine. The translated sequence of PG2S results in eleven residues between the first two Cys, one of which is a Leu in the conserved position.
The distance between the second small subunit Cys and the first two Cys of the large subunit varies in the dicot precursor proteins, ranging from 23 to 46 amino acid residues. This variation is due in part to the variation in size of the internal processed fragment. The translations of the spruce genomic clone, PG2S, as well as the conifer cDNA clones indicate that the region between the second and third Cys is significantly and consistently smaller, being only 14 residues. This may indicate that conifer 2S albumins do not have an internal processed fragment. Within the large subunit the third and fourth Cys are adjacent to each other and separated by nine, sometimes ten, residues from the fifth and sixth Cys. There is a second conserved Leu three residues downstream from the paired Cys. The fifth and sixth Cys form a Cys X Cys motif, where in this case X is usually an Arg, Val, or Gln. The spruce sequence shares this CCX2LX6CRC structure. One residue beyond the CRC motif is a pair of residues which tend to be either AI, TL or GL. Conservation of these residues doesn't necessarily follow evolutionary relationships, i.e. CRCXAI is found in the rice allergenic protein and the conifer 2S albumin sequences, and a similar motif CMCXAI is present in the sunflower HASF8 protein. The "variable region", which varies in both length and sequence, is located between the CXC motif and the seventh Cys of the large subunit. This region tends to accumulate small blocks of single amino acid repeats; multiple Met, Arg, Gln, and Glu are common in dicot 2S albumins. The presence of pairs of amino acids is also notable in this part of the sequence. The spruce sequence characterised in this work is rich in Ser and Asp residues in this region, having five of each. As well, there is a string of six Arg out of a total of eight in the variable region adjacent to six Glu.
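The cysteine spacing rules described above can be written as sequence patterns, which makes the framework easier to check against a new translation. This is a minimal sketch under stated assumptions: the two regular expressions are one reading of the spacing given in the text (Cys, 11 or 12 residues with Leu at position -3, Cys for the small subunit; CC X2 L X6-7 C X C for the large subunit), and the toy sequence is hypothetical rather than the PG2S translation.

```python
import re

# Small subunit: two Cys separated by 11-12 residues, Leu three residues
# before the second Cys (i.e. L-X-X-C).
SMALL_SUBUNIT = re.compile(r"C.{8,9}L.{2}C")
# Large subunit: paired Cys, Leu three residues downstream, then 6-7
# residues to the Cys-X-Cys motif (9-10 residues between CC and CXC).
LARGE_SUBUNIT = re.compile(r"CC.{2}L.{6,7}C.C")

def cysteine_framework(protein):
    """Return 0-based start positions of the two framework blocks, or None."""
    small = SMALL_SUBUNIT.search(protein)
    large = LARGE_SUBUNIT.search(protein)
    return {
        "small": small.start() if small else None,
        "large": large.start() if large else None,
    }

# Hypothetical sequence: small block, 14-residue spacer, large block
toy = "C" + "A" * 8 + "LSSC" + "G" * 14 + "CCQQLEEEEEECRC"
print(cysteine_framework(toy))
```

In the toy sequence the spacer between the second and third Cys is 14 residues, matching the spacing reported above for the conifer sequences.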
The inherent variability of the variable region, with its natural tendency to accumulate multiples of particular amino acids, led to its experimental modification to improve seed nutritive quality and to introduce economically important peptides (see Introduction - Table 2). Taken together, these facts support the hypothesis that the variable region loops out from the rest of the protein and is not critical to protein folding (Krebbers et al., 1993). Prediction of protein secondary structure (Rost and Sander, 1993, 1994a, and 1994b - Appendix E, page 200) also suggests a looped structure for the region from Ser 102 to Gly 131. The seventh and eighth Cys located near the C-terminal of the large subunit are separated by either six or seven residues in the dicot sequences. This spacing is conserved in the translation of the spruce sequence. There are three amino acids preceding the seventh Cys which are well conserved in the dicot and conifer sequences. These residues are arranged as Ala X X Leu Pro X X Cys-7. The 2S albumin cysteine residues form inter- and intra-chain disulphide bonds, the pattern of which is strictly conserved among the dicot 2S albumins and slightly more variable in related monocot proteins (Lilley and Inglis, 1986; Nirasawa et al., 1993; Gehrig and Biemann, 1996; Egorov et al., 1996; Salmanowicz and Weder, 1997). The conserved leucine, alanine and proline residues noted above most likely have a conserved structural function as they are located close to the conserved cysteine residues. An example of this may be the conserved Leu 144 and Pro 145 residues, which are part of a β-turn tetrapeptide, LPNT, with a high probability of bending (2.77).

5.3.2 2S Albumin Secondary Structure

Graphical representation of the deduced amino acid sequence of PG2S using the PC/GENE programs PRESEDUE and NOVOTNY indicates that the precursor protein is generally hydrophilic in nature with a hydrophobic N-terminal region.
This is very similar to the dicot 2S albumin precursor proteins (Gayler et al., 1990, Hara-Nishimura et al., 1993). Two minor peaks of hydrophobicity occur on either side of the variable region at amino acid residues 100 and 141, with the balance of the variable region being highly hydrophilic. This is very similar to the hydrophobicity plots presented for the Lupinus angustifolius 2S albumin protein (conglutin-δ) by Gayler et al. (1990) and the Cucurbita spp. 2S albumin by Hara-Nishimura et al. (1993). The hydrophobic N-terminal region of dicot 2S albumins has been characterised as part of the eukaryotic signal sequence which targets the precursor protein to the endoplasmic reticulum (Altenbach et al., 1986, Ericson et al., 1986, Krebbers et al., 1988, Gayler et al., 1990, Hara-Nishimura et al., 1993). The putative signal sequence for the spruce 2S albumin will be discussed in detail below. There are few aromatic amino acids in the translation of the PG2S sequence and they form three clusters: in the N-terminal hydrophobic region, on the N-terminal side of the variable region, and at the carboxy terminus of the protein. The presence of aromatic residues at the C-terminal end of the precursor protein is shared by most of the dicot 2S albumin sequences. These residues contribute to a C-terminal motif which is also generally hydrophobic and charged. This conservation of form may be acting as a signal for the action of a carboxypeptidase or, as in the case of Bertholletia excelsa, may be part of a targeting signal which directs the protein from the ER to the vacuole (Saalbach et al., 1996). D'Hondt et al. (1993b), however, found no effect on vacuolar targeting when the carboxy terminal residues were deleted from the Arabidopsis 2S albumin precursor protein.

5.3.3 Variation in Amino Acid Sequence Among the Conifer 2S Albumins

Differences in the deduced amino acid sequence between the Picea glauca genomic and cDNA clones are mostly conservative in nature.
As would be predicted by the model of 2S albumin evolution proposed by Shewry et al. (1995), amino acid sequence tends to be more rigorously conserved in the domains which contain the conserved cysteine residues. A notable difference between the four Pinus and five of the Picea sequences is the duplication of a block of sequence, DPQR, between the two conserved cysteines of the small subunit, followed by the further divergence of the genomic sequence by a single nucleotide change to DPQRHPQR. In a similar manner, but to a much greater extent, multiplication of blocks of amino acids has often occurred in cereal seed storage protein evolution (Shewry et al., 1995). A second example of protein evolution may be the use of upstream initiation codons by the Pinus strobus cDNA sequences. Although there are Met codons downstream, Kamalay and Rice (1991) believe translation initiates at the upstream methionine. There are no possible initiation codons upstream of the first Met in the PG2S sequence. The dendrogram in Figure 18 (page 109) groups the Picea glauca sequences together, as might be expected. The clustering of the Pinus strobus sequences PSALB3A and PSALB4A with the Picea sequences may be an indication that these sequences are orthologous, i.e. both derived from a common ancestral gene. PSALB3A and PSALB4A would then be considered paralogous to the PSALB1A and PSALB2A genes.

5.4 Predicted Processing of the Mature 2S Albumin Protein

5.4.1 Secretory Signal Sequence

Secretory signal peptides are the amino-terminal portion of precursor proteins targeted to the endoplasmic reticulum (ER) in eukaryotes. They are cleaved as the protein passes through the membrane (Bar-Peled et al., 1996) and are generally characterised as having a positively charged N-region, followed by a hydrophobic H-region and a neutral but polar C-region (Nielsen et al., 1997).
The site of cleavage follows the -3,-1 rule, which states that amino acids at the -3 and -1 positions relative to the site of cleavage must be small and neutral (von Heijne, 1986). The spruce 2S albumin signal peptide predicted using the method of von Heijne (1986) is approximately 15 residues larger than the dicot 2S albumin signal peptides characterised experimentally (Altenbach et al., 1986, Ericson et al., 1986, Krebbers et al., 1988, Gayler et al., 1990, Hara-Nishimura et al., 1993). Many of the dicot 2S albumin signal peptides are cleaved on the carboxyl side of an alanine (see Figure 16, page 103). Similarly, in the deduced amino acid sequence from spruce 2S albumin PG2S, there is an Ala at position 21 which is also conserved in nine of the ten conifer 2S albumin sequences in Figure 17 (page 107). However, the putative signal peptide cleavage site VDGHED, with cleavage between Gly 35 and His 36, also shows conservation among the conifer 2S albumin sequences. The valine which would be at the -3 position relative to the cleavage site is strictly conserved, and the -1 position is occupied by a glycine in four cases, an alanine in five cases or a cysteine in one instance, all of which agree with the -3,-1 rule. Comparison of hydrophobicity plots (using PC/GENE NOVOTNY) for 2S albumins with known signal sequences reveals similarities in the N-terminal region. Cleavage of the signal sequence occurs at or near the first hydrophilic maximum past two N-terminal hydrophobic peaks (Figure 26). The hydrophobicity plot of the putative signal sequence for the spruce 2S albumin protein looks remarkably like the dicot pattern described above, although it is longer. The site of cleavage based on maximal hydrophilicity would be around position 36, which is in agreement with the site predicted by von Heijne's method (1986).
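The -3,-1 rule stated above can be checked mechanically for any proposed cleavage position. The sketch below is illustrative only: the set of residues treated as "small and neutral" is an assumption (it includes the Val, Gly, Ala and Cys examples mentioned in the text, plus Ser and Thr), the function name is hypothetical, and the toy precursor is an invented sequence that merely ends its signal region in VDG.

```python
# Assumed set of residues accepted at -3 and -1; not taken from the thesis.
SMALL_NEUTRAL = set("AGSCTV")

def obeys_minus3_minus1(precursor, cleavage_after):
    """Check the -3,-1 rule for cleavage after the given 1-based position,
    i.e. `cleavage_after` is the position of the -1 residue."""
    if cleavage_after < 3 or cleavage_after > len(precursor):
        raise ValueError("cleavage position outside the usable range")
    minus1 = precursor[cleavage_after - 1]
    minus3 = precursor[cleavage_after - 3]
    return minus1 in SMALL_NEUTRAL and minus3 in SMALL_NEUTRAL

# Hypothetical precursor: residues 33-35 are V, D, G; residue 36 is H
toy_precursor = "M" + "L" * 31 + "VDG" + "HEDNMY"
print(obeys_minus3_minus1(toy_precursor, 35))  # cleavage between G35 and H36
print(obeys_minus3_minus1(toy_precursor, 34))  # Asp at -1 is charged
```

A check like this only filters candidate sites; it does not rank them, which is what the weight-matrix and neural-network predictors cited in the text are for.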
An improved method for predicting the existence of signal peptides has recently become available (Nielsen et al., 1997) and it correctly predicts the known signal sequences of dicot 2S albumins. It also predicts a eukaryotic signal sequence in PG2S, with cleavage between Gly 35 and His 36. Graphical representation of the signal peptide prediction by this method indicates that cleavage after Ala 21 would be the fourth choice, and therefore not as likely as three other positions in the N-terminal sequence. Interestingly, Nielsen et al. (1997) note that only 2.2% of eukaryotic signal sequences are longer than 35 residues, but the high prediction scores for presence of signal peptide, position of cleavage site and combined cleavage site indicate that the spruce 2S albumin does in fact have a large signal peptide.

Figure 26: Position of Signal Sequence Cleavage. Hydrophobicity values are predicted by the method of Rose and Roy (1980). Regions above the horizontal line are hydrophobic and below the line are hydrophilic. The x-axis indicates amino acid number. The position of signal sequence cleavage proven experimentally is indicated by an arrow for the 2S albumin precursor peptides from A. thaliana (ATHAT2S1, GB M22032, Krebbers et al., 1988), Brazil nut (BE2S1, GB X54490, De Castro et al., 1987), B. napus (BNANAPA, GB J02798, Ericson et al., 1986), pumpkin (CUCP2SA, GB D16560, Hara-Nishimura et al., 1993), sunflower (HASF8, GB X56686, Kortt et al., 1991), and lupine (LACONGLD, GB X53523, Gayler et al., 1990) as well as a monocot, rice allergenic protein (RICRAG1, GB D11433, Adachi et al., 1993). Cleavage occurs at or near a position of maximal hydrophilicity adjacent to, in most cases, a double hydrophobic peak. Based on this pattern the predicted site of signal sequence cleavage for PG2S would be at or near residue 36 (indicated by arrow).
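The kind of hydrophobicity plot described in the Figure 26 legend is a sliding-window average over the sequence. The sketch below shows the general technique only. It substitutes the Kyte-Doolittle hydropathy scale for the Rose and Roy (1980) values used in the thesis, so the numbers it produces are not comparable with Figure 26; the window size, function names and toy sequence are likewise assumptions.

```python
# Kyte-Doolittle hydropathy values (swapped in for the Rose and Roy scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=7):
    """Mean hydropathy of each full window along the sequence."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

def most_hydrophilic_center(seq, window=7):
    """1-based residue number at the centre of the most hydrophilic window."""
    profile = hydropathy_profile(seq, window)
    start = min(range(len(profile)), key=profile.__getitem__)
    return start + window // 2 + 1

# Hypothetical sequence: hydrophobic stretch, acidic stretch, short tail
toy = "L" * 10 + "D" * 7 + "A" * 4
print(most_hydrophilic_center(toy))
```

For a real precursor one would look for the first hydrophilic extreme following the N-terminal hydrophobic peaks rather than the global minimum; the global minimum is used here only to keep the sketch short.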
5.4.2 Amino Terminal Processed Fragment

Prediction of the presence or size of an amino terminal processed fragment (ATPF) based on sequence similarities between the deduced conifer amino acid sequences is at best a rough guess. The prediction of HEDNMY41 as the ATPF (the number gives the position of the last residue shown) is based on the putative site of signal cleavage at the amino side, and the strict conservation of Gly 42 adjacent to Tyr 41 within the deduced conifer 2S sequences. Conservation of the Gly residue may indicate that it is the N-terminal amino acid of the small subunit. The tetrapeptide NMYG is predicted to be a β-turn (PC/GENE program BETATURN), with a high probability of bend occurrence (2.29). Monsalve et al. (1990) found that many 2S albumin cleavage sites are within β-turn tetrapeptides, and predicted the existence of an endopeptidase which used β-turns as a recognition site. Unlike the dicot 2S albumin sequences, there are no strictly conserved Asn or Asp residues present which could define the carboxy terminus of an ATPF and no conserved Pro, which often forms the amino terminus of the dicot 2S albumin small subunits. No function has been attributed to the ATPF (D'Hondt et al., 1993b, Muren et al., 1995), and it is not present in all 2S albumin precursor proteins (e.g., Lupinus angustifolius (Gayler et al., 1990), and Helianthus annuus (Thoyts et al., 1996)).

5.4.3 Small Subunit

If the small subunit of the mature protein encoded by PG2S includes the sequence HEDNMY, it would contain 39 amino acids and have a mass of 4.9 kDa. Without the putative ATPF, it would be 4.1 kDa in size. If the putative signal sequence cleavage site at Gly 35 is ignored and cleavage occurred after Ala 21, which is unlikely, the small subunit would consist of 53 amino acid residues (6.5 kDa).
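Subunit masses like those quoted above can be estimated from sequence by summing average residue masses and adding one water for the free termini. The sketch below is a minimal illustration using standard average residue masses rounded to two decimals; the function name is hypothetical, and the method is an assumption about how such estimates are typically made, not a statement of how the values in the text were obtained. Applied to the putative ATPF, HEDNMY, it gives roughly 0.8 kDa, which is in line with the 4.9 versus 4.1 kDa difference stated above.

```python
# Average residue masses in Da (amino acid minus water), rounded.
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.02

def peptide_mass_da(seq):
    """Approximate average mass (Da) of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER

atpf = "HEDNMY"  # putative amino terminal processed fragment from the text
print(round(peptide_mass_da(atpf) / 1000, 2), "kDa")
```

Disulphide bonds, phosphorylation or other modifications would shift these values slightly, and SDS-PAGE estimates for peptides this small are typically imprecise, so agreement to a few tenths of a kDa is about as much as can be expected.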
Interestingly, the Pinus pinaster 2S albumin small subunit was estimated to be 5.5 kDa in size (Allona et al., 1994), and would be made up of 54 amino acids based on the information that the two cysteines are 3.7% of the small subunit. Of course, the Picea and Pinus 2S albumin proteins may not be processed in exactly the same way. Flinn et al. (1993) and Newton (unpublished; Dr. C. Newton, Forest Biotechnology Centre - BC Research Inc., 3650 Wesbrook Mall, Vancouver, B.C., V6S 2L2) identified a 15 kDa seed storage protein believed to be a 2S albumin in white spruce protein bodies. This protein dissociated under reducing conditions to give a large subunit estimated to be 8 kDa. They were unable to visualise a small subunit, either because it co-migrated with the 8 kDa band or was lost due to its small size (Newton, unpublished results; Dr. C. Newton, Forest Biotechnology Centre - BC Research Inc., 3650 Wesbrook Mall, Vancouver, B.C., V6S 2L2). The protein work done previously by Flinn and Newton would support the prediction of a 4.1 kDa small subunit, as a 6.5 kDa subunit would be visible on an SDS-PAGE gel, though a 4.1 kDa band might be missed. The value of 6.5 kDa for the small subunit also does not agree with the total molecular weight of the mature protein when the size of the large subunit, predicted below, is taken into account.

5.4.4 Internal Processed Fragment

Two β-turn sites occur near the putative carboxy-terminus of the small subunit. The first includes the second conserved cysteine (CRDY67), and is six amino acids away from the second β-turn (QPSE77). The sequence between the two β-turns is rich in the most abundant conifer storage protein amino acids (LERRRE). This region may be removed during processing, but because of the presence of these highly favoured amino acids and other reasons discussed below, the spruce protein most likely lacks an internal processed fragment.
There are only 14 amino acids between the second and third cysteines of the spruce 2S albumin precursor protein. The size of this region in the dicot sequences ranges from 23 to 44 residues, which is 1.6 to 3 times as large. The dicot internal processed fragments (IPFs) characterised experimentally range in size from a five amino acid linker sequence in the Brazil nut 2S albumin (De Castro et al., 1987) to the 20 amino acid IPF in the B. napus napin protein (Crouch et al., 1983, Ericson et al., 1986). Asn is often the carboxy-terminal amino acid of the IPF, and there are no Asn residues at this position in any of the conifer sequences.

5.4.5 Large Subunit and C-Terminal Processed Fragment

Prolines are very commonly found as the N-terminal residue in both the small and large subunits of 2S albumins, and it is notable that aminopeptidases are halted at proline residues (Muren and Rask, 1995). Muren and Rask (1995) therefore theorise that these proline residues are conserved to ensure correct processing of the mature subunits. Proline-75 of the deduced PG2S amino acid sequence is conserved in all of the translated conifer 2S albumin sequences, at a position where it could serve as the initial amino acid of the processed large subunit. This Pro is also part of a β-turn structure as mentioned above. If Pro-75 is the initial amino acid of the large subunit and no carboxy terminal processing occurs, the proposed large subunit would be 99 amino acids in length with a molecular weight of 11.8 kDa. Removal of the two carboxy terminal residues (Phe and Lys) would result in a large subunit of 11.6 kDa. A small subunit of 4.1 kDa and a large subunit of 11.6 kDa would result in a mature protein of approximately 15.7 kDa, made up of 130 amino acids. These estimated processing sites for the spruce 2S albumin differ somewhat from the data obtained for the Pinus pinaster 2S albumin proteins by Allona et al. (1994). Allona et al.
(1994) described a 124 amino acid protein with a mass of 16 to 18 kDa, which dissociated under reducing conditions into a 5 to 6.5 kDa small subunit of 54 residues and a 7.5 to 9.5 kDa large subunit of 70 residues. The weight and number of amino acids of the spruce and pine mature 2S albumin proteins agree, but not the values for the subunits. This suggests that the amino acid sequence and/or the processing sites of 2S albumins differ significantly in these two conifer species, and that the Pinus pinaster 2S albumins may be processed differently from the related dicot proteins.

5.4.6 Amino Acid Content of the Predicted Mature Protein

Isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan and valine are required amino acids for higher animals (Habben and Larkins, 1995). Cereal grains are deficient in lysine, threonine and tryptophan, whereas legume seeds are limiting in methionine and cysteine (Higgins, 1984). Due to this, researchers have attempted to locate a seed source for a "complete" protein. An alternate strategy is to create a seed containing a complete complement of amino acids by genetic engineering, either by adding a gene for a protein which is high in the missing amino acid(s) or by modifying a seed storage protein gene to increase the protein's content of certain amino acids (reviewed by Krebbers et al., 1993). Typical of storage proteins, the spruce 2S albumin is high in the nitrogen-rich amino acids arginine (19.3%), glutamine (11.6%), and asparagine (2.3%). Arg is the predominant amino acid in conifer seed storage proteins characterised thus far (Allona et al., 1992 and 1994, Newton et al., 1992, Leal and Misra, 1993, King and Gifford, 1997). Similar to the dicot 2S albumins, this protein would have relatively high levels of the sulphur-containing amino acids cysteine (6.2%) and methionine (3.8%).
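Percent composition figures such as those above follow directly from residue counts in the predicted mature sequence. The sketch below is a minimal illustration: the function names are hypothetical, the list of required amino acids is the one given in the text, and the toy peptide is invented rather than taken from PG2S.

```python
from collections import Counter

# Required amino acids for higher animals, as listed in the text.
REQUIRED = set("ILKMFTWV")

def composition_percent(seq):
    """Percentage of each amino acid present in the sequence."""
    counts = Counter(seq.upper())
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}

def missing_required(seq):
    """Required amino acids that do not occur in the sequence at all."""
    return sorted(REQUIRED - set(seq.upper()))

# Hypothetical arginine-rich peptide
toy = "RRRRQQCM"
print(composition_percent(toy))
print(missing_required(toy))
```

Percentages computed this way are by residue number; values reported from amino acid analysis of purified protein may be expressed by weight and so can differ slightly.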
The mature spruce 2S albumin protein would lack the essential amino acids lysine and tryptophan, and is also low in histidine, valine and phenylalanine. Therefore this is not a "complete" protein, but may be of interest nutritionally because of its relatively high levels of the sulphur amino acids. Several 2S albumins are known to be serious food allergens (Youle and Huang, 1978b and 1979, Thorpe et al., 1988, Machado and Godinho Da Silva Jr., 1992, Monsalve et al., 1993, Gonzalez de la Pena et al., 1996, Nordlee et al., 1996, Lacorte et al., 1997), so caution must be taken to ensure that if edible seeds are engineered to express this related spruce 2S albumin, an allergic reaction will not be induced. Comparison with Youle and Huang's survey of 2S albumin proteins (1981) indicates that the spruce protein is not unusual in having low amounts of tryptophan, histidine, and phenylalanine. However, the low level of valine (1.5%) and the lack of lysine in the mature spruce protein differ from most of the proteins surveyed by Youle and Huang (1981). Generally, valine and lysine are present in higher concentrations (3 to 6%, and 3 to 8.7% respectively) among the dicot 2S albumins, the exception being Brazil nut 2S albumin which contains only 0.52% and 0.89% respectively. There is also good agreement with the concentrations of lysine (0.4%) and valine (2.2%) in the Pinus pinaster 2S albumin proteins surveyed by Allona et al. (1994).

5.5 Similarity to Calmodulin Antagonists

Polya et al. (1993) and Neumann et al. (1996, 1996a, 1996b) have identified small and large subunits from various members of the 2S albumin family which have activity as calmodulin inhibitors, many of which are also phosphorylated at specific serine residues.
Phosphorylation occurs via a Ca2+-dependent protein kinase (CDPK) at sites that have been elucidated as Basic X X Ser, Ser X Basic, Ser X X Basic, Basic X X Ser X X Basic, Basic X X Ser X Basic (Neumann et al., 1996), Ser Basic (Neumann et al., 1996a) and Basic X X X Ser X X Basic (Neumann et al., 1996b). Basic denotes a basic amino acid (His, Arg, or Lys) and X denotes any amino acid. There are four potential sites where Ser may be phosphorylated in the deduced amino acid sequence of PG2S based on consensus with the above recognition sites. In the small subunit, Ser 51 is part of a Basic Ser motif, and Ser 62 and 63 in RLSSCR are part of a Basic X X Ser X Basic or Basic X Ser X X Basic motif (only one Ser may be phosphorylated, Neumann et al., 1996c). Within the large subunit two potential sites for phosphorylation are RMS87PQCR and RHS163R. The carboxy-terminus of the spruce 2S albumin precursor protein (YFMTGSSFK173) resembles a phosphorylation site at the carboxy terminus of a plant histone (GTGASGSFK), which does not fit the general pattern but was identified experimentally (Polya et al., 1993). Identification of these sites is purely speculative, but interestingly they are well conserved among the conifer 2S albumins. Possible roles for the phosphorylation of Ser residues within 2S albumin proteins may involve acting as signals for precursor processing, protein folding, proteolysis during germination (Polya et al., 1993), or as a functional requirement for anti-fungal activity (Neumann et al., 1996c). Not all of the 2S albumins that they characterised are phosphorylated, despite the fact that all act as calmodulin antagonists. During germination, calmodulin levels increase within the seed and calmodulin inhibitors decline (Cocucci and Negrini, 1988).
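The recognition sites listed above translate naturally into sequence patterns, so candidate serines can be located by a simple scan. The sketch below encodes only the seven site patterns named in the first sentence above; the dictionary, function name and reporting format are assumptions made for illustration, and a pattern match is of course no evidence that a site is actually phosphorylated.

```python
import re

BASIC = "[HRK]"  # His, Arg or Lys
# pattern -> (regular expression, 0-based offset of the Ser within the match)
CDPK_SITES = {
    "Basic-X-X-Ser": (f"{BASIC}..S", 3),
    "Ser-X-Basic": (f"S.{BASIC}", 0),
    "Ser-X-X-Basic": (f"S..{BASIC}", 0),
    "Basic-X-X-Ser-X-X-Basic": (f"{BASIC}..S..{BASIC}", 3),
    "Basic-X-X-Ser-X-Basic": (f"{BASIC}..S.{BASIC}", 3),
    "Ser-Basic": (f"S{BASIC}", 0),
    "Basic-X-X-X-Ser-X-X-Basic": (f"{BASIC}...S..{BASIC}", 4),
}

def candidate_phosphoserines(seq):
    """Map 1-based Ser positions to the names of the patterns they satisfy."""
    hits = {}
    for name, (pattern, offset) in CDPK_SITES.items():
        # Lookahead so that overlapping matches are not skipped.
        for m in re.finditer(f"(?=({pattern}))", seq):
            hits.setdefault(m.start() + offset + 1, []).append(name)
    return hits

# The RLSSCR block quoted in the text, numbered here from 1
for position, names in sorted(candidate_phosphoserines("RLSSCR").items()):
    print(position, names)
```

Within this six-residue fragment the two serines are at positions 3 and 4, corresponding to Ser 62 and Ser 63 in the numbering used in the text.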
Inhibition of calmodulin may be related to the presence of 2-sided alpha helices within the secondary structure of the protein (Neumann et al., 1996, Neumann et al., 1996a and 1996b). Prediction of secondary structure for the deduced amino acid sequence of PG2S (Appendix E, page 200) includes four helical regions, though none have amino acid homology with the regions identified by Neumann et al. (1996, 1996a and 1996b).

5.6 PG2S Promoter Region

5.6.1 2S Albumin Developmental Pattern of Expression is Conserved

The promoter sequence from the white spruce 2S albumin gene is among the first examples of a gymnosperm promoter being expressed in an angiosperm and the first and only example of a conifer gene being correctly developmentally regulated in an angiosperm. β-glucuronidase reporter gene activity was measured in T2 tobacco embryos, the offspring of selfed T1 plants, using the fluorescence MUG assay. Southern blots of T1 tobacco plants indicated that promoter construct copy number ranged between two and five copies per genome. Since few non-GUS expressing embryos were noticed in the T2 embryos, it can be assumed that either: 1) germination on kanamycin of the T1 seed biased the T1 germination toward homozygous transformed plants or 2) due to the presence of multiple unlinked copies of the construct few non-expressing embryos existed. As I was interested in the precise timing of β-glucuronidase expression in the developing tobacco embryo, the fluorescence MUG assay, which is more sensitive to low levels of reporter gene activity, was appropriate in this case. Further, if there were a small number of untransformed seed in a sample, this would have been consistent across the various developmental stages and therefore would not have affected the analysis of the results on a relative scale. The spruce 2S albumin promoter directs seed-specific expression of the uidA reporter gene in tobacco, an angiosperm.
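The argument above that multiple unlinked copies would leave few non-expressing T2 embryos can be made concrete with a short calculation. This is a sketch under simplifying assumptions that are not established in the text: each copy sits at its own independently assorting locus, the T1 parent is hemizygous at every locus, and any single copy is enough for expression. Copies integrated in tandem at one locus would behave as a single locus instead.

```python
from fractions import Fraction

def null_fraction(n_unlinked_loci):
    """Expected fraction of selfed progeny inheriting no transgene copy,
    assuming a parent hemizygous at each independently assorting locus."""
    return Fraction(1, 4) ** n_unlinked_loci

# Copy numbers of two to five were reported from the Southern blots
for n in range(1, 6):
    print(n, null_fraction(n), f"{float(null_fraction(n)):.4f}")
```

Under these assumptions a quarter of the embryos would be null for a single locus, but only one in sixteen for two loci and fewer than one in a thousand for five.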
This seed-specific expression suggests that the functional elements of seed storage protein promoters have been retained since the divergence of the gymnosperm and the angiosperm lineages some 285 (Savard et al., 1994) to 360 million years ago (Troitsky et al., 1991). The developmental pattern of expression of the spruce 2S albumin promoter in tobacco was similar to that of the native gene based on northern blots of spruce somatic embryo developmental stages. Based on the similarity of the 5' coding region of the gene PG2S with those of the highly expressed cDNAs and its dissimilarity to the three genomic sequences known to be low expressing (C. Newton, unpublished; Dr. C. Newton, Forest Biotechnology Centre - BC Research Inc., 3650 Wesbrook Mall, Vancouver, B.C., V6S 2L2), it is presumed to make up a significant part of the hybridizing mRNA seen on the northern blot. There was no expression in the earliest stages of embryo development, which was the proembryo in spruce and the globular embryo in tobacco. Expression began in stage 2 spruce and tobacco heart stage embryos, then increased to maximum levels in the early cotyledonary stages of both. Messenger RNA and β-glucuronidase levels declined in the mature, partially dried embryo. Embryo morphology differs considerably between spruce and tobacco; most notably, embryogenesis in the Pinaceae lacks a heart-shaped stage and mature embryos have multiple cotyledons surrounding the apical dome. Parallels may be drawn between the gymnosperm and angiosperm developmental stages, however, based on appearance of embryonic tissues such as the protoderm and the provascular traces. Based on the appearance of the protoderm, the tobacco heart stage and the Pinaceae stage 2 embryo would be equivalent in development. The appearance of provascular traces occurs in the tobacco long heart stage and the spruce early cotyledonary stage (stage 3).
In both types of embryos, late cotyledonary and mature stages are marked by the concurrent increase in storage products and embryo size. The expression pattern of the spruce 2S albumin promoter in the developing tobacco embryo supports these morphological comparisons. Another, more fundamental difference between gymnosperm and angiosperm seeds is the origin of the nutritive tissue surrounding the embryo. In gymnosperms the nutritive tissue is the megagametophyte, the haploid female gametophytic tissue containing storage proteins and lipids. Angiosperm embryos are surrounded by the endosperm, which is the triploid product of the fusion of one pollen nucleus with the two polar nuclei of the embryo sac. The spruce PG2S promoter did not direct expression to the tobacco endosperm, perhaps because this tissue does not exist in the gymnosperm seed, or the promoter may in fact be embryo-specific, as are members of the Arabidopsis 2S albumin gene family (DeClerq et al., 1990, and Guerche et al., 1990). Another dicot 2S albumin promoter, from the napA gene of B. napus, directed expression to both the developing tobacco embryo from heart stage to maturity and to the endosperm during heart stage, which declined as the embryo matured (Stalberg et al., 1993). The spruce 2S albumin protein and mRNA are known to accumulate in the megagametophyte (Flinn, unpublished results; Dr. B. Flinn, Genesis Research and Development Corp. Ltd., P.O. Box 50, Auckland, N.Z.), but at this time it is unknown whether this particular gene, PG2S, is expressed in that tissue. RT-PCR (reverse transcriptase PCR) of megagametophyte mRNA using PG2S gene-specific primers could answer this question about the tissue-specificity of PG2S gene expression in the future. Obtaining white spruce seeds at a stage where the megagametophyte is developing (as opposed to the mature, dormant seed) can only be done once a year during the early summer (Owens and Molder, 1984).
This seasonal constraint was a large part of the rationale for using somatic embryos in the first place.

5.6.2 Conserved Sequence Motifs

Putative regulatory motifs within the spruce promoter, identified in Figure 9 (page 83), are represented diagrammatically in Figure 27. In Figure 27 the first 400 bp of the spruce promoter are aligned at the TATA box with proximal promoters from dicot 2S albumin genes known to direct seed-specific expression in transgenic tobacco (Arabidopsis thaliana, De Clercq et al., 1990 and Conceicao et al., 1994; Brassica napus, Stalberg et al., 1993; Bertholletia excelsa, Grossi de Sa et al., 1994). Most motifs do not appear to be conserved in position amongst the five sequences, though, as would perhaps be expected, the degree of similarity is highest amongst the most closely related species. The clustering of elements into roughly three groups in the promoter, and the spacing between certain elements within a cluster, may be indicative of a seed-specific arrangement of conserved motifs. Often these motifs are directly adjacent to each other, or overlap. Though these elements are small, and flanking sequence is not strictly conserved between the sequences, their presence and pattern suggest functionality rather than random occurrence.

[Figure 27: diagram not reproduced. The legend assigns symbols to the motifs CCAC or CACC, CACGTG, TGCATGCA, ACGT, TGCA, CATGTG, TGCATG, CATG, CANNTG, CANNTGCA and CATGCA.]

Figure 27: Conserved Motifs of the Proximal Promoter. The proximal promoter region of the white spruce 2S albumin promoter (PG2S, GenBank U92077) aligned at the TATA box with the dicot 2S albumin promoters from the Brassica napus napA gene (BNANAPA, GenBank J02798), the Arabidopsis thaliana genes 1 and 2 (ATHAT2S1 and ATHAT2S2, GenBank M22032, M22034), and the Bertholletia excelsa (Brazil nut) 2S albumin gene (BE2S1, GenBank X54490, and Grossi de Sa, 1994). Putative initiation of transcription is indicated by the arrow. The diagram begins at the right with the initiation codon and ends at the nucleotide noted. Overlapped symbols indicate that sequence motifs are overlapping, and a symbol drawn under another indicates that the lower sequence is present within but does not extend beyond the upper sequence.

In the proximal region of the PG2S 2S albumin promoter there are two G box core elements 135 and 26 bp upstream from a CACC motif, referred to as a coupling element by Shen and Ho (1995). Similar arrangements of G boxes and CACC elements have been observed in many ABA-inducible promoters (Shen and Ho, 1995). Nuclear proteins from Brazil nut seeds have been shown to bind on or adjacent to CACC and CCAC, ACGT, TGCA and CCACGTG (an E box motif) in the Brazil nut 2S albumin promoter (Grossi de Sa, 1994). Recent work by Vincentz et al. (1997) showed that Opaque-2 (O-2), a basic-leucine-zipper (bZIP) transcriptional activator isolated from a monocot, Coix lacryma-jobi, binds to three sites within the Brazil nut 2S albumin promoter BE2S1 and activates transcription. The binding sites identified were TCGACGTGGA, GCCACCTCAT and TCCACGTACT, which contain the conserved motifs and were shown by deletion studies in transgenic tobacco to be necessary for seed-specific expression. Gustavson et al. (1991) showed binding of B. napus nuclear proteins to synthetic oligonucleotide probes containing CATGCA, CCAC and CACC motifs. Nuclear proteins have also been observed to bind to two regions containing TGCA motifs in a pea legumin promoter (Howley and Gatehouse, 1997). Ericson et al. (1991), also in B. napus, showed binding to a TACACAT motif (found repeated in napin promoters); a similar sequence (GACACAT) is located at -63 in the PG2S sequence. Stalberg et al. (1996) demonstrated that the loss of an E box motif from the napA proximal promoter abolished expression of a reporter gene within B. napus seed, while an internal deletion of a region containing a CACC motif significantly decreased expression. Using the same promoter in a deletion study in tobacco, Ericson et al. (1996) found that internal deletions abolishing the same E box / CACC containing region led to decreased expression in the embryo relative to the endosperm. Kawagoe et al. (1994) demonstrated the synergistic interaction of three E boxes within the promoter of the beta-phaseolin gene, which encodes a legumin type seed storage protein. A region within the Arabidopsis 2S albumin promoter containing two ACGT motifs and a CATGCA motif has been identified as necessary to direct expression to the cotyledon of the embryo (Conceicao et al., 1994).

5.6.3 Cereal-like Promoter Motifs

Sequence elements with homology to cereal promoter motifs are also found in the PG2S promoter (Table 5). The ATGCAAAT octamer sequence has also been noted in the B. napus 2S albumin promoter (Ericson et al., 1991).
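The motif comparisons described in Sections 5.6.2 and 5.6.3 amount to scanning each promoter for short, partly degenerate sequences and noting where they fall. The following is a minimal Python sketch of such a scan, offered as an illustration only and not as the procedure used in this thesis. The motif list is a subset of those named in the text and in the legend of Figure 27; the function names (`motif_regex`, `scan`), the position convention and the test sequence are assumptions introduced here, and the test sequence is invented rather than taken from PG2S or any GenBank record.

```python
import re

# IUPAC nucleotide codes needed for the degenerate motifs
# (N = any base, R = purine, Y = pyrimidine).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "R": "[AG]", "Y": "[CT]"}


def motif_regex(motif):
    """Compile a motif into a regex; the lookahead lets hits overlap."""
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile("(?=(" + body + "))")


def scan(sequence, motifs):
    """Return {name: [(position, matched text), ...]} for the given strand.

    Positions are negative offsets from the 3' end of `sequence`, so the
    last base supplied is -1. This is a hypothetical convention: it only
    approximates promoter numbering if the sequence ends immediately
    upstream of the reference point (e.g. the transcription start).
    """
    seq = sequence.upper()
    hits = {}
    for name, motif in motifs.items():
        found = [(m.start() - len(seq), m.group(1))
                 for m in motif_regex(motif).finditer(seq)]
        if found:
            hits[name] = found
    return hits


# Subset of the motifs discussed in the text and in Figure 27.
MOTIFS = {
    "E box (CANNTG)": "CANNTG",
    "CACGTG": "CACGTG",
    "ACGT core": "ACGT",
    "CATGCA": "CATGCA",
    "CACC": "CACC",
}

# Invented demonstration sequence (NOT a real promoter).
demo = "TTCACGTGAACATGCATGCCACCTT"

if __name__ == "__main__":
    for name, found in scan(demo, MOTIFS).items():
        print(name, found)
```

Only the strand supplied is searched. Several of the motifs (CACGTG, ACGT, TGCA, CATG) are palindromic and so read the same on both strands, but a non-palindromic motif such as CACC would also need to be searched as its reverse complement (GGTG) to detect copies on the opposite strand.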
Within cereal promoters, elements of this type have been shown to bind bZIP transcription factors and to be necessary for endosperm-specific expression (Albani et al., 1997). Despite the presence of similar sequences within the spruce promoter, no GUS expression was observed in the endosperm of tobacco transformed with pBIN2S or pBIN700. Another possible site of transcription factor binding is GATGACATac, located at -650, which has similarity (in upper case) to the maize Opaque-2 binding site (GATGAPyPuTGPu) (Lohmer et al., 1991). The B. napus 2S albumin promoter contains repeats of similar elements (Stalberg et al., 1993), though the function of these elements was not demonstrated. The potential hairpin loop present in the spruce promoter (atgagcgagctagcaaaagctcat) is similar to a hairpin loop identified in a maize zein storage protein promoter (atgcatattgggtgatgcat), though its functional significance in either promoter is also unknown.

5.6.4 ABA Response Elements

Synthesis of seed storage proteins may be regulated by the hormone abscisic acid (ABA) at certain stages of embryo development (reviewed by Kermode, 1995). A type of E box element which is ABA-responsive (CACGTG) has been characterised in many plant promoter sequences (reviewed in Quatrano et al., 1993). This ABA response element (ABRE) consensus sequence is found in most dicot 2S albumin promoters, but is not present in the spruce PG2S promoter. Hull et al. (1996) found that ABA responsiveness of an Arabidopsis EM promoter occurs without absolute conservation of the ABRE; an E box and four G box core sequences appear to be sufficient for responsiveness in this case. Previous work by Flinn et al. (1993) indicated that the levels of 2S albumin mRNA and protein are up-regulated in spruce somatic embryos by the addition of mannitol, an osmoticum, or ABA to the culture media.
Conversely, Dong and Dunstan (1996) did not see any up-regulation of the white spruce 2S albumin cDNA (pEMB25) by ABA or the osmoticum PEG in somatic embryos, whereas other cDNAs isolated from cotyledonary stage embryos were responsive. The cDNA clone they isolated has the same developmental expression pattern as the genomic clone PG2S and is 69.7% identical to it at the amino acid level.

Jiang et al. (1995) and Jiang et al. (1996) showed that both vicilin and napin promoter:GUS fusions in transgenic tobacco were down-regulated by desiccation of the seed and up-regulated by the application of exogenous ABA. They also observed that both the vicilin and napin promoters became unresponsive to ABA if the seed was prematurely dried. Preliminary experiments based on the method of Jiang et al. (1995), involving application of exogenous ABA to excised tobacco seed (T2) transformed with pBIN2S and pBIN700, did not result in increased expression of the reporter gene (data not shown). I did not feel that this work conclusively showed a lack of ABA responsiveness on the part of the spruce promoter, as these experiments lacked a positive control such as tobacco seed transformed with a known ABA-responsive gene. For example, tobacco seed transformed with the construct pBM113kp, containing the EM promoter from wheat (Marcotte et al., 1988), would have been an appropriate positive control.

A transient system for testing the up-regulation of expression of the PG2S promoter:reporter gene by ABA was considered. Proembryo ESM cannot be used as the target tissue for bombardment in this experiment, as one would not be able to distinguish up-regulation caused directly by ABA from up-regulation caused by the concurrent development of the proembryos on the ABA-containing medium.
A non-embryogenic spruce tissue would be a better choice, but efforts to initiate a callus culture from white spruce seedling cotyledons for this experiment were unsuccessful.

5.6.5 Putative enhancer elements not recognised in tobacco

There were no significant differences observed in the pattern or strength of GUS expression between tobacco plants transformed with the 2.3 kb full length promoter (pBIN2S) and those transformed with the promoter deleted to -653 (pBIN700). Transient expression of these constructs in spruce stage 3 and mature embryos resulted in higher levels of expression for the full length promoter, suggesting either that elements which enhance expression in spruce are not recognised in tobacco or that these elements are only important to transient expression. Most seed-specific promoters studied to date fit the bipartite model (Thomas, 1993), in which enhancer elements are located distally (from -2 kb to -500 bp), while tissue-specific elements are located in the proximal promoter (-500 bp to +1). Conservation of motifs within the first 400 bp of the promoter allows the spruce promoter to function in a tissue-specific and developmentally specific manner within the angiosperm tobacco.

5.7 Transient Expression

This is the first study to report the use of transient expression to characterise a seed-specific promoter in a developmental series of embryos. Aragao et al. (1992) used microprojectile bombardment to test the expression of a Brazil nut 2S albumin construct in bean embryos, and later as a method of stable transformation (Aragao et al., 1996), but did not look at developmental regulation. Difficulty in acquiring sufficient numbers of accurately staged zygotic or somatic embryos may have prevented others from attempting such large experiments.

Interpretation of the results of transient promoter expression requires the use of at least two control vectors. One of these should be expressed at relatively high levels in all tissues under study (its promoter is sometimes referred to as a constitutive promoter).
Expression of the constitutive or positive control confirms that the microprojectile bombardment system is delivering particles to the tissues under study, and that these tissues are physiologically able to express the reporter gene. The second control in these experiments was the "minimal" promoter, p2SMTN, which consisted of a small amount of promoter sequence (the "minimal" amount necessary to drive a low level of expression) and therefore gives a measure of "background" expression levels. This gives confidence that the level of expression observed is in fact due to the promoter sequence and not to physical or physiological artifacts. The third control employed was the "negative" control p2S+1, which contained a single base pair insertion compared to p2SGUS, causing the GUS coding region to be frame-shifted and untranslated (or expected to be untranslated). p2S+1 was chosen over a promoter-less GUS construct because promoter-less constructs are found to be expressed at low levels, apparently through chance integration within the genome adjacent to a functional promoter sequence, similar to promoter trapping. In hindsight, the promoter-less GUS construct would also have been a good control for the levels of expression seen with the minimal promoter, in addition to p2S+1.

In the transient expression experiments, β-glucuronidase activity was quantified by counting the blue loci produced by histochemical staining with the GUS substrate X-gluc. Counts of GUS-expressing loci measured histochemically correlate well, at a quantitative level, with fluorescence assays of β-glucuronidase activity, and the amount of variation within a treatment is less with X-gluc (Ellis et al., 1991).
Therefore, for the purpose of this research, which was to determine the relative expression levels between the different embryo developmental stages, the histochemical assay is as good as the fluorescent assay.

Transient expression of the spruce 2S albumin promoter constructs p2SGUS, p2S700 and p2SMTN generally mirrored the expression of the endogenous 2S albumin gene family. The low levels of expression observed in proembryos and stage 2 embryos are considered background levels, as there was no statistical difference between the average number of GUS-expressing loci for the minimal promoter, p2SMTN, and the 2.3 kb promoter, p2SGUS. The significantly higher level of expression measured for the wheat EM promoter in pBM113kp confirms that the low levels of expression observed for the spruce constructs reflect their 5' flanking sequence and not an inability of the bombarded tissues to express the reporter gene at higher levels. The significant differences in expression between the three spruce 2S albumin promoter constructs observed in the stage 3 and mature embryos indicate that elements which enhance expression are located in the distal region of the promoter, between the unique XbaI site at -653 and the 5' end of p2SGUS, as well as between -653 and -117 in the proximal promoter region.

Though deletion of the spruce 2S albumin promoter to position -117 decreased expression six- to seven-fold compared to the full length promoter in stage 3 and mature embryos, it did not remove the promoter's seed specificity. Deletion of sequence which determines seed specificity might be expected to result in an increase in expression, relative to the full length promoter, in non-seed tissues such as the three-week-old somatic germinants. No such increase was observed.
In addition, the germinants which had GUS-expressing loci after microprojectile bombardment with 2S albumin constructs tended to look deformed and less well developed than germinants which showed no expression. The "positive" control vector, pBM113kp, was expressed at a significantly higher level in the spruce germinants.

The relatively high levels of transient expression observed in partially dried mature somatic embryos bombarded with the spruce 2S albumin promoter:GUS fusion constructs were contrary to the low level of expression of the endogenous gene at the same developmental stage. In this case the promoter does not appear to be behaving in a developmentally regulated manner, and there are several possible explanations for this. High levels of transient expression in partially dried mature embryos have been suggested to result from increased survival of microprojectile-bombarded cells, which is related to their osmotically stressed condition (Klein et al., 1988). Cells not under full turgor pressure would suffer less leakage of cell contents when bombarded, and the higher cell survival rate would translate into a higher rate of reporter gene expression. In addition, it is known that during the initial phase of seed germination DNA is repaired within the cell, and proteins are synthesised from extant mRNA (Bewley, 1997). As the partially dried embryo is switching into germination mode, cellular machinery may not be able to distinguish the high levels of introduced DNA vectors from genomic DNA which is damaged or exposed for transcription. The cell may be preferentially transcribing and translating the introduced DNA or, owing to the artificially high copy number of the vector sequences, a proportion of them may be "caught up" and expressed along with proteins necessary for germination.
A fourth explanation may be that a "negative" or "silencer" element is present in a part of the PG2S gene not present in the promoter:reporter gene constructs. Within the last two years, regulatory elements affecting gene expression have been discovered within coding regions, introns and 3' flanking regions (reviewed by Taylor, 1997).

Similar high levels of transient expression were observed in germinating white spruce pollen grains. Pollen is also known to translate stored mRNA as it germinates, and this may be related to the lack of developmental control of the introduced DNA. The lack of significant difference in expression levels between the three spruce 2S albumin promoter constructs in germinating spruce pollen also indicates that the constructs are not being developmentally regulated.

Interestingly, in both the partially dried mature embryos and the spruce pollen, the negative control p2S+1 was expressed at low but observable levels. This vector consists of 2.3 kb of the spruce 2S albumin promoter translationally fused to the uidA coding region so that the coding region is frame-shifted by one nucleotide. Both of these developmental stages appear to have relaxed standards for initiation of translation; the ribosome appears to be bypassing the upstream 2S albumin initiation codon and initiating translation at the uidA initiation codon in significant numbers. As in the partially dried mature embryos, a negative element which down-regulates expression of the native PG2S 2S albumin gene in pollen may be located outside of the 5' flanking region and therefore not be present in any of the promoter:reporter gene constructs.

5.8 Conclusions

This research expands what is known about the 2S albumin super family of seed storage protein genes through a detailed examination of the conifer homologue, PG2S.
The divergence of the amino acid sequence within the framework of the highly conserved cysteine residues is similar to that seen among the related angiosperm proteins. This suggests that the 2S albumin protein is flexible in sequence, within constraints imposed by the conserved pattern of disulphide bond formation and by the overall amino acid content necessary for supplying nutrients to the germinating embryo.

The white spruce PG2S gene encodes a gymnosperm 2S albumin seed storage protein, which is expressed during embryo development in spruce somatic embryos. Expression declines when the mature embryo enters a period of water loss and is absent in the germinant, as with all seed storage proteins. The genomic sequence contains an intron, unlike the related monocot genes and all but two of the related dicot genes characterised. An intron is present in other spruce 2S albumin genes, but we cannot speculate as to whether the gene ancestral to the gymnosperm and angiosperm sequences possessed an intron. Characterisation of genomic sequences which encode the homologous proteins identified in fern spores by Templeman (1988) and Rodin and Rask (1990), and in other primitive plants, could answer this question.

The deduced amino acid sequence of the protein encoded by PG2S contains the eight conserved cysteine residues which define the 2S albumin super family, plus a few other conserved amino acids which may be required for correct protein folding. In agreement with other conifer seed storage proteins characterised, the deduced sequence has a high percentage of arginine (19.3%) and glutamine (11.6%) and is also relatively high in the sulphur-containing amino acids (cysteine 6.2%, methionine 3.8%). Comparison with related dicot proteins reveals some similarity with the calmodulin inhibitors studied by Polya et al. (1993) and Neumann et al.
(1996, 1996a and 1996b), though such a function has not been attributed to the spruce 2S albumin protein or its subunits.

This work represents the first example of the proper and predicted stable expression of a reporter gene under the control of a gymnosperm promoter in an angiosperm (tobacco). It is significant that within the 5' flanking region of the spruce 2S albumin gene, PG2S, there were no large regions of sequence homology with any of the angiosperm 2S albumin promoter regions (Figure 9, page 83); instead, there are small conserved motifs found loosely clustered within the proximal promoters of these seed-specific genes (Figure 27). Despite what would appear to be tenuous similarity between the spruce and dicot 2S albumin proximal promoter sequences, these promoters all drive expression of the reporter gene uidA in a seed-specific manner in tobacco. This indicates that tobacco transcription factors must be conserved, yet flexible enough to interact, in a tissue-specific and developmentally specific manner, with the conserved cis elements found in the spruce PG2S promoter, the Brazil nut BE2S1 promoter, the B. napus napA promoter and the Arabidopsis AT2S1 and AT2S2 promoters. The clustering of these elements and their presence in multiple copies may be an indication that they interact with each other, either physically or through DNA binding proteins.

The transient expression experiments in this thesis indicate that it is possible to observe the developmental regulation of a promoter:reporter gene construct in a transient system. Interpretation of the results requires comparison with both positive and negative controls to ascertain that the levels of expression observed are, in fact, due to promoter sequence and are not artifactual.
In addition, it must be borne in mind that creating chimeric promoter:reporter gene fusions can lead to the loss of regulatory elements located elsewhere in the gene, resulting in patterns of expression unlike that of the native gene. This may be what is occurring in the partially dried mature embryos and germinating white spruce pollen grains.

Dissection of promoter function in order to assign a specific function to a particular sequence or cis element is a difficult task. The research on dicot 2S albumin promoters supports the idea of an additive effect of several small motifs acting in concert to control seed specificity. This model would explain why the loss of a particular sequence is not critical for expression in situations where other motifs are adjacent and, conversely, why the presence of certain motifs may not be sufficient for expression in situations where a threshold number of cis elements have been deleted or where spacing between elements is critical for transcription factor binding.

The gymnosperm 2S albumin gene, PG2S, represents an important part of the overall 2S albumin gene family story, not for what is similar between the conifer and the angiosperm genes but for what is different. Compared to 2S albumin proteins from the Dicotyledonae, the white spruce 2S albumin proteins have one region reduced in size (between the conserved domains A and B) and another enlarged (the variable region, between domains B and C). The white spruce 2S albumin genes contain a single intron, unlike most of the related dicot genes and all of the related monocot genes sequenced to date. This may be part of a trend towards an increased number of introns in conifer genes compared to related angiosperm genes (Hager et al., 1996; McHenry et al., 1992; Sundas, 1993).
The most significant result of this research was that the white spruce 2S albumin promoter sequence from the gene PG2S drives expression of the reporter gene GUS in a developmentally regulated, seed-specific pattern typical of 2S albumins, despite having only small putative cis-element motifs in common with homologous dicot 2S albumin promoters. This result suggests that the cis-elements and their coordinate trans-acting factors are conserved between the gymnosperm and angiosperm 2S albumin genes, whereas the order of the cis-elements does not appear to be conserved, although spacing between elements may be important.

References Cited

Adachi T., Izumi H., Yamada T., Tanaka K., Takeuchi S., Nakamura R. and Matsuda T. 1993. Gene structure and expression of rice seed allergenic proteins belonging to the alpha amylase / trypsin inhibitor family. Plant Mol. Biol. 21(2): 239-248.

Albani D., Hammond-Kosack M.C.U., Smith C., Conlan S., Colot V., Holdsworth M. and Bevan M.W. 1997. The wheat transcriptional activator SPA: A seed-specific bZIP protein that recognizes the GCN4-like motif in the bifactorial endosperm box of prolamin genes. Plant Cell 9(2): 171-184.

Aleith F. and Richter G. 1990. Gene expression during induction of somatic embryogenesis in carrot cell suspensions. Planta 183: 17-24.

Allen R.D., Cohen E.A., Vonder Haar R.A., Adams C.A., Ma D.P., Nessler C.L. and Thomas T.L. 1987. Sequence and expression of a gene encoding an albumin storage protein in sunflower. Mol. Gen. Genet. (2): 211-218.

Allona I., Casado R. and Aragoncillo C. 1992. Seed storage proteins from Pinus pinaster Ait.: homology of major components with 11S proteins from angiosperms. Plant Sci. 87: 9-18.

Allona I., Collada C., Casado R. and Aragoncillo C. 1994. 2S arginine-rich proteins from Pinus pinaster seeds. Tree Physiol. 14: 211-218.

Altenbach S.B., Kuo C.-C., Staraci L.C., Pearson K.W., Wainwright C., Georgescu A.
and Townsend J. 1992. Accumulation of a Brazil nut albumin in seeds of transgenic canola results in enhanced levels of seed protein methionine. Plant Mol. Biol. 18: 235-245.

Altenbach S.B., Pearson K.W., Meecker G., Staraci L.C. and Sun S.S.M. 1989. Enhancement of the methionine content of seed proteins by the expression of a chimeric gene encoding a methionine-rich protein in transgenic plants. Plant Mol. Biol. 13: 513-522.

Altenbach S.B., Pearson K.W., Leung F.W. and Sun S.S.M. 1987. Cloning and sequence analysis of cDNA encoding a Brazil nut protein exceptionally rich in methionine. Plant Mol. Biol. 8: 239-250.

Altenbach S.B., Pearson K.W., Leung F.W. and Sun S.S.M. 1986. The stepwise processing of a sulfur-rich seed protein from Brazil nut (Bertholletia excelsa). In: Molecular Biology of Seed Storage Proteins and Lectins. Proceedings of the 9th Annual Symposium in Plant Physiology. Eds. Shannon L.M. and Chrispeels M.J. American Society of Plant Physiologists.

Altschul S.F., Gish W., Miller W., Myers E.W. and Lipman D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215: 403-410.

Ampe C., Van Damme J., De Castro L., Sampaio M.J., Van Montagu M. and Vandekerckhove J. 1986. The amino-acid sequence of the 2S sulfur-rich proteins from seeds of Brazil nut (Bertholletia excelsa H.B.K.). Eur. J. Biochem. 159: 597-604.

Anisimova I.N., Fido R.J., Tatham A.S. and Shewry P.R. 1995. Genotypic variation and polymorphism of 2S albumins of sunflower. Euphytica 83(1): 15-23.

Aragao F.J.L., Barros L.M.G., Brasileiro A.C.M., Ribeiro S.G., Smith F.D., Sanford J.C., Faria J.C. and Rech E.L. 1996. Inheritance of foreign genes in transgenic bean (Phaseolus vulgaris L.) co-transformed via particle bombardment. Theor. Appl. Genet. 93(1-2): 142-150.

Aragao F.J.L., Grossi de Sa M.F., Almeida E.R., Gander E.S. and Rech E.L. 1992.
Particle bombardment-mediated transient expression of a Brazil nut methionine-rich albumin in bean Phaseolus vulgaris L. Plant Mol. Biol. 20(2): 357-359.

Barber D., Sanchez-Monge R., Gomez L., Carpizo J., Armentia A., Lopez-Otin C., Juan F. and Salcedo G. 1989. A barley flour inhibitor of insect α-amylase is a major allergen associated with baker's asthma disease. FEBS Lett. 248: 119-122.

Bar-Peled M., Bassham D.C. and Raikhel N.V. 1996. Transport of proteins in eukaryotic cells: more questions ahead. Plant Mol. Biol. 32: 223-249.

Bartolome B., Mendez J.D., Armentia A., Vallverdu A. and Palacios R. 1997. Allergens from Brazil nut: immunochemical characterization. Aller. Immunopath. 25(3): 135-144.

Baszcynski C.L. and Fallis L. 1990. Isolation and nucleotide sequence of a genomic clone encoding a new Brassica napus napin gene. Plant Mol. Biol. 14: 633-635.

Beardmore T., Wetzel S., Burgess D. and Charest P.J. 1996. Characterization of seed storage proteins in Populus and their homology with Populus vegetative storage proteins. Tree Physiol. 16(10): 833-840.

Bevan M. 1984. Binary Agrobacterium vectors for plant transformation. Nucl. Acid. Res. 12: 8711-8721.

Bewley J.D. and Black M. 1994. Seeds. Physiology of development and germination, 2nd Edition. Plenum Press, New York, 445 pp.

Birk Y. 1985. The Bowman-Birk inhibitor. Trypsin- and chymotrypsin-inhibitor from soybeans. Int. J. Pept. Prot. Res. 25(2): 113-131.

Burks A.W., Brooks J.R. and Sampson H.A. 1988. Allergenicity of major component proteins of soybean determined by enzyme-linked immunosorbent assay (ELISA) and immunoblotting in children with atopic dermatitis and positive soy challenges. J. Allergy Clin. Immunol. 81: 1135-1142.

Blundy K.S., Blundy M.A. and Crouch M.L. 1991. Differential expression of members of the napin storage protein gene family during embryogenesis in Brassica napus. Plant Mol. Biol. 17(5): 1099-1104.
Bolle C., Herrmann R.G. and Oelmuller R. 1996. Intron sequences are involved in the plastid- and light-dependent expression of the spinach PsaD gene. Plant J. 10(5): 919-924.

Boutilier K.A., Gines M.J., Demoor J.M., Huang B., Baszczynski C.L., Iyer V.N. and Miki B.L. 1994. Expression of the BnmNAP subfamily of napin genes coincides with the induction of Brassica microspore embryogenesis. Plant Mol. Biol. 26(6): 1711-1723.

Bradford M.M. 1976. A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding. Anal. Biochem. 72: 248-254.

Broun P. and Somerville C. 1997. Accumulation of ricinoleic, lesquerolic, and densipolic acids in seeds of transgenic Arabidopsis plants that express a fatty acyl hydroxylase cDNA from castor bean. Plant Physiol. 113(3): 933-942.

Bureau T.E., Ronald P.C. and Wessler S.R. 1996. A computer-based systematic survey reveals the predominance of small inverted-repeat elements in wild-type rice genes. Proc. Natl. Acad. Sci. USA 93: 8524-8529.

Cocucci M. and Negrini N. 1988. Changes in the levels of calmodulin and of a calmodulin inhibitor in the early phases of radish (Raphanus sativus L.) seed germination. Plant Physiol. 88: 910-914.

Conceicao A. da S. and Krebbers E. 1994a. A cotyledon regulatory region is responsible for the different spatial expression patterns of Arabidopsis 2S albumin genes. Plant J. 5(4): 493-505.

Conceicao A. da S., Van Vliet A. and Krebbers E. 1994b. Unexpectedly higher expression levels of a chimeric 2S albumin seed protein transgene from a tandem array construct. Plant Mol. Biol. 26: 1001-1005.

Connelly S., Marshallsay C., Leader D., Brown J.W. and Filipowicz W. 1994.
Small nuclear RNA genes transcribed by either RNA polymerase II or RNA polymerase III in monocot plants share three promoter elements and use a strategy to regulate gene expression different from that used by their dicot plant counterparts. Mol. Cell Biol. 14(9): 5910-5919.

Coulter K.M. and Bewley J.D. 1990. Characterization of a small sulphur-rich storage albumin in seeds of alfalfa (Medicago sativa L.). J. Exp. Bot. 41(233): 1541-1547.

Crouch M.L., Tenbarge K.M., Simon A.E. and Ferl R. 1983. cDNA clones for Brassica napus seed proteins: evidence from nucleotide sequence analysis that both subunits of napin are cleaved from a precursor polypeptide. J. Mol. Appl. Gen. 2: 273-283.

Cyr D.R., Webster F.B. and Roberts D.R. 1991. Biochemical events during germination and early growth of somatic embryos and seed of interior spruce (Picea glauca engelmanni complex). Seed Sci. Res. 1: 91-97.

D'Hondt K., Bosch D., Van Damme J., Goethals M., Vandekerckhove J. and Krebbers E. 1993a. An aspartic proteinase present in seeds cleaves Arabidopsis 2S albumin precursors in vitro. J. Biol. Chem. 268: 20884-20891.

D'Hondt K., Van Damme J., Van den Bossche C., Leejeerajumnean S., De Rycke R., Derksen J., Vandekerckhove J. and Krebbers E. 1993b. Studies of the role of the propeptides of the Arabidopsis thaliana 2S albumin. Plant Physiol. 102: 425-433.

Dasgupta J., Dasgupta S., Ghosh S., Roy B. and Mandal R.K. 1995. Deduced amino acid sequence of 2S storage protein from Brassica species and their conserved structural features. Indian J. Biochem. Biophys. 32(6): 378-384.

Dasgupta S. and Mandal R.K. 1991. Characterization of 2S seed storage protein of Brassica campestris and its antigenic homology with seed proteins of other Cruciferae. Biochem. Int. 25(3): 409-418.

Dasgupta S., Dasgupta J. and Mandal R.K. 1993.
Cloning and sequencing of 5' flanking sequence from the gene encoding 2S storage protein, from two Brassica species. Gene 133(2): 301-302.
De Castro L.A.B., Lacerada Z., Aramayo R.A., Sampaio M.J.A.M. and Gander E.S. 1987. Evidence for a precursor molecule of Brazil nut 2S seed proteins from biosynthesis and cDNA analysis. Mol. Gen. Genet. 206: 338-343.
De Clercq A., Vandewiele M., De Rycke R., Van Damme J., Van Montagu M., Krebbers E. and Vandekerckhove J. 1990a. Expression and processing of an Arabidopsis 2S albumin in transgenic tobacco. Plant Physiol. 92: 899-907.
De Clercq A., Vandewiele M., Van Damme J., Guerche P., Van Montagu M., Vandekerckhove J. and Krebbers E. 1990b. Stable accumulation of modified 2S albumin seed storage proteins with higher methionine contents in transgenic plants. Plant Physiol. 94: 970-979.
DeLisle A.J. and Crouch M.L. 1989. Seed storage protein transcription and mRNA levels in Brassica napus during development and in response to exogenous ABA. Plant Physiol. 91: 617-623.
Denis M., Krebbers E. and Renard M. 1996. Effect of sulphur levels on transgenic double-low Brassica napus plants expressing a seed-specific gene encoding a methionine-rich 2S albumin. Plant Breeding 115(3): 145-151.
Denis M., Renard M. and Krebbers E. 1995a. Isolation of homozygous transgenic Brassica napus lines carrying a seed-specific chimeric 2S albumin gene and determination of linkage relationships. Mol. Breed. 1(2): 143-153.
Denis M., Van Vliet A., Leyns F., Krebbers E. and Renard M. 1995b. Field evaluation of transgenic Brassica napus lines carrying a seed-specific chimeric 2S albumin gene. Plant Breeding 114(2): 97-107.
Devic M., Albert S. and Delseny M. 1996. Induction and expression of seed-specific promoters in Arabidopsis embryo-defective mutants. Plant J. 9(2): 205-215.
Dey N. and Mandal R.K. 1993.
Characterisation of 2S albumin with nutritionally balanced amino acid composition from the seeds of Chenopodium album and its antigenic homology with seed proteins of some Chenopodiaceae and Amaranthaceae species. Biochem. Mol. Biol. Int. 30(1): 149-157.
Ditta G., Stanfield S., Corbin D. and Helinski D.R. 1980. Broad host range DNA cloning system for Gram negative bacteria: construction of a gene bank of Rhizobium meliloti. Proc. Natl. Acad. Sci. USA 77: 7347-7351.
Dominguez J., Cuevas M., Ureña V., Munoz T. and Moneo I. 1990. Purification and characterization of an allergen of mustard seeds. Ann. Allergy 64: 352-357.
Dong J.Z. and Dunstan D.I. 1996. Expression of abundant mRNAs during somatic embryogenesis of white spruce [Picea glauca (Moench) Voss]. Planta 199: 459-466.
Doyle J.J. and Doyle J.L. 1990. Isolation of Plant DNA from Fresh Tissue. Focus 12(1): 13-15.
Egorov T.A., Odintsova T.I., Musolyamov A.Kh., Fido R., Tatham A.S. and Shewry P.R. 1996. Disulphide structure of a sunflower seed albumin: conserved and variant disulphide bonds in the cereal prolamin superfamily. FEBS Lett. 396: 285-288.
Ellerstrom M., Stalberg K., Ezcurra I. and Rask L. 1996. Functional dissection of a napin gene promoter: Identification of promoter elements required for embryo and endosperm-specific transcription. Plant Mol. Biol. 32(6): 1019-1027.
Ellis D.D., McCabe D.E., McInnis S.M., Ramachandran R., Russell D.R., Wallace K.M., Martinell B.J., Roberts D.R. and McCown B.H. 1993. Stable transformation of Picea glauca by particle acceleration. Bio/Tech. 11: 84-89.
Ellis D.D., McCabe D.E., Russell D.R., Martinell B.J. and McCown B.H. 1991. Expression of inducible angiosperm promoters in a gymnosperm, Picea glauca (white spruce). Plant Mol. Biol. 17: 19-27.
Ericson M.L., Muren E., Gustavsson H.O., Josefsson L.G. and Rask L. 1991.
Analysis of the promoter region of napin genes from Brassica napus demonstrates binding of nuclear protein in vitro to a conserved sequence motif. Eur. J. Biochem. 197(3): 741-746.
Ericson M.L., Rodin J., Lenman M., Glimelius K., Josefsson L.-G. and Rask L. 1986. Structure of the rapeseed 1.7S storage protein, napin, and its precursor. J. Biol. Chem. 261: 14567-14581.
Feinberg A.P. and Vogelstein B. 1983. A technique for radiolabeling DNA restriction endonuclease fragments to high specific activity. Anal. Biochem. 132: 6-13.
Feinberg A.P. and Vogelstein B. 1984. A technique for radiolabeling DNA restriction endonuclease fragments to high specific activity. Addendum. Anal. Biochem. 137: 266-267.
Fernandez D.E., Turner F.R. and Crouch M.L. 1991. In situ localization of storage protein mRNAs in developing meristems of Brassica napus embryos. Development 111: 299-313.
Fisher R.A. 1935. The Design of Experiments. Oliver and Boyd, Edinburgh.
Flinn B.S. 1992. Storage protein gene expression in zygotic and somatic embryos of interior spruce. Ph.D. Thesis, Dept. of Botany, University of British Columbia. 173 pp.
Flinn B.S., Roberts D.R. and Taylor I.E.P. 1991a. Evaluation of somatic embryos of interior spruce. Characterization and developmental regulation of storage proteins. Physiol. Plant. 82: 624-632.
Flinn B.S., Roberts D.R., Webb D.T. and Sutton B.C.S. 1991b. Storage protein changes during zygotic embryogenesis in interior spruce. Tree Physiol. 8: 71-81.
Flinn B.S., Roberts D.R., Newton C.H., Cyr D.R., Webster F.B. and Taylor I.E.P. 1993. Storage protein gene expression in zygotic and somatic embryos of interior spruce. Physiol. Plant. 89: 719-730.
Fourney R.M., Miyakoshi S., Day R.S. and Peterson M.C. 1988. Northern blotting: Efficient RNA staining and transfer. Focus 10(1): 5-7.
Frischauf A.-M., Lehrach H., Pustka A. and Murray N. 1983.
Lambda replacement vectors carrying polylinker sequences. J. Mol. Biol. 170: 827-842.
Galau G.A., Wang H.Y.-C. and Hughes D.W. 1992. Cotton Mat5-A (C164) gene and Mat5-D cDNAs encoding methionine-rich 2S albumin storage proteins. Plant Physiol. 99: 779-782.
Gander E.S., Holmstroem K.O., De Paiva G.R., De Castro L.A.B., Carneiro M. and Grossi de Sa M.F. 1991. Isolation, characterization and expression of a gene coding for a 2S albumin from Bertholletia excelsa (Brazil nut). Plant Mol. Biol. 16(3): 437-448.
Gayler K.R., Kolivas S., Macfarlane A.J., Lilley G.G., Baldi M., Blagrove J. and Johnson E.D. 1990. Biosynthesis, cDNA and amino acid sequences of a precursor of conglutin δ, a sulfur-rich protein from Lupinus angustifolius. Plant Mol. Biol. 15: 879-893.
Gehrig P.M. and Biemann K. 1996. Assignment of the disulfide bonds in napin, a seed storage protein from Brassica napus, using matrix-assisted laser desorption ionization mass spectrometry. Peptide Res. 9(6): 308-314.
Gehrig P.M., Krzyzaniak A., Barciszewski J. and Biemann K. 1996. Mass spectrometric amino acid sequencing of a mixture of seed storage proteins (napin) from Brassica napus, products of a multigene family. Proc. Natl. Acad. Sci. 93: 3647-3652.
Ghosh S.K., Dasgupta J., Maiti I.B., Hunt A.G. and Mandal R.K. 1995. Expression of 2S seed storage protein gene of Brassica juncea in transgenic tobacco plants under constitutive and seed-specific promoters. J. Plant Biochem. Biotech. 4(1): 1-4.
Gifford D.J. and Tolley M.C. 1989. The seed proteins of white spruce and their mobilisation following germination. Physiol. Plant. 77: 254-261.
Godinho da Silva Jr. J., Machado O.L.T., Izumi C., Padovan J.C., Chait B.T., Mirza U.A. and Green L.J. 1996. Amino acid sequence of a new 2S albumin from Ricinus communis which is part of a 29 kDa precursor protein. Arch. Biochem. Biophys. 336(1): 10-18.
Gomez L.
, Martin E., Hernandez D., Sanchez-Monge R., Barber D., Pozo V., Andres B., Armentia A., Lahoz C., Salcedo G. and Palomino P. 1990. Members of the α-amylase inhibitors family from wheat endosperm are major allergens associated with baker's asthma. FEBS Lett. 261: 85-88.
Gonzalez de la Peña M.A., Monsalve R.I., Batanero E., Villalba M. and Rodriguez R. 1996. Expression in Escherichia coli of Sin a 1, the major allergen from mustard. Eur. J. Biochem. 237(3): 827-832.
Gonzalez de la Peña M.A., Menendez-Arias L., Monsalve R.I. and Rodriguez R. 1991. Isolation and characterization of the major allergen from oriental mustard seeds, Bra j I. Int. Arch. Allergy Appl. Immunol. 96: 263-270.
Gracia-Olmedo F., Salcedo G., Sanchez-Monge R., Gomez L., Royo J. and Carbonero P. 1987. Plant proteinaceous inhibitors of proteinases and α-amylases. Oxf. Surv. Plant Mol. Cell. Biol. 4: 275-334.
Groome M.C., Axler S.R. and Gifford D.J. 1991. Hydrolysis of lipid and protein reserves in loblolly pine seeds in relation to protein electrophoretic patterns following imbibition. Physiol. Plant. 83: 99-106.
Grossi de Sa M.F., Weinberg D.F., Rech E.L., Barros L.M.G., Aragao F.J.L., Holmstroem K.O. and Gander E.S. 1994. Functional studies on a seed-specific promoter from a Brazil nut 2S gene. Plant Science 103(2): 189-198.
Gueguen J., Popineau Y., Anisimova I.N., Fido R.J., Shewry P.R. and Tatham A.S. 1996. Functionality of the 2S albumin seed storage proteins from sunflower (Helianthus annuus L.). J. Agri. Food Chem. 44(5): 1184-1189.
Guerche P., Tire C., Grossi De Sa F., De Clercq A., Van Montagu M. and Krebbers E. 1990a. Differential expression of the Arabidopsis 2S albumin genes and the effect of increasing gene family size. Plant Cell 2(5): 469-478.
Guerche P., De Almeida E.R.P., Schwarztein M.A., Gander E., Krebbers E. and Pelletier G. 1990b.
Expression of the 2S albumin from Bertholletia excelsa in Brassica napus. Mol. Gen. Genet. 221(3): 306-314.
Gustavsson H.O., Ellerstrom M., Stalberg K., Ezcurra I., Koman A., Hoglund A.S., Rask L. and Josefsson L.G. 1991. Distinct sequence elements in a napin promoter interact in vitro with DNA-binding proteins from Brassica napus. Physiol. Plant. 82: 205-212.
Habbin J.E. and Larkins B.A. 1995. Improving protein quality in seeds. In: Kigel J. and Galili G. (eds.) Seed Development and Germination, pp. 791-810. Marcel Dekker, N.Y.
Hakman I. 1993. Embryology in Norway spruce (Picea abies). An analysis of the composition of seed storage proteins and deposition of storage reserves during seed development and somatic embryogenesis. Physiol. Plant. 87: 148-159.
Hakman I., Stabel P., Engstrom P. and Eriksson T. 1990. Storage protein accumulation during zygotic and somatic embryo development in Picea abies (Norway spruce). Physiol. Plant. 80: 441-445.
Hara-Nishimura I., Inoue K. and Nishimura M. 1991. A unique vacuolar processing enzyme responsible for conversion of several proprotein precursors into the mature forms. FEBS Lett. 294(1-2): 89-93.
Hara-Nishimura I., Takeuchi Y., Inoue K. and Nishimura M. 1993. Vesicle transport and processing of the precursor to 2S albumin in pumpkin. Plant J. 4: 793-800.
Hara-Nishimura I., Shimada T., Hiraiwa N. and Nishimura M. 1995. Vacuolar processing enzyme responsible for maturation of seed proteins. J. Plant Physiol. 145(5-6): 632-640.
Henikoff S. 1984. Unidirectional digestion with exonuclease III creates targeted breakpoints for DNA sequencing. Gene 28(3): 351-359.
Higgins T.J.V. 1984. Synthesis and regulation of major proteins in seeds. Annu. Rev. Plant Physiol. 35: 191-221.
Higgins T.J.V., Chandler P.M., Randall P.J., Spencer D., Beach L.R., Blagrove R.J., Kortt A.A. and Inglis A.S. 1986.
Gene structure, protein structure, and regulation of the synthesis of a sulfur-rich protein in pea seeds. J. Biol. Chem. 261(24): 11124-11130.
Higgins T.J.V., Beach L.R., Spencer D., Chandler P.M., Randall P.J., Blagrove R.J., Kortt A.A. and Guthrie R.E. 1987. cDNA and protein sequence of a major pea seed albumin (PA2). Plant Mol. Biol. 8: 37-45.
Hiraiwa N., Kondo M., Nishimura M. and Hara-Nishimura I. 1997. An aspartic endopeptidase is involved in the breakdown of propeptides of storage proteins in protein-storage vacuoles of plants. Eur. J. Biochem. 246(1): 133-141.
Höglund A.-S., Rodin J., Larsson E. and Rask L. 1991. The distribution of napin and cruciferin in developing rape seed embryos. Plant Physiol. 98: 509-515.
Holmes D.S. and Quigley M. 1981. A Rapid Boiling Method for the Preparation of Bacterial Plasmids. Anal. Biochem. 114: 193.
Hood E.E., Gelvin S.B., Melchers L.S. and Hoekema A. 1993. New Agrobacterium helper plasmids for gene transfer to plants. Transgen. Res. 2(4): 208-218.
Horsch R.B., Fry J.E., Hoffmann N.L., Wallroth M., Eichholtz D.A., Rogers S.G. and Fraley R.T. 1985. A simple and general method for transferring genes into plants. Science 227: 1229-1231.
Howley P.M. and Gatehouse J.A. 1997. A 38 basepair repeat sequence within the pea seed storage protein promoter of legA is a binding site for a nuclear DNA-binding protein. Plant Mol. Biol. 33: 175-180.
Irwin S.D., Keen J.N., Findlay J.B.C. and Lord J.M. 1990. The Ricinus communis 2S albumin precursor: a single preproprotein may be processed into two different heterodimeric storage proteins. Mol. Gen. Genet. 222(2-3): 400-408.
Ishihara H., Sasagawa T., Sakai R., Nishikawa M., Kimura M. and Funatsu G. 1997. Isolation and molecular characterization of four arginine/glutamate rich polypeptides from the seeds of sponge gourd (Luffa cylindrica). Biosci. Biotech. & Biochem. 61(1): 168-170.
Jefferson R.A., Kavanagh T.A. and Bevan M.W. 1987. GUS fusions: β-glucuronidase as a sensitive and versatile gene fusion marker in higher plants. EMBO J. 6: 3901-3907.
Jefferson R.A. 1987. Assaying chimeric genes in plants: The GUS gene fusion system. Plant Mol. Biol. Rep. 5: 387-405.
Jiang L., Abrams S.R. and Kermode A.R. 1996. Vicilin and napin storage-protein gene promoters are responsive to abscisic acid in developing transgenic tobacco seed but lose sensitivity following premature desiccation. Plant Physiol. 110: 1135-1144.
Jiang L., Downing W.L., Baszczynski C.L. and Kermode A.R. 1995. The 5' flanking regions of vicilin and napin storage protein genes are down-regulated by desiccation in transgenic tobacco. Plant Physiol. 107: 1439-1449.
Jose-Estanyol M., Ruiz-Avila L. and Puigdomenech P. 1992. A maize embryo-specific gene encodes a proline-rich and hydrophobic protein. Plant Cell 4: 413-423.
Josefsson L.-G., Lenman M., Ericson M.L. and Rask L. 1987. Structure of a gene encoding the 1.7S storage protein, napin, from Brassica napus. J. Biol. Chem. 262: 12196-12201.
Kado C.I., Heskett M.G. and Langley R.A. 1972. Studies on Agrobacterium tumefaciens: characterization of strains 1D135 and B6, and analysis of the bacterial chromosome, transfer RNA and ribosomes for tumour-inducing ability. Phys. Plant Path. 2: 47-57.
Kermode A.R. 1995. Regulatory mechanisms in the transition from seed development to germination: Interactions between the embryo and the seed environment. In: Kigel J. and Galili G. (eds.) Seed Development and Germination, pp. 273-332. Marcel Dekker, New York.
Khan M.R.I., Ceriotti A., Tabe L., Aryan A., McNabb W., Moore A., Craig S., Spencer D. and Higgins T.J.V. 1996. Accumulation of a sulphur-rich seed albumin from sunflower in the leaves of transgenic subterranean clover (Trifolium subterraneum L.). Transgenic Res. 5(3): 179-185.
King J.E.
and Gifford D.J. 1997. Amino acid utilization in seeds of loblolly pine during germination and early seedling growth. 1. Arginine and arginase activity. Plant Physiol. 113(4): 1125-1135.
Kirsch T., Saalbach G., Raikhel N.V. and Beevers L. 1996. Interaction of a potential vacuolar targeting receptor with amino- and carboxyl-terminal targeting determinants. Plant Physiol. 111(2): 469-474.
Klein T.M., Gradziel T., Fromm M.E. and Sanford J.C. 1988. Factors influencing gene delivery into Zea mays cells by high-velocity microprojectiles. Bio/Technol. 6: 559-563.
Kohno-Murase J., Murase M., Ichikawa H. and Imamura J. 1994. Effects of an antisense napin gene on seed storage compounds in transgenic Brassica napus seeds. Plant Mol. Biol. 26(4): 1115-1124.
Koning A., Jones A., Fillatti J.J., Comai L. and Lassner M.W. 1992. Arrest of embryo development in Brassica napus mediated by modified Pseudomonas aeruginosa exotoxin A. Plant Mol. Biol. 18(2): 247-258.
Koornneef M., Hanhart C.J., Hilhorst H.W.M. and Karssen C.M. 1989. In vivo inhibition of seed development and reserve protein accumulation in recombinants of abscisic acid biosynthesis and responsiveness mutants in Arabidopsis thaliana. Plant Physiol. 90: 463-469.
Kortt A.A. and Caldwell J.B. 1990. Low molecular weight albumins from sunflower seed: identification of a methionine-rich albumin. Phytochemistry 29: 2805-2810.
Kortt A.A., Caldwell J.B., Lilley G.G. and Higgins T.J.V. 1991. Amino acid and cDNA sequences of a methionine-rich 2S protein from sunflower seed (Helianthus annuus L.). Eur. J. Biochem. 195(2): 329-334.
Kosugi S., Ohashi Y., Nakajima K. and Arai Y. 1990. An improved assay for β-glucuronidase in transformed cells: methanol almost completely suppresses a putative endogenous β-glucuronidase activity. Plant Sci. 70: 133-140.
Koziel M.G., Carozzi N.B. and Desai N. 1996.
Optimizing expression of transgenes with an emphasis on post transcriptional events. Plant Mol. Biol. 32: 393-405.
Krebbers E., Da Silva Conceicao A., Denis M., D'Hondt K. and Vandekerckhove J. 1993. Modification of plant seed storage proteins. In: Seed Storage Compounds: Biosynthesis, Interactions and Manipulation, P.R. Shewry and A.K. Stobart, eds. Oxford University Press, Oxford.
Krebbers E., Herdies L., De Clercq A., Seurinck J., Leemans J., Van Damme J., Segura M., Gheysen G., Van Montagu M. and Vandekerckhove J. 1988. Determination of the processing sites of an Arabidopsis 2S albumin and characterization of the complete gene family. Plant Physiol. 87: 859-866.
Krebbers E., Rudelsheim P., De Greef W. and Vandekerckhove J. 1991. Laboratory and field performance of transgenic Brassica plants expressing chimeric 2S albumin genes. In: GCIRC Eighth International Rapeseed Congress, July 1991, Saskatoon, Saskatchewan. Ed. McGregor D.I. pp. 716-721. GCIRC, Canada.
Kreis M., Forde B.G., Rahman S., Miflin B.J. and Shewry P.R. 1985. Molecular evolution of the seed storage proteins of barley, rye and wheat. J. Mol. Biol. 183: 499-502.
Krochko J.E., Bantroch D.J., Greenwood J.S. and Bewley J.D. 1994. Seed storage proteins in developing somatic embryos of alfalfa: Defects in accumulation compared to zygotic embryos. J. Exp. Bot. 45(275): 699-708.
Lacorte C., Aragao F.J.L., Almeida E.R., Mansur E. and Rech E.L. 1997. Transient expression of GUS and the 2S albumin gene from Brazil nut in peanut (Arachis hypogaea L.) seed explants using particle bombardment. Plant Cell Rep. 16: 619-623.
Lammer D.L. and Gifford D.J. 1989. Lodgepole pine seed germination II. The seed proteins and their mobilization in the megagametophyte and embryonic axis. Can. J. Bot. 67(9): 2544-2551.
Langridge P. and Feix G. 1983. A zein gene of maize is transcribed from two widely separated promoter regions. Cell 34: 1015-1022.
Laroche M., Aspart L., Delseny M. and Penon P. 1984. Characterization of radish (Raphanus sativus) storage proteins. Plant Physiol. 74: 487-493.
Laroche-Raynal M. and Delseny M. 1986. Identification and characterization of the mRNA for major storage proteins from radish. Eur. J. Biochem. 157(2): 321-327.
Li S.S.-L. 1977. Purification and characterization of seed storage proteins from Momordica charantia. Experientia 33: 895-896.
Li S.S.-L., Lin T.T. and Forde M.D. 1977. Isolation and characterization of a low-molecular weight seed protein from Ricinus communis. Biochim. Biophys. Acta 492: 364-369.
Li W.-H. and Graur D. 1991. Fundamentals of Molecular Evolution. Massachusetts: Sinauer Associates, Inc.
Lilley G.G. and Inglis A.S. 1986a. Amino acid sequence of conglutin δ, a sulfur-rich seed protein of Lupinus angustifolius L. FEBS Lett. 195: 235-241.
Lilley G.G. 1986b. Isolation of conglutin δ, a sulphur-rich protein from the seeds of Lupinus angustifolius L. J. Sci. Food Agric. 37: 20-30.
Lilley G.G. 1986c. The subunit structure and stability of conglutin δ, a sulphur-rich protein from the seeds of Lupinus angustifolius L. J. Sci. Food Agric. 37: 895-907.
Litvay J.D., Verma D.C. and Johnson M.A. 1985. Influence of loblolly pine (Pinus taeda L.) culture medium and its components on growth and somatic embryogenesis of the wild carrot (Daucus carota L.). Plant Cell Rep. 4: 325-328.
Liu X., Maeda S., Hu Z., Aiuchi T., Nakaya K. and Kurihara Y. 1993. Purification, complete amino acid sequence and structural characterization of the heat-stable sweet protein mabinlin II. Eur. J. Biochem. 211(1-2): 281-287.
Lohmer S., Maddaloni M., Motto M., Di Fiozo N., Hartings H., Salamini F. and Thompson R.D. 1991. The maize regulatory locus Opaque-2 encodes a DNA-binding protein which activates the transcription of the b-32 gene. EMBO J. 10: 617-624.
Lonnerdal B. and Janson J.C. 1972.
Studies on Brassica seed proteins. I. The low molecular weight proteins in rapeseed. Isolation and characterization. Biochim. Biophys. Acta 278: 175-183.
Lütcke H.A., Chow K.C., Mickel F.S., Moss K.A., Kern H.F. and Scheele G.A. 1987. Selection of AUG initiation codons differs in plants and animals. EMBO J. 6: 43-48.
Machado O.L.T. and Godinho Da Silva Jr. J. 1992. An allergenic 2S storage protein from Ricinus communis seeds which is a part of the 2S albumin precursor predicted by cDNA data. Brazilian J. Med. Biol. Res. 25(6): 567-582.
Maier U.-G., Brown J.W.S., Toloczyki C. and Feix G. 1987. Binding of a nuclear factor to a consensus sequence in the 5' flanking region of zein genes from maize. EMBO J. 6: 17-22.
Marcellino L.H., Neshich G., Grossi de Sa M.F., Krebbers E. and Gander E.S. 1996. Modified 2S albumins with improved tryptophan content are correctly expressed in transgenic tobacco plants. FEBS Lett. 385(3): 154-158.
Marcotte W.R. Jr., Bayley C.C. and Quatrano R.S. 1988. Regulation of a wheat promoter by abscisic acid in rice protoplasts. Nature 335: 454-457.
Masoud S.A., Ding X.F., Johnson L.B., White F.F. and Reeck G.R. 1996. Expression of a corn bifunctional inhibitor of serine proteinases and insect alpha-amylases in transgenic tobacco plants. Plant Science 115(1): 59-69.
McHenry L. and Fritz P.J. 1992. Comparison of the structure and nucleotide sequences of vicilin genes of cocoa and cotton raise questions about vicilin evolution. Plant Mol. Biol. 18: 1173-1176.
Menendez-Arias L., Dominguez J., Moneo I. and Rodriguez R. 1990. Epitope mapping of the major allergen from yellow mustard seeds, Sin a I. Mol. Immunol. 27: 143-150.
Menendez-Arias L., Moneo I., Dominguez J. and Rodriguez R. 1988. Primary structure of the major allergen of yellow mustard (Sinapis alba L.) seed, Sin a I. Eur. J. Biochem. 177: 159-166.
Menendez-Arias L., Monsalve R.I., Gavilanes J.G.
and Rodriguez R. 1987. Molecular and spectroscopic characterization of a low molecular mass seed storage protein from yellow mustard (Sinapis alba L.). Int. J. Biochem. 19: 899-907.
Misra S. and Green M.J. 1990. Developmental gene expression in conifer embryogenesis and germination. I. Seed proteins and protein body composition of mature embryo and the megagametophyte of white spruce (Picea glauca [Moench] Voss.). Plant Sci. 68: 163-173.
Molvig L., Tabe L.M., Eggum B.O., Moore A.E., Craig S., Spencer D. and Higgins T.J.V. 1997. Enhanced methionine levels and increased nutritive value of seeds of transgenic lupins (Lupinus angustifolius L.) expressing a sunflower seed albumin gene. Proc. Natl. Acad. Sci. USA 94: 8393-8398.
Monsalve R.I., Gonzalez de la Peña M.A., Lopez-Otin C., Fiandor A., Fernandez C., Villalba M. and Rodriguez R. 1997. Detection, isolation and complete amino acid sequence of an aeroallergenic protein from rapeseed flour. Clin. Exp. Aller. 27: 833-841.
Monsalve R.I., Menendez-Arias L., Gonzalez de la Peña M.A., Batanero E., Villalba M. and Rodriguez R. 1994. Purification and characterization of napin-like proteins from radish. J. Exp. Bot. 45: 1169-1176.
Monsalve R.I., Gonzalez de la Peña M.A., Menendez-Arias L., Lopez-Otin C., Villalba M. and Rodriguez R. 1993. Characterization of a new oriental-mustard (Brassica juncea) allergen, Bra j IE: detection of an allergenic epitope. Biochem. J. 293: 625-632.
Monsalve R.I., Lopez-Otin C., Villalba M. and Rodriguez R. 1991a. A new distinct group of 2S albumins from rapeseed. FEBS Lett. 295: 207-210.
Monsalve R.I., Villalba M., Lopez-Otin C. and Rodriguez R. 1991b. Structural analysis of the small chain of the 2S albumin napin NIII from rapeseed, chemical and spectroscopic evidence of an intramolecular bond formation. Biochim. Biophys. Acta 1078(2): 265-272.
Monsalve R.I., Menendez-Arias L., Lopez-Otin C. and Rodriguez R. 1990.
Beta-turns as structural motifs for the proteolytic processing of seed proteins. FEBS Lett. 263(2): 209-212.
Monsalve R.I. and Rodriguez R. 1990. Purification and characterization of proteins from the 2S fraction from seeds of Brassicaceae family. J. Exp. Bot. 41: 89-94.
Morcillo F., Aberlenc-Bertossi F., Trouslot P., Hamon S. and Duval Y. 1997. Characterization of 2S and 7S storage proteins in embryos of oil palm. Plant Sci. 122(2): 141-151.
Moroz L.A. and Yang W.H. 1980. Kunitz soybean trypsin inhibitor. A specific allergen in food anaphylaxis. N. Engl. J. Med. 302: 1126-1128.
Morton R.L., Quiggan D. and Higgins T.J.V. 1995. Regulation of seed storage protein gene expression. In: Kigel J. and Galili G. (eds.) Seed Development and Germination, pp. 103-136. Marcel Dekker, N.Y.
Murashige T. and Skoog F. 1962. A revised medium for rapid growth and bioassays with tobacco tissue cultures. Physiol. Plant. 15: 473-497.
Muren E. and Rask L. 1996. Processing in vitro of pronapin, the 2S storage-protein precursor of Brassica napus produced in a baculovirus expression system. Planta 200: 373-379.
Muren E., Ek B. and Rask L. 1995. Processing of the 2S storage protein pronapin in Brassica napus and in transformed tobacco. Eur. J. Biochem. 227: 316-321.
Muren E., Ek B., Björk I. and Rask L. 1996. Structural comparison of the precursor and the mature form of napin, the 2S storage protein in Brassica napus. Eur. J. Biochem. 242: 214-219.
Nakase M., Adachi T., Urisu A., Miyashita T., Alvarez A.M., Nagasaka S., Aoki N., Nakamura R. and Matsuda T. 1996a. Rice (Oryza sativa L.) alpha-amylase inhibitors of 14-16 kDa are potential allergens and products of a multigene family. J. Agri. Food Chem. 44(9): 2624-2628.
Nakase M., Yamada T., Kira T., Yamaguchi J., Aoki N., Nakamura R., Matsuda T. and Adachi T. 1996.
The same nuclear proteins bind to the 5'-flanking regions of genes for the rice seed storage protein-16 kDa albumin, 13 kDa prolamin and type II glutelin. Plant Mol. Biol. 32(4): 621-630.
Nestle M. 1996. Allergies to transgenic foods - questions of policy. N. Engl. J. Med. 334(11): 726-728.
Neumann G.M., Condron R. and Polya G.M. 1996. Purification and sequencing of napin-like protein small and large chains from Momordica charantia and Ricinus communis seeds and determination of sites phosphorylated by plant Ca2+-dependent protein kinase. Biochim. Biophys. Acta - Protein Structure & Molecular Enzymology 1298(2): 223-240.
Neumann G.M., Condron R., Thomas I. and Polya G.M. 1996a. Purification and sequencing of multiple forms of Brassica napus seed napin small chains that are calmodulin antagonists and substrates for plant calcium-dependent protein kinase. Biochim. Biophys. Acta 1295(1): 23-33.
Neumann G.M., Condron R., Thomas I. and Polya G.M. 1996b. Purification and sequencing of multiple forms of Brassica napus seed napin large chains that are calmodulin antagonists and substrates for plant calcium-dependent protein kinase. Biochim. Biophys. Acta 1295(1): 34-43.
Newton C.H., Flinn B.S. and Sutton B.C.S. 1992. Vicilin-like seed storage proteins in the gymnosperm interior spruce (Picea glauca/engelmanii). Plant Mol. Biol. 20: 315-322.
Nielsen H., Engelbrecht J., Brunak S. and von Heijne G. 1997. Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Prot. Eng. 10(1): 1-6.
Nirasawa S., Liu X., Nishino T. and Kurihara Y. 1993. Disulfide bridge structure of the heat-stable sweet protein mabinlin II. Biochim. Biophys. Acta 1202(2): 277-280.
Nirasawa S., Nishino T., Katahira M., Uesugi S., Hu Z. and Kurihara Y. 1994. Structures of heat-stable and unstable homologues of the sweet protein mabinlin.
The difference in the heat stability is due to replacement of a single amino acid residue. Eur. J. Biochem. 223(3): 989-995.
Nong V.H., Schlesier B., Bassuner R., Repik A., Horstmann C. and Muntz K. 1995. Narbonin, a novel 2S protein from Vicia narbonensis L. seeds - cDNA, gene structure and developmentally regulated formation. Plant Mol. Biol. 28(1): 61-72.
Nordlee J.A., Taylor S.L., Townsend J.A., Thomas L.A. and Bush R.K. 1996. Identification of a Brazil nut allergen in transgenic soybeans. N. Engl. J. Med. 334(11): 688-692.
Odani S., Koide T., Ono T., Seto Y. and Tanaka T. 1987. Soybean hydrophobic protein. Eur. J. Biochem. 162: 485-491.
Onaderra M., Monsalve R.I., Mancheño J.M., Villalba M., Martinez del Pozo A., Gavilanes J.G. and Rodriguez R. 1994. Eur. J. Biochem. 225: 609-615.
Osborne T.B. 1924. The Vegetable Proteins. Longmans, Green. London.
Owens J.N. and Molder M. 1984. The Reproductive Cycle of Interior Spruce. Ministry of Forests, Province of British Columbia, Victoria, B.C.
Pal M. and Biswas B.B. 1995. Expression of the Arabidopsis thaliana 2S albumin gene 3 in Saccharomyces cerevisiae. Gene 153(2): 175-178.
Pasha M.K., Begum N. and Baset Q.A. 1995. Electrophoretic characterization of albumin and globulin proteins of rice grains. Bangladesh J. Bot. 24(2): 115-120.
Pickardt T., Saalbach I., Waddell D., Meixner M., Muntz K. and Schieder O. 1995. Seed specific expression of the 2S albumin gene from Brazil nut (Bertholletia excelsa) in transgenic Vicia narbonensis. Molec. Breed. 1(3): 295-301.
Polya G.M., Morrice N.A. and Wettenhall R.E.H. 1989. Substrate specificity of wheat embryo calcium-dependent protein kinase. FEBS Lett. 253: 137-140.
Polya G.M., Chandra S., Chung R., Neumann G.M. and Høj P.B. 1992. Purification and characterization of wheat and pine small basic protein substrates for plant calcium-dependent protein kinase. Biochim. Biophys.
Acta 1120: 273-280.
Polya G.M., Chandra S. and Condron R. 1993. Purification and sequencing of radish seed calmodulin antagonists phosphorylated by calcium-dependent protein kinase. Plant Physiol. 101(2): 545-551.
Przybylska J. and Zimniak-Przybylska Z. 1995. Electrophoretic seed albumin patterns and species relationships in Vicia sect. Faba (Fabaceae). Plant Syst. Evol. 198(3-4): 179-194.
Quatrano R.S., Marcotte W.R. and Guiltinan M. 1993. Regulation of gene expression by abscisic acid. In: D.P.S. Verma (ed), Control of Plant Gene Expression, pp. 69-90. CRC Press, Boca Raton.
Radke S.E., Andrews B.M., Moloney M.M., Crouch M.L., Kridl J.C. and Knauf V.C. 1988. Transformation of Brassica napus L. using Agrobacterium tumefaciens: developmentally regulated expression of a re-introduced napin gene. Theor. Appl. Genet. 75: 685-694.
Raina A. and Datta A. 1992. Molecular cloning of a gene encoding a seed-specific protein with nutritionally balanced amino acid composition from Amaranthus. Proc. Natl. Acad. Sci. 89: 11774-11778.
Raynal M., Depigny D., Grellet F. and Delseny M. 1991. Characterization and evolution of napin-encoding genes in radish and related crucifers. Gene 99: 77-86.
Rico M., Bruix M., Gonzalez C., Monsalve R.I. and Rodriguez R. 1996. 1H NMR assignment and global fold of napin BnIb, a representative 2S albumin seed protein. Biochem. 35(49): 15672-15682.
Roberts D.R., Flinn B.S., Webb D.T., Webster F.B. and Sutton B.C.S. 1990a. Abscisic acid and indole-3-butyric acid regulation of maturation and accumulation of storage proteins in somatic embryos of interior spruce. Physiol. Plant. 78: 355-360.
Roberts D.R., Sutton B.C.S. and Flinn B.S. 1990b. Synchronous and high frequency germination of interior spruce somatic embryos following partial drying at high relative humidity. Can. J. Bot. 68: 1086-1090.
Roberts D.R. 1991.
Abscisic acid and mannitol promote early development, maturation and storage protein accumulation in somatic embryos of interior spruce. Physiol. Plant. 83; 247-254. Roberts D.R. , Webster F .B . , Flinn B.S . , Lazaroff W.R. , Mclnnis S . M . and Sutton B . C . S . 1991. Application of somatic embryogenesis to clonal propagation of interior spruce. In: M . R . Ahuja (ed). Woody Plant Biotechnology. Plenum Press, New York, pp. 157 - 169. Rodin, J. and Rask, L . 1990. Characterization of matteuccin, the 2.2S storage protein of the ostrich fern. Evolutionary relationship to angiosperm seed storage proteins. Eur. J. Biochem. 192: 101-107 Roeckel P., Oancia T. and Drevet J. 1997 Effects of seed-specific expression of a cytokinin biosynthetic gene on canola and tobacco phenotypes. Transgen. Res. 6(2): 133-141. Rose G .D . and Roy S. 1980. Hydrophobic basis of packing in globular proteins. Proc. Natl. Acad. Sci. U S A 77(8): 4643-4647. Rost B . and Sander C. 1993. Prediction of protein structure at better than 70% accuracy. J. M o l . B io l . 232: 584-599. Rost B . , Sander C. and Schneider R. 1994. P H D - an automatic mail server for protein secondary structure prediction. C A B I O S . 10: 53-60. Rost B . and Sander C. 1994. Combining evolutionary information and neural networks to predict protein secondary structure. Proteins 19: 55-72. Saalbach G. , Rosso M . and Schumann U . 1996. The vacuolar targeting signal of the 2S albumin from Brazil nut resides at the C terminus and involves the C-terminal propeptide as an essential element. Plant Physiol. 112(3): 975-985. 176 Saalbach I., Waddell D . , Pickardt T., Schieder O. and Muentz K . 1995a. Stable expression of the sulphur-rich 2S albumin gene in transgenic Vicia narbonensis increases the methionine content of seeds. J. Plant Physiol. 145(5-6): 674-681. Saalbach I., Pickardt T., Waddell D.R. , Hillmer S., Schieder O. and Muntz K . 1995b. 
The sulphur-rich Brazil nut 2S albumin is specifically formed in transgenic seeds of the grain legume Vicia narbonensis. Euphytica 85(1-3): 181-192. Saalbach I., Pickardt T., Machemehl F., Saalbach G. , Schieder O. and Muentz K . 1994. A chimeric gene encoding the methionine-rich 2S albumin of the Brazi l nut (Bertholletia excelsa H . B . K . ) is stably expressed and inherited in transgenic legumes. M o l . Gen. Genet. 242(2): 226-236. Salmanowicz B P . and Przybylska J. 1994. Electrophoretic patterns of seed albumins in the Old-World Lupinus species (Fabaceae): Variation in the 2S albumin class. Plant Syst. Evol . 192(1-2): 67-78. Salmanowicz B .P . 1995. Comparative study of seed albumins in the Old-World Lupinus species (Fabaceae) by reversed-phase H P L C . Plant Syst. Evo l . 195(1-2): 77-86. Salmanowicz B .P . and Weder J .K.P. 1997. Primary structure of 2S albumin from seeds of Lupinus albus. Zeitschrift fur Lebensmittel-Untersuchung und-Forschung A-Food Research & Technology. 204(2): 129-135. Sambrook J., Fritsch E.F . and Maniatis T. Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N . Y . , 1989. Sander C. and Schneider R. 1991. Database of Homology-Derived Structures and the Structural Meaning of Sequence Alignment. Proteins 9: 56-68. Savard L . , L i P., Strauss S.H., Chase M . W . , Michaud M . and Bousquet J. 1994. Chloroplast and nuclear gene sequences indicate late Pennsylvanian time for the last common ancestor of extant seed plants. Proc. Natl. Acad. Sci. U S A 91(11): 5163-5167. Schwenke K . D . , Drescher B . , Zirwer D . and Raab B . 1988. Structural studies on the native and chemically modified low-molecular mass basic storage protein (napin) from rapeseed (Brassica napus L . ) . Biochem. Physiol. Pflanzen 183: 219-224. Scofield S.R. and Crouch M L . 1987. Nucleotide sequence of a member of the napin storage protein family from Brassica napus. J. Bio l . Chem. 262: 12202-12208. Sharief F.S. 
and L i S.S.L. 1982. Amino acid sequence of small and large subunits of seed storage protein from Ricinus communis. J. B io l . Chem. 257: 14753-14759. Shen, Q. and T .D. Ho . 1995. Functional dissection of an abscisic acid (ABA)-inducible gene reveals two independent ABA-responsive complexes each containing a G-box and a novel c/s-acting element. Plant Cell 7: 295-307. 177 Shewry P.R., Napier J.A. and Tatham A . S . 1995. Seed storage proteins: Structures and biosynthesis. Plant Cell 7: 945-956. Shutov A . D . , Kakhovskaya I.A., Braun H . , Baumlein H . and Muentz K . 1995. Legumin-like and vicilin-like seed storage proteins: Evidence for a common single-domain ancestral gene. J. M o l . Evol . 41(6): 1057-1069. Simpson G . G . and Filipowicz W . 1996. Splicing of precursors to m R N A in higher plants: mechanism, regulation and sub-nuclear organisation of the spliceosomal machinery. Plant M o l . B io l . 32(1-2): 1-41. Simpson G . G . , Leader K . J . , Brown J.W.S. and Franklin, T. 1993. Characteristics of plant pre-mRNA introns and transposable elements. In: R . R . D . Croy (ed.), Plant Molecular Biology Labfax, BIOS Scientific Publishers, Oxford, U K . pp. 183-251. Sneath P . H . A . and Sokal R.R. 1973. Numerical taxonomy. Freeman, San Francisco. Srinivas H . and Rao M . S . M . 1987. Studies on the low molecular weight proteins of poppy seed (Papaver somniferumh). J. Agr i . Food Chem. 35: 12-14. Stalberg K , Ellerstom M . , Ezcurra I., Ablov S. and Rask L . 1996. Disruption of an overlapping E - b o x / A B R E motif abolished high transcription of the napA storage-protein promoter in transgenic Brassica napus seeds. Planta. 199(4): 515-519. Stalberg K . , Ellerstrom M . , Josefsson L . G . and Rask L . 1993. Deletion analysis of a 2S seed storage protein promoter of Brassica napus in transgenic tobacco. Plant M o l . B io l . 23(4): 671-683. Staswick P .E . 1989. 
Developmental regulation and the influence of plant sinks on vegetative storage protein gene expresssion in soybean leaves. Plant Physiol. 89: 309-315. Stayton M . , Harpster M . , Brosio P. and Dunsmuir P. 1991. High-level seed-specific expression of foreign coding sequences in Brassica napus. Aus. J. Plant Physiol. 18(5): 507-518. Stewart W . N . and Rothwell G .W. 1993. Paleobotany and the Evolution of Plants. 2nd ed. Cambridge, U K : Cambridge University Press. Sun S .S .M. , Altenbach S.B. and Leung F .W. 1987. Properties, biosynthesis and processing of a sulfur-rich protein in Brazil nut (Bertholletia excelsa H . B . K . ) . Eur. J. Biochem 108: 477-481. Sun S .S .M. , Zuo W. , Tu H . M . and Xiong L . 1996. Plant proteins: Engineering for improved quality. In: Engineering plants for Commercial Products and Applications. Annals of the New Y o r k Academy of Sciences, v. 792. New Y o r k Academy of Sciences, New York. 178 Sundas A . , Tandre K . , Kvarnheden A . and Engstrom P. 1993. c D N A sequence and expression o f an intron-containing histone H 2 A gene from Norway spruce, Picea abies. Plant M o l . B io l . 21(4): 595-605. Svendsen I., Nicolova D . , Goshev I. and Genov N . 1994. Primary structure, spectroscopic and inhibitory properties of a two-chain trypsin inhibitor from the seeds of charlock (Sinapis arvensis L . ) , a member of the napin protein family. Int. J. Peptide Protein Res. 43: 425-430. Tabe L . M . , Higgins C M . , McNabb W . C . and Higgins T.J .V. 1993. Genetic engineering of grain and pasture legumes for improved nutritive value. Genetica 90: 181-200. Tabe L . M . , Wardley-Richardson T., Ceriotti A . , Aryan A . , McNabb W. , Moore A . and Higgins T.J .V. 1995. A biotechnological approach to improving the nutritive value of alfalfa. J. Animal Science 73(9): 2752-2759. Taylor C . B . 1997. Promoter fusion analysis: A n insufficient measure of gene expression. Plant Cell 9(3): 273-275. Tartof K . D . and Hobbs C A . 1987. 
Improved media for growing plasmid and cosmid clones. Focus 9: 12. Templeman T.S, Stein D . B . and Demaggio A . E . 1988. A fern spore storage protein is genetically similar to the 1.7S seed storage protein of Brassica napus. Biochem. Genet. 26(9-10): 595-604. Templeman T.S. , Demaggio A . E . and Stetler D . A . 1987. Biochemistry o f fern spore germination: Globulin storage proteins in Matteuccia struthiopteris L . Plant Physiol. 85(2): 343-349. Terras F . R . G . , Torrekens S., Van Leuven F., Osborn R.W. , Vanderleyden J., Cammue B . P . A . and Broekaert W.F . 1993 a. A new family of basic cysteine-rich plant antifungal proteins from Brassicaceae species. F E B S Lett. 316 (3): 233-240. Terras F . R . G . , Schoofs H . M . E . , Thevissen K . , Osborn R.W. , Vanderleyden J., Cammue B . P . A . and Broekaert W.F . 1993b. Synergistic enhancement of the antifungal activity of wheat and barley thionins by radish and oilseed rape 2S albumins and by barley trypsin inhibitors. Plant Physiol. 103 (4): 1311-1319. Terras F . R . G . , Schoofs H . M . E . , De Bolle M . F . C , Van Leuven F., Rees S.B., Vanderleyden J., Cammue B . P . A . and Broekaert W.F . 1992. Analysis of two novel classes of antifungal proteins from radish (Raphinus sativus L . ) seeds. J. B io l . Chem. 267: 15301 - 15309. Thomas T .L . 1993. Gene expression during plant embryogenesis and germination: A n overview. Plant Cell 5: 1401-1410. 179 Thorpe S C . , Kemeny D M . , Panzani R .C . , M c G u r l B . and Lord M . 1988. Allergy to castor bean. II. Identification of the major allergens in castor bean seeds. J. Allergy Clin. Immunol. 82(1): 67-72. Thoyts P.J .E. , Napier J .A., Millichip M . , Stobart A . K . , Griffiths W.T. , Tatham A . S . and Shewry P.R. 1996. Characterization of a sunflower seed albumin which associates with oil bodies. Plant Sci. 118(2): 119-125. Troitsky A . V . , Melekhovets Y . F . , Rakhimova G . M . , Bobrova V . K . , V a l i e j o - R o m a n K M . and Antonov A . 
S . 1991. Angiosperm origin and early stages of seed plant evolution deduced from r R N A sequence comparisons. J. M o l . Evol . 32(3): 253-261. van de Kle i H . , Van Damme J., Casteels P. and Krebbers E . 1993. A fifth 2S albumin isoform is present in Arabidopsis thaliana. Plant Physiol. 101: 1415-1416. Vandekerckhove J., Van Damme J., Van Lijsebettens M , Botterman J., De Block M . , Vandewiele M . , Declercq A . , Leemans J., Van Montagu M . and Krebbers E . 1989. Enkephalins produced in transgenic plants using modified 2S seed storage proteins. Biotech. 7: 929-932. Vincentz M . , Leite A . , Neshich G , Vriend G. , Mattar C , Barros L . , Weinberg D. , de Almeida E.R. , Paes de Carvalho M . , Aragao F. and Gander E.S. 1997. A C G T and vicilin core sequences in a promoter domain required for seed-specific expression of a 2S storage protein gene are recognized by the opaque-2 regulatory protein. Plant M o l . Bio l . 34: 879 - 889. Voelker T .A . , Hayes T.R., Cranmer A . M . , Turner J.C. and Davies H . M . 1996. Genetic engineering of a quantitative trait - metabolic and genetic parameters influencing the accumulation of laurate in rapeseed. Plant J. 9(2): 229-241. von Heijne G . 1986. A new method for predicting signal sequence cleavage sites. Nucl . Acid . Res. 14: 4683-4690. Wallace R . B . , Shaffer I , Murphy R.F. , Bonner I , Hirose T. and Itakura K . 1979. Hybridization of synthetic oligodeoxyribonucleotides to phi chi 174 D N A : the effect of single base pair mismatch. Nucl . Acid. Res. 6(11): 3543-3557. Webster F . B . , Roberts D.R. , Mclnnis S . M . and Sutton B . C . S . 1990. Propagation of interior spruce by somatic embryogenesis. Can. J. Forest. Res. 20: 1759-1765. Wen L . , Huang J .K. , Zen K . C . , Johnson B . H , Muthukrishnan S., MacKay V . , Manney T.R., Manney M . and Reeck G.R. 1992. Nucleotide sequence of a c D N A clone that encodes the maize inhibitor of trypsin and activated Hageman factor. Plant M o l . Bio l . 18(4): 813-814. 

Wessler S.R., Bureau T.E. and White S.E. 1995. LTR-retrotransposons and MITEs: important players in the evolution of plant genomes. Curr. Opin. Genet. Dev. 5: 814-821.

White O., Soderlund C., Shanmugan P. and Fields C. 1992. Information contents and dinucleotide compositions of plant intron sequences vary with evolutionary origin. Plant Mol. Biol. 19: 1057-1064.

Yasuda E., Ebinuma H. and Wabiko H. 1997. A novel glycine-rich/hydrophobic 16 kDa polypeptide gene from tobacco: similarity to proline-rich protein genes and its wound inducible and developmentally regulated expression. Plant Mol. Biol. 33: 667-678.

Youle R. and Huang A.H.C. 1981. Occurrence of low molecular weight and high cysteine containing albumin storage protein in oilseeds of diverse species. Am. J. Bot. 68: 44-48.

Youle R.J. and Huang A.H.C. 1979. Albumin storage proteins and allergens in cottonseed. J. Agri. Food Sci. 27: 500-503.

Youle R.J. and Huang A.H.C. 1978a. Albumin storage proteins in the protein bodies of castor bean. Plant Physiol. 61: 13-16.

Youle R.J. and Huang A.H.C. 1978b. Identification of the castor bean allergens as the albumin storage proteins in the protein bodies of castor bean. Plant Physiol. 61: 1040-1042.

Zhong P.-Y., Tanaka T., Yamauchi D. and Minamikawa T. 1997. A 28-kiloDalton pod storage protein of French bean plants. Plant Physiol. 113: 479-485.

Appendices

Appendix A  2S Albumins and Related Genes
Appendix B  Buffers and Solutions
            Bacterial and Plant Tissue-Culture Media
Appendix C  List of Suppliers
Appendix D  BLAST Sequence Alignment Results
Appendix E  Prediction of Protein Secondary Structure for PG2S

Appendix A
2S Albumins and Related Genes

No.  Species  Locus ID  Accession  Sequence  Size  Comment

Pteridophyta
1  Matteuccia struthiopteris  2S0ST2  S11351  AA  25  isolated from fern spores
2  Matteuccia struthiopteris  2S0ST  S11350  AA  42  isolated from fern spores
3  Matteuccia struthiopteris  2SSMAT  P17718  AA  67  isolated from fern spores

Gymnosperm
1  Picea abies  PASPI1GEN  X91487  mRNA  490  root specific gamma thionin
2  Picea glauca  PG2SSP  X63193  mRNA  712
3  Picea glauca  PGU92077  U92077  DNA  1907
4  Picea glauca  PGU92078  U92078  DNA  1675  pseudogene
5  Picea glauca  PIAEMB25R  L47745  mRNA  714
6  Pinus strobus  PSALB1A  X62433  mRNA  1049
7  Pinus strobus  PSALB2A  X62434  mRNA  709
8  Pinus strobus  PSALB3A  X62435  mRNA  683
9  Pinus strobus  PSALB4A  X62436  mRNA  665

Angiosperm
Dicots
1  Arabidopsis thaliana  A13820  A13820  DNA  1028
2  Arabidopsis thaliana  AT2S1  X87764  DNA  1896
3  Arabidopsis thaliana  AT2SALBGA  Z24744  DNA  2876  2 genes
4  Arabidopsis thaliana  AT2SALBGB  Z24745  DNA  4274
5  Arabidopsis thaliana  ATHAT2S1  M22032  DNA  1810
6  Arabidopsis thaliana  ATHAT2S2  M22034  DNA  1663
7  Arabidopsis thaliana  ATHAT2S3  M22035  DNA  798
8  Arabidopsis thaliana  ATHAT2S4  M22033  DNA  1211
9  Arabidopsis thaliana  ATS1GENE  X87764  DNA  1896
10  Arabidopsis thaliana  ATTS0239  Z17665  mRNA  496  Bowman-Birk PI
11  Arachis hypogaea  ARQALLII  L77197  mRNA  717  peanut allergen II
12  Bertholletia excelsa  BE2S1  X54490  DNA  991  intron
13  Bertholletia excelsa  BE2S2  X54491  DNA  1028  intron
14  Bertholletia excelsa  BE2SG1  X57027  DNA  441
15  Bertholletia excelsa  BE2SG2  X57028  DNA  530
16  Bertholletia excelsa  A13818  A13818  mRNA  574
17  Bertholletia excelsa  BRNALB2S1  M80399  mRNA  563
18  Bertholletia excelsa  BRNALB2S2  M80400  mRNA  590
19  Brassica campestris  BC2SSP  X65037  DNA  537
20  Brassica campestris  BNANAPINB  M64632  DNA  4329
21  Brassica campestris  BNANAPINC  M64633  DNA  4338
22  Brassica campestris  BNANAPINA  M64631  mRNA  733
23  Brassica carinata  BNC2SSP  X74813  DNA  537
24  Brassica juncea  BJ2SCG  X65972  DNA  537
25  Brassica juncea  BJ2SSP  X65040  DNA  537
26  Brassica juncea  BJ2SSTP  X67833  DNA  1137
27  Brassica napus  2SS1_BRANA  P24565  AA  110
28  Brassica napus  BNANAPA  J02798  DNA  3289
29  Brassica napus  BNASSPC  J02782  DNA  1026
30  Brassica napus  BNNAP1  X17542  DNA  1731
31  Brassica napus  BNNAPB  X58142  DNA  1257
32  Brassica napus  BNNAPIN  X14492  DNA  1993
33  Brassica napus  BNANAP  J02586  mRNA  718
34  Brassica napus  BNASSPA  K01544  mRNA  566
35  Brassica napus  BNASSPB  K01545  mRNA  713
36  Brassica napus  BNU04943  U04943  mRNA  715
37  Brassica napus  BNU04944  U04944  mRNA  715
38  Brassica napus  BNU04945  U04945  mRNA  715
39  Brassica nigra  BN2SCG  X65971  DNA  537
40  Brassica nigra  BN2SSP  X65039  DNA  537
41  Brassica oleracea  BO2SCG  X65970  DNA  537
42  Brassica oleracea  BO2SSP  X65038  DNA  537
43  Brassica oleracea  BO2SSPR  X70333  DNA  1137
44  Brassica rapa  BC2SCG  X65969  DNA  537
45  Capparis masaikai  2SS1_CAPMA  P80351  AA  104
46  Capparis masaikai  2SS2_CAPMA  P30233  AA  105
47  Capparis masaikai  2SS3_CAPMA  P80352  AA  104
48  Capparis masaikai  2SS4_CAPMA  P80353  AA  100
49  Cucurbita spp.  CUCP2SA  D16560  mRNA  532
50  Glycine max  GMDIIPIS  X68707  DNA  1607  Bowman-Birk PI
51  Glycine max  GMBBPI  X68704  mRNA  688  Bowman-Birk PI
52  Glycine max  GMCIIPI  X68705  mRNA  422  Bowman-Birk PI
53  Glycine max  GMDIIPI  X68706  mRNA  447  Bowman-Birk PI
54  Glycine max  GMSE60  Z13956  mRNA  434  Bowman-Birk PI
55  Glycine max  GMU11260  U11260  mRNA  294  Bowman-Birk PI
56  Glycine max  SOYBBIM  K01968  mRNA  208  Bowman-Birk PI
57  Glycine max  SOYCIIPI  K01967  mRNA  645  Bowman-Birk PI
58  Gossypium hirsutum  COTMAT5A  M86213  DNA  5291
59  Gossypium hirsutum  COTASP  M83301  mRNA  570
60  Helianthus annuus  2SS8_HELAN  P23110  AA  141
61  Helianthus annuus  HAG5ALB2  X06410  DNA  2299  intron
62  Helianthus annuus  HAO  not submitted  mRNA  1167  associated with oil bodies
63  Helianthus annuus  HA2SALB  X76101  mRNA  1167
64  Helianthus annuus  HASF8  X56686  mRNA  600
65  Juglans regia  JRU66866  U66866  mRNA  649
66  Lupinus angustifolius  LAU74383  U74383  DNA  246  leginsulin-like protein
67  Lupinus angustifolius  LACONGLD  X53523  mRNA  684  sulfur rich protein
68  Mirabilis jalapa  A25776  A25776  DNA  262  biocidal
69  Nicotiana tabacum  D86721  D86721  DNA  530  glycine rich protein
70  Pisum sativum  PEAABN1  M13709  DNA  949  intron
71  Pisum sativum  PEAALBUM11  M81864  DNA  2148  intron
72  Pisum sativum  PEAABN1M  M13790  mRNA  609
73  Pisum sativum  PEAABN2  M13791  mRNA  957
74  Pisum sativum  PEAALBUMAJ  M17147  mRNA  957
75  Raphanus sativus  A25773  A25773  DNA  572  biocidal
76  Raphanus sativus  RADNAPA  M63841/M36628  mRNA  623
77  Raphanus sativus  RADNAPB  M63842/M366229  mRNA  580
78  Raphanus sativus  RADNAPC  M63843/M36630  mRNA  653
79  Ricinus communis  2SS_RICCO  P01089  AA  258
80  Ricinus communis  RC2SALBG  X54158  DNA  1591  codes for two heterodimers
81  Sinapis alba  S54101  S54101  DNA  435  major allergen Sin a I
82  Sinapis alba  SASIN1  X91799  DNA  435  allergen
83  Sinapis alba  SASIN2  X91800  DNA  435  allergen
84  Sinapis alba  SASIN3  X91801  DNA  435  allergen
85  Sinapis alba  SASIN4  X91802  DNA  435  allergen
86  Sinapis alba  SASIN5  X91798  DNA  435  allergen
87  Solanum tuberosum  POTTRPI  M22140  mRNA  470  Bowman-Birk P.I., from tuber
88  Solanum tuberosum  ST322R  X13180  mRNA  470  Bowman-Birk P.I., from tuber
89  Vigna unguiculata  VUTRYPIFV  X51617  mRNA  583  Bowman-Birk PI

Monocot
1  Avena sativa  ASTSSPA  J05486  DNA  1548
2  Avena sativa  ASTSSPA  J05486  DNA  1548
3  Avena sativa  ASTAVE  M38446  mRNA  758  endosperm
4  Avena sativa  ASTAVEA  M38721  mRNA  725
5  Avena sativa  ASTAVEB  M38722  mRNA  889
6  Avena sativa  ASTAVEC  M83381  mRNA  875
7  Coix lacryma-jobi  CLBCOIX  X79885  DNA  1643
8  Hordeum vulgare  BLYHORA  M23080  DNA  1805  alpha-hordothionin
9  Hordeum vulgare  BLYBTH7T  L36883  DNA  2259  thionin
10  Hordeum vulgare  BLYBTH6T  L36882  DNA  2406  thionin
11  Hordeum vulgare  BLYG1HORDA  M36378  DNA  1614
12  Hordeum vulgare  HVB1HOR2  X87232  DNA  1775
13  Hordeum vulgare  HVB1HORG  X03103  DNA  2900
14  Hordeum vulgare  HVBH031  X53690  DNA  1255
15  Hordeum vulgare  HVDNAHOR3  X84368  DNA  1859
16  Hordeum vulgare  HVGHRDSP  X13508  DNA  1614
17  Hordeum vulgare  HVHOR117  X60037  DNA  1420
18  Hordeum vulgare  HVNREHTH  X63357/X67707  DNA  1654  endosperm specific alpha-thionin
19  Hordeum vulgare  BLYHOR3  D82941  mRNA  2296
20  Hordeum vulgare  BLYTHNA  M19046  mRNA  540  leaf specific thionin
21  Hordeum vulgare  HVB1HOR1  X01024/M23835  mRNA  552
22  Hordeum vulgare  HVB1HORD  X01778  mRNA  868
23  Hordeum vulgare  HVB3HORD  X01777  mRNA  954
24  Hordeum vulgare  HVGA3HOR  X72628  mRNA  973
25  Oryza longistaminata  OLPROLAI  X51488  DNA  349
26  Oryza rufipogon  ORPROLA1  X51487  DNA  332
27  Oryza rufipogon  ORPROLA2  X51489  DNA  343
28  Oryza rufipogon  ORPROLA3  X51490  DNA  339
29  Oryza rufipogon  ORPROLA4  X51491  DNA  344
30  Oryza rufipogon  ORPROLA5  X51492  DNA  328
31  Oryza rufipogon  ORPROLA6  X51493  DNA  330
32  Oryza rufipogon  ORPROLA7  X51494  DNA  336
33  Oryza rufipogon  ORPROLA8  X51495  DNA  343
34  Oryza sativa  D50643  D50643  DNA  1773
35  Oryza sativa  OS10KDPRO  X81970  DNA  525
36  Oryza sativa  OSPR010  X17074  DNA  1546
37  Oryza sativa  OSRP3  X71981  DNA  1836
38  Oryza sativa  OSRP6G  X65064  DNA  2200
39  Oryza sativa  RIC13KDAP  D63901  DNA  2993
40  Oryza sativa  RICPROLA  L36604  DNA  525
41  Oryza sativa  RICPROLB  L36605  DNA  525
42  Oryza sativa  RICRAG1  D11433  DNA  1286  rice allergenic
43  Oryza sativa  OSGLUT21  X14393  mRNA  1632  ?? maybe not
44  Oryza sativa  OSPRL  X14392  mRNA  603
45  Oryza sativa  OSPROLAM  X60979/S45531  mRNA  629
46  Oryza sativa  OSPROLAMI  X15231  mRNA  652
47  Oryza sativa  OSRNAPROL  X84649  mRNA  533
48  Oryza sativa  OSU76004  U76004  mRNA  1050  Bowman-Birk PI
49  Oryza sativa  RICPROL17A  M23745  mRNA  650  prolamine
50  Oryza sativa  RICRA14  D11432  mRNA  618  rice allergenic
51  Oryza sativa  RICRA17  D11431  mRNA  636  rice allergenic
52  Oryza sativa  RICRA5  D11430  mRNA  618  rice allergenic
53  Oryza spp.  S72972  S72972  DNA  728  promoter only, may not be 2S
54  Secale cereale  SCSECGR  X02602  mRNA  714
55  Sorghum bicolor  SVGKAF  X62481  DNA  2647  gamma-kafirin, ? homology
56  Sorghum bicolor  SRGENSPMRN  M73688  mRNA  848  gamma-kafirin, ? homology
57  Triticum aestivum  TAAGCNN16  X54517  DNA  3581  alpha gliadin pseudogene
58  Triticum aestivum  TAAGCNN17  X54689  DNA  3566  alpha gliadin pseudogene
59  Triticum aestivum  TAAGCNN35  X54688  DNA  3573  alpha gliadin pseudogene
60  Triticum aestivum  TAGLIAA  X01130  DNA  2347  gliadin
61  Triticum aestivum  TAGLIAG1  X02538  DNA  1679  alpha/beta gliadin
62  Triticum aestivum  TAGLIAG2  X02539  DNA  1672  alpha/beta gliadin
63  Triticum aestivum  TAGLIAG3  X02540  DNA  1753  alpha/beta gliadin
64  Triticum aestivum  TAGLU1D1  X13306/X06334  DNA  3165  LMW glutenin
65  Triticum aestivum  TAGLU1DG  X03041  DNA  3095  HMW glutenin
66  Triticum aestivum  TAU08287  U08287  DNA  6115  alpha gliadin
67  Triticum aestivum  TAU50984  U50984  DNA  1758  alpha-gliadin
68  Triticum aestivum  TAU51302  U51302  DNA  2213  alpha gliadin pseudogene
69  Triticum aestivum  TAU51303  U51303  DNA  7644  alpha gliadin
70  Triticum aestivum  TAU51304  U51304  DNA  2809  alpha gliadin
71  Triticum aestivum  TAU51305  U51305  DNA  3534  alpha gliadin
72  Triticum aestivum  TAU51306  U51306  DNA  3034  alpha gliadin
73  Triticum aestivum  TAU51307  U51307  DNA  3022  alpha gliadin
74  Triticum aestivum  TAU51308  U51308  DNA  1799  alpha gliadin pseudogene
75  Triticum aestivum  TAU51309  U51309  DNA  1997  alpha gliadin pseudogene
76  Triticum aestivum  TAU51310  U51310  DNA  3019  alpha gliadin pseudogene
77  Triticum aestivum  WHTAGMP  D84341  DNA  780  alpha-gliadin
78  Triticum aestivum  WHTAGMP  D84341  DNA  780  alpha gliadin
79  Triticum aestivum  WHTGGLN  M36999  DNA  2086  gamma gliadin
80  Triticum aestivum  WHTGGMPA  D78183  DNA  840  gamma gliadin
81  Triticum aestivum  WHTGLGAP  M13712  DNA  2450  gamma gliadin
82  Triticum aestivum  WHTGLGB  M13713  DNA  2450  gamma gliadin
83  Triticum aestivum  WHTGLIABB  K03074  DNA  3043  alpha/beta gliadin
84  Triticum aestivum  WHTGLIABD  K03075  DNA  3310  alpha/beta gliadin
85  Triticum aestivum  WHTGLIABE  K03076  DNA  3022  alpha/beta gliadin class 1
86  Triticum aestivum  TAAGLIA  X17361/S51000  mRNA  994  alpha/beta gliadin
87  Triticum aestivum  TAGLIA  X00627/K03076  mRNA  3022  gliadin
88  Triticum aestivum  TAPURD1A  X70666  mRNA  567  endosperm specific alpha-thionin
89  Triticum aestivum  WHTGLIA  K02068  mRNA  1152  alpha gliadin
90  Triticum aestivum  WHTGLIABA  M10092  mRNA  1102  alpha/beta gliadin class A-II
91  Triticum aestivum  WHTGLIABC  M11073  mRNA  1156  alpha/beta gliadin class A-V
92  Triticum aestivum  WHTGLIABF  M11074  mRNA  950  alpha/beta gliadin class A-I
93  Triticum aestivum  WHTGLIABG  M11075  mRNA  1126  alpha/beta gliadin class A-IV
94  Triticum aestivum  WHTGLIABH  M11076  mRNA  1039  alpha/beta gliadin class A-III
95  Triticum aestivum  WHTGLIB  K02069  mRNA  725  alpha gliadin
96  Triticum aestivum  WHTGLIGBB  M11336  mRNA  1130  gamma gliadin class B-I
97  Triticum aestivum  WHTGLIGBC  M11335  mRNA  927  gamma gliadin class B-III
98  Triticum aestivum  WHTGLIGP  M16060  mRNA  798  gamma gliadin
99  Triticum durum  TDGAGL  X53412  DNA  842  gamma gliadin
100  Triticum urartu  WHTGLNA  M16496  DNA  3179  alpha/beta gliadin
101  Zea mays  MZEGLUT2E  M16066  DNA  1857
102  Zea mays  MZEME15G  M13507  DNA  1240
103  Zea mays  MZEZEIN10K  M23537  DNA  2562
104  Zea mays  MZEZEINP  M72708  DNA  2085
105  Zea mays  MZEZEISP  M33830  DNA  215  5' flank only
106  Zea mays  ZMZC1  X53515  DNA  3864
107  Zea mays  ZMZC2ZEI  X53514  DNA  2975
108  Zea mays  ZMZEIN27  X58197  DNA  3108
109  Zea mays  ZMZEINPR  X63667  DNA  1452
110  Zea mays  MZEZE15A3  M12147  mRNA  851
111  Zea mays  MZEZE16  M16460  mRNA  800
112  Zea mays  MZEZEG  M16218  mRNA  909
113  Zea mays  ZMWIP1  X71396  mRNA  601  Bowman-Birk, wound induced

Appendix B
Stock Solutions and Buffers

Alkaline SDS solution (40 ml)
distilled water  37.2 ml
10 N NaOH  0.8 ml
20% SDS  2.0 ml
Make this solution just prior to use.

Agrobacterium Wash Solution
0.5 M NaCl
50 mM Tris
10 mM EDTA
0.1% N-lauroyl sarcosine

CTAB (cetyl trimethylammonium bromide) Extraction Buffer I (for Spruce)
1% PEG 1000
100 mM Tris
1.4 M NaCl
2% CTAB
20 mM EDTA
0.3% SLS

CTAB Extraction Buffer II (for Tobacco) (Doyle and Doyle, 1990)
2% CTAB
1.4 M NaCl
0.2% β-mercaptoethanol
20 mM EDTA
100 mM Tris HCl (pH 8.0)

Deaza Dideoxy Nucleotide Mix Formulations for Taq DNA polymerase Sequencing
Component  d/ddGTP Mix  d/ddATP Mix  d/ddTTP Mix  d/ddCTP Mix
ddGTP  25 µM  -  -  -
ddATP  -  350 µM  -  -
ddTTP  -  -  300 µM  -
ddCTP  -  -  -  160 µM
7-deaza dGTP  25 µM  250 µM  250 µM  250 µM
dATP  250 µM  25 µM  250 µM  250 µM
dTTP  250 µM  250 µM  25 µM  250 µM
dCTP  250 µM  250 µM  250 µM  25 µM

FAA (formalin acetic acid) Fixative
5 ml glacial acetic acid
2 ml formaldehyde
100 ml ethanol
88 ml distilled water

Ficoll tracking dye (for DNA gels)
0.25% bromophenol blue
0.25% xylene cyanol FF
15% Ficoll (Type 400, Pharmacia)
dissolved in distilled water

Formamide Loading Dye (for RNA gels)
10 - 20 µl 10% xylene cyanol
10 - 20 µl 10% bromophenol blue
1 ml formamide
Adjust degree of colour to personal preference. Store at -20 °C.

GUS extraction buffer for MUG Assay (100 ml)
Component  Stock  Volume
50 mM NaHPO4, pH 7.0  0.1 M  50 ml
10 mM DTT  1.0 M  1 ml
1 mM EDTA  0.5 M  0.2 ml
0.1% Triton X-100  10%  1 ml

10X Labeling buffer - for Random Labeling of Probes
250 mM Tris-HCl, pH 8.0
25 mM MgCl2
10 mM DTT
1 mM HEPES, pH 6.6

Lysozyme Solution
50 mM glucose  18.02 g
25 mM Tris-HCl  3.94 g
10 mM EDTA  3.72 g
distilled water  to 1 litre
Adjust pH to 8.0. Filter sterilize and store at 4 °C. Lysozyme (Sigma) is added to a 10 mg/ml final concentration just prior to use.

10X MOPS Buffer
84 g Na MOPS
3.8 g Na2EDTA
1 l distilled water
11 ml glacial acetic acid
Autoclave before use.
or
0.2 M MOPS (209.38 g/mole)  42 g
80 mM NaAcetate (MW 82.03)  6.6 g
800 ml dH2O
pH to 7 with 5 N NaOH
0.5 M EDTA  20 ml
fill to 1 l, add 1 ml of DEPC for 2 hours
Autoclave

4-MU (4-methyl-umbelliferone) Stock
1 mM 4-MU  10 mg/50 ml
store at -20 °C

MUG (4-methyl-umbelliferone glucuronide) Stock
50 mM MUG  18 mg/ml
make fresh

MUG Stop Buffer
0.2 M Na2CO3  21.2 g/l

Northern Blot Pre-Hybridization Stock Solutions
i, 20% SDS in sterile DEPC treated distilled water.
ii, 2 M NaH2PO4 (27.6 g) in 4 mM EDTA (0.8 ml of 0.5 M EDTA), pH 7.2 with 10 N NaOH
iii, BSA 50 mg/ml stock in autoclaved distilled water

5 M Potassium Acetate (Alkaline lysis protocol)
potassium acetate  294.0 g
glacial acetic acid  115.0 ml
distilled water  to 1 litre
The pH should be approximately 5.
Autoclave

RNase A (DNase-free)
Dissolve 10 mg/ml in 10 mM Tris-HCl (pH 7.5), 15 mM NaCl. Boil for 15 minutes, then cool slowly to room temperature. Store at -20 °C.

RNase-free distilled Water
Add diethyl pyrocarbonate (DEPC) to 0.1%, mix well and let stand overnight. Remove residual DEPC by autoclaving for 15 minutes.

S1 nuclease mix (use fresh)
172 µl distilled water
27 µl S1 7.4X buffer
60 units S1 nuclease

S1 nuclease 7.4X buffer
0.3 M potassium acetate
2.5 M NaCl
10 mM ZnSO4
50% glycerol

Salmon Sperm DNA (sheared) (Sambrook et al, 1989)
salmon sperm DNA, sodium salt (type III, Sigma)  10 mg/ml
dissolve completely in distilled water, add NaCl to 0.1 M
extract once with phenol, once with phenol:chloroform
The DNA in the aqueous phase is sheared by passing it rapidly through a 17 gauge needle 12 times. DNA is precipitated with 2 volumes of ice cold ethanol and redissolved to a 10 mg/ml concentration. The sheared DNA solution is denatured by boiling for 10 minutes and stored at -20 °C. Before being added to hybridization solutions it is boiled for 5 minutes and snap cooled on ice.

Sequencing Stop Buffer
10 mM NaOH
95% formamide
0.05% bromophenol blue
0.05% xylene cyanol

SM Phage Dilutant (Sambrook et al, 1989)
0.1 M NaCl  5.8 g/l
10 mM MgSO4·7H2O  2 g/l
50 mM Tris HCl, pH 7.5  50 ml of 1 M stock
0.01% gelatin  5 ml of a 2% stock
distilled water  to 1 litre

20X SSC
3 M NaCl  175.3 g
0.3 M Na3citrate·2H2O  88.2 g
distilled water  to 800 ml
1 M HCl  pH 7.0
distilled water  to 1 litre

20X SSPE
3.6 M NaCl  210.2 g
200 mM NaH2PO4·H2O  27.6 g
20 mM EDTA  40 ml of 0.5 M stock, pH 8.0
distilled water  to 1 litre
10 N NaOH  pH 7.4

STET
8% glucose
0.5% Triton X-100
50 mM EDTA (pH 8.0)
10 mM Tris (pH 8.0)

50X TAE Buffer
Tris Base  242 g
glacial acetic acid  57.1 ml
0.5 M EDTA, pH 8.0  100 ml
distilled water  to 1 litre
adjust pH to 8.5
used 1X for agarose gels and running buffer

10X TBE Buffer
Tris Base  108 g
Boric acid  55 g
0.5 M EDTA, pH 8.0  40 ml
distilled water  to 1 litre
vacuum filter through Whatman #1, if using for sequencing gels or buffers
used 1X for agarose gels and running buffer
NaOH to adjust pH to 8.0
autoclave.

Tris Equilibrated Phenol
Melt frozen phenol stock in 68 °C water bath. Add hydroxyquinoline to a final concentration of 0.1%. Saturate the phenol with an equal volume of 0.5 M Tris HCl (pH 8.0), by stirring at room temperature on a magnetic stirrer. Allow the phases to separate and remove the upper aqueous layer. Equilibrate the phenol by extracting 3 times with an equal volume of 0.1 M Tris HCl (pH 7.6). Store with an overlay (0.1 volume) of 0.1 M Tris HCl, in dark glass bottles at 4 °C.

TE (Tris-EDTA) Buffer
Tris-HCl  1.2 g/l
0.5 M EDTA  2.0 ml/l

Bacterial Media

523 Medium (Kado et al, 1972)
sucrose  10 g
casein, enzymatic hydrolysate (Sigma)  8 g
yeast extract  4 g
K2HPO4  3 g
distilled water  to 1 litre
adjust pH to 7.0
Bacto agar (Difco)  20 g/l
after autoclaving, cool to 55 °C and add:
sterile 1 M MgSO4  1.22 ml

925 Minimal Medium
sucrose  10 g
K2HPO4  3 g
NaH2PO4  1 g
NH4Cl  1 g
distilled water  to 1 litre
agar  20 g/l
after autoclaving, cool and add:
sterile 1 M MgSO4  1.22 ml

LB Medium
Bacto tryptone (Difco)  10 g
Bacto yeast extract (Difco)  5 g
NaCl  10 g
distilled water  to 1 litre
adjust pH to 7.0
Bacto agar (Difco)  15 g/l

SOC Medium
Bacto tryptone  20 g
Bacto yeast extract  5 g
NaCl  0.5 g
250 mM KCl  10 ml
distilled water  to 980 ml
adjust pH to 7.0
autoclave and cool:
2 M MgCl2 (sterile)  5 ml/l
1 M glucose (filter sterilized)  20 ml

TB Medium (Sambrook et al, 1989)
Bacto tryptone (Difco)  10 g/l
NaCl  5 g/l
distilled water  to 1 l
agar  15 g/l
autoclave, cool to 55 °C
0.34% Vitamin B1 (filter sterile)  5 ml

TB Top Agar
as above but reduce agar to 0.6%
store at 4 °C, as 100 ml aliquots.

Terrific Broth (Tartof and Hobbs, 1987)
Bacto tryptone (Difco)  12 g
Bacto yeast extract (Difco)  24 g
glycerol  4 ml
distilled water  to 900 ml
autoclave, cool to 60 °C
sterile potassium phosphate solution  100 ml

Terrific Broth Potassium Phosphate solution
0.17 M KH2PO4  23.14 g
0.72 M K2HPO4  125.41 g
Make up to 1 litre with distilled water and autoclave

YEB Medium (= YN medium)
Beef extract  5 g
peptone  5 g
sucrose  5 g
yeast extract  1 g
MgSO4·7H2O  0.5 g
distilled water  1 l
Bacto agar  15 g/l

YEP Medium
Bacto peptone (Difco)  10 g
Bacto yeast extract  10 g
NaCl  5 g
distilled water  to 1 litre
Bacto agar  15 g/l
after autoclaving, cool:
biotin* (100 µg/ml)  20 µl
(*not necessary for growth of all Agrobacterium strains)

YT Medium
Bacto tryptone (Difco)  8 g
Bacto yeast extract (Difco)  5 g
NaCl  5 g
distilled water  to 1 litre
adjust pH to 7.0
Bacto agar (Difco)  15 g/l

Plant Tissue Culture Media

½ LM Medium (Modified* from Litvay et al., 1985)

LM Stock Solutions
*Macros and Micros 1/2 strength of original LM medium

½ LM Macros (10X)
KNO3  9.5 g/l
NH4NO3  8.25 g/l
KH2PO4  1.70 g/l
MgSO4·7H2O  9.25 g/l
CaCl2·2H2O  0.11 g/l

½ LM Micros (100X)
MnSO4·4H2O  1.35 g/l
ZnSO4·7H2O  2.15 g/l
H3BO3  155 g/l
KI  207.5 mg/l
Na2MoO4·2H2O  62.5 mg/l
CuSO4·5H2O  25.0 mg/l
CoCl2·6H2O  6.5 mg/l

½ LM FeEDTA (100X)
FeSO4·7H2O  2.78 g/l  dissolved in 400 ml hot distilled water
Na2EDTA·2H2O  3.73 g/l  dissolved in 400 ml hot distilled water
Combine solutions, cool, and make up to 1 litre.

½ LM Vitamins (100X)
store aliquots at -20 °C
myo-inositol  10 g/l
thiamine-HCl  10 mg/l
nicotinic acid  50 mg/l
pyridoxine-HCl  10 mg/l

½ LM Glutamine
glutamine  25 g/l

½ LM Hormone Stock Solutions

½ LM BA
6-benzylaminopurine  112.7 mg/l
dissolve with HCl

½ LM 2,4-D
2,4-dichlorophenoxyacetic acid  221.2 mg/l
dissolve with KOH

½ LM ABA 2.0 mM
abscisic acid  528.6 mg/l
dissolve with KOH

½ LM IBA 0.1 mM
3-indolebutyric acid  20.32 mg/l
dissolve with ethanol

Stock Solution  Maintenance (½ LM)  Pre-Maturation (½ LM charcoal)  Maturation (½ LM 60:1)  Germination (½ LMHF)
Macros  100 ml  100 ml  100 ml  100 ml
Micros  10 ml  10 ml  10 ml  10 ml
FeEDTA  10 ml  10 ml  10 ml  10 ml
Vitamins  10 ml  10 ml  10 ml  10 ml
Hormones  10 ml BA  -  30 ml ABA  -
Hormones  10 ml 2,4-D  -  10 ml IBA  -
Casein (1)  1 g  1 g  1 g  1 g
Charcoal (2)  -  10.0 g  -  -
Sucrose  10.0 g  34.0 g  34.0 g  20.0 g
pH  5.8  5.8  5.8  5.8
Agar (3)  6.36 g  8.0 g  6.36 g  6.36 g
Glutamine (4)  20 ml  20 ml  20 ml  20 ml
1. Enzymatic Hydrolysate of Casein (Sigma).
2. Activated charcoal (Sigma).
3. Noble Agar (Difco).
4. Filter sterilized into cooled media.

MS medium (Murashige and Skoog, 1962)

MS Stock Solutions

MS Macros (10X)
NH4NO3  16.5 g/l
KNO3  19.0 g/l
CaCl2·2H2O  4.4 g/l
MgSO4·7H2O  3.7 g/l
KH2PO4  17 g/l

MS Micros (500X)
H3BO3  3.1 g/l
MnSO4·4H2O  11.15 g/l
ZnSO4·7H2O  4.31 g/l
KI  415 mg/l
Na2MoO4·2H2O  125 mg/l
CuSO4·5H2O  12.5 mg/l
CoCl2·6H2O  12.5 mg/l

MS FeEDTA (100X)
FeSO4·7H2O  2.78 g  dissolved in 400 ml hot distilled water
Na2EDTA·2H2O  3.73 g  dissolved in 400 ml hot distilled water
Mix the two solutions, cool, and bring up to 1 l in volume.

MS Vitamins (100X)
store aliquots at -20 °C
nicotinic acid  500 mg/l
pyridoxine-HCl  500 mg/l
thiamine-HCl  100 mg/l
myo-inositol  100 g/l

MS Glycine (100X)
glycine  200 mg/l

MS Hormone Stock Solutions

Shoot Induction Medium (SIM) Hormones:

MS-SIM 6-Benzylaminopurine (BA) (Sigma) (100X)
6-benzylaminopurine  200 mg/l
dissolve by adding HCl

MS-SIM 1-Naphthaleneacetic acid (NAA) (Sigma) (100X)
1-naphthaleneacetic acid  10 mg/l
dissolve by adding KOH

Callus Induction Medium (CIM) Hormones:
MS-CIM BA (100X)
6-benzylaminopurine                 20 mg/l    dissolve by adding HCl

MS-CIM NAA (100X)
1-naphthaleneacetic acid            250 mg/l   dissolve by adding KOH

Tobacco Suspension Medium Hormone:

MS-2,4-dichlorophenoxyacetic acid (2,4-D) (1000X)
2,4-dichlorophenoxyacetic acid      221.2 mg/l

Stock Solution    Shoot Induction   Callus Induction   Rooting Medium   Suspension Medium
                  MS-SIM            MS-CIM             MS-HF            MS-2,4-D
Macros            100 ml            100 ml             100 ml           100 ml
Micros            2 ml              2 ml               2 ml             2 ml
Fe EDTA           10 ml             10 ml              10 ml            10 ml
Vitamins          10 ml             10 ml              10 ml            10 ml
Glycine           10 ml             10 ml              10 ml            10 ml
Hormones          10 ml SIM-BA      10 ml CIM-BA                        1 ml MS-2,4-D
Hormones          10 ml SIM-NAA     10 ml CIM-NAA
sucrose           30 g              30 g               30 g             30 g
pH                5.7               5.7                5.7              5.7
distilled water   to 1 litre        to 1 litre         to 1 litre       to 1 litre
Agar (1)          8.0 g/l           8.0 g/l            8.0 g/l          8.0 g/l

1. Agar (A-7002, Sigma)

Pollen Medium
sucrose                             50 g
distilled water                     1 l
Bacto agar                          6 g/l

Water Agar
distilled water                     1 l
Bacto agar                          15 g/l

Appendix C List of Suppliers and Manufacturers

Amersham Canada Ltd., 1166 South Service Road West, Oakville, Ont., Canada, L6L 9Z9, (800)-387-7146
American National Can, Neenah, WI, 54946
J.T. Baker, 222 Red School Lane, Phillipsburg, NJ, 08865 USA, (800) 582-2537
BDH Chemicals Canada Ltd., 60 East 4th Avenue, Vancouver, B.C., Canada, V5T 1E8, (604)-873-5121
Beckman Instruments, Inc., 1045 Tristar Drive, Mississauga, Ont., L5T 1W5, (800)-387-6799
Becton Dickinson Canada, Inc., 2464 S. Sheridan Way, Mississauga, Ont., Canada, L5J 2M8, (416)-822-4820
Bio-Rad Laboratories, 5671 McAdam Road, Mississauga, Ont., L4Z 1N9, (800)-268-0213
Clontech Laboratories, Inc., 4030 Fabian Way, Palo Alto, CA 94303-4607, (800)-662-2566
Difco, P.O. Box 1058A, Detroit, MI 48201 USA, (800)-521-0851
Dow Chemical Co., Mississauga, Ont., Canada
DuPont NEN - Research Products, 549 Albany St., Boston, MA 02118 USA, (800)-387-8391
Ericomp, 6044 Cornerstone Court West, Suite E, San Diego, CA, 92121 USA, (800) 541-8471
Flow Laboratories, Inc., 1760 Meyerside Drive, Unit 3, Mississauga, Ont., Canada, L5T 1A3, (416)-677-5910
Gibco/BRL Canada Inc., Life Technologies, 2270 Industrial Street, Burlington, Ont., Canada, L7P 1A1, (416)-335-2255
ICN Pharmaceuticals Inc., 3300 Hyland Rd., Costa Mesa, CA 92626 USA, Fax: (800)-854-0530
IntelliGenetics, 700 East Camino Real, Suite 300, Mountain View, CA 94040 USA, (800)-876-9994
Kodak (Canada) Inc., 3500 Eglinton Avenue West, Toronto, Ont., Canada, M6M 1V3, (416)-766-8233
Magenta Corp., Chicago, IL USA
New England Biolabs, Inc., 32 Tozer Rd, Beverly, MA 01915-5599 USA, (800)-632-5227
Nucleic Acid - Protein Service Laboratory, Biotechnology Laboratory, 6174 University Blvd, University of British Columbia, Vancouver, B.C., (604) 822-4570
Perkin-Elmer Cetus, 6335 Millcreek Drive, Mississauga, Ont., Canada, L5N 2M2, (800) 668-6913
Pharmacia Biotech, 500 Morgan Blvd., Baie d'Urfé, Que., Canada H9X 3V1, (800) 567-1008
Polaroid (Canada) Inc., 350 Carlingview Drive, Rexdale, Ont., Canada, M9W 5G6, (416)-675-3680
Promega, 2800 Woods Hollow Road, Madison, WI 53711-5899 USA, (800) 356-9526
Sarstedt Inc., 5655 Bois-Franc, St. Laurent, Que., Canada H4S 1B2, (514)-337-6908
Seedboro Equipment, Chicago, IL
Scott Laboratories, Inc., Carson, CA USA
Sigma-Aldrich Canada Ltd., 2149 Winston Park Drive, Oakville, Ontario L6H 6J8, (800) 565-1400
Stratagene, 11011 North Torrey Pines Road, La Jolla, CA 92037 USA, (800)-424-5444
Systat Inc., 444 North Michigan Ave., Chicago, IL 60611 USA, (312) 329-2400
Techne Inc., Princeton, NJ USA
UVP Inc., San Gabriel, CA, USA
Whatman Inc., 6 Just Road, Fairfield, NJ 07004 USA
Carl Zeiss Inc., Thornwood, NY 10594 USA, (914)-747-1800

Appendix D BLAST Sequence Alignment Results

blast@ncbi.NLM.NIH.
BLAST E-Mail Server <blast@ncbi.NLM.NIH.GOV>: mcinnis@decul2

The query sequence for this search has been filtered.
Filtering eliminates low complexity regions that commonly give spuriously high scores that reflect compositional bias rather than significant by-position alignment. Filtering can eliminate these potentially confounding matches (e.g., hits against proline-rich regions or poly-...) from the blast reports, leaving regions whose blast statistics reflect the specificity of their pairwise alignment.

BLAST 1.4.9MP [26-March-1996] [Build 14:27:01 Apr 1 1996]

Reference: Altschul, Stephen F., Warren Gish, Webb Miller, Eugene W. Myers, David Lipman (1990). Basic local alignment search tool. J. Mol. Biol. 215:403-10.

I. Query= UNKNOWN, INSERTION IN PSEUDOGENE #3 translation (213 letters)

Database: Non-redundant GenBank CDS translations+PDB+SwissProt+SPupdate+PIR
          260,745 sequences; 73,403,355 total letters.

                                                                     Smallest
                                                                       Sum
                                                              High  Probability
Sequences producing High-scoring Segment Pairs:              Score  P(N)      N

gnl|PID|e276474 (X99792) capacitative calcium entry chann...    59  0.9999    1

>gnl|PID|e276474 (X99792) capacitative calcium entry channel 1 [Bos taurus]
 Length = 981

 Score = 59 (27.9 bits), Expect = 8.9, P = 1.0
 Identities = 15/38 (39%), Positives = 20/38 (52%)

Query:     8 GLMNAERHIXFIKSTFNYKXNLIYLMAAFNHIDRSQIS 45
             GL   +  I FI  T +Y   L  L+ A  HIDRS ++
Sbjct:   358 GLFIRKPFIKFICHTASYLTFLFLLLLASQHIDRSDLN 395

Parameters:
  V=100  B=50  H=0  -filter=SEG  P=4  -ctxfactor=1.00  E=10

  Query                         ----- As Used -----    ----- Computed ----
  Frame  MatID  Matrix name     Lambda    K       H     Lambda    K       H
   +0      0    BLOSUM62        0.328   0.136   0.435   same    same    same

  Query
  Frame  MatID  Length  Eff.Length    E    S   W   T   X    E2    S2
   +0      0      213      161       10.  59   3  11  22   0.20   31

Statistics:
  Query          Expected         Observed          HSPs        HSPs
  Frame  MatID  High Score       High Score       Reportable   Reported
   +0      0    63 (29.8 bits)   59 (27.9 bits)       1           1

  Query         Neighborhd   Word       Excluded   Failed       Successful   Overlaps
  Frame  MatID    Words      Hits         Hits     Extensions   Extensions   Excluded
   +0      0      4747      18788135    4376884    14386121       25130         11

  Database: Non-redundant GenBank CDS translations+PDB+SwissProt+SPupdate+PIR
    Release date: May 31, 1997
    Posted date: 9:39 AM EDT May 31, 1997
  # of letters in database: 73,403,355
  # of sequences in database: 260,745
  # of database sequences satisfying E: 1
  No. of states in DFA: 564 (56 KB)
  Total size of DFA: 106 KB (128 KB)
  Time to generate neighborhood: 0.01u 0.00s 0.01t  Real: 00:00:00
  No. of processors used: 4
  Time to search database: 34.69u 0.32s 35.01t  Real: 00:00:09
  Total cpu time: 34.72u 0.36s 35.08t  Real: 00:00:09

II. Query= UNKNOWN, INSERTION IN PSEUDOGENE #4 translation (214 letters)

Database: Non-redundant GenBank CDS translations+PDB+SwissProt+SPupdate+PIR
          260,745 sequences; 73,403,355 total letters.

                                                                     Smallest
                                                                       Sum
                                                              High  Probability
Sequences producing High-scoring Segment Pairs:              Score  P(N)      N

gi|451235 omega-Grammotoxin SIA, omega-GsTx SIA=voltage-..      54  0.75      1
gi|1397257 (U61946) F47C12.5 gene product [Caenorhabditis..     56  0.996     2
gb|I00054| Sequence 2 from Patent US 4920196                    33  0.999     2

>gi|451235 omega-Grammotoxin SIA, omega-GsTx SIA=voltage-sensitive calcium channel responses peptidergic blocker [Grammostola spatulata=tarantula spiders, venom, Peptide, 36 aa]
 Length = 36

 Score = 54 (25.7 bits), Expect = 1.4, P = 0.75
 Identities = 10/35 (28%), Positives = 15/35 (42%)

Query:   127 CVXLWRADSENFXCXLRAFLESXFIRKFCIWFNSI 161
             CV  W   S+   C      +S + R  C+W  S+
Sbjct:     2 CVRFWGKCSQTSDCCPHLACKSKWPRNICVWDGSV 36

>gi|1397257 (U61946) F47C12.5 gene product [Caenorhabditis elegans]
 Length = 341

 Score = 56 (26.6 bits), Expect = 5.5, Sum P(2) = 1.0
 Identities = 9/20 (45%), Positives = 13/20 (65%)

Query:    46 WVRFIVHNFLGFIKASVNMF 65
             WV  I + F+GF  A++N F
Sbjct:   260 WVPIIYYTFIGFFNAAINNF 279

 Score = 35 (16.7 bits), Expect = 5.5, Sum P(2) = 1.0
 Identities = 6/17 (35%), Positives = 10/17 (58%)

Query:    33 EDARIFLWLQNQWVRF 49
             E+ L++ +N W RF
Sbjct:   124 ENRFYILMINKNMWTRF 140

>gb|I00054| Sequence 2 from Patent US 4920196
 Length = 41

 Score = 33 (15.7 bits), Expect = 6.5, Sum P(2) = 1.0
 Identities = 7/8 (87%), Positives = 7/8 (87%)

Query:   143 RAFLESXF 150
             RAFLES F
Sbjct:    28 RAFLESGF 35

 Score = 31 (14.8 bits), Expect = 6.5, Sum P(2) = 1.0
 Identities = 5/7 (71%), Positives = 7/7 (100%)

Query:   130 LWRADSE 136
             LWRA+S+
Sbjct:    21 LWRANSD 27

Parameters:
  V=100  B=50  H=0  -filter=SEG  P=4  -ctxfactor=1.00  E=10

  Query                         ----- As Used -----    ----- Computed ----
  Frame  MatID  Matrix name     Lambda    K       H     Lambda    K       H
   +0      0    BLOSUM62        0.330   0.144   0.448   same    same    same

  Query
  Frame  MatID  Length  Eff.Length    E    S   W   T   X    E2    S2
   +0      0      214      170       10.  59   3  11  22   0.21   31

Statistics:
  Query          Expected         Observed          HSPs        HSPs
  Frame  MatID  High Score       High Score       Reportable   Reported
   +0      0    63 (30.0 bits)   56 (26.6 bits)       5           5

  Query         Neighborhd   Word       Excluded   Failed       Successful   Overlaps
  Frame  MatID    Words      Hits         Hits     Extensions   Extensions   Excluded
   +0      0      4307      20865510    4460789    16381679       23042          6

  Database: Non-redundant GenBank CDS translations+PDB+SwissProt+SPupdate+PIR
    Release date: May 31, 1997
    Posted date: 9:39 AM EDT May 31, 1997
  # of letters in database: 73,403,355
  # of sequences in database: 260,745
  # of database sequences satisfying E: 3
  No. of states in DFA: 556 (55 KB)
  Total size of DFA: 102 KB (128 KB)
  Time to generate neighborhood: 0.01u 0.01s 0.02t  Real: 00:00:00
  No. of processors used: 4
  Time to search database: 37.52u 0.34s 37.86t  Real: 00:00:10
  Total cpu time: 37.55u 0.38s 37.93t  Real: 00:00:10

III. Query= UNKNOWN, INSERTION IN PSEUDOGENE #5 translation (214 letters)

Database: Non-redundant GenBank CDS translations+PDB+SwissProt+SPupdate+PIR
          260,745 sequences; 73,403,355 total letters.

                                                                     Smallest
                                                                       Sum
                                                              High  Probability
Sequences producing High-scoring Segment Pairs:              Score  P(N)      N

sp|P40941|ADT2_ARATH ADP,ATP CARRIER PROTEIN 2 PRECURSOR ...    60  0.998     1
prf||1908224A nucleotide translocator [Arabidopsis...           60  0.998     1
sp|P31167|ADT1_ARATH ADP,ATP CARRIER PROTEIN 1 PRECURSOR ...    59  0.9998    1

>sp|P40941|ADT2_ARATH ADP,ATP CARRIER PROTEIN 2 PRECURSOR (ADP/ATP TRANSLOCASE 2) (ADENINE NUCLEOTIDE TRANSLOCATOR 2) (ANT 2)
>pir||S29852 ADP,ATP carrier protein - Arabidopsis thaliana
>gi|16160 (X68592) adenosine nucleotide translocator [Arabidopsis thaliana]
 Length = 385

 Score = 60 (28.4 bits), Expect = 6.2, P = 1.0
 Identities = 15/40 (37%), Positives = 17/40 (42%)

Query:    19 LFNLKPXSDGYGLKMLAYSXSSSKINGSDLLFITFLDLSR 58
             LFN K   DGY         S      S LLF+  LD +R
Sbjct:   175 LFNFKKDKDGYWKWFAGNLASGGAAGASSLLFVYSLDYAR 214

>prf||1908224A nucleotide translocator [Arabidopsis thaliana]
 Length = 403

 Score = 60 (28.4 bits), Expect = 6.3, P = 1.0
 Identities = 15/40 (37%), Positives = 17/40 (42%)

Query:    19 LFNLKPXSDGYGLKMLAYSXSSSKINGSDLLFITFLDLSR 58
             LFN K   DGY         S      S LLF+  LD +R
Sbjct:   193 LFNFKKDKDGYWKWFAGNLASGGAAGASSLLFVYSLDYAR 232

>sp|P31167|ADT1_ARATH ADP,ATP CARRIER PROTEIN 1 PRECURSOR (ADP/ATP TRANSLOCASE 1) (ADENINE NUCLEOTIDE TRANSLOCATOR 1) (ANT 1)
>pir||S21313 ADP,ATP carrier protein - Arabidopsis thaliana (fragment)
>gi|16175 (X65549) adenylate translocator [Arabidopsis thaliana]
>prf||1909354A adenylate translocator [Arabidopsis thaliana]
 Length = 379

 Score = 59 (27.9 bits), Expect = 8.7, P = 1.0
 Identities = 15/40 (37%), Positives = 17/40 (42%)

Query:    19 LFNLKPXSDGYGLKMLAYSXSSSKINGSDLLFITFLDLSR 58
             LFN K   DGY         S      S LLF+  LD +R
Sbjct:   169 LFNFKKDRDGYWKWFAGNLASGGAAGASSLLFVYSLDYAR 208

Parameters:
  V=100  B=50  H=0  -filter=SEG  P=4  -ctxfactor=1.00  E=10

  Query                         ----- As Used -----    ----- Computed ----
  Frame  MatID  Matrix name     Lambda    K       H     Lambda    K       H
   +0      0    BLOSUM62        0.328   0.141   0.427   same    same    same

  Query
  Frame  MatID  Length  Eff.Length    E    S   W   T   X    E2    S2
   +0      0      214      166       10.  59   3  11  22   0.21   31

Statistics:
  Query          Expected         Observed          HSPs        HSPs
  Frame  MatID  High Score       High Score       Reportable   Reported
   +0      0    63 (29.8 bits)   60 (28.4 bits)       3           3

  Query         Neighborhd   Word       Excluded   Failed       Successful   Overlaps
  Frame  MatID    Words      Hits         Hits     Extensions   Extensions   Excluded
   +0      0      3291      20049046    4204983    15818457       25606          3

  Database: Non-redundant GenBank CDS translations+PDB+SwissProt+SPupdate+PIR
    Release date: May 31, 1997
    Posted date: 9:39 AM EDT May 31, 1997
  # of letters in database: 73,403,355
  # of sequences in database: 260,745
  # of database sequences satisfying E: 3
  No. of states in DFA: 551 (54 KB)
  Total size of DFA: 91 KB (128 KB)
  Time to generate neighborhood: 0.01u 0.01s 0.02t  Real: 00:00:00
  No. of processors used: 4
  Time to search database: 37.22u 0.32s 37.54t  Real: 00:00:10
  Total cpu time: 37.25u 0.36s 37.61t  Real: 00:00:10

Appendix E Prediction of Protein Secondary Structure for PG2S

Thu, 18 Sep 1997 22:39:55 GMT
To: mcinnis@decul2
Subject: Predict-Protein

The following information has been received by the server:

seed storage protein
HEDNMYGEEIQQQRRSCDPQRHPQRLSSCRDYLERRREQPSERCCEELQRMSPQCRCQAIQQMLDQSLSYDSFMD
SDSQEDTPLNQRRRRRREGRGRDEEEVMERAAYLPNTCNVREPPRRCDIQRHSRYFMTGSSFK

The alignment that has been used as input to the network is:

MAXHOM multiple sequence alignment

MAXHOM ALIGNMENT HEADER: ABBREVIATIONS FOR SUMMARY
ID:      identifier of aligned (homologous) protein
STRID:   PDB identifier (only for known structures)
PIDE:    percentage of pairwise sequence identity
WSIM:    percentage of weighted similarity
LALI:    number of residues aligned
NGAP:    number of insertions and deletions (indels)
LGAP:    number of residues in all indels
LSEQ2:   length of aligned sequence
ACCNUM:  SwissProt accession number
NAME:    one-line description of aligned protein

MAXHOM ALIGNMENT HEADER: SUMMARY
ID          STRID  IDE  WSIM  LALI  NGAP  LGAP  LEN2  ACCNUM  NAME
2ss1_picgl          96    94   137     1     1   172  P26986  2S SEED STORAGE-LIKE PROT
puia_wheat          35    36    86     3    43   148  P33432  PUROINDOLINE-A PRECURSOR.
iaa1_wheat          34    32    79     3    17   124  P01085  ALPHA-AMYLASE INHIBITOR
iaa5_wheat          33    27    79     3    17   124  P01084  ALPHA-AMYLASE INHIBITOR
iaab_horvu          31    35    80     2    23   149  P32936  SOLUBLE PROTEIN CMB).
ia16_wheat          31    35    80     2    23   143  P16159  SOLUBLE PROTEIN CM16).
2ss5_helan          31    33   122     4    42   295  P15461  2S SEED STORAGE PROTEIN
2ss_ricco           30    37    90     2    32   258  P01089  2S ALBUMIN PRECURSOR.
2ss8_helan          30    31    94     3    39   141  P23110  ALBUMIN 8 PRECURSOR

MAXHOM ALIGNMENT: IN MSF FORMAT

MSF of: /home/phd/server/work/predict_h25826_28326.hssp from: 1 to: 138
/home/phd/server/work/predict_h25826_28326.ret_msf  MSF: 138  Type: P  19-Sep-97 00:37:2  Check: 7052 ..

Name: predict_h258  Len: 138  Check: 6380  Weight: 1.00
Name: 2ss1_picgl    Len: 138  Check: 3874  Weight: 1.00
Name: puia_wheat    Len: 138  Check:  910  Weight: 1.00
Name: iaa1_wheat    Len: 138  Check: 2695  Weight: 1.00
Name: iaa5_wheat    Len: 138  Check: 3586  Weight: 1.00
Name: iaab_horvu    Len: 138  Check: 8365  Weight: 1.00
Name: ia16_wheat    Len: 138  Check: 9487  Weight: 1.00
Name: 2ss5_helan    Len: 138  Check:  896  Weight: 1.00
Name: 2ss_ricco     Len: 138  Check: 9093  Weight: 1.00
Name: 2ss8_helan    Len: 138  Check: 1766  Weight: 1.00

              1                                                   50
predict_h258  HEDNMYGEEI QQQRRSCDPQ RHPQRLSSCR DYLERRREQP SERCCEELQR
2ss1_picgl    HEDNMYGEEI QQQRRSCDPQ RDPQRLSSCR DYLERRREQP SERCCEELQR
puia_wheat                            ETKLNSCR NYLLDrcQEL LGECCSRLGQ
iaa1_wheat                               LPACR PLLRLqpEAV LRDCCQQLAH
iaa5_wheat                               LPGCR PLLKLqpEAV LRDCCQQLAD
iaab_horvu                             TPLPSCR DYVEQqpYLA KQQCCGELAN
ia16_wheat                             TPLPSCR DYVEQqpYLA KQQCCGELAN
2ss5_helan    .TTTIEDENP ISGQRQVSQR IQGQRLNQCR MFLQqqQEQQ LQQCCQELQN
2ss_ricco                PSQQGCRGQI QEQQNLRQCQ EYIKQqqERS LRGCCDHLKQ
2ss8_helan    .EENPYGRG. RTESGCYQQM EEAEMLNHCG MYLMkrEEDH KQLCCMQLKN

              51                                                 100
predict_h258  MSPQCRCQAI QQMLDQSLSY DSFMDSDSQE DTPLNQRRRR RREGRGRDEE
2ss1_picgl    MSPQCRCQAI QQMLDQSLSY DSFMDSDSQE DAPLNQRRRR R.EGRGREEE
puia_wheat    MPPQCRCNII QGSIQGDLGG IFGFQRDRAS
iaa1_wheat    ISEWCRCGAL YSMLDSM..Y KEHGAQEGQA GTGAFPRCRR E
iaa5_wheat    ISEWPRCGAL YSMLDSM..Y KEHGVSEGQA GTGAFPSCRR E
iaab_horvu    IPQQCRCQAL RFFMGRKSRP D          QSGLM      ELPGCPREVQ
ia16_wheat    IPQQCRCQAL RYFMGPKSRP D          QSGLM      ELPGCPREVQ
2ss5_helan    IEGQCQCEAV KQVFREA... QQ         QVQQQQGRQL VPFRGSQQTQ
2ss_ricco     MQSQCRCEGL RQAIEQQQS.                       QGQLQGQDVF
2ss8_helan    LDEKCMCPAI MMMLNEPMWI                       RMRD

              101                                    138
predict_h258  EVMERAAYLP NTCNVREPPR RCDIQRHSRY FMTGSSFK
2ss1_picgl    EAMERAAYLP NTCNVREPPR RCDIQRHSRY SMTGSSFK
puia_wheat    KVIQEAKNLP PRCNQGPP.. .CNIPGTIGY Y
iaa1_wheat     WKLTAASIT AVCRL
iaa5_wheat     WKLTAASIT AVCRL
iaab_horvu    MDFVRILVTP GFCNLTT
ia16_wheat    MDFVRILVTP GYCNLTT
2ss5_helan    QLKQKAQILP NVCNLQS..R RCEIGTItrP FGTGSQ..
2ss_ricco     EAFRTAANLP SMCGVSPTEC R
2ss8_helan    QVMSMAHNLP IECNLMSQPC Q

Secondary structure prediction by PHDsec:

Author: Burkhard Rost, EMBL, Heidelberg, FRG, Meyerhofstrasse 1, 69117 Heidelberg
Internet: Rost@EMBL-Heidelberg.DE
All rights reserved.

About the network method

The network procedure is described in detail in:
1) Rost, Burkhard; Sander, Chris: Prediction of protein structure at better than 70% accuracy. J. Mol. Biol., 1993, 232, 584-599.

A brief description is given in:
Rost, Burkhard; Sander, Chris: Prediction of protein secondary structure by use of sequence profiles and neural networks. Proc. Natl. Acad. Sci. U.S.A., 1993, 90, 7558-7562.
The PHD mail server is described in:
2) Rost, Burkhard; Sander, Chris; Schneider, Reinhard: PHD - an automatic mail server for protein secondary structure prediction. CABIOS, 1994, 10, 53-60.

The latest improvement steps (up to 72%) are explained in:
3) Rost, Burkhard; Sander, Chris: Combining evolutionary information and neural networks to predict protein secondary structure. Proteins, 1994, 19, 55-72.

To be quoted for publications of PHD output: 1-3 for the prediction of secondary structure and the prediction server.

About the input to the network

The prediction is performed by a system of neural networks. The input is a multiple sequence alignment. It is taken from an HSSP file (produced by the program MaxHom: Sander, Chris & Schneider, Reinhard: Database of Homology-Derived Structures and the Structural Meaning of Sequence Alignment. Proteins, 1991, 9, 56-68). For optimal results the alignment should contain sequences with varying degrees of sequence similarity relative to the input protein. The following is an ideal situation:

sequence:          sequence identity
target sequence    100 %
aligned seq. 1      90 %
aligned seq. 2      80 %
...
aligned seq. 7      30 %

Accuracy of Prediction

A careful cross validation test on some 250 protein chains (in total about 55,000 residues) with less than 25% pairwise sequence identity gave the following results:

Qtotal = 72.1%  ("overall three state accuracy")

Qhelix  (% of observed) = 70%  |  Qhelix  (% of predicted) = 77%
Qstrand (% of observed) = 62%  |  Qstrand (% of predicted) = 64%
Qloop   (% of observed) = 79%  |  Qloop   (% of predicted) = 72%

These percentages are defined by:

Qtotal             = (number of correctly predicted residues / number of all residues) * 100
Qhelix (% of obs)  = (no of res correctly predicted to be in helix / no of all res observed to be in helix) * 100
Qhelix (% of pred) = (no of res correctly predicted to be in helix / no of all residues predicted to be in helix) * 100

Averaging over single chains

The most reasonable way to compute the overall accuracies is the above quoted percentage of correctly predicted residues. However, since the user is mainly interested in the expected performance of the prediction for a particular protein, the mean value when averaging over protein chains might be of help as well. Computing first the three state accuracy for each protein chain, and then averaging over 250 chains yields the following average:

Qtotal/averaged over chains = 72.2%
standard deviation          = 9.3%

Further measures of performance

Matthews correlation coefficient:
Chelix = 0.63, Cstrand = 0.53, Cloop = 0.52

Average length of predicted secondary structure segments:

           predicted   observed
Lhelix  =    10.3         9.3
Lstrand =     5.0         5.3
Lloop   =     7.2         5.9

The accuracy matrix in detail: number of residues with H, E, L

          net H    net E    net L    sum obs
obs H     12447     1255     3990     17692
obs E       949     7493     3750     12192
obs L      2604     2875    19962     25441
sum Net   16000    11623    27702     55325

Note: This table is to be read in the following manner: 12447 of all residues predicted to be in helix were observed to be in helix; 949, however, belong to observed strands, and 2604 to observed loop regions. The term "observed" refers to the DSSP assignment of secondary structure calculated from 3D coordinates of experimentally determined structures (Dictionary of Secondary Structure of Proteins: Kabsch & Sander (1983) Biopolymers, 22, 2577-2637).

Position-specific reliability index

The network predicts the three secondary structure types using real numbers from the output units. The prediction is assigned by choosing the maximal unit ("winner takes all"). However, the real numbers contain additional information. E.g. the difference between the maximal and the second largest output unit can be used to derive a "reliability index". This index is given for each residue along with the prediction. The index is scaled to values between 0 (lowest reliability) and 9 (highest). The accuracies (Qtot) to be expected for residues with values above a particular value of the index are given below, as well as the fraction of such residues (%res):

| index |   0   |   1  |   2  |   3  |   4  |   5  |   6  |   7  |   8  |   9  |
| %res  | 100.0 | 99.2 | 90.4 | 80.9 | 71.6 | 62.5 | 52.8 | 42.3 | 29.8 | 14.1 |
| Qtot  |  72.1 | 72.3 | 74.8 | 77.7 | 80.3 | 82.9 | 85.7 | 88.5 | 91.1 | 94.2 |
| H%obs |  70.4 | 70.6 | 73.7 | 77.1 | 80.1 | 83.1 | 86.0 | 89.3 | 92.5 | 96.4 |
| E%obs |  61.5 | 61.7 | 63.7 | 66.6 | 69.1 | 71.7 | 74.6 | 77.0 | 77.8 | 68.1 |
| H%prd |  77.8 | 78.0 | 80.0 | 82.6 | 84.7 | 86.9 | 89.2 | 91.3 | 93.1 | 95.4 |
| E%prd |  64.5 | 64.7 | 67.8 | 71.0 | 74.2 | 77.6 | 81.4 | 85.1 | 89.? | 93.5 |

The above table gives the cumulative results, e.g. 62.5% of all residues have a reliability of at least 5. The overall three-state accuracy for this subset of almost two thirds of all residues is 82.9%. For this subset, e.g., 83.1% of the observed helices are correctly predicted, and 86.9% of all residues predicted to be in helix are correct.

The following table gives the non-cumulative quantities, i.e. the values per reliability index range. These numbers answer the question: how reliable is the prediction for all residues labeled with the particular index i.

| index |   1  |   2  |   3  |   4  |   5  |   6  |   7  |   8  |   9  |
| %res  |  8.8 |  9.5 |  9.3 |  9.1 |  9.7 | 10.5 | 12.5 | 15.7 | 14.1 |
| Qtot  | 46.6 | 50.6 | 57.7 | 62.6 | 67.9 | 74.2 | 82.2 | 88.3 | 94.2 |
| H%obs | 36.8 | 42.3 | 49.5 | 52.1 | 61.7 | 69.9 | 78.8 | 87.4 | 96.4 |
| E%obs | 44.7 | 44.5 | 55.2 | 55.4 | 60.9 | 68.0 | 75.9 | 81.0 | 68.1 |
| H%prd | 49.9 | 52.5 | 60.3 | 64.2 | 69.2 | 77.5 | 85.4 | 89.9 | 95.4 |
| E%prd | 41.7 | 47.1 | 53.6 | 57.0 | 64.0 | 71.6 | 78.8 | 88.8 | 93.5 |

For example, for residues with Relindex = 5, 64% of all predicted beta-strand residues are correctly identified.
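The accuracy measures defined in this printout can be recomputed from the accuracy matrix it quotes. The following is a minimal sketch in Python, not part of the PHD server: the function names are illustrative, and the Matthews coefficient is computed with the usual one-state-versus-rest definition, which is assumed here to be the one the server used.

```python
import math

# Accuracy matrix as quoted in the printout.
# Outer key: observed state (DSSP); inner key: state predicted by the network.
STATES = ("H", "E", "L")
MATRIX = {
    "H": {"H": 12447, "E": 1255, "L": 3990},
    "E": {"H": 949, "E": 7493, "L": 3750},
    "L": {"H": 2604, "E": 2875, "L": 19962},
}


def total(matrix):
    """Number of residues in the whole test set."""
    return sum(sum(row.values()) for row in matrix.values())


def q_total(matrix):
    """Overall three-state accuracy in percent (Qtotal)."""
    correct = sum(matrix[s][s] for s in STATES)
    return 100.0 * correct / total(matrix)


def q_observed(matrix, state):
    """Percent of residues observed in `state` that were predicted correctly."""
    return 100.0 * matrix[state][state] / sum(matrix[state].values())


def q_predicted(matrix, state):
    """Percent of residues predicted as `state` that are correct."""
    predicted = sum(matrix[obs][state] for obs in STATES)
    return 100.0 * matrix[state][state] / predicted


def matthews(matrix, state):
    """Matthews correlation coefficient for one state versus the rest."""
    n = total(matrix)
    tp = matrix[state][state]
    fn = sum(matrix[state].values()) - tp
    fp = sum(matrix[obs][state] for obs in STATES) - tp
    tn = n - tp - fn - fp
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    print(f"Qtotal = {q_total(MATRIX):.1f}%")
    for s in STATES:
        print(s, f"{q_observed(MATRIX, s):.1f}", f"{q_predicted(MATRIX, s):.1f}",
              f"{matthews(MATRIX, s):.2f}")
```

With the matrix above, the helix row reproduces the quoted figures (Qtotal 72.1%, about 70% of observed and 78% of predicted helix residues correct, Chelix 0.63), which is a useful check that the matrix has been read in the right orientation.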
The resulting network (PHD) prediction is:

PHD: Profile fed neural network systems from HeiDelberg

Prediction of:
    secondary structure, by PHDsec
    solvent accessibility, by PHDacc
    and helical transmembrane regions, by PHDhtm

Author: Burkhard Rost, EMBL, 69012 Heidelberg, Germany
Internet: Rost@EMBL-Heidelberg.DE
All rights reserved.

Some statistics

Percentage of amino acids:

| AA:      |  R   |  Q   |  E   |  S   |  D   |
| % of AA: | 18.1 | 10.9 | 10.9 |  9.4 |  7.2 |
| AA:      |  P   |  C   |  L   |  M   |  Y   |
| % of AA: |  5.8 |  5.8 |  5.1 |  4.3 |  3.6 |
| AA:      |  N   |  G   |  T   |  I   |  H   |
| % of AA: |  2.9 |  2.9 |  2.2 |  2.2 |  2.2 |
| AA:      |  F   |  A   |  V   |  K   |
| % of AA: |  2.2 |  2.2 |  1.4 |  0.7 |

Percentage of secondary structure predicted:

| SecStr:      |  H   |  E   |  L   |
| % Predicted: | 30.4 |  4.3 | 65.2 |

According to the following classes:
    all-alpha:  %H>45 and %E<5;   all-beta: %H<5 and %E>45
    alpha-beta: %H>30 and %E>20;  mixed:    rest,
this means that the predicted class is: mixed class

PHD output for your protein

Fri Sep 19 00:39:21 1997
Jury on: 10 different architectures (version 5.94_317).
Note: differently trained architectures, i.e., different versions can result in different predictions.
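The two summary statistics above, amino acid composition and structural class, follow directly from the submitted sequence and the stated class thresholds. A minimal sketch in Python, assuming the 138-residue sequence as submitted to the server; `composition` and `structural_class` are illustrative names, not PHD functions:

```python
from collections import Counter

# Sequence as submitted to the server (138 residues).
SEQUENCE = (
    "HEDNMYGEEIQQQRRSCDPQRHPQRLSSCRDYLERRREQPSERCCEELQRMSPQCRCQAI"
    "QQMLDQSLSYDSFMDSDSQEDTPLNQRRRRRREGRGRDEEEVMERAAYLPNTCNVREPPR"
    "RCDIQRHSRYFMTGSSFK"
)


def composition(seq):
    """Percentage of each amino acid present in the sequence."""
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


def structural_class(pct_h, pct_e):
    """Apply the class thresholds quoted in the printout."""
    if pct_h > 45 and pct_e < 5:
        return "all-alpha"
    if pct_h < 5 and pct_e > 45:
        return "all-beta"
    if pct_h > 30 and pct_e > 20:
        return "alpha-beta"
    return "mixed"


if __name__ == "__main__":
    comp = composition(SEQUENCE)
    # List residues from most to least abundant.
    for aa, pct in sorted(comp.items(), key=lambda item: -item[1]):
        print(f"{aa}  {pct:.1f}")
    print(structural_class(30.4, 4.3))
```

The predicted 30.4% helix falls short of the 45% needed for all-alpha, and 4.3% strand is far below the 20% needed for alpha-beta, so the protein lands in the mixed class as reported.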
About the protein

HEADER     /home/phd/server/work/predict_h25826_283
CMPND
SOURCE
AUTHOR
SEQLENGTH  138
NCHAIN     1 chain(s) in predict_h25826_28326 data set
NALIGN     9 (=number of aligned sequences in HSSP file)

Abbreviations: PHDsec

sequence:
    AA:   amino acid sequence
secondary structure:
    HEL:  H=helix, E=extended (sheet), blank=other (loop)
    PHD:  Profile network prediction HeiDelberg
    Rel:  Reliability index of prediction (0-9)
detail:
    prH:  'probability' for assigning helix
    prE:  'probability' for assigning strand
    prL:  'probability' for assigning loop
    note: the 'probabilities' are scaled to the interval 0-9, e.g., prH=5 means that the first output node is 0.5-0.6
subset:
    SUB:  a subset of the prediction, for all residues with an expected average accuracy > 82% (tables in header)
    note: for this subset the following symbols are used:
          L:   is loop (for which above " " is used)
          ".": means that no prediction is made for this residue, as the reliability is: Rel < 5

protein: predict    length 138

                  ....,....1....,....2....,....3....,....4....,....5....,....6
         AA      |HEDNMYGEEIQQQRRSCDPQRHPQRLSSCRDYLERRREQPSERCCEELQRMSPQCRCQAI|
         PHD sec |                          HHHHHHHHH     HHHHHHHHHH    HHHHHH|
         Rel sec |998878989854467687335665724799998615512149999999938741226999|
detail:  prH sec |000011000123221111332222146899998752255469999999961134557899|
         prE sec |000000000000000100000000000000000000000000000000000000000000|
         prL sec |998878988866677787657777753100001247744530000000038864341000|
subset:  SUB sec |LLLLLLLLLLL..LLLLL..LLLLL..HHHHHHH.LL.....HHHHHHHH.LL....HHHH|

                  ....,....7....,....8....,....9....,....10...,....11...,....12
         AA      |QQMLDQSLSYDSFMDSDSQEDTPLNQRRRRRREGRGRDEEEVMERAAYLPNTCNVREPPR|
         PHD sec |HHHHHH                               HHHHHHHHHHH            |
         Rel sec |999955489787999987657987766667799987247999999988178526899998|
detail:  prH sec |999977210111000011221011122111100001467999999988411110000000|
         prE sec |000000000000000000000000000011000000000000000000000132000000|
         prL sec |000022688888889988778888877777899988531000000011588647899998|
subset:  SUB sec |HHHHHH.LLLLLLLLLLLLLLLLLLLLLLLLLLLLL..HHHHHHHHHH.LLL.LLLLLLL|

                  ....,....13...,...
         AA      |RCDIQRHSRYFMTGSSFK|
         PHD sec | E      EEEE E    |
         Rel sec |222462104433132129|
detail:  prH sec |100000000000000000|
         prE sec |354323446665464430|
         prL sec |535675442333535559|
subset:  SUB sec |....L............L|
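The SUB row follows mechanically from the PHD and Rel rows by the rule given in the abbreviations: residues with Rel < 5 are masked with ".", and a blank (loop) in the PHD row is written as "L". A minimal sketch in Python of that rule, using the rows for residues 121-138 as read from the printout; `subset_row` and the constant names are illustrative, not part of PHD.

```python
# Rows for residues 121-138, as read from the last block of the printout.
PHD_ROW = " E" + " " * 6 + "EEEE E" + " " * 4  # blank = loop
REL_ROW = "222462104433132129"

# Residues below this reliability index get no prediction in the subset.
MIN_RELIABILITY = 5


def subset_row(phd_row, rel_row, min_rel=MIN_RELIABILITY):
    """Derive the SUB row from the PHD and Rel rows of one block."""
    if len(phd_row) != len(rel_row):
        raise ValueError("PHD and Rel rows must have the same length")
    symbols = []
    for state, rel in zip(phd_row, rel_row):
        if int(rel) < min_rel:
            symbols.append(".")      # too unreliable: no prediction
        elif state == " ":
            symbols.append("L")      # loop is written explicitly in SUB
        else:
            symbols.append(state)    # H or E carried over unchanged
    return "".join(symbols)


if __name__ == "__main__":
    print(subset_row(PHD_ROW, REL_ROW))
```

For this block only two residues reach a reliability of 5 or more, both predicted as loop, so none of the six predicted strand residues near the C-terminus survive into the high-confidence subset.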
