THE ISOLATION AND MOLECULAR CHARACTERIZATION OF A 2S ALBUMIN GENE FROM PICEA GLAUCA

by

STEPHANIE MARIE MCINNIS

B.Sc., The University of Victoria, 1985

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE STUDIES
(Department of Plant Science)

We accept this thesis as conforming

THE UNIVERSITY OF BRITISH COLUMBIA

March 1998

© Stephanie Marie McInnis, 1998

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Plant Science
The University of British Columbia
344-2357 Main Mall
Vancouver, Canada V6T 1Z4

Date:

ABSTRACT

2S albumins are a class of small seed storage proteins (SSPs) found widely among the Dicotyledonae and related to the prolamin SSPs of the Monocotyledonae. A genomic clone (PG2S) and related pseudogene with homology to the dicot 2S albumins were sequenced from the gymnosperm Picea glauca. Northern blot analysis of developing spruce somatic embryos indicated embryo-specific expression of the conifer 2S albumin gene, in a pattern consistent with a seed storage protein. A translational fusion between 2.3 kb of the 5'-flanking region of PG2S and the uidA (β-glucuronidase) reporter gene was constructed to explore promoter function.
Two distal deletions were also created, resulting in promoters deleted to a 5' position of -653 and -117, respectively, relative to the site of transcriptional initiation. The two larger constructs were used in the stable transformation of tobacco (Nicotiana tabacum cv. Xanthi), resulting in embryo-specific expression of the uidA gene in the developing tobacco embryo from long heart stage to embryo maturity, with maximal expression in the torpedo stage embryo. Under control of the PG2S promoter, GUS expression was not detected in any other tobacco tissues. There were no differences observed in pattern or strength of expression between the constructs in tobacco. The three PG2S promoter : uidA reporter gene constructs were transiently expressed by microprojectile bombardment in a developmental series of interior spruce (Picea glauca/engelmannii) somatic embryos (proembryo, stage 2, early cotyledonary, mature and partially-dried mature), germinants and pollen. Transient expression of the 2.3 kb promoter construct mirrored 2S albumin mRNA levels in the proembryo, stage 2, early cotyledonary and mature embryo, as well as in germinants. In contrast, high levels of transient expression were observed in partially-dried mature embryos and in pollen, in which low and no 2S albumin mRNA, respectively, was detected. The promoter deletions gave reduced levels of transient expression, but were not altered in seed-specificity. Putative regulatory motifs identified within the promoter were: ACGT, TGCA, CATG, CANNTG, and CCAC(C). Despite evolutionary distance, the 2S albumin promoter from spruce functions in a developmentally regulated, tissue-specific manner in an angiosperm; cis-elements and their co-ordinate trans-acting factors are conserved between gymnosperms and angiosperms.
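
The motif list above can be made concrete with a short scan. What follows is only a minimal Python sketch and is not the procedure used in this work: the names `MOTIFS` and `find_motifs` and the example sequence are hypothetical, N is read as the IUPAC symbol for any base, and CCAC(C) is assumed to mean CCAC with an optional trailing C. Only the strand supplied is scanned.

```python
import re

# Motifs as listed in the abstract, written as regular expressions.
# Assumptions: N = any of A/C/G/T; CCAC(C) = CCAC with an optional extra C.
MOTIFS = {
    "ACGT": "ACGT",
    "TGCA": "TGCA",
    "CATG": "CATG",
    "CANNTG": "CA[ACGT][ACGT]TG",
    "CCAC(C)": "CCACC?",
}


def find_motifs(sequence):
    """Return {motif name: [0-based start positions]} on the given strand."""
    seq = sequence.upper()
    hits = {}
    for name, pattern in MOTIFS.items():
        # A lookahead is used so that overlapping occurrences are all reported.
        regex = re.compile("(?=(" + pattern + "))")
        hits[name] = [match.start() for match in regex.finditer(seq)]
    return hits


# Made-up sequence for illustration only; it is not PG2S promoter sequence.
example = "TTACGTGCATGCCACCAAATGCA"

if __name__ == "__main__":
    for motif, positions in find_motifs(example).items():
        print(motif, positions)
```

A scan of this kind only locates candidate sites; whether any site is functional has to be tested experimentally, for example with the promoter deletions described above.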
TABLE OF CONTENTS

Abstract
Table of Contents
List of Abbreviations
List of Tables
List of Figures
Acknowledgments

CHAPTER ONE - Introduction
1.1 Seed Storage Proteins
1.2 2S Albumin Super Family
1.2.1 2S Albumin-Related Genes in the Monocotyledonae
1.2.2 2S Albumins in the Dicotyledonae
1.2.3 2S Albumin Proteins
1.2.4 Protein Secondary Structure
1.2.5 Processing of the Precursor Protein
1.2.6 Amino Acid Homology
1.2.7 Practical Applications of 2S Albumin Genes and Proteins
1.2.8 2S Albumins Identified as Allergens
1.2.9 2S Albumins with Anti-Pathogenic Activity
1.3 Seed-Specific Expression
1.4 Conifer Seed Storage Proteins and 2S Albumins
1.5 2S Albumin Promoter Studies in Dicotyledonae
1.6 Characterisation of a Picea glauca 2S Albumin Genomic Clone

CHAPTER TWO - Methods and Materials
2.1 General Microbiology Techniques
2.1.1 Lambda Bacteriophage Protocols
    Preparation of Plating Bacteria Working Stock
    Bacteriophage Multiplication
    Calculation of Bacteriophage Titre
2.1.2 Isolation of DNA and RNA
    Lambda Bacteriophage DNA
    Escherichia coli Plasmid Mini-prep
    Agrobacterium tumefaciens Plasmid Mini-prep
    Large Scale Plasmid Isolation by Alkaline Lysis
    Cesium Chloride Gradient DNA Purification
    Spruce Genomic DNA
    Spruce RNA Purification
    Tobacco Genomic DNA
2.1.3 Gel Electrophoresis and Analysis of Nucleic Acids
    Restriction Digest Reactions
    Gel Electrophoresis - DNA
    Southern Blot
    Gel Electrophoresis - RNA
    Northern Blot
2.1.4 Labelling and Hybridization of Probes
    32P Labelling of Oligonucleotide Probes
    Hybridization of Southern Blots with Oligonucleotide Probes
    32P Labelling by Random Primer Method
    Hybridization of Southern Blots with Randomly Labelled Probes
    Hybridization of Northern Blots
2.1.5 Ligation Reactions
2.1.6 Heat Shock Transformation of Competent E.
      coli Cells
2.1.7 Generation of Unidirectional Deletions by Exo III Digestion
2.1.8 2S Albumin Promoter / GUS Fusions
    Isolation of the 2S Albumin Promoter
    Construction of the GUS Fusion Vectors
    Introduction of Binary Vectors into Agrobacterium tumefaciens
2.2 Sequencing
2.2.1 Radioactive Labeling of Sequencing Primers
2.2.2 Sequencing Extension/Termination Reactions
2.2.3 Sequencing Gels - 1.2X TBE
2.2.4 Sequencing Gels - Formamide
2.2.5 Assembly of Sequences
2.3 Confirmation of Intron by Polymerase Chain Reaction
2.4 Plant Methods
2.4.1 Interior Spruce Embryogenic Cultures
2.4.2 Tobacco Plants
2.5 Gene Expression Experiments
2.5.1 Transient Expression Experiments
    Interior Spruce Developmental Stages - Target Preparation
    DNA Preparation for Microprojectile Bombardment
    Microprojectile Bombardment
2.5.2 Stable Expression Experiments
    Tobacco Transformation
    Generation of T1 Plants
2.5.3 Assessment of β-glucuronidase Activity
    Histochemical GUS Assay
    Fluorometric GUS Assay
    Protein Quantification
    Calculation of β-glucuronidase Enzyme Activity
2.6 Data Analysis and Statistics

CHAPTER THREE - Gene Structure
3.1 Characterisation of λ3.2
3.1.1 Pseudogene (Ψ2S)
3.2 Picea glauca 2S Albumin (PG2S)
3.2.1 5' Flanking Sequence
3.2.2 Intron Sequence
3.2.3 Comparison of PG2S with Related Conifer cDNAs
3.3 Amino Acid Sequence
3.4 2S Albumin Promoter : uidA Translational Fusions

CHAPTER FOUR - Gene Expression
4.1 Expression of the Native Gene
4.2 Transient Expression in Picea glauca Developmental Stages
4.3 Stable Expression in Tobacco

CHAPTER FIVE - Discussion and Conclusions
5.1 2S Albumin Pseudogene Ψ2S
5.2 Characterisation of a Spruce 2S Albumin Intron
5.2.1 Promoter Motifs within the 2S Albumin Intron
5.2.2 Would the Ancestral 2S Albumin Gene Contain an Intron?
5.3 Translation of PG2S
5.3.1 2S Albumin Cysteine Framework
5.3.2 2S Albumin Secondary Structure
5.3.3 Variation in Amino Acid Sequence Among the Conifer 2S Albumins
5.4 Predicted Processing of the Mature 2S Albumin Protein
5.4.1 Secretory Signal Sequence
5.4.2 Amino terminal Processed Fragment
5.4.3 Small Subunit
5.4.4 Internal Processed Fragment
5.4.5 Large Subunit and C-Terminal Processed Fragment
5.4.6 Amino Acid Content of the Predicted Mature Protein
5.5 Similarity to Calmodulin Antagonists
5.6 PG2S Promoter Region
5.6.1 2S Albumin Developmental Pattern of Expression is Conserved
5.6.2 Conserved Sequence Motifs
5.6.3 Cereal-like Promoter Motifs
5.6.4 ABA Response Elements
5.6.5 Putative enhancer elements not recognised in tobacco
5.7 Transient Expression
5.8 Conclusions

References Cited
Appendices
Appendix A 2S Albumins and Related Genes
Appendix B Buffers and Solutions
    Bacterial and Plant Tissue Culture Media
Appendix C List of Suppliers
Appendix D BLAST Sequence Alignment Results
Appendix E Prediction of Protein Secondary Structure

List of Abbreviations

A         adenine
ABA       abscisic acid
ANOVA     analysis of variance
APS       ammonium persulphate
ATPF      amino terminal processed fragment
ATP       adenosine triphosphate
dATP      deoxyadenosine triphosphate
BA        6-benzylaminopurine
BSA       bovine serum albumin
C         cytosine
cDNA      complementary deoxyribonucleic acid
CDPK      Ca dependent protein kinase
CPM       counts per minute
CTAB      cetyl trimethylammonium bromide
CTPF      carboxy terminal processed fragment
2,4-D     2,4-dichlorophenoxyacetic acid
DEPC      diethyl pyrocarbonate
DNA       deoxyribonucleic acid
DTT       dithiothreitol
EDTA      ethylenediaminetetraacetic acid
EMBL3     lambda phage strain
ER        endoplasmic reticulum
ESM       embryogenic suspensor mass
FAA       formalin acetic acid fixative
G         guanine
GUS       β-glucuronidase
HEPES
          N-2-hydroxyethylpiperazine-N'-2-ethanesulfonic acid
Hg        mercury
IBA       3-indolebutyric acid
IPF       internal processed fragment
IPTG      isopropyl β-D-thiogalactopyranoside
LB        Luria-Bertani broth
LM        Litvay's media
mRNA      messenger ribonucleic acid
MOPS      3-[N-morpholino]propanesulfonic acid
MS        Murashige and Skoog medium
MUG       4-methylumbelliferyl β-D-glucuronide
4-MU      4-methylumbelliferone
MW        molecular weight
n         number of individual measurements
NAA       α-naphthaleneacetic acid
NPTII     neomycin phosphotransferase
dNTP      deoxyribonucleoside triphosphate
d/ddNTP   deaza/dideoxyribonucleoside triphosphate
OD        optical density
OPAP      One-Phore-All Plus™ 10X buffer
PCR       polymerase chain reaction
PEG       polyethylene glycol
pfu       plaque forming units
PNFI      Petawawa National Forestry Institute
PNK       polynucleotide kinase
Pu        purine
Py        pyrimidine
RNA       ribonucleic acid
RNase     ribonuclease
SDS       sodium dodecyl sulphate
SE        standard error
SLS       sodium N-lauroyl sarcosine
SSC       saline-sodium citrate buffer
SSPE      saline-sodium phosphate-EDTA buffer
T         thymine
TAE       tris-acetate-EDTA buffer
TBE       tris-borate-EDTA buffer
TE        Tris-EDTA buffer
TEMED     N,N,N',N'-tetramethyl-ethylenediamine
tRNA      transfer RNA
UBC       University of British Columbia
X-Gal     5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside
X-Gluc    5-bromo-4-chloro-3-indolyl-β-D-glucuronide
YEP       yeast extract - peptone medium
YT        yeast extract - tryptone medium

List of Tables

Table 1: Dicot 2S Albumins
Table 2: Transgenic Plants Transformed with 2S Albumin Coding Regions
Table 3: Synthetic Oligonucleotides
Table 4: Matches from BLAST Sequence Alignments
Table 5: Sequence with Similarity to Cereal motifs
Table 6: Amino Acid Composition (%) of PG2S
Table 7: Percent Identity between PG2S and Conifer 2S Albumin cDNAs
Table 8: Matrix of Pairwise Similarity of Conifer 2S Albumins

List of Figures

Figure 1: Dicot 2S Albumin Precursor Protein
Figure 2: Plasmid Constructs
Figure 3: Spruce Somatic Embryo Developmental Stages
Figure 4: λ3.2 Restriction Map
Figure 5: Picea glauca 2S albumin pseudogene Ψ2S
Figure 6: 2S Albumin genomic clone from Picea glauca
Figure 7: Alignment of Ψ2S with the functional 2S Albumin
Figure 8: 2S albumin promoter sequence including the uidA fusion junction
Figure 9: Alignment of 2S Albumin Proximal Promoters
Figure 10: Evidence of Intron within the Spruce 2S Albumin Gene Family
Figure 11: Comparison of the Spruce 2S Albumin genomic and cDNA clones
Figure 12: White Spruce 2S Albumin coding region
Figure 13: Graphical Representation of 2S Albumin Amino Acid Statistics
Figure 14: Plot of Predicted Hydropathicity
Figure 15: Prediction of the 2S Albumin Precursor Protein Secondary Structure
Figure 16: 2S Albumin Super Family Alignment
Figure 17: Alignment of Conifer 2S Albumin Amino Acid Sequences
Figure 18: Dendrogram of the Conifer 2S Albumin Alignment
Figure 19: Spruce 2S Albumin Promoter Constructs
Figure 20: Northern Blot of the Spruce 2S Albumin Gene
Figure 21: Transient GUS Expression in Spruce Somatic Embryos and Germinants
Figure 22: Transient GUS Expression in Spruce Pollen
Figure 23: Expression Pattern of the Spruce 2S Albumin Promoter in Developing Tobacco Seeds
Figure 24: Transformed Tobacco Embryos stained for GUS expression
Figure 25: Relative Strength of pBIN2S and pBIN700
Figure 26: Position of Signal Sequence Cleavage Relative to Hydrophobicity
Figure 27: Conserved Motifs of the Proximal Promoter

ACKNOWLEDGEMENTS

I would like to acknowledge all the support that I have received from my friends and family over the many years it has taken to complete this work. Thanks Mom, Dad and Darryl. Thanks also to the "gang" who are my extended family: Barbara Zatyko, Jane Webster, Lisa Spellacy, Dianne McDonald, Susanna Grimes and Cheryl Dyck.
Thank you for believing in me and encouraging me, and for giving me a good shove when I needed it. Thank you Gertjan, I'm not sure I could have finished this without you.

Special thanks to Dr. Craig Newton, who taught me practically everything I know in the lab, and to Dr. Dave Ellis who taught me to have confidence in the results. The transient expression experiments could not have been completed without the assistance of Dr. Pierre Charest and the staff at the Petawawa National Forestry Institute, especially Yvonne Devantier. I would also like to thank Dr. Carl Douglas for the use of the fluorometer.

This research was financially supported in part by the B.C. Science Council G.R.E.A.T. awards and by B.C. Research.

CHAPTER ONE
Introduction

1.1 Seed Storage Proteins

Seed storage proteins have been extensively studied because of their nutritional importance for humans and livestock. These proteins are synthesised during embryo development, generally stored in protein bodies and utilised as a carbon and nitrogen source during seedling germination (reviewed by Bewley and Black, 1994 and Shewry et al., 1995). They have also proven useful as an experimental system to explore gene regulation, as they are expressed at high levels, are strictly regulated in a temporal and tissue specific manner, and are responsive to environmental and biochemical signals such as abscisic acid (ABA), dehydration and plant nutritive status (reviewed by Morton et al., 1995).

It is likely that seed storage proteins arose from a common ancestor to the fern spore storage proteins, based on amino acid homology between fern spore storage proteins and angiosperm 11S legumin, 7S vicilin and 2S albumin seed storage proteins (Rodin and Rask, 1990, Templeman et al., 1988). Seed storage proteins do not share homology with the vegetative storage proteins which plants accumulate in leaves, stems, and roots (Staswick, 1989), although exceptions do exist, as Beardmore et al.
(1996) found sequence similarity at the protein level, based on proteolytic digestion patterns and antigenic cross-reactivity, between 36 kDa seed and bark storage proteins in poplar. The pod storage proteins of legumes, a type of vegetative storage protein, also show little homology to seed storage proteins although they accumulate and are mobilized as a nitrogen source to support concurrent seed development (Zhong et al., 1997).

Seed storage proteins were originally classified based on their solubility in defined solutions and methods of extraction (Osborne, 1924). By definition, albumins are soluble in water, globulins in saline, prolamins in alcohol/water, and glutelins in alkali solutions. As well, storage proteins are classified based on their sedimentation coefficients, S (Svedberg constant). The main classes of seed storage proteins found in the Dicotyledonae are the 7S vicilins and the 11S legumins, both classed as globulins, and the 2S albumins. In the Monocotyledonae, the dominant class of storage protein consists of the prolamins, though other classes of seed storage proteins are found in lesser amounts (reviewed by Shewry et al., 1995). Seed protein profiles of gymnosperms, though not as well studied, generally resemble those of dicots (Gifford and Tolley, 1989, Misra and Green, 1990, Hakman et al., 1990, Allona et al., 1992, Flinn et al., 1993, Hakman, 1993).

Seed storage proteins are synthesised in large quantities, either by high levels of transcription and translation of a few genes, or by lower levels of expression from multiple genes (Bewley and Black, 1994). These genes are normally only expressed, and the proteins only accumulate, in the seed tissues. In dicot species the storage tissues are located in the embryo, or in both the embryo and endosperm (Thomas, 1993), whereas monocots store the majority of their protein reserves in the endosperm.
In gymnosperms, seed storage proteins have been isolated from both the embryo and the megagametophyte, the haploid maternal nutritive tissue (Bewley and Black, 1994). Upon germination, seed storage proteins are catabolized by proteolytic enzymes, creating a free amino acid pool that is utilised by the germinant until it is able to grow autotrophically.

The general pattern for seed storage protein synthesis involves translation of mRNA in the cytoplasm, followed by targeting of the precursor protein to the endoplasmic reticulum (ER) by a signal sequence, which is cleaved as the peptide enters the ER lumen. The precursor protein usually undergoes further processing in the ER and is exported either to specialised vacuoles which become protein bodies, or is excreted from the ER in Golgi vesicles (see review by Shewry et al., 1995). In both cases, after leaving the ER, the preprotein is further processed to form the mature protein.

1.2 2S Albumin Super Family

2S albumins are small seed storage proteins ranging in size from 9 to 15 kDa and are the most abundant seed storage protein in many dicot species (Youle and Huang, 1981). Related proteins have also been identified in monocots, fern spores and gymnosperms. The 2S albumins are grouped as a super family based on: 1. amino acid sequence, specifically the strict conservation of eight cysteine residues; 2. protein structure, usually a small and large subunit joined by disulphide bonds; and 3. seed-specific expression. The 2S albumin and related gene sequences currently found in the NIH GenBank and SwissProt databanks are listed in Appendix A (page 179). There are 100 sequences from the Dicotyledonae, 113 entries from the Monocotyledonae, 3 amino acid sequences from the fern Matteuccia struthiopteris and 9 nucleotide sequences (including two from this work) from two conifer species. Not all 2S-size albumins are members of the super family, however.
Examples of unrelated seed storage proteins of similar size and solubility are narbonin from Vicia narbonensis (Nong et al., 1995), PA1 and PA2 from pea (Higgins et al., 1986 and 1987), and AmA1 from Amaranthus spp. (Raina and Datta, 1992).

A diverse group of proteins, which are not seed storage proteins but have a very similar cysteine framework, was reviewed by Yasuda et al. (1997). These proteins, unlike the 2S albumins, tend to be hydrophobic, and have high proline or glycine content in their N-terminal regions. Several of this group are expressed in roots and stems, as well as being induced by wounding. Interestingly, some of this group are also known to be expressed during embryogenesis: HyPRP from maize (Jose-Estanyol et al., 1992), DC2.15 from carrot (Aleith and Richter, 1990) and SOYBN from soybean (Odani et al., 1987). A tentative case may be made for the evolutionary relationship between the 2S albumin seed storage proteins and these diverse proteins based on the shared framework of cysteine residues.

1.2.1 2S Albumin-Related Genes in the Monocotyledonae

Kreis et al. (1985) initially proposed an evolutionary relationship between the dicot 2S albumins and the HMW (high molecular weight) prolamins, sulphur-rich prolamins, α-amylase and trypsin inhibitors, and Bowman-Birk protease inhibitors, based on the conservation of three cysteine-containing regions within these proteins. The Bowman-Birk type protease inhibitors, which act against the catalytic activity of trypsin and chymotrypsin found in the digestive tracts of herbivorous insect larvae, are found in both monocot and dicot species. Although Bowman-Birk type inhibitors show sequence homology with the 2S albumins, they are not always seed-specific and some, in fact, are wound inducible (reviewed by Birk, 1985). This divergent group is therefore not dealt with in this review.
The evolutionary trend in cereal protein sequences is the insertion of blocks of more or less repetitive amino acid sequence into an ancestral protein framework related to the dicot 2S albumins. Cereal prolamins are not processed into separate subunits and consequently are larger and less soluble than the dicot 2S albumins (reviewed by Shewry et al., 1995). Not all of the cereal seed storage proteins are larger than the related dicot proteins, however; the rice allergenic proteins are slightly smaller (Adachi et al., 1993). 2S albumin proteins have also been identified as a major storage protein in somatic embryos of the monocot oil palm (Morcillo et al., 1997). The oil palm 2S albumins are cysteine-rich, but are monomeric and apparently lack disulphide bonds.

The conifer seed storage proteins thus far characterised tend to resemble dicot proteins more closely than those of monocots, due most likely to the explosive evolutionary divergence of the cereals (Shutov et al., 1995, Flinn et al., 1993, Dong and Dunstan, 1996). For this reason, the majority of this introduction deals with dicot 2S albumins.

1.2.2 2S Albumins in the Dicotyledonae

The mature dicot 2S albumin proteins generally consist of a small (3 - 4 kDa) and large (7 - 9 kDa) subunit joined by disulphide bridges. These proteins have been identified in numerous dicot families (Table 1). Additional unpublished sequences from other species can also be found in the SwissProt and GenBank databases (Appendix A, page 179). Some of the 2S albumin proteins characterised have "common" names, e.g., napin from Brassica napus, arabidin from Arabidopsis thaliana, mabinlin from Capparis masaikai, conglutin-δ from Lupinus angustifolius.
Table 1: Dicot 2S Albumins

Family | Species | Reference
Brassicaceae | Brassica napus, B. rapa, B. oleracea, B. campestris, B. nigra, B. juncea, and B. carinata | Monsalve and Rodriguez, 1990; Dasgupta et al., 1995
Brassicaceae | B. napus | Lonnerdahl et al., 1972
Brassicaceae | B. napus | Monsalve et al., 1991a
Brassicaceae | B. campestris | Dasgupta and Mandal, 1991
Brassicaceae | Raphanus sativus | Laroche et al., 1984; Monsalve et al., 1994; Laroche-Raynal and Delseny, 1986; Raynal et al., 1991
Brassicaceae | Sinapis alba | Menendez-Arias et al., 1988
Brassicaceae | Sinapis arvensis | Svendsen et al., 1994
Fabaceae | Lupinus angustifolius; L. albus | Gayler et al., 1990; Salmanowicz and Weder, 1997
Fabaceae | Medicago sativa | Coulter and Bewley, 1990
Euphorbiaceae | Ricinus communis | Li et al., 1977; Sharief and Li, 1982
Cucurbitaceae | Cucurbita spp. | Hara-Nishimura, 1993
Cucurbitaceae | Luffa cylindrica | Ishihara et al., 1997
Cucurbitaceae | Momordica charantia | Li, 1977
Lecythidaceae | Bertholletia excelsa | Ampe et al., 1986; Sun et al., 1987
Lecythidaceae | Lecythis zabucajo, Couroupita guianensis | Sun et al., 1996
Asteraceae | Helianthus annuus | Allen et al., 1987; Kortt and Caldwell, 1990
Papaveraceae | Papaver somniferum | Srinivas and Rao, 1987
Malvaceae | Gossypium hirsutum | Youle and Huang, 1979

The large and small subunits of the mature protein are generally cleaved from a single precursor peptide (Arabidopsis thaliana - Krebbers et al., 1988; Bertholletia excelsa - Altenbach et al., 1986, De Castro et al., 1987; Brassica napus - Crouch et al., 1983, Josefson et al., 1987, Scofield and Crouch, 1987, Ericson et al., 1986, Baszcynski and Fallis, 1990; Sinapis alba - Menendez-Arias et al., 1987; Lupinus angustifolius - Gayler et al., 1990; Gossypium hirsutum - Galau et al., 1992). The sunflower (Helianthus annuus) 2S albumins are an exception to the rule since the small and large subunits remain uncleaved, resulting in a monomeric mature protein (Kortt and Caldwell, 1990, Kortt et al., 1991, Anisimova et al., 1995).
Also, two genes from sunflower (Allen et al., 1987, Thoyts et al., 1996) and one from castor bean (Ricinus communis) (Irwin et al., 1990, Godinho da Silva et al., 1996) have been identified which encode two expressed 2S albumin proteins in tandem. At least one sunflower cDNA (SFA8) encodes a "single" 2S albumin (Kortt et al., 1991) similar to the majority of the dicot 2S albumins.

Sequencing of genomic 2S albumin genes has revealed that the majority of dicot 2S albumins are without introns, with the exception of Brazil nut (Bertholletia excelsa) (Gander et al., 1991) and sunflower (Allen et al., 1987). All of the related monocot genes sequenced to this point also lack introns.

1.2.3 2S Albumin Proteins

Mature 2S albumin proteins have been shown to be localised within the cell in specialised membrane bound organelles known as protein bodies (A. thaliana - De Clercq et al., 1990a, Bertholletia excelsa - Altenbach et al., 1986, Medicago sativa - Coulter and Bewley, 1990), though a sunflower 2S albumin protein has been found to be associated with lipid bodies (Thoyts et al., 1996). Krochko et al. (1994) observed the abnormal accumulation of alfalfa 2S protein in the cytoplasm rather than in the protein bodies of somatic embryos which were deficient in 2S and 11S proteins compared to zygotic embryos.

The 2S albumin protein fraction generally consists of several variant isoforms within the seed (Ampe et al., 1986, Ishihara et al., 1997, Anisimova et al., 1995, Kortt and Caldwell, 1990, Irwin et al., 1990, Gayler et al., 1990, Coulter and Bewley, 1990, Monsalve et al., 1994, Gehrig et al., 1996) and the pattern of isoform expression has been used as a polymorphic character to determine relatedness among and within species (Salmanowicz and Przybylska, 1994, Salmanowicz, 1995, Anisimova et al., 1995, Przybylska and Zimniak-Przybylska, 1995). 2S albumin genes are found in multigene families: five copies in A.
thaliana (Krebbers et al., 1988, van de Klei et al., 1993), at least five copies in Brazil nut (Gander et al., 1991), ten to sixteen copies in B. napus (Josefson et al., 1987, Scofield and Crouch, 1987), eight to twelve copies in Raphanus sativus (Raynal et al., 1991) and four copies in castor bean (Irwin et al., 1990). The isoforms are apparently the product of these multigene families (Krebbers et al., 1988, Muren and Rask, 1996, Gonzalez de la Pena et al., 1996), though examples of alternate proteolytic processing have been observed at the amino-terminal peptide (D'Hondt et al., 1993b) and at the carboxy-terminus (Monsalve and Rodriguez, 1990; Monsalve et al., 1991, Muren et al., 1995, Godinho da Silva Jr. et al., 1996, Gehrig et al., 1996). As well, as mentioned above, in sunflower and castor bean a single mRNA has been shown to code for two complete proteins.

1.2.4 Protein Secondary Structure

Disulphide bridges are formed within the ER prior to removal of the linker peptide joining the large and small subunits (Crouch et al., 1983, Coulter and Bewley, 1990, Monsalve et al., 1991b, Hara-Nishimura, 1993) and are important in stabilisation of the folded structure of the napin protein (Schwenke et al., 1988). The interchain and intrachain positions of the disulphide bonds have been mapped in Lupinus albus (Salmanowicz and Weder, 1997), Lupinus angustifolius (Lilley and Inglis, 1986), the sunflower 2S albumin SFA8 (Egorov et al., 1996), B. napus napin (Gehrig and Biemann, 1996) and mabinlin from Capparis masaikai (Nirasawa et al., 1993).

Based on the above research, the pattern of disulphide bonds is conserved among the dicots (Figure 1) and consists of interchain bonds between the first Cys residue of the small chain and the third Cys of the large chain, and between the second Cys of the small chain and the first Cys of the large chain. Intrachain bonds occur in the large chain between the second Cys and fifth Cys residue, and the fourth Cys and the sixth Cys.
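
The bonding pattern just described can be written down as a small data structure, which makes it easy to confirm that the four bonds account for all eight conserved cysteines exactly once. The following Python sketch is for illustration only; the names `BONDS`, `bond_type` and `paired_cysteines` are hypothetical, and cysteines are numbered from the amino-terminus of each chain as in the text.

```python
# Each bond joins two cysteines, written as (chain, position-of-Cys-in-chain).
# The pattern is transcribed from the description in the text.
BONDS = [
    (("small", 1), ("large", 3)),  # interchain
    (("small", 2), ("large", 1)),  # interchain
    (("large", 2), ("large", 5)),  # intrachain
    (("large", 4), ("large", 6)),  # intrachain
]


def bond_type(bond):
    """Classify a bond as 'interchain' or 'intrachain'."""
    (chain_a, _), (chain_b, _) = bond
    return "intrachain" if chain_a == chain_b else "interchain"


def paired_cysteines(bonds):
    """Return a sorted list of every cysteine that takes part in a bond."""
    return sorted(cys for bond in bonds for cys in bond)


if __name__ == "__main__":
    for bond in BONDS:
        print(bond_type(bond), bond)
    cysteines = paired_cysteines(BONDS)
    # Eight distinct entries means no cysteine is used twice or left unpaired.
    print(len(cysteines), len(set(cysteines)))
```

Two cysteines in the small chain and six in the large chain give the eight conserved residues that define the super family.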
Egorov et al. (1996) found that related monocot 2S albumin-like proteins, the alpha and gamma gliadins, have a pattern of disulphide bonds very similar to that seen in the dicot proteins, but that the pattern is less conserved in the related wheat α-amylase inhibitors. The secondary structure of a napin protein has been determined by NMR and the tertiary structure calculated, resulting in a predicted structure consisting of five helices and a C-terminal loop (Rico et al., 1996).

[Figure 1 diagram: small subunit and large subunit drawn from the NH2 terminus to the COOH terminus, with the conserved Cys residues, the CTPF and the disulphide bridges marked]

Figure 1: Dicot 2S Albumin Prepropeptide. Representation of the initial product of translation. The ER (endoplasmic reticulum) signal sequence targets the precursor protein to the ER. A, B and C are regions of somewhat conserved sequence which contain the eight highly conserved cysteine residues. V is the variable region, which has little homology between proteins. ATPF is the amino-terminal processed fragment, IPF is the internal processed fragment or "linker" region, and CTPF is the carboxy-terminal processed fragment. The disulphide bridges are formed in the lumen of the ER, prior to the precursor protein being transported to the protein body where the processed fragments are removed.

1.2.5 Processing of the Precursor Protein

Comparisons between mature 2S albumin protein amino acid sequences and cDNA sequences have revealed that the initial product of translation is extensively processed, resulting in removal of approximately 20% of the amino acid sequence (De Castro et al., 1987, Ericson et al., 1986, Krebbers et al., 1988). The first step in the processing of the 2S albumin precursor protein is the removal of an ER signal sequence as it enters the lumen of the ER from the cytoplasm. The disulphide bonds form between the conserved cysteine residues within the lumen of the ER.
The propeptide is then targeted to the vacuole, where the amino-terminal peptide, and an internal linker region located between the small and large subunits, are removed, along with a few amino acids from the carboxy-terminus of the protein precursor (Figure 1) (see review by Shewry et al., 1995, Ampe et al., 1986, Krebbers et al., 1988, D'Hondt et al., 1993b, Muren and Rask, 1996, Altenbach et al., 1986, De Castro et al., 1987, Thoyts et al., 1996). Gayler et al. (1990) found that in Lupinus angustifolius the conglutin-δ precursor protein lacks an amino-terminal propeptide, i.e. the signal peptide is directly attached to the small subunit. In addition, the carboxy-terminus of the large subunit was not removed.

The extensive processing undergone by the precursor protein to give the mature protein, and the somewhat conserved nature of the fragments removed in processing, suggest that these regions are necessary or have some function. Detailed studies of the proteolytic processing of 2S albumin precursor peptides from Arabidopsis and B. napus have revealed that deletion or mutation of the processed fragments has little effect on targeting of the protein to protein bodies or on formation of the mature protein (Muren et al., 1995, D'Hondt et al., 1993b). However, deletion or modification of the Arabidopsis 2S albumin internal linker propeptide resulted in less efficient processing of the precursor protein, as significant amounts of the precursor remained unprocessed (D'Hondt et al., 1993b). This may not be due to structural changes within the precursor protein since the presence of the napin propeptides was not found to significantly alter the conformation of the precursor protein in comparison to the mature napin (Muren et al., 1996). No function has yet been assigned to these propeptides, though the removal of the amino-terminal and linker regions appears metabolically wasteful. However, Saalbach et al.
(1996) have determined that the processed four amino acid carboxy-terminal fragment (IAGF) from the Brazil nut 2S albumin is an essential part of a 20 amino acid carboxy-terminal sequence which targets the 2S albumin precursor protein to the vacuole in transgenic tobacco leaves. An integral membrane protein, BP-80, has been identified which may be a vacuolar targeting receptor, as it binds the Brazil nut 2S albumin carboxy-terminal fragment, as well as the amino-terminal sequences of two other proteins targeted to the vacuole (Kirsch et al., 1996).

The enzymes involved in the proteolytic processing of the 2S albumin proteins from the precursor to the mature form have only partially been characterised. The lack of conserved amino acid motifs within the processed regions of the peptide which would act as recognition sites for a proteolytic enzyme led Muren et al. (1995) to hypothesise that the proteolytic enzyme had low coding sequence specificity or that the proteolytic enzyme may recognise sequences which were not located at the exact site of processing. Alternatively, the prepropeptide may be processed by multiple enzymes recognising multiple motifs (Muren et al., 1995). Monsalve et al. (1990) hypothesise the existence of a beta-turn specific endoprotease in B. napus, based on the cleavage of the precursor protein at sites that are tetra-peptides with high beta-turn probabilities. One of the proteases involved in proteolytic processing is believed to be an aspartic proteinase, which cleaves at multiple sites (preferring Phe-Asp, but also recognising Asp-Met, Asp-Asp, and Asp-Ser) within the processed fragments in both Arabidopsis and B. napus 2S storage proteins (Muren and Rask, 1996, D'Hondt et al., 1993a). Hara-Nishimura et al. (1995) isolated an asparaginyl endopeptidase, a type of cysteine proteinase, from the vacuole of castor bean and soybean.
This enzyme was able to process 2S albumins and an 11S globulin by cleaving peptide bonds at the carboxy-terminal side of an exposed asparagine.

The mature 2S albumin protein is not glycosylated in A. thaliana (Krebbers et al., 1988) or Lupinus angustifolius (Gayler et al., 1990). Phosphorylation of serine residues by a calcium dependent protein kinase (CDPK) has been observed to occur in both the small and large chain of specific kohlrabi napin proteins (Neumann et al., 1996a, 1996b). A subset of napin-like proteins, characterised in radish (Polya et al., 1993), bitter melon and castor bean (Neumann et al., 1996c), is also phosphorylated at specific small subunit serine residues by the same kinase. Though the significance of this phosphorylation is unknown, Neumann et al. (1996c) suggest that it may be related to plant defensive mechanisms, as other cysteine-rich proteins (Bowman-Birk protease inhibitors, lipid transfer proteins, and γ-thionins) which are known to be involved in defence against pathogens are also phosphorylated by CDPK.

1.2.6 Amino Acid Homology

Amino acid sequences within the 2S albumin super family are quite variable, with the exception of the eight highly conserved cysteine residues which form the disulphide bridges that maintain the secondary structure of the mature protein. Amino acid identities of 2S albumins within the Brassicaceae are in the order of 70% (86% for processed regions of the protein and 66% for mature regions of the protein), and between Brassica and Brazil nut the identity drops to 22-24% (Gander et al., 1991). Identity between the mature Brazil nut and castor bean proteins is only 36% (Gander et al., 1991) and 44% between the cotton Mat5A, Brazil nut and Arabidopsis amino acid sequences (Galau et al., 1992).
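Identity figures such as those quoted above are, in essence, the fraction of aligned positions at which two sequences carry the same residue. The following is a minimal Python sketch of that calculation, not the method used in the cited studies. It assumes that the two sequences have already been aligned (gaps written as "-") and that columns containing a gap are left out of the denominator; published values may rest on other conventions. The function name and the short peptides are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: the inputs are two rows of an existing pairwise alignment,
    so they must have the same length. Gap columns are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == "-" or res_b == "-":
            continue  # skip columns that contain a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy peptides, not real 2S albumin sequences.
toy_identity = percent_identity("CQQWLHKQ", "CEQ-LRKQ")
```

In the toy example, seven columns are compared and five match. Counting gap columns as mismatches instead would lower the value, which is one reason why identity percentages from different studies are not always directly comparable.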
Sequence similarity at the nucleotide level between members of a 2S albumin gene family was found to be 85% within the genus Brassica (Dasgupta et al., 1995) and 88% within the genus Lupinus (Salmanowicz and Weder, 1997).

Antigenic cross-reactivity is another measure of protein similarity. Cross-reactivity was found between a 2S albumin isolated from Brassica campestris and other similar sized seed proteins in the Brassicaceae, but no cross-reactivity was observed with seed storage proteins of mung bean and tobacco (Dasgupta and Mandal, 1991). Monsalve and Rodriguez (1990) also found that antibodies raised against the mustard allergen Sin a I recognised 2S albumins from B. napus, B. rapa, and B. oleracea. Coulter and Bewley (1990) found only one instance of serological cross-reactivity between an antibody raised against an alfalfa 2S albumin and other legume seed storage protein extracts, indicating low amino acid homology among the legume 2S albumins.

1.2.7 Practical Applications of 2S Albumin Genes and Proteins

Many 2S albumin proteins are sulphur-rich due to high cysteine and methionine content relative to other seed storage proteins. A body of research has developed based on identifying 2S albumins with nutritional qualities of interest from different species, as well as modifying previously characterised 2S albumin genes for increased nutritive value. Plant Genetic Systems NV (Brussels, Belgium) has patented a process for increasing the nutritional content of plants by modifying 2S albumin genes (US5589615, 1997). Amino acid sequence comparisons among the dicot 2S albumin proteins indicate that the region located between the fourth and fifth Cys residues of the large subunit has a high degree of variability (Krebbers et al., 1993) (Figure 1). This "variable region" has since been shown to tolerate the addition of small biologically active peptides, as well as the addition or substitution of nutritionally important amino acids (reviewed in Table 2).
Genetic engineering of 2S albumin coding regions between species results in correct processing of the precursor peptide, as well as targeting to protein bodies, though the level of protein accumulation depends on the promoter used to drive expression (reviewed in Habben and Larkins, 1995). Forage legumes have been engineered for foliar expression of 2S albumin proteins by placing the coding region under the control of a 35S CaMV promoter. This results in low levels of protein accumulation throughout the plant, with protein being targeted to leaf mesophyll vacuoles (Saalbach et al., 1994). Chimeric 2S albumins with added ER retention signals have been used to prevent transport of the protein to the leaf vacuole where proteases may degrade the protein (Tabe et al., 1993, Tabe et al., 1995 and Khan et al., 1996). An anti-sense napin gene under the control of the napin promoter resulted in transgenic B. napus seeds with reduced or no napin, a concomitant increase in the amount of the seed storage protein cruciferin and altered fatty acid composition (Kohno-Murase et al., 1994).

Table 2: Transgenic Plants Transformed with 2S Albumin Coding Regions

2S albumin coding region from | Modification | Transgenic expression | Reference
Arabidopsis thaliana | addition of a neuro-peptide to the variable region | A. thaliana and B. napus | Vandekerckhove et al., 1989
Arabidopsis thaliana | - | seed-specific in N. tabacum | De Clercq et al., 1990a
Arabidopsis thaliana, Bertholletia excelsa | addition of Met residues to the variable region | seed-specific in A. thaliana, B. napus, and N. tabacum | De Clercq et al., 1990b
Arabidopsis thaliana | addition of Lys residues and 28 amino acid peptide from Xenopus to the variable region | seed-specific in Brassica napus | Krebbers et al., 1991
Arabidopsis thaliana | addition of Lys residues to the variable region | seed-specific in A. thaliana | Conceicao and Krebbers, 1994
Bertholletia excelsa | - | seed-specific in N. tabacum | Altenbach et al., 1989
Bertholletia excelsa | signal peptide from a soybean lectin gene | seed-specific in B. napus | Guerche et al., 1990
Bertholletia excelsa | - | seed-specific in B. napus | Altenbach et al., 1992
Bertholletia excelsa | - | constitutive expression in N. tabacum and Vicia narbonensis | Saalbach et al., 1994
Bertholletia excelsa | - | constitutive and seed-specific expression in N. tabacum and Vicia narbonensis | Saalbach et al., 1995a
Bertholletia excelsa | - | constitutive and seed-specific expression in N. tabacum and Vicia narbonensis | Saalbach et al., 1995b
Bertholletia excelsa | - | seed-specific in B. napus | Denis et al., 1995a, 1995b, 1996
Bertholletia excelsa | addition of 5 Trp residues to the variable region; single substitutions: Leu to Trp or Arg80 to Trp | constitutive expression in N. tabacum | Marcellino et al., 1996
Bertholletia excelsa | - | seed-specific in Vicia narbonensis | Pickardt et al., 1995
Bertholletia excelsa | - | seed-specific in Glycine max | Nordlee et al., 1996
Bertholletia excelsa | - | constitutive expression in Phaseolus vulgaris | Aragao et al., 1996
Bertholletia excelsa, Lecythis zabucajo | addition of Met residues in the variable region | constitutive and seed-specific in Solanum tuberosum and N. tabacum | Sun et al., 1996
Bertholletia excelsa | - | transiently expressed under constitutive and seed-specific promoters in Arachis hypogaea embryos | Lacorte et al., 1997
Brassica juncea | - | constitutive and seed-specific expression in N. tabacum | Ghosh et al., 1995
Brassica napus | modified 3' UTR for identification of mRNA | seed-specific in Brassica napus | Radke et al., 1988
Brassica napus | - | seed-specific in N. tabacum | Stayton et al., 1991
Helianthus annuus | 3' addition of an ER retention signal (amino acids: SEKDEL) | seed-specific in Pisum sativum and Lupin; foliar expression in lucerne and Trifolium subterraneum | Tabe et al., 1993
Helianthus annuus | 3' addition of an ER retention signal peptide | constitutive and foliar expression in Medicago sativa | Tabe et al., 1995
Helianthus annuus | 3' addition of an ER retention signal | foliar expression in Trifolium subterraneum | Khan et al., 1996
Helianthus annuus | - | seed-specific in Lupinus angustifolius | Molvig et al., 1997
Pisum sativum | - | seed-specific in N. tabacum and B. napus | Stayton et al., 1991

2S albumin proteins have also been expressed in prokaryotic and eukaryotic expression systems. Gonzalez de la Peña et al. (1996) expressed the mustard allergen Sin a I as a small and large subunit joined by the internal processed fragment in Escherichia coli. Though 98% of the chimeric protein was located in insoluble inclusion bodies, a soluble fraction was purified which shared many characteristics of the purified mature protein, suggesting correct three-dimensional folding. D'Hondt et al. (1993b) expressed the 2S albumin protein from Arabidopsis in yeast, but found that the precursor protein was not processed to the mature form. In contrast, Pal and Biswas (1995) reported the expression in yeast of a second Arabidopsis 2S albumin under the control of its own promoter, which was correctly processed and localised in vacuolar bodies. The precursor form of napin has also been expressed in a baculovirus system, where the signal peptide was cleaved from the prepronapin by the insect cells, but no further processing occurred (Muren and Rask, 1996).

The mabinlins comprise a group of 2S albumin proteins isolated from the seeds of Capparis masaikai with the unusual property of being able to elicit a sweet taste (Liu et al., 1993).
Individual mabinlin isoforms have been characterised as being 400 times sweeter than sucrose, and heat stable at commercial processing temperatures (Nirasawa et al., 1994, Sun et al., 1996). These proteins therefore have potential as "low calorie" sweeteners for the processed food industry (Nirasawa et al., 1993). Structural analysis of the mabinlin isoforms in relation to their food processing characteristics has revealed that "sweetness" is lost upon reduction of the disulphide bridges (Nirasawa et al., 1993) and that differential heat stability is associated with a single amino acid variation in the large subunit (Nirasawa et al., 1994).

Another group of 2S albumins with potential uses in industry has been identified from sunflower. Certain sunflower 2S albumins may be useful as emulsifiers or foaming agents (Gueguen et al., 1996).

1.2.8 2S Albumins Identified as Allergens

A number of the 2S albumin proteins characterised from various plant families have been shown to be allergenic, such as Bra j IE from Brassica juncea (Gonzalez de la Peña et al., 1991, Monsalve et al., 1993) and Sin a I from mustard (Menendez-Arias et al., 1988, Dominguez et al., 1990, Gonzalez de la Peña et al., 1996). The allergenicity of these two proteins is associated with the presence of a histidine residue in the large chain, which acts as an epitope for antibody recognition (Monsalve et al., 1993). In addition, conformational epitopes have been detected in the mustard allergen, Sin a I (Menendez-Arias et al., 1990). Other 2S albumins cited as being allergens have been characterized in rapeseed (Brassica napus) (Monsalve et al., 1997), castor bean (Ricinus communis) (Youle and Huang, 1978b, Thorpe et al., 1988, Machado and Godinho Da Silva Jr., 1992), soybean (Glycine max) (Moroz and Yang, 1980, Burks et al., 1988) and cotton (Gossypium hirsutum) (Youle and Huang, 1979).
Within the Monocotyledonae, members of the α-amylase / trypsin inhibitor family have been characterized as allergens in rice (Adachi et al., 1993, Nakase et al., 1996), barley (Barber et al., 1989) and wheat (Gomez et al., 1990). Nordlee et al. (1996) found that the 2S albumin from Brazil nut is a major allergen in individuals allergic to the nut and that transgenic soybeans which expressed the Brazil nut 2S albumin coding region would also induce an allergic reaction. A patient allergic to Brazil nut was shown to be specifically sensitive to both the 2S albumin and 12S legumin Brazil nut seed storage proteins, as well as having a significant reaction to presumably related proteins in hazel nut and mustard (Bartolome et al., 1997). The unusual stability of 2S albumin proteins may allow them to cross from the digestive tract into the blood stream and elicit an IgE-mediated immune reaction (Gonzalez de la Peña et al., 1996). Inhalation has also been shown to elicit an allergic response (Barber et al., 1989, Gomez et al., 1990, Monsalve et al., 1997). The potential for allergenicity shown by 2S albumins has important implications for their use in genetic engineering to improve nutritive quality (Nestle, 1996).

1.2.9 2S Albumins with Anti-Pathogenic Activity

Some 2S albumins may have a dual function as both seed storage protein and anti-fungal protein within the seed (Terras et al., 1992, 1993a). The antifungal activity of radish (Raphanus sativus) and B. napus 2S albumins is due to their ability to render the membranes of fungal hyphae excessively permeable (Terras et al., 1993b). Both of these dicot 2S albumins, as well as three related barley seed proteins (a trypsin inhibitor, and two Bowman-Birk-type inhibitors), act synergistically to increase the fungal inhibition of a purified barley thionin in vitro (Terras et al., 1993b). Kreis et al.
(1985) proposed that dicot 2S albumin storage proteins and the cereal trypsin and α-amylase inhibitors, which have insecticidal properties, were related based on amino acid homologies of three domains within these proteins. Numerous endosperm specific cereal α-amylase and / or trypsin inhibitors have been characterized. These proteins are similar in size to the dicot 2S albumins and contain ten conserved cysteine residues, eight of which align with the conserved cysteines characteristic of 2S albumins (Garcia-Olmedo et al., 1987, Rasmussen and Johansson, 1992). However, comparison of secondary structure between the dicot 2S albumins and α-amylase inhibitors from wheat and Indian finger millet indicates that the pattern of disulphide bonds is only partially conserved between the two groups (Egorov et al., 1996).

2S albumin small and large subunits from various species (Brassica napus, Momordica charantia, Ricinus communis, Raphanus sativus, Sinapis alba) have been identified as calmodulin antagonists by Polya et al. (1993) and Neumann et al. (1996a, 1996b, and 1996c). These researchers theorise that the calmodulin inhibitory activity may be involved with anti-fungal activity.

1.3 Seed-specific expression

The general pattern of expression of 2S albumin genes is seed-specific, with low expression during the early stages of embryo development, which increases to a high level during cotyledon development, and then declines or is turned off at seed maturity. Napin mRNA first appears during the late heart stage (Fernandez et al., 1991), and can also be detected during later developmental stages in the endosperm (Höglund et al., 1991, DeLisle and Crouch, 1989). In the mature seed of Arabidopsis thaliana, 2S albumin protein is localised within the embryo and vestigial endosperm (De Clercq et al., 1990a).
Differential expression within the seed by individual members of 2S albumin gene families has been observed in Arabidopsis (Guerche et al., 1990), Raphanus sativus (radish) (Laroche-Raynal and Delseny, 1986), and B. napus (Blundy et al., 1991). In Bertholletia excelsa (Brazil nut), where the hypocotyl is the main storage tissue of the embryo, 2S albumin mRNA is not detected until late embryo development, in stages 3 and 4 (Gander et al., 1991). In Ricinus communis (castor bean), 2S albumins accumulate in the endosperm during later embryo developmental stages and continue to accumulate during the desiccation stage (Irwin et al., 1990). Coulter and Bewley (1990) found that the Medicago sativa (alfalfa) 2S albumin protein began to accumulate in the early cotyledonary embryo stage and was degraded 72 hours post-germination.

In a few cases, napin expression has also been associated with pollen development. A napin promoter fused to an exotoxin A gene from Pseudomonas aeruginosa resulted in male sterile transgenic tobacco as well as blockage of embryo formation at the stage of napin accumulation (Koning et al., 1992). Interestingly, transgenic B. napus plants containing the same construct had viable pollen, suggesting that the napin promoter functioned differently in tobacco. Boutilier et al. (1994) further identified a subfamily of napin clones expressed in B. napus microspores induced to undergo embryogenesis. Expression was biphasic, occurring as a response to induction and again later as storage proteins began to accumulate in the microspore derived embryos.

1.4 Conifer Seed Storage Proteins and 2S Albumins

Small (10 kDa to 18 kDa) seed storage proteins have been identified in several gymnosperm species including Pinus contorta (Lammer and Gifford, 1989), Picea glauca/engelmannii (Flinn et al., 1993), Picea abies (Hakman, 1993), Pinus pinaster (Allona et al., 1992 and 1994) and Pinus taeda (King and Gifford, 1997).
Other research examining conifer seed storage protein profiles does not show proteins less than 14.4 kDa in size (e.g. Groome et al., 1991), but, due to their small size, these proteins could have run off the bottom of SDS-PAGE gels and been missed. Allona et al. (1994) characterized four low molecular weight globulins from Pinus pinaster, and found that each consisted of a small and large subunit joined by disulphide bridges, that they had high Arg and Glx (Glu and Gln combined) contents and that they probably contained the same number of cysteine residues as dicot 2S albumins. Flinn et al. (1993) identified a 15 kDa seed storage protein in Picea glauca/engelmannii, consisting of large and small subunits joined by disulphide bonds. A cDNA clone (GenBank accession X63193; Newton, 1991) coding for a putative 20 kDa peptide had earlier been isolated from late cotyledonary spruce (Picea glauca/engelmannii) somatic embryos. This clone had homology to the dicot 2S albumins and was expressed during the same embryo developmental stages as the protein. Five other conifer cDNA clones showing amino acid homology to the 2S albumin super family have been sequenced from Pinus strobus (GenBank accessions X62433, X62434, X62435 and X62436; Rice and Kamalay, 1991) and Picea glauca (GenBank accession L47745; Dong and Dunstan, 1995, Dong and Dunstan, 1996).

Flinn et al. (1993) found that 2S albumin mRNA accumulated in interior spruce (Picea glauca/engelmannii) zygotic embryos, as well as somatic embryos cultured on 40 µM abscisic acid (ABA) maturation medium. 2S albumin mRNA was first detected at the early cotyledonary stage and continued through to embryo maturity. In addition, it was observed that somatic embryos maturing on sub-optimal levels of ABA tended to germinate precociously and that the amount of 2S albumin mRNA declined as precocious germination proceeded.
The authors indicated that 2S albumin expression was up-regulated in response to ABA, and to high osmoticum caused by the addition of mannitol. Conversely, Dong and Dunstan (1996) found that a P. glauca 2S albumin cDNA (PgEMB25), though showing essentially the same pattern of developmental regulation as that observed by Flinn et al. (1993), was not up-regulated in response to ABA or the osmoticum PEG (polyethylene glycol) in P. glauca somatic embryo suspension cultures.

1.5 2S Albumin Promoter Studies in Dicotyledonae

The upstream regulatory regions, or promoters, of seed storage protein genes are of particular interest because they contain regions which are thought to mediate the high levels and precise timing of gene expression, as well as embryo and tissue specificity within the seed. Discrete regions, referred to as cis elements, are thought to interact with nuclear proteins known as transcription, or trans-acting, factors. Some of these regions are involved in the transcriptional response of genes to ABA, desiccation and plant nutritional levels (see review by Thomas, 1993, Bewley and Black, 1994 and Morton et al., 1995).

Promoter function has been explored using three main strategies. The primary method used to elucidate function is the fusion of the promoter to a reporter gene, followed by the transient or stable transformation of cells, isolated tissues or whole plants. This allows the pattern and relative strength of expression of a promoter to be measured and / or visualised, depending on the reporter gene used. The reporter gene most commonly used is uidA encoding the enzyme β-glucuronidase (Jefferson et al., 1987).
Other reporter genes which are suitable but used less frequently are bar, encoding phosphinothricin acetyltransferase; CAT, encoding chloramphenicol acetyl transferase; lux, encoding luciferase; and various toxin genes which encode products lethal to the cell such as diphtheria toxin A (DT-A), Pseudomonas aeruginosa exotoxin A or RNase. Lethal reporter genes are used to identify extremely low levels of expression and to pinpoint the earliest moment of expression by causing cell ablation, i.e. death of any cell which expresses the gene product. Further dissection of promoter function may be accomplished by progressive deletion of the promoter or by mutagenesis of specific elements within the promoter. Regions may also be duplicated or exchanged between promoters. A variation of this technique is to fuse regions of interest, sometimes repeated in tandem, to a minimal promoter to explore the minimal amount of sequence information necessary to produce a specific pattern of expression. Quantification of response to environmental or biochemical signals, such as dehydration or ABA (reviewed by Quatrano et al., 1993), is also possible using such gene constructs. As well, mutants of a signal transduction pathway may be characterized by transformation with promoter constructs suspected to be downstream of or to be affected by the mutation.

A second, indirect method used to identify potentially important regions within the promoter is sequence comparison. Within the coding regions of genes, conservation of amino acid sequence among members of a gene family is generally considered to be an indication of conserved function over evolutionary time. Similarly, attempts have been made to identify conserved promoter elements having functional significance by comparing 5' flanking regions of genes which are related evolutionarily, or are co-ordinately regulated, or which respond to the same signals (environmental or biochemical).
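A simple first step in such a comparison is to scan each 5' flanking region, on both strands, for short candidate motifs. The following minimal Python sketch illustrates the idea; it is not the procedure used in the studies cited here. It assumes that motifs are written with the IUPAC symbol N for "any base" and that positions are reported as 0-based offsets on the top strand. The function names and the test sequences are hypothetical; the motifs used in the example (ACGT, CANNTG, CCAC) are among the putative regulatory motifs listed in the abstract of this thesis.

```python
import re

# Complement table for unambiguous bases plus the wildcard N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string containing only A, C, G, T or N."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def _motif_regex(motif: str) -> "re.Pattern[str]":
    # N matches any base; the lookahead lets overlapping hits be reported.
    body = "".join("[ACGT]" if base == "N" else base for base in motif.upper())
    return re.compile(f"(?=({body}))")


def find_motif(sequence: str, motif: str) -> list:
    """Sorted 0-based top-strand start positions of `motif` on either strand."""
    seq = sequence.upper()
    hits = set()
    for pattern in (motif, reverse_complement(motif)):
        hits.update(match.start() for match in _motif_regex(pattern).finditer(seq))
    return sorted(hits)


# Invented sequence, for illustration only.
example_hits = {m: find_motif("GGACGTAACACCTGTT", m) for m in ("ACGT", "CANNTG", "CCAC")}
```

Motifs this short occur frequently by chance, so a hit found in this way is only a candidate; as the text notes, functional significance has to be confirmed by mutation or deletion in promoter/reporter gene fusions.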
The relative importance of these conserved elements to promoter function can then be confirmed by their mutation or deletion from promoter/reporter gene fusions.

A third strategy for identifying cis elements is to locate the regions of a promoter where nuclear proteins bind by using DNase footprinting or gel retardation protocols. Transcription factors are nuclear proteins which bind specific DNA sequences, thereby activating or repressing gene expression. Transcription factors act as links in signal transduction pathways. They are themselves regulated and can co-ordinate sets of genes by their binding to common promoter elements.

The function of 2S albumin promoters isolated from Arabidopsis thaliana (De Clercq et al., 1990a, Conceicao and Krebbers, 1994), Brassica napus (Radke et al., 1988, Koning et al., 1992, Stalberg et al., 1993, Ellerstrom et al., 1996, Stalberg et al., 1996), and Bertholletia excelsa (Brazil nut) (Grossi de Sa et al., 1994, Vincentz et al., 1997) has been explored by deletion studies investigating gain and loss of expression in transgenic tobacco, as well as transgenic Arabidopsis and B. napus. These studies have revealed a general picture in which dicot 2S albumin promoters direct developmentally-regulated, seed-specific expression in heterologous dicot species. However, differences in expression patterns have been observed for the same construct in different species. For example, in transgenic tobacco the AT2S1 gene from Arabidopsis is expressed at an earlier stage of seed development than in Arabidopsis (De Clercq et al., 1990a). Similarly, the B. napus napA gene is expressed in the tobacco endosperm at an earlier stage than in B. napus (Stalberg et al., 1993). Koning et al. (1992) observed the differential expression of a napin promoter / lethal gene fusion which arrested pollen development in transgenic tobacco but not B. napus.
Despite these differences in expression, their high expression and seed-specificity have made 2S albumin promoters popular candidates to direct transgene expression in the seed of B. napus (Stayton et al., 1991, Voelker et al., 1996, Roeckel et al., 1997), tobacco (Ghosh et al., 1995, Roeckel et al., 1997) and Arabidopsis (Broun and Somerville, 1997).

The hormone abscisic acid (ABA), as well as the process of dehydration, have been implicated in the correct maturation of somatic embryos in several plant species (Kermode, 1995). Eight hundred basepairs of a napin promoter is sufficient to down-regulate expression of the β-glucuronidase (GUS) reporter gene in response to dehydration in immature transgenic tobacco seeds (Jiang et al., 1995). GUS expression was also shown to be further down-regulated upon imbibition of the prematurely dehydrated seed. These authors also found that the 3' flanking region of the napin gene had no effect on the pattern of gene expression, but appeared to decrease absolute levels of expression when included in gene constructs. Further research revealed that the napin promoter was up-regulated in response to 10 µM ABA, but could be rendered insensitive to ABA by dehydration (Jiang et al., 1996), suggesting that hormonal control of storage protein accumulation may be quite complex. Koornneef et al. (1989) found that Arabidopsis plants which were double mutants lacking endogenous ABA (aba) production as well as being insensitive to ABA (abi3) did not accumulate 2S or 12S storage proteins, whereas single mutants homozygous for aba, abi\, or abil, and the double mutant aba,abi\, accumulated normal amounts of seed storage proteins.

Developmental regulation of an Arabidopsis 2S albumin promoter GUS fusion was found to be independent of morphological development of the embryo in studies which crossed transgenic A. thaliana plants with plants which had mutations in embryo morphology (emb mutants) (Devic et al., 1996).
Eight different emb mutants, arrested at various points in embryo development, showed correct temporal induction of AT2S:GUS expression when compared to wild-type embryos despite their mutant morphology. Emb mutants did, however, have higher GUS expression in the endosperm compared to wild-type seeds, due most likely to the lack of cotyledon development in the mutant embryos.

Promoter function is complex and not easily dissected. Elements have been identified which are necessary for but not sufficient on their own for seed-specific expression, suggesting that these promoter elements somehow interact with other elements in determining patterns of gene expression. Sequential deletion of 2S albumin promoters in a 5' to 3' direction generally results in decreased levels of expression, although seed specificity is maintained until approximately the -150 basepair position, where expression ceases or seed-specificity is lost. This pattern holds true for 2S albumin promoters from Brazil nut (Bertholletia excelsa), which retained seed specificity to the -49 position (Vincentz et al., 1997), for Brassica napus, which lost seed specificity at -152 (Radke et al., 1988, Stalberg et al., 1993, Ellerstrom et al., 1996, Stalberg et al., 1996) and in Arabidopsis, where two promoters were still seed-specific at -250 and -270 (Conceicao and Krebbers, 1994). The region between -1101 and -309 of the napin promoter contains a "negative" element (or elements), as removal of this region increases napin expression (Stalberg et al., 1993). A region responsible for cotyledonary expression was delineated by the exchange of regions between two differentially expressed A. thaliana 2S albumin promoters (Conceicao and Krebbers, 1994). Internal deletions and rearrangements have also been used to locate regions of the B. napus napin promoter responsible for changes in temporal and endosperm specific expression (Ellerstrom et al., 1996).
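The deletion end points quoted above are positions relative to the site of transcriptional initiation (+1), with -1 being the base immediately 5' of +1 and no position 0. The following minimal Python sketch shows how such a coordinate maps onto a truncated promoter fragment. It assumes that the 0-based index of the +1 base within the cloned sequence is known; the function name and the short sequence are hypothetical and serve only to illustrate the coordinate arithmetic.

```python
def five_prime_deletion(sequence: str, tss_index: int, endpoint: int) -> str:
    """Fragment running from `endpoint` to the 3' end of `sequence`.

    sequence  -- promoter plus downstream sequence, written 5' to 3'
    tss_index -- 0-based index of the +1 base (assumed to be known)
    endpoint  -- negative position relative to +1, e.g. -117
    """
    if endpoint >= 0:
        raise ValueError("endpoint must be negative (upstream of +1)")
    start = tss_index + endpoint  # position -1 sits at tss_index - 1
    if start < 0:
        raise ValueError("endpoint lies beyond the 5' end of the sequence")
    return sequence[start:]


# Invented 12-base sequence in which +1 is the A at index 9.
toy_sequence = "TTTTTACGTAGG"
toy_fragment = five_prime_deletion(toy_sequence, tss_index=9, endpoint=-4)
```

Applied to a real clone, the same function would be called with end points such as -653 and -117, the positions of the two PG2S distal deletions described in the abstract, once the index of the transcription start within the cloned fragment had been established.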
Sequence comparison within gene families and between species has been used to identify conserved motifs in the 5' flanking regions (Gander et al., 1991, Dasgupta et al., 1993, Adachi et al., 1993, Conceicao and Krebbers, 1994, Stalberg et al., 1996). Some of these putatively conserved elements have been observed interacting with nuclear proteins, or are located adjacent to the site of protein binding (Ericson et al., 1991, Gustavsson et al., 1991, Grossi de Sa et al., 1994, Nakase et al., 1996b, Vincentz et al., 1997). Nakase et al. (1996b) showed that the promoters of three seed-specific rice genes, a rice allergenic protein gene (a member of the 2S albumin super-family), a glutelin gene and a prolamin gene, interacted with the same transcription factors during competitive binding assays. Vincentz et al. (1997) showed that opaque-2 regulatory proteins from the monocots Zea mays and Coix lacryma-jobi bound to specific G-box core sequences within a Brazil nut (Bertholletia excelsa) 2S albumin promoter and, in addition, up-regulated expression of that promoter in a transient expression assay.

1.6 Characterisation of a Picea glauca 2S albumin genomic clone

Research on the seed storage proteins of Picea glauca (Moench) Voss was initiated by Dane Roberts and Barry Flinn at the Forest Biotechnology Centre - BC Research (Roberts et al., 1990a and 1991; Flinn et al., 1991a, Flinn et al., 1991b and Flinn et al., 1993). As part of his doctoral thesis at the University of British Columbia (1992), Flinn identified 11S legumin and 7S vicilin proteins and characterised their accumulation within developing zygotic and somatic Picea glauca and P. glauca (Moench) Voss / engelmannii Parry embryos (Flinn et al., 1991a, Flinn et al., 1991b, and Flinn et al., 1993).
In order to isolate cDNA clones encoding these storage proteins, cDNA libraries made from proembryos and early cotyledonary stage somatic embryos were screened by differential hybridization by Craig Newton (Forest Biotechnology Centre - BC Research Inc., 3650 Wesbrook Mall, Vancouver, B.C., V6S 2L2). Thirty-seven clones were isolated that were uniquely expressed in the early cotyledonary stage somatic embryos, the stage at which storage proteins were beginning to accumulate (Flinn et al., 1991a). Cross-hybridization experiments and sequencing identified a single cDNA clone encoding a legumin and five vicilin cDNA clones (Newton et al., 1992). Interestingly, the majority of cDNA clones (21/37) were found to encode a small protein which had homology with the dicot 2S albumins (Dr. C. Newton, unpublished, Forest Biotechnology Centre - BC Research Inc., 3650 Wesbrook Mall, Vancouver, B.C., V6S 2L2). Four of the twenty-one cDNA clones were sequenced and found to differ, especially in the 5' and 3' untranslated regions. This result indicated that the white spruce 2S albumin gene family had at least four members, or at the very least 2 copies per haploid genome.

Further work confirmed the presence of a 15 kDa protein in embryos which accumulated earlier than the legumin and vicilin proteins, and which was up-regulated in maturing somatic spruce embryos by ABA and high osmoticum (Flinn et al., 1993). Flinn found that the 15 kDa seed storage protein was located in protein bodies in interior spruce zygotic and somatic embryos (1993) and the megagametophyte, along with the spruce legumin and vicilin storage proteins (unpublished, B. Flinn, Genesis Research and Development Corp. Ltd., P.O. Box 50, Auckland, N.Z.). Flinn attempted to directly sequence the protein but was unsuccessful as the N-terminal sequence was blocked.
A white spruce genomic library was subsequently screened using the 2S albumin cDNA clone II5G001 (GenBank accession X63193) as a probe, and three out of twenty-three positive clones were sequenced (Dr. C. Newton, unpublished, Forest Biotechnology Centre - BC Research Inc., 3650 Wesbrook Mall, Vancouver, B.C., V6S 2L2). Two of the sequenced genomic clones were found to be pseudogenes, containing an insertion in one case and an in-frame stop codon in the other; in addition they had only 70% identity with the original cDNA clone. Northern hybridization with gene-specific oligonucleotide primers indicated that these genomic clones were not expressed or were expressed at very low levels in developing white spruce somatic embryos (Dr. C. Newton, unpublished, Forest Biotechnology Centre - BC Research Inc., 3650 Wesbrook Mall, Vancouver, B.C., V6S 2L2). These data indicated that, in addition to the expressed members of the white spruce 2S albumin gene family, there are also non-functional pseudogene members of the family. Newton re-screened the genomic library using a synthetic oligonucleotide probe which was specific to the highly expressed 2S albumin cDNA clone II5G001 and which did not cross-hybridize with any of the 23 initial lambda clones. Two lambda clones were isolated during the re-screening of the genomic library. One of these clones, λ3.2, became the object of my thesis research. When this work was begun, there was little information available in general on conifer seed storage proteins and no information on conifer gene promoter function. The white spruce 2S albumin gene was particularly interesting because it was a member of a gene family known to be expressed at high levels during specific stages of seed development and was possibly co-ordinately regulated, along with the legumin and vicilin seed storage proteins.
There was extensive research in the literature on angiosperm 2S albumin proteins and genes with which to compare and contrast the conifer homologue. Using the white spruce 2S albumin gene as a model of a developmentally regulated, tissue-specific gene, I hoped to explore whether there were fundamental differences between gymnosperm and angiosperm regulatory sequences. There was also the possibility that by dissecting this promoter, cis-elements responsible for the tissue specificity and high levels of expression of the 2S albumin protein could be identified. Such elements might be useful in the genetic engineering of conifers, especially if significant differences existed between angiosperm and gymnosperm regulatory sequences. It was unknown whether a gymnosperm promoter would function to direct the expression of a reporter gene in the angiosperm tobacco. Even within the Angiosperms, sufficient differences exist between the Monocotyledonae and the Dicotyledonae in the cellular machinery that introns are not always spliced correctly between groups (Simpson and Filipowicz, 1996) and heterologous promoters are not always functional (Connelly et al., 1994). An alternative to the stable transformation of tobacco plants is the transient expression of promoter:reporter gene constructs by microprojectile bombardment into plant tissues. Whether developmental regulation of a gene could occur during transient expression was also unknown. This work represents the first example of the stable expression of a reporter gene under control of a gymnosperm promoter in an angiosperm (tobacco) and shows that, with the proper controls, questions about the tissue-specificity and the developmental regulation of promoter constructs can be answered in a transient expression system.

The objectives of this thesis were to:
• Isolate a genomic clone encoding a functional gymnosperm 2S albumin gene.
• Sequence this gene and compare it with related angiosperm sequences.
• Characterise the function of the promoter in a homologous gymnosperm (white spruce somatic embryos) and heterologous angiosperm (tobacco) through promoter:GUS reporter gene fusions.

In the course of this research, two regions with homology to the highly expressed 2S albumin cDNA clone were identified and sequenced from the lambda clone λ3.2. One of the putative 2S albumin genes was identified as a pseudogene, but the second gene was apparently functional. The functional genomic clone was found to contain an intron, in contrast to most dicot 2S albumin genomic sequences, which lack introns. Comparison of the sequence from the proximal promoter region (+62 to -400) of the Picea glauca 2S albumin gene with those from dicot 2S albumins revealed conserved motifs characteristic of the dicot 2S albumins. Northern analysis confirmed that the spruce 2S albumin clone was expressed in a seed-specific manner and showed a pattern of expression typical of seed storage protein genes.

The promoter of the putatively functional white spruce 2S albumin gene (PG2S) was translationally fused to the uidA (GUS) reporter gene. Two deletions were made to the full length promoter, one reducing it from 2.3 kb to approximately 700 basepairs, and the second reducing it to 179 basepairs. Function of the gymnosperm promoter in an angiosperm background was explored by stably transforming tobacco with the full length and 700 basepair truncated promoter constructs. Ease of transformation and larger seed size influenced the choice of tobacco over Arabidopsis as our model transgenic plant. The Picea glauca 2S albumin promoter constructs directed GUS expression specifically to the tobacco embryo from heart stage to embryo maturity, with no expression in the tobacco endosperm. Promoter function was also observed in a homologous system by transiently expressing the three promoter constructs in different developmental stages of spruce somatic embryos, somatic germinants and pollen.
The embryos produced by somatic embryogenesis have levels of storage protein (Flinn et al., 1991a, and Flinn et al., 1993) and lipids (Cyr et al., 1991) comparable to zygotic embryos and are able to germinate with high frequency (Webster et al., 1990). The choice of somatic embryos as an experimental system was influenced by the fact that spruce somatic embryos can be harvested in large numbers at specific stages of development, which is necessary for the preparation of biolistic targets. It would have been prohibitive to dissect sufficient zygotic embryos for statistical comparison between treatments. In addition, the early stages of conifer zygotic embryos would only be available for brief periods of time, as pollination occurs once a year and embryo development occurs between May and September (Owens and Molder, 1984). Transient expression using the spruce 2S albumin promoter constructs was accomplished using a biolistic device, the DuPont PDS/He 1000 Gene Gun. Transient expression mirrored expression of the native gene as seen by Northern blot analysis, with the exception of high levels of transient expression observed in mature embryos which had been partially dried and in germinating white spruce pollen grains. The pattern of 2S albumin transient expression indicates the presence of elements which enhance expression located between 2.3 kb and 700 basepairs. Seed specificity appears to be retained even in the smallest (179 basepair) 2S albumin construct. This work shows the relationship of the spruce genomic clone to dicot 2S albumin sequences previously characterised. In addition to sharing structural similarities based on the conservation of the eight cysteine residue framework within the protein, motifs within the promoter of this gene are conserved in sequence and function.
Though the putative regulatory motifs identified are small, are not conserved exactly in position between the gymnosperm and angiosperm sequences, and are not even highly conserved among the dicot 2S albumins, they provide sufficient information to direct the proper developmental and tissue-specific expression of a reporter gene controlled by a spruce 2S albumin promoter in tobacco. This suggests that in addition to the conservation of cis-elements between the gymnosperms and the angiosperms, the trans-acting factors with which they interact must also be conserved.

CHAPTER TWO

Methods and Materials

Two genomic clones, λ2.1 and λ3.2, were isolated from a white spruce genomic library using a synthetic oligonucleotide II5G.1 (Table 3) (Newton, unpublished; Dr. Craig Newton - Forest Biotechnology Centre - BC Research Inc., 3650 Wesbrook Mall, Vancouver, B.C., V6S 2L2). The genomic library (PNFI-X-88), prepared by Linda Deverno, Petawawa National Forestry Institute, Chalk River, Ont., consisted of partially digested Sau3A white spruce genomic DNA packaged in the lambda bacteriophage vector EMBL3 (Frischauf et al., 1983). The sequence contained in the oligonucleotide II5G.1 served to differentiate between high and low expressing members of the 2S albumin cDNA family (C. Newton, unpublished, Forest Biotechnology Centre - BC Research Inc., 3650 Wesbrook Mall, Vancouver, B.C., V6S 2L2). By using this oligonucleotide it was hoped that the genomic clones recovered would be ones that were expressed at high levels in the developing Picea glauca seed. The II5G series of sequencing primers, II5G.1 to II5G.5 (Table 3), were designed and used by Craig Newton to sequence several 2S albumin cDNA clones in both directions. Subsequently, I used them in sequencing the genomic clones contained in λ3.2, as well as for the hybridization of Southern blots of subcloned DNA fragments.
Table 3: Oligonucleotide Sequencing Primers

PRIMER                    5' to 3' SEQUENCE            Tm (4)
II5G.1 (1)                GAACCATTTGAGCGTCAGCC         62 °C
II5G.2 (1)                CAGCATCTCTCCGATGG            54 °C
II5G.3 (1)                TAGCGAGAGTGGCGTTG            54 °C
II5G.4 (1)                ATGGGTGTCTTTTCCCCTTC         60 °C
II5G.5 (1)                CACTTAAAACTGCTGCCCGT         60 °C
GUS fusion junction (2)   TCACGGGTTGGGGTTTCTAC         62 °C
pUC/M13 Forward (3)       CGCCAGGGTTTTCCCAGTCACGAC     78 °C
pUC/M13 Reverse (3)       TCACACAGGAAACAGCTATGAC       64 °C

1. Nucleic Acid - Protein Service Lab (U.B.C., Vancouver, BC)
2. Clontech Laboratories (Palo Alto, CA)
3. Promega (Madison, WI)
4. Tm = 2(A+T) + 4(G+C), Wallace et al., 1979.

2.1 General Molecular Biology Techniques

2.1.1 Lambda Bacteriophage Protocols

Preparation of Plating Bacteria Working Stock

Plating bacteria were prepared as described in Sambrook et al. (1989). The Escherichia coli strain ER1647, taken from a 15% glycerol stock stored at -80 °C, was streaked onto TB medium (Appendix B, page 187) and grown at 37 °C overnight. A single colony was picked and used to inoculate a 250 ml flask containing 50 ml TB broth plus 0.2% maltose (from a 20% filter-sterilised stock), and incubated overnight at 37 °C on an orbital shaker at 250 rpm. After 16 hours, the bacterial culture was divided into two 50 ml conical centrifuge tubes and centrifuged at 2500 rpm for 10 minutes in a Beckman GP benchtop centrifuge (Beckman Instruments Inc.). The supernatant was removed and the cells were resuspended in filter-sterilised 10 mM MgSO4 (approximately 5 to 7 ml) to give an optical density reading (OD) at 600 nm of 2.0, equivalent to a bacterial concentration of 1.6 x 10^9 cells/ml. Pooled aliquots were stored at 4 °C for up to 3 weeks prior to use.
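The Tm column of Table 3 follows the Wallace rule given in footnote 4 of that table. As a minimal sketch of the arithmetic (the function name `wallace_tm` and the dictionary `TABLE_3` are illustrative names introduced here, not part of the original work), the rule can be applied to the listed primer sequences as follows. The rule is only a rough estimate intended for short oligonucleotides.

```python
def wallace_tm(primer: str) -> int:
    """Estimate the melting temperature (°C) of a short oligonucleotide
    using the Wallace rule: Tm = 2(A+T) + 4(G+C)."""
    seq = "".join(primer.split()).upper()  # drop whitespace, normalise case
    unknown = set(seq) - set("ACGT")
    if unknown:
        raise ValueError(f"unexpected characters in primer: {sorted(unknown)}")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Primer sequences as listed in Table 3 (5' to 3').
TABLE_3 = {
    "II5G.1": "GAACCATTTGAGCGTCAGCC",
    "II5G.2": "CAGCATCTCTCCGATGG",
    "II5G.3": "TAGCGAGAGTGGCGTTG",
    "II5G.4": "ATGGGTGTCTTTTCCCCTTC",
    "II5G.5": "CACTTAAAACTGCTGCCCGT",
    "GUS fusion junction": "TCACGGGTTGGGGTTTCTAC",
    "pUC/M13 Forward": "CGCCAGGGTTTTCCCAGTCACGAC",
    "pUC/M13 Reverse": "TCACACAGGAAACAGCTATGAC",
}

if __name__ == "__main__":
    for name, seq in TABLE_3.items():
        print(f"{name}: {wallace_tm(seq)} °C")
```

Run as a script, this reproduces the eight Tm values reported in Table 3.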
Bacteriophage Multiplication

Single plaques were used to produce pure phage stocks by the plate lysate method, essentially as described in Sambrook et al. (1989). A single plaque was harvested, using a sterile 200 µl plastic pipette tip, and transferred into a sterile 10 ml culture tube containing 0.3 ml of ER1647 host bacteria suspension. Bacteriophage and host were incubated without agitation at 37 °C for 30 minutes. TB top agar (TB medium containing 0.6% agar) was melted in a microwave oven on low power and cooled to 50 °C. Under sterile conditions, 6.5 ml of the TB top agar was pipetted into the culture tube. The agar was added so that it ran down the side of the culture tube, forming no bubbles in the media. The mixture was immediately poured onto a room temperature 140 mm diameter TB petri plate and quickly swirled to spread the soft agar evenly. Plates were incubated in an upright position for 6 to 8 hours at 37 °C, until plaques were visible.

Phage were recovered by adding 10 ml SM buffer (Appendix B, page 184) to each plate, with gentle agitation for several hours on an orbital shaker at 4 °C. The liquid was collected using a sterile Pasteur pipette and transferred to 50 ml conical centrifuge tubes. A further 3 ml of fresh SM was added to each plate, swirled, and the plate placed at an angle for 15 minutes until the liquid had drained from its surface. This was added to the first 10 ml of SM collected. A 0.4% volume of chloroform was added to the pooled SM to kill residual bacteria, and the mixture was vortexed briefly and centrifuged at 4000 rpm for 10 minutes at 4 °C. Phage stocks were stored at 4 °C.

In order to increase bacteriophage titre, initial stocks were diluted 1:10 and 1:100 with SM buffer. In a sterile 10 ml culture tube, 0.3 ml of diluted phage was gently mixed with 0.3 ml of ER1647 working stock and incubated at 37 °C for 20 to 30 minutes without agitation.
Three replicates from each dilution were prepared, working from the highest dilution to the lowest, and plated using the method given above. Plates were incubated in an upright position for 6 to 8 hours at 37 °C until plaques had grown so that their edges touched. Phage particles were collected in SM buffer as above.

Calculation of Bacteriophage Titre

A serial dilution of phage stocks was done with SM buffer in 1 ml total volume to calculate titre. Diluted phage (100 µl) were mixed with an equivalent volume of the plating bacteria, ER1647, in a 10 ml disposable culture tube, and incubated at 37 °C for 30 minutes. Melted TB top agar was cooled to 50 °C, 2.5 ml added to the culture, and the mixture poured onto a 90 mm TB petri plate. The plate was swirled to evenly disperse the top layer. Plates were incubated in an upright position at 37 °C until plaques were visible in the lawn of ER1647 (approximately 6 hours). Plaques were then counted, and pfu/ml calculated:

pfu/ml = number of plaques in 0.1 ml x dilution factor x 10

Phage stocks were most useful when the titre was around 10^10 or 10^11 plaque forming units (pfu) per ml.

2.1.2 Isolation of DNA and RNA

Lambda Bacteriophage DNA

To produce large numbers of phage for DNA purification, 5 ml of ER1647 culture (approximately 5 x 10^9 bacteria) was mixed with 0.5 ml of phage stock (approximately 6 x 10^8 pfu) diluted with 4 ml of SM and incubated at 37 °C for 20 minutes without agitation. The host/phage mixture was added to 250 ml of pre-warmed TB in a 1 l Erlenmeyer flask and placed on an orbital shaker for 3 to 4 hours at 250 rpm, until bacterial lysis was visible. At this point, 5 ml of chloroform were added and the flask returned to the orbital shaker for another 10 minutes to further lyse the bacteria. The culture was cooled to room temperature and 25 µl of pancreatic DNase I (Sigma-Aldrich Canada, Ltd.) and RNase A (Sigma) were added to give a final concentration of 1 µg/ml.
The flask was incubated at room temperature for 30 minutes, then 14.6 g of NaCl was added (1 M NaCl final concentration) and dissolved by gentle swirling. The solution was placed on ice for 25 minutes, then decanted into a 250 ml centrifuge bottle, taking care to leave the chloroform behind in the bottom of the glass flask. Bacterial debris was removed by centrifugation at 11,000 g for 10 minutes at 4 °C in a Beckman J2-21 centrifuge, JA-14 rotor (Beckman Instruments Inc.). The supernatant was poured into a clean flask and 25 g of solid polyethylene glycol, MW 8000 (PEG 8000), was added to make a 10% solution. PEG was dissolved at room temperature by slow stirring on a magnetic stir plate. The PEG/phage solution was placed on ice at 4 °C for one hour or overnight, then centrifuged at 11,000 g for 10 minutes at 4 °C to recover the phage particles. The pellet was drained and resuspended in 4 ml of SM. Suspended phage particles were transferred to a 50 ml sterile conical centrifuge tube, 4 ml of chloroform were added and the mixture vortexed for 30 seconds. This solution was centrifuged for 15 minutes at 4 °C in a Beckman GP benchtop centrifuge. The aqueous phase was transferred to a thin walled ultra-centrifuge tube (Beckman Ultra-Clear™, 14 x 89 mm, Beckman). Phage particles were collected by ultra-centrifugation at 25,000 rpm for 2 hours at 4 °C, using the swinging bucket rotor, SW41, in a Beckman L8-70M Ultracentrifuge (Beckman Instruments Inc.). The glassy pellet was resuspended overnight in 1 ml of SM by gently agitating the centrifuge tube at 4 °C on an orbital shaker. Persistent lumps were dispersed by slowly pipetting up and down. Phage particles were lysed, to separate the viral protein coat from the DNA, by adding 25 µl 0.5 M EDTA (pH 8.0), 2.5 µl of Proteinase K (20 mg/ml stock) (Sigma) and 50 µl of 10% sodium dodecyl sulfate (SDS), mixing by inversion and incubating at 56 °C for 1 hour.
The mixture was cooled to room temperature and divided between two 1.5 ml microfuge tubes. An equal volume of Tris equilibrated phenol (Appendix B, page 185) was added, the solution extracted, and the aqueous phase removed to a fresh tube, leaving behind the phenol and white interface layer. The solution was extracted twice more with 1:1 phenol/chloroform, then once with chloroform alone. The aqueous phases were combined in one tube. DNA was precipitated by the addition of half a volume of 3 M ammonium acetate (pH 7.0) and 2.5 volumes of 95% ethanol, and mixed by inversion. The DNA was precipitated at room temperature for 30 minutes before being centrifuged at 10,000 g for 20 minutes. The pellet was washed with 95% ethanol and air dried. The DNA pellet was redissolved in 1 ml of TE (pH 7.0) overnight.

Escherichia coli Plasmid Mini-prep (based on Holmes and Quigley, 1981)

Small amounts (approximately 20 µg) of plasmid DNA were purified by boiling mini-prep to check the results of ligation reactions and to supply double stranded DNA template for sequencing reactions. Half of a 3 ml overnight E. coli culture was placed in a microfuge tube, spun down for 1 minute and the supernatant discarded. The remainder of the original culture was held at 4 °C. The pellet was resuspended in 0.3 ml of STET buffer (Appendix B, page 185) by vigorous vortexing. Bacteria were lysed by adding 25 µl of a fresh lysozyme (Sigma) solution (10 mg/ml in TE pH 8.0), mixing by inversion and floating the tubes (uncapped) in a boiling water bath for 50 seconds. Immediately following, the microfuge tubes were capped and centrifuged for 10 minutes at 14,000 rpm. The gelatinous pellet was removed with a sterile toothpick and discarded. DNA in the supernatant was precipitated with a ½ volume of 7.5 M ammonium acetate (150 µl) and 3 volumes of 95% ethanol (900 µl), mixed by inversion and placed at -20 °C for 30 minutes or more.
The DNA was pelleted by centrifugation at 14,000 rpm for 10 minutes and washed with ice cold 70% ethanol before being air dried and resuspended in 40 µl TE (or distilled water).

Agrobacterium tumefaciens Plasmid Mini-prep

A. tumefaciens was grown overnight at 28 °C in 5 ml of 523 medium (Appendix B, page 186) with the appropriate antibiotics. The culture was transferred to two 1.5 ml microfuge tubes and centrifuged for 60 seconds. The supernatant was discarded and the pellet washed with 1 ml Agro Wash solution (Appendix B, page 181) by vortexing, and centrifuging again. The wash solution was removed and the pellet resuspended in 100 µl of ice cold lysozyme solution (Appendix B, page 182). This mixture was incubated for 10 minutes at room temperature. Next, 200 µl of 1% SDS, 0.2 N NaOH solution was added, mixed by gently inverting and incubated on ice for 5 minutes. Ice cold potassium acetate solution (150 µl, pH 4.8) (Appendix B, page 183) was added and mixed by gently vortexing in an inverted position for 10 seconds. The tube was returned to ice for 5 minutes, then centrifuged for 5 minutes and the supernatant transferred to a new microfuge tube. The supernatant was extracted with an equal volume of phenol/chloroform and the aqueous phase removed to a clean microfuge tube. Plasmid DNA was precipitated with 2 volumes of room temperature ethanol for 2 minutes. The DNA pellet was collected by centrifugation (5 minutes at 14,000 rpm), washed with 70% ethanol twice and dried briefly in a Speed Vac, before being resuspended in 25 µl of TE.

Large Scale Plasmid Isolation by Alkaline Lysis

The DNA of certain plasmids was required in large amounts, for restriction mapping, subcloning, and for precipitating onto microprojectiles used in transient expression experiments. Such clones were grown overnight at 37 °C at 250 rpm in 500 ml of Terrific broth (Appendix B, page 187) containing the appropriate antibiotic(s). E.
coli cells were harvested by centrifugation in 250 ml centrifuge bottles using a Beckman JA-14 rotor at 5000 rpm, 4 °C for 10 minutes. The bacterial pellet was resuspended, by pipetting gently up and down, in 20 ml of freshly made buffered lysozyme solution (Appendix B, page 182) containing 10 mg/ml lysozyme (Sigma). The suspension was incubated at room temperature for 10 minutes, then 40 ml of freshly prepared alkaline SDS solution (Appendix B, page 181) was added to further lyse the bacteria. This mixture was placed on ice for 30 minutes. Ice cold 5 M potassium acetate (30 ml) was added, swirled to mix, incubated on ice for a minimum of 30 minutes or overnight, and then centrifuged at 10,000 rpm for 40 minutes at 4 °C in the JA-14 rotor. The supernatant was decanted into a clean flask and any white flakes removed using a cheesecloth filter. The DNA was precipitated from the supernatant by the addition of 2 volumes of 95% ethanol on ice or 0.6 volumes of isopropanol at room temperature, for a minimum of 30 minutes. The pellet was collected by centrifugation at 10,000 rpm for 40 minutes at 4 °C (if ethanol was used) or room temperature (if isopropanol was used). The supernatant was poured off, the pellet washed twice with 70% ethanol and air dried. The DNA pellet was washed from the walls of the centrifuge bottle, resuspended in 2.4 ml distilled water and further purified by cesium chloride gradient centrifugation.

Cesium Chloride Gradient DNA Purification

DNA from the alkaline lysis protocol, suspended in 2.4 ml distilled water, was mixed with 0.4 ml ethidium bromide (10 mg/ml) and 4.2 g of CsCl in a 15 ml Corex tube. The solution was heated slightly under hot running water to dissolve lumps of CsCl. Corex tubes were placed in thick rubber adapters for the JA-21 rotor and centrifuged at 6000 rpm at room temperature for 5 minutes.
A Quick-Seal™ ultracentrifuge tube (Beckman) was partially filled with 8 ml of light CsCl solution (63 g/100 ml) using a 10 ml syringe and needle. Using a long Pasteur pipette attached to an automatic pipettor, the CsCl/plasmid solution from the Corex tube was placed at the bottom of the ultracentrifuge tube under the light CsCl. Care was taken to avoid transferring the "roof" of protein floating in the Corex tube. Ultracentrifuge tubes were filled to the top with light CsCl and pairs of tubes balanced to within 0.01 g. The ultracentrifuge tubes were heat sealed, placed in a Ti70 fixed angle rotor and centrifuged at 40,000 rpm for 18 hours at 20 °C (acceleration and deceleration programs were both set at 1) in a Beckman L8-70M ultracentrifuge (Beckman Instruments Inc.). The plasmid band was the lower band of DNA visible in the ultracentrifuge tube. DNA bands were generally visible without using UV light. The top of the tube was punctured with an 18 gauge needle before removing the plasmid band with a 21 gauge needle attached to a 3 ml syringe. Water saturated isobutanol was used to remove the ethidium bromide from the DNA by mixing equal volumes by inversion, and discarding the alcohol phase until both phases became colourless. The aqueous phase, containing the DNA, was diluted with 2 volumes of distilled water and the DNA precipitated with 2 volumes of 95% ethanol at -20 °C for several hours in a 30 ml Corex tube. The DNA pellet was collected by centrifugation at 10,000 rpm for 15 minutes at 4 °C in a JA-21 rotor. The pellet was washed with 70% ethanol, air dried and resuspended in 1 ml of distilled water (or TE).

Spruce Genomic DNA

Interior spruce genomic DNA was prepared from mature somatic embryos of culture line W70. Thirty embryos (approximately 100 mg) were placed in a 1.5 ml microfuge tube and ground to a fine powder in liquid nitrogen using a metal pestle attached to a standard carpentry drill.
The powdered tissue was incubated at 65 °C for 15 minutes in 1 ml of CTAB Extraction buffer (Appendix B, page 181) and extracted with 1 volume of chloroform : isoamyl alcohol (24:1). The aqueous layer (approximately 600 µl) was removed to a fresh tube and the DNA precipitated with a 1/10 volume of 3 M ammonium acetate and 1 volume of isopropanol. After centrifugation at 7500 rpm in a microcentrifuge, the DNA pellet was washed with 600 µl of 70% ethanol, air dried, and resuspended in 300 µl of TE.

Spruce RNA Purification

Samples representative of the interior spruce tissues prepared for microprojectile bombardment were frozen in liquid nitrogen and stored at -80 °C. Total RNA was extracted using TRIZOL™ Reagent (a phenol and guanidine isothiocyanate solution, Gibco BRL) according to the supplier's instructions. RNase-free disposable plasticware and DEPC-treated solutions were used under sterile conditions to prevent the degradation of the RNA as it was being isolated. Fifty to one hundred milligrams of spruce tissue were ground in liquid nitrogen using a prechilled glass rod in a 1.5 ml screw cap microfuge tube. The sample was suspended in 1 ml TRIZOL™ reagent by vortexing and incubated at room temperature for 5 minutes. Samples were centrifuged at 14,000 rpm to pellet the insoluble cell wall fraction and the supernatant was transferred to a clean screw cap microfuge tube. Two hundred microlitres of chloroform (without additives) was added and each sample was vigorously shaken by hand for 15 seconds, before being set aside to incubate at room temperature for 3 minutes. The aqueous phase, which contained the RNA, was separated from the organic phase by centrifugation at 11,750 rpm for 15 minutes at 4 °C and transferred to a new microfuge tube. RNA was precipitated by the addition of 500 µl isopropyl alcohol, followed by a 10 minute incubation at room temperature, and collected by centrifugation at 11,750 rpm for 10 minutes at 4 °C.
The supernatant was removed and the gel-like pellet washed with 1 ml 75% ethanol (made with diethylpyrocarbonate (DEPC) treated distilled water, Appendix B, page 183). The pellet was vortexed, then centrifuged at 9500 rpm for 5 minutes at 4 °C, the supernatant discarded and the pellet air dried for 10 minutes. The RNA pellet was dissolved in 20 to 50 µl RNase-free 0.5% SDS by mixing gently with a pipette tip and incubating at 60 °C for 10 minutes. RNA concentrations of the samples were calculated by measuring light absorbance at 260 nm wavelength (A260) of 1 µl of sample diluted in 99 µl of distilled water:

RNA (µg/ml) = (A260 units measured) x (40 µg/ml RNA per A260 unit) x (dilution factor)

Tobacco Genomic DNA (Doyle and Doyle, 1990)

Four to six healthy tobacco leaves were picked from a single plant (Nicotiana tabacum cv. Xanthi) and ground to a fine powder using liquid N2 in a pre-chilled mortar and pestle. Ground leaf tissue was transferred to plastic scintillation vials, sitting in liquid N2, then stored at -80 °C. Residual liquid N2 was allowed to evaporate before the vials were capped. DNA extraction was begun by rapidly vortexing approximately 0.5 to 1 g of frozen powdered leaf tissue with 5 ml pre-heated (60 °C) CTAB II extraction buffer (Appendix B, page 181) in a 15 ml Corex tube. Samples were processed one at a time to avoid excessive thawing and degradation of DNA. The CTAB/leaf mixtures were incubated at 60 °C for 30 minutes. After the incubation period, an equal volume of chloroform : isoamyl alcohol (24:1) was used to extract the mixture by gentle inversion. Phases were separated by centrifugation at room temperature in the JA-20 rotor at 4000 rpm for 15 minutes. The top aqueous phase (about 5 ml) was removed to a clean 15 ml Corex tube, avoiding any floating fragments of plant tissue. Nucleic acids were precipitated from the aqueous phase by the addition of 3.3 ml chilled isopropanol.
Gentle rocking of the tube caused long white strands of nucleic acid to appear. A Pasteur pipette with the tip melted into a hook was used to collect the DNA/RNA strands, which were then placed into a microfuge tube containing 1 ml wash solution (10 mM ammonium acetate in 76% ethanol). The strands of nucleic acid were washed by rocking the microfuge tubes back and forth. The wash solution was changed and the rocking repeated until the DNA/RNA precipitate was colourless. The pellet was spun down at 10,000 rpm for 5 minutes, air dried and redissolved in 1 ml of TE. Insoluble material was removed by centrifugation. The nucleic acids were further purified by reprecipitating in 2 volumes distilled water, ½ volume ammonium acetate (7.5 M) and 2.5 volumes cold 95% ethanol in a 30 ml Corex tube. The solution was mixed by inversion and allowed to precipitate overnight at -20 °C. The pellet was recovered by centrifugation at 10,000 rpm at 4 °C for 15 minutes. The supernatant was discarded and the pellet air dried, before being resuspended in 500 µl TE buffer.

2.1.3 Gel Electrophoresis and Analysis of Nucleic Acids

Restriction Digest Reactions

Restriction digests of plant genomic or λ phage DNA consisted of 500 ng to 1 µg DNA, 3 µl 10X One-Phore-All Plus (OPAP) buffer (Pharmacia Biotech) (1 or 2 times strength depending on the manufacturer's recommendation for the restriction enzyme(s)), 0.5 µl bovine serum albumin (BSA) (Fraction V, Sigma) (1 mg/ml), 0.5 µl 1 M dithiothreitol (DTT), 3 µl 40 mM spermidine and 0.5 µl (approximately 1 unit) of the appropriate restriction enzyme(s), brought up to 30 µl with distilled water. Digests were incubated for at least 3 hours or overnight at 37 °C (or at the temperature specified for a particular enzyme).
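The genomic and phage digest described above is assembled to a fixed 30 µl, so the water volume depends on how much DNA solution is added. The following is a minimal sketch of that bookkeeping, not a record of how the reactions were actually calculated. The names `digest_recipe`, `TOTAL_VOLUME_UL` and `FIXED_COMPONENTS_UL` are hypothetical, the DNA volume is an input chosen by the user, and the doubling of the buffer volume for 2X strength is an assumption made here for illustration.

```python
TOTAL_VOLUME_UL = 30.0  # final reaction volume stated in the text

# Fixed component volumes (µl) as listed in the text.
FIXED_COMPONENTS_UL = {
    "BSA (1 mg/ml)": 0.5,
    "1 M DTT": 0.5,
    "40 mM spermidine": 3.0,
    "restriction enzyme": 0.5,
}


def digest_recipe(dna_ul: float, buffer_strength: int = 1) -> dict:
    """Return component volumes (µl) for one 30 µl digest.

    dna_ul          volume holding the 500 ng to 1 µg of DNA (user input)
    buffer_strength 1 or 2 (final X strength of OPAP buffer); scaling the
                    10X buffer volume with strength is an assumption
    """
    if dna_ul <= 0:
        raise ValueError("DNA volume must be positive")
    if buffer_strength not in (1, 2):
        raise ValueError("buffer strength must be 1 or 2")
    buffer_ul = TOTAL_VOLUME_UL / 10.0 * buffer_strength  # 3 µl at 1X
    recipe = {"DNA": dna_ul, "10X OPAP buffer": buffer_ul}
    recipe.update(FIXED_COMPONENTS_UL)
    water_ul = TOTAL_VOLUME_UL - sum(recipe.values())
    if water_ul < 0:
        raise ValueError("components exceed the 30 µl reaction volume")
    recipe["distilled water"] = water_ul
    return recipe


if __name__ == "__main__":
    for component, volume in digest_recipe(dna_ul=10.0).items():
        print(f"{component}: {volume} µl")
```

With a hypothetical 10 µl of DNA solution at 1X buffer strength, the remaining water volume is 12.5 µl.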
Restriction digests of 0.5 to 1 µg plasmid DNA were successfully completed using 1 unit of restriction enzyme in OPAP buffer solution (Pharmacia) at the strength recommended for that particular restriction enzyme.

Gel Electrophoresis - DNA

Two microlitres Ficoll tracking dye (Appendix B, page 182) were mixed with 20 µl restriction digest, loaded on a 0.8% agarose 1X TAE (Appendix B, page 185) gel containing 0.5 µg/ml ethidium bromide (~10 µl of a 10 mg/ml stock), and run at 55 volts in 1X TAE buffer until the first band of the tracking dye approached the end of the gel. The first and last lanes of each gel contained size markers (usually lambda phage DNA digested with HindIII and/or HindIII/EcoRI) for the quantification of band size. DNA was visualised by placing the gel on a UV transilluminator (wavelength = 302 nm) (TM-36, UVP Inc.) to view the fluorescence of the ethidium bromide-DNA complexes. Photos (Polaroid 667 film, Polaroid) were taken through a Wratten 22 filter.

Southern Blot

Southern blots were prepared from agarose gels in which the DNA bands had been separated by electrophoresis. In order to depurinate the DNA, gels were placed in a Pyrex baking dish, covered with a solution of 0.25 M HCl, and swirled on an orbital shaker until the bands of tracking dye in the gel changed from shades of blue to green and yellow (15 to 20 minutes). The HCl solution was discarded and replaced with denaturing solution (1.5 M NaCl, 0.5 M NaOH). The gel was returned to the orbital shaker until the tracking dye bands had returned to the original blue colour (15 minutes). The denaturing solution was replaced with 1 M ammonium acetate, for three 15 minute washes. The gel was prepared for blotting by trimming excess agarose from the top and sides of the gel, and by marking the position of the first lane by cutting off the left bottom corner at an angle.
A "wick" (of the same length as the trimmed gel and 3 times the width of the gel stand) and 4 gel-sized pieces of Whatman 3MM blotting paper (Whatman) were cut and soaked in fresh 1 M ammonium acetate. A gel stand or inverted casting tray was placed in a large Pyrex baking dish, and the wick was placed over the gel stand with two edges of the blotting paper tucked under the legs of the gel stand. Fresh 1 M ammonium acetate was added to the dish to a point half way up the legs of the gel stand. Two pieces of the blotter were centred on the wick, the gel was placed on top, followed by a piece of Hybond membrane (Amersham Life Sciences Inc.) cut to the same size as the gel (also pre-wetted in 1 M ammonium acetate). Air bubbles were carefully pressed out as the stack was assembled. A gasket was formed from four strips of plastic food wrap placed along the edges of the nylon membrane to the sides of the dish. The final two pieces of moist blotting paper were laid over top. The assembly was completed with half a package of paper towels (4 inches in height) weighted by a 10 x 10 cm glass plate and approximately 500 g of lead. DNA transfer proceeded overnight.

The stack was carefully disassembled and the Hybond membrane peeled from the flattened gel. The side of the membrane adjacent to the gel (the DNA side) was marked with pencil. Efficiency of DNA transfer was checked by placing the gel on the UV transilluminator and ascertaining that the ethidium bromide stained bands had transferred to the membrane. Trace amounts of agarose were rinsed from the blot and it was placed in the dark to air dry. DNA was cross-linked to the Hybond membrane by exposure to UV light (wavelength = 302 nm), DNA side down on a piece of Saran Wrap™ (Dow Chemical) on a UV transilluminator for no longer than 10 minutes. The blot was then wrapped in Saran Wrap™ and stored in the dark until used.
Gel Electrophoresis - RNA

Prior to gel electrophoresis of the RNA samples, the gel kit and casting tray were cleaned by soaking overnight in a 1% SDS solution, rinsed with tap water, followed by distilled water and ethanol, before air drying. The gel was prepared by dissolving 2 g agarose (Molecular Technology grade, Sigma) in 170 ml distilled water plus 20 ml 10X MOPS buffer (Appendix B, page 183) by boiling in a microwave. The agarose solution was cooled to 50 °C, 10 ml formaldehyde was added and the gel poured into the casting tray. Running buffer consisted of 1X MOPS made up in DEPC-treated distilled water (Appendix B, page 183).

RNA stocks were thawed on ice. Five micrograms RNA were mixed with 6 µl formamide dye (Appendix B, page 182), 2 µl formaldehyde and 0.6 µl 10X MOPS, then heated to 65 °C for 10 minutes. The samples were then snap-cooled on ice and 1 µl of ethidium bromide (1 mg/ml in DEPC-treated distilled water) added. Samples were spun down in a microcentrifuge to mix the contents, and quickly loaded onto the gel. The gel was run at 85 volts for 2 hours.

Northern Blot

The RNA gel was placed in an RNase-free dish and rinsed with DEPC-treated distilled water for 2 minutes, then washed with 10X SSC (Appendix B, page 184) for 20 minutes with gentle agitation on an orbital shaker. Capillary transfer of the RNA to Hybond nitrocellulose membrane (Amersham) was accomplished essentially in the same manner as for Southern blotting except that the transfer buffer used was 10X SSC. The nitrocellulose membrane and blotting papers were pre-wetted in 10X SSC. Transfer occurred overnight; the Hybond membrane was then air dried in the dark and the RNA cross-linked to the membrane by exposure to UV light for 5 minutes on a UV transilluminator (wavelength = 302 nm).
2.1.4 Labelling and Hybridization of Probes

³²P Labelling of Oligonucleotide Probes

Synthetic oligonucleotide primers (Nucleic Acid-Protein Service Unit, Biotechnology Laboratory, University of British Columbia, Table 1) corresponding to the sequence of the 2S albumin cDNA clone (Dr. C. Newton, Forest Biotechnology Centre - BC Research Inc., 3650 Wesbrook Mall, Vancouver, B.C., V6S 2L2) were radioactively labelled to serve as probes. Small oligomers (17 to 24 basepairs in length) were end-labelled in a 10 µl reaction mix consisting of 3 pmoles of the oligonucleotide (19 ng of a 19mer, 20 ng of a 20mer, 22 ng of a 22mer, etc.), 3 µl (4.8 pmoles) of ³²P-labelled γ-ATP (6000 Ci/mmol, 10.0 mCi/ml, DuPont NEN Research Products), and 0.5 µl (5 units) of polynucleotide kinase (PNK) (Promega) in 1X PNK buffer. The labelling reaction was mixed, spun down briefly and incubated at 37 °C for 15 minutes. The reaction was stopped by adding 4 µl 0.5 M EDTA and heating at 98 °C for 2 minutes. The end-labelled oligonucleotide was precipitated by the addition of 90 µl TE, 1 µl yeast tRNA (1 mg/ml) (Sigma), half a volume of 7.5 M ammonium acetate, plus 3 volumes of 95% ethanol, at -80 °C for 30 minutes. The precipitate was recovered by centrifugation at 14,000 rpm for 15 minutes in a microcentrifuge. The supernatant was removed using a drawn out glass Pasteur pipette and discarded. The pellet was washed with ice cold 95% ethanol, centrifuged for 5 minutes and the supernatant discarded. The radioactivity of the supernatant was monitored and the washes repeated to remove unincorporated radioactive label. The probe was dried in a Speed Vac for 5 minutes and re-dissolved in 100 µl TE by heating at 70 °C for 10 minutes. Probes had an activity of 10⁶ counts per minute (CPM) or more.
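The oligonucleotide quantities quoted for the end-labelling reaction (3 pmoles given as 19 ng of a 19mer, 20 ng of a 20mer, 22 ng of a 22mer) follow from a simple mole-to-mass conversion. The sketch below illustrates that arithmetic only; it is not part of the original protocol. The average mass of 333 g/mol per nucleotide and the function name `oligo_mass_ng` are assumptions, chosen because they reproduce the stated figures.

```python
# Assumed average molar mass of one nucleotide in a single-stranded
# oligonucleotide (g/mol). This value is an assumption, not stated in the text.
AVG_NT_MASS_G_PER_MOL = 333.0


def oligo_mass_ng(pmol, length_nt, nt_mass=AVG_NT_MASS_G_PER_MOL):
    """Return the mass (ng) of `pmol` picomoles of an oligo `length_nt` bases long."""
    # pmol x (g/mol) gives picograms; dividing by 1000 converts to nanograms.
    return pmol * length_nt * nt_mass / 1000.0


if __name__ == "__main__":
    for n in (19, 20, 22):
        print(f"3 pmol of a {n}mer is about {oligo_mass_ng(3, n):.1f} ng")
```

Under the same assumption, 10 pmoles of a 17mer, 20mer and 24mer come to roughly 57, 67 and 80 ng, which matches the amounts given for sequencing primers in section 2.2.1.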
Hybridization of Southern Blots with Oligonucleotide Probes

The blot was pre-hybridized in 20 ml of pre-hyb solution (6X SSC (Appendix B, page 184), 1 mM EDTA, 0.5% SDS, 50 µg/ml yeast tRNA (2 mg/ml stock), 0.1% sodium pyrophosphate) in a hybridization jar of the Techne Hybridization oven (Techne Inc.), for a minimum of 4 hours or overnight, at the same temperature required for hybridization of the probe. Hybridization was carried out at 55 °C for oligonucleotide probes which were 20 to 24 nucleotides in length; smaller probes were hybridized at 50 °C. The probe was added to the pre-hyb solution in the jar and hybridization proceeded overnight. The hybridization solution was poured off and the blot was washed three times in 20 ml of wash solution (2X SSC, 1% SDS, 1 mM EDTA) for 15 minutes each, until no further radioactivity was measured in the wash solution. Excess wash solution was blotted from the membrane before it was wrapped in Saran Wrap™, monitored by Geiger counter, placed in a film cassette and used to expose X-ray film (X-Omat™, Kodak). If levels of radioactivity were low, an intensifying screen (Cronex Lightning Plus, DuPont) was used and the cassette placed at -80 °C. Length of exposure time varied. Overnight exposures were usually adequate, though blots could be re-exposed for longer periods of time. X-ray film was developed under safe-lights, by processing for 3 minutes in Kodak GBX developer, rinsing under tap water, then fixing for 2 to 3 minutes in Kodak GBX fixative, before a final half hour rinse under running tap water.

³²P Labelling by Random Primer

The random primer method (based on Feinberg and Vogelstein, 1983 and 1984) was used to radioactively label larger double stranded pieces of DNA, such as portions of plasmids, cDNA clones, or polymerase chain reaction (PCR) products.
Template DNA (1 µl of <1 µg/µl) was mixed with 9.5 µl of distilled water in a 1.5 ml microfuge tube, then denatured by heating to 98 °C for 10 minutes in a PCR machine and snap-cooled on ice. Radioactive labelling of the template DNA was accomplished by the addition of 2 µl 10X labelling buffer (Appendix B, page 182), 2 µl of 1 mg/ml BSA, 2 µl 0.1 M DTT, 2 µl dNTP mix (2 mM each of G, T, C), 1 µl hexanucleotide random primers (Gibco BRL), 2 µl α-labelled ³²P-dATP (3000 Ci/mmol, 10.0 mCi/ml, DuPont NEN Research Products), and 0.5 µl of Klenow fragment (Gibco BRL, Burlington, Ont.). The labelling mix was incubated at room temperature for a minimum of 4 hours or overnight. The reaction was stopped and the labelled probe precipitated by the addition of 1 µl 0.5 M EDTA, 80 µl distilled water, 3 µl yeast tRNA (2 mg/ml), 50 µl 7.5 M ammonium acetate, and 375 µl ice cold ethanol. This mixture was placed at -80 °C for 20 minutes. The probe was collected by centrifugation at 14,000 rpm for 15 minutes in a microcentrifuge. The pellet was washed three times with ice cold 95% ethanol followed each time by a 5 minute centrifugation, until the supernatant was not significantly radioactive, indicating that unincorporated dNTPs had been washed away. The pellet was dried in a Speed Vac, and resuspended in 100 µl TE by heating to 65 °C. The probe was stored at -80 °C if not used immediately.

Hybridization of Southern Blots with Randomly Labelled Probes

The pre-hybridization solution appropriate for large probes consisted of 5 ml 20X SSPE (Appendix B, page 185), 2 ml 10% SLS, 2 g dextran sulphate, 1 ml sheared salmon sperm DNA (Appendix B, page 184) (20 mg/ml), and distilled water to give a final volume of 20 ml. The randomly labelled probe was heated to 98 °C for 10 minutes, then snap-cooled on ice to denature the double-stranded DNA before being added to the pre-hybridization solution.
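The recipe for the large-probe pre-hybridization solution is given as stock volumes rather than working concentrations. As a reading aid, the short sketch below works out the final concentrations implied by those volumes in the 20 ml total. The helper name `final_concentration` and the dictionary layout are illustrative assumptions; only the stock strengths and volumes come from the text.

```python
def final_concentration(stock_conc, stock_volume_ml, total_volume_ml):
    """Dilution arithmetic: final = stock x (stock volume / total volume)."""
    return stock_conc * stock_volume_ml / total_volume_ml


TOTAL_ML = 20.0  # final volume of the pre-hybridization solution

# Final concentrations implied by the stated stock volumes.
prehyb = {
    "SSPE (X)": final_concentration(20.0, 5.0, TOTAL_ML),
    "SLS (%)": final_concentration(10.0, 2.0, TOTAL_ML),
    "salmon sperm DNA (mg/ml)": final_concentration(20.0, 1.0, TOTAL_ML),
    # Dextran sulphate is added as a solid: 2 g in 20 ml, expressed as g per 100 ml.
    "dextran sulphate (% w/v)": 100.0 * 2.0 / TOTAL_ML,
}

if __name__ == "__main__":
    for component, value in prehyb.items():
        print(f"{component}: {value:g}")
```

On this reading the working solution is 5X SSPE, 1% SLS, 10% (w/v) dextran sulphate and 1 mg/ml salmon sperm DNA.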
Pre-hybridization and hybridization were carried out at 65 °C overnight. The blot was washed 3 times for 15 minutes each in 20 ml of wash solution (2X SSC, 1% SDS), and exposed to X-ray film as above.

Hybridization of Northern Blots

RNA blots were pre-hybridized in a Techne Hybridization oven (Techne Inc.) at 42 °C for 2 hours in a solution consisting of 10 ml formamide, 4 ml 20% SDS, 4 ml 2 M NaH₂PO₄-4 mM EDTA (pH 7.2) (Appendix B, page 183), and 160 µl BSA (50 mg/ml). The randomly labelled probe (a 517 bp fragment of the cDNA subclone II5G1001) was heated to 98 °C for 10 minutes, snap-cooled on ice, and added to the hybridization jar. Hybridization proceeded over a 20 hour period. RNA blots were washed three times for 15 minutes at 42 °C with 2X SSC, 0.1% SDS and once, with the same solution, at 65 °C for 10 minutes. The blot was exposed to X-ray film in the same manner as for Southern blots. The Northern blot was stripped with boiling 0.1% SDS and re-probed with a Picea glauca 28S ribosomal probe ( C N -X 6 G ) to confirm that equal amounts of total RNA had been loaded. Band size was quantified using NIH Image version 1.60 for Macintosh (http://RSB.Info.NIH.gov/NIH-IMAGE/).

2.1.5 Ligation Reactions

SalI and SalI/BamHI fragments from λ 3.2 which cross-hybridized with the II5G.1 primer (Table 1) were gel purified (Prep-A-Gene kit, Promega, Madison, WI) and subcloned into the sequencing vector pGEM-3Zf(+) (Promega). The vector was prepared to receive the fragments by restriction with the enzyme SalI, or by double digestion with both SalI and BamHI in the case of the SalI/BamHI fragment.
Ligation reactions consisted of 100 ng vector digested with the appropriate restriction enzymes, 25 ng insert (roughly a 1:3 molar ratio of vector to insert), plus 2 µl 5X ligase buffer (Gibco BRL), 1 µl 10 mM ATP, 1 unit T4 DNA ligase (Gibco BRL), and distilled water to 20 µl. Control reactions were prepared with no ligase and no insert DNA to test transformation efficiency. Ligation reactions were incubated overnight at room temperature.

2.1.6 Heat Shock Transformation of Competent E. coli Cells

Heat shock competent cells (E. coli DH5α) were stored in 50 µl aliquots at -80 °C. Aliquots of cells were thawed on ice. Ligation reactions were diluted with equal volumes of TE, and 10 µl added to 50 µl of competent cells with gentle mixing. The cell/DNA mixtures were placed on ice for 15 minutes, followed by a 1 minute heat shock at 37 °C and then returned to ice for 2 minutes. The cells were removed from the ice and 200 µl pre-warmed SOC medium (Appendix B, page 186) added before being incubated at 37 °C for 1 hour. Ten microlitres of 0.1 M β-D-isopropyl-thiogalactopyranoside (IPTG) and 50 µl of 5-bromo-4-chloro-3-indolyl-β-galactopyranoside (X-Gal) (20 mg/ml in dimethylformamide) were added to the cell mix, which was immediately spread-plated on room temperature YT medium containing 50 µg/ml ampicillin. The plates were incubated overnight and screened for white colonies (containing insertions) among the blue colonies (containing re-ligated vector). Single white colonies were picked, used to inoculate 3 ml YT broth with ampicillin 50 µg/ml, and incubated at 37 °C on an orbital shaker overnight. The size of the insert was checked using the boiling mini-prep plasmid DNA purification detailed previously (section 2.1.2), along with restriction digestion using the appropriate enzyme(s) to release the insert.
2.1.7 Generation of Unidirectional Deletions by Exonuclease Digestion

Following the Erase-a-Base™ protocol given by Promega (based on Henikoff, 1984), a series of deletion subclones was created 5' to 3' in relation to the coding region by exonuclease digestion of the clone p i lb-3 (a 6.5 kb SalI fragment containing the 2S albumin gene ligated into the vector pGEM-3Zf(+), Figure 2a). Cesium chloride purified p i lb-3 plasmid DNA was digested with SacI and BamHI, phenol/chloroform extracted and reprecipitated with 0.2 volume of 1 M NaCl and 2 volumes of 100% ethanol. After centrifugation for 10 minutes, the pellet was washed with 70% ethanol and air dried. The pellet was redissolved in 108 µl distilled water and 12 µl 10X Exo III buffer (Promega). Ten 0.5 ml microfuge tubes were labelled, placed on ice and 7.5 µl of S1 nuclease mix (Appendix B, page 184) added to each. Half (60 µl) of the DNA/ExoIII buffer mix was heated to 37 °C and 2 µl Exo III nuclease added. This enzyme digests along one strand of the DNA from a nick, a blunt end or 5' overhang such as, in this case, from BamHI. Three prime overhangs are resistant to ExoIII digestion, therefore the DNA adjacent to the SacI site is protected. Every 30 seconds after the addition of the ExoIII nuclease, 2.5 µl of the digest reaction was removed and added to one of the S1 nuclease mix tubes. After all the timed samples had been taken, the S1 tubes were incubated at room temperature for 30 minutes, to allow the S1 nuclease to digest the remaining single stranded DNA. The S1 nuclease was inactivated by adding 1 µl S1 Stop (0.3 M Tris base, 50 mM EDTA) and by heating to 70 °C for 10 minutes. The degree of serial digestion of the plasmid was ascertained by running 2 µl from each time point on a 0.8% 1X TAE agarose gel.
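The timed sampling used for the exonuclease digestion can be laid out as a simple schedule. The sketch below is illustrative only: it assumes one sample per labelled tube with the first taken 30 seconds after enzyme addition, and the digestion rate is a hypothetical placeholder because no rate is given in the text. The function names are likewise assumptions.

```python
def sampling_times_s(n_tubes=10, interval_s=30):
    """Time points (seconds after ExoIII addition), one per S1 nuclease tube."""
    return [interval_s * (i + 1) for i in range(n_tubes)]


def expected_deletion_bp(time_s, rate_bp_per_min):
    """Approximate deletion length, assuming a constant digestion rate."""
    return rate_bp_per_min * time_s / 60.0


# Hypothetical rate for illustration only; it is NOT taken from the text and
# would have to be determined for the enzyme batch and temperature used.
ILLUSTRATIVE_RATE_BP_PER_MIN = 450.0

if __name__ == "__main__":
    for t in sampling_times_s():
        size = expected_deletion_bp(t, ILLUSTRATIVE_RATE_BP_PER_MIN)
        print(f"{t:>3d} s: about {size:.0f} bp removed")
    # Ten 2.5 µl samples draw 25 µl from the 62 µl digest (60 µl mix + 2 µl enzyme).
    print("total volume sampled:", 10 * 2.5, "µl")
```

In practice the extent of digestion at each time point was read from the agarose gel, so a calculation of this kind would serve only as a rough guide when choosing the sampling interval.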
The deleted plasmids were blunt-ended to prepare them for re-ligation by mixing 5 µl of each stopped reaction mix with 40 µl distilled water, 5 µl 10X OPAP buffer (Pharmacia) and 1 unit Klenow reagent (Gibco BRL), and incubating at 37 °C for 3 minutes. One microlitre dNTPs (0.125 mM each of G, A, T, C) was added to each tube and incubation continued a further 5 minutes. Plasmids were re-ligated by mixing 10 µl of the Klenow reaction mix with 4 µl 5X ligase buffer (Gibco BRL, Burlington, Ont.), 2 µl 10 mM ATP, 4 µl distilled water and 2 units T4 DNA ligase (Gibco BRL). Ligations proceeded overnight at room temperature and competent E. coli DH5α cells were transformed as in section 2.1.6.

2.1.8 2S Albumin Promoter / GUS Fusions

Isolation of the 2S Albumin Promoter

The 2S albumin upstream region was isolated from the plasmid clone p i lb-3 (Figure 2a) as a polymerase chain reaction (PCR) product using the oligonucleotide II5G.1 and pUC forward sequencing primer (Promega) as PCR primers (Table 1). VentR DNA polymerase (New England Biolabs Inc.) with 3' to 5' proof-reading ability was used for primer extension to reduce the possibility of error introduction during DNA synthesis. The PCR mix consisted of 200 ng CsCl purified plasmid p i lb-3, 100 ng forward primer, 100 ng II5G.1 primer, 2.5 µl 10X Vent buffer, 2.5 µl 10X dNTPs (2 mM each of G, A, T, C), 1 unit VentR DNA polymerase and sterile distilled water to give 25 µl total volume. The amplified promoter fragment was digested with SstI and ligated into an SstI/SmaI site in pGEM3Zf(+) (Promega), creating pGEM2S (Figure 2b).

Construction of the GUS Fusion Vectors

To aid in vector construction, the promoter-less GUS genes from the binary vector series pBI101.1, pBI101.2, and pBI101.3 (Jefferson et al., 1987) were removed as EcoRI/HindIII fragments and placed in the polycloning site of pUC19, producing three vectors: pUC101.1, pUC101.2, and pUC101.3 (Figure 2c). These vectors are identical except that the GUS reading frame is shifted by the addition of one nucleotide in relation to the polylinker (Jefferson et al., 1987).

Figure 2: Plasmid Constructs
Plasmid maps are not to scale relative to each other, but within a map fragment sizes are comparable.
A. 5' and 3' untranslated and flanking regions of the Picea glauca 2S albumin genomic clone contained within the plasmid p i lb-3 are represented by open boxes. Shaded boxes denote exon 1 and 2 of the 2S albumin coding region. The intron is represented by a thin line. The intron and second exon are not drawn to scale. The cloning vector was pGEM3Zf(+) (Promega).
B. The 5' flanking region of the genomic clone contained in p i lb-3 was subcloned to create pGEM2S.
C. pUC101.1 consists of the uidA coding region and nos terminator from pBI101 (Jefferson et al., 1987) (shaded boxes labelled GUS and Nos respectively) cloned into the pUC19 vector.
D. p2SGUS is a translational fusion of the 2S albumin 5' flanking region, released as a BamHI fragment from pGEM2S, with the uidA coding region contained in pUC101.2.
E. Deletion of the 2S albumin 5' flank to position -653 was accomplished by restriction digestion of p2SGUS with XbaI and re-ligation to give p2S700.
F. p2SMIN was created by the restriction digestion of p2S700 with SphI and re-ligation, thereby reducing the 2S albumin 5' flank to position -117.
G. pBIN2S was created by cloning the 2.3 kb 5' flanking region and GUS reporter gene from p2SGUS as an EcoRI/HindIII fragment into the EcoRI/HindIII site of the binary vector pBIN19 (Bevan, 1984). NPTII denotes the neomycin phosphotransferase gene which encodes kanamycin resistance. LB and RB represent the left and right borders of the disarmed Ti plasmid.
H. pBIN700 was created by removal of the 5' flank:uidA translational fusion from p2S700 as an EcoRI/HindIII fragment and ligation into the EcoRI/HindIII site of pBIN19 (Bevan, 1984).

A translational fusion (p2SGUS, Figure 2d) was created by removing the 2S promoter from pGEM2S as a BamHI fragment and ligating it in front of the promoter-less GUS gene of pUC101.2. The 2S albumin promoter region was also ligated into the BamHI site of pBI101.3, creating a negative control (p2S+1GUS), due to the one nucleotide frameshift upstream of the GUS coding region. Fusion junctions were sequenced (see Methods and Materials section 2.2) using a GUS sequencing primer (Clontech) (Table 1).

The 5' flanking region was deleted distally from the full size of 2.3 kb to approximately 700 basepairs by digestion with XbaI and re-ligation to form the plasmid p2S700 (Figure 2e). A minimal promoter consisting of 117 basepairs of 2S albumin 5' flanking sequence (p2SMIN, Figure 2f) was formed by removing the distal region of the promoter by digestion with SphI and re-ligation.

The spruce 2S albumin promoter/GUS fusions of p2SGUS and p2S700 were excised as EcoRI/HindIII fragments by restriction digestion and were gel-purified (Prep-a-gene™, BioRad). These fragments (2.3 kb promoter:GUS fusion and 700 bp promoter:GUS fusion) were then ligated into the binary vector pBIN19 (Bevan, 1984) for use in Agrobacterium tumefaciens mediated transformation, creating the binary vectors pBIN2S and pBIN700 respectively (Figure 2g and 2h). The binary vectors pBIN2S and pBIN700 were transformed by electroporation (Gene Pulser, BioRad) into E. coli strain DH5α electro-competent cells. Constructs were introduced into A.
tumefaciens strain EHA105 (Hood et al., 1993) via triparental mating.

Introduction of Binary Vectors into Agrobacterium tumefaciens

Three millilitre liquid cultures of the E. coli helper strain (HB101/pRK2013) (Ditta et al., 1990), the E. coli strain DH5α containing the binary vector (pBIN2S or pBIN700), and the recipient A. tumefaciens strain EHA105 (Hood et al., 1993) were grown overnight. The E. coli strains were cultured in LB medium containing 50 µg/ml kanamycin at 37 °C, while the disarmed A. tumefaciens strain was cultured at 28 °C in 523 medium containing rifamycin 50 µg/ml. The overnight bacterial cultures were diluted 1:10 and grown to mid-log phase (OD₆₀₀ = 0.7). The helper and DH5α E. coli cultures were combined in a single culture tube and incubated without agitation at room temperature for 30 minutes. One millilitre of the A. tumefaciens culture was added to this, gently mixed and filtered through a sterile 0.2 µm filter held in a reusable Sarstedt syringe filter holder (Sarstedt Inc.). The filter was removed from the holder and placed bacterial side up on a plate of 523 medium without antibiotics at 28 °C for 24 hours. Bacterial cells were resuspended in 2 ml of sterile 15% glycerol by transferring the filter disk from the 523 medium to a 10 ml disposable culture tube and vortexing by hand. A dilution series was made in 1 ml sterile distilled water blanks from the initial bacterial suspension and 100 µl of each dilution was spread-plated on 925 Minimal medium (Appendix B, page 186) containing kanamycin and rifamycin at 50 µg/ml. The 10 dilution was also plated on 925 Minimal medium without antibiotics, to observe the viability of the bacterial strains. Plates were incubated at 28 °C overnight. Single colonies were picked and streaked onto selection medium (925 Minimal medium containing kanamycin and rifamycin 50 µg/ml) three times (in series) to ensure that there were no E. coli contaminating the transformed A. tumefaciens.
Mini-preps were done on the putatively transformed A. tumefaciens (EHA105/pBIN2S and EHA105/pBIN700) (as described in Methods and Materials section 2.1.2) to confirm that the plasmids had been taken up and had not undergone any deletions or gross rearrangements. Bacterial cultures were stored as 15% glycerol stocks at -80 °C.

2.2 Sequencing

Sequence was obtained by the Taq Polymerase Chain Reaction (PCR) sequencing method using the fmol™ Sequencing System (Promega), with modifications. Synthetic oligonucleotides (Table 1) specific to the cloning vector (pUC/M13 forward, and pUC/M13 reverse) (Promega) and to the 2S albumin cDNA clone II5G1001 (II5G.1, II5G.2, II5G.3, II5G.4, and II5G.5) (Nucleic Acid-Protein Service Unit) were used as sequencing primers.

2.2.1 Radioactive Labelling of Sequencing Primers

Primers were radioactively end-labelled by combining in a 0.5 ml microcentrifuge tube: 10 pmoles of sequencing primer (= 57 ng for a 17mer, = 67 ng for a 20mer, = 80 ng for a 24mer primer), 10 pmoles (= 5.0 µl of 5000 Ci/mmol at 10 µCi/µl) γ ³²P-labelled ATP, 1 µl T4 PNK 10X buffer, 5 units T4 PNK (Promega), and sterile distilled water to a final volume of 10 µl. If the primer concentration was too dilute, the primer and γ ³²P-ATP were dried down together in a Speed Vac, before being redissolved in 10 µl 1X T4 PNK buffer. The primer labelling mix was incubated at 37 °C for 15 minutes then inactivated by heating to 98 °C for 2 minutes. The primer was stored at -20 °C or used directly without further purification. Note that this differs from the protocol for end-labelling primers for use in Southern blot hybridization; the amount of primer labelled is increased, EDTA is omitted in stopping the reaction, and there is no need to remove unincorporated radioactive label.

2.2.2 Extension/Termination Reactions

Four 0.5 ml microfuge tubes were labelled G, A, T, C for each set of sequencing reactions.
One microlitre of the appropriate dNTP - d/ddNTP mix (Appendix B, page 181) was added to the bottom of each tube (i.e., 1 µl deoxy-adenosine triphosphate (dATP) - deaza/dideoxy-adenosine triphosphate (d/ddATP) mix into, e.g., tube A). The tubes were capped and stored on ice until needed. A primer/template DNA mixture was prepared which consisted of 2 µl double stranded DNA template (approximately 100 ng of super-coiled plasmid DNA prepared by mini-prep, Methods and Materials section 2.1.2), 2.2 µl 10X Taq Mg-free buffer (Promega), 1.7 µl 25 mM MgCl₂, 1.5 µl ³²P end-labelled primer, 1.0 µl regular grade Taq polymerase (Promega) and distilled water to 18 µl final volume. The primer/template mixture was assembled on ice, mixed gently with a pipette tip and spun down. Special care was taken not to introduce bubbles into the sequencing reaction when 4 µl of the primer/template mixture was placed on the inside wall of each d/ddNTP tube. The presence of any air bubbles would cause the sequencing reaction to fail. One drop (approximately 25 µl) of mineral oil was added to each tube and the reaction mix spun down briefly.

The reaction tubes were transferred directly from ice into block 1 of the Ericomp Twin Block™ Thermal cycler (Ericomp), preheated to 95 °C. The PCR cycle used for primers 17 nucleotides in length was 1 cycle of 95 °C for 2 minutes; 30 cycles of 94 °C for 30 seconds (denaturing), 45 °C for 30 seconds (annealing of the primer), 70 °C for 30 seconds (extension of the primer); finishing with 1 cycle of 70 °C for 5 minutes. The annealing temperature for primers 20 nucleotides in length was raised to 55 °C. For primers of 24 nucleotides or longer the annealing/extension step was combined as 70 °C for 30 seconds, i.e. the 55 °C annealing step was omitted. After the thermocycling program had been completed, 3 µl of sequencing stop buffer (Appendix B, page 184) was added and the tubes spun down.
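The three thermocycling variants described above differ only in how the annealing step is handled. The sketch below encodes them as data so that the differences are easy to compare. It is an illustration rather than the program as entered on the thermal cycler: the function names are hypothetical, and the treatment of primer lengths the text does not mention (18 to 19 nucleotides handled like 17mers, 21 to 23 like 20mers) is an assumption.

```python
def sequencing_program(primer_length_nt):
    """Return the cycle-sequencing program for a primer of the given length.

    Each step is a (label, temperature in °C, hold time in seconds) tuple.
    """
    if primer_length_nt >= 24:
        # Long primers: annealing and extension combined at 70 °C.
        cycle_steps = [("denature", 94, 30), ("anneal/extend", 70, 30)]
    else:
        # Assumption: lengths between the stated 17 and 20 nt use the lower
        # annealing temperature; 20 to 23 nt use 55 °C.
        anneal_c = 55 if primer_length_nt >= 20 else 45
        cycle_steps = [
            ("denature", 94, 30),
            ("anneal", anneal_c, 30),
            ("extend", 70, 30),
        ]
    return {
        "initial": ("initial denaturation", 95, 120),
        "cycles": 30,
        "cycle_steps": cycle_steps,
        "final": ("final extension", 70, 300),
    }


def total_hold_time_s(program):
    """Sum of programmed hold times, ignoring temperature ramping."""
    per_cycle = sum(seconds for _, _, seconds in program["cycle_steps"])
    return program["initial"][2] + program["cycles"] * per_cycle + program["final"][2]


if __name__ == "__main__":
    for length in (17, 20, 24):
        prog = sequencing_program(length)
        print(length, prog["cycle_steps"], total_hold_time_s(prog) / 60, "min")
```

Ignoring ramp times, the three-step programs hold for 52 minutes in total and the two-step program for 37 minutes.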
Sequencing reactions were heated for 2 minutes at 70 °C before being loaded in GATC order on the sequencing gel; alternatively, they could be stored at -20 °C.

2.2.3 Sequencing Gels - 1.2X TBE

The glass plates of the sequencing apparatus were thoroughly cleaned and the top plate (shorter one) silanized in the fume hood using Gel Slick™ (J.T. Baker Inc.) following the manufacturer's instructions. The plates were assembled silanized side in, with 0.4 mm spacers held in place by black stationery clamps. The bottom of the gel cassette was taped closed with yellow Scotch Brand electrical tape to a third of the way up the cassette sides.

Long Ranger™ (J.T. Baker Inc.) is a pre-mixed modified acrylamide monomer solution used as a substitute for acrylamide/bisacrylamide in the preparation of sequencing gels. A 1.2X TBE gel was prepared by dissolving 21 g urea in 6 ml 10X TBE (Appendix B, page 185), 5 ml Long Ranger™ concentrate and 25 ml deionized water. This solution was filtered through a Whatman No. 1 filter paper (Whatman) in a Buchner funnel, and then de-gassed. A "plug" was mixed for the bottom of the sequencing gel consisting of 1.5 ml of the gel solution prepared as above, 15 µl fresh 10% ammonium persulphate solution (APS), and 1.5 µl N,N,N',N'-tetramethyl-ethylenediamine (TEMED), mixed in a microfuge tube and pipetted down one side of the gel cassette and spread along the bottom. The plug had a thicker, faster polymerizing consistency, its purpose being to prevent leaks when the main part of the gel was poured. The cassette was placed upright while the plug was setting. After the plug had set, 250 µl 10% APS and 25 µl TEMED were added to the remainder of the gel solution. The solution was swirled to mix and then drawn up in a 60 cc syringe (without needle), taking care not to add bubbles.
An 18 gauge needle was attached to the syringe and the gel solution was poured along one side of the cassette, the angle of the glass plates and the speed of pouring being adjusted so that the gel solution poured in one smooth flow with no air bubbles. When the cassette was full, it was laid at a slight angle propped up by two rubber stoppers at the top sides, a sharks-tooth comb inserted (flat side into the gel) and clamped in place. Polymerization was complete within 1 to 2 hours. The top edge was wrapped in plastic wrap and the gel stored at 4 °C if the sequencing gel was not used that day.

The glass cassette was clamped into the sequencing apparatus and the upper and lower buffer chambers filled with 0.6X TBE running buffer. 1.2X TBE gels were run at 35 watts for 3 to 6 hours. The sharks-tooth comb was removed and its orientation reversed so that the teeth were level and barely pressed into the gel. The gel was pre-run for 15 minutes to bring it up to running temperature (45 °C) and the sample wells rinsed with running buffer before ⅓ to ½ volume of the pre-heated (70 °C) sequencing reactions were loaded. Runs of overlapping sequence were accomplished by loading the gel two or three times with the same reaction mix (in a new set of wells) at 1½ hour intervals. Short sequencing runs were approximately 4000 volt hours (the first dye front, bromophenol blue, had just run off the gel by then); long runs were 8000 volt hours (the second dye front, xylene cyanol, was then about to run off the bottom of the gel).

At the completion of the run, the gel cassette was removed and allowed to cool to room temperature, before the spacers were removed and the top glass plate was lifted off. A piece of Whatman 3MM blotting paper (Whatman) cut to the size of the gel was smoothed over top, firmly pressed in place, and used to peel the gel from the lower glass plate.
The gel was overlaid with a piece of Saran Wrap™ so that there were no bubbles or wrinkles in the plastic. The sequencing gel was then vacuum dried in a gel drier at 80 °C for 1 hour, before being exposed to X-ray film (X-Omat™, Kodak) overnight.

2.2.4 Sequencing Gels - Formamide

G-C rich regions of the sequence were difficult to read due to apparent compression of the sequence on a 1.2X TBE sequencing gel. Formamide sequencing gels are strongly denaturing and act to clarify regions of compression by eliminating secondary structures in the DNA strand. Formamide gels consisted of 21 g urea dissolved in 20 ml formamide (Ultrapure Bioreagent, J.T. Baker Inc.), 8 ml Long Ranger™ (J.T. Baker Inc.) concentrate, and 5 ml 10X TBE, polymerized with 400 µl 10% APS and 60 µl TEMED. The gel solution was warmed to dissolve the urea and then cooled to room temperature before the polymerizing agents were added. The plug for the formamide gel consisted of 1.5 ml of the above gel solution polymerized with 25 µl 10% APS and 2.5 µl TEMED. Formamide sequencing gels were run with 1X TBE running buffer at 35 watts for twice the length of time as 1.2X TBE sequencing gels. Formamide gels must be fixed after being run to prevent stretching and distortion. The top plate of the gel cassette was removed and the lower glass plate with the gel placed in a 20% ethanol, 10% acetic acid fixing solution for 15 minutes. The fixative was drained away and the gel dried as above.

2.2.5 Assembly of Sequences

The sequence was read from the developed X-ray films and entered by hand into the SEQUIN program of the PC/GENE software package (version 4.16, 1992, IntelliGenetics). This software package contains programs which were used to align and overlap sequences (ASSEMBGEL), as well as to identify potential eukaryotic promoter elements (EUKPROM), hairpin loops (HAIRPIN) and repetitive motifs (REPEAT) within the gene sequence.
The program NALIGN was used to compare nucleotide sequences by aligning them.

2.3 Confirmation of Intron by Polymerase Chain Reaction

PCR primers (II5G.4 and II5G.5, Table 1) were used to confirm the presence of an intron in the 2S albumin gene. The primer II5G.4 anneals 467 basepairs upstream from the 5' border of the intron; II5G.5 anneals 10 basepairs downstream of the 3' intron border. When the cDNA clone (II5G1001) was used as the template, the primer pair produced a 517 bp PCR product. Genomic DNA templates assayed were from the interior spruce tissue culture line W70, and from individual trees: EWS 1647, PG2, PG5, PG8, EK6 and EK46 (EWS = eastern white spruce, PG = Prince George interior spruce, EK = eastern Kootenay interior spruce).

PCR reactions consisted of 1 µl DNA template (20 to 100 ng, RNA free), 100 ng each primer, 2.5 µl 10X Taq polymerase buffer, 1.5 µl 25 mM MgCl2, 2.5 µl 10X dNTPs, and 1 unit Taq DNA polymerase (Promega). The thermocycler (Ericomp) was set to hot start at 95 °C, then perform 30 repeats of a 30 second annealing step at 55 °C, a 1 minute chain extension step at 72 °C and a 30 second denaturing step at 94 °C. The program ended with one round of strand completion consisting of 1 minute 40 seconds at 55 °C plus 10 minutes chain extension at 72 °C.

2.4 Plant Methods

2.4.1 Interior Spruce Embryogenic Cultures

The interior spruce embryogenic culture line, W70, used in these experiments consistently produced large numbers of high quality somatic embryos. This culture line was initiated in 1987 (Roberts et al., 1991), from a seed of the open pollinated family PG118 (maternal parent) originally from the Prince George region of British Columbia. Interior spruce exists as a natural hybrid or introgression zone between Picea glauca [Moench] Voss (white spruce) and P. engelmannii Parry (Engelmann spruce) (Owens and Molder, 1984).
Interior spruce embryogenic cultures were maintained and matured as per Roberts et al. (1990a). The culture line was maintained in the dark on 1/2 LM medium (Appendix B, page 188) solidified with 0.6% noble agar (Difco). Embryogenic suspensor masses (ESM) were subcultured weekly by dividing into quarters and transferring to fresh medium.

Somatic embryos were matured from ESM under a 16 hour photoperiod (25 to 35 µE m⁻² sec⁻¹). Maturation was initiated by transferring a 3/4 cm² section of ESM from maintenance medium to hormone free 1/2 LM charcoal medium (Appendix B, page 189) for one week. The ESM was then transferred to 1/2 LM maturation medium (Appendix B, page 189) containing 60 µM abscisic acid (ABA) and 1 µM indole-3-butyric acid (IBA) for four to six weeks, with subculture to fresh media every two weeks. Interior spruce embryos reached maturity, based on embryo size and morphological characters, after 4 to 5 weeks on maturation medium.

2.4.2 Tobacco Plants

Nicotiana tabacum cv. Xanthi plants used for Agrobacterium tumefaciens leaf disk transformation were supplied by Agriculture Canada, Pacific Agriculture Research Station (6660 NW Marine Dr., Vancouver, B.C.) as small soil grown plants with 4 to 6 leaves. Tobacco plants were grown in 5 inch diameter pots containing sterilized potting mix under high intensity lights (200 µE m⁻² sec⁻¹) for an eighteen hour photoperiod, to encourage flowering. Plants were fertilized every two weeks with a dilute solution (1.2 g/l) of 20-20-20 soluble fertilizer.

Shoot tips were cut from selected tobacco transformants, surface sterilized (3 minutes in 70% ethanol, 5 minutes in 20% commercial bleach plus 0.1% Tween 20 (polyoxyethylene sorbitan monolaurate) (Sigma), 4 washes in sterile distilled water) and grown in GA-7 (Magenta Corp.) vessels on MS medium (Appendix B, page 191) containing 0.1 mM benzyladenine (BA), as a backup to the soil grown plants.
Tissue cultured tobacco shoots were transferred to fresh media every 5 weeks.

2.5 Gene Expression Experiments

2.5.1 Transient Expression

Interior Spruce Developmental Stages - Target Preparation

Proembryos, the earliest somatic embryo stage, consist of a transparent round embryo (10 to 40 cells) attached to a strand of suspensor cells (Figure 3a). Proembryos may be single or attached in ridges consisting of a file of embryos and suspensors attached laterally to one another. An ESM is essentially a mass of proembryos with the embryos on the surface of the ESM facing up from the medium and outward. Targets of proembryos were prepared by briefly suspending 7.5 g of ESM (interior spruce, line W70) in 37.5 ml of liquid 1/2 LM medium (Appendix B, page 189), then pipetting 1.5 ml of the suspension onto a Whatman #1 filter (5.5 cm in diameter) (Whatman) to form a thin, even layer. Filters were briefly blotted on sterile paper towel, then centred on 1/2 LM (Appendix B, page 189) petri plates solidified with 0.4% Gelrite® (Scott Laboratories, Inc.) rather than noble agar (Difco) for microprojectile bombardment.

Stage 2, also known as the globular stage embryo, is the first developmental stage visible during maturation of the spruce embryos. Stage 2 embryos consist of a yellow, opaque embryo, which can be round or bullet shaped, subtended by clear strands of suspensor cells (Figure 3b). Stage 2 embryos were picked individually from the surface of the ESM after one week on maturation medium. Twenty stage 2 embryos were laid in four rows of five on a 1 cm² piece of 53 µm nylon mesh. Each target was then centred on a 1/2 LM 60:1, 0.4% Gelrite (Appendix B, page 189) petri plate.

Stage 3 is the early cotyledonary stage of conifer embryo development, harvested from ESM after 3 weeks on maturation medium (Figure 3c).
This stage has also been referred to as the "flat head" stage of embryo development, as the ring of cotyledon primordia is just arising around the apical dome, giving the top of the embryo a flat appearance. Embryos were considered to be beyond stage 3 (in transition to maturity) when the tips of the cotyledons had elongated past the apical dome. Temporally, stage 3 is very brief as the cotyledons tend to elongate rapidly. Stage 3 embryos were arranged for microprojectile bombardment in the same manner as the stage 2 embryos.

Figure 3: Spruce Developmental Stages for Microprojectile Bombardment. A, proembryo; B, stage 2 embryo; C, stage 3 embryo (early cotyledonary stage); D, mature embryo; E, somatic germinants after 3 weeks on germination media; F, germinating white spruce pollen. The somatic embryo stages, germinants and pollen were analysed for expression of the endogenous 2S albumin gene by Northern blotting and were prepared as targets for microprojectile bombardment as described in the Methods.

The mature embryo is larger than the stage 3 embryo and the cotyledons have extended past the apical dome (Figure 3d). The cotyledons of somatic embryos have a tendency to spread out petal-like away from the apical dome; this differs from zygotic embryos, where the cotyledons tend to close over the apex. Mature embryos were harvested from the ESM after 4 to 5 weeks on maturation medium. Mature embryos were arranged for microprojectile bombardment as above, 20 embryos per target.

Partially-dried embryos were mature embryos that had undergone a three week high relative humidity treatment (Roberts et al., 1990b). The high relative humidity treatment involved placing squares of 53 µm mesh, each containing 20 to 25 mature embryos, in alternate empty wells of a sterile tissue culture 12-well plate (Becton Dickinson Labware), the 6 remaining wells being filled with 1 ml of sterile distilled water absorbed in a 1 cm² piece of Kimpak™ (Seedboro Equipment).
The embryos were not supplied with nutrients during the partial drying treatment. The 12-well plates were sealed with parafilm (American National Can™), wrapped in aluminium foil and stored in the dark for 3 weeks. Partially dried embryos were prepared for microprojectile bombardment by transferring the nylon mesh from the 12-well plate to hormone free 1/2 LM plates (Appendix B, page 189) solidified with 0.4% Gelrite.

Partially-dried somatic embryos were germinated on 1/2 LM HF, solidified with 0.6% noble agar (Appendix B, page 189), in 500 ml Phytocon tubs (Sigma). Three week old germinants were removed from the medium and prepared for microprojectile bombardment by positioning them horizontally, 7 per target, in the centre of a 0.8% water agar plate (Appendix B, page 191) (Figure 3e).

White spruce (Picea glauca) pollen was harvested in May 1993 at the Petawawa National Forestry Institute in Chalk River, Ontario and stored desiccated at 4 °C. Pollen grains were suspended in sterile distilled water (0.2 mg/ml) and stirred constantly with a magnetic stir bar on a stir plate. Five millilitres of pollen suspension were vacuum filtered onto sterile 5 cm squares of Biotrans nylon membrane (ICN Biochemicals), using a sintered-glass Buchner funnel. Pollen targets were centred on petri plates containing 5% sucrose, 0.6% water agar (Appendix B, page 191) for bombardment. Sucrose in the pollen medium was necessary for germination of the pollen grains. Only germinating pollen grains are able to express GUS (Figure 3f).

DNA Preparation for Microprojectile Bombardment

The expression vectors p2SMTN, p2S700, p2SGUS (Figures 2 and 19) and pBM113kp (Marcotte et al., 1988) were transformed by heat shock (Methods and Materials section 2.1.5) into the E. coli strains JM101 and SURE® (Stratagene) respectively.
Plasmids were isolated by alkaline lysis (Methods and Materials section 2.1.2) and purified by cesium chloride gradient (Methods and Materials section 2.1.2). Vector DNA was quantified by absorbance at 260 nm and diluted to 1 µg/µl for use in transient expression studies.

Vector DNA was precipitated onto 1.6 µm gold particles (BioRad) with CaCl2/spermidine (Klein et al., 1988). Ten microlitres of 1 µg/µl cesium chloride-purified plasmid DNA, 50 µl 2.5 M CaCl2, and 20 µl 100 mM spermidine free base (Sigma) were added, in order, to 25 µl of gold particles [3 mg/25 µl distilled water] in a 0.5 ml microfuge tube while being vortexed. The gold microprojectiles were incubated at room temperature for 10 minutes before being briefly centrifuged and washed twice in 100% ethanol. The vector-coated gold particles were resuspended in 50 µl of 100% ethanol and, while being vortexed, were dispensed as 5 µl aliquots onto the centre of each macrocarrier. Macrocarriers were loaded into the PDS 1000/He after the ethanol had evaporated. Gold microprojectiles were propelled into the spruce tissues by a burst of helium gas as the rupture disk gave way.

Microprojectile Bombardment

Twenty-three targets were prepared, as a group, for each embryo developmental stage (proembryo, stage 2 embryo, stage 3 embryo, mature embryo, somatic germinant and white spruce pollen). Targets were divided between the five expression vectors: five targets each for p2SGUS, p2S700, p2SMTN, and pBM113kp, and three targets for the negative control p2S+1. Targets for several experiments would be prepared in the morning and then bombarded in the late afternoon. Each experiment was replicated three times, usually on different days, for a total of 69 targets per experiment. In total, 5963 interior spruce somatic embryos were harvested for the transient expression experiments.
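The volumes and target numbers quoted above imply a nominal load per macrocarrier and a fixed number of targets per replicate. The following is a minimal sketch of that arithmetic, not a procedure taken from the thesis. It assumes that all of the plasmid DNA binds the gold and that nothing is lost during the ethanol washes, which the text does not state; all variable names are illustrative.

```python
# Nominal per-shot load and target counts, derived from the quantities quoted in the text.
# Assumption (not stated in the text): complete DNA precipitation and no loss in the washes.

dna_ug = 10 * 1.0          # 10 µl of plasmid DNA at 1 µg/µl
gold_mg = 3.0              # 3 mg of gold particles in 25 µl water
resuspension_ul = 50.0     # final volume of 100% ethanol
aliquot_ul = 5.0           # volume dispensed onto each macrocarrier

shots = resuspension_ul / aliquot_ul       # macrocarriers per preparation
dna_per_shot_ug = dna_ug / shots           # nominal DNA per macrocarrier
gold_per_shot_mg = gold_mg / shots         # nominal gold per macrocarrier

# Targets per developmental stage: five for each of four vectors, three for the negative control.
targets_per_vector = {"p2SGUS": 5, "p2S700": 5, "p2SMTN": 5, "pBM113kp": 5, "p2S+1": 3}
targets_per_replicate = sum(targets_per_vector.values())
replicates = 3
targets_per_experiment = targets_per_replicate * replicates

print(f"{shots:.0f} macrocarriers per preparation")
print(f"{dna_per_shot_ug:.2f} µg DNA and {gold_per_shot_mg:.2f} mg gold per macrocarrier (nominal)")
print(f"{targets_per_replicate} targets per replicate, {targets_per_experiment} per experiment")
```

Under these assumptions one coating reaction supplies ten macrocarriers, and the target totals agree with the 23 and 69 given in the text.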
The DuPont PDS 1000/He (BioRad) was operated as per manufacturer's instructions at a vacuum of 25 inches Hg, with 1100 psi rupture disks, a gap distance of 3/8 inch, and an internal nested gap of 16 mm. Microprojectile bombardment, for all developmental stages except the somatic germinants, was carried out with the targets positioned at shelf level 2 (8.3 cm from the stopping mesh). Germinants were bombarded at shelf level 3, 5.1 cm from the stopping mesh. The vacuum flow rate was set at 0.75 and the vent flow rate was set at 0.35 on the PDS 1000/He. Forty eight hours after bombardment, the interior spruce targets were histochemically assayed for transient GUS expression.

2.5.2 Stable Expression Experiments

Tobacco Transformation

Tobacco (Nicotiana tabacum cv. Xanthi) was transformed using a leaf disk method (Horsch et al., 1985). Agrobacterium tumefaciens cultures (EHA105/pBIN2S and EHA105/pBIN700) used for co-cultivation of tobacco were initiated by picking single colonies from YEP (Appendix B, page 187) kanamycin 50 mg/l agar plates and inoculating 3 ml of YEP broth plus rifamycin 50 mg/l, kanamycin 50 mg/l. The liquid culture was incubated on an orbital shaker at 28 °C overnight. The next morning, the cultures were diluted 1:10 with YEP broth plus antibiotics and grown to log phase (an OD 600 nm of between 0.7 and 0.9). Antibiotics were washed from the A. tumefaciens log phase cultures by centrifugation at 3900 rpm in a Beckman GP benchtop centrifuge to pellet the cells, followed by resuspension in 3 ml YEP, repeated 3 times. Cultures were diluted to 30 ml with YEP broth, and used for co-cultivation.

Leaves from young greenhouse grown plants (4 to 8 leaf stage) were removed and surface sterilized by soaking for 15 minutes in a solution of 10% commercial bleach and 0.1% Tween 20 (Sigma) followed by 4 washes with sterile distilled water. Avoiding major leaf veins, disks (10 mm diameter) were cut using a flame-sterilized cork borer.
Leaf disks were soaked in co-cultivation broth for 15 minutes, gently blotted dry on sterile paper towels and plated, abaxial side down, on Murashige and Skoog - shoot induction medium (MS-SIM) (Appendix B, page 191). Control leaf disks, which were not exposed to Agrobacterium, were prepared and plated on MS-SIM at the same time. The plates of tobacco leaf disks were placed under lights (25 to 30 µE/sec/m² light intensity, 16 hour photoperiod) in the tissue culture room. After 48 hours, A. tumefaciens was visible growing on the surface of the medium around the co-cultivated disks. The leaf disks were transferred to MS-SIM containing 50 mg/l kanamycin, 250 mg/l cefotaxime, and 500 mg/l carbenicillin. One control disk was placed with co-cultivated disks on each plate.

Shoots began to grow from the cut edge of the co-cultivated disks after approximately 2 weeks. At 4 weeks, putatively transformed shoots were removed from the initial explant and rooted on hormone free MS containing the same antibiotics as above. Rooted plants were transplanted to sterile potting soil, and kept under low light (35 µE/sec/m²) and high humidity for 3 days before being transferred to high light, long day length (200 µE/sec/m², 18 hr photoperiod) to encourage flowering.

Generation of T1 Plants

Flowers of T0 plants were allowed to self pollinate and the seed collected. T1 seed was germinated on water agar containing 300 mg/l kanamycin. Germinants able to form green true leaves on this high level of antibiotic selection were transplanted to soil and grown under the same regime as the parent plants. T1 plants were allowed to set seed and then were cut back to encourage further growth and seed production.

Two families for each construct (pBIN2S and pBIN700) were selected for further study, based on high levels of β-glucuronidase activity as determined by MUG assays of whole green seed capsules.
Expression of the 2.3 kb spruce promoter contained in the construct pBIN2S was studied in 8 offspring of the T0 parent plant 2S-4 and 11 offspring of T0 2S-12. The effect on stable expression of deletion of the 2S albumin promoter upstream from -653 was studied in the pBIN700 group of plants, consisting of 9 individuals originating from T0 plant 700-3 and 11 plants from the T0 plant 700-13.

2.5.3 Assessment of β-glucuronidase Activity

Tobacco leaves, roots, stems, whole flowers, seed and green capsule tissues were assayed for β-glucuronidase expression using both the histochemical (X-gluc) and the fluorescence (MUG) techniques. Tobacco seed capsules covering the range of embryo development were harvested; 1/2 of each capsule was placed in a vial of FAA fixative (Appendix B, page 182) and the remaining seeds were scraped into a microfuge tube, frozen in liquid nitrogen, and stored at -80 °C until β-glucuronidase activity was measured by MUG assay. Seeds fixed in FAA were dissected and scored for stage of embryo development.

A developmental series of tobacco embryos was dissected from fresh seeds, histochemically stained for GUS expression and fixed in FAA. These embryos were passed through a dehydration series from FAA to FAA:acetone (1:1), to acetone:immersion oil (1:1), then photographed in immersion oil using a Zeiss light microscope.

Histochemical GUS Assay (Jefferson et al., 1987)

GUS histochemical stain reagent was made by dissolving 50 mg X-gluc (5-bromo-4-chloro-3-indolyl-β-D-glucuronide, Clontech) in 100 µl of dimethylformamide before being diluted in 100 ml 50 mM NaPO4 buffer (pH 7.0), 1% Triton X-100 (iso-octylphenoxy polyethoxy-ethanol, BDH Chemicals) to give a final concentration of 0.5 mg/ml (Jefferson, 1987). The X-gluc solution was made fresh or stored frozen at -20 °C.
Filter disks containing proembryos and white spruce pollen were placed on Whatman #1 filters which had been soaked in X-gluc solution, whereas larger spruce and tobacco tissues were simply immersed in microfuge tubes or small petri plates. All samples were incubated overnight at 37 °C in the dark. The number and location of blue loci on microprojectile bombarded tissues were observed using a dissecting microscope.

Fluorometric GUS Assay

Tobacco samples were either fresh or frozen in liquid nitrogen and stored at -80 °C before being assayed for β-glucuronidase activity. Tobacco seed (50 - 100 mg), 10 mm leaf disks, pollen and whole flowers were assayed by grinding each tissue in 500 µl MUG extraction buffer (50 mM NaHPO4 (pH 7.0), 10 mM DTT, 1 mM EDTA, 0.1% Triton X-100) in a 1.5 ml microfuge tube using a homogenizer with a pestle and a pinch of acid washed sand (BDH). Whole tobacco leaves, capsules and roots were ground to a fine powder using liquid nitrogen in a pre-chilled mortar and pestle. Approximately 200 mg of ground tissue were transferred to a 1.5 ml microfuge tube and suspended in 500 µl MUG extraction buffer. Insoluble plant material was removed by centrifugation at 13,000 rpm for 10 minutes. The supernatant was removed to a clean tube, 100 µl stored at -80 °C for total protein measurements (BioRad Protein assay, BioRad - see below) and 20 to 50 µl of the crude extract (extract volumes consistent within an experiment) assayed for β-glucuronidase activity.

A typical reaction mix consisted of 40 µl sample extract, 745 µl MUG extraction buffer and 200 µl methanol (Kosugi et al., 1990), warmed to 37 °C, to which 20 µl of 50 mM 4-methylumbelliferyl glucuronide (MUG) stock was added (Jefferson et al., 1987). A control blank was made for each set of samples, with 40 µl distilled water replacing the sample extract.
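The final concentrations of substrate and methanol in the typical reaction mix are not stated, but they follow from the volumes listed above. A short sketch of the calculation is given here for convenience; the variable names are illustrative and the rounding is this sketch's own.

```python
# Final concentrations in the "typical" MUG reaction mix, from the volumes quoted in the text.
volumes_ul = {
    "sample extract": 40.0,
    "MUG extraction buffer": 745.0,
    "methanol": 200.0,
    "50 mM MUG stock": 20.0,
}
mug_stock_mM = 50.0

total_ul = sum(volumes_ul.values())
# Dilution of the stock: C_final = C_stock * V_stock / V_total
mug_final_mM = mug_stock_mM * volumes_ul["50 mM MUG stock"] / total_ul
methanol_percent = 100.0 * volumes_ul["methanol"] / total_ul

print(f"total volume: {total_ul:.0f} µl")
print(f"MUG: {mug_final_mM:.2f} mM final")
print(f"methanol: {methanol_percent:.1f}% (v/v)")
```

The mix therefore works out to roughly 1 mM MUG in about 20% methanol.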
Immediately after the addition of the MUG substrate, the microfuge tube was inverted to mix, and 100 µl of the reaction mix was removed and added to 900 µl Stop buffer (0.2 M Na2CO3) as the zero time measurement. The reaction mix was returned to 37 °C and further 100 µl samples were placed in 900 µl Stop buffer at 60 minutes, 2 hours and 3 hours.

Fluorescence was read using a LS 50 Luminescence Spectrometer (Perkin-Elmer Cetus) and the Obey software package with FL Data manager and sipper mechanism. Excitation and emission wavelengths were set at 365 and 460 nm respectively. The spectrometer was calibrated with a dilution series of 7-hydroxy-4-methylcoumarin (4-MU) (Sigma) (0.1 µM, 0.5 µM, 1.0 µM, 5.0 µM, and 10.0 µM), and 100 µl of each standard was added to 900 µl of Stop buffer.

Protein Quantification

The Bio-Rad Protein Assay microtitre plate protocol (based on Bradford, 1976) was used to quantify total protein, for normalisation of β-glucuronidase activity between samples to activity per milligram total protein. Two replicates of 10 µl sample extract, as well as of a BSA standard (Bio-Rad) dilution series, were pipetted into a 96 well microtitre plate. Sample extracts and BSA standards were stored at -80 °C until measured. Standards were 0 mg/ml, 0.070 mg/ml, 0.141 mg/ml, 0.705 mg/ml, and 1.41 mg/ml BSA in sterile distilled water. Bio-Rad dye reagent concentrate was diluted 1:5 with distilled water and filtered through a Whatman #1 filter (Whatman). Two hundred microlitres of dilute dye reagent were added to each well and mixed by pipetting. The microtitre plate was incubated at room temperature for at least 5 minutes and then read at 595 nm using a microtitre plate reader (Titertek, Flow Laboratories, Inc.).
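Both the 4-MU calibration and the BSA dilution series are standard curves, and unknowns are read against them. The text does not say how the curves were fitted. The sketch below assumes a straight-line, ordinary least-squares fit over the range of the standards; the function names are hypothetical and the absorbance readings are invented for illustration, not data from this work.

```python
# Straight-line standard curve by ordinary least squares, then inversion to read unknowns.
# Assumptions: a linear response over the standard range, and invented readings
# (the BSA concentrations are those listed in the text; the absorbances are not real data).

def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def read_unknown(signal, slope, intercept):
    """Convert a measured signal back to a concentration using the fitted line."""
    return (signal - intercept) / slope

bsa_mg_per_ml = [0.0, 0.070, 0.141, 0.705, 1.41]
a595 = [0.30 + 0.50 * c for c in bsa_mg_per_ml]   # hypothetical, perfectly linear readings

slope, intercept = fit_line(bsa_mg_per_ml, a595)

# Two replicate wells of one extract (hypothetical readings), averaged.
replicate_a595 = [0.52, 0.54]
p1, p2 = (read_unknown(a, slope, intercept) for a in replicate_a595)
mean_protein_mg_per_ml = (p1 + p2) / 2

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"mean protein = {mean_protein_mg_per_ml:.2f} mg/ml")
```

The same two functions could be applied to the 4-MU fluorescence standards by substituting the µM concentrations and fluorescence readings for the BSA values.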
Calculation of β-glucuronidase Enzyme Activity

β-glucuronidase enzyme activity was calculated using the measured amount of 4-methylumbelliferone (4-MU) generated by the activity of the enzyme on the substrate 4-methylumbelliferyl β-D-glucuronide (MUG) (calculated from fluorescence measured over time), normalised to the amount of protein in a given sample. An example calculation is given below:

$$\text{4-MU generated (nmol/min/µl)} = \frac{[\text{4-MU}]_{T_2} - [\text{4-MU}]_{T_1}}{(T_2 - T_1)\,(\text{vol. crude extract, µl})\,(1000)}$$

$$\text{Protein (mg/µl)} = \frac{(P_1 + P_2)/2}{10\ \text{µl}}$$

$$\text{β-glucuronidase activity (pmol min}^{-1}\ \text{mg}^{-1}) = \frac{(\text{4-MU, nmol/min/µl}) \times 1000}{\text{protein (mg/µl)}}$$

[4-MU] = calculated concentration of 4-methylumbelliferone at a given time (pmoles/ml)
T1 and T2 = times when samples were taken (minutes)
P1 and P2 = concentrations of replicate protein samples (mg/ml)

2.6 Data Analysis and Statistics

Data were compiled and analysed using the statistics package SYSTAT, version 5 (SYSTAT Inc.). Transient expression was quantified as number of blue loci per target or per embryo and stable expression as pmoles of 4-MU generated per minute per milligram of total protein. The mean level of expression and standard error were calculated for each developmental stage or tissue type. An analysis of variance (ANOVA) was done on the data, as well as a pair-wise comparison of means (Fisher's least-significant-difference test) (Fisher, 1935) to confirm statistically significant differences between means.

CHAPTER THREE

Gene Structure

3.1 Characterisation of λ3.2

3.1.1 Pseudogene (Ψ2S)

Two positive lambda clones (λ2.1 and λ3.2) were isolated from a white spruce genomic library (PNFI-X-88) screened with the synthetic oligonucleotide II5G.1 (Table 3: Synthetic Oligonucleotides) (Dr. Craig Newton, Forest Biotechnology Centre - BC Research Inc., 3650 Wesbrook Mall, Vancouver, B.C., V6S 2L2). My research began with the restriction mapping of λ3.2.
In the course of mapping the lambda clone, Southern blots revealed the presence of two regions of sequence with homology to the 2S albumin cDNA (GenBank X63193) (Figure 4). A 1.7 kb SalI/BamHI restriction fragment of λ3.2 (subcloned into the sequencing vector pGEM and denoted p12a-3) was partially sequenced and the BamHI restriction site was found to be located in the putative coding region of the gene. This was unexpected as the original cDNA clone did not have a BamHI site. Further sequencing of p11b-2 (a 13 kb SalI fragment which contained the previous subclone) revealed that this sequence was a pseudogene containing stop codons in all reading frames, upstream of a 643 basepair insertion (Figure 5). The sequence of the Picea glauca 2S albumin pseudogene, Ψ2S, was submitted to GenBank, where it was assigned accession number U92078.

The insertion sequence was A/T rich and had 3 small inverted repeats at its 5' and 3' borders (5'-GCACA/TGTGC-3', 5'-AATTAATG/CATTAATT-3', and 5'-ACGCCGAA/TTCGGCGT-3'). The presence of inverted repeat sequences at both ends of the insertion indicates that this may be a type of transposable or viral element. In order to discover if this inserted sequence had homology to previously identified insertion elements, transposons, or retro-viral elements, the nucleotide sequence of the insertion was translated in all reading frames in both the forward and reverse directions. This resulted in six possible amino acid sequences, which were submitted to a BLAST search for comparison with known protein sequences (Altschul et al., 1990). Potential matches from the different translations are listed in Table 4. Amino acid sequence alignments from the BLAST searches and alignment statistics are recorded in Appendix D (page 195).
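For readers unfamiliar with the two operations described above, the following sketch shows what a six-frame translation and an inverted-repeat check involve. It is an illustration only and is not the software used in this work. It assumes the standard genetic code, and the function names and the short test sequence are hypothetical; the three repeat pairs are those quoted in the text.

```python
# Six-frame translation and inverted-repeat check (illustrative sketch, standard genetic code).

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def translate(seq):
    """Translate complete codons only; codons with ambiguous bases become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))

def six_frame(seq):
    """Translate three forward and three reverse-complement reading frames."""
    frames = {}
    for label, strand in (("forward", seq), ("reverse", reverse_complement(seq))):
        for offset in range(3):
            frames[(label, offset + 1)] = translate(strand[offset:])
    return frames

def is_inverted_repeat(left, right):
    """True if the right arm is the reverse complement of the left arm."""
    return reverse_complement(left) == right.upper()

# The three border repeats quoted in the text, written as (left arm, right arm).
border_repeats = [("GCACA", "TGTGC"), ("AATTAATG", "CATTAATT"), ("ACGCCGAA", "TTCGGCGT")]
repeat_checks = [is_inverted_repeat(left, right) for left, right in border_repeats]

demo_frames = six_frame("ATGGCCTGA")   # hypothetical 9-nucleotide example
for frame, peptide in demo_frames.items():
    print(frame, peptide)
print(repeat_checks)
```

Each of the six translated sequences would then be used as a separate protein query, which is why Table 4 is organised by reading frame.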
[Figure 4 graphic: restriction map of λ3.2 showing the pGEM subclones p12a-3, p11b-3 and p11b-2, with gene diagrams of the pseudogene (Ψ2S, with its insertion) and the 2S albumin gene below. Restriction site key: B, BamHI; H, HindIII; S, SalI; X, XbaI. Scale bars: 1 kb (restriction map) and 100 bp (gene diagrams). Sequencing primers are not drawn to scale.]

Figure 4: Restriction Map of λ3.2. Solid black bars above the λ3.2 restriction map indicate restriction fragments subcloned into sequencing vectors (p11b-3, p11b-2, and p12a-3). Right and left arms of the EMBL3 lambda cloning vector are not drawn to scale. Gene diagrams below λ3.2 represent regions of contiguous sequence. Horizontal arrows indicate location and direction of sequencing primers (1 = II5G.1, 2 = II5G.2, 3 = II5G.3, 4 = II5G.4, and 5 = II5G.5). II5G.1, II5G.3 and II5G.4 bind to both sequences. Vertical arrows mark restriction enzyme sites. Striped areas denote regions of homology between the two genes. Location of the first in-frame stop codon is noted in the pseudogene as a solid bar above the "TGA". GenBank accession numbers are U92077 for the 2S albumin gene PG2S and U92078 for Ψ2S.

Figure 5: Picea glauca 2S albumin pseudogene Ψ2S. The pseudogene, Ψ2S (GenBank accession number U92078), was sequenced from the genomic clones p12a-3 and p11b-2. Highlighted sequence in the figure lacks homology with the functional white spruce 2S albumin gene (PG2S). The sequence from 845 to 1488 is a 643 bp insertion into a region homologous to the spruce 2S albumin exon 1. Solid arrows mark sequencing-primer binding sites (II5G.1, II5G.3 and II5G.4); arrowheads indicate the direction of priming. A dotted arrow marks the position where the primer II5G.2 annealed in the functional gene but does not bind in the pseudogene, due to the 5 nucleotide differences noted above the arrow. The putative initiation codon and stop codons (stops in all reading frames) are boxed, and a unique BamHI site is also highlighted. Small inverted repeats (A, B, and C) present in the insertion are underlined.
[Figure 5: Picea glauca 2S Albumin Pseudogene Ψ2S. Annotated nucleotide sequence listing, showing primer sites II5G.1 to II5G.4, the BamHI site and inverted repeats A, B and C; the listing is not legibly recoverable from the extracted text. The sequence is deposited as GenBank accession U92078.]

Table 4: Matches from BLAST Sequence Alignments

Insertion reading frame | Sequence matches | GenBank accession # | BLAST score | Identities | Positives
1 forward | none | | | |
2 forward | none | | | |
3 forward | capacitative Ca entry channel (Bos taurus) | X99792 | 59 | 15/38 (39%) | 20/38 (52%)
1 reverse | omega-Grammotoxin SIA (Grammostola spatulata = tarantula) | 451235 | 54 | 10/35 (28%) | 15/35 (42%)
1 reverse | F47C12.5 gene product (Caenorhabditis elegans) | U61946 | 56 | 9/20 (45%); 6/17 (35%) | 13/20 (65%); 10/17 (58%)
1 reverse | sequence 2 from Patent US 4920196 | 100054 | 33 | 7/8 (87%); 5/7 (71%) | 7/8 (87%); 7/7 (100%)
2 reverse | ADP, ATP carrier protein 2 (Arabidopsis thaliana) | P40941, X68592 | 60 | 15/40 (37%) | 17/40 (42%)
2 reverse | nucleotide translocator (Arabidopsis thaliana) | 1908224A | 60 | 15/40 (37%) | 17/40 (42%)
2 reverse | ADP, ATP carrier protein 1 (Arabidopsis thaliana) | P31167, X65549 | 59 | 15/40 (37%) | 17/40 (42%)
3 reverse | none | | | |

The second region of λ3.2 with homology to the cDNA, subclone p11b-3, was sequenced. During restriction mapping, the two related sequences were differentiated by hybridization with the end-labelled oligonucleotides II5G.5 (present in the cDNA and p11b-3) and II5G.1 (present in both p11b-2 and p11b-3). Orientation of the sequences in relation to each other was confirmed using the II5G sequencing primers (Table 3) in various combinations to produce PCR products. Presence or absence of amplified sequences was used to deduce the direction of the primers in each pair in relation to each other. No product was amplified by the primers II5G.4 or II5G.1 when used alone, confirming that the two genes were arranged facing in the same direction.
3.2 Picea glauca 2S Albumin (PG2S)

Sequencing of p11b-3 resulted in 1907 bp of contiguous 2S albumin genomic sequence; of this, 1063 bp was 5' flanking region, 490 bp was the first exon, and 176 bp was a single intron, followed by a small 32 bp second exon and 146 bp of 3' flanking sequence (Figure 6). The sequence was named PG2S and submitted to the GenBank database, where it was given accession number U92077. Nucleotide sequence homology between the 2S albumin genomic clone contained in p11b-3 and the pseudogene (Ψ2S - GB U92078) contained in p11b-2 was 84.7% and included 5' flanking sequence as well as coding region on both sides of the insertion in the pseudogene (Figure 7).

Previous work had identified a putative transcriptional start site (+1) as being 62 bases upstream of the translational start site (Craig Newton, unpublished; Forest Biotechnology Centre - BC Research Inc., 3650 Wesbrook Mall, Vancouver, B.C., V6S 2L2). The initiation codon of the 2S albumin genomic clone at position +62 (AGCAAUGGG) was surrounded by nucleotides which are similar to those of the plant consensus sequence for initiation (AACAAUGGC) (Lütcke et al., 1987). The EUKPROM program of PC/GENE (release 6.85) identified a TATA box (TCATAAATACAACAA) 37 nucleotides upstream of the transcriptional start site, with a CAAT box (CGCGGCCATTGG) 153 nucleotides upstream from the TATA box. Several possible RNA cap signals were also indicated at positions -16, -3 or +6. The polyA signal AATAAA was found at +808 and the polyA attachment site was determined to be at +883 by comparison with the cDNA clone.

3.2.1 5' Flanking Sequence

There were five sets of large direct repeats found in the 5' flanking sequence of PG2S (Figure 8). In three cases there are small stretches (1 to 3 nucleotides) of disparate nucleotides interrupting what would be a longer direct repeat.
Also notable was a 30 basepair sequence at -737, repeated completely in tandem once at -707 and again at -677 for the first 12 basepairs. This repeat is labelled 3a through 3c in Figure 8. Located within this 30 basepair repeat unit was a 6 basepair inverted repeat (atgagcgagctagcaaagctcat) which has the potential to form a hairpin loop with a free energy of -5.6 kcal. Eleven motifs with similarity to repetitive elements found in cereal seed storage protein promoters were identified (Table 5).

Figure 6: 2S Albumin genomic clone from Picea glauca. The 2S albumin genomic clone was sequenced from the subclone p11b-3. The GenBank accession number assigned was U92077. The promoter sequence (-1001 to -1), single intron, the 3' UTR and 3' flanking sequence are written in lower case nucleotides. The 5' UTR is in uppercase from +1 to +62, as is the coding region from +62 to +551 and +728 to +759, with the translated amino acid sequence given below. The boxed nucleotides in the sequence are, from 5' to 3': a unique XbaI restriction site, the putative CAAT box, the unique SphI restriction site, the putative TATA box, 2 possible RNA cap signals, the initiation of transcription site (open box) and the putative initiation codon (ATG). The second open boxed C marks the site of translational fusion of the 2S albumin promoter to the gus gene. Within the intron sequence two potential hairpin loops are in bold face and possible intron branch sites are underlined. The stop codon (TGA), the poly-A signal, and the site of poly-A attachment are also highlighted.
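The repeat structure described in Section 3.2.1 can be checked with a few lines of Python. The sketch below assumes the 30 bp unit as transcribed from rows -761 and -701 of Figure 6, and tests that the two 6 bp arms of the inverted repeat are reverse complements of each other; it applies the same test to the inverted repeats A, B and C listed in the Figure 7 legend. The helper name `reverse_complement` is introduced here for illustration, and the sketch does not attempt to reproduce the -5.6 kcal free-energy estimate.

```python
# Reverse-complement check for the inverted repeats (sequences transcribed
# from the Figure 6 rows and the Figure 7 legend; helper name is illustrative).
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string, preserving case."""
    return seq.translate(COMPLEMENT)[::-1]

# 30 bp unit at -737, repeated fully at -707 and for its first 12 bp at -677.
unit = "aaatggaatgagcgagctagcaaagctcat"
tandem_region = unit + unit + unit[:12]

# Inverted repeat inside the unit: 6 bp arms flanking an 11 bp loop.
stem_loop = "atgagcgagctagcaaagctcat"
left_arm, right_arm = stem_loop[:6], stem_loop[-6:]
loop_length = len(stem_loop) - 12

print(len(unit), len(tandem_region), unit.find(stem_loop), loop_length)
print(reverse_complement(left_arm) == right_arm)

# Inverted repeats near the ends of the pseudogene insertion (Figure 7 legend).
insertion_repeats = {
    "A": ("GCACA", "TGTGC"),
    "B": ("AATTAATG", "CATTAATT"),
    "C": ("ACGCCGAA", "TTCGGCGT"),
}
for label, (left, right) in insertion_repeats.items():
    print(label, reverse_complement(left) == right)
```

Each comparison evaluates to True, so the arms can in principle base-pair; whether a hairpin actually forms depends on the folding energy, which this check does not address.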
Figure 6: 2S Albumin genomic clone PG2S from Picea glauca
[...] marks a boxed or highlighted stretch that is illegible, and ? a single illegible nucleotide; row boundaries next to [...] are approximate. Primer labels on the figure: II5G.1 and GUS (near row +80), II5G.5 (near row +740).

-1001  ctaaatggaaaagattgcatgattagtttgagaatacgggtttcagggttcatcttacca
 -941  gtggagaatcttttgattcgggaaacaaacgcagatactcagtcgcacaccataacagtg
 -881  gacactggtgagtcttttgattcgtgaacaaaacgcagatactcagtggcacaacataac
 -821  aatggctaatcttttggattcaaatggaaagaacgaagacattgaaaattgaaggaatgg
 -761  gggagaaggagaagcaaagttcagaaatggaatgagcgagctagcaaagctcataaatgg
 -701  aatgagcgagctagcaaagctcataaatggaatgagaaagcatcaa[...]tgacata
 -641  caataggacattaggcagagagacaggggatgtttgcatggctgtgtaggtggcaattca
 -581  tgagaaggcggtggaggtggccagtcatgagcaaatgagctatggcgatgcactcaagaa
 -521  gcaaacatttcttaacatttaatgtgtaatgttagaattagttctagcattacttacttg
 -461  attggaaaaaataatgccaaattcatgtgcgttaaaagcattcagtcgtcattgttacgg
 -401  ttactataaactttatgaaactttggctaaaagcattcagtcgtcattggttacggttac
 -341  tatagtctctacagcccgaacgagggaataataaagacaatgtaaagcccagtttctaat
 -281  tgagatcatttgtgtaaaatgagatacccttacgtcgtgtcgttactgtacggcctgccc
 -221  gctaacaagattctctctgtcttttgaaagacgcggccattggaaacagtggcagccagg
 -161  cgttccttcactggctttggacgtggagagtgagacgct[...]accgcatgtgt
 -101  tccacgcttctctgtcgtcgtgttgtgttcgtcgtggaagaagacacattcctttcctct
  -41  cccatcataaatacaacaatctccgcct[...]
  +20  TTCATCACGGGAAGAAAGGAAAGAAAGAAAAAGCTGAAAGCAATGGGTGTCTTTTCCCCT
                                                 M  G  V  F  S  P
  +80  TCGACGACGAGG?TGACGCTCAAATGGTTCAGTTTATCCGTCGCCCTGTTCCTCCTCCTT
       S  T  T  R  L  T  L  K  W  F  S  L  S  V  A  L  F  L  L  L
 +140  CACTGGGGTATTCCCAGTGTTGATGGCCATGAAGACAATATGTATGGAGAGGAGATACAA
       H  W  G  I  P  S  V  D  G  H  E  D  N  M  Y  G  E  E  I  Q
 +200  CAACAAAGACGGTCGTGCGACCCTCAGAGACACCCGCAGAGATTGTCTTCATGCCGGGAC
       Q  Q  R  R  S  C  D  P  Q  R  H  P  Q  R  L  S  S  C  R  D
 +260  TACTTGGAGCGCCGGAGAGAGCAGCCATCGGAGAGATGCTGCGAGGAATTGCAAAGAATG
       Y  L  E  R  R  R  E  Q  P  S  E  R  C  C  E  E  L  Q  R  M
 +320  TCTCCACAATGCCGATGCCAAGCGATACAGCAAATGCTCGATCAATCTTTATCGTATGAT
       S  P  Q  C  R  C  Q  A  I  Q  Q  M  L  D  Q  S  L  S  Y  D
 +380  TCCTTCATGGATTCTGACTCTCAGGAGGATACACCACTTAATCAACGACGCCGCCGCCGC
       S  F  M  D  S  D  S  Q  E  D  T  P  L  N  Q  R  R  R  R  R
 +440  CGCGAAGGGCGCGGAAGAGACGAGGAGGAGGTGATGGAGAGAGCAGCATACCTTCCGAAT
       R  E  G  R  G  R  D  E  E  E  V  M  E  R  A  A  Y  L  P  N
 +500  ACCTGCAACGTTCGCGAGCCCCCCCGCCGCTGCGATATTCAACGCCACTCTCgtaagtcc
       T  C  N  V  R  E  P  P  R  R  C  D  I  Q  R  H  S  R
 +560  ttcaatcaacgctaccaattatgacgtatcataattatgacgaagcggtccatctatcaa
 +620  tataacgtggctatgcaaaattttcattcagtcatgtttctgttattccataccccaatt
 +680  aatgattaatttaagtcatttgttgttttactgctggtgtctggacagGCTATTTCATGA
                                                         Y  F  M
 +740  CGGGCAGCAGTTTTAAGTGAtcgacgaagaagaaaatatagatactgcgtgtatgctatg
       T G  S  S  F  K
 +800  tatgtcccta[...]taagggaggcactaccgctatgtatttttggtttctgcttt
 +860  tatagatatagcctctcattcaatg[...]caccacttttcacttacatcatg

Figure 7: Alignment of Ψ2S with the Functional 2S Albumin Gene. Identity is 84.7% between Ψ2S and the functional 2S albumin gene, PG2S, with nine gaps inserted in Ψ2S and five gaps inserted in the 2S albumin sequence. Identical nucleotides are denoted with "|"; gaps are shown as "-". The putative CAAT box, TATA box and initiation codon are highlighted, as are potential stop codons upstream of the insertion.
Even allowing for some uncertainty in the sequence, multiple stop codons are found in all reading frames (reading frames are noted above each stop codon) at the 5' end of the inserted sequence. GenBank accession numbers are U92077 for PG2S and U92078 for Ψ2S. Horizontal black arrows drawn above the insertion sequence indicate three short inverted repeats found near both ends: A = 5'-GCACA/TGTGC-3', B = 5'-AATTAATG/CATTAATT-3', and C = 5'-ACGCCGAA/TTCGGCGT-3'. AT-rich areas at both ends of the insertion are in bold typeface. The primer II5G.3 marks the furthest 3' point to which the pseudogene was sequenced.

Figure 7: Alignment of Ψ2S with the functional 2S Albumin Gene
[Sequence alignment not reproduced.]

[...] > Leu, Asp57 > His, Ala117 > Thr, Glu132 > Asp, Ala135 > Val, Ser165 > Phe). There was also the addition of a sixth arginine in a string of five arginines at position 125 in the genomic sequence.

Figure 11: Comparison of the Spruce 2S Albumin genomic and cDNA clones. Identity at the nucleotide level between the 2S albumin genomic clone and the cDNA clones is 93.3% with II5G1001, 90.7% with XI1A001, 88.1% with III9H001, and 82.4% with X5H001. GenBank accession numbers are U92077 for the genomic 2S albumin clone and X63193 for the cDNA II5G1001.
Identical nucleotides are indicated by dashes. Light shaded boxes indicate neutral nucleotide changes. Dark shaded boxes indicate sequence differences resulting in amino acid substitutions. Open boxes indicate an insertion relative to the other sequences, and "*" indicates a gap placed to allow alignment due to an insertion. A triangle points to the position of the intron in the genomic clone. The amino acid translation of the genomic clone is above the nucleotide sequence. Amino acids which differ between the genomic and cDNA clones are written in bold type in brackets.

Figure 11: Alignment of 2S Albumin Genomic and cDNA clones
[Sequence alignment not reproduced. First row, genomic clone 11b-3: CCTTCGCACTTGCATTACGTTCATCACGGGAAGAAAGGAAAGAAAGAAAAAGCTGAAAGC (60).]
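The identity values quoted in the legends to Figures 7 and 11 are percentages of matching positions in a pairwise alignment. As a minimal sketch of how such a figure is obtained, the Python function below counts identical columns in two aligned rows. The function name, the `count_gap_columns` switch and the two toy sequences are hypothetical; whether the alignment program used in the thesis counted gap columns in the denominator is not stated, so both conventions are shown.

```python
def percent_identity(row_a, row_b, count_gap_columns=True):
    """Percent of aligned columns holding identical residues.

    row_a, row_b: equal-length rows of one alignment, with '-' marking gaps.
    count_gap_columns: if False, columns containing a gap are left out of
    the denominator (conventions differ between programs).
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    columns = list(zip(row_a.upper(), row_b.upper()))
    if not count_gap_columns:
        columns = [(a, b) for a, b in columns if a != "-" and b != "-"]
    if not columns:
        raise ValueError("no columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)

# Toy alignment (hypothetical sequences, not taken from the thesis).
toy_a = "ACGT-ACGTA"
toy_b = "ACGTTACCTA"
print(percent_identity(toy_a, toy_b))
print(round(percent_identity(toy_a, toy_b, count_gap_columns=False), 1))
```

For the toy rows, eight of ten columns match, and eight of nine once the gap column is excluded, which shows how the choice of denominator shifts the reported value.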