Open Collections

UBC Theses and Dissertations

Gene families encoding three enzymes of phenylpropanoid metabolism in raspberry (Rubus idaeus L) : characterization… Kumar, Amrita 2000

Gene families encoding three enzymes of phenylpropanoid metabolism in raspberry (Rubus idaeus L.): Characterization of the families, and of the cognate ripening-related fruit cDNAs.

By Amrita Kumar
B.Sc. (Ag), Banaras Hindu University, 1989
M.Sc. (Ag), Banaras Hindu University, 1991

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES (The Biotechnology Laboratory) & (Faculty of Agricultural Sciences)

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
April, 2000
© Amrita Kumar, 2000

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of ________
The University of British Columbia
Vancouver, Canada

Abstract

A PCR-based homology search of the Rubus genome led to the isolation and characterization of two PAL genes (Ripal1 and Ripal2), three 4CL genes (Ri4cl1, Ri4cl2, Ri4cl3), and ten PKS genes (Ripks1-10). These data demonstrate that such a PCR-based homology search could be extended to characterize PAL, 4CL and PKS gene-family members from other crops, in a manner that is independent of their expression. To identify members of these gene-families that may be associated with fruit-ripening, a cDNA library representing partially-ripe fruits was sequentially screened at low stringency with a mixed population of each gene-family member.
These hybridization-based homology screenings led to the characterization of two full-length PAL (RiPAL1 and RiPAL2), three full-length 4CL (Ri4CL1, Ri4CL2, and Ri4CL3), and three full-length PKS genes (RiPKS5, RiPKS6, and RiPKS11). Thus, the characterization of an additional 4CL (Ri4CL3) and PKS (RiPKS11) gene suggests that a PCR-based homology search by itself may not be sufficient for isolation and characterization of all members representing these gene-families.

Although phenylpropanoid-derived metabolites are important for fruit quality, the regulation of this pathway during fruit development is still poorly understood. I have therefore profiled the expression pattern of these three important phenylpropanoid gene families in ripening Rubus fruits. Quantitative RT-PCR assays with gene-specific probes showed that the individual gene family members are differentially expressed during fruit ripening and in the various Rubus tissues studied. Furthermore, some members from each of the RiPAL, Ri4CL and RiPKS gene families displayed very similar patterns of expression, which may reflect functionally-related roles in generating specific phenylpropanoid end-products in a given tissue.

I have also examined the evolutionary dynamics of these three gene families. Duplication and divergence in function appear to be a common theme for evolution of all three gene families. I find that members of all three gene families from a given plant species are not necessarily more closely related to one another than they are to other genes from other species.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Abbreviations
Acknowledgments

1. General Introduction
1.1 Phenylalanine as a precursor of secondary products in plants
1.2 Phenylpropanoid metabolism in plants
1.3 Gene(s) encoding PAL, 4CL, and PKS enzymes of the phenylpropanoid metabolism
1.4 Biosynthesis of Rubus phenylpropanoids
1.5 Project objectives

2. Materials and Methods
2.1 Plant growth-conditions and materials
2.2 Establishment, maintenance and elicitation of raspberry cell cultures
2.3 Gene amplification and characterization
2.4 Sequencing and sequence analysis of PCR products
2.5 Construction and screening of the cDNA library
2.5.1 Extraction of RNA from partially ripe fruits of Rubus idaeus
2.5.2 Synthesis of Uni-ZAP XR cDNA library
2.5.3 DNA screening of cDNA library
2.5.4 Restriction map-analysis, and sequence analysis of the cDNAs
2.6 Southern and northern blot analysis
2.7 Recombinant proteins
2.7.1 cDNA cloning into E. coli expression vectors
2.7.2 Recombinant protein induction and extraction
2.8 Preparation of protein extracts from plant tissues and cell-cultures
2.9 Protein analysis
2.10 Enzymatic assays
2.10.1 Assay for 4-coumarate-CoA ligase
2.10.2 Enzymatic synthesis of hydroxycinnamyl-Coenzyme A derivatives
2.10.3 Assay for plant polyketide synthase activity
2.11 Analysis of gene expression by RT-PCR
2.11.1 Extraction of total RNA from Rubus tissues
2.11.2 Synthesis of cDNA
2.11.3 Generation of PCR competitor
2.11.4 Expression of PAL, 4CL and PKS gene family members in various Rubus organs
2.11.5 Absolute amounts of Rubus PAL, 4CL, PKS gene-family members in various Rubus tissues
2.12 Phylogenetic analysis

3. Phenylalanine ammonia-lyase gene-family in Rubus idaeus
3.1 Introduction
3.2 Methods
3.2.1 Degenerate PCR primer design and amplification of Rubus genomic DNA
3.2.2 Screening of Rubus cDNA library for PAL genes
3.2.3 Design of gene-specific primers and RT-c
3.3 Results
3.3.1 PCR-based search for Rubus PAL gene family
3.3.2 Construction of cDNA library
3.3.3 Isolation and characterization of two ripening-related Rubus PAL genes
3.3.3.1 Cloning of PAL cDNA(s)
3.3.3.3 Sequence analysis
3.3.4 Developmental expression of the RiPALs
3.3.4.1 Design of gene-specific primers
3.3.4.2 Developmental regulation of RiPAL genes
3.3.4.3 Comparison of expression levels of the two RiPAL genes in different organs of Rubus
3.4 Discussion

4. 4-coumarate:coenzyme A ligase gene family in Rubus idaeus
4.1 Introduction
4.2 Methods
4.2.1 Design of degenerate PCR primers for 4CL genes
4.2.2 PCR amplification and characterization of products
4.2.3 Screening of Rubus cDNA library for 4CL genes
4.2.4 Design of gene-specific primers and RT-cPCR
4.2.5 Cloning of the Rubus 4CL3 gene into E. coli expression vectors
4.3 Results
4.3.1 PCR-based search for the Rubus 4CL gene-family
4.3.2 Isolation and characterization of ripening-related 4CL cDNAs
4.3.3 Enzymatic activity of the recombinant Ri4CL3 protein
4.3.4 Developmental regulation of the Ri4CL genes
4.3.4.1 Design of gene-specific oligonucleotide primers
4.3.4.2 Developmental expression of Ri4CL transcripts
4.4 Discussion

5. Polyketide synthase gene-family in Rubus idaeus
5.1 Introduction
5.2 Methods
5.2.1 Degenerate PCR primer design and amplification of members of the Rubus PKS gene family
5.2.2 Screening of a Rubus cDNA library for expressed PKS genes
5.2.3 Expression of recombinant Rubus PKS proteins
5.2.4 Design of gene-specific primers and RT-cPCR
5.3 Results
5.3.1 Amplification and characterization of the Rubus PKS genes
5.3.2 Rubus PKS sequence classes
5.3.3 Rubus PKS gene structure and sequence comparisons
5.3.4 Isolation and characterization of ripening-related Rubus PKS genes
5.3.5 Molecular modeling
5.3.6 Phylogenetic origin of Rubus polyketide synthases
5.3.7 Functional expression of the RiPKS cDNAs
5.3.8 Properties of the recombinant Rubus PKS proteins
5.3.9 Modification of the enzymatic activity of the recombinant Rubus PKS proteins
5.3.10 Induction of PKS activity in Rubus cell-suspension cultures
5.3.11 In vivo relevance of the effects of imidazole
5.3.12 Developmental regulation of the RiPKS cDNA transcripts
5.4 Discussion
5.4.1 Rubus PKS multigene family
5.4.2 Catalytic properties of the three Rubus PKS recombinant proteins
5.4.3 Evolution of the PKS gene family
5.4.4 Expression patterns

6. Summary and future directions
6.1 Characterization of phenylpropanoid gene-families in other plant species
6.2 Coordinate regulation of phenylpropanoid genes
6.3 Identification of ripening-related promoters
6.4 Characterization of Rubus PKSs
6.5 Evolution by design

Bibliography

List of Tables

Chapter 2
2.1 UV absorbance data for the various CoA derivatives used in the enzymatic assay

Chapter 3
3.1 Nucleotide sequence of the primers used for amplification of the Rubus PAL genes
3.2 Summary of putative PAL clones analyzed and sequenced from each of the primer pairs used in this study
3.3 Features of the intron found within the sequenced regions of the Rubus pal genes
3.4 Sequence similarity (%) among PAL gene family members from different species, comparing the 366 bp (122 aa) region shown in Figure 3.3
3.5 Summary of full-length Rubus PAL cDNA clones identified by screening the cDNA library
3.6 Amino acid sequence identity among PAL gene family members from different species
3.7 Features predicted for Rubus PAL proteins
3.8 Levels of specific RiPAL transcripts in different organs

Chapter 4
4.1 Nucleotide sequence of the primers used for amplification of the Rubus 4CL gene family
4.2 Summary of analysis and characterization of putative Rubus 4CL clones from each primer pair
4.3 Sequence similarity of partial Rubus 4CL genes compared to each other, and to other 4CL genes, within the 438 nucleotide region shown in Figure 4.1
4.4 Features of the predicted proteins corresponding to the three Rubus 4CL cDNAs
4.5 Amino acid sequence similarity among full-length 4CLs
4.6 Substrate utilization by E. coli-expressed Ri4CL3 recombinant protein

Chapter 5
5.1 Summary of analysis and characterization of putative Rubus PKS clones from each primer pair
5.2 Sequence similarity amongst the partial Rubus pks gene family members, comparing the 351 bp (117 aa) regions shown in Figure 5.2
5.3 Features of the predicted proteins corresponding to the three Rubus PKS cDNAs
5.4 Amino acid sequence similarity of full-length Rubus PKS cDNA
5.5 Absolute levels of specific RiPKS transcripts in different Rubus organs

List of Figures

Chapter 1
1.1 The three core carbon skeletons that are prevalent in plant secondary metabolites
1.2 Schematic diagram of the biosynthesis of the aromatic amino acid L-phenylalanine
1.3 The reactions of general phenylpropanoid metabolism
1.4 Phenylpropanoids that control some quality traits in Rubus idaeus fruits
1.5 Biosynthesis of Rubus fruit polyketides

Chapter 2
2.1 Flow chart illustrating the generation of gene-specific competitors

Chapter 3
3.1 Position of the Rubus pal clones amplified and sequenced, relative to a generic PAL gene
3.2 Fragments of the Rubus genome amplified using different PAL gene-specific degenerate primer pairs
3.3 Nucleotide sequences of the two classes of R. idaeus pal genes
3.4 Restriction maps of representative Rubus PAL cDNA clones
3.5 Comparison of the deduced amino acid sequences of the Rubus PAL genes with members of the Arabidopsis PAL gene family
3.6 A phylogenetic tree depicting the relationships amongst PALs
3.7 Specificity of the gene-specific primers designed to amplify individual members of the RiPAL gene family
3.8 Semi-quantitative RT-cPCR assay to estimate the accumulation of specific RiPAL transcripts in different organs of Rubus
3.9 Quantitation of the absolute levels of two RiPAL mRNAs in fruits at developmental stage III

Chapter 4
4.1 Positions of the 4CL gene-specific primers relative to Arabidopsis 4CL1 (GenBank, S57784)
4.2 PCR amplification of 4CL gene fragments from Rubus using 4CL gene-specific primers
4.3 Nucleotide sequence of the two partial Rubus idaeus 4CL genes
4.4 Relationship of Rubus 4CLs to members characterized from Arabidopsis thaliana and Populus spp.
4.5 Restriction maps of Rubus 4CL cDNA clones
4.6 Sequence alignments of the Ri4CL1 cDNA clone and genomic clone (gDNA) in the region of the stop codon
4.7 Comparison of the deduced amino acid sequences of the Rubus 4CL genes to each other and to the 4CL consensus
4.8 Phylogenetic relationships among plant 4CL proteins
4.9 Immunoblot analysis of recombinant Rubus 4CL3 expressed in E. coli
4.10 Specificity of the gene-specific primers used for QRT-PCR analysis
4.11 Expression of specific Ri4CL transcripts in different organs of Rubus as estimated by QRT-PCR analysis

Chapter 5
5.1 Plant PKSs that catalyze the condensation of specific starter units [labeled (a)-(f)] with one, two or three units of malonyl CoA
5.2 Positions of the PKS gene-specific primers, and of the amplified Rubus pks clones, relative to a generic plant PKS gene
5.3 Fragments of the Rubus genome amplified using different PKS gene-specific degenerate primer pairs
5.4 Nucleotide sequences of the ten partial Rubus idaeus pks genes
5.5 Restriction maps of representative Rubus PKS cDNA clones
5.6 Nucleotide sequence alignments of the three Rubus PKS cDNA clones
5.7 Alignment of the deduced amino acid sequences for the RiPKS cDNAs
5.8 Phylogenetic relationships among plant PKS proteins
5.9 Analysis of recombinant RiPKS proteins expressed in E. coli
5.10 TLC analysis of products of the assay with the three Rubus polyketide synthase recombinant proteins expressed in E. coli
5.11 Conversion of hydroxycinnamyl-CoA starter esters by RiPKS recombinant proteins
5.12 Effect of β-mercaptoethanol on the activities of recombinant RiPKS proteins
5.13 Effect of elution buffer on the activity of the Rubus polyketide synthase recombinant protein
5.14 Products from the assay of recombinant RiPKS6 in the presence of increasing concentrations of imidazole
5.15 Effect of increasing concentrations of imidazole on the gerbera CHS1 recombinant protein
5.16 Changes in the activities of CHS and BAS in raspberry cell-suspension cultures induced with yeast extract
5.17 Structures of chemicals tested for their ability to imitate the effects of imidazole on RiPKS
5.18 Effect of assay pH on the enzymatic activity of recombinant RiPKS6
5.19 Effect of different amino acids and amino acid derivatives on the enzymatic activities of recombinant RiPKS6
5.20 Effect of adding protein extracts from raspberry tissue and cell-culture to crude sample of RiPKS6
5.21 Semi-quantitative RT-cPCR analysis of the accumulation of specific RiPKS transcripts in different organs of Rubus
5.22 Quantification of the absolute levels of three RiPKS mRNAs in developmental stage III of fruits

Abbreviations

3HBL    3-hydroxybenzoate-coenzyme A ligase
4CL    4-coumarate-coenzyme A ligase
µM    micromolar
AA    amino acid
ATP    adenosine tri-phosphate
BA    benzalacetone
BAR    benzalacetone reductase
BAS    benzalacetone synthase
BSA    bovine serum albumin
C4H    cinnamate-4-hydroxylase
CAD    cinnamyl alcohol dehydrogenase
CAF    caffeate
CCR    cinnamyl-CoA reductase
CHS    chalcone synthase
CHI    chalcone isomerase
CIN    cinnamate
CoA    coenzyme A
CTAS    p-coumaroyltriacetic acid synthase
HAL    histidine ammonia-lyase
HCHS    homoeriodictyol/eriodictyol chalcone synthase
IPTG    isopropyl-β-D-thiogalactopyranoside
Kb    kilobase
kD    kilodalton
min    minute
MW    molecular weight
nkat    nanomoles of substrate converted to product per second
PQE30-4CL3    Qiaexpressionist plasmid containing the Ri4CL3 cDNA
PQE30-PKS5    Qiaexpressionist plasmid containing the RiPKS5 cDNA
PQE30-PKS6    Qiaexpressionist plasmid containing the RiPKS6 cDNA
PQE30-PKS11    Qiaexpressionist plasmid containing the RiPKS11 cDNA
pHPB    p-hydroxyphenylbutan-2-one
PAGE    polyacrylamide gel electrophoresis
PAL    phenylalanine ammonia-lyase
PCR    polymerase chain reaction
PKS    polyketide synthase
PS    pyrone synthase
RFLP    restriction fragment length polymorphism
RT-PCR    reverse-transcriptase polymerase chain reaction
RT-cPCR    reverse-transcriptase competitive polymerase chain reaction
RMIS3    Rubus idaeus Histone H3
Ripal    Rubus idaeus phenylalanine ammonia-lyase (partial genomic fragment)
Ri4cl    Rubus idaeus 4-coumarate-CoA ligase (partial genomic fragment)
Ripks    Rubus idaeus polyketide synthase (partial genomic fragment)
RiPAL    Rubus idaeus phenylalanine ammonia-lyase (full-length cDNA)
Ri4CL    Rubus idaeus 4-coumarate-CoA ligase (full-length cDNA)
RiPKS    Rubus idaeus polyketide synthase (full-length cDNA)
s    second
SDS    sodium dodecyl sulfate
SPS    styrylpyrone synthase
STS    stilbene synthase
TAE    tris-acetate
TLC    thin layer chromatography

Acknowledgements

To my supervisor and mentor, Prof. Brian Ellis: thank you very much. My words cannot compensate for your excellent training, guidance, support and kindness. I am proud and honored to be your student. I would also like to thank the members of my committee, Prof. Carl Douglas, Prof. Brian Holl and Dr. Mary Berbee, for their constant help and guidance. Special thanks to Dr. Arthur Yee and Dr. Ehlting for the technical help.
Stef, I would not be half the molecular biologist I am, had it not been for you. You remain my role model. Anil, thank you for the constant "irritating" question, 'When are you going to finish?' It annoyed the hell out of me but kept me focussed.

CHAPTER 1

Introduction

As the principal primary producers of the food chain, plants occupy an important position in sustaining life on this planet. Plants are efficient chemical factories, best manifested by their ability to capture solar energy and fix carbon to form complex metabolites and structures. Although plants have developed this unique ability to photosynthesize, they are sessile, and thus an essential feature of the evolution of higher plants has been their exceptional ability to develop new and unique metabolites to ward off threats to their integrity. For example, plants use flavonoids to protect themselves from exposure to UV light while capturing sunlight for photosynthesis. Similarly, normal lignin is a major structural component in the walls of certain specialized plant cells, while stress-induced lignin deposition provides a mechanism for sealing off sites of pathogen infection and wounding. Still other metabolites, such as monoterpenes, serve as herbivore-feeding deterrents, antifungal compounds, and attractants for pollinators (Langenheim, 1994). These observations suggest that the chemical diversity seen within plants is an important feature of their success and evolutionary development.

One feature of the biochemical virtuosity of plants that initially puzzled scientists was the observation that relatively few chemicals occurred universally throughout the plant kingdom. Consequently, the other, species-specific metabolites were labeled "secondary metabolites" and were thought to be non-essential for the day-to-day functioning of plants.
In fact, the term "secondary compounds" was first coined by Kossel (1891) to describe "components which are not found in every cell capable of developing", as compared to primary components, which are essential components of all cells. However, as the subtle and multi-faceted patterns of interaction between plants and their environment are being deciphered, it is becoming clear that these secondary products are not just a metabolic "rubbish heap". Instead, it appears that the chemical profile displayed by any particular plant at a given time and place represents an on-going refinement of the optimal chemistry necessary for the survival of the plant.

Plant biochemists have also discovered that underlying this array of plant secondary metabolites is a small number of core biosynthetic pathways that generate the precursors for the majority of the known compounds. Products of these core pathways contain a basic chemical backbone that is diagnostic for the biosynthetic origins of each compound. Thus, an isopentane carbon skeleton is the hallmark of biogenesis via the isoprenoid (terpenoid) pathway, while the presence of one or more aromatic rings suggests an origin in the shikimate/phenylpropanoid pathway, and a pattern of alternating oxygenated and reduced carbons along a linear or cyclized skeleton is usually generated by acetate/malonate condensations (Figure 1.1).

Figure 1.1 The three core carbon skeletons that are prevalent in secondary metabolites in plants. A, isopentane units form the backbone of isoprenoid metabolites; B, phenylpropanoid compounds display the basic carbon skeleton of L-phenylalanine; C, sequential condensations of acetyl units give rise to the functional carbon chain that is characteristic of polyketides.

1.1 Phenylalanine as a precursor of secondary products in plants

The aromatic amino acid L-phenylalanine is synthesized in plants via the shikimic acid pathway (reviewed in Herrmann, 1995; Herrmann and Weaver, 1999).
This pathway links the metabolism of carbohydrates to the biosynthesis of aromatic compounds. To date, this pathway has been found only in microorganisms and plants. While the bacterial shikimate pathway serves almost exclusively to synthesize the three aromatic amino acids (Herrmann, 1983), higher plants use these amino acids not only as a source of protein building blocks but, in even greater quantities, as precursors for a large number of secondary metabolites. Globally, 20% of the carbon fixed by plants flows through the shikimate pathway, which amounts to ~7 x 10^5 kg/yr (Herrmann, 1995).

The steps of the shikimate pathway leading to the production of chorismate (Figure 1.2) are identical in bacteria, eukaryotic microorganisms and plants. In contrast, beyond chorismate, the biosynthesis of phenylalanine in plants can proceed via prephenic and arogenic acid, while in fungi synthesis of phenylalanine proceeds via prephenic and phenylpyruvic acid. In bacteria either of these pathways, or sometimes both pathways, can be operative (Figure 1.2) (Bender, 1985).

All the enzymes of the shikimate pathway have been isolated from both prokaryotic and eukaryotic sources and their kinetic parameters studied in detail (reviewed in Herrmann and Weaver, 1999). The primary sequence of each of the enzymes has been obtained by the combined efforts of protein and DNA sequencing, and three-dimensional structures have been reported for DAHP synthase (Shumilin et al., 1999), shikimate kinase (Krell et al., 1997), anthranilate synthase (Knochel et al., 1999), chorismate mutase (Strater et al., 1996), and dehydroquinate synthase (Carpenter et al., 1998). Numerous studies have shown that genes encoding enzymes of the shikimate pathway are transcriptionally activated by stress conditions that also induce the production of phenylpropanoid compounds (Dyer et al., 1989; Keith et al., 1991; Gorlach et al., 1995).
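The organism-specific routes from chorismate to phenylalanine described above can be summarized in a small lookup table. The following is only an illustrative sketch: the dictionary layout, the route labels and the helper function `intermediates` are assumptions made for this example and do not come from the thesis.

```python
# Illustrative summary of the routes from chorismate to L-phenylalanine
# as described in the text. Names and layout are hypothetical.
ROUTES = {
    "arogenate": ["chorismate", "prephenate", "arogenate", "phenylalanine"],
    "phenylpyruvate": ["chorismate", "prephenate", "phenylpyruvate", "phenylalanine"],
}

# Which routes are reported for each group of organisms.
ROUTES_BY_GROUP = {
    "plants": ["arogenate"],
    "fungi": ["phenylpyruvate"],
    "bacteria": ["arogenate", "phenylpyruvate"],  # either, or sometimes both
}


def intermediates(group):
    """Return the set of intermediates between chorismate and phenylalanine."""
    found = set()
    for route in ROUTES_BY_GROUP[group]:
        # Drop the shared start (chorismate) and end (phenylalanine) points.
        found.update(ROUTES[route][1:-1])
    return found


if __name__ == "__main__":
    for group in ROUTES_BY_GROUP:
        print(group, sorted(intermediates(group)))
```

Prephenate appears in every group because both routes share it; only the step after prephenate differs.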
[Figure 1.2 diagram: phosphoenolpyruvate + erythrose 4-phosphate -> DAHP -> 3-dehydroquinate -> 3-dehydroshikimate -> shikimate -> shikimate 3-phosphate -> EPSP -> chorismate -> prephenate -> arogenate or phenylpyruvate -> phenylalanine]

Figure 1.2 Schematic diagram of the biosynthesis of the aromatic amino acid L-phenylalanine. DAHP, 3-deoxy-D-arabino-heptulosonate 7-phosphate; EPSP, 5-enolpyruvyl shikimate 3-phosphate. The enzymes are: 1, DAHP synthase; 2, 3-dehydroquinate synthase; 3, 3-dehydroquinate dehydratase; 4, shikimate dehydrogenase; 5, shikimate kinase; 6, EPSP synthase; 7, chorismate synthase; 8, chorismate mutase; 9, arogenate dehydrogenase; 10, prephenate dehydratase; 11, arogenate dehydratase; 12, phenylpyruvate aminotransferase.

1.2 Phenylpropanoid metabolism in plants

The further metabolism of phenylalanine formed via the shikimic acid pathway leads to the synthesis of a vast collection of compounds with diverse functions. The production of most phenylpropanoid compounds from phenylalanine requires a minimum of three steps that are controlled by the enzymes phenylalanine ammonia-lyase (PAL), cinnamate 4-hydroxylase (C4H), and 4-coumarate CoA ligase (4CL). These three steps are often referred to as "general phenylpropanoid metabolism" (Figure 1.3). PAL catalyzes the deamination of L-phenylalanine to form trans-cinnamic acid and NH4+. Cinnamic acid is hydroxylated by C4H to yield 4-coumaric acid. Further activities of hydroxylases and O-methyltransferases yield derivatives of 4-coumaric acid (e.g. ferulate, caffeate, sinapate), while the final enzyme of the general phenylpropanoid pathway, 4CL, catalyzes the formation of activated CoA esters of these hydroxycinnamic acids (4-coumaric acid or its methoxy derivatives). These CoA esters serve as substrates for specific branch pathways. Among the best studied of these branch pathways are those leading to the synthesis of flavonoids (reviewed in Dixon and Steele, 1999) and lignin (reviewed in Douglas, 1996).
The entry point enzymes of the flavonoid pathway are chalcone synthase (CHS) and chalcone isomerase (CHI), whereas those for the lignin pathway are cinnamyl CoA reductase (CCR) and cinnamyl alcohol dehydrogenase (CAD). Other prominent phenylpropanoid product classes include the hydroxycinnamyl esters and the hydroxybenzoic acids.

While the phenylpropanoid pathway has often been categorized as a secondary metabolic pathway, recent studies have shown that many metabolites derived from this pathway play a role in the growth and development of plants, and some may play essential roles. The two quantitatively dominant phenylpropanoids, lignins and flavonoids, serve a variety of functions in plants. Lignin is located in the cell walls of conducting and supporting tissues such as vascular elements and phloem fibers, where it provides hydrophobicity and mechanical strength. It is also utilized by plants as an inducible physical barrier against pathogen invasion (Vance et al., 1980). Flavonoid natural products have long been known to function as floral pigments for the attraction of insect pollinators, as signal molecules for beneficial microorganisms in the rhizosphere, and as antimicrobial defense compounds (reviewed in Dixon and Steele, 1999). Recent studies also indicate that flavonols may be required for male fertility and, more specifically, for pollen tube growth in maize, petunia and tobacco, although they are not essential in Arabidopsis (Weisshaar and Jenkins, 1998). In addition, new functions for flavonoid compounds continue to be found, particularly in plant-microorganism signaling. It has been suggested that the occupation of new ecological niches by plants was due to the co-evolution of flavonoids (Jorgensen, 1994) and possibly lignins. Other phenylpropanoid derivatives also serve important functions in various plants.
Evaluation of transgenic tobacco plants impaired in their ability to biosynthesize phenolic acids (and monolignols) has provided preliminary evidence that phenolic derivatives of hydroxycinnamic acids and monolignols play a significant role in the development of leaf palisade cells, and also appear to provide compounds that are important in controlling the process of senescence (Tamagnone et al., 1998). In addition to these influences on growth and development, phenylpropanoids also play an important role in plant defense against various pathogens. Salicylic acid (SA), derived from cinnamic acid and produced during disease resistance responses, is thought to be a signal molecule in the induction of systemic acquired resistance (Malamy and Klessig, 1992). Recent analyses of a gain-of-function Arabidopsis acd6 mutant suggest that SA may have a broad influence on plant defenses, cell death and cell growth (Rate et al., 1999). Taken together, these results indicate that the traditional view of this pathway is inaccurate and that phenylpropanoid metabolism is vital for the well-being of plants.

[Figure 1.3 diagram: General Phenylpropanoid Pathway: phenylalanine -> cinnamic acid -> 4-coumaric acid -> 4-coumaryl CoA]

Figure 1.3 The reactions of general phenylpropanoid metabolism. PAL, phenylalanine ammonia-lyase; C4H, cinnamate-4-hydroxylase; 4CL, 4-coumarate CoA ligase.

1.3 Gene(s) encoding PAL, 4CL and PKS enzymes of phenylpropanoid metabolism

The evidence available thus far suggests that most enzymes of phenylpropanoid metabolism are encoded by families of homologous genes. Certain families, such as those encoding PAL, are relatively well characterized in some plant species (Logemann et al., 1995; Wanner et al., 1995). Following the isolation of PAL cDNA from bean (Cramer et al., 1989), parsley (Lois et al., 1989) and sweet potato (Tanaka et al., 1989), PAL genes have been isolated from many sources.
PAL is encoded by a small gene family in plants such as bean, parsley, and poplar (Cramer et al., 1989; Lois et al., 1989; Subramanium et al., 1992). Joos and Hahlbrock (1992) suggest that in potato there are as many as 50 PAL genes. In the yeast R. toruloides, PAL is encoded by a single gene (Anson et al., 1987). A nucleotide search of GenBank in November 1999 returned more than 150 entries identified as PAL genes. In general, the sizes of PAL genes have been reported to range from 2.1 kb to 2.4 kb. While PAL genes from plants harbor one or two introns, fungal PAL has five (Anson et al., 1986) to six introns (Valslet et al., 1988). Exceptionally, however, no introns have been found in the pine PAL genes (Butland et al., 1998).

The enzyme 4CL can utilize both 4-coumarate and its methoxylated derivatives as substrates (Figure 1.3). While physically distinct 4CL isoforms have been characterized from soybean, Petunia, pea, poplar, and carrot (reviewed in Chapter 4), more detailed knowledge about 4CL isoforms derives from the cloning of 4CL genes during the 1980s and 1990s. Characterization of 4CL genes from a variety of plant species has revealed interesting gene family differences. The enzyme can be encoded by very similar duplicated genes, as in parsley, potato, tobacco and loblolly pine, or by divergent multiple genes, as in Arabidopsis and soybean (reviewed in Chapter 4). Poplar 4CL genes represent an interesting example in which two duplicated 4CL genes show very similar patterns of expression and substrate utilization (Allina et al., 1998), whereas other duplicated 4CL genes in this species show distinctive substrate utilization profiles and expression patterns (Cukovic, 1999). It is possible that the divergent 4CL genes in plants encode 4CL enzymes with different enzymatic properties, but no concrete molecular or genetic evidence exists that demonstrates a specific association between a 4CL gene and a 4CL isoform.
Chalcone synthase (CHS) was the first plant natural product polyketide synthase (PKS) to be characterized at the molecular level. It catalyzes the extension of 4-coumaryl CoA to yield naringenin chalcone, a precursor for the major classes of plant flavonoids (Figure 1.5). Alternative folding pathways for the polyketide extension can lead instead to the formation of the stilbene resveratrol. Cloning of stilbene synthase (STS), the first variation from a CHS-type plant-specific PKS, revealed considerable homology to CHS at the primary sequence level. It is now apparent that CHS is simply one member of a family of closely related PKSs that can utilize different starter molecules and perform different numbers of polyketide condensations to yield a variety of interesting metabolites such as acridones, styrylpyrones, and benzophenones (reviewed in Chapter 5). It is remarkable that, despite the pronounced differences in their catalytic functions, the primary sequences of all these PKSs share considerable homology. For example, a family of pyrone synthase (PS) genes from Gerbera shares about 74% amino acid sequence identity with true Gerbera CHS (Helariutta et al., 1995). However, modeling of PS, based on the co-ordinates of the complete crystal structure of alfalfa CHS solved at the 1.8 Å level (Ferrer et al., 1999), suggests that the high homology in the primary sequence still allows for variations in the tertiary structure of PS.

The biological significance of the phenylpropanoid multigene families remains unclear. It has been suggested that specific genes in each family may encode different subunit isoforms, as appears to be the case for P. vulgaris PAL2 and PAL3 (Liang et al., 1989). Similarly, in soybean the two 4CL genes isolated have been proposed to be linked to the two specific isoforms characterized (Knobloch and Hahlbrock, 1975; Uhlmann and Ebel, 1985).
In an extension of this concept, it has been proposed that specific isoenzymes encoded by the gene-family members may associate preferentially with specific multienzyme complexes to control the flux of metabolites through the different branches of the phenylpropanoid pathway. Preliminary evidence for such a model has been reported by Rasmussen and Dixon (1999). Alternatively, gene-family members may simply contribute to an overall increase in the level of expression of the gene in question, or they may be specialized for expression at different times, in different cell-types, or in different sub-cellular locations (Butland et al., 1998). In any case, to gain an understanding of the regulation of phenylpropanoid metabolism, it is crucial to assess the diversity and functionality within each multigene family involved.

Multigene families are also found for other enzymes in plants. Complete sequence analysis of chromosome 4 of Arabidopsis reveals that 12% of the encoded genes identified on the basis of significant similarities to known genes are present in multiple copies (Mayer et al., 1999). These gene copies often appear as clusters on the same DNA strand. For example, a family of 15 receptor kinase-like protein genes that are over 95% identical at the sequence level forms one contiguous cluster. Clustering within some phenylpropanoid gene families has also been reported, for PAL genes from Trifolium (Howles et al., 1994) and CHS genes from soybean (Akada and Dube, 1995).

1.4 Biosynthesis of Rubus phenylpropanoids

While the characteristic aroma of raspberry fruit is due to the interactions of many compounds including aliphatic and aromatic hydrocarbons, aldehydes, ketones, alcohols and esters (Robertson et al., 1995), the polyketide derivative 4-(4-hydroxyphenyl)-butan-2-one (pHPB) is the major determinant of raspberry flavour (Borejsza-Wysocki et al., 1994 and references therein).
Various other phenylpropanoid derivatives are also important for the quality of the fruits, notably flavonoid derivatives such as anthocyanins, the levels and types of which are primarily responsible for fruit color (Figure 1.4) (Goiffon et al., 1991). In addition, plant disease resistance often involves lignification (for cell-wall reinforcement) and formation of phenolic phytoalexins, which are all products of the phenylpropanoid pathway (reviewed in Dixon and Paiva, 1995). While phenylpropanoids such as lignin and phenolic phytoalexins have not been specifically isolated from Rubus fruits, it is likely that such phenolics are also biosynthesized in that tissue. Despite the importance of phenylpropanoid metabolic products to raspberry quality, however, little is known about the organization and properties of this pathway at either the biochemical or genetic level in Rubus.

Elucidation of the biosynthesis of the novel Rubus polyketide, the raspberry "ketone", demonstrated that its formation in Rubus involves an enzyme whose properties are reminiscent of the well-known plant-specific polyketide synthase, CHS. In the first step, malonyl-CoA is condensed with p-coumaryl CoA to form the polyketide intermediate p-hydroxybenzalacetone (p-hydroxyphenylbut-3-ene-2-one) (BA). This reaction is catalyzed by the enzyme benzalacetone synthase (BAS). In the second step, p-hydroxybenzalacetone is reduced to p-hydroxyphenylbutan-2-one (pHPB), the "raspberry ketone", by the enzyme benzalacetone reductase (BAR), which uses NADPH as the proton donor (Borejsza-Wysocki and Hrazdina, 1994). The first enzyme of this pathway, BAS, is of particular interest since its catalytic capacity is similar to that of a number of other plant polyketide synthases such as CHS, STS and ACS (Dixon and Steele, 1999). However, the biosynthesis of BA involves a single cycle of malonyl-CoA condensation followed by a decarboxylation that results in chain shortening.
Analysis of partially purified BAS revealed values for the Mr, pH, and Km of the enzyme that were similar to those for CHS (Borejsza-Wysocki and Hrazdina, 1996). However, the partially purified BAS differed from CHS in its substrate utilization profile and in its sensitivity to externally added β-mercaptoethanol, ethylene glycol and glutathione (Borejsza-Wysocki and Hrazdina, 1996). Rubus CHS uses p-coumaryl CoA exclusively as its starter unit substrate, whereas BAS utilized ferulyl CoA three times more efficiently than p-coumaryl CoA. Ethylene glycol, β-mercaptoethanol and glutathione all markedly increased the activity of BAS but inhibited the activity of CHS (Borejsza-Wysocki and Hrazdina, 1996). Further evidence that BAS is a different enzyme from CHS came from the behavior of the two PKSs in stressed raspberry cell-cultures. Yeast extract increased the BAS activity in the cultured cells 5-fold, while leaving the CHS activity at basal levels (Borejsza-Wysocki and Hrazdina, 1996). Similar differential responses between two PKSs in a single species have also been observed for STS and CHS activities in elicited peanut cell-cultures (Rolf et al., 1987). Thus, while BAS has been partially characterized as an enzyme, a more detailed knowledge of the corresponding gene sequence would provide important insights not only into the reaction mechanisms of this class of aromatic polyketide synthases, but also into their evolutionary relationships to other PKSs.

Figure 1.4 Phenylpropanoids that control some quality traits in Rubus idaeus fruits. Dashed arrows indicate a branch pathway emanating from the general phenylpropanoid pathway.

Figure 1.5 Biosynthesis of Rubus fruit polyketides.
The primary flavour (2) and pigment molecules (4) derive from the phenylpropanoid pathway, in branch pathways requiring the enzymes BAS and CHS, respectively. CHS, chalcone synthase; BAS, benzalacetone synthase; BAR, benzalacetone reductase; 1, p-hydroxyphenylbut-3-ene-2-one; 2, p-hydroxyphenylbutan-2-one; 3, naringenin chalcone; 4, anthocyanins.

1.5 Project objectives

While most genes encoding the core phenylpropanoid pathway have been isolated from various plant species, and the gene family members have been studied in the context of developmental programs such as lignification and in response to environmental stresses, the behavior of phenylpropanoid gene families during fruit ripening remains to be explored. Rubus was chosen as the starting material because the quality of Rubus fruits depends on the activities of multiple phenylpropanoid branch pathways. Thus, Rubus tissues offer an excellent opportunity to explore the interplay between different phenylpropanoid branch pathways. A good understanding of the molecular biology of this complex biosynthetic system could also create opportunities for improving the quality traits of Rubus fruits through genetic engineering.

The specific aims of my research were to:
1. Characterize the PAL, 4CL and PKS gene families of Rubus idaeus.
2. Characterize fruit ripening-related genes within each of the three gene families.
3. Analyze the evolutionary dynamics of the three gene families.
4. Determine whether the Rubus PKS gene family includes a gene whose encoded protein has the features of a BAS.

CHAPTER 2 Materials and Methods

This chapter includes common materials and methods that have been used during the course of this study. Variations from protocol or methods unique to a particular analysis are described within each relevant chapter.

2.1 Plant growth-conditions and materials

Raspberry (Rubus idaeus L.
cv Meeker) plants were grown in the experimental plots of the Agriculture and Agri-Food Canada Research Station at Abbotsford, BC or in the greenhouse (Faculty of Agricultural Sciences, UBC) under ambient conditions. All harvested plant tissues were immediately frozen in liquid nitrogen and stored at -80°C. Raspberry leaf, shoot and root tissues were collected from mature greenhouse-grown plants. Flowers and fruits at different developmental stages were collected from plants grown in the experimental field plots of the Agriculture and Agri-Food Canada Research Station at Abbotsford, BC. Flowers were collected at three developmental stages. Flowers I consisted of closed inflorescence buds, Flowers II consisted of fully open flowers and Flowers III consisted of fertilized flowers. Fruits were collected at five different developmental stages. Fruits I were green, hard and still undergoing cell-expansion; Fruits II were still green but had almost reached mature size; Fruits III were yellow, starting to "blush" and the size was that of the mature berry; Fruits IV were fully ripe with the color and aroma fully developed; Fruits V were slightly overripe and somewhat dehydrated.

2.2 Establishment, maintenance and elicitation of raspberry cell cultures

Raspberry cell suspension cultures (Rubus idaeus L. cv Royalty) were established from soft calli provided by Prof G. Hrazdina, Cornell University (Borejsza-Wysocki et al., 1994). Suspension cultures were maintained on a gyratory shaker (100 rpm) at 25°C in the dark. Cells were propagated bi-weekly in full-strength Anderson's basal medium supplemented with 2,4-dichlorophenoxyacetic acid (9 µM), indole-3-butyric acid (4.9 µM), 6-(γ,γ-dimethylallylamino)purine (4.9 µM), myo-inositol (100 mg L⁻¹), nicotinic acid (0.5 mg L⁻¹), pyridoxine hydrochloride (0.5 mg L⁻¹), thiamine hydrochloride (0.1 mg L⁻¹), and sucrose (30 g L⁻¹).
Seven-day-old cultures were elicited with Bacto-yeast extract (Difco) at a final concentration of 3 g L⁻¹. Following elicitation, the cells were vacuum filtered through Whatman #1 filter paper, flash frozen in liquid nitrogen and stored at -80°C until further analysis.

2.3 Gene amplification and characterization

Rubus idaeus genomic DNA was isolated from variety 'Meeker' using the method described by Doyle and Doyle (1990). Amplification reactions contained 100 ng genomic DNA, 1x Appligene buffer (providing a final concentration of 1.5 mM MgCl2), 0.5 µM each dNTP, 2 µM each primer and 2.5 U Taq DNA polymerase in a final volume of 50 µl. Reaction mixtures were incubated at 95°C for 10 min, subjected to 35 cycles of amplification (95°C for 50 s, 55°C for 50 s, 72°C for 1 min), and completed by a final 10 min extension at 72°C in a Techne PHC-3 thermal cycler (Mandel Scientific). PCR reactions were analyzed by 1% TAE-agarose gel electrophoresis. Amplified fragments were purified from agarose gel slices using the GENECLEAN kit. After overnight digestion with EcoRI and XbaI, 100 ng of each amplified product was ligated with 100 ng EcoRI/XbaI-digested pUC19 in a 20 µl reaction volume using T4 DNA Ligase (Life Technologies). An aliquot of the ligation reaction (10 µl) was used to transform competent E. coli DH5α (100 µl). A series of transformants derived from each PCR amplification product was further analyzed as described within each chapter.

2.4 Sequencing and sequence analysis of PCR products

Plasmid DNA from selected clones was isolated for sequencing following a mini-alkaline lysis/PEG precipitation procedure (Ausubel et al., 1995). Both strands of the inserts were sequenced with the M13 universal primers and/or synthetic oligonucleotide primers as needed to extend the sequence.
Sequencing reactions were carried out by the University of British Columbia Nucleic Acid-Protein Service Unit using the PRISM Ready Reaction DyeDeoxy Terminator Cycle Sequencing kit (Applied Biosystems). All sequences were edited and analyzed using the PC/GENE Software (Intelligenetics). Database searches for sequence homology and comparisons were performed using various web-based analytical tools compiled at the web-site http://www.sdsc.edu/ResTools/.

2.5 Construction and screening of the cDNA library

2.5.1 Extraction of RNA from partially ripe fruits of Rubus idaeus

All precautions to prevent RNase contamination were taken as described by Sambrook et al. (1989). Total RNA was isolated from fruits (stage III) of raspberry using the RNeasy Maxi Kit (Qiagen) following the manufacturer's protocol with the following modifications. Fruit tissue (10 g) was ground in liquid nitrogen and resuspended in 20 ml buffer RLC (Qiagen). The cell lysate was homogenized for 45 s at high speed using a Polytron homogenizer (Brinkmann). The sample was applied to the RNeasy maxi spin column and the RNA bound to the matrix was eluted with DEPC-treated water. The concentration and purity of RNA were estimated by absorbance at 260 nm and 280 nm. The integrity of the RNA was examined by electrophoresis in 1% TAE-agarose or denaturing formaldehyde gels. Poly(A)+ RNA was isolated from 1.5 mg total RNA using Dynabeads Oligo (dT)25 (Dynal) following the manufacturer's instructions. No degradation of the mRNA was observed, as verified by northern blot analysis.

2.5.2 Synthesis of Uni-ZAP XR cDNA library

cDNA was synthesized from 5 µg poly(A)+ RNA using the ZAP-cDNA synthesis kit (Stratagene). The efficiency and the size range of the first and the second strand synthesis were evaluated by incorporating [α-32P]dCTP during synthesis, and autoradiography of samples run in an alkaline agarose gel.
After blunt-ending the cDNA termini with Pfu DNA polymerase, EcoRI adapters were ligated to the ends. The modified cDNAs were digested with XhoI and size-fractionated using Sephacryl S-400 spin columns. The size of cDNAs in each fraction was evaluated by electrophoresis and autoradiography. The DNA content of the fractions containing cDNAs >500 bp was quantified by ethidium bromide plate assay, and 100 ng of the cDNA was ligated into Uni-ZAP XR vector arms following the manufacturer's instructions (Stratagene). The recombinant λ-phage was packaged using Gigapack III Gold packaging extract (Stratagene). The library was titred using the XL1-Blue MRF' strain, plated in LB top agar supplemented with IPTG and X-gal. The Uni-ZAP XR cDNA library, consisting of ~10⁷ independent clones, was amplified once to obtain a large and stable quantity of high-titre stock.

2.5.3 DNA screening of the cDNA library

Approximately 500,000 plaques of the amplified cDNA library were blotted in duplicate on Hybond N+ nylon membrane (Amersham). The library was screened with homologous probes for each gene family. The probes were radiolabeled to a high specific activity with [α-32P]dATP using a Random Primer Labeling kit (Life Technologies). The nylon membranes were hybridized at 50°C with the probe for 16 h in hybridization buffer consisting of 6x SSC buffer, 20 mM NaH2PO4, 0.1% (v/v) SDS, 5x Denhardt's reagent, and 500 µg/ml denatured salmon sperm DNA. The membranes were washed three times at 50°C with 6x SSC buffer, 0.1% (v/v) SDS for 20 min each. After the tertiary screen, inserts in positive clones were amplified using the vector-specific primer T3 and antisense gene-specific degenerate PCR primer(s) to confirm the identity of the gene and to estimate the insert length. Representative plaques of each cDNA class were converted to pBluescript SK- phagemids via in vivo excision using the helper phage strain ExAssist (Stratagene).
2.5.4 Restriction map-analysis, and sequence analysis of the cDNAs

Only putative full-length clones were classified into groups based on the fragment sizes generated by digestion with different restriction endonucleases. Plasmids from each class were isolated using a commercial plasmid purification kit (Qiagen). Sequence accuracy was confirmed by the complementarity between the sense and the antisense strands of the DNA sequences and by translation of the sequences into amino acid sequences. Similar sequences in the databases were identified using BLAST software (Altschul et al., 1990). Nucleotide and amino acid sequence analysis was performed using various web-based analytical tools (http://www.sdsc.edu/ResTools/).

2.6 Southern and northern blot analysis

Genomic DNA (10 µg) was digested with the appropriate enzymes, resolved in a 0.8% TAE-agarose gel, and then blotted onto Zeta-Probe nylon membrane (BioRad). Blots were hybridized overnight at 50°C in a solution consisting of 6x SSC, 0.5% (v/v) SDS, 5x Denhardt's reagent and 0.1 mg/ml denatured salmon sperm DNA. Southern blots were washed three times (20 min each) at moderate stringency (2x SSC, 0.1% (v/v) SDS, 50°C).

The RNeasy Maxi Kit (Qiagen) was used for total RNA isolation. RNA bound to the silica-gel-based matrix was eluted with DEPC-treated water and quantified by absorbance at 260 nm. Total RNA (10 µg) was resolved in a 1% agarose gel containing 2.2 M formaldehyde, and blotted onto Zeta-Probe nylon membrane (BioRad). Membranes were hybridized overnight at 42°C in 0.5 M Na2HPO4 (pH 7.2) and 7% SDS with a probe labeled with [α-32P]dATP. Membranes were washed at low stringency (6x SSC, 0.1% SDS, 50°C; three washes of 15 min each). Membranes were stripped by washing in hot 0.2x SSC, 0.5% (v/v) SDS buffer and rehybridized with a Rubus Histone H3 DNA probe to check for equal loading.

2.7 Recombinant proteins

2.7.1 cDNA cloning into E.
coli expression vectors

The plant cDNA was cloned into expression vectors pQE30 or pQE50 (Qiagen) to generate recombinant proteins with or without the His6-tag fusion at the N-terminus, respectively. To facilitate cloning into the expression plasmids, the open reading frame of the cDNA was amplified with Pfu DNA polymerase (Stratagene) using primers that introduced unique restriction endonuclease sites 5' to the start codon and 3' to the stop codon. PCR conditions used for amplification of the cDNA were 95°C for 5 min, 35 cycles of 94°C for 50 s, 55°C for 2 min, 72°C for 1 min, followed by a hold at 72°C for 10 min in a PTC 100 thermal cycler (MJ Scientific). The PCR product was resolved in a 1% TAE-agarose gel and the fragment of the appropriate size was purified from the gel using the GENECLEAN kit. The fragment and the vector were digested with the appropriate restriction endonucleases and the vector-insert ligation was carried out using T4 DNA Ligase (Life Technologies). Initial propagation of the manipulated clones was carried out in XL-1 Blue strains of E. coli. The integrity of the reading frame and the nucleotide sequences was checked by fully sequencing the new constructs.

2.7.2 Recombinant protein induction and extraction

To produce samples for enzymatic assays, plasmids containing the cDNA of interest were transformed into bacterial strain RM82, which contained an additional plasmid, pBUS25, carrying the argU (dnaY) gene (Brinkmann et al., 1989). Expression vectors without any insert, transformed into RM82 [pBUS250] cells, served as the negative control. Recombinant protein was obtained by inducing 50 ml logarithmic-phase cultures (grown at 37°C) with 1.0 mM IPTG for 4 h. The cultures were then centrifuged and the bacterial pellets resuspended in 5 ml of 200 mM Tris (pH 7.5) containing 1.5 mM β-mercaptoethanol for 4-coumarate:CoA ligase recombinant proteins, or in 5 ml of 50 mM HEPES buffer (pH 7.5) for polyketide synthase recombinant proteins.
Cells were disrupted in a French press at 1100 psi, and the extract was clarified by centrifugation at 10,000 x g for 2 min. The resulting supernatant containing the soluble protein was immediately used for further analysis.

2.8 Preparation of protein extracts from plant tissues and cell-cultures

Rubus plant or cell culture tissue was homogenized in liquid N2. The frozen powder (100 mg) was resuspended in 1 ml buffer [100 mM Tris (pH 7.5), 14 mM β-mercaptoethanol] containing 100 mg Dowex AG1-X2 (Bio-Rad). The slurry was slowly rotated at 4°C for 20 min, filtered through one thickness of Miracloth (Calbiochem), and centrifuged to remove the resin and debris. The final supernatant was used for further analysis. Protein extracts used for polyketide synthase assays were prepared in essentially the same way, except that β-mercaptoethanol was omitted from the extraction buffer.

2.9 Protein analysis

Protein extracts were fractionated using SDS-polyacrylamide gel electrophoresis (SDS-PAGE; 12.5% polyacrylamide) with the discontinuous system of Laemmli (1970). Following electrophoresis, polypeptides were stained with Coomassie Blue R250 for direct visualization or were blotted onto Immobilon-P (Millipore) PVDF membrane for immunodetection. Membranes were blocked for 1 h in TBST containing 5% skim milk powder and incubated with primary antibodies for an additional 1 h. Antigen-antibody complexes were detected using a 1:5000 dilution (in blocking solution) of a goat anti-rabbit IgG-alkaline phosphatase secondary antibody (Sigma). Membranes were washed three times for 10 min each in blocking solution and equilibrated for 5 min in 100 mM NaCl, 5 mM MgCl2, 100 mM Tris-Cl pH 9.5, before addition of 5-bromo-4-chloro-3-indolyl phosphate (33 µl) and nitroblue tetrazolium (44 µl) (Life Technologies). The color reaction was stopped by immersing the blots in double-distilled water.
Protein concentrations were determined using the Bradford Assay Kit (Bio-Rad) with bovine serum albumin as a standard.

2.10 Enzymatic assays

2.10.1 Assay for 4-coumarate:CoA ligase

4CL activity was measured using the spectrophotometric assay described by Knobloch and Hahlbrock (1975). The standard assay buffer contained 5 mM ATP, 5 mM MgCl2, 0.33 mM Coenzyme A, and 0.2 mM cinnamic acid or benzoic acid derivative. Incubations were started by addition of CoA and carried out at 25°C. The formation of the CoA esters was determined spectrophotometrically by monitoring the change in absorbance relative to a control (no Coenzyme A) incubation, at wavelengths appropriate for cinnamyl, hydroxycinnamyl, benzoyl and hydroxybenzoyl-Coenzyme A derivatives (Stockigt and Zenk, 1975; Webster et al., 1974) (Table 2.1).

2.10.2 Enzymatic synthesis of hydroxycinnamyl Coenzyme A derivatives

Cinnamyl CoA and other hydroxycinnamyl CoA derivatives were enzymatically synthesized using poplar 4CL6 recombinant protein expressed in baculovirus-infected insect cells (Cukovic, 1999). Hydroxycinnamic acid derivatives (0.2 mM) were incubated with recombinant protein (200 µl) in a final volume of 1 ml. Co-factors were added at the same concentrations as used for the typical 4CL assay described above. The reaction was allowed to proceed for two hours and was monitored at regular intervals at the wavelengths indicated for the various CoA derivatives (Table 2.1). At the end of the incubation, the mixture was loaded on a pre-equilibrated reverse-phase C18 column (Waters, bed volume 1 ml). Before applying the reaction mixture, the column was washed with three column volumes of methanol and equilibrated with three column volumes of 0.2 M MOPS, pH 7.5. After sample application, the column was washed with 0.2 M MOPS, pH 7.5 until the effluent had no absorbance. The aromatic CoA derivatives were then eluted with 30 ml water and immediately lyophilized.
The freeze-dried powder was dissolved in 1 ml water and stored at -80°C until further use. The authenticity and concentration of the synthesized product were determined by UV-spectral scan of the product and absorbance measurements at the appropriate wavelength for each CoA derivative (Table 2.1).

CoA derivative       Wavelength (nm)   Extinction coefficient (L mol⁻¹ cm⁻¹)
Cinnamyl             311               22,000
p-Coumaryl           333               21,000
Caffeyl              346               18,000
Ferulyl              345               19,000
Sinapyl              352               19,000
Benzoyl              261               21,100
o-Hydroxybenzoyl     262               21,300
m-Hydroxybenzoyl     261               21,300
p-Hydroxybenzoyl     262               21,400

Table 2.1 UV absorbance data for the various CoA derivatives used in the enzymatic assay.

2.10.3 Assay for plant polyketide synthase activity

PKS activity was determined by following the formation of radiolabeled end-product during incubation of the enzyme with radiolabeled [2-14C]malonyl Coenzyme A (Amersham). Malonyl CoA (1 nmol) and p-coumaroyl CoA (1 nmol) were incubated with 85 µl protein extract at 30°C for 30 min in a final assay volume of 100 µl. The reaction was terminated by addition of glacial acetic acid (20 µl) and the assay mixture was spiked with authentic samples of benzalacetone (20 nmol) and naringenin (80 nmol), the expected end-products. The reaction was extracted with 500 µl ethyl acetate, and the organic phase was evaporated to dryness and re-dissolved in 50 µl ethanol. The entire product was applied to the origin of a plastic-backed polyamide TLC plate (Macherey-Nagel, Germany) and developed using methanol:acetic acid:water (45:1:1) as the mobile phase. Products were visualized by autoradiography, and samples having the same Rf as that of the authentic standard were scraped from the plate, eluted in methanol, and analyzed by liquid scintillation counting.
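The concentration estimate for the synthesized CoA esters (section 2.10.2, Table 2.1) rests on the Beer-Lambert relation, c = A / (ε l). The following is a minimal Python sketch of that arithmetic, not the procedure actually used in this study: the 1 cm path length, the function name and the example absorbance reading are assumptions introduced for illustration; only the wavelengths and extinction coefficients come from Table 2.1.

```python
# Wavelength (nm) and extinction coefficient (L mol^-1 cm^-1) from Table 2.1.
COA_DERIVATIVES = {
    "cinnamyl": (311, 22_000),
    "p-coumaryl": (333, 21_000),
    "caffeyl": (346, 18_000),
    "ferulyl": (345, 19_000),
    "sinapyl": (352, 19_000),
}


def coa_ester_concentration_mM(name, absorbance, path_cm=1.0, dilution=1.0):
    """Beer-Lambert estimate, c = A / (epsilon * l), returned in mM.

    path_cm defaults to 1 cm (an assumption; the cuvette is not specified
    in the text). dilution is the fold-dilution of the measured sample.
    """
    _wavelength_nm, epsilon = COA_DERIVATIVES[name]
    molar = absorbance / (epsilon * path_cm)
    return molar * dilution * 1000.0


# Hypothetical reading: A333 = 0.42 for an undiluted p-coumaryl CoA sample.
print(coa_ester_concentration_mM("p-coumaryl", 0.42))
```

The same function can be used to follow a 4CL assay over time by converting the change in absorbance, rather than a single reading, into the amount of CoA ester formed.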
2.11 Analysis of gene expression by RT-PCR

The level of expression of genes in tissues of raspberry was quantitated by amplification of the cDNA, using a PCR MIMIC strategy for competitive PCR (Siebert and Larrick, 1993).

2.11.1 Extraction of total RNA from Rubus tissues

Total RNA from various Rubus tissues was extracted using the RNeasy Plant Mini kit (Qiagen). Total RNA was treated with RNase-free DNase I (Qiagen), as per the manufacturer's instructions. The concentration of RNA was determined based on absorbance at 260 nm.

2.11.2 Synthesis of cDNA

A constant amount of total RNA (100 ng) from various Rubus tissues was reverse-transcribed into cDNA using 4 U Omniscript Reverse Transcriptase (Qiagen) in a reaction volume (20 µl) containing 1x RT buffer (Qiagen), 0.5 mM each dNTP, 1 µM oligo-dT primer and 10 units RNase inhibitor (Pharmacia). The reaction mixture was incubated at 37°C for 1 h, followed by a 5 min incubation at 95°C to inactivate the reverse transcriptase.

2.11.3 Generation of PCR competitor

An internal standard DNA fragment for each gene family member was constructed as described in Figure 2.1. A non-homologous DNA was amplified using composite primers, designed to incorporate each gene-specific primer target sequence at the ends of the non-homologous DNA. Composite primers were ~35 nucleotides each, in which the 18 nucleotides at the 5'-end are part of the specific primers for each gene family member, followed by ~17 nucleotides complementary to the spruce coniferin β-glucosidase gene. These composite primers were used in the first PCR amplification to generate a "mimic" (competitor) that has the same primer recognition sites as the target of the gene-specific primers. A dilution of the first PCR product was then amplified using only the gene-specific primers to ensure that all fragments had the complete gene-specific primer sequence. This fragment (competitor) was then purified from a 1% TAE-agarose gel and quantified against known amounts of λ-DNA.
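Quantification against λ-DNA yields a mass concentration, whereas the competitor stock is expressed in molar terms (amol/µl). The conversion depends on the fragment length. A minimal Python sketch follows, assuming an average mass of 660 g/mol per base pair of double-stranded DNA (a common approximation that is not stated in this chapter); the fragment length and measured concentration in the example are hypothetical.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA; assumed approximation


def dsdna_amol_per_ul(ng_per_ul, length_bp):
    """Convert a dsDNA concentration from ng/ul to amol/ul."""
    mol_per_ul = (ng_per_ul * 1e-9) / (length_bp * AVG_BP_MASS)
    return mol_per_ul * 1e18


def dilution_factor(stock_amol_per_ul, target_amol_per_ul=200.0):
    """Fold-dilution needed to bring a stock down to the target molarity."""
    return stock_amol_per_ul / target_amol_per_ul


# Hypothetical competitor: 400 bp fragment measured at 10 ng/ul.
stock = dsdna_amol_per_ul(ng_per_ul=10.0, length_bp=400)
print(round(stock), "amol/ul; dilute", round(dilution_factor(stock), 1), "fold")
```

The default target of 200 amol/µl matches the stock concentration used for the competitive PCR experiments in this chapter.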
The competitor was initially diluted to a 200 amol/µl stock, and then used in the competitive PCR experiments described below.

2.11.4 Expression of PAL, 4CL and PKS gene family members in various Rubus organs

Each sample of cDNA synthesized from different Rubus tissues was amplified with Taq DNA polymerase (Qiagen) along with a constant amount of gene-specific competitor. Details of the amplification conditions and the dilution of the competitor used are discussed in each chapter. The cDNA synthesized from each tissue was amplified with primers specific for the Rubus Histone H3 gene to ensure that the same amounts of RNA from each sample were being used as the starting template for the cDNA synthesis (assuming that the efficiency of cDNA synthesis was equal for each sample).

[Figure 2.1 flow chart: composite primers + non-homologous DNA fragment -> 1° PCR with composite primers -> 2° PCR with gene-specific primers -> purify the fragments -> pure PCR competitor (with gene-specific primer binding sequences)]

Figure 2.1 Flow chart illustrating the generation of gene-specific competitors. The composite primers were composed of two units; the 3'-portion anneals to the non-homologous DNA fragment and the 5'-portion anneals to a specific target gene.

The amplified fragments were subjected to electrophoresis on a 3% TAE-agarose gel and the intensity of the EtBr-stained amplified products was quantified using the software Scion Image (Scion Corporation). The values of the intensities for the amounts of products generated by the competitor (Ac) and the target (At) were determined for each individual sample. The amount of Rubus Histone H3 gene amplified from each sample was factored into the ratio (target:competitor) for each sample to normalize each lane for equal amounts of starting RNA template. Differences in the ratios (At:Ac) represented differences in the mRNA level in each tissue sample.
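The normalization described above can be sketched in a few lines of Python. How the Histone H3 signal was "factored into" the ratio is not spelled out in the text, so the sketch assumes one plausible reading: the target:competitor ratio is divided by the Histone H3 band intensity expressed relative to a reference lane. The function name, the reference intensity and all band intensities are hypothetical.

```python
def normalized_ratio(target, competitor, histone, histone_ref):
    """Target:competitor band ratio corrected for template loading.

    Assumption: loading is estimated as the Histone H3 intensity of the
    lane divided by that of a chosen reference lane.
    """
    if min(competitor, histone, histone_ref) <= 0:
        raise ValueError("band intensities must be positive")
    loading = histone / histone_ref
    return (target / competitor) / loading


# Hypothetical densitometry values: (At, Ac, Histone H3) per tissue.
lanes = {
    "fruit_I": (120.0, 240.0, 95.0),
    "fruit_III": (300.0, 150.0, 80.0),
}
reference_h3 = 100.0
for tissue, (a_t, a_c, h3) in lanes.items():
    print(tissue, round(normalized_ratio(a_t, a_c, h3, reference_h3), 2))
```

Because the same amount of competitor is added to every reaction, the normalized ratios are comparable across tissues for one gene, but not between genes amplified with different competitors.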
To monitor the degree of inter-assay variation, samples of RNA were assayed in two independent RT-PCR assays.

2.11.5 Absolute amounts of Rubus PAL, 4CL, PKS gene-family members in various Rubus tissues

To compare the levels of expression of each gene family member in a particular tissue sample, the absolute mRNA level for each member was determined. A constant amount of the cDNA from fruits in stage III (1 µl) was amplified in the presence of a serial dilution of the gene-specific PCR competitor. The amplification products were then resolved on an agarose/EtBr gel and the bands quantified by densitometric scanning using the Scion Image Software. The relative amounts of target (t) to competitor (c) were calculated after correcting for the differences in sizes between them. The logarithm of the ratio At/Ac was graphed as a function of the logarithm of the initial molar amount of the competitor (Nc). Linear regression analysis was used to define the equation for the line through the data points, and the absolute amount of competitor sufficient to give an equimolar ratio of the cDNA to the competitor was calculated. The amount of each gene-specific cDNA, and hence the amount of the cognate mRNA present in fruits III (assuming 100% efficiency of reverse transcription), was thus obtained. The absolute levels in all other samples were determined based on the relative ratio of At/Ac in each sample.

2.12 Phylogenetic analysis

Amino acid sequence data from other plant species for integration into the analysis were obtained from GenBank (http://www.ncbi.nlm.nih.gov/entrez). ClustalW (Altschul et al., 1990) was used for alignment of the sequences. The multiple sequence alignment obtained was used for maximum parsimony analysis using the PAUP 4.0.2b program (Sinauer Associates, Massachusetts). A heuristic search with the "tree bisection reconnection" branch-swapping algorithm (Swofford and Olsen, 1990) was used to find the most parsimonious tree.
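Returning briefly to the competitive RT-PCR quantification of section 2.11.5, the regression step can be illustrated with a short Python sketch. It fits log10(At/Ac) against log10(Nc) by ordinary least squares and solves for the competitor amount at which the ratio equals 1. The size correction assumes that EtBr fluorescence is proportional to DNA mass, so intensities are divided by fragment length to give a molar ratio; that assumption, the function names and all numbers in the example are illustrative and not taken from the thesis.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def equivalence_point(competitor_amol, target_int, competitor_int,
                      target_bp, competitor_bp):
    """Competitor amount (amol) at which target and competitor are equimolar."""
    xs = [math.log10(n_c) for n_c in competitor_amol]
    # Molar ratio = (At / target_bp) / (Ac / competitor_bp), assuming
    # band intensity scales with DNA mass.
    ys = [math.log10((a_t * competitor_bp) / (a_c * target_bp))
          for a_t, a_c in zip(target_int, competitor_int)]
    slope, intercept = fit_line(xs, ys)
    return 10 ** (-intercept / slope)  # log ratio = 0 at equivalence


# Hypothetical dilution series (amol competitor) and band intensities.
n_c = [0.5, 1.0, 5.0, 10.0, 50.0]
a_t = [980.0, 510.0, 105.0, 48.0, 11.0]
a_c = [100.0, 100.0, 100.0, 100.0, 100.0]
print(equivalence_point(n_c, a_t, a_c, target_bp=400, competitor_bp=300))
```

In an ideal competitive amplification the fitted slope is close to -1; a slope that departs strongly from this value suggests that target and competitor are not amplifying with equal efficiency.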
For statistical analysis, 1000 bootstrap replicates were analyzed (Felsenstein, 1985).

CHAPTER 3 Phenylalanine ammonia-lyase gene family in Rubus idaeus: structure, organ-specific expression and evolution

3.1 Introduction

Ripening fruits undergo a complex developmental process that encompasses changes in many metabolic pathways. These coordinated physiological and biochemical changes ultimately lead to metabolite profiles that determine the quality of the fruit, e.g. color, texture, flavour, and aroma (Brady, 1987). In climacteric fruits such as tomatoes, avocados and bananas, these events are coordinated by the gaseous hormone ethylene, which is synthesized autocatalytically in the early stages of ripening (Biale and Young, 1982). In contrast, non-climacteric fruits such as strawberry, cucumber and pepper do not synthesize or respond to ethylene in the same manner, yet undergo metabolic changes associated with the production of ripe fruit (Biale and Young, 1982). Despite this inherent difference between climacteric and non-climacteric fruits, in both types of fruits the ripening process is largely controlled at the level of nuclear gene expression. This has been shown by examination of in vitro translation products synthesized from poly(A)+ RNAs in tomatoes (Grierson et al., 1985; Briggs et al., 1986), apple (Lay-Yee et al., 1990), and mango (Lopez-Gomez and Gomez-Lim, 1992), and by differential screening and subtractive hybridization of cDNA libraries from poly(A)+ RNAs in fruits of tomatoes (Gray et al., 1992; Kausch and Handa, 1997), banana (Clendennen and May, 1997), avocado (Dopico et al., 1993), and strawberry (Wilkinson et al., 1995).

Ripening of raspberry fruits is characterized by increased synthesis of a number of natural products biogenetically derived from phenylalanine that contribute significantly to the quality of fruits.
The ripening is typified by the accumulation of anthocyanins and characteristic flavour components, by a concomitant decrease in chlorophyll, and by a progressive decrease in tissue firmness (Perkins-Veazie et al., 1992; Borejsza-Wysocki and Hrazdina, 1994). Interestingly, both the color and the characteristic aroma of the fruits are due to the accumulation of phenylpropanoid derivatives, anthocyanins and p-hydroxyphenylbutan-2-one, respectively (Barrit and Torre, 1975; Schinz and Seidel, 1961; Borejsza-Wysocki and Hrazdina, 1994).

An increase in diverse phenylpropanoid derivatives is also a feature of ripening in other fruits. The aroma of ripening strawberry fruits is attributed to the accumulation of volatile cinnamates (Latza et al., 1996). The formation of flavor in developing fruits of Vanilla planifolia is due to the accumulation of phenolics such as 4-hydroxybenzaldehyde, 4-hydroxybenzoic acid, vanillyl alcohol, vanillin and vanillic acid (Brodelius, 1994). Green fruits of Litchi chinensis contain the anthocyanin malvidin-3-acetylglucoside while ripe fruits contain cyanidin-3-rutinoside, demonstrating specific developmental changes in phenylpropanoid metabolism (Rivera-Lopez et al., 1999). In capsicum, the increase in fruit length is paralleled by the disappearance of three cinnamyl glycosides and two flavonoids, while at the same time 'lignin-like' substances and the glycosides of vanillic acid, p-hydroxybenzaldehyde and several unknown C-6-C-1 compounds accumulate (Sukrasno and Yeoman, 1993). Thus, phenylpropanoid metabolism is a vital part of fruit development, and hence of fruit quality, in both climacteric and non-climacteric fruits. Despite this obvious importance of phenylpropanoid metabolism, not much is known about the regulation of this pathway at the molecular level in ripening fruits, including Rubus.
Synthesis of phenylpropanoids in fruits could be regulated at multiple steps: the entry of sugars into the shikimic acid pathway, the entry of phenylalanine into the general phenylpropanoid pathway, or the entry of activated CoA esters into various sub-branches of the phenylpropanoid pathway. It is generally accepted that phenylalanine ammonia-lyase (PAL, EC 4.3.1.5), the first enzyme of the phenylpropanoid pathway, catalyzes the rate-determining step for the synthesis of phenylpropanoids. This was demonstrated for the biosynthesis of chlorogenic acid in transgenic tobacco leaves (Bate et al., 1994; Howles et al., 1996). It has also been shown that when PAL is reduced below a threshold of 20% to 25% of wild-type activity, it becomes a rate-limiting step for lignin biosynthesis (Bate et al., 1994; Howles et al., 1996).

It has been suggested by Stafford (1974) that phenylpropanoid metabolism may involve macromolecular assemblies, and that this organization might help regulate the partitioning of intermediates among competing pathways and determine the intracellular deposition of end products. Initial gel-filtration and cell-fractionation studies performed by Hrazdina and Wagner (1985) indicated that enzymes of the phenylpropanoid pathway function as part of one or more membrane-associated enzyme complexes in amaryllis, buckwheat and red cabbage. Using radiotracer feeding studies in a transgenic tobacco plant over-expressing a bean PAL, Rasmussen and Dixon (1999) suggested that the channeling of metabolites through the phenylpropanoid pathway requires the coupling of specific forms of PAL with cinnamate-4-hydroxylase, the second enzyme of the pathway. In parallel to this study, using a biochemical genetic approach, Burbulis and Winkel-Shirley (1999) provided evidence for specific protein-protein interaction among four enzymes of flavonoid biosynthesis in Arabidopsis.
Taken together, these results are consistent with a model in which synthesis of various phenylpropanoid metabolites is organized by the activities and associations of specific isoforms of the enzymes of the pathway. In ripening grape berries, Hrazdina et al. (1984) showed that several enzymes of the general phenylpropanoid pathway, along with those of the flavonoid pathway, are coordinately up-regulated, and suggested that multienzyme complexes may also be important for channeling of metabolites during the process of fruit ripening.

As a first step to gain further insights into the regulation of phenylpropanoid metabolism during the process of Rubus fruit ripening, I have examined the gene-family structure and the regulation of genes encoding PAL, the first enzyme of the phenylpropanoid pathway. PAL is encoded by a family of three to five genes in bean (Cramer et al., 1989), parsley (Logemann et al., 1995), rice (Minami et al., 1989), Arabidopsis (Wanner et al., 1995), pea (Yamada et al., 1992), and barley (Kervinen et al., 1997), while in potato over forty PAL genes have been reported (Joos and Hahlbrock, 1992). The increase in PAL enzyme activity observed in response to a range of environmental cues, including wounding, infection, light, phytohormones and elicitors (Dixon and Paiva, 1995), is correlated with corresponding fluctuations in mRNA levels, suggesting that regulation of the PAL genes occurs primarily at the transcriptional level. In some cases, individual PAL genes are differentially regulated in different organs of plants, and specific PAL transcripts can accumulate in a cell-type-specific manner (Liang et al., 1989; Lois and Hahlbrock, 1992; Shufflebottom et al., 1993).
PAL has been purified from fruits of tomatoes (Bernards and Ellis, 1991), citrus (Dubery and Schabort, 1986), grape berries (Roubelakis-Angelakis and Kliewer, 1985), and strawberry (Given et al., 1988), and expression of PAL genes has been characterized in ripening fruits of melon (Diallinas and Kanellis, 1994) and sweet cherry (Wiersma and Wu, 1998). The reddening of ripening strawberry and apple fruits is attributed to the accumulation of anthocyanins (Woodward, 1972; Chalmers et al., 1973) and is paralleled by an increase in PAL activity (Faragher and Brohier, 1984; Given et al., 1988). Developing grape berries also display accumulation of anthocyanins accompanied by an increase in PAL activity (Hrazdina et al., 1984). In strawberry, it is noteworthy that two peaks of PAL activity have been reported, one in green fruits and one in nearly ripe fruits (Cheng and Breen, 1991). While the second peak is correlated with the anthocyanin accumulation that occurs in ripe fruits, the first peak has been suggested to be involved in the synthesis of other flavonoids (e.g. condensed tannins) and phenolics that takes place during early fruit ripening (Macheix et al., 1990; Cheng and Breen, 1991). In melon, the increase in levels of PAL transcripts follows the kinetics of expression of the ethylene biosynthetic genes, 1-aminocyclopropane-1-carboxylic acid (ACC) synthase and ACC oxidase, during fruit development (Diallinas and Kanellis, 1994), while cherry PAL was maximally induced in the "early pink" stage of development (Wiersma and Wu, 1998).

Thus, while there are several examples of a close relationship between PAL activity and ripening of fruits, there is relatively little information available regarding the gene-family structure and differential regulation of PAL genes from a fruit crop. In this chapter, I show that in Rubus there are at least two divergent PAL genes.
The relationship of these genes to each other and to orthologues in other plant species is described. I also show that the two Rubus PAL genes behave differentially during the process of fruit ripening.

3.2 Methods

3.2.1 Degenerate PCR primer design and amplification of Rubus genomic DNA

When this study was started there were ~16 full-length PAL genes identified from plants and fungi in the GenBank database. An amino acid alignment of these 16 genes identified several conserved regions that appeared to be useful for design of degenerate PCR primers. Degenerate sense primers (P5, P7) and antisense primers (P2, P6) were designed to amplify partial regions of the PAL genes (Figure 3.1). Inosine was incorporated for amino acids that had four- or six-fold degeneracy. Primer P5 (128-fold degenerate) was based on the conserved amino acid residues YGVTTGFG. This region is directly upstream of a conserved intron found in all PAL genes identified from angiosperms. P7 was designed to target a heptapeptide region that forms a part of the active site of phenylalanine and histidine ammonia-lyases (Schwede et al., 1999). Primer P2 (2048-fold degenerate) was targeted to the conserved region HVQSAEQ in exon 2. It is also downstream of a second intron found only in Arabidopsis PAL3 (Wanner et al., 1995) and Arabidopsis PAL4 (GenBank, BACF14P13). Primer P6 shared one amino acid in common with P2 and was 64-fold degenerate. Primer sequences and binding sites relative to a known PAL gene from Arabidopsis (PAL3; GenBank, L33679) are summarized in Table 3.1 and graphically represented in Figure 3.1. EcoRI and XbaI restriction sites were added to the sense and the antisense primers, respectively, to facilitate directional cloning of the PCR products. PCR products from the various primer combinations were analyzed and purified from a 0.7% agarose gel using a GENECLEAN Kit (BIO/CAN), and subcloned into pUC19 for further analysis.
After transformation, multiple colonies were analyzed for the presence of the insert by digestion of extracted DNA with EcoRI and XbaI. Multiple clones from two independent PCR reactions with each primer set were picked at random for sequencing. The sequences were grouped into classes based on their level of amino acid and nucleotide sequence identity.

Primer | DNA sequence of the degenerate primer | AA binding site
P5 | 5'-CGGAATTCTACGG(T/C)GTCACIAC(T/C)GGITT(T/C)GG-3' | Y95GVTTGFG101
P7 | 5'-CGGAATTCATC(T/A)CIGCITCIGGIGA(C/T)(T/C)T-3' | I190(T/S)ASGDL196
P2 | 5'-GCTCTAGATG(T/C)TCIGCI(G/C)(A/T)(T/C)TGIAC(A/G)TG-3' | H461VQSAEQ468
P6 | 5'-GCTCTAGATTIAC(G/A)TC(C/T)G(G/A)TT(A/G)TG(C/T)TC-3' | Q468HNQDVN480

Table 3.1 Nucleotide sequences of the primers used for amplification of the Rubus PAL genes. The binding sites of the primers relative to Arabidopsis PAL3 (GenBank, L33679) are indicated with the amino acids. The bold letters denote the nucleotide recognition sites of restriction enzymes EcoRI (GAATTC) and XbaI (TCTAGA). AA = amino acid.

3.2.2 Screening of Rubus cDNA library for PAL genes

A partial homologous fragment obtained by amplification of the Rubus genome with PAL gene-specific primers P7-P2 (Figure 3.1) was used as a probe to screen the Rubus fruit cDNA library as described in Section 2.5. The insert of each positive plaque was amplified in a 50 µl PCR reaction with vector-specific primer T3 and PAL gene-specific primer P6, employing the protocol used for amplification of the genomic fragments described in Section 2.3. Amplified products (5 µl) were digested with restriction enzymes BamHI, HindIII, EcoRI, EcoRV, KpnI, NcoI, SalI, SmaI, and XhoI according to the manufacturer's protocol. One plaque representative of each RFLP class, harboring the largest insert, was rescued as a pBluescript II SK(-) phagemid using the ExAssist helper phage.
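The cloning sites built into the primers of Table 3.1 can be checked with a simple string search. The sketch below is illustrative only: it scans the non-degenerate 5' portions of primers P5 and P2 for the EcoRI (GAATTC) and XbaI (TCTAGA) recognition sequences named in the table caption. The function and variable names are hypothetical, and the search does not handle degenerate or inosine-containing positions.

```python
def find_sites(sequence, site):
    """Return 0-based start positions of every occurrence of `site`,
    including overlapping ones."""
    sequence = sequence.upper()
    site = site.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions

# Recognition sequences as given in the caption of Table 3.1.
SITES = {"EcoRI": "GAATTC", "XbaI": "TCTAGA"}

# Non-degenerate 5' portions of primers P5 (sense) and P2 (antisense), Table 3.1.
P5_PREFIX = "CGGAATTCTACGG"
P2_PREFIX = "GCTCTAGATG"

def sites_in(sequence):
    """Map each enzyme name to the positions of its site in `sequence`."""
    return {name: find_sites(sequence, site) for name, site in SITES.items()}

p5_sites = sites_in(P5_PREFIX)
p2_sites = sites_in(P2_PREFIX)
```

The same search applied to a cloned insert would flag internal sites, such as the internal EcoRI site reported for one class of clones, that produce shorter-than-expected inserts after directional cloning.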
3.2.3 Design of gene-specific primers and RT-cPCR

Gene-specific primers capable of amplifying a segment from nucleotides 2009 to 2227 of the RiPAL1 cDNA are the sense primer pal15 (5'-CGACAATGCCAGGATCGAAT-3') and the antisense primer pal13 (5'-TCCTTCAAACACTCCAGCAGA-3'). Gene-specific primers capable of amplifying a segment from nucleotides 2009 to 2225 of the RiPAL2 cDNA are the sense primer pal25 (5'-TGAGAGCGCTAGGGCTGCG-3') and the antisense primer pal23 (5'-GCTGAGGCAGCTGAGAATG-3'). The predicted amino acid sequence locations of these primers are shown in Figure 3.5. Composite primers for creation of a non-homologous competitor (Section 2.11) contained additional nucleotides at the 3'-end of each gene-specific primer. Twenty additional nucleotides (5'-CCCCTAACAGGAATTCTGCG-3') were added at the 3'-end of each sense gene-specific primer to create the sense gene-specific composite primer. Similarly, twenty additional nucleotides (5'-ACCATCGCAGATTGAAGGAC-3') were added to the 3'-end of each antisense gene-specific primer to create the antisense gene-specific composite primer. Gene-specific primers for amplification of the Rubus Histone H3 had the nucleotide sequences his15 (5'-ATGGCGCGGACGAAGGA-3') and his13 (5'-GCCTACGCCGCCCGCTCAACCTA-3').

For amplification of the cDNA, the first-strand cDNA reaction (1 µl) (Section 2.11.2) was amplified in a total volume of 20 µl containing 200 nM each PCR primer, 200 µM each dNTP, and 2.5 U of Taq DNA polymerase in 1x PCR buffer (Qiagen) and 1x Q solution (Qiagen). The thermal cycling conditions were 94°C for 5 min, followed by 25 cycles (Histone H3) or 32 cycles (RiPAL) of 94°C for 20 s, 59°C for 50 s, and 72°C for 50 s, followed by a final extension for 5 min at 72°C. The PCR product (10 µl) was analyzed on a 3% TAE-agarose gel and stained with 5 µg/ml ethidium bromide.
3.3 Results

3.3.1 PCR-based search for the Rubus PAL gene family

Rubus genomic DNA isolated from young leaf tissue was used as a template for amplification of PAL genes using combinations of the degenerate primers (P5/P7)-(P2/P6). Amplification reactions with the primer pair P7-(P2/P6) yielded products of the expected size of about 0.9 kb, in addition to products of other sizes (Figure 3.2 A, lanes 1 and 2). However, the other fragments were also seen in the amplification reactions with single primers P7/P6/P2 (Figure 3.2 A, lanes 4, 5 and 6), indicating they were non-specific products. All angiosperm PAL genes characterized to date harbor a single intron at a conserved position within the regions amplified by primers P5-(P2/P6). These primer combinations were therefore expected to amplify fragments larger than 1.1 kb (Figure 3.1). Amplifications with combinations P5-(P2/P6) each yielded an intense band of ~1.8 kb (Figure 3.2 B, lanes 1 and 2).

Multiple colonies obtained from subcloning the products of two independent PCR reactions of each primer combination were analyzed for the presence of inserts by digesting with EcoRI and XbaI. Clones carrying inserts were randomly chosen for sequencing. I sequenced the 3' end of >20 randomly chosen clones from each of the primer-pair combinations P7-(P2/P6), and 5 additional clones from each of the primer-pair combinations P5-(P2/P6) (Table 3.2). Comparison of a stretch of 366 nucleotides (122 aa) sequenced from all 50 clones revealed two different classes of PAL sequences, arbitrarily named Ripal1 and Ripal2. I use lower case letters for these genes, as they represent partial-length sequences. Table 3.2 summarizes the number of clones sequenced from each class, as well as the distribution of Ripal1-like or Ripal2-like clones obtained from each primer-pair combination.

Figure 3.1 Positions of the Rubus pal clones amplified and sequenced, relative to a generic PAL gene.
The dark rectangular box indicates the coding regions of a generic angiosperm PAL gene. Closed arrows indicate the target positions of the degenerate primers. V 1 represents the conserved intron found in all PAL genes isolated from angiosperms. Dark lines indicate the amplified regions of the two classes of Rubus pal genes, Ripal1 and Ripal2. Each gene has an added EcoRI site at the 5' end and XbaI site at its 3' end to facilitate cloning. Ripal1 has an additional intrinsic internal EcoRI site downstream of P7. The 366 bp (122 aa) region compared and analyzed in detail is indicated by the hatched box. Inverted open arrows (V) indicate the location of the intron found within the Rubus PAL genes.

Primer pair | # Colonies analyzed | # Clones sequenced | # Ripal1-like clones | # Ripal2-like clones
P7-P2 | 75 | 25 | 16 | 8
P7-P6 | 50 | 22 | 18 | 4
P5-P2 | 25 | 7 | 5 | 2
P5-P6 | 25 | 5 | 4 | 1

Table 3.2 Summary of putative PAL clones analyzed and sequenced from each of the primer pairs used in this study.

Figure 3.2 Fragments of the Rubus genome amplified using different PAL gene-specific degenerate primer pairs. A) The various primer-pair combinations used were as follows: lane 1, P7-P2; lane 2, P7-P6; lane 3, P7; lane 4, P2; lane 5, P6; lane 6, no template control. M, 1 kb molecular weight marker. B) Lane 1, P5-P2; lane 2, P5-P6; lane 3, P5; lane 4, P2; lane 5, P6; M, 1 kb molecular weight marker.

The Ripal1-like sequence class was represented by 16 clones obtained from primer pair P7-P2, 18 clones from P7-P6, 5 clones from P5-P2 and 4 clones from P5-P6. A closer analysis of these sequences revealed that the clones could be grouped into two sub-categories that differed from each other at fewer than ten nucleotide positions. Twenty clones were identical at all 366 nucleotide positions and this sequence was therefore designated Ripal1 (Figure 3.3).
Each of the additional fifteen clones had single-base substitutions at nucleotide positions 75 (C→T), 114 (A→G), 141 (C→T), 144 (G→A), 162 (C→T), 228 (T→C), 267 (A→T), and 351 (C→G). These sequences were designated Ripal1-2. Interestingly, none of these substitutions introduced a change at the amino acid level. These substitutions were seen in more than one clone obtained from two independent PCR reactions, and the substitution rate was greater than the estimated sequence error rate (0.5%) for Taq DNA polymerase in a similar analysis (Butland et al., 1998). Since the target DNA used for amplifications in this analysis was derived from Rubus idaeus, a diploid heterozygous species, this level of heterogeneity most likely denotes polymorphism between allelic forms of the same gene. Eight clones had single-base substitutions at fewer than three nucleotide positions each within the 366 nucleotides compared. These substitutions were not seen in clones obtained from another PCR reaction with the same or a different primer set, and they thus likely reflect synthesis errors introduced by Taq DNA polymerase.

Ripal2 was represented by eight clones obtained from primer set P7-P2, four clones from P7-P6, two clones from P5-P2 and one clone from P5-P6. Eight of these clones were identical over the 366 nucleotides compared and this sequence was therefore designated Ripal2 (Figure 3.3). An additional four clones contained substitutions at both nucleotide positions 234 (T→C) and 246 (T→C). These sequences most likely represent allelic variation, since they were detected in clones obtained from two independent PCR reactions with the same or different primer combinations. These sequences are henceforth referred to as Ripal2-2. Three of the fifteen Ripal2-like clones had single nucleotide substitutions at positions that were not represented in other clones, and thus most likely arose from errors introduced by Taq DNA polymerase.
Ripal1 has an internal EcoRI site 554 bp downstream of primer P5 (281 bp downstream of primer P7) (Figure 3.1) that accounted for clones with inserts of 0.7 kb obtained with all the primer combinations used in this analysis.

Ripal1  CCACAATGGCTAGGTCCACAGATCGAAGTGATCAGGGCAGCAACCAAAATGATTGAGAGG  60
Ripal1  GAGATCAACTCTGTCAACGACAACCCATTGATCGATGTCTCCAGGAACAAGGCATTACAT  120
Ripal1  GGTGGAAATTTTCAAGGAACCCCGATTGGTGTTGCCATGGACAACACCAGACTTGCCATT  180
Ripal1  GCCTCAATTGGGAAGCTCATCTTTGCCCAATTCTCTGAGCTTGTCAATGACTACTACAAC  240
Ripal1  AATGGCTTGCCTTCGAATCTCACAGGAAGCAGCAACCCGAGTTTGGACTACGGGTTCAAA  300
Ripal1  GGTGCTGAAATCGCAATGGCATCTTACTGCTCAGAGCTTCAGTTCCTTGCCAACCCTGTG  360
Ripal1  ACCAAC  366
Ripal2  ..T...  366
[Ripal2 rows for positions 1-360: the column positions of the deviations are not recoverable from the extracted text]

Figure 3.3 Nucleotide sequences of the two classes of R. idaeus PAL genes. The 366 nucleotide region compared is shown in full for Ripal1. For Ripal2 only deviations from Ripal1 are shown. Dots represent nucleotides identical in both sequences.

Amplifications of Ripal1 and Ripal2 with primer combinations P5-(P2/P6) had revealed the likely presence of intron(s) within the regions amplified, based on the size of the PCR amplification products (1.8 kb). Since Ripal1 had an internal EcoRI site (Figure 3.1), I cloned the 1.8 kb PCR-amplified product of P5-P2 (Figure 3.2B) into a ddT-tailed vector (Holton and Graham, 1990). The two classes of Rubus PAL genes could be distinguished by digesting the cloned inserts with EcoRI. Sequencing the full 1.8 kb insert from each of the two classes confirmed the presence of an intron (~700 bp) in both classes.
The intron is inserted between the second and third bases of a conserved arginine codon (Table 3.3), a position that is identical to that observed in other angiosperm PAL genes (Arg121 in Arabidopsis PAL3) (Figure 3.1). The sequence around the exon-intron junctions in the Ripal genes is similar to the consensus for dicot genes (Brown et al., 1986) and conforms to the "GT-AG" rule for donor/acceptor sites (Breathnach and Chambon, 1981). A comparison of sequences around the exon/intron boundary for the two genes is shown in Table 3.3. As predicted from the size of the amplification products seen with primer combinations P5-(P2/P6), the intron sizes within Ripal1 and Ripal2 were 778 bp and 687 bp, respectively. The two introns share 57% nucleotide identity. Within the amplified regions, the Ripal genes do not contain the second intron that has been found only in the Arabidopsis PAL3 and Arabidopsis PAL4 (GenBank, BACF14P13) genes to date.

Sample | Intron size (bp) | Position | 5'-splice site sequence | 3'-splice site sequence
Ripal1 | 778 | Arg (AG^A) | ..TAGgtaatttta.... | ....gtttttgttggacag..
Ripal2 | 687 | Arg19 (AG^AA) | ..TAGgtaagttaa.... | ....aatttcttggtccag..
Dicot consensus | NA | | Ggtaagtatt | tttttttatttgcag
[Alternative bases of the dicot consensus, column alignment not recoverable from the extracted text: T a t a a; aanaaaataa a t]

Table 3.3 Features of the intron found within the sequenced regions of the Rubus pal genes. Sequences around the intron-exon splice junctions in the Ripal genes are compared to the consensus found in dicots. Exon sequences are denoted by capital letters. ^ denotes the position of the intron within the codon of the conserved arginine residue found within the partial amplified fragments of the Ripal genes. This arginine was 19 amino acids downstream of the primer P5 binding site.

Using degenerate PCR primers I have established that the PAL gene family in Rubus consists of at least two members. Within the 366 bp regions compared, the two genes are 71% identical at the nucleic acid level (Figure 3.3).
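Percent identity over an aligned region is a simple column count. The following minimal sketch assumes two pre-aligned sequences of equal length and the dot notation used in Figure 3.3, where a dot means "same as the reference". The function names are illustrative, the thesis does not state which software produced its identity values, and the six-nucleotide example is just the final row of Figure 3.3, not the full 366 bp comparison.

```python
def expand_dot_notation(reference, dotted):
    """Replace each '.' in `dotted` with the reference base at that column."""
    if len(reference) != len(dotted):
        raise ValueError("rows must have the same length")
    return "".join(r if d == "." else d for r, d in zip(reference, dotted))

def percent_identity(seq_a, seq_b):
    """Percent of aligned, gap-free columns in which both sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
               if a != "-" and b != "-"]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)

# Final alignment row of Figure 3.3 (positions 361-366).
ripal1_tail = "ACCAAC"
ripal2_tail = expand_dot_notation(ripal1_tail, "..T...")
tail_identity = percent_identity(ripal1_tail, ripal2_tail)
```

Excluding gapped columns from the denominator is one common convention; other conventions (for example, dividing by the length of the shorter sequence) give slightly different values, so the convention should be stated alongside any reported percentage.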
Their encoded amino acid sequences (122 aa) share 82% identity and 96% similarity with each other. The derived amino acid sequences of Ripal1 and Ripal2, when compared to the corresponding fragment of PAL gene-family sequences in other plant species, showed the strongest similarity to members from dicotyledons (89%-97%). Ripal1 was most similar (97% amino acid sequence identity) to PAL1 from Lithospermum, while Ripal2 was most similar (98% amino acid sequence identity) to PAL1 from Citrus. These comparisons are summarized in Table 3.4.

To examine whether these partial fragments of PAL might be sufficient to draw conclusions about phylogenetic relationships amongst the PAL genes, I conducted a parsimony analysis using PAUP 4.0b2. An alignment of the 366 bp (122 aa) fragment from all PAL genes listed in Table 3.4 failed to generate any conclusive phylogenetic tree, even when the number of taxa included in the comparison was reduced. This outcome probably reflects the considerable similarity, and thus the large number of parsimony-uninformative sites, shared by diverse PAL sequences within the regions amplified by the PCR primers.

Table 3.4 Sequence similarity (%) amongst PAL gene family members from different species, comparing the 366 bp (122 aa) region shown in Figure 3.3. (G) represents sequences in GenBank, (S) represents sequences in the SwissProt database. aa1 = % amino acid sequence identity; aa2 = % amino acid sequence similarity in the 122 aa regions compared. a Predicted protein sequence of the PAL gene from Arabidopsis chromosome III (BAC F14P13), sequenced by the Arabidopsis Genome Initiative as of November 1999.
Gene family member | Species | Ripal1 aa1 | Ripal1 aa2 | Ripal2 aa1 | Ripal2 aa2 | Accession #
Ripal1 | Rubus idaeus | - | - | 92 | 96 |
Ripal2 | | 92 | 96 | - | - |
AtPAL1 | Arabidopsis thaliana | 90 | 95 | 91 | 95 | P35510 S
AtPAL2 | | 91 | 96 | 91 | 96 | P45724 S
AtPAL3 | | 91 | 96 | 92 | 96 | P45725 S
AtPAL4 (a) | | 95 | 98 | 95 | 97 | AC009400 G
CtPALY | Citrus limon | 95 | 97 | 96 | 97 | Q42667 S
CtPAL1 | Citrus clementina x Citrus reticulata | 92 | 96 | 98 | 99 | AJ238753 G
CtPAL2 | | 91 | 95 | 97 | 99 | AJ238754 G
GmPAL1 | Glycine max | 90 | 94 | 95 | 97 | P27991 S
GmPAL2 | | 89 | 93 | 91 | 95 | JQ1070 S
IpPAL1 | Ipomoea batatas | 92 | 96 | 90 | 96 | P14166 S
IpPAL2 | | 93 | 95 | 91 | 95 | Q428588 S
LePAL1 | Lithospermum erythrorhizon | 97 | 98 | 93 | 97 | O49835 S
LePAL2 | | 95 | 98 | 91 | 96 | O49836 S
Lypal1 | Lycopersicon esculentum | 95 | 97 | 94 | 97 | P35511 S
Lypal5 | | 93 | 95 | 92 | 95 | P26600 S
Nspal1 | Nicotiana tabacum | 95 | 97 | 92 | 95 | P25872 S
Nspal2 | | 91 | 95 | 91 | 95 | P35513 S
Nspal3 | | 92 | 96 | 92 | 96 | P45733 S
Pvpal1 | Phaseolus vulgaris | 90 | 95 | 94 | 97 | P07218 S
Pvpal2 | | 90 | 94 | 93 | 97 | P19142 S
Pvpal3 | | 81 | 91 | 86 | 95 | P19143 S
Pspal1 | Pisum sativum | 89 | 95 | 94 | 97 | Q01861 S
Pspal2 | | 89 | 95 | 94 | 97 | Q04593 S
Pcpal1 | Petroselinum crispum | 91 | 97 | 96 | 97 | P24481 S
Pcpal2 | | 90 | 97 | 95 | 97 | P45728 S
Pcpal3 | | 90 | 97 | 95 | 97 | P45729 S
PkPAL1 | Populus kitakamiensis | 91 | 95 | 95 | 96 | P45731 S
PkPAL2 | | 90 | 92 | 92 | 92 | Q43052 S
PkPAL4 | | 91 | 93 | 92 | 94 | Q40910 S
PbPALY | Populus balsamifera | 90 | 94 | 94 | 97 | P45730 S
Stpal1 | Solanum tuberosum | 94 | 97 | 93 | 97 | P31425 S
Stpal2 | | 94 | 97 | 93 | 97 | P31426 S
Hvpal1 | Hordeum vulgaris | 90 | 94 | 95 | 97 | Z49147 S
Hvpal2 | | 86 | 93 | 89 | 94 | Z49146 S
Hvpal3 | | 80 | 89 | 83 | 90 | X97313 S
Ospal1 | Oryza sativa | 84 | 92 | 86 | 93 | P14717 S
Ospal2 | | 90 | 94 | 95 | 97 | P53443 S
Pbpal1 | Pinus banksiana | 81 | 92 | 82 | 91 | AF013481 G
Pbpal2 | | 82 | 93 | 84 | 93 | AF013482 G
Pbpal3 | | 83 | 92 | 84 | 93 | AF013483 G
Pbpal4 | | 85 | 94 | 85 | 93 | AF013484 G
Pbpal5 | | 83 | 93 | 85 | 93 | AF013485 G

3.3.2 Construction of the cDNA library

Attempts to isolate good quality RNA from fruits of Rubus using various RNA extraction procedures (CTAB; hot phenol extraction followed by LiCl; guanidine isothiocyanate extraction with CsCl centrifugation), alone or in combination, were unsuccessful even though parallel extraction procedures performed on tobacco leaves were successful. These results are consistent with those of Jones et al. (1997), who also reported that conventional methods of RNA extraction, when applied to raspberry fruits, yield either degraded or poor quality RNA. It has been observed that RNA extraction procedures using detergents, guanidine buffer, phenol extraction, or density gradient centrifugation often fail when working with plant tissues rich in secondary products (Levi et al., 1992; Wang and Vodkin, 1994).

Several commercially available RNA extraction kits bypass the use of phenol/chloroform extractions or CsCl ultracentrifugation. Kits from several companies were tested for their suitability for extraction of total RNA from partially ripe fruits of Rubus. Good quality RNA could not be extracted with commercially available kits that were based on selective precipitation of the RNA with ethanol or isopropanol. However, the RNeasy kit marketed by Qiagen Inc, which is designed to selectively purify RNA by binding to an anion exchange resin (silica-gel-based membrane), enabled me to isolate high quality RNA from fruits of Rubus. The RNA preparations had excellent spectral values and contained intact cytoplasmic RNA species.

mRNA selectively purified from total RNA isolated from partially ripe fruits was used to construct the cDNA library as described in Section 2.5. The primary library, consisting of ~10^6 plaque-forming units/µg DNA, was amplified once to generate a high titer stock of 10 plaque-forming-units/ml. The average insert size in the library was ~1 kb, as determined by PCR amplification of inserts from 15 randomly chosen plaques (data not shown).

3.3.3 Isolation and characterization of two ripening-related Rubus PAL genes

3.3.3.1 Cloning of PAL cDNA(s)

Screening 500,000 phage plaques of the cDNA library with a 900 bp fragment of Ripal1 (Figure 3.1) yielded 46 positive plaques after three rounds of screening.
Amplification of the insert in all positive plaques with universal primer T3 and a gene-specific primer (P6) identified 26 clones as having potentially full-length PAL gene inserts. Restriction fragment length analysis of the amplified fragment of the full-length clones with BamHI, EcoRI, KpnI, NcoI, PstI, SalI, SmaI and XbaI allowed me to group the clones into two discrete classes (Table 3.5). RFLP maps for a representative clone from each class are shown in Figure 3.4. The entire sequence of a clone representing each class was determined by primer walking. Group I clones had a similar RFLP pattern and an exact sequence match to the partial fragment of Ripal1-2. Group II clones had a similar RFLP pattern and an exact sequence match with the overlapping regions of genomic fragments of Ripal2 identified by the PCR-based homology screen. Since group I and group II cDNA clones were the full-length expressed versions of the partial genomic fragments of Ripal1 and Ripal2, I designated these clones as RiPAL1 and RiPAL2.

Clone grouping | # of clones of each group | Insert size (kb)
RiPAL1 | 11 | 2.5
RiPAL1 | 1 | 2.4
RiPAL2 | 13 | 2.5
RiPAL2 | 1 | 2.4

Table 3.5 Summary of full-length Rubus PAL cDNA clones identified by screening the cDNA library. The full-length clones isolated can be grouped into distinct groups based on the fragment sizes obtained after digestion with restriction endonucleases.

Figure 3.4 Restriction maps of representative Rubus PAL cDNA clones. The dark line represents the partial fragment isolated and characterized during the PCR-based homology search for PAL gene family members. Restriction enzymes are E, EcoRI; K, KpnI; N, NcoI; S, SmaI; X, XbaI; XI, XhoI; B, BamHI; P, PstI. Scale bar, 500 bp.

3.3.3.2 Sequence analysis

RiPAL1 consists of 2130 bp containing 125 bp of 5'-untranslated region, an open reading frame of 2130 bp, and 245 bp of 3'-untranslated region.
RiPAL2 consists of 2439 bp containing 65 bp of 5'-untranslated region, an open reading frame of 2189 bp, and 184 bp of 3'-untranslated region. It was deduced that both clones represent full-length sequences because each contains one in-frame putative ATG initiation codon in a favorable sequence context for the initiation of translation (Joshi, 1987; Kozak, 1986), and also because homology to other PALs in GenBank begins shortly after this methionine (Figure 3.5). The 3'-untranslated regions of the RiPAL genes do not contain the conserved eukaryotic polyadenylation signal AAUAAA; instead they contain AAUAAA-like sequences, as is the case for more than 50% of the reported plant mRNA sequences (Wu et al., 1995).

The open reading frame of RiPAL1 encodes a polypeptide of 710 aa with a predicted MW of 77580 Da and a theoretical pI of 6.2. The RiPAL2 open reading frame encodes a polypeptide of 730 aa with a predicted MW of 79356 Da and a theoretical pI of 5.9. The deduced amino acid sequences of RiPAL1 and RiPAL2 are 81% identical (88% similar) to each other. As has been noted in other PAL protein sequences (Cramer et al., 1989), the greatest divergence was found in the N-terminal regions (Figure 3.5). Based on our previous analysis, we knew that the RiPALs contain a single intron at a highly conserved position. This occurs between nucleotides 383 and 384 in the RiPAL1 cDNA, and between 443 and 444 in the RiPAL2 cDNA. The sequences representing the exon I portions of the two Rubus PAL genes share 73% amino acid sequence identity, while the sequence identity between the exon II regions was 83%.

In amino acid sequence comparisons between the RiPALs and PAL gene family members identified from Arabidopsis, RiPAL1 shared the highest amino acid identity (81%) with Arabidopsis PAL2, while RiPAL2 was most closely related (81% amino acid identity) to Arabidopsis PAL1.
In general, the predicted amino acid sequences of RiPAL1 and RiPAL2 were more closely related to the predicted amino acid sequences of PAL genes isolated from dicot plant species than to those from monocot plants (Table 3.6). The polypeptides encoded by rice PAL genes, for example, share 66-72% amino acid sequence identity with the RiPALs. Interestingly, N-glycosylation sites, phosphorylation sites and N-myristoylation sites are also predicted for both RiPALs (Table 3.7). A phosphorylation site detected in the bean PAL sequence (Allwood et al., 1999) was also conserved in the two Rubus PALs, suggesting that phosphorylation might also play a role in modulating the activity of these PALs. The amino acid positions of these potentially biologically significant sites are summarized in Table 3.7. Both predicted polypeptides contain the conserved motif G-[STG]-[LIVM]-[STG]-[AC]-S-G-[DH]-L-X-P-L-[SA]-X(2)-[SA] (PROSITE: PS00488) found in all histidine and phenylalanine ammonia-lyases to date. This motif is predicted to form part of the catalytic active site (Hernandez and Phillips, 1994; Schwede et al., 1999). The good amino acid alignment with PAL from other species and the conserved amino acid motifs within the predicted polypeptides demonstrate that RiPAL1 and RiPAL2 encode isoforms of the Rubus PAL protein. cDNA clones of RiPAL1 and RiPAL2 were thus used for further analysis, since they represent the full-length sequences of the two divergent classes of Rubus PALs.

To study the evolutionary relationship among PAL sequences, sequences of a set of PAL gene-family members from other plant species, together with those from Rubus, were analyzed using the maximum-parsimony method. Each of the two independent heuristic searches conducted resulted in a single most parsimonious tree.
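The PROSITE signature PS00488 quoted above can be located in a protein sequence by translating the pattern into a regular expression. The following is a minimal sketch, not the PROSITE scanning server used for Table 3.7: the function names are hypothetical, the translator handles only the pattern elements that occur in this signature (single residues, x, bracketed residue sets and repeat counts), and the test peptide is a toy sequence rather than a RiPAL protein.

```python
import re

# Signature as quoted in the text (PROSITE PS00488).
PAL_HAL_SIGNATURE = "G-[STG]-[LIVM]-[STG]-[AC]-S-G-[DH]-L-x-P-L-[SA]-x(2)-[SA]"

# One pattern element: a residue, x, or a [set], optionally followed by (n) or (n,m).
ELEMENT = re.compile(r"(\[[A-Z]+\]|[A-Za-z])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regular expression."""
    pieces = []
    for element in pattern.split("-"):
        match = ELEMENT.fullmatch(element.strip())
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core in ("x", "X"):
            core = "."  # x means any residue
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        pieces.append(core)
    return "".join(pieces)


def find_motif(sequence: str, pattern: str = PAL_HAL_SIGNATURE):
    """Return 1-based, inclusive (start, end) positions of non-overlapping matches."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.end()) for m in regex.finditer(sequence)]


# Toy peptide: four flanking residues, a 16-residue stretch fitting the pattern, three more.
toy_peptide = "MAAA" + "GTITASGDLVPLSYIA" + "GLL"
print(find_motif(toy_peptide))
```

The reported match spans 16 residues, the same length as the signature intervals listed in Table 3.7 (192-207 and 212-227).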
The gymnosperm PAL sequence from Pinus taeda was selected as the outgroup because gymnosperms are considered, on the basis of both morphological characters and 18S rRNA sequences (Chaw et al., 1997), to be ancestral to the angiosperms. With the pine PAL sequence as the outgroup, the high bootstrap values (100% of the 1,000 bootstrap replicates) strongly suggest that dicot and monocot PALs belong to two separate monophyletic groups (Figure 3.6). While only the two rice monocot sequences were included in this analysis, inclusion of other monocot PAL sequences such as those from maize, wheat, and barley did not alter the distinct sub-grouping of monocot and dicot sequences. Within the dicot PALs, Arabidopsis PAL3 and PAL4 formed a distinct subgroup. RiPAL2 clustered with the cherry PAL sequence, another PAL gene characterized from a non-climacteric fruit.

[Figure 3.5: multiple alignment of the AtPAL1, AtPAL2, AtPAL3, AtPAL4, RiPAL1 and RiPAL2 amino acid sequences; alignment not recoverable from the extracted text]

Figure 3.5 Comparison of the deduced amino acid sequences of the Rubus PALs with members of the Arabidopsis PAL gene family. Black shading indicates conserved amino acids; grey boxes denote similar amino acid changes. Dashes indicate gaps introduced to maximize alignment. The signature motif forming part of the active site is indicated by asterisks (*). +, denotes the positions of the gene-specific primers used for expression analysis.

Table 3.6 Amino acid sequence identity among PAL gene family members from different species. (G) represents sequences in GenBank, (S) represents sequences in the SwissProt database.
aa1 = % amino acid sequence identity comparing the partial genomic fragments (122 aa) as in Table 3.4; aa2 = % amino acid sequence identity comparing the peptides encoded by the full-length genes. Sequences for which full-length coding regions are not reported are shown by gaps (-) in the aa2 column. (a) Predicted protein sequence of the PAL gene from Arabidopsis chromosome III (BAC F14P13), sequenced by the Arabidopsis Genome Initiative as of November 1999.

Gene      Species                       RiPAL1        RiPAL2        Accession #
                                        aa1   aa2     aa1   aa2
RiPAL1    Rubus idaeus                  -     -       92    80      AF237954
RiPAL2                                  92    80      -     -       AF237955
AtPAL1    Arabidopsis thaliana          90    78      91    81      P35510 S
AtPAL2                                  91    78      91    79      P45724 S
AtPAL3                                  91    73      92    69      P45725 S
AtPAL4                                  95    81      95    78      AC009400 (a)
CtPALY    Citrus limon                  95    81      96    84      Q42667 S
CtPAL1                                  92    79      98    83      AJ238753 G
CtPAL2                                  91    79      97    83      AJ238754 G
GmPAL1    Glycine max                   90    81      95    84      P27991 S
GmPAL2                                  89    -       91    -       JQ1070 S
IpPAL1    Ipomea batatas                92    79      90    79      P14166 S
IpPAL2                                  93    78      91    79      Q428588 S
LePAL1    Lithospermum erythrorhizon    97    83      93    82      O49835 S
LePAL2                                  95    81      91    81      O49836 S
LyPAL1    Lycopersicon esculentum       95    79      94    80      P35511 S
LyPAL2                                  93    79      92    82      P26600 S
NsPAL1    Nicotiana tabacum             95    81      92    82      P25872 S
NsPAL2                                  91    81      91    82      P35513 S
NsPAL3                                  92    81      92    82      P45733 S
PvPAL1    Phaseolus vulgaris            90    -       94    -       P07218 S
PvPAL2                                  90    80      93    82      P19142 S
PvPAL3                                  81    68      86    71      P19143 S
PsPAL1    Pisum sativum                 89    79      94    82      Q01861 S
PsPAL2                                  89    80      94    82      Q04593 S
PcPAL1    Petroselinum crispum          91    81      96    84      P24481 S
PcPAL2                                  90    81      95    83      P45728 S
PcPAL3                                  90    83      95    85      P45729 S
PkPAL1    Populus kitakamiensis         91    79      95    78      P45731 S
PbPALY                                  90    80      94    85      P45730 S
PkPAL2                                  90    82      92    81      Q43052 S
PkPAL4                                  91    -       92    -       Q40910 S
StPAL1    Solanum tuberosum             94    79      93    81      P31425 S
StPAL2                                  94    -       93    -       P31426 S
Hvpal1    Hordeum vulgare               90    -       95    -       Z49147 S
Hvpal2                                  86    -       89    -       Z49146 S
HvPAL3                                  80    -       83    -       X97313 S
OsPAL1    Oryza sativa                  84    68      86    66      P14717 S
OsPAL2                                  90    72      95    72      P53443 S
PtPAL1    Pinus taeda                   81    -       82    -       AF013481 G
Pbpal2    Pinus banksiana               82    -       84    -       AF013482 G
Pbpal3                                  83    -       84    -       AF013483 G
Pbpal4                                  85    -       85    -       AF013484 G
Pbpal5                                  83    -       85    -       AF013485 G

Features of the predicted proteins    RiPAL1    RiPAL2
# of AA                               710       730
Calculated MW (Da)                    77580     79356
Theoretical pI                        6.21      5.88

Potential modification sites (a)

N-glycosylation site (PS00001)
  RiPAL1: 254-257, 436-439, 606-609
  RiPAL2: 17-20, 157-160, 274-277, 456-459, 624-627
cAMP- and cGMP-dependent protein kinase phosphorylation site (PS00004)
  RiPAL1: 493-496
  RiPAL2: -
Protein kinase C phosphorylation site (PS00005)
  RiPAL1: 29-31, 112-114, 142-144, 299-301, 491-493, 556-558
  RiPAL2: 30-32, 99-101, 132-134, 231-233, 319-321, 384-386, 511-513, 549-551, 650-652
Casein kinase II phosphorylation site (PS00006)
  RiPAL1: 13-16, 33-36, 137-140, 227-230, 374-377, 556-559, 562-565, 571-574, 674-677
  RiPAL2: 19-22, 36-39, 53-56, 394-397, 617-620, 634-637, 694-697
Tyrosine kinase phosphorylation site (PS00007)
  RiPAL1: 645-652
  RiPAL2: 665-672
N-myristoylation site (PS00008)
  RiPAL1: 9-14, 60-65, 83-88, 103-108, 133-138, 192-197, 208-213, 236-241, 255-260, 307-312, 392-397, 432-437, 451-456
  RiPAL2: 14-19, 103-108, 123-128, 153-158, 212-217, 228-233, 256-261, 275-280, 281-286, 327-332, 412-417, 452-457, 471-[illegible]
Phenylalanine and histidine ammonia-lyases signature (PS00488)
  RiPAL1: 192-207
  RiPAL2: 212-227

Table 3.7 Features predicted for Rubus PAL proteins. Amino acid positions of potentially important modification sites in the proteins, as identified by scanning for PROSITE patterns using the server http://www.ch.embnet.org/software/PSTSCAN_form.html. (a) The numbers within the brackets denote the accession numbers of the patterns in the PROSITE database (http://www.expasy.ch/prosite/).

[Figure 3.6: phylogenetic tree of dicot and monocot PAL sequences rooted with Pinus PAL1; tree topology and bootstrap values not recoverable from the extracted text]

Figure 3.6 A phylogenetic tree depicting the relationships amongst PALs.
PALs identified from other species were used subject to the following restrictions: (i) only species in which full-length PAL gene families have been characterized were used in the analysis; (ii) within a gene family, only sequences that differed by more than 5% in primary nucleotide sequence were incorporated in the analysis, to reduce the inclusion of allelic sequences. Amino acid sequences were aligned and the parsimonious tree was constructed using the program PAUP 4.0b2. The tree shown has a consistency index of 0.788. Bootstrap support values are boxed and values below 50% are not indicated. The Pinus taeda PAL sequence (GenBank, AF013481) was used to root the tree.

3.3.4 Developmental expression of the RiPAL genes

3.3.4.1 Design of gene-specific primers

The developmentally regulated expression patterns of the two cloned RiPAL genes were examined by quantitative RT-cPCR analysis. Gene-specific PCR primers (Figure 3.7) were used to amplify a 217-bp fragment of the RiPAL1 gene and a 218-bp fragment of RiPAL2. The specificity of the gene-specific primers was tested by control amplifications of the reciprocal plasmid cDNAs. As predicted, under the PCR conditions used the primers were specific to their cognate gene-family member (Figure 3.7).

[Figure 3.7: agarose gel, lanes M and 1-4; the 600-bp marker band and the target band are labelled]

Figure 3.7 Specificity of the gene-specific primers designed to amplify individual members of the RiPAL gene family. Lane 1, RiPAL1 cDNA with gene-specific primers pal15 and pal13; lane 2, RiPAL2 cDNA with non-specific primers pal15 and pal13; lane 3, RiPAL1 cDNA with non-specific primers pal25 and pal23; lane 4, RiPAL2 cDNA with gene-specific primers pal25 and pal23.
M, 100-bp molecular weight marker (Life Technologies).

3.3.4.2 Developmental regulation of the RiPAL genes

Previous studies in bean (Liang et al., 1989), Arabidopsis (Wanner et al., 1995), and parsley (Lois et al., 1989) have shown that members of the PAL gene family are expressed differentially during development and in response to different environmental cues. While PAL genes have been isolated from a few fruit crops, their developmental expression has not been studied during the process of fruit ripening. Since I was interested in the process of Rubus fruit development and the regulation of the PAL genes during this developmental program, I examined the expression patterns of the two RiPAL gene family members during fruit development, as well as in other tissues. Both members of the RiPAL gene family were expressed in all the tissues studied (Figure 3.8), and the pattern of expression for both RiPAL1 and RiPAL2 transcripts was qualitatively similar. Among all the tissues examined, the two genes were expressed most actively in fertilized flowers or fruits. Moderate expression was also detected in shoots and roots, while barely detectable levels of expression were observed in the young leaves. Low expression of PAL genes in leaves accompanied by a higher level of expression in roots has also been seen in other plant species (Liang et al., 1989; Joos and Hahlbrock, 1992; Yamada et al., 1992). RiPAL1 was most highly expressed in green immature fruits (stage I), with expression declining as the fruits attained maturity. RiPAL2, on the other hand, had lower levels of expression in green fruits (stage I), with two-fold higher expression observed as the fruits were turning pink (stage III) and then red (stage IV). Since high PAL expression has been observed in flowers (Liang et al., 1989; Gowri et al., 1991; Joos and Hahlbrock, 1992; Fukasawa-Akada et al., 1996; Wanner et al., 1995), I also examined the Rubus floral tissue at three stages of development.
Both RiPALs were expressed most highly in the mature, fertilized flowers (stage III). Based on these results, it is apparent that both RiPALs are actively transcribed throughout Rubus plants and that their expression is developmentally regulated.

3.3.4.3 Comparison of expression levels of the two RiPAL genes in different organs of Rubus

To compare the relative levels of expression of RiPAL1 and RiPAL2, I used an RT-cPCR assay to quantitatively determine the amounts of each RiPAL transcribed in stage III fruits. Full-length sequences of the two RiPAL genes had been isolated by screening a cDNA library made from mRNA isolated from stage III fruits, so it was already established that the RiPAL genes are expressed in these tissues. Once I had determined the absolute levels of expression of each of the two genes in stage III fruits, it was possible to normalize the amounts in other tissues by comparing the ratio (target:competitor) for each tissue to the ratio (target:competitor) established for expression in stage III fruits. The incorporation into the RT-PCR reactions of a competitor template of non-homologous gymnosperm genomic DNA with the same primer binding sites as the normal RiPAL1 and RiPAL2 targets provides an internal control from which the levels of RiPAL1 and RiPAL2 cDNA in a sample can be estimated. Figure 3.9A shows the qualitative results of a typical RT-cPCR assay. In this case, seven equal aliquots of cDNA transcribed from total RNA of stage III fruit were amplified with gene-specific primers in the presence of a series of dilutions of the competitor DNA. The two fragments, the 217/218-bp fragment from RiPAL1 or RiPAL2 mRNA and the control competitor 370-bp fragment, are simultaneously generated by the polymerase in proportions that reflect the relative abundance of the target species.
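The arithmetic behind a competitive RT-PCR titration of this kind can be sketched as follows. The sketch assumes the usual treatment of such data: log10 of the target:competitor product ratio is regressed against log10 of the amount of competitor added, and the amount of target is read off where the fitted ratio equals 1. It also assumes that every tissue reaction receives the same amount of competitor, so that ratios can be scaled to the stage III fruit reference. All function names and numbers are hypothetical and are not the thesis data.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def equivalence_point(competitor_amounts, target_to_competitor_ratios):
    """Competitor amount at which the fitted target:competitor ratio equals 1."""
    xs = [math.log10(amount) for amount in competitor_amounts]
    ys = [math.log10(ratio) for ratio in target_to_competitor_ratios]
    slope, intercept = fit_line(xs, ys)
    return 10 ** (-intercept / slope)  # log10(ratio) = 0 at this amount


def normalise_to_reference(reference_level, reference_ratio, tissue_ratio):
    """Scale a tissue's target:competitor ratio to the level measured in the reference tissue."""
    return reference_level * tissue_ratio / reference_ratio


# Hypothetical, idealised titration: 2 amol of target against a competitor dilution series.
competitor = [100.0, 10.0, 1.0, 0.1]          # amol of competitor added per tube
ratios = [2.0 / amount for amount in competitor]  # ideal target:competitor product ratios
target_estimate = equivalence_point(competitor, ratios)
print(f"estimated target: {target_estimate:.2f} amol")

# Hypothetical tissue whose ratio is twice that of the reference tissue.
tissue_level = normalise_to_reference(65.0, reference_ratio=1.0, tissue_ratio=2.0)
print(f"tissue level: {tissue_level:.1f} (same units as the reference)")
```

Real densitometry data scatter around the fitted line, and the two products differ in length (217/218 bp versus 370 bp), which is why molar rather than mass ratios are used in the regression.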
Figure 3.9A shows that as decreasing amounts of competitor are added, the amount of RiPAL cDNA being amplified increases; hence, the ratio of the two bands (target:competitor) changes across the gel. After densitometric scanning, the molar ratio of the competitor DNA to the RiPAL cDNA product was regressed against the amount of competitor added (Figure 3.9B). As determined by this analysis, the amounts of the two RiPAL cDNAs in stage III fruits were essentially equal and were equivalent in each case to 65 pg cDNA/µg of total RNA.

Comparisons of the relative abundance of RiPAL1 and RiPAL2 in different tissues indicated that RiPAL1 mRNA is more abundant than RiPAL2 mRNA in most of the tissues examined (Table 3.8). RiPAL1 is about 12-fold more abundant in shoots and about 5-fold more abundant in leaves and young fruit (stages I and II). RiPAL1 and RiPAL2 had about the same level of expression in mature flowers (stage III) and maturing fruits (stage III). Only in fully developed open flowers (stage II) was RiPAL2 expressed more highly (about 3-fold) than RiPAL1. The observation that relatively similar levels of both RiPAL mRNAs accumulate in stage III fruits was consistent with the class distribution of the RiPAL clones isolated from the Rubus cDNA library. The 26 full-length characterized cDNAs that were verified to be PAL clones fell into two restriction fragment length polymorphism classes that contained 12 RiPAL1 clones and 14 RiPAL2 clones (Table 3.5).

[Figure 3.8: A) gel panels for RiPAL1 and RiPAL2 (competitor and target bands), RiHis3 and RiHis3 (-)RT (target band), 500-bp marker indicated; lanes: Leaves, Shoots, Roots, Flowers I, Flowers II, Flowers III, Fruits I, Fruits II, Fruits III, Fruits IV, Fruits V. B) graph of target:competitor ratios]

Figure 3.8 Semi-quantitative RT-cPCR estimation of the accumulation of specific RiPAL transcripts in different organs of Rubus. A)
RT-cPCR was performed using 100 ng of total RNA isolated from young leaves, shoots, roots and different developmental stages of flowers and fruits. Following 32 cycles of amplification, the products (50% of each amplification mix) were resolved in a 3% TAE-agarose/EtBr gel. B) The relative amounts of target (t) and competitor (c) amplification products were calculated, and the ratio of the two products was graphed. Similar results were obtained in two independent experiments. The expression level of a given RiPAL gene can be compared between tissues, but expression of the two RiPAL genes cannot be compared within a tissue. The intensity of the RiPAL bands was normalized to the average intensity of the RiHis3 product as a control for equal amounts of starting RNA. M, 100-bp DNA size marker (Life Technologies); C, positive control cDNA.

[Figure 3.9: A) gel panels for RiPAL1 and RiPAL2 showing competitor and target bands, lanes M and 1-8, 500-bp marker indicated. B) plot of the target:competitor ratio against log Nc (moles); both RiPAL1 and RiPAL2 are annotated 65 pg/µg total RNA]

Figure 3.9 Quantitation of the absolute levels of the two RiPAL mRNA species in fruits at developmental stage III. A) PCR products were generated using cDNA templates reverse-transcribed from 250 ng of total RNA. The PCR reactions were carried out with a constant amount of template using gene-specific primers, in the presence of a serial dilution of the gene-specific competitor, from 100 attomoles (lane 1) to 3 x 10^-3 attomoles of competitor (lane 7). Lane 8, control containing no template. The PCR products were resolved on a 3% agarose gel and stained, and the amount of amplified DNA present in each band was measured by densitometric scanning. B) The molar ratio of the PCR products shown in (A) was plotted against the amount of competitor added per tube. In this case the analysis yielded a similar value for the absolute amounts of the two RiPAL transcripts in developmental stage III of fruits.
t, target; c, competitor; At/Ac, ratio of the target to competitor; Nc, moles of competitor.

RiPAL transcript levels (ng/mg total cellular RNA)

Organ          RiPAL1    RiPAL2         Ratio RiPAL1:RiPAL2
Leaves         14.6      3.2            4.6
Shoots         169.5     [illegible]    11.5
Roots          126.8     38.5           3.3
Flowers I      37.8      12.8           3.0
Flowers II     8.1       22.[illegible] 0.3
Flowers III    76.5      73.[illegible] 1.0
Fruits I       425.9     74.5           5.7
Fruits II      73.1      14.[illegible] 4.9
Fruits III     65.2      65.2           1.0
Fruits IV      132.8     [illegible]    1.7
Fruits V       83.7      20.7           4.2

Table 3.8 Levels of specific RiPAL transcripts in different organs. The absolute levels of the two RiPAL transcripts were determined in developmental stage III of fruits (Figure 3.9). The levels of transcripts of each cDNA in other organs were then determined based on the relative ratio (target:competitor) of expression of each cDNA in those tissues compared to that seen in stage III fruits (Figure 3.8).

3.4 Discussion

I have used a combination of approaches to identify the repertoire of PAL genes in Rubus and to study the tissue-specific and developmental expression of each gene family member during ripening of Rubus fruits. To my knowledge this is the first comprehensive study of the structure, expression and evolution of PAL genes from a fruit crop; the only other report about the regulation of PAL genes during fruit ripening comes from preliminary work in melon fruits (Diallinas and Kanellis, 1994).

To identify the PAL genes in Rubus, I used degenerate PCR primers that target evolutionarily conserved sequences in PAL. Such a PCR-based homology approach has proven successful in identifying multiple divergent PAL genes in pine (Butland et al., 1998), a plant in which an initial cDNA library screen had led to the erroneous conclusion that PAL might be encoded by a single gene in gymnosperms (Whetten and Sederoff, 1992).
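A degenerate primer is a pool of related oligonucleotides written with IUPAC ambiguity codes, and its degeneracy (the number of distinct sequences in the pool) is the product of the number of bases allowed at each position. The sketch below illustrates this with a hypothetical 9-nt stretch back-translated from the tripeptide Thr-Thr-Gly (ACN-ACN-GGN under the standard genetic code); it is not the sequence of any primer used in this work, and the function names are illustrative.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer: str):
    """List every concrete sequence represented by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


# Hypothetical example: all codons for Thr-Thr-Gly.
example = "ACNACNGGN"
print(example, "represents", degeneracy(example), "sequences")
print(expand("GGN"))
```

Because degeneracy grows multiplicatively, primers are usually placed over residues with few codons, and the 3' end is often kept non-degenerate; whether the primers in this work followed those conventions is not stated in this section.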
By sequencing several PCR-derived clones, I isolated and characterized partial fragments of two distinct Rubus PAL genes (Ripal1 and Ripal2), distinguishable by a polymorphic EcoRI restriction endonuclease site. The characterization of putative alleles of Ripal1 and Ripal2 (Ripal1-2 and Ripal2-2, respectively) suggests that my screening of the colonies obtained from PCR-amplified products was exhaustive and that it is unlikely that additional RiPAL genes would be found within the PCR-amplified products. The two fragments designated Ripal1 and Ripal2 were 71% identical to each other at the nucleotide sequence level, demonstrating that the PCR primers used in this study are capable of amplifying highly divergent PAL genes from a single species. Using the same sets of degenerate PCR primers, Butland et al. (1998) identified five PAL loci in Pinus banksiana that share 69-94% nucleotide sequence identity within a 366-bp region similar to that used for comparison of the Rubus PAL genes (Figure 3.3). Likewise, McQuoid and Ellis (1999) have used the primer pair P7-P2 to successfully sample the presence of PAL genes in various plant species, including those belonging to the Rhodophyta, Phaeophyta, Chlorophyta, Hepatophyta, Bryophyta, Anthocerophyta, and Lycophyta. This suggests that the target sequences of the primers are highly conserved across divergent members of the PAL gene family and across great evolutionary distances. Support for this hypothesis also comes from comparing the conservation of the sequences targeted by the primers in a catalytically similar enzyme, histidine ammonia-lyase (HAL, EC 4.3.1.3). HAL, like PAL, is a homotetrameric enzyme. It catalyzes the non-oxidative deamination of the aromatic amino acid histidine, to release urocanic acid and NH4+ (Hernandez and Phillips, 1994).
The similarity in reaction mechanisms between HAL and PAL is reflected in the amino acid sequence conservation between HAL and PAL, which may be important for establishing the architecture and function of the catalytic center. The crystal structure of Pseudomonas putida HAL reveals that the HAL residues corresponding to the positions targeted by the primers P5 (53-60), P7 (140-146), P2 (405-411), and P6 (411-417) (HAL residue numbers in brackets) are required for the formation of the active sites of the tetramer (Schwede et al., 1999). In particular, the amino acid motif NTG (56-58) in HAL likely narrows the access to the active center by forming a mobile loop (Schwede et al., 1999). In PAL, these amino acids correspond to the amino acids TTG incorporated within the primer P5. Since these primers target conserved sequences that are required to maintain the catalytic properties of the enzyme, their ability to amplify PAL genes from diverse species is understandable.

The primary structure of the partial genomic fragments of the two RiPAL genes was similar to that from other species. The genomic DNA sequence of PAL genes has been characterized for some thirty-six genes representing twenty species. In all angiosperm PAL genes, except for two Arabidopsis genes (AtPAL3 and AtPAL4), a single intron occurs after the second nucleotide in the codon for a conserved arginine residue (Table 3.3). In PAL genes from potato (Joos and Hahlbrock, 1992) and orchid (Liew et al., 1996), this arginine is replaced by a lysine but the position of the intron is conserved. The sizes of the introns sequenced thus far vary from 90 bp in pea PAL2 (Yamada et al., 1992) to 1720 bp in bean PAL3 (Cramer et al., 1989). The two characterized poplar genomic PAL genes each contain an intron of the same size and at a position conserved as in other PALs.
The sequences around the junctions of the single intron in Ripal1 and Ripal2 are very similar to the consensus eukaryotic splice sites, as is true of the introns in other PAL genes. Interestingly, PAL genes identified in gymnosperms thus far differ from angiosperm PALs in lacking this intron (Butland et al., 1998). While gain or loss of introns has been implicated as a common event in the evolution of other gene families, such as catalase (Frugoli et al., 1998), a detailed analysis of this phenomenon has not been conducted for the PAL gene family. It would be interesting to establish whether gain and loss of introns has played a crucial role in the evolution of divergent members of the PAL gene family.

Extensive screening of a cDNA library representing partially ripe fruits of Rubus resulted in the isolation of two cDNAs that could be differentiated by polymorphic sites for a set of restriction endonucleases (Figure 3.4). Each cDNA shared an exact sequence match to one of the two partial PCR fragments, Ripal1 or Ripal2. These cDNAs thus represented the full-length sequences of the partial genomic fragments identified using the PCR-based homology search. RiPAL1 and RiPAL2 exhibit very similar organization in terms of the polypeptide encoded by the respective open reading frames. The calculated molecular masses (77580 and 79356 Da) of the two deduced polypeptides are consistent with the size determined for PAL polypeptides (72000-83000 Da) from other plants using SDS/PAGE (Schomburg and Salzmann, 1990). The deduced amino acid sequences of the Rubus PAL genes represented by RiPAL1 and RiPAL2 are clearly related to those of other PAL genes and are more closely related to dicotyledonous angiosperm PALs than to those of monocots or gymnosperms (Table 3.6). Comparisons of RiPAL sequences to those of other PALs reveal large stretches of amino acid identity, mainly restricted to the predicted second exon (Figure 3.6), as has also been noted before (Subramanium et al., 1993).
This conservation in the second exon is not unexpected, as the 2.1 Å resolution crystal structure of HAL reveals that the five-helix bundle of the C-terminal domain is required to assemble the central core of the enzyme (Schwede et al., 1999). Further, sequence regions of the C-terminal domain are also required to form the active site of the enzyme.

Several lines of evidence support the prediction that at least two PAL genes exist in the Rubus genome. First, amplification of Rubus genomic DNA with different sets of degenerate PCR primers yielded two divergent classes of genes. Second, screening of a cDNA library at low stringency grouped all positive clones into two groups: clones containing PstI, EcoRI and BamHI sites and clones that lack these sites (Figure 3.4). PAL gene polymorphism was also reflected in the sequence heterogeneity within the 5'- and 3'-untranslated regions of the two cDNAs. Finally, in a Southern blot analysis with each of the two RiPAL cDNAs, a single genomic restriction fragment hybridized to each cDNA (data not shown). These data suggest that the two RiPAL genes likely constitute the complete gene family; however, I cannot with certainty exclude the existence of additional, perhaps even more diverged, copies.

The representation of the partial fragments of the two Rubus PAL genes in the cDNA library confirms that the two genes are not pseudogenes. Since it is unknown what percentage of the genes in a eukaryotic genome actually constitute pseudogenes, one cannot predict the proportion of genomic PCR-amplified fragments that might arise from pseudogenes. Sequence analysis of the cDNAs also demonstrated that the introns detected in Ripal1 and Ripal2 were spliced at the expected positions, thus generating the highly conserved arginine codon found in most plant PAL sequences. Interestingly, the deduced amino acid sequences of the two cDNAs contain several potential post-translational modification sites.
While heterologous expression of yeast PAL (Orum and Rasmussen, 1992), parsley PAL (Shulz et al., 1989; Appert et al., 1994) and poplar PAL (Osakabe et al., 1996) in E. coli has demonstrated that post-translational modification of the enzyme is not essential for catalytic activity, it has been postulated that glycosylation and phosphorylation may be required for stability, localization, and for positioning of the active sites for optimal activity (Havir, 1979; Shaw et al., 1990). Support for a role for post-translational phosphorylation comes from the recent work of Allwood et al. (1999), who showed that Thr545 in poplar PAL is the site of phosphorylation by a Ca2+-stimulated protein kinase of 55 kDa from bean. Based on their analysis of the stability of the phosphorylated enzyme, the authors suggest that this phosphorylation may be involved in marking PAL subunits for turnover (Allwood et al., 1999). The presence of potential phosphorylation sites in the Rubus PAL sequences (Table 3.7) raises the possibility that protein phosphorylation may have a role in modulating Rubus PAL protein activity.

To investigate the evolution of the PAL gene family, the amino acid sequences for a wide range of PALs were compared. Only species from which a family of PAL genes had been identified were used in this analysis, with the exception of the single PAL from another non-climacteric fruit, cherry. This analysis showed that most PALs from one dicotyledonous species, or even a family, grouped together, while PALs from monocots formed a distinct cluster. This pattern suggests that gene duplication events occurred later in the evolution of angiosperms, likely after the split between monocots and dicots that is proposed to have occurred between 200 and 100 mya (Stewart and Rothwell, 1993). The more recent duplications resulted in genes from a single species that clustered together rather than with PAL genes from other species.
For example, parsley PAL is encoded by a family of four genes that cluster close together. Likewise, PALs characterized from tobacco, potato, tomato, and Lithospermum clustered with each other rather than with PALs from other species. Consistent with this phylogenetic analysis, the biochemical and molecular data for the parsley PALs expressed as recombinant proteins show that they have nearly indistinguishable catalytic properties (Appert et al., 1994), share considerable sequence similarity in their promoter elements, and are induced in a similar manner (Logemann et al., 1995). In contrast to the parsley PALs, however, the bean PALs formed two distinct clusters. Thus, it is not surprising that the bean PAL genes encode unique PAL isoforms and are differentially induced by different stimuli (Liang et al., 1989). This suggests that gene duplication and sequence divergence underlie the evolution of PAL genes in plants, as has been shown for the evolution of many other plant gene families such as actin (McDowell et al., 1995), ACC synthase (Oetiker et al., 1997), and heat-shock proteins (Waters, 1995). Gene duplication followed by divergence in regulatory and/or protein coding sequence has long been recognized as a potential source of genes with novel functional capabilities (Ohno, 1970).

In the most parsimonious tree (Figure 3.6), the two RiPALs are more closely related to genes from other species than to each other. RiPAL1, for example, formed an outgroup to most other dicot PALs, while RiPAL2 is part of the clade formed by other Rosidae PAL sequences. Similar bivalent clustering was observed for PALs from other species such as poplar, citrus and Arabidopsis. However, bootstrap support for many of these larger groups was weak. The cluster representing the AtPAL3 and AtPAL4 sequences lacked sequences from other species. AtPAL3 and AtPAL4 are the only angiosperm PAL genes known to have two introns (Wanner et al., 1995).
Thus, it is possible that AtPAL3- and AtPAL4-like sequences are unique to the family Brassicaceae. Cloning of PAL genes from other species belonging to this family with the sets of primers designed in this study could provide useful insights into this question. However, it is also possible that "two intron" PAL genes are present in other families, but are yet to be identified. While it is apparent that some PAL genes do share a high level of similarity and thus cluster together, they might contain subtle, but important, variations. This variability might occur at only one or a few positions, conferring advantageous differences such as those required for protein-protein interactions, and thus such genes would be maintained by selection. However, this remains to be demonstrated experimentally.

Based on the wealth of plant and fungal taxa in which PAL enzymatic activity has been detected (Young et al., 1966; Kim et al., 1996) or from which PAL genes have been isolated, it appears that an ancestral PAL gene probably existed before the evolution of plants. While phenolics have been detected in lower plants, PAL genes are only now being characterized from lower plants (McQuoid and Ellis, 1999). Phenolics such as lignin and flavonoids have been implicated as being necessary for the evolution of land plants. Further, while PAL activity has been detected in, and PAL genes have been isolated from, certain fungal species, PAL remains to be isolated from species other than plants and fungi. Complete sequencing of the genomes of various bacteria and archaea reveals that species such as Bacillus subtilis, Escherichia coli, Mycobacterium tuberculosis, and Methanococcus jannaschii lack PAL genes. Interestingly, some of these species have HAL. Thus, the presence of HAL in species that lack PAL, and vice versa, is enigmatic and raises questions as to the evolution of the "ammonia-lyase" class of enzymes.
To date, Streptomyces verticillatus remains the only bacterial species in which both HAL and PAL activities have been detected (Emes and Vining, 1970), though the corresponding genes have not been characterized. Thus, the overall evolution of PAL raises further interesting questions: (a) what is the current diversity of the PAL/HAL lineage; and (b) what, at both the molecular and organizational level, might have been the driving force behind the expansion and functional diversity of this multigene family. To analyze the expression pattern of a large multigene family, the ability to specifically distinguish amongst all its members is essential. The data in Figure 3.7 show that the primers designed for each gene family member in Rubus can specifically distinguish between the two RiPAL transcripts. This made it possible to assess the expression of the RiPAL genes by RT-cPCR in a range of different tissues. In particular, the various developmental stages of the flowers and fruits were examined closely (Figure 3.8). As a general rule, PAL genes display tissue-specific expression in plants (Liang et al., 1989; Wanner et al., 1995). The PAL expression profile in Rubus can be interpreted in both qualitative and quantitative terms, as shown in Figure 3.9 and Table 3.8. Both RiPAL1 and RiPAL2 are expressed in all major organs of Rubus, though at somewhat different levels. The strongest signal for RiPAL1 was detected in fruits (stage I), while the strongest signal for RiPAL2 was observed in fruits (stage IV) (Table 3.8). PAL genes are thus actively transcribed in these organs, as would be predicted, given the accumulation of large amounts of phenylpropanoid products in the ripe fruits (Goiffon et al., 1991). While the flowering and ripening developmental program clearly induced both RiPAL transcripts, their overall abundance is not obviously correlated with the process of ripening in the fruits.
In bean, all three members of the PAL gene family were found to be expressed in roots, while PAL1 and PAL2 were expressed in shoots and only PAL1 was expressed in leaves. It was also observed that PAL2 was highly expressed in petals, whereas PAL1 was weakly expressed and PAL3 was not expressed in that tissue (Liang et al., 1989). My data for abundant transcript accumulation in roots are consistent with other studies, where experiments with gene-specific probes have suggested that most of the isolated PAL genes are expressed strongly in roots and notably fewer are expressed in leaves (Kervinen et al., 1997; Lois et al., 1989; Cramer et al., 1989). Thus, while RiPAL1 and RiPAL2 have qualitatively similar expression profiles, the quantitative differences in the two transcripts imply that their cognate genes are subject to somewhat different developmental control mechanisms. This apparent divergence in behavior of the two RiPAL genes may be related to their distinct phylogenetic clustering. Overall, the results in this chapter confirm that a PCR-based homology search for gene family members has the potential to sample the PAL gene family of any organism in a manner that is independent of temporal and spatial patterns of gene expression. I anticipate that analysis of gene families using degenerate primer PCR and genomic DNA as a template will find applications in studies where use of cDNA is not informative, or is impossible due to the unavailability of mRNA. Appropriately chosen PCR amplification products could also be used to define intron positions and structures. However, the same conservation of sequence that enables such screening can also make it difficult to carry out informative phylogenetic analysis of the gene families and species.
In addition to expanding our general knowledge of PAL genes and their behavior during development, the Rubus PAL genes will provide useful tools for examining mechanisms regulating PAL gene expression during fruit ripening in this crop. The ability to transform and regenerate transgenic Rubus (Hassan et al., 1993) also offers the opportunity to analyze cis-acting elements and trans-factors involved in that regulation, as well as the potential to manipulate PAL genes in order to enhance fruit quality or production.

CHAPTER 4

The 4-coumarate:CoA ligase gene family in Rubus idaeus: cDNA structures, expression during fruit development and catalytic properties of a divergent member

4.1 Introduction

Channeling of photosynthetically fixed carbon through the phenylpropanoid pathway requires the sequential action of three enzymes: phenylalanine ammonia-lyase (PAL), cinnamate 4-hydroxylase (C4H), and 4-coumarate:CoA ligase (4CL), which together constitute the general phenylpropanoid pathway (Hahlbrock and Grisebach, 1979). As the last enzyme of this pathway, 4CL is responsible for the activation of cinnamic acid and its derivatives to their corresponding thioesters. These esters serve as central intermediates in the synthesis of more highly modified phenylpropanoid compounds that are required for various physiological functions and adaptation to environmental perturbations (Dixon and Paiva, 1995). In all plants examined so far, the 4CL gene family comprises two to three members. While in some species, e.g. Arabidopsis (Ehlting et al., 1999), poplar (Hu et al., 1998), and soybean (Uhlmann and Ebel, 1993), specific 4CL genes have been correlated with the formation of unique end-products, in others multiple forms of 4CL have been isolated that display identical or nearly identical properties, e.g. potato (Becker-Andre et al., 1991), parsley (Lozoya et al., 1988), pine (Zhang and Chiang, 1997), and poplar (Allina et al., 1998).
In aspen, the Pt4CL1 and Pt4CL2 gene products are structurally and functionally distinct, and are expressed in a compartmentalized manner (Hu et al., 1998). Pt4CL1 is expressed in lignifying xylem and the corresponding recombinant protein prefers lignin-specific substrates such as ferulic acid and 5-hydroxyferulic acid, suggesting that it plays a specific role during lignification (Hu et al., 1998). This proposed role for Pt4CL1 is corroborated by the results of antisense-mediated down-regulation of Pt4CL1 in aspen, which led to accumulation of structurally normal lignin but at substantially reduced levels (Hu et al., 1999). Aspen Pt4CL2, on the other hand, shows highest activity towards 4-coumaric acid with no activity towards 5-hydroxyferulic acid, and is expressed in leaf and stem epidermis but not in developing xylem tissue, thus suggesting that it may play a role in the biosynthesis of non-lignin phenylpropanoids (Hu et al., 1998). Likewise, in soybean, pathogen-inducible Gm4CL16 is speculated to encode a specific 4CL isoform involved in the biosynthesis of isoflavonoid phytoalexins (Uhlmann and Ebel, 1993). In Arabidopsis, recombinant At4CL1 has a high specificity for 4-coumaric acid and caffeic acid, and is the only member expressed in lignifying bolting stems, roots (Lee et al., 1995), siliques and, to a lesser extent, in leaves (Ehlting et al., 1999). At4CL2 is unusual in being able to utilize caffeic acid with a higher efficiency than 4-coumaric acid, while ferulic acid was not converted at all (Ehlting et al., 1999). At4CL2 is highly expressed in roots and siliques. At4CL3, on the other hand, has a high specificity for 4-coumaric acid relative to caffeic acid and ferulic acid, and is expressed at highest levels in flowers and at barely detectable levels in lignified organs.
This pattern suggests that the primary function of the At4CL3 isoform may be to channel activated 4-coumaric acid to the CHS reaction that feeds the flavonoid-specific branch pathways (Ehlting et al., 1999). In soybean, the two discrete 4CL genes characterized, Gm4CL14 and Gm4CL16, have been proposed to correspond to two distinct 4CL isoforms identified in that species, implying that a discrete 4CL gene can encode a function-specific isoform (Knobloch and Hahlbrock, 1975; Uhlmann and Ebel, 1993). 4CL gene expression is transcriptionally regulated and can be activated both during development and by external stimuli such as pathogen infection, elicitor treatment, wounding, MJ treatment, and UV-light irradiation (Douglas et al., 1987; Schmelzer et al., 1989; Wu and Hahlbrock, 1992; Lee et al., 1995). This activation of 4CL following elicitor treatment or UV-irradiation has been shown to occur coordinately with the activation of PAL, the first enzyme of general phenylpropanoid metabolism (Chappell and Hahlbrock, 1984). In parsley, Lois and Hahlbrock (1992) reported that PAL-2 and 4CL-2 are primarily responsible for the high constitutive levels of these enzymes seen in roots and flowering stems. A high degree of coordination in the overall expression of the PAL, C4H, and 4CL genes has also been observed in parsley cell cultures subjected to UV-light, wounding and elicitor treatment (Logemann et al., 1995). In tobacco flowers, in situ hybridization was used to show that both endogenous tobacco 4CL transcripts and those of an introduced parsley 4CL1 gene accumulate in a cell-type specific manner, and that the patterns of accumulation are generally consistent with the sites of phenylpropanoid natural product accumulation (Reinold et al., 1993).
Thus, one can speculate that divergent 4CL genes are not only employed to divert activated CoA thioesters to specific branch phenylpropanoid pathways but might also be required to control the temporal and cell-type specific accumulation of discrete end-products. The 4CL gene family in Rubus is of particular interest, since maturing raspberry fruits accumulate a novel phenylpropanoid derivative, p-hydroxyphenylbutan-2-one ("raspberry ketone"), that is a primary determinant of raspberry flavour (Schinz and Seidel, 1961; Borejsza-Wysocki and Hrazdina, 1994). Biosynthesis of the "raspberry ketone" has been shown to proceed via a two-step pathway branching from the general phenylpropanoid pathway. The first enzyme in this branch, benzalacetone synthase, catalyzes a reaction step analogous to that catalyzed by chalcone synthases, using p-coumaroyl-CoA and malonyl-CoA as substrates (Borejsza-Wysocki and Hrazdina, 1994). Raspberry fruits also accumulate other phenylpropanoid-derived metabolites, notably the anthocyanins responsible for fruit color (Goiffon et al., 1991). Biosynthesis of anthocyanins is also dependent on activated 4-coumaric acid as a precursor. This diversity of phenylpropanoid end-products in one tissue suggests that distinct 4CL genes might regulate the biosynthesis of individual metabolites in Rubus. To address this question, however, it was first necessary to characterize the 4CL gene family in this species.

4.2 Methods

4.2.1 Design of degenerate PCR primers for 4CL genes

The following sequences were aligned to generate an amino acid consensus: Arabidopsis 4CL1 (S57784), soybean 4CL14 (X69954), soybean 4CL16 (X69955), Lithospermum erythrorhizon 4CL1 (D49367), Lithospermum erythrorhizon 4CL2 (D49366), rice 4CL1 (L43362), rice 4CL2 (X52623), parsley 4CL1 (X13324), parsley 4CL2 (X13325), loblolly pine 4CL1 (U12012), loblolly pine 4CL2 (U12013), potato 4CL1 (M62755), and potato 4CL2 (AF150686).
Based on this alignment, primers were designed to amplify partially conserved regions of the first exon containing the AMP-binding motif. The target locations of the primers within a 4CL gene are illustrated in Figure 4.1. Primers C1 and C3 target a highly conserved seven amino acid sequence (RSKLPDI) located near the N-terminus of the protein. Several amino acids in this region have six-fold codon degeneracy, which creates a high level of primer degeneracy. To decrease the complexity of the primers, the nucleotide sequences of the 4CL genes in this region were aligned and the codon preference for 4CL genes was analyzed. Three codons for arginine (AGA, CGA and CGG) and four codons for serine (UCU, UCC, UCG and UCA) were found to be the codons of preference and were therefore incorporated in the primers. Since all six codons for leucine are used in the 4CL genes, these codons were split between the two primers, C1 and C3. C1 and C3 can thus be considered sister primers, where C1 accommodates the leucine codons CUU, CUC, CUA, and CUG, and C3 targets codons UUA and UUG. Primers C2 and C4 were designed to target a seven amino acid region [P(LM)FHI(YF)(SA)] corresponding to positions 255-261 in Arabidopsis 4CL1. Primers C2 and C4 were identical, other than at the position for leucine. Primer C2 incorporated four-fold degeneracy for leucine, while primer C4 was two-fold degenerate for leucine. Sister primers C2 and C4 target the same position, but collectively they account for the codons of the different amino acids. Similarly, sister primers C6 and C8 target the partially degenerate sequence corresponding to the peptide KGVMLT, which forms part of the putative AMP-binding motif (Bairoch, 1991), and collectively account for the six leucine codons, as in primers C1 and C3.
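The degeneracy argument above can be made concrete with a short calculation. The following is a minimal sketch, not part of the original methods: it builds the standard genetic code (written with DNA letters, so U appears as T), confirms that leucine, serine and arginine each have six codons, and counts the number of distinct oligonucleotides in a degenerate primer written in the parenthesised notation of Table 4.1. The function names and the choice to treat inosine (I) as a single, non-degenerate position are assumptions made for illustration.

```python
import re

# Standard genetic code with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def codons_for(amino_acid):
    """Return all DNA codons that encode the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def primer_degeneracy(primer, inosine_counts_as_one=True):
    """Count the distinct sequences represented by a degenerate primer.

    Mixed positions are written as parenthesised groups, e.g. (AG).
    'I' denotes inosine; by default (an assumption) it is treated as one
    base that pairs with any nucleotide, so it adds no degeneracy.
    """
    seq = primer.replace(" ", "").upper()
    degeneracy = 1
    for group in re.findall(r"\(([ACGT]+)\)", seq):
        degeneracy *= len(set(group))
    if not inosine_counts_as_one:
        # Alternative view: each inosine stands in for a four-base mixture.
        fixed_positions = re.sub(r"\([ACGT]+\)", "", seq)
        degeneracy *= 4 ** fixed_positions.count("I")
    return degeneracy


# Gene-specific portions of primers C1 and C3 as listed in Table 4.1
# (restriction-site tails omitted).
C1 = "G(AG) TCI AA(GA) CTI CCI GA(TC) AT"
C3 = "G(AG) TCI AA(AG) TT(AG) CCI GA(CT) AT"

if __name__ == "__main__":
    for aa in "LSR":
        print(aa, len(codons_for(aa)), codons_for(aa))
    print("C1 degeneracy:", primer_degeneracy(C1))
    print("C3 degeneracy:", primer_degeneracy(C3))
```

Splitting the leucine codons between two sister primers, as described above, keeps each pool small; the same function can be applied to a single combined primer to see how quickly the pool would otherwise grow.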
Primer  Degenerate primer sequence                                          AA binding site
C1      5'-cggaattc G(AG) TCI AA(GA) CTI CCI GA(TC) AT-3'                   R26SKL(PQ)DI30
C3      5'-cggaattc G(AG) TCI AA(AG) TT(AG) CCI GA(CT) AT-3'                R26SKL(PQ)DI30
C2      5'-gctctaga GA (GA)TA (AGT)AT (AG)TG (AG)AA IAG (AG)GG-3'           P255(LM)FHI(YF)(SA)261
C4      5'-gctctaga GA (GA)TA (AGT)AT (GA)TG (AG)AA (TC)AA (AG)GG-3'        P255(LM)FHI(YF)(SA)261
C6      5'-gctctaga GT IAG CAT (CA)AC ICC (CT)TT-3'                         K218GVMLT223
C8      5'-gctctaga GT (TA)AA CAT (CA)AC ICC (CT)TT-3'                      K218GVMLT223

Table 4.1 Nucleotide sequences of the primers used for amplification of the Rubus 4CL gene family. The lower case letters denote the restriction enzyme recognition sites incorporated within each primer to facilitate PCR product cloning. The binding sites of the primers relative to Arabidopsis 4CL1 (GenBank, S57784) are denoted by the amino acids corresponding to each primer. AA = amino acid.

Restriction enzyme recognition sites (Table 4.1, lower case) were incorporated at the 5'-end of each primer to facilitate cloning of the amplified products into pUC19. Forward primers (odd numbered/sense-strand primers) C1 and C3 have an EcoRI site, while reverse primers C2, C4, C6 and C8 (even numbered/antisense primers) have a recognition site for XbaI. Table 4.1 summarizes the sequences of the primers and their respective binding positions relative to Arabidopsis 4CL1 (S57784).

4.2.2 PCR amplification and characterization of PCR products

Details of the PCR methods used are described in Chapter 2. Amplified products of two independent PCR reactions from each primer combination were subcloned into the EcoRI and XbaI sites of pUC19. Multiple clones from each subcloning experiment were amplified in a PCR reaction with vector-specific primers M13R and M13F to check for the presence or absence of inserts. The inserts were digested with restriction enzymes according to the manufacturer's protocol.
4.2.3 Screening of Rubus cDNA library for 4CL genes

A mixed population of fragments of Ri4cl1 and Ri4cl2 obtained by amplification of the Rubus genome with 4CL-gene-specific primers (Figure 4.1) was used as a probe to screen the library as described in Chapter 2 (Section 2.5). The inserts of the positive plaques were amplified with vector-specific primer T3 and a pool of 4CL-gene-specific primers (C6 and C8) to verify the presence and the size of the inserts. Amplified fragments of putative full-length inserts were digested with restriction enzymes to fingerprint the different classes of clones. One representative plaque from each class, harboring the largest insert, was rescued as a pBluescript II SK(-) phagemid using the ExAssist helper phage.

4.2.4 Design of gene-specific primers and RT-cPCR

To distinguish between the Ri4CLs, primers were designed to target unique regions within the open reading frames of the three genes. Gene-specific primers capable of amplifying a segment from nucleotides 201 to 452 of Ri4CL1 cDNA are: C15 (5'-CGACATCCACACCTACGCC) and antisense primer C13 (5'-ACTTCACTTCGTCGCATGAT). Gene-specific primers capable of amplifying a segment from nucleotides 266 to 531 of Ri4CL2 cDNA are: C25 (5'-TTCGCGAAGCTCAACGACG) and antisense primer C23 (5'-TCGTTGAGCTTCCGCGAA). Gene-specific primers capable of amplifying a segment from nucleotides 375 to 625 of Ri4CL3 cDNA are: C35 (5'-CCTACACCTTCTCCGAAAC) and antisense primer C33 (5'-CCAACTTGAAAGTGCTGGCC). The predicted amino acid locations of these primers are shown in Figure 4.7. Composite primers (Section 2.11) contained additional nucleotides at the 3'-end of each gene-specific primer. Twenty additional nucleotides (5'-CCCCTAACAGGAATTCTGCG-3') were added at the 3'-end of each sense gene-specific primer, to generate sense gene-specific composite primers.
Similarly, twenty additional nucleotides (5'-ACCATCCTCAGATTGAAGGAC-3') were added to the 3'-end of each antisense gene-specific primer, to generate antisense gene-specific composite primers. Gene-specific primers for amplification of the Rubus Histone H3 were his15 (5'-ATGGCGCGGACGAAGGA-3') and his13 (5'-GCCTACGCCGCCCGCTCAACCTA-3'). First-strand cDNA reaction (1 µl) (Section 2.11.2) was amplified in a total volume of 20 µl containing 200 nM each PCR primer, 200 µM each dNTP, and 2.5 U of Taq DNA polymerase in 1x PCR buffer (Qiagen) and 1x Q solution (Qiagen). The thermal cycling conditions were 94°C for 5 min, followed by 25 cycles (Histone H3) or 30 cycles (Ri4CLs) of 94°C for 20 s, 59°C for 50 s, and 72°C for 50 s, and a final extension of 5 min at 72°C. PCR product (10 µl) was analyzed on a 3% TAE-agarose gel and stained with 5 µg/ml ethidium bromide.

4.2.5 Cloning of the Rubus 4CL3 gene into E. coli expression vectors

The open reading frame of Ri4CL3 cDNA was amplified with sense primer (5'-ACATGCATGCATGATATCCATTGCCTATAAT-3') and antisense primer (5'-GGGTACCGCTGCCCCCCCTCGAGGTC-3'), which introduced a unique SphI site upstream of the start codon and an XhoI site downstream of the stop codon. The SphI/XhoI fragment containing the full-length coding region of Ri4CL3 was subcloned into the SphI/SalI sites of pQE to create plasmid pQE-4CL3.

4.3 Results

4.3.1 PCR-based search for the Rubus 4CL gene-family

Genomic DNA amplification using a combination of forward primer C1/C3 with any reverse primer C2/C4/C6/C8 yielded single amplification products of the expected size (at least 600-750 nucleotides). Combinations of forward primer C1/C3 with reverse primer C2/C4 amplified fragments of ~750 bp, as predicted (Figure 4.2A, lanes 1, 2, 7 and 8). Primer pair (C1/C3)-(C6/C8) amplified fragments of ~600 bp (Figure 4.2B, lanes 1, 2, 3 and 4).
These fragment sizes matched the estimates for a 4CL gene without any intron sequences within the amplified regions. After subcloning the PCR products, I analyzed multiple clones from two independent PCR reactions with each primer combination. All clones obtained from the ~750 bp amplified products of primer combination (C1/C3)-(C2/C4) had an insert of ~500 bp. Clones analyzed from the ~600 bp amplified products of (C1/C3)-(C6/C8) had an insert of ~400 bp. This indicated that the PCR-amplified products had an internal restriction enzyme recognition site for either EcoRI or XbaI, the enzymes used for subcloning the PCR-amplified products. Direct digestion of the amplified PCR products with EcoRI and/or XbaI revealed that all amplified fragments had an internal EcoRI site. It was estimated that this site was ~225 bp downstream from the 5'-end of the amplified products, as the products had retained the integral 3'-end XbaI site. A closer analysis of the nucleotide alignment of known 4CL genes from Arabidopsis, Lithospermum, potato, rice, and pine revealed a highly conserved EcoRI site at the 5'-end of the gene. Thus, it was quite possible that multiple classes of 4CL genes could show a similar fingerprint pattern following digestion with EcoRI. To further discriminate between the different classes of clones, multiple clones were fingerprinted with six other restriction enzymes (SalI, BglI, HindIII, HincII, EcoRV, BamHI). This analysis revealed additional polymorphism for restriction enzyme recognition sites, and led to the identification of two different classes of clones. Class I clones did not have recognition sites for SalI, BglI, and HincII, while class II clones had internal recognition sites for SalI, BglI, and HincII. Both classes showed a similar fingerprint pattern with XbaI, EcoRI, EcoRV, HindIII, and BamHI digests.
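The restriction fingerprinting logic described above can be illustrated in silico. The following is a minimal sketch under stated assumptions: the function names are hypothetical, only the well-known palindromic recognition sequences of EcoRI (GAATTC) and SalI (GTCGAC) are scanned, and the two 50-nucleotide test strings are short excerpts copied from the Ri4cl1/Ri4cl2 alignment in Figure 4.3, not the full clones. Because both sites are palindromic, scanning one strand is sufficient.

```python
import re

# Recognition sequences for two of the enzymes used in the fingerprinting.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "SalI": "GTCGAC",
}


def find_sites(sequence, site):
    """Return 1-based start positions of every occurrence of a recognition site.

    A lookahead is used so that overlapping occurrences are also reported.
    """
    seq = sequence.upper()
    return [m.start() + 1 for m in re.finditer(f"(?={site})", seq)]


def fingerprint(sequence):
    """Map each enzyme name to the list of site positions in the sequence."""
    return {
        enzyme: find_sites(sequence, site)
        for enzyme, site in RECOGNITION_SITES.items()
    }


# Excerpts (alignment columns 351-400) taken from Figure 4.3.
RI4CL1_EXCERPT = "ACAAAGGGCTGGTGACGAGCGTGTCTCAGCAGGTGGACGGAGAGAATCCG"
RI4CL2_EXCERPT = "ACAAGGGGCTGGTGACAAGCGTGGCGCAGCAGGTCGACGGTGAAAACCCA"

if __name__ == "__main__":
    print("Ri4cl1 excerpt:", fingerprint(RI4CL1_EXCERPT))
    print("Ri4cl2 excerpt:", fingerprint(RI4CL2_EXCERPT))
```

Within these excerpts, only the Ri4cl2 sequence carries a SalI site, which appears consistent with SalI being one of the enzymes that separates the two classes of clones; a full analysis would need the complete insert sequences and the remaining enzymes.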
Figure 4.1 is a graphic representation of the amplified regions of the two clones and the position of the recognition site for the polymorphic restriction enzyme SalI. Table 4.2 summarizes the efficiency of different primer sets in amplifying the two classes of clones identified by RFLP. Out of 300-500 clones analyzed from each primer pair combination, 83-90% of the clones were found to belong to class II, indicating a primer bias in favour of class II sequences. Furthermore, no class I clones were recovered following amplification with primer pair combination C3-C4. Interestingly, combinations of primer C8 with forward primers C1/C3 amplified class I clones at a higher frequency than did a similar combination of the forward primers C1/C3 with the sister reverse primer C6. Class I clones represent 10-12% of the total clones analyzed from primer combinations (C1/C3)-C8, as compared to only 5-7% of clones from (C1/C3)-C6. The sequenced regions of Ri4cl1 and Ri4cl2 consisted in both cases of a portion of an open reading frame whose predicted amino acid sequence showed high homology to other 4CL proteins in the database. Neither had an intron within the sequenced region, consistent with the size of the PCR-amplified products. As predicted, inserts of each class of Rubus 4CL also lacked the sequence for the forward primers C1/C3, and had instead an EcoRI site at their 5'-end. Using the CLUSTAL program in PC/GENE, I compared a stretch of 438 nucleotides (146 aa) (Figure 4.1) sequenced from all 47 clones. Twenty-two of the twenty-three class I clones were identical over the 438 nucleotide region, and this sequence was designated as the Ri4cl1 gene (Figure 4.3). A single clone within this class had single base substitutions at nucleotide positions 100 (G→C), 247 (G→T), and 395 (C→T).
These substitutions were not detected in other class I clones, and their frequency falls within the estimated level of error determined for Taq DNA polymerase (Butland et al., 1998). All class II-type clones compared had exactly the same sequence within this region, which was designated as the Ri4cl2 gene (Figure 4.3). Thus, by using a PCR-based homology search I was able to identify at least two members of the Rubus 4CL gene family. Ri4cl1 and Ri4cl2 share 72% identical nucleotide sequence, 82% similar amino acid sequence, and 71% identical amino acid sequence (Table 4.3) in the 438 nucleotides (146 aa) upstream of primer C2/C4. Ri4cl1 shares 70% amino acid identity with Arabidopsis 4CL1, while Ri4cl2 has 82% amino acid identity with Arabidopsis 4CL1. Comparison with other 4CLs shows that the amino acid identity values range from a low of 45% between rice 4CL1 and Ri4cl1 to a high of 75% between tobacco 4CL1 and Ri4cl2 (Table 4.3). I wanted to analyze the phylogenetic relationships of the two Rubus 4CL genes to the 4CL gene family members identified from other plant species. It was, however, not clear whether one can use the partial fragments identified by such a PCR screen to establish a reliable phylogenetic tree for 4CL genes. I chose Arabidopsis and poplar as the species of comparison, since in each case multiple and highly divergent 4CL genes have been identified by extensive screening of a gDNA/cDNA library and/or by PCR-based homology search (Ehlting et al., 1999; Hu et al., 1998; Allina et al., 1998). A phylogenetic reconstruction based on the alignments of comparable regions of 4CL gene family members identified from Arabidopsis (Ehlting et al., 1999), Populus spp. (Hu et al., 1998; Allina et al., 1998) and Rubus (this study) revealed that the two classes of evolutionarily distinct 4CL genes originally identified by Ehlting et al. (1999) are resolved using these gene fragments (Figure 4.4).
This conclusion was supported by the high bootstrap value (100%) obtained for the branch node separating the two classes. Within the class I-type cluster, there were two sub-categories. In the first, Ri4cl1 clustered with aspen Pt4CL1, and with Arabidopsis At4CL1 and At4CL2. Ri4cl2, on the other hand, clustered with the 4CL1 and 4CL2 identified from hybrid poplar (Allina et al., 1998). It was interesting to note that aspen Pt4CL1 did not cluster with the 4CLs identified from hybrid poplar, although they belong to the same genus (Populus). It was also obvious from this analysis that I had not yet found a Rubus ortholog of the class II cluster, indicating that there might be additional 4CL gene(s) in Rubus that had not been detected by the PCR-based homology search.

[Figure 4.1: schematic of Arabidopsis 4CL1 showing the primer target sequences RSKLPDI, KGVMLT and P(L/M)FHI(Y/F)(S/A), a 300 bp scale bar, and the amplified regions of Ri4cl1 and Ri4cl2 with their EcoRI (RI), SalI (SI) and XbaI (XI) sites.]

Figure 4.1 Positions of the 4CL gene-specific primers relative to Arabidopsis 4CL1 (GenBank, S57784). Arrows indicate the target locations of the degenerate PCR primers. The amino acid sequences on which they are based are written below the primers. I indicates the introns found within the coding regions of the gene. The amplified region of the Rubus 4cl genes is represented by a straight line. The dark lines indicate the 438 bp (146 aa) region cloned, sequenced and compared. Subcloned amplification products had an EcoRI (RI) and an XbaI (XI) site at their 5'- and 3'-ends respectively, since each gene had an internal EcoRI (RI) site.

[Figure 4.2: agarose gel images, panels A and B; lane contents are listed in the caption.]

Figure 4.2 PCR amplification of 4CL gene fragments from Rubus using 4CL gene-specific primers. A) Amplification of the Rubus genome with primer combinations (C1/C3)-(C2/C4).
Lane 1, C1-C2; lane 2, C1-C4; lane 3, C1; lane 4, C2; lane 5, C4; lane 6, no template control; lane 7, C3-C2; lane 8, C3-C4; lane 9, C3; lane 10, C2; lane 11, C4; lane 12, no template control. B) Amplification of the Rubus genome with primer combinations (C1/C3)-(C6/C8). Lane 1, C1-C6; lane 2, C1-C8; lane 3, C3-C6; lane 4, C3-C8; lane 5, C1; lane 6, C3; lane 7, C6; lane 8, C8. M, 1 kb molecular weight standard.

Primer   Clones          Clone classification (%)       Clones sequenced (#)
pair     analyzed (no.)  Class I   Class II   Others    Class I   Class II
C1-C2    500              8.6      90.5        0.9      5         4
C1-C4    500              6.3      83.3       10.4      2         5
C3-C2    300              1.3      86.6       12.7      6         5
C3-C4    300              0.0      90.1        9.9      2         2
C1-C6    300              5.3      89.5        5.2      2         2
C1-C8    300             12.0      85.0        3.0      2         2
C3-C6    300              7.2      84.4        8.4      2         2
C3-C8    300             10.3      86.3        3.4      2         2

Table 4.2 Summary of analysis and characterization of putative Rubus 4CL clones from each primer pair. A number of clones, as indicated, from each primer pair were fingerprinted with multiple restriction enzymes, and classified into groups based on the fingerprints. Each class of clones obtained is shown as a percent of the total clones analyzed. Clones that were not 4CLs (confirmed by Southern blots) or that lacked inserts account for the remaining clones ("others").
Ri4cl1  GAATTCGTCTTTGCCTTCTTGGGAGCCTCGTTCTGCGGAGCCATGATGAC  50
Ri4cl2  GAATTCGCCTTCGCATTCCTCGGCGCCTCGTACATCGGAGCCATGAGCAC

Ri4cl1  AGCCGCCAACCCTTTCTTCACTCCGGCGGAGATCGCGAAACAAGCCAAGG  100
Ri4cl2  AACAGCCAACCCTTTCTACACTCCGGCCGAGGTGGCCAAGCAAGCCAAAG

Ri4cl1  CATCGAAGGCGAAGCTGATCATCACTTTCGCTTGCTATTACGACAAAGTA  150
Ri4cl2  CCTCCAACGCCAAGCTCATCATAACTCAGTCGGCCTACGTGGACAAGGTT

Ri4cl1  AAAGACTTATC-ATGCG--ACGAAGTGAAGTTGATGTGCATTGACTCGCC  197
Ri4cl2  AAGGACTTCGCGAAGCTCAACGACGTGAAAGTCATGTGCGTGGACGAAAC

Ri4cl1  GCCACCTGACTCGTCTTGTCTT-CATTTCTCCGAACTGACTCAGTCAGAC  246
Ri4cl2  ATCGTCTGA---GGAT-GTCTTGCATTTTTCGGAGTTGATGTCCGCTGAC

Ri4cl1  GAGAACGACGTGCCGGATGTGGACATCAGCCCGGACGACGTCGTGGCGTT  296
Ri4cl2  GAGAGCGAAACCCCGGCCGTCAAAATCAACCCGGACGACGTCGTCGCGCT

Ri4cl1  ACCTTATTCCTCCGGGACGACGGGACTGCCGAAAGGGGTGATGTTGACGC  346
Ri4cl2  CCCGTATTCTTCGGGCACCACGGGGCTGCCGAAAGGCGTTATGCTGACGC

Ri4cl1  ACAAAGGGCTGGTGACGAGCGTGTCTCAGCAGGTGGACGGAGAGAATCCG  396
Ri4cl2  ACAAGGGGCTGGTGACAAGCGTGGCGCAGCAGGTCGACGGTGAAAACCCA

Ri4cl1  AATTTGTACTACAGCAGCGACGACGTCGTTCTGTGCGTGCTG  438
Ri4cl2  AATCTGTACTTTCACAAGGAGGACGTGATCCTCTGCGTGCTG

Figure 4.3 Nucleotide sequence of the two partial Rubus idaeus 4CL genes. The 438 nucleotides of Ri4cl1 and Ri4cl2 that were characterized in detail are shown, with numbering referring to Ri4cl1. This region corresponds to nucleotides 299 to 737 of Arabidopsis 4CL1 (S57784).
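Percent identity values of the kind reported for Ri4cl1 and Ri4cl2 can be derived from a pairwise alignment such as the one in Figure 4.3. The following is a minimal sketch, not the CLUSTAL/PC-GENE procedure used in this study: the function name is hypothetical, and the decision to exclude gap-containing columns from the denominator is an assumption (other conventions count gaps as mismatches, which gives slightly lower values). The example input is the first 50-column block of the Figure 4.3 alignment.

```python
def percent_identity(aligned_a, aligned_b, gap_chars="-"):
    """Percent identity between two aligned sequences of equal length.

    Columns in which either sequence has a gap are skipped, so the
    denominator is the number of gap-free columns (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue  # ignore gap-containing columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# First block (columns 1-50) of the Ri4cl1/Ri4cl2 alignment in Figure 4.3.
RI4CL1_BLOCK1 = "GAATTCGTCTTTGCCTTCTTGGGAGCCTCGTTCTGCGGAGCCATGATGAC"
RI4CL2_BLOCK1 = "GAATTCGCCTTCGCATTCCTCGGCGCCTCGTACATCGGAGCCATGAGCAC"

if __name__ == "__main__":
    value = percent_identity(RI4CL1_BLOCK1, RI4CL2_BLOCK1)
    print(f"Identity over first 50 columns: {value:.1f}%")
```

Identity over a single block will differ from the value for the whole 438-nucleotide region; applying the function to the concatenated blocks would give the region-wide figure under the same gap convention.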
           Ri4cl1                              Ri4cl2                              GenBank
           %NT       %AA       %AA             %NT       %AA       %AA             accession
           identity  identity  similarity      identity  identity  similarity      number
Ri4cl1     -         -         -               72        71        82              AF270933
Ri4cl2     72        71        82              -         -         -               AF270934
At4CL1     68        65        70              65        65        78              AF106084
At4CL2     68        62        72              67        63        78              AF106085
At4CL3     59        55        69              61        60        74              AF106086
Gm4CL16    72        51        70              67        67        79              X69955
Le4CL1     67        62        70              64        69        81              D49366
Le4CL2     57        51        64              57        56        71              D49367
Nt4CL      68        63        74              64        70        83              D43773
Nt4CL1     69        65        76              70        75        85              U50845
Nt4CL2     71        65        77              67        72        85              U50846
Os4CL1     59        45        56              61        52        63              X52623
Os4CL2     60        44        57              62        50        63              L43362
Pc4CL1     65        62        76              67        71        84              X13324
Pc4CL2     64        62        76              66        72        85              X13325
Pb4CL1     67        67        74              68        75        86              AF008183
Pb4CL2     66        65        75              68        77        87              AF008184
Pt4CL1     70        65        77              68        77        87              AF041049
Pt4CL2     70        53        67              68        77        87              AF041050
Pit4CL1    65        58        71              67        68        80              U12012
Pit4CL2    65        58        71              67        68        80              U12013
St4CL1     68        63        75              67        71        84              M62755
St4CL2a    68        64        76              68        72        85              AF150686
Vp4CL      62        66        74              66        73        83              X75542

Table 4.3 Sequence similarity of partial Rubus 4CL genes compared to each other, and to other 4CL genes (a), within the 438 nucleotide region shown in Figure 4.1.
(a) Rubus idaeus (Ri), Arabidopsis thaliana (At), Glycine max (Gm), Lithospermum erythrorhizon (Le), Nicotiana tabacum (Nt), Oryza sativa (Os), Pinus taeda (Pit), Populus tremuloides (Pt), Populus balsamifera subsp. trichocarpa X Populus deltoides (Pb), Petroselinum crispum (Pc), Solanum tuberosum (St), Vanilla planifolia (Vp).

[Figure 4.4: phylogenetic tree of Pit4CL1, Pit4CL2, Pt4CL1, Pt4CL2, Ri4CL1, Ri4CL2, At4CL1, At4CL2, At4CL3, Pb4CL1 and Pb4CL2, with clusters labelled Class I and Class II.]

Figure 4.4 Relationship of Rubus 4CLs to members characterized from Arabidopsis thaliana and Populus spp. The most parsimonious tree was found using the heuristic search algorithm, PAUP 4.0b2. The number of steps between nodes is shown above the branch length, and bootstrap percentages (1000 bootstrap replicates) supporting the nodes are underlined. The tree had a consistency index of 0.888.
The allelic 4CL sequences from pine (Pit; Pinus taeda) were used to root the tree. Abbreviations of the gene names are as follows: Rubus idaeus (Ri), Arabidopsis thaliana (At), Pinus taeda (Pit), Populus tremuloides (Pt), Populus balsamifera subsp. trichocarpa X Populus deltoides (Pb).

4.3.2 Isolation and characterization of ripening-related 4CL cDNAs

A cDNA library representing partially-ripe fruit mRNA of Rubus was screened using a mixture of the PCR fragments Ri4cl1 and Ri4cl2 (Figure 4.1). Three rounds of low-stringency screening resulted in the isolation of 18 positive plaques. Direct amplification of the plaques in a PCR reaction with vector-specific primers T3 and T7, or with T3 and the 4CL-gene-specific reverse primers C2+C4, confirmed that 12 plaques had potentially full-length 4CL cDNAs (data not shown). Analysis of these 12 cDNAs with restriction enzymes grouped the cDNAs into three groups. Representatives from the group 1 and 2 clones contained an EcoRI site 250 bp from the 5'-end. Group 2 clones contained additional restriction sites for SalI and BglI. Group 3 cDNA clones had multiple EcoRI cut sites at the 3'-end, and a HindIII site that split the cDNA into two almost equal-length fragments. Detailed restriction maps of these clones are shown in Figure 4.5. Based on the restriction digest pattern, it was apparent that group 1 clones possessed a restriction digest pattern similar to that of the PCR fragment Ri4cl1, and that group 2 clones were likewise related to the PCR fragment of Ri4cl2. To confirm these relationships, the nucleotide and the predicted amino acid sequences of one representative clone from each group were determined. Analysis of these sequences showed that the group 1 clone shared 100% amino acid identity (three nucleotide mismatches) with the predicted amino acid sequence of Ri4cl1. The group 2 clone contained a 4CL-like coding region, and shared 100% nucleotide identity with Ri4cl2.
The group 3 clone represented a new class of Rubus 4CL gene, one which had not been identified by the PCR-based homology search. All three 4CL clones contained an in-frame methionine residue (putative start site) in a favorable sequence context for the initiation of translation (Joshi et al., 1987; Kozak, 1986). Translation of the nucleotide sequences of the three cDNAs showed that Ri4CL1 encoded a protein of 503 amino acids, while Ri4CL2 and Ri4CL3 encoded proteins of 544 and 591 amino acids, respectively. 4CLs characterized in other species are typically about 550 amino acids long. A closer analysis of the Ri4CL1 sequence revealed a region of 40 amino acids downstream of the first stop codon that still showed high homology with other 4CLs. This suggested that either this change had occurred quite recently in the evolution of the Rubus genome, or that the first stop codon detected was an artifact of the cloning process. To test the latter possibility, I used primers specific for Ri4CL1 (approximate positions shown in Figure 4.5) to amplify the corresponding region of Rubus genomic DNA by PCR. The expected size of the amplified product was at least 350 bp, since an intron has consistently been found at this position in other 4CL genomic sequences (Ehlting et al., 1999). The amplification reaction with Ri4CL1 gene-specific primers yielded products that were approximately 500 bp long, indicating that Ri4CL1 also harbored an intron within the region amplified. Fragments from two PCR reactions were sequenced in full, and both differed from the Ri4CL1 cDNA by a single nucleotide, located within the triplet that formed the first stop codon of the cDNA (Figure 4.6 shows the exact position of the change). This change created a codon for glutamine, an amino acid found at this position in other known 4CLs.
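The comparison described above can be illustrated with a minimal sketch. It uses the two 13-nucleotide exon fragments printed in Figure 4.6 and assumes, as the figure legend implies, that the final triplet of each fragment is in frame. The helper name `single_base_differences` and the two-entry codon lookup are illustrative choices, not part of the original analysis.

```python
# Exon fragments immediately upstream of the intron, as printed in Figure 4.6.
CDNA_FRAGMENT = "TATTTCAAAATAG"  # Ri4CL1 cDNA clone
GDNA_FRAGMENT = "TATTTCAAAACAG"  # Ri4CL1 genomic PCR product

# Only the two codons relevant here (standard genetic code).
CODON_MEANING = {"TAG": "stop", "CAG": "Gln"}


def single_base_differences(seq_a, seq_b):
    """Return (0-based index, base in seq_a, base in seq_b) for each mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


differences = single_base_differences(CDNA_FRAGMENT, GDNA_FRAGMENT)

# Assumption: the last three nucleotides of each fragment form one codon.
cdna_codon = CDNA_FRAGMENT[-3:]
gdna_codon = GDNA_FRAGMENT[-3:]

print(differences)
print(cdna_codon, CODON_MEANING[cdna_codon])
print(gdna_codon, CODON_MEANING[gdna_codon])
```

The single mismatch falls in the first position of the final triplet, which is what converts the stop codon of the cDNA clone into the glutamine codon of the genomic sequence.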
This strongly suggests that the original Ri4CL1 stop codon was an artifact of the cloning process. For all subsequent analyses involving Ri4CL1, I therefore replaced the first stop codon (TAG) with the glutamine codon (CAG). The Ri4CL1 cDNA comprised 1848 bp, consisting of 20 bp of 5'-untranslated region, a 1629 bp open reading frame and 199 bp of 3'-untranslated region. The Ri4CL2 cDNA comprised 2051 bp, consisting of 92 bp of 5'-untranslated region, a 1632 bp open reading frame and 327 bp of 3'-untranslated region. The Ri4CL3 cDNA comprised 2083 bp, consisting of 70 bp of 5'-untranslated region, a 1773 bp open reading frame and 240 bp of 3'-untranslated region. The 3'-untranslated regions of the three Rubus 4CL cDNAs do not contain the conserved eukaryotic polyadenylation signal AAUAAA but rather AAUAAA-like sequences, as is often the case for plant mRNA sequences (Wu et al., 1999). The calculated molecular mass and the predicted pI of each of the polypeptides predicted from the three cDNAs are shown in Table 4.4. The calculated molecular masses of the three Rubus 4CLs were within the range observed for native 4CLs from soybean (55,000; Lindl et al., 1973), Forsythia (55,000; Gross and Zenk, 1974), parsley (67,000; Knobloch and Hahlbrock, 1977) and pea (75,000; Wallis and Rhodes, 1977). It is interesting to note that the predicted amino acid sequences contain several potential post-translational modification sites. However, the biological relevance of these sites is uncertain given that (1) active preparations of 4CL enzymes have been obtained from recombinant proteins in E. coli cells and (2) the experimentally determined molecular mass of purified 4CL proteins matched the predicted mass in species such as parsley (Lozoya et al., 1988) and poplar (Allina et al., 1998), where 4CL has been studied at both the biochemical and genetic level.
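As a quick consistency check on the cDNA dimensions quoted above, the following sketch adds up the reported segments and compares each open reading frame with the protein lengths given in Table 4.4. It assumes that the quoted open reading frame lengths exclude the stop codon, which is what the numbers imply; the dictionary layout and the function name `is_consistent` are illustrative only.

```python
# Values taken from the text and Table 4.4: lengths in bp, protein length in amino acids.
# Ri4CL1 uses the corrected sequence (first stop codon read as glutamine).
CDNAS = {
    "Ri4CL1": {"utr5": 20, "orf": 1629, "utr3": 199, "total": 1848, "aa": 543},
    "Ri4CL2": {"utr5": 92, "orf": 1632, "utr3": 327, "total": 2051, "aa": 544},
    "Ri4CL3": {"utr5": 70, "orf": 1773, "utr3": 240, "total": 2083, "aa": 591},
}


def is_consistent(entry):
    """Check that the segments sum to the total and that the ORF is 3 nt per residue.

    Assumption: the reported ORF length does not count the stop codon.
    """
    segments_sum = entry["utr5"] + entry["orf"] + entry["utr3"]
    return segments_sum == entry["total"] and entry["orf"] == 3 * entry["aa"]


results = {name: is_consistent(entry) for name, entry in CDNAS.items()}
print(results)

# The uncorrected Ri4CL1 reading frame stopped 40 residues early.
truncated_length = CDNAS["Ri4CL1"]["aa"] - 40
print(truncated_length)
```

Under that assumption all three cDNAs are internally consistent, and the 503-residue length first reported for Ri4CL1 is the corrected 543 residues minus the 40-residue region downstream of the spurious stop codon.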
The coding region of Ri4CL1 showed 67% and 73% nucleotide sequence identity to those of Ri4CL2 and Ri4CL3, respectively. The predicted amino acid sequences of Ri4CL1 and Ri4CL2 were more similar to each other throughout their complete lengths (72% amino acid sequence identity) than to the predicted Ri4CL3 sequence (62% and 64% amino acid sequence identity, respectively). The identity was more pronounced in the central and C-terminal parts of the protein (Figure 4.7). A comparison of the predicted amino acid sequences of the Rubus 4CLs with other known 4CL amino acid sequences is shown in Table 4.5. In a comparison of the Rubus 4CLs with Arabidopsis 4CLs, Ri4CL1 and Ri4CL2 showed the highest similarity to At4CL1 (70% and 82% amino acid sequence identity), while Ri4CL3 was most similar to At4CL3 (71% amino acid sequence identity). In general, the amino acid identity with other 4CLs ranged from a low of 51% between Ri4CL1 and rice 4CL1 to a high of 83% between Ri4CL1 and soybean 4CL14. In a comparison with other adenylate-forming enzymes, the three cDNAs showed greatest similarity to 4CLs from plants, with decreasing homology to luciferase, long-chain-fatty-acid-CoA ligase, and peptide synthetases (data not shown). In a previous phylogenetic tree reconstruction (Figure 4.4) with the partial fragments of Rubus, Arabidopsis, and Populus 4CLs, I had shown that Ri4cl1 and Ri4cl2 belong to the class I-type 4CLs. Based on that analysis, I had concluded that partial fragments of the 4CL genes could be sufficient to draw conclusions about the phylogenetic relationships amongst various 4CLs. I had also proposed that Rubus should have an ortholog of the class II-type 4CL, which probably could not be detected by the particular primers used in my PCR-based homology screen.
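For readers unfamiliar with how identity percentages such as those above are derived, the following is a minimal sketch of one common convention: count identical residues over the aligned columns in which neither sequence has a gap. The text does not state which software or denominator was used for Table 4.5, so this convention and the function name `percent_identity` are assumptions. The example strings are the two motif II variants quoted in this chapter.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over columns where neither sequence has a gap.

    Assumption: both inputs are rows of the same alignment (equal length).
    Other conventions divide by the alignment length or by the shorter sequence.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Motif II as found in Ri4CL1/Ri4CL2 versus the Ri4CL3 variant.
motif_identity = percent_identity("GEICIRG", "GEICVRG")
print(round(motif_identity, 1))
```

Similarity percentages additionally count conservative substitutions as matches, which requires a substitution matrix or residue grouping that is not specified in the text.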
Amino acid sequence comparison (Table 4.5) showed that Ri4CL3 was more closely related to Arabidopsis 4CL3 and aspen 4CL2 (class II phylogenetic group) than to Arabidopsis 4CL1, Arabidopsis 4CL2, aspen 4CL1, hybrid poplar 4CL1 or hybrid poplar 4CL2 (class I phylogenetic group). To study the relationship of Ri4CL3 to Ri4CL1, Ri4CL2 and other 4CLs, I constructed a phylogenetic tree based on the full-length amino acid sequences of the different plant 4CLs available in the database. As seen previously, there were two distinct phylogenetic classes, class I and class II, originally described by Ehlting et al. (1999) (Figure 4.8). This was supported by a bootstrap value of 100% (1000 replications). As before, Ri4CL1 and Ri4CL2 clustered with the class I 4CLs, with Ri4CL1 clustering with soybean 4CL14 and aspen 4CL1, and Ri4CL2 showing a stronger relationship to hybrid poplar 4CL1 and 4CL2. Within the class I sequences, multiple members from Arabidopsis, tobacco, poplar, and potato clustered together. Ri4CL3, however, clustered with the class II phylogenetic class (Figure 4.8; Class II). The available monocot sequences (rice 4CL1, rice 4CL2 and sorghum 4CL) also grouped with class II. Allelic pine 4CL sequences were used to root the phylogenetic trees because they were the closest available outgroup and yielded the best alignment. However, the use of more distant roots [luciferase, or the putative AMP-binding proteins from Arabidopsis (GenBank, AAC34346) and Brassica (GenBank, 1903034)] did not affect the overall topology of the tree or the conclusions drawn (data not shown). Rubus 4CLs contain several sequence elements that are highly conserved among 4CL proteins.
The N-terminus in each of the three predicted Rubus 4CL proteins carries the domain [LP(Y/F)SSGTTGPKG] (motif I, Figure 4.7), characteristic of both prokaryotic and eukaryotic enzymes that catalyze an ATP-dependent covalent linkage of AMP to their substrate (Schroder, 1989; Stuible et al., 2000). The amino acid sequence GEICIRG (motif II, Figure 4.7) has been found to be absolutely conserved in all 4CLs characterized to date. Analysis of this region in the Rubus 4CLs showed that these seven amino acids are absolutely conserved in Ri4CL1 and Ri4CL2. Interestingly, in Ri4CL3 motif II has a slightly different amino acid sequence, GEICVRG, although the change of an isoleucine to a valine is a conservative one. The other conserved residues among the Ri4CLs include several cysteine residues (positions indicated in Figure 4.7) that are also found in most 4CLs. One or more of these cysteine residues could be important for enzymatic activity, since 4CL is strongly inhibited by alkylating agents (Knobloch and Hahlbrock, 1977).

[Figure 4.5: restriction maps of the group 1 (Ri4CL1), group 2 (Ri4CL2) and group 3 (Ri4CL3) cDNA clones, with a 300 bp scale bar; the map graphics are not recoverable.]

Figure 4.5 Restriction maps of Rubus 4CL cDNA clones. The dark lines indicate the regions of Ri4CL1 and Ri4CL2 characterized by the PCR-based homology search described in detail in section 4.3.1. The arrowheads in Ri4CL1 represent the positions of the primers used to amplify the region in the corresponding genomic fragment to determine the validity of an internal stop codon found in the cDNA clone. The asterisk represents the approximate position of that stop codon. A, AccI; HI, BamHI; B, BglI; RI, EcoRI; H, HindIII; PI, PstI; SI, SalI; X, XhoI; XI, XbaI.
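The two conserved motifs discussed above can be located in a protein sequence with simple pattern matching. The sketch below is illustrative only: the patterns are transcribed from the motif strings as written in this section, the motif II pattern is widened to accept the valine found in Ri4CL3, and the test peptide is a made-up string, not a real 4CL sequence. Coordinates are reported as 1-based inclusive ranges, the convention used in Table 4.4.

```python
import re

# Patterns transcribed from the text; character classes encode the stated
# alternatives (Y/F in motif I, and I or V in motif II to include Ri4CL3).
MOTIFS = {
    "motif I": re.compile(r"LP[YF]SSGTTGPKG"),
    "motif II": re.compile(r"GEIC[IV]RG"),
}


def locate_motifs(protein):
    """Return {motif name: (start, end, matched text)}, 1-based and inclusive."""
    hits = {}
    for name, pattern in MOTIFS.items():
        match = pattern.search(protein)
        if match:
            # match.end() is exclusive and 0-based, so it equals the 1-based inclusive end.
            hits[name] = (match.start() + 1, match.end(), match.group())
    return hits


# Hypothetical peptide used only to exercise the function.
toy_protein = "AALPFSSGTTGPKGAAGEICVRGAA"
hits = locate_motifs(toy_protein)
print(hits)
```

Applied to the full-length sequences deposited under the accession numbers in Table 4.5, the same function would report where each motif lies in each protein, provided the deposited sequences match the motif strings as written here.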
Ri4CL1-cDNA  TATTTCAAAATAG-------------------------------------  1532
Ri4CL1-gDNA  TATTTCAAAACAGgtatatatttttttaattgtttggaactaagggtcca

Ri4CL1-cDNA  --------------------------------------------------
Ri4CL1-gDNA  acgtttgcggggtttcaaaatgcttactatctttgcagtaaccatttgag

Ri4CL1-cDNA  -----------------------------------------GTTGTATTC  1541
Ri4CL1-gDNA  ttgagattgaatctaaattgccactcactgtgtttctgcagGTTGTATTC

Figure 4.6 Sequence alignment of the Ri4CL1 cDNA clone and genomic clone (gDNA) in the region of the stop codon. The position of the nucleotides coding for the "spurious" stop codon in the cDNA is underlined. An intron 127 bp in length was found within the amplified region of the genomic clone. The nucleotide sequence of the intron is indicated by lower case letters and gaps have been placed in the corresponding region of the cDNA clone. The conserved acceptor and donor splice sites of the intron are indicated in bold.

Features of the predicted proteins

Feature                 Ri4CL1    Ri4CL2    Ri4CL3
Number of amino acids   543       544       591
Predicted MW (Da)       59,259    60,004    65,903
Theoretical pI          6.41      5.63      5.38

Potential modification sites

Glycosylation sites
N-glycosylation site (PS00001)
  Ri4CL1: 34-37, 46-49, 86-89, 327-330, 462-465
  Ri4CL2: 35-38, 379-382, 385-388, 463-466, 489-492
  Ri4CL3: 11-14, 64-67, 78-81

Phosphorylation sites
Protein kinase C phosphorylation site (PS00005)
  Ri4CL1: 39-41, 61-63, 203-205, 302-304, 320-322, 486-488, 525-527
  Ri4CL2: 62-64, 204-206, 303-305, 321-323, 471-473, 518-520, 526-528
  Ri4CL3: 83-85, 93-95, 106-108, 244-246, 290-292, 343-345, 418-420
Casein kinase II phosphorylation site (PS00006)
  Ri4CL1: 48-51, 112-115, 142-145, 167-170, 180-183, 225-228, 266-269, 409-412, 493-496
  Ri4CL2: 55-58, 113-116, 156-159, 169-172, 471-474, 494-497
  Ri4CL3: 20-23, 38-41, 98-101, 156-159, 193-196, 307-310, 329-332, 450-453
Tyrosine kinase phosphorylation site (PS00007)
  Ri4CL1: 291-298, 436-443
  Ri4CL2: 292-299, 437-444
  Ri4CL3: none listed

N-myristoylation sites
N-myristoylation site (PS00008)
  Ri4CL1: 73-78, 96-101, 195-200, 206-211, 249-254, 310-315, 331-336, 359-364, 489-494
  Ri4CL2: 74-79, 102-107, 196-201, 207-212, 250-255, 332-337, 360-365, 490-495
  Ri4CL3: 110-115, 117-122, 145-150, 236-241, 320-325, 372-377, 400-405, 531-536

Signature domain I
Putative AMP-binding domain signature (PS00455)
  Ri4CL1: 188-199
  Ri4CL2: 189-200
  Ri4CL3: 229-240

Signature domain II
Putative active site region (a)
  Ri4CL1: 387-393
  Ri4CL2: 388-393
  Ri4CL3: 529-535

Table 4.4 Features of the predicted proteins corresponding to the three Rubus 4CL cDNAs. Potentially relevant modification sites of the proteins were analyzed by comparison to known patterns in the PROSITE database. The amino acid positions of these modifications are indicated for each protein. The numbers in brackets indicate the PROSITE documentation code for each type of modification.
a This domain has been identified as part of the active site of 4CL enzymes (Becker-Andre et al., 1991).

Figure 4.7 Comparison of the deduced amino acid sequences of the Rubus 4CL genes to each other and to the 4CL consensus. The consensus was determined from an alignment of the 4CL proteins mentioned in Section 4.2.1. Amino acid residues for Rubus 4CL2 and 4CL3 are shown only when they differ from the predicted sequence of Ri4CL1. Dots represent amino acids conserved in all three sequences. Dashes indicate the gaps introduced to maximize alignment. Putative initiation methionine residues are in bold; translational stop codons are indicated by asterisks; conserved cysteine residues are indicated by a plus; conserved motifs (I and II) postulated to be involved in 4CL enzyme activity are boxed. Amino acids in Ri4CL1 corresponding to those mutated in At4CL2 by Stuible et al. (2000) are shown in blue. The amino acid tracts targeted by the degenerate PCR primers are indicated by closed arrows and the amino acid tracts targeted by gene-specific primers are indicated by open arrows.

[Figure 4.7: alignment of Ri4CL1 (543 residues), Ri4CL2 (544 residues), Ri4CL3 (591 residues) and the 4CL consensus; the alignment rows are not recoverable.]

4CL (a)   Ri4CL1    Ri4CL1      Ri4CL2    Ri4CL2      Ri4CL3    Ri4CL3      GenBank
          %AA       %AA         %AA       %AA         %AA       %AA         accession
          identity  similarity  identity  similarity  identity  similarity  number
Ri4CL1    -         -           72        83          62        75          AF239685
Ri4CL2    72        83          -         -           64        75          AF239686
Ri4CL3    62        75          64        75          -         -           AF239687
At4CL1    70        80          82        90          60        75          AF106084
At4CL2    68        81          71        84          60        76          AF106085
At4CL3    62        77          66        81          71        83          AF106086
Gm4CL16   66        80          69        84          78        85          X69955 (b)
Gm4CL14   83        89          80        90          81        87          ^31686
Le4CL1    69        80          75        85          73        81          D49366
Le4CL2    62        76          62        77          62        77          D49367
Nt4CL     71        81          77        86          64        80          D43773
Nt4CL1    72        83          78        88          63        80          U50845
Nt4CL2    72        83          78        87          63        79          U50846
Os4CL1    51        73          63        78          55        73          X52623
Os4CL2    58        72          61        75          64        77          L43362
Pc4CL1    71        83          77        87          62        80          X13324
Pc4CL2    71        83          76        87          63        81          X13325
Pt4CL1    74        84          82        90          65        82          AF008184
Pt4CL2    73        83          80        90          63        80          AF008183
Pb4CL1    73        85          75        86          58        76          AF041049
Pb4CL2    63        77          67        81          73        83          AF041050
Pit4CL1   64        79          69        84          63        80          U12012
Pit4CL1   64        79          68        83          63        80          U12013
St4CL1    71        82          76        86          62        79          M62755
St4CL2a   71        82          76        87          62        79          AF150686
Vp4CL1    72        82          78        87          65        81          X75542

Table 4.5 Amino acid sequence similarity among the full-length 4CLs.
a Rubus idaeus (Ri), Arabidopsis thaliana (At), Glycine max (Gm), Lithospermum erythrorhizon (Le), Nicotiana tabacum (Nt), Oryza sativa (Os), Pinus taeda (Pit), Populus tremuloides (Pt), Populus balsamifera subsp. trichocarpa X Populus deltoides (Pb), Petroselinum crispum (Pc), Sorghum bicolor (Sb), Solanum tuberosum (St), Vanilla planifolia (Vp).
b Comparison to partial-length sequence.
[Figure 4.8: phylogenetic tree of full-length plant 4CL proteins with Class I and Class II clades, rooted on Pit4CL1 and Pit4CL2; branch lengths and bootstrap values are not recoverable.]

Figure 4.8 Phylogenetic relationships amongst plant 4CL proteins. The most parsimonious rooted phylogenetic tree was constructed from an alignment of plant 4CLs, using the heuristic search within PAUP 4.0b2. The allelic pine 4CL sequences were used as the outgroup. Branch lengths are indicated above the branch lines and clustering percent support derived from 1000 bootstrap replications is underlined. The tree has a consistency index of 0.764. Abbreviations of the gene names are as follows: Rubus idaeus (Ri), Arabidopsis thaliana (At), Glycine max (Gm), Lithospermum erythrorhizon (Le), Nicotiana tabacum (Nt), Oryza sativa (Os), Pinus taeda (Pit), Populus tremuloides (Pt), Populus balsamifera subsp. trichocarpa X Populus deltoides (Pb), Petroselinum crispum (Pc), Sorghum bicolor (Sb), Solanum tuberosum (St), Vanilla planifolia (Vp).

4.3.3 Enzymatic activity of the recombinant Ri4CL3 protein

Ri4CL3 is the most divergent of the three members of the Rubus 4CL gene family detected in this study. Ri4CL3 has a single amino acid substitution within motif II (GEICIRG) (Figure 4.7), a region that is absolutely conserved in all other 4CLs examined (Ehlting et al., 1999) and has been proposed to be associated with enzyme stability and catalytic activity (Stuible et al., 2000). The predicted Ri4CL3 protein also has an extended N-terminus which includes a string of six asparagine residues. To test the effect of these substitutions/additions on the enzymatic activity of Ri4CL3, I expressed the open reading frame of Ri4CL3 in E. coli, and determined the biochemical properties of the recombinant protein. The recombinant Ri4CL3 was expressed as a fusion protein either containing or lacking an N-terminal His6-tag.
Expression of recombinant Ri4CL3 was not affected by the presence or absence of the N-terminal His6-tag fusion (data not shown). The recombinant protein had an apparent molecular mass of approximately 65,000 Da and was recognized by antibodies raised against parsley 4CL (Figure 4.9, lane 3). Recombinant Ri4CL3 present in crude bacterial extracts was tested for its ability to utilize differently substituted hydroxycinnamic acids or benzoic acids as substrates. The recombinant protein was most active with 4-coumarate and displayed lower activity towards caffeate, with no detectable activity towards cinnamate, ferulate, sinapate, or benzoic acids (Table 4.6). No differences in the relative 4CL activities were detected when the proteins were expressed with or without the N-terminal His6-tag (data not shown), and no 4CL protein (Figure 4.9; lane 1) or 4CL activity (not shown) was detectable in the protein extracts of bacteria harboring the empty expression vector pQE.

[Figure 4.9: immunoblot, lanes 1-3; molecular mass standards at 111, 74, 45.5 and 29.5 kDa; IPTG: lane 1 +, lane 2 -, lane 3 +.]

Figure 4.9 Immunoblot analysis of recombinant Rubus 4CL3 expressed in E. coli. Crude bacterial protein extracts (15 µg) were separated in a 12% SDS-PAGE gel, transferred to a PVDF membrane, and probed with antibodies raised against parsley 4CL. Lane 1, extracts from bacterial cells carrying the control plasmid pQE-30; lanes 2 and 3, extracts from bacterial cells harboring the expression plasmid pQE-4CL3. For lanes 1 and 3, bacterial cells were induced with IPTG (1 mM) for 4 h. Molecular mass standards (kDa) are shown on the right.

Substrate (0.2 mM)         Relative specific activity
                           (4-coumaric acid = 100%)
Cinnamic acids
  Cinnamic acid            nc
  4-Coumaric acid          100
  Caffeic acid             29
  Ferulic acid             nc
  Sinapic acid             nc
Benzoic acids
  Benzoic acid             nc
  2-Hydroxybenzoic acid    nc
  3-Hydroxybenzoic acid    nc
  4-Hydroxybenzoic acid    nc

Table 4.6 Substrate utilization by E. coli-expressed Rubus 4CL3 recombinant protein.
4CL enzyme activity was measured in crude bacterial extracts expressing recombinant Ri4CL3 (pQE-4CL3). 4CL activity is expressed as a percentage of the activity obtained using 4-coumaric acid as a substrate. One hundred percent activity represents 10 nkat/mg crude protein for 4-coumaric acid. Results are averaged from three independent expression experiments and enzyme activity determinations. nc, no conversion detected.

4.3.4 Developmental regulation of the Ri4CL genes

4.3.4.1 Design of gene-specific oligonucleotide primers

Unique regions within the open reading frames of the Ri4CL cDNAs were used to design PCR primers for use in a quantitative PCR analysis (Figure 4.7). The specificity of each set of gene-specific primers was demonstrated by testing its ability to amplify DNA from plasmids carrying the alternative Ri4CL cDNAs. As shown in Figure 4.10, the primers preferentially amplified their cognate cDNAs.

[Figure 4.10: agarose gel, lanes M and 1-9, with the 600 bp marker position and the target band indicated.]

Figure 4.10 Specificity of the gene-specific primers used for QRT-PCR analysis. Lanes 1, 4 and 7, Ri4CL1 cDNA; lanes 2, 5 and 8, Ri4CL2 cDNA; lanes 3, 6 and 9, Ri4CL3 cDNA; lane 5, Ri4CL1 cDNA. Lanes 1-3, gene-specific primers for Ri4CL1; lanes 4-6, gene-specific primers for Ri4CL2; lanes 7-9, gene-specific primers for Ri4CL3. M, 100 bp molecular weight marker (Life Technologies).

4.3.4.2 Developmental expression of Ri4CL transcripts

Characterization of the Rubus 4CL cDNAs demonstrated that there were at least three classes of divergent 4CL genes. Phylogenetic analysis revealed that Ri4CL3 belonged to the class II-type 4CLs, whereas Ri4CL1 and Ri4CL2 grouped with the class I-type 4CLs.
The enzymatic properties of the recombinant Ri4CL3 protein showed that Ri4CL3 used only 4-coumarate or caffeate as substrates, in contrast to 4CLs from other species that also use other hydroxycinnamic acid derivatives such as cinnamate, ferulate, and 5-hydroxyferulate (Lee et al., 1996; Hu et al., 1998; Allina et al., 1998; Ehlting et al., 1999). To determine whether these divergent forms of Ri4CL were also differentially expressed, I initially conducted a northern blot analysis using RNA isolated from various organs of Rubus and from flowers and fruits at different developmental stages. I failed to detect any hybridization signals even after probing northern blots loaded with as much as 20 µg total RNA per lane, suggesting that Ri4CL transcripts do not comprise a substantial population within the total RNA from the various tissues studied. To determine the relative abundance of the three Ri4CL transcripts, I therefore used quantitative RT-PCR (QRT-PCR) without internal standards. To avoid making comparisons at the plateau phase of the PCR, cDNA samples representing various Rubus tissues were first amplified for 28 and 32 cycles with only the Ri4CL3 gene-specific primers. Preliminary RT-PCR data had indicated that Ri4CL3 is highly expressed in all the tissues investigated. Amplifications with combinations of Ri4CL3 gene-specific primers gave similar results with both the 28- and 32-cycle protocols, and no signal saturation was detected in any tissue sample. Thus, for further analysis I used 30 cycles of amplification with each gene-specific primer pair combination. Figure 4.11 presents these results, which reflect the developmental regulation of each gene-family member across the various Rubus organs studied.
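The legend of Figure 4.11 states that Ri4CL band intensities were normalized to the average intensity of the RiHis3 products as a control for equal input RNA. The exact formula is not given, so the sketch below shows one plausible reading: each sample's target intensity is scaled by the ratio of the mean control intensity to that sample's control intensity. The function name and all intensity values are hypothetical placeholders, not data from this study.

```python
from statistics import mean


def normalize_to_control(target, control):
    """Scale each target intensity by (mean control intensity) / (sample control intensity).

    Assumption: this is one reasonable interpretation of "normalized to the
    average intensity" of a loading-control product; the thesis does not give
    the formula.
    """
    if target.keys() != control.keys():
        raise ValueError("target and control must cover the same samples")
    control_mean = mean(control.values())
    return {
        sample: target[sample] * control_mean / control[sample]
        for sample in target
    }


# Hypothetical band intensities in arbitrary units (A.U.), for illustration only.
ri4cl_raw = {"leaf": 120.0, "shoot": 80.0, "root": 200.0}
his3_raw = {"leaf": 100.0, "shoot": 80.0, "root": 120.0}

normalized = normalize_to_control(ri4cl_raw, his3_raw)
print(normalized)
```

With this scheme a sample whose control band is weaker than average has its target signal scaled up, and a sample with a stronger control band is scaled down, so that differences in starting material do not masquerade as differences in expression.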
Since a constant amount of the same cDNA sample from each organ was amplified with each of the three Ri4CL gene-specific primer combinations, one can directly compare the expression of the three Ri4CL transcripts in each organ. In contrast to Ri4CL2 and Ri4CL3, Ri4CL1 was constitutively expressed at low levels in all organs investigated except the leaves. The abundance of the three transcripts in leaves was distributed in a ratio of ca. 10:2:1 (Ri4CL1:Ri4CL2:Ri4CL3), suggesting that Ri4CL1 is the dominant species in leaves. In the various developmental stages of flowers and fruits, with one exception (stage II fruits), the expression pattern of Ri4CL2 was reproducibly similar to that of Ri4CL3, although the relative amount of Ri4CL2 mRNA was lower than that of Ri4CL3. In shoots, Ri4CL2 was highly expressed compared to Ri4CL1 and Ri4CL3, suggesting that it is primarily responsible for synthesis of CoA esters. Although Ri4CL3 was detected in all the organs that were examined, its abundance was much lower in leaves and shoots than in roots. In maturing flowers, the levels of Ri4CL3 were high in stage I and II tissues and declined in fertilized flowers (stage III). As the fertilized flowers began to form fruits, the levels of Ri4CL3 expression increased in green immature fruits (stage I), remained at moderately high levels during stages II and III, reaching a maximum in stage III, and declined in stage IV. These results show that, although all the Ri4CL genes are expressed in all the Rubus organs studied, each family member has a distinct pattern of developmental regulation. The three Ri4CL genes investigated here belong phylogenetically to two distinct classes, which had bifurcated before the divergence of dicots and monocots.
However, the developmental regulation of the three Ri4CL genes studied here does not appear to be correlated with their phylogenetic class, since the expression pattern of Ri4CL2, which belongs to phylogenetic class I, was most closely aligned with the expression pattern of Ri4CL3, which belongs to phylogenetic class II. This indicates that the signaling mechanisms controlling their developmental expression evolved independently of the genes themselves.

Figure 4.11 Expression of specific Ri4CL transcripts in different organs of Rubus as estimated by QRT-PCR analysis. A) QRT-PCR analysis was performed using 250 ng total RNA isolated from young leaves, shoots, roots and different developmental stages of flowers and fruits. Following 30 cycles of amplification, 50% of the products were resolved in a 3% TAE-agarose/EtBr gel. B) The relative expression levels of the Ri4CL transcripts (expressed in arbitrary units (A.U.) per 250 ng of total RNA) were estimated by measuring the EtBr fluorescence emanating from the PCR products resolved by gel electrophoresis. Similar results were obtained in two independent experiments. The intensity of the Ri4CL bands was normalized to the average intensity of the RiHis3 products as a control for equal amounts of starting total RNA. The expression levels of the three Ri4CL transcripts in each organ are directly comparable in this graph. M, 100 bp DNA size marker (NEN Life Sciences).

[Figure 4.11: A) agarose gel panels, each with a 500 bp size marker position; B) bar chart of relative expression by organ, y-axis 0-400 A.U.; organ labels are not recoverable.]

4.4 Discussion

In this chapter, I describe the 4CL gene family from Rubus idaeus, a commercially important fruit crop species.
Multigene families appear to be the norm for 4CL in plants and my results confirm the existence of at least three 4CL genes in the Rubus genome. Screening of cDNA/gDNA libraries has resulted in the identification of multiple 4CL genes from Arabidopsis (Ehlting et al., 1999), rice (Zhao et al., 1990), soybean (Uhlmann and Ebel, 1993), Lithospermum (Yazaki et al., 1995), tobacco (Lee and Douglas, 1996), poplar (Allina et al., 1998), potato (Becker-Andre et al., 1991), aspen (Hu et al., 1998) and various species of pine (Wang et al., 2000). A major limitation of hybridization-based screening approaches using cDNA as the source material is that the likelihood of detecting a given family member is highly dependent on the temporal and spatial patterns of gene expression. For example, 4CL was initially suggested to exist as a single-copy gene in Arabidopsis, based on cDNA library screening (Lee et al., 1995), but later work revealed a multigene At4CL family (Ehlting et al., 1999). Several more sequences can be identified in the Arabidopsis database that also have homology (38-44% amino acid identity) to bona fide 4CLs, although the catalytic properties of these sequences remain undefined. Likewise, based on screening of a xylem cDNA library, PAL was suggested to exist as a single-copy gene in pine (Whetten and Sederoff, 1992). However, using a PCR-based homology search, Butland et al. (1998) were able to identify at least eight divergent pine PAL sequences. I therefore used a PCR-based homology search to screen the Rubus genome and identify potential members of the 4CL gene family in an expression-independent manner. PCR-based screening has also been successfully employed in the isolation of diverse terpenoid synthase genes from grand fir (Bohlmann et al., 1997), glutamine synthetase genes from monocots and dicots (Perez-Vicente et al., 1996) and protein tyrosine kinase genes from mouse, Drosophila and Caenorhabditis (Oates et al., 1998).
PCR-based screening is not without its own challenges, however. Subtle differences in primer sequences can have a pronounced effect on the amplification efficiency during PCR. Given the degeneracy of the 4CL primers, including the use of inosine, and the absence of exact knowledge of the target sequences in the genome, it was predictable that some Ri4CL sequences might be amplified more effectively than others. Primer C1 in combination with C2/C4 was more efficient than its sister primer C3 in amplifying the Ri4cl1 gene. Differences between the amplification efficiencies of sister primers C1/C3 in combination with C6/C8 were also detected (Table 4.2). In general, all the primer combinations used were more effective in amplifying Ri4cl2 than Ri4cl1. Similar discrimination with degenerate PCR primer pairs was observed by Butland et al. (1998), where primers P7-P6 amplified a single pal locus in pine, whereas P5-P2 amplified four additional pal sequence classes. This suggested that splitting highly degenerate codons between sister primers could provide an effective means of preferentially amplifying alternative gene-family members. Since I was interested in elucidating the role of specific 4CL genes in the biosynthesis of divergent polyketide end-products in fruits, I wanted to be confident that the PCR-based search had been successful in identifying the full repertoire of Rubus 4CL gene-family members. I therefore screened a fruit-specific cDNA library at low stringency with a mixed probe of Ri4cl1 and Ri4cl2. The resulting clones fell into three groups, rather than two. The deduced amino acid sequences of the group 1 and group 2 cDNAs correspond to the partial fragments Ri4cl1 and Ri4cl2, and these cDNAs were clearly the full-length versions of Ri4cl1 and Ri4cl2, respectively.
Ri4CL1 and Ri4CL2 are related most closely to dicotyledonous angiosperm 4CL genes (62-83% amino acid identity), with a weaker relationship to 4CLs from rice (51-63% amino acid identity) or pine (64-69% amino acid identity) (Table 4.5). The deduced amino acid sequence of the third group of Rubus fruit 4CL cDNAs isolated, referred to as Ri4CL3, shares only 62-64% amino acid identity with Ri4CL1 and Ri4CL2 (Table 4.5), but as with Ri4CL1 and Ri4CL2, Ri4CL3 was more closely related to angiosperm dicot 4CLs (60-81% amino acid identity) than to rice (55-64% identity) or pine sequences (63% amino acid identity) (Table 4.5). Thus, the cDNA library screening revealed that, despite the design of multiple PCR primers for amplification of 4CL genes and the analysis of many amplification products, these primers were not able to amplify all members of the Rubus 4CL gene family. The differential efficiencies of the primers probably derive from the alternative conserved amino acids found in different 4CL genes within the target regions. For example, Ri4CL3 has the amino acid sequence FSKLPDL at the positions targeted by the forward primers C1/C3, which were designed to bind the amino acid sequence RSKLPDI. This difference results in three nucleotide changes in the primary sequence of Ri4CL3 relative to primers C1/C3 and, most importantly, one of those changes occurs at the 3'-end of the primers. While the nucleotide sequence of the antisense primer C6 was reflected in the primary coding sequence of Ri4CL3, the sequence of the antisense primer C8 diverged at two positions. Similarly, the sequences of primers C2/C4 diverged at three or four positions from the primary sequence present in the Ri4CL3 cDNA. It is likely that these differences in the primary sequence of Ri4CL3 reduced the efficiency of primer binding and resulted in very low amplification of this gene family member.
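The divergence at the forward primer site can be made concrete with a small comparison of the two peptides quoted above. This sketch works at the amino acid level only; the three nucleotide mismatches mentioned in the text depend on the actual degenerate primer sequences, which are not reproduced here. The function name is illustrative, and treating the last residue as the one nearest the primer's 3'-end is an assumption based on a forward primer reading in the coding direction.

```python
def peptide_mismatches(designed, observed):
    """Return (1-based position, designed residue, observed residue) for each difference."""
    if len(designed) != len(observed):
        raise ValueError("peptides must have equal length")
    return [
        (position, d, o)
        for position, (d, o) in enumerate(zip(designed, observed), start=1)
        if d != o
    ]


PRIMER_TARGET = "RSKLPDI"   # peptide the C1/C3 forward primers were designed against
RI4CL3_PEPTIDE = "FSKLPDL"  # corresponding residues in Ri4CL3

mismatches = peptide_mismatches(PRIMER_TARGET, RI4CL3_PEPTIDE)

# Assumption: for a forward primer, the final residue lies nearest the 3'-end.
three_prime_residue_differs = any(
    position == len(PRIMER_TARGET) for position, _, _ in mismatches
)

print(mismatches)
print(three_prime_residue_differs)
```

Two of the seven residues differ, at the first and last positions, which is consistent with the observation that one of the nucleotide changes falls at the 3'-end of the primers, where mismatches are most damaging to extension.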
More subtle differences probably account for the higher amplification efficiency with Ri4CL2 than with Ri4CL1. Thus, while PCR and conserved motif-based searches do avoid the systematic bias inherent in hybridization screens using a full or partial cDNA probe, they also cannot guarantee that all related genes will be detected. Comparisons of the three Rubus 4CL genes to the consensus derived from other 4CL genes (Figure 4.7) indicated that much of the amino acid identity between the Rubus 4CLs and other 4CLs is restricted to the C-terminal portions of the 4CLs. Near the N-termini, short stretches of amino acid sequence conservation were detected, most notably the conservation of motif I (Figure 4.7). This motif is not only highly conserved among all plant 4CLs observed to date, but is also found in other adenylate-forming enzymes. Near the C-terminus of motif I is an absolutely conserved Lys (K211 in At4CL2). Recombinant At4CL2 with a mutated K211 (K211S) retains only 5% of the activity of the wild-type enzyme, though it retains Km values for caffeate and ATP similar to those of the wild-type recombinant protein (Stuible et al., 2000). Based on this observation, it has been suggested that this motif is likely to be involved in catalysis rather than in ATP-binding (Stuible et al., 2000). Motif II (Figure 4.7), closer to the C-terminus of the protein, is absolutely conserved in Ri4CL1 and Ri4CL2, as it is in other plant 4CLs, but is slightly modified in Ri4CL3. An earlier study involving site-directed mutagenesis in this region resulted in inefficient expression in E. coli and a catalytically inactive product, suggesting that motif II is required for either stability or catalytic activity (Becker-Andre et al., 1991). 
However, recombinant At4CL2 proteins harboring mutations E401Q or C403A within motif II were found to retain 21 and 45%, respectively, of the wild-type activity, with no significant alterations in the Km values for caffeate and ATP (Stuible et al., 2000). These results indicate that the cysteine within this motif is not directly involved in adenylate or thiol ester formation, as had been proposed earlier for this residue. Stuible et al. (2000) also detected considerable loss in activity after mutating residues R449, K455, K457, and K540 in At4CL2, suggesting that these residues are all important for the catalytic functions of 4CLs. Amino acids corresponding to these residues are conserved in all three Rubus 4CLs. Phylogenetic analysis of the full-length amino acid sequences allowed me to examine the relationship of Ri4CL3 (obtained from the cDNA screening) to the previously characterized Ri4CL1 and Ri4CL2, and to other plant 4CLs. The tree obtained with full-length sequences had the same overall topology as that derived from the partial sequences, and placed all the available 4CL sequences in two classes (Figure 4.8), as was also observed by Ehlting et al. (1999). Within the class I cluster, the isoforms from Arabidopsis, poplar, tobacco, parsley, and potato grouped together, while Ri4CL1 grouped with pathogen non-inducible Gm4CLJ and aspen 4CL1. Ri4CL2, on the other hand, grouped with poplar 4CL2. It is interesting to note that although aspen and poplar belong to the same genus (Populus), their respective 4CL sequences do not cluster together, indicating that additional 4CL genes may exist in each of these species. In fact, additional poplar 4CL genes have recently been isolated that are quite divergent from the two sequences already in the GenBank database (Cukovic, 1999). 
This pattern of relationships within 4CL families suggests that some class I genes have evolved by recent gene duplication events within the respective plant lineages, as illustrated by the existence of pairs of 4CL genes in poplar, potato, tobacco and parsley. Other class I genes, however, must have evolved from old gene duplications, as illustrated by the divergent Rubus and Populus genes. Alternatively, a sequence of duplication/deletion cycles may have resulted in the loss of paralogous loci between genera, as has been proposed for 4CL genes in various species of Pinaceae (Wang et al., 2000). An important factor limiting my ability to reconstruct the history of duplication and divergence may be the relatively narrow range of dicotyledon taxa for which data are available, and the non-comprehensive isolation methods that have been used for the characterization of the gene family members. Ri4CL3 grouped with the class II cluster of angiosperm 4CL sequences and was most closely related to Lithospermum 4CL2, Arabidopsis 4CL3, rice 4CLs and soyabean 4CL16. The origin of class II 4CL genes must reflect an ancient divergence from class I genes, since specific 4CL genes from highly diverged plant taxa grouped together in class II rather than with other 4CL genes from the same species. Such gene duplication and divergence processes may play an important role in the evolution of functional diversity by altering substrate specificity, interactions with upstream effectors and/or activation processes. Consistent with this, genes belonging to class II show structural and functional differences from those in class I. For example, Arabidopsis 4CL3, Lithospermum 4CL2, rice 4CLs, aspen 4CL2, and Ri4CL3, all in class II, have N-terminal extensions of 19-40 amino acids, which are absent from class I sequences from the same species. 
It has been proposed that these variant N-terminal regions might be involved in phenolic substrate binding specificity (Hu et al., 1998), but confirmation of this hypothesis will require the structure determination of enzyme-ligand complexes. Further evidence for divergence can be obtained at the gene level. Genomic clones of Arabidopsis 4CL3 have six introns, in contrast to the three introns observed in the Arabidopsis class I 4CLs (Ehlting et al., 1999). Likewise, aspen Pt4CL1 had four introns and its promoter region carried three consensus cis-acting elements (boxes P, A and L) found in all the plant phenylalanine ammonia-lyase and 4CL gene promoters characterized at the time, while aspen Pt4CL2 had five introns and none of the consensus boxes P, A and L within the analyzed flanking regions (Hu et al., 1998). The expression patterns of class I genes are also clearly differentiated from those of class II in Arabidopsis (Ehlting et al., 1999), soybean (Uhlmann and Ebel, 1995) and aspen (Hu et al., 1998). For the 4CL sequences reported to date, only a single member of the class II type has been found within the 4CL gene-family of any dicot species. By contrast, rice, a monocot species, has been found to have a family of class II sequences. It remains to be determined whether this is typical of monocot genomes, or for that matter, whether gymnosperms may also have class II-type 4CL sequences that have yet to be identified. The tissue and developmental expression patterns of each Rubus 4CL gene family member were examined in order to identify 4CL genes that are active during the process of fruit-ripening and thus might be directly involved in supplying hydroxycinnamyl CoA thioesters to specific branch pathways. Quantitative RT-PCR analysis demonstrated that all of the Ri4CL genes are differentially expressed. Ri4CL1 is expressed primarily in leaves, with lower levels observed in shoots, roots and developing fruits and flowers. 
Because of its relatively low expression in developing flowers and fruits, Ri4CL1 is unlikely to play a major role in the biosynthesis of "raspberry ketone" or flavonoid derivatives in fruits and flowers. Instead, Ri4CL1 may be important for the biosynthesis of phenolic derivatives in leaves. Ri4CL2 was expressed most highly in shoots, with moderate levels of expression in roots and low levels in leaves and developing flowers. During fruit development, Ri4CL2 was weakly expressed in young or mature green fruits (developmental stages II and III), and the expression declined further as the fruits started accumulating anthocyanins (developmental stage III). It did, however, increase again as the fruits ripened (developmental stage IV), and declined in somewhat dehydrated fruits (stage V). Comparing across tissues, Ri4CL3 was expressed most strongly in roots, relative to shoots and leaves. During the process of flower development, Ri4CL3 was expressed in fully mature flowers (stage II), while during fruit development it was expressed most highly in fully mature red fruits (stage IV). While the individual Ri4CL gene family members are differentially expressed, their expression patterns do not readily map to the process of biosynthesis of phenylpropanoids during development. If these three genes do represent the full repertoire of 4CL genes in Rubus, they are probably involved in the biosynthesis of non-fruit phenylpropanoids such as lignin, and in response to stresses such as wounding, UV irradiation or pathogen attack. Reports of 4CL genes induced by various stressors suggest that such responses are not restricted to a particular phylogenetic class of 4CL. While At4CL1 and At4CL2 (belonging to phylogenetic class I) were inducible by pathogen, wounding and UV-light in Arabidopsis (Ehlting et al., 1999), Gm4CL16 (belonging to phylogenetic class II) was pathogen-inducible in soybean. 
Ri4CL3 is a particularly interesting member of the Rubus 4CL gene-family because of the change it possesses in the motif II sequence. Recombinant Ri4CL3 proteins were therefore examined in an effort to determine the significance of the differences. 4CL has been produced previously as a recombinant protein both in E. coli (Lee and Douglas, 1996; Hu et al., 1998; Ehlting et al., 1999) and in an insect cell/baculovirus expression system (Allina et al., 1998). In each case, catalytically active enzymes have been obtained, indicating that 4CL does not need any further post-translational modifications for its enzymatic activity. Recombinant Ri4CL3 used 4-coumarate more efficiently than other cinnamic acid derivatives, as expected for the 4CL class of proteins. Recognition of the protein by antisera raised against parsley 4CL (data not shown) also confirmed the authenticity of the protein. The substrate utilization profile and the phylogenetic relationship of Ri4CL3 suggest that its primary function may be to provide activated 4-coumarate CoA ester for branch pathways of phenylpropanoid metabolism that only require activated 4-coumaric acid, as opposed to other activated hydroxycinnamic acid derivatives. In Rubus, this could support either the flavonoid or the flavour pathway, or both. Distinguishing between these possibilities would require creation and characterization of transgenic Rubus in which this gene had specifically been silenced. The failure of Ri4CL3, a bona fide 4CL-type enzyme, to utilize benzoic acids is consistent with the substrate utilization profiles of native 4CL from parsley (Knobloch and Hahlbrock, 1977). It is interesting, however, that Barillas and Beerhues (1997) found that cell cultures of Centaurium erythraea contain two forms of CoA ligase that exclusively use either cinnamic acids or benzoic acids as substrates, and thus respectively represent a 4-coumarate CoA ligase and a 3-hydroxybenzoate CoA ligase (3HBL). 
Purification of 3HBL suggested an Mr of ~50,000 and a pH optimum, cofactor requirements, and stability similar to those of plant 4CLs, although the temperature optimum was more similar to that of the bacterial CoA ligases (Barillas and Beerhues, 2000). Although 3HBL is smaller than the known plant 4CLs based on the predicted Mr, the physical and biochemical similarities between 4CLs and 3HBL suggest that the two CoA ligases may also share considerable primary sequence homology. This suggests that, like the plant polyketide synthase gene family (Dixon, 1999), the AMP-dependent CoA ligase gene family may also constitute a superfamily of proteins that have exploited a common reaction chemistry to activate a range of divergent substrates. Degenerate primer PCR as deployed in this study provides a useful approach for characterization of this metabolically important family from other plant species.

CHAPTER FIVE

Polyketide synthase gene family in Rubus idaeus: gene-family characterization, structural features, modes of expression, evolution and catalytic properties of three ripening-related cDNAs.

5.1 Introduction

Biological reactions consisting of repetitive condensations between two CoA ester molecules give rise to a repertoire of natural products often referred to as polyketides (O'Hagan, 1991). In plants, even though polyketides are synthesized as secondary metabolites, they are associated with a broad range of physiological functions such as growth, development, protection from UV and oxidants, signaling intermediates for interactions with rhizosphere symbionts, and defense against microorganisms and pests (Koes et al., 1994; Ferrer et al., 1999 and references therein). In addition to these biological roles in plants, polyketides also display bio-activities such as antioxidant, antimitotic, estrogenic, and anticancer activities that influence human health (reviewed in Schroder, 1998; Dixon and Steele, 1999; Dixon, 1999). 
Elucidation of the biosynthetic origin of various plant polyketides has shown that while the polyketides are structurally and functionally very diverse, their basic biosynthetic origin is highly conserved. The biosynthesis of polyketides depends on the catalytic properties of an enzyme, polyketide synthase (PKS). PKSs, like fatty acid synthases, catalyze repeated condensations between two CoA-esters, often referred to as starter and extender units. The structural diversity in polyketides has been attributed to the "synthase programming" in PKSs that determines the chain length, the choice of starter and extender units, and the subsequent modification and cyclization of the carbon chain (Hopwood and Sherman, 1990; Schroder, 1997). Yet further diversity can be produced by modification of the polyketides by the action of other enzymes, including reductases and cytochrome P450s. Plant-specific PKSs function as dimeric proteins that act directly on Coenzyme A (CoA)-thioesters of various carboxylic acids (reviewed in Schroder, 1997; Dixon, 1999). Chalcone synthase (CHS), a key enzyme of the phenylpropanoid metabolism leading to the biosynthesis of flavonoid pigments including the anthocyanins, is the best studied plant-specific PKS. CHS catalyzes successive condensations of three units of malonyl-CoA with one unit of p-coumaroyl-CoA to form a tetraketide intermediate that is finally cyclized by Claisen condensation to yield naringenin-chalcone. Similarly, another PKS, stilbene synthase (STS), resembles CHS insofar as it also orchestrates the condensation of three units of malonyl-CoA with one unit of p-coumaroyl-CoA to form a tetraketide intermediate. However, in the case of STS, this intermediate is cyclized by aldol condensation to yield a structurally distinct polyketide, hydroxy-stilbene (Figure 5.1). Other PKSs have also been characterized from plants that possess catalytic properties similar to those of CHS (Figure 5.1). 
Among other PKSs characterized, bibenzyl synthase (BBS) cloned from Phalaenopsis sp. utilizes a m-hydroxyphenylpropionyl-CoA starter unit and synthesizes 3,3',5'-trihydroxybibenzyl by a reaction mechanism similar to that of STS (Preisig-Muller et al., 1995). Acridone synthase (ACS), cloned from Ruta graveolens, produces 1,3-dihydroxy-N-methylacridone from N-methylanthranilyl-CoA and three units of malonyl-CoA (Lukacin et al., 1999). p-Coumaroyltriacetic acid synthase (CTAS) from Hydrangea condenses three units of malonyl-CoA with p-coumaroyl-CoA to produce a tetraketide, but without aromatic ring formation (Akiyama et al., 1999). Thus, while CHS, STS, CTAS, and ACS form a group of plant PKSs that catalyze the maximum number of condensation cycles detected to date amongst the various plant polyketides, other plant PKSs have also been characterized that carry out fewer condensation cycles, or use other starter units, to yield structurally distinct polyketides (reviewed in Schroder, 1997; Dixon and Steele, 1999; Dixon, 1999) (Figure 5.1). An example of such PKSs is styrylpyrone synthase (SPS), which uses the same precursors as CHS but carries out only two condensing reactions preceding the ring closure (Beckert et al., 1997). Other examples include pyrone synthase (PS), which catalyzes the formation of pyrone derivatives in Gerbera by using one unit of acetyl-CoA and two units of malonyl-CoA (Eckermann et al., 1998). While an array of plant PKSs has been identified at the protein level, recent advances in molecular genetics have also led to the direct isolation, sequencing, and functional analysis of numerous genes encoding PKSs in various plant species. 
Large families of PKS genes have been cloned from plants such as Petunia (Koes et al., 1987), bean (Ryder et al., 1987), pea (Harker et al., 1990; An et al., 1993), alfalfa (Junghans et al., 1993), soyabean (Akada et al., 1991; Estabrook and Sengupta-Gopalan, 1991), and Ipomoea (Durbin et al., 1995), while relatively smaller gene families have been cloned from species such as peanut (Schroder et al., 1988), tomato (O'Neill et al., 1990), mustard (Batschauer et al., 1991), Matthiola (Epping et al., 1990), maize (Franken et al., 1991), barley (Rhode et al., 1991), and Gerbera (Helariutta et al., 1995). As yet, only single PKS genes have been reported from Antirrhinum (Sommer and Saedler, 1986), Arabidopsis (Feinbaum and Ausubel, 1998), and Petroselinum (Reimold., 1983). A family of PKS genes has also been cloned from gymnosperms (Raiber et al., 1995; Fliegmann et al., 1992; Schanz et al., 1992; Schroder et al., 1998). Comparative analysis of these plant PKSs reveals substantial similarity at the nucleotide and protein sequence level. All PKSs consist of ~400-residue polypeptide chains and share >50% sequence identity. However, functional analysis of some of these sequences using either a genetic or a biochemical approach has shown that, despite the considerable sequence similarity, PKSs can exhibit very distinct "synthase programming" and thus catalytic capabilities. For example, within a family of three PKS genes isolated from Gerbera hybrida, gCHS1 and gCHS3 share 88.9% amino acid sequence identity and show a typical CHS-type activity, while a third gene (gPS2) that shares ~75% amino acid sequence identity with gCHS1 and gCHS3 is a PS-type PKS (Helariutta et al., 1995; Eckermann et al., 1998). Similar results have been reported from other species in which more than one catalytically distinct PKS gene has been characterized. 
For example, the two PKS genes in Pinus strobus share 87.6% amino acid sequence identity, yet of their products, PStrCHS1 performed a typical CHS reaction while PStrCHS2 was completely inactive with the usual CHS starter substrates and preferred to use a diketide derivative analogous to a CHS-intermediate (Schroder et al., 1998). The catalytic program of a plant PKS thus cannot be predicted solely from sequence information. It is of particular interest to determine how these very similar enzymes can catalyze the formation of distinctive products. Recently, the crystal structure of the first plant CHS has provided a view of the architecture of this enzyme. Remarkably, few chemically reactive amino acids are required at the CHS active site to catalyze a series of decarboxylation, condensation, cyclization and aromatization reactions upon a continuously changing set of substrates (Ferrer et al., 1999; Jez et al., 2000). An alignment of all functionally distinct PKSs reveals that the residues forming the catalytic core of CHS are conserved in the known CHS-related enzymes including STS, BBS, ACS, PS and also in some bacterial CHS-like genes (Jez et al., 2000). The mechanistically simple active site of CHS led the authors to propose that the evolution of new PKS functions might require relatively few changes. Nature might have exploited this functional flexibility to evolve only moderately divergent PKS species that can nevertheless carry out catalytically distinct functions. While a large repertoire of PKS genes sharing the same or divergent catalytic functions has been isolated, the functional basis for this diversity has been analyzed only in some species. The presence of apparently single CHS genes in Arabidopsis (Shirley et al., 1995), Antirrhinum (Sommer and Saedler, 1986), and Petroselinum (Hahlbrock and Scheel, 1989) suggests that possession of multiple PKS genes is not essential for the survival of the plants. 
In some species, the presence of multiple PKS genes has been correlated with a distinct capacity to code for different but related enzymatic activities that are regulated in a tissue- and development-specific fashion. In Antirrhinum (Sommer and Saedler, 1986) and Arabidopsis (Shirley et al., 1995), the CHS gene has been shown to be developmentally regulated with anthocyanin pigmentation during flower development, or with pigmentation of the seed coat, respectively, and in maize, CHS gene expression varies during kernel development (Franken et al., 1991). Similarly, in Gerbera, the expression of the two CHS genes has been correlated with the accumulation of flavonols, while PS is expressed in almost all tissues of the plant (Helariutta et al., 1995). Analysis of a subset of alfalfa PKS genes showed that induction of CHS2 in CuCl2-treated roots and Phoma-infected leaves was more rapid and/or transient than that of other members of the PKS family. In grape, pine, and peanut cell-cultures, it has been demonstrated that the PKS belonging to the STS-class is preferentially induced in response to elicitation as compared to CHS in the same species (Schoeppner and Kindl, 1979; Rolfs et al., 1981; Rolfs and Kindl, 1984). In most of these examples only a subset of the PKS genes has been analyzed. The characterization of all PKS gene-family members in a single plant species is important if we are to fully understand the functional divergence and the contribution to cell function(s) of the PKSs. Characterization of the full repertoire of PKS genes in a single species also provides further insight into the evolution of this important family of proteins. Based on a phylogenetic analysis of all PKSs characterized in plants, Tropf et al. (1994) predicted that divergent members of the plant PKS superfamily had arisen independently from a CHS-type PKS in several species. 
However, in a more recent phylogenetic analysis of STS and CHS in Vitis in comparison with other similar PKSs, Goodwin et al. (1999) proposed that Vitis-STS has a separate origin from Vitis-CHSs, and thus that the evolution of PKSs has followed multiple routes. Firmer conclusions about the evolution of the PKS superfamily require the characterization of the repertoire of PKSs from many taxa. In Rubus, polyketide-derivatives such as anthocyanins are readily visible in ripening fruits. The starting materials for the biosynthesis of anthocyanins are the products of CHS (Figure 5.1). Raspberry fruits also accumulate another polyketide-derivative, p-hydroxyphenylbut-2-one (pHPB), which is primarily responsible for the characteristic aroma of ripening raspberry fruits (Schinz and Seiel, 1961). Biosynthesis of pHPB follows a two-step process. In the first step, one unit of p-coumaroyl-CoA is condensed with one unit of malonyl-CoA to form a polyketide, p-hydroxyphenylbut-3-ene-2-one (BA). In the second step, BA is reduced by a reductase in the presence of NADPH to yield pHPB. Formation of BA is reminiscent of the biosynthesis of naringenin chalcones, and it was thus proposed to be catalyzed by a novel PKS, benzalacetone synthase (BAS) (Figure 5.1). Partial purification of BAS revealed that it had an Mr, pI, and pH optimum similar to those of CHS (Borejsza-Wysocki and Hrazdina, 1996). However, since a protein fraction enriched 172-fold for BAS activity also contained minor CHS activity and contained a mixture of proteins sharing similar Mr values, the authors were unable to conclusively establish that formation of BA was dependent on the catalytic activity of a single novel PKS (BAS). Borejsza-Wysocki and Hrazdina (1996) also observed that BAS could be preferentially induced over CHS in elicited raspberry cell-cultures. This suggests that BAS, like STS, ACS and other PKSs, might be a unique PKS that shares significant similarities with CHS. 
Interestingly, BA-derivatives have also been detected as a minor "early release product" of parsley CHS (Hrazdina et al., 1976), suggesting that BAS activity can be stimulated in a CHS-type PKS. Thus, one objective of this project was to investigate whether the formation of BA is controlled by a novel PKS or is due to a typical CHS whose activity is modulated by interactions with other proteins or co-factors. While a role for protein-protein interaction in the biosynthesis of plant polyketides has yet to be established, such interactions cannot be excluded given the recent evidence presented by Burbulis and Winkel-Shirley (1999) that enzymes of the flavonoid pathway form multienzyme complexes. Other observations also support such a concept. Exchangeability of polyketide reaction intermediates between two enzymes has been observed during the formation of 6'-deoxychalcone, an isoflavonoid phytoalexin. The biosynthesis of this polyketide derivative is dependent on the function of chalcone reductase (CHR), which co-acts with CHS and reduces a ketone group of the CHS-intermediate before it is cyclized (Welle and Griesbach, 1988). In Pinus strobus, a CHS-like enzyme involved in the biosynthesis of C-methylated chalcones is active only with a starter molecule that is chemically analogous to the diketide-CoA intermediate postulated to be formed after the first condensation reaction in CHS (Schroder et al., 1998), again suggesting an intimate interaction between two enzymes. To characterize BAS, I took a more global approach that included characterizing other PKSs that are present in the same species. Overall, the findings in this chapter provide insight into the structures and functions of the Rubus PKSs, as well as a set of degenerate primers that could be used to characterize structurally diverse PKS genes from other species. 
Figure 5.1 Plant PKSs that catalyze the condensation of specific starter units [labelled (a)-(f)] with one, two or three units of malonyl CoA. a, 4-coumaryl CoA; b, m-hydroxybenzoyl CoA; c, isovaleryl CoA/isobutyryl CoA; d, m-hydroxypropionyl CoA; e, N-methylanthranilyl CoA; f, acetyl CoA.

5.2 Methods

5.2.1 Degenerate PCR primer design and amplification of members of the Rubus PKS gene family

Amino acid alignments of selected chalcone synthases, stilbene synthases, bibenzyl synthases, and acridone synthase revealed short stretches of amino acids that were highly conserved across all polyketide synthases. The plant-specific PKS sequences selected for alignment were as follows: chalcone synthases [Antirrhinum (X03710), Arabidopsis (M20308), Gerbera (Z308096, Z380980), Petroselinum (P16107), Pueraria (D63855), Pinus (X60754), Sinapis (X16437), Zea (X60204, X60205)]; stilbene synthases [Arachis (X62300, L00952), Pinus (Z46915, Z46914, S50350), Vitis (X76892, P28343)]; bibenzyl synthases [Phalaenopsis (X79903, X79904)]; acridone synthase [Ruta (Z34088)]; and a Gerbera polyketide synthase (Z38097) whose function was unknown at the time this study was started. Numbers in brackets denote the GenBank accession numbers. Degenerate PCR primers were designed to target DNA encoding the conserved regions of PKSs. Inosine was incorporated for codons with four- or six-fold degeneracy. PK1 [5'-cggaattcAA(A/G)GCIAT(T/C/A)AA(A/G)GA(G/A)TGGG] is the fully degenerate sense primer based on amino acids KAIKEWG. PK3 [5'-cggaattcAA(G/A)GA(T/C)CTIGCIGA(A/G)AA(T/C)AA] is the partially degenerate sense primer targeting the amino acids KD(L/V)AEVNN. PK2 [5'-gctctagaIGG(A/G)TGIGC(A/G/T)ATCCA(A/G)AA] is the fully degenerate antisense primer encoding the amino acids FWIAHP, and PK4 [5'-gctctagaCCIGGICC(A/G)AAICC(A/G)AA] is the fully degenerate antisense PCR primer based on amino acids FGFGPG. The target locations of the primers within a typical plant PKS gene are illustrated in Figure 5.2. 
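As a rough illustration of what "degenerate" means for the primers listed above, the sketch below counts how many distinct oligonucleotides a primer written in this bracketed notation represents. It is an illustrative aid, not part of the original methods: the function name `degeneracy` is hypothetical, the lowercase restriction-site adapters are left out, and inosine (I) is counted as a single synthesized base, which is an assumption about how to score it.

```python
import re


def degeneracy(primer: str) -> int:
    """Count the distinct sequences encoded by a primer written with (A/G)-style groups.

    Bases outside parentheses, including inosine (I), each contribute a factor of 1.
    Groups may be written with or without slashes, e.g. (A/G) or (AG).
    """
    total = 1
    for group in re.findall(r"\(([^)]*)\)", primer):
        bases = set(re.findall(r"[ACGT]", group.upper()))
        total *= len(bases)
    return total


# Core (non-adapter) portions of two primers, transcribed from the text.
pk1_core = "AA(A/G)GCIAT(TCA)AA(AG)GA(GA)TGGG"
pk2_core = "IGG(A/G)TGIGC(A/G/T)ATCCA(A/G)AA"

for name, seq in (("PK1", pk1_core), ("PK2", pk2_core)):
    print(f"{name}: {degeneracy(seq)}-fold degenerate")
```

Under this counting, substituting inosine at the four- and six-fold degenerate positions keeps the pool of distinct oligonucleotides small, since each inosine would otherwise multiply the pool size by four.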
To facilitate subcloning of the PCR-amplified products, EcoRI recognition sites (underlined) were incorporated in the sense primers PK1 and PK3 and XbaI recognition sites (underlined) were incorporated in the antisense primers PK2 and PK4. Conditions for the PCR amplification reactions were as described in Section 2.3. To verify that the amplification products did not contain any chimeric artifacts due to recombination among related gene family members (Saiki et al., 1988), products of two independent PCR reactions with each primer combination were processed separately. To examine the size and the specificity of the inserts, multiple transformants from each primer combination were directly amplified in a 50 µl reaction volume containing 2 µM each of the vector-specific primers M13R and M13F, 50 µM each dNTP, 0.5 U Taq polymerase (Appligene), and 1x PCR buffer mix (Appligene). The cycling parameters were as described in Section 2.3. PCR fragments were resolved in a 1% TAE-agarose gel. To identify gene family members, amplified products (4 µl) were digested with the restriction enzymes PstI and SmaI (GIBCO-BRL). Restriction digests were carried out for 15 h at 37°C to ensure complete digestion. The digested products were electrophoresed through a 1.5% TAE-agarose gel, and clones were grouped according to the resulting restriction enzyme fingerprints.

5.2.2 Screening of a Rubus cDNA library for expressed PKS genes

Combined fragments of the Ripks5 and Ripks6 DNA sequences obtained by amplification of the Rubus genome with PKS gene-specific primers (Figure 5.2) were used to screen the library as described in Section 2.5. After three rounds of screening, the inserts of positive plaques were amplified with vector-specific primer T3 and PKS gene-specific primer P4, as described in Section 2.3. Restriction enzymes EcoRI+XhoI, PstI, 
Crude amplified products (5 µl) were treated for 15 h at 37°C with 10 U of each enzyme to ensure complete digestion. The digested products were electrophoresed through a 1.5% TAE-agarose gel and clones were grouped according to their restriction enzyme fingerprints. One representative plaque of each class, harboring the largest insert, was rescued as a pBluescript II SK(-) phagemid using the ExAssist helper phage.

5.2.3 Expression of recombinant Rubus PKS proteins

For expression of the recombinant proteins, the open reading frames of RiPKS5, RiPKS6, and RiPKS11 were cloned in-frame into the bacterial expression vectors pQE50 and pQE30. To generate the expression constructs, the open reading frame of each RiPKS cDNA was amplified in a PCR reaction with a gene-specific primer and the vector-specific universal primer T7. Gene-specific primers anchored at the 5'-end of the cDNA introduced a SphI site upstream of the start codon. Vector-specific primer T7 was anchored at the 3'-end of the cDNA to introduce a XhoI site downstream of the stop codon. The nucleotide sequences of the gene-specific primers were 5'-ACATGCATGCATGGTGACCGTCGATGAA-3' for RiPKS5, 5'-ACATGCATGCATGGTGACCGTCGATGAA-3' for RiPKS6, and 5'-ACATGCATGCATGGTGACCGTCGATGAA-3' for RiPKS11. The SphI recognition site within each specific primer has been underlined. The resulting PCR-amplified products (about 1.2 kb each) were ligated into SphI-SalI digested pQE30 or pQE50 to yield plasmids pQE30-RiPKS5, pQE30-RiPKS6, pQE30-RiPKS11, pQE50-RiPKS5, pQE50-RiPKS6, and pQE50-RiPKS11. pQE50-based expression plasmids express the recombinant proteins without the N-terminal His6-tag, while recombinant proteins from the pQE30-based vector have an in-frame N-terminal His6-tag. 
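The layout of the gene-specific expression primer can be checked with a short script. The sketch below is only illustrative: it locates the SphI recognition sequence (GCATGC) in the primer given in the text, takes the first ATG downstream of that site as the start codon (an assumption made for this example), and translates the remaining codons with the standard genetic code. The variable names are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

SPHI_SITE = "GCATGC"  # SphI recognition sequence
primer = "ACATGCATGCATGGTGACCGTCGATGAA"  # gene-specific primer from the text

# Position of the restriction site within the primer (0-based).
site_start = primer.find(SPHI_SITE)

# Assumption: the start codon is the first ATG after the end of the SphI site.
orf_start = primer.find("ATG", site_start + len(SPHI_SITE))

# Translate the complete codons that follow the start codon.
coding = primer[orf_start:]
usable = len(coding) - len(coding) % 3
peptide = "".join(CODON_TABLE[coding[i:i + 3]] for i in range(0, usable, 3))

print(f"SphI site at index {site_start}, start codon at index {orf_start}")
print(f"encoded N-terminal residues: {peptide}")
```

A check of this kind is a quick way to confirm that the restriction site sits immediately upstream of the start codon, and to see which N-terminal residues the primer encodes, before ordering an oligonucleotide.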
5.2.4 Design of gene-specific primers and RT-cPCR

To distinguish between the transcripts of the full-length RiPKS genes, primers were designed to target unique regions within the 5'- and 3'-untranslated regions of the corresponding cDNAs. RiPKS5 was amplified with forward primer 55 [5'-GAGACGGTTGTGCTTCACAGTGTG] and a gene-specific reverse primer 53 [5'-ACAATATGAAATGGAAACTGATA], to yield a 294 bp product. RiPKS6 was amplified with sense primer 65 [5'-GAGACCGTTGTGCTTCACAGTGTGG] and a gene-specific antisense primer 63 [5'-GCAAAGCAATCAGAACTTTTTATC], yielding a product of 330 bp. RiPKS11 was selectively amplified with gene-specific sense primer 115 [5'-GATCACTGCAACACCCCAAAC] and antisense primer 113 [5'-TACAGTTGGGAGGAGTTGCC] to give a product of 141 bp. The locations of these primers are shown in Figure 5.6.

Composite primers for making the gene-specific competitors had additional nucleotides at the 3'-end of each gene-specific primer. Twenty additional nucleotides (5'-CCCCTAACAGGAATTCTGCG) were added at the 3'-end of each of the gene-specific sense primers. Twenty additional nucleotides (5'-ACCATCGCAGATTGAAGGAC) were added at the 3'-end of the gene-specific antisense primers. Gene-specific primers for amplification of the Rubus Histone H3 had the sequences: hisl5, 5'-ATGGCGCGGACGAAGGA and hisl3, 5'-GCCTACGCCGCCCGCTCAACCTA. Competitor target DNAs for RT-PCR analysis of RiPKS5, RiPKS6 and RiPKS11 were prepared by amplification of the spruce coniferin β-glucosidase cDNA with the composite primers to yield a product of ~280 bp. This initial PCR product was reamplified with the appropriate RiPKS gene-specific primers to generate competitors with gene-specific primer binding sites at each end, as described in Section 2.11.3.
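The expected size of the RiPKS5 product can be reproduced with a small in-silico PCR. The sketch below is an illustration, not the procedure used in this work: the template is the 3'-terminal part of the RiPKS5 cDNA (positions 1158-1482) transcribed from the RiPKS5 row of Figure 5.6 with alignment gaps removed, so transcription errors are possible, and the helper names `revcomp` and `amplicon` are hypothetical.

```python
# 3'-terminal part of the RiPKS5 cDNA, transcribed from Figure 5.6
# (alignment gaps removed). Assumption: the transcription is error-free.
SEGMENT_OFFSET = 1158  # cDNA coordinate of the first base of the segment
RIPKS5_3PRIME = "".join([
    "CTATTCGGGTTTGGGCCTGGGCTCACCGTTGAGACGGTTGTGCTTCACAGTGTGGGTGTC",
    "ACTGCTTGAACTTGAACTTGAACTTGAAGGCATCTATCTATCTGTTCTGTGGTGATC",
    "GATTTTATCTGCTCCTATATATTATGTATGATTTGCATCTATTAATTTATAGC",
    "TAGGTTTGATTTTGGGAATTTGTTCTCTTAGAGGCTTGTGTGTGGGG",
    "TAAGCTTTGGTGCAAATTGCTGCTGTGTTTACCTTTCATGTTGTTTATTGCATTTCT",
    "GAAGACAAAAGTGTAGCGACTTATATATATCAGTTTCCAT",
    "TTCATATTGTT",
])

PRIMER_55 = "GAGACGGTTGTGCTTCACAGTGTG"   # forward primer for RiPKS5
PRIMER_53 = "ACAATATGAAATGGAAACTGATA"    # reverse primer for RiPKS5


def revcomp(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def amplicon(template, forward, reverse):
    """Return (start, end) 0-based half-open bounds of the product, or None."""
    start = template.find(forward)
    if start < 0:
        return None
    site = revcomp(reverse)          # the reverse primer anneals to this site
    hit = template.find(site, start)
    if hit < 0:
        return None
    return start, hit + len(site)


start, end = amplicon(RIPKS5_3PRIME, PRIMER_55, PRIMER_53)
print("product length:", end - start)
print("cDNA coordinates:", SEGMENT_OFFSET + start, "-", SEGMENT_OFFSET + end - 1)
```

Under these assumptions the predicted product spans cDNA positions 1188 to 1481, that is 294 bp, in agreement with the size stated for primers 55 and 53.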
For amplification of the cDNA, the first-strand cDNA reaction (1 µl) was amplified in a total volume of 20 µl containing 200 nM each PCR primer, 200 µM each dNTP, and 2.5 U of Taq DNA polymerase in 1x PCR buffer (Qiagen) and 1x Q solution (Qiagen). Thermal cycling was carried out at 94°C for 5 min followed by 25 cycles (for Histone H3) or 32 cycles (for RiPKSs) of 94°C for 20 s, 59°C for 50 s, and 72°C for 50 s, and a final extension of 5 min at 72°C in a T Gradient 96 thermal cycler (Biometra). Amplified PCR products (10 µl) were separated in a 4% TAE-agarose gel and stained with 5 µg/ml ethidium bromide.

5.3 Results

5.3.1 Amplification and characterization of the Rubus polyketide synthase genes

To find consensus regions conserved between diverse PKSs, I compared the encoded amino acid sequences of several plant PKS genes, choosing only those sequences whose functionality has been unequivocally demonstrated either genetically (loss-of-function mutants), or by biochemical characterization of the corresponding recombinant enzyme. Despite the differences in the substrates and reaction mechanisms utilized, and in the class of product delivered by different PKSs, alignment of the deduced amino acid sequences identified several conserved regions that appeared to be useful for primer design. Genomic DNA isolated from young Rubus leaf tissue was used as a template for amplification of PKS genes using combinations of degenerate primers PK1-PK2, PK1-PK4, PK3-PK2, and PK3-PK4 (Figure 5.2). To eliminate chimeric constructs, I processed the amplification products from two separate PCR reactions with each primer pair. The expected product sizes with primer combinations PK1-PK2, PK1-PK4, PK3-PK2, and PK3-PK4 were 600 bp, 800 bp, 400 bp and 600 bp respectively (Figure 5.2). Amplification with PK1-PK2 and PK3-PK4 yielded in each case an intense product of about 600 bp, along with minor products of other sizes (Figure 5.3, lanes 1 and 3).
Amplification with PK3-PK2 yielded a single product of about 400 bp (Figure 5.3, lane 2). Amplifications with PK1-PK4 yielded either attenuated amplification or no amplification products, and therefore this primer combination was not used for further analysis. All PKS genes characterized to date, with the exception of the Antirrhinum CHS gene, have a single intron that is located outside the regions targeted by the primers in this study (Figure 5.2). The Antirrhinum CHS gene is unique in having a second intron (tentative position indicated in Figure 5.2) that is located within the region amplified by the primers PK1-PK2. The larger products seen with the primer combinations PK1-PK2 and PK3-PK4 were thus interesting, since they might have represented polyketide synthase genes with unique intron(s). However, in a Southern blot analysis (not shown) these fragments did not cross-hybridize with a homologous PKS probe, and thus were not investigated any further. No contamination was detected in the PCR reactions in these experiments, as determined by the absence of amplified products in the "no template" control (Figure 5.3, lane 8).

After cloning the amplified products corresponding to the expected sizes into the EcoRI-XbaI site of pUC19, multiple colonies were analyzed for the presence or absence of inserts by PCR amplification using vector-specific primers M13R and M13F. All clones carried inserts of the same size as the PCR products from which they were generated. To discriminate between different classes of clones, I fingerprinted multiple clones with two restriction enzymes (PstI and SmaI). This analysis revealed polymorphism for the restriction enzyme recognition sites and led to the identification of five classes of clones from primer pair PK1-PK2, four classes of clones from primer pair PK3-PK2, and only one class from primer pair PK3-PK4 (Table 5.2).
PKS genes are known to comprise large multigene families in alfalfa (Junghans et al., 1993), Gerbera (Helariutta et al., 1995 and Helariutta et al., 1996), Ipomoea (Durbin et al., 1995), petunia (Koes et al., 1988), and soybean (Akada et al., 1995). Within a plant species, the coding regions of the PKS gene family members are highly homologous, making it conceivable that multiple genes could have identical fingerprint patterns with the two enzymes I had used to discriminate between clones. To control for that possibility, I sequenced multiple clones representing each RFLP class, and compared a stretch of 351 nucleotides (117 aa) within all the sequenced clones, using the CLUSTAL program in PC/GENE.

Figure 5.2 Positions of the PKS gene-specific primers, and of the amplified Rubus pks clones, relative to a generic plant PKS gene. The rectangular line denotes the coding regions of a generic plant PKS gene. Closed arrow heads indicate the positions of the degenerate PCR primers. The amino acids corresponding to each primer have been numbered relative to Arabidopsis CHS (GenBank, M20308) as denoted below each primer. V1 represents the conserved intron found in all PKS genes isolated from plants. V2 represents the position of a second intron found only in Antirrhinum CHS (X03710). The amplified regions of the Rubus PKS genes are indicated by solid lines. Ripks1-Ripks6 were detected by primers PK1-PK2, while Ripks7-Ripks10 were generated from primer combination PK3-PK4. The dark line indicates the region (351 bp) compared and analyzed in detail. Amplified products had an EcoRI (E) and XbaI (X) site at their 5'- and 3'-ends respectively.
The positions of the polymorphic PstI (P) and SmaI (S) sites have been indicated for each gene fragment.

Figure 5.3 Fragments of the Rubus genome amplified using different PKS gene-specific degenerate primer pairs. The various primer pair combinations used were as follows: Lane 1, PK1-PK2; lane 2, PK3-PK2; lane 3, PK3-PK4; lane 4, PK1; lane 5, PK3; lane 6, PK2; lane 7, PK4; lane 8, no template control; M, 1 kb molecular weight marker.

Primer pair   Clones analyzed   RFLP classes detected   Sequence classes detected
PK1-PK2       500               5                       Ripks1, Ripks2, Ripks3, Ripks4, Ripks5, Ripks6
PK3-PK2       500               4                       Ripks6, Ripks7, Ripks8, Ripks9, Ripks10
PK3-PK4       500               1                       Ripks6

Table 5.1 Summary of analysis and characterization of putative Rubus PKS clones from each primer pair. A number of clones, as indicated, from each primer pair were fingerprinted with two restriction enzymes, and classified into groups based on the fingerprints. Sequencing of multiple clones from each RFLP class revealed a finer level of sequence variation that led to identification of additional sequence classes.

Since the target DNA used for this analysis was isolated from diploid tissue, the allelic forms of the genes are presumed to have an equal chance of being amplified. Rubus is a vegetatively propagated crop and commercial varieties such as "Meeker" thus represent a very homogeneous population. Since Meeker is a cross of "Willamette" x "Cuthbert" (Moore and Daubeny, 1993), one would expect differences between allelic forms of the genes. Thus for my purpose, I arbitrarily attributed differences smaller than 10 nucleotides out of the 351 nucleotides compared to fidelity errors introduced by Taq DNA polymerase, or to the co-amplification of the allelic sequence.
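The cut-off described above amounts to counting mismatches between two aligned 351-nucleotide sequences and comparing the count with 10. A minimal sketch follows; the function names, the placeholder reference sequence and the substituted positions are all hypothetical and serve only to illustrate the rule.

```python
REGION_LENGTH = 351   # nucleotides compared per clone
CUTOFF = 10           # fewer than 10 differences: treated as the same gene


def count_differences(a, b):
    """Count mismatched positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned and of equal length")
    return sum(1 for x, y in zip(a, b) if x != y)


def same_gene(a, b, cutoff=CUTOFF):
    """Apply the cut-off: differences below it are attributed to Taq error or alleles."""
    return count_differences(a, b) < cutoff


def substitute(seq, positions):
    """Hypothetical helper: introduce a base change at each 1-based position."""
    swap = {"A": "G", "G": "A", "C": "T", "T": "C"}
    bases = list(seq)
    for p in positions:
        bases[p - 1] = swap[bases[p - 1]]
    return "".join(bases)


# Placeholder reference, not a real Rubus sequence.
reference = ("ACGT" * 88)[:REGION_LENGTH]
clone_a = substitute(reference, [13, 61, 93, 196])   # 4 differences
clone_b = substitute(reference, range(1, 30, 2))     # 15 differences

print(round(100 * CUTOFF / REGION_LENGTH, 2), "% cut-off")
print(count_differences(reference, clone_a), same_gene(reference, clone_a))
print(count_differences(reference, clone_b), same_gene(reference, clone_b))
```

Ten differences in 351 nucleotides correspond to roughly 2.85% divergence, so the rule tolerates considerably more variation than a polymerase error rate alone would produce and therefore also absorbs allelic differences.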
Analysis of all the sequenced clones representing the five different RFLP classes allowed a finer level of discrimination between clones and led to the identification of ten sequence classes whose differences exceed the cut-off of 10 differences in 351 nucleotides (Table 5.2). I designated these clones as Ripks1 to Ripks10 (lower case letters are assigned to these clones since they represent only partial-length genomic sequences). Figure 5.2 indicates the extent of cloned sequence of each gene and the positions of the polymorphic restriction sites within the amplified regions of each gene. Figure 5.4 presents the nucleotide sequences of the ten Ripks genes, which are discussed in detail below.

5.3.2 Rubus PKS sequence classes

The Ripks1 sequence class had a recognition site for SmaI at a point 60 bp downstream of the primer PK1 binding site. This class lacked a recognition site for PstI within the amplified and characterized region. Ripks1 was represented by 11 sequenced clones obtained from two independent PCR reactions (denoted A and B) with primer pair PK1-PK2. Five out of the eleven clones sequenced were identical over the 351 bp compared and this sequence was therefore designated Ripks1. Clone A58 had changes at nucleotide positions 34 (A→T) and 62 (A→G); clone B27 had changes at positions 28 (T→C), 148 (G→A) and 222 (A→G); clone B240 differed at position 31 (G→A); clone B49 had changes at positions 39 (C→T), 85 (G→A) and 94 (A→G); clone B03 differed at positions 44 (G→A) and 201 (C→T); and clone B164 had changes at positions 13 (T→G), 61 (A→G), 93 (A→G) and 196 (C→T). Since among these six clones no base substitutions occurred at the same positions, I conclude that all 11 clones represent a single PKS gene with variations attributable to amplification of allelic sequences or to errors introduced by Taq DNA polymerase.
However, it is possible that clone B164 represents another gene as it exceeds the error rate of 0.5% of Taq DNA polymerase as estimated by Butland et al. (1998) in a similar analysis.

The Ripks2 gene fragment contained two recognition sites for SmaI and one recognition site for PstI (Figure 5.2). It was represented by clones A43 and B157 obtained from the primer pair combination PK1-PK2. These two clones were identical over the 351 nucleotides compared and this sequence was designated as Ripks2.

Ripks3 had a recognition site for PstI located 308 nucleotides downstream of the primer PK1 binding site. It was represented by two clones, A125 and B113, obtained from the primer pair combination PK1-PK2. The two sequences were identical over the 351 nucleotide region compared and this sequence was therefore designated as the Ripks3 sequence.

Ripks4 had dual recognition sites for SmaI (144 and 462 nucleotides downstream of the primer PK1 binding site) and a single PstI site in common with some of the other Ripks genes (Figure 5.2). The Ripks4 sequence was represented by three clones, A53, B48 and A166, obtained from primer combination PK1-PK2. While two of these clones were identical in sequence (this sequence was designated as Ripks4), one clone (A48) had changes at nucleotide positions 49 (C→T), 61 (T→C), 67 (C→T), 116 (A→T), 210 (C→A) and 219 (T→C). Since this level of change is greater than the estimated level of error introduced by Taq polymerase (Butland et al., 1998), it is likely that clone A48 represents the allelic sequence of Ripks4.

Ripks5 was represented by five clones obtained from primer pair PK1-PK2. It had an RFLP fingerprint with SmaI and PstI similar to that of Ripks2, but differed at 14 nucleotide positions from Ripks2 (Figure 5.4). Since this level of variation is higher than the estimated Taq DNA polymerase error rate (Butland et al., 1998), I designated this sequence as originating from a new gene.
However, I cannot exclude the possibility that Ripks5 represents an allelic sequence to Ripks2. Four of the five clones were identical in the 351 bp region compared and this sequence was designated as the Ripks5 sequence. One clone (A83) of the same RFLP class had a single nucleotide substitution at position 340 (T→A).

Ripks6 was amplified by all three primer pair combinations. This RFLP class, represented by a single recognition site each for PstI and SmaI (Figure 5.2), was the most prominent class detected in the analysis. It was represented by two sequenced clones from primer pair PK1-PK2, four sequenced clones from primer pair PK3-PK2 and seven clones from primer pair PK3-PK4. Two clones from primer combination PK1-PK2, three clones from primer pair PK3-PK2 and five clones from primer combination PK3-PK4 were identical over the 351 nucleotides compared; hence this sequence was designated as Ripks6. However, differences were detected in other clones with the same RFLP pattern. For example, clone B263 from primer pair PK3-PK2 had changes at nucleotide positions 14 (T→C), 62 (G→T) and 75 (G→T), and clones A94 and B22 from primer pair PK3-PK4 had single base substitutions each at nucleotide positions 12 (T→C), 142 (C→T), and 289 (A→G). Since no base substitutions occurred at the same positions in these clones, I conclude that all these clones represent a single PKS gene with variations attributable to amplification of allelic sequences or to errors introduced by Taq DNA polymerase.

Ripks7 had single recognition sites for PstI and for SmaI (Figure 5.2), and was represented by two clones that had identical amino acid sequences over the 351 bp region compared. This sequence was designated Ripks7.
Even though this class of clones shared common restriction sites for PstI and SmaI with Ripks4, 5 and 6, the nucleotide sequence varied at more than 14 nucleotide positions (over the 351 nucleotides compared) when compared to Ripks4, 5 and 6. I therefore assigned this to a different gene family member.

Sequence class Ripks8 did not have a recognition site for PstI (Figure 5.2) and was represented by two sequences from two independent PCR reactions with primer pair PK3-PK2. These sequences were identical over the 351 bp compared and this sequence was designated Ripks8.

Classes Ripks9 and Ripks10 were each represented by multiple clones with a similar RFLP pattern arising from two independent PCR reactions with primer pair PK3-PK2. I sequenced only one representative clone of each class. Each of these sequences varied at 15 or more nucleotide positions within the 351 bp region when compared to Ripks1-6. Thus, according to my criteria, I designated these sequences as originating from two different genes.

5.3.3 Rubus PKS gene structure and sequence comparisons

Using a homology-based search method I have established that there are at least ten diverse PKS genes in the Rubus genome. The homology between these genes ranges from 86-96% nucleotide identity (Table 5.3) in the 351 nucleotide region upstream of the PK2 primer binding site. Their encoded amino acid sequences (117 aa) share 82-98% amino acid sequence identity. None of the sequenced Rubus pks genes has intron sequences within the sequenced regions, which is consistent with the structure of other PKS genes and with the size of the amplification products obtained with each of the three primer pair combinations used. It was not surprising to find that Rubus had a large repertoire of PKS genes in its genome, as characterization of this gene family in other plant species has revealed a similar diversity (Junghans et al., 1993, Durbin et al., 1995, Koes et al., 1988).
One goal of this study was to determine whether the PKS gene products in Rubus differed in substrate specificity, in particular to identify one with BAS activity that is required for producing the characteristic aroma of Rubus fruits. However, based on the deduced partial amino acid sequences of these Rubus PKS clones, it was not possible to predict functional roles for individual Rubus PKS genes. Furthermore, I could not analyze the catalytic properties of the recombinant proteins or design gene-specific probes/primers to study the modes of expression of these genes. I therefore proceeded to isolate full-length cDNA sequences corresponding to those RiPKS genes that were functional during fruit-ripening.

Ripks1  AGGGGTGCACGTTTTCTTGTTGTCTGCTCCGAAATCACCGTTGTTACCTTCCGTGGGCCT   60
Ripks1  AACGACACCCACCTTGATTGTCTTGTGGGCCAAACCTTGTTTGGTGATGGTGTTGCATCT  120
Ripks1  ATTATTGTTGGGGCTGACCCGTTGCCCGAGATTGAGAAGCCCTTGTTTGAGTTGGTTTCA  180
Ripks1  GCGGCTCAAACTATTCTTCCCGACAGTGAAGGGGCCATTGAGGGGCATCTTCATGAAGTC  240
Ripks1  GGGCTCACATTTCATCTCCTTGAGAATGTTCCCGCGCTGATTTCTAAGAACATCGAGAAG  300
Ripks1  AGCCTAAACGAGACCTTCAAACCTTTGGACATCATGGATTGGAACTCACTT           351

[Alignment rows for Ripks2 to Ripks10 are not recoverable from the extracted text.]

Figure 5.4 Nucleotide sequences of the ten partial Rubus idaeus pks genes. 351 nucleotides of each sequence compared are shown in full for Ripks1. For the other sequences, only deviations from Ripks1 are shown. Dots represent identical nucleotides.

         Ripks2   Ripks3   Ripks4   Ripks5   Ripks6   Ripks7   Ripks8   Ripks9
Ripks1   90 (93)  91 (93)  88 (91)  87 (90)  85 (90)  82 (90)  88 (93)  84 (89)
         [?]      92       90       91       88       86       94       89
Ripks2            93 (95)  98 (98)  96 (96)  95 (96)  95 (98)  93 (96)  95 (96)
                  95       95       96       92       90       93       93
Ripks3                     93 (94)  91 (93)  90 (92)  88 (93)  84 (90)  81 (92)
                           93       90       94       90       88       91
Ripks4                              96 (96)  95 (95)  96 (97)  91 (95)  95 (97)
                                    96       94       92       93       93
Ripks5                                       94 (95)  96 (98)  93 (96)  97 (98)
                                             92       90       94       96
Ripks6                                                96 (98)  92 (96)  94 (96)
                                                      96       90       88
Ripks7                                                         92 (97)  95 (98)
                                                               90       90
Ripks8                                                                  94 (95)
                                                                        94

Table 5.2 Sequence similarity amongst the partial Rubus pks gene family members, comparing the 351 bp (117 aa) regions shown in Figure 5.2. The upper line for each sequence denotes the percent amino acid sequence identity and similarity (in brackets), while the lower line denotes the percent nucleotide sequence identity.
5.3.4 Isolation and characterization of ripening-related Rubus PKS genes

I constructed a cDNA library representing mRNA isolated from partially-ripe fruits of raspberry and screened 500,000 plaques of the library with a mixed population of probes consisting of full-length amplified fragments of Ripks1 and Ripks7 (Figure 5.1). More than 150 positive clones were detected after the first round of plaque hybridizations. Since the library had been amplified once, not all of these plaques represented unique clones. I therefore selected 80 plaques for further analysis. All 80 plaques showed positive signals after two additional rounds of hybridization. Amplification of the insert in each positive plaque with universal primer T3 and a PKS gene-specific primer (PK2) identified 72 plaques as having potential PKS gene inserts >900 bp in length. Restriction fragment length analysis of the amplified fragments of the 72 clones with EcoRI+XhoI, PstI, SmaI, KpnI, EcoRV, HindIII, SalI, and BamHI grouped the clones into three discrete classes. RFLP maps for a representative clone from each class are shown in Figure 5.4. The entire sequence of a representative clone from each class was determined by sequencing both strands of the DNA using the primer walking strategy. The isolated clones correspond to the PKS mRNA transcripts that accumulate in ripening fruits.

In comparisons with the partial fragments of the ten Ripks genes characterized by the PCR-based search, group 1 cDNA clones showed 99% nucleotide identity to Ripks5, group 2 cDNA clones showed 99% nucleotide sequence identity to Ripks6, while group 3 cDNA clones shared 92-96% nucleotide sequence identity with Ripks1-10, being closest to Ripks9 (96% nucleotide sequence identity). This level of variation most likely represents a new gene rather than an allelic sequence, and I therefore designated group 3 clones as representing a new class, RiPKS11.
In the 351 bp regions compared (Figure 5.4), group 1 clones had a single nucleotide difference from Ripks5 and this group was designated as RiPKS5. As group 2 clones differed at a single nucleotide position from Ripks6, group 2 clones were designated RiPKS6.

The nucleotide sequences of RiPKS5, RiPKS6 and RiPKS11 were determined and are presented in Figure 5.4. RiPKS5 consists of 1482 bp containing 50 nucleotides of 5'-untranslated region, an ORF of 1176 bp, and 256 bp of 3'-untranslated region. RiPKS6 consists of a 70 bp 5'-untranslated region, an ORF of 1176 bp, and a 308 bp 3'-untranslated region. RiPKS11 (1474 bp) contains a 75 bp 5'-untranslated region, an ORF of 1176 bp, and a 223 bp 3'-untranslated region. It was deduced that all three clones represent full-length sequences because homology to other CHSs in GenBank begins shortly after the methionine underlined in Figure 5.7.

At the nucleotide level, the open reading frame of RiPKS5 shares 93% nucleotide identity with RiPKS6, and 97% identity with RiPKS11. The open reading frames of RiPKS6 and RiPKS11 share 92% nucleotide identity. The lengths of the 3'-untranslated regions were 256, 308 and 223 nucleotides for RiPKS5, RiPKS6 and RiPKS11 respectively. The homology between the 3'-untranslated regions varied from 98% between RiPKS5 and RiPKS11 to 60% between RiPKS5 and RiPKS6. Differences between the 5'- and 3'-untranslated regions consisted mostly of deletions and insertions rather than significant stretches of nucleotide dissimilarity (Figure 5.6). In the sequenced regions, the 5'-untranslated regions of RiPKS5 and RiPKS6 were identical. This is not a unique observation since the five alfalfa CHS genes (Junghans et al., 1993) also share considerable homology in the 5'- and the 3'-untranslated regions. Similarly, the two soybean CHS genes (chs5 and chs4) have identical coding regions and 3'-untranslated regions, while sharing 63% nucleotide identity in the 5'-untranslated region (Akada et al., 1995).
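The cDNA dimensions quoted in this section can be cross-checked with simple arithmetic: the untranslated regions and the ORF should sum to the total cDNA length, and an ORF that includes its stop codon encodes one residue fewer than its number of codons. The sketch below uses only the figures given in the text for RiPKS5 and RiPKS11; the dictionary layout and function names are illustrative assumptions.

```python
# Lengths in bp as quoted in the text (total excludes the poly(A) tail).
CDNAS = {
    "RiPKS5": {"utr5": 50, "orf": 1176, "utr3": 256, "total": 1482},
    "RiPKS11": {"utr5": 75, "orf": 1176, "utr3": 223, "total": 1474},
}


def parts_add_up(cdna):
    """True if 5'-UTR + ORF + 3'-UTR equals the stated total length."""
    return cdna["utr5"] + cdna["orf"] + cdna["utr3"] == cdna["total"]


def protein_length(orf_bp):
    """Residues encoded by an ORF whose length includes the stop codon."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_bp // 3 - 1


for name, cdna in CDNAS.items():
    print(name, parts_add_up(cdna), protein_length(cdna["orf"]))
```

An ORF of 1176 bp corresponds to 392 codons, that is 391 amino acids plus the stop codon, which matches the protein length listed for the three cDNAs in Table 5.3.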
It is likely that characterization of sequences further upstream of the start codon would reveal significant differences between RiPKS5 and RiPKS6.

Translation of the nucleotide sequences of these three cDNAs showed that they all encode proteins of 391 amino acids, with predicted molecular masses of around 42 kDa and calculated pIs of 6.04-6.28 (Table 5.3). The calculated molecular masses and the theoretical pIs are within the range observed for PKS polypeptides from elicitor-induced alfalfa cell suspension cultures (Dalkin et al., 1990). It is interesting to note that the predicted amino acid sequences contain several potential post-translational modification sites (Table 5.3), along with a potential leucine zipper pattern (PROSITE: PS00029).

I compared the predicted amino acid sequences of the three RiPKSs to each other and to other PKS sequences from plants. The predicted amino acid sequences of the three cDNAs share 96-98% amino acid identity (Table 5.4). RiPKS sequences also shared considerable sequence homology with other Rosaceae PKS sequences: 94-95% amino acid sequence identity with strawberry CHS, and 86-90% amino acid sequence identity with apple CHS. The three RiPKSs also shared considerable similarity with Arabidopsis CHS sequences. The sequencing of the Arabidopsis genome has identified several divergent sequences that have been annotated as "CHS-like" based on homology. The RiPKSs were considerably divergent from these Arabidopsis CHS-like sequences (Table 5.4) and much more closely related to the bona fide CHS sequences (78-87% amino acid sequence identity). The three Rubus sequences shared only 61-78% amino acid sequence identity with PKSs that display activities distinct from CHS (Table 5.4).

5.3.5 Molecular modeling

A comparison of the deduced amino acid sequences of the RiPKSs to strict CHS or STS consensus sequences is shown in Figure 5.7.
The consensus sequence was derived by comparison of sequences whose protein functionality has been corroborated using genetic analysis, or biochemical analysis of the recombinant proteins (Table 5.4). I added the alfalfa CHS2 (Junghans et al., 1993) sequence in this comparison, as this recombinant protein has recently been crystallized, thus allowing the identification of residues important for CHS activity (Ferrer et al., 1999). The crystal structure of alfalfa CHS revealed that there are only four chemically reactive residues, Cysl64, Phe215, His303 and Asn336 (Figure 5.7), required in the active site (Ferrer et al., 1999). These residues are also conserved both in the three RiPKS polypeptides and in the consensus for STS and CHS. Residues Thrl97, Ile254, Gly256, Asn336, Ser338, and His303, which define the active site volume for the alfalfa CHS2, were conserved in the RiPKSs, indicating that the active site architecture of the RiPKSs was similar to that of the alfalfa CHS2 and thus that Rubus enzymes should be capable of accommodating large end products like resveratrol or chalcones. In summary, the high amino acid homology to bona fide CHS sequences, and the close relationship between the primary structure of the RiPKSs and to that of alfalfa CHS2 indicate that the RiPKS sequences likely encode Rubus CHS proteins. 153 Group cDNA 1 E P SI SIS i i P 1 s 1 H i x RIPKS5 ATG TGA 2 W 81 p . s 1 r x RiPKS6 ATG TGA 3 P SIS 1 1 P 1 s 1 H i X RiPKSll ATG 1 TGA 200 bp Figure 5.5 Restriction maps of representative Rubus PKS cDNA clones. The dark line represents the regions of the cDNA characterized by the PCR-based homology search. The positions of the start and the stop codons are denoted within each cDNA. Restriction enzymes used are: E, EcoRI; H, Hindlll; P, Pstl; S, Smal; SI, Sail; X, Xhol. RiPKS5 CCTGCAGCTCCATCTGTCC T 20 R1PKS6 GCAAGACTTTCACAACACAA . 41 RiPKSll CT . . .C .TC• . . .TC . 
• CCT• GATCACTGCAACACCCCAAAC 43 ** * * * * * * * * * RiPKS5 TTCTATT-CATCTTTTCTCCACAGATCAA—AAATGGTGACCGTCGATGAAGTCCGCAAG 77 RiPKS 6 - -- 97 RiPKSll C. .A.C.A.G.ACC.C TTT..G.TC G T 120 ** * * * * _*_****** ** * _ _ * * * * * * * * * * * * * * * * _ * * * * * * * * * * * RiPKS5 GCTCAAAGGGCTGAGGGTCCGGCCACAATCTTGGCGATCGGTACAGCAACTCCTCCCAAC 137 RiPKS6 157 RiPKSll G C TG C A. .G 162 *****_*****_**************__*******_*****_**_*************** RiPKS5 TGTGTCGACCAGAGCACATACCCGGACTACTACTTTCGTATCACCAAGAGTGAGCACAAG 197 R1PKS6 217 RiPKSll ...A A G C C..C 180 ***_*******_*****_*****_***********************_**_********* RiPKS5 ACTGAGCTCAAGGAGAAATTCCAGCGCATGTGTGACAAGTCAATGATCAAGAAGCGTTAC 257 RiPKS6 277 RiPKSll A 282 *********** ************************************************ R1PKS5 ATGTACTTGACGGAAGAAATCCTGAAGGAGAATCCTAGTATGTGTGAGTACATGGCACCT 317 RiPKS6 C 337 RiPKSll A. .T 342 * * * * * * * * * * ********************************_*************** RiPKS5 TCACTCGATGCAAGACAAGACATGGTGGTTGTTGAAATTCCAAAGCTCGGCAAAGAGGCT 377 RiPKS6 A 397 RiPKSll 402 ************************** ********************************* 154 RiPKS5 GCCACAAAGGCCATTAAGGAATGGGGTCAGCCCAAGTCCAAAATCACCCACTTGGTCTTT 437 R1PKS6 T T 517 RiPKSll 523 *****.*****************************************.************ RiPKS5 TGTACCACCAGTGGTGTCGACATGCCCGGGGCGGACTACCAGCTCACTAAACTCTTGGGC 497 RiPKS6 T C..T G..T 517 RiPKSll 540 * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * RiPKS5 CTCCGTCCCTCTGTCAAGCGCCTCATGATGTATCAGCAAGGTTGCTTCGCAGGGGGCACG 557 RiPKS6 C C..G C..A G..T C 577 RiPKSll T 582 *****.***** ** ***************** ** *****_**.*****_********* RiPKS5 GTTCTTCGGTTAGCCAAGGACTTGGCCGAGAACAACAGGGGTGCACGTGTTCTCGTTGTC 617 RiPKS6 . .G. .C G T A T 637 RiPKSll 642 * * . * * . * * * * * ************** ********** ***************t****** RiPKS5 TGCTCCGAAATCACTGCTGTTACCTTTCGTGGGCCTAGCGACACCCACCTTGATATTCTT 677 RiPKS6 C..G ..C T T G. . . . 
[Figure 5.6: nucleotide sequence alignment of RiPKS5, RiPKS6 and RiPKS11 (alignment not reproduced here). Poly(A) tail lengths at the 3' ends: RiPKS5, 20; RiPKS6, 19; RiPKS11, 18.]

Figure 5.6 Nucleotide sequence alignments of the three Rubus PKS cDNA clones. The nucleotide sequence for RiPKS5 is shown in full. For RiPKS11 and RiPKS6 only deviations from RiPKS5 are shown. Identical nucleotides in all the sequences are marked by asterisks in the consensus bottom line and by dots in the RiPKS11 and RiPKS6 sequences. Gaps (-) were introduced for maximum alignment. The translation initiation site and stop sites are indicated in bold and underlined. Bars below the sequences denote the oligonucleotide tracts selected for use as primers designed to distinguish between the individual RiPKS transcripts by RT-cPCR. The numbers at the end denote the length of the poly(A) tail in each cDNA.
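The "deviations only" notation used in Figure 5.6 can be expanded back into full aligned sequences. The following is a minimal sketch of that bookkeeping, not the software used to prepare the figure; the function names and the toy sequences are hypothetical and are chosen only to illustrate the dot and asterisk conventions described in the caption.

```python
from typing import Sequence


def expand_deviation_row(reference: str, row: str, match_char: str = ".") -> str:
    """Rebuild a full aligned sequence from a row that shows only deviations.

    A dot means "same as the reference at this column"; any other symbol
    (a nucleotide or a gap '-') replaces the reference symbol.
    """
    if len(reference) != len(row):
        raise ValueError("reference and row must have the same aligned length")
    return "".join(ref if sym == match_char else sym
                   for ref, sym in zip(reference, row))


def consensus_line(rows: Sequence[str]) -> str:
    """Return '*' for columns identical in every row, ' ' otherwise."""
    return "".join("*" if len(set(column)) == 1 else " "
                   for column in zip(*rows))


# Toy aligned fragment (hypothetical, not taken from the RiPKS sequences).
reference = "ATGGTC-ACC"
deviations = "..A...G..T"

full_row = expand_deviation_row(reference, deviations)
print(full_row)
print(consensus_line([reference, full_row]))
```

Note that in this sketch a column in which every sequence carries a gap would also be marked as identical; whether the original figure treats such columns that way is not stated in the caption.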
Table 5.3 Features of the predicted proteins corresponding to the three Rubus PKS cDNAs. Potentially relevant modification sites of the proteins were analyzed by comparison to known patterns in the PROSITE database. The amino acid positions of these modifications are indicated for each protein. The numbers in brackets indicate the PROSITE documentation code for each type of modification.

Feature                   RiPKS5    RiPKS6    RiPKS11
Number of amino acids     391       391       391
Predicted MW (Da)         42,862    42,774    42,830
Theoretical pI            6.28      6.04      6.04

Potential modification sites (amino acid positions)
Glycosylation sites
  N-glycosylation site (PS00001): 336-339 in all three proteins
Phosphorylation sites
  cAMP- and cGMP-dependent protein kinase phosphorylation site (PS00004): 350-353 in all three proteins
  Protein kinase C phosphorylation site (PS00005): 153-155, 197-199 in all three proteins
  Casein kinase II phosphorylation site (PS00006): 3-6, 35-38, 44-47, 82-85, 133-136, 204-207, 282-285, 360-363 in all three proteins
N-myristoylation sites
  N-myristoylation site (PS00008):
    RiPKS5: 118-123, 149-154, 163-168, 252-257, 335-340, 357-362, 364-369, 368-373
    RiPKS6: 118-123, 149-154, 163-168, 252-257, 335-340, 364-369, 368-373
    RiPKS11: 118-123, 149-154, 163-168, 252-257, 273-278, 335-340, 357-362, 364-369, 368-373
Others
  Chalcone and stilbene synthases active site (PS00441): 156-172 in all three proteins
  Leucine zipper pattern (PS00029): 310-331 in all three proteins

Table 5.4 Amino acid sequence similarity of full-length Rubus PKS cDNAs. The RiPKS sequences were compared to selected PKS sequences from other species. The sequences were sub-grouped as mentioned in the table.
a Rubus idaeus (Ri), Arabidopsis thaliana (At), Antirrhinum majus (Am), Arachis hypogaea (Ah), Fragaria x ananassa (Fr), Gerbera hybrida (Gh), Hordeum vulgare (Hv), Hydrangea macrophylla (Hm), Malus sp.
(Ma), Medicago sativa (Ms), Pueraria lobata (Pl), Petunia x hybrida (Ph), Petroselinum crispum (Pc), Phalaenopsis sp. (Ps), Pinus strobus (Pst), Pinus sylvestris (Psy), Ruta graveolens (Rg), Sinapis alba (Sa), Vitis vinifera (Vv), Zea mays (Zm).
b Arabidopsis thaliana sequences bearing homology to CHSs, identified by genome sequencing efforts of the Arabidopsis Genome Initiative Group. Data as of December 1999.
c Comparison to partial-length sequences.
d ACS, acridone synthase; CHS, chalcone synthase; CTAS, coumaroyltriacetic acid synthase; DPS, dihydropinosylvin synthase; HCHS, homoeriodictyol/eriodictyol chalcone synthase; PS, pyrone synthase; RS, resveratrol synthase; STS, stilbene synthase.

               RiPKS5        RiPKS6        RiPKS11
Sequence (a)   %id  %sim     %id  %sim     %id  %sim     GenBank accession
RiPKS5         -    -        97   98       98   98
RiPKS6         97   98       -    -        96   98
RiPKS11        98   98       96   98       -    -

Polyketide synthases from Rosaceae characterized as CHS based on homology
FrCHSY         94   95       96   98       94   95       U19942 (c)
MaCHSY         86   90       87   90       86   90       X68977

Arabidopsis clones annotated as belonging to the CHS superfamily (b)
AtPKS1         38   57       39   56       38   55       U89559 (b)
AtPKS2         39   56       38   57       38   57       AF069299 (b)
AtPKS3         38   56       38   56       37   56       AL079347 (b)

Bona fide chalcone synthases
AtCHS          83   89       83   89       82   82       M2308
AmCHS          84   90       85   91       85   91       X03710
GhCHS1         82   88       83   88       80   88       Z38096
GhCHS3         81   89       81   89       81   87       Z38098
HvCHS1         79   88       80   89       79   88       X58339
HmCHS          86   91       87   92       86   91       AB011467
MsCHS2A        82   90       83   90       82   90       L02902
PcCHS          80   87       81   89       80   88       V01538
PhCHSA         87   92       87   92       87   92       X14591
PlCHS          84   91       85   91       84   91       D10223
PsyCHS         81   87       81   87       81   88       X60754
PstCHSY        81   87       81   87       81   88       AJ004800
SaCHSAA        83   89       83   89       82   89       X14314
ZmC2           81   89       81   89       81   89       X60205
ZmWhp          78   87       78   87       78   88       X60204

Polyketide synthases with activities distinct from chalcone synthases (d)
AhRS1          69   82       71   82       70   83       P20178
AhRS2          77   82       70   83       71   83       P20077
AhRS3          70   83       71   83       71   83       P51069
GhPS2          68   81       69   82       67   80       Z38097
GhCHS26        69   80       69   80       69   80       X91340
HvCHS2         72   83       72   84       71   82       Y09233
HmCTAS         70   81       70   81       70   82       AB011468
PstSTS1        67   79       67   80       67   80       P48407
PstSTS2        68   80       78   85       79   85       P48408
PsyDPS         66   79       66   97       65   78       Q02323
PsySTS         65   78       66   79       65   79       S21123
PstCHS-like    78   85       78   85       78   85       AJ002156
PsBBS          61   76       61   76       61   75       P53416
RgACS          68   82       68   82       67   82       S60241
VvSTS1         72   83       72   83       84   88       P28343
VvSTS          72   83       72   82       72   83       S11044
VvSTSYM        72   83       72   83       72   83       X76892

[Figure 5.7: alignment of the deduced amino acid sequences of RiPKS5, RiPKS11 and RiPKS6 with alfalfa MsCHS2 and the consensus lines Co1 and Co2 (alignment not reproduced here).]

Figure 5.7 Alignment of the deduced amino acid sequences for the RiPKS cDNAs. The RiPKS5 sequence has been shown in full. For RiPKS6 and RiPKS11 only deviations from RiPKS5 have been shown.
Co1, strict consensus of CHS sequences. Co2, strict consensus of STS sequences. Sequences used to derive the consensus in each case are described in Table 5.4. The alfalfa CHS2 sequence has been shown in full, since this sequence was used for the structure elucidation of CHS. The active site residues are shown in blue and residues defining the active site volume of CHS are in red; amino acids in green are required both for defining the active site volume and for catalysis.

5.3.6 Phylogenetic origin of Rubus polyketide synthases

To study the phylogenetic relationships of the RiPKS gene family to PKS sequences from other plant species, I performed a parsimony analysis at the amino acid level. Again, I focused on PKSs for which a PKS-type activity had been confirmed by genetic or biochemical tests (Table 5.4). The analysis yielded a single most parsimonious tree with a consistency index of 0.804 and a retention index of 0.807 (Figure 5.8). I rooted the tree with the Streptomyces RppA gene (GenBank, AB018047), whose encoded protein shares very low amino acid sequence similarity (typically 24-30% identity) with plant PKSs, but forms polyketides in a manner resembling that used by CHS (Funa et al., 1999). The three RiPKS sequences clustered together, indicating that they had probably evolved by species-specific gene duplication event(s) from a single ancestor. It was apparent from the reconstruction that some CHS sequences were more closely related to non-CHS sequences from the same plants than to CHS sequences from different plants. For example, the two PS sequences (PS26 and PS2) from Gerbera clustered with the CHS sequence from the same species. Within pine, the CHS and non-CHS sub-clusters together formed a monophyletic group of related sequences. The high bootstrap values of the corresponding nodes support the robustness of these relationships.
The appearance of monocots and dicots intermingled in two nested monophyletic groups indicates that PKS duplication and divergence predate the monocot/dicot divergence.

[Figure 5.8: phylogenetic tree (not reproduced here). Terminal taxa: Arabidopsis CHS, Sinapis CHS, Zea CHS2, Zea CHS1, Hordeum CHS1, Gerbera CHS3, Gerbera CHS1, Petroselinum CHS, Rubus PKS5, Rubus PKS11, Rubus PKS6, Gerbera PS26, Gerbera PS2, Pueraria CHS, Medicago CHS2, Arachis RS1, Arachis RS3, P. sylvestris CHS, P. strobus CHSY, P. strobus CHS-like, P. sylvestris DPS, P. sylvestris STS, P. strobus STS1, P. strobus STS2, Petunia CHSA, Antirrhinum CHS, Vitis STS1, Vitis STS, Vitis STSY, Phalaenopsis BBS, Hordeum CHS2, Ruta ACS, Streptomyces CHS-like. Scale bar: 100 changes.]

Figure 5.8 Phylogenetic relationships among plant PKS proteins. The single most parsimonious tree was constructed from an amino acid alignment, using a heuristic search within PAUP4.0.b2. The Streptomyces griseus CHS-like sequence was used to root the tree. Branch lengths are indicated above the branch lines and clustering percent support values derived from 1000 bootstrap replicates are underlined. The tree has a consistency index of 0.801. Refer to Table 5.4 for further abbreviations.

5.3.7 Functional expression of the three RiPKS cDNAs

The recombinant RiPKS5, RiPKS6, and RiPKS11 proteins were expressed in E. coli as fusion proteins either containing or lacking an N-terminal His6-tag. Expression of the recombinant proteins was not affected by the presence or absence of the N-terminal His6-tag (data not shown). The recombinant proteins had an apparent molecular mass of approximately 42,000 Da and were recognized on western blots by polyclonal antibodies raised against parsley CHS (Kreuzaler et al., 1979) (Figure 5.9).
The recombinant proteins present in crude bacterial extracts were tested for chalcone synthase/benzalacetone synthase activity with malonyl CoA (10 µM) and p-coumaryl CoA as substrates. The end-product of the chalcone synthase reaction, naringenin chalcone, is unstable and undergoes non-enzymatic conversion to the corresponding flavanone. Naringenin can therefore be considered a suitable marker for monitoring CHS-type PKS activity (Hahlbrock et al., 1970). Figure 5.10 shows that with all three recombinant proteins, naringenin was the major product of the reaction, as detected by the co-migration of the radioactive band on TLC with an authentic sample of naringenin. Negligible benzalacetone synthesis was observed, and control extracts from bacteria transformed with the vector lacking inserts displayed no chalcone synthase activity (Figure 5.10). Control assays using recombinant proteins incubated with a single substrate (malonyl CoA) also did not yield naringenin. These results are consistent with the typical properties of CHS-type PKS (Hrazdina et al., 1976).

Figure 5.9 Analysis of recombinant RiPKS proteins expressed in E. coli. SDS/PAGE gel (A) and western blot detection of a gel run in parallel (B). The gel was stained with Coomassie Blue (A) or the proteins were blotted onto a PVDF membrane and developed with antiserum specific to parsley CHS (B). Soluble proteins were isolated from IPTG-induced E. coli harboring plasmid pQE30 (lane 1), pQE30-PKS5 (lane 2), pQE30-PKS6 (lane 3), or pQE30-PKS11 (lane 4). M, molecular mass standard in kD.

[Figure 5.10: three TLC panels (RiPKS5, RiPKS6, RiPKS11), lanes 1-4 in each panel.]

Figure 5.10 TLC analysis of products of the assay with the three Rubus PKS recombinant proteins expressed in E. coli. Ethyl acetate-soluble assay products were separated by TLC and radioactive products labeled from [2-14C] malonyl CoA visualized by autoradiography. The substrates used were p-coumaryl CoA as the starter unit and malonyl CoA as the extender moiety.
Lane 1, vector-only E. coli control with both CoA esters; lane 2, RiPKS5, 6 or 11 recombinant proteins incubated with malonyl CoA only; lane 3 and lane 4, Rubus recombinant proteins incubated with both CoA esters. The relative mobility of the radioactive product was compared against authentic samples of the expected end-products. O, origin of the chromatogram; F, solvent front of the chromatogram; N, position of authentic naringenin; B, position of authentic benzalacetone.

5.3.8 Properties of the recombinant Rubus PKS proteins

The capacity of the three recombinant proteins to use different hydroxycinnamyl CoA esters as starter units is shown in Figure 5.11. All three recombinant proteins showed an apparent preference for p-coumaryl CoA as the starter unit, relative to other hydroxycinnamyl CoA esters. The products seen with other hydroxycinnamyl CoA esters were not characterized further. I also tested the effect of increasing concentrations of β-mercaptoethanol on the activity of the Rubus recombinant PKS proteins. β-Mercaptoethanol has been reported to reduce the number of condensation cycles that CHSs can successfully complete, which results in increasing release of intermediate condensation products (Hinderer and Seitz, 1985). More specifically, β-mercaptoethanol has been reported to exert a differential effect on Rubus BAS and CHS. Incubations with 4 mM β-mercaptoethanol reduced the CHS activity by 50%, whereas 4 mM β-mercaptoethanol increased the BAS activity by 35% (Borejsza-Wysocki et al., 1996). Incubations of the recombinant Rubus PKS proteins with increasing concentrations of β-mercaptoethanol (Figure 5.12) showed that even 1 mM β-mercaptoethanol was sufficient to reduce the Rubus CHS activity by 30%. With RiPKS5 and RiPKS11, no simultaneous increase in formation of any other product was detected with increasing concentrations of β-mercaptoethanol.
However, as the concentration of β-mercaptoethanol was increased, the reaction mixture containing RiPKS6 yielded a new radioactive product with an Rf value of 0.66.

[Figure 5.11: three TLC panels (RiPKS5, RiPKS6, RiPKS11) with the positions O, N, B and F marked.]

Figure 5.11 Conversion of hydroxycinnamoyl CoA starter esters by RiPKS recombinant proteins. Ethyl acetate-soluble assay products were separated by TLC and radioactive products labeled from [2-14C] malonyl CoA visualized by autoradiography. The starter substrates used were p-coumaryl CoA (lane 1), cinnamyl CoA (lane 2), caffeyl CoA (lane 3) and ferulyl CoA (lane 4). O, origin of the chromatogram; F, solvent front of the chromatogram; N, position of authentic naringenin; B, position of authentic benzalacetone.

[Figure 5.12: three TLC panels (RiPKS5, RiPKS6, RiPKS11); horizontal axis, β-mercaptoethanol (mM): 2, 4, 6, 8, 10, 12.]

Figure 5.12 Effect of β-mercaptoethanol on the activities of recombinant RiPKS proteins. Recombinant proteins were incubated with increasing concentrations of β-mercaptoethanol for 30 min at 4°C and then assayed with p-coumaryl CoA and malonyl CoA. Ethyl acetate-soluble assay products were separated by TLC and radioactive products labeled from [2-14C] malonyl CoA visualized by autoradiography. O, origin of the chromatogram; F, solvent front of the chromatogram; N, position of authentic naringenin; B, position of authentic benzalacetone.

5.3.9 Modification of the enzymatic activity of the recombinant Rubus PKS proteins

As mentioned in Section 5.3.7, I had expressed the recombinant proteins either containing or lacking an N-terminal His6-tag. The histidine tag allows the rapid selective purification of proteins by metal chelation chromatography (Porath et al., 1975; Sulkowski, 1985). The technology is based on selective binding of the tagged protein to a resin charged with Ni2+ cations, and subsequent recovery by elution with an imidazole-rich buffer.
When purifying the His-tagged version of recombinant RiPKS in this fashion, and assaying the directly eluted recombinant protein for chalcone synthase activity, I detected the formation of a new radioactive product (Rf = 0.86) that co-migrated with an authentic sample of benzalacetone (Figure 5.13B, lane 3). At the same time, the recombinant protein assayed in this imidazole-containing buffer displayed negligible chalcone synthase activity. These were very intriguing results, since it appeared that this treatment in vitro had changed a bona fide CHS enzyme into a BAS-type enzyme. Furthermore, I found it possible to restore the chalcone synthase activity of the recombinant protein by exchanging the elution buffer for a low-salt buffer (50 mM HEPES, pH 7.5) (Figure 5.13B, lane 4). This conversion could be repeated by again switching the protein between the two buffers (Figure 5.13B, lanes 5, 6 and 7). The recombinant protein apparently lost its CHS activity in the high-salt elution buffer, and regained the CHS activity in the low-salt buffer. This dramatic switch from a CHS to a BAS-like activity and back was very reproducible.

The metal chelate column elution buffer consisted of 1 M imidazole and 0.5 M NaCl in 40 mM Tris, pH 7.9. I therefore incubated a crude sample of the recombinant protein with 0.5 M NaCl or 1 M imidazole and monitored their effect on the enzymatic activity of the recombinant protein. While NaCl did not affect the CHS-type activity of the recombinant protein, adding increasing concentrations of imidazole to a crude recombinant protein extract (Figure 5.14) led to a decrease in the CHS activity with a concomitant increase in BAS-like activity. At a concentration of 0.4 M imidazole, the CHS activity declined by >98% while at the same time the BAS-like activity had increased by >98%.
The BA-like product was formed by the recombinant protein only in the presence of both substrates (Figure 5.14, lane 10); control incubations of malonyl CoA and p-coumaryl CoA with 1 M imidazole yielded no products (Figure 5.14, lane 9). Incubations consisting of crude bacterial extracts with vector only (Figure 5.14, lanes 7 and 8) also showed neither CHS nor BAS-like activity. Although only the data for one of the Rubus recombinant proteins are shown here, all three recombinant proteins, with or without the N-terminal His6-tag, behaved in an identical manner.

To establish whether this phenomenon was unique to Rubus PKS genes, I analyzed the behavior of the Gerbera CHS1 recombinant protein (Helariutta et al., 1995). This recombinant protein also behaved in a similar manner (Figure 5.15), although it required higher concentrations of imidazole to completely switch the activity from a CHS to a BAS-like activity.

Figure 5.13 Effect of elution buffer on the activity of the Rubus PKS recombinant protein. (A) Purification of His6-tagged Rubus PKS6 with His.Bind resins. Proteins were visualized by Coomassie Blue staining. Lane 1, crude bacterial protein in 50 mM HEPES, pH 7.5; lane 2, flow-through from the His.Bind resin; lane 3, eluate with the imidazole-containing elution buffer; lane 4, concentrated protein in 50 mM HEPES, pH 7.5. (B) TLC analysis of the recombinant protein at various purification stages. Ethyl acetate-soluble assay products were separated by TLC and radioactive products labeled from [2-14C] malonyl CoA visualized by autoradiography. The substrates used were p-coumaryl CoA as the starter unit and malonyl CoA as the extender unit. Lanes 1-4, activity of the proteins at the purification steps described in A.
Lane 5, activity of the protein purified in step 4 resuspended again in imidazole elution buffer; lane 6, activity of the protein from lane 5 resuspended in 50 mM HEPES, pH 7.5; lane 7, activity of the protein from lane 6 resuspended again in imidazole elution buffer. O, origin of the chromatogram; F, solvent front of the chromatogram; N, position of authentic naringenin; B, position of authentic benzalacetone.

[Figure 5.14: TLC autoradiogram, lanes 1-11. Imidazole (M): lane 1, 0; lane 2, 0.1; lane 3, 0.2; lane 4, 0.4; lane 5, 0.8; lanes 6-11, 1.0.]

Figure 5.14 Products from the assay of recombinant RiPKS6 in the presence of increasing concentrations of imidazole. Ethyl acetate-soluble assay products were separated by TLC and radioactive products labeled from [2-14C] malonyl CoA visualized by autoradiography. The substrates used were p-coumaryl CoA as the starter unit and malonyl CoA as the extender unit. Lanes 1-6 and lane 11, RiPKS6 recombinant proteins incubated with both CoA esters in the presence of imidazole as indicated. Lane 7, vector-only E. coli control with malonyl CoA ester; lane 8, vector-only E. coli control with both CoA esters; lane 9, control with both CoA esters in the presence of 1 M imidazole and no proteins; lane 10, RiPKS6 incubated with malonyl CoA in the presence of 1 M imidazole. O, origin of the chromatogram; F, solvent front of the chromatogram; N, position of authentic naringenin; B, position of authentic benzalacetone.

[Figure 5.15: TLC autoradiogram. Imidazole (M): 0, 0.1, 0.5, 1.0.]

Figure 5.15 Effect of increasing concentrations of imidazole on the Gerbera CHS1 recombinant protein. Ethyl acetate-soluble assay products were separated by TLC and radioactive products labeled from [2-14C] malonyl CoA visualized by autoradiography. The substrates used were p-coumaryl CoA as the starter unit and malonyl CoA as the extender unit. O, origin of the chromatogram; F, solvent front of the chromatogram; N, position of naringenin; B, position of benzalacetone.
5.3.10 Induction of PKS activity in Rubus cell-suspension cultures

To investigate the relationships between BAS and CHS activities in vivo, I induced raspberry cell-suspension cultures by treatment with Bacto-yeast extract. Addition of yeast extract caused a rapid increase in CHS and BAS activity that peaked at 40 h for CHS and 120 h for BAS (Figure 5.16). Compared to CHS, BAS activity remained elevated for a longer period. No CHS or BAS activity was detected in control incubations at either 0 h or 120 h. The appearance of elevated CHS activity in elicited cultures contradicts the results of Borejsza-Wysocki et al. (1996), who reported that BAS was preferentially induced in response to elicitation in this same Rubus cell-culture line, while CHS activity remained at a basal level.

[Figure 5.16: TLC autoradiogram. Lanes: C, 1, 2, 5, 10, 20, 40, 60, 120, 160; axis, time after elicitation (h).]

Figure 5.16 Changes in the activities of CHS and BAS in raspberry cell-suspension cultures induced with yeast extract. TLC analysis of products of the assay at various time points after elicitation. Ethyl acetate-soluble assay products were separated by TLC and radioactive products labeled from [2-14C] malonyl CoA visualized by autoradiography. The substrates used were p-coumaryl CoA as the starter unit and malonyl CoA as the extender unit. C, control at 60 h after elicitation with water. O, origin of the chromatogram; F, solvent front of the chromatogram; N, position of naringenin; B, position of benzalacetone.

5.3.11 In vivo relevance of the effects of imidazole

While I could not detect a specifically BAS-type PKS activity among the three Rubus recombinant proteins, I could detect the synthesis of both putative BA and naringenin in protein extracts from elicited cell cultures of Rubus. This suggested that the cell cultures were capable of utilizing p-coumaroyl CoA and malonyl CoA to form both naringenin and benzalacetone, and therefore must have both CHS and BAS-type activity.
Borejsza-Wysocki and Hrazdina (1994, 1996) reported a similar result: they could detect BAS activity in ripening fruits of Rubus and in elicited Rubus cell cultures. Borejsza-Wysocki and Hrazdina (1994) further noted that BAS activity increased as the fruit matured, with the highest activity being detected in berries that were slightly overripe and partially dehydrated. Their attempts at purification of BAS yielded a protein sample that was enriched 172-fold for BAS activity but still had contaminating CHS activity. 2-D gel analysis of the purified protein fraction showed that it contained multiple proteins with similar Mr values (Borejsza-Wysocki and Hrazdina, 1996).

The unusual in vitro behaviour of the recombinant PKS protein in the presence of 1 M imidazole suggested a possible alternative origin for the production of BA in raspberry. It is conceivable that BA is formed in vivo by a CHS-type enzyme whose activity is modified by one or more factors it encounters in the cell micro-environment. One such factor might be a change in the cellular pH. Raspberry fruits undergo rapid physiological changes during ripening. The soluble solids concentration, and the concentrations of anthocyanins and benzalacetone-derived flavour components, all increase, while titratable acidity decreases (Perkins-Veazie et al., 1992; Borejsza-Wysocki and Hrazdina, 1994). Since pH can have a profound effect on enzyme catalysis, I tested the effect of changing pH on the activity of recombinant RiPKS. The assay was conducted at 37°C for 30 min using pH values of 4-9. At the end of the assay, each reaction was acidified with acetic acid and the products were extracted into ethyl acetate and analyzed as discussed in Materials and Methods. Remarkably, the recombinant protein was active at all pH values tested, with the highest activity at pH 9.0 (Figure 5.18).
CHS purified from cell cultures of parsley has been reported to have optimal activity at pH 8.0 (Kreuzaler and Hahlbrock, 1975).

I next tested whether other metabolites with a pKa or physico-chemical properties similar to those of imidazole could cause a similar shift in the enzymatic activity of the recombinant RiPKS. I tested the effects of glycine betaine and several amino acids (Figure 5.17) on the recombinant RiPKS. Some of these metabolites, such as proline (Stines et al., 1999) and glycine betaine (Rhodes and Hanson, 1993), also accumulate to high concentrations in plants and thus could conceivably be modifiers of the enzymatic activity as seen in vitro. As shown in Figure 5.19, glycine betaine, proline, tryptophan, lysine, aspartic acid, histidine, tyrosine, and phenylalanine (at 1.0 M or saturating concentrations) either had no effect on the chalcone synthase-type activity of the recombinant proteins or were inhibitory. None, however, was capable of inducing the shift to a BAS-like reaction.

Having established that neither imidazole analogues nor a change in pH was sufficient to alter the CHS activity of the recombinant protein, I next focused my attention on other plausible mechanisms by which such a switch could occur in vivo. Kennedy et al. (1999) have reported that the biosynthesis of a polyketide, lovastatin, in Aspergillus terreus requires the action of two PKSs. The biosynthesis of lovastatin was dependent on the interaction of a type I polyfunctional enzyme (lovastatin nonaketide synthase) and a type II monofunctional enzyme (LovC). The cloning and characterization of LovC revealed that it encoded a protein of 363 aa that has high similarity to the product of the Cochliobolus carbonum toxD gene (Ann et al., 1998), to a ripening-induced protein from strawberry (GenBank, 2465008) and to an auxin-induced protein from mung bean (GenBank, 1184121).
This led me to speculate that another protein might exist in raspberry tissues that would have the ability to modify CHS activity in vivo. As a preliminary test of this hypothesis, I designed a simple reconstruction experiment, in which the recombinant protein was incubated together with crude Rubus plant and cell-culture protein extracts and these mixtures were assayed for their PKS activity. Figure 5.20 shows that mixtures of recombinant proteins with plant or cell-culture extracts retained the ability to form naringenin but displayed no BAS activity. The various crude plant extracts themselves also displayed no BAS activity, but CHS activity could be detected in extracts from leaf, shoots, green fruits and partially ripe fruits (Figure 5.20).

[Figure 5.17: chemical structures (not reproduced here) of imidazole, glycine betaine, proline, histidine, lysine, aspartic acid, tryptophan and phenylalanine.]

Figure 5.17 Structures of chemicals tested for their ability to imitate the effects of imidazole on RiPKS activity.

[Figure 5.18: TLC autoradiogram. Axis, pH: 4, 5, 6, 7, 8, 9.]

Figure 5.18 Effect of assay pH on the enzymatic activity of recombinant RiPKS6. Ethyl acetate-soluble assay products were separated by TLC and radioactive products labeled from [2-14C] malonyl CoA visualized by autoradiography. The substrates used were p-coumaryl CoA as the starter unit and malonyl CoA as the extender unit. O, origin of the chromatogram; F, solvent front of the chromatogram; N, position of authentic naringenin; B, position of authentic benzalacetone.

Figure 5.19 Effect of different amino acids and amino acid derivatives on the enzymatic activities of recombinant RiPKS6. Ethyl acetate-soluble assay products were separated by TLC and radioactive products labeled from [2-14C] malonyl CoA visualized by autoradiography. The substrates used were p-coumaryl CoA as the starter unit and malonyl CoA as the extender unit.
Concentrations as mentioned (or saturated solutions) of the various amino acids or their derivatives were added to 20 µg of recombinant protein and the mixtures were assayed for activity. Metabolites used were 1 M glycine betaine (lane 1), proline (lane 2), 1 M tryptophan (lane 3), lysine (lane 4), 1 M histidine (lane 5), aspartic acid (lane 6), tyrosine (lane 7), and 1 M phenylalanine (lane 8). C, control RiPKS6 assay. O, origin of the chromatogram; F, solvent front of the chromatogram; N, position of authentic naringenin; B, position of authentic benzalacetone.

[Figure 5.20: TLC autoradiogram with a Ci lane followed by paired C and M lanes for each raspberry tissue and cell-culture extract.]

Figure 5.20 Effect of adding protein extracts from raspberry tissue and cell culture to a crude sample of RiPKS6. Recombinant bacterial extract (15 µg) was assayed for CHS and BAS activity in the presence of raspberry tissue or cell-culture extracts (30 µg protein). The substrates used were p-coumaryl CoA as the starter unit and malonyl CoA as the extender unit. Ci, activity of 15 µg of crude recombinant RiPKS6; C, control activity in 30 µg of raspberry tissue or cell-culture extracts; M, mixed activity of the recombinant RiPKS6 incubated with raspberry tissue or cell-culture extract. The raspberry tissue extracts were from leaf, shoot and various developmental stages of raspberry fruits. 40 h and 80 h represent extracts prepared from raspberry cell cultures induced with yeast cell-wall for 40 h and 80 h. O, origin of the chromatogram; F, solvent front of the chromatogram; N, position of authentic naringenin; B, position of authentic benzalacetone.

5.3.12 Developmental regulation of the RiPKS cDNA transcripts

The expression patterns and the absolute levels of the RiPKS5, RiPKS6 and RiPKS11 mRNAs were analyzed using RT-cPCR with gene-specific primers. Under the PCR conditions used, these primers only amplified their cognate plasmid cDNAs.
To examine the pattern of expression of the RiPKS genes, I assayed a range of organs, including several developmental stages of flowers and fruits (Figure 5.21). RiPKS5 was expressed in all the organs tested, and was expressed most actively in young leaves. In contrast, RiPKS6 and RiPKS11 had very low expression in young leaves compared to their expression in other organs. The three RiPKS transcripts also differed in their temporal patterns of expression during flower development and fruit maturation. In fruits, RiPKS11 expression was particularly high in developmental stages that also have higher anthocyanin levels (stages III, IV and V). During flower development, RiPKS11 was more actively transcribed in fully mature flowers (stage II) than in buds (stage I) or fertilized flowers (stage III). The kinetics of expression of RiPKS5 and RiPKS6 in developing flowers generally resembled each other, and the accumulation of these transcripts did not follow the process of fruit development. Most notably, across the five developmental stages of fruit that I sampled, RiPKS5 and RiPKS6 did not show a temporal relationship to anthocyanin accumulation, in contrast to RiPKS11. The three RiPKS transcripts thus follow unique developmental patterns. The semi-quantitative method used to analyze the developmental regulation of the three RiPKS cDNAs could not be used to compare the expression of specific RiPKS transcripts within a single tissue. To obtain that information, I used the RT-cPCR assay to quantitatively determine the amounts of each transcript in stage III fruits. Once I had determined the absolute levels of expression of each of the three transcripts in that tissue, it was possible to normalize the amounts in other tissues by comparing the ratio (target:competitor) obtained for each tissue to the ratio (target:competitor) observed in stage III fruits.
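The two-step quantification described above can be sketched in code. This is a minimal illustration, not the analysis actually used in this study. It assumes that log10(competitor/target) is regressed on log10(competitor amount), so that the equivalence point is where the fitted line crosses Y = 0 (as in the legend of Figure 5.22), and that a tissue's transcript level scales linearly with its target:competitor ratio relative to stage III fruit. The function names and any band ratios passed to them are hypothetical.

```python
import math


def equivalence_point(competitor_amol, ratio_c_to_t):
    """Estimate the target amount (amol) from a competitor dilution series.

    Fits log10(c/t) = a + b * log10(Nc) by ordinary least squares and
    returns the competitor amount at which the fitted line crosses Y = 0,
    i.e. where competitor and target products are equimolar.
    """
    xs = [math.log10(n) for n in competitor_amol]
    ys = [math.log10(r) for r in ratio_c_to_t]
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two matched dilution points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or sxy == 0:
        raise ValueError("cannot locate an x-intercept from these points")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return 10 ** (-intercept / slope)


def normalize_levels(ratios_t_to_c, reference_tissue, reference_level):
    """Scale semi-quantitative target:competitor ratios to absolute levels.

    reference_level is the absolute level measured in reference_tissue;
    every other tissue is scaled by its ratio relative to the reference.
    """
    ref_ratio = ratios_t_to_c[reference_tissue]
    if ref_ratio <= 0:
        raise ValueError("reference ratio must be positive")
    return {
        tissue: reference_level * ratio / ref_ratio
        for tissue, ratio in ratios_t_to_c.items()
    }
```

For example, with a hypothetical target:competitor ratio of 0.5 in stage III fruit and 1.0 in leaves, and the stage III RiPKS5 level of 160 pmol/mg reported in the text, `normalize_levels` would assign leaves twice the stage III level.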
As determined by this analysis (Figure 5.22), the absolute levels of RiPKS transcripts in stage III fruits were RiPKS5 > RiPKS11 > RiPKS6 (160 : 40 : 0.8 pmol cDNA/mg tRNA). Comparison of the relative abundance of the three RiPKS transcripts in different tissues revealed that RiPKS5 mRNA is more abundant than either RiPKS6 or RiPKS11 in all the tissues (Table 5.5). Quantitatively, RiPKS6 was expressed only at very low levels in all the tissues investigated. During the screening of the cDNA library representing stage III fruits, I also observed that RiPKS5 was the most abundant cDNA, which is consistent with the absolute levels of specific RiPKS transcripts determined by RT-cPCR.

Figure 5.21 Semi-quantitative RT-cPCR analysis of the accumulation of specific RiPKS transcripts in different organs of Rubus. A) RT-cPCR was performed using 100 ng of total RNA isolated from young leaves, shoots, roots and different developmental stages of flowers and fruits. Following 32 cycles of amplification, the products (50% of each amplification mix) were resolved in a 3% TAE-agarose/EtBr gel. B) The relative amounts of target (t) and competitor (c) amplification product were calculated, and the ratio of the two products has been graphed. Similar results were obtained in two independent experiments. The expression level of a given RiPKS gene can be compared between tissues, but expression amongst the three RiPKS genes cannot be compared within a tissue in this graph. The intensity of the bands was normalized to the average intensity of RiHis3 products as a control for starting RNA equivalence.

[Figure 5.21: A) gel panels for RiPKS5, RiPKS6, RiPKS11, RiHis3 and a RiHis3 (-)RT control, showing target and competitor bands against a 500 bp marker; B) graph of target:competitor ratios]

Figure 5.22 Quantification of the absolute levels of three RiPKS mRNAs in developmental stage III of fruits. The logarithmic molar ratio of the PCR products generated using a constant amount of cDNA template in the presence of a serial dilution of the gene-specific competitor (1000 attomoles to 1 × 10⁻¹ attomoles) was plotted against the amount of competitor added per tube, and linear regression was used to fit a straight line. The equivalence point (Y = 0) inferred for each of the three RiPKS cDNAs indicates the target cDNA concentration of each sample. c, competitor; t, target; Nc, moles of competitor.

RiPKS transcript levels (pmol/mg of tRNA)

Organ          RiPKS5    RiPKS6    RiPKS11
Leaves         390       0.1       0.4
Shoots         288       0.5       5
Roots          206       0.5       15
Flowers I      190       0.6       2
Flowers II     182       0.6       6
Flowers III    169       0.6       3
Fruits I       279       1         10
Fruits II      210       0.4       3
Fruits III     159       0.8       40
Fruits IV      361       2         101
Fruits V       266       1         36

Table 5.5 Absolute levels of specific RiPKS transcripts in different organs. The absolute levels of the three RiPKS transcripts were determined in developmental stage III of fruits (Figure 5.22). The levels of transcripts of each cDNA in other organs were then determined based on the relative ratio of expression of each cDNA in those tissues compared to that seen in stage III fruits (Figure 5.21).

5.4 Discussion

5.4.1 Rubus PKS multigene family

The study described in this chapter represents one of the most comprehensive analyses of the plant PKS gene family conducted to date. Previous work with this gene family has largely concentrated on characterizing members encoding specific PKSs such as CHS, STS, PS and ACS (reviewed in Schroder, 1997; Dixon, 1999). In most cases, only a few members have been characterized at the sequence level in any given analysis. Here, I present sequence data for eleven members that make up the Rubus PKS multigene family. In addition, I have examined the full-length coding sequence, the catalytic properties of the encoded proteins and the evolutionary origins of three gene family members that are related to fruit-ripening.
Since my objective was to characterize the full complement of Rubus PKS genes, the PCR primers were designed to target evolutionarily conserved regions that are found within a range of PKSs with distinct catalytic capabilities. The homogeneity of these various primers allowed me to use the same amplification protocol for all the primer combinations. All the primer sets allowed efficient amplification of the desired products with high specificity. The detection of ten PKS gene family members in Rubus is not surprising. Large gene families seem to be the norm for PKS genes in plants. Up to eight genes classified as CHS (based on homology) have been reported from legumes such as bean (Ryder et al., 1987), soybean (Estabrook and Sengupta-Gopalan, 1991), and alfalfa (Junghans et al., 1993). PKSs other than CHS have also been reported to be encoded by families of homologous genes. In Vitis, at least seven closely linked STS genes have been reported (Wiese et al., 1994), while among gymnosperms ten STS genes have been isolated and characterized from P. sylvestris (Preisig-Muller et al., 1999). Similarly, a family of three transcriptionally active PS genes was characterized in Gerbera (Helariutta et al., 1996). Only Arabidopsis and Antirrhinum have been proposed to contain single copies of CHS-type PKS genes (Sommer and Saedler, 1986; Shirley et al., 1995). However, this conclusion may be premature, since a computer-based search of the Arabidopsis genome database reveals the presence of several sequences that share homology with the CHS consensus (Table 5.4). The complete sequencing of the Arabidopsis genome in the near future will help determine how many PKS genes are present and how they are related. Wienand et al. (1982) reported the presence of several faint bands in a Southern blot analysis of Antirrhinum genomic DNA with a Petroselinum CHS fragment as a probe, suggesting that Antirrhinum might also have additional copies of PKS-type genes.
In most of these cases, characterization of members of this gene family has generally relied either on screening of cDNA or gDNA libraries, or on the use of reverse-genetics approaches. My data indicate that PKS-like genes comprise a large gene family in Rubus. Rubus idaeus is vegetatively propagated and, given the breeding history of R. idaeus in North America, it is likely that R. idaeus L cv. Meeker consists of a heterozygous but homogeneous population. Thus, to avoid treating sequence variations arising from co-amplification of allelic sequences, or from errors introduced by Taq DNA polymerase, as evidence of distinct genes, I chose a variability threshold (10 nucleotide differences in the 351 nucleotides compared), below which all variations were ascribed to these factors. Using this threshold as my standard for distinguishing between genes and alleles, I was able to characterize fragments potentially representing ten different PKS genes within the Rubus genome. These partial fragments shared considerable homology with each other (Table 5.2) and with other PKS sequences in the database (data not shown), which provides strong correlative evidence that they encode Rubus PKSs. In plants, all genomic clones of PKS genes have a single intron, with the exception of the Antirrhinum CHS gene, which has two introns. The placement of the common intron in various PKSs such as CHSs and STSs is highly conserved, splitting the codon of a highly conserved cys66 residue (Arabidopsis CHS, GenBank M2308). The primers designed in this study target regions outside this conserved intron, but would capture, in principle, the second intron found within the Antirrhinum gene. No such larger amplification products were observed in Rubus, however, and sequence analysis around the intron-exon boundaries corresponding to the second Antirrhinum intron confirmed that the ten Rubus PKS genes lack this potential intron.
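The gene-versus-allele threshold described above can be illustrated with a short sketch. It assumes the fragments are already aligned to equal length without gaps, counts mismatches between every pair, and merges fragments that differ at fewer than 10 positions into one putative gene. The single-linkage grouping, the function names and any sequences supplied are assumptions made for illustration; the text does not specify how the fragments were actually grouped.

```python
def count_differences(seq_a, seq_b):
    """Count mismatched positions between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def group_fragments(fragments, threshold=10):
    """Group fragments into putative genes by single linkage.

    fragments maps a clone name to its aligned sequence. Two clones that
    differ at fewer than `threshold` positions are treated as alleles or
    PCR variants of the same gene and placed in the same group.
    """
    names = list(fragments)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the group representative (with path halving).
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if count_differences(fragments[first], fragments[second]) < threshold:
                parent[find(first)] = find(second)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())
```

Under these assumptions, each returned group corresponds to one candidate gene, and the number of groups gives the estimated size of the gene family for the chosen threshold. Single linkage can chain together clones that individually differ by more than the threshold, so a stricter grouping rule might be preferred for closely related families.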
Screening of a partially ripe Rubus fruit cDNA library resulted in the identification of three unique PKS cDNAs corresponding to mRNAs whose accumulation was stimulated by ripening. The three RiPKS cDNAs share considerable sequence homology amongst themselves and with other PKSs (Table 5.4). The high homology within the coding regions of the Rubus PKS sequences also extended to the 5'- and 3'-UTRs. Thus, while RiPKS5 and RiPKS6 share 93% nucleotide identity, their 5'- and 3'-UTRs share 100% and 60% identity, respectively. Similarly, RiPKS5 and RiPKS11 share 96% nucleotide identity in the 3'-UTR and 97% nucleotide identity within the coding sequence, but share only 68% nucleotide identity in the 5'-UTR. Though considerable homology was detected in the 5'- and 3'-UTRs, the subtle differences in these regions between the three cDNA clones suggest that each of the clones can be considered as arising from an individual gene rather than from alleles of the same gene. This observation also indirectly supports the cut-off values used to separate the PCR products into those arising from genes and those arising from alleles of the same gene. Confirmatory evidence for such a hypothesis, however, requires further experiments such as RFLP analysis of genetic crosses. Among the various PKSs that have been identified from plants, RiPKSs share higher percent sequence identity with CHS-type PKSs than with non-CHS-type PKSs (Table 5.4). The deduced amino acid sequences, when compared with the strict consensus of CHS and STS sequences and with the sequence of alfalfa CHS2, revealed that the encoded Rubus proteins were more closely related to the CHS-type PKSs than to the STS consensus (Figure 5.7). The most obvious of these distinguishing regions is the peptide sequence MMYQQGCFAGGTVLR, which is completely conserved across the CHS consensus, RiPKS5 and RiPKS6. RiPKS11 harbors a single amino acid change in this region (Figure 5.7).
The Rubus sequences also include the four amino acid residues (Figure 5.7) required for the catalytic functions of CHS (Jez et al., 2000). Thus, the overall homology of the Rubus sequences to CHS suggests that the three Rubus cDNAs all encode CHS-type PKSs. Nevertheless, our knowledge of the PKS active site is still preliminary. While computer-based homology searches have proven to be fruitful exercises to find and define functional homologues of genes across species, one has to be particularly cautious in using such an approach for predicting the catalytic properties of a new PKS. Non-CHS-type PKSs are still very similar to CHS-type proteins, and it is clear that only subtle changes in the PKS active site are required to create novel catalytic functions. This adaptability may well underlie the remarkable diversification that characterizes PKS proteins in plants.

5.4.2 Catalytic properties of the three Rubus PKS cDNAs

To ascertain the functional properties of the Rubus PKSs, all three RiPKS cDNAs were expressed in E. coli as recombinant proteins. In each case, a single protein species was obtained, and this was readily recognized by a parsley CHS-antiserum, consistent with the observed sequence homologies (Figure 5.9). Enzymatic assays in crude extracts with each of the recombinant proteins incubated with malonyl CoA and 4-coumaryl CoA revealed catalytic activities that were similar to those expected for a CHS-type PKS involved in the biosynthesis of chalcones. It has been reported that some PKSs can accept alternative starter units as substrates, albeit with lower efficiency. For example, while 4-coumaryl CoA is believed to be the natural starter unit substrate for parsley CHS, that enzyme was also active with various aliphatic CoA esters (e.g. butyryl CoA, hexanoyl CoA) (Schüz et al., 1983).
Similarly, ACS from Ruta is active with N-methylanthranilyl CoA and substituted benzoic acid CoA esters, but is inactive with 4-coumaryl CoA (Junghanns et al., 1995). While acetyl CoA is the primary starter for PS from Gerbera, in vitro the enzyme can also prime condensation with small hydrophobic CoA esters such as butyryl CoA, propionyl CoA and isovaleryl CoA (Eckermann et al., 1998). While 4-coumaryl CoA is the primary substrate for CHS, the presence of other cinnamic acid esters in plants led to an earlier "hydroxycinnamyl-CoA ester hypothesis" that proposed the existence in plants of CHS enzymes with a substrate preference for other cinnamate esters (Hess, 1967). In support of this hypothesis, a recently characterized barley CHS-like protein (HvCHS2) has been reported to display a strong substrate preference for ferulyl CoA compared to 4-coumaryl CoA (Christensen et al., 1998). Ferulyl CoA has also been reported to be a preferential substrate for a protein fraction enriched 172-fold for BAS-type PKS activity from Rubus (Borejsza-Wysocki and Hrazdina, 1996). In an assay with differently substituted cinnamic acid esters, the three Rubus recombinant proteins accepted other starter substrates such as cinnamyl CoA, caffeyl CoA, and ferulyl CoA (Figure 5.11), albeit at lower levels than 4-coumaryl CoA. This pattern of activities was consistent with that observed with other CHS-type PKSs. The heterologous expression of the Rubus cDNAs corroborates that the three RiPKS cDNAs encode Rubus CHS, and they should accordingly be renamed RiCHS5, RiCHS6, and RiCHS11. The most enigmatic activity of the three recombinant RiPKS proteins was their activity in the presence of high concentrations of imidazole (Figure 5.14). While the recombinant proteins retained a CHS-type activity in a low salt buffer, in the presence of high concentrations of imidazole (400 mM, pH 7.0) the reaction mixture appeared to display a BAS-type activity almost exclusively (Figure 5.14).
The new activity was proposed to be "BAS-type" since the product band in this reaction co-migrated with an authentic sample of BA under the TLC conditions used. Attempts to elucidate the structure of this "BA-like" compound are currently underway. This switch in activity was not limited to Rubus CHSs but could also be induced in Gerbera CHS1, another true CHS-type PKS (Figure 5.15). The biochemical origins of this remarkable change in the catalytic properties are uncertain, although some hints can be found in the literature. Imidazole has a pKa near physiological pH, and is thus optimized for use as a general acid-base catalyst. Imidazole, histidine, or polyamines have been used in the past for promoting or enhancing the RNA cleavage activity of ribozymes (Fersht, 1997), and because of the presence of histidine residues in the catalytic center of ribonuclease A, RNA hydrolysis in imidazole buffers has been extensively studied (Roth et al., 1998). Imidazole has recently been shown to restore catalytic efficiencies in ribozymes harboring specific point mutations (Perrotta et al., 1999). Based on these results and the properties of imidazole, Perrotta et al. (1999) suggested that certain RNA molecules use a general acid-base catalysis type of mechanism to extend their catalytic repertoire. In the case of plant PKSs, the effect of imidazole on catalysis appears to be a shortening of the catalytic cycle that normally adds six carbons to the phenylpropanoid starter units. If the resulting product observed on TLC plates is in fact BA, this shorter cycle must also involve a side chain decarboxylation (enzymatic or non-enzymatic) that is not part of the usual CHS-catalyzed process. However, no matter what the underlying chemistry, it seems improbable that the induction of such a modified catalysis could be the result of a specific CHS-imidazole interaction, since imidazole does not accumulate in plants.
Compounds with similar properties can be found in plants, however, including L-histidine, L-proline, and glycine betaine, and these can sometimes accumulate to relatively high levels. To further investigate this possibility, I designed simple reconstruction experiments in which crude recombinant Rubus CHS was assayed after incubation with metabolites that have a pKa or physicochemical properties similar to those of imidazole. However, the BAS-type activity of the CHS could not be recreated by any of these treatments (Figure 5.19). The possibility exists, of course, that another plant metabolite is the putative effector. Alternatively, the effector could be a protein or a peptide whose action on CHS is simply mimicked by high concentrations of imidazole in vitro. In this model of cooperation between a CHS-type PKS and another protein or metabolite, the "derailment product" of the CHS might serve as a substrate for another enzyme. Such derailment products of PKSs have been observed in in vitro reactions of CHS (Hinderer and Seitz, 1985), and exchange of products of CHS between two proteins, possibly by direct interaction with each other in vivo, has been reported (Welle and Grisebach, 1988; Schroder et al., 1998). Another potentially relevant example of bi-partite PKS catalysis occurs in the biosynthesis of lovastatin. This process requires the interaction of lovastatin nonaketide synthase (LNKS) with an additional protein (LovC) for correct substrate processing and production of the final polyketide (Kennedy et al., 1999). To assess the possibility that the RiPKS modifying factor might be another Rubus protein, I co-incubated crude fractions of recombinant proteins with protein extracts obtained from various organs of Rubus. However, I was unable to detect any change in the activity of the CHS under these conditions (Figure 5.20).
I was able to confirm the presence of a BA-like PKS activity in Rubus by analyzing the activities seen in Rubus cell-cultures treated with yeast extract, which mimics the effect of pathogen attack. When raspberry cell cultures were treated with such an elicitor, the activity of a BAS-type PKS showed a gradual increase, reached a maximum at 120 h after elicitation, and then declined. During this time, CHS-type activity also showed a steady increase until 60 h and then declined. This response of two distinct PKSs in a single system is in direct contrast to that seen in other systems such as peanut, grape and pine cell-cultures, where one PKS (STS) has been observed to be preferentially expressed relative to CHS (Rolfs et al., 1981; Rolfs and Kindl, 1984). My results differ somewhat from those of Borejsza-Wysocki and Hrazdina (1996), who reported that BAS is induced in response to elicitation, but that CHS remains at the basal level. They were using the same cell-culture system that I did, but at a different time and location. The contrasting data may therefore arise from subtle differences in the micro-environment that the cell-cultures were exposed to at the two different facilities. I have not been able to establish whether the reversible loss of CHS-type PKS activity induced by imidazole is an artefact or reflects a biologically relevant phenomenon in Rubus. The biosynthesis of BA in elicited Rubus cell-cultures suggests that a PKS exists that can catalyze the formation of this polyketide. Whether this activity represents the by-product of a CHS modified in its activity by another factor, including other proteins, or is the unique product of a specific PKS protein remains to be determined. Clarification of these hypotheses will require further experiments. It is important to remember that I have only determined the catalytic properties of three of the eleven possible PKS proteins present in Rubus.
Based on the partial sequences of the other eight genomic fragments, one cannot predict the enzymatic capabilities of the respective proteins, so it is possible that one of the other eight isoforms represents a "true" BAS-type PKS. However, I consider this unlikely, since none of these eight genes were represented in the cDNA library made from "partially-ripe" raspberry fruits. The accumulation of raspberry ketone is paralleled by the accumulation of BAS-type PKS activity in ripening raspberry fruits, where the highest levels of both raspberry ketone and BAS activity are detected in fruits that are overripe and somewhat dehydrated (Borejsza-Wysocki and Hrazdina, 1994). Thus, I would expect that the BAS transcript itself would already be accumulating in earlier developmental stages such as the partially ripe fruits used as the starting material for the cDNA library. Overall, even if the effect of imidazole proves to be an artefact, it still suggests strategies that might be used to increase the catalytic repertoire of PKSs to catalyze other condensation reactions. It may also shed light on the catalytic mechanisms that might be prevalent within the PKSs and on the evolution of PKS catalysis. While the crystal structure of alfalfa CHS2 has been solved, the mechanism of polyketide formation itself is still debatable (Ferrer et al., 1999). Site-directed mutagenesis of important residues identified by the crystal structure can point to the reaction chemistry involved in the biosynthesis of chalcones (Jez et al., 2000). At this point, however, one can only speculate on the mechanisms of formation of other polyketides such as acridones or stilbenes. A better understanding of the "imidazole effect" could provide useful insights into the ability of PKS gene products to acquire newer functions.
5.4.3 Evolution of PKS gene family

An earlier phylogenetic study of 38 PKS gene family sequences, consisting of 34 sequences defined as CHS and 4 sequences defined as STS (primarily based on homology), showed that STS sequences grouped with CHS sequences from related plants. Based on this cluster analysis, Tropf et al. (1994) concluded that non-CHS-like PKSs had evolved from CHS sequences several times independently during the course of plant evolution. Later analyses from Durbin et al. (1995) and Helariutta et al. (1996) reiterated the working hypothesis of Tropf et al. (1994) and supported a similar mechanism underlying the evolution of catalytically divergent PS in Gerbera (Helariutta et al., 1996), and of a family of CHS-like sequences in Ipomoea (Durbin et al., 1995). In contrast, a recent phylogenetic study of a group of Vitis PKS sequences, defined as STS based on homology to other STS sequences, suggested that the evolution of the STS sequences is more complex than initially suggested (Goodwin et al., 2000). Phylogenetic analysis of various CHS and STS sequences, using five different analysis programs, led to the conclusion that the STS genes might have originated through several different routes, involving both convergent evolution and diversification from a common progenitor. By using only PKS sequences that have defined enzymatic capabilities, I found that members of the PKS superfamily often formed subgroups based on their catalytic properties. For example, STS and CHS from Pinaceae each formed a monophyletic group. STS from grapes formed a monophyletic cluster at the bottom of the tree, and a subset of the CHS sequences from Arabidopsis, maize and barley formed a monophyletic group at the top of the tree.
The two distinct clusters found within the STS-type sequences may reflect the sub-classification found within the STSs, where pinosylvin synthase from pines prefers cinnamoyl CoA as the starter unit substrate and resveratrol synthase from grapes and peanuts preferentially utilizes 4-coumaryl CoA. Gene duplication followed by differentiation can result in the production of related proteins with novel functions. The duplicated gene must, however, diverge fast enough to escape the homogenizing effects of gene conversion (Walsh, 1987) or recombination (Drouin and Dover, 1990). The evolution of STS and CHS in pines and legumes may represent such a duplication and differentiation event. In this analysis, the three Rubus sequences grouped together, suggesting that these genes are the result of a recent gene-duplication event. The deep phylogenetic divergence between monocot CHS and non-CHS-like sequences, and the position of ACS in the tree, suggest that the earliest PKS gene duplications were ancient. Many species are represented in the lower branching order by only one PKS sequence that has a defined activity, which makes it difficult to resolve how many gene duplication and differentiation events may have occurred. Interestingly, the PKS sequences from gymnosperms were also nested within a group representing various angiosperm PKS sequences, even though gymnosperms are considered to be evolutionarily more ancient than angiosperms. With the completion of its genome sequence, Arabidopsis might be used as a benchmark to identify and characterize potentially divergent PKS sequences, and to study their phylogenetic relationships. In a preliminary exploration of this idea, a BLAST search of the still-incomplete Arabidopsis database was carried out using the single Arabidopsis CHS sequence as a probe. The analysis returned several sequences that had relatively low homology to CHS sequences (Table 5.4).
Although the catalytic properties of the encoded proteins need to be ascertained, phylogenetic analysis of these Arabidopsis CHS-like sequences placed them in a distinct outgroup to all the CHS and non-CHS sequences used in my Rubus analysis (data not shown). I interpret this result as corroboration of the hypothesis that there existed an ancestral form from which Arabidopsis CHS and non-CHS sequences have evolved. It has been suggested that CHS itself may share a common origin with the fatty acid synthases that are crucial for primary metabolism (Verwoert et al., 1992). The structure elucidation of alfalfa CHS also revealed that CHS shares structural features and reaction chemistry with thiolase and with β-ketoacyl synthase II (Ferrer et al., 1999). The presence of a CHS-type protein that performs a CHS-type reaction has been reported from Streptomyces (Funa et al., 1999), again suggesting that an initial condensing-type PKS might be the progenitor of the present-day plant PKSs. All PKS genes, regardless of their catalytic function, have been found to possess a single intron, the only exception to this rule being CHS from Antirrhinum, which possesses an additional intron within its genomic sequence. The relevance and importance of these introns is difficult to assess. However, the universal presence of the single intron in PKS sequences suggests a common progenitor for all plant PKSs. How different PKS genes have evolved from this common progenitor is still debatable. It would be of particular interest, in the context of BA formation, to learn whether enzymes that are presently specialized to catalyze one or two condensation cycles represent evolutionarily old forms of PKSs or arose through loss-of-function divergence from CHS.
5.4.4 Expression patterns

A range of PKS members, including CHS, STS, ACS, and PS, have consistently demonstrated differential patterns of expression with respect to specific tissues, and to developmental and environmental cues (Ryder et al., 1984; Bell et al., 1986; O'Neill et al., 1990; Harker et al., 1990; Gong et al., 1997; Koes et al., 1989; Helariutta et al., 1995). Although there is much information about the expression patterns of individual genes, or the response of multiple gene-family members to environmental cues, little is known about the regulation of multiple family members during a developmental program such as flower and fruit development. Although I have not conducted an analysis with all members of the RiPKS family, I did examine the behavior of three members of the Rubus CHS family. This revealed that members of this family show tissue-specificity or differential expression in response to developmental parameters. Two patterns of gene expression were apparent when the expression of the cDNA clones was examined in various organs of Rubus, and in developing fruits. RiPKS5 displayed high levels of mRNA expression in all organs of Rubus. The expression of a second group of cDNAs, represented by RiPKS6 and RiPKS11, varied significantly in ripening Rubus fruits, indicating that these genes are regulated in a ripening-dependent manner. The results demonstrate that both fruit-ripening-dependent and -independent pathways of gene expression coexist within the Rubus CHS gene family, and thus provide a system in which it should be possible to identify signals that contribute to ripening-regulated gene expression and to recover gene promoters that are ripening-regulated. CHS is not an abundant protein, and the constitutive levels of its mRNA in several plant species have also been estimated to be low (Wienand et al., 1982).
The three members of the Rubus CHS gene family were expressed to varying degrees in leaf, roots and shoots, as well as during the development of flowers and fruits. One member, RiPKS5, was present at significant levels (159-361 pmol/mg tRNA) in all organs studied. This suggests that CHS-like PKSs are required in all major organs of Rubus. The absolute levels of RiPKS6 and RiPKS11 were one to two orders of magnitude lower than those of RiPKS5 (Table 5.5), but still varied between tissues. For example, the levels of RiPKS11 in fruits (stage IV) were about 10 times higher than the levels detected in immature green fruit (stage I). These gene-specific changes in mRNA levels are likely to be involved in the physical and metabolic re-modeling that occurs during fruit development. Although RT-cPCR analysis allowed me to determine the absolute levels of each of the three transcripts, it is quite possible that the gene-specific primers designed in this study might be cross-hybridizing to other Rubus CHS transcripts that have similar 5'- or 3'-UTRs. It is not unusual for one or two family members to account for most of the CHS gene expression in a plant. In petunia, there are about eight gene family members, but one gene, CHS-A, accounts for ~90% of the gene expression, while CHS-J accounts for ~10%, with CHS-B and CHS-G being expressed at very low levels. Similarly, in soybean only one of the three CHS genes is active in cotyledons (Wingender et al., 1989). In Ipomoea, out of the thirteen genes characterized from seven species, I. purpurea CHS-A and CHS-C, and I. platensis CHS-A, are the only genes actively expressed (Durbin et al., 1995). In contrast, two CHS genes characterized from tomatoes are expressed at equivalent levels in cotyledons, hypocotyls and leaves (O'Neill et al., 1990).
While I have conclusively demonstrated differential expression among three of the eleven Rubus PKS genes, it remains to be established whether the other Rubus PKS genes are functional, and if so, how they are expressed. Studies of paralogous Hox genes in mice, hoxa3 and hoxd3, suggest that although these genes carry out identical biological functions, the different roles attributed to the gene products are the result of quantitative modulations in hoxa3 and hoxd3 gene expression (Greer et al., 2000). In an elegant experiment, Greer et al. (2000) determined that swapping the protein-coding sequences of the two Hox genes altered their expression patterns, suggesting that cis-acting sequences play an important role in determining the functional specificity of the gene products. In view of such a hypothesis, it is notable that the results of Harker et al. (1990) suggest that two regulatory factors in pea, a and a2, exert differential control on the three pea CHS genes, which implies that multiple CHS genes have different combinations of cis elements that determine their response to the products of these regulatory loci. For example, the expression of pea CHS1 in flowers has an absolute requirement for the products of both the a and a2 loci, whereas, in root tissue, the products of these loci are not required. Pea CHS2, on the other hand, is expressed in roots but not in petal tissue, suggesting that it may not be able to interact with the products of the a and a2 loci in petal tissues. Thus, given the functional dependence of the Hox genes on their absolute quantitative levels, one could hypothesize that a similar model may be valid for the PKS gene family members, where the qualitative difference in the behavior of individual gene products may be a consequence of quantitative differences in their expression behavior.
The identification and initial characterization of the eleven Rubus PKS genes presents us with an unprecedented opportunity to fully understand the fundamental function(s) and diversity of the PKS gene family members within a single species. My findings have outlined the RiPKS gene-family structure as well as a set of tools to carry out such an analysis in other plants. Characterization of the catalytic properties of all the RiPKS isoforms would generate important insights into their potential roles. More detailed analysis of the effects of imidazole on CHS-type activity would provide a novel approach to understanding how evolution has generated structural diversity in polyketide natural products. The creation of transgenic plants that are blocked in specific PKS-catalyzed reactions would provide an alternative method for delineating the functions of Rubus PKS genes and would permit an unambiguous association of these genes with the functions of their encoded proteins.

CHAPTER 6 Summary and future directions

The work presented in this thesis was designed to characterize three phenylpropanoid gene-families, to examine their behavior during fruit ripening and to study their evolutionary dynamics. In this study, I have examined three branch-point regulatory enzymes (PAL, 4CL, and PKS) of the phenylpropanoid pathway. Based on the results of this thesis, future work could include:

6.1 Characterization of phenylpropanoid gene-families in other plant species

To isolate and characterize the full repertoire of the PAL, 4CL and PKS gene families from Rubus, I designed degenerate PCR primers for each of the three gene-families. Such a homology-based gene search resulted in the characterization of two members of the PAL gene family (Ripal1 and Ripal2), two members of the 4CL gene family (Ri4cl1 and Ri4cl2), and ten partial fragments representing the PKS gene-family (Ripks1-Ripks10).
This was not surprising since multiple members for all three gene families have been reported from various other plants, including gymnosperms and angiosperms. A homology-based PCR screening proved to be a productive initial approach to defining the gene family size of PAL, 4CL and PKS in Rubus. While characterization of gene-families has generally relied on homology-based hybridization methods, the use of PCR-based homology has potential advantages. Gene identification is rapid and the results are independent of the expression pattern of the genes and homology to the probe being used. Further extension of homology-based screening to a hybridization-based screening of a cDNA library representing "partially-ripe" fruits of Rubus resulted in the identification of a 4CL gene (Ri4CL3) and a PKS gene (RiPKS11) in addition to the genes identified by the PCR-based homology screen. This was not totally unexpected, since amplification of a gene family member is biased by the target sequence of the primers. In addition, for reasons still unknown, some gene family members may be preferentially amplified. Thus, while the two Rubus PAL gene family members were recovered by both the PCR-based homology screen and the hybridization-based screen, the former search failed to recover the full repertoire of 4CL and PKS genes. Nevertheless, the primers designed in this study should prove useful for initial characterization of PAL, 4CL and PKS genes from any plant species in a manner that is independent of the spatio-temporal expression of the individual members. Specifically, these primers could be used to sample the presence of these genes across taxonomically diverse groups of plants. While it is known that lower plants also accumulate various phenolics, these genes have not been characterized from any lower plants.

6.2 Coordinate regulation of phenylpropanoid genes

Why do plants need multiple copies of various phenylpropanoid genes?
One speculation for the role of multiple genes has been that each gene encodes an isoform that can potentially interact with other proteins of the phenylpropanoid pathway to form unique protein-protein complexes that are required for channeling of carbon through this pathway (Burbulis and Winkel-Shirley, 1999; Rasmussen and Dixon, 1999). It is also established that genes encoding PALs, 4CLs and PKSs are upregulated in response to various stressors. One could also argue that genes encoding the interacting isoforms would show co-ordinate regulation under similar conditions. Co-ordinate regulation of genes encoding enzymes of the phenylpropanoid pathway such as PAL, CHS and CHI has been observed in alfalfa cells (Dalkin et al., 1990), and cell suspension cultures of beans (Cramer et al., 1985). Similarly, co-ordinate regulation of PAL and 4CL mRNA was observed in Arabidopsis suspension-cultured cells elicited with PGA lyase (Davis and Ausubel, 1989). In old man cactus, increased levels of PAL, CHS and CHI accompany the synthesis of aurone phytoalexins (Pare et al., 1992). In Arabidopsis, PAL and CHS show a similar light-dependent pattern of accumulation (Levya et al., 1995). To elucidate the mechanisms by which these genes are coordinately regulated, phenylpropanoid gene promoters have been isolated and analyzed. Lois et al. (1989) identified three inducible sites of DNA/protein interactions (boxes P, A and L) in the parsley PAL-1 promoter using in vivo footprinting. These putative cis-elements have been found to be conserved in many phenylpropanoid gene promoters including 4CL and C4H (Lois et al., 1989; Logemann et al., 1995; Sablowski et al., 1994; Logemann, 1995). Thus, while such studies indicate an overall coordination between different enzymes of the phenylpropanoid pathway, in none of these cases have the complete gene families encoding three diverse phenylpropanoid pathway enzymes been investigated.
A preliminary study has been attempted in parsley, where a tight coordination of PAL1, PAL2 and PAL3 was observed with C4H and 4CL in response to UV light, fungal elicitor, and wounding (Logemann et al., 1995). Using gene-specific probes for various members of the Rubus PAL, 4CL and PKS gene-families, I have shown for the first time that expression of some members of each family is ripening-related, and therefore these genes can be regulated coordinately. The overall analysis reveals that RiPAL1, Ri4CL1 and RiPKS6 had a similar pattern of expression in various organs of Rubus, and in developing flowers and fruits. I found that while these genes were most highly expressed in fruits (stage IV), they had lower levels of expression in vegetative tissues, in the order roots > shoots > leaves. In various developmental stages of raspberry fruits, RiPAL1 and Ri4CL1 were most highly expressed in immature green fruits (stage I), suggesting that these genes might be required during the early stages of fruit-ripening. These results could be extended to include studies of the mechanisms by which such diverse genes are coordinately regulated. The occurrence of common promoter elements in PAL and 4CL genes (Logemann et al., 1995) suggests that a common transcription factor(s) might be involved. Characterization and comparison of the promoter regions of these genes could be undertaken to identify consensus regions that could potentially be involved in co-ordinate regulation of these genes. Once such consensus regions were identified, they could be used to screen for transcription factors that might bind to them. This approach could be extended to include the promoters of other phenylpropanoid genes that act at different points in the pathway and do not necessarily control the flux of carbon through this pathway, as well as genes that lie upstream of this pathway. Logemann et al.
(2000) have observed that genes of primary metabolism biogenetically linked to the formation of phenylpropanoids are coordinated with those specific for the phenylpropanoid pathway, such as PAL and CHS. Thus, while I have shown that the regulation of members of the Rubus PAL, 4CL and PKS gene families is co-ordinated in ripening fruits, the role(s) of these gene family members in other aspects of plant development need to be assessed. Another potential area for exploration would be in plant-pathogen interactions. Both preformed and newly formed phenylpropanoids have been shown to play an important role during plant-pathogen interactions, and these gene families have been shown to be induced in response to pathogen attack. Thus, it would be interesting to study which members of the gene families are expressed during plant-pathogen interactions.

6.3 Identification of ripening-related promoters

Compared with its expression levels in the other organs studied, RiPKS11 was most highly expressed in ripe fruits, with very low levels in other organs, indicating that this gene is "strictly" ripening-regulated. This gene could be used to identify ripening-induced promoters that could have interesting commercial applications.

6.4 Characterization of Rubus PKSs

Further studies are needed to determine if Rubus CHS is indeed being modulated to acquire a new function in the presence of imidazole. Is the "BA-like" radioactive band BA? The characterization of the BA-like radioactive product is of crucial importance before any conclusions can be drawn about the modulation of the Rubus CHSs by imidazole. This could be done by a double-labeling experiment where RiCHS in the presence of 1 M imidazole is incubated with a mixture of [3H]p-coumaroyl-CoA and [14C]malonyl-CoA. The ratio of 3H:14C incorporated into the "BA-like" compound could shed light on the number of moles of p-coumaroyl-CoA and malonyl-CoA that are present in this "BA-like" compound.
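The arithmetic behind such a double-labeling experiment can be sketched as follows. This is only an illustration under stated assumptions: the function names, the specific activities and the counts are hypothetical, the 3H and 14C channels are taken to be already corrected for quench and spillover, and each incorporated unit is assumed to retain its label (for example, 14C at C-2 of malonyl-CoA, which is not lost on decarboxylation). Benzalacetone is built from one p-coumaroyl unit and one malonyl-derived unit, whereas naringenin chalcone requires one p-coumaroyl unit and three malonyl-derived units, so the two products should give molar ratios near 1 and 3, respectively.

```python
def nmol_incorporated(dpm, specific_activity_dpm_per_nmol):
    """Convert counted radioactivity (dpm) into nmol of labelled precursor."""
    if specific_activity_dpm_per_nmol <= 0:
        raise ValueError("specific activity must be positive")
    return dpm / specific_activity_dpm_per_nmol


def malonyl_per_coumaroyl(dpm_3h, dpm_14c, sa_3h, sa_14c):
    """Molar ratio of malonyl-derived units to p-coumaroyl units in a product.

    dpm_3h, dpm_14c: counts recovered in the product band (hypothetical input)
    sa_3h, sa_14c:   specific activities of the two substrates, in dpm/nmol
    """
    coumaroyl_nmol = nmol_incorporated(dpm_3h, sa_3h)
    malonyl_nmol = nmol_incorporated(dpm_14c, sa_14c)
    if coumaroyl_nmol == 0:
        raise ValueError("no 3H label detected in the product")
    return malonyl_nmol / coumaroyl_nmol


# Hypothetical substrate specific activities (dpm per nmol).
SA_3H, SA_14C = 2000.0, 1000.0

# Hypothetical counts for a band with benzalacetone-like stoichiometry (1:1).
ba_like = malonyl_per_coumaroyl(4000.0, 2000.0, SA_3H, SA_14C)

# Hypothetical counts for a band with chalcone-like stoichiometry (1:3).
chalcone_like = malonyl_per_coumaroyl(4000.0, 6000.0, SA_3H, SA_14C)

print(ba_like, chalcone_like)
```

A ratio close to 1 for the "BA-like" band, set against a ratio close to 3 for the chalcone band from the same incubation, would be the pattern expected if the new product contains a single malonyl-derived unit.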
This would confirm the number of moles of p-coumaroyl-CoA and malonyl-CoA that have been incorporated in the "BA-like" compound. Concrete proof that this new compound is BA would, however, require the elucidation of its structure. This could be achieved by a large-scale synthesis of pure "BA-like" compound and structure elucidation by techniques such as high-resolution mass spectrometry. The double-labeling experiment along with mass spectrometry of the purified compound should lead to conclusive proof about the identity of this "BA-like" compound. If indeed the radioactive band is BA, then this would be the first example where greater diversity within plant-specific polyketides may be produced by modulation of the enzyme by other enzymes or metabolites. Has nature exploited this route to make new polyketides? If so, then how is CHS-type PKS modulated in its activity? Answering these questions could prove to be challenging. The question that remains to be answered is whether BAS is indeed a unique PKS. The results of Borejsza-Wysocki and Hrazdina (1996) suggest that BAS is indeed a unique PKS. Though the work described in this thesis does not totally refute this hypothesis, I was not able to isolate the potential gene for this enzyme. Subtractive hybridization of two appropriate RNA populations, e.g. mRNA from two different time points of the induced Rubus cell cultures, should enrich for BAS-type mRNAs that could be used as the starting material for such an analysis. Alternatively, various forms of representational difference analysis, or differential display, could be employed (Lisitsyn et al., 1993). Aside from raspberry fruits, "raspberry ketone" or its glucoside has also been found in rhubarb roots, European cranberries, fruits of sea buckthorn, and pine needles. At the same time, BAS-type PKS activity has been detected in pea, apple, grape, and tobacco tissue cultures (Borejsza-Wysocki and Hrazdina, 1996 and references therein).
These reports indicate that BAS may be widespread in plants. Cloning of a BAS gene from a non-Rubus species is an interesting alternative, not only for gaining access to the gene but also for allowing study of the regulation and function of this pathway in another species.

6.5 Evolution by design

An even more exciting field for further investigation would be the in vitro evolution of new molecular functions within the plant PKS framework. Synthetic strategies, such as sexual evolution, DNA shuffling or combinatorial approaches, have been used for evolving novel protein functions, peptide diversity, and immunologically diverse antibodies, and also for creating nucleic acid diversity to bind small molecules, proteins, and nucleic acids (reviewed in Liu and Schultz, 1999). Buchholz et al. (1998) have used directed evolution to improve the function of site-specific FLP recombinase in Escherichia coli and mammalian cells, whereas Kumamaru et al. (1998) have evolved polychlorinated biphenyl degrading enzymes with novel substrate specificities. Alignment of various PKS protein sequences emphasizes that only a few amino acid changes are necessary to change the catalytic capabilities of a CHS-type PKS (reviewed in Schroder et al., 1997), thus suggesting that new functions could be created by introducing random changes. Such random changes could be introduced by sexual PCR of the Rubus CHSs, a method that involves amplification of a candidate gene under error-prone conditions to introduce a variety of base substitution mutations into the gene, followed by an error-prone shuffling or recombination process that results in a population of molecules that contains a large number of combinations of those substitutions (Stemmer, 1994). A growing body of evidence suggests that plant polyketides such as stilbenes, acridones, anthranilamides, and pyrones are phytoalexins or have antimicrobial effects.
Thus one potential application of such a method could be to generate polyketides with novel antimicrobial activities that could find application in the development of novel plant defense strategies.

Bibliography

Ahn J-H, Walton JD (1998) Regulation of cyclic peptide biosynthesis and pathogenicity in Cochliobolus carbonum by TOXEp, a novel protein with a bZIP basic DNA-binding motif and four ankyrin repeats. Mol Gen Genet 260: 462-469
Akada S, Dube SK (1995) Organization of soybean chalcone synthase clusters and characterization of a new member of the family. Plant Mol Biol 29: 189-199
Akiyama T, Shibuya M, Liu HM, Ebizuka Y (1999) p-Coumaroyltriacetic acid synthase, a new homologue of chalcone synthase, from Hydrangea macrophylla var thunbergii. Eur J Biochem 263: 834-839
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215: 403-410
Allina SM, Pri-Hadash A, Theilmann DA, Ellis BE, Douglas CJ (1998) 4-coumarate:Coenzyme A ligase in hybrid poplar. Plant Physiol 116: 743-754
Allwood EG, Davies DR, Gerrish C, Ellis BE, Bolwell GP (1999) Phosphorylation of the phenylalanine ammonia-lyase: evidence for a novel protein kinase and identification of the phosphorylated residue. FEBS Lett 457: 47-52
An C, Ichinose Y, Yamada T, Tanaka Y, Shiraishi T, Oku H (1993) Organization of the genes encoding chalcone synthase in Pisum sativum. Plant Mol Biol 21: 789-803
Anson JG, Gilbert HJ, Oram JD, Minton NP (1987) Complete nucleotide sequence of the Rhodosporidium toruloides gene encoding for phenylalanine ammonia-lyase. Gene 58: 189-199
Appert C, Logemann E, Hahlbrock K, Schmid J, Amrhein N (1994) Structural and catalytic properties of the four phenylalanine ammonia-lyase isoenzymes from parsley (Petroselinum crispum Nym.). Eur J Biochem 225: 491-499
Ausubel FM, Brent R, Kingston RE, Moore DD, Seidman JG, Smith JA, Struhl K (1995) Current Protocols in Molecular Biology.
John Wiley, New York
Baker SM, White EE (1996) A chalcone synthase/stilbene synthase DNA probe for conifers. Theor Appl Genet 92: 827-831
Bairoch A (1991) Putative AMP-binding domain signature. PROSITE: a dictionary of sites and patterns in proteins. Release 8.00, Medical Biochemistry Department, University of Geneva, Switzerland
Barillas W, Beerhues L (1997) 3-Hydroxybenzoate:coenzyme A ligase and 4-coumarate:coenzyme A ligase from cultured cells of Centaurium erythraea. Planta 202: 112-116
Barillas W, Beerhues L (2000) 3-Hydroxybenzoate:coenzyme A ligase from cell cultures of Centaurium erythraea: isolation and characterization. Biol Chem 381: 155-160
Barrit BH, Torre LC (1975) Fruit anthocyanin pigments of red raspberry cultivars. J Amer Soc Hort Sci 100: 98-100
Bate NJ, Orr J, Ni W, Meroni A, Nadler-Hassar T, Doerner PW, Dixon RA, Lamb CJ, Elkind Y (1994) Quantitative relationship between phenylalanine ammonia-lyase levels and phenylpropanoid accumulation in transgenic tobacco identifies a rate determining step in natural product synthesis. Proc Natl Acad Sci USA 91: 7608-7612
Batschauer A, Ehlmann B, Schafer E (1991) Cloning and characterization of chalcone synthase gene from mustard and its light-dependent expression. Plant Mol Biol 16: 175-185
Becker-Andre M, Schulze-Lefert P, Hahlbrock K (1991) Structural comparison, modes of expression, and putative cis-acting elements of the two 4-coumarate: CoA ligase genes in potato. J Biol Chem 266: 8551-8559
Beckert A (1997) Styrylpyrone synthase biosynthesis in Equisetum arvense L. Phytochemistry 35: 623-628
Bell JN, Ryder TB, Wingate VPM, Bailey JA, Lamb CJ (1986) Differential accumulation of plant defense gene transcripts in a compatible and an incompatible plant-pathogen interaction. Mol Cell Biol 6: 1615-1623
Bender DA (1985) Amino acid metabolism. John Wiley and Sons, Toronto
Bernard MA, Ellis BE (1991) Phenylalanine ammonia-lyase from tomato cell cultures inoculated with Verticillium albo-atrum.
Plant Physiol 97: 1494-1500
Biale JB, Young RE (1982) Respiration and ripening in fruits-retrospect and prospect. In Friends J, Rhodes MJC, eds, Recent Advances in the Biochemistry of Fruits and Vegetables. Academic Press, London, pp 1-39
Bohlmann J, Steele CL, Croteau R (1997) Monoterpene synthases from Grand Fir (Abies grandis). J Biol Chem 272: 21784-21792
Borejsza-Wysocki W, Hrazdina G (1994) Establishment of callus and cell suspension cultures of raspberry (Rubus idaeus cv. Royalty). Plant Cell Tissue and Organ Culture 37: 213-216
Borejsza-Wysocki W, Hrazdina G (1994) Biosynthesis of p-hydroxyphenylbutan-2-one in raspberry fruits and tissue cultures. Phytochemistry 35: 623-628
Borejsza-Wysocki W, Hrazdina G (1996) Aromatic polyketide synthases: purification, characterization and antibody development to benzalacetone synthase from raspberry fruits. Plant Physiol 110: 791-799
Bolwell GP (1992) A role for phosphorylation in the downregulation of phenylalanine ammonia-lyase in suspension cultured cells of French bean. Phytochemistry 31: 4081-4086
Brady CJ (1987) Fruit ripening. Annu Rev Plant Physiol 38: 155-178
Breathnach R, Chambon P (1981) Organization and expression of eucaryotic split genes coding for proteins. Annu Rev Biochem 50: 349-383
Briggs MS, Harriman RW, Handa AK (1986) Changes in gene expression during tomato fruit ripening. Plant Physiol 100: 1802-1807
Brinkmann U, Mattes RE, Buckel P (1989) High-level expression of recombinant genes in Escherichia coli dependent on the availability of the dnaY gene product. Gene 85: 109-114
Brodelius PE (1994) Phenylpropanoid metabolism in Vanilla planifolia Andr. (V): High performance liquid chromatographic analysis of phenolic glycosides and aglycone in developing fruits. Phytochemical Analysis 5: 27-31
Brodelius PE, Xue ZT (1997) Isolation and characterization of cDNA from cell suspension cultures of Vanilla planifolia encoding 4-coumarate: coenzyme A ligase.
Plant Physiol Biochem 35: 497-506
Brown JWS (1986) A catalogue of splice junction and putative branch point sequences from plant introns. Nuc Acids Res 14: 9549-9559
Buchholz F, Angrand PO, Stewart AF (1998) Improved properties of FLP recombinase evolved by cycling mutagenesis. Nat Biotechnol 16: 657-662
Burbulis IE, Winkel-Shirley B (1999) Interactions among enzymes of the Arabidopsis flavonoid biosynthetic pathway. Proc Natl Acad Sci USA 96: 12929-12934
Butland SL, Chow ML, Ellis BE (1998) A diverse family of phenylalanine ammonia-lyase genes in pine tree and cell cultures. Plant Mol Biol 37: 15-24
Carpenter EP, Hawkins AR, Frost JW, Brown KA (1998) Structure of dehydroquinate synthase reveals an active site capable of multistep catalysis. Nature 394: 299-302
Chalmers DJ, Faragher JD, Raff JW (1973) Changes in anthocyanin synthesis as an index of maturity in red apple varieties. J Hort Sci 48: 389-392
Chappell J, Hahlbrock K (1984) Transcription of plant defense genes in response to UV light or fungal elicitor. Nature 311: 76-78
Chaw S-M, Zharkikh A, Sung HM, Lau T-C, Li W-H (1997) Molecular phylogeny of extant gymnosperms and seed plant evolution: Analysis of nuclear 18S rRNA sequences. Mol Biol Evol 14: 56-68
Cheng GW, Breen PJ (1991) Activity of phenylalanine ammonia-lyase (PAL) and concentrations of anthocyanins and phenolics in developing strawberry fruit. J Am Soc Hort Sci 116: 865-869
Christensen A, Gregersen PL, Schroder J, Collinge DB (1998) A chalcone synthase with an unusual substrate preference is expressed in barley leaves in response to UV light and pathogen attack. Plant Mol Biol 37: 849-857
Clendennen SK, May GD (1997) Differential gene expression in ripening banana fruit. Plant Physiol 115: 1155-1161
Cramer CL, Edwards K, Dron M, Liang X, Deldine SL, Bolwell GP, Dixon RA, Lamb CJ, Schuch WW (1989) Phenylalanine ammonia-lyase gene organization and structure.
Plant Mol Biol 12: 367-383
Cukovic D (1999) Cloning and expression of genes encoding divergent 4CL enzymes in poplar. MSc thesis. University of British Columbia, Vancouver
Dalkin K, Jorrin J, Dixon RA (1990) Stress responses in alfalfa (Medicago sativa L.) VII. Induction of defense related mRNAs in elicitor-treated cell suspension cultures. Physiol Mol Plant Path 37: 292-307
Davis KR, Ausubel FM (1989) Characterization of elicitor-induced defense responses in suspension cultures cells of Arabidopsis. Mol Plant-Microbe Interactions 2: 363-368
Diallinas G, Kanellis AK (1994) A phenylalanine ammonia-lyase gene from melon fruit: cDNA cloning, sequence and expression in response to development and wounding. Plant Mol Biol 26: 473-479
Dixon RA, Paiva NL (1995) Stress-induced phenylpropanoids in plants. Plant Cell 7: 1085-1097
Dixon RA (1999) Plant natural products: the molecular genetic basis of biosynthetic diversity. Curr Opin Biotech 10: 192-197
Dixon RA, Steele CL (1999) Flavonoids and isoflavonoids - a gold mine for metabolic engineering. Trends Plant Sci 4: 1360-1385
Dopico B, Lowe AL, Wilson ID, Merodio C, Grierson D (1993) Cloning and characterization of avocado fruit mRNAs and the expression during ripening and low temperature storage. Plant Mol Biol 21: 437-449
Douglas C, Hoffmann H, Schulz W, Hahlbrock K (1987) Structure and elicitor or u.v.-light-stimulated expression of two 4-coumarate:CoA ligase gene in parsley. EMBO J 6: 1189-1195
Douglas CJ, Hauffe KD, Ites-Morales M-E, Ellard M, Paszkowski U, Hahlbrock K, Dangl JL (1991) Exonic sequences are required for elicitor and light activation of a plant defense gene, but promoter sequences are sufficient for tissue specific expression. EMBO J 10: 1767-1775
Douglas CJ (1996) Phenylpropanoid metabolism and lignin biosynthesis: from weeds to trees. Trends Plant Sci 1: 171-178
Doyle JJ, Doyle JL (1990) Isolation of plant DNA from fresh tissue.
Focus 12: 13-15
Drouin G, Dover GA (1990) Independent gene evolution in the potato actin gene family demonstrated by phylogenetic procedures for resolving gene conversions and the phylogeny of angiosperm actin genes. J Mol Evol 31: 132-150
Dubery IA, Schabort JC (1986) Phenylalanine ammonia-lyase from Citrus sinensis: Purification hydrophobic interaction chromatography and physical characterization. Biochem Int 13: 579-589
Durbin ML, Learn GH Jr, Huttley GA, Clegg MT (1995) Evolution of the chalcone synthase gene family in the genus Ipomoea. Proc Natl Acad Sci USA 92: 3338-3342
Dyer WE, Henstrand JN, Handa AK, Herrmann KM (1989) Wounding increases the first enzyme of the shikimate pathway in solanaceae. Proc Natl Acad Sci USA 86: 7370-7373
Ebel J, Hahlbrock K (1977) Enzymes of flavone and flavonol-glycoside biosynthesis. Eur J Biochem 75: 201-209
Eckermann S, Schroder G, Schmidt J, Strack D, Edrada RA, Helariutta Y, Elomaa P, Kotilainen M, Kilpelainen I, Proksch P, Teeri TH, Schroder J (1998) New pathway to polyketides in plants. Nature 396: 387-390
Ehlting J, Buttner D, Wang Q, Douglas CJ, Somssich IE, Kombrink E (1999) Three 4-coumarate:coenzyme A ligases in Arabidopsis thaliana represent two evolutionary divergent classes in angiosperms. Plant J 19: 9-20
Emes AV, Vining LC (1970) Partial purification and properties of L-phenylalanine ammonia-lyase from Streptomyces verticillatus. Can J Biochem 48: 613-622
Estabrook E, Gopalan-Sengupta C (1991) Differential expression of phenylalanine ammonia lyase and chalcone synthase during soyabean nodule development. Plant Cell 3: 299-308
Epping B, Kittel M, Ruhnau B, Hemleben V (1990) Isolation and sequence analysis of a chalcone synthase cDNA of Matthiola incana RBr (Brassicaceae). Plant Mol Biol 14: 1061-1063
Faragher JD, Brohier RL (1984) Anthocyanin accumulation in apples skin during ripening: Regulation by ethylene and phenylalanine ammonia-lyase.
Scientia Horticulturae 22: 89-96
Feinbaum RL, Ausubel FM (1988) Transcriptional regulation of Arabidopsis thaliana chalcone synthase gene. Mol Cell Biol 8: 1985-1992
Ferrer J-L, Jez JM, Bowman ME, Dixon RA, Noel JP (1999) Structure of chalcone synthase and the molecular basis of plant polyketide biosynthesis. Nat Struct Biol 6: 775-784
Felsenstein J (1995) Confidence limits on phylogenies: an approach using bootstrap. Evolution 39: 783-791
Fliegmann J, Schroder G, Schanz S, Britsch L, Schroder J (1992) Molecular analysis of chalcone synthase and dihydropinosylvin synthase from Scots pine (Pinus sylvestris), and differential regulation of these and related enzymes in stressed plants. Plant Mol Biol 18: 489-503
Franken P, Niesbach-Klosgen U, Weydemann U, Marechal-Drouard L, Saedler H, Wienand U (1991) The duplicated chalcone synthase genes C2 and Whp (white pollen) of Zea mays are independently regulated; evidence for translational control of Whp expression by the anthocyanin intensifying gene in. EMBO J 10: 2065-2612
Frugoli JA, McPeek MA, Thomas TL, McClung RC (1998) Intron loss and gain during evolution of the catalase gene family in angiosperms. Genetics 149: 355-365
Fukasawa-Akada T, Kung SD, Watson JC (1996) Phenylalanine ammonia-lyase gene structure, expression, and evolution in Nicotiana. Plant Mol Biol 30: 711-722
Funa N, Ohnishi Y, Fuji I, Shibuya M, Ebizuka Y, Horinouchi S (1999) A new pathway for polyketide in microorganisms. Nature 400: 897-899
Gorlach J, Raesecke H-R, Rentsch D, Regnass M, Roy P, Zala M, Keel C, Boller T, Amrhein N, Schmid J (1995) Temporally distinct accumulation of transcripts encoding enzymes of the prechorismate pathway in elicitor-treated, cultured tomato cells. Proc Natl Acad Sci USA 92: 3166-3170
Given NK, Venis MA, Grierson D (1988) Phenylalanine ammonia-lyase activity and anthocyanin synthesis in ripening strawberry fruit.
J Plant Physiol 133: 25-30
Given NK, Venis MA, Grierson D (1988) Purification and properties of phenylalanine ammonia-lyase from strawberry fruit and its synthesis during ripening. J Plant Physiol 133: 31-37
Goiffon JP, Brun M, Bourrier MJ (1991) Fruit anthocyanin pigments of red raspberry cultivars. J Chromat 55: 101-127
Gong Z, Yamazaki M, Sigiyama M, Tanaka Y, Saito K (1997) Cloning and molecular analysis of structural genes involved in anthocyanin biosynthesis and expressed in a forma-specific manner in Perilla frutescens. Plant Mol Biol 35: 915-917
Goodwin PH, Hsiang T, Erickson L (2000) A comparison of stilbene and chalcone synthases including a new stilbene synthase gene from Vitis riparia cv. Gloire de Montpellier. Plant Sci 152: 1-8
Gowri G, Paiva NL, Dixon RA (1991) Stress responses in alfalfa (Medicago sativa L.) 12. Sequence analysis of phenylalanine ammonia-lyase (PAL) cDNA clones and appearance of PAL transcripts in elicitor-treated cells cultures and developing plants. Plant Mol Biol 17: 415-429
Grand C, Boudet A, Boudet AM (1983) Isoenzymes of hydroxycinnamate:CoA ligase from poplar stems properties and tissue distribution. Planta 158: 225-229
Gray JE, Picton S, Shabbeer J, Schuch W, Grierson D (1992) Molecular biology of fruit ripening and its manipulation with antisense genes. Plant Mol Biol 19: 69-79
Greer JM, Puetz J, Thomas KR, Capecchi MR (2000) Maintenance of functional equivalence during paralogous Hox gene evolution. Nature 403: 661-665
Grierson D, Slater A, Speirs J, Tucker GA (1985) The appearance of polygalacturonase mRNA in tomatoes: one of a series of changes in gene expression during development and ripening. Planta 163: 263-271
Gross GG, Zenk MH (1974) Isolation and properties of hydroxycinnamate:CoA ligase from lignifying tissue of Forsythia.
Eur J Biochem 42: 453-459
Hahlbrock K, Zilg H, Grisebach H (1970) Stereochemistry of the enzymatic cyclisation of 4,2',4'-trihydroxychalcone to 7,4-dihydroxyflavanone by isomerases from mung bean seedlings. Eur J Biochem 15: 13-18
Hahlbrock K, Grisebach H (1979) Enzymic controls in the biosynthesis of lignin and flavonoids. Ann Rev Plant Physiol 30: 105-130
Hahlbrock K, Scheel D (1989) Physiology and molecular biology of phenylpropanoid metabolism. Annu Rev Plant Physiol Plant Mol Biol 40: 347-369
Hain R, Reif H, Krause E, Langebartels R, Kindl H, Vornam B, Weise W, Schmelzer E, Schreier P, Stocker R, Stenzel K (1993) Disease resistance results from foreign phytoalexin expression in a novel plant. Nature 361: 153-156
Hanson KR, Havir EA (1970) L-phenylalanine ammonia-lyase IV. Evidence that prosthetic group contains a dehydroalanyl residue and mechanism of action. Arch Biochem Biophys 141: 1-17
Harker CL, Ellis THN, Coen ES (1990) Identification and genetic regulation of the chalcone synthase multigene family in pea. Plant Cell 2: 185-194
Hassan MA, Swartz HJ, Inamine G, Mullineaux P (1993) Agrobacterium tumefaciens-mediated transformations of several Rubus genotypes and recovery of transformed plants. Plant Cell Tissue and Organ Culture 33: 9-17
Havir EA (1979) L-phenylalanine ammonia-lyase. Binding of polysaccharide by the enzyme from maize. Plant Sci Lett 16: 297-304
Helariutta Y, Elomaa P, Kotilainen M, Griesbach RJ, Schroder J, Teeri TH (1995) Chalcone synthase-like genes are active during corolla development are differentially expressed and encode enzymes with differential catalytic properties in Gerbera hybrida (Asteraceae). Plant Mol Biol 28: 47-60
Helariutta Y, Kotilainen M, Elomaa P, Kalkkinen N, Bremer K, Teeri TH, Albert VA (1996) Duplication and functional divergence in the chalcone synthase gene family of Asteraceae: evolution with substrate change and catalytic simplification.
Proc Natl Acad Sci USA 93: 9033-9038
Herrmann KM (1983) The common aromatic biosynthetic pathway. In KM Herrmann, RL Somerville, eds, Amino Acids: Biosynthesis and Genetic Regulation. Addison-Wesley, Reading, MA, pp 301-322
Herrmann KM (1995) The shikimate pathway: Early steps in the biosynthesis of aromatic compounds. Plant Cell 7: 907-919
Herrmann KM, Weaver LM (1999) The shikimate pathway. Annu Rev Plant Physiol Plant Mol Biol 50: 473-503
Hernandez D, Phillips AT (1994) Ser-143 is an essential active site residue in histidine ammonia-lyase. Arch Biochem Biophys 307: 126-132
Hess D (1967) Substrate induction during anthocyanin synthesis in Petunia. Naturwissenschaften 54: 289-290
Holton TA, Graham MW (1991) A simple and efficient method for direct cloning of PCR products using ddT-tailed vectors. Nuc Acids Res 19: 1156-1157
Hopwood DA, Sherman DH (1990) Molecular genetics of polyketides and its comparisons to fatty acid biosynthesis. Annu Rev Genet 24: 37-66
Howles PA, Arioli T, Weinman JJ (1994) Characterization of a phenylalanine ammonia-lyase multigene family in Trifolium subterraneum. Gene 138: 87-92
Howles PA, Paiva NL, Sewalt VJH, Elkind NL, Bate Y, Lamb CJ, Dixon RA (1996) Over-expression of L-phenylalanine ammonia-lyase in transgenic tobacco plants reveals control points for flux into phenylpropanoid biosynthesis. Plant Physiol 112: 1617-1624
Hrazdina G, Kreuzaler F, Hahlbrock K, Grisebach H (1976) Substrate specificity of flavanone synthase from cell suspension cultures of parsley and structure of release products in vitro. Arch Biochem Biophys 175: 392-399
Hrazdina G, Parsons GF, Mattick LR (1984) Physiological and biochemical events during development and maturation of grape berries. Am J Enol Vitic 35: 220-227
Hrazdina G, Wagner G (1985) Metabolic pathways as enzyme complexes: Evidence for the synthesis of phenylpropanoids and flavonoids in membrane-associated enzyme complexes.
Arch Biochem Biophys 237: 88-100
Hrazdina G (1992) Compartmentalization in aromatic metabolism. Rec Adv Phytochem 26: 1-23
Hu W-J, Kawaoka A, Tsai C, Lung J, Osakabe K, Ebinuma H, Chiang V (1998) Compartmentalized expression of two structurally and functionally distinct 4-coumarate:CoA ligase genes in aspen (Populus tremuloides). Proc Natl Acad Sci USA 95: 5407-5412
Hu W-J, Harding SA, Lung J, Popko JL, Ralph J, Stokke DD, Tsai C-J, Chiang VL (1999) Repression of lignin biosynthesis promotes cellulose accumulation and growth in transgenic trees. Nat Biotechnol 17: 808-812
Huang Y, McBeath JH (1994) Bacterial induced activation of an Arabidopsis phenylalanine ammonia-lyase promoter in transgenic tobacco plants. Plant Sci 98: 25-35
Jez JM, Ferrer JL, Bowman ME, Dixon RA, Noel JP (2000) Dissection of malonyl-CoA decarboxylation from polyketide formation in the reaction mechanism of a plant polyketide synthase. Biochemistry 39: 890-902
Jones CS, Iannetta PPM, Woodhead M, Davies HV, McNicol RJ, Taylor MA (1997) The isolation of RNA from raspberry (Rubus idaeus) fruit. Mol Biotechnol 8: 219-221
Joos HJ, Hahlbrock K (1992) Phenylalanine ammonia-lyase in potato (Solanum tuberosum L.). Eur J Biochem 204: 621-629
Jorgensen R (1994) The genetic origins of biosynthesis and light-responsive control of the chemical UV screen of land plants. In BE Ellis, GW Kuroki, HA Stafford, eds, Genetic Engineering of Plant Secondary Metabolism. Plenum Press, New York, pp 179-192
Joshi CP (1987) An inspection of the domain between putative TATA box and translation start site in 79 plant genes. Nuc Acids Res 15: 6643-6653
Junghans H, Dalkin K, Dixon RA (1993) Stress response in alfalfa (Medicago sativa L.) 15. Characterization and expression patterns of members of a subset of the chalcone synthase multigene family.
Plant Mol Biol 22: 239-253
Junghanns KT, Kneusel RE, Baumert A, Maier W, Groger D, Matern U (1995) Molecular cloning and recombinant expression of acridone synthase from elicited Ruta graveolens L cell suspension culture. Plant Mol Biol 27: 681-692
Kausch KD, Handa AK (1997) Molecular cloning of a ripening-specific lipoxygenase and its expression during wild-type and mutant tomato fruit development. Plant Physiol 113: 1041-1050
Keith B, Dong X, Ausubel FM, Fink G (1991) Differential induction of 3-deoxy-D-arabino-heptulosonate 7-phosphate synthase genes in Arabidopsis thaliana by wounding and pathogen attack. Proc Natl Acad Sci USA 88: 8821-8825
Kennedy J, Auclair K, Kendrew SG, Park C, Vederas JC, Hutchinson CR (1999) Modulation of polyketide synthase activity by accessory proteins during lovastatin biosynthesis. Science 284: 1368-1372
Kervinen T, Peltonen S, Utriainen M, Kangasjarvi J, Teeri TH, Karjalainen R (1997) Cloning and characterization of cDNA clones encoding phenylalanine ammonia-lyase in barley. Plant Sci 123: 143-150
Kim S-H, Kronstad JW, Ellis BE (1996) Purification and characterization of phenylalanine ammonia-lyase from Ustilago maydis. Phytochemistry 43: 351-357
Knobloch K-H, Hahlbrock K (1975) Isoenzymes of p-coumarate:CoA ligase from cell suspension of Glycine max. Eur J Biochem 52: 311-320
Knobloch K-H, Hahlbrock K (1977) 4-coumarate:CoA ligase from cell suspension cultures of Petroselinum hortense Hoffm. Partial purification, substrate specificity, and further properties. Arch Biochem Biophys 184: 237-248
Knochel T, Ivens A, Hester G, Gonzalez A, Bauerle R, Wilmanns M, Kirschner K, Jansonius JN (1999) The crystal structure of anthranilate synthase from Sulfolobus solfataricus: functional implications. Proc Natl Acad Sci USA 96: 9479-9484
Knogge W, Beulen C, Weissenbock G (1981) Distribution of phenylalanine ammonia-lyase and 4-coumarate:CoA ligase in oat primary leaf tissues.
Z Naturforsch 36: 389-395
Koes RE, Quattrocchio F, Mol JNM (1994) The flavonoid biosynthetic pathways in plants: function and evolution. Bioessays 16: 123-131
Koes RE, Spelt CE, Mol JNM, Gerats AGM (1987) The chalcone synthase multigene family of Petunia hybrida (V30): Sequence homology, chromosomal localization and evolutionary aspects. Plant Mol Biol 10: 375-385
Koes RE, Spelt CE, Mol JNM (1989) The chalcone synthase multigene family of Petunia hybrida (V30): Differential light-regulated expression during flower development and UV light induction. Plant Mol Biol 12: 213-225
Koes RE, Spelt CE, van den Elzen PJM, Mol JNM (1989) Cloning and molecular characterization of the chalcone synthase multigene family of Petunia hybrida. Gene 81: 245-257
Kozak M (1986) Point mutations define a sequence flanking the AUG initiator codon that modulates translation by eukaryotic ribosomes. Cell 44: 283-292
Krell T, Coggins JR, Lapthorn AJ (1998) Three-dimensional structure of shikimate kinase. J Mol Biol 278: 983-997
Kreuzaler F, Hahlbrock K (1972) Enzymatic synthesis of aromatic compounds in higher plants: formation of naringenin (5,7,4'-trihydroxyflavanone) from p-coumaroyl: coenzyme A and malonyl coenzyme A. FEBS Lett 28: 69-72
Kreuzaler F, Hahlbrock K (1975) Enzymic synthesis of an aromatic ring from acetate units. Partial purification and some properties of flavanone synthase from cell-suspension cultures of Petroselinum hortense. Eur J Biochem 56: 205-213
Kreuzaler F, Hahlbrock K (1975) Enzymatic synthesis of aromatic compounds in higher plants. Formation of bis-noryangonin (4-hydroxy-6[4-hydroxystyryl]2-pyrone) from p-coumaroyl-CoA and malonyl-CoA. Arch Biochem Biophys 169: 84-90
Kreuzaler F, Ragg H, Heller W, Tesch R, Witt I, Hammer D, Hahlbrock K (1979) Flavanone synthase from Petroselinum hortense. Molecular weight, subunit composition, size of messenger RNA, and absence of pantetheinyl residue.
Eur J Biochem 99: 89-96
Kumamaru T, Suenaga H, Mitsuoka M, Watanabe T, Furukawa K (1998) Enhanced degradation of polychlorinated biphenyls by directed evolution of biphenyl dioxygenase. Nat Biotechnol 16: 663-666
Laemmli UK (1970) Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature 227: 680-685
Langenheim JH (1994) Higher plant terpenoids: a phytocentric overview of their ecological roles. J Chem Ecol 20: 1223-1280
Lanz T, Schroder G, Schroder J (1990) Differential regulation of genes for resveratrol synthase in cell cultures of Arachis hypogaea L. Planta 181: 169-175
Lawton M, Dixon R, Hahlbrock K, Lamb C (1983) Rapid induction of the synthesis of phenylalanine ammonia-lyase and chalcone synthase in elicitor treated plant cells. Eur J Biochem 129: 593-601
Lay-Yee M, Dellapenna D, Ross GS (1990) Changes in mRNA and protein during ripening in apple fruit (Malus domestica Borkh. cv. Golden Delicious). Plant Physiol 94: 850-853
Latza S, Dietmar G, Berger RG (1996) Identification and accumulation of 1-O-trans-cinnamoyl-beta-D-glucopyranose in developing strawberry fruit (Fragaria ananassa Duch. cv. Kent). J Agric Food Chem 44: 1367-1370
Lee D, Ellard M, Wanner LA, Davis KR, Douglas CJ (1995) The Arabidopsis thaliana 4-coumarate:CoA ligase (4CL) gene: stress and developmentally regulated expression and nucleotide sequence of its cDNA. Plant Mol Biol 28: 871-884
Lee D, Douglas C (1996) Two divergent members of a tobacco 4-coumarate:Coenzyme A ligase gene family. Plant Physiol 112: 193-205
Levi A, Galau GA, Wetzstein HY (1992) A rapid procedure for the isolation of RNA from high-phenolic-containing tissues of pecan. Hort Sci 27: 1316-1318
Leyva A, Liang XW, Pintortoro JA, Dixon RA, Lamb CJ (1992) cis-element combinations determine phenylalanine ammonia-lyase gene tissue-specific expression patterns.
Plant Cell 4: 263-271
Leyva A, Antonio JJ, Salinas J, Miguel J, Zapater M (1995) Low temperature induces accumulation of phenylalanine ammonia-lyase and chalcone synthase mRNA of Arabidopsis thaliana in a light dependent manner. Plant Physiol 108: 39-46
Liang X, Dron M, Schmid J, Dixon RA, Lamb CJ (1989) Developmental and environmental regulation of phenylalanine ammonia-lyase-β-glucuronidase gene fusion in transgenic tobacco plants. Proc Natl Acad Sci USA 86: 9284-9288
Liang X, Dron A, Cramer CL, Dixon RA, Lamb CJ (1989) Differential regulation of phenylalanine ammonia-lyase genes during plant development and by environmental cues. J Biol Chem 264: 14486-14492
Liew CF, Goh CJ, Loh CS, Lim SH (1996) Cloning and nucleotide sequence of a cDNA encoding phenylalanine ammonia-lyase from Bromheadia finlaysoniana (Lindl.) Rchb.f. (Accession No. X99997) (PGR96-087). Plant Physiol 112: 863
Lindl T, Kreuzaler F, Hahlbrock K (1973) Synthesis of p-coumaroyl: coenzyme A with a partially purified p-coumarate:CoA ligase from cell suspension cultures of soybean (Glycine max). Biochim Biophys Acta 302: 457-464
Lisitsyn N, Lisitsyn N, Wigler M (1993) Cloning the differences between two complex genomes. Science 259: 946-951
Liswidowati F, Holhmann F, Schwer B, Kindl H (1991) Induction of stilbene synthase by Botrytis cinerea in cultured grapevine cells. Planta 183: 307-314
Liu DR, Schultz PG (1999) Generating new molecular functions: A lesson from nature. Angew Chem Int Ed 38: 36-54
Logemann E, Parniske M, Hahlbrock K (1995) Modes of expression and common structural features of the complete phenylalanine ammonia-lyase gene family in parsley. Proc Natl Acad Sci USA 92: 5905-5909
Lois R, Dietrich A, Hahlbrock K, Schulz W (1989) A phenylalanine ammonia-lyase gene from parsley: structure, regulation and identification of elicitor and light responsive cis-acting elements.
EMBO J 8: 1641-1648
Lois R, Hahlbrock K (1992) Differential wound activation of members of the phenylalanine ammonia-lyase and 4-coumarate:CoA ligase gene family in various organs of parsley plants. Zeitschrift Fuer Naturforschung 47c: 90-94
Lopez-Gomez R, Gomez-Lim MA (1992) Changes in mRNA and protein during ripening in mango fruit. J Plant Physiol 141: 82-87
Lozoya E, Hoffmann H, Douglas C, Schulz W, Scheel D, Hahlbrock K (1988) Primary structures and catalytic properties of isoenzymes encoded by the two 4-coumarate:CoA ligase genes in parsley. Eur J Biochem 176: 661-667
Lukacin R, Springob K, Urbanke C, Ernwein C, Schroder G, Schroder J, Matern U (1999) Native acridone synthases I and II from Ruta graveolens L. form homodimers. FEBS Lett 448: 135-140
McDowell JM, Huang S, Mckinney EC, An Y-Q, Meagher RB (1996) Structure and evolution of the actin gene family in Arabidopsis thaliana. Genetics 142: 587-602
McNeil SD, Nuccio ML, Hanson AD (1999) Betaines and related osmoprotectants. Targets for metabolic engineering of stress resistance. Plant Physiol 120: 945-949
McQuoid M, Ellis BE (1999) Examining the roots of an ancient biosynthetic enzyme: phenylalanine ammonia-lyase genes in early land plants (Abstract 604). Annual meeting of Plant Physiology
Macheix JJ, Fleuriet A, Billot J (1990) Fruit Phenolics. CRC Press, Florida, pp 41-43
Malamy J, Klessig DF (1992) Salicylic acid and plant disease resistance. Plant J 2: 643-654
Mayer et al. (1999) Sequence analysis of chromosome 4 of the plant Arabidopsis thaliana. Nature 402: 769-777
Minami E, Ozeki Y, Matsuoka M, Koizuka N, Tanaka Y (1989) Structure and some characterization of the gene for phenylalanine ammonia-lyase from rice plants. Eur J Biochem 185: 19-25
Moore PP, Daubeny HA (1993) "Meeker" Red Raspberry. Fruit Varieties J 47: 2-4
O'Hagan D (1991) The polyketide metabolites.
Ellis Horwood Ltd, Chichester, UK
O'Neill SD, Tong Y, Sporlein B, Forkmann G, Yoder JI (1990) Molecular genetic analysis of chalcone synthase in Lycopersicon esculentum and an anthocyanin-deficient mutant. Mol Gen Genet 224: 279-288
Oates AC, Wollberg P, Achen MG, Wilks AF (1998) Sampling the genomic pool of protein tyrosine kinase genes using the polymerase chain reaction with genomic DNA. Biochem Biophys Res Comm 249: 660-667
Oetiker JH, Olson DC, Shiu OY, Yang SF (1997) Differential induction of seven 1-aminocyclopropane-1-carboxylate synthase genes by elicitor in suspension cultures of tomato (Lycopersicon esculentum). Plant Mol Biol 34: 275-286
Ohl S, Hedrick SA, Chory J, Lamb CJ (1990) Functional properties of a phenylalanine ammonia lyase promoter from Arabidopsis. Plant Cell 2: 837-848
Ohno S (1970) Why gene duplications? Evolution by Gene Duplication. Springer-Verlag, New York, pp 59-65
Orum H, Rasmussen OF (1992) Expression in E. coli of the gene encoding phenylalanine ammonia-lyase from Rhodosporidium toruloides. Appl Microbiol Biotechnol 36: 745-748
Osakabe Y, Kazuya N, Kitamura H, Kawai S, Kondo Y, Fujii T, Takabe K, Katayama Y, Morohoshi N (1996) Immunocytochemical localization of phenylalanine ammonia-lyase in tissues of Populus kitakamiensis. Planta 200: 13-19
Perez-Vicente R, Dorado G, Maldonado M (1996) Cross-species amplification of glutamine synthetase cDNA by polymerase chain reaction with degenerate primers. Physiologia Plantarum 98: 705-713
Paniego NB, Zuurbier WM, Fung S-Y, Heijden RVR, Scheffer JC (1999) Phlorisovalerophenone synthase, a novel polyketide synthase from hop (Humulus lupulus L.) cones. Eur J Biochem 262: 612-616
Porath J, Carlsson J, Olsson I, Belfrage G (1975) Metal chelate affinity chromatography, a new approach to protein fractionation.
Nature 258: 598-599
Pare PW, Mischke CF, Edwards R, Dixon RA, Norman HA, Mabry TJ (1992) Induction of phenylpropanoid pathway enzymes in elicitor treated cultures of Cephalocereus senilis. Phytochemistry 31: 149-153
Pellegrini L, Rohfritsch O, Fritig B, Legrand M (1994) Phenylalanine ammonia-lyase in tobacco: molecular cloning, gene expression during hypersensitive reaction to tobacco mosaic virus and response to a fungal elicitor. Plant Physiol 106: 877-886
Perkins-Veazie P, Nonnecke G (1992) Physiological changes during ripening of raspberry fruits. Hort Sci 27: 331-333
Perrotta A, Shih I-H, Been MD (1999) Imidazole rescue of a cytosine mutation in a self-cleaving ribozyme. Science 286: 123-126
Price T, Aitken J, Simpson ER (1992) Relative expression of aromatase cytochrome P450 in human fetal tissues as determined by competitive polymerase chain reaction amplification. J Clin Endocrinol Metab 74: 879-883
Preisig-Muller R, Gnau P, Kindl H (1995) The inducible 9,10-dihydrophenanthrene pathway: characterization and expression of bibenzyl synthase and S-adenosylhomocysteine hydrolase. Arch Biochem Biophys 317: 201-207
Preisig-Muller R, Schwekendieks A, Brehm A, Reif HJ, Kindl H (2000) Characterization of a pine multigene family containing elicitor-responsive stilbene synthase gene. Plant Mol Biol 39: 221-229
Raiber S, Schroder G, Schroder J (1995) Molecular characterization of two stilbene synthases from eastern white pine (Pinus strobus): A single Arg/His difference determines the activity and the pH dependence of the enzymes. FEBS Lett 361: 299-302
Rasmussen S, Dixon RA (1999) Transgene-mediated and elicitor-induced perturbations of metabolic channeling at the entry point into the phenylpropanoid pathway.
Plant Cell 11: 1537-1551
Rate DN, Cuenca JV, Bowman GR, Guttman DS, Greenberg JT (1999) The gain-of-function Arabidopsis acd6 mutant reveals novel regulation and function of the salicylic acid signaling pathway in controlling cell death, defenses, and cell growth. Plant Cell 11: 1695-1708
Reinold S, Hauffe KD, Douglas CJ (1993) Tobacco and parsley 4-coumarate:coenzyme A ligase genes are temporally and spatially regulated in a cell-type specific manner during tobacco flower development. Plant Physiol 101: 373-383
Reimold U, Kroeger M, Kreuzaler F, Hahlbrock K (1983) Coding and 3' non-coding nucleotide sequence of chalcone synthase mRNA and assignment of amino acid sequence of the enzyme. EMBO J 2: 1801-1805
Reinold S, Hahlbrock K (1997) In situ localization of phenylpropanoid biosynthetic mRNAs and proteins in parsley (Petroselinum crispum). Bot Acta 110: 431-443
Rhodes D, Hanson AD (1993) Quaternary ammonium and tertiary sulfonium compounds in higher plants. Ann Rev Plant Physiol Plant Mol Biol 44: 357-384
Rohde W, Dorr S, Salamini F, Becker D (1991) Structure of a chalcone synthase gene from Hordeum vulgare. Plant Mol Biol 16: 1103-1106
Rivera-Lopez J, Ordorica-Falomir C, Wesche-Ebeling P (1999) Changes in anthocyanin concentration in Lychee (Litchi chinensis Sonn.) pericarp during maturation. Food Chem 65: 195-200
Robertson GW, Griffiths DW, Woodford JAT, Birch ANE (1995) Changes in the chemical composition of volatiles released by the flowers and fruits of the red raspberry (Rubus idaeus) cultivar Glen Prosen. Phytochemistry 38: 1175-1179
Rolfs CH, Fritzemeier KH, Kindl H (1981) Cultured cells of Arachis hypogaea susceptible for induction of stilbene synthase (Resveratrol-forming). Plant Cell Reports 1: 83-85
Rolfs CH, Kindl H (1984) Stilbene synthase and Chalcone synthase. Plant Physiol 75: 489-492
Rolfs CH, Schon H, Steffens M, Kindl H (1987) Cell-suspension cultures of Arachis hypogaea L.: model system of specific enzyme induction in secondary metabolism.
Planta 172: 238-244
Roubelakis-Angelakis KA, Kliewer WM (1985) Phenylalanine ammonia-lyase in berries of Vitis vinifera L: Extraction and possible sources of error during assay. Am J Enol Vitic 36: 314-315
Ryder TB, Hedrick SA, Bell JN, Liang X, Clouse SD, Lamb CJ (1987) Organization and differential activation of a gene family encoding the plant defense enzyme chalcone synthase in Phaseolus vulgaris. Mol Gen Genet 210: 219-233
Rubery PH, Northcote DH (1968) Site of phenylalanine ammonia-lyase activity and the synthesis of lignin during xylem differentiation. Nature 219: 1230-1232
Sablowski RWM, Moyano E, Culianez-Macia A, Schuch W, Martin C, Bevan M (1994) A flower-specific Myb protein activates transcription of phenylpropanoid biosynthetic gene. EMBO J 13: 128-137
Saiki RK, Gelfand DH, Stoffel S, Scharf SJ, Higuchi R, Horn GT, Mullis KB, Erlich HA (1988) Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase. Science 239: 487-491
Sambrook J, Fritsch EF, Maniatis T (1989) Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY
Schanz S, Schroder G, Schroder J (1992) Stilbene synthase from Scots pine (Pinus sylvestris). FEBS Lett 313: 71-74
Schmelzer E, Kruger-Lebus S, Hahlbrock K (1989) Temporal and spatial patterns of gene expression around sites of attempted fungal infection in parsley leaves. Plant Cell 1: 993-1001
Schinz H, Seidel CF (1961) Nachtrag zu der Arbeit Nr. 194. Helv Chim Acta 44: 278
Schroder G, Brown WS, Schroder J (1988) Molecular analysis of resveratrol synthase. cDNA, genomic clones and relationship with chalcone synthase. Eur J Biochem 172: 161-169
Schroder J (1989) Protein sequence homology between plant 4-coumarate:CoA ligase and firefly luciferase. Nuc Acids Res 17: 460
Schroder G, Schroder J (1992) A single amino acid change of histidine to glutamine alters the substrate preference of stilbene synthase.
J Biol Chem 267: 20558-20569
Schroder J (1997) A family of plant-specific polyketide synthases: facts and predictions. Trends Plant Sci 2: 373-378
Schroder J, Raiber S, Berger T, Schmidt A, Schmidt J, Soares-Sello AM, Bradshiri E, Strack D, Simpson TJ, Veit M, Schroder S (1998) Plant polyketide synthases: A chalcone synthase-type enzyme which performs a condensation reaction with methylmalonyl-CoA in the biosynthesis of C-methylated chalcones. Biochemistry 37: 8417-8425
Schoeppner A, Kindl H (1979) Stilbene synthase (pinosylvine synthase) and its induction by ultraviolet light. FEBS Lett 108: 349-352
Schomburg D, Salzmann M (1980) Class 4: Lyases, Phenylalanine ammonia-lyase. In Enzyme Handbook 1, Springer-Verlag, Berlin
Schutz R, Heller W, Hahlbrock K (1983) Substrate specificity of chalcone synthase from Petroselinum hortense. Formation of phloroglucinol derivatives from aliphatic substrates. J Biol Chem 258: 6730-6734
Schulz W, Eiben H-G, Hahlbrock K (1989) Expression in Escherichia coli of catalytically active phenylalanine ammonia-lyase from parsley. FEBS Lett 258: 335-338
Schwede TF, Retey J, Schulz (1999) Crystal structure of histidine ammonia-lyase revealing a novel polypeptide modification as the catalytic electrophile. Biochemistry 38: 5355-5361
Seguin A, Gotz L, Antonio L, Dixon RA, Lamb CJ (1997) Characterization of a gene encoding a DNA-binding protein that interacts in vitro with vascular specific cis elements of the phenylalanine ammonia-lyase promoter. Plant Mol Biol 35: 281-291
Shaw NN, Bolwell GP, Smith C (1990) Wound-induced phenylalanine ammonia-lyase in potato (Solanum tuberosum) tuber discs. Biochem J 267: 163-170
Shimazaki N, Mimaki Y, Sashida Y (1991) Prunasin and acetylated phenylpropanoic acid sucrose esters bitter principles from the fruits of Prunus-jamasakura and Prunus-maximowiczii.
Phytochemistry 30: 1475-1480
Shirley-Winkel B, Kubasek W, Storz G, Bruggemann W, Koornneef M, Ausubel F, Goodman HM (1995) Analysis of Arabidopsis mutants deficient in flavonoid biosynthesis. Plant J 8: 659-671
Shufflebottom D, Edwards K, Schuch W, Bevan M (1993) Transcription of two members of a gene family encoding phenylalanine ammonia-lyase leads to remarkably different cell specificities and induction patterns. Plant J 3: 835-845
Shumilin IA, Kretsinger RH, Bauerle RH (1999) Crystal structure of phenylalanine-regulated 3-deoxy-D-arabino-heptulosonate-7-phosphate synthase from Escherichia coli. Structure (London) 7: 865-875
Siebert PD, Larrick JW (1993) PCR MIMICs: Competitive DNA fragments for use as internal standards in quantitative PCR. Biotechniques 14: 244-249
Sulkowski E (1985) Purification of proteins by IMAC. Trends Biotech 3: 1-7
Sommer H, Saedler H (1986) Structure of the chalcone synthase gene of Antirrhinum majus. Mol Gen Genet 202: 429-434
Sparvoli F, Martin C, Scienza A, Gavazzi G, Tonelli C (1994) Cloning and molecular analysis of the structural genes involved in flavonoid and stilbene biosynthesis in grapes (Vitis vinifera L). Plant Mol Biol 24: 743-755
Speirs J, Brady CJ (1991) Modification of gene expression in ripening fruit. Aust J Plant Physiol 18: 519-532
Stockigt J, Zenk MH (1975) Chemical synthesis and properties of hydroxycinnamoyl-coenzyme A derivatives. Z Naturforsch 30c: 352-358
Stafford HA (1974) The metabolism of aromatic compounds. Annu Rev Plant Physiol 25: 459-486
Stemmer WPC (1994) Rapid evolution of a protein in vitro by DNA shuffling. Nature 370: 389-390
Stewart WN, Rothwell GW (1993) Paleobotany and the Evolution of Plants. Cambridge University Press, Cambridge, UK
Stines AP, Naylor DJ, Hej PB, Heeswijck RV (1999) Proline accumulation in developing grapevine fruit occurs independently of changes in the levels of Δ1-pyrroline-5-carboxylate synthetase mRNA or proteins.
Plant Physiol 120: 923-931
Strater N, Hakansson K, Schnappauf G, Braus G, Lipscomb WN (1996) Crystal structure of the T state of allosteric yeast chorismate mutase and comparison with the R state. Proc Natl Acad Sci 93: 3330-3334
Stuible H-P, Buttner D, Ehlting J, Hahlbrock K, Kombrink E (2000) Mutational analysis of 4-coumarate:CoA ligase identifies functionally important amino acids and verifies its close relationship to other adenylate-forming enzymes. FEBS Lett 467: 117-122
Subramaniam R, Reinold S, Molitor EK, Douglas CJ (1993) Structure, inheritance, and expression of hybrid poplar (Populus trichocarpa x Populus deltoides) phenylalanine ammonia-lyase genes. Plant Physiol 102: 71-83
Sukrasno N, Yeoman MM (1993) Phenylpropanoid metabolism during growth and development of Capsicum frutescens fruits. Phytochemistry 32: 839-844
Swofford DL, Olsen GJ (1990) Phylogeny reconstruction. In DM Hillis, C Moritz, eds, Molecular Systematics. Sinauer Associates, Sunderland, Massachusetts, pp 411-501
Tanaka Y, Matsuoka M, Yamamoto N, Ohashi Y, Kano-Murakami Y, Ozeki Y (1989) Structure and characterization of a cDNA clone for phenylalanine ammonia-lyase from cut roots of sweet potato. Plant Physiol 90: 1403-1407
Tamagnone L, Merida A, Stacey N, Plaskitt, Parr A, Chang C-F, Lynn D, Dow JM, Roberts K, Martin C (1998) Inhibition of phenolic acid metabolism results in precocious cell death and altered cell morphology in leaves of transgenic plants. Plant Cell 10: 1801-1816
Tropf S, Lanz T, Rensing SA, Schroder J, Schroder G (1994) Evidence that stilbene synthases have developed from chalcone synthases several times in the course of evolution. J Mol Evol 38: 610-618
Uhlmann A, Ebel J (1993) Molecular cloning and expression of 4-coumarate:coenzyme A ligase, an enzyme involved in the resistance response of soybean (Glycine max L.) against pathogen attack. Plant Physiol 102: 1147-1156
Vance CP, Kirk TK, Sherwood RT (1980) Lignification as a mechanism of disease resistance. Annu Rev Phytopathol 18: 259-288
Vaslet CA, Strausberg RL, Sykes A, Levy A, Filpula D (1988) cDNA and genomic cloning of yeast phenylalanine ammonia-lyase reveal genomic intron deletions. Nuc Acids Res 16: 11382
Verwoert JIGS, Verbee EC, Van der Linden KH, Nijkamp HJJ, Stuiji AR (1992) Cloning, nucleotide sequence, and expression of the Escherichia coli fabD gene, encoding malonyl coenzyme A-acyl carrier protein transacylase. J Bacteriol 174: 2851-2857
Voo KS, Whetten RW, O'Malley DM, Sederoff RR (1995) 4-coumarate coenzyme A ligase from loblolly pine xylem. Isolation, characterization, and complementary DNA cloning. Plant Physiol 108: 85-97
Wallis PJ, Rhodes MJC (1977) Multiple forms of hydroxycinnamate:CoA ligase in etiolated pea seedlings. Phytochemistry 16: 1891-1894
Walsh JB (1987) Sequence-dependent gene conversion: can duplicated genes diverge fast enough to escape conversion? Genetics 117: 543-557
Wang C-S, Vodkin LO (1994) Extraction of RNA from tissues containing high levels of procyanidins that bind RNA. Plant Mol Biol Reporter 12: 132-145
Wang X-Q, Tank DC, Sang T (2000) Phylogeny and divergence times in Pinaceae: Evidence from three genomes. Mol Biol Evol 17: 773-781
Wanner LA, Guoqing L, Ware D, Somssich JJE, Davis K (1995) The phenylalanine ammonia-lyase gene family in Arabidopsis thaliana. Plant Mol Biol 27: 327-338
Waters E (1995) The molecular evolution of the small heat-shock proteins in plants. Genetics 141: 785-795
Webster Jr. LT, Mieyal JJ, Siddiqui (1974) Benzoyl and hydroxybenzoyl esters of coenzyme A. J Biol Chem 249: 2641-2645
Welle R, Grisebach H (1988) Induction of phytoalexin synthesis in soybean: enzymatic cyclization of prenylated pterocarpans to glyceollin isomers.
Arch Biochem Biophys 263: 191-198
Wienand U, Sommer H, Schwarz Zs, Shepherd N, Saedler H, Kreuzaler F, Ragg H, Fautz E, Hahlbrock K, Harrison B, Peterson PA (1982) A general method to identify plant structural genes among genomic clones using transposable elements induced mutations. Mol Gen Genet 187: 195-201
Weisshaar B, Jenkins GI (1998) Phenylpropanoid biosynthesis and its regulation. Curr Opin Plant Biol 1: 251-257
Whetten RW, Sederoff RR (1992) Phenylalanine ammonia-lyase from loblolly pine: purification of the enzyme and isolation of complementary DNA clones. Plant Physiol 98: 380-386
Wiersma PA, Wu Z (1998) A full length cDNA for phenylalanine ammonia-lyase cloned from ripe sweet cherry fruit (Prunus avium; Accession No. AF036948) (PGR98-184). Plant Physiol 118: 1102
Wiese W, Vorman E, Krause H, Kindl H (1994) Structural organization and differential expression of three stilbene synthase genes located on a 13-kb grapevine fragment. Plant Mol Biol 26: 667-677
Wilkinson JQ, Lanahan MB, Conneer T, Klee HJ (1995) Identification of mRNAs with enhanced expression in ripening strawberry fruit using polymerase chain reaction differential display. Plant Mol Biol 27: 1097-1108
Wingender R, Rohrig H, Horicke C, Wing D, Schell J (1989) Differential regulation of soyabean chalcone synthase genes in plant defense, symbiosis and upon environmental stimuli. Mol Gen Genet: 315-322
Woodward JR (1972) Physical and chemical changes in developing strawberry fruits. J Sci Food Agric 23: 465-473
Wu L, Ueda T, Messing J (1995) The formation of mRNA 3'-end in plants. Plant J 8: 323-329
Wu SC, Hahlbrock K (1992) In situ localization of phenylpropanoid related gene expression in different tissues of light-grown and dark-grown parsley seedlings.
Z Naturforsch 47c: 591-600
Yamada T, Tanaka Y, Sriprasertsak P, Kato H, Hashimoto T, Kawamata S, Ichinose Y, Kato H, Shiraishi T, Oku H (1992) Phenylalanine ammonia-lyase genes from Pisum sativum: structure, organ-specific expression and regulation by fungal elicitor and suppressor. Plant Cell Physiol 33: 715-725
Yazaki K, Ogawa A, Tabata M (1995) Isolation and characterization of two cDNAs encoding 4-coumarate:CoA ligase in Lithospermum cell cultures. Plant Cell Physiol 36: 1319-1329
Young MR, Towers GHN, Neish AC (1966) Taxonomic distribution of ammonia-lyases for L-phenylalanine and L-tyrosine in relation to lignification. Can J Bot 44: 341-349
Zhang XH, Chiang VL (1997) Molecular cloning of 4-coumarate:coenzyme A ligase in loblolly pine and the roles of this enzyme in the biosynthesis of lignin in compression wood. Plant Physiol 113: 65-74
Zhao Y, Kung SD, Dube SK (1990) Nucleotide sequence of rice 4-coumarate:CoA ligase gene, 4-CL.1. Nucleic Acids Res 18: 6144
