UBC Theses and Dissertations

A genomics study of Pinus taeda somatic embryo germination Lane, Alexander 2006

A GENOMICS STUDY OF Pinus taeda SOMATIC EMBRYO GERMINATION

By Alexander Lane
B.Sc. (Plant Biology), University of British Columbia, 1999

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of Science in THE FACULTY OF GRADUATE STUDIES (Plant Science)

THE UNIVERSITY OF BRITISH COLUMBIA
April 2006
©Alexander Lane, 2006

Abstract

Loblolly pine (Pinus taeda) is one of the most prolific plantation forest tree species, and has significant ecological and economic importance. In order to maximize Pinus taeda plantation productivity, there has been extensive interest in developing an efficient vegetative reproduction system that could maximize production of elite performing trees, thereby minimizing the need for lengthy breeding cycles and dependence on traditional seed orchards. Somatic embryogenesis is one such propagation method that makes it possible to generate large numbers of genetically identical somatic embryos from a single zygotic seed. Though this technology is well established in research settings, the industrial application of this process has been limited due to low production efficiency. Germination of the somatic embryos is one step that appears to introduce a high degree of variability into the overall process. To better understand this variability, a molecular-level characterization of germinating somatic embryos was performed. By using cDNA microarray technology it was possible to plot the gene expression profiles of approximately 22,000 genes across 3 time points and 4 growth medium treatments. Key primary and secondary biochemical pathways were observed to be induced, while a large population of transcripts appears to be stored in the desiccated embryo prior to germination. The utility and challenges of cross-species microarray hybridizations are also discussed.

Table of contents

Table of contents
List of tables
List of figures
Abbreviations
Acknowledgments
1.0 General introduction
1.1 Plantation forestry
1.2 Somatic embryogenesis
1.3 Somatic embryo germination
1.4 Genomics of somatic embryogenesis
1.5 Hypotheses
2.0 Cross-species hybridization
2.1 Expression profiling in plant genomic research
2.2 History of cross-species microarray hybridization
2.3 Challenges of cross-species microarray hybridization
3.0 Materials and methods: cross-species hybridization
3.1 Conifer seed acquisition and storage
3.2 Conifer seed treatment and germination
3.3 Harvesting of conifer germinants
3.4 Populus seed acquisition and storage
3.5 Harvesting of Populus germinants
3.6 mRNA isolation and purification
3.7 Microarray resources
3.8 Cross-species microarray hybridization design
3.9 Microarray hybridization protocol
3.9.1 Probe preparation
3.9.2 Slide preparation
3.9.3 Probe hybridization
3.9.4 Labeling reaction
3.9.5 Slide scanning and image processing
3.10 Promoter element prediction and gene ontology characterization
3.10.1 Sample gene list generation: 1000 most highly expressed genes
3.10.2 Promoter analysis
4.0 Results and discussion: cross-species hybridization
4.1 Raw data analysis
4.1.1 Anomalous Pinus data
4.1.2 Core results
4.2 Genes highly expressed post-germination
4.2.1 Promoter analysis
4.2.2 Gene ontology: functional characterization
4.3 Possible future directions
5.0 Materials and methods: Pinus taeda somatic embryo germination
5.1 Acquisition of Pinus taeda somatic embryos
5.2 Desiccation of somatic embryo
5.3 Somatic embryo germination medium
5.4 Somatic embryo imbibition and germination
5.5 Harvesting of somatic embryos
5.6 Somatic embryo RNA isolation
5.7 Microarray hybridization design
5.8 Somatic embryo microarray hybridization
5.9 Somatic embryo microarray data analysis
5.9.1 Scanning and image processing
5.9.2 Data pre-processing
5.9.3 Statistical computations
5.9.4 Data clustering
5.10 Quantitative Reverse Transcriptase Polymerase Chain Reaction
5.10.1 Quantitative RT-PCR methodologies
5.10.2 PCR primer design
5.10.3 Quantitative RT-PCR data analysis
6.0 Results and discussion: somatic embryo germination
6.1 Differentially expressed features
6.2 Data clustering
6.2.1 Cluster selection and analysis
6.2.2 Cluster descriptions
6.3 Raffinose synthesis and degradation
6.3.1 Raffinose and seed biology
6.3.2 Raffinose synthesis and degradation related transcripts
6.4 Phenylpropanoid biosynthesis
6.4.1 Shikimate pathway
6.4.2 General phenylpropanoid biosynthesis
6.5 Flavonoid biosynthesis
6.5.1 Chalcone synthase and chalcone isomerase
6.5.2 Lignan biosynthesis
6.5.3 Anthocyanin biosynthesis
6.5.4 Flavonol biosynthesis
6.6 Differential expression of MYB4
7.0 Validation of microarray data
7.1 Quantitative reverse transcription PCR
7.2 QRT-PCR discussion
8.0 Conclusion
9.0 Bibliography
10.0 Appendix
10.1 Appendix 1. Examples of slide printing and spot morphology variability
10.2 Appendix 2. Box-plot depicting the mean foreground signal for each sample and channel
10.3 Appendix 3. Box-plot depicting the mean background signal for each sample and channel
10.4 Appendix 4. Non-redundant list of the genes that are highly expressed during the germination of all six genera. Shown are EST identifiers (EST ID), predicted Arabidopsis annotations of the given EST, and the mean foreground to background ratio of each EST (FG:BG)

List of tables

Table 1. Pearson's and Spearman's R² correlations between each test genus and the corresponding Picea reference
Table 2. Promoter elements enriched (p < 10e-6) in the 1000 most highly expressed Picea genes during seed germination
Table 3. Examples of genes that were found to be highly expressed in all genera during germination
Table 4. Pinus taeda QRT-PCR primers
Table 5. Estimated percent of differentially expressed ESTs in each microarray comparison
Table 6. ESTs contained in each of the ten selected sub-clusters
Table 7. Congruence of QRT-PCR results with microarray results

List of figures

Figure 1. Phylogenetic relationships of five conifer genera and Populus
Figure 2. Examples of conifer seeds sampled at the point where the radicle is four times as long as the seed coat
Figure 3. Cross-species hybridization: common reference design
Figure 4. Six scatter-plots depicting log2-transformed raw FG signal intensity data for each test genus vs. the Picea reference
Figure 5. Mean background (BG) signal for probes of each test genus vs. the corresponding Picea reference probe
Figure 6. Interquartile data range comparison
Figure 7. Median foreground (FG) to background (BG) signal ratio for each genus
Figure 8. Percentage of all array features with foreground (FG) signal levels 2-fold and 3-fold greater than background (BG) levels
Figure 9. Functional characterization: Gene ontologies (GO) of molecular function
Figure 10. Percent commonality among the 1000 most highly expressed genes for each genus
Figure 11. Somatic embryogenesis: common reference design
Figure 12. Frequency of various parametric p-value categories for all array elements and all slides
Figure 13. Histogram of p-value distribution of elements on replicated slides for the indirect comparison: 2% sucrose 7 day vs. desiccated embryo
Figure 14. Histogram of p-value distribution of elements on replicated slides for the indirect comparison: 0% sucrose 7 day vs. 2% maltose 7 day
Figure 15. Histogram of p-value distribution of elements on replicated slides for the indirect comparison: 2% sucrose 7 day vs. 0% sucrose 7 day
Figure 16. Histogram of p-value distribution of elements on replicated slides for the indirect comparison: 2% sucrose 7 day vs. 2% sucrose 10 day
Figure 17. Representative germinants harvested for each carbohydrate treatment time point
Figure 18. Hierarchical clustering of microarray results
Figure 19. Raffinose synthesis and degradation pathway
Figure 20. Shikimate pathway
Figure 21. General phenylpropanoid pathway
Figure 22. Flavonoid biosynthesis pathway
Figure 23. Microarray expression profile of WS01034_M15, an EST annotated as a MYB4 transcription factor
Figure 24 A, B. Comparison of QRT-PCR and microarray results for A) alpha-galactosidase for line K3973 B) 4CL for line K3973
Figure 24 C, D. Comparison of QRT-PCR and microarray results for C) MYB4 for line K3973 D) alpha-galactosidase for line L3519
Figure 24 E, F. Comparison of QRT-PCR and microarray results for E) 4CL for line L3519 F) MYB4 for line L3519

Abbreviations

4CL: 4-coumarate-CoA ligase
ABA: abscisic acid
ABC: ATP binding component
ABRE: ABA binding region
AGI#: Arabidopsis gene index number
ANOVA: analysis of variance
AP: aspartyl protease family proteins
At: Arabidopsis thaliana
BASE: BioArray Software Environment
BG: background signal
Bp: base pair
C4H: cinnamic acid 4-hydroxylase
cDNA: complementary DNA
CHI: chalcone isomerase
CHS: chalcone synthase
DAHP synthase: 3-deoxy-D-arabino-heptulosonate-7-phosphate synthase
DFR: dihydroflavonol 4-reductase
Diff Exp.: differentially expressed
DNA: deoxyribonucleic acid
EST: expressed sequence tag
EST ID: expressed sequence tag identifier
FG: foreground signal
GA: gibberellic acid
GFP: green fluorescent protein
GO: gene ontology
GS: galactinol synthase
IFR: isoflavone reductase
k: 1000
Kb: kilobase
kDa: kilodalton
LDOX: leucoanthocyanidin dioxygenase
mRNA: messenger ribonucleic acid
MYA: million years ago
MYB4: myeloblastosis
PAL: phenylalanine ammonia-lyase
PMT: photomultiplier tube
QRT-PCR: quantitative reverse transcriptase polymerase chain reaction
RS: raffinose synthase
TDC: tryptophan decarboxylase
TIFF: Tag Image File Format
UV: ultraviolet
VSN: variance stabilized normalization

Acknowledgments

The research presented here would not have been possible without the contributions and guidance of a number of individuals and organizations. Firstly I would like to thank Dr.
Brian Ellis and all the members, past and present, of the Ellis Lab, including Hardy Hall, Jin Suk Lee, Greg Lampard, Corine Cluis, Ankit Walia, and Somrudee Sritubtim, for their critical and supportive input. The core of my research was only possible with access to and assistance from the world-class Treenomix (Genome B.C.) microarray facility. I am particularly indebted to Steven Ralph and Sharon Jancsik for their expert advice and input. In addition, the invaluable assistance from Rick White was greatly appreciated. Much of the microarray data presented in this thesis was greatly complemented by the results generated by the use of the quantitative PCR technology, made accessible by the lab of Dr. Xin Li. I would also like to thank Dave Kolotelo for his advice, and the B.C. Ministry of Forests (Tree Improvement Branch) for their generous donation of both literature and conifer seed. As well, the guidance provided by Mike Carlson regarding poplar seed acquisition and germination was very helpful.

Cellfor Inc. was a valued collaborator throughout my entire thesis work. Their substantial financial, technical, and intellectual contributions made this project a reality, and for that I am very grateful. I would like to highlight the contributions of the past and present Cellfor researchers: Irina Lobatcheva, Steve Attree, Palitha Dharmawardhana and Vindhya Amarasinghe.

Thank you to my thesis advisory committee, Jorg Bohlmann, Wolfgang Schuch, and Brian Ellis, for their accessibility and genuine input.

Finally, I would like to thank Marta Green for patiently supporting me in every way throughout my MSc.

1.0 General introduction

1.1 Plantation forestry

Wood is one of the most important natural products. It is impossible to imagine modern life without wood and its applications, yet until fairly recently (25 years), the majority of the wood that we rely upon has been harvested from potentially dwindling natural forests.
If consumption continues at the current rate, we could be faced with a major depletion of natural forests. However, if we were to grow our trees in a more managed, organized and intensive manner, loss of our future fiber supply might be avoided. This is the attraction of plantation forestry, which is already being implemented in some areas of the world, notably Europe, Brazil, the southeastern USA and New Zealand. Furthermore, by integrating biotechnology into plantation forestry, it is conceivable that most human wood demands could be met by tree plantations in the future. The predicted result of full-scale implementation of plantation forestry would be that our remaining native forests could be left untouched while still meeting all future wood demands.

Vegetative or clonal propagation of forest trees has been actively employed as a method of generating sufficient numbers of high quality seedlings for such large-scale tree-farming exercises. By propagating selected plants in a clonal manner, it is also possible to identify and mass-produce the elite performing genotypes for any given trait, without the need for hundreds of generations of breeding (Williams, 2001). For this reason, horticulturists have been regenerating plants from grafted scions and rooted explants for centuries.

Cunninghamia lanceolata (Chinese fir) is one of the oldest examples of vegetative propagation of forest trees. Chinese fir, a tree species once valued for its wood traits, has been propagated from cuttings and grown in plantations at large scale for at least 800 years (Minghe and Ritchie, 1999). By careful selection of certain genotypes chosen for adaptation to particular growth sites and for superior performance traits, tree farmers were able to optimize the performance of their C. lanceolata plantations.
Although it is now cultivated primarily as an ornamental tree, Cunninghamia lanceolata was clearly a predecessor of modern-day plantation conifer species such as Pinus radiata and Pinus taeda (P. taeda). P. taeda is a subtropical pine species that is grown as the predominant plantation conifer in North America due to its fast growth rate, valuable wood traits, and desirable growth habit (Schultz, 1999). As well, P. taeda can grow across a relatively wide geo-climatic range, making it a versatile plantation tree.

Preserving and optimizing valued traits such as high growth rate, increased stress tolerance, extended habitat range, or desirable wood qualities is the essence of clonal forestry. By selecting individuals from a population that score exceptionally high for a given trait, it should be possible to mass-produce them as identically cloned propagules, and grow them in a plantation. This would shift the mean score for the selected trait to significantly higher values than those seen in the original source population, a phenomenon called genetic gain (Wang et al., 2004a). The theoretical genetic gain between a wild source forest tree population and a selected clonal population could be as high as 25-40% (Sutton, 2002). Such gains are thought to be possible for virtually any measurable trait, such as growth habit, wood quality, stress or pest tolerance, or habitat range. For these reasons, there has been great interest in vegetatively propagating plantation trees.

Aside from improving production traits, there are numerous novel applications for clonal propagation of forest trees (Attree and Fowke, 1993). Examples described by Attree and Fowke include cloning of species such as Norway spruce (Picea abies), or some species of Larix that, because of limited rates of natural seed production, are difficult to commercialize in a plantation setting (Attree and Fowke, 1993).
As well, selection of genotypes that display enhanced environmental stress tolerance, or pest resistance, has been proposed. There have also been efforts to rescue endangered tree species through mass propagation (Litz et al., 2005), and from a commercial standpoint, the vegetative propagation of ornamental tree cultivars such as unusually colorful genotypes of Picea pungens Engelm (Colorado blue spruce), or naturally well-formed Christmas tree genotypes, has attracted the interest of horticulturists.

1.2 Somatic embryogenesis

The major technical limitation when propagating conifer trees by vegetative means from shoot-cuttings is that efficient induction of roots on the cuttings is often difficult. Although improved adventitious rooting has been the focus of recent studies (Brinker et al., 2004; Malabadi and Van Staden, 2005), success remains limited. A promising alternative propagation system for conifers is somatic embryogenesis, whereby plant embryos are multiplied in vitro in the absence of a true seed. Originally, this process was developed in Daucus carota (carrot) (Gamborg, 2002), but the general method has been successfully adapted to numerous other plant species, including conifers such as Picea abies (Hogberg et al., 2003), Pseudotsuga menziesii (Taber, 1998), and Pinus taeda (Gupta and Pullman, 1987). Optimized somatic embryogenesis technology makes it theoretically possible to generate vast numbers of genetically identical embryo-propagules from a single seed. This possibility is of particular interest to tree plantation managers since reliable seed sources are crucial to planting success (Schultz, 1999).
The interest in commercializing somatic embryogenesis has stimulated research into the optimization of this process for numerous economically important plant crops including coffee (Etienne-Barry, 1999), ginseng (Langhansova et al., 2004), tangerine (Nieves, 1998), soybean (Walker, 2001), as well as Pinus taeda (Pullman et al., 2003c) and other conifers (Attree and Fowke, 1993; Iraqi and Tremblay, 2001; Percy et al., 2001; Stasolla et al., 2003).

Despite the tremendous promise of somatic embryogenesis technology for clonal forestry, there are three components of the process that represent important bottlenecks limiting the commercial-scale multiplication of conifer somatic embryos. The first of these involves the induction of embryogenic callus tissue from a mature zygotic embryo. There is currently no reliable means of predicting which conifer genotypes will yield callus tissue capable of regenerating in tissue culture to form a mature embryo, although numerous efforts have been made to determine optimum phytohormone concentrations and tissue culture conditions for induction of viable callus tissue. These studies have focused both on a better understanding of the underlying physiology, and on improving the effectiveness of existing protocols (Pullman et al., 2003c; Pullman et al., 2003a; Pullman et al., 2003b; Pullman et al., 2005).

Following callus induction, it is desirable to maximize the number of viable somatic embryos per gram of callus tissue (i.e. the embryogenic potential). However, successful somatic embryo maturation is often a highly variable, and thus limiting, step in the embryogenesis process (El Meskaoui and Tremblay, 2001; Pullman et al., 2003c). The primary strategy for attacking this problem has been the use of large-scale matrix experiments in which nutrient sources, osmotic agents, and phytohormone levels are manipulated in a combinatorial fashion, with the intention of defining the most efficient differentiation condition.

1.3 Somatic embryo germination

The final limiting factor in the propagation of conifers via somatic embryogenesis has proven to be the germination of the callus-derived somatic embryos. Even with the ability to generate virtually unlimited numbers of somatic embryos, the inability to reliably germinate them greatly reduces the overall process efficiency. The need for improved germination efficiency at this stage of the process has two drivers. First, from a commercial perspective, the resource consumption associated with each seedling is proportional to the length of time it has been maintained in the tissue culture process. By the time a somatic embryo has reached the germination stage of the process it has undergone several weeks of tissue culture manipulation and care. Embryos that reach this stage of the process are thus the most costly to lose. Secondly, there are significant energy costs associated with further processing of each germinant once it leaves tissue culture, because the bulk of the germination and seedling growth stages generally take place in heated and artificially illuminated greenhouses. For these reasons, it is important to better understand the somatic embryo germination process and to apply this knowledge to the eventual goal of minimizing seedling mortality and improving post-germination growth in an industrial-scale setting.

One of the key factors that distinguishes somatic embryos from zygotic embryos is the lack of a seed coat and reduced energy reserves contained within the somatic propagules. While a conventional zygotic seed essentially possesses a self-contained energy source, a somatic embryo is dependent on supplemental nutrients, particularly carbohydrates, in order to germinate and develop into an autotrophic organism.
For this reason, much of the existing literature describing somatic embryo germination focuses on optimizing the nutrient supply conditions of the germinating somatic embryo, with the intention of speeding its transition to an autotrophic state. Sucrose has traditionally been the primary carbohydrate provided in somatic embryo germination medium; however, some studies have suggested that replacing sucrose with maltose may have a positive effect on the germination of certain genotypes (Pullman et al., 2003c). It has been hypothesized that this may be due to the fact that maltose is more stable over time in tissue culture media, or that maltose may act as some sort of elicitor (Lipavská and Konradova, 2004).

Another important factor influencing the germination process can be changes in the osmotic status of somatic embryos. It is common practice to desiccate the somatic embryos following embryo maturation and prior to germination. This is generally achieved by either culturing the somatic embryos on a medium containing an osmotic agent, such as PEG (Stasolla et al., 2003), or by exposing the somatic embryos to a dehydrating atmosphere such as dry air. The consequences of these types of desiccation treatments are three-fold. First, this dehydrating treatment is believed to simulate the events that take place in a natural seed (Vicente-Carbajosa and Carbonero, 2005), thereby possibly increasing the likelihood of germination success. Second, the desiccation treatment has the practical effect of allowing the dried somatic embryos to be stored in a dormant state for long periods of time, which is desirable if the goal is to mass produce and market somatic embryos in a similar manner to their zygotic counterparts. Finally, desiccation treatment has the effect of synchronizing the germination process. Highly synchronized germination improves the efficiency of a silviculture operation and, from a research perspective, it also allows better-controlled germination studies to be performed (Gallardo et al., 2001).

The germination of somatic embryos remains a poorly understood phenomenon, and the literature describing the subject is correspondingly limited. This could be due in part to the fact that somatic embryos are difficult to produce in large numbers and do not generally germinate in a consistent fashion. Conducting reproducible germination studies thus becomes a very labor-intensive exercise. If, however, one does have access to high quality somatic embryos, there are numerous studies waiting to be performed. In particular, by taking advantage of recently developed conifer microarray technology, it is now possible to consider developing a model of the germination of P. taeda somatic embryos at an unprecedented level of detail. For instance, tracking the expression profiles of all available genes across a developmental time-series spanning from the desiccated embryo to a developed seedling would make it possible to identify key genes or metabolic pathways that are active during the germination process. One variation of such an experiment would be to quantify gene expression profiles of replicated groups of P. taeda somatic embryos as they are germinated on various carbohydrate-containing media, which should make it possible to link gene expression patterns and nutrient-induced phenotypes.

1.4 Genomics of somatic embryogenesis

There are numerous examples of modern molecular techniques being used to complement the tissue culture and cell biology approaches that have been classically used to study early somatic embryogenesis. Global transcriptional profiling (Stasolla et al., 2003; Stasolla et al., 2004b), and large-scale protein expression analysis experiments (Lippert et al., 2005) have been used to gain a better understanding of the developmental and physiological events taking place during somatic embryo formation.
As well, recent advances in the field of metabolite profiling (Weichert et al., 2002; Robinson et al., 2005) may soon allow 'systems biology' approaches to be applied in studies of somatic embryogenesis. The data generated by these global profiling studies will improve our fundamental understanding of the somatic embryogenesis process, but they will also lend themselves to the development of useful biomarkers. Biomarkers are predictive tools based on the presence or absence of a characteristic metabolite, protein or gene expression profile that has been demonstrated to be highly correlated with a phenotype of interest. The search for biomarkers is currently a major driver within pharmacological and genomics research, where the goal is the creation of rapid and simple methods for diagnosing or predicting the progression of disease, or the response to specific medications (Gius et al., 2005; Nemeth et al., 2005). In plant science, biomarkers are also of interest as a tool for assessing the viability of crop seed lots (de los Reyes and McGrath, 2003; de los Reyes et al., 2003), or improving the predictability of the conifer somatic embryogenesis process (personal communication with Steven Attree, Cellfor Inc.). In the context of industrial-scale somatic embryo propagation, it would be valuable to be able to predict the ultimate productivity of a given batch of somatic embryos. Such a predictive capacity would allow an informed decision to be taken at an early stage in the regeneration and germination process, before the tissue is put through a lengthy, expensive and potentially unproductive tissue culture process.

1.5 Hypotheses

Germination, whether from seed or from somatic embryo, involves a dramatic transition from a dormant life stage into an autotrophic organism.
An embryo undergoes major morphological and physiological changes during this germination process, and I hypothesize that these developmental changes will be correlated with a large number of differences in gene expression. Global analysis of such differences in germinating P. taeda somatic embryos will therefore provide insight into the identity of genes whose activity is important to successful germination.

Since germination involves a sequential series of developmental events, each of which may require the participation of a somewhat different set of genes, I hypothesize that sampling P. taeda somatic embryos at two time-points post-imbibition will allow me to obtain an initial view of the kinetic behaviour of those differentially expressed genes associated with germination.

Since P. taeda somatic embryo germinants grown on different carbohydrate sources display marked morphological differences, I also hypothesize that comparative global gene expression analysis of such germinants will reveal genes and associated biochemical pathways whose activities are uniquely correlated with growth on different carbon sources.

One limitation of studying P. taeda embryo development is that there is little in the way of readily available microarray infrastructure. However, a large-scale Picea cDNA microarray has recently been developed by the forestry genomics initiative of Genome Canada. Given that Picea and Pinus are believed to be closely related phylogenetically (Wang et al., 2000; Rydin et al., 2004; Besendorfer et al., 2005), and that existing Pinus and Picea EST collections are 80-90% similar (Pavy et al., 2005a), it is logical to hypothesize that biologically meaningful results could be obtained from an experiment involving the hybridization of P. taeda RNA (cDNA) to a Picea cDNA microarray. However, before extensive microarray experiments are carried out that rely on cross-species hybridizations, the validity of this hypothesis needs to be tested.
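
To make the 80-90% similarity figure cited above more concrete, the following minimal Python sketch computes percent identity between two pre-aligned nucleotide sequences. It is an illustration only: the function name `percent_identity` and both sequence fragments are hypothetical, and the thesis does not state how the cited similarity was calculated.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned, ungapped positions at which two sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip alignment gaps
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * matches / compared


# Hypothetical 20-base aligned fragments (not real Pinus or Picea sequence).
pinus_fragment = "ATGGCTGAAGTTCCTGATCA"
picea_fragment = "ATGGCAGAAGTTCCAGACCA"

identity = percent_identity(pinus_fragment, picea_fragment)
print(f"{identity:.1f}% identity")
```

Three mismatches in twenty compared positions give 85% identity, which falls inside the range quoted for the Pinus and Picea EST collections.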

2.0 Cross-species hybridization

2.1 Expression profiling in plant genomic research

High-density cDNA microarrays are powerful tools for semi-quantitatively evaluating the expression of thousands of genes simultaneously. The flexible and high-throughput nature of microarray profiling has made it a key technology for investigating biological processes at a molecular level. Most microarray experiments have been performed using reference organisms that have become models within various fields of research, largely because in most instances their genomes have been fully sequenced. Examples of these so-called "model organisms" include mice, yeast, Arabidopsis and nematodes. Both the ease of experimentally manipulating these organisms, and the publicly available genome sequence information, have allowed microarray-based research in model species to flourish (Loring, 2005; Rensink and Buell, 2005).

Genome sequencing efforts have initially been directed toward model species, which means that, within the plant kingdom, only Arabidopsis thaliana, rice and a handful of crop species are represented in the public databases in sufficient depth to support productive global transcriptional profiling exercises. Tree species, in particular, have received relatively little attention at the genomics level, although the recent completion of the Populus trichocarpa genome sequence provides an important resource for deciduous tree biology (Brunner et al., 2004b). Within the commercially more important conifers, microarray-based research has focused on the genus Pinus, and this has been fairly modest in scope, despite the fact that Pinus is a genus of tremendous economic and ecological significance.
Since the Pinus genome, at approximately 2.0 × 10^10 bp, is estimated to be 160 times larger than that of Arabidopsis thaliana, the current cost of genome sequencing, combined with the minimal interest in Pinus as a plant biology research system, makes it unlikely that extensive full genome-based resources will be available for Pinus species in the near future (Kirst et al., 2003).

Nevertheless, Pinus has not been completely ignored in the genomics revolution. Several genomics studies have been conducted on P. taeda, with a major focus on the tissues associated with wood development and properties (Whetten et al., 2001; Kirst et al., 2003). For that reason, most of the cDNA libraries from which the almost 200,000 publicly available Pinus ESTs are derived were produced from wood-related tissues such as xylem or shoot tips (Kirst et al., 2003; TIGR). The tissue-specific nature of these collections makes them less than ideal, however, for the study of other specialized processes, such as flowering or seed development. For example, transcripts specific to early gametophyte development will likely be underrepresented in, or excluded from, xylem-derived EST libraries.

In contrast, extensive and more biologically representative EST collections have recently been generated for Picea species. Genome Canada, through its Treenomix and Arborea initiatives, has supported the production of approximately 22 k and 16.5 k unique cDNA sequences respectively, from a large variety of Picea tissues (Pavy et al., 2005b; ARBOREA; Treenomix). The intent of these efforts was the generation of a rich and diverse collection of cDNAs that could be used for the experimental investigation of all aspects of Picea growth and physiology, rather than simply wood formation, as was the case for the NSF-funded Pinus taeda projects.
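
As a quick check of the genome-size comparison made earlier in this section, the stated Pinus genome size and the 160-fold ratio can be turned back into the Arabidopsis genome size they imply. The short Python sketch below uses only those two figures from the text; the variable names are illustrative.

```python
# Figures taken from the text of Section 2.1.
PINUS_GENOME_BP = 2.0e10   # estimated Pinus genome size, in base pairs
FOLD_LARGER = 160          # Pinus genome relative to Arabidopsis thaliana

# Arabidopsis genome size implied by the two figures above.
implied_arabidopsis_bp = PINUS_GENOME_BP / FOLD_LARGER
implied_arabidopsis_mb = implied_arabidopsis_bp / 1e6

print(f"Implied Arabidopsis genome: {implied_arabidopsis_mb:.0f} Mb")
```

The result, 1.25 × 10^8 bp (125 Mb), is in line with the commonly cited size of the Arabidopsis thaliana genome, so the two figures in the text are mutually consistent.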
2.2 History of cross-species microarray hybridization

In order to perform genomic-scale experiments on non-model species for which no homologous microarrays are available, it has become increasingly common to use model organism-specific microarrays to interrogate the transcript profiles of closely related non-model species. This method is commonly referred to as cross-species hybridization (Enard et al., 2002; Gilad et al., 2005). For example, a wide range of mammalian research has benefited from the fruits of the human genome project. Moody et al. used human cDNA arrays to compare human and pig muscle development, while Ji et al. demonstrated that the Affymetrix human GeneChip platform can be an effective tool to explore biological questions in other mammals such as cows and dogs (Moody et al., 2002; Ji et al., 2004). Human cDNA microarrays have also been used to study rodent eye morphogenesis (Nemeth et al., 2005), and a human dermis-specific cDNA microarray played a key role in the study of UV-induced melanoma in Monodelphis domestica (short-tailed opossum) (Wang et al., 2004b). Such cross-species studies are not limited to mammalian systems. Amphibian and fish-based research has also exploited the potential of cross-species microarray experiments, using Xenopus tropicalis (frog) oligo-arrays (Chalmers et al., 2005), and Astatotilapia burtoni (cichlid fish) (Renn et al., 2004) and salmonid (salmon and trout) (Rise et al., 2004) cDNA arrays to explore biological events in related species at the transcriptional level. In the plant kingdom, several studies have used Arabidopsis arrays to study Brassica species (Girke et al., 2000; Becher et al., 2004), while preliminary experiments in conifers have taken advantage of existing Pinus taeda xylogenesis cDNA arrays to examine spruce cell culture differentiation (Stasolla et al., 2003; van Zyl et al., 2003; Stasolla et al., 2004a; Stasolla et al., 2004b).
However, given the scope and species diversity within the plant kingdom, and the relatively small number of model species for which genomics tools have been developed, the potential to employ model species arrays in a heterologous manner has considerable practical importance. If such an approach can be demonstrated to consistently generate reliable and informative data, wider use of cross-species hybridization is likely to stimulate a great deal of genomics research in species whose genomes may never be fully sequenced because of their limited economic or scientific impact.

2.3 Challenges of cross-species microarray hybridization

Although cross-species hybridization approaches could, in principle, eliminate the necessity for development of species-specific microarrays, there are potential limitations that have to be considered. Genetic divergence may lead to increased non-specific hybridization, and thus to increased numbers of false positive results in the experiment. This would likely be more of an issue when using cDNA microarrays as opposed to oligomer-based technology. cDNA microarrays are generally more sensitive, because binding efficiency is proportional to the sequence length, but they are also somewhat less specific due to the potential for cross-hybridization of transcripts with similar functional domains (Alba et al., 2004), such as would be found in multi-member gene families. Genetic divergence can also lead to false negative results, since it is difficult to distinguish between genes that have very low expression levels and genes whose transcripts allow only a poor sequence match to array features due to evolutionary divergence. The logical, but rarely validated, assumption in cross-hybridization experiments is that the more closely related the probe and array species, the more reliable and biologically meaningful the result will be.
This assumption has been tested and verified in systems such as primates (Gilad et al., 2005) and fish (Renn et al., 2004), but only preliminary evidence has been compiled in plant systems thus far. The ultimate utility of cross-species microarray hybridizations for large-scale gene expression profiling in non-model plant species will be determined, in part, by developing a better answer to this question: How closely related do the probe species and the array species have to be in order to generate results that accurately depict the biological events taking place? To explore that question, and to provide a benchmark for my own cross-species hybridization study, my first step was to examine the ability of spruce cDNA microarrays to support expression profiling across a range of phylogenetically related conifer species. By hybridizing cDNA derived from a range of conifer taxa to the recently developed 16.8 k Picea cDNA microarray (Treenomix), I wanted to determine the extent to which the quantity and quality of expression data being generated were proportional to the phylogenetic relationship between the probe and array genus. mRNA was isolated from developmentally comparable seedlings of Picea glauca, Pinus contorta, Larix occidentalis, Pseudotsuga menziesii, Tsuga heterophylla, and Populus trichocarpa and hybridized to the Picea cDNA arrays. Picea glauca and Populus trichocarpa represented the extremes of the phylogenetic range that was to be tested, while the remaining four genera fell within this range. The phylogenetic relationship of all six genera is shown in Figure 1. Superimposed on this phylogenetic tree is a geological time scale as estimated by Wang et al. (2000). The appearance of Picea in the fossil record approximately 140 million years ago (MYA) is indicated on the time scale with the arrow.
To ensure that the expression differences being measured were the result of phylogenetic differences, rather than biological variability, it was essential that the samples used in the experiment be at the same developmental stage. Seed germination was selected, since this stage provided a clearly defined sampling point that was common to all genera being tested. It would also allow replicate samples to be generated at a later date without the effects of seasonality that would have to be considered when working with other mature tissues. In addition, germinating seeds represent an analogous system to the Pinus somatic embryo germination system that was to be studied in subsequent experiments.

Figure 1. Phylogenetic relationships of five conifer genera and Populus. Geological time scale is shown: Jurassic (J) 206-144 MYA, Cretaceous (C) 144-65 MYA, Paleocene (Pa) 65-54 MYA, Pliocene (P) 5-1 MYA. The arrow at 140 MYA represents the estimated time at which Pinus and Picea diverged.

3.0 Materials and methods: cross-species hybridization

3.1 Conifer seed acquisition and storage

Previously stratified seeds of the following species were obtained from the British Columbia Ministry of Forests, Tree Improvement Branch: a) Picea glauca (Moench), b) Pinus contorta (var. latifolia Douglas), c) Larix occidentalis (Nutt.), d) Pseudotsuga menziesii (var. glauca (Beissn.) Franco) and e) Tsuga heterophylla ((Raf.) Sarg.). All seeds were stored in the dark at 4°C until needed. For germination induction, seeds from each genus were randomly divided into batches of approximately 100 seeds, and these batches were then germinated separately.

3.2 Conifer seed treatment and germination

Each batch of seeds was imbibed by being placed in a glass bottle containing 2 L sterile water that was aerated continuously at room temperature for 24 hr, using a small aquarium pump.
The imbibed seeds were then surface-sterilized in 30% hydrogen peroxide for 30 min with gentle agitation, rinsed twice with sterile water, and placed on a 5 mm thick stack of sterile, water-saturated Whatman #1 filter paper discs in 10 cm diameter petri plates. These were placed under continuous fluorescent lighting at room temperature.

3.3 Harvesting of conifer germinants

To ensure an accurate cross-species comparison, it was essential that all selected germinants be at the same developmental stage. Seeds were therefore classified as germinated when the emerging radicle extended to 4 times the length of the seed coat, as described by the British Columbia Ministry of Forests Seed Handling Guidebook (Kolotelo et al., 2001) (Figure 2). Since the germination of the seeds within a batch was somewhat asynchronous, it was necessary to monitor and harvest germinants from each batch daily. Harvested germinants were placed in a 1.5 ml microfuge tube and flash frozen with liquid N2. All samples were then stored at -80°C.

3.4 Populus seed acquisition and storage

Mature Populus balsamifera ssp. trichocarpa seed was collected from specimen tree #1982 growing on the University of British Columbia - Vancouver campus. Seeds were dried, manually forced through a 1 mm polypropylene screen to remove the external fibres, and then placed on a 5 mm thick stack of sterile, water-saturated Whatman #1 filter paper discs in 10 cm diameter petri plates. These were placed under continuous fluorescent lighting at room temperature.

3.5 Harvesting of Populus germinants

Since Populus seed germination is not readily scored on the basis of radicle extension measurement, seeds were defined as germinated when the seedling had ruptured but not yet completely shed the seed coat (approximately 16 hr after imbibition). Populus seed germination occurs very rapidly following imbibition; therefore, seeds did not require surface sterilization.
Germinating seeds were harvested, placed in a 1.5 ml microfuge tube, frozen immediately with liquid N2, and stored at -80°C.

Figure 2. Examples of conifer seeds sampled at the point where the radicle is four times as long as the seed coat. Millimetre scale-bar shown.

3.6 mRNA isolation and purification

Total RNA was extracted from seedling tissue using a modified version of the Qiagen RNeasy plant RNA isolation kit (Valencia, CA), as described by McKenzie et al. (1997). To minimize possible variability in the RNA isolation process, total RNA samples for each batch within a species were pooled (Churchill, 2002). To normalize for species-specific variation in the ratio of mRNA to total RNA, mRNA was isolated from the pooled total RNA, and purified using the Ambion (Austin, TX) "Micropoly (A) pure small scale mRNA isolation kit". Purified mRNA was quantified and the quality was assessed by performing a first-strand cDNA synthesis with [32P] dGTP incorporation, as described by Kolosova et al. (2004). All mRNA samples were stored at -80°C until use.

3.7 Microarray resources

All cross-species microarray hybridizations were performed using the Forestry Genome British Columbia 16.7 k Picea cDNA microarray (Treenomix), which had been printed at the Jack Bell Gene Array Facility (Vancouver, B.C.). The array was composed of approximately 16,700 unique ESTs derived from an assortment of Picea tissues representing a wide range of developmental stages and treatment regimes. Each of these ESTs was printed as an individual spot on the slide. In addition, a selection of positive and negative control elements was printed in a uniform pattern across each slide for the purposes of hybridization quality control assessment. The positive control elements consisted of human-specific and GFP cDNAs that were both printed on the slide and spiked into each sample to confirm that optimal hybridization conditions had been met.
Likewise, a series of empty spots, blank spots, poly-A and vector DNA spots were printed as negative controls to allow the levels of non-specific hybridization to be gauged. The array features were arranged in sub-grids of 20 x 20 features, and these sub-grids were then arranged in 4 meta-columns and 12 meta-rows.

3.8 Cross-species microarray hybridization design

Probes were hybridized in a common reference, dye-balanced design, using Picea germinant mRNA as the common reference (Figure 3). Each sample was technically replicated on four slides, two for each dye orientation. A 500 ng aliquot of mRNA from each sample was used per replicate.

Figure 3. Cross-species hybridization: common reference design. Arrows represent four dye-balanced replicate hybridizations between the Picea reference and one of the six test genera.

3.9 Microarray hybridization protocol

3.9.1 Probe preparation

The following probe preparation method was used: sample mRNA (500 ng) (13.5 µl) was combined with Forestry GBC positive control RNA spikes (4 µl) and an oligonucleotide primer unique to either Cy3 or Cy5 probes (2.5 µl). Following denaturation at 80°C for 10 minutes, the mixture was placed on ice. The mRNA, positive control RNA spikes, and corresponding primers were then reverse-transcribed by following the specifications of the SuperScript II Reverse Transcriptase kit (Invitrogen, Carlsbad, CA). The reaction was quenched by adding 0.7 µl NaOH (5 M), 0.7 µl EDTA (0.5 M), and 5.6 µl water, and then heating the mixture to 65°C for 10 minutes. Samples to be compared on the array, labeled with either Cy3 or Cy5, were combined in a single 1.5 ml tube with 46 µl TE (pH 8.0). Linear acrylamide (3 µl) from the Genisphere Array-900 labelling kit (Hatfield, PA) was added to each sample, along with 16 µl NaAc (3.3 M) and 400 µl ethanol (100%).
The labeled cDNAs were then precipitated at -80°C for 1-24 hr and centrifuged at 14,000 g for 50 min at 4°C. The resultant pellet was washed with cold 70% ethanol, and again centrifuged at 14,000 g for 10 min at 4°C. The pellet was then air-dried and re-suspended in 20 µl water and 25 µl hybridization solution as specified by the Array-900 manufacturer (Genisphere, Hatfield, PA). This solution was denatured at 80°C for 10 min and stored in darkness at 65°C until needed.

3.9.2 Slide preparation

Twenty-four slides from batch #DI002 were randomly divided into six groups of four slides each, i.e. one four-slide group per genus. Slides were stored in the dark at room temperature until needed. Prior to probe application, slides were washed twice with 0.1% SDS for 5 min at room temperature. One wash consisted of 30 sec of agitation, followed by 4.5 min of soaking in the SDS solution. Slides were then rinsed by soaking for 2 min in ultra-pure Sigma-brand water (St. Louis, MO), and then boiled in ultra-pure Sigma-brand water for 3 min. The slides were then dried by being placed in 50 ml conical plastic centrifuge tubes, centrifuged for an additional 3 min at 2000 rpm, cleared of dust with compressed nitrogen gas, and held in Corning hybridization chambers (New York, NY) at 65°C until loaded.

3.9.3 Probe hybridization

The probe solution was applied to the slide as follows: a 45 µl aliquot of 65°C probe solution was pipetted in a thin bead along the edge of the printed region of the slide. A microscope cover slip was then placed on top of the bead of probe solution, dispersing the solution equally across the printed region of the slide. Air bubbles were eliminated by gently tapping the cover slip with forceps. Distilled water (20 µl) was added to the hydration reservoirs in the hybridization chamber to ensure that optimal relative humidity was maintained inside the chamber.
The slides were then sealed in the hybridization chamber and incubated at 60°C in a water bath for 16 hours. Following the hybridization, the slides were removed from the hybridization chamber and soaked in wash solution (2 X SSC: 16.5 mM NaCl, 166.5 mM sodium citrate, 0.2% SDS) for 15 min at room temperature to remove the cover slip. The slides were then transferred to fresh 65°C wash solution and allowed to soak for another 15 min. Following this, they were washed three times by being transferred to room-temperature 2 X SSC, agitated for 30 seconds, and allowed to soak for 4.5 minutes. The 0.2 X SSC solution was replaced following each 5-min wash cycle. The slides were dried by being placed in 50 ml centrifuge tubes, centrifuged for 3 min at 2000 rpm, cleared of dust with compressed nitrogen gas, and stored in Corning hybridization chambers (New York, NY) at 65°C in preparation for labeling.

3.9.4 Labeling reaction

The labeling solution was prepared as specified by the Array-900 kit manufacturer (Genisphere), and was applied by pipetting 45 µl labeling solution (65°C) in a bead along the edge of the printed region of the slide. The solution was dispersed by covering the slide and solution bead with a microscope cover slip. Air bubbles were eliminated by gently tapping the cover slip with forceps. Distilled water (20 µl) was again added to the reservoirs of the hybridization chamber to ensure that optimal relative humidity was maintained inside the chamber. Each slide was then sealed in a hybridization chamber and incubated in a water bath at 60°C for 3 hours. To remove the cover slips, slides were soaked in wash solution (2 X SSC, 0.2% SDS) for 15 minutes at room temperature. Slides were then transferred to fresh, 65°C wash solution and again allowed to soak for 15 min. The slides were washed three times by being transferred to room-temperature 2 X SSC, agitated for 30 seconds, and allowed to soak for 4.5 min.
Following this, the slides were washed three times by being placed in room-temperature 0.2 X SSC, agitated for 30 seconds, and allowed to soak for 4.5 min. The 0.2 X SSC solution was replaced following each 5-min wash cycle. The slides were dried by being placed in 50 ml conical-based tubes, centrifuged for 3 min at 2000 rpm, cleared of dust with compressed nitrogen gas, and stored in an opaque slide storage box at room temperature.

3.9.5 Slide scanning and image processing

The hybridized and labelled slides were scanned at 10 µm scan resolution using a Perkin-Elmer ScanArray 2-laser microarray scanner (Boston, MA), equipped with ScanArray Express software (Perkin-Elmer; Boston, MA). To ensure consistent scanning between slides and to maximize the dynamic range in the data, PMT% values were adjusted per channel for each slide in order to maintain background levels below 300 fluorescent units and the number of saturated spots below 1% of total array features. Slide images were saved as "TIFF" files for later analysis. Slide images were assessed and quantified using Imagene image analysis software (BioDiscovery, El Segundo, CA). To estimate hybridization and slide quality, custom Forest GBC 16.7 k spruce-array-specific quality control scripts were used to evaluate the raw data generated by Imagene. TIFF images, raw data outputs, and quality control outputs were uploaded and archived in a BioArray Software Environment (BASE) (Saal et al., 2002).

3.10 Promoter element prediction and gene ontology characterization

3.10.1 Sample gene list generation: 1000 most highly expressed genes

To evaluate whether the cross-species microarray hybridizations were generating data consistent with the biology being studied (germinating seeds), a sample list of genes was analyzed from each species. These sample lists consisted of the 1000 most highly expressed genes in each sample.
For each species, all annotated genes (significant BLAST return, e < 10^-5) were sorted in descending order of foreground to background ratio. The 1000 highest ranked genes were then extracted for analysis from each species list. In general, all genes selected had a foreground to background ratio greater than 3.5 (data not shown).

3.10.2 Promoter analysis

ATHENA promoter and gene ontology prediction software (ATHENA) was used to analyze the Picea sample gene list (1000 most highly expressed genes). All settings were left as default, with upstream range = 1.0 kb.

4.0 Results and discussion: cross-species hybridization

To determine if there was any phylogeny-dependent decline in the quality and quantity of microarray data generated from the various cross-species microarray experiments, the raw data from the six classes of microarray comparison (Picea vs. Picea, Picea vs. Pinus, Picea vs. Larix, Picea vs. Pseudotsuga, Picea vs. Tsuga, and Picea vs. Populus) were evaluated for trends. Assessments of key data characteristics, such as scatter-plot regression correlations, foreground (FG) data range, mean foreground to background (BG) ratios, the portion of array features expressed at 2-fold or 3-fold above background level, and the functional categories of the most highly expressed genes, were used to evaluate the utility and limitations of cross-species microarray hybridization experiments. These measures would allow me to estimate what portion of the data being generated by a given cross-species experiment is reliably detected. In my microarray experiment, data sets from closely related species were expected to be highly correlated (R² ~ 1.0), whereas comparisons between more distantly related species, such as Picea vs. Populus, were predicted to yield much weaker correlations (R² ~ 0.5).
Datasets from comparisons between moderately related species such as Pinus, Larix, Pseudotsuga, or Tsuga would be predicted to produce correlation values that fall somewhere between these two extremes. This hypothesis is based on the prediction that there will be sequence divergence between the two species that is proportional to the phylogenetic distance separating them. This varying degree of sequence divergence would result in mismatches between the microarray features and the probe species RNA, which would potentially have two effects: 1) a decrease in the overall hybridization efficiency, and 2) a decrease in the accuracy of the resulting data. Assessment of mean foreground (FG) signal range is another way of summarizing the quality of the microarray data. If we assume that the microarray technology is accurately measuring the expression levels of all genes being queried, I would expect to see the expression levels of the various genes cover the entire dynamic range of the measurement technology (0-64,000 fluorescent units) when the array species is the same as the probe species. As the phylogenetic distance between the array species and the probe species increases, I would expect the range of the expression data to decrease as a result of the decrease in hybridization efficiency. Comparison of the interquartile data range, i.e. the difference between the 3rd and 1st quartile of the raw data, provides one measure of this range shrinkage. Similarly, I would expect the mean FG to background (BG) signal ratio obtained from each hybridization to decrease as a function of the phylogenetic distance between the array and probe species. A related measure, the percentage of array features expressed at levels 2- or 3-fold above BG level, gives an indication of the portion of data that is substantially above BG levels, and this should also decline steadily with respect to species divergence.
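As a minimal sketch of how the summary measures described above could be computed for a single hybridization, the following Python function takes paired per-feature FG and BG intensities and returns the FG interquartile range, the median FG to BG ratio, and the fractions of features above 2-fold and 3-fold BG. The function name, the input layout and the choice of quartile method are assumptions made for illustration only; they are not taken from Imagene or from the quality control scripts used in this study.

```python
from statistics import median, quantiles


def detection_summary(fg, bg):
    """Summarize signal quality for one channel of one slide.

    fg, bg: equal-length sequences of foreground and background
    intensities, one pair per array feature (hypothetical layout).
    """
    if len(fg) != len(bg) or len(fg) < 2:
        raise ValueError("need at least two paired FG/BG values")

    # Per-feature foreground to background ratio.
    ratios = [f / b for f, b in zip(fg, bg)]

    # Quartiles of the raw FG signal; the "inclusive" method is an
    # arbitrary choice made for this sketch.
    q1, _, q3 = quantiles(fg, n=4, method="inclusive")

    n = len(ratios)
    return {
        "fg_iqr": q3 - q1,                            # 3rd minus 1st quartile
        "median_fg_bg": median(ratios),               # median FG:BG ratio
        "frac_2x": sum(r > 2 for r in ratios) / n,    # features > 2-fold BG
        "frac_3x": sum(r > 3 for r in ratios) / n,    # features > 3-fold BG
    }


# Made-up intensities for five features on one slide.
example = detection_summary([200, 500, 800, 1600, 3200], [200] * 5)
```

Applied to each genus in turn, a summary of this kind would give one row per comparison, which is the form in which the trends are examined in the sections that follow.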
Once these criteria are set, and the reliably detected portion of data has been defined in each species comparison, we can address the central question: does this fraction appear to accurately depict the biology that is unfolding in each of the conifer species sampled? In the context of my experimental design, I would expect to consistently find known germination-related genes being highly expressed in the fraction of "reliably detected features". In order to test this prediction, I assembled the gene annotations for the 1000 most highly expressed features (i.e. highest foreground to background ratio) for each species comparison, using homology between the Picea cDNA elements and their closest predicted Arabidopsis homologues. The predicted molecular functions of these homologous Arabidopsis genes were also examined for indications of their possible involvement in seed germination or seedling growth. Likewise, the putative promoter regions of these homologous Arabidopsis genes were searched in silico for any enrichment in known plant germination-related promoter elements.

4.1 Raw data analysis

The mean FG signal values obtained for each genus sample were plotted against their corresponding Picea reference values (Figure 4). This generated six scatter-plots: one for each of the five non-Picea (test) genera (Figure 4b-f), and one depicting the self-self hybridization of the Picea reference (Figure 4a). Spearman and Pearson correlation values were generated for each of the scatter-plots as a means of assessing the predicted phylogenetic distance-dependent decrease in microarray accuracy (Table 1).

Table 1. Pearson's and Spearman's R² correlations between each test genus and the corresponding Picea reference.

Genus          Pearson's R²   Spearman's R²
Picea          0.93           0.98
Pinus          0.81           0.87
Larix          0.91           0.97
Pseudotsuga    0.80           0.89
Tsuga          0.84           0.92
Populus        0.49           0.72

Figure 4. Six scatter-plots depicting log_2 transformed raw FG signal intensity data for each test genus vs. the Picea reference. a) Picea vs. Picea; b) Pinus vs. Picea; c) Larix vs. Picea; d) Pseudotsuga vs. Picea; e) Tsuga vs. Picea; f) Populus vs. Picea.

4.1.1 Anomalous Pinus data

When comparing the six scatter-plots and the associated correlation values, I observed some specific trends. One of the more striking observations was that Pinus consistently yielded results that were inconsistent with its close evolutionary relationship to Picea. The Pinus-Picea scatter-plot (Figure 4b) has a marked data-point spread, and the Pearson's and Spearman's correlation values are more similar to those of the Picea-Pseudotsuga or Picea-Tsuga comparisons. Similarly, the median FG interquartile range for Pinus is well below what we would predict, as is the percentage of genes expressed at 2-fold and 3-fold above BG (see below). There are three possible explanations for the Pinus observations. The first is that there are true biological differences between the Pinus samples and the other conifers. For instance, it is conceivable that Pinus seeds and Picea seeds that have apparent morphological similarity when harvested are, in fact, at significantly different developmental stages. In such a situation, it is possible that the transcriptional profiles of Pinus might be substantially different and/or underrepresented, compared to other closely related genera. However, comparison of the composition of the highly expressed gene sets from each genus makes this possibility unlikely, since there is a strong congruency between the 1000 most highly expressed genes in the Picea and Pinus samples (see below). A second, trivial, possibility is that Pinus has been incorrectly assigned as the most closely related genus to Picea in this experiment.
If this were true, it would clearly be contrary to an extensive body of taxonomic and genetic literature (Wang et al., 2000; Rydin et al., 2004; Besendorfer et al., 2005). The third, and most likely, scenario is that the Pinus mRNA had become degraded just prior to the reverse transcription step of the microarray protocol. Degraded RNA can seriously compromise the efficiency of the RT reaction, resulting in the production of decreased quantities of probe. Signals from hybridization would then be reduced to a degree proportional to the loss of probe (Bustin and Nolan, 2004). A repeat of the Pinus portion of the experiment would be the logical replication to perform, but due to sample and technical limitations, there was no surplus mRNA available that would have allowed me to test this hypothesis. If I treat the Pinus data as anomalous, the data generated from the remaining genera appear to follow the predicted phylogenetic distance-dependent pattern. The Picea-Picea plot yields high correlation values, as anticipated, since any self-self microarray hybridization should theoretically generate an R² correlation of 1.0, and the data should fall on a straight line with a slope of 1 and an intercept of zero (Yang et al., 2002). The correlation values then gradually decrease in concert with increasing phylogenetic distance of the non-Picea genera from Picea.

4.1.2 Core results

Although self-self hybridizations should, in principle, yield a straight line on the correlation graphs, experimental data always produce some deviation from this ideal. The observed deviation represents the technical variation, or 'noise', which is the result of unavoidable inconsistencies in sample handling, slide quality, and other factors that contribute to decreased levels of reproducibility. In the case of the Picea vs.
Picea graph (Figure 4a) these limitations resulted in a 'scatter cloud' rather than a straight line, as well as several outliers, and a Pearson's correlation of 0.93. Slide-printing artifacts were likely a major contributor to the noise observed in this case (Appendix 1). If the noise originates primarily from slide-printing artifacts, as proposed, it would be expected to be consistent across all slides in the experiment, regardless of which comparison is being made. Since all the biological material being tested should also have been at a similar developmental stage, I conclude that the observed reduction in sample correlation with respect to phylogenetic distance from Picea must represent primarily cross-genus effects, rather than technical errors. Interestingly, the Spearman's correlation values are slightly higher than the Pearson's correlations in each case (Table 1). A Spearman's correlation value is based on the correlation between the ranked expressions of the various elements, while Pearson's method correlates the absolute expression values for each element. The observation that Spearman's correlation values are higher suggests that, even though the absolute expression levels of each element may not be highly correlated between the non-Picea and Picea samples, the overall ranking of those elements is. Spearman's correlation would be expected to be less affected by technical noise than Pearson's. For example, when comparing two replicate arrays, one with reduced signal intensity due to lower hybridization efficiency and one with normal levels, Spearman's correlation would predict a high correlation between the replicates, because the rank of expression levels would likely remain unchanged despite the array-wide reduction in hybridization efficiency on one array. Pearson's correlation, on the other hand, would directly compare expression values on a per-element basis, and predict a reduced correlation.
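The difference between the two coefficients can be illustrated with a small Python sketch. The intensity values and function names below are hypothetical, and the coefficients computed are r rather than the R² values reported in Table 1. One caveat is worth making explicit: a strictly proportional reduction (for instance, every signal halved) leaves Pearson's coefficient unchanged as well, because Pearson's r is invariant to linear rescaling. It is a non-proportional but rank-preserving reduction, such as compression of the strongest signals, that lowers Pearson's value while Spearman's remains at 1.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical foreground intensities for six features on a reference array.
reference = [100, 400, 1600, 6400, 25600, 60000]
# Uniformly weaker replicate: every signal halved (linear change).
halved = [0.5 * v for v in reference]
# Replicate whose strong signals are compressed (monotone, non-linear change).
compressed = [math.sqrt(v) for v in reference]

r_halved = pearson(reference, halved)            # unaffected by linear scaling
r_compressed = pearson(reference, compressed)    # lowered by the compression
rho_compressed = spearman(reference, compressed) # ranks are unchanged
```

Under these assumptions, a Spearman value that exceeds the Pearson value is what would be expected whenever the relative ordering of features is preserved but their absolute intensities are distorted unevenly.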
Not only do the observed correlation values decrease with greater phylogenetic distance, but the scatter of the plots also tends to increase (Figure 4). As we move from Picea vs. Picea (Figure 4a), through phylogenetic space to Picea vs. Pseudotsuga (Figure 4d), and Picea vs. Populus (Figure 4f), an increasing number of points deviate from the hypothetical 45-degree regression line. In addition, it appears that there is a greater proportion of data points found at lower signal intensities, even though median slide background (BG) levels (Figure 5) remain relatively constant between genera for each comparison. Finally, the data cloud is increasingly skewed towards the Picea axis of the graph, suggesting that Picea probes are generally hybridizing preferentially to the Picea array features when competing with probes from the other genera. These patterns substantiate the hypothesis that microarray hybridization efficiency decreases as the probe species and array species become more distantly related. The FG interquartile range was also found to decrease in a phylogenetically correlated manner, although the BG signal range remained relatively constant and low (Figure 6) (Appendix 2, 3). These decreases in FG interquartile range suggest that it would be difficult to accurately quantify the expression profiles of species distantly related to Picea when using a cross-species hybridization approach.

Figure 5. Mean background (BG) signal for probes of each test genus vs. the corresponding Picea reference probe.

Figure 6. Interquartile data range comparison. Foreground (FG) signal range is depicted by the black bars, while the background (BG) signal range for each test genus is shown by the grey line.
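One possible way to put a number on the skew of the data cloud toward the Picea axis is sketched below in Python. It computes, for paired per-feature FG signals, the median log_2 ratio of test to reference signal and the fraction of features falling below the 45-degree line. Neither statistic was calculated in this study; the function names and example intensities are hypothetical and serve only to illustrate the idea.

```python
import math
from statistics import median


def median_log2_ratio(test_fg, reference_fg):
    """Median of log2(test / reference) over paired array features.

    A value near 0 means the cloud is centred on the 45-degree line;
    a negative value means the cloud is shifted toward the reference axis.
    """
    return median(math.log2(t / r) for t, r in zip(test_fg, reference_fg))


def fraction_below_diagonal(test_fg, reference_fg):
    """Fraction of features whose test signal is below the reference signal."""
    pairs = list(zip(test_fg, reference_fg))
    return sum(t < r for t, r in pairs) / len(pairs)


# Made-up intensities: the test genus hybridizes more weakly than the
# Picea reference on every feature.
picea_reference = [1000, 2000, 4000, 8000]
test_genus = [500, 1000, 1000, 4000]

shift = median_log2_ratio(test_genus, picea_reference)
below = fraction_below_diagonal(test_genus, picea_reference)
```

For a self-self hybridization both statistics would be expected to sit close to their neutral values (a median log_2 ratio of 0 and roughly half of the features on either side of the line), so any systematic departure would reflect the competitive disadvantage of the heterologous probe.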
Since it is expected that Picea genes and the corresponding homologues in the test genera should be capable of roughly equivalent expression dynamics (i.e. genes being expressed at levels spanning the dynamic range of microarray detection), the observed decreases in signal range probably reflect imperfect matches between non-Picea probes and the Picea array features. Such imperfect probe-feature matches would result in weaker hybridization and thus lower signal intensity per gene. The fact that the mean BG signal range remains constant and low across all slides and genera suggests that the reduced FG signal range is not the product of increased technical noise or error in signal quantification, since in that case the BG signal range would be expected to show a similar decline. The median FG to BG ratio is a particularly relevant metric, since it describes the extent to which data are being lost due to noise. The data in Figure 7 show that the median FG to BG ratio decreases as phylogenetic distance increases. For example, Populus has a median FG to BG ratio of 1.3 (Figure 7), indicating that almost half of the array features in this comparison have FG signals equal to, or less than, the BG signal. When subjected to the usual filtering criteria, such data points would be eliminated from the data set. Another useful method for evaluating the portion of a dataset that is reliably detected is to assess the number of features with FG intensities two-fold or three-fold greater than BG (Figure 8). When filtered in this manner, there is a clear decline in the number of highly expressed features with respect to evolutionary distance from Picea. This observation is consistent with the findings of Renn et al., who screened expression profiles of eight fish species of varying evolutionary divergence (10-200 MYA) from the reference species Astatotilapia burtoni (Renn et al., 2004).
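As an illustration of the two detection metrics just described, the following minimal sketch computes the median FG to BG ratio and the percentage of features whose FG signal exceeds BG by a chosen fold. The function name and the six intensity values are hypothetical and are not taken from the experiment.

```python
from statistics import median

def fg_bg_summary(fg, bg, folds=(2.0, 3.0)):
    """Summarize per-feature foreground (FG) to background (BG) ratios.

    Returns the median FG:BG ratio and, for each fold threshold, the
    percentage of features whose FG signal is strictly greater than
    fold x BG. Features with a non-positive BG value are skipped.
    """
    ratios = [f / b for f, b in zip(fg, bg) if b > 0]
    summary = {"median_ratio": median(ratios)}
    for fold in folds:
        above = sum(1 for r in ratios if r > fold)
        summary[f"pct_above_{fold:g}x"] = 100.0 * above / len(ratios)
    return summary

# Hypothetical intensities for six array features (illustration only).
example_fg = [300.0, 450.0, 900.0, 1500.0, 260.0, 2400.0]
example_bg = [250.0, 300.0, 300.0, 250.0, 260.0, 300.0]
print(fg_bg_summary(example_fg, example_bg))
```

A median ratio close to one, as in the Populus comparison, means that roughly half of the features would fail even the weakest FG greater than BG filter.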
They also concluded that the number of reliably detectable features (FG > 3x BG) declined as evolutionary divergence increased.
[Figure 7: bar chart; categories: Picea, Pinus, Larix, Pseudotsuga, Tsuga, Populus]
Figure 7. Median foreground (FG) to background (BG) signal ratio for each genus.
Overall, therefore, it appears that our ability to detect transcripts using a heterologous cDNA microarray declines as the phylogenetic distance between the probe genus and array genus increases. This is not surprising, given that the technology is based on the strength of complementarity between probe and sample. The question that remains to be resolved is whether the data that can be extracted from cross-species hybridization studies are still representative of the biology being examined in the species of interest. To address this issue, I undertook a more detailed examination of the gene sets generated by each of the cross-species comparisons.
Figure 8. Percentage of all array features with foreground (FG) signal levels 2-fold and 3-fold greater than background (BG) levels. The portion of features with FG signal levels 2-fold greater than BG is represented by the black bars. The portion of features with FG signal levels 3-fold greater than BG is represented by the black bars.
4.2 Genes highly expressed post-germination
4.2.1 Promoter analysis
The Athena software package was used to identify the presence of promoter elements and enriched gene ontology (GO) classifications within a list of genes that were highly expressed in germinating Picea seeds. This software resource was specifically designed for analyzing Arabidopsis gene lists, predicting promoter elements and GO classifications based on known promoter elements and functional classifications in the Arabidopsis genome. Given that there are substantial differences between Arabidopsis and conifers, the results must be interpreted with caution.
This fact is particularly relevant considering that approximately 50% of Pinus ESTs have no significant homologs in Arabidopsis (Kirst et al., 2003), with similar levels of sequence similarity being reported for the Picea EST collection (Pavy et al., 2005b). If coding regions are highly dissimilar, then it is likely that non-coding and/or regulatory regions are as well. Also, since the Pinus and Picea genomes are orders of magnitude larger than that of Arabidopsis, it is possible that the extra non-coding genetic material may play some role in transcriptional regulation. For these reasons, putative promoter elements that are found to be enriched in the conifer gene lists may not have the same function as their Arabidopsis homologs. Having said this, Kirst et al. noted that approximately 90% of Pinus ESTs greater than 1100 bp in length had significant homologs in the Arabidopsis genome, suggesting that there is a great deal of similarity between high quality Pinus sequence and Arabidopsis genomic sequence (Kirst et al., 2003). Since there is no equivalent Picea- or Pinus-specific promoter prediction tool, the Arabidopsis genome-derived predictions are the best alternative.
To explore the biological relatedness of the gene sets generated in the cross-species comparisons, the 1000 most highly expressed genes detected in the Picea seedling samples were examined. The Athena promoter analysis and GO prediction software determined whether there was an enriched occurrence of characterized promoter elements within 1 kb of each gene. Putative molecular functions were also evaluated on the basis of predicted gene ontologies for the Picea/Arabidopsis homologs. P-value cutoffs of 10⁻⁴ and 10⁻⁶ were used for promoter element and GO classification assignments, respectively.
In this manner, it was possible to identify promoter elements or GO classifications that are disproportionately abundant in the 1000-gene list compared to a similar sized list of randomly selected genes from the Arabidopsis genome. The most enriched promoter elements for the Arabidopsis homologs of the 1000 most highly expressed Picea genes can be viewed in Table 2. Four elements were significantly enriched, with very small p-values (< 10⁻¹⁰).
Table 2. Promoter elements enriched (p < 10⁻⁶) in the 1000 genes most highly up-regulated during Picea seed germination. 'Sub-set %' represents the estimated proportion of the genes in the list whose promoters contain the noted motif, while 'genome %' is the percentage of all the genes in the Arabidopsis genome containing the motif.
Transcription Factor/Motif Name    sub-set %    genome %    p-value
ABRE-like binding site motif       79%          18%         < 10e-9
ACGTABREMOTIFA2OSEM                60%          12%         < 10e-9
CACGTG MOTIF                       58%          13%         < 10e-9
GADOWNAT                           38%          7%          < 10e-9
One of the more striking findings is that 79% of the genes in the list contain ABRE-like binding motifs, compared to the 18% that would be expected by chance. ABRE-like binding motifs are thought to be abscisic acid (ABA) responsive and are frequently found in genes that are differentially regulated during Arabidopsis seed germination (Busk and Pages, 1998; Nambara and Marion-Poll, 2003; Nakabayashi et al., 2005). Since ABA is a phytohormone central to the regulation of seed germination, it is not surprising that many of the Picea/Arabidopsis homologs that are highly expressed during seed germination would have ABA-related regulatory elements in their promoter regions. Two other strongly over-represented motifs, ACGTABREMOTIFA2OSEM and CACGTG, are also considered to be ABA-responsive, and have been associated with germination events (Hattori et al., 2002; Chandrasekharan et al., 2003a).
However, they are also found associated with genes whose expression is correlated with processes other than germination. For example, the ACGTABREMOTIFA2OSEM motif has been associated with dehydration stress (Narusaka et al., 2003), while the CACGTG motif has been associated with embryogenesis, elicitor-induced terpenoid biosynthesis and ethylene responses (Pasquali et al., 1999; Chakravarthy et al., 2003; Chandrasekharan et al., 2003b). Thus, although these two motifs appear to help regulate genes associated with non-germination processes, these are also processes that likely play a role during early developmental events in plants. The GADOWNAT motif is also enriched in the 1000-gene Picea list, occurring in 38% of the genes compared with the 7% expected by chance, a difference of 31 percentage points (Table 2). This gibberellic acid (GA)-regulated motif belongs to another class of regulatory elements that has been associated with Arabidopsis seed germination (Ogawa et al., 2003). GA is believed to act as an antagonist to ABA in germinating seeds, promoting seed coat weakening, vegetative growth, and the breaking of seed dormancy (Taiz and Zeiger, 1998).
4.2.2 Gene ontology: functional characterization
Gene ontology analysis of the predicted Arabidopsis homologs for the same 1000-gene Picea list generated 18 highly significant (p < 10⁻⁶) molecular function characterizations, as shown in Figure 9. These molecular function characterizations were considered to be significant because they were highly over-represented in my Picea gene list (Arabidopsis homologs) compared to a randomly selected Arabidopsis gene list of equal size. Two major trends can be observed in these data: over half of the most highly differentially expressed genes are involved directly or indirectly in protein synthesis, while the other dominant grouping consists of genes directly related to photosynthesis.
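To give a sense of how unlikely the over-representation levels reported in Table 2 would be by chance, the following minimal sketch computes an upper-tail probability under a hypergeometric model. This is an assumption made for illustration and is not necessarily the exact test implemented in Athena. The genome size of 27,000 genes is likewise an assumed round figure, and the function name is hypothetical. The result is returned as a base-10 logarithm because the probability itself is too small to represent as a floating-point number.

```python
import math

def log10_hypergeom_tail(population, successes, draws, observed):
    """log10 of P(X >= observed) for a hypergeometric variable X.

    population: total number of genes in the genome
    successes:  number of genes in the genome carrying the motif
    draws:      size of the gene list being tested
    observed:   number of genes in the list carrying the motif
    Requires observed <= min(successes, draws).
    """
    upper = min(successes, draws)
    numerator = sum(
        math.comb(successes, i) * math.comb(population - successes, draws - i)
        for i in range(observed, upper + 1)
    )
    denominator = math.comb(population, draws)
    # math.log10 accepts arbitrarily large Python integers.
    return math.log10(numerator) - math.log10(denominator)

GENOME_SIZE = 27000                         # assumed number of Arabidopsis genes
MOTIF_GENES = round(0.18 * GENOME_SIZE)     # 18% of the genome carry the motif
LIST_SIZE = 1000
OBSERVED = round(0.79 * LIST_SIZE)          # 79% of the list carry the motif

print(log10_hypergeom_tail(GENOME_SIZE, MOTIF_GENES, LIST_SIZE, OBSERVED))
```

Under these assumptions the logarithm is far below -9, in line with the very small p-values listed for the ABRE-like motif.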
As a seed germinates, starch, lipids and storage proteins from the endosperm are converted to simple sugars, fatty acids and amino acids, which are metabolized by the growing embryo (Bewley, 1997). As these stores are depleted, the plant must switch to photoautotrophic metabolism. This requires rapid assembly of the molecular infrastructure needed to produce mature chloroplasts, and to support both the light and dark reactions of photosynthesis. However, while important, this remodeling of the plastids is only one component of the massively increased cellular demand for new proteins, both structural and enzymatic, that accompanies germination. Ontology analysis of the Picea gene list demonstrates that together, these major metabolic shifts dominate the pattern of transcriptional activity in the young Picea seedlings. Promoter analysis of the predicted Arabidopsis homologues corresponding to the same gene list showed clear evidence for over-representation of ABA- and GA-responsive cis-elements, as well as various ethylene-, dehydration- and elicitor-induced response elements. This suggests that these hormones could play an important role in regulating the observed shift in transcriptional programming in the germinating seed, which would be consistent with well-established models of seed physiological processes.
Having established that the most highly differentially expressed genes in the Picea gene list reflect the transcriptional events predicted to be taking place in the germinating seedlings, the next question to be addressed is: How similar to this Picea seedling gene list are the gene lists generated from microarray analysis of the other genera that were examined? To answer this question, lists of the 1000 most highly differentially expressed transcripts were generated for each of the five remaining genera, using the same criteria as for the Picea list, and the percentage similarity between these gene lists was then calculated (Figure 10).
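The percentage similarity calculation can be sketched as a set intersection over equal-sized gene lists. This is a minimal illustration that assumes similarity is defined as the share of features present in every list being compared; the function name and the toy feature identifiers are hypothetical.

```python
def percent_common(gene_lists):
    """Percentage of features shared by all of the supplied gene lists.

    Assumes every list has the same length (e.g. the top 1000 features
    per genus), so the first list's size serves as the denominator.
    """
    gene_sets = [set(genes) for genes in gene_lists]
    shared = set.intersection(*gene_sets)
    return 100.0 * len(shared) / len(gene_sets[0])

# Toy lists of four features per genus (hypothetical identifiers).
picea_top = ["f01", "f02", "f03", "f04"]
pinus_top = ["f01", "f02", "f03", "f05"]
populus_top = ["f01", "f02", "f06", "f07"]

print(percent_common([picea_top, pinus_top]))               # Picea vs. Pinus
print(percent_common([picea_top, pinus_top, populus_top]))  # all three lists
```

Adding more genera to the comparison can only keep the shared set the same size or shrink it, which is why the values in a cumulative comparison of this kind decline from left to right.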
Phylogenetically, Picea is more closely related to Pinus than to any of the other genera tested, and this close relationship is reflected in the gene list comparison, where 98% of the most highly differentially expressed genes reported for each genus are shared. Thus, although the Pinus probes appeared to hybridize much less efficiently to the Picea array elements than did the Picea or Larix probes, the resulting data profile is very similar. This congruence emphasizes that the deficient hybridization observed with Pinus probes likely had its origins in a technical problem, rather than in any biological difference.
[Figure 9: bar chart; categories: structural constituent of ribosome; protein biosynthesis; ribosome; intracellular; ribosome biogenesis; cytosolic small ribosomal subunit; cytosol; photosynthesis; chlorophyll binding; heat shock protein activity; large ribosomal subunit; response to heat; light-harvesting complex; structural constituent of cytoskeleton; photosynthesis light harvesting; protein degradation tagging activity; reductive pentose-phosphate cycle; copper ion homeostasis]
Figure 9. Functional characterization: Gene ontologies (GO) of molecular function. Categories plotted met the p-value cutoff of p < 10⁻⁶, as assigned by the Athena software.
[Figure 10: bar chart; x-axis: Gene List Comparison; categories: Picea-Pinus; Picea-Pinus-Larix; Picea-Pinus-Larix-Pseudotsuga; all conifer; all genera]
Figure 10. Percent commonality among the lists of the 1000 most highly expressed genes for each genus. The value for 'Picea + Pinus' shows that 98% of the 1000 most highly differentially expressed features for each genus are common to both, while the value for 'all genera' shows that approximately 55% of the 1000 most highly expressed features for each genus are common to all six genera.
The high similarity of the two gene lists also implies that they each provide a reasonably accurate depiction of the transcriptional program associated with the seed germination process in the respective genus.
Interestingly, about 61% of the most highly expressed genes are common to all of the conifer gene lists, while 55% of the genes occur in the lists of all genera. Thus, despite the large phylogenetic range being addressed, the results generated by cross-species microarray experiments appear to be consistent with the biology being characterized, at least for the most highly expressed genes being measured, where FG to BG ratios exceed 3.5.
Many of the genes common to all six lists have been linked to germination processes in the literature, as was the situation with the previous promoter and GO analyses of the Picea list (Table 3). Inspection of the list of genes that are highly expressed during the germination of all six genera (Appendix 4) also revealed a theme of photosynthetic induction, since numerous putative chloroplast-related, photosystem-related, and early light-induced genes appear in this list. As well, transcripts encoding key anthocyanin synthesis enzymes, such as flavanone 3-hydroxylase and chalcone synthase, are also found. This pattern is consistent with the accumulation of anthocyanin observed in young conifer seedlings post-germination (Kubasek et al., 1992; Kubasek et al., 1998) and with the red pigmentation I observed in many of the seedlings in my microarray experiments. Other genes that were highly expressed in all six genera, and which have also been found to be differentially regulated in other seed-specific expression studies, can be found in Table 3. Included in this list is GAST1-like protein, a GA-stimulated transcript isolated from tomato (Herzog et al., 1995), maize (White and Rivin, 1995), rice (Liu et al., 1995) and Douglas fir (Chatthai et al., 2004).
Table 3. Examples of genes that were found to be highly expressed in all genera during germination.
Included are Arabidopsis gene index numbers (AGI#) corresponding to the highly expressed Picea homologs, the putative annotation of the Picea sequences, and references that have demonstrated the involvement of each gene in seed germination.
AGI#         Annotation                                            Reference
At1g29930    chlorophyll a/b binding protein                       Sullivan and Deng 2003
At1g44575    photosystem II                                        Sullivan and Deng 2003
At1g10360    17.6 kDa heat shock protein                           Kadyrzhanova 1998
At1g55670    photosystem I                                         Sullivan and Deng 2003
At1g67090    ribulose-bisphosphate carboxylase small unit          Sullivan and Deng 2003
At2g17760    putative chloroplast nucleoid DNA-binding protein     Sato 2001
At1g10360    putative nonspecific lipid-transfer protein           Sheoran et al. 2005
At3g27690    putative chlorophyll A-B binding protein              Sullivan and Deng 2003
At3g54890    chlorophyll a/b-binding protein                       Sullivan and Deng 2003
At3g56240    copper homeostasis factor                             Lee et al. 2005
At4g02890    polyubiquitin (UBQ14)                                 Sun and Callis 1997
At4g05320    polyubiquitin (UBQ10)                                 Sun and Callis 1997
At5g13930    chalcone synthase                                     Kubasek et al. 1992
At2g02120    protease inhibitor II                                 Carrera and Pratt 1998
At1g64230    ubiquitin-conjugating enzyme                          Chen et al. 1995
At1g20340    plastocyanin                                          Dijkwel 1996
At3g09390    metallothionein-like protein                          Dong and Dunstan 1996; Chatthai et al. 2004
At1g57860    60S ribosomal protein L21                             Gallois et al. 2001
At2g36830    putative aquaporin                                    Hakman and Oliviusson 2002
At3g22840    early light-induced protein                           Harari-Steinberg 2001
At1g74670    GAST1-like protein                                    Herzog et al. 1995
At5g59320    nonspecific lipid-transfer protein precursor          Sheoran et al. 2005
At4g11600    phospholipid hydroperoxide glutathione peroxidase     Sugimoto and Sakamoto 1997
At1g26770    expansin 10                                           Wu et al. 1996
At3g51240    flavanone 3-hydroxylase (FH3)                         Yu et al. 2003
At1g10360    putative glutathione S-transferase TSI-1              Zeng et al. 2005
(Kubasek et al., 1992; Chen et al., 1995; Herzog et al., 1995; Dijkwel et al., 1996; Dong and Dunstan, 1996; Wu et al., 1996; Sugimoto and Sakamoto, 1997; Sun and Callis, 1997; Kadyrzhanova et al., 1998; Gallois, 2001; Harari-Steinberg et al., 2001; Sato, 2001; Hakman and Oliviusson, 2002; Sullivan and Deng, 2003; Chatthai et al., 2004; Dong et al., 2004; Sheoran et al., 2005; Zeng et al., 2005)
In summary, the most highly expressed gene lists for the six genera show substantial overlap. A large portion (~98%) of the 1000 most highly expressed Picea genes were present in both the Picea and Pinus lists, while over half of the genes were still found to be highly expressed in all six genera, including Populus, the only non-conifer. Within this common set, GO classification (molecular function) analysis points to an enrichment of germination or early seedling growth-related genes. Specifically, genes related to protein synthesis and to photosynthesis-related processes were found to be markedly more abundant than would be expected by chance. Taken together, the transcriptional responses observed across all the tree species tested are consistent with a biological commitment to seed germination and seedling development, as would be expected given that the RNA being analyzed in each case was derived from germinating seeds.
4.3 Possible future directions
The results presented in this chapter support the hypothesis that biologically relevant data can be obtained by using cDNA microarrays to characterize transcription profiles of species or genera closely related to the species for which the microarray was designed. Further verification of this could be provided by repeating the present experiments.
Since the completion of these cross-genus hybridization experiments, a new and improved 22.8 k Picea microarray has become available. The new arrays provide a larger selection of unique transcripts, and the slide print quality of the 22.8 k array is superior to that of its 16.7 k predecessor. Spots on the new microarray have much more uniform and desirable morphology, and the sub-grid design and layout have been improved. These improvements would likely reduce the noise in the datasets, leading to more reproducible results and an increase in the proportion of statistically reliable data, due to higher median FG to BG ratios.
With improved data acquisition, more biology-based comparisons could be made. It would be informative to put all the conifer species through a standardized treatment regime and examine the similarity in induction patterns over time/treatment. Germinating seed for each of the same five conifer genera and sampling them at two developmental time-points would provide increased power, since the experimental design would include a dynamic factor (changes over time) rather than simply relying on the presence or absence of a FG signal above BG level at a single developmental point. If expression trends for a given gene were consistent across phylogenetic space, this would provide increased support for the argument that the gene product is functionally associated with seed germination, and that this function has been evolutionarily conserved. Germination-associated genes that do not appear to be common to all the conifers tested are also potentially interesting. For example, there may be genes that are differentially expressed during Larix germination but not Picea germination, which might represent unique or Larix-specific germination factors. However, this categorization must be treated with caution.
The absence of a gene from the list of 1000 may only mean that its expression was somewhat lower in that species than in the others, and that it therefore failed to meet the threshold for inclusion. Manual curation of the lists and comparison with the full datasets could confirm whether this was the case. Where it is confirmed that a gene appears to behave differently in one or more species, QRT-PCR on new biological replicates could be used to validate this result.
It would also be interesting to extend this comparison to additional conifer genera such as Abies, Thuja, or Taxus. These data would aid the development of a scoring mechanism that reflects the array-species vs. probe-species compatibility. Ideally, such a tool could eventually be used to predict, a priori, the effectiveness of using the Picea arrays for another species of interest. This would help realize one of the original goals of the forest genomics effort to build the Picea EST libraries and the microarrays derived from them, namely, that these resources should serve as a tool to enable the scientific community to conduct forest biology research across a wide range of tree species.
In conclusion, it appears that cross-genus microarray hybridizations are a feasible means of exploring global transcriptional changes in related forest tree species. The quantity of reliable data generated by a cross-hybridization experiment appears to be a function of the phylogenetic distance between the array genus and the probe genus, as illustrated by measures such as the FG to BG ratio, the numbers of features expressed two- or three-fold above BG, and the increased scatter-plot point spread. Despite some reduction in data quantity, the quality of the remaining data appears to be less affected by the increased phylogenetic distance.
Germination-related transcripts are predominant within the 1000 most highly expressed gene lists, and there is a strong congruency in the gene lists, particularly between closely related species. Cross-species hybridization can thus be a useful tool for investigating global gene expression profiles in non-model species when species-specific microarrays are not available.
5.0 Materials and methods: Pinus taeda somatic embryo germination
5.1 Acquisition of Pinus taeda somatic embryos
One hundred and sixty 15 cm Petri plates, each containing approximately 300 P. taeda somatic embryos, were received from Cellfor Inc. on Jan 15, 2004. The somatic embryo plates were stored at 4°C in the dark until needed. For the purposes of biological replication, two different P. taeda cell lines (L3519 and K3973) were selected based on their availability and well characterized performance. RNA from the two cell lines was processed and analyzed separately. RNA from cell line L3519 was used for the microarray portion of my experiment, while RNA from both cell lines (L3519 and K3973) was used for the QRT-PCR validation section.
5.2 Desiccation of somatic embryos
All somatic embryos (SEs) were desiccated simultaneously prior to imbibition in order to improve the synchronicity of the germination process and to allow for indefinite storage. In order to achieve maximal desiccation levels, based on empirical data, SEs were transferred to empty 15 cm Petri plates and placed in a standard tissue culture laminar flow bench for 90 minutes with the fan on and the plate lids removed. This resulted in all the SEs becoming dried onto the carrier filter papers. The plates with their desiccated embryos were then covered, sealed with paraffin tape (Parafilm; Menasha, WI), and stored at 4°C in the dark. Sterile conditions were maintained throughout this process.
5.3 Somatic embryo germination medium
The medium used to germinate the somatic embryos contained activated charcoal 1000 mg/l, agar 6000 mg/l, Ca(NO3)2·4H2O 354 mg/l, CuSO4·5H2O 8 mg/l, H3BO3 6 mg/l, KH2PO4 212 mg/l, KNO3 258 mg/l, L-glutamine 200 mg/l, MgSO4·7H2O 506 mg/l, MnSO4·H2O 0.3 mg/l, myoinositol 100 mg/l, Na2MoO4·2H2O 0.02 mg/l, nicotinic acid 0.5 mg/l, Plant Product iron 80 mg/l (Sigma, St. Louis), pyridoxine-HCl 0.1 mg/l, thiamine-HCl 0.1 mg/l and either sucrose 20 g/l, maltose 20 g/l, sorbitol 20 g/l, or no carbohydrate.
5.4 Somatic embryo imbibition and germination
To initiate the germination process, approximately 300 SEs and the carrier filter paper onto which they had been desiccated were transferred to germination tissue culture medium containing either 2% sucrose, 2% maltose, 2% sorbitol, or 0% carbohydrate. The plates were sealed with paraffin tape and placed in the dark at 30°C for 7 days. The SE plates were subsequently transferred to a plant growth room where they were exposed to fluorescent lighting at 22 °C for 16 hours per day. The light intensity (photosynthetic photon flux; 25 µmol m⁻² s⁻¹) was measured using a basic quantum meter (Apogee Instruments, Logan UT).
5.5 Harvesting of somatic embryos
Germinants and embryos were sampled at three time points during the germination process. The first time point was immediately following the desiccation procedure, when a subset of desiccated SEs was placed in 1.5 ml microfuge tubes at a density of approximately 200 SEs per tube. The second harvesting point was after the plates had spent 7 days in darkness at 30 °C, and the final sampling point was 10 days later, during which period the plates had been held at room temperature with exposure to light. Germinants from the second and third sampling points were placed in 10 ml culture tubes, flash frozen in liquid nitrogen, and stored at -80°C until needed.
5.6 Somatic embryo RNA isolation
Total RNA was isolated and assessed for purity using the protocol described in the "Cross-species hybridization" section. RNA from line L3519 was used for all microarray hybridizations; approximately 10 µg RNA from each sample was also archived for future analysis and validation purposes. RNA from the second cell line, K3973, was isolated in the same manner and was used exclusively for the purposes of quantitative RT-PCR validation. In this way the key results from the microarray portion of the experiment were validated using an independent technology and two different cell lines (biological replicates).
5.7 Microarray hybridization design
For simplicity, a common reference design was chosen. In order to simultaneously make comparisons between the three different time series samples, while also allowing robust comparisons to be made between different treatments at any given time-point, desiccated embryo RNA was assigned as the common reference throughout the experiment (Figure 11). Four dye-balanced replicate hybridizations were made between each sample and the reference, for a total of 32 hybridizations.
[Figure 11: diagram; Desiccated Embryo reference linked to the 2% sucrose, 2% maltose, 2% sorbitol and 0% sugar treatments at 7 day @ 30 °C and 10 day @ 22 °C]
Figure 11. Somatic embryo germination: common reference design. The arrows represent four replicated dye-balanced hybridizations between the Desiccated Embryo reference and one of the eight treatment conditions.
5.8 Somatic embryo microarray hybridization
The hybridization protocol described in the "Cross-species hybridization" section was used, with the following exceptions: five µg total RNA was used for each hybridization rather than 500 ng mRNA, and the Forestry Genome British Columbia 22.8k Picea cDNA microarray was used rather than its predecessor, the 16.7 k version.
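The common reference design of section 5.7 can be enumerated with a short sketch that confirms the hybridization count: two post-desiccation time points by four media gives eight treatment conditions, each hybridized four times against the desiccated embryo reference. The dye names (Cy3 and Cy5) and the assumption that dye balance means two replicates in each dye orientation are illustrative and are not stated in the text.

```python
from itertools import product

TIME_POINTS = ["7 day @ 30 C", "10 day @ 22 C"]
MEDIA = ["2% sucrose", "2% maltose", "2% sorbitol", "0% sugar"]
REFERENCE = "desiccated embryo"

def build_design(replicates=4):
    """List every hybridization in the common reference design.

    Assumption: dye balance is achieved by alternating the dye
    orientation between successive replicates of the same condition.
    """
    design = []
    for time_point, medium in product(TIME_POINTS, MEDIA):
        for replicate in range(1, replicates + 1):
            if replicate % 2 == 1:
                sample_dye, reference_dye = "Cy5", "Cy3"
            else:
                sample_dye, reference_dye = "Cy3", "Cy5"
            design.append({
                "sample": f"{medium}, {time_point}",
                "reference": REFERENCE,
                "replicate": replicate,
                "sample_dye": sample_dye,
                "reference_dye": reference_dye,
            })
    return design

hybridizations = build_design()
print(len(hybridizations))  # 8 conditions x 4 replicates
```

Because every sample shares the same reference, any two conditions can be compared indirectly through their ratios to the desiccated embryo, at the cost of spending half of each slide on the reference.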
5.9 Somatic embryo microarray data analysis
5.9.1 Scanning and image processing
All slides and images were processed, assessed and quantified using the same criteria as were defined in the "Cross-species hybridization" section. The Forestry Genome British Columbia 22.8k Picea-array-specific quality control scripts were implemented to assess hybridization and slide quality.
5.9.2 Data pre-processing
Background signal intensity for the raw data was defined as the lowest 10% median foreground signal per array sub-grid. This method was selected because it is less affected by artefacts introduced during the spot assessment and quantification process. Inconsistencies in assigning physical spot dimensions and boundaries change the foreground:background ratio less under this method than under the "local background" subtraction method. Variations in scanning conditions also have less impact when the "lowest 10%" method is used, because local background levels are not integrated in this method. Raw data were transformed (Ln-scale) and normalized using variance stabilizing normalization (VSN) (Huber et al., 2002).
5.9.3 Statistical computations
Ln-transformed expression ratios of replicates were averaged and ANOVA was performed. Histograms describing the parametric p-value distribution of array features were generated for the entire microarray experiment, as well as for each comparison made, as summarized in Table 1. The percentage of differentially expressed genes (Diff Exp) was estimated for each comparison by calculating the proportion of features above the blue threshold line of the graph, which is a q-value-derived estimate of the proportion of null p-values (Storey and Tibshirani, 2003).
5.9.4 Data clustering
In order to cluster my microarray data based on expression profiles, all genes were ranked in order of p-value and the 1000 annotated genes with the lowest p-values (p < 1.4 × 10⁻⁸) were extracted for further analysis.
The members of this 1000-gene set were grouped in order of expression pattern similarity, using Genesis hierarchical clustering software (Sturn et al., 2002). Ten sub-clusters of genes were then defined, based on the magnitudes and patterns of gene expression.
5.10 Quantitative Reverse Transcriptase Polymerase Chain Reaction
5.10.1 Quantitative RT-PCR methodologies
The RNA used for the quantitative reverse transcriptase-PCR (QRT-PCR)-based analysis was derived from two sources. The first was an aliquot of the line L3519 RNA used for the microarray portion of the study, and the second was RNA isolated from line K3973. RNA concentrations were quantified using a GeneQuant RNA/DNA calculator (Pharmacia, Piscataway, NJ). RNA concentrations were then equalized for consistency and ease of handling. In order to eliminate any contaminating genomic DNA from RNA samples to be used for QRT-PCR, the samples were treated with amplification-grade deoxyribonuclease (DNAse) as specified by the manufacturer (Invitrogen; Carlsbad, CA). Approximately 10% of each DNAse-treated RNA sample was archived at -80°C for later analysis, and the remainder was reverse transcribed using recombinant reverse transcriptase II, as specified by the manufacturer (Invitrogen; Carlsbad, CA). The resultant cDNA population was then diluted 1:10 with PCR-grade water, and stored at -20°C until needed. PCR reactions were performed in an MJ Opticon II real-time PCR detection system (MJ Research; Mississauga, ON), using the QuantiTect SYBR Green PCR Kit, as specified by the manufacturer (Qiagen; Mississauga, ON). All samples were run as 20 µl reactions using: 5 µl (10 ng) RNA, 2 µl H2O, 1.5 µl (0.5 µM) forward primer, 1.5 µl (0.5 µM) reverse primer, and 10 µl SYBR Green PCR Master Mix. Cycle conditions were as follows: a) 95 °C for 15 min, b) 94 °C for 15 sec, c) 55 °C for 30 sec, d) 68 °C for 45 sec, e) repeat steps b-d for 42 cycles, f) 68 °C for 5 min.
Each sample was quantified between steps d) and e). A PCR product melting curve was also generated to ensure that only a single product was amplified per reaction. This step involved heating each PCR product from 60°C to 90°C at 0.2°C/sec while a measurement of SYBR Green fluorescence was taken for each sample every 0.2 sec. For the purposes of relative quantification, a seven-sample serial dilution series of β-tubulin PCR product was used to generate a standard curve in each QRT-PCR run. This dilution series was generated using gel-purified β-tubulin PCR product, which was produced by amplification of β-tubulin cDNA from total Pinus taeda cDNA using JumpStart REDTaq ReadyMix PCR Reaction Mix (Sigma; St. Louis, MO) and Pinus taeda β-tubulin-specific primers (forward sequence: GGT CTT TAG GCC TGA TAA CTT; reverse sequence: GCA GAG ATC AAA TGG TTC AAT TC). The amplification conditions consisted of an initial cycle of 94°C for 5 min, then 35 cycles of 94°C for 1 min, 55°C for 1 min, and 72°C for 1 min, with a final cycle of 72°C for 10 min. The PCR product was then visualized on a 1.5% agarose electrophoresis gel, purified using the Qiaquick PCR purification kit as specified by the manufacturer (Qiagen; Mississauga, ON), and quantified using a Pharmacia GeneQuant DNA/RNA calculator. The purified β-tubulin was serially diluted to concentrations of 10⁸ molecules/5 µl, 10⁷ molecules/5 µl, 10⁶ molecules/5 µl, 10⁵ molecules/5 µl, 10⁴ molecules/5 µl, 10³ molecules/5 µl, and 10² molecules/5 µl. The conversion from ng/µl to molecules/µl was based on the formula: (X g/µl DNA / [plasmid length in base pairs × 660]) × 6.022 × 10²³ = Y molecules/µl (Qiagen, 2003).
5.10.2 PCR primer design
Since the ESTs on the array were derived from Picea RNA, and the probes were generated from Pinus RNA, all QRT-PCR primers were designed specifically to amplify the closest Pinus homolog to the Picea EST of interest, as determined by BLASTN analysis.
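The copy-number conversion given in section 5.10.1 can be written out as a short worked example. This is a minimal sketch: the 150 bp product length and the 1 ng/µl concentration are hypothetical values chosen for illustration, and the function name is not from the source. Note that the formula takes the concentration in g/µl, so a reading in ng/µl must first be multiplied by 10⁻⁹.

```python
AVOGADRO = 6.022e23          # molecules per mole, as used in the formula
GRAMS_PER_MOLE_PER_BP = 660  # average mass of one double-stranded base pair

def molecules_per_ul(grams_per_ul, length_bp):
    """Convert a DNA concentration in g/ul to molecules/ul.

    Implements (X g/ul / [length in bp x 660]) x 6.022e23 = Y molecules/ul.
    """
    return grams_per_ul / (length_bp * GRAMS_PER_MOLE_PER_BP) * AVOGADRO

# Hypothetical purified product: 150 bp at 1 ng/ul (= 1e-9 g/ul).
stock = molecules_per_ul(1e-9, 150)
print(f"{stock:.3e} molecules/ul")

# Seven-point standard series, in molecules per 5 ul reaction input.
standard_series = [10 ** exponent for exponent in range(8, 1, -1)]
print(standard_series)
```

With these assumed inputs the stock contains roughly 6 × 10⁹ molecules/µl, so it would need to be diluted about 300-fold to reach the top standard of 10⁸ molecules per 5 µl.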
In order to confirm that the Pinus sequence used was the most similar available homologue of the Picea EST in question, a BLASTX search was performed for each Pinus sequence. If the Pinus sequence resulted in the same Arabidopsis annotation as the Picea EST, the Pinus sequence was assumed to be the best homologue. All PCR primers were designed with the aid of the online resource Primer3. Design parameters were set such that product size range = 75-150 bp, primer size = 18-22 bp, primer Tm = 58-62°C, optimal GC content = 45%, maximum self-complementarity = 5.0, maximum 3' complementarity = 1.0, and maximum poly-X = 3. All other parameters were left at default settings. A list of the Pinus taeda primers used is shown below (Table 4).

Table 4. Pinus taeda QRT-PCR primers. 'Primer Name' indicates the Picea EST to which the Pinus sequence is most related, followed by the orientation of the primer; 'Annotated ATG #' indicates the best Arabidopsis homologue of the conifer cDNA; 'Annotated function' is the predicted function; and 'Sequence' is the sequence of the oligonucleotide.
Primer Name | Annotated ATG # | Annotated function | Sequence
WS00113_D16 like Forward | At5g08370 | alpha-galactosidase | AGAATAACCAGTGGGCAGCA
WS00113_D16 like Reverse | At5g08370 | alpha-galactosidase | AAGCCCAAATGCTGAAGTGT
WS00729_F23 like Forward | At1g20510 | 4-coumarate--CoA ligase | AGGTTGGACTGTGGGAGTCA
WS00729_F23 like Reverse | At1g20510 | 4-coumarate--CoA ligase | GGTTTCGCCATACAAGAGGA
WS01034_M15 like Forward | At4g38620 | MYB4 | CTTCAGATCAGGACGCAGGT
WS01034_M15 like Reverse | At4g38620 | MYB4 | TGGACCAAAGAAGAGGACGA
WS0016_E09 like Forward | At5g28840 | NAD-dependent epimerase | AATGGTCAGCATGAACGAGA
WS0016_E09 like Reverse | At5g28840 | NAD-dependent epimerase | GGAATGTGATGTATCGGCAAC
WS00912_C11 like Forward | At1g17160 | pfkB-type carbohydrate kinase | TTCAGTTGCAGAGAGCCTGA
WS00912_C11 like Reverse | At1g17160 | pfkB-type carbohydrate kinase | GCATACTTGGAATGGCACCT
WS00929_J15 like Forward | At4g29160 | SNF7 | ATGAAGGTGCTGTGGTTGCT
WS00929_J15 like Reverse | At4g29160 | SNF7 | GAGGATGAATTGGAGGCTGA
WS00930_N09 like Forward | At2g01630 | glycosyl hydrolase | TCAGCTACGTCAACAACTGGA
WS00930_N09 like Reverse | At2g01630 | glycosyl hydrolase | ATCAACAATGGCATCGAACA
WS0075_D19 like Forward | At4g02440 | circadian clock coupling factor | ATGTGATTGGCCTGGATGT
WS0075_D19 like Reverse | At4g02440 | circadian clock coupling factor | AATGCACAATTCACCAACGA
WS00929_N09 like Forward | At1g69740 | porphobilinogen synthase | CCAGCTATGCCCTATTTGGA
WS00929_N09 like Reverse | At1g69740 | porphobilinogen synthase | TTCCATCATTGCCTTTGTCT
WS01039_A16 like Forward | At5g53190 | nodulin | GTGGATGACCTATGCTGCTC
WS01039_A16 like Reverse | At5g53190 | nodulin | TAGCTGAGCCAATGCCAAG

All primers were custom synthesized by Invitrogen (Carlsbad, CA).

5.10.3 Quantitative RT-PCR data analysis

Three replicate samples of each RNA preparation were assayed with each primer set.
Each sample was also assayed using the β-tubulin primer set, as an internal standard for the purposes of data normalization. β-tubulin was assumed to be a so-called "housekeeping" gene because of its relatively stable expression levels in all plant tissues, which allows it to be used as an internal standard (Vandesompele et al., 2002; Brunner et al., 2004a). QRT-PCR expression data were normalized according to the following formula:

$$\frac{\text{copies of gene X in sample Y}}{\text{copies of β-tubulin in sample Y}} = \text{copies of gene X relative to β-tubulin}$$

Negative controls in the form of 'water blanks' were included in each QRT-PCR run to detect any cross-contamination of samples. Because genomic sequence information for Pinus taeda is severely limited, it was not possible to deliberately design intron-spanning primers, which would have been preferable for such QRT-PCR investigations (Bustin, 2002). Therefore, to control for the possibility of genomic DNA contamination of the cDNA samples, the RNA aliquots that had been DNase-treated but not reverse-transcribed were screened with the β-tubulin primers. Any product that results from a PCR reaction containing non-reverse-transcribed RNA is likely the result of genomic DNA contamination.

Since all sample-to-sample microarray comparisons were made relative to the desiccated embryo (common reference), the same comparison framework was used for the QRT-PCR assessments. This was accomplished by generating ratios of each β-tubulin-normalized sample vs. the β-tubulin-normalized desiccated embryo standard. For consistency with the original microarray data, all expression ratios were Ln-transformed. Pair-wise Student's t-tests were then performed on each comparison and a p-value cutoff was assigned to determine statistical significance.

6.0 Results and discussion: somatic embryo germination

6.1 Differentially expressed features

Parametric p-values were calculated for all array comparisons and array features.
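One common way to turn a list of p-values into an estimated percentage of differentially expressed features is a Storey-style estimate of the null fraction. The sketch below assumes that approach with a fixed tuning threshold `lam = 0.5`; the function names, the threshold, and the toy p-values are illustrative assumptions, and the thesis does not state that this exact procedure was used.

```python
def estimate_null_fraction(p_values, lam=0.5):
    """Storey-style estimate of the fraction of unchanged (null) features.

    P-values above `lam` are assumed to come almost entirely from null
    features, whose p-values are uniform on [0, 1].
    """
    if not p_values:
        raise ValueError("p_values must not be empty")
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1)")
    n_above = sum(1 for p in p_values if p > lam)
    pi0 = n_above / (len(p_values) * (1.0 - lam))
    return min(1.0, pi0)  # a fraction cannot exceed 1


def percent_differentially_expressed(p_values, lam=0.5):
    """Estimated percentage of features that are not null."""
    return 100.0 * (1.0 - estimate_null_fraction(p_values, lam))


# Toy data (hypothetical): 300 evenly spread "null" p-values plus
# 700 very small p-values standing in for changed features.
null_p = [(i + 0.5) / 300 for i in range(300)]
changed_p = [0.001] * 700
toy_p = null_p + changed_p

# 150 of the 1000 toy p-values exceed 0.5, so pi0 = 150 / (1000 * 0.5) = 0.3.
pct = percent_differentially_expressed(toy_p)
```

The flat right-hand part of a p-value histogram plays the role of the null baseline here: its height is extrapolated across the whole interval, and whatever rises above it near zero is counted as differentially expressed.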
When these p-values are plotted as a histogram, they show what percentage of the data is statistically significant in each experiment, and give an estimate of the number of differentially expressed microarray elements for any given comparison. As can be seen in the histogram of parametric p-values for the Whole Model ANOVA (all slides), a large proportion (approximately 70%) of the features printed on the arrays yielded differential expression values during at least one of the treatments or time points, relative to the reference (Figure 12). This percentage is relatively high for microarray studies, which typically display values ranging from approximately 2% to 25% (Thibaud-Nissen et al., 2003; Ehlting et al., 2005). The high degree of differential expression I observed may be a reflection of the substantial morphological and physiological changes induced when a desiccated embryo is imbibed and makes the transition from a dormant state to fully autotrophic growth. Given the large number of differentially expressed genes in my experiment (approximately 15,400), I elected to undertake a more focused analysis, rather than attempting to find biological correlates for all the observed transcriptional changes.

A closer investigation of the p-value histograms does reveal some interesting patterns when the various microarray comparisons are considered from the perspective of the statistical power generated by the chosen experimental design. Three classes of comparisons can be identified (Table 5). The first class represents all comparisons that were made directly between the eight treated samples and the common reference. Because these comparisons represent co-hybridizations that were made directly between the reference and the treated sample, the statistical power is sufficiently high that even relatively small expression differentials are statistically significant. For example, in the comparison between the samples 2% sucrose 7 day vs.
desiccated embryo (Figure 13), at a p-value threshold of 0.05, 8617 ESTs were defined as differentially expressed, with expression ratios as small as 1.1 and -1.1 (data not shown). Although the biological significance of such small expression ratios is debatable, this statistical power is usually only available when comparisons are made directly between two samples (Yang, 2002).

The second class of comparisons consists of indirect comparisons between two treated samples, each of which was co-hybridized with the common reference, and where none of the genes generated statistically reliable fold changes. Although numerous genes showed large fold changes in expression between various samples, the statistical power of these indirect comparisons was comparatively low (high p-values), and the probability of false results was thus high. Most of the comparisons in this class are those between different treatments at the same time point, e.g. 2% sucrose 7 day vs. 2% maltose 7 day (Figure 14). The exceptions to this were the comparisons between the 0% sucrose at either 7 day (Figure 15) or 10 day, and any other treatment at the same time point. These latter comparisons did yield a small fraction of differentially expressed transcripts, as will be discussed shortly. The apparent lack of differential expression between most treatments at any given time point is likely due to the limitations imposed by the experimental design rather than a genuine lack of biological differences. The foremost weakness of any common reference microarray design is that, while direct comparisons enjoy strong statistical power, all indirect comparisons are compromised due to the four-fold increase in the variance (Yang, 2002). The result of this decreased statistical power is that subtle, yet biologically real, expression differences cannot be reliably measured.
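The four-fold figure can be made concrete with a short worked calculation. The following is a sketch under simplifying assumptions that are introduced here rather than taken from the thesis: each two-color slide yields a log-ratio $M$ with independent error of variance $\sigma^2$, and the direct and common-reference designs are compared using the same number of slides (two). The symbols $M_1$, $M_2$, $M_{AR}$, $M_{BR}$, $\sigma^2$ and $\hat{\delta}$ are illustrative notation only.

If samples A and B are co-hybridized directly on both slides, the estimated log-ratio is the mean of the two measurements:

$$\hat{\delta}_{\text{direct}} = \tfrac{1}{2}\,(M_1 + M_2), \qquad \operatorname{Var}\big(\hat{\delta}_{\text{direct}}\big) = \tfrac{1}{4}\,(\sigma^2 + \sigma^2) = \frac{\sigma^2}{2}.$$

If instead each sample is hybridized against the reference R on its own slide, the A vs. B log-ratio is obtained as a difference:

$$\hat{\delta}_{\text{indirect}} = M_{AR} - M_{BR}, \qquad \operatorname{Var}\big(\hat{\delta}_{\text{indirect}}\big) = \sigma^2 + \sigma^2 = 2\sigma^2.$$

The ratio of the two variances is therefore

$$\frac{\operatorname{Var}\big(\hat{\delta}_{\text{indirect}}\big)}{\operatorname{Var}\big(\hat{\delta}_{\text{direct}}\big)} = \frac{2\sigma^2}{\sigma^2/2} = 4.$$

Under these assumptions the indirect estimate has four times the variance, and hence twice the standard error, of the direct one, so a given true expression difference produces a test statistic only about half as large.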
The final class of comparisons consists of indirect comparisons that did produce statistically significant expression ratios, despite the reduced statistical power. This class was predominantly made up of indirect comparisons within a treatment (2% sucrose, 2% maltose, 2% sorbitol, or 0% sucrose), but at different time points (7 day vs. 10 day). The previously mentioned comparisons between 0% sucrose at 7 day or 10 day and any other treatment at the same time point also belong to this class. The most likely explanation for this pattern of differential responses is that developmental processes have a much larger impact on global gene expression than does the substitution of various exogenous carbohydrate sources. Again, the 0% sucrose samples are somewhat anomalous, but this treatment might be predicted to have greater impact than changing the type of carbohydrate supplied, since the 0% sucrose treatment involves a complete withdrawal of carbon supply for the seedlings.

The phenotypes developed by the seedlings during their germination on the different media are consistent with the suggestion that larger and more widespread transcriptional changes are associated with developmental transformations than with a change in carbohydrate source (Figure 17). Only minor morphological differences were observed in P. taeda somatic embryos germinated on 2% sucrose or on 2% maltose at either time point. In contrast, substantial differences were observed between embryos germinated on 2% sucrose and those on 0% sucrose. The most striking differences were the lack of roots, extended hypocotyls and small epicotyls of the 0% sucrose plants. Likewise, if we compare the 2% sucrose germinants at the 7 day and 10 day sample points, major changes in root and epicotyl development, as well as pigment formation, are obvious.
It seems likely that such drastic physical changes will be reflected in transcriptional changes that are detectable even by indirect microarray comparison methodology.

[Figure 12: histogram; x-axis: parametric p-values (0.0 to 1.0); y-axis: frequency; annotation: 70.96 % Diff Exp]
Figure 12. Frequency of various parametric p-value categories for all array elements and all slides. The estimated portion of differentially expressed genes (% Diff Exp) is the proportion of genes that fall above the blue line, which is a q-value-derived estimate of null p-values.

Table 5. Estimated percent of differentially expressed ESTs in each microarray comparison. Highlighted cells represent: direct comparisons (black cells), indirect comparisons with no ESTs predicted to be differentially expressed (yellow cells), and indirect comparisons with ESTs predicted to be differentially expressed (white cells). "na" refers to indirect comparisons that were not processed.

 | des emb | 2% suc 7d | 2% malt 7d | 2% sorb 7d | 0% suc 7d | 2% suc 10d | 2% malt 10d | 2% sorb 10d | 0% suc 10d
des emb | - | | | | | | | |
2% suc 7d | 48.8% | - | | | | | | |
2% malt 7d | 45.0% | 0.0% | - | | | | | |
2% sorb 7d | 50.6% | 0.0% | 0.0% | - | | | | |
0% suc 7d | 41.9% | 20.7% | 20.6% | 23.4% | - | | | |
2% suc 10d | 52.0% | 35.7% | na | na | na | - | | |
2% malt 10d | 46.0% | na | 2.2% | na | na | 0.0% | - | |
2% sorb 10d | 48.0% | na | na | 0.0% | na | 0.0% | 0.0% | - |
0% suc 10d | 59.8% | na | na | na | 35.7% | 11.6% | 15.1% | 6.0% | -

[Figure 13: histogram; x-axis: parametric p-values (0.0 to 1.0); annotation: 48.78 % Diff Exp]
Figure 13. Histogram of p-value distribution of elements on replicated slides for the direct comparison: 2% sucrose 7 day vs. desiccated embryo. The estimated portion of differentially expressed genes (% Diff Exp) is the proportion of genes that fall above the blue line, which is a q-value-derived estimate of null p-values.

[Figure 14: histogram; x-axis: parametric p-values]
Figure 14. Histogram of p-value distribution of elements on replicated slides for the indirect comparison: 0% sucrose 7 day vs. 2% maltose 7 day. The estimated portion of differentially expressed genes (% Diff Exp) is the proportion of genes that fall above the blue line, which is a q-value-derived estimate of null p-values.

[Figure 15: histogram; x-axis: parametric p-values (0.0 to 1.0)]
Figure 15. Histogram of p-value distribution of elements on replicated slides for the indirect comparison: 2% sucrose 7 day vs. 0% sucrose 7 day. The estimated portion of differentially expressed genes (% Diff Exp) is the proportion of genes that fall above the blue line, which is a q-value-derived estimate of null p-values.

[Figure 16: histogram; x-axis: parametric p-values (0.0 to 1.0)]
Figure 16. Histogram of p-value distribution of elements on replicated slides for the indirect comparison: 2% sucrose 7 day vs. 2% sucrose 10 day. The estimated portion of differentially expressed genes (% Diff Exp) is the proportion of genes that fall above the blue line, which is a q-value-derived estimate of null p-values.

[Figure 17: photographs; panels: desiccated embryo, 2% sorbitol 7 day, 0% sugar 7 day, 2% sucrose 7 day, 2% sucrose 10 day, 2% maltose 7 day, 2% maltose 10 day, 2% sorbitol 10 day, 0% sugar 10 day]
Figure 17. Representative germinants harvested for each carbohydrate treatment time point. Millimetre scale bar shown.

There are almost certainly transcriptional differences between the various carbohydrate supply treatments at any given time point, but the adoption of a hybridization design intended to allow experimental flexibility resulted in reduced statistical power for these comparisons. Measurable phenotypic differences between the four nutrient treatments are apparent, with the 0% sucrose germinants showing the greatest perturbation in development. The other three carbohydrate treatments showed more modest differences.
For example, the 2% sorbitol germinants appeared healthy but were approximately half the size of their 2% sucrose or 2% maltose counterparts, while the 2% maltose germinants were visibly different from the 2% sucrose specimens in terms of hypocotyl and root length. I would anticipate that physiological differences would accompany these obvious morphological differences; however, a more focused microarray investigation designed to generate increased statistical power through direct comparisons between carbohydrate treatments would be required in order to detect more subtle transcriptional changes associated with these same phenotypic differences.

6.2 Data clustering

6.2.1 Cluster selection and analysis

Since approximately 70% of the P. taeda transcriptome was estimated to be differentially expressed at some point during somatic embryo germination, it was considered unrealistic to attempt a detailed investigation of such a large gene set. Therefore, only those genes whose expression differentials possessed the lowest p-values (i.e. were statistically most reliable) were chosen for further study. The differentially expressed ESTs were ranked in order of overall p-value and then filtered based on annotated similarity to Arabidopsis genes, using an e-value cutoff of e < 10^-5. The 1000 highest-ranked ESTs were arbitrarily chosen for clustering and ontology analysis. This set possessed p-values between 1.5 × 10^-19 and 1.4 × 10^-8. Clustering of this 1000-gene set based on similarity of expression profile was performed using Genesis hierarchical clustering software (Sturn et al., 2002) (Figure 18). The hierarchical clustering method was used rather than K-means methods because the latter require that an arbitrary number of outcome clusters be assigned in advance, a process that can artificially group data (Slonim, 2002). The colored panel at the left of Figure 18 is a heat-map representation of the expression data for each direct microarray comparison (columns).
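Because the expression ratios in this study are Ln-transformed while fold changes are quoted in signed form (e.g. 1.1 and -1.1), it can help to keep the conversion between the two scales in view when reading the heat map. The following is a minimal sketch of that conversion; the function names and the signed-fold convention (a negative value meaning down-regulation by that factor) are assumptions made here for illustration.

```python
import math


def fold_to_ln_ratio(fold_change):
    """Signed fold change -> natural-log expression ratio.

    A fold change of -50 is read as 50-fold down-regulation,
    i.e. a ratio of 1/50.
    """
    if abs(fold_change) < 1:
        raise ValueError("signed fold changes have magnitude >= 1")
    magnitude = math.log(abs(fold_change))
    return magnitude if fold_change > 0 else -magnitude


def ln_ratio_to_fold(ln_ratio):
    """Natural-log expression ratio -> signed fold change."""
    ratio = math.exp(ln_ratio)
    return ratio if ratio >= 1 else -1.0 / ratio


# A 50-fold induction corresponds to ln(50), roughly 3.91 on the Ln scale.
ln_for_50_fold = fold_to_ln_ratio(50)

# An Ln ratio of zero means equal expression in both samples.
no_change = ln_ratio_to_fold(0.0)
```

On this scale equal up- and down-regulation are symmetric about zero, which is why a single color gradient centered on black can represent both directions.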
Expression intensities for each EST (rows) are depicted as a color spectrum ranging from red to black to green, where deepest red represents a Ln expression ratio of 3.91 or more (i.e. >50-fold up-regulated), black indicates equal expression in both samples, and green represents a Ln expression ratio of -3.91 or lower (i.e. >50-fold down-regulated).

The 1000-gene heat map yielded ten sub-clusters that were defined based on expression patterns that were biologically interesting either in terms of the timing of differential expression, or the varying degrees of expression induction. Heat-map images of these ten sub-clusters are shown in the center column of Figure 18, and graphical representations of the average expression profile (pink lines) of each cluster are presented at the right. The x-axis of these profile graphs presents the comparisons made, while the y-axis shows the Ln-transformed expression ratio for each comparison. The data points have been connected with lines in order to highlight the underlying expression trends within each sub-cluster. The identity and predicted annotation of the genes within these clusters are summarized in Table 6.

Few genes appeared to be differentially expressed between various treatments (i.e. different medium carbon source) at a given time point, but consistent temporal expression patterns could be detected. Interestingly, across most of the EST sub-clusters, the 0% sucrose 7 day vs. desiccated embryo comparison consistently appears as an outlier. This is probably a reflection of the severe impact of the lack of any exogenous carbon source at a time when the germinating seedling is facing high metabolic demands. This impact is also reflected in the 0% sucrose germinant morphological phenotype (Figure 17). In general, the 0% sucrose germinants displayed delayed gene expression changes compared to the other carbohydrate-treated germinants, i.e. many transcripts have lower expression differentials for the 0% sucrose vs.
reference comparison at the 7 day sample point, compared to their behavior in the other three treatment classes.

[Figure 18: heat map (left), sub-cluster panels labelled by cluster number (middle), average expression profiles (right)]
Figure 18. Hierarchical clustering of microarray results. Heat-map representation of the expression differentials for the 1000 lowest p-value differentially expressed ESTs (left), magnified images of the ten selected sub-clusters (middle), and the averaged expression profiles of the transcripts within each cluster (right). Bright green cells represent 50-fold down-regulation, bright red cells represent 50-fold up-regulation, and black cells represent no change.

By the 10 day time point, however, the scale of differential expression for the 0% sucrose-treated samples has returned to levels approximately equal to those found in the 2% sucrose-treated samples. This trend is evident in all the clusters except clusters 4, 7, and 10 (Figure 18). One likely explanation for this lag in induction is that the 0% sucrose embryos are developmentally delayed because the minimal lipids and carbohydrates stored within the somatic embryo are not sufficient to fuel vigorous post-imbibition growth. While somatic embryos recapitulate many aspects of zygotic embryo development (Pullman et al., 2003c), they do not have access to the energy-rich seed tissues that support post-imbibition growth in their zygotic counterparts. In a manner analogous to zygotic embryos drawing on seed reserves, somatic embryos imbibed on 2% sucrose- or 2% maltose-containing medium can implement the germination program immediately following imbibition, due to the abundance of supplemental energy reserves in the form of the added carbohydrate in their growth medium.
I would therefore hypothesize that the decreased transcriptional activity in the 0% sucrose-treated germinants is primarily due to a nutrient-dependent delay in metabolism. Once the 0% sucrose germinants reach an autotrophic state, supplementary carbohydrate is no longer limiting, and gene expression in these germinants can return to levels similar to those observed in 2% sucrose-treated germinants.

Table 6. ESTs contained in each of the ten selected sub-clusters. Sub-tables include: Cluster number, EST identifier, Arabidopsis gene identifier, and predicted function.

EST ID | AGI # | Function

Cluster 1
WS0263_F15 | At4g21110 | G10 family protein
WS00112_G14 | At1g62290 | aspartyl protease family protein
WS00113_D16 | At5g08370 | alpha-galactosidase
WS00112_F15 | At3g54210 | ribosomal protein L17 family protein
WS01012_C09 | At5g60220 | senescence-associated family protein

Cluster 2
WS00810_E02 | At3g57450 | expressed protein
WS0079_E11 | At4g19050 | mob1/phocein family protein
WS00923_H20 | At5g06260 | nucleolar protein-related
WS01032_F09 | At2g24762 | expressed protein
WS00930_O04 | At3g19540 | expressed protein
WS00928_H13 | At2g02990 | ribonuclease 1 (RNS1)
WS01035_G12 | At5g64440 | amidase family protein
WS00729_F23 | At1g20510 | 4-coumarate--CoA ligase family protein
WS0014_E19 | At5g47280 | disease resistance protein
WS00932_L05 | At1g22410 | 2-dehydro-3-deoxyphosphoheptonate aldolase
WS0106_D03 | At4g20930 | 3-hydroxyisobutyrate dehydrogenase

Cluster 3
WS0016_J04 | At1g10600 | mov34 family protein
WS01034_M15 | At4g38620 | myb family transcription factor (MYB4)
WS0265_N02 | At5g21274 | calmodulin-6 (CAM6)
WS00712_F09 | At5g02790 | In2-1 protein
WS00925_D24 | At3g63530 | zinc finger (C3HC4-type RING finger) family protein

Cluster 4
WS02610_G13 | At3g13490 | tRNA synthetase class II (D, K and N) family protein
WS0072_J04 | At5g45110 | ankyrin repeat family protein / BTB/POZ domain-containing protein
WS0105_F15 | At1g51580 | KH domain-containing protein
WS02610_K07 | At4g17670 | senescence-associated protein-related
WS0017_E05 | At5g41685 | mitochondrial import receptor subunit TOM7
WS0016_E09 | At5g28840 | NAD-dependent epimerase/GDP-mannose 3,5-epimerase
WS0087_J23 | At5g59720 | 18.1 kDa class I heat shock protein (HSP18.1-CI)
WS00924_H19 | At3g21290 | dentin sialophosphoprotein-related
WS0063_N13 | At1g23740 | oxidoreductase, zinc-binding dehydrogenase family protein
WS0063_N13 | At5g03290 | isocitrate dehydrogenase
WS0082_C17 | At4g10110 | RNA recognition motif (RRM)-containing protein
WS0105_G17 | At4g14960 | tubulin alpha-6 chain (TUA6)
WS0074_L12 | At3g56490 | zinc-binding protein

Cluster 5
WS00912_C11 | At1g17160 | pfkB-type carbohydrate kinase family protein
WS0084_J08 | At1g32080 | membrane protein
WS00923_E06 | At1g07160 | protein phosphatase 2C
WS0264_H22 | At3g13550 | ubiquitin-conjugating enzyme (COP10)

Cluster 6
WS00929_J15 | At4g29160 | SNF7 family protein
WS0074_P22 | At3g43540 | expressed protein hypothetical protein slr1699
WS0042_O07 | At4g30840 | WD-40 repeat protein family
WS00727J319 | At2g13610 | ABC transporter family protein
WS00729_N04 | At3g20680 | expressed protein
WS00945_E20 | At2g34510 | expressed protein

Cluster 7
WS00916_A24 | At5g20160 | ribosomal protein L7Ae/L30e/S12e/Gadd45 family protein
WS00820_N22 | At5g60390 | elongation factor 1-alpha
WS0071_N15 | At4g04350 | leucyl-tRNA synthetase
WS00730_J08 | At2g22070 | pentatricopeptide (PPR) repeat-containing protein
WS00112_G14 | At1g62290 | aspartyl protease family protein
WS00930_N09 | At2g01630 | glycosyl hydrolase family 17 protein
WS00920_L03 | At4g02230 | 60S ribosomal protein L19
WS00920_L03 | At3g47580 | leucine-rich repeat transmembrane protein kinase
WS0063_C08 | At3g08690 | ubiquitin-conjugating enzyme 11 (UBC11)
WS01016_F13 | At2g45560 | cytochrome P450 family protein
WS00945_E03 | At3g48280 | cytochrome P450, putative
WS0072_O22 | At3g18560 | expressed protein
WS00723_C15 | At3g02650 | pentatricopeptide (PPR)
WS02612_B04 | At2g24530 | expressed protein
WS00713_H21 | At5g02120 | thylakoid membrane one helix protein (OHP)
WS0019_M23 | At1g55850 | cellulose synthase family protein
WS0093_P01 | At1g71695 | peroxidase 12 (PER12) (P12) (PRXR6)
WS01016_K15 | At3g48080 | lipase class 3 family protein
WS01028_H15 | At5g44550 | integral membrane family protein

Cluster 8
WS0075_D19 | At4g02440 | F-box family protein to circadian clock coupling factor
WS01011_A19 | At2g46760 | FAD-binding domain-containing protein
WS0053_P16 | At5g41190 | expressed protein
WS00912_J13 | At5g61330 | rRNA processing protein-related
WS0058_G16 | At4g01200 | C2 domain-containing protein
WS00716_G05 | At3g13600 | calmodulin-binding family protein
WS00930_B15 | At5g12030 | 17.7 kDa class II heat shock protein 17.6A
WS0041_D06 | At1g71865 | expressed protein
WS00929_E08 | At3g10670 | ABC transporter family protein

Cluster 9
WS00926_J05 | At3g12050 | Aha1 domain-containing protein
WS0024_F06 | At1g77380 | amino acid carrier
WS00813_N15 | At1g65960 | glutamate decarboxylase 2 (GAD 2)
WS00929_N09 | At1g69740 | porphobilinogen synthase
WS0023_P05 | At4g36130 | 60S ribosomal protein L8 (RPL8C)
WS00910_L08 | At5g09430 | hydrolase, alpha/beta fold family protein

Cluster 10
WS01017_F22 | At2g35120 | glycine cleavage system H protein
WS00941_P01 | At1g03100 | pentatricopeptide (PPR) repeat-containing protein
WS01039_A16 | At5g53190 | nodulin MtN3 family protein
WS0017_C22 | At3g60245 | 60S ribosomal protein

6.2.2 Cluster descriptions

Cluster 1 is one of the most striking in terms of transcriptional induction. The five transcripts represented in this cluster are all at least 50-fold up-regulated in all treated samples, compared to the desiccated embryo reference. In the context of somatic embryo germination, these transcripts are highly up-regulated upon imbibition, and remain highly expressed throughout the germination process. These genes displayed the same trend across all carbohydrate treatment classes, although in the 0% sucrose germinants they show a slight delay in reaching maximal expression levels compared to the other three treatments. Of these strongly responsive ESTs, three are annotated to Arabidopsis genes previously shown to be preferentially up-regulated in germinating seeds and young seedling tissues: an aspartyl protease family protein homologue (At1g66290), a ribosomal protein L17 family protein homologue (At3g54210), and an alpha-galactosidase homologue (At5g08370).

Aspartyl protease family proteins (APs) have been characterized in many plant species, including conifers (Simoes and Faro, 2004). They are highly abundant in Glycine max (soy) seed, where they are hypothesized to be involved in processing and modifying storage proteins, a metabolic process central to seed germination and early seedling growth (Terauchi et al., 2004). Interestingly, AtPCS1, one of the approximately 60 APs in Arabidopsis, was recently found to have elevated mRNA and protein levels in mature seed, and shown to be involved in programmed cell death-driven embryo development (Ge et al., 2005). Genevestigator meta-analysis supports these findings, since the data indicate that AtPCS1 mRNA levels are similar in both dry and imbibed Arabidopsis seeds (Zimmermann et al., 2004).
The aspartyl protease family protein (At1g66290) that I observed could also be involved in P. taeda embryo remodeling during germination. Expression meta-analysis also showed that ribosomal protein L17 family protein expression peaks during early seedling growth in Arabidopsis, and tapers off as the seedling develops into a mature plant. Finally, the profile of P. taeda alpha-galactosidase expression (to be discussed later in detail) shows a developmental pattern congruent with those found in other systems, most notably Lycopersicon esculentum (tomato) (Feurtado et al., 2001). Tomato alpha-galactosidase is strongly up-regulated during post-imbibition growth, an expression profile correlated with the degradation of raffinose, a key desiccation tolerance-enhancing carbohydrate and a post-imbibition nutrient source for numerous seed plant species (Feurtado et al., 2001).

Clusters 2 and 3 have related but distinct expression profiles. All transcripts within these clusters are highly up-regulated in the four carbohydrate treatment classes. However, unlike Cluster 1, where transcription is rapidly induced and remains high at the later time points, there are temporal fluctuations in the Cluster 2 and 3 expression profiles. In Cluster 2, differential expression appears to be maximal at the 7 day sample point (approximately 7.5-fold) for all treatments and falls to lower levels (approximately 2.5-fold) at 10 days. Conversely, ESTs in Cluster 3 showed the highest differential expression at the 10 day time point, with lower values at the 7 day point. Cluster 2 ESTs included homologues of 3-deoxy-D-arabino-heptulosonate-7-phosphate synthase (DAHP synthase) (At1g22410) and 4-coumarate-CoA ligase (4CL) (At1g20510). DAHP synthase catalyzes the first step of the shikimate pathway, which is responsible for the biosynthesis of the essential aromatic amino acids tryptophan, tyrosine, and phenylalanine (Herrmann and Weaver, 1999).
Phenylalanine can then be metabolized via the phenylpropanoid pathway and its branches to form numerous phenolic compounds including flavonoids, lignin, and other phenylpropanoid metabolites (Herrmann, 1995). The enzyme 4CL catalyzes a key step in the core phenylpropanoid pathway (Cukovic et al., 2001). In the context of somatic embryo germination, induction of phenylpropanoid metabolism is important for many facets of early development, from the formation of UV protectants and antimicrobial agents, to the early lignification of cell walls in developing seedlings (Douglas, 1996; Hemm et al., 2001; Dicko et al., 2005).

Another interesting transcript represented in Cluster 3 is annotated as a MYB4 transcription factor homologue (At4g38620). Although there are numerous MYB-type transcription factors found in plants (Martin and Paz-Ares, 1997), MYB4 is of particular significance because of its demonstrated involvement in the regulation of phenylpropanoid biosynthesis in P. taeda (Hemm et al., 2001; Stracke et al., 2001; Patzlaff et al., 2003). The possibility that MYB4 may also act as an intermediate signaling component between sugar perception and metabolic regulation (Lu et al., 2002; Patzlaff et al., 2003; Karpinska et al., 2004) makes it an especially interesting gene with respect to early plant development, and to somatic embryogenesis, because of the regulatory role sugars are believed to play in seed germination (Leon and Sheen, 2003; Rogers et al., 2005).

The ESTs found in Cluster 4 exhibit an expression profile similar to those of ESTs in Cluster 3, although the magnitudes of the expression differentials are reduced, in some cases by a factor of 10 (data not shown). Transcripts annotated as homologues of isocitrate dehydrogenase (At5g03290) and NAD-dependent epimerase / GDP-mannose 3,5-epimerase (At5g28840) are among the 14 ESTs grouped in Cluster 4. Isocitrate dehydrogenase is an enzyme with a central role in the TCA cycle (Behal and Oliver, 1998).
During embryo imbibition, much of the input carbon for this pathway can be derived from storage lipids, which are converted to succinate via the glyoxylate cycle (Cornah et al., 2004; Penfield et al., 2005). Succinate can then be further metabolized through the TCA cycle and gluconeogenesis, to provide polysaccharide cell wall precursors and other metabolites for the developing germinant (Fernie et al., 2004). GDP-mannose 3,5-epimerase catalyzes an early step in the biosynthesis of ascorbic acid (vitamin C), which is a crucial antioxidant for plants (Wolucka et al., 2005). Substantial levels of vitamin C accumulate (from 200 nmol/gFW to 1000 nmol/gFW) in germinating Pinus embryos as early as 48 hours following imbibition (Tommasi et al., 2001), while similar work by Stasolla and Yeung (2001) found vitamin C to accumulate in Picea glauca somatic embryos almost immediately following imbibition. These authors proposed that the rapid mobilization and oxidation of nutrient reserves within the embryo generates an abundance of potentially damaging reactive oxygen species, and that the plant responds by producing antioxidant compounds such as vitamin C to scavenge the free radicals. My microarray data suggest that vitamin C synthesis may continue at high levels throughout the germination process.

In contrast to the clusters mentioned above, Cluster 8 contains nine ESTs that are significantly down-regulated (5- to 30-fold) upon imbibition. This is an intriguing pattern because it indicates that the transcripts in question are substantially more abundant in the desiccated embryo reference than in the respective treated samples. Such stored mRNAs were first discovered in dormant cotton seeds decades ago (Dure and Waters, 1965) and have since been characterized in numerous angiosperms, including sunflower (Almoguera and Jordano, 1992) and wheat (Lane, 1991).
They have been suggested to represent an archive of transcripts produced during late embryo maturation that are necessary to initiate germination upon imbibition. Nakabayashi et al. (2005) explored this phenomenon in Arabidopsis on a genome-wide scale, and found that large groups of genes were apparently down-regulated within 6 hours of imbibition, since their levels in the seedling were lower than the levels detected in a dormant seed reference, a pattern analogous to that displayed by the genes in Cluster 8. When the nine genes defined in Cluster 8 were compared to the Nakabayashi et al. (2005) Arabidopsis dataset, I found that four of the nine were common to both lists. These four common transcripts were annotated as: circadian clock coupling factor protein (At4g02440), FAD-binding domain-containing protein (At2g46760), 17.7 kDa class II heat shock protein (At5g12030), and ABC transporter family protein (At3g10670). The circadian clock coupling factor protein is encoded by a gene named ZGT, originally characterized in tobacco. ZGT was predicted to be controlled by the plant's circadian clock because of its regular oscillating and highly light-inducible expression patterns (Xu, 2001). By analogy, the P. taeda circadian clock coupling factor protein homologue mRNA may be stored in the desiccated somatic embryo so that the translated protein is available immediately following imbibition to aid in relaying circadian signals. The annotation "FAD-binding domain-containing protein" links this gymnosperm gene to a large family of FAD-binding domain-containing proteins that appear to have various functions, including associations with early development of Arabidopsis (Mushegian and Koonin, 1995), mammalian cytochrome P450 reductase activity (Hodgson and Strobel, 1996) and blue light sensory functions in cyanobacteria (Kita et al., 2005).
Genevestigator meta-analysis revealed that, in Arabidopsis, At2g46760 is most highly expressed in tissues undergoing rapid elongation, namely roots and hypocotyls. This is consistent with storage of the corresponding mRNA in the desiccated embryo, which would allow the embryonic tissues to elongate rapidly soon after imbibition. However, its exact biochemical role remains unknown. The situation with the 17.7 kDa class II heat shock protein (At5g12030) mRNA may be somewhat different, since the encoded protein itself has been found to accumulate to high levels in mature Arabidopsis seed, where it appears to increase desiccation tolerance. The protein is then normally degraded during germination (Wehmeyer and Vierling, 2000). The accumulation of heat shock proteins would presumably require substantial transcriptional activity during the later stages of embryo maturation, so the high levels of WS01011_A19 message detected in the desiccated embryo likely represent residual mRNA. ABC transporters form another large family of genes in plants, with over 100 members predicted in the Arabidopsis genome. They support a diversity of presumed functions, ranging from chlorophyll biosynthesis to ion flux and cell detoxification (Martinoia et al., 2002). Most ABC transporter proteins are expressed in all tissues with little indication of developmental specialization (Martinoia et al., 2002), but one exception to this pattern is AtPGP1 (Sidler et al., 1998). AtPGP1pro:GUS promoter reporter experiments found PGP1 expression to be localized to shoot and root tips, while a PGP1 over-expression plant displayed abnormally long roots and hypocotyls without an associated change in cell number, suggesting that PGP1 plays a role in cell elongation. The conifer ABC transporter family protein homologue (WS00929_E08) found to disappear from desiccated P.
taeda somatic embryos during imbibition may be a critical cell elongation factor that is transcribed during late embryo maturation, stored, and then translated immediately once germination is initiated.

6.3 Raffinose synthesis and degradation

6.3.1 Raffinose and seed biology

The appearance of strongly up-regulated alpha-galactosidase transcripts in germinating P. taeda somatic embryos highlights the role of raffinose metabolism in seed physiology. Raffinose is a trisaccharide that has long been known to accumulate in particular plant tissues, a pattern that has been associated with increased desiccation tolerance (Koster, 1988; Black et al., 1999; Downie, 2000; Zuther et al., 2004). Because dormancy and successful germination require the appropriate desiccation of a seed, the longevity of seed stocks has also been linked to the accumulation of raffinose (Black et al., 1999), whose concentration can reach up to 35% of the dry weight of dormant seeds following desiccation (Koster, 1988). Indeed, raffinose has been estimated to be one of the most abundant non-structural carbohydrates in plants (Downie, 2000). Raffinose is believed to contribute to the desiccation tolerance of seeds by gradually replacing the water within plant cells, thereby stabilizing cell membranes and preventing cellular damage during drying (Bomal et al., 2002). It has been suggested that accumulated raffinose provides a protective hydrophilic layer between cell membranes during dehydration-induced lipid bi-layer folding, and that this layer minimizes the damaging effects of membrane-to-membrane contact (Koster, 1988). Elevated raffinose levels also increase the viscosity of the cytosol, restricting molecular movement and slowing cytosol leakage (Xiao and Koster, 2001), and the formation of amorphous raffinose glass within the cell prevents the intracellular crystallization of simpler sugars such as sucrose (Xiao and Koster, 2001).
For these reasons raffinose is often used as a cryoprotectant for mammalian tissue collections (Storey et al., 1998). Because of this close link between raffinose accumulation and seed desiccation tolerance and dormancy, seed biologists have focused considerable attention on raffinose metabolism. Raffinose biosynthesis has been found to require a two-step biochemical pathway involving the reactions shown below (Lehle and Tanner, 1973).

UDP-galactose + myo-inositol →(GS) galactinol + sucrose →(RS) raffinose + (myo-inositol)

The first committed step is catalyzed by galactinol synthase (GS) (Saravitz, 1987), while the second step requires raffinose synthase (RS) (Peterbauer et al., 2002). GS mRNA abundance increases during tomato seed maturation, with an abrupt decline during seed germination (Downie et al., 2003). If, however, tomato seeds were imbibed on medium containing the osmotic agent polyethylene glycol instead of water, GS transcription remained at levels as high as those observed during seed maturation, consistent with the involvement of raffinose synthesis in desiccation tolerance. In a soybean seed maturation time-course study, a substantial increase in GS activity was positively correlated with the accumulation of raffinose, and inversely related to the moisture content of the seed (Saravitz, 1987). Similar results were found in lupin (Pinheiro et al., 2005). GS transcription and raffinose accumulation were found to be correlated in GS over-expression Arabidopsis plants, which displayed a 3-fold increase in raffinose abundance compared to wild-type (Zuther et al., 2004). RS activity was found to track that of GS during a time-course study of pea seed maturation, where mRNA levels of both GS and RS were found to peak during late maturation (Peterbauer et al., 2001). Within the same time scale as the mRNA induction, raffinose was observed to accumulate while sucrose and galactinol were depleted.
Raffinose has also been shown to accumulate in cold-treated plants (Lui, 1998b), and GS activity is also enhanced. This is consistent with a second role for raffinose, as a cryoprotectant in plant species such as Arabidopsis. Raffinose accumulation in tissues of cold-treated plants helps prevent the membrane damage caused by ice crystal formation (Zuther et al., 2004). In conifers, raffinose abundance in Pinus and Picea needles has been used as an indirect indicator of frost hardiness (Sundblad et al., 2001). Following imbibition of a seed, the raffinose that had accumulated during the seed/embryo maturation phase is no longer required and it is therefore catabolized by alpha-galactosidase to form sucrose and galactose. Raffinose thus also serves as a seed carbohydrate reserve (Feurtado et al., 2001; Marraccini et al., 2005). In general, a marked increase in alpha-galactosidase mRNA levels and enzyme activity is observed within 72 hours of imbibition, and this increase is correlated with decreases in the raffinose concentration in the same tissues (Feurtado et al., 2001; Guimaraes et al., 2001). This trend has been reported for germinating Picea seed as well, although induction took place over a somewhat longer time scale (Downie, 2000). Following imbibition, sucrose abundance and raffinose abundance were seen to be inversely correlated in these seeds, substantiating the hypothesis that seed stores of raffinose can serve as an early source of sucrose for the germinating embryo.

6.3.2 Raffinose synthesis and degradation related transcripts

The majority of the transcripts identified within the cluster analysis demonstrated similar expression patterns across the various carbohydrate-treated samples. Therefore, to avoid extensive redundancy, only the data derived from the 2% sucrose-treated germinants were examined in depth.
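Expression changes in this chapter are quoted both as fold changes (for example 2.2-fold or 7.7-fold) and as Ln-transformed ratios relative to the desiccated embryo reference. The following is a minimal sketch of how the two scales relate, assuming that "Ln" denotes the natural logarithm and that an N-fold down-regulation corresponds to a ratio of 1/N. The function names are hypothetical and are not part of the analysis pipeline used in this study.

```python
import math


def fold_to_ln_ratio(fold: float, direction: str = "up") -> float:
    """Convert a positive fold change into a signed natural-log ratio.

    Assumption: an N-fold down-regulation is a ratio of 1/N, so its
    Ln ratio is -ln(N).
    """
    if fold <= 0:
        raise ValueError("fold change must be positive")
    if direction not in ("up", "down"):
        raise ValueError("direction must be 'up' or 'down'")
    ln_ratio = math.log(fold)
    return ln_ratio if direction == "up" else -ln_ratio


def ln_ratio_to_fold(ln_ratio: float) -> tuple[float, str]:
    """Convert a signed natural-log ratio back to (fold change, direction)."""
    direction = "up" if ln_ratio >= 0 else "down"
    return math.exp(abs(ln_ratio)), direction


if __name__ == "__main__":
    # Fold changes quoted in the text of this chapter.
    for fold, direction in [(2.2, "up"), (7.7, "down"), (5, "down"), (30, "down")]:
        print(fold, direction, round(fold_to_ln_ratio(fold, direction), 2))
    # Fold change equivalent to an Ln ratio of 0.6.
    print(round(ln_ratio_to_fold(0.6)[0], 2))
```

Under these assumptions a 2.2-fold up-regulation corresponds to an Ln ratio of about 0.79, and an Ln ratio of 0.6 corresponds to a change of roughly 1.8-fold.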
One of the most highly differentially expressed ESTs (WS00113_D16) in my germination microarray study was a transcript annotated as an alpha-galactosidase (At5g08370). A search of the remainder of the microarray data set for other similarly annotated ESTs retrieved 12 ESTs having substantial similarity to Arabidopsis alpha-galactosidase. ESTs corresponding to the two main raffinose biosynthesis enzymes were also identified, with four ESTs predicted to encode GS, and one for RS. In Figure 19, these ESTs and expression ratios are mapped onto a schematic drawing of the raffinose biosynthesis and degradation pathways, along with the related enzymes. In general, GS-encoding transcripts were not observed to be differentially regulated at the 7-day and 10-day time points. This was expected since a majority of the raffinose is synthesized during embryo maturation, rather than following imbibition during germination (Karner et al., 2004). However, expression of the GS-encoding EST, WS00919_C07 (At2g47180), was anomalous in displaying a 2.2-fold up-regulation at 10 days. Raffinose accumulation has also been associated with increased cold tolerance in Arabidopsis leaves and conifer needles (Sundblad et al., 2001; Zuther et al., 2004). Increased GS expression may therefore begin to reappear as the P. taeda seedling develops, greens, and starts to form needles (Lui, 1998a; Downie, 2000; Zuther et al., 2004), or it may reflect the temperature transition from the 7-day samples grown at 30°C to the 10-day samples grown at 22°C in this germination protocol. Although this is a modest temperature shift, P. taeda is a sub-tropical conifer species that is not particularly frost tolerant, and it may therefore have an increased sensitivity to 'cold' temperatures. RS transcript accumulation showed a pattern similar to that of GS, with expression levels remaining unchanged at both time points following imbibition.
This is consistent with the fact that most raffinose is hydrolyzed, rather than synthesized, during post-imbibition growth (Downie, 2000). If a secondary induction of raffinose synthesis is initiated upon the onset of photosynthetic tissue development or exposure to cold, as is suggested by post-imbibition induction of GS WS00919_C07 (At2g47180), one might expect RS up-regulation to follow, unless basal RS levels are sufficient. Analysis of raffinose accumulation and RS transcript abundance in photosynthetic or cold-treated tissues at later time points would help resolve this question. As was mentioned earlier, 12 significantly differentially expressed ESTs from my experiment were annotated as alpha-galactosidase. Four of the twelve ESTs are up-regulated at both 7 days and 10 days, three are down-regulated at 7 days and 10 days, and the remaining five ESTs display relatively little change. Clearly, there are multiple roles being played by the members of the alpha-galactosidase (AG) gene family at this juncture, but the inherent limitations associated with P. taeda genomics research make it difficult to draw any firm conclusions. Because the number of alpha-galactosidase genes in the P. taeda genome is unknown, it is impossible to determine whether the 12 significantly expressed ESTs represent 12 discrete genes, or multiple ESTs / transcripts from a handful of genes. Either a genome sequence, or full-length cDNA clones corresponding to these ESTs, would be required to confidently distinguish between unique isoforms of alpha-galactosidase and possible alternatively spliced transcripts. As noted above, some P. taeda alpha-galactosidase genes show significant up-regulation following imbibition. This observation is consistent with the fact that raffinose can accumulate to high levels in desiccated embryos and that a sizable amount of this hydrolytic enzyme (alpha-galactosidase) may be required to degrade the raffinose upon imbibition.
However, a second class of alpha-galactosidase transcripts consists of those that are significantly down-regulated relative to the desiccated embryo reference. These messages would presumably represent stored mRNAs that were transcribed during late embryo maturation, as mentioned earlier. Once the embryo is metabolically active, this new set of alpha-galactosidase transcripts could be immediately translated to further amplify the consumption of raffinose. Another possible interpretation for the alpha-galactosidase transcripts apparently occurring in high abundance in the desiccated embryo is that these mRNAs are remnants of an intense alpha-galactosidase protein synthesis effort that took place prior to embryo desiccation. Conceivably, pre-synthesized alpha-galactosidase protein could also be stored in the P. taeda desiccated embryo, for immediate mobilization upon imbibition, as is believed to be the case in tomato (Feurtado et al., 2001). The remaining population of mRNAs that supported alpha-galactosidase protein production during embryo maturation would be degraded during imbibition, while the stored protein would begin to immediately hydrolyze stored raffinose. As the raffinose is catabolized, a second wave of alpha-galactosidase transcription and translation may then be initiated. Regardless of whether it is mRNA, or mRNA and protein, that is stored in the dormant embryo, the underlying hypothesis would be that an initial source of alpha-galactosidase enzyme is made immediately available during imbibition, and that this is supplemented by a second surge of alpha-galactosidase enzyme production during post-imbibition growth. My findings are similar to those described by Guimaraes et al. (2001) in their Glycine max seed system, where an initial low level of alpha-galactosidase activity began to degrade the raffinose reserves immediately upon imbibition. This action was then presumably supplemented by newly synthesized enzyme.
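The pathway figures in this chapter colour each EST by its Ln expression ratio: red above 0.6, green below -0.6, and grey in between. The sketch below shows how such a three-way call could be made and tallied. It assumes that values exactly at the cutoff are treated as unchanged, which the legends do not specify, and the function names, EST labels and input ratios are illustrative placeholders rather than measurements attributed to particular ESTs.

```python
from collections import Counter


def classify_ln_ratio(ln_ratio: float, cutoff: float = 0.6) -> str:
    """Return the legend colour for a single Ln expression ratio.

    Assumption: ratios exactly equal to +/- cutoff are called grey.
    """
    if ln_ratio > cutoff:
        return "red"    # up-regulated relative to the desiccated embryo
    if ln_ratio < -cutoff:
        return "green"  # down-regulated relative to the desiccated embryo
    return "grey"       # little or no change


def summarize(ln_ratios: dict[str, float], cutoff: float = 0.6) -> Counter:
    """Count how many ESTs fall into each colour class."""
    return Counter(classify_ln_ratio(v, cutoff) for v in ln_ratios.values())


if __name__ == "__main__":
    # Hypothetical EST labels with illustrative Ln ratios.
    example = {"EST_A": 3.10, "EST_B": -2.19, "EST_C": -0.24, "EST_D": 0.06}
    for name, value in example.items():
        print(name, value, classify_ln_ratio(value))
    print(summarize(example))
```

A fixed symmetric cutoff is the simplest choice and mirrors the legends; it says nothing about statistical significance, which is handled separately in this study.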
[Figure 19 graphic: schematic of the pathway UDP-galactose + myo-inositol → galactinol + sucrose → raffinose + (myo-inositol) → sucrose + galactose, with enzyme boxes for the galactinol synthases, raffinose synthase and alpha-galactosidase. Each box lists its ESTs and predicted Arabidopsis homologues with expression values for the 2% sucrose 7-day and 10-day samples.]

Figure 19. Raffinose synthesis and degradation pathway. Shown are EST identifiers, predicted Arabidopsis homologues and accompanying expression data. Yellow boxes contain enzyme names, red cells represent ESTs with Ln_expression ratios > 0.6, green cells represent Ln_expression ratios < -0.6, and grey cells represent -0.6 < Ln_expression ratios < 0.6.

Two classes of alpha-galactosidase transcripts appear to be present in the example presented by Guimaraes et al. (2001). Some alpha-galactosidase transcripts were found at low levels in the desiccated embryo but at very high levels in the imbibed germinants, and other transcripts were highly expressed in the desiccated embryo but found at much lower levels in the imbibed germinants. Perhaps the wave-like pattern of alpha-galactosidase activity observed in germinating Glycine max seeds also occurs in P. taeda somatic embryos. Five ESTs annotated as alpha-galactosidase were found to be unchanged during the germination process. This could be explained in two ways. First, it is possible that certain isoforms of alpha-galactosidase could be constitutively expressed throughout germination.
Alternatively, it could be that these ESTs represent isoforms of alpha-galactosidase that are only differentially expressed at specific developmental stages that were not sampled in this study. A broader survey of P. taeda tissues and developmental stages could shed light on possible functions of these five ESTs. WS00822_A18 (At3g57520), annotated as an alkaline alpha-galactosidase, is up-regulated at both 7 days and 10 days following imbibition. This is of interest because an alkaline alpha-galactosidase has been reported to be preferentially active in barley embryos within 24 hours of imbibition (Carmi et al., 2003). WS00822_A18 (At3g57520) could therefore be a post-imbibition growth-specific alpha-galactosidase. However, again, a more detailed sequence-specific transcript profile would be needed to confirm this. Based on this preliminary microarray investigation and existing literature, it appears that raffinose is being synthesized and accumulated during P. taeda embryo maturation and desiccation. The largely unchanged expression of the genes involved in raffinose biosynthesis following imbibition suggests that neither enzyme is being significantly modulated during the germination process. On the other hand, the high transcriptional activity of alpha-galactosidase genes following imbibition is consistent with the model that raffinose is present in the desiccated embryo, and is then rapidly consumed upon re-hydration. A more detailed quantification of alpha-galactosidase transcript levels could better define the transcriptional behavior of the various alpha-galactosidase ESTs in P. taeda germinants. This exercise would be greatly facilitated by the development of full-length cDNA clones that would enable individual genes to be rigorously distinguished. A QRT-PCR-based study, with an increased number of sampling time-points, would also improve the resolution of the transcriptional profiles.
Sampling from the start of imbibition and then at short intervals thereafter should make it possible to resolve any developmental/temporal specificity of the various alpha-galactosidase clones. To help define the point at which raffinose biosynthesis is initiated, a QRT-PCR study of somatic embryo samples harvested prior to desiccation would prove instructive. My model predicts that GS transcription will ramp up during embryo maturation, as is the case in tomato seed (Downie et al., 2003), with RS-encoding transcripts following the same trajectory. Aside from possible temporal differences in the expression patterns of the various biosynthetic and metabolic enzymes, it is also probable that raffinose is being synthesized in a tissue-specific manner. A comparison of GS and RS transcript abundance in green and non-green tissues would be a good initial test of this idea. Similarly, a needle-development time-course experiment would provide insight into the regulation of raffinose biosynthesis in another tissue-specific context. It would also help resolve whether WS00919_C07 (At2g47180) (GS), which was 2.2-fold up-regulated at the 10-day sample point, displays needle-specific expression. To examine the relationship between gene transcription and protein/enzyme abundance, it would be useful to quantify protein abundance and activity changes of GS, RS, and alpha-galactosidase across the same expanded developmental series. Antibodies to the conifer proteins do not exist at this time, but assays for enzyme activity would be possible. Measurement of GS or alpha-galactosidase activities across a developmental time series from early embryo maturation to seedling maturity would be very informative, both in itself, and for comparison with data from tomato (Feurtado et al., 2001), pea (Peterbauer et al., 2001), and soy (Saravitz, 1987).
Finally, to gain a comprehensive understanding of the patterns of raffinose synthesis, accumulation and degradation, raffinose levels should be quantified during P. taeda somatic embryo maturation and germination, using established methods (Peterbauer et al., 2001; Karner et al., 2004). Ultimately, the goal of these experiments would be to correlate the expression of GS, RS, and alpha-galactosidase transcripts with the corresponding enzyme activities and in planta raffinose content throughout a developmental time series, thereby positioning the cycle of raffinose accumulation and degradation within the envelope of P. taeda seed physiology.

6.4 Phenylpropanoid biosynthesis

6.4.1 Shikimate pathway

Two of the most prominent transcripts highlighted in Cluster 2 were annotated as 3-deoxy-D-arabino-heptulosonate-7-phosphate synthase (DAHP synthase) (At1g22410), and 4-coumarate-CoA ligase family protein (4CL) (At1g20510). These two enzymes are of particular interest because of their involvement in linking primary and secondary metabolism, and because this pathway is responsible for synthesis of many metabolites of critical importance to conifer growth, development and survival. To develop a broader metabolic context for the expression responses of these genes during P. taeda somatic embryo germination, the expression profiles of all available transcripts corresponding to shikimate pathway or phenylpropanoid/flavonoid biosynthesis pathway-related enzymes were retrieved from the microarray dataset. Only transcripts with p-value < 0.05 were included in this analysis. For each predicted enzyme, the Ln-transformed expression ratio of the 2% sucrose-treated sample vs.
the desiccated embryo sample was mapped onto the shikimate and general phenylpropanoid pathways, as shown in Figures 20 and 21 (Douglas, 1996; Herrmann and Weaver, 1999), and onto a simplified version of the flavonoid biosynthesis pathway (Figure 22).

[Figure 20 graphic: schematic of the shikimate pathway from erythrose-4-phosphate through 3-deoxy-arabino-heptulosonate 7-phosphate, 3-dehydroquinate, 3-dehydroshikimate, shikimic acid, shikimate-3-phosphate, 5-enolpyruvyl-shikimate-3-phosphate and chorismate to tryptophan and, via prephenate and phenylpyruvate, to phenylalanine. Enzyme boxes (3-deoxy-D-arabino-heptulosonate-7-phosphate synthase, 3-dehydroquinate synthase, shikimate kinase, 3-phosphoshikimate 1-carboxyvinyltransferase, tryptophan synthase, chorismate mutase, prephenate dehydratase) list their ESTs and predicted Arabidopsis homologues with expression values for the 2% sucrose 7-day and 10-day samples.]

Figure 20. Shikimate pathway. Shown are EST identifiers, predicted Arabidopsis homologues and accompanying P. taeda expression data. Yellow boxes contain enzyme names, red cells represent ESTs with Ln_expression ratios > 0.6, green cells represent Ln_expression ratios < -0.6, and grey cells represent -0.6 < Ln_expression ratios < 0.6. Arrows indicate the biochemical reaction catalyzed by the noted enzyme.

The shikimate pathway is an essential pathway in all plants as it is responsible for the synthesis of tryptophan, tyrosine and phenylalanine (Srinivasan and Sprinson, 1958).
For this reason, this pathway has historically been an ideal target for herbicide development (Herrmann, 1995). However, as well as producing essential amino acids for protein biosynthesis, the shikimate pathway also generates the precursor of numerous phenolic secondary metabolites, including families of compounds such as alkaloids, polyphenolics, lignin, flavonoids, and the plant stress signaling metabolite, salicylic acid. The importance of this pathway is clear if we consider that an estimated 20% to 30% of the photosynthetic carbon assimilated by plants is channeled through the shikimate pathway (Macheroux et al., 1999). The conversion of phenylalanine into the numerous phenolic compounds found in plants begins with the general phenylpropanoid pathway (Figure 21) (Hahlbrock and Grisebach, 1979). The nominal product of this reaction sequence is caffeic acid, which is the precursor to a wide range of hydroxylated phenolic compounds, including antimicrobial agents, antioxidants, and UV protectants (Douglas, 1996; Noel, 2005). One class of such compounds consists of flavonoids, which are most noted for their anti-oxidant and UV-protectant properties (Yu et al., 2003). Anthocyanins are an intensely pigmented sub-class of flavonoids that often accumulate in plant tissues during early seedling development, and in stress situations. DAHP synthase is the first enzyme in the shikimate pathway (Herrmann and Weaver, 1999), and has long been hypothesized to catalyze a key regulatory step (Suzich, 1985; Herrmann, 1995). In contrast to bacteria, which also possess the shikimate pathway including DAHP synthase, regulation of this entry-point enzyme in plants does not appear to be primarily based on allosteric feedback from the aromatic amino acid end products (Macheroux et al., 1999). Instead, DAHP synthase regulation in plants appears to be based on transcriptional responses to environmental and developmental cues. 
UV irradiation is a potent inducer of DAHP synthase transcription in plant cells (Henstrand, 1992). Significant transcript accumulation can be detected within two hours of initial exposure, with levels peaking by eight hours and declining thereafter. DAHP synthase transcription in Arabidopsis is also up-regulated by mechanical wounding, with peak expression at five hours post-treatment (Keith et al., 1991). These results suggest that the up-regulation of DAHP synthase transcription may occur at times when high carbon flux is required to repair cellular damage or to shield tissues from UV photo-oxidative damage. It has been proposed that the origins of the shikimate pathway and the operation of photosystem I in the chloroplast might be linked, since both metabolic functions may have originally been introduced into plants by endosymbiotic bacteria, the presumed predecessors of chloroplasts (Archibald and Keeling, 2002; Entus et al., 2002). Entus et al. (2002) further speculate that, once plants adopted the pathway and photosynthesis became the primary source of carbon, light became the primary regulator of this carbon flux, replacing the aromatic amino acid negative feed-back system that was predominant in the progenitor bacterium. Phenylalanine, tryptophan, and tyrosine are all required for protein synthesis during the post-imbibition growth phase of germination. This demand, combined with the germinants' exposure to light, is likely responsible for the observed up-regulation of transcription of two of the DAHP synthase ESTs during the germination of P. taeda somatic embryos. Several plant species, including Arabidopsis, possess up to three different DAHP synthase-encoding genes (Herrmann and Weaver, 1999). A multi-member gene family is also likely to be present in P. taeda, given its large and relatively uncharacterized genome.
Whether the three DAHP synthase ESTs on the array represent three different genes is unclear, but the fact that they each display a different pattern of expression in this study would suggest that they are unique family members. Different isoforms of a gene often have specialized functions (Hughes, 2005), so it is possible that certain isoforms of DAHP synthase are induced by light, while others may be more influenced by carbon demands. It has been suggested that the duplicate DAHP synthase genes in Arabidopsis might enable independent regulation of aromatic amino acid synthesis for protein synthesis, and for secondary metabolism (Keith et al., 1991). Transcripts encoding 3-dehydroquinate synthase, the second enzyme in the shikimate pathway, also appear to be differentially expressed during somatic embryo germination. Of the four ESTs predicted to encode 3-dehydroquinate synthase, two transcripts exhibit strong up-regulation, one shows no differential expression and one seems to be somewhat down-regulated. Since at least some of the relevant genes are being induced, it is likely that 3-dehydroquinate synthase activity is up-regulated during P. taeda embryo germination, which would be consistent with overall up-regulation of the shikimate pathway. Similarly, WS00112_D12 (At2g21940), an EST annotated as shikimate kinase, is up-regulated, whereas WS0076_L18 (At2g35500), also annotated as shikimate kinase, shows no induction. This disparity is again suggestive of a gene family in which different members play different roles during development. WS0085_B05 (At2g45300), the sole 3-phosphoshikimate-1-carboxyvinyltransferase EST on the arrays, was slightly down-regulated in this study. Interestingly, two transcripts annotated as the alpha and beta subunits of tryptophan synthase were down-regulated by as much as 7.7-fold.
This may indicate a shift in carbon allocation away from the tryptophan branch of the pathway and towards the synthesis of phenylalanine and phenylalanine-derived products. The up-regulation of an EST encoding an enzyme in the phenylalanine synthesis branch of the pathway, chorismate mutase [WS00712_J18 (At3g29200)], is consistent with this suggestion. No ESTs annotated as prephenate dehydrogenase were deemed to have statistically significant differential expression, so the status of tyrosine synthesis is uncertain, but at least it does not appear to be strongly down-regulated. Re-distribution of carbon allocation within the shikimate pathway has also been observed in transgenic potato plants over-expressing tryptophan decarboxylase (TDC) (Yao et al., 1995). The TDC over-expression lines converted substantial amounts of tryptophan to tryptamine and concurrently displayed a nearly 50% reduction in phenylalanine levels compared to wild type plants. The authors concluded that the increased activity in the tryptophan branch in the TDC over-expression plants had created an artificial carbon sink and thereby diverted carbon away from the phenylalanine/tyrosine branch. A further consequence of this shift away from phenylalanine synthesis was a ~50% reduction in phenylalanine-derived secondary metabolites. If this 'sink divergence' model is true for the P. taeda germinants, I predict that analysis of the aromatic amino acid pools over time in the germinating P. taeda embryos would reveal increases in the levels of phenylalanine and reductions in the levels of tryptophan that coincide with decreased levels of tryptophan synthase transcripts and increased chorismate mutase expression.

6.4.2 General phenylpropanoid biosynthesis

If the flux through the shikimate pathway is shifted toward phenylalanine production, then it is likely that one or more downstream pathways which metabolize phenylalanine would also be up-regulated.
The microarray data for the three enzymes involved in the general phenylpropanoid pathway support this conjecture. The relevant ESTs are annotated as phenylalanine ammonia-lyase (PAL) [WS0044_O08 (At2g37040), WS00821_C05 (At2g37040)], cinnamic acid 4-hydroxylase (C4H) [WS00824_E05 (At2g30490)], and 4-coumarate-CoA ligase (4CL) [WS00729_F23 (At1g20510)], all of which show up-regulation during germination. PAL catalyzes the first step in the transition from the primary metabolite, L-phenylalanine, to the secondary metabolism that produces numerous phenolic compounds (Douglas, 1996). There are four PAL genes represented in the Arabidopsis genome (Costa et al., 2003) and at least five isoforms in Pinus banksiana (Butland et al., 1998). Not surprisingly, these five Pinus PAL genes were found to have distinct expression patterns, both in terms of tissue-specificity and response to a fungal elicitor. In my microarray data there are two PAL ESTs with statistically significant expression patterns, both of which are up-regulated during germination. This observation is consistent with a shift towards phenylpropanoid metabolism. Since the phenylpropanoid pathway always depends on the shikimate pathway for a supply of carbon skeletons, it is not surprising that tobacco PAL activity levels were found to closely shadow those of DAHP synthase across a germination time series

[Figure 21 diagram. Pathway labels: phenylalanine; trans-cinnamic acid; 4-coumaryl; 4-hydroxycinnamyl-CoA; other phenylpropanoids; lignin biosynthesis; flavonoid biosynthesis. Expression columns: 2% suc 7d and 2% suc 10d. Phenylalanine ammonia-lyase: WS0044_O08 (At2g37040), WS00821_C05 (At2g37040). Cinnamic acid 4-hydroxylase: WS00824_E05 (At2g30490), WS00931_C11 (At2g30490), WS0043_A20 (At2g30490). 4-coumarate-CoA ligase: WS00729_F23 (At1g20510), WS00923_H15 (At1g20510), WS00112_J15 (At1g65060), WS00930_N07 (At3g21240), WS00913_G23 (At4g05160), WS00913_G23 (At1g65060).]

Figure 21. The general phenylpropanoid pathway.
Shown are EST identifiers, predicted Arabidopsis homologues and accompanying P. taeda expression data. Yellow boxes contain enzyme names, red cells represent ESTs with Ln_expression ratios > 0.6, green cells represent Ln_expression ratios < -0.6, and grey cells represent -0.6 < Ln_expression ratios < 0.6.

(Guillet et al., 2000), and in studies of monolignol formation in P. taeda (Anterola et al., 2002). PAL transcript accumulation increased within one hour in P. taeda cell cultures treated with 40 mM phenylalanine, and maximal PAL mRNA levels were reached by ten hours. PAL enzyme activity was shown to peak within a similar time frame. The several hundred-fold up-regulation of PAL induced by phenylalanine suggests that phenylpropanoid biosynthesis can respond dynamically to the rate of phenylalanine production and availability. Although this idea has yet to be tested directly in vivo, it would be consistent with the observed coupling between expression of shikimate pathway genes and phenylpropanoid pathway genes. Cinnamate-4-hydroxylase (C4H) is a low-abundance, membrane-bound protein that catalyzes the second step in the general phenylpropanoid pathway. It is a member of the very large cytochrome P450 gene family in plants, with approximately 250 P450 family members encoded in the Arabidopsis genome (Nelson et al., 2004). C4H expression is known to be up-regulated by factors such as light, elicitors and wounding (Bell-Lelong et al., 1997), and it is often co-regulated with PAL. Although only one C4H gene is found in the Arabidopsis genome (Costa et al., 2003), there are believed to be multiple family members in other plant taxa such as Populus (Ro et al., 2001) and possibly Pinus (Anterola et al., 2002). My microarray data for C4H are somewhat complex. Three ESTs are annotated as homologues of the single Arabidopsis C4H (At2g30490), but all three have very distinct expression patterns.
WS00824_E05 (At2g30490) is up-regulated, WS0043_A20 (At2g30490) is down-regulated and WS00931_C11 (At2g30490) remains unchanged. One possible explanation for the three different expression patterns is that there are multiple copies of C4H in the P. taeda genome, all of which are substantially similar to the Arabidopsis C4H. On the other hand, different P450 genes can display up to 95% similarity at the amino acid level, which makes cross-reaction between a given C4H microarray element and such highly similar P450s very likely (Ro et al., 2005). Likewise, incorrect annotation of these three ESTs is also possible, given the abundance of cytochrome P450s in plants. Having said this, not all cytochrome P450s are indistinguishable with microarray technology. It should therefore still be possible to use cDNA microarrays to differentiate between transcripts of well-characterized cytochrome P450s, such as Arabidopsis C4H (AT2G30490) and Arabidopsis ferulate-5-hydroxylase (AT4G36220), which are only 9% similar at the nucleotide level. Detailed QRT-PCR analysis of the expression of the specific P. taeda C4H ESTs using gene-specific primers would help shed light on the reliability of my microarray expression data with respect to genes within the same family. Another potentially complicating factor that might affect the C4H expression patterns is the strong circadian pattern of C4H gene expression (Rogers et al., 2005). If C4H expression can change significantly even within a 4-hour period, sampling reproducibility becomes an important issue. Although I did attempt to harvest germinants at a consistent time of day (morning), it would still be worth quantifying potential time-dependent fluctuations in C4H expression. By harvesting tissue or germinants at different times of the day, and screening for C4H expression, the extent of this circadian effect in germinating P. taeda embryos could be determined.
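The up-regulated, down-regulated and unchanged calls discussed above follow the ±0.6 Ln_expression ratio thresholds given in the Figure 21 legend. The following is a minimal sketch of that classification, not the analysis code used in this study. It assumes that "Ln" denotes the natural logarithm, so that a threshold of 0.6 corresponds to roughly a 1.8-fold change; the function names and the example values are hypothetical.

```python
import math

# Cut-off taken from the Figure 21 legend; assumed to be a natural-log ratio.
THRESHOLD = 0.6


def classify(ln_ratio, threshold=THRESHOLD):
    """Return 'up', 'down' or 'unchanged' for a natural-log expression ratio.

    Values exactly equal to +/- threshold are not assigned a colour in the
    legend; this sketch treats them as 'unchanged'.
    """
    if ln_ratio > threshold:
        return "up"
    if ln_ratio < -threshold:
        return "down"
    return "unchanged"


def fold_change(ln_ratio):
    """Convert a natural-log ratio to a signed fold change.

    Positive values mean up-regulation; -2.0 means two-fold down-regulation.
    """
    ratio = math.exp(ln_ratio)
    return ratio if ratio >= 1.0 else -1.0 / ratio


# Hypothetical ln ratios for three ESTs that share one annotation.
example = {"EST_A": 1.2, "EST_B": -0.9, "EST_C": 0.1}
calls = {est: classify(value) for est, value in example.items()}
```

Applied to a gene family, a mixed set of calls such as this one is the situation described for the three C4H elements: a single annotation, but one element in each class.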
The enzyme 4CL catalyzes the final step in the general phenylpropanoid pathway (Ehlting et al., 1999). WS00729_F23 is one of six ESTs in my data set that are annotated as 4CL (At1g20510), and it displays an expression profile similar to that observed for PAL, with up-regulation following imbibition. Of the remaining five ESTs, four remain unchanged, while one is slightly down-regulated. Like DAHP synthase, 4CL expression paralleled that of PAL in the previously mentioned P. taeda monolignol synthesis study (Anterola et al., 2002). The authors found that 4CL transcripts accumulated to maximal levels by ten hours following phenylalanine application, as did PAL transcripts. Kao et al. (2002) also showed coordinate expression of various isoforms of 4CL and PAL within different tissue types. This co-regulation pattern is supported by the observation of common cis elements in 4CL and PAL promoters (Whitbred and Schuler, 2000). The similarity in expression profiles, and the existence of common promoter elements, suggest tight regulation of a pathway ("regulon") that has evolved to efficiently transfer carbon from a primary metabolite to a diverse range of secondary compounds. One complication in interpreting my 4CL microarray data is that there are believed to be between fourteen (Costa et al., 2003; Costa et al., 2005) and twenty-nine (Schneider et al., 2005) 4CL and 4CL-like genes in the Arabidopsis genome (depending on the annotation criteria), and each of these isoforms likely has different tissue-specific expression tendencies. To date, the bulk of the expression and biochemical data has been concentrated on the four bona fide 4CL genes in the Arabidopsis genome, although studies are currently underway to characterize the expression patterns of numerous 4CL-like genes (Clarice Souza, personal communication). Considering the size and uncharacterized nature of the P.
taeda genome, and the major metabolic commitment of conifers to lignin deposition, similar numbers of 4CL genes are likely to be found in conifers as well. Voo et al. (1995) reported finding a xylem-specific 4CL isoform in P. taeda, while Zhang and Chiang (1997) later demonstrated the existence of two 4CL isoforms in P. taeda that differed by a single amino acid, although distinct functions of the two forms were not clear. In poplar, 4CL isoforms displaying different tissue-specific expression patterns have been reported (Harding et al., 2002; Kao et al., 2002). One hypothesis for the existence of multiple 4CL isoforms is that they may direct carbon towards different fates, such as lignin vs. flavonoid biosynthesis (Harding et al., 2002). A similar argument has been made for the presence of two different forms of PAL in Populus, one of which was proposed to be dedicated to condensed tannin synthesis and one to lignin formation (Kao et al., 2002). The fact that only one of the six ESTs annotated as 4CL in my microarray study displays any substantial differential expression is consistent with the model that P. taeda also possesses multiple copies of 4CL, each with some functional specificity. Perhaps WS00729_F23 (At1g20510) is a germination-specific 4CL isoform? Again, more detailed expression profiling could help answer this. In summary, it appears that in the P. taeda germinants the general phenylpropanoid pathway is transcriptionally up-regulated following imbibition. An overall up-regulation of the shikimate pathway that supplies phenylalanine to the phenylpropanoid pathway is also observed, accompanied by a marked down-regulation of tryptophan synthase subunits.
Increased levels of phenylalanine being generated by the activated shikimate pathway may contribute to a swift up-regulation of PAL and 4CL, or at least provide the necessary feedstock for enhanced rates of phenolic biosynthesis, reflected in the induction of PAL- and 4CL-encoding transcripts.

6.5 Flavonoid biosynthesis

6.5.1 Chalcone synthase and chalcone isomerase

Flavonoids are a diverse family of secondary metabolites that are produced by a branch of the general phenylpropanoid pathway. One of the best characterized flavonoid classes consists of anthocyanins, which are pigments that typically accumulate in the aerial tissues (leaves, fruit, flowers) of plants. They also accumulate in roots and hypocotyls during seed germination in conifers and other plant taxa. To investigate whether flavonoid synthesis is altered during somatic embryo germination, data pertaining to all the ESTs encoding key flavonoid synthesis enzymes on the Picea microarray were compiled and mapped on the appropriate pathway diagram (Figure 22). The first committed step in flavonoid biosynthesis is catalyzed by chalcone synthase (CHS), and two ESTs annotated as CHS were found to show statistically significant differential expression patterns. These were WS0062_C03 (At5g13930), which showed down-regulation of ~2- to 3-fold during germination, and WS0014_M21 (At5g13930), which exhibited no change. These expression ratios do not indicate a pronounced induction of flavonoid biosynthesis during germination. Similar results were obtained in a study of mRNA levels of anthocyanin synthesis enzymes, and accumulation of anthocyanin, during the germination of Arabidopsis seeds (Kubasek et al., 1992; Kubasek et al., 1998). No elevation of CHS mRNA levels was observed at the day-7 sampling time in that study, relative to time zero, although considerable accumulation of CHS transcripts was observed at an earlier point (day-3).
CHS transcription seemed to start immediately following imbibition, peaking at day-3, and then rapidly declining to near zero by day-5. To further characterize this pattern of short-lived transcriptional activity, the authors monitored CHS mRNA at 1-hour intervals from time zero to 8 hours post-imbibition. This second analysis confirmed the rapid induction of CHS, demonstrating that flavonoid synthesis is initiated very early during post-germination growth. Anthocyanin accumulation profiles initially tracked those of CHS transcription, but the pigment did not reach its maximal concentration until day 5. If a similar rapid response pattern occurs in P. taeda seedlings, one would expect CHS transcripts to be elevated in germinating P. taeda somatic embryos, but to have already declined to basal levels by the 7-day sampling time-point. A more finely grained time-series investigation of CHS transcript levels could test this prediction. The apparent down-regulation of WS0062_C03 (At5g13930) at both time points relative to the desiccated embryo, on the other hand, suggests that this transcript may be relatively abundant in the dormant desiccated embryo. If so, this would indicate that WS0062_C03 (At5g13930) is one of the transcripts accumulated late in the maturation phase of embryo development, perhaps for the purpose of rapidly generating CHS activity upon imbibition. Chalcone isomerase (CHI), the second enzyme in the flavonoid biosynthesis pathway, is represented by a single EST, WS00720_J05 (At3g55120), which appears to be up-regulated during embryo germination. Kubasek et al. (1992, 1998) found that Arabidopsis CHI mRNA levels followed a similar profile to that of CHS, with a very early induction of CHI transcription, and no CHI transcripts detectable by day-7.
In Arabidopsis, CHI and CHS were found to have highly correlated expression patterns in a bioinformatics investigation of flavonoid biosynthesis that examined the results of 563 Arabidopsis microarray experiments (Gachon et al., 2005).

[Figure 22 diagram. Pathway labels: 4-coumaryl-CoA; naringenin chalcone; naringenin; lignans?; dihydrokaempferol; dihydroquercetin; leucocyanidin; cyanidin; anthocyanins. Enzyme boxes: chalcone synthase; chalcone isomerase; isoflavone reductase-like; flavonol synthase; flavanone 3-hydroxylase; dihydroflavonol 4-reductase; leucoanthocyanidin dioxygenase. Each enzyme box lists its ESTs with predicted Arabidopsis homologues and Ln_expression ratios in the 2% suc 7d and 2% suc 10d columns.]

Figure 22. Flavonoid biosynthesis pathway. Shown are EST identifiers, predicted Arabidopsis homologues and accompanying P. taeda expression data.
Yellow boxes contain enzyme names, red cells represent ESTs with Ln_expression ratios > 0.6, green cells represent Ln_expression ratios < -0.6, and grey cells represent -0.6 < Ln_expression ratios < 0.6.

6.5.2 Lignan biosynthesis

Isoflavonoids are a class of anti-microbial flavonoid-based compounds, collectively known as phytoalexins, that are produced by some plants in response to various elicitors and biotic stresses (Dixon and Paiva, 1995). In many cases, these compounds are found to accumulate in the roots of plants, where they are thought to provide protection against fungal pathogens (Volpin et al., 1995; Tsanuo et al., 2003). Naringenin, the product of the CHI-mediated conversion of chalcone, sits at an important branch point in the flavonoid synthesis pathway where carbon can potentially be allocated to the isoflavonoid branch of the pathway, a process catalyzed by isoflavone reductase (IFR). Interestingly, there are 15 significantly expressed ESTs in my microarray that are annotated as IFR-like, seven of which are up-regulated post-imbibition. This is a curious result given that isoflavonoids (phytoalexins) are predominantly produced in legume species such as Glycine max (Yu et al., 2003), Desmodium uncinatum (Spanish clover) (Tsanuo et al., 2003), and Medicago sativa (Volpin et al., 1995), and have not been found in conifers such as P. taeda. A similar group of transcripts was found to be up-regulated in fungal elicitor-induced P. taeda suspension cultures (Gang et al., 1999), and in rice (Kim et al., 2003a). In both cases, the genes of interest were annotated as IFR homologs due to their 60-70% amino acid sequence similarity to known IFR genes. However, biochemical characterization of a recombinant form of the P.
taeda IFR-homologue indicated that it might actually be involved in the synthesis of lignans, a class of isoflavonoid-like, anti-microbial phenylpropanoid compounds that are highly abundant in conifer plant tissues (Gang et al., 1999; Saleem et al., 2005). It therefore seems likely that the induction of transcripts for the IFR-like ESTs in my microarray data reflects up-regulation of lignan formation during early seedling growth, when the young germinants are presumably highly susceptible to fungal pathogens. Metabolic profiling of P. taeda germinants with a specific focus on lignans would enable this hypothesis to be tested.

6.5.3 Anthocyanin biosynthesis

Another possible allocation of carbon in the flavonoid biosynthesis pathway is the synthesis of anthocyanin. Flavanone 3-hydroxylase (F3H) is the enzyme that directs carbon from naringenin towards anthocyanin formation. As was the case for CHS, both ESTs encoding F3H appear to be unchanged during post-imbibition growth. This result is consistent with the findings of the Arabidopsis data-mining study (Gachon et al., 2005), where F3H was found to have expression patterns highly similar to those of CHS. As was discussed for CHS, despite obvious anthocyanin accumulation and induction of other flavonoid synthesis genes, it could be that F3H transcription had already occurred by the time the samples were harvested. Again, a more thorough RT-PCR study with frequent sampling time points post-imbibition would enable me to determine exactly when F3H is expressed. Dihydroflavonol 4-reductase (DFR) is the first committed enzyme in the specific branch pathway leading towards anthocyanin formation. In my experiment, nine ESTs annotated as DFR showed statistically significant differential expression patterns, with seven of the nine being up-regulated at some point during the germination time series. These results suggest that the anthocyanin biosynthesis pathway is active.
As was seen for other putative gene families, the DFR ESTs do not all respond similarly, with two of them, WS0091_C23 (At1g61720) and WS0016_L17 (At1g61720), being conspicuously down-regulated. DFR often forms a multi-gene family in plants, although there is only one copy in the Arabidopsis genome (Shirley et al., 1992). When six different DFR genes encoded in the Lotus japonicus genome were cloned and characterized, each isoform showed a distinct expression pattern with respect to tissue/organ type (Shimada et al., 2005). L. japonicus DFR1 mRNA was found to be expressed preferentially in roots, DFR2 transcripts were detected in all tissues except leaves, and DFR3 transcripts could be detected only in leaves. Since there are nine differentially expressed transcripts annotated as DFR in my P. taeda data, with seven ESTs showing up-regulation and two showing down-regulation at some point following imbibition, it seems likely that there are several different DFR genes encoded in the P. taeda genome, and that some of them, at least, may be differentially regulated. However, it is important to recognize that the samples employed for the microarray experiments represented whole seedlings, so I am probably seeing an expression profile that effectively represents an average of all tissues. In addition, the paralogs of L. japonicus DFR are up to 96% similar (Shimada et al., 2005). If this strong similarity is also true for the putative Pinus paralogs, it is unlikely that the Picea cDNA microarray could clearly resolve DFR gene family expression patterns in P. taeda. Nevertheless, the finding that seven of nine DFR-encoding ESTs are up-regulated is suggestive of a differentially regulated multi-gene family. Resolution of this question would require QRT-PCR analysis of specific tissues using EST-specific primers. Leucoanthocyanidin dioxygenase (LDOX) catalyzes the formation of pelargonidin and cyanidin, two of the three central structural classes of anthocyanin.
Five of the ten ESTs annotated as LDOX are up-regulated following embryo imbibition. A similar expression pattern for LDOX transcripts was observed in Arabidopsis seedlings, starting 3 days post-imbibition (Pelletier et al., 1997). A subsequent study found LDOX protein to accumulate in a very similar pattern, as did the end-product flavonoids (Pelletier et al., 1999). In my P. taeda germination study, anthocyanin accumulation also seems to be correlated with expression of these LDOX transcripts, at least in terms of visible pigment accumulation (data not shown).

6.5.4 Flavonol biosynthesis

Flavonol synthase (FLS), the enzyme that directs carbon from dihydrokaempferol to the synthesis of flavonols, was the final enzyme to be investigated in this microarray study. Although six ESTs were annotated as FLS, they showed little in the way of differential expression at the time points sampled. WS0094_C12 (At5g05600) was the only differentially expressed EST, showing a 2.2-fold increase in expression at the 10-day time point. One possible explanation for this is that FLS could be highly expressed for only a short period of time following imbibition, as was suspected for CHS. Another possibility is that the flavonol synthesis branch is not highly induced at all during germination in P. taeda somatic embryos. On the other hand, in transgenic tomato, over-expression of both CHS and FLS was required in order to increase the abundance of three flavonol end-products (Verhoeyen et al., 2002). This suggests that roughly equivalent expression levels of CHS and FLS may be required to direct carbon into the flavonol synthesis branch of the flavonoid pathway, whereas inactivity of both enzymes might reduce flavonol synthesis to minimal levels (Colliver et al., 2002). The analogy presented by Verhoeyen et al. (2002) was that CHS "pushes" carbon through the pathway while FLS "pulls" it. Only when both enzymes are active is flavonol synthesis maximized.
It would be informative to use the metabolic profiling techniques described by Verhoeyen et al. and Kaffarnik et al. to quantify flavonol compounds known to be found in Pinus at a number of time points during somatic embryo germination (Verhoeyen et al., 2002; Robles et al., 2003; Kaffarnik et al., 2005). If my CHS and FLS expression data are representative of the associated metabolic patterns, I would predict that little change in flavonol accumulation will be observed. Since many of the flavonol compounds reported to accumulate in Pinus are believed to have UV-protectant properties, and are found in mature needles (Robles et al., 2003), it may be that the flavonol synthesis branch is not induced until much later in development. RT-PCR screening of mature P. taeda needles for patterns of CHS and FLS expression, as well as flavonol content, would therefore provide a useful comparison.

6.6 Differential expression of MYB4

As was mentioned previously in the hierarchical clustering section, an EST annotated as MYB4 was found to be significantly up-regulated during P. taeda somatic embryo germination. Since this was the only transcription factor found in the list of the 1000 genes with the lowest p-values, it was of particular interest. MYB4 is an R2R3-MYB family transcription factor, which suggests that it might possess a high affinity for AC-containing promoter elements (Stracke et al., 2001). The phenylpropanoid genes PAL, C4H, and 4CL all have promoters containing AC elements (Patzlaff et al., 2003). MYB4 has also been shown to be inducible by a range of factors including sucrose (Rogers et al., 2005), light (Jin et al., 2000) and cold (Vannini et al., 2004), all of which influence phenylpropanoid metabolism. MYB4 may therefore be involved in regulating the expression of phenylpropanoid synthesis genes in P. taeda during embryo germination, and in helping to integrate that regulation with environmental factors. Patzlaff et al. (2003) earlier cloned and characterized MYB4 from P.
taeda, and demonstrated that not only did PtMYB4 preferentially bind AC elements, but that the over-expression of PtMYB4 constructs in Arabidopsis plants resulted in significant down-regulation of PAL and C4H (Patzlaff et al., 2003). Interestingly, these same over-expression plants were also found to have more extensive lignification than wild type plants, which suggests that down-regulation of specific PAL and C4H genes does not necessarily result in reduced commitment of carbon to all branches of phenylpropanoid metabolism. Arabidopsis plants over-expressing AtMYB4 have shown strong down-regulation of CHS and 4CL3 (Jin et al., 2000), while atmyb4 mutant plants were found to accumulate increased levels of flavonoids and display significant up-regulation of C4H compared to wild type counterparts (Hemm et al., 2001). The interpretation of these MYB4 data is that MYB4 represents a transcriptional switch that directs phenylpropanoid carbon to either the lignin branch or the flavonoid branch. Flavonoid synthesis is preferred when MYB4 is down-regulated, and lignification is induced when the opposite is true (Hemm et al., 2001). If this model is also valid in P. taeda somatic embryos, it could be that the observed 7- to 30-fold up-regulation (Figure 23) of MYB4 is responsible for the concomitant down-regulation of certain transcripts homologous to C4H, 4CL and CHS, and that the corresponding ESTs would be more associated with flavonoid synthesis. AtMYB4 also shows substantial up-regulation when Arabidopsis plants are grown in sucrose-rich conditions (Jin et al., 2000), which is consistent with the MYB4 expression levels observed in the 2% sucrose germinants, while lignification was found to be highest in wild type Arabidopsis seedlings that were germinated on 30 mM sucrose (Jin et al., 2000; Rogers et al., 2005). MYB4 expression is also up-regulated by exposure to UV light (Jin et al., 2000) and cold (Vannini et al., 2004).
By integrating these results one could hypothesize that the high sucrose concentration in the germination medium induces MYB4 expression, which in turn down-regulates CHS and the flavonoid synthesis pathway, allowing carbon to be allocated towards lignification. Perhaps flavonoid synthesis is the preferred branch of the phenylpropanoid pathway during early post-imbibition growth, but as the plant matures and is exposed to light and cold, a MYB4-mediated shift toward lignification takes place. The anti-microbial and antioxidant properties of flavonoids may make their synthesis an overriding physiological priority when the plant is most vulnerable as a young seedling, but as the germinant develops extended aerial tissues, structural compounds such as lignin may gain increasing importance. Patzlaff et al. (2003) showed that MYB4 expression coincides very closely with lignification in 21-day-old P. taeda zygotic seedlings, which is roughly the age of the seedlings harvested at the last time point in my microarray experiment. A more detailed examination of the SE germination time-course in which the time-frame is extended beyond 10 days in light, together with analysis of those tissues for flavonoid and lignin levels, would make it possible to test the hypothesis that down-regulation of the flavonoid synthesis machinery is accompanied by increased lignin abundance, and that these changes are correlated with MYB4 expression.

Figure 23. Microarray expression profile of WS01034_M15, an EST annotated as a MYB4 transcription factor. Shown are absolute fold-change values for each microarray comparison made.

7.0 Validation of microarray data

7.1 Quantitative reverse transcription PCR

While microarray analysis is currently unmatched in its ability to interrogate large numbers of genes simultaneously, the technology has limitations in terms of speed, cost, sensitivity, specificity and reproducibility.
For these reasons, it is standard practice to validate the expression of genes of particular interest by using different methodologies and new biologically replicated samples. Quantitative reverse transcription PCR (QRT-PCR) is a powerful technique for directly and accurately measuring gene expression levels, and it has become the standard method for validating microarray expression data (Bustin, 2002). In order to confirm the accuracy of my microarray results, one EST from each cluster presented in Figure 18 was selected for validation using QRT-PCR (Table 4). This representative selection was intended to provide validation of the global expression patterns associated with the genes in each cluster, as well as to confirm some specific microarray results for both up- and down-regulated transcripts that displayed varying degrees of differential expression in the microarray experiments. Primers were designed to target putative P. taeda homologs of the Picea EST array elements of interest, and these were used to screen cDNA populations that replicated those used in the original microarray study. To provide more extensive replication of the associated biology, RNA from two different somatic cell lines of P. taeda was used for cDNA preparation and QRT-PCR. Unfortunately, the 0% sucrose 7-day and 0% sucrose 10-day RNA samples for cell line K3973 were lost, and were therefore not available for this validation exercise. Expression levels of the ten ESTs were normalized to the internal β-tubulin expression levels of each sample, and comparisons were then made between expression levels in the desiccated embryo reference sample and the various carbohydrate treatment samples.
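The normalization just described can be illustrated with a short sketch. The text does not state which calculation was used to derive the QRT-PCR expression ratios, so the widely used 2^-ΔΔCt approach is substituted here purely as an illustration; it assumes close to 100% amplification efficiency for both the target and the β-tubulin amplicons. The function name and all Ct values are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Expression ratio of a target gene in a treatment sample relative to a
    calibrator sample, normalized to a reference gene (2^-ddCt).

    Assumes both amplicons amplify with ~100% efficiency, i.e. the product
    doubles every cycle.
    """
    # Normalize the target to the reference gene within each sample.
    d_ct_sample = ct_target_sample - ct_ref_sample
    d_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    # Compare the treatment sample with the calibrator.
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values: target EST and beta-tubulin in a treatment sample,
# then the same two genes in the desiccated embryo reference sample.
ratio = relative_expression(22.0, 18.0, 25.0, 18.5)
```

A ratio above 1 indicates higher expression in the treatment sample than in the desiccated embryo, and a ratio below 1 indicates lower expression.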
The microarray result was considered to be validated if the QRT-PCR-generated expression ratio was in the same direction (up- or down-regulated) as the microarray-generated ratio, and if the QRT-PCR-generated ratio was found to be statistically significant across replicates, using a p-value cut-off of < 0.05. Finally, for each EST the QRT-PCR results for the two cell lines were compared to the microarray data to gain some insight into the extent to which the conclusions derived from work with one genotype could be extrapolated to another of the same species. QRT-PCR analysis of expression of the chosen alpha-galactosidase, 4CL, and MYB4 genes validated the microarray results for most samples in both cell lines (Figure 24). However, this was not the case for all the sequences that were examined. The disparity between the QRT-PCR results and the microarray data can be seen in Table 7, which is a matrix of all genes for all comparisons in both cell lines. As can be seen in Figure 24, a substantial number of QRT-PCR comparisons did not replicate the original microarray data. Approximately 40% of the QRT-PCR comparisons in line L3519 were substantially similar to the microarray findings, while approximately 34% of the microarray expression ratios could be reproduced in line K3973. As well, there are no obvious relationships between successful validation and sample time, treatment type, magnitude of expression change, or degree of data replication, although up-regulated genes seem to be slightly more likely to be validated. WS0075_D19, WS00929_N09, and WS01039_A16 were ESTs that were down-regulated to varying degrees in the microarray analysis, but these patterns were only partially reproduced in the QRT-PCR data set. It should be noted, however, that only three down-regulated ESTs were screened compared to the seven that were up-regulated.
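The validation criterion stated above (same direction of change, and a QRT-PCR p-value below 0.05) can be sketched as follows, together with the fraction of validated comparisons. This is an illustrative sketch only: the class, the function names and every number in the example are hypothetical, and the ratios are assumed to be expressed on a log scale so that the sign gives the direction of change.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    name: str               # e.g. a treatment versus desiccated embryo
    array_ln_ratio: float   # microarray log ratio (sign = direction)
    qpcr_ln_ratio: float    # QRT-PCR log ratio (sign = direction)
    qpcr_p_value: float     # significance of the QRT-PCR ratio


def is_validated(c, alpha=0.05):
    """True if both methods agree on direction and the QRT-PCR ratio is
    significant at the given cut-off."""
    # A positive product means both ratios are non-zero with the same sign.
    same_direction = c.array_ln_ratio * c.qpcr_ln_ratio > 0
    return same_direction and c.qpcr_p_value < alpha


def validation_rate(comparisons, alpha=0.05):
    """Fraction of comparisons that meet the validation criterion."""
    if not comparisons:
        return 0.0
    validated = sum(is_validated(c, alpha) for c in comparisons)
    return validated / len(comparisons)


# Hypothetical comparisons, not data from this study.
examples = [
    Comparison("2suc07-des.emb", 1.4, 0.9, 0.01),    # validated
    Comparison("2suc10-des.emb", 1.1, 0.7, 0.20),    # not significant
    Comparison("2malt07-des.emb", -0.8, 0.3, 0.03),  # opposite direction
    Comparison("2malt10-des.emb", -1.2, -0.6, 0.04), # validated
]
rate = validation_rate(examples)
```

Note that the Table 7 legend additionally requires the microarray result itself to be statistically significant; that stricter test would need a microarray p-value field, which this sketch omits.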
[Figure 24 A, B: bar charts with Array and QRT PCR series; x-axis: comparison (2malt07-des.emb, 2malt10-des.emb, 2sorb07-des.emb, 2sorb10-des.emb, 2suc.07-des.emb, 2suc.10-des.emb). Panel titles: A) QRT PCR validation (K3973): alpha-galactosidase; B) QRT PCR validation (K3973): 4CL.]

Figure 24 A, B. Comparison of QRT-PCR and microarray results for A) alpha-galactosidase for line K3973 B) 4CL for line K3973. '*' indicates comparison with p-values < 0.05. Black bars indicate the expression ratios of comparisons made for cell-line L3519 using the 22.8k microarray. Grey bars indicate the expression ratios of comparisons made for the cell-line noted, using QRT-PCR.

[Figure 24 C, D: bar charts with Array and QRT PCR series; x-axis: comparison (panel C: 2malt07-des.emb to 2suc.10-des.emb; panel D: 0suc.07-des.emb to 2suc.10-des.emb). Panel titles: C) QRT PCR validation (K3973): MYB4; D) QRT PCR validation (L3519): alpha-galactosidase.]

Figure 24 C, D. Comparison of QRT-PCR and microarray results for C) MYB4 for line K3973 D) alpha-galactosidase for line L3519. '*' indicates comparison with p-values < 0.05. Black bars indicate the expression ratios of comparisons made for cell-line L3519 using the 22.8k microarray. Grey bars indicate the expression ratios of comparisons made for the cell-line noted, using QRT-PCR.
[Figure 24 E, F: bar charts with Array and QRT PCR series; x-axis: comparison (0suc.07-des.emb, 0suc.10-des.emb, 2malt07-des.emb, 2malt10-des.emb, 2sorb07-des.emb, 2sorb10-des.emb, 2suc.07-des.emb, 2suc.10-des.emb). Panel titles: E) QRT PCR validation (L3519): 4CL; F) QRT PCR validation (L3519): MYB4.]

Figure 24 E, F. Comparison of QRT-PCR and microarray results for E) 4CL for line L3519 F) MYB4 for line L3519. '*' indicates comparison with p-values < 0.05. Black bars indicate the expression ratios of comparisons made for cell-line L3519 using the 22.8k microarray. Grey bars indicate the expression ratios of comparisons made for the cell-line noted, using QRT-PCR.

Table 7. Congruence of QRT-PCR results with microarray results. A) line L3519 B) line K3973. '+' represents instances where the two data sources are congruent, while '-' represents instances where they are not congruent. Results were considered congruent if the expression data generated by both methods were statistically significant, and if the direction (up or down) of the expression ratios generated were the same for both methods. "na" = comparisons that were not made due to sample loss.
A) L3519

EST ID       AGI#       annotation                            0%suc7d    0%suc10d   2%malt7d   2%malt10d  2%sorb7d   2%sorb10d  2%suc7d    2%suc10d
WS00113_D16  At5g08370  alpha-galactosidase                   +          +          +          +          +          +          +          +
WS00729_F23  At1g20510  4-coumarate-CoA ligase                -          +          -          +          -          -          -          +
WS01034_M15  At4g38620  MYB4                                  -          +          +          +          +          +          +          -
WS0016_E09   At5g28840  NAD-dependent epimerase               +          -          -          +          -          -          -          -
WS00912_C11  At1g17160  pfkB-type carbohydrate kinase         +          +          +          -          +          +          +          +
WS00929_J15  At4g29160  SNF7                                  +          -          +          +          +          -          -          -
WS00930_N09  At2g01630  glycosyl hydrolase family 17 protein  -          -          -          +          -          -          -          -
WS0075_D19   At4g02440  circadian clock coupling factor ZGT   -          -          -          -          -          -          -          -
WS00929_N09  At1g69740  porphobilinogen synthase              -          +          -          -          +          -          +          -
WS01039_A16  At5g53190  nodulin MtN3 family protein           -          -          -          -          -          -          -          -

B) K3973

EST ID       AGI#       annotation                            0%suc7d    0%suc10d   2%malt7d   2%malt10d  2%sorb7d   2%sorb10d  2%suc7d    2%suc10d
WS00113_D16  At5g08370  alpha-galactosidase                   na         na         +          +          +          +          +          +
WS00729_F23  At1g20510  4-coumarate-CoA ligase                na         na         -          +          -          +          -          +
WS01034_M15  At4g38620  MYB4                                  na         na         -          +          -          +          +          +
WS0016_E09   At5g28840  NAD-dependent epimerase               na         na         -          -          -          -          +          -
WS00912_C11  At1g17160  pfkB-type carbohydrate kinase         na         na         -          +          -          +          -          +
WS00929_J15  At4g29160  SNF7                                  na         na         +          +          +          -          -          -
WS00930_N09  At2g01630  glycosyl hydrolase family 17 protein  na         na         -          -          -          +          -          -
WS0075_D19   At4g02440  circadian clock coupling factor ZGT   na         na         -          -          -          -          -          -
WS00929_N09  At1g69740  porphobilinogen synthase              na         na         +          -          +          -          +          -
WS01039_A16  At5g53190  nodulin MtN3 family protein           na         na         -          -          -          -          -          -

7.2 QRT-PCR discussion

There are a number of possible reasons for the poor correlation between my microarray data and the QRT-PCR results. Cross-hybridization between similar members of a gene family is always a possibility when using a cDNA microarray to measure gene expression. This problem is greatest when members of gene families are highly similar at the nucleotide level. For the Picea array used in this study this limitation is particularly relevant, given that there are repeated examples of multiple ESTs having the same annotation.
Girke et al. (2000) concluded that sequences with 70-80% sequence similarity will result in significant cross-hybridization on cDNA microarrays. Similar results were obtained in an Arabidopsis cytochrome P450-specific cDNA microarray study, where at least 20% cross-hybridization was observed once sequence similarity exceeded 80% (Xu et al., 2001). Ideally, future-generation conifer microarrays will be based on "long-mer" technology, which is capable of reliably distinguishing between sequences of up to 90% nucleotide similarity (Douglas and Ehlting, 2005). Alternatively, adoption of short-oligo methodologies, such as those employed by the Affymetrix platform, could allow accurate transcript profiling of sequences having up to 93% similarity (Lipshutz et al., 1999). However, both of these approaches require full knowledge of the coding regions of the Picea (or Pinus) genome. Other cross-species hybridization effects could also be contributing to some of the observed variability. Mismatches between Pinus probes and Picea array elements would result in an overall decrease in hybridization efficiency, leading to increased non-specific hybridization to similar gene-family members. Girke et al. (2000) hybridized Brassica mRNA to an Arabidopsis cDNA array and found that the number of genes expressed 2-fold above background levels was reduced by 50%. In addition, my QRT-PCR primers were designed to match the most homologous P. taeda sequences of each Picea EST to be tested, rather than to match each Picea EST itself. This was done because the goal was specifically to confirm/replicate the expression ratios of P. taeda ESTs. However, the P. taeda homolog chosen to represent a given Picea EST may not be the same transcript that is being expressed in the sample being interrogated by the array element. In the absence of a full genome sequence for either species, I cannot exclude this possibility.
One approach that might help to generate more robust data would be to validate the expression levels using primers designed to target a selection of P. taeda sequences that are highly similar to the Picea array feature, rather than just relying upon the first "hit" generated by a BLAST search. It is also possible that the poor congruence between the microarray data and the QRT-PCR results could stem from fundamental differences between the two technologies. The linear dynamic range of data generated by QRT-PCR methods is several orders of magnitude greater than that obtained by microarray analysis, as is the sensitivity (Czechowski et al., 2004). Expression ratios between the two technologies can therefore be dramatically different, simply due to detection limitations. However, while this could account for expression-ratio magnitude differences, it should not affect the direction of expression-ratio differences (up- or down-regulated). A final consideration is the use of β-tubulin as the sole internal standard, which may have contributed to the confounding results, since fluctuations in the expression of this gene are inevitable and could result in artefactual data (Bustin and Nolan, 2004). Since there rarely exists a gene with ubiquitous expression across all tissues and treatments, gene selection for internal standardization has become a key issue in quantitative PCR analysis. Lately, various research groups have striven to define a single reference gene that can be used to normalize QRT-PCR data (Brunner et al., 2004a; Radonic et al., 2004). The conclusion to be derived from these studies is that all samples of interest should be screened with several potential 'housekeeping genes' prior to expression profiling so that the best reference points can be selected for any given experiment.
Ideally, this reference gene would have 1) an expression profile that fluctuates minimally throughout all samples being considered, 2) expression at roughly the same level as the other genes under investigation, and 3) amplification characteristics similar to those of the experimental genes to be studied (Morse et al., 2005). Since no single gene is likely to possess all of these properties, Vandesompele et al. (2002) proposed using the average expression of several reference genes as an internal standard, thereby buffering any radical expression changes that may confound later data analysis. This strategy would be theoretically superior, but would place increased technical and financial demands on the user, since several more QRT-PCR reactions would be necessary for every sample being investigated. A novel approach has been taken by Czechowski et al. (2005) to address the challenges of normalizing QRT-PCR data using standard housekeeping genes. They performed a comprehensive survey of Affymetrix GeneChip-based Arabidopsis gene expression data, in order to find the genes that exhibited the lowest changes in expression under an extensive range of experimental conditions. They then quantified the expression of 35 of the least variable genes across 101 different tissue and treatment samples, using QRT-PCR. This allowed them to define dozens of Arabidopsis genes that have more stable expression than traditional housekeeping genes such as actin, β-tubulin, or polyubiquitin. These would presumably make ideal candidates for QRT-PCR data normalization in Arabidopsis. However, since my data are derived from Pinus samples, the genes described by Czechowski et al. (2005) may not behave the same in my system as they do in Arabidopsis. In addition, I would still be uncertain as to whether I had identified the appropriate Pinus orthologs for this purpose.
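The two ideas discussed above, ranking candidate reference genes by how little their expression varies and then averaging several of them into a single normalization factor, can be sketched as follows. This is a minimal illustration under stated assumptions: the coefficient of variation is used as a simple stand-in for a stability measure, the averaging is done as a geometric mean (the form of averaging used by Vandesompele et al., 2002), and the gene names and expression values are hypothetical.

```python
import math
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean; lower is more stable."""
    return statistics.stdev(values) / statistics.mean(values)


def rank_by_stability(expression):
    """Return gene names sorted from most to least stable across samples."""
    return sorted(expression,
                  key=lambda gene: coefficient_of_variation(expression[gene]))


def normalization_factors(expression, reference_genes):
    """Per-sample geometric mean of the chosen reference genes."""
    n_samples = len(expression[reference_genes[0]])
    factors = []
    for i in range(n_samples):
        logs = [math.log(expression[gene][i]) for gene in reference_genes]
        factors.append(math.exp(sum(logs) / len(logs)))
    return factors


# Hypothetical relative expression values across four samples
# (illustration only, not data from this study).
candidates = {
    "ref_a": [1.00, 1.05, 0.95, 1.00],
    "ref_b": [1.00, 0.90, 1.10, 1.00],
    "ref_c": [1.00, 2.00, 0.50, 1.50],
}
ranked = rank_by_stability(candidates)
print(ranked)  # ['ref_a', 'ref_b', 'ref_c']

# Use the two most stable candidates as the internal standard.
factors = normalization_factors(candidates, ranked[:2])
```

Dividing each target gene's expression value by the factor for the same sample would then give the normalized value; the cost is one additional QRT-PCR reaction per reference gene per sample.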
Nevertheless, there would be merit to repeating the experiment described by Czechowski et al. (2005) using Pinus microarray datasets, in order to identify the most stable genes in my system. The most convincing way to validate microarray data is to replicate the microarray comparison using a series of biological replicates and multiple primer pairs for each gene of interest. Sequencing the PCR products would also increase my confidence that the correct product is being quantified. Once a short-list of validated transcripts has been selected for study, QRT-PCR with probe-based detection, rather than the non-specific SYBR-green staining, would improve detection reliability and allow for multiplexing of reactions (Grace et al., 2003).

8.0 Conclusion

This body of work represents a unique, global view of the transcriptional events taking place during P. taeda somatic embryo germination. At the outset of the work, I set out three hypotheses. First, I hypothesized that the developmental changes observed in germinating somatic embryos would be correlated with a large number of differences in gene expression. Second, I hypothesized that sampling P. taeda somatic embryos at two time-points post-imbibition would allow me to obtain an initial view of the kinetic behaviour of those differentially expressed genes associated with germination. Finally, I hypothesized that comparative global gene expression analysis of P. taeda germinants grown on different carbohydrate sources would reveal genes and associated biochemical pathways whose activities are uniquely correlated with growth on different carbon sources. In order to test my hypotheses, I first demonstrated that cross-species microarray hybridizations on Picea cDNA arrays could successfully report the patterns of gene expression in related conifer species. I then applied this strategy to determine which P.
taeda genes are active during two different developmental stages in the somatic embryo germination process. Expression of a large portion of the P. taeda genome (>70%) appeared to be modulated at some point during the germination process, which confirmed my first hypothesis. Many of those genes also showed marked differences in their expression at the two different time-points sampled (7 days and 10 days), which supported my second hypothesis. However, comparison of the gene expression profiles of germinants grown on different carbon sources and sampled at the same time revealed very few statistically supported differences in gene expression. This may be largely due to the loss of statistical power associated with indirect comparisons, but it could also reflect the over-riding impact of the developmental programming that underpins germination, even when the latter occurs under sub-optimal nutritional conditions. Nevertheless, the data I obtained failed to support my third hypothesis.

My analysis of the differentially expressed genes in germinating somatic embryos then focused on those associated with a subset of key metabolic pathways. The strong up-regulation of an alpha-galactosidase transcript led to a detailed exploration of the raffinose synthesis and degradation pathway. Numerous other alpha-galactosidase transcripts were found to be differentially regulated, consistent with rapid degradation of raffinose stored in the dormant embryo as it begins to germinate. In contrast, genes encoding raffinose synthesis enzymes remained transcriptionally inactive. Similar analysis revealed that the shikimate pathway is likely up-regulated in germinating seedlings, indicative of increased primary and secondary metabolism. The pattern of expression changes led me to predict a shift towards phenylalanine biosynthesis during germination.
ESTs representing the three enzymes of the general phenylpropanoid pathway were also up-regulated, consistent with the predicted metabolic shift toward production and conversion of phenylalanine to phenylpropanoid metabolites. Flavonoid biosynthesis, one of the down-stream branches of general phenylpropanoid biosynthesis, also appeared to be induced. Specifically, anthocyanin biosynthesis-related genes were up-regulated, a pattern that is correlated with the appearance of red pigmentation in the germinating somatic embryos. Gene expression patterns were also consistent with concomitant induction of lignan and flavonol biosynthesis during germination. Finally, the conspicuous up-regulation of a MYB4 homologue transcription factor suggests that there may be a significant metabolic transition taking place during the germination stages under study; namely, a priority shift from potentially defensive flavonoid compounds to more structural lignin-based ones.

Better temporal resolution of these expression patterns, either by extending the germination time series beyond the 10-day sampling point, or by adding a number of earlier time points, would make it possible to substantiate some of the trends observed in the present study. Detailed biochemical analysis of the metabolites being formed and degraded during the germination process would also make a logical extension of the gene expression analyses. Such data would allow me to corroborate the metabolic predictions that have, at this point, been based solely on expression of the genes encoding the relevant enzymes. Finally, the massive scale of differential gene expression detected in germinating somatic embryos may offer an opportunity to identify specific expression events that would be predictive of the rate of conversion of somatic embryos to established seedlings within a given population of embryos.
In the context of industrial-scale tissue culture process refinement, such DNA- or protein-based biomarkers could have great value. Further progress toward that goal would require additional experimentation comparing global transcriptional profiles of embryo populations that were shown to ultimately perform well, or poorly, in the subsequent conversion stages.

9.0 Bibliography

Alba, R., Fei, Z., Payton, P., Liu, Y., Moore, S.L., Debbie, P., Cohn, J., D'Ascenzo, M., Gordon, J.S., Rose, J.K., Martin, G., Tanksley, S.D., Bouzayen, M., Jahn, M.M., and Giovannoni, J. (2004). ESTs, cDNA microarrays, and gene expression profiling: tools for dissecting plant physiology and development. Plant J 39, 697-714.
Almoguera, C., and Jordano, J. (1992). Developmental and environmental concurrent expression of sunflower dry-seed-stored low-molecular-weight heat-shock protein and Lea mRNAs. Plant Mol Biol 19, 781-792.
Anterola, A.M., Jeon, J.H., Davin, L.B., and Lewis, N.G. (2002). Transcriptional control of monolignol biosynthesis in Pinus taeda: factors affecting monolignol ratios and carbon allocation in phenylpropanoid metabolism. J Biol Chem 277, 18272-18280.
ARBOREA. (
Archibald, J.M., and Keeling, P.J. (2002). Recycled plastids: a 'green movement' in eukaryotic evolution. Trends Genet 18, 577-584.
ATHENA. (
Attree, S.M., and Fowke, L.C. (1993). Embryogeny of gymnosperms: advances in synthetic seed technology of conifers. Plant Cell, Tissue and Organ Culture 35, 1-35.
Becher, M., Talke, I.N., Krall, L., and Kramer, U. (2004). Cross-species microarray transcript profiling reveals high constitutive expression of metal homeostasis genes in shoots of the zinc hyperaccumulator Arabidopsis halleri. Plant J 37, 251-268.
Behal, R.H., and Oliver, D.J. (1998). NAD(+)-dependent isocitrate dehydrogenase from Arabidopsis thaliana. Characterization of two closely related subunits. Plant Mol Biol 36, 691-698.
Bell-Lelong, D.A., Cusumano, J.C., Meyer, K., and Chapple, C. (1997).
Cinnamate-4-hydroxylase expression in Arabidopsis. Regulation in response to development and the environment. Plant Physiol 113, 729-738.
Besendorfer, V., Krajacic-Sokol, I., Jelenic, S., Puizina, J., Mlinarec, J., Sviben, T., and Papes, D. (2005). Two classes of 5S rDNA unit arrays of the silver fir, Abies alba Mill.: structure, localization and evolution. Theor Appl Genet 110, 730-741.
Bewley, J.D. (1997). Seed Germination and Dormancy. Plant Cell 9, 1055-1066.
Black, M., Corbineau, F., Gee, H., and Come, D. (1999). Water content, raffinose, and dehydrins in the induction of desiccation tolerance in immature wheat embryos. Plant Physiol 120, 463-472.
Bomal, C., Le, V.Q., and Tremblay, F.M. (2002). Induction of tolerance to fast desiccation in black spruce (Picea mariana) somatic embryos: relationship between partial water loss, sugars, and dehydrins. Physiol Plant 115, 523-530.
Brinker, M., van Zyl, L., Liu, W., Craig, D., Sederoff, R.R., Clapham, D.H., and von Arnold, S. (2004). Microarray analyses of gene expression during adventitious root development in Pinus contorta. Plant Physiol 135, 1526-1539.
Brunner, A.M., Yakovlev, I.A., and Strauss, S.H. (2004a). Validating internal controls for quantitative plant gene expression studies. BMC Plant Biol 4, 14.
Brunner, A.M., Busov, V.B., and Strauss, S.H. (2004b). Poplar genome sequence: functional genomics in an ecologically dominant plant species. Trends Plant Sci 9, 49-56.
Busk, P.K., and Pages, M. (1998). Regulation of abscisic acid-induced transcription. Plant Mol Biol 37, 425-435.
Bustin, S.A. (2002). Quantification of mRNA using real-time reverse transcription PCR (RT-PCR): trends and problems. J Mol Endocrinol 29, 23-39.
Bustin, S.A., and Nolan, T. (2004). Pitfalls of quantitative real-time reverse-transcription polymerase chain reaction. J Biomol Tech 15, 155-166.
Butland, S.L., Chow, M.L., and Ellis, B.E. (1998).
A diverse family of phenylalanine ammonia-lyase genes expressed in pine trees and cell cultures. Plant Mol Biol 37, 15-24.
Carmi, N., Zhang, G., Petreikov, M., Gao, Z., Eyal, Y., Granot, D., and Schaffer, A.A. (2003). Cloning and functional expression of alkaline alpha-galactosidase from melon fruit: similarity to plant SIP proteins uncovers a novel family of plant glycosyl hydrolases. Plant J 33, 97-106.
Chakravarthy, S., Tuori, R.P., D'Ascenzo, M.D., Fobert, P.R., Despres, C., and Martin, G.B. (2003). The tomato transcription factor Pti4 regulates defense-related gene expression via GCC box and non-GCC box cis elements. Plant Cell 15, 3033-3050.
Chalmers, A.D., Goldstone, K., Smith, J.C., Gilchrist, M., Amaya, E., and Papalopulu, N. (2005). A Xenopus tropicalis oligonucleotide microarray works across species using RNA from Xenopus laevis. Mech Dev 122, 355-363.
Chandrasekharan, M.B., Bishop, K.J., and Hall, T.C. (2003a). Module-specific regulation of the beta-phaseolin promoter during embryogenesis. Plant J 33, 853-866.
Chandrasekharan, M.B., Li, G., Bishop, K.J., and Hall, T.C. (2003b). S phase progression is required for transcriptional activation of the beta-phaseolin promoter. J Biol Chem 278, 45397-45405.
Chatthai, M., Osusky, M., Osuska, L., Yevtushenko, D., and Misra, S. (2004). Functional analysis of a Douglas-fir metallothionein-like gene promoter: transient assays in zygotic and somatic embryos and stable transformation in transgenic tobacco. Planta 220, 118-128.
Chen, X., Wang, B., and Wu, R. (1995). A gibberellin-stimulated ubiquitin-conjugating enzyme gene is involved in alpha-amylase gene expression in rice aleurone. Plant Mol Biol 29, 787-795.
Churchill, G.A. (2002). Fundamentals of experimental design for cDNA microarrays. Nat Genet supplement 32, 490-495.
Colliver, S., Bovy, A., Collins, G., Muir, S., Robinson, S., de Vos, C.H.R., and Verhoeyen, M.E. (2002).
Improving the nutritional content of tomatoes through reprogramming their flavonoid biosynthetic pathway. Phytochemistry Reviews 1, 113-123.
Cornah, J.E., Germain, V., Ward, J.L., Beale, M.H., and Smith, S.M. (2004). Lipid utilization, gluconeogenesis, and seedling growth in Arabidopsis mutants lacking the glyoxylate cycle enzyme malate synthase. J Biol Chem 279, 42916-42923.
Costa, M.A., Collins, R.E., Anterola, A.M., Cochrane, F.C., Davin, L.B., and Lewis, N.G. (2003). An in silico assessment of gene function and organization of the phenylpropanoid pathway metabolic networks in Arabidopsis thaliana and limitations thereof. Phytochemistry 64, 1097-1112.
Costa, M.A., Bedgar, D.L., Moinuddin, S.G., Kim, K.W., Cardenas, C.L., Cochrane, F.C., Shockey, J.M., Helms, G.L., Amakura, Y., Takahashi, H., Milhollan, J.K., Davin, L.B., Browse, J., and Lewis, N.G. (2005). Characterization in vitro and in vivo of the putative multigene 4-coumarate:CoA ligase network in Arabidopsis: syringyl lignin and sinapate/sinapyl alcohol derivative formation. Phytochemistry 66, 2072-2091.
Cukovic, D., Ehlting, J., VanZiffle, J.A., and Douglas, C.J. (2001). Structure and evolution of 4-coumarate:coenzyme A ligase (4CL) gene families. Biol Chem 382, 645-654.
Czechowski, T., Bari, R.P., Stitt, M., Scheible, W.R., and Udvardi, M.K. (2004). Real-time RT-PCR profiling of over 1400 Arabidopsis transcription factors: unprecedented sensitivity reveals novel root- and shoot-specific genes. Plant J 38, 366-379.
Czechowski, T., Stitt, M., Altmann, T., Udvardi, M.K., and Scheible, W.R. (2005). Genome-wide identification and testing of superior reference genes for transcript normalization in Arabidopsis. Plant Physiol 139, 5-17.
de los Reyes, B.G., and McGrath, J.M. (2003). Cultivar-specific seedling vigor and expression of a putative oxalate oxidase germin-like protein in sugar beet (Beta vulgaris L.). Theor Appl Genet 107, 54-61.
de los Reyes, B.G., Myers, S.J., and McGrath, J.M. (2003).
Differential induction of glyoxylate cycle enzymes by stress as a marker for seedling vigor in sugar beet (Beta vulgaris). Mol Genet Genomics 269, 692-698.
Dicko, M.H., Gruppen, H., Traore, A.S., van Berkel, W.J., and Voragen, A.G. (2005). Evaluation of the effect of germination on phenolic compounds and antioxidant activities in sorghum varieties. J Agric Food Chem 53, 2581-2588.
Dijkwel, P.P., Kock, P., Bezemer, R., Weisbeek, P.J., and Smeekens, S. (1996). Sucrose Represses the Developmentally Controlled Transient Activation of the Plastocyanin Gene in Arabidopsis thaliana Seedlings. Plant Physiol 110, 455-463.
Dixon, R.A., and Paiva, N.L. (1995). Stress-Induced Phenylpropanoid Metabolism. Plant Cell 7, 1085-1097.
Dong, J., Keller, W.A., Yan, W., and Georges, F. (2004). Gene expression at early stages of Brassica napus seed development as revealed by transcript profiling of seed-abundant cDNAs. Planta 218, 483-491.
Dong, J.Z., and Dunstan, D.I. (1996). Expression of abundant mRNAs during somatic embryogenesis of white spruce [Picea glauca (Moench) Voss]. Planta 199, 459-466.
Douglas, C. (1996). Phenylpropanoid metabolism and lignin biosynthesis: from weeds to trees. Trends in Plant Science 1, 171.
Douglas, C.J., and Ehlting, J. (2005). Arabidopsis thaliana full genome longmer microarrays: a powerful gene discovery tool for agriculture and forestry. Transgenic Res 14, 551-561.
Downie, B., Gurusinghe, S., Dahal, P., Thacker, R.R., Snyder, J.C., Nonogaki, H., Yim, K., Fukanaga, K., Alvarado, V., and Bradford, K.J. (2003). Expression of a GALACTINOL SYNTHASE gene in tomato seeds is up-regulated before maturation desiccation and again after imbibition whenever radicle protrusion is prevented. Plant Physiol 131, 1347-1359.
Downie, B., and Bewley, J.D. (2000). Soluble sugar content of white spruce (Picea glauca) seeds during and after germination. Physiologia Plantarum, 1-12.
Dure, L., and Waters, L. (1965).
Long-Lived Messenger RNA: Evidence from Cotton Seed Germination. Science 147, 410-412.
Ehlting, J., Buttner, D., Wang, Q., Douglas, C.J., Somssich, I.E., and Kombrink, E. (1999). Three 4-coumarate:coenzyme A ligases in Arabidopsis thaliana represent two evolutionarily divergent classes in angiosperms. Plant J 19, 9-20.
Ehlting, J., Mattheus, N., Aeschliman, D.S., Li, E., Hamberger, B., Cullis, I.F., Zhuang, J., Kaneda, M., Mansfield, S.D., Samuels, L., Ritland, K., Ellis, B.E., Bohlmann, J., and Douglas, C.J. (2005). Global transcript profiling of primary stems from Arabidopsis thaliana identifies candidate genes for missing links in lignin biosynthesis and transcriptional regulators of fiber differentiation. Plant J 42, 618-640.
El Meskaoui, A., and Tremblay, F.M. (2001). Involvement of ethylene in the maturation of black spruce embryogenic cell lines with different maturation capacities. J Exp Bot 52, 761-769.
Enard, W., Khaitovich, P., Klose, J., Zollner, S., Heissig, F., Giavalisco, P., Nieselt-Struwe, K., Muchmore, E., Varki, A., Ravid, R., Doxiadis, G.M., Bontrop, R.E., and Paabo, S. (2002). Intra- and interspecific variation in primate gene expression patterns. Science 296, 340-343.
Entus, R., Poling, M., and Herrmann, K.M. (2002). Redox regulation of Arabidopsis 3-deoxy-D-arabino-heptulosonate 7-phosphate synthase. Plant Physiol 129, 1866-1871.
Etienne-Barry, D., Bertrand, B., Vasquez, N., and Etienne, H. (1999). Direct sowing of Coffea arabica somatic embryos mass-produced in bioreactor and regeneration of plants. Plant Cell Reports 19, 111-117.
Fernie, A.R., Carrari, F., and Sweetlove, L.J. (2004). Respiratory metabolism: glycolysis, the TCA cycle and mitochondrial electron transport. Curr Opin Plant Biol 7, 254-261.
Feurtado, J.A., Banik, M., and Bewley, J.D. (2001). The cloning and characterization of alpha-galactosidase present during and following germination of tomato (Lycopersicon esculentum Mill.) seed. J Exp Bot 52, 1239-1249.
Gachon, C.M.
, Langlois-Meurinne, M., Henry, Y., and Saindrenan, P. (2005). Transcriptional co-regulation of secondary metabolism enzymes in Arabidopsis: functional and evolutionary implications. Plant Mol Biol 58, 229-245.
Gallardo, K., Job, C., Groot, S.P., Puype, M., Demol, H., Vandekerckhove, J., and Job, D. (2001). Proteomic analysis of Arabidopsis seed germination and priming. Plant Physiol 126, 835-848.
Gallois, P. (2001). Future of early embryogenesis studies in Arabidopsis thaliana. C R Acad Sci III 324, 569-573.
Gamborg, O.L. (2002). Plant tissue culture. Biotechnology. Milestones. In vitro cell developmental biology-plant, 84-92.
Gang, D.R., Kasahara, H., Xia, Z.Q., Vander Mijnsbrugge, K., Bauw, G., Boerjan, W., Van Montagu, M., Davin, L.B., and Lewis, N.G. (1999). Evolution of plant defense mechanisms. Relationships of phenylcoumaran benzylic ether reductases to pinoresinol-lariciresinol and isoflavone reductases. J Biol Chem 274, 7516-7527.
Ge, X., Dietrich, C., Matsuno, M., Li, G., Berg, H., and Xia, Y. (2005). An Arabidopsis aspartic protease functions as an anti-cell-death component in reproduction and embryogenesis. EMBO Rep 6, 282-288.
Gilad, Y., Rifkin, S.A., Bertone, P., Gerstein, M., and White, K.P. (2005). Multi-species microarrays reveal the effect of sequence divergence on gene expression profiles. Genome Res 15, 674-680.
Girke, T., Todd, J., Ruuska, S., White, J., Benning, C., and Ohlrogge, J. (2000). Microarray analysis of developing Arabidopsis seeds. Plant Physiol 124, 1570-1581.
Gius, D., Bradbury, C.M., Sun, L., Awwad, R.T., Huang, L., Smart, D.D., Bisht, K.S., Ho, A.S., and Nguyen, P. (2005). The epigenome as a molecular marker and target. Cancer 104, 1789-1793.
Grace, M.B., McLeland, C.B., Gagliardi, S.J., Smith, J.M., Jackson, W.E., 3rd, and Blakely, W.F. (2003). Development and assessment of a quantitative reverse transcription-PCR assay for simultaneous measurement of four amplicons. Clin Chem 49, 1467-1475.
Guillet, G., Poupart, J., Basurco, J., and De Luca, V. (2000). Expression of tryptophan decarboxylase and tyrosine decarboxylase genes in tobacco results in altered biochemical and physiological phenotypes. Plant Physiol 122, 933-943.
Guimaraes, V.M., de Rezende, S.T., Moreira, M.A., de Barros, E.G., and Felix, C.R. (2001). Characterization of alpha-galactosidases from germinating soybean seed and their use for hydrolysis of oligosaccharides. Phytochemistry 58, 67-73.
Gupta, P.K., and Pullman, G.S. (1987). Biotechnology of somatic polyembryogenesis and plantlet regeneration in loblolly pine. Bio/Technology 5, 147-151.
Hahlbrock, K., and Grisebach, H. (1979). Enzymatic controls in biosynthesis of lignin and flavonoids. Annual Review of Plant Physiology 30, 105-130.
Hakman, I., and Oliviusson, P. (2002). High expression of putative aquaporin genes in cells with transporting and nutritive functions during seed development in Norway spruce (Picea abies). J Exp Bot 53, 639-649.
Harari-Steinberg, O., Ohad, I., and Chamovitz, D.A. (2001). Dissection of the light signal transduction pathways regulating the two early light-induced protein genes in Arabidopsis. Plant Physiol 127, 986-997.
Harding, S.A., Leshkevich, J., Chiang, V.L., and Tsai, C.J. (2002). Differential substrate inhibition couples kinetically distinct 4-coumarate:coenzyme A ligases with spatially distinct metabolic roles in quaking aspen. Plant Physiol 128, 428-438.
Hattori, T., Totsuka, M., Hobo, T., Kagaya, Y., and Yamamoto-Toyoda, A. (2002). Experimentally determined sequence requirement of ACGT-containing abscisic acid response element. Plant Cell Physiol 43, 136-140.
Hemm, M.R., Herrmann, K.M., and Chapple, C. (2001). AtMYB4: a transcription factor general in the battle against UV. Trends Plant Sci 6, 135-136.
Henstrand, J.M., McCue, K.F., Brink, K., Handa, A.K., Herrmann, K.M., and Conn, E.E. (1992).
Light and Fungal Elicitor Induced 3-Deoxy-D-arabino-Heptulosonate 7-phosphate Synthase mRNA in Suspension Cultures of Parsley (Petroselinum crispum L.). Plant Physiol 98, 761-763.
Herrmann, K.M. (1995). The Shikimate Pathway: Early Steps in the Biosynthesis of Aromatic Compounds. Plant Cell 7, 907-919.
Herrmann, K.M., and Weaver, L.M. (1999). The Shikimate Pathway. Annu Rev Plant Physiol Plant Mol Biol 50, 473-503.
Herzog, M., Dorne, A.M., and Grellet, F. (1995). GASA, a gibberellin-regulated gene family from Arabidopsis thaliana related to the tomato GAST1 gene. Plant Mol Biol 27, 743-752.
Hodgson, A.V., and Strobel, H.W. (1996). Characterization of the FAD binding domain of cytochrome P450 reductase. Arch Biochem Biophys 325, 99-106.
Hogberg, K.A., Bozhkov, P.V., and Von Arnold, S. (2003). Early selection improves clonal performance and reduces intraclonal variation of Norway spruce plants propagated by somatic embryogenesis. Tree Physiol 23, 211-216.
Huber, W., von Heydebreck, A., Sultmann, H., Poustka, A., and Vingron, M. (2002). Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics 18 Suppl 1, S96-104.
Hughes, A.L. (2005). Gene duplication and the origin of novel proteins. Proc Natl Acad Sci U S A 102, 8791-8792.
Iraqi, D., and Tremblay, F.M. (2001). The role of sucrose during maturation of black spruce (Picea mariana) and white spruce (Picea glauca) somatic embryos. Physiol Plant 111, 381-388.
Ji, W., Zhou, W., Gregg, K., Yu, N., and Davis, S. (2004). A method for cross-species gene expression analysis with high-density oligonucleotide arrays. Nucleic Acids Res 32, e93.
Jin, H., Cominelli, E., Bailey, P., Parr, A., Mehrtens, F., Jones, J., Tonelli, C., Weisshaar, B., and Martin, C. (2000). Transcriptional repression by AtMYB4 controls production of UV-protecting sunscreens in Arabidopsis. Embo J 19, 6150-6161.
Kadyrzhanova, D.K., Vlachonasios, K.E., Ververidis, P., and Dilley, D.R. (1998). Molecular cloning of a novel heat induced/chilling tolerance related cDNA in tomato fruit by use of mRNA differential display. Plant Mol Biol 36, 885-895. Kaffarnik, F., Heller, W., Hertkorn, N., and Sandermann, H., Jr. (2005). Flavonol 3-O-glycoside hydroxycinnamoyltransferases from Scots pine (Pinus sylvestris L.). Febs J 272, 1415-1424. Kao, Y.Y., Harding, S.A., and Tsai, C.J. (2002). Differential expression of two distinct phenylalanine ammonia-lyase genes in condensed tannin-accumulating and lignifying cells of quaking aspen. Plant Physiol 130, 796-807. Karner, U., Peterbauer, T., Raboy, V., Jones, D.A., Hedley, C.L., and Richter, A. (2004). myo-inositol and sucrose concentrations affect the accumulation of raffinose family oligosaccharides in seeds. J Exp Bot 55, 1981-1987. Karpinska, B., Karlsson, M., Srivastava, M., Stenberg, A., Schrader, J., Sterky, F., Bhalerao, R., and Wingsle, G. (2004). MYB transcription factors are differentially expressed and regulated during secondary vascular tissue development in hybrid aspen. Plant Mol Biol 56, 255-270. Keith, B., Dong, X.N., Ausubel, F.M., and Fink, G.R. (1991). Differential induction of 3-deoxy-D-arabino-heptulosonate 7-phosphate synthase genes in Arabidopsis thaliana by wounding and pathogenic attack. Proc Natl Acad Sci U S A 88, 8821-8825. Kirst, M., Johnson, A.F., Baucom, C., Ulrich, E., Hubbard, K., Staggs, R., Paule, C., Retzel, E., Whetten, R., and Sederoff, R. (2003). Apparent homology of expressed genes from wood-forming tissues of loblolly pine (Pinus taeda L.) with Arabidopsis thaliana. Proc Natl Acad Sci U S A 100, 7383-7388. Kita, A., Okajima, K., Morimoto, Y., Ikeuchi, M., and Miki, K. (2005). Structure of a cyanobacterial BLUF protein, Tll0078, containing a novel FAD-binding blue light sensor domain. J Mol Biol 349, 1-9. Kolosova, N., Miller, B., Ralph, S., Ellis, B.E., Douglas, C., Ritland, K., and Bohlmann, J.
(2004). Isolation of high-quality RNA from gymnosperm and angiosperm trees. Biotechniques 36, 821-824. Kolotelo, D., Van Steenis, E., Peterson, M., Bennett, R., Trotter, D., and Dennis, J. (2001). Seed Handling Guidebook. (Victoria: British Columbia Ministry of Forests). Koster, K.L., and Leopold, C. (1988). Sugar and Desiccation tolerance in seeds. Plant Physiol 88, 829-832. Kubasek, W.L., Ausubel, F.M., and Shirley, B.W. (1998). A light-independent developmental mechanism potentiates flavonoid gene expression in Arabidopsis seedlings. Plant Mol Biol 37, 217-223. Kubasek, W.L., Shirley, B.W., McKillop, A., Goodman, H.M., Briggs, W., and Ausubel, F.M. (1992). Regulation of Flavonoid Biosynthetic Genes in Germinating Arabidopsis Seedlings. Plant Cell 4, 1229-1236. Lane, B.G. (1991). Cellular desiccation and hydration: developmentally regulated proteins, and the maturation and germination of seed embryos. Faseb J 5, 2893-2901. Langhansova, L., Konradova, H., and Vanek, T. (2004). Polyethylene glycol and abscisic acid improve maturation and regeneration of Panax ginseng somatic embryos. Plant Cell Rep 22, 725-730. Lehle, L., and Tanner, W. (1973). The function of myo-inositol in the biosynthesis of raffinose. Purification and characterization of galactinol:sucrose 6-galactosyltransferase from Vicia faba seeds. Eur J Biochem 38, 103-110. Leon, P., and Sheen, J. (2003). Sugar and hormone connections. Trends Plant Sci 8, 110-116. Lipavská, H., and Konradova, H. (2004). Invited review: Somatic embryogenesis in conifers: the role of carbohydrate metabolism. In vitro cell developmental biology-plant 40, 23-30. Lippert, D., Zhuang, J., Ralph, S., Ellis, D.E., Gilbert, M., Olafson, R., Ritland, K., Ellis, B., Douglas, C.J., and Bohlmann, J. (2005). Proteome analysis of early somatic embryogenesis in Picea glauca. Proteomics 5, 461-473. Lipshutz, R.J., Fodor, S.P., Gingeras, T.R., and Lockhart, D.J. (1999). High density synthetic oligonucleotide arrays.
Nat Genet 21, 20-24. Litz, R.E., Moon, P.A., and Avila, V.M.C. (2005). SOMATIC EMBRYOGENESIS AND REGENERATION OF ENDANGERED CYCAD SPECIES. Acta Hort. (ISHS), 75-80. Liu, J., Hara, C., Umeda, M., Zhao, Y., Okita, T.W., and Uchimiya, H. (1995). Analysis of randomly isolated cDNAs from developing endosperm of rice (Oryza sativa L.): evaluation of expressed sequence tags, and expression levels of mRNAs. Plant Mol Biol 29, 685-689. Loring, J.F. (2005). Evolution of microarray analysis. Neurobiol Aging. Lu, C.A., Ho, T.H., Ho, S.L., and Yu, S.M. (2002). Three novel MYB proteins with one DNA binding repeat mediate sugar and hormone regulation of alpha-amylase gene expression. Plant Cell 14, 1963-1980. Lui, J.J.-J., Krenz, D.C. (1998a). Galactinol synthase (GS): increased enzyme activity and levels of mRNA due to cold and desiccation. Plant Science 134, 11-20. Lui, J.J.-J., Krenz, D.C., Galvez, A.F., de Lumen, B.O. (1998b). Galactinol synthase (GS): increased enzyme activity and levels of mRNA due to cold and desiccation. Plant Science, 11-20. Macheroux, P., Schmid, J., Amrhein, N., and Schaller, A. (1999). A unique reaction in a common pathway: mechanism and function of chorismate synthase in the shikimate pathway. Planta 207, 325-334. Malabadi, R.B., and Van Staden, J. (2005). Somatic embryogenesis from vegetative shoot apices of mature trees of Pinus patula. Tree Physiol 25, 11-16. Marraccini, P., Rogers, W.J., Caillet, V., Deshayes, A., Granato, D., Lausanne, F., Lechat, S., Pridmore, D., and Petiard, V. (2005). Biochemical and molecular characterization of alpha-d-galactosidase from coffee beans. Plant Physiol Biochem 43, 909-920. Martin, C., and Paz-Ares, J. (1997). MYB transcription factors in plants. Trends Genet 13, 67-73. Martinoia, E., Klein, M., Geisler, M., Bovet, L., Forestier, C., Kolukisaoglu, U., Muller-Rober, B., and Schulz, B. (2002). Multifunctionality of plant ABC transporters - more than just detoxifiers. Planta 214, 345-355.
McKenzie, D.J., McLean, M.A., Mukerji, S., and Green, M. (1997). Isolation of RNA from woody tissue using the RNeasy Plant Mini Kit. Plant disease 81, 222. Minghe, L., and Ritchie, G.A. (1999). Eight hundred years of clonal forestry in China: traditional afforestation with Chinese Fir (Cunninghamia lanceolata). New Forest 18, 131-142. Moody, D.E., Zou, Z., and McIntyre, L. (2002). Cross-species hybridisation of pig RNA to human nylon microarrays. BMC Genomics 3, 27. Morse, D.L., Carroll, D., Weberg, L., Borgstrom, M.C., Ranger-Moore, J., and Gillies, R.J. (2005). Determining suitable internal standards for mRNA quantification of increasing cancer progression in human breast cells by real-time reverse transcriptase polymerase chain reaction. Anal Biochem 342, 69-77. Mushegian, A.R., and Koonin, E.V. (1995). A putative FAD-binding domain in a distinct group of oxidases including a protein involved in plant development. Protein Sci 4, 1243-1244. Nakabayashi, K., Okamoto, M., Koshiba, T., Kamiya, Y., and Nambara, E. (2005). Genome-wide profiling of stored mRNA in Arabidopsis thaliana seed germination: epigenetic and genetic regulation of transcription in seed. Plant J 41, 697-709. Nambara, E., and Marion-Poll, A. (2003). ABA action and interactions in seeds. Trends Plant Sci 8, 213-217. Narusaka, Y., Nakashima, K., Shinwari, Z.K., Sakuma, Y., Furihata, T., Abe, H., Narusaka, M., Shinozaki, K., and Yamaguchi-Shinozaki, K. (2003). Interaction between two cis-acting elements, ABRE and DRE, in ABA-dependent expression of Arabidopsis rd29A gene in response to dehydration and high-salinity stresses. Plant J 34, 137-148. Nelson, D.R., Schuler, M.A., Paquette, S.M., Werck-Reichhart, D., and Bak, S. (2004). Comparative genomics of rice and Arabidopsis. Analysis of 727 cytochrome P450 genes and pseudogenes from a monocot and a dicot. Plant Physiol 135, 756-772. Nemeth, K.A., Singh, A.V., and Knudsen, T.B. (2005).
Searching for biomarkers of developmental toxicity with microarrays: normal eye morphogenesis in rodent embryos. Toxicol Appl Pharmacol 206, 219-228. Nieves, N., Blanco, M.A., Hernandez, M., Concepcion, O., Borroto, E., Martinez, M.E., and Gonzalez, A. (1998). Artificial endosperm of Cleopatra tangerine zygotic embryos: a model for somatic embryo encapsulation. Plant cell, tissue and organ culture 54, 77-83. Noel, J.P., Austin, M.B., and Bomati, E.K. (2005). Structure-function relationship in plant phenylpropanoid biosynthesis. Current opinions in Plant Biology 8, 249-253. Ogawa, M., Hanada, A., Yamauchi, Y., Kuwahara, A., Kamiya, Y., and Yamaguchi, S. (2003). Gibberellin biosynthesis and response during Arabidopsis seed germination. Plant Cell 15, 1591-1604. Pasquali, G., Erven, A.S., Ouwerkerk, P.B., Menke, F.L., and Memelink, J. (1999). The promoter of the strictosidine synthase gene from periwinkle confers elicitor-inducible expression in transgenic tobacco and binds nuclear factors GT-1 and GBF. Plant Mol Biol 39, 1299-1310. Patzlaff, A., McInnis, S., Courtenay, A., Surman, C., Newman, L.J., Smith, C., Bevan, M.W., Mansfield, S., Whetten, R.W., Sederoff, R.R., and Campbell, M.M. (2003). Characterisation of a pine MYB that regulates lignification. Plant J 36, 743-754. Pavy, N., Laroche, J., Bousquet, J., and Mackay, J. (2005a). Large-scale statistical analysis of secondary xylem ESTs in pine. Plant Mol Biol 57, 203-224. Pavy, N., Paule, C., Parsons, L., Crow, J.A., Morency, M.J., Cooke, J., Johnson, J.E., Noumen, E., Guillet-Claude, C., Butterfield, Y., Barber, S., Yang, G., Liu, J., Stott, J., Kirkpatrick, R., Siddiqui, A., Holt, R., Marra, M., Seguin, A., Retzel, E., Bousquet, J., and MacKay, J. (2005b). Generation, annotation, analysis and database integration of 16,500 white spruce EST clusters. BMC Genomics 6, 144. Pelletier, M.K., Murrell, J.R., and Shirley, B.W. (1997).
Characterization of flavonol synthase and leucoanthocyanidin dioxygenase genes in Arabidopsis. Further evidence for differential regulation of "early" and "late" genes. Plant Physiol 113, 1437-1445. Pelletier, M.K., Burbulis, I.E., and Winkel-Shirley, B. (1999). Disruption of specific flavonoid genes enhances the accumulation of flavonoid enzymes and end-products in Arabidopsis seedlings. Plant Mol Biol 40, 45-54. Penfield, S., Graham, S., and Graham, I.A. (2005). Storage reserve mobilization in germinating oilseeds: Arabidopsis as a model system. Biochem Soc Trans 33, 380-383. Percy, R.E., Livingston, N.J., Moran, J.A., and Von Aderkas, P. (2001). Desiccation, cryopreservation and water relations parameters of white spruce (Picea glauca) and interior spruce (Picea glauca x engelmannii complex) somatic embryos. Tree Physiol 21, 1303-1310. Peterbauer, T., Mach, L., Mucha, J., and Richter, A. (2002). Functional expression of a cDNA encoding pea (Pisum sativum L.) raffinose synthase, partial purification of the enzyme from maturing seeds, and steady-state kinetic analysis of raffinose synthesis. Planta 215, 839-846. Peterbauer, T., Lahuta, L.B., Blochl, A., Mucha, J., Jones, D.A., Hedley, C.L., Gorecki, R.J., and Richter, A. (2001). Analysis of the raffinose family oligosaccharide pathway in pea seeds with contrasting carbohydrate composition. Plant Physiol 127, 1764-1772. Pinheiro, C., Rodrigues, A.P., de Carvalho, I.S., Chaves, M.M., and Ricardo, C.P. (2005). Sugar metabolism in developing lupin seeds is affected by a short-term water deficit. J Exp Bot 56, 2705-2712. Pullman, G.S., Namjoshi, K., and Zhang, Y. (2003a). Somatic embryogenesis in loblolly pine (Pinus taeda L.): improving culture initiation with abscisic acid and silver nitrate. Plant Cell Rep 22, 85-95. Pullman, G.S., Zhang, Y., and Phan, B.H. (2003b). Brassinolide improves embryogenic tissue initiation in conifers and rice. Plant Cell Rep 22, 96-104. Pullman, G.S., Mein, J., Johnson, S., and Zhang, Y. (2005). Gibberellin inhibitors improve embryogenic tissue initiation in conifers. Plant Cell Rep 23, 596-605. Pullman, G.S., Johnson, S., Peter, G., Cairney, J., and Xu, N. (2003c). Improving loblolly pine somatic embryo maturation: comparison of somatic and zygotic embryo morphology, germination, and gene expression. Plant Cell Rep 21, 747-758. Qiagen. (2003). QuantiTect SYBR Green PCR handbook. (Qiagen). Radonic, A., Thulke, S., Mackay, I.M., Landt, O., Siegert, W., and Nitsche, A. (2004). Guideline to reference gene selection for quantitative real-time PCR. Biochem Biophys Res Commun 313, 856-862. Renn, S.C., Aubin-Horth, N., and Hofmann, H.A. (2004). Biologically meaningful expression profiling across species using heterologous hybridization to a cDNA microarray. BMC Genomics 5, 42. Rensink, W.A., and Buell, C.R. (2005). Microarray expression profiling resources for plant genomics. Trends in Plant Science 10, 603-609. Rise, M.L., von Schalburg, K.R., Brown, G.D., Mawer, M.A., Devlin, R.H., Kuipers, N., Busby, M., Beetz-Sargent, M., Alberto, R., Gibbs, A.R., Hunt, P., Shukin, R., Zeznik, J.A., Nelson, C., Jones, S.R., Smailus, D.E., Jones, S.J., Schein, J.E., Marra, M.A., Butterfield, Y.S., Stott, J.M., Ng, S.H., Davidson, W.S., and Koop, B.F. (2004). Development and application of a salmonid EST database and cDNA microarray: data mining and interspecific hybridization characteristics. Genome Res 14, 478-490. Ro, D.K., Mah, N., Ellis, B.E., and Douglas, C.J. (2001). Functional characterization and subcellular localization of poplar (Populus trichocarpa x Populus deltoides) cinnamate 4-hydroxylase. Plant Physiol 126, 317-329. Ro, D.K., Arimura, G., Lau, S.Y., Piers, E., and Bohlmann, J. (2005). Loblolly pine abietadienol/abietadienal oxidase PtAO (CYP720B1) is a multifunctional, multisubstrate cytochrome P450 monooxygenase. Proc Natl Acad Sci U S A 102, 8060-8065.
Robinson, A.R., Gheneim, R., Kozak, R.A., Ellis, D.D., and Mansfield, S.D. (2005). The potential of metabolic profiling as a selection tool for genotype discrimination in Populus. Journal of Experimental Botany 56, 2807-2819. Robles, C., Greff, S., Pasqualini, V., Garzino, S., Bousquet-Melou, A., Fernandez, C., Korboulewsky, N., and Bonin, G. (2003). Phenols and flavonoids in Aleppo pine needles as bioindicators of air pollution. J Environ Qual 32, 2265-2271. Rogers, L.A., Dubos, C., Cullis, I.F., Surman, C., Poole, M., Willment, J., Mansfield, S.D., and Campbell, M.M. (2005). Light, the circadian clock, and sugar perception in the control of lignin biosynthesis. J Exp Bot 56, 1651-1663. Rydin, C., Pedersen, K.R., and Friis, E.M. (2004). On the evolutionary history of Ephedra: Cretaceous fossils and extant molecules. Proc Natl Acad Sci U S A 101, 16571-16576. Saal, L.H., Troein, C., Vallon-Christersson, J., Gruvberger, S., Borg, A., and Peterson, C. (2002). BioArray Software Environment (BASE): a platform for comprehensive management and analysis of microarray data. Genome Biol 3, SOFTWARE0003. Saleem, M., Kim, H.J., Ali, M.S., and Lee, Y.S. (2005). An update on bioactive plant lignans. Nat Prod Rep 22, 696-716. Saravitz, D.M., Pharr, D.M., and Carter Jr, T.E. (1987). Galactinol Synthase Activity and Soluble Sugars in Developing Seeds of Four Soybean Genotypes. Plant Physiol 83, 185-189. Sato, N. (2001). Was the evolution of plastid genetic machinery discontinuous? Trends Plant Sci 6, 151-155. Schneider, K., Kienow, L., Schmelzer, E., Colby, T., Bartsch, M., Miersch, O., Wasternack, C., Kombrink, E., and Stuible, H.P. (2005). A new type of peroxisomal acyl-coenzyme A synthetase from Arabidopsis thaliana has the catalytic capacity to activate biosynthetic precursors of jasmonic acid. J Biol Chem 280, 13962-13972. Schultz, R.P. (1999). Loblolly: the pine for the twenty-first century. New Forest 17, 71-88. Sheoran, I.S., Olson, D.J., Ross, A.R., and Sawhney, V.K.
(2005). Proteome analysis of embryo and endosperm from germinating tomato seeds. Proteomics 5, 3752-3764. Shimada, N., Sasaki, R., Sato, S., Kaneko, T., Tabata, S., Aoki, T., and Ayabe, S. (2005). A comprehensive analysis of six dihydroflavonol 4-reductases encoded by a gene cluster of the Lotus japonicus genome. J Exp Bot 56, 2573-2585. Shirley, B.W., Hanley, S., and Goodman, H.M. (1992). Effects of ionizing radiation on a plant genome: analysis of two Arabidopsis transparent testa mutations. Plant Cell 4, 333-347. Sidler, M., Hassa, P., Hasan, S., Ringli, C., and Dudler, R. (1998). Involvement of an ABC transporter in a developmental pathway regulating hypocotyl cell elongation in the light. Plant Cell 10, 1623-1636. Simoes, I., and Faro, C. (2004). Structure and function of plant aspartic proteinases. Eur J Biochem 271, 2067-2075. Slonim, D.K. (2002). From patterns to pathways: gene expression data analysis comes of age. Nat Genet 32 Suppl, 502-508. Stasolla, C., van Zyl, L., Egertsdotter, U., Craig, D., Liu, W., and Sederoff, R.R. (2003). The effects of polyethylene glycol on gene expression of developing white spruce somatic embryos. Plant Physiol 131, 49-60. Stasolla, C., Belmonte, M.F., van Zyl, L., Craig, D.L., Liu, W., Yeung, E.C., and Sederoff, R.R. (2004a). The effect of reduced glutathione on morphology and gene expression of white spruce (Picea glauca) somatic embryos. J Exp Bot 55, 695-709. Stasolla, C., Bozhkov, P.V., Chu, T.M., Van Zyl, L., Egertsdotter, U., Suarez, M.F., Craig, D., Wolfinger, R.D., Von Arnold, S., and Sederoff, R.R. (2004b). Variation in transcript abundance during somatic embryogenesis in gymnosperms. Tree Physiol 24, 1073-1085. Storey, B.T., Noiles, E.E., and Thompson, K.A. (1998). Comparison of glycerol, other polyols, trehalose, and raffinose to provide a defined cryoprotectant medium for mouse sperm cryopreservation. Cryobiology 37, 46-58. Storey, J.D., and Tibshirani, R. (2003).
Statistical significance for genomewide studies. Proc Natl Acad Sci U S A 100, 9440-9445. Stracke, R., Werber, M., and Weisshaar, B. (2001). The R2R3-MYB gene family in Arabidopsis thaliana. Curr Opin Plant Biol 4, 447-456. Sturn, A., Quackenbush, J., and Trajanoski, Z. (2002). Genesis: cluster analysis of microarray data. Bioinformatics 18, 207-208. Sugimoto, M., and Sakamoto, W. (1997). Putative phospholipid hydroperoxide glutathione peroxidase gene from Arabidopsis thaliana induced by oxidative stress. Genes Genet Syst 72, 311-316. Sullivan, J.A., and Deng, X.W. (2003). From seed to seed: the role of photoreceptors in Arabidopsis development. Dev Biol 260, 289-297. Sun, C., and Callis, J. (1997). Independent modulation of Arabidopsis thaliana polyubiquitin mRNAs in different organs and in response to environmental changes. The Plant Journal 11, 1017-1027. Sundblad, L.G., Andersson, M., Geladi, P., Salomonson, A., and Sjostrom, M. (2001). Fast, nondestructive measurement of frost hardiness in conifer seedlings by VIS+NIR spectroscopy. Tree Physiol 21, 751-757. Sutton, B. (2002). Commercial delivery of genetic improvement to conifer plantations using somatic embryogenesis. Annals of forest science 59, 657-661. Suzich, J.A., Dean, J.F.D., and Herrmann, K.M. (1985). 3-deoxy-D-Heptulosonate 7-phosphate Synthase from Carrot Root (Daucus carota) is a Hysteretic Enzyme. Plant Physiol 79, 765-770. Taber, R.P., Zhang, C., and Hu, W. (1998). Kinetics of Douglas-fir (Pseudotsuga menziesii) somatic embryo development. Canadian Journal of Botany 76, 863-871. Taiz, L., and Zeiger, E. (1998). Plant Physiology. (Sinauer Associates Inc.). Terauchi, K., Asakura, T., Nishizawa, N.K., Matsumoto, I., and Abe, K. (2004). Characterization of the genes for two soybean aspartic proteinases and analysis of their different tissue-dependent expression. Planta 218, 947-957. Thibaud-Nissen, F., Shealy, R.T., Khanna, A., and Vodkin, L.O. (2003).
Clustering of microarray data reveals transcript patterns associated with somatic embryogenesis in soybean. Plant Physiol 132, 118-136. TIGR. ( Tommasi, F., Paciolla, C., de Pinto, M.C., and De Gara, L. (2001). A comparative study of glutathione and ascorbate metabolism during germination of Pinus pinea L. seeds. J Exp Bot 52, 1647-1654. Treenomix. ( Tsanuo, M.K., Hassanali, A., Hooper, A.M., Khan, Z., Kaberia, F., Pickett, J.A., and Wadhams, L.J. (2003). Isoflavanones from the allelopathic aqueous root exudate of Desmodium uncinatum. Phytochemistry 64, 265-273. van Zyl, L., Bozhkov, P.V., Clapham, D.H., Sederoff, R.R., and von Arnold, S. (2003). Up, down and up again is a signature global gene expression pattern at the beginning of gymnosperm embryogenesis. Gene Expr Patterns 3, 83-91. Vandesompele, J., De Preter, K., Pattyn, F., Poppe, B., Van Roy, N., De Paepe, A., and Speleman, F. (2002). Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes. Genome Biol 3, RESEARCH0034. Vannini, C., Locatelli, F., Bracale, M., Magnani, E., Marsoni, M., Osnato, M., Mattana, M., Baldoni, E., and Coraggio, I. (2004). Overexpression of the rice Osmyb4 gene increases chilling and freezing tolerance of Arabidopsis thaliana plants. Plant J 37, 115-127. Verhoeyen, M.E., Bovy, A., Collins, G., Muir, S., Robinson, S., de Vos, C.H., and Colliver, S. (2002). Increasing antioxidant levels in tomatoes through modification of the flavonoid biosynthetic pathway. J Exp Bot 53, 2099-2106. Vicente-Carbajosa, J., and Carbonero, P. (2005). Seed maturation: developing an intrusive phase to accomplish a quiescent state. Int J Dev Biol 49, 645-651. Volpin, H., Phillips, D.A., Okon, Y., and Kapulnik, Y. (1995). Suppression of an Isoflavonoid Phytoalexin Defense Response in Mycorrhizal Alfalfa Roots. Plant Physiol 108, 1449-1454. Voo, K.S., Whetten, R.W., O'Malley, D.M., and Sederoff, R.R. (1995).
4-coumarate:coenzyme A ligase from loblolly pine xylem. Isolation, characterization, and complementary DNA cloning. Plant Physiol 108, 85-97. Walker, D.R., and Parrott, W.A. (2001). Effects of polyethylene glycol and sugar alcohols on soybean somatic embryo germination and conversion. Plant cell, tissue and organ culture 64, 55-62. Wang, T., Aitken, S.N., Woods, J.H., Polsson, K., and Magnussen, S. (2004a). Effects of inbreeding on coastal Douglas fir growth and yield in operational plantations: a model-based approach. Theor Appl Genet 108, 1162-1171. Wang, X., Tank, D.C., and Sang, T. (2000). Phylogeny and divergence times in Pinaceae: evidence from three genomes. Molecular Biology Evolution 17, 773-781. Wang, Z., Dooley, T.P., Curto, E.V., Davis, R.L., and VandeBerg, J.L. (2004b). Cross-species application of cDNA microarrays to profile gene expression using UV-induced melanoma in Monodelphis domestica as the model system. Genomics 83, 588-599. Wehmeyer, N., and Vierling, E. (2000). The expression of small heat shock proteins in seeds responds to discrete developmental signals and suggests a general protective role in desiccation tolerance. Plant Physiol 122, 1099-1108. Weichert, H., Kolbe, A., Kraus, A., Wasternack, C., and Feussner, I. (2002). Metabolic profiling of oxylipins in germinating cucumber seedlings - lipoxygenase-dependent degradation of triacylglycerols and biogenesis of volatile aldehydes. Planta, 612-619. Whetten, R., Y., S., Zhang, Y., and Sederoff, R. (2001). Functional genomics and cell wall biosynthesis in loblolly pine. Plant Mol Biol, 275-291. Whitbred, J.M., and Schuler, M.A. (2000). Molecular characterization of CYP73A9 and CYP82A1 P450 genes involved in plant defense in pea. Plant Physiol 124, 47-58. White, C.N., and Rivin, C.J. (1995).
Characterization and expression of a cDNA encoding a seed-specific metallothionein in maize. Plant Physiol 108, 831-832. Williams, C.G. (2001). Forestry's third revolution: integrating biotechnology into Pinus taeda L. breeding programs. Southern journal of applied forestry 25, 116-121. Wolucka, B.A., Goossens, A., and Inze, D. (2005). Methyl jasmonate stimulates the de novo biosynthesis of vitamin C in plant cell suspensions. J Exp Bot 56, 2527-2538. Wu, Y., Sharp, R.E., Durachko, D.M., and Cosgrove, D.J. (1996). Growth maintenance of the maize primary root at low water potentials involves increases in cell-wall extension properties, expansin activity, and wall susceptibility to expansins. Plant Physiol 111, 765-772. www.cgi. www.wv•rick.sbs.wsu.edu/cgi-bin/Athena/cgi/home.pl. Xiao, L., and Koster, K.L. (2001). Desiccation tolerance of protoplasts isolated from pea embryos. J Exp Bot 52, 2105-2114. Xu, W., Bak, S., Decker, A., Paquette, S.M., Feyereisen, R., and Galbraith, D.W. (2001). Microarray-based analysis of gene expression in very large gene families: the cytochrome P450 gene superfamily of Arabidopsis thaliana. Gene 272, 61-74. Xu, Y., and Johnson, C.H. (2001). A clock- and light-regulated gene that links the circadian oscillator to LHCB gene expression. The Plant Cell 13, 1411-1425. Yang, I.V., Chen, E., Hasseman, J.P., Liang, W., Frank, B.C., Wang, S., Sharov, V., Saeed, A.I., White, J., Li, J., Lee, N.H., Yeatman, T.J., and Quackenbush, J. (2002). Within the fold: assessing differential expression measures and reproducibility in microarray assays. Genome Biol 3, research0062. Yang, Y.H., and Speed, T. (2002). Design issues for cDNA microarray experiments. Nature Reviews: Genetics 3, 579-588. Yu, O., Shi, J., Hession, A.O., Maxwell, C.A., McGonigle, B., and Odell, J.T. (2003). Metabolic engineering to increase isoflavone biosynthesis in soybean seed.
Phytochemistry 63, 753-763. Zeng, Q.Y., Lu, H., and Wang, X.R. (2005). Molecular characterization of a glutathione transferase from Pinus tabulaeformis (Pinaceae). Biochimie 87, 445-455. Zhang, X.H., and Chiang, V.L. (1997). Molecular cloning of 4-coumarate:coenzyme A ligase in loblolly pine and the roles of this enzyme in the biosynthesis of lignin in compression wood. Plant Physiol 113, 65-74. Zimmermann, P., Hirsch-Hoffmann, M., Hennig, L., and Gruissem, W. (2004). GENEVESTIGATOR. Arabidopsis microarray database and analysis toolbox. Plant Physiol 136, 2621-2632. Zuther, E., Buchel, K., Hundertmark, M., Stitt, M., Hincha, D.K., and Heyer, A.G. (2004). The role of raffinose in the cold acclimation response of Arabidopsis thaliana. FEBS Lett 576, 169-173.

10.0 Appendix

[Figure: box-plot; y-axis: Log_2 foreground signal per channel]

10.2 Appendix 2. Box-plot depicting the mean foreground signal for each sample and channel. The center line of each box indicates the median signal, the top and bottom of each box represent the 1st and 3rd quartiles, and the whiskers measure +/- 1.5 x median.

[Figure: box-plot; y-axis: Log_2 background signal per channel; x-axis: Sample/Channel]

10.3 Appendix 3. Box-plot depicting the mean background signal for each sample and channel. The center line of each box indicates the median signal, the top and bottom of each box represent the 1st and 3rd quartiles, and the whiskers measure +/- 1.5 x median.

10.4 Appendix 4. Non-redundant list of the genes that are highly expressed during the germination of all six genera.
Shown are EST identifiers (EST ID), predicted Arabidopsis annotations of the gives EST, and the mean foreground to background ratio of each EST (FG:BG) EST ID Arabidopsis annotation FG:BG WS0042_L03 At5g20620-68299.m02163 T1M15.20 polyubiquitin (UBQ4) 53.3 WS0018_C24 At4g05320-68296.m00624 C17L7.240 polyubiquitin (UBQ10) 42.1 IS0014JH22 At5g20620-68299.m02163 T1M15.20 polyubiquitin (UBQ4) 39.7 WS0039_O05 At4g05320-68296.m00624 C17L7.240 polyubiquitin (UBQ10) 37.3 WS0097_H14 At4g05320-68296.m00624 C17L7.240 polyubiquitin (UBQ10) 36.5 IS0011J03 At3g52590-68298.m04971 F3C22.8 F22O6.30 ubiquitin extension protein (UBQ1) 35.7 WS0045JH05 At4g05320-68296.m00624 C17L7.240 polyubiquitin (UBQ10) 32.6 WS0061_B11 At4g02890-68296.m00328 T5J8.21 polyubiquitin (UBQ14) 32.4 IS0013_M14 At4g02890-68296.m00328 T5J8.21 polyubiquitin (UBQ14) 32.2 WS00112_J01 At2g32070-68297.m03344 F22D22.18 putative CCR4-associated factor 32.2 WS0062_A12 At4g02890-68296.m00328 T5J8.21 polyubiquitin (UBQ14) 31.9 IS0013_K05 At4g02890-68296.m00328 T5J8.21 polyubiquitin (UBQ14) 31.3 WS0061_B08 At3g52590-68298.m04971 F3C22.8 F22O6.30 ubiquitin extension protein (UBQ1) 31.3 WS0042_O06 At5g20620-68299.m02163T1M15.20 polyubiquitin (UBQ4) 30.7 WS0016_B12 At3g04400-68298.m00410 T27C4.4 ribosomal protein L17 30.3 WS0037_J18 At4g02890-68296.m00328 T5J8.21 polyubiquitin (UBQ14) 30.2 IS0012_B18 At4g02890-68296.m00328 T5J8.21 polyubiquitin (UBQ14) 30.1 WS0056_N07 At4g05320-68296.m00624 C17L7.240 polyubiquitin (UBQ10) 30.0 WS0043_M01 At4g02890-68296.m00328 T5J8.21 polyubiquitin (UBQ14) 29.6 WS0023_F04 At1 g65350-68300.m06432 T8F5.13 ubiquitin, putative similar to ubiquitin 29.4 WS0024_J13 At3g04400-68298.m00410 T27C4.4 ribosomal protein L17 28.0 WS0037_K09 At3g08690-68298.m00871 F17014.16 E2, ubiquitin-conjugating enzyme 11 26.6 WS0045_G14 At3g53980-68298.m05116 F5K20.280 putative protein hypothetical protein At2g37870 26.5 WS00113_A21 At3g52590-68298.m04971 F3C22.8 F22O6.30 ubiquitin extension protein (UBQ1) 24.5 
WS00111_F06 At1g74960-68300.m07574 F25A4.7 F9E10.19 3-ketoacyl-ACP synthase 24.1 WS0016_N13 At3g27690-68298.m03056 MGF10.10 putative chlorophyll A-B binding protein 23.9 WS0043_ E05 At3g52590-68298.m04971 F3C22.8 F22O6.30 ubiquitin extension protein (UBQ1) 22.0 WS00914 LE15 At3g52590-68298.m04971 F3C22.8 F22O6.30 ubiquitin extension protein (UBQ1) 20.5 WS0023_ K09 At3g43810-68298.m04027 T28A8.100 calmodulin 7 20.4 WS0061_ _M01 At5g20620-68299.m02163 T1M15.20 polyubiquitin (UBQ4) 20.3 WS0071_ B04 At1g01620-68300.m00069 F22L4.16 plasma membrane intrinsic protein 1c 20.1 IS0012_O12 At1g18720-68300.m02019 F6A14.17 unknown protein similar to YGL010w-like protein 20.0 WS0046_ L22 At5g59370-68299.m05882 F2015.3 actin 4 19.9 WS0032_ H13 At3g54820-68298.m05206 F28P10.200 T5N23.2 aquaporin/MIP - like protein 19.8 WS0019_ C03 At3g26650-68298.m02939 MLJ15.3 glyceraldehyde 3-phosphate dehydrogenase A subunit 19.7 WS0022_ _I15 At2g38530-68297.m04045 T6A23.27 putative nonspecific lipid-transfer protein 19.3 WS00919_L08 At4g05320-68296.m00624 C17L7.240 polyubiquitin (UBQ10) 19.0 WS0011_ N19 At4g26840-68296.m02896 F10M23.180 ubiquitin-like protein ubiquitin-like protein 18.7 WS0014_ .J20 At2g25210-51099.m00039 T22F11.20 60S ribosomal protein L39 18.4 WS0018_ D05 At3g16040-68298.m01739 MSL1.7 hypothetical protein predicted by genemark 18.1 WS0047_ J20 At4g31985-68296.m03498 F11C18.17 F10N7.2 Expressed protein 17.9 WS00919_J24 At3g04400-68298.m00410 T27C4.4 ribosomal protein L17 17.8 WS0022_ _L16 At3g27690-68298.m03056 MGF10.10 putative chlorophyll A-B binding protein 17.4 WS0024_ _D14 At2g47110-68297.m05030 F14M4.6 ubiquitin extension protein (UBQ6) 17.3 WS0036_ H14 At4g04930-68296.m00552 T1J1.1 putative fatty acid desaturase 17.3 WS0074_ _E20 At1g50900-68300.m04979 F8A12.12 unknown protein 17.3 WS0047_ _G19 At5g35680-68299.m03197 MXH1.2 MJE4.1 putative protein 17.2 WS0048_ _B24 At2g37170-68297.m03894 T2N18.7 aquaporin 17.2 WS0024_ _H22 At3g27690-68298.m03056 MGF10.10 putative 
chlorophyll A-B binding protein 17.1 WS0084_ _F06 At5g21150-68299.m02223 T10F18.180 zwille/pinhead-like protein 17.0 WS0056_ .P17 At2g16850-68297.m01614 F12A24.3 T24I21.27 putative plasma membrane 17.0 WS00113_C16 At1g67090-68300.m06634 F1019.14 F5A8.1 ribulose-bisphosphate carboxylase 16.8 WS0036_ J314 At4g14960-68296.m01532 FCAALL.199 tubulin alpha-6 chain (TUA6) 16.7 WS0064. _E16 At3g02190-68298.m00164 F1C9.36 F14P3.16 putative ribosomal protein L39 16.7 WS0014. .N07 At3g27690-68298.m03056 MGF10.10 putative chlorophyll A-B binding protein 16.5 WS0024. _F18 At4g14385-68296.m01467 FCAALL.166 Expressed protein 16.4 WS0051. _P17 At5g56010-68299.m05494 MDA7.5 heat shock protein 90 16.2 WS00112_K23 At3g49010-68298.m04571 T2J13.150 60S ribosomal protein L13, BBC1 protein 16.0 WS0011. _D14 At4g02890-68296.m00328 T5J8.21 polyubiquitin (UBQ14) 15.6 WS0022_M13 At5g56670-68299.m05570 MIK19.12 40S ribosomal protein S30 homolog 15.6 WS0017_C02 At1g66240-68300.m06536 T6J19.6 copper homeostasis factor 15.5 WS0019_O21 At3g27690-68298.m03056 MGF10.10 putative chlorophyll A-B binding protein 15.4 WS0033_E05 At3g21240-68298.m02336 MXL8.10 putative 4-coumarate:CoA ligase 2 15.1 WS0017_F20 At2g17380-68297.m01672 F5J6.14 clathrin assembly protein AP19 14.9 IS0014_J11 At3g62290-68298.m06035 T17J13.250 ADP-ribosylation factor-like protein 14.9 WS0011_L05 At3g54890-68298.m05214 F28P10.130 chlorophyll a/b-binding protein 14.8 IS0011_F06 At2g07696-68297.m00771 T18C6.8 hypothetical protein 14.6 WS0039_E17 At1g15690-68300.m01623 F7H2.3 inorganic pyrophosphatase 14.6 WS0042_G22 At3g23810-68298.m02625 MYM9.17 S-adenosyl-L-homocysteinas 14.6 WS0048_D17 At2g16850-68297.m01614 F12A24.3 T24I21.27 putative plasma membrane 14.6 WS0047_O11 At3g09390-68298.m00960 F3L24.28 metallothionein-like protein 14.5 WS0039_L11 At5g44340-68299.m04156 K9L2.12 tubulin beta-4 chain (sp|P24636) 14.5 WS0094J.24 At4g17530-68296.m01809 FCAALL.87 ras-related small GTP-binding protein RAB1c 14.3 WS0023_E06 
At3g27690-68298.m03056 MGF10.10 putative chlorophyll A-B binding protein 14.3 WS0079_F12 At2g40810-68297.m04307 T20B5.1 T7D17.19 hypothetical protein 14.2 WS0018_H06 At2g16850-68297.m01614 F12A24.3 T24I21.27 putative plasma membrane 14.2 WS0011_J08 At4g01850-68296.m00210 T7B11.11 S-adenosylmethionine synthase 2 14.0 WS00110_P08 At5g11420-68299.m01189 F15N18.10 F2I11.1 putative protein predicted proteins in castor bean 13.9 IS0014_C07 At3g43810-68298.m04027 T28A8.100 calmodulin 7 13.6 WS0018_P01 At1g67090-68300.m06634 F1019.14 F5A8.1 ribulose-bisphosphate carboxylase 13.4 WS0043_D10 At5g02380-68299.m00145 T1E22.140 metallothionein 2b 13.3 WS00112_G12 At3g27690-68298.m03056 MGF10.10 putative chlorophyll A-B binding protein 13.3 WS0036_B08 At4g02890-68296.m00328 T5J8.21 polyubiquitin (UBQ14) 13.2 WS0044_D13 At3g60245-68298.m05802 F27H5.3 Expressed protein 13.2 WS00912_O11 At4g08350-68296.m00759 T28D5.40 putative protein chromatin structural protein 13.0 WS0023_M18 At4g35090-68296.m03874 T12J5.2 M4E13.140 catalase 12.9 WS0021_C04 At1g61070-68300.m05963 T7P1.20 unknown protein contains gamma-thionin domain 12.9 WS0045_O10 At3g15353-68298.m01663 K7L4.17 Expressed protein 12.8 IS0014J13 At3g15353-68298.m01663 K7L4.17 Expressed protein 12.8 IS0011_J03 At1g23740-68300.m02605 F508.29 putative auxin-induced protein 12.7 WS0023_C07 At2g20260-68297.m02003 F11A3.19 putative photosystem I reaction center subunit IV 12.6 IS0012_C11 At4g01850-68296.m00210 T7B11.11 S-adenosylmethionine synthase 2 12.5 WS00918_B16 At1g63970-68300.m06283 T12P18.1 unknown protein similar to hypothetical protein 12.5 WS01013_M14 At5g05610-68299.m00555 MOP10.15 nucleic acid binding protein-like 12.5 WS00112_N11 At5g59310-68299.m05874 MNC17.4 nonspecific lipid-transfer protein precursor - like 12.4 WS00110_E02 At1g42970-68300.m04294 F13A11.3 glyceraldehyde-3-phosphate dehydrogenase 12.4 WS0078. 
.004 At3g59400-68298.m05705 F25L23.260 putative protein hypothetical protein 238 12.2 WS0011_N01 At1g67090-68300.m06634 F1019.14 F5A8.1 ribulose-bisphosphate carboxylase 12.2 WS0012_B14 At2g37600-68297.m03939 F13M22.10 60S ribosomal protein L36 12.1 WS00112_B07 At3g27690-68298.m03056 MGF10.10 putative chlorophyll A-B binding protein 12.1 WS0011_A11 At3g27690-68298.m03056 MGF10.10 putative chlorophyll A-B binding protein 12.0 WS0011_C17 At5g59310-68299.m05874 MNC17.4 nonspecific lipid-transfer protein precursor - like 12.0 WS0078_E17 At5g08380-68299.m00893 F8L15.110 alpha-galactosidase 11.9 WS0034_D19 At5g35530-68299.m03179 MOK9.14 40S ribosomal protein S3 11.9 WS0075_N03 At5g20290-68299.m02130 F5O24.180 putative protein ribosomal protein S8 11.9 WS0042_K22 At1g07890-68300.m00743 F24B9.2 L-ascorbate peroxidase 11.8 WS0023_D04 At3g15353-68298.m01663 K7L4.17 Expressed protein 11.8 WS0022_F21 At3g15353-68298.m01663 K7L4.17 Expressed protein 11.7 WS0084_M05 At1g07700-68300.m00722 F24B9.21 Expressed protein 11.7 WS0024_ .112 At3g27690-68298.m03056 MGF10.10 putative chlorophyll A-B binding protein 11.7 WS0023_E07 At5g02960-68299.m00214 F9G14.270 putative protein ribosomal protein S23 11.6 WS0012_K12 At1g61150-68300.m05972 F11P17.12 hypothetical protein predicted by genemark.hmm 11.5 WS0084_L03 At4g26080-68296.m02800 F20B18.190 protein phosphatase ABI1 11.5 WS0019_M11 At3g27690-68298.m03056 MGF10.10 putative chlorophyll A-B binding protein 11.5 WS0097_P05 At2g34510-68297.m03609 T31E10.15 unknown protein 11.5 WS0032_ .113 At2g31670-68297.m03298 T9H9.19 unknown protein 11.4 WS0048_B21 At5g48380-68299.m04617 K23F3.10 MJE7.1 receptor-like protein kinase 11.4 WS0022_M22 At3g60245-68298.m05802 F27H5.3 Expressed protein 11.3
WS0039_L01 At4g14960-68296.m01532 FCAALL.199 tubulin alpha-6 chain (TUA6) 11.3 WS00716_O11 At1g23360-68300.m02539 F26F24.24 spore germination protein c2 11.3 WS0105_G17 At4g14960-68296.m01532 FCAALL.199 tubulin alpha-6 chain (TUA6) 11.3 WS0018_O07 At2g43590-68297.m04630 F18019.30 T1024.43 putative endochitinase 11.2 WS0045_J14 At3g04400-68298.m00410 T27C4.4 ribosomal protein L17 11.1 WS0034_E09 At3g62920-68298.m06098 T20O10.20 putative protein 11.1 WS0021_L22 At3g15353-68298.m01663 K7L4.17 Expressed protein 10.9 WS0023_D03 At5g59310-68299.m05874 MNC17.4 nonspecific lipid-transfer protein precursor - like 10.9 WS00111_B14 At1g31340-68300.m03341 T19E23.13 putative heat shock transcription factor 10.9 WS0038_A09 At5g56720-68299.m05575 MIK19.17 cytosolic malate dehydrogenase 10.8 WS0038_C05 At2g38010-68297.m03987 T8P21.8 hypothetical protein predicted by genscan 10.7 IS0011_O11 At4g36850-68296.m04076 AP22.27 putative protein 10.7 WS0041_K12 At1g75630-68300.m07649 F10A5.17 vacuolar ATP synthase 10.7 WS0036_H06 At5g35680-68299.m03197 MXH1.2 MJE4.1 putative protein 10.6 WS0041J14 At2g36580-68297.m03832 F1011.21 putative pyruvate kinase 10.6 WS0018_N12 At3g15353-68298.m01663 K7L4.17 Expressed protein 10.5 WS0093_D13 At2g43590-68297.m04630 F18019.30 T1024.43 putative endochitinase 10.5 WS0022_J07 At3g56240-68298.m05363 F18021.200 copper homeostasis factor 10.5 WS0057_M07 At5g59240-68299.m05866 MNC17.7 40S ribosomal protein S8 10.5 IS0014_D03 At5g52640-68299.m05104 F6N7.13 heat-shock protein 10.5 WS0024_G07 At5g35530-68299.m03179 MOK9.14 40S ribosomal protein S3 10.2 WS0018_G14 At2g38140-68297.m04000 F16M14.7 30S ribosomal protein S31 10.2 WS0011_J19 At5g59310-68299.m05874 MNC17.4 nonspecific lipid-transfer protein precursor - like 10.2 WS0022_G22 At3g15353-68298.m01663 K7L4.17 Expressed protein 10.2 WS0022_E24 At3g15353-68298.m01663 K7L4.17 Expressed protein 10.2 WS0011_G16 At5g59310-68299.m05874 MNC17.4 nonspecific lipid-transfer protein precursor - like 10.0 WS0104_F07
At3g16950-68298.m01862 K14A17.6 dihydrolipoamide dehydrogenase 10.0 WS00110J11 At1g55670-68300.m05538 F20N2.33 photosystem I subunit V precursor 9.9 WS00111_C22 At5g59310-68299.m05874 MNC17.4 nonspecific lipid-transfer protein precursor - like 9.8 WS00111_N24 At2g38530-68297.m04045 T6A23.27 putative nonspecific lipid-transfer protein 9.8 WS0022_B22 At3g27690-68298.m03056 MGF10.10 putative chlorophyll A-B binding protein 9.7 WS00811_N01 At5g51570-68299.m04986 K17N15.12 putative protein 9.6 WS00713_G02 At2g22360-68297.m02249 F14M13.24 putative DnaJ protein 9.6 WS0042_K18 At1g64230-68300.m06311 F22C12.2 E2, ubiquitin-conjugating enzyme 9.4 WS0061_F18 At5g02960-68299.m00214 F9G 14.270 putative protein ribosomal protein S23 9.4 IS0011_K15 At3g23810-68298.m02625 MYM9.17 S-adenosyl-L-homocysteinas 9.4 WS0024_G01 At2g46930-68297.m05013 F14M4.24 putative pectinesterase 9.3 WS0037_F07 At3g11510-68298.m01205 F24K9.19 putative 40S ribosomal protein s14 9.2 WS01010_H06 At5g53500-68299.m05208 MNC6.4 putative protein 9.1 WS0032_N03 At1g75170-68300.m07599 F22H5.20 unknown protein 9.1 WS0022J-08 WS0012_G05 WS0019_B07 WS0044_J22 WS0023_I05 WS0016JD14 WS0024_I19 WS0083_O22 WS0044_O19 WS0012_L12 WS0024_C22 WS00110J20 WS0064_A20 WS0043_N05 WS0063_M05 IS0014J24 WS0041_F24 WS0034_K08 IS0014_M24 WS00113_C02 WS00111J-05 WS0024J.12 WS0037J13 WS0038_P04 WS01010_B08 WS0018J17 IS0012_H19 WS00111J02 WS0022_B08 WS0038_G21 IS0013_M16 WS0061_G11 WS0061_N18 ON At4g38970-68296.m04337 F19H22.70 putative fructose-bisphosphate aldolase fructose-bisphosphate aldolase 9.1 At4g10340-68296.m01015 F24G24.140 chlorophyll a/b-binding protein 9.0 At2g18370-68297.m01791 T30D6.12 putative lipid transfer protein 9.0 At1g70600-68300.m07067 F24J13.17 F24J13.17 60S ribosomal protein L27A 9.0 At1g31330-68300.m03339 T19E23.12 photosystem I subunit III precursor 9.0 At1g67090-68300.m06634 F1019.14 F5A8.1 ribulose-bisphosphate carboxylase 9.0 At1g57860-68300.m05709 F12K22.19 60S ribosomal protein L21 9.0 At1 
g61070-68300.m05963 T7P1.20 unknown protein contains gamma-thionin domain 8.9 At1g47128-68300.m04529 F2G19.31 cysteine proteinase RD21A 8.9 At1g61150-68300.m05972 F11P17.12 hypothetical protein predicted by genemark.hmm 8.9 At3g15353-68298.m01663 K7L4.17 Expressed protein 8.9 At5g34940-68299.m03106 T2L5.6 MGG23.2 putative protein heparanase 8.9 At3g15353-68298.m01663 K7L4.17 Expressed protein 8.8 At3g17390-68298.m01911 MGD8.26 putative s-adenosylmethionine synthetase 8.8 At3g48570-68298.m04526 T8P19.80 Expressed protein 8.8 At1g47128-68300.m04529 F2G19.31 cysteine proteinase RD21A 8.8 At3g15353-68298.m01663 K7L4.17 Expressed protein 8.7 At4g20890-68296.m02172 T13K14.50 tubulin beta-9 chain 8.7 At4g27090-68296.m02921 T24A18.40 ribosomal protein L14 8.7 At2g38540-68297.m04046 T6A23.26 putative nonspecific lipid-transfer protein 8.6 At3g61110-68298.m05893 T27I15.200 ribosomal protein S27 8.6 At3g27690-68298.m03056 MGF10.10 putative chlorophyll A-B binding protein 8.6 At3g12110-68298.m01287 T21B14.7 actin 11 (ACT11) identical to actin 11 8.6 At4g33865-68296.m03714 F17I5.3 Expressed protein 8.6 At5g16450-68299.m01707 MQK4.18 S-adenosylmethionine:2-demethylmenaquinone methyltransferase-like protein 8.6 At1g12820-68300.m01284 F13K23.7 transport inhibitor response 1 (TIR1) 8.6 At3g62290-68298.m06035 T17J13.250 ADP-ribosylation factor-like protein 8.5 At2g21290-68297.m02141 F3K23.5 hypothetical protein predicted by genefinder 8.5 At1g79040-68300.m08020 YUP8H12R.34 photosystem II polypeptide 8.5 At1g70480-68300.m07049 F24J13.5 unknown protein identical to most of OBP32 8.5 At3g28760-68298.m03181 T19N8.5 hypothetical protein 8.4 At1g26880-68300.m02856 T2P11.7 60s ribosomal protein L34 8.4 At3g12500-68298.m01331 T2E22.18 basic chitinase identical to basic chitinase 8.4
WS0081_E10 WS0051_G19 WS00914_A11 WS0044_J08 WS0062_F04 WS0079_A14 WS0043_H17 WS0013_H20 WS0024_L24 WS0019_O08 WS0058_L06 WS0041_I08 WS0043_A21 WS0013_E21 IS0014_K08 WS0015_G06 WS0019_B21 IS0014J.08 WS0017_O07 WS0043_L11 WS0022_P07 WS00911_K07 WS01010_G04 WS0041_N09 IS0014_N03 WS00112_P05 WS00919_E04 WS00111_L24 WS0035J11 IS0011_D20 WS0037_M24 WS0034_G06 WS0021_O05
At5g24510-68299.m02491 K18P6.3 T31K7.1 60s acidic ribosomal protein P1 At3g46040-68298.m04265 F12M12.10 cytoplasmic ribosomal protein S15a-like At4g14680-68296.m01500 FCAALL.128 ATP-sulfurylase At1g15950-68300.m01649 T24D18.5 cinnamoyl CoA reductase At4g18730-68296.m01943 F28A21.140 ribosomal protein L11 At3g60720-68298.m05854 T4C21.130 secretory protein At4g38740-68296.m04307 T9A14.20 peptidylprolyl isomerase ROC1 At1g70230-68300.m07023 F20P5.5 unknown protein At2g47110-68297.m05030 F14M4.6 ubiquitin extension protein (UBQ6) At5g17920-68299.m01864 MPI7.60 5-methyltetrahydropteroyltriglutamate-homocysteine S-methyltransferase At2g07690-68297.m00816 T12J2.1 putative DNA replication At3g16080-68298.m01743 MSL1.12 putative ribosomal protein At5g51720-68299.m05003 MI024.14 unknown protein At5g51430-68299.m04970 MFG13.14 putative protein At4g14320-68296.m01459 FCAALL.124 ribosomal protein At3g23810-68298.m02625 MYM9.17 S-adenosyl-L-homocysteinas At4g21960-68296.m02302 F1N20.3 T8O5.170 peroxidase prxr1 At3g15353-68298.m01663 K7L4.17 Expressed protein At4g21280-68296.m02214 T6K22.20 F7J7.220 photosystem II oxygen-evolving complex At1g57860-68300.m05709 F12K22.19 60S ribosomal protein L21 At2g21660-68297.m02179 F2G1.4 glycine-rich RNA binding protein 7 At2g28760-68297.m02974 F8N16.5 putative nucleotide-sugar dehydratase At3g12110-68298.m01287 T21B14.7 actin 11 (ACT11) identical to actin 11 At5g24510-68299.m02491 K18P6.3 T31K7.1 60s acidic ribosomal protein P1 At5g55850-68299.m05473 MWJ3.3 MDF20.29 NOI protein, nitrate-induced At3g27690-68298.m03056 MGF10.10 putative chlorophyll A-B binding protein At3g15353-68298.m01663 K7L4.17 Expressed protein At2g38540-68297.m04046 T6A23.26 putative nonspecific lipid-transfer protein At3g16640-68298.m01827 MGL6.19 translationally controlled tumor protein-like protein At3g15353-68298.m01663 K7L4.17 Expressed protein At3g04120-68298.m00382 T6K12.26 glyceraldehyde-3-phosphate dehydrogenase C At5g56030-68299.m05496 MDA7.7 HEAT SHOCK PROTEIN 81-2 At3g47370-68298.m04410 T21L8.120 40S ribosomal protein S20-like protein
8.3 8.3 8.3 8.2 8.2 8.2 8.1 8.1 8.1 8.1 8.0 8.0 8.0 8.0 8.0 7.9 7.9 7.9 7.9 7.9 7.8 7.8 7.8 7.8 7.7 7.7 7.7 7.7 7.7 7.6 7.6 7.6 7.6 WS0022_P13 WS0044_A18 WS00911_J12 WS00713_B24 WS00111_E18 WS0012_A02 WS0043_N11 WS00911_M24 WS0044_F06 WS00113_C10 WS0022J09 WS0012_J14 WS00110_O22 WS0017_K23 IS0014_A20 WS0041_L20 WS0024_J17 WS00110_H03 WS0021_C05 WS0072_H06 WS0011_O09 WS0013_C06 WS0017_F07 WS00919_L14 WS00112J03 WS0022J.02 WS0056_F11 WS01010_E08 WS0011_M19 WS0017_D11 WS0038_M20 WS00113_J01 IS0013_C09 At1g61520-68300.m06010 T25B24.12 PSI type III chlorophyll a/b-binding protein At1g26240-68300.m02788 F28B23.10 hypothetical protein At3g15353-68298.m01663 K7L4.17 Expressed protein At4g27270-68296.m02939 M4I22.80 putative protein LEDI-3 protein At3g15353-68298.m01663 K7L4.17 Expressed protein At4g35100-68296.m03876 T12J5.9 M4E13.150 plasma membrane intrinsic protein At3g15353-68298.m01663 K7L4.17 Expressed protein At5g14670-68299.m01522 T15N1.160 ADP-ribosylation factor - like protein At2g02120-68297.m00116 F504.11 protease inhibitor II At4g28390-68296.m03072 F20O9.60 ADP,ATP carrier-like protein ADP,ATP carrier At1g08830-68300.m00846 F22013.32 superoxidase dismutase At4g38890-68296.m04328 F19H22.4T9A14.170 putative protein unknown mRNA At5g14930-68299.m01552 F2G14.50 putative protein disease resistance protein EDS1 At5g35680-68299.m03197 MXH1.2 MJE4.1 putative protein At5g08180-68299.m00863 T22D6.120 nhp2-like protein high mobility group-like At1g24735-68300.m02697 F5A9.20 F5A9.20 similar At3g56940-68298.m05442 T8M16.1 F24I3.20 leucine zipper-containing protein AT103 At3g51680-68298.m04864 T18N14.60 short-chain alcohol dehydrogenase-like At1g01620-68300.m00069 F22L4.16 plasma membrane intrinsic protein 1c At4g27440-68296.m02966 F27G19.40 protochlorophyllide reductase precursor At3g06680-68298.m00687 T8E24.9 F3E22.18 ribosomal protein L29 At3g27690-68298.m03056 MGF10.10 putative chlorophyll A-B binding protein
At3g61110-68298.m05893 T27I15.200 ribosomal protein S27 At4g35100-68296.m03876 T12J5.9 M4E13.150 plasma membrane intrinsic protein At3g15353-68298.m01663 K7L4.17 Expressed protein At4g25590-68296.m02739 M7J2.40 actin depolymerizing factor-like protein At2g02120-68297.m00116 F504.11 protease inhibitor II At5g05850-68299.m00583 MJJ3.27 putative protein strong At4g01150-68296.m00135 F2N1.18 hypothetical protein At3g63410-68298.m06160 MAA21.40 putative chloroplast inner envelope protein At1g64230-68300.m06311 F22C12.2 E2, ubiquitin-conjugating enzyme At2g20420-68297.m02019 F11A3.3 succinyl-CoA ligase beta subunit At3g49010-68298.m04571 T2J13.150 60S ribosomal protein L13, BBC1 protein 7.5 7.5 7.5 7.5 7.4 7.4 7.4 7.4 7.4 7.4 7.4 7.3 7.3 7.3 7.3 7.3 7.2 7.2 7.2 7.2 7.2 7.1 7.1 7.1 7.1 7.1 7.0 7.0 7.0 7.0 7.0 6.9 6.9 WS0023_O02 WS0044_M19 WS0043_P19 IS0013_B08 WS0023_B05 WS00111_F13 IS0013_M24 WS0022J12 WS0012_D10 WS0024J18 WS00712_D13 WS0012_D09 WS0087_F16 WS0011_K13 WS00111_K06 WS0034_C20 WS0023_K07 IS0014_O11 IS0014_F17 WS0036_K19 WS0074_G03 IS0011J01 WS0019_O12 WS0018_F04 WS0031_N14 WS0089J24 WS0021_H24 WS0022_I16 WS0062_J07 WS0076_F09 WS00113_D18 WS0073_A18 WS0021_F18 L/1 At2g18110-68297.m01761 T27K22.2 F8D23.11 putative elongation factor beta-1 At4g18100-68296.m01871 F15J5.70 ribosomal protein L32 -like protein ribosomal protein At3g11510-68298.m01205 F24K9.19 putative 40S ribosomal protein s14 At1g16180-68300.m01671 T24D18.26 unknown protein At4g39090-68296.m04349 F19H22.190 cysteine proteinase RD19A At3g45980-68298.m04259 F16L2.190 histone H2B At2g40510-68297.m04274 T2P4.14 40S ribosomal protein S26 At3g15353-68298.m01663 K7L4.17 Expressed protein At2g32060-68297.m03343 F22D22.19 40S ribosomal protein S12 At3g47470-68298.m04420 F1P2.20 CHLOROPHYLL A-B BINDING PROTEIN 4 At3g55800-68298.m05316 F1116.210 sedoheptulose-bisphosphatase precursor At1g44575-68300.m04445 T18F15.3 photosystem II 22kDa protein At2g02120-68297.m00116 F504.11 protease inhibitor II 
At3g61110-68298.m05893 T27I15.200 ribosomal protein S27 At1g31812-68300.m03405 F5M6.27 Acyl CoA binding protein At5g10980-68299.m01138 T30N20.250 T5K6.6 histon H3 protein At5g09510-68299.m00975 T5E8.310 ribosomal protein S15-like ribosomal protein At5g58420-68299.m05769 MQJ2.1 MCK7.1 ribosomal protein S4 - like At1g52000-68300.m05109 F5F19.6 myrosinase binding protein At3g12490-68298.m01330 T2E22.19 cysteine proteinase inhibitor At2g21580-68297.m02170 F2G1.15 40S ribosomal protein S25 At4g09320-68296.m00878 T30A10.80 nucleoside-diphosphate kinase At2g38540-68297.m04046 T6A23.26 putative nonspecific lipid-transfer protein At2g45290-68297.m04819 F4L23.20 putative transketolase precursor At4g14880-68296.m01524 FCAALL.34 cytosolic 0-acetylserine(thiol)lyase At4g39700-68296.m04420 T19P19.90 F23K16.6 putative protein predicted protein At3g25220-68298.m02776 MJL12.19 immunophilin (FKBP15-1) identical At3g45980-68298.m04259 F16L2.190 histone H2B At3g13920-68298.m01503 MDC16.5 Eukaryotic initiation factor 4A At3g53750-68298.m05093 F5K20.50 actin (ACT3) At2g02120-68297.m00116 F504.11 protease inhibitor II At5g10860-68299.m01121 T30N20.130 putative protein 110K5.11 At1g78370-68300.m07946 F3F9.23 2,4-D inducible glutathione S-transferase 6.9 6.9 6.9 6.9 6.9 6.8 6.8 6.8 6.7 6.7 6.7 6.7 6.6 6.6 6.6 6.6 6.6 6.6 6.6 6.6 6.5 6.5 6.5 6.5 6.5 6.5 6.4 6.4 6.4 6.4 6.4 6.4 6.4 WS0104_M13 WS0052_N12 WS0064_I24 WS0038_C11 WS0012_F07 WS0012_B11 WS0023_E05 WS00112_B17 WS00912_G16 WS0031_J04 WS0045_G21 WS0043_I23 IS0012_H08 WS0064_D18 WS0023_L02 WS00112_F19 WS0058_C05 WS0016_H14 WS0051_F24 IS0011J-22 WS0097_I03 WS0102_N10 IS0013_D18 WS0021_F12 WS0022_H05 WS00914_E07 IS0012_P17 WS0024_G11 WS0087_C04 WS0044J17 WS00112_O01 WS0023_C12 WS0075 J14 ON o At5g59370-68299.m05882 F2015.3 actin 4 At2g18110-68297.m01761 T27K22.2 F8D23.11 putative elongation factor beta-1 At2g02120-68297.m00116 F504.11 protease inhibitor II At1g15270-68300.m01573 F9L1.21 unknown protein ESTs At1g62750-68300.m06148 
F23N19.11 unknown protein similar to elongation factor At1g74670-68300.m07527 F1M20.35 GAST1-like protein similar to GAST1 At1g42970-68300.m04294 F13A11.3 glyceraldehyde-3-phosphate dehydrogenase At5g25610-68299.m02623 T14C9.150 dehydration-induced protein RD22 At4g17830-67119.m00001 T6K21.10 FCAALL.133 N-acetylornithine deacetylase-like protein At2g34680-68297.m03628 T29F13.11 unknown protein At3g23810-68298.m02625 MYM9.17 S-adenosyl-L-homocysteinas At2g18400-68297.m01794 T30D6.9 putative ribosomal protein L6 At2g33580-68297.m03509 F4P9.35 putative protein kinase At2g02120-68297.m00116 F504.11 protease inhibitor II At5g59310-68299.m05874 MNC17.4 nonspecific lipid-transfer protein precursor - like At3g43810-68298.m04027 T28A8.100 calmodulin 7 At4g33865-68296.m03714 F17I5.3 Expressed protein At1g26880-68300.m02856 T2P11.7 60s ribosomal protein L34 At1g22160-68300.m02407 F2E2.23 unknown protein At4g33865-68296.m03714 F17I5.3 Expressed protein At5g54600-68299.m05330 MRB17.10 50S ribosomal protein L24 At1g07980-68300.m00752 T6D22.7 hypothetical protein At5g43860-68299.m04103 MQD19.22 AtCLH2 (gb|AAF27046.1) At3g15353-68298.m01663 K7L4.17 Expressed protein At2g37600-68297.m03939 F13M22.10 60S ribosomal protein L36 At1g48630-68300.m04718 F1114.18 F9P7.2 guanine nucleotide-binding protein At5g59720-68299.m05920 MTH12.7 heat shock protein 18 At4g18100-68296.m01871 F15J5.70 ribosomal protein L32 -like protein ribosomal protein At4g01850-68296.m00210 T7B11.11 S-adenosylmethionine synthase 2 At4g19380-68296.m02018 T5K18.160 putative protein predicted protein At3g05530-68298.m00534 F22F7.1 26S proteasome AAA-ATPase At2g35350-68297.m03695 T32F12.27 hypothetical protein At1g74670-68300.m07527 F1M20.35 GASTl-like protein similar to GAST1 6.3 6.3 6.3 6.3 6.3 6.3 6.2 6.2 6.2 6.2 6.2 6.2 6.2 6.2 6.2 6.1 6.1 6.1 6.1 6.0 6.0 6.0 6.0 6.0 6.0 6.0 5.9 5.9 5.9 5.9 5.9 5.9 5.9 WS00916JH22 WS0023_N16 WS0054_J21 WS0056_C16 WS0014_K04 WS00913_K09 IS0011_C19 WS0033_G15 WS0022_F09 WS0014J23 
WS00812_H16 WS0062_E06 WS0037_B06 WS0064_O13 WS0062_N22 WS0092_N15 WS0102_E17 WS0012_N15 WS0024_C11 WS0094_J03 WS00112_F15 WS00716_E11 WS00111_B03 WS0021_F03 WS0041_K15 WS0094_B17 WS0048_E08 WS0017_L07 WS0064_A10 WS0021_C23 WS0014_H20 WS0012_A15 WS00911_K24 At5g13870-68299.m01434 MAC12.33 endoxyloglucan transferase At4g09800-68296.m00933 F17A8.150 S18.A ribosomal protein At3g51000-68298.m04787 F24M12.40 epoxide hydrolase-like protein At5g59720-68299.m05920 MTH12.7 heat shock protein 18 At3g59540-68298.m05723 T16L24.90 60S RIBOSOMAL PROTEIN L38-like protein At3g27690-68298.m03056 MGF10.10 putative chlorophyll A-B binding protein At5g59720-68299.m05920 MTH12.7 heat shock protein 18 At3g11250-68298.m01178 F11B9.17 60S acidic ribosomal protein At3g16640-68298.m01827 MGL6.19 translationally controlled tumor protein-like protein At5g58420-68299.m05769 MQJ2.1 MCK7.1 ribosomal protein S4 - like At4g34030-68296.m03736 F28A23.210 putative protein B subunit of propionyl-CoA carboxylase At5g59720-68299.m05920 MTH12.7 heat shock protein 18 At1g20693-68300.m02249 F2D10.38 Expressed protein At1g53540-68300.m05288 F22G10.20 T3F20.29 17.6 kDa heat shock protein At2g02120-68297.m00116 F504.11 protease inhibitor II At5g20510-68299.m02152 F7C8.100 zinc finger protein At2g30160-68297.m03124 T27E13.10 putative mitochondrial carrier protein At3g60245-68298.m05802 F27H5.3 Expressed protein At2g30570-68297.m03171 T6B20.8 photosystem II reaction center 6.1 KD protein At1g51710-68300.m05073 F19C24.8 ubiquitin-specific protease 6 (UBP6) At3g54210-68298.m05139 F24B22.170 ribosomal protein L17 -like protein At5g55120-68299.m05394 MC015.7 putative protein At4g15610-68296.m01602 FCAALL.139 hypothetical protein At1g79550-68300.m08068 T8K14.3 phosphoglycerate kinase At1g20630-68300.m02241 F5M15.31 F2D10.11 hypothetical protein At3g61110-68298.m05893 T27I15.200 ribosomal protein S27 At2g22070-68297.m02221 T16B14.8 hypothetical protein predicted by genscan At1g74670-68300.m07527 F1M20.35 GASTUike 
protein similar to GAST1 At3g60720-68298.m05854 T4C21.130 secretory protein At3g05590-68298.m00545 F18C1.14 putative 60S ribosomal protein L18 At3g55440-68298.m05278 T22E16.100 cytosolic triosephosphatisomerase At3g16480-68298.m01807 T204.13 MDC8.11 putative mitochondrial processing peptidase alpha subunit At3g10900-68298.m01129 T7M13.2 putative (1-4)-beta-mannan endohydrolase 5.9 5.9 5.8 5.8 5.8 5.8 5.8 5.8 5.7 5.7 5.7 5.7 5.6 5.6 5.6 5.6 5.6 5.6 5.6 5.6 5.6 5.6 5.5 5.5 5.5 5.5 5.5 5.5 5.5 5.5 5.5 5.4 5.4 IS0014_K04 WS0011J14 WS0055_H10 WS0061_P21 WS0016J19 WS00918_C14 IS0014_O16 WS0036_G20 WS0039_D07 WS0022_H24 IS0011_F11 WS0011_F13 WS0022_P02 WS01010_M10 WS0024_D24 WS00113J01 WS0054_E14 WS00911_N19 WS0035_C24 WS0031_A17 WS0021_K10 WS0075_M03 WS0023_C09 WS00110_O11 WS00914_L11 WS0048_M15 WS0039_K06 WS0062_B03 WS0078_H15 WS0035_M05 WS00910_A07 WS0093_H16 WS0037_O02 as to At3g57870-68298.m05547 T10K17.80 E2 ubiquitin-conjugating-like enzyme Ahus5 At3g27690-68298.m03056 MGF10.10 putative chlorophyll A-B binding protein At5g24510-68299.m02491 K18P6.3 T31K7.1 60s acidic ribosomal protein P1 At5g12020-68299.m01255 F14F18.190 heat shock protein 17.6-11 At3g56340-68298.m05375 T5P19.4 F18021.300 40S ribosomal protein S26 homolog At2g30050-68297.m03110 T27E13.21 F23F1.3 putative protein transport protein SEC13 At3g23810-68298.m02625 MYM9.17 S-adenosyl-L-homocysteinas At4g30800-68296.m03343 T10C21.1 F6I18.290 ribosomal protein S11 At5g09810-68299.m01007 MYH9.2 ACTIN 2/7 (sp|P53492) At3g15353-68298.m01663 K7L4.17 Expressed protein At4g26850-68296.m02897 F10M23.190 putative protein At5g16470-68299.m01709 MQK4.20 putative protein similar to unknown protein At3g53620-68298.m05078 F4P12.320 inorganic pyrophosphatase -like protein At1g65060-68300.m06401 F16G16.6 4-coumarate:CoA ligase 3 At5g43970-68299.m04116 MRH10.8 putative protein At5g21090-68299.m02217 T10F18.120 leucine-rich repeat protein At5g59720-68299.m05920 MTH12.7 heat shock protein 18 At3g12390-68298.m01320 T2E22.29 
nascent polypeptide associated complex alpha chain At5g59720-68299.m05920 MTH12.7 heat shock protein 18 At5g59720-68299.m05920 MTH12.7 heat shock protein 18 At5g18570-68299.m01941 T28N17.50 GTP-binding protein obg At3g48780-68298.m04548 T21J18.50 serine palmitoyltransferase-like protein At1g44575-68300.m04445 T18F15.3 photosystem II 22kDa protein At3g08690-68298.m00871 F17014.16 E2, ubiquitin-conjugating enzyme 11 At5g43600-68299.m04073 K9D7.10 N-carbamyl-L-amino acid amidohydrolase-like At3g60245-68298.m05802 F27H5.3 Expressed protein At5g40370-68299.m03706 MPO12.80 glutaredoxin -like protein glutaredoxin At5g05930-68299.m00595 K18J17.8 unknown protein At4g35760-68296.m03951 F4B14.2 F8D20.270 putative protein predicted protein At5g59720-68299.m05920 MTH12.7 heat shock protein 18 At5g24510-68299.m02491 K18P6.3 T31K7.1 60s acidic ribosomal protein P1 At1g16890-68300.m01759 F17F16.19 F6I1.11 E2, ubiquitin-conjugating enzyme At4g27270-68296.m02939 M4I22.80 putative protein LEDI-3 protein 5.4 5.3 5.3 5.3 5.3 5.3 5.3 5.2 5.2 5.2 5.2 5.2 5.2 5.2 5.2 5.2 5.1 5.1 5.1 5.1 5.1 5.1 5.1 5.1 5.0 5.0 5.0 5.0 4.9 4.9 4.9 4.9 4.9 WS0087_K03 WS0047_L02 IS0012_C15 WS0047_G13 WS0064_N08 WS0091J-21 WS00916_E09 WS0045_H10 IS0012_H03 WS0042_P08 WS00811_J23 IS0011_M05 WS0022_L10 WS0042_M07 WS0021_D10 WS0061_D02 WS0021J17 WS00113_L02 WS0013_F05 WS0048_D09 WS0021_J06 WS0017_J16 WS0022_A02 WS00913_N23 WS00112_O15 WS00716_E14 WS0038_A10 WS0037_M04 IS0012_D03 WS0036_L06 WS0083_M01 WS0031_F13 WS00712_M01 At2g39010-68297.m04096 T7F6.18 putative aquaporin (water channel protein) At3g04120-68298.m00382 T6K12.26 glyceraldehyde-3-phosphate dehydrogenase C At2g23090-68297.m02332 F21P24.15 Expressed protein At2g30860-68297.m03203 F7F1.7 glutathione S-transferase identical to GB:Y12295 At5g24510-68299.m02491 K18P6.3 T31K7.1 60s acidic ribosomal protein P1 At1g64060-68300.m06293 F22C12.18 cytochrome b245 beta chain homolog RbohAp108 At4g18730-68296.m01943 F28A21.140 ribosomal protein L11
At3g50000-68298.m04680 F3A4.80 CASEIN KINASE II, ALPHA CHAIN 2 (CK II) At3g24100-68298.m02656 MUJ8.7 unknown protein At3g02080-68298.m00145 F1C9.13 putative 40S ribosomal protein S19 At1g12900-68300.m01293 F13K23.15 putative calcium-binding protein At4g23690-68296.m02503 F9D16.160 putative disease resistance response protein At4g34050-68296.m03738 F28A23.190 caffeoyl-CoA O-methyltransferase At3g53980-68298.m05116 F5K20.280 putative protein hypothetical protein At2g37870 At5g01650-68299.m00071 F7A7.170 light-inducible protein ATLS1 At1g64160-68300.m06303 F22C12.8 dirigent protein, putative similar to dirigent protein At4g24770-68296.m02621 F6I7.11 F22K18.30 RNA-binding protein RNP-T precursor At4g21105-68296.m02195 F7J7.1 Expressed protein At2g36885-68297.m03865 T1J8.4 Expressed protein At3g12390-68298.m01320 T2E22.29 nascent polypeptide associated complex alpha chain At2g40880-68297.m04314 T20B5.8 putative cysteine proteinase inhibitor B (cystatin B) At3g23570-68298.m02598 MDB19.5 unknown protein contains Pfam profile At1g74670-68300.m07527 F1M20.35 GAST1-like protein similar to GAST1 At1g15270-68300.m01573 F9L1.21 unknown protein ESTs At4g14690-68296.m01501 FCAALL.232 Expressed protein At3g54420-68298.m05166 T14E10.4 T12E18.110 class IV chitinase (CHIV) At3g23810-68298.m02625 MYM9.17 S-adenosyl-L-homocysteinas At1g72970-68300.m07341 F3N23.17 unknown protein At2g27510-68297.m02825 F10A12.19 putative ferredoxin At4g29410-68296.m03195 F17A13.230 putative protein unknown protein chromosome II At4g13550-68296.m01378 T6G15.100 putative protein hypothetical protein At4g26850-68296.m02897 F10M23.190 putative protein At1g26880-68300.m02856 T2P11.7 60s ribosomal protein L34 4.9 4.9 4.9 4.9 4.9 4.9 4.9 4.9 4.9 4.8 4.8 4.8 4.8 4.8 4.8 4.8 4.8 4.7 4.7 4.7 4.7 4.7 4.7 4.7 4.7 4.7 4.7 4.7 4.6 4.6 4.6 4.6 4.6 WS0031_D14 IS0011_B03 WS00716_D06 WS0012_F13 WS0089_O08 WS0014_G05 WS0051_L06 WS0023_D01 WS0037_M23 WS00914_C11 WS0076_E02 WS0022_H17 WS00810_K01 WS0023_A15
WS0022_D20 WS0041_N15 WS0019_D16 WS00713_O03 WS0089_O21 WS0022_I20 WS0023_P22 WS0012_J04 WS0078_O11 WS0051_N06 WS0017_B15 WS00113_C21 WS0048_P17 WS00110_G07 WS0043JV106 WS0022_O19 WS0016_F22 WS0062_J23 WS0036_J23 4 ^ At3g24830-68298.m02742 K7P8.13 60S ribosomal protein At4g17615-68296.m01820 FCAALL.122 calcineurin B-like protein 1 At3g57490-68298.m05503 T8H10.90 40S ribosomal protein S2 homolog 40S ribosomal protein S2 At3g54560-68298.m05178 T14E10.130 histone H2A.F/Z At1g50570-68300.m04932 F11F12.11 F17J6.9 Expressed protein At4g35000-68296.m03856 M4E13.60 F11111.1 L-ascorbate peroxidase At1g12900-68300.m01293 F13K23.15 putative calcium-binding protein At5g43940-68299.m04113 MRH10.4 alcohol dehydrogenase At5g59720-68299.m05920 MTH12.7 heat shock protein 18 At5g02230-68299.m00129 T7H20.280 putative protein putative hydrolase At2g32150 At3g04920-68298.m00469 T9J14.13 putative ribosomal protein s19 ors24 At3g52820-68298.m04995 F3C22.220 purple acid phosphatase-like protein At3g52590-68298.m04971 F3C22.8 F22O6.30 ubiquitin extension protein (UBQ1) At3g14420-68298.m01561 MOA2.2 glycolate oxidase At3g13920-68298.m01503 MDC16.5 Eukaryotic initiation factor 4A At4g27270-68296.m02939 M4I22.80 putative protein LEDI-3 protein At4g20360-68296.m02120 F9F13.10 translation elongation factor EF-Tu At5g50260-68299.m04828 K6A12.12 cysteine proteinase At2g39750-68297.m04176 T5I7.5 unknown protein At1g15690-68300.m01623 F7H2.3 inorganic pyrophosphatase At3g27690-68298.m03056 MGF10.10 putative chlorophyll A-B binding protein At4g03280-68296.m00376 F4C21.21 putative component of cytochrome B6-F At5g05920-68299.m00594 K18J17.7 deoxyhypusine synthase At1g24050-68300.m02636 T23E23.20 unknown protein At4g16520-68296.m01701 FCAALL.383 symbiosis-related like protein At2g15700-68297.m01489 F9013.25 copia-like retroelement pol polyprotein At3g52590-68298.m04971 F3C22.8 F22O6.30 ubiquitin extension protein (UBQ1) At2g33150-68297.m03463 F25I18.11 3-ketoacyl-CoA thiolase At3g53980-68298.m05116 
F5K20.280 putative protein hypothetical protein At2g37870
At1g60950-68300.m05950 T7P1.9 ferredoxin precursor identical
At4g17340-68296.m01788 FCAALL.412 membrane channel like protein
At3g12390-68298.m01320 T2E22.29 nascent polypeptide associated complex alpha chain
At1g75500-68300.m07635 F1B16.19 F10A5.28 nodulin-like protein
4.6 4.6 4.6 4.6 4.6 4.6 4.6 4.6 4.6 4.6 4.6 4.6 4.6 4.5 4.5 4.5 4.5 4.5 4.5 4.5 4.5 4.5 4.5 4.5 4.5 4.5 4.5 4.4 4.4 4.4 4.4 4.4 4.4
WS00111J22 At2g46820-68297.m04996 F19D11.10 unknown protein 4.4
WS0092_A16 At4g29450-68296.m03199 F17A13.270 serine/threonine-specific receptor protein kinase 4.3
WS0022_O12 At3g16920-68298.m01859 K14A17.26 basic chitinase, putative similar to basic chitinase 4.3
WS0104_P18 At4g39230-68296.m04364 T22F8.130 NAD(P)H oxidoreductase, isoflavone reductase 4.3
WS00110_B01 At4g25050-68296.m02663 F24A6.4 F13M23.190 acyl carrier 4.3
WS00912_E03 At2g23070-68297.m02330 F21P24.13 putative casein kinase II catalytic (alpha) subunit 4.3
WS0012_E03 At5g43830-68299.m04100 MQD19.19 aluminum-induced protein-like 4.3
WS0022_J01 At4g26850-68296.m02897 F10M23.190 putative protein 4.3
WS00914_C23 At2g43590-68297.m04630 F18019.30 T1024.43 putative endochitinase 4.3
WS0041_B22 At5g14740-68299.m01532 T9L3.40 CARBONIC ANHYDRASE 2 4.3
WS0099_N12 At3g61110-68298.m05893 T27I15.200 ribosomal protein S27 4.3
WS0051_O01 At3g05590-68298.m00545 F18C1.14 putative 60S ribosomal protein L18 4.2
WS00112_O09 At3g62290-68298.m06035 T17J13.250 ADP-ribosylation factor-like protein 4.2
WS0044_H06 At5g54640-68299.m05334 MRB17.14 histone H2A (gb|AAF64418.1) 4.2
WS0047_L09 At4g28390-68296.m03072 F20O9.60 ADP,ATP carrier-like protein ADP,ATP carrier 4.2
WS0022_F16 At5g14740-68299.m01532 T9L3.40 CARBONIC ANHYDRASE 2 4.2
WS0042_D22 At4g18100-68296.m01871 F15J5.70 ribosomal protein L32 -like protein ribosomal protein 4.2
WS0053_C20 At5g59240-68299.m05866 MNC17.7 40S ribosomal protein S8 4.2
WS0097_E10 At2g41470-68297.m04378 T26J13.6 unknown protein 4.2
WS0082_K13 At2g33150-68297.m03463 F25I18.11 3-ketoacyl-CoA thiolase 4.2
WS0091_J05 At1g70600-68300.m07067 F24J13.17 F24J13.17 60S ribosomal protein L27A 4.2
WS0038_O21 At4g25050-68296.m02663 F24A6.4 F13M23.190 acyl carrier 4.2
WS0017_P07 At3g22840-68298.m02516 MWI23.21 early light-induced protein 4.2
WS0024_D06 At3g05890-68298.m00579 F2O10.15 F10A16.19 low temperature and salt responsive protein 4.2
WS0011_L01 At1g01490-68300.m00056 F22L4.5 unknown protein 4.2
WS0034_D07 At3g57490-68298.m05503 T8H10.90 40S ribosomal protein S2 homolog 40S ribosomal protein S2 4.1
WS0021_C03 At1g24020-68300.m02633 T23E23.28 pollen allergen-like protein 4.1
WS0061_K10 At5g51440-68299.m04971 MFG13.15 mitochondrial heat shock 22 kd protein-like 4.1
WS00914_G22 At4g22880-68296.m02414 F7H19.60 putative leucoanthocyanidin dioxygenase (LDOX) 4.1
WS0064_A09 At4g34670-68296.m03815 T4L20.250 Putative S-phase-specific ribosomal protein 4.1
WS00919_B16 At4g37870-68296.m04191 T28I19.150 phosphoenolpyruvate carboxykinase (ATP) 4.1
WS00916_A23 At5g59290-68299.m05871 MNC17.21 dTDP-glucose 4-6-dehydratase - like protein 4.1
WS0017_B04 At3g54940-68298.m05220 F28P10.80 cysteine proteinase non-consensus AG donor 4.1
WS0022_D09 At5g25610-68299.m02623 T14C9.150 dehydration-induced protein RD22 4.1
WS0014_K06 At1g56580-68300.m05657 F25P12.97 hypothetical protein 4.1
WS00913_O17 At1g70310-68300.m07031 F1707.16 spermidine synthase 4.1
WS0021_M10 At2g30570-68297.m03171 T6B20.8 photosystem II reaction center 6.1 KD protein 4.0
WS0054_F23 At2g45290-68297.m04819 F4L23.20 putative transketolase precursor 4.0
WS0073_J14 At1g21320-68300.m02315 F16F4.3 hypothetical protein 4.0
WS0014_E10 At2g45740-68297.m04868 F4I18.28 unknown protein 4.0
WS0075_O24 At3g26060-68298.m02865 MPE11.21 putative peroxiredoxin similar 4.0
WS0063_N05 At5g25610-68299.m02623 T14C9.150 dehydration-induced protein RD22 4.0
WS0044_J05 At1g75390-68300.m07620 F1B16.8 bZIP transcription factor ATB2 4.0
WS0015_D02 At2g28000-68297.m02883 T1E2.8 putative rubisco subunit binding-protein alpha subunit 4.0
WS0012_K03 At3g24100-68298.m02656 MUJ8.7 unknown protein 4.0
WS0062_J12 At1g17720-68300.m01896 F11A6.6 type 2A protein serine/threonine phosphatase 55 kDa 4.0
WS0021_O21 At4g09800-68296.m00933 F17A8.150 S18.A ribosomal protein 3.9
WS01011_K12 At1g66180-68300.m06530 F15E12.7 unknown protein 3.9
WS0021_A16 At4g05320-68296.m00624 C17L7.240 polyubiquitin (UBQ10) 3.9
WS0019_O16 At5g57330-68299.m05642 MJB24.14 apospory-associated protein C 3.9
WS0038_C14 At2g14900-68297.m01400 T26I20.6 similar to gibberellin-regulated proteins 3.9
IS0014_C22 At4g25000-68296.m02656 F13M23.140 alpha-amylase 3.9
IS0014_A13 At1g22840-68300.m02478 F19G10.20 putative cytochrome C 3.9
WS0038_A04 At2g32720-68297.m03412 F24L7.14 putative cytochrome b5 3.9
WS0038_A07 At5g45010-68299.m04229 K21C13.20 putative protein 3.9
WS0016_C20 At1g58290-68300.m05760 F19C14.9 glutamyl-tRNA reductase 3.9
WS00112_F10 At4g14690-68296.m01501 FCAALL.232 Expressed protein 3.9
WS0032J314 At3g45980-68298.m04259 F16L2.190 histone H2B 3.9
WS00713_D14 At2g39730-68297.m04174 T5I7.18 hypothetical protein 3.9
WS00712_H24 At4g11600-68296.m01158 T5C23.30 phospholipid hydroperoxide glutathione peroxidase 3.9
WS00910JM15 At2g04570-68297.m00387 T103.2 putative GDSL-motif lipase/hydrolase 3.8
WS0022_E14 At5g37360-68299.m03353 MNJ8.18 putative protein 3.8
WS0024_L01 At3g02470-68298.m00197 F16B3.10 S-adenosylmethionine decarboxylase 3.8
WS0036_H05 At2g21170-68297.m02124 F26H11.7 putative triosephosphate isomerase 3.8
WS00910_C24 At5g22870-68299.m02305 MRN17.10 putative protein 3.8
WS0023_D14 At3g46430-68298.m04307 F18L15.150 putative protein mitochondria 3.8
WS0022_F05 At5g56150-68299.m05508 MDA7.21 E2, ubiquitin-conjugating enzyme 3.8
WS00712_F08 At5g10360-68299.m01068 F12B17.290 40S ribosomal protein S6 3.8
WS0106_C16 At4g33950-68296.m03726 F1715.140 protein kinase - like protein protein kinase 3.8
WS00113J03 At3g15640-68298.m01698 MSJ11.5 putative cytochrome c oxidase subunit Vb 3.8
WS0104_L16 At3g07090-68298.m00735 T1B9.26 F17A9.22 unknown protein 3.8
WS0012_F12 At4g21280-68296.m02214 T6K22.20 F7J7.220 photosystem II oxygen-evolving complex 3.7
WS01010JH22 At2g34480-68297.m03604 F13P17.34 T31E10.18 60S ribosomal protein L18A 3.7
WS0023J21 At1g20120-68300.m02182 T20H2.29 anther-specific proline-rich protein APG precursor 3.7
WS0022J01 At1g77940-68300.m07903 F28K19.15 ribosomal protein L30 3.7
WS0076_C16 At5g14920-68299.m01551 F2G14.40 putative protein predicted protein 3.7
WS0063_B17 At3g46040-68298.m04265 F12M12.10 cytoplasmic ribosomal protein S15a -like 3.7
WS00916_P08 At1g02560-68300.m00184 T14P4.12 ATP-dependent Clp protease proteolytic subunit 3.7
WS0039_P01 At4g23496-68296.m02483 F16G20.1 Expressed protein 3.7
WS0023_B17 At4g12800-68296.m01287 T20K18.150 probable photosystem I 3.7
WS0011_P16 At3g22930-68298.m02527 F5N5.10 calmodulin 3.7
WS0024_J14 At4g37800-68296.m04185 T28I19.80 endo-xyloglucan transferase - like 3.7
WS00712_E02 At5g10480-68299.m01080 F12B17.170 putative protein phosphatase protein 3.7
IS0012_C21 At3g13920-68298.m01503 MDC16.5 Eukaryotic initiation factor 4A 3.7
WS0022_E06 At5g50260-68299.m04828 K6A12.12 cysteine proteinase 3.7
WS0091_G03 At5g18380-68299.m01916 F20L16.100 40S RIBOSOMAL PROTEIN S16 3.7
WS0078_A13 At1g60810-68300.m05937 F8A5.32 ATP citrate-lyase 3.7
WS0043_M08 At1g48630-68300.m04718 F1114.18 F9P7.2 guanine nucleotide-binding protein 3.7
WS0014_F14 At5g17170-68299.m01783 MKP11.2 unknown protein 3.7
WS0044_B02 At2g47110-68297.m05030 F14M4.6 ubiquitin extension protein (UBQ6) 3.6

