Open Collections

UBC Faculty Research and Publications

Transcriptome profiling in conifers and the PiceaGenExpress database show patterns of diversification… Raherison, Elie; Rigault, Philippe; Caron, Sébastien; Poulin, Pier-Luc; Boyle, Brian; Verta, Jukka-Pekka; Giguère, Isabelle; Bomal, Claude; Bohlmann, Jörg; MacKay, John Aug 29, 2012

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata


52383-12864_2012_Article_4527.pdf [ 1.46MB ]
JSON: 52383-1.0074602.json
JSON-LD: 52383-1.0074602-ld.json
RDF/XML (Pretty): 52383-1.0074602-rdf.xml
RDF/JSON: 52383-1.0074602-rdf.json
Turtle: 52383-1.0074602-turtle.txt
N-Triples: 52383-1.0074602-rdf-ntriples.txt
Original Record: 52383-1.0074602-source.json
Full Text

Full Text

RESEARCH ARTICLE Open AccessTranscriptome profiling in conifers and thePiceaGenExpress database show patterns ofdiversification within gene families andinterspecific conservation in vascular geneexpressionElie Raherison1†, Philippe Rigault2†, Sébastien Caron1†, Pier-Luc Poulin1, Brian Boyle1, Jukka-Pekka Verta1,Isabelle Giguère1, Claude Bomal1, Jörg Bohlmann3 and John MacKay1*AbstractBackground: Conifers have very large genomes (13 to 30 Gigabases) that are mostly uncharacterized although extensivecDNA resources have recently become available. This report presents a global overview of transcriptome variation in aconifer tree and documents conservation and diversity of gene expression patterns among major vegetative tissues.Results: An oligonucleotide microarray was developed from Picea glauca and P. sitchensis cDNA datasets. It represents23,853 unique genes and was shown to be suitable for transcriptome profiling in several species. A comparison ofsecondary xylem and phelloderm tissues showed that preferential expression in these vascular tissues was highlyconserved among Picea spp. RNA-Sequencing strongly confirmed tissue preferential expression and provided a robustvalidation of the microarray design. A small database of transcription profiles called PiceaGenExpress was developed fromover 150 hybridizations spanning eight major tissue types. In total, transcripts were detected for 92% of the genes on themicroarray, in at least one tissue. Non-annotated genes were predominantly expressed at low levels in fewer tissues thangenes of known or predicted function. Diversity of expression within gene families may be rapidly assessed fromPiceaGenExpress. In conifer trees, dehydrins and late embryogenesis abundant (LEA) osmotic regulation proteins occur inlarge gene families compared to angiosperms. Strong contrasts and low diversity was observed in the dehydrin family,while diverse patterns suggested a greater degree of diversification among LEAs.Conclusion: Together, the oligonucleotide microarray and the PiceaGenExpress database represent the first resource ofthis kind for gymnosperm plants. The spruce transcriptome analysis reported here is expected to accelerate geneticstudies in the large and important group comprised of conifer trees.BackgroundMicroarray (MA) transcript profiling and RNA sequen-cing (RNA-Seq) represent powerful approaches to ra-pidly gain functional information on a genome-widescale. Information on RNA transcript abundance is a keyto assessing the biological role of gene products andcannot be directly deduced from a gene’s sequence. Thishas lead researchers to develop databases of RNA abun-dance profiles, first and foremost for model organisms.For example, the AtGenExpress database was created forthe model-plant Arabidopsis from a host of tissue pre-ferential and stress response expression profiles [1].Databases such as AtGenExpress are particularly usefulfor the identification of groups of co-expressed genes.Other plant oriented databases include the poplar Pop-GenIE made up of tissue, developmental and stress re-sponse profiles [2]. Reflecting the value of gene expressiondata, public organizations and institutes also maintain* Correspondence:†Equal contributors1Center for Forest Research and Institute for Integrative and Systems Biology,Université Laval, Québec, QC, Canada G1V 0A6Full list of author information is available at the end of the article© 2012 Raherison et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the CreativeCommons Attribution License (, which permits unrestricted use, distribution, andreproduction in any medium, provided the original work is properly cited.Raherison et al. BMC Genomics 2012, 13:434 databases like the Gene Expression Omnibus orGEO (NCBI) and ArrayExpress (EBI), among others,which host datasets from a wide array of organisms.Recent transcriptome-wide analyses underscore theimportance of gene expression in the genetic architec-ture of complex traits. Studies in fruit flies, mice,humans and maize show that a proportion of the geneticvariants underlying complex phenotypes exert theireffects through gene expression [3,4]; so, discovering thegenetic basis for the variation in transcript abundance iscentral to understanding phenotypic variation [5]. Geneexpression studies also provide insights into the molecu-lar impacts of natural selection. For example, expressionprofiling showed the differential action of selection pres-sure on different tissues and organs in humans [6]. Acomparative analysis of mouse and human showed ahigh level of conservation in the expression of ortholo-gous genes, showing the stability of house-keeping genesand the variability of tissue specific genes [7].Transcriptome profiling is facilitated by the availabilityof a reference genome but many studies have also beenbased on large-scale cDNA sequence datasets. In plants,many angiosperm genomes have been sequenced, in-cluding the model plant Arabidopsis thaliana [8], rice[9], poplar [10] and grapevine [11]; however, referencegenomes are still lacking for plant phyla belonging tothe gymnosperms. The best studied gymnosperms areconifers, which as a group have extremely large genomes(ranging from 13 to 30 Gb). In conifers including pines(Pinus spp.), spruces (Picea spp.), Douglas-fir (Pseudot-suga menziesii) and Japanese cedar (Cryptomeria japon-ica) over 1 million expressed sequence tags (ESTs) havebeen obtained from dideoxy sequencing and assembledto infer putative unigenes or transcript sets (reviewed in[12]). Large collections of cDNAs are available for whitespruce (Picea glauca) [13] and Sitka spruce (P. sitchen-sis) [14]. From 30% to 40% of conifer sequences cannotbe annotated because they lack sequence similarity toknown genes [13-16].This report describes a large-scale oligonucleotidemicroarray developed from the extensive cDNA datasetsavailable for spruce trees (P. glauca, P. sitchensis) toachieve broad transcriptome coverage. Previously, micro-arrays were developed from PCR amplicons (cDNAmicroarrays) primarily in pines and spruces (reviewed in[12]). Many of the cDNA microarrays have ranged from afew hundred to several thousand cDNAs, and a few ofthem have included over 20,000 spots, i.e. in Picea sitch-ensis [17] and Pinus taeda [18]. They have essentially beenused in comparative experiments (using two-dye designs)to investigate transcriptome remodeling during tissue dif-ferentiation, development, or in response to environmen-tal cues [12], but a general characterization of conifertranscriptomes has been lacking.A major goal of the present study was to assembletranscript profiles from spruce trees (Picea spp.) into adatabase called PiceaGenExpress, aiming to characterizethe basic features of a conifer transcriptome such as thenumber of transcribed genes in a variety of tissues. Thereference profiles in PiceaGenExpress enabled explora-tory analyses of the diversity of expression patternswithin and among gene families and the expression ofretrotransposons. In addition, conservation of gene ex-pression in secondary vascular tissue was studied basedon interspecific comparisons of tissue preferential ex-pression. The accuracy of microarray profiles and designwas directly evaluated by RNA-Seq analysis of the samesamples as those used for one of the microarray experi-ments included in PiceaGenExpress.Results and DiscussionDevelopment of an oligonucleotide microarray forspruces (Picea spp.)A large-scale custom microarray containing 31,604oligonucleotide probes was designed for broad representa-tion of the spruce (Picea spp) transcriptome. The 70 nu-cleotide probes were based on unique cDNA sequencesfrom white spruce (P. glauca) [13] and Sitka spruces (P.sitchensis) [14] (Table 1). Both of these conifers have ex-tensive ESTs and FL-cDNA sequence databases developedfrom dideoxy sequencing (Sanger method); 454 ESTs (GS-FLX) were also available for P. glauca. The probe designparameters and microarray manufacturing methods wereexperimentally determined through hybridization experi-ments with a microarray of 3,900 oligonucleotide probesspecifically designed for optimization tests (for details seeAdditional file 1, Additional file 2: Figure S1 and Additionalfile 3: Figure S2).The high level of sequence identity in the Picea data-sets enabled the design of a single probe matching bothP. glauca and P. sitchensis sequences, for most genes;however, the datasets did not overlap entirely so thatsome probes were unique to one of the species andcould not be verified in the other. A level of 95.7% of se-quence identity or higher (3 mismatches or less) wasobtained for nearly all of the probes relative to P. glaucasequences, and for 59% of the probes relative to P. sitch-ensis (Table 2). Preliminary experiments indicated that 3mismatches had a small impact on the hybridization sig-nal intensity and the ability to detect differential expres-sion (See Additional file 1, Additional file 2: Figure S1and Additional file 3: Figure S2). Analyses presented inthis report are based on the set of 25,045 probesdesigned from P. glauca sequences, which match 23,853unique cDNAs in the P. glauca gene catalogue of Rigaultet al. [13].The potential for utilizing this oligonucleotide micro-array in other species and genera of the PinaceaeRaherison et al. BMC Genomics 2012, 13:434 Page 2 of 16 was evaluated by using comparative RNA hybridi-zations with four different spruces (Picea spp), two pines(Pinus spp.) and a larch (Larix laricina) (Figure 1 andAdditional file 4: Figure S3). The hybridization outcomeswere highly similar among the spruces (Figure 1A-1C).A large majority of the probe signal intensities variedless than 2-fold among the spruces (Figure 1G). Whencompared to P. glauca, the pines and larch gave more di-vergent results as shown by the number of commonpositive probes (above background) and overall data cor-relations (Figure 1D-1F). Surprisingly, the pines andlarch gave signals of equal or greater intensity than P.glauca for most of the probes, and a decreased signal for17% to 34% of the detected probes. Taken together, theseobservations indicated that the microarray is suitable fordirect comparisons of transcript levels between spruces.For pines and larch, a subset of the probes may not beinformative. In general, the MA appears appropriate forstudies comparing data within the same genus.Differential expression in the vascular transcriptome isconserved among Picea speciesMicroarray transcript profiles compared two tissues thatsupport secondary vascular growth, i.e. diameter stemgrowth, as it is a key feature of the life habit of trees.Secondary xylem is the wood forming tissue located onthe internal side of the cambial meristem. It was com-pared to a composite sample of phloem and phellodermtissues (referred to here as phelloderm) located on theouter side of the cambial meristem in three spruce spe-cies. Identical analyses were carried out in three spruces:P. glauca, P. sitchensis and P. mariana. The number oftranscripts detected for these two tissue types werehighly conserved in the three species, ranging from13,744 to 14,513 in xylem, and 14,990 to 15,697 in phel-loderm. A total of 5,407 genes were differentiallyexpressed (DE) in all three species. Tissue preferentialtranscript accumulation and the fold difference betweenthe tissues were very similar among the three species(Figure 2). The small number of genes that varied intheir tissue specificity (60 genes or 1.1% of the DE genes)indicated that genes with small difference in expressionbetween tissues were more prone to vary between spe-cies or be less accurately determined.Different MA experiments comparing xylem andphloem tissues in angiosperms including Arabidopsis[19,20] and poplar [21], and in P. glauca have helped todelineate groups of genes whose expression was of par-ticular relevance to secondary vascular growth. A MAprofiling study in Arabidopsis root-hypocotyl defined aset of 319 genes specifically regulated in secondaryxylem compared to phloem or non-vascular tissues [19].In young spruce trees, 360 sequences were shown to bexylem preferential compared to needles and phloem[22]. A core set of 52 xylem genes was identified by Koet al. [21] based on transcriptome analyses of secondaryxylem in Arabidopsis thaliana and poplar, and of cottonfibres. The expression patterns reported here for threespruces indicated that tissue preferential expression forxylem compared to phelloderm were conserved amongspruces. These conserved patterns could be the basis forcomparative genomics of conifers and angiosperms trees.This finding is also relevant for studying the geneticarchitecture of wood traits because it was shown thatxylem preferential expression was a feature of genesassociated with genetic variation in wood properties[23].Validation of the microarray and profiling results byRNA-SequencingAn RNA-Seq study P. glauca secondary xylem and phel-loderm used the same RNA samples as for the MA pro-filing, but the samples for each tissue type were pooledbefore sequencing. A total of 59.5 M high-qualityTable 1 Development a large-scale oligonucleotide array for spruces (Picea spp): sequence information used to designoligonucleotide probes from Picea glauca and P. sitchensis sequencesCategory Number of probes % Probe designed1 Confirmation of P. glauca (Pgl) cDNA clone or other1 11,214 35% Pgl Confirmed in Pgl and Psi cDNAs2 12,251 39% Pgl Confirmed by Pgl 454 seqs or Psi cDNAs3 4,840 15% Psi Confirmed by Pgl 454 seq4 1,629 5% Pgl Unconfirmed but Pgl clones ≥ 25 1,670 5% Psi Psi full-length cDNA only (not found Pgl)All 31,6041Each probe was designed based on the sequence of P. glauca (Pgl) or P. Sitchensis (Psi).Table 2 Analysis of probes: sequence similarity of probesaligned to Picea glauca and P. sitchensis sequencesP. glauca P. sitchensis Number of probes %67-70 (>95.7%) 67-70 (>95.7%) 17,279 54%67-70 (>95.7%) NA 11,999 38%67-70 (>95.7%) 63-66 (>90%;<95%) 923 3%63-66 (>90%;<95%) 67-70 (>95.7%) 529 2%NA 67-70 (>95%) 869 3%Raherison et al. BMC Genomics 2012, 13:434 Page 3 of 16 were obtained and mapped to the 27,720cDNA clusters of the P. glauca gene catalogue as pre-viously described [13] (Table 3). The sequence frequencydata were normalized by transforming the data to readsper kilobase of exon model per million mapped reads(RPKM) as described by Mortazavi et al. [24]. Theseauthors estimated that an RKPM>1 represents oneRNA molecule per cell; therefore, an RPKM>1 was usedas a threshold for detection.Nearly all of the DE genes from the MA analysis (99%)were represented in at least one of the RNA-Seq sam-ples, and a large majority (84% to 88%) of genes with a2-fold difference on the MA, were also differentiallyrepresented in RNA-Seq (Table 4). These RNA-Seq dataconfirmed the tissue preferential expression (xylem vs.phelloderm) of 99.3% of the genes determined to be dif-ferentially expressed with both methods (Table 4). Thecross-validations between the two methods were basedPinus strobusPinus resinosaLarix laricinaPicea marianaPicea abiesPicea sitchensisPicea glauca Picea glauca Picea glauca15138 probes 14922 probes 17320 probes11156 probes 11671 probes11552 probesA B CD E F02000400060008000100001200014000Picea marianaPicea abiesPinus resinosaPinus strobusLarix laricinaLog2 fold change (Spp - P. glauca)<-1 1<>-1 >1GFigure 1 Interspecific comparison of hybridization intensities in secondary xylem. A-F: Pair-wise comparison of white spruce and six otherspecies based on the number of shared positive probes indicated in the plots. The squared correlation coefficients (r2) are as follows 0.86 (A), 0.85(B), 0.89 (C), 0.29 (D), 0.22 (E) and 0.27 (F). G: Analysis of signal intensity variation between species; the fold change (FC) was determined from theaverage normalized signal intensities (log2 scale). An FC of 1 or −1 represents a two-fold signal increase or decrease, respectively. For phellodermresults, see Additional file 4: Figure S3.Raherison et al. BMC Genomics 2012, 13:434 Page 4 of 16 differential expression in 2,171 to 4,181 genes de-pending on the fold difference threshold; therefore, itrepresented a highly robust confirmation of the accuracyof the MA results. The overlap in the number of genesvaried depending on the fold difference threshold, likelyowing to differences in experimental design and tech-nique; however, the preferential expression (xylem vs.phelloderm) was highly congruent between the analysismethods (Table 4).The RNA-Seq detected more transcribed genes thanthe MA analysis, i.e. close to 6,000 genes with anRPKM>1 (Table 4). The number of DE genes deter-mined by RNA-Seq and not by MA ranged from 491 to776 (Table 4), although these results are only suggestivegiven the experimental design used for the RNA-Seq. Inaddition, some of the genes shown to be tissue preferen-tial by MA analysis were suggested by RNA-Seq to bespecifically expressed only in one of the tissue types(Table 5). Interestingly, 50% to 65% of these putativetissue-specific sequences had no annotation based onsimilarity to TAIR sequences or the detection of Pfamdomains, compared to, less than 40% for the entire setof genes represented on the microarray [13]. This obser-vation is consistent with findings from animal researchshowing that genes that have more specific expressionpatterns also tend to have less conserved sequencesamong species [4]. In evolutionary terms, genes that areexpressed only in xylem or in phloem may either representsequences that are unique or are more highly diverged inconifers compared to other plants.PiceaGenExpress contains reference profiles that revealpatterns of tissue preferential expressionThe PiceaGenExpress database was developed from over150 MA hybridizations of Picea spp obtained for eightsample types representing different tissues and experi-ments (Table 6) (for procedures, see methods). Detailedanalysis of each of the eight datasets will be presented−6−4−20246−6−4−2024 6−4−20246Picea mariana Picea sitchensisPicea glaucaVariable genes between speciesFigure 2 Preferential expression in secondary vascular tissues of three spruce species. The FC data in the plot represent genes withdifferential expression in all three spruces. The scale is the log2 fold change.Table 3 RNA-Seq dataXylem Phelloderm Total or BothTotal reads (HQ) (millions) 29.5 29.9 59.5Reads mapped (millions) 17.3 20.4 37.7Genes - RPKM1 > 1 19,604 21,366 22,012Genes - RPKM> 3 16,168 18,297 19,1081 RPKM, Number of reads per kilobase of mapped sequences per megabasebased on Mortazavi et al. [24].Raherison et al. BMC Genomics 2012, 13:434 Page 5 of 16 In any given sample type, transcripts weredetected for 10,067 to 17,070 genes (hybridization signalabove background threshold); transcripts were detectedfor 21,939 different genes considering all of the tissues, i.e. 92% of the P. glauca genes represented on the micro-array. The PiceaGenExpress is made available as a flatfile (Additional file 5: Table S1) for ease of upload andaccess. Less than 10% of the genes were unique to onesample type. A simple ranking procedure was applied,as a means to represent the relative expression and en-able qualitative comparisons across samples. The geneswere ranked within each dataset of PiceaGenExpressseparately, based on average signal intensity in a sampletype, and then equally divided into 10 intensity catego-ries (expression classes): from the lowest 10% (class 1) tothe highest 10% (class 10) (Additional file 6: Table S2,Additional file 7: Figure S4).The PiceaGenExpress database was used as a tool torapidly assess tissue preferential and invariant expressionprofiles (Figure 3). It allowed us to identify gene familieswith diverse patterns of tissue preferential expression.For example, transcripts for three cellulose synthase(CS) genes stood out as being strongly overrepresentedin the secondary xylem (both juvenile and mature) com-pared to all other tissues, where as other CS transcriptswere more ubiquitous (Figure 3A). This observation wasconsistent with studies in pine [25] and poplar [26]showing that CesA genes that are specialized in second-ary cell wall formation occurred as triplets of genes. Adifferent type of pattern was observed with transcriptsfor photosystem I and II proteins, which are generallystrongly expressed in green tissues (Figure 3B). As mightbe expected, transcripts of all of the genes in this classwere detected at high levels and were co-expressed inyoung needles, expanding buds, and phelloderm ofyoung trees, and at low levels in non-photosynthetic tis-sues including roots, immature embryos and megagame-tophytes. For secondary xylem, the same transcriptswere abundant in young trees but low in the maturetrees, which could be accounted for by differences in ex-posure to light and bark thickness. Genes encodinghouse-keeping proteins like ubiquitin did not vary be-tween the tissues (Figure 3C). These observations rela-ting to tissue preferential expression are rapid andsimple within PiceaGenExpress. They are also consistentwith known expression and physiology, and suggest thatthe methodology that was followed to develop the databaseis adequate for revealing key patterns of gene expression.Non-annotated genes are expressed at low levels and infewer tissuesFrom 30% to 40% of cDNAs from non-model organismssuch as conifers could not be annotated by standard se-quence similarity searches like BLAST [15,16] or HMMER[13]. We investigated whether insights into the role of non-annotated sequences could be obtained by surveying theirabundance and distribution among tissues in PiceaGenEx-press (Figure 4; Additional file 8: Figure S5). First, weobserved that non-annotated sequences, i.e. sequences thatlacked similarity to known plant genes, were more repre-sented among low abundance transcript classes. On average,the non-annotated sequences represented 37.1% of the ex-pression class 1 genes and 22.4% in class 10 (not shown),and a same trend was observed in each of the tissue types(Figure 4A-C; Additional file 8: Figure S5A-E). These non-annotated sequences could be either unique to conifers,owing to differential loss or acquisition of genes among taxa,or could be too highly diverged to permit functional annota-tion. Their general low level of expression may suggest thatthey were expressed in fewer cells or were tightly regulatedin some manner, perhaps playing a more specialized role inmetabolism or development.Second, our data showed that annotated and non-annotated genes were strongly contrasted in regard to thenumber of tissues in which transcripts were detectedTable 5 Tissue specificity in RNA-SeqXylem Specific Phelloderm SpecificTotal Annotated Total AnnotatedGenes RPKM> 1 44 16 (36%) 186 79 (42%)Genes RPKM> 3 17 6 (35%) 81 41 (50%)Table 4 Validation of microarray results by RNA-SeqCriteria MA1 RNA-Seq and MA2 RNA-Seq only4P-value FC (log2) RPKM>1 DE3 Tissue pref. DE Add0.05 0.5 5,666 5,592 99% 4,181 75% 4,155 99% 776 19%0.05 1.0 2,614 2,588 99% 2,265 88% 2,253 99% 677 30%0.01 0.5 5,526 5,466 99% 3,608 66% 3,585 99% 542 15%0.01 1.0 2,608 2,582 99% 2,171 84% 2,160 99% 491 23%1 DE positive genes determined by MA.2 Number of DE genes detected by RNA-Seq among the DE genes found by MA hybridization; Tissue pref., indicates that the tissue preference for xylem orphelloderm was the same.3 DE, differential expression in RNA-Seq, determined by a Chi-squared test (df = 1) with Benjamini-Hochberg correction for multiple testing.4 RNA-Seq results for non-DE genes from MA (considering genes represented on the MA only).Raherison et al. BMC Genomics 2012, 13:434 Page 6 of 16 4D). The majority of the annotated sequenceswere detected in seven or eight of the tissues. In contrast,the non-annotated genes were much more likely to befound in few tissues. This observation is consistent withthe idea that less conserved (including non-annotated)sequences may generally play more specialized roles. It isalso consistent with findings from comparative expressionstudies of mice and humans showing that house-keepinggenes were more highly expressed and were more con-served among species both in terms of their sequence andtheir expression [7].Diversity of expression profiles varies within and amonggene families with related functionsA total of 28 different gene families were reported to bestatistically overrepresented in spruce compared tomajor angiosperms based on the occurrence of proteindomains of known function [13]. Approximately onefifth of them were related to stress responses includingosmotic regulation proteins like dehydrin and late em-bryogenesis abundant protein (LEA) gene families. Thetwo families are of similar size in the white spruce genecatalogue, with 49 sequences each [13]. Expression pro-files (distribution of transcript abundance classes) withinthese families were examined in PiceaGenExpress to gaininsight into the extent of functional divergence or re-dundancy that might be associated with gene family ex-pansion. The expression of 49 different sequencescontaining a dehydrin protein domain indicated strikingdifferences between tissue types during normal develop-ment. Many more dehydrin sequences were detected inroots (high expression classes), megagametophytes and,to some extent, in the phelloderm than in the other tis-sues (Figure 5A). A large group of the sequences seemedto be co-expressed, i.e. all were expressed either verystrongly (class 9–10) or more weakly (class < 5).Expression profiles of the 49 gene sequences contain-ing a LEA domain also varied between tissues andappeared more diversified than observed for dehydrins(Figure 5B). The extent of divergence among the mem-bers in each family was analyzed by clustering and deter-mination of Euclidean distances among the sequences(Figure 5C). Overall most of the nodes (38 out of 49)were separated by a greater distance in the LEA familythan in the dehydrin family, indicating that during nor-mal development, the regulation of the dehydrin familymembers is less diversified. This observation may pointat greater functional diversification among LEAs.Both dehydrins and LEAs have been shown to beexpressed during normal development and to be waterstress responsive in conifers [18,27,28]. Our study onlyconsidered expression during normal development. Pre-vious studies in conifers have monitored the expressionof a small subset of these large gene families. For ex-ample, five and eight dehydrins transcript sequenceswere studied in foliage of maritime pine [27] and vegeta-tive buds of Norway spruce, respectively [28]. Wedetected 22 distinct sequences by MA profiling in youngfoliage, strongly suggesting that a comprehensive view ofthis protein family in response to stress remains to bedeveloped. The expression profiles presented here indi-cated that osmotic-regulation during normal develop-ment may involve more genes (especially dehydringenes) and potentially more varied functions in some ofTable 6 The PiceaGenExpress database: sample characteristics, hybridizations and detected genesPlant material Microarrays Genes5Tissue type Sp1 Source sampling2 Genotype3 Slide Imaging4 Total Unique Non-annotated1 Embryogenic cells Pgl a, a 1 6 SQ 10,066 100 2,4562 Vegetative buds Pgl b, c 2 10 SQ 12,361 128 3,2163 Xylem (Mature) Pgl d, d 60 60 SQ 14,686 176 4,2324 Xylem (juvenile) Pgl e, f 30 20 SQ 13,807 56 3,7015 Phelloderm Pgl e, f 30 20 SQ 15,803 214 4,3916 Young needles Pgl e, f 30 10 SQ 12,819 167 3,0257 Megagametophytes Pgl g, c 3 3 PA 17,056 1,111 5,2058 Adventitious roots Pab h, c 8 20 PA 15,718 393 4,696Total detected 21,241 2,345Not detected 2,6121 Pgl: Picea glauca (White spruce); Pab: Picea abies (Norway spruce).2 The source of materials and the sampling method are from the following studies: a, [43]; b, [37]; c, this paper, see methods; d, [23]; e, O-P seedlot, this papersee methods; f, [22]; g, C2856 parent from [38]; h, [44].3 Total numbers of genotypes analyzed (either individually or in pools).4 Microarray Scanner and Image Processing: SQ, ScanArray Express (Perkin Elmer, Waltham MA, USA) and QuantArray v3.0 (Packard BioChip Technologies, Billerica,MA, USA); PA, PowerScanner (Tecan Group Ltd., Männedorf, Switzerland) and ArrayPro Analyser v6.3 (Media Cybernetics, Bethesda, MD, USA).5 Total: number of genes above background (see methods); Unique: genes detected only in one tissue.Raherison et al. BMC Genomics 2012, 13:434 Page 7 of 16 tissues (megagametophytes, roots and phelloderm)than others (foliage, embryos). This observation indi-cates that osmotic response monitoring, whether it isrelated to drought conditions [27] or to normal develop-mental processes [28] may be sensitive to tissuepreference. In the PiceaGenExpress dataset, severaldehydrins were down regulated in vegetative buds thatwere sampled at the time of bud flush in the spring, aswas observed through detailed time series analyses ofNorway spruce dehydrins [28]. An interesting featureof the data is that the two seed derived tissues, i.e.immature somatic embryos and megagametophytesfrom germinating seeds, were highly contrasted in re-gard to the expression of specific dehydrin and LEAsequences.Expression of LTR retrotransposon sequencesRecent reports indicated that LTR retrotransposons repre-sented a large fraction of the conifer genome [29-31]. Inaddition, sequences traced to copia and gypsy-like retroe-lements were reported as being very frequent in Pinuscontorta ESTs [32] and were overrepresented in a rootxylem cDNA library [33], although some of thesesequences may represent genomic contaminations [13].More evidence is needed to show whether any of theseelements are active or if activity may vary as a functionof development or environmental stresses. The Picea-GenExpress database was scanned to obtain evidence ofRNA transcript production, which is the first step forLTR retrotransposon mobilization. A total of 83 cDNAsequences represented on the MA coded only for pro-tein domains expected for LTR retroelements, such asRNAse H, integrase core domain, retrotransposon gagprotein and reverse transcriptase. Many of thesequences were not detected at all; however, 46sequences were detected in at least one tissue, most ofthem were found in two tissues or more, and only afew accumulated at high levels (expression class 10)(Figure 6A). Tissue preferential accumulation was sug-gested by the fact that few sequences were detected inimmature somatic embryos and were present at lowlevels, where as many sequences were detected and inhigher expression classes in mature xylem, roots andmegagametophytes.The MA sequences matching putative LTR were esti-mated to represent up to 78,520 unique copies in the P.glauca genome (Figure 6B) and a total of 1.18 Millionsequences (not shown) based on their occurrence inshotgun sample sequencing data for P. glauca [13]. Dataare not currently available to estimate what proportionof the genome these sequences may represent; however,if the sequences were derived from an intact LTR of4000 bp on average, they would represent nearly 5 Gbpor 20% of the P. glauca genome. In other words, such alarge number of copies is expected to occupy a sizablefraction of the genome. The number of predicted copiesdid not appear to correlate with transcript accumulation,i.e. number of tissues in which they were detected orrelative levels (Figure 6A, see right panel; 6B). ThisABCClass10987654321NA}}***B F X-MX-JP R M EBT118681BT108752BT113451BT115314BT116636BT116976BT106827DR550030BT112281BT106574BT112555BT102609BT108501DR570698BT114090BT113351BT116018BT113232BT109639BT113673BT116723BT113328BT113735BT116249BT114564BT117424EX402309BT109740BT112246BT100463BT101255BT117787DR558535BT100771EX426596BT103170BT114951BT108085BT114888Figure 3 The PiceaGenExpress database reveals tissuepreferential and conserved expression patterns within three genefamilies. A: Cellulose synthases. B: Photosystem I and II proteins.C: Ubiquitins. NA: Not detected. Tissues: B (Vegetative buds), F(Foliage), X-M (Xylem - mature), X-J (Xylem - juvenile), P (Phelloderm),R (Adventitious roots), M (Megagametophytes), E (Embryogenic cells).Raherison et al. BMC Genomics 2012, 13:434 Page 8 of 16 indicates that the data were more likely toresult from transcription than genomic contaminationsof the RNA samples. It also shows that the number ofcopies accumulated in the genome is not useful for pre-dicting the level of transcript production. In fact, manyof the high copy sequences (>30,000 copies) were notdetected at all in any of the tissues. These results pointto specific LTR sequences that have the highest potentialfor mobilization in P. glauca.ConclusionsAn oligonucleotide microarray was developed from P.glauca and P. sitchensis datasets. It represents 23,853unique P. glauca genes or 85% of the recently reportedgene catalogue [13]. Single dye analysis and a rankingprocedure were used to develop PiceaGenExpress, adatabase of reference transcript profiles, based on 150hybridizations in eight different tissue sample types.These data represent the first resource of this kind for agymnosperm plant.The pine family comprises over two hundred speciesbelonging to eight genera. It is the largest and the mosteconomically important of the conifers. Interspecificcomparison experiments presented in this report indi-cated that the microarray may be applied to at leastthree of these genera. It could be a valuable tool for spe-cies where cDNA resources are lacking or underdevel-oped. Our findings also indicate that expression profilesfrom P. glauca are likely to be representative of otherconifers. RNA-sequencing has become a method ofchoice for transcriptome profiling but the analysis ofRNA-Seq data can be complex owing to factors such assequence polymorphisms, gene paralogs, and alternatesplicing. Therefore, successful application of RNA-Seqdepends on the availability or the development of agood quality reference genome or gene catalogue [24],and both are lacking in many non-model species. By1 2 3 4 5 6 7 8 9 1020060010002006001000A BC DIntensity classesFrequencyFrequencyAnnotated sequences Non-annotated sequences1 2 3 4 5 6 7 8N1 2 3 4 5 6 7 8 9 10100030005000Figure 4 Expression classes and numbers of tissue of annotated and non annotated sequences. A-C: Number of annotated andnonannotated sequences per expression class for xylem from juvenile trees (A), roots (B) and young foliage (C). Other tissues are shown inAdditional file 8: Figure S5. D: Number of tissues in which each annotated and non-annotated sequence was detected. Frequency, number ofgenes in a given intensity class or detected in a given number of tissues types.Raherison et al. BMC Genomics 2012, 13:434 Page 9 of 16 P. glauca RNA-Seq and microarray data backto the same cDNAs sequences, we were able to showthat the two methods were highly congruent. New se-quencing technologies promise to generate high qualitysequences in addition to very large volumes of data.Given the large size of conifer genomes, it may be ad-vantageous to use these technologies to develop highquality gene catalogues rather than attempt to assembleentire genomes. Once a gene catalogue is produced,these high throughput sequences represent a powerfulmethodology for unrestricted gene expression studies[24].Conifer genomes have a number of characteristics thatmake them unique, most prominently their enormoussize which can reach or even exceed 30 Gbp [30] andtheir highly repetitive sequences [31]. They are alsoABCBT106878BT108502BT116883CO482136BT115256BT108266BT106438BT117952BT115255BT117924BT117079BT105094EX414687EX329502EX364204BT116481BT114833BT115468BT117490BT117559BT102663BT117826BT114799BT114744BT115236BT117770BT115322BT105774BT116501BT114270BT104034EX410411BT115428BT116035BT108652BT117136BT113967BT117792BT115583BT117865BT115319BT113137BT119908BT107285BT108762BT119270BT115159BT114790BT117492BFX-MX-JPRMEBT102257BT100402CO478857BT114260BT113386BT117743BT106269BT109822BT116912BT115759BT113705BT101156BT101404BT107319BT115755BT118575BT111785BT113296BT105381BT101854BT105878BT106421DV989085BT104890BT106368BT102202BT104656BT114380BT102807BT114395BT106293BT100968BT102169DR565752BT101289BT118224DR577681DR572145BT118873BT105238DR553008BT104936DR563141BT104737BT103925BT103483DR586717DR586969BT107339BFX-MX-JPRME10 9 8 7 6 5 4 3 2 1 NAClass0510152025301 2 3 4 5 6 7 8 9 10111213141516171819202122232425262728293031323334353637383940414243444546474849Dehydrins LEAsNodesEuclidean distanceFigure 5 Gene expression patterns in two osmotic regulation protein families based on the PiceaGenExpress database. A: Dehydrins.B: Late Embryogenesis Abundant proteins. C: Distribution of Euclidean distances between the members of each protein family. The order of thebars is not representative of the order of the genes in panels A and B. A, B: Tissues: B (Vegetative buds), F (Foliage), X-M (Xylem - mature), X-J(Xylem - juvenile), P (Phelloderm), R (Adventitious roots), M (Megagametophytes), E (Embryogenic cells).Raherison et al. BMC Genomics 2012, 13:434 Page 10 of 16 to have high levels of heterozygocity and arebelieved to harbour many gene paralogs, at least in somegene families [12]. For the MA described here, the em-pirical tests of probe specificity (Additional file 2: FigureS1) and the probe design parameters appeared sufficientto discriminate between paralogs that are known in theP. glauca catalogue of 27,720 genes [13]. The very highlevel of congruence observed between MA and RNA-Seq results, 99.3% of confirmation of tissue preferencefrom over 2100 genes, also suggest that the MA resultsaccurately reflect the expression of the target sequences.In contrast, issues of cross-hybridization between geneparalogs with different expression patterns would havelikely resulted in a lower validation rate.The tools and methods presented in this report maylead to diverse applications for fundamental discovery inforest genetics and evolutionary biology, such as under-standing phenotypic variation in economic and adaptivetraits. The genetic architecture of complex phenotypesin plants and trees is routinely probed by scanning thegenome for DNA sequence polymorphisms throughQTL mapping and association studies [34]. However,Huang et al. [5] summarized several recent studies bystating that discovering the genetic basis for the variationin transcript abundance was central to understanding pheno-typic variation. Examining the genetic architecture of gene ex-pression can provide functional insights into physiology andmetabolism, for example by revealing the organization of genenetworks[35,36].MethodsEvaluation of array design and manufacture parameterswith test oligonucleotide microarrayA custom MA comprised of 3,900 oligonucleotideprobes was developed to evaluate the impact of designparameters (for details see Additional file 1, Additionalfile 2: Figure S1 and Additional file 3: Figure S2). It con-tained multiple probes for 929 distinct genes as well ascDNA amplicons for 96 of those genes. The parameterstested include oligonucleotide lengths of 50, 60 and 70nucleotides, the position of probes within the transcript,the impact of SNPs and indels. The presence of one or threeSNP mismatches (distributed throughout the oligonucleo-tide probes) had a small effect on hybridization signal inten-sities and in many cases the expression ratio between tissueswas largely conserved (See Additional file 2: Figure S1). Incontrast, the presence of seven SNP mismatches in theprobe had a large effect on intensities and expression ratios.Based on these data, 70-mer probes that vary by up to threeSNPs distributed throughout the probe (95.7% sequenceANb of Copies B F X-MX-JP R M EGenBankBT114012BT116388BT111042BT108486BT116888DR592817DR583259DR587035BT107333BT113389BT113823DR587500BT107743BT110011CO256382DR555624BT111590BT108575DR547247BT110760DV996828DR556293BT107272BT112663BT108952BT103022BT114221CO253064DR556124DR552764BT113879DR589579BT112269DR593369BT111489DR594981BT112947DR592828DR570827BT111631BT118754BT107875BT107183BT111598CO237900DR546880pfamacaeebebabaaabaaaebdbbba, beeadcaabbedebdadedaaba0 25 50 7510987654321NAClassx1000B0123456789100 10 20 30 40 50 60 70 80Number genomic copies predicted (1000s)Average transcript classFigure 6 Expression patterns of LTR retrotransposons based onthe PiceaGenExpress database. A: Sequences containing proteindomains of LTR retroelements. Pfam annotations: a: Integrase coredomain; b: Retrotransposon gag protein; c: Retroviral aspartylprotease; d: Reverse transcriptase (RNA-dependent DNA polymerase);e: RNAse. Tissues: B (Vegetative buds), F (Foliage), X-M (Xylem -mature), X-J (Xylem - juvenile), P (Phelloderm), R (Adventitious roots),M (Megagametophytes), E (Embryogenic cells). B: Expression levelsare not correlated with the number of genomic copies.Raherison et al. BMC Genomics 2012, 13:434 Page 11 of 16 are expected to hybridize to the same RNA and onaverage produce a signal of similar strength, where as probeswith seven SNPs or more (less than 90% sequence identity)give very little cross hybridization. These observations estab-lish thresholds of sensitivity to sequence variation, i.e. up tothree mismatches have little impact on sensitivity, and speci-ficity, i.e. specificity is achieved with seven mismatches ormore to other sequences. The cut-off is situated betweenfour and six SNPs, i.e. between 95% and 91% sequenceidentity. The presence of short insertions and deletions(up to six nucleotides) located at the center of the 70-merprobes had a small impact on probe performance. Theimpacts of other parameters tested were generally smalland less predictable.The impact of different spotting buffers and surfacechemistries used to manufacture the microarray werealso assessed in regard to image quality and data repro-ducibility. For details on methods and findings, seeAdditional file 1. We found that the optimum conditionswere obtained by using aminosilane coated slides and3X SSC without betaine as a spotting buffer.Microarray design and manufactureThe sequences included in the microarray were selectedon the basis of reproducible sequence quality from all ofthe ESTs and FL-cDNA described for P. glauca inRigault et al. [13] and P. sitchensis in Ralph et al. [14](Table 1). To obtain a robust probe set, we selectedsequences that were either detected in the two species,verified with two technologies (Sanger and 454) or werederived from a FL-cDNA (Table 1). The probes were 70nucleotides in length, and were designed to minimizesimilarity with other sequences in the dataset. Sequencesimilarity between the probes and known sequences inP. sitchensis and P. glauca was determined from se-quence alignments. The microarray contains 25,045probes that match with known P. glauca sequences;however, they represent 23,853 unique genes based onthe most recent clustering [13], such that 1,017 genesare represented by more than one probe.The microarray consists of 33,984 spotted features in-cluding 33,024 sample spots (Invitrogen, Carlsbad, CA,USA), 240 negative buffer spots, and 480 Spot ReportAlien Oligos (Agilent Technologies, Inc., Santa Clara,CA, USA). Oligonucleotides were consolidated into 384-well plates, lyophilized by speed-Vac, and resuspendedin 3X SSC to a printing concentration of 30 μM. Oligoswere printed on aminosilane slides (Erie, Hudson, NH,USA) with a QArrayMax microarray printer (GenetixLimited, Hampshire, UK) using 946MP2 microarray pins(ArrayIt Corp, Sunnyvale, CA, USA) in a 48-pin tooldepositing ~0.5 nL per spot onto the slide. The resultingmicroarrays had a 4 x 12 subgrid layout with 708 spotsper subgrid, each spot having approximate diameter andpitch of 90 μm and 160 μm, respectively. A 280-bp GFP(green fluorescent protein) oligonucleotide was printedin subgrid corners to assist in grid alignment duringimage processing. The slides were crosslinked in a UVStratalinker 2400 (Agilent Technologies, Inc.) at 300 mJ.Array quality was assessed by visual inspection andhybridization of representative slides from a print run bydye-labeled random 9-mer oligonucleotides. The qualitycontrol images were acquired via the GenePix 4200ALscanner (Molecular Devices, CA, USA) at a 10 micronresolution and quantified with the Imagene 8.0 softwaresuite. Oligonucleotide library management, printing ofmicroarrays and quality control was performed by the Ge-nome BC Microarray Platform (Vancouver, BC, Canada).The array design details are available in the Gene Ex-pression Omnibus (accession No. GPL15033). All of thetranscriptprofilingdataisalsodepositedinGEO(availableuponacceptanceofthismanuscriptforpublication).Plant materialsThe origin, genotypes and method of collection of eachtype of material are described in Table 6, except for detailsdescribed here. All of the tissue samples were frozen in li-quid nitrogen immediately after removal from the trees,the seed or tissue culture vessels, and stored at −80°Cuntil further use. Vegetative Buds: Buds were collected from branchesof several clonal replicates of 9-year-old trees of P.glauca regenerated from two genetically distinctsomatic embryogenesis lines as described [37]during the mid-Spring when the buds were justbeginning to grow. For each genotype, five biologicalsamples each consisting of 6 buds (approximately80 mg fresh weight) were used for analysis. Secondary xylem, phelloderm (including phloem),young needles of juvenile trees: Nursery plantingstock (from open-pollinated seed lots) were obtainedas 3-year-old seedlings of Picea glauca, Piceamariana, Picea abies, Picea Sitchensis, Pinus strobus,Pinus resinosa and Larix laricina, were transferred to8-inch pots and grown in a greenhouse undernatural light conditions. Sampling of tissues wastimed with the ending of primary shoot elongation,i.e. after 6 to 8 weeks of growth, and was asdescribed [22]. For each tissue type five biologicalsamples were prepared by pooling 6 independent treeswithin each species. These materials were used forinterspecific comparisons (all 6 species) and for thedevelopment of PiceaGenExpress (P. glauca only). Megagametophytes: Control-pollinated seed (crossC962856) were obtained from P. glauca tree (80112)described in [38], were surface-sterilized for 1minute in 70% EtOH and 10 minutes in 3% Na-Raherison et al. BMC Genomics 2012, 13:434 Page 12 of 16, washing the seeds with sterile waterthree times between and after treatments. The seedswere then immersed in 4°C sterile water for 24 hoursand stratified at 4°C for 28 days. Next, the seeds weremoved to 26°C on petri dishes with a moist paper andkept in the dark to start the germination process.After 4 hours of incubation the seeds were openedand the embryo was separated from themegagametophyte under a dissecting microscope. Atotal of 3 biological samples comprised of a singlemegagametophyte were used for analysis. Adventitious roots: Norway spruce (P. abies) seedlingswere grown in the experimental nursery of FinnishForest Research Institute for about one and a halfyears before sampling. They were grown in standardnursery growing media, light Sphagnum peat, andfertilized with mineral nutrients. Roots were washedunder tap water to remove surrounding peat andapproximately 1 cm of the root ends was collected foranalysis. Between two and four biological samplescomprised of several root ends were analyzed for eachof eight different genotypes.RNA extraction, labelling and hybridizationTotal RNA was extracted following Chang et al. [39] asdescribed in Pavy et al. [22] for all of the sample types,except for megagametophytes where poly A+RNA wasextracted directly by using polyT coated magnetic beads(Dynal). One μg of total RNA was amplified for eachsample replicates, except for the megagametophyte sam-ples where 10 ng of poly A+RNA was used, with theAmino Allyl MessageAmp II aRNA Amplification Kit(Applied Biosystems by Life Technologies, Carlsbad, CA,USA) according to manufacturer’s instructions. Five μgof amplified RNA (aRNA) was then labelled with AlexaFluor 555 or 647 dyes (Invitrogen, Carlsbad, CA, USA)and purified as per the manufacturer’s instructions. Dyeincorporation efficiency was determined by using aNanodrop spectrophotometer (Thermo Fisher Scientific,Waltham, MA, USA) following the manufacturer’sinstructions. Depending on the experiment, each micro-array was hybridized with one labeled aRNA sample ortwo samples labelled with different dyes. The sample(s)to be hybridized to a microarray were mixed and thevolume was reduced to ~10 μl by evaporating excesswater in a DNA 120 speedvac (Thermo Fisher Scien-tific). Labelled aRNAs were fragmented for 15 minutesat 70°C using Ambion’s ”RNA Fragmentation Reagents“(Applied Biosystems), placed on ice for 1 minute, dena-tured for 2 minutes at 95°C, put on ice for 2 min andresuspended in 120 μl hybridization buffer (50% forma-mide, 5X SSC, 0,1% SDS, 0,1 mg/mL Herring spermDNA) preheated to 55°C. Samples were kept in a heatingblock at 50°C until hybridization.Hybridizations were performed in HS400Pro hybridizationstations (Tecan Group Ltd., Männedorf, Switzerland). Theslides were heated at 80°C for 10 minutes, then washed onceat 37°C with 0.5X SSC, 0.1% SDS for 20 seconds and onceat 50°C with 2X SSC, 0.5% SDS for 20 seconds, and prehy-bridized for 1 hour at 65°C in 5X SSC, 0.1% SDS, 0.1 mg/mlBSA, 0.1 mg/ml Herring Sperm DNA. Next the slides werewashed at 55°C with 2X SSC, 0.5% SDS for 1 minute with a30 second soak and washed again at 45°C for 1 minute withthe same solution. The resuspended labeled targets wereinjected into the chambers and hybridized for 16 hours at45°C with sample agitation. The slides were then washed asfollows: 2 times 1 minute 30 seconds at 45°C with 30 sec-onds soaking in 2X SSC, 0.5% SDS, 1 time 1 minute at 45°Cin 2X SSC, 0.5% SDS, 2 times 1 minute 30 seconds at 45°Cwith 30 seconds soaking in 0.5X SSC, 0.1% SDS, 1 time 1minute at 37°C with 20 seconds soaking in 0.5X SSC, 0.1%SDS, 1 time 1 minute at 23°C with 20 seconds soaking in0.5X SSC, 0.1% SDS, 1 time 1 minute 30 seconds at 23°Cwith 30 seconds soaking in 0.1X SSC, 1 time 30 seconds at23°C in 0.1X SSC and 2 times 30 seconds at 23°C in milliQfiltered water. Finally slides were dried for 2 minutes 30 sec-onds with nitrogen gaz. Slide scanning and image processingwere performed as described in Table 6.Microarray data processing and analysisA procedure was developed to process and analyzemicroarray intensity data from single dyes, as opposedto fold change methods routinely used for spottedarrays. Data analyses were performed using customizedscripts for R and Bioconductor (http://www.r-project.organd Spots that wereflagged as presenting abnormal morphology during theimage processing were replaced by mean value of theremaining spots of the same probe from the other slidesfrom the same sample type. Background intensities weresubtracted from the foreground intensities. Background-subtracted data were log2-transformed and normalizedusing quantile correction approach.A filtering step was applied to select positive genes to beused for further analysis. The mean intensity of spots con-taining buffer only was calculated for each row of sub-grids, and was taken as the minimum intensity of probesfor that subgrid. A probe was called positive (detectedabove background) when its signal intensity was above thebuffer intensity on at least 50% of slides within a givensample type. When determining differential expression,positive probes were probes that were detected accordingto this criterion in at least one of the tissues (e.g. phello-derm and xylem juvenile). Mean probe intensity wasdetermined for genes represented by more than one posi-tive probe. All microarray experiment data has been sub-mitted to the Gene Expression Omnibus (GEO) underaccession numbers GSE35624, GSE35847 and GSE35922.Raherison et al. BMC Genomics 2012, 13:434 Page 13 of 16 testing for differential gene expression usedthe linear modeling approach and the empirical Bayesstatistics [40]; the p-values were adjusted for multipletesting according to Benjamini and Hochberg [41]. Dif-ferential genes were those meeting an adjusted p-value≤0.01, unless stated otherwise.The PiceaGenExpress databaseThe PiceaGenExpress database was developed fromtranscript profiles obtained for eight different tissuetypes coming from five independent experiments (seeTable 6). For this, the average signal intensity was deter-mined from all of the slides available for a sample type.The genes were then ranked based on their average sig-nal intensities within a tissue type and equally dividedinto 10 separate classes according to their signal inten-sity. Genes from class 1 or class 10 were the 10% withlowest and highest signal intensities, respectively (for thenumber of genes per class in each type, see Additionalfile 6: Table S2 and Additional file 8: Figure S5). Func-tional annotations were based on the matches with Ara-bidopsis proteins (TAIR 9 release) with E-value <1e-10and on the detection of Pfam domains as described[13]. PiceaGenExpress is made available as a flat file(Additional file 5: Table S1), which may be uploaded toany type of data processing or spreadsheet platform.Agglomerative hierarchical clustering with the completelinkage method was performed using hclust function in R[42] on the expression levels for two gene families, i.e.dehydrins and LEA proteins. This approach used a simi-larity matrix based on Euclidean distance. A smaller dis-tance means that two genes or clusters have more similarexpression levels in the tissues analyzed. First, the cluster-ing analysis placed each gene into its own singleton groupor cluster. Second, the closest clusters were iterativelyjoined together until all genes were merged into a singlecluster based upon similarity/distance measures betweenclusters. Dendrograms showing clusters of genes weredrawn (Additional file 9: Figure S6).RNA-Seq data processing and analysisTwo composite samples were analysed by RNA- Sequencingfor validation and comparison to MA results. Sampleswere prepared by combining equal molar amounts of theP. glauca RNAs isolated from secondary xylem (juveniletrees) and phelloderm. These samples were also used inthe MA profiling of each of the tissues. RNA-Sequencing,filtering of quality reads and mapping of reads onto thecDNA clusters were described [13]. The number ofsequences matching each cDNA cluster was normalizedby transforming the data to the number of reads per kilo-base of exon model per million mapped reads (RPKM) fol-lowing the method of Mortazavi et al. [24]. Unlessspecified otherwise, an RPKM >1 was used as a minimumthreshold of detection for RNA-Seq. Differential expressionof genes was determined by using the Chi-squared test cor-rected for multiple testing according to Benjamini andHochberg [41].Additional filesAdditional file 1: Additional material Experimental tests of optimaloligonucleotide design and manufacture methods.Additional file 2: Figure S1. Effect of SNPs on hybridization signalintensities and differential expression ratios. Hybridization data werebased on five biological replications of each white spruce tissue tested,and two technical replicates (dye swaps) were used for each sample.Each data point represents the mean value for the five biologicalreplicates. For probe intensities (A, C, E), the data are based onhybridizations with total RNA from secondary xylem; each pointrepresents the mean value data for Alexa Fluor 555 (green) or Alexa Fluor647 (red). The ratios (B, D, F) were obtained from pair-wise comparisonsof secondary xylem and young needles; each dot represents the meanratio obtained from the dye-swaps of all five biological replicates.Additional file 3: Figure S2. Comparison of differential expressionresults from a cDNA microarray and the test oligonucleotide microarray.Hybridization data were based on five biological replications of eachwhite spruce tissue tested, and two technical replicates (dye swaps) wereused for each sample. Tissue preferential expression was determined asdescribed (Pavy et al. 2008) for secondary xylem and young needles. Theoutcomes of the two types of arrays were compared by assessing thepresence or absence of statistically significant tissue preference.Additional file 4: Figure S3. Interspecific comparison of hybridizationintensities in the phelloderm. A-F: Pair-wise comparison of white spruceand six other species based on the number of shared positive probesindicated in the plots. The squared correlation coefficients (r2) are asfollows 0.83 (A), 0.84 (B), 0.90 (C), 0.18 (D), 0.30 (E) and 0.18 (F). G: Analysisof signal intensity variation between species; the fold change (FC) wasdetermined from the average normalized signal intensities (log2 scale).An FC of 1 or −1 represents a two-fold signal increase or decrease,respectively.Additional file 5: Table S1. PiceaGenExpress transcript profiles.Additional file 6: Table S2. Summary statistics of Picea Gen Expresstranscript profiles.Additional file 7: Figure S4. Hybridization signal intensities of genes ineach of the 10 expression classes in each tissue in PiceaGenExpresstranscript profiles. Vegetative buds (A), megagametophytes (B), xylemfrom mature trees (C), phelloderm from juvenile trees (D), xylem fromjuvenile trees (E), embryogenic cells (F), needles (G) and roots (H). RPKMvalues from RNA-sequencing of phelloderm (I) and xylem (J) of juveniletrees are also presented.Additional file 8: Figure S5. Expression classes and numbers of tissueof annotated and non annotated sequences. A-E: Number of annotatedand non annotated sequences per expression class for embryogenic cells(A), megagametophytes (B), xylem from mature trees (C), phelloderm (D)and vegetative buds (E). F: Number of tissues in which each annotatedand non-annotated sequence was detected. Frequency, number of genesin a given intensity class or detected in a given number of tissues types.Additional file 9: Figure S6. Hierarchical clustering dendrograms ofgene expression within two osmotic regulation protein families: A)dehydrins, and B) late embryogenesis abundant (LEA) proteins. Each leafnode of the dendrograms corresponds to an individual gene, and eachnode (horizontal line) represents a gene cluster. A gene cluster iscomposed of individual genes or existing gene cluster with the fusionpoint. Each gene cluster was placed at a height level as shown on thevertical axis. Height values refer to the similarity/distance measuresbetween genes and gene clusters.Competing interestsThe authors declare that they have no competing interests.Raherison et al. BMC Genomics 2012, 13:434 Page 14 of 16’ contributionsPR, BB, JB and JM participated in the microarray design; ER, BB, SC, CB, JMdesigned experiments; ER, SC, J-PV, IG prepared samples, extracted RNAs,and carried out hybridizations and data acquisition; ER, PR, P-LP, J-PVdeveloped data analyses procedures and analysed the test array; ER and SCcreated the PiceaGenExpress database; SC submitted microarray design anddata to public databases; ER, SC and JM drafted the manuscript. All of theauthors approved the manuscript.AcknowledgmentsThe authors thank A. Séguin, I. Duval, K. Klimaszewska, S. Velmala, and T.Pennanen for sharing data before publication, for incorporation intoPiceaGenExpress reference transcript profiles. The authors would like toacknowledge Stephane LeBihan of the Genome BC Microarray Platform foradvice and dedicated support of microarray printing. Funding for this workwas obtained from Genome Canada, Genome Quebec, Genome BC for theArborea, Treemomix , and SMarTForests Projects, and the Discovery grantprogram of the Natural Sciences and Engineering Council (JM).Author details1Center for Forest Research and Institute for Integrative and Systems Biology,Université Laval, Québec, QC, Canada G1V 0A6. 2Gydle Inc., Québec, QC,Canada. 3Michael Smith Laboratories, University of British-Columbia,Vancouver, BC, Canada.Received: 3 February 2012 Accepted: 11 July 2012Published: 29 August 2012References1. Schmid M, Davison TS, Henz SR, Pape UJ, Demar M, Vingron M, Schölkopf B,Weigel D, Lohmann JU: A gene expression map of Arabidopsis thalianadevelopment. Nat Genet 2005, 37(5):501–506.2. Sjödin A, Street NR, Sandberg G, Gustafsson P, Jansson S: The PopulusGenome Integrative Explorer (PopGenIE): a new resource for exploringthe Populus genome. New Phytol 2009, 182(4):1013–1025.3. Schadt EE, Monks SA, Drake TA, Lusis AJ, Che N, Colinayo V, Ruff TG, Milligan SB,Lamb JR, Cavet G: Genetics of gene expression surveyed in maize, mouseand man. Nature 2003, 422(6929):297–302.4. Lemos B, Meiklejohn CD, Cáceres M, Hartl DL: Rates of divergence in geneexpression profiles of primates, mice, and flies: stabilizing selection andvariability among functional categories. Evolution 2005, 59(1):126–137.5. Huang GJ, Shifman S, Valdar W, Johannesson M, Yalcin B, Taylor MS, TaylorJM, Mott R, Flint J: High resolution mapping of expression QTLs inheterogeneous stock mice in multiple tissues. Genome Res 2009,19(6):1133–1140.6. Brawand D, Soumillon M, Necsulea A, Julien P, Csárdi G, Harrigan P, Weier M,Liechti A, Aximu-Petri A, Kircher M: The evolution of gene expression levelsin mammalian organs. Nature 2011, 478(7369):343–348.7. Zheng-Bradley X, Rung J, Parkinson H, Brazma A: Large scale comparisonof global gene expression patterns in human and mouse. Genome Biol2010, 11(12):R124.8. Arabidopsis Genome Initiative: Analysis of the genome sequence of theflowering plant Arabidopsis thaliana. Nature 2000, 408(6814):796–815.9. Yu J, Hu S, Wang J, Wong GKS, Li S, Liu B, Deng Y, Dai L, Zhou Y, Zhang X:A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science2002, 296(5565):79–92.10. Tuskan G, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, Putnam N,Ralph S, Rombauts S, Salamov A, et al: The genome of black cottonwood,Populus trichocarpa (Torr & Gray ex Brayshaw). Science 2006,313:1596–1604.11. Jaillon O, Aury JM, Noel B, Policriti A, Clepet C, Casagrande A, Choisne N,Aubourg S, Vitulo N, Jubin C: The grapevine genome sequence suggestsancestral hexaploidization in major angiosperm phyla. Nature 2007, 449(7161):463–467.12. MacKay J, Dean J: Transcriptomics. In Genomics and Breeding of Conifers.Edited by Plomion C, Bousquet J, Kole C. New York: Edenbridge SciencePublishers and CRC Press; 2011:333–357.13. Rigault P, Boyle B, Lepage P, Cooke JEK, Bousquet J, MacKay JJ: A WhiteSpruce Gene Catalog for Conifer Genome Analyses. Plant Physiol 2011,157(1):14–28.14. Ralph S, Chun H, Kolosova N, Cooper D, Oddy C, Ritland C, Kirkpatrick R,Moore R, Barber S, Holt R: A conifer genomics resource of 200,000 spruce(Picea spp.) ESTs and 6,464 high-quality, sequence-finished full-lengthcDNAs for Sitka spruce (Picea sitchensis). BMC Genomics 2008, 9:484.15. Kirst M, Johnson AF, Baucom C, Ulrich E, Hubbard K, Staggs R, Paule C,Retzel E, Whetten R, Sederoff R: Apparent homology of expressed genesfrom wood-forming tissues of loblolly pine (Pinus taeda L.) withArabidopsis thaliana. Proc Nat Acad Sc USA 2003, 100(12):7383–7388.16. Cairney J, Pullman GS: The cellular and molecular biology of coniferembryogenesis. New Phytol 2007, 176(3):511–536.17. Holliday JA, Ralph SG, White R, Bohlmann J, Aitken SN: Global monitoringof autumn gene expression within and among phenotypically divergentpopulations of Sitka spruce (Picea sitchensis). New Phytol 2008,178(1):103–122.18. Lorenz WW, Sun F, Liang C, Kolychev D, Wang H, Zhao X, Cordonnier-Pratt MM,Pratt LH, Dean JFD: Water stress-responsive genes in loblolly pine (Pinustaeda) roots identified by analyses of expressed sequence tag libraries. TreePhysiol 2006, 26:1–16.19. Zhao C, Craig JC, Petzold HE, Dickerman AW, Beers EP: The xylem andphloem transcriptomes from secondary tissues of the Arabidopsis root-hypocotyl. Plant Physiol 2005, 138(2):803–818.20. Oh S, Park S, Han KH: Transcriptional regulation of secondary growth inArabidopsis thaliana. J Exper Bot 2003, 54(393):2709–2722.21. Ko JH, Beers EP, Han KH: Global comparative transcriptome analysisidentifies gene network regulating secondary xylem development inArabidopsis thaliana. Mol. Gen. Gen. 2006, 276(6):517–531.22. Pavy N, Boyle B, Nelson C, Paule C, Giguère I, Caron S, Parsons LS, Dallaire N,Bedon F, Bérubé H: Identification of conserved core xylem gene sets:conifer cDNA microarray development, transcript profiling andcomputational analyses. New Phytol 2008, 180(4):766–786.23. Beaulieu J, Doerksen T, Boyle B, Clément S, Deslauriers M, Beauseigle S, Blais S,Poulin PL, Lenz P, Caron S: Association genetics of wood physical traits inthe conifer white spruce and relationships with gene expression.Genetics 2011, 188(1):197–214.24. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping andquantifying mammalian transcriptomes by RNA-Seq. Nat Methods 2008,5(7):621–628.25. Nairn CJ, Haselkorn T: Three loblolly pine CesA genes expressed indeveloping xylem are orthologous to secondary cell wall CesA genes ofangiosperms. New Phytol 2005, 166(3):907–915.26. Suzuki S, Li L, Sun YH, Chiang VL: The cellulose synthase gene superfamilyand biochemical functions of xylem-specific cellulose synthase-likegenes in Populus trichocarpa. Plant Physiol 2006, 142(3):1233–1245.27. Velasco-Conde T, Yakovlev IP, Majada JP, Aranda I, Johnsen O: Dehydrins inmaritime pine (Pinus pinaster) and their expression related to droughtstress response. Tree Genetics & Genomes 2012.28. Yakovlev IA, Asante DK, Fossdal CG, Partanen J, Junttila O, Johnsen O:Dehydrins expression related to timing of bud burst in Norway spruce.Planta 2008, 228(3):459–472.29. Morse AM, Peterson DG, Islam-Faridi MN, Smith KE, Magbanua Z, Garcia SA,Kubisiak TL, Amerson HV, Carlson JE, Nelson CD: Evolution of genome sizeand complexity in Pinus. PLoS One 2009, 4(2):e4332.30. Morgante M, De Paoli E: Toward the conifer genome sequence. InGenetics, Genomics and Breeding of Conifers Trees. Edited by Plomion C,Bousquet J, Kole C. New York: Edenbridge Science Publishers and CRC Press;2011:389–407.31. Kovach AS, Wegrzyn JL, Parra G, Holt C, Bruening GE, Loopstra CA, Hartigan J,Yandell M, Langley CH, Korf I, Neale DB: The Pinus taeda genome ischaracterized by diverse and highly diverged repetitive sequences.BMC Genomics 2010, 11:420.32. Parchman T, Geist K, Grahnen J, Benkman C, Buerkle CA: Transcriptomesequencing in an ecologically important tree species: assembly,annotation, and marker discovery. BMC Genomics 2010, 11:180.33. Pavy N, Laroche J, Bousquet J, Mackay J: Large-scale statistical analysis ofsecondary xylem ESTs in pine. Plant Mol Biol 2005, 57(2):203–224.34. Grattapaglia D, Plomion C, Kirst M, Sederoff RR: Genomics of growth traitsin forest trees. Curr Opin Plant Biol 2009, 12(2):148–156.35. Keurentjes JJB, Fu J, Terpstra IR, Garcia JM, Van Den Ackerveken G, Snoek LB,Peeters AJM, Vreugdenhil D, Koornneef M, Jansen RC: Regulatory networkconstruction in Arabidopsis by using genome-wide gene expressionquantitative trait loci. Proc Nat Acad Sci USA 2007, 104(5):1708–1713.Raherison et al. BMC Genomics 2012, 13:434 Page 15 of 16 Drost DR, Benedict CI, Berg A, Novaes E, Novaes CRDB, Yu Q, Dervinis C,Maia JM, Yap J, Miles B: Diversification in the genetic architecture of geneexpression and transcriptional networks in organ differentiation ofPopulus. Proc Nat Acad Sci USA 2010, 107(18):8492–8497.37. Klimaszewska K, Overton C, Stewart D, Rutledge RG: Initiation of somaticembryos and regeneration of plants from primordial shoots of 10-year-old somatic white spruce and expression profiles of 11 genes followedduring the tissue culture process. Planta 2011, 233(3):635–647.38. Pelgas B, Bousquet J, Meirmans PG, Ritland K, Isabel N: QTL mapping inwhite spruce: gene maps and genomic regions underlying adaptivetraits across pedigrees, years and environments. BMC Genomics 2011,12:145.39. Chang S, Puryear J, Cairney J: A simple and efficient method for isolatingRNA from pine trees. Plant Mol Biol Rep 1993, 11(2):113–116.40. Smyth GK: Linear models and empirical Bayes methods for assessingdifferential expression in microarray experiments. Stat Applic Gen Mol Biol2004, 3:1.41. Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practicaland powerful approach to multiple testing. J Royal Stat Society Series B(Methodological) 1995, 57(1):289–300.42. Kaufman L, Rousseeuw P: Finding Groups in Data: An Introduction to ClusterAnalysis. New York: John Wiley & Sons, Inc; 1990.43. Zhao N, Boyle B, Duval I, Ferrer JL, Lin H, Seguin A, Mackay J, Chen F:SABATH methyltransferases from white spruce (Picea glauca): genecloning, functional characterization and structural analysis. Tree Physiol2009, 29(7):947–957.44. Korkama T, Pakkanen A, Pennanen T: Ectomycorrhizal communitystructure varies among Norway spruce (Picea abies) clones. New Phytol2006, 171(4):815–824.doi:10.1186/1471-2164-13-434Cite this article as: Raherison et al.: Transcriptome profiling in conifersand the PiceaGenExpress database show patterns of diversificationwithin gene families and interspecific conservation in vascular geneexpression. BMC Genomics 2012 13:434.Submit your next manuscript to BioMed Centraland take full advantage of: • Convenient online submission• Thorough peer review• No space constraints or color figure charges• Immediate publication on acceptance• Inclusion in PubMed, CAS, Scopus and Google Scholar• Research which is freely available for redistributionSubmit your manuscript at et al. BMC Genomics 2012, 13:434 Page 16 of 16


Citation Scheme:


Citations by CSL (citeproc-js)

Usage Statistics



Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            async >
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:


Related Items