UBC Faculty Research and Publications

Evolution of gene structure in the conifer Picea glauca: a comparative analysis of the impact of intron… Stival Sena, Juliana; Giguère, Isabelle; Boyle, Brian; Rigault, Philippe; Birol, Inanc; Zuccolo, Andrea; Ritland, Kermit; Ritland, Carol; Bohlmann, Joerg; Jones, Steven; Bousquet, Jean; Mackay, John Apr 16, 2014


RESEARCH ARTICLE Open Access

Evolution of gene structure in the conifer Picea glauca: a comparative analysis of the impact of intron size

Juliana Stival Sena1*, Isabelle Giguère1, Brian Boyle1, Philippe Rigault2, Inanc Birol3, Andrea Zuccolo4,5, Kermit Ritland6, Carol Ritland6, Joerg Bohlmann3, Steven Jones3, Jean Bousquet1,7 and John Mackay1

Abstract

Background: A positive relationship between genome size and intron length is observed across eukaryotes, including angiosperm plants, indicating a co-evolution of genome size and gene structure. Conifers have very large genomes and longer introns on average than most plants, but the impacts of their large genomes and longer introns on gene structure have not been described.

Results: Gene structure was analyzed for 35 genes of Picea glauca obtained from BAC sequencing and genome assembly, including comparisons with A. thaliana, P. trichocarpa and Z. mays. We aimed to develop an understanding of the impact of long introns on the structure of individual genes. The number and length of exons were well conserved among the species compared but, on average, P. glauca introns were longer and genes had four times more intronic sequence than Arabidopsis, and two times more than poplar and maize. However, pairwise comparisons of individual genes gave variable results and not all contrasts were statistically significant. Genes generally accumulated one or a few longer introns in species with larger genomes, but the position of long introns was variable between plant lineages. In P. glauca, highly expressed genes generally had more intronic sequence than tissue preferential genes. Comparisons with the Pinus taeda BACs and genome scaffolds showed a high conservation for position of long introns and for sequence of short introns. A survey of 1836 P.
glauca genes obtained by sequence capture, mostly containing introns <1 Kbp, showed that repeated sequences were 10× more abundant in introns than in exons.

Conclusion: Conifers have large amounts of intronic sequence per gene for seed plants due to the presence of a few long introns, and repetitive element sequences are ubiquitous in their introns. Results indicate a complex landscape of intron sizes and distribution across taxa and between genes with different expression profiles.

Keywords: Genome size, Pinus taeda, BAC, Repeat elements, Gymnosperms, Gene expression

* Correspondence: juliana.sena.1@ulaval.ca
1 Center for Forest Research and Institute for Systems and Integrative Biology, 1030 rue de la Médecine, Université Laval, Québec, QC G1V 0A6, Canada
Full list of author information is available at the end of the article

© 2014 Stival Sena et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Stival Sena et al. BMC Plant Biology 2014, 14:95
http://www.biomedcentral.com/1471-2229/14/95

Background

Many factors related to genome size, recombination rate, expression level, and effective population size, among others, have been proposed to affect the evolution of gene structure [1-4]. At the molecular level, genome size variations may result from mobile or transposable elements (TEs), whole genome duplication events, and polyploidization events, among others. Comparative studies have shown that intron lengths and the abundance of mobile elements directly correlate with genome size, such that large genomes have longer introns and a higher proportion of mobile elements [1]. Mobile elements also impact gene structure and function as they can insert into genes, including introns and exons, and thus contribute to the evolution of genes.

Conifer trees have very large genomes ranging from 18 to 35 Gbp [5] that are composed of a large fraction of repetitive sequences [6,7]. New insights into plant genome evolution are expected from the unique structure and history of conifer genomes [8], which may contribute to a broader understanding of the relationships between gene structure and genome architecture. Draft genome assemblies were recently reported for the European Picea abies (Norway spruce) [9] as well as the North American species Picea glauca (white spruce) [10] and Pinus taeda (loblolly pine) [11,12]. Nystedt et al. [9] reported that Norway spruce and other conifers accumulate long introns and showed that some introns can be very long (>10 Kbp) compared to other plant species.

A positive relationship between genome size and intron length has been observed in broad phylogenetic studies [2,13,14], including between recently diverged Drosophila species harboring considerable difference in genome size, where D. virilis had longer introns than D. melanogaster [15]. In plants, a few studies investigated this question within angiosperms, indicating that genome size is not necessarily a good predictor of intron length [16,17] although a general trend is observed. For instance, Arabidopsis thaliana, Populus trichocarpa and Zea mays have well characterized genomes that range in size from 125 Mbp to 2.3 Gbp; their average exon sizes are between 250 and 259 bp, whereas their intron sizes are 168 bp, 380 bp and 607 bp on average, respectively [18-20]. The length of introns may depend upon gene function and expression level; however, there is considerable debate surrounding this issue when it comes to plant genomes. In Oryza sativa and A.
thaliana it was found that highly expressed genes contained more and longer introns than genes expressed at a low level [21], which is in contrast to findings in Caenorhabditis elegans and Homo sapiens [4].

Transposable elements are among the factors that may influence the evolution of intron size, as they represent the major component of plant genomes [22]. In Vitis vinifera, transposable elements comprise 80% of long introns [17]. In many plants, LTR retrotransposons (LTR-RT) represent a large fraction of the genome but are more abundant in gene-poor regions of the genome; therefore, their impact on the evolution of gene structure may actually be smaller than that of other classes of transposable elements such as MITEs [23] and helitrons, both of which are known to insert into or close to genes [24].

To date, studies related to genome size and the evolution of plant introns have primarily involved angiosperms (flowering plants), many of which have genomes under 1 Gbp. More recently, the Picea abies and Pinus taeda genomes were shown to have among the largest average intron sizes [9,12]. We aimed to develop an understanding of the gene structure in conifers through a detailed analysis of individual genes, with a particular emphasis on the potential impact of long introns on gene structure through comparative analyses. An underlying question relates to potential impacts on gene expression; therefore, our analyses took into account the expression profiles of the genes. Gene structure was analyzed in two conifers (P. glauca and P. taeda) and three angiosperms. We explored three main hypotheses: (1) intron length is the major type of variation affecting gene structure in conifers compared to other plant species; (2) there is a positive relationship between genome size and intron length in P. glauca compared to A. thaliana, Z. mays and P. trichocarpa; (3) P. glauca and P.
taeda present a conserved gene structure despite the fact that they diverged over 100 MYA, in keeping with their low rate of genome evolution [8].

We present a detailed analysis of gene structure for 35 genes from the conifer Picea glauca obtained from BAC sequencing and genome assembly, and comparative analyses with A. thaliana, P. trichocarpa and Z. mays. Our study also included the analysis of nearly 6000 gene sequences obtained from sequence capture, aiming to explore the potential impact of repetitive sequences on intron size in P. glauca. Our findings show that intron size and the position of long introns within genes are variable between plant lineages but highly conserved in conifers.

Results

Genomic sequences

Genomic sequences were analyzed for several P. glauca genes. The sequences were obtained either by targeted BAC isolations, from an early assembly of the P. glauca genome [10], or from a sequence capture experiment (for details, see Methods).

A total of 21 BAC clones were isolated, each containing a different single copy gene associated with secondary cell-wall formation or with nitrogen metabolism. Following shotgun sequencing by GS-FLX and assembly with the Newbler software, the integrity and identity of each gene were verified. The estimated size of BAC clones was 131 Kb on average and coverage was 144× (for summary statistics, see Additional file 1: Table S6). Twenty of the 21 targeted genes were complete, as determined by sequence alignment indicating full coverage of full-length (FL) cDNA sequences from spruces and pines (P. glauca, P. sitchensis, P. taeda and Pinus sylvestris) [25-28] (Additional file 1: Table S7). Nearly all genes were contained within a single contig, except the LIM gene, which lacked one exon, and the Susy gene, whose complete cDNA sequence was covered but spanned two contigs. None of the BACs contained other genes, as determined by BLAST searches against the P. glauca gene catalog [29] and the Swiss Prot database.

Sequences were also isolated from a whole genome shotgun assembly of P.
glauca [10]. Sequences with ubiquitous expression were targeted in order to complement the set of more specialized genes which had been selected for BAC isolation. The P. glauca genome shotgun assembly was screened with the complete CDS derived from cDNA sequences (according to Rigault et al. [29]) that were highly expressed in most tissues (according to Raherison et al. [30]). A total of 18 genomic sequences were randomly selected among those that spanned the entire coding region of the targeted gene.

Gene expression profiles

Transcript accumulation profiles from eight different tissues were obtained from the PiceaGenExpress database [30] for each of the gene sequences described above (Figure 1). The transcript data indicated that the group of highly expressed genes was detected in all tissues, with an average abundance class above 9.7 (out of 10) across all tissues (Figure 1, top). In contrast, the genes associated with wood formation and nitrogen metabolism nearly all had tissue preferential expression patterns; they were detected in six tissues on average (range of two to eight tissues) and had an average transcript abundance class of 5.8 in those tissues where the genes were expressed (Figure 1, bottom).

Gene structures and comparative analysis with angiosperms

The gene structure (exon and intron regions) of P. glauca genes was determined by mapping the complete cDNA onto the genomic sequence (BACs or shotgun contigs) for 35 genes. Homologs were retrieved from three well-characterized angiosperm genomes, Arabidopsis thaliana [19], Populus trichocarpa [18] and Zea mays [20]. The comparative analyses considered all of the genes together and also as two separate groups, i.e. highly expressed genes and genes related to secondary cell-wall formation and nitrogen metabolism. On average, the protein coding sequence similarity between P. glauca and A. thaliana was 76%, 78% with P.
trichocarpa and 75% with Z. mays.

The number of exons and introns was well conserved between homologous genes among the different species (Table 1). The average length of exons was also well conserved between homologs among species (average of 240 bp, median of 155 bp) and varied only slightly between the two sub-groups of genes (Table 1 and Additional file 2: Figure S2). Pairwise comparisons of matching exons also indicated conservation of length among the species considered (not shown). These observations indicate that exon structure is generally well conserved.

In contrast, introns revealed much more variation between species. Our analyses included comparisons of individual introns and of total intronic sequences in each gene. The average length of individual introns (in bp) was 144, 295, 454, and 532 for A. thaliana, P. trichocarpa, Z. mays and P. glauca, respectively (Figure 2 and Additional file 2: Figure S2). The average intron length varied significantly between P. glauca and the three other species; pairwise contrasts were significant with A. thaliana and Z. mays, and nearly significant with P. trichocarpa (Figure 2). In P. glauca, P. trichocarpa and Z. mays, we also observed that intron lengths were more heterogeneous, as shown by differences between lower and upper quartiles, minimum and maximum lengths, and outliers of large size (Figure 2). The average length of the longest intron per gene was 382 bp in A. thaliana, 806 bp in P. trichocarpa, 1652 bp in Z. mays and 2022 bp in P. glauca.

Comparison of the total length of intronic sequences on a gene-by-gene basis showed that, on average, P. glauca genes had 4.1 times more intronic sequence than A. thaliana, 2.2 times more than P. trichocarpa and 1.8 times more than Z. mays (Figure 3A). The total length of intron sequences and the length ratio were calculated for each gene in pairwise comparisons between all of the species. Comparisons between P. glauca and A.
thaliana gene sets were statistically significant (Figure 3); the ratios were close to five on average in highly expressed genes and three in genes associated with secondary cell-wall formation and nitrogen metabolism (Figure 3B). In contrast, the ratio of total intron lengths for P. glauca compared to P. trichocarpa and Z. mays was constant at around two-fold, and the total length of intronic sequence per gene was not statistically different. Results also indicated that A. thaliana has significantly less intronic sequence than P. trichocarpa and Z. mays, and that their ratios were most different for the highly expressed genes and more similar for the genes involved in secondary cell-wall formation and nitrogen metabolism (Figure 3B). A significant difference in intron lengths was also observed between the two expression groups within P. glauca (p < 0.05).

The variation in the ratios of total intron sequence per gene was quite striking for both of the gene expression groups (Figure 4). For instance, depending on the gene, the ratios ranged from 0.2 to 10. This high level of heterogeneity in pairwise comparisons is likely to account for the lack of statistically significant differences. In addition, the intron length ratios were not consistent across species (Figure 4A and B).

In this study, we show that much of the divergence in the total length of intron sequences per gene was related to a few long introns. Very long introns were observed in a few P. glauca genes such as PHD, Peptidase_C1 and Thiolase. Structure plots showed that introns in A. thaliana generally had uniform lengths, whereas the other species had introns that were highly heterogeneous in length (Figure 5 and Additional file 3: Figure S3). While most of the P. glauca genes only had a few (1–3) very long introns (>1000 bp), gene sequences such as those for sucrose synthase (Susy) had many introns of moderate size (Figure 5). The longest introns in P. glauca were most often in a different position than long Z.
mays and P. trichocarpa introns. In addition, we did not observe a trend of increased length in first introns in 5′ UTRs as reported for several eukaryotes [31], as the long introns in P. glauca appeared to be randomly distributed.

Comparative analysis of gene structures between Picea glauca and Pinus taeda

A total of 23 different genes were submitted to pairwise comparisons between Picea glauca and Pinus taeda, which are both members of the Pinaceae (for details, see Methods). A high level of similarity was observed for coding sequences (91% on average), indicating that they were likely orthologous genes (Additional file 1: Table S4), and gene structure was conserved between the two conifers, with almost identical numbers of exons. The total intronic sequence per gene did not vary significantly, at 3.13 and 3.17 Kbp for P. glauca and P. taeda, respectively (Additional file 1: Table S1). Pairwise comparison of introns indicated that the majority of individual introns were similar in length in the two species, despite the fact that the two genera diverged ca. 140 million years ago [32,33] (Figure 5).
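The per-gene totals compared in this section follow directly from exon coordinates once a cDNA has been mapped onto its genomic sequence. The following is a minimal Python sketch of that bookkeeping, not the pipeline used in the study: the function names and the coordinates of the two example genes are hypothetical and serve only to show how individual intron lengths, total intronic sequence and a between-species ratio can be derived.

```python
from typing import List, Tuple

# An exon is a 1-based, inclusive (start, end) interval on the genomic sequence.
Exon = Tuple[int, int]


def intron_lengths(exons: List[Exon]) -> List[int]:
    """Return the length of each intron implied by consecutive exons."""
    ordered = sorted(exons)
    # The intron between two exons spans the bases strictly between them.
    return [nxt[0] - prev[1] - 1 for prev, nxt in zip(ordered, ordered[1:])]


def total_intron_ratio(exons_a: List[Exon], exons_b: List[Exon]) -> float:
    """Ratio of total intronic sequence in gene A to that in gene B."""
    total_a = sum(intron_lengths(exons_a))
    total_b = sum(intron_lengths(exons_b))
    if total_b == 0:
        raise ValueError("reference gene has no intronic sequence")
    return total_a / total_b


# Hypothetical exon coordinates for one orthologous gene pair (illustration only).
spruce_gene = [(1, 200), (1201, 1400), (1601, 1900)]
pine_gene = [(1, 200), (1101, 1300), (1551, 1850)]

print(intron_lengths(spruce_gene))   # individual intron lengths, spruce
print(intron_lengths(pine_gene))     # individual intron lengths, pine
print(round(total_intron_ratio(spruce_gene, pine_gene), 2))
```

A ratio close to 1, as in this made-up pair, corresponds to the situation reported here for P. glauca and P. taeda (3.13 vs 3.17 Kbp of intronic sequence per gene on average).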
Although these observations are based on a set of only 23 genes, they provide an indication that intron length is mostly conserved between these two conifer genera.

The 138 intron sequences of the 22 genes (the PAL gene does not have introns) were aligned between spruce and pine; sequence similarity ranged quite broadly among homologous introns (Figure 6). We observed that highly conserved introns generally were short, and that longer introns had highly variable levels of sequence similarity, except for two introns that were both long and highly conserved.

Repeat elements in Picea glauca genes

The possible origin of long introns as observed in conifer genomes was investigated by searching for the presence of repeated sequences, including transposable elements.

First, the repetitive element content of the BACs was estimated as a baseline, based on a repeat library constructed with P. glauca data (see Methods). It was 55% on average, but it varied considerably among the BAC clones, ranging from 18% to 83%. Additional file 4: Figure S1 shows that around half of the repetitive sequences were classified as LTR-RT elements and the other half as unknown elements (without significant hits in Repbase and nr GenBank).

We then analyzed the sequences of the 35 P. glauca genes described above, including those identified in BACs, representing a total of 238 introns. The structures of these genes were screened for repeat elements using a P. glauca repeat library (see Methods). We found repetitive elements in 10 of the genes, for a total of 24 unclassified fragments with no significant hits in Repbase; 22 of the fragments produced no hits in GenBank and were 179 bp on average, and only two had significant hits in nr GenBank (Additional file 1: Table S8).

We also extended our analysis to include an additional set of genomic sequences obtained by targeted gene space sequencing based on sequence capture (see Methods for details).
Complete genomic sequences spanning the entire known mRNA sequence were recovered for 5970 complete genes, 1836 of which contained one or more introns. The different repetitive elements identified in introns and exons were then estimated. The proportion of genes harbouring repetitive elements in their introns was 32.4%, whereas it was only 3.2% for exons. The repetitive elements represented 2.94% and 0.74% of the intronic and exonic sequences, respectively (Table 2). The repetitive sequences that were identified ranged from 31 to 1142 bp (median 117 bp) in exons and from 17 to 1189 bp (median 114 bp) in introns. The unclassified elements were the most numerous, representing on average 80% of the hits in both introns and exons (Table 2). Class I LTR transposons were the most abundant group of classified repetitive elements and were only represented by incomplete elements. The LTR elements accounted for the higher representation of repetitive element sequence in introns; however, on average, the sequences identified as Copia and Gypsy elements were longer in exons than in introns.

Figure 1 Transcript accumulation profiles from the PiceaGenExpress database (Raherison et al. [30]) of the P. glauca genes. The transcript abundance data are classified from 1 to 10, from lowest to highest microarray hybridization intensities detected within a given tissue. The profiles of highly expressed genes (top) (according to Raherison et al. [30]; class 8 to 10) are contrasted with most of the genes associated with secondary cell wall formation and nitrogen metabolism (bottom, names in bold). NA: Not detected. Tissues: B (Vegetative buds), F (Foliage), X-M (Xylem, from mature trees), X-J (Xylem, juvenile trees), P (Phelloderm), R (Adventitious roots), M (Megagametophytes), E (Embryogenic cells).

Table 1 Average number and length of exons in genes used for comparative analyses

| Species | Highly expressed genes (1): exon number | Exon length (bp) | Standard deviation | Secondary cell-wall formation and nitrogen metabolism genes (2): exon number | Exon length (bp) | Standard deviation |
|---|---|---|---|---|---|---|
| Arabidopsis thaliana | 5.9 | 220.8 | 215.0 | 9.1 | 228.9 | 189.8 |
| Populus trichocarpa | 6.2 | 241.5 | 253.3 | 9.4 | 261.1 | 263.5 |
| Zea mays | 6.1 | 244.5 | 236.7 | 9.0 | 257.6 | 274.8 |
| Picea glauca | 6.2 | 236.5 | 226.0 | 9.5 | 223.9 | 217.8 |

1 Data were obtained from 18 different genes and an average total of 109 exons per species.
2 Data were obtained from 17 different genes and an average total of 157 exons per species.

Discussion

This study reports on the detailed gene structure analysis of 35 genes from the conifer Picea glauca obtained from BAC sequencing and genome assembly. Recent analyses of the Picea abies and Pinus taeda genomes examined individual introns and reported among the highest average intron lengths, the longest introns and the highest average among long introns [9,12]. We aimed to develop an understanding of the gene structure in conifers through a detailed analysis of entire genes, taking into account gene expression profiles, with a particular emphasis on the potential impact of longer introns on gene structure through comparative analyses. Our findings were also derived from the analysis of nearly 6000 gene sequences obtained from sequence capture. We present an interpretation of our findings in regard to the evolution of gene structure.

Evolution of gene structure in plants

Analyses over a broad phylogenetic spectrum in eukaryotes showed that increases in genome size correlate with increases in the average intron length [2,13].
A strong relationship between intron length and genome size was observed in studies of humans and pufferfish [14], of Drosophila species [15], and of plants with small genomes [2,13].

Our study compared the gene structure (introns and exons) of 35 homologous genes between four seed plant species with very different genome sizes. The conifer P. glauca has the largest genome, with 19.8 Gbp [34]; among angiosperms, the monocot Z. mays has a genome of 2.3 Gbp [24], and the dicots represent the smaller plant genomes in this set, i.e. P. trichocarpa with a genome of 484 Mbp [18] and A. thaliana with the smallest genome of 125 Mbp [19]. In the present study, the average exon length was similar between the four species, but the overall length of genes varied owing to longer introns in P. glauca, P. trichocarpa and Z. mays. For the set of sequences analyzed, P. glauca had 4.1 times more intron sequence per gene than Arabidopsis, 2.2 times more than poplar and 1.8 times more than maize (Figures 3 and 4); however, the statistical significance of these differences was variable.

The landscape of intron sizes in plants appears rather complex. A significant number of Vitis vinifera introns were shown to be uncommonly large for its genome size of 416 Mbp, compared to other plants [17]. In Gossypium, after multiple inferred rounds of genome expansion and contraction, intron size remained unchanged [16]. Such a pattern may be expected, given that genome size increase by polyploidy is sudden and fundamentally different from other types of genome size variation, such as the gradual accumulation or loss of repeat elements over time. Taken together, observations from different plants indicate that events resulting in the expansion or contraction of intergenic regions are not clearly reflected by shifts in intron length.
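The scale of this mismatch can be illustrated with figures already given in this article: the genome sizes cited above and the average intron lengths measured for the 35 genes (144, 295, 454 and 532 bp). The Python sketch below is only an illustration; the dictionary and function names are assumptions introduced here. It expresses each species as a fold change relative to A. thaliana.

```python
# Genome sizes (Mbp) and average intron lengths (bp) as quoted in this article.
genome_size_mbp = {
    "A. thaliana": 125,
    "P. trichocarpa": 484,
    "Z. mays": 2300,
    "P. glauca": 19800,
}
mean_intron_bp = {
    "A. thaliana": 144,
    "P. trichocarpa": 295,
    "Z. mays": 454,
    "P. glauca": 532,
}


def fold_vs_reference(values: dict, reference: str) -> dict:
    """Express every value as a fold change relative to the reference species."""
    ref = values[reference]
    return {species: value / ref for species, value in values.items()}


genome_fold = fold_vs_reference(genome_size_mbp, "A. thaliana")
intron_fold = fold_vs_reference(mean_intron_bp, "A. thaliana")

for species in genome_size_mbp:
    print(f"{species}: genome x{genome_fold[species]:.1f}, "
          f"average intron x{intron_fold[species]:.2f}")
```

With these numbers, the P. glauca genome is roughly 158 times larger than that of A. thaliana, whereas its average intron in this gene set is less than four times longer.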
It thus appears that the evolution of intron length and genome size may be uncoupled in plants or, alternatively, that the evolution of intron length is lineage specific (Figure 7).

Even though our study was based on 35 genes, our results are consistent with variations of intron size reported for the A. thaliana, P. trichocarpa and Z. mays genomes [9,12,18-20]. We concluded that the increased intron length in P. glauca, P. trichocarpa and Z. mays was heterogeneous compared to A. thaliana. Even in genes with many introns, only a few introns were very long, whereas in Arabidopsis, genes exhibited a more uniform intron length, suggesting that intron expansion or contraction within a gene may be independent across species.

Comparisons between the A. thaliana (125 Mbp) and A. lyrata (~200 Mbp) genomes, which diverged about 10 million years ago, showed that most of the difference in genome size was due to hundreds of thousands of small deletions, mostly in noncoding DNA [35]. The authors concluded that evolution toward genome compaction is occurring in Arabidopsis. Conifers such as species of Picea and Pinus have large amounts of repetitive elements in intergenic regions and apparently more intronic sequence per gene in comparison to many angiosperms. Our results do not reveal whether the P. glauca genome and introns are expanding or, alternatively, evolving at a slower pace than other plant genomes, which are contracting. Some evidence, like the presence of very ancient retrotransposon elements [9,36] and the lack of gene rearrangements since before their split from extant angiosperms [8], lends credence to the paradigm that conifer genomes are slowly evolving.

Figure 2 Comparative analysis of individual intron length in P. glauca, A. thaliana, P. trichocarpa and Z. mays. Box plots (Y-axis: intron length, bp; all genes combined) represent intron length data for all of the introns of the 35 genes used in comparative analyses. Intron lengths were compared among the four species by Kruskal-Wallis test with post-test analysis by Dunn’s multiple comparisons: NS, not significant (P ≥ 0.06); *P = 0.06; **P < 0.01; ***P < 0.001.

Repetitive sequences in gene evolution

Transposable elements play a role in plant genes, as was shown by the abundance of TE-gene chimeras in Arabidopsis, which was reported as 7.8% of expressed genes [37]. The abundance of TEs may be especially high in long introns as recently shown in Picea abies where most of the introns were longer than 5 Kbp, representing 5% of the total intron count [9]. This trend was also observed in other repeat-rich genomes such as V. vinifera and Z. mays [20,21,38].

We isolated P. glauca BAC clones each containing a different complete transcription unit for 21 target genes. In each of the BACs (average 131 Kb), only one intact gene sequence was identified, which is indicative of large intergenic regions as reported for other conifers [39-41]. Previous studies on conifer trees have considered only two targeted genes (from terpenoid biosynthesis) isolated from P. glauca BAC clones [40] and only a few other intact genes with complete coding sequence isolated from BACs in pines [7,39,41].

Complete sequencing of the P. glauca BACs showed that the repetitive element content is not distributed uniformly in proximal intergenic regions, as indicated by the variable proportion of repetitive elements among the different BACs. In a study of 10 P. taeda BACs, sequences similar to eukaryote repeat elements (according to Repbase) represented 23% of the sequence on average, and ranged from 19% to 33% [7]. In P. glauca, 26% of BAC sequences were classified as LTR-RT repetitive elements on average, ranging from 8% to 47%, while P.
taeda had an average of 18.8% of LTR-RT [7]. Furthermore, an average of 26% of the P. glauca BAC sequences were unknown repeat elements. Results in spruce and pine indicate a relatively low abundance of TEs in gene proximal sequences compared to whole genomes, at 70% in the Picea abies genome [9] and around 80% in Pinus taeda [12].

Figure 3 Comparative analysis of total intron length in P. glauca, A. thaliana, P. trichocarpa and Z. mays. Average ratio of total length of intron sequences in pair-wise comparisons in: A, all genes; B, highly expressed genes and genes involved in secondary cell-wall formation and nitrogen metabolism (for individual ratios, see Figure 4). The total intron lengths were compared among the four species by Kruskal-Wallis test with post-test analysis by Dunn’s multiple comparisons: NS, not significant (P ≥ 0.05); **P < 0.01; ***P < 0.001.

Figure 4 Gene by gene pair-wise comparisons of total length of intronic sequences in P. glauca, A. thaliana, Populus trichocarpa and Z. mays. (A) highly expressed genes and (B) genes associated with secondary cell-wall formation and nitrogen metabolism.

Picea and Pinus genomes are reported to have among the highest averages for the longest intron per gene, when compared to angiosperms of diverse genome sizes [9]. We verified whether insertions of repetitive elements could be responsible for the length of introns in P. glauca in a set of more than 1800 gene sequences, and found that genes harboring repetitive elements in introns were 10 times more frequent than genes harboring repetitive elements in exons, i.e. 29.8% vs 3.2%.
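The "10 times" statement is a simple ratio of two percentages. As a quick check, the sketch below (Python, with a hypothetical helper name introduced here) computes the fold difference for the intronic value quoted in this paragraph (29.8%) and for the value given in the Results (32.4%), both against 3.2% for exons.

```python
def fold_difference(pct_introns: float, pct_exons: float) -> float:
    """Ratio of the share of genes with repeats in introns to that in exons."""
    if pct_exons <= 0:
        raise ValueError("exon percentage must be positive")
    return pct_introns / pct_exons


# Percentages as quoted in the text; 3.2% is the exon value in both places.
for label, pct_introns in (("Discussion", 29.8), ("Results", 32.4)):
    ratio = fold_difference(pct_introns, 3.2)
    print(f"{label}: {pct_introns}% vs 3.2% -> {ratio:.1f}-fold")
```

Either value gives a ratio between 9 and 10.2, which is consistent with the roughly tenfold difference stated in the text.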
The vast majority of the repetitive elements were short fragments, suggesting that they were remnants or fragments of TE insertions that have not persisted and could represent ancient insertion events. Importantly, interpretation of our findings in P. glauca must take into account the fact that the sequences were derived from a sequence capture study and that nearly all of the introns in the data set were <1 Kbp. Thus we show that TE sequences are ubiquitous even in genes that do not harbor long introns, suggesting that their presence has been very widespread during the evolution of conifer genes. An analysis of intact LTR TEs in Picea genomic sequences showed that most insertions date back to 10 MYA or more, with a maximum around 20–25 MYA [9]. The TE remnants that we detected in P. glauca indicate that the introns of many genes contained TEs in a more or less distant past. In this report and in recent analyses of conifer genomes, an emphasis has been placed on long introns; however, the median intron length in conifers is very similar to that of other plant species, most of which have a median between 100 bp and 200 bp. Therefore our findings on introns are relevant for a large majority of introns rather than a small fraction represented by large or very large introns.

Figure 5 Gene structure of six genes from different angiosperm and gymnosperm species. The first three genes are associated with secondary cell-wall formation and nitrogen metabolism; highly expressed genes are bolded.

Slow evolution of conifer genes

Analyses of the gene structure of 23 orthologous genes between P. glauca and P. taeda clearly showed the conservation of gene structure and of the distribution of intron sizes, in spite of a divergence time of 100 to 140 MYA [32,33]. The conservation of long introns was also observed across gymnosperm taxa, where a group of long introns in P.
abies was identified as orthologous to long introns in P. sylvestris and Gnetum gnemon [9]. We suggest that the long introns observed in P. glauca likely date back to a period predating the divergence of major conifer groups. As more conifer genomes become available [9-11] and assembly contiguities are improved, it will be possible to extend this analysis of orthologous gene structures among conifers.

We also observed that the sequence of many introns was highly similar between spruce and pine, and that shorter introns were more conserved on average. Between humans and chimpanzees, a strong positive correlation was found between intron length and divergence [42]. The pattern found in conifers, together with the observations in primates, leads to the hypothesis that shorter introns could be under stronger selection pressure than longer introns, which could be explained by factors such as the maintenance of functional regulatory elements in shorter introns or impacts on RNA transcript processing and stability. In our analysis of sequence similarity between Picea and Pinus, 20 of the introns were longer than 1 Kbp and only two of them had high sequence similarity. Future studies with more long introns are required to confirm the hypothesis that shorter introns are more conserved in conifers. Despite the fact that introns are assumed to be non-coding, conserved introns may play a functional role related to gene expression.

Figure 6 Relationship between intron size and sequence similarity of introns from P. glauca and P. taeda. A total of 138 introns were obtained from 22 genes and sequence alignments were produced with the Needle software (see Methods). Axes: sequence similarity (%) plotted against average intron length between P. glauca and P. taeda.

Table 2 Abundance of repetitive elements in P. glauca genes obtained from sequence capture

Class     Exons (%)   Introns (%)
Copia     0.09        0.24
Gypsy     0.09        0.19
LINE      0.03        0.15
UNK(1)    0.03        0.07
NHF(2)    0.49        2.29

A total of 5970 genes were analyzed; 1836 contained one or more introns.
(1) No significant hit in RepBase but significant hits in nr GenBank.
(2) No significant hit in RepBase and nr GenBank.

Figure 7 Variation in intron length and genome size in 35 target genes. Average intron size for Arabidopsis, P. trichocarpa, Z. mays and P. glauca determined from the analysis of 35 homologous genes. Note that Y-axes are in log 10 scale.

Costs and benefits associated with intron size

There is also considerable debate about other factors that may impact the evolution of introns, aside from transposable elements. Lynch [43] stated that the reduced efficiency of selection in regions of low recombination may lead to an increase in intron size if small introns provide a slightly improved transcription efficiency or splicing accuracy. On the other hand, Comeron and Kreitman [3] proposed that there might be situations in which a longer intron is selectively advantageous, as an explanation for intron persistence and increased lengths. If so, there would be indirect selection for large introns in regions of low recombination because they can reduce the load caused by deleterious mutations by increasing the recombination rate.
It was proposed that conifers have low recombination rates at both the genome and within-gene scales [44]. Their low recombination rates may explain, at least in part, the accumulation of longer introns.

The high degree of sequence conservation that we observed in short introns between spruce and pine may also depend on the recombination rate within genes, where small introns would be under stronger selection because of efficiency in transcription and splicing, and long introns in regions of low recombination diverge because of reduced selection pressure. Another factor underlying the evolution of intron size is that intron length would be constrained by energy use during transcription, given that large introns represent a higher cost of transcription, the so-called "economy" or low-cost transcription hypothesis [4]. In the present study, the 35 P. glauca genes analyzed were divided into two groups based on their expression profiles, i.e. 17 genes associated with secondary cell-wall formation or nitrogen metabolism, many of which had tissue-preferential expression, and 18 genes that were highly transcribed in a large range of tissues (based on Raherison et al. [30]). The highly expressed genes had more intronic sequence per gene on average than the more specialized subset of genes (4,182 bp versus 3,013 bp). We also observed a large variation among genes in each group, i.e. from 446 to 12,009 bp in highly transcribed genes and from 440 to 9,847 bp in the set of more specialized genes. These observations do not support the "economy" hypothesis in P. glauca, as there appears to be no clear rule governing the relationship between intron size and expression levels or profiles. In humans, genes contain ~5,500 bp of intronic sequence on average [45], which is more than in any plant described so far. It was observed that intron length declines steadily as the expression level increases in humans, in agreement with the low-cost transcription hypothesis [4].
Considering the smaller amount of intron sequence in plant genes, including those of conifers, compared to humans, it may be that the economy rule does not impact their introns as strongly as in vertebrates and that other evolutionary forces are the main drivers of intron size evolution. This interpretation is consistent with the findings reported for the P. abies genome [9]. Future studies with more genes are needed to confirm this hypothesis.

Conclusions

Our results indicate that P. glauca has longer introns than Arabidopsis, P. trichocarpa and Z. mays on average, owing to the presence of a few long introns. Intron size and the position of long introns within genes were variable between plant lineages but well conserved in conifers. Our findings are consistent with recent reports indicating that conifers accumulate very long introns, but we point out that long introns represent a relatively small fraction of the overall intronic content, which is reflected by a median length similar to that of other plants. We show that RE sequences are detected at a high frequency (32%) even in introns <1 Kbp, indicating their ubiquitous presence in conifer genes over the course of evolution.

Taken together, our observations and the recent literature suggest that the evolution of plant gene structure is determined by more interacting forces than classically expected. The pattern is reminiscent of the heterogeneity of rates of evolution at the genetic, genomic and morphological levels seen among seed plants, including angiosperms, conifers, and annual and perennial taxa.
It stands to reason that the distinctive features of the conifer genome, such as its large size and relatively small occupancy of the gene space, its conserved macro-structure, the large numbers of repetitive elements, and long introns, represent the product of the intricate evolutionary history of conifers.

Methods

Picea glauca BAC isolation and validation

A BAC library developed from the single Picea glauca (Moench) Voss individual PG29 from the BC Ministry of Forests was utilized. The non-arrayed library consisted of approximately 1.1 million BAC clones with an average insert size of 140 kbp, representing approximately 3× coverage of the P. glauca genome [40]. The library was screened by quantitative PCR (qPCR) through successive steps using BAC super-pools and pools, serial dilutions and clone identity verification by amplicon sequencing. The BAC isolation and sequencing are reported here for the first time.

We isolated 21 BAC clones, each containing a different single-copy gene from P. glauca (see list of genes and accession numbers in Additional file 1: Table S2). Each of the genes screened was represented by a unique FL-cDNA clone in P. glauca as described in Rigault et al. [29]. The selected genes encoded enzymes and transcriptional regulators involved in secondary cell-wall formation and nitrogen metabolism and were subject to manual curation. They were chosen so as to facilitate comparison with BAC isolation studies conducted in other conifer species (e.g. [21]). Two sets of gene-specific primers were designed for each gene based on the cDNA sequence available in the P. glauca gene catalogue [29]. The genomic sequence obtained was used to design two additional primers such that two small amplicons of 120–200 bp could be amplified by qPCR (Additional file 1: Table S3). All of the primer sets were verified by PCR and qPCR using the genomic DNA from P.
glauca, genotype PG-653, and then they were used to screen the BAC library in three steps. See PCR conditions in Additional file 5.

The BAC library was subdivided into pools with a titer of 1000 BACs on average, which were arrayed into ten 96 deep-well plates. Each plate was inoculated in 96-well culture plates with 1 mL of terrific broth (TB) and 20 μg/mL of chloramphenicol and grown in a 37°C shaker at 300 rpm overnight. The same TB medium and growing conditions were utilized to culture bacteria throughout the screening steps. Bacterial cultures from each of the columns and rows within a plate were combined, giving a total of 200 super-pools for DNA isolation as described in Osoegawa et al. [46].

The first step followed Jeukens et al. [47]. Briefly, the two small amplicons for each target gene were amplified from the super-pool DNA by qPCR using QuantiTect SYBR Green master mix as described in Boyle et al. [48]. The intersection of a positive row and a positive column was indicative of positive wells on the original plate. The presence of target genes in the positive super-pools was verified by qPCR in 30 μL reactions using QuantiTect SYBR Green master mix [48]. We performed PCR of the long amplicon and purified it for gene sequence validation by Sanger sequencing (Additional file 1: Table S3).

The second step of the screening relied on serial dilutions of the positive super-pools to inoculate 50 bacteria per well in a 96 deep-well plate. DNA super-pools were extracted and screened by qPCR using the same conditions as in the first step. Then, we extracted DNA from 1 μL of bacterial culture from each well of a positive column to test it by qPCR and determine the positive well in the column. From the positive well of the same bacterial culture plate, we proceeded with serial dilutions and inoculated a 96 deep-well plate with one isolated colony per well. The third step of the screening consisted of pooling columns and rows of bacterial cultures.
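The row-and-column pooling logic used in these screening steps can be illustrated with a short Python sketch. It assumes the standard 96-well layout (8 rows, A to H, and 12 columns); the function names and the positive pools passed to them are hypothetical illustrations, not data or code from this study.

```python
from itertools import product

ROWS = "ABCDEFGH"        # 8 rows of a standard 96-well plate (assumed layout)
COLUMNS = range(1, 13)   # 12 columns


def superpools_per_plate():
    """One super-pool per row plus one per column of a plate."""
    return len(ROWS) + len(COLUMNS)


def candidate_wells(positive_rows, positive_columns):
    """Wells lying at the intersection of positive row and column pools."""
    return [f"{row}{col}"
            for row, col in product(sorted(positive_rows), sorted(positive_columns))]


if __name__ == "__main__":
    # Ten plates pooled by row and by column.
    print(superpools_per_plate() * 10)
    # Hypothetical qPCR result: one positive row pool and one positive column pool.
    print(candidate_wells({"C"}, {7}))
    # Two positive rows and two positive columns leave four candidate wells.
    print(candidate_wells({"B", "F"}, {3, 10}))
```

With ten plates the count of row and column pools is 200, matching the number of super-pools reported above. When more than one row and column are positive, the intersections are only candidates, which is one reason why testing individual wells, as done in the second step, remains necessary.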
We identified positive wells by qPCR and plated the culture of each positive well on a different Petri dish, and one colony per dish was inoculated in 5 mL TB with chloramphenicol. DNA was extracted from 2 mL of each culture. Positive isolated clones were validated by PCR, qPCR and resequencing of the long amplicon. The validation steps to confirm gene identity and integrity proved essential, as conifers contain many pseudogenes that reduce the efficiency of targeted BAC isolation [39].

The 21 isolated BAC clones, each identified by screening for a different gene (for accession numbers, see Additional file 1: Table S2), were sequenced by Roche 454 FLX pyrosequencing at the McGill University and Génome Québec Innovation Centre, Montreal, Canada. Sequences were assembled de novo into contigs using the GS De novo Assembler module of Newbler version 2.3 (Roche) [49]. In this analysis, the BAC vector and E. coli genome sequences were trimmed and the assembly parameters were a minimum overlap length of 200 bp, a minimum overlap identity of 98% and a minimum contig length of 500 bp. In general, more than one contig per BAC was obtained; therefore, the order of the contigs within each BAC was tested by PCR.

To determine gene structure (introns and exons), cDNAs were mapped onto the BAC contigs containing the respective gene using est2genome incorporated in the annotation software MAKER [50]. Four of the genes were eliminated from the comparative gene structure analyses because they were incomplete, lacked introns or lacked identifiable homologs in the species targeted for comparative analyses.

Pinus taeda orthologous sequences

Seven BAC clones of Pinus taeda containing orthologs of P. glauca genes were identified by BLAST [51] using an e-value threshold of 1e-20 and sequence identity >90% (Additional file 1: Table S4). An additional 16 sequences were identified by BLAST [51] using an e-value threshold of 1e-20 in the whole genome shotgun assembly of Pinus taeda [11].
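As an illustration of how the two BLAST thresholds above might be applied to a list of hits, the Python sketch below filters hits by e-value (at most 1e-20) and, optionally, by percent identity (>90%). It assumes the hits are available in the 12-column tabular layout written by BLAST+ with `-outfmt 6`; the text does not state which output format was used, and the function name and example rows are hypothetical.

```python
import csv
import io

# Column order of BLAST+ tabular output (-outfmt 6). Assuming this format
# is our choice for the sketch; the paper does not specify it.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(tabular_text, max_evalue=1e-20, min_identity=None):
    """Return (query, subject) pairs passing the e-value and identity cut-offs."""
    reader = csv.DictReader(io.StringIO(tabular_text),
                            fieldnames=FIELDS, delimiter="\t")
    kept = []
    for row in reader:
        if float(row["evalue"]) > max_evalue:
            continue  # not significant enough
        if min_identity is not None and float(row["pident"]) <= min_identity:
            continue  # identity must be strictly above the cut-off
        kept.append((row["qseqid"], row["sseqid"]))
    return kept


# Made-up hits, for illustration only.
EXAMPLE = "\n".join("\t".join(fields) for fields in [
    ["geneA", "bac1", "95.2", "800", "38", "0", "1", "800", "101", "900", "1e-150", "1200"],
    ["geneA", "bac2", "88.0", "700", "84", "0", "1", "700", "5", "704", "1e-90", "800"],
    ["geneB", "bac3", "97.0", "60", "2", "0", "1", "60", "1", "60", "1e-12", "90"],
])

if __name__ == "__main__":
    print(filter_hits(EXAMPLE, min_identity=90.0))  # BAC-style search: both cut-offs
    print(filter_hits(EXAMPLE))                     # assembly-style search: e-value only
```

Keeping the identity cut-off optional mirrors the two searches described above, where the identity criterion is stated only for the BAC clones.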
Their gene structures were defined using est2genome [50] and P. taeda cDNA, or P. glauca cDNA when the complete P. taeda cDNA was not available. Accession numbers are available in Additional file 1: Table S4. A pairwise alignment of all corresponding intron and exon sequences of orthologous genes between P. glauca and P. taeda was conducted, followed by the estimation of their similarity with the software Needle, part of the analysis package EMBOSS [52]. The BAC clones containing the LIM gene in P. glauca and the Korrigan, Peptidase_C, Thiolase, Gp_dh_C and eRF1_2 genes in P. taeda lacked an intron and exon; these missing exons and introns were excluded from the comparison between P. glauca and P. taeda.

Screening for highly expressed genes in whole genome shotgun assembly

Based on transcript profiles (PiceaGenExpress database [30]), a set of 500 gene sequences, each representing a unique FL-cDNA clone that was highly expressed in all tissues, was identified from the P. glauca gene catalog [29]. A preliminary version of the P. glauca genome assembly described by Birol et al. [10] was screened with each of these sequences. The screening was performed with exonerate using the est2genome model [53], which considers intron/exon boundaries. The cDNA/genome alignments were further filtered based on identity and length coverage to retain only complete alignments with entire cDNAs, i.e. genes with complete genomic sequence. We randomly selected 18 of the genomic sequences thus identified as containing complete structures of highly expressed genes. As for genes contained in the BACs, gene annotations were generated automatically and curated manually using individual reciprocal BLASTs and sequence alignments.

Identification of closest homologs in angiosperms

Homologous sequences to P. glauca genes were identified in Arabidopsis thaliana and P. trichocarpa by BLASTX [51] with a threshold e-value of 1e-10.
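The closest-homolog search in this section combines an e-value threshold with a reciprocal check. One common way to express such a check is a reciprocal best-hit test, sketched below in Python. This is an assumption about how the comparison could be coded, not the authors' procedure; the function names, the tuple layout (query, subject, e-value) and the identifiers are all hypothetical.

```python
def best_hits(hits, max_evalue=1e-10):
    """Map each query to the subject with the lowest e-value under the threshold."""
    best = {}
    for query, subject, evalue in hits:
        if evalue > max_evalue:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_pairs(forward_hits, reverse_hits, max_evalue=1e-10):
    """Pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    forward = best_hits(forward_hits, max_evalue)
    reverse = best_hits(reverse_hits, max_evalue)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


if __name__ == "__main__":
    # Made-up hits: spruce queries against an angiosperm set, then the reverse search.
    forward = [("pg_gene1", "at_hit1", 1e-80),
               ("pg_gene1", "at_hit2", 1e-40),
               ("pg_gene2", "at_hit3", 1e-5)]
    reverse = [("at_hit1", "pg_gene1", 1e-75),
               ("at_hit2", "pg_gene9", 1e-60)]
    print(reciprocal_pairs(forward, reverse))
```

In this toy input only `pg_gene1` and `at_hit1` confirm each other; `pg_gene2` has no hit under the 1e-10 threshold and is therefore left without a homolog.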
Reciprocal analysis (BLASTX) between the A. thaliana and P. trichocarpa sequences and the P. glauca gene catalogue was used to verify that the genes were the closest homologs among known sequences. In Zea mays, the closest homolog was identified based on A. thaliana sequences by BLASTX, with a threshold e-value of 1e-10, and orthology was verified in the Maize Genome Sequencing Project database (Additional file 1: Table S5). We also performed a BLASTX of P. glauca against Z. mays sequences and verified that the closest homologs identified between Z. mays and A. thaliana were among the best hits. Gene structures were recovered from the databases of TAIR 10 [54], Phytozome (JGI v3.0 gene annotation of assembly v3 of P. trichocarpa) [18,55] and the Maize Genome Sequencing Project [56]. Accession numbers are available in Additional file 1: Table S5.

Of the 21 genes contained in the P. glauca BAC clones, four genes were eliminated from the comparative analyses of gene structure between P. glauca, A. thaliana, P. trichocarpa and Z. mays: (1) Dof5 because a clear A. thaliana homolog could not be identified, (2) asparaginase because a clear Z. mays homolog could not be identified, (3) PAL because it lacked introns and (4) LIM because it presented an incomplete sequence (cDNA). A total of 35 genes had their gene structure compared with closest homologs in angiosperms: 17 genes related to secondary cell-wall formation and nitrogen metabolism and 18 highly transcribed genes with little tissue-specific expression.

Statistical analyses of introns

Intron lengths were compared between P. glauca, A. thaliana, P. trichocarpa and Z. mays by nonparametric Kruskal-Wallis tests with post-test analysis by Dunn's multiple comparisons, because intron length did not follow a Gaussian distribution. Comparisons of two groups of genes used the Wilcoxon rank sum test with continuity correction; this test was used to compare total intron sequences in P.
glauca genes belonging to the two expression groups and to compare P. glauca with P. taeda. Data analyses were performed using the R packages coin and multcomp [57-59].

Gene space obtained from sequence capture technology

Sequences were obtained by using genomic DNA hybridizations on a custom P. glauca chip containing oligonucleotide baits for 23,864 genes. The method development, the DNA sequence isolation and analysis procedure and the resulting sequence data are reported in this manuscript for the first time.

DNA was extracted from needles of the P. glauca individual 77111 from the Canadian Forest Service as described in Pelgas et al. [60] using the DNeasy Plant mini kit according to the manufacturer's instructions (QIAGEN). One microgram of high quality DNA was used to prepare a GS-FLX rapid library according to the manufacturer's instructions (Roche). The library was amplified by ligation-mediated PCR using 454 A and B primers as described in the NimbleGen SeqCap EZ Library LR User's guide.

Custom probes were designed by Nimblegen based on the cDNA sequences and ESTs from the P. glauca gene catalogue [29]. We used a Newbler (gsAssembler module v2.5.3) assembly of sequences from random genomic sequencing (0.15× coverage) from P. glauca [29] to identify highly repetitive elements and to filter out probes representing such elements, as they were expected to reduce the efficiency of the sequence capture. Next, a comparative genomic hybridization (Array CGH) experiment was conducted in collaboration with Nimblegen (Madison, WI, USA) to eliminate probes with abnormally high levels of hybridization that could not be identified with in silico approaches. Throughout the process, probes within genes harboring abnormally high capture levels were eliminated. The final design covered 23,864 genes.
The target enrichment procedures, including quantitative PCR assessments, are described in Additional file 5.

Emulsion PCR and GS-FLX Titanium sequencing were performed according to the manufacturer's instructions at the Plateforme d'Analyses Génomiques of the Institut de Biologie Intégrative et des Systèmes (Université Laval, Quebec, Canada). Raw sequencing reads were assembled de novo using the gsAssembler module of Newbler v2.5.3. Contigs were screened for complete gene structures based on the P. glauca gene catalogue [29]. Technical details are available in Additional file 5.

Picea glauca repetitive library and identification of repeat elements

For repeat identification, a random sample of 100,000 P. glauca 454 reads from randomly sheared DNA was searched de novo for repeats using the software RepeatScout [61]. The results were filtered by removing low complexity sequences and sequences shorter than 100 nt, and retaining only repeats having at least 10 matches when mapped onto the original 454 set using RepeatMasker [62]. Since RepeatScout is tailored to analyze complete genomes or at least large scaffolds, its output is usually fragmented when the program is run on randomly sheared reads. In order to reduce fragmentation, we merged the repeats belonging to the same element by running cap3 [63] under relaxed settings (-o 30, -p 80, -s 500) on the RepeatScout output. Finally, the entire set of repeated sequences was clustered using the software cd-hit-est [64], collapsing all the repeats sharing at least 80% similarity in order to remove redundancies.

Repeat characterization proceeded by similarity searches, which were used to associate candidate repeats with known TE families and to remove repeats showing similarity to gene sequences and possibly being part of gene families.
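The length and match-count filters applied to the RepeatScout output earlier in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name, the dictionary inputs and the example sequences are hypothetical, the match counts would in practice come from mapping the repeats back onto the reads, and the low-complexity filter is omitted because the text does not say how it was implemented.

```python
def filter_repeats(candidates, match_counts, min_length=100, min_matches=10):
    """Keep candidate repeats that are long enough and matched often enough.

    candidates:   dict mapping repeat name -> nucleotide sequence
    match_counts: dict mapping repeat name -> number of matches in the read set
    """
    kept = {}
    for name, sequence in candidates.items():
        if len(sequence) < min_length:
            continue  # shorter than 100 nt
        if match_counts.get(name, 0) < min_matches:
            continue  # fewer than 10 matches in the reads
        kept[name] = sequence
    return kept


if __name__ == "__main__":
    # Made-up candidates, for illustration only.
    candidates = {
        "rep1": "ACGTTGCA" * 15,   # 120 nt
        "rep2": "ACGTTGCA" * 10,   # 80 nt, too short
        "rep3": "GATTACAG" * 20,   # 160 nt, but rarely matched
    }
    match_counts = {"rep1": 25, "rep2": 40, "rep3": 4}
    print(sorted(filter_repeats(candidates, match_counts)))
```

Only `rep1` passes both filters in this toy input.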
In particular, the repeat candidates from each species were searched against RepBase [65] using TBLASTX [57], with a significance threshold set at an e-value of 1e-5. Repeats that did not provide significant hits were used as queries in BLASTX searches against the non-redundant (nr) division of GenBank. Those having significant hits with genes were removed from the library, those having significant hits with TEs were labeled accordingly, and the remaining repeats were considered as unclassified.

The search for repeat elements in all BAC contigs and in the gene space obtained from sequence capture was conducted using RepeatMasker [62] with the Picea glauca repetitive library and default parameters.

Additional files

Additional file 1:
Table S1. Gene structure data of orthologs of Picea glauca and Pinus taeda.
Table S2. List of genes associated with secondary cell-wall formation or with nitrogen metabolism in P. glauca targeted for BAC isolations.
Table S3. Primer information and sequences used for BAC screening and sequencing validation.
Table S4. Accession numbers of P. taeda orthologs and sequence similarity to P. glauca.
Table S5. Accession numbers for the closest homologous sequences between P. glauca, Arabidopsis thaliana, Populus trichocarpa and Zea mays.
Table S6. Summary of sequencing results of P. glauca BAC clones isolated, each containing a different single-copy gene associated with secondary cell-wall formation or with nitrogen metabolism.
Table S7. GenBank accessions of complete cDNA utilized for gene structure definition when the cDNA in the Picea glauca gene catalogue was incomplete.
Table S8. Repetitive elements detected within the gene structure of the 35 P. glauca genes.

Additional file 2: Figure S2. Comparative analysis of individual intron length in P. glauca, A. thaliana, P. trichocarpa and Z. mays. A. Average and median length of individual introns in all genes.
B. Average and median length of individual introns in highly expressed genes and genes associated with secondary cell-wall formation and nitrogen metabolism in four species. Intron lengths were compared among the four species by Kruskal-Wallis test with post-test analysis by Dunn's multiple comparisons: NS, not significant (P > 0.06); *P < 0.06; **P < 0.01; ***P < 0.001.

Additional file 3: Figure S3. Boxplot of the 35 homologous genes in P. glauca, A. thaliana, P. trichocarpa and Z. mays.

Additional file 4: Figure S1. Content of repetitive elements in 21 different BAC clones. The analysis used the RepeatMasker software and a P. glauca repetitive sequence library (see Methods). Repetitive elements were classified as LTR (long terminal repeat) and unclassified (no hit in RepBase).

Additional file 5: Supplemental file. Additional experimental procedures for BAC isolation and sequence capture.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

KR and CR provided the P. glauca BAC library; IG and BB ran the BAC isolation experiments; BB performed sequence capture experiments and assembled the sequences; PR analyzed the sequence capture sequences and mapped them to the cDNA models; IB, SJ, JBoh, JBou and JM participated in the assembly of the P. glauca genome; JS conducted the data analysis and interpretation of data and results, and drafted the manuscript; AZ developed the P. glauca repetitive library; JBou and JM contributed to the supervision and discussion of the research; JM, JBou and KR revised the manuscript. All of the authors approved the manuscript.

Acknowledgements

The authors thank D. Peterson (Mississippi State Univ., USA) and G. Claros and F. Canovas (Univ. de Málaga, Spain) for sharing information on gene targets and strategies for BAC isolation in pines. Technical assistance of S. Caron, É. Fortin, G. Tessier (Univ. Laval, Canada) is acknowledged for BAC screening. F. Belzile, R. Lévesque, L. Bernatchez (Univ.
Laval, Canada) are acknowledged for valuable discussions and suggestions at the project planning stage. Funding for the project was received from Génome Québec for a Genome exploration grant (JM, JBou, PR, KR), from Genome Canada, Génome Québec and Genome British Columbia for the SmarTForests project (JM, JBoh, JBou, IB, KR, SJ) and NSERC of Canada (JM). JS received partial funding from Univ. Laval.

Author details

1 Center for Forest Research and Institute for Systems and Integrative Biology, 1030 rue de la Médecine, Université Laval, Québec, QC G1V 0A6, Canada.
2 Gydle Inc., Québec, QC, Canada.
3 Michael Smith Laboratories, University of British Columbia, Vancouver, BC V6T 1Z4, Canada.
4 Applied Genomics Institute, Udine 33100, Italy.
5 Institute of Life Sciences, Scuola Superiore Sant'Anna, Pisa 56127, Italy.
6 Department of Forest Sciences, University of British Columbia, Vancouver, BC V6T 1Z4, Canada.
7 Canada Research Chair in Forest Genomics, Université Laval, Québec, QC G1V 0A6, Canada.

Received: 11 September 2013 Accepted: 9 April 2014 Published: 16 April 2014

References

1. Lynch M, Conery JS: The origins of genome complexity. Science 2003, 302:1401–1404.
2. Deutsch M, Long M: Intron-exon structures of eukaryotic model organisms. Nucleic Acids Res 1999, 27:3219–3228.
3. Comeron JM, Kreitman M: The correlation between intron length and recombination in Drosophila: dynamic equilibrium between mutational and selective forces. Genetics 2000, 156:1175–1190.
4. Castillo-Davis CI, Mekhedov SL, Hartl DL, Koonin EV, Kondrashov FA: Selection for short introns in highly expressed genes. Nat Genet 2002, 31:415–418.
5. Murray BG, Leitch IJ, Bennett MD: Gymnosperm DNA C-values Database; 2004. http://www.kew.org/cvalues/.
6. Morse AM, Peterson DG, Islam-Faridi MN, Smith KE, Magbanua Z, Garcia SA, Kubisiak TL, Amerson HV, Carlson JE, Nelson CD, Davis JM: Evolution of genome size and complexity in Pinus. PLoS One 2009, 4:e4332.
7. Magbanua ZV, Ozkan S, Bartlett BD, Chouvarine P, Saski CA, Liston A, Cronn RC, Nelson CD, Peterson DG: Adventures in the enormous: A 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine. PLoS One 2011, 6:e16214.
8. Pavy N, Pelgas B, Laroche J, Rigault P, Isabel N, Bousquet J: A spruce gene map infers ancient plant genome reshuffling and subsequent slow evolution in the gymnosperm lineage leading to extant conifers. BMC Biol 2012, 10:84.
9. Nystedt B, Street NR, Wetterbom A, Zuccolo A, Lin Y-C, Scofield DG, Vezzi F, Delhomme N, Giacomello S, Alexeyenko A, Vicedomini R, Sahlin K, Sherwood E, Elfstrand M, Gramzow L, Holmberg K, Hällman J, Keech O, Klasson L, Koriabine M, Kucukoglu M, Käller M, Luthman J, Lysholm F, Niittylä T, Olson Å, Rilakovic N, Ritland C, Rosselló JA, Sena J, et al: The Norway spruce genome sequence and conifer genome evolution. Nature 2013, 497:579–584.
10. Birol I, Raymond A, Jackman SD, Pleasance S, Coope R, Taylor GA, Yuen MMS, Keeling CI, Brand D, Vandervalk BP, Kirk H, Pandoh P, Moore RA, Zhao Y, Mungall AJ, Jaquish B, Yanchuk A, Ritland C, Boyle B, Bousquet J, Ritland K, Mackay J, Bohlmann J, Jones SJM: Assembling the 20 Gb white spruce (Picea glauca) genome from whole-genome shotgun sequencing data. Bioinformatics 2013, 29:1492–1497.
11. Neale DB, Wegrzyn JL, Stevens KA, Zimin AV, Puiu D, Crepeau MW, Cardeno C, Koriabine M, Holtz-Morris AE, Liechty JD, Martínez-García PJ, Vasquez-Gross HA, Lin BY, Zieve JJ, Dougherty WM, Fuentes-Soriano S, Wu L-S, Gilbert D, Marçais G, Roberts M, Holt C, Yandell M, Davis JM, Smith KE, Dean JF, Lorenz WW, Whetten RW, Sederoff R, Wheeler N, McGuire PE, et al: Decoding the massive genome of loblolly pine using haploid DNA and novel assembly strategies. Genome Biol 2014, 15:R59.
12. Wegrzyn JL, Liechty JD, Stevens KA, Wu L-S, Loopstra CA, Vasquez-Gross HA, Dougherty WM, Lin BY, Zieve JJ, Martínez-García PJ, Holt C, Yandell M, Zimin AV, Yorke JA, Crepeau MW, Puiu D, Salzberg SL, Dejong PJ, Mockaitis K, Main D, Langley CH, Neale DB: Unique features of the Loblolly Pine (Pinus taeda L.) Megagenome revealed through sequence annotation. Genetics 2014, 196:891–909.
13. Vinogradov AE: Intron–genome size relationship on a large evolutionary scale. J Mol Evol 1999, 49:376–384.
14. McLysaght A, Enright AJ, Skrabanek L, Wolfe KH: Estimation of synteny conservation and genome compaction between pufferfish (Fugu) and human. Yeast 2000, 17:22–36.
15. Moriyama EN, Petrov DA, Hartl DL: Genome size and intron size in Drosophila. Mol Biol Evol 1998, 15:770–773.
16. Wendel JF, Cronn RC, Alvarez I, Liu B, Small RL, Senchina DS: Intron size and genome size in plants. Mol Biol Evol 2002, 19:2346–2352.
17. Jiang K, Goertzen LR: Spliceosomal intron size expansion in domesticated grapevine (Vitis vinifera). BMC Res Notes 2011, 4:52.
18. Tuskan GA, DiFazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, Putnam N, Ralph S, Rombauts S, Salamov A, Schein J, Sterck L, Aerts A, Bhalerao RR, Bhalerao RP, Blaudez D, Boerjan W, Brun A, Brunner A, Busov V, Campbell M, Carlson J, Chalot M, Chapman J, Chen G-L, Cooper D, Coutinho PM, Couturier J, Covert S, Cronk Q, et al: The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 2006, 313:1596–1604.
19. Arabidopsis Genome Initiative: Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 2000, 408:796–815.
20. Haberer G, Young S, Bharti AK, Gundlach H, Raymond C, Fuks G, Butler E, Wing RA, Rounsley S, Birren B, Nusbaum C, Mayer KFX, Messing J: Structure and architecture of the maize genome. Plant Physiol 2005, 139:1612–1624.
21. Ren X-Y, Vorst O, Fiers MWEJ, Stiekema WJ, Nap J-P: In plants, highly expressed genes are the least compact. Trends Genet 2006, 22:528–532.
22. Kumar A, Bennetzen JL: Plant retrotransposons. Annu Rev Genet 1999, 33:479–532.
23. Feschotte C, Jiang N, Wessler SR: Plant transposable elements: where genetics meets genomics. Nat Rev Genet 2002, 3:329–341.
24. Schnable PS, Ware D, Fulton RS, Stein JC, Wei F, Pasternak S, Liang C, Zhang J, Fulton L, Graves TA, Minx P, Reily AD, Courtney L, Kruchowski SS, Tomlinson C, Strong C, Delehaunty K, Fronick C, Courtney B, Rock SM, Belter E, Du F, Kim K, Abbott RM, Cotton M, Levy A, Marchetto P, Ochoa K, Jackson SM, Gillam B, et al: The B73 maize genome: complexity, diversity, and dynamics. Science 2009, 326:1112–1115.
25. Ralph SG, Chun HJ, Kolosova N, Cooper D, Oddy C, Ritland CE, Kirkpatrick R, Moore R, Barber S, Holt RA, Jones SJ, Marra MA, Douglas CJ, Ritland K, Bohlmann J: A conifer genomics resource of 200,000 spruce (Picea spp.) ESTs and 6,464 high-quality, sequence-finished full-length cDNAs for Sitka spruce (Picea sitchensis). BMC Genomics 2008, 9:484.
26. Bedon F, Grima-Pettenati J, Mackay J: Conifer R2R3-MYB transcription factors: sequence analyses and gene expression in wood-forming tissues of white spruce (Picea glauca). BMC Plant Biol 2007, 7:17.
27. Cañas RA, de la Torre F, Cánovas FM, Cantón FR: High levels of asparagine synthetase in hypocotyls of pine seedlings suggest a role of the enzyme in re-allocation of seed-stored nitrogen. Planta 2006, 224:83–95.
28. Nairn CJ, Lennon DM, Wood-Jones A, Nairn AV, Dean JFD: Carbohydrate-related genes and cell wall biosynthesis in vascular tissues of loblolly pine (Pinus taeda). Tree Physiol 2008, 28:1099–1110.
29. Rigault P, Boyle B, Lepage P, Cooke JEK, Bousquet J, MacKay JJ: A white spruce gene catalog for conifer genome analyses. Plant Physiol 2011, 157:14–28.
30. Raherison E, Rigault P, Caron S, Poulin P-L, Boyle B, Verta J-P, Giguère I, Bomal C, Bohlmann J, MacKay J: Transcriptome profiling in conifers and the PiceaGenExpress database show patterns of diversification within gene families and interspecific conservation in vascular gene expression. BMC Genomics 2012, 13:434.
31. Bradnam KR, Korf I: Longer first introns are a general property of eukaryotic gene structure. PLoS One 2008, 3:e3093.
32. Savard L, Li P, Strauss SH, Chase MW, Michaud M, Bousquet J: Chloroplast and nuclear gene sequences indicate late Pennsylvanian time for the last common ancestor of extant seed plants. Proc Natl Acad Sci U S A 1994, 91:5163–5167.
33. Wang X-Q, Tank DC, Sang T: Phylogeny and divergence times in Pinaceae: evidence from three genomes. Mol Biol Evol 2000, 17:773–781.
34. Ohri D, Khoshoo TN: Genome size in gymnosperms. Plant Syst Evol 1986, 153:119–132.
35. Hu TT, Pattyn P, Bakker EG, Cao J, Cheng J-F, Clark RM, Fahlgren N, Fawcett JA, Grimwood J, Gundlach H, Haberer G, Hollister JD, Ossowski S, Ottilar RP, Salamov AA, Schneeberger K, Spannagl M, Wang X, Yang L, Nasrallah ME, Bergelson J, Carrington JC, Gaut BS, Schmutz J, Mayer KFX, Van de Peer Y, Grigoriev IV, Nordborg M, Weigel D, Guo Y-L: The Arabidopsis lyrata genome sequence and the basis of rapid genome size change. Nat Genet 2011, 43:476–481.
36. Morgante M, De Poali E: Toward the conifer genome sequence. In Genetics, Genomics and Breeding of Conifers Trees. Edited by Plomion C, Bousquet J, Kole C. Enfield: Science Publishers; 2011:389–403.
37. Lockton S, Gaut BS: The contribution of transposable elements to expressed coding sequence in Arabidopsis thaliana. J Mol Evol 2009, 68:80–89.
38. Jaillon O, Aury J-M, Noel B, Policriti A, Clepet C, Casagrande A, Choisne N, Aubourg S, Vitulo N, Jubin C, Vezzi A, Legeai F, Hugueney P, Dasilva C, Horner D, Mica E, Jublot D, Poulain J, Bruyère C, Billault A, Segurens B, Gouyvenoux M, Ugarte E, Cattonaro F, Anthouard V, Vico V, Fabbro CD, Alaux M, Gaspero GD, Dumas V, et al: The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature 2007, 449:463–467.
39. Kovach A, Wegrzyn JL, Parra G, Holt C, Bruening GE, Loopstra CA, Hartigan J, Yandell M, Langley CH, Korf I, Neale DB: The Pinus taeda genome is characterized by diverse and highly diverged repetitive sequences. BMC Genomics 2010, 11:420.
40. Hamberger B, Hall D, Yuen M, Oddy C, Hamberger B, Keeling CI, Ritland C, Ritland K, Bohlmann J: Targeted isolation, sequence assembly and characterization of two white spruce (Picea glauca) BAC clones for terpenoid synthase and cytochrome P450 genes involved in conifer defence reveal insights into a conifer genome. BMC Plant Biol 2009, 9:106.
41. Bautista R, Villalobos DP, Díaz-Moreno S, Cantón FR, Cánovas FM, Claros MG: Toward a Pinus pinaster bacterial artificial chromosome library. Ann For Sci 2007, 64:855–864.
42. Gazave E, Marqués-Bonet T, Fernando O, Charlesworth B, Navarro A: Patterns and rates of intron divergence between humans and chimpanzees. Genome Biol 2007, 8:R21.
43. Lynch M: Intron evolution as a population-genetic process. Proc Natl Acad Sci U S A 2002, 99:6118–6123.
44. Jaramillo-Correa JP, Verdú M, González-Martínez SC: The contribution of recombination to heterozygosity differs among plant evolutionary lineages and life-forms. BMC Evol Biol 2010, 10:22.
45. Sakharkar MK, Chow VTK, Kangueane P: Distributions of exons and introns in the human genome. In Silico Biol (Gedrukt) 2004, 4:387–393.
46. Osoegawa K, de Jong PJ, Frengen E, Ioannou PA: Construction of bacterial artificial chromosome (BAC/PAC) libraries. In Current Protocols in Molecular Biology. Edited by Ausubel FM, Brent R, Kingston RE, Moore DD, Seidman JG, Smith JA, Struhl K. Hoboken, NJ, USA: John Wiley & Sons Inc; 2001.
47. Jeukens J, Boyle B, Kukavica-Ibrulj I, St-Cyr J, Lévesque RC, Bernatchez L: BAC library construction, screening and clone sequencing of lake whitefish (Coregonus clupeaformis, Salmonidae) towards the elucidation of adaptive species divergence. Mol Ecol Resour 2011, 11:541–549.
48. Boyle B, Dallaire N, MacKay J: Evaluation of the impact of single nucleotide polymorphisms and primer mismatches on quantitative PCR. BMC Biotech 2009, 9:75.
49. Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen Y-J, Chen Z, Dewell SB, Du L, Fierro JM, Gomes XV, Goodwin BC, He W, Helgesen S, Ho CH, Irzyk GP, Jando SC, Alenquer MLI, Jarvie TP, Jirage KB, Kim J-B, Knight JR, Lanza JR, Leamon JH, Lefkowitz SM, Lei M, Li J, et al: Genome sequencing in open microfabricated high density picoliter reactors. Nature 2005, 437:376–380.
50. Cantarel BL, Korf I, Robb SMC, Parra G, Ross E, Moore B, Holt C, Alvarado AS, Yandell M: MAKER: An easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res 2008, 18:188–196.
51. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol 1990, 215:403–410.
52. Rice P, Longden I, Bleasby A: EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet 2000, 16:276–277.
53. Slater GS, Birney E: Automated generation of heuristics for biological sequence comparison. BMC Bioinforma 2005, 6:31.
54. The Arabidopsis Information Resource. http://arabidopsis.org.
55. Goodstein DM, Shu S, Howson R, Neupane R, Hayes RD, Fazo J, Mitros T, Dirks W, Hellsten U, Putnam N, Rokhsar DS: Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res 2012, 40(D1):D1178–D1186.
56. Maize Genome Sequencing Project.
http://www.maizesequence.org.57. Hothorn T, Bretz F, Westfall P: Simultaneous inference in generalparametric models. Biom J 2008, 50:346–363.58. Hothorn T, Hornik K, van de Wiel MA, Zeileis A: Implementing a class ofpermutation tests: the coin package. J Stat Softw 28(8):1–23. URLhttp://www.jstatsoft.org/v28/i08/.59. R project. http://www.r-project.org.60. Pelgas B, Bousquet J, Meirmans PG, Ritland K, Isabel N: QTL mapping in whitespruce: gene maps and genomic regions underlying adaptive traits acrosspedigrees, years and environments. BMC Genomics 2011, 12:145.61. Price AL, Jones NC, Pevzner PA: De novo identification of repeat familiesin large genomes. Bioinformatics 2005, 21(Suppl 1):i351–i358.62. Smit AFA, Hubley R, Green P: RepeatMasker Open-3.0. In 1996–2010http://www.repeatmasker.org.63. Huang X, Madan A: CAP3: a DNA sequence assembly program.Genome Res 1999, 9:868–877.64. Huang Y, Niu B, Gao Y, Fu L, Li W: CD-HIT Suite: a web server forclustering and comparing biological sequences. Bioinformatics 2010,26:680–682.65. Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J:Repbase update, a database of eukaryotic repetitive elements.Cytogenet Genome Res 2005, 110:462–467.doi:10.1186/1471-2229-14-95Cite this article as: Stival Sena et al.: Evolution of gene structure in theconifer Picea glauca: a comparative analysis of the impact of intron size.BMC Plant Biology 2014 14:95.Submit your next manuscript to BioMed Centraland take full advantage of: • Convenient online submission• Thorough peer review• No space constraints or color figure charges• Immediate publication on acceptance• Inclusion in PubMed, CAS, Scopus and Google Scholar• Research which is freely available for redistributionSubmit your manuscript at www.biomedcentral.com/submitStival Sena et al. BMC Plant Biology 2014, 14:95 Page 16 of 16http://www.biomedcentral.com/1471-2229/14/95




