UBC Faculty Research and Publications

Genomic prediction accuracies in space and time for height and wood density of Douglas-fir using exome… Thistlethwaite, Frances R; Ratcliffe, Blaise; Klápště, Jaroslav; Porth, Ilga; Chen, Charles; Stoehr, Michael U; El-Kassaby, Yousry A Dec 2, 2017

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata


52383-12864_2017_Article_4258.pdf [ 1.66MB ]
JSON: 52383-1.0361561.json
JSON-LD: 52383-1.0361561-ld.json
RDF/XML (Pretty): 52383-1.0361561-rdf.xml
RDF/JSON: 52383-1.0361561-rdf.json
Turtle: 52383-1.0361561-turtle.txt
N-Triples: 52383-1.0361561-rdf-ntriples.txt
Original Record: 52383-1.0361561-source.json
Full Text

Full Text

RESEARCH ARTICLE Open AccessGenomic prediction accuracies in spaceand time for height and wood density ofDouglas-fir using exome capture as thegenotyping platformFrances R. Thistlethwaite1, Blaise Ratcliffe1, Jaroslav Klápště1,2,3, Ilga Porth4, Charles Chen5, Michael U. Stoehr6and Yousry A. El-Kassaby1*AbstractBackground: Genomic selection (GS) can offer unprecedented gains, in terms of cost efficiency and generation turnover,to forest tree selective breeding; especially for late expressing and low heritability traits. Here, we used: 1) exome captureas a genotyping platform for 1372 Douglas-fir trees representing 37 full-sib families growing on three sites in BritishColumbia, Canada and 2) height growth and wood density (EBVs), and deregressed estimated breeding values (DEBVs) asphenotypes. Representing models with (EBVs) and without (DEBVs) pedigree structure. Ridge regression best linearunbiased predictor (RR-BLUP) and generalized ridge regression (GRR) were used to assess their predictive accuracies overspace (within site, cross-sites, multi-site, and multi-site to single site) and time (age-age/ trait-trait).Results: The RR-BLUP and GRR models produced similar predictive accuracies across the studied traits. Within-site GSprediction accuracies with models trained on EBVs were high (RR-BLUP: 0.79–0.91 and GRR: 0.80–0.91), and were generallysimilar to the multi-site (RR-BLUP: 0.83–0.91, GRR: 0.83–0.91) and multi-site to single-site predictive accuracies (RR-BLUP: 0.79–0.92, GRR: 0.79–0.92). Cross-site predictions were surprisingly high, with predictive accuracies within a similar range(RR-BLUP: 0.79–0.92, GRR: 0.78–0.91). Height at 12 years was deemed the earliest acceptable age at which accuratepredictions can be made concerning future height (age-age) and wood density (trait-trait). Using DEBVs reduced theaccuracies of all cross-validation procedures dramatically, indicating that the models were tracking pedigree (familymeans), rather than marker-QTL LD.Conclusions: While GS models’ prediction accuracies were high, the main driving force was the pedigree tracking ratherthan LD. It is likely that many more markers are needed to increase the chance of capturing the LD between causalgenes and markers.Keywords: Douglas-fir, Genomic selection, Exome capture, Full-sib families, Genotype x environment interaction,Predictive model* Correspondence: y.el-kassaby@ubc.ca1Department of Forest and Conservation Sciences, Faculty of Forestry, TheUniversity of British Columbia, 2424 Main Mall, Vancouver, BC V6T 1Z4, CanadaFull list of author information is available at the end of the article© The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, andreproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link tothe Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.Thistlethwaite et al. BMC Genomics  (2017) 18:930 DOI 10.1186/s12864-017-4258-5BackgroundNovel advancements in genomics technologies and statis-tical genetics, have paved the way for an increasingly pros-perous environment for breeding industries. Notably in thedairy sector, the traditional phenotype-dependent selectionhas been successfully replaced by genotype-dependent se-lection (aka genomic selection, GS) which reduces the timerequired for evaluating genetic traits [1]. Tree selectivebreeding (aka tree improvement) challenges are somewhatsimilar to those of the livestock industry; namely, long gen-eration times and low heritability and late expressing traits.GS, if successful, can offer unprecedented gains in forestrythrough reducing trait evaluation time, allowing fasterbreeding generations turn-over, and hence, an increasedgenetic gain per unit time can be reached. Additionally, theimplementation of GS to forestry would offer a certain re-silience to spontaneous market or environmental, and/orextraneous influences (e.g., disease/pest resistance, climatechange); as breeding programmes adapt accordingly in ashorter time frame [2].Selective breeding has traditionally focused on phenotypicselection, and more recently the linkage disequilibrium (LD)based indirect method of marker-assisted selection (MAS)[3]. MAS is suited for major-gene traits (such as resistance towhite-pine blister rust (Cronartium ribicola)) and proven tobe less effective for predicting complex quantitative traits thatclosely reflect Fisher’s infinitesimal model [4]. MAS modelscould not effectively describe a complex trait, since small ef-fect loci are not readily discovered and considered ‘not sig-nificant’, thus leaving a substantial proportion of geneticcontrol unaccounted for (i.e., missing heritability). Meuwissenet al. [5] proposed the method of ‘genomic selection’ to alle-viate this problem by simultaneously considering all markereffects, and in doing so, all genetic contributions are capturedregardless of size (significance). GS enables complex quantita-tive trait selection using genomic marker data alone and themethod does not require a priori knowledge regarding thespecific genetic architecture of the trait in question. Instead,markers throughout the entire genome (or in this case ex-ome) are incorporated into the estimate. The resulting gen-omic estimated breeding values (GEBVs) for each individualderived from the GS models provide a basis, upon which se-lection decisions are made. The effect of this is a paradigmshift, in which the model unit of these breeding analysesshifts from being the line of decent to the allele. This meansthat the phenotypic values of individuals are determined fromgenotypic data, enabling early selection of traits, leading to asignificantly shorter breeding cycle and higher selection dif-ferential, particularly for the “difficult to assess” attributes.The feasibility of applying GS to forest trees was initiallyassessed through deterministic simulations and the resultsindicated that GS has the potential to radically improve theefficiency of tree selective breeding [6]. Grattapaglia andResende [6] also recommended further experimentationand proof of concept investigations. Since then, severalforest tree species “proof-of-concept investigations” havebeen conducted with encouraging results [7–18].Neves et al. [19] recognized that one of the major barriersto the application of genomic technologies to tree selectivebreeding was the large size of the genome of many forestspecies. This is particularly true of conifers, and although co-nifers have a large genome [20–22], their transcriptomes arecomparable with other plants such as Arabidopsis whosegenome is more than 100 times smaller [23]. Therefore, asan alternative to the prohibitive costs in both monetaryterms and time, and the complexity of sequencing the wholegenome, Neves et al. [19] focused on the coding region. Thisthey refer to as “sequence capture”, and proposed that itwould enable more efficient genetic variant identification inconifers; as it had previously been done in human and maizegenomic studies [24–26]. Although sources of variation maynot be exclusively found in the exome, the reduced cost andtime compared to that of whole genome sequencing, as wellas the ability to still capture a significant proportion of vari-ants and rare variants, makes this method desirable as it isharder to find functional variants in non-coding regions [27,28]. Exome capture has been recognized as an effectivemethod for capturing rare variants in the field of medicine[29], and for increasing knowledge of unmapped largegenomes [19, 28]. In addition, Suren et al. [20] have shown itto be a cost effective method for reducing complexity inlarge genomes, such as those of conifers.The present GS study was based on the exomic informa-tion collected from 1372, 38-year-old coastal Douglas-fir(Pseudotsuga menziesii Mirb. (Franco)) trees. The samplesrepresented 37 full-sib families with replications over threesites in coastal BC. The study objectives were: 1) to comparetwo GS methods; ridge regression best linear unbiased pre-dictor (RR-BLUP) and generalised ridge regression (GRR),and 2) to test the GS prediction accuracy for within-, cross-,pooled multi-site, and time- time (age-age /trait-trait) be-tween age 12 and 38 years. Two phases to this analysis werecarried out, firstly the two GS models were trained on esti-mated breeding values (EBVs). This represents an analysis inwhich model predictions are based on pedigree (both histor-ical and contemporary) and marker-QTL LD information.Secondly the models were trained on deregressed breedingvalues (DEBVs). In this analysis the pedigree information(parental average) is removed, resulting in model predictionsbased on marker-QTL LD and co-segregation. Results ofABLUP cross-validations are provided as a reference forcomparison.ResultsTo assess the studied attributes’ variation, boxplots were pro-duced showing the variance of the estimated (EBVs) and thederegressed (DEBVs) breeding values (Fig. 1). It is interestingto note that the deregression process maintained the withinThistlethwaite et al. BMC Genomics  (2017) 18:930 Page 2 of 16family variation for the three studied attributes (HT12,HT35, and WDres); however, it virtually eliminated amongfamily variation resulting in similar family means (Fig. 1).Traits’ heritabilities and EBV accuracyPedigree-based relationship matrix (ABLUP) heights herit-ability estimates varied among sites ranging between 0.13(Lost Creek) and 0.24 (Adam), and 0.05 (Adam) and 0.23(Fleet) for age 12 and 35 years, respectively (Fig. 2). Themulti-site height heritability estimates were similar to theaverage of the single-site estimates at age 12 (0.17 vs. 0.19);and was slightly higher than the average single site estimateby age 35 (0.17 vs. 0.14) (Fig. 2). Pedigree-based relationshipwood density heritability estimates generally were higherthan those obtained for height and substantially variedamong sites (range: 0.22 (Lost Creek) and 0.45 (Fleet)), withhigher multi- than single-site average estimates (0.43 vs.0.37) (Fig. 2). The average theoretical accuracies for the EBVsFig. 1 Distribution of estimated breeding values (EBVs) and deregressed estimated breeding values (DEBVs) for (a) height at 12 years (cm), (b)height at 35 years (cm), and (c) wood density (g/cm3) calculated from resistance to drillingThistlethwaite et al. BMC Genomics  (2017) 18:930 Page 3 of 16were: 0.61, 0.68, and 0.76 for HT12, HT35, and WDresrespectively.Cross-validation across space and timeWithin-site prediction accuracyWithin-site prediction accuracies, determined based on thecorrelations between EBVs and GEBVs for the two genomicselection (RR-BLUP and GRR) models, generally producedvery similar results (correlations and standard errors) acrossEBV traits and sites (Table 1). For all EBV traits, Adam pro-duced the highest prediction accuracies, with Fleet Riversecond, and Lost Creek always producing the lowest accur-acies. The two models, RR-BLUP and GRR, produced al-most identical results in analysed EBV traits. The onlydifferences occurring in the prediction of GEBV for HT35,in which the GRR method produced slightly lowerabFig. 2 Heritabilities and GS prediction accuracies for models trained on EBVs and predicting GEBVs for each of the traits. Showing the results of within-site,cross-site, combined-sites, and multi-site to single-site cross-validation (Top (a), using RR-BLUP, Bottom (b), using GRR). The direction of the arrows depictswhat information (site) is used to train the model (shaft end), and which is being predicted (head). Traits key: HT 35 yrs. = height at 35 years (cm); HT12 yrs. = height at 12 years (cm); WDres = wood density. Sites are Adam, Fleet River, Lost Creek, and multi/combined-site (ALL)Thistlethwaite et al. BMC Genomics  (2017) 18:930 Page 4 of 16prediction accuracies than RR-BLUP (0.84 vs. 0.86, and0.80 vs. 0.82) for Fleet River and Lost Creek, respectively.The same was found in the analysis of WDres for Adam(GRR = 0.86, RR-BLUP = 0.87). In contrast the WDres,GRR results for Lost Creek (0.84) were slightly more accur-ate than in the RR-BLUP model (0.79). It is worthy to men-tion that the overall predictive accuracy of the studiedgenomic selection models was generally high across sitesand EBV traits (RR-BLUP: average = 0.86 and range of0.79–0.91 and GRR: average = 0.86 and range of 0.80–0.91)(Table 1). Generally, all prediction accuracies’ standarderror estimates were small reflecting good model fit.Using the deregressed breeding values to train the twoGS models, we obtained predictive accuracy results ap-proximating 0.0 for WDres for within-site cross-validation (RR-BLUP: Adam = −0.10 ± 0.055, FleetRiver = −0.04 ± 0.046, and Lost Creek = −0.06 ± 0.049;GRR: Adam = −0.05 ± 0.074, Fleet River = −0.03 ± 0.045,and Lost Creek = −0.04 ± 0.046). The other models forHT12 and HT35 failed to converge.The ABLUP within-site cross-validation, provided resultsof a similar nature as the GS models trained on EBVs. Theaverage prediction accuracy for ABLUP within-site was 0.87(both RR-BLUP and GRR had averages of 0.86), with a rangeof 0.77–0.96. WDres was predicted with the highest accuracyof all three traits, and surpassed the accuracy of the GSmodels: 0.94 ± 0.0009, 0.96 ± 0.0009, and 0.94 ± 0.001, forAdam, Fleet River, and Lost Creek, respectively (Fig. 3).Cross-sites prediction accuracyThe average predictive accuracy of the RR-BLUP and GRRgenomic selection models was very similar for cross- andwithin-site analyses. However, some trends in the predictiveability of the sites did occur (Fig. 2a and b). The sites Adamand Fleet River always produced the same or higher predic-tion accuracies for Lost Creek than the within site estimatefor Lost Creek. In addition, Lost Creek cross-site predictionaccuracies for Adam and Fleet River were also alwayshigher than for itself. Fleet River always produced higherprediction accuracies for Adam than itself. This was truefor all traits using EBVs. This may be an indication thatwhilst there may be some GxE occurring as expected, itmay not be significant enough to employ single-site testingand breeding. The accuracy of cross-site predictions variedand ranged from 0.78 (GRR for WDres: Adam - Lost Creek)to 0.92 (RR-BLUP for HT12: Lost Creek - Adam,), and bothselection models provided comparable results (Fig. 2a andb). Overall, across sites prediction accuracy decreased fromHT12 to HT35 to WDres with averages of 0.90, 0.86, and0.83, respectively (all from RR-BLUP).When the studied attributes’ deregressed breedingvalues (DEBVs) were used to train the two GS models,surprising outcomes were obtained and the resultingpredictive accuracies were extremely low approximating0.0 for WDres, while the remaining models for HT12and HT35 failed to converge.The results of the ABLUP cross-sites validation providedevidence of stronger GxE interaction than was predicted bythe GS models (Fig. 3). This particular trend can occur assite-specific GxE interactions tend to over-estimate within-site prediction accuracy due to the sharing of a common en-vironment, whilst seemingly inhibiting the efficacy of cross-site analyses. The average cross-sites prediction accuraciesfor ABLUP were as follows: 0.68, 0.70, and 0.79 for HT12,HT35, and WDres, respectively (Fig. 3). A drop from the ac-curacies obtained by the GS models trained on EBV data,and a complete reversal of the order of those accuracies.Within multi-site (combined) prediction accuracyThe RR-BLUP and GRR models gave almost identical re-sults, with HT12 having the highest combined-site predic-tion accuracy, followed by HT35 and lastly WDres (0.91,0.88, and 0.83, respectively for both RR-BLUP and GRR)(Fig. 2a and b, Table 2). In general, the combined site predic-tion accuracies were higher than the average of the within-site accuracies, except for WDres which produced the sameaccuracy for both (RR-BLUP: 0.91 vs. 0.89, 0.88 vs. 0.86, 0.83vs. 0.83; for HT12, HT35, and WDres, respectively) (Fig. 2a).Finally, it is also interesting to note the exceedingly smallstandard error values associated with all within multi-siteprediction accuracies, highlighting the model(s) fit (Table 2).Again, using the deregressed breeding values (DEBVs) totrain the two GS models, we obtained results approximat-ing 0.0 for all within multi-site cross-validation analyses forall traits considered (HT12, HT35, and WDres) (Table 2).The multi-site cross-validation accuracies for the ABLUPmodel were: 0.88 ± 0.002, 0.86 ± 0.003, and 0.84 ± 0.003,for HT12, HT35, and WDres, respectively (Fig. 3). Similarin magnitude and sequence to the GS models trained withEBVs.Table 1 Within single-site prediction accuracies and their standarderrors for ABLUP and genomic selection (ridge regression (RR-BLUP)and generalized ridge regression (GRR)) models for EBVs and GEBVsof heights (HT12 and HT35) and wood density (WDres)Trait Model SiteAdam Fleet River Lost CreekHT12 ABLUP 0.81 ± 0.002 0.77 ± 0.002 0.88 ± 0.001RR-BLUP 0.91 ± 0.010 0.89 ± 0.012 0.87 ± 0.013GRR 0.91 ± 0.010 0.89 ± 0.012 0.87 ± 0.012HT35 ABLUP 0.85 ± 0.002 0.83 ± 0.002 0.88 ± 0.002RR-BLUP 0.89 ± 0.007 0.86 ± 0.020 0.82 ± 0.010GRR 0.89 ± 0.008 0.84 ± 0.020 0.80 ± 0.013WDres ABLUP 0.94 ± 0.001 0.96 ± 0.001 0.94 ± 0.001RR-BLUP 0.87 ± 0.007 0.83 ± 0.026 0.79 ± 0.015GRR 0.86 ± 0.008 0.83 ± 0.026 0.84 ± 0.016Thistlethwaite et al. BMC Genomics  (2017) 18:930 Page 5 of 16Multi-site to single-site prediction accuracyOn average the multi- to single-site predictability foreach trait was slightly higher than the within multi-siteaverage estimates (for RR-BLUP, HT12: 0.91 vs. 0.89,HT35: 0.87 vs. 0.86, and WDres: 0.85 vs. 0.83) (Fig. 2a).In most cases the multi- to single-site accuracy predic-tions were the same or higher, than the correspondingsingle-site predictions. Except for HT35 at Lost Creekfor both RR-BLUP and GRR which gave slightly lowerprediction accuracies for multi- to single-site than withinsite (Lost Creek) analyses (RR-BLUP: 0.79 vs. 0.82; GRR:0.79 vs. 0.80) (Fig. 2a and b). In most cases Adam wasthe most predictable site, and on average, Lost Creekwas the least predictable.Again, using the deregressed breeding values (DEBVs)to train the two GS models, we obtained results approxi-mating 0.0 for all multi- to single-site cross-validationanalyses.The multi- to single-site predictions for the ABLUPmodel closely followed the pattern of predictions from theGS models trained on EBV data. In each case (for HT12,HT35, and WDres) Adam was the most predictable site(joint first with Fleet River for WDres), Fleet River the sec-ond, and Lost Creek the least predictable site (Fig. 3). Theaverage multi- to single-site predictability for the traitswere again the same or slightly higher than the withinmulti-site predictions for ABLUP (HT12; 0.88 vs. 0.88,HT35: 0.86 vs. 0.86, and WDres: 0.85 vs. 0.84) (Fig. 3).Two sites predicting one site accuracyThe RR-BLUP and GRR models for this analysis producedsimilar results overall (Fig. 4a and b). Generally, the high-est prediction accuracies were obtained for HT12 (aver-age = 0.90), followed by HT35 (average = 0.88), and lastlyWDres (average = 0.83) for both RR-BLUP and GRR.There were only minor differences between the twomodels (Fig. 4a and b). Similar to the multi- to single-site(above), the prediction accuracy of two sites to one site in-dicated that, in all cases, Adam was the most predictablesite using this two to one analysis, despite the discrepancyin heritabilities, and on average Lost Creek was the leastpredictable in most cases.Fig. 3 Heritabilities and prediction accuracies of the ABLUP model for each of the traits. Showing the results of within-site, cross-site, combined-sites,and multi-site to single-site cross-validation. The direction of the arrows depicts what information (site) is used to train the model (shaft end), andwhich is being predicted (head). Traits key: HT 35 yrs. = height at 35 years (cm); HT 12 yrs. = height at 12 years (cm); WDres = wood density (g/cm3)calculated from resistance drilling to drilling. Sites are Adam, Fleet River, Lost Creek, and multi/combined-site (ALL)Table 2 Within multi-site genomic selection prediction accuraciesand their standard errors for ridge regression (RR-BLUP) andgeneralized ridge regression (GRR) models), estimating GEBVs andGEDBVs for heights (HT12 and HT35) and wood density (WDres)(g/cm3) calculated from resistance drilling to drillingTrait GS Model GEBVs GEDBVsHT12 RR-BLUP 0.91 ± 0.004 −0.09 ± 0.019GRR 0.91 ± 0.003 −0.04 ± 0.017HT35 RR-BLUP 0.88 ± 0.006 −0.02 ± 0.021GRR 0.88 ± 0.006 −0.01 ± 0.029WDres RR-BLUP 0.83 ± 0.009 0.00 ± 0.032GRR 0.83 ± 0.010 −0.01 ± 0.025Thistlethwaite et al. BMC Genomics  (2017) 18:930 Page 6 of 16The deregressed breeding values produced similar re-sults to those obtained from within, cross- and multi-sites GS models, with two sites to one site cross-validation analyses prediction accuracy approximating0.0 for HT12, HT35, and WDres.The ABLUP cross-validation for two sites predictingone site resembled the results of the GS model trained onEBVs. Prediction accuracies for HT12 and HT35 werelower than the GS models using EBVs (HT12 aver-age = 0.81 vs. 0.90, and HT35 average = 0.81 vs. 0.88)(Fig. 5). However, the average predictability for WDresremained the same for ABLUP as the GS models usingEBVs (0.83) (Fig. 5). In general Adam was the most pre-dictable site for all three traits (HT12, HT35 and WDres),followed by Fleet River and lastly Lost Creek in thisABLUP analysis (Fig. 5). This is the same site predictabilitytrend displayed by the GS models trained on EBV data.Time- time prediction accuracy (age- age/ trait-traitcorrelation)In order to test the theory that the target time for forwardselection can be reduced, prediction models were assessedon their accuracy when trained on younger trees (12 years:Ht12) followed by validation on the same trees for height atage 35 (Ht35) and wood density at age 38 (WDres38) (i.e.,correlations between GEBVs at age 12 and EBVs at ages 35(HT12-HT35) and 38 (traitHT12-traitWDres38). The GEBVsvalues for Ht12 have significant positive correlations withabFig. 4 Heritabilities and GS prediction accuracies for models trained on EBVs and predicting GEBVs for each of the traits. Showing the results oftwo sites predicting one site cross-validation (Top (a) using RR-BLUP, Bottom (b), using GRR). The direction of the arrows depicts what information(site) is used to train the model (shaft end), and which is being predicted (head). Traits key: HT 35 yrs. = height at 35 years (cm); HT 12 yrs. = heightat 12 years (cm); WDres = wood density (g/cm3) calculated from resistance to drilling. Sites are Adam, Fleet River, Lost CreekThistlethwaite et al. BMC Genomics  (2017) 18:930 Page 7 of 16EBVs for HT35 for both RR-BLUP (0.71 ± 0.0004) and GRR(0.71 ± 0.0004) (Table 3). Trait- trait correlation betweenHT12 and WDres (recorded at 38 years: traitHT12-traitWDres38) produced significant negative correlations (RR-BLUP: -0.46 ± 0.0005; GRR: -0.46 ± 0.0006) (Table 3).These are the most useful correlations to make as they re-flect the direction in which selections will be made.Using the deregressed breeding values to train the twoGS models, we obtained results approximating 0.0 for alltime-time cross-validation analyses (Table 3).Time-time and trait-trait cross-validation of ABLUPresembles closely that of the GS models trained on EBVdata. EBVs for HT12 have a medium to strong positivecorrelation with EBVs at HT35 (0.74 ± 0.001), and a sig-nificant negative correlation with EBVs of wood densityWDres at age 38 (traitHT12-traitWDres38: −0.48 ± 0.002).DiscussionExome captureExome capture is a target enrichment method for sequen-cing the protein coding regions in a genome. This makesanalysis much more efficient. Through targeting this re-gion, the focus is immediately resolved to those areas thatare likely to contain sources of variation for the pheno-type. The reduced cost and time compared to that ofwhole genome sequencing, as well as the ability to stillcapture a significant proportion of variants makes exomecapture desirable as it is harder to find functional variantsin noncoding regions [20, 27, 28]. Added to which, foresttrees already tend to have large and complex genome sizes[20–22]. Exome capture has been recognized as an effect-ive method for increasing knowledge of unmapped largegenomes [19, 28].Fig. 5 Heritabilities and prediction accuracies of the ABLUP model for each of the traits. Showing the results of two sites predicting one sitecross-validation. The direction of the arrows depicts what information (site) is used to train the model (shaft end), and which is being predicted(head). Traits key: HT 35 yrs. = height at 35 years (cm); HT 12 yrs. = height at 12 years (cm); WDres = wood density (g/cm3) calculated fromresistance to drilling. Sites are Adam, Fleet River, and Lost CreekTable 3 Genomic selection prediction accuracies for time- time correlations for HT12 for RR-BLUP and GRR models (standard errors)GSModelEBVs DEBVsWDres HT35 WDres HT35RR-BLUP −0.46 ± 0.0005 0.71 ± 0.0004 0.02 ± 0.0038 −0.03 ± 0.0039GRR −0.46 ± 0.0006 0.71 ± 0.0004 0.002 ± 0.0040 −0.004 ± 0.0047Prediction accuracies based on correlations between GEBVs at age 12 and EBVs at age 35 (HT) and 38 (WD); and correlations between GEDBVs at age 12 andDEBVs at age 35 (HT) and 38 (WD). Traits key: HT35 = height at 35 years (cm); WDres = wood density (g/cm3) calculated from resistance drilling to drilling;HT12 = height at 12 years (cm)Thistlethwaite et al. BMC Genomics  (2017) 18:930 Page 8 of 16Whilst exome capture is expected to produce some miss-ing marker information, the method surpasses whole gen-ome shotgun sequencing or genotyping methods usinggenome complexity reduction, in depth of coverage. Given arestricted budget (with no option for re- sequencing) thedepth achieved using exome sequencing is expected to bemuch greater than genotyping-by-sequencing [30], with theadded benefit of obtaining more reliable calls. After the se-quencing process, those markers with a large proportion ofmissing information can be filtered out, however Rutkoski etal. [31] admit that this step is unnecessary since it affects theGS prediction accuracy little. In order to maintain power inthe subsequent analyses, marker imputation should be usedto infer missing data in those records not filtered out, con-serving the majority of the original SNPs. This was carriedout in the current study, although the ratio of missing datawas very low and therefore the impact of this procedure isexpected to be minimal. Given that currently there is no gen-etic map available for Douglas-fir, imputation methods inthis case are restricted to those which support unordereddata. There have been various imputation methods proposedfor unordered markers however their efficacy and suitabilityis yet to be fully determined in genomic selection [31].Though some methods do show some promising resultswhen compared to mean imputation, notably random forestregression produced greater GS prediction accuracy than itscounterparts in both Rutkoski et al. [31] and Poland et al.[32] studies.Although we managed to capture significant family ef-fects using this type of SNP data, we found little evi-dence that the GS models were able to capture marker-QTL LD using this genotypic data set (see below for dis-cussion). It is highly likely that substantially more SNPswill be required to capture significant effects for thesetraits (i.e., LD). In addition to this, our genotyping ef-forts were focused on a restricted portion of the genome,the exome, which is limited to the available 40 K probesused in the present study. In humans, the exome consti-tutes a mere 1% of the total genome [33], thus consider-ing the unique genome size and complexity of conifers[34] we expect that the population of SNPs used in thisstudy represents a very small fraction of the Douglas-firgenome. Additionally, in this case the revealed exomewhich represents functional genes that, by default, areunder selection and thus are conserved. By focusing thepresent study on this region we were not able to capturethe variation among families, as this was not representedin the exome. In order to seize this variation in inter-genic regions, an alternative, whole genome approachmust be used for genotyping, for example genotyping-by-sequencing as it uncovers unordered SNPs across theentire genome. These intergenic regions are not underthe same selection pressure as the exome, and may con-tain important regulatory sequences which correspondto the control of the traits we investigated. Another ap-proach used by Fuentes-Utrilla et al. [16], was to userestriction-site associated DNA sequencing (RADseq)technology. They found that in a species with no wholegenome assembly (in this case Sitka spruce (Picea sitch-ensis (Bong.) Carr)), a SNP panel could be constructedfrom randomly located markers generated from RADseq.However, it must be noted that their GS analysis wasperformed strictly within a single family.HeritabilityThe use of ABLUP instead of GBLUP is likely falsely in-flating the heritability estimate. Though the impact ofheritability on the predictive accuracy seems to be lowin this study. Our results show that even with modestheritability, predictive accuracies can be high. As shownby the combined site analyses, the heritability for HT12,HT35, and wood density were modest (0.17, 0.17, and0.43, respectively), yet the prediction accuracies were0.91 (± 0.004), 0.88 (± 0.006), and 0.83 (± 0.009), re-spectively (RR-BLUP) (Table 2). The large sample sizeand low effective population size of the present study(Ne = 21) likely helped in negating the effect of low traitheritability [5]. M rtens et al. [35] provided evidencethat increased relatedness between training and valid-ation populations leads to higher prediction accuracy intheir study on yeast. Furthermore, the use of correlationbetween pedigree-based and marker-based breedingvalues, approximates correlation between unknown truebreeding value and genomic breeding value. The accur-acy can go above heritability in this case because bothvalues are representing only genetic effects, this is in linewith results obtained by Gamal El-Dien et al. [12].Genomic selectionThe desired outcome of genomic selection is to produceunbiased marker effect estimates [5], and to avoid theBeavis effect [36] which hinders MAS causing marker ef-fects to be overestimated [37]. Instead of selectingmarkers based on a significance threshold, GS estimatesall marker effects simultaneously causing a differentproblem; there are more predictor effects (p, markers inthis case) to be estimated than there are observations (n,samples) [3]. Least squares cannot be used to estimateall the effects at once since there are not enough degreesof freedom. In addition, multi-collinearity of markerswould cause any model to be over fitted [3].To address this issue (large p, small n), various statisticalmodels have been proposed. They generally fall into the fol-lowing categories: shrinkage models, variable selectionmodels, and kernel methods. Shrinkage models (e.g., ridgeregression BLUP [RR-BLUP], Whittaker et al. [38]) fit allmarker effects which are all shrunk to the same degree.With RR-BLUP, it is assumed that the trait in questionThistlethwaite et al. BMC Genomics  (2017) 18:930 Page 9 of 16more closely resembles Fisher’s infinitesimal model (manyloci with small effects), and marker effects are samples froma normal distribution (with equal variance). Variable selec-tion (e.g., generalized ridge-regression [GRR], Shen et al.[39]) however, reduces the number of markers used, and indoing so assumes that the trait is controlled by fewer,strong effect loci [3]. Kernel methods convert the predictorvariables to distances in effect creating a matrix similar toan additive genetic relationship matrix. The kernel matrixquantifies the distance between individuals but alsosmoothing parameters are added [3]. This method is flex-ible and can incorporate complex relationships betweenmarkers, for this reason it is useful in cases where non-additive effects are suspected to occur [40]. Indeed,Douglas-fir height and wood density have proven to havenon-additive genetic variance component [41]; however, itsunpredictability has driven the species’ advanced generationbreeding and selection methods to be additive genetic gain-dependent [41–43]. Since different traits have differing gen-etic architecture, there does not exist one model that is ne-cessarily the best for all traits or populations [3].In the present study, we assessed the prediction accur-acies of two GS models (RR-BLUP and GRR) in two phases;first when trained on EBVs, and second when trained onDEBVs. In the first instance, using EBVs and by virtue ofretaining family means, our data contained both contem-porary and historical pedigree information as well asmarker-QTL LD information. Without further adjustment,all this information was parsed into the GS models and re-sulted in high prediction accuracies (for both model types).Subsequent to this, in phase two, we deregressed the EBVs,removing the parent average effect in order to disassociatethe pedigree information from the marker-QTL LD. Thisresulting in DEBVs that contained the marker-QTL LD in-formation only without the between family genetic vari-ance. Using these DEBVs to train the GS models, weobtained prediction accuracies which for all practical pur-poses were 0.0. The success of the first phase can be attrib-uted to the large within and among family geneticvariances. This pedigree-driven approach dominated theanalysis and produced the observed high prediction accur-acies, which were comparable to the ABLUP accuracies.When this influence of pedigree was removed, the gener-ated models were no longer able to provide useful predic-tions. A similar effect was seen in interior spruce (Piceaglauca x Picea engelmannii) where cross-validation usingfamily folding resulted in decreased prediction accuracy,because between family variance was dominating the ana-lysis (Gamal El-Dien et al. 2017, unpublished). In additionto this Fuentes-Utrilla et al. [16], when studying GS in Sitkaspruce, found that within family predictions cannot be ex-trapolated to between families. Furthermore, when examin-ing marker transferability between families in white spruce,Beaulieu et al. [44] also found within family predictions tobe more precise than between family predictions. Similarly,to Gamal El-Dien et al. (2017, unpublished) they also foundthat when a family had no representation in the traininggroup, the accuracies obtained for that family were verysmall, occasionally negative, and often not statistically sig-nificant from zero. Much like using the DEBVs here, whichtry to predict between family variance, but have familymeans stripped away. Fuentes-Utrilla et al. [16] concludedthat species with large effective population sizes (notablyconifers), have a reduced ability to make predictions acrossfamilies. With this in mind, GS may best be employed toproduce GEBVs for within large full-sib families in conifersas it captures the Mendelian sampling term.This discrepancy in the results from the two GS phases,is indicative that we simply were not able to capture themarker-QTL LD with the available SNPs. Whilst otherstudies have had some success in this area, it is importantto note that these particular investigations have focusedon tree species with much smaller genomes, for exampleResende et al. [7]. Resende et al. [17] make this point intheir 2017 study of Eucalyptus, that although other studiesmay have failed to produce significant predictions betweenunrelated populations, this may be due to low markerdensity. This is likely inhibiting the ability to captureshort-range LD, the result being that prediction modelsrely almost entirely on relatedness.In their study, Resende et al. [7] used 7680 DArTmarkers on Eucalyptus which is estimated to have a gen-ome size of 609 Mbp [7], equivalent to 12.61 markers/Mbp. In contrast, here we used 69,951 SNP markers onthe Douglas-fir genome which is estimated at 18,700 Mbp,giving 3.74 markers/Mbp. To obtain a similar coverage asResende et al. [7] would require approximately 235,800markers, many more than we currently have. With such alarge genome, it is likely that many more SNPs will be re-quired (≈235,800+), with greater genomic coverage inorder to capture this, so far elusive, LD. In a more recentstudy, also using Eucalyptus, Müller et al. [18] managed tocapture some short-range historical LD using 5000–10,000 SNPs. Yet they too concluded that the genomicprediction in this case was largely driven by relatedness.The similar predictions given by the RR-BLUP andGRR methods, and the similarly high ABLUP accuracies,was again the result on heavy reliance on between familyvariance, and thus we have gained no new informationas to the genetic control or architecture of the traits inquestion. Although similar findings have been noted inmore successful GS studies. For example, Resende et al.[8] when comparing GS methods, found there to be littledifference between the predictive abilities of shrinkageand variable selection methods (4 in total) even consid-ering they have different a priori assumptions. They useda Pinus taeda training population with 951 individualsand 4853 SNPs in the analyses. Prediction accuracies forThistlethwaite et al. BMC Genomics  (2017) 18:930 Page 10 of 1617 traits (including growth, disease resistance and devel-opment) ranged from 0.17 to 0.51. Only one trait (fusi-form rust resistance) showed any significant differencebetween the models. Higher prediction accuracies wereobtained using variable selection methods for this trait.This reflects the genetic architecture of the trait which iscontrolled by few, large effect loci [8].Single and multi-site cross-validationThe combined site GS analysis produced higher predic-tion accuracies than the single site analyses on average(Fig. 2a and b). The combined site training populationhaving more individuals than the single site populations.This is in agreement with what the literature statesshould happen; Grattapaglia [2] noted that increasingthe training population size increases accuracy up to apoint (around 1000 individuals). In addition to increasedsample size for predictive accuracy improvement, themulti-site approach incorporated the present GxE in themodel, resulting in further improvement.The prediction accuracies are high for HT12, HT35,and WDres GEBVs (0.87–0.92, 0.79–0.92, and 0.78–0.88,respectively) (Fig. 2a and b). They are moderately higherthan those in previous studies including other forest treespecies [7, 8, 10, 11, 15]. Largely due to the inclusion ofthe pedigree structure. In this case the GS methods arenot giving much advantage over ABLUP (ABLUP multi-site cross-validation accuracies: 0.88 ± 0.002, 0.86 ± 0.003,and 0.84 ± 0.003, for HT12, HT35, and WDres respect-ively), thus both are predicting only family means. Theprediction accuracies for both the GS models and theABLUP model are much higher than theoretical accuracyof the EBVs (0.61, 0.68, and 0.76; for HT12, HT35, andWDres, respectively), which indicates that both the EBVsand GEBVs are converging on the family means, and arefar from the true breeding values.The high prediction accuracies for the GEBVs may alsopartially be the result of using a relatively large trainingpopulation, known to correlate with accuracy. In addition,there is an interacting effect of the relatively low effectivepopulation size (Ne = 21), and both these data characteris-tics increase the accuracy of predictions [3]. Although inthis case the low effective population/family size alsomeant that the Mendelian sampling term could not be de-fined. Indeed, Gamal El-Dien et al. [12] using an interiorspruce population with Ne = 93.8 (estimated assuming theOP families are unrelated and contributed equally to theexperiment) had lower prediction accuracies than ob-tained here. But these in turn were vastly higher than re-sults from a study with 214 open-pollinated white spruce(Picea glauca) families in Quebec with Ne = 622.5 [9]. An-other component responsible for the increased accuracyof predictions, was the training of the models on EBVs ra-ther than raw phenotypes [12].Prediction accuracies fell dramatically for modelstrained on DEBVs, as marker-QTL LD could not be re-covered using the available SNP data set, indicating thatthe SNP markers effectively tracked the pedigree.Cross-site validationSimilar prediction accuracies were observed in the cross-site compared to the single-site and combined-site GSanalyses (with models trained on EBVs). Genotype x En-vironment interaction is an important consideration offorest tree selective breeding, more so than in animalbreeding where individuals are considered to share a com-mon environment [45–48]. Prediction accuracy for HT12across sites was relatively high. An unexpected resultgiven the fact that the heritability was only 0.17 (combinedsites), and there is expected to be a competition effect atearly stand development (i.e., strong environmental com-ponent). This is possibly an indication of the partially con-trolled experimental environment (not natural stands) inwhich the trees were growing. Given these experimentalconditions, it is also conceivable that GxE effects are min-imal. Which is reflected in the cross-site predictions(Fig. 2a and b), which are of a similar magnitude to thewithin-site prediction accuracies for all attributes.Time- time and trait- trait correlationsIt is thought that forward selection in Douglas-fir shouldbe carried out at a minimum of 17 years [49], when accur-ate predictions of phenotype at time of harvest can bemeasured. We tested the correlation of height at 12 yearsand height at 35 years (HT12-HT35), and wood density(38 years) (traitHT12-traitWDres38) and positive (0.71 forheight) and negative (−0.46 for wood density) correlationswere detected by the GS models trained on EBV data. Al-though they did not offer accuracy above that of ABLUP,indicative of a strong reliance on pedigree information ra-ther than marker-QTL LD. The results give an indicationof how useful early selection could be. Correlations forheight are moderately strong and low-moderate for wooddensity. Marker-trait associations are known to vary ac-cording to the tree age, limiting any correlation. Thiswould certainly hamper efforts to create a highly corre-lated age-age model in trees [9, 50]. At the moment pro-viding such predictions at an age any younger than12 years would not be recommended (note that height atage 12 was the earliest measurement available in thepresent study). Since larger age differences have beenshown to produce less accurate models [51]. The discrep-ancy in prediction accuracy between the time-time corre-lations and the cross-validations suggest that there issome inconsistency between EBVs and marker effects atthese two ages (12 and 35 years). Although the discrep-ancy is relatively small, and so meaningful results may stillbe obtained through age-age and trait-trait correlations.Thistlethwaite et al. BMC Genomics  (2017) 18:930 Page 11 of 16This is, in addition to the varied environmental conditionsthe trees endure over their long lifespans, lessening time-time correlations. To this effect and as White et al. [52]suggested, training models will need to be updated withmid-rotation phenotypes in order to accurately predictmature trait values.Genomic selection and forest tree breedingSeveral genomic selection “proof-of-concept” studies havebeen conducted on few coniferous tree species (e.g., lob-lolly pine, maritime pine, Sitka spruce, white spruce, andwhite-Engelmann spruce hybrid), all concluded that GShas the potential to increase the genetic gain throughspeeding breeding generation turn-over and increasingthe selection differential. It should be stated that, with theexception of the maritime pine study [14], none of the de-rived GS predictive models have been validated on inde-pendent “validation” populations. The success of GS isdependent on the linkage disequilibrium between themarkers used (i.e., SNP panels) and causal genes under-pinning the traits of interest, and the degree of relatednessbetween the training and validation populations. There-fore, caution is required during GS implementation as LDchanges after every round of breeding (i.e., recombin-ation); the fact that it does rapidly decay called for usingdense marker coverage to overcome this caveat. Still wehave found that by only using sequence capture data, wewere unable to successfully resolve this marker-QTL LD.We only had success in capturing among family effects(i.e., pedigree). Even with a SNP panel designed appropri-ately based on an informative SNP library, and largeenough to handle a conifer genome, there are additionalhurdles. LD only survives over relatively short distances inconifers compared to livestock species due to their rela-tively large Ne [53]. This led Fuentes-Utrilla et al. [16] toconclude that GS may only be useful in tree populationswith reduced Ne, for example seed orchards, or lines/ sub-groups which have been produced through selectivebreeding. Though as they demonstrate with their analysis,it is possible to generate very large full-sib families intrees, by controlled crossing. In this type of population,LD extends over longer distances than in open-pollinatedpopulations. As a result, they suggest that GS could beemployed to make selections within families.Based on comparable studies: 1) a greater number ofmarkers, and 2) wider coverage throughout the wholegenome or dense unordered SNP genotyping platform(e.g., GBS), would be needed to capture this LD and add-itional intergenic variation [17]. Or indeed a shift in thelevel at which GS is applied, i.e. to within families ratherthan across families [16]. Whilst GS still has the potentialto deliver unprecedented gains, it does not seem likelythat was achieved in the present study as the predictiondriving force was the pedigree rather than LD. Despite thelow Ne, the small family size prevented the accurateassessment of the Mendelian sampling term in this popu-lation. Therefore, the EBVs were heavily shrunk towardthe family mean and all within family deviations were notestimated precisely.Finally, it should be emphasized that any gains capturedthrough GS require unique tree improvement deliverymethods as the traditional seed orchards’ productionmode requires time for reaching sexual maturity, evenunder intensive management such as top grafting or hor-monal applications, and its sexual-production effectivelybreaks the LD between selected traits and markers.ConclusionsThe results suggest that the population of SNP markersused, along with their low coverage across the Douglas-firgenome was not successful for capturing the LD with thecausal genes underpinning the studied attributes. In this casethe impressive results of the investigated genomic selectionmodels relied heavily on relatedness rather than the LD.Alternative marker generation methods such as whole gen-ome sequencing or other dense unordered SNP genotypingmethods such as GBS are needed, as is a larger SNP array.Exome capture provide enough markers to successfully cap-ture/track the pedigree (contemporary and historical) andthus it is useful for genetic variance decomposition ofconifer traits, thus providing better genetic parameter esti-mates. Since we were only able to resolve the between familyeffects, we gained no new information regarding the geneticarchitecture of the traits. Whilst low Ne may help boostprediction accuracy in similar studies, there may be a lowerlimit to this. Beyond which Mendelian sampling is notcaptured. However, using a single, large full-sib family causesLD to extend over further distances compared to open-pollinated trees, therefore it is possible to make withinfamily selections using this type of population as Fuentes-Utrilla et al. [16] have demonstrated.MethodsExperimental populationPredictive models were trained on a replicated 38-year-oldpedigreed coastal Douglas-fir (Pseudotsuga menziesiiMirb. (Franco)) progeny testing population. The trial wasestablished by the Ministry of Forests, Lands and NaturalResource Operations of British Columbia, Canada in 1975and consists of 165 full-sib families (54 parents), of which37 families were selected for sampling from three test sites(Adams (Lat. 50 24′42″ N, Long. 126 09′ 37″W, Elev.576 mas), Fleet River (Lat. 48 39′25″ N, Long. 128 05′05″ W, Elev. 561 mas), and Lost Creek (Lat. 49 22′15″N, Long. 122 14′07″ W, Elev. 424 mas)) with a total of1372 trees (N ≈ 500 per site) and effective population size(Ne) of 21.Thistlethwaite et al. BMC Genomics  (2017) 18:930 Page 12 of 16Tissue sampling, DNA extraction and genotypingCambial tissue was collected using a hammer and hollowpunch tool (approx. 2 cm diameter) to remove two smallcircular pieces of bark/ cambium and developing tissuefrom each tree. The cambium disks were separated fromthe bark layer and immediately placed in a 2 ml collectiontube with 1 ml storage buffer (10 mM EDTA pH 8.0,10 mM Na2SO3) and kept at 4 °C until DNA extraction.DNA extraction followed the modified procedure devel-oped by Ivanova et al. [54] (R. Whetten, unpublished,North Carolina State University, personal communica-tions). Genotyping was done using the exome capturemethod in a commercial facility (RAPiD Genomics©, Flor-ida, US). A total of 40 K probes were designed using theavailable Douglas-fir transcriptome assembly [55]. Fromthe Douglas-fir reference transcriptome, a total of 325,372non-overlapping 120-bp probes were initially designed.After filtering out redundant and organelle matchingprobes, this number was reduced to 117,135 probes. Ofthe remaining probes, we selected 7464 that contained17,096 SNPs previously reported [55]. A further 32,536probes were selected bringing the total to 40 K. Selectionwas performed by randomly sampling the remainingprobes restricting the selection to a maximum of twoprobes per transcript. These 40 K (7464 + 32,536) probescover a total of 21,187 transcripts. The raw sequencedreads were demultiplexed in each individual barcodes.Low quality bases with less than 20 quality score in the 3′end were trimmed out followed by a low quality filter thatremoved reads with more than 10% of the read with lessthan 20 quality score. The filtered reads were alignedagainst the reference transcriptome using Mosaik v2.2.3[56] with the following parameters -mmp 0.05 -m all -a all-hs 15. SNP markers were identified at a population levelusing Freebayes [57] without considering indels, multi-nucleotide polymorphisms and complex events. This ana-lysis resulted in approximately 550,000 SNPs. These SNPswere filtered to identify the highest quality sites. These in-cluded only biallelic SNPs with less than 40% missing data.Further filtering was applied so that data was located oncontigs with a mean read depth less than or equal to 60.Additional filters included minor allele frequency (MAF)>5%, Hardy-Weinberg disequilibrium cut-off <−0.05, andmaximum site depth < 60. This process resulted in a totalof 1372 samples with 69,551 SNPs for use in the study,and mean imputation was used for the missing data. (formore details on exome capture see, Neves et al. [19]).PhenotypingEarly- (1988) and mid- (2011) rotation growth trait mea-surements of the studied trees had been assessed forheight (HT: in meters) and mid-rotation (age 38 years,2014) wood density (WDres) was assessed indirectly usingthe average resistance measurements recorded with aResistograph® (Instrumenta Mechanik Labor, Germany).Resistograph readings were subsequently converted towood density indices (g/cm3) by scaling them by the DBHmeasurements, as performed by El-Kassaby et al. [58].Estimated breeding values (EBVs) and deregressedbreeding values (DEBVs)EBVs were fitted in ASReml 3.0 [59], using the followingmixed linear model for a single site:y ¼ Xβ þ Zaþ e ð1Þwhere y is the phenotypic trait measurement, β is a fixedeffect vector (overall mean), a is a random effects vector(additive genetic) which are normally distributed(~N(0,Aσa2)) where A is average numerator relationshipmatrix and σa2 is additive genetic variance, e is the ran-dom residual effects which are normally distributed(~N(0,Iσe2)) where I is identity matrix and σe2 is residualvariance. X and Z are incidence matrices relating thefixed and random effects to the observations. Since GxEplays an important role in forestry [60] the combinedsite EBVs were estimated with terms accounting for site,replication, and family structure. Thus minimising biasesin BV calculation caused by environmental variations be-tween and within sites, and full-sib genetic effects.Narrow-sense heritability was calculated as h2 = σa2 /(σa2 + σe2), where σa2 and σe2 are the variances of additivegenetic and residual effects, respectively. Combined siteheritability estimates included the GxE model term inthe denominator. The breeding values (â) are fittedusing BLUP as follows:a ¼ AZ ’ σ2a V −1 y−X β^  ð2Þwhere V is the variance- covariance matrix of y obtainedby:V ¼ ZAZ ’ σ2a þ I σ2e ð3ÞBreeding values were deregressed using the deregres-sion procedure of Garrick et al. [61]. This adjusts the BVdata to account for family means, resulting in DEBVsthat contain information regarding individuals only,without parental BV influence.The theoretical accuracy of the EBVs ( r ) was calcu-lated following the procedure of Dutkowski et al. [62].r^¼ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi1−SE2i1þ Fið Þσ^2asð4Þwhere SEi is the standard error of breeding value, and Fiis the inbreeding coefficient of the ith individual.Thistlethwaite et al. BMC Genomics  (2017) 18:930 Page 13 of 16Cross-validation across time and spaceThe two GS models (RR-BLUP and GRR) were com-pared to assess their application to various traits, and inaddition they were cross-validated in order to assesstheir prediction accuracy across environments/ spatialdivisions and time. The intention was to give an indica-tion to which extent (if any) GS increases gain per unittime over traditional methods.The validation processes used was a replicated randomlyassigned 10-fold cross validation. Models were trained on9/10 of these folds, with the remaining 1/10 fold used as avalidation set. Prediction accuracy was determined as themean of the replications of the Pearson product-momentcorrelation between EBVs of the validation set and theirpredicted GEBVs. Or in the case where DEBVs were usedto train the models, the correlation between DEBVs andpredicted GEDBVs (genomic estimated deregressed breed-ing values). The following combinations were used: 1)within-site, using information from a single site to esti-mate the GEBVs within that same site, 2) Cross-site/ be-tween sites in all combinations, with information fromsingle sites used to predict other sites, 3) Combined sites,pooled information from all the sites, 4) Multi-site to sin-gle site, the pooled information from all three sites used inthe estimation of single sites only, 5) Two sites to predictone site, pooled information from two sites used in the es-timation of the remaining site only, and 6) Time-time(age-age)/ trait-trait, using information from individualtrees to obtain correlations between GEBVs at age 12 andEBVs at ages 35 for height and 38 for wood density. Inaddition, the same 10-fold cross-validation process wasused to assess the predictive accuracy of the ABLUPmodel for all the spatial cross-validation analyses for allthe attributes (HT12, HT35, and WDres) in the study. Inthis case predictions made were based on relatedness asgiven by the pedigree.Genomic selection analysisTwo GS methods were compared: ridge regression (RR-BLUP) and generalized ridge regression (GRR) [3]. Theperformance of each of the methods was assessed accord-ing to their predictive accuracy determined by the correl-ation between GEBVs and EBVs, or DEBVs and GEDBVs.Genomic estimated breeding values (GEBVs) and genomicestimated deregressed breeding values (GEDBVs)RR-BLUPRidge regression (RR-BLUP: Whittaker et al. [38]) wasproposed for use as a selection tool based on marker in-formation. The model was fitted using the R package‘RRBLUP’ [63], the GEBV (or GEDBV if using dereg-ressed BVs) is obtained by the sum of p marker effects:g xið Þ ¼Xpk¼1xikβk ð5Þwhere xik is the score (genotype) of SNP k in individual i,βk is the marker effect of k. This method assumes thatthe marker effects are normally distributed (mean = 0)and so the BLUP solutions for marker effects are derivedfrom solving mixed linear model equations of Henderson[64] so that ʎ is optimised in the following Eq:β ¼ Z′Z þ ʎI −1 Z′y ð6Þwhere Z is an incidence matrix relating markers to indi-viduals, I is an identity matrix, and y is the vector ofEBVs fitted in ASReml. The shrinkage parameter (ʎ) canbe written as ʎ = σe2/σβ2 (residual variance / commonmarker effect variance). Since marker effects are as-sumed to be identically distributed, all effects are shrunkequally towards zero. This method is equivalent to usinglines (as opposed to markers) as random effects in amixed model analysis, where the covariance is modelledby a kinship matrix calculated from the marker data (agenomic relationship matrix [G matrix]) (this is some-times referred to as GBLUP).GRRGeneralized ridge regression (GRR) is a two-step variableselection method, the first step obtains estimates in thesame way that RR-BLUP does using linear mixed modelanalysis to solve for optimum ʎ, and in the second step theBLUP for β is subjected to an alternative shrinkage param-eter which is marker specific, using GRR to solve a hetero-geneous error model which replaces Iʎ in (5) with diag(ʎ):β^¼ Z′Z þ diag ʎð Þ −1Z′y ð7ÞIn this case ʎ is a vector of p shrinkage parameters.For the kth element: ʎk ¼ σ2eb =σ2βkb , is the parameter, whereσ2βkb is the variance of marker effect k (σ2βkb ¼ β2^k= 1−hkkð )). βis from step 1 (the BLUP marker effect) and hkk is the in-fluence of the dependant variable on the fitted value forobservation k. In other words, hkk represents the diagonalelement (n + k) of the influence matrix H = T(T’T)−1 T’,and:T ¼ X Z0 diag ʎð Þ ð8ÞAbbreviationsABLUP: Pedigree-based Best Linear Unbiased Predictor; DEBV: Deregressedestimated breeding value; EBV: Estimated breeding value; GBLUP: Genomics-based Best Linear Unbiased Predictor; GRR: Generalized Ridge Regression;GS: Genomic selection; GxE: Genotype-environment interaction; HT12: Heightat age 12 years; LD: Linkage disequilibrium; MAS: Marker-assisted selection;QTL: Quantitative trait loci; RR-BLUP: Ridge Regression Best Linear UnbiasedPredictor; WDres: Wood density estimated using a ResistographThistlethwaite et al. BMC Genomics  (2017) 18:930 Page 14 of 16AcknowledgementsWe thank British Columbia Ministry of Forests, Lands and Natural ResourceOperations, Victoria, BC for data and trials access, M.F. Resende Jr. and L.G.Neves of RAPiD Genomics, Florida, USA for genotyping.FundingThis work was supported by Genome British Columbia (User PartnershipProgram (UPP-001) to YAK and MUS, NSERC Discovery Grant to YAK,TimberWest Forest Corp., Western Forest Products Inc. We declare that thefunding agencies did not participate in the design of the study andcollection, analysis, and interpretation of data and in writing the manuscript.Availability of data and materialsThe data sets supporting the results of this article will be available in theDryad Digital Repository upon acceptance (http://datadryad.org/).Authors’ contributionsYAK and MUS conceived and designed the experiment, FRT, BR, IP collectedthe data, FRT performed the data analyses and wrote the manuscript, BR, JK,IP, CC, MUS, YAK advised on the data analyses and edited the manuscript. Allauthors read and approved the final manuscript.Ethics approval and consent to participateAccess to progeny test trials was granted by The Ministry of Forests, Landsand Natural Resource Operations of British Columbia, Canada.Consent for publicationNot applicable.Competing interestsThe authors declare that they have no competing interests.Publisher’s NoteSpringer Nature remains neutral with regard to jurisdictional claims inpublished maps and institutional affiliations.Author details1Department of Forest and Conservation Sciences, Faculty of Forestry, TheUniversity of British Columbia, 2424 Main Mall, Vancouver, BC V6T 1Z4, Canada.2Scion (New Zealand Forest Research Institute Ltd.), 49 Sala Street,Whakarewarewa, Rotorua 3046, New Zealand. 3Department of Genetics andPhysiology of Forest Trees, Faculty of Forestry and Wood Sciences, CzechUniversity of Life Sciences Prague, Kamycka 129, 165 21 Praha 6, CzechRepublic. 4Département des sciences du bois et de la forêt, Université Laval, QC,Québec G1V 0A6, Canada. 5Department of Biochemistry and Molecular Biology,Oklahoma State University, Stillwater, OK 74078-3035, USA. 6British ColumbiaMinistry of Forests, Lands and Natural Resource Operations, Victoria, BC V8W9C2, Canada.Received: 28 March 2017 Accepted: 1 November 2017References1. Hayes B, Bowman P, Chamberlain A, Goddard M. Genomic selection in dairycattle: progress and challenges. J Dairy Sci. 2009;92:433–43.2. Grattapaglia D. Breeding forest trees by genomic selection: current progressand the way forward. In: Tuberosa R, et al., editors. Genomics of plantgenetic resources. Netherlands: Springer; 2014. p. 651–82.3. Lorenz AJ, Chao S, Asoro FG, Heffner EL, Hayashi T, Iwata H, Smith KP,Sorrells MK, Jannink JL. Genomic selection in plant breeding: knowledgeand prospects. Adv Agron. 2011;110:77.4. Fisher RA. The correlation between relatives on the supposition ofMendelian inheritance. Trans R Soc Edinburgh. 1918;52:399–433.5. Meuwissen T, Hayes B, Goddard M. Prediction of total genetic value usinggenome-wide dense marker maps. Genetics. 2001, 1819;1576. Grattapaglia D, Resende MD. Genomic selection in forest tree breeding.Tree Genet Genomes. 2011;7:241–55.7. Resende MD, Resende MF, Sansaloni CP, Petroli CD, Missiaggia AA, AguiarAM, Abad JM, Takahashi EK, Rosado AM, Faria DA, Pappas GJ Jr, Kilian A,Grattapaglia D. Genomic selection for growth and wood quality ineucalyptus: capturing the missing heritability and accelerating breeding forcomplex traits in forest trees. New Phytol. 2012;194:116–28.8. Resende MF, Muñoz P, Resende MD, Garrick DJ, Fernando RL, Davis JM,Jokela EJ, Martin TA, Peter GF, Kirst M. Accuracy of genomic selectionmethods in a standard data set of loblolly pine (Pinus taeda L.). Genetics.2012;190:1503–10.9. Beaulieu J, Doerksen T, Clément S, Mackay J, Bousquet J. Accuracy ofgenomic selection models in a large population of open-pollinated familiesin white spruce. Heredity. 2014;113:343–52.10. Beaulieu J, Doerksen TK, MacKay J, Rainville A, Bousquet J. Genomicselection accuracies within and between environments and small breedinggroups in white spruce. BMC Genomics. 2014;15:1048.11. Resende MFR, Muñoz P, Acosta JJ, Peter GF, Davis JM, Grattapaglia D,Resende MD, Kirst M. Accelerating the domestication of trees usinggenomic selection: accuracy of prediction models across ages andenvironments. New Phytol. 2012c;193:617–24.12. Gamal El-Dien O, Ratcliffe B, Klápště J, Chen C, Porth I, El-Kassaby YA.Prediction accuracies for growth and wood attributes of interior spruce inspace using genotyping-by-sequencing. BMC Genomics. 2015;16:370.13. Ratcliffe B, El-Dien OG, Klápště J, Porth I, Chen C, Jaquish B, El-Kassaby YA.Comparison of genomic selection models across time in interior spruce(Picea engelmannii × glauca) using unordered SNP imputation methods.Heredity. 2015;115:547–55.14. Bartholome J, van Heerwaarden J, Isik F, Boury C, Vidal M, Polmion C,Bouffier L. Performance of genomic prediction within and acrossgenerations in maritime pine. BMC Genomics. 2016;17:604.15. Isik F, Bartholome J, Farjat A, Chancerel E, Raffin A, Sanchez L, Plomion C,Bouffier L. Genomic selection in maritime pine. Plant Sci. 2016;242:108–19.16. Fuentes-Utrilla P., Goswami C., Cottrell J.E, Pong-Wong R., Law A., A’Hara S.W., Lee S.J., Woolliams J.A., QTL analysis and genomic selection usingRADseq derived markers in Sitka spruce: the potential utility of within familydata. Tree Genet Genomes 2017: 13: 33.17. Resende RT, Resende MDV, Silva FF, Azevedo CF, Takahashi EK, Silva-JuniorOB, Grattapaglia D. Assessing the expected response to genomic selectionof individuals and families in Eucalyptus breeding with an additive-dominant model. SSS. 2017:1–11.18. Müller BSF, Neves LG, de Almeida Filho JE, Resende MFR Jr, Muñoz PR, dosSantos PET, Filho EP, Kirst M, Grattapaglia D. Genomic prediction in contrastto a genome-wide association study in explaining heritable variation ofcomplex growth traits in breeding populations of Eucalyptus. BMCGenomics. 2017;18:524.19. Neves LG, Davis JM, Barbazuk WB, Kirst M. Whole-exome targetedsequencing of the uncharacterized pine genome. Plant J. 2013;75:146–56.20. Suren H, Hodgins KA, Yeaman S, Nurkowski KA, Smets P, Rieseberg LH,Aitken SN, Holliday JA. Exome capture from the spruce and pine giga-genomes. Mol. Ecol Res. 2016;16:1136–46.21. Neale DB, McGuire PE, Wheeler NC, Stevens KA, Crepeau MW, Cardeno C,Zimin AV, Puiu D, Pertea GM, Sezen UU, Casola C, Koralewski TE, Paul R,Gonzalez-Ibeas D, Zaman S, Cronn R, Yandell M, Holt C, Langley CH, YorkeJA, Salzberg SL, Wegrzyn JL. The Douglas-fir genome sequence revealsspecialization of the photosynthetic apparatus in Pinaceae. G3. 2017;https://doi.org/10.1534/g3.117.300078.22. Neale DB, Kremer A. Forest tree genomics: growing resources andapplications. Nat Rev Genet. 2011;12:111–22.23. Nystedt B, Street NR, Wetterbom A, Zuccolo A, Lin Y, Scofield DG, Vezzi F,Delhomme N, Giacomello S, Alexeyenko A, Vicedomini R, Sahlin K,Sherwood E, Elfstrand M, Gramzow L, Holmberg K, Hällman J, Keech O,Klasson L, Koriabine M, Kucukoglu M, Käller M, Luthman J, Lysholm F,Niittylä T, Olson Å, Rilakovic N, Ritland C, Rosselló JA, Sena J, Svensson T,Talavera-López C, Theißen G, Tuominen H, Vanneste K, Wu Z, Zhang B,Zerbe P, Arvestad L, Bhalerao R, Bohlmann J, Bousquet J, Gil RG, HvidstenTR, de Jong P, MacKay J, Morgante M, Ritland K, Sundberg B, Lee ThompsonS, Van de Peer Y, Andersson B, Nilsson O, Ingvarsson PK, Lundeberg J,Jansson S. The Norway spruce genome sequence and conifer genomeevolution. Nature. 2013;497:579–84.24. Gnirke A, Melnikov A, Maguire J, Rogov P, LeProust EM, Brockman W,Fennell T, Giannoukos G, Fisher S, Russ C, Gabriel S, Jaffe DB, Lander ES,Nusbaum C. Solution hybrid selection with ultra-long oligonucleotides formassively parallel targeted sequencing. Nat Biotechnol. 2009;27:182–9.25. Fu Y, Springer NM, Gerhardt DJ, Ying K, Yeh CT, Wu W, Swanson-Wagner R,D’Ascenzo M, Millard T, Freeberg L, Aoyama N, Kitzman J, Burgess D,Thistlethwaite et al. BMC Genomics  (2017) 18:930 Page 15 of 16Richmond T, Albert TJ, Barbazuk WB, Jeddeloh JA, Schnable PS. Repeatsubtraction-mediated sequence capture from a complex genome. Plant J.2010;62:898–909.26. Walsh T, Shahin H, Elkan-Miller T, Lee MK, Thornton AM, Roeb W, Abu Rayyan A,Loulus S, Avraham KB, King MC, Kanaan M. Whole exome sequencing andhomozygosity mapping identify mutation in the cell polarity protein GPSM2 as thecause of nonsyndromic hearing loss DFNB82. Am J Hum Genet. 2010;87:90–4.27. Kiezun A, Garimella K, Do R, Stitziel NO, Neale BM, McLaren PJ, Gupta N,Sklar P, Sullivan PF, Moran JL, Hultman CM, Lichtenstein P, Magnusson P,Lehner T, Shugart YY, Price AL, de Bakker PIW, Purcell SM, Sunyaev SR.Exome sequencing and the genetic basis of complex traits. Nat Genet.2012;44:623–30.28. Mertes F, El Sharawy A, Sauer S, van Helvoort JMLM, van der Zaag PJ,Franke A, Nilsson M, Lehrach H, Brookes AJ. Targeted enrichment ofgenomic DNA regions for next-generation sequencing. Brief FunctGenomics. 2011;10:374–86.29. Choi M, Scholl UI, Ji W, Liu T, Tikhonova IR, Zumbo P, Nayir A, Bakkaloğlu A,Ozen S, Sanjad S, Nelson-Williams C, Farhi A, Mane S, Lifton RP. Geneticdiagnosis by whole exome capture and massively parallel DNA sequencing.Proc Natl Acad Sci U S A. 2009;106:19096–101.30. Bodi K, Perera AG, Adams PS, Bintzler D, Dewar K, Grove DS, Kieleczawa J,Lyons RH, Neubert TA, Noll AC, Singh S, Steen R, Zianni M. Comparison ofcommercially available target enrichment methods for next-generationsequencing. J Biomol Tech. 2013;24:73–86.31. Rutkoski JE, Poland J, Jannink J-L, Sorrells ME. Imputation of unorderedmarkers and the impact on genomic selection accuracy. G3: Genes|Genomes| Genetics. 2013;3:427–39.32. Poland J, Endelman J, Dawson J, Rutkoski J, Wu S, Manes Y, Dreisigacker S,Crossa J, Sánchez-Villeda H, Sorrells M, Jannink J. Genomic selection in wheatbreeding using genotyping-by-sequencing. Plant Genome. 2012;5:103–13.33. Ng SB, Turner EH, Robertson PD, Flygare SD, Bigham AW, Lee C, Shaffer T,Wong M, Bhattacharjee A, Eichler EE, Bamshad M, Nickerson DA, Shendure J.Targeted capture and massively parallel sequencing of 12 human exomes.Nature. 2009;461:272–6.34. De La Torre AR, Birol I, Bousquet J, Ingvarsson PK, Jansson S, Jones SJM, KeelingCI, MacKay J, Nilsson O, Ritland K, Street N, Yanchuk A, Zerbe P, Bohlmann J.Insights into conifer Giga-genomes. Plant Physiol. 2014;166(4):1724–32.35. Mӓrtens K, Hallin J, Warringer J, Liti G, Parts L. Predicting quantitative traitsfrom genome and phenome with near perfect accuracy. Nat Commun.2016;7:11512.36. Xu S. Theoretical basis of the Beavis effect. Genetics. 2003;165:2259–68.37. Lande R, Thompson R. Efficiency of marker-assisted selection in theimprovement of quantitative traits. Genetics. 1990;124:743–56.38. Whittaker JC, Thompson R, Denham MC. Marker-assisted selection usingridge regression. Genet Res. 2000;75:249–52.39. Shen X, Alam M, Fikse F, Rönnegård LA. Novel generalized ridge regressionmethod for quantitative genetics. Genetics. 2013;193:255–1268.40. Gianola D, van Kaam JB. Reproducing kernel Hilbert spaces regressionmethods for genomic assisted prediction of quantitative traits. Genetics.2008;178:2289–303.41. Yanchuk AD. General and specific combining ability from disconnectedpartial diallels of coastal Douglas-fir. Silvae Genet. 1996;45:37–45.42. El-Kassaby YA, Park Y-S. Genetic variation and correlation in growth,biomass, and phenology pf Douglas-fir diallel progeny at different spacings.Silvae Genet. 1993;42:289–97.43. Krakowski J, Park Y-S, El-Kassaby YA. Early testing of Douglas-fir: wooddensity and ring width. For Genet. 2005;12:99–105.44. Beaulieu J, Doerksen T, Boyle B, Clément S, Deslauriers M, Beauseigle S, BlaisS, Poulin P-L, Lenz P, Caron S, Rigault P, Bicho P, Bousquet J, Mackay J.Association genetics of wood physical traits in the conifer white spruce andrelationships with gene expression. Genetics. 2011;188:197–214.45. Burdon RD. Genetic correlation as a concept for studying genotype-environment interaction in forest tree breeding. Silvae Genet. 1977;26:168–75.46. Owino F. Genotype x environment interaction and genotypic stability inloblolly pine. Silvae Genet. 1977;26:21–6.47. Matheson AC, Raymond CA. The impact of genotype x environment interactionon Australian Pinus Radiata breeding programs. Aust. For Res. 1984;14:11–25.48. Matheson AC, Cotterill PP. Utility of genotype x environment interactions.For Ecol Manag. 1990;30:159–74.49. Magnussen S, Yanchuk AD. Selection age and risk: finding the compromise.Silvae Genet. 1993;42:25–40.50. Lerceteau E, Szmidt AE, Andersson B. Detection of quantitative trait loci inPinus Sylvestris L. across years. Euphytica. 2001;121:117–22.51. Ratcliffe B, Gamal el-Dien O, Klápště J, Porth I, Chen C, Jaquish B, el-KassabyYA. A comparison of genomic selection models across time in interiorspruce (Picea engelmannii x glauca) using unordered SNP imputationmethods. Heredity. 2015;115:547–55.52. White TL, Adams WT, Neale DB. Forest genetics. Cabi. 2007;53. Neale DB, Savolainen O. Association genetics of complex traits in conifers.Trends Plant Sci. 2004;9:325–30.54. Ivanova NV, Fazekas AJ, Hebert PDN. Semi-automated, membrane-basedprotocol for DNA isolation from plants. Plant Mol Biol Rep. 2008;26:186–98.55. Howe GT, Yu J, Knaus B, Cronn R, Kolpak S, Dolan P, Lorenz WW, DeanJFDASNP. Resource for Douglas-fir: de novo transcriptome assembly andSNP detection and validation. BMC Genomics. 2013;14:137.56. Lee WP, Stromberg MP, Ward A, Stewart C, Garrison EP, Marth GTMOSAIK.A hash-based algorithm for accurate NextGeneration sequencing short-readmapping. PLoS One. 2014;9(3):e90581.57. Garrison E, Marth G. Haplotype-based variant detection from short-readsequencing. arXiv preprint arXiv:1207.3907 [q-bio.GN]. 2012;58. El-Kassaby YA, Mansfield S, Isik F, Stoehr M. In situ wood quality assessmentin Douglas-fir. Tree Genet Genomes. 2011;7:553–61.59. Gilmour A.R., Gogel B., Cullis B., Thompson R.;ASReml user guide release 3.0.200960. Cappa EP, Stoehr MU, Xie C-Y, Yanchuk AD. Identification and jointmodeling of competition effects and environmental heterogeneity in threeDouglas-fir (Pseudotsuga Menziesii Var. Menziesii) trials. Tree GenetGenomes. 2016;12:102.61. Garrick DJ, Taylor JF, Fernando RL. Deregressing estimated breeding valuesand weighting information for genomic regression analyses. Genet Sel Evol.2009:41–55.62. Dutkowski GW, eSilva JC, Gilmour AR, Lopez GA. Spatial analysis methodsfor forest genetic trials. Can J For Res. 2002;32:2201–14.63. Endelman JB. Ridge regression and other kernels for genomic selectionwith R package rrBLUP. Plant Genome. 2011;4:250–5.64. Henderson C. A simple method for computing the inverse of a numeratorrelationship matrix used in prediction of breeding values. Biometrics. 1976;32:69–83.•  We accept pre-submission inquiries •  Our selector tool helps you to find the most relevant journal•  We provide round the clock customer support •  Convenient online submission•  Thorough peer review•  Inclusion in PubMed and all major indexing services •  Maximum visibility for your researchSubmit your manuscript atwww.biomedcentral.com/submitSubmit your next manuscript to BioMed Central and we will help you at every step:Thistlethwaite et al. BMC Genomics  (2017) 18:930 Page 16 of 16


Citation Scheme:


Citations by CSL (citeproc-js)

Usage Statistics



Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            async >
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:


Related Items