UBC Faculty Research and Publications

Prediction accuracies for growth and wood attributes of interior spruce in space using genotyping-by-sequencing Gamal El-Dien, Omnia; Ratcliffe, Blaise; Klápště, Jaroslav; Chen, Charles; Porth, Ilga; El-Kassaby, Yousry A May 9, 2015

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata


52383-12864_2015_Article_1597.pdf [ 1.17MB ]
JSON: 52383-1.0223874.json
JSON-LD: 52383-1.0223874-ld.json
RDF/XML (Pretty): 52383-1.0223874-rdf.xml
RDF/JSON: 52383-1.0223874-rdf.json
Turtle: 52383-1.0223874-turtle.txt
N-Triples: 52383-1.0223874-rdf-ntriples.txt
Original Record: 52383-1.0223874-source.json
Full Text

Full Text

RESEARCH ARTICLE Open AccessPrediction accuracies for growth and woodattributes of interior spruce in space usinggKeywords: Interior spruce, Genomic selection, Genotyping-by-sequencing, Open-pollinated families, Genotype xGamal El-Dien et al. BMC Genomics  (2015) 16:370 DOI 10.1186/s12864-015-1597-yV6T 1Z4, CanadaFull list of author information is available at the end of the articleenvironment interaction, Imputation methods, Multi-trait GS* Correspondence: y.el-kassaby@ubc.ca1Department of Forest and Conservation Sciences, Faculty of Forestry, TheUniversity of British Columbia, 2424 Main Mall, Vancouver, British ColumbiaOmnia Gamal El-Dien1, Blaise Ratcliffe1, Jaroslav Klápště1,2, Charles Chen3, Ilga Porth1 and Yousry A El-Kassaby1*AbstractBackground: Genomic selection (GS) in forestry can substantially reduce the length of breeding cycle and increasegain per unit time through early selection and greater selection intensity, particularly for traits of low heritabilityand late expression. Affordable next-generation sequencing technologies made it possible to genotype largenumbers of trees at a reasonable cost.Results: Genotyping-by-sequencing was used to genotype 1,126 Interior spruce trees representing 25 open-pollinatedfamilies planted over three sites in British Columbia, Canada. Four imputation algorithms were compared (mean value(MI), singular value decomposition (SVD), expectation maximization (EM), and a newly derived, family-based k-nearestneighbor (kNN-Fam)). Trees were phenotyped for several yield and wood attributes. Single- and multi-site GS predictionmodels were developed using the Ridge Regression Best Linear Unbiased Predictor (RR-BLUP) and the GeneralizedRidge Regression (GRR) to test different assumption about trait architecture. Finally, using PCA, multi-trait GS predictionmodels were developed. The EM and kNN-Fam imputation methods were superior for 30 and 60% missing data,respectively. The RR-BLUP GS prediction model produced better accuracies than the GRR indicating that the geneticarchitecture for these traits is complex. GS prediction accuracies for multi-site were high and better than those ofsingle-sites while multi-site predictability produced the lowest accuracies reflecting type-b genetic correlations anddeemed unreliable. The incorporation of genomic information in quantitative genetics analyses produced more realisticheritability estimates as half-sib pedigree tended to inflate the additive genetic variance and subsequently bothheritability and gain estimates. Principle component scores as representatives of multi-trait GS prediction modelsproduced surprising results where negatively correlated traits could be concurrently selected for using PCA2 and PCA3.Conclusions: The application of GS to open-pollinated family testing, the simplest form of tree improvement evaluationmethods, was proven to be effective. Prediction accuracies obtained for all traits greatly support the integration of GS intree breeding. While the within-site GS prediction accuracies were high, the results clearly indicate that single-site GSmodels ability to predict other sites are unreliable supporting the utilization of multi-site approach. Principle componentscores provided an opportunity for the concurrent selection of traits with different phenotypic optima.genotyping-by-sequencin© 2015 Gamal El-Dien et al.; licensee BioMed Central. This is an Open Access article distributed under the terms of theCreative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use,distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons PublicDomain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in thisarticle, unless otherwise stated.Gamal El-Dien et al. BMC Genomics  (2015) 16:370 Page 2 of 16BackgroundTree improvement programs are long-term and resourcedemanding endeavors requiring repeated cycles of selec-tion, breeding and testing. Most of conventional treebreeding programs face major challenges; including, longbreeding cycles, large field experiments planted over vastterritory, late expression of economic traits (e.g., wooddensity), and low to medium heritability of traits [1].The phenotypic selection approach coupled with longtesting phase often result in slow accumulation of gen-etic gain per unit time and cost [2]. Plant breedersadopted Marker-Assisted-Selection (MAS) to take ad-vantage of the linkage disequilibrium (LD) between gen-etic markers and Quantitative Trait Loci (QTLs) andrealized the method’s potential to increase breeding effi-ciency [3,4]. Similarly, tree breeders perceived MAS as ameans to reduce the time required for phenotypic selec-tion, increasing selection intensity, and improving selec-tion precision particularly for low heritability and lateexpressing traits as well as its ability to overcome majorconventional breeding obstacles such as the long andcostly breeding cycle [5,6]. However, MAS faced severalchallenges; as most associations were limited to onlyspecific genetic background due to the rapidly decayingLD in forest trees, the interaction of QTLs effects withthe genetic background, the genotype by environment(GxE) interaction, and the fluctuation of the alleles fre-quency over generations [7]. The complex nature ofquantitative traits [8] rendered MAS ineffective in bothanimal and crop breeding and few successes mostly in-volving traits with simple inheritance (e.g., disease re-sistance) were reported [9,10].Meuwissen et al. [11] introduced Genomic Selection(GS) as a method that collectively uses the genome-widemarker data in predicting the phenotype by estimatingthe genomic breeding values for each individual. Themajor advantage of GS is that it does not require theidentification of the QTLs or linked markers with targettraits as all marker effects are estimated simultaneouslyand used to develop the prediction model for estimatingGenomic Estimated Breeding Values (GEBV) for eachindividual. Thus, this method is suitable for selection oftraits with complex genetic architecture as it does notrely on the identification of a single causal variant, ra-ther it fits the genetic effects of all markers regardless oftheir known functional relevance [11,12]. In forest treebreeding context, GS has the ability to predict thephenotype for selecting elite genotypes at early age anddevelopmental stage, thus substantially shortening thebreeding cycle and increasing the selection differential,ultimately raising the genetic gain per unit time [13-16].The time savings involve tree testing (for late expressingtraits in particular), which is not needed in the next fewgenerations with GS being implemented in the coniferbreeding program, thus providing 15–25 years antici-pated savings [16].The development of Next-Generation-Sequencing(NGS) technologies and the implementation of geneticmarkers from sequence data in quantitative geneticsrelated to GS, the Genomic Best Linear Unbiased Pre-dictor (GBLUP) [17], and the unified single-stepevaluation approach (also known as HBLUP, single-step combining pedigree and realized kinship informa-tion) [18] have created novel opportunities for breeding,including forest trees [2,19-21]. Genotyping-By-Sequencing(GBS) [22], of the NGS technologies, offers a promising op-portunity in studying non-model species includingthose with large and complex genomes with no assem-bled reference sequence such as conifers [23]. GBS usesrestriction enzymes to allow the sequencing of a re-duced subset of the studied genome and the resultingfragments are DNA barcoded to permit multiplexed se-quencing. GBS has made genome-wide population stud-ies possible due to the affordability of the method andits capability of resolving tens of thousands of markersscattered throughout the genome.In this study, using GBS as a genotyping platform, wedeveloped GS prediction models in a dataset of 1,126 In-terior spruce trees representing 25 open-pollinated fam-ilies replicated over three sites in British Columbia (BC),Canada. White and Interior spruce are one of the mosteconomically important forest tree species in BC. In-terior spruce is a complex of white spruce (Piceaglauca (Moench) Voss), Engelmann spruce (Piceaengelmannii Parry), and their hybrids and, because oftheir similar growing habitats and silvicultural require-ments, they are often collectively treated as one com-plex [24]. While white spruce shows transcontinentaldistribution, the natural distribution of Engelmannspruce is much more limited and scattered and in BCprovince is confined to the northern part of central BC.Hybridization occurs mainly at mid elevations, wheretheir distributions overlap. Recently, extensive geneticand genomic resources became available for this species(4.9 million scaffolds from the 20.8 giga base pairs draftgenome of Interior spruce individual PG29, Birol et al.[25]; 21,840 spruce ESTs microarray employed in genet-ical genomics of interior spruce progenies [26]). The ob-jectives of the present study were to: 1) evaluate theefficiency of GBS as a rapid genetic marker genotypingplatform for GS studies, 2) investigate different imput-ation algorithms for GBS data on GS prediction accur-acy, 3) compare two GS approaches (Ridge regressionbest linear unbiased predictor (RR-BLUP) and general-ized ridge regression (GRR)), 4) investigate the hetero-geneous GxE effect on GS prediction accuracy in space,and 5) use PCA in the comparisons of multi- vs. single-trait GS prediction models.ResultsGenotyping, missing data imputation, and selection ofimputation methodIn this study, 1,126 38-year-old Interior spruce trees(Picea glauca (Moench) Voss x Picea engelmannii Parryex Engelm.) originating from 25 open-pollinated familiesselected for their superior growth traits were sampledfrom the progeny test trial planted on three sites, (1)Aleza Lake, (2) Prince George Tree Improvement Station(PGTIS), and (3) Quesnel. A cost-effective NGS technol-ogy, genotyping-by-sequencing (GBS), was employed forkNN-Fam-60% and SVD-60% produced better accuraciesGamal El-Dien et al. BMC Genomics  (2015) 16:370 Page 3 of 16genotyping a 20GB unassembled genome such as spruce.After two 48-multiplexed sequencing passes, a total of4,798,791,310 good barcoded reads was generated, andthe median of read depth per site was at 3.92 (averaged4.58 ± 4.28). TASSEL UNEAK SNP calling pipeline wasused to determine SNP polymorphism for these 1,126spruce trees, resulting in a large genotype table of1,232,406 SNPs [23,27]. Typical to GBS, a low coveragesequence platform, many markers tended to have miss-ing data even after the repeated sequencing of all studiedtrees (see Discussion, for more details). From the identi-fied 1,232,406 SNPs, the applied imputation methodsand filtering (minimum minor allele frequency of 0.05)used produced genotyping files ranged from 8,868 (MI-30% and EM-30%) to 62,618 (kNN-Fam-60%) SNPs(Table 1). Imputation accuracy ranges from 0.77 (SVD10 iterations) to 0.82 (SVD with 2 iterations). On aver-age, SVD with 2 iterations produced the best accuracy inthe four currently existing methods: MI, SVD, EM andkNN. Using K’s (in K-nearest neighbors) from family ver-sus non-family members, accuracy for kNN-Fam imput-ation ranged from 0.77 to 0.85. In general, including morefamily members resulted in higher accuracy (Additionalfile 1); however, imputation accuracy remained unchanged(and did not improve), when the number of non-familymembers that was included was larger than the familysize. The best imputation accuracy gained was at K1 = 5and K2 = 20, which represented the K values used in thisstudy for imputing the whole SNP table (Additional file 1).Table 1 Imputation methods used for genotyping-by-sequencing dataImputationmethod1Missing datathresholdImputationalgorithm# ofSNPsMI 30% Mean imputation (MI) 8,868MI 60% Mean imputation (MI) 47,521EM 30% Expectation-maximization (EM) 8,868kNN-Fam 60% Family-based K-nearestneighbor (kNN-Fam)62,198SVD 60% Singular Value 55,618Decomposition (SVD)1See main text for abbreviations.comparing to MI-60%; however, the kNN-Fam-60% wassuperior to SVD-60% (see below). This comparison wasdone based on GS prediction accuracies produced forthe two GS models and the seven studied traits for bothsingle- and multi-site scenarios (see below).Trait heritabilitiesUsing genotypes resulting from the EM-30% algorithmimputed data, the narrow-sense heritabilities of the traitsestimated from the pedigree (ABLUP, i.e. the conven-tional BLUP model using the pedigree-based relationshipmatrix) and genomic best linear unbiased predictors(GBLUP using the genomic-based realized kinshipmatrix) produced several broad generalizations that in-clude: 1) single- and multi-site heritabilities were higherfor ABLUP than those from their GBLUP counterparts,2) multi-site heritabilities were lower than that of a sin-gle site for both ABLUP and GBLUP, 3) trait heritabil-ities varied among sites for both ABLUP and GBLUP;however, the differences were lower for the GBLUP thanthat of the ABLUP, 4) the Quesnel site produced higherheritabilities than PGTIS and Aleza Lake, yet they havesome overlapping ranges, and 5) standard error esti-mates of heritabilities obtained from ABLUP were higherthan those from GBLUP for single- and multi-site(Table 2). Lower GBLUP heritabilities were expected asABLUP tended to inflate the estimates as the pedigreebased analysis assumptions are often violated due tomating pattern, relatedness built-up due to populationhistory, and inability to separate common environmenteffect from genetics.Prediction accuracy for different GS models andimputation methodsThe accuracy of GS models (RR-BLUP and GRR) in pre-dicting the GEBV were evaluated for the seven studiedtraits using all imputation methods (30% missing data:MI and EM, and 60% missing data: MI, kNN-Fam, andSVD) and over the four cross-validation scenarios: 1)within each individual site, 2) cross-site (all possiblecombinations), 3) within multi-site (the three sites com-As a result, we chose kNN-Fam over kNN of Troyanskayaet al. [28] due to its slight superiority in accuracy. The SNPtable imputed with this method is referred to as kNN-Fam.The selection of specific imputation methods for gen-omic selection analyses were restricted to the methodwith greater GS accuracies within the same percentageof missing data class (i.e., 30% vs. 60%). For the 30%missing data, the EM-30% produced greater accuracythan MI-30%, similarly for the 60% missing data, thebined), and 4) the multi-site population in predicting in-dividual site (see below).Within site GS accuraciesAcross all imputation methods (30% and 60% missingdata), the RR-BLUP produced higher within site GEBVaccuracies than the GRR (Tables 3 and 4, Figure 1,Additional file 2). In general, the RR-BLUP producedhigher accuracies than the GRR (100 out of the possible105 comparisons for both GS models) and this was alsotraits for RR-BLUP (traits averages were 0.51, 0.50, and0.46 as opposed to 0.52, 0.51, and 0.46 for PGTIS, AlezaLake, and Quesnel sites, respectively) and GRR (averageswere 0.49, 0.43, and 0.41 vs. 0.49, 0.46, and 0.41 forPGTIS, Aleza Lake, and Quesnel sites, respectively)(Table 3). The 60% missing data imputation methods pro-duced similar GS prediction and confirmed the superiorityTable 2 Multi- and single site heritability estimates and their standard errors using pedigree (ABLUP) and genomic(GBLUP) best linear unbiased predictorsTrait ABLUP GBLUP (EM-30%)Multi-site Single site Multi-site Single sitePGTIS Aleza L. Quesnel PGTIS Aleza L. QuesnelHT 0.35 ± 0.14 0.64 ± 0.22 0.43 ± 0.19 0.98 ± 0.02 0.20 ± 0.06 0.50 ± 0.15 0.32 ± 0.14 0.56 ± 0.13DBH 0.05 ± 0.08 0.39 ± 0.17 0.28 ± 0.15 0.55 ± 0.19 0.07 ± 0.06 0.37 ± 0.15 0.26 ± 0.13 0.53 ± 0.15VOL 0.09 ± 0.10 0.45 ± 0.18 0.29 ± 0.15 0.76 ± 0.23 0.09 ± 0.06 0.42 ± 0.15 0.27 ± 0.13 0.60 ± 0.15VDir 0.28 ± 0.12 0.31 ± 0.15 0.38 ± 0.17 0.78 ± 0.24 0.12 ± 0.06 0.17 ± 0.11 0.37 ± 0.15 0.49 ± 0.14WDres 0.27 ± 0.12 0.59 ± 0.21 0.65 ± 0.22 0.42 ± 0.15 0.10 ± 0.06 0.49 ± 0.15 0.28 ± 0.13 0.42 ± 0.14WDX-ray 0.38 ± 0.14 0.55 ± 0.20 0.48 ± 0.19 0.59 ± 0.20 0.18 ± 0.06 0.28 ± 0.13 0.39 ± 0.15 0.43 ± 0.13MoEd 0.28 ± 0.12 0.31 ± 0.15 0.38 ± 0.17 0.78 ± 0.24 0.12 ± 0.06 0.17 ± 0.11 0.37 ± 0.15 0.49 ± 0.14Traits are HT: height in m; DBH: diameter at breast height in cm; VOL: stem volume in m3; VDir: acoustic velocity in km/s; WDRes: resistance to drilling; WDX-ray:wood density in kg/m3 using X-ray densitometry; MoEd: dynamic modulus of elasticity.entaGamal El-Dien et al. BMC Genomics  (2015) 16:370 Page 4 of 16mirrored by their standard error estimates (Tables 3 and4). Within the 30% missing data imputation methods, theEM-30% produced greater accuracy than MI-30% for allTable 3 Within site (PGTIS, Aleza Lake (AL), and Quesnel) gerrors for RR-BLUP and GRR models across 30% missing daTrait GS model Imputation methodMI-30%PGTIS ALHT RR-BLUP 0.48 ± 0.0031 0.46 ± 0.002GRR 0.44 ± 0.003 0.45 ± 0.010DBH RR-BLUP 0.58 ± 0.002 0.55 ± 0.003GRR 0.54 ± 0.003 0.47 ± 0.017VOL RR-BLUP 0.56 ± 0.002 0.54 ± 0.003GRR 0.52 ± 0.003 0.50 ± 0.004VDir RR-BLUP 0.55 ± 0.002 0.54 ± 0.002GRR 0.52 ± 0.003 0.48 ± 0.004WDRes RR-BLUP 0.47 ± 0.003 0.37 ± 0.003GRR 0.46 ± 0.005 0.34 ± 0.005WDX-ray RR-BLUP 0.41 ± 0.003 0.49 ± 0.003GRR 0.41 ± 0.004 0.25 ± 0.011MoEd RR-BLUP 0.55 ± 0.003 0.55 ± 0.002GRR 0.53 ± 0.004 0.51 ± 0.003Ave. RR-BLUP 0.51 ± 0.062 0.50 ± 0.067GRR 0.49 ± 0.051 0.43 ± 0.097Traits are HT: height in m; DBH: diameter at breast height in cm; VOL: stem volumewood density in kg/m3 using X-ray densitometry; MoEd: dynamic modulus of elasticof the RR-BLUP over GRR and additionally highlightingthe better accuracies for kNN-Fam-60% compared to MI-60% and SVD-60% (Table 4).omic selection prediction accuracies and their standardimputation methods (MI-30% and EM-30%)ME-30%Quesnel PGTIS AL Quesnel0.33 ± 0.003 0.50 ± 0.003 0.48 ± 0.003 0.35 ± 0.0040.27 ± 0.007 0.46 ± 0.005 0.45 ± 0.005 0.29 ± 0.0060.53 ± 0.004 0.58 ± 0.003 0.55 ± 0.002 0.53 ± 0.0030.51 ± 0.006 0.53 ± 0.004 0.49 ± 0.006 0.51 ± 0.0030.44 ± 0.003 0.55 ± 0.004 0.54 ± 0.002 0.45 ± 0.0020.42 ± 0.006 0.53 ± 0.004 0.49 ± 0.004 0.41 ± 0.0060.41 ± 0.004 0.55 ± 0.003 0.55 ± 0.002 0.41 ± 0.0040.31 ± 0.006 0.52 ± 0.013 0.50 ± 0.005 0.33 ± 0.0040.59 ± 0.003 0.49 ± 0.003 0.39 ± 0.004 0.59 ± 0.0030.54 ± 0.005 0.44 ± 0.009 0.33 ± 0.007 0.54 ± 0.0050.50 ± 0.002 0.43 ± 0.003 0.48 ± 0.003 0.50 ± 0.0010.50 ± 0.004 0.42 ± 0.003 0.46 ± 0.020 0.50 ± 0.0020.40 ± 0.004 0.55 ± 0.002 0.55 ± 0.002 0.39 ± 0.0030.30 ± 0.006 0.55 ± 0.004 0.52 ± 0.005 0.29 ± 0.0060.46 ± 0.088 0.52 ± 0.051 0.51 ± 0.060 0.46 ± 0.0850.41 ± 0.113 0.49 ± 0.052 0.46 ± 0.063 0.41 ± 0.108in m3; VDir: acoustic velocity in km/s; WDRes: resistance to drilling; WDX-ray:ity.entam. El-Dien et al. BMC Genomics  (2015) 16:370 Page 5 of 16Table 4 Within site (PGTIS, Aleza Lake (AL), and Quesnel) gerrors for RR-BLUP and GRR models across 60% missing da60%)Trait GS model Imputation methodMI-60% kNN-FaPGTIS AL Quesnel PGTISHT RR-BLUP 0.54 ± 0.002 0.51 ± 0.003 0.40 ± 0.002 0.55 ± 0GRR 0.51 ± 0.005 0.45 ± 0.011 0.34 ± 0.007 0.51 ± 0DBH RR-BLUP 0.62 ± 0.002 0.60 ± 0.002 0.56 ± 0.003 0.62 ± 0GRR 0.59 ± 0.009 0.58 ± 0.004 0.53 ± 0.006 0.59 ± 0VOL RR-BLUP 0.60 ± 0.002 0.58 ± 0.003 0.49 ± 0.003 0.61 ± 0GRR 0.58 ± 0.005 0.55 ± 0.006 0.44 ± 0.009 0.58 ± 0VDir RR-BLUP 0.62 ± 0.002 0.57 ± 0.002 0.46 ± 0.003 0.63 ± 0GRR 0.59 ± 0.005 0.51 ± 0.010 0.40 ± 0.006 0.60 ± 0WDRes RR-BLUP 0.53 ± 0.002 0.44 ± 0.002 0.62 ± 0.002 0.55 ± 0GRR 0.46 ± 0.007 0.36 ± 0.009 0.58 ± 0.004 0.47 ± 0WDX-ray RR-BLUP 0.49 ± 0.002 0.51 ± 0.002 0.53 ± 0.003 0.51 ± 0GRR 0.45 ± 0.006 0.47 ± 0.005 0.49 ± 0.009 0.48 ± 0MoEd RR-BLUP 0.62 ± 0.001 0.57 ± 0.002 0.45 ± 0.002 0.64 ± 0GRR 0.60 ± 0.003 0.52 ± 0.007 0.38 ± 0.007 0.61 ± 0Ave. RR-BLUP 0.57 ± 0.053 0.54 ± 0.056 0.50 ± 0.074 0.59 ± 0Multi-sites GS accuraciesUnlike within site cross-validation, testing the applic-ability of a GS model for a specific site to predict theGEBV of other sites generally produced lower accur-acies for both models (RR-BLUP and GRR) (Figure 1,Additional files 2 and 3). This is expected due to theGxE interaction even when the three sites are locatedwithin one breeding zone (Prince George Seed PlanningZone (http://www.for.gov.bc.ca/hfd/pubs/docs/mr/an-nual/ar_1995-96/pspzm.htm)). For simplicity, in thissection we will restrict the cross-sites comparisons to theimputation method with the highest number of SNPs (i.e.,kNN-Fam-60% (62,198 SNPs)), and the GS model withhighest accuracies (i.e., RR-BLUP (Additional file 2)). Overthe seven studied traits, the RR-BLUP model producedcross-site validation accuracies ranging from 0.16 and 0.23when PGTIS was used to predict the GEBV of Aleza Lake(1→2), 0.13 and 0.24 for 2→1, 0.01 and 0.32 for PGTIS topredict Quesnel (1→3), 0.0 and 0.38 for 3→1, 0.06 and0.36 for 2→3, and 0.03 and 0.39 for 3→2 (Additional files2 and 3). The estimated type-b genetic correlations be-tween sites mimicked the trend observed for cross sitesGS accuracy with their Pearson-product-moment correla-tions ranging between 0.94 and 0.99 (P < 0.05) over theseven studied traits for the kNN-Fam-60% imputationmethod (Figure 2).GRR 0.54 ± 0.065 0.49 ± 0.073 0.45 ± 0.086 0.55 ± 0.06Traits are HT: height in m; DBH: diameter at breast height in cm; VOL: stem volumewood density in kg/m3 using X-ray densitometry; MoEd: dynamic modulus of elasticomic selection prediction accuracies and their standardimputation methods (MI-60%, kNN-Fam-60% and SVD--60% SVD-60%AL Quesnel PGTIS AL Quesnel2 0.56 ± 0.002 0.42 ± 0.002 0.53 ± 0.003 0.50 ± 0.004 0.42 ± 0.0035 0.51 ± 0.006 0.39 ± 0.005 0.51 ± 0.004 0.47 ± 0.006 0.37 ± 0.0051 0.63 ± 0.002 0.55 ± 0.002 0.60 ± 0.002 0.59 ± 0.003 0.54 ± 0.0035 0.62 ± 0.004 0.53 ± 0.004 0.59 ± 0.002 0.57 ± 0.004 0.52 ± 0.0042 0.63 ± 0.001 0.47 ± 0.002 0.59 ± 0.002 0.57 ± 0.003 0.48 ± 0.0033 0.59 ± 0.005 0.44 ± 0.005 0.58 ± 0.003 0.56 ± 0.004 0.45 ± 0.0052 0.61 ± 0.002 0.49 ± 0.002 0.58 ± 0.002 0.55 ± 0.002 0.46 ± 0.0033 0.57 ± 0.005 0.46 ± 0.006 0.57 ± 0.004 0.53 ± 0.003 0.42 ± 0.0042 0.49 ± 0.002 0.62 ± 0.002 0.56 ± 0.003 0.46 ± 0.004 0.58 ± 0.0025 0.44 ± 0.007 0.59 ± 0.005 0.54 ± 0.003 0.43 ± 0.005 0.56 ± 0.0032 0.53 ± 0.002 0.53 ± 0.002 0.50 ± 0.002 0.50 ± 0.002 0.50 ± 0.0035 0.50 ± 0.006 0.48 ± 0.009 0.49 ± 0.005 0.49 ± 0.003 0.49 ± 0.0041 0.61 ± 0.001 0.49 ± 0.002 0.59 ± 0.003 0.54 ± 0.004 0.45 ± 0.0044 0.58 ± 0.004 0.45 ± 0.004 0.58 ± 0.002 0.52 ± 0.004 0.41 ± 0.0050 0.58 ± 0.054 0.51 ± 0.064 0.56 ± 0.037 0.53 ± 0.045 0.49 ± 0.055Within multi-site GS accuraciesSimilar to within site assessment, the within multi-sitecross-validation produced higher GEBV accuracies forRR-BLUP as compared to GRR and this increase in ac-curacy persisted across all 30% and 60% missing dataimputation methods (Table 5). Comparisons betweenimputation methods revealed that EM-30% and kNN-Fam-60% produced better accuracies (Table 5, Figure 1,Additional file 2). Again, we will restrict the GEBV ac-curacy comparisons to the kNN-Fam-60% imputationmethod as it uses the largest number of SNPs (62,198SNPs). On average and across the seven studied traits,GS accuracies ranged between 0.62 and 0.77 for bothRR-BLUB and GRR (Table 5). The span of this range isfar greater than the one observed within sites and cross-sites validation (Tables 2, 3 and 4). These estimatesrepresent the most realistic accuracies as they accommo-dated the GxE interaction and, furthermore, were pro-duced with a large training population size (90% of thetotal N = 1,126).Single- vs. multi-site accuraciesWhen the meta-population was used to predict theGEBV for each individual site, the observed accuracieswere high with Aleza Lake producing the highest accur-acies (average over the 7 traits of 0.49 for RR-BLUP and0 0.54 ± 0.063 0.48 ± 0.065 0.55 ± 0.039 0.51 ± 0.050 0.46 ± 0.067in m3; VDir: acoustic velocity in km/s; WDRes: resistance to drilling; WDX-ray:ity.Gamal El-Dien et al. BMC Genomics  (2015) 16:370 Page 6 of 16GRR) followed by Quesnel (averages of 0.46 and 0.45 forRR-BLUP and GRR, respectively) and PGTIS which pro-duced the lowest accuracies (average of 0.42 for bothRR-BLUP and GRR) (Table 6). These accuracies arehigher than those observed for the cross-site validation(Table 6, Figure 1, Additional file 2).Multi-trait GS prediction modelsThe first three principle components, PCA1-3, collect-ively accounted for 86% of the total phenotypic variationand individually accounted for 44, 25, and 17%, respect-ively. PCA1 produced significant (P < 0.002 - 0.0001)loading for all the studied traits and was positive forheight (HT) (0.69), diameter at breast height (DBH)(0.80), and acoustic velocity (VDir) (0.09) and negativefor wood density using X-ray densitometry (WDX-ray)(−0.71) and wood density using resistance to drilling(WDRes) (−0.75). PCA2 produced interesting results withsignificant (P < 0.0001) and positive loadings for HTFigure 1 Genomic selection prediction accuracies for each of the seven stud(six), within multi-site (one), and for multi-site to single site (three)), along withanalyses. Sites are Prince George Tree Improvement Station (PGTIS), Quesnel, Aat breast height in cm; VOL: stem volume in m3; VDir: acoustic velocity in km/sdensitometry; MoEd: dynamic modulus of elasticity.(0.39), VDir (0.92), and WDX-ray (0.49). Similarly, PCA3produced significant (P < 0.0001) and positive loadingsfor HT (0.46), DBH (0.38), WDX-ray (0.19) and WDRes(0.64). The fact that growth and wood quality traits pro-duced significant and positive loadings, even if it is forPCA2 and PCA3, is interesting as it creates concurrentselection opportunities for yield and wood quality traitsthat are commonly known to be negatively correlated.The two GS models produced high prediction accuraciesfor PCA1 with 0.72 ± 0.001 and 0.71 ± 0.001 for RR-BLUP and GRR, respectively. Similar results were ob-served for PCA 2 (RR-BLUP: 0.65 ± 0.001 and GRR:0.64 ± 0.001) and PCA3 (RR-BLUP: 0.57 ± 0.001 andGRR: 0.55 ± 0.002) using the multi-site GS model.ABLUP vs. GBLUP elite genotype selection comparisonExpectedly, across all the range of genetic gain penalties,the selection of 40 elite individuals yielded ABLUP gen-etic gain higher than that of the GBLUP with percentageied traits using the RR-BLUP model (within single site (three), cross-sitesnarrow-sense heritabilities (h2) from single- and multi-site GBLUPleza lake, and multi-site (ALL). Traits are HT: height in m; DBH: diameter; WDRes: resistance to drilling; WDX-ray: wood density in kg/m3 using X-rayFigure 2 Accuracy of cross-population GS prediction models (indicating their respective correlations (Y-axis)) for seven growth and wood qualitytraits for interior spruce. Sites are Prince George Tree Improvement Station (PGTIS), Quesnel, and Aleza lake. Traits are HT: height in m; DBH: diameter atbreast height in cm; VOL: stem volume in m3; VDir: acoustic velocity in km/s; WDRes: resistance to drilling; WDX-ray: wood density in kg/m3 using X-raydensitometry; MoEd: dynamic modulus of elasticity. Dash and solid lines represent Type B correlation and prediction accuracy, respectively.Gamal El-Dien et al. BMC Genomics  (2015) 16:370 Page 7 of 16Table 5 Multi-site genomic selection prediction accuracies and their standard errors for RR-BLUP and GRR models forthe studied five imputation methodsTrait GS model Imputation methodMI-30% EM-30% MI-60% kNN-Fam-60% SVD-60%HT RR-BLUP 0.56 ± 0.0013 0.58 ± 0.001 0.60 ± 0.001 0.63 ± 0.001 0.61 ± 0.001GRR 0.50 ± 0.002 0.48 ± 0.004 0.57 ± 0.003 0.62 ± 0.002 0.58 ± 0.002DBH RR-BLUP 0.71 ± 0.001 0.72 ± 0.001 0.75 ± 0.001 0.77 ± 0.001 0.76 ± 0.001GRR 0.71 ± 0.001 0.73 ± 0.001 0.74 ± 0.001 0.77 ± 0.001 0.75 ± 0.001VOL RR-BLUP 0.67 ± 0.001 0.68 ± 0.001 0.71 ± 0.001 0.73 ± 0.001 0.72 ± 0.001GRR 0.67 ± 0.001 0.68 ± 0.001 0.70 ± 0.001 0.72 ± 0.001 0.71 ± 0.001VDir RR-BLUP 0.59 ± 0.001 0.61 ± 0.001 0.63 ± 0.001 0.67 ± 0.001 0.65 ± 0.001GRR 0.52 ± 0.004 0.50 ± 0.003 0.62 ± 0.002 0.66 ± 0.001 0.62 ± 0.006WDRes RR-BLUP 0.56 ± 0.001 0.58 ± 0.001 0.62 ± 0.001 0.64 ± 0.001 0.63 ± 0.001GRR 0.48 ± 0.002 0.47 ± 0.003 0.59 ± 0.003 0.64 ± 0.002 0.60 ± 0.003WDX-ray RR-BLUP 0.55 ± 0.001 0.56 ± 0.001 0.59 ± 0.001 0.62 ± 0.001 0.61 ± 0.001GRR 0.54 ± 0.002 0.55 ± 0.001 0.59 ± 0.002 0.62 ± 0.001 0.60 ± 0.002MoEd RR-BLUP 0.50 ± 0.001 0.61 ± 0.001 0.63 ± 0.001 0.67 ± 0.001 0.65 ± 0.001GRR 0.50 ± 0.013 0.56 ± 0.002 0.63 ± 0.002 0.66 ± 0.001 0.64 ± 0.002Ave. RR-BLUP 0.59 ± 0.073 0.62 ± 0.059 0.65 ± 0.060 0.68 ± 0.055 0.66 ± 0.057GRR 0.56 ± 0.091 0.57 ± 0.101 0.63 ± 0.063 0.67 ± 0.056 0.64 ± 0.063Traits are HT: height in m; DBH: diameter at breast height in cm; VOL: stem volume in m3; VDir: acoustic velocity in km/s; WDRes: resistance to drilling; WDX-ray:wood density in kg/m3 using X-ray densitometry; MoEd: dynamic modulus of elasticity.Table 6 Single site GS prediction accuracies and their standard errors resulting from using the multi-sites as trainingpopulation for RR-BLUP and GRR models for kNN-Fam-60% imputation methodTraits GS model Cross-validationMulti-sites PGTIS Aleza Lake QuesnelHT RR-BLUP 0.63 ± 0.001 0.37 ± 0.001 0.53 ± 0.002 0.45 ± 0.001GRR 0.62 ± 0.002 0.36 ± 0.003 0.52 ± 0.003 0.45 ± 0.002DBH RR-BLUP 0.77 ± 0.001 0.37 ± 0.001 0.50 ± 0.001 0.40 ± 0.001GRR 0.77 ± 0.001 0.37 ± 0.002 0.50 ± 0.001 0.40 ± 0.001VOL RR-BLUP 0.73 ± 0.001 0.34 ± 0.001 0.50 ± 0.001 0.41 ± 0.001GRR 0.72 ± 0.001 0.34 ± 0.002 0.50 ± 0.002 0.40 ± 0.002VDir RR-BLUP 0.67 ± 0.001 0.50 ± 0.001 0.47 ± 0.001 0.49 ± 0.001GRR 0.66 ± 0.001 0.49 ± 0.001 0.47 ± 0.001 0.48 ± 0.002WDRes RR-BLUP 0.64 ± 0.001 0.41 ± 0.001 0.48 ± 0.001 0.46 ± 0.001GRR 0.64 ± 0.002 0.41 ± 0.002 0.48 ± 0.002 0.45 ± 0.003WDX-ray RR-BLUP 0.62 ± 0.001 0.46 ± 0.001 0.49 ± 0.002 0.50 ± 0.001GRR 0.62 ± 0.001 0.46 ± 0.002 0.49 ± 0.002 0.50 ± 0.002MoEd RR-BLUP 0.67 ± 0.001 0.50 ± 0.001 0.46 ± 0.001 0.48 ± 0.001GRR 0.66 ± 0.001 0.49 ± 0.002 0.45 ± 0.002 0.47 ± 0.002Ave. RR-BLUP 0.68 ± 0.055 0.42 ± 0.066 0.49 ± 0.023 0.46 ± 0.039GRR 0.67 ± 0.056 0.42 ± 0.063 0.49 ± 0.023 0.45 ± 0.038Traits are HT: height in m; DBH: diameter at breast height in cm; VOL: stem volume in m3; VDir: acoustic velocity in km/s; WDRes: resistance to drilling; WDX-ray:wood density in kg/m3 using X-ray densitometry; MoEd: dynamic modulus of elasticity.Gamal El-Dien et al. BMC Genomics  (2015) 16:370 Page 8 of 16increase between 9.2 and 14.6% for 100 and 1,000 pen-alty classes, respectively (Figure 3). Naturally, any in-crease in co-ancestry is associated with increase ingenetic gain; however, the GBLUP offers greater flexibil-ity for elite genotype selection than the ABLUP as theeffective number of genomic equivalent provides a con-tinuum for selection as opposed to the pedigree-basedstatus number which offers only two options of related-ness (unrelated or half-sibs).DiscussionGBS and imputation methodsThe utilization of NGS technology, and GBS in particu-lar, provides a low cost opportunity for genomic studiesfor non-model species [23]. In the present study, GBSproduced exceedingly large number of SNPs (1,232,406);however, the low coverage nature of the technique hasGamal El-Dien et al. BMC Genomics  (2015) 16:370 Page 9 of 16substantially reduced the available SNPs for analyses dueto missing data. Missing data could also result from eitherthe absence of the restriction site in the genomic sequenceor due to technical issues associated with DNA digestionor PCR amplification [29,30]. Out of the five imputationmethods used, the expectation maximization (EM-30%:[31]) and the newly developed half-sib family-based k-nearest neighbor (kNN-Fam-60%) method resulted in8,868 and 62,198 SNPs, respectively, and produced thegreatest accuracies (Figure 1, for kNN-Fam-60%). We usedthe EM-30% imputation method in estimating the traitheritabilities employing the GBLUP approach [17], whileall described imputation methods were used to evaluatethe GS models across all described scenarios. We believethat the higher GEBV accuracies attained from the kNN-Fam imputation method are attributable to the method’scapacity of recovering resemblance among individualsFigure 3 The relationship between height genetic gain and geneticdiversity for ABLUP (status number (Ns)) and GBLUP (number of foundergenome equivalent (NGE)) across a range of co-ancestry penalties.within families. In addition, kNN-Fam method propor-tionately weights family structure and the underlying LDof SNPs, which is also likely contributing to the slightlyhigher predictability due to its strength of simultaneouslycapturing identical-by-state with the variants in LD withthe causal genes [32].Heritability estimatesTreating the offspring from open-pollinated families ashalf-sibs is often associated with inflated heritability esti-mates, resulting in an exaggeration of the expected geneticgain [33-35]. In the present study, heritability estimatesobtained from the ABLUP were higher than those fromthe GBLUP (Table 2), highlighting the advantages of in-corporating genomic information in standard quantitativegenetic analyses [17] to obtain realistic estimates of breed-ing values and genetic gain (see ABLUP vs. GBLUP elitegenotype selection comparison below).Our results are similar to those reported for anotheropen-pollinated white spruce progeny trial in Québec,Canada [16].While heritability estimates were population-specific, slight differences in GBLUP-based heritabilityestimates for wood density (WDX-ray) and height (36- vs.22-year-old height) were observed between the two studies(wood density: 0.18 vs. 0.24 and height: 0.20 vs. 0.16) [16].Additionally, our results suggest that the trait heritabilityhas only limited effect on the prediction accuracy (PA) asdiameter at breast height (DBH) and stem volume (VOL)showed high multi-site RR-BLUP predictability despitetheir low heritability estimates (DBH: h2 = 0.07 and PA =0.77; VOL: h2 = 0.09 and PA = 0.73), results consistentwith those reported for loblolly pine (Pinus taeda) [15,36].GS modelsGS models suffer from the “large p, small n” problem,where the number of predictor effects p exceeds by farthe number of observations n (p> > n). A variety of stat-istical methods were proposed to handle this issue andthey can be classified into three major categories: shrink-age models, Bayesian methods (including variable selec-tion), and semi- or non-parametric methods such assupport vector regression and random forest regression.Those methods are different in their assumptions regard-ing the genetic architecture of the tested traits [1,37]. RR-BLUP, the most common shrinkage model, assumes thatthe trait is controlled by many genes each with small ef-fects, thus is suitable for traits following the infinitesimalmodel [8]. RR-BLUP assumes that all marker effects arerandom, normally, and identically distributed and have acommon variance, thus all the effects will be equallyshrunken toward zero [1,37,38]. This approach was de-scribed previously by Meuwissen et al. [11] and termedSNP-BLUP. In GS and genome wide association studies(GWAS), it is not realistic to use common shrinkageGamal El-Dien et al. BMC Genomics  (2015) 16:370 Page 10 of 16effects for all fitted SNPs across the genome as not allmarkers will be linked to functional genes and not all geneeffects are normally distributed [11]. To overcome this as-sumption, the Bayesian methods were developed to pro-vide more flexibility in modeling oligogenic traits (i.e.,traits that are controlled by few genes each with large ef-fects) [37]; however, these methods are computationallydemanding [39]. A new, fast, deterministic, and flexibleRidge regression method was suggested by Shen et al. [38]known as the generalized Ridge regression (GRR). Themain difference between RR-BLUP and GRR is that aSNP-specific shrinkage will be used instead of the com-mon shrinkage effect [38], which is more realistic andmore suitable to model oligogenic traits and represents aviable alternative to Bayesian models [28].Our results showed that GRR produced either similaror even lower prediction accuracies as compared to RR-BLUP, which indicates that marker selection by givingdifferent degree of penalization through the application ofdifferent shrinkage effects is inadequate for the testedtraits. This provides evidence that the tested traits (growthand wood quality) follow the infinitesimal model. More-over, experimental results in both plants and animals sug-gested that RR-BLUP provides the best adjustment/compromise between the computational effort and theprediction efficiency [37]. This supports the notion thatmost of the economically important traits are complexand quantitative in nature (i.e., follow the infinitesimalmodel). For example, in loblolly pine, Resende et al. [14]evaluated RR-BLUP and three Bayesian models across 17traits related to growth, development, and fusiform rustresistance and the resulting prediction accuracies weremarginally different across the four models, except for rustresistance, an oligogenic trait, where the Bayes A and Cmodels resulted in moderately larger performance thanRR-BLUP.Cross-validationThe multi-site cross-validation produced higher predic-tion accuracies as compared to single-sites (Tables 3, 4and 5, Figure 1) as the multi-site training population isthree times larger than any of the single-site models,resulting in more accurate estimation of marker effectsand this is consequently reflected in higher predictionaccuracy and precision [1,37]. Previous GS studies con-ducted on plant and animal populations clearly demon-strated the role of training population size on predictionaccuracy and illustrated the importance of the trainingpopulation size as compared to the number of markersused in the models, thus supporting the present study re-sults [40-42]. In forestry context, our results are also con-sistent with prediction accuracies obtained for growth andwood quality attributes in loblolly pine and Eucalyptus[13,15,43]. However, comparing the prediction accuraciesbetween our study and those from the Québec whitespruce open-pollinated progeny trial is of interest as theexperimental settings were somewhat similar [16]. Height,wood density, and dynamic modulus of elasticity werecommon traits between the two studies; however, theirprediction accuracies were lower than in the present study(height: 0.17 vs. 0.63, wood density: 0.33 vs. 0.64, dynamicmodulus of elasticity: 0.21 vs. 0.67). In general, the lowerprediction accuracies in the Québec study across all thetraits compared to our and other tree species studies, ismainly due to the considerably larger number of testedfamilies (214 vs. 25 families) which resulted in higher Ne(effective population size). It is also worth mentioning thatwe used the EBV as opposed to the raw phenotype intraining our GS models; this could have also contributedto the observed differences.Cross-site validationThe economic and ecological importance of interiorspruce to British Columbia promoted thorough under-standing of the various ecological regions of the speciesand subsequently 6 unique Seed Planning Zones (SPZs)were identified (Bukley Valley, East Kootenay, Nelson,Prince George, Peace River, and Thompson Okanagan).To date, most forestry GS studies were conducted withinthe confines of a single “environment model” similar tothose GS studies conducted in animal breeding programswhere the assumption of a common environment was in-voked. The assumption of “common environment” is notsuitable in forestry as estimates of GxE, even within a sin-gle breeding zone, are high [44] and this motivatedbreeders to evaluate the performance of a specific geno-type or family across different environments to identifygeneralists for their inclusion in seed production popula-tions [45]. For the successful implementation of GS in treebreeding, it is essential that GS models remain accurateacross sites, at least within the dedicated breeding zone.Only two out of the published four GS studies in foresttree tested GxE interaction, these include loblolly pine[14] and white spruce [16]. In the present study, we useddata from three sites within the Prince George breedingzone and the observed prediction accuracies of a singlesite to predict another site were generally low (Additionalfile 2, Figure 1). The observed reduced prediction accur-acies across sites were lower than those obtained from thewhite spruce and loblolly pine studies. Thus, it is import-ant to pay considerable attention to the structure of thetraining population; hence the developed models reflectthe underpinning forces affecting trait expression andtheir response to sites heterogeneities.Multi-trait GS prediction modelsGS models are trait-specific and do not lend themselvesto multi-trait selection as does index selection methodGamal El-Dien et al. BMC Genomics  (2015) 16:370 Page 11 of 16which maximizes the correlation between the indexscore of an individual and its breeding value [46]. Yet,selection indices require prior knowledge about the eco-nomic value of the traits for proper scaling beforeoptimum phenotypic weights can be estimated. The useof Principle Component Analysis offered an opportunityto handle a set of correlated variables by reducing the di-mensionality to a set of uncorrelated ones (i.e., principalcomponents). Negative genetic correlations betweenyield and wood quality traits are commonly observed[47] and the results from PC1 which accounts for 44%of the total variation confirmed these observations.However, while yield and wood quality are known to actin antagonizing fashion, the results based on PC2 andPC3, albeit collectively accounting for 42% of the totalvariation, created interesting opportunities for the con-current selection for both traits without any adverse ef-fect associated with the known negative correlations. Itseems that PC2 and PC3 accessed different combina-tions of SNPs (i.e., causal genes) that work in the samedirection. While we did not consider any prior economicknowledge for weighing in constructing the PCs, the re-sults from PC2-3 clearly demonstrated that it is (to acertain extent) also possible to artificially co-select suchattributes that are commonly known to be negativelycorrelated in the same positive direction. Consideringeconomic weights for traits during constructing selec-tion indices can result in changing the magnitude ofgenetic correlation among these traits as a consequenceof selection. This change in genetic correlation is ex-pected to change SNP effects and thus frequent trainingis required for GS model to be effective over genera-tions. Finally, our objective of using PCA is to offer asimple method that accounts for the inter-relation (gen-etic correlation) between the studied traits and providean opportunity for further expansions that consider eco-nomic weights.ABLUP vs. GBLUP elite genotype selection comparisonThe observed genetic gain differences between theABLUP and GBLUP across all co-ancestry penalties werenot surprising as heritability, breeding value of an indi-vidual, and genetic gain estimates are expected to behigher in open-pollinated populations due to the ABLUPinability to ascertain the true genetic relationship amongoffspring [33-35]. On the other hand, GBLUP relies onestimating the realized kinship which provides a moreaccurate ascertainment of the genealogical relationshipsamong members of an open-pollinated family and thus,resulting in more realistic gain estimates due to adjust-ment for Mendelian sampling term [48]. Our results aresimilar to those reported in the Québec white sprucestudy as they consistently produced higher gains frompedigree- vs. marker-based methods [16].It should be pointed out that the Bulmer effect (i.e., re-duction in response to selection) would be similar forABLUP and GBLUP and thus the response to selectionfor both methods will be similarly affected irrespectiveof the breeding values estimation method used [49]. Ifgenomic selection effectively reduces generation interval,then in the forestry context, a relatively smaller refer-ence (training) population size is needed to attain thesame response to selection from larger traditional popu-lation (i.e., ABLUP). Conversely, if generation turnoveris not possible, then larger training population size is re-quired, therefore defeating GS goals. Bastiaansen et al.[50] found similar response to selection for GBLUP andABLUP but the former accumulated lower level of in-breeding and consequently higher genetic variance thanthe latter.Genomic selection in forestryOpen-pollinated family testing is a formidable and eco-nomically viable option for screening a larger number ofcandidate parents without the development of “struc-tured pedigree” that represents the backbone of mostconventional tree breeding methods. The simplicity ofthe method made it an attractive first step before start-ing a full-blown tree improvement program. Indeed, thiswas the case for the New Zealand radiata pine (Pinusradiata) breeding program as open-pollinated testingprovided a quick and inexpensive screening method [51]and subsequently the selected parents were included in afull pedigree-based breeding program [52]. However, thecommonly used assumption of treating open-pollinatedoffspring as half-sib family is by far the greatest draw-back of this method as most genetic parameters (e.g.,breeding values, trait heritabilities, and gain estimates)are upwardly biased and this was clearly demonstrated inmany studies including the present one [16]. The intro-duction of genomic data (e.g., SNP markers) has providedthe means to overcome this drawback and the genea-logical relationship among open-pollinated family mem-bers is clearly and accurately ascertained. At present,many open-pollinated family testing trials have reached anadvanced age and are often abandoned, though they couldprovide badly needed information for late expressed traitsthat could not be obtained from younger conventional tri-als. The present study and that of Beaulieu at al. [16] pro-vided examples of producing yield and wood qualityattributes data with unprecedented accuracy and this be-came possible through the integration of genomic infor-mation in the quantitative genetic analyses (e.g., ABLUPvs. GBLUP).In the present study, the accuracy of predicting breed-ing values varied across the different studied populationscales with within multi-site being the highest and crosssites being the lowest (Figure 1). The high within multi-Gamal El-Dien et al. BMC Genomics  (2015) 16:370 Page 12 of 16site GS prediction accuracies offer an opportunity to ob-tain reliable results for difficult traits such as wooddensity and yield and points towards considering “old”open-pollinated tests as a valuable source for informa-tion. The developed prediction models could be used forselecting elite genotypes with unprecedented selectionintensity for their inclusion in future seed productionpopulations, and this can be accomplished without thecreation of a single cross.In the present study, GBS successfully provided the in-formation for genomic-based quantitative genetics ana-lyses at reasonable cost. To our knowledge, this studyrepresents the first large-scale use of GBS in a forest treespecies known to a have complex genome and for whichno reference sequence has been assembled yet (N = 1,126trees). It is noteworthy to mention that this study was ini-tiated before the release of Norway and white spruce gen-ome sequences [53,25]. However, as the assemblies of thetwo spruce genomes are not anchored and ordered alongthe chromosomes, there is little advantage over de novoSNP markers.ConclusionsThe results reported here suggest that GBS can be used asa genotyping platform for the application of GS in forestry.The utilization of proper imputation algorithms is neededto overcome the commonly observed problem of missingdata with GBS. Greater GS prediction accuracies were ob-tained for RR-BLUP as compared to GRR indicating thatthe studied traits follow the infinitesimal model. Greateraccuracies were obtained for multi-site GS model andpoints to the inherent lack of reliability for cross-site pre-diction. The utilization of principle component analysis asa multi-trait GS approach was proven effective in dealingwith negatively correlated traits.MethodsExperimental population and DNA samplingFor this study, 1,126 38-year-old Interior spruce trees(Picea glauca (Moench) Voss x Picea engelmannii Parryex Engelm.) were sampled from a progeny test trialestablished by the Ministry of Forests, Lands and Nat-ural Resource Operations of British Columbia Canada,and planted on three sites [Aleza Lake (Lat. 54° 03′15.7″ N, Long. 122° 06′ 35.4″ W, Elev. 700 mas), PrinceGeorge Tree Improvement Station (PGTIS) (Lat. 53° 46′17.9″ N, Long. 122° 43′ 07.6″W, Elev. 610 mas), andQuesnel (Lat. 52° 59′ 27.2″ N, Long. 122° 12′ 30.6″ W,Elev. 915 mas)]. The sites were established in 1972/73and consisted of 181 open-pollinated families using 3-year-old seedlings planted at 2.5×2.5 m spacing in acomplete randomized block design with five or tenblocks and ten or fifteen tree-row-plots, respectively.Twenty-five families were selected based on theirsuperior growth traits and four trees per family fromfour blocks per site were randomly sampled (maximumof 32 trees per family). Evidence of similar genetic diver-sity between selected and unselected populations havebeen reported for spruces, including white spruce[54,55]. The differences across all the three sites in therelationship between overall X-ray density and growthtraits (see below) indicated that the Quesnel site is mostfavorable while PGTIS least favorable for growing inter-ior spruce (YA El-Kassaby, pers. obs.).Genotyping and SNP selectionDNA extraction was performed on dormant vegetativebuds of the sampled trees using a CTAB procedure modi-fied after Doyle and Doyle [56]. To generate a high-density SNP profile for the 1,126 spruce DNA extracts, weconducted a multiplexed, high-throughput Genotyping-by-Sequencing (GBS) following Elshire et al. [22] andChen et al. [23]. A 48-plex GBS library comprising of 47DNA samples and a negative control (without DNA) wasprepared and each of the 47 spruce DNA extracts was bar-coded. In brief, each DNA extract (500 ng) was digestedwith restriction enzyme ApeKI for 2 hours. The details ofoligonucleotide sequences for the ApeKI barcode adaptersand temperature cycles are provided in Chen et al. [23].Ligation products from each DNA extract were pooledand purified using QIAquick PCR purification kit (Qia-gen). The amplified 48-plex libraries were diluted and se-quenced (single-end reads only) twice on the IlluminaHiSeq 2000 at the Cornell University Genomics Core La-boratory to achieve the sequencing coverage equivalent to24-plex. Raw DNA short-read sequences were analyzedwith a pipeline, the Universal Network Enabled AnalysisKit (UNEAK), tailored to species lacking reference gen-ome information [27]. This SNP detection pipeline isavailable in TASSEL v5.0 [57]. To reduce sequencing errorin genotype determination, we set the error tolerance rateto 0.03 (to pass the expected Illumina sequencing errorrate at 0.4%). The resulting SNP table was further filteredusing minimum value of inbreeding coefficient (mnF =0.05) and minimum minor allele frequency (mnMAF =0.05), and finally, SNPs that are present in less than 40%of the samples were eliminated from further analysis.Missing data imputationTo interpret missing values present in the filtered SNP set,five different imputation algorithms were employed: (1)mean imputation (MI), (2) singular value decompositionimputation (SVD:[28]), (3) traditional k nearest neighbor(kNN:[28]), (4) expectation maximization imputation(EM:[31]), and (5) k-nearest neighbor imputation butnewly derived for half-sib family structure (kNN-Fam).For SVD, the original SNP matrix was used to obtain aset of the k most significant eigenvectors of the SNPGamal El-Dien et al. BMC Genomics  (2015) 16:370 Page 13 of 16markers. The k eigenvectors were then used as predic-tors for linear regression estimation of the missing data.SVD was implemented in R [58] using the “bcv” pakage[59]. The resultant numerical SNP values (x) were fur-ther classified into three separate genotype classes, −1, 0,and 1. The classification algorithm was taken as a modi-fied k-means algorithm [60], with the centroids set at −1(k1), 0 (k2), and 1 (k3). The assignment of genotypes wasdone by satisfying:argmin SSð Þ ¼Xki¼1Xx∈Six−ki ð1Þwhere (1) defines the minimum distance for the SNPvalue from the centroids.For traditional kNN, the missing values were replacedwith the weighted average of SNP values at the k closestSNP markers. The distances between all possible pairs ofmarkers were computed by Euclidean distance. We se-lected five families (6, 11, 17, 21, and 47) to test the im-putation accuracy, as well as the efficiency of iterationsfor convergence (2, 3, 5 and 10 iterations for SVD; forEM, we tested the distance between the new estimateand the previous values less than 0.01). K = 10 and 30were selected for accuracy estimates for kNN imput-ation. All iterations reached convergence criteria thatwere used in [61], however they resulted in different ac-curacies (shown in Additional file 4).The kNN-Fam algorithm is derived from the kNNmethod of Troyanskaya et al. [28]. Missing values in theSNP table were first replaced with the mean of the locusby MI. A standardized genomic similarity matrix for allsamples was calculated based on VanRaden [17] and theEuclidean distance between SNP markers was definedfollowing Rutkoski et al. [61]. Instead of the classic k-nearest neighbor method, wherey^ ¼ 1K sigma yð Þ ð2Þthe missing SNP values were replaced with:y^ ¼ mode 1K1þ K2 y ð3Þwhere K1 is the number of neighbors within the half-sibfamily based on the genomic similarity, K2 is the num-ber of neighbors from outside the family based on theEuclidean distance, and y is the original locus mean. Weconducted exhaustive search for the optimal values ofK1 and K2, by permutating K1 through 1 to 30 (thenearest neighbor set as 1, and then 2, 5, 10, 15, 20 to themaximum family size of 30), and K2 from 1 to 250, asthe total sample size of the panel is 1,126. The accuracyof kNN-Fam imputation was conducted for eachpermutation by randomly masking one million knowndata points from the filtered SNP table of the 5 selectedfamilies, and calculating the percentages of markers be-ing imputed back to the correct SNP values.Phenotypic dataThe studied trees were phenotyped for (a) two growthtraits (height in m (HT) and diameter at breast height incm (DBH) which were subsequently used to estimatestem volume in m3 (VOL) following Millman’s formula)(Millman M. Metric Volume and V-Bar Tables Derivedfrom the British Columbia Forest Service Whole StemCubic Meter Volume Equations. Vancouver BC, 1976.Unpublished) and (b) three wood quality attributes(wood density in kg/m3 using X-ray densitometry (WDX-ray), resistance to drilling (WDRes), and acoustic velocityin km/s (VDir)) [62]. Furthermore, WDX-ray and VDirwere used to derive the dynamic modulus of elasticity(MoEd) [63]. WDX-ray is commonly used to estimatewood density using increment cores extracted from thesampled trees, while WDRes and VDir represent indirect(i.e., non-invasive) methods that rely on wood densityfor either creating resistance during drilling or the speedof transmitting sound though the wood, respectively[62].Estimated breeding values (EBV)The breeding value for each tree was estimated usingASReml v.3 using two different mixed linear models[64]. The first used the pooled populations to estimatemulti-site breeding values (MSEBV), while the secondwas used to estimate single-site breeding values (SSEBV)as follows:Multi-site model:y ¼ Xβþ Z1aþ Z2sβþ Z3saþ e ð4Þwhere y is the phenotypic measurement of the analyzedtrait, β is a vector of fixed effect (i.e., the overall meanand the site effect), a is a vector of random additive ef-fect of individual trees ~ N(0, Aσ2a), sb is a vector of therandom effect of block within site ~ N(0, Iσ2sb), sa is avector of random site x genotype interaction ~ N(0,Iσ2 sa), e is a vector of random residual effect ~ N(0, Iσ2e),and X and Z1-Z3 are incidence matrices assigning fixedand random effects to each observation and I and A arethe identity and average numerator relationship matrices,respectively. Narrow-sense heritability was calculated ash2 = σa2/(σa2 + σsa2 + σe2) for the multi-site model.Single-site model:y ¼ Xβþ Z1βþ Z2aþ e ð5ÞThis model is identical to the multi-site mixed linearmodel but without all terms related to site (site, blockWe applied Principle Component Analysis (PCA) to dis-Gamal El-Dien et al. BMC Genomics  (2015) 16:370 Page 14 of 16nested within site, and site x genotype interaction).Narrow-sense heritability was calculated as h2 = σa2/(σa2 + σe2). Additionally, Genomic Best Linear Un-biased Predictor (GBLUP) [17] was used to estimatethe narrow-sense heritabilities of the traits for singleand multi-site using genotypes from imputed data pro-duced by the EM algorithm with 30% missing data.This analysis was performed by substituting averagenumerator relationship matrix with marker-based rela-tionship matrix [17] using observed allele frequencies.Genomic selection analysesThe SNP effects were estimated on the basis of two differ-ent methods: 1) Ridge Regression Best Linear UnbiasedPredictor (RR-BLUP) implemented in R package rrBLUP[65] and 2) Generalized Ridge Regression (GRR) imple-mented in R package bigRR [38]. In both cases the follow-ing mixed linear models were fitted:y ¼ Xβ þ Zbþ e ð6Þwhere y is the vector of EBV, β is the vector of fixed ef-fect which is the overall mean, b is the vector of randomSNP effects, X and Z are incidence matrices for β and b,respectively, X is a vector of 1 while Z was built from(-1, 0, 1) for aa, Aa and AA, respectively. The codes forZ were standardized according to the allele frequencyusing VanRaden’s method [17]. β and b are estimatedsimultaneously using Henderson’s mixed model equation(MME) [66]:X0X X 0ZZ0X Z0Z þ λI βb ¼ X0yZ0y ð7Þwhere λ ¼ σ^ 2e=σ^ 2b is the shrinkage parameter for the ran-dom SNP effects, so all the SNPs will have the sameshrinkage magnitude, in other words, all are penalized tothe same degree. In GRR, the SNPs with small effectsare more penalized. The first step in GRR is an ordinaryRR, then it again uses MME to fit the heteroscedasticmodel:X0X X 0ZZ0X Z0Z þ diag λð Þ βb ¼ X0yZ0y ð8Þwhere diag ( λ ) is the diagonal matrix of SNP specificshrinkage parameters estimated as λj ¼ σ^ 2e=σ^ 2bj , whereσ^ 2bj is variance attributed to jth SNP and is estimated as:σ^ 2bj ¼b^2j1−hjjð9Þwhere, bj is the SNP effect, and hjj is the (n + j)th diag-onal element of the matrix H = T (T’T)−1T’, whereT ¼ X Z0 diag λð Þ ð10Þσ^ 2bj is needed as it represents the form of implementedvariable selection.Cross-validation, predictive accuracy and type-b geneticcorrelationThe predictive accuracy was estimated using a 10-foldcross-validation approach with 20 replications. In each rep-lication, the data were randomly divided into 10 subsets(folds) and each one was used as validation population(representing 10% of the data set), while the remaining 9-folds were used as the training population (90% of the dataset) to fit the GS model. This process was repeated 20 timeswith random assignment of the data to the 10 folds [67-69].One advantage of this scheme is that it provides the degreeof uncertainty (i.e., standard error) around these point esti-mates. In all the replicates, the models were fitted to thetraining data set and used to predict the GEBV of the valid-ation data set by multiplying the vector of the marker effectestimated from the training population with the incidencematrix Z of the individuals in the validation population andsumming over the estimated general mean:y^j ¼ u^ þXiZijm^i ð11Þwhere u is intercept, Z is genotype at the ith locus of the jthindividual and m is the marker effect. The accuracy of GSto predict the breeding value (BV) was estimated as thecorrelation of the vector of GEBV for all individuals (pre-dicted from the validation step) with their estimated BV(MSEBV or SSEBV according to the validation scenario).As we used 20 replicates, we obtained 20 estimates for pre-diction accuracy and we estimated means and standard er-rors for these estimates. The developed models werevalidated under the following four scenarios, namely, (1)within site, (2) in all 6 possible combinations for cross-validation comparisons across sites, (3) as a multi-sitepopulation, where training and validation populations werederived from the combined population for cross-validationand (4) again as a multi-site population, but where the en-tire multi-site population was used as training populationand the individual site as validation population.Moreover, we estimated the type-b genetic correlationacross sites, which is the additive genetic correlation be-tween the traits measured on different individuals fromthe same genetic group but present in different environ-ments, using a method described by Burdon [44].Multi-trait GS modeltil the correlated variables (EBV) into a set of linearly in-dependent variables (i.e., the principal componentsGamal El-Dien et al. BMC Genomics  (2015) 16:370 Page 15 of 16(PCs)). We used HT, DBH, VDir, WDRes, and WDX-rayEBVs as variables to determine the PCs that best expressthese phenotypes and used their score as a new phenotypein subsequent RR-BLUP GS model for the multi-site sce-nario using the kNN-Fam imputation.ABLUP vs. GBLUP elite genotype selection comparisonNotwithstanding the relatively small number of 25open-pollinated families under investigation, to illus-trate the benefits of incorporating genomic informa-tion in selection, we conducted a selection exercise of40 elite genotypes for inclusion into a hypotheticalproduction population (seed orchard) following thegroup merit selection scheme of Lindgren and Mullin[70]. Group merit selection is founded on penalizingthe average BV of a selected subset by increasing theweight on the entire group co-ancestry (measured byco-ancestry coefficient) to reach a desired “status num-ber (Ns)” [71] which is an approximation of the effect-ive number of parents (Ne) (i.e., measure of diversity).In this method, the co-ancestry coefficients are esti-mated from the pedigree values of the selected individ-uals (ABLUP) while in the GBLUP case, we used themarker-based relationship matrix [17] to approximatethe co-ancestry of the selected individuals and theirdiversity was estimated by the number of foundergenome equivalents (Nge: [72]).Study approvalNot required.Availability of supporting dataThe data sets supporting the results of this article areavailable in the Dryad Digital Repository, http://doi.org/10.5061/dryad.8kb37.Additional filesAdditional file 1: Imputation accuracy of kNN-Fam method withdifferent K1 and K2 values.Additional file 2: GS prediction accuracies for GRR model for theseven studied traits for within single site, cross-sites, withinmulti-site, and for multi-site to single site with single- and multi-site(single- and multi-site GBLUP heritabilities are presented).Additional file 3: Cross-site GS prediction accuracies for all studiedcombinations (GS prediction model, imputation method, and trait).Additional file 4: The comparison of imputation methods.Competing interestsThe authors declare that they have no competing interests.Authors’ contributionsYAE conceived and designed the experiment. CC performed theimputations. OGI analysed the data and drafted the first version of themanuscript. JK, YAE supervised the data analyses. BR, JK, CC, IP and YEAcritically contributed to the final version of the manuscript. All authors readand approved the final manuscript.AcknowledgementsWe thank I Fundova and T Funda for phenotyping, T. Funda and J. Koreckyfor DNA extraction, S.E. Mitchell and K. Hyme for GBS, and T.B. Jaquish foraccess to progeny test trials and data. This study is funded by the Johnson’sFamily Forest Biotechnology Endowment, FPInnovations’ ForValueNet, andthe Natural Sciences and Engineering Research Council of Canada (NSERC)Discovery Grant to YAE.Author details1Department of Forest and Conservation Sciences, Faculty of Forestry, TheUniversity of British Columbia, 2424 Main Mall, Vancouver, British ColumbiaV6T 1Z4, Canada. 2Department of Genetics and Physiology of Forest Trees,Faculty of Forestry and Wood Sciences, Czech University of Life SciencesPrague, Kamycka 129, 165 21 Prague 6, Czech Republic. 3Department ofBiochemistry and Molecular Biology, Oklahoma State University, Stillwater,OK 74078-3035, USA.Received: 21 January 2015 Accepted: 28 April 2015References1. Grattapaglia D. Breeding Forest Trees by Genomic Selection:CurrentProgress and theWay Forward. In: Tuberosa R, Graner A, Frison E, editors.Genomics Plant Genet Resour. Dordrecht: Springer Netherlands;2014. p. 651–82.2. El-Kassaby YA, Isik F, Whetten RW. Modern Advances in Tree Breeding. In:Fenning T, editor. Challenges Oppor World’s For 21st Century. Dordrecht:Springer Science+Business Media; 2014. p. 441–59.3. Lande R, Thompson R. Efficiency of marker-assisted selection in theimprovement of quantitative traits. Genetics. 1990;124:743–56.4. Paterson AH, Tanksley SD, Sorrells ME. DNA markers in plant improvement.Adv Agron. 1991;46:39–90.5. Neale DB, Williams CG. Restriction-Fragment-Length-Polymorphism mappingin conifers and applications to forest genetics and tree improvement. CanJ Res. 1991;21:545–54.6. Williams CG, Neale DB. Conifer wood quality and marker-aided selection—acase-study. Can J Res. 1992;22:1009–17.7. Strauss SH, Lande R, Namkoong G. Limitations of molecular-marker-aidedselection in forest tree breeding. Can J Res. 1992;22:1050–61.8. Fisher RA. The correlation between relatives on the supposition ofMendelian inheritance. Trans R Soc Edinb. 1918;52:399–433.9. Stuber CW, Polacco M, Senior ML. Synergy of empirical breeding, marker-assistedselection, and genomics to increase crop yield potential. Crop Sci.1999;39:1571–83.10. Dekkers JCM. Commercial application of marker- and gene-assisted selection inlivestock: strategies and lessons. J Anim Sci. 2004;82:313–28.11. Meuwissen TH, Hayes BJ, Goddard ME. Prediction of total genetic valueusing genome-wide dense marker maps. Genetics. 2001;157:1819–29.12. Goddard ME, Hayes BJ. Mapping genes for complex traits in domesticanimals and their use in breeding programmes. Nat Rev Genet.2009;10:381–91.13. Resende MFR, Muñoz P, Acosta JJ, Peter GF, Davis JM, Grattapaglia D, et al.Accelerating the domestication of trees using genomic selection: accuracyof prediction models across ages and environments. New Phytol.2012;193:617–24.14. Resende MFR, Muñoz P, Resende MD, Garrick DJ, Fernando RL, Davis JM,et al. Accuracy of genomic selection methods in a standard data set ofloblolly pine (Pinus taeda L.). Genetics. 2012;190:1503–10.15. Zapata-Valenzuela J, Isik F, Maltecca C, Wegrzyn J, Neale D, McKeand S, et al. SNPmarkers trace familial linkages in a cloned population of Pinus taeda-prospectsfor genomic selection. Tree Genet Genomes. 2012;8:1307–18.16. Beaulieu J, Doerksen T, Clément S, Mackay J, Bousquet J. Accuracy ofgenomic selection models in a large population of open-pollinated familiesin white spruce. Heredity (Edinb). 2014;113:343-352.17. VanRaden PM. Efficient methods to compute genomic predictions. J DairySci. 2008;91:4414–23.18. Misztal I, Legarra A, Aguilar I. Computing procedures for genetic evaluationincluding phenotypic, full pedigree, and genomic information. J Dairy Sci.2009;92:4648–55.19. El-Kassaby YA, Lstibůrek M. Breeding without breeding. Genet Res (Camb).2009;91:111–20.Gamal El-Dien et al. BMC Genomics  (2015) 16:370 Page 16 of 1620. El-Kassaby YA, Cappa EP, Liewlaksaneeyanawin C, Klápště J, Lstibůrek M.Breeding without breeding: is a complete pedigree necessary for efficientbreeding? PLoS One. 2011;6, e25737.21. Isik F. Genomic selection in forest tree breeding: the concept and anoutlook to the future. New For. 2014;45:379–401.22. Elshire RJ, Glaubitz JC, Sun Q, Poland J a, Kawamoto K, Buckler ES, et al. Arobust, simple genotyping-by-sequencing (GBS) approach for high diversityspecies. PLoS One. 2011;6, e19379.23. Chen C, Mitchell SE, Elshire RJ, Buckler ES, El-Kassaby YA. Mining conifers’mega-genome using rapid and efficient multiplexed high-throughputgenotyping-by-sequencing (GBS) SNP discovery platform. Tree GenetGenomes. 2013;9:1537–44.24. Sutton BCS, Flanagan DJ, Gawley R, Newton CH, Lester DT, El-Kassaby YA.Inheritance of chloroplast and mitochondrial DNA in Picea and compositionof hybrids from introgression zones. Theor Appl Genet.1991;82:242–8.25. Birol I, Raymond A, Jackman SD, Pleasance S, Coope R, Taylor GA, et al.Assembling the 20 Gb white spruce (Picea glauca) genome from whole-genome shotgun sequencing data. Bioinformatics. 2013;29:1492-1497.26. Porth I, White R, Jaquish B, Alfaro R, Ritland C, Ritland K. Genetical GenomicsIdentifies the Genetic Architecture for Growth and Weevil Resistance inSpruce. PLoS One. 2012;7:e44397.27. Lu F, Lipka AE, Glaubitz J, Elshire R, Cherney JH, Casler MD, et al. Switchgrassgenomic diversity, ploidy, and evolution: novel insights from a network-basedSNP discovery protocol. PLoS Genet. 2013;9, e1003215.28. Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, et al.Missing value estimation methods for DNA microarrays. Bioinformatics.2001;17:520–5.29. Wang W, Wei Z, Lam T-W, Wang J. Next generation sequencing has lowersequence coverage and poorer CNP-detection capability in the regulatoryregions. Sci Rep. 2011;1:55.30. Pan J, Wang B, Pei Z-Y, Zhao W, Gao J, Mao J-F, et al. Optimization ofgenotyping-by-sequencing strategy for population genomic analysis inconifers. Mol Ecol Resour. 2014.31. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incompletedata via the EM algorithm. J R Stat Soc Ser B(Methodological).1977;39:1–38.32. Solberg TR, Sonesson AK, Woolliams JA, Meuwissen THE. Genomic selectionusing different marker types and densities. J Anim Sci. 2008;86:2447–54.33. Namkoong G. Inbreeding effects on estimation of genetic additive variance.For Sci. 1966;12:8–13.34. Squillace AE. Average genetic correlations among offspring from open-pollinatedforest trees. Silvae Genet. 1974;23:149–56.35. Askew GR, El-Kassaby YA. Estimation of relationship coefficients amongprogeny derived from wind-pollinated orchard seeds. Theor Appl Genet.1994;88:267–72.36. Grattapaglia D, Resende MDV. Genomic selection in forest tree breeding.Tree Genet Genomes. 2011;7:241–55.37. Lorenz AJ, Chao S, Asoro FG, Heffner EL, Hayashi T, Iwata H, et al. Genomicselection in plant breeding. Knowledge and prospects. Adv Agron.2011;110:77–123.38. Shen X, Alam M, Fikse F, Rönnegård L. A novel generalized ridge regressionmethod for quantitative genetics. Genetics. 2013;193:1255–68.39. Hofheinz N, Frisch M. Heteroscedastic ridge regression approaches forgenome-wide prediction with a focus on computational efficiency andaccurate effect estimation. G3 Genes| Genomes| Genet. 2014;4:539–46.40. Lorenzana RE, Bernardo R. Accuracy of genotypic value predictions formarker-based selection in biparental plant populations. Theor Appl Genet.2009;120:151–61.41. Luan T, Woolliams J a, Lien S, Kent M, Svendsen M, Meuwissen THE. Theaccuracy of Genomic Selection in Norwegian red cattle assessed bycross-validation. Genetics. 2009;183:1119–26.42. VanRaden PM, Van Tassell CP, Wiggans GR, Sonstegard TS, Schnabel RD,Taylor JF, et al. Invited review: reliability of genomic predictions for NorthAmerican Holstein bulls. J Dairy Sci. 2009;92:16–24.43. Resende MDV, Resende MFR, Sansaloni CP, Petroli CD, Missiaggia A a,Aguiar AM, et al. Genomic selection for growth and wood quality inEucalyptus: capturing the missing heritability and accelerating breeding forcomplex traits in forest trees. New Phytol. 2012;194:116–28.44. Burdon RD. Genetic correlation as a concept for studying genotype-environmentinteraction in forest tree breeding. Silvae Genet. 1977;26:168–75.45. Annicchiarico P. Genotype X Environment Interaction Challenges andOpportunities for Plant Breeding and Cultivar Recommendations. FAO PlantProduction and Protection Paper No. 174. Rome,Italy; 2002:155.46. Hazel LN. The genetic basis for constructing selection indices. Genetics.1943;28:476–90.47. Bouffier L, Raffin A, Rozenberg P, Meredieu C, Kremer A. What are theconsequences of growth selection on wood density in the French maritimepine breeding programme? Tree Genet Genomes. 2008;5:11–25.48. Hayes BJ, Visscher PM, Goddard ME. Increased accuracy of artificial selectionby using the realized relationship matrix. Genet Res (Camb). 2009;91:47–60.49. Van Grevenhof EM, Van Arendonk J a M, Bijma P. Response to genomicselection: the Bulmer effect and the potential of genomic selection whenthe number of phenotypic records is limiting. Genet Sel Evol. 2012;44:26.50. Bastiaansen JWM, Coster A, Calus MPL, van Arendonk J a M, Bovenhuis H.Long-term response to genomic selection: effects of estimation methodand reference population structure for different genetic architectures.Genet Sel Evol. 2012;44:3.51. Burdon RD, Shelbourne CJA. Breeding populations for recurrent selectionconflicts and possible solutions. New Zeal J For Sci. 1971;1:174–93.52. Jayawickrama KJS, Carson MJ. A breeding strategy for the New Zealandradiata pine breeding cooperative. Silvae Genet. 2000;49:82–90.53. Nystedt B, Street NR, Wetterbom A, Zuccolo A, Lin Y-C, Scofield DG, et al.The Norway spruce genome sequence and conifer genome evolution.Nature. 2013;497:579–84.54. Chaisurisri K, El-Kassaby YA. Genetic diversity in a seed productionpopulation vs. natural populations of Sitka Spruce. Biodivers Conserv.1994;3:512–23.55. Stoehr MU, El-Kassaby YA. Levels of genetic diversity at different stages ofthe domestication cycle of interior spruce in British Columbia. Theor ApplGenet. 1997;94:83–90.56. Doyle JJ, Doyle JL. Isolation of plant DNA from fresh tissue. Focus (Madison).1990;12:13–5.57. Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES.TASSEL: software for association mapping of complex traits in diversesamples. Bioinformatics. 2007;23:2633–5.58. R Core Team. R: A Language and Environment for Statistical Computing.Vienna, Austria; 2014.59. Perry PO. bcv: Cross-Validation for the SVD (Bi-Cross-Validation). 2009.60. Hartigan JA, Wong MA. Algorithm AS 136: A K-means clustering algorithm.J R Stat Soc Ser C (Applied Stat). 1979;28:100–8.61. Rutkoski JE, Poland J, Jannink J-L, Sorrells ME. Imputation of unordered markersand the impact on genomic selection accuracy. G3 Genes| Genomes| Genet.2013;3:427–39.62. El-Kassaby YA, Mansfield S, Isik F, Stoehr M. In situ wood quality assessmentin Douglas-fir. Tree Genet Genomes. 2011;7:553–61.63. Auty D, Achim A. The relationship between standing tree acousticassessment and timber quality in Scots pine and the practical implicationsfor assessing timber quality from naturally regenerated stands. Forestry.2008;81:475–87.64. Gilmour AR, Gogel BJ, Cullis BR, Thompson R. ASReml User. Guide release3.0. Hemel Hempstead, UK: VSN International Ltd.; 2009.65. Endelman JB. Ridge regression and other kernels for genomic selectionwith R package rrBLUP. Plant Genome. 2011;4:250–5.66. Henderson CR. Estimation of variance and covariance components.Biometrics. 1953;9:226–52.67. Gianola D, Okut H, Weigel K a, Rosa GJ. Predicting complex quantitativetraits with Bayesian neural networks: a case study with Jersey cows andwheat. BMC Genet. 2011;12:87.68. González-Camacho JM, de Los CG, Pérez P, Gianola D, Cairns JE, Mahuku G,et al. Genome-enabled prediction of genetic values using radial basisfunction neural networks. Theor Appl Genet. 2012;125:759–71.69. Crossa J, Beyene Y, Kassa S, Pérez P, Hickey JM, Chen C, et al. Genomicprediction in maize breeding populations with genotyping-by-sequencing.G3 Genes| Genomes| Genet Genomes| Genet. 2013;3:1903–26.70. Lindgren D, Mullin TJ. Balancing gain and relatedness in selection.Silvae Genet. 1997;46:124–9.71. Lindgren D, Gea L, Jefferson P. Loss of genetic diversity monitored by statusnumber. Silvae Genet. 1996;45:52–9.72. Caballero A, Toro M. Interrelations between effective population size andother pedigree tools for the management of conserved populations.Genet Res. 2000;75:331–43.


Citation Scheme:


Citations by CSL (citeproc-js)

Usage Statistics



Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            async >
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:


Related Items