UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Exploring genomic selection in conifers Ratcliffe, Blaise Atom 2019

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata

Download

Media
24-ubc_2019_may_ratcliffe_blaise.pdf [ 5.26MB ]
Metadata
JSON: 24-1.0377775.json
JSON-LD: 24-1.0377775-ld.json
RDF/XML (Pretty): 24-1.0377775-rdf.xml
RDF/JSON: 24-1.0377775-rdf.json
Turtle: 24-1.0377775-turtle.txt
N-Triples: 24-1.0377775-rdf-ntriples.txt
Original Record: 24-1.0377775-source.json
Full Text
24-1.0377775-fulltext.txt
Citation
24-1.0377775.ris

Full Text

  EXPLORING GENOMIC SELECTION IN CONIFERS by  Blaise Atom Ratcliffe B.Sc., The University of British Columbia, 2013  A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF  DOCTOR OF PHILOSOPHY  in  THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Forestry)   THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)  March 2019  © Blaise Atom Ratcliffe, 2019  ii The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the dissertation entitled: EXPLORING GENOMIC SELECTION IN CONIFERS  submitted by Blaise Atom Ratcliffe in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Forestry  Examining Committee: Dr. Yousry El-Kassaby, Forestry Supervisor  Dr. Charles Chen, Oklahoma State University Supervisory Committee Member  Dr. Michael Stoehr, BC MFLNRO Supervisory Committee Member Dr. Tongli Wang, Forestry University Examiner Dr. Loren Rieseberg, Botany University Examiner  Additional Supervisory Committee Members: Dr. Jaroslav Klápště, New Zealand Forest Research Institute Ltd., Scion Supervisory Committee Member  Supervisory Committee Member iii Abstract Breeding conifer species for phenotypic improvement is challenging due to delayed expression of important phenotypes related to productivity and their late sexual maturity, causing long recurrent selection cycles. Genomic selection (GS) can address such shortcomings through early prediction of phenotypes based on large numbers of jointly considered genomic markers, typically, single nucleotide polymorphisms (SNPs). Additionally, current conifer breeding genetic evaluations are based on pedigree-based predictions. However, the maximization of genetic gain in breeding programs is contingent on the accuracy of the predicted breeding values and precision of the estimated genetic parameters, which can also be improved using GS. While GS has become a new paradigm in animal breeding, it is still in its infancy for tree improvement. Thus, GS requires validation before it can be operationally implemented. Collectively, this dissertation explores some of the challenges associated with the application of GS in forest tree improvement programs. Namely, the efficiency of GS compared to traditional phenotypic selection, methods to implement GS in a cost-efficient manner, and the prediction accuracy (PA) of phenotypes across generations, life-stages, and environments. To address these challenges I structured this dissertation into three analyses which use several GS methodologies, three genotyping platforms, and three conifer species. The first study explores the temporal decay and relative efficiency of GS PA for interior spruce (Picea engelmannii × glauca). The second study investigates the use of single-step GS (ssGBLUP) to improve the precision and accuracy of genetic parameter estimates for white spruce (Picea glauca). The third study focuses on the combined use of ssGBLUP and climate data to improve intra- and inter-generation PA in unobserved environments for Douglas-fir (Pseudotsuga menziesii). The results from these three studies demonstrated that: i) updating GS models requires iv phenotypic data at least mid-rotation age to accurately reflect mature growth traits, ii) the relative efficiency of GS is greater than traditional selection assuming a 25% reduction in breeding cycle length, iii) ssGBLUP is an effective tool for improvement in the genetic evaluation of open-pollinated mating designs, and iv) inclusion of climate variables as environmental covariates in the GS models yields improvement in PA for unobserved environments. v Lay Summary Intentional selection of traits (phenotypes) by humans is a proven method for their improvement. For long‐lived perennial species such as conifer trees, the time investment required to complete a selection cycle can be decades in length. The time requirement of this process is crucial to allow the collection of reliable phenotypic information and perform selections. Genomic selection (GS) is a technology that can immediately improve the time efficiency of forest tree improvement programs. GS uses many DNA markers to predict phenotypes, thereby eliminating lengthy selection cycles and offering faster improvement for traits of importance. Validation of GS technology is required before it can be implemented. Collectively, this dissertation explores some of the challenges associated with the application of GS in forest tree improvement programs. Namely, the efficiency of GS compared to traditional phenotypic selection, implementing GS in a cost-efficient manner, and prediction of phenotypes across generations, life stages, and environments. vi Preface This dissertation is original, independent work by the author, B. Ratcliffe. I conceived of the research questions and objectives through much consultation of my supervisory committee. I collaborated with colleagues in the data collection stages. I performed the data analysis, interpretation of results, and prepared the manuscripts. Sections of this dissertation have been submitted for publication, published or accepted in peer-reviewed journals, listed below:  • Chapter 2 o Ratcliffe, B., El-Dien, O. G., Klápště, J., Porth, I., Chen, C., Jaquish, B., & El-Kassaby, Y. A. (2015). A comparison of genomic selection models across time in interior spruce (Picea engelmannii × glauca) using unordered SNP imputation methods. Heredity, 115(6), 547. • Chapter 3 o Ratcliffe, B., El-Dien, O. G., Cappa, E. P., Porth, I., Klápště, J., Chen, C., & El-Kassaby, Y. A. (2017). Single-step BLUP with varying genotyping effort in open-pollinated Picea glauca. G3: Genes, Genomes, Genetics, 7(3), 935-942. • Chapter 4 o Ratcliffe, B., Thistlewaite, F., El-Dien, O. G., Cappa, E. P., Porth, I., Klápště, J., Chen, C., Wang, T., Stoehr, M., & El-Kassaby, Y.A. (2019). Inter- and Intra-generation genomic predictions of Douglas-fir growth in unobserved environments. Submitted. vii Table of Contents  Abstract ......................................................................................................................................... iii Lay Summary .................................................................................................................................v Preface ........................................................................................................................................... vi Table of Contents ........................................................................................................................ vii List of Tables ............................................................................................................................... xii List of Figures ............................................................................................................................. xvi List of Abbreviations ................................................................................................................. xix Acknowledgements .................................................................................................................... xxi Dedication .................................................................................................................................. xxii Chapter 1: Introduction ................................................................................................................1 1.1 Classical tree improvement strategies ............................................................................. 1 1.2 Marker-assisted selection and genomic selection ........................................................... 3 1.3 Factors influencing prediction accuracy ......................................................................... 5 1.4 Statistical methodologies for genomic selection ............................................................. 7 1.5 Genomic selection in forest trees .................................................................................... 8 1.6 Research objectives ......................................................................................................... 8 1.7 Dissertation overview ..................................................................................................... 9 1.7.1 Chapter 2: Comparison genomic selection models across time in interior spruce (Picea engelmannii × glauca) using unordered SNP imputation methods ............................. 9 1.7.2 Chapter 3: Single-step BLUP with varying genotyping effort in open-pollinated Picea glauca.......................................................................................................................... 10 viii 1.7.3 Chapter 4: Inter- and intra-generation genomic predictions for Douglas-fir growth in unobserved environments ..................................................................................................... 10 Chapter 2: Comparing genomic selection models across time in interior spruce (Picea engelmannii × glauca) using unordered SNP imputation methods .........................................15 2.1 Introduction ................................................................................................................... 15 2.2 Materials and Methods .................................................................................................. 18 2.2.1 Genetic material ........................................................................................................ 18 2.2.2 Phenotypic data and MBLUP analysis ..................................................................... 18 2.2.3 SNP genotyping and missing data imputation .......................................................... 20 2.2.4 Genomic selection ..................................................................................................... 20 2.2.4.1 Ridge regression BLUP (rrBLUP) .................................................................... 21 2.2.4.2 Generalized ridge regression (GRR) ................................................................. 21 2.2.4.3 BayesCπ (BCπ) ................................................................................................. 22 2.2.4.4 Cross-validation, PA and relative efficiency of GS .......................................... 22 2.3 Results ........................................................................................................................... 24 2.3.1 Phenotypic and MBLUP analysis ............................................................................. 24 2.3.2 Prediction accuracy ................................................................................................... 24 2.3.2.1 Imputation method and statistical approach ..................................................... 24 2.3.2.2 Prediction accuracy across time ........................................................................ 25 2.3.3 Distribution of SNP effects ....................................................................................... 25 2.3.4 Relative efficiency .................................................................................................... 26 2.4 Discussion ..................................................................................................................... 27 2.4.1 Accuracy of GS prediction through time .................................................................. 27 ix 2.4.2 Model comparison .................................................................................................... 28 2.4.3 GBS and marker imputation ..................................................................................... 30 2.4.4 Relative efficiency .................................................................................................... 31 2.5 Summary ....................................................................................................................... 34 2.5.1 Background ............................................................................................................... 34 2.5.2 Results ....................................................................................................................... 34 2.5.3 Conclusions ............................................................................................................... 35 Chapter 3: Single-step BLUP with varying genotyping effort in open-pollinated Picea glauca ............................................................................................................................................42 3.1 Introduction ................................................................................................................... 42 3.2 Materials and Methods .................................................................................................. 44 3.2.1 White spruce open-pollinated progeny test, phenotype data, and genotyping ......... 44 3.2.2 Data analyses ............................................................................................................ 45 3.3 Results ........................................................................................................................... 48 3.3.1 Relationship matrices ................................................................................................ 48 3.3.2 Variance components ................................................................................................ 49 3.3.3 Breeding value accuracy ........................................................................................... 51 3.3.4 Candidate rankings.................................................................................................... 51 3.4 Discussion ..................................................................................................................... 52 3.5 Summary ....................................................................................................................... 57 3.5.1 Background ............................................................................................................... 57 3.5.2 Results ....................................................................................................................... 57 3.5.3 Conclusions ............................................................................................................... 58 x Chapter 4: Inter- and intra-generation genomic predictions for Douglas-fir growth in unobserved environments ...........................................................................................................64 4.1 Introduction ................................................................................................................... 64 4.2 Materials and Methods .................................................................................................. 67 4.2.1 Study populations...................................................................................................... 67 4.2.1.1 Study population 1 (SP1) .................................................................................. 67 4.2.1.2 Study population 2 (SP2) .................................................................................. 68 4.2.2 SNP genotyping ........................................................................................................ 68 4.2.3 Environmental covariates (ECs) ............................................................................... 69 4.2.4 Pedigree-based analyses (PBLUP) ........................................................................... 69 4.2.5 Genomic-based analyses (ssGBLUP) ....................................................................... 71 4.2.5.1 Model 1 (M1): ................................................................................................... 72 4.2.5.2 Model 2 (M2): ................................................................................................... 73 4.2.5.3 Model 3 (M3): ................................................................................................... 73 4.2.5.4 Model 4 (M4): ................................................................................................... 73 4.2.5.5 Model 5 (M5): ................................................................................................... 74 4.2.5.6 Prediction accuracy and cross-validation.......................................................... 74 4.2.5.7 Estimation of GEBV for the validation population .......................................... 74 4.3 Results ........................................................................................................................... 77 4.3.1 Pedigree-based analyses (PBLUP) ........................................................................... 77 4.3.2 Genomic-based analyses (ssGBLUP) ....................................................................... 77 4.3.3 Genomic prediction accuracy ................................................................................... 79 4.3.3.1 Intra-generation (GA1) ..................................................................................... 79 xi 4.3.3.2 Inter-generation (GA2) ..................................................................................... 79 4.4 Discussion ..................................................................................................................... 80 4.5 Summary ....................................................................................................................... 86 4.5.1 Background ............................................................................................................... 86 4.5.2 Results ....................................................................................................................... 86 4.5.3 Conclusions ............................................................................................................... 86 Chapter 5: Conclusion .................................................................................................................94 5.1 Research novelties ........................................................................................................ 94 5.2 Research contributions .................................................................................................. 94 5.3 Future research directions ............................................................................................. 96 References .....................................................................................................................................99 Appendix A Supplementary information (Chapter 2) ............................................................ 121 Appendix B Supplementary information (Chapter 3) ............................................................. 126 Appendix C Supplementary information (Chapter 4) ............................................................. 134  xii List of Tables Table 1.1 Summary of empirical genomic selection studies concerning forest trees. .................. 13 Table 2.1 Sample size (n) and descriptive statistics for interior spruce height (m) used in the pedigree-based analysis. ............................................................................................................... 36 Table 2.2 Pedigree-based estimates of variance components for interior spruce height (m) at ages 3, 6, 10, 15, and 30 (years), and the estimated narrow-sense individual tree heritabilities (ℎ2), age-age genetic correlations with age 40 (𝑟𝑖𝑗), mean estimated individual breeding value accuracies (𝑟(𝐸𝐵𝑉, 𝑇𝐵𝑉)). ............................................................................................................................. 36 Table 2.3 Genomic prediction accuracy obtained from rrBLUP, GRR, and BCπ analytical approaches for SVD, KNN, and M60 imputation methods for interior spruce tree height at ages 3, 6, 10, 15, 30, and 40 (years). ......................................................................................................... 37 Table 2.4 Age 40 temporal genomic prediction accuracy obtained from rrBLUP, GRR, and BCπ analytical approaches for SVD, KNN, and M60 imputation methods for interior spruce tree height at ages 3, 6, 10, 15, and 30 (years). ............................................................................................... 37 Table 2.5 Predictive accuracy and relative efficiency from cross-validated models using realized (GEBV GS ) and average (EBV TS ) relationship matrices for early (10 and 15 years) and mature (30 and 40 years) interior spruce tree height. ...................................................................................... 38 Table 3.1 Descriptive statistics of the pedigree (A) relationship estimator, and combined pedigree-marker (H) relationship estimator at 0, 25, 50, 75, and 100% genotyping effort. ........................ 59 Table 3.2 Pedigree-based (A) and combined pedigree-marker based (H) estimates of variance components for white spruce tree height (HT: cm) and wood density (WD: kg/m3), with associated estimated narrow-sense individual tree heritabilities (ℎ2), additive genetic correlations (rG), and Akaike information criterion (AIC) model fit statistic. ................................................................ 60 xiii Table 3.3 Mean breeding value accuracy estimates (𝑟) for white spruce tree height (HT: cm) and wood density (WD: kg/m3) using pedigree-based (A) relationship estimator, and combined pedigree-marker (H) relationship estimator at 0, 25, 50, 75, and 100% genotyping effort for three classes of individuals: maternal parent, genotyped progeny, non-genotyped progeny. ............... 61 Table 3.4 Proportion of common candidates in the top 5% of maternal parent rankings (above diagonals), and estimated Spearman breeding value rank correlations (diagonals and below diagonals) for pedigree-based (A) relationship estimator, and combined pedigree-marker (H) relationship estimators at 0, 25, 50, 75, and 100% genotyping effort for white spruce tree height (HT: cm) and wood density (WD: kg/m3). ................................................................................... 62 Table 3.5 Proportion of common candidates in the top 5% of progeny rankings (above diagonals), and estimated Spearman breeding value rank correlations (diagonals and below diagonals) for pedigree-based (A) relationship estimator, and combined pedigree-marker (H) relationship estimators at 0, 25, 50, 75, and 100% genotyping effort for white spruce tree height (HT: cm) and wood density (WD: kg/m3). .......................................................................................................... 63 Table 4.1 Experimental population summaries for pedigree-based analyses. .............................. 88 Table 4.2 Training (T) and validation (V) population summaries for genomic analyses. ............ 89 Table 4.3 Pedigree-based single environment narrow-sense heritabilities (diagonal) and Type-B genetic correlations (lower triangle) for MAI, standard errors in parentheses. ............................ 90      xiv Table 4.4 Model Deviance Information Criterion (DIC), posterior means and their 95% highest posterior density (HPD) intervals in brackets for the additive variance (𝜎𝑎2), replicate within environment variance (𝜎𝑟2), environmental variance (𝜎𝑠2), environment covariates variance (𝜎𝑤2 ), the interaction between additive effects and environment covariates (𝜎𝑎𝑤2 ), the variance of the interaction between additive and environmental effects (𝜎𝑎𝑆2 ), residual variance (𝜎𝑒2), heritability (ℎ2). ............................................................................................................................................... 91 Table 4.5 Intra- (GA1) and inter-generation (GA2) cross-validation prediction accuracies (𝑟(𝐺𝐸𝐵𝑉, 𝐸𝐵𝑉)) with standard errors (𝑆𝐸). See text for models´ abbreviations. ........................ 92 Table B.1 Combined pedigree-marker based (H) estimates of variance components for white spruce tree height (cm) and wood density (kg/m3), with associated estimated narrow-sense individual tree heritabilities (ℎ2), additive genetic correlations (𝑟), and Akaike information criterion (AIC) model fit statistic for 30 random samples at 25% genotyping effort. ................ 126 Table B.2 Combined pedigree-marker based (H) estimates of variance components for white spruce tree height (cm) and wood density (kg/m3), with associated estimated narrow-sense individual tree heritabilities (ℎ2), additive genetic correlations (𝑟), and Akaike information criterion (AIC) model fit statistic for 30 random samples at 50% genotyping effort. ................ 127 Table B.3 Combined pedigree-marker based (H) estimates of variance components for white spruce tree height (cm) and wood density (kg/m3), with associated estimated narrow-sense individual tree heritabilities (ℎ2), additive genetic correlations (𝑟), and Akaike information criterion (AIC) model fit statistic for 30 random samples at 75% genotyping effort. ................ 128 Table C.4 Number of parent-parent (F1-F1 / F2-F2) or parent-grandparent (F1-F2) (above diagonal) and families (below diagonal) in common among the environments for study population 1 (SP1). ....................................................................................................................................... 137 xv Table C.5 Number of F2 parents within the F1 environments. .................................................. 137 Table C.6 List of monthly (from January/01 to December/12) climatic variables obtained from ClimateBC v5.51 (Wang et al. 2012b) used in the genomic analyses. ....................................... 138   xvi List of Figures Figure 1.1 Schematic of a classical recurrent selection program for forest tree improvement. ... 12 Figure 2.1 GS prediction accuracies from models trained using interior spruce height data at ages 3, 6, 10, 15, 30 and 40 years and validated against EBV at the training age. Shape and pattern in left panels represent imputation methods for 60% missing genomic data (M60, triangle/dash; KNN, circle/dash-dot; SVD, square/dot). Shape and pattern in right panels represent genomic prediction methods (GRR, circle/dash; BCπ, square/dash-dot; rrBLUP, triangle/dot). ............... 39 Figure 2.2 Left panels: Temporal GS prediction accuracy for three models (rrBLUP, GRR, BCπ) trained using interior spruce height data at ages 3, 6, 10, 15, 30 and 40 years and validated against EBV at age 40 years. Right panels: Relationship between age-age genetic correlations from the MBLUP model and temporal GS prediction accuracy from the three GS models (rrBLUP, GRR, BCπ). Shape and patterns in both panels represent imputation methods for 60% missing genomic data (M60, triangle/dash; KNN, circle/dash-dot; SVD, square/dot). ............................................ 40 Figure 2.3 Comparison of SNP marker effects by kernel density plots (diagonal), scatter plots (lower triangle), and Pearson (𝑟) and Spearman (𝜌) correlations (upper triangle) for three imputation methods (KNN, red; SVD, blue; M60, green) and three analytical approaches (rrBLUP, top; GRR, middle; BCπ, right) estimated for tree height at age 15 years in interior spruce. ....... 41 Figure 4.1 Intra-generation (GA1) and inter-generation (GA2) predictions of EBV. .................. 93 Figure A.1 Comparison of SNP marker effects by kernel density plots (diagonal), scatter plots (lower triangle), and Pearson (𝑟) and Spearman (𝜌) correlations (upper triangle) for three imputation methods (KNN, red; SVD, blue; M60, green) and three analytical approaches (rrBLUP, top; GRR, middle; BCπ, right) estimated for tree height at age 3 year in interior spruce. ......... 121 xvii Figure A.2 Comparison of SNP marker effects by kernel density plots (diagonal), scatter plots (lower triangle), and Pearson (𝑟) and Spearman (𝜌) correlations (upper triangle) for three imputation methods (KNN, red; SVD, blue; M60, green) and three analytical approaches (rrBLUP, top; GRR, middle; BCπ, right) estimated for tree height at age 6 year in interior spruce. ......... 122 Figure A.3 Comparison of SNP marker effects by kernel density plots (diagonal), scatter plots (lower triangle), and Pearson (𝑟) and Spearman (𝜌) correlations (upper triangle) for three imputation methods (KNN, red; SVD, blue; M60, green) and three analytical approaches (rrBLUP, top; GRR, middle; BCπ, right) estimated for tree height at age 10 year in interior spruce. ....... 123 Figure A.4 Comparison of SNP marker effects by kernel density plots (diagonal), scatter plots (lower triangle), and Pearson (𝑟) and Spearman (𝜌) correlations (upper triangle) for three imputation methods (KNN, red; SVD, blue; M60, green) and three analytical approaches (rrBLUP, top; GRR, middle; BCπ, right) estimated for tree height at age 30 year in interior spruce. ....... 124 Figure A.5 Comparison of SNP marker effects by kernel density plots (diagonal), scatter plots (lower triangle), and Pearson (𝑟) and Spearman (𝜌) correlations (upper triangle) for three imputation methods (KNN, red; SVD, blue; M60, green) and three analytical approaches (rrBLUP, top; GRR, middle; BCπ, right) estimated for tree height at age 40 year in interior spruce. ....... 125 Figure B.6 Heat map of the pedigree-based relationship estimator (A). ..................................... 129 Figure B.7 Heat map of the combined pedigree-marker based relationship estimator (H) at 25% genotyping effort for a single random sample. ........................................................................... 130 Figure B.8 Heat map of the combined pedigree-marker based relationship estimator (H) at 50% genotyping effort for a single random sample. ........................................................................... 131 Figure B.9 Heat map of the combined pedigree-marker based relationship estimator (H) at 75% genotyping effort for a single random sample. ........................................................................... 132 xviii Figure B.10 Heat map of the combined pedigree-marker based relationship estimator (H) at 100% genotyping effort. ........................................................................................................................ 133 Figure C.11 Three generation coastal Douglas-fir contemporary pedigree containing the inter-generation validation population of study. Circles represent single individuals, octagons represent full-sibling families. Red colours are the unrelated (assumed) base population of wild plus tree selections (P0), green colours are the forward selection progenitors (F1) of the Jordan (purple, F2-2) and Bigtree (yellow, F2-3) validation populations (F2) respectively. The pedigree was produced using Helium software (Shaw et al. 2014). ................................................................................. 134 Figure C.12 Scatterplot of mean absolute marker effects from the cross-validation analysis. ... 135 Figure C.13 Scatterplots of mean rank of predicted trees with Spearman rank correlation coefficients from the cross-validation analyses for all predictions (𝜌1) and top 40% of predictions (𝜌2) within environments Adam (A), Lost (B), Fleet (C), Jordan (D), and Bigtree (E) comparing the base model (M1, x-axis) and fully specified model (M5, y-axis). Deviation from the diagonal line implies a rank change. .......................................................................................................... 136 xix List of Abbreviations BLUE      best linear unbiased estimator BLUP       best linear unbiased prediction CV       coefficient of variation EBV       estimated breeding value EC      environmental covariate GBLUP      genomic best linear unbiased predictor GBS      genotyping by sequencing GEBV      genomic estimated breeding value GLS      generalized least squares GRR       generalized ridge regression GS      genomic selection HBLUP     hybrid best linear unbiased predictor KNN       k-nearest-neighbour LD      linkage disequilibrium MAS      marker assisted selection MM      mixed model OLS      ordinary least squares PA       prediction accuracy QTL      quantitative trait loci REML      restricted maximum likelihood rrBLUP      ridge regression best linear unbiased predictor SNP      single-nucleotide polymorphism xx ssGBLUP      single-step genomic best linear unbiased predictor SVD       singular value decomposition TBV      true breeding value xxi Acknowledgements Disclaimer: This thesis was not a solo journey. Through the cumulative effect of many factors including mentoring, friendship, and love this thesis was made possible.  Thank you to my lab mates: Susan Song Dr. Omnia Gamal El-Dien Ibrahim Dr. Yang Liu Dr. Faisal Alharbi Frances Thistlethwaite   Thank you to my colleagues: Dr. Michael Stoehr Dr. Jaroslav Klápště Dr. Ilga Porth Dr. Charles Chen Dr. Eduardo Pablo Cappa  Thank you, Family ♡ Thank you, Sierra           (this is just the beginning ♡)   …And most of all, thank you to my supervisor Dr. Yousry El-Kassaby for your patience and seemingly endless supply of opportunity, encouragement, mentorship, support, and humor. xxii Dedication     1  Chapter 1: Introduction 1.1 Classical tree improvement strategies  Large scale recurrent selection tree improvement programs began in the 1950s for important economic forest tree species (Figure 1.1) (Zobel and Talbert 1984). Since then, rapid improvement of the statistical analyses and experimental design of tree improvement programs has occurred, largely due to progress in animal and agricultural plant breeding systems. Genetic tests are the central focus of most tree improvement programs as they allow the ascertainment of the genetic architecture of traits of interest (White et al. 2007). The genetic architecture here refers to population-level parameters such as the amount of genetic and phenotypic variation of a trait, estimates of heritability, genetic by environment interactions, genetic correlations between trait pairs, and juvenile-mature genetic correlations (White et al. 2007). These parameters are determined when selected trees are bred in a structured design to create families, and the resulting progeny are planted in a randomized and replicated experimental design suitable for statistical analysis.  For tree improvement programs, these parameters help guide which traits are suitable for development as well as the selection of superior individuals for advanced generations. Selections can be made through forward or backward selection. Backward selection refers to choosing parents based on progeny performance, while forward selection refers to the selection of offspring. Generally, early generation improvement programs practice backward selection while advanced generation programs use forward selections. Selections are based on the ranking of progeny and parents by their breeding value as a deviation from a population mean of zero (White et al. 2007). By definition, the breeding value is the average allelic effect passed from parent to offspring, thus an individual’s breeding value is equal to the average of their two parents (White et al. 2007). 2  Linear statistical models form the foundation of quantitative genetic analyses of breeding programs, allowing partitioning of observed phenotypic variation into genetic and non-genetic components (Mrode 2014). One of the greatest contributions came from the development of mixed model (MM) theory by (Henderson 1953). Henderson’s MMs provided a more robust method for the analysis of complex breeding programs by improving on several insufficiencies of ordinary least squares (OLS) analysis. The key to MM analyses is that it permits the analysis of unbalanced datasets with multiple-generations and various mating designs (White et al. 2007). Further, it can account for inbreeding and deviations from panmixia (White et al. 2007). These advantages make MM analyses the statistical tool of choice for modern breeders over OLS analyses (Mrode 2014). The crucial difference between OLS and MM methods is the presence of both fixed and random terms in the MMs and the handling of genetic effects as random effects. This assumption allows for best linear unbiased prediction (BLUP) estimates of the random genetic components of the model, while fixed components are estimated using generalized least squares (GLS) to produce best linear unbiased estimates (BLUE). BLUE and BLUP estimates have the distinct advantage over OLS estimates in that they are weighted depending on the quantity and quality of data available ensuring that the not only the best performing but the most tested individuals are ranked highest (White et al. 2007). Robust methods for estimating variance components that were suitable to the complex nature of breeding programs were also needed. The restricted maximum likelihood (REML) method can account for the previously described complexities of breeding programs (Lynch and Walsh 1998). Collectively, BLUP, BLUE, GLS, and REML, form the foundation of MM analysis, and as such are widely implemented in many useful software packages, such as ASReml (Gilmour et al. 2009).  3  With the development of the previous tools and computational power, Henderson (1976) introduced the animal model which allowed for correlated random effects. A key advantage of the animal model was the permission of the additive relationship matrix (𝑨), which allows BLUP estimated breeding values (EBV) to be based on information from all relatives (i.e., information sharing) (Thompson 2008). Likewise, modern tree improvement relies heavily on incorporating structured pedigree information in to linear mixed model analysis for prediction of individual EBVs. The average numerator relationship matrix (𝑨), describes the expected additive genetic relationship between individuals involved in the analysis. This type of analysis is commonly termed ‘ABLUP’ or ‘PBLUP’ and is routinely used to estimate EBVs.  1.2 Marker-assisted selection and genomic selection In the past, marker-assisted selection (MAS) has been considered as a selection strategy for forest tree breeding to exploit linkage disequilibrium (LD) between quantitative trait loci (QTL) and genetic markers (White et al. 2007). The MAS strategy was primarily appealing to breeders as a method to decrease the length of the extensive tree breeding cycle, which can take upwards of 30 years for a single cycle (Isik 2014). It was thought that few markers of large effect that are in LD with QTL would provide sufficient information for prediction of phenotypes. However, MAS has generally not been a rewarding endeavor in forest tree breeding programs due to several limitations of the method (Strauss et al. 1992). The primary constraint that has withheld MAS from operational use in forest trees is the low proportion of the phenotypic variance accounted for by the relatively small number of statistically significant markers used in the analysis (White et al. 2007). This limitation results primarily from the infinitesimal genetic architecture (i.e., complex) of most economic or growth-related traits (Hill et al. 2008). Further limitations to MAS in forest trees include low levels of LD (Neale and Savolainen 2004), and, environment and lineage-specific 4  QTL interactions caused by the overestimation of QTL effects (Beavis 1998). The latter limitation constrains the transferability of estimated marker effects between environments and genetic lineages respectively, rendering phenotypic predictions to be inaccurate if violated. Genomic selection (GS) is fundamentally different from MAS through its simultaneous use of phenotypes and dense set of markers (thousands), which are implemented without the a priori assumption concerning marker significance (Meuwissen et al. 2001). GS is thus able to capture more phenotypic variation in traits with complex inheritance since it is assumed that at least some of the very many fitted markers will be in LD with some of the QTL of the desired trait (Resende et al. 2012b). Thus, the novel GS approach combines the phenotypic and genetic information of a training population to develop a prediction model that produce genomic estimated breeding values (GEBV) for selection candidates, requiring only their genotypic information (Meuwissen et al. 2001). This method has the potential to circumvent the need for the long testing phase that forest trees require to attain accurate phenotypic data for traditional pedigree-based estimation of EBVs, resulting in greater genetic gain per unit time. Additionally, with the use of the realized relationship matrix (𝑮) (VanRaden 2008), GS offers increased accuracy when considering the genetic variance of the breeding population and can assist in the management of genetic diversity and within family selection (Heffner et al. 2009). The genetic relationships described by 𝑨 are based on expected values, however, relatedness within families can deviate from this expectation due to the Mendelian segregation of alleles, which is better captured through the use of 𝑮 (Heffner et al. 2009). This tool is especially useful in open-pollinated mating designs where the assumption of half-sib families fails often fails due to the presence of cryptic relatedness (i.e., full-sibs, self-sibs, and self-halfs within families, and historical relatedness) (Namkoong 1966; Squillace 1974). Ultimately, the use of 𝑮 allows for more 5  accurate estimates of genetic parameters, the ability to more accurately monitor the genetic variance of the breeding population, and ultimately capture more genetic gain for traits of interest.  1.3 Factors influencing prediction accuracy GS prediction models are characteristically evaluated by their ability to accurately predict corrected phenotypes or EBV of genotyped individuals. K-fold cross-validation is a widely used method of evaluating the prediction accuracy (PA) of a model (Lorenz et al. 2011). The experimental population is split into K number of folds and the genomic prediction model is then trained using the phenotypic and genetic data from the K-1 folds portion of the population. The remaining fold is used for validation of the model by calculating the average Pearson correlation coefficient between the predicted phenotype or GEBV and the corrected phenotype or an estimate of the true breeding value (TBV), respectively (Lorenz et al. 2011). This process is replicated to allow for the calculation of the standard error of prediction. Ideally, to fully generalize the genomic prediction model, the restriction of factors such as environment and genetic relatedness between the training and validation population is required when assessing the PA, however, this can depend on the rationale of constructing the prediction model. Several factors have been identified that influence the PA of GS prediction models: i) extent of LD between genetic markers and QTL, ii) 𝑁𝑒, iii) number of genetic markers, iv) heritability of the trait, v) training population size, and vi) distribution of QTL effects (Hayes et al. 2009; Grattapaglia and Resende 2011; Combs and Bernardo 2013). Grattapaglia (2014) noted that these factors are not independent and are strongly interconnected, affecting the PA of genomic prediction models jointly. Isik (2014) and Grattapaglia (2014) both comment that of the listed factors, 𝑁𝑒, number of markers, and training population size can be efficiently regulated by the tree breeder. 6  Using deterministic simulation, Grattapaglia and Resende (2011) showed that genomic PA is highly dependent on 𝑁𝑒 and the number of genetic markers used (i.e., marker density). The relationship between these two parameters essentially describes the extent of marker-QTL LD in the population. 𝑁𝑒 can be simply defined as the number of individuals who contribute gametes to the next generation. Thus, lower 𝑁𝑒 in theory should require lower marker density, since the effect of drift is more efficient to produce stronger non-random association (i.e., LD) between markers and QTL (Grattapaglia 2014). For breeding populations where 𝑁𝑒 is less than 60, Grattapaglia and Resende (2011) determined that a marker density of 2-3 markers/cM would be sufficient to obtain accuracy at least as good as traditional pedigree-based selection (i.e., ABLUP), assuming 100 QTLs controlling the trait of interest. Similarly, it was determined for 𝑁𝑒 up to 100, 10-20 markers/cM would be required depending on the number of QTLs controlling the trait. In conifers, this generally translates to a requirement of 20,000-60,000 markers which is now easily obtainable through current generation sequencing and genotyping platforms (Grattapaglia 2014).  Additionally, training population size was presented by Grattapaglia and Resende (2011) as having little impact on genomic PA when more than 1000 individuals were used. Further, the effect of trait heritability on genomic PA was minor when assuming 50 or 100 QTL (Grattapaglia and Resende 2011). This relationship confirmed that decreases in PA with decreasing trait heritability can be compensated for by increasing the training population size as suggested by (Meuwissen et al. 2001).  An important aspect of GS to consider is the persistence of PA over subsequent generations of breeding. The theoretical increase in genetic gain produced by GS hinges on the capacity of the prediction model to remain relevant in successive generations. For this to occur, PA must ideally be based on marker-QTL LD rather than kinship (Habier et al. 2013). The PA of genomic models 7  can be composed of a combination of the two factors, leading to inflated estimates of PA to occur without proper validation since the LD between marker-QTL is reduced only by rare recombination events (Habier et al. 2007, 2010, 2013). Thus, the GS prediction models should ideally be validated after genetic recombination has occurred in a progeny generation. 1.4 Statistical methodologies for genomic selection All GS methods are required to address the “large p, small n” problem where the number of markers (p) is greater than the number of samples (n), as well as model overfitting due to multicollinearity between markers (Lorenz et al. 2011). Two classes of statistical models are most widely used for GS in the literature, shrinkage models and variable selection models. Shrinkage models fit the effects of all markers regardless of the magnitude of their effect, whereas variable selection methods eliminate markers with effects near zero. Variable selection methods then operate under the assumption that a trait of concern is controlled by few to many loci (i.e., oligogenic), while shrinkage methods suppose a Fisher (1919) infinitesimal trait architecture (Lorenz et al. 2011). Simulation studies have shown that variable selection models provide improved PA for traits with oligogenic architecture, however, results from real data do not consistently show the same trend, justifying the need to test a wide variety of statistical models (Berger et al. 2015). The models can also be further subdivided based on the assumption of marker-specific variances. Homoscedastic marker variance models assume that each marker explains an equal proportion of phenotypic variance, whereas heteroscedastic effect models allow for unique marker variances. As previously discussed, genomic PA can be decomposed into kinship and pure marker-QTL LD components. Shrinkage models with homoscedastic effect variances, such as ridge regression BLUP, fit all markers into the prediction model. Consequently, models based under 8  these assumptions have been shown to produce a greater proportion of PA based on describing genetic relationships (i.e., kinship) (Habier et al. 2007). Conversely, models based on heteroscedastic marker variances and variable selection have been shown to provide stronger accuracy due to marker-QTL LD (Habier et al. 2007). Nevertheless, it is recommended to use common shrinkage models as a starting point for analysis due to their robustness (Lorenz et al. 2011). 1.5 Genomic selection in forest trees The potential for use of GS in forest trees was first explored by Grattapaglia and Resende (2011) through the use of deterministic simulation studies. More recently, fifteen empirical studies have all produced promising results in regard to the acceleration of the breeding cycle for many forest tree species; namely, eucalypts (Eucalyptus spp.) (Resende et al. 2012b; Tan et al. 2017), loblolly pine (Pinus taeda L.) (Resende et al. 2012a; c; Zapata-Valenzuela et al. 2012, 2013; Munoz et al. 2014), white spruce (Picea glauca (Moench) Voss) (Beaulieu et al. 2014a; b), black spruce (Picea mariana (Mill.) BSP) (Lenz et al. 2017), interior spruce (Picea glauca x engelmannii) (Gamal El-Dien et al. 2015), maritime pine (Pinus pinaster Aiton) (Isik et al. 2016; Bartholomé et al. 2016), coastal Douglas-fir (Pseudotsuga menziesii var. menziesii (Mayr) Franco) (Thistlethwaite et al. 2017), and Norway spruce (Picea abies (L.) Karst.) (Chen et al. 2018). Table 1.1 summarizes important information from these previous studies involving GS and forest trees. 1.6 Research objectives The research objectives of this dissertation are: • Assess the temporal decay in prediction accuracy (Chapter 2). • Evaluate the potential of genotyping by sequencing as a genotyping platform for GS (Chapter 2). 9  • Explore the potential for reducing genotyping effort and its impact on the accuracy and precision of genetic parameter estimates (Chapter 3). • Compare the effectiveness of genomic selection to classical tree improvement strategies (Chapter 2, Chapter 3). • Evaluate inter- and intra-generation prediction accuracy in unobserved environments (Chapter 4).  • Evaluate the potential of exome capture as a genotyping platform for GS (Chapter 4). 1.7 Dissertation overview The research objectives of this dissertation were explored in three separate studies, which have been divided into the following chapters: 1.7.1 Chapter 2: Comparison genomic selection models across time in interior spruce (Picea engelmannii × glauca) using unordered SNP imputation methods This chapter investigates the decay in PA of GS models as a function of time. Tree height at ages 3, 6, 10, 15, 30 and 40 years were used to construct GS prediction models and the PA between consecutive measurement years was analyzed. Variation in PA of the GS prediction models were explored with combinations of i) three methods of genetic marker imputation, representing variable genetic marker densities, and ii) three methods of marker effect estimation. A comparison of the relative efficiency, as the genetic gain per unit time, of GS to classical tree improvement was also performed. This chapter represents the first use of genotyping by sequencing genetic marker discovery in a large-scale genomic tree improvement study. 10  1.7.2 Chapter 3: Single-step BLUP with varying genotyping effort in open-pollinated Picea glauca This chapter introduces the single-step GBLUP (ssGBLUP) method for genomic selection. The ssGBLUP approach allows the incorporation of nongenotyped individuals into the GS prediction model to participate in breeding value predictions, estimation of marker effects, and estimation of genetic parameters. The aim was to assess the feasibility of reducing the genotyping effort of the required for GS to potentially increase its cost efficiency. Using variable proportions (0, 25, 50, 75, and 100%) of genotyped individuals within the breeding population, the fit GS models were compared based on their precision of genetic parameter estimates, accuracy of predicted breeding values, and rankings of selection candidates. This chapter represents the first application of ssGBLUP in a forest tree species. 1.7.3 Chapter 4: Inter- and intra-generation genomic predictions for Douglas-fir growth in unobserved environments In this chapter the ssGBLUP methodology of GS was applied to predict breeding values of siblings and progeny in unobserved environments. The ssGBLUP approach facilitated the inclusion of phenotypic data from 7 maintained progeny trials and 7 abandoned progeny trials representing 14 environments from across the breeding zone of coastal Douglas-fir (Pseudotsuga menziesii (Mirb.) Franco var. menziesii). Additionally, monthly averages of climate variables were used to explain phenotypic variation due to environmental and genotype by environment interaction effects. Hierarchical models with the successive inclusion of random effects to model environmental and genotype by environment variance were compared based on their genetic parameter estimates, model fit criteria, and inter- and intra-generation prediction accuracy for unobserved environments. This chapter represents the first use of environmental covariates to explain 11  environmental heterogeneity and genotype by environment interactions for the genomic prediction of breeding values in a forest tree species.12    Figure 1.1 Schematic of a classical recurrent selection program for forest tree improvement. 13  Table 1.1 Summary of empirical genomic selection studies concerning forest trees. Reference Breeding population Population size Species Traits Marker No. markers Statistical method PA Multi-generation Multi-environment Resende et al. (2012a) Full-sib 800 P. taeda L. G SNP 4,852 rrBLUP 0.63-0.75 N Y Resende et al. (2012b) Full-sib 738-920 E. grandis x E. urophylla and E. grandis, E. urophylla and E. globulus and their hybrids G, B, W DArT 3,129-3,564 rrBLUP 0.55-0.88 Y Y Resende et al. (2012c) Full-sib 951 Pinus taeda G, B, W, C, D SNP 4,853 rrBLUP, LASSO, BA, BC 0.17-0.51 N N Zapata-Valenzuela et al. (2012) Full-sib 149 Pinus taeda G, B, C SNP 3,406 rrBLUP 0.30-0.83 N N Zapata-Valenzuela et al. (2013) Full-sib 165 Pinus taeda G SNP 3,461 GBLUP 0.37–0.74 N N Beaulieu et al. (2014a) Half-sib 1,694 Picea glauca G, B, W, C SNP 6,385 rrBLUP 0.33-0.44 N N Beaulieu et al. (2014b) Half/Full-sib 1,748 Picea glauca G, W SNP 6,932 rrBLUP, BL 0.52-0.79 N Y 14  Reference Breeding population Population size Species Traits Marker No. markers Statistical method PA Multi-generation Multi-environment Gamal El-Dien et al. (2015) Half-sib 1,126 Picea glauca x engelmannii G, W SNP 8,868-62,198 rrBLUP, GRR 0.47-0.77 N Y Munoz et al. (2014) Full-sib 951 Pinus taeda G SNP 4,853 GBLUP 0.66–0.86 N N Isik et al. (2016) Half-sib 661 Pinus pinaster G, S SNP 2,500 GBLUP, BRR, BL 0.38-0.55 Y N Bartholome et al. (2016) Half/Full-sib 818 Pinus pinaster G, S SNP 4,332 GBLUP, BL 0.52-0.82 Y N Lenz et al. (2017) Full-sib 734 Picea mariana G, W SNP 4,993 rrBLUP 0.74-0.86 N Y Thistlethwaite et al. (2017) Full-sib 1,372 Pseudotsuga. menziesii G, W SNP 69,551 rrBLUP, GRR 0.78-0.92 N Y Tan et al.(2017) Full-sib 1,039 E.grandis, E.urophylla and hybrids G, B, W SNP 41,304 GBLUP, rrBLUP, BL, RKHS 0.12-0.47 Y N Chen et al. (2018) Full-sib 1,370 Picea abies G, W SNP 116,765 GBLUP, BRR, BL, RKHS 0.39-0.81 N Y NOTE: G-growth traits, B-biomass traits, W-wood quality traits, C-wood chemistry traits, D-disease resistance; S-stem form; GBLUP, genomic BLUP; rrBLUP, random regression BLUP; BL, Bayesian LASSO; RKHS, reproducing kernel Hillbert space; BC, BayesC; BA, BayesA; GRR, generalized ridge regression; BRR, Bayesian ridge regression.  15  Chapter 2: Comparing genomic selection models across time in interior spruce (Picea engelmannii × glauca) using unordered SNP imputation methods 2.1  Introduction The principal limitation in most tree-improvement programs is the time required for the completion of one cycle of breeding, testing, and selection. In some programs, it can take up to 30 years to complete a single cycle of breeding, specifically for traits with late expression patterns. Strategies to maximize genetic gain per unit time should then be the primary focus to rationalize the enormous spatial and economic requirements associated with forest tree-improvement practices (White et al. 2007). The concept of genomic selection (GS) (Meuwissen et al. 2001) has promised to reduce the time associated with breeding cycles and has since established itself as a paradigm in animal (Hayes et al. 2009) and plant (Heffner et al. 2009) breeding systems. Though this movement has yet to occur within a forest tree species context. The novel GS approach combines phenotypes and genotypes of a training population (TP) to develop a prediction model that estimates genomic breeding values (GEBV) for selection candidates, requiring only their genotypic information (Meuwissen et al. 2001). This method may circumvent the need for the long testing phase that forest trees require to attain accurate phenotypic data for traditional pedigree-based estimation of breeding values and offers a unique opportunity to substantially increase the response to selection through increasing the number of selection candidates. Previously, marker-assisted early selection (MAES) has been considered as a selection strategy for forest tree breeding to exploit the linkage disequilibrium (LD) between quantitative  16  trait loci (QTL) and genetic markers (White et al. 2007). However, MAES has not been rewarding in forest tree breeding programs owing to its severe limitations (Strauss et al. 1992). The primary constraint that has withheld MAES from use in forest trees is the low proportion of the phenotypic variance accounted for by the relatively low number of statistically significant markers used in the analysis (White et al. 2007). This limitation results primarily from the infinitesimal genetic architecture of most complex growth-related traits (Hill et al. 2008). Further limitations to MAES in forest trees include low levels of LD (Neale and Savolainen 2004) and strong QTL-environment and QTL-lineage interactions due to overestimation of the QTL effects (Beavis 1998). GS is fundamentally different from MAES through its simultaneous use of phenotypes and dense set of markers (thousands), which are implemented without a prior assumption concerning marker significance. GS is thus thought to capture more variation in traits with complex inheritance because it is assumed that at least some of the many fitted markers will be in LD with some of the QTL of the desired trait (Meuwissen et al. 2001). The GS method has been enabled via current generation sequencing technologies, their low per-sample cost and the use high-density single nucleotide polymorphism (SNP) genotyping platforms such as genotyping-by-sequencing (GBS) (Elshire et al. 2011). The GBS method is characterized by its use of methylation-sensitive restriction enzymes to reduce genome complexity, and high levels of multiplexing, which efficiently obtain genome-wide SNP markers. The GBS pipeline thus does not require prior genomic information, making it suitable for non-model species such as forest trees owing to their current lack of high-quality reference genomes (Elshire et al. 2011). Recently, (Chen et al. 2013) successfully demonstrated the suitability of GBS for SNP discovery in two economically important forest tree species, white spruce (Picea glauca (Moench) Voss) and lodgepole pine (Pinus contorta Dougl. ex. Loud.). The density of SNP  17  markers obtained by GBS can be increased substantially by tolerating high levels of missing data and the use of marker imputation (Crossa et al. 2013). However, the benefit of using imputation and the optimal imputation method has not yet been completely validated in GS studies that utilize GBS (Rutkoski et al. 2013). The potential for use of GS in forest trees was first explored by Grattapaglia and Resende (2011) through the use of deterministic simulation studies. More recently, empirical studies have all produced promising results in regard to the acceleration of the breeding cycle for three tree species; namely, eucalypts (Eucalyptus spp.) (Resende et al. 2012b), loblolly pine (Pinus taeda L.) (Resende et al. 2012a; c; Zapata-Valenzuela et al. 2012) and white spruce (Beaulieu et al. 2014a; b). This study represents a novel approach over the preceding studies through the application of the non-model organism GBS SNP discovery pipeline, in addition to high missing data ratio imputation methods to produce GS prediction models. In the present study, 769 40-year-old interior spruce (Picea engelmannii × glauca) trees from 25 elite open-pollinated families grown on two progeny test sites near Prince George, BC, Canada were randomly selected. Height at ages 3, 6, 10, 15, 30 and 40 were used to obtain estimates of pedigree-based breeding values (EBV), narrow-sense heritability, age-age genetic correlations, and in combination with SNP marker data, GS prediction models and associated GEBV. The prediction models were developed using three statistical approaches: ridge regression BLUP (rrBLUP), generalized ridge regression (GRR) and BayesCπ (BCπ). Further, the GBS pipeline for discovery of SNP markers for each of the 769 interior spruce trees. Three unordered high-density SNP imputation methods (K-Nearest Neighbor (KNN); Singular Value Decomposition (SVD); Mean Imputation (M60)) were used on a 60% missing data set to produce the SNP table and used to explore the effect of imputation method.  18  The objectives of this study are to: i) assess the prediction accuracy (PA) of GS at different ages for the complex trait, tree height, in interior spruce, ii) evaluate the temporal PA of GS models for purposes of model retraining, iii) explore variation of PA in the previous two objectives using combinations of three GS statistical approaches and three imputation methods, iv) consider the suitability of SNPs discovered through a GBS pipeline in conjunction with unordered high-density imputation, for GS in interior spruce, and v) assess the relative efficiency of GS to traditional pedigree-based BLUP selection. 2.2 Materials and Methods 2.2.1 Genetic material Fresh foliage was obtained post flush in spring 2013 from two 40-year-old interior spruce progeny trial sites, Quesnel and PGTIS, located within the Prince George Seed Planning Zone (SPZ) of North-central British Columbia, Canada. Tissues were sampled, separated, sealed and placed on ice prior to being stored at −80 °C until DNA extraction. Briefly, the two sites, PGTIS (Lat. 53.771639 N, Long. 122.718778 W, Elev. 610 m) and Quesnel (Lat. 52.990889 N, Long. 122.2085 W, Elev. 915 m), contain a total of 32 940 trees initially planted with 2-year-old container nursery stock in a randomized complete block design. The two sites are represented by the same 174 open-pollinated families each planted in 10-tree-row plots within 10 replicate blocks and 2.5 by 2.5 mm spacing. This study concerns 769 randomly selected trees from within a subset of 25 elite families based on breeding value for tree volume. 2.2.2 Phenotypic data and MBLUP analysis Tree height (m) at ages 3, 6, 10, 15, 30 and 40 years were used in this study. Mature tree heights were obtained using an ultrasonic clinometer Vertex III (Haglöf, Långsele, Sweden). The full data set consisting of a maximum of 29 475 trees from 174 open-pollinated families were used for  19  estimation of variance components, EBVs, age-age genetic correlations (𝑟𝑖𝑗), narrow-sense individual tree heritabilities and their respective standard errors. The following pedigree-based, multivariate polygenic model (MBLUP) (Mrode 2014) was implemented using ASReml v. 3.0 software (Gilmour et al. 2009): 𝒀𝑖 = 𝑿𝑖𝜷𝑖 + 𝒁𝑖𝒂𝑖 + 𝒁𝑖𝒃𝑖 + 𝒁𝑖𝒂𝒆𝒊 + 𝒆𝑖 (1) where Yi is the vector of phenotypes for height at the 𝑖𝑡ℎ year, and 𝑿𝑖 and 𝒁𝑖  are the incidence matrices relating observations of height at the 𝑖𝑡ℎ year to the vector of fixed site effect (𝜷𝑖), and vectors of random additive genetic effect (𝒂𝑖), block effect (𝒃𝑖), additive genetic by site effect (𝒂𝒆𝒊), and residual effect (𝒆𝑖). Assuming 𝒂𝑖~𝑁(0, 𝜎𝑎𝑖2 𝑨), where 𝜎𝑎𝑖2  is the additive genetic variance for the 𝑖𝑡ℎ trait and 𝑨 is the average numerator relationship matrix; 𝒃𝑖~𝑁(0, 𝜎𝑏𝑖2 𝑰), where 𝜎𝑏𝑖2  is the block variance for 𝑖𝑡ℎ year, and 𝑰 is the identity matrix; 𝒂𝒆𝑖~𝑁(0, 𝜎𝑎𝑒𝑖2 𝑰), where 𝜎𝑎𝑒𝑖2  is the additive genetic by site interaction variance for the 𝑖𝑡ℎ year; and 𝒆𝒊~𝑁(0, 𝜎𝑒𝑖2 𝑰), where 𝜎𝑒𝑖2  is the residual variance for the 𝑖𝑡ℎ year. The covariance matrix of the additive genetic term was modeled with a heterogeneous general correlation structure (‘CORGH’) in ASReml to directly obtain age-age genetic correlations (Gilmour et al. 2009). Estimates of individual tree narrow-sense heritability, ℎ̂𝑖2, for trait 𝑖 were calculated as the ratio of estimated additive variance (?̂?𝑎𝑖2 ) to total phenotypic variance (?̂?𝑎𝑖2 + ?̂?𝑎𝑒𝑖2 + ?̂?𝑒𝑖2 ) from equation (1). Accuracy of individual EBVs for height at age i obtained from the MBLUP model were estimated following (Dutkowski et al. 2002): 𝑟(𝐸𝐵𝑉,𝑇𝐵𝑉)𝑖 = √(1 − 𝑃𝐸𝑉/?̂?𝑎𝑖2 ) (2)  20  where 𝑃𝐸𝑉 is the individual tree prediction error variance (square of standard error), and ?̂?𝑎𝑖2  is the estimated additive genetic variance component of the ith trait from equation (1).  Standard errors of heritability and genetic correlation estimates were approximated using the Delta method (Lynch and Walsh 1998), using the standard error estimates for variance components obtained from ASReml. 2.2.3 SNP genotyping and missing data imputation SNP markers were discovered utilizing a GBS pipeline for non-model species; see Chen et al. (2013) for details. Additionally, the three SNP imputation methods evaluated in this study for SNP tables with 60% missing data were: mean imputation (M60), KNN with special family weighting (KNN) and SVD. The KNN and SVD imputation methods are described in detail by Gamal El-Dien et al. (2015). Mean imputation was carried out using the ‘A.mat’ function provided in the ‘rrBLUP’ R package and refers to imputation using the mean for each marker (Endelman 2011). 2.2.4 Genomic selection Three GS analytical approaches were assessed: a common shrinkage model, rrBLUP (Whittaker et al. 2000) and two variable selection models, GRR (Shen et al. 2013), and BayesCπ (BCπ) (Habier et al. 2011). Further, rrBLUP and BCπ are both homoscedastic effects models using common marker variances for shrinkage, whereas GRR is a heteroscedastic effects model (that is, marker-specific variances are used). SNP tables were coded as: aa=−1, Aa=0, AA=1, where a is the reference allele, and A is the alternative allele. All analyses were completed using R software (R-Core-Team 2014). The following base model was implemented (Moser et al. 2009): 𝑦𝑖 = 𝟏𝜇 + 𝑔(𝐱𝑖) +  𝑒𝑖 (3) where 𝑦𝑖 is the EBV of individual 𝑖 obtained from equation (1), 𝟏 is an incidence matrix of ones, 𝜇 is the overall mean, 𝐱𝑖 is a 1 ×  𝑝 vector of SNP genotypes for individual 𝑖, 𝑔(𝒙𝑖) is a function  21  to estimate the GEBV as the combined effect of 𝑝 SNP markers on the EBV of individual 𝑖, and 𝑒𝑖 is the residual error. 2.2.4.1 Ridge regression BLUP (rrBLUP) rrBLUP estimates of GEBV were obtained using the R package ‘rrBLUP’ (Endelman 2011). The model for rrBLUP follows: 𝑔(𝐱𝑖) = ∑ 𝑥𝑖𝑘𝑢𝑘𝑝𝑘=1  (4) where 𝑥𝑖𝑘 is the genotype of individual 𝑖 for SNP marker 𝑘, and 𝑢𝑘 is the additive effect of SNP marker 𝑘. The BLUP solution for marker effects, ?̂?, is obtained using Henderson’s methods for mixed model equations (MME) (Henderson 1953): ?̂? = (𝒁′𝒁 + 𝜆𝑰)−1𝒁′𝒚 (5) where 𝒁 is an incidence matrix relating SNP markers to individuals, 𝑰 is an identity matrix, 𝒚 is the vector of EBV and λ = ?̂?𝜀 2 assuming 𝒖 ~ 𝑁(0, 𝑰𝜎𝑢2). The SNP shrinkage parameter expressed as the ratio between the residual and common marker variances, λ = ?̂?𝜀 2/?̂?𝑢2 This method shrinks all marker effects equally, where the shrinkage is dependent on the marker allele frequency. 2.2.4.2 Generalized ridge regression (GRR) GRR is a two-step variable selection process, and was carried out using the R package ‘bigRR’ (Shen et al. 2013). In the first step, initial estimates of ?̂?e 2, ?̂?𝑢2, and ?̂? were obtained through the same MME as in rrBLUP. However, the BLUP estimate, ?̂?, is modified to accommodate a SNP specific shrinkage parameter: ?̂? = (𝒁′𝒁 + 𝑑𝑖𝑎𝑔(𝝀))−1𝒁′𝒚 (6) where 𝒁 and 𝒚 are the same as in (5) and 𝝀 is a vector of 𝑝 shrinkage parameters with λ𝑘 = ?̂?e 2/?̂?𝑢𝑘2  as the shrinkage parameter for SNP 𝑘, and ?̂?𝑢𝑘2  is the variance component for SNP 𝑘 computed as:  22  ?̂?𝑢𝑘2 =𝑢𝑘21−ℎ𝑘𝑘, where, ?̂?𝑘 is the marker effect BLUP for SNP 𝑘 obtained in equation (5) and ℎ𝑘𝑘 is the (𝑛 + k)𝑡ℎ diagonal of the hat matrix, 𝑯 = 𝑻(𝑻′𝑻)−1𝑻′, where 𝑻 = (𝑿 𝒁0 𝑑𝑖𝑎𝑔(𝛌)) (7) 2.2.4.3 BayesCπ (BCπ) BayesCπ was developed by Habier et al. (2011) as an extension to the Bayesian GS methods developed by Meuwissen et al. (2001). The statistical model for BCπ follows: 𝑔(𝐱𝑖) = ∑ 𝑥𝑖𝑘𝑢𝑘𝛿𝑘𝑝𝑘=1  (8) where, 𝑥𝑖𝑘 and 𝑢𝑘 are the same as in (4) and 𝛿𝑘 is an additional dummy variable that reflects the effect of SNP marker 𝑘 in the model being equal to zero with probability 𝜋. The proportion of markers with null effect in the model, π, is inferred from the data using a uniform prior distribution (0,1). BCπ assumes a common SNP effect variance, with a scaled inverse chi-square prior, 𝜈𝑢 degrees of freedom and scale parameter, 𝑆𝑢2. The proportion (1– 𝜋) markers included in the model have effects following a mixture of multivariate Student’s t-distributions , 𝑡(0, 𝜈𝑢, 𝑰𝑆𝑢2) as described in Habier et al. (2011). The R package ‘BGLR’ (Pérez and de los Campos 2014) was used to implement the BCπ algorithm. Gibbs sampling was used to generate the Monte Carlo Markov Chain using 100 000 iterations thinned at a rate of 100, with the first 20 000 discarded for burn-in. The ‘BGLR’ package default estimate for the scale parameter, 𝑆𝑢2, and degrees of freedom, 𝜈𝑢 = 5, was used. Trace plots were visually checked for model convergence. 2.2.4.4 Cross-validation, PA and relative efficiency of GS To assess GS PA, 10 replications in a 10-fold random cross-validation scheme were used. Under this scenario, 90% of available data are selected randomly as the TP while the remaining 10% is  23  designated as the validation population (VP). Here, PA of GS is defined as the mean Pearson product-moment correlation between EBV from the MBLUP model (1) and GEBV from the GS models, for the VP from the 10 replications, that is, 𝑟(𝐺𝐸𝐵𝑉, 𝐸𝐵𝑉). The temporal PA (TPA) is then 𝑟(𝐺𝐸𝐵𝑉𝑗, 𝐸𝐵𝑉𝑘) where 𝑘 is age 40 and 𝑗 is an age less than 40. To explore the relative efficiency of GS to traditional selection (TS), estimates of PA from model (1) using raw phenotypes of the genotyped individuals as training data were obtained by applying the same cross-validation method previously stated. Breeding value estimates were then obtained under two scenarios. The first scenario utilized the pedigree-based average numerator relationship matrix, 𝑨 (𝐸𝐵𝑉𝑇𝑆). The second replaced 𝑨 with the realized relationship matrix, 𝑮 (𝐺𝐸𝐵𝑉𝐺𝑆) (Habier et al. 2007; VanRaden 2008). The latter method is known colloquially as GBLUP and is equivalent to rrBLUP (Mrode 2014). The SVD marker data was used to compute 𝑮 as: 𝑮 = 𝒁𝒁′/2 ∑ 𝑝𝑗(1 − 𝑝𝑗) (9) where 𝒁 = 𝑴– 𝑷 with 𝑴 as the genotype matrix and 𝑷 as a vector of 2(𝑝𝑗– 0.5), and 𝑝 is the frequency of the alternative allele of the SNP at the 𝑗𝑡ℎ locus. The relative efficiency of GS to TS, assuming selection response is inversely proportional to the length of the breeding cycle is: 𝑅𝐸 =𝑟(𝐺𝐸𝐵𝑉𝐺𝑆,𝐸𝐵𝑉)𝑟(𝐸𝐵𝑉𝑇𝑆,𝐸𝐵𝑉)×𝑡𝑇𝑆𝑡𝐺𝑆 (10) where 𝐸𝐵𝑉 is the estimated breeding value using the full data from model (1), and 𝑡𝑇𝑆 and 𝑡𝐺𝑆 are the length of time to complete a breeding cycle under TS and GS, respectively (Grattapaglia and Resende 2011).  24  2.3 Results 2.3.1 Phenotypic and MBLUP analysis Variation in measures of height among trees within the two sites was relatively stable across years (range: CV=25.09% (HT40)—34.44% (HT6) (Table 2.1). Narrow-sense individual tree heritability estimates from the MBLUP analysis ranged from low (0.25, HT10) to moderate (0.64, HT30) (Table 2.2). As expected, the age-age genetic correlation between juvenile and mature height (HT40) increased with increasing juvenile age (Table 2.2). Mean individual accuracy of breeding values estimated with the MBLUP method were consistent across years of measurement and ranged from 0.74 to 0.76 (Table 2.2). 2.3.2 Prediction accuracy 2.3.2.1 Imputation method and statistical approach The number of SNP markers retained after filtering for minimum minor allele frequency (0.05), averaged from the cross-validation scenarios was 34 570 (M60), 39 915 (KNN) and 50 803 (SVD). On average, SVD, followed by the novel KNN family-based imputation approach both surpassed the PA of M60 imputation in all GS analysis methods, with the observed differences being comparatively minor between SVD and KNN (Figure 2.1, left; Table 2.3). The increase in PA relative to the baseline M60 imputation method, averaged across statistical approaches and ages, was greatest for SVD (7.9%), followed by KNN (6.6%). The average increase in PA of SVD relative to KNN was 1.2%. The former result produced a trend of diminishing return when comparing the number of markers used in GS analyses. On average, variation in the relative difference between GS analytical approaches was less than that among imputation methods and number of markers (Figure 2.1, right; Table 2.3). PA was, on average, equal between the rrBLUP and BCπ methods and both performed better than  25  GRR, with differences being largest at ages 15, 30 and 40 years. The relative increase in PA averaged across imputation methods and ages indicated that rrBLUP and BCπ both performed equally well over GRR (4.3%). Pairwise groupings of statistical approach and imputation method scenarios indicated that rrBLUP or BCπ in combination with SVD produced the greatest PA, on average, across all ages. Standard errors of the predictions computed from the 10 replicates were low, ranging from 0.001 to 0.006 (Table 2.3). 2.3.2.2 Prediction accuracy across time The GS PA was inconsistent across ages (Table 2.3; Figure 2.1). The greatest PA occurred for HT3, and while a reduction occurred for all other ages, it was largest for HT10 and HT15 across all imputation and GS analytical approach combinations. As expected, the TPA decreased with increasing difference between the training and VP age of height measurement (Table 2.4; Figure 2.2, right). Differences in TPA mirrored the results from imputation comparisons, with SVD and KNN producing the greatest average relative increases over M60 by 22.2% and 12.6%, respectively. Diversity in TPA between analytical approaches was lower than that between imputation procedures with average relative increases of rrBLUP and BCπ at 5.5% and 2.6% over GRR, respectively. Age-age genetic correlations from the MBLUP model were plotted with TPA of GS models (Figure 2.2, left). As anticipated, the Pearson product-moment correlation between the two was, on average, near perfect (=0.99, not tabulated), with the TPA and age-age genetic correlation both decreasing with the difference in years. Interestingly, the TPA of GS models based on 30-year data was often equivalent to those based on 40-year data. 2.3.3 Distribution of SNP effects The scatter plots, histograms and correlations of estimated marker effects from the three GS analytical methods suggested the greatest similarity between the rrBLUP and BCπ methods, and  26  least similarity between GRR and BCπ (Figure 2.3; Appendix A, Figures A.1–A.5). Evident in the plots is the relative tendency of GRR to apply intense shrinkage to the minor effect SNPs while allowing SNPs with large effect to persist. The latter result contrasts with rrBLUP and BCπ where they tended to distribute marker effects more widely owing to the common shrinkage parameter. The posterior mean of the π parameter (i.e., the probability of null effect) for BCπ was relatively constant across ages and with estimates ranging from 0.03 to 0.04 (not tabulated), accounting for some of the similarity between the rrBLUP and BCπ methods. The intensity of shrinkage due to marker quantity is also apparent when comparing the SVD to KNN and M60 marker effects plot. Pearson product-moment correlation was used to assess the linear relationship between the absolute value of marker effects in the three analytical approaches (Figure 2.3; Appendix A, Figures A.1–A.5; upper triangles). Pearson product-moment correlation averaged across imputation methods and ages (not tabulated), was greatest between rrBLUP-BCπ (=0.88), followed by rrBLUP-GRR (=0.86) and BCπ-GRR (=0.86). Spearman rank-correlation averaged across imputation methods and ages (not tabulated), of marker effects, yielded nearly identical ranking between rrBLUP-GRR (=0.99), as expected because the rrBLUP procedure precedes GRR. Lower rank correlations were observed between rrBLUP-BCπ (=0.93) and BCπ-GRR (=0.90). Additional marker effect plots for the remaining height measurement years are available in Appendix A. 2.3.4 Relative efficiency PA from rrBLUP and the SVD marker data (GS) was compared with those using TS (Table 2.5). Under the early selection scenario (10 and 15 years), the TPA using GS was greater than that of TS resulting in 112% and 106% increase in selection response, respectively. GS PA for mature tree height (30 and 40 years) of the same age as model estimation was lower (HT30) or equal  27  (HT40) to their TS counterparts, however, assuming a 25% reduction in breeding cycle length resulted in increases of selection response by 137 and 139% for both ages separately. Additionally, a high (=0.85, not tabulated) Pearson correlation between PA and proportion of additive genetic variance explained by the GS model (that is, narrow-sense genomic heritability) was observed in this study. A lower correlation (=0.75, not tabulated) was also observed between TS narrow-sense heritability and TS PA. 2.4 Discussion 2.4.1 Accuracy of GS prediction through time In this study, repeated measures of tree height over time permitted the testing of GS models’ accuracy (PA) at different ages (Figure 2.1, Table 2.3). PA reported in this study varied substantially throughout time (Figure 2.1, Table 2.3). This may reflect the capacity of the SNP markers to account for differential gene expression due to physiological or G×E interaction over time. Interestingly, the large drop in PA at age 10 and 15 years seems to coincide with a period of intense competitive exclusion between trees at this age, perhaps exacerbated by the relatively narrow tree spacing (2.5 × 2.5 m). The observed extent of PA for tree height was comparable with that reported in other studies using clonal eucalypts (Resende et al. 2012a) and 1–6-year tree height in loblolly pine (Resende et al. 2012c; b; Zapata-Valenzuela et al. 2012). More recently, Beaulieu et al. (2014a) tested GS in a half-sib population of white spruce and reported PA for 22-year tree height that was similar to those described here. Next, GS models were trained with EBV of tree height at ages 3, 6, 10, 15 and 30 year and the resulting GEBV was validated against EBV from 40-year measurements (Figure 2.2, Table 2.4). This scenario is of interest because the PA is expected to decline after each breeding cycle owing to the decay of SNP-QTL LD as a result of recombination in the offspring (Habier et al.  28  2013). Thus, the TPA of GS methods is an important consideration for retraining said models as it offers the potential to further accelerate the breeding cycle if target phenotypes can be selected earlier. TPA in this study decreased as the difference in age of training and VP increased. Interestingly, the TPA of GS models based on 30-year height was nearly equivalent to those based on 40-year despite the 10 years difference in measurements, suggesting consistency between the EBVs and SNP effects at both ages which is reflected by their high age-age genetic correlation. This analysis is in agreement with that of Resende et al. (2012b), who tested TPA for ages 1–6 years in clonal loblolly pine and concluded that model retraining will likely require phenotypic data from mid-rotation age or later to accurately reflect mature growth trait performance (White et al. 2007). These results are not unexpected as conifers typically have weak age-age genetic correlations attributed to their long lifespans and exposure to a wide range of environmental contingencies over time (Namkoong et al. 1988). Currently, selection based on growth attributes is carried out at the age 15 years in interior spruce. 2.4.2 Model comparison The PA of rrBLUP, GRR and BCπ statistical approaches were assessed for a time series of tree height measurements in interior spruce (Figure 2.1, Table 2.3). The rrBLUP and BCπ models both performed consistently well, producing the highest PA across all ages. These two models can be considered equivalent when the posterior mean of π in the BCπ model approaches zero value. The π parameter posterior mean estimate ranged from 0.03 to 0.04 in this study, accounting for some of the models’ observed similarity. The marker effect plots (Figure 2.3, Appendix A, Figures A.1–A.5) illustrate the likeness of the number of markers fitted, marker effect distributions and shrinkage for both methods. Similarly, Resende et al. (2012b) found no difference in PA between rrBLUP and BCπ for 6-year height in loblolly pine, although they did find that BCπ outperformed  29  rrBLUP for an oligogenic disease resistance trait. This demonstrates the flexibility of the BCπ algorithm, where the π parameter allows the model to behave like rrBLUP when traits follow Fisher’s infinitesimal model (Fisher 1919). This concept gives BCπ the possibility to be useful in prediction of traits with unknown genetic basis at the cost of additional computational time (BCπ took upwards of five times longer than both rrBLUP and GRR combined). rrBLUP has often been suggested as a baseline model to which comparisons should be made because it has been shown to yield high and stable PA across a wide variety of species and traits with low computational time investment relative to other methods (Heslot et al. 2012). Indeed, this result has been found to be true in the present as well as other studies involving forest trees (Resende et al. 2012c; Beaulieu et al. 2014b). As anticipated, the GRR model did not offer improved PA over rrBLUP or BCπ for mature tree height under this study’s conditions. Tree height is widely regarded as having a complex inheritance pattern under the Fisher’s infinitesimal model (Fisher 1919). In theory, the statistical approach used in GS can lead to variation in PA, depending on the genetic architecture of the trait (Daetwyler et al. 2010). Variable selection methods are generally expected to perform optimally for traits with simple genetic architecture (that is, few loci with large effect), because SNPs of low effect are strongly shrunk toward zero, while those of large effect persist. Beaulieu et al. (2014b) and Resende et al. (2012c) evaluated BayesA and Bayesian LASSO, approaches similar to GRR, where the improvement in PA of growth-related traits were found to be null. The GRR model did, however, offer PA comparable with rrBLUP and BCπ at juvenile ages (3, 6 and 10 years). As observed in the distributions of marker effects at these juvenile ages, the GRR model appeared to shrink all markers equally because of an apparent absence of those with large effect compared with mature ages (Figure 2.3, Appendix A Figure A.1). However, in mature ages where large marker  30  effects were perceived to exist by the GRR model, the intense shrinkage applied to markers of low effect led to an obvious impairment of PA. This is expected because of the complex genetic nature of tree height, guiding the decision that these large effect markers in mature tree height were likely false positives. Further, the increase in PA by rrBLUP and BCπ over GRR at later ages may be accounted for by the knowledge that when markers are shrunk equally, the kinship component of PA is more effectively captured, when compared with heteroscedastic models (Heslot et al. 2012). GS models should, however, ideally be based on the LD between SNP-QTL rather than kinship, because the SNP-QTL LD component of PA is expected to persevere in subsequent generations following breeding (Habier et al. 2007). 2.4.3 GBS and marker imputation The use of GBS, in combination with several imputation methods, was successful in supplying a dense genome-wide marker data set, and further, enabled moderate levels of PA to be captured for tree height in interior spruce. This study represents the first use of GBS data as a basis for genomic prediction in a forest tree species. Both SVD and novel KNN imputation methods offered increases in PA over mean imputation (M60) (Figure 2.1, left). However, it is unclear whether this result is the product of the number of markers retained by the imputation method or the imputation method itself. Although the initial marker matrix with 60% missing information was used for the three imputation methods, variation in the number of retained markers should be expected as some marker loci may be non-imputable with certain algorithms and differences will occur because of filtering of minor allele frequency. Thus, the imputation method should be wholly evaluated based on both its marker yield and PA, because restricting the markers to a common set would lend unintentional penalization to methods that yield greater numbers of markers. Similarly, Rutkoski et al. (2013) noted comparable increases in PA over mean imputation while using both SVD and  31  KNN imputation on GBS-derived markers for wheat (Triticum aestivum L.). Further, on average, the SVD imputation method yielded only slightly better PA than KNN. The former result alludes to diminishing returns given the average difference in available markers after filtering the TP SNP tables for a minimum minor allele frequency of 0.05. This result corresponds well to the asymptotic relationship between marker density and PA derived by Grattapaglia and Resende (2011) who used simulated data and deterministic formulae. Foundationally, the GBS method produces a large number of missing data due to low read depth and possible mutation at restriction sites in some individuals (Elshire et al. 2011), thus, it relies heavily on accurate imputation methods to derive dense marker data. With the availability of a complete reference genome assembly of white spruce, it may be possible to increase the accuracy of imputation through use of methods designed for ordered data and constructed haplotypes (Rutkoski et al. 2013). In the interim, scaffolds of the draft genome assembly for white spruce published by Birol et al. (2013) may suffice to aid in aligning the unordered genomic data produced in this study, and additionally assist in discovering additional markers that can be used for GS. This would be greatly beneficial because the genome size and complexity of conifers demand a large number of markers to sufficiently saturate the genome and provide high PA. Marker density is also particularly important in determining PA for forest tree populations where the effective size (𝑁𝑒) is commonly large (Grattapaglia and Resende 2011). 2.4.4 Relative efficiency The observed GS prediction accuracies (PA) in the present study are adequate to produce greater genetic gain over TS (Table 2.5). The relative efficiency results are compatible with those described by other studies involving white spruce and loblolly pine (Resende et al. 2012b; Beaulieu et al. 2014a). However, the reduction in time of breeding cycle was limited to a  32  conservative value of 25%, as opposed to 50% in other studies, because the reliability of the PA produced from the present study has not been validated in a progeny generation. The theoretical increase in genetic gain produced by GS hinges on the capacity of the prediction models to remain relevant in the next generation. For this to occur, PA must ideally be based on SNP-QTL LD rather than kinship (Habier et al. 2013). Recently, the source of the relationship between marker and QTL described by GS models has been decomposed by Habier et al. (2013) into factors involving both kinship and pure marker-QTL LD, generating questions about the validity of such models in subsequent generations. The origin of PA produced by the GS models in this study is currently unknown, and further testing via a progeny VP, or the partitioning of families in a restricted cross-validation scheme as demonstrated by Beaulieu et al. (2014a), will be required in the future. Optimally, the PA of the models was produced via LD between markers and QTL. Sub-optimally, PA was derived through kinship information between individuals in the training and VPs. GS PA can be composed of a combination of the two factors, leading to inflated estimates of PA to occur without proper validation because the kinship component is anticipated to decay more rapidly than marker-QTL LD in the progeny generations (Habier et al. 2007, 2011, 2013). In a study of a large population of white spruce open-pollinated families, Beaulieu et al. (2014a) observed that PA decreased significantly when kinship between the training and VPs was restricted. This is not unexpected in forest trees, where decay in LD is typically fast (Neale and Savolainen 2004). Thus, it may be necessary in the future to create ‘designer’ breeding populations through intense management to maximize marker-QTL LD. Alternatively, selective within-family genotyping of phenotypically extreme individuals (that is, best and worst) could be used to train accurate GS models based on haplotype blocks (Ødegård and Meuwissen 2014).  33  In the open-pollinated testing used in the present study, GS models were trained using traditional pedigree-based EBVs that incorporate expected additive genetic relationships between individuals into the matrix, 𝑨, to estimate genetic parameters (Lynch and Walsh 1998). Mixed model theory assumes that covariance matrices are error-free (that is, ideally reflecting the segregation of only QTLs), thus, the accuracy of information contained in 𝑨 is critical in obtaining unbiased and accurate estimates of genetic parameters and breeding values (Mrode 2014). Ideally, data fitted to train GS models should be analogous to the true additive genetic merit of each individual in the TP (Garrick et al. 2009). Earlier studies have fitted de-regressed EBV in GS models to improve PA (Garrick et al. 2009). However, there are empirical results in animal breeding to suggest EBV to be superior in some cases (Guo et al. 2010). Kinship explained by marker data can be used to overcome this limitation and has been used in the past to correct pedigree errors (Munoz et al. 2014), and produce increased accuracy of EBV (El-Kassaby and Lstibůrek 2009; El-Kassaby et al. 2011). Munoz et al. (2014) applied this concept to GS and noted improved PA by correcting pedigree errors prior to estimating EBVs of the training data. It appears that in the short-term, single step methods such as those incorporating the realized (𝑮) (VanRaden 2008) or combined (𝑯) (Legarra et al. 2009; Christensen and Lund 2010) genomic relationship matrix may be better suited for tree-breeding programs with simple mating structures and shallow pedigrees because EBV accuracy suffers from the insufficiencies produced by the simplified mating design. At present, selection with increased accuracy could be made over traditional BLUP methods with the benefit of accumulating genotypic information that can be used in the long term when deep pedigrees and ‘designer’ TPs have been established. This concept is most relevant to young breeding programs with open-pollinated mating structures such as the one studied here.  34  2.5 Summary 2.5.1 Background Genomic selection (GS) potentially offers an unparalleled advantage over traditional pedigree-based selection (TS) methods by reducing the time commitment required to carry out a single cycle of tree improvement. This quality is particularly appealing to tree breeders, where lengthy improvement cycles are the norm. This chapter explored the prospect of implementing GS for interior spruce (Picea engelmannii × glauca) utilizing a genotyped population of 769 trees belonging to 25 open-pollinated families. A series of repeated tree height measurements through ages 3–40 years permitted the testing of GS methods temporally. The genotyping-by-sequencing (GBS) platform was used for single nucleotide polymorphism (SNP) discovery in conjunction with three unordered imputation methods applied to a data set with 60% missing information. Further, three diverse GS models were evaluated based on predictive accuracy (PA), and their marker effects.  2.5.2 Results Moderate levels of PA (0.31–0.55) were observed and were of sufficient capacity to deliver improved selection response over TS. Additionally, PA varied substantially through time accordingly with spatial competition among trees. As expected, temporal PA was well correlated with age-age genetic correlation (r = 0.99) and decreased substantially with increasing difference in age between the training and validation populations (0.04–0.47). The observed GS PA were adequate to deliver up to 12% and 39% gain in efficiency for the early and mature selection scenarios respectively. The imputation comparisons indicate that k-nearest neighbor and singular value decomposition yielded a greater number of SNPs and gave higher predictive accuracies than imputing with the mean. Furthermore, the ridge regression (rrBLUP) and BayesCπ (BCπ) models  35  both yielded equal, and better PA than the generalized ridge regression heteroscedastic effect model for the traits evaluated. 2.5.3 Conclusions The application of GS for the prediction of tree height was supported by the moderate to high levels of PA obtained in the study. The observed GS PAs in this chapter are adequate to produce greater genetic gain over traditional selection assuming a conservative 25% reduction in breeding cycle length. However, retraining of GS models will likely require phenotypic data from mid-rotation age or later to accurately reflect mature growth trait performance. Furthermore, genotyping by sequencing was proven as a suitable marker discovery platform for the application of genomic selection in forest trees, with larger numbers of marker producing slightly greater PA. The choice of statistical model did not have a large impact on the PA, suggesting complex genetic architecture for tree height and that rrBLUP is a good baseline model for growth traits.   36  Table 2.1 Sample size (n) and descriptive statistics for interior spruce height (m) used in the pedigree-based analysis.  Age n Mean SD CV (%) Min Max 3 29 475 0.36 0.11 30.56 0.04 1.05 6 29 184 0.90 0.31 34.44 0.10 2.51 10 28 696 1.54 0.50 32.47 0.18 4.47 15 27 922 2.69 0.90 33.46 0.30 7.19 30 27 466 7.15 2.27 31.75 0.62 14.98 40 26 808 11.32 2.84 25.09 0.70 20.80 NOTE: SD, standard deviation; CV, coefficient of variation.  Table 2.2 Pedigree-based estimates of variance components for interior spruce height (m) at ages 3, 6, 10, 15, and 30 (years), and the estimated narrow-sense individual tree heritabilities (?̂?𝒊𝟐), age-age genetic correlations with age 40 (?̂?𝒊𝒋), mean estimated individual breeding value accuracies (𝒓(𝑬𝑩𝑽,𝑻𝑩𝑽)).  Age ?̂?𝑎2 ?̂?𝑎𝑒2  ?̂?𝑏2 ?̂?𝑒2 ℎ̂𝑖2 ?̂?𝑖𝑗  𝑟(𝐸𝐵𝑉,𝑇𝐵𝑉) 3 2.95E-03 (3.80E-04) 6.89E-04 (1.02E-04) 4.17E-05 (2.05E-05) 2.55E-03 (2.88E-04) 0.48 (0.05) 0.63 (0.06) 0.74 6 1.74E-02 (2.55E-03) 7.64E-03 (1.13E-03) 2.16E-03 (1.03E-03) 3.65E-02 (1.97E-03)  0.28 (0.04) 0.75 (0.04) 0.74 10 7.56E-03 (1.16E-03) 3.73E-03 (5.70E-04) 1.25E-03 (5.94E-04) 1.94E-02 (9.03E-04) 0.25 (0.04) 0.84 (0.03) 0.74 15 7.82E-01 (4.01E-02) 7.68E-03 (1.19E-03) 2.33E-03 (1.11E-03) 3.54E-02 (2.32E-03) 0.33 (0.04) 0.93 (0.02) 0.75 30 3.09 (3.87E-01) 6.00E-01 (9.00E-02) 1.60E-01 (7.59E-02) 1.12 (2.32E-01) 0.64 (0.06) 1.00 (0.00) 0.76 40 4.53 (5.67E-01) 8.56E-01 (1.33E-01) 4.26E-01 (2.02E-01) 2.14 (4.29E-01) 0.61 (0.07) 1.00 (0.00) 0.76 NOTE: See text for variance parameter definitions. Standard errors in parentheses.    37  Table 2.3 Genomic prediction accuracy obtained from rrBLUP, GRR, and BCπ analytical approaches for SVD, KNN, and M60 imputation methods for interior spruce tree height at ages 3, 6, 10, 15, 30, and 40 (years). Age rrBLUP   GRR BCπ   SVD KNN M60 SVD KNN M60 SVD KNN M60 3 0.53 (0.003) 0.55 (0.001) 0.52 (0.002) 0.52 (0.003) 0.54 (0.004) 0.51 (0.003) 0.54 (0.002) 0.54 (0.002) 0.51 (0.002) 6 0.46 (0.002) 0.46 (0.002) 0.43 (0.003) 0.45 (0.002) 0.45 (0.002) 0.42 (0.006) 0.45 (0.001) 0.46 (0.003) 0.43 (0.002) 10 0.38 (0.002) 0.38 (0.002) 0.35 (0.002) 0.37 (0.002) 0.36 (0.002) 0.35 (0.004) 0.38 (0.004) 0.37 (0.003) 0.35 (0.004) 15 0.37 (0.005) 0.36 (0.003) 0.33 (0.003) 0.36 (0.005) 0.33 (0.005) 0.31 (0.005) 0.37 (0.002) 0.36 (0.003) 0.33 (0.003) 30 0.46 (0.003) 0.45 (0.003) 0.42 (0.002) 0.41 (0.003) 0.43 (0.003) 0.38 (0.006) 0.47 (0.002) 0.46 (0.001) 0.43 (0.002) 40 0.47 (0.003) 0.45 (0.002) 0.43 (0.002) 0.41 (0.004) 0.42 (0.002) 0.39 (0.005) 0.47 (0.002) 0.45 (0.002) 0.43 (0.003) NOTE: See text for model and imputation method descriptions. Standard errors in parentheses. Table 2.4 Age 40 temporal genomic prediction accuracy obtained from rrBLUP, GRR, and BCπ analytical approaches for SVD, KNN, and M60 imputation methods for interior spruce tree height at ages 3, 6, 10, 15, and 30 (years). Age rrBLUP   GRR   BCπ   SVD KNN M60 SVD KNN M60 SVD KNN M60 3 0.06 (0.003) 0.05 (0.002) 0.04 (0.002) 0.05 (0.003) 0.05 (0.003) 0.04 (0.002) 0.06 (0.002) 0.05 (0.002) 0.03 (0.002) 6 0.15 (0.003) 0.14 (0.002) 0.12 (0.002) 0.14 (0.003) 0.14 (0.002) 0.12 (0.004) 0.15 (0.001) 0.13 (0.003) 0.12 (0.002) 10 0.26 (0.003) 0.24 (0.002) 0.22 (0.002) 0.25 (0.003) 0.23 (0.003) 0.21 (0.004) 0.26 (0.003) 0.24 (0.002) 0.22 (0.003) 15 0.40 (0.005) 0.38 (0.005) 0.35 (0.003) 0.39 (0.005) 0.34 (0.004) 0.33 (0.005) 0.39 (0.003) 0.38 (0.004) 0.35 (0.002) 30 0.46 (0.003) 0.44 (0.004) 0.41 (0.001) 0.41 (0.003) 0.42 (0.003) 0.38 (0.005) 0.47 (0.002) 0.45 (0.001) 0.42 (0.003) Note: See text for model and imputation method descriptions. Standard errors in parentheses.    38  Table 2.5 Predictive accuracy and relative efficiency from cross-validated models using realized (GEBV GS ) and average (EBV TS ) relationship matrices for early (10 and 15 years) and mature (30 and 40 years) interior spruce tree height. Age r(EBVTS,EBV) r(GEBVGS,EBV) Efficiency Increase (%) 10 0.20a (0.003) 0.22a (0.004) 1.12 112 15 0.29a (0.002) 0.31a (0.002) 1.06 106 30 0.45 (0.003) 0.43 (0.003) 1.29b 129 40 0.43 (0.004) 0.43 (0.003) 1.33b 133 Abbreviations: EBV, estimated breeding values; GEBV, genomic breeding values. Standard errors in parentheses. aRepresents the temporal predictive accuracy. bAssumes a 25% reduction in breeding cycle length attributed to genomic selection.  39   Figure 2.1 GS prediction accuracies from models trained using interior spruce height data at ages 3, 6, 10, 15, 30 and 40 years and validated against EBV at the training age. Shape and pattern in left panels represent imputation methods for 60% missing genomic data (M60, triangle/dash; KNN, circle/dash-dot; SVD, square/dot). Shape and pattern in right panels represent genomic prediction methods (GRR, circle/dash; BCπ, square/dash-dot; rrBLUP, triangle/dot).    40   Figure 2.2 Left panels: Temporal GS prediction accuracy for three models (rrBLUP, GRR, BCπ) trained using interior spruce height data at ages 3, 6, 10, 15, 30 and 40 years and validated against EBV at age 40 years. Right panels: Relationship between age-age genetic correlations from the MBLUP model and temporal GS prediction accuracy from the three GS models (rrBLUP, GRR, BCπ). Shape and patterns in both panels represent imputation methods for 60% missing genomic data (M60, triangle/dash; KNN, circle/dash-dot; SVD, square/dot).    41   Figure 2.3 Comparison of SNP marker effects by kernel density plots (diagonal), scatter plots (lower triangle), and Pearson (?̂?) and Spearman (?̂?) correlations (upper triangle) for three imputation methods (KNN, red; SVD, blue; M60, green) and three analytical approaches (rrBLUP, top; GRR, middle; BCπ, right) estimated for tree height at age 15 years in interior spruce.   42  Chapter 3: Single-step BLUP with varying genotyping effort in open-pollinated Picea glauca 3.1 Introduction Maximizing genetic gain in tree breeding programs depends on the reliability of the estimated genetic parameters. Traits’ heritabilities and their genetic correlations are among the most critical genetic parameters determining potential genetic progress, therefore their precision is of great importance. Progeny testing is the vehicle by which these genetic parameters are estimated, while individual tree data, family composition, pedigree depth and connectedness, and field performance determine their reliability (Huber et al. 1992). Open-pollinated testing is often used to efficiently screen and assess large numbers of candidate trees (Burdon and Shelbourne 1971; Jayawickrama and Carson 2000). However, it is recognized that treating open-pollinated families as “half-sibs” is inaccurate due to the effect of hidden relatedness, which cannot be easily unraveled in traditional pedigree-based analyses, thus affecting the resulting genetic parameters and candidate rankings (Squillace 1974; Namkoong et al. 1988; Askew and El-Kassaby 1994). The use of molecular markers can uncover hidden relatedness and potential pedigree errors in open-pollinated populations via pedigree reconstruction and paternity assignment (Wang 2004; Kalinowski et al. 2007). Pedigree reconstruction has been effectively used to determine the genealogical relationship among groups of related individuals leading to estimating genetic parameters with increased precision (El-Kassaby and Lstibůrek 2009; Doerksen and Herbinger 2010; El-Kassaby et al. 2011; Korecký et al. 2013; Klápště et al. 2014). Ritland (1996) proposed an innovative approach where marker-based relationship estimates could be used for estimating traits’ heritabilities and demonstrated the method’s utility in wild populations. Capitalizing on the  43  availability of dense genomic marker sets, VanRaden (2008) derived estimates of marker-based relationships between pairs of individuals as a genomic relationship matrix (𝑮), which can be used as a substitute to the traditional pedigree-based average numerator relationship matrix (𝑨) in Henderson’s animal model (Henderson 1984; Frentiu Francesca D et al. 2008; Habier et al. 2013; Klápště et al. 2014). The advantages of using marker-based relationship estimates are: (1) bypassing the classical mating designs needed for generating structured pedigree (El-Kassaby et al. 2011; Klápště et al. 2014), (2) applicability to natural and experimental populations (El-Kassaby et al. 2012; Porth et al. 2013; Klápště et al. 2014), and (3) increased precision of genetic parameter estimates by accounting for genetic similarity due to common ancestry (i.e., historical pedigree) (Powell et al. 2010) and Mendelian segregation (Visscher et al. 2006; Ødegård and Meuwissen 2012). Combining 𝑨 with the marker-based relationship matrix 𝑮, in a single genetic covariance matrix, 𝑯 (Legarra et al. 2009; Misztal et al. 2009; Christensen and Lund 2010), has proven to be an effective way to improve relationship coefficients for better estimation of genetic parameters and tracking of genetic diversity in animal breeding systems. This method was dubbed “HBLUP,” since the best linear unbiased predictors (BLUPs) of breeding values are derived using the combined, 𝑯, genetic covariance matrix. The HBLUP method may be particularly useful in forest tree progeny test evaluation as the cost and logistics of genotyping the entirety of trials is prohibitive; thus, combining the marker-based relationship from a subset of trees from some families from few sites with the pedigree-based relationship estimator would be an efficient option for improving the derived genetic parameters’ precision for the entire test.  44  The aim of this study is to compare the traditional ABLUP approach to the HBLUP approach by comparison of the precision of genetic parameters, accuracy of breeding values, and rankings of selection candidates using variable proportions (0, 25, 50, 75, and 100%) of genotyped individuals within the open-pollinated families. 3.2 Materials and Methods 3.2.1 White spruce open-pollinated progeny test, phenotype data, and genotyping Data for the present study was obtained from Beaulieu et al. (2014a) and downloaded from the Dryad Digital Repository: doi 10.5061/dryad.6rd6f. Briefly, this study concerns a single white spruce (Picea glauca (Moench) Voss) provenance-progeny test site that was established in 1979 in Québec, Canada. The test site contains 214 open-pollinated families from 43 Québec provenances planted in randomly assigned five tree row plots at 2.4 m between row and 1.2 m within row spacing to attain a randomized block design with six blocks. A subset of 1694 trees was sampled from the site, the mean number of trees per family was ∼7.9, and the mean number of trees per family per block was ∼1.3. Complete details of the test site are available in Beaulieu et al. (2014a). Phenotypic data for this study included 22 year tree height (HT) (𝑐𝑚) and wood density (WD) (𝑘𝑔/𝑚3) obtained via X-ray densitometry. Single-nucleotide polymorphisms (SNPs) of the 1694 trees were genotyped using Illumina Infinium HD iSelect bead chip PgAS1 (Illumina, San Diego, CA) (Beaulieu et al. 2014a). The SNPs were located within 2814 genes and separated by a minimum distance of 200 bp. Genes were selected from the white spruce gene catalog “GCAT” (Rigault et al. 2011) and were those of diverse functions (Pavy et al. 2013). A total of 6716 SNPs were available for use in the analyses.  45  3.2.2 Data analyses Traditional pedigree-based genetic analyses were performed under the assumptions of unrelated half-sibling family structure and a minimal population structure effect, as shown by Beaulieu et al. (2014a). Variance components, heritabilities, genetic correlations, and individual tree breeding values were estimated with the bivariate animal model (herein referred to as “ABLUP”) (Henderson 1984) as follows: 𝒚 = 𝑿𝜷 + 𝒁1𝒂 + 𝒁2𝒖 + 𝒆      (1) where 𝒚 is the response matrix of individually scaled and centred phenotypic observations for the two traits; 𝑿 is the incidence matrix for the fixed effect 𝜷 (trait means); 𝒁1 and 𝒁2 are the corresponding incidence matrices related to random additive genetic effects (breeding values, 𝒂~𝑵(𝟎, 𝑮)) and block effects (𝒖~𝑵(𝟎, 𝑼)), and the random residual error effects are distributed as 𝒆~𝑵(𝟎, 𝑹). The covariance matrix for the random additive genetic effects was modelled using the heterogeneous covariance structure as 𝑮 = [𝜎𝑎12 𝜎𝑎1𝑎2𝜎𝑎2𝑎1 𝜎𝑎22 ] ⨂ 𝑨 where 𝑨 is the average numerator relationship matrix, 𝜎𝑎1𝑎2 is the additive covariance between traits 1 and 2, and ⨂ is the Kronecker product operator. The covariance matrix for the random block effects was modelled using a diagonal structure as 𝑼 = [𝜎𝑢12 00 𝜎𝑢22 ] ⨂ 𝑰 where 𝑰 is an identity matrix. The random residual error effect was modelled using an unstructured covariance matrix structure as 𝑹 =[𝜎𝑒12 𝜎𝑒1𝑒2𝜎𝑒2𝑒1 𝜎𝑒22 ] ⨂ 𝑰 where 𝜎𝑒1𝑒2 is the residual covariance between the two traits. The genomic relationship matrix (GRM) (𝑮) was constructed after VanRaden (2008): 𝑮 = 𝒁𝒁′/2 ∑ 𝑝𝑗(1 − 𝑝𝑗)𝑗       (2)  46  let 𝑛 be the number of individuals and 𝑚 be the number of markers, then 𝒁 is an 𝑛 ×  𝑚 matrix of centered genotype scores with missing values replaced with zeros (i.e., mean imputation), and 𝑝𝑗 is the reference allele frequency of the 𝑗𝑡ℎ marker. In the single-step model (herein referred to as ‘HBLUP’), the 𝑨 matrix was substituted by the combined additive relationship matrix 𝑯 (Legarra et al. 2009; Christensen and Lund 2010), resulting from accommodation of the genomic marker-based relationship matrix 𝑮: 𝑯 = [𝑨11 + 𝑨12𝑨22−1(𝑮 − 𝑨22)𝑨22−1𝑨21 𝑨12𝑨22−1𝑮𝑮𝑨22−1𝑨21 𝑮]    (3) where 𝑨𝟏𝟏 is the block of 𝑨 for the nongenotyped individuals of the population, A22 is the block of genotyped individuals, and 𝑨𝟏𝟐 and 𝑨𝟐𝟏 are the blocks containing the expected additive genetic relationships between genotyped and nongenotyped individuals. The inverse of 𝑯 (𝑯−𝟏) can be obtained via (Christensen and Lund 2010; Aguilar et al. 2010): 𝑯−𝟏 = 𝑨−1 + [𝟎 𝟎𝟎 𝑮−𝟏 − 𝑨22−𝟏]     (4) where 𝑨−1 is the inverse of the pedigree-based relationship matrix A, and 𝑮−𝟏 and 𝑨22−𝟏 are the pedigree and the genomic relationship matrices for the genotyped individuals, respectively. Prior to calculation of 𝑯, the GRMs were scaled as 𝑮∗ = 𝛽𝑮 + 𝛼, such that the average of diagonal and off-diagonal elements of G were equivalent to those of 𝑨22 by solving the following system of equations for the parameters 𝛼 and 𝛽 (Christensen et al. 2012). {Avg(diag(𝑮))𝛽 + 𝛼 = Avg(diag(𝑨22))Avg(𝑮)𝛽 + 𝛼 = Avg(𝑨22)     (5) To avoid potential problems with inversion of 𝑮∗, it was weighted as (Aguilar et al. 2010): 𝑮𝑤 = 0.95 × 𝑮∗ + (1 − 0.95)  × 𝑨22     (6)  47  Five levels of genotyping effort were tested in the HBLUP method: 0, 25, 50, 75, and 100%. In the case of 0% of genotyping effort, variance components, genetic correlations, and breeding values were obtained strictly using expected additive genetic relationships defined in 𝑨 (i.e., ABLUP). At 100% genotyping effort, the 𝑮 matrix used in Equation 4 containing all possible genotyped individuals was combined with the 𝑨 matrix. In this last case, 𝑨𝟏𝟏 represents only the unrelated and nongenotyped maternal parents (i.e., a diagonal square matrix of order 214). To avoid possible sampling bias, a replicated resampling scheme was used to obtain parameter estimates at 25–75% genotyping effort. In detail, two, four, or six individuals per family, of a possible maximum of eight, were randomly removed from the 𝑮 and 𝑨𝟐𝟐 matrices prior to rescaling and combining with 𝑨 in Equation 4. This process was repeated 30 times at each level of genotyping effort and mean values for the results are reported. The restricted maximum likelihood (REML) was used to estimate variance and covariance for the random effects in the bivariate mixed model (1) and were obtained with the ASReml-R v3.0 program (Butler et al. 2009). The single-trait narrow-sense heritability for of each trait from the bivariate ABLUP and HBLUP models were estimated as: ℎ̂2 =?̂?𝑎2?̂?𝑎2+?̂?𝑒2       (7) where ?̂?𝑎2 and ?̂?𝑒2 are the estimates of additive genetic and residual variances, respectively (Falconer and Mackay 1996). The additive genetic correlation between HT and WD was estimated as Pearson’s product moment: ?̂?𝐺 =𝑐𝑜?̂?𝑎1,𝑎2√?̂?𝑎12 ?̂?𝑎22       (8)  48  𝑐𝑜?̂?𝑎1,𝑎2, ?̂?𝑎12 , and ?̂?𝑎22  are the genetic covariance and additive genetic variances for HT and WD respectively, estimated from the bivariate model (1) (Falconer and Mackay 1996). The theoretical accuracy of breeding values (?̂?) was estimated after (Dutkowski et al. 2002): ?̂? = √1 −𝑆𝐸𝑖2(1+𝐹𝑖)?̂?𝑎2      (9) where 𝑆𝐸𝑖 is the standard error of the BLUP for additive genetic effects, and Fi is the inbreeding coefficient of the ith individual. SEs of the breeding values were obtained from the bivariate model with variance components fixed to estimates from the same model with 100% genotyping effort for the respective traits, under the assumption that these were most accurate. Additionally, the Akaike information criterion (AIC) was computed to compare the fit of the bivariate ABLUP and HBLUP models. A smaller AIC value indicates better fit. Furthermore, Spearman rank correlations and the proportion of common candidates in the top 5% of mothers and progenies were also calculated to compare whether the predicted breeding values differed among the ABLUP and HBLUP models. Standard errors of heritability and genetic correlation estimates were approximated using the Delta method (Lynch and Walsh 1998), using the standard error estimates for variance components obtained from ASReml. 3.3 Results 3.3.1 Relationship matrices Descriptive statistics for the expected (𝑨) and combined (𝑯) relationship estimators presented the ranges of the 𝑯 relationship groups to be larger than that of 𝑨 (Table 3.1). As expected, the Pearson correlation between 𝑨 and the 𝑯 relationship matrices decreased with an increase in genotyping  49  effort. The ranges of the values within the various relationship groups also tended to expand with increasing genotyping effort. Particularly noteworthy was the presence of non-zero relationship coefficients between maternal parents given by the 𝑯 relationship estimators, indicating that maternal parents are not completely unrelated as assumed in the pedigree. Likewise, large differences from the expected pairwise half-sib and unrelated values in 𝑨 were observed in the 𝑯 relationship estimators. These deviations imply the possible presence of pedigree errors on one side, and full-sibs within the half-sib families on the other side. Family and provenance structure was visually evident in the heat maps of the relationship estimators (Appendix B, Figure B.1-B.5). Samples were sorted according to family and provenance, then the pairwise relationship coefficients of the matrices were plotted. Historical relationships (i.e., provenance) not captured by the pedigree estimator 𝑨 (Appendix B, Figure B.1) were visually apparent in 𝑯 with as little as 25% genotyping effort (Appendix B, Figure B.2). Further, potential pedigree errors not accounted for in the contemporary pedigree were uncovered, these errors may be due to mislabeling or technical errors in the lab. Population structure and potential pedigree errors were further evidenced by the large deviations observed between the ranges maternal parent–parent, half-sib, and unrelated relationship coefficients between 𝑨 and the 𝑯 relationship estimators (Table 3.1). 3.3.2 Variance components The additive genetic variance estimates obtained from the bivariate HBLUP models were substantially lower than the bivariate ABLUP model estimates for both HT and WD (Table 3.2). Further, the additive genetic component estimates of the HBLUP models continually decreased as a result of increased genotyping effort. It should be noted that the observed differences in additive genetic variance estimates between the ABLUP and HBLUP models reflect on the known bias  50  associated with considering wind-pollinated offspring as half-sib progeny, which often results in the overestimation of additive genetic variance and subsequently incorrect parental and progeny rankings. This bias is attributable to hidden relationships and dominance effects in assumed open-pollinated half-sib pedigrees. Narrow-sense heritability estimates of the bivariate HBLUP models mirrored those of the additive genetic variance estimates (HT: from 0.178 to 0.295; WD: from 0.310 to 0.594). These estimates at 100% genotyping effort for HT and WD were intermediate to the polygenic and marker-based estimates described by Beaulieu et al. (2014a). The trend observed in the estimates of additive genetic variance was coupled with an increase in the residual variance component for the HBLUP models compared to the ABLUP model, with the lowest estimates observed for the ABLUP model, indicating that some residual variance was shifted to the additive variance, hence the overestimation of heritability (Table 3.2). Comparison of the block variance estimates provided no pattern of association with any factor, and the estimates were relatively stable across all models for both WD and HT. Across all the models, the additive genetic correlations between HT and WD were minor, negative, and not significant (based on SEs). The variation in model parameters’ estimates under 25, 50, and 75% genotyping effort, due to random sampling, was low, indicating that the selection of genotyped individuals within family has minimal impact (Appendix B, Table B.1, Table B.2, and Table B.3). Large incremental improvements in the model fit, determined by AIC, were observed for the HBLUP models over the ABLUP model with increasing genotyping effort. Model fit was greatest in the case with complete genotyping effort (100%), using the 𝑯 relationship estimator (Table 3.2).  51  3.3.3 Breeding value accuracy The mean breeding value accuracies for HT and WD are presented for three classes of individuals: maternal parents and genotyped and nongenotyped progeny (Table 3.3). Comparison of 𝑯 and 𝑨 relationship estimators on mean breeding value accuracies for the maternal parents showed minor increases for HT (differences were limited to only the second decimal place; from 0.520 to 0.539) and no changes for WD (from 0.635 to 0.635) with increasing family genotyping efforts. The mean breeding value accuracy of nongenotyped progeny using the 𝑯 estimator across all levels of genotyping effort was stable and equivalent to the mean breeding value accuracy of the progeny using the 𝑨 estimator, for both HT and WD. As expected, the largest improvements over the 𝑨 relationship estimator were noted for genotyped progeny using the 𝑯 estimators. Increasing genotyping effort produced large increases in mean breeding value accuracy for both HT and WD over mean breeding value accuracies of the progeny using 𝑨 (HT: 0.474; WD: 0.605), with the effect greater for HT (HT: from 0.498 to 0.536; WD: from 0.626 to 0.661). 3.3.4 Candidate rankings The Spearman rank correlations for maternal and progeny breeding values, and the proportion of common candidates in the top 5% of maternal and progeny rankings, between the 𝑨 and 𝑯 relationship estimator at 25, 50, 75, and 100% genotyping effort, for both HT and WD, are presented in Table 3.4 and Table 3.5. The pairwise Spearman rank correlations were always less than perfect indicating at least some disagreement in the rankings of candidate maternal parents and progeny. Overall, maternal and progeny Spearman rank correlations between the HBLUP models and the ABLUP model were greater for WD than HT, further suggesting that genomic information was less impactful on candidate rankings for the higher heritability trait. Furthermore, the Spearman rank correlations between the ABLUP model and the HBLUP models decreased  52  with increasing genotyping effort in all cases. The top 5% analysis again showed overall better agreement for WD over HT for both maternal and progeny candidates. Furthermore, agreement with the top 5% ABLUP candidates decreased with increasing genotyping effort in the HBLUP models. In summary, the Spearman rank correlations and the proportion of common candidates in the top 5% suggests that the inclusion of genomic information affects candidate rankings and has the potential to alter both forward and backward selections. 3.4 Discussion The investigation of genetic architecture in forest trees requires the establishment of large scale testing populations with a predefined mating design. The results’ quality (i.e., accuracy and precision of genetic parameters) are often a reflection of the extent of efforts invested in the design of the field experiment. Commonly, quantitative genetics analyses are based on the animal model implemented in mixed model theory (Henderson 1984). Such analyses are based on the average numerator relationship matrix tracking probability of identity by descent (𝑨 matrix). The mixed model theory operates with the assumption that variance–covariance matrices of random terms used in the animal model are error free (Henderson 1984). The average numerator relationship matrix cannot fulfill this assumption as it is based on the pairwise expected relationship values and does not account for the Mendelian sampling term. Moreover, the contemporary shallow pedigrees of many forest tree breeding programs are unable to account for historical coancestry. The use of genotypic data enables greater insight into genetic covariances and the precise mapping of the Mendelian sampling term (Visscher et al. 2006). The use of marker-based relationship matrix analyses can be beneficial even in small populations with limited pedigree information (Ødegård and Meuwissen 2012; El-Kassaby et al. 2012; Porth et al. 2013; Cappa et al. 2016a).  53  However, the cost and logistics of obtaining genotypic information for the large testing populations in tree improvement programs still represent a barrier to widespread implementation of this methodology (Beaulieu et al. 2014a). Additionally, deployment of genetic analysis using the entire testing population (i.e., genotyped and nongenotyped individuals) is important to eliminate bias connected to the total reliance on using genotyped individuals only or error-prone multi-step methods. While the genomic information benefits from lower inbreeding build-up during the selection phases as compared to pedigree-based BLUP (Liu et al. 2014) due to the added knowledge about the Mendelian sampling term and historical coancestry, it is recommended that including nongenotyped individuals in the same data analysis is beneficial as it effectively reduces the bias associated with the selection of individuals for genotyping (Ducrocq and Patry 2010). This is increasingly important, especially in forest tree breeding programs, which are based on shallow pedigree-based selection from large base populations. The HBLUP method combining both pedigree- and marker-based relationship matrices generally provides as good as, or better, results over pedigree-based BLUP (Christensen et al. 2012) or multi-step approaches (Vitezica et al. 2011). By permitting both pedigree and marker information in a single genetic evaluation analysis, the combined HBLUP approach offers resolution to these problems. Information gained through genomic relationship estimates for a subset of genotyped individuals is reflected via pedigree relationships to the nongenotyped individuals in the relationship matrix 𝑯 (Legarra et al. 2009). Thus, more accurate estimates of relatedness are obtained in a cost-efficient manner. Our results support this notion through large improvement in the model fit as represented by the AIC statistics from the exclusive pedigree-based model to models utilizing the combination of marker and pedigree information at various genotyping efforts (Table 3.2). Furthermore, the presence of  54  hidden relatedness, particularly in open-pollinated trials, can be a serious problem in their genetic evaluation resulting in upwardly biased estimates of heritability (Squillace 1974; Namkoong et al. 1988; Askew and El-Kassaby 1994) (Table 3.2). This bias is caused by overestimation of additive genetic variance and hidden dominance effects due to the unrealistic assumption of pure half-sibling relatedness within the open-pollinated families, as well as complete unrelatedness among the parental donors. Reality and empirical evidence suggests the existence of hidden relationships within these families (i.e., full-sibs, half-selfs, and full-selfs) (Squillace 1974; Namkoong et al. 1988; Askew and El-Kassaby 1994; Gamal El-Dien et al. 2016; El-Dien et al. 2018). Our results confirm that hidden relatedness within the OP families and historical coancestry exists in this P. glauca population, and was better accounted for through the HBLUP method, with increasing genotyping effort helping to approach more realistic estimates of genetic parameters and better model fit as compared to the analysis performed on the complete pedigree alone (Table 3.2). It should be stated that all hidden relatedness can only be accounted for under the GBLUP analysis, where individuals are genotyped (including parents); however, the HBLUP is composed of a mixture of genotyped and nongenotyped individuals, thus a slight overestimation of the additive genetic variance is expected. The degree of additive variance overestimation should lie between the GBLUP and pedigree analyses (Beaulieu et al. 2014a). The accuracy of the estimated breeding values is of practical importance to tree breeders. Correct ranking of the selection candidates based on accurate estimates of relatedness is an important factor. While traditional pedigree-based analyses have been proven to deliver increased genetic gain, a reduction in the size of the testing population can be achieved with an improvement in accuracy through the use of genetic markers in the evaluation. The comparison of breeding value accuracies in this study was based on the assumption that the variance component estimates  55  produced by the bivariate HBLUP model with full genotyping effort were most accurate. The variance components of each model were fixed to these estimates prior to calculating theoretical breeding value accuracy for each scenario. Our results show incremental improvement in the accuracy of progeny breeding values for the genotyped subset for both HT and WD with increasing genotyping effort (Table 3.3). However, this improvement was not translated to the nongenotyped progeny subset in either trait. The simple, disconnected pedigree structure of this population is a probable explanation for this observation, as the genomic information is only transferred from genotyped to nongenotyped individuals via pedigree relationships. It is expected that in systems with more complex and interconnected pedigree structure, such as diallel mating designs, multi-generational complex pedigrees will benefit more from the HBLUP method as genomic information can be inferred from multiple family sources to the nongenotyped individuals. Furthermore, family or maternal parent breeding value accuracy remained constant across all levels of genotyping effort for the trait with greater heritability, WD, whereas slight increases were observed for the lower heritability trait, HT, at 25, 50, 75, and 100% genotyping effort (from 0.525 to 0.539). The difference in contrasting heritabilities of the traits and these findings agree with previous discussions that GS is expected to be more efficient in low heritability traits (Calus et al. 2008; Grattapaglia and Resende 2011). Yet, the inclusion of even a small proportion (i.e., 25%) of genomic information per family produced differences in ranking the top 5% of both maternal and progeny candidates for both traits (Table 3.4, Table 3.5). The accuracy of estimated breeding values in HBLUP is not a function of only average diagonal and off-diagonal elements in the 𝑮 matrix, but also the difference between them as it reflects the level of relatedness in the testing population (Forni et al. 2011). Generally, the genomic relationship matrix uses actual allele frequencies instead of those of the base population, which are  56  usually unknown (in this case, the base population is treated as the studied pedigree population), and sets the genomic-base population as the genotyped population (Oliehoek et al. 2006). However, such analysis is affected by incompatibility between pedigree- and marker-based relationship matrices due to the inferences made about the base population using information from the studied population. Powell et al. (2010) highlighted the importance of scaling these matrices by correcting the genomic relationship matrix by a factor reflecting the mean difference between the 𝑨 and 𝑮 matrices. Several approaches were proposed to rescale the 𝑮 matrix according to the 𝑨 matrix (Meuwissen et al. 2011; Vitezica et al. 2011; Christensen et al. 2012); however, an opposite approach was also proposed by Christensen et al. (2012), whereby the 𝑨 matrix is rescaled according to its 𝑮 counterpart. To avoid incompatibility between the 𝑮 and 𝑨 matrices, a normalized 𝑮 matrix containing the average of diagonal elements equal to one needs to be constructed. Such a matrix is efficient when the population is inbreeding free, and if inbreeding is present then inbreeding coefficients have to be included in the denominator (Forni et al. 2011). The scaling of 𝑮 was chosen according to Christensen et al. (2012); however, all analyses were tested without scaling 𝑮 and yielded comparable results (results not shown). This result is likely due to the shallow pedigree structure and minor differences between the 𝑨𝟐𝟐 and 𝑮 matrices. Additionally, Vitezica et al. (2011) observed discrepancies in the accurate genomic breeding values with incorrectly scaled 𝑮, but only under strong selection (i.e., nonrandom) of genotyped individuals and when genotyping was across 10 generations. However, in the present study, the 25–75% of genotyped trees were randomly removed (i.e., unselected) and all these trees came from only one generation. As a preliminary investigation into the combined use of genomic and pedigree information in the applied genetic analysis of white spruce, the HBLUP method has proven to be a beneficial  57  tool to forest tree breeders. The inclusion of nongenotyped and genotyped trees in a single analysis produced improvements in breeding value accuracy and model fit, particularly for the trait with low heritability (HT). Further, discordance in candidate rankings between the HBLUP and the traditional method that utilizes 𝑨 were observed. Improvement in the results were continuous with increasing genotyping effort; however, improvements were also seen at the minimum level of 25% genotyping effort. 3.5 Summary 3.5.1 Background The maximization of genetic gain in forest tree breeding programs is contingent on the accuracy of the predicted breeding values and precision of the estimated genetic parameters. The effect of the combined use of contemporary pedigree information and genomic relatedness estimates on the accuracy of predicted breeding values and precision of estimated genetic parameters, as well as rankings of selection candidates, using single-step genomic evaluation (HBLUP) was investigated. In this study, two traits with diverse heritabilities [tree height (HT) and wood density (WD)] were assessed at various levels of family genotyping efforts (0, 25, 50, 75, and 100%) from a population of white spruce (Picea glauca) consisting of 1694 trees from 214 open-pollinated families, representing 43 provenances in Québec, Canada.  3.5.2 Results The results of this chapter revealed that HBLUP bivariate analysis is effective in reducing the known bias in heritability estimates of open-pollinated populations, as it exposes hidden relatedness, potential pedigree errors, and inbreeding. The addition of genomic information in the analysis considerably improved the accuracy in breeding value estimates by accounting for both Mendelian sampling and historical coancestry that were not captured by the contemporary  58  pedigree alone. Increasing within family genotyping efforts were associated with continuous improvement in model fit, precision of genetic parameters, and breeding value accuracy. Yet, improvements were observed even at minimal genotyping effort, indicating that even modest genotyping effort is effective in improving genetic evaluation.  3.5.3 Conclusions The results indicate that the HBLUP is an effective tool for improvement in the genetic evaluation of open-pollinated mating designs. Even modest genotyping effort within open-pollinated families is effective at improving important population level genetic parameters such as heritability and genetic correlations. Further, the modest within family genotyping effort was effective at capturing historical coancestry (i.e., population structure) and Mendelian sampling within the breeding population that were not captured by the contemporary pedigree. In the broader sense, this chapter illustrates that HBLUP the combined utilization of both pedigree and genomic information is a cost-effective approach to increase the accuracy of breeding values in forest tree breeding programs where shallow pedigrees and large testing populations are the norm.  59  Table 3.1 Descriptive statistics of the pedigree (A) relationship estimator, and combined pedigree-marker (H) relationship estimator at 0, 25, 50, 75, and 100% genotyping effort. Estimator (GE) Overall   Relationship groups     Identity Parent-parent  Mean Variance ?̂?𝑨 Mean (SD) Range Mean (SD) Range A (0%) 0.002 0.001 1.000 1.000 (NA) 1.000 - 1.000 0.000 (NA) 0.000 - 0.000 H (25%) 0.002* 0.001* 0.963* 0.996 (0.024) * 0.839 - 1.190* 0.000 (0.011) * -0.031 - 0.219* H (50%) 0.002* 0.001* 0.890* 0.990 (0.044) * 0.752 - 1.292* 0.000 (0.017) * -0.050 - 0.362* H (75%) 0.002* 0.001* 0.816* 0.989 (0.057) * 0.669 - 1.380* 0.000 (0.022) * -0.061 - 0.457* H (100%) 0.002 0.001 0.754 0.990 (0.065) 0.599 - 1.424 0.000 (0.025) -0.068 - 0.574  (CONTINUED)  Estimator (GE) Relationship groups  Parent-offspring Half-sib Unrelated  Mean (SD) Range Mean (SD) Range Mean (SD) Range A (0%) 0.500 (NA) 0.500 - 0.500 0.250 (NA) 0.250 – 0.250 0.000 (NA) 0.000 - 0.000 H (25%) 0.490 (0.032) * 0.362 - 0.667* 0.242 (0.031) * -0.025 - 0.577* 0.001 (0.022) * -0.076 - 0.661* H (50%) 0.468 (0.071) * 0.264 - 0.792* 0.226 (0.068) * -0.032 - 0.668* 0.001 (0.024) * -0.084 - 0.791* H (75%) 0.452 (0.098) * 0.203 - 0.860* 0.213 (0.098) * -0.033 - 0.712* 0.001 (0.027) * -0.086 - 0.860* H (100%) 0.442 (0.117) 0.169 - 0.894 0.204 (0.122) -0.033 - 0.754 0.001 (0.031) -0.087 - 0.981 NOTE: SD: Standard deviation; GE: Genotyping effort; ?̂?𝑨: Pearson product moment correlation with pedigree relationship estimator A. * Mean value(s).    60  Table 3.2 Pedigree-based (A) and combined pedigree-marker based (H) estimates of variance components for white spruce tree height (HT: cm) and wood density (WD: kg/m3), with associated estimated narrow-sense individual tree heritabilities (?̂?𝟐), additive genetic correlations (?̂?𝑮), and Akaike information criterion (AIC) model fit statistic. GE Estimator ?̂?𝑎2 (HT) ?̂?𝑏2 (HT) ?̂?𝑒2 (HT) ℎ̂2 (HT) ?̂?𝑎2 (WD) ?̂?𝑏2 (WD) ?̂?𝑒2 (WD) ℎ̂2 (WD) ?̂?𝐺  AIC 0% A 0.284 (0.074) 0.046 (0.031) 0.679 (0.071) 0.295 (0.074) 0.589 (0.100) 0.010 (0.008) 0.403 (0.086) 0.594 (0.091) -0.035 (0.148) 3125.586 25% H 0.271 (0.067)* 0.046 (0.031)* 0.692 (0.064)* 0.282 (0.067)* 0.487 (0.080)* 0.010 (0.008) * 0.486 (0.0691)* 0.502 (0.075)* -0.043* (0.141)* 2907.071* 50% H 0.228 (0.055)* 0.046 (0.031)* 0.731 (0.054)* 0.237 (0.055)* 0.403 (0.064)* 0.009 (0.008)* 0.567 (0.055)* 0.416 (0.060)* -0.050* (0.138)* 2672.915* 75% H 0.196 (0.046)* 0.046 (0.031)* 0.762 (0.047)* 0.205 (0.046)* 0.339 (0.052)* 0.009 (0.008)* 0.628 (0.046)* 0.351 (0.049)* -0.043* (0.134)* 2379.521* 100% H 0.170 (0.039) 0.045 (0.031) 0.786 (0.042) 0.178 (0.039) 0.300 (0.046) 0.009 (0.008) 0.666 (0.041) 0.310 (0.043) -0.023 (0.133) 1989.502 NOTE: GE: Genotyping effort. See text for variance parameter definitions. Standard errors in parentheses. * Mean value(s).    61  Table 3.3 Mean breeding value accuracy estimates (?̂?) for white spruce tree height (HT: cm) and wood density (WD: kg/m3) using pedigree-based (A) relationship estimator, and combined pedigree-marker (H) relationship estimator at 0, 25, 50, 75, and 100% genotyping effort for three classes of individuals: maternal parent, genotyped progeny, non-genotyped progeny. Genotyping effort Estimator Class HT WD    ?̂? ?̂? 0% A Maternal 0.521 (NA) 0.635 (NA)   Genotyped NA NA   Non-Genotyped 0.474 (NA) 0.605 (NA) 25% H Maternal 0.525 (0.0007) 0.635 (0.0007)   Genotyped 0.499 (0.0008) 0.626 (0.0007)   Non-Genotyped 0.474 (0.0003) 0.604 (0.0002) 50% H Maternal 0.527 (0.0010) 0.635 (0.0009)   Genotyped 0.514 (0.0006) 0.640 (0.0004)   Non-Genotyped 0.473 (0.0004) 0.602 (0.0003) 75% H Maternal 0.530 (0.0007) 0.635 (0.0007)   Genotyped 0.528 (0.0004) 0.651 (0.0003)   Non-Genotyped 0.475 (0.0003) 0.601 (0.0002) 100% H Maternal 0.539 (NA) 0.635 (NA)   Genotyped 0.536 (NA) 0.661 (NA)   Non-Genotyped NA NA NOTE: Standard deviation in parentheses.    62  Table 3.4 Proportion of common candidates in the top 5% of maternal parent rankings (above diagonals), and estimated Spearman breeding value rank correlations (diagonals and below diagonals) for pedigree-based (A) relationship estimator, and combined pedigree-marker (H) relationship estimators at 0, 25, 50, 75, and 100% genotyping effort for white spruce tree height (HT: cm) and wood density (WD: kg/m3).  Parent  HT  A (0%) H (25%) H (50%) H (75%) H (100%) A (0%) 1.00 (NA) 0.89 (0.052) * 0.78 (0.070) * 0.70 (0.055) * 0.64 (NA) H (25%) 0.99 (0.002)* 0.98 (0.002) * 0.82 (0.046) * 0.74 (0.045) * 0.69 (0.045) * H (50%) 0.96 (0.006) * 0.97 (0.002) * 0.97 (0.004) * 0.80 (0.049) * 0.78 (0.062) * H (75%) 0.98 (0.006) * 0.95 (0.003) * 0.97 (0.005) * 0.98 (0.002) * 0.88 (0.065) * H (100%) 0.91 (NA) 0.93 (0.004) * 0.96 (0.006) * 0.98 (0.002) * 1.00 (NA)  WD  A (0%) H (25%) H (50%) H (75%) H (100%) A (0%) 1.00 (NA) 0.90 (0.058) * 0.77 (0.066) * 0.7 (0.050) * 0.73 (NA) H (25%) 1.00 (0.001) * 0.99 (0.001) * 0.81 (0.031) * 0.76 (0.051) * 0.79 (0.060) * H (50%) 0.98 (0.003) * 0.98 (0.001) * 0.98 (0.002) * 0.83 (0.052) * 0.85 (0.070) * H (75%) 0.96 (0.005) * 0.97 (0.002) * 0.98 (0.003) * 0.98 (0.001) * 0.94 (0.059) * H (100%) 0.95 (NA) 0.96 (0.002) * 0.97 (0.004) * 0.99 (0.002) * 1.00 (NA)  NOTE: Standard deviation in parentheses. * Mean value(s)    63  Table 3.5 Proportion of common candidates in the top 5% of progeny rankings (above diagonals), and estimated Spearman breeding value rank correlations (diagonals and below diagonals) for pedigree-based (A) relationship estimator, and combined pedigree-marker (H) relationship estimators at 0, 25, 50, 75, and 100% genotyping effort for white spruce tree height (HT: cm) and wood density (WD: kg/m3).  Progeny  HT  A (0%) H (25%) H (50%) H (75%) H (100%) A (0%) 1.00 (NA) 0.89 (0.035) * 0.76 (0.040) * 0.66 (0.034) * 0.55 (NA) * H (25%) 0.98 (0.003) * 0.98 (0.002) * 0.74 (0.012) * 0.66 (0.015) * 0.56 (0.023) * H (50%) 0.94 (0.010) * 0.95 (0.002) * 0.94 (0.006) * 0.70 (0.017) * 0.64 (0.034) * H (75%) 0.90 (0.008) * 0.91 (0.003) * 0.92 (0.006) * 0.94 (0.002) * 0.77 (0.024) * H (100%) 0.84 (NA) 0.86 (0.004) * 0.90 (0.006) * 0.95 (0.004) * 1.00 (NA) *  WD  A (0%) H (25%) H (50%) H (75%) H (100%) A (0%) 1.00 (NA) 0.92 (0.018) * 0.83 (0.023) * 0.74 (0.018) * 0.70 (NA) H (25%) 0.99 (0.002) * 0.99 (0.001) * 0.87 (0.014) * 0.80 (0.015) * 0.76 (0.017) H (50%) 0.96 (0.005) * 0.97 (0.001) * 0.96 (0.002) * 0.85 (0.012) * 0.84 (0.023) H (75%) 0.92 (0.006) * 0.93 (0.002) * 0.94 (0.003) * 0.96 (0.001) * 0.91 (0.019) H (100%) 0.88 (NA) 0.90 (0.003) * 0.92 (0.004) * 0.96 (0.003) * 1.00 (NA) NOTE: Standard deviation in parentheses. * Mean value(s)  64  Chapter 4: Inter- and intra-generation genomic predictions for Douglas-fir growth in unobserved environments 4.1 Introduction Breeding conifer species for phenotypic improvement is challenged due to the late expression of important phenotypes related to productivity and their late sexual maturity, causing long recurrent selection cycles. Genomic selection (GS) can address some of these shortcomings through early prediction of phenotypes based on large numbers of jointly considered genomic markers, most commonly, single nucleotide polymorphisms (SNPs) (Meuwissen et al. 2001). This solution is realized through improved management of co-ancestry and increased genetic gain via enhanced precision and accuracy of pairwise kinship estimates for the breeding population. The recent exploration of GS within the forest tree breeding framework has produced numerous promising studies, indicating that GS prediction accuracies are able to at least match and often surpass pedigree-based predictions (see reviews by Grattapaglia 2017; Grattapaglia et al. 2018). In forest trees, the effect of genotype-by-environment (G×E) interactions can be considerable and, if overlooked, detrimental to genetic gain optimization. Therefore, it is commonplace to regionalize tree breeding efforts based on available biogeoclimatic information to avoid maladaptation of deployed stock (i.e., defining species’ breeding zones) (Burdon 1977). However, maladaptation may still occur within regionalized areas due to trees’ long lifespans, their placement within highly heterogeneous growing environments, along with ongoing and predicted shifting of climatic boundaries due to climate change. Thus, it is essential to account for the effect of G×E interactions when predicting breeding values to either select genotypes that are stable across the breeding region or match specific genotypes with particular locations to capture  65  additional gain (Li et al. 2017) in the future. In forest trees’ genetic evaluations, G×E interactions are typically modeled using mixed linear models that allow for covariances to be estimated between environments assuming that a character measured in two environments represents two distinct traits (i.e., Type-B genetic correlations; Burdon 1977) (e.g. see e Silva et al. 2005; Cappa et al. 2012; Bian et al. 2014). However, for large numbers of environments, typically, this approach is computationally not practical and simplifications of covariance structures or factor analytic models must be put in place (Isik et al. 2017). Recent GS studies in conifers have demonstrated that predictions across environments suffer from marked decreases in accuracy when the prediction model’s training population does not include phenotypic observations from the environment to which genomic predictions were targeted (Gamal El-Dien et al. 2015; Thistlethwaite et al. 2017; Chen et al. 2018). Consequently, estimated marker effects would be considered specific to the environment(s) of the training populations. Without phenotypic observations from the target environment, genomic prediction is challenging. In crop plant systems, high-dimensional environmental covariates (ECs) have been implemented in GS models to improve multi-environment genomic predictions (Jarquín et al. 2014). The reaction- norm models use covariance structures analogous to the realized relationship matrix (VanRaden 2008) to model environmental and G×E interaction effects. Pérez-Rodríguez et al. (2015) and Morais Júnior et al. (2018) extended the use of the reaction-norm models to include the average numerator relationship matrix (𝑨) and the single-step combined relationship matrix (𝑯), respectively. Pedigree-based reaction-norm methods such as random regression were only very recently proposed in a forest tree G×E interactions study (Marchal et al. 2019). Single-step genomic evaluation (ssGBLUP) is a unified approach that allows the incorporation of phenotypic, genomic, and pedigree information into a single analysis (Legarra et  66  al. 2009; Misztal et al. 2009). This methodology allows the prediction of breeding values for genotyped and non-genotyped individuals to be on the same scale, avoiding bias and complex multi-step analyses (Vitezica et al. 2011). It also allows for the phenotypes of non-genotyped individuals to participate in the estimation of marker effects, effectively boosting the accuracy of prediction. Thus, the method also provides a cost-effective entry into GS as relatively few important individuals can be genotyped while phenotypic records of rogued trials can also easily be implemented yielding a modern analysis for estimating marker effects or breeding values. The use of ssGBLUP has recently been demonstrated to be effective in genetic evaluations of Picea glauca (Ratcliffe et al. 2017), Eucalyptus grandis (Cappa et al. 2018), and Eucalyptus nitens (Klápště et al. 2018). However, ssGBLUP-based reaction-norm remains untested in outcrossing species such as forest trees. A limitation of the ssGBLUP method, however, is that the combined genetic relationship matrix (𝑯) can be very dense when evaluating many individuals, leading to lengthy computation times (Legarra and Ducrocq 2012). Wang et al. (2012a) demonstrated that genomic estimated breeding values (GEBV) of the genotyped individuals from ssGBLUP analyses can be used to calculate SNP marker effects. Lourenco et al. (2015) applied this approach to American Angus cattle, coining it ‘indirect prediction’. The indirect prediction approach improves computational efficiency and allows for fast prediction of GEBV for new genotyped trees via SNP marker effects as opposed to a full ssGBLUP evaluation where new genotyped individuals need to be explicitly included. This study is based on the ‘maritime low’ coastal Douglas-fir (CDF) (Pseudotsuga menziesii (Mirb.) Franco var. menziesii) seed planning unit (SPU) which represents elevation bands 0-900 m in geographic areas west of the British Columbia (BC) coastal mountain range and  67  latitudinal gradients South 48°00’ - 52°00’. The CDF breeding program is the most advanced in BC and is currently in its third generation with advanced generation seed orchards producing 6.6 million seedling equivalents from 2012 to 2017. Using 14 experimental trials planted in different environments, monthly averages of ECs obtained from ClimateBC (Wang et al. 2012b) were used in conjunction with ssGBLUP (Legarra et al. 2009; Misztal et al. 2009) under a Bayesian mixed-model framework (Pérez and de los Campos 2014) to obtain genetic parameter estimates and genomic estimated breeding values for genotyped individuals. The method of Lourenco et al. (2015) was then used to obtain indirect predictions of breeding values of target populations in unobserved environments. Thus, the objectives of this study were to assess the influence of including monthly average ECs for environment and G×E effects in mixed model analysis on i) genetic parameter estimates for the studied population, ii) model fit criteria, and iii) between-environment genomic prediction accuracies for intra- and inter-generational predictions. 4.2 Materials and Methods 4.2.1 Study populations 4.2.1.1 Study population 1 (SP1) The trees in this study are from low elevation (0-900 m asl), third generation coastal Douglas-fir (Pseudotsuga menziesii (Mirb.) Franco var. menziesii) breeding population located in coastal British Columbia, Canada (Table 4.1). The parental (P0) generation consists of 78 wild plus-tree selections, which were crossed in partial disconnected diallels to produce 165 full-sib families for the second generation (F1). Full-sib families of the second generation were planted in 1975 using nursery container stock in ten environments using randomized complete block designs with four replicates per environment and four tree family row plots within the replicates (mean ≈ 15; range = 14-16; trees / full-sib family / environment). The third generation (F2) contains two series (F2- 68  2, F2-3), with no common parentage between them (Appendix C, Figure C.1,Table C.1, Table C.2) and are the result of crossing forward selections from the F1 generation, based on tree volume. F2-2 and F2-3 were planted in 2003 and 2006 respectively, both using nursery container stock, and both planted as full-sib block tests with 5 × 5 tree plots on two environments per series (mean ≈ 19; range ≈ 17-21; trees / full-sib family / environment). A total of 31,999 tree height [cm] phenotypic measurements were available for ages 12 (F1) and 11 (F2-2 and F2-3) years. Phenotypes were standardized by dividing total tree height by age [years] at time of measurement to provide the mean annual increment (MAI [cm/year]). 4.2.1.2 Study population 2 (SP2) A subset of SP1 was used for the genomic analyses in this study (Table 4.2). The subset population was created by setting a relatedness threshold of greater than 0 as the minimum expected additive genetic kinship coefficient value (𝑨, derived via pedigree) of an individual tree in the F1 or F2 generations, with at least one of the genotyped individuals in the F2 population. This resulted in 11,759 F1 phenotypes available for the genomic analyses (mean ≈ 15; range ≈ 14-16; trees / full-sib family / environment). The relatedness restriction resulted in an average additive genetic kinship coefficient of 0.033, as opposed to 0.012 in the population described previously. 4.2.2 SNP genotyping SNP genotypes were based on those produced and used by Thistlethwaite et al. (2017, 2019) with the full details available in Thistlethwaite et al. (2017). Briefly, DNA was extracted from cambial tissue of the F1 population and expanding needle tissue of the F2 population, and sent to RAPiD Genomics© (Florida, US) for SNP genotyping using whole exome capture (Neves et al. 2013). The SNPs used here differed from those used by Thistlethwaite et al. (2017, 2019) due to filtering criteria resulting in a different number of SNPs and genotyped trees for this study. Filtering criteria  69  was done using VCFtools (Danecek et al. 2011) and were as follows: maximum number of alleles = 2, minimum number of alleles = 2, Hardy-Weinberg Equilibrium exact test = 0.10, maximum sample missing = 0.40, maximum SNP missing = 0.40, minor allele frequency = 0.05, maximum site read depth = 50, minimum site read depth = 4. The ‘impute.svd’ function from the R package ‘bcv’ (Perry 2015) was used to impute the remaining missing data resulting in a final count of 66,969 SNPs for the genomic analyses. 4.2.3 Environmental covariates (ECs) Thirteen monthly ECs were obtained using ClimateBC software version 5.51 (Wang et al. 2012b) resulting in 156 individual environmental covariates per environment (i.e., 12 months by 13 variables). ClimateBC generates “scale-free” climate data, thus allows the user to obtain monthly climate variables for specific test environments rather than pixel averages from grid-based climate data. Monthly ECs were averaged across the growing period for each trial, from planting year until the year of phenotypic measurement, and included primary measures of temperature, precipitation, and solar radiation (see Table C.3 for a complete list of the thirteen primary and derived monthly variables). 4.2.4 Pedigree-based analyses (PBLUP) PBLUP analyses were used to estimate (co)variance components or functions of them (i.e., heritabilities and Type-B genetic correlations) and breeding values for the experimental population outlined in Table 4.1. All PBLUP analyses were completed using ASReml-R v3.0 (Butler et al. 2009), which uses the Average Information algorithm on Restricted Maximum Likelihood (REML) (Gilmour et al. 1995). Individual-tree breeding values were obtained under the following multi-environmental mixed linear model:  70  𝒚 =  𝑿𝛃 + 𝒁𝒓𝒓 +  𝒁𝒇𝒇 +  𝒁𝒂𝒂 +  𝒆 [1] where 𝒚 is the vector of phenotypic observations; 𝛃 is the fixed effect of environments; 𝒓 is the vector of random replicate effects within of each F1 environment, with 𝒓~𝑵(𝟎, 𝑮𝒓 ⨂ 𝑰), where 𝑮𝒓 is the diagonal (co)variance matrix between environments with diagonal elements 𝜎𝑟𝑖2  for each environment 𝑖 representing the replicate within environment variances, 𝑰 is the identity matrix, and ⨂ represents the Kronecker product of matrices; 𝒇 is the vector of random full-sib family genetic effects, with 𝒇~𝑵(𝟎, 𝑮𝒇 ⨂ 𝑰), where 𝑮𝒇 is the diagonal dominance (co)variance matrix between environments with diagonal elements 𝜎𝑓2 representing ¼ of dominance genetic variance (i.e., a common family variance for all environments was used); 𝒂 is the vector of additive genetic effects (or breeding values, EBV), with 𝒂~𝑵(𝟎, 𝑮𝒂 ⨂ 𝑨) where 𝑮𝒂 is the additive genetic (co)variance matrix between environments with diagonal elements 𝜎𝑎2 representing additive genetic variance and off-diagonal elements 𝜎𝑎 (i.e., a common additive genetic variance and covariance for all environments was used) representing the same additive genetic covariance between environments, and the matrix A contains the additive genetic relationships among all trees; 𝑿, 𝒁𝒓, 𝒁𝒇, and 𝒁𝒂 are the respective incidence matrices assigning fixed and random effects to each observation. Finally, 𝒆 is the vector of random residual effects, with 𝒆~𝑵(𝟎, 𝑹), where 𝑹 = ⨁ 𝑰 𝜎𝑒𝑖2 , 𝜎𝑒𝑖2  is the residual variance for each environment 𝑖, and ⨁ represents the ‘direct sum’ of matrices. A common variance for genetic dominance and additive effects was chosen to allow a more parsimonious model when using a large number of trees and environments. Single-environment narrow-sense heritability for each environment 𝑖 (ℎ𝑖2) was calculated using the same model from Eq. [1], except that in 𝑮𝒂 different 𝜎𝑎2 were used for each environment  71  𝑖 (𝜎𝑎𝑖2 ) (i.e., CORUH structure in ASReml). Then, ℎ𝑖2 were estimated as: ℎ𝑖2 = 𝜎𝑎𝑖2 /(𝜎𝑎𝑖2 + 𝜎𝑓2 +𝜎𝑒𝑖2 ). Finally, pairwise Type-B genetic correlations between environments 𝑖 and 𝑖′ (𝑟𝑔𝑖𝑖’) were estimated using the model of Eq. [1] but fitting two environments at a time, and 𝑮𝒂 was a heterogeneous (co)variance matrix with different additive genetic variances among environments and an additive genetic covariance between pairs of environments 𝑖 and 𝑖′ equal to 𝜎𝑎𝑖𝑖′ (i.e., US structure in ASReml) was used for the vector random additive genetic effects in this case. Then, 𝑟𝑔𝑖𝑖’ was estimated as: 𝑟𝑔𝑖𝑖’ =  𝜎𝑎𝑖𝑖′√𝜎𝑖𝑖2𝜎𝑖′𝑖′2⁄ . Standard errors of heritability and genetic correlation estimates were approximated using the Delta method (Lynch and Walsh 1998), using the standard error estimates for variance components obtained from ASReml. 4.2.5 Genomic-based analyses (ssGBLUP) Variance components, genetic parameters, and model fit (Deviance Information Criterion, Spiegelhalter et al. 2002) were estimated by fitting models to the data set in Table 4.2. Following Jarquín et al. (2014), the models fit were a series of five linear hierarchical single-step GBLUP (ssGBLUP) models. The R package ‘BGLR’ (Pérez and de los Campos 2014) was used. For variance component estimation, all data in Table 4.2 was fit to each model and the Monte Carlo Markov Chain (MCMC) was generated with 200,000 iterations thinned at a rate of 5, with the first 25,000 discarded as burn-in. Bayesian ridge regression (BRR; Gaussian prior) model was used for all model effects. The ‘BGLR’ package default starting parameters were used. Multiple chains were generated and the MCMC chain posterior means and trace plots of the posterior distributions  72  were compared to confirm convergence. Cholesky decomposition of all variance-covariance matrices was used to speed up computations. 4.2.5.1 Model 1 (M1): The first ssGBLUP model included an overall mean (μ) as a fixed effect, random environment effects (𝑺𝒊), random replicate within environment (𝑹𝒌), and random additive genetic effects (𝒂𝒋). Let 𝒚𝒊𝒋𝒌 and 𝒆𝒊𝒋𝒌 be the phenotype and residual effects, respectively, of individual 𝑗 for environment 𝑖, scored in replicate 𝑘. Then the mixed model to analyze the data is: 𝒚𝒊𝒋𝒌 = μ + 𝑺𝒊 + 𝑹𝒌 + 𝒂𝒋 + 𝒆𝒊𝒋𝒌  [2] where 𝑺𝒊 ~ 𝑵(𝟎, 𝑰𝜎𝑠2) , 𝑹𝒌 ~ 𝑵(𝟎, 𝑰𝜎𝑟2), 𝒂𝒋 ~ 𝑵(𝟎, 𝑯𝜎𝑎2), and 𝒆𝒊𝒋𝒌 ~ 𝑵(𝟎, 𝑰𝜎𝑒2) where the respective variances 𝜎𝑠2, 𝜎𝑟2, 𝜎𝑎2, and 𝜎𝑒2  are common for all the environments. The combined additive genetic relationship matrix, 𝑯, was calculated as follows (Legarra et al. 2009; Christensen and Lund 2010): 𝑯 = [𝑨11 + 𝑨12𝑨22−1(𝑮 − 𝑨22)𝑨22−1𝑨21 𝑨12𝑨22−1𝑮𝑮𝑨22−1𝑨21 𝑮] where 𝑨11 is the portion of the average numerator relationship matrix (𝑨) including the non-genotyped individuals, 𝑨22 is the portion of genotyped individuals, and 𝑨12 and 𝑨21 are the portions containing the expected additive genetic relationships between genotyped and non-genotyped individuals. The genomic additive relationship matrix for genotyped individuals, 𝑮, was calculated after VanRaden (2008): 𝑮 = 𝑴𝑴′/Σ𝑙𝑝𝑙(1 − 𝑝𝑙) where 𝑴 is the centered matrix of SNP covariates, and 𝑝𝑙 is the current (or observed) allele frequency of the genotyped trees for marker 𝑙. Further, 𝑮 was scaled for compatibility with 𝑨 such that the mean diagonal and off-diagonal elements were equal (Christensen et al. 2012):  73  𝑮∗  =  𝛽𝑮 +  𝛼, {𝐴𝑣𝑔(𝑑𝑖𝑎𝑔(𝑮))𝛽 +  𝛼 =  𝐴𝑣𝑔(𝑑𝑖𝑎𝑔(𝑨22))𝐴𝑣𝑔(𝑮)𝛽 +  𝛼 =  𝐴𝑣𝑔(𝑨22)  To avoid potential problems with the inversion of 𝑮∗, it was weighted as (Aguilar et al. 2010): 𝑮𝒘  =  0.95 ×  𝑮∗  +  (1 −  0.95)  ×  𝑨22  [8] 4.2.5.2 Model 2 (M2): 𝒚𝒊𝒋𝒌 = μ + 𝑺𝒊 + 𝑹𝒌 + 𝒂𝒋 + 𝒘𝒊𝒋 + 𝒆𝒊𝒋𝒌   [3] M2 extends model M1 (Eq. [2]) to include the vector of random effects for environmental covariates (𝒘) which is distributed as 𝒘 ~ 𝑵(𝟎, 𝛀𝜎𝑤2 ). Where 𝛀 is the matrix of similarity among environments calculated as 𝛀 ∝ 𝑾𝑾′ 𝑞⁄ , with 𝑾 being a centred and standardized matrix of ECs and 𝑞 the number of ECs as described by Jarquín et al. (2014) in detail, and 𝜎𝑤2  is the variance of the vector of ECs. 4.2.5.3 Model 3 (M3): 𝒚𝒊𝒋𝒌 = μ + 𝑺𝒊 + 𝑹𝒌 + 𝒂𝒋 + 𝒘𝒊𝒋 + 𝒂𝒘𝒊𝒋 + 𝒆𝒊𝒋𝒌  [4] M3 incorporates the vector of random effects for the interaction between additive genetic (𝒂) and EC (𝒂𝒘), 𝒂𝒘 ~ 𝑵(𝟎, 𝒁𝒂𝑯𝒁𝒂′ ∘ 𝛀𝜎𝑎𝑤2 ), where 𝜎𝑎𝑤2  is the variance of the interaction between additive genetic effects and ECs, and ∘ is the Hadamard (Schur) product of matrices. 4.2.5.4 Model 4 (M4): 𝒚𝒊𝒋𝒌 = μ + 𝑺𝒊 + 𝑹𝒌 + 𝒂𝒋 + 𝒘𝒊𝒋 + 𝒂𝑺𝒊𝒋 + 𝒆𝒊𝒋𝒌  [5] M4 includes a vector of random effects for the interaction between additive genetic and main environment terms (𝒂𝑺), 𝒂𝑺~ 𝑵[𝟎, 𝒁𝒂𝑯𝒁𝒂′ ∘ (𝐙𝑺𝐙𝑺′ )𝜎𝑎𝑆2 ], where 𝒁𝑺 is the incidence matrix for the effects of environments, and 𝜎𝑎𝑆2  is the variance of the interaction between additive genetic and main environmental effects.  74  4.2.5.5 Model 5 (M5): 𝒚𝒊𝒋𝒌 = μ + 𝑺𝒊 + 𝑹𝒌 + 𝒂𝒋 + 𝒘𝒊𝒋 + 𝒂𝒘𝒊𝒋 + 𝒂𝑺𝒊𝒋 + 𝒆𝒊𝒋𝒌 [6] Finally, M5 includes both interaction effects from M3 (i.e., 𝒂𝒘) (Eq. [4]) and M4 (i.e., 𝒂𝑺) (Eq. [5]). 4.2.5.6 Prediction accuracy and cross-validation Two scenarios were considered for estimating the prediction accuracy (PA) of the validation populations (VP, Table 4.2). In the first scenario (GA1), a leave one environment out approach for environments with genotyped individuals (Lost, Adam, and Fleet) was used to form the training population (TP). In the second scenario (GA2), the TP consisted of all individuals from the F1 generation. In either case, the non-genotyped individuals of the TP were randomly divided into five folds, that is, the phenotypes of genotyped individuals in the TP always participated in the estimation of SNP marker effects. The random folding process was replicated three times for each model (M1-M5). For each fold, genomic estimated breeding values (GEBV) of the genotyped VP were correlated to their EBV estimated in the PBLUP analysis (Eq. [1]). Prediction accuracy (𝑟(𝐺𝐸𝐵𝑉,𝐸𝐵𝑉)) was then calculated as the mean Pearson product-moment correlation of the three replicates. The procedure for obtaining GEBV for the VP follow closely the methods of Lourenco et al. (2015) and are outlined below. 4.2.5.7 Estimation of GEBV for the validation population 1) Use ssGBLUP models M1-M5 to obtain the vector of GEBV (𝒂) for the genotyped portion of the TP. 2) Calculate the direct genomic value for each genotyped tree 𝑗 (𝐷𝐺𝑉𝑗) in the TP (Aguilar et al. 2010):  75  𝐷𝐺𝑉𝑗 = − ( ∑ 𝑔𝑗′𝑗𝐺𝐸𝐵𝑉𝑗′/𝑔𝑗𝑗𝑗′,𝑗′≠𝑗) where 𝑔𝑗′𝑗 (and 𝑔𝑗𝑗) is an off-diagonal (and diagonal) element of the inverse of 𝑮 (𝑮−𝟏) corresponding to relationships between tree 𝑗′ and 𝑗 and 𝐺𝐸𝐵𝑉𝑗′ is the predicted additive genetic solutions from the ssGBLUP models M1-M5 for TP and individual 𝑗′. 3) Estimate SNP marker effects from the TP: ?̂? = 𝑫𝑴′𝑮−1(𝑫𝑮𝑽) where ?̂? is the vector of estimated SNP effects, 𝑫 is a weight diagonal matrix for SNP (here an identity matrix), 𝑴 as defined before, and DGV is the vector of DGV for the TP. 4) Calculate DGV for the trees in the VP: 𝐷𝐺𝑉𝑗 = 𝑴𝑗?̂? where 𝐷𝐺𝑉𝑗 and 𝑴𝑗 are the direct genomic values and the matrix of centered genotypes for tree 𝑗 in the VP and not included in the ssGBLUP models M1-M5, respectively. 5) Calculate approximate GEBV for the VP: 𝐺𝐸𝐵𝑉𝑗 ≈ 𝑤1𝑃𝐴 + 𝑤2𝐷𝐺𝑉𝑗 where PA is the parental average of individual 𝑗 in the VP and 𝑤1 and 𝑤2 are weights associated with the covariances of 𝐷𝐺𝑉𝑗 and PA: [𝑤1𝑤2] = [𝜎𝑃𝐴2 𝜎𝑃𝐴,𝐷𝐺𝑉𝑗𝜎𝐷𝐺𝑉𝑗,𝑃𝐴 𝜎𝐷𝐺𝑉𝑗2 ] [𝜎𝑃𝐴2𝜎𝐷𝐺𝑉𝑗2 ] Lourenco et al. (2015) note that this approximation excludes the pedigree prediction component from the selection index approach proposed by VanRaden et al. (2009). Parental breeding values for the F1 generation were calculated as the average  76  breeding value of all offspring without masked phenotypes. Parental breeding values for the F2 generation were directly estimated in the prediction models.  77  4.3 Results 4.3.1 Pedigree-based analyses (PBLUP) Individual tree single environment narrow-sense heritability estimates for the PBLUP analyses were low to moderate for the F1 generation trials (ℎ̂𝑖2 = 0.04 − 0.27) while estimates for the F2 generation were higher (ℎ̂𝑖2 = 0.13 − 0.42) (Table 4.3). Standard errors of heritability estimates were low indicating significance of the estimates. Type-B genetic correlation estimates between F1 trials were positive and generally high (?̂?𝑔𝑖𝑖’ = 0.51 − 0.98), representing agreement in allelic effects contributing to MAI among the tested environments. This result is expected since the trials are from a single breeding zone. Among the three genotyped F1 trials (Lost, Adam, Fleet) the Type-B genetic correlations were moderate to high (?̂?𝑔𝑖𝑖’ = 0.71 − 0.88), and between environments of F2-2 (?̂?𝑔𝑖𝑖’ = 0.62) and F2-3 (?̂?𝑔𝑖𝑖’ = 0.57). Type-B genetic correlations between F1 and F2 trials were in most cases likely not significant based on approximated standard errors or were non-estimable due to low genetic connectivity between environments (Appendix C, Figure C.1, Table C.1, Table C.2). 4.3.2 Genomic-based analyses (ssGBLUP) In M1 46% and 5% of the phenotypic variance was explained by the between environment (?̂?𝑆2) and within environment (?̂?𝑟2) effects, respectively (Table 4.4). With the addition of ECs (?̂?𝑤2 ) in M2 those values dropped to 34% and 3% respectively, with the ECs accounting for 22% of the phenotypic variance. The shift of explained variance from the between environment effect to ECs can be understood as a decomposition of environmental variance, showing that ECs in M2 are able to capture additional environmental variation not captured by M1. This shift was accompanied by  78  9% decrease in error variance (?̂?𝑒2) across models M1-M5. However, a minor increase in the DIC model fit statistic of M2 did not justify the addition of ECs to M1. Evaluating the addition of G×E interactions into the model structure (M3-M4) showed that the use of ECs (?̂?𝑎𝑤2 ) in M3 explained 4% more phenotypic variation than the use of the environment interaction effect (?̂?𝑎𝑆2 ) in M4. Large decreases in the DIC model fit statistic were observed with the addition of either 𝒂𝑺𝒊𝒋 or 𝒂𝒘𝒊𝒋 interaction effects in the model, justifying their use; however, when considering them individually the 𝒂𝒘𝒊𝒋 effect offered a better DIC model fit statistic. The joint use of both 𝒂𝑺𝒊𝒋 and 𝒂𝒘𝒊𝒋 interaction effects in the model to explain G×E accounted for 9% of the total phenotypic variation, indicating a small but important source of genetic variation in the breeding population. The DIC estimate for the most complex model (M5) was lowest, justifying the specification and use of all model parameters for the prediction of breeding values in this population. Narrow-sense heritability estimates for models M1-M5 were in agreement with the mean single environment narrow-sense heritability estimates from the prior PBLUP analysis. A decrease in additive genetic variance estimates and consequently narrow-sense heritability, was observed with increasing model complexity due primarily to the addition of G×E effects into the models, and a decrease in percent variance explained by the additive genetic effect (?̂?𝑎2) 9% M1 vs 7% M2-M5. The decrease in heritability highlights the confounding nature between additive genetic effect estimates and G×E interaction effects.  79  4.3.3 Genomic prediction accuracy 4.3.3.1 Intra-generation (GA1) Models M1-M5 prediction accuracies for EBV of genotyped individuals across environments and within the F1 generation varied (𝑟(𝐺𝐸𝐵𝑉,𝐸𝐵𝑉) = 0.447 − 0.640) (Table 4.5, Figure 4.1). Individual environment mean prediction accuracies for models M1-M5 were not equal among the three tested environments 𝑟(𝐺𝐸𝐵𝑉,𝐸𝐵𝑉) = 0.619 (Lost), 𝑟(𝐺𝐸𝐵𝑉,𝐸𝐵𝑉) = 0.513 (Fleet), and 𝑟(𝐺𝐸𝐵𝑉,𝐸𝐵𝑉) = 0.452 (Adam). The addition of ECs facilitated by model M2 gave minor increases in PA, up to 2% over M1 (Fleet). Gains in PA of up to 6% (M3, Lost; M5, Fleet) were observed by accounting for G×E effects in the population. Treatment of G×E effects using ECs (M3) performed better than using main environment effects (M4) when comparing to the base model scenario (M2). Standard errors of predictions were minor, ranging from 0.0000-0.0027, indicating low variation between cross-validation replicates. 4.3.3.2 Inter-generation (GA2) Testing the models to predict EBV across generations and across environments yielded an expectedly lower minimum PA than observed for GA1 (𝑟(𝐺𝐸𝐵𝑉,𝐸𝐵𝑉) = 0.317 − 0.538) (Table 4.5, Figure 4.1). In all models, prediction accuracies were greater for the Bigtree environment than they were for Jordan. Comparison of models M1 and M2 show that the addition of ECs in M2 did not produce an increase in PA for Bigtree and a 1% decrease was observed for Jordan. The addition of individual G×E terms in models M3 and M4 both gave increases in PA over the base model M1 for both Jordan and Bigtree. However, the use of ECs to explain G×E variation (M3) gave 13% (Jordan) and 6% (Bigtree) increases in prediction accuracies over M1 versus 8% (Jordan) and 4% (Bigtree) given by M4. No improvement in PA occurred for Jordan with the addition of both G×E terms in model M5 and estimates were, in fact, equal to those given by M3. For the Bigtree  80  environment, a small 2% gain in PA was observed for model M5 over model M4 versus 1% for model M3. 4.4 Discussion The capability of GS and its application to increase genetic gain in conifer forest tree breeding programs is at this point recognized as being a future certainty (Grattapaglia et al. 2018). Genetic gain is ultimately a product of trait heritability, accuracy, and intensity of selection, and the time interval to complete a round of selection, testing, and breeding. Of these factors, the accuracy of selection and time effort are most effectively addressed by GS methods. Previously, with forest trees, the ssGBLUP method was demonstrated by Ratcliffe et al. (2017), Cappa et al. (2018) and Klápště et al. (2018) to improve the accuracy of genetic parameter estimates and intra-generation breeding values in forest trees. Here, the ssGBLUP methodology permitted the use of rogued genetic trial data by collectively leveraging their phenotypic, pedigree and genomic information to participate in marker effect estimation and subsequent intra- and inter-generation breeding value predictions. Additionally, the merit of using indirect genomic predictions with the ssGBLUP method was supported by this study in reducing the computational time required to obtain predictions for unobserved samples. Further, the capabilities of ssGBLUP in boosting (or increasing) the training population size without the need of extra genotyping was demonstrated, an important aspect of genomic PA highlighted by Hayes et al. (2009). As a result, the training population (𝑛 = 11,759) used here, so far represents the largest for a genomic prediction study in forest trees. A study involving Norway spruce highlighted that a TP to VP ratio of up to 3:1 produced increased prediction accuracies for growth and wood quality traits (Chen et al. 2018). However, it should be noted that Tan et al. (2017) and Lenz et al. (2017) observed benefit in PA above the stated 3:1 ratio. Training population  81  size is critical when the G×E effect can be as substantial as commonly observed in forest trees. Thus, it is a boon when historical trials which represent testing of genotypes across many environments can be recovered and included in the genomic prediction analysis. Additionally, larger training populations will better capture trait architecture and the totality of genetic diversity of the breeding population, a substantial benefit for making inferences. This variation may include rare marker variants, which are effective for the prediction of oligogenic traits where they may account for larger proportions of trait heritability (e.g., Resende et al. 2017). In our analysis, the studied trait (MAI) is considered to have low heritability and complex genetic architecture (i.e., polygenic). Indeed, the heritability estimates (Table 4.3, Table 4.4) and marker effect plot (Appendix C, Figure C.2) agree, justifying the use of the BRR model and Gaussian prior choice for prediction of the GEBV. The effect of modeling G×E interactions on GS prediction accuracies in Douglas-fir, an outcrossing conifer tree species was explored here. Thereby, increased inter- and intra-generation PA of EBV through consideration of G×E effects defined not only by main environment effects but also by ECs in the predictive models was demonstrated. The inclusion of EC effects in the model by themselves (i.e., M2) presented a fractioning of the phenotypic variance into effects explained by the main environment effect (34%) and that of the ECs (22%). This was accompanied by a 7% decrease in the proportion of total variance explained by the residual effect, thus producing more realistic gain estimates. However, in most cases M2 offered little to no increase in PA over M1, which agrees with the results of previous studies which tested the same suite of models used here for various traits in plants such as rice, wheat, or cotton (Jarquín et al. 2014; Pérez-Rodríguez et al. 2015; Morais Júnior et al. 2018). Jointly, the specification of EC effects, 𝒘𝒊𝒋 and 𝒂𝒘𝒊𝒋, in M3 resulted in a decrease in estimated residual variance over M1 (16%) and the interaction effect  82  explained 9% of the phenotypic variation versus 5% of 𝒂𝑺𝒊𝒋, the interaction specified using the main environment effect (M4). Additionally, models M3 and M5 showed the most consistent increases in PA over the base model M1 which also corroborates the aforementioned crop plant-based studies. In our analysis, maximum intra-generation prediction accuracies were achieved using either M3 or M5 which always included 𝒘𝒊𝒋 and 𝒂𝒘𝒊𝒋 and ranged from 𝑟 =  0.461 –  0.640 depending on the target environment, which are similar to previous reports of intra-generation GS prediction accuracies based on multi-environment trials using conifer trees. Though many previous studies did not explicitly model or test the inclusion of G×E interaction effects in the training population or GS prediction model. For example, and perhaps most informative, was a four environment GS analysis of loblolly pine (Pinus taeda) by Resende et al. (2012a) which showed an average decrease of 23% in prediction accuracies for tree height (age 6) when the prediction was done in unobserved environments within the same breeding zone (𝑟 =  0.41 − 0.74). However, the same study noted an average decrease of 41% when the prediction was extended to between breeding zones (𝑟 =  0.23 − 0.55). Thus, Resende et al. (2012a) stressed the importance of regionalizing tree breeding efforts, breeding zone delineation, and the lack of  transferability of genomic prediction models across breeding zones. A later study by Beaulieu et al. (2014b) found large decreases (33-41%) in prediction accuracies for Quebec white spruce (Picea glauca) tree height (age 17) when predicting EBV in unobserved environments (𝑟 =  0.22 –  0.34). Lenz et al. (2017) however, reported only 2% and 9% reductions in PA for 25 year tree height for black spruce (Picea mariana) when their prediction model did not include the target environment (𝑟 = 0.52 −0.56). The results observed by Lenz et al. (2017) reflect the low G×E effects of their black spruce population.  83  Two previous studies did however explicitly model GxE in the training population. The first by Gamal El-Dien et al. (2015) showed an average decrease of 29% in genomic PA of tree height (age 38) for interior spruce (Picea glauca × engelmannii) when the target environment was not explicitly included in the training population (𝑟 = 0.37 − 0.53). However, the mentioned study did not compare the inclusion of specific model terms for G×E to investigate their impact on PA as was done in the present study. The second by Thistlethwaite et al. (2017) was based on the same F1 population of trees studied here, albeit with differences (see further discussion). Thistlethwaite et al. (2017) observed prediction accuracies of 𝑟 = 0.88 − 0.93 for age 12 tree height. This result contrasts with previous studies, since it is higher than the average PA when the training population contained observations from all three environments (𝑟 = 0.88). Empirical studies examining inter-generation GS prediction accuracies in forest tree species are very limited. Isik et al. (2016) were the first to include multiple generations in a GS analysis of a forest tree. The prior study was continued by Bartholomé et al. (2016) who observed moderately high prediction accuracies (𝑟 =  0.70) for age 12 tree height when using a training population of grandparents and parents to predict offspring. Using samples from the same breeding population of this study, Thistlethwaite et al. (2019) observed inter-generation PA of 𝑟 = 0.42 for the Jordan environment. However, the results presented by Thistlethwaite et al. (2019) are not directly comparable to those presented here due to many key methodological differences: i) genomic prediction methods, ii) training and validation population structure and composition, iii) marker subset and imputation method, iv) estimation method of EBV, and v) modeling of G×E factors. The inter-generation prediction accuracies observed in this study were nevertheless lower than those of the study for maritime pine juvenile tree height (𝑟 =  0.70) (Bartholomé et al. 2016).  84  Maximum inter-generation prediction accuracies for MAI here were 𝑟(𝐺𝐸𝐵𝑉,𝐸𝐵𝑉) = 0.360 (Jordan) and 𝑟(𝐺𝐸𝐵𝑉,𝐸𝐵𝑉) = 0.538 (Bigtree) given by model M6 in both cases. The inclusion of a main G×E interaction term is standard in individual-tree mixed models, yet the inclusion of ECs to explain environmental heterogeneity and genetic variation is not yet widely used in forestry. With the availability of ‘scale-free’ climate data for specific locations from programs such as ClimateBC (Wang et al. 2012b) and the NASA POWER Project (Stackhouse et al. 2017), breeders should opt for their use to help describe growing conditions and improve EBV predictions. Type-B genetic correlations in this study were high between the predicted F1 environments (?̂?𝑔𝑖𝑖’  =  0.71 − 0.88), which might account for the moderately high intra-generation prediction accuracies observed here. However, Coastal Douglas-fir is noted to have strong G×E interaction for growth traits (Campbell 1992; Cappa et al. 2016b), which may also help to explain the observed gain in PA in this study by accounting for G×E effects. Thus, it appears to be critical for species with moderate to strong G×E interactions to either i) explicitly include the target environment in the GS prediction model, or ii) incorporate G×E effects in the genomic prediction model by including observations from as many environments as possible from across the breeding zone, an opportunity made possible with ssGBLUP. To our knowledge, this is the first use of ECs to capture environmental heterogeneity and G×E effects on phenotypic variation in outcrossing trees for genomic prediction of breeding values. The prior mentioned studies concerning forest trees and multi-environment trials have not considered methods to improve predictions in unobserved environments as was done in this study. Here, the inclusion of 𝒘𝒊𝒋, as well as both  𝒂𝑺𝒊𝒋 and 𝒂𝒘𝒊𝒋 interaction effects in the model are  85  advocated as the gain in PA was generally consistent across all tested environments and generations.  From a practical standpoint, the base (M2) and fully specified (M5) models were mostly agreeable in candidate rankings of the predicted individuals from the unobserved environments (Appendix C, Figure C.3). It is clear from the low discordance in rankings of highest and lowest ranked individuals that the same selections would be made irrespective of the model used. However, forest tree breeding programs by necessity require minimum effective population sizes (𝑁𝑒) which requires the selection of some sub-optimal individuals to maintain adequate genetic diversity in the breeding population. For these individuals (i.e., mid-ranked candidates) there is much less concordance in rankings between the two models (Appendix C, Figure C.3). During this phase, the selection of these individuals becomes important to balance genetic gain with and genetic diversity in the breeding population. This result is in agreement with Stejskal et al. (2018) who noted the capacity of GS to accurately capture Mendelian sampling (i.e., within family genetic variance) and cryptic relatedness in the breeding population, making it a preferred approach over traditional pedigree-based selection in forestry. In addition to the combination of high-density genomic and environmental/climate data, it is realistic to assume that prediction of phenotypes can be further improved through the simultaneous inclusion of multiple phenotypic and environmental traits obtained via high-throughput platforms such as those produced by remote sensing (Araus et al. 2018; Dungey et al. 2018). Information sharing between genetic lines, correlated traits and correlated environments via multiple-trait models with the specification of G×E interaction effects will lead to improved phenotypic predictions. Finally, as the anticipated increasing role of GS in tree breeding, it is essential to highlight the importance of PA improvement through the utilization of all possible  86  available sources of information (i.e., phenotypic and environmental) such as was demonstrated here with the inclusion of non-genotyped individuals and their testing environments, as well as environmental covariates to better model G×E effects. 4.5 Summary 4.5.1 Background Conifers are prime candidates for GS due to their long recurrent breeding cycles. Previous studies have shown much reduced PA of breeding values in unobserved environments, which may impede its adoption. The impact of explicit environmental heterogeneity modeling including G×E interaction effects using ECs in a reaction-norm genomic prediction model was tested using ssGBLUP. Juvenile tree height measurements of a three-generation coastal Douglas-fir experimental population with 14 genetic trials (𝑛 = 13,615) permitted the estimation of intra- and inter-generation PA in unobserved environments using 66,969 SNPs derived from exome capture. 4.5.2 Results Intra- and inter-generation PAs ranged from 0.447 − 0.640 and 0.317 − 0.538, respectively. The inclusion of ECs in the prediction models explained up to 23% of the phenotypic variation for the fully specified model and resulted in the best model fit. Modeling G×E effects in the training population increased PA up to 6% and 13% over the base model for inter- and intra-generations, respectively.  4.5.3 Conclusions This chapter demonstrates that GS PA can be substantially improved using ECs to explain environmental heterogeneity and G×E effects. The ssGBLUP methodology allows historical genetic trials containing non-genotyped samples to contribute in genomic prediction, and, thus, effectively boost training population size which is a critical component of GS. This study also  87  emphasizes the importance of utilizing all possible sources available information in the genetic analysis of complex traits. 88  Table 4.1 Experimental population summaries for pedigree-based analyses. Environment Ntree NPar NFSFam Age Latitude Longitude Elevation (mas) Date Parents (F1)  Lost 2,464 78 165 12 49° 22′ 15″ 122° 14′ 07″ 424 1975-1987 Adam 2,368 78 165 12 50° 24′ 42” 126° 09′ 37″ 576 1975-1987 Fleet 2,595 78 165 12 48° 39′ 25″ 128° 05′ 05″ 561 1975-1987 Sechelt 2,472 78 165 12 49° 28' 05" 123° 42' 00" 227 1975-1987 Eldred 2,389 78 165 12 50° 08' 51" 124° 11' 19" 148 1975-1987 Menzies 2,509 78 165 12 50° 08' 55" 125° 38' 16" 333 1975-1987 Sproat 2,485 78 165 12 49° 17' 44" 125° 03' 10" 318 1975-1987 Squamish 2,477 78 165 12 50° 11' 51" 123° 22' 40" 470 1975-1987 Tansky 2,607 78 165 12 48° 27' 52" 124° 01' 48" 545 1975-1987 White 2,503 78 165 12 50° 06' 03" 126° 04' 54" 409 1975-1987 Series 2 (F2-2)  Jordan 1,551 74 79 11 48° 25’ 86" 124° 01’ 80" 160 2003-2014 North Arm 1,583 75 81 11 48° 50’ 72" 124° 06’ 53" 168 2003-2014 Series 3 (F2-3)  Big Tree 2,228 73 108 11 50° 15’ 80" 125° 43’ 59" 225 2006-2017 Roach 1,768 73 104 11 48° 45’ 08" 124° 06' 14" 230 2006-2017 NOTE: Ntree, Number of individual tree observations; NPar, Combined number of male and female parents; NFSFam, Number of full-sibling families; Age, Age of tree height  measurement; Date, Year of planting to year of tree height measurement.   89  Table 4.2 Training (T) and validation (V) population summaries for genomic analyses. Environment Generation NTree NPar NFSFam NGenotyped GA1 GA2 Lost F1 1,167 36 78 288 T / V T Adam F1 1,114 36 78 285 T / V T Fleet F1 1,228 36 78 295 T / V T Sechelt F1 1,164 36 78 - T T Eldred F1 1,130 36 78 - T T Menzies F1 1,191 36 78 - T T Sproat F1 1,170 36 78 - T T Squamish F1 1,169 36 78 - T T Tansky F1 1,236 36 78 - T T White F1 1,190 36 78 - T T Jordan F2-2 171 9 8 79 - V Bigtree F2-3 821 26 41 230 - V NOTE: GA1, Genomic Analysis 1; GA2, Genomic Analysis 2; T, training population; V, validation population; NTree, Number of  individual tree observations; NPar, Combined number of male and female parents; NFSFam, Number of full-sibling families. 90  Table 4.3 Pedigree-based single environment narrow-sense heritabilities (diagonal) and Type-B genetic correlations (lower triangle) for MAI, standard errors in parentheses.  Lost Adam Fleet Sechelt White Menzies Sproat Tansky Eldred Squamish Bigtree Roach Jordan Northarm Lost 0.13 (0.002)              Adam 0.71 (0.128) 0.18 (0.002)             Fleet 0.83 (0.131) 0.88 (0.089) 0.22 (0.004)            Sechelt b 0.77 (0.097) 0.98 (0.071) 0.22 (0.003)           White 0.73 (0.138) 0.85 (0.100) 0.93 (0.078) 0.93 (0.078) 0.04 (0.014)          Menzies 0.83 (0.137) 0.90 (0.095) 0.92 (0.117) 0.91 (0.099) 0.97 (0.097) 0.17 (0.003)         Sproat 0.55 (0.163) 0.51 (0.142) 0.78 (0.116) 0.80 (0.104) 0.78 (0.115) 0.81 (0.128) 0.21 (0.003)        Tansky 0.73 (0.138) 0.63 (0.121) 0.89 (0.083) 0.84 (0.090) 0.67 (0.125) 0.72 (0.132) 0.60 (0.136) 0.27 (0.005)       Eldred 0.77 (0.159) 0.61 (0.148) 0.77 (0.144) 0.98 (0.100) 0.91 (0.108) 0.89 (0.139) 0.86 (0.139) 0.72 (0.148) 0.14 (0.002)      Squamish 0.83 (0.103) 0.85 (0.068) 0.87 (0.076) 0.84 (0.080) 0.80 (0.090) 0.93 (0.070) 0.62 (0.120) 0.64 (0.120) 0.70 (0.135) 0.25 (0.002)     Bigtree -0.23 (0.604) -0.01 (0.474) 0.53 (0.394) -0.14 (0.472) 0.09 (0.529) 0.13 (0.597) -0.53 (0.455) -0.04 (0.554) -0.06 (0.575) -0.05 (0.565) 0.13 (0.000)    Roach 0.48 (0.468) 0.46 (0.368) 0.32 (0.418) 0.31 (0.407) 0.20 (0.471) 0.51 (0.452) -0.03 (0.506) 0.24 (0.480) 0.19 (0.500) 0.70 (0.338) 0.57 (0.133) 0.33 (0.000)   Jordan 0.73 (0.488) 0.64 (0.430) 0.20 (0.504) 0.72 (0.400) 0.80 (0.540) b 0.98 (0.369) 0.41 (0.573) b 0.35 (0.455) b b 0.42 (0.006)  Northarm b b b 0.89 (0.430) b b b b b 0.75 (0.435) b b 0.62 (0.194) 0.12 (0.003) NOTE: b, Type-B genetic correlation and their approximate standard errors were not estimated due to convergence problems.    91  Table 4.4 Model Deviance Information Criterion (DIC), posterior means and their 95% highest posterior density (HPD) intervals in brackets for the additive variance (?̂?𝒂𝟐), replicate within environment variance (?̂?𝒓𝟐), environmental variance (?̂?𝒔𝟐), environment covariates variance (?̂?𝒘𝟐 ), the interaction between additive effects and environment covariates (?̂?𝒂𝒘𝟐 ), the variance of the interaction between additive and environmental effects (?̂?𝒂𝑺𝟐 ), residual variance (?̂?𝒆𝟐), heritability (?̂?𝟐). Model DIC ?̂?𝑎2 ?̂?𝑟2 ?̂?𝑠2 ?̂?𝑤2  ?̂?𝑎𝑤2  ?̂?𝑎𝑆2  ?̂?𝑒2 ℎ̂2 M1 87846.80 20.02 11.11 102.53 - - - 91.35 0.18   [13.00 27.59] [6.35 16.63] [38.18 193.00] - - - [86.24 96.17] [0.12 0.24] M2 87850.41 19.11 9.20 90.52 58.04 - - 91.55 0.17   [12.58 26.08] [5.27 14.01] [24.32 186.18] [10.15 181.48] - - [86.67 96.23] [0.12 0.23] M3 87176.25 18.51 8.05 86.16 33.86 19.99 - 76.61 0.16   [11.89 25.15] [4.65 12.22] [26.04 168.63] [8.22 70.70] [14.88 25.31] - [71.10 82.19] [0.11 0.21] M4 87380.72 17.41 8.03 85.50 38.95 - 12.82 81.12 0.16   [11.27 23.88] [4.57 12.32] [21.92 172.05] [8.84 99.37] - [9.67 15.94] [76.17 86.00] [0.10 0.21] M5 87159.53 16.53 7.20 90.08 30.08 12.19 8.52 76.43 0.15   [10.87 22.92] [4.07 10.97] [25.45 181.30] [5.37 65.72] [6.96 17.48] [5.64 11.49] [71.27 81.92] [0.10 0.20] NOTE: See text for variance parameter definitions. 92  Table 4.5 Intra- (GA1) and inter-generation (GA2) cross-validation prediction accuracies (𝒓(𝑮𝑬𝑩𝑽,𝑬𝑩𝑽)) with standard errors (𝑺𝑬). See text for models´ abbreviations. Model Lost (GA1)  Adam (GA1)  Fleet (GA1)  Jordan (GA2)  Bigtree (GA2)  𝑟(𝐺𝐸𝐵𝑉,𝐸𝐵𝑉) 𝑆𝐸  𝑟(𝐺𝐸𝐵𝑉,𝐸𝐵𝑉) 𝑆𝐸  𝑟(𝐺𝐸𝐵𝑉,𝐸𝐵𝑉) 𝑆𝐸  𝑟(𝐺𝐸𝐵𝑉,𝐸𝐵𝑉) 𝑆𝐸  𝑟(𝐺𝐸𝐵𝑉,𝐸𝐵𝑉) 𝑆𝐸 M1 0.601 0.0000  0.447 0.0004  0.498 0.0006  0.320 0.0012  0.506 0.0006 M2 0.613 0.0011  0.453 0.0011  0.513 0.0027  0.317 0.0011  0.506 0.0008 M3  0.640 0.0007  0.461 0.0006  0.519 0.0010  0.360 0.0015  0.535 0.0009 M4  0.615 0.0011  0.448 0.0010  0.515 0.0010  0.346 0.0029  0.526 0.0006 M5 0.629 0.0009  0.452 0.0010  0.526 0.0023  0.360 0.0016  0.538 0.0004   93   Figure 4.1 Intra-generation (GA1) and inter-generation (GA2) predictions of EBV. 94  Chapter 5: Conclusion 5.1 Research novelties This dissertation explored some of the challenges associated with the application of GS in forest tree improvement programs. This work is unique among previous work primarily due to the variety of methods and data types employed: • The availability of data for three economically important conifer species, interior spruce (Chapter 2), white spruce (Chapter 3), and Douglas-fir (Chapter 4), allowed for a multi-species investigation of GS. While species specific results could not be compared directly due to differing methodologies, all the studies produced results that showed positive benefit from applying GS technologies to their breeding programs. • The availability of genomic data from three different genotyping platforms, genotyping by sequencing (Chapter 2), genotyping array (Chapter 3), exome capture (Chapter 4), allowed for validation of their use for GS. GBS and exome capture methods which are lauded for cost-efficiency and use in non-model organisms were proven for the first time to be as effective as array based genotyping platforms for GS in forest trees. • The first application of HBLUP/ssGBLUP in a forest tree context was investigated (Chapter 3). • The use of scale-free climate data as ECs for the purpose of GS was explored for the first time in a forest tree species (Chapter 4). 5.2 Research contributions Specific to the objectives outlined in Chapter 1.6, the contributions of this dissertation are as follows:   95  • Assessing the temporal decay in prediction accuracy (Chapter 2).  Temporal PA was well correlated with age-age genetic correlation and decreased substantially with increasing difference in age between the training and validation populations (Chapter 2). This finding suggests that updating of GS models will likely require phenotypic data from mid-rotation age or later to accurately reflect mature growth trait performance • Evaluating the potential of genotyping by sequencing as a genotyping platform for GS (Chapter 2). Genotyping by sequencing was proven as a suitable marker discovery platform for the application of genomic selection in forest trees. Using different imputation methods, differing numbers of SNP markers were retained for analysis. Increasing number of SNP markers was linked to producing slightly greater PA in the studied traits. • Exploring the potential for reducing genotyping effort and its impact on the accuracy and precision of genetic parameter estimates (Chapter 3). HBLUP is an effective tool for improvement in the genetic evaluation of open-pollinated mating designs. Modest genotyping effort within open-pollinated families is effective at improving important population level genetic parameters such as heritability and genetic correlations. Further, the modest within family genotyping effort was effective at capturing historical coancestry (i.e., population structure) and Mendelian sampling within the breeding population that were not captured by the contemporary pedigree. • Comparing the effectiveness of genomic selection to classical tree improvement strategies (Chapter 2, Chapter 3).  96  The observed GS PA in Chapter 2 were adequate to produce greater genetic gain over classical tree improvement strategies. In both early and mature selection scenarios, the relative efficiency of GS (based on PA) was at least 112% of classical methods. Further, by permitting both pedigree and marker information in a single genetic evaluation analysis, the HBLUP approach was associated with continuous improvement in model fit, precision of genetic parameters, and breeding value accuracy over traditional methods (Chapter 3). Discordance in candidate rankings between HBLUP and the traditional method that utilizes 𝑨 were observed. Correct ranking of selection candidates is crucial to maximizing genetic gain in breeding programs, representing a benefit to GS over classical methods. • Evaluating the potential of exome capture as a genotyping platform for GS (Chapter 4). Exome capture was proven as a suitable marker discovery platform for the application of genomic selection in Douglas-fir. Exome capture genomic data produced significant PA for juvenile tree height and permitted the evaluation of inter- and intra-generation prediction accuracy in unobserved environments. • Evaluating inter- and intra-generation prediction accuracy in unobserved environments (Chapter 4). The ssGBLUP analysis was successful in producing moderate to high intra- and inter-generation PA in unobserved environments (Chapter 4). The study suggests that with the inclusion of scale-free climate variables as EC in the prediction model yields improvement in PA. 5.3 Future research directions The capability of GS and its application to increase genetic gain for important traits in conifer forest tree breeding programs could at this point be recognized as a future certainty. The  97  availability of mass genomic data has enabled substantiation of this claim and has permitted the studies contained in this dissertation and numerous others in the literature. GS is very likely to cause a paradigm shift in the world of tree improvement as the cost and availability of genomic data for large numbers of individuals decreases. As DNA sequencings technologies continue to improve and evolve, the availability and quality of genomic resources for organisms with highly complex genomes (e.g., conifers) will increase. With this, functional genomic studies will seek to identify and annotate regions of conifer genomes with particularly high influence on important traits. These regions can readily be integrated in GS prediction models by providing more weight to DNA markers associated with them in the GS analysis offering prospective improvement in PA. This concept has been proven to be effective in aquaculture and animal breeding where high-quality genomic resources are readily available (Freitas et al. 2015; Zhang et al. 2016; Vallejo et al. 2017; Teissier et al. 2018).  With the foreseen implementation of GS in tree improvement programs, the labour intensive activity of phenotyping will remain the bottleneck. Thus, the integration of geomatics and genomics is another logical direction of future forest tree improvement research. Remote sensing technologies are now capable of obtaining mass quantities of high-resolution environmental and individual tree phenotypic data. This data will help to explain the observed phenotypic, environmental, and G×E variation in breeding populations and contribute to improved phenotypic predictions. Perhaps most importantly, the prospect of monitoring phenotypes at the individual tree level through time will generate very high-quality data, a compulsory requirement of GS to train models and produce accurate predictions. These concepts are extensively utilized in agricultural crop breeding systems (Araus et al. 2018), however due to the scale and complexity  98  of forest tree genetic trials, they will require drastically different approaches and methodologies (Dungey et al. 2018).  99  References Aguilar I., I. Misztal, D. L. Johnson, A. Legarra, S. Tsuruta, et al., 2010 Hot topic: A unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score. Journal of Dairy Science 93: 743–752. https://doi.org/10.3168/jds.2009-2730 Araus J. L., S. C. Kefauver, M. Zaman-Allah, M. S. Olsen, and J. E. Cairns, 2018 Translating high-throughput phenotyping into genetic gain. Trends in Plant Science 23: 451–466. https://doi.org/10.1016/j.tplants.2018.02.001 Askew G. R., and Y. A. El-Kassaby, 1994 Estimation of relationship coefficients among progeny derived from wind-pollinated orchard seeds. Theoret. Appl. Genetics 88: 267–272. https://doi.org/10.1007/BF00225908 Bartholomé J., J. Van Heerwaarden, F. Isik, C. Boury, M. Vidal, et al., 2016 Performance of genomic prediction within and across generations in maritime pine. BMC Genomics 17: 604. https://doi.org/10.1186/s12864-016-2879-8 Beaulieu J., T. Doerksen, S. Clément, J. MacKay, and J. Bousquet, 2014a Accuracy of genomic selection models in a large population of open-pollinated families in white spruce. Heredity 113: 343–352. https://doi.org/10.1038/hdy.2014.36 Beaulieu J., T. K. Doerksen, J. MacKay, A. Rainville, and J. Bousquet, 2014b Genomic selection accuracies within and between environments and small breeding groups in white spruce. BMC Genomics 15: 1048. https://doi.org/10.1186/1471-2164-15-1048  100  Beavis W., 1998 QTL analyses: Power, precision, and accuracy., pp. 145–162 in Molecular Dissection of Complex Traits., edited by Paterson A. CRC Press, Boca Raton, FL. Berger S., P. Pérez‐Rodríguez, Y. Veturi, H. Simianer, and G. de los Campos, 2015 Effectiveness of shrinkage and variable selection methods for the prediction of complex human traits using data from distantly related individuals. Annals of Human Genetics 79: 122–135. https://doi.org/10.1111/ahg.12099 Bian L., J. Shi, R. Zheng, J. Chen, and H. X. Wu, 2014 Genetic parameters and genotype–environment interactions of Chinese fir (Cunninghamia lanceolata) in Fujian Province. Canadian Journal of Forest Research 44: 582–592. https://doi.org/10.1139/cjfr-2013-0427 Birol I., A. Raymond, S. D. Jackman, S. Pleasance, R. Coope, et al., 2013 Assembling the 20 Gb white spruce (Picea glauca) genome from whole-genome shotgun sequencing data. Bioinformatics 29: 1492–1497. https://doi.org/10.1093/bioinformatics/btt178 Burdon R. D., and C. J. A. Shelbourne, 1971 Breeding populations for recurrent selection: conflicts and possible solutions. New Zealand Journal of Forestry Science. Burdon R. D., 1977 Genetic correlation as a concept for studying genotype-environment interaction in forest tree breeding. Silvae Genetica 26: 168–175. Butler D. G., A. R. Gilmour, and B. J. Gogel, 2009 ASReml-R reference manual. The State of Queensland, Department of Primary Industries and Fisheries Brisbane, Australia.  101  Calus M. P. L., T. H. E. Meuwissen, A. P. W. de Roos, and R. F. Veerkamp, 2008 Accuracy of genomic selection using different methods to define haplotypes. Genetics 178: 553–561. https://doi.org/10.1534/genetics.107.080838 Campbell R. K., 1992 Genotype * environment interaction: a case study for Douglas-fir in western Oregon. Res. Pap. PNW-RP-455. Portland, OR: US Department of Agriculture, Forest Service, Pacific Northwest Research Station. 21 p 455. Cappa E. P., A. D. Yanchuk, and C. V. Cartwright, 2012 Bayesian inference for multi-environment spatial individual-tree models with additive and full-sib family genetic effects for large forest genetic trials. Annals of Forest Science 69: 627–640. https://doi.org/10.1007/s13595-011-0179-7 Cappa E. P., J. Klápště, M. N. Garcia, P. V. Villalba, and S. N. Marcucci Poltri, 2016a SSRs, SNPs and DArTs comparison on estimation of relatedness and genetic parameters’ precision from a small half-sib sample population of Eucalyptus grandis. Molecular Breeding 36: 97. https://doi.org/10.1007/s11032-016-0522-7 Cappa E. P., M. U. Stoehr, C.-Y. Xie, and A. D. Yanchuk, 2016b Identification and joint modeling of competition effects and environmental heterogeneity in three Douglas-fir (Pseudotsuga menziesii var. menziesii) trials. Tree Genetics & Genomes 12: 102. https://doi.org/10.1007/s11295-016-1061-4 Cappa E. P., Y. A. El-Kassaby, F. Muñoz, M. N. Garcia, P. V. Villalba, et al., 2018 Genomic-based multiple-trait evaluation in Eucalyptus grandis using dominant DArT markers. Plant Science 271: 27–33. https://doi.org/10.1016/j.plantsci.2018.03.014  102  Chen C., S. E. Mitchell, R. J. Elshire, E. S. Buckler, and Y. A. El-Kassaby, 2013 Mining conifers’ mega-genome using rapid and efficient multiplexed high-throughput genotyping-by-sequencing (GBS) SNP discovery platform. Tree Genetics & Genomes 9: 1537–1544. https://doi.org/10.1007/s11295-013-0657-1 Chen Z.-Q., J. Baison, J. Pan, B. Karlsson, B. Andersson, et al., 2018 Accuracy of genomic selection for growth and wood quality traits in two control-pollinated progeny trials using exome capture as the genotyping platform in Norway spruce. BMC Genomics 19: 946. https://doi.org/10.1186/s12864-018-5256-y Christensen O. F., and M. S. Lund, 2010 Genomic prediction when some animals are not genotyped. Genetics Selection Evolution 42: 2. https://doi.org/10.1186/1297-9686-42-2 Christensen O. F., P. Madsen, B. Nielsen, T. Ostersen, and G. Su, 2012 Single-step methods for genomic evaluation in pigs. animal 6: 1565–1571. https://doi.org/10.1017/S1751731112000742 Combs E., and R. Bernardo, 2013 Accuracy of genomewide selection for different traits with constant population size, heritability, and number of markers. The Plant Genome 6. https://doi.org/10.3835/plantgenome2012.11.0030 Crossa J., Y. Beyene, S. Kassa, P. Pérez, J. M. Hickey, et al., 2013 Genomic prediction in maize breeding populations with genotyping-by-sequencing. G3: Genes, Genomes, Genetics 3: 1903–1926. https://doi.org/10.1534/g3.113.008227  103  Daetwyler H. D., R. Pong-Wong, B. Villanueva, and J. A. Woolliams, 2010 The impact of genetic architecture on genome-wide evaluation methods. Genetics 185: 1021–1031. https://doi.org/10.1534/genetics.110.116855 Danecek P., A. Auton, G. Abecasis, C. A. Albers, E. Banks, et al., 2011 The variant call format and VCFtools. Bioinformatics 27: 2156–2158. https://doi.org/10.1093/bioinformatics/btr330 Doerksen T. K., and C. M. Herbinger, 2010 Impact of reconstructed pedigrees on progeny-test breeding values in red spruce. Tree Genetics & Genomes 6: 591–600. https://doi.org/10.1007/s11295-010-0274-1 Ducrocq V., and C. Patry, 2010 Combining genomic and classical information in national BLUP evaluation to reduce bias due to genomic pre-selection. Interbull Bulletin 41: 33–36. Dungey H. S., J. P. Dash, D. Pont, P. W. Clinton, M. S. Watt, et al., 2018 Phenotyping whole forests will help to track genetic performance. Trends in Plant Science 23: 854–864. https://doi.org/10.1016/j.tplants.2018.08.005 Dutkowski G. W., J. C. e Silva, A. R. Gilmour, and G. A. Lopez, 2002 Spatial analysis methods for forest genetic trials. Can. J. For. Res. 32: 2201–2214. https://doi.org/10.1139/x02-111 El-Dien O. G., B. Ratcliffe, J. Klápště, I. Porth, C. Chen, et al., 2018 Multienvironment genomic variance decomposition analysis of open-pollinated Interior spruce (Picea glauca x engelmannii). Mol Breeding 38: 26. https://doi.org/10.1007/s11032-018-0784-3  104  El-Kassaby Y. A., and M. Lstibůrek, 2009 Breeding without breeding. Genetics Research 91: 111–120. https://doi.org/10.1017/S001667230900007X El-Kassaby Y. A., E. P. Cappa, C. Liewlaksaneeyanawin, J. Klápště, and M. Lstibůrek, 2011 Breeding without breeding: is a complete pedigree necessary for efficient breeding? PLOS ONE 6: e25737. https://doi.org/10.1371/journal.pone.0025737 El-Kassaby Y. A., J. Klápště, and R. D. Guy, 2012 Breeding without breeding: selection using the genomic best linear unbiased predictor method (GBLUP). New Forests 43: 631–637. https://doi.org/10.1007/s11056-012-9338-4 Elshire R. J., J. C. Glaubitz, Q. Sun, J. A. Poland, K. Kawamoto, et al., 2011 A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLOS ONE 6: e19379. https://doi.org/10.1371/journal.pone.0019379 Endelman J. B., 2011 Ridge regression and other kernels for genomic selection with R package rrBLUP. The Plant Genome 4: 250–255. https://doi.org/10.3835/plantgenome2011.08.0024 Falconer D. S., and T. F. C. Mackay, 1996 Introduction to quantitative genetics. Longmans Green, Harlow, Essex, UK. Fisher R. A., 1919 XV.—the correlation between relatives on the supposition of Mendelian inheritance. Earth and Environmental Science Transactions of The Royal Society of Edinburgh 52: 399–433. https://doi.org/10.1017/S0080456800012163  105  Forni S., I. Aguilar, and I. Misztal, 2011 Different genomic relationship matrices for single-step analysis using phenotypic, pedigree and genomic information. Genetics Selection Evolution 43: 1. https://doi.org/10.1186/1297-9686-43-1 Freitas M. S., L. S. Freitas, T. Weber, M. Yamaki, M. E. Cantão, et al., 2015 Comparison of iterated single-step and Bayesian regressions on genomic evaluations for age at 100 kg in swine. J Anim Sci 93: 4675–4683. https://doi.org/10.2527/jas.2014-8842 Frentiu Francesca D, Clegg Sonya M, Chittock John, Burke Terry, Blows Mark W, et al., 2008 Pedigree-free animal models: The relatedness matrix reloaded. Proceedings of the Royal Society B: Biological Sciences 275: 639–647. https://doi.org/10.1098/rspb.2007.1032 Gamal El-Dien O., B. Ratcliffe, J. Klápště, C. Chen, I. Porth, et al., 2015 Prediction accuracies for growth and wood attributes of interior spruce in space using genotyping-by-sequencing. BMC Genomics 16: 370. https://doi.org/10.1186/s12864-015-1597-y Gamal El-Dien O., B. Ratcliffe, J. Klápště, I. Porth, C. Chen, et al., 2016 Implementation of the realized genomic relationship matrix to open-pollinated white spruce family testing for disentangling additive from nonadditive genetic effects. G3: Genes, Genomes, Genetics 6: 743–753. https://doi.org/10.1534/g3.115.025957 Garrick D. J., J. F. Taylor, and R. L. Fernando, 2009 Deregressing estimated breeding values and weighting information for genomic regression analyses. Genetics Selection Evolution 41: 55. https://doi.org/10.1186/1297-9686-41-55  106  Gilmour A. R., R. Thompson, and B. R. Cullis, 1995 Average Information REML: An efficient algorithm for variance parameter estimation in linear mixed models. Biometrics 51: 1440–1450. https://doi.org/10.2307/2533274 Gilmour A. R., B. Gogel, B. Cullis, and R. Thompson, 2009 ASReml User Guide Release 3.0. VSN International Ltd., Hemel Hempstead, UK. Grattapaglia D., and M. D. V. Resende, 2011 Genomic selection in forest tree breeding. Tree Genetics & Genomes 7: 241–255. https://doi.org/10.1007/s11295-010-0328-4 Grattapaglia D., 2014 Breeding forest trees by genomic selection: Current progress and the way forward, pp. 651–682 in Genomics of Plant Genetic Resources: Volume 1. Managing, sequencing and mining genetic resources, edited by Tuberosa R., Graner A.,  Frison E. Springer Netherlands, Dordrecht. Grattapaglia D., 2017 Status and perspectives of genomic selection in forest tree breeding, pp. 199–249 in Genomic Selection for Crop Improvement: New Molecular Breeding Strategies for Crop Improvement, edited by Varshney R. K., Roorkiwal M.,  Sorrells M. E. Springer International Publishing,  Cham, Switzerland. Grattapaglia D., O. B. Silva-Junior, R. T. Resende, E. P. Cappa, B. S. de F. Müller, et al., 2018 Quantitative genetics and genomics converge to accelerate forest tree breeding. Frontiers in Plant Science 9. https://doi.org/10.3389/fpls.2018.01693 Guo G., M. S. Lund, Y. Zhang, and G. Su, 2010 Comparison between genomic predictions using daughter yield deviation and conventional estimated breeding value as response  107  variables. Journal of Animal Breeding and Genetics 127: 423–432. https://doi.org/10.1111/j.1439-0388.2010.00878.x Habier D., R. L. Fernando, and J. C. M. Dekkers, 2007 The impact of genetic relationship information on genome-assisted breeding values. Genetics 177: 2389–2397. https://doi.org/10.1534/genetics.107.081190 Habier D., J. Tetens, F.-R. Seefried, P. Lichtner, and G. Thaller, 2010 The impact of genetic relationship information on genomic breeding values in German Holstein cattle. Genetics, selection, evolution: GSE 42: 5. https://doi.org/10.1186/1297-9686-42-5 Habier D., R. L. Fernando, K. Kizilkaya, and D. J. Garrick, 2011 Extension of the bayesian alphabet for genomic selection. BMC Bioinformatics 12: 186. https://doi.org/10.1186/1471-2105-12-186 Habier D., R. L. Fernando, and D. J. Garrick, 2013 Genomic BLUP decoded: A look into the black box of genomic prediction. Genetics 194: 597–607. https://doi.org/10.1534/genetics.113.152207 Hayes B. J., P. J. Bowman, A. J. Chamberlain, and M. E. Goddard, 2009 Invited review: Genomic selection in dairy cattle: Progress and challenges. Journal of Dairy Science 92: 433–443. https://doi.org/10.3168/jds.2008-1646 Heffner E. L., M. E. Sorrells, and J.-L. Jannink, 2009 Genomic selection for crop improvement. Crop Science 49: 1–12. https://doi.org/10.2135/cropsci2008.08.0512  108  Henderson C. R., 1953 Estimation of variance and covariance components. Biometrics 9: 226–252. https://doi.org/10.2307/3001853 Henderson C. R., 1976 A simple method for computing the inverse of a numerator relationship matrix used in prediction of breeding values. Biometrics 32: 69–83. https://doi.org/10.2307/2529339 Henderson C. R., 1984 Applications of linear models in animal breeding. University of Guelph, Guelph, Canada. Heslot N., H.-P. Yang, M. E. Sorrells, and J.-L. Jannink, 2012 Genomic selection in plant breeding: A comparison of models. Crop Science 52: 146–160. https://doi.org/10.2135/cropsci2011.06.0297 Hill W. G., M. E. Goddard, and P. M. Visscher, 2008 Data and theory point to mainly additive genetic variance for complex traits. PLOS Genetics 4: e1000008. https://doi.org/10.1371/journal.pgen.1000008 Huber D. A., T. L. White, and G. R. Hodge, 1992 The efficiency of half-sib, half-diallel and circular mating designs in the estimation of genetic parameters in forestry: A simulation. Forest science 38: 757–776. Isik F., 2014 Genomic selection in forest tree breeding: the concept and an outlook to the future. New Forests 45: 379–401. https://doi.org/10.1007/s11056-014-9422-z  109  Isik F., J. Bartholomé, A. Farjat, E. Chancerel, A. Raffin, et al., 2016 Genomic selection in maritime pine. Plant Science 242: 108–119. https://doi.org/10.1016/j.plantsci.2015.08.006 Isik F., J. Holland, and C. Maltecca, 2017 Genetic Data Analysis for Plant and Animal Breeding. Springer International Publishing,  Cham, Switzerland. Jarquín D., J. Crossa, X. Lacaze, P. Du Cheyron, J. Daucourt, et al., 2014 A reaction norm model for genomic selection using high-dimensional genomic and environmental data. Theoretical and Applied Genetics 127: 595–607. https://doi.org/10.1007/s00122-013-2243-1 Jayawickrama K. J. S., and M. J. Carson, 2000 A breeding strategy for the New Zealand radiata pine breeding cooperative. Silvae Genetica 49: 82–90. Kalinowski S. T., M. L. Taper, and T. C. Marshall, 2007 Revising how the computer program cervus accommodates genotyping error increases success in paternity assignment. Molecular Ecology 16: 1099–1106. https://doi.org/10.1111/j.1365-294X.2007.03089.x Klápště J., M. Lstibůrek, and Y. A. El-Kassaby, 2014 Estimates of genetic parameters and breeding values from western larch open-pollinated families using marker-based relationship. Tree Genetics & Genomes 10: 241–249. https://doi.org/10.1007/s11295-013-0673-1  110  Klápště J., M. Suontama, H. S. Dungey, E. J. Telfer, N. J. Graham, et al., 2018 Effect of hidden relatedness on single-step genetic evaluation in an advanced open-pollinated breeding program. Journal of Heredity 109: 802–810. https://doi.org/10.1093/jhered/esy051 Korecký J., J. Klápště, M. Lstibůrek, J. Kobliha, C. D. Nelson, et al., 2013 Comparison of genetic parameters from marker-based relationship, sibship, and combined models in Scots pine multi-site open-pollinated tests. Tree Genetics & Genomes 9: 1227–1235. https://doi.org/10.1007/s11295-013-0630-z Legarra A., I. Aguilar, and I. Misztal, 2009 A relationship matrix including full pedigree and genomic information. Journal of Dairy Science 92: 4656–4663. https://doi.org/10.3168/jds.2009-2061 Legarra A., and V. Ducrocq, 2012 Computational strategies for national integration of phenotypic, genomic, and pedigree data in a single-step best linear unbiased prediction. Journal of Dairy Science 95: 4629–4645. https://doi.org/10.3168/jds.2011-4982 Lenz P. R. N., J. Beaulieu, S. D. Mansfield, S. Clément, M. Desponts, et al., 2017 Factors affecting the accuracy of genomic selection for growth and wood quality traits in an advanced-breeding population of black spruce (Picea mariana). BMC Genomics 18: 335. https://doi.org/10.1186/s12864-017-3715-5 Li Y., M. Suontama, R. D. Burdon, and H. S. Dungey, 2017 Genotype by environment interactions in forest tree breeding: review of methodology and perspectives on research and application. Tree Genetics & Genomes 13: 60. https://doi.org/10.1007/s11295-017-1144-x  111  Liu H., A. C. Sørensen, T. H. Meuwissen, and P. Berg, 2014 Allele frequency changes due to hitch-hiking in genomic selection programs. Genetics Selection Evolution 46: 8. https://doi.org/10.1186/1297-9686-46-8 Lorenz A. J., S. Chao, F. G. Asoro, E. L. Heffner, T. Hayashi, et al., 2011 Chapter Two - Genomic selection in plant breeding: Knowledge and prospects, pp. 77–123 in Advances in Agronomy, edited by Sparks D. L. Academic Press, , The Netherlands. Lourenco D. a. L., S. Tsuruta, B. O. Fragomeni, Y. Masuda, I. Aguilar, et al., 2015 Genetic evaluation using single-step genomic best linear unbiased predictor in American Angus. J Anim Sci 93: 2653–2662. https://doi.org/10.2527/jas.2014-8836 Lynch M., and B. Walsh, 1998 Genetics and analysis of quantitative traits. Sinauer Associates Incorporated, Sunderland, MA. Marchal A., C. D. Schlichting, R. Gobin, P. Balandier, F. Millier, et al., 2019 Deciphering hybrid larch reaction norms using random regression. G3: Genes, Genomes, Genetics 9: 21–32. https://doi.org/10.1534/g3.118.200697 Meuwissen T. H. E., B. J. Hayes, and M. E. Goddard, 2001 Prediction of total genetic value using genome-wide dense marker maps. Genetics 157: 1819–1829. Meuwissen T. H. E., T. Luan, and J. A. Woolliams, 2011 The unified approach to the use of genomic and pedigree information in genomic evaluations revisited. Journal of Animal Breeding and Genetics 128: 429–439. https://doi.org/10.1111/j.1439-0388.2011.00966.x  112  Misztal I., A. Legarra, and I. Aguilar, 2009 Computing procedures for genetic evaluation including phenotypic, full pedigree, and genomic information. Journal of Dairy Science 92: 4648–4655. https://doi.org/10.3168/jds.2009-2064 Morais Júnior O. P., J. B. Duarte, F. Breseghello, A. S. G. Coelho, O. P. Morais, et al., 2018 Single-step reaction norm models for genomic prediction in multienvironment recurrent selection trials. Crop Science 58: 592–607. https://doi.org/10.2135/cropsci2017.06.0366 Moser G., B. Tier, R. E. Crump, M. S. Khatkar, and H. W. Raadsma, 2009 A comparison of five methods to predict genomic breeding values of dairy bulls from genome-wide SNP markers. Genetics Selection Evolution 41: 56. https://doi.org/10.1186/1297-9686-41-56 Mrode R. A., 2014 Linear models for the prediction of animal breeding values. CAB International, Wallingford, Oxfordshire, UK. Munoz P. R., M. F. R. Resende, D. A. Huber, T. Quesada, M. D. V. Resende, et al., 2014 Genomic relationship matrix for correcting pedigree errors in breeding populations: Impact on genetic parameters and genomic selection accuracy. Crop Science 54: 1115–1123. https://doi.org/10.2135/cropsci2012.12.0673 Namkoong G., 1966 Inbreeding effects on estimation of genetic additive variance. Forest Science 12: 8–13. https://doi.org/10.1093/forestscience/12.1.8 Namkoong G., H. C. Kang, and J. S. Brouard, 1988 Tree breeding: Principles and strategies. Springer Science & Business Media, New York, NY, USA.  113  Neale D. B., and O. Savolainen, 2004 Association genetics of complex traits in conifers. Trends in Plant Science 9: 325–330. https://doi.org/10.1016/j.tplants.2004.05.006 Neves L. G., J. M. Davis, W. B. Barbazuk, and M. Kirst, 2013 Whole-exome targeted sequencing of the uncharacterized pine genome. The Plant Journal 75: 146–156. https://doi.org/10.1111/tpj.12193 Ødegård J., and T. H. Meuwissen, 2012 Estimation of heritability from limited family data using genome-wide identity-by-descent sharing. Genetics Selection Evolution 44: 16. https://doi.org/10.1186/1297-9686-44-16 Ødegård J., and T. H. Meuwissen, 2014 Identity-by-descent genomic selection using selective and sparse genotyping. Genetics Selection Evolution 46: 3. https://doi.org/10.1186/1297-9686-46-3 Oliehoek P. A., J. J. Windig, J. A. M. van Arendonk, and P. Bijma, 2006 Estimating relatedness between individuals in general populations with a focus on their use in conservation programs. Genetics 173: 483–496. https://doi.org/10.1534/genetics.105.049940 Pavy N., F. Gagnon, P. Rigault, S. Blais, A. Deschênes, et al., 2013 Development of high-density SNP genotyping arrays for white spruce (Picea glauca) and transferability to subtropical and nordic congeners. Molecular Ecology Resources 13: 324–336. https://doi.org/10.1111/1755-0998.12062 Pérez P., and G. de los Campos, 2014 Genome-wide regression and prediction with the BGLR statistical package. Genetics 198: 483–495. https://doi.org/10.1534/genetics.114.164442  114  Pérez-Rodríguez P., J. Crossa, K. Bondalapati, G. De Meyer, F. Pita, et al., 2015 A pedigree-based reaction norm model for prediction of cotton yield in multienvironment trials. Crop Science 55: 1143–1151. https://doi.org/10.2135/cropsci2014.08.0577 Perry P. O., 2015 bcv: Cross-validation for the SVD (Bi-Cross-Validation). R package version 1.0.1. Porth I., J. Klápště, O. Skyba, B. S. K. Lai, A. Geraldes, et al., 2013 Populus trichocarpa cell wall chemistry and ultrastructure trait variation, genetic control and genetic correlations. New Phytologist 197: 777–790. https://doi.org/10.1111/nph.12014 Powell J. E., P. M. Visscher, and M. E. Goddard, 2010 Reconciling the analysis of IBD and IBS in complex trait studies. Nature Reviews Genetics 11: 800–805. https://doi.org/10.1038/nrg2865 Ratcliffe B., O. G. El-Dien, E. P. Cappa, I. Porth, J. Klápště, et al., 2017 Single-step BLUP with varying genotyping effort in open-pollinated Picea glauca. G3: Genes, Genomes, Genetics 7: 935–942. https://doi.org/10.1534/g3.116.037895 R-Core-Team, 2014 R Foundation for Statistical Computing, Vienna, Austria. Resende M. F. R., P. Muñoz, J. J. Acosta, G. F. Peter, J. M. Davis, et al., 2012a Accelerating the domestication of trees using genomic selection: Accuracy of prediction models across ages and environments. New Phytologist 193: 617–624. https://doi.org/10.1111/j.1469-8137.2011.03895.x  115  Resende M. D. V., M. F. R. Resende, C. P. Sansaloni, C. D. Petroli, A. A. Missiaggia, et al., 2012b Genomic selection for growth and wood quality in Eucalyptus: Capturing the missing heritability and accelerating breeding for complex traits in forest trees. New Phytologist 116–128. https://doi.org/10.1111/j.1469-8137.2011.04038.x@10.1002/(ISSN)1469-8137(CAT)FeatureIssues(VI)Bioenergytrees Resende M. F. R., P. Muñoz, M. D. V. Resende, D. J. Garrick, R. L. Fernando, et al., 2012c Accuracy of genomic selection methods in a standard data set of loblolly pine (Pinus taeda l.). Genetics 190: 1503–1510. https://doi.org/10.1534/genetics.111.137026 Resende R. T., M. D. V. Resende, F. F. Silva, C. F. Azevedo, E. K. Takahashi, et al., 2017 Regional heritability mapping and genome‐wide association identify loci for complex growth, wood and disease resistance traits in Eucalyptus. New Phytologist 213: 1287–1300. https://doi.org/10.1111/nph.14266 Rigault P., B. Boyle, P. Lepage, J. E. K. Cooke, J. Bousquet, et al., 2011 A white spruce gene catalog for conifer genome analyses. Plant Physiology 157: 14–28. https://doi.org/10.1104/pp.111.179663 Ritland K., 1996 Estimators for pairwise relatedness and individual inbreeding coefficients. Genetics Research 67: 175–185. https://doi.org/10.1017/S0016672300033620 Rutkoski J. E., J. Poland, J.-L. Jannink, and M. E. Sorrells, 2013 Imputation of unordered markers and the impact on genomic selection accuracy. G3: Genes, Genomes, Genetics 3: 427–439. https://doi.org/10.1534/g3.112.005363  116  Shaw P. D., M. Graham, J. Kennedy, I. Milne, and D. F. Marshall, 2014 Helium: visualization of large scale plant pedigrees. BMC Bioinformatics 15: 259. https://doi.org/10.1186/1471-2105-15-259 Shen X., M. Alam, F. Fikse, and L. Rönnegård, 2013 A novel generalized ridge regression method for quantitative genetics. Genetics 193: 1255–1268. https://doi.org/10.1534/genetics.112.146720 Silva J. C. e, G. W. Dutkowski, and N. M. G. Borralho, 2005 Across-site heterogeneity of genetic and environmental variances in the genetic evaluation of Eucalyptus globulus trials for height growth. Annals of Forest Science 62: 183–191. https://doi.org/10.1051/forest:2005010 Spiegelhalter D. J., N. G. Best, B. P. Carlin, and A. V. D. Linde, 2002 Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 64: 583–639. https://doi.org/10.1111/1467-9868.00353 Squillace A. E., 1974 Average genetic correlations among offspring from open-pollinated forest trees. Silvae genetica 23: 149–156. Stackhouse P. W. J., D. Westberg, W. S. Chandler, T. Zhang, and J. M. Hoell, 2017 Prediction Of Worldwide Energy Resource (POWER)---Agroclimatology Methodology---(1.0 o Latitude by 1.0 o Longitude Spatial Resolution). NASA, SSAI/NASA Langley Research Center, USA.  117  Stejskal J., M. Lstibůrek, J. Klápště, J. Čepl, and Y. A. El-Kassaby, 2018 Effect of genomic prediction on response to selection in forest tree breeding. Tree Genetics & Genomes 14: 74. https://doi.org/10.1007/s11295-018-1283-8 Strauss S. H., R. Lande, and G. Namkoong, 1992 Limitations of molecular-marker-aided selection in forest tree breeding. Can. J. For. Res. 22: 1050–1061. https://doi.org/10.1139/x92-140 Tan B., D. Grattapaglia, G. S. Martins, K. Z. Ferreira, B. Sundberg, et al., 2017 Evaluating the accuracy of genomic prediction of growth and wood traits in two Eucalyptus species and their F1 hybrids. BMC Plant Biology 17: 110. https://doi.org/10.1186/s12870-017-1059-6 Teissier M., H. Larroque, and C. Robert-Granié, 2018 Weighted single-step genomic BLUP improves accuracy of genomic breeding values for protein content in French dairy goats: a quantitative trait influenced by a major gene. Genetics Selection Evolution 50: 31. https://doi.org/10.1186/s12711-018-0400-3 Thistlethwaite F. R., B. Ratcliffe, J. Klápště, I. Porth, C. Chen, et al., 2017 Genomic prediction accuracies in space and time for height and wood density of Douglas-fir using exome capture as the genotyping platform. BMC Genomics 18: 930. https://doi.org/10.1186/s12864-017-4258-5 Thistlethwaite F. R., B. Ratcliffe, J. Klápště, I. Porth, C. Chen, et al., 2019 Genomic selection of juvenile height across a single-generational gap in Douglas-fir. Heredity. https://doi.org/10.1038/s41437-018-0172-0  118  Thompson R., 2008 Estimation of quantitative genetic parameters. Proceedings of the Royal Society B: Biological Sciences 275: 679–686. https://doi.org/10.1098/rspb.2007.1417 Vallejo R. L., T. D. Leeds, G. Gao, J. E. Parsons, K. E. Martin, et al., 2017 Genomic selection models double the accuracy of predicted breeding values for bacterial cold water disease resistance compared to a traditional pedigree-based model in rainbow trout aquaculture. Genetics Selection Evolution 49: 17. https://doi.org/10.1186/s12711-017-0293-6 VanRaden P. M., 2008 Efficient methods to compute genomic predictions. Journal of Dairy Science 91: 4414–4423. https://doi.org/10.3168/jds.2007-0980 VanRaden P. M., C. P. Van Tassell, G. R. Wiggans, T. S. Sonstegard, R. D. Schnabel, et al., 2009 Invited Review: Reliability of genomic predictions for North American Holstein bulls. Journal of Dairy Science 92: 16–24. https://doi.org/10.3168/jds.2008-1514 Visscher P. M., S. E. Medland, M. A. R. Ferreira, K. I. Morley, G. Zhu, et al., 2006 Assumption-free estimation of heritability from genome-wide identity-by-descent sharing between full siblings. PLOS Genetics 2: e41. https://doi.org/10.1371/journal.pgen.0020041 Vitezica Z. G., I. Aguilar, I. Misztal, and A. Legarra, 2011 Bias in genomic predictions for populations under selection. Genetics Research 93: 357–366. https://doi.org/10.1017/S001667231100022X Wang J., 2004 Sibship reconstruction from genetic data with typing errors. Genetics 166: 1963–1979. https://doi.org/10.1534/genetics.166.4.1963  119  Wang H., I. Misztal, I. Aguilar, A. Legarra, and W. M. Muir, 2012a Genome-wide association mapping including phenotypes from relatives without genotypes. Genetics Research 94: 73–83. https://doi.org/10.1017/S0016672312000274 Wang T., A. Hamann, D. L. Spittlehouse, and T. Q. Murdock, 2012b ClimateWNA - High-resolution spatial climate data for western North America. Journal of Applied Meteorology and Climatology 51: 16–29. https://doi.org/10.1175/JAMC-D-11-043.1 White T. L., W. T. Adams, and D. B. Neale, 2007 Forest Genetics. CABI, Wallingford, Oxfordshire, UK. Whittaker J. C., R. Thompson, and M. C. Denham, 2000 Marker-assisted selection using ridge regression. Genetics Research 75: 249–252. Zapata-Valenzuela J., F. Isik, C. Maltecca, J. Wegrzyn, D. Neale, et al., 2012 SNP markers trace familial linkages in a cloned population of Pinus taeda—prospects for genomic selection. Tree Genetics & Genomes 8: 1307–1318. https://doi.org/10.1007/s11295-012-0516-5 Zapata-Valenzuela J., R. W. Whetten, D. Neale, S. McKeand, and F. Isik, 2013 Genomic estimated breeding values using genomic relationship matrices in a cloned population of loblolly pine. G3: Genes, Genomes, Genetics 3: 909–916. https://doi.org/10.1534/g3.113.005975 Zhang X., D. Lourenco, I. Aguilar, A. Legarra, and I. Misztal, 2016 Weighting strategies for single-step genomic BLUP: An iterative approach for accurate calculation of GEBV and GWAS. Front Genet 7. https://doi.org/10.3389/fgene.2016.00151  120  Zobel B., and J. Talbert, 1984 Applied forest tree improvement. John Wiley & Sons, New York, NY, USA.   121  Appendices Appendix A  Supplementary information (Chapter 2)   Figure A.1 Comparison of SNP marker effects by kernel density plots (diagonal), scatter plots (lower triangle), and Pearson (?̂?) and Spearman (?̂?) correlations (upper triangle) for three imputation methods (KNN, red; SVD, blue; M60, green) and three analytical approaches (rrBLUP, top; GRR, middle; BCπ, right) estimated for tree height at age 3 year in interior spruce.   122   Figure A.2 Comparison of SNP marker effects by kernel density plots (diagonal), scatter plots (lower triangle), and Pearson (?̂?) and Spearman (?̂?) correlations (upper triangle) for three imputation methods (KNN, red; SVD, blue; M60, green) and three analytical approaches (rrBLUP, top; GRR, middle; BCπ, right) estimated for tree height at age 6 year in interior spruce.    123   Figure A.3 Comparison of SNP marker effects by kernel density plots (diagonal), scatter plots (lower triangle), and Pearson (?̂?) and Spearman (?̂?) correlations (upper triangle) for three imputation methods (KNN, red; SVD, blue; M60, green) and three analytical approaches (rrBLUP, top; GRR, middle; BCπ, right) estimated for tree height at age 10 year in interior spruce.   124   Figure A.4 Comparison of SNP marker effects by kernel density plots (diagonal), scatter plots (lower triangle), and Pearson (?̂?) and Spearman (?̂?) correlations (upper triangle) for three imputation methods (KNN, red; SVD, blue; M60, green) and three analytical approaches (rrBLUP, top; GRR, middle; BCπ, right) estimated for tree height at age 30 year in interior spruce.    125   Figure A.5 Comparison of SNP marker effects by kernel density plots (diagonal), scatter plots (lower triangle), and Pearson (?̂?) and Spearman (?̂?) correlations (upper triangle) for three imputation methods (KNN, red; SVD, blue; M60, green) and three analytical approaches (rrBLUP, top; GRR, middle; BCπ, right) estimated for tree height at age 40 year in interior spruce. 126  Appendix B  Supplementary information (Chapter 3) Table B.1 Combined pedigree-marker based (H) estimates of variance components for white spruce tree height (cm) and wood density (kg/m3), with associated estimated narrow-sense individual tree heritabilities (?̂?𝟐), additive genetic correlations (?̂?), and Akaike information criterion (AIC) model fit statistic for 30 random samples at 25% genotyping effort.  Sample ?̂?𝑎2 (HT) ?̂?𝑏2 (HT) ?̂?𝑒2 (HT) ℎ̂2 (HT) ?̂?𝑎2 (WD) ?̂?𝑏2 (WD) ?̂?𝑒2 (WD) ℎ̂2 (WD) ?̂? AIC 1 0.287 (0.070) 0.046 (0.031) 0.675 (0.066) 0.298 (0.070) 0.514 (0.085) 0.010 (0.008) 0.467 (0.072) 0.524 (0.078) -0.003 (0.142) 2809 2 0.266 (0.065) 0.047 (0.032) 0.694 (0.063) 0.277 (0.065) 0.505 (0.082) 0.010 (0.008) 0.469 (0.070) 0.518 (0.076) -0.085 (0.139) 2969 3 0.267 (0.067) 0.047 (0.032) 0.695 (0.064) 0.278 (0.067) 0.478 (0.080) 0.010 (0.008) 0.499 (0.070) 0.489 (0.075) -0.123 (0.141) 2681 4 0.278 (0.066) 0.047 (0.032) 0.682 (0.063) 0.290 (0.066) 0.465 (0.077) 0.010 (0.008) 0.512 (0.067) 0.476 (0.072) -0.034 (0.140) 2819 5 0.282 (0.068) 0.047 (0.032) 0.679 (0.065) 0.294 (0.068) 0.487 (0.079) 0.010 (0.008) 0.486 (0.069) 0.501 (0.074) -0.050 (0.140) 2960 6 0.302 (0.072) 0.047 (0.032) 0.660 (0.067) 0.314 (0.071) 0.501 (0.084) 0.010 (0.008) 0.479 (0.072) 0.511 (0.077) 0.001 (0.142) 2675 7 0.263 (0.065) 0.047 (0.032) 0.697 (0.063) 0.274 (0.066) 0.507 (0.083) 0.009 (0.008) 0.472 (0.071) 0.518 (0.076) -0.013 (0.143) 2670 8 0.279 (0.067) 0.046 (0.031) 0.681 (0.064) 0.290 (0.067) 0.446 (0.074) 0.010 (0.008) 0.523 (0.065) 0.460 (0.070) -0.052 (0.140) 3010 9 0.245 (0.064) 0.046 (0.031) 0.716 (0.062) 0.255 (0.064) 0.448 (0.076) 0.010 (0.008) 0.528 (0.067) 0.459 (0.072) -0.014 (0.149) 2992 10 0.285 (0.068) 0.046 (0.031) 0.677 (0.065) 0.296 (0.068) 0.468 (0.078) 0.009 (0.008) 0.509 (0.068) 0.479 (0.073) -0.113 (0.138) 3030 11 0.285 (0.069) 0.045 (0.031) 0.678 (0.066) 0.296 (0.069) 0.502 (0.081) 0.010 (0.008) 0.475 (0.070) 0.514 (0.075) -0.044 (0.140) 2925 12 0.269 (0.065) 0.046 (0.031) 0.690 (0.062) 0.281 (0.065) 0.520 (0.082) 0.009 (0.008) 0.456 (0.070) 0.533 (0.076) 0.002 (0.138) 2891 13 0.262 (0.065) 0.046 (0.031) 0.697 (0.063) 0.273 (0.065) 0.523 (0.085) 0.010 (0.008) 0.456 (0.072) 0.534 (0.078) -0.121 (0.139) 3019 14 0.294 (0.068) 0.048 (0.032) 0.665 (0.064) 0.307 (0.067) 0.469 (0.076) 0.009 (0.008) 0.503 (0.066) 0.482 (0.071) -0.049 (0.136) 2956 15 0.307 (0.070) 0.047 (0.032) 0.654 (0.066) 0.320 (0.070) 0.556 (0.088) 0.009 (0.008) 0.430 (0.074) 0.564 (0.080) -0.025 (0.135) 2907 16 0.269 (0.067) 0.046 (0.031) 0.692 (0.064) 0.280 (0.067) 0.486 (0.080) 0.009 (0.008) 0.492 (0.069) 0.497 (0.074) 0.001 (0.144) 3013 17 0.273 (0.067) 0.046 (0.031) 0.687 (0.064) 0.284 (0.067) 0.520 (0.083) 0.009 (0.008) 0.457 (0.071) 0.532 (0.077) -0.068 (0.139) 3013 18 0.254 (0.065) 0.046 (0.031) 0.708 (0.063) 0.264 (0.065) 0.488 (0.079) 0.011 (0.009) 0.483 (0.068) 0.503 (0.074) 0.001 (0.144) 2892 19 0.269 (0.068) 0.045 (0.030) 0.694 (0.065) 0.279 (0.068) 0.473 (0.078) 0.009 (0.008) 0.500 (0.067) 0.486 (0.073) -0.087 (0.141) 3019 20 0.243 (0.064) 0.046 (0.031) 0.718 (0.063) 0.253 (0.065) 0.471 (0.078) 0.009 (0.008) 0.503 (0.068) 0.484 (0.073) -0.092 (0.145) 2808 21 0.261 (0.064) 0.046 (0.031) 0.697 (0.062) 0.272 (0.064) 0.467 (0.078) 0.009 (0.008) 0.507 (0.067) 0.479 (0.073) -0.048 (0.142) 3013 22 0.276 (0.067) 0.046 (0.031) 0.685 (0.064) 0.287 (0.067) 0.488 (0.079) 0.009 (0.008) 0.488 (0.068) 0.500 (0.073) -0.011 (0.14) 3015 23 0.263 (0.065) 0.045 (0.031) 0.698 (0.063) 0.273 (0.065) 0.492 (0.081) 0.009 (0.008) 0.484 (0.069) 0.504 (0.075) 0.027 (0.144) 2921 24 0.250 (0.066) 0.046 (0.031) 0.712 (0.064) 0.259 (0.066) 0.459 (0.076) 0.010 (0.008) 0.517 (0.067) 0.471 (0.072) -0.017 (0.148) 2928 25 0.261 (0.066) 0.046 (0.031) 0.700 (0.064) 0.272 (0.066) 0.462 (0.078) 0.010 (0.008) 0.516 (0.068) 0.472 (0.073) -0.019 (0.146) 2876 26 0.280 (0.067) 0.046 (0.031) 0.680 (0.064) 0.292 (0.067) 0.527 (0.084) 0.009 (0.007) 0.450 (0.071) 0.539 (0.077) -0.112 (0.135) 2798 27 0.269 (0.067) 0.046 (0.031) 0.692 (0.064) 0.280 (0.067) 0.483 (0.078) 0.009 (0.008) 0.490 (0.068) 0.497 (0.073) -0.135 (0.138) 3022 28 0.277 (0.068) 0.046 (0.031) 0.685 (0.064) 0.288 (0.067) 0.477 (0.080) 0.010 (0.008) 0.504 (0.069) 0.486 (0.074) 0.057 (0.145) 2922 29 0.277 (0.067) 0.046 (0.031) 0.684 (0.064) 0.289 (0.067) 0.513 (0.083) 0.009 (0.008) 0.468 (0.071) 0.523 (0.077) -0.068 (0.139) 2696 30 0.229 (0.062) 0.047 (0.032) 0.731 (0.061) 0.238 (0.062) 0.515 (0.084) 0.009 (0.007) 0.465 (0.072) 0.525 (0.077) 0.003 (0.149) 2962 Mean 0.271 (0.067) 0.046 (0.031) 0.692 (0.064) 0.282 (0.067) 0.487 (0.080) 0.010 (0.008) 0.486 (0.069) 0.502 (0.075) -0.043 (0.141) 2907 SD 0.017 (0.002) 0.001 (0.000) 0.017 (0.001) 0.018 (0.002) 0.026 (0.003) 0.001 (0.000) 0.024 (0.002) 0.026 (0.002) 0.049 (0.004) 114    127  Table B.2 Combined pedigree-marker based (H) estimates of variance components for white spruce tree height (cm) and wood density (kg/m3), with associated estimated narrow-sense individual tree heritabilities (?̂?𝟐), additive genetic correlations (?̂?), and Akaike information criterion (AIC) model fit statistic for 30 random samples at 50% genotyping effort. Sample ?̂?𝑎2 (HT) ?̂?𝑏2 (HT) ?̂?𝑒2 (HT) ℎ̂2 (HT) ?̂?𝑎2 (WD) ?̂?𝑏2 (WD) ?̂?𝑒2 (WD) ℎ̂2 (WD) ?̂? AIC 1 0.198 (0.051) 0.046 (0.032) 0.760 (0.052) 0.207 (0.052) 0.380 (0.062) 0.009 (0.008) 0.588 (0.055) 0.393 (0.059) -0.061 (0.144) 2648 2 0.225 (0.054) 0.046 (0.031) 0.733 (0.053) 0.235 (0.054) 0.401 (0.063) 0.009 (0.008) 0.565 (0.054) 0.415 (0.059) -0.032 (0.137) 2723 3 0.276 (0.059) 0.046 (0.031) 0.683 (0.056) 0.288 (0.059) 0.480 (0.070) 0.009 (0.008) 0.491 (0.058) 0.494 (0.064) -0.146 (0.122) 2614 4 0.220 (0.054) 0.045  (0.03) 0.739 (0.054) 0.229 (0.055) 0.389 (0.062) 0.010 (0.008) 0.580 (0.054) 0.401 (0.058) -0.051 (0.140) 2627 5 0.214 (0.054) 0.046 (0.031) 0.746 (0.054) 0.223 (0.054) 0.371 (0.062) 0.010 (0.008) 0.601 (0.055) 0.382 (0.058) 0.049 (0.147) 2792 6 0.230 (0.055) 0.046 (0.031) 0.729 (0.055) 0.240 (0.056) 0.441 (0.066) 0.010 (0.008) 0.529 (0.056) 0.455 (0.061) -0.074 (0.133) 2772 7 0.221 (0.054) 0.045 (0.031) 0.737 (0.054) 0.231 (0.054) 0.368 (0.061) 0.009 (0.008) 0.601 (0.054) 0.380 (0.058) 0.030 (0.143) 2796 8 0.241 (0.058) 0.047 (0.032) 0.719 (0.056) 0.251 (0.058) 0.400 (0.062) 0.010 (0.008) 0.566 (0.054) 0.414 (0.059) -0.081 (0.135) 2771 9 0.242 (0.057) 0.045 (0.030) 0.718 (0.055) 0.252 (0.057) 0.379 (0.062) 0.009 (0.008) 0.589 (0.055) 0.392 (0.059) -0.031 (0.139) 2598 10 0.220 (0.053) 0.045 (0.030) 0.738 (0.053) 0.229 (0.053) 0.404 (0.064) 0.009 (0.008) 0.565 (0.055) 0.417 (0.059) -0.103 (0.135) 2608 11 0.246 (0.057) 0.047 (0.032) 0.712 (0.055) 0.257 (0.056) 0.399 (0.064) 0.011 (0.009) 0.573 (0.056) 0.411 (0.060) 0.010 (0.137) 2638 12 0.237 (0.056) 0.046 (0.031) 0.721 (0.054) 0.248 (0.056) 0.378 (0.062) 0.010 (0.008) 0.590 (0.054) 0.391 (0.058) -0.088 (0.136) 2623 13 0.229 (0.056) 0.046 (0.031) 0.730 (0.055) 0.239 (0.056) 0.390 (0.061) 0.009 (0.007) 0.577 (0.053) 0.403 (0.057) -0.023 (0.138) 2783 14 0.185 (0.052) 0.045 (0.031) 0.776 (0.054) 0.193 (0.053) 0.417 (0.064) 0.009 (0.007) 0.554 (0.055) 0.429 (0.059) 0.066 (0.149) 2651 15 0.234 (0.056) 0.045 (0.031) 0.725 (0.055) 0.244 (0.056) 0.409 (0.065) 0.010 (0.008) 0.561 (0.056) 0.422 (0.060) -0.076 (0.135) 2610 16 0.230 (0.054) 0.046 (0.031) 0.727 (0.054) 0.240 (0.055) 0.324 (0.057) 0.010 (0.008) 0.641 (0.053) 0.336 (0.055) -0.092 (0.142) 2636 17 0.275 (0.060) 0.046 (0.031) 0.684 (0.057) 0.287 (0.059) 0.364 (0.061) 0.010 (0.008) 0.607 (0.055) 0.375 (0.058) -0.071 (0.134) 2621 18 0.192 (0.051) 0.045 (0.031) 0.765 (0.052) 0.201 (0.051) 0.426 (0.067) 0.008 (0.007) 0.548 (0.057) 0.438 (0.062) -0.186 (0.139) 2756 19 0.229 (0.056) 0.046 (0.031) 0.730 (0.055) 0.239 (0.056) 0.423 (0.065) 0.009 (0.007) 0.547 (0.056) 0.436 (0.060) 0.014 (0.138) 2626 20 0.233 (0.055) 0.046 (0.031) 0.724 (0.054) 0.244 (0.055) 0.410 (0.065) 0.010 (0.008) 0.560 (0.056) 0.423 (0.060) -0.077 (0.135) 2646 21 0.225 (0.056) 0.046 (0.031) 0.734 (0.055) 0.235 (0.056) 0.451 (0.068) 0.009 (0.008) 0.521 (0.057) 0.464 (0.062) -0.236 (0.130) 2595 22 0.222 (0.055) 0.048 (0.033) 0.737 (0.054) 0.232 (0.055) 0.442 (0.067) 0.010 (0.008) 0.533 (0.057) 0.453 (0.062) 0.003 (0.138) 2605 23 0.251 (0.057) 0.046 (0.031) 0.708 (0.055) 0.262 (0.057) 0.387 (0.064) 0.009 (0.008) 0.585 (0.056) 0.398 (0.060) -0.045 (0.137) 2636 24 0.228 (0.055) 0.046 (0.031) 0.731 (0.054) 0.238 (0.055) 0.415 (0.064) 0.011 (0.009) 0.555 (0.055) 0.428 (0.059) -0.053 (0.136) 2741 25 0.238 (0.057) 0.045 (0.03) 0.724 (0.055) 0.247 (0.056) 0.377 (0.061) 0.010 (0.008) 0.592 (0.054) 0.389 (0.058) 0.060 (0.141) 2625 26 0.249 (0.059) 0.046 (0.031) 0.713 (0.057) 0.259 (0.059) 0.420 (0.064) 0.010 (0.009) 0.546 (0.055) 0.434 (0.059) -0.068 (0.132) 2596 27 0.227 (0.056) 0.047 (0.032) 0.733 (0.055) 0.236 (0.056) 0.412 (0.064) 0.009 (0.008) 0.556 (0.055) 0.426 (0.060) -0.029 (0.139) 2773 28 0.213 (0.053) 0.044 (0.03) 0.744 (0.053) 0.223 (0.054) 0.403 (0.064) 0.009 (0.008) 0.568 (0.056) 0.415 (0.060) -0.108 (0.139) 2648 29 0.192 (0.051) 0.047 (0.032) 0.766 (0.053) 0.200 (0.052) 0.412 (0.064) 0.010 (0.008) 0.558 (0.055) 0.425 (0.059) -0.085 (0.142) 2779 30 0.208 (0.052) 0.046 (0.031) 0.751 (0.053) 0.217 (0.053) 0.418 (0.066) 0.008 (0.007) 0.557 (0.057) 0.429 (0.061) 0.098 (0.143) 2650 Mean 0.228 (0.055) 0.046 (0.031) 0.731 (0.054) 0.237 (0.055) 0.403 (0.064) 0.009 (0.008) 0.567 (0.055) 0.416 (0.060) -0.050 (0.138) 2673 SD 0.021 (0.002) 0.001 (0.001) 0.021 (0.001) 0.022 (0.002) 0.030 (0.003) 0.001 (0.000) 0.029 (0.001) 0.031 (0.002) 0.073 (0.005) 72    128  Table B.3 Combined pedigree-marker based (H) estimates of variance components for white spruce tree height (cm) and wood density (kg/m3), with associated estimated narrow-sense individual tree heritabilities (?̂?𝟐), additive genetic correlations (?̂?), and Akaike information criterion (AIC) model fit statistic for 30 random samples at 75% genotyping effort. Sample ?̂?𝑎2 (HT) ?̂?𝑏2 (HT) ?̂?𝑒2 (HT) ℎ̂2 (HT) ?̂?𝑎2 (WD) ?̂?𝑏2 (WD) ?̂?𝑒2 (WD) ℎ̂2 (WD) ?̂? AIC 1 0.208 (0.047) 0.046 (0.031) 0.750 (0.047) 0.217 (0.047) 0.364 (0.054) 0.009 (0.008) 0.603 (0.046) 0.376 (0.050) 0.001 (0.130) 2349 2 0.193 (0.046) 0.045 (0.031) 0.765 (0.047) 0.202 (0.046) 0.323 (0.052) 0.010 (0.008) 0.647 (0.047) 0.333 (0.049) -0.008 (0.139) 2434 3 0.240 (0.049) 0.045 (0.030) 0.718 (0.048) 0.251 (0.049) 0.309 (0.050) 0.010 (0.008) 0.655 (0.046) 0.321 (0.048) 0.021 (0.130) 2371 4 0.187 (0.045) 0.046 (0.031) 0.770 (0.047) 0.195 (0.045) 0.340 (0.053) 0.008 (0.007) 0.628 (0.047) 0.351 (0.050) -0.082 (0.136) 2378 5 0.218 (0.047) 0.045 (0.031) 0.738 (0.047) 0.228 (0.047) 0.362 (0.054) 0.010 (0.008) 0.607 (0.047) 0.374 (0.050) -0.072 (0.126) 2362 6 0.174 (0.044) 0.045 (0.030) 0.785 (0.047) 0.181 (0.045) 0.318 (0.051) 0.010 (0.008) 0.648 (0.046) 0.329 (0.049) -0.029 (0.143) 2394 7 0.201 (0.045) 0.046 (0.031) 0.755 (0.046) 0.210 (0.045) 0.331 (0.051) 0.010 (0.008) 0.633 (0.045) 0.343 (0.048) 0.008 (0.132) 2364 8 0.195 (0.047) 0.046 (0.031) 0.765 (0.048) 0.203 (0.047) 0.347 (0.052) 0.010 (0.008) 0.619 (0.046) 0.359 (0.049) -0.021 (0.135) 2369 9 0.199 (0.046) 0.047 (0.032) 0.759 (0.047) 0.207 (0.046) 0.339 (0.052) 0.010 (0.008) 0.625 (0.046) 0.352 (0.049) -0.038 (0.133) 2370 10 0.206 (0.046) 0.046 (0.031) 0.752 (0.047) 0.215 (0.046) 0.322 (0.052) 0.009 (0.007) 0.646 (0.047) 0.333 (0.049) 0.095 (0.137) 2380 11 0.187 (0.044) 0.047 (0.032) 0.769 (0.046) 0.196 (0.044) 0.326 (0.051) 0.009 (0.008) 0.640 (0.046) 0.337 (0.048) -0.057 (0.135) 2428 12 0.183 (0.045) 0.045 (0.031) 0.775 (0.047) 0.191 (0.045) 0.320 (0.051) 0.010 (0.008) 0.647 (0.046) 0.331 (0.048) 0.037 (0.142) 2392 13 0.193 (0.046) 0.045 (0.031) 0.765 (0.047) 0.201 (0.046) 0.371 (0.055) 0.010 (0.008) 0.595 (0.047) 0.384 (0.051) -0.123 (0.130) 2368 14 0.196 (0.046) 0.045 (0.031) 0.762 (0.048) 0.204 (0.047) 0.340 (0.052) 0.010 (0.008) 0.625 (0.046) 0.352 (0.049) -0.093 (0.133) 2363 15 0.183 (0.046) 0.046 (0.031) 0.776 (0.047) 0.191 (0.046) 0.338 (0.053) 0.009 (0.008) 0.631 (0.047) 0.349 (0.049) -0.007 (0.140) 2380 16 0.216 (0.048) 0.045 (0.031) 0.742 (0.048) 0.226 (0.048) 0.361 (0.054) 0.010 (0.008) 0.605 (0.047) 0.374 (0.050) -0.044 (0.128) 2356 17 0.184 (0.045) 0.046 (0.031) 0.774 (0.047) 0.192 (0.046) 0.319 (0.051) 0.010 (0.008) 0.644 (0.046) 0.332 (0.049) -0.141 (0.137) 2387 18 0.202 (0.047) 0.046 (0.031) 0.757 (0.048) 0.211 (0.047) 0.352 (0.054) 0.009 (0.008) 0.615 (0.047) 0.364 (0.050) -0.07  (0.132) 2370 19 0.163 (0.043) 0.045 (0.031) 0.795 (0.046) 0.170 (0.044) 0.357 (0.054) 0.008 (0.007) 0.610 (0.047) 0.369 (0.050) -0.018 (0.142) 2429 20 0.169 (0.043) 0.045 (0.031) 0.789 (0.046) 0.176 (0.044) 0.369 (0.055) 0.010 (0.008) 0.599 (0.047) 0.381 (0.051) -0.102 (0.137) 2370 21 0.184 (0.045) 0.046 (0.031) 0.774 (0.047) 0.192 (0.046) 0.364 (0.055) 0.010 (0.008) 0.604 (0.047) 0.376 (0.051) -0.070 (0.135) 2364 22 0.174 (0.044) 0.045 (0.031) 0.783 (0.046) 0.182 (0.044) 0.358 (0.054) 0.008 (0.007) 0.611 (0.047) 0.369 (0.050) 0.005 (0.139) 2356 23 0.200 (0.046) 0.045 (0.031) 0.756 (0.047) 0.209 (0.046) 0.308 (0.051) 0.008 (0.007) 0.659 (0.046) 0.319 (0.048) -0.074 (0.136) 2387 24 0.212 (0.048) 0.047 (0.032) 0.746 (0.048) 0.222 (0.048) 0.338 (0.052) 0.009 (0.008) 0.626 (0.046) 0.350 (0.049) -0.032 (0.131) 2414 25 0.200 (0.046) 0.048 (0.033) 0.757 (0.047) 0.209 (0.047) 0.340 (0.053) 0.009 (0.007) 0.626 (0.047) 0.352 (0.049) -0.144 (0.130) 2382 26 0.211 (0.047) 0.047 (0.032) 0.745 (0.047) 0.221 (0.047) 0.342 (0.052) 0.009 (0.008) 0.625 (0.046) 0.353 (0.049) -0.059 (0.129) 2364 27 0.201 (0.046) 0.047 (0.032) 0.756 (0.047) 0.210 (0.046) 0.335 (0.052) 0.009 (0.008) 0.631 (0.046) 0.347 (0.049) -0.010 (0.134) 2364 28 0.190 (0.046) 0.046 (0.031) 0.767 (0.047) 0.199 (0.046) 0.302 (0.050) 0.010 (0.008) 0.663 (0.046) 0.313 (0.048) -0.134 (0.136) 2396 29 0.210 (0.047) 0.045 (0.031) 0.748 (0.047) 0.220 (0.047) 0.318 (0.050) 0.009 (0.008) 0.647 (0.046) 0.330 (0.048) 0.014 (0.134) 2368 30 0.201 (0.046) 0.045 (0.031) 0.756 (0.047) 0.210 (0.046) 0.352 (0.054) 0.009 (0.008) 0.617 (0.047) 0.364 (0.050) -0.029 (0.132) 2378 Mean 0.196 (0.046) 0.046 (0.031) 0.762 (0.047) 0.205 (0.046) 0.339 (0.052) 0.009 (0.008) 0.628 (0.046) 0.351 (0.049) -0.043 (0.134) 2380 SD 0.016 (0.001) 0.001 (0.000) 0.016 (0.000) 0.017 (0.001) 0.019 (0.002) 0.001 (0.000) 0.019 (0.001) 0.020 (0.001) 0.056 (0.004) 22   129   Figure B.6 Heat map of the pedigree-based relationship estimator (A).   130   Figure B.7 Heat map of the combined pedigree-marker based relationship estimator (H) at 25% genotyping effort for a single random sample.   131   Figure B.8 Heat map of the combined pedigree-marker based relationship estimator (H) at 50% genotyping effort for a single random sample.   132   Figure B.9 Heat map of the combined pedigree-marker based relationship estimator (H) at 75% genotyping effort for a single random sample.   133   Figure B.10 Heat map of the combined pedigree-marker based relationship estimator (H) at 100% genotyping effort. 134  Appendix C  Supplementary information (Chapter 4)  Figure C.11 Three generation coastal Douglas-fir contemporary pedigree containing the inter-generation validation population of study. Circles represent single individuals, octagons represent full-sibling families. Red colours are the unrelated (assumed) base population of wild plus tree selections (P0), green colours are the forward selection progenitors (F1) of the Jordan (purple, F2-2) and Bigtree (yellow, F2-3) validation populations (F2) respectively. The pedigree was produced using Helium software (Shaw et al. 2014).    135   Figure C.12 Scatterplot of mean absolute marker effects from the cross-validation analysis.   136   Figure C.13 Scatterplots of mean rank of predicted trees with Spearman rank correlation coefficients from the cross-validation analyses for all predictions (𝝆𝟏) and top 40% of predictions (𝝆𝟐) within environments Adam (A), Lost (B), Fleet (C), Jordan (D), and Bigtree (E) comparing the base model (M1, x-axis) and fully specified model (M5, y-axis). Deviation from the diagonal line implies a rank change. 137  Table C.4 Number of parent-parent (F1-F1 / F2-F2) or parent-grandparent (F1-F2) (above diagonal) and families (below diagonal) in common among the environments for study population 1 (SP1). Generation  F1 F2-2 F2-3   Lost Jordan Northarm Jordan Northarm Menzies Sproat Squamish Tansky White Jordan Northarm Bigtree Roach F1 Lost  13 13 13 13 78 78 78 78 78 13 13 12 12 Adam 165 13 13 13 13 78 78 78 78 78 13 13 12 12 Fleet 165 13 13 13 13 78 78 78 78 78 13 13 12 12 Sechelt 165 13 13 13 13 78 78 78 78 78 13 13 12 12 Eldred 165 13 13 13 13 78 78 78 78 78 13 13 12 12 Menzies 165 13 13 13 13  78 78 78 78 13 13 12 12 Sproat 165 13 13 13 13 165  78 78 78 13 13 12 12 Squamish 165 13 13 13 13 165 165  78 78 13 13 12 12 Tansky 165 13 13 13 13 165 165 165  78 13 13 12 12 White 165 13 13 13 13 165 165 165 165  13 13 12 12 F2-2 Jordan   74  74       74   Northarm  79  79       79    F2-3 Bigtree              73 Roach             104   Table C.5 Number of F2 parents within the F1 environments.   F2-2 F2-3   Jordan Northarm Bigtree Roach F1 Lost 4 4 5 5 Adam 6 6 4 4 Fleet 7 7 8 8 Sechelt 8 8 9 9 Eldred     Menzies     Sproat     Squamish     Tansky     White         138  Table C.6 List of monthly (from January/01 to December/12) climatic variables obtained from ClimateBC v5.51 (Wang et al. 2012b) used in the genomic analyses. Primary monthly variables Tave01 – Tave12  mean temperatures (°C) TMX01 – TMX12  maximum mean temperatures (°C) TMN01 – TMN12  minimum mean temperatures (°C) PPT01 – PPT12  precipitation (mm) RAD01 – RAD12  solar radiation (MJ m‐2 d‐1)  Derived monthly variables DD_0_01 – DD_0_12  degree-days below 0°C DD5_01 – DD5_12  degree-days above 5°C DD_18_01 – DD_18_12  degree-days below 18°C DD18_01 – DD18_12  degree-days above 18°C NFFD01 – NFFD12  number of frost-free days PAS01 – PAS12  precipitation as snow (mm) Eref01 – Eref12  Hargreaves reference evaporation (mm) CMD01 – CMD12  Hargreaves climatic moisture deficit (mm)  

Cite

Citation Scheme:

        

Citations by CSL (citeproc-js)

Usage Statistics

Share

Embed

Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                        
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            src="{[{embed.src}]}"
                            data-item="{[{embed.item}]}"
                            data-collection="{[{embed.collection}]}"
                            data-metadata="{[{embed.showMetadata}]}"
                            data-width="{[{embed.width}]}"
                            async >
                            </script>
                            </div>
                        
                    
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.24.1-0377775/manifest

Comment

Related Items