UBC Theses and Dissertations


Contributions of intrinsic and environmental factors to early life DNA methylation Goodman, Sarah Jessica 2019

CONTRIBUTIONS OF INTRINSIC AND ENVIRONMENTAL FACTORS TO EARLY LIFE DNA METHYLATION

by

Sarah Jessica Goodman

B.Sc., Queen's University, 2012

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES

(Medical Genetics)

THE UNIVERSITY OF BRITISH COLUMBIA

(Vancouver)

September 2019

© Sarah Jessica Goodman, 2019

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the dissertation entitled:

Contributions of intrinsic and environmental factors to early life DNA methylation

submitted by Sarah J. Goodman in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Medical Genetics

Examining Committee:
Dr. Michael S. Kobor, Supervisor
Dr. Wendy P. Robinson, Supervisory Committee Member
Dr. Robert McMaster, Supervisory Committee Member
Dr. Jan M. Friedman, University Examiner
Dr. Ruth Eckstein Grunau, University Examiner

Additional Supervisory Committee Members:
Dr. Stuart E. Turvey, Supervisory Committee Member

Abstract

Many early experiences and exposures are known to cause health disparities later in life, suggesting that they are somehow ‘biologically embedded’. The mechanisms underlying ‘biological embedding’ are currently not well understood. However, emerging evidence has implicated a potential contribution from epigenetic modifications, such as DNA methylation (DNAm), which has been shown to associate with early life experiences of low socioeconomic status. Additional related experiences have also been connected with DNAm, including parental stress, childhood maltreatment or deprivation, and maternal mental health problems during the perinatal period.
The relationship between early life experience and epigenetics is complicated by internal psychological and physiological factors, as well as genetic variation, which can account for 20% to 80% of inter-individual epigenetic differences. As well, stress reactivity and temperament are predictive of how a child may interact with his or her environment and can affect how such exposures are internalized. Thus, the main objectives of my dissertation were to elucidate the relationships between childhood environment, DNAm, genetic variation, and behaviour, and to understand how these systems influence one another. Using matched DNAm profiles from blood and buccal tissue from a cross-sectional cohort of Canadian children, I uncovered tissue-specific and tissue-shared DNAm signatures in order to gauge the utility of accessible tissues in epigenetic association studies. In a longitudinal cohort, I tested the hypothesis that indicators of children’s early internal, biological and behavioural responses to stressful challenges are linked to stable patterns of DNAm later in life; I found relationships between biobehavioural response propensities in early life and patterns of DNAm in the DLX5 and IGF2 genes at ages 15 and 18. Finally, I examined the epigenetic correlates of familial socioeconomic status in matched childhood peripheral blood and dried neonatal blood spot samples, allowing me to assess the DNAm pattern over time. Together, these findings build upon our current understanding of the role of DNAm in biological embedding and, more broadly, of the field of social epigenetics.

Lay Summary

Childhood is a formative time, during which experiences and environments can alter developmental outcomes. Familial socioeconomic status (SES) in early life, for example, is a powerful determinant of health across the life course. However, this relationship is complicated by individual factors including genetic variation and stress reactivity.
The biological mechanism by which all these components interact and ultimately “get under the skin” to impact long-term health and wellbeing is not well understood. However, emerging evidence has implicated a contribution of DNA methylation (DNAm), a molecular marker that regulates gene activity. Early life exposures including low SES, parental stress, and childhood maltreatment or deprivation have all been linked to unique DNAm patterns. The research presented here supports and broadens our understanding of how both internal and external factors associate with DNAm, with the goal of informing targeted, multilevel interventions capable of abating societal inequalities in health.

Preface

All data chapters in this thesis (Chapters 3-5) are presented in manuscript format, as they are currently published (Chapters 3 and 4) or in preparation for submission (Chapter 5).

Portions of Chapter 1 (Introduction) were copied or adapted from a previously published review article:
• Jones MJ, Goodman SJ, Kobor MS. (2015) DNA methylation and healthy human aging. Aging Cell.
Specifically, subsection 1.6, titled “DNA methylation over the life course”, contains excerpts of this review article, which was jointly written and edited by Dr. M. Jones and myself in Dr. M. Kobor’s lab. The remainder of this chapter is original and unpublished, and written by myself.

Chapter 2 is original, unpublished work summarizing methodology, which I wrote.

A version of Chapter 3 has been published as:
• Islam SA*, Goodman SJ*, MacIsaac JL, Obradović J, Barr RG, Boyce WT, Kobor MS. (2019) Integration of DNA methylation patterns and genetic variation in human pediatric tissues help inform EWAS design and interpretation. Epigenetics & Chromatin. *Authors contributed equally.
• Islam, SA. (2018). Tissue-specific investigations of DNA methylation variation in human neurobiological diseases (PhD Dissertation). University of British Columbia.
Retrieved from https://open.library.ubc.ca/collections/ubctheses/24/items/1.0373200

This study was conducted in Dr. M. Kobor’s lab. Dr. S. Islam and I contributed equally to this work with regard to study design, statistical analysis, and manuscript writing. Drs. J. MacIsaac and L. McEwen assisted in running DNA methylation arrays. The primary investigators of the GECKO and C3ARE cohorts are Dr. W.T. Boyce and Dr. R. Barr, respectively.

A version of Chapter 4 has been published as:
• Goodman SJ, Roubinov DS, Bush NR, Park M, Farré P, Emberly E, Hertzman C, Essex MJ, Kobor MS*, Boyce WT*. (2017) Children’s biobehavioral reactivity to challenge predicts DNA methylation in adolescence and emerging adulthood. Developmental Science. *Authors contributed equally.
Dr. M. Essex is the principal investigator of the cohort. Dr. D. Roubinov, Dr. M. Kobor, Dr. M. Essex, Dr. W.T. Boyce and I all contributed intellectually to study design and hypothesis generation. L. Lam, S. Nuemann and Dr. M. Park ran the DNA methylation arrays. I ran the statistical analysis and performed the pyrosequencing in Dr. M. Kobor’s lab. I wrote the manuscript with Dr. D. Roubinov. Dr. M. Kobor and Dr. W.T. Boyce made editorial contributions.

Chapter 5 is original work conducted in Dr. M. Kobor’s lab, in collaboration with Dr. W.T. Boyce, the principal investigator of the cohort. I ran the DNA methylation and genotyping arrays with assistance from Dr. J. MacIsaac and Dr. G. Leung. I also ran all statistical analyses and wrote the manuscript.

Chapter 6 (Conclusions) is original, unpublished work, which I wrote.

All research articles noted above are available under the terms of the Creative Commons Attribution License (CC BY), which permits distribution and reproduction in any medium, given proper citation.
Ethics approval for sample collection and epigenetic analysis was obtained separately for each cohort (GECKO, C3ARE, and WSFW) from the University of British Columbia, Children and Women’s Hospital Ethics Board (Certificates H07-01317 (C3ARE) and H07-02773 (GECKO, WSFW)). Written informed consent was obtained from a parent or legal guardian, and assent was obtained from each child before study participation.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Acknowledgements
Dedication
Chapter 1: Introduction
  1.1 Developmental origins of health and disease through developmental plasticity
  1.2 The impact of early life socioeconomic status and social environment on health and development
  1.3 Differential susceptibility to early life environments
    1.3.1 Gene by environment interactions in the context of early life environments
  1.4 Epigenetics
  1.5 DNA methylation
    1.5.1 DNA methylation involvement in gene regulation
    1.5.2 Genetic background contributes to DNA methylation variation
    1.5.3 Tissue specific DNA methylation and cellular differentiation
  1.6 DNA methylation over the life course
  1.7 Evidence of DNA methylation alterations by SES-related early environmental exposures
    1.7.1 Animal studies of epigenetics and early exposures
    1.7.2 Human studies of epigenetics and early exposures
  1.8 Synthesis and thesis objectives
Chapter 2: Common methods and materials
  2.1 Introduction to epigenome-wide association studies
  2.2 Study samples
    2.2.1 Wisconsin Study of Family and Work
    2.2.2 Gene Expression Collaborative Kids Only
  2.3 DNA extraction and sodium bisulfite conversion
  2.4 Illumina DNA methylation microarrays
    2.4.1 Illumina Infinium HumanMethylation27 BeadChip
    2.4.2 Illumina Infinium HumanMethylation450 BeadChip
  2.5 DNA methylation data processing pipeline
    2.5.1 Preprocessing and quality control
    2.5.2 Normalization
    2.5.3 Cell-type correction of 450K DNA methylation data
    2.5.4 Batch correction
  2.6 Data analysis
    2.6.1 Data reduction
    2.6.2 Statistical analysis of differential DNAm
Chapter 3: Integration of DNA methylation patterns and genetic variation in human pediatric tissues help inform EWAS design and interpretation
  3.1 Introduction
  3.2 Materials and methods
    3.2.1 Study cohorts and tissue samples
    3.2.2 DNA isolation and DNA methylation arrays
    3.2.3 DNA methylation array data quality control and normalization
    3.2.4 Cell-type correction of DNA methylation data
    3.2.5 Assessment of cross-tissue correlation, tissue-specific variability and tissue-specific differences in DNA methylation data
    3.2.6 SNP genotyping arrays
    3.2.7 Preprocessing of SNP genotyping data and PCA analyses for genetic ancestry
    3.2.8 Cis-mQTL analyses
    3.2.9 Representation of identified sites in published EWAS findings
  3.3 Results
    3.3.1 Study cohorts and DNAm data processing
    3.3.2 BEC DNAm had greater inter-individual variability than PBMC DNAm
    3.3.3 Variable CpGs were more highly correlated between tissues
    3.3.4 Genetic variation contributed to tissue concordance
    3.3.5 Tissue-specific differential DNAm was consistent across cohorts
    3.3.6 cis-mQTLs were present in previously published EWAS findings
  3.4 Discussion
Chapter 4: Children’s biobehavioural reactivity to challenge predicts DNA methylation in adolescence and emerging adulthood
  4.1 Introduction
  4.2 Materials and methods
    4.2.1 Study sample
    4.2.2 Temperament, ANS, and mental health measures
    4.2.3 DNA methylation
    4.2.4 Statistical analysis
    4.2.5 Pyrosequencing experiments
  4.3 Results
    4.3.1 Principal components analyses of biobehavioural reactivity
    4.3.2 Associations between biobehavioural reactivity factors and DNAm at age 15
    4.3.3 Examination of sex differences in correlations between biobehavioural reactivity factors and DNA methylation at age 15
    4.3.4 Persistence of associations between BID and DNAm at age 18
    4.3.5 Longitudinal stability in DNA methylation
    4.3.6 Pyrosequencing experiments verified associations between Biobehavioural Inhibition/Disinhibition and DLX5 DNA methylation
  4.4 Discussion
    4.4.1 Limitations
Chapter 5: Assessing early life patterns of DNA methylation with concurrent objective and subjective socioeconomic status
  5.1 Introduction
  5.2 Methods and materials
    5.2.1 Study sample
    5.2.2 SES measures
    5.2.3 DNA methylation data
    5.2.4 Genotyping data generation and use in inferring population structure
    5.2.5 Statistical analysis of DNAm in PBMCs
    5.2.6 Gene ontology
    5.2.7 Identification of differentially methylated regions
    5.2.8 mQTL analysis
    5.2.9 Statistical analysis in dried blood spots
  5.3 Results
    5.3.1 Objective and subjective SES measures in the GECKO cohort
    5.3.2 Subjective SES and objective SES tended to associate with different CpGs
    5.3.3 Loci associated with objective SES contained more underlying DMRs than those associated with subjective SES
    5.3.4 Few SES-associated CpGs were also driven by genetic variation
    5.3.5 Assessing DNAm in neonatal dried blood spots
  5.4 Discussion
    5.4.1 Limitations
Chapter 6: Conclusions
  6.1 Summary of dissertation results
  6.2 Limitations
  6.3 Future directions
References
Appendices
  Appendix A Supplementary materials for chapter 3
    A.1 Supplementary figures
    A.2 Supplementary tables
  Appendix B Supplementary materials for chapter 4
    B.1 Supplementary figures
    B.2 Supplementary tables
  Appendix C Supplementary materials for chapter 5
    C.1 Supplementary figures
    C.2 Supplementary tables

List of Tables

Table 3.1 Sample characteristics for C3ARE and GECKO cohorts
Table 4.1 Mental health, temperament and ANS traits collected over seven years and included in the analysis
Table 5.1 Sample characteristics for full GECKO cohorts and subset presented in this study
Table 5.2 CpGs associated with both PosCan and CompSES (|∆b| > 5% and p-value < 5 × 10^-4)

List of Figures

Figure 3.1 BEC DNAm was consistently more variable than PBMC DNAm at the genome-wide and probe-wise level.
Figure 3.2 Variable CpGs were more highly correlated between tissues.
Figure 3.3 Independently validated cis-mQTL were more likely to be shared across tissues than expected by chance.
Figure 3.4 Tissue-specific differential DNA methylation was consistent across cohorts.
Figure 3.5 Overlap and representation of identified CpGs in previously published pediatric EWAS findings.
Figure 4.1 Results of principal component analysis revealed biobehavioural reactivity as biologically driven composite measure.
Figure 4.2 Schematic of DLX5, IGF2, MYO16 and PRUNE2 genes, which each contained more than 2 high or medium confidence CpGs significantly associated with Biobehavioural Inhibition/Disinhibition.
Figure 4.3 Correlations between DNA methylation at age 15 and Biobehavioural Inhibition/Disinhibition in males and females.
Figure 4.4 DLX5 and IGF2 DNA methylation remained significantly associated with Biobehavioural Inhibition/Disinhibition at age 18.
Figure 4.5 CpG stability across three years and pyrosequencing of DLX5 gene to verify DNA methylation findings.
Figure 5.1 Relations between sample characteristics and SES measures in full GECKO cohort were maintained in subset used in DNAm analysis.
Figure 5.2 Genome-wide linear regressions showed limited numbers of CpGs associated with each SES variable.
Figure 5.3 Limited enrichment of genomic features found in CpGs associated with CompSES or PosCan.
Figure 5.4 CompSES found to be associated with more DMRs than PosCan.
Figure 5.5 Little overlap between CpGs associated with PosCan or CompSES and mQTL-associated CpGs.
Figure 5.6 CpGs associated with SES in both PBMCs and DBSs showed similar effect sizes across time points.

List of Abbreviations

27K array – Illumina Infinium HumanMethylation27 BeadChip array
450K array – Illumina Infinium HumanMethylation450 BeadChip array
|∆b| – Absolute delta beta
ANS – Autonomic nervous system
BEC – Buccal epithelial cells
BID – Biobehavioural inhibition/disinhibition
bp – Base pair
C – Celsius
C3ARE – Cleaning, Carrying, Changing, Attending, Reading and Expressing
CBC/Diff – Complete blood count with differential
CGI – CpG island
CompSES – Composite socioeconomic status
CpG – Cytosine-guanine dinucleotide
DBS – Dried blood spot
DNMT – DNA methyltransferase
DMR – Differentially methylated region
DNA – Deoxyribonucleic acid
DNAm – DNA methylation
DOHaD – Developmental Origins of Health and Disease
ESC – Embryonic stem cell
EWAS – Epigenome-wide association study
FDR – False discovery rate
GECKO – Gene Expression Collaborative Kids Only
GO – Gene ontology
GR – Glucocorticoid receptor
GSR – Gene score resample
GWAS – Genome-wide association study
GxE – Gene by environment interactions
HPA – Hypothalamic-pituitary-adrenal
kb – Kilobase
mQTL – Methylation quantitative trait locus
mQTLs – Methylation quantitative trait loci
mRNA – Messenger ribonucleic acid
NIMH – National Institute of Mental Health
ORA – Over-representation analysis
PBMC – Peripheral blood mononuclear cell
PCR – Polymerase chain reaction
PosCan – Position in Canada
RNA – Ribonucleic acid
SES – Socioeconomic status
SNP – Single nucleotide polymorphism
TET – Ten-eleven translocation methylcytosine dioxygenase
TSS – Transcriptional start site
UCSC – University of California, Santa Cruz Genome Browser
VNTR – Variable number tandem repeat
WSFW – Wisconsin Study of Family and Work

Acknowledgements

I am very grateful to have been mentored by many tremendous researchers who educated and supported me throughout these past seven years. To my supervisor, Dr.
Michael Kobor, thank you for your guidance and encouragement on all of my research endeavours and for allowing me to be a part of your amazing team. To my collaborators, Dr. Tom Boyce, Dr. Marilyn Essex, Dr. Nicki Bush and Dr. Danielle Roubinov, I am so fortunate to have had the opportunity to work with you and learn from you. This research would not have been possible without your passion and knowledge. Many thanks to my advisory committee, Dr. Wendy Robinson, Dr. Rob McMaster, and Dr. Stuart Turvey, for your thoughtful feedback and insight, and for calming my nerves during committee meetings.

To my incredible lab mates, friends, and an amazing cohort of MedGen students, thank you for being my problem solvers, confidantes, and my Vancouver family (especially Maria, Magda, and Gian). Special thanks to Cheryl Bishop for your helping hand, endless support, and enthusiasm.

Last but not least, I would like to thank my parents and sisters for never allowing me to go more than a few days without a phone call to remind me that I wasn’t really that far from home and you were expecting me back soon.

Dedication

To my inspiring and selfless mother.

Chapter 1: Introduction

1.1 Developmental origins of health and disease through developmental plasticity

Early epidemiological studies showing geographical relationships between rates of infant mortality and adult coronary heart disease collectively gave rise to the developmental origins of health and disease (DOHaD) hypothesis; this paradigm posits that environmental factors during fetal development and infancy contribute to chronic disease susceptibility (1). Fetal mortality in the UK, commonly attributed to low birthweight, was geographically linked to rates of cardiovascular disease; this indicated that prenatal environment, namely nutrition, contributed to later cardiovascular health (1).
Findings from Norway, Finland and the US showed that poor living conditions, such as recurrent exposure to infection, and resultant infant mortality rates were also geographically associated with the occurrence of mortality from heart disease (1-3). These findings established the contribution of prenatal and neonatal environment to health outcomes across the lifespan and founded the field of DOHaD research, which has since expanded to include the study of childhood or “early life” environment (4).

In early development, a critical window of time exists during which biological systems are altered by, and thereby adapt to, environmental exposures and influence long-term health; this phenomenon of environmental programming is a central tenet of the DOHaD approach (5). Animal studies of short nutritional alterations provide overwhelming evidence of developmental programming and consequential life-long health outcomes (5). Overfeeding infant baboons leads to obesity that arises only in adulthood (6). In rat studies, both pre- and postnatal nutrition have been shown to influence adult health (5). Male rats born to mothers fed a low protein diet during pregnancy but suckled by mothers with a normal diet have significantly shorter life spans than controls (5); however, another study showed that rats were permanently smaller after being suckled by mothers fed a low protein diet, but were normal sized when mothers were fed the same diet during pregnancy (7). In sum, animal studies have revealed that prenatal nutrition alters brain development, learning, and behaviour in addition to metabolic health outcomes.

In humans, longitudinal findings from adults exposed in utero to the Dutch Hunger Winter include increased risk of obesity, abnormal lipid profiles, cardiovascular disease, and affective disorders (8).
These outcomes differ based on the timing of exposure; those exposed only during early gestation had normal birth weights, while those exposed at late gestation had reduced birth weights (8). As such, windows of environmental sensitivity or critical periods are thought to differ for different biological systems. Exposures beyond nutrition that have been frequently studied and reproducibly linked to later health outcomes include BPA, mercury, air pollution and particulate matter, lead, and organophosphates (9). These and other environmental exposures have been associated with a wide range of noncommunicable diseases including obesity, diabetes, hypertension, asthma and allergy, immune and autoimmune diseases, neurodevelopmental and neurodegenerative diseases, infertility, cancers, and mental health disorders (10-12). Less tangible exposures, such as various causes of and exposures to stress, have also been examined in relation to developmental outcomes (13). Offspring of pregnant mothers exposed to stress show altered endocrine, immune and neurobehavioural outcomes as compared to controls, as well as decreased birth weight and decreased gestational age at birth (14-17). Therefore, during fetal development and early life, the body and brain are vulnerable to the surrounding environment, both positive and negative, which makes this time frame especially pertinent to human health and disease research.

1.2 The impact of early life socioeconomic status and social environment on health and development

Socioeconomic status (SES) is a complex, multifaceted measure of the economic or social standing or class of an individual or group. It is often measured as a combination of education, income and occupation and, importantly, is one of the most potent environmental determinants of human health, with well-established influences on morbidity and mortality across industrialized populations (18).
Across the life span, adverse health outcomes increase gradually from the most to the least privileged populations, creating an observable gradient (18). Adverse health outcomes associated with low SES include increased incidence of heart disease, asthma, obesity and diabetes in a graded fashion, even when access to health care, ethnicity and health risk behaviours are all taken into account (19,20). Within these associations, SES in childhood has proven to be as strong a predictor of health outcomes as adult SES (21-23). As such, SES researchers have argued for the importance of the “life course approach”, which examines SES across the life course to better understand the relationship between SES and variations in health (21,24). Most simply put, early life SES strongly predicts SES in adult life, with advantage or disadvantage often accumulating within one’s life (25). Beyond this association, early life SES can impact many aspects of development, including physical, social-emotional and cognitive development, and concurrent health outcomes, which may beget later health outcomes (19,26,27).

One prominent theory addressing why the outcomes of early-life exposures sometimes manifest later in life was proposed by Gluckman and Hanson (2004) (28). They posit that in childhood, alterations in response to the environment may be grouped into three categories: 1) immediately beneficial and adaptive responses to certain environments; 2) the negative outcomes or damage from exposure to certain environments; 3) predictive adaptive responses which “confer advantage by establishing metabolic physiology appropriate for the postnatal environment predicted to exist” (28). In the last scenario, such alterations set the child or infant on a specific developmental trajectory that would only be adaptive if environments later in life mirror those in early life.
Such developmental programming can lead to both beneficial and detrimental ‘latent effects’, outcomes which are not concurrent with the environment but rather manifest later in life.

Environments altered by SES affect all aspects of life. Children from low-SES families, in comparison to more advantaged peers, are more likely to live in poor-quality housing with inadequate water, heating, and sewage (29). These homes tend to be noisier, more crowded, and unsafe (29,30). The same is true of the schools these children attend and the neighborhoods in which they reside (29). Healthy food is more likely to be scarce in low-SES households, due to high costs and limited availability in local markets (29,31). The neighborhoods, in general, may be less safe, with higher crime rates (29). These differences represent persistent misfortunes of children growing up in low-SES families. Unfortunately, maltreatment and disadvantage co-occur more often than expected by chance alone, meaning that children who are abused or neglected may also be more likely to fall below the population average with regards to material resources (19,29). However, a sizable minority of individuals show resiliency to their surroundings, maintaining their mental and physical health despite adversity (32,33). There are psychological adaptations that may act as buffers; these include the child’s perspective of his/her SES, his/her temperament and cognitive abilities (32). Family and school settings can also offset misfortune by providing a warm, supportive environment for children to thrive (32). Finally, early interventions may diminish the risks associated with low-SES environments in childhood, highlighting the necessity of understanding the mechanisms by which SES gradients influence health (34).
Although the evidence of early environment shaping one’s health is overwhelming, the molecular mechanisms through which environmental exposures are “biologically embedded”, thereby leaving lasting biological residues, remain largely unknown. Nevertheless, numerous studies have linked SES-related health problems to dysregulation of inflammatory pathways (35-38). More specifically, low SES consistently associates with a wide range of pro-inflammatory profiles, including elevated levels of circulating C-reactive protein and interleukin-6, as well as increasing cortisol output, which is indicative of accumulative “wear and tear” of stress-response pathways due to chronic stress (35,39). These findings have also been supported by analyses of genomic expression profiles of individuals from low and high SES backgrounds (40,41). Resulting immunological differences have the capacity to influence a wide range of physical and mental health outcomes. Mild, chronic inflammation, such as that associated with low SES, contributes to the pathogenesis of atherosclerosis, obesity, cancer, chronic obstructive pulmonary disease, asthma and neurodegenerative disease (42).

Inflammation linked to SES can also be associated with mental health outcomes. Individuals with major depressive disorder consistently display a pro-inflammatory profile, especially in cytokines (36,43). Cytokines are small proteins involved in cell signaling to mediate the innate immune response. Cytokines may contribute to depression by influencing hypothalamic-pituitary-adrenal (HPA) axis function, as well as by crossing the blood-brain barrier and affecting neurotransmitter activity (43). Dysregulation of inflammatory pathways undoubtedly couples early life SES to later disease occurrence; however, other biological mechanisms, such as genetic variation and temperament, i.e. individual, inborn differences in one’s reactions to environmental stimuli, can also contribute to these outcomes.
1.3 Differential susceptibility to early life environments

Not all children raised in impoverished or adverse environments develop the common negative health outcomes observed at population levels. Similarly, some children will not meet the expected developmental trajectories despite being afforded privileged upbringings. Individual factors, which can be summarized as differential susceptibility to the environment, can moderate or buffer early life experiences of SES and therefore alter health outcomes (32,44,45). Many factors contribute to one’s overall sensitivity or resiliency to environmental exposures (33). Differential susceptibility is closely related to levels of autonomic nervous system (ANS) reactivity in response to stressful events, i.e. the physiological measures of the “fight or flight” response. Temperament, which can be defined as characteristics that emerge shortly after birth and govern a child’s responses to new or different situations, is another common plasticity or susceptibility marker (44). Temperament has been shown to buffer the effects of parenting style and quality of child-care on psychological outcomes such as externalizing behaviour (44). ANS reactivity and the construct of temperament are closely connected, as a child who has a more fearful, inhibited temperament also typically displays stronger ANS reactivity (46,47). Furthermore, individuals’ perceptions of the environment also mediate their outcomes following negative experiences (48,49); subjective factors (e.g., feelings of loneliness) can bear a greater influence on health outcomes, such as lowered transcription of anti-inflammatory genes, elevated systolic blood pressure, and mortality risk in the elderly, than objective measures of social isolation (50-52). Thus, an individual’s internal milieu can provide insight into neurodevelopment and potential health outcomes in the face of adversity and stress (44).
1.3.1 Gene by environment interactions in the context of early life environments

Gene by environment (GxE) interactions, whereby outcomes of environmental exposures arise in an allele-dependent manner, can underlie differences in environmental sensitivity and resiliency. Numerous genes, many of which are related to stress-response pathways and/or neuronal circuitry, have been shown to interact with early life environments to alter health outcomes. For example, the gene encoding the serotonin transporter, SLC6A4, contains an upstream variable number tandem repeat (VNTR), 5-HTTLPR, with two common alleles. The likelihood of developing mental health problems following adverse experiences in childhood, such as maltreatment, stress, and sexual abuse, has been consistently associated with 5-HTTLPR genotype (53). Reported associated health outcomes include but are not limited to depressive symptoms, anxiety, PTSD, and suicidal ideation (53). Other genes often cited as displaying GxE interactions with early life adversity include BDNF (brain-derived neurotrophic factor), COMT (catechol-O-methyltransferase), DRD4 (dopamine receptor D4), NR3C1 (glucocorticoid receptor), and MAOA (monoamine oxidase A) (54-57). Work from the Bucharest Early Intervention Project uncovered a protective effect of the COMT met allele (val158met polymorphism) against depressive symptoms in children who remained longer in institutions (58). Not surprisingly, the functions of many GxE genes are critical to neurodevelopment and thus contribute to inter-individual variation in temperament and reactivity (59). Therefore, the relationship between genotype, reactivity/temperament and differential susceptibility to early life adversity is complex but critical to predicting which specific children are most at risk.
1.4 Epigenetics

Epigenetics is a relatively nascent and expanding field investigating the nexus of genome and environment; more specifically, epigenetics is commonly defined as modifications to DNA or DNA packaging that are potentially transmissible to daughter cells and that do not involve changes to the DNA sequence (60,61). It encompasses the idea of “cellular memory”, or molecular changes that persist after the original stimulus has ceased (61). This definition is quite close to the original use of “epigenetics”, as described by Conrad Waddington in 1942: “the branch of biology which studies the causal interactions between genes and their products, which bring the phenotype into being” (62). We now know that epigenetic mechanisms are responsible for the structural organization and packaging of DNA; by altering DNA accessibility, gene expression programs may be orchestrated to drive cellular processes. However, it has been argued that temporal and tissue-specific differences in gene expression are more appropriately described as “transcriptional regulation” and do not fall into the realm of epigenetics (61). Epigenetic marks include acetylation, phosphorylation, and methylation of histone proteins, as well as other post-translational modifications; covalently bound modifications to DNA, such as DNA methylation (DNAm); non-canonical histone variants; and non-coding RNAs.

1.5 DNA methylation

DNAm refers to the covalent attachment of a methyl group primarily to the 5-carbon of cytosine bases, found most often at cytosine-phosphate-guanine dinucleotides (CpGs) (63). DNAm occurring at non-CG dinucleotides is relatively infrequent, found predominantly in fetal tissue, specifically fetal brain tissue, and in low but measurable levels in adult neuronal tissue (64,65). The process of DNAm is catalyzed by enzymes called DNA methyltransferases (DNMT), which transfer methyl groups from a methyl donor, S-adenosylmethionine, to unmethylated DNA.
There are three functional DNMTs, which are highly conserved across mammals: DNMT3A, DNMT3B, and DNMT1. DNMT3A and DNMT3B are de novo methyltransferases and function through the interaction of a regulatory factor, DNMT3L, with specific histone modifications, followed by recruitment of the methyltransferase (66). DNMT1 is the maintenance methyltransferase responsible for DNAm at hemi-methylated sites during cell division (67). CpG dinucleotides are palindromic and therefore both DNA strands are methylated at CpGs (63). As such, during cell division, if the parental strand is methylated, it can be used as a template for methylation of the newly synthesized strand by DNMT1, thus replicating the DNAm state in a semiconservative fashion (68). However, human cancer cells lacking DNMT1 were able to maintain methylation following division; this suggests de novo methyltransferases play a role in maintenance of DNAm. DNMT3A and DNMT3B are both required for normal fetal development, while loss of DNMT1 is embryonic lethal (68,69). DNA demethylation is currently believed to occur both passively and actively; the former can take place when DNAm is not maintained following DNA replication and subsequently becomes “diluted” out (70). Active demethylation is likely a function of the ten-eleven translocation methylcytosine dioxygenase (TET) family of enzymes, which catalyze the first step of the demethylation process through the oxidation of 5-methylcytosine (71). This reaction produces hydroxymethylation, which is both an intermediary step in demethylation and an important epigenetic marker involved in development and brain function (65,72). To that end, hydroxymethylation marks are most prevalent in progenitor cells and neurons (72); however, the exact function of this mark and its role in gene regulation and chromatin modification are not well understood (72).
The distribution of DNAm along the genome is non-random and highly dependent on the underlying genomic landscape, partly because CpGs themselves are not evenly distributed across the genome (63). CpGs are under-represented in the genome but are enriched in some regions, which are referred to as CpG islands (CGIs). Transcription factor binding is thought to play a role in both CpG density in the DNA sequence and DNAm status (73). With regards to the latter, transcription factors, in part, establish genome-wide DNAm states. This phenomenon is especially evident in development, during which binding sites of cell-specific transcription factors exhibit little to no DNAm (74). Transcription factors bound to DNA, especially those with CGs in their binding motifs, likely prevent methyl groups from being added to the DNA sequence. CpGs found outside of gene regulatory regions are less likely to be bound by transcription factors (73,75). Evolutionarily, these sites were more likely to be methylated and therefore less stable and more prone to mutation (73). As such, transcription factor binding is believed to have contributed to the current patterning of CpG density in the genome and the existence of CGIs. CGIs are often associated with gene promoters and are commonly defined as regions spanning 200 base pairs or longer, with at least 50% guanine or cytosine content and an observed/expected ratio of CpGs of at least 0.6 (76,77). Despite the high density of CpG dinucleotides, CGIs are less likely to be methylated compared to non-island CpGs. One exception to this pattern is CpGs found at repetitive elements, such as Alu and LINE-1, which are highly methylated in order to repress transposition (78,79). Regions immediately surrounding CGIs are referred to as “shores”, followed by “shelves”. Specifically, CpG shores are found within 2 kb outside of the CGI boundaries and CpG shelves are between 2 and 4 kb outside of the CGI boundaries (80,81).
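The commonly used CGI definition quoted above (length of at least 200 bp, GC content of at least 50%, observed/expected CpG ratio of at least 0.6) can be illustrated with a short sketch. This is only an illustration under stated assumptions, not a method used in this thesis: the function names are hypothetical, the thresholds are simply those given in the text, and the expected CpG count is assumed to follow the usual convention of (number of C × number of G) / sequence length.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio.

    Assumption: expected CpGs = (count of C * count of G) / length.
    Returns 0.0 when the sequence has no C or no G.
    """
    seq = seq.upper()
    n_c, n_g = seq.count("C"), seq.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0
    expected = n_c * n_g / len(seq)
    observed = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    return observed / expected


def meets_cgi_criteria(seq: str, min_length: int = 200,
                       min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Check a candidate region against the three thresholds in the text."""
    return (len(seq) >= min_length
            and gc_fraction(seq) >= min_gc
            and cpg_obs_exp(seq) >= min_obs_exp)


# Hypothetical toy sequences, not real genomic regions.
cpg_rich = "CG" * 100   # 200 bp, all G/C, very CpG dense
at_rich = "AT" * 100    # 200 bp, no G/C at all
print(meets_cgi_criteria(cpg_rich), meets_cgi_criteria(at_rich))
```

In practice, island annotations are taken from genome browser tracks or array manifests rather than recomputed per region; the sketch only makes the three thresholds concrete.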
Beyond CGI status, CpGs can be described in terms of their location relative to genes and other genomic features, such as transcription start sites (TSS) or gene enhancers. These classifications prove useful for in silico analyses of DNAm patterns and inferring downstream effects.

1.5.1 DNA methylation involvement in gene regulation

Nearly 70% of gene promoters are near a CGI, and this proximity results in a strong relationship between promoter DNAm and gene expression (76,82). Generally, high levels of DNAm at promoter-associated CGIs are associated with low levels of expression, and vice versa (83). However, the association of DNAm and gene activity is complex. A negative correlation between DNAm levels and gene expression is not upheld when comparing expression and DNAm levels for a specific gene across individuals (84-87). As well, recent findings suggest the causation may be reversed, with gene expression levels influencing levels of DNAm rather than the converse (85). Furthermore, DNAm marks found outside promoters show less consistent associations with gene expression (85). For example, in CGI shores, higher levels of DNAm have been associated with higher levels of gene expression (63,88).

Detailed molecular analysis has provided a likely framework through which DNAm can inhibit gene expression via the recruitment of methylated DNA-binding proteins, which in turn recruit chromatin modifying enzymes to package the DNA such that it is inaccessible to transcription machinery (63). While this relationship between DNAm and gene transcription is supported by a large body of research, whether DNAm is a causal mark or acts in maintaining other repressive signals is currently unknown. Recently, emerging evidence suggested, via the artificial induction of DNAm at gene promoters, that DNAm alone cannot repress transcription (89).
However, reanalysis of these data suggested that forcible DNAm does repress transcription and is associated with reduced H3K4me3, a histone mark associated with active transcription (89,90). Such conflicting findings are representative of our current understanding of the specific role of DNAm in gene regulation.

1.5.2 Genetic background contributes to DNA methylation variation

The underlying DNA sequence can influence DNAm. Reminiscent of work examining expression quantitative trait loci, sites of linkage between allelic variation and gene expression, multiple studies have identified SNPs associated with DNAm at specific CpGs, known as methylation quantitative trait loci (mQTLs) (83,85). This relationship was originally uncovered by investigating differences in DNAm between ethnicities, as well as higher than expected heritability of DNAm patterns (83,91,92). CpGs under genetic influence are estimated to make up anywhere from 20 to 80% of all CpGs (83,93,94). One mechanism through which genetic variation alters DNAm is a SNP disrupting the specific sequence recognized by a DNA binding protein responsible for creating boundaries between methylated and unmethylated genomic regions (85). As variation at this SNP may be associated with methylation patterns at certain CpGs, it would constitute an mQTL. With respect to the interplay between SNPs, gene expression and DNAm, SNPs have been shown to most commonly alter methylation and expression independently, as opposed to a SNP altering gene expression first, followed by gene expression affecting DNAm, or a SNP altering DNAm first, followed by DNAm affecting gene expression (85). These interactions may be mediated by transcription factor binding, which alters gene expression and DNAm simultaneously. Regardless of the mechanism by which these relationships function, it is crucial to consider genetic variation in the analysis of DNAm.
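The mQTL concept described above, a SNP whose genotype is associated with DNAm at a given CpG, can be made concrete with a minimal sketch. It assumes an additive coding of genotype (0, 1 or 2 copies of the minor allele) and DNAm expressed as a beta value between 0 and 1; the function name and the toy data are hypothetical and not drawn from any cohort in this thesis. Real mQTL analyses would additionally adjust for covariates such as cell type composition and ancestry, and correct for multiple testing.

```python
from math import sqrt


def fit_mqtl(dosages, betas):
    """Least-squares fit of DNAm beta values on allele dosage.

    Returns (slope, intercept, pearson_r). The slope is the estimated
    change in DNAm per additional copy of the minor allele.
    """
    if len(dosages) != len(betas) or len(dosages) < 3:
        raise ValueError("need two equal-length inputs with at least 3 samples")
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(betas) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in betas)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, betas))
    if sxx == 0 or syy == 0:
        raise ValueError("no variation in genotype or in DNAm")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical toy data: six individuals, one SNP, one CpG.
dosages = [0, 0, 1, 1, 2, 2]
betas = [0.20, 0.20, 0.40, 0.40, 0.60, 0.60]
slope, intercept, r = fit_mqtl(dosages, betas)
print(f"slope={slope:.2f} intercept={intercept:.2f} r={r:.2f}")
```

A clearly non-zero slope that survives correction for multiple testing is what would flag the SNP-CpG pair as a candidate mQTL.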
1.5.3 Tissue specific DNA methylation and cellular differentiation

A well understood role of DNAm occurs during embryonic and fetal development, wherein it regulates cell differentiation, conferring tissue-specific identity that is stable and mitotically heritable (67). As such, DNAm displays tissue- and cell-specific patterns (95-97). In fact, tissue of origin is one of the largest determinants of DNAm variation in healthy individuals, accounting for greater variation than genetic background (96-99). Much of what is known about the role of epigenetic mechanisms during embryogenesis has been elucidated through the study of mouse embryonic stem cells (ESCs). During differentiation, DNAm is required to silence pluripotency factors. For example, the promoters of genes associated with pluripotency, such as Oct4 and Nanog, are hypermethylated and silenced (67). As well, DNAm acts to upregulate markers associated with germ-layer specificity; in ESCs lacking DNAm, differentiation is inhibited.

Further knowledge of the role of DNAm in tissue specificity has been gleaned from studying the human hematopoietic system. During hematopoietic stem cell differentiation, DNAm gains and losses at specific regions of the genome are believed to maintain the differentiation marks required for cellular identity (100). Myeloerythroid and lymphoid lineages display lineage-specific epigenetic differences, which further develop as the cells mature (101-103).

1.6 DNA methylation over the life course

DNAm is most dynamic during fetal development, when epigenetic patterns play an integral part in the complex processes of embryogenesis (64,104). Highlighting the dynamic nature of DNAm, levels in neonatal blood are lower than those observed at most other points during the lifespan (105,106). After birth, average DNAm levels increase in blood throughout the first year of life (105,107).
These changes occur preferentially at CpG island shores and shelves, enhancers, and promoters lacking CpG islands (108). In both blood and buccal epithelial cells, DNAm between monozygotic twins has been shown to become more divergent in the first year of life (105,109). After the first year, median global DNAm levels are relatively stable, with the exception of certain regions that frequently gain DNAm (105). The first few years of life have been extensively studied; however, there are relatively few reports of changes in DNAm throughout later childhood and adolescence. Studies that examined this period of human development have reported that DNAm levels increase rapidly and then stabilize by adulthood in both brain and blood (65,110).

DNAm with respect to the aging process, from adulthood to advanced age, has also been extensively studied. Overall levels of DNAm remain stable in blood, while inter-individual variability increases over that time (111,112). Post adulthood, many studies have found a mean decrease in blood DNAm with increasing age (112-119). These changes are less likely to occur in promoters and more likely to be observed in enhancers (118). Regions that gain DNAm with age are enriched for CGIs, while non-CGIs tend to lose DNAm with age (112,115,116,119,120). In sum, sites that show overall low DNAm, such as promoter-associated CGIs, tend to increase methylation with age, while those with high DNAm, such as intergenic non-island CpGs, tend to lose methylation with age. As most CpGs are located outside of CGIs and are highly methylated, this translates to an overall loss of DNAm in later life as well as a tendency for DNAm levels to shift toward the global mean DNAm with increased age (76,77,82,112,117,121). Taken together, these findings suggest that DNAm shows reduced stringency in maintenance over the lifespan.

Despite a gain in DNAm in early life and gradual loss in later life across the genome, these changes are not symmetrical.
They differ in two major ways: (i) the rate of change is much higher in early life than later life, and (ii) the genomic locations of the changes are quite different. In early life, DNAm is gained globally, but more at island shores and intergenic regions, while in later life, DNAm is lost globally, but still gained at islands and shores (108,110,122). Overall, most tissues fit with the pattern of increase in average DNAm early in life, with a gradual decrease later in life (123,124). For example, many studies have shown that brain regions follow this pattern, showing rapid changes in DNAm in the early life period and then slowing gradually over the lifespan (65,116,125). In addition to general patterns of DNAm change with age, DNAm levels at specific sites in the genome, which are highly associated with age, can be used to predict chronological age (112,116,117,119,126). These sites underlie the concept of the ‘epigenetic clock’. These epigenetic clock sites have been found both within a specific tissue and across tissues, and have been shown to be much more concordant across tissues than gene expression changes across tissues with age (112,116,117,119).

1.7 Evidence of DNA methylation alterations by SES-related early environmental exposures

Complementing its role in cell type specification, DNAm is also emerging as a mechanism of cellular memory of past exposures (61). Although DNAm is mitotically stable, transmitted from mother to daughter cells, it may change across an individual’s lifetime following specific environmental exposures (127). The paradigm of environmentally-induced DNAm alterations has recently been applied to early life SES (45,128). In the following sections I will explore the current findings in both mouse models of early life adversity and human research.
1.7.1 Animal studies of epigenetics and early exposures

Numerous animal studies have established causal links between stress exposures, altered epigenetic marks, and subsequent health or behavioural outcomes. While SES itself cannot be modelled in animal studies, chronic or acute stress exposure represents one of the most powerful aspects of SES-related early life experiences. To that end, low levels of maternal licking/grooming and arch-back nursing behaviour have been associated with increased DNAm at the glucocorticoid receptor (GR) gene promoter in cross-fostered offspring (129). Outcomes in the offspring also included decreased expression of the GR gene, Nr3c1, and increased HPA axis responsivity and anxious behaviour. Moreover, the molecular signatures and stress response phenotype were reversible in adulthood with central infusion (into brain ventricles) of a histone deacetylase inhibitor. The study of maternal care in rodents has also provided evidence of DNAm moderating the expression of estrogen receptor alpha, Esr1, in the hypothalamus (130). Offspring raised by low licking/grooming dams had decreased expression of the Esr1 gene and reduced neural activation; this caused mature female offspring to provide less licking/grooming to their own offspring. Gene expression was associated with differential DNAm in the promoter region of Esr1 (130). Together these findings provide strong causal evidence of the role of epigenetics in translating early life experiences into long-term physical outcomes.

Beyond nursing behaviour, many different animal models of early life stress, such as social isolation, exposure to aggressive social interactions, and environmental manipulations, have been used to investigate the role of epigenetics as a biological mechanism of the long-term effects
(36). Early postnatal stress in mice leads to a range of behavioural outcomes, including increased stress reactivity and addiction, altered gene expression profiles, and DNAm changes, depending on the stress paradigm and its duration (36). For example, stress induced by infant-mother separation in mice resulted in hyperactivity of the HPA axis, specifically hypersecretion of corticosterone, increased arginine vasopressin (Avp) expression in the hypothalamus and loss of DNAm at an enhancer of Avp (131). Importantly, AVP functions in stress response, namely through stimulatory effects on the HPA axis. Behavioural changes, loss of DNAm and altered Avp expression all persisted beyond the 10-day exposure to maternal separation (131). Finally, differential susceptibility in response to environmental stress has also been modeled in mice. One study found that adult mice exposed to 10 days of a standard “social defeat” protocol, used to induce stress and model stress-related disorders such as anxiety, displayed either behavioural susceptibility or resiliency, as determined by subsequently avoiding or interacting with unfamiliar mice, respectively (132). Only susceptible mice displayed loss of DNAm and increased expression at the Crh gene. Like AVP, CRH (corticotropin-releasing hormone) is a neuropeptide involved in the HPA axis. Animal models have proven especially valuable in understanding the mechanisms of social epigenetics and, by allowing for the longitudinal assessment of DNAm, have causally implicated DNAm in “biological embedding” of early life stress.

1.7.2 Human studies of epigenetics and early exposures

In addition to extant research in animal models, human research has offered evidence of the role of environmental exposures in DNAm patterns through association studies.
By necessity, most research on humans is correlational rather than causal, and much of it uses peripheral tissue to assess DNAm, as postmortem brain samples are limited and many human tissues require invasive sampling methods to obtain. Currently, the most widely reproduced association between DNAm and an environmental exposure is that of cigarette smoke exposure and DNAm in the aryl hydrocarbon receptor repressor (AHRR) gene (133-136). AHRR is involved in the aryl hydrocarbon receptor signaling pathway, which responds to toxic compounds in polluted air and tobacco smoke. Decreased DNAm in the gene body of AHRR, particularly at cg05575921, has reproducibly correlated with smoke exposure in many independent cohorts, including current smokers, individuals exposed to secondhand cigarette smoke, and infants exposed prenatally to maternal smoking, in tissues including lymphoblasts, PBMCs, monocytes, and cord blood (133-136).

The same methodologies used in epigenome-wide association studies (EWASs) of physical environmental exposures have been applied to early life stress and familial SES. Such studies have strengthened and expanded our understanding of SES as a social determinant of health. As previously described, the negative health effects of SES are better explained by childhood SES than by adulthood SES; this paradigm has recently been observed in DNAm. In a cohort of adults from either low or high SES backgrounds, based on parents’ job prestige in early life, who were balanced for current SES, DNAm was associated with early life SES irrespective of concurrent SES; concurrent SES was not associated with DNAm (87). Furthermore, pre-stimulation DNAm differences in leukocytes were predictive of their cytokine responses when stimulated through the Toll-like receptor pathways (87). As such, one could speculate that this specific DNAm pattern connects past SES exposures to later immune response, thus connecting early life experiences to health outcomes in adulthood.
DNAm research has also broadened our understanding of differential susceptibility to early life stress and adversity. Individuals with a “sensitivity allele” in the FKBP5 gene, which causes increased induction of the gene, and who experienced childhood trauma were found to have loss of DNAm at FKBP5 and an increased risk of developing stress-related psychiatric disorders in adulthood, including post-traumatic stress disorder, as compared to individuals with childhood trauma but no copies of the sensitivity allele (137). FKBP5 is a regulator of the stress hormone system, and demethylation at this gene is followed by increased stress-dependent gene transcription and ultimately by a long-term dysregulation of the stress hormone system and a global effect on areas of the brain associated with stress regulation (138). Notably, this study also implicates dysregulation of stress response pathways as a mechanism by which adversity can lead to disease. Beyond these studies, a wide range of research on DNA methylation and exposures to early life psychosocial adversity or SES exists. DNA methylation appears to be associated with being reared in institutional environments (139,140), childhood maltreatment or deprivation (141-143), maternal mental health problems during the perinatal period (144,145), and parental adversity and stress in either infancy or the preschool period (146,147).

The take-home message from our current understanding of social epigenetics is that complex social experiences, such as early life SES, are intertwined with physical exposures and subjective experience, and any or all of these factors may play a role in DNAm alterations. Furthermore, the causation and directionality of this relationship are not interpretable in cross-sectional studies, and outcomes are complicated by individuals’ genetic backgrounds and temperaments, which dictate their differential susceptibility.
This is underscored by a genome-wide study of genotype, DNAm and various in utero exposures, including maternal smoking, maternal depression, maternal BMI and birth order, in which the vast majority of high variability DNAm sites were best modelled by an interaction of genotype and exposure, as compared to genotype alone or exposure alone (148).

1.8 Synthesis and thesis objectives

Epigenetic processes, such as DNAm, represent a likely mechanism of biological embedding that can extend to any biological system. Hypothetically, epigenetic mechanisms could enable the body to mount conditionally adaptive changes to metabolic processes, the endocrine system, neuroregulatory function, etc., depending on the specific environmental onslaught and the tissue type in which the epigenetic change occurs. Therefore, assaying DNAm in the context of the “early life” environment can bring to light aberrations underlying development, learning and behavioural differences, and later health outcomes. In the context of this dissertation, the term “early life” is used broadly to refer to a period of time starting at conception and ending at puberty. The relationship between early life experience and epigenetics is complicated by internal psychological and physiological factors, as well as genetic variation. As well, stress reactivity and temperament are predictive of how a child may interact with his or her environment and can affect how such exposures are internalized.

In sum, early life environmental exposures and intrinsic behavioural differences may imprint on the DNA methylome at genes related to stress response pathways and/or neurodevelopment, thus underlying the gradient of health outcomes associated with one’s childhood socioeconomic status.
Focusing my research around this hypothesis, the goal of this dissertation was to build foundational knowledge of molecular mechanisms of “biological embedding” to ultimately inform targeted, multilevel interventions capable of abating lifelong societal inequalities in health and human development. Given the early state of social epigenetics research, many obstacles limit the interpretability and generalizability of SES-related EWAS findings. For example, epigenetic studies often use accessible, surrogate tissues in place of primary tissues of interest. Given the vast differences in DNAm patterns between cell types and tissues, the functionality of epigenetic findings in surrogate tissue is not fully understood. Thus, the main objectives of my dissertation were to elucidate the relationships between childhood environment, DNAm and genetic variation, and behaviour, to understand how these systems influence one another, and also to assess the utility of such findings. These objectives were divided into the following three research projects:

I. Evaluate the concordance of DNAm in matched blood and buccal tissues from the pediatric age range to determine benefits and disadvantages of using these accessible tissues, as well as uncover tissue-specific DNAm patterns.
II. Examine associations of early, internal differences in biobehavioural responses with later epigenetic modifications, expecting to uncover significant relationships between biobehavioural response propensities and patterns of DNAm, similar to previously documented associations between early life environments and DNAm.
III. Examine the epigenetic correlates of early-life socioeconomic status in PBMCs and BECs collected at ages 8-10 and dried blood spots collected at birth to understand the tissue-specificity and timing of SES-related DNAm patterns.
Chapter 2: Common methods and materials

2.1 Introduction to epigenome-wide association studies

Epigenome-wide association studies (EWASs) aim to discover DNA methylation loci or CpGs that associate with a phenotype or exposure of interest. EWASs, like genome-wide association studies (GWASs), can discover associations with no a priori hypotheses of the genes involved (124,149,150). As well, EWASs cannot specify causal relationships and therefore may uncover DNAm patterns that are either mechanistically involved or simply biomarkers (94,149,150). Unfortunately, due to multiple test correction and small sample sizes, EWASs can be plagued by a high type II error rate (false negatives). Furthermore, failure to account for technical confounders, population complexity, and other confounders in the data leads to a large number of type I errors (false positive results). Unlike in GWASs, power calculations, aimed at estimating appropriate sample sizes, are not common in EWASs. A small number of publications have provided suggestions and guidelines for applying power calculations to EWASs (151,152). However, calculating power for cohort-based EWASs has proven challenging due to the difficulties in predicting the required inputs for power calculations, such as predicted effect size and number of significantly associated CpGs. Running under-powered EWASs has most likely contributed to the infrequency of replicability of findings across independent cohorts due to high type II error rates. With the steep rise in EWAS publications, methods employed to ensure the quality of reported EWAS findings and to limit type I and II error rates include, but are not limited to, batch correction, cell-type correction, accounting for ethnicity or genetic background, and independent replication (94,149,150,152). In this chapter, I outline the common methods and materials used in chapters 3 to 5 of this dissertation.
2.2 Study samples

Two main cohorts were analyzed in this dissertation: the Wisconsin Study of Families and Work (WSFW) and Gene Expression Collaborative Kids Only (GECKO). WSFW samples are analyzed in chapter 4 and GECKO is used in chapters 3 and 5. In chapter 3, the C3ARE cohort is also assessed; all processing and analyses of the C3ARE cohort were performed by S. Islam and are not described here. Please see chapter 3 for sample information on the C3ARE cohort (147,153). All experimental procedures were conducted in accordance with institutional review board policies at the University of British Columbia (Certificates H07-01317 (C3ARE) and H07-02773 (GECKO, WSFW)). Written informed consent was obtained from a parent or legal guardian and assent was obtained from each child before study participation.

2.2.1 Wisconsin Study of Families and Work

The Wisconsin Study of Families and Work (WSFW) is a longitudinal study that was originally established to examine the effects of maternity leave on families. Dr. Marilyn J. Essex was the principal investigator of the project, which began in 1994 after the conclusion of an initial phase, for which Dr. Janet S. Hyde was the principal investigator (154).

A total of 570 pregnant women, 18 years or older and living with the biological father, were recruited for participation from their physicians’ clinics (154). Seventy-eight percent of the sample came from the Milwaukee, Wisconsin area and twenty-two percent from the Madison, Wisconsin area. Participants were required to be between 21 and 25 weeks pregnant during the recruitment process, which occurred between June 1990 and September 1991. As well, the mother was required to be employed or a homemaker; if the latter was reported, the father was required to be employed.
Of the sample of 570 women, 93% self-identified as “White” and 7% reported a minority ethnic heritage (Indian/Alaskan, Asian/Pacific Islands, African-American, Hispanic, or White Hispanic); 223 were pregnant with their first child, and 95% were married. The median age of mothers was 29 years (range 20-43); the median educational experience was college graduation (with experience ranging from no completion of high school to post-graduate degree); the median family income was $45,000 USD (range $7,500 to $200,000 USD). Following birth, participants partook in 15 assessments over the course of 18 years, which included mail-out interviews, in-person interviews, questionnaires, or home visits to collect observational data. For certain assessments, fathers, care-givers, teachers and children all participated in interviews and questionnaires. The analyses presented here are based on a subsample of children, referred to as the “MacArthur 120”. During grade 1 (in 1998), these 120 children (73 girls, 47 boys), their parents and teachers completed additional assessments for the purpose of evaluating the MacArthur Assessment Battery for Middle Childhood (155). These children were recruited based on scoring either high (upper 20% on internalizing and/or externalizing symptoms) or low on mental health symptomology in the MacArthur Health and Behavior Questionnaire (HBQ) completed in kindergarten. Therefore, unlike the greater sample, the MacArthur 120 was designed to be balanced for high and low pre-syndromal mental health symptoms. The subsample did not differ significantly from the larger sample in family income, parental education, or marital status. Buccal swabs were collected at ages 15 and 18 using Isohelix Buccal Swabs (Cell Projects Ltd., Kent, UK) and stabilized with Isohelix Dri-Capsules for storage at -80°C prior to DNA extraction (156).
2.2.2 Gene Expression Collaborative Kids Only

GECKO is a cross-sectional study of epigenetic profiles, SES and family adversity in approximately 400 children living in socioeconomically diverse neighborhoods from the Vancouver Lower Mainland. This project was designed to observe how neurological and genomic factors interact with early life experiences to alter developmental outcomes. It combines population-level data regarding neighborhood-level socioeconomic and school performance/educational information with individual health, genetics, epigenetics and stress reactivity in order to generate a “whole systems” understanding of the early life experience. Dr. W. Thomas Boyce was the principal investigator and Drs. Michael Kobor, Tim Oberlander and Jelena Obradovic were co-investigators.

The initial round of data collection took place between December 2009 and November 2011. During this time, a travelling van, i.e. a mobile laboratory, outfitted with all required instrumentation travelled to the participants’ houses to perform approximately two hours of laboratory stress tests. Specifically, electroencephalography (EEG) and electrocardiography (ECG) measures were recorded, as well as galvanic skin response and respiratory rate. These measures assessed autonomic nervous system (ANS) reactivity and prefrontal cortex (PFC) activity during the stress protocol. Saliva was collected at 4 time points across the assessment for the purpose of measuring cortisol levels. The protocol itself involved answering questionnaires given by the research associate, followed by a visual oddball detection task, an executive functioning task and the Trier Social Stress Task. During the protocol, parents answered questionnaires aimed at assessing socioemotional vulnerability, adversity, financial stress, conflict tactics, life events, child health and development, education and household income.
Participants provided buccal swabs and saliva for DNA analysis as part of the mobile lab protocols; the latter was used only for genotyping and never for DNAm analysis. Buccal samples were preserved, as per the Isohelix protocol, with proteinase K and “ls” solutions; the samples could then be stored at room temperature without DNA degradation. Saliva samples were preserved using DNA Genotek Oragene collection kits. For a subset of participants, peripheral whole blood was drawn between 6 months and 2 years after the initial visit using Vacutainer® CPT™ Cell Preparation Tubes (Becton, Dickinson and Company, NJ, USA) and PBMCs were isolated following centrifugation, washing and resuspension into R10 media (Sigma-Aldrich, MO, USA), as previously described (35). PBMC pellets were frozen and stored at -80°C until DNA extraction. As well, for a subset of participants, I obtained neonatal dried blood spots (DBSs) that had been collected from heel pricks on Whatman Protein Saver cards by the BC Newborn Screening Program.

2.3 DNA extraction and sodium bisulfite conversion

Genomic DNA from stabilized buccal samples was isolated using Isohelix Buccal DNA Isolation Kits (Cell Projects Ltd., Kent, UK) and was purified and concentrated using DNA Clean & Concentrator (Zymo Research, CA, USA). Genomic DNA was extracted from PBMC pellets using the DNeasy kit (Qiagen, MD, USA). DNA yield and purity were assessed using a Nanodrop ND-1000 (Thermo Fisher Scientific, MA, USA). Genomic DNA was extracted from DBSs using the GenSolve DNA Recovery Kit, GVR100 (IntegenX), followed by purification using the QiAmp DNA Micro Kit (Qiagen). Finally, genomic DBS DNA was concentrated using Microcon DNA Fast Flow Devices, MRCF0R100 (Millipore). Bisulfite conversion of all genomic DNA (750 ng) was performed using the Zymo Research EZ DNA Methylation Kit (Zymo Research, CA, USA), as per manufacturer’s instructions. Bisulfite conversion translates DNA methylation into sequence differences by a deamination process.
Unmethylated cytosines become uracils, and then thymines following amplification, while methylated cytosines are protected from deamination and remain cytosines (157). Thus, the proportion of methylation (ranging from 0 to 1) of a single CpG in a cell population may be quantitated by methods similar to those used for DNA polymorphisms.

2.4 Illumina DNA methylation microarrays

There are many technologies available to assay genome-wide DNAm, including microarrays and next-generation sequencing (158), which are typically performed after sodium bisulfite treatment of genomic DNA. Of these, Illumina DNAm microarrays are among the most commonly used in human DNAm EWASs (149,150,159). Advantages of this technology include low cost per sample, single-nucleotide resolution and high reproducibility (80,160). Disadvantages of DNAm microarrays include low coverage of the methylome and batch effects. Over the last 10 years, the DNAm microarrays offered by Illumina have greatly improved coverage. In this dissertation, I used two iterations of Illumina’s microarray, the Infinium HumanMethylation27 BeadChip (27K array) and the Infinium HumanMethylation450 BeadChip (450K array), based on what was commercially available at the time (80,161). Both assays required the same general workflow (161):

1. 160 ng of each bisulfite converted DNA sample was denatured and neutralized.
2. Samples were isothermally amplified overnight, resulting in whole genome amplification.
3. Amplified samples were fragmented, then precipitated and resuspended in preparation for hybridization.
4. Samples were hybridized to microarray beadchips and incubated overnight.
5. Beadchips were washed of any unhybridized DNA.
6. Beadchips underwent single base pair extension of probes using hybridized DNA as template and were then stained with fluorescent labels.
7. Beadchips were scanned, producing IDAT files consisting of light intensity readings that could be converted to beta values.
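Step 7 refers to converting scanned intensities into beta values. The following is a minimal sketch of that conversion, not the procedure used in this dissertation. It assumes the conventional Illumina formulation, in which beta is the methylated intensity divided by the sum of the methylated and unmethylated intensities plus a small constant offset (commonly 100) that stabilizes the ratio at low signal. The function name and the intensity values are hypothetical.

```python
def beta_value(methylated, unmethylated, offset=100.0):
    """Estimate the DNAm level at one CpG from two signal intensities.

    Assumption: beta = M / (M + U + offset), with offset = 100 as the
    commonly used constant; it is not taken from this dissertation.
    """
    if methylated < 0 or unmethylated < 0:
        raise ValueError("intensities must be non-negative")
    return methylated / (methylated + unmethylated + offset)


# Hypothetical (methylated, unmethylated) intensities for three CpGs.
intensities = {
    "cpg_mostly_unmethylated": (400.0, 9500.0),
    "cpg_intermediate": (4950.0, 4950.0),
    "cpg_mostly_methylated": (9500.0, 400.0),
}

betas = {name: beta_value(m, u) for name, (m, u) in intensities.items()}

for name, beta in betas.items():
    # Beta values lie between 0 (unmethylated) and 1 (fully methylated).
    print(f"{name}: beta = {beta:.3f}")
```

Because of the offset, a beta value computed this way can approach but never reach 1, and a probe with no signal in either channel returns 0 rather than an undefined ratio.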
2.4.1 Illumina Infinium HumanMethylation27 BeadChip

Genomic BEC DNA collected at age 15 from WSFW participants was assayed using the 27K array. This microarray platform covers 27,578 CpGs across 13,500 promoters, as well as some gene bodies. The 27K array contains only Infinium type I probes (161). For this technology, two unique probes exist for each targeted locus. One probe targets methylated DNA and the other targets unmethylated DNA.

2.4.2 Illumina Infinium HumanMethylation450 BeadChip

Genomic BEC DNA collected at age 18 from WSFW participants, as well as all DNA collected from the GECKO cohort, was run on the 450K array. The 450K array contains over 485,000 methylation sites, covering over 99% of RefSeq genes (80). Unlike the 27K array, probes are distributed across the promoter, 5'UTR, first exon, gene body, and 3'UTR. In addition to covering 96% of CpG islands, the array also covers island shores and the regions flanking them. The 450K array contains two types of probes. Infinium type I probes, described above, are used on the 450K array for high CpG density regions. The majority of probes, however, are Infinium type II probes (80). Type II probes differ by using a single probe for both methylated and unmethylated alleles of the CpG. Following hybridization of the DNA, a single base extension using labelled nucleotides is used to determine the methylation status of the CpG. Type I probes use an all-or-nothing assumption, i.e. all CpGs within the sequence of methylated probes are assumed to be methylated and all CpGs within the sequence of unmethylated probes are assumed to be unmethylated. However, type II probes are degenerate, containing all possible combinations of methylated and unmethylated underlying CpGs.

2.5 DNA methylation data processing pipeline

Prior to analysis, DNA methylation array data must be assessed for quality and normalized to remove background noise and technical effects (162).
The processing pipeline is not standardized and there are steps which can be included or excluded depending on the study design. As well, different methods, based on the standards in the field at the time of processing, were used to normalize the 27K array data and the 450K array data. As such, each data chapter contains a detailed methods section describing the processing and normalization methods used. The general pipeline followed for processing 450K array data sets is outlined below.

2.5.1 Preprocessing and quality control

Raw intensity values from the DNA methylation arrays were imported into Illumina GenomeStudio V2011.1 software and subjected to initial quality control checks for array staining, extension and bisulfite conversion, followed by color correction and background adjustment using control probes contained in the 450K array (163). The data were then exported from GenomeStudio as beta values (β), which represent the estimated DNAm level based on a ratio of intensities between methylated and unmethylated alleles, with beta values ranging from 0 (unmethylated) to 1 (fully methylated). DNAm was also reported as a percent, which is calculated by multiplying the beta value by 100. Subsequent processing and analyses were performed in R (http://www.r-project.org).

Sample filtering was performed using multiple methods. Each sample was visually assessed for a typical bimodal distribution of methylation values, clustered using sex chromosome probes to check for agreement with assigned sex, and checked to confirm that fewer than 5% of probes performed poorly (metrics described below) (163). Failing any of these quality control measures merited removal of the sample. Additionally, R functions, such as removeOutlier, were implemented to identify and remove outlying samples based on sample clustering (164).
Following sample filtering, probe filtering was performed based on two metrics: 1) probes with detection p-values greater than 0.01, indicating that the signal is not significantly greater than background signal, were removed, and 2) probes with missing beta values, for which fewer than three beads contributed to the signal, were removed (163). In smaller cohorts, a probe failing either metric in a single sample merited removal; in larger cohorts, the probe had to fail in a given percentage of samples to be removed. For example, in a cohort of 100 samples, the limit may be set at 5%, in which case a probe must fail either metric in more than 5 samples in order to be removed.

Additional probe removal was based on re-annotation of the 450K array (165). Polymorphic probes, i.e. probes at which the assayed CpG is polymorphic, were filtered out of the dataset, as were probes with nonspecific in silico binding to the sex chromosomes (165). Finally, 65 control SNP probes were removed and, in cohorts containing both male and female participants, probes mapping to the sex chromosomes were removed, as males and females cannot be directly compared at these sites and to limit the confounding effects of sex.

2.5.2 Normalization

Following preprocessing and quality control of the array data, normalization is required to minimize or remove background noise, as well as variability attributed to technical differences. One notable contributor to noise is the use of two different probe technologies on the 450K array, type I and type II probes (166). In the cohorts presented in this dissertation, two normalization methods were used: Beta Mixture Quantile dilation (BMIQ) on GECKO and C3ARE, and Subset Within-Array Normalization (SWAN) on WSFW age 18 data (166,167). The WSFW age 15 data, run on the 27K array, were quantile normalized (168). Details of these normalization methods are provided in the data chapters to which they pertain.
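The probe-filtering rule described in section 2.5.1 can be sketched as follows. This is an illustrative Python sketch rather than the R code used for the dissertation; the function name, the dictionary-based data layout and the example values are hypothetical. It uses the two stated metrics (detection p-value greater than 0.01, fewer than three beads) and removes a probe when it fails in more than a set percentage of samples. Setting that percentage to 0 reproduces the small-cohort rule, in which a single failing sample is enough.

```python
def probes_to_remove(detection_p, bead_counts, p_cutoff=0.01,
                     min_beads=3, max_fail_percent=5):
    """Return probes that fail QC in more than max_fail_percent of samples.

    detection_p and bead_counts map probe name -> list of per-sample values
    (a hypothetical layout chosen for this sketch).
    """
    removed = []
    for probe, p_values in detection_p.items():
        counts = bead_counts[probe]
        # A sample fails at this probe if either metric is not met.
        n_failed = sum(
            1 for p, n_beads in zip(p_values, counts)
            if p > p_cutoff or n_beads < min_beads
        )
        # Integer comparison avoids floating-point edge cases at the limit.
        if n_failed * 100 > max_fail_percent * len(p_values):
            removed.append(probe)
    return removed


# Hypothetical cohort of 100 samples and three probes.
good_p, bad_p = 0.001, 0.2
detection_p = {
    "probe_a": [bad_p] * 5 + [good_p] * 95,   # fails in exactly 5 samples
    "probe_b": [bad_p] * 6 + [good_p] * 94,   # fails in 6 samples
    "probe_c": [good_p] * 100,                # detection is fine everywhere
}
bead_counts = {
    "probe_a": [12] * 100,
    "probe_b": [12] * 100,
    "probe_c": [2] * 6 + [12] * 94,           # too few beads in 6 samples
}

removed_large_cohort = probes_to_remove(detection_p, bead_counts)
removed_small_cohort = probes_to_remove(detection_p, bead_counts,
                                        max_fail_percent=0)
print(removed_large_cohort)
print(removed_small_cohort)
```

With the 5% limit, a probe failing in exactly 5 of 100 samples is retained, matching the worked example in the text, whereas the stricter small-cohort setting removes it.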
2.5.3 Cell-type correction of 450K DNA methylation data

Following normalization, cell-type correction may be applied to the dataset if DNAm is measured in a heterogeneous tissue, such as PBMCs or BECs. The latter samples are composed of buccal epithelial cells and leukocytes from saliva. In this dissertation, PBMC cell type proportions were estimated using a standard blood deconvolution method (169,170). No published deconvolution method exists for BECs. To account for buccal epithelial cell versus white blood cell proportions in BEC samples, in chapters 3 and 5, I used a saliva-based deconvolution method (171). This method was used because saliva and BEC samples are composed of the same cells in different proportions, and it also allowed me to investigate potential blood contamination (171). Both methods output estimated cell proportions for each sample, which were used to normalize cellular heterogeneity across individuals using a linear regression (172).

2.5.4 Batch correction

Finally, a batch correction technique, such as ComBat, can be employed to adjust for variation attributed to batch effects (173). Common causes of batch effects in DNAm arrays can include:

1. Plate – samples are randomized onto 96 well plates. On these plates, samples undergo genome-wide amplification, fragmentation, precipitation and resuspension and are then pipetted onto the microarray chips.
2. Chip – each 450K array chip contains 12 samples. The DNA is hybridized to the beadchip overnight and then undergoes single base-pair extension, staining and imaging.
3. Chip row – On the chip, samples are arranged in two columns of six rows. During the extension and staining steps, reagents are pipetted onto the top of the chip and allowed to flow down.

In the ComBat package, the user specifies which “batch” variable should be removed, as well as any covariates of interest which should be protected.
However, if samples are unbalanced across batches, which is more likely to occur with small sample sizes, ComBat is likely not appropriate and batches must be accounted for using a different method (174). For this reason, ComBat was not used in all data chapters.

2.6 Data analysis

Prior to analysis, beta values were logit-transformed (base 2) to M-values, which are less biologically intuitive but are more statistically valid for differential analysis because the transformation minimizes heteroscedasticity (163,175). EWAS analyses do not follow a pre-set pipeline, but rather depend upon the specific research questions. Therefore, different statistical tests were employed in different data chapters. Here, I outline common techniques used throughout this dissertation.

2.6.1 Data reduction

In some instances, not all CpGs assayed on a microarray are pertinent to the planned analysis; for example, sex chromosomes may be removed in cohorts containing males and females, as they are not directly comparable between sexes and therefore any stratification of the data by sex can confound interpretation. Data reduction techniques based on inter-individual variability of CpGs allow for a reduction in the multiple test burden, as well as the removal of CpGs that are not of biological interest (152). Therefore, I calculated inter-individual variability of each CpG, based on the range between the 10th percentile and 90th percentile beta values for each site (176). I chose to use this calculation as it captures variability across the bulk of samples while being robust to outliers (176). This calculation does not rely on the assumption that CpGs are normally distributed.

2.6.2 Statistical analysis of differential DNAm

In this dissertation, differential DNAm was assessed using a number of statistical tests. Choice of test was dependent on the study design and the available metadata.
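Two of the preparatory calculations from sections 2.6 and 2.6.1, which precede any of these tests, can be sketched briefly. This is an illustrative Python sketch, not the R code used in this dissertation. It assumes the usual definition of the M-value, M = log2(β / (1 − β)), and measures inter-individual variability as the 90th percentile minus the 10th percentile of beta values at a CpG. The function names, the small clipping constant that guards against beta values of exactly 0 or 1, the percentile interpolation method and the example data are all assumptions made for illustration.

```python
import math
import statistics


def m_value(beta, eps=1e-6):
    """Base-2 logit transform of a beta value: log2(beta / (1 - beta))."""
    # eps is an assumed guard keeping beta away from exactly 0 or 1.
    clipped = min(max(beta, eps), 1.0 - eps)
    return math.log2(clipped / (1.0 - clipped))


def percentile_range(betas):
    """Difference between the 90th and 10th percentile beta values."""
    # quantiles(n=10) returns the nine decile cut points; the first is the
    # 10th percentile and the last is the 90th percentile.
    deciles = statistics.quantiles(betas, n=10, method="inclusive")
    return deciles[-1] - deciles[0]


# Hypothetical beta values at two CpGs across eleven samples.
stable_cpg = [0.90, 0.91, 0.89, 0.92, 0.90, 0.91,
              0.90, 0.89, 0.91, 0.90, 0.92]
variable_cpg = [0.20, 0.35, 0.50, 0.65, 0.80, 0.30,
                0.45, 0.60, 0.75, 0.25, 0.70]

for name, betas in (("stable", stable_cpg), ("variable", variable_cpg)):
    spread = percentile_range(betas)
    m_values = [m_value(b) for b in betas]
    print(f"{name}: range = {spread:.3f}, "
          f"M from {min(m_values):.2f} to {max(m_values):.2f}")
```

Because the range ignores the lowest and highest tenth of samples, a single extreme sample has little influence on it, which is the robustness to outliers noted in section 2.6.1.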
Due to the relationship between genetic variation and DNAm variation, ethnicity/ancestry and relatedness should be considered in DNAm analyses (91,177). For all cohorts, ethnicity/population structure was estimated by the Illumina PsychChip array, an exome genotyping array. Sex and age are also commonly, but not always, included as covariates in EWAS analyses (149). For example, in chapter 4, I used a simple correlation between the variable of interest and DNAm at each variable CpG, while in chapter 5, a linear regression was applied to the data (178). In chapter 4, phenotypic variables, including ethnicity, age, and SES, were either independent of the explanatory variable or homogeneous across the cohort, meriting a correlation. By comparison, both ethnicity and sex were associated with the variable of interest in the GECKO cohort and were therefore included as covariates in a linear regression. Finally, the tests performed in chapter 3 were paired because they were performed on matched tissues. For example, to test differential methylation between matched PBMC and BEC samples, a Wilcoxon signed-rank test was performed.

Significance was reported as either nominal p-values or false discovery rate (FDR) corrected p-values. Multiple test correction was performed using the Benjamini-Hochberg method, which controls the FDR (179). This method is less stringent than a method such as Bonferroni correction, which controls the familywise error rate. In addition to a significance threshold, associated CpGs were also filtered using a biological threshold. Across this dissertation, CpGs were required to meet a minimum effect size, specifically an absolute delta beta > 5% (|∆β| > 5%). The |∆β|, or effect size, was calculated for ordinal or continuous variables using a linear model to estimate the difference in beta values across the variable of interest.
Specifically, a regression line is fit to the data at each CpG and the change in DNAm across the regression line is calculated. For dichotomous variables, |∆β| is calculated as the difference in mean DNAm between the two levels.

Chapter 3: Integration of DNA methylation patterns and genetic variation in human pediatric tissues help inform EWAS design and interpretation

3.1 Introduction

Epigenome-wide association studies (EWASs) are becoming increasingly popular, in part due to their potential to enhance our understanding of the determinants of health and disease, including potential early life embedding of experiences and exposures and their association with later life outcomes (94,150,180-184). The term ‘epigenetics’ describes mitotically heritable modifications of DNA and its regulatory components, including chromatin and non-coding RNA, that potentially modulate cellular states or fate through gene expression changes, without changing the DNA sequence itself (61,185,186). DNA methylation (DNAm), which involves the covalent attachment of a methyl group to a cytosine primarily at cytosine-phosphate-guanine (CpG) dinucleotides, is the most well-studied chromatin mark in human populations due to its relative stability and ease of measurement using quantitative array-based methods (187,188). To date, EWASs have identified differential DNAm across a broad range of contexts including disease states, genetic background and environmental exposures, thereby providing evidence for the potential contribution of DNAm in mediating gene-by-environment (GxE) interactions (94,189,190).

Given that tissue specificity is an integral feature of epigenetic profiles, as different tissues and cell types acquire distinct epigenomes during differentiation, the selection of tissue source is a key consideration in the careful design and interpretation of EWAS analyses (95,191,192).
The collection of a disease-relevant target tissue allows for the direct assessment of epigenetic associations that may be implicated in the underlying phenotypic or disease biology. In certain cases, readily accessible peripheral tissues may represent the target tissue; for example, the use of PBMCs to investigate DNAm associations with immune or inflammatory phenotypes (99,193-195). However, in many cases, the target tissue, such as brain, muscle, or adipose tissue, among others, may be impossible or very difficult to collect from living individuals or at sufficient quality for analysis from postmortem samples (181). Easily accessible peripheral tissues are therefore often used in human epigenetic studies for biomarker discovery in lieu of target tissues that are difficult to collect. This is particularly relevant to pediatric cohorts, in which biopsy specimens with invasive collection procedures or postmortem samples are less common than in adult populations. As such, more readily accessible tissues with minimally invasive collection procedures, such as cord blood, saliva, buccal epithelial cells (BECs) or peripheral blood mononuclear cells (PBMCs), are widely used tissue source materials for early life EWASs.

The use of pediatric tissues in DNAm analyses is further complicated by the fact that widespread alterations occur in tissue-specific DNAm patterns during development, conferring additional complexity in the selection of appropriate source material for early life DNAm studies (96,196). Furthermore, changes in cell composition within a tissue are a potential source of confounding in EWASs, as shown for a number of DNAm associations, including changes during development and certain environmental exposures such as smoking (94,197-201).
Currently, two major focal points in human epigenetic research are to elucidate the tissue specificity of DNAm patterns with respect to individual CpGs and to assess inter-individual variation within a single tissue (96,202-204). At a population level, a number of studies have examined the concordance of DNAm patterns across multiple tissues (99,203-208). Findings have shown that beyond tissue-specific differences in absolute DNAm measures, inter-individual DNAm variability also varies by tissue type (99,205). For example, previous work by our group has shown that BECs have greater DNAm variability than matched PBMCs at both the genome-wide level and at individual CpGs (99). Moreover, CpG sites with higher DNAm variability tend to be more correlated between matched tissues (203-205,208). Although these results provide important insights into the comparability of DNAm measures across matched tissues, the analyses to date have been conducted in adult tissues, thereby limiting their relevance to DNAm profiles from pediatric samples. As previous studies have demonstrated that developmental changes in blood DNAm patterns tend to be more pronounced and occur more rapidly in childhood, the examination of DNAm concordance and variability in pediatric tissues represents an important and currently missing step in our understanding of EWAS associations from pediatric peripheral tissues (96,196).

Genetic variation represents an additional contributor to DNAm patterns in tissues, with genetic influences accounting for approximately 20-80% of DNAm variance within a tissue (83,93,209-212). Methylation quantitative trait loci (mQTLs), sites at which DNAm is associated with genetic variation, are present across the genome and are often consistent across tissues, ancestral populations and developmental stages (92,177,213,214). Notably, genetically influenced sites of inter-individual DNAm variation, which can co-occur across tissues, may be biologically informative.
For example, allele-specific DNAm of the FK506 binding protein 5 (FKBP5) gene, which has been associated with risk of developing stress-related psychiatric disorders, responds to glucocorticoid stimulation in a similar way in peripheral blood cells and neuronal progenitor cells (215). Within a particular tissue, such as blood, mQTLs are often stable across development (177,216). Moreover, approximately 75% of the inter-individual regional DNAm variance within a single tissue can be best described by GxE models (148). As such, delineating the contribution of genetic influences to tissue-specific DNAm may help clarify the interpretation of EWAS associations.

Given that early life development brings about sizable changes to DNAm patterns, it is important to examine DNAm variability and concordance between peripheral tissues, as well as genetic influences on early life DNAm patterns, in childhood (96,110). To this end, we used matched PBMC and BEC samples, two commonly used peripheral tissues in EWAS, from two independent early life cohorts in order to identify a) differences in inter-individual variability and concordance of DNAm between these tissues and b) genetic contributions to these patterns at the site-specific level. Our results showed that genome-wide DNAm variability differed between tissues, with BECs exhibiting greater inter-individual DNAm variability than PBMCs. Moreover, we found that highly variable CpGs were more likely to be positively correlated between matched tissues and were enriched for DNAm sites under genetic influence. Finally, we demonstrated the relevance of our findings to EWAS analysis by categorizing DNAm associations that were previously identified in pediatric BECs and peripheral blood. Collectively, these findings highlighted a number of potential insights and considerations for the appropriate design and interpretation of EWAS analyses performed in commonly used peripheral tissues of pediatric samples.
3.2 Materials and methods

3.2.1 Study cohorts and tissue samples

Matched tissues were obtained from a subset of two separate pediatric cohorts. Specifically, a subset of samples from the previously described C3ARE (Cleaning, Carrying, Changing, Attending, Reading and Expressing) cohort were collected from 16 individuals (8 females; 50%) aged 3-5 years (age range: 3.6-4.2 years (BEC) and 4.5-5.2 years (PBMC)) from Vancouver, British Columbia (147). The GECKO cohort samples (Gene Expression Collaborative Kids Only) comprised 79 individuals (36 females; 46%) aged 6-13 years (age range: 6-11 years (BEC) and 7-13 years (PBMC)), also from Vancouver, British Columbia. Birth dates were not available for all GECKO participants; age in years was recorded at the BEC sample collections. In both cohorts, the majority of BEC samples were collected at the first visit and PBMCs were collected at a later date. In the C3ARE cohort, follow-up visits ranged from 7 days to 1.5 years, with three pairs of matched BECs and PBMCs being collected on the same day. In the GECKO cohort, the follow-up visits at which peripheral blood was collected ranged from 6 months to 2.3 years after the initial visit. All experimental procedures were conducted in accordance with institutional review board policies at the University of British Columbia (Certificates H07-01317 (C3ARE) and H07-02773 (GECKO)). Written informed consent was obtained from a parent or legal guardian and assent was obtained from each child before study participation. For both cohorts, BECs were collected using the Isohelix Buccal Swabs (Cell Projects Ltd., Kent, UK) and stabilized with Isohelix Dri-Capsules for storage at room temperature prior to DNA extraction, as previously described (217).
Whole blood was collected into Vacutainer® CPT™ Cell Preparation Tubes (Becton, Dickinson and Company, NJ, USA) and PBMCs were isolated following centrifugation, washing and resuspension into R10 media (Sigma-Aldrich, MO, USA), as previously described (218). PBMC pellets were frozen and stored at -80°C until DNA extraction.

3.2.2 DNA isolation and DNA methylation arrays

Genomic DNA from stabilized buccal samples was isolated using Isohelix Buccal DNA Isolation Kits (Cell Projects Ltd., Kent, UK) and was purified and concentrated using DNA Clean & Concentrator (Zymo Research, CA, USA). Genomic DNA was extracted from PBMC pellets using the DNeasy kit (Qiagen, MD, USA). DNA yield and purity were assessed using a Nanodrop ND-1000 (Thermo Fisher Scientific, MA, USA). Bisulfite conversion of DNA (750 ng) was performed using the Zymo Research EZ DNA Methylation Kit (Zymo Research, CA, USA). Samples were subsequently randomized and 160 ng of bisulfite-converted DNA was applied to the Illumina Infinium HumanMethylation450K Beadchip (450K) array, as per manufacturer’s protocols (Illumina, CA, USA) (80).

3.2.3 DNA methylation array data quality control and normalization

Data from each cohort were analyzed separately. Specifically, raw intensity values from the DNAm arrays were imported into Illumina GenomeStudio V2011.1 software and subjected to initial quality control checks for array staining, extension and bisulfite conversion followed by color correction and background adjustment using control probes contained on the 450K array. Data were exported from GenomeStudio as beta values, which represent the estimated DNAm level based on a ratio of intensities between methylated and unmethylated alleles, such that beta values range from 0 (unmethylated) to 1 (fully methylated). Subsequent processing and analysis were performed in R Version 3.2.1 (http://www.r-project.org).
Profiles from 65 probes targeting single nucleotide polymorphisms (SNPs) were used to ensure matched tissue samples originated from the same individual. The 65 SNP probes were subsequently filtered out of the dataset. Since the cohorts were not equally matched for sex, we removed sex chromosome probes (11,648) from both datasets. Additional probe filtering removed poorly performing probes, including those with detection p-values greater than 0.01 or with missing beta values in more than 2% of samples (14,400 C3ARE, 13,374 GECKO). Re-annotation of the Illumina 450K array was used to filter probes that are known to be polymorphic at the target CpG. Probes which have non-specific in silico binding to the sex chromosomes were also removed (219). Final probe count after quality control probe filtering was 429,494 probes for C3ARE and 430,581 probes for GECKO. Following quality control processing, quantro indicated that quantile normalization was inappropriate, as the global DNAm distributions of the two distinct tissues differed markedly (220). Beta Mixture Quantile dilation (BMIQ) normalization was performed to remove differences between Type I and Type II probes on the 450K array, yielding normalized DNAm (166).

3.2.4 Cell-type correction of DNA methylation data

The effects of cellular heterogeneity on DNAm measures were removed from PBMC and BEC samples in both cohorts. Specifically, blood cell type proportions were estimated for the PBMC samples using the established Houseman blood deconvolution method (169,170). This blood deconvolution algorithm has been previously used in pediatric blood DNAm profiles, where it was shown to perform reasonably well (197). To test whether this was also true in our GECKO and C3ARE samples, we assessed the appropriateness of the Houseman probeset panel in our pediatric blood samples compared to adult blood profiles (169,221).
We downloaded the original adult blood DNAm dataset (Reinius) on which the Houseman method was trained (Accession # GSE35069) and filtered to 500 probes used in the algorithm that were common across all GECKO, C3ARE (following preprocessing) and Reinius samples (222). Given that this Houseman signature comprises 600 statistically-related probes, 500 of which passed quality control in both GECKO and C3ARE, we used two common analytical approaches, principal component analysis (PCA) and hierarchical clustering, to determine the relationship of methylation states between cohorts in the data. PCA showed an overlap of child and adult PBMC profiles in the two top-ranking PCs (accounting for 98% of the DNAm variance of the Houseman probeset panel) and, similarly, adult samples did not cluster separately from child samples in the hierarchical clustering analysis. Collectively, these findings suggested that DNAm at CpGs used in the Houseman deconvolution signature was similar between adult and child blood samples (Supplementary Figure 3.1).

Given that no cell deconvolution algorithm for buccal tissues exists and that buccal swabs, like saliva, are predominantly composed of BECs and leukocytes, we used a saliva-based deconvolution method which was designed to predict these cell types from underlying DNAm patterns (171,223,224). Predicted cell proportions from both PBMC and BEC tissues were used to normalize cellular heterogeneity within each tissue using a regression-based strategy (Supplementary Figure 3.2) (172). PCA was subsequently used to confirm that the correlation of estimated cell-type proportions with DNAm variance within a tissue was minimal in the corrected 450K datasets (data not shown).
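As a rough illustration of the regression-based strategy, the sketch below removes the linear effect of a single estimated cell-type proportion from one CpG and adds the original mean back, so that corrected values stay on the beta scale. The function name `regress_out_proportion`, the single-covariate simplification and the toy values are assumptions made for illustration only; the correction applied here (172) was run in R and may differ in detail, for example by modelling several cell types jointly.

```python
from statistics import mean


def regress_out_proportion(betas, proportions):
    """Remove the linear effect of one cell-type proportion from one CpG.

    betas: beta values of a single CpG, one per sample.
    proportions: estimated proportion of one cell type, one per sample.
    Returns the regression residuals with the original mean added back.
    """
    b_mean = mean(betas)
    p_mean = mean(proportions)
    sxx = sum((p - p_mean) ** 2 for p in proportions)
    if sxx == 0:
        # The covariate does not vary, so there is nothing to remove.
        return list(betas)
    sxy = sum((p - p_mean) * (b - b_mean) for p, b in zip(proportions, betas))
    slope = sxy / sxx
    # residual + mean(betas) simplifies to beta - slope * (p - mean(p)).
    return [b - slope * (p - p_mean) for b, p in zip(betas, proportions)]


# Toy example (hypothetical values): DNAm tracks the proportion exactly,
# so all variation is removed and every sample ends up at the mean.
toy_betas = [0.2, 0.4, 0.6, 0.8]
toy_props = [0.1, 0.2, 0.3, 0.4]
corrected = regress_out_proportion(toy_betas, toy_props)
```

In practice the same step would be repeated for every CpG within a tissue, and the corrected values would be checked against the estimated proportions, as was done here with PCA.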
3.2.5 Assessment of cross-tissue correlation, tissue-specific variability and tissue-specific differences in DNA methylation data

Prior to subsequent DNAm analyses, the corrected 450K datasets were filtered down to overlapping probes (419,507) between the GECKO and C3ARE cohorts. Probe-wise cross-tissue Spearman’s correlations were calculated on beta values between the matched PBMC and BEC tissues. Inter-individual variability of each CpG was calculated as the range between the 10th and 90th percentile beta values for each CpG, referred to as “reference range” (176). This method captures variability across the bulk of samples while being largely robust to outlier samples.

In order to assess sample size-related differences in our DNAm analyses between GECKO and C3ARE, we performed 100 trials of Monte Carlo simulations. Specifically, we randomly subsampled the GECKO cohort to the equivalent size as the C3ARE cohort (n = 16 individuals) 100 times and re-ran the cross-tissue correlations and reference range calculations on the subsamples. We reported the average correlation coefficients, p-values and reference ranges from the 100 trials, which we refer to as “GECKOsub.”

Paired Wilcoxon signed-rank tests were used to compare global differences in reference range between matched BEC and PBMC samples. Fligner-Killeen tests were used to compare probe-wise variability differences in each of the cohorts. Using previously published methods, we aimed to identify informative sites between BECs and PBMCs, which we defined as CpGs that are both variable across individuals and highly correlated between both tissues (225). To identify informative sites, we first subset each cohort down to CpGs with a reference range greater than 0.10 in both tissues. We subsequently ran a beta mixture model on Spearman correlation rho values generating two Gaussian distributions, which separated out a group of highly concordant CpGs (Supplementary Figure 3.3).
The distribution of Spearman rho values in this set of highly correlated CpGs was used to define a threshold correlation coefficient, the cutoff being two standard deviations below the mean of the distribution. In the GECKO cohort, the threshold was rho > 0.47, and in the C3ARE cohort it was rho > 0.32. We also set a minimum reference range of 0.05 in both tissues to exclude CpGs with little inter-individual variation.

Finally, we identified CpGs that were differentially methylated between tissues by running Wilcoxon signed-rank tests across all probes in the C3ARE, GECKO and GECKOsub datasets. For all tests, the resulting p-values were adjusted using the Benjamini-Hochberg (BH) false discovery rate (FDR) method (179). CpGs which passed an FDR < 0.05 and an effect size threshold, absolute delta beta > 5% (|∆b| > 5%), independently in all three datasets, C3ARE, GECKO and GECKOsub, were classified as “differential sites”.

3.2.6 SNP genotyping arrays

In the GECKO cohort, DNA for genotyping was collected from saliva samples of 63 individuals using the Oragene OG-500 DNA all-in-one system as per manufacturer’s protocol (DNA Genotek Inc, ON, Canada). In the C3ARE cohort, genomic DNA for genotyping was obtained from PBMC samples as described above. Genotyping data were measured at 588,454 SNP sites using the Illumina Infinium PsychChip BeadChip (PsychChip), as per manufacturer’s protocols (Illumina, CA, USA). Content for the PsychChip includes 264,909 proven tag SNPs found on the Infinium Core-24 BeadChip, 244,593 markers from the Infinium HumanCoreExome BeadChip, and 50,000 additional markers associated with common psychiatric disorders.

3.2.7 Preprocessing of SNP genotyping data and PCA analyses for genetic ancestry

Quality control preprocessing of Illumina Infinium PsychChip data was performed separately for each cohort according to recommended guidelines (226).
Specifically, SNPs with a low 10th percentile GenCall score or with a low average GenCall score were filtered out. Additionally, SNP probes located on mitochondrial DNA, on sex chromosomes or without chromosome labels were removed. After probe filtering, final SNP probe counts for the C3ARE and GECKO datasets were 550,200 and 547,662, respectively. To test for differences in genetic ancestry between the two cohorts, we ran all samples in PCA, using the 542,699 SNPs called for every individual in both processed datasets. Genetic ancestry was not found to differ significantly between the cohorts (Supplementary Figure 3.4), as determined by Wilcoxon rank sum tests of GECKO versus C3ARE in PC1 scores (p = 0.8) and PC2 scores (p = 0.4). Therefore, genetic ancestry was not considered in further analyses.

3.2.8 Cis-mQTL analyses

We ran cis-mQTL analyses in each cohort separately, using GECKO as the discovery cohort and C3ARE as the validation cohort. In the GECKO cohort, PsychChip data were filtered after quality control to remove any SNP probes containing missing values in 5% of all samples, leaving 560,770 SNPs. In addition, SNPs with a minor allele frequency less than 5% or not in Hardy-Weinberg equilibrium were removed. Remaining SNPs (249,835) were then numerically coded as 1, 2, or 3 for correlational analyses. All SNPs used in mQTL analyses were therefore directly measured on the array, rather than generated through imputation. CpGs with a reference range of less than 5% were removed from mQTL analysis; this was performed separately in each tissue, leaving 131,706 CpGs in PBMCs and 210,784 CpGs in BECs. Finally, SNP-CpG pairs less than 5 kb apart were tested for mQTL associations using Spearman correlations. We selected a 5 kb window as previous mQTL analyses using whole genome bisulfite sequencing data reported that associations between SNP–CpG pairs are more likely to be causal within a 5 kb window (85,148,227-229).
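The pairing and correlation steps just described can be sketched as follows. This is a minimal illustration, not the R code used for the analysis: the function names, the dictionary layout, the toy identifiers (`rs_demo1`, `cg_demo1`, and so on) and the toy values are all hypothetical, and p-values, FDR adjustment and the per-allele effect size filter are deliberately left out.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return float("nan")  # e.g. a monomorphic SNP in this sample set
    return sxy / (sxx * syy) ** 0.5


def cis_pairs(snp_positions, cpg_positions, window=5000):
    """List (snp, cpg) pairs on the same chromosome less than `window` bp apart."""
    pairs = []
    for snp, (snp_chr, snp_pos) in snp_positions.items():
        for cpg, (cpg_chr, cpg_pos) in cpg_positions.items():
            if snp_chr == cpg_chr and abs(snp_pos - cpg_pos) < window:
                pairs.append((snp, cpg))
    return pairs


# Toy data (hypothetical): genotypes coded 1, 2, 3 as in the text.
snps = {"rs_demo1": ("1", 10_000)}
cpgs = {"cg_demo1": ("1", 12_500), "cg_demo2": ("1", 20_000), "cg_demo3": ("2", 10_100)}
genotypes = {"rs_demo1": [1, 2, 3, 1, 2, 3]}
betas = {"cg_demo1": [0.20, 0.40, 0.60, 0.25, 0.45, 0.65]}

tested = cis_pairs(snps, cpgs)
rhos = {(s, c): spearman_rho(genotypes[s], betas[c]) for s, c in tested}
```

The nested loop is quadratic in the number of SNPs and CpGs, which is fine for a sketch; an analysis at array scale would normally sort positions by chromosome and scan a sliding window instead.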
In GECKO, a total of 165,591 unique SNP-CpG pairs in PBMC and 261,739 unique SNP-CpG pairs in BEC were interrogated for associations between DNAm and allelic variation; this included 145,222 SNP-CpG pairs tested in both tissues. Pairs with FDR ≤ 0.05 and DNAm change per allele ≥ 2.5% were designated as cis-mQTL candidates and followed up for validation in the C3ARE cohort (230). For validation testing in the C3ARE samples, SNP-CpG pairs were further filtered to exclude those with SNPs that were a) not present in the filtered C3ARE PsychChip data or b) monomorphic or had fewer than 2 heterozygotes in the C3ARE samples. The mQTL analyses were repeated in the C3ARE data. SNP-CpG pairs with FDR ≤ 0.05 and DNAm change per allele ≥ 2.5% were designated as validated cis-mQTLs and followed up in subsequent analyses. All genotyping and DNAm data were analyzed using the human assembly GRCh37 (hg19) genome build. All SNPs are reported on the (+) strand, according to standard practices in the field.

3.2.9 Representation of identified sites in published EWAS findings

In order to relate our results to published EWAS findings performed in pediatric cohorts, we selected five published studies that used the 450K array to measure DNAm profiles in pediatric BECs or peripheral blood. Specifically, these studies examined DNAm variation associated with puberty, aging in early life, childhood psychotic symptoms, fetal alcohol spectrum disorder and autism spectrum disorder (217,231-234). For each study, we downloaded the list of probes reported as significant and matched these probes to sites which we identified as: 1) informative sites, 2) differential sites and/or 3) cis-mQTL-associated CpGs. For one study, in which differentially methylated regions (DMRs) were reported, we downloaded the dataset (Accession # GSE50759) and extracted individual probes underlying the DMRs (234).
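The matching of published EWAS hits to the three site categories amounts to a set overlap, which could be sketched as below. The function name `classify_hits` and all probe identifiers are hypothetical placeholders rather than real 450K probes, and a probe may fall into more than one category, as allowed by the "and/or" in the text.

```python
def classify_hits(hits, informative, differential, mqtl_cpgs):
    """Count how many reported EWAS hits fall into each site category.

    Returns {category: (number of overlapping probes, percent of hits)}.
    Categories are not mutually exclusive.
    """
    categories = {
        "informative": set(informative),
        "differential": set(differential),
        "cis_mqtl": set(mqtl_cpgs),
    }
    hit_set = set(hits)
    summary = {}
    for name, members in categories.items():
        overlap = hit_set & members
        percent = 100 * len(overlap) / len(hit_set) if hit_set else 0.0
        summary[name] = (len(overlap), percent)
    return summary


# Toy example with made-up probe identifiers.
published_hits = ["cg_a", "cg_b", "cg_c", "cg_d"]
summary = classify_hits(
    published_hits,
    informative=["cg_a", "cg_x"],
    differential=["cg_a", "cg_b", "cg_c"],
    mqtl_cpgs=["cg_d"],
)
```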
3.3 Results

3.3.1 Study cohorts and DNAm data processing

To explore the tissue-specific DNAm patterns of pediatric PBMCs and BECs, we used subsets from two independent human cohorts, GECKO and C3ARE, both of which contained matched tissue samples from healthy children from the Lower Mainland Vancouver area. In GECKO, individuals ranged in age from 6 to 11 years at time of BEC collection (median = 8.8) and 7 to 13 years at time of PBMC collection (median = 10.3). Of the GECKO study sample (n = 79), 46% were female (n = 36). In C3ARE (n = 16), individuals ranged in age from 3 to 5 years at time of BEC collection (median = 4.5) and 4 to 5 years at time of PBMC collection (median = 5.1) and 50% were female (n = 8) (Table 3.1).

Table 3.1 Sample characteristics for C3ARE and GECKO cohorts

Characteristics                                 C3ARE                   GECKO
Age Range (years) at BEC collection (mean)      3.7-5.8 (4.5)           6-11 (8.8)
Age Range (years) at PBMC collection (mean)     4.2-5.9 (5.1)           7-13 (10.3)
Sex                                             n = 16 total (50% F)    n = 79 total (46% F)

DNAm data, as measured across ~485,000 CpGs by the Illumina 450K array, were filtered down to overlapping 419,507 sites which passed independent quality control measures in both cohorts. Each 450K dataset was normalized to remove probe type differences and adjusted for cell-type heterogeneity in each tissue using established bioinformatic correction methods (166,169-171). Genetic variants were measured genome-wide using the Illumina Infinium PsychChip. Following probe filtering for low-quality probes, 550,200 and 547,662 SNP probes remained for analysis in C3ARE and GECKO, respectively. We used these corrected DNAm and genotyping data of matched PBMC and BEC samples from both cohorts to assess inter-individual DNAm variability, DNAm concordance across tissues and genetic influence on DNAm, in order to gain insight into DNAm variation in these commonly used pediatric peripheral tissues.
3.3.2 BEC DNAm had greater inter-individual variability than PBMC DNAm

As inter-individual DNAm variability within a tissue likely relates to the potential effect sizes that are detectable in EWAS analyses, we were interested in assessing tissue-specific DNAm variability. To this end, we first interrogated the global differences in inter-individual DNAm variability between PBMC and BEC samples, following in silico correction for cell type differences in each tissue. We used reference range as a measure of DNAm variability as opposed to absolute range in order to minimize potential skewing by outlier values and non-normal DNAm values at individual CpGs, as previously described (225,235). Within each cohort, BEC DNAm had a significantly greater reference range than PBMC DNAm (Figure 3.1A; Wilcoxon signed-rank test, all p-values < 2.2 x 10^-16). In GECKO, the median reference range, measured in beta values, was 1.9% higher in BECs (5.2%) than in PBMCs (3.3%). Similarly, in C3ARE, the median reference range was 1.6% higher in BECs (3.6%) than in PBMCs (2.0%). The difference in reference range was not dependent on sample size, as demonstrated by the consistency between GECKO and GECKOsub, the GECKO cohort randomly subsampled to the sample size of C3ARE (n = 16) 100 times (Figure 3.1A). In addition, tissue-specific differences in DNAm variability were observed at individual CpGs, as determined by a Fligner-Killeen test, a non-parametric test measuring homogeneity of variances between two groups. In GECKO, 217,091 probes had significantly greater variability in BEC at FDR ≤ 0.05, while only 32,350 probes were more variable in PBMC. Similarly, in the C3ARE cohort, 127,472 probes had greater variability in BECs (FDR ≤ 0.05) and 8,183 probes in PBMCs (FDR ≤ 0.05; Figure 3.1B). This consistent difference in variability between BECs and PBMCs was best represented by cg10852045, cg14245471 and cg18559901 (Figure 3.1C).
Collectively, 85% of C3ARE probes (108,498) with greater variability in BEC were also found in the GECKO cohort to have greater BEC variability. These 108,498 CpGs were enriched for sites with high inter-individual BEC variability in both cohorts (10,000 permutations, p-value < 1 x 10^-4). As well, 84% of C3ARE probes (6,840) with greater variability in PBMCs were also more variable in PBMCs in the GECKO cohort; similarly, this subset was enriched for CpGs with high PBMC variability in both cohorts (10,000 permutations, p-value < 1 x 10^-4). These findings suggested that BEC DNAm was consistently more variable than PBMC DNAm across both cohorts, in line with previous analyses using adult tissues (99).

Figure 3.1 BEC DNAm was consistently more variable than PBMC DNAm at the genome-wide and probe-wise level. A) Distribution of reference range in C3ARE, GECKO and GECKOsub, showing significantly greater variability in BEC vs. PBMC (Wilcoxon p < 2.2 x 10^-16 in each cohort). B) Scatterplot of PBMC versus BEC reference range in each cohort. C) Three examples of CpGs with the greatest reference range difference between tissues. Individuals from the GECKO cohort are shown in red and individuals from C3ARE are shown in blue.

[Figure 3.1 panels: A, log reference range by tissue (BEC, PBMC) in C3ARE, GECKO and GECKOsub; B, PBMC versus BEC reference range in C3ARE and GECKO, shaded by count; C, beta value by tissue for cg10852045, cg14245471 and cg18559901.]

Apart from tissue-specific differences in reference range, we also observed a cohort-specific difference in DNAm variability. Specifically, CpGs in GECKO had a significantly greater median reference range than C3ARE CpGs in both tissues (Wilcoxon rank sum test, p-values = 2.2 x 10^-16). In BECs, the median reference range was 1.6% higher in GECKO than C3ARE and in PBMCs, it was greater by 1.3%.
This difference remained significant when GECKOsub was used in lieu of GECKO (BEC difference = 1.2%, PBMC difference = 1.1%, Wilcoxon rank sum test, p-values = 2.2 x 10^-16), suggesting that these cohort-specific DNAm variability differences occurred irrespective of sample size and may be related to age-associated increases in DNAm variability, as previously described (236-241), or to an unaccounted-for difference in sampling.

3.3.3 Variable CpGs were more highly correlated between tissues

Taking advantage of the matched tissue design of our cohorts, we evaluated whether DNAm variation in one tissue reflected DNAm variation in the other. We performed probe-wise Spearman’s correlations between paired BEC and PBMC samples for each of the C3ARE, GECKO and GECKOsub datasets (Supplementary Figure 3.5). Using multiple reference range thresholds to capture increasingly variable CpGs, as previously described, we observed progressively greater enrichment of highly positively correlated CpGs, irrespective of sample size (Figure 3.2A and Supplementary Table 3.1) (225). This suggested that, broadly, CpGs with greater variability were more likely to be correlated between these tissues than less variable CpGs.

We next sought to investigate DNAm variability and concordance at individual CpGs. Specifically, we aimed to identify “informative sites”, which we defined as CpGs that are both variable across individuals and highly correlated between BECs and PBMCs, using a previously described method (225). Such CpGs may be predictive of PBMC DNAm when measured in BECs or vice versa. To be classified as informative, i.e. variable and concordant, a CpG was required to have a reference range ≥ 5% in both tissues and meet the minimum correlation coefficient between tissues of 0.47 in GECKO samples and 0.32 in C3ARE samples, as determined by a beta mixture model run on highly variable CpGs in each cohort.
Overlapping CpGs that met these criteria in both cohorts resulted in a set of 8,140 informative sites. Of note, we observed a greater than expected by chance overlap (3,682 out of 8,140 sites, 45%, 10,000 permutations, p < 1 x 10^-4) between our set of informative sites and informative CpGs previously identified between matched samples from adult brain and blood tissues (225). Visualization of our six most correlated informative sites revealed continuous distributions of positively correlated DNAm values between the tissues, as expected (Figure 3.2B). However, the most variable informative sites exhibited discrete distributions with 2 to 3 distinct clusters, rather than a typical continuous distribution, suggesting that these sites may be enriched for CpGs under genetic influence (Figure 3.2B) (204).

Figure 3.2 Variable CpGs were more highly correlated between tissues. A) Density distribution plots of Spearman’s correlation rho between matched PBMCs and BECs across C3ARE, GECKO and GECKOsub datasets showing progressively greater enrichment of highly positively correlated CpGs at increasing reference range thresholds. Reference range thresholds were set along a sliding scale with cut-offs at 0, 0.05, 0.1, 0.2 and 0.5 (depicted by gradient of green lines). B) Scatterplots of BEC DNAm versus PBMC DNAm for a representative set of informative sites (defined as CpGs that are both variable across individuals and highly correlated between BECs and PBMCs). Top-ranking correlated informative sites (shown in the left two columns) exhibited continuous distributions. In contrast, top-ranking variable informative sites (shown in the right two columns) exhibited discrete distributions, suggesting that these CpGs may be under genetic influence. C3ARE samples are shown in blue while GECKO samples are shown in red.
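The reference range that underlies these variability thresholds, and the GECKOsub subsampling, can be illustrated with a short sketch. Several details are assumptions made for illustration: percentiles are interpolated linearly between order statistics (the default behaviour of R's `quantile`, which may or may not match the original analysis), the function names are hypothetical, and subsampling is shown for a single CpG, whereas the analysis subsampled individuals and recomputed all CpGs in each trial.

```python
import random


def percentile(values, q):
    """Percentile with linear interpolation between order statistics (0 <= q <= 1)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def reference_range(betas):
    """Range between the 10th and 90th percentile beta values of one CpG."""
    return percentile(betas, 0.90) - percentile(betas, 0.10)


def subsampled_reference_range(betas, n, trials=100, seed=0):
    """Mean reference range over `trials` random subsamples of size n."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    ranges = [reference_range(rng.sample(betas, n)) for _ in range(trials)]
    return sum(ranges) / trials


# Toy CpG (hypothetical values): 18 samples at 0.5 plus two extreme outliers.
with_outliers = [0.5] * 18 + [0.0, 1.0]
absolute_range = max(with_outliers) - min(with_outliers)  # driven by the outliers
robust_range = reference_range(with_outliers)             # ignores the outliers

# Toy CpG with evenly spread values, subsampled to n = 16 as for GECKOsub.
spread = [i / 78 for i in range(79)]
sub_range = subsampled_reference_range(spread, n=16, trials=100, seed=1)
```

The toy CpG with two outliers shows why the reference range was preferred over the absolute range: the outliers set the absolute range but leave the 10th and 90th percentiles untouched.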
[Figure 3.2 panels: A, density of Spearman's correlation coefficient (rho) by variability cutoff in C3ARE, GECKO and GECKOsub; B, BEC (beta value) versus PBMC (beta value) for top-ranking correlated informative sites and top-ranking variable informative sites. CpGs labelled in panel B: cg26094651, cg16824113, cg19665696, cg00413030, cg06238316, cg19214707, cg24883219, cg04234412, cg14651435, cg00456343 and cg03796003.]

3.3.4 Genetic variation contributed to tissue concordance

In order to determine the influence of local genetic variation on inter-individual DNAm variability and concordance of DNAm signal across matched peripheral tissues, we identified cis-mQTLs in BEC and PBMC samples separately. Briefly, CpGs were filtered by DNAm variability (reference range ≥ 0.05) in their respective tissues and were correlated against all SNPs within a 5 kb window, a window size previously demonstrated to enrich for mQTLs that are more likely to be functionally linked to proximal CpGs (85,148,229). As the GECKO cohort had a larger sample size than C3ARE and was therefore more adequately powered for cis-mQTL detection, the GECKO samples were used as the discovery cohort. A total of 16,880 and 18,245 significant cis-mQTLs were identified in GECKO PBMCs and BECs, respectively (FDR ≤ 0.05 and DNAm change per allele ≥ 2.5%), with 6,359 mQTLs in common between tissues (Figure 3.3A). These mQTLs were selected for validation testing in C3ARE.

After quality control processing and variability filtering of the C3ARE DNAm and genotyping data, 16,138 and 17,563 SNP-CpG pairs could be tested for validation in PBMCs and BECs, respectively (mQTLs that were not tested for validation lacked genetic variability in the C3ARE cohort).
This resulted in a total of 1,871 PBMC-specific, 3,705 BEC-specific and 1,097 shared-tissue validated cis-mQTLs (FDR ≤ 0.05 and DNAm change per allele ≥ 2.5%), which exhibited highly consistent effect sizes between GECKO and C3ARE cohorts (Spearman rho = 0.92, p = 2.2 x 10^-16) (Figure 3.3A and 3.3B). The overlap of validated cis-mQTLs between tissues was greater than expected by chance (10,000 permutations, p-value < 1 x 10^-4) (Figure 3.3A and Supplementary Figure 3.6). This suggested that genetic influences contributed to co-variation between tissues. Finally, we found a significant overlap of our 1,871 PBMC-specific and 1,097 shared-tissue cis-mQTLs with previously published mQTL hits from whole blood samples of 7-year-old children in the AIRES cohort (1,810 out of 2,968 sites, 61%, 10,000 permutations, p < 1 x 10^-4), further supporting our mQTL findings (148).

Figure 3.3 Independently validated cis-mQTLs were more likely to be shared across tissues than expected by chance. A) Stacked bar plot representing number of cis-mQTLs identified in GECKO discovery cohort (shown in blue) and number of cis-mQTLs validated in C3ARE cohort (shown in red) in either BECs, PBMCs or shared across both tissues. B) Scatterplot of DNAm change per allele in GECKO versus C3ARE across all validated cis-mQTLs shows mQTL effect sizes (measured as DNAm change per allele) were highly consistent across cohorts (BEC-specific, PBMC-specific and shared-tissue mQTLs shown in different colours). C) Boxplots of genotype versus DNAm for representative examples of a shared-tissue (top left), a BEC-specific (top right) and a PBMC-specific (bottom) validated cis-mQTL. C3ARE samples are shown in blue while GECKO samples are shown in red.

We next sought to characterize our validated cis-mQTLs by their genomic localization and functional features.
Firstly, the 4,980 unique CpGs associated with the validated cis-mQTLs showed a greater than expected by chance enrichment in intergenic regions and were depleted in intragenic and north shelf regions (2-4 kb upstream of CpG islands) (Supplementary Figure 3.7A, FDR ≤ 0.05). In particular, both the CpGs associated with tissue-specific cis-mQTLs and the CpGs associated with shared-tissue cis-mQTLs were significantly enriched at intergenic and intragenic regions and showed significant depletion at promoters and CpG islands, where DNAm levels tend to be low and there is limited inter-individual variation (Supplementary Figure 3.7B and C, FDR ≤ 0.05). However, tissue-specific mQTL-associated CpGs exhibited significant enrichment at south shelf regions (2-4 kb downstream of CpG islands) whereas shared-tissue mQTL-associated CpGs were significantly enriched in north shores (0-2 kb upstream of CpG islands) but depleted in north shelf regions (Supplementary Figure 3.7B and C, FDR ≤ 0.05). In addition, we found that CpGs associated with shared-tissue cis-mQTLs exhibited a greater than expected by chance enrichment of informative CpGs (687 out of 812 unique CpGs in shared-tissue cis-mQTLs, 85%, 10,000 permutations, p < 1 x 10^-4), further substantiating that site-specific DNAm correlation between tissues is influenced, in part, by genetic variation (Supplementary Figure 3.8).

3.3.5 Tissue-specific differential DNAm was consistent across cohorts

Taking further advantage of our matched tissue design, we subsequently assessed differential DNAm between PBMCs and BECs at individual CpGs for both cohorts. In the GECKO samples, 36% of CpGs (150,647) were differentially methylated between matched BECs and PBMCs (Wilcoxon signed rank test; FDR ≤ 0.05 and |∆b| ≥ 5%). The number of
The number of significant differentially methylated sites was not greatly affected by sample size differences, as GECKOsub had similar findings with 36% of sites exhibiting differential DNAm (149,094 CpGs, with 148,767 sites overlapping with GECKO). Similarly, in C3ARE, 38% of CpGs (157,992) were significantly differentially methylated (Wilcoxon signed rank test; FDR ≤ 0.05 and |∆β| ≥ 0.05). The overwhelming majority of these CpGs (139,662) were differentially methylated in the same direction in GECKO, GECKOsub and C3ARE (Figure 3.4). Of these sites, 102,203 (73%) had greater average DNAm in PBMCs and 37,459 (27%) had greater average DNAm in BECs. This corresponded with a greater median DNAm across all PBMC probes (68%, 68%) as compared to all BEC probes (47%, 50%) in both C3ARE and GECKO, respectively.

Figure 3.4 Tissue-specific differential DNA methylation was consistent across cohorts. Volcano plots of differential methylation analysis (run using a paired Wilcoxon signed rank test) between BEC and PBMC tissues for the C3ARE, GECKO and GECKOsub datasets. Vertical lines represent an effect size threshold of > 0.05 for absolute mean difference between tissues (BEC - PBMC) and the horizontal line represents the nominal p-value corresponding to an FDR < 0.05 in each cohort. CpGs in dark purple met the effect size and significance cut-offs independently in all three datasets (139,662 CpGs). GECKO -log p-values were ~5X greater than those of GECKOsub and C3ARE, likely due to sample size differences between datasets (n=79, n=16, n=16, respectively); y-axes were left unstandardized to display trends within each cohort.
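The two-part criterion applied above (FDR ≤ 0.05 together with an absolute mean DNAm difference of at least 0.05) can be illustrated with a minimal Python sketch. This is not the analysis code used in this study. The function names are hypothetical, the per-CpG p-values are assumed to have been computed beforehand with a paired Wilcoxon signed rank test, and the Benjamini-Hochberg procedure is assumed for FDR control, which the text does not specify.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differentially_methylated(pvalues, delta_beta, fdr=0.05, min_delta=0.05):
    """Indices of CpGs passing both the FDR and the |delta beta| cut-off.

    pvalues:    one p-value per CpG (assumed: paired Wilcoxon signed rank test)
    delta_beta: mean difference per CpG (assumed: BEC minus PBMC)
    """
    qvalues = benjamini_hochberg(pvalues)
    return [
        i for i, (q, d) in enumerate(zip(qvalues, delta_beta))
        if q <= fdr and abs(d) >= min_delta
    ]
```

Requiring both cut-offs means that a CpG with a very small p-value but a negligible difference between tissues is not reported, and neither is a CpG with a large but statistically unsupported difference.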
3.3.6 cis-mQTLs were present in previously published EWAS findings

To provide a granular categorization of CpGs measured on the 450K array, we overlapped CpGs that were identified as a) informative (i.e., variable across individuals and correlated between BECs and PBMCs) (8,140), b) differentially methylated between matched tissues (139,662), or c) under genetic influence (4,980; i.e., the number of unique CpGs associated with validated cis-mQTLs) across both GECKO and C3ARE cohorts. Of all CpGs associated with cis-mQTLs, 17.7% were informative and 76.2% were differentially methylated (Figure 3.5A). However, of the CpGs associated with cross-tissue cis-mQTLs (812 unique CpGs in total), 84.6% were informative and 58.8% were differentially methylated.

We then applied this categorization scheme to previously reported EWAS findings performed in pediatric BEC or PBMC tissues to provide an example of how the classification of CpGs can aid in the interpretation of such studies. We selected five published studies that used the 450K array in pediatric BECs or peripheral blood to assess DNAm variation associated with puberty, aging in early life, childhood psychotic symptoms, fetal alcohol spectrum disorder and autism spectrum disorder (217,231-234). By implementing our CpG classification scheme on their respective lists of significant EWAS hits, we found that cis-mQTLs, as identified here, accounted for 0.02-13.5% of significant CpGs reported in these five studies. Differentially methylated CpGs comprised the most represented type of CpG across all five studies, with only one study demonstrating an overlap of 24.3% with our identified informative sites (Figure 3.5B and Supplementary Table 3.2) (234). This suggested that the majority of DNAm associations identified in these EWASs were likely specific to peripheral blood or BECs, rather than shared across tissues.
Finally, we tabulated our CpG classifications across all 419,507 DNAm probes assessed in our study to serve as a resource for researchers wishing to compare their own EWAS results (Additional file 11: Supplementary File 1). Collectively, these findings reveal the importance of considering DNAm variability and concordance between tissues, as well as genetic influences on these patterns, when interrogating and interpreting EWAS findings from pediatric peripheral tissues.

Figure 3.5 Overlap and representation of identified CpGs in previously published pediatric EWAS findings. A) Venn diagram of CpGs identified as informative, differentially methylated between tissues, or underlying our set of validated cis-mQTLs. Scatterplots display three representative CpGs from the pairwise intersections between categories. B) Stacked bar plot showing the proportion of CpGs of each defined category represented in significant CpGs of various pediatric EWAS publications in BECs or PBMCs. (All = all categories; Differential = differentially methylated between tissues; Informative = informative CpG; Inform + Diff = informative and differential; mQTL = CpG associated with mQTL; mQTL+Diff = mQTL-associated CpG and differential; mQTL+Inform = mQTL-associated CpG and informative; None = not in any of the listed categories).

3.4 Discussion

In this study, we comprehensively compared genome-wide DNAm in BECs and PBMCs using matched samples from two independent pediatric cohorts. Moreover, we leveraged the strength of paired DNAm and genotyping profiles to define cis-mQTLs across the genome and assess the influence of local genetic variation on DNAm variability and tissue concordance. Our findings showed that at the genomic and site-specific level, BECs had greater inter-individual DNAm variability than PBMCs, with highly variable CpGs more likely to be positively correlated between the matched tissues.
In our subsequent cis-mQTL analyses, we observed distinct genetic influences on tissue-specific DNAm and confirmed that a sizeable proportion of shared DNAm patterns between tissues resulted from allelic variation. Finally, we provided a classification framework for the post-hoc examination of EWAS associations and examined the representation of our categorized CpGs in published EWAS findings performed in pediatric BECs and PBMCs.

Our findings highlighted extensive differences in DNAm patterns between tissues and thus the importance of tissue selection when designing an EWAS. To a large extent, EWAS tissue selection in early life cohorts is guided by two factors. Firstly, ease of collection is particularly important in this age range and may restrict tissue availability. Buccal swabs are less invasive than intravenous puncture, and the latter contributes to participation refusal in pediatric cohorts (242). Secondly, the relevance of the tissue to the phenotype or exposure being tested represents an important consideration for all EWAS analyses, irrespective of age. As peripheral blood represents a circulating tissue with broad immune and inflammatory functions, it might be more relevant to a wider range of health phenotypes than BECs. However, another hypothesis posits that tissues that arise from the same germ layer are more epigenetically similar and thus might be a preferred choice for surrogate tissue selection (243). For example, it has been proposed that BEC DNAm may more closely reflect brain DNAm than blood DNAm does, as both BECs and brain derive from the ectodermal germ layer (93,234). Adding to the complexity of this issue, we found that BEC DNAm had significantly greater inter-individual variability than PBMC DNAm at the genome-wide level and at the site-specific level, a finding consistent with adult BECs and PBMCs (99).
Having a higher proportion of variable CpGs might be desirable for EWAS analyses, as testing any tissue with little inter-individual DNAm variation would naturally limit effect sizes. From this perspective, BECs might represent a more appropriate choice of peripheral tissue than PBMCs for population-based epigenetic studies. However, it is worth noting that while we did correct for cellular heterogeneity in both tissues using bioinformatic deconvolution approaches, the higher proportion of variable CpGs in BECs may, to some extent, be attributed to the increased diversity of cell types or residual cellular heterogeneity in BECs over PBMCs (i.e., epithelial and hematopoietic in the former and entirely hematopoietic in the latter) (224).

Taking advantage of our matched sample design, we were able to rigorously interrogate the extent of correlation between DNAm signatures of BECs and PBMCs. CpGs with greater variability were more likely to be correlated between matched tissues, as best exemplified by the 8,140 informative sites we identified. These may aid in the inference of unmeasured PBMC or BEC DNAm (when the other tissue is measured) as well as in the prioritization of sites for cross-tissue replication. In the latter case, cross-tissue replication typically involves the generation of candidate gene lists in accessible tissues for validation in less available tissues, such as post-mortem samples, an approach which can boost confidence in identified associations (244-246). There was a substantial overlap (45%) between our informative sites and those previously published in matched adult blood and brain tissues (225). However, we found only 1.9% of total measured CpGs to be informative by our measures and thresholds as compared to 9.7% found in the previous analyses of adult samples from our laboratory (225).
These quantitative differences might be due to a number of reasons, with the most likely being that the blood-brain informative sites were identified using a single cohort while our blood-buccal informative sites were filtered down to sites that were common across both GECKO and C3ARE cohorts; other explanations may be methodological (i.e., slight differences in analytical thresholds derived from empirical testing), or biological (i.e., blood may be more epigenetically similar to brain tissue than to BECs, resulting in more informative sites). An in-depth analysis of such cross-tissue comparisons between pediatric and adult samples, ideally by means of longitudinal sampling of DNAm, may help elucidate such sources of tissue variation across the lifespan.

Integration of genetic and epigenetic information may further clarify the relative contribution of genetic and environmental factors to inter-individual DNAm variability. We found that genetic variation contributed to both inter-individual DNAm variation within a tissue, as well as common DNAm variation between tissues. This is in general agreement with previous findings that show that many, but not all, mQTLs have consistent effects across tissues and human populations and are generally depleted in genomic regions which tend to have low DNAm variability, such as promoters and CpG islands, but enriched in more variable intergenic and intragenic regions (91,177,213,216,247). It is currently unclear why we observed more BEC-specific mQTLs in our matched design as compared to PBMC-specific or cross-tissue mQTLs. The most likely explanation is that BECs contained more validated cis-mQTLs due to greater inter-individual DNAm variability. It is also tempting to speculate that allelic variation contributes more strongly to DNAm in BECs than in PBMCs, because blood DNAm might be more plastic and responsive due to the role of blood cells in the immune system (248-250).
For example, changes in genome-wide transcriptional programs and DNAm profiles are observed in response to an inflammatory stimulus in blood leukocytes, which could be incongruent with a high degree of fixed, genetically-driven DNAm patterns in these cells (248-250). In a more complicated paradigm, DNAm variation may be best explained by the interaction of both genetic and environmental factors (GxE interactions), as previously demonstrated in blood-based DNAm profiles (137,148).

As touched upon in several recent reviews, genetic contribution to DNAm might be more prominent in shaping the DNA methylome than initially anticipated, and thus affect the analysis and interpretation of EWAS findings (94,251). To illustrate this, we tested for the presence of our categorized CpGs in published EWAS findings. Notably, we found that while most identified EWAS associations may be distinct to the tissue in which they were examined, in some instances, these associations may be reflected across multiple tissues and/or under genetic influence. For example, we observed CpGs associated with autism spectrum disorder to contain the highest proportion of cis-mQTLs. While there might be a number of reasons for this, it is possible that the proportion of genetically-influenced CpGs found in an EWAS may be proportional to the heritability of the phenotype under examination, although such hypotheses will require rigorous testing in large cohorts across a diverse spectrum of phenotypes with and without heritable contributions. Furthermore, it is difficult to discern whether having a high proportion of mQTLs in EWAS analyses is favourable or not. Previous work has shown that the majority of variably methylated regions are best described by an interaction of both genetic and environmental factors (148,246).
Emerging findings from neonatal blood samples have additionally shown that the bulk of variable DNAm sites are best accounted for by either additive (G+E) or interaction (GxE) models, suggesting that environmental influences on DNAm may be further delineated with the inclusion of genotype information (148,252). As such, any mQTL-associated CpGs found in an EWAS may offer alternate interpretations to phenotypic associations with DNAm and would require further investigation for potential gene-environment effects.

It is worth noting that our study had a few inherent limitations. Firstly, in both GECKO and C3ARE cohorts, PBMCs were collected from individuals at a slightly later time point than BECs, resulting in an age-related difference (0-1.5 years for C3ARE; 0.5-2.3 years for GECKO) between matched tissues, which may have affected analyses of DNAm variability. However, we anticipate that age-related differences in DNAm variability are relatively small compared to tissue-specific differences, as our findings are consistent with previous work performed on age-matched tissues in adults (99). Another limitation was the relatively small sample size of our cohorts, which may have inflated type II error rates. We also chose not to assess distal genetic effects on DNAm (i.e., trans-mQTLs) due to the increased multiple testing burden, but rather prioritized cis-mQTLs as previous work has suggested these may be more functionally linked to nearby CpGs (85,148,229). As well, previous work in blood has shown that the proportion of DNAm variance explained by trans-mQTLs is much lower than that of cis-mQTLs (216). For these reasons, we examined SNPs that were directly measured and not imputed, as performed in other pediatric mQTL analyses, within a 5 kb window (148,166). As a result, we likely underestimated the number of mQTLs present in our tissues.
Future work using large cohorts will be required to clarify the contribution of distal genetic variants to DNAm in other peripheral tissues. In addition, our mQTL findings were limited by the coverage of the 450K array, which interrogates less than 2% of all DNAm sites across the genome, although this includes 94% of all mapped CpG islands. As such, it is generally biased towards CpG-dense promoter regions, which typically have limited inter-individual and inter-tissue variation (80,88,96,253). Finally, while we found the Houseman blood deconvolution method to perform well in our cohorts, evidence of substantial DNAm changes across the lifespan, especially during early childhood, necessitates the refinement of cell deconvolution methods, including adjusting for age, to allow for more nuanced estimation of cell types in early life (96,110,241).

The work here presents a comprehensive assessment of local genetic influences on DNAm in matched BECs and PBMCs, as well as a characterization of DNAm variability and concordance between paired pediatric tissues. Moreover, our results highlight a number of possible considerations for EWAS analyses, including the potential enrichment of mQTL findings following pre-filtering to variable CpGs to reduce the multiple testing burden and possible strategies to facilitate in-depth curation of EWAS hits. Such post-hoc examination of significant differentially methylated CpGs will hopefully support the interpretation of EWAS findings and aid in the prioritization of candidate associations for functional validation.

Chapter 4: Children’s biobehavioural reactivity to challenge predicts DNA methylation in adolescence and emerging adulthood

4.1 Introduction

From the earliest moments of life, children’s health and development are shaped by the qualities of their environmental contexts.
Processes termed “biological embedding” elucidate the possible mechanisms of such relations and describe how exposures to environmental adversity get “under the skin” to influence critical biological pathways affecting health across the lifespan (19,127). Epigenetic processes represent one model of biological embedding and have been increasingly recognized as a potential link between stressful childhood environments and later health outcomes (127,254). DNA methylation (DNAm) patterns associated with environment or experience are also influenced by other factors such as individual health behaviours, differences in temperament, and disease states (255). DNAm is the most studied epigenetic modification in human populations and consists of a methyl group addition to the 5’ cytosine of CpG dinucleotides (CpGs). Once believed to be a gene silencing epigenetic mark, DNAm is context- and location-specific and has been linked to increased, decreased, and unchanged gene activity (63,85,88). The complex mechanisms by which DNAm can alter gene activity include inhibiting or enhancing transcription factor binding to DNA, recruiting enzymes to alter histone modifications, and altering splice sites, among others (63,256). DNAm is most dynamic during fetal development, when epigenetic patterns play an integral part in the complex processes of embryogenesis (64,104), and rates of change generally stabilize in adulthood (110). However, adolescence is also understood to be a time of increased methylome alterations (65,110), though studies of DNAm changes during this developmental period are scarcer than those conducted in early childhood and later adulthood (241).
A growing body of research has revealed associations between exposures to early life environmental and psychosocial adversity and DNAm in accessible tissues such as buccal epithelial cells (BECs), saliva and peripheral blood (for an excellent review of the epigenetic patterns of traumatic stress, see Vinkers et al. (2015)) (257). For example, children reared in institutional environments show increased DNAm among many genes in peripheral blood mononuclear cells (PBMCs) and whole blood, as compared to children reared by biological parents (139,140). DNAm measured in tissues including PBMCs, saliva, and BECs also appears to be associated with early life experiences of low socioeconomic status (87), childhood maltreatment or deprivation (137,141-143) and maternal mental health problems during the perinatal period (144,145). In a prior study of this cohort conducted by our research team, exposure to maternal stress in infancy and childhood was associated with differential DNAm among offspring in mid-adolescence (146). Paternal stress in childhood was also associated with DNAm changes in mid-adolescence among female offspring only.

Beyond the influence of adverse early environmental experiences, epigenetic patterns may also be shaped by intra-individual biology. Genetic variation, for example, is a strong predictor of DNAm patterns (91,211,258). Allelic variation may alter individual susceptibility to adverse social and environmental conditions, leading to differences in DNAm (190). For example, an allelic variant of the FKBP5 stress-response gene altered whether adults who experienced childhood abuse or trauma also exhibited loss of DNAm at this gene (137). In addition to allelic variation, factors indexing an individual’s internal psychological and physiological state may also be associated with patterns of DNAm (259,260).

Empirical studies examining the association between individual-level phenotypic factors and epigenetic differences are scarce.
In two papers, measures of physiological reactivity to stress during infancy and childhood were associated with DNAm of BECs and placental cells (259,260), and physical aggression in early life has been shown to predict differential patterns of DNAm in T cells in adulthood (128,261). Recent research in a group of young rhesus macaques also showed anxious temperament to be associated with differentially methylated loci in the central nucleus of the amygdala (262).

A limited number of studies have thus examined associations among discrete biological and behavioural stress response parameters, psychological health, and epigenetic modifications. The inherent coupling involved in “mind-body relations,” however, suggests that a more comprehensive understanding might be gleaned from synthesizing the interrelations among individuals’ internal, individual-level biological and behavioural qualities into an integrated factor that could be examined for associations with DNAm. To this end, the current study derived measures of children’s biobehavioural response predispositions from the shared variation among temperamental traits, presyndromal behavioural symptoms, and autonomic reactivity to stressful laboratory challenges. Temperament has been defined as “constitutionally based individual differences in reactivity and self-regulation in the domains of affect, activity, and attention” (263). Such differences have established biological underpinnings and are known to influence children’s physiological and behavioural responsivity to environmental conditions (264-266). Both temperament and stress reactivity are related to the development of later forms of psychopathology and may act as antecedent, subclinical precursors, or endophenotypes (267-270).
When taken together, an integrative measure of temperamental traits, presyndromal mental health symptoms, and biological reactivity might plausibly provide a more powerful indicator of a child’s internal biobehavioural response predispositions than if those domains are explored independently. To my knowledge, no extant research has examined relations between biobehavioural responses and DNAm over time. In light of previous research, the present study examined prospective associations between early, internal differences in biobehavioural responses and later epigenetic modifications across two time points within a sample of 55 individuals from the Wisconsin Study of Families and Work (WSFW). This developmentally-oriented, longitudinal research project established a birth cohort from which data on child temperament, autonomic reactivity, mental health and DNA were collected at multiple time points from prenatal life to age 18. I anticipated significant relations between childhood biobehavioural response propensities (i.e., internal factors) and adolescent patterns of DNAm, paralleling prior work performed in my lab (and that of other investigators) documenting linkages between stressful life experiences (i.e., external, environmental factors) and DNAm. I examined the relations between early life biobehavioural measures and DNAm at two time points, 15 years and 18 years. Utilizing these rich longitudinal data, I ran an additional analysis of the temporal stability of DNAm, examining whether such stability is required for the longitudinal persistence of biobehavioural associations.

4.2 Materials and methods

4.2.1 Study sample

Participants in the current study were drawn from a WSFW subsample (n = 120) of children, parents, and teachers (154). Children were selected for that subsample to provide a balanced representation of high and low reported mental health symptoms (155).
The present analyses are based on a subset of 55 children who had complete data on temperament, mental health symptomatology, and ANS reactivity in the infancy, preschool, and kindergarten periods, and who provided DNA samples at ages 15 and 18 years (Table 1). Mann–Whitney U tests indicated no significant differences on any biological or behavioural measure between the 55 children in the present analysis and the larger WSFW subsample from which they were drawn (p > 0.05 at each variable). Of the 55 individuals, 19 were male and 36 were female. Mean family income measured at 12 months postpartum and preschool (4.5 years old) was $51,480 (median = $47,000) and $63,220 (median = $56,000), respectively. Six children were of non-Caucasian minority status. All children entered primary school in the same school year (in 1996). Ethics approval for the WSFW was obtained from the University of Wisconsin-Madison Institutional Review Board and informed consent was obtained from all participants.

4.2.2 Temperament, ANS, and mental health measures

A summary of all measures collected at each time point, organized by construct, can be found in Table 4.1.

Table 4.1 Mental health, temperament and ANS traits collected over seven years and included in the analysis

Parameter | Variable | Reporter | Age/time point | Instrument | Measure | Min.* | Median* | Max.*
ANS response | OG1-HR | observational | grade 1 | ANS stress reactivity | Heart rate reactivity (slope) | -1.30 | -0.23 | 2.85
ANS response | OG1-PEP | observational | grade 1 | ANS stress reactivity | Pre-ejection period reactivity (slope) | -2.34 | 0.41 | 1.13
ANS response | OG1-RSA | observational | grade 1 | ANS stress reactivity | Respiratory sinus arrhythmia reactivity (slope) | -2.31 | 0.29 | 1.20
ANS response | OG1-MAP | observational | grade 1 | ANS stress reactivity | Mean arterial pressure reactivity (slope) | -1.42 | -0.22 | 1.97
Temperament | MI-AN | mother | 12 months | IBQ | Approach negativity (activity level, distress to limitations) | -2.14 | -0.02 | 1.99
Temperament | MI-WN | mother | 12 months | IBQ | Withdrawal negativity (distress to novelty, startle) | -2.09 | 0.00 | 3.06
Temperament | MP-WN | mother | avg. 3.5 & 4.5 years | CBQ | Withdrawal negativity (fear, sadness, shyness) | -2.36 | 0.03 | 1.74
Temperament | OP-WN | observational | 4.5 years | LabTAB | Withdrawal negativity (fear, sadness, shyness) | -2.31 | -0.10 | 1.85
Temperament | OG1-WN | observational | grade 1 | LabTAB | Withdrawal negativity (fear, sadness, shyness) | -2.58 | -0.08 | 2.01
Temperament | MP-ANG | mother | avg. 3.5 & 4.5 years | CBQ | Approach negativity (anger) | -2.48 | 0.10 | 1.87
Temperament | OP-ANG | observational | 4.5 years | LabTAB | Approach negativity (anger) | -1.67 | 0.14 | 2.34
Temperament | OG1-ANG | observational | grade 1 | LabTAB | Approach negativity (anger) | -1.99 | -0.09 | 1.81
Mental health symptom | MK-INT | mother | kindergarten | HBQ | Internalizing (depression, separation anxiety, overanxious) | -1.34 | -0.19 | 2.82
Mental health symptom | TK-INT | teacher | kindergarten | HBQ | Internalizing (depression, separation anxiety, overanxious) | -0.85 | -0.43 | 3.61
Mental health symptom | MK-EXT | mother | kindergarten | HBQ | Externalizing (oppositional, conduct, overt aggression) | -1.67 | -0.10 | 2.59
Mental health symptom | TK-EXT | teacher | kindergarten | HBQ | Externalizing (oppositional, conduct, overt aggression) | -0.57 | -0.45 | 3.85

Note: M = mother-report, T = teacher-report, O = observed; I = infancy, P = preschool, K = kindergarten, G1 = grade 1; AN = approach negativity, WN = withdrawal negativity, ANG = anger, INT = internalizing symptoms, EXT = externalizing symptoms, HR = heart rate, PEP = pre-ejection period, RSA = respiratory sinus arrhythmia, MAP = mean arterial pressure.
* Reporting the standardized descriptives that were used for PCA.

During in-home assessments completed during first grade, children participated in a 15-minute standardized, developmentally-appropriate stress reactivity protocol (271). Briefly, the protocol consisted of challenges across social (interview with the child), cognitive (digit recall task), sensory (a taste identification task), and emotional (a fear- and sadness-eliciting movie clip) domains (271). Measurements of ANS activity, including heart rate (HR), pre-ejection period (PEP), respiratory sinus arrhythmia (RSA), and mean arterial pressure (MAP), were assessed via electrocardiography and impedance cardiography throughout the protocol (155,271). Autonomic reactivity was indexed as increases in HR and MAP and decreases in PEP (reflecting sympathetic activation) and RSA (reflecting parasympathetic withdrawal), relative to resting levels (155). All specific reactivity measures were computed as the slope of the ANS measure reactivity regressed on time.
Positive slopes on HR and MAP and negative slopes on RSA and PEP all indicated up-regulation in general ANS arousal (155).

Temperament was assessed via both maternal report and observational coding methods, capturing the specific domains of temperamental negativity that have been theoretically and empirically related to children’s physiological reactivity (272). First, mothers reported on the Infant Behavior Questionnaire (IBQ (273)) at age 1 year and a modified version of the Child Behavior Questionnaire (CBQ (274)) at ages 3.5 and 4.5 years. Both instruments have been widely validated for use in their respective target populations (273-276). Observational measures of children’s temperament were also collected using the Laboratory Temperament Assessment Battery (LabTAB), administered in a standardized fashion during home assessments at 4.5 years and in grade one. The LabTAB comprises 12 emotion-eliciting behavioural episodes that simulate everyday situations (e.g., a social interaction with an unfamiliar adult, waiting for a signal before eating a snack, and using fine motor skills at a toy workbench). These situations were used to evoke affective reactions within three domains: negative affectivity, positive affectivity, and behavioural control-regulation. All LabTAB assessments were videotaped and rated by two independent reviewers who coded facial, vocal, motor, behavioural and postural responses (277). To remain consistent with the temperament domains assessed by maternal report on the IBQ and CBQ, only coded behavioural observations of approach negativity and withdrawal negativity, elicited during negative affectivity episodes, were included in the present analyses. Maternal report and laboratory-based observations of temperament and behaviour that were not expected to relate to children’s physiological reactivity were excluded. For additional details of LabTAB episodes, administration, and coding, see Luby et al. (2002).
Both LabTAB- and questionnaire-derived temperamental measures have been shown to be reliable and internally consistent in this sample (277).

Children’s presyndromal, internalizing and externalizing symptoms were assessed in kindergarten using maternal and teacher reports on subscales from the MacArthur Health and Behavior Questionnaire (HBQ) (278,279). The HBQ internalizing and externalizing subscales are well-validated measures of emotion regulation and reactivity difficulties relevant to the present analyses of children’s biobehavioural reactivity (e.g., sadness, withdrawal, irritability, anger) (280,281). The validity and reliability of these measures have been previously established in this sample (281).

4.2.3 DNA methylation

BECs were collected from participants at ages 15 and 18 years using MasterAmp Buccal Swabs (Epicentre Biotechnologies) and were stored at -80 °C. Genomic DNA was extracted from buccal swabs using Buccal DNA Isolation Kits (Isohelix Ltd), then purified and concentrated with DNA Clean & Concentrator kits (Zymo Research). DNA quality was assessed by a NanoDrop ND-1000 (Thermo Scientific). 750 ng of genomic DNA underwent bisulfite conversion using the EZ DNA Methylation Kit (Zymo Research). Bisulfite-converted DNA was treated according to established protocols (Illumina) in preparation for loading onto microarrays. DNA from buccal swabs collected at age 15 years was assayed using the Infinium HumanMethylation27 BeadChip (27K array); previous findings on these data can be found in Essex et al. (2011). DNA from samples collected at age 18 was assayed using the next generation of this technology, the Infinium HumanMethylation450 BeadChip (450K array). For samples collected at age 15, 160 ng of bisulfite-converted DNA was whole-genome amplified, fragmented, and hybridized onto the 27K array. The 27K array analyzes DNAm at 27,578 CpGs, primarily at DNA sequences that map onto gene promoter regions.
Raw DNAm data from the scanned microarrays are available in the Gene Expression Omnibus (GEO) database under the accession number GSE25892 at: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE25892 (146). For samples collected at age 18, 160 ng of bisulfite-converted DNA was whole-genome amplified, fragmented, and hybridized onto the 450K array. The 450K array covers over 485,000 CpGs, representing 99% of all RefSeq genes, and includes ~90% of the CpGs that are on the 27K array. Data from microchips with hybridized DNA were input into Illumina’s Genome Studio software.

For the 27K array, DNAm data were background-adjusted and quantile-normalized, as previously described (146). Probes underwent rigorous quality control processes: they were assessed for detection p-value, number of underlying probes, and signal levels in each subject and replaced with “NA” in the subject of interest if they failed at any metric (total of 4,567 CpG measurements). One sample was removed based on poor quality data (>10% probes with a detection p-value > 0.05), leaving 109 samples, and one probe was removed due to poor quality data (detection p-value > 0.05 in more than 10% of individuals). The remaining samples had between 0 and 194 “NA” values (mean = 27.19, median = 13), which were replaced with imputed values using the “impute.knn” function in the R package “impute” (282). Probes on the X chromosome were removed as these CpGs differ by sex. Finally, probes were removed based on lack of inter-individual variability using existing data reduction methods. First, probes in which β values across all individuals were < 0.05 or > 0.95 were excluded (9,346) (283). Second, analyses were restricted to individuals with complete biobehavioural measures only (n=55), and any probes with a range < 0.05 β, as calculated between individuals within the 10th and 90th percentile, were removed (8,309). This left 9,922 probes for analysis (176).
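The two variability filters described above can be sketched in Python as follows. This is an illustrative sketch rather than the code used in this study: the function name is hypothetical, each probe is assumed to be represented as a list of β values (one per individual), and the "inclusive" interpolation method for the percentiles is an assumption, since the text does not state how the 10th and 90th percentiles were computed.

```python
from statistics import quantiles


def is_variable(betas, min_range=0.05):
    """Return True if a probe passes both variability filters.

    betas: beta values for one probe, one value per individual
           (hypothetical input layout; at least two values required).
    """
    # Filter 1: drop probes that are uniformly unmethylated or methylated.
    if all(b < 0.05 for b in betas) or all(b > 0.95 for b in betas):
        return False
    # Filter 2: require a 10th-to-90th percentile range of at least min_range.
    deciles = quantiles(betas, n=10, method="inclusive")
    return (deciles[-1] - deciles[0]) >= min_range


# Keep only the probes that pass (toy data, not from the study).
probes = {
    "probe_a": [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90],
    "probe_b": [0.50, 0.51, 0.52, 0.50, 0.51],
}
variable_probes = [name for name, betas in probes.items() if is_variable(betas)]
```

Using the range between the 10th and 90th percentiles, rather than the full range, keeps a single extreme individual from making an otherwise invariant probe appear variable.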
Although this variability cut-off was stringent, most probes measured on the 27K array are located within promoter CpG islands, which are the most highly stable CpGs in the genome (88). The 9,922 CpGs were annotated to 6,583 unique genes; 2,212 genes contained two or more variable probes.

Finally, the “detectOutlier” function in the lumi package was applied and no outliers were identified (n = 55) (164). DNAm data from the 450K array were background-adjusted and color-corrected in Genome Studio (Illumina). Three outlier samples were detected using the “detectOutlier” function and removed, leaving 52 individuals. This method assumes a single cluster of samples and removes samples that lie more than two times the median distance from the center of the cluster; it was applied after data visualization revealed visibly outlying samples (164). A total of 4,314 probes deemed low quality were removed. Additional probes that were removed included those that mapped onto sex chromosomes (1,216), assayed single nucleotide polymorphisms (SNPs; 64), or were found to cross-hybridize to sex or autosomal chromosomes (37,909) or SNPs (19,999), based on a previous annotation (165). Data were then normalized across samples using quantile normalization, followed by normalization of probe-type differences using Subset-quantile Within Array Normalization (SWAN) (167). An empirical Bayes method (ComBat) was applied to correct for effects associated with the separation of samples into batches during the microarray experiments, specifically into 96-well plates, into microarray chips holding 12 samples per chip, and into the 6 rows present on each chip (173).

Of note, the DNAm data collected at ages 15 and 18 were run at different times and on different technologies (the 27K array and the 450K array, respectively), and these data were treated differently during preprocessing.
Specifically, the 450K array contains two probe types (type I and type II), which have different dynamic ranges and therefore different β value distributions; this was corrected for using SWAN. The 450K array was also corrected for batch effects using ComBat because of technical variability. The 27K array contains only type I probes, and batch effects as measured by plate, chip and row were not significantly correlated with the variables of interest (all p-values > 0.05), nor were they correlated with the first principal component of the DNAm array data, which accounted for over 90% of total variability. Therefore, neither SWAN nor ComBat was used on the 27K array at age 15. For consistency, CpG sites assayed on both the 27K array and the 450K array were mapped to genes using the Price annotation (165), while the Illumina HumanMethylation27 Manifest File was used for CpGs assayed only on the 27K array. At each CpG assayed on the microarray platforms, a β value ranging from 0 (completely unmethylated) to 1 (100% methylated) was calculated for each sample using the signal intensity from the scan. β values of all samples from the 27K array and 450K array were log-transformed into M-values prior to analysis to adjust for the heteroscedastic nature of β values (175). All results are reported in β values to facilitate biological interpretation.

4.2.4 Statistical analysis

Principal component analysis (PCA) is a multivariate technique designed to reduce the dimensionality of a large set of non-independent variables, while retaining important sources of variation in the dataset (284). In the present analysis, PCA was used to derive biological and behavioural response propensities that reflect the shared, underlying regulatory processes common across the measures of temperament, ANS reactivity, and internalizing/externalizing symptoms (Supplementary Figure 1).
PCA provided an effective analytic alternative to conducting separate analyses of the relations between the biological/behavioural measures and DNAm, which would have been statistically and conceptually problematic due to the increased rate of Type I error and the interrelatedness of the reactivity measures. PCA was run on the 16 scaled variables derived from measures of temperament, ANS, and internalizing/externalizing symptoms (Table 1), producing 16 principal components (PCs) that represent mathematically unique but conceptually overlapping aspects of the child’s biological and behavioural response propensities. Each PC (hereafter referred to as a biobehavioural reactivity factor) represents an independent axis of variation among the data, driven by different combinations of weights from the original measures. The first three biobehavioural reactivity factors were chosen for DNAm analysis based on examination of the scree plot, which indicated a clear “break” between the third and fourth components (285).

A number of covariates were evaluated for potential inclusion in the analyses. The sample was homogeneous in age (all participants entered preschool within the same school year) and ethnicity (87% Caucasian), precluding the need to control for these variables. Family income, measured at 9 months postpartum and at kindergarten, was also excluded as a covariate because it was not correlated with any of the three PCs (all p-values > 0.05). The effect of sex was examined post hoc.

The cell composition of buccal swabs can vary between individuals, altering DNAm patterns, and was therefore also considered for inclusion as a covariate. The percentages of underlying BECs and leukocytes were calculated from the DNAm profiles measured at age 15 using a cell deconvolution algorithm trained on BECs and saliva samples (171). Using this tool, the proportion of BECs in the age 15 cohort was estimated to range from 81% to 96% (mean 87.7%).
However, this proportion was not correlated with the measures of interest (p-values > 0.05) and was also excluded from further analysis.

Associations between biobehavioural reactivity and DNAm

Associations between the biobehavioural reactivity factors and DNAm at age 15 were examined using data from the 27K array. The sample of 9,922 variable CpGs was tested against the three biobehavioural reactivity factors using Spearman rank order correlations. P-values were corrected using the Benjamini-Hochberg method to estimate false discovery rates (FDRs), which limits the expected proportion of false positives and therefore reduces the number of Type I errors (179). The magnitude of change in DNAm across individuals, termed Δβ, was calculated using the slope of the regression line and reported for each differentially methylated CpG as a measure of effect size (87). CpGs that were significantly associated at an FDR-corrected p-value of 0.05 or smaller and had an absolute delta beta greater than 5% (|Δβ| > 5%) were reported as high confidence differentially methylated CpGs (146). CpGs with an FDR-corrected p-value between 0.05 and 0.2 and |Δβ| > 5% were reported as medium confidence differentially methylated CpGs (146).

To test whether associations identified at age 15 persisted at age 18, I examined associations between the biobehavioural reactivity factors and DNAm measured at age 18 using the 450K array. Because age was perfectly confounded with both microarray platform and processing methods, I chose not to directly compare β values at ages 15 and 18, and instead repeated the correlations using DNAm assessed at age 18. This tested whether variation across individuals was also associated with biobehavioural reactivity at the later time point. As described in more detail below, I focused on genes with multiple high and medium confidence CpGs found at age 15. Again, p-values were corrected using the Benjamini-Hochberg method to control the FDR.
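The multiple-testing correction and the two confidence tiers defined above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the analysis code: the function names are hypothetical, Δβ is expressed as a proportion (0.05 = 5%), and an FDR of exactly 0.2 is assumed to count as medium confidence, which the text does not specify.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of p * m / rank over all ranks at or above its own.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def confidence_tier(fdr, delta_beta, min_effect=0.05):
    """Classify a CpG as 'high', 'medium', or None (not reported)."""
    if abs(delta_beta) <= min_effect:
        return None  # effect size too small for either tier
    if fdr <= 0.05:
        return "high"
    if fdr <= 0.2:  # assumption: the upper bound is inclusive
        return "medium"
    return None
```

For example, a CpG with an FDR of 0.04 and a Δβ of -0.15 would fall in the high confidence tier, whereas the same FDR with a Δβ of 0.03 would not be reported.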
As an additional test of the strength of the associations, correlation coefficients between the PCA-derived biobehavioural reactivity factors and differentially methylated CpGs were permuted 100 times, generating a null distribution, and compared to the true correlation coefficients.

We also tested stability in DNAm from age 15 to age 18 using mixed effects models that evaluated the prediction of DNAm at age 18 from DNAm at age 15. This analysis was performed on a subset of the CpGs significantly associated with biobehavioural reactivity at age 15. As such, biobehavioural reactivity was included as a covariate.

Gene ontology (GO) analysis was performed using the software ErmineJ (286,287). The 9,922 CpG sites used in the analysis were annotated to genes as previously described, in order to generate a complete gene list or “background” from which to test for enrichment (288). Analysis was then performed using precision-recall and the following parameters: use the best scoring replicate, include only “Biological Process” related GO terms, minimum gene set size of 5, maximum gene set size of 100, and test the effect of multifunctional genes.

4.2.5 Pyrosequencing experiments

Pyrosequencing experiments were performed in order to confirm significant associations between children’s biobehavioural reactivity factors and DNAm in the DLX5 gene found at both age 15 and age 18. A pyrosequencing assay was designed to examine the DLX5 3’ CpG island; this assay spanned approximately 200 base pairs and included 5 CpGs, two of which are covered by the 450K array (cg12041387, cg08835113); cg12041387 is also covered by the 27K array. Of the 55 child samples used in this analysis, only 42 had enough remaining genomic DNA from samples collected at age 18 to conduct experiments. All reactions were run on a PyroMark Q96 MD Pyrosequencer, following the manufacturer’s protocol. All CpG loci passed Pyro Q-CpG software quality control.
Primer sequences used for DNA amplification and pyrosequencing are available upon request.

4.3 Results

4.3.1 Principal components analyses of biobehavioural reactivity

Following analyses, the first three principal components were examined to derive and understand the primary factors that described children’s biobehavioural reactivity. The first PC explained approximately 18% of the total variation, while the second and third PCs explained 13.5% and 11.5%, respectively (Figure 4.1a).

Maternal report of children’s temperament and behaviour loaded strongly onto the first PC, while teacher-report and observational measures of children’s biobehavioural reactivity did not (Figure 4.1b). Thus, the first PC distinguished mothers’ broad perspectives on children’s functioning from measures of children’s reactivity in more context- and stressor-specific settings. The third PC differentiated HR reactivity (an indicator of the activity of both the parasympathetic and sympathetic branches of the ANS) from RSA (a measure of parasympathetic activity only; Figure 4.1b). My review of the PCA results indicated that both the first and third principal components reflected variance attributable to method- and/or reporter-based differences, and thus I hypothesized that these components would not associate with DNA methylation.

4.3.2 Associations between biobehavioural reactivity factors and DNAm at age 15

As expected, when correlated with DNAm at all 9,922 CpGs measured at age 15, PC1 and PC3 displayed uniform p-value distributions, suggestive of a null distribution, and neither PC was significantly correlated with any individual CpG after FDR correction (Figure 4.1c).

Figure 4.1 Results of principal component analysis revealed biobehavioural reactivity as a biologically driven composite measure. (a) Plotting the percent variability of the principal components (PCs) showed a flatter distribution than what is typically expected when running PCA on psychological variables.
(b) Loadings of original variables onto PC1, PC2 and PC3 (from left to right). (c) p-value distributions of genome-wide correlations between DNA methylation and PC 1 to 3 (left to right) (n = 55).

[Figure 4.1 axes. (A) x: principal components; y: percentage of explained variance. (B) Loadings on PC1, PC2 and PC3 for OG1-HR, OG1-PEP, OG1-RSA, OG1-MAP, MI-AN, MI-WN, MP-WN, OP-WN, OG1-WN, MP-ANG, OP-ANG, OG1-ANG, MK-INT, TK-INT, MK-EXT and TK-EXT. (C) x: Spearman p-value; y: number of CpG sites.]

The second PC represented individual differences in children’s biobehavioural inhibition and disinhibition, ascertained across teacher-, maternal-, and laboratory-based observational measures (Figure 4.1b). Observed measures of withdrawal negativity and internalizing symptoms loaded positively onto PC2, whereas anger and externalizing symptoms loaded negatively. Measures of autonomic reactivity also showed moderate loadings on PC2: HR loaded positively, while PEP, MAP and RSA loaded negatively.
Thus, PC2 integrated both behavioural and biological response characteristics (across reporters, contexts, and stressors), distinguishing inhibited children (higher scores) from disinhibited children (lower scores), and is hereafter referred to as Biobehavioural Inhibition/Disinhibition (BID). Correlations between BID and DNAm showed a p-value distribution skewed toward small values, deviating from the distribution that would be expected by chance and suggesting an association with age 15 DNAm (Figure 4.1c). Examining this association more closely, I found that BID was significantly associated with 12 CpGs at an FDR cut-off of 0.05 (“high confidence” CpGs), and an additional 81 CpGs were associated at an FDR between 0.05 and 0.2 (“medium confidence” CpGs). Therefore, this multi-method, multi-reporter composite trait of early life biobehavioural reactivity showed an observable and statistically significant DNAm signature. To check the robustness of my findings, a linear regression was also run on all 9,922 CpGs, using BID as the explanatory variable and including sex and minority status as covariates. The significant CpGs were largely stable across statistical tests (Supplementary Figure 4.2). Given the lack of covariate effects and my interest in the basic bivariate associations, I focus on the results of the Spearman correlations.

Next, genomic locations of the 93 high confidence and medium confidence CpGs differentially methylated by BID were examined. These CpGs mapped to multiple genes, including the imprinted genes GNAS complex locus (GNAS) and insulin-like growth factor 2 (IGF2), and genes related to neurotransmitter secretion, including vesicle-associated membrane protein 5 (VAMP5) and otoferlin (OTOF). However, after running gene ontology analysis on genes ranked by p-value of associated CpGs, I did not find that genes containing differentially methylated CpGs could be classified by shared biological processes.
All genes contained only a single differentially methylated, high or medium confidence CpG, with the exception of four: DLX5 (distal-less homeobox 5), IGF2, MYO16 (myosin XVI), and PRUNE2 (prune homolog 2) (Figure 4.2). Nine CpGs mapped to the DLX5 gene, located within the coding region of the gene between 1.5 and 4 kb downstream of the transcription start site. Five were high confidence CpGs (FDR < 0.05) and four were medium confidence (FDR 0.05-0.2). The nine CpGs were contiguous except for one interrupting CpG that fell just outside of the medium confidence threshold (cg02101486, FDR p-value < 0.209). These CpGs thus constitute a differentially methylated region (DMR), a region in which multiple, adjacent CpGs exhibit associations with BID. All 9 differentially methylated CpGs within DLX5 correlated negatively with BID scores, with Spearman’s rank order correlation coefficients (rhos) ranging between -0.60 and -0.44. The nine high and medium confidence CpG sites showed large DNAm differences between individuals with the highest and lowest BID scores; the |Δβ| ranged from 9% to 38% (149).

Figure 4.2 Schematic of DLX5, IGF2, MYO16 and PRUNE2 genes, which each contained two or more high or medium confidence CpGs significantly associated with Biobehavioural Inhibition/Disinhibition. Green bars in gene schematic represent CpG Islands and gray lines or boxes represent genomic locations of CpGs plotted. Scatter plots of individual CpG DNA methylation are colored by sex (males = blue, females = pink) (n = 55).
[Figure 4.2 panels, titled by distance from the transcription start site (bp) and CpG ID; x-axis: Inhibition/Disinhibition (PC2); y-axis: DNA methylation. DLX5: 1679 cg08878323, 1768 cg24115040, 2020 cg11500797, 2228 cg18873386, 2557 cg27016494, 2862 cg13462129, 3032 cg20080624, 3634 cg00503840, 3972 cg12041387. IGF2: -3620 cg11005826, -169 cg21237591. MYO16: -452 cg14396117, 4 cg18946226. PRUNE2: -9629 cg11880010, -9507 cg19282250.]

IGF2, PRUNE2 and MYO16 each contained two probes significantly associated with BID. IGF2 contained one high confidence CpG (cg11005826, FDR p-value < 0.04, rho = -0.52, Δβ = 15%) and one medium confidence CpG (cg21237591, FDR p-value < 0.13, rho = -0.45, Δβ = 9%), which were negatively associated with BID and were located 3620 bp and 169 bp upstream of the transcription start site, respectively. PRUNE2 contained two medium confidence CpG sites that were positively correlated with BID scores (cg11880010, FDR p-value < 0.09, rho = 0.47, Δβ = 18%; cg19282250, FDR p-value < 0.15, rho = 0.42, Δβ = 15%), located 9620 and 9507 bp upstream of the transcription start site, respectively. Finally, MYO16 contained two medium confidence CpGs. One CpG, located 452 bp upstream of the transcription start site, was positively correlated with BID (cg14396117, FDR p-value < 0.08, rho = 0.48, Δβ = 10%), and one CpG, located 4 bp downstream of the transcription start site, was negatively correlated (cg18946226, FDR p-value < 0.13, rho = -0.44, Δβ = 7%).
To ensure that effect sizes of the high and medium confidence sites were not being inflated by individuals with extreme DNAm values, correlations with BID were recalculated after a 90% winsorization (5% was modified from each tail) of all medium and high confidence CpGs. Results remained largely unchanged, and the mean difference between p-values generated before and after winsorization was 6.66 × 10⁻⁶ (median = -6.84 × 10⁻⁷, 1st quartile = -1.67 × 10⁻⁵, 3rd quartile = 1.63 × 10⁻⁵). Prior to winsorization, nominal p-values ranged from 1.06 × 10⁻⁶ to 1.97 × 10⁻³; after winsorization, nominal p-values ranged from 9.52 × 10⁻⁷ to 2.1 × 10⁻³. Similarly, |Δβ| differences were minor; effect sizes changed by -4.05 × 10⁻³ on average after winsorization (median = -1.71 × 10⁻³).

To add an additional level of rigor to the analyses and to test that these correlations did not occur by chance, I permuted the BID scores 100 times and correlated the permuted scores with the 15 high and medium confidence CpGs mapping to DLX5, IGF2, PRUNE2 and MYO16 (Supplementary Figure 4.3). With the exception of cg14396117, all of the true correlation coefficients were significantly greater than correlations expected by chance (p-value < 0.01). The correlation coefficient for cg14396117, located in MYO16, fell outside the 99th percentile of the null distribution (p-value < 0.02). Therefore, these findings were unlikely to be spurious, but rather reflected significant associations between DNAm of DLX5 and childhood reactivity, as measured by BID.

4.3.3 Examination of sex differences in correlations between biobehavioural reactivity factors and DNA methylation at age 15

Given previously observed differences between sexes in DNAm, temperament, and mental health symptoms, correlations between BID and the 93 BID-associated CpGs were re-examined separately within males (n = 19) and females (n = 36) (289,290).
In general, the correlation coefficients (rhos) generated in females were similar to the coefficients generated when including both sexes in correlations (mean absolute difference in rho = 0.07) (Figure 4.3 top panel); however, the correlation coefficients generated in the males differed more strongly (mean absolute difference in rho = 0.13). To test whether this sex difference was driven by sample size, females were subsampled 100 times down to the sample size of males (n = 19) and correlations were rerun. The correlations were predominantly stable (mean absolute difference in rho = 0.02) (Supplementary Figure 4.4), suggesting that the difference between sexes was not entirely driven by sample size.

Additionally, I directly compared the DNAm values of males to females and assessed whether high and medium confidence CpGs differed by sex. A Mann–Whitney U test was run on all 9,922 CpG sites to test for CpGs differentially methylated by sex, revealing 192 such sites (FDR p-value < 0.05), including five CpGs significantly associated with BID (cg01796228, cg09340639, cg09565688, cg11005826, cg15731815). These mapped to the genes LIFR, FCRL1, CLEC12B, IGF2, and RNF207. In all five CpGs differentially methylated by sex, associations with BID remained significant in females (all p-values < 0.05) but were no longer significant in males (all p-values > 0.05) (Figure 4.3 bottom panel).

Figure 4.3 Correlations between DNA methylation at age 15 and Biobehavioural Inhibition/Disinhibition in males and females. (top) Spearman’s correlation coefficients of 93 high and medium confidence CpGs, calculated in full cohort (gray circle, n = 55), females only (pink square, n = 36) and males only (blue triangle, n = 19). (bottom) 5 CpGs which were associated with Biobehavioural Inhibition/Disinhibition and differentially methylated by sex. In all CpGs, correlations remained significant in females only but lost significance in males (n = 55).
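The subsampling check reported in this section (drawing females down to the male sample size and rerunning the correlations) can be sketched as below. This is a minimal standard-library Python illustration, not the analysis code. The function names are hypothetical, and it assumes that each subsample's rho is compared with the rho from the full female group, which the text does not state explicitly.

```python
import math
import random


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)


def mean_subsample_shift(x, y, size, n_draws=100, seed=0):
    """Mean absolute difference between full-group rho and subsample rhos."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    full_rho = spearman_rho(x, y)
    indices = list(range(len(x)))
    shifts = []
    for _ in range(n_draws):
        chosen = rng.sample(indices, size)  # draw without replacement
        sub_rho = spearman_rho([x[i] for i in chosen], [y[i] for i in chosen])
        shifts.append(abs(sub_rho - full_rho))
    return sum(shifts) / n_draws
```

Under this reading, `x` would hold the BID scores of the 36 females, `y` their DNAm values at one CpG, and `size` would be 19; a small returned value indicates that the reduced sample size alone changes rho very little.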
[Figure 4.3 axes. (A) x: medium or high confidence CpG; y: rho; legend (sex): both, female, male. (B) Panels for cg01796228, cg09340639, cg09565688, cg11005826 and cg15731815; x: Biobehavioural Inhibition/Disinhibition (PC2); y: DNA methylation.]

We then examined potential sex differences in the biobehavioural reactivity-DNAm relation within the DLX5 gene only. Correlations with all nine CpGs remained significant in females, with correlation coefficients ranging from -0.38 to -0.63 (p-value < 0.05). In males, two CpGs, cg12041387 and cg11500797, remained significant, with correlation coefficients of -0.52 and -0.49, respectively (p-value < 0.05). In sum, only five of the 93 CpGs reported (cg01796228, cg09340639, cg09565688, cg11005826, cg15731815) were differentially methylated by sex, suggesting that DNAm patterns linked to BID are not likely driven by sex-specific differences.

4.3.4 Persistence of associations between BID and DNAm at age 18

Given the strong associations between BID scores and multigenic DNAm at age 15 years, I hypothesized that those same associations would hold three years later, when participants were 18 years old. Using the 450K data at age 18, I examined all variable CpGs annotated to the four genes of interest to take advantage of the added coverage of the 450K array, as compared to the 27K array.
I examined correlations between BID scores and 39 CpGs in DLX5, 35 in IGF2, 9 in PRUNE2, and 37 in MYO16; this included all CpGs in those genes found to be significant at age 15.

A total of 15 probes at age 18 were found to be significant after FDR correction for the 120 tests (FDR p-value < 0.05 and |Δβ| > 5%) (Figure 4.4). As before, the BID correlations at these differentially methylated CpGs were permuted 100 times to create null distributions, and all observed correlation coefficients fell outside of the 97th percentile of the null distribution (Supplementary Figure 4.4).

Figure 4.4 DLX5 and IGF2 DNA methylation remained significantly associated with Biobehavioural Inhibition/Disinhibition at age 18. (A) DNA methylation at age 18 in 15 probes located upstream and within the DLX5 gene. (B) DNA methylation at age 18 in 4 probes located upstream of the IGF2 gene. Panel titles represent distance from transcription start site, followed by CpG ID. Correlations found to be significant at age 18 are labelled with a red box; correlations found to be significant at both ages are labelled with a red/black box; all remaining CpGs were found to be significant at age 15 only. Males are plotted in blue; females are plotted in pink (n = 52).

[Figure 4.4 panels; x-axis: Biobehavioural Inhibition/Disinhibition (PC2); y-axis: DNA methylation. (A) DLX5: -1746 cg25076459, -788 cg04737114, -689 cg17083494, -639 cg01448276, 1679 cg08878323, 1768 cg24115040, 2020 cg11500797, 2228 cg18873386, 2557 cg27016494, 2606 cg15732768, 2862 cg13462129, 3032 cg20080624, 3475 cg20377305, 3634 cg00503840, 3736 cg15339231, 3820 cg00400832, 3951 cg08835113, 3972 cg12041387, 4047 cg09359114, 5003 cg10156846. (B) IGF2: -3620 cg11005826, -1467 cg02425416, -169 cg21237591, -137 cg11701022.]

The significant probes included 13 DLX5 CpGs, including two of the nine probes found significant at age 15 (cg12041387 (rho = -0.45), cg00503840 (rho = -0.39)). An additional four CpGs that were significant at age 15 had a p-value < 0.05 but did not pass FDR correction at age 18 (Spearman’s rho from -0.34 to -0.31, |Δβ| = 10-12%). There were an additional 11 significant CpGs at age 18 that were assayed only on the 450K array and thus not tested at age 15. These were located within the same region reported at age 15, as well as upstream of the transcription start site (600-1700 bp). Again, effect sizes were notable and ranged from 10% to 21%.

The remaining two age 18 probes significantly associated with BID were IGF2 CpGs (cg02425416 and cg11701022), with correlation coefficients of -0.40 and -0.47. These were not measured at age 15. Two IGF2 CpGs measured at age 15 that were significantly associated with BID (cg11005826 and cg21237591) were no longer significant at age 18.

4.3.5 Longitudinal stability in DNA methylation

Taking further advantage of the longitudinal DNAm data, I directly examined its stability between ages 15 and 18.
First, I compared the inter-individual ranges of 17,039 CpG sites run on both platforms, after removing low quality CpGs and those at which β values across all individuals were < 0.05 or > 0.95. A few sites differed substantially, an expected finding given that samples were taken three years apart. However, the average difference in inter-individual variation neared zero, suggesting no systematic difference across the study sample due to either age or microarray platform (median = -0.011, mean = -0.022) (data not shown).

We then examined the 15 CpGs found in DLX5, IGF2, PRUNE2 or MYO16 that were significantly associated with BID at age 15, to ask whether these sites would reflect the global trend in stability seen above. Using a linear mixed effects model, DNAm at age 15 was tested as the explanatory variable, predicting DNAm at age 18, and BID was included as a covariate. These regression models were significant for 13 of the 15 CpGs (FDR-corrected p-value < 0.05), with DNAm at age 15 explaining 8-24% of the variation in DNAm at age 18 (Supplementary Table 4.1). Although DNAm patterns at ages 15 and 18 were significantly associated, the change in median DNAm across the 15 CpGs ranged from -18% to +11%, with DNAm decreasing in 13 of the 15 CpGs three years later (Figure 4.5a). These findings suggested a lack of stability of DNAm from age 15 to 18.

4.3.6 Pyrosequencing experiments verified associations between Biobehavioural Inhibition/Disinhibition and DLX5 DNA methylation

Finally, pyrosequencing experiments were conducted to verify the DNAm patterns in DLX5 detected on the microarray platforms. All 5 probes in the pyrosequencing assay were negatively correlated with BID (Figure 4.5b), with correlation coefficients ranging from -0.47 to -0.52. However, DNAm levels assessed by pyrosequencing were consistently 5-15% lower than in the 450K array in the two CpGs measured on both platforms (Figure 4.5c).
Serial dilutions of artificially methylated and unmethylated samples run on pyrosequencing confirmed that the assay was not biased; this indicated that the array probes at these loci may exhibit a preference for binding methylated DNA over unmethylated DNA.

Figure 4.5 CpG stability across three years and pyrosequencing of DLX5 gene to verify DNA methylation findings. (A) Changes in DNA methylation in 15 CpGs measured across both ages. Red/black boxes indicate CpGs significantly associated with Biobehavioural Inhibition/Disinhibition at both ages; all remaining CpGs were found to be significant at age 15 only. (B) 5 CpGs assayed by pyrosequencing DNA collected at age 18 were associated with Inhibition/Disinhibition scores. Two CpGs assayed are identified by their 450K IDs (bottom). Three CpGs (top) were unique to the pyrosequencing assay and not covered by the array. (C) DNA methylation at cg12041387 and cg08835113 was strongly correlated in the data generated by pyrosequencing and the 450K array. Pearson correlation coefficients ranged from 0.84 to 0.86. However, pyrosequencing generated consistently lower DNA methylation values (n = 42).

[Figure 4.5 panels. (A) x: DNA methylation at age 15; y: DNA methylation at age 18; points colored by PC2. DLX5: cg00503840, cg08878323, cg11500797, cg12041387, cg13462129, cg18873386, cg20080624, cg24115040, cg27016494. IGF2: cg11005826, cg21237591. MYO16: cg14396117, cg18946226. PRUNE2: cg11880010, cg19282250. (B) Panels cg12041387, cg08835113, P3, P4 and P5; x: Inhibition/Disinhibition; y: DNA methylation (pyro). (C) Panels cg12041387 and cg08835113; x: DNA methylation (450K); y: DNA methylation (pyro).]

4.4 Discussion

Current research on the epigenetic alterations that accompany early adversity exposures has led to a growing set of studies supporting external, environmental influences on DNA methylation (DNAm). Much of this research has examined the relation between early childhood stressors and DNAm (45,127). However, variations in physiological and psychological functioning have also been shown to correlate with DNAm (259,261,262) and may reflect the linkage of a child’s internal reactivity to the environment through epigenetic signatures. Few empirical studies have tested this latter supposition, however, and those that have are mainly cross-sectional in nature. I sought to address these limitations by testing the hypothesis that children’s biological and behavioural response propensities would also be related to DNAm measured later in life. I also examined the persistence of relations between Biobehavioural Inhibition/Disinhibition (BID) and DNAm at age 15 and age 18 and the stability of DNAm at these CpGs.

Multiple-reporter and multi-method measures of early childhood temperament, behaviour, and ANS reactivity were input into a PCA examining early life biobehavioural reactivity factors.
Three components explained a significant percentage of the overall variance in temperament, mental health symptoms, and ANS reactivity and were further examined for their associations with DNAm at age 15. Two components, PC1 and PC3, were not associated with DNAm patterns at age 15, likely because they reflected method- and reporter-based variance rather than trait-like differences in children’s biobehavioural reactivity. The second principal component (PC2, BID), however, reflected the intersection of observed temperamental withdrawal, anger, autonomic reactivity, and internalizing/externalizing symptoms and was found to have both a broad DNAm signature across many genes and a particularly strong association with multiple sites within the DLX5 and IGF2 genes.

Elevated levels of children’s biobehavioural disinhibition (approach negativity, anger, externalizing symptoms) were associated with significantly higher DNAm in DLX5 and IGF2 at age 15 and at age 18 years. Conversely, those with greater childhood inhibition (fear, withdrawal negativity, internalizing symptoms, and heart rate response) showed lower DLX5 and IGF2 methylation (Supplementary Figure 6). While median DNAm within these CpGs changed over the course of 3 years, the association between children’s early biobehavioural reactivity and DLX5 and IGF2 methylation was maintained from adolescence into young adulthood, a period marked by significant changes across various domains of psychological development (291). This illustrates the paradoxically fixed yet dynamic nature of DNAm. Despite changes across the lifespan, there may be enduring patterns of DNAm in certain genes over time, particularly those associated with early biological and behavioural reactivity.

Given the tissue of origin in the current study (BECs), I can only suggest that these DNAm patterns represent a biomarker of behavioural reactivity.
However, Genotype-Tissue Expression (GTEx) RNA sequencing data (292) indicate that DLX5 is expressed in skin tissue and some brain regions, and it is possible that the magnitude of inter-individual DNAm differences in DLX5 reflects actual differences in gene transcription levels, affecting downstream biological processes (149). IGF2 is not expressed at significant levels in either skin tissue or brain regions, though this does not negate the potential for DNAm differences to be a biomarker of biobehavioural reactivity. Importantly, the results of the present research are preliminary, and validation of these findings in a larger sample, with more repeated measures of DNAm and reactivity, is needed.

DLX5 is a homeobox gene involved in neuron, craniofacial and bone development. Its protein regulates glutamic acid decarboxylases involved in the synthesis of gamma-aminobutyric acid (GABA), the chief inhibitory neurotransmitter in GABAergic neurons (293). If the differences in DNAm observed in this cohort correspond to regulatory epigenetic patterns in brain tissue, such differences could produce altered levels of GABA, corresponding to inhibited or disinhibited behavioural proclivities. Although GO analysis of all CpGs ranked by p-value did not find any significant enrichment of GABA-related genes, these results cannot fully rule out the involvement of GABA circuitry. Future research should explore the pathways underlying reactivity-DNAm associations.

DLX5 is also highly expressed in osteoblasts during embryogenesis and plays an important role in craniofacial development. Commensurate with my finding of temperament and behavioural response-associated differences in DLX5 methylation, aspects of temperament have been previously linked to the bizygomatic width of facial structure, i.e., the ratio of the facial diameter across the cheekbones to the vertical height of the head.
Specifically, four-month-old infants who showed propensities to biobehavioural reactivity had smaller bizygomatic widths (i.e., narrower faces) at 14 and 21 months than infants who were less reactive (294). Although the various functions of the DLX5 gene suggest that the protein may act as a regulator of the early development of inhibition and reactivity, further research is needed to confirm the relation between inhibition and DLX5 DNAm.

IGF2 is expressed in many fetal tissues and encodes a growth factor that primarily acts to promote overall growth during gestation via cell differentiation and proliferation. Unlike the DLX5 gene, the function of DNAm patterns in this gene has been well studied. IGF2 is imprinted, i.e., shows parent-of-origin specific expression, and is expressed only from the paternal allele (295). Aberrations in imprinting of this gene, as well as other nearby genes, are associated with Beckwith-Wiedemann and Silver-Russell syndromes (two congenital, growth-affecting conditions), as well as a number of cancers (296-300). While there are no published clinical features of Beckwith-Wiedemann relating to temperament, parents of affected children often describe them as more tenacious than their siblings (R. Weksberg, personal communication, December 2017). Previous studies of IGF2 methylation have also found associations with ADHD symptoms and prenatal maternal anxiety (301,302). Given that the BID measure was composed of measures of internalizing and externalizing symptoms, these previous associations between IGF2 methylation and mental health are commensurate with the identified relation between DNAm at this gene and BID.

The focus of this study was on early-life internal, individual differences in biobehavioural reactivity in order to extend research that has largely focused on relations between DNAm and external, environmental adversities.
However, the developmental influences of such internal and external factors do not operate in isolation from each other, as demonstrated by existing literature on environmental correlates of IGF2 methylation. In a previous study of this cohort, associations were observed between early parental stress and DNAm in adolescence (see Essex et al. 2011). BID was not associated with DNAm levels in CpGs found to be related to parental stress; however, early parental stress was significantly associated with CpGs in IGF2 and DLX5, although not those identified here. In sum, results of the present study provide a strong foundation on which future research can further explore relations among external environmental influences, internal biobehavioural factors, and DNAm patterns.

4.4.1 Limitations

Results of the present study must be weighed in light of several limitations. First, the current study’s assessment of DNAm at ages 15 and 18 years suggests at least short-term persistence of associations with biobehavioural reactivity, but I am unable to infer when differences in DNAm may have arisen in development. It is possible that the observed DNAm pattern was established during early embryogenesis in the ectodermal germ layer, in response to allelic differences and fetal exposures. If this were true, then both BECs and neurons, with their common ectodermal origins, might be expected to exhibit comparable patterns of DNAm. While it is possible that DNAm in tissues other than BECs played a causal role in the development of differing levels of biobehavioural reactivity, DNAm and reactivity may be related in at least two other ways. DNAm patterns could result from temperamental response predispositions leaving a chemical mark on the epigenome, or a third factor, such as genetics, could have affected both reactivity and DNAm.
Regarding the latter, I did not test the CpGs of interest for the influence of genetic variability, although such CpGs, termed methylation quantitative trait loci, are common in the genome (148,258). The relatively small and homogeneous nature of the sample precluded extensive testing of the influence of race and sex on relations between BID and DNAm, though the results were retained across models that controlled for the effects of covariates. Future longitudinal studies with larger, more heterogeneous samples are needed to build on the present findings and address more complex questions of antecedence, causality, and modifying factors.

Due to ethical and other conspicuous prohibitions on acquiring samples of brain tissue, I studied DNAm of BECs collected from oral swabs, rather than DNAm in neural tissues, where biobehavioural patterns of response originate. DNAm is tissue-specific in nature, and there are clearly many differences among the epigenetic marks measured in buccal epithelium and brain tissue. Current research in epigenetics is elucidating the level of DNAm concordance between brain and tissues/fluids commonly collected for DNA analysis, including saliva, BECs, and white blood cells (97,171,225). Although such research is ongoing, emergent literature suggests that DNAm located in specific genomic regions, such as CpG islands within gene coding regions, may be more highly conserved across tissues and therefore potentially informative of brain DNAm patterns (225,303).

My results reveal age-related DNAm changes from age 15 to 18 in DLX5, IGF2, PRUNE2 or MYO16, consistent with prior research identifying increased DNAm changes in brain tissue and blood from childhood to adolescence as compared to more minimal changes that occur across adulthood (65,110).
However, given the confounding of DNAm processing technology with age in this study, I cannot rule out that changes in DNAm between age 15 and age 18 resulted from differences between the 27K and 450K array technologies and differences in data processing and normalization. Additionally, differences in the cell composition of buccal samples between age 15 and age 18 may have contributed. A different study design will be required to test broadly how age is reflected in DNAm changes. Finally, to test my hypotheses related to longitudinal stability, I conducted, by necessity, a more constrained analysis of DNAm at age 18. Future analyses will explore the full range of data collected at 18 in relation to both early environmental and biobehavioural reactivity factors.

These limitations notwithstanding, the present study’s integration of pertinent information from different biological arenas (e.g., epigenetics, autonomic physiology) with psychological constructs is consistent with a new strategy in mental health research proposed by the National Institute of Mental Health (NIMH). The Research Domain Criteria (RDoC) framework guides mental health research to recognize the broader biological and psychological contexts of health and behaviour and to understand the phenotypic variability within clinical disorders (304,305). As such, advancing investigations into the relations among epigenetic differences, childhood behaviour problems, and patterns of stress responsivity could support earlier identification of inauspicious developmental and mental health outcomes.

In conclusion, this study revealed strong, prospective associations of observational measures of childhood inhibition/disinhibition with patterns of DNAm in BECs harvested at both mid-adolescence and early adulthood.
Though current analyses do not allow for firm inferences of antecedence and causality, such associations focus attention upon possible linkages between inhibition/disinhibition dimensions of children’s temperament and behaviour and the DLX5 and IGF2 genes that have diverse developmental and regulatory functions. These findings offer provisional evidence for a developmental, epigenetic biomarker of internal biobehavioural response predilections. As with many other developmental processes, epigenetic modifications integrate the complex interactions of environmental context and constitutional biology, providing insight into developmental trajectories and long-term health outcomes.

Chapter 5: Assessing early life patterns of DNA methylation with concurrent objective and subjective socioeconomic status

5.1 Introduction

Socioeconomic status (SES) is a powerful determinant of health across the life span and has been linked to chronic diseases such as diabetes, cardiovascular and respiratory disease, and premature mortality (20,306). Low SES experienced in infancy and childhood is especially potent, exhibiting graded associations with stunted physical, social-emotional and cognitive development, in addition to latent effects, such as higher rates of morbidity and mortality (19,38,307). These associations suggest that social experiences are “biologically embedded” and leave a physical residue or imprint on the body, which has wide-reaching effects on long-term health and well-being (308,309). Currently, we are only beginning to understand the mechanisms underlying this relationship, and have yet to elucidate which aspects of SES are causal, the timing of embedding, and the biological systems involved.

Recent research into the systems underlying embedding has uncovered two related occurrences: upregulation of inflammation and dysregulation of stress response pathways (310).
In both adults and children, low SES is associated with chronic activation of the hypothalamic-pituitary-adrenocortical axis, including higher cortisol output and greater expression of the glucocorticoid receptor and inflammatory cytokines (311-313). However, the persistence of such changes, even after an individual experiences an upward trajectory in SES, indicates that a more fundamental biological pathway is involved (87,314).

DNA methylation (DNAm), which typically involves the covalent attachment of a methyl group to the cytosine of a cytosine-guanine dinucleotide (CpG), is the most commonly studied epigenetic mark in humans (63,188). DNAm is paradoxically stable and plastic, in that it is mitotically heritable, maintaining patterns attributed to histological differences and genetic variation, but is also to some extent malleable by environmental exposures (94,190,315). However, epidemiological studies of DNAm patterns across human populations are complicated by the respective relationships between tissue specificity, genetic variation, ethnicity, age and DNAm patterning (150). The tissue from which the DNA is extracted is one of the strongest determinants of DNAm (95,192,316). During differentiation, cells acquire unique epigenomes which contribute to, and are affected by, unique transcription profiles and cellular identity (67,192). As such, genome-wide DNAm patterns from one tissue and different individuals are more closely related than DNAm patterns from one individual and multiple tissues (95,97,316). Moreover, blood cell composition may differ by SES or related phenotypes such as birth weight or adoption status (139,317).

Genetic variation is currently thought to account for 20-80% of DNAm variation (94,209-211). At the genome-wide level, individuals who share ethnicity or ancestry will have more similar DNAm patterns, in large part likely due to underlying genetic variation (91,318,319).
At the CpG level, inter-individual DNAm patterns have been associated with nearby or even more distant single nucleotide polymorphisms (SNPs); SNPs at which genetic variation is associated with DNAm are referred to as methylation quantitative trait loci (mQTLs) (91,177,213). With respect to the relationship between mQTLs and gene expression, most SNPs in the genome are predicted to influence DNAm and gene expression independently, rather than demonstrate a linear relationship (85,320). Finally, age has been strongly correlated with DNAm, such that DNAm can be used to accurately predict chronological age (241,321). The role of DNAm in aging is not well understood, but in early life, pre- and post-natal, DNAm is hypothesized to act as a critical regulator of development (65,67,322). Notably, prenatal environmental exposures have been consistently associated with DNAm changes in childhood, and these changes may mediate related long-term health outcomes (148,323,324). As early life DNAm patterns are especially dynamic, they may be more susceptible to alterations from environmental exposures (96,110,148). These features, as well as its fundamental role in regulating gene activity, make DNAm a plausible mechanism of biological embedding (314,315,325-327).

Multiple studies have demonstrated that DNAm is reliably associated with early life social class, irrespective of current SES, as well as related to inflammatory signaling and transcriptional profiles (35,87,314,327). In some cohorts, however, associations between early life SES and epigenetic patterns measured later in life depend on concurrent SES (328). Various components of SES, including family income, parental education, and family psychosocial adversity, have also been associated with concurrent differential DNAm, and the associated CpGs were enriched for genes related to immune function and development (325).
Taken together, these findings suggest that DNAm can be linked to different components of SES, measured at different times in the life course, and call for studies probing which underlying aspects of the SES experience, such as education or financial standing, are most predictive of DNAm.

Experiences often stratified by SES range from those at the family level, including maltreatment or abuse, and extend to neighborhood and city characteristics, dictating quality of housing, proximity to violence, etc. (29). Extensive research has attempted to tease apart how singular exposures and experiences relate to and affect health, but the underlying relationships are often obscured by the extensive variation in SES-related experiences. Beyond this, the same experience may be perceived and internalized differently by two individuals. This raises the question of how subjective vs. objective measures of SES compare in their relationships to later or concurrent health outcomes (49,329). Subjective measures of SES, such as an individual’s perception of their social rank, are reliably associated with health outcomes across many different populations, even when corrected for objective measures of SES (49,330,331). While previous studies have shown a significant but non-redundant relationship between traditional indicators of SES and subjective rankings of SES, the latter may encompass a broader picture of an individual’s experience, and therefore may be more closely related to his/her stress levels (332). In a sample of healthy Caucasian women, subjective social status was more consistently and strongly related to psychological functioning and health-related factors, including cortisol habituation to repeated stress, as compared to objective SES indicators (49). Increased subjective SES has also been associated with lower susceptibility to upper respiratory infection while controlling for objective SES (329).
These findings suggest that psychological perceptions of social status may be an important contributor to the SES-health gradient.

In children specifically, interactions with parents act as an important buffer of SES-related experiences in early life, mitigating a child’s experience of SES-related stress and disadvantage in the home (311,333). For example, social support from a parent significantly reduced children’s cortisol stress response, as compared to support from a stranger (334). Moreover, adults reared in low-SES households who experienced high maternal warmth have decreased pro-inflammatory signaling compared to adults from low-SES households who experienced low maternal warmth (311). Given that parental interactions act to “buffer” stress and adversity in children, this raises the question of how parents’ subjective experiences of SES, informed by stress and their own environmental sensitivity, relate to their children’s biological outcomes.

Based on extant research, I hypothesized that DNAm correlates of familial material resources, as measured by objective SES, would differ from DNAm correlates of parents’ subjective social standing, in a similar fashion to other biological outcomes. In the present study, I interrogated DNAm in a cohort of children aged 6-11 years old and compared concurrent DNAm from peripheral blood mononuclear cells (PBMCs) with objective and subjective SES, finding unique DNAm associations with each SES measure despite their interrelatedness. Using matched neonatal dried blood spots (DBSs), I also addressed the question of whether these correlates existed at birth. Finally, I performed a targeted analysis of the CpGs correlated with SES, testing for mQTLs to explore whether relationships between DNAm and SES, like those with other environmental exposures, are influenced by genetic variation. Taken together, these findings provide preliminary evidence of non-redundant DNAm loci associating with objective vs.
subjective SES, highlighting the importance of SES measure selection in epidemiological studies and the potential role of epigenetics in biological embedding.

5.2 Methods and materials

5.2.1 Study sample

Participants in the current study were recruited as part of the Gene Expression Collaborative for Kids Only (GECKO), a cross-sectional study of epigenetic profiles, SES and family adversity. Participants were 6 to 11 years old at the time of the first visit and were living in the Greater Vancouver Area; 49% were female (197 of 402) (Table 5.1). The present analyses are based on a subset of 63 participants who provided both peripheral blood and saliva samples. Blood was collected between 6 months and 2.3 years after the first visit using Vacutainer® CPT™ Cell Preparation Tubes (Becton, Dickinson and Company, NJ, USA) and PBMCs were isolated within four hours of collection, as previously described (35). Isolated PBMC pellets were stored at -80°C prior to DNA extraction. Saliva samples were collected and preserved in Oragene collection kits (DNA Genotek Inc., ON, Canada). From a subset of 37 participants, I obtained neonatal DBSs collected from heel pricks using Whatman® Protein Saver cards (Sigma-Aldrich, MO, USA). Blood spots were collected and stored by the BC Newborn Screening Program. Ethics approval for GECKO was obtained from the University of British Columbia Institutional Review Board and informed consent was obtained from all participants or guardians.
Table 5.1 Sample characteristics for the full GECKO cohort and the subset presented in this study

Characteristic | Full cohort | Subset
sample size | 402 | 63
% female | 49% (197) | 43% (26)
age at first visit | 6 - 11; med = 9 | 6 - 11; med = 9
parent respondent | 350 mother, 52 father | 61 mother, 2 father
household income ($ CAD) | <5,000 - ≥200,000; med = 75,000 - 99,999 | <5,000 - ≥200,000; med = 100,000 - 149,999
net worth ($ CAD) | <500 - ≥500,000; med = 50,000 - 99,999 | <500 - ≥500,000; med = 50,000 - 99,999
highest education (years) | highschool (10) - graduate school (≥20); med = college (16) | highschool (11) - graduate school (≥20); med = college (16)
SES | 2.1 - 1.57; med = 0.04 | 1.76 - 1.57; med = 0.1
Position in Canada | 1 - 10; med = 7 | 3 - 10; med = 7

Note: med = median

5.2.2 SES measures

Socioeconomic information was collected during the first visit via a parent questionnaire regarding home life, income and educational background. Of the 63 participants used in the present analysis, 61 mothers and 2 fathers completed the questionnaire. A composite measure of objective SES was created by taking the average of three measures after transforming them to z-scores: household income, the highest education level of the two parents, and net worth.

To acquire household income information, the respondents were asked to record gross household income from the past year. Eleven options were provided which ranged from below $5,000 to $200,000 or greater and increased in $5,000 intervals. To determine net worth, respondents were asked to record how much cash would be gained after liquidation of their assets. Finally, respondents were asked to record the highest grade (in years) in school completed by either parent. Answers ranged from 11 years, which corresponded to high school, to greater than 20 years (graduate school). These measures were scaled and centered, creating z-scores, prior to averaging to create a composite measure of SES.
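The construction of the composite can be sketched as follows. This is only a minimal illustration in Python (the analyses in this chapter were performed in R); the function names and toy values are hypothetical, and scaling by the sample standard deviation is an assumption rather than a documented detail of the original analysis.

```python
from statistics import mean, stdev

def z_scores(values):
    """Center and scale a list of numbers (sample standard deviation assumed)."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]

def composite_ses(income, education, net_worth):
    """Average the three z-scored indicators for each family."""
    z_cols = [z_scores(income), z_scores(education), z_scores(net_worth)]
    # zip(*z_cols) regroups the columns into one (income, education, net worth) triplet per family
    return [mean(triplet) for triplet in zip(*z_cols)]

# Hypothetical toy data for three families:
# income bracket index, years of education, net-worth bracket index
income = [1, 5, 9]
education = [11, 16, 20]
net_worth = [2, 4, 6]
scores = composite_ses(income, education, net_worth)
```

Because each indicator is standardized before averaging, no single measure dominates the composite simply because of its units.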
To obtain a measure of subjective SES, the MacArthur Scale of Subjective Social Status was used (331,335). This social ranking instrument is a simple picture of a ladder with 10 steps and has been widely validated across different populations (335,336). Here, the parent respondent was asked to indicate on the ladder where his/her immediate family stands in his/her country (Canada) in terms of income, education, and occupation. Notably, the subset used in this analysis did not represent the full range of economic diversity of the GECKO cohort, as individuals with the lowest economic measures tended not to consent to providing blood samples.

5.2.3 DNA methylation data

Genomic DNA was extracted from PBMC pellets using the DNeasy kit (Qiagen, MD, USA). DNA yield and purity were assessed using a Nanodrop ND-1000 (Thermo Fisher Scientific, MA, USA). For DBSs, the GenSolve DNA Recovery Kit, GVR100 (IntegenX, CA, USA) was used to extract genomic DNA, followed by purification using the QIAamp DNA Micro Kit (Qiagen), concentration using Microcon DNA Fast Flow Devices, MRCF0R100 (Millipore) and quantification using a Nanodrop ND-1000. Next, 750 ng from each PBMC and DBS genomic DNA sample underwent bisulfite conversion using EZ DNA Methylation Kits (Zymo Research) and 160 ng of bisulfite-converted DNA was applied onto the Illumina Infinium HumanMethylation450 BeadChip (450K array) (Illumina Inc, CA, USA) per manufacturer’s instructions. PBMC DNA samples were run on 8 chips in one batch and DBS DNA samples were run on 5 chips in one batch. Both were scanned on an Illumina HiScan.

Following scanning of the microarrays, DNA methylation data were imported into Illumina’s GenomeStudio software for background-subtraction and color-correction; all subsequent processing steps and analyses were performed in R Version 3.2.1 (http://www.r-project.org). In R, probes were first assessed for quality control and genomic location.
Of the original 485,577 probes, 73,502 were removed because they had missing beta values in at least 5% of samples; were not detected above background in at least 5% of samples (detection p-value > 0.01); mapped to SNPs; mapped to the X/Y chromosomes; were polymorphic for SNPs; or were measured by cross-hybridizing probes (165). Following quality control, 412,075 probes remained in the dataset, which was then normalized using Beta Mixture Quantile dilation (BMIQ) to remove differences in dynamic range between Type I and Type II probes (166). PBMC samples were corrected for blood cell type proportion using the Houseman deconvolution method (169), which has been used in pediatric blood samples, including this cohort (153,197). The algorithm outputs the relative predicted proportions of six cell types (CD8 T cells, CD4 T cells, natural killer cells, B cells, monocytes and granulocytes), which were then used to regress out variability attributable to inter-individual differences in cell proportion (153). As granulocytes are not present in PBMCs, they were treated as a “negative control” of the extraction procedure. After correction, the Houseman method was re-run on the corrected DNAm data, yielding estimated cell proportions that were consistent across all samples (data not shown). The neonatal DBSs were not corrected for cell type heterogeneity for two reasons. First, neonatal peripheral blood contains nucleated red blood cells that would likely decrease the efficacy of a prediction method trained on adult blood cells. Second, CpGs tested in DBSs only included those found to be associated with SES in PBMCs irrespective of cell proportions. Data were corrected for microarray chip using ComBat to remove batch effects (173). Prior to analysis, data were filtered for inter-individual variability in the PBMC samples. For each CpG measured in PBMCs, the range between beta values at the 10th and 90th percentile was calculated, and those with a range greater than 5% were included in the analysis (176).
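The variability filter just described can be sketched as follows. This is only an illustration in Python, not the original R code: the function names and toy beta values are hypothetical, beta values are assumed to be expressed as fractions (so that 5% corresponds to 0.05), and percentiles are assumed to be computed by linear interpolation.

```python
def percentile(sorted_vals, p):
    """Linear-interpolation percentile (p in [0, 1]) of an already sorted list."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def reference_range(betas):
    """Difference between the 90th and 10th percentile of one CpG's beta values."""
    vals = sorted(betas)
    return percentile(vals, 0.90) - percentile(vals, 0.10)

def variable_cpgs(beta_by_cpg, min_range=0.05):
    """Keep CpGs whose 10th-90th percentile range exceeds min_range."""
    return [cpg for cpg, betas in beta_by_cpg.items()
            if reference_range(betas) > min_range]

# Hypothetical toy data: one nearly invariant CpG and one variable CpG
toy = {
    "cpg_flat": [0.50, 0.51, 0.50, 0.52, 0.51],
    "cpg_variable": [0.20, 0.35, 0.50, 0.65, 0.80],
}
kept = variable_cpgs(toy)
```

Using a percentile range rather than the full minimum-to-maximum range keeps single outlying samples from making an otherwise invariant CpG appear variable.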
Following this step, 107,295 CpGs remained.

5.2.4 Genotyping data generation and use in inferring population structure

DNA from saliva samples was extracted by ethanol precipitation and was purified and concentrated using DNA Clean & Concentrator (Zymo Research). Next, 200 ng of genomic DNA were amplified and hybridized to the Illumina PsychChip SNP genotyping array (PsychChip) (Illumina Inc, San Diego, CA) per manufacturer’s instructions. Chips were scanned and genotype data were imported into Illumina’s GenomeStudio using default parameters. Genotype calls were generated using Illumina cluster definitions with a GenCall threshold of 0.10. Genotyping data were then exported into R Version 3.2.1 for probe filtering. Each probe was assessed for genomic location and quality. Probes mapping to sex chromosomes, mitochondria, or no chromosome were removed; for quality control, probes with a low average GenCall score (< 0.1) or an unexpected distribution (10th percentile < 0.2) were removed. After these steps, 547,477 SNPs remained and were filtered for high pairwise linkage disequilibrium and high variability using PLINK software; this was performed to remove genomic patterns that could mask greater patterns related to population structure (337).

Next, the 223,089 SNPs that remained following SNP pruning were run through principal component analysis (PCA) as a means of data reduction. The first two principal components, which accounted for 65.1% of the total variation (62.9% and 2.2%, respectively) and had standard deviations greater than 1, were used in analysis as measures of population structure in the GECKO cohort.

Of note, self-reported ethnicity was collected but over 100 reporters identified their children as “Canadian”.
Most of these individuals clustered with individuals of European descent at PC1 and PC2 of the genotyping data, but many individuals reported to be “Canadian” clustered most closely with individuals of East Asian ancestry or multiethnic ancestry (Supplementary Figure 5.1). As such, these individuals were removed from comparisons between ethnicity and SES factors in section 5.3.1.

5.2.5 Statistical analysis of DNAm in PBMCs

First, two linear models were run on each of the 107,295 variable CpGs, using either objective or subjective SES as the explanatory variable and sex and PC1 and PC2 from the genotyping data as covariates (178). As age was not correlated with either SES measure, it was omitted from the linear model to preserve statistical power. Associations with an uncorrected p-value < 5x10^-4 and an effect size threshold of absolute delta beta > 5% (|Δβ| ≥ 5%) were followed up in downstream analysis. This threshold was chosen because no sites met strict criteria for significance, i.e., FDR-adjusted p-value < 0.05, but as this was a pilot study, lower thresholds were used to identify candidate sites (152). These sites, associated with objective or subjective SES, were tested for enrichment of underlying genomic features as compared to a “null” expected count of CpGs in each region from 1,000 iterations (205).

5.2.6 Gene ontology

Gene ontology (GO) analysis was performed using the software ErmineJ (287). The 412,075 CpGs that passed quality control and normalization in PBMCs were annotated to genes as previously described (165), generating a 450K GO annotation background. This background included sites removed prior to analysis due to low inter-individual variability, as these sites can also be considered as not related to the measures of interest and therefore should be tested against in enrichment analysis. Enrichment for GO terms was then performed using over-representation analysis.
The following ErmineJ parameters were used: include only “Biological Process” related GO terms, minimum gene set size of 5, maximum gene set size of 100, and test the effect of multifunctional genes. Reported p-values were corrected for both FDR and multifunctionality of GO terms.

5.2.7 Identification of differentially methylated regions

As multiple studies have reported that functionally relevant findings are likely to be associated with genomic regions rather than a single CpG, I chose to assess the genomic locations of SES-associated CpGs so as to identify underlying differentially methylated regions (DMRs) (81,338-340). Reported DMRs were required to contain at least two SES-associated CpGs, as determined by the previous linear regression, which could be no more than 1 kb apart. CpGs in a DMR were not required to be contiguous, as discordant CpGs have been discovered within DMRs, specifically in cancer, as well as in Chapter 4 of this dissertation (341). A window of 1 kb was selected as co-methylation has been found to exist most strongly within a distance of 1 kb (342). As well, methylation haplotype blocks identified from whole genome bisulfite sequencing data were all less than 1500 bp, with 99.98% 1000 bp or smaller in size (64).

5.2.8 mQTL analysis

For analysis of genetic influence on SES-associated CpGs, a list of mQTLs previously uncovered in the GECKO cohort was used (see Chapter 3). Briefly, using 74 GECKO PBMC samples, SNPs measured on the PsychChip were numerically coded as 1, 2, or 3, and correlated with any variable CpG within 5 kb using Spearman correlations. Pairs with an FDR-corrected p-value ≤ 0.05 and DNAm change per allele ≥ 2.5% in the discovery cohort (GECKO) were tested in the C3ARE cohort for validation. A total of 2,857 associated CpG-SNP pairs were validated and were then overlapped with SES-associated CpGs.
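The pairing and correlation steps described above can be illustrated with a minimal Python sketch. It is not the pipeline used in Chapter 3: the genotype labels, the SNP and CpG identifiers, the coordinates and the function names are all hypothetical, "within 5 kb" is assumed to mean an absolute distance of at most 5,000 bp on the same chromosome, and the p-value and per-allele effect size filters are omitted.

```python
def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def cis_pairs(snps, cpgs, window=5000):
    """Yield (snp_id, cpg_id) pairs on the same chromosome within `window` bp."""
    for snp_id, (snp_chr, snp_pos) in snps.items():
        for cpg_id, (cpg_chr, cpg_pos) in cpgs.items():
            if snp_chr == cpg_chr and abs(snp_pos - cpg_pos) <= window:
                yield snp_id, cpg_id


# Hypothetical genotype labels, coded numerically as 1, 2 or 3.
GENOTYPE_CODE = {"AA": 1, "AB": 2, "BB": 3}
genotypes = ["AA", "AA", "AB", "AB", "BB", "BB"]
codes = [GENOTYPE_CODE[g] for g in genotypes]
betas = [0.20, 0.22, 0.30, 0.31, 0.41, 0.40]  # hypothetical DNAm beta values

snps = {"rs_demo": ("chr1", 10_000)}
cpgs = {
    "cg_near": ("chr1", 12_500),
    "cg_far": ("chr1", 40_000),
    "cg_other": ("chr2", 10_100),
}

for snp_id, cpg_id in cis_pairs(snps, cpgs):
    print(snp_id, cpg_id, round(spearman(codes, betas), 3))
```

Ranking the coded genotypes rather than using them directly is what makes the correlation insensitive to the particular numeric labels chosen, provided their order follows allele count.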
SES-associated CpGs were also interrogated against mQTLs discovered in the Accessible Resource for Integrative Epigenomic Studies (ARIES) cohort (216). Blood samples collected from over 1000 children at age 7 years were used to generate these data, which were accessed through the mQTL database (http://www.mqtldb.org/). All mQTLs meeting a p-value threshold of 1 × 10⁻⁷ were downloaded, trans-mQTLs were removed and an effect size threshold of ≥ 2.5% DNAm change per allele was implemented, leaving 2,838 mQTLs associated with 28 unique CpGs for comparison.

5.2.9 Statistical analysis in dried blood spots

CpGs found to be correlated with objective or subjective SES in early life PBMC samples were analyzed in 37 neonatal DBS samples to assess whether the associations were present at birth. Of these 37 individuals with DBS samples, 32 were matched to the PBMC samples; however, all 37 samples were used to avoid further reduction of the small sample size. Linear models identical to those run on the PBMC DNAm data were used for the DBS DNAm data, and CpGs reported as potentially associated with SES had an uncorrected p-value < 0.05 and |∆b| > 5%. Benjamini-Hochberg corrected p-values were also reported (179). Following these analyses, samples were reduced to those with matched PBMC samples (n = 32) to compare effect sizes across both time points, i.e. in both PBMCs and DBSs.

5.3 Results

5.3.1 Objective and subjective SES measures in the GECKO cohort

To understand the relationship between objective and subjective SES in GECKO, a cross-sectional cohort of children aged 6-11 years, I began by analyzing these two measures in relation to each other and in relation to sex, age, and ethnicity/population structure (Figure 5.1). In this cohort, objective SES (hereafter referred to as CompSES, an abbreviation of composite SES) was calculated as an average of three parent-reported measures: highest education completed by either parent, household income, and net worth.
Subjective SES was measured using a parent-reported ranking of his/her social and economic position in Canada, on a scale from 1 to 10 (hereafter referred to as PosCan) (331). Correlations were performed in both the full cohort (n = 402; CompSES was not available for 22 individuals) and the subset of children who provided peripheral blood samples from which PBMCs were isolated (n = 63; CompSES was not available for 2 individuals). A summary of demographic information and SES measures can be found in Table 5.1.

Figure 5.1 Relations between sample characteristics and SES measures in full GECKO cohort were maintained in subset used in DNAm analysis. Correlations between CompSES, PosCan, age, sex and ethnicity (principal components 1 and 2 of genotyping array) in full cohort (A and B; n = 402, CompSES was not available for 22 individuals) and subset used in DNAm analysis (C and D; n = 63, CompSES was not available for 2 individuals). Colours and values in correlation heatmaps (A and C) represent Spearman correlation coefficients.
[Figure 5.1 image not reproduced. Heatmap variables: Ethnicity_1, Ethnicity_2, highest education, income, savings, CompSES, PosCan, age and sex, on a colour scale from −1 to 1; panels B and D plot PosCan (1 to 10) against CompSES.]

We uncovered a significant association between CompSES and PosCan in both the full cohort and the subsample (r = 0.52 and r = 0.6, respectively, p-values < 0.05).
In the full cohort, neither CompSES nor PosCan was associated with sex or age; however, in the subsample of families whose children provided blood samples, CompSES was associated with sex (r = 0.29, p-value < 0.05). As such, sex was included as a covariate in later DNAm analyses.

As DNAm reflects in part genetic ancestry, I built a molecular genetic classification of ancestry by applying PCA to data from a genotyping array (91,318). Of the full cohort, 304 individuals provided saliva samples, from which the extracted DNA was run on the Illumina PsychChip genotyping array to assess population structure. Using principal component analysis, population structure was defined by component 1 (PC1) and component 2 (PC2) (Supplementary Figure 5.1). In both the full cohort (n = 304) and the subsample (n = 63), PC2 was significantly correlated with PosCan (p-values < 0.05), but not with CompSES. Of note, PC2 mainly separated individuals who self-identified as East Asian or Southeast Asian from those who self-identified as Caucasian, suggesting a potential trend in the reporting of subjective SES, consistent with previous findings of subjective SES differing by demographic group (336). However, a t-test of PosCan in all Caucasian (n = 61) vs. Asian and Southeast Asian individuals (n = 64) did not reveal significant differences between these groups (p-value = 0.12). Taken together, these results indicated that while the full GECKO cohort was generally balanced for sex, age, and ethnicity across the SES gradient, the subset of individuals who gave blood samples was slightly more skewed with regard to these variables. These findings were incorporated into the design of the subsequent epigenetic analysis intended to elucidate the DNAm correlates of early-life objective and subjective SES.
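To make concrete how a leading principal component of genotype data can act as a measure of population structure, the following is a minimal Python sketch on a toy matrix. It is an illustration under stated assumptions rather than the analysis reported here: the allele-count coding (0, 1 or 2), the six hypothetical samples and the function name are invented for the example, and power iteration is swapped in for a full PCA routine, so only the first component is recovered.

```python
import math


def first_principal_component(rows, n_iter=200):
    """PC1 of a samples x SNPs matrix via power iteration.

    Returns (loadings, per-sample scores, fraction of total variance on PC1).
    """
    n, m = len(rows), len(rows[0])
    # Centre each SNP column on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centred = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Sample covariance matrix between SNPs (m x m).
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(m)]
           for a in range(m)]
    total_variance = sum(cov[j][j] for j in range(m))
    # Power iteration: repeated multiplication converges on the top eigenvector.
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the variance carried by PC1.
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(m))
                     for a in range(m))
    scores = [sum(c[j] * v[j] for j in range(m)) for c in centred]
    return v, scores, eigenvalue / total_variance


# Hypothetical allele counts (0, 1 or 2) for six samples at three SNPs.
toy_genotypes = [
    [0, 0, 2], [0, 0, 2], [0, 1, 2],  # hypothetical ancestry group 1
    [2, 2, 0], [2, 2, 0], [2, 1, 0],  # hypothetical ancestry group 2
]
loadings, scores, pc1_fraction = first_principal_component(toy_genotypes)
print([round(s, 2) for s in scores], round(pc1_fraction, 3))
```

In this toy example the two groups fall on opposite sides of zero on PC1, which is the property that allows per-sample scores to stand in for ancestry as covariates in a regression model.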
5.3.2 Subjective SES and objective SES tended to associate with different CpGs

DNAm in PBMCs from 63 participants was measured across ~485,000 CpGs genome-wide using the Illumina Infinium HumanMethylation450 BeadChip (450K, Illumina Inc, San Diego, CA). After filtering for non-variable probes and correcting for cell-type heterogeneity, 107,295 CpGs remained and were included in an analysis of DNAm correlates of objective and subjective SES. Notably, predicted cell type proportions were not significantly correlated with either PosCan or CompSES (data not shown).

Following data processing, I ran two linear regression models, using either PosCan (subjective) or CompSES (objective) as the explanatory variable, and included sex and PC1 and PC2 of the genotyping data as covariates. In cognizance of the small sample size, I used a liberal p-value threshold of 5 × 10⁻⁴ when reporting SES-associated CpGs (152). For PosCan, 98 CpGs met the criteria for associated loci (p-value < 5 × 10⁻⁴, |∆b| > 5%) and for CompSES, 63 CpGs met these criteria (Figure 5.2; Supplementary Table 5.1); no sites were significant following FDR correction, at a Benjamini-Hochberg corrected p-value of 0.05. A total of six CpGs were associated with both PosCan and CompSES (Table 5.2) and while this overlap was significantly greater than expected by chance based on 20,000 Monte Carlo simulations (p-value < 2 × 10⁻⁴), it was lower than anticipated based on the strong association between PosCan and CompSES.

Figure 5.2 Genome-wide linear regressions showed limited numbers of CpGs associated with each SES variable. Associations between PosCan (left) and CompSES (right) with DNAm at 107,295 variable CpGs. Volcano plots of each CpG showing effect size (∆b) vs. p-value. Points plotted in blue are CpGs found significant at an |∆b| > 5% (vertical lines) and p-value < 5 × 10⁻⁴ (horizontal lines).
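The logic of testing an overlap against chance by Monte Carlo simulation can be sketched as follows. This is a minimal illustration, assuming that each simulation draws two random CpG sets of the observed sizes (98 and 63) uniformly from the 107,295 variable CpGs; the exact null used in this chapter is not specified beyond the number of simulations, and the function name, seed and add-one correction are choices made for the example.

```python
import random


def overlap_p_value(n_total, n_a, n_b, observed, n_sim=20000, seed=1):
    """Monte Carlo p-value for an overlap at least as large as `observed`.

    Each simulation draws two random sets of sizes n_a and n_b from
    n_total items and counts how many items they share.
    """
    rng = random.Random(seed)
    population = range(n_total)
    hits = 0
    for _ in range(n_sim):
        set_a = set(rng.sample(population, n_a))
        set_b = rng.sample(population, n_b)
        overlap = sum(1 for item in set_b if item in set_a)
        if overlap >= observed:
            hits += 1
    # Add-one correction so that the estimate is never exactly zero.
    return (hits + 1) / (n_sim + 1)


# Set sizes and observed overlap as reported in the text.
p = overlap_p_value(107_295, 98, 63, observed=6)
print(p)
```

Because the expected overlap of two such random sets is far below one CpG, an observed overlap of six is very unlikely to be matched in any simulated draw, and the estimate is then bounded by the number of simulations rather than by the data.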
Table 5.2 CpGs associated with both PosCan and CompSES (|∆b| > 5% and p-value < 5 × 10⁻⁴)

CpG          CompSES p-value   CompSES ∆b   PosCan p-value   PosCan ∆b   Chromosome
cg02556042   2.59E-04          -0.101       3.78E-04         -0.105      12
cg02966332   4.50E-04          -0.059       2.60E-04         -0.064      20
cg06650914   2.29E-04          -0.053       1.18E-04         -0.060      18
cg07033961   1.56E-04           0.059       4.96E-04          0.057      5
cg18808777   2.05E-05           0.089       2.48E-04          0.083      6
cg20320656   2.07E-04          -0.125       4.17E-04         -0.118      12

Further examination of the CpGs associated with PosCan and with CompSES revealed a trend toward an inverse relationship between DNAm and SES. For CompSES, 48 of 63 CpGs exhibited a negative correlation and for PosCan, 80 of 98 CpGs exhibited a negative correlation. While one might speculate that lower SES has a repressive effect on gene activity via increased DNAm, the location of these CpGs relative to genes suggested otherwise. Both sets of associated CpGs were tested for enrichment in gene features and CpG islands relative to all CpGs that were analyzed (107,295 CpGs; Figure 5.3) (205). Using 1000 permutations of randomly selected CpGs, PosCan-associated sites were found to be significantly depleted at gene promoters, the regions at which the inverse relationship between DNAm and gene activity is strongest (FDR-corrected p-value < 0.05). By comparison, CompSES-associated CpGs did not significantly differ in their genomic locations from the “background” set of CpGs. Finally, I performed Gene Ontology analysis using ErmineJ to identify any enrichment of genes in shared biological pathways. For both PosCan and CompSES, no biological pathways were significant after correction for multiple testing and multifunctionality.

Figure 5.3 Limited enrichment of genomic features found in CpGs associated with CompSES or PosCan.
Representation of enrichment or depletion of various genomic features mapping to A) 98 CpGs associated with PosCan and B) 63 CpGs associated with CompSES. Height of bars denotes the fold-change between the actual CpG count in each genomic region and the mean count of randomly selected CpGs in that same genomic feature, based on 1000 iterations. Error bars show standard error (* denotes significant enrichment or depletion at a Benjamini-Hochberg corrected p-value < 0.05) (S = South; N = North).
[Figure 5.3 image not reproduced. Panels A and B; x-axis categories: CpG island features (S Shelf, S Shore, Island, N Shore, N Shelf, None) and gene features (Promoter, Intragenic, Three Prime, Intergenic); y-axis: fold change.]

5.3.3 Loci associated with objective SES contained more underlying DMRs than those associated with subjective SES

Given that the reported CpGs had an increased likelihood of being false positives due to the relaxed p-value threshold, I employed additional methods to improve the probability of reporting true associations. To that end, I took advantage of co-methylation (i.e. the dependent nature of DNAm at neighbouring CpG sites) and used a region-specific approach (342,343). Starting with the 98 PosCan-associated CpGs and 63 CompSES-associated CpGs, I identified differentially methylated regions (DMRs) containing two or more sites within 1 kb of each other. This uncovered four CompSES-associated DMRs and two PosCan-associated DMRs (Figure 5.4, Supplementary Table 5.2). Of the four CompSES-associated DMRs, the largest was composed of six CpGs, all within approximately 300 bp of each other and located on the island shore of a promoter CpG island upstream of ASCL2, achaete-scute family bHLH transcription factor 2. As well, one CompSES-associated DMR, composed of two CpGs within the HCP5 gene, contained one CpG (cg18808777) which was one of the six CpGs found to associate with both CompSES and PosCan.
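The region-calling rule described above (two or more SES-associated CpGs no more than 1 kb apart) can be sketched in Python as follows. This is a minimal illustration, not the code used for the analysis: it assumes a chaining rule in which each CpG must lie within 1 kb of the nearest associated CpG in the same region, and the CpG identifiers and coordinates are hypothetical.

```python
from collections import defaultdict


def call_dmrs(cpgs, max_gap=1000, min_sites=2):
    """Group associated CpGs into candidate DMRs.

    cpgs: iterable of (cpg_id, chromosome, position) for associated sites.
    Returns a list of (chromosome, [cpg_ids]) with at least `min_sites`
    CpGs, where consecutive CpGs are no more than `max_gap` bp apart.
    """
    by_chrom = defaultdict(list)
    for cpg_id, chrom, pos in cpgs:
        by_chrom[chrom].append((pos, cpg_id))

    dmrs = []
    for chrom, sites in by_chrom.items():
        sites.sort()  # order by genomic position
        current = [sites[0]]
        for site in sites[1:]:
            if site[0] - current[-1][0] <= max_gap:
                current.append(site)
            else:
                if len(current) >= min_sites:
                    dmrs.append((chrom, [cid for _, cid in current]))
                current = [site]
        # Close the last open region on this chromosome.
        if len(current) >= min_sites:
            dmrs.append((chrom, [cid for _, cid in current]))
    return dmrs


# Hypothetical associated CpGs: two close together, two isolated.
associated = [
    ("cgA", "chr1", 1000),
    ("cgB", "chr1", 1300),
    ("cgC", "chr1", 5000),
    ("cgD", "chr2", 1200),
]
print(call_dmrs(associated))
```

Unassociated CpGs lying between two associated sites are simply ignored by this grouping, which mirrors the decision not to require contiguity within a DMR.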
By contrast, both PosCan-associated DMRs were composed of two CpGs and the single DMR that mapped to a gene was intragenic to the HOXA6 gene, encoding homeobox protein Hox-A6. While I used a liberal threshold to define associated CpGs (uncorrected p-value < 5 × 10⁻⁴), the finding of DMRs associated with CompSES or PosCan strengthened the likelihood of these loci reflecting true associations.

Figure 5.4 CompSES found to be associated with more DMRs than PosCan. DMRs (differentially methylated regions) associated with CompSES (A) or PosCan (B). DMRs were identified as loci containing 2 or more CpGs associated with SES found no greater than 1 kb apart (nominal p-value < 5 × 10⁻⁴ and |∆b| > 5%). Represented are the four DMRs associated with CompSES, mapping to ASCL2, intergenic, CRYGD and HCP5, respectively, and two DMRs associated with PosCan, mapping to HOXA6 and intergenic.
[Figure 5.4 image not reproduced. Panel labels for CompSES: ASCL2 (cg03306615, cg09394785, cg11644479, cg13762320, cg24353535, cg26051413), intergenic (cg02315096, cg09460553, cg17797229, cg18865445), CRYGD (cg25356886, cg25429719) and HCP5 (cg18808777, cg25843003); for PosCan: HOXA6 (cg13710086, cg23129930) and intergenic (cg01443020, cg05928186). x-axis: CompSES or PosCan; y-axis: DNAm (beta).]

5.3.4 Few SES-associated CpGs were also driven by genetic variation

Given the emerging insight that DNAm patterns in variable CpGs are best modelled by an interaction between genetic variation and environment, I assessed the genetic contribution to the SES-associated CpGs (148). First, I utilized existing CpG-SNP associations previously discovered in GECKO PBMC samples and independently validated (see Chapter 3) (153). I intersected these mQTL-associated CpGs with those associated with PosCan or CompSES. Six of the 98 PosCan-associated CpGs were also associated with SNPs, i.e. mQTLs, and three of the 63 CompSES-associated CpGs were also associated with mQTLs (Figure 5.5).
Similarly, I compared the CpG sites to mQTLs discovered in the ARIES cohort, a more appropriately sized cohort for mQTL discovery, and found that 28 of the 98 PosCan-associated CpGs were associated with ARIES mQTLs, while just one of the 63 CompSES-associated CpGs was also associated with ARIES mQTLs (216).

Figure 5.5 Little overlap between CpGs associated with PosCan or CompSES and mQTL-associated CpGs. Overlap of SES-associated CpGs with previously reported mQTL-associated CpGs, identified in the GECKO cohort.
[Figure 5.5 image not reproduced. Panels A to C: overlap diagram of PosCan, CompSES and GECKO mQTL CpG sets, and DNAm (beta value) plotted by genotype (AA/AG/GG; CC/CT/TT) and by SES measure for the pairs rs2074479-cg04467589 and rs2066938-cg21721566.]

5.3.5 Assessing DNAm in neonatal dried blood spots

Given that both pre- and postnatal SES environments have been associated with health outcomes, the timing of the biological embedding of SES is difficult to predict. As such, the SES-associated DNAm patterns in the GECKO cohort may have been established prenatally. I used 37 DBS DNAm samples collected at birth to test whether the DNAm patterns observed in PBMCs from childhood were also present at birth, with the caveat that SES measures were collected up to 11 years postnatally.

CpGs previously associated with SES (CompSES or PosCan) in PBMCs were tested for validation in the DBSs using linear models identical to those run on the PBMC DNAm data. Of the 63 CpGs associated with CompSES in PBMCs, 12 had a p-value < 0.05 and |∆b| > 5% in DBSs (Supplementary Table 3). However, only a single CpG remained significant following FDR correction for the 63 tests run (cg14087413; FDR-corrected p-value < 0.05). Similarly, 14 of 97 CpGs tested in DBSs were nominally associated with PosCan (Supplementary Table 3), five of which remained significant following multiple test correction (cg03609398, cg14453935, cg15282632, cg16443667, cg27034935; all FDR-corrected p-values < 0.05).
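For reference, the Benjamini-Hochberg adjustment applied to a fixed family of tests, such as the 63 CompSES CpGs tested in DBSs, can be sketched as follows. This is a generic stdlib implementation of the standard procedure, with hypothetical p-values; it is not the code used to produce the reported results.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical nominal p-values for four tests.
nominal = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(nominal)
significant = [p <= 0.05 for p in adjusted]
print(adjusted, significant)
```

The adjustment depends on the number of tests in the family, which is why correcting for the 63 or 97 candidate CpGs carried forward is far less stringent than correcting across all 107,295 variable CpGs.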
Despite differences in age, sample size and cell composition, the effect sizes in PBMCs and DBSs at these 26 sites were strikingly similar (Figure 5.6A). All CpGs validated in DBSs were uniquely associated with either PosCan or CompSES and no two CpGs mapped to the same gene. However, five CpGs were previously identified as belonging to SES-associated DMRs (cg24353535, cg25429719, cg23129930, cg02315096, cg18808777) (Figure 5.6B). The CpG cg18808777, located in the HCP5 gene, was associated with both PosCan and CompSES in PBMCs, was located within a CompSES-associated DMR, and was associated with PosCan in DBS DNAm. Replicating associations using neonatal DBSs indicated that a portion of SES-associated DNAm patterns were present at birth and provided an additional level of support for the relationship between DNAm and both CompSES and PosCan.

Figure 5.6 CpGs associated with SES in both PBMCs and DBSs showed similar effect sizes across time points. Associations between CompSES and PosCan with matched DNAm from neonatal dried blood spots (DBSs). A) Effect sizes (∆b) of corresponding CpGs measured in PBMCs in childhood and DBSs at birth (n = 32). Blue points represent CpGs associated with either PosCan or CompSES at both time points (p-value < 0.05); grey points are associated with SES in childhood but not at birth. B) DNAm at birth at CpGs located within DMRs, previously identified from childhood samples, associated with CompSES (above) or PosCan (below) (n = 37).
[Figure 5.6 image not reproduced. Panel A: delta beta in PBMCs against delta beta in DBSs, shown separately for CompSES and PosCan. Panel B: DNAm (beta) against CompSES for ASCL2_cg24353535, NA_cg02315096 and CRYGD_cg25429719, and against PosCan for HCP5_cg18808777 and HOXA6_cg23129930.]

5.4 Discussion

Biological embedding of SES and adversity in childhood results in lifelong health disparities and there is mounting evidence that DNAm patterns can reflect this phenomenon (19,87,314,315).
However, SES is a complex construct and the various methods used to define or measure SES, such as subjective vs. objective, affect its association with health outcomes (49,329,332). Hypothetically, epigenetic correlates of SES may also differ by objective vs. subjective measures. This study represented the intersection of previous findings: it used parents’ perceived social standing to predict DNAm patterns in children and compared these findings to those generated with more traditional measures of SES.

Testing the DNAm patterns of variable CpGs from PBMCs in childhood, I found surprisingly little overlap between CpGs associated with each measure, given the strong correlation between the measures themselves. While there were a large number of CpGs that met the initial p-value and effect size thresholds, only two DMRs, each containing two CpGs, were found within PosCan-associated CpGs. By comparison, the CompSES-associated CpGs contained four underlying DMRs, the largest of which was composed of six CpGs, approximately 1 kb upstream of ASCL2 (Achaete-scute complex homolog 2). Of note, ASCL2 is most strongly expressed in the placenta, functioning in trophoblast differentiation, but is also expressed in tissues of the gastrointestinal tract and in skin (344). Leveraging the available genotyping data, I found limited genetic influence on SES-associated CpGs at the fairly coarse level of granularity offered by the approach. However, there was a striking difference in the number of ARIES mQTLs associated with PosCan vs. CompSES, 28 CpGs and one CpG, respectively. Finally, I found a subset of CpGs which were validated in neonatal DBSs, despite individuals at birth having only been “exposed” to SES in utero. Taken together, these results reflect unique patterns of association between DNAm and objective and subjective SES.

One main objective of this study was to test whether subjective SES was more strongly related to DNAm than objective SES.
Previous findings have indeed suggested that subjective SES may better predict a variety of health outcomes as compared to traditional, objective economic measures. This is likely due to the role of perceived social standing in chronic stress. As well, financial and overall parental stress during childhood have been linked to discipline styles and responses, and have also been associated with differential DNAm in the child (146,345). Neither PosCan nor CompSES stood out as being more strongly linked to DNAm than the other and there was no considerable overlap between the associated CpGs. This finding supports previous research findings of objective and subjective SES representing intersecting but unique facets of the SES experience (329,346). As well, previous research on DNAm correlates of family adversity, parent education levels and income also found little overlap between CpGs associated with each variable (325). In the GECKO cohort, the subjective SES measure, PosCan, was reported by a parent and therefore may be more closely related than CompSES to the child’s experience of his/her parent’s stress and well-being. Essex et al. (2011) linked exposure to parents’ stress in infancy and preschool to DNAm patterns in adolescents, 10-15 years later. The CompSES CpGs overlapped with one gene previously reported in Essex et al., IGFAS, while the PosCan CpGs overlapped with three previously reported genes, ARSG, MOBP, and TBX1. The replication of these genes in independent cohorts, assessing related environmental exposures, is promising and warrants further investigation of these candidate biomarkers of early exposure to stress in caregivers.

Incorporating genetic variation with the DNAm findings also highlighted the difference in the epigenetic correlates of PosCan and CompSES. As previously noted, PosCan was likely more strongly correlated than CompSES with parental stress and mental health, which have genetic components (49,347).
As well, research into DNAm changes in psychiatric disorders has identified mQTLs as contributing to the epigenetic variation associated with diagnosis (138,348,349). Taken together, it is tempting to speculate that the PosCan-associated DNAm “signature” contains more mQTL-associated CpGs than the CompSES-associated signature, which may be related to the heritable, genetic components of mental health. Although gene ontology analysis on these CpG sites did not garner significant results, multiple genes containing CpGs associated with PosCan were related to brain function and mental health; these include CUX2 (cut-like homeobox 2), GRIK1 (glutamate receptor, ionotropic, kainate 1) and MOBP (myelin associated oligodendrocyte basic protein) (350-353).

Finally, taking advantage of the matched, longitudinal samples, I was able to test the replicability of the SES-associated CpGs in neonatal DBSs. Effect sizes across these time points were strikingly similar, suggesting that the correlation between SES and DNAm was stable over time and across different tissue compositions, whole blood versus PBMCs, at least for some CpGs. In addition to upholding the results of the PBMC analysis, these findings supported the knowledge that SES experienced in utero, i.e. the social and economic experience of a pregnant mother, has long-term outcomes for the child and underscored the value of DNAm as a biomarker of these experiences (354,355).

5.4.1 Limitations

The sample size was small, especially relative to the size of the whole GECKO cohort, which led me to use lenient statistical cut-offs, increasing the rate of false positives in the SES-associated CpGs I reported. Due to the sample size, I also chose to analyze males and females together rather than independently, which may have obscured sex-specific responses. These findings, therefore, should be tested for replication in an independent cohort.
The approach of forgoing multiple test correction in favour of validation in an independent cohort has recently been proposed and utilized (152,245). However, I believe the additional analyses run on these sites somewhat offset these concerns and provided a biological perspective to increase confidence. After using an initial p-value threshold to report SES-associated sites, I then further interrogated these sites for underlying DMRs and for their replicable associations over time using neonatal samples.

We chose not to run a full-scale EWAS on the DBS DNAm data due to a number of limitations. Firstly, I did not have SES measures collected concurrently with the neonatal DBSs; therefore, these analyses assume that the social and economic standing of these families did not greatly change. Ideally, concurrent measures would have been available so that the data could be treated as longitudinal. Secondly, while the PBMCs were corrected for cell type heterogeneity across individuals, the DBSs were not. Blood cell type proportions have been previously shown to associate with SES, which is likely related to the pro-inflammatory effects of low SES (87,328); therefore, differential methylation in the DBSs might have been driven by differences in cell type heterogeneity. However, all CpGs that were differentially methylated by SES in the DBSs were also associated with SES in the PBMCs, irrespective of cell type proportion. Therefore, it is unlikely that findings in the DBSs were entirely driven by cell type. As such, I felt the limitations of the study design were mitigated by my careful selection of analytic methods.

In addition to DNAm, early life SES has been associated with other molecular characteristics such as blood cell proportions, transcription levels of genes in inflammatory pathways, and circulating hormones (87,328,356).
Of note, I found that neither CompSES nor PosCan was associated with PBMC cell proportion as predicted by a validated deconvolution method (169,170). While previous publications have reported such associations between early life SES and blood cell proportions, other factors may have obscured such relations within this cohort. For example, sex and ancestry have both been shown to alter blood cell counts in healthy individuals (357,358). As well, differences in blood cell proportion may be a latent effect which only arises in adulthood, similar to many SES-associated health outcomes. I was not able to test SES against blood cell proportions at birth as current in silico cell type predictions are less reliable in newborn blood, in part due to the presence of nucleated red blood cells. It is, however, a plausible hypothesis that the cell composition of neonatal whole blood is receptive to prenatal, SES-related exposures. Finally, based on the correlative nature of these analyses I cannot infer causation, but rather evidence of an association between early life SES and DNAm.

The data presented here provide early evidence of the unique means through which subjective and objective SES in early life may be biologically embedded. Potential epigenetic sites of biological embedding uncovered in this study warrant further investigation to understand the timing of DNAm patterning; to tease apart which specific environments or exposures are involved; and to determine causality. Moreover, I highlighted the importance of integrating genetic and epigenetic data, finding a number of SES-associated CpGs under some level of genetic influence. DNAm represents a potential stable biomarker of childhood stress and adversity, which could inform an individual’s susceptibility to SES, developmental trajectories and risk for future negative health outcomes.
Chapter 6: Conclusions

6.1 Summary of dissertation results

DNA methylation (DNAm) has become a popular tool in epidemiology to link environmental exposures or complex traits to a quantitative and often meaningful biological outcome. However, the more I learn about DNAm, the more complex the picture becomes. The path from exposure to DNAm alteration, to ultimate health outcome is often not linear. Timing of the exposure, tissue of origin and its cellular composition, as well as genetic variation, and likewise ethnicity, are all related to DNAm patterns and therefore may confound or obscure epigenome-wide association study (EWAS) observations (94,148,359-361). As well, EWASs do not elucidate causation, and relationships may be reversed in that phenotype precedes DNAm aberrations (94). With this in mind, this dissertation began with an investigation into the contributions of genetic variation and tissue of origin to the DNA methylome. These findings informed the design and analysis of the following two chapters, which focused on uncovering loci differentially methylated by internal characteristics and environmental exposures.

Tissue of origin is one of the largest contributors to DNAm variation, so much so that based on hierarchical clustering, two DNAm profiles from different individuals but the same tissue show greater similarity than two DNAm profiles from the same individual but different tissues (95,97,316). This has prompted increasing interest in the study of tissue-specific DNAm variation. In Chapter 3, I presented a systematic comparison of genome-wide DNAm patterns between matched pediatric buccal epithelial cells (BECs) and peripheral blood mononuclear cells (PBMCs). My co-first author, S. Islam, and I found that BECs had greater inter-individual DNAm variability than PBMCs and that highly variable CpGs were more likely to be positively correlated between the tissues.
CpGs that were both variable and correlated were enriched for methylation quantitative trait loci (mQTLs), i.e. sites under genetic influence, suggesting that a substantial proportion of the DNAm variation common between tissues can be attributed to genetic variation. Current estimates of the genetic influence on DNAm variation range from 20-80% (83,93,94). This finding raises questions regarding what mechanisms dictate whether mQTLs are tissue specific or shared between multiple tissues. Plotting mQTLs specific to one tissue often revealed consistently high methylation patterns at the linked CpG across all individuals in the other tissue. Based on this observation, one could speculate that high DNAm, when located in CpG islands at gene promoters, may be involved in repression of gene activity and this role may override the genetic contribution to DNAm variation. Supporting this postulation, CpGs with the largest DNAm differences between three cell types isolated from cord blood were found to be enriched for eQTMs, expression quantitative trait methylation loci, i.e. CpGs with significant pairwise correlation with gene expression at a single gene (85,320).

At a genome-wide level, the prevailing relationship between SNPs, gene expression and DNAm demonstrates that genetic variation influences DNAm and gene expression independently, with no linear relationship between the three (85,320). As well, a small proportion of transcriptome-wide RNA levels have been correlated with epigenetic changes shown to be independent of genetic variation (211), although directionality cannot be assumed and transcriptional levels may be altering DNAm. One potential future evaluation of this relationship could assess whether genes expressed in multiple tissues, such as housekeeping genes, are more likely to contain cross-tissue mQTLs than genes that dictate tissue or cell specificity.
A positive finding would support the function of DNAm “repressing” the relationship between genetic variation and expression, known as expression quantitative trait loci (eQTL), in tissue-specific genes.

In Chapter 3, I also categorized each CpG assayed on the Infinium HumanMethylation450K (450K) array as being 1) associated with an mQTL, 2) differentially methylated between tissues, and 3) concordant between tissues. I assessed published EWASs for representation of each category. This analysis uncovered a number of mQTLs within published EWAS “hits”, suggesting that these analyses may be inadvertently uncovering genetic variants associated with their traits of interest. A high percentage of CpGs associated with autism spectrum disorder were also associated with SNPs, i.e. were mQTL-associated, compared to CpGs associated with early-life aging, onset of puberty, childhood psychotic disorder, and fetal alcohol spectrum disorder. As such, I hypothesize that heritability of the trait of interest may dictate the discovery of mQTLs in EWAS results. That said, different analytical approaches could plausibly be responsible for the differences in the underlying mQTLs. Regardless, for EWASs run on environmental exposures or experiences, identifying mQTLs may still be of interest. For example, CpGs associated with genetic variation in cohorts stratified for environmental exposures may be indicators of gene-by-environment interactions. Such CpGs, which are associated with both the exposure and genotype, may be located in genes that confer environmental sensitivity or resilience.

In Chapter 5, I ran a conventional EWAS on socioeconomic status (SES) in PBMC samples from the GECKO cohort. Childhood experiences of SES can impact developmental trajectories and lead to health disparities later in life, suggesting that they become ‘biologically embedded’ (19).
However, SES is a broad, multifactorial construct that can be measured in many ways and at multiple levels (the societal level, the community or neighborhood level, or the individual level), depending on the objective of the research (29). Here, I characterized SES in two ways: 1) an objective, composite measure of household income, education level and net worth and 2) a subjective measure of parent respondents’ social and economic rank in comparison to the rest of Canada. Previous research on subjective SES suggests that the latter may represent a more encompassing view of life experiences and, in some scenarios, can better predict health outcomes than objective SES measures (306,329). Therefore, by comparing the EWAS results of objective vs. subjective SES, I examined whether this outcome also occurs with respect to DNAm.

I ran a linear model across all variable CpGs using either objective SES (CompSES) or subjective SES (PosCan) as the explanatory variable. After reducing the associated CpGs to those found within differentially methylated regions (DMRs), CompSES showed a stronger association with DNAm. Specifically, I found four DMRs, containing a total of 14 CpGs, to be associated with CompSES. By comparison, PosCan was associated with two DMRs, each containing two CpGs. I further analyzed the SES-associated CpGs, rerunning the same model on DNAm from dried blood spots (DBSs) collected at birth. Of the 98 PosCan-associated CpGs found in PBMCs, 14 were also associated in DBSs, and of the 64 CompSES-associated CpGs, 12 remained associated in DBSs. Despite DBSs remaining uncorrected for cell type composition, the effect sizes of the CpGs significant in both tissues, measured at birth and in early childhood, were strikingly similar.

To clarify whether the SES-DNAm associations observed in both early childhood and at birth were influenced by genetic variation, I intersected these CpGs with mQTL-associated CpGs (identified in Chapter 3).
Surprisingly, only two stably associated CpGs were also associated with mQTLs. I also overlapped these sites with mQTLs found in the ARIES cohort, finding that approximately one third of PosCan-associated CpGs were associated with ARIES mQTLs, while just one CompSES-associated CpG was also associated with ARIES mQTLs. Overall, this provided support that the stable association found between SES and DNAm was not strongly driven by underlying genetic differences within the cohort. An obvious limitation to this analysis was the lack of SES information from birth, although family SES tends to be stable throughout childhood (362).

As outlined in the introduction and tested in Chapter 5, a growing body of research has documented associations between childhood environments and DNA methylation, highlighting epigenetic processes as potential mechanisms through which early external contexts influence health across the life course. In Chapter 4, I tested a complementary hypothesis: indicators of children’s early internal, biological and behavioural responses to stressful challenges may be linked to stable patterns of DNAm later in life. Using the longitudinal WSFW cohort, I combined temperament, mental health and autonomic reactivity traits measured across infancy and childhood to generate a broad, composite measure of inhibition/disinhibition. This measure, termed biobehavioural inhibition/disinhibition (BID), separated inhibited children, with strong withdrawal negativity and heart rate response to tests designed to elicit autonomic nervous system (ANS) activity, from disinhibited children, with higher anger and externalizing symptoms. Importantly, this composite biobehavioural reactivity measure was correlated with a number of DNAm sites measured from buccal epithelial cells (BECs) at age 15. This included multiple CpGs in each of the following genes: DLX5, IGF2, MYO16, and PRUNE2.
Building on this finding, I tested whether the associations in these four genes, measured at age 15, persisted in DNAm measured at age 18 in the same individuals. Both DLX5 and IGF2 contained CpGs whose DNAm was correlated with biobehavioural reactivity at both ages, despite these sites not displaying stability across the three years. Therefore, both age and early life biobehavioural reactivity are reflected in DNAm in BECs at these sites. At a conceptual level, this result is in line with a common finding of EWASs, i.e. that DNAm variability in any given CpG may be best explained by multiple variables (148,359-361).

One common finding between Chapters 4 and 5 was the location of associated CpGs in imprinted genes, which are genes that are differentially expressed depending on parent of origin and are often associated with differential methylation at an imprinting control region; the gene promoters of imprinted genes often also display intermediate DNAm levels due to monoallelic methylation dependent on the parent of origin. In the GECKO cohort, multiple CpGs in ASCL2 were associated with objective SES, and in WSFW, multiple CpGs in IGF2 were associated with biobehavioural reactivity. These associations are likely the result of different mechanisms, as SES is an environmental factor and biobehavioural reactivity is an independent, “internal” factor. Imprinted genes are important for fetal growth and development and therefore are tightly regulated (363). However, the findings presented here support and broaden a theory postulating that DNAm at imprinted genes is more susceptible to environmentally induced epigenetic changes (364,365). In previous studies, both “internal” and “external” factors, including early-life maternal socioeconomic status and neurobehavioural measures, have been associated with inter-individual differences in DNAm at imprinted genes (366-368).
In future, such patterns should be assessed with consideration for the representation of imprinted genes on the 450K array; whether reported CpGs are part of imprinting-specific differentially methylated regions and are truly mono-allelically methylated; and whether the proximal genes are imprinted in the measured tissue.

6.2 Limitations

Compared with disease states or chemical exposures, social determinants of health and psychological/behavioural measures in healthy individuals often have a much smaller signal in the epigenome (369). These relatively small effect sizes, paired with the multiple testing problem that plagues most -omics research and the typically larger effects of age, sex, and ethnicity on DNAm, make identifying true associations challenging. As no cohort is ideal, there were a number of limitations to these study designs, many of which are common amongst human EWASs (94). EWAS results, like GWAS findings, cannot establish causation, only correlation. As yet, it is unknown what function, if any, DNAm alterations play in “embedding” SES. In studies such as the one performed in Chapter 5, reverse causation is not possible, i.e. DNAm cannot influence SES; however, there are multiple alternative hypotheses beyond direct “biological embedding” or cellular reprogramming. For example, DNAm patterns may be indicative of certain genetic variants which have been associated with education levels and social-class mobility (370). Another possibility is that the developmental changes that accompany low SES, rather than the exposure itself, may somehow result in epigenetic alterations. With the study of internal traits or phenotypes, the possibilities become greater. In Chapter 4, I discussed the numerous pathways through which DNAm and biobehavioural reactivity may become associated. Not being able to infer a pathway can make interpreting EWAS findings challenging.
One way to elucidate the relationship between the trait of interest and DNAm is to gather longitudinal data. This is especially important to DOHaD studies, in which the timing of exposure may directly alter the outcome. However, longitudinal information in epidemiological studies cannot clarify the relationship between DNAm and prenatal exposures, inborn traits, or phenotypes with a strong genetic component, including biobehavioural reactivity. Such clarification will likely require animal models or cell culture.

Two common issues arise with tissue selection in human cohorts: 1) accessibility of the tissue of interest, and 2) cell composition of the tissue. With respect to the former, surrogate tissues are often used in lieu of inaccessible, primary tissues. In Chapter 4, I associated BEC DNAm with behavioural measures, although brain tissue would have been ideal. Due to the tissue-specific nature of DNAm patterns, which I interrogated in Chapter 3, use of surrogate tissues can make interpretation and generalizability of results difficult. Multiple publications have assessed the relationships between brain DNAm and DNAm from peripheral tissues, such as cord blood, peripheral blood and saliva, as well as generated tools to help infer brain DNAm (171,204,225). However, this research has also revealed a stark difference in DNAm patterns between brain regions (97,204,225). Nonetheless, finding an association in DNAm from a surrogate tissue can still be of great value in its use as a biomarker. As a biomarker, DNAm differences can aid in determining disease risk, disease detection or past exposures, irrespective of biological mechanism (369). Therefore, depending on the objective of the research, the use of surrogate tissues or peripheral tissues is not necessarily hindered by the fact that DNAm is so strongly driven by tissue of origin.

Cell composition within a tissue is also a major confounding factor for epigenetic studies.
Multiple studies have shown that previous epigenetic associations were either confounded with or driven by cell composition (197,371). For example, if the underlying proportions of blood cells change with age, then comparing DNAm across ages could indicate a change in DNAm where there is none (372). Many solutions have arisen to try to address this problem, including using sorted cells and incorporating cell counts into statistical models. In blood, many in silico cell prediction methods have been generated to address the issue, including those which use the methylomes of sorted cells as references and so-called “reference-free” methods, which estimate underlying linear components representing cell types (169,170,373). These methods are not always in agreement with regard to the predicted proportions, and selecting the “best” one can be challenging given the algebraic knowledge required to dissect the underlying methodology. Nonetheless, researchers are currently applying these methods to tissues beyond peripheral blood (171,374). In Chapters 3 and 5, I ran cell-type correction on BECs and PBMCs prior to running analyses, while in Chapter 4, I chose to leave BECs uncorrected for underlying cell composition because BEC cell proportions were not correlated with the biobehavioural measures. Both of these approaches were and are imperfect. Both would likely fail to identify signatures that are present in a cell type that constitutes only a small fraction of the tissue, as such signatures would be obscured by signals from dominant cell types unless measured in isolated cells. However, cells present in such low proportions in accessible tissues may not be practical choices for the identification of biomarkers. As no ideal method currently exists for addressing the underlying cell composition of any given tissue, this is a common limitation of DNAm analysis and the subject of many debates within the epigenetics community.
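The confounding scenario described above, in which shifting cell proportions mimic a change in DNAm, can be illustrated with a small simulation. The sketch below is not drawn from the analyses in this dissertation. It assumes a hypothetical tissue composed of two cell types whose methylation levels at a single CpG are fixed and independent of age (0.8 and 0.2), and a proportion of the first cell type that declines with age. All function names, variable names and numeric values are illustrative assumptions.

```python
import numpy as np


def simulate_bulk_dnam(n=500, seed=0):
    """Simulate bulk DNAm at one CpG in a hypothetical two-cell-type tissue."""
    rng = np.random.default_rng(seed)
    age = rng.uniform(0, 18, size=n)  # years (assumed range)
    # Assumption: the proportion of cell type A declines with age,
    # with some age-independent variation between individuals.
    prop_a = np.clip(0.7 - 0.02 * age + rng.normal(0, 0.05, size=n), 0, 1)
    # Assumption: cell-type-specific methylation does NOT change with age.
    beta_a, beta_b = 0.8, 0.2
    # Bulk DNAm is the proportion-weighted mixture plus measurement noise.
    bulk = prop_a * beta_a + (1 - prop_a) * beta_b + rng.normal(0, 0.01, size=n)
    return age, prop_a, bulk


def ols_coefficients(y, *covariates):
    """Ordinary least squares fit; returns [intercept, coefficient per covariate]."""
    X = np.column_stack([np.ones_like(y), *covariates])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


age, prop_a, bulk = simulate_bulk_dnam()

# Model 1: DNAm ~ age (cell composition ignored).
naive = ols_coefficients(bulk, age)
# Model 2: DNAm ~ age + proportion of cell type A (composition as a covariate).
adjusted = ols_coefficients(bulk, age, prop_a)

print(f"age effect, unadjusted: {naive[1]:.4f}")
print(f"age effect, adjusted for cell proportion: {adjusted[1]:.4f}")
```

Under these assumptions, the unadjusted model reports a negative age effect even though neither cell type changes its methylation with age, while the adjusted model attributes the variation to cell proportion and leaves an age coefficient close to zero. The sketch uses a known proportion as the covariate; in practice that proportion would come from cell counts or an in silico estimate, which carries its own error.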
Beyond the need to address or correct for cell composition in EWAS design, cell composition is at the center of another discussion, i.e. whether environmental exposures do in fact trigger cellular reprogramming, as measured by DNAm, and ultimately change phenotypic outcomes. An emerging theory has posited that differential DNAm found in EWASs can be attributed to changes in subtypes of cells (94). More specifically, the current model of reprogramming describes mature cells “responding” to environmental exposures with altered DNAm patterns, whereas this new model, termed polycreodism, suggests that these differences could arise due to differentiating cells responding to perturbations and altering cell fate. If true, polycreodism would require new means of defining cell types and subtypes beyond the current classification using histology and morphology, such as utilizing single-cell technologies to confirm a mosaic population of cells previously thought to be homogeneous (183). Although this is an interesting theory and has many implications for understanding the mechanisms underlying social epigenetics, the final association between environment and DNAm, or phenotype and DNAm, would still stand. To that end, current implications of this theory for the findings presented in this dissertation are difficult to surmise. For example, results in Chapter 5 support DNAm as a means of biological embedding of SES. If these DNAm differences were a result of SES altering cell fate rather than the DNAm patterns of a mature cell, all hypotheses regarding biological embedding would still hold true. Further research is required to resolve these somewhat competing theories, and the concept of polycreodism may alter how epigeneticists design studies and interpret findings.
6.3 Future directions

Knowing the common normative range for a specific CpG, in a given tissue and at a given age, as well as potential effectors of variability, such as nearby genetic variation or specific environmental exposures, is the ultimate goal of the research presented in this dissertation. If pursued over the life course, this would essentially create something of a “growth chart” of DNAm across the life span and, in situations in which aberrant DNAm precedes or quantifies negative health outcomes, could facilitate faster interventions by healthcare providers, whether additional monitoring or diagnosis and treatment. Currently, work in genomics is aided by a plethora of online resources and repositories of genetic data, in which one can look up gene function, common variants, expression patterns across tissues, phenotypic associations, etc. Hopefully, in the future such resources will be available for epigenetic marks, including DNAm. DNAm has already been shown to provide accurate information on the pathogenicity of variants of unknown significance, as well as diagnostic efficacy in disorders with overlapping phenotypes (375,376). Beyond this, DNAm may provide information regarding the sensitivity of individuals to environments and exposures for use in individualized treatments or interventions, following the trend of personalized medicine.

Future changes to epigenetics research, nearer than the creation of a DNAm growth chart, will hopefully address the issues in methodology used for EWASs. Epigenetics is a relatively nascent field, which means that the technologies and statistical methods are rapidly changing as more knowledge is gained about DNAm itself. Not only is the treatment of cell composition a matter of ongoing discussion, there is also no general consensus on processing and normalization methods, statistical testing, and, most polarizing, interpretation of EWAS findings.
The latter has spurred debates within the epigenetics field, often through the use of social media platforms such as Twitter or personal blogs, to evaluate and critique the latest publications. This open dialogue, from an outspoken community, has arisen out of a desire to raise the standards of practice in the field. However, it is conceivable that it has also diminished the merit of epigenetic epidemiology in the eyes of outside researchers and scientists. Personally, I believe the community would benefit from more research into and rigorous testing of the methodologies used in EWAS analyses. Rather than generating new algorithms and R packages, assessing those currently in use, through common performance metrics, would be greatly beneficial. This also brings into question whether the publication process has been effective, given the common occurrence of new publications receiving negative feedback for inappropriate methods or over-interpretation of findings. As more scientists become experts in DNAm, hopefully this will translate into more critical reviewers. In the meantime, researchers are using pre-print servers, such as bioRxiv, and open source code to gain critical feedback from the epigenetics community prior to publication.

References

1. Barker DJP. The origins of the developmental origins theory. J Intern Med. 2007 May;261(5):412–7.  2. Forsdahl A. Are poor living conditions in childhood and adolescence an important risk factor for arteriosclerotic heart disease? Journal of Epidemiology & Community Health. BMJ Publishing Group Ltd; 1977 Jun 1;31(2):91–5.  3. Buck C, Simpson H. Infant diarrhoea and subsequent mortality from heart disease and cancer. Journal of Epidemiology & Community Health. 1982 Mar;36(1):27–30.  4. Wadhwa P, Buss C, Entringer S, Swanson J. Developmental Origins of Health and Disease: Brief History of the Approach and Current Focus on Epigenetic Mechanisms. Semin Reprod Med. 2009 Aug 26;27(05):358–68.  5. Lucas A. 
Programming by early nutrition: an experimental approach. J Nutr. 1998 Feb;128(2 Suppl):401S–406S.  6. Lewis DS, Bertrand HA, McMahan CA, McGill HC, Carey KD, Masoro EJ. Preweaning food intake influences the adiposity of young adult baboons. J Clin Invest. American Society for Clinical Investigation; 1986 Oct;78(4):899–905.  7. Desai M, Crowther NJ, Lucas A, Hales CN. Organ-selective growth in the offspring of protein-restricted mothers. British Journal of Nutrition. Cambridge University Press; 1996 Oct 1;76(4):591–603.  8. Schulz LC. The Dutch Hunger Winter and the developmental origins of health and disease. Proc Natl Acad Sci USA. 2010 Sep 28;107(39):16757–8.  9. Haugen AC, Schug TT, Collman G, Heindel JJ. Evolution of DOHaD: the impact of environmental health sciences. J Dev Orig Health Dis. 2015 Apr;6(2):55–64.  10. Barouki R, Gluckman PD, Grandjean P, Hanson M, Heindel JJ. Developmental origins of non-communicable disease: implications for research and public health. Environ Health. BioMed Central; 2012 Jun 27;11(1):42.  11. Balbus JM, Barouki R, Birnbaum LS, Etzel RA, Gluckman PD, Grandjean P, et al. Early-life prevention of non-communicable diseases. The Lancet. 2013 Jan;381(9860):3–4.  12. Heindel JJ, Balbus J, Birnbaum L, Brune-Drisse MN, Grandjean P, Gray K, et al. Developmental Origins of Health and Disease: Integrating Environmental Influences. Endocrinology. 2015 Jul 21;156(10):3416–21.  13. Vinall J, Grunau RE. Impact of repeated procedural pain-related stress in infants born very preterm. Pediatr Res. 2014 Feb 5;75(5):584–7.  14. Entringer S, Kumsta R, Nelson EL, Hellhammer DH, Wadhwa PD, Wüst S. Influence of prenatal psychosocial stress on cytokine production in adult women. Dev Psychobiol. 2008 Sep;50(6):579–87.  15. Entringer S, Kumsta R, Hellhammer DH, Wadhwa PD, Wüst S. Prenatal exposure to maternal psychosocial stress and HPA axis regulation in young adults. Hormones and Behavior. Elsevier Inc; 2009 Feb 1;55(2):292–8.  16. 
Entringer S, Wüst S, Kumsta R, Layes IM, Nelson EL, Hellhammer DH, et al. Prenatal psychosocial stress exposure is associated with insulin resistance in young adults. American Journal of Obstetrics and Gynecology. 2008 Nov;199(5):498.e1–498.e7.  17. Nkansah-Amankra S, Luchok KJ, Hussey JR, Watkins K, Liu X. Effects of Maternal Stress on Low Birth Weight and Preterm Birth Outcomes Across Neighborhoods of South Carolina, 2000–2003. Matern Child Health J. 2009 Jan 28;14(2):215–26.  18. Adler NE, Boyce WT, Chesney MA, Folkman S, Syme SL. Socioeconomic inequalities in health. No easy solution. JAMA. 1993 Jun;269(24):3140–5.  19. Hertzman C, Boyce WT. How experience gets under the skin to create gradients in developmental health. Annu Rev Public Health. Annual Reviews; 2010;31(1):329–47.  20. Adler NE. Socioeconomic inequalities in health. No easy solution. JAMA: The Journal of the American Medical Association. 1993;269(24):3140–5.  21. Luo Y, Waite LJ. The impact of childhood and adult SES on physical, mental, and cognitive well-being in later life. J Gerontol B Psychol Sci Soc Sci. 2005 Mar;60(2):S93–S101.  22. Turrell G, Lynch JW, Kaplan GA, Everson SA, Helkala E-L, Kauhanen J, et al. Socioeconomic position across the lifecourse and cognitive function in late middle age. J Gerontol B Psychol Sci Soc Sci. 2002 Jan;57(1):S43–51.  23. Gilman SE, Kawachi I, Fitzmaurice GM, Buka SL. Socioeconomic status in childhood and the lifetime risk of major depression. Int J Epidemiol. 2002 Apr;31(2):359–67.  24. Wadsworth ME. Health inequalities in the life course perspective. Soc Sci Med. 1997 Mar;44(6):859–69.  25. Dannefer D. Cumulative advantage/disadvantage and the life course: cross-fertilizing age and social science theory. J Gerontol B Psychol Sci Soc Sci. 2003 Nov;58(6):S327–37.  26. Shonkoff JP, Boyce WT, McEwen BS. Neuroscience, molecular biology, and the childhood roots of health disparities: building a new framework for health promotion and disease prevention. 
JAMA. American Medical Association; 2009 Jun 3;301(21):2252–9.  27. Lundberg O. Childhood living conditions, health status, and social mobility: a contribution to the health selection debate. European Sociological Review. 1991 Sep;7(2):149–62.  28. Gluckman PD, Hanson MA. Developmental Origins of Disease Paradigm: A Mechanistic and Evolutionary Perspective. Pediatr Res. 2004 Sep;56(3):311–7.  29. Evans GW, Kantrowitz E. Socioeconomic Status and Health: The Potential Role of Environmental Risk Exposure. Annu Rev Public Health. 2002 May;23(1):303–31.  30. Evans GW, English K. The environment of poverty: multiple stressor exposure, psychophysiological stress, and socioemotional adjustment. Child Development. 2002 Jul;73(4):1238–48.  31. Frank JW, Mustard JF. The determinants of health from a historical perspective. Daedalus. 1994;123(4):1–17.  32. Chen E, Miller GE. Socioeconomic Status and Health: Mediating and Moderating Factors. Annu Rev Clin Psychol. 2013 Mar 28;9(1):723–49.  33. Ellis BJ, Boyce WT. Biological sensitivity to context. Current Directions in Psychological Science. 2008 Jun;17(3):183–7.  34. Weikart DP. Changing early childhood development through educational intervention. Prev Med. 1998 Mar;27(2):233–7.  35. Miller GE, Chen E, Fok AK, et al. Low early-life social class leaves a biological residue manifested by decreased glucocorticoid and increased proinflammatory signaling. Proc Natl Acad Sci USA [Internet]. 2009 Aug 25;106(34):14716–21.  36. Provençal N, Binder EB. The effects of early life stress on the epigenome: From the womb to adulthood and even before. Experimental Neurology. Elsevier Inc; 2015 Jun 1;268(C):10–20.  37. Rutter M. Achievements and challenges in the biology of environmental effects. Proc Natl Acad Sci USA. 2012 Oct 16;109(Supplement_2):17149–53.  38. Miller GE, Chen E. The Biological Residue of Childhood Poverty. Child Dev Perspect. 2013 Jan 18;7(2):67–73.  39. 
Danese A, Pariante CM, Caspi A, Taylor A, Poulton R. Childhood maltreatment predicts adult inflammation in a life-course study. Proc Natl Acad Sci USA. National Acad Sciences; 2007 Jan 23;104(4):1319–24.  40. Miller GE, Chen E, Sze J, Marin T, Arevalo JMG, Doll R, et al. A Functional Genomic Fingerprint of Chronic Stress in Humans: Blunted Glucocorticoid and Increased NF-κB Signaling. Biological Psychiatry. 2008 Aug;64(4):266–72.  41. Powell ND, Sloan EK, Bailey MT, Arevalo JMG, Miller GE, Chen E, et al. Social stress up-regulates inflammatory gene expression in the leukocyte transcriptome via β-adrenergic induction of myelopoiesis. Proc Natl Acad Sci USA. National Academy of Sciences; 2013 Oct 8;110(41):16574–9.  42. Nathan C, Ding A. Nonresolving Inflammation. Cell. Elsevier Inc; 2010 Mar 19;140(6):871–82.  43. Miller AH, Maletic V, Raison CL. Inflammation and Its Discontents: The Role of Cytokines in the Pathophysiology of Major Depression. Biological Psychiatry. 2009 May;65(9):732–41.  44. Belsky J, Pluess M. Beyond diathesis stress: Differential susceptibility to environmental influences. Psychological Bulletin. 2009;135(6):885–908.  45. Anacker C, O'Donnell KJ, Meaney MJ. Early life adversity and the epigenetic programming of hypothalamic-pituitary-adrenal function. Dialogues Clin Neurosci. 2014 Sep;16(3):321–33.  46. Fox NA. Psychophysiological correlates of emotional reactivity during the first year of life. Developmental Psychology. 1989;25(3):364–72.  47. Snidman N, Kagan J, Riordan L, Shannon DC. Cardiac function and behavioral reactivity during infancy. Psychophysiol. 1995 May;32(3):199–207.  48. Cole SW. Social regulation of human gene expression: mechanisms and implications for public health. Am J Public Health. 2013 Oct;103 Suppl 1(S1):S84–92.  49. Adler NE, Epel ES, Castellazzo G, Ickovics JR. Relationship of subjective and objective social status with psychological and physiological functioning: Preliminary data in healthy, White women. 
Health Psychol. 2000;19(6):586–92.  50. Cole SW, Hawkley LC, Arevalo JM, Sung CY, Rose RM, Cacioppo JT. Social regulation of gene expression in human leukocytes. Genome Biol. BioMed Central; 2007;8(9):R189.  51. Hawkley LC, Masi CM, Berry JD, Cacioppo JT. Loneliness is a unique predictor of age-related differences in systolic blood pressure. Psychology and Aging. 2006;21(1):152–64.  52. Seeman TE. Health promoting effects of friends and family on health outcomes in older adults. Am J Health Promot. 2000 Jul;14(6):362–70.  53. Karg K, Burmeister M, Shedden K, Sen S. The Serotonin Transporter Promoter Variant (5-HTTLPR), Stress, and Depression Meta-analysis Revisited. Arch Gen Psychiatry. American Medical Association; 2011 May 2;68(5):444–19.  54. Caspi A, Sugden K, Moffitt TE, Taylor A, Craig IW, Harrington H, et al. Influence of life stress on depression: moderation by a polymorphism in the 5-HTT gene. Science. American Association for the Advancement of Science; 2003 Jul 18;301(5631):386–9.  55. Chau CMY, Ranger M, Bichin M, Park MTM, Amaral RSC, Chakravarty M, et al. Hippocampus, Amygdala, and Thalamus Volumes in Very Preterm Children at 8 Years: Neonatal Pain and Genetic Variation. Front Behav Neurosci. Frontiers; 2019;13:51.  56. Zohsel K, Buchmann AF, Blomeyer D, Hohm E, Schmidt MH, Esser G, et al. Mothers' prenatal stress and their children's antisocial outcomes - a moderating role for the Dopamine D4 Receptor (DRD4) gene. J Child Psychol & Psychiat. John Wiley & Sons, Ltd; 2013 Sep 14;55(1):69–76.  57. Belsky J, Beaver KM. Cumulative-genetic plasticity, parenting and adolescent self-regulation. J Child Psychol & Psychiat. Blackwell Publishing Ltd; 2010 Oct 6;52(5):619–26.  58. Drury SS, Theall KP, Smyke AT, Keats BJB, Egger HL, Nelson CA, et al. Modification of depression by COMT val158met polymorphism in children exposed to early severe psychosocial deprivation. Child Abuse & Neglect. Elsevier Ltd; 2010 Jun 1;34(6):387–95.  59. 
Saudino KJ. Behavioral genetics and child temperament. J Dev Behav Pediatr. 2005 Jun;26(3):214–23.  60. Bird A. Perceptions of epigenetics. Nature. 2007 May 24;447(7143):396–8.  61. Greally JM. A user’s guide to the ambiguous word “epigenetics.” Nat Rev Mol Cell Biol. 2018;19.  62. Goldberg AD, Allis CD, Bernstein E. Epigenetics: A Landscape Takes Shape. Cell. 2007 Feb;128(4):635–8.  63. Jones PA. Functions of DNA methylation: islands, start sites, gene bodies and beyond. Nature Reviews Genetics. 2012 May 29;13(7):484–92.  64. Guo H, Zhu P, Yan L, Li R, Hu B, Lian Y, et al. The DNA methylation landscape of human early embryos. Nature. 2014 Jul 31;511(7511):606–10.  65. Lister R, Mukamel EA, Nery JR, Urich M, Puddifoot CA, Johnson ND, et al. Global epigenomic reconfiguration during mammalian brain development. Science. American Association for the Advancement of Science; 2013 Aug 9;341(6146):1237905.  66. Ooi SKT, Qiu C, Bernstein E, Li K, Da Jia, Yang Z, et al. DNMT3L connects unmethylated lysine 4 of histone H3 to de novo methylation of DNA. Nature. Nature Publishing Group; 2007 Aug 9;448(7154):714–7.  67. Smith ZD, Meissner A. DNA methylation: roles in mammalian development. Nature Reviews Genetics. Nature Publishing Group; 2013 Feb 12;14(3):204–20.  68. Okano M, Bell DW, Haber DA, Li E. DNA methyltransferases Dnmt3a and Dnmt3b are essential for de novo methylation and mammalian development. Cell. 1999 Oct 29;99(3):247–57.  69. Li E, Bestor TH, Jaenisch R. Targeted mutation of the DNA methyltransferase gene results in embryonic lethality. Cell. 1992 Jun 12;69(6):915–26.  70. Wu H, Zhang Y. Reversing DNA Methylation: Mechanisms, Genomics, and Biological Functions. Cell. 2014 Jan;156(1-2):45–68.  71. Tahiliani M, Koh KP, Shen Y, Pastor WA, Bandukwala H, Brudno Y, et al. Conversion of 5-methylcytosine to 5-hydroxymethylcytosine in mammalian DNA by MLL partner TET1. Science. American Association for the Advancement of Science; 2009 May 15;324(5929):930–5.  72. 
Santiago M, Antunes C, Guedes M, Sousa N, Marques CJ. TET enzymes and DNA hydroxymethylation in neural development and function - How critical are they? Genomics. Elsevier Inc; 2014 Nov 1;104(5):334–40.  73. Schübeler D. Function and information content of DNA methylation. Nature. Nature Publishing Group; 2015 Jan 15;517(7534):321–6.  74. Stadler MB, Murr R, Burger L, Ivanek R, Lienert F, Schöler A, et al. DNA-binding factors shape the mouse methylome at distal regulatory regions. Nature. Nature Publishing Group; 2011 Dec 14;480(7378):490–5.  75. Han L, Lin IG, Hsieh CL. Protein binding protects sites on stable episomes and in the chromosome from de novo methylation. Molecular and Cellular Biology. American Society for Microbiology Journals; 2001 May;21(10):3416–24.  76. Saxonov S, Berg P, Brutlag DL. A genome-wide analysis of CpG dinucleotides in the human genome distinguishes two distinct classes of promoters. Proc Natl Acad Sci USA. National Acad Sciences; 2006 Jan 31;103(5):1412–7.  77. Illingworth RS, Bird AP. CpG islands – “A rough guide.” FEBS Letters. Federation of European Biochemical Societies; 2009 Jun 5;583(11):1713–20.  78. Kochanek S, Renz D, Doerfler W. DNA methylation in the Alu sequences of diploid and haploid primary human cells. The EMBO Journal [Internet]. 1993 Mar;12(3):1141–51.  79. Alves G, Tatro A, Fanning T. Differential methylation of human LINE-1 retrotransposons in malignant cells. Gene [Internet]. 1996 Oct 17;176(1-2):39–44.  80. Bibikova M, Barnes B, Tsan C, Ho V, Klotzle B, Le JM, et al. High density DNA methylation array with single CpG site resolution. Genomics. Elsevier Inc; 2011 Oct 1;98(4):288–95.  81. Irizarry RA, Ladd-Acosta C, Wen B, Wu Z, Montano C, Onyango P, et al. The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores. Nat Genet. 2009 Jan 18;41(2):178–86.  82. Weber M, Hellmann I, Stadler MB, Ramos L, Pääbo S, Rebhan M, et al. 
Distribution, silencing potential and evolutionary impact of promoter DNA methylation in the human genome. Nat Genet [Internet]. Nature Publishing Group; 2007 Apr 1;39(4):457–66.  83. Bell JT, Pai AA, Pickrell JK, Gaffney DJ, Pique-Regi R, Degner JF, et al. DNA methylation patterns associate with genetic and gene expression variation in HapMap cell lines. Genome Biol. BioMed Central Ltd; 2011 Jan 20;12(1):R10.  84. van Eijk KR, de Jong S, Boks MPM, Langeveld T, Colas F, Veldink JH, et al. Genetic analysis of DNA methylation and gene expression levels in whole blood of healthy human subjects. BMC Genomics. BioMed Central; 2012 Nov 17;13(1):636.  85. Gutierrez-Arcelus M, Lappalainen T, Montgomery SB, Buil A, Ongen H, Yurovsky A, et al. Passive and active DNA methylation and the interplay with genetic variation in gene regulation. eLife. 2013 Jun 4;2:56–18.  86. Wagner JR, Busche S, Ge B, Kwan T, Pastinen T, Blanchette M. The relationship between DNA methylation, genetic and expression inter-individual variation in untransformed human fibroblasts. Genome Biol [Internet]. 2014;15(2):R37.  87. Lam LL, Emberly E, Fraser HB, Neumann SM, Chen E, Miller GE, et al. Factors underlying variable DNA methylation in a human community cohort. Proc Natl Acad Sci USA. 2012 Oct 16;109 Suppl 2:17253–60.  88. Edgar R, Tan PPC, Portales-Casamar E, Pavlidis P. Meta-analysis of human methylomes reveals stably methylated sequences surrounding CpG islands associated with high gene expression. Epigenetics & Chromatin. BioMed Central; 2014;7(1):28.  89. Ford EE, Grimmer MR, Stolzenburg S, Bogdanovic O, de Mendoza A, Farnham PJ, et al. Frequent lack of repressive capacity of promoter DNA methylation identified through genome-wide epigenomic manipulation. bioRxiv. Cold Spring Harbor Laboratory; 2017 Sep 20;:1–43.  90. Korthauer K, Irizarry RA. Genome-wide repressive capacity of promoter DNA methylation is revealed through epigenomic manipulation. bioRxiv. 2018.  91. 
Fraser HB, Lam LL, Neumann SM, Kobor MS. Population-specificity of human DNA methylation. Genome Biol [Internet]. 2012;13(2):R8.  92. Heyn H, Moran S, Hernando-Herraez I, Sayols S, Gomez A, Sandoval J, et al. DNA methylation contributes to natural human variation. Genome Research. 2013 Sep 1;23(9):1363–72.  93. Bell JT, Tsai PC, Yang TP, Pidsley R, Nisbet J, Glass D. Epigenome-wide scans identify differentially methylated regions for age and age-related phenotypes in a healthy ageing population. PLoS Genet. 2012;8.  94. Lappalainen T, Greally JM. Associating cellular epigenetic models with human phenotypes. 2017 Jul 1;18(7):441–51.  95. Byun HM, Siegmund KD, Pan F, Weisenberger DJ, Kanel G, Laird PW. Epigenetic profiling of somatic tissues from human autopsy specimens identifies tissue- and individual-specific DNA methylation patterns. Hum Mol Genet. 2009;18.  96. Ziller MJ, Gu H, Müller F, Donaghey J, Tsai LTY, Kohlbacher O, et al. Charting a dynamic DNA methylation landscape of the human genome. Nature. 2013 Aug 7;500(7463):477–81.  97. Farré P, Jones MJ, Meaney MJ, Emberly E, Turecki G, Kobor MS. Concordant and discordant DNA methylation signatures of aging in human blood and brain. Epigenetics & Chromatin. BioMed Central; 2015;8(1):19.  98. Davies MN, Volta M, Pidsley R, Lunnon K, Dixit A, Lovestone S, et al. Functional annotation of the human brain methylome identifies tissue-specific epigenetic variation across brain and blood. Genome Biol [Internet]. 2012;13(6):R43.  99. Jiang R, Jones MJ, Chen E, Neumann SM, Fraser HB, Miller GE. Discordance of DNA methylation variance between two accessible human tissues. Sci Rep. 2015;5.  100. Meissner A. Epigenetic modifications in pluripotent and differentiated cells. Nat Biotechnol [Internet]. Nature Publishing Group; 2010 Oct 1;28(10):1079–88.  101. Calvanese V, Fernández AF, Urdinguio RG, Suárez-Alvarez B, et al. 
A promoter DNA demethylation landscape of human hematopoietic differentiation. Nucleic Acids Research [Internet]. 2011 Sep 10;40(1):116–31.  102. Suarez-Alvarez B, Rodriguez RM, Fraga MF, López-Larrea C. DNA methylation: a promising landscape for immune system-related diseases. Trends in Genetics [Internet]. 2012 Oct;28(10):506–14.  103. Álvarez-Errico D, Vento-Tormo R, Sieweke M, Ballestar E. Epigenetic control of myeloid cell differentiation, identity and function. Nat Rev Immunol [Internet]. 2014 Dec 23;15(1):7–17.  104. Khavari DA, Sen GL, Rinn JL. DNA methylation and epigenetic control of cellular differentiation. Cell Cycle. Taylor & Francis; 2010 Oct 1;9(19):3880–3.  105. Martino DJ, Tulic MK, Gordon L, Hodder M, Richman TR, Metcalfe J, et al. Evidence for age-related and individual-specific changes in DNA methylation profile of mononuclear cells during early immune development in humans. Epigenetics. 2011 Sep 1;6(9):1085–94.  106. Wang T, Pan Q, Lin L, Szulwach KE, Song C-X, He C, et al. Genome-wide DNA hydroxymethylation changes are associated with neurodevelopmental genes in the developing human cerebellum. Human Molecular Genetics. 2012 Oct 5;21(26):5500–10.  107. Herbstman JB, Wang S, Perera FP, Lederman SA, Vishnevetsky J, Rundle AG, et al. Predictors and Consequences of Global DNA Methylation in Cord Blood and at Three Years. El-Maarri O, editor. PLoS ONE. Public Library of Science; 2013 Sep 4;8(9):e72824–10.  108. McClay JL, Aberg KA, Clark SL, Nerella S, Kumar G, Xie LY, et al. A methylome-wide study of aging using massively parallel sequencing of the methyl-CpG-enriched genomic fraction from blood in over 700 subjects. Human Molecular Genetics. 2013 Oct 16;23(5):1175–85.  109. Martino D, Loke YJ, Gordon L, Ollikainen M, Cruickshank MN, Saffery R, et al. 
Longitudinal, genome-scale analysis of DNA methylation in twins from birth to 18 months of age reveals rapid epigenetic change in early life and pair-specific effects of discordance. Genome Biol. BioMed Central Ltd; 2013 May 22;14(5):R42.  110. Alisch RS, Barwick BG, Chopra P, Myrick LK, Satten GA, Conneely KN, et al. Age-associated DNA methylation in pediatric populations. Genome Research. Cold Spring Harbor Lab; 2012 Apr;22(4):623–32.  111. Talens RP, Christensen K, Putter H, Willemsen G, Christiansen L, Kremer D, et al. Epigenetic variation during the adult lifespan: cross-sectional and longitudinal data on monozygotic twin pairs. Aging Cell. 2012 Jun 4;11(4):694–703.  112. Weidner CI, Lin Q, Koch CM, Eisele L, et al. Aging of blood can be tracked by DNA methylation changes at just three CpG sites. Genome Biol [Internet]. 2014;15(2):R24.  113. Bjornsson HT, Fallin MD, Feinberg AP. An integrated epigenetic and genetic approach to common human disease. Trends Genet. 2004 Aug;20(8):350–8.  114. Boks MP, Derks EM, Weisenberger DJ, Strengman E, Janson E, Sommer IE, et al. The Relationship of DNA Methylation with Age, Gender and Genotype in Twins and Healthy Controls. Najbauer J, editor. PLoS ONE. Public Library of Science; 2009 Aug 26;4(8):e6767–8.  115. Heyn H, Li N, Ferreira HJ, Moran S, Pisano DG, Gomez A, et al. Distinct DNA methylomes of newborns and centenarians. National Acad Sciences; 2012 Jun 26;109(26):10522–7.  116. Horvath S, Zhang Y, Langfelder P, Kahn RS, Boks MP, van Eijk K, et al. Aging effects on DNA methylation modules in human brain and blood tissue. Genome Biol. BioMed Central Ltd; 2012 Oct 3;13(10):R97.  117. Hannum G, Guinney J, Zhao L, Zhang L, Hughes G, Sadda S, et al. Genome-wide Methylation Profiles Reveal Quantitative Views of Human Aging Rates. Molecular Cell. 2013 Jan;49(2):359–67.  118. Johansson Å, Enroth S, Gyllensten U. Continuous Aging of the Human DNA Methylome Throughout the Human Lifespan. 
Suter CM, editor. PLoS ONE [Internet]. 2013 Jun 27;8(6):e67378.  119. Florath I, Butterbach K, Muller H, Bewerunge-Hudler M, Brenner H. Cross-sectional and longitudinal changes in DNA methylation with age: an epigenome-wide analysis revealing over 60 novel age-associated CpG sites. Human Molecular Genetics. 2014 Feb 7;23(5):1186–201.  120. Rakyan VK, Down TA, Maslau S, Andrew T, Yang TP, Beyan H, et al. Human aging-associated DNA hypermethylation occurs preferentially at bivalent chromatin domains. Genome Research. 2010 Apr 1;20(4):434–9.  121. Teschendorff AE, West J, Beck S. Age-associated epigenetic drift: implications, and a case of epigenetic thrift? Human Molecular Genetics [Internet]. 2013 Sep 24;22(R1):R7–R15.  122. Gentilini D, Mari D, Castaldi D, Remondini D, Ogliari G, Ostan R, et al. Role of epigenetics in human aging and longevity: genome-wide DNA methylation profile in centenarians and centenarians’ offspring. AGE. 2012 Aug 25;35(5):1961–73.  123. Grönniger E, Weber B, Heil O, Peters N, Stäb F, Wenck H, et al. Aging and Chronic Sun Exposure Cause Distinct Epigenetic Changes in Human Skin. Reik W, editor. PLoS Genet. 2010 May 27;6(5):e1000971–10.  124. Ong M-L, Holbrook JD. Novel region discovery method for Infinium 450K DNA methylation data reveals changes associated with aging in muscle and neuronal pathways. Aging Cell. 2013 Oct 22;13(1):142–55.  125. Numata S, Ye T, Hyde TM, Guitart-Navarro X, Tao R, Wininger M, et al. DNA methylation signatures in development and aging of the human prefrontal cortex. Am J Hum Genet. 2012 Feb 10;90(2):260–72.  126. Bocklandt S, Lin W, Sehl ME, Sánchez FJ, Sinsheimer JS, et al. Epigenetic Predictor of Age. Landsberger N, editor. PLoS ONE [Internet]. 2011 Jun 22;6(6):e14821.  127. Boyce WT, Kobor MS. Development and the epigenome: the “synapse” of gene-environment interplay. Dev Sci. 2014 Dec 28;18(1):1–23.  128. 
Provençal N, Suderman MJ, Guillemin C, Vitaro F, Côté SM, Hallett M, et al. Association of childhood chronic physical aggression with a DNA methylation signature in adult human T cells. Clelland JD, editor. PLoS ONE. 2014;9(4):e89839.  129. Weaver ICG, Cervoni N, Champagne FA, D'Alessio AC, Sharma S, Seckl JR, et al. Epigenetic programming by maternal behavior. Nat Neurosci. Nature Publishing Group; 2004 Aug;7(8):847–54.  130. Champagne FA, Weaver ICG, Diorio J, Dymov S, Szyf M, Meaney MJ. Maternal Care Associated with Methylation of the Estrogen Receptor-α1b Promoter and Estrogen Receptor-α Expression in the Medial Preoptic Area of Female Offspring. Endocrinology. 2006 Jun;147(6):2909–15.  131. Murgatroyd C, Patchev AV, Wu Y, Micale V, Bockmühl Y, Fischer D, et al. Dynamic DNA methylation programs persistent adverse effects of early-life stress. Nat Neurosci. 2009 Dec;12(12):1559–66.  132. Elliott E, Ezra-Nevo G, Regev L, Neufeld-Cohen A, Chen A. Resilience to social stress coincides with functional DNA methylation of the Crf gene in adult mice. Nat Neurosci. Nature Publishing Group; 2010 Oct 3;13(11):1351–3.  133. Elliott HR, Tillin T, McArdle WL, Ho K, Duggirala A, Frayling TM, et al. Differences in smoking associated DNA methylation patterns in South Asians and Europeans. 2014 Feb 3;6(1):1–10.  134. Monick MM, Beach SRH, Plume J, Sears R, Gerrard M, Brody GH, et al. Coordinated changes in AHRR methylation in lymphoblasts and pulmonary macrophages from smokers. Am J Med Genet [Internet]. 2012 Jan 9;159B(2):141–51.  135. Reynolds LM, Magid HS, Chi GC, Lohman K, Barr RG, Kaufman JD, et al. Secondhand Tobacco Smoke Exposure Associations with DNA Methylation of the Aryl Hydrocarbon Receptor Repressor. NICTOB. 2016 Aug 31;:ntw219–10.  136. Joubert BR, Håberg SE, Nilsen RM, Wang X, Vollset SE, Murphy SK, et al. 450K Epigenome-Wide Scan Identifies Differential DNA Methylation in Newborns Related to Maternal Smoking during Pregnancy. Environ Health Perspect. 
2012 Oct;120(10):1425–31.  137. Klengel T, Mehta D, Anacker C, Rex-Haffner M, Pruessner JC, Pariante CM, et al. Allele-specific FKBP5 DNA demethylation mediates gene–childhood trauma interactions. Nat Neurosci. 2012 Dec 2;16(1):33–41.  138. Klengel T, Pape J, Binder EB, Mehta D. The role of DNA methylation in stress-related psychiatric disorders. Neuropharmacology. Elsevier Ltd; 2014 May 1;80(c):115–32.  139. Esposito EA, Jones MJ, Doom JR, MacIsaac JL, Gunnar MR, Kobor MS. Differential DNA methylation in peripheral blood mononuclear cells in adolescents exposed to significant early but not later childhood adversity. Dev Psychopathol. 2016 Nov;28(4pt2):1385–99.  140. Naumova OY, Lee M, Koposov R, Szyf M, Dozier M, Grigorenko EL. Differential patterns of whole-genome DNA methylation in institutionalized children and children raised by their biological parents. Dev Psychopathol. Cambridge University Press; 2012 Feb;24(1):143–55.  141. Kumsta R, Marzi SJ, Viana J, Dempster EL, Crawford B, Rutter M, et al. Severe psychosocial deprivation in early childhood is associated with increased DNA methylation across a region spanning the transcription start site of CYP2E1. Transl Psychiatry. Nature Publishing Group; 2016 Jun 7;6(6):e830.  142. Non AL, Hollister BM, Humphreys KL, Childebayeva A, Esteves K, Zeanah CH, et al. DNA methylation at stress-related genes is associated with exposure to early life institutionalization. Am J Phys Anthropol. 2016 Sep;161(1):84–93.  143. Weder N, Zhang H, Jensen K, Yang BZ, Simen A, Jackowski A, et al. Child abuse, depression, and methylation in genes involved with stress, neural plasticity, and brain circuitry. Journal of the American Academy of Child & Adolescent Psychiatry. Elsevier; 2014 Apr;53(4):417–24.e5.  144. Hompes T, Izzi B, Gellens E, Morreels M, Fieuws S, Pexsters A, et al. 
Investigating the influence of maternal cortisol and emotional state during pregnancy on the DNA methylation status of the glucocorticoid receptor gene (NR3C1) promoter region in cord blood. Journal of Psychiatric Research. Elsevier; 2013 Jul;47(7):880–91.  145. Monk C, Spicer J, Champagne FA. Linking prenatal maternal adversity to developmental outcomes in infants: The role of epigenetic pathways. Dev Psychopathol. 2012 Oct 15;24(04):1361–76.  146. Essex MJ, Boyce WT, Hertzman C, Lam LL, Armstrong JM, Neumann SMA, et al. Epigenetic Vestiges of Early Developmental Adversity: Childhood Stress Exposure and DNA Methylation in Adolescence. Child Development. 2011 Sep 2;84(1):58–75.  147. Moore S, McEwen L, Quirt J, Morin A, Mah SM, Barr RG. Epigenetic correlates of neonatal contact in humans. Dev Psychopathol. 2017;29.  148. Teh AL, Pan H, Chen L, Ong ML, Dogra S, Wong J, et al. The effect of genotype and in utero environment on interindividual variation in neonate DNA methylomes. Genome Research. 2014 Jul 1;24(7):1064–74.  149. Michels KB, Binder AM, Dedeurwaerder S. Recommendations for the design and analysis of epigenome-wide association studies. Nature. 2013;10(10):949–55.  150. Mill J, Heijmans BT. From promises to practical strategies in epigenetic epidemiology. Nat Rev Genet. 2013;14.  151. Tsai P-C, Bell JT. Power and sample size estimation for epigenome-wide association scans to detect differential DNA methylation. Int J Epidemiol. 2015 Aug;44(4):1429–41.  152. Mansell G, Gorrie-Stone TJ, Bao Y, Kumari M, Schalkwyk LS, Mill J, et al. Guidance for DNA methylation studies: statistical insights from the Illumina EPIC array. BMC Genomics; 2019 May 10;:1–15.  153. Islam SA, Goodman SJ, MacIsaac JL, Obradović J, Barr RG, Boyce WT, et al. Integration of DNA methylation patterns and genetic variation in human pediatric tissues help inform EWAS design and interpretation. Epigenetics & Chromatin. 2019;12(1):1.  154. Hyde JS, Klein MH, Essex MJ, Clark R. 
Maternity leave and women's health. Psychology of Women Quarterly. 1995;:257–85.  155. Boyce WT, Quas J, Alkon A, Smider NA, Essex MJ, Kupfer DJ, et al. Autonomic reactivity and psychopathology in middle childhood. Br J Psychiatry. 2001 Aug;179:144–50.  156. Goodman SJ, Roubinov DS, Bush NR, Park M, Farré P, Emberly E, et al. Children's biobehavioral reactivity to challenge predicts DNA methylation in adolescence and emerging adulthood. Dev Sci. 4 ed. John Wiley & Sons, Ltd (10.1111); 2018 Sep 21;22(2):e12739–19.  157. Frommer M, McDonald LE, Millar DS, Collis CM, Watt F, Grigg GW, et al. A genomic sequencing protocol that yields a positive display of 5-methylcytosine residues in individual DNA strands. Proc Natl Acad Sci USA. 1992 Mar 1;89(5):1827–31.  158. Parle-Mcdermott A, Harrison A. DNA methylation: a timeline of methods and applications. Front Genet. 2011.  159. Ramsay M. Epigenetic epidemiology: is there cause for optimism? Epigenomics. 2015 Aug;7(5):683–5.  160. Sandoval J, Heyn H, Moran S, Serra-Musach J, Pujana MA, Bibikova M, et al. Validation of a DNA methylation microarray for 450,000 CpG sites in the human genome. Epigenetics. 2014 Oct 27;6(6):692–702.  161. Bibikova M, Le J, Barnes B, Saedinia-Melnyk S, Zhou L, Shen R, et al. Genome-wide DNA methylation profiling using Infinium® assay. Epigenomics. 2009 Oct;1(1):177–200.  162. Dedeurwaerder S, Defrance M, Calonne E, Denis H, Sotiriou C, Fuks F. Evaluation of the Infinium Methylation 450K technology. Epigenomics. 2011 Dec;3(6):771–84.  163. Dedeurwaerder S, Defrance M, Bizet M, Calonne E, Bontempi G, Fuks F. A comprehensive overview of Infinium HumanMethylation450 data processing. Briefings in Bioinformatics. 2014 Nov 19;15(6):929–41.  164. Du P, Kibbe WA, Lin SM. lumi: a pipeline for processing Illumina microarray. Bioinformatics. 2008 Jul 1;24(13):1547–8.  165. Price EM, Cotton AM, Lam LL, Farré P, Emberly E, Brown CJ, et al. 
Additional annotation enhances potential for biologically-relevant analysis of the Illumina Infinium HumanMethylation450 BeadChip array. Epigenetics & Chromatin. BioMed Central; 2013 Mar 3;6(1):4.  166. Teschendorff AE, Marabita F, Lechner M, Bartlett T, Tegner J, Gomez-Cabrero D, et al. A beta-mixture quantile normalization method for correcting probe design bias in Illumina Infinium 450 k DNA methylation data. Bioinformatics. 2013 Jan 16;29(2):189–96.  167. Maksimovic J, Gordon L, Oshlack A. SWAN: Subset-quantile Within Array Normalization for Illumina Infinium HumanMethylation450 BeadChips. Genome Biol. BioMed Central Ltd; 2012 Jun 15;13(6):R44.  168. Bolstad BM, Irizarry RA, Astrand M, Speed TP. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics. 2003 Jan 22;19(2):185–93.  169. Houseman EA, Accomando WP, Koestler DC, Christensen BC, Marsit CJ, et al. DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinformatics [Internet]. 2012;13(1):86.  170. Koestler DC, Christensen BC, Karagas MR, Marsit CJ, Langevin SM, et al. Blood-based profiles of DNA methylation predict the underlying distribution of cell types. Epigenetics [Internet]. 2014 Oct 27;8(8):816–26.  171. Smith AK, Kilaru V, Klengel T, Mercer KB, Bradley B, Conneely KN, et al. DNA extracted from saliva for methylation studies of psychiatric traits: evidence tissue specificity and relatedness to brain. 2015 Jan;168B(1):36–44.  172. Jones MJ, Islam SA, Edgar RD, Kobor MS. Adjusting for Cell Type Composition in DNA Methylation Data Using a Regression-Based Approach. Methods Mol Biol. New York, NY: Springer New York; 2017;1589(Suppl 2):99–106.  173. Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics. 2006 Dec 6;8(1):118–27.  174. Nygaard V, Rødland EA, Hovig E. 
Methods that remove batch effects while retaining group differences may lead to exaggerated confidence in downstream analyses. Biostatistics. 2016 Jan;17(1):29–39.  175. Du P, Zhang X, Huang C-C, Jafari N, Kibbe WA, Hou L, et al. Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis. BMC Bioinformatics. BioMed Central; 2010 Nov 30;11(1):587.  176. Lemire M, Zaidi SHE, Ban M, Ge B, Aïssi D, Germain M, et al. Long-range epigenetic regulation is conferred by genetic variation located at thousands of independent loci. Nature Communications. Nature Publishing Group; 2015 Feb 26;6:6326.  177. Smith AK, Kilaru V, Kocak M, Almli LM, Mercer KB, Ressler KJ, et al. Methylation quantitative trait loci (meQTLs) are consistently detected across ancestry, developmental stage, and tissue type. BMC Genomics. BMC Genomics; 2014 Feb 21;15(1):1–11.  178. Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research. 2015 Jan 20;43(7):e47–7.  179. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the royal statistical society Series B ( …. 1995;57(1):289–300.  180. Boyce WT, Kobor MS. Development and the epigenome: the “synapse” of gene-environment interplay. Dev Sci. 2014;18.  181. Chadwick LH, Sawa A, Yang IV, Baccarelli A, Breakefield XO, Deng HWW. New insights and updated guidelines for epigenome-wide association studies. Neuroepigenetics. 2015;1C.  182. Waterland RA, Michels KB. Epigenetic Epidemiology of the Developmental Origins Hypothesis. Annu Rev Nutr. 2007 Aug;27(1):363–88.  183. Birney E, Smith GD, Greally JM. Epigenome-wide Association Studies and the Interpretation of Disease -Omics. Barsh GS, editor. PLoS Genet. 2016 Jun 23;12(6):e1006105–9.  184. Rakyan VK, Down TA, Balding DJ, Beck S. 
Epigenome-wide association studies for common human diseases. Nat Rev Genet. 2011;12.  185. Henikoff S, Greally JM. Epigenetics, cellular memory and gene regulation. Curr Biol. 2016;26.  186. Bird A. Perceptions of epigenetics. Nature. 2007;447.  187. Jones PA. Functions of DNA methylation: islands, start sites, gene bodies and beyond. Nat Rev Genet. 2012;13.  188. Bock C. Analysing and interpreting DNA methylation data. Nature Reviews Genetics. Nature Publishing Group; 2012 Oct 1;13(10):705–19.  189. Jirtle RL, Skinner MK. Environmental epigenomics and disease susceptibility. Nat Rev Genet. 2007;8.  190. Meaney MJ. Epigenetics and the biological definition of gene x environment interactions. Child Development. 2010 Jan;81(1):41–79.  191. Yuen RKC, Jiang R, Peñaherrera MS, McFadden DE, Robinson WP. Genome-wide mapping of imprinted differentially methylated regions by DNA methylation profiling of human placentas from triploidies. Epigenet Chromatin. 2011;4.  192. Laird PW. Principles and challenges of genomewide DNA methylation analysis. Nat Rev Genet. 2010;11.  193. Michels KB, Binder AM, Dedeurwaerder S, Epstein CB, Greally JM, Gut I. Recommendations for the design and analysis of epigenome-wide association studies. Nat Methods. 2013;10.  194. Li Y, Zhu J, Tian G, Li N, Li Q, Ye M. The DNA methylome of human peripheral blood mononuclear cells. PLoS Biol. 2010;8.  195. Hatchwell E, Greally JM. The potential role of epigenomic dysregulation in complex human disease. Trends Genet. 2007;23.  196. Alisch RS, Barwick BG, Chopra P, Myrick LK, Satten GA, Conneely KN. Age-associated DNA methylation in pediatric populations. Genome Res. 2012;22.  197. Jaffe AE, Irizarry RA. Accounting for cellular heterogeneity is critical in epigenome-wide association studies. Genome Biol [Internet]. 2014;15(2):R31.  198. Gao X, Jia M, Zhang Y, Breitling LP, Brenner H. 
DNA methylation changes of whole blood cells in response to active smoking exposure in adults: a systematic review of DNA methylation studies. Clin Epigenet. BioMed Central; 2015 Dec 1;7(1):113.  199. Su D, Wang X, Campbell MR, Porter DK, Pittman GS, Bennett BD. Distinct epigenetic effects of tobacco smoking in whole blood and among leukocyte subtypes. PLoS ONE. 2016;11.  200. Bauer M, Fink B, Thürmann L, Eszlinger M, Herberth G, Lehmann I. Tobacco smoking differently influences cell types of the innate and adaptive immune system—indications from CpG site methylation. Clin Epigenet. 2016;8.  201. Bauer M, Linsel G, Fink B, Offenberg K, Hahn AM, Sack U. A varying T cell subtype explains apparent tobacco smoking induced single CpG hypomethylation in whole blood. Clin Epigenet. 2015;7.  202. Kundaje A, Ernst J, Yen A, Ziller MJ, Whitaker JW, Sandstrom RS. Integrative analysis of 111 reference human epigenomes. Nature. 2015;518.  203. Farré P, Jones MJ, Meaney MJ, Emberly E, Turecki G, Kobor MS. Concordant and discordant DNA methylation signatures of aging in human blood and brain. Epigenet Chromatin. 2015;8.  204. Hannon E, Lunnon K, Schalkwyk L, Mill J. Interindividual methylomic variation across blood, cortex, and cerebellum: implications for epigenetic studies of neurological and neuropsychiatric phenotypes. Epigenetics. 2015 Oct 19;10(11):1024–32.  205. Edgar RD, Jones MJ, Meaney MJ, Turecki G, Kobor MS. BECon: a tool for interpreting DNA methylation findings from blood in the context of brain. Transl Psychiatry. 2017;7.  206. Lowe R, Gemma C, Beyan H, Hawa MI, Bazeos A, Leslie RD. Buccals are likely to be a more informative surrogate tissue than blood for epigenome-wide association studies. Epigenetics. 2014;8.  207. Slieker RC, Bos SD, Goeman JJ, Bovée JVMG, Talens RP, Breggen R. Identification and systematic annotation of tissue-specific differentially methylated regions using the Illumina 450k array. Epigenetics Chromatin. 2013;6.  208. 
Smith AK, Kilaru V, Klengel T, Mercer KB, Bradley B, Conneely KN. DNA extracted from saliva for methylation studies of psychiatric traits: evidence tissue specificity and relatedness to brain. Am J Med Genet Part B Neuropsychiatr Genet. 2015;168.  209. Grundberg E, Meduri E, Sandling JK, Hedman AK, Keildson S, Buil A. Global analysis of DNA methylation variation in adipose tissue from twins reveals links to disease-associated variants in distal regulatory elements. Am J Hum Genet. 2013;93.  210. Gertz J, Varley KE, Reddy TE, Bowling KM, Pauli F, Parker SL, et al. Analysis of DNA Methylation in a Three-Generation Family Reveals Widespread Genetic Influence on Epigenetic Regulation. Bickmore WA, editor. PLoS Genet. Public Library of Science; 2011 Aug 11;7(8):e1002228.  211. Chen L, Ge B, Casale FP, Vasquez L, Kwan T, Garrido-Martín D, et al. Genetic Drivers of Epigenetic and Transcriptional Variation in Human Immune Cells. Cell. Elsevier; 2016 Nov 17;167(5):1398–1414.e24.  212. Cheung WA, Shao X, Morin A, Siroux V, Kwan T, Ge B. Functional variation in allelic methylomes underscores a strong genetic contribution and reveals novel epigenetic alterations in the human epigenome. Genome Biol. 2017;18.  213. Gutierrez-Arcelus M, Ongen H, Lappalainen T, Montgomery SB, Buil A, Yurovsky A, et al. Tissue-Specific Effects of Genetic and Epigenetic Variation on Gene Regulation and Splicing. Brown CD, editor. PLoS Genet. 2015 Jan 29;11(1):e1004958–25.  214. Fraser HB, Lam LL, Neumann SM, Kobor MS. Population-specificity of human DNA methylation. Genome Biol. 2012;13.  215. Klengel T, Mehta D, Anacker C, Rex-Haffner M, Pruessner JC, Pariante CM. Allele-specific FKBP5 DNA demethylation mediates gene–childhood trauma interactions. Nat Neurosci. 2012;16.  216. Gaunt TR, Shihab HA, Hemani G, Min JL, Woodward G, Lyttleton O, et al. Systematic identification of genetic influences on methylation across the human life course. Genome Biol. 2016 Mar 31;17(1):1256–15.  217. 
Portales-Casamar E, Lussier AA, Jones MJ, MacIsaac JL, Edgar RD, Mah SM. DNA methylation signature of human fetal alcohol spectrum disorder. Epigenet Chromatin. 2016;9.  218. Miller GE, Chen E, Fok AK, Walker H, Lim A, Nicholls EF. Low early-life social class leaves a biological residue manifested by decreased glucocorticoid and increased proinflammatory signaling. Proc Natl Acad Sci USA. 2009;106.  219. Price EM, Cotton AM, Lam LL, Farré P. Additional annotation enhances potential for biologically-relevant analysis of the Illumina Infinium HumanMethylation450 BeadChip array. Epigenet Chromatin. 2013;6.  220. Hicks SC, Irizarry RA. quantro: a data-driven approach to guide the choice of an appropriate normalization method. Genome Biol. 2015;16.  221. Reinius LE, Acevedo N, Joerink M, Pershagen G, Dahlén SE, Greco D. Differential DNA methylation in purified human blood cells: implications for cell lineage and studies on disease susceptibility. PLoS ONE. 2012;7.  222. Reinius LE, Acevedo N, Joerink M, Pershagen G, Dahlen S-E, et al. Differential DNA Methylation in Purified Human Blood Cells: Implications for Cell Lineage and Studies on Disease Susceptibility. Ting AH, editor. PLoS ONE [Internet]. 2012 Jul 25;7(7):e41361.  223. Eipel M, Mayer F, Arent T, Ferreira MRP, Birkhofer C, Gerstenmaier U. Epigenetic age predictions based on buccal swabs are more precise in combination with cell type-specific DNA methylation signatures. Aging (Albany NY). 2016;8.  224. Theda C, Hwang SH, Czajko A, Loke YJ, Leong P, Craig JM. Quantitation of the cellular content of saliva and buccal swab samples. Sci Rep. 2018;8.  225. Edgar RD, Jones MJ, Meaney MJ, Turecki G, Kobor MS. BECon: a tool for interpreting DNA methylation findings from blood in the context of brain. Nature Publishing Group; 2017 Jul 26;7(8):e1187–10.  226. Guo Y, He J, Zhao S, Wu H, Zhong X, Sheng Q. Illumina human exome genotyping array clustering and quality control. Nat Protoc. 
2014;9.  227. Gaffney DJ, Veyrieras JB, Degner JF, Pique-Regi R, Pai AA, Crawford GE. Dissecting the regulatory architecture of gene expression QTLs. Genome Biol. BioMed Central; 2012;13(1):1–15.  228. Lappalainen T, Sammeth M, Friedländer MR, 't Hoen PAC, Monlong J, Rivas MA. Transcriptome and genome sequencing uncovers functional variation in humans. Nature. 2013;501.  229. Banovich NE, Lan X, McVicker G, van de Geijn B, Degner JF, Blischak JD, et al. Methylation QTLs Are Associated with Coordinated Changes in Transcription Factor Binding, Histone Modifications, and Gene Expression Levels. Reddy TE, editor. PLoS Genet [Internet]. 2014 Sep 18;10(9):e1004663.  230. Hannon E, Spiers H, Viana J, Pidsley R, Burrage J, Murphy TM, et al. Methylation QTLs in the developing brain and their enrichment in schizophrenia risk loci. Nat Neurosci. 2015 Nov 30;19(1):48–54.  231. Almstrup K, Lindhardt Johansen M, Busch AS, Hagen CP, Nielsen JE, Petersen JH. Pubertal development in healthy children is mirrored by DNA methylation patterns in peripheral blood. Sci Rep. 2016;6.  232. Xu CJ, Bonder MJ, Söderhäll C, Bustamante M, Baïz N, Gehring U. The emerging landscape of dynamic DNA methylation in early childhood. BMC Genomics. 2017;18.  233. Fisher HL, Murphy TM, Arseneault L, Caspi A, Moffitt TE, Viana J. Methylomic analysis of monozygotic twins discordant for childhood psychotic symptoms. Epigenetics. 2015;10.  234. Berko ER, Suzuki M, Beren F, Lemetre C, Alaimo CM, Calder RB. Mosaic epigenetic dysregulation of ectodermal cells in autism spectrum disorder. PLoS Genet. 2014;10.  235. McEwen LM, Morin AM, Edgar RD, MacIsaac JL, Jones MJ, Dow WH. Differential DNA methylation and lymphocyte proportions in a Costa Rican high longevity region. Epigenet Chromatin. 2017;10.  236. Fraga MF, Ballestar E, Paz MF, Ropero S, Setien F, Ballestar ML. Epigenetic differences arise during the lifetime of monozygotic twins. Proc Natl Acad Sci USA. 2005;102.  237. 
Slieker RC, Iterson M, Luijk R, Beekman M, Zhernakova DV, Moed MH. Age-related accrual of methylomic variability is linked to fundamental ageing mechanisms. Genome Biol. 2016;17.  238. Poulsen P, Esteller M, Vaag A, Fraga MF. The epigenetic basis of twin discordance in age-related diseases. Pediatr Res. 2007;61.  239. Kaminsky ZA, Tang T, Wang S-C, Ptak C, Oh GHT, Wong AHC, et al. DNA methylation profiles in monozygotic and dizygotic twins. Nat Genet. 2009 Jan 18;41(2):240–5.  240. Martino DJ, Tulic MK, Gordon L, Hodder M, Richman TR, Metcalfe J, et al. Evidence for age-related and individual-specific changes in DNA methylation profile of mononuclear cells during early immune development in humans. Epigenetics. 2014 Oct 27;6(9):1085–94.  241. Jones MJ, Goodman SJ, Kobor MS. DNA methylation and healthy human aging. Aging Cell. John Wiley & Sons, Ltd (10.1111); 2015;14(6):924–32.  242. Dlugos DJ, Scattergood TM, Ferraro TN, Berrettinni WH, Buono RJ. Recruitment rates and fear of phlebotomy in pediatric patients in a genetic study of epilepsy. Epilepsy Behav. 2005;6.  243. Lin X, Teh AL, Chen L, Lim IY, Tan PF, MacIsaac JL. Choice of surrogate tissue influences neonatal EWAS findings. BMC Med. 2017;15.  244. Teschendorff AE, Yang Z, Wong A, Pipinikas CP, Jiao Y, Jones A. Correlation of smoking-associated DNA methylation changes in buccal cells with DNA methylation changes in epithelial cancer. JAMA Oncol. 2015;1.  245. Houtepen LC, Vinkers CH, Carrillo-Roa T, Hiemstra M, van Lier PA, Meeus W, et al. Genome-wide DNA methylation levels and altered cortisol stress reactivity following childhood trauma in humans. Nature Communications. Nature Publishing Group; 2016 Mar 21;7(1):10967.  246. Andrews SV, Ellis SE, Bakulski KM, Sheppard B, Croen LA, Hertz-Picciotto I, et al. Cross-tissue integration of genetic and epigenetic data offers insight into autism spectrum disorder. 2017 Oct 11;:1–9.  247. Gibbs JR, van der Brug MP, Hernandez DG, Traynor BJ, Nalls MA, Lai S-L, et al. 
Abundant Quantitative Trait Loci Exist for DNA Methylation and Gene Expression in Human Brain. PLoS Genet. Public Library of Science; 6(5):e1000952.
248. Calvano SE, Xiao W, Richards DR, Felciano RM, Baker HV, Cho RJ. A network-based analysis of systemic inflammation in humans. Nature. 2005;437.
249. Pacis A, Tailleux L, Morin AM, Lambourne J, MacIsaac JL, Yotova V, et al. Bacterial infection remodels the DNA methylation landscape of human dendritic cells. Genome Research [Internet]. Cold Spring Harbor Lab; 2015 Dec 1;25(12):1801–11.
250. Marr AK, MacIsaac JL, Jiang R, Airo AM, Kobor MS, McMaster WR. Leishmania donovani Infection Causes Distinct Epigenetic DNA Methylation Changes in Host Macrophages. Horn D, editor. PLoS Pathog [Internet]. 2014 Oct 9;10(10):e1004419.
251. Jones MJ, Moore SR, Kobor MS. Principles and challenges of applying epigenetic epidemiology to psychology. Annu Rev Psychol. 2017;69.
252. Czamara D, Eraslan G, Lahti J, Page CM, Lahti-Pulkkinen M, Hämäläinen E, et al. Variably methylated regions in the newborn epigenome: environmental, genetic and combined influences. bioRxiv. Cold Spring Harbor Laboratory; 2018 Oct 17;:436113.
253. Allum F, Shao X, Guénard F, Simon MM, Busche S, Caron M. Characterization of functional methylomes by next-generation capture sequencing identifies novel disease-associated variants. Nat Commun. 2015;6.
254. LaSalle JM, Powell WT, Yasui DH. Epigenetic layers and players underlying neurodevelopment. Trends Neurosci. Elsevier; 2013 Aug;36(8):460–70.
255. Ng JWY, Barrett LM, Wong A, Kuh D, Smith GD, Relton CL. The role of longitudinal cohort studies in epigenetic epidemiology: challenges and opportunities. Genome Biol. 2012 Jun 29;13(6):246.
256. Yin Y, Morgunova E, Jolma A, Kaasinen E, Sahu B, Khund-Sayeed S, et al. Impact of cytosine methylation on DNA binding specificities of human transcription factors. Science. American Association for the Advancement of Science; 2017 May 5;356(6337):eaaj2239–17.
257.
Vinkers CH, Kalafateli AL, Rutten BP, Kas MJ, Kaminsky Z, Turner JD, et al. Traumatic stress and human DNA methylation: a critical review. Epigenomics. 2015 Jun;7(4):593–608.
258. Bell JT, Pai AA, Pickrell JK, Gaffney DJ, Pique-Regi R, Degner JF, et al. DNA methylation patterns associate with genetic and gene expression variation in HapMap cell lines. Genome Biol. BioMed Central; 2011;12(1):R10.
259. Conradt E, Fei M, LaGasse L, Tronick E, Guerin D, Gorman D, et al. Prenatal predictors of infant self-regulation: the contributions of placental DNA methylation of NR3C1 and neuroendocrine activity. Front Behav Neurosci. Frontiers; 2015;9:130.
260. Ouellet-Morin I, Wong CCY, Danese A, Pariante CM, Papadopoulos AS, Mill J, et al. Increased serotonin transporter gene (SERT) DNA methylation is associated with bullying victimization and blunted cortisol response to stress in childhood: a longitudinal study of discordant monozygotic twins. Psychol Med. Cambridge University Press; 2013 Sep;43(9):1813–23.
261. Guillemin C, Provençal N, Suderman M, Côté SM, Vitaro F, Hallett M, et al. DNA methylation signature of childhood chronic physical aggression in T cells of both men and women. Marsit CJ, editor. PLoS ONE. Public Library of Science; 2014;9(1):e86822.
262. Alisch RS, Chopra P, Fox AS, Chen K, White ATJ, Roseboom PH, et al. Differentially methylated plasticity genes in the amygdala of young primates are linked to anxious temperament, an at risk phenotype for anxiety and depressive disorders. J Neurosci. Society for Neuroscience; 2014 Nov 19;34(47):15548–56.
263. Rothbart MK, Derryberry D. Development of individual differences in temperament. In: Lamb ME, Brown AL, editors. Advances in developmental psychology. Hillsdale, NJ: Erlbaum; 1981. pp. 37–86.
264. Goldsmith HH, Buss AH, Plomin R, Rothbart MK, Thomas A, Chess S, et al. Roundtable: what is temperament? Four approaches. Child Development. 1987 Apr;58(2):505–29.
265. Kagan J. Temperament. Rev. ed.
Encyclopedia on Early Childhood Development [online]; 2012.
266. Rothbart MK. Temperament in Childhood: A framework. In: Kohnstamm GA, Bates JE, Rothbart MK, editors. Temperament in Childhood. Wiley; 1989. pp. 59–73.
267. Rutter M. Psychopathology and development: I. Childhood antecedents of adult psychiatric disorder. Aust N Z J Psychiatry. 1984 Sep;18(3):225–34.
268. Caspi A, Moffitt TE, Newman DL, Silva PA. Behavioral observations at age 3 years predict adult psychiatric disorders. Longitudinal evidence from a birth cohort. Arch Gen Psychiatry. 1996 Nov;53(11):1033–9.
269. Pine DS, Fox NA. Childhood antecedents and risk for adult mental disorders. Annu Rev Psychol. Annual Reviews; 2015 Jan 3;66(1):459–85.
270. Muris P, Ollendick TH. The role of temperament in the etiology of child psychopathology. Clin Child Fam Psychol Rev. 2005 Dec;8(4):271–89.
271. Alkon A, Goldstein LH, Smider N, Essex MJ, Kupfer DJ, Boyce WT, et al. Developmental and contextual influences on autonomic reactivity in young children. Dev Psychobiol. 2002 Dec 6;42(1):64–78.
272. Gray JA. Neural systems, emotion and personality. In: Madden J, editor. Neurobiology of Learning, Emotion, and Affect. 4 ed. Raven Press; 1991.
273. Gartstein MA, Rothbart MK. Studying infant temperament via the Revised Infant Behavior Questionnaire. Infant Behavior and Development. 2003 Feb;26(1):64–86.
274. Rothbart MK, Ahadi SA, Hershey KL, Fisher P. Investigations of temperament at three to seven years: the Children's Behavior Questionnaire. Child Development. 2001 Sep;72(5):1394–408.
275. Parade SH, Leerkes EM. The reliability and validity of the Infant Behavior Questionnaire-Revised. Infant Behavior and Development. 2008 Dec;31(4):637–46.
276. Putnam SP, Rothbart MK. Development of short and very short forms of the Children's Behavior Questionnaire. J Pers Assess. Lawrence Erlbaum Associates, Inc; 2006 Aug;87(1):102–12.
277. Gagne JR, Van Hulle CA, Aksan N, Essex MJ, Goldsmith HH.
Deriving childhood temperament measures from emotion-eliciting behavioral episodes: Scale construction and initial validation. Psychological Assessment. American Psychological Association; 2011 Jun 1;23(2):337–53.
278. Boyce WT, Essex MJ, Woodward HR, Measelle JR, Ablow JC, Kupfer DJ, et al. The Confluence of Mental, Physical, Social, and Academic Difficulties in Middle Childhood. I: Exploring the “Headwaters” of Early Life Morbidities. American Academy of Child and Adolescent Psychiatry; 2002 May 1;41(5):580–7.
279. Essex MJ, Boyce WT, Goldstein LH, Armstrong JM, Kraemer HC, Kupfer DJ, et al. The Confluence of Mental, Physical, Social, and Academic Difficulties in Middle Childhood. II: Developing the MacArthur Health and Behavior Questionnaire. American Academy of Child and Adolescent Psychiatry; 2002 May 1;41(5):588–603.
280. Lemery-Chalfant K, Shreiber JE, Schmidt NL, Van Hulle CA, Essex MJ, Goldsmith HH. Assessing Internalizing, Externalizing, and Attention Problems in Young Children: Validation of the MacArthur HBQ. Journal of the American Academy of Child & Adolescent Psychiatry. 2007 Oct;46(10):1315–23.
281. Luby JL, Heffelfinger A, Measelle JR, Ablow JC, Essex MJ, Dierker L, et al. Differential performance of the MacArthur HBQ and DISC-IV in identifying DSM-IV internalizing psychopathology in young children. Journal of the American Academy of Child & Adolescent Psychiatry. 2002 Apr;41(4):458–66.
282. Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, et al. Missing value estimation methods for DNA microarrays. Bioinformatics. 2001;17(6):520–5.
283. Bourgon R, Gentleman R, Huber W. Independent filtering increases detection power for high-throughput experiments. 2010 May 25;107(21):9546–51.
284. Ringnér M. What is principal component analysis? Nat Biotechnol. 2008 Mar;26(3):303–4.
285. Bro R, Smilde AK. Principal component analysis. Anal Methods. 2014;6(9):2812–20.
286. Lee HK, Braynen W, Keshav K, Pavlidis P.
ErmineJ: tool for functional analysis of gene expression data sets. BMC Bioinformatics. BioMed Central; 2005 Nov 9;6(1):269.
287. Gillis J, Mistry M, Pavlidis P. Gene function analysis in complex data sets using ErmineJ. Nature Protocols. Nature Publishing Group; 2010 Jun 1;5(6):1148–59.
288. Price EM, Peñaherrera MS, Portales-Casamar E, Pavlidis P, Van Allen MI, McFadden DE, et al. Profiling placental and fetal DNA methylation in human neural tube defects. Epigenetics & Chromatin. BioMed Central; 2016 Feb 16;9(1):993.
289. Singmann P, Shem-Tov D, Wahl S, Grallert H, Fiorito G, Shin S-Y, et al. Characterization of whole-genome autosomal differences of DNA methylation between men and women. Epigenetics & Chromatin. BioMed Central; 2015;8(1):43.
290. Verhulst FC, van der Ende J, Ferdinand RF, Kasius MC. The prevalence of DSM-III-R diagnoses in a national sample of Dutch adolescents. Arch Gen Psychiatry. 1997 Apr;54(4):329–36.
291. Arnett JJ. Emerging adulthood: A theory of development from the late teens through the twenties. American Psychologist. American Psychological Association; 2000 May 1;55(5):469–80.
292. Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, et al. The Genotype-Tissue Expression (GTEx) project. Nat Genet. 2013 May 29;45(6):580–5.
293. Stühmer T, Anderson SA, Ekker M, Rubenstein JLR. Ectopic expression of the Dlx genes induces glutamic acid decarboxylase and Dlx expression. Development. 2002 Jan;129(1):245–52.
294. Arcus D, Kagan J. Temperament and craniofacial variation in the first two years. Child Development. 1995 Oct;66(5):1529–40.
295. Giannoukakis N, Deal C, Paquette J, Goodyer CG, Polychronakos C. Parental genomic imprinting of the human IGF2 gene. Nat Genet. 1993 May 1;4(1):98–101.
296. Joyce JA, Lam WK, Catchpoole DJ, Jenks P, Reik W, Maher ER, et al. Imprinting of IGF2 and H19: lack of reciprocity in sporadic Beckwith-Wiedemann syndrome. Human Molecular Genetics. 1997 Sep;6(9):1543–8.
297.
Weksberg R, Shen DR, Fei YL, Song QL, Squire J. Disruption of Insulin-Like Growth Factor-II Imprinting in Beckwith-Wiedemann Syndrome. Nat Genet. 1993 Oct;5(2):143–50.
298. Eggermann T. Silver-Russell and Beckwith-Wiedemann syndromes: opposite (epi)mutations in 11p15 result in opposite clinical pictures. Horm Res. Karger Publishers; 2009 Apr;71 Suppl 2(Suppl. 2):30–5.
299. Tycko B. Epigenetic gene silencing in cancer. J Clin Invest. American Society for Clinical Investigation; 2000 Feb;105(4):401–7.
300. Choufani S, Shuman C, Weksberg R. Molecular findings in Beckwith-Wiedemann syndrome. Am J Med Genet C Semin Med Genet. 2013 May;163C(2):131–40.
301. Rijlaarsdam J, Cecil CAM, Walton E, Mesirow MSC, Relton CL, Gaunt TR, et al. Prenatal unhealthy diet, insulin-like growth factor 2 gene (IGF2) methylation, and attention deficit hyperactivity disorder symptoms in youth with early-onset conduct problems. J Child Psychol Psychiatry. 5 ed. 2017 Jan;58(1):19–27.
302. Vangeel EB, Izzi B, Hompes T, Vansteelandt K, Lambrechts D, Freson K, et al. DNA methylation in imprinted genes IGF2 and GNASXL is associated with prenatal maternal stress. Genes Brain Behav. Blackwell Publishing Ltd; 2015 Nov;14(8):573–82.
303. Walton E, Hass J, Liu J, Roffman JL, Bernardoni F, Roessner V, et al. Correspondence of DNA Methylation Between Blood and Brain Tissue and Its Application to Schizophrenia Research. Schizophr Bull. 2016 Mar;42(2):406–14.
304. Simmons JM, Quinn KJ. The NIMH Research Domain Criteria (RDoC) Project: implications for genetics research. Mamm Genome. 2013 Oct 2;25(1-2):23–31.
305. Morris SE, Cuthbert BN. Research Domain Criteria: cognitive systems, neural circuits, and dimensions of behavior. Dialogues Clin Neurosci. Les Laboratoires Servier; 2012 Mar;14(1):29–37.
306. Adler N, Singh-Manoux A, Schwartz J, Stewart J, Matthews K, Marmot MG.
Social status and health: A comparison of British civil servants in Whitehall-II with European- and African-Americans in CARDIA. Social Science & Medicine. 2008 Mar;66(5):1034–45.
307. Shonkoff JP. Capitalizing on Advances in Science to Reduce the Health Consequences of Early Childhood Adversity. JAMA Pediatr. 2016 Oct 1;170(10):1003–5.
308. Phelan JC, Link BG. Social conditions as fundamental causes of health inequalities. J Health Soc Behav. 1995;:80–94.
309. Hertzman C. The biological embedding of early experience and its effects on health in adulthood. Annals of the New York Academy of Sciences. 1999;896:85–95.
310. Fagundes CP, Glaser R, Kiecolt-Glaser JK. Stressful early life experiences and immune dysregulation across the lifespan. Brain Behav Immun. 2013 Jan;27:8–12.
311. Chen E, Miller GE, Kobor MS, Cole SW. Maternal warmth buffers the effects of low early-life socioeconomic status on pro-inflammatory signaling in adulthood. Molecular Psychiatry. Nature Publishing Group; 2010 May 18;16(7):729–37.
312. Dowd JB, Simanek AM, Aiello AE. Socio-economic status, cortisol and allostatic load: a review of the literature. Int J Epidemiol. 2009 Oct 1;38(5):1297–309.
313. Chen E, Fisher EB, Bacharier LB, Strunk RC. Socioeconomic Status, Stress, and Immune Markers in Adolescents With Asthma. Psychosomatic Medicine. 2003 Nov;65(6):984–92.
314. Borghol N, Suderman M, McArdle W, Racine A, Hallett M, Pembrey M, et al. Associations with early-life socio-economic position in adult DNA methylation. Int J Epidemiol. 2012 Mar 14;41(1):62–74.
315. Boyce WT, Kobor MS. Development and the epigenome: the “synapse” of gene-environment interplay. Dev Sci. 2014 Dec 28;18(1):1–23.
316. Yuen RK, Neumann SM, Fok AK, Peñaherrera MS, McFadden DE, Robinson WP, et al. Extensive epigenetic reprogramming in human somatic tissues between fetus and adult. Epigenetics & Chromatin. BioMed Central Ltd; 2011 May 5;4(1):7.
317.
McDade TW, Jones MJ, Miller G, Borja J, Kobor MS, Kuzawa CW. Birth weight and postnatal microbial exposures predict the distribution of peripheral blood leukocyte subsets in young adults in the Philippines. J Dev Orig Health Dis. Cambridge University Press; 2018 Apr;9(2):198–207.
318. Galanter JM, Gignoux CR, Oh SS, Torgerson D. Differential methylation between ethnic sub-groups reflects the effect of genetic ancestry and environmental exposures. eLife. 2017.
319. Fagny M, Patin E, MacIsaac JL, Rotival M, Flutre T, Flutre TE, et al. The epigenomic landscape of African rainforest hunter-gatherers and farmers. Nature Communications [Internet]. 2015 Nov 30;6:10047.
320. Jones MJ, Fejes AP, Kobor MS. DNA methylation, genotype and gene expression: who is driving and who is along for the ride? Genome Biol [Internet]. 2013;14(7):126.
321. Horvath S. DNA methylation age of human tissues and cell types. Genome Biol [Internet]. 2013;14(10):R115.
322. Shulha HP, Cheung I, Guo Y, Akbarian S, Weng Z. Coordinated Cell Type–Specific Epigenetic Remodeling in Prefrontal Cortex Begins before Birth and Continues into Early Adulthood. Ren B, editor. PLoS Genet. 2013 Apr 11;9(4):e1003433–12.
323. Tobi EW, Slieker RC, Luijk R, Dekkers KF, Stein AD, Xu KM, et al. DNA methylation as a mediator of the association between prenatal adversity and risk factors for metabolic disease in adulthood. Sci Adv. American Association for the Advancement of Science; 2018 Jan;4(1):eaao4364.
324. Sosnowski DW, Booth C, York TP, Amstadter AB, Kliewer W. Maternal prenatal stress and infant DNA methylation: A systematic review. Dev Psychobiol. 2018 Jan 18;60(2):127–39.
325. Bush NR, Edgar RD, Park M, MacIsaac JL, McEwen LM, Adler NE, et al. The biological embedding of early-life socioeconomic status and family adversity in children's genome-wide DNA methylation. Epigenomics. 2018 Nov;10(11):1445–61.
326. Needham BL, Smith JA, Zhao W, Wang X, Mukherjee B, Kardia SLR, et al.
Life course socioeconomic status and DNA methylation in genes related to stress reactivity and inflammation: The multi-ethnic study of atherosclerosis. Epigenetics. 2015 Aug 12;10(10):958–69.
327. Stringhini S, Polidoro S, Sacerdote C, Kelly RS, van Veldhoven K, Agnoli C, et al. Life-course socioeconomic status and DNA methylation of genes regulating inflammation. Int J Epidemiol. 2015 Sep 30;44(4):1320–30.
328. McDade TW, Ryan CP, Jones MJ, Hoke MK, Borja J, Miller GE, et al. Genome-wide analysis of DNA methylation in relation to socioeconomic status during development and early adulthood. Am J Phys Anthropol. 2019 Feb 15;169(1):3–11.
329. Cohen S, Alper CM, Doyle WJ, Adler N, Treanor JJ, Turner RB. Objective and subjective socioeconomic status and susceptibility to the common cold. Health Psychol. 2008 Mar;27(2):268–74.
330. Hu P, Adler NE, Goldman N, Weinstein M, Seeman TE. Relationship between subjective social status and measures of health in older Taiwanese persons. J Am Geriatr Soc. John Wiley & Sons, Ltd (10.1111); 2005 Mar;53(3):483–8.
331. Singh-Manoux A, Adler NE, Marmot MG. Subjective social status: its determinants and its association with measures of ill-health in the Whitehall II study. Social Science & Medicine. 2003 Mar;56(6):1321–33.
332. Singh-Manoux A, Marmot MG, Adler NE. Does Subjective Social Status Predict Health and Change in Health Status Better Than Objective Status? Psychosomatic Medicine. 2005 Nov;67(6):855–61.
333. Cohen S, Wills TA. Stress, social support, and the buffering hypothesis. Psychological Bulletin. 1985 Sep;98(2):310–57.
334. Hostinar CE, Johnson AE, Gunnar MR. Parent support is less effective in buffering cortisol stress reactivity for adolescents compared to children. Dev Sci. 2014 Jun 18;18(2):281–97.
335. Operario D, Adler NE, Williams DR. Subjective social status: reliability and predictive utility for global health. Psychology & Health. 2007 Feb;19(2):237–46.
336.
Shaked D, Williams M, Evans MK, Zonderman AB. Indicators of subjective social status: Differential associations across race and sex. SSM - Population Health. Elsevier; 2016 Dec 1;2(C):700–7.
337. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses. The American Journal of Human Genetics. 2007 Sep;81(3):559–75.
338. Jaenisch R, Bird A. Epigenetic regulation of gene expression: how the genome integrates intrinsic and environmental signals. Nat Genet. 2003 Mar;33(S3):245–54.
339. Lister R, Pelizzola M, Dowen RH, Hawkins RD, Hon G, Tonti-Filippini J, et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature. Nature Publishing Group; 2009 Nov 10;462(7271):315–22.
340. Hansen KD, Timp W, Bravo HC, Sabunciyan S, Langmead B, McDonald OG, et al. Increased methylation variation in epigenetic domains across cancer types. Nat Genet. 2011 Jun 26;43(8):768–75.
341. Landau DA, Clement K, Ziller MJ, Boyle P, Fan J, Gu H, et al. Locally Disordered Methylation Forms the Basis of Intratumor Methylome Variation in Chronic Lymphocytic Leukemia. Cancer Cell. 2014 Dec;26(6):813–25.
342. Eckhardt F, Lewin J, Cortese R, Rakyan VK, Attwood J, Burger M, et al. DNA methylation profiling of human chromosomes 6, 20 and 22. Nat Genet. Nature Publishing Group; 2006 Dec 1;38(12):1378–85.
343. Guo S, Diep D, Plongthongkum N, Fung H-L, Zhang K, Zhang K. Identification of methylation haplotype blocks aids in deconvolution of heterogeneous tissue samples and tumor tissue-of-origin mapping from plasma DNA. Nat Genet. 2017 Mar 6;49(4):635–42.
344. Miyamoto T, Hasuike S, Jinno Y, Soejima H, Yun K, Miura K, et al. The human ASCL2 gene escaping genomic imprinting and its expression pattern. J Assist Reprod Genet. Kluwer Academic Publishers-Plenum Publishers; 2002 May;19(5):240–4.
345. Pinderhughes EE, Dodge KA, Bates JE, Pettit GS, Zelli A.
Discipline responses: influences of parents' socioeconomic status, ethnicity, beliefs about parenting, stress, and cognitive-emotional processes. J Fam Psychol. 2000 Sep;14(3):380–400.
346. Singh-Manoux A, Adler NE, Marmot MG. Subjective social status: its determinants and its association with measures of ill-health in the Whitehall II study. Social Science & Medicine. 2003 Mar;56(6):1321–33.
347. Quon EC, McGrath JJ. Subjective socioeconomic status and adolescent health: A meta-analysis. Health Psychol. 2014 May;33(5):433–47.
348. Hannon E, Dempster E, Viana J, Burrage J, Smith AR, Macdonald R, et al. An integrated genetic-epigenetic analysis of schizophrenia: evidence for co-localization of genetic associations and differential DNA methylation. Genome Biol. Genome Biology; 2016 Aug 25;:1–16.
349. Gamazon ER, Badner JA, Cheng L, Zhang C, Zhang D, Cox NJ, et al. Enrichment of cis-regulatory gene expression SNPs and methylation quantitative trait loci among bipolar disorder susceptibility variants. Molecular Psychiatry. Nature Publishing Group; 2012 Jan 3;18(3):340–6.
350. Geisheker MR, Heymann G, Wang T, Coe BP, Turner TN, Stessman HAF, et al. Hotspots of missense mutation identify neurodevelopmental disorder genes and functional domains. Nat Neurosci. 2017 Jun 19;20(8):1043–51.
351. Shibata H, Joo A, Fujii Y, Tani A, Makino C, Hirata N, et al. Association study of polymorphisms in the GluR5 kainate receptor gene (GRIK1) with schizophrenia. Psychiatr Genet. 2001 Sep;11(3):139–44.
352. Lasky-Su J, Neale BM, Franke B, Anney RJL, Zhou K, Maller JB, et al. Genome-wide association scan of quantitative traits for attention deficit hyperactivity disorder identifies novel associations and confirms candidate gene associations. Am J Med Genet. 2008 Dec 5;147B(8):1345–54.
353. Ayalew M, Le-Niculescu H, Levey DF, Jain N, Changala B, Patel SD, et al. Convergent functional genomics of schizophrenia: from comprehensive understanding to genetic risk prediction.
Molecular Psychiatry. Nature Publishing Group; 2012 May 15;17(9):887–905.
354. Gluckman PD, Hanson MA, Cooper C, Thornburg KL. Effect of In Utero and Early-Life Conditions on Adult Health and Disease. N Engl J Med. 2008 Jul 3;359(1):61–73.
355. Painter RC, Osmond C, Gluckman P, Hanson M, Phillips D, Roseboom TJ. Transgenerational effects of prenatal exposure to the Dutch famine on neonatal adiposity and health in later life. BJOG: An International Journal of Obstetrics & Gynaecology. 2008 Sep;115(10):1243–9.
356. Cunliffe VT. The epigenetic impacts of social stress: how does social adversity become biologically embedded? Epigenomics. 2016 Dec;8(12):1653–69.
357. Tollerud DJ, Clark JW, Brown LM, Neuland CY, Pankiw-Trost LK, Blattner WA, et al. The influence of age, race, and gender on peripheral blood mononuclear-cell subsets in healthy nonsmokers. J Clin Immunol. 1989 May;9(3):214–22.
358. Pérez-de-Heredia F, Gómez-Martínez S, Díaz L-E, Veses AM, Nova E, Wärnberg J, et al. Influence of sex, age, pubertal maturation and body mass index on circulating white blood cell counts in healthy European adolescents—the HELENA study. Eur J Pediatr. Springer Berlin Heidelberg; 2015 Feb 10;174(8):999–1014.
359. Christensen BC, Houseman EA, Marsit CJ, Zheng S, Wrensch MR, Wiemels JL, et al. Aging and Environmental Exposures Alter Tissue-Specific DNA Methylation Dependent upon CpG Island Context. Schübeler D, editor. PLoS Genet [Internet]. 2009 Aug 14;5(8):e1000602.
360. Relton CL, Davey Smith G. Epigenetic Epidemiology of Common Complex Disease: Prospects for Prediction, Prevention, and Treatment. PLoS Med. Public Library of Science; 2010 Oct 26;7(10):e1000356–8.
361. Ollikainen M, Smith KR, Joo EJ-H, Ng HK, Andronikos R, Novakovic B, et al. DNA methylation analysis of multiple tissues from newborn twins reveals both genetic and intrauterine components to variation in the human neonatal epigenome. Human Molecular Genetics. 2010 Aug 10;19(21):4176–88.
362.
Azad MB, Lissitsyn Y, Miller GE, Becker AB, HayGlass KT, Kozyrskyj AL. Influence of Socioeconomic Status Trajectories on Innate Immune Responsiveness in Children. Ahuja SK, editor. PLoS ONE. 2012 Jun 7;7(6):e38669–9.
363. Reik W, Walter J. Genomic imprinting: parental influence on the genome. Nature Reviews Genetics. Nature Publishing Group; 2001 Jan;2(1):21–32.
364. Perera F, Herbstman J. Prenatal environmental exposures, epigenetics, and disease. Reproductive Toxicology. 2011 Apr;31(3):363–73.
365. Dolinoy DC, Jirtle RL. Environmental epigenomics in human health and disease. Environ Mol Mutagen. John Wiley & Sons, Ltd; 2008;49(1):4–8.
366. Woodfine K, Huddleston JE, Murrell A. Quantitative analysis of DNA methylation at all human imprinted regions reveals preservation of epigenetic stability in adult somatic tissue. Epigenetics & Chromatin. 2011 Jan 31;4(1):1.
367. King K, Murphy S, Hoyo C. Epigenetic regulation of Newborns' imprinted genes related to gestational growth: patterning by parental race/ethnicity and maternal socioeconomic status. Journal of Epidemiology & Community Health. BMJ Publishing Group Ltd; 2015 Jul;69(7):639–47.
368. Dolinoy DC, Das R, Weidman JR, Jirtle RL. Metastable Epialleles, Imprinting, and the Fetal Origins of Adult Diseases. Pediatr Res. 2007 May;61(5 Part 2):30R–37R.
369. Breton CV, Marsit CJ, Faustman E, Nadeau K, Goodrich JM, Dolinoy DC, et al. Small-Magnitude Effect Sizes in Epigenetic End Points are Important in Children’s Environmental Health Studies: The Children’s Environmental Health and Disease Prevention Research Center’s Epigenetics Working Group. Environ Health Perspect. 2017 Mar 14;125(4):1–16.
370. Belsky DW, Domingue BW, Wedow R, Arseneault L, Boardman JD, Caspi A, et al. Genetic analysis of social-class mobility in five longitudinal studies. Proc Natl Acad Sci USA. 2018 Jul 31;115(31):E7275–84.
371. Weng N-P, Akbar AN, Goronzy J.
CD28− T cells: their role in the age-associated decline of immune function. Trends in Immunology. 2009 Jul;30(7):306–12.
372. Guintivano J, Aryee MJ, Kaminsky ZA. A cell epigenotype specific model for the correction of brain cellular heterogeneity bias and its application to age, brain region and major depression. Epigenetics [Internet]. 2014 Oct 27;8(3):290–302.
373. Houseman EA, Molitor J, Marsit CJ. Reference-free cell mixture adjustments in analysis of DNA methylation data. Bioinformatics. 2014 May 15;30(10):1431–9.
374. Montaño CM, Irizarry RA, Kaufmann WE, Talbot K, Gur RE, Feinberg AP, et al. Measuring cell-type specific differential methylation in human brain tissue. Genome Biol. BioMed Central Ltd; 2013 Aug 30;14(8):R94.
375. Choufani S, Cytrynbaum C, Chung BHY, Turinsky AL, Grafodatskaya D, Chen YA, et al. NSD1 mutations generate a genome-wide DNA methylation signature. Nature Communications. 2015 Dec 22;6(1):109.
376. Butcher DT, Cytrynbaum C, Turinsky AL, Siu MT, Inbar-Feigenberg M, Mendoza-Londono R, et al. CHARGE and Kabuki Syndromes: Gene-Specific DNA Methylation Signatures Identify Epigenetic Mechanisms Linking These Clinically Overlapping Conditions. Am J Hum Genet. 2017 May 4;100(5):773–88.

Appendices

Appendix A  Supplementary materials for chapter 3

A.1 Supplementary figures

Supplementary Figure 3.1 Cell type correction. A) Predicted proportions of cell types in PBMCs for both datasets (Mono = monocytes; Gran = granulocytes). B) Predicted proportions of cell types in BECs for both datasets.

Supplementary Figure 3.2 Identifying correlation coefficient thresholds for informative CpGs. Beta mixture modeling on Spearman correlation rho values between matched BECs and PBMCs for A) GECKO and B) C3ARE cohorts. The bimodal distribution of Spearman rho values indicated two underlying populations of CpGs, a set of uncorrelated CpGs (shown in red) and a set of right-skewed highly positively correlated CpGs (shown in green).
Correlation coefficient thresholds for informative CpGs were set at the mean minus 2 standard deviations of the green Gaussian distribution (GECKO rho = 0.47; C3ARE rho = 0.32).

Supplementary Figure 3.3 Similarities in population structure of GECKO and C3ARE. Principal component analysis of PsychChip genotyping profiles (542,699 SNPs) for C3ARE (shown in blue) and GECKO (shown in red) revealed that genetic ancestry did not differ significantly between the cohorts, as determined by Wilcoxon rank-sum tests of GECKO versus C3ARE on PC1 scores (p = 0.8) and PC2 scores (p = 0.4).

Supplementary Figure 3.4 Distribution of tissue correlations. Density distribution of Spearman’s correlation coefficient (rho) across 419,507 CpGs in matched BEC and PBMC tissues for GECKO, GECKOsub, GECKOsub Averaged (mean of 100 trials of GECKOsub) and C3ARE datasets.

Supplementary Figure 3.5 cis-mQTL identification. Overlap of cis-mQTLs identified in matched tissues of the C3ARE and GECKO cohorts.

Supplementary Figure 3.6 cis-mQTL enrichment in genomic features and informative CpGs. A) Representation of 4,980 CpGs underlying validated cis-mQTLs across various genomic features. Bars show the fold-change between the CpG count in each genomic feature and the mean count of randomly selected CpGs in that same genomic feature, from 10,000 iterations. Error bars show standard error (* denotes significant enrichment or depletion at FDR ≤ 0.05; S = South; N = North). B) Stacked bar plot representing overlap of identified informative sites associated with BEC-specific, PBMC-specific and shared-tissue validated cis-mQTLs.
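The fold-change described in the caption for the cis-mQTL enrichment figure can be illustrated with a minimal Python sketch. It is not the thesis's own code: the function name `enrichment_fold_change`, the one-label-per-CpG annotation and the toy data are assumptions made for illustration only.

```python
import random
from statistics import mean


def enrichment_fold_change(feature_of, target_cpgs, feature, n_iter=10_000, seed=0):
    """Observed count of target CpGs in `feature` divided by the mean count
    obtained from `n_iter` random draws of the same size from the background.

    feature_of  : dict mapping each background CpG ID to ONE feature label
                  (a simplifying assumption; real annotations can overlap)
    target_cpgs : CpG IDs of interest, e.g. CpGs underlying cis-mQTLs
    """
    rng = random.Random(seed)
    background = list(feature_of)
    observed = sum(feature_of[c] == feature for c in target_cpgs)

    null_counts = []
    for _ in range(n_iter):
        draw = rng.sample(background, len(target_cpgs))  # without replacement
        null_counts.append(sum(feature_of[c] == feature for c in draw))

    null_mean = mean(null_counts)
    if null_mean == 0:
        raise ValueError("feature never sampled; fold-change is undefined")
    return observed / null_mean, null_counts


# Toy data (hypothetical): 1,000 background CpGs, 100 of them in "island";
# all 50 target CpGs lie in "island", so strong enrichment is expected.
toy_features = {f"cpg{i}": ("island" if i < 100 else "other") for i in range(1000)}
toy_targets = [f"cpg{i}" for i in range(50)]
fold, null = enrichment_fold_change(toy_features, toy_targets, "island", n_iter=1000)
print(f"fold-change vs. random expectation: {fold:.1f}")
```

The list of null counts is returned alongside the fold-change so that an empirical p-value or a standard error could be derived from the same draws if needed.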
A.2 Supplementary tables

Supplementary Table 3.1

C3ARE
                            PBMC Reference Range
                            0          ≥ 0.05     ≥ 0.1      ≥ 0.2      ≥ 0.5
Positive Correlation Rho
  0                         419,507    64,204     12,218     1,742      46
  ≥ 0.3                     74,137     24,542     7,474      1,282      44
  ≥ 0.6                     10,476     6,489      3,112      747        29
  ≥ 0.9                     45         45         35         9          0
Negative Correlation Rho
  ≤ -0.3                    51,436     4,866      372        40         0
  ≤ -0.6                    3,300      327        34         6          0
  ≤ -0.9                    3          0          0          0          0

GECKO
                            PBMC Reference Range
                            0          ≥ 0.05     ≥ 0.1      ≥ 0.2      ≥ 0.5
Positive Correlation Rho
  0                         419,507    131,227    28,311     3,597      159
  ≥ 0.3                     33,322     29,158     15,774     3,055      146
  ≥ 0.6                     6,285      6,174      5,355      1,985      119
  ≥ 0.9                     41         41         41         38         4
Negative Correlation Rho
  ≤ -0.3                    1,557      331        82         7          0
  ≤ -0.6                    8          8          6          0          0
  ≤ -0.9                    0          0          0          0          0

GECKOsub
                            PBMC Reference Range
                            0          ≥ 0.05     ≥ 0.1      ≥ 0.2      ≥ 0.5
Positive Correlation Rho
  0                         419,507    115,404    21,563     2,689      93
  ≥ 0.3                     30,615     26,385     12,916     2,306      88
  ≥ 0.6                     5,252      5,172      4,381      1,476      76
  ≥ 0.9                     11         11         11         10         0
Negative Correlation Rho
  ≤ -0.3                    1,357      243        53         5          0
  ≤ -0.6                    5          5          4          0          0
  ≤ -0.9                    0          0          0          0          0

Supplementary Table 3.2

Cell values are the number of sites reported in each study (% of that study's total).

Type of Site                Almstrup et al. 2017  Berko et al. 2014     Fisher et al. 2015    Portales et al. 2016  Xu et al. 2017
All categories              0 (0%)                0 (0%)                1 (0.4%)              3 (0.5%)              33 (0.3%)
Differential                67 (71.3%)            27 (36.5%)            42 (16.7%)            347 (52.7%)           6629 (68.3%)
Informative                 1 (1.1%)              18 (24.3%)            2 (0.8%)              9 (1.4%)              63 (0.6%)
Informative & Differential  4 (4.3%)              6 (8.1%)              2 (0.8%)              21 (3.2%)             166 (1.7%)
mQTL CpG                    2 (2.1%)              0 (0%)                0 (0%)                6 (0.9%)              45 (0.5%)
mQTL & Differential         0 (0%)                10 (13.5%)            2 (0.8%)              18 (2.7%)             168 (1.7%)
mQTL & Informative          0 (0%)                0 (0%)                1 (0.4%)              2 (0.3%)              9 (0.09%)
None                        20 (21.3%)            13 (17.6%)            202 (80.2%)           252 (38.3%)           2591 (26.7%)
Total Reported              94 (100%)             74 (100%)             252 (100%)            658 (100%)            9704 (100%)

Appendix B  Supplementary materials for chapter 4

B.1 Supplementary figures

Supplementary Figure 4.1 Associations between temperament, mental health and ANS measures supported preexisting relationships between these traits.
Correlation matrix of pairwise Spearman correlations run on 16 original traits (n = 55).

Supplementary Figure 4.2 Comparison of FDR-corrected p-values and effect sizes (|∆b|) from Spearman correlations and linear regressions. Left panel: Spearman q-values vs. q-values produced by linear regression with sex and minority status as covariates. Right panel: |∆b| from Spearman correlation analysis vs. linear regression. Of the 93 high and medium confidence CpGs, 40 were retained at a q-value < 0.2; this included 11 of the 12 original high confidence CpGs and 29 of the 81 medium confidence CpGs. Horizontal lines represent the thresholds used for medium confidence sites (q-value < 0.2, |∆b| > 5%). Sites are colored by their significance in the Spearman analysis (red = high confidence, blue = medium confidence, grey = not significant) (n = 55).

Supplementary Figure 4.3 Associations between Inhibition/Disinhibition scores and DLX5 differentially methylated CpGs passed permutation tests at age 15. Inhibition/Disinhibition scores were randomly assigned 100 times and correlated with DNA methylation of the respective differentially methylated CpGs at age 15. True coefficients fell outside the 99th percentile of the null distributions, suggesting that these associations were not spurious. Colors indicate the gene to which each CpG is mapped: purple = DLX5, red = IGF2, blue = MYO16, green = PRUNE2 (n = 55). [x-axis: correlation coefficient]

Supplementary Figure 4.4 Correlations between DNA methylation at age 15 and Biobehavioural Inhibition/Disinhibition in males, females and subsampled females. The difference in magnitude of the correlation coefficient between males and females was not greatly altered by sample size in high and medium confidence CpGs. Full cohort (gray circles, n = 55), females only (pink squares, n = 36) and males only (blue triangles, n = 19). Black squares show the average correlation across 100 random subsamples of females to n = 19.
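The two resampling checks described in the captions for Supplementary Figures 4.3 and 4.4 (a permutation null built from 100 random reassignments of scores, and an average correlation over 100 subsamples of n = 19) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the analysis code used in the thesis: the function names, seeds and toy data are hypothetical, and Spearman's rho is computed here as the Pearson correlation of average ranks.

```python
import random
from statistics import mean


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def spearman(x, y):
    """Spearman's rho (undefined, and raises, if either input is constant)."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def permutation_null(scores, methylation, n_perm=100, seed=0):
    """Null rho values obtained by randomly reassigning scores to samples."""
    rng = random.Random(seed)
    shuffled = list(scores)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(spearman(shuffled, methylation))
    return null


def subsampled_rho(scores, methylation, n_sub=19, n_rep=100, seed=0):
    """Mean rho over `n_rep` random subsamples of `n_sub` individuals."""
    rng = random.Random(seed)
    idx = list(range(len(scores)))
    rhos = []
    for _ in range(n_rep):
        pick = rng.sample(idx, n_sub)
        rhos.append(spearman([scores[i] for i in pick],
                             [methylation[i] for i in pick]))
    return mean(rhos)


# Toy data (hypothetical): 36 individuals with a noisy monotone relationship.
toy_rng = random.Random(1)
toy_scores = [toy_rng.gauss(0, 1) for _ in range(36)]
toy_meth = [0.5 + 0.05 * s + toy_rng.gauss(0, 0.02) for s in toy_scores]

observed = spearman(toy_scores, toy_meth)
null = permutation_null(toy_scores, toy_meth)
n_more_extreme = sum(abs(r) >= abs(observed) for r in null)
print(observed, n_more_extreme, subsampled_rho(toy_scores, toy_meth))
```

An observed coefficient that no permuted coefficient matches in magnitude lies beyond the 99th percentile of a 100-permutation null, which is the kind of comparison the Figure 4.3 caption reports.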
Supplementary Figure 4.5 Associations between Inhibition/Disinhibition scores and DLX5 differentially methylated CpGs passed permutation tests at age 18. Inhibition/Disinhibition scores were randomly assigned 100 times and correlated with DNA methylation of the respective differentially methylated CpGs at age 18. True coefficients fell outside of the 97th percentile of the null distributions, suggesting that these associations were not spurious. Colors represent the gene to which each CpG is mapped: purple = DLX5, red = IGF2 (n = 52). [Figure x-axis: correlation coefficient]

Supplementary Figure 4.6 Associations between measures underlying Inhibition/Disinhibition and differentially methylated CpGs in DLX5 and IGF2. Observational anger measured at grade 1 was the most strongly correlated with DNA methylation at both genes.

B.2 Supplementary tables

Supplementary Table 4.1

site        r2          adjusted r2  p-value     FDR p-value  gene name  Distance to TSS (bp)  significant at age 18
cg24115040  0.26705472  0.23713859   0.00020298  0.00304475   DLX5       1768                  no
cg11500797  0.24492205  0.21410254   0.00211411  0.01585586   DLX5       2020                  nominal
cg18946226  0.23227322  0.20093743   0.00333816  0.01669078   MYO16      4                     no
cg00503840  0.25395101  0.22350003   0.01088797  0.02557812   DLX5       3634                  FDR
cg11005826  0.17095002  0.13711125   0.01193646  0.02557812   IGF2       -3620                 no
cg13462129  0.2096587   0.17739987   0.00824102  0.02557812   DLX5       2862                  nominal
cg27016494  0.20067186  0.16804622   0.00986032  0.02557812   DLX5       2557                  nominal
cg19282250  0.18840758  0.15528136   0.01402684  0.02630033   PRUNE2     -9507                 nominal
cg20080624  0.20769858  0.17535975   0.01860574  0.03100956   DLX5       3032                  nominal
cg12041387  0.26227119  0.23215981   0.02260787  0.03391181   DLX5       3972                  FDR
cg14396117  0.15362305  0.11907705   0.02515218  0.03429843   MYO16      -452                  no
cg08878323  0.12021293  0.08430326   0.02750238  0.03437798   DLX5       1679                  no
cg18873386  0.10884631  0.07247269   0.03491743  0.04028935   DLX5       2228                  no
cg11880010  0.1058737   0.06937875   0.046848    0.05019428   PRUNE2     -9629                 no
cg21237591  0.10259212  0.06596322   0.13349063  0.13349063   IGF2       -169                  no

Appendix C  Supplementary materials for chapter 5

C.1 Supplementary figures

Supplementary Figure 5.1 Self-reported ethnicity compared to the first and second principal components, PC1 and PC2, of genotyping data. PC1 and PC2 accounted for 62.9% and 2.2%, respectively, of variation in the genotyping data (n = 304). Individuals who were reported as Canadian (black points) did not cluster with a single group. [Figure: PC1 vs. PC2, points colored by self-reported ethnicity: African American/Black, Canadian, Caucasian, East Asian, First Nations, Latin American, Multiracial/Multiethnic, Other, South Asian, Southeast Asian, West Asian]

C.2 Supplementary tables

Supplementary Table 5.1

Association  CpG         p-value     Delta beta (|∆b|)
PosCan       cg14156792  0.00010115   0.21364137
PosCan       cg04121631  0.00047624   0.09693551
PosCan       cg13798679  0.00017467  -0.0567288
PosCan       cg19412669  0.00032285  -0.0694238
PosCan       cg24375409  0.00040432  -0.1561574
PosCan       cg23800435  0.00042675  -0.0700827
PosCan       cg07813265  0.00046666  -0.0653389
PosCan       cg10759651  0.00044754  -0.0650657
PosCan       cg15642380  0.00042281  -0.0523463
PosCan       cg14264194  0.00024619  -0.0891523
PosCan       cg16962115  0.00014267   0.06193473
PosCan       cg07700062  0.00011868  -0.0696864
PosCan       cg21032567  0.00026929  -0.1224142
PosCan       cg04944784  0.00043091  -0.0641391
PosCan       cg04036593  0.0002826   -0.0726475
PosCan       cg13099839  5.23E-05    -0.0515544
PosCan       cg02643782  0.00035128  -0.0501024
PosCan       cg02750792  0.00031068  -0.1353819
PosCan       cg17641710  0.00013531  -0.0620558
PosCan       cg12615535  0.00032064  -0.0582703
PosCan       cg26095395  0.00023092  -0.0861033
PosCan       cg19934294  0.00038678  -0.0527305
PosCan       cg20080590  0.00017811   0.0604394
PosCan       cg22725685  4.13E-05    -0.0571288
PosCan       cg09526685  0.00017105  -0.2373299
PosCan       cg07033961  0.0004648    0.05767375
PosCan       cg07593523  0.00017744   0.12077722
PosCan       cg22179564  0.00027546  -0.115591
PosCan       cg27034935  0.00033082  -0.1268946
PosCan       cg04467589  0.00043884   0.07443024
PosCan       cg18808777  0.00024927   0.08343909
PosCan       cg12231373  0.00015062  -0.0712878
PosCan       cg23317857  0.000299    -0.0667741
PosCan       cg17117243  8.03E-05    -0.0664841
PosCan       cg20706496  0.00048304   0.0776308
PosCan       cg26247093  0.00044294   0.05752354
PosCan       cg23129930  0.00014753  -0.0841488
PosCan       cg05928186  0.00023347  -0.0747499
PosCan       cg11850943  0.00023262   0.10052228
PosCan       cg13477780  0.00037471  -0.061358
PosCan       cg16008979  0.00033659  -0.0670469
PosCan       cg01545109  0.00037998   0.06239001
PosCan       cg12978800  0.00044255  -0.0821842
PosCan       cg07910560  0.00013986  -0.0749372
PosCan       cg14316231  0.00024185  -0.0589439
PosCan       cg17450425  0.00024747  -0.0558773
PosCan       cg03609398  0.00022357   0.15829943
PosCan       cg04332163  0.00046729  -0.0841953
PosCan       cg24756403  0.00013329  -0.0647096
PosCan       cg19054360  0.00017341  -0.0725608
PosCan       cg07692929  0.00010954  -0.0520654
PosCan       cg15920975  0.00037333  -0.0752409
PosCan       cg16118839  0.00018664  -0.0595886
PosCan       cg18031880  0.00018902  -0.0983282
PosCan       cg19704853  0.00035763  -0.0506376
PosCan       cg01158527  0.00043732  -0.0786045
PosCan       cg15282632  0.00013413  -0.098587
PosCan       cg20320656  0.00042416  -0.1181435
PosCan       cg01290565  5.46E-05    -0.0867748
PosCan       cg02556042  0.00037104  -0.1055742
PosCan       cg04217706  0.00017364  -0.0730008
PosCan       cg03885975  0.00041433  -0.0935023
PosCan       cg22124136  0.00041323  -0.069707
PosCan       cg09842196  0.00035373  -0.0626533
PosCan       cg26027442  0.00025074  -0.0502048
PosCan       cg13427748  0.00012653   0.05918896
PosCan       cg00611495  0.00020907  -0.0671208
PosCan       cg04022912  4.15E-05    -0.0601433
PosCan       cg04337176  7.50E-05     0.06980444
PosCan       cg08634357  0.00034218  -0.0804004
PosCan       cg13710086  7.30E-05    -0.0909291
PosCan       cg01443020  0.00049335  -0.1321152
PosCan       cg20520147  0.00024052   0.05538793
PosCan       cg26933683  0.00033785  -0.0614819
PosCan       cg13093111  9.23E-05    -0.0618555
PosCan       cg25867973  0.00031598  -0.0563275
PosCan       cg16387467  0.00035251  -0.0603054
PosCan       cg06650914  0.00011353  -0.0604465
PosCan       cg14453935  0.00042824  -0.1140602
PosCan       cg06744585  0.00044713  -0.0897358
PosCan       cg16521040  0.00012418  -0.0610724
PosCan       cg21448033  0.00011843  -0.0519864
PosCan       cg19498228  5.78E-05    -0.0727495
PosCan       cg09265161  6.00E-05     0.08819041
PosCan       cg00587465  0.00033157  -0.0794221
PosCan       cg24083746  0.00014622  -0.067266
PosCan       cg17611686  0.00045125  -0.0797195
PosCan       cg00481227  0.00041518  -0.0758613
PosCan       cg01906944  0.00025278  -0.0670976
PosCan       cg03199996  0.0002659   -0.0634823
PosCan       cg16443667  0.00025983  -0.096753
PosCan       cg20652954  3.29E-05    -0.0783382
PosCan       cg02966332  0.00024204  -0.0645007
PosCan       cg01622416  0.00021334   0.06907193
PosCan       cg17449254  0.00027277  -0.0880016
PosCan       cg05298458  0.00045953  -0.0771892
PosCan       cg10941566  0.00022258  -0.0706103
PosCan       cg14335069  1.08E-05    -0.0647062
CompSES      cg09895325  0.00014561   0.0893803
CompSES      cg18936040  0.00018443  -0.0784756
CompSES      cg06936779  0.00042399  -0.0629242
CompSES      cg09671955  6.86E-05     0.07365604
CompSES      cg20707323  0.00038393  -0.0520781
CompSES      cg23933289  2.76E-05     0.05123025
CompSES      cg22622057  0.00040571  -0.0689931
CompSES      cg07919760  0.0004145    0.05684397
CompSES      cg04231677  0.00023307  -0.0668022
CompSES      cg00493617  0.00021664  -0.0559326
CompSES      cg25356886  0.00015271  -0.0710399
CompSES      cg25429719  0.00037047  -0.1274762
CompSES      cg08217411  0.00030288  -0.0718378
CompSES      cg04216051  0.00028257   0.0687372
CompSES      cg23855365  0.00019923  -0.0669263
CompSES      cg11519983  0.00020843   0.1778019
CompSES      cg07033961  0.00015115   0.05959503
CompSES      cg11230940  0.00039609  -0.1359001
CompSES      cg14677130  0.00014382  -0.0740824
CompSES      cg25843003  0.00017644   0.05902741
CompSES      cg18808777  2.03E-05     0.08900093
CompSES      cg04452713  0.00037745  -0.054961
CompSES      cg26325286  0.0004448   -0.1087012
CompSES      cg15970595  2.94E-05    -0.0706644
CompSES      cg24617008  0.0002581    0.05015504
CompSES      cg14087413  0.00046576  -0.0561771
CompSES      cg05407489  0.00026246  -0.0507447
CompSES      cg03330490  0.00027947   0.05268399
CompSES      cg16664405  1.21E-05    -0.0550979
CompSES      cg19572051  0.0004547   -0.2256547
CompSES      cg02425416  0.00022209  -0.0772744
CompSES      cg03306615  0.00014973  -0.1374105
CompSES      cg26051413  0.00022444  -0.1700336
CompSES      cg13762320  0.00042667  -0.0904743
CompSES      cg11644479  0.00024021  -0.1147157
CompSES      cg24353535  9.76E-05    -0.1179523
CompSES      cg09394785  0.00021661  -0.0753936
CompSES      cg08577913  0.00033554  -0.1244275
CompSES      cg23745839  0.00039035  -0.0598093
CompSES      cg22040889  7.97E-05    -0.1038312
CompSES      cg15392457  0.00019981  -0.0555871
CompSES      cg07451370  0.00032614  -0.0674942
CompSES      cg09274587  0.00025504  -0.0608646
CompSES      cg18071532  3.20E-05    -0.0906912
CompSES      cg20320656  0.00021098  -0.1248849
CompSES      cg02556042  0.000255    -0.1014616
CompSES      cg21892295  0.00018245   0.13881512
CompSES      cg09460553  0.0004271   -0.1398706
CompSES      cg02315096  0.00012557  -0.158518
CompSES      cg18865445  0.00028788  -0.1540926
CompSES      cg17797229  0.0003987   -0.1929233
CompSES      cg16761754  1.42E-05     0.21773072
CompSES      cg13563725  0.00039187  -0.0510089
CompSES      cg22713356  0.00020384  -0.1161049
CompSES      cg26826117  0.00028432   0.0874998
CompSES      cg07933656  6.58E-05    -0.1860448
CompSES      cg06650914  0.00022659  -0.0529829
CompSES      cg22761077  0.00022981  -0.0510489
CompSES      cg07425555  0.00026551  -0.0768959
CompSES      cg08440266  0.00011204  -0.1103966
CompSES      cg01817521  0.00032099   0.05556459
CompSES      cg02966332  0.00043331  -0.0587546
CompSES      cg06066601  0.00015817  -0.0915649

Supplementary Table 5.2

DMR  Association  CpG         p-value     Delta beta (|∆b|)  Chrom.  UCSC gene name  UCSC gene region
1    CompSES      cg25356886  0.00015666  -0.0709474         2       CRYGD           Body
1    CompSES      cg25429719  0.00038003  -0.1273029         2       CRYGD           TSS200
2    CompSES      cg25843003  0.00017652   0.05901498        6       HCP5            3'UTR
2    CompSES      cg18808777  2.05E-05     0.08897175        6       HCP5            3'UTR
3    CompSES      cg03306615  0.00014735  -0.1375116         11      ASCL2           TSS1500
3    CompSES      cg26051413  0.00021987  -0.1702083         11      ASCL2           TSS1500
3    CompSES      cg13762320  0.00042162  -0.0905445         11      ASCL2           TSS1500
3    CompSES      cg11644479  0.00023712  -0.1148043         11      ASCL2           TSS1500
3    CompSES      cg24353535  9.66E-05    -0.1180188         11      ASCL2           TSS1500
3    CompSES      cg09394785  0.00021844  -0.0753635         11      ASCL2           TSS1500
4    CompSES      cg09460553  0.00041728  -0.1399785         13
4    CompSES      cg02315096  0.00012539  -0.1585108         13
4    CompSES      cg18865445  0.00028475  -0.1541414         13
4    CompSES      cg17797229  0.00038613  -0.1931293         13
5    PosCan       cg23129930  0.00015588  -0.0838737         7       HOXA6           1stExon
5    PosCan       cg05928186  0.00024912  -0.0744607         7       HOXA6           1stExon
6    PosCan       cg13710086  7.3006E-05  -0.0909291         17
6    PosCan       cg01443020  0.00049335  -0.1321152         17

Supplementary Table 5.3

Association  CpG         p-value     FDR p-value  Delta beta (|∆b|)
CompSES      cg02315096  0.0136646   0.145755682  -0.172215
CompSES      cg02556042  0.02623619  0.167911635  -0.129417
CompSES      cg03330490  0.00403811  0.098718093   0.05830829
CompSES      cg07451370  0.02196338  0.156184047  -0.0866381
CompSES      cg14087413  7.39E-05    0.004728329  -0.1885805
CompSES      cg16761754  0.0167502   0.153144701   0.22322337
CompSES      cg21721566  0.0415096   0.241510396  -0.1507618
CompSES      cg21892295  0.00616988  0.098718093   0.19760927
CompSES      cg22713356  0.00532333  0.098718093  -0.070068
CompSES      cg23745839  0.02045977  0.156184047  -0.0605778
CompSES      cg24353535  0.04644109  0.2476858    -0.095346
CompSES      cg25429719  0.01268284  0.145755682  -0.1436036
PosCan       cg03609398  0.00210781  0.040891504   0.17576249
PosCan       cg04467589  0.03214918  0.271474505   0.06417462
PosCan       cg06744585  0.01558784  0.18900253   -0.0585618
PosCan       cg07910560  0.03756607  0.271474505  -0.0594183
PosCan       cg11850943  0.04955304  0.271474505   0.06798295
PosCan       cg14453935  0.00039672  0.019240826  -0.1360803
PosCan       cg15282632  0.0019624   0.040891504  -0.1373493
PosCan       cg16443667  0.00133221  0.040891504  -0.0524907
PosCan       cg17449254  0.04441765  0.271474505  -0.0519896
PosCan       cg18031880  0.00673665  0.10890913   -0.123132
PosCan       cg18808777  0.02171587  0.234048826   0.10168585
PosCan       cg23129930  0.035589    0.271474505  -0.0729138
PosCan       cg24375409  0.04422148  0.271474505  -0.0824201
PosCan       cg27034935  3.15E-05    0.003059637  -0.1474872

