UBC Theses and Dissertations



DNA methylation in human development: methodologies and analytics for genome-wide studies. Price, Eva Magdalena Wagner, 2016


Full Text

DNA METHYLATION IN HUMAN DEVELOPMENT: METHODOLOGIES AND ANALYTICS FOR GENOME-WIDE STUDIES
by
Eva Magdalena Wagner Price
B.Sc., The University of Guelph, 2009
A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
in
THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Reproductive and Developmental Sciences)
THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)
April 2016
© Eva Magdalena Wagner Price, 2016

Abstract

High-throughput methods have produced a large volume of studies measuring genome-wide DNA methylation (DNAm) in association with human health and disease. Understanding DNAm patterns may be translated, for example, into predicting which children are at risk for illness or identifying etiological subtypes within a heterogeneous disease. Addressing the biological and technical factors that affect measurement of genome-wide DNAm is essential to reduce false discovery in such studies. This dissertation develops principles for analyzing genome-wide DNAm, with the aim of improving the collection and analysis of human developmental data. To this end, I present four studies employing several techniques to measure genome-wide DNAm: DNAm of L1 and Alu repetitive elements, as well as the Illumina 27k and 450k DNAm microarrays. In the first of these studies, I found that tissue type, gestational age, technical platform and CpG density contribute to variable measurement of genome-wide DNAm. Subsequent studies primarily used the 450k array, a technology targeting 485,577 sites in the genome, to measure genome-wide DNAm. A detailed annotation of the 450k array was created and tested to enhance this platform's utility. Array probes targeting sites containing SNPs (4.3%) and non-specific probes (8.6%) were identified, and I examined how these compromised probes may result in spurious discoveries. A pilot study in placental tissue identified batch effects in 450k data. A computational tool was applied to reduce the batch signal, but I demonstrated that when it is applied to a problematic study design, false biological signal may be introduced. The workflow for processing and analyzing genome-wide DNAm data was finally applied to profile five tissues ascertained from second-trimester neural tube defect (NTD)-affected pregnancies. Despite research, medical interventions and public health changes, NTDs remain the second most common congenital abnormality in many parts of the world, and the etiology of these cases is unknown. Using the 450k array, I found 3,342 differentially methylated sites in the kidneys of spina bifida cases compared to gestational-age-matched controls, but little alteration in genome-wide DNAm in other NTD tissues. This dissertation contributes methodologies and analytical tools that will help manage bias, improve reproducibility and reduce false discoveries in studies of genome-wide DNAm.

Preface

Parts of the work in this dissertation were performed by collaborators:

Chapter 2: Samples were ascertained by Dr Deborah McFadden. DNA extraction and sample preparation were performed by Ruby Jiang and Dr Maria Peñaherrera. Illumina Infinium HumanMethylation27 BeadChips were run by Dr Maria Peñaherrera, Dr Ryan Yuen, Sarah Neumann and Lucia Lam.

Chapter 3: Samples were ascertained by Robinson lab research coordinators and Dr Michael Kobor. DNA extraction and sample preparation were performed by Ruby Jiang, Sarah Neumann and Lucia Lam. Illumina Infinium HumanMethylation450 BeadChips were run by Dr Maria Peñaherrera, Dr Ryan Yuen, Sarah Neumann and Lucia Lam. Lucia Lam also conducted pre-processing of 450k data and gene feature annotation. Dr Allison Cotton annotated the location of the closest TSS and conducted part of the CpG island annotation. Pau Farré conducted parts of the enrichment analyses. Dr Eldon Emberly performed non-specific probe annotation.

Chapter 4: Samples were ascertained by Robinson lab research coordinators.
DNA extraction and sample preparation were performed by Ruby Jiang. Illumina Infinium HumanMethylation450 BeadChips were run by Dr Maria Peñaherrera and Dr Courtney Hanna. Dr Courtney Hanna performed placental MTHFR genotyping in half of the samples.

Chapter 5: Samples were ascertained by Robinson lab research coordinators and Dr Deborah McFadden. Part of the DNA extraction and sample preparation was performed by Ruby Jiang.

Parts of this dissertation were previously published in:

1. Hogg K, Price EM, Hanna CW, Robinson WP. Prenatal and perinatal environmental influences on the human fetal and placental epigenome. Clin Pharmacol Ther. 2012 Dec;92(6):716-26. © 2012 John Wiley and Sons. Sections of text authored by EMP are included with permission in Chapter 1. All authors contributed equally to the content development of this review article. I authored the section covering maternal nutrition and contributed two figures. CWH authored the section covering exogenous steroid hormones and contributed one figure. KH authored the section covering maternal stress, contributed one figure and compiled the manuscript. WPR authored the remainder of the content and supervised the study.

2. Robinson WP, Price EM. The human placental methylome. Cold Spring Harb Perspect Med. 2015 Feb 26;5(5):a023044. © 2015 Cold Spring Harbor Laboratory Press. Sections of text authored by EMP are included with permission in Chapter 1. I authored the section covering DNA methylation changes in response to exposure, conducted analyses for and created two figures, and contributed to the development of manuscript content. The remainder of the publication was authored by WPR.

3. Price EM, Cotton AM, Peñaherrera MS, McFadden DE, Kobor MS, Robinson WP. Different measures of "genome-wide" DNA methylation exhibit unique properties in placental and somatic tissues. Epigenetics. 2012 Jun 1;7(6):652-63. © 2015 Landes Bioscience.
Data and text published in this article are included with permission in Chapter 2. I collected and analyzed pyrosequencing data, performed array analyses and was the primary author of the manuscript. AMC contributed to analyses. MSP processed Illumina arrays and contributed to analyses. DEM ascertained tissue samples. MSK aided in array data collection. WPR generated research goals and supervised the research. All authors critically edited the article.

4. Price EM, Cotton AM, Lam LL, Farré P, Emberly E, Brown CJ, Robinson WP, Kobor MS. Additional annotation enhances potential for biologically-relevant analysis of the Illumina Infinium HumanMethylation450 BeadChip array. Epigenetics Chromatin. 2013 Mar 3;6(1):4. © 2013 Price et al., licensee BioMed Central. Data and text published in this article are contained in Chapters 1 and 3. I designed and contributed to array annotation, conceived and conducted testing of annotation, carried out pyrosequencing, participated in CpG island and SNP annotation, participated in array data processing and drafted the manuscript. AMC designed and contributed to array annotation, participated in CpG island annotation and contributed to testing of annotation. LLL processed Illumina data, conducted gene feature annotation, participated in SNP annotation and contributed to the design of the study. PF conducted enrichment analyses. EE conducted non-specific probe analyses and contributed to the design of the study. WPR contributed to data analysis and study design. MSK and CJB conceived of the study and contributed to data analysis and study design. All authors edited the article.

5. Price EM, Peñaherrera MS, Portales-Casamar E, Pavlidis P, Van Allen MI, McFadden DE, Robinson WP. Profiling placental and fetal DNA methylation in human neural tube defects. Epigenetics Chromatin. 2016;9:6. © 2016 Price et al. under the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
Data and text published in this article are contained within Chapter 5. I participated in the study design, conducted the statistical analyses, ran the Illumina arrays, designed and carried out molecular studies and drafted the manuscript. MSP participated in the study design, molecular studies and data analysis and ran the Illumina arrays. EP-C and PP participated in the statistical analyses. MVA participated in the study design and recruitment of study patients. DM participated in the study design and collection of patient samples. WPR participated in the study design, coordination and statistical analyses. All authors critically edited the article.

Collection of samples used in this dissertation was approved by the University of British Columbia/Children's Hospital and Women's Health Centre of British Columbia Research Ethics Board (UBC C&W REB); certificate numbers: H04-70488, H06-70085, H10-1028.

Given that Chapters 2, 3 and 5 remain largely unchanged from their published versions, I have retained the use of plural first-person pronouns in these chapters, except in their introductions. In the remainder of the dissertation, singular first-person pronouns are employed.

The names of R packages and functions are distinguished throughout this dissertation by the use of Courier New italic font.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Acknowledgements
Dedication
Chapter 1: Introduction
1.1 Dissertation context and overview
1.2 Epigenetics – a mechanism for fetal programming
1.2.1 Epigenome-wide association studies
1.3 DNA methylation
1.3.1 Genomic landscape of CpG dinucleotides
1.3.2 Relationship of DNA methylation with gene expression
1.3.3 DNAm in development
1.3.4 Tissue differences in DNA methylation
1.3.5 Placental DNA methylation
1.3.6 Methods for measuring DNA methylation
1.3.7 DNAm microarrays – the Illumina platform
1.4 Folate-mediated one carbon metabolism
1.4.1 Sources of folate and dietary requirements
1.4.2 Folate uptake and transport during pregnancy
1.4.3 Suboptimal folate
1.4.4 Consequences of suboptimal folate during pregnancy
1.4.5 Neural tube defects
1.5 Research objectives
Chapter 2: Different measures of genome-wide DNAm exhibit unique properties in placental and somatic tissues
2.1 Introduction
2.2 Materials and methods
2.2.1 Sample collection
2.2.2 Pyrosequencing
2.2.3 Illumina Infinium HumanMethylation27 BeadChip array (27k array)
2.2.4 MethylFlash global DNAm kit
2.2.5 Statistical analyses
2.3 Results
2.3.1 Total 5-mC as measured by ELISA was not reproducible
2.3.2 Evolutionary age and assay method affect the assessment of L1 and Alu DNAm
2.3.3 Evidence for association of DNAm in weak islands with L1 and non-islands
2.3.4 Dispersed DNAm assays each produce a distinct tissue-specific DNAm profile
2.3.5 Distance to transcription start site is associated with promoter CpG island density and distinct trends in DNAm
2.3.6 Preliminary within-individual correlation of dispersed DNAm
2.3.7 Increase of weak island and non-island DNAm in chorionic villi through gestation
2.4 Discussion
Chapter 3: Additional annotation enhances potential for biologically-relevant analysis of the Illumina Infinium HumanMethylation450 BeadChip array
3.1 Introduction
3.2 Materials and methods
3.2.1 Annotation
3.2.2 SNP annotation
3.2.3 Non-specific probe annotation
3.2.4 CpG enrichment annotation
3.2.5 Gene feature and TSS annotation
3.2.6 Sample collection
3.2.7 Illumina Infinium HumanMethylation450 BeadChip (450k array) processing
3.2.8 Processing of aging dataset
3.2.9 Pyrosequencing
3.2.10 Statistical analyses
3.3 Results and discussion
3.3.1 Polymorphic CpGs may affect the assessment of DNA methylation
3.3.2 8-9% of probes mapped to more than one location in silico
3.3.3 Comparing Illumina and HIL annotation of probes highlighted differences between CpG classification systems
3.3.4 DNA methylation was variable across nine gene feature groups
3.4 Conclusion
Chapter 4: Correction for batch effects using ComBat improves 450k data in a pilot study of placental MTHFR genotype
4.1 Introduction
4.2 Materials and methods
4.2.1 Sample collection and case characteristics
4.2.2 Illumina Infinium HumanMethylation450 BeadChip quality control and pre-processing
4.2.3 Differential DNA methylation analyses
4.2.4 Array-wide analyses
4.2.5 DMR analysis
4.2.6 Statistical software
4.3 Results
4.3.1 Initial processing of 450k MTHFR data
4.3.2 A second attempt at batch effect correction
4.3.3 Array-wide DNA methylation
4.3.4 Differential methylation of individual CpG sites array-wide
4.4 Discussion
Chapter 5: Profiling placental and fetal DNA methylation in human neural tube defects
5.1 Introduction
5.2 Materials and methods
5.2.1 Ethics approval
5.2.2 Sample collection
5.2.3 Case characteristics
5.2.4 MTHFR genotyping
5.2.5 Illumina Infinium HumanMethylation450 BeadChip quality control and pre-processing
5.2.6 Probe to gene annotation
5.2.7 Differential DNA methylation analyses
5.2.8 Biologically relevant candidate CpG sites analysis
5.2.9 Genome-wide analyses – 450k array
5.2.10 DNA methylation assessment by pyrosequencing
5.2.11 GO analysis
5.2.12 DMR analysis
5.2.13 Publicly available data and analysis
5.2.14 Statistical software
5.3 Results
5.3.1 MTHFR genotyping
5.3.2 Differential methylation of biologically relevant candidate CpG sites
5.3.3 Genome-wide DNA methylation
5.3.4 Differential methylation of CpG sites array-wide
5.3.5 Follow up of differentially methylated CpG sites in kidney and chorionic villi
5.4 Discussion
Chapter 6: Discussion
6.1 Summary of dissertation
6.2 Introduction to discussion
6.3 Standardized approaches will reduce technical noise in EWAS
6.4 Accounting for demographics will enhance disease-associated discovery in EWAS
6.5 Responsible reporting will accelerate the field of EWAS
6.6 The future of EWAS in open source
Bibliography
Appendices
Appendix A Supplementary tables and figures for Chapter 2
Appendix B Supplementary tables and figures for Chapter 3
Appendix C Supplementary tables and figures for Chapter 4
Appendix D Supplementary tables and figures for Chapter 5

List of Tables

Table 1.1 Considerations for choosing DNA methylation technique
Table 1.2 Comparison of Illumina 27k and 450k DNA methylation arrays [68, 69]
Table 2.1 Spearman correlation of DNAm at five ReDS
Table 3.1 The majority of highly variable probes* were annotated with SNPs
Table 3.2 Location of in silico cross-hybridization of non-specific probes
Table 4.1 Clinical characteristics of cases
Table 4.2 Initial processing: DM sites at FDR <0.05 before and after ComBat
Table 4.3 Array-wide measures of DNA methylation
Table 5.1 Clinical characteristics of cases
Table 5.2 Genome-wide DNA methylation by tissue

List of Figures

Figure 1.1 Dramatic changes in genome-wide DNA methylation during embryogenesis
Figure 1.2 Illumina Infinium DNA methylation microarray technology.
Figure 1.3 Folate-mediated one carbon metabolism.
Figure 2.1 Schematic of analyses performed in Chapter 2.
Figure 2.2 The assessment of Alu and L1 DNAm is affected by evolutionary age and assay type.
Figure 2.3 Unsupervised hierarchical clustering of 27k array data separated samples by tissue of origin
Figure 2.4 Five ReDS exhibit different tissue patterns of DNAm.
Figure 2.5 Distance to transcription start site (TSS) influences DNAm in promoters.
Figure 2.6 Intra-individual correlation of dispersed DNAm.
Figure 2.7 Increase in DNAm at Alu, weak islands and non-islands through gestation.
Figure 3.1 Relative location of probes to target CpG
Figure 3.2 Illustration of Illumina and HIL CpG classes
Figure 3.3 Illustration of gene feature annotation
Figure 3.4 Probes targeting polymorphic CpGs may affect the assessment of DNA methylation
Figure 3.5 Comparison of the genomic distribution of Illumina-annotated CpG probe classes within each HIL-annotated CpG probe class
Figure 3.6 Distinct patterns of DNAm within CpG classification systems
Figure 3.7 Enrichment of differentially methylated probes in many CpG classes
Figure 3.8 Contribution of HIL CpG classes to probes in nine gene feature groups
Figure 3.9 Variation of gene feature DNAm within a CpG class
Figure 4.1 Initial processing: p-value distributions for linear modelling of MTHFR group at each processing step.
Figure 4.2 Initial processing: association of top five PCs before and after application of ComBat.
Figure 4.3 Location of samples on seven 450k chips.
Figure 4.4 Second processing: p-value distributions for linear modelling of MTHFR group at six analysis steps.
Figure 4.5 Second processing: M value distributions for two replicate samples at each processing step.
Figure 4.6 Second processing: association of top five PCs before and after application of ComBat.
Figure 4.7 Unsupervised hierarchical clustering of MTHFR samples using 442,348 CpGs.
Figure 4.8 Array-wide volcano plots.
Figure 4.9 Distribution of unadjusted p-values from linear models separated by CpG density.
Figure 5.1 Sample clustering based on array-wide DNA methylation.
Figure 5.2 Tissue distribution of unadjusted p-values from linear modelling of differential methylation in NTDs.
Figure 5.3 Spina bifida array-wide volcano plots.
Figure 5.4 Differentially methylated CpG sites in the chorionic villi comparison of anencephaly cases to controls.
Figure 5.5 Identification and investigation of persistent hits in spina bifida kidneys.

List of Abbreviations

27k: Illumina Infinium HumanMethylation27 BeadChip
450k: Illumina Infinium HumanMethylation450 BeadChip
5,10-CH2-THF: 5,10-methylenetetrahydrofolate
5-C: unmethylated cytosine
5-CH3-THF: 5-methyltetrahydrofolate
5-mC: methylated cytosine
5-hmC: hydroxymethylcytosine
AN: anencephaly
AU: approximate unbiased
auto: autosomal
B.C.: British Columbia
bp(s): base pair(s)
BPA: bisphenol A
br: brain
bsc: bisulfite converted
C: cytosine base
CH3: methyl group
chr: chromosome
CON: control
CpG: cytosine-phosphate-guanine
ddNTP: dideoxynucleotide
deltaβ: difference in DNAm
DM: (1) differentially methylated; (2) differential methylation
DMR: differentially methylated region
DNAm: DNA methylation
DNMT: DNA methyltransferase
DNP: dinitrophenol
DOHaD: developmental origins of health and disease
dTMP: deoxythymidine monophosphate
dUMP: deoxyuridine monophosphate
ESPNL: ectoplasmic specialization protein like
EWAS: epigenome-wide association study
F: female
FA: folic acid
FDR: false discovery rate
FR: folate receptor
GA: gestational age
GEO: Gene Expression Omnibus
GO: gene ontology
GWAS: genome-wide association study
HC: high density CpG island
hcy: homocysteine
HIL: high density CpG island, intermediate density CpG island, low CpG density
HPLC: high performance liquid chromatography
IC: intermediate density CpG island
ICM: inner cell mass
ICshore: intermediate density CpG island bordering an HC
IQR: inter-quartile range
IUGR: intrauterine growth restriction
kid: kidney
KS: Kruskal-Wallis
L1: long interspersed element 1
LC: low CpG density or non-island
LOPE: late onset preeclampsia
m: methylated
M: male
mRNA: messenger RNA
MS: methionine synthase
mSNP: methylation-associated SNP
MTHFR: methylenetetrahydrofolate reductase
mus: muscle
n: number
N: no
ns: not significant
NTD: neural tube defect
obs/exp: observed/expected
OCM: one carbon metabolism
OD: optical density
PARP1: poly(ADP-ribose) polymerase 1
PC: principal component
PCA: principal component analysis
R: (1) R programming language and software environment; (2) reverse
RE: repetitive element
ReDS: Representative Dispersed Sequences
RFC-1: reduced folate carrier
RRBS: reduced representation bisulfite sequencing
rs: reference SNP
SAH: S-adenosylhomocysteine
SAM: S-adenosylmethionine
SB: spina bifida
sc: spinal cord
SD: standard deviation
SNP: single nucleotide polymorphism
T: thymine
tDM: tissue differentially methylated
tDMR: tissue differentially methylated region
TE: trophectoderm
THF: tetrahydrofolate
TSS: transcription start site
u: unmethylated
U: uracil
UCSC: University of California, Santa Cruz
v: chorionic villi
WGBS: whole genome bisulfite sequencing
wks: weeks
X: methyl acceptor
Y: yes
yrs: years
β: beta value

Acknowledgements

The completion of this dissertation was achieved through the support of a number of people. First, I would like to acknowledge the immense contribution of my graduate supervisor, Dr Wendy Robinson, whose balance of mentorship and space allowed me to develop scientifically and discover some unexpected interests. Also, thank you to my doctoral supervisory committee – Dr Diana Juriloff, Dr Michael Kobor and Dr Dan Rurak – for your advice and valuable time through my studies. Past and present members of the Robinson lab – Ruby, Courtney, Kirsten, Irina, John, Dan, Ryan, Olivia, Chaini, Sam, Giulia, Johanna – thank you for sharing your experiences, the stimulating discussions and helpful feedback.

Special thanks to Kristal for her recruitment of patients, to the families of the patients who donated to these studies and to the C&W pathology lab for sample collection.
Dr Maria Peñaherrera – your dedication, enthusiasm and scientific standards were an integral part of my success. I am eternally grateful to Dr Allison Cotton, who was always available to me and challenged my ideas – your mentorship was indispensable. To Dr Meaghan Jones, Rachel Edgar, Sarah Goodman and Sumaiya Islam, thank you so much for the debates and sharing of code.
Finally, to my family – Maria, Steven and Karl, your love, encouragement and advice have brought me to the finish line. Gord, thank you for giving me the love, time and space to explore this interest.
Thank you to the institutions that provided financial support for the work presented in this dissertation: the Canadian Institutes of Health Research, the Child and Family Research Institute and the University of British Columbia.

Dedication
To my family.

Chapter 1: Introduction
Sections of this chapter have previously been published (see Preface for contribution details):
Hogg K, Price EM, Hanna CW, Robinson WP. Prenatal and perinatal environmental influences on the human fetal and placental epigenome. Clin Pharmacol Ther. 2012 Dec;92(6):716-26.
Robinson WP, Price EM. The human placental methylome. Cold Spring Harb Perspect Med. 2015 Feb 26;5(5):a023044.
Price EM, Cotton AM, Lam LL, Farré P, Emberly E, Brown CJ, Robinson WP, Kobor MS. Additional annotation enhances potential for biologically-relevant analysis of the Illumina Infinium HumanMethylation450 BeadChip array. Epigenetics Chromatin. 2013 Mar 3;6(1):4.

1.1 Dissertation context and overview
With few exceptions, every cell in the human body contains the same DNA sequence. And yet, over 200 cell types exist, with vast differences in their structure and function. Cell type differences in the epigenetic mark DNA methylation (DNAm) can modulate or be modulated by gene expression, contributing to the phenomenon of “one genome – many cell types”.
Patterns of DNAm may contribute to a population’s normative trait variation and disease susceptibility. DNAm may be less sensitive than mRNA to technical factors like time from collection to storage [1], making it a preferred, stable mark to measure in many scenarios. In the field of reproductive biology, DNAm assessed in human fetal and placental tissues can provide a wealth of knowledge about aspects of pregnancy, including fetal adaptation to the in utero environment, abnormal changes in cell composition and even predisposition to disease [2]. For example, large and widespread differences in DNAm have been identified in placentas from early-onset but not late-onset pre-eclampsia [3]. This finding supports the clinical subdivision of pre-eclampsia into two groups and suggests differing etiologies.
Within the past decade, the ability to measure DNAm has scaled up from assessing a few sites in the genome to assessing hundreds of thousands in parallel, providing exciting opportunities for discovery and application. Because DNAm data can be produced by multiple technologies, and because the mark varies with biological characteristics like age, tissue type and environmental exposures, analysis of this type of high-throughput data requires fluency in both genetics and statistics [4]. Approaches to experimental design, data collection and analysis of genome-scale DNAm data remain under active debate in this budding field [4-9].
In this dissertation, I present studies that investigate both technical and biological factors influencing the assessment of genome-scale DNAm in pregnancy-associated tissues (Chapters 2-4). This work culminated in the development of a workflow for DNAm microarray data analysis, which was applied in Chapter 5 to the study of neural tube defects (NTDs), a severe congenital abnormality found at a rate of nearly one in a thousand Canadian births.
The utility of the workflow was demonstrated by the fact that 64% of sites identified as differentially methylated in kidney samples ascertained from 2nd trimester NTDs were replicated in publicly available data. In addition, two sites identified as differentially methylated in placentas of NTDs were validated by a secondary technique, and one of these replicated in an extended set of NTD samples. The approach to DNAm microarray analysis presented here contributes to improving the quality of DNAm microarray studies in medical research. There is room for continued improvement of these tools through future research, with a sound prospect for future applications to medical care.

1.2 Epigenetics – a mechanism for fetal programming
Developmental origins of health and disease (DOHaD) is a theory that evolved out of studies in the 1980s correlating infant birthweight with adult disease risk [10]. DOHaD now posits that “developmental cues extending from the oocyte to the infant… [have] widespread consequences for later health” [10]. Though development continues after birth, “critical periods of plasticity exist for most organs and systems during gestation… followed by loss of plasticity and fixed functional capacity” [11]. This developmental plasticity makes in utero exposures (e.g., nutrition, oxygen, hormones or toxins) instrumental in programming the fetus and placenta, with impacts on both normative trait variation and predisposition to disease [12].
A mechanism by which the fetus and/or placenta might respond to or reflect the in utero environment is epigenetics, i.e., “heritable changes within a cell without a change in the DNA sequence” [13]. Epigenetics is often likened in the public media to a “light dimmer switch”, modulating the quantity, timing and location of gene expression [14-16]. However, this relationship is not unidirectional; epigenetic marks may themselves be modulated by the expression of genes [17].
Epigenetic mechanisms include chemical modifications to histones and DNA, which can affect chromatin packaging and/or binding of transcription machinery. The establishment of some epigenetic marks coincides with critical periods of developmental plasticity, which has popularized epigenetics as a mechanism for DOHaD [10]. Responses to or consequences of in utero exposures on the epigenome likely depend on many factors (e.g., timing, quantity, type, length of exposure) and may be localized to specific regions or have genome-wide effects [12].

1.2.1 Epigenome-wide association studies
Epigenome-wide association studies (EWAS) are a tool for identifying epigenetic loci correlated with complex phenotypes [18], i.e., traits determined by both genetic and environmental factors. It is hoped that EWAS will provide a “missing piece” in understanding the etiology of such diseases and traits [4]. Though the number of EWAS publications is rising steeply [5], the field stands where genome-wide association studies (GWAS) stood perhaps a decade ago, when studies were small and rarely replicated findings for any given disease [8]. Issues that EWAS share with GWAS must first be addressed, including collection of data on multiple platforms, minimum sample sizes, batch effects and normalization [4]. EWAS are further complicated by the fact that, unlike GWAS, which examine stable genetic marks, they examine epigenetic marks that vary with biological factors (e.g., age, environmental exposure, tissue type) [4]. As evidenced by numerous cautionary reviews of design and analysis [5, 6, 19, 20], the field of EWAS is still in its infancy, but must mature quickly to make use of current research efforts [4].

1.3 DNA methylation
DNA methylation (DNAm) is the most commonly studied mark in EWAS [5], because it is relatively easy to measure (see Section 1.3.6) and it has a well-studied relationship with gene expression (see Section 1.3.2).
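Since DNAm is the mark most EWAS examine, a minimal sketch of the per-site bookkeeping that follows a statistical test may help make the EWAS design concrete. It assumes a p-value has already been computed for each CpG by some per-site test (e.g., a t-test or linear model, not shown), computes the difference in mean β between groups (deltaβ), adjusts the p-values with the Benjamini-Hochberg false discovery rate (FDR) procedure, and keeps sites passing both an FDR and a deltaβ cut-off. The function names, data layout and cut-offs (FDR < 0.05, |deltaβ| ≥ 0.05) are hypothetical and are not taken from the analyses in this dissertation.

```python
from statistics import mean


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_hits(cases, controls, pvalues, fdr=0.05, min_delta_beta=0.05):
    """Return (CpG, delta-beta, adjusted p) for sites passing both cut-offs.

    cases, controls: dict of CpG id -> beta values, one per sample.
    pvalues: dict of CpG id -> p-value from a per-site test run elsewhere.
    The default cut-offs are hypothetical.
    """
    cpgs = list(pvalues)
    adjusted = bh_adjust([pvalues[cpg] for cpg in cpgs])
    hits = []
    for cpg, q in zip(cpgs, adjusted):
        delta_beta = mean(cases[cpg]) - mean(controls[cpg])
        if q < fdr and abs(delta_beta) >= min_delta_beta:
            hits.append((cpg, round(delta_beta, 3), round(q, 4)))
    return hits


# Hypothetical beta values (three samples per group) and per-site p-values.
cases = {"cg1": [0.80, 0.82, 0.78], "cg2": [0.50, 0.52, 0.49],
         "cg3": [0.30, 0.31, 0.29], "cg4": [0.10, 0.12, 0.11]}
controls = {"cg1": [0.60, 0.62, 0.61], "cg2": [0.50, 0.51, 0.50],
            "cg3": [0.28, 0.30, 0.29], "cg4": [0.11, 0.10, 0.12]}
pvalues = {"cg1": 0.001, "cg2": 0.04, "cg3": 0.03, "cg4": 0.2}
print(call_hits(cases, controls, pvalues))  # → [('cg1', 0.19, 0.004)]
```

Requiring a minimum deltaβ alongside the FDR cut-off is one way to avoid reporting statistically significant but very small differences that may fall within the technical variability of the assay.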
In humans, DNAm usually involves the addition of a methyl group (CH3) to the fifth carbon of a cytosine (C) base in the context of a cytosine-phosphate-guanine (CpG) dinucleotide [21]. Non-CpG DNAm (e.g., at CpA, CpT or CpC dinucleotides) occurs primarily in pluripotent cells (~25% of DNAm in human embryonic stem cells versus <0.05% of DNAm in fetal lung fibroblasts), with low conservation of patterns across passages and cell lines [22, 23]. Thus, unless otherwise stated, the term “DNAm” is used in this dissertation to refer to DNAm at CpG dinucleotides. The CpG sequence is palindromic (i.e., it reads the same 5’ -> 3’ on both complementary strands); this property is important for the heritability of DNAm patterns. The enzyme DNA methyltransferase 1 (DNMT1) binds the hemimethylated DNA present following cell division and catalyzes the addition of methyl groups to the new strand, ensuring the propagation of DNAm patterns through mitosis [24]. The initial establishment of DNAm is catalyzed by other DNA methyltransferases, DNMT3A and DNMT3B, discussed in Section 1.3.3.

1.3.1 Genomic landscape of CpG dinucleotides
CpG dinucleotides are not randomly distributed throughout the genome; over the course of genome evolution, most have been lost through spontaneous deamination of methylated cytosine, except in some enriched regions known as “CpG islands” [25]. Of all CpGs, about one-half fall in repetitive sequences, while the remainder are located in unique intragenic and intergenic sequences [26]. About 70% of gene promoters are associated with CpG islands [27], and these generally have a low level of DNAm [28], while CpG islands in repetitive elements are generally highly methylated [26]. Several classifications exist for defining regional CpG enrichment; for example, the University of California Santa Cruz (UCSC) Genome Browser defines a CpG island as a region with length >200 bp, GC content >50% and observed/expected (obs/exp) CpG ratio >0.60 [29].
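The criteria above can be checked directly on a sequence. The sketch below computes GC content and the obs/exp CpG ratio using the widely used definition obs/exp = (number of CpGs × length) / (number of Cs × number of Gs), then applies the thresholds quoted above to a whole region. It is a simplification, assumed for illustration: the UCSC track is generated by scanning the genome rather than by testing a pre-defined region, which this sketch does not attempt, and the function names are hypothetical.

```python
def cpg_metrics(seq):
    """Return (length, GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    length = len(seq)
    c_count, g_count = seq.count("C"), seq.count("G")
    # "CG" cannot overlap itself, so str.count gives the full CpG count.
    cpg_count = seq.count("CG")
    gc_fraction = (c_count + g_count) / length if length else 0.0
    # Expected CpGs = C count * G count / length, so obs/exp = CpG * length / (C * G).
    obs_exp = cpg_count * length / (c_count * g_count) if c_count and g_count else 0.0
    return length, gc_fraction, obs_exp


def meets_ucsc_thresholds(seq):
    """Whole-region check against the thresholds quoted in the text:
    length > 200 bp, GC content > 50%, obs/exp CpG ratio > 0.60."""
    length, gc_fraction, obs_exp = cpg_metrics(seq)
    return length > 200 and gc_fraction > 0.50 and obs_exp > 0.60


# GC-rich but CpG-depleted: high GC content, yet the obs/exp ratio is 0.
print(cpg_metrics("GGCCAA" * 50), meets_ucsc_thresholds("GGCCAA" * 50))
# → (300, 0.6666666666666666, 0.0) False
```

The example shows why GC content alone is not enough: the obs/exp ratio is what separates genuinely CpG-enriched regions from GC-rich sequence in which CpGs have been depleted.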
An alternate system provides finer discrimination of CpG enrichment, classifying regions into high density CpG islands (HCs: length >500 bp, GC content >55%, obs/exp CpG ratio >0.75), intermediate density CpG islands (ICs: length >200 bp, GC content >50%, obs/exp CpG ratio >0.48) and low CpG density or non-island regions (LCs: regions that are neither HC nor IC) [30, 31]. About 65% of promoter-associated CpG islands are classified as HC by this second system [30]. Promoters associated with HCs have been termed strong island promoters; those associated with ICs, weak island promoters; and those not associated with a CpG island, non-island promoters [30]. The preservation of CpG-rich regions in the genome through evolution suggests functional importance [21].

1.3.2 Relationship of DNA methylation with gene expression
Until recently, studies of DNAm had concentrated on CpG islands near transcription start sites (TSS), which has largely informed the perception of the function of DNAm in the human genome [17]. Though most promoter CpG islands are unmethylated, a portion becomes methylated during development [28] (discussed in Section 1.3.3). The presence of DNAm at a promoter CpG island is generally associated with transcriptional repression of the related gene [32]. Two mechanisms proposed for this relationship are inhibition of transcription factor binding at methylated CpGs and recruitment of methyl-binding proteins associated with chromatin condensation [24]. In a study measuring DNAm at promoter CpG islands and gene expression in the same set of 55 individuals, the canonical negative correlation between promoter DNAm and gene expression held when looking across all genes within an individual [33]. However, at the level of single genes, variation in DNAm across individuals was correlated with gene expression at only a small number of promoters [33, 34].
Most HCs are unmethylated in somatic tissues, many in association with transcriptionally inactive genes [28]; thus, DNAm may not be necessary to inhibit gene expression. It has been suggested that multiple epigenetic and genetic mechanisms may work in tandem to modulate gene expression [35]. DNAm at CpG islands may better reflect the potential for gene transcription than the current transcriptional status [33].
Investigations of DNAm are not limited to CpG island promoters. There is growing interest in DNAm in the 2 kb up- and downstream of CpG islands, known as “CpG island shores”. Shores show more variability in DNAm than CpG islands, including between different tissues and individuals, and between cancer and healthy tissues [36-38]. DNAm might be prevented in promoter CpG islands and shores of actively transcribed genes [35] by the presence of factors (e.g., transcription factors, RNA polymerase) that restrict DNMT access [24]. In fact, the level of DNAm in shores is often found to be more tightly correlated with gene expression than that of CpG islands [39, 40]. Gene-body DNAm has been shown to have an inverted “U”-shaped relationship with gene expression: lower levels of gene-body DNAm are associated with the lowest and highest levels of gene expression, whereas higher levels of gene-body DNAm are associated with intermediate levels of gene expression [41]. Thus, the relationship between DNAm and gene expression is likely much more complex than initially posited.

1.3.3 DNAm in development
In healthy adult tissues, DNAm is relatively stable, though a gradual global loss of DNAm and an increase in inter-individual variation have been observed with aging [28, 42]. In contrast, rapid and dramatic spatial and temporal changes in DNAm are observed early in embryo development (Figure 1.1) [43].
Most of the research on DNAm in early development comes from experiments in mouse embryos, though a few recent studies highlighted below suggest similar findings in human development. Changes in DNAm are timed with critical points of differentiation, which strongly suggests a role for DNAm in tissue lineage differentiation.

Figure 1.1 Dramatic changes in genome-wide DNA methylation during embryogenesis
Studies in early mouse and human embryos document dramatic changes in DNAm during development. Looking genome-wide or at the “bulk genome”, DNAm is mostly erased during the first few cell divisions. Establishment of new DNAm marks coincides with lineage differentiation: inner cell mass (ICM)-derived lineages exhibit high levels of DNAm, while trophectoderm (TE)-derived lineages exhibit relative hypomethylation. Mouse and human numbers represent days post-fertilization. Line colour indicates lineage: light grey, sperm; dark grey, oocyte; green, TE and derivatives; blue, ICM and ectoderm; red, mesoderm; yellow, endoderm; E, embryonic. (© 2002 Nature Publishing Group; image based on [44], with permission).

Looking at the whole genome prior to fertilization, sperm are highly methylated, while oocytes exhibit an intermediate-to-high level of DNAm [45]. After fertilization, the paternal pronucleus undergoes rapid demethylation without DNA replication (termed active demethylation). DNMT1 is excluded from the nucleus of the oocyte, and demethylation continues passively through successive rounds of DNA replication in the fertilized embryo [43]. The study of demethylation is still fairly new, but the modified nucleotide 5-hydroxymethyl cytosine (5-hmC) is thought to play a role in both active and passive demethylation. In a murine study, 5-hmC was enriched in the paternal pronucleus 2.5 hours after fertilization, coinciding with decreased 5-mC [46].
5-hmC may displace proteins that bind to and protect methylated DNA or prevent DNMT1 binding, resulting in passive demethylation. Alternatively, 5-hmC may be recognized by DNA repair mechanisms and excised, resulting in active demethylation [47]. The embryo continues to lose DNAm through to the blastocyst stage, at which point the compacted inner cell mass (ICM) and the surrounding trophectoderm (TE) are similarly hypomethylated [45, 48]. The loss of gametic DNAm patterns is observed throughout the genome (though to a lesser extent in repetitive elements), with the exception of some imprinted genes, which maintain a parent-of-origin pattern of DNAm.
De novo DNAm is initiated at implantation [48], and dramatic gains of DNAm, catalyzed by the DNA methyltransferases DNMT3A and DNMT3B, are observed in the ICM [45, 49]. The spatial trends in DNAm described in Section 1.3.1 are quickly established, i.e., high levels of DNAm throughout the genome with the exception of CpG islands. Interestingly, some pluripotency- and developmental-gene CpG island promoters are highly methylated in ICM-derived tissues, suggesting a role for DNAm in restricting cell fate [49]. After implantation, gastrulation takes place and the ICM differentiates into three germ layers: the ectoderm (giving rise to spinal cord, brain, nails), mesoderm (giving rise to skeleton, muscle, kidneys) and endoderm (giving rise to lungs, liver, thyroid). Clustering of genome-wide DNAm from 17 mouse tissues showed significant grouping of samples by germ layer origin [50], indicating that major trends in DNAm are established during gastrulation and retained in derivative tissues.

1.3.4 Tissue differences in DNA methylation
One of the originally postulated functions of epigenetic regulation was tissue differentiation [28], and it is now widely held that variation in DNAm is a driver of lineage restriction.
Between-tissue variation in genome-wide DNAm within an individual is much greater than within-tissue variation across a population, as indicated in multi-tissue studies by the grouping of samples from the same tissue rather than of samples from the same individual [37, 51, 52]. This pattern highlights the association of DNAm patterns with tissue identity. Tissue-specific differentially methylated regions (tDMRs), present throughout the genome, were described in the early 2000s [53]. Soon thereafter, tDMRs were found to be located primarily in intermediate CpG density regions, with three quarters of tDMRs located in the 2 kb surrounding CpG islands, popularizing the term and definition of “CpG island shores” in epigenetics [37]. Moreover, the tissue signal of DNAm in these regions is conserved in mice, and DNAm of tDMRs groups tissues by organ type (e.g., brain, liver and spleen) rather than by species (mouse vs. human), consistent with evolutionary conservation of DNAm in these regions.
Further discrimination of DNAm patterns has been noted between cell types isolated from within a tissue. The most studied tissue is whole blood, containing six broad nucleus-contributing lineages: macrophages, lymphocytes, basophils, neutrophils, eosinophils and nucleated red blood cells. Since DNAm acts as “epigenetic memory” [28], with patterns conserved in derivative cells, hematopoietic lineages can be recapitulated using DNAm profiles of blood cell types [54]. Differences in DNAm between cell types within blood were small, but were associated with transcription factors involved in lineage specification. Not only do these findings strongly suggest a role for DNAm in lineage specification, but they also have important implications for DNAm study design and analysis [55].
In a tissue sample of mixed cell populations, a difference identified between study groups may reflect a change in DNAm at that location limited to one cell type, or a difference in the composition of cell types between groups. Though the distribution of DNAm across the genome differs between tissues, ICM-derived lineages exhibit a similar average genome-wide level of DNAm, distinct from that of TE-derived lineages.

1.3.5 Placental DNA methylation
The TE surrounding the developing embryo does not undergo remethylation to the same extent as the ICM upon implantation. The TE maintains a lower level of DNAm, which may be due in part to reduced expression of DNMT1 in this lineage [56]. The TE gives rise to the major placental cell type, trophoblast, which makes up a large component of placental chorionic villi. Of healthy tissues, chorionic villi have the most distinct DNAm profile, exhibiting about 10-25% less DNAm than somatic tissues [57]. Unique features of chorionic villus DNAm include hypomethylation of repetitive elements [58], large regions (>100 kb) of hypomethylation termed partially methylated domains [59], hypermethylation of tumor suppressor genes [58] and unique genomic imprinting [60, 61]. The purpose of this DNAm profile remains unclear, though it perhaps contributes to placental qualities like controlled invasion of the maternal decidua or allows for adaptation to a range of in utero conditions [62].

1.3.6 Methods for measuring DNA methylation
In a single diploid cell, at a single CpG locus, DNAm can be in one of three states (with corresponding value): absent on both alleles (0), present on one allele (0.5) or present on both alleles (1). A human tissue, however, contains many cells, often originating from different lineages. Thus, DNAm measured in a tissue sample is in reality an average across a pool of cells, and will range between 0 and 1 [63].
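The averaging described above, and the point made in Section 1.3.4 about mixed cell populations, can be illustrated with a simple linear mixing model, assumed here for illustration, in which a tissue's measured DNAm is the proportion-weighted average of the DNAm of its cell types. In the hypothetical two-cell-type example below, a shift in cell composition and a change confined to one cell type produce the same bulk value, so the bulk measurement alone cannot tell them apart. The cell types, proportions and β values are invented.

```python
def bulk_beta(proportions, cell_betas):
    """Proportion-weighted average of cell-type DNAm levels (simple linear mixing)."""
    if abs(sum(proportions.values()) - 1.0) > 1e-9:
        raise ValueError("cell-type proportions must sum to 1")
    return sum(proportions[cell] * cell_betas[cell] for cell in proportions)


def diploid_cell_value(methylated_alleles):
    """DNAm state of one CpG in one diploid cell: 0, 0.5 or 1."""
    if methylated_alleles not in (0, 1, 2):
        raise ValueError("a diploid cell has 0, 1 or 2 methylated alleles")
    return methylated_alleles / 2


# A tissue value is an average over many cells, so it can fall anywhere in [0, 1].
cells = [diploid_cell_value(k) for k in (2, 2, 1, 0, 2)]
print(sum(cells) / len(cells))  # → 0.7

# Hypothetical CpG: cell type A is highly methylated, cell type B is not.
control = bulk_beta({"A": 0.5, "B": 0.5}, {"A": 0.8, "B": 0.2})
# Case 1: same cell-type DNAm, but more of cell type A in the sample.
composition_shift = bulk_beta({"A": 0.6, "B": 0.4}, {"A": 0.8, "B": 0.2})
# Case 2: same composition, but DNAm changes only in cell type B.
cell_specific = bulk_beta({"A": 0.5, "B": 0.5}, {"A": 0.8, "B": 0.32})
print(round(control, 2), round(composition_shift, 2), round(cell_specific, 2))
# → 0.5 0.56 0.56
```

Both hypothetical case groups differ from the control by the same deltaβ of 0.06, which is why estimates of cell composition are often needed to interpret differences found in heterogeneous tissues.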
Numerous techniques have been developed to assay DNAm, each with a different complement of technical specifications (Table 1.1). The choice of method determines the type of conclusions that can be drawn from the data. Many popular DNAm techniques require pretreatment of DNA with bisulfite to discriminate between methylated C (5-mC) and unmethylated C (5-C). Bisulfite treatment converts 5-C to uracil (U), while 5-mC is not converted. After an amplification step, each 5-C in the original DNA sequence is replaced by T, while each 5-mC remains C [64]. Bisulfite conversion cannot distinguish 5-mC from 5-hmC [47], which may be problematic when measuring DNAm in tissues with elevated levels of 5-hmC. Both human and mouse studies suggest that 5-hmC is present at low levels in most tissues, with the exception of sperm, neuronal and undifferentiated cells [46, 65]. Bisulfite conversion turns epigenetic differences in DNAm into genetic (sequence) differences, allowing the application of molecular genetics tools to discriminate methylated from unmethylated sites.

Table 1.1 Considerations for choosing DNA methylation technique
Technical consideration: Details
Genomic coverage: How many CpGs are measured by the assay?
  Global techniques – measure all CpGs in the genome [28]
  Genome-wide techniques – measure the “bulk genome”: many CpGs, dispersed across the genome
  Locus-specific techniques – measure CpGs in a restricted region (e.g., a particular gene)
Resolution: At what granularity are measurements taken?
  Integrated – an average value across a region or genome
  CpG-specific – a single value per CpG
Sensitivity: What is the minimum magnitude of difference that can be measured?
  May range from 0.01 to >0.10
Precision: What is the technical variability of the assay?
Pretreatment: Is a step required to distinguish 5-mC from 5-C prior to measurement?
  None
  Enzymatic digestion – treatment with a methylation-sensitive restriction enzyme and a non-methylation-sensitive isoschizomer
  Affinity enrichment – a methyl-binding protein enriches for the methylated DNA fraction, often compared to an untreated fraction
  Bisulfite conversion
Platform type: What is the basis of the molecular approach for measurement?
  Gel-based
  ELISA-based
  Array-based
  Sequencing-based
5-mC specificity: Does the assay specifically measure 5-mC?
  Bisulfite conversion-based techniques cannot distinguish 5-mC from 5-hmC

1.3.7 DNAm microarrays – the Illumina platform
Microarrays are a popular tool for genome-wide measurement of DNAm because (i) measurements are taken at specific, targeted CpGs; (ii) the technology is robust; and (iii) costs are low. The suite of DNAm microarrays produced by Illumina, Inc. is the most widely used in human EWAS [18, 63, 66, 67]. In this dissertation, data were collected using two iterations of Illumina DNAm arrays: the Infinium HumanMethylation27 BeadChip (27k array, Chapter 2) and the Infinium HumanMethylation450 BeadChip (450k array, Chapters 3-5). An overview of these two arrays is presented in Table 1.2. Specific language is used in this dissertation with reference to the Illumina DNAm arrays (Figure 1.2). Twelve samples can be analyzed in parallel on a single “chip”, with the bisulfite-converted DNA from each sample loaded onto one of the twelve “arrays”. On the surface of each array are hundreds of thousands of “beads”, to which numerous copies of a “probe” are attached. These synthetic 50-mer oligonucleotide probes are designed to bind DNA at a specific location in the genome and measure DNAm at the “target CpG” site.
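Because the Illumina arrays measure bisulfite-converted DNA, the conversion rule from Section 1.3.6 is worth seeing on an actual sequence. The sketch below gives the read expected from one strand after bisulfite treatment and amplification: unmethylated cytosines read as T, while cytosines listed as methylated remain C, so a methylated and an unmethylated CpG differ by a C/T change, the "SNP" that the array chemistry goes on to genotype. It assumes complete conversion, ignores the opposite strand and, like bisulfite itself, would treat 5-hmC as 5-mC; the function name and the set-of-positions input are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions):
    """Return the post-bisulfite, post-amplification read of one DNA strand.

    seq: template strand, 5'->3'.
    methylated_positions: 0-based indices of cytosines carrying 5-mC (or 5-hmC,
    which bisulfite cannot distinguish from 5-mC).
    Assumes complete conversion of every unmethylated cytosine.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            out.append("T")  # unmethylated C -> U after bisulfite -> T after PCR
        else:
            out.append(base)  # 5-mC and non-C bases are unchanged
    return "".join(out)


template = "ACGTCGACCA"
# The cytosine of the first CpG (index 1) is methylated; the second CpG (index 4) is not.
print(bisulfite_convert(template, {1}))  # → ACGTTGATTA
```

Note that the non-CpG cytosines at indices 7 and 8 also become T, consistent with the near absence of non-CpG DNAm in differentiated tissues noted earlier in this chapter.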
Table 1.2 Comparison of Illumina 27k and 450k DNA methylation arrays [68, 69]
Number of targets
  Infinium 27k array (released 2008): 27,578 CpG sites
  Infinium 450k array (released 2010): 485,577 sites (482,421 CpG sites; 3,091 non-CpG sites; 65 random SNPs); includes ~90% of the 27k array sites
Coverage
  27k array: promoters of 14,475 genes (average 2 sites per gene; 2-30 sites for 200 cancer-related and imprinted genes)
  450k array: 99% of RefSeq genes; 96% of CpG islands; some coverage in shores, intergenic regions and enhancers
Probe technology
  27k array: Infinium I
  450k array: Infinium I & II

Illumina DNAm array technology is based on “quantitative genotyping of C/T SNPs introduced following bisulfite conversion” [66]; in fact, the technology is borrowed from Illumina’s SNP arrays. Measuring DNAm involves four principal steps: (i) probe binding of targeted bisulfite-treated DNA; (ii) single nucleotide extension of bound probes with a biotin- or DNP-labelled ddNTP; (iii) staining with fluorophores to differentiate ddNTP labels, with Cy5 labelling ddATP and ddTTP (red) and Cy3 labelling ddGTP and ddCTP (green); and (iv) chip scanning with a fluorescent scanner [70].
Two assay types are used by the Infinium platforms, Infinium I (i.e., Type I probes) and Infinium II (i.e., Type II probes), bound to beads on the surface of the array (Figure 1.2). The Infinium I assay uses two bead types specific to the target CpG, an unmethylated (u) and a methylated (m) bead, each with a different probe design. Both u and m Type I probes for a given CpG fluoresce in the same colour channel. The Infinium II assay uses only one bead type for each CpG of interest: a methylated + unmethylated (m+u) bead. One probe is designed for each Type II target, and the colour of fluorescence depends on which nucleotide is incorporated in the single base extension step.
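The sketch below connects the two assay designs to the summary values defined in the next paragraph [71]. For a Type I probe, the methylated and unmethylated signals come from two different beads read in the same colour channel (set by the base following the CpG). For a Type II probe, a single bead is read in both channels: a methylated template retains C, so the probe extends with G (Cy3, green), while an unmethylated template now carries T, so the probe extends with A (Cy5, red). The dictionary layout and function names are hypothetical, and the sketch omits background correction, normalization and the constant offset often added to the β denominator in practice.

```python
import math


def signals(probe):
    """Return (m, u) intensities for one probe record (hypothetical dict layout)."""
    if probe["type"] == "I":
        # Two beads (m and u), both read in the probe's single colour channel.
        channel = probe["channel"]  # "red" or "green"
        return probe["m_bead"][channel], probe["u_bead"][channel]
    if probe["type"] == "II":
        # One bead: G extension (Cy3, green) means the C was retained (methylated);
        # A extension (Cy5, red) means the C was converted to T (unmethylated).
        return probe["bead"]["green"], probe["bead"]["red"]
    raise ValueError(f"unknown probe type: {probe['type']}")


def beta_value(m, u):
    """beta = max(m, 0) / (max(u, 0) + max(m, 0)); NaN if both signals are <= 0."""
    m, u = max(m, 0.0), max(u, 0.0)
    return m / (u + m) if (u + m) > 0 else float("nan")


def m_value(m, u):
    """M = log2((max(m, 0) + 1) / (max(u, 0) + 1))."""
    return math.log2((max(m, 0.0) + 1) / (max(u, 0.0) + 1))


# Hypothetical raw intensities for one probe of each type.
type_i = {"type": "I", "channel": "red",
          "m_bead": {"red": 6000.0, "green": 300.0},
          "u_bead": {"red": 2000.0, "green": 250.0}}
type_ii = {"type": "II", "bead": {"green": 1500.0, "red": 4500.0}}

for probe in (type_i, type_ii):
    m, u = signals(probe)
    print(probe["type"], beta_value(m, u), round(m_value(m, u), 3))
```

Here the Type I probe gives β = 0.75 and the Type II probe β = 0.25. For intensities well above the offset of 1 used in the M value, M is close to log2(β / (1 − β)), so the two measures carry largely the same information on different scales.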
The intensity of fluorescence in the scanned images is translated into a level of DNAm for each targeted site [66], either as a β value, a number between 0 and 1 calculated as $\beta = \frac{\max(m, 0)}{\max(u, 0) + \max(m, 0)}$, or as an M value, calculated as $M = \log_2\left(\frac{\max(m, 0) + 1}{\max(u, 0) + 1}\right)$, where $m$ and $u$ are the methylated and unmethylated signal intensities [71]. Though the use of two probe types increases the complexity of 450k data analysis, it is necessary for coverage of both higher (Type I) and lower (Type II) CpG density regions [66].
As the 450k array is a relatively new platform, there has been considerable discussion over many aspects of how its data should be analyzed [7, 66, 71-73]. One example is the development of numerous methods for 450k data normalization, including, among others: all sample mean normalization (ASMN) [74], peak-based correction (PBC) [66], subset-quantile within array normalization (SWAN) [75], subset quantile normalization (SQN) [76], β-mixture quantile normalization (BMIQ) [77] and Dasen [78]. Reports comparing the performance of various subsets of normalization methods demonstrate that some of these methods perform worse than simply using raw data [73, 79, 80]. Furthermore, these reports differ in which method they rank as the top normalization approach. Though EWAS would benefit from quickly developed standards for data production and analysis, this is somewhat more difficult than in GWAS due to the known biological variability of DNAm [4].

Figure 1.2 Illumina Infinium DNA methylation microarray technology. A portion of this figure was removed due to copyright restrictions. It was a diagram adapted to define terms associated with Illumina Infinium DNAm arrays: “chips” are slides containing 12 “arrays”, onto each of which bisulfite-converted DNA from one sample is loaded. On the surface of each array are hundreds of thousands of “beads”, carrying multiple copies of a given “probe”. Original source: http://www.illumina.com/technology/beadarray-technology.html.
Each probe acts as a primer for single nucleotide extension off the targeted CpG in the bisulfite-converted template DNA. The Illumina Infinium HumanMethylation27 BeadChip (27k array) contains 27,578 probes, all of Type I design. The Illumina Infinium HumanMethylation450 BeadChip (450k array) contains 485,577 probes of Type I or Type II design. For each Type I probe, methylated and unmethylated versions of the probe are designed and housed on different beads. The 3’ end of the probe overlaps the target CpG site, and when the end nucleotide matches, single nucleotide extension occurs at the nucleotide following the CpG. For this reason, the nucleotide incorporated (and thus the colour of fluorescence) is the same for both unmethylated and methylated probes. For each Type II probe, only a single design is needed to measure DNA methylation, as the probe overlaps only the “G” of the CpG. The colour of fluorescence indicates whether the site was methylated or unmethylated in the template DNA. (© 2011 Future Medicine Ltd.; image based on [66], with permission).

1.4 Folate-mediated one carbon metabolism
The biochemical process that activates and transfers methyl groups is folate-mediated one carbon metabolism (OCM), named for folate’s key role as a substrate. OCM is essentially the intersection of two underlying pathways: the DNA cycle, which produces nucleotide precursors, and the methylation cycle, which produces S-adenosylmethionine (SAM), the universal methyl donor for methyltransferases (Figure 1.3). As these roles make evident, functional OCM is fundamental to basic cellular activities like mitosis, growth and migration [81].

Figure 1.3 Folate-mediated one carbon metabolism. A simplified diagram illustrating key substrates and enzymes in the methylation cycle (left) and DNA cycle (right) that make up one carbon metabolism (OCM).
5-CH3-THF, 5-methyltetrahydrofolate; THF, tetrahydrofolate; 5,10-CH2-THF, 5,10-methylenetetrahydrofolate; SAM, S-adenosylmethionine; SAH, S-adenosylhomocysteine; MTHFR, methylenetetrahydrofolate reductase; MS, methionine synthase; X, methyl acceptor. (© 2010 Oxford University Press; image based on [81], with permission).

1.4.1 Sources of folate and dietary requirements
The three main sources of dietary folate are: (i) naturally occurring folates (vitamin B9), found in dark leafy greens, legumes and citrus fruits; (ii) folic acid-fortified food, including most commercial cereals, breads and pastas; and (iii) folic acid supplements [81]. Dietary folates are less bioavailable than synthetic folic acid (~50% vs. 100%), since folates lose biochemical activity with harvesting, storage and cooking, and also require processing in the gut before absorption [82]. It is recommended that all women of reproductive age consume 400 µg/day of folic acid; this may increase up to 5.0 mg/day for women with a history of neural tube defect pregnancies or diabetes, or who take anticonvulsant medication [83].

1.4.2 Folate uptake and transport during pregnancy
Folates and folic acid are absorbed by transporters located in the microvilli of the small intestine. Once inside the intestinal mucosa, folates are converted into 5-methyltetrahydrofolate (5-methylTHF) for transport via maternal blood [82]. Cellular uptake of 5-methylTHF is mediated by the reduced folate carrier (RFC-1) and any of three folate receptors (FR α, β, γ) [81]. In maternal-facing placental trophoblast cells, FRα is the main transporter of 5-methylTHF, which it reversibly binds and releases into fetal circulation [84]. However, prior to 10-12 weeks of gestation, perfusion of the placenta by maternal blood is low [85], and thus transport of folate to the embryo is histotrophic, i.e., via uterine glands [86]. Folates and other nutrients are secreted by maternal endometrial glands and absorbed by the trophoblast of the invading embryo.
Endometrial gland secretions are transferred to the embryo via the vitelline veins of the yolk sac [86]. Placental trophoblast can concentrate folate to up to three times the level found in maternal circulation, reducing the risk of deficiency during critical windows of development [81].  1.4.3 Suboptimal folate  Suboptimal folate levels may lead to altered OCM cycling, with potential effects on several cellular processes: (i) nucleotide synthesis, (ii) homocysteine recycling, and (iii) methylation reactions (Figure 1.3). Pyrimidine synthesis uses 5,10-methylenetetrahydrofolate to create deoxythymidine monophosphate (dTMP) from deoxyuridine monophosphate (dUMP). Inadequate folate is hypothesized to increase incorporation of uracil into DNA, leading to increased rates of point mutations, double-stranded breaks and chromosome instability [87]. This mechanism is thought to play a role in the association of certain cancers with low dietary folate intake [87]. 5-methylTHF is also a cofactor in the conversion of the amino acid homocysteine (hcy) to methionine in OCM. Elevated hcy levels are associated with inadequate folate intake, and hcy is used to gauge functional folate metabolism. Elevated hcy was found to be cytotoxic in vitro [88] and to induce apoptosis in trophoblast cells [89], as did folate deficiency [90]. Furthermore, recycling of hcy prevents build-up of the methylation reaction product S-adenosylhomocysteine (SAH). As SAH is an inhibitor of methyltransferases, inadequate folate could also be expected to alter the capacity to methylate DNA, histones and lipids [91, 92].   Inadequate folate may be either dietary (i.e., a lack of intake) or functional. Sources of functional inadequacy include dietary deficiency of micronutrients that act as cofactors in OCM (vitamins B2, B12 and B6, iron or zinc) or reduced enzymatic function [81]. The enzyme most studied for its role in functional folate inadequacy is methylenetetrahydrofolate reductase (MTHFR).
MTHFR catalyzes an irreversible reaction that commits methyl groups to the methylation cycle, rather than the DNA synthesis cycle, in OCM (Figure 1.3). Two single nucleotide polymorphisms (SNPs) in MTHFR (677C>T and 1298A>C) were shown to reduce MTHFR function in vitro, to about 45% and 68% of reference controls, respectively [93]. These variants are common in the population, with frequencies for MTHFR 677 ranging from 4% to 32% depending on geography and ethnicity [94]. Elevated hcy is consistently documented in the blood of homozygous carriers of the 677T variant, indicative of compromised OCM cycling [95-98]; however, data on the effect of the 1298C variant in vivo are conflicting [95].   1.4.4 Consequences of suboptimal folate during pregnancy  Demand for OCM is particularly high during gestation because of the rapid growth and development of the embryo/fetus and placenta. Inadequate folate, both dietary and functional, may be associated with increased risk for pregnancy complications, including Down syndrome, placental abruption, pre-eclampsia, spontaneous abortion and neural tube defects [81]. In 1998, the Canadian government implemented mandatory folic acid fortification of cereal and grain products, and similar requirements are now found in more than 80 countries around the world [99]. The driving force for this public health measure was a series of studies demonstrating that the high incidence of neural tube defects documented in low socioeconomic status groups [100] could be lowered with maternal prenatal and perinatal folate supplementation [101-103].  1.4.5 Neural tube defects Neural tube defects (NTDs) are the second most common congenital abnormality worldwide [92], though regional rates have varied from 20 per 1,000 births in Northern China [104] to 0.75 per 1,000 births in Canada [105]. NTDs arise in the first month of pregnancy from a failure of the flat neural plate to elevate, fold and close into the neural tube in the developing embryo [106].
NTDs are complex disorders, and different types of NTDs are thought to result from different predisposing genetic and environmental factors [107]. Under normal conditions, closure of the neural tube is initiated at two or more sites and extends along its length [108]. Failure of closure at different sites may result in different types of NTDs, including anencephaly, craniorachischisis, open and closed spina bifida and encephalocele [109].  Caudal failure of neural tube closure can lead to spina bifida, most commonly in the form of myelomeningocele, i.e., herniation of the spinal cord, usually also affecting the overlying muscle and vertebrae [109]. Cranial failure of neural tube closure can result in anencephaly, clinically characterized by the full or partial absence of the brain and skull [109]. It has been suggested that anencephaly is preceded by an initial herniation of the cephalic neural tube (termed exencephaly) followed by degradation in the amniotic fluid [110], as described in mice [106]. While most fetuses with an NTD are terminated, die in utero or are stillborn, live-born infants with spina bifida may survive after undergoing surgery. Children with spina bifida usually rely heavily on family and the health care system due to issues such as paralysis, deformation of limbs or spine, bladder or bowel dysfunction, sexual dysfunction and learning disabilities [105].  NTDs follow a multifactorial pattern of inheritance, with interacting genetic and environmental factors predisposing to malformation [107]. NTDs occur most often as an isolated defect, with a higher prevalence in females than males, especially for anencephaly [111]. The recurrence risk for first-degree relatives is 2-5%, and this increases when more than one relative is affected (with more than two affected relatives, the risk rises to 11%) [112].
In addition to maternal folate status, maternal glycaemic dysregulation (maternal obesity and pre-pregnancy diabetes mellitus) [113] and maternal and fetal MTHFR 677 genotype [114] are also known risk factors for NTD development.   After the implementation of folic acid fortification, a Canadian survey documented a 58% and 36% decrease in spina bifida- and anencephaly-affected pregnancies, respectively [115]. In addition to the role of folic acid fortification and supplementation, the contribution of social factors, such as increased uptake of prenatal screening and medical terminations, should be recognized [116]. Despite this success, the incidence of NTDs in Canada per 1,000 live and stillbirths was 0.41 for spina bifida and 0.36 for anencephaly in 2007 [115]. The etiology of these persistent NTDs remains unclear.  1.5 Research objectives Previous doctoral students in the Robinson lab, in particular Dr Yuen (2011), have contributed fundamental understanding of genome-wide patterns of DNAm in human placental and fetal tissues. Using the first versions of the Illumina DNAm arrays (GoldenGate, 1,536 targets; 27k, 27,578 targets), Dr Yuen demonstrated tissue-specific differentially methylated regions in fetal tissues, changes in DNAm patterns of chorionic villi over gestation, and a high degree of inter-individual variability in DNAm of the human placenta. Expanding on these findings, I aim to further elucidate technical and biological factors important for assessing DNAm in pregnancy-associated tissues.  The objective of this dissertation is to develop principles for analyzing genome-wide DNAm in human fetal and placental tissues, with the goal of improving the collection and analysis of developmental epigenetic data. To this end, I undertook the following studies: (i) A comparison of the performance, variability and relationship of five measures of genome-wide DNAm, in a set of human fetal tissues and placental chorionic villi.
(ii) A detailed annotation and evaluation of the location and quality of 450k DNAm array targets, using healthy placental chorionic villi, child buccal and adult blood samples.  (iii) An investigation of batch effects in a pilot 450k DNAm array study, examining the effect of MTHFR genotype on DNAm in control placental chorionic villi. (iv) The application of the tools, procedures and approaches developed above to study DNAm in five tissues ascertained from cases of neural tube defects.  These studies culminated in the development of a workflow for the analysis of genome-wide DNAm array data, with specific recommendations for pregnancy-associated tissues.  Chapter 2: Different measures of genome-wide DNAm exhibit unique properties in placental and somatic tissues  A version of this chapter has been published (see Preface for contribution details):   Price EM, Cotton AM, Peñaherrera MS, McFadden DE, Kobor MS, Robinson W. Different measures of "genome-wide" DNA methylation exhibit unique properties in placental and somatic tissues. Epigenetics. 2012 Jun 1;7(6):652-63.  2.1 Introduction Multiple techniques are available to measure genome-wide DNAm in human tissues, each with its own set of advantages and drawbacks. The gold standard for global DNAm, high performance liquid chromatography (HPLC), provides a measure of total 5-mC content in the genome [117]. 5-mC ELISA kits have been developed as less expensive and less labour-intensive alternatives [118]. Although comprehensive, global techniques lack resolution: they mask the biologically important distribution of DNAm throughout the genome. Genome-wide approaches, such as array-based methylated DNA immunoprecipitation (MeDIP) and the Illumina Infinium platforms, sacrifice genomic coverage for higher CpG resolution.
Recently, sequencing techniques have been applied to assess DNAm on both the global (whole genome bisulfite sequencing, WGBS) and genome-wide scale (reduced representation bisulfite sequencing, RRBS). The sensitivity of these techniques depends on the depth of coverage; thus, the current cost of sequencing often limits discrimination to differences in DNAm greater than ten percentage points [119]. DNAm of repetitive element (RE) sequences has been used as a surrogate measure of global DNAm [120], especially in studies of the effects of environmental exposures on disease [121-124]. Its reported value is threefold: (i) the sequences are dispersed, so their DNAm should represent trends throughout the genome; (ii) the sequences are numerous, so many copies can be assessed in tandem, at low cost; and (iii) DNAm at such sequences may be sensitive to environmental exposures. The REs LINE1 (L1) and Alu account for close to 30% of all base pairs (bps) in the human genome [125], and about 12% and 25% of CpG dinucleotides fall within LINEs and Alus, respectively [126, 127]. These REs amplify by a copy-and-paste mechanism that reverse transcribes the repeat sequence into a new location [125]. Alu and L1 transposable elements integrated into the ancestral genome more than 80 and 150 million years ago, respectively, and over time lineages have diverged significantly from the original DNA sequences [21]. Phylogenetic analyses divide the existing 500,000 copies of L1 and 1,000,000 copies of Alu [21] into three large subfamilies each, based on evolutionary age [128-130]. It is hypothesized that the host genome methylates RE DNA as a defense mechanism to limit their potentially detrimental transcription [30, 131]; however, DNAm patterns have been shown to differ based on the evolutionary age of the RE examined [132]. It has been suggested that REs may be sensitive to changes in DNAm with exposure to environmental toxins.
In mice, feeding mothers the xenoestrogen bisphenol A (BPA) during pregnancy resulted in hypomethylation of the RE up-stream of the agouti gene (Avy allele) and a coordinated change in coat colour in offspring [133]. Furthermore, supplying the mother with a methyl donor-rich diet partially counteracted the BPA-induced hypomethylation [133]. In humans, several studies have reported small reductions in L1 and/or Alu DNAm in adults exposed to benzene [134], particulate air pollution [123, 135], DDT [122] and PCBs [122].  Though measuring Alu and L1 DNAm is an attractive method to rapidly and economically assess many CpGs [120], there is little direct evidence supporting RE DNAm as a surrogate for genome-wide DNAm [136, 137] or for DNAm of other groups of genomic sequences. The self-replication of REs, in combination with their evolutionary age and parasitic relationship with the human genome, makes L1 and Alu DNAm particularly interesting, yet difficult to study. To properly design experiments, accurately report data and make comparisons between studies, we need to understand the targets of DNAm assays, the conditions that influence assessment and how techniques relate to each other.  In this chapter, DNAm of a set of Representative Dispersed Sequences (ReDS), namely subsets of L1, Alu, strong islands, weak islands and non-islands (see Section 2.2.3), is examined across multiple human tissue types. RE DNAm was assessed by pyrosequencing of consensus sequences, in addition to examining targets of the Illumina Infinium HumanMethylation27 BeadChip (27k array) that map to REs. DNAm of three categories of CpG islands was also examined using 27k array targets: (i) strong islands; (ii) weak islands; and (iii) non-islands [30]. Finally, an assessment of total 5-mC was attempted using an ELISA kit for global DNAm.
Figure 2.1 outlines the analyses conducted in this chapter to assess the degree to which DNAm at these ReDS is interrelated, as well as how each is affected by factors that may vary within and between studies. The results of this chapter highlight that each of the ReDS is a unique measure of genome-wide DNAm, affected in different ways by technical and biological variables. They further suggest that multiple measures of DNAm are useful to obtain a more complete assessment of genome-wide DNAm.    Figure 2.1 Schematic of analyses performed in Chapter 2.  [Figure 2.1: biological comparisons (tissue-to-tissue, gestational age, within-individual) and technical comparisons (assays for dispersed DNAm; repetitive element DNAm) across 1st trimester (n=10), 2nd trimester (n=11) and term (n=10) villi, 2nd trimester brain (n=8), kidney (n=11) and muscle (n=10), and adult blood (n=10).]  2.2 Materials and methods  2.2.1 Sample collection  Samples were collected through the B.C. Children’s Hospital and B.C. Women’s Hospital & Health Centre pathology laboratory as previously described [51, 60]. Ethics approvals were obtained from the University of British Columbia/Children’s Hospital and Women’s Health Centre of British Columbia Research Ethics Board (UBC C&W REB H04-70488, H06-70085). 1st trimester placental chorionic villi (male n=8, female n=2) were obtained anonymously from chromosomally normal elective terminations between 8 and 12 weeks gestation. 2nd trimester placental chorionic villi (male n=5, female n=6) and fetal tissues (kidney, male n=5, female n=6; brain, male n=4, female n=4; muscle, male n=4, female n=6) were obtained from de-identified, chromosomally normal pregnancies terminated for medical reasons, such as premature rupture of membranes or placental abruption, between 17 and 24 weeks gestation. Term placental chorionic villi (38-41 weeks gestation, male n=5, female n=5) and adult female blood (n=10) were obtained from previous studies of DNAm in the placenta [60]. After collection, a 1 cm³ piece was sampled from three or four sites on the fetal side of each placenta to avoid maternal decidual contamination. The amniotic membrane and chorion were removed, and DNA was extracted from one quarter of the piece of chorionic villi. A similar-sized piece of each fetal tissue was also taken for DNA extraction. The remainder of each sample was stored at -80 °C for future use. DNA was extracted from all samples by the standard salting-out method [138] and stored at -20 °C. 2.2.2 Pyrosequencing  DNAm of L1 and Alu elements was measured by pyrosequencing. Genomic DNA (300 ng) was bisulfite converted using the EZ DNA Methylation-Gold Kit (Zymo Research, Orange, CA, USA).
L1 and Alu elements were amplified using published primer sets designed to complement the L1H and AluSx consensus sequences, with cycling conditions as previously published [134]. PCR products were sequenced on a PyroMark MD system (Biotage, Uppsala, Sweden). The DNAm status of each CpG dinucleotide (L1 n=4 CpGs, Alu n=3 CpGs) was evaluated using PyroQ-CpG software (Biotage), and the average across all CpG sites was calculated as a percentage for each sample. Correlations between independent bisulfite conversions were r=0.99 and r=0.51 for the L1 and Alu assays, respectively. The lower Alu correlation was partly due to the small range of DNAm values measured by the Alu assay: only 1.6 percentage points, compared with nearly 50 for the L1 assay.   2.2.3 Illumina Infinium HumanMethylation27 BeadChip array (27k array)   All samples were run on the 27k array. Data for chorionic villi, fetal tissues and blood samples were previously published in an analysis of gestational age changes in DNAm [139] and for identification of novel differentially methylated regions between fetal tissues and blood [60]. Briefly, genomic DNA was bisulfite converted using the EZ DNA Methylation Kit (Zymo Research, Orange, CA, USA), digested and then hybridized to the 27k array (Illumina, San Diego, CA, USA). The 27k array is a robust assay, and technical replicates were highly correlated (r=0.99). Using GenomeStudio 2008 1.0.5 software (Illumina, San Diego, CA, USA), sex chromosome probes (n=1,092) were removed to reduce sex bias. Poor-quality probes (n=32), defined as probes with a detection p-value >0.05 in >10% of samples, were removed. Additionally, we removed probes that were polymorphic at the target CpG (n=263) [140]. Signal intensity output from GenomeStudio was read into R [141] with the Bioconductor methylumi package [142] to calculate an M value, the log2 ratio of the intensities of the methylated and unmethylated probes.
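A minimal Python sketch of the probe-level steps in this subsection follows. It drops probes whose detection p-value exceeds 0.05 in more than 10% of samples, computes an M value as the log2 ratio of methylated to unmethylated intensity, and maps M back to a percentage β value via β = 2^M / (2^M + 1), the relation behind lumi's m2beta conversion used in the next step. The function names, the dictionary input and the optional `offset` pseudo-count are illustrative assumptions, not the GenomeStudio, methylumi or lumi code.

```python
import math


def passing_probes(detection_p, p_cutoff=0.05, max_fail_fraction=0.10):
    """Return probe IDs that pass the detection p-value filter.

    detection_p maps a probe ID to its detection p-values, one per sample.
    A probe fails when more than max_fail_fraction of its samples have a
    detection p-value above p_cutoff.
    """
    kept = []
    for probe_id, pvals in detection_p.items():
        n_failed = sum(1 for p in pvals if p > p_cutoff)
        if n_failed / len(pvals) <= max_fail_fraction:
            kept.append(probe_id)
    return kept


def m_value(meth, unmeth, offset=0.0):
    """log2 ratio of methylated to unmethylated intensity.

    offset is a hypothetical pseudo-count to avoid division by zero; the
    text does not state whether methylumi applies one.
    """
    return math.log2((meth + offset) / (unmeth + offset))


def m_to_beta_percent(m):
    """Invert M = log2(beta / (1 - beta)) and scale beta to 0-100."""
    ratio = 2.0 ** m
    return 100.0 * ratio / (1.0 + ratio)


if __name__ == "__main__":
    # Hypothetical detection p-values for 10 samples.
    detection_p = {
        "probe_a": [0.01] * 10,              # 0/10 failures: kept
        "probe_b": [0.01] * 9 + [0.20],      # 1/10 failures (10%): kept
        "probe_c": [0.01] * 8 + [0.20] * 2,  # 2/10 failures (20%): removed
    }
    print(passing_probes(detection_p))
    m = m_value(3000.0, 1000.0)
    print(round(m, 3), round(m_to_beta_percent(m), 1))
```

With `offset=0`, the two functions compose to β = meth / (meth + unmeth) × 100, so M and β carry the same information on different scales.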
The lumi Bioconductor package was used to correct colour biases from chip to chip, and the M value for each probe was then converted to a β value (m2beta function) and multiplied by 100 to ease data interpretation [71, 72].  CpGIE 2.0 [143] was used to annotate the genome with the location, size and density of CpG islands based on Weber’s definitions [30]. This annotation was intersected with the genomic locations of 27k array probes in Galaxy [144-146] to group probes into three categories based on the density of the CpG island in which they fell. Strong island probes (n=14,391) were defined as those that mapped to CpG islands with >0.75 obs/exp CpG ratio, >55% GC content and >500 bps in length. Weak island probes (n=2,786) were defined as those that mapped to CpG islands with >0.48 obs/exp CpG ratio, >50% GC content and >200 bps in length. Probes in weak islands that bordered strong islands (n=2,229) were not included in either island group, since they may act like CpG island shores and thus confound analyses. Non-island probes (n=5,805) were all remaining probes¹. Probes that mapped to Alu (n=623) and L1 (n=222) were removed from the CpG island groups and used for evolutionary age comparison of RE DNAm. Probes were binned by distance to the nearest TSS using the distances provided in the Illumina 27k array annotation file. Probes mapping to strong, weak and non-islands were initially grouped into six non-overlapping bins: (1) -1500 to -1001 bps, (2) -1000 to -501 bps, (3) -500 to 0 bps, (4) 1 to 500 bps, (5) 501 to 1000 bps and (6) 1001 to 1500 bps. Average DNAm was calculated for probes in each of the three CpG island groups within each of the six TSS bins. Error bars in Figure 2.4 give the standard deviation of this average between somatic tissue samples. Given that we found no difference in DNAm based on direction to TSS, probes were not separated into up-stream/down-stream in later TSS analyses.
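The island categories can be sketched as threshold rules. The sketch below assumes each probe has already been matched to the CpG island it falls in, summarized as (obs/exp CpG ratio, GC fraction, length), or `None` if it lies outside any called island; it does not reproduce CpGIE's island-calling algorithm or the Galaxy intersection, and all names are hypothetical. `island_stats` uses the common obs/exp definition: CpG count × length / (C count × G count).

```python
def island_stats(seq):
    """Return (obs/exp CpG ratio, GC fraction, length) for a DNA sequence."""
    seq = seq.upper()
    length = len(seq)
    n_c, n_g = seq.count("C"), seq.count("G")
    n_cpg = seq.count("CG")  # CpG sites cannot overlap, so this count is exact
    gc_fraction = (n_c + n_g) / length
    obs_exp = n_cpg * length / (n_c * n_g) if n_c and n_g else 0.0
    return obs_exp, gc_fraction, length


def classify_probe(island, borders_strong_island=False):
    """Assign a probe to a CpG island density category.

    island is (obs_exp, gc_fraction, length) for the island the probe falls
    in, or None when the probe lies outside any called island. Weak-island
    probes bordering a strong island return None, mirroring their exclusion
    from both island groups.
    """
    if island is None:
        return "non-island"
    obs_exp, gc_fraction, length = island
    if obs_exp > 0.75 and gc_fraction > 0.55 and length > 500:
        return "strong island"
    if obs_exp > 0.48 and gc_fraction > 0.50 and length > 200:
        return None if borders_strong_island else "weak island"
    return "non-island"


if __name__ == "__main__":
    print(island_stats("CG" * 300))
    print(classify_probe((0.8, 0.6, 600)))
    print(classify_probe((0.6, 0.52, 300), borders_strong_island=True))
```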
¹ In subsequent chapters, the names of these probe categories changed from strong island, weak island and non-island to, respectively, high density CpG island (HC), intermediate density CpG island (IC) and low CpG density or non-island (LC), though the same criteria were used to define them [30].  2.2.4 MethylFlash global DNAm kit  Total 5-mC content of each sample was measured using the MethylFlash Methylated DNA Quantification Kit (Colorimetric) (Epigentek Group Inc., New York, NY, USA) following the manufacturer’s protocol. The absorbance (optical density) of each well was measured using a microplate spectrophotometer.  2.2.5 Statistical analyses Because of the small sample sizes and the bounded range of DNAm values, normality was not assumed for methylation data, and non-parametric statistical tests were used. DNAm was compared between tissue groups, gestational ages and RE evolutionary ages using the Kruskal-Wallis test (a non-parametric one-way ANOVA) followed by Dunn’s multiple comparison test. DNAm is reported as the average ± standard deviation (SD) for each tissue group. Spearman’s rank-order correlation (non-parametric) was used to compare DNAm of different ReDS within each tissue group and within an individual, followed by Benjamini-Hochberg correction for multiple comparisons. A chi-squared test was used to compare the observed vs. expected number of probes in each of the TSS bins. Statistics were performed in GraphPad Prism 5 (GraphPad Software, Inc.). Graphs were produced in R and GraphPad Prism 5. In box plots, boxes represent the interquartile range (IQR), whiskers extend to the most extreme data point within 1.5 × IQR of the box, bars mark the median and dots mark outliers. An estimate of the representation of all Alu and L1 sequences was obtained by dividing the number of probes on the array by the total number of sequences in each subfamily.
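For reference, the Kruskal-Wallis H statistic behind the tissue, gestational-age and RE-age comparisons can be computed from pooled ranks as in the sketch below. The analyses here were run in GraphPad Prism, so this is an illustration rather than the code used. A p-value would come from a chi-squared distribution with k - 1 degrees of freedom (for example `scipy.stats.chi2.sf`, or `scipy.stats.kruskal` directly); Dunn's post hoc test is not shown, and a data set in which every value is tied is not handled.

```python
from collections import Counter


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based ranks start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    pooled = [x for group in groups for x in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_term = 0.0
    offset = 0
    for group in groups:
        group_ranks = ranks[offset:offset + len(group)]
        rank_term += sum(group_ranks) ** 2 / len(group)
        offset += len(group)
    h = 12.0 / (n * (n + 1)) * rank_term - 3.0 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    tie_sum = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1.0 - tie_sum / (n ** 3 - n))


if __name__ == "__main__":
    # Hypothetical % DNAm values for three tissue groups.
    print(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```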
The total number of Alu sequences was estimated from supporting data in Price et al., 2004 [147], and the total number of L1 sequences was estimated from counts in the UCSC RepeatMasker track (hg18). 2.3 Results 2.3.1 Total 5-mC as measured by ELISA was not reproducible To obtain a measure of global DNAm representing the average methylation at all CpGs in the genome, each sample was assessed using the MethylFlash ELISA kit. A standard curve, created by serial dilution of a manufacturer-provided positive control (polynucleotide containing 50% 5-mC) together with a negative control (polynucleotide containing 50% unmethylated cytosine), was run in duplicate to assess within-plate variability. Correlation of the optical density (OD) measured for the four most linear points in the standard curve (including 0) was r=0.71 (p=0.14), and the average coefficient of variation was 42.9% (Supplementary Table 2.1). In a set of 21 study samples run in duplicate, the correlation was 0.39 (p=0.07) and the average coefficient of variation was 22.7%. ELISA protocols from Abcam (Cambridge, UK) and Salimetrics (Pennsylvania, USA) suggest a threshold coefficient of variation of <20%, though no criteria were provided with the MethylFlash kit. As the global DNAm data did not meet these standard criteria, they were not included in subsequent analyses. 2.3.2 Evolutionary age and assay method affect the assessment of L1 and Alu DNAm Characteristics that may bias the assessment of RE DNAm include evolutionary age [127, 132], genomic location [127, 148] and assay method [120]. DNAm of L1 and Alu sequences assessed by pyrosequencing was compared to subsets of 27k array probes that we identified as mapping to REs. Consensus sequences must be used to design L1 and Alu pyrosequencing primers, and thus multiple primer sets are available for each family.
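The ELISA reproducibility check in Section 2.3.1 can be sketched as follows. The text does not say how the average coefficient of variation was computed, so this assumes a per-pair CV (sample standard deviation divided by the mean of the duplicate readings) averaged across pairs and compared with the 20% threshold; the function names are hypothetical.

```python
import statistics


def duplicate_cv_percent(first, second):
    """Coefficient of variation (%) of one duplicate pair: sample SD / mean."""
    pair = [first, second]
    return 100.0 * statistics.stdev(pair) / statistics.mean(pair)


def average_cv_percent(pairs):
    """Mean of the per-pair CVs across all duplicated wells."""
    return statistics.mean(duplicate_cv_percent(a, b) for a, b in pairs)


def meets_cv_threshold(pairs, threshold_percent=20.0):
    """True when the average duplicate CV is below the threshold."""
    return average_cv_percent(pairs) < threshold_percent


if __name__ == "__main__":
    # Hypothetical duplicate OD readings for three wells.
    ods = [(0.90, 1.10), (1.00, 1.00), (0.50, 1.00)]
    print(round(average_cv_percent(ods), 1), meets_cv_threshold(ods))
```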
We used primer sets that measure four CpGs in the 5’ CpG island promoter of the L1H consensus sequence and three CpGs in the body of the AluSx consensus sequence, since these primer sets have been widely used in recent years [120, 122, 123, 134, 135, 149]. While there are about 130,000 copies of AluSx in the human genome [130], there are only about 1,200 copies of L1H [150], and up to 70% of these are expected to be 5’ truncated [151]. Thus, the L1 and Alu pyrosequencing assays sample a small portion of each subfamily and are only representative of these specific dispersed sequences. RE 27k array probes cover a variety of evolutionary age groups, but due to the design of the array, the population we could examine was biased towards REs proximal to gene promoters. A variable number of 27k array probes mapped to each of the Alu and L1 subfamilies: old Alu (AluJ; n=153), intermediate Alu (AluS; n=392), young Alu (AluY; n=78), old L1 (L1M; n=192), intermediate L1 (L1P; n=26) and young L1 (L1H; n=4). These probes represent approximately 0.52% of all AluJ, 0.11% of all AluS, 0.08% of all AluY, 0.23% of all L1H, 0.01% of all L1P and 0.03% of all L1M copies. Analyses of element age were conducted in two tissue groups: 1) merged chorionic villi (all three trimesters); and 2) merged somatic tissues (fetal brain, fetal kidney, fetal muscle and adult blood), since DNAm of each subfamily of Alu or L1 did not differ between the tissues included in each group (data not shown). There was a significant trend for hypomethylation of older L1 and Alu subfamilies compared with intermediate and young ones in both the merged chorionic villi and merged somatic tissue groups (Figure 2.2A, B). We next compared the level of RE DNAm assessed by pyrosequencing to that assessed by the 27k array. After correction for multiple comparisons, mean L1 DNAm by pyrosequencing was not significantly different from L1H DNAm assessed by the 27k array in the merged chorionic villi group (Figure 2.2B).
In all other comparisons, DNAm differed significantly between L1 or Alu pyrosequencing and the 27k array in both tissue groups (Figure 2.2A, B; all p<0.001). L1 DNAm assessed by pyrosequencing was correlated with 27k array results in the merged somatic tissue group (L1M r=0.55; L1P r=0.58; both p<0.0001), but Alu DNAm was not correlated between methods in either tissue group. Thus, we suggest that measured RE DNAm depends on both evolutionary age and assay method.    Figure 2.2 The assessment of Alu and L1 DNAm is affected by evolutionary age and assay type.  DNAm of (A) Alu and (B) L1 was assessed in two groups, merged chorionic villi (n=31; 1st trimester, 2nd trimester and term) and merged somatic tissues (n=39; brain, kidney, muscle, blood), using pyrosequencing and sites assessed by the 27k array. 27k array probes that mapped to REs were divided into three age groups based on evolutionary emergence: old Alu (AluJ; n=153), intermediate Alu (AluS; n=392), young Alu (AluY; n=78) and old L1 (L1M; n=192), intermediate L1 (L1P; n=26), young L1 (L1H; n=4). There was a trend for increased DNAm from old to young Alu and L1 measured by the 27k array. Alu and L1 DNAm as assessed by pyrosequencing was significantly different from each age group measured by the 27k array, except when comparing L1 by pyrosequencing to young L1 in the merged chorionic villi group. Significance is indicated by *p<0.05, **p<0.001, ***p<0.0001 after correction for multiple comparisons.
[Figure 2.2 panels: (A) Alu, (B) L1; y-axis: % DNA methylation (0-100); groups: pyrosequencing and old, intermediate and young 27k array probes, in merged chorionic villi and merged somatic tissues.]  2.3.3 Evidence for association of DNAm in weak islands with L1 and non-islands  Correlation analyses were carried out to determine how L1 and Alu DNAm assessed by pyrosequencing compare with each other and with DNAm of other ReDS. Using all array data, samples clustered most strongly by placental vs. somatic origin (Figure 2.3) and then by tissue type; thus, between-method comparisons were conducted within individual somatic tissues (fetal brain, kidney and muscle and adult blood) and then collectively in the merged somatic tissue group. Gestational age was previously found to have a significant effect on DNAm in chorionic villi [139, 152]; therefore, between-method comparisons in chorionic villi were carried out within each gestational age group.    Figure 2.3 Unsupervised hierarchical clustering of 27k array data separated samples by tissue of origin. Prior to analysis of the 27k array data, sex chromosome probes, poor-quality probes, probes that mapped to Alu or L1 elements and CpG polymorphic probes were removed. Thus, the DNAm (average β values) of 25,346 probes was used to cluster samples based on the degree to which the samples were correlated, with distance defined as 1 - r (GenomeStudio, Illumina). Two large groups emerged: chorionic villi (1st trimester (n=10), 2nd trimester (n=11) and term (n=10)) and somatic tissues (blood (n=10), kidney (n=11), muscle (n=10) and brain (n=8)).   RE DNAm was compared to three groups of 27k array probes categorized by the CpG island density of their hybridization location: strong, weak and non-islands. Supplementary Table 2.2 summarizes average DNAm by tissue group for each of the five ReDS.
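Clustering on the distance 1 - r, as used for Figure 2.3, can be illustrated with a small average-linkage sketch. GenomeStudio's linkage rule is not stated here, so average linkage is an assumption; for 25,346 probes one would normally use a library routine such as `scipy.cluster.hierarchy.linkage` or R's `hclust`. The sample names and values below are hypothetical, and constant profiles (zero variance, undefined r) are not handled.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation between two equal-length DNAm profiles."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_linkage_clusters(profiles, n_clusters):
    """Agglomerate samples with distance 1 - r until n_clusters remain.

    profiles maps a sample name to its list of beta values (same probe order).
    """
    names = list(profiles)
    distance = {}
    for a, b in combinations(names, 2):
        d = 1.0 - pearson_r(profiles[a], profiles[b])
        distance[(a, b)] = distance[(b, a)] = d

    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Average linkage: mean pairwise distance between members.
            pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
            avg = sum(distance[p] for p in pairs) / len(pairs)
            if best is None or avg < best[0]:
                best = (avg, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


if __name__ == "__main__":
    profiles = {
        "villi_1": [10, 20, 30, 40], "villi_2": [11, 19, 31, 42],
        "blood_1": [40, 30, 20, 10], "blood_2": [41, 29, 22, 9],
    }
    print(average_linkage_clusters(profiles, 2))
```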
There were no significant differences in DNAm by sex for any tissue or ReDS; thus, male and female tissues were considered together in all analyses (Supplementary Figure 2.1). L1 and weak island DNAm were correlated in 2nd trimester chorionic villi (r=0.68, p=0.025) and in brain (r=0.88, p=0.007); however, these correlations did not withstand correction for multiple comparisons (Table 2.1). Weak island and non-island DNAm were significantly correlated after correction for multiple comparisons (Table 2.1) in 1st trimester chorionic villi (r=0.94, p<0.001) and 2nd trimester chorionic villi (r=0.80, p=0.005). In the merged somatic tissue group, weak island DNAm was correlated with L1 DNAm (r=0.48, p=0.002) as well as with non-island DNAm (r=0.91, p<0.0001). L1 and Alu DNAm were correlated in 1st trimester chorionic villi (r=0.64, p=0.054) and in brain (r=0.74, p=0.046), but neither correlation was significant after correction for multiple comparisons (Table 2.1). Details of correlation analyses within each tissue can be found in Supplementary Figure 2.2 through Supplementary Figure 2.9. Since correlations between weak island and non-island DNAm, and between weak island and L1 DNAm, were present in several individual tissues and in the merged somatic tissue group, the association within these two pairs of ReDS may reflect more than inter-individual variation. However, the inconsistent correlation between DNAm of the other pairs of ReDS suggests that each measure targets genomic sequences with different trends in DNAm.
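The correlation-then-correction procedure behind Table 2.1 can be sketched in plain Python: Spearman's r is the Pearson correlation of tie-averaged ranks, and the Benjamini-Hochberg step scales each sorted p-value by m/rank and enforces monotonicity from the largest down. This is an illustration only (the analyses used GraphPad Prism). Spearman p-values are not computed here; they would come from, for example, `scipy.stats.spearmanr`, while R's `p.adjust(method = "BH")` provides the same adjustment as `benjamini_hochberg` below.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_r(x, y):
    """Spearman's rho: Pearson correlation of tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never increase.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical per-sample DNAm for two ReDS, then hypothetical p-values.
    print(round(spearman_r([1, 2, 3, 4, 5], [5, 6, 7, 8, 7]), 3))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```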
Table 2.1 Spearman correlation of DNAm at five ReDS

| ReDS pair | Chorionic villi, 1st trimester | Chorionic villi, 2nd trimester | Chorionic villi, term | Brain | Kidney | Muscle | Blood | Merged somatic tissue |
|---|---|---|---|---|---|---|---|---|
| Alu vs. L1 | 0.64* | 0.29 | 0.62 | 0.74* | -0.09 | 0.26 | -0.03 | 0.20 |
| Alu vs. strong islands | -0.66* | -0.04 | -0.49 | 0.21 | -0.14 | 0.60 | -0.27 | 0.17 |
| Alu vs. weak islands | 0.79*† | 0.54 | -0.14 | 0.43 | 0.06 | 0.48 | -0.47 | 0.14 |
| Alu vs. non-islands | 0.76*† | 0.45 | 0.53 | 0.05 | 0.13 | 0.25 | -0.13 | 0.13 |
| L1 vs. strong islands | -0.16 | 0.37 | -0.58 | 0.43 | -0.35 | -0.12 | 0.49 | 0.48*† |
| L1 vs. weak islands | 0.54 | 0.68* | -0.39 | 0.88* | -0.21 | -0.26 | 0.25 | 0.48*† |
| L1 vs. non-islands | 0.50 | 0.37 | 0.13 | 0.07 | 0.35 | 0.31 | -0.18 | 0.50*† |
| Strong islands vs. weak islands | -0.50 | 0.10 | 0.66* | 0.21 | 0.42 | 0.50 | 0.53 | 0.78*† |
| Strong islands vs. non-islands | -0.50 | -0.42 | -0.16 | -0.79* | -0.52 | -0.45 | -0.48 | 0.56*† |
| Weak islands vs. non-islands | 0.94*† | 0.80*† | 0.44 | 0.33 | 0.32 | -0.13 | 0.38 | 0.91*† |

Spearman correlation values (r) are stated; * indicates correlations with p≤0.05; † indicates correlations that remain significant (p≤0.05) after Benjamini-Hochberg correction for multiple comparisons.

2.3.4 Dispersed DNAm assays each produce a distinct tissue-specific DNAm profile The five ReDS examined here are functionally different regions of the genome, and thus tissue-specific trends in their DNAm may also be distinct. The 2nd trimester tissues (chorionic villi, brain, kidney and muscle) and adult blood were used to compare tissue-specific DNAm at each of the ReDS (Figure 2.4). After correction for multiple comparisons, there were no significant tissue-to-tissue differences in average Alu DNAm (Figure 2.4A). However, average L1 DNAm in chorionic villi (62.52% ±4.60) was significantly lower than in somatic tissues (Figure 2.4B; brain 84.89% ±3.32; muscle 82.82% ±2.44 and blood 85.79% ±1.92; all p<0.001).
Reduced average non-island DNAm was also observed in chorionic villi (49.60% ±1.20) compared to somatic tissues (Figure 2.4E; brain 60.89% ±1.48; kidney 57.91% ±0.93 and blood 63.64% ±0.58; all p<0.001), as was reduced weak island DNAm (Figure 2.4D; chorionic villi 37.44% ±0.65; brain 41.31% ±0.49; kidney 38.83% ±0.47; blood 43.53% ±0.36; all p<0.001). Interestingly, strong island DNAm in chorionic villi (11.21% ±0.49) was greater than in kidney and muscle (Figure 2.4C; muscle 9.83% ±0.32; kidney 9.80% ±0.47; all p<0.0001), although these small differences in DNAm may not be biologically significant. The similarity in the patterns of tissue-specific DNAm at L1, weak islands and non-islands further suggests that these three ReDS follow similar trends in DNAm.

Figure 2.4 Five ReDS exhibit different tissue patterns of DNAm.
DNAm in chorionic villi (n=11), brain (n=8), kidney (n=11) and muscle (n=10) from 2nd trimester fetuses and adult blood (n=10) was measured by assessing (A) Alu, (B) L1, (C) strong islands, (D) weak islands and (E) non-islands. DNAm at L1, weak islands and non-islands was the most variable between tissues. Chorionic villi DNAm was significantly reduced compared with most other somatic tissues at L1, weak islands and non-islands. However, chorionic villi DNAm was significantly increased compared to kidney and muscle at strong islands. Significance is indicated by *p<0.05, **p<0.001, ***p<0.0001.
[Figure 2.4 plots: % DNA methylation (y axis) in chorionic villi, brain, kidney, muscle and blood (x axis); panel titles: Alu, L1, Total 5mC, strong islands, weak islands, non-islands]

2.3.5 Distance to transcription start site is associated with promoter CpG island density and distinct trends in DNAm
Since 98% of the 27k array probes examined in strong, weak and non-islands were within 1,500 bps of a transcription start site (TSS), they were considered to be located in gene promoters [69]. We investigated whether probe distance to a TSS within each of the CpG island groups affected average DNAm. We present here the data for the merged chorionic villi; merged somatic tissues yielded similar results. Probes were binned into six 500 bp windows based on distance to the nearest TSS: (1) -1500 to -1001 bps (n=575), (2) -1000 to -501 bps (n=1,989), (3) -500 to 0 bps (n=8,628), (4) 1 to 500 bps (n=8,707), (5) 501 to 1000 bps (n=2,036) and (6) 1001 to 1500 bps (n=558) (Figure 2.5A). DNAm was significantly different between strong, weak, and non-islands within each bin (p<0.0001), but was not dependent on probe direction to TSS (up- vs. down-stream, p=0.73); thus, direction to TSS was not considered in further analyses. We next assessed tissue-specific patterns of DNAm in 2nd trimester tissues and adult blood for each CpG island density within each of three TSS bins: (1) ± 0 to 500 bps, (2) ± 501 to 1000 bps and (3) ± 1001 to 1500 bps. The ranking of tissues by DNAm was the same across the three TSS bins within the strong, weak, and non-island groups (data shown for strong islands in Figure 2.5B-D).
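The TSS binning above, and the observed-vs-expected comparison of probe counts per bin reported with Supplementary Figure 2.10, can be sketched as below. This is a minimal illustration under stated assumptions: distances are integers in bp, negative upstream of the TSS; probes beyond ±1,500 bp are excluded; and expected counts are taken from the row and column totals of the CpG-class-by-bin table, as in a chi-square test of independence. The function names are hypothetical.

```python
"""Sketch: TSS distance bins and observed/expected probe counts.

Assumptions (hypothetical): integer distances in bp, negative upstream;
expected counts = row total * column total / grand total.
"""
from collections import Counter

SIGNED_LABELS = {1: "-1500 to -1001", 2: "-1000 to -501", 3: "-500 to 0",
                 4: "1 to 500", 5: "501 to 1000", 6: "1001 to 1500"}
ABS_LABELS = {1: "±0 to 500", 2: "±501 to 1000", 3: "±1001 to 1500"}


def signed_bin(distance):
    """Return window 1-6 for a signed distance to the nearest TSS, or None."""
    if abs(distance) > 1500:
        return None
    if distance <= 0:
        # 0..-500 -> 3, -501..-1000 -> 2, -1001..-1500 -> 1
        step = max(1, (-distance + 499) // 500)
        return 4 - step
    # 1..500 -> 4, 501..1000 -> 5, 1001..1500 -> 6
    return 3 + (distance + 499) // 500


def abs_bin(distance):
    """Return bin 1 (±0-500), 2 (±501-1000) or 3 (±1001-1500), or None."""
    if abs(distance) > 1500:
        return None
    return max(1, (abs(distance) + 499) // 500)


def observed_expected(probes):
    """Return {(cpg_class, abs_bin): observed / expected}.

    probes: iterable of (cpg_class, signed_distance) pairs.
    """
    observed = Counter()
    for cpg_class, distance in probes:
        b = abs_bin(distance)
        if b is not None:
            observed[(cpg_class, b)] += 1
    grand = sum(observed.values())
    row, col = Counter(), Counter()
    for (cpg_class, b), n in observed.items():
        row[cpg_class] += n
        col[b] += n
    ratios = {}
    for cpg_class in row:
        for b in col:
            expected = row[cpg_class] * col[b] / grand
            ratios[(cpg_class, b)] = observed[(cpg_class, b)] / expected
    return ratios
```

A ratio above 1 marks over-representation of a CpG class in a bin and a ratio below 1 marks under-representation, which is the pattern described for strong island and non-island probes close to and distal to the TSS.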
These tissue rankings were also the same as those observed when probes were not separated by distance to the closest TSS (see Figure 2.4C, D, E). Between-tissue differences in strong islands were largest at CpGs distal to TSS (Figure 2.5D; chorionic villi 16.91%±0.7 vs. kidney 15.00%±0.63, p<0.0001; blood 18.10%±0.74 vs. brain 16.13%±0.64, kidney 15.00%±0.63 and muscle 16.20%±0.42, all p<0.05) and smallest at CpGs close to TSS (Figure 2.5B; chorionic villi 9.63% ±0.67 vs. kidney 8.27%±0.46 and muscle 8.10%±0.32, all p<0.001). To test for an association between distance to the closest TSS and CpG island density, we compared the observed to expected number of array probes in each of the three TSS bins (Supplementary Figure 2.10). There was an over-representation of strong island probes and an under-representation of non-island probes close to TSS (± 0 to 500 bps). Conversely, there was an over-representation of non-island probes and an under-representation of strong island probes distal to TSS (± 1001 to 1500 bps). Taken together, these investigations imply that within promoters, distance to the nearest TSS is linked to CpG island density, which in turn is associated with promoter DNAm.

Figure 2.5 Distance to transcription start site (TSS) influences DNAm in promoters.
98% of 27k array targets were within 1,500 bps of a known TSS. (A) In the merged chorionic villi tissue group, probes were binned into six 500 bp windows around known TSS. There were significant differences in DNAm between strong (●), weak (■) and non (▲) islands (p<0.0001), but direction to TSS had no association with DNAm. DNAm furthest from TSS (± 1001 to 1500 bps; 16.31%±2.26, 59.29%±5.39, 63.82%±3.6 for strong, weak and non-islands respectively) was significantly higher than DNAm close to TSS (± 0 to 500 bps; 8.65%±0.77, 35.60%±2.16, 57.14%±3.33 for strong, weak and non-islands respectively, all p<0.001).
Tissue differences in DNAm of 2nd trimester chorionic villi, muscle, kidney, brain and adult blood were investigated in three TSS bins in strong CpG islands: (B) ± 0 to 500 bps, (C) ± 501 to 1000 bps and (D) ± 1001 to 1500 bps. Significance is indicated by *p<0.05, **p<0.001, ***p<0.0001.

[Figure 2.5 plots: (A) % DNA methylation by distance to closest TSS (-1500 to -1001, -1000 to -501, -500 to 0, 1 to 500, 501 to 1000 and 1001 to 1500 bps); (B-D) % DNA methylation in chorionic villi, brain, kidney, muscle and blood at 0 to 500, 501 to 1000 and 1001 to 1500 bps to TSS]

2.3.6 Preliminary within-individual correlation of dispersed DNAm
After examining tissue-specific patterns of dispersed DNAm, we investigated whether there were within-individual trends in DNAm at each of the five ReDS. The 2nd trimester fetal samples, including chorionic villi, brain, kidney and muscle, were obtained from 12 fetuses and were thus used to investigate within-individual DNAm. DNAm at none of the ReDS was correlated across all four tissues analyzed within an individual (Figure 2.6A-E). However, weak island DNAm (Figure 2.6D) was significantly correlated between chorionic villi and kidney (r=0.70, p=0.03) and between chorionic villi and brain (r=0.89, p=0.01). Additionally, Alu DNAm (Figure 2.6A) was correlated between chorionic villi and kidney (r=0.82, p=0.01). Overall, we observed more positive than negative correlation coefficients (23 vs. 6). In particular, there were more comparisons with r>0.5 than with r<-0.5 (8 vs. 1). These results suggest a general trend for correlation of DNAm between tissues; however, conclusions are limited by the small sample size.

Figure 2.6 Intra-individual correlation of dispersed DNAm.
Within-individual Spearman rank-order correlation matrix of six measures of dispersed DNAm in 2nd trimester brain (n=8), kidney (n=11), muscle (n=10) and chorionic villi (n=11) sampled from 12 individuals. Each box in the top panels should be viewed as an individual scatter plot comparing DNAm assessed in tissue 1 on the x axis vs. tissue 2 on the y axis. Correspondingly, the scales for each tissue plotted are located on the outside of the correlation matrix in line with the tissue label. Each dot in a scatter plot represents the DNAm of one individual measured in two tissues. In the lower panels, each box indicates the Spearman rank-order correlation (r) and the p-value for tissue 1 on the x axis compared to tissue 2 on the y axis. Boxes in bold highlight significant correlations (p≤0.05) between different tissues measured by (A) Alu, (B) L1, (C) strong island, (D) weak island and (E) non-island DNAm.

2.3.7 Increase of weak island and non-island DNAm in chorionic villi through gestation
The Robinson lab previously reported an increase in chorionic villi gene promoter DNAm through gestation using the 27k array data for 1st trimester, 2nd trimester and term [139]. Here, we evaluated gestational age changes in chorionic villi DNAm with the addition of L1 and Alu DNAm, and further subdivision of the 27k array into strong, weak and non-islands (Figure 2.7A-E). After correction for multiple comparisons, there was a significant increase in non-island DNAm from 1st trimester (46.99% ±1.75) to term (52.01% ±0.91) (Figure 2.7E; p<0.0001) and in weak island DNAm from 1st trimester (36.12% ±0.99) to term (38.10% ±0.68) (Figure 2.7D; p<0.0001), while strong island DNAm was not significantly different across gestation (Figure 2.7C; 1st trimester 11.15% ±0.61; 2nd trimester 11.21% ±0.49 and term 11.82% ±0.63).
Mean L1 DNAm did not change through gestation, although its variation decreased (Figure 2.7B; 1st trimester 62.69% ±7.00; 2nd trimester 62.52% ±4.60 and term 62.54% ±2.78). Thus, the gain in DNAm in chorionic villi through gestation occurs at specific genomic locations.

Figure 2.7 Increase in DNAm at Alu, weak islands and non-islands through gestation.
DNAm in 1st trimester (n=10), 2nd trimester (n=11) and term (n=10) chorionic villi was measured by assessing (A) Alu, (B) L1, (C) strong islands, (D) weak islands and (E) non-islands. There was no change in L1 or strong island DNAm, but there was a notable increase in DNAm at weak islands and non-islands throughout gestation. Significance is indicated by *p<0.05, **p<0.001, ***p<0.0001.

[Figure 2.7 plots: % DNA methylation in 1st trimester, 2nd trimester and term chorionic villi; panel titles: Alu, L1, Total 5mC, strong islands, weak islands, non-islands]

2.4 Discussion
In this chapter, genome-wide DNAm was examined by measuring representative members of five sequences: L1, Alu, strong islands, weak islands and non-islands, in placental chorionic villi, fetal somatic tissues and adult blood. The assessment of L1 and Alu DNAm was affected by assay method, evolutionary age composition of REs and tissue type. Additionally, distinct between-tissue patterns of DNAm were observed at each of the ReDS. The subdivision of 27k array targets into strong, weak and non-islands showed that the gestational age-related gain in chorionic villi DNAm observed in other studies [139, 153, 154] predominantly occurred at weak CpG island and non-island regions.
L1 and Alu sequences account for about 17% and 11% of the human genome respectively [21] and only a subset of these are routinely examined. Although many studies use both L1 and Alu as surrogate measures for global DNAm, few have examined whether L1 and Alu DNAm correlate with each other and with total 5-mC levels. A correlation of L1 with Alu DNAm was reported in neuroendocrine tumors, but not in control samples [137]. Given that REs may be sensitive sites in the genome for changes in DNAm, the correlation of L1 and Alu DNAm may be stronger under pathological conditions. This hypothesis is supported by two additional studies in cancer cells [155, 156], as well as the lack of correlation between L1 and Alu DNAm in the control tissues presented here. Wang et al. found no correlation of total 5-mC, measured by Methylamp, with mean L1 DNAm in human nervous tissue [136]. However, a strong association of both L1 and Alu DNAm with total 5-mC content measured by HPLC was identified in a panel of tumours and adjacent tissue [157]. Some of this study-to-study variation may be attributed to the use of different methods for measuring RE DNAm. The comparison of RE DNAm obtained by pyrosequencing vs. array probes suggests that the assessment of L1 and Alu DNAm is assay-dependent and thus may contribute to how DNAm at these ReDS correlates with a genome-wide measure of 5-mC.

Studies examining the genomic distribution of REs have demonstrated an enrichment of L1 in low GC content regions and on the X chromosome [21], in addition to an under-representation near genes, where Alu elements are over-represented [158]. L1s may be sites of de novo DNAm in the developing embryo, from which DNAm silencing is spread [159]. However, this spread into CpG island promoters may be buffered by Alus, which are enriched near TSS associated with CpG islands [148].
We observed lower levels of DNAm in CpG island promoters close to TSS across all tissue types and increased DNAm at CpGs in island promoters further away from TSS. Kang et al. proposed a model of counteracting forces in the spread of DNAm towards promoters that leads to a transitional area of DNAm bordering CpG island promoters [148]; however, they used a definition for CpG islands that falls between the weak and strong CpG islands identified in this study. DNAm was most variable between tissues at L1, non-islands and weak CpG islands, and the pattern of DNAm across tissues was similar at these three ReDS. In addition, positive correlations were noted between L1, weak CpG island and non-island DNAm. The model of DNAm spreading from L1 sequences suggests an explanation for the association observed between L1 and weak CpG island DNAm.

HPLC-based studies have determined that the placenta has the lowest 5-mC content in comparison to other normal somatic tissues [56, 57, 154], and thus, the placenta is often described as “hypomethylated”. Specific regions have been shown to have low DNAm in placental tissue, including Alu [160], L1 and some regions on the X chromosome in females [31]. This lower level of DNAm may contribute to placental-specific gene expression from L1 [161] and Alu [162] elements, which are thought to play a functional role in the placenta, contributing to its invasive and proliferative properties. Syncytin is a classic example of a gene derived from an RE that is expressed exclusively in the placenta throughout gestation [163] and is involved in the fusion of pluripotent cytotrophoblast into differentiated syncytiotrophoblast cells [164]. This chapter confirms previous reports that L1 is hypomethylated in the placenta across gestation relative to other somatic tissues; however, this was not observed for Alu.
We also did not find large differences in the DNAm of chorionic villi compared to somatic tissues at strong islands or close to TSSs, in support of previous findings on the X chromosome [31]. Since L1 elements make up almost 17% of the genome [21], they may bias tissue comparisons of DNAm based on total 5-mC content.

Although REs are widespread elements that can be rapidly and inexpensively assessed, it has been demonstrated here that L1s and Alus are sequences in the genome with distinct trends in DNAm. Despite these findings, it should not be ruled out that under significant environmental changes or pathological conditions, DNAm of different ReDS may be more strongly associated and there may be greater within-individual correlation. Conversely, DNAm of one type of dispersed sequence may be affected by a given condition while others are spared. Thus, DNAm at multiple ReDS provides valuable measures to consider, but DNAm at one ReDS should not be used as a surrogate for trends at other genomic regions. Additionally, DNAm should only be directly compared across studies when sequences are measured by the same technique. The interpretation of DNAm collected via a diverse range of assays is complex, and the true power of these individual measurements and their relationship to each other should be scrutinized.

Chapter 3: Additional annotation enhances potential for biologically-relevant analysis of the Illumina Infinium HumanMethylation450 BeadChip array
A version of this chapter has been published (see Preface for contribution details):
Price EM, Cotton AM, Lam LL, Farré P, Emberly E, Brown CJ, Robinson WP, Kobor MS. Additional annotation enhances potential for biologically-relevant analysis of the Illumina Infinium HumanMethylation450 BeadChip array. Epigenetics Chromatin.
2013 Mar 3; 6(1):4.

3.1 Introduction
Following the publication of the work presented in Chapter 2, Illumina released a larger DNAm microarray, the Infinium HumanMethylation450 BeadChip or 450k array. This new array included >450,000 probes targeting individual CpG sites, was highly robust and cost efficient, and thus rapidly became a popular platform for epigenome-wide association studies (EWAS) [20, 165]. While the 450k array offered the opportunity to examine more of the genome, I and other Illumina users at UBC believed that accurately interpreting data collected using this array would require a more careful analysis and detailed understanding of the targeted CpGs. Therefore, before we applied the 450k array to biological questions, I led a group in documenting technically unreliable probes and enhancing the biological annotation of the 450k array. Illumina made extensive annotation of the CpG sites targeted by 450k probes available to aid users in data interpretation, including, for example, probe location within genes, CpG islands and regulatory features (deposited in the Gene Expression Omnibus (GEO) under accession GPL13534). However, several previously described technical limitations of the Infinium platform were not documented in Illumina’s probe annotation [166, 167]. An evaluation of the earlier 27k array identified two groups of probes as possibly compromised by their design [140]. The first group, accounting for about 6-10% of the 27k array, consisted of non-specific probes, i.e., probes that hybridized to multiple genomic locations in silico. The level of DNAm measured at a non-specific probe likely reflects a combination of DNAm at the various locations to which the probe hybridizes. The second group of unreliable probes comprised those with a polymorphic target (0.24% of the 27k array).
The Illumina Infinium platform (both 27k and 450k) uses quantitative genotyping of bisulfite-introduced C/T SNPs to determine the level of DNAm (Figure 1.2). Thus, probes with polymorphisms at the target C or G may measure a difference in genotype rather than a difference in DNAm. Given the similar technology of the 450k and 27k arrays, an increase in the number of both non-specific and polymorphic probes was expected on the larger 450k array [167].

In this chapter, I present additional annotation for 450k array targets in four areas: (1) documentation of SNPs in the target CpG, (2) documentation of probe binding specificity, (3) classification by CpG density, and (4) classification by gene feature. The expanded annotation was applied and evaluated in a set of healthy samples: adult blood (n=4), child buccal (n=4) and placental chorionic villi (n=4), and followed up in a larger publicly available blood dataset (n=261). Based on these analyses, the utility of the 450k array was enhanced, analysis recommendations were made, and I applied these recommendations to biological questions in Chapters 4 and 5.

3.2 Materials and methods
3.2.1 Annotation
To expand the annotation, we calculated additional probe location information based on the Illumina-provided “MAPINFO” (i.e., the genomic location of the C in the target CpG), including: (1) the interval of the target CpG (CpG); (2) the interval containing the probe but excluding the target CpG (Probe w/o CpG); and (3) the interval of the entire probe (entire probe) (Supplementary Table 3.1). Probe type (Type I vs. Type II) and strand of design (F or R) were taken into consideration when calculating genomic location (Figure 3.1). The calculated locations of ten Type I and ten Type II probes were manually checked against the annotated probe sequence. A UCSC track was created containing the targeted Cs on the 450k array (Additional File 15 in [168]).
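A UCSC custom track of targeted Cs, like the one described above, is essentially a BED file. The sketch below assumes that MAPINFO is a 1-based hg19 coordinate (BED intervals are 0-based and half-open, so the start is MAPINFO - 1) and that each input record is a hypothetical `(probe_id, chromosome, mapinfo)` tuple; it is not the script used to build Additional File 15, and the function name is hypothetical.

```python
"""Sketch: write targeted Cs as a BED custom track for the UCSC browser.

Assumption: MAPINFO is a 1-based coordinate of the target C.
"""


def target_c_bed(records, track_name="450k_target_C"):
    """Return BED text with one 1-bp interval per targeted C."""
    lines = [f'track name="{track_name}" description="450k array target Cs"']
    for probe_id, chrom, mapinfo in records:
        # Hypothetical input may use "16" or "chr16"; UCSC expects "chr16".
        chrom = chrom if chrom.startswith("chr") else f"chr{chrom}"
        start = mapinfo - 1  # 1-based position -> 0-based BED start
        lines.append(f"{chrom}\t{start}\t{mapinfo}\t{probe_id}")
    return "\n".join(lines)
```

The returned text can be saved as a `.bed` file and uploaded as a custom track; widening `start` and `mapinfo` by one base on the appropriate side would give a 2-bp target CpG interval instead.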
All annotation, and all analyses of the expanded annotation, were conducted on 485,512 probes, including both cg (CpG loci) and ch (non-CpG loci) probes but excluding rs (SNP assay) probes, unless otherwise specified.

Figure 3.1 Relative location of probes to target CpG
To complete our analysis, it was necessary to locate the 450k probes (each spanning 50 bases) within the genome. Illumina annotated the hg19 location of each target C (called MAPINFO) and the strand on which the probe was designed; R probes bind to the - strand, while F probes bind to the + strand. With this information, we annotated the start and end coordinates for all probes on the 450k array. Type I vs. Type II probes and F vs. R probes align differently with target CpGs.

3.2.2 SNP annotation
To annotate probes potentially affected by a SNP in the target CpG, the dbSNP131 table from UCSC [169] was imported into Galaxy [144-146]. Only rs numbers for SNPs that were an interval of 1 bp in length and had an alignment quality weight of 1 were included in the annotation. An interval file was uploaded into Galaxy using the hg19 location we annotated for the interval of each probe spanning the C and G of the target CpG, for cg probes only. The probe file was intersected with the dbSNP131 table to create a list of probes with documented SNPs in the C or G of the target CpG. Because some target CpGs were documented with more than one SNP, this file was collapsed in the R software environment to create a list of rs numbers for each probe. The rs numbers for SNPs in the target CpG were included in the expanded annotation in the “target CpG SNP” column (n=20,270), while the number of target CpG SNPs per probe was annotated in the “n_target CpG SNP” column.

3.2.3 Non-specific probe annotation
To identify probes that potentially have multiple genomic targets (i.e., non-specific probes), we followed the method used to annotate such probes on the 27k array [140].
Special treatment of Type II probes was required, as the Illumina annotation had noted Cs in CpGs within the probe as an “R” SNP. For Type II probes that contained Rs, we considered two probe sequence versions, one with all Rs replaced by As (i.e., an unmethylated version) and the other with all Rs replaced by Gs (i.e., a methylated version). Using these conditions, we were able to match each of the 450k probes with its intended target (i.e., the Illumina-annotated genomic location).

Briefly, we used BLAT [170] to align probe sequences to four versions of the hg19 draft sequence genome: (i) a fully unmethylated “bisulfite treated” genome, with all Cs converted to Ts; (ii) a fully methylated “bisulfite treated” genome, with only non-CpG Cs converted to Ts; (iii) and (iv) the above treatments on the reverse complement sequence. BLAT was run using the following parameters: stepsize=5, wordsize=11 and repMatch=10,000,000; lowering the word length led to only fractionally more hits. The selection criteria used were as previously outlined: for a probe to be considered non-specific, there had to be 90% identity over the aligned region, at least 40 of 50 matching bps, no gaps, and the 50th nucleotide had to align, as the probe hybridizes to the target CpG at this position [140]. The number of non-specific probe matches or “hits” was annotated in the expanded annotation columns “AlleleA_Hits” and “AlleleB_Hits”, while the site of cross-hybridization was annotated in the columns “XY_Hits” (if at least one hit was on a sex chromosome) and “Autosomal_Hits” (if at least one hit was on an autosomal chromosome). Repetitive sequences from RepeatMasker were marked in lower case in the hg19 genomic sequence; thus, we were able to identify the number of repetitive bases within the Illumina-intended alignment of each probe, recorded in the expanded annotation column “n_bp_repetitive”.
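The sequence manipulations behind the non-specific probe screen can be sketched as below: building the four converted reference versions (i)-(iv), expanding degenerate R bases in Type II probes, and applying the hit criteria to one alignment. This is a minimal sketch under stated assumptions, not the pipeline used here. BLAT itself is not called. Alignments are described by a few PSL-style fields. Identity is simplified to matching / (matching + mismatching) bases. The 50th probe base is taken as aligned when the alignment ends at the probe's last base. All function names are hypothetical. A probe with a passing hit other than its intended target would be flagged as non-specific.

```python
"""Sketch: in silico steps of a non-specific probe screen (hypothetical)."""

COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def bisulfite_convert(seq, methylated):
    """Convert Cs to Ts as bisulfite would.

    methylated=False: every C becomes T (fully unmethylated genome).
    methylated=True: only Cs outside CpG context become T.
    Case is preserved so soft-masked (lower-case) repeats stay lower case.
    """
    out = []
    for i, base in enumerate(seq):
        if base in "Cc":
            in_cpg = i + 1 < len(seq) and seq[i + 1] in "Gg"
            if methylated and in_cpg:
                out.append(base)
            else:
                out.append("T" if base == "C" else "t")
        else:
            out.append(base)
    return "".join(out)


def genome_versions(seq):
    """Return the four converted references, (i) to (iv) in the text."""
    rc = reverse_complement(seq)
    return [
        bisulfite_convert(seq, methylated=False),
        bisulfite_convert(seq, methylated=True),
        bisulfite_convert(rc, methylated=False),
        bisulfite_convert(rc, methylated=True),
    ]


def type2_versions(probe):
    """Resolve R bases: all A (unmethylated) or all G (methylated)."""
    return probe.replace("R", "A"), probe.replace("R", "G")


def is_cross_hybridizing_hit(matches, rep_matches, mismatches,
                             q_num_insert, t_num_insert, q_end, q_size=50):
    """Apply the quoted hit criteria to one PSL-style alignment.

    Matches reported separately as repeat matches (repMatches) are
    counted as matching bases here.
    """
    matching = matches + rep_matches
    aligned = matching + mismatches
    if aligned == 0:
        return False
    identity = matching / aligned
    return (identity >= 0.90 and matching >= 40
            and q_num_insert == 0 and t_num_insert == 0  # no gaps
            and q_end == q_size)  # assumed: 50th probe base is aligned
```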
3.2.4 CpG enrichment annotation
Illumina categorized probes in CpG islands (column “Relation_to_UCSC_CpG_Island”) based on the UCSC Genome Browser [171] criteria of >50% CG content, >0.60 obs/exp CpG ratio and >200 bps in length. Shores and shelves were identified by Illumina based on their relationship to a CpG island: shores as the 2 kbs up- and down-stream of CpG islands and shelves as the 2 kbs outside of shores. The remaining probes were located in non-island regions, which we refer to as the “sea” (Figure 3.2A) [70].

Figure 3.2 Illustration of Illumina and HIL CpG classes
(A) Diagram of Illumina-annotated probes, based on their relative location to a CpG island: within the island, shore or shelf. We used the term “sea probes” to refer to probes that were not annotated into one of the Illumina CpG classes. Islands were defined based on UCSC criteria: >200 bps long, >50% CG content and >0.60 obs/exp CpG ratio. Shores were defined as the 2 kbs up- and down-stream of a CpG island and shelves as the 2 kbs outside of a shore. (B) The HIL definition of CpG islands was used to annotate probes into three CpG classes: HC probes (map to a high-density CpG island or HC), IC probes (map to an isolated intermediate-density CpG island or IC) and ICshore probes (map to a region with IC density that borders an HC). The remainder of probes did not map to a CpG island and were thus considered non-island or LC probes. HCs were defined as >500 bps long, >55% CG content and >0.75 obs/exp CpG ratio, while ICs were defined as >200 bps long, >50% CG content and >0.48 obs/exp CpG ratio.

We annotated probes into four mutually exclusive CpG classes based on alternative CpG enrichment criteria: high density CpG island probes (HC, n=153,859), intermediate density CpG island probes (IC, n=118,727), ICshore probes (probes in ICs that border HCs, n=33,955), and low CpG density or non-island probes (LC, n=178,971) (Figure 3.2B).
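The HC and IC thresholds listed above can be illustrated with a per-region check. The sketch computes GC content and the observed/expected CpG ratio, assuming the common Gardiner-Garden and Frommer form (CpG count × length / (C count × G count)). It labels IC-density regions as ICshores when they touch or overlap an HC, following the rules described for the CpGIE output. CpGIE's sliding-window search and merging are not reproduced. Intervals are assumed to be 0-based half-open `(start, end)` pairs, and the function names are hypothetical.

```python
"""Sketch: HIL density thresholds and ICshore labelling (hypothetical)."""


def cpg_metrics(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a sequence."""
    s = seq.upper()
    n = len(s)
    if n == 0:
        return 0.0, 0.0
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # CpG dinucleotides cannot overlap, so count() is exact
    gc_fraction = (c + g) / n
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def density_class(seq):
    """Return "HC", "IC" or "LC" for a candidate region."""
    gc, oe = cpg_metrics(seq)
    n = len(seq)
    if n > 500 and gc > 0.55 and oe > 0.75:
        return "HC"
    if n > 200 and gc > 0.50 and oe > 0.48:
        return "IC"
    return "LC"


def classify_ic_region(ic_interval, hc_intervals):
    """Label an IC-density region "ICshore" if it borders an HC, else "IC".

    Assumes ic_interval already excludes HC bases; with half-open
    intervals, touching means start == HC end or end == HC start.
    """
    start, end = ic_interval
    for hc_start, hc_end in hc_intervals:
        if start <= hc_end and hc_start <= end:  # overlap or adjacency
            return "ICshore"
    return "IC"
```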
The classification of probes into these four classes was termed HIL (HC, IC, LC), and annotation of probes into the HIL classes was added in the “HIL_CpG_class” column of the expanded annotation. To locate probes within each of the four CpG classes, we first annotated these CpG enrichment classes throughout the genome. The hg19 genomic sequence was downloaded from UCSC in overlapping segments and read by CpGIE [143]. CpGIE searches input sequences in sliding windows based on user-set criteria. HCs were defined as regions with >55% CG content, >0.75 obs/exp CpG ratio and >500 bps in length, while ICs were defined as regions with >50% CG content, >0.48 obs/exp CpG ratio, and >200 bps in length [30, 172]. CpGIE HC and IC output was merged into a single file for each chromosome, duplicate islands were removed, and CpG islands were identified as follows: ICs – isolated regions of the genome with IC density; ICshores – regions of the genome with IC density that were next to regions with HC density; HCs – any region of the genome with HC density; and LCs – regions that were not of IC or HC density. Islands were given unique names after being defined, e.g., chr8_IC:49890018-49891221 (chr#_CpG class:genomic start-genomic end). The hg19 HC and IC islands have been compiled into a UCSC track available in Additional File 16 of [168]. The hg19 HIL annotation was intersected with the genomic location (hg19) of 450k probe targets in Galaxy to assign probes into the four CpG classes², which are included in the expanded annotation column “HIL_CpG_Island”. An annotation of probes into HIL CpG islands using the unique nomenclature can also be found in the expanded annotation column “HIL_CpG_Island_Name”.

3.2.5 Gene feature and TSS annotation
Using RefSeq gene annotation, we annotated probes into nine groups based on three gene components (1st exons, exons and introns) and three gene regions (5’UTR, body and 3’UTR).
Probes were thus grouped into: 1) 5’UTR 1st exons, 2) 5’UTR exons, 3) 5’UTR introns, 4) body 1st exons, 5) body exons, 6) body introns, 7) 3’UTR 1st exons, 8) 3’UTR exons and 9) 3’UTR introns (Figure 3.3). The hg19 RefSeq track from the RefGene table was downloaded from UCSC [169]. Exon and intron information was extracted and parsed into genomic interval data, with the most up-stream exon denoted as the first exon. Next, 5’UTR, gene body and 3’UTR locations were parsed into genomic interval data utilizing the transcription start/stop and coding start/stop information from RefSeq. Intersection was performed between each of the 5’UTR, gene body and 3’UTR intervals and the 1st exon, exon and intron intervals to generate the nine gene features. The gene feature intervals were then intersected with the hg19 genomic location of 450k targets in R [141] to assign probes into the nine gene feature categories. This annotation was completed using both RefSeq gene names and transcript names. Gene feature annotation was conducted using the GenomicRanges package in R [173].

² The same criteria [30] used to define high CpG density island (HC), intermediate CpG density island (IC) and low CpG density or non-island (LC) were used to classify strong islands, weak islands and non-islands, respectively, in Chapter 2.

Figure 3.3 Illustration of gene feature annotation
Based on the overlap of three gene components (1st exon vs. exon vs. intron) with three gene regions (5’UTR vs. body vs. 3’UTR), probes were annotated into the following nine gene feature groups: 1) 5’UTR 1st exons, 2) 5’UTR exons, 3) 5’UTR introns, 4) body 1st exons, 5) body exons, 6) body introns, 7) 3’UTR 1st exons, 8) 3’UTR exons and 9) 3’UTR introns (corresponding to numbers below transcripts). A given probe could be annotated with more than one gene feature, as illustrated by the multiple transcripts (A-E) of a fictional region of the genome.
Probe i would be annotated as 5’UTR exon, 5’UTR 1st exon and 5’UTR intron; probe ii would be annotated as body exon, 5’UTR intron, body intron and body 1st exon; and probe iii would be annotated as 3’UTR exon, 3’UTR intron, 3’UTR exon, 3’UTR exon and 3’UTR 1st exon. White boxes represent untranslated exons; grey boxes represent translated exons.

The hg19 UCSC knownGene table [169] was downloaded to Galaxy and the closest transcription start site (TSS) for each probe was annotated, regardless of whether the probe was located within the same gene. For each probe, the closest TSS, the distance to it, and its gene and transcript names were recorded in the expanded annotation columns “Closest_TSS”, “Distance_closest_TSS”, “Closest_TSS_gene_name” and “Closest_TSS_Transcript”.

3.2.6 Sample collection
The utility of the expanded annotation was tested using 450k data collected from placental chorionic villi, adult blood and child buccal samples. Two male and two female placentas were collected as controls for a separate study of chromosomal abnormalities in the placenta. DNA was extracted by a standard salting-out method [138] from chorionic villi sampled from two separate locations as previously described [152]. The DNA from these two independent locations was combined in equal amounts prior to bisulfite conversion to provide a representative sample of each placenta (n=4). Two male and two female blood samples were collected as adult controls for ongoing studies of respiratory disease and epigenetics (n=4). Peripheral blood mononuclear cell (PBMC) DNA was extracted according to standard procedures. Buccal epithelial samples were collected from two males and two females for a study on maternal care effects on childhood DNAm (n=4). Buccal samples were collected using Isohelix buccal swab kits (Cell Projects Ltd, Kent, UK) and stabilization reagents, and DNA was extracted using the Isohelix DNA isolation kit (Cell Projects Ltd), as per the manufacturer’s protocols.
3.2.7 Illumina Infinium HumanMethylation450 BeadChip (450k array) processing
Two µg of genomic DNA were purified using the DNeasy Blood and Tissue kit (Qiagen, Valencia, CA, USA) following the manufacturer’s protocol. Purified DNA quality and concentration were assessed with a NanoDrop ND-1000 (Thermo Scientific, Waltham, MA, USA) prior to bisulfite conversion. One µg of purified genomic DNA was bisulfite converted using the EZ DNA Methylation kit (Zymo Research, Orange, CA, USA) following the manufacturer’s protocol. Bisulfite-converted DNA quality and concentration were assessed using the NanoDrop and, if required, samples were concentrated to approximately 50 ng/µL using a Speedvac (Thermo Electron Corporation, Waltham, MA, USA). Following the Illumina Infinium HumanMethylation450 BeadChip protocol, four µL of bisulfite-converted DNA were whole-genome amplified, enzymatically digested and hybridized to the array, and single nucleotide extension was then performed [70]. Chips were scanned using an Illumina HiScan in two colour channels to detect Cy3-labeled probes on the green channel and Cy5-labeled probes on the red channel. GenomeStudio 2011.1 software (Illumina, San Diego, CA, USA) was used to read the HiScan output and conduct background normalization. The signalA, signalB and probe intensity were exported for autosomal probes and read into R [141]. M values were generated using the Bioconductor methylumi package [142] as $M=\log_2\left(\frac{\max(I_m,0)+1}{\max(I_u,0)+1}\right)$, where $I_m$ and $I_u$ are the methylated and unmethylated signal intensities, since this value has been shown to be valid for statistical analyses [71]. Following correction for within-chip colour bias using the lumi package [72], and probe type correction using Subset-quantile Within Array Normalization (SWAN) [75], M values were converted to β values using the equation $\beta=\frac{2^M}{2^M+1}$. The β value ranges from 0 to 1 and approximates the fraction of DNAm; thus, to ease interpretation, we have reported results as β values.
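The two transformations above can be written directly in Python. This is a minimal sketch of the formulas as stated, not the methylumi or lumi implementation; intensities are assumed to be plain numbers and the function names are hypothetical. Substituting the first formula into the second gives β = (I_m + 1)/(I_m + I_u + 2), which is why β approximates the fraction of DNAm.

```python
"""Sketch: M-value and beta-value conversions as written in the text."""
import math


def m_value(meth, unmeth):
    """M = log2((max(meth, 0) + 1) / (max(unmeth, 0) + 1))."""
    return math.log2((max(meth, 0) + 1) / (max(unmeth, 0) + 1))


def beta_from_m(m):
    """beta = 2**M / (2**M + 1)."""
    return 2 ** m / (2 ** m + 1)


def m_from_beta(beta):
    """Inverse transform, M = log2(beta / (1 - beta)), for 0 < beta < 1."""
    return math.log2(beta / (1 - beta))
```

For instance, equal methylated and unmethylated intensities give M = 0 and β = 0.5, while the +1 offsets keep M finite when one channel is zero.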
The microarray data used in this chapter were submitted to GEO (http://www.ncbi.nlm.nih.gov/geo) under accession number GSE42409. Probes with a detection p-value >0.01 in any sample, probes with no β value in any sample, all rs and ch probes, and all sex chromosome and non-specific probes were removed prior to analyses. The level of DNAm for 428,216 probes in our sample dataset was intersected with the expanded annotation for further analyses.

3.2.8 Processing of aging dataset

To ensure that results were not limited by sample size, the utility of the expanded annotation was tested on a large publicly available dataset in parallel to the testing performed on the data we generated. Series matrix files were downloaded from GEO for GSE40279, containing β values for 473,039 probes for each of 656 blood samples [174]. We worked with the subset of samples that roughly matched the age of the blood samples used in our study (n=261, aged 19-61). Probes with no β value in ≥1 sample, all sex chromosome probes, and rs and ch probes were removed from the dataset. For SNP analyses, non-specific probes were also removed; however, these were retained in the analysis of autosomal sex-specific probes. For the discovery of autosomal probes with sex differences in DNAm, β values were read into R and converted into M values using the Bioconductor package lumi [72], and Significance Analysis of Microarrays (SAM) was then conducted using the Bioconductor package siggenes [175]. At FDR <0.01, 10,139 autosomal probes were identified as significantly different between male and female samples. Next, this list was crossed with a list of deltaβ values for each probe, calculated as the absolute value of the difference between the average β of males and the average β of females.

3.2.9 Pyrosequencing

Probe cg06961873 was selected for genotype validation of SNP rs61775206 in each of our samples. Primers were designed using PSQ Assay Design software version 1.0.6 (Biotage AB, Uppsala, Sweden).
Primer sequences and probe information are available in Supplementary Table 3.2. Genomic DNA (0.5 µL) was PCR amplified using the following conditions: 95 °C for 5 minutes; 50 cycles of 95 °C for 20 seconds, 55 °C for 20 seconds and 75 °C for 20 seconds; and 72 °C for 5 minutes. Genotyping was performed using a PyroMark MD system (Biotage) and analyzed with PSQ 96MA SNP software (Biotage).

3.2.10 Statistical analyses

A Kolmogorov-Smirnov (KS) test was used to assess the difference in the distribution of SD in β values for probes that contained SNPs compared to all probes on the array. The KS statistic is the maximum absolute difference between the empirical cumulative distribution functions of the two samples being compared. Probes with small within-tissue SD in β (<0.10) were removed from all probe groups to increase the power of the analysis. Probes with a target CpG SNP were removed from the SNP<10 bps group. The number of probes included in the SD in β distribution curves for blood samples was 5,450 for all probes, 809 for SNP>10 bps, 402 for SNP<10 bps and 2,190 for target CpG SNP; for the aging dataset, the corresponding numbers were 6,267, 1,022, 362 and 2,753. KS tests were also used to assess the difference in the distribution of DNAm between Illumina CpG classes and between HIL CpG classes. Fisher’s exact test was used to compare the distribution of the number of probes within three levels of DNAm for both Illumina- and HIL-annotated CpG classes: hypomethylated (β values of 0 to ≤0.2), heterogeneously methylated (β values of >0.2 to <0.8) and hypermethylated (β values of ≥0.8 to 1.0). Enrichment analyses of tissue differentially methylated (tDM) probes were performed in Python. To select tDM probes, DNAm was first averaged for each probe within a tissue. A z score was then calculated for each probe in each comparison between tissues. A p-value cutoff of 0.05 with a Bonferroni correction was used to account for multiple comparisons [37].
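The KS statistic defined above can be computed directly from two sets of within-tissue SDs. The Python sketch below is illustrative; the analyses in this chapter used R, and in practice R's ks.test or SciPy's scipy.stats.ks_2samp return the same two-sample statistic together with a p-value. The SD values shown are invented for illustration and are not data from this chapter.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Two-sample KS statistic: D = max over x of |F_a(x) - F_b(x)|.

    F_a and F_b are empirical CDFs. Both are step functions that only jump at
    observed values, so it suffices to evaluate them at the pooled data points.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:
        f_a = bisect_right(a, x) / len(a)  # fraction of sample_a <= x
        f_b = bisect_right(b, x) / len(b)  # fraction of sample_b <= x
        d = max(d, abs(f_a - f_b))
    return d


# Hypothetical SDs in beta (all >= 0.10, mirroring the filter described above).
sd_all_probes = [0.11, 0.12, 0.13, 0.15, 0.18]
sd_target_snp = [0.15, 0.25, 0.30, 0.35]
print(ks_statistic(sd_all_probes, sd_target_snp))  # 0.75
```

A larger D means the two SD distributions are further apart; in Figure 3.4B and C, the probes with a target CpG SNP show the largest D relative to all probes.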
KS and Fisher’s tests were performed in R. All figures were created in R or Adobe Illustrator CS6.

3.3 Results and discussion

3.3.1 Polymorphic CpGs may affect the assessment of DNA methylation

The Infinium assays are based on quantifying bisulfite-introduced C/T SNPs; thus, genetic variation in the DNA sequence at the target CpG can compromise the assessment of DNAm. The end of each 450k probe targets a CpG of interest and, although the alignment of Type I and Type II probes with CpGs differs by one base pair (Figure 3.1), a matching end nucleotide is essential for extension of both probe types. Thus, a SNP that changes the sequence at a target CpG might produce a false DNAm signal, either through hybridization of the wrong probe (possible for Type I probes) or through no or minimal extension at the target site (possible for both probe types). Illumina annotated SNPs located within 10 bps of the target CpG (SNP<10 bps, n=36,535 probes) and those located within the remainder of the probe (SNP>10 bps, n=59,892 probes). We added annotation of probes that query CpGs with documented polymorphisms specifically at the C and/or G position (called target CpG SNPs). Using the dbSNP database, one or more SNPs were annotated at 4.3% (n=20,869) of target CpGs (Figure 3.4A). Most of these probes had only one target CpG SNP (n=20,270); however, 599 had two or more (Supplementary Table 3.3). Of the probes with a target CpG SNP, 32.5% were not documented as variable by dbSNP, while 43.2% had a heterozygosity greater than 0.1 (Figure 3.4A). Because they are more frequent in the population, this second group of SNPs is more likely to affect the assessment of DNAm. The majority (67.3%) of the rs numbers for probes with target CpG SNPs corresponded to those annotated by Illumina as a SNP<10 bps.
Differences between the annotations may result from our inclusion of SNPs in the C or G of the target CpG (whereas Illumina only annotated SNPs within the probe sequence, see Figure 3.1), updates to the dbSNP database, and the possibility that Illumina used a minimum heterozygosity as a SNP inclusion criterion.

Figure 3.4 Probes targeting polymorphic CpGs may affect the assessment of DNA methylation
(A) A documented SNP was identified at the target C or G position of 4.3% of 450k probes (target CpG SNP). Of these SNPs, 43.2% had a heterozygosity of >0.1 and, due to their frequency in the population, are thus more likely to affect measurement of DNAm. (B) Using blood samples (n=4) as an example, the standard deviation (SD) in β value (i.e., level of DNAm) between individuals was calculated for all probes. Probes with small SD in β (<0.10) were removed from the analysis. The distribution of SD in β value was plotted for all probes, and for the subsets of probes annotated with a target CpG SNP, a SNP within 10 bps of the target but without a target CpG SNP (SNP<10 bps), and a SNP within the remainder of the probe (SNP>10 bps). Numbers in brackets indicate KS statistics in comparison to the distribution of all probes. (C) Using a selection of 261 adult blood samples extracted from the aging dataset (GSE40279), the distribution of SD in β value was plotted for the subsets of probes as described in B. Numbers in brackets indicate KS statistics in comparison to the distribution of all probes. (D) DNAm at probe cg06961873 across 12 individuals exemplified the trichotomous pattern of DNAm at a target CpG SNP. The three distinct levels of DNAm corresponded to sample genotype at SNP rs61775206, located at the target CpG: TT genotypes were assessed as hypomethylated, TC genotypes as hemimethylated and CC genotypes as close to fully methylated.
Theoretically, a probe affected by sample genotype would produce a bi- or tri-modal distribution of DNAm, and this pattern would result in a high within-tissue standard deviation (SD) in β value. Thus, we examined the distribution of within-tissue SD in β (n=4 per tissue) at probes annotated with a target CpG SNP, SNP<10 bps (excluding those probes also annotated with a target CpG SNP) and SNP>10 bps (Figure 3.4B, results for blood). Based on a Kolmogorov-Smirnov (KS) test for difference in distribution, the distribution of SD in β for probes annotated with a target CpG SNP differed most from that of all probes (p=1.78×10⁻¹⁵). This trend was illustrated by a shift in the density curve for SD in β of probes with target CpG SNPs, in comparison to the curve for SD in β of all probes (Figure 3.4B). To ensure that this finding was not an artifact of our small sample size, we performed the same analysis using a larger, publicly available dataset (GSE40279) that had investigated age-associated DNAm changes in the blood of 656 individuals (aged 19 to 101) [174]. We extracted a younger subset of samples (n=261, aged 19-61) for our analysis, since this roughly covered the age range of the blood samples in our study. In this larger dataset (referred to as the “aging dataset”), the distribution of SD in β for probes annotated with a target CpG SNP also exhibited the largest difference in distribution from that of all probes (p=1.22×10⁻¹⁴), based on a KS test (Figure 3.4C). We next hypothesized that highly variable probes (defined as within-tissue SD in β ≥0.25) were likely compromised by the presence of a target CpG SNP. There were 780 such probes in blood, 819 in buccal samples, 666 in chorionic villus samples and 480 in the aging dataset (Table 3.1). We did not expect the number of probes affected by SNPs to be the same across tissues, since each tissue type was from a different set of individuals with different genotypes.
Comparing these highly variable probes to the SNP annotation, 85.0%, 81.6%, 72.7% and 92.5% were annotated with a target CpG SNP in blood, buccal and chorionic villus samples and the aging dataset, respectively (Table 3.1). Of the highly variable probes, only four in blood, two in buccal and two in chorionic villi overlapped with the sex-specific autosomal probes described in the next section; thus, we do not believe that these large SDs were driven by sex differences in the data. No probes in the aging dataset overlapped with probes that cross-hybridize to the sex chromosomes.

Table 3.1 The majority of highly variable probes* were annotated with SNPs
Tissue | Blood (n=4) | Buccal (n=4) | Chorionic villi (n=4) | Blood, aging dataset (n=261)
Total | 780 | 819 | 666 | 480
Annotated with target CpG SNP | 663 (85.00) | 668 (81.56) | 484 (72.67) | 444 (92.50)
*Defined as within-tissue SD in β ≥0.25. Numbers in brackets are the percent of the total per tissue.

To confirm that a target CpG SNP could affect DNAm, samples were genotyped at a probe (cg06961873) that had an annotated target CpG SNP and SD in β ≥0.25 in all three tissues. As predicted, homozygous C samples were assessed as hypermethylated, heterozygotes as hemimethylated and homozygous T samples as hypomethylated (Figure 3.4D). Although we were not able to genotype samples in the aging dataset, a histogram of DNAm at this same CpG site across the 261 samples showed the hypothesized trimodal pattern of DNAm (Supplementary Figure 3.1). Other examples of highly variable probes in the aging dataset also illustrate this pattern (Supplementary Figure 3.1). Given the demonstrated potential to bias the call of DNAm, we suggest that probes with a target CpG SNP should be disregarded in most analyses of the 450k array. On the other hand, differences at these compromised probes may point to relevant genetic differences between disease groups, though with small sample sizes such findings may arise by chance.
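Flagging highly variable probes and checking them against the target CpG SNP annotation, as done for Table 3.1, can be sketched in a few lines of Python. The 0.25 threshold comes from the text; using the sample SD (n − 1 denominator, as in R's sd function) is our assumption about how SD was computed. Probe IDs, β values and function names are hypothetical.

```python
from statistics import stdev
from typing import Dict, List, Set


def highly_variable_probes(betas: Dict[str, List[float]], min_sd: float = 0.25) -> Set[str]:
    """Probe IDs whose within-tissue SD in beta is >= min_sd (sample SD, n - 1)."""
    return {probe for probe, values in betas.items() if stdev(values) >= min_sd}


def percent_with_target_snp(probes: Set[str], target_snp_probes: Set[str]) -> float:
    """Percentage of the given probes that carry a target CpG SNP annotation."""
    if not probes:
        return 0.0
    return 100.0 * len(probes & target_snp_probes) / len(probes)


# Toy data: probe IDs and beta values are invented for illustration.
betas = {
    "probe_1": [0.05, 0.50, 0.95, 0.50],  # trimodal, as expected for a genotype-driven probe
    "probe_2": [0.80, 0.82, 0.79, 0.81],  # stable and highly methylated
}
hv = highly_variable_probes(betas)
print(hv, percent_with_target_snp(hv, {"probe_1"}))  # {'probe_1'} 100.0
```

Applied to the blood counts in Table 3.1, the same percentage is 663 / 780 = 85.0%.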
At a minimum, 450k users should carefully check candidate discoveries against the target CpG SNP annotation, in addition to a current SNP database, to ensure that genetic differences are not misreported as epigenetic differences. Although we used a straightforward example to illustrate how a target CpG SNP may confound the assessment of DNAm, effects may also be observed at probes with SNPs within the remainder of the probe, i.e., outside of the target CpG. For example, polymorphisms throughout the interval of hybridization have been shown to affect the signal intensity of probes used in Illumina mRNA expression arrays [176], which have the same probe length as the 450k array. Similar effects have also been observed in Affymetrix mRNA expression arrays, although these use shorter probes, which might be more sensitive to sequence mismatches [177]. Additionally, several studies have recognized the heritability of DNAm through the genetic-epigenetic interaction of methylation-associated SNPs (mSNPs) [178-180], suggesting that some SNP-associated differences in DNAm may be true differences and not technical artifacts.

3.3.2 8-9% of probes mapped to more than one location in silico

An additional confounding feature of the Infinium DNAm arrays is that some probes map to multiple locations in the genome [140]. Signals from these non-specific probes likely represent a combination of DNAm at more than one location. Using alignment to four different in silico bisulfite-converted genomes [140], we identified 11.2% (n=15,125) of Type I probes and 7.7% (n=26,812) of Type II probes on the 450k array as non-specific (a total of 8.6% of 450k probes). While the number of cross-hybridization loci per probe ranged from two to 1,615, the majority of non-specific probes cross-hybridized to between two and five locations (52.4% of Type I non-specific probes and 65.2% of Type II non-specific probes).
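The in silico bisulfite-converted genomes mentioned above can be illustrated on a short sequence. The composition of the four genomes used in [140] is assumed here rather than taken from this section: we take them to be the forward and reverse strands, each converted once assuming every CpG is methylated and once assuming none is. Aligning probe sequences to the converted genomes (the step that actually identifies non-specific probes) is not shown, and all function names are hypothetical.

```python
from typing import Dict, Tuple

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse strand of a DNA sequence, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def bisulfite_convert(seq: str, methylated_cpg: bool) -> str:
    """In silico bisulfite conversion of one strand, read 5' to 3'.

    Unmethylated C is read as T after conversion. If methylated_cpg is True,
    every C followed by G is assumed methylated and kept as C; all other C -> T.
    """
    seq = seq.upper()
    out = []
    for i, base in enumerate(seq):
        if base == "C":
            in_cpg = i + 1 < len(seq) and seq[i + 1] == "G"
            out.append("C" if (methylated_cpg and in_cpg) else "T")
        else:
            out.append(base)
    return "".join(out)


def four_converted_genomes(forward: str) -> Dict[Tuple[str, str], str]:
    """Forward and reverse strands, each with all CpGs methylated or unmethylated."""
    reverse = reverse_complement(forward)
    return {
        ("forward", "methylated"): bisulfite_convert(forward, True),
        ("forward", "unmethylated"): bisulfite_convert(forward, False),
        ("reverse", "methylated"): bisulfite_convert(reverse, True),
        ("reverse", "unmethylated"): bisulfite_convert(reverse, False),
    }


for key, converted in four_converted_genomes("ACGTCA").items():
    print(key, converted)
```

Because the converted sequences are depleted of C (outside CpGs), they are less complex than the original genome, which is one reason a 50-mer probe can match more than one converted locus.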
Within non-specific probes, 600 were intended to target sex chromosomes but also mapped to autosomes, while 11,412 were intended to target autosomes but also mapped to sex chromosomes (Table 3.2); this location of cross-hybridization was included in the expanded annotation. Autosomal probes that potentially cross-hybridize to the sex chromosomes may be problematic in studies assessing sex differences in DNAm on autosomes or in studies where male and female subjects are analyzed together [140, 181], due to the general hypermethylation of CpG island promoters on one of the two X chromosomes in females [172].

Table 3.2 Location of in silico cross-hybridization of non-specific probes
Probe category | Intended probe target: auto chrs | Intended probe target: sex chrs
Total on 450k array | 473,864 | 11,648
Non-specific probes:
  Cross-hybridize only to auto chrs | 29,178 | 371
  Cross-hybridize only to sex chrs | 540 | 747
  Cross-hybridize to auto & sex chrs | 10,872 | 229
  Total: cross-hybridize to sex chrs | 11,412 | 976
  Total: cross-hybridize to auto chrs | 40,050 | 600
  Total | 40,590 | 1,347
Auto, autosomal; chrs, chromosomes.

In the aging dataset, after excluding sex chromosome probes, but not our annotated non-specific probes, we used a false discovery rate (FDR) threshold and a minimum difference in DNAm between sexes (deltaβ) to identify autosomal probes that were differentially methylated between males (n=133) and females (n=128). An FDR of <0.01 and a minimum deltaβ of 0.10 identified 75 sex-specific autosomal probes, of which 40% were annotated to cross-hybridize to the sex chromosomes (Supplementary Table 3.4). Although some true sex differences in DNAm likely exist on the autosomes, this result indicates that many of the large autosomal sex differences in DNAm may be an artifact of probe design and may actually represent sex-chromosome differences in DNAm.
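The combined FDR and deltaβ filter described above, followed by the check against the cross-hybridization annotation, can be sketched as follows. FDR values are taken as given (in this chapter they came from SAM via siggenes) and are not computed here; treating the “minimum deltaβ of 0.10” as an inclusive bound (≥) is our assumption. Probe IDs, values and function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Set


def sex_specific_probes(male_betas: Dict[str, List[float]],
                        female_betas: Dict[str, List[float]],
                        fdr: Dict[str, float],
                        max_fdr: float = 0.01,
                        min_delta_beta: float = 0.10) -> Dict[str, float]:
    """Return {probe: deltaBeta} for probes passing both the FDR and deltaBeta filters.

    deltaBeta = |mean beta in males - mean beta in females|.
    """
    hits = {}
    for probe, q in fdr.items():
        delta = abs(mean(male_betas[probe]) - mean(female_betas[probe]))
        if q < max_fdr and delta >= min_delta_beta:
            hits[probe] = delta
    return hits


def percent_cross_hybridizing(hits: Dict[str, float], cross_to_sex: Set[str]) -> float:
    """Percentage of hits annotated as cross-hybridizing to the sex chromosomes."""
    return 100.0 * len(set(hits) & cross_to_sex) / len(hits) if hits else 0.0


# Toy data (invented): p1 passes both filters, p2 fails deltaBeta, p3 fails FDR.
males = {"p1": [0.80, 0.90], "p2": [0.50, 0.50], "p3": [0.20, 0.20]}
females = {"p1": [0.20, 0.30], "p2": [0.55, 0.55], "p3": [0.60, 0.60]}
q_values = {"p1": 0.001, "p2": 0.001, "p3": 0.5}
hits = sex_specific_probes(males, females, q_values)
print(sorted(hits), percent_cross_hybridizing(hits, {"p1"}))
```

Requiring both a small FDR and a minimum effect size avoids reporting probes whose statistically significant sex difference is too small to be biologically interpretable.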
Homologous gene families, duplicated genes or repetitive elements have been proposed as potential causes of in silico cross-hybridization of Infinium probes [140]. Thus, for all 450k probes, we annotated the number of nucleotides at the intended site of hybridization that mapped to repetitive DNA, based on RepeatMasker annotation in BLAT [170]. For 72,957 probes (15.0% of 450k probes), more than half of the nucleotides in the probe target (>25 bps) overlapped with repetitive DNA. We had annotated 19,731 of these repetitive probes as non-specific, which reflects their in silico cross-hybridization. Interestingly, for 24,847 specific probes (i.e., probes that mapped only to the intended target), the entire probe (all 50 bps) was in repetitive DNA. This group of specific repetitive probes might be exploited to assess DNAm of repetitive elements, particularly in studies investigating changes in DNAm in cancer or in association with environmental exposures [122, 123]. We recommend that probes cross-hybridizing to the sex chromosomes be removed from analyses prior to hypothesis testing, though this may vary with study design. Since cross-hybridizing probes may target relevant homologous gene families or repetitive elements, some may choose to retain autosomal cross-hybridizing probes in analyses. With this approach, potential cross-hybridization needs to be considered in the interpretation of any identified candidate CpG sites.

3.3.3 Comparing Illumina and HIL annotation of probes highlighted differences between CpG classification systems

As previously mentioned, the 450k array includes probes designed to target UCSC CpG islands, as well as shores, shelves and non-island regions, which we refer to as the “sea” [70] (Figure 3.2A; see Section 3.2.4 for class definitions). Alternative “HIL” CpG classes (i.e., high density CpG island, intermediate density CpG island and low CpG density or non-island) provide a different criterion for probe annotation based on CpG enrichment.
We expanded the 450k annotation by categorizing probes into four HIL classes: (1) HC probes, (2) IC probes, (3) ICshore probes (regions of IC density that border HCs), and (4) non-island probes or LCs (Figure 3.2B; see Section 3.2.4 for class definitions) [30, 172]. The distribution of probes within each Illumina-annotated CpG class was compared to the distribution of probes within each HIL-annotated CpG class (Supplementary Table 3.5). The majority of probes were classified as anticipated, with 77.6% of HC probes annotated as Illumina island probes, 65.0% of ICshore probes annotated as Illumina shore probes, and 61.5% of LC probes annotated as Illumina sea probes (Figure 3.5). The largest difference in annotation was that close to half of IC probes (51.0%) were Illumina-annotated sea probes, while the remainder of IC probes was distributed across Illumina-annotated islands (17.2%), shores (19.9%) and shelves (11.9%).

Figure 3.5 Comparison of the genomic distribution of Illumina-annotated CpG probe classes within each HIL-annotated CpG probe class
Within HCs, ICshores and LCs, the majority of probes were categorized into the respective Illumina-annotated CpG class. However, even though ICs and ICshores have the same CpG density, the distribution of probes based on Illumina’s CpG classes was different between these two HIL classes, suggesting a functional difference between ICs that border HCs and isolated ICs.

To elucidate potential functional differences between CpG classes, we examined the distribution of DNAm for both Illumina- and HIL-annotated CpG classes (for blood, Supplementary Figure 3.2; buccal, Supplementary Figure 3.3 and chorionic villi, Supplementary Figure 3.4). Within each classification system, all distribution curves were significantly different from each other.
On average, KS statistics were larger for comparisons between HIL CpG classes than for Illumina CpG classes, indicative of more distinct distributions of DNAm in HIL CpG classes. Using blood as an example, β values were separated into three categories: hypomethylated (β values of 0 to ≤0.2), heterogeneously methylated (β values of >0.2 to <0.8), and hypermethylated (β values of ≥0.8 to 1.0) (Figure 3.6 and Supplementary Table 3.6) [33, 182]. The majority of both HC probes (79.2%) and Illumina-annotated island probes (72.3%) fell in the hypomethylated category in blood, consistent with the characteristic pattern of CpG island DNAm [45, 182]. The distribution of DNAm within Illumina-annotated shore probes, IC probes and ICshore probes differed (e.g., 34.0%, 13.6% and 46.1%, respectively, in the hypomethylated category), suggesting that these CpG classes are distinct. Interestingly, a higher proportion of Illumina-annotated shelf probes than Illumina-annotated sea probes were hypermethylated (72.6% vs. 66.4%, respectively), perhaps attributable to the differing CpG enrichment profile within shelves and seas (as demonstrated by the contribution of HC, IC and LC probes to each of these classes, Supplementary Table 3.5).

Figure 3.6 Distinct patterns of DNAm within CpG classification systems
Probes were grouped into three levels of DNAm based on average β values within a tissue: hypomethylated (β values of 0 to ≤0.2; yellow), heterogeneously methylated (β values of >0.2 to <0.8; light blue) and hypermethylated (β values of ≥0.8 to 1; dark blue). The percentage of probes in Illumina- and HIL-annotated CpG classes was plotted for the three levels of DNAm in blood (n=4). HIL CpG classes had more distinctive DNAm profiles than Illumina-annotated CpG classes. Numbers on top of bars indicate the number of probes per class.

Previous studies have shown that tissue-specific differences in DNAm occur in CpG island shores [37].
We were interested in assessing where tissue-specific differences in DNAm occur based on the Illumina vs. HIL CpG classes. Thus, we examined probes that were differentially methylated between tissues (tDM) for enrichment within each CpG class. The highest number of tDM probes was identified between blood and chorionic villus samples (91,255; 21.3% of probes), in comparison to chorionic villus vs. buccal samples (75,021; 17.5% of probes) and blood vs. buccal samples (69,174; 16.2% of probes). tDM probes were significantly depleted in Illumina-annotated islands and HIL-annotated HCs, and significantly enriched in all other CpG classes (Figure 3.7). Interestingly, the level of enrichment within each CpG class varied by the tissues compared (Figure 3.7, Supplementary Table 3.7).

Figure 3.7 Enrichment of differentially methylated probes in many CpG classes
Probes that were differentially methylated (tDM) between blood and buccal samples (n=69,174) or between blood and chorionic villus samples (n=91,255) were assessed for enrichment in (A) Illumina- and (B) HIL-annotated CpG classes. Enrichment was plotted as “% relative enrichment”, representing the enrichment of tDM probes relative to the total percentage of probes within each CpG class. Negative % relative enrichment indicates that tDM probes were depleted in the given probe-type category, whereas positive % relative enrichment indicates that tDM probes were enriched in the given probe-type category. All enrichment analyses were significant with the exception of ICshore probes in the comparison of blood vs. chorionic villi.

The HIL CpG classes demonstrated a more extreme DNAm profile and a larger percentage of tDM probes. Intriguingly, even though ICs and ICshores have the same CpG density, distinct differences between these two classes emerged in our analyses, suggesting that ICs that border HCs are distinct from ICs on their own and highlighting the utility of this additional classification.
This discrete classification may be a useful approach to subset the array CpG sites for analyses within biologically relevant structures of the genome.

3.3.4 DNA methylation was variable across nine gene feature groups

There is increasing evidence that DNAm of gene features outside of CpG islands and promoters may be an important marker of gene expression. For example, it has been shown that DNAm of the 1st exon is correlated with transcriptional repression [183]. Coverage of regions outside of CpG islands and promoters increased dramatically from the 27k to the 450k array, but Illumina only categorized probes into six gene feature groups: TSS1500 (within 1500 bps of a transcription start site (TSS)), TSS200 (within 200 bps of a TSS), 5’UTR, 1st exon, body and 3’UTR. Given the number of probes on the 450k array, a more detailed gene structure classification might increase the potential to observe subtle, biologically relevant trends in DNAm. Thus, we expanded the gene feature annotation by (i) annotating the distance of each probe to the closest TSS, and (ii) classifying probes into nine groups based on three gene components (1st exons, exons and introns) and three gene regions (5’UTR, body and 3’UTR). Probes were thus grouped into: (1) 5’UTR 1st exons, (2) 5’UTR exons, (3) 5’UTR introns, (4) body 1st exons, (5) body exons, (6) body introns, (7) 3’UTR 1st exons, (8) 3’UTR exons, and (9) 3’UTR introns, using (a) transcript and (b) RefGene name. Due to alternative TSSs and splicing, a given probe could be categorized into several gene feature categories (Figure 3.3). Since we observed large differences in DNAm across HIL CpG classes in Section 3.3.3, gene features were further sub-classified by HIL CpG class. Given the known bias in the distribution of CpGs in the genome [27], the proportion of probes annotated to each HIL CpG class was, predictably, unequal across gene feature groups (Figure 3.8, Supplementary Table 3.8).
For example, HC probes were significantly over-represented in 1st exons found in the 5’UTR and gene body, while LC probes were significantly under-represented in both of these groups. Within each HIL CpG class, trends in DNAm across gene features were consistent (Supplementary Table 3.8). For example, in blood, DNAm of intronic probes increased from 5’UTR to 3’UTR to gene body probes (Figure 3.9A), while DNAm of 5’UTR probes increased from 1st exon to intron to exon probes (Figure 3.9B).

Figure 3.8 Contribution of HIL CpG classes to probes in nine gene feature groups
The percentage of probes within each HIL CpG class was different for each gene feature group. Numbers on top of bars indicate the number of probes per gene feature group; a total of 213,315 probes were located within these nine gene feature groups.

Figure 3.9 Variation of gene feature DNAm within a CpG class
The level of DNAm was plotted as an average β value for each gene feature in blood. Analyses were conducted within each HIL CpG class due to the large differences in DNAm that were observed between classes. Average β values varied across probes by (A) gene location, as exemplified by intronic probes, and (B) gene component, as exemplified by 5’UTR probes.

We were also interested in assessing where tissue-specific differences in DNAm occurred based on gene features. Thus, we examined tDM probes for enrichment within each gene feature group (again separated by CpG class, Supplementary Table 3.9). tDM probes in 1st exons were significantly depleted in 5’UTRs located in HCs and ICshores, but significantly enriched in 5’UTRs located in LCs. HC exons were significantly enriched for tDM probes in the 5’UTR, body and 3’UTR across all tissue comparisons, perhaps due to biological significance or to small probe numbers in these categories. Although CpG classes were primarily associated with differences in DNAm, gene structure is also an important factor to consider when analyzing 450k array results.
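The “% relative enrichment” used in Figure 3.7 and in the gene-feature enrichment analysis compares the share of tDM probes falling in a class with the share of all probes in that class. The exact formula is not stated in this section, so the Python sketch below assumes it is the percent difference between the two shares. It also builds the 2×2 contingency table (tDM vs. not tDM, in class vs. outside class) that an exact test such as R's fisher.test or SciPy's scipy.stats.fisher_exact would take; which test produced the reported significance is not specified here. Function names and counts are hypothetical.

```python
from typing import List


def percent_relative_enrichment(tdm_in_class: int, tdm_total: int,
                                all_in_class: int, all_total: int) -> float:
    """Assumed definition: 100 * (tDM share - background share) / background share.

    Negative values indicate depletion of tDM probes in the class; positive
    values indicate enrichment.
    """
    tdm_share = tdm_in_class / tdm_total
    background_share = all_in_class / all_total
    return 100.0 * (tdm_share - background_share) / background_share


def contingency_table(tdm_in_class: int, tdm_total: int,
                      all_in_class: int, all_total: int) -> List[List[int]]:
    """2x2 counts: rows = tDM / not tDM, columns = in class / outside class."""
    non_tdm_in_class = all_in_class - tdm_in_class
    non_tdm_total = all_total - tdm_total
    return [
        [tdm_in_class, tdm_total - tdm_in_class],
        [non_tdm_in_class, non_tdm_total - non_tdm_in_class],
    ]


# Invented counts: a class holding 20% of all probes but only 10% of tDM probes.
print(percent_relative_enrichment(10, 100, 200, 1000))  # -50.0
print(contingency_table(10, 100, 200, 1000))  # [[10, 90], [190, 710]]
```

Expressing enrichment relative to the background share makes classes of very different sizes (e.g., HCs vs. LCs) comparable on one axis.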
3.4 Conclusion

In this chapter, I presented an expanded annotation of the Illumina Infinium HumanMethylation450 BeadChip DNAm microarray (450k array), including both annotation of unreliable probes and additional biologically relevant annotation. The expanded annotation was deposited as a platform on GEO (http://www.ncbi.nlm.nih.gov/geo) under the accession GPL16304. Based on the analyses conducted in this chapter, I suggest that 450k users analyze data with the following factors in mind: probe signals may be compromised by the presence of SNPs in the target CpG and/or binding of probes to multiple genomic locations. SNPs at the target CpG may be especially problematic in studies with small sample sizes, as chance may result in dramatic differences in the frequency of polymorphisms between groups. However, false positives may still be present in studies with larger sample sizes if groups are not ethnically matched. Additionally, DNAm patterns by CpG enrichment class or gene feature should be considered in data analysis. If DNAm is averaged, for example, across a set of probes with differing CpG densities, this may in fact wash out meaningful exposure- or phenotype-related differences. With the advent of next-generation sequencing applied to bisulfite-converted samples, measurement of DNAm will be possible on a global, site-specific scale. However, difficulties remain in the alignment of reduced-complexity reads, as well as in biologically relevant data interpretation [184]. Thus, array-based technologies, which target specific genomic regions of interest, are of value for assessing DNAm changes relevant to studies of human health and disease. Analytical tools and data filters for the 450k array are constantly evolving.
For example, a recent study excluded 450k probes that mapped to copy number variants (CNVs) because of their potential to bias measurement of DNAm [185], and another study suggested pre-filtering data to restrict analysis to highly variable probes only (i.e., standard deviation across samples >0.1) [186]. As with many platforms, it may not be possible to write a single comprehensive protocol for processing 450k data; virtually every question posed will require modifications. Thus, a thorough understanding of array design, targets and technical issues is paramount to making informed processing decisions. Furthermore, even though high-throughput data collection is relatively easy and inexpensive, it is essential that biological factors, especially CpG density, be taken into consideration to choose the most appropriate course of analysis.

Chapter 4: Correction for batch effects using ComBat improves 450k data in a pilot study of placental MTHFR genotype

4.1 Introduction

Methylenetetrahydrofolate reductase (MTHFR) catalyzes an irreversible reaction that commits methyl groups to the methylation cycle, rather than to the DNA cycle, in one-carbon metabolism (OCM, Figure 1.3). There is an overwhelming amount of literature investigating the association of polymorphisms in MTHFR with risk of diseases such as arthritis [187], colorectal cancer [188], migraines [189], autism [190], neural tube defects [191] and cardiovascular disease [192]. Two polymorphisms, 677C>T (rs1801133) [193] and 1298A>C (rs1801131) [98], are the most studied MTHFR SNPs, as they are missense mutations that reduce enzymatic activity to about 45% and 68% of that of the reference enzyme, respectively [93]. Interest in the clinical implications of these variants is evident from the fact that MTHFR genotyping is available from 50 certified labs in the United States [194].
This genetic test is widely promoted in the naturopathic field, where patients are told that a “faulty genotype” may explain a whole list of symptoms and diseases, including “anxiousness, adrenal fatigue, brain fog, cervical dysplasia, increased risk of many cancers (including breast and prostate), low thyroid, leaky gut, high blood pressure, heart attacks, stroke, Alzheimer’s disease, diabetes, and miscarriages” [195]. These patients are then told to take supplements containing “methyl-folate” and methyl B12 to reportedly increase methylation and decrease the risk of disease development [196]. However, the basic link between MTHFR variants and altered OCM is uncertain.

Increased homocysteine (hcy) has consistently been documented in the blood of 677TT individuals [95-98], with levels estimated to be 20% higher than in 677CC individuals [197]. This elevation in hcy was thought to relate to the association of variant MTHFR alleles with risk of cardiovascular disease, venous thrombosis and recurrent miscarriage [194]. However, meta-analyses now demonstrate little association of MTHFR genotype with these diseases, except for some modest variation by ethnicity [198-202]. Publication bias may contribute to the inconsistent literature; for example, Clarke et al. [197] found a significant association of MTHFR 677TT with coronary heart disease in published but not unpublished datasets. In contrast to public interest in this topic, medical guidelines currently state that in most cases there is insufficient evidence to warrant testing of MTHFR genotype [194].

A consistent, increased risk for neural tube defect (NTD)-affected pregnancies is found in women homozygous for the variant allele at the 677 locus (odds ratio 1.2–2.0 for TT vs. CC), but not at the 1298 locus [191, 203]. However, MTHFR genotyping is also not recommended in association with NTD risk, since its contribution to disease is thought to be small and is not well understood [204].
Altered genome-wide DNA methylation (DNAm) is a hypothesized mechanism underlying the increased risk for NTDs in association with MTHFR polymorphisms [205], and perhaps for other diseases as well. However, the relationship between MTHFR polymorphisms and DNAm, even in a healthy context, is unclear. Lower global DNAm has been documented in the blood of healthy adults with the 677TT genotype [206-208], though in some cases this association was limited to conditions of folate depletion [208, 209]. There are conflicting data on the effects of the 1298C variant on DNAm, with some studies noting lower levels of DNAm [210] and others noting no effect in 1298CC individuals [211].

This chapter presents a pilot study, using placental chorionic villi from 30 healthy term pregnancies, to better understand the relationship between MTHFR genotype and DNAm during pregnancy. Studying DNAm in chorionic villi is of particular interest, as the placenta is responsible for folate acquisition throughout gestation [84], and unique placental DNAm may enable this organ to adapt to environmental conditions [1, 139, 212]. The placenta exhibits a high degree of both within- and between-individual variability in DNAm [1, 212], suggesting that it may be more tolerant than other tissues to changes in DNAm. Pooled DNA from each placenta was run on the Illumina HumanMethylation450 BeadChip (450k array) to compare genome-wide DNAm associated with high-risk MTHFR genotypes (variant 677TT, n=10; variant 1298CC, n=10) to that of a reference group (677CC and 1298AA, n=10). These samples were selected in an attempt to isolate the association of each of the high-risk variants with DNAm; examination of 677 and 1298 heterozygotes was beyond the scope of this pilot study.
The 450k DNAm data were examined in several ways: (i) integrating DNAm across the array, to assess genome-wide differences in DNAm; (ii) in blocks called differentially methylated regions (DMRs), to combine data across neighbouring sites; and (iii) at individual CpG sites, to assess each site across the array. Though no changes in DNAm were identified by MTHFR genotype, this analysis was important in the continued development of a workflow for DNAm microarray analysis. The 30 MTHFR samples were run as part of an 84-sample set containing three different projects. The set was designed so that some samples would be used by multiple studies, and thus the 84 samples were randomized by study and condition across seven 450k chips. In preparing the MTHFR data for analysis, I discovered that batch effects (i.e., systematic variation in the data due to technical sources) were present due to the distribution of samples across chips and rows. Though both arise from technical measurement variability, batch effects are distinguished from noise in that they are systematic [213]. Well-known sources of batch effects include, for example, running experiments on different days or by different operators, and using different reagent lots. In this chapter, I demonstrate how measurement variability due to batch effects may be reduced in a 450k dataset using a tool called ComBat [214], implemented in the R software environment [141], but also that when applied to an unbalanced study design, ComBat can introduce false signal. This pilot study highlighted the importance of testing for the presence of batch effects and of choosing an appropriate approach for correction, both integral steps in processing DNAm array data.

4.2 Materials and methods

Materials and methods were adapted from [215].
4.2.1 Sample collection and case characteristics

Ethics approval for this study was obtained from the University of British Columbia/Children’s Hospital and Women’s Health Centre of British Columbia Research Ethics Board (certificate number: H04-70488). Chromosomally normal placental chorionic villus samples were collected from term deliveries at B.C. Women’s Hospital & Health Centre, Vancouver, Canada. Exclusion criteria for cases included chromosomal abnormality, congenital abnormality and grossly abnormal placenta. DNA was extracted from samples by a standard salting-out method modified from Miller et al. [138]. Previous studies in the Robinson lab have shown that DNAm varies across the placenta [1, 212]; thus, to obtain a more accurate assessment, samples from three independent cotyledons were taken from the fetal side of each placenta and the extracted DNA was combined for DNAm analyses. To select ten samples of each of three MTHFR genotype combinations, 186 placentae were screened for MTHFR 677 and 1298 genotypes. Primer sequences and reaction conditions can be found in Supplementary Table 4.1. Five µL of PCR product were sequenced on a PyroMark Q96 MD Pyrosequencer (Qiagen) using standard protocols [216]. The selected group of ten reference samples (677CC and 1298AA), ten variant 677 samples (677TT and 1298AA) and ten variant 1298 samples (677CC and 1298CC) were matched on important clinical parameters (Table 4.1). There were no statistically significant differences by MTHFR group in the distribution of sex, gestational age at birth, birth weight (standardized by gestational age), presence of pathology, maternal age or maternal ethnicity.
Table 4.1 Clinical characteristics of cases

Characteristic | reference (n=10) | variant 677 (n=10) | variant 1298 (n=10) | p-value (a)
maternal age (yrs); median (range) | 35.7 (31.4 to 41.1) | 35.3 (30.0 to 42.8) | 33.8 (26.1 to 39.5) | ns, ns
maternal ethnicity; n Caucasian (%) | 8 (80%) | 6 (60%) | 8 (80%) | ns, ns
adverse pregnancy outcome | 1 LOPE | 2 LOPE; 1 IUGR | 1 LOPE | ns, ns
GA at birth (wks); median (range) | 39.4 (36.1 to 41.6) | 37.8 (34.6 to 40.3) | 39.3 (38.6 to 40.7) | ns, ns
birth weight (SD) (b); median (range) | 0.005 (-1.6 to 2.2) | -0.130 (-3.0 to 0.7) | 0.055 (-0.8 to 1.4) | ns, ns
sex; n male (%) | 4 (40%) | 4 (40%) | 5 (50%) | ns, ns
placental MTHFR 677 genotype CC; n | 10 | 0 | 10 | -
placental MTHFR 677 genotype TT; n | 0 | 10 | 0 | -
placental MTHFR 1298 genotype AA; n | 10 | 10 | 0 | -
placental MTHFR 1298 genotype CC; n | 0 | 0 | 10 | -

(a) p-values calculated by Mann-Whitney test for continuous variables and Fisher’s exact test for categorical variables; (b) birth weight is measured in standard deviations from gestational age- and sex-specific means [217]. GA, gestational age; wks, weeks; yrs, years; ns, not significant; LOPE, late onset preeclampsia; IUGR, intrauterine growth restriction.

4.2.2 Illumina Infinium HumanMethylation450 BeadChip quality control and pre-processing

Genomic DNA was purified and bisulfite converted as in Section 3.2.6 and processed following the Illumina Infinium HumanMethylation450 BeadChip protocol [70]. Raw intensity was read into Illumina GenomeStudio Software 2011.1 and background normalization was applied. On each array, Illumina includes 835 control probes to assess technical issues such as array staining, extension and bisulfite conversion. An initial quality control (QC) check following the Illumina protocol was performed using the control probes, with no samples or chips identified as outliers. Signal intensity exported from GenomeStudio was read into R statistical software [141] using lumi [72] to convert signal intensities to M values. Sample identity was checked using clustering of samples by sex with 450k chromosome X and Y probes.
Finding no mislabeled samples, I next assessed sample quality using: (i) the number of probes with a detection p-value >0.01; (ii) the number of probes with <3 bead replicates; and (iii) the average sample intensity. No samples were identified as outliers based on these criteria. Next, probe filtering was conducted to eliminate systematically poor quality probes (detection p-value >0.01 or <3 bead replicates in >20% of samples; n=122), probes targeting the sex chromosomes (n=11,648), and the polymorphic probes (n=19,957) and sex chromosome cross-hybridizing probes (n=11,412) annotated in Chapter 3 [168]. Colour correction [72] to correct for red-green colour channel bias and SWAN normalization [75] to correct for Type I – Type II probe bias were applied. M values were replaced with NAs in the remaining probe-sample pairs with detection p-values >0.01 or <3 bead replicates. Principal component analysis was used to assess batch effects. Sentrix_row (i.e., chip row) and Sentrix_ID (i.e., chip ID) were found to be associated with variability in the dataset and were subsequently corrected for using ComBat in the sva package [214]. More details can be found in Sections 4.3.1 and 4.3.2.

4.2.3 Differential DNA methylation analyses

Differential methylation (DM) was assessed by fitting a linear model to M values at each CpG site using the R package limma [218]. In modelling DNAm, MTHFR group (variant 677, variant 1298 or reference) was used as the main effect, and fetal sex and gestational age (GA) were included as additive covariates. DM results were extracted for the comparison of the variant 677 group to the reference group and of the variant 1298 group to the reference group. Resulting p-values were adjusted using the Benjamini & Hochberg [219] false discovery rate (FDR) method in limma [218]. M values adjusted for fetal sex and gestational age were transformed to β values using the m2beta function in lumi [72].
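The probe-level filtering and masking rules above can be sketched in Python with NumPy. This is a minimal sketch, not the lumi/R code used in this thesis: the function name `filter_and_mask` is hypothetical, the inputs are assumed to be probe-by-sample matrices already exported from GenomeStudio, and "in >20% of samples" is read as a strict inequality. The annotation-based filters (sex chromosome, polymorphic and cross-hybridizing probes) would be a separate set-membership step and are not shown.

```python
import numpy as np


def filter_and_mask(m_values, det_p, bead_count,
                    p_cut=0.01, min_beads=3, max_fail_frac=0.20):
    """Hypothetical helper: drop systematically failing probes, then mask
    the remaining failing probe-sample pairs.

    All inputs are (n_probes, n_samples) arrays. A probe-sample pair fails
    if its detection p-value exceeds p_cut or it has fewer than min_beads
    bead replicates.
    """
    fail = (det_p > p_cut) | (bead_count < min_beads)
    # Remove probes that fail in more than max_fail_frac of samples.
    keep = fail.mean(axis=1) <= max_fail_frac
    # Boolean indexing returns a copy, so the caller's array is untouched.
    cleaned = m_values[keep].astype(float)
    # In the surviving probes, failing pairs become NaN (the analogue of R's NA).
    cleaned[fail[keep]] = np.nan
    return cleaned, keep


# Toy example: 3 probes x 5 samples.
m = np.zeros((3, 5))
det = np.zeros((3, 5))
beads = np.full((3, 5), 10)
det[0, :2] = 0.05   # probe 0 fails in 2/5 = 40% of samples -> removed
beads[1, 0] = 2     # probe 1 fails in 1/5 = 20% of samples -> kept, one pair masked
cleaned, keep = filter_and_mask(m, det, beads)
print(keep.tolist())   # [False, True, True]
print(cleaned.shape)   # (2, 5)
```

Removing probes before masking means the 20% rule is evaluated on the raw failure calls, so masking never changes which probes survive.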
For every CpG site, average DNAm was calculated by taking the mean β value of samples within each of the reference, variant 677 and variant 1298 groups. Group differences in DNAm (deltaβ) were then calculated by subtracting the reference group average from each of the variant 677 and variant 1298 group averages per CpG site. Significant DM CpG sites were considered as those with an FDR <0.05 and deltaβ ≥0.05.

4.2.4 Array-wide analyses

Unsupervised hierarchical clustering was performed using all clean, adjusted data (n=442,348 CpG sites, n=30 samples), clustering by Euclidean distance and average agglomeration. Stability of the clustering was assessed using the R package pvclust [220] with 1,000 iterations. An array average β was calculated for each sample by averaging M values at all clean CpG sites adjusted for gestational age and sex, with missing values stripped prior to calculation, and then transforming the result to a β value. The percentage of outlier probes per sample was calculated using M values in two steps. First, the number of outlier probes per sample was counted; a probe was considered an outlier in a given sample if its value was more than 3 median absolute deviations from that probe's median across all samples [221]. Second, the number of outlier probes per sample was normalized by dividing it by the total number of CpG sites with data (i.e., NAs stripped) in the given individual.

4.2.5 DMR analysis

Differentially methylated region (DMR) analysis was conducted on M values using the dmrFind function in the R package charm [222], with the following criteria to define a DMR: maximum gap between adjacent CpGs = 300 bp and ≥3 probes in identified DMRs. Fetal sex and gestational age were included as additive covariates in the modelling of DMRs.
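The array-wide measures of Section 4.2.4 can be sketched with NumPy. This assumes a probe-by-sample matrix of covariate-adjusted M values with NaN for masked pairs; the function names are hypothetical. The M-to-β conversion is the inverse of M = log2(β/(1 - β)), the same relation lumi's m2beta uses. The MAD is scaled by 1.4826 to mirror the default of R's mad(); whether the original analysis used the scaled or raw MAD is an assumption, as is expressing the result as a percentage (x100).

```python
import numpy as np


def m_to_beta(m):
    """Convert M values to beta values (inverse of M = log2(beta / (1 - beta)))."""
    return 2.0 ** m / (2.0 ** m + 1.0)


def array_average_beta(m_values):
    """Per-sample (column) mean of M, ignoring NaN, then converted to beta."""
    return m_to_beta(np.nanmean(m_values, axis=0))


def percent_outlier_probes(m_values, n_mad=3.0, constant=1.4826):
    """Percentage of each sample's non-missing probes lying more than n_mad
    MADs from that probe's median across all samples."""
    med = np.nanmedian(m_values, axis=1, keepdims=True)
    mad = constant * np.nanmedian(np.abs(m_values - med), axis=1, keepdims=True)
    with np.errstate(invalid="ignore"):  # NaN comparisons simply give False
        outlier = np.abs(m_values - med) > n_mad * mad
    n_outliers = outlier.sum(axis=0)
    n_with_data = np.sum(~np.isnan(m_values), axis=0)
    return 100.0 * n_outliers / n_with_data


# Two probes x five samples: the last sample is extreme at the first probe.
m_demo = np.array([[0.0, 0.0, 0.0, 0.0, 10.0],
                   [1.0, 1.2, 0.8, 1.1, 0.9]])
print(percent_outlier_probes(m_demo).tolist())  # [0.0, 0.0, 0.0, 0.0, 50.0]
print(array_average_beta(np.array([[1.0, -1.0], [-1.0, np.nan]])).tolist())
```

Averaging on the M scale and converting once, as described above, is not the same as averaging β values directly, because the M-to-β transform is non-linear.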
Once DMRs were identified in the comparisons of the variant 677 group to the reference group and the variant 1298 group to the reference group, a q value correction for multiple comparisons was applied with 1,000 iterations, using the qval function in charm. Significant DMRs were considered as those with a q value <0.05.

4.2.6 Statistical software

All analyses were conducted in R statistical software [141]. P-values for Table 4.1 and Table 4.3 were calculated by Mann-Whitney test for continuous variables and Fisher’s exact test for categorical variables. Association of the loadings of the top principal components (PCs) with sample characteristics was tested using linear modelling, with nominal p-values presented in Figures 4.2 and 4.6. Graphics were created using the ggplot2 package [223] and Adobe Illustrator CS6. All graphs were plotted using sex- and gestational age-corrected data.

4.3 Results

4.3.1 Initial processing of 450k MTHFR data

Data for the 30 MTHFR samples were processed in preparation for analyses (see Section 4.2.2). Five major steps were taken to prepare the data: (i) filtering of systematically poor quality probes; (ii) colour correction for differences in the dynamic range of red and green stain signals; (iii) normalization to correct for differences in the dynamic range of Type I and Type II 450k probes; (iv) batch correction with ComBat for row on the chip; and (v) batch correction with ComBat for chip. Following each of these steps, a linear model was applied to each target CpG site to model DNAm as a function of MTHFR genotype, including sex and gestational age as additive covariates. Results were extracted for two comparisons, variant 677 vs. the reference group and variant 1298 vs. the reference group, generating a p-value for every CpG site per comparison. The distribution of unadjusted p-values at each processing step was plotted to give an overall view of the data (Figure 4.1).
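The monitoring and significance rules used in this chapter can be sketched with NumPy: a Benjamini-Hochberg adjustment written to mirror R's p.adjust(method = "BH") (the thesis performed this step in limma), a binned p-value histogram for judging flatness, and the deltaβ plus FDR rule from Section 4.2.3. Function names are hypothetical, and treating the deltaβ cut-off as an absolute value is an assumption.

```python
import numpy as np


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, mirroring R's p.adjust(method="BH")."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)[::-1]             # indices from largest to smallest p
    ranks = np.arange(n, 0, -1)             # matching ranks n, n-1, ..., 1
    adj = np.minimum.accumulate(p[order] * n / ranks)  # enforce monotonicity
    out = np.empty(n)
    out[order] = np.minimum(adj, 1.0)
    return out


def pvalue_histogram(pvals, n_bins=20):
    """Fraction of p-values per bin; about 1/n_bins in every bin is the null shape."""
    counts, _ = np.histogram(pvals, bins=n_bins, range=(0.0, 1.0))
    return counts / counts.sum()


def delta_beta(beta, groups, variant, reference="reference"):
    """Per-probe mean beta of `variant` samples minus that of `reference` samples."""
    groups = np.asarray(groups)
    return (np.nanmean(beta[:, groups == variant], axis=1)
            - np.nanmean(beta[:, groups == reference], axis=1))


def call_dm(fdr, dbeta, fdr_cut=0.05, db_cut=0.05):
    """DM call: FDR below fdr_cut and absolute delta beta of at least db_cut."""
    return (np.asarray(fdr) < fdr_cut) & (np.abs(dbeta) >= db_cut)


rng = np.random.default_rng(0)
null_p = rng.uniform(size=100_000)           # p-values with no true differences
print(pvalue_histogram(null_p).round(3))     # roughly 0.05 in every bin
print(bh_adjust([0.01, 0.02, 0.03, 0.04, 0.05]))
```

A histogram that rises towards p = 0 indicates more small p-values than chance alone produces; one that rises towards p = 1, as in the first graphs of Figure 4.1, points to a mis-specified model.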
As the data were cleaned, normalized and corrected for batch effects, I expected that the p-value distributions would flatten towards a uniform distribution (i.e., the shape expected when no CpG site truly differs between groups), and that they might become right-skewed (i.e., peaking near zero) if there were more differences in DNAm between MTHFR groups than expected by chance.

The first four graphs of Figure 4.1 show similar, slightly left-skewed (i.e., peaking near one) distributions, which suggest missing explanatory variables in the model, perhaps batch effects or other unknown contributors to variation in DNAm. The batch effect due to chip row (graph 5) was corrected for using the R function ComBat, which resulted in slightly right-skewed distributions. Finally, after applying ComBat again to correct for the batch effect due to chip (graph 6), the distributions became extremely right-skewed.

Figure 4.1 Initial processing: p-value distributions for linear modelling of MTHFR group at each processing step. At each step in processing, a linear model was fit to test for differential methylation by MTHFR group. Unadjusted p-value distributions were plotted to monitor data processing. The distribution peaking near zero in graph 6 indicates a dramatic change in the data after correcting for batch effect due to chip.

At a typical threshold of FDR <0.05, no differences in DNAm by MTHFR genotype were observed prior to batch correction (i.e., after step 4), whereas after correction for chip and row, the data contained 9,683 differentially methylated CpG sites for the variant 677 comparison and 19,192 differentially methylated CpG sites for the variant 1298 comparison (Table 4.2). I was wary of the magnitude of this change in the association with MTHFR genotype after correcting for batch.

Table 4.2 Initial processing: DM sites at FDR <0.05 before and after ComBat

Comparison | Before ComBat | After ComBat
number of tested sites | 441,990 | 441,990
variant 677 vs. reference | 0 | 9,683
variant 1298 vs. reference | 0 | 19,192

PCA was a second metric used to assess data processing. This technique identifies orthogonal principal components (PCs) that reduce high-dimensional data to a smaller number of dimensions accounting for the majority of variation in the data. Testing the association between top PCs and sample variables can suggest sources of variability in the data. Figure 4.2 illustrates the association of the top five PCs with MTHFR genotype, row and chip. The application of ComBat clearly removed variability due to chip and row; however, as suggested by the p-value distributions in Figure 4.1 and the DM results in Table 4.2, PCA also indicates that a strong MTHFR genotype signal appeared after correction.

[Figure 4.2 panels: association of PC1 to PC5 with MTHFR genotype, chip and row, before ComBat (% variance per PC: 7.2, 6.0, 5.5, 4.1, 4.1) and after ComBat (% variance per PC: 9.1, 8.6, 6.7, 5.5, 5.1).]

Figure 4.2 Initial processing: association of top five PCs before and after application of ComBat. A linear model was fit to test for an association between PC loadings and sample characteristics prior to batch correction and after correction for both chip and row. Box colours indicate significance of association: dark blue p ≤0.001, mid-blue p ≤0.01, light blue p ≤0.05; grey boxes indicate no association. After correction for batch effects, no association with chip or row was found; however, a strong association with MTHFR genotype was introduced.

4.3.2 A second attempt at batch effect correction

ComBat requires two inputs to correct for a batch effect. The first is a model describing the parameter(s) that should be protected from correction (in this case, MTHFR group), and the second is the batch variable to be corrected for (in this case, row and then chip).
In essence, the function estimates the effect of each category of the batch (row 1, 2, 3, 4, 5, 6 and chip 1, 2, 3, 4, 5, 6, 7) for each category of the protected variable (reference, variant 677 and variant 1298 groups). Because this small number of samples was randomized within a larger group of samples run at the same time, the MTHFR samples were sparsely distributed across chips and rows (Figure 4.3). Thus, I speculated that in my initial analysis, batch effects were not accurately estimated and corrected for in each MTHFR group.

Figure 4.3 Location of samples on seven 450k chips. Illustration of the location of the MTHFR samples used in this study across seven chips containing samples for multiple studies.

To obtain an improved estimation of row and chip effects, I made several changes to the MTHFR data processing and then reanalyzed the data:

i. Increased sample size: 29 other chorionic villus samples were run in parallel to this project for other studies (hashed arrays in Figure 4.3) and were included in the reanalysis. This increased the pre-processing sample size from 30 to 59, with a better distribution of samples across chips and rows. When ComBat was applied in this second analysis, the protected parameters were MTHFR677 + MTHFR1298 genotype, in order to accommodate the inclusion of heterozygotes in the additional samples. The 30 MTHFR samples were separated out at each step of the analysis for monitoring, and were extracted for analysis once data cleaning was complete. This approach also allowed for the inclusion of a technical replicate to better monitor data processing.
ii. Reduced sample subdivision: When the 450k arrays are run, the chips stand vertically for approximately three hours while a series of washes are applied, which may be the source of the row effect. Thus, I grouped samples into high (rows 5 and 6), mid (rows 3 and 4) and low (rows 1 and 2) locations to reduce the number of categories that needed to be estimated for row.
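ComBat itself estimates per-batch location and scale parameters and shrinks them with empirical Bayes, which is beyond a short example. The sketch below is a deliberately simpler, swapped-in technique: a location-only, regression-based adjustment in the spirit of limma's removeBatchEffect. For each probe it fits the protected groups plus sum-to-zero batch contrasts by least squares and subtracts only the batch part. It also shows the high/mid/low row grouping from item ii. All names are hypothetical and the example data are synthetic.

```python
import numpy as np

# Hypothetical mapping of 450k chip rows to the coarser row groups.
ROW_GROUP = {1: "low", 2: "low", 3: "mid", 4: "mid", 5: "high", 6: "high"}


def one_hot(labels):
    """Indicator matrix (n_samples, n_levels) for a categorical variable."""
    labels = np.asarray(labels)
    levels = np.unique(labels)
    return (labels[:, None] == levels[None, :]).astype(float)


def remove_batch_location(data, batch, protect):
    """Remove per-probe batch mean shifts while protecting group differences.

    data: (n_probes, n_samples); batch and protect: one label per sample.
    This is not ComBat: there is no scale adjustment and no shrinkage.
    """
    zb = one_hot(batch)
    zb = zb[:, 1:] - zb[:, :1]        # sum-to-zero contrasts for the batch
    xp = one_hot(protect)             # protected groups (also span the intercept)
    design = np.hstack([xp, zb])
    coef, *_ = np.linalg.lstsq(design, data.T, rcond=None)
    batch_coef = coef[xp.shape[1]:]   # (n_batch_levels - 1, n_probes)
    return data - (zb @ batch_coef).T


# Synthetic example: group B is 1.0 higher, batch 2 adds an offset of 2.0.
group = np.array(["A", "A", "B", "B", "A", "A", "B", "B"])
batch = np.array([1, 1, 1, 1, 2, 2, 2, 2])
profile = 1.0 * (group == "B") + 2.0 * (batch == 2)
data = np.tile(profile, (3, 1))
adj = remove_batch_location(data, batch, group)
print(adj[0].round(6).tolist())

rows = [1, 2, 3, 4, 5, 6]
print([ROW_GROUP[r] for r in rows])
```

Correcting row and then chip would mean two calls, first with the grouped rows and then with the chip IDs. Because batch coefficients are estimated alongside the protected groups, batch-by-group cells containing only one or two samples give noisy estimates, which is the concern raised above for the initial 30-sample analysis.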
In this second analysis, after each processing step, the 30 MTHFR samples were selected out of the larger group of 59 samples and a linear model was fit as described in Section 4.3.1. Unadjusted p-value distributions for the updated analysis are presented in Figure 4.4. While graphs 1 through 4 mirror those from the initial analysis (Figure 4.1), the distributions in graph 6 are close to uniform, suggesting that the applied models fit the data and that batch effects were removed. In support, the correlation between the technical replicates available from the larger set of chorionic villus samples improved from r=0.99616 in the raw data to r=0.99666 after processing was complete (Figure 4.5). Finally, PCA confirmed the removal of batch effects, without introduction of a strong MTHFR signal (Figure 4.6). From this updated procedure, the 30 final, clean MTHFR samples were extracted for biological analyses.

Figure 4.4 Second processing: p-value distributions for linear modelling of MTHFR group at six analysis steps. At each step in the updated analysis, the 30 MTHFR samples were selected from the larger group of 59 chorionic villus samples and a linear model was fit to test for differential methylation by MTHFR genotype. Unadjusted p-value distributions were plotted to monitor data processing. The distribution of p-values became flatter with processing, suggesting that the model fit and that batch effects were removed.

Figure 4.5 Second processing: M value distributions for two replicate samples at each processing step. At each step in the updated processing, the distribution of M values was plotted for a pair of replicate samples. As the data were processed, M value distributions became smoother and closer to each other. The correlation between the replicates across all probes improved from r=0.99616 to r=0.99666.
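The two monitoring checks described above, agreement between technical replicates and association of the top PCs with batch, can be sketched with NumPy. Assumptions: probe-by-sample matrices with NaN for masked values; Pearson correlation over probes observed in both replicates; and, swapped in for the linear-model p-values shown in Figures 4.2 and 4.6, the R² from regressing each PC score on a one-hot encoding of the batch variable. Names are hypothetical and the example data are simulated.

```python
import numpy as np


def replicate_correlation(rep1, rep2):
    """Pearson r between two replicate M-value vectors, skipping NaN pairs."""
    ok = ~(np.isnan(rep1) | np.isnan(rep2))
    return np.corrcoef(rep1[ok], rep2[ok])[0, 1]


def top_pcs(data, n_pcs=5):
    """PCA of samples; data is (n_probes, n_samples) with no NaN."""
    x = data - data.mean(axis=1, keepdims=True)      # centre each probe
    _, s, vt = np.linalg.svd(x, full_matrices=False)
    scores = vt[:n_pcs].T * s[:n_pcs]                # (n_samples, n_pcs)
    pct_var = 100.0 * s[:n_pcs] ** 2 / np.sum(s ** 2)
    return scores, pct_var


def r2_with_factor(scores, labels):
    """R^2 of each PC score regressed on a categorical variable."""
    labels = np.asarray(labels)
    design = (labels[:, None] == np.unique(labels)[None, :]).astype(float)
    coef, *_ = np.linalg.lstsq(design, scores, rcond=None)
    ss_res = np.sum((scores - design @ coef) ** 2, axis=0)
    ss_tot = np.sum((scores - scores.mean(axis=0)) ** 2, axis=0)
    return 1.0 - ss_res / ss_tot


# Simulated data: 500 probes x 12 samples with a strong shift on chip "b".
rng = np.random.default_rng(1)
chip = np.array(["a"] * 6 + ["b"] * 6)
data = rng.normal(size=(500, 12))
data[:, chip == "b"] += rng.normal(0.0, 5.0, size=(500, 1))
scores, pct_var = top_pcs(data)
print(pct_var.round(1))
print(r2_with_factor(scores, chip).round(2))   # PC1 is expected to be near 1
```

After a successful correction, the R² of the leading PCs with chip or row would be expected to drop towards chance levels while the replicate correlation stays high or improves.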
[Figure 4.6 panels: association of PC1 to PC5 with MTHFR genotype, chip and row, before ComBat (% variance per PC: 7.8, 5.1, 4.5, 3.7, 3.1) and after ComBat (% variance per PC: 6.5, 5.5, 4.7, 3.6, 3.3).]

Figure 4.6 Second processing: association of top five PCs before and after application of ComBat. A linear model was fit to test for an association between PC loadings and sample characteristics prior to batch correction and after correction for both chip and row. Box colours indicate significance of association: dark blue p ≤0.001, mid-blue p ≤0.01, light blue p ≤0.05; grey boxes indicate no association. After correction for batch effects, no association was found in the data with chip or row.

4.3.3 Array-wide DNA methylation

Having reduced the batch effect signal, I next conducted between-group comparisons to address whether MTHFR variants were associated with differential DNAm. Given the fundamental involvement of OCM in activating and transporting methyl units [81], if the variant MTHFR alleles have an effect on DNAm, it is predicted to be widespread rather than gene-specific [224]. Using all clean data, the distance between every pair of samples was calculated to quantify their similarity [141]. Hierarchical clustering was then used to obtain a global view of the relationships among samples (Figure 4.7). Hierarchical clustering initially places each sample in its own group (i.e., the terminal branches of the clustergram), and then identifies the most similar pair of samples, grouping these into a cluster (i.e., the horizontal line closest to the origin on the clustergram). This joining of the two most similar clusters is repeated iteratively until only one cluster remains [141]. Stable clusters of samples can then be examined for grouping features.
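Average-linkage agglomerative clustering on Euclidean distances, as described above, can be written in a few lines of NumPy. This is a naive sketch suited to tens of samples, not the hclust/pvclust implementation used in the thesis; the function name is hypothetical and it returns only the merge history rather than a dendrogram. Distances are computed from the Gram matrix so that a matrix with hundreds of thousands of probes does not need a samples x samples x probes intermediate array.

```python
import numpy as np


def average_linkage(data):
    """Agglomerative clustering of the columns (samples) of `data`.

    Uses Euclidean distance and average linkage (the mean distance over all
    cross-cluster sample pairs). Returns merges as (cluster_a, cluster_b,
    distance), where clusters are frozensets of sample indices.
    """
    x = data.T                                    # samples as rows
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    dist = np.sqrt(np.maximum(d2, 0.0))           # clip tiny negative round-off
    clusters = [frozenset([i]) for i in range(x.shape[0])]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                idx_a, idx_b = sorted(clusters[i]), sorted(clusters[j])
                d = dist[np.ix_(idx_a, idx_b)].mean()
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Two well-separated groups of three samples (2 probes x 6 samples).
data = np.array([[0.0, 0.1, 0.2, 10.0, 10.1, 10.2],
                 [0.0, 0.2, 0.1, 10.0, 10.2, 10.1]])
last_a, last_b, _ = average_linkage(data)[-1]
print(sorted(last_a), sorted(last_b))   # the final merge joins the two groups
```

Cluster stability, as assessed with pvclust in Figure 4.7, would additionally require repeating this on bootstrap resamples of the probes.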
Samples did not cluster by any of the available clinical characteristics, including MTHFR group, indicating that MTHFR genotype does not drive separation of chorionic villus samples based on 450k DNAm.

Next, mean DNAm was calculated for each sample by averaging the signal at all clean 450k array sites (n=442,348 CpGs). There was no significant difference in average array-wide DNAm by MTHFR group (Table 4.3). Because the association of MTHFR genotype with disease risk is variable, altered genome-wide DNAm might not be a characteristic of all variant carriers. Thus, lastly, the percentage of outlier CpG sites was calculated for each sample. This type of approach has previously been used in DNAm studies of other heterogeneous populations to identify individuals exhibiting outlying patterns of DNAm [225]. There was a trend for more outlying CpG sites in the variant 1298 group compared to the reference group (Table 4.3, p=0.01).

Figure 4.7 Unsupervised hierarchical clustering of MTHFR samples using 442,348 CpGs. Stability of clusters was calculated using multiscale bootstrap resampling (1,000 iterations) and is indicated by the approximately unbiased (AU) p-value above each branch. AU values >95 indicate stability at a p-value of approximately less than 0.05. There was no clustering based on known sample characteristics, as indicated by coloured boxes below the graph; fetal sex: male, blue; female, pink; ethnicity: non-Caucasian, grey; Caucasian, white; mode of delivery: C-section, grey; vaginal, white.
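The two tests behind the p-values in Tables 4.1 and 4.3 can be sketched in plain Python for small samples. Assumptions: the Mann-Whitney version is the exact two-sided test for data without ties (R's wilcox.test falls back to a normal approximation with ties or larger samples), and the Fisher test sums all 2 x 2 tables with the observed margins that are no more probable than the observed table. Function names are hypothetical; the sex counts in the usage line are taken from Table 4.1 purely as example input.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def _n_orderings(m, n, u):
    """Orderings of m x-values and n y-values in which exactly u (x, y)
    pairs have x > y."""
    if u < 0:
        return 0
    if m == 0 or n == 0:
        return 1 if u == 0 else 0
    # The largest value is either an x (beating all n y's) or a y (beating none).
    return _n_orderings(m - 1, n, u - n) + _n_orderings(m, n - 1, u)


def mann_whitney_exact(x, y):
    """Exact two-sided Mann-Whitney U test; assumes no tied values."""
    m, n = len(x), len(y)
    if len(set(x) | set(y)) < m + n:
        raise ValueError("exact sketch assumes no ties")
    u = sum(1 for xi in x for yj in y if xi > yj)
    total = comb(m + n, m)
    p_le = sum(_n_orderings(m, n, k) for k in range(u + 1)) / total
    p_ge = sum(_n_orderings(m, n, k) for k in range(u, m * n + 1)) / total
    return u, min(1.0, 2.0 * min(p_le, p_ge))


def fisher_exact_2x2(table):
    """Two-sided Fisher's exact test for a 2 x 2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Sum tables no more probable than the observed one (small relative tolerance).
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


print(mann_whitney_exact([4, 5, 6], [1, 2, 3]))   # (9, 0.1)
# Male/female counts: reference 4/6, variant 1298 5/5 (Table 4.1).
print(fisher_exact_2x2([[4, 6], [5, 5]]))
```

The recursion counts how often each U value arises among all C(m+n, m) equally likely orderings under the null, which is what makes the test exact for small groups such as n=10.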
Table 4.3 Array-wide measures of DNA methylation

Measure | reference (n=10) | variant 677 (n=10) | variant 1298 (n=10) | p-value (a)
average DNAm; median (range) | 0.407 (0.396 to 0.413) | 0.404 (0.396 to 0.409) | 0.405 (0.395 to 0.413) | ns, ns
percent outlier sites; median (range) | 0.665 (0.267 to 1.01) | 0.852 (0.262 to 6.341) | 1.14 (0.499 to 3.33) | ns, 0.01

(a) p-values calculated by Mann-Whitney test for the comparison of variant 677 to reference and of variant 1298 to reference. DNAm, DNA methylation; ns, not significant.

4.3.4 Differential methylation of individual CpG sites array-wide

Next, I explored whether any individual CpG sites showed changes in DNAm associated with the variant MTHFR genotypes. The flat distribution of p-values from the 442,348 linear models fit to the clean data (Figure 4.4, graph 6) indicated that differentially methylated (DM) sites were unlikely to remain after correction for multiple comparisons. Using the FDR method, adjusted p-values were plotted against the group difference in DNAm (deltaβ) for each 450k probe (Figure 4.8). No sites passed typical EWAS thresholds of FDR <0.05 and deltaβ ≥0.05. Only one site passed a lenient threshold of FDR <0.20 (Figure 4.8, right).

Figure 4.8 Array-wide volcano plots. Volcano plots comparing the magnitude of difference in DNAm (adjusted delta beta) to statistical significance (-log10(adjusted.P.Value)) for each CpG site (n=442,348) in the comparison of the variant 677 to reference groups (left) and variant 1298 to reference groups (right).

Two techniques were used to explore whether identification of differences in DNAm between MTHFR groups was limited by the small sample size or the large number of tested sites. Due to structural or functional differences, some areas of the genome may be more vulnerable to gains or losses of DNAm.
Thus, first, the classification system annotated in Section 3.2.4 was used to separate 450k probes into four groups based on the density of CpGs in the surrounding region: high density islands, island shores, intermediate density islands and non-islands. The unadjusted p-value distributions separated by density group followed the same flat trend identified in the array-wide analysis (Figure 4.9). This dimension reduction technique did not identify any CpG class as specifically associated with a change in DNAm by MTHFR genotype.

Figure 4.9 Distribution of unadjusted p-values from linear models separated by CpG density. No CpG density category showed a trend for differential methylation by MTHFR genotype group.

Assessing DM regions (DMRs) rather than DM CpG sites between the variant and reference groups was the second dimension reduction technique applied. Testing for DMRs reduces the number of tests requiring multiple comparison correction and integrates information across neighbouring CpGs. Each of the variant MTHFR groups was compared to the reference group to identify DMRs covering more than three CpG sites and with a deltaβ greater than 0.075 across the DMR. In the comparison of the variant 677 to reference group, 122 DMRs were identified, while 100 DMRs were identified in the comparison of the variant 1298 to reference group. However, none of these DMRs withstood correction for multiple comparisons. In summary, both dimension reduction approaches, together with the flat volcano plots, illustrate a lack of differential methylation between MTHFR genotypes in this pilot study.

4.4 Discussion

“Batch effects” is the term used to describe systematic technical variability in data [226].
When batches, such as processing date or lab, are confounded with the variable of interest, like presence of disease, differences between groups may be identified that are, in fact, experimental artifacts [165]. Akey et al. [227] highlighted a striking example of this when reanalyzing a publicly available dataset comparing gene expression between two ethnicities [228]. Akey and colleagues found that the majority of arrays for European participants were run two years prior to those for Asian participants. The reanalyzed data indicated that the near-complete confounding of year with ethnicity was the likely source of the >4,000 differentially expressed genes identified by the original study [227].

A 2007 review of published gene expression microarray data identified batch effects as one of the top three sources of data variability in eight of nine examined studies [226]. The authors explain that most, if not all, high-throughput datasets contain batch effects, and that in many cases this unwanted signal is the primary source of variation across samples. In the mid-2000s, the impact of batch effects in gene expression microarray studies was considered so large that some questioned whether this technology could yield reliable, reproducible data [213]. Thus, in 2006, a consortium of scientists and organizations ran the MicroArray Quality Control (MAQC) project to systematically test batch effects in gene expression microarray data [229]. The same set of biological samples was run in replicate in different labs and on different microarray platforms. From this project, guidelines for standardized processing, reporting and analysis of gene expression microarray data were developed, including tools for the removal of batch signal. Awareness of the importance of identifying and removing batch effects should be as widespread in DNAm microarray studies as it is in gene expression microarray studies.
In this pilot MTHFR 450k study, both chip and row were identified as sources of batch effects, which have also been noted in other studies [7, 165]. Though all seven chips were processed at the same time by the same operators, it is impossible for each step in the three-day protocol to be applied simultaneously to all chips. On the second day of array processing, chips stand vertically for approximately three hours while successive stains and washes are applied and drained. Slight differences in the timing and length of exposures across chips and rows may account for the batch variation I observed. The R tool ComBat was chosen to correct for chip and row effects, as it has previously been applied to DNAm data [7, 165] and was specifically developed for small sample sizes [230]. In my first application of ComBat to the original 30 study samples, I found 9,683 and 19,192 DM CpG sites for the two MTHFR variant comparisons, despite there being no differences in the data prior to correction. Buhule and colleagues recently reported similar concerns in a 450k study comparing whole blood from obese and non-obese adults, which identified ~400 DM sites before and ~4,000 DM sites after ComBat correction [165]. These authors demonstrated that their findings were introduced by the application of ComBat to an unbalanced experimental design (i.e., all obese samples were run together and all lean samples were run together), and that this could be avoided by stratified randomization of samples. The findings of the current study further indicate that with poor experimental design (in this case, a small number of samples spread across many chips), the use of ComBat may in fact introduce false biological signal into the data.
By making some changes to processing and including a greater number of samples, I was able to successfully reduce the batch signal, which revealed very little variability in the DNAm data due to the biological variable of interest: placental MTHFR genotype.

The placenta plays the crucial role of extracting and concentrating folate from maternal blood for transfer to the fetus [81, 84]. In a study of placental MTHFR expression in uncomplicated human pregnancies, placentas of the 677TT genotype were found to have about one-third the MTHFR activity of 677CC placentas [231]. Studies investigating the association between MTHFR genotype and DNAm have reported global losses of DNAm in the blood of 677 and 1298 variant homozygotes [206-208]; however, there are no such studies in placenta. A genome-wide loss of DNAm fits with the hypothesized mechanism that compromised OCM due to reduced MTHFR function elevates homocysteine levels, leading to elevated S-adenosylhomocysteine (SAH) and inhibition of DNMTs (Figure 1.3) [81, 84]. However, in this study, I found no significant alteration of DNAm by placental MTHFR genotype.

The results presented in this chapter must be interpreted with caution, given this study’s limitations. Though the proportion of Caucasian vs. Asian participants did not differ significantly between groups, the study population was likely to be heterogeneous for other factors, e.g., other genetic polymorphisms, diet, lifestyle choices and socioeconomic status. In well-designed studies containing hundreds or thousands of samples, the effect of these factors may be somewhat overcome. However, with small sample sizes and limited clinical information, these factors may overwhelm differences in DNAm by MTHFR genotype. Given the pilot nature of this study, only ten samples were assayed per genotype group, and thus statistical power was low, between about 0.20 and 0.35.
Thus, if an array-wide pattern of altered DNAm was present, the sample size was large enough that about 20-35% of differences should have been detected (i.e., if there were 100 true differences in DNAm, about 20 to 35 should have been detected). Figure 4.8 illustrates extremely flat volcano plots; only one CpG site passes the very lenient FDR threshold of 0.2. While changes in DNAm at a small number of CpG sites cannot be excluded, this pilot study suggests, at the very least, that large-magnitude and/or array-wide differential methylation does not occur by MTHFR genotype. Furthermore, the 450k array is biased towards assessing gene promoters and CpG islands (Section 1.3.7), leaving many CpGs in repetitive elements and regulatory regions unexamined by this study. Thus, the association of MTHFR genotype with DNAm in regions not covered by the 450k array remains unclear.

As previously mentioned, several studies noted effects of MTHFR alleles on DNAm only under limited folate conditions, and folate status was unknown for the cases used in this study. In the presence of adequate folate levels, should MTHFR function be reduced and homocysteine (hcy) levels rise, hcy can not only be converted into SAH but can also be removed by conversion to methionine in the placenta (see Figure 1.3) [81]. It is plausible that in Canada, healthy carriers of variant MTHFR alleles might not manifest compromised OCM: with folic acid fortification since 1998, high uptake of gestational monitoring and increased literacy around healthy pregnancies, the population is largely folate replete [232]. As OCM is intertwined with other biochemical cycles, the balance of substrate availability with enzymatic function, including but not limited to MTHFR, may be the key to an individual’s capacity to methylate DNA.
Chapter 5: Profiling placental and fetal DNA methylation in human neural tube defects

A version of this chapter is in press (see Preface for contribution details):

Price EM, Peñaherrera MS, Portales-Casamar E, Pavlidis P, Van Allen MI, McFadden DE, Robinson WP. Profiling placental and fetal DNA methylation in human neural tube defects. Epigenetics Chromatin. 2016. 9:6.

5.1 Introduction

Alteration to DNA methylation (DNAm) has become a popular factor to consider in the etiology of multifactorial disease, as a mechanism by which the environment may interact with DNA. To successfully study DNAm in such conditions, careful consideration of sample type and research question is essential, both in the initial choice of platform and in downstream analytical decisions. To demonstrate the use of the approaches and tools developed thus far, in this chapter I profile DNAm in a congenital multifactorial disorder with a strong hypothesis for aberrant DNAm on the genome-wide scale: neural tube defects (NTDs). In Canada, the incidence of NTDs declined by 46% after the implementation in 1998 of mandatory folic acid (FA) fortification of cereal and grain products [105]. A hypothesis of altered capacity for DNAm has been proposed as the mechanism underlying the prevention of NTDs through FA [205], which also provides an explanation for the excess of female NTDs [111]. This hypothesis was based on the observation of increased incidence of NTDs in association with maternal and fetal variant MTHFR 677 genotype [114], which is expected to shift OCM towards the DNA cycle, thereby restricting methylation capacity [205]. Furthermore, studies in mice show that the de novo DNA methyltransferase Dnmt3b is expressed in mouse neural folds during neurulation, and that null mutants develop NTDs [233]. Studies from countries without FA fortification also lend support to the hypothesis.
A slight reduction in genome-wide DNAm was noted in NTD cases in China [104, 136, 234], in addition to changes in NTD DNAm at specific genes relevant to fetal development in China and Belgium: imprinted [235], planar-cell polarity [236, 237], HOX [238] and folate receptor genes [239]. Some of these studies are, however, undermined by technical issues such as small magnitudes of change in DNAm or a lack of statistical correction for multiple comparisons. Despite research, medical interventions and policy changes, NTDs remain the second most common congenital abnormality in many parts of the world [240], and the etiology of FA-resistant cases is unknown. This chapter profiles DNAm of NTDs in British Columbia, a folate-replete population [232]. However, insufficiency of vitamin B12 (a cofactor in OCM) has been noted in Canadian women of childbearing age [241], and low maternal serum B12 levels have been associated with an increased risk for NTDs [242, 243]. I therefore considered that, in folate-replete populations, the same mechanism through which FA fortification has contributed to reducing the incidence of NTDs may be dysregulated by a different pathway, and may also manifest as abnormal DNAm. A unique set of multiple tissues was collected from 2nd trimester human fetuses with spina bifida (SB, n=22) or anencephaly (AN, n=15), in addition to controls (CON, n=19), to explore DNAm in repetitive elements and with the Illumina Infinium HumanMethylation450 BeadChip (450k array) platform. It was reasoned that observed changes in DNAm in NTDs might be restricted to a single disease-affected tissue (spinal cord or brain) or support tissue (chorionic villi). Alternatively, abnormal DNAm may present in multiple tissues (including those peripheral to the defect, like kidney or muscle), as an imprint of an insult early in embryo development.
Due to the essential role that products of OCM play in cell proliferation, differentiation and migration [91, 244], NTDs might be associated with (i) changes in DNAm at a specific subset of loci (which may contribute to the development of NTDs or be a direct consequence of a causative pathway), or (ii) widespread changes in DNAm.

5.2 Materials and methods

5.2.1 Ethics approval

Ethics approval for this study was obtained from the University of British Columbia/Children’s Hospital and Women’s Health Centre of British Columbia Research Ethics Board (certificate number: H10—1028). Written consent was obtained for cases ascertained prior to pregnancy termination (n=8). For cases obtained retrospectively by pathological autopsy (n=48), biospecimens were de-identified and unlinked to clinical data. For all cases, only non-identifiable information is presented.

5.2.2 Sample collection

Tissue samples from 2nd trimester (14-26 weeks gestational age) stillbirths, elective terminations and spontaneous abortions were collected by the Embryo-Fetal Pathology laboratory at the B.C. Women’s Hospital & Health Centre, Vancouver, Canada. Exclusion criteria for control cases included chromosomal abnormality, congenital or brain abnormality, or grossly abnormal placenta. The use of second trimester fetal tissues required the inclusion of samples from complicated pregnancies. To mitigate this, multiple pathologies were included to obtain a heterogeneous reference control group and minimize the likelihood of significant findings in association with any given pathology. For controls with known mode of fetal demise (n = 16 of 19), the distribution was as follows: five cases of preterm premature rupture of membranes (PPROM), three cases of chorioamnionitis, two cases of oligohydramnios, and one case each of cervical incompetence, copper intrauterine device (IUD), severe IUGR, intrauterine fetal demise, spontaneous abortion and hypoplastic left heart syndrome.
In the NTD groups, chromosomally normal cases with an isolated spina bifida or anencephaly were included, and the mode of termination was heterogeneous within each NTD status group. Whenever possible, a sample of fetal kidney (cortex and medulla), brain (cortex), spinal cord (thoracic region, or superior to the lesion in SB), muscle (psoas) and placental chorionic villi was obtained for each of 22 SB-affected fetuses, 15 AN-affected fetuses and 19 CON fetuses (see Table 5.1 and Supplementary Table 5.1); a total of 187 samples were run on the 450k array. Placental chorionic villi were obtained from an additional 9 SB, 11 AN and 9 CON cases for follow-up pyrosequencing analyses (Supplementary Table 5.2). DNA was extracted from samples by a standard salting-out method modified from [138]. Previous studies in the Robinson lab have shown that DNAm varies across the placenta [1, 212]; thus, to obtain a more accurate assessment, two independent sites (one proximal to the cord insertion and one midway between the cord and placental edge) were taken from the fetal side of each placenta, and the extracted DNA was combined for DNAm analyses.

Table 5.1 Clinical characteristics of cases

                                     control (CON)      spina bifida (SB)  anencephaly (AN)   p-value^a
                                     n=19               n=22               n=15               (CON vs. SB; CON vs. AN)
fetal GA (wks); median (range)       19.0 (14.5-23.9)   21.8 (19.4-23.7)   20.0 (16.7-23.3)   0.0004, ns
maternal age (yrs); median (range)   30.0 (21.0-41.0)   30.0 (20.1-40.5)   30.4 (22.8-37.3)   ns, ns
fetal sex; n male (% male)           7 (37)             16 (73)            5 (33)             0.03, ns
fetal MTHFR 677 genotype; n (%)                                                               ns, ns
  CC                                 12 (63)            11 (50)            6 (40)
  CT                                 7 (37)             7 (32)             8 (53)
  TT                                 0 (0)              4 (18)             1 (7)
fetal MTHFR 1298 genotype; n (%)                                                              ns, ns
  AA                                 8 (42)             14 (64)            6 (40)
  AC                                 9 (47)             6 (27)             8 (53)
  CC                                 2 (11)             2 (9)              1 (7)
available tissues; n (%)
  chorionic villi                    16 (84)            22 (100)           14 (93)
  kidney                             16 (84)            20 (91)            8 (53)
  spinal cord                        9 (47)             17 (77)            6 (40)
  brain                              11 (58)            9 (41)             -
  muscle                             13 (68)            10 (45)            8 (53)

a p-values calculated by Mann-Whitney test for continuous variables and Fisher’s exact test for categorical variables. GA, gestational age; wks, weeks; yrs, years; ns, not significant.

5.2.3 Case characteristics

Clinical case characteristics are presented in Table 5.1. SB and AN were compared as separate NTD status groups throughout this study, as they may have distinct etiologies [108]. Although the ranges of gestational age overlapped, CON cases were younger than SB cases at delivery (median 19.0 vs. 21.8 weeks, p=0.0004) and included fewer males (37% vs. 73%, p=0.03). These clinical characteristics did not differ between the AN group and CON (Table 5.1). A detailed list of sample information, including clinical characteristics and technical variables relating to running of the 450k array, is available in Supplementary Table 5.1.

5.2.4 MTHFR genotyping

MTHFR genotype was assessed in each case at nucleotides 677C>T (rs1801133) and 1298A>C (rs1801131). Chorionic villi were available for genotyping in all individuals but one; in the exceptional case, kidney tissue was used in lieu of chorionic villi. Primer sequences and reaction conditions can be found in Supplementary Table 5.3.
Five µL of PCR product was sequenced on a PyroMark Q96 MD Pyrosequencer (Qiagen) using standard protocols [216].

5.2.5 Illumina Infinium HumanMethylation450 BeadChip quality control and pre-processing

Genomic DNA was purified and bisulfite converted as in Section 3.2.6. Samples were randomized across three MSA-4 plates for processing following the Illumina Infinium HumanMethylation450 BeadChip protocol [70]. Raw intensities were read into Illumina GenomeStudio Software 2011.1, and background normalization was applied. On each array, Illumina includes 835 control probes to assess, for example, array staining, extension and bisulfite conversion. An initial quality control (QC) check following the Illumina protocol was performed using the control probes, with no samples, chips or batches identified as outliers. Signal intensities exported from GenomeStudio were read into R statistical software [141] using lumi [72] and converted to M values. Extensive QC was conducted to check sample identity using: (i) clustering of samples originating from the same individual with the 65 450k SNP probes; (ii) clustering of samples by sex with 450k chromosome X and Y probes; and (iii) clustering of samples with their respective tissue using all autosomal probes. As no mislabeled samples were found, sample quality was next assessed using: (i) the number of probes with a detection p-value >0.01; (ii) the number of probes with <3 bead replicates; and (iii) the average sample intensity. Four samples were identified as outliers based on these sample quality checks and removed from further analyses.

Probe filtering was next conducted to eliminate systematically poor-quality probes (detection p-value >0.01 in >20% of samples or <3 bead replicates in >20% of samples; n=587), probes targeting the sex chromosomes (n=11,345), polymorphic probes (n=20,573), and probes that potentially cross-hybridize to the sex chromosomes (n=10,672), as annotated in Chapter 3.
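The probe-level quality filter above can be expressed as a boolean mask. This minimal numpy sketch assumes that detection p-values and bead counts are available as probe-by-sample matrices and that the annotation-based exclusions (sex-chromosome, polymorphic and cross-hybridizing probes) have already been collected into a set of probe IDs; the function name and the `cg_a`-style IDs are hypothetical.

```python
import numpy as np


def probes_to_remove(detection_p, bead_count, probe_ids, excluded_ids,
                     p_cut=0.01, min_beads=3, max_fraction=0.20):
    """Return a boolean mask over probes (True = drop).

    detection_p, bead_count: arrays of shape (n_probes, n_samples).
    excluded_ids: annotation-based exclusions, e.g. sex-chromosome,
    polymorphic or cross-hybridizing probes (assumed precomputed).
    """
    # fraction of samples in which each probe fails each criterion
    frac_bad_p = (detection_p > p_cut).mean(axis=1)
    frac_low_beads = (bead_count < min_beads).mean(axis=1)
    poor_quality = (frac_bad_p > max_fraction) | (frac_low_beads > max_fraction)
    annotated_out = np.isin(probe_ids, list(excluded_ids))
    return poor_quality | annotated_out


# Hypothetical example: 3 probes x 5 samples.
detection_p = np.array([[0.001] * 5,
                        [0.02, 0.05, 0.001, 0.001, 0.001],  # fails in 40% of samples
                        [0.001] * 5])
bead_count = np.array([[10] * 5,
                       [10] * 5,
                       [2, 10, 10, 10, 10]])               # low beads in 20%: not >20%
drop = probes_to_remove(detection_p, bead_count, ["cg_a", "cg_b", "cg_c"], {"cg_a"})
# drop -> [True, True, False]: cg_a is annotation-excluded, cg_b fails quality
```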
Colour correction [72] for red-green colour channel bias and SWAN normalization [75] to correct for Type I/Type II probe bias were applied. M values were replaced with missing values in the remaining probe-sample pairs with detection p-values >0.01 or <3 bead replicates.

Principal component analysis was used to detect batch effects within each tissue. MSA-4 plate, Sentrix_row (i.e., chip row) and Sentrix_ID (i.e., chip ID) were found to be associated with variability in the dataset and were subsequently corrected for using ComBat [214]. Prior to correction, an additional 244 probes were filtered from all samples because fewer than two values were available for one of the three batch variables, resulting in 183 samples and 442,156 probes. Successive rounds of batch correction were applied, starting with MSA-4 plate, followed by Sentrix_row and then Sentrix_ID.

The correlation within each of two replicate pairs (one chorionic villi and one kidney pair) was used as a QC metric throughout pre-processing of the dataset: it was r=0.9953467 (chorionic villi) and r=0.989889 (kidney) in the raw data, and r=0.9959680 (chorionic villi) and r=0.9947478 (kidney) in the batch-corrected data. After removal of one sample of each replicate pair and of the 65 SNP probes, the final dataset included 442,091 “clean probes” and 179 samples: 52 chorionic villi, 44 kidney, 32 spinal cord, 20 brain and 31 muscle (Supplementary Table 5.1). Filtered and raw data for the 179 samples were deposited in NCBI’s Gene Expression Omnibus (GEO) [245] and are accessible through GEO series accession number GSE69502 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE69502).
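Masking unreliable probe-sample pairs and computing the replicate correlation used as a QC metric might look like the following sketch. The text does not state which correlation coefficient was used for the replicate pairs, so Pearson’s r is an assumption here, as are the function names and toy values.

```python
import numpy as np


def mask_unreliable(m_values, detection_p, bead_count, p_cut=0.01, min_beads=3):
    """Set M values to NaN where detection p > p_cut or bead count < min_beads."""
    masked = np.array(m_values, dtype=float)  # copy, so the input is untouched
    masked[(detection_p > p_cut) | (bead_count < min_beads)] = np.nan
    return masked


def replicate_r(x, y):
    """Pearson correlation (assumed) between two replicate profiles, skipping NaNs."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    both = ~np.isnan(x) & ~np.isnan(y)
    return float(np.corrcoef(x[both], y[both])[0, 1])


# Hypothetical toy data: 2 probes x 2 samples.
m = np.array([[1.0, 2.0], [3.0, 4.0]])
p = np.array([[0.001, 0.5], [0.001, 0.001]])
beads = np.array([[10, 10], [1, 10]])
masked = mask_unreliable(m, p, beads)  # -> [[1.0, nan], [nan, 4.0]]
r = replicate_r([1.0, 2.0, np.nan, 4.0], [2.0, 4.0, 5.0, 8.0])  # about 1.0
```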
5.2.6 Probe to gene annotation

For gene-based analyses, a single gene name was annotated to each 450k CpG site in the following manner: (i) sites with no Illumina-annotated UCSC_refgene_name were annotated as NA; (ii) sites with one or more gene name entries in the Illumina-annotated UCSC_refgene_name, where all gene names were identical, were annotated to the given gene; and (iii) sites with multiple differing gene name entries in the Illumina-annotated UCSC_refgene_name were annotated to the gene with the closest transcription start site (TSS), based on the Closest_TSS_gene_name column of the Price et al., 2013 (GPL16304) annotation [168]. Probes that fell into category (i) were distant from TSSs. These were not annotated to a gene, as the regulation of DNAm at these CpG sites may not be determined by the closest TSS.

5.2.7 Differential DNA methylation analyses

Differential methylation (DM) was assessed within each tissue by fitting a linear model to M values for each CpG site using limma [218]. In modelling DNAm, NTD status (CON, SB, AN) was used as the main effect, and fetal sex and gestational age were included as additive covariates. DM results were extracted for the comparisons of SB to CON and AN to CON. Resulting p-values were adjusted with the Benjamini & Hochberg [219] false discovery rate (FDR) method, using the topTable function in limma.

M values corrected for fetal sex and gestational age were transformed to β values using lumi [72]. For every CpG site, average DNAm was calculated within each tissue as the mean β value for each of CON, SB and AN, with missing data points removed prior to calculation. Group differences in DNAm (deltaβ) were then calculated by subtracting the CON average β from each of the SB and AN average β per CpG site. Significant DM CpG sites were defined as those with an FDR <0.05 and deltaβ ≥0.05.
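The single-gene annotation rules of Section 5.2.6 and the DM call of Section 5.2.7 can be sketched in Python as follows. This is an illustration, not the limma pipeline used here: it takes per-CpG p-values as given (in this chapter they come from limma’s linear models), applies the Benjamini-Hochberg adjustment, converts M values with β = 2^M/(2^M + 1), and applies the deltaβ cutoff to the absolute difference, which is an assumption so that both gains and losses of DNAm are counted. The semicolon-separated format of UCSC_RefGene_Name and all function names are likewise assumptions.

```python
import numpy as np


def annotate_gene(refgene_names, closest_tss_gene):
    """Collapse a UCSC_RefGene_Name entry to one gene name (rules i-iii).

    Assumes semicolon-separated entries; returns None (NA) if unannotated.
    """
    names = [n for n in (refgene_names or "").split(";") if n]
    if not names:
        return None              # (i) no RefGene annotation -> NA
    if len(set(names)) == 1:
        return names[0]          # (ii) all entries name the same gene
    return closest_tss_gene      # (iii) differing genes -> closest-TSS gene


def m_to_beta(m):
    """Convert M values to beta values: beta = 2^M / (2^M + 1)."""
    m = np.asarray(m, dtype=float)
    return 2.0 ** m / (2.0 ** m + 1.0)


def bh_fdr(pvals):
    """Benjamini-Hochberg adjusted p-values."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # enforce monotonicity, working down from the largest p-value
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    fdr = np.empty(n)
    fdr[order] = np.minimum(scaled, 1.0)
    return fdr


def call_dm(pvals, beta_case, beta_con, fdr_cut=0.05, min_delta=0.05):
    """Flag CpGs with FDR < fdr_cut and |delta beta| >= min_delta (abs is assumed).

    beta_case, beta_con: arrays (n_cpgs, n_samples) of covariate-corrected betas.
    """
    delta = np.nanmean(beta_case, axis=1) - np.nanmean(beta_con, axis=1)
    fdr = bh_fdr(pvals)
    return (fdr < fdr_cut) & (np.abs(delta) >= min_delta), delta, fdr


# Hypothetical example: three CpGs, two samples per group.
dm, delta, fdr = call_dm([0.001, 0.001, 0.5],
                         np.array([[0.60, 0.60], [0.52, 0.52], [0.90, 0.90]]),
                         np.full((3, 2), 0.50))
# dm -> [True, False, False]: CpG 2 passes FDR but its delta beta is only 0.02
```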
5.2.8 Biologically relevant candidate CpG sites analysis

DM analysis was conducted as outlined in Section 5.2.7, using only a subset of biologically relevant candidate CpG sites. Biologically relevant candidate CpG sites were chosen as those mapping to genes that: (i) when mutated in mice, have been shown to result in NTDs [246]; (ii) are thought to be associated with human cases of NTDs [247]; or (iii) are annotated to the GO term “one-carbon metabolic process” (GO:0006730). A total of 340 genes met one or more of these criteria, and the 8,393 probes associated with these genes were used in the biologically relevant candidate CpG site analysis.

5.2.9 Genome-wide analyses – 450k array

Unsupervised hierarchical clustering was performed using all clean, gestational age- and sex-corrected data (n=442,091 CpG sites, n=179 samples). Dissimilarity between samples was calculated as Euclidean distance, and clusters were agglomerated using average linkage. An array average DNAm was calculated for each sample by averaging DNAm at all clean CpG sites corrected for gestational age and sex (n=442,091), with missing data points removed prior to calculation. The percentage of outlier probes per sample was calculated in two steps. First, the number of outlier probes per sample was calculated within each tissue; a probe was considered an outlier in a sample if its value was more than three median absolute deviations from the probe median for all samples in the given tissue [221]. Second, the number of outlier probes per sample was normalized by dividing it by the total number of CpG sites with data (i.e., not NA) in the given individual.

5.2.10 DNA methylation assessment by pyrosequencing

Genomic DNA from each sample was bisulfite converted using the EZ Gold DNA methylation kit (Zymo, Irvine, CA, USA), following the manufacturer’s protocols. Primer sequences and reaction conditions for all pyrosequencing assays can be found in Supplementary Table 5.3.
Within an assay, samples were randomized across pyrosequencing plates to reduce technical bias. For quality control, synthetic fully methylated and unmethylated samples (EpigenDx, Hopkinton, MA, USA) were included on each plate. Pyrosequencing was conducted following standard protocols [248].

Repetitive elements: DNAm was averaged over four CpG sites in the L1 promoter and three CpG sites in Alu to assess repetitive element DNAm [120, 134]. The measurement was repeated for samples with a single peak height <75 or an SD >10% between assayed CpGs.

Chorionic villi candidate follow-up: four CpG sites (cg1098862, cg02413938, cg17343385, cg24666096) were identified as DM in the 450k array comparison of AN to CON. Two of these sites (cg1098862, cg02413938) were followed up by pyrosequencing in an extended group of samples, since they were close to genes that may be of interest in NTDs. The other two sites were not followed up: cg17343385 overlaps a documented SNP (rs111359627, dbSNP141), which likely accounts for the observed difference in DNAm, and cg24666096 is >30 kb from the closest TSS. DNAm by pyrosequencing was significantly correlated with array DNAm in the set of samples run on the 450k array (cg1098862, r=0.84 (p<2.2e-16); cg02413938, r=0.37 (p=0.008)).

5.2.11 GO analysis

Gene ontology (GO) analysis was performed using ErmineJ [249]. For each tissue by NTD status comparison, genes associated with 450k CpG sites were ranked by unadjusted p-value (smallest to largest) from the fitted linear models described in Section 5.2.7. Enrichment for GO classes was tested against the background of clean 450k data using the following ErmineJ settings: gene score resampling method, minimum gene set size of 10, maximum gene set size of 200, best score for gene replicates, median class scoring, 200,000 iterations and full resampling.
GO analysis of the 748 persistent kidney SB hits was conducted using the “quick list” option of the over-representation analysis (ORA) method in ErmineJ.

5.2.12 DMR analysis

Differentially methylated region (DMR) analysis was conducted within each tissue on M values using the function dmrFind in charm [222], with the following criteria to define a DMR: a maximum gap between adjacent CpGs of 300 bp and ≥3 probes per DMR. Fetal sex and gestational age were included as additive covariates in the modelling of DMRs. Once DMRs were identified in the comparisons of SB to CON and AN to CON, a q-value correction for multiple comparisons was applied with 1,000 iterations using the qval function in charm.

5.2.13 Publicly available data and analysis

Filtered and raw data for the 179 samples used in this study were deposited in NCBI’s Gene Expression Omnibus (GEO) [245] and are accessible through GEO Series accession number GSE69502 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE69502). 450k data for five fetal kidney samples of comparable gestational age (GSM868047, GSM868048, GSM868049, GSM868050 and GSM868051; Supplementary Table 5.4) were downloaded from GEO SuperSeries GSE30654 [250]. To follow up the DM kidney SB CpGs, the CON and SB kidney samples used in this study were compared to these control GEO kidney samples (termed GEO CON). First, the SB samples were compared to the GEO CON samples to assess replication of DM at an FDR <0.05. Then CpG sites demonstrating a post hoc association with technical confounding variables were filtered out. Next, CpG sites that were DM between the two control groups were removed, leaving 1,536 CpG sites. Finally, only those CpG sites that were DM between the SB and the GEO CON samples were considered “persistent hits” (n=748).

5.2.14 Statistical software

All analyses were conducted in R statistical software [141].
P-values for Table 5.1 and Table 5.2 were calculated by Mann-Whitney test for continuous variables and Fisher’s exact test for categorical variables. For chorionic villi AN follow-up analyses, pyrosequencing and array data were compared using Spearman’s rank-order correlation. A linear model, with NTD status as the main effect and fetal sex and gestational age as additive covariates, was fit to each follow-up CpG site. Correction for multiple comparisons was applied using Bonferroni correction. Graphics were created using ggplot2 [223] in R and Adobe Illustrator CS6. All graphs were plotted using sex- and gestational age-corrected data.

5.3 Results

5.3.1 MTHFR genotyping

Two SNPs in MTHFR, 677C>T and 1298A>C, were evaluated in cases because: (i) reduced MTHFR function has been reported for 677TT (~45% of control function) and 1298CC (~68% of control function) homozygotes [193, 251]; and (ii) an association between the 677TT genotype and NTD status has been consistently reported (odds ratio of 1.8 for NTD infant carriers) [114]. In the cases used in this study, no difference in genotype frequency by NTD status was observed at the 1298 locus. A trend for increased T homozygotes at the 677 locus was present in both NTD groups compared to CON, although this did not reach statistical significance (Table 5.1). Given the small sample size of this study, additional genetic polymorphisms were not evaluated.

5.3.2 Differential methylation of biologically relevant candidate CpG sites

Specific disease-relevant loci may play a role in the development of NTDs through altered gene expression. These include genes for which mutations in mice lead to the development of NTDs, in addition to loci involved in one-carbon metabolism. As DNAm can either affect or reflect gene expression, as a first step, I assessed DNAm at 8,393 candidate 450k CpG sites in 340 biologically relevant genes.
Within each tissue, a linear model was fit to every biologically relevant CpG site to test for differential methylation by NTD status, while controlling for fetal sex and gestational age. Differentially methylated (DM) CpG sites were those identified at FDR <0.05 with an average group difference in DNAm (deltaβ) ≥0.05 when comparing SB to CON or AN to CON (see Section 5.2.7). DM CpG sites were detected only in the kidney comparison of SB to CON (n=65, 0.8% of biologically relevant candidate CpG sites), while comparisons in all other tissues did not meet the DM criteria (Supplementary Table 5.5). Thus, aberrant DNAm of these biologically relevant candidate loci is not a main feature of these NTD cases, with the exception of SB kidneys.

5.3.3 Genome-wide DNA methylation

Demand for one-carbon units is high during times of rapid cell proliferation, development and migration [91], such as in early embryo development. If OCM cycling is disrupted in persisting NTD cases, widespread changes in DNAm might occur, either throughout the genome or at specific sensitive sites such as repetitive elements [252]. I took several approaches to address this hypothesis. First, unsupervised hierarchical clustering of clean 450k CpG sites (n=442,091 CpGs, see Section 5.2.9) gave an overall view of the relationship between study samples (Figure 5.1). As expected, samples clustered primarily by tissue type, not by individual of origin. Tissues originating from the inner cell mass clustered closer together (i.e., brain and spinal cord of ectodermal origin; muscle and kidney of mesodermal origin), while chorionic villi (of mixed trophectodermal/chorionic mesodermal origin) clustered further away from the four somatic tissues. Notably, even within a tissue, there was no clear division of samples by NTD status, although a cluster of eight SB and one AN sample was noted in kidney, and a cluster of twelve SB and two CON samples in spinal cord.

Figure 5.1 Sample clustering based on array-wide DNA methylation.
Unsupervised hierarchical clustering of 442,091 CpG sites clustered samples primarily by tissue type. Even within a tissue, samples did not cluster by NTD status. An SB-dominant cluster in kidney (8 SB, 1 AN) and an SB-dominant cluster in spinal cord (12 SB, 2 AN) are visible.

Not all NTD cases may exhibit aberrant genome-wide DNAm; the multifactorial inheritance of NTDs suggests that the etiology of this disease is heterogeneous. Second, therefore, I calculated: (i) the 450k array average DNAm per sample (Table 5.2, Supplementary Figure 5.1) and (ii) the percentage of outlying CpG sites per sample (Supplementary Figure 5.2). Using 1,000 permutations of the Mann-Whitney test, a slight reduction in array average DNAm was noted in chorionic villi for SB vs. CON (-0.005, p<0.01) and AN vs. CON (-0.007, p<0.05) (Table 5.2). Comparing between NTD status groups, there was no difference in the number of outlier samples based on either of these measures.

Finally, a different assay was used to measure genome-wide DNAm in all samples: pyrosequencing of the repetitive elements L1 and Alu [120, 134]. L1 and Alu repetitive elements are scattered throughout the genome, and their DNAm has been shown to be sensitive to environmental exposures [121-124]. These repetitive elements are not densely covered by the 450k array, and they were therefore used to assess DNAm at other genomic regions. There were no statistically significant differences in repetitive element DNAm by NTD status in any of the studied tissues (Table 5.2).
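The per-sample outlier percentage of Section 5.2.9 and a permutation version of the Mann-Whitney test can be sketched with numpy as below. Two details are assumptions because the text does not specify them: whether the MAD was scaled (R’s mad() multiplies by 1.4826 by default, exposed here as `mad_constant`), and how the permutation p-value was formed (here a two-sided statistic |U - n1*n2/2| with an add-one correction). Function names and toy values are hypothetical.

```python
import numpy as np


def outlier_percentage(beta, k=3.0, mad_constant=1.0):
    """Percent of each sample's non-missing CpGs that are outliers in one tissue.

    beta: array (n_cpgs, n_samples) for a single tissue, NaN = missing.
    A value is an outlier if it lies more than k MADs from its probe median.
    mad_constant=1.0 uses the raw MAD (assumption; R's mad() uses 1.4826).
    """
    med = np.nanmedian(beta, axis=1, keepdims=True)
    mad = mad_constant * np.nanmedian(np.abs(beta - med), axis=1, keepdims=True)
    with np.errstate(invalid="ignore"):
        is_outlier = np.abs(beta - med) > k * mad  # NaN compares as False
    n_observed = (~np.isnan(beta)).sum(axis=0)
    return 100.0 * is_outlier.sum(axis=0) / n_observed


def mann_whitney_u(a, b):
    """U statistic for a vs. b, counting ties as one half."""
    a = np.asarray(a, dtype=float)[:, None]
    b = np.asarray(b, dtype=float)[None, :]
    return float(np.sum(a > b) + 0.5 * np.sum(a == b))


def permutation_mwu_p(a, b, n_perm=1000, seed=0):
    """Two-sided permutation p-value for U (add-one form is an assumption)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    pooled = np.concatenate([a, b])
    centre = a.size * b.size / 2.0  # expected U when labels are exchangeable
    observed = abs(mann_whitney_u(a, b) - centre)
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)  # reassign group labels at random
        stat = abs(mann_whitney_u(shuffled[:a.size], shuffled[a.size:]) - centre)
        hits += stat >= observed
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: 2 CpGs x 5 samples in one tissue.
beta = np.array([[0.40, 0.42, 0.44, 0.46, 0.90],
                 [0.30, 0.32, np.nan, 0.34, 0.36]])
pct = outlier_percentage(beta)  # -> [0, 0, 0, 0, 50]
p = permutation_mwu_p([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])  # complete separation
```

The add-one form keeps the permutation p-value above zero, so 1,000 permutations cannot report a p-value smaller than about 0.001.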
Table 5.2 Genome-wide DNA methylation by tissue

                       chorionic villi     kidney              spinal cord         brain               muscle
median array avg β ±SD (n)
  control              0.432 ±0.006 (16)   0.471 ±0.004 (16)   0.486 ±0.002 (9)    0.486 ±0.004 (11)   0.484 ±0.004 (13)
  spina bifida         0.427 ±0.007 (22)** 0.470 ±0.003 (20)   0.481 ±0.004 (17)*  0.482 ±0.004 (9)    0.484 ±0.003 (10)
  anencephaly          0.425 ±0.007 (14)*  0.468 ±0.003 (8)    0.485 ±0.003 (6)    -                   0.484 ±0.003 (8)
median % L1 DNAm ±SD (n)
  control              55.8 ±3.8 (16)      79.2 ±1.5 (16)      79.9 ±1.5 (9)       81.5 ±2.0 (11)      80.3 ±1.4 (13)
  spina bifida         56.7 ±3.0 (22)      78.3 ±2.1 (20)      80.9 ±1.7 (17)      83.2 ±1.4 (9)       80.9 ±1.6 (10)
  anencephaly          55.0 ±2.9 (14)      78.5 ±1.4 (8)       79.7 ±1.7 (6)       -                   80.5 ±1.4 (8)
median % Alu DNAm ±SD (n)
  control              22.6 ±1.1 (16)      24.7 ±1.2 (16)      25.0 ±0.9 (9)       25.6 ±1.6 (11)      24.2 ±1.0 (13)
  spina bifida         22.3 ±1.1 (22)      24.8 ±1.5 (20)      26.1 ±1.6 (17)      25.4 ±1.2 (9)       25.0 ±2.2 (10)
  anencephaly          22.8 ±1.3 (14)      24.1 ±2.1 (8)       25.4 ±0.9 (6)       -                   24.5 ±0.6 (8)

p-values are based on comparison of spina bifida to controls or anencephaly to controls: * p<0.05, ** p<0.01, based on 1,000 permutations of the Mann-Whitney test. SD, standard deviation; avg, average.

5.3.4 Differential methylation of CpG sites array-wide

Next, I explored whether NTDs exhibited altered DNAm at any of the CpGs targeted by the 450k array. Within a tissue, a linear model was fit per CpG site to test for differential methylation (DM) by NTD status while controlling for fetal sex and gestational age (see Section 5.2.7). The distribution of unadjusted p-values from the comparisons of SB and AN to CON cases gives a broad view of the pattern of DM by NTD status (Figure 5.2). A uniform distribution of p-values, for example in the AN spinal cord comparison, is what would be expected if no CpG sites were DM, since p-values are uniformly distributed under the null hypothesis.
Left-peaking distributions (e.g., the SB kidney or spinal cord comparisons) show an excess of small p-values relative to a uniform distribution, suggesting that a subset of loci show altered DNAm in these tissues. These differences, however, must pass correction for multiple comparisons to be considered significant CpGs of interest.

Figure 5.2 Tissue distribution of unadjusted p-values from linear modelling of differential methylation in NTDs. Distribution of p-values from the comparison of DNAm for spina bifida to control samples (left) and for anencephaly to control samples (right) at each of 442,091 CpG sites, including fetal gestational age and sex as covariates. Flat distributions are consistent with no differential methylation, while left-peaking distributions indicate an excess of small p-values, i.e., more significant tests than expected by chance.

An FDR correction was applied to the p-values obtained from the fitted linear models; additionally, for every CpG site, the average CON DNAm was subtracted from the average SB or AN DNAm (deltaβ). Plotting deltaβ against FDR for each CpG site revealed remarkably few extreme differences in most of the tissue comparisons, except for the comparison of SB kidneys to CON kidneys (Figure 5.3, Supplementary Figure 5.3). The deltaβ values on the x-axis of these volcano plots (Figure 5.3) suggest that the array-wide difference in average DNAm noted in chorionic villi in the previous section (Table 5.2) reflects small changes in DNAm across many CpGs. At a statistical threshold of FDR <0.05, 4,148 CpG sites were DM in the kidney comparison of SB to CON, one CpG site was DM in the spinal cord comparison of SB to CON and five CpG sites were DM in the chorionic villi comparison of AN to CON (Supplementary Table 5.6).
The application of a second filter, a minimum deltaβ of 0.05 to enrich for biologically meaningful differences between groups, reduced the number of DM CpG sites to 3,342 in the kidney comparison of SB to CON and to four CpG sites (Figure 5.4) in the chorionic villi comparison of AN to CON (discussed in the next section).

Figure 5.3 Spina bifida array-wide volcano plots. Volcano plots comparing the magnitude of difference in DNAm (adjusted delta beta) to statistical significance (-log10(adjusted.P.Value)) for each CpG site (n=442,091) in spina bifida vs. control samples.

Figure 5.4 Differentially methylated CpG sites in the chorionic villi comparison of anencephaly cases to controls. Four CpG sites were identified as significantly differentially methylated at an FDR <0.05 and deltaβ ≥0.05 in the chorionic villi comparison of anencephaly to controls. Each plot is labelled with the 450k CpG site identifier, the closest gene and the average difference in DNAm between anencephaly and controls (deltaβ). Box edges are plotted at the 25th and 75th percentiles (the inter-quartile range (IQR)), and whiskers extend to the most extreme sample within 1.5×IQR of the box edges.

Three approaches were taken to check that the findings of the above DM analysis were not limited by threshold cutoffs, sample size or small differences in DNAm. First, the top 1,000 ranking DM CpG sites were overlapped: (i) within a tissue across NTD status groups (Supplementary Figure 5.4), and (ii) within an NTD status group across all tissues (Supplementary Figure 5.5). There was little overlap of top-ranking DM CpG sites in either comparison. Second, for each tissue by NTD status comparison, a gene ontology (GO) analysis was conducted using all unadjusted p-values to rank genes associated with CpG sites on the array.
Only two GO classes were significant at FDR <0.10:

- “nuclear-transcribed mRNA catabolic process, nonsense-mediated decay” (GO:0000184, 111 genes), in the chorionic villi SB to CON comparison.
- “production of molecular mediator involved in inflammatory response” (GO:0002532, 13 genes), in the spinal cord AN to CON comparison.

Third, DM regions (DMRs) were assessed rather than individual DM CpG sites. This integrates information from neighbouring CpG sites and reduces the number of statistical tests. DMRs were identified in each tissue by NTD status comparison, but none remained significant after correction for multiple comparisons. These supplemental analyses, together with the array-wide volcano plots (Figure 5.3; Supplementary Figure 5.3), suggest that aberrant genome-wide DNAm is present in SB kidneys. It may not, however, be a major feature of the other NTD tissues assessed in this study.

5.3.5 Follow up of differentially methylated CpG sites in kidney and chorionic villi

To confirm the unique findings in SB kidneys, I compared the SB and CON cases used in this study to 450k data from five independent control 2nd trimester fetal kidney samples downloaded from GEO (GEO CON; Supplementary Table 5.4). Of the 4,148 CpG sites identified as DM at FDR <0.05 in the previous section, 2,644 (64%) remained DM at FDR <0.05 when the SB kidneys were compared to the GEO CON kidneys. A subset of 748 higher-confidence CpG sites, termed persistent hits, was selected for use in follow-up analyses.
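Selecting persistent hits amounts to a chain of set operations on CpG identifiers. The chain follows the four filters listed in the next paragraph. The sketch below is hypothetical. It assumes each input is a collection of CpG IDs already produced by the corresponding statistical test, and it treats filter (ii) as removing any site associated with at least one covariate. The argument names are illustrative, not taken from the original analysis code.

```python
def persistent_hits(sb_vs_con, covariate_associated, con_vs_geo_con, sb_vs_geo_con):
    """Chain four persistent-hit filters as set operations on CpG IDs (hypothetical inputs).

    sb_vs_con:            sites DM (FDR<0.05, delta-beta>=0.05) for SB vs CON kidneys
    covariate_associated: sites associated with any study covariate in a post-hoc test
    con_vs_geo_con:       sites DM (FDR<0.05) between the two control groups
    sb_vs_geo_con:        sites DM (FDR<0.05, delta-beta>=0.05) for SB vs GEO CON
    Returns the final set and the number of sites left after each filter.
    """
    step1 = set(sb_vs_con)                     # filter (i): DM in SB vs CON
    step2 = step1 - set(covariate_associated)  # filter (ii): drop covariate-associated sites
    step3 = step2 - set(con_vs_geo_con)        # filter (iii): drop sites where controls disagree
    step4 = step3 & set(sb_vs_geo_con)         # filter (iv): keep sites also DM vs GEO CON
    return step4, [len(step1), len(step2), len(step3), len(step4)]
```

The returned counts correspond to the "n of m" figures reported for each filter, so a sketch like this also serves as a check that each step shrinks the list as expected.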
Persistent hits were defined using four sequential filters:

(i) CpG sites identified as DM at FDR <0.05 and deltaβ ≥0.05 in Section 5.3.4 when comparing the SB to CON kidneys (3,342 of 4,148);
(ii) CpG sites where a post-hoc test revealed no association with study covariates, namely gestational age, sex, plate, chip or row (2,644 of 3,342);
(iii) CpG sites where the two control groups exhibited no DM at FDR <0.05 (1,536 of 2,644); and
(iv) CpG sites with DM at FDR <0.05 and deltaβ ≥0.05 between the SB and the GEO CON samples (748 of 1,536).

Persistent hits were underrepresented in high-density CpG islands (p<2.2e-16) and enriched in non-islands (p<2.2e-16) and enhancers (p<2.2e-16) (Figure 5.5A), but were not enriched for any GO terms. As expected, supervised clustering of kidney samples using only these 748 persistent hits almost entirely separated the SB from CON cases. Interestingly, however, two distinct clusters emerged within the SB cases (Figure 5.5B). These two SB clusters did not separate by sex, gestational age, location of the defect or MTHFR genotype. The persistent hits were not followed up further, as they appear to be secondary changes and are not likely linked to the etiology of NTDs.

Figure 5.5 Identification and investigation of persistent hits in spina bifida kidneys. The spina bifida (SB) and control (CON) samples were compared to an independent control group (n=5, GEO CON). This identified 748 of the differentially methylated kidney spina bifida CpG sites as “persistent hits”. (A) Persistent hits were enriched for CpG sites located in enhancers and outside of CpG islands. (B) Hierarchical clustering of the spina bifida and control samples based solely on persistent hits almost completely separated the spina bifida from control cases, and two groups of spina bifida cases emerged.

Four sites (cg10988628, cg02413938, cg17343385, cg24666096) were identified in Section 5.3.4 as DM in the AN vs. CON comparison in chorionic villi (Figure 5.4).
Pyrosequencing assays (Supplementary Table 5.3) were designed to follow up two of these CpG sites. The assays were run in the chorionic villus samples run on the 450k array (n=16 CON, 22 SB, 14 AN) and in an extended set of chorionic villus samples (n=9 CON, 9 SB, 11 AN; Supplementary Table 5.2).

The other two sites were not followed up. For cg17343385, a post-hoc search of the UCSC genome browser indicated that the probe targeted a SNP not filtered out in data processing. For cg24666096, no nearby functional genomic elements (i.e., TF binding sites, TSS or ENCODE enhancers) were annotated in the UCSC genome browser, which currently makes it difficult to interpret the biological relevance of altered DNAm at this location.

The difference in DNAm at cg10988628, 146 bp upstream of PARP1 (poly (ADP-ribose) polymerase 1), was validated by pyrosequencing in the samples run on the array (p=0.000006, Supplementary Figure 5.6). It was also replicated in the extended samples (p=0.02). The difference in DNAm at cg02413938, 21 bp upstream of ESPNL (Espin-Like, a gene of unknown function), was also validated by pyrosequencing in the samples run on the array (p=0.05). However, it was not replicated in the extended samples (Supplementary Figure 5.7A). DNAm at this CpG site appeared to be associated with genotype at a SNP 6 bp downstream (rs6431579). It is therefore possible that cg02413938 was detected in the 450k DM analysis because of an unequal genotype distribution between NTD status groups (Supplementary Figure 5.7).

5.4 Discussion

Increased methyl group availability is one of the proposed mechanisms by which folic acid (FA) fortification contributed to the worldwide reduction of NTDs. This chapter tested whether abnormal DNAm is a feature of NTDs that develop despite FA fortification. To do so, DNAm was profiled in tissues ascertained from NTD cases in British Columbia, a folate-replete population.
With the exception of distinctive DNAm in the kidneys of spina bifida (SB) cases, neither aberrant CpG site-specific nor aberrant genome-wide DNAm was characteristic of the tissues in this cohort. There were also remarkably few changes in DNAm in anencephalic (AN) fetuses. This suggests that the absence of normal fetal neural function does not have dramatic effects on placental and fetal development through the second trimester.

The tissues directly involved in the neural tube defect – spinal cord and brain – had largely normal DNAm in NTDs. A tissue peripheral to the defect – kidney – was the one dramatic exception to this pattern, with 4,148 CpG sites (FDR <0.05) identified as differentially methylated (DM) in SB. This distinct pattern may reflect cell type heterogeneity in kidney, inconsistent tissue sampling or true differences in DNAm, and it warranted further investigation. When compared to a publicly available control kidney group, 64% of the kidney SB vs. CON sites remained DM, which increases confidence in this distinctive DNAm profile. This was not, however, an ideal validation, as no independent SB group was available.

About 50% of people living with spina bifida exhibit abnormal urodynamics, which in many cases leads to severe renal complications [253]. However, only about 8-9% of SB cases present with a congenital renal abnormality at birth, such as renal agenesis, horseshoe kidney or ureteral duplication [254, 255]. All 22 SB cases in this study had histologically normal kidneys on autopsy. Grossly abnormal renal morphology is therefore unlikely to account for the DM that I observed. Caudal neural tube closure should occur at around day 28 of gestation, and the ureteric buds, which induce differentiation of the kidneys, develop at about day 30-32 [254].
Thus, an environmental disruption during this window of pregnancy that leads to an NTD might also be related to the abnormal kidney DNAm observed here [253]. Alternatively, the abnormal pattern of DNAm might be related to dysregulated innervation from the spinal cord during renal development. Sympathetic renal innervation originates from the T10 to L1 regions of the spinal cord [256] and is thought to play a role in the cellular and biochemical development of this organ during gestation [257]. Of the 20 SB kidneys, the highest intact level of the spinal cord was known for 15 cases:

- above T10: 2 cases
- T10-L1: 3 cases
- L2-L3: 7 cases
- L4-L5: 2 cases
- S1: 1 case

This distribution of defect locations, together with the lack of biological pathway enrichment in the persistent DM kidney CpG sites, suggests that abnormal kidney DNAm is unlikely to give rise to the NTD.

I predicted that placental chorionic villi might be more likely than other tissues to exhibit changes in DNAm in NTDs. Placental DNAm may be particularly sensitive to the in utero environment [62], since this organ is designed to respond to fetal, as well as maternal, signals. The placenta is also responsible for nutrient acquisition from maternal circulation during gestation. Decreased placental weight and increased placental immaturity have been reported in anencephalic pregnancies at term [258, 259]. There was a weak trend toward differential methylation in several of my chorionic villi analyses, though overall relatively little significant change in DNAm was noted in NTDs. One of the significantly DM sites identified in AN vs. CON (cg10988628) may be of interest in NTDs. It is located 146 bp upstream of poly (ADP-ribose) polymerase 1 (PARP1), a gene known to regulate trophoblast differentiation [260] and involved in ADP-ribosylation of histones [261]. Loss of function of PARP1 is associated with invasive and metastatic properties in cancer [262].
Nonetheless, the changes I noted in AN-associated placentas were subtle, and a larger sample size would be needed to confirm and characterize them.

To my knowledge, one other study has examined genome-wide DNAm in NTD placenta. It used the 450k array to compare DNAm of eight normal placentas (40 wks gestational age) to eight SB placentas (27-40 wks gestational age) [263]. The authors identified 3,839 DM sites using a nominal p-value threshold of 0.01 and a fold change threshold of 0.2. The current study used a stricter threshold for DM, FDR <0.05 and deltaβ ≥0.05, as recommended for EWAS to reduce false positive results [5]. Using these same criteria, the Robinson lab previously observed dramatic differences in the DNAm of chorionic villi in other, less severe conditions. These included preeclampsia (30,248 CpG sites) and trisomy confined to the placenta (24,621 CpG sites) [3, 264], compared with only four sites identified here in AN and none in SB. The difference in results between Zhang et al. [263] and the current study likely reflects three features of their study: no correction for multiple comparisons, a smaller sample size, and non-overlapping gestational ages of cases and controls.

My ability to detect significant differences at individual CpG sites in any of the tissues studied here may have been limited by:

(i) large variation in DNAm within a group;
(ii) heterogeneity of the CON group;
(iii) small average group differences in DNAm; and/or
(iv) sample size, given the factors in (i)-(iii).

I also did not have access to important clinical information, such as possible use of periconceptional folic acid or maternal ethnicity. Ethnic differences between cases and controls would, however, be expected to produce false positives, not false negatives. Furthermore, the same individuals contributed multiple tissues to the study. If mismatched ethnicity were driving differences in DNAm, I would therefore have expected overlap of top ranking sites across tissues (Supplementary Figure 5.5).
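The gap between a nominal threshold and a multiple-testing correction, discussed above for Zhang et al. [263], can be made concrete with simple arithmetic. If no tested site were truly different, the expected number of sites passing p <0.01 by chance is the number of tests times 0.01. This sketch rests on two illustrative assumptions. First, it assumes all 485,577 sites on the 450k array were tested; the number actually tested in that study is not stated here. Second, it ignores their fold-change filter, which would lower the count.

```python
def expected_false_positives(n_tests, alpha):
    """Expected count of tests with p < alpha when every null hypothesis is true."""
    return n_tests * alpha

def bonferroni_cutoff(n_tests, alpha=0.05):
    """Per-test p-value cutoff keeping the family-wise error rate at alpha."""
    return alpha / n_tests

# Illustrative only: assumes all 485,577 450k sites were tested at a nominal p < 0.01
print(round(expected_false_positives(485_577, 0.01)))  # sites expected by chance alone
# Per-site cutoff if Bonferroni were applied to the 442,091 sites analysed in this study
print(bonferroni_cutoff(442_091))
```

Under the global null, about 4,856 sites would pass p <0.01, the same order of magnitude as the 3,839 sites reported. An FDR correction is less stringent than the Bonferroni cutoff (about 1.1e-7 per site here), but it still guards against this kind of chance excess.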
Despite these limitations, the study was large enough to detect widespread DNAm differences, as the findings in SB kidneys show. It indicates that large and widespread differences in DNAm are not a consistent feature of nervous or placental tissues from NTDs in British Columbia, Canada. The findings were inconsistent with a generalized phenomenon of altered DNAm in NTDs that persist in a folate-replete population. The methylation hypothesis cannot, however, be excluded in other circumstances (e.g., nutritional environment, demographics, tissues) or in regions of the genome not assessed by this study. Other mechanisms may provide further clues to the etiology of NTDs that persist in folate-replete populations. These include changes in non-DNAm regulation of gene expression, altered cell cycle length and deficiency of vitamins involved in one-carbon metabolism.

Chapter 6: Discussion

6.1 Summary of dissertation

In this dissertation, I evaluated aspects of genome-wide DNAm to provide fundamental information about the biological and technical factors that affect the assessment of DNAm in pregnancy-associated samples. Genome-wide DNAm was measured by L1 and Alu repetitive elements (REs) and by the 27k and 450k DNAm microarrays. Although both L1 and Alu are used as surrogate measures of global DNAm, in Chapter 2 I found little correlation between DNAm of these two REs. I further compared RE DNAm with DNAm at regions of high, intermediate and low CpG density. From this, I suggested that L1 and Alu DNAm should not be used as proxies for each other or for DNAm at other regions of the genome.

In Chapter 3, I presented an additional annotation of the 450k array that included classification of compromised probes and additional biological annotation. Using CpG classification criteria consistent with Chapter 2, probes were annotated by CpG density to provide a more detailed biological definition of target sites.
The utility of this enhanced annotation was demonstrated using a large publicly available dataset as well as placental, buccal and blood samples. In Chapter 4, batch effects were discovered during the processing of 450k array data. These were associated with each sample's position on the 450k array, specifically its chip and row. After correction, I found little evidence for array-wide associations of MTHFR genotype with DNAm in placenta. This suggests that any effects were either too small to be detected in this pilot study or absent in healthy placentas and/or under adequate folate conditions.

Chapters 2-4 built on each other to develop a workflow for cleaning, assessing and analyzing DNAm microarray data. Its important components include:

- systematic quality control checks to identify outlying samples and probes;
- identification and use of unbiased metrics for data processing; and
- application of unsupervised as well as a priori biologically driven methods for analysis.

In Chapter 5, this workflow was applied to compare DNAm in multiple tissues ascertained from NTD cases. Genome-wide DNAm was profiled in 53 chromosomally normal 2nd trimester human fetuses collected in a folate-replete population (179 samples total). The tissues represent three embryonic lineages: brain and spinal cord for ectoderm, kidney and muscle for mesoderm, and placental chorionic villi for trophoblast. Fetal tissues are extremely difficult to collect because of political and religious restrictions at the national, institutional and patient levels. Most studies of human pregnancy are therefore limited to sampling accessible tissues like placenta, cord blood, amniotic fluid or extraembryonic membranes, which may have limited applicability to the disease in question. This NTD dataset is thus a unique collection: no similar data of comparable sample size have been published to date.
With the exception of spina bifida kidneys, I showed that NTD status was not a major contributor to variation in DNAm. These data may therefore inform baseline expectations of DNAm early in gestation. They also provide a resource for correlating DNAm in accessible placental tissue with trends in fetal tissues.

6.2 Introduction to discussion

The World Health Organization estimates that complex diseases like cardiovascular disease, cancers and diabetes account for more than 150,000 deaths in Canada every year [265]. These, like other multifactorial diseases, are thought to arise from an interaction of genetic predisposition, in utero exposures and environmental influences throughout life [18, 67]. Epigenetic modulation stands at the intersection of genes and environment, and it is hoped that epigenome-wide association studies (EWAS) will help to unravel the complexity of human variation and disease. Research in this field has even received media attention, for example:

- the BBC’s 'Mother's diet during pregnancy alters baby's DNA' [266];
- ‘Can genes explain rising obesity?’ [267]; and
- Discover magazine’s 'Grandma's Experiences Leave a Mark on Your Genes' [268, 269].

The most popular approach for EWAS is screening for differences in DNA methylation (DNAm) that may serve as markers of disease or exposures. Future applications include clinical stratification, development of predictive biomarkers and understanding of disease etiology [20, 270, 271]. Given that DNAm studies predominate in these investigations of gene-environment interactions, the discussion that follows centres on DNAm EWAS.

The 450k array has become the “go-to” platform for EWAS [19, 20, 67] because of its relatively low cost, single-base resolution, precise targeting of biologically relevant sites and high throughput. Evidence of this is the nearly 600 datasets and 36,000 samples submitted to the public repository GEO as of December 2015 (GPL13534).
The 450k platform has increased the accessibility of EWAS and has been a catalyst for multidisciplinary collaboration, bringing together clinicians, basic scientists, bioinformaticians and statisticians. This tool was also the basis for the development of the “epigenetic clock”, a set of CpG sites that can be used to calculate an individual’s “epigenetic age” [272]. The epigenetic clock has been applied to identify “age acceleration”, which appears significant in cancerous tissue [272] and may be associated with in utero or early life exposures [273]. The 450k array has also been used successfully to discover significant and reproducible changes in DNAm in the offspring of women who smoke during pregnancy [274-278], an association that persists across different tissues [279].

Despite some promising findings, many EWAS report small changes in DNAm (<5%) that do not reach genome-wide statistical significance and are neither validated nor shown to be functionally relevant [5, 20, 67]. The excitement and ease with which genome-wide DNAm data can be generated may mask the complexities of study design and data analysis. Excellent reviews have described issues currently limiting the interpretation and comparison of EWAS, with many lessons drawn from 450k studies [5, 6, 18-20, 67]. In this final chapter, I highlight some of the issues these authors describe in the context of my dissertation, and I contribute to the discussion of avenues for advancement in this field.

6.3 Standardized approaches will reduce technical noise in EWAS

Study-to-study differences in approach, from the sampling of biospecimens to experimental design, data cleaning and statistical analyses, likely add unwanted noise and may lead to spurious associations in EWAS. The expected magnitude of change in DNAm is small, so controlling such variability is extremely important for discovering true disease or exposure associations.
To improve study reproducibility and replication, standard procedures should be published and used whenever possible to minimize known and unknown sources of variation.

Previous studies in the Robinson lab have, for example, documented variability in DNAm between biopsies taken at different locations within the same placenta. This likely reflects stochastic changes arising in development combined with the placenta's clonal structure [1, 212]. To account for this, it is now advised that DNA be extracted from multiple, independent placental cotyledons and pooled before DNAm analysis, to obtain a more representative sample [280, 281].

DNAm also reflects cell lineage [54]. A difference in DNAm identified using whole tissue samples may therefore represent a change in DNAm limited to one cell type, or a difference in cell type composition between disease groups. Unequal proportions of cell types among samples, both within and between studies, may make DNAm differences identified in whole tissues less reproducible. Computational algorithms that separate cell type-specific changes in DNAm from changes in cell type proportions have been developed for whole blood [282] and brain [283]. However, the use of whole tissues should not negate findings. Such limitations should be discussed upfront, and computational tools may be used to distinguish cell type-related differences from exposure-related differences [284]. Where altered DNAm is found to reflect changes in cell type composition, this may add valuable understanding of disease etiology or progression.

Another likely source of study-to-study variability in EWAS is batch effects [165].
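One design-stage safeguard against the batch effects discussed here is to spread each study group evenly across chips, so that no chip is confounded with case or control status. The sketch below is a hypothetical helper, not the allocation procedure used in this dissertation. It shuffles each group with a fixed seed and deals samples to chips in rotation, continuing the rotation from one group to the next. It then cross-tabulates group by chip so that the design can be checked and reported.

```python
import random
from collections import Counter, defaultdict

def allocate_to_chips(samples, n_chips, seed=0):
    """Deal samples to chips so each group is spread as evenly as possible.

    `samples` is a list of (sample_id, group) pairs. Within each group, samples are
    shuffled and dealt to chips in turn; the rotation carries over between groups,
    which also keeps the total number of samples per chip balanced.
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for sample_id, group in samples:
        by_group[group].append(sample_id)
    assignment = {}
    chip = 0
    for group in sorted(by_group):
        ids = by_group[group][:]
        rng.shuffle(ids)
        for sample_id in ids:
            assignment[sample_id] = chip
            chip = (chip + 1) % n_chips
    return assignment

def group_counts_per_chip(samples, assignment):
    """Cross-tabulate group by chip, to check that no chip is dominated by one group."""
    table = defaultdict(Counter)
    for sample_id, group in samples:
        table[assignment[sample_id]][group] += 1
    return {chip: dict(counts) for chip, counts in sorted(table.items())}
```

For example, 12 SB and 12 CON samples dealt to two chips give six of each group per chip. The same dealing could be repeated within each chip to balance row position, which the text also identifies as a batch variable.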
Batch effects are a known source of false associations in high-throughput data [5, 165, 226]. Even so, EWAS publications often do not report experimental design, and they give limited or no description of the tools used to assess and correct for batch effects. In Chapters 4 and 5, I found batch effects in 450k data due to run date, chip and row; before correction, these were among the top sources of variability in DNAm. Of great concern, applying batch correction to the uneven study design in Chapter 4 likely introduced false biological signal into the data. The findings of this dissertation, together with the report by Buhule and colleagues discussed in Chapter 4 [165], suggest that experimental design and batch correction have major implications for EWAS data analysis and interpretation. To allow proper evaluation of EWAS, publications should describe in detail the approach used to mitigate batch effects. They should also include the metrics used to assess data processing (e.g., replicates, PCA, p-value distributions).

Probe design is another technical feature known to produce spurious EWAS findings if not correctly accounted for [181, 285]. In Chapter 3, I presented an annotation of technically biased 450k probes. Using a large publicly available dataset, I showed that many highly variable sites and autosomal sex differences in DNAm were likely technical artifacts confounded by the presence of such probes. Highlighting the importance of accounting for such probes, other groups have created similar annotations of the 450k array [286-288]. Further support comes from citations of the published version of Chapter 3, which has been cited 105 times (as of April 2016) in the three years since publication. These citations include further demonstrations of the effects of compromised probes [289, 290] and recommendations or R packages for standardized processing of 450k array data [7, 73, 291, 292].
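Accounting for compromised probes, as in the annotation described above, usually means flagging or removing them before hypothesis testing. The sketch below shows that step under stated assumptions. It assumes the IDs of SNP-affected and non-specific probes have already been read from an annotation such as the one in Chapter 3. The function names and inputs are hypothetical and do not correspond to any particular R package.

```python
def flag_probes(probe_ids, snp_probes, nonspecific_probes):
    """Map each probe ID to True if it is annotated as SNP-affected or non-specific."""
    compromised = set(snp_probes) | set(nonspecific_probes)
    return {p: p in compromised for p in probe_ids}

def drop_flagged(probe_ids, flags):
    """Keep only probes that were not flagged, preserving the original order."""
    return [p for p in probe_ids if not flags.get(p, False)]
```

Keeping the flags separate from the removal step lets an analysis report results both with and without compromised probes, rather than silently discarding them.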
These studies, in addition to my own, suggest that DNAm levels measured by non-specific and polymorphic probes can lead to false discoveries. Removing, or at the very least flagging, such probes before hypothesis testing is therefore recommended. Informed decisions on sample collection, experimental design and data processing require knowledge of hands-on, technical aspects of the measurement platform. Every researcher must consider the many potentially confounding factors that may influence their specific study. Hypothesis testing may be one of the most exciting aspects of EWAS analyses, but it is meaningless if the preceding steps were not carefully completed. Standardized procedures for generating and analyzing EWAS data [7] will reduce the reporting of false discoveries, improve reproducibility and make data integration easier.

6.4 Accounting for demographics will enhance disease-associated discovery in EWAS

Disease risk differs within and between populations. For example, the cranial NTD anencephaly is more common in females than males [111], and preeclampsia is less common in women of Chinese than of Caucasian ethnicity [293]. Demographic characteristics such as sex [294], ethnicity [295] and age [174] may be associated with variation in genome-wide DNAm. Understanding the extent of these associations is essential for identifying disease-associated loci that may contribute to observed differences in risk. Furthermore, if demographic traits are confounded with the variable of interest in an EWAS design, they may drive differences between groups. A 2013 review of published DNAm EWAS noted that more than 80% of studies primarily targeted disease or exposure associations, but only 30% adjusted for covariates [5].
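Why covariate adjustment matters can be shown with a toy example. The sketch compares an unadjusted group difference with the group effect after adjusting for one covariate. The adjustment uses the Frisch-Waugh-Lovell partial-regression identity, which gives the same group coefficient as fitting beta ~ group + covariate by ordinary least squares. The data in the usage note are invented. In them, DNAm depends only on gestational age, but cases are older on average, so the crude difference is spurious. This is an illustration, not the model used in this dissertation.

```python
from statistics import mean

def _residuals(y, x):
    """Residuals of y after simple linear regression on x (with intercept)."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]

def group_effect(beta, group, covariate=None):
    """Effect of a 0/1 group indicator on beta, optionally adjusted for one covariate.

    Unadjusted: difference in group means. Adjusted: regress both beta and the group
    indicator on the covariate, then regress residual on residual (Frisch-Waugh-Lovell),
    which equals the group coefficient in beta ~ group + covariate.
    """
    if covariate is None:
        cases = [b for b, g in zip(beta, group) if g == 1]
        controls = [b for b, g in zip(beta, group) if g == 0]
        return mean(cases) - mean(controls)
    ry = _residuals(beta, covariate)
    rg = _residuals(group, covariate)
    return sum(a * b for a, b in zip(rg, ry)) / sum(a * a for a in rg)
```

With gestational ages `[14, 15, 16, 17, 18, 19, 20, 21]`, groups `[0, 0, 1, 0, 1, 1, 0, 1]` and `beta = [0.15 + 0.01 * g for g in ga]`, the unadjusted effect is about 0.02 (a "2% difference"). The adjusted effect is essentially zero.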
Gestational age, for example, was shown in Chapter 2 to be associated with genome-wide changes in DNAm at intermediate CpG density island and non-island promoters [139]. Pregnancy complications like NTDs, preeclampsia and preterm birth are also confounded by gestational age. Many EWAS report differences in DNAm of ~2% in association with disease or exposures [67]. Studies that do not account for demographics, by matching samples and/or modelling, therefore risk identifying false disease-associated loci.

Genetic background is another factor associated both with disease risk and with differences in genome-wide DNAm. This can occur through key genes for the establishment or maintenance of DNAm (e.g., epigenetic “writers”, OCM genes). It can also occur through SNP-DNAm associations, termed methylation quantitative trait loci (meQTLs) [179]. The identification of meQTLs in multiple tissue types indicates that DNAm marks may represent both genetic and environmental influences on the epigenome [18, 180, 296]. The effect of a specific genetic profile on DNAm may also be moderated by environmental exposures. For example, it has been suggested that the MTHFR 677TT genotype has a more severe effect on genome-wide DNAm levels under conditions of folate depletion. I therefore hypothesized that the lack of differential methylation by MTHFR genotype in Chapter 4 might reflect the low incidence of folate deficiency currently observed in Canada [232].

Differences in ethnicity, in combination with environmental factors such as geographic location and diet, may be an additional source of variability in EWAS. The ethnic composition of study groups should therefore be matched and described. Large studies (>500 individuals per group) and meta-analyses are needed to better characterize and isolate genome-wide DNAm variation associated with demographic variables. For example, a 2015 study of nearly 2,000 individuals identified >1,000 autosomal differences in DNAm between healthy males and females [294]. This finding was replicated in populations from three distinct European regions and did not overlap with non-specific probes. Sex differences in DNAm may not only account for variability in disease susceptibility, but also be a major source of between-study differences in EWAS. Considering the factors that affect normative variation will improve the characterization of true disease-associated changes in DNAm. This applies to patient recruitment, randomization of samples in experimental design, statistical analysis and interpretation of results.

6.5 Responsible reporting will accelerate the field of EWAS

The scientific community has nearly a decade’s experience with EWAS, and with this shared knowledge, minimum guidelines for such studies have been established [5, 6, 18-20, 67]. The responsibility for employing and enforcing these standards for high quality research rests with scientists, reviewers and journal editors alike. A recent presentation at the 2015 American Society of Human Genetics (ASHG) conference in Boston, and its media coverage, illustrate the consequences of not meeting minimum EWAS standards [297]. The study used the 450k array to assess twins concordant and discordant for sexual orientation, and it reported an algorithm to predict sexual orientation based on DNAm [298]. ASHG published a press release [299] and Nature News publicized the study [300]. Yet the authors did not meet the basic criterion of correcting p-values for multiple comparisons, among other technical issues. The online discussion of the problems with this study, and with EWAS as a field more generally, shows how open peer review can benefit the research community by reaching scientists beyond the authors of a particular study.
The ASHG presentation and the responses to it highlighted some of the systemic issues affecting EWAS reporting. Reviewers need to be aware of minimum criteria for EWAS, and they should decline to review studies they are not equipped to evaluate. In addition, authors, reviewers and editors should give equal merit to the publication of positive and negative results. For example, the lack of widespread differential DNAm in NTDs in Chapter 5 is an important contribution to the literature, given the expectation of an association. Extensive over-interpretation of small effects before validation or replication may reflect the early years of this research field, and publication bias may exacerbate it. This may, for example, be contributing to reports of differential global DNAm in NTDs. Many such human studies report changes of less than 3%, and some do not correct for multiple comparisons or covariates [104, 136, 234, 263]. NTD research should continue to investigate changes in DNAm in larger samples and outside of the regions targeted by the array, but the analysis in Chapter 5 indicates that other mechanisms should also be examined. Researchers can test for replication, generate informed hypotheses and make better use of resources by following established guidelines [6]. This means reporting limitations, interpreting results cautiously and publishing negative findings.

6.6 The future of EWAS in open source

The field of EWAS is young and still primarily in the data generation stage, with a focus on piloting, profiling and exploring genome-wide DNAm under normal and abnormal conditions. As discussed, accumulated experience has begun to define best practices for designing, generating, analyzing and interpreting EWAS.
This parallels the course taken by GWAS, which went through a period of data production followed by the creation of consortia to standardize data acquisition and analysis [4, 8, 9, 301, 302]. Larger sample sizes dramatically increased the number of disease/trait-associated loci found in GWAS. Above a threshold of about 2,000 individuals, doubling the sample size roughly doubles the number of GWAS hits [303]. Resources like the Human Genome Project and the 1000 Genomes Project made large datasets publicly available. This helped to generate references for GWAS and to create standards for data sharing within the genomics community [301].

The open source model of freely sharing software code could be the key to accelerating the field of EWAS. Many have already adopted this approach, from free sharing of R analysis tools to journal requirements that raw data be posted to public repositories. Compliance so far is variable. In 2014, for example, close to 1,000 of nearly 2,500 450k samples in GEO did not report the sex of the individual [304]. For shared data to be used to its full potential, a truly altruistic approach is needed:

- accept and publish negative findings;
- describe challenges; and
- report case demographics (tissue, sex, age, ethnicity) and technical characteristics (design, batches, processing steps) upfront.

Free sharing of data allows researchers to test the replication of associations in populations with similar and different characteristics. Both validated and non-validated associations will be useful, as study-to-study discrepancies can reveal the nature and extent of technical confounders or important features of disease susceptibility. Complete data sharing will allow data to be integrated into larger sample sizes, analyses to be crowdsourced and multiple ‘omics measurements to be layered.
Initiatives such as the Encyclopedia of DNA Elements (ENCODE) [305], the International Human Epigenome Consortium (IHEC) [306], the NIH Roadmap Epigenomics Project [307] and The Cancer Genome Atlas (TCGA) [308] are generating epigenomic libraries. From these, scientists can begin to identify conserved and unique patterns of DNAm and their associations with other genomic and epigenomic features. If the open source model is fully embraced in EWAS, the community will gain an immense shared resource for discovering the mechanisms underlying normal variation, development and disease.

Bibliography

1. Avila L, Yuen RK, Diego-Alvarez D, Penaherrera MS, Jiang R, Robinson WP: Evaluating DNA methylation and gene expression variability in the human term placenta. Placenta 2010, 31(12):1070-1077.
2. Novakovic B, Saffery R: The ever growing complexity of placental epigenetics - role in adverse pregnancy outcomes and fetal programming. Placenta 2012, 33(12):959-970.
3. Blair JD, Yuen RK, Lim BK, McFadden DE, von Dadelszen P, Robinson WP: Widespread DNA hypomethylation at gene enhancer regions in placentas associated with early-onset pre-eclampsia. Mol Hum Reprod 2013, 19(10):697-708.
4. Mill J, Heijmans BT: From promises to practical strategies in epigenetic epidemiology. Nat Rev Genet 2013, 14(8):585-594.
5. Michels KB, Binder AM, Dedeurwaerder S, et al: Recommendations for the design and analysis of epigenome-wide association studies. Nat Methods 2013, 10(10):949-955.
6. Chadwick L, Sawa A, Yang IV, et al: New insights and updated guidelines for epigenome-wide association studies. Neuroepigenetics 2015, 1:14.
7. Lehne B, Drong AW, Loh M, et al: A coherent approach for analysis of the Illumina HumanMethylation450 BeadChip improves data quality and performance in epigenome-wide association studies. Genome Biol 2015, 16(37):28.
8. Callaway E: Epigenomics starts to make its mark. Nature 2014, 508(7494):22.
9. Paul DS, Beck S: Advances in epigenome-wide association studies for common diseases. Trends Mol Med 2014, 20(10):541-543.
10. Wadhwa PD, Buss C, Entringer S, Swanson JM: Developmental Origins of Health and Disease: Brief History of the Approach and Current Focus on Epigenetic Mechanisms. Semin Reprod Med 2009, 27(5):358-368.
11. Barker DJ: The origins of the developmental origins theory. J Intern Med 2007, 261(5):412-417.
12. Hogg K, Price EM, Hanna CW, Robinson WP: Prenatal and perinatal environmental influences on the human fetal and placental epigenome. Clin Pharmacol Ther 2012, 92(6):716-726.
13. Bird A: Epigenetics: What's left to find out. New Scientist 2013, 217:viii.
14. Craig J. The new science of Epigenetics and the possibilities for resilience. [http://www.abc.net.au/tv/life/stories/s2994579.htm] (2015). Accessed February 12, 2016.
15. Epigenetic switch for obesity. [https://www.mpg.de/9910690/epigenetic-switch-obesity] (2016). Accessed February 12, 2016.
16. Watters E. DNA Is Not Destiny: The New Science of Epigenetics. [http://discovermagazine.com/2006/nov/cover] (2006). Accessed February 12, 2016.
17. Jones PA: Functions of DNA methylation: islands, start sites, gene bodies and beyond. Nat Rev Genet 2012, 13(7):484-492.
18. Ong ML, Lin X, Holbrook JD: Measuring epigenetics as the mediator of gene/environment interactions in DOHaD. J Dev Orig Health Dis 2015, 6(1):10-16.
19. Rakyan VK, Down TA, Balding DJ, Beck S: Epigenome-wide association studies for common human diseases. Nat Rev Genet 2011, 12(8):529-541.
20. Heijmans BT, Mill J: Commentary: The seven plagues of epigenetic epidemiology. Int J Epidemiol 2012, 41(1):74-78.
21. Lander ES, Linton LM, Birren B, et al: Initial sequencing and analysis of the human genome. Nature 2001, 409(6822):860-921.
22. Ziller MJ, Müller F, Liao J, et al: Genomic distribution and inter-sample variation of non-CpG methylation across human cell types. PLoS Genet 2011, 7(12):e1002389.
23. Lister R, Pelizzola M, Dowen RH, et al: Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 2009, 462(7271):315-322.
24. Li E, Bird A: DNA methylation in mammals. In: CD Allis, T Jenuwein and D Reinberg, editors. Epigenetics. New York: Cold Spring Harbor Laboratory Press; 2007. p.342-356.
25. Ioshikhes IP, Zhang MQ: Large-scale human promoter mapping using CpG islands. Nat Genet 2000, 26(1):61-63.
26. Fazzari MJ, Greally JM: Epigenomics: beyond CpG islands. Nat Rev Genet 2004, 5(6):446-455.
27. Saxonov S, Berg P, Brutlag DL: A genome-wide analysis of CpG dinucleotides in the human genome distinguishes two distinct classes of promoters. Proc Natl Acad Sci U S A 2006, 103(5):1412-1417.
28. Bird A: DNA methylation patterns and epigenetic memory. Genes Dev 2002, 16(1):6-21.
29. Gardiner-Garden M, Frommer M: CpG islands in vertebrate genomes. J Mol Biol 1987, 196(2):261-282.
30. Weber M, Hellmann I, Stadler MB, et al: Distribution, silencing potential and evolutionary impact of promoter DNA methylation in the human genome. Nat Genet 2007, 39(4):457-466.
31. Cotton AM, Avila L, Penaherrera MS, Affleck JG, Robinson WP, Brown CJ: Inactive X chromosome-specific reduction in placental DNA methylation. Hum Mol Genet 2009, 18(19):3544-3552.
32. Hsieh CL: Dependence of transcriptional repression on CpG methylation density. Mol Cell Biol 1994, 14(8):5487-5494.
33. Lam LL, Emberly E, Fraser HB, et al: Factors underlying variable DNA methylation in a human community cohort. Proc Natl Acad Sci U S A 2012, 109 Suppl 2:17253-17260.
34. Volkmar M, Dedeurwaerder S, Cunha DA, et al: DNA methylation profiling identifies epigenetic dysregulation in pancreatic islets from type 2 diabetic patients. EMBO J 2012, 31(6):1405-1426.
35. Waterland RA, Michels KB: Epigenetic epidemiology of the developmental origins hypothesis. Annu Rev Nutr 2007, 27:363-388.
36. Doi A, Park IH, Wen B, et al: Differential methylation of tissue- and cancer-specific CpG island shores distinguishes human induced pluripotent stem cells, embryonic stem cells and fibroblasts. Nat Genet 2009, 41(12):1350-1353.
37. Irizarry RA, Ladd-Acosta C, Wen B, et al: The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores. Nat Genet 2009, 41(2):178-186.
38. Zhang D, Cheng L, Badner JA, et al: Genetic control of individual differences in gene-specific methylation in human brain. Am J Hum Genet 2010, 86(3):411-419.
39. Ji H, Ehrlich LI, Seita J, et al: Comprehensive methylome map of lineage commitment from haematopoietic progenitors. Nature 2010, 467(7313):338-342.
40. Ogoshi K, Hashimoto S, Nakatani Y, et al: Genome-wide profiling of DNA methylation in human cancer cells. Genomics 2011, 98(4):280-287.
41. Jjingo D, Conley AB, Yi SV, Lunyak VV, Jordan IK: On the presence and role of human gene-body DNA methylation. Oncotarget 2012, 3(4):462-474.
42. Jones MJ, Goodman SJ, Kobor MS: DNA methylation and healthy human aging. Aging Cell 2015, 14(6):924-932.
43. Santos F, Dean W: Epigenetic reprogramming during early development in mammals. Reproduction 2004, 127(6):643-651.
44. Wang H, Dey SK: Roadmap to embryo implantation: clues from mouse models. Nat Rev Genet 2006, 7(3):185-199.
45. Smith ZD, Chan MM, Mikkelsen TS, et al: A unique regulatory phase of DNA methylation in the early mammalian embryo. Nature 2012, 484(7394):339-344.
46. Ruzov A, Tsenkina Y, Serio A, et al: Lineage-specific distribution of high levels of genomic 5-hydroxymethylcytosine in mammalian development. Cell Res 2011, 21(9):1332-1342.
47. Huang Y, Pastor WA, Shen Y, Tahiliani M, Liu DR, Rao A: The behaviour of 5-hydroxymethylcytosine in bisulfite sequencing. PLoS One 2010, 5(1):e8888.
48. Smith ZD, Chan MM, Humm KC, et al: DNA methylation dynamics of the human preimplantation embryo. Nature 2014, 511(7511):611-615.
49. Borgel J, Guibert S, Li Y, et al: Targets and dynamics of promoter DNA methylation during early mouse development. Nat Genet 2010, 42(12):1093-1100.
50. Hon GC, Rajagopal N, Shen Y, et al: Epigenetic memory at embryonic enhancers identified in DNA methylation maps from adult mouse tissues. Nat Genet 2013, 45(10):1198-1206.
51. Yuen RK, Neumann SM, Fok AK, et al: Extensive epigenetic reprogramming in human somatic tissues between fetus and adult. Epigenetics Chromatin 2011, 4:7.
52. Hannon E, Lunnon K, Schalkwyk L, Mill J: Interindividual methylomic variation across blood, cortex, and cerebellum: implications for epigenetic studies of neurological and neuropsychiatric phenotypes. Epigenetics 2015, 10(11):1024-1032.
53. Rakyan VK, Down TA, Thorne NP, et al: An integrated resource for genome-wide identification and analysis of human tissue-specific differentially methylated regions (tDMRs). Genome Res 2008, 18(9):1518-1529.
54. Bock C, Beerman I, Lien WH, et al: DNA methylation dynamics during in vivo differentiation of blood and skin stem cells. Mol Cell 2012, 47(4):633-647.
55. Reinius LE, Acevedo N, Joerink M, et al: Differential DNA methylation in purified human blood cells: implications for cell lineage and studies on disease susceptibility. PLoS One 2012, 7(7):e41361.
56. Novakovic B, Wong NC, Sibson M, et al: DNA Methylation-mediated Down-regulation of DNA Methyltransferase-1 (DNMT1) Is Coincident with, but Not Essential for, Global Hypomethylation in Human Placenta. J Biol Chem 2010, 285(13):9583-9593.
57. Ehrlich M, Gama-Sosa MA, Huang LH, et al: Amount and distribution of 5-methylcytosine in human DNA from different types of tissues of cells. Nucleic Acids Res 1982, 10(8):2709-2721.
58. Novakovic B, Saffery R: Placental pseudo-malignancy from a DNA methylation perspective: unanswered questions and future directions. Front Genet 2013, 4:285.
59. Schroeder DI, Blair JD, Lott P, et al: The human placenta methylome. Proc Natl Acad Sci U S A 2013, 110(15):6037-6042.
60. Yuen RK, Jiang R, Penaherrera MS, McFadden DE, Robinson WP: Genome-wide mapping of imprinted differentially methylated regions by DNA methylation profiling of human placentas from triploidies. Epigenetics Chromatin 2011, 4(1):10.
61. Court F, Tayama C, Romanelli V, et al: Genome-wide parent-of-origin DNA methylation analysis reveals the intricacies of human imprinting and suggests a germline methylation-independent mechanism of establishment. Genome Res 2014, 24(4):554-569.
62. Robinson WP, Price EM: The Human Placental Methylome. Cold Spring Harb Perspect Med 2015, 5(5):10.1101/cshperspect.a023044.
63. Tsai PC, Bell JT: Power and sample size estimation for epigenome-wide association scans to detect differential DNA methylation. Int J Epidemiol 2015, 44(4):1429-1441.
64. Frommer M, McDonald LE, Millar DS, et al: A genomic sequencing protocol that yields a positive display of 5-methylcytosine residues in individual DNA strands. Proc Natl Acad Sci U S A 1992, 89(5):1827-1831.
65. Nestor CE, Ottaviano R, Reddington J, et al: Tissue type is a major modifier of the 5-hydroxymethylcytosine content of human genes. Genome Res 2012, 22(3):467-477.
66. Dedeurwaerder S, Defrance M, Calonne E, Denis H, Sotiriou C, Fuks F: Evaluation of the Infinium Methylation 450K technology. Epigenomics 2011, 3(6):771-784.
67. Ramsay M: Epigenetic epidemiology: is there cause for optimism? Epigenomics 2015, 7(5):683-685.
68. Bibikova M, Barnes B, Tsan C, et al: High density DNA methylation array with single CpG site resolution. Genomics 2011, 98(4):288-295.
69. Bibikova M, Le J, Barnes B, et al: Genome-wide DNA methylation profiling using Infinium® assay. Epigenomics 2009, 1(1):177-200.
70. Sandoval J, Heyn H, Moran S, et al: Validation of a DNA methylation microarray for 450,000 CpG sites in the human genome. Epigenetics 2011, 6(6):692-702.
71. Du P, Zhang X, Huang CC, et al: Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis. BMC Bioinformatics 2010, 11:587.
72. Du P, Kibbe WA, Lin SM: lumi: a pipeline for processing Illumina microarray. Bioinformatics 2008, 24(13):1547-1548.
73. Dedeurwaerder S, Defrance M, Bizet M, Calonne E, Bontempi G, Fuks F: A comprehensive overview of Infinium HumanMethylation450 data processing. Brief Bioinform 2013, bbt054.
74. Yousefi P, Huen K, Schall RA, et al: Considerations for normalization of DNA methylation data by Illumina 450K BeadChip assay in population studies. Epigenetics 2013, 8(11):1141-1152.
75. Maksimovic J, Gordon L, Oshlack A: SWAN: Subset-quantile Within Array Normalization for Illumina Infinium HumanMethylation450 BeadChips. Genome Biol 2012, 13(6):R44.
76. Touleimat N, Tost J: Complete pipeline for Infinium® Human Methylation 450K BeadChip data processing using subset quantile normalization for accurate DNA methylation estimation. Epigenomics 2012, 4(3):325-341.
77. Teschendorff AE, Marabita F, Lechner M, et al: A beta-mixture quantile normalization method for correcting probe design bias in Illumina Infinium 450 k DNA methylation data. Bioinformatics 2013, 29(2):189-196.
78. Pidsley R, Y Wong CC, Volta M, Lunnon K, Mill J, Schalkwyk LC: A data-driven approach to preprocessing Illumina 450K methylation array data. BMC Genomics 2013, 14(1):293.
79. Wang T, Guan W, Lin J, et al: A systematic study of normalization methods for Infinium 450K methylation data using whole-genome bisulfite sequencing data. Epigenetics 2015, 10(7):662-669.
80. Wu MC, Joubert BR, Kuan P, et al: A systematic assessment of normalization approaches for the Infinium 450K methylation platform. Epigenetics 2014, 9(2):318-329.
81. Laanpere M, Altmae S, Stavreus-Evers A, Nilsson TK, Yngve A, Salumets A: Folate-mediated one-carbon metabolism and its effect on female fertility and pregnancy viability. Nutr Rev 2010, 68(2):99-113.
82. FAO and WHO. FAO/WHO expert consultation on human vitamin and mineral requirements. [http://www.fao.org/3/a-y2809e.pdf] (2001). Accessed December 1, 2015.
83. De-Regil LM, Fernandez-Gaxiola AC, Dowswell T, Pena-Rosas JP: Effects and safety of periconceptional folate supplementation for preventing birth defects. Cochrane Database Syst Rev 2010, (10):CD007950.
84. Solanky N, Requena Jimenez A, D'Souza SW, Sibley CP, Glazier JD: Expression of folate transporters in human placenta and implications for homocysteine metabolism. Placenta 2010, 31(2):134-143.
85. Jauniaux E, Watson A, Burton G: Evaluation of respiratory gases and acid-base gradients in human fetal fluids and uteroplacental tissue between 7 and 16 weeks' gestation. Am J Obstet Gynecol 2001, 184(5):998-1003.
86. Jauniaux E, Johns J, Gulbis B, Spasic-Boskovic O, Burton GJ: Transfer of folic acid inside the first-trimester gestational sac and the effect of maternal smoking. Am J Obstet Gynecol 2007, 197(1):58-e1.
87. Fenech M: The role of folic acid and Vitamin B12 in genomic stability of human cells. Mutat Res 2001, 475(1-2):57-67.
88. Huang RF, Huang SM, Lin BS, Wei JS, Liu TZ: Homocysteine thiolactone induces apoptotic DNA damage mediated by increased intracellular hydrogen peroxide and caspase 3 activation in HL-60 cells. Life Sci 2001, 68(25):2799-2811.
89. Di Simone N, Riccardi P, Maggiano N, et al: Effect of folic acid on homocysteine-induced trophoblast apoptosis. Mol Hum Reprod 2004, 10(9):665-669.
90. Steegers-Theunissen RP, Smith SC, Steegers EA, Guilbert LJ, Baker PN: Folate affects apoptosis in human trophoblastic cells. BJOG 2000, 107(12):1513-1515.
91. Beaudin AE, Stover PJ: Folate-mediated one-carbon metabolism and neural tube defects: balancing genome synthesis and gene expression. Birth Defects Res C Embryo Today 2007, 81(3):183-203.
92. Blom HJ: Folic acid, methylation and neural tube closure in humans. Birth Defects Res A Clin Mol Teratol 2009, 85(4):295-302.
93. Weisberg IS, Jacques PF, Selhub J, et al: The 1298A-->C polymorphism in methylenetetrahydrofolate reductase (MTHFR): in vitro expression and association with homocysteine. Atherosclerosis 2001, 156(2):409-415.
94. Wilcken B, Bamforth F, Li Z, et al: Geographical and ethnic variation of the 677C>T allele of 5,10 methylenetetrahydrofolate reductase (MTHFR): findings from over 7000 newborns from 16 areas world wide. J Med Genet 2003, 40(8):619-625.
95. Friedman G, Goldschmidt N, Friedlander Y, et al: A common mutation A1298C in human methylenetetrahydrofolate reductase gene: association with plasma total homocysteine and folate concentrations. J Nutr 1999, 129(9):1656-1661.
96. Dekou V, Whincup P, Papacosta O, et al: The effect of the C677T and A1298C polymorphisms in the methylenetetrahydrofolate reductase gene on homocysteine levels in elderly men and women from the British regional heart study. Atherosclerosis 2001, 154(3):659-666.
97. Shelnutt KP, Kauwell GP, Chapman CM, et al: Folate status response to controlled folate intake is affected by the methylenetetrahydrofolate reductase 677C-->T polymorphism in young women. J Nutr 2003, 133(12):4107-4111.
98. van der Put NM, Gabreels F, Stevens EM, et al: A second common mutation in the methylenetetrahydrofolate reductase gene: an additional risk factor for neural-tube defects? Am J Hum Genet 1998, 62(5):1044-1051.
99. Food Fortification Initiative: Enhancing Grains for Healthier Lives. [http://www.ffinetwork.org/global_progress/index.php] (2016). Accessed June 12, 2015.
100. Smithells RW, Sheppard S, Schorah CJ: Vitamin deficiencies and neural tube defects. Arch Dis Child 1976, 51(12):944-950.
101. Anonymous: Prevention of neural tube defects: results of the Medical Research Council Vitamin Study. MRC Vitamin Study Research Group. Lancet 1991, 338(8760):131-137.
102. Czeizel AE, Dudas I: Prevention of the first occurrence of neural-tube defects by periconceptional vitamin supplementation. N Engl J Med 1992, 327(26):1832-1835.
103. Berry RJ, Li Z, Erickson JD, et al: Prevention of neural-tube defects with folic acid in China. China-U.S. Collaborative Project for Neural Tube Defect Prevention. N Engl J Med 1999, 341(20):1485-1490.
104. Chang H, Zhang T, Zhang Z, et al: Tissue-specific distribution of aberrant DNA methylation associated with maternal low-folate status in human neural tube defects. J Nutr Biochem 2011, 22(12):1172-1177.
105. Van Allen MI, McCourt C, Lee NS: Preconception Health: Folic Acid for the Primary Prevention of Neural Tube Defects: a Resource Document for Health Professionals. Minister of Public Works and Government Services Canada 2002.
106. Copp AJ, Greene ND, Murdoch JN: The genetic basis of mammalian neurulation. Nat Rev Genet 2003, 4(10):784-793.
107. Copp AJ, Greene ND: Genetics and development of neural tube defects. J Pathol 2010, 220(2):217-230.
108. Van Allen MI, Kalousek DK, Chernoff GF, et al: Evidence for multi-site closure of the neural tube in humans. Am J Med Genet 1993, 47(5):723-743.
109. Botto LD, Moore CA, Khoury MJ, Erickson JD: Neural-tube defects. N Engl J Med 1999, 341(20):1509-1519.
110. Timor-Tritsch HE, Greenebaum E, Monteagudo A, Baxi L: Exencephaly-anencephaly sequence: proof by ultrasound imaging and amniotic fluid cytology. J Matern Fetal 1996, 5(4):182-185.
111. Juriloff DM, Harris MJ: Hypothesis: the female excess in cranial neural tube defects reflects an epigenetic drag of the inactivating X chromosome on the molecular mechanisms of neural fold elevation. Birth Defects Res A Clin Mol Teratol 2012, 94(10):849-855.
112. Deak KL, Siegel DG, George TM, et al: Further evidence for a maternal genetic effect and a sex-influenced effect contributing to risk for human neural tube defects. Birth Defects Res A Clin Mol Teratol 2008, 82(10):662-669.
113. Suarez L, Felkner M, Brender JD, Canfield M, Zhu H, Hendricks KA: Neural tube defects on the Texas-Mexico border: what we've learned in the 20 years since the Brownsville cluster. Birth Defects Res A Clin Mol Teratol 2012, 94(11):882-892.
114. Botto LD, Yang Q: 5,10-Methylenetetrahydrofolate reductase gene variants and congenital anomalies: a HuGE review. Am J Epidemiol 2000, 151(9):862-877.
115. De Wals P, Tairou F, Van Allen MI, et al: Reduction in neural-tube defects after folic acid fortification in Canada. N Engl J Med 2007, 357(2):135-142.
116. Van Allen MI, Boyle E, Thiessen P, et al: The impact of prenatal diagnosis on neural tube defect (NTD) pregnancy versus birth incidence in British Columbia. J Appl Genet 2006, 47(2):151-158.
117. Ramsahoye BH: Measurement of genome wide DNA methylation by reversed-phase high-performance liquid chromatography. Methods 2002, 27(2):156-161.
118. Li W, Liu M: Distribution of 5-hydroxymethylcytosine in different human tissues. J Nucleic Acids 2011, 2011.
119. Ziller MJ, Hansen KD, Meissner A, Aryee MJ: Coverage recommendations for methylation analysis by whole-genome bisulfite sequencing. Nat Methods 2015, 12(3):230-232.
120. Yang AS, Estecio MR, Doshi K, Kondo Y, Tajara EH, Issa JP: A simple method for estimating global DNA methylation using bisulfite PCR of repetitive DNA elements. Nucleic Acids Res 2004, 32(3):e38.
121. Fryer AA, Nafee TM, Ismail KMK, Carroll WD, Emes RD, Farrell WE: LINE-1 DNA methylation is inversely correlated with cord plasma homocysteine in man: a preliminary study. Epigenetics 2009, 4(6):292-295.
122. Rusiecki JA, Baccarelli A, Bollati V, Tarantini L, Moore LE, Bonefeld-Jorgensen EC: Global DNA hypomethylation is associated with high serum-persistent organic pollutants in Greenlandic Inuit. Environ Health Perspect 2008, 116(11):1547-1552.
123. Baccarelli A, Wright RO, Bollati V, et al: Rapid DNA Methylation Changes after Exposure to Traffic Particles. Am J Respir Crit Care Med 2009, 179(7):572-578.
124. Wright RO, Schwartz J, Wright RJ, et al: Biomarkers of lead exposure and DNA methylation within retrotransposons. Environ Health Perspect 2010, 118(6):790-795. 140   125. Cordaux R, Batzer MA: The impact of retrotransposons on human genome evolution. Nat Rev Genet 2009, 10(10):691-703. 126. Schmid CW: Alu: structure, origin, evolution, significance and function of one-tenth of human DNA. Prog Nucleic Acid Res Mol Biol 1996, 53:283-319. 127. Xie HF, Wang M FAU - Bonaldo, Maria de,F., Bonaldo Mde FF, et al: High-throughput sequence-based epigenomic analysis of Alu repeats in human cerebellum. Nucleic acids research JID - 0411011 2009, 37(13):4331-4340. 128. Schmid CW: Human Alu subfamilies and their methylation revealed by blot hybridization. Nucleic Acids Res 1991, 19(20):5613-5617. 129. Smit AF, Toth G, Riggs AD, Jurka J: Ancestral, mammalian-wide subfamilies of LINE-1 repetitive sequences. J Mol Biol 1995, 246(3):401-417. 130. Batzer MA, Deininger PL: Alu repeats and human genomic diversity. Nat Rev Genet 2002, 3(5):370-379. 131. Weber M, Davies JJ, Wittig D, et al: Chromosome-wide and promoter-specific analyses identify sites of differential DNA methylation in normal and transformed human cells. Nat Genet 2005, 37(8):853-862. 132. Szpakowski S, Sun X, Lage JM, et al: Loss of epigenetic silencing in tumors preferentially affects primate-specific retroelements. Gene 2009, 448(2):151-167. 133. Dolinoy DC, Huang D, Jirtle RL: Maternal nutrient supplementation counteracts bisphenol A-induced DNA hypomethylation in early development. Proc Natl Acad Sci U S A 2007, 104(32):13056-13061. 134. Bollati V, Baccarelli A, Hou L, et al: Changes in DNA methylation patterns in subjects exposed to low-dose benzene. Cancer Res 2007, 67(3):876-880. 135. Tarantini L, Bonzini M, Apostoli P, et al: Effects of particulate matter on genomic DNA methylation content and iNOS promoter methylation. Environ Health Perspect 2009, 117(2):217-222. 136. 
Wang L, Wang F, Guan J, et al: Relation between hypomethylation of long interspersed nucleotide elements and risk of neural tube defects. Am J Clin Nutr 2010, 91(5):1359-1367. 141   137. Choi IS, Estecio MR, Nagano Y, et al: Hypomethylation of LINE-1 and Alu in well-differentiated neuroendocrine tumors (pancreatic endocrine tumors and carcinoid tumors). Mod Pathol 2007, 20(7):802-810. 138. Miller SA, Dykes DD, Polesky HF: A simple salting out procedure for extracting DNA from human nucleated cells. Nucleic Acids Res 1988, 16(3):1215. 139. Novakovic B, Yuen R, Gordon L, et al: Evidence for widespread changes in promoter methylation profile in human placenta in response to increasing gestational age and environmental/stochastic factors. BMC Genomics 2011, 12(1):529. 140. Chen YA, Choufani S, Ferreira JC, Grafodatskaya D, Butcher DT, Weksberg R: Sequence overlap between autosomal and sex-linked probes on the Illumina HumanMethylation27 microarray. Genomics 2011, 97(4):214-222. 141. R Core Team: R: A language and environment for statistical computing. 2014, [http://www.R-project.org/]  142. Davis D, Du P, Bilke S, Triche Jr. T, Bootwalla M: methylumi: Handle Illumina methylation data. 2015, R package version 2.16.0. 143. Wang Y, Leung FC: An evaluation of new criteria for CpG islands in the human genome as gene markers. Bioinformatics 2004, 20(7):1170-1177. 144. Goecks J, Nekrutenko A, Taylor J, Galaxy Team: Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol 2010, 11(8):R86-2010-11-8-r86. Epub 2010 Aug 25. 145. Blankenberg D, Von Kuster G, Coraor N, et al: Galaxy: a web-based genome analysis tool for experimentalists. Curr Protoc Mol Biol 2010, Chapter 19:Unit 19.10.1-21. 146. Giardine B, Riemer C, Hardison RC, et al: Galaxy: a platform for interactive large-scale genome analysis. Genome Res 2005, 15(10):1451-1455. 147. 
Price AL, Eskin E, Pevzner PA: Whole-genome analysis of Alu repeat elements reveals complex evolutionary history. Genome Res 2004, 14(11):2245-2252. 148. Kang M, Rhyu M, Kim Y, et al: The length of CpG islands is associated with the distribution of Alu and L1 retroelements. Genomics 2006, 87(5):580-590. 149. Baccarelli A, Wright R, Bollati V, et al: Ischemic Heart Disease and Stroke in Relation to Blood DNA Methylation. Epidemiology 2010, 21(6):819-828. 142   150. Ewing AD, Kazazian HH,Jr: High-throughput sequencing reveals extensive variation in human-specific L1 content in individual human genomes. Genome Res 2010, 20(9):1262-1270. 151. Buzdin A, Ustyugova S, Gogvadze E, Lebedev Y, Hunsmann G, Sverdlov E: Genome-wide targeted search for human specific and polymorphic L1 integrations. Hum Genet 2003, 112(5-6):527-533. 152. Yuen RK, Penaherrera MS, von Dadelszen P, McFadden DE, Robinson WP: DNA methylation profiling of human placentas reveals promoter hypomethylation of multiple genes in early-onset preeclampsia. Eur J Hum Genet 2010, 18(9):1006-1012. 153. Chavan-Gautam P, Sundrani D, Pisal H, Nimbargi V, Mehendale S, Joshi S: Gestation-dependent changes in human placental global DNA methylation levels. Mol Reprod Dev 2011, 78(3):150. 154. Fuke C, Shimabukuro M, Petronis A, et al: Age related changes in 5-methylcytosine content in human peripheral leukocytes and placentas: an HPLC-based study. Ann Hum Genet 2004, 68(Pt 3):196-204. 155. Cho NY, Kim BH, Yoo EJ, et al: Hypermethylation of CpG island loci and hypomethylation of LINE-I and Alu repeats in prostate adenocarcinoma and their relationship to clinicopathological features. J Pathol 2007, 211(3):269-277. 156. Richards KL, Zhang B, Baggerly KA, et al: Genome-wide hypomethylation in head and neck cancer is more pronounced in HPV-negative tumors and is associated with genomic instability. PLoS One 2009, 4(3):e4941. 157. 
Weisenberger DJ, Campan M, Long TI, et al: Analysis of repetitive element DNA methylation by MethyLight. Nucleic Acids Res 2005, 33(21):6823-6836. 158. Medstrand P, van de Lagemaat LN, Mager DL: Retroelement distributions in the human genome: variations associated with age and proximity to genes. Genome Res 2002, 12(10):1483-1495. 159. Turker MS: Gene silencing in mammalian cells and the spread of DNA methylation. Oncogene 2002, 21(35):5388-5393. 160. Hellmann-Blumberg U, Hintz MF, Gatewood JM, Schmid CW: Developmental differences in methylation of human Alu repeats. Mol Cell Biol 1993, 13(8):4523-4530. 161. Matlik K, Redik K, Speek M: L1 antisense promoter drives tissue-specific transcription of human genes. Journal of Biomedicine and Biotechnology 2006, 2006. 143   162. Macaulay EC, Weeks RJ, Andrews S, Morison IM: Hypomethylation of functional retrotransposon-derived genes in the human placenta. Mamm Genome 2011, 22(11-12):722-735. 163. Okahara G, Matsubara S, Oda T, Sugimoto J, Jinno Y, Kanaya F: Expression analyses of human endogenous retroviruses (HERVs): tissue-specific and developmental stage-dependent expression of HERVs. Genomics 2004, 84(6):982-990. 164. Frendo JL, Olivier D, Cheynet V, et al: Direct involvement of HERV-W Env glycoprotein in human trophoblast cell fusion and differentiation. Mol Cell Biol 2003, 23(10):3566-3574. 165. Buhule OD, Minster RL, Hawley NL, et al: Stratified randomization controls better for batch effects in 450K methylation analysis: a cautionary tale. Front Genet 2014, 5:354. 166. Zhang X, Mu W, Zhang W: On the analysis of the illumina 450k array data: probes ambiguously mapped to the human genome. Front Genet 2012, 3:73. 167. Morris T, Lowe R: Report on the Infinium 450k methylation array analysis workshop: April 20, 2012 UCL, London, UK. Epigenetics 2012, 7(8):961-962. 168. 
Price EM, Cotton AM, Lam LL, et al: Additional annotation enhances potential for biologically-relevant analysis of the Illumina Infinium HumanMethylation450 BeadChip array. Epigenetics Chromatin 2013, 6(4). 169. Karolchik D, Hinrichs AS, Furey TS, et al: The UCSC Table Browser data retrieval tool. Nucleic Acids Res 2004, 32(Database issue):D493-6. 170. Kent WJ: BLAT--the BLAST-like alignment tool. Genome Res 2002, 12(4):656-664. 171. Kent WJ, Sugnet CW, Furey TS, et al: The human genome browser at UCSC. Genome Res 2002, 12(6):996-1006. 172. Cotton AM, Lam L, Affleck JG, et al: Chromosome-wide DNA methylation analysis predicts human tissue-specific X inactivation. Hum Genet 2011, 130(2):187-201. 173. Lawrence M, Huber W, Pages H, et al: Software for computing and annotating genomic ranges. PLoS Comput Biol 2013, 9(8):e1003118. 174. Hannum G, Guinney J, Zhao L, et al: Genome-wide Methylation Profiles Reveal Quantitative Views of Human Aging Rates. Mol Cell 2013, 49(2):359-367. 144   175. Schwender H: siggenes: Multiple testing using SAM and Efron's empirical Bayes approaches. 2011, R package version 1.28.0. 176. Barbosa-Morais NL, Dunning MJ, Samarajiwa SA, et al: A re-annotation pipeline for Illumina BeadArrays: improving the interpretation of gene expression data. Nucleic Acids Res 2010, 38(3):e17. 177. Benovoy D, Kwan T, Majewski J: Effect of polymorphisms within probe-target sequences on olignonucleotide microarray experiments. Nucleic Acids Res 2008, 36(13):4417-4423. 178. Fraser HB, Lam LL, Neumann SM, Kobor MS: Population-specificity of human DNA methylation. Genome Biol 2012, 13(2):R8. 179. Bell JT, Pai AA, Pickrell JK, et al: DNA methylation patterns associate with genetic and gene expression variation in HapMap cell lines. Genome Biol 2011, 12(1):R10. 180. Teh AL, Pan H, Chen L, et al: The effect of genotype and in utero environment on interindividual variation in neonate DNA methylomes. Genome Res 2014, 24(7):1064-1074. 181. 
Blair JD, Price EM: Illuminating Potential Technical Artifacts of DNA-Methylation Array Probes. Am J Hum Genet 2012, 91(4):760-762. 182. Eckhardt F, Lewin J, Cortese R, et al: DNA methylation profiling of human chromosomes 6, 20 and 22. Nat Genet 2006, 38(12):1378-1385. 183. Brenet F, Moh M, Funk P, et al: DNA methylation of the first exon is tightly linked to transcriptional silencing. PLoS One 2011, 6(1):e14524. 184. Laird PW: Principles and challenges of genomewide DNA methylation analysis. Nat Rev Genet 2010, 11(3):191-203. 185. Beyan H, Down TA, Ramagopalan SV, et al: Guthrie card methylomics identifies temporally stable epialleles that are present at birth in humans. Genome Res 2012, 22(11):2138-2145. 186. Bourgon R, Gentleman R, Huber W: Independent filtering increases detection power for high-throughput experiments. Proc Natl Acad Sci U S A 2010, 107(21):9546-9551. 187. Saad MN, Mabrouk MS, Eldeib AM, Shaker OG: Effect of MTHFR, TGFbeta1, and TNFB polymorphisms on osteoporosis in rheumatoid arthritis patients. Gene 2015, 568(2):124-128. 145   188. Stover PJ, MacFarlane AJ, Field MS: Bringing clarity to the role of MTHFR variants in neural tube defect prevention. Am J Clin Nutr 2015, 101(6):1111-1112. 189. Liu R, Geng P, Ma M, et al: MTHFR C677T polymorphism and migraine risk: a meta-analysis. J Neurol Sci 2014, 336(1-2):68-73. 190. Sener EF, Oztop DB, Ozkul Y: MTHFR Gene C677T Polymorphism in Autism Spectrum Disorders. Genet Res Int 2014, 2014. 191. Yadav U, Kumar P, Yadav SK, Mishra OP, Rai V: "Polymorphisms in folate metabolism genes as maternal risk factor for neural tube defects: an updated meta-analysis". Metab Brain Dis 2015, 30(1):7-24. 192. Gao XH, Zhang GY, Wang Y, Zhang HY: Correlations of MTHFR 677C>T polymorphism with cardiovascular disease in patients with end-stage renal disease: a meta-analysis. PLoS One 2014, 9(7):e102323. 193. 
Frosst P, Blom HJ, Milos R, et al: A candidate genetic risk factor for vascular disease: a common mutation in methylenetetrahydrofolate reductase. Nat Genet 1995, 10(1):111-113. 194. Hickey SE, Curry CJ, Toriello HV: ACMG Practice Guideline: lack of evidence for MTHFR polymorphism testing. Genet Med 2013, 15(2):153-156. 195. Bellamy J: Dubious MTHFR genetic mutation testing. [https://www.sciencebasedmedicine.org/dubious-mthfr-genetic-mutation-testing/] (2015). Accessed December 13, 2015. 196. MTHFR gene mutation... What's the big deal about Methylation? [http://doccarnahan.blogspot.ca/2013/05/mthfr-gene-mutation-whats-big-deal.html] (2013). Accessed February 4, 2016. 197. Clarke R, Bennett DA, Parish S, et al: Homocysteine and coronary heart disease: meta-analysis of MTHFR case-control studies, avoiding publication bias. PLoS Med 2012, 9(2):e1001177. 198. Rai V: Methylenetetrahydrofolate reductase gene A1298C polymorphism and susceptibility to recurrent pregnancy loss: a meta-analysis. Cell Mol Biol (Noisy-le-grand) 2014, 60(2):27-34. 199. Ren A, Wang J: Methylenetetrahydrofolate reductase C677T polymorphism and the risk of unexplained recurrent pregnancy loss: a meta-analysis. Fertil Steril 2006, 86(6):1716-1722. 200. Den Heijer M, Lewington S, Clarke R: Homocysteine, MTHFR and risk of venous thrombosis: a meta-analysis of published epidemiological studies. J Thromb Haemost 2005, 3(2):292-299. 201. Ray JG, Shmorgun D, Chan WS: Common C677T polymorphism of the methylenetetrahydrofolate reductase gene and the risk of venous thromboembolism: meta-analysis of 31 studies. Pathophysiol Haemost Thromb 2002, 32(2):51-58. 202. Lewis SJ, Ebrahim S, Davey Smith G: Meta-analysis of MTHFR 677C->T polymorphism and coronary heart disease: does totality of evidence support causal role for homocysteine and preventive potential of folate? BMJ 2005, 331(7524):1053. 203. 
Yan L, Zhao L, Long Y, et al: Association of the maternal MTHFR C677T polymorphism with susceptibility to neural tube defects in offsprings: evidence from 25 case-control studies. PLoS One 2012, 7(10):e41689. 204. Finnell RH, Shaw GM, Lammer EJ, Volcik KA: Does prenatal screening for 5,10-methylenetetrahydrofolate reductase (MTHFR) mutations in high-risk neural tube defect pregnancies make sense? Genet Test 2002, 6(1):47-52. 205. Blom HJ, Shaw GM, den Heijer M, Finnell RH: Neural tube defects and folate: case far from closed. Nature Reviews Neuroscience 2006, 7(9):724-731. 206. Stern LL, Mason JB, Selhub J, Choi SW: Genomic DNA hypomethylation, a characteristic of most cancers, is present in peripheral leukocytes of individuals who are homozygous for the C677T polymorphism in the methylenetetrahydrofolate reductase gene. Cancer Epidemiol Biomarkers Prev 2000, 9(8):849-853. 207. Friso S, Choi SW, Girelli D, et al: A common mutation in the 5,10-methylenetetrahydrofolate reductase gene affects genomic DNA methylation through an interaction with folate status. Proc Natl Acad Sci U S A 2002, 99(8):5606-5611. 208. Axume J, Smith SS, Pogribny IP, Moriarty DJ, Caudill MA: The MTHFR 677TT genotype and folate intake interact to lower global leukocyte DNA methylation in young Mexican American women. Nutr Res 2007, 27(1):1365-1317. 209. Shelnutt KP, Kauwell GP, Gregory JF,3rd, et al: Methylenetetrahydrofolate reductase 677C-->T polymorphism affects DNA methylation in response to controlled folate intake in young women. J Nutr Biochem 2004, 15(9):554-560. 210. Castro R, Rivera I, Ravasco P, et al: 5,10-methylenetetrahydrofolate reductase (MTHFR) 677C-->T and 1298A-->C mutations are associated with DNA hypomethylation. J Med Genet 2004, 41(6):454-458. 211. 
Narayanan S, McConnell J, Little J, et al: Associations between two common variants C677T and A1298C in the methylenetetrahydrofolate reductase gene and measures of folate metabolism and DNA stability (strand breaks, misincorporated uracil, and DNA methylation status) in human lymphocytes in vivo. Cancer Epidemiol Biomarkers Prev 2004, 13(9):1436-1443. 212. Penaherrera MS, Jiang R, Avila L, Yuen RK, Brown CJ, Robinson WP: Patterns of placental development evaluated by X chromosome inactivation profiling provide a basis to evaluate the origin of epigenetic variation. Hum Reprod 2012, 27(6):1745-1753. 213. Scherer A: Variation, Variability, Batches and Bias. In: Scherer A, editor. Microarray Experiments: An Introduction. United Kingdom: John Wiley & Sons, Ltd; 2009. p.1-4. 214. Leek JT, Johnson WE, Parker HS, Fertig EJ, Jaffe AE, Storey JD: sva: Surrogate Variable Analysis. R package version 3.12.0. 215. Price EM, Peñaherrera MS, Portales‐Casamar E, et al: Profiling placental and fetal DNA methylation in human neural tube defects. Epigenetics Chromatin 2016, 9(6). 216. Royo JL, Hidalgo M, Ruiz A: Pyrosequencing protocol using a universal biotinylated primer for mutation detection and SNP genotyping. Nat Protoc 2007, 2(7):1734-1739. 217. Kramer MS, Platt RW, Wen SW, et al: A new and improved population-based Canadian reference for birth weight for gestational age. Pediatrics 2001, 108(2):E35. 218. Smyth GK: Limma: linear models for microarray data. In Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Edited by Gentleman R, Carey V, Dudoit S, Irizarry R, Huber W. New York: Springer; 2005:397. 219. Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc B Met 1995, 57:289. 220. Suzuki R, Shimodaira H: pvclust: Hierarchical Clustering with P-Values via Multiscale Bootstrap Resampling. 2015, R package version 2.0-0. 221. 
Pearson RK, Gonye GE, Schwaber JS: Outliers in Microarray Data Analysis. In Methods of Microarray Data Analysis. Edited by Johnson KF, Lin SM. 2003:41. 222. Aryee MJ, Wu Z, Ladd-Acosta C, et al: Accurate genome-scale percentage DNA methylation estimates from microarray data. Biostatistics 2011, 12(2):197-210. 223. Wickham H: ggplot2: elegant graphics for data analysis: Springer New York; 2009. 224. O'Neill RJ, Vrana PB, Rosenfeld CS: Maternal methyl supplemented diets and effects on offspring health. Front Genet 2014, 5:289. 225. Hanna CW, McFadden DE, Robinson WP: DNA methylation profiling of placental villi from karyotypically normal miscarriage and recurrent miscarriage. The American journal of pathology 2013, 182(6):2276-2284. 226. Leek JT, Scharpf RB, Bravo HC, et al: Tackling the widespread and critical impact of batch effects in high-throughput data. Nat Rev Genet 2010, 11(10):733-739. 227. Akey JM, Biswas S, Leek JT, Storey JD: On the design and analysis of gene expression studies in human populations. Nat Genet 2007, 39(7):807-8; author reply 808-9. 228. Spielman RS, Bastone LA, Burdick JT, Morley M, Ewens WJ, Cheung VG: Common genetic variants account for differences in gene expression among ethnic groups. Nat Genet 2007, 39(2):226-231. 229. Shi L, Reid LH, Jones WD, et al: The MicroArray Quality Control (MAQC) project shows inter-and intraplatform reproducibility of gene expression measurements. Nat Biotechnol 2006, 24(9):1151-1161. 230. Johnson WE, Li C, Rabinovic A: Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 2007, 8(1):118-127. 231. Daly SF, Molloy AM, Mills JL, et al: The influence of 5,10 methylenetetrahydrofolate reductase genotypes on enzyme activity in placental tissue. Br J Obstet Gynaecol 1999, 106(11):1214-1218. 232. MacFarlane AJ, Greene-Finestone LS, Shi Y: Vitamin B-12 and homocysteine status in a folate-replete population: results from the Canadian Health Measures Survey. 
Am J Clin Nutr 2011, 94(4):1079-1087. 233. Okano M, Bell DW, Haber DA, Li E: DNA methyltransferases Dnmt3a and Dnmt3b are essential for de novo methylation and mammalian development. Cell 1999, 99(3):247-257. 234. Chen X, Guo J, Lei Y, et al: Global DNA hypomethylation is associated with NTD-affected pregnancy: A case-control study. Birth Defects Res A Clin Mol Teratol 2010, 88(7):575-581. 235. Stolk L, Bouwland-Both MI, van Mil NH, et al: Epigenetic profiles in children with a neural tube defect; a case-control study in two populations. PLoS One 2013, 8(11):e78462. 236. Shangguan S, Wang L, Chang S, et al: DNA methylation aberrations rather than polymorphisms of FZD3 gene increase the risk of spina bifida in a high-risk region for neural tube defects. Birth Defects Res A Clin Mol Teratol 2015, 103(1):37-44. 237. Wu L, Wang L, Shangguan S, et al: Altered methylation of IGF2 DMR0 is associated with neural tube defects. Mol Cell Biochem 2013, 380(1-2):33-42. 238. Rochtus A, Izzi B, Vangeel E, et al: DNA methylation analysis of Homeobox genes implicates HOXB7 hypomethylation as risk factor for neural tube defects. Epigenetics 2015, 10(1):92-101. 239. Farkas SA, Bottiger AK, Isaksson HS, Finnell RH, Ren A, Nilsson TK: Epigenetic alterations in folate transport genes in placental tissue from fetuses with neural tube defects and in leukocytes from subjects with hyperhomocysteinemia. Epigenetics 2013, 8(3):303-316. 240. Detrait ER, George TM, Etchevers HC, Gilbert JR, Vekemans M, Speer MC: Human neural tube defects: developmental biology, epidemiology, and genetics. Neurotoxicol Teratol 2005, 27(3):515-524. 241. Ray JG, Goodman J, O'Mahoney PR, Mamdani MM, Jiang D: High rate of maternal vitamin B12 deficiency nearly a decade after Canadian folic acid flour fortification. QJM 2008, 101(6):475-477. 242. Ray JG, Blom HJ: Vitamin B12 insufficiency and the risk of fetal neural tube defects. QJM 2003, 96(4):289-295. 243. 
Thompson MD, Cole DE, Ray JG: Vitamin B-12 and neural tube defects: the Canadian experience. Am J Clin Nutr 2009, 89(2):697S-701S. 244. Bai S, Ghoshal K, Datta J, Majumder S, Yoon SO, Jacob ST: DNA methyltransferase 3b regulates nerve growth factor-induced differentiation of PC12 cells by recruiting histone deacetylase 2. Mol Cell Biol 2005, 25(2):751-766. 245. Edgar R, Domrachev M, Lash AE: Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 2002, 30(1):207-210. 246. Harris MJ, Juriloff DM: An update to the list of mouse mutants with neural tube closure defects and advances toward a complete genetic perspective of neural tube closure. Birth Defects Res A Clin Mol Teratol 2010, 88(8):653-669. 247. Greene ND, Stanier P, Copp AJ: Genetics of human neural tube defects. Hum Mol Genet 2009, 18(R2):R113-29. 248. Tost J, Gut IG: DNA methylation analysis by pyrosequencing. Nature Protocols 2007, 2(9):2265-2275. 249. Gillis J, Mistry M, Pavlidis P: Gene function analysis in complex data sets using ErmineJ. Nat Protoc 2010, 5(6):1148-1159. 250. Nazor KL, Altun G, Lynch C, et al: Recurrent variations in DNA methylation in human pluripotent stem cells and their differentiated derivatives. Cell Stem Cell 2012, 10(5):620-634. 251. Weisberg I, Tran P, Christensen B, Sibani S, Rozen R: A second genetic polymorphism in methylenetetrahydrofolate reductase (MTHFR) associated with decreased enzyme activity. Mol Genet Metab 1998, 64(3):169-172. 252. Rochtus A, Jansen K, Geet CV, Freson K: Nutri-epigenomic Studies Related to Neural Tube Defects: Does Folate Affect Neural Tube Closure Via Changes in DNA Methylation? Mini reviews in medicinal chemistry 2015, 15(13):1095-1102. 253. Madersbacher H: Neurogenic bladder dysfunction in patients with myelomeningocele. Curr Opin Urol 2002, 12(6):469-472. 254. Hunt GM, Whitaker RH: The pattern of congenital renal anomalies associated with neural-tube defects. 
Dev Med Child Neurol 1987, 29(1):91-95. 255. Hulton SA, Thomson PD, Milner LS, Isdale JM, Ling J: The pattern of congenital renal anomalies associated with neural tube defects. Pediatr Nephrol 1990, 4(5):491-492. 256. Gray H: PL Williams and LH Bannister, editors. Gray's anatomy: the anatomical basis of medicine and surgery. Edinburgh; New York: Churchill Livingstone; 1995. p.1825-1861. 257. Tiniakos D, Anagnostou V, Stavrakis S, Karandrea D, Agapitos E, Kittas C: Ontogeny of intrinsic innervation in the human kidney. Anat Embryol (Berl) 2004, 209(1):41-47. 258. Honnebier WJ, Swaab DF: The influence of anencephaly upon intrauterine growth of fetus and placenta and upon gestation length. J Obstet Gynaecol Br Commonw 1973, 80(7):577-588. 259. Batson JL, Winn K, Dubin NH, Parmley TH: Placental immaturity associated with anencephaly. Obstet Gynecol 1985, 65(6):846-847. 260. Hemberger M, Nozaki T, Winterhager E, et al: Parp1-deficiency induces differentiation of ES cells into trophoblast derivatives. Dev Biol 2003, 257(2):371-381. 261. Roper SJ, Chrysanthou S, Senner CE, et al: ADP-ribosyltransferases Parp1 and Parp7 safeguard pluripotency of ES cells. Nucleic Acids Res 2014, 42(14):8914-8927. 262. Tentori L, Lacal PM, Muzi A, et al: Poly(ADP-ribose) polymerase (PARP) inhibition or PARP-1 gene deletion reduces angiogenesis. Eur J Cancer 2007, 43(14):2124-2133. 263. Zhang X, Pei L, Li R, et al: Spina bifida in fetus is associated with an altered pattern of DNA methylation in placenta. J Hum Genet 2015, 60:605-611. 264. Blair JD, Langlois S, McFadden DE, Robinson WP: Overlapping DNA methylation profile between placentas with trisomy 16 and early-onset preeclampsia. Placenta 2014, 35(3):216-222. 265. Riley L and M Cowan. Noncommunicable Diseases Country Profiles 2014. [http://www.who.int/nmh/countries/can_en.pdf?ua=1] (2014). Accessed December 12, 2015. 266. Gallagher J: Mother's diet during pregnancy alters baby's DNA. 
BBC News 2011, [http://www.bbc.com/news/health-13119545] Accessed December 11, 2015. 267. Wilkinson E: Can genes explain rising obesity?. BBC News 2006, [http://news.bbc.co.uk/2/hi/health/5117752.stm] Accessed December 19, 2015. 268. Richardson SS, Daniels CR, Gillman MW, et al: Society: Don't blame the mothers. Nature 2014, 512(7513):131-132. 269. Hurley D: Grandma's Experiences Leave a Mark on Your Genes. Discover 2015, [http://discovermagazine.com/2013/may/13-grandmas-experiences-leave-epigenetic-mark-on-your-genes] Accessed December 19, 2015. 270. Feil R, Fraga MF: Epigenetics and the environment: emerging patterns and implications. Nat Rev Genet 2012, 13(2):97-109. 271. Perera F, Herbstman J: Prenatal environmental exposures, epigenetics, and disease. Reprod Toxicol 2011, 31(3):363-373. 272. Horvath S: DNA methylation age of human tissues and cell types. Genome Biol 2013, 14(10):1-20. 273. Simpkin AJ, Hemani G, Suderman M, et al: Prenatal and early life influences on epigenetic age in children: a study of mother-offspring pairs from two cohort studies. Hum Mol Genet 2016, 25(1):191-201. 274. Joubert BR, Haberg SE, Nilsen RM, et al: 450K epigenome-wide scan identifies differential DNA methylation in newborns related to maternal smoking during pregnancy. Environ Health Perspect 2012, 120(10):1425-1431. 275. Lee KW, Richmond R, Hu P, et al: Prenatal exposure to maternal cigarette smoking and DNA methylation: epigenome-wide association in a discovery sample of adolescents and replication in an independent cohort at birth through 17 years of age. Environ Health Perspect 2015, 123(2):193-199. 276. Richmond RC, Simpkin AJ, Woodward G, et al: Prenatal exposure to maternal smoking and offspring DNA methylation across the lifecourse: findings from the Avon Longitudinal Study of Parents and Children (ALSPAC). Hum Mol Genet 2015, 24(8):2201-2217. 277. 
Markunas CA, Xu Z, Harlid S, et al: Identification of DNA methylation changes in newborns related to maternal smoking during pregnancy. Environ Health Perspect 2014, 122:1147-1153. 278. Novakovic B, Ryan J, Pereira N, Boughton B, Craig JM, Saffery R: Postnatal stability, tissue, and time specific effects of AHRR methylation change in response to maternal smoking in pregnancy. Epigenetics 2014, 9(3):377-386. 279. Nielsen CH, Larsen A, Nielsen AL: DNA methylation alterations in response to prenatal exposure of maternal cigarette smoking: A persistent epigenetic impact on health from maternal lifestyle? Arch Toxicol 2016, 20(2):231-245. 280. Hogg K, Price EM, Robinson WP: Improved reporting of DNA methylation data derived from studies of the human placenta. Epigenetics 2014, 9(3):333-337. 281. Burton GJ, Sebire NJ, Myatt L, et al: Optimising sample collection for placental research. Placenta 2014, 35(1):9-22. 282. Houseman EA, Accomando WP, Koestler DC, et al: DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinformatics 2012, 13(86). 283. Montano CM, Irizarry RA, Kaufmann WE, et al: Measuring cell-type specific differential methylation in human brain tissue. Genome Biol 2013, 14(8):R94. 284. Jones MJ, Farre P, McEwen LM, et al: Distinct DNA methylation patterns of cognitive impairment and trisomy 21 in Down syndrome. BMC Med Genomics 2013, 6(58). 285. Chen Y, Choufani S, Grafodatskaya D, Butcher DT, Ferreira J, Weksberg R: Cross-Reactive DNA Microarray Probes Lead to False Discovery of Autosomal Sex-Associated DNA Methylation. Am J Hum Genet 2012, 91(4):762-764. 286. Chen Y, Lemire M, Choufani S, et al: Discovery of cross-reactive probes and polymorphic CpGs in the Illumina Infinium HumanMethylation450 microarray. Epigenetics 2013, 8(2):203-209. 287. 
Naeem H, Wong NC, Chatterton Z, et al: Reducing the risk of false discovery enabling identification of biologically significant genome-wide methylation status using the HumanMethylation450 array. BMC Genomics 2014, 15(1):51. 288. Okamura K, Kawai T, Hata K, Nakabayashi K: Lists of HumanMethylation450 BeadChip probes with nucleotide-variant information obtained from the phase 3 data of the 1000 genomes project. Genomics Data 2016, 7:67-69. 289. Daca-Roszak P, Pfeifer A, Żebracka-Gala J, et al: Impact of SNPs on methylation readouts by Illumina Infinium HumanMethylation450 BeadChip Array: implications for comparative population studies. BMC Genomics 2015, 16(1):1003. 290. Barrow TM, Byun H: Single nucleotide polymorphisms on DNA methylation microarrays: precautions against confounding. Epigenomics 2014, 6(6):577. 291. Morris TJ, Beck S: Analysis pipelines and packages for Infinium HumanMethylation450 BeadChip (450k) data. Methods 2015, 72:3-8. 292. Aryee MJ, Jaffe AE, Corrada-Bravo H, et al: Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays. Bioinformatics 2014, 30(10):1363-1369. 293. Xiao J, Shen F, Xue Q, et al: Is ethnicity a risk factor for developing preeclampsia? An analysis of the prevalence of preeclampsia in China. J Hum Hypertens 2014, 28(11):694-698. 294. Singmann P, Shem-Tov D, Wahl S, et al: Characterization of whole-genome autosomal differences of DNA methylation between men and women. Epigenetics Chromatin 2015, 8:43. 295. Heyn H, Moran S, Hernando-Herraez I, et al: DNA methylation contributes to natural human variation. Genome Res 2013, 23(9):1363-1372. 296. Gutierrez-Arcelus M, Lappalainen T, Montgomery SB, et al: Passive and active DNA methylation and the interplay with genetic variation in gene regulation. Elife 2013, 2:e00523. 297. Yong E: No, Scientists Have Not Found the ‘Gay Gene’. 
The Atlantic 2015, [http://www.theatlantic.com/science/archive/2015/10/no-scientists-have-not-found-the-gay-gene/410059/] Accessed December 18, 2015. 298. Ngun TC, Guo W, Ghahramani NM, Purkayastha K, Conn D, Sanchez FJ, Bocklandt S, Zhang M, Ramirez CM, Pellegrini M, Vilain E: A novel predictive model of sexual orientation using epigenetic markers [abstract]. Presented at the American Society of Human Genetics 2015 Annual Meeting, Baltimore, MD; 2015. 299. ASHG: Epigenetic Algorithm Accurately Predicts Male Sexual Orientation: Findings Reported at ASHG 2015 Annual Meeting. American Society of Human Genetics Press Release 2015, [http://www.ashg.org/press/201510-sexual-orientation.html] Accessed December 18, 2015. 300. Reardon S: Epigenetic 'tags' linked to homosexuality in men. Nature News 2015, [http://www.nature.com/news/epigenetic-tags-linked-to-homosexuality-in-men-1.18530] Accessed December 18, 2015. 301. Psychiatric GWAS Consortium Coordinating Committee: Genomewide association studies: history, rationale, and prospects for psychiatric disorders. Am J Psychiatry 2009, 166(5):540-556. 302. Cantor RM, Lange K, Sinsheimer JS: Prioritizing GWAS results: a review of statistical methods and recommendations for their application. The American Journal of Human Genetics 2010, 86(1):6-22. 303. Visscher PM, Brown MA, McCarthy MI, Yang J: Five years of GWAS discovery. The American Journal of Human Genetics 2012, 90(1):7-24. 304. Cotton AM, Price EM, Jones MJ, Balaton BP, Kobor MS, Brown CJ: Landscape of DNA methylation on the X chromosome reflects CpG density, functional chromatin state and X-chromosome inactivation. Hum Mol Genet 2015, 24(6):1528-1539. 305. ENCODE Project Consortium: An integrated encyclopedia of DNA elements in the human genome. Nature 2012, 489(7414):57-74. 306. Bae J: Perspectives of international human epigenome consortium. Genomics & informatics 2013, 11(1):7-14. 307. 
Bernstein BE, Stamatoyannopoulos JA, Costello JF, et al: The NIH roadmap epigenomics mapping consortium. Nat Biotechnol 2010, 28(10):1045-1048. 308. Weinstein JN, Collisson EA, Mills GB, et al: The cancer genome atlas pan-cancer analysis project. Nat Genet 2013, 45(10):1113-1120.

Appendices

Appendix A  Supplementary tables and figures for Chapter 2

Supplementary Table 2.1 Optical density (OD) measured for replicate standard curves using MethylFlash kit
ng ME4 loaded | Standard curve 1 | Standard curve 2 | Replicate average | Replicate SD | Replicate CV (%)
0 | 0.091 | 0.054 | 0.07 | 0.03 | 42.86
0.5 | 0.976 | 0.363 | 0.67 | 0.43 | 64.18
1 | 1.339 | 0.482 | 0.91 | 0.61 | 67.03
2 | 1.601 | 0.819 | 1.21 | 0.55 | 45.45
5 | 1.543 | 1.031 | 1.29 | 0.36 | 27.91
10 | 1.078 | 0.861 | 0.97 | 0.15 | 15.46

Supplementary Table 2.2 Mean DNAm and standard deviation of placental and somatic tissues at five ReDS
A
Tissue | n | Alu mean (%) | Alu SD | L1 mean (%) | L1 SD | Strong islands mean (β) | SD | Weak islands mean (β) | SD | Non-islands mean (β) | SD | 27k array data published
1st trimester | 10 | 19.75 | 2.21 | 62.69 | 7.00 | 0.1115 | 0.61 | 0.3612 | 0.99 | 0.4699 | 1.75 | Novakovic 2011
2nd trimester | 11 | 22.99 | 1.27 | 62.52 | 4.60 | 0.1121 | 0.49 | 0.3744 | 0.65 | 0.4960 | 1.20 | Novakovic 2011
Term | 10 | 21.34 | 0.93 | 62.54 | 2.78 | 0.1182 | 0.63 | 0.3810 | 0.68 | 0.5201 | 0.91 | Novakovic 2011
B
Tissue | n | Alu mean (%) | Alu SD | L1 mean (%) | L1 SD | Strong islands mean (β) | SD | Weak islands mean (β) | SD | Non-islands mean (β) | SD | 27k array data published
Brain | 8 | 22.41 | 4.34 | 84.99 | 3.32 | 10.42 | 0.75 | 41.31 | 0.49 | 60.89 | 1.48 | not published
Kidney | 11 | 23.96 | 2.24 | 81.11 | 3.33 | 9.80 | 0.47 | 38.83 | 0.47 | 57.91 | 0.93 | not published
Muscle | 10 | 23.95 | 1.83 | 82.82 | 2.44 | 9.83 | 0.32 | 38.22 | 0.34 | 54.85 | 0.54 | not published
Blood | 10 | 24.06 | 0.26 | 85.79 | 1.92 | 11.30 | 0.67 | 43.53 | 0.36 | 63.22 | 0.58 | Yuen 2011
Mean and standard deviation for DNAm at five ReDS in (A) chorionic villi at three gestational ages and (B) somatic tissues (brain, kidney and muscle tissues obtained from 2nd trimester fetuses and adult blood).
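For each amount of ME4 loaded, Supplementary Table 2.1 summarizes the two standard-curve OD readings as a mean, an SD and a coefficient of variation (CV = SD / mean × 100). The Python sketch below recomputes that summary. It rests on two assumptions that are not stated in the table. First, the SD is taken to be the sample (n - 1) standard deviation. Second, the CV is taken to be calculated from the mean and SD after each was rounded to two decimals. Both assumptions were inferred because together they reproduce the published columns.

```python
from statistics import mean, stdev

# Duplicate OD readings (standard curve 1, standard curve 2) keyed by ng of ME4
# loaded, transcribed from Supplementary Table 2.1.
STANDARD_CURVES = {
    0: (0.091, 0.054),
    0.5: (0.976, 0.363),
    1: (1.339, 0.482),
    2: (1.601, 0.819),
    5: (1.543, 1.031),
    10: (1.078, 0.861),
}


def replicate_summary(ods):
    """Return (mean, SD, CV%) for replicate OD readings.

    Assumption: sample SD (n - 1), and CV computed from the mean and SD
    after rounding each to two decimals.
    """
    avg = round(mean(ods), 2)
    sd = round(stdev(ods), 2)
    cv = round(sd / avg * 100, 2)
    return avg, sd, cv


if __name__ == "__main__":
    for ng, ods in STANDARD_CURVES.items():
        avg, sd, cv = replicate_summary(ods)
        print(f"{ng:>4} ng  mean={avg:.2f}  SD={sd:.2f}  CV={cv:.2f}%")
```

If the intermediate rounding is skipped, the 0.5 ng row gives a CV of about 64.7% instead of 64.18%. This mismatch is why the sketch rounds before dividing.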
The final column of Supplementary Table 2.2 identifies samples in this study for which 27k array data have been published previously (Novakovic 2011; Yuen 2011).

Supplementary Figure 2.1 No sex differences in dispersed DNA methylation.
For each of the five measures of dispersed DNAm, a male-to-female comparison within chorionic villi and somatic tissues revealed no differences by sex. Blood samples were excluded from this analysis because all were female. Sex chromosome probes on the 27k array were removed prior to all analyses.

Supplementary Figure 2.2 Variable correlation of five measures of dispersed DNAm in 1st trimester chorionic villi.
Spearman rank-order correlation tests were performed to compare each measure of dispersed DNAm with every other measure. Each box in the upper panels is an individual scatter plot comparing DNAm measured by assay 1 (x axis) with DNAm measured by assay 2 (y axis). The DNAm scale for each assay is shown outside the correlation matrix, in line with the assay label. Each dot represents the DNAm of one individual measured by both assays. In the lower panels, each box gives the Spearman rank-order correlation (r) and p-value for assay 1 (x axis) versus assay 2 (y axis). Bold boxes highlight significant correlations (p ≤ 0.05) between different measures of DNAm.

Supplementary Figure 2.3 Variable correlation of five measures of dispersed DNAm in 2nd trimester chorionic villi.
Spearman rank-order correlation tests were performed to compare each measure of dispersed DNAm with every other measure. See Supplementary Figure 2.2 for aid in interpreting the graph.

Supplementary Figure 2.4 Variable correlation of five measures of dispersed DNAm in term chorionic villi.
Spearman rank-order correlation tests were performed to compare each measure of dispersed DNAm with every other measure.
See Supplementary Figure 2.2 for aid in interpreting the graph.

Supplementary Figure 2.5 Variable correlation of five measures of dispersed DNAm in brain.
Spearman rank-order correlation tests were performed to compare each measure of dispersed DNAm with every other measure. See Supplementary Figure 2.2 for aid in interpreting the graph.

Supplementary Figure 2.6 Variable correlation of five measures of dispersed DNAm in kidney.
Spearman rank-order correlation tests were performed to compare each measure of dispersed DNAm with every other measure. See Supplementary Figure 2.2 for aid in interpreting the graph.

Supplementary Figure 2.7 Variable correlation of five measures of dispersed DNAm in muscle.
Spearman rank-order correlation tests were performed to compare each measure of dispersed DNAm with every other measure. See Supplementary Figure 2.2 for aid in interpreting the graph.

Supplementary Figure 2.8 Variable correlation of five measures of dispersed DNAm in blood.
Spearman rank-order correlation tests were performed to compare each measure of dispersed DNAm with every other measure. See Supplementary Figure 2.2 for aid in interpreting the graph.

Supplementary Figure 2.9 Variable correlation of five measures of dispersed DNAm in the merged somatic tissue group.
Spearman rank-order correlation tests were performed to compare each measure of dispersed DNAm with every other measure. See Supplementary Figure 2.2 for aid in interpreting the graph.

Supplementary Figure 2.10 Observed vs. expected number of probes in three categories of distance to the nearest transcription start site (TSS).
(A) ±0 to 500 bp; (B) ±501 to 1,000 bp; and (C) ±1,001 to 1,500 bp. Close to the TSS (A), strong island probes were overrepresented and non-island probes underrepresented. Conversely, distal to the TSS (C), non-island probes were overrepresented and strong island probes underrepresented.
Hatched bars indicate the expected number of probes under a chi-squared test, and solid bars the observed number of probes, within each island group.

Appendix B  Supplementary tables and figures for Chapter 3

Supplementary Table 3.1 Calculation of intended genomic location of 450k probes
Probe interval | Strand | Probe type | Start column name (calculation) | End column name (calculation)
Entire probe | F | I | Probe start (MAPINFO-1) | Probe end (MAPINFO+49)
Entire probe | F | II | Probe start (MAPINFO) | Probe end (MAPINFO+50)
Entire probe | R | I | Probe start (MAPINFO-49) | Probe end (MAPINFO+1)
Entire probe | R | II | Probe start (MAPINFO-50) | Probe end (MAPINFO)
Probe without CpG | F | I | Probe start (MAPINFO+1) | Probe end (MAPINFO+49)
Probe without CpG | F | II | Probe start (MAPINFO+1) | Probe end (MAPINFO+50)
Probe without CpG | R | I | Probe start (MAPINFO-49) | Probe end (MAPINFO-1)
Probe without CpG | R | II | Probe start (MAPINFO-50) | Probe end (MAPINFO-1)
CpG | F & R | I & II | CpG start (MAPINFO-1) | CpG end (MAPINFO+1)
C | F & R | I & II | C start (MAPINFO-1) | MAPINFO

Supplementary Table 3.2 Primers for genotype validation of a target CpG SNP
IlmnID: cg06961873
Probe type: II
SNP ID: rs61775206
refNCBI: G
Observed: A/G
avHet: 0
F primer: 5’ biotin-CTGCGGATCCTGCTGTAAAA 3’
R primer: 5’ TTAGCCATCGTTACAGTCTCTCG 3’
S primer: 5’ GATGTAGCTGTGGTGGTA 3’

Supplementary Table 3.3 Frequency of target SNP CpG/probe
# of SNPs/probe | Frequency | Percent of probes with target SNP CpG
1 | 20,270 | 97.13
2 | 574 | 2.75
3 | 21 | 0.1
4 | 2 | 0.01
5 | 2 | 0.01
Total | 20,870 | 100

Supplementary Table 3.4 List of autosomal probes with sex differences in DNAm
ID_REF | AbsDeltaB | XY_Hits | Autosomal_Hits | n_bp_repetitive | AlleleA_Hits | AlleleB_Hits
cg04462931 | 41.08% | XY_YES | A_YES | 48 | 6 | 0
cg11955727 | 27.29% | XY_YES | A_YES | 48 | 5 | 0
cg19097082 | 26.34% | XY_YES | A_YES | 50 | 4 | 1
cg00399683 | 25.29% | XY_YES | A_YES | 50 | 6 | 0
cg12949927 | 24.36% | XY_YES | A_YES | 49 | 6 | 0
cg17765025 | 24.31% | XY_YES | A_YES | 49 | 4 | 0
cg27540865 | 23.65% | XY_YES | A_NO | 0 | 2 | 2
cg20926353 | 23.03% | XY_YES | A_NO | 0 | 2 | 0
cg00167275 | 22.93% | XY_YES | A_NO | 0 | 2 | 2
cg23256579 | 22.90% | XY_NO | A_NO | 0 | 1 | 0
cg03911306 | 22.02% | XY_YES | A_NO | 50 | 5 | 0
cg01620164 | 20.65% | XY_NO | A_NO | 0 | 1 | 1
cg08037478 | 20.45% | XY_YES | A_NO | 0 | 3 | 1
cg00157199 | 20.25% | XY_NO | A_NO | 0 | 1 | 1
cg00804338 | 18.19% | XY_YES | A_NO | 13 | 2 | 2
cg14079463 | 17.68% | XY_NO | A_NO | 0 | 1 | 0
cg11643285 | 17.42% | XY_NO | A_NO | 0 | 1 | 0
cg16218221 | 17.34% | XY_YES | A_NO | 0 | 2 | 0
cg00774458 | 17.06% | XY_YES | A_YES | 0 | 8 | 9
cg14815891 | 17.04% | XY_NO | A_YES | 0 | 4 | 1
cg14361252 | 16.95% | XY_NO | A_NO | 0 | 1 | 1
cg26921482 | 16.68% | XY_NO | A_YES | 25 | 8 | 1
cg12691488 | 16.42% | XY_NO | A_NO | 0 | 1 | 0
cg03151810 | 16.19% | XY_NO | A_NO | 0 | 1 | 1
cg11388673 | 16.02% | XY_NO | A_NO | 0 | 1 | 1
cg04580344 | 15.52% | XY_NO | A_YES | 0 | 2 | 0
cg15606473 | 14.56% | XY_NO | A_YES | 3 | 6 | 2
cg00500229 | 14.28% | XY_YES | A_NO | 0 | 2 | 0
cg07753967 | 13.99% | XY_NO | A_YES | 0 | 3 | 3
cg20811988 | 13.95% | XY_NO | A_YES | 0 | 2 | 2
cg08656326 | 13.91% | XY_YES | A_NO | 47 | 2 | 0
cg10341310 | 13.89% | XY_YES | A_NO | 0 | 2 | 0
cg11092486 | 13.75% | XY_NO | A_NO | 0 | 1 | 0
cg08931129 | 12.98% | XY_NO | A_YES | 0 | 2 | 1
cg11240062 | 12.86% | XY_NO | A_NO | 0 | 1 | 1
cg17226602 | 12.79% | XY_YES | A_NO | 0 | 2 | 0
cg27615582 | 12.61% | XY_NO | A_NO | 0 | 1 | 0
cg16609139 | 12.34% | XY_YES | A_YES | 48 | 16 | 0
cg17292758 | 12.33% | XY_NO | A_NO | 0 | 1 | 0
cg15011775 | 12.29% | XY_YES | A_YES | 0 | 5 | 0
cg06358300 | 12.18% | XY_YES | A_NO | 0 | 2 | 0
cg14095100 | 12.11% | XY_YES | A_NO | 0 | 2 | 0
cg08981669 | 12.07% | XY_YES | A_YES | 0 | 10 | 0
cg10563109 | 11.88% | XY_NO | A_NO | 0 | 1 | 1
cg13150977 | 11.86% | XY_YES | A_YES | 0 | 9 | 0
cg16293892 | 11.80% | XY_NO | A_NO | 0 | 1 | 1
cg03894796 | 11.71% | XY_NO | A_NO | 0 | 1 | 0
cg23250574 | 11.71% | XY_NO | A_NO | 0 | 1 | 1
cg07381872 | 11.52% | XY_YES | A_NO | 45 | 2 | 0
cg07004386 | 11.50% | XY_NO | A_NO | 0 | 1 | 0
cg19382572 | 11.41% | XY_NO | A_YES | 50 | 8 | 0
cg05132077 | 11.37% | XY_NO | A_NO | 0 | 1 | 1
cg16440561 | 11.24% | XY_NO | A_NO | 0 | 1 | 1
cg27535677 | 11.08% | XY_NO | A_NO | 0 | 1 | 1
cg15756407 | 10.94% | XY_YES | A_NO | 0 | 2 | 0
cg13565129 | 10.93% | XY_NO | A_NO | 0 | 1 | 1
cg26865747 | 10.93% | XY_NO | A_NO | 0 | 1 | 1
cg20390711 | 10.92% | XY_NO | A_NO | 0 | 1 | 1
cg17238319 | 10.90% | XY_YES | A_NO | 0 | 2 | 0
cg25034424 | 10.89% | XY_NO | A_NO | 50 | 1 | 1
cg11012412 | 10.86% | XY_NO | A_NO | 0 | 1 | 0
cg06972969 | 10.82% | XY_NO | A_NO | 23 | 1 | 1
cg20432211 | 10.78% | XY_NO | A_NO | 0 | 1 | 1
cg25653641 | 10.63% | XY_NO | A_YES | 0 | 2 | 1
cg02026141 | 10.61% | XY_NO | A_NO | 0 | 1 | 0
cg24919522 | 10.61% | XY_YES | A_YES | 0 | 10 | 0
cg07586008 | 10.44% | XY_NO | A_NO | 0 | 1 | 1
cg05017628 | 10.32% | XY_NO | A_NO | 0 | 1 | 1
cg04858776 | 10.30% | XY_NO | A_YES | 37 | 2 | 1
cg24920126 | 10.28% | XY_NO | A_NO | 0 | 1 | 1
cg22549041 | 10.24% | XY_NO | A_NO | 0 | 1 | 1
cg12906381 | 10.20% | XY_NO | A_NO | 50 | 1 | 1
cg02556954 | 10.17% | XY_YES | A_NO | 0 | 2 | 0
cg25294185 | 10.02% | XY_YES | A_NO | 0 | 2 | 2
cg19590578 | 10.00% | XY_NO | A_NO | 46 | 1 | 0

Supplementary Table 3.5 Distribution of probes within Illumina- and HIL-annotated CpG classes
HIL CpG class (rows) / Illumina CpG class (columns) | Island | Shore | Shelf | Sea | Total
HC | 119,445 | 29,889 | 492 | 4,033 | 153,859
ICshore | 10,056 | 22,069 | 405 | 1,425 | 33,955
IC | 20,378 | 23,644 | 14,191 | 60,514 | 118,727
LC | 375 | 36,465 | 32,056 | 110,075 | 178,971
Total | 150,254 | 112,067 | 47,144 | 176,047 | 485,512

Supplementary Table 3.6 Distribution of β values within Illumina- and HIL-annotated CpG classes for blood
CpG class | Hypomethylated (β 0 to ≤0.2) | Heterogeneously methylated (β >0.2 to <0.8) | Hypermethylated (β ≥0.8 to 1.0) | Total
Illumina CpG class
Island | 98,869 (72.32) | 19,577 (14.32) | 18,266 (13.36) | 136,712
Shore | 34,014 (33.99) | 30,019 (29.99) | 36,050 (36.02) | 100,083
Shelf | 1,975 (4.96) | 8,961 (22.50) ns | 28,897 (72.55) | 39,833
Sea | 13,964 (9.21) | 36,992 (24.40) | 100,632 (66.39) | 151,588
Total | 148,822 (34.75) | 95,549 (22.31) | 183,845 (42.93) | 428,216
HIL CpG class
HC | 110,768 (79.22) | 19,915 (14.24) | 9,143 (6.54) | 139,826
ICshore | 14,033 (46.06) | 9,774 (32.08) | 6,660 (21.86) | 30,467
IC | 13,660 (13.63) | 23,449 (23.41) | 63,055 (62.95) | 100,164
LC | 10,361 (6.57) | 42,411 (26.88) | 104,987 (66.55) | 157,759
Total | 148,822 | 95,549 | 183,845 | 428,216
Numbers in brackets are the percentage of probes relative to the CpG class (i.e., row) total. Fisher's exact test was used to calculate a p-value for the distribution of probes across the three levels of DNAm. All comparisons were significant at p < 2.2x10^-16 except those marked ns (non-significant).

Supplementary Table 3.7 Enrichment of differentially methylated (DM) probes in Illumina- and HIL-annotated CpG classes
CpG class | % of all probes | Blood vs. buccal: % DM probes | % relative enrichment | p-value | Buccal vs. chorionic villi: % DM probes | % relative enrichment | p-value | Chorionic villi vs. blood: % DM probes | % relative enrichment | p-value
Illumina CpG classes
Island | 31.92% | 18.49% | -42.09% | <6.8x10^-160 | 16.14% | -49.43% | <6.8x10^-160 | 18.44% | -42.24% | <6.8x10^-160
Shore | 23.37% | 27.52% | 17.77% | <6.8x10^-160 | 24.46% | 4.66% | 9.2x10^-13 | 25.67% | 9.85% | <6.8x10^-160
Shelf | 9.30% | 10.08% | 8.38% | 8.3x10^-13 | 12.09% | 30.03% | <6.8x10^-160 | 11.02% | 18.44% | <2.2x10^-16
Sea | 35.41% | 43.91% | 24.01% | <6.8x10^-160 | 47.31% | 33.59% | <6.8x10^-160 | 44.88% | 26.73% | <6.8x10^-160
HIL CpG classes
HC | 32.65% | 17.31% | -46.99% | <6.8x10^-160 | 12.60% | -61.41% | <6.8x10^-160 | 15.37% | -52.93% | <6.8x10^-160
ICshore | 7.11% | 8.53% | 19.86% | <6.8x10^-160 | 6.67% | -6.21% | 1.2x10^-6 | 7.49% | 5.23% | 6.2x10^-6
IC | 23.39% | 25.71% | 9.90% | <6.8x10^-160 | 31.33% | 33.95% | <6.8x10^-160 | 31.83% | 36.06% | <6.8x10^-160
LC | 36.84% | 48.45% | 31.52% | <6.8x10^-160 | 49.40% | 34.08% | <6.8x10^-160 | 45.32% | 23.01% | <6.8x10^-160
Negative values indicate negative relative enrichment (depletion).

Supplementary Table 3.8 Average DNAm and standard deviation (SD) of nine gene features
HIL class | Component | Location | # of probes | Blood (n=4) avg β | Blood SD | Chorionic villi (n=4) avg β | Chorionic villi SD | Buccal (n=4) avg β | Buccal SD
HC | 1st Exon | 5'UTR | 16246 | 0.092 | 0.11 | 0.08 | 0.223 | 0.069 | 0.098
HC | 1st Exon | Body | 5819 | 0.141 | 0.186 | 0.126 | 0.186 | 0.105 | 0.174
HC | 1st Exon | 3'UTR | 956 | 0.156 | 0.203 | 0.133 | 0.177 | 0.128 | 0.203
HC | Exon | 5'UTR | 460 | 0.219 | 0.276 | 0.208 | 0.242 | 0.192 | 0.276
HC | Exon | Body | 4587 | 0.502 | 0.374 | 0.44 | 0.323 | 0.46 | 0.381
HC | Exon | 3'UTR | 1587 | 0.425 | 0.361 | 0.399 | 0.32 | 0.382 | 0.359
HC | Intron | 5'UTR | 12493 | 0.134 | 0.173 | 0.122 | 0.16 | 0.102 | 0.154
HC | Intron | Body | 19320 | 0.242 | 0.293 | 0.221 | 0.26 | 0.204 | 0.284
HC | Intron | 3'UTR | 1198 | 0.167 | 0.2 | 0.168 | 0.199 | 0.133 | 0.193
ICshore | 1st Exon | 5'UTR | 1072 | 0.191 | 0.236 | 0.181 | 0.213 | 0.143 | 0.209
ICshore | 1st Exon | Body | 493 | 0.394 | 0.35 | 0.341 | 0.35 | 0.345 | 0.353
ICshore | 1st Exon | 3'UTR | 201 | 0.371 | 0.329 | 0.3 | 0.279 | 0.283 | 0.314
ICshore | Exon | 5'UTR | 158 | 0.376 | 0.308 | 0.383 | 0.262 | 0.313 | 0.305
ICshore | Exon | Body | 1492 | 0.647 | 0.333 | 0.56 | 0.305 | 0.58 | 0.35
ICshore | Exon | 3'UTR | 706 | 0.594 | 0.343 | 0.529 | 0.297 | 0.526 | 0.345
ICshore | Intron | 5'UTR | 3256 | 0.272 | 0.281 | 0.25 | 0.246 | 0.215 | 0.259
ICshore | Intron | Body | 6925
0.44 0.354 0.391 0.309 0.382 0.353 3'UTR 340 0.362 0.31 0.313 0.259 0.289 0.29 IC 1st Exon 5'UTR 2372 0.355 0.34 0.305 0.282 0.314 0.334 Body 1029 0.6 0.35 0.5 0.35 0.582 0.363 3'UTR 232 0.622 0.338 0.495 0.285 0.591 0.339 Exon 5'UTR 398 0.634 0.351 0.535 0.307 0.604 0.371 Body 13097 0.864 0.169 0.761 0.216 0.831 0.208 3'UTR 3852 0.82 0.219 0.727 0.243 0.771 0.252 Intron 5'UTR 4798 0.63 0.342 0.537 0.309 0.567 0.359 Body 30436 0.786 0.249 0.682 0.261 0.741 0.283 3'UTR 808 0.644 0.312 0.552 0.287 0.594 0.328 LC 1st Exon 5'UTR 2365 0.616 0.321 0.541 0.278 0.579 0.33 Body 1116 0.777 0.215 0.615 0.215 0.775 0.241 3'UTR 395 0.714 0.284 0.635 0.265 0.688 0.305 Exon 5'UTR 534 0.754 0.263 0.673 0.252 0.729 0.291 Body 5712 0.847 0.174 0.759 0.21 0.813 0.215 3'UTR 8586 0.847 0.178 0.769 0.211 0.808 0.221 Intron 5'UTR 12847 0.757 0.255 0.674 0.259 0.699 0.302 Body 48514 0.791 0.231 0.7 0.248 0.739 0.279 3'UTR 2085 0.776 0.236 0.678 0.25 0.73 0.284   Total 216485       175   Supplementary Table 3.9 Enrichment of tissue differentially methylated (tDM) probes within gene features     Blood vs. buccal Buccal vs. chorionic villi Chorionic villi vs. 
blood HIL CpG Class Component Location % of class probes % DM probes % Relative enrichment p-value % DM probes % Relative enrichment p-value % DM probes % Relative enrichment p-value HC 1st Exon 5' UTR 25.90 12.83 -50.49 1.3x10-112 8.40 -67.55 2.8x10-156 11.23 -56.63 6.8x10-160 Body 9.28 8.78 -5.32 0.100 7.53 -18.88 2.9x10-5 7.40 -20.28 9.x10-8 3'UTR 1.52 1.55 1.43 0.447 1.40 -8.34 0.245 1.64 7.60 0.223 Exon 5' UTR 0.73 0.98 34.15 0.013 0.95 29.04 0.048 0.96 30.80 0.017 Body 7.40 12.67 71.15 <6.8x10-160 20.75 180.40 <6.8x10-160 17.64 138.33 <6.8x10-160 3'UTR 2.53 4.43 74.97 <6.8x10-160 5.70 125.30 <6.8x10-160 5.63 122.59 <6.8x10-160 Intron 5' UTR 19.92 18.55 -6.86 0.005 13.36 -32.92 3.9x10-28 14.98 -24.81 1.3x10-23 Body 30.80 37.83 22.80 <6.8x10-160 40.22 30.57 <6.8x10-160 38.59 25.28 <6.8x10-160 3'UTR 1.91 2.39 25.09 0.004 1.69 -11.52 0.142 1.93 1.26 0.444 ICshore 1st Exon 5' UTR 7.31 4.13 -43.53 2.6x10-11 4.70 -35.78 4.6x10-7 3.95 -45.95 1.0x10-13 Body 3.36 2.01 -40.15 2.9x10-5 3.27 -2.74 0.401 3.27 -2.68 0.388 3'UTR 1.37 1.42 3.77 0.406 1.17 -14.37 0.204 1.20 -12.17 0.207 Exon 5' UTR 1.08 0.94 -13.07 0.232 1.22 12.83 0.257 0.96 -11.19 0.253 Body 10.30 10.51 2.08 0.353 13.25 28.64 1.1x10-6 13.55 31.60 5.5x10-10 3'UTR 4.82 5.76 19.61 0.009 6.04 25.38 0.003 6.61 37.20 9.6x10-7 Intron 5' UTR 22.21 23.77 7.02 0.022 19.66 -11.46 0.001 18.96 -14.64 4.2x10-6 Body 47.23 48.79 3.28 0.048 48.51 2.70 0.106 48.81 3.34 0.036 3'UTR 2.32 2.67 15.21 0.104 2.18 -5.98 0.326 2.69 15.82 0.083 IC 1st Exon 5' UTR 4.15 5.01 20.76 1.0x10-5 3.29 -20.71 7.5x10-7 3.79 -8.59 0.014 Body 1.80 1.51 -16.14 0.016 2.17 20.81 0.001 2.01 11.79 0.025 3'UTR 0.41 0.48 18.92 0.117 0.43 4.79 0.366 0.47 15.80 0.108 Exon 5' UTR 0.70 0.73 4.71 0.348 0.74 6.03 0.286 0.74 6.48 0.252 176       Blood vs. buccal Buccal vs. chorionic villi Chorionic villi vs. 
blood HIL CpG Class Component Location % of class probes % DM probes % Relative enrichment p-value % DM probes % Relative enrichment p-value % DM probes % Relative enrichment p-value Body 23.18 17.70 -23.65 6.3x10-38 23.33 0.64 0.346 21.93 -5.39 1.4x10-4 3'UTR 6.74 6.63 -1.56 0.340 5.84 -13.31 3.2x10-5 6.32 -6.19 0.020 Intron 5' UTR 8.39 10.90 29.93 <6.8x10-160 8.67 3.35 0.129 8.58 2.29 0.197 Body 53.23 55.44 4.15 6.4x10-6 54.20 1.82 0.015 54.46 2.32 0.001 3'UTR 1.41 1.60 13.33 0.058 1.33 -5.76 0.221 1.69 19.42 0.002 LC 1st Exon 5' UTR 2.88 4.46 55.10 <6.8x10-160 3.20 11.29 0.004 4.05 40.77 <6.8x10-160 Body 1.36 1.19 -12.58 0.027 1.99 46.50 8.0 x10-14 1.88 38.39 8.0x10-11 3'UTR 0.48 0.42 -11.61 0.145 0.46 -4.49 0.337 0.52 7.29 0.236 Exon 5' UTR 0.65 0.51 -21.19 0.012 0.72 10.18 0.133 0.73 12.94 0.069 Body 7.02 5.15 -26.61 4.4x10-22 6.69 -4.79 0.038 6.67 -5.01 0.025 3'UTR 10.44 7.98 -23.58 2.4x10-26 8.41 -19.49 1.1x10-19 8.63 -17.32 2.2x10-17 Intron 5' UTR 15.63 17.89 14.51 1.1x10-16 16.64 6.49 7.9x10-5 16.21 3.72 0.011 Body 59.01 59.80 1.34 0.018 59.30 0.50 0.210 58.78 -0.38 0.258 3'UTR 2.54 2.59 2.07 0.331 2.60 2.54 0.290 2.53 -0.33 0.470 Numbers in red highlight negative relative enrichment 177     Supplementary Figure 3.1 Distribution of DNAm at three highly variable probes The level of DNAm was plotted for three highly variable probes (SD in ß ≥0.25) annotated with a target CpG SNP, across the 261 individuals in the aging dataset. cg06961873 corresponds to the CpG site genotyped in Figure 3.4D. A trimodal pattern of DNAm was observed at these three exemplary sites, indicating that DNAm measured at these sites may reflect sample genotype. 178    Supplementary Figure 3.2 Distinct patterns of DNAm across Illumina and HIL -annotated CpG classes in blood Density curves were plotted using average ß values for probes within each Illumina-annotated and HIL-annotated CpG class in blood (n=4). 
The number of probes contributing to each curve was: Island=136,712, Shore=100,083, Shelf=39,833, Sea=151,588, HC=139,826, ICshore=30,467, IC=100,164 and LC=157,759. For Illumina-annotated CpG classes, the KS statistics relative to the DNAm distribution of sea probes were 0.67 for island probes, 0.34 for shore probes and 0.06 for shelf probes. For HIL-annotated CpG classes, the KS statistics relative to the DNAm distribution of LC probes were 0.77 for HC probes, 0.53 for ICshore probes and 0.08 for IC probes.

Supplementary Figure 3.3 Distinct patterns of DNAm across Illumina- and HIL-annotated CpG classes in buccal samples
Density curves were plotted using average β values for probes within each Illumina-annotated and HIL-annotated CpG class in buccal samples (n=4). The number of probes contributing to each curve was: Island=136,712, Shore=100,083, Shelf=39,833, Sea=151,588, HC=139,826, ICshore=30,467, IC=100,164 and LC=157,759. For Illumina-annotated CpG classes, the KS statistics relative to the DNAm distribution of sea probes were 0.66 for island probes, 0.32 for shore probes and 0.06 for shelf probes. For HIL-annotated CpG classes, the KS statistics relative to the DNAm distribution of LC probes were 0.76 for HC probes, 0.49 for ICshore probes and 0.07 for IC probes.

Supplementary Figure 3.4 Distinct patterns of DNAm across Illumina- and HIL-annotated CpG classes in chorionic villi
Density curves were plotted using average β values for probes within each Illumina-annotated and HIL-annotated CpG class in chorionic villi (n=4). The number of probes contributing to each curve was: Island=136,712, Shore=100,083, Shelf=39,833, Sea=151,588, HC=139,826, ICshore=30,467, IC=100,164 and LC=157,759. For Illumina-annotated CpG classes, the KS statistics relative to the DNAm distribution of sea probes were 0.61 for island probes, 0.28 for shore probes and 0.08 for shelf probes. For HIL-annotated CpG classes, the KS statistics relative to the DNAm distribution of LC probes were 0.72 for HC probes, 0.45 for ICshore probes and 0.10 for IC probes.

Appendix C Supplementary tables and figures for Chapter 4

Supplementary Table 4.1 Pyrosequencing conditions
Assay | MTHFR 677 | MTHFR 1298
Primer sequence (5' to 3')
Forward | 5Biosg/CTCAAAGAAAAGCTGCGTGAT | TCCAGCATCACTCACTTTGTGAC
Reverse | TGTCATCCCTATTGGCAGGTT | 5Biosg/CTTTGGGGAGCTGAAGGACTACTA
Sequencing | AAGCACTTGAAGGAGAA | AACAAAGACTTCAAAGACAC
PCR cycling conditions
step 1 | 95°C for 05:00 | 95°C for 05:00
step 2 | 95°C for 00:20 | 95°C for 00:20
step 3 | 57°C for 00:20 | 60°C for 00:20
step 4 | 72°C for 00:20 | 72°C for 00:20
step 5 | Go to step 2, 49 times | Go to step 2, 49 times
step 6 | 72°C for 05:00 | 72°C for 05:00
PCR reaction (µL)
H2O | 9.22 | 9.22
10X PCR buffer | 1.5 | 1.5
1.25 nM dNTPs | 2.4 | 2.4
10 µM F/R primers | 0.6 | 0.6
5 U/µL DNA polymerase | 0.18 | 0.18
DNA | 50 ng (genomic) | 50 ng (genomic)
Bisulfite converted, bsc

Appendix D Supplementary tables and figures for Chapter 5

Supplementary Table 5.1 Clinical and technical variables for each sample.
case ID | NTD status | sex | tissue | GA (wks) | 450k plate ID | 450k Sentrix ID | 450k Sentrix Position | MTHFR 677 | MTHFR 1298
FT13_br FT13 CON F br 17.0 WG0009065-MSA4 7970368054 R04C01 CC TT
FT13_kid FT13 CON F kid 17.0 WG0009065-MSA4 7970368014 R01C01 CC TT
FT13_mus FT13 CON F mus 17.0 WG000907-MSA4 7973201038 R01C02 CC TT
FT13_v FT13 CON F v 17.0 WG0009065-MSA4 7970368097 R03C01 CC TT
FT16_kid FT16 AN F kid 18.3 WG0009065-MSA4 7970368112 R03C01 TC TT
FT16_v FT16 AN F v 18.3 WG000907-MSA4 7970368062 R05C02 TC TT
FT18_kid FT18 CON M kid 20.4 WG0011603-MSA4 9296930154 R01C02 TC TT
FT18_mus FT18 CON M mus 20.4 WG0011603-MSA4 9406922117 R01C02 TC TT
FT18_sc FT18 CON M sc 20.4 WG0011603-MSA4 9296930154 R03C01 TC TT
FT21_br FT21 SB M br 21.1 WG0009065-MSA4 7970368015 R01C02 CC GG
FT21_kid FT21 SB M kid 21.1 WG0009065-MSA4 7970368036 R03C02 CC GG
FT21_mus FT21 SB M mus 21.1 WG0009065-MSA4 7970368015 R06C02 CC GG
FT21_sc FT21 SB M sc 21.1 WG0009065-MSA4 7970368014 R02C02 CC GG
FT21_v FT21 SB M v 21.1 WG0009065-MSA4 7970368015 R02C02 CC GG
FT22_br FT22 SB M br 21.8 WG000907-MSA4 7970368082 R06C01 CC GT
FT22_kid FT22 SB M kid 21.8 WG0011603-MSA4 9296930154 R04C01 CC GT
FT22_v FT22 SB M v 21.8 WG0011603-MSA4 9296930154 R05C02 CC GT
FT23_br FT23 SB M br 20.8 WG000907-MSA4 7970368100 R06C02 TC TT
FT23_kid FT23 SB M kid 20.8 WG0009065-MSA4 7970368014 R06C02 TC TT
FT23_mus FT23 SB M mus 20.8 WG0009065-MSA4 7970368066 R03C02 TC TT
FT23_sc FT23 SB M sc 20.8 WG000907-MSA4 7970368050 R01C01 TC TT
FT23_v FT23 SB M v 20.8 WG000907-MSA4 7973201026 R05C01 TC TT
FT26_mus FT26 AN M mus 19.3 WG0009065-MSA4 7970368015 R02C01 TC GT
FT26_v FT26 AN M v 19.3 WG0009065-MSA4 7970368015 R05C02 TC GT
FT27_kid FT27 SB M kid 19.8 WG0011603-MSA4 9406922026 R06C02 TC TT
FT27_sc FT27 SB M sc 19.8 WG000907-MSA4 7970368023 R03C01 TC TT
FT27_v FT27 SB M v 19.8 WG0011603-MSA4 9296930155 R04C02 TC TT
FT28_kid FT28 CON F kid 14.5 WG0011603-MSA4 9296930114 R06C02 CC TT
FT28_sc FT28 CON F sc 14.5 WG0009065-MSA4 7970368054 R03C02 CC TT
FT28_v FT28 CON F v 14.5 WG0011603-MSA4 9296930117 R06C02 CC TT
FT29_kid FT29 AN F kid 17.6 WG000907-MSA4 7970368100 R01C01 CC TT
FT29_mus FT29 AN F mus 17.6 WG0009065-MSA4 7970368066 R05C01 CC TT
FT29_sc FT29 AN F sc 17.6 WG000907-MSA4 7970368023 R01C02 CC TT
FT29_v FT29 AN F v 17.6 WG000907-MSA4 7970368023 R02C02 CC TT
FT3_kid FT3 CON F kid 19.7 WG0011603-MSA4 9406922026 R04C02 CC TT
FT3_v FT3 CON F v 19.7 WG0011603-MSA4 9296930117 R05C01 CC TT
FT30_kid FT30 AN F kid 21.3 WG000907-MSA4 7973201026 R01C02 CC GT
FT30_mus FT30 AN F mus 21.3 WG000907-MSA4 7970368082 R04C02 CC GT
FT30_sc FT30 AN F sc 21.3 WG0009065-MSA4 7970368014 R01C02 CC GT
FT33_br FT33 CON M br 18.4 WG000907-MSA4 7973201026 R03C01 CC GG
FT33_kid FT33 CON M kid 18.4 WG000907-MSA4 7970368062 R01C01 CC GG
FT33_mus FT33 CON M mus 18.4 WG0009065-MSA4 7970368036 R06C01 CC GG
FT33_sc FT33 CON M sc 18.4 WG0009065-MSA4 7970368076 R01C01 CC GG
FT34_v FT34 CON F v 23.3 WG000907-MSA4 7973201026 R05C02 TC GT
FT35_kid FT35 AN F kid 16.7 WG000907-MSA4 7970368142 R03C01 TC GT
FT35_mus FT35 AN F mus 16.7 WG000907-MSA4 7970368023 R06C01 TC GT
FT35_sc FT35 AN F sc 16.7 WG0009065-MSA4 7970368112 R04C02 TC GT
FT35_v FT35 AN F v 16.7 WG0009065-MSA4 7970368036 R06C02 TC GT
FT36_br FT36 CON F br 15.0 WG0009065-MSA4 7970368066 R04C01 CC GT
FT36_kid FT36 CON F kid 15.0 WG0009065-MSA4 7970368014 R03C02 CC GT
FT36_mus FT36 CON F mus 15.0 WG000907-MSA4 7973201038 R05C02 CC GT
FT36_sc FT36 CON F sc 15.0 WG0009065-MSA4 7970368076 R03C01 CC GT
FT36_v FT36 CON F v 15.0 WG0009065-MSA4 7970368015 R03C01 CC GT
FT38_br FT38 CON M br 19.1 WG000907-MSA4 7970368082 R03C02 TC GT
FT38_kid FT38 CON M kid 19.1 WG000907-MSA4 7970368050 R01C02 TC GT
FT38_mus FT38 CON M mus 19.1 WG0009065-MSA4 7970368066 R06C02 TC GT
FT38_sc FT38 CON M sc 19.1 WG0009065-MSA4 7970368066 R01C02 TC GT
FT38_v FT38 CON M v 19.1 WG000907-MSA4 7970368050 R05C01 TC GT
FT39_br FT39 CON M br 19.0 WG0009065-MSA4 7970368066 R06C01 CC GT
FT39_kid FT39 CON M kid 19.0 WG0009065-MSA4 7970368054 R02C02 CC GT
FT39_mus FT39 CON M mus 19.0 WG000907-MSA4 7970368050 R04C02 CC GT
FT39_sc FT39 CON M sc 19.0 WG000907-MSA4 7973201026 R02C02 CC GT
FT39_v FT39 CON M v 19.0 WG000907-MSA4 7970368050 R06C02 CC GT
FT4_br FT4 CON F br 19.4 WG0011603-MSA4 9296930124 R05C02 TC TT
FT4_kid FT4 CON F kid 19.4 WG0011603-MSA4 9296930117 R02C01 TC TT
FT4_mus FT4 CON F mus 19.4 WG0011603-MSA4 9296930154 R01C01 TC TT
FT40_kid FT40 AN M kid 20.0 WG0009065-MSA4 7970368015 R04C01 CC TT
FT40_mus FT40 AN M mus 20.0 WG0009065-MSA4 7970368097 R03C02 CC TT
FT40_sc FT40 AN M sc 20.0 WG0009065-MSA4 7970368036 R01C02 CC TT
FT40_v FT40 AN M v 20.0 WG0009065-MSA4 7970368097 R04C01 CC TT
FT41_br FT41 CON F br 15.5 WG000907-MSA4 7970368082 R05C01 CC GT
FT41_kid FT41 CON F kid 15.5 WG0009065-MSA4 7970368036 R05C02 CC GT
FT41_mus FT41 CON F mus 15.5 WG0009065-MSA4 7970368036 R02C02 CC GT
FT41_sc FT41 CON F sc 15.5 WG000907-MSA4 7970368142 R01C02 CC GT
FT41_v FT41 CON F v 15.5 WG000907-MSA4 7970368142 R05C02 CC GT
FT42_br FT42 CON M br 17.0 WG000907-MSA4 7970368100 R06C01 CC GT
FT42_kid FT42 CON M kid 17.0 WG000907-MSA4 7970368100 R05C02 CC GT
FT42_mus FT42 CON M mus 17.0 WG0009065-MSA4 7970368097 R01C01 CC GT
FT42_sc FT42 CON M sc 17.0 WG0009065-MSA4 7970368112 R05C02 CC GT
FT42_v FT42 CON M v 17.0 WG000907-MSA4 7970368142 R04C01 CC GT
FT47_kid FT47 SB M kid 22.0 WG0011603-MSA4 9296930124 R03C02 TT TT
FT47_sc FT47 SB M sc 22.0 WG0011603-MSA4 9296930124 R02C02 TT TT
FT47_v FT47 SB M v 22.0 WG0011603-MSA4 9296930155 R05C01 TT TT
FT5_br FT5 CON M br 23.7 WG0009065-MSA4 7970368014 R04C02 TC TT
FT5_kid FT5 CON M kid 23.7 WG0009065-MSA4 7970368015 R03C02 TC TT
FT5_mus FT5 CON M mus 23.7 WG0009065-MSA4 7970368097 R05C01 TC TT
FT5_v FT5 CON M v 23.7 WG0009065-MSA4 7970368054 R05C01 TC TT
FT52_kid FT52 AN F kid 21.8 WG0009065-MSA4 7970368054 R02C01 TT TT
FT52_mus FT52 AN F mus 21.8 WG0009065-MSA4 7970368076 R06C01 TT TT
FT52_sc FT52 AN F sc 21.8 WG0009065-MSA4 7970368112 R02C01 TT TT
FT52_v FT52 AN F v 21.8 WG0009065-MSA4 7970368076 R02C02 TT TT
FT54_kid FT54 SB M kid 20.4 WG0011603-MSA4 9406922117 R06C01 CC GT
FT54_sc FT54 SB M sc 20.4 WG0011603-MSA4 9296930124 R03C01 CC GT
FT54_v FT54 SB M v 20.4 WG0011603-MSA4 9406922026 R01C02 CC GT
FT57_br FT57 SB M br 22.0 WG000907-MSA4 7973201038 R04C01 TC TT
FT57_kid FT57 SB M kid 22.0 WG000907-MSA4 7973201038 R02C02 TC TT
FT57_mus FT57 SB M mus 22.0 WG000907-MSA4 7973201026 R01C01 TC TT
FT57_sc FT57 SB M sc 22.0 WG0009065-MSA4 7970368076 R04C02 TC TT
FT57_v FT57 SB M v 22.0 WG0009065-MSA4 7970368076 R05C01 TC TT
FT58_kid FT58 AN F kid 17.0 WG000907-MSA4 7970368082 R03C01 CC GT
FT58_mus FT58 AN F mus 17.0 WG000907-MSA4 7970368100 R04C01 CC GT
FT58_sc FT58 AN F sc 17.0 WG000907-MSA4 7970368100 R01C02 CC GT
FT58_v FT58 AN F v 17.0 WG0009065-MSA4 7970368054 R06C01 CC GT
FT59_br FT59 SB F br 19.4 WG0009065-MSA4 7970368097 R01C02 TT TT
FT59_kid FT59 SB F kid 19.4 WG0009065-MSA4 7970368112 R01C01 TT TT
FT59_mus FT59 SB F mus 19.4 WG0009065-MSA4 7970368054 R01C01 TT TT
FT59_sc FT59 SB F sc 19.4 WG0009065-MSA4 7970368054 R01C02 TT TT
FT59_v FT59 SB F v 19.4 WG000907-MSA4 7973201038 R04C02 TT TT
FT60_kid FT60 SB F kid 23.4 WG000907-MSA4 7970368082 R01C02 CC GG
FT60_mus FT60 SB F mus 23.4 WG000907-MSA4 7970368050 R02C02 CC GG
FT60_sc FT60 SB F sc 23.4 WG000907-MSA4 7973201026 R04C02 CC GG
FT60_v FT60 SB F v 23.4 WG000907-MSA4 7970368050 R03C01 CC GG
FT62_kid FT62 SB F kid 22.6 WG000907-MSA4 7970368100 R04C02 CC TT
FT62_mus FT62 SB F mus 22.6 WG000907-MSA4 7970368062 R02C01 CC TT
FT62_sc FT62 SB F sc 22.6 WG000907-MSA4 7970368062 R05C01 CC TT
FT62_v FT62 SB F v 22.6 WG000907-MSA4 7970368062 R01C02 CC TT
FT64_br FT64 SB M br 21.7 WG0009065-MSA4 7970368076 R01C02 CC GT
FT64_kid FT64 SB M kid 21.7 WG000907-MSA4 7973201038 R03C02 CC GT
FT64_mus FT64 SB M mus 21.7 WG0009065-MSA4 7970368112 R04C01 CC GT
FT64_sc FT64 SB M sc 21.7 WG000907-MSA4 7970368142 R04C02 CC GT
FT64_v FT64 SB M v 21.7 WG000907-MSA4 7970368050 R06C01 CC GT
FT65_kid FT65 AN F kid 19.0 WG0009065-MSA4 7970368054 R06C02 TC GT
FT65_mus FT65 AN F mus 19.0 WG000907-MSA4 7973201038 R02C01 TC GT
FT65_v FT65 AN F v 19.0 WG000907-MSA4 7970368142 R02C02 TC GT
FT67_kid FT67 SB M kid 22.4 WG0011603-MSA4 9296930114 R06C01 CC GT
FT67_sc FT67 SB M sc 22.4 WG0011603-MSA4 9296930114 R01C01 CC GT
FT67_v FT67 SB M v 22.4 WG0011603-MSA4 9296930155 R03C02 CC GT
FT72_kid FT72 SB F kid 23.0 WG0011603-MSA4 9406922026 R03C02 CC TT
FT72_sc FT72 SB F sc 23.0 WG0011603-MSA4 9296930124 R04C01 CC TT
FT72_v FT72 SB F v 23.0 WG0011603-MSA4 9296930114 R03C01 CC TT
FT73_br FT73 CON F br 18.0 WG0011603-MSA4 9296930117 R05C02 CC GG
FT73_kid FT73 CON F kid 18.0 WG0011603-MSA4 9296930124 R01C01 CC GG
FT73_mus FT73 CON F mus 18.0 WG0011603-MSA4 9406922026 R02C02 CC GG
FT73_sc FT73 CON F sc 18.0 WG0011603-MSA4 9296930154 R06C02 CC GG
FT73_v FT73 CON F v 18.0 WG0011603-MSA4 9296930114 R02C01 CC GG
FT74_v FT74 AN M v 21.5 WG0011603-MSA4 9296930117 R04C02 TC GT
FT75_v FT75 AN F v 20.0 WG0011603-MSA4 9406922026 R03C01 CC TT
FT78_kid FT78 SB F kid 20.4 WG0011603-MSA4 9406922117 R03C02 TC TT
FT78_v FT78 SB F v 20.4 WG0011603-MSA4 9406922117 R05C01 TC TT
FT79_v FT79 AN M v 20.8 WG0011603-MSA4 9296930124 R04C02 TC GT
FT82_kid FT82 CON M kid 19.0 WG0011603-MSA4 9296930155 R05C02 CC GT
FT82_v FT82 CON M v 19.0 WG0011603-MSA4 9406922026 R04C01 CC GT
FT84_v FT84 AN F v 17.0 WG0011603-MSA4 9406922026 R01C01 CC GG
FT85_kid FT85 CON F kid 23.9 WG0011603-MSA4 9296930114 R05C02 TC TT
FT85_mus FT85 CON F mus 23.9 WG0011603-MSA4 9296930155 R02C01 TC TT
FT85_v FT85 CON F v 23.9 WG0011603-MSA4 9406922117 R04C01 TC TT
mt4-5_br mt4-5 CON F br 20.3 WG000907-MSA4 7970368062 R02C02 CC GT
mt4-5_kid mt4-5 CON F kid 20.3 WG0009065-MSA4 7970368076 R06C02 CC GT
mt4-5_mus mt4-5 CON F mus 20.3 WG0009065-MSA4 7970368015 R04C02 CC GT
mt4-5_v mt4-5 CON F v 20.3 WG0009065-MSA4 7970368112 R06C01 CC GT
NTD1_v FT83 AN F v 22.7 WG000907-MSA4 7973201038 R05C01 TC TT
NTD14_v NTD14 AN M v 23.3 WG0011603-MSA4 9296930155 R06C02 TC GT
NTD16_kid NTD16 SB F kid 21.0 WG0011603-MSA4 9296930117 R02C02 TC TT
NTD16_sc NTD16 SB F sc 21.0 WG0011603-MSA4 9296930154 R05C01 TC TT
NTD16_v NTD16 SB F v 21.0 WG0011603-MSA4 9406922117 R04C02 TC TT
NTD17_kid NTD17 SB M kid 23.7 WG0011603-MSA4 9296930117 R03C02 TT TT
NTD17_sc NTD17 SB M sc 23.7 WG0011603-MSA4 9296930155 R06C01 TT TT
NTD17_v NTD17 SB M v 23.7 WG0011603-MSA4 9296930154 R02C02 TT TT
NTD2_v NTD2 SB M v 23.7 WG0011603-MSA4 9406922026 R06C01 CC TT
NTD3_br NTD3 SB M br 20.0 WG0009065-MSA4 7970368014 R05C01 TC TT
NTD3_kid NTD3 SB M kid 20.0 WG0009065-MSA4 7970368097 R06C01 TC TT
NTD3_mus NTD3 SB M mus 20.0 WG0009065-MSA4 7970368112 R03C02 TC TT
NTD3_v NTD3 SB M v 20.0 WG000907-MSA4 7970368100 R05C01 TC TT
NTD6_kid NTD6 SB M kid 21.1 WG0011603-MSA4 9296930114 R03C02 TT TT
NTD6_sc NTD6 SB M sc 21.1 WG0011603-MSA4 9406922026 R02C01 TT TT
NTD6_v NTD6 SB M v 21.1 WG0011603-MSA4 9296930124 R05C01 TT TT
NTD8_br NTD8 SB M br 20.0 WG000907-MSA4 7970368023 R02C01 TC TT
NTD8_kid NTD8 SB M kid 20.0 WG0009065-MSA4 7970368036 R01C01 TC TT
NTD8_mus NTD8 SB M mus 20.0 WG000907-MSA4 7970368142 R05C01 TC TT
NTD8_sc NTD8 SB M sc 20.0 WG0009065-MSA4 7970368097 R04C02 TC TT
NTD8_v NTD8 SB M v 20.0 WG0009065-MSA4 7970368036 R04C02 TC TT
NTD9_br NTD9 SB M br 22.0 WG0009065-MSA4 7970368014 R06C01 CC GT
NTD9_kid NTD9 SB M kid 22.0 WG000907-MSA4 7970368100 R02C01 CC GT
NTD9_mus NTD9 SB M mus 22.0 WG000907-MSA4 7970368142 R06C01 CC GT
NTD9_sc NTD9 SB M sc 22.0 WG000907-MSA4 7970368023 R05C01 CC GT
NTD9_v NTD9 SB M v 22.0 WG0009065-MSA4 7970368112 R01C02 CC GT
PL118_v PL118 SB M v 22.7 WG0011603-MSA4 9296930117 R01C02 CC GT
PL148_v PL148 CON F v 18.7 WG0011603-MSA4 9296930117 R06C01 TC TT
PL149_v PL149 CON F v 18.5 WG0011603-MSA4 9296930154 R03C02 CC GT
Spina bifida, SB; anencephaly, AN; control, CON; male, M; female, F; chorionic villi, v; kidney, kid; spinal cord, sc; brain, br; muscle, mus; gestational age, GA; weeks, wks; years, yrs

Supplementary Table 5.2 Sample information for follow-up pyrosequencing.
case ID | NTD status | sex | tissue | GA (wks) | sample run on 450k array?
FT100_v FT100 AN F chorionic villi 17.7 N
FT101_v FT101 AN F chorionic villi 21 N
FT102_v FT102 AN M chorionic villi 18.9 N
FT103_v FT103 SB M chorionic villi 22 N
FT105_v FT105 SB F chorionic villi 24.1 N
FT107_v FT107 AN M chorionic villi 21.4 N
FT13_v FT13 CON F chorionic villi 17 Y
FT16_v FT16 AN F chorionic villi 18.3 Y
FT20_v FT20 SB F chorionic villi 20.6 N
FT21_v FT21 SB M chorionic villi 21.1 Y
FT22_v FT22 SB M chorionic villi 21.8 Y
FT23_v FT23 SB M chorionic villi 20.8 Y
FT26_v FT26 AN M chorionic villi 19.3 Y
FT27_v FT27 SB M chorionic villi 19.8 Y
FT28_v FT28 CON F chorionic villi 14.5 Y
FT29_v FT29 AN F chorionic villi 17.6 Y
FT3_v FT3 CON F chorionic villi 19.7 Y
FT30_v FT30 AN F chorionic villi 21.3 N
FT34_v FT34 CON F chorionic villi 23.3 Y
FT35_v FT35 AN F chorionic villi 16.7 Y
FT36_v FT36 CON F chorionic villi 15 Y
FT38_v FT38 CON M chorionic villi 19.1 Y
FT39_v FT39 CON M chorionic villi 19 Y
FT40_v FT40 AN M chorionic villi 20 Y
FT41_v FT41 CON F chorionic villi 15.5 Y
FT42_v FT42 CON M chorionic villi 17 Y
FT47_v FT47 SB M chorionic villi 22 Y
FT5_v FT5 CON M chorionic villi 23.7 Y
FT52_v FT52 AN F chorionic villi 21.8 Y
FT54_v FT54 SB M chorionic villi 20.4 Y
FT57_v FT57 SB M chorionic villi 22 Y
FT58_v FT58 AN F chorionic villi 17.7 Y
FT59_v FT59 SB F chorionic villi 19.4 Y
FT60_v FT60 SB F chorionic villi 23.4 Y
FT62_v FT62 SB F chorionic villi 22.6 Y
FT63_v FT63 AN M chorionic villi 22 N
FT64_v FT64 SB M chorionic villi 21.7 Y
FT65_v FT65 AN F chorionic villi 19 Y
FT67_v FT67 SB M chorionic villi 22.4 Y
FT71_v FT71 AN M chorionic villi 22.7 N
FT72_v FT72 SB F chorionic villi 23 Y
FT73_v FT73 CON F chorionic villi 18 Y
FT74_v FT74 AN M chorionic villi 21.5 Y
FT75_v FT75 AN F chorionic villi 20 Y
FT78_v FT78 SB F chorionic villi 20.4 Y
FT79_v FT79 AN M chorionic villi 20.8 Y
FT82_v FT82 CON M chorionic villi 19 Y
FT83_v FT83 AN F chorionic villi 22.7 Y
FT84_v FT84 AN F chorionic villi 17 Y
FT85_v FT85 CON F chorionic villi 23.9 Y
FT89_v FT89 CON F chorionic villi 17.7 N
FT91_v FT91 CON F chorionic villi 14 N
FT93_v FT93 AN M chorionic villi 21 N
FT95_v FT95 SB F chorionic villi 21 N
FT96_v FT96 AN F chorionic villi 19.4 N
FT97_v FT97 AN F chorionic villi 22 N
FT98_v FT98 SB F chorionic villi 20.9 N
mt4-5_v mt4-5 CON F chorionic villi 20.3 Y
NTD14_v NTD14 AN M chorionic villi 23.3 Y
NTD16_v NTD16 SB F chorionic villi 21 Y
NTD17_v NTD17 SB M chorionic villi 23.7 Y
NTD18_v NTD18 SB F chorionic villi 20.7 N
NTD19_v NTD19 SB F chorionic villi 22.4 N
NTD2_v NTD2 SB M chorionic villi 23.7 Y
NTD20_v NTD20 AN M chorionic villi 19.7 N
NTD3_v NTD3 SB M chorionic villi 20 Y
NTD5_v NTD5 SB M chorionic villi 22.8 N
NTD6_v NTD6 SB M chorionic villi 21 Y
NTD7_v NTD7 SB M chorionic villi 22 N
NTD8_v NTD8 SB M chorionic villi 20 Y
NTD9_v NTD9 SB M chorionic villi 22 Y
PL118_v PL118 SB M chorionic villi 22.5 Y
PL148_v PL148 CON F chorionic villi 18.7 Y
PL149_v PL149 CON F chorionic villi 18.5 Y
PL151_v PL151 CON M chorionic villi 15.1 N
PL152_v PL152 CON M chorionic villi 16.3 N
PL154_v PL154 CON M chorionic villi 16.4 N
PL155_v PL155 CON M chorionic villi 15 N
PL158_v PL158 CON M chorionic villi 23.4 N
PL159_v PL159 CON M chorionic villi 18.2 N
PL163_v PL163 CON F chorionic villi 15.4 N
spina bifida, SB; anencephaly, AN; control, CON; male, M; female, F; yes, Y; no, N

Supplementary Table 5.3 Pyrosequencing conditions
Assay | MTHFR 677 | MTHFR 1298 | L1 | Alu | cg10988628 | cg02413938
Primer sequence (5' to 3')
Forward | 5Biosg/CTCAAAGAAAAGCTGCGTGAT | TCCAGCATCACTCACTTTGTGAC | TTTTGAGTTAGGTGTGGGATATA | 5Biosg/TTTTTATTAAAAATATAAAAATT | TGTGGTAGGTTTTGAGTAGG | TGGGAGTGGTTTGGGTAGG
Reverse | TGTCATCCCTATTGGCAGGTT | 5Biosg/CTTTGGGGAGCTGAAGGACTACTA | 5Biosg/AAAATCAAAAAATTCCCTTTC | CCCAAACTAAAATACAATAA | 5Biosg/CCCTAAATAACTCAAATAACTAACTCTACA | 5Biosg/ATACCAACTACTTCTCCATCCTCTAACC
Sequencing | AAGCACTTGAAGGAGAA | AACAAAGACTTCAAAGACAC | AGTTAGGTGTGGGATATAGT | AATAACTAAAATTACAAAC | GGTAGTTGGGTGTTTT | GGTAGGTTTTTAAGGTTTG
PCR cycling conditions
step 1 | 95°C for 05:00 | 95°C for 05:00 | 95°C for 15:00 | 95°C for 15:00 | 95°C for 10:00 | 95°C for 10:00
step 2 | 95°C for 00:20 | 95°C for 00:20 | 95°C for 00:20 | 95°C for 01:30 | 95°C for 00:40 | 95°C for 00:40
step 3 | 57°C for 00:20 | 60°C for 00:20 | 50°C for 00:20 | 49°C for 01:00 | 55°C for 00:40 | 55°C for 00:40
step 4 | 72°C for 00:20 | 72°C for 00:20 | 72°C for 00:20 | 72°C for 01:20 | 72°C for 00:40 | 72°C for 00:40
step 5 | Go to step 2, 49 times | Go to step 2, 49 times | Go to step 2, 45 times | Go to step 2, 40 times | Go to step 2, 39 times | Go to step 2, 39 times
step 6 | 72°C for 05:00 | 72°C for 05:00 | 72°C for 05:00 | 72°C for 05:00 | 72°C for 07:00 | 72°C for 07:00
PCR reaction (µL)
H2O | 9.22 | 9.22 | 15.8 | 15.8 | 9.22 | 9.22
10X PCR buffer | 1.5 | 1.5 | 2.5 | 2.5 | 1.5 | 1.5
1.25 nM dNTPs | 2.4 | 2.4 | 4 | 4 | 2.4 | 2.4
10 µM F/R primers | 0.6 | 0.6 | 0.5/0.5 | 0.5/0.5 | 0.6/0.6 | 0.6/0.6
5 U/µL DNA polymerase | 0.18 | 0.18 | 0.2 | 0.2 | 0.18 | 0.18
DNA | 50 ng (genomic) | 50 ng (genomic) | 30 ng (bsc) | 30 ng (bsc) | 17 ng (bsc) | 17 ng (bsc)
bisulfite converted, bsc

Supplementary Table 5.4 Additional control kidney samples from GEO.
GEO ID | NTD status | sex | tissue | GA (wks)
GSM868047 geo.control M kidney 20
GSM868048 geo.control F kidney 18
GSM868049 geo.control F kidney 20
GSM868050 geo.control M kidney 15
GSM868051 geo.control F kidney 14
male, M; female, F; gestational age, GA; weeks, wks

Supplementary Table 5.5 Differential methylation summary – candidate gene CpG sites.
FDR | chorionic villi | kidney | spinal cord | brain | muscle
deltaβ | NA 0.05 0.10 | NA 0.05 0.10 | NA 0.05 0.10 | NA 0.05 0.10 | NA 0.05 0.10
SB vs. CON
0.05 | 1 0 0 | 83 65 9 | 0 0 0 | 0 0 0 | 0 0 0
0.1 | 1 0 0 | 213 139 10 | 0 0 0 | 0 0 0 | 0 0 0
0.15 | 1 0 0 | 347 180 10 | 0 0 0 | 0 0 0 | 0 0 0
0.2 | 1 0 0 | 431 203 10 | 0 0 0 | 0 0 0 | 0 0 0
1 | 8393 112 7 | 8393 266 10 | 8393 339 40 | 8393 174 11 | 8393 68 5
AN vs. CON
0.05 | 0 0 0 | 0 0 0 | 0 0 0 | NA NA NA | 0 0 0
0.1 | 2 2 1 | 1 1 0 | 0 0 0 | NA NA NA | 0 0 0
0.15 | 7 4 2 | 2 1 0 | 0 0 0 | NA NA NA | 0 0 0
0.2 | 46 20 4 | 2 1 0 | 0 0 0 | NA NA NA | 0 0 0
1 | 8393 125 8 | 8393 97 6 | 8393 288 28 | NA NA NA | 8393 133 6
spina bifida, SB; anencephaly, AN; control, CON

Supplementary Table 5.6 Differential methylation summary – array-wide.
FDR | chorionic villi | kidney | spinal cord | brain | muscle
deltaβ | NA 0.05 0.10 | NA 0.05 0.10 | NA 0.05 0.10 | NA 0.05 0.10 | NA 0.05 0.10
SB vs. CON
0.05 | 0 0 0 | 4148 3342 504 | 1 0 0 | 0 0 0 | 0 0 0
0.1 | 0 0 0 | 9997 6723 623 | 3 0 0 | 0 0 0 | 0 0 0
0.15 | 0 0 0 | 15511 8891 656 | 3 0 0 | 4 2 0 | 0 0 0
0.2 | 0 0 0 | 20807 10316 669 | 3 0 0 | 5 3 1 | 0 0 0
1 | 442091 11508 413 | 442091 13734 734 | 442091 17952 1585 | 442091 11903 1087 | 442091 3999 156
AN vs. CON
0.05 | 5 4 1 | 0 0 0 | 0 0 0 | NA NA NA | 0 0 0
0.1 | 9 7 2 | 0 0 0 | 0 0 0 | NA NA NA | 0 0 0
0.15 | 10 8 2 | 4 3 0 | 0 0 0 | NA NA NA | 0 0 0
0.2 | 56 42 13 | 9 7 1 | 0 0 0 | NA NA NA | 0 0 0
1 | 442091 10417 326 | 442091 5402 218 | 442091 16753 1530 | NA NA NA | 442091 7057 287
spina bifida, SB; anencephaly, AN; control, CON

Supplementary Figure 5.1 Array average DNA methylation.
Box plots of array average DNAm (n=442,091 CpG sites) calculated for each sample and plotted by NTD status. Box edges are plotted at the 25th and 75th percentiles (the interquartile range, IQR); whiskers extend to the last sample within ±1.5×IQR, and samples beyond the whiskers are plotted as points (outliers). There was a small but significant difference in array-wide average DNAm (array avgβ) in chorionic villi between SB and CON (p<0.01) and between AN and CON (p<0.05), as well as in spinal cord between SB and CON (p<0.05) (also see Table 5.2).

Supplementary Figure 5.2 Percentage of outlier CpG sites.
Box plots of the percentage of outlier CpG sites (i.e. 450k array probes) calculated for each sample and plotted by NTD status. A CpG site-sample combination lying more than 3 median absolute deviations from the site median across all samples in that tissue was called an outlier. The number of outlier probes per sample was normalized by dividing by the number of CpG sites with data for that sample (the number of missing values varied by sample). Box edges are plotted at the 25th and 75th percentiles (the interquartile range, IQR); whiskers extend to the last sample within ±1.5×IQR, and samples beyond the whiskers are plotted as points (outliers). There was no difference in the percentage of outlier probes per sample by NTD status.

Supplementary Figure 5.3 Anencephaly array-wide volcano plots.
Volcano plots comparing the magnitude of difference in DNAm (adjusted delta beta) to the statistical significance (-log10(adjusted P.Value)) for each CpG site (n=442,091) in anencephaly vs. control samples.
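The outlier rule described for Supplementary Figure 5.2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used for this thesis: it assumes a NumPy array of β values for a single tissue (rows are CpG sites, columns are samples, NaN marks missing data), and the function name `outlier_percentage` and its parameters are hypothetical. The caption does not say whether the median absolute deviation was scaled by a consistency constant (R's `mad()` multiplies by 1.4826 by default), so the `scale` argument is an assumption.

```python
import numpy as np


def outlier_percentage(beta, n_mad=3.0, scale=1.0):
    """Percentage of outlier CpG sites in each sample of one tissue (sketch).

    beta  : 2-D array of beta values, rows = CpG sites, columns = samples;
            missing values are NaN.
    n_mad : a site-sample value is an outlier if it lies more than n_mad
            median absolute deviations (MADs) from that site's median.
    scale : constant applied to the MAD; 1.0 gives the raw MAD, 1.4826 matches
            R's mad() default. Which one was used originally is an assumption.
    """
    beta = np.asarray(beta, dtype=float)
    # per-site median and MAD across samples, ignoring missing values
    site_median = np.nanmedian(beta, axis=1, keepdims=True)
    deviation = np.abs(beta - site_median)
    site_mad = scale * np.nanmedian(deviation, axis=1, keepdims=True)
    # comparisons with NaN are False, so missing values are never outliers
    is_outlier = deviation > n_mad * site_mad
    # normalize by the number of sites with data for each sample
    n_observed = np.sum(~np.isnan(beta), axis=0)
    return 100.0 * is_outlier.sum(axis=0) / n_observed


# Toy example (made-up values): 3 CpG sites x 4 samples; sample 4 is far off at site 1
toy = np.array([
    [0.10, 0.12, 0.11, 0.90],
    [0.50, 0.52, 0.48, 0.51],
    [0.80, np.nan, 0.82, 0.81],
])
print(outlier_percentage(toy))
```

In the toy matrix only the fourth sample has a nonzero percentage (one outlier among its three observed sites). Because the denominator counts only non-missing sites, a sample with more failed probes is not penalized, which mirrors the normalization described in the caption.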
Supplementary Figure 5.4 NTD status overlap of top 1,000 CpG sites by tissue.
Venn diagrams showing the overlap of the top 1,000 CpG sites, ranked by unadjusted p-value, for the comparisons of SB to CON and AN to CON within each tissue. Brain could not be included in this analysis because no brain tissue was collected from AN cases.

Supplementary Figure 5.5 Tissue overlap of top 1,000 CpG sites by NTD status.
Venn diagrams showing the overlap of the top 1,000 CpG sites, ranked by unadjusted p-value, across tissues within each NTD status, for the comparisons of SB to CON and AN to CON.

Supplementary Figure 5.6 Pyrosequencing of cg10988628 in chorionic villi.
Pyrosequencing was performed to follow up differential methylation (DM) at cg10988628, identified in the 450k array comparison of anencephaly to controls in chorionic villi. This CpG is located 146 bp upstream of poly(ADP-ribose) polymerase 1 (PARP1), a gene shown to regulate trophoblast differentiation that is involved in ADP-ribosylation of histones; its loss of function is associated with invasive and metastatic properties in cancer. The differential DNAm was validated in the set of samples run on the 450k array (p=0.000006) and replicated in the smaller, extended set of samples (p=0.02). The % DNA methylation plotted was adjusted for fetal sex and gestational age.

Supplementary Figure 5.7 Pyrosequencing of cg02413938 in chorionic villi.
Pyrosequencing was performed to follow up differential methylation (DM) at cg02413938, identified in the 450k array comparison of anencephaly to controls in chorionic villi. This CpG is located 21 bp upstream of ectoplasmic specialization protein like (ESPNL), a gene with a role in proliferation and invasion in melanoma. An A/G SNP (rs64315796) 6 bp downstream of the target CpG site was included in the pyrosequencing assay. (A) Differential DNAm was validated in the set of samples run on the 450k array (p=0.05) but not replicated in a smaller, extended set of samples.
(B) We identified significant differential DNAm at cg02413938 by rs64315796 genotype across both array and extended samples. Genotypes were unequally distributed by NTD status in the array cohort, which likely accounts for the DM picked up in the 450k array analysis. The % DNA methylation plotted was adjusted for fetal sex and gestational age.
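Supplementary Figures 5.4 and 5.5 overlap the top 1,000 CpG sites ranked by unadjusted p-value. The ranking and overlap step can be sketched in Python as below. This is a minimal sketch, not the thesis code: it assumes the unadjusted p-values for each tissue (or NTD comparison) are already available as a dictionary keyed by CpG ID, and the function names are hypothetical. Ties are broken by CpG ID only to make the output deterministic; how ties were handled in the original analysis is not stated.

```python
from itertools import combinations


def top_n_sites(pvalues, n=1000):
    """Return the set of n CpG IDs with the smallest unadjusted p-values.

    pvalues: dict mapping CpG ID -> unadjusted p-value for one comparison
    (e.g. SB vs. CON in one tissue). Ties are broken by CpG ID so the
    result is deterministic (an assumption, not taken from the thesis).
    """
    ranked = sorted(pvalues.items(), key=lambda item: (item[1], item[0]))
    return {cpg for cpg, _ in ranked[:n]}


def pairwise_overlap(top_sets):
    """Number of CpG sites shared by each pair of top-n sets."""
    return {
        (a, b): len(top_sets[a] & top_sets[b])
        for a, b in combinations(sorted(top_sets), 2)
    }


def shared_by_all(top_sets):
    """CpG sites present in every top-n set (the centre of the Venn diagram)."""
    return set.intersection(*top_sets.values())


if __name__ == "__main__":
    # Toy p-values with hypothetical CpG IDs, ranked with n=2 instead of 1,000
    pvals = {
        "kidney": {"cgA": 0.01, "cgB": 0.02, "cgC": 0.50},
        "spinal cord": {"cgA": 0.03, "cgB": 0.90, "cgC": 0.001},
    }
    tops = {tissue: top_n_sites(p, n=2) for tissue, p in pvals.items()}
    print(pairwise_overlap(tops))  # {('kidney', 'spinal cord'): 1}
    print(shared_by_all(tops))     # {'cgA'}
```

The pairwise counts include sites that are also shared with further tissues; a Venn diagram splits these into exclusive regions, so with three or more sets the drawn region sizes are smaller than or equal to these pairwise counts.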
