UBC Faculty Research and Publications

Population-specificity of human DNA methylation Fraser, Hunter B; Lam, Lucia L; Neumann, Sarah M; Kobor, Michael S Feb 9, 2012

Full Text

RESEARCH Open Access

Population-specificity of human DNA methylation

Hunter B Fraser1*, Lucia L Lam2,3, Sarah M Neumann2,3 and Michael S Kobor2,3*

Abstract

Background: Ethnic differences in human DNA methylation have been shown for a number of CpG sites, but the genome-wide patterns and extent of these differences are largely unknown. In addition, whether the genetic control of polymorphic DNA methylation is population-specific has not been investigated.

Results: Here we measure DNA methylation near the transcription start sites of over 14,000 genes in 180 cell lines derived from one African and one European population. We find population-specific patterns of DNA methylation at over a third of all genes. Furthermore, although the methylation at over a thousand CpG sites is heritable, these heritabilities also differ between populations, suggesting extensive divergence in the genetic control of DNA methylation. In support of this, genetic mapping of DNA methylation reveals that most of the population specificity can be explained by divergence in allele frequencies between populations, and that there is little overlap in genetic associations between populations. These population-specific genetic associations are supported by the patterns of DNA methylation in several hundred brain samples, suggesting that they hold in vivo and across tissues.

Conclusions: These results suggest that DNA methylation is highly divergent between populations, and that this divergence may be due in large part to a combination of differences in allele frequencies and complex epistasis or gene × environment interactions.

Background

In multicellular organisms, the great diversity of cell types is maintained by mitotically heritable differences in gene expression, which are in part regulated by epigenetic mechanisms [1]. These include histone modifications, histone variants, RNA-based mechanisms, and DNA methylation [2].
The latter is perhaps the best understood component of the epigenetic machinery [3] and in somatic cells occurs almost exclusively on cytosine residues in the context of CpG dinucleotides [4]. While CpGs are underrepresented across the human genome, they are enriched at the majority of gene promoters, forming regions known as CpG islands that can regulate the expression of neighboring genes [4]. DNA methylation is not only closely linked to tissue-specific gene expression, but also to a number of intriguing biological phenomena such as X-chromosome inactivation in females, allele-specific expression of imprinted genes, aging, and cancer [5].

An emerging aspect of epigenetics is its role at the interface between the environment and the genome [6]. Although DNA methylation is a very stable epigenetic mark, numerous environmental influences have been associated with variation in DNA methylation as well as other epigenetic marks [2,6]. These include nutritional factors, exposure to environmental pollutants, and social environment. It is this plasticity that underlies much of the potential contribution of DNA methylation to multifactorial diseases and complex phenotypes [7]. However, the fundamental biology of the epigenome poses some challenges to testing this attractive concept. For example, most primary material available from human populations consists of mixtures of different cell types with distinct epigenomes, making it difficult to specifically assess the association of epigenetic changes with environmental exposure and phenotype.
To address the role of epigenetics in common disease, it is important to understand the nature of epigenetic variation in the context of genetically well-characterized pure cell populations.

* Correspondence: hbfraser@stanford.edu; msk@cmmt.ubc.ca
1 Department of Biology, Stanford University, Stanford, CA 94305, USA
2 Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, V6T 1Z3, Canada
Full list of author information is available at the end of the article

Fraser et al. Genome Biology 2012, 13:R8
http://genomebiology.com/2012/13/2/R8

© 2012 Fraser et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Recent advances in high-throughput technologies for measuring DNA methylation have allowed the patterns of methylation to be characterized throughout the human genome [8-15]. Comparing these results between twins has revealed that methylation at some CpG sites can be heritable [14,15], and combining them with genotype data has led to the discovery of hundreds of methylation-associated SNPs, or ‘mSNPs’, in brain tissue [11,12] as well as cell lines [13]. However, the question of whether the effects of mSNPs on DNA methylation levels and heritability differ between human populations has not been addressed.
Quantifying such population specificity is important for our understanding of the genetic architecture of the epigenome, as well as its plasticity during human evolution.

Results

To compare DNA methylation between human populations, we utilized lymphoblastoid cell lines (LCLs) from the HapMap project [16], which have been extensively genotyped and previously employed to study the population specificity of gene expression levels [17-19]. Although LCLs can acquire changes in gene expression and DNA methylation during transformation and cell culture [20,21], it has been shown that the inter-individual variation - which is what is relevant for the current work - is nearly always conserved (at least for gene expression) [21]. Our initial study set consisted of 30 family trios (mother/father/offspring) of Northern European ancestry (abbreviated CEU), and 30 trios of Yoruban (West African) ancestry (abbreviated YRI). These 180 cell lines were grown in identical conditions and their genomic DNA was subjected to quantitative bead-array-based DNA methylation analysis at 27,578 CpG sites near the transcription start sites of 14,495 genes (Materials and methods). Although an average of approximately two CpG sites near each transcription start site does not directly measure most of the methylation in regulatory regions, the fact that sites separated by under approximately 1 kb show highly correlated methylation [9,10] suggests that our data may actually capture the majority of methylation information near transcription start sites - similar to the effect of linkage disequilibrium (LD) between genetic variants in genome-wide association studies (though there is no guarantee that the most relevant sites will be in ‘methylation LD’ with the CpG sites we measure).
The 1,092 sites on the X and Y chromosomes were excluded from all analyses to eliminate gender effects, leaving 26,486 autosomal sites in 13,890 genes (in which no significant sex specificity was observed; Figure S1 in Additional file 1).

The resulting data revealed a wide range of within-population variability in the methylation of individual CpG sites (Figure 1a), consistent with previous work [11-13]. Across all sites, the average correlation of methylation profiles between individuals (mean r² = 0.78 for CEU, 0.86 for YRI) was far lower than that of technical replicates (r² > 0.99 for all six replicate pairs), indicating that most of the variability was biological, and not technical. In addition, we replicated results for two variable sites in all 180 samples by pyrosequencing bisulfite-treated DNA. This showed excellent concordance with our array-based results (r² = 0.88 for IGSF2 and 0.94 for PLSCR2; Figure 1b), suggesting that the array data provide accurate quantification of DNA methylation levels.

In addition to the variation within each population, we observed extensive differences in the DNA methylation patterns between populations (for example, FLJ32569 in Figure 1a). To quantify this population specificity, we calculated the number of CpG sites with methylation differing between populations, using the nonparametric Wilcoxon test. We found a substantial fraction differing between the populations (Figure 1c): at nominal P < 0.01, 8,475 sites differed between populations (32.0% of sites; false discovery rate (FDR) = 3.1%), and 5,654 sites remained significant at P < 0.001 (21.4% of sites; FDR = 0.5%; Figure S2 in Additional file 1). Thus, the methylation of approximately 30% of the CpG sites we studied - representing over a third of the genes assayed - differed between populations (this degree of population specificity is similar to that of gene expression levels in the same cell lines; Figure S3 in Additional file 1).
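The reported false discovery rates are consistent with a simple estimate: the number of sites expected to pass the threshold under the null hypothesis (threshold × number of tests) divided by the number of sites that actually passed. The sketch below assumes that estimator (the text does not state which one was used) and takes its inputs from the counts quoted above; the function name is illustrative.

```python
def estimated_fdr(p_threshold, n_tests, n_significant):
    """Expected null hits (p_threshold * n_tests) over observed hits.

    Assumption: a plain expected/observed ratio, capped at 1.
    """
    if n_significant == 0:
        return 0.0
    return min(1.0, p_threshold * n_tests / n_significant)


N_AUTOSOMAL_SITES = 26486  # autosomal CpG sites tested

for p, hits in [(0.01, 8475), (0.001, 5654)]:
    fdr = estimated_fdr(p, N_AUTOSOMAL_SITES, hits)
    print(f"P < {p}: {hits} sites, estimated FDR {fdr:.1%}")
```

With these inputs the ratio comes out at about 3.1% and 0.5%, matching the values in the text.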
However, these population-level differences tended to be small in magnitude, with only 1,033 sites (3.9%) differing by an average of over 10% methylation, and 3,695 sites (14.0%) differing by over 5%. Perhaps because of their small magnitudes, differences in DNA methylation explained very little of the variation in gene expression levels between populations that has been previously reported [17-19] (Supplemental text and Figure S4 in Additional file 1), consistent with previous findings that inter-individual variation in DNA methylation explains almost none of the variation in gene expression [12,13].

These subtle but extensive epigenetic differences between populations could have genetic or environmental underpinnings - or a combination of both. To assess the role of both common and rare genetic variants in determining DNA methylation patterns, we estimated the contribution of additive genetic variation (known as narrow-sense heritability, or h²) to the methylation of each CpG site in each population by measuring the correlation in methylation levels between parents and their offspring (Figure 2a; Materials and methods). We observed heritable methylation at approximately 762 CpG sites in CEU and 930 sites in YRI (Figure 2b), suggesting that genetic control of polymorphic methylation is fairly common - though slightly less heritable than gene expression levels in the same cell lines (Figure S5 in Additional file 1). Given our limited power to detect weakly heritable DNA methylation, these numbers are likely to be substantial underestimates of the true extent of heritability.

Figure 1 Population-specificity of DNA methylation. (a) Heatmap of the clustered methylation data set. Three representative cases are magnified: a site with a clear population difference; a site showing within- but not between-population variability; and a site with little variability within or between populations. (b) We performed pyrosequencing as an independent means to measure methylation of two CpG sites (IGSF2, chromosome 1, base 117345939; PLSCR2, chromosome 3, base 147696535) in our 180 samples. The agreement validates the accuracy of our microarray data. (c) The methylation of many sites differs between CEU and YRI. We performed the nonparametric Wilcoxon test to identify CpG sites differing in methylation between populations. The P-values are skewed towards small values, as shown by comparing to the expected uniform distribution on either a linear (left) or log (right) scale.

Figure 2 Population specificity of DNA methylation heritability. (a) An example of a CpG site (near PLSCR2: chromosome 3, base 147696535) whose methylation is heritable in YRI, but not CEU, as assessed by the similarity of average parental methylation to their offspring methylation (each point represents one family trio). (b) Histograms comparing the observed distribution of per-site heritabilities to a typical randomized distribution (numbers in the text are based on 1,000 randomizations; Materials and methods). The greater number of sites at high heritabilities in the real data compared to random (arrows) is an estimate of the number of heritable sites we can detect in each population. (c) No similarity between heritabilities in each population (Pearson’s r² = 0.002; each point is a CpG site).

Considering the overall genetic similarity among human populations [16,22], we expected the patterns of heritability in CEU and YRI to be similar. Surprisingly, we found almost no correlation between them (r² = 0.002; Figure 2c). This is similar to agreement in h² for gene expression levels in the same cell lines (Figure S6 in Additional file 1).
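One common way to turn parent-offspring similarity into a narrow-sense heritability estimate is the slope of the regression of offspring values on mid-parent values, with a null distribution obtained by shuffling offspring among trios. The sketch below assumes that estimator; the study's exact procedure may differ, and the function names and toy data are illustrative only.

```python
import random


def midparent_slope(trios):
    """Regression slope of offspring on mid-parent methylation.

    trios: list of (mother, father, offspring) values for one CpG site.
    Under a purely additive model this slope estimates h^2; because it
    is an estimate, it can fall below 0 or above 1 in small samples.
    """
    mid = [(m + f) / 2 for m, f, _ in trios]
    off = [o for _, _, o in trios]
    n = len(trios)
    mean_mid = sum(mid) / n
    mean_off = sum(off) / n
    cov = sum((x - mean_mid) * (y - mean_off) for x, y in zip(mid, off))
    var = sum((x - mean_mid) ** 2 for x in mid)
    return cov / var if var > 0 else float("nan")


def permutation_null(trios, n_perm=1000, seed=0):
    """Slopes obtained after shuffling offspring across trios."""
    rng = random.Random(seed)
    offspring = [o for _, _, o in trios]
    null = []
    for _ in range(n_perm):
        shuffled = offspring[:]
        rng.shuffle(shuffled)
        permuted = [(m, f, o) for (m, f, _), o in zip(trios, shuffled)]
        null.append(midparent_slope(permuted))
    return null


# Toy data: offspring = 0.2 + 0.6 * mid-parent, so the slope is 0.6.
toy = [(0.1 * i, 0.1 * i + 0.05, 0.2 + 0.6 * (0.1 * i + 0.025))
       for i in range(1, 9)]
h2 = midparent_slope(toy)
null = permutation_null(toy, n_perm=200)
empirical_p = sum(s >= h2 for s in null) / len(null)
```

Comparing the observed slopes across all sites with the permuted ones gives the kind of real-versus-randomized contrast shown in Figure 2b.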
We did not find any evidence for complex inheritance patterns - such as dominance, maternal-biased, or paternal-biased inheritance of DNA methylation - that could affect heritability (Supplemental text in Additional file 1).

Differences in heritability between populations could have many causes. h² is defined as the ratio of a trait’s additive genetic variance to its total variance in a population; factors that can affect this ratio include changes in the additive genetic variance (for example, differing allele frequencies), non-additive (gene × gene, or GxG) genetic variance, environmental variance, and gene × environment (GxE) interaction variance [23]. In addition, limited statistical power could restrict the accuracy of our heritability estimates (Supplemental text and Figure S7 in Additional file 1). Although we were not able to rule out any of these potential factors, the extensive DNA sequence data available for these samples do allow us to test the contributions of two types of divergence that may contribute to the population-specific DNA methylation levels, and their heritabilities.

One type of divergence that may affect DNA methylation levels and heritabilities is a difference in the CEU/YRI allele frequencies at genetic variants that influence methylation. In particular, lower minor allele frequency at such a variant reduces the population-level genetic variation affecting a site’s methylation, thus reducing h². To test how much of our observed population specificity can be explained in this way, we first identified the ‘local’ SNP (within 100 kb of the CpG) most strongly associated with each CpG’s methylation across all 180 samples from both populations (although genetic associations in ethnically heterogeneous cohorts such as this can reflect population stratification, it is appropriate for our current goal).
We then included this single SNP genotype in a multiple regression analysis to assess whether genotype or population was a stronger predictor of methylation at each site. Among the 5,654 CpG sites differing between populations at Wilcoxon P < 0.001 (discussed above), we found that 3,131 (55.4%) were more strongly associated with a local SNP genotype than with population, implying that common (and likely cis-acting) genetic variants can explain over half of the population specificity we observed. This result also indicates that most of the population specificity is unlikely to be due to any type of cell line artifacts, since these would not correlate with individual SNP genotypes.

The second type of divergence we tested concerned complex GxG or GxE interactions: if a genetic variant is present in two populations, but affects DNA methylation in only one, then that variant must genetically interact with other variants and/or the environment. Such interactions can decrease heritability by increasing the population-level variance in DNA methylation (the denominator of h²) without affecting the additive genetic variance (the numerator). To perform this analysis, we needed to identify SNPs associated with the methylation of individual CpG sites separately in each population, and then compare the lists to one another.

Three previous studies of genome-wide DNA methylation have mapped SNPs whose genotype correlates with the methylation of a CpG site, termed ‘mSNPs’ [11-13]. Because mSNPs are highly enriched close to their target CpG sites [11-13], we performed a ‘local’ association analysis between methylation at each CpG site with all HapMap SNPs within 100 kb, separately for each population. These local mSNP associations can arise from either true (likely cis-acting) genetic associations, or genetic variants that disrupt hybridization of the bead-array probes in some individuals, leading to spurious associations (analogous to issues in eQTL mapping [24]).
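The local association scan described above can be sketched as follows, assuming genotypes coded as allele counts (0, 1, 2) and Pearson correlation as the association measure. The data layout, function names, and toy values are hypothetical; running the function once per population keeps the two scans separate.

```python
from math import sqrt

WINDOW_BP = 100_000  # 'local' window around each CpG site


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP or invariant CpG site
    return sxy / sqrt(sxx * syy)


def best_local_snp(cpg_pos, methylation, snps):
    """Return (snp_id, r) for the strongest association within the window.

    snps: list of (snp_id, position, genotypes) on the CpG's chromosome,
    with genotypes in the same sample order as methylation.
    """
    best = (None, 0.0)
    for snp_id, pos, genotypes in snps:
        if abs(pos - cpg_pos) > WINDOW_BP:
            continue  # outside the local window
        r = pearson_r(genotypes, methylation)
        if abs(r) > abs(best[1]):
            best = (snp_id, r)
    return best


# Toy example: snpC matches snpA's genotypes but lies outside the window.
meth = [0.10, 0.15, 0.40, 0.45, 0.80, 0.85]
snps = [
    ("snpA", 1_050_000, [0, 0, 1, 1, 2, 2]),
    ("snpB", 1_090_000, [1, 0, 2, 0, 1, 2]),
    ("snpC", 1_200_000, [0, 0, 1, 1, 2, 2]),
]
snp_id, r = best_local_snp(1_000_000, meth, snps)
variance_explained = r ** 2  # share of methylation variance tied to genotype
```

Squaring the correlation gives the fraction of methylation variance explained by the SNP genotype.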
Using recent and essentially complete catalogs of common genetic variants in each population [22], we identified all probes overlapping variants present in the 1000 Genomes samples (2,734 probes in CEU, and 3,923 probes in YRI; Table S1 in Additional file 1). We observed a 2.6-fold higher frequency of mSNPs for these probes compared to probes not disrupted by SNPs, implying a high rate of spurious associations (re-analysis of previously reported brain mSNPs [11,12] suggests a similarly high rate of spurious associations in those studies). Therefore, we removed these probes from our analysis (these sites did not have a higher level of heritability or population differentiation, so were not excluded from those analyses; Supplemental text and Figure S8 in Additional file 1).

Figure 3 Population specificity of mSNPs. (a) An example of an mSNP (between a CpG site near LDHC (chromosome 11, base 18390591), and rs2643856) that is found in both YRI and CEU. In both cases the T allele is associated with higher methylation. (b) Venn diagram of the overlap among CpG sites associated with an mSNP in YRI and/or CEU. Five CEU sites and eight YRI sites were excluded from the overlap analysis because they overlapped a SNP in the other population. (c) Example of an mSNP (between a CpG site near PLSCR2 (chromosome 3, base 147696535) and rs12489924) that is found in YRI but not CEU. No other SNPs in CEU within 100 kb of the CpG are associated with methylation at the site (r < 0.25 for all), indicating that the difference is unlikely to be due to differing LD between rs12489924 and the causal variant. (d) Scatter plot of all 86 YRI mSNPs, showing the strongest association found for that site in each population. Points are colored according to the significance of the difference in the associations within each population; most mSNP association strengths are significantly (P < 0.005) different between populations. The same plot for CEU mSNPs is shown in Figure S10 in Additional file 1. (e) Overlap of LCL mSNPs with brain mSNPs from two studies of European populations (similar to CEU). Both all CEU mSNPs and CEU-specific mSNPs show similar overlap of 40 to 42%, which is thus a minimum estimate for the extent of mSNPs shared between LCLs and brain. However, YRI-specific mSNPs show only 3.2% overlap, not significantly different from the 1.2% expected from any random set of CpG sites.

After excluding the potentially problematic probes, we identified 49 mSNPs in CEU and 86 in YRI (genotype versus methylation level r > 0.6; FDR of 37% and 28%, respectively), each explaining 36 to 92% of the variance in DNA methylation at the associated site (Figure 3a). We note that these numbers are not directly comparable to previous studies [11,12] that included CpG probes that may contain SNPs, since including probes overlapping SNPs in our analysis increases the number of (apparent) mSNPs while decreasing the FDR. Restricting the CpG sites to only those with heritable methylation (h² > 0.2) decreased the FDR substantially (24 mSNPs at 8.6% FDR in CEU; 55 mSNPs at 4.7% FDR in YRI), providing a high-confidence list of mSNPs (Table 1; Table S2 in Additional file 1), as well as evidence supporting our heritability estimates in each population. Our high-confidence YRI mSNP list overlapped the mSNPs from a previous study of YRI LCL mSNPs [13] over 50-fold more than expected by chance (Supplemental text in Additional file 1). The vast majority of our mSNPs did not coincide with eSNPs (SNPs associated with gene expression levels; Supplemental text in Additional file 1), in agreement with previous work [13], suggesting that most do not impact gene expression levels in standard LCL culture conditions.
None of these mSNPs affected methylation in known imprinted regions, and there was no enrichment for Gene Ontology categories or KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways among the genes associated with either population’s mSNPs.

To test our mSNP mapping accuracy, we performed bisulfite Sanger sequencing at one mSNP locus (RNF186; Table 1) on 55 individual DNA molecules from six samples (three CEU and three YRI; Figure S9 in Additional file 1). Each individual’s average methylation level at a particular CpG site (cg09195271) agreed with our array-based results (r² = 0.74), recapitulating the association between this site’s methylation and the genotype of a nearby SNP (rs3806308): individuals with a CC genotype had the lowest average methylation (4/27 DNA molecules methylated = 14.8%), CT was intermediate (5/18 = 27.8% methylated), and TT had the highest (8/10 = 80% methylated). Interestingly, the methylation at six additional CpG sites in between rs3806308 and the target CpG did not correlate with the SNP genotype, indicating site-specific control of methylation, and not a more general regional effect.

Table 1 High-confidence mSNPs in CEU

Gene      Chr  CpG position  mSNP        % CEU variance  % YRI variance  CEU h²  YRI h²  Brain
                                         explained       explained                       mSNP?
TTC13     1    229182620     rs7545429   71.3            49.0            0.41    0.64    No
MGC3207   19   13736014      rs371671    68.8            27.2            0.60    0.35    Yes
PPP4R2    3    73128376      rs9816164   66.7            43.2            0.51    0.23    Yes
LDHC      11   18390591      rs11601413  65.4            86.5            0.55    0.68    Yes
RNF186    1    20015084      rs3806308   65.1            68.3            0.41    0.50    No
FLJ32569  1    204085874     rs823080    58.5            4.5*            0.28    0.05    Yes
NDUFAF2   5    60275337      rs162244    57.4            62.6            0.26    0.49    No
PCGF3     4    689950        rs2242234   57.2            19.9*           0.47    -0.10   No
LTA       6    31648435      rs2516390   55.9            40.5            0.48    0.24    No
IGSF2     1    117345939     rs12130298  52.6            10.0*           0.96    -0.19   No
GSTM5     1    110056139     rs4970776   52.4            12.1*           0.55    0.14    Yes
FLJ32569  1    204085802     rs823080    50.4            3.7*            0.49    0.08    Yes
ASCIZ     16   79627243      rs16954698  47.8            9.6*            0.24    -0.12   No
TACSTD2   1    58815787      rs1109896   42.2            50.4            0.29    0.49    No
HLA-C     6    31347299      rs6457375   42.1            44.0            0.24    0.61    Yes
HLA-DRB5  6    32606582      rs9271586   42.0            28.2            0.32    0.42    No
LYCAT     2    30523367      rs829650    40.8            52.4            0.75    0.64    Yes
PARK2     6    163069159     rs13218900  40.4            41.6            0.21    0.03    No
ITPR1     3    4510075       rs304075    39.4            7.6*            0.21    -0.07   No
PSMD5     9    122644335     rs12343516  39.4            35.1            0.53    0.11    Yes
BTN3A2    6    26472772      rs2393667   38.1            14.9*           0.22    0.31    Yes
RAPGEF3   12   46439111      rs3759407   37.2            6.8*            0.71    -0.17   No
FAM83A    8    124264314     rs16898095  36.3            76.5            0.27    0.71    No
CRIP2     14   105011436     rs4983346   36.1            3.6*            0.46    0.04    No

The 24 mSNP-CpG site pairs where > 36% of the variance in CEU methylation is explained by the mSNP genotype, and h² > 0.2. When more than one SNP was tied for the strongest association (due to perfect LD), one was chosen randomly. The YRI association strength is for the top local (within 100 kb) mSNP association for the same CpG site. Marked with an asterisk (in bold in the original) are YRI associations that explain < 20% of the variance in YRI methylation, indicating a high-confidence set of CEU-specific associations. For brain mSNPs, the intersection of cis-acting mSNP lists used by the authors of each original study [11,12] was used. YRI mSNPs are listed in Table S2 in Additional file 1.

Comparing our complete catalogs of mSNPs from each population, we found little overlap between them, or in the DNA methylation sites associated with mSNPs: only 11 CpG sites (8.9% of the mSNP-associated sites) were present in both of our medium-confidence lists (Figure 3a-c). This lack of overlap parallels the extensive population specificity of both methylation levels (Figure 1c) and their heritabilities (Figure 2c).
Sites with population-specific mSNPs also tended to have population-specific heritabilities (Table 1, entries in bold; and see PLSCR2 in Figure 2a and 3b), suggesting that the mSNPs we detect are a major source of the heritability of their target sites’ methylation.

Three factors could contribute to a lack of overlap between mSNPs from each population: low power, differing LD/allele frequencies, and true population-specific effects of genetic variation on methylation. We found that neither low power nor differing LD/allele frequencies could account for most of the population specificity we observed (Supplemental text in Additional file 1), suggesting that many mSNPs exert population-specific effects on DNA methylation. Such population specificity can only be explained by interactions between the mSNPs and other genetic variants, and/or the environment (see Discussion).

Comparing our mSNP catalogs to previously reported mSNPs from brain allows us to test the generality of the observed population specificity in an independent cohort and tissue. Among our CEU mSNPs, 42% (10/24; Figure 3e) were previously observed in both of two brain mSNP catalogs that utilized cohorts of European ancestry [11,12] (Table 1), indicating that these associations are shared across tissues. A similar fraction (4/10, 40%; Figure 3e; Table 1, entries in bold) of the subset of high-confidence mSNPs observed only in CEU (not YRI) were also seen in brain. A key prediction of our results is that mSNPs found only in YRI should not be observed in the European brain samples if they are truly population specific. In support of this, only 1/32 (3.1%; Figure 3e; Table S1 in Additional file 1) of YRI-specific mSNPs were seen in European brain (not significantly different than the 1.2% expected by chance). This lack of overlap is unlikely to be due to potential artifacts of long-term cell culture, since the CEU cell lines are decades older than the YRI, which would tend to act against the trend we observed.
Therefore, we conclude that the population specificity we discovered is recapitulated in vivo, as well as across tissues.

Discussion

Our results demonstrate extensive population specificity in DNA methylation profiles near transcription start sites. We observed these differences at three levels: the extent of DNA methylation, its heritability, and its association with specific genetic variants (mSNPs). We attribute most of these differences to two main factors: population-specific allele frequencies of genetic variants affecting DNA methylation, and complex GxG or GxE interactions.

Although in vitro artifacts are always a concern when using cell lines - and in particular LCLs, which have been shown to have some methylation differences compared to blood [20,21] - our results are unlikely to be driven by these effects, for three main reasons. First, unlike some previous studies of population-level differences in these cell lines [17,25], we processed samples in a randomized design, to eliminate the possibility of batch effects influencing our estimates of population specificity. Second, we found most of the population-specific DNA methylation to be explained by local genetic variants, ruling out any type of cell line artifact as an alternative explanation. Third, and most importantly, our population-specific mSNPs are supported by comparison to two studies of brain mSNPs in cohorts of European ancestry: 40% of our CEU-specific mSNPs overlap with both of these previous studies, whereas only 3.1% of YRI-specific mSNPs do, despite our expectation that the much older CEU LCLs would be more likely to have accumulated abnormalities in DNA methylation [20]. Together, these lines of evidence strongly suggest that our results apply in vivo and across tissues.

A variant that is present in two populations, but affects DNA methylation in only one, can only be explained by complex genetic interactions. These interactions could involve the environment (GxE), epistasis with other variants (GxG), or both.
For example, some genetic variants have an observable effect on DNA methylation only in the presence of a sufficient quantity of methyl donors [26], which could differ between Yorubans and European-Americans as a result of diet or other factors (though methylation differences due to GxE interactions would have to be preserved during the creation and culturing of the LCLs). Even with such interactions causing differentiation between populations, genetic effects could be entirely additive within populations, consistent with our observation of heritable DNA methylation at many sites.

Divergence in the genetic underpinnings of DNA methylation (as evidenced by the population-specific mSNPs) would be expected to result in differing heritabilities and methylation levels, consistent with our results. Although we cannot provide an accurate estimate of exactly how much of the population-specific DNA methylation we observed is due to population-specific mSNPs, it is likely to be a substantial fraction once mSNPs of small effect (which could not be detected here due to our limited sample size) are accounted for.

Conclusions

As DNA methylation is an important epigenetic modification, affecting a wide range of diseases and other phenotypes [1-7], our finding that genetic or environmental interactions likely affect most mSNPs - and thus may also explain a substantial portion of the population specificity of DNA methylation levels, and their heritabilities - underscores the complex interplay of factors that influence epigenetic modifications. Further characterization of these factors will be critical for our understanding of the epigenome.

Materials and methods

Genome-wide DNA methylation analysis

Genomic DNA was purchased from the Coriell Institute. DNA concentration and purity were assessed spectrophotometrically using a NanoDrop ND-1000 (Thermo Scientific, Waltham, MA, USA).
After random ordering of all samples, 1 μg of genomic DNA from each sample was bisulfite-converted using the EZ-96 DNA Methylation Kit (Zymo Research, Irvine, CA, USA) as per Illumina’s Infinium-specific protocol. Bisulfite-converted DNA was then quantified by NanoDrop and concentrated to higher than 50 ng/μl using a Speedvac.

Quantitative DNA methylation measurements of bisulfite-treated genomic DNA were performed with the Infinium HumanMethylation27 BeadChip assay (Illumina, San Diego, CA, USA), using experimental procedures recommended by the manufacturer. Briefly, 200 ng of bisulfite-converted DNA was whole-genome amplified, fragmented by an enzymatic process and hybridized to BeadChip arrays. Two oligonucleotide probes interrogated each CpG site, one probe with sequences targeting methylated DNA and the other containing sequences targeting unmethylated DNA. After extension with DNP-labeled and biotin-labeled dNTP, each array was stained with Cy5-labeled anti-DNP antibodies and Cy3-labeled streptavidin and scanned with the Illumina iScan on a two-color channel to detect Cy3-labeled probes on the green channel and Cy5-labeled probes on the red channel. Using the Illumina GenomeStudio software package, methylation levels (β values) were then calculated by dividing the methylated probe signal intensity by the sum of methylated and unmethylated probe signal intensities. β values range from 0 (completely unmethylated) to 1 (fully methylated) and provide a quantitative readout of relative DNA methylation for each CpG site within the cell population being interrogated. This method was highly reproducible, as technical replicates across different runs had r > 0.996. All samples passed internal controls included on the HumanMethylation27 arrays, including controls for array background, hybridization quality, target specificity and bisulfite conversion.
Furthermore, all samples passed our quality control check of having fewer than 5% of sites with either detection P-value < 0.05 or fewer than five beads being present on the array for a particular CpG site. Cluster analysis also indicated the absence of any outlier samples. Raw data have been deposited in the Gene Expression Omnibus database under accession number [GSE27146].

Samples from both populations were run together in a randomized order to avoid confounding batch effects with population differences. In order to test for the presence of batch effects, we tested whether the DNA methylation profiles of samples run in either the same batch number (1 to 4) or well number (1 to 96) were more similar to each other than expected by chance. Neither batch number nor well number was predictive of profile similarity (comparing correlation coefficients within batches or wells to all sample correlations, Wilcoxon P = 0.79 and 0.64, respectively), indicating the lack of any detectable batch effects.

Several steps were applied for normalization of β values across the subjects. First, average background intensity, as measured by negative background probes present on the array, was subtracted from the raw intensities to adjust for varying background signals across different samples. This background adjustment was done separately for raw data from the green and red channels to adjust for Cy3 and Cy5 differences. All negative intensities were assigned values of zero before further normalizations were performed. To minimize batch effects across different sets of arrays, background-adjusted raw data from both channels were quantile normalized separately.
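The background adjustment and quantile normalization steps just described can be sketched as below for a single color channel. This is an illustrative sketch under stated assumptions, not the pipeline actually used: the function names `subtract_background` and `quantile_normalize` are hypothetical, each array is represented as a plain list of probe intensities, and ties are broken by probe order rather than averaged.

```python
def subtract_background(intensities, background):
    """Subtract the average negative-control intensity and set
    any resulting negative values to zero."""
    return [max(x - background, 0.0) for x in intensities]


def quantile_normalize(samples):
    """Quantile-normalize equal-length intensity lists (one per array).

    Each array's k-th smallest value is replaced by the mean of the
    k-th smallest values across all arrays.
    """
    n_probes = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    # Mean intensity at each rank across arrays.
    rank_means = [
        sum(s[k] for s in sorted_samples) / len(samples)
        for k in range(n_probes)
    ]
    normalized = []
    for s in samples:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n_probes), key=lambda i: s[i])
        out = [0.0] * n_probes
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized
```

In this sketch the two functions would be applied once to the green-channel data and once to the red-channel data, mirroring the separate per-channel treatment in the text.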
Applying the same formula used by GenomeStudio, average β values were then recalculated using background-subtracted and quantile-normalized intensities of methylated probes divided by the sum of normalized intensities from unmethylated and methylated probes.

Pyrosequencing

DNA methylation of the promoter regions of PLSCR2 and IGSF2 containing specific CpG loci under the control of mSNPs was confirmed using bisulfite pyrosequencing. Genomic DNA (750 ng) was bisulfite converted using an EZ DNA Methylation Gold kit (Zymo Research). After PCR amplification of approximately 200 bp regions encompassing the target loci using specifically designed primers to ensure unbiased amplification, quantitative measurement of DNA methylation at each CpG was performed using a pyrosequencing primer located within 30 bp of the CpG interrogated. Reactions were measured on a PyroMark Q96 MD Pyrosequencer following the manufacturer’s protocol, and analyzed using the Pyro Q-CpG software (Biotage, Uppsala, Sweden), which allows quality assessment of each measurement. CpG loci that were called ‘passed’ in the default software settings are shown in Figure 1b (n = 175 for IGSF2; n = 156 for PLSCR2). To assess the agreement between methods, we used Pearson’s correlation (as throughout the manuscript), because rank-based correlations do not account for the clustering of most samples within a small range of methylation (for example, 95 to 100% methylation for PLSCR2 in Figure 1b). An alternative metric, classifying sites into high or low methylation based on a cutoff and measuring agreement in a 2 × 2 contingency table, led to results similar to the Pearson correlation across a wide range of cutoffs (data not shown).
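The cutoff-based alternative metric can be illustrated with a short sketch. The text does not say how agreement in the 2 × 2 table was scored, so the fraction of concordant calls used here is an assumption, as are the function name `contingency_agreement` and the convention that values at or above the cutoff count as high.

```python
def contingency_agreement(array_values, pyro_values, cutoff):
    """Classify paired measurements as low/high methylation and
    tabulate how often the two methods agree.

    Returns (table, agreement), where table[i][j] counts samples with
    array call i and pyrosequencing call j (0 = low, 1 = high).
    """
    table = [[0, 0], [0, 0]]
    for a, p in zip(array_values, pyro_values):
        table[int(a >= cutoff)][int(p >= cutoff)] += 1
    total = sum(sum(row) for row in table)
    # Assumption: agreement is the fraction of concordant calls.
    agreement = (table[0][0] + table[1][1]) / total
    return table, agreement
```

Repeating the call over a grid of cutoffs would show whether the agreement is stable across cutoffs, which is the comparison the text reports.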
Primer sequences used for DNA amplification and pyrosequencing are available upon request.

Calculation of false discovery rates

All FDRs were estimated by randomization, which preserves all aspects of the data that might affect statistical analyses. For example, the FDR for population-specific methylation was estimated by randomly assigning CEU/YRI labels, and recalculating the Wilcoxon P-value on the randomized data (resulting in an essentially uniform distribution of P-values, like that shown in Figure 1c). FDRs for mSNPs were estimated by pairing genotypes with randomly chosen methylation profiles, and calculating mSNPs as for the real data. Because of the family trio structure of the HapMap samples, not all samples are independent; to account for this in our randomization procedure, we also performed randomizations based on swapping methylation data for entire trios, in effect treating each trio as an independent unit composed of three methylation profiles and three genome sequences. This procedure yielded indistinguishable FDRs compared to randomizing all samples individually. All FDRs are based on at least 1,000 randomizations.

Heritability analysis

Narrow-sense heritabilities (h²) were estimated as the correlation between average parental values and their offspring. Because the offspring and parental variances are equal, this is equivalent to performing regression. Although heritabilities are by definition non-negative, our estimates are often negative due to the limited power inherent in our data. We note that our method of estimating h² assumes that there is no shared environmental variance between parents and offspring that impacts DNA methylation; if this assumption is violated, we will overestimate h² (with an upper bound of H², the broad-sense heritability). It also assumes that somatic DNA methylation is not passed directly from parent to offspring through the germline, since this would violate the assumptions of the heritability estimation.
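The parent-offspring estimate for a single CpG site, together with a randomized null of the kind used throughout this section, can be sketched as follows. The function names (`pearson`, `midparent_h2`, `randomized_h2`) are hypothetical, and the specific randomization shown, shuffling offspring values relative to their parents, is an assumption about one reasonable way to break the parent-offspring pairing; the text does not spell out the exact scheme.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def midparent_h2(father, mother, child):
    """Estimate h2 at one site as the correlation between the
    average parental value and the offspring value across trios."""
    midparent = [(f + m) / 2 for f, m in zip(father, mother)]
    return pearson(midparent, child)


def randomized_h2(father, mother, child, n_rand=1000, seed=0):
    """Null distribution of the estimate, obtained by shuffling
    offspring across trios (an assumed randomization scheme)."""
    rng = random.Random(seed)
    shuffled = list(child)
    null = []
    for _ in range(n_rand):
        rng.shuffle(shuffled)
        null.append(midparent_h2(father, mother, shuffled))
    return null
```

Because the estimate is a correlation, it can fall below zero for individual sites, which is consistent with the negative estimates mentioned in the text.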
To estimate the number of CpG sites with heritable methylation, we generated 1,000 randomized versions of the h² distribution (see above), and calculated the number of sites with greater methylation in the real data, compared to each randomized distribution. Visually, this corresponds to the area in between the two distributions, on the right side (positive values) where the real distribution is shifted to the right. The average difference across the 1,000 randomizations was 762 sites for CEU, and 930 for YRI. Note that this procedure allows us to estimate the number of heritable sites, but not specify which specific sites are the heritable ones; thus, it is not possible to calculate an FDR for these estimates.

mSNP analysis

mSNPs were identified by calculating correlations between SNP genotypes (arbitrarily coded as 0, 1, and 2) and methylation levels. Only SNPs within 100 kb of each CpG site were tested, to reduce the multiple testing burden. Although the 1000 Genomes SNP catalog is more complete, we used HapMap genotypes [16] for the mSNP analysis, since not all cell lines for which we collected methylation data have been sequenced as part of the 1000 Genomes Project [22]. We required a minimum of 5 minor alleles among the 90 individuals of each population to include a SNP in this analysis (for details of how we accounted for the family trio structure, see ‘Calculation of false discovery rates’ above). This resulted in 2,668,982 YRI SNPs and 2,405,735 CEU SNPs (1,969,973 shared by both). For the analysis of genetic variants contributing to population-level differences, only the SNPs shared by both populations were used, and population was represented in the multiple regression as 0/1 for CEU/YRI.

Correlations were recorded as the absolute value of the correlation coefficient, since the sign is arbitrary, depending on how genotypes are coded as 0/1/2.
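The per-site scan described above (a 100 kb window, a minimum of 5 minor alleles, and the absolute correlation between 0/1/2 genotypes and methylation) can be sketched as follows. This is a simplified illustration rather than the authors' code: the names `minor_allele_count` and `best_msnp`, the tuple layout for SNPs, and the choice to report only the strongest SNP per site are assumptions made for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def minor_allele_count(genotypes):
    """Count copies of the rarer allele given 0/1/2 genotype codes."""
    coded = sum(genotypes)
    return min(coded, 2 * len(genotypes) - coded)


def best_msnp(cpg_pos, methylation, snps, window=100_000, min_minor=5):
    """Return (snp_id, |r|) for the SNP most strongly correlated with
    methylation at one CpG site, or None if no SNP qualifies.

    `snps` is a list of (snp_id, position, genotypes) tuples; this
    layout is a hypothetical one chosen for the sketch.
    """
    best = None
    for snp_id, pos, genotypes in snps:
        if abs(pos - cpg_pos) > window:
            continue  # outside the tested window
        if minor_allele_count(genotypes) < min_minor:
            continue  # too few minor alleles
        r = abs(pearson(genotypes, methylation))  # sign is arbitrary
        if best is None or r > best[1]:
            best = (snp_id, r)
    return best
```

Taking the absolute value makes the result independent of which allele is coded as 0 or 2, which is why recoding a SNP's genotypes in the opposite direction leaves its score unchanged.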
However, for comparisons between CEU and YRI correlations, the fact that all correlations are positive means that the difference between associations can be underestimated. If the same SNP (or two SNPs in high LD) was used to calculate the correlation with a particular CpG site’s methylation in both populations, the signs could be used; however, in most cases a site’s strongest correlation was with different SNPs in CEU and YRI, precluding the use of signs.

Bisulfite sequencing of RNF186 promoter region

Genomic DNA (500 ng) was bisulfite converted using the EZ-96 DNA Methylation Gold Kit (Zymo Research) as per the manufacturer’s protocol with minor modifications. A 532 bp region upstream of the RNF186 gene containing the SNP rs3806308 and the CpG site cg09195271 from the Illumina HumanMethylation array was amplified by nested PCR reactions using HotstarTaq (Qiagen, Hilden, Germany). The first round of PCR amplification was done using 55°C annealing temperature for 30 cycles and the primer pair F3 (GGATATAGAGGGTGGTTTGTAGTGTTAGT) and R2 (ACRCACAAATATTTAACACCTACTACT). A 3 μl aliquot of the material obtained in the first round was further amplified in the second round in a total volume of 50 μl, using 51°C annealing temperature for 35 cycles and the primer pair F2 (TGAATGAAATATTTGTTTGAGGGAGTGT) and R3 (CCTTAAAACCACAACTATTATATTCACAA). All primers were designed to be specific for bisulfite-converted DNA. The amplified PCR product was separated from primers by electrophoresis in a 1.5% Tris-acetate-EDTA (TAE) agarose gel, excised and purified using the QIAquick gel extraction kit (Qiagen). Purified DNA was then ligated into plasmid pGem-T Easy using the pGem-T Easy vector system (Promega, Madison, WI, USA) and transformed into competent JM109 Escherichia coli (Promega) by the CaCl2 method. Colonies carrying a plasmid containing an insert were then selected based on blue-white screening.
Plasmid DNA was extracted using the Qiaprep Spin Miniprep kit (Qiagen). Plasmid clones containing the appropriately sized insert, as determined by a restriction digestion analysis, were sequenced using T7 and/or SP6 primers by Genewiz Inc. (South Plainfield, NJ, USA). Sequences were analyzed using Sequencher sequence analysis package 4.6 (Gene Codes Corporation, Ann Arbor, MI, USA).

Additional material

Additional file 1: Supplemental text, Tables S1 and S2, and Figures S1 to S19 [27-30].

Abbreviations

bp: base pair; CEU: HapMap population of Northern European ancestry; CpG: cytosine-phosphate-guanine; FDR: false discovery rate; GxE: gene-by-environment; GxG: gene-by-gene; LCL: lymphoblastoid cell line; LD: linkage disequilibrium; mSNP: methylation-associated SNP; SNP: single-nucleotide polymorphism; YRI: HapMap population of Yoruban ancestry.

Acknowledgements

We thank M Feldman, M Hayden, J Rine, S Roy, and an anonymous reviewer for helpful comments and discussion. We further thank M Lorincz for advice on bisulfite sequencing and use of Sequencher software, and A Devlin for use of the PyroMarkMD system. Work in MSK’s laboratory is supported by National Institutes of Health (NIH) grant R24MH-081797-01. Work in HBF’s laboratory is supported by National Institutes of Health (NIH) grant 1R21HG005750-01A1. MSK is a Scholar of the Canadian Institute for Advanced Research and of the Mowafaghian Foundation. HBF is an Alfred P Sloan Fellow and Pew Scholar in the Biomedical Sciences.

Author details

1 Department of Biology, Stanford University, Stanford, CA 94305, USA.
2 Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, V6T 1Z3, Canada.
3 Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Vancouver, British Columbia V5Z 4H4, Canada.

Authors’ contributions

MSK designed the project, oversaw data generation and wrote the paper. HBF designed the project, analyzed the data and wrote the paper. LL and SN generated and normalized the data.
All authors have approved the final manuscript for publication.

Competing interests

The authors declare that they have no competing interests.

Received: 29 September 2011 Revised: 30 January 2012 Accepted: 9 February 2012 Published: 9 February 2012

References

1. Mohn F, Schübeler D: Genetics and epigenetics: stability and plasticity during cellular differentiation. Trends Genet 2009, 25:129-136.
2. Bonasio R, Tu S, Reinberg D: Molecular signals of epigenetic states. Science 2010, 330:612-616.
3. Law JA, Jacobsen SE: Establishing, maintaining and modifying DNA methylation patterns in plants and animals. Nat Rev Genet 2010, 11:204-220.
4. Illingworth RS, Bird AP: CpG islands - ‘a rough guide’. FEBS Lett 2009, 583:1713.
5. Chang SC, Tucker T, Thorogood NP, Brown CJ: Mechanisms of X-chromosome inactivation. Front Biosci 2006, 11:852-866.
6. Jaenisch R, Bird A: Epigenetic regulation of gene expression: how the genome integrates intrinsic and environmental signals. Nat Genet 2003, 33(Suppl):245-254.
7. Bjornsson HT, Fallin MD, Feinberg AP: An integrated epigenetic and genetic approach to common human disease. Trends Genet 2004, 20:350-358.
8. Zilberman D, Henikoff S: Genome-wide analysis of DNA methylation patterns. Development 2007, 134:3959-3965.
9. Li Y, Zhu J, Tian G, Li N, Li Q, Ye M, Zheng H, Yu J, Wu H, Sun J, Zhang H, Chen Q, Luo R, Chen M, He Y, Jin X, Zhang Q, Yu C, Zhou G, Sun J, Huang Y, Zheng H, Cao H, Zhou X, Guo S, Hu X, Li X, Kristiansen K, Bolund L, Xu J, et al: The DNA methylome of human peripheral blood mononuclear cells. PLoS Biol 2010, 8:e1000533.
10. Eckhardt F, Lewin J, Cortese R, Rakyan VK, Attwood J, Burger M, Burton J, Cox TV, Davies R, Down TA, Haefliger C, Horton R, Howe K, Jackson DK, Kunde J, Koenig C, Liddle J, Niblett D, Otto T, Pettett R, Seemann S, Thompson C, West T, Rogers J, Olek A, Berlin K, Beck S: DNA methylation profiling of human chromosomes 6, 20 and 22. Nat Genet 2006, 38:1378-1385.
11. Zhang D, Cheng L, Badner JA, Chen C, Chen Q, Luo W, Craig DW, Redman M, Gershon ES, Liu C: Genetic control of individual differences in gene-specific methylation in human brain. Am J Hum Genet 2010, 86:411-419.
12. Gibbs JR, van der Brug MP, Hernandez DG, Traynor BJ, Nalls MA, Lai SL, Arepalli S, Dillman A, Rafferty IP, Troncoso J, Johnson R, Zielke HR, Ferrucci L, Longo DL, Cookson MR, Singleton AB: Abundant quantitative trait loci exist for DNA methylation and gene expression in human brain. PLoS Genet 2010, 6:e1000952.
13. Bell JT, Pai AA, Pickrell JK, Gaffney DJ, Pique-Regi R, Degner JF, Gilad Y, Pritchard JK: DNA methylation patterns associate with genetic and gene expression variation in HapMap cell lines. Genome Biol 2011, 12:R10.
14. Boks MP, Derks EM, Weisenberger DJ, Strengman E, Janson E, Sommer IE, Kahn RS, Ophoff RA: The relationship of DNA methylation with age, gender and genotype in twins and healthy controls. PLoS One 2009, 4:e6767.
15. Kaminsky ZA, Tang T, Wang SC, Ptak C, Oh GH, Wong AH, Feldcamp LA, Virtanen C, Halfvarson J, Tysk C, McRae AF, Visscher PM, Montgomery GW, Gottesman II, Martin NG, Petronis A: DNA methylation profiles in monozygotic and dizygotic twins. Nat Genet 2009, 41:240-245.
16. International HapMap 3 Consortium, Altshuler DM, Gibbs RA, Peltonen L, Altshuler DM, Gibbs RA, Peltonen L, Dermitzakis E, Schaffner SF, Yu F, Peltonen L, Dermitzakis E, Bonnen PE, Altshuler DM, Gibbs RA, de Bakker PI, Deloukas P, Gabriel SB, Gwilliam R, Hunt S, Inouye M, Jia X, Palotie A, Parkin M, Whittaker P, Yu F, Chang K, Hawes A, Lewis LR, Ren Y, et al: Integrating common and rare genetic variation in diverse human populations. Nature 2010, 467:52.
17. Spielman RS, Bastone LA, Burdick JT, Morley M, Ewens WJ, Cheung VG: Common genetic variants account for differences in gene expression among ethnic groups. Nat Genet 2007, 39:226-231.
18. Storey JD, Madeoy J, Strout JL, Wurfel M, Ronald J, Akey JM: Gene-expression variation within and among human populations. Am J Hum Genet 2007, 80:502-509.
19. Zhang W, Duan S, Kistner EO, Bleibel WK, Huang RS, Clark TA, Chen TX, Schweitzer AC, Blume JE, Cox NJ, Dolan ME: Evaluation of genetic variation contributing to differences in gene expression between populations. Am J Hum Genet 2008, 82:631-640.
20. Grafodatskaya D, Choufani S, Ferreira JC, Butcher DT, Lou Y, Zhao C, Scherer SW, Weksberg R: EBV transformation and cell culturing destabilizes DNA methylation in human lymphoblastoid cell lines. Genomics 2010, 95:73-83.
21. Caliskan M, Cusanovich DA, Ober C, Gilad Y: The effects of EBV transformation on gene expression levels and methylation profiles. Hum Mol Genet 2011, 20:1643-1652.
22. The 1000 Genomes Project Consortium: A map of human genome variation from population-scale sequencing. Nature 2010, 467:1061.
23. Lynch M, Walsh B: Genetics and Analysis of Quantitative Traits. Sinauer Assoc 1997.
24. Alberts R, Terpstra P, Li Y, Breitling R, Nap JP, Jansen RC: Sequence polymorphisms cause many false cis eQTLs. PLoS One 2007, 2:e622.
25. Akey JM, Biswas S, Leek JT, Storey JD: On the design and analysis of gene expression studies in human populations. Nat Genet 2007, 39:807-808.
26. Friso S, Choi SW: Gene-nutrient interactions in one-carbon metabolism. Curr Drug Metab 2005, 6:37-46.
27. Bjornsson HT, Sigurdsson MI, Fallin MD, Irizarry RA, Aspelund T, Cui H, Yu W, Rongione MA, Ekström TJ, Harris TB, Launer LJ, Eiriksdottir G, Leppert MF, Sapienza C, Gudnason V, Feinberg AP: Intra-individual change over time in DNA methylation with familial clustering. JAMA 2008, 299:2877-2883.
28. Pickrell JK, Marioni JC, Pai AA, Degner JF, Engelhardt BE, Nkadori E, Veyrieras JB, Stephens M, Gilad Y, Pritchard JK: Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature 2010, 464:768.
29. Veyrieras JB, Kudaravalli S, Kim SY, Dermitzakis ET, Gilad Y, Stephens M, Pritchard JK: High-resolution mapping of expression-QTLs yields insight into human gene regulation. PLoS Genet 2008, 4:e1000214.
30. Chen YA, Choufani S, Ferreira JC, Grafodatskaya D, Butcher DT, Weksberg R: Sequence overlap between autosomal and sex-linked probes on the Illumina HumanMethylation27 microarray. Genomics 2011, 97:214-222.

doi:10.1186/gb-2012-13-2-r8
Cite this article as: Fraser et al.: Population-specificity of human DNA methylation. Genome Biology 2012 13:R8.

