@prefix vivo: . @prefix edm: . @prefix ns0: . @prefix dcterms: . @prefix skos: . vivo:departmentOrSchool "Medical Genetics, Department of"@en, "Medicine, Faculty of"@en, "Molecular Medicine and Therapeutics, Centre for"@en, "Non UBC"@en ; edm:dataProvider "DSpace"@en ; ns0:identifierCitation "Epigenetics & Chromatin. 2015 May 09;8(1):19"@en ; dcterms:contributor "Child and Family Research Institute"@en ; ns0:rightsCopyright "Farré et al.; licensee BioMed Central."@en ; dcterms:creator "Farré, Pau"@en, "Jones, Meaghan J."@en, "Meaney, Michael J."@en, "Emberly, Eldon"@en, "Turecki, Gustavo"@en, "Kobor, Michael S. (Geneticist)"@en ; dcterms:issued "2015-10-24T01:37:59"@en, "2015-05-09"@en ; dcterms:description """Background: DNA methylation is an epigenetic mark that balances plasticity with stability. While DNA methylation exhibits tissue specificity, it can also vary with age and potentially environmental exposures. In studies of DNA methylation, samples from specific tissues, especially brain, are frequently limited and so surrogate tissues are often used. As yet, we do not fully understand how DNA methylation profiles of these surrogate tissues relate to the profiles of the central tissue of interest. Results: We have adapted principal component analysis to analyze data from the Illumina 450K Human Methylation array using a set of 17 individuals with 3 brain regions and whole blood. All of the top five principal components in our analysis were associated with a variable of interest: principal component 1 (PC1) differentiated brain from blood, PCs 2 and 3 were representative of tissue composition within brain and blood, respectively, and PCs 4 and 5 were associated with age of the individual (PC4 in brain and PC5 in both brain and blood). We validated our age-related PCs in four independent sample sets, including additional brain and blood samples and liver and buccal cells. 
Gene ontology analysis of all five PCs showed enrichment for processes that inform on the functions of each PC. Conclusions: Principal component analysis (PCA) allows simultaneous and independent analysis of tissue composition and other phenotypes of interest. We discovered an epigenetic signature of age that is not associated with cell type composition and required no correction for cellular heterogeneity."""@en ; edm:aggregatedCHO "https://circle.library.ubc.ca/rest/handle/2429/54766?expand=metadata"@en ; skos:note "RESEARCH Open Access
Concordant and discordant DNA methylation signatures of aging in human blood and brain
Pau Farré1, Meaghan J Jones2,3, Michael J Meaney4,5,6, Eldon Emberly1, Gustavo Turecki7 and Michael S Kobor2,3,6*
Farré et al. Epigenetics & Chromatin (2015) 8:19 DOI 10.1186/s13072-015-0011-y
Abstract
Background: DNA methylation is an epigenetic mark that balances plasticity with stability. While DNA methylation exhibits tissue specificity, it can also vary with age and potentially environmental exposures. In studies of DNA methylation, samples from specific tissues, especially brain, are frequently limited and so surrogate tissues are often used. As yet, we do not fully understand how DNA methylation profiles of these surrogate tissues relate to the profiles of the central tissue of interest.
Results: We have adapted principal component analysis to analyze data from the Illumina 450K Human Methylation array using a set of 17 individuals with 3 brain regions and whole blood. All of the top five principal components in our analysis were associated with a variable of interest: principal component 1 (PC1) differentiated brain from blood, PCs 2 and 3 were representative of tissue composition within brain and blood, respectively, and PCs 4 and 5 were associated with age of the individual (PC4 in brain and PC5 in both brain and blood). We validated our age-related PCs in four independent sample sets, including additional brain and blood samples and liver and buccal cells. Gene ontology analysis of all five PCs showed enrichment for processes that inform on the functions of each PC.
Conclusions: Principal component analysis (PCA) allows simultaneous and independent analysis of tissue composition and other phenotypes of interest. We discovered an epigenetic signature of age that is not associated with cell type composition and required no correction for cellular heterogeneity.
Keywords: DNA methylation, Principal component analysis, Aging, Brain, Blood, Epigenetics
* Correspondence: msk@cmmt.ubc.ca
2 Centre for Molecular Medicine and Therapeutics, Child & Family Research Institute, 950 W 28th ave, Vancouver, BC V5Z4H4, Canada
3 Department of Medical Genetics, University of British Columbia, 950 W 28th ave, Vancouver, BC V5Z4H4, Canada
Full list of author information is available at the end of the article
© 2015 Farré et al.; licensee BioMed Central. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Background
Epigenetics refers to modifications to DNA and chromatin that regulate transcription without alteration of the genetic code. The best-studied epigenetic mark is DNA methylation, first defined as the addition of a methyl group to a cytosine residue, most frequently in the context of CG dinucleotides. CpGs are not uniformly distributed in the genome and tend to be enriched in CpG islands [1,2]. Most promoters in the genome have an associated CpG island, and DNA methylation levels at these promoter-associated islands associate with gene expression levels [2]. CpG density is commonly classed into four categories: high-density CpG island (HC), intermediate density CpG island (IC), intermediate density island shore (ICshore, meaning intermediate density regions found flanking HC regions), and low-density CpG island (LC) [2,3]. This CpG density is related to both DNA methylation level and variability [1,2]. While the common form of DNA methylation described above has been the most extensively studied, many other related modifications have recently emerged. Chief among them is 5-hydroxymethylcytosine, the oxidized form of the canonical 5-methylcytosine mark of DNA methylation. Catalyzed by the ten-eleven translocation (TET) family of enzymes, DNA hydroxymethylation is thought to exist as an intermediate in the process of active DNA demethylation, although its exact functional role remains enigmatic [4,5]. DNA hydroxymethylation in the mammalian brain is typically 5 to 10 times higher than in any other tissue, and recent evidence has suggested that it plays an important role in normal brain function [6-9]. Importantly, most sodium bisulfite-based methods of measuring DNA methylation cannot distinguish between the different kinds of modifications.
Epigenetics in general and DNA methylation in particular are associated with cell fate and differentiation. Landscapes of DNA methylation are highly divergent between cell types, with cells from similar lineages showing more similar DNA methylation profiles [10,11]. In the context of DNA methylation, the main drivers of tissue-specific patterns are located in areas of low CpG density outside of islands and shores, and so differences between tissues are often found in discrete locations in the genome [12,13].
Studies of the differences in DNA methylation between cell types in humans pose two difficulties. First, since in many cases the tissue of interest for a particular condition may not be available, surrogate tissues are often employed. In particular, studies of the brain necessarily require the use of postmortem tissue, tissue from surgical resection, or a surrogate tissue. Postmortem tissue has been useful in many studies, but in general for large-scale studies, only surrogate tissue is available [14,15]. In these cases, researchers examine accessible peripheral tissues like buccal cells or blood to examine associations with phenotypes that are assumed to be manifested in central tissues. Since different tissues show distinct epigenetic patterns, it is important to compare the DNA methylomes of peripheral and central tissues. One area of particular interest is determining whether the variation between individuals in a peripheral tissue resembles that in central tissues. Second, when comparing a specific tissue across individuals, differences in the cellular composition of the tissue sample can greatly affect DNA methylation pattern differences between the individuals [16].
This problem can be corrected by measuring the tissue composition of the sample, by using the DNA methylation profiles themselves to predict the underlying cell composition of the tissue sample, or by using methods that correct for underlying cell composition without the actual measurements [17-20]. It is therefore important to develop a method that can assess the concordance between a surrogate tissue and the central tissue it represents while simultaneously controlling for cell composition differences in both tissues.
Organismic aging is a major component of epigenetic variation. Epigenetic aging consists principally of two distinct types of changes, termed ‘epigenetic drift’ and the ‘epigenetic clock’ [21-24]. Epigenetic drift refers to an increase in inter-individual variability with age, while the epigenetic clock refers to observations that specific sites in the genome show DNA methylation changes that are highly correlated with age across individuals. All tissues examined show an overall increase in DNA methylation with age, with some sites showing loss in methylation [14,25-32]. These changes tend to occur at genes related to developmental processes. Sites that gain DNA methylation with age tend to be located in islands, while sites that lose methylation are less likely to be found in islands, indicating a trend towards median levels of DNA methylation with age [11,14].
Studies of DNA methylation in cohorts present additional challenges. Sample size is often limited when interrogating precious primary human material from central organs [33]. This limitation creates statistical challenges, because current high-dimensional methodologies measure the methylation status of a large number of CpGs and therefore require stringent multiple-testing control.
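To make the multiple-testing burden concrete, the sketch below (an illustration, not part of the authors' pipeline) computes the Bonferroni per-probe cutoff for the 408,576 probes retained in this study and implements the Benjamini-Hochberg false discovery rate procedure, one common form of P value adjustment. The function names are hypothetical and the five P values in the usage lines are made up.

```python
def bonferroni_threshold(alpha, n_tests):
    # Per-test P value cutoff that keeps the family-wise error rate at alpha.
    return alpha / n_tests


def benjamini_hochberg(p_values, alpha=0.05):
    # Benjamini-Hochberg step-up procedure: find the largest rank k with
    # p_(k) <= k / m * alpha and reject the k smallest P values.
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            cutoff_rank = rank
    # Indices (in input order) of the tests declared significant.
    return sorted(order[:cutoff_rank])


# Usage with the probe count retained in this study:
print(bonferroni_threshold(0.05, 408_576))  # roughly 1.2e-07 per probe
print(benjamini_hochberg([0.001, 0.2, 0.012, 0.04, 0.9]))  # -> [0, 2]
```

With hundreds of thousands of probes, a per-probe cutoff of about one in ten million is hard to reach with a small cohort, which is the motivation for the pattern-based approach discussed next.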
While correlational analysis with P values adjusted for multiple testing is commonly used, other methods are emerging that identify a reduced number of common patterns of variation across probes, lessening the impact of multiple statistical tests [34]. In addition, since CpGs tend to show highly correlated methylation profiles, especially CpGs situated in proximity, statistical approaches that assume probe independence are not ideally suited to the study of DNA methylation. In contrast, principal component analysis (PCA) builds on the observation that CpGs in an individual often share common patterns of DNA methylation [10,20,35]. PCA is a technique that identifies correlations among data points within a large multidimensional data set and is useful for reducing the dimensionality of the data. A given principal component (PC) describes a particular pattern of DNA methylation across samples. Each sample in the data set is assigned a score for each principal component, indicating the relative contribution of each PC-related pattern to the sample’s overall pattern. Each PC is also orthogonal to (and thus uncorrelated with) the others and accounts for a particular amount of variance within the data. PCA has often been used to identify batch effects in DNA methylation data, but has recently begun to be appreciated for its potential in broader and more biological aspects of epigenetic analysis [10,36-38].
We used a PCA approach to compare DNA methylation in brain and blood samples from 17 individuals. This matched design allows for rigorous assessment of DNA methylation irrespective of inter-individual differences in environment or genetic background. Given the large number of tests and the relatively small sample size, PCA, which allows for the identification of dominant patterns of variation in methylation between tissues and also across individuals within a tissue, was an appropriate choice. We
Wefound that PCA robustly identified patterns of DNAmethylation associated with known traits even in thissmall cohort, two of which we validated in independentlarger cohorts. The results presented here identify a PCA-based age predictor, as well as specific genomic locationswhere DNA methylation is more or less variable in brainand blood tissue.Results and discussionBlood and brain samples were obtained from theDouglas-Bell Canada Brain Bank. A total of 17 partici-pants were included in the study, ranging from 15 to 87years of age, with 4 females and 13 males. Three corticalregions (Broadmann area 10 (BA10), prefrontal cortex;Broadmann area 7 (BA7), parietal cortex; and Broadmannarea 20 (BA20) temporal cortex) were dissected frompostmortem brain as described previously [39], and wholeblood was collected postmortem from each subject byvenipuncture. We used the Infinium Human Methylation450K array to determine the genomic DNA methylationprofiles of the three brain regions and matching peripheralwhole blood. It is important to note that this technique, ascurrently applied, does not distinguish between DNAmethylation and DNA hydroxymethylation, so our reportsof DNA methylation in brain particularly are a compositeof both marks. We obtained all 4 tissues of interest for 15of the 17 participants; BA20 was missing from one partici-pant and whole blood sample from another. We removedpoorly performing probes, including those that overlappedwith SNPs or hybridized to multiple locations in the gen-ome and those located on the X and Y chromosomes,resulting in a total of 408,576 probes [3]. We applied PCAto this dataset to identify the major patterns of variationin DNA methylation.The majority of variation in DNA methylation wasaccounted for by tissue differences, cellular heterogeneitywithin a tissue, and subject ageWe first identified the distinct principal components anddetermined their contribution to the total variance inour dataset (Additional file 1: Figure S1A). 
The first 13PCs accounted for more than 90% of the variance in thedata (Additional file 1: Figure S1A). Patterns of DNAmethylation across samples for each PC were quite dis-tinct, as illustrated for the top five PCs (Figure 1A). EachficderhoFarré et al. Epigenetics & Chromatin (2015) 8:19 Page 3 of 17Figure 1 The first five principal components were associated with speciSamples were sorted by their tissue of origin (background color) and orvariables that correlated with the first five PC patterns. Correlations are swhich they are correlated. BA10, Broadmann area 10; BA20, Broadmann arewhole blood.biological factors. (A) First five PCs and their associated variance.ed by increasing age of the individual for each tissue. (B) Samplewn in the gray boxes. Arrows connect PCs with the variables witha 20, BA7, Broadmann area 7; PC, principal component; WB,Farré et al. Epigenetics & Chromatin (2015) 8:19 Page 4 of 17panel was first ordered by tissue and second by increasingsubject age within each tissue. Both positive and negativecorrelations with PCs indicate the same type of relation-ship, since the sign of a particular PC score is arbitrary.The majority of the variation (75%) was accounted forby PC1, which clearly separated brain from blood tissue,suggesting that the dominant difference in DNA methy-lation across all samples was the difference betweenblood and brain tissues (Figure 1A,B). Previous studiesreveal that between-tissue differences are the main pre-dictors of DNA methylation variability [10,20,40]. ThisPC likely included a contribution of hydroxymethylatedsites, since hydroxymethylation is significantly higher inthe brain than in the blood [8,9].We also identified PCs with variability apparent onlyin brain tissue (PC2 in Figure 1A), in blood only (PC3in Figure 1A), or across all tissues (PC4 and PC5 inFigure 1A, PC6 in Additional file 1: Figure S1D). 
Theobservation that PCs 2 and 3 both showed variability inonly one of the two tissues raised the possibility thatcellular composition within each tissue was underlyingthe pattern of variability observed. To test this hypothesis,we used cellular composition prediction algorithms onour DNA methylation data for each sample. This analysisresulted in predicted proportions of white blood cell typesfor the blood samples and a neuron/glia proportion forthe brain samples [41,42]. Using these predicted propor-tions, we found that PC2, which was variable in all threebrain regions but not blood, was highly correlated withthe predicted proportion of neurons in the sample (r =0.98, P = 2.4E−33, Figure 1B, scatter plot in Additional file2: Figure S2). In contrast, PC3, which was variable inblood, but not brain, was highly correlated with the pre-dicted proportion of granulocytes in the whole blood sam-ple, but no other white blood cell types (r = −0.90, P =1.7E−6, Figure 1B, scatter plot in Additional file 2: FigureS2). This finding was consistent with granulocytes beingthe predominant white blood cell type and thus likelycontributing the majority of signal to a DNA methyla-tion profile in whole blood. Collectively, these datashowed that after tissue identity, cellular heterogeneitywithin a tissue was the major predictor of variation inDNA methylation.When the samples were plotted by age (Figure 1), atrend was observed within the brain for PC4 and bothbrain and blood for PC5 (Figure 1A). We thus correlatedthe DNA methylation profiles for each PC with the ageof the individuals. PC4 showed a statistically significantcorrelation with age for the three brain regions, but notblood. In contrast, there was a significant correlation forPC5 that included both brain and blood (Figure 1B,scatter plots in Additional file 2: Figure S2). These datasuggest two distinct DNA methylation signatures ofaging, one specific for the brain and the otherencompassing both brain and blood. 
Interestingly, we alsoobserved an increase in the mean level of DNA methyla-tion with age in the brain, but not in blood samples (r =0.71, P = 6.3E−9, Additional file 1: Figure S1B).We then determined the contribution of individualCpG probes to a particular pattern of methylation vari-ation across tissues and/or individuals by calculating theprojection of each CpG site to each PC. A projection indi-cates either a positive or a negative contribution of thegiven PC pattern to the methylation profile observed ateach CpG probe. Visualization of the distribution of CpGprojections on PC1, which differentiates brain tissues fromblood, clearly highlighted probes that had either a positivecontribution (more methylated in the blood compared tothe brain) or a negative contribution (more methylated inthe brain than in the blood) (Additional file 3: Figure S3).Selecting probes by their projection score compared tostandard deviation of all projections revealed that largerscores were reflective of greater similarities of the patternof the individual CpG site to the overall pattern of PC1(Additional file 3: Figure S3). For example, the two probeson the bottom left and bottom right, belonging to thegroup of ≷ ± 3σ, respectively, showed sharp transitions inDNA methylation between blood and brain, akin to PC1(Additional file 2: Figures S2, Additional file 3: Figure S3and 1A). This analysis illustrated that PC projectionscould be used as a means to filter the data to highlightspecific traits in the classification approach used below.Nevertheless, it is important to note that, although greaterprojections imply greater associations with a PC, projec-tions are not a measurement of statistical confidence.Instead, they simply quantify in which CpG sites the corre-sponding pattern of variation is the strongest. 
Mappingprojection values to a confidence test would suppose a bias,as the PCs are obtained from the data, and, by definition,they are the most dominant patterns of variation [43].Variability of DNA methylation between brain and bloodwas moderately concordantHaving identified biological variables that were correlatedwith nearly 85% of the variability in our DNA methylationdata, we next addressed the degree to which methylationvaried across individuals, and whether this variability wasconsistent across tissues. Examining the distribution ofM-value variance across all the samples of each tissue(Additional file 4: Figure S4A), we found that blood issignificantly more variable than any of the brain tissues(Kolmogorov-Smirnov (KS)-test of blood vs brain re-gions >0.18; KS-test between brain regions <0.12; per-centage of total variance in each tissue: BA10 26.5%,BA20 19.3%, BA7 17.7%, whole blood (WB) 36.4%).However, given the results in the previous section andpreviously published findings, it is likely that part of theinter-tissue differences in variance were due to differencesFarré et al. Epigenetics & Chromatin (2015) 8:19 Page 5 of 17in cell composition between tissues [16,20,44]. To accountfor this, we subtracted the contribution of PC2 (neuronalproportion) and PC3 (granulocyte proportion) from allsamples and repeated the analysis (Additional file 4: FigureS4B). After removing this variance due to inter-individualdifferences in cellular composition, we found that the vari-ances of different brain regions became notably homoge-nized (KS-test between brain regions <0.04) while thedifference between brain and blood increased (KS-test ofblood vs brain regions >0.27). 
We thus concluded thatblood was significantly more variable han brain regions(percentage of total variance in each tissue: BA10 20.8%,BA20 21.5%, BA7 20.3%, WB 37.3%) and that inter-individual differences in cellular composition were an im-portant contributor to variance in uncorrected data.Our approach of using matched tissue samples allowedus to determine the fraction of PCs for which the ob-served pattern of variation was common in blood andbrain. This analysis compared the similarity in the DNAmethylation patterns between blood and cortical braincells, with the caveat that brain DNA methylation in-cludes both DNA methylation and DNA hydroxymethy-lation. Since PC1 to PC3 were identified as tissue- andcell-type-associated PCs, they would not be informativein determining concordance across tissues. For this ana-lysis, we instead used the remaining PCs after PC3. Wefirst averaged the three different brain tissue PCs to-gether since their patterns of methylation are overallvery similar. We next selected PCs for which the amountof variation in each tissue was of comparable magnitude:σ2blood−σ2brain  < 12 σ2blood þ σ2brain : Finally, we selectedthe PCs showing a correlation P value of <0.01 betweenDNA methylation patterns of the brain and blood. ThePCs that followed these criteria (19 PCs, 37.2% of vari-ation after PC3) represented patterns of inter-individualvariation that were common between blood and braintissues (Additional file 5: Figure S5). The first eight ofthe PCs identified by these criteria were positively corre-lated between blood and brain and captured 74.5% ofthe variation, whereas the remaining PCs were negativelycorrelated between blood and brain and captured theremaining 25.5%. Thus, overall 37.2% of the non-tissue-specific variation was highly correlated between bloodand brain, 74.5% of which is positively correlated and25.5% is negatively correlated (Additional file 6: FigureS6). 
It is tempting to speculate that a portion of this shared variation between blood and brain might represent shared tissue differences, genetic impacts, or environmental exposures. The negatively correlated variation implies that high DNA methylation values in one tissue are associated with low methylation in the other. These sites may lie in genes with important functions in either brain or blood, which are highly expressed and weakly methylated in one tissue and not expressed and highly methylated in the other. These hypotheses, based on our analysis of general patterns of DNA methylation, will of course need to be tested, particularly to determine the possible contribution of hydroxymethylation to these differences and similarities in variation.
Age-related PCs were more easily detected in the brain than in blood
We next performed PCA on each of our brain and blood tissues separately to see if the correlations with variables of interest would persist in a smaller dataset. All the brain tissues showed a first PC that correlated with the neuronal composition of the samples (P values <2E−11, Additional file 7: Figure S7A,B,C), followed by a second PC that correlated with age (P values <8E−6, Additional file 7: Figure S7E,F,G). Since both PC4 and PC5 in the full dataset showed a correlation with age in the brain tissue, the probes driving these PCs overlapped strongly with the probes identified by the age-related PCs in the brain-only analyses. Interestingly, a PCA on the 16 blood samples revealed that the first two PCs both correlated with blood cell composition (first PC P value 2E−4, Additional file 7: Figure S7D; second PC P value 1E−2, Additional file 7: Figure S7H), but we were unable to identify a PC strongly correlated with age. This finding implies that the epigenetic pattern associated with age is not as strong in blood as it is in brain. Thus, PC5 in the
Thus, PC5 in thefull dataset, which shows an age-correlated pattern ofmethylation in blood as well as brain would not havebeen observed in blood alone, and was apparent becausethe presence of the pattern on brain reinforced poweracross samples.We next sought to evaluate the sample size for a cohortrequired to detect an age-correlated PC in blood. We usedan independent, published blood dataset (GSE40279) con-sisting of 656 individuals with an age range of 19 to 101years [27]. First, we calculated the cell composition of thesamples and subtracted the associated variance [41]. Then,we randomly subsampled the data into datasets with asmaller number of individuals. We performed PCA oneach of the subsampled datasets and reported the percent-age of times that we found an age-correlating PC for agiven sample size. We found that in datasets consisting of16 blood samples such as ours, the likelihood of finding aPC that correlates with age (P value <0.01) is approxi-mately 41%. The chance of detecting an age-related PC inblood improves to >60% when there are more than 22samples (Additional file 8: Figure S8). Although bloodDNA methylation showed more variation than brain DNAmethylation, a greater number of samples are needed toidentify an age-related pattern of methylation, whereas inbrain, a small sample size seems effective at identifyingsuch a correlation.Farré et al. Epigenetics & Chromatin (2015) 8:19 Page 6 of 17Age-associated PCs were replicated in independentdatasetsWe took advantage of published datasets to evaluate thereproducibility of our findings of tissue-concordant andtissue-discordant DNA methylation signatures of aging.We used the projections of each probe for the two PCs(PC4 and PC5) associated with aging and reconstructedthese PCs on the larger published cohort. Beginningwith data from brain (n = 40; age range: 2 to 56 years,GSE53162, [45]), we found a high correlation with agefor reconstructed PC4 and PC5, similar to our brain data(Figure 2A). 
We next performed the same analysis on blood data (n = 656; age range 19 to 101 years, GSE40279, [27]), where we predicted that only the reconstructed PC5 would be correlated with age, since the original PC5 was correlated with age in both brain and blood tissue, while the original PC4 was correlated only in brain. Indeed, while reconstructed PC4 had a poor correlation coefficient with age (correlation: 0.11, P value 3.6E−3), reconstructed PC5 had a very strong correlation that was highly significant (correlation: 0.55, P value 1.81E−52, Figure 2B). To more broadly investigate the tissue specificity of our aging signature, we performed the same reconstruction on two other data sets, from buccal epithelial cells (BEC) (n = 96; age range 1 to 28 years, GSE50759, [46]) and liver (n = 85; age range 23 to 83 years, GSE48325, [47]). Despite the younger and more limited age range of the BEC cohort, both data sets showed high correlation between age and both reconstructed PC4 and PC5 (see Figure 2 for details, correlations, and P values). Collectively, these data suggested that at least two independent DNA methylation signatures of aging exist, one of which (PC5) is shared across all tissues examined, and the other of which (PC4) is found in all except blood. These data also provided strong evidence for replication of age-related DNA methylation signatures between diverse datasets from different cohorts and laboratories.
Hierarchical clustering of data using principal components revealed further relationships between samples
We used hierarchical clustering to further explore how our PCs describe the relationships between samples. To reveal the natural internal relationship between samples, we first performed nearest neighbor hierarchical clustering of the samples using the most highly variable probes in the full dataset. These probes showed variance >4σ across all samples (7,420 probes) (Figure 3A).
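The nearest neighbor clustering step can be illustrated with a small single-linkage implementation. We read nearest neighbor as single linkage on the Euclidean distance between sample profiles; the distance metric, the function name, and the toy values are assumptions, and on real data a library routine such as SciPy's hierarchical clustering would normally be used instead.

```python
import math


def single_linkage(samples):
    # samples: dict mapping sample ID -> list of methylation values, with the
    # same probes in the same order for every sample.
    # Nearest neighbor (single-linkage) agglomeration on Euclidean distance;
    # returns the merge history as (cluster_a, cluster_b, distance) tuples.
    clusters = {name: [name] for name in samples}
    merges = []
    while len(clusters) > 1:
        keys = list(clusters)
        best = None
        for i, ka in enumerate(keys):
            for kb in keys[i + 1:]:
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(samples[a], samples[b])
                        for a in clusters[ka] for b in clusters[kb])
                if best is None or d < best[0]:
                    best = (d, ka, kb)
        d, ka, kb = best
        clusters[ka + '+' + kb] = clusters.pop(ka) + clusters.pop(kb)
        merges.append((ka, kb, d))
    return merges


# Toy two-probe profiles for one individual (values made up for illustration).
toy = {
    'S1_BA10': [0.10, 0.20],
    'S1_BA7': [0.12, 0.21],
    'S1_BA20': [0.20, 0.25],
    'S1_WB': [0.90, 0.80],
}
for a, b, d in single_linkage(toy):
    print(a, '<->', b, round(d, 3))
```

In this toy input BA10 and BA7 merge first, BA20 joins next, and blood joins last, mirroring the node structure described for the brain regions.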
The three brain tissues from a given individual clustered together, with BA10 and BA7 being closer to each other and forming a node distinct from BA20. Blood from all individuals clustered separately, and the clustering distance between individuals was generally larger than that in brain (Figure 3A). These data suggested that individual DNA methylation patterns in the three cortical brain regions were more closely related between different individuals than between brain and blood in the same individual, and that inter-individual differences in blood DNA methylation were more pronounced than those in brain DNA methylation. This conclusion is consistent with our analysis of variance presented earlier, where blood variance was significantly higher than brain variance.
We performed clustering using the subset of probes that PCA identified as being correlated with variables of interest to further uncover similarities in DNA methylation between samples. For example, using only the probes that had a projection score beyond ±4σ on PC1 (2,258 probes), hierarchical clustering separated blood from brain with very few distinctions within the two groups. This approach revealed tissue similarity as the only significant relationship in PC1, regardless of the individual from whom the sample originated (Figure 3B). This analysis thus confirmed PC1 as a blood vs brain tissue classifier.
A different picture emerged when we performed the hierarchical clustering using only the probes that had a projection score beyond ±4σ on PC4 (1,993 probes), the PC that showed age-dependent methylation in brain, buccal cells, and liver, but not blood (Figure 3C). As with the previous examples, this clustering approach confirmed our PCA association, as it sorted individuals according to age.
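Selecting probe subsets by their projection on a PC, as done above for PC1 and PC4, amounts to a threshold at ±kσ, where σ is the standard deviation of all projections on that PC. A minimal sketch, assuming projections are held in a dictionary keyed by probe ID; the function name and the probe IDs in the example are hypothetical.

```python
import statistics


def select_by_projection(projections, k=4.0):
    # projections: dict mapping probe ID -> projection score on a single PC.
    # sigma is the standard deviation of all projections on that PC; probes
    # below -k*sigma and above +k*sigma are returned as two sorted lists.
    sigma = statistics.pstdev(list(projections.values()))
    below = sorted(p for p, v in projections.items() if v < -k * sigma)
    above = sorted(p for p, v in projections.items() if v > k * sigma)
    return below, above


# Hypothetical example: 100 uninformative probes plus two strong ones.
demo = {f'cg{i:04d}': 0.0 for i in range(100)}
demo.update({'cgPOS': 10.0, 'cgNEG': -10.0})
print(select_by_projection(demo))  # -> (['cgNEG'], ['cgPOS'])
```

The selected probe IDs would then index the rows of the methylation matrix passed to the clustering step. As noted earlier, the cutoff ranks probes by how strongly they follow a PC pattern; it is not a significance test.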
In this case, however, brain and blood from thesame individual clustered together, with nuanced distinc-tions within each individual revealing a first order ofsimilarity that encompassed BA10 and BA7, followed bya second node of BA20, and finally a node that includedblood. In this case, the probes that contribute to theage-related PC4 showed a stronger effect of individualand a weaker effect of tissue. We previously saw thatPC4 was related to age in the brain tissue only, there-fore, one may naively expect blood to cluster separatelyhere. However, probes that have strong projections onPC4 can still have significant projections on otherindividual-specific PCs, as we will show in the followingsections. This results in an increased similarity betweenindividuals for this probe subset.Tissue and age-dependent methylation profiles wereenriched for specific CpG densitiesExisting evidence suggests that age-related DNAmethylation changes occur at specific genomic locations[14,27,35]. We tested whether the probes that contributesignificantly to our top five PCs, all of which were asso-ciated with a biological variable, were enriched or de-pleted for particular genomic regions and CpG islandclassifications.CpG loci associated with PCs 1, 2, and 3, which differ-entiate tissues by its broad origin (blood vs brain forFigure 2 Age-associated PCs validated in independent datasets. The projections of each CpG site onto PC4 and PC5 were used to reconstructthese PCs in (A) 40 brain samples, (B) 656 blood samples, (C) 96 buccal swabs, and (D) 85 liver samples. r values and P values for each correlationare shown. PC, principal component.Farré et al. Epigenetics & Chromatin (2015) 8:19 Page 7 of 17Figure 3 (See legend on next page.)Farré et al. 
PC1) or by cellular heterogeneity within a tissue (fraction of neurons for PC2; fraction of granulocytes for PC3), were primarily enriched in low-density CpG (LC) regions and depleted in high-density CpG (HC) contexts (Figure 4A,B,C). This finding was consistent with previous reports that HC islands are less likely to contain tissue-specific differentially methylated regions [40]. This was true for both positive and negative directions, meaning that the locations of differential DNA methylation did not depend on which cell or tissue type had the higher DNA methylation. A different pattern emerged for CpGs associated with age. CpGs predictive of age in all tissues except blood (PC4) were enriched in HC contexts and depleted in LC contexts, irrespective of whether DNA methylation was gained (negative projections) or lost (positive projections) with age (Figure 4D). While HC enrichment and LC depletion were also found for CpGs associated with age in all tissues (PC5), this was limited to those CpGs where DNA methylation increased with age (Figure 4E). In contrast, CpGs where DNA methylation was lost with age (negative projections) tended to be enriched in intermediate and low CpG contexts and depleted in HC contexts (Figure 4E). This discordance in the genomic locations of gain or loss of DNA methylation between the two signatures of age was interesting.

Figure 3 Hierarchical clustering of samples by PCs revealed distinct cluster patterns. Sample IDs indicate sample tissue; color tags (bottom) are mapped to the age of the individual (red: young, green: old). (A) Clustering of samples from a selection of all CpG sites with > 4σ variance across samples. The first cluster separated blood from brain. Inside the brain region, samples clustered by individual instead of tissue. (B) Clustering of samples from a selection of CpG sites with ≷ ±4σ projections in PC1. (C) Clustering of samples from a selection of CpG sites with ≷ ±4σ projections in PC4, an age-related PC. PC, principal component; BA10, Brodmann area 10; BA20, Brodmann area 20; BA7, Brodmann area 7; WB, whole blood.

Figure 4 PCs showed distinct CpG density category enrichments. Enrichment and depletion of CpG density categories in subsets of CpG sites with < −1σ projections (left bar) and > 1σ projections (right bar) compared to the background total 450K CpG sites (central bar). (A-C) Tissue-related PC1 to PC3 showed an enrichment of LC probes and depletion of HC probes irrespective of the sign of the projection. (D) PC4, the age pattern present in all tissues checked except blood, showed an enrichment of HC probes and depletion of LC probes in both positive and negative projections. (E) PC5, the age pattern present in all the tissues checked, was enriched in HC probes for positive projections (CpGs show increased methylation with age) and was enriched in LC probes for negative projections (CpGs show decreased methylation with age). HC, high-density CpG island; IC, intermediate density CpG island; LC, low-density CpG island; PC, principal component.

The pattern apparent in PC5, showing gain in DNA methylation at HC regions and loss at LC regions, was consistent with published reports that HC islands gain DNA methylation with age [14,25,26]. The finding that PC4 had a different pattern, in which high-density regions showed both gains and losses in DNA methylation with age, first indicates that the signatures delineated by PCs 4 and 5 were indeed occurring at different sites. Second, since PC4 was not associated with age in blood, it suggests that this pattern of gain of DNA methylation at HC regions may be tissue specific. Together, these results reinforced the
Together, these results reinforced theidea that these were two independent signatures of age.We next used a similar approach to address whethersubsets of probes are enriched for introns or exons. Weobserved that probes associated with tissue differenti-ation (PCs 1 to 3) are enriched for introns and depletedin exons (Additional file 9: Figure S9A,B,C). PC4 showeda slight tendency for the opposite pattern, enrichmentfor exons and depletion for introns, while PC5 againshowed enrichment that was dependent on the sign ofthe projection score. Positive PC5 scores (increasedDNA methylation with age) showed enrichment in exonCpGs, while negative scores (decreased DNA methylationwith age) showed enrichment in intron CpGs (Additionalfile 9: Figure S9D,E).CH methylation was associated with specific tissue andcell typesDNA methylation has also been observed at non-CpGsites, sometimes referred to as CH sites [48,49]. Suchsites are especially prevalent in the brain. The 450Karray contains 3,091 non-CpG probes, thus allowing usthe opportunity to determine whether this alternativeform of DNA methylation was associated with our vari-ables. In PC1, we observed an enrichment of non-CpGsites for probes with negative projections (probes moremethylated in brain than blood, Additional file 10: FigureS10A,F). For PC2, we observed that probes with positiveprojections were enriched for non-CpG (methylationincreases with neuron fraction, Additional file 10: FigureS10B,F). This strongly suggested that non-CpG methyla-tion was not only higher in brain than in blood, but thatthe brain enrichment was mostly due to neurons asopposed to glia. These two findings were both supportedby previous studies, which also show more non-CpGmethylation in neurons than other tissues [28]. PC3showed a general depletion of non-CpG for both pro-jection signs, suggesting that non-CpG sites did notplay an important role in white blood cell composition(Additional file 10: Figure S10C,F). 
The age-related PCs, PC4 and PC5, showed a slight enrichment of non-CpG sites among the probes where methylation increases with age (positive projections in PC4 and negative in PC5; Additional file 10: Figure S10D,F).

Distinct promoter DNA methylation signatures associated with tissue differentiation and aging

Given the association between promoter DNA methylation and gene expression, we next explored whether differences in promoter DNA methylation existed between CpGs associated with either tissue differences or age. To unambiguously associate only one promoter with a gene, we focused this analysis on ‘lone genes’ (genes that have no other promoters within 5 kb of the transcriptional start site (TSS), n = 16,344). This approach increased rigor by eliminating sites that might map to more than one gene. We found 60,846 probes on the 450K array that were situated within 2.5 kb of the TSS of a ‘lone gene.’ By design, the content of the 450K array is biased toward CpGs located within 1 kb upstream and downstream of the TSS (Figure 5A). We used this as a background distribution for evaluating the spatial enrichment of specific probe sets (Additional file 11: Figure S11).

Visualizing, as a heatmap, the statistically significant locations relative to the TSS of probes associated with our first five PCs revealed several interesting differences and distinct patterns (Figure 5B). First, tissue- and cell-type-specific CpGs (PC1 to PC3) were generally enriched in regions more than 500 bp away from the TSS. This pattern was especially significant in the brain-related profiles in PC1 and PC2. In PC3, the blood composition pattern, the Z-scores were smaller in magnitude but still showed this overall trend. We further observed some direction-dependent enrichment. In the neuron-composition PC (PC2), CpGs where methylation decreases as neuron fraction increases (negative projections) were enriched away from the TSS.
For the CpGs where methylation increases with neuron fraction, the Z-scores were not as significant as for the negative projections, but the under-enriched region extended further into the gene itself, and we observed enrichment in proximity to the TSS. These patterns were the same as those observed in PC1 (brain–blood differential methylation). Negative PC1 projections, where brain tissue was more methylated than blood, showed the same type of enrichment as positive PC2 projections (where neurons are more methylated than glia). This finding suggested that neurons constitute an important source of the differential DNA methylation pattern observed in brain vs blood.

We also observed that CpGs predictive of age in all tissues examined except blood (PC4) were generally enriched around the TSS and depleted away from it. These CpGs showed subtle spatial differences depending on whether their projection was positive or negative. CpGs for which methylation decreases with age in the brain (positive projections) are located close to the TSS (<500 bp away), while probes whose methylation increases with age (negative projections) are located further from the TSS (from −1,000 bp to 1,500 bp relative to the TSS) and are under-enriched at the TSS itself (Figure 5B). Thus, probes located close to the TSS showed a decrease of methylation with age, whereas probes whose methylation increases with age tended to be located just upstream of the TSS or within the gene itself. CpGs predictive of age in all tissues tested (PC5) had a distinct pattern that depended on the direction of their component score.
CpGs for which DNA methylation increased with age (positive scores) were enriched around the TSS and in proximal regions of the gene body, whereas those for which DNA methylation decreased with age (negative scores) were depleted at the TSS and enriched in upstream and downstream regions (Figure 5B).

Figure 5 Distinct spatial enrichments of PC-associated CpGs. (A) Spatial distribution of CpG probes around the TSS of genes that do not have any neighboring gene within 5 kbp. (B) Enrichment Z-scores of probes found at each distance for each PC projection threshold subset of probes, with respect to the background distribution. PC1 to PC3 probes showed a general trend of depletion around the TSS, while they were enriched away from it. PC4 probes were enriched around the TSS and depleted away from it. PC5 probes had a strong sign-dependent trend: positive projections were enriched close to the TSS and negative projections were enriched away from the TSS. PC, principal component; TSS, transcriptional start site.

We used our list of 60,846 probes in ‘lone genes’ to determine the overlap in ‘lone genes’ between our various PCs. Using only probes with ≷ ±2σ projection on each PC (an average of 22,221 probes per PC), we determined the overlap of the ‘lone genes’ associated with PCs 1, 2, and 3, and then with PCs 1, 4, and 5 (Figure 6A,B). We observed that the overlap between tissue-differentiation genes and aging genes (PC1 ∩ PC4 and PC1 ∩ PC5) is significantly smaller than the overlap between the two sets of aging genes (PC4 ∩ PC5) (Figure 6B).

PCs were enriched for biologically relevant gene ontology terms

Lastly, we examined representations of functional categories that were associated with our first five PCs. Before assessing functional enrichments, we examined the overlap between positive and negative projections within the same PC to determine whether they should be examined together or separately.
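The gene-overlap comparisons in this section, including the chance expectation for positive versus negative projections, reduce to set operations. A minimal sketch, assuming each probe is already mapped to at most one lone gene, σ is known for the PC, and (as stated for Additional file 12) the expected overlap is the product of the two observed membership probabilities times the number of lone genes carrying probes. Gene and probe names are hypothetical.

```python
def genes_beyond(probe_to_gene, projections, sigma, n_sigma=2.0, sign=1):
    # Lone genes with at least one probe projecting beyond sign * n_sigma * sigma.
    cut = n_sigma * sigma
    return {
        probe_to_gene[p]
        for p, score in projections.items()
        if p in probe_to_gene and sign * score > cut
    }


def expected_overlap(set_a, set_b, n_genes):
    # Independence: P(in A and in B) = P(in A) * P(in B), scaled by the number of genes.
    return (len(set_a) / n_genes) * (len(set_b) / n_genes) * n_genes


# Hypothetical probes mapped to hypothetical lone genes, with projections on one PC.
probe_to_gene = {'p1': 'GENE_A', 'p2': 'GENE_A', 'p3': 'GENE_B',
                 'p4': 'GENE_C', 'p5': 'GENE_D', 'p6': 'GENE_D'}
projections = {'p1': 2.5, 'p2': -2.6, 'p3': 3.0, 'p4': -2.2, 'p5': 0.1, 'p6': -0.4}

positive = genes_beyond(probe_to_gene, projections, sigma=1.0, sign=1)
negative = genes_beyond(probe_to_gene, projections, sigma=1.0, sign=-1)
observed = positive & negative
n_genes = len(set(probe_to_gene.values()))  # lone genes that carry probes
expected = expected_overlap(positive, negative, n_genes)
print(sorted(positive), sorted(negative), sorted(observed), expected)
```

Comparing len(observed) with expected is the kind of comparison summarized in Figure 6C; this sketch does not include a significance test.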
For each PC, we visualized the overlap of the set of genes that contained probes with scores > 2σ with the set of genes with scores < −2σ (Figure 6C). Taking into account the number of lone genes that contained probes, we calculated the overlap of genes with positive and negative projections expected by chance. We found that for the tissue-related PCs (PC1 to PC3), the overlap is smaller than expected. This suggests that methylation of tissue-specific probes within the same gene tended to vary in the same direction with tissue or cell type. In contrast, for the age-related PCs, we found an overlap equal to (PC4) or larger than (PC5) that expected by chance. Thus, methylation of age-related probes within the same gene can vary in opposite directions with age (Additional file 12: Figure S12).

Figure 6 Overlap of genes associated with specific PCs. (A) Overlap of genes associated with probes with ≷ ±2σ projection on PCs 1 to 3. (B) Overlap of genes associated with probes with ≷ ±2σ projection on PC1 (blood–brain tissue related), PC4 (age in brain), and PC5 (age in brain and blood). (C) Overlap of the set of genes that contained probes with > 2σ projection with the set of genes with < −2σ projection, for each PC of interest. PC, principal component.

Gene ontology (GO) analysis using DAVID is presented in Additional file 13: Table S1 [50]. PC1 positive projections referred to genes that were more methylated in blood than in brain and were enriched for clusters including neuron projection and axon morphogenesis, and macromolecule catabolism. Negative projections on PC1, which represent genes that were more methylated in brain than in blood, were enriched for clusters that included defense response and response to wounding, and rho signaling. PC2 positive projections had no
significantly enriched clusters, while negative projections, representing sites that are more methylated in glia than in neurons, had three significantly enriched categories: apoptosis, synaptic transmission, and protein conjugation. For PC3, positive projections denote genes more methylated in granulocytes than in non-granulocytes and were associated with cell motility, inflammation and defense, and secretion and transport. Negative projections, sites that were more methylated in non-granulocytes than in granulocytes, were associated with inflammation and defense, and metabolism. In all cases, the GO terms associated with specific PCs reflect the underlying tissue composition.

For the age-related PCs, PC4 negative projections, reflecting sites whose DNA methylation increased with age in all tissues examined except blood, were enriched for a number of categories, including transcription, organ development, neuron cell fate, adhesion, and differentiation. PC4 positive projections had no significant associations. For PC5 positive projections, representing sites for which DNA methylation increased with age in all tissues, associated clusters included developmental processes and transcriptional regulation. PC5 negative projections had no enrichment. Thus, neither age-related PC had functional enrichment categories for sites that lost DNA methylation with age, while both contained categories related to development and differentiation among sites that gained DNA methylation with age. This was consistent with previously published reports [25,29].

Conclusions

Using matched tissues from different individuals, we have shown that PCA is capable of simultaneously identifying independent DNA methylation signatures of a number of variables of interest. This approach is particularly helpful for variables that are related to one another, such as age and white blood cell composition [16].
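Screening every PC against every measured variable amounts to a table of correlations between each PC's per-sample scores and each variable. A minimal sketch with SciPy's pearsonr on made-up values, not the authors' code: a categorical variable such as tissue is encoded as 0/1 (which makes Pearson's r a point-biserial correlation), and restricting a test to a subset of samples (for example brain only, as for PC4) would simply mean indexing both arrays first. The function name and data are hypothetical.

```python
import numpy as np
from scipy.stats import pearsonr


def pc_variable_table(pc_scores, variables):
    # pc_scores: samples x PCs array (the per-sample entries of each eigenvector).
    # variables: dict of name -> per-sample values. Returns {(name, PC number): (r, p)}.
    table = {}
    for name, values in variables.items():
        for k in range(pc_scores.shape[1]):
            r, p = pearsonr(pc_scores[:, k], values)
            table[(name, k + 1)] = (float(r), float(p))
    return table


# Made-up example: 8 samples with balanced ages across two tissues, and two PCs.
age = np.array([20.0, 35.0, 50.0, 65.0, 20.0, 35.0, 50.0, 65.0])
tissue = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])  # 0 = brain, 1 = blood
scores = np.column_stack([
    tissue - tissue.mean(),          # PC1 tracks tissue only
    (age - age.mean()) / age.std(),  # PC2 tracks age only
])
table = pc_variable_table(scores, {'tissue': tissue, 'age': age})
for key, (r, p) in sorted(table.items()):
    print(key, round(r, 3), p)
```

In this toy table PC1 correlates with tissue but not age and PC2 with age but not tissue, which is the separation the Conclusions describe for age and cell composition.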
By testing the correlation of each variable with each PC, it is possible to find PCs that are correlated with one variable but not with the other. We were able to find PCs associated with age that were not correlated with white blood cell composition, which is known to be a major difficulty in assessing age-related DNA methylation from blood [16]. These results have implications for other studies, as they indicate the relative contributions of factors known to cause changes in DNA methylation patterns. Our results indicated that tissue, cell type, and age, in that order, all had very important effects on DNA methylation. Future development of this method could target clusters of CpGs rather than individual CpGs, identifying broader regions of variable-associated differential methylation.

Principal component analysis has been used extensively in gene expression studies. We show here that PCA can provide functionally relevant insight into DNA methylation variation as well. The use of a PCA-based approach allowed us to overcome difficulties often associated with epigenetic studies. The most common method for analyzing DNA methylation data is linear modeling. PCA complements linear modeling but differs from it in two primary ways: first, PCA does not assume a particular linear (or non-linear) relationship between the variable of interest and the DNA methylation profile; and second, PCA uses a Z-score to assign a confidence to how strongly a particular PC accounts for a given CpG’s methylation pattern. In many cases, these differences can make PCA advantageous. For example, one potential caveat of our study was its small sample size, which might not be conducive to deriving generally applicable relationships, particularly with traditional methods such as linear modeling.
However, the reproducibility of PCs derived from only 17 subjects in a much larger cohort suggests that our PCA approach is an excellent tool for detecting meaningful associations between DNA methylation and biological variables, even when only small sample sizes are available. This work also highlights the issue of statistical power calculations in epigenetic research, which has practical relevance for the design of epigenetic studies. It should also be noted that both the white blood cell preparation and the brain samples used here constitute heterogeneous mixtures of cell types. Since we were able to use established methods to predict the cellular composition of both tissues, we could identify specific PCs that were associated with cellular composition and confirm that our aging signatures in brain and in brain–blood did not correlate with differences in cellular composition. Finally, it is important to note that hydroxymethylation might contribute to the signal in PCs 1 and 3. We cannot unambiguously determine whether hydroxymethylation contributes positively or negatively to these PCs, nor, ultimately, how it relates to CpG density, promoter spatial enrichment, or pathway analysis. As hydroxymethylation is generally not found at promoter regions, it likely has a smaller effect on our determinations of promoter spatial enrichment [4,51].

Our analysis provides a map of broad patterns of DNA methylation in two important tissues and lays important groundwork on how these patterns are similar and different across tissues. Future work will be required to determine whether these broad patterns have functional implications. Interestingly, our data provided evidence for the existence of at least two independent DNA methylation signatures associated with age. The first signature was observed in all tissues examined, while the second was found in all except blood.
More broadly, this suggested that epigenetic signatures, even for the same variable, can be tissue specific or tissue independent. This is particularly relevant because blood and brain originate from different germ layers. Future research in larger cohorts with carefully ascertained cognitive variables might reveal potential links between these types of DNA methylation signatures and cognition.

Methods

Data collection

DNA was extracted from samples from the Quebec Suicide Brain Bank using the Qiagen DNeasy DNA extraction kit (Qiagen, Valencia, CA, USA). Brain tissue was obtained from the Douglas-Bell Canada Brain Bank (DBCBB; Douglas Mental Health University Institute, Montréal, Québec). All subjects were psychiatrically diagnosed by means of psychological autopsy, a validated method for reconstructing psychiatric history through extensive proxy-based interviews [52]. Individuals were Caucasian and died suddenly, with no prolonged agonal period. Exclusion criteria were lifetime trauma exposure and a current DSM-IV axis I psychiatric diagnosis, including any form of substance abuse [53]. Brain tissue dissection was carried out as previously described [39]. Briefly, tissues from the left hemisphere were carefully dissected at 4°C after having been flash-frozen in isopentane at −80°C. Brain tissue was dissected and Brodmann areas (BA) were identified using reference neuroanatomical maps. The Research Ethics Board at the Douglas Mental Health University Institute approved the project. Signed informed consent was obtained for each subject from next of kin.

DNA was treated with sodium bisulfite using the Zymo EZ-DNA kit (Zymo Research, Orange, CA, USA) according to the manufacturer’s instructions. All samples were randomized before bisulfite treatment, then randomized again before beginning the Illumina array protocol.
DNA was processed and hybridized to a total of six Illumina Infinium HumanMethylation450 BeadChips (Illumina Inc., CA, USA) according to the manufacturer’s instructions, then scanned on an Illumina HiScan (Illumina Inc., CA, USA). After scanning, data were imported into GenomeStudio, and control probes were examined to ensure data quality, after which data were exported into R. Next, probes were filtered to remove probes on the X and Y chromosomes (11,648); probes for which any sample showed a detection P value greater than 0.01 or for which fewer than three beads contributed to the signal (27,541); the 65 SNP genotyping probes; and probes that assess polymorphic CpGs or that cross-hybridize to the X or Y chromosomes. This left a total of 408,576 probes [3]. Background subtraction, color correction, and quantile normalization were performed on all samples together using the lumi R package, and peak-based correction was used to normalize Type I and Type II probes [54,55]. At this point, M values were exported from R for further analysis in Python.

Principal component analysis

PCA is a mathematical approach that reveals the internal structure of variation in a data matrix. It calculates a set of principal components of variation, along with a set of associated eigenvalues that quantify how much variation is captured by each principal component (Figure 1A and Additional file 1: Figure S1A).

We built an N×M matrix X of M-values, in which each of the N rows is a CpG from the Illumina 450K Human DNA methylation array and each of the M columns is a collected sample. The mean M-value of each column, x̄, was subtracted (Additional file 1: Figure S1B). Next, the M×M covariance matrix was calculated from the data and diagonalized, yielding the matrix of eigenvectors V (principal components, PCs) and the eigenvalues σᵢ² (the variance associated with each PC). Each PC is an M-long vector.

M values are distributed bimodally over the entire collection of CpG probes.
This bimodality reflects the presence of both low-methylated and high-methylated probes. Since PCA captures the dominant contributions to the variance, this high-low variation forms the zeroth PC (PC0). It is merely a constant offset that shifts the mean methylation from one probe to the next (Additional file 1: Figure S1C), and it accounts for 96% of the variation. We subtracted out this contribution and considered only the variation in methylation across samples that remained after accounting for this constant offset.

To assess potential batch effects, we performed a Kolmogorov-Smirnov test to determine whether the bead chip or the position on the bead chip affected the distributions of PC scores. After Bonferroni correction at a P value cutoff of 0.01, no chip or position on the chip showed significantly different scores for PCs 1 to 5, indicating little contribution of batch to our variable-associated PCs (Additional file 14: Figure S14).

Projection thresholding

The projection of each CpG site onto the eigenvector matrix (P = XV) quantifies how much each PC contributes to the pattern seen at that CpG site. Projection values are approximately normally distributed with zero mean and variance equal to the associated eigenvalue. The CpG sites with the largest positive projections on a given PC are those whose patterns resemble the PC profile. In contrast, large negative projections correspond to the patterns described by the inverted PC (that is, each component of the PC multiplied by −1) (Additional file 3: Figure S3).

Selecting the subset of CpGs with the greatest projections allows us to study the CpG sites responsible for driving the pattern of interest. We consider a probe to have a ±nσ contribution from a given PC if its projection is ≷ ±nσᵢ, where σᵢ² is the eigenvalue of the associated PC (Additional file 3: Figure S3).
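The steps described so far (column centering, diagonalizing the M×M sample covariance, discarding PC0, projecting P = XV, and keeping probes beyond ±nσᵢ) can be sketched in NumPy as follows. This is an illustrative reimplementation on synthetic data, not the authors' script; the simulated bimodal offsets, the group effect, and all function names are assumptions. The last function applies V = norm(XᵀP) from the reconstruction section; centering a new dataset Y the same way as X before reconstruction is an additional assumption.

```python
import numpy as np


def methylation_pca(x):
    # x: N x M matrix of M values (rows = CpG probes, columns = samples).
    xc = x - x.mean(axis=0, keepdims=True)  # subtract the mean of each column (sample)
    cov = np.cov(xc, rowvar=False)          # M x M covariance between samples
    evals, evecs = np.linalg.eigh(cov)      # symmetric matrix, so eigh is appropriate
    order = np.argsort(evals)[::-1]         # sort PCs by decreasing variance
    return xc, evals[order], evecs[:, order]


def project_without_pc0(xc, evals, evecs):
    # Drop PC0 (the constant probe-to-probe offset) and project each CpG: P = XV.
    v = evecs[:, 1:]
    return xc @ v, evals[1:], v


def select_probes(proj, evals, pc, n_sigma):
    # pc indexes the kept PCs (0 -> PC1); sigma_i is the square root of eigenvalue i.
    sigma = np.sqrt(evals[pc])
    pos = np.flatnonzero(proj[:, pc] > n_sigma * sigma)
    neg = np.flatnonzero(proj[:, pc] < -n_sigma * sigma)
    return pos, neg


def reconstruct(y, proj):
    # norm(Y^T P): unit-length columns. Centering Y like X is an assumption here.
    yc = y - y.mean(axis=0, keepdims=True)
    r = yc.T @ proj
    return r / np.linalg.norm(r, axis=0, keepdims=True)


# Synthetic data: bimodal probe offsets (low/high methylation), noise,
# and a hypothetical group effect on the first 300 probes in samples 0-5.
rng = np.random.default_rng(42)
n_cpg, n_samples = 5000, 12
offsets = rng.choice([-3.0, 3.0], size=n_cpg)
x = offsets[:, None] + rng.normal(0.0, 0.5, size=(n_cpg, n_samples))
x[:300, :6] += 3.0

xc, evals, evecs = methylation_pca(x)
proj, kept_evals, v = project_without_pc0(xc, evals, evecs)
pos, neg = select_probes(proj, kept_evals, pc=0, n_sigma=3.0)
v_rec = reconstruct(x, proj)
print('PC0 share of variance:', round(evals[0] / evals.sum(), 3))
print('probes beyond 3 sigma on PC1:', len(pos) + len(neg))
```

In this toy run PC0 is close to a uniform vector and carries most of the variance, mirroring the offset described above. Because the centered columns of X have zero mean, each column of P has a sample variance exactly equal to its eigenvalue, so σᵢ = √λᵢ is the natural threshold scale; and since XᵀP = (N − 1)VΛ, with Λ the diagonal eigenvalue matrix, normalizing its columns returns V.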
Principal component reconstruction

Ideally, we seek to associate the pattern of methylation for each PC with an observable trait such as age or tissue. We can consider the projections as representing the strength of a ‘vote’ of a given CpG for that particular PC. If a PC is associated with a trait, we can construct a trait predictor using these projections. Given the data matrix X and the projection matrix P, we can reconstruct the matrix of PCs V using V = norm(XᵀP), where norm indicates normalization of columns to unit length.

The value of this approach is that the associated trait predictors (PCs) can be reconstructed in an unknown dataset Y by using the votes of each CpG from the known data X. By analogy to the expression above, the reconstructed matrix of PCs Ṽ is defined as Ṽ = norm(YᵀP).

Blood and brain composition

The neuron/glia composition of our brain samples was computed using the publicly available CETS package for R [42]. To obtain the cell composition of our blood samples, we used a published deconvolution method [18,41].

Hierarchical clustering

To perform hierarchical clustering, we computed sample similarities using the nearest point algorithm: d(u, v) = min over i, j of d(uᵢ, vⱼ), where d is the Euclidean distance, u and v are two different clusters, and uᵢ and vⱼ are vectors from the respective clusters. The dendrogram was calculated and plotted in Python using the scipy.cluster.hierarchy module and the Matplotlib library.

Enrichment of spatial location around the TSS

From the hg19 human genome release, we selected genes that do not have any neighboring TSS within 5 kbp of their own TSS. We found 16,344 genes meeting this criterion, which we call ‘lone genes.’ To avoid methylation interference from neighboring genes, we selected probes whose distance to the lone genes’ TSS is smaller than 2.5 kbp.
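The lone-gene filter, the probe-to-TSS assignment, and the binomial enrichment Z-score defined below can be sketched together in Python. This is a toy illustration, not the authors' pipeline. TSS coordinates are supplied as a hypothetical dictionary rather than read from hg19, distances are signed by strand (negative values upstream), and the 167 bp bin size is approximated by 30 equal bins spanning ±2.5 kbp. Because lone TSSs are at least 5 kbp apart, a probe within 2.5 kbp can be assigned to at most one lone gene.

```python
import numpy as np


def lone_genes(tss, min_gap=5000):
    # tss: gene -> (chromosome, position, strand). A gene is 'lone' when no other
    # TSS on the same chromosome lies closer than min_gap bp to its own TSS.
    by_chrom = {}
    for gene, (chrom, pos, _strand) in tss.items():
        by_chrom.setdefault(chrom, []).append((pos, gene))
    lone = set()
    for entries in by_chrom.values():
        entries.sort()
        for k, (pos, gene) in enumerate(entries):
            left_ok = k == 0 or pos - entries[k - 1][0] >= min_gap
            right_ok = k == len(entries) - 1 or entries[k + 1][0] - pos >= min_gap
            if left_ok and right_ok:
                lone.add(gene)
    return lone


def assign_probes(probes, tss, lone, window=2500):
    # probes: probe -> (chromosome, position). Returns probe -> (gene, signed distance),
    # with negative distances upstream of the TSS on the gene's strand.
    assigned = {}
    for probe, (chrom, pos) in probes.items():
        for gene in lone:
            g_chrom, g_pos, strand = tss[gene]
            if g_chrom != chrom:
                continue
            d = pos - g_pos if strand == '+' else g_pos - pos
            if abs(d) < window:
                assigned[probe] = (gene, d)
    return assigned


def spatial_zscores(subset_d, background_d, window=2500, n_bins=30):
    # Binomial null per bin: mu = (n_bg / N_bg) * N_exp, sigma^2 = (1 - mu / N_exp) * mu.
    edges = np.linspace(-window, window, n_bins + 1)
    n_bg, _ = np.histogram(background_d, bins=edges)
    n_exp, _ = np.histogram(subset_d, bins=edges)
    mu = n_bg / len(background_d) * len(subset_d)
    sigma = np.sqrt((1 - mu / len(subset_d)) * mu)
    with np.errstate(divide='ignore', invalid='ignore'):
        return (n_exp - mu) / sigma


# Hypothetical TSSs: G1 and G2 are 2 kbp apart, so neither is lone.
tss = {'G1': ('chr1', 10000, '+'), 'G2': ('chr1', 12000, '-'),
       'G3': ('chr1', 30000, '+'), 'G4': ('chr2', 10000, '-')}
probes = {'p1': ('chr1', 29000), 'p2': ('chr1', 31000), 'p3': ('chr2', 10500),
          'p4': ('chr1', 11000), 'p5': ('chr2', 20000)}
lone = lone_genes(tss)
assigned = assign_probes(probes, tss, lone)

# Toy distances: uniform background, subset concentrated at the TSS.
z = spatial_zscores(np.arange(-100, 101), np.arange(-2499, 2500))
print(sorted(lone), assigned, np.round(z, 1))
```

The nested loop in assign_probes is quadratic and only suitable for a toy example; a genome-scale version would use sorted positions or an interval index instead.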
The number of Illumina 450K Human DNA methylation probes associated with lone genes was 60,846 after removing poorly performing probes, probes with SNPs, probes that hybridize to multiple locations in the genome, and those located on the X and Y chromosomes. These probes mapped to 10,176 of the 16,344 ‘lone genes.’

For a given subset of probes, we compared its spatial distribution with that of the background distribution of probes associated with the lone genes (N_bg = 60,846). We selected the subset of probes with a certain threshold projection onto the PC of interest, of size N_exp. The distance of each probe from the TSS was binned (bin size = 167 bp). For each bin, this gave the number of probes from the enriched set, n_exp, and from the background, n_bg. To obtain an enrichment Z-score, we considered the null distribution of the enrichment set to be binomial with probability p = n_bg/N_bg, mean μ, and variance σ²:

μ = (n_bg / N_bg) · N_exp
σ² = (1 − μ / N_exp) · μ
Z = (n_exp − μ) / σ

Gene ontology analysis

All CpG sites less than 2.5 kbp away from the TSS of a lone gene were associated with the corresponding gene. We calculated enrichments for biological functions by comparing the features of the genes contained in a particular CpG subset to the background (‘lone genes’ that have probes on them). The GO terms were obtained using DAVID 6.7 [50,56].

Additional files

Additional file 1: Figure S1. Principal component summary. (A) Cumulative sum of the variance percentage of each PC. We can observe that 75% of the total variance is captured by PC1. Ninety percent of the total variation is captured by the first 13 PCs. (B) Mean methylation M-value of each sample. (C) PC0 emerged as the most dominant pattern. It had a horizontal line shape, since it is due to a methylation offset between CpGs that have high values across all samples and CpGs that have low values. (D) PC6 showed variability across individuals but not across tissues, and it was not correlated with any of the variables measured.
(E) PC27 showed tissue-related levels of methylation for BA7 samples. BA10, Brodmann area 10; BA20, Brodmann area 20; BA7, Brodmann area 7; PC, principal component; WB, whole blood.

Additional file 2: Figure S2. Scatter plot of PCs vs sample features. PC1 was correlated with tissue of origin. PC2 was correlated with neuron proportion in brain samples. PC3 was correlated with granulocyte proportion in whole blood samples. PC4 was correlated with age in brain samples but not in whole blood samples. PC5 was correlated with age in both brain and whole blood samples. BA10, Brodmann area 10; BA20, Brodmann area 20; BA7, Brodmann area 7; PC, principal component; WB, whole blood.

Additional file 3: Figure S3. Projection thresholding. (Top) Histogram of the 450K probe projections for PC1. (Bottom) Six probes with different positive and negative scores. Positive-scoring probes are more methylated in blood than in brain, whereas negative-scoring probes are more methylated in brain than in blood. The similarity between the probe profile and PC1 (Figure 1A) increases with the magnitude of the score. PC, principal component.

Additional file 4: Figure S4. Variance across samples in each tissue. Line plots show the variance distributions; bar plots show the percentage of total variance in each tissue. (A) Original data. (B) Data without cell type contribution. Blood was significantly more variable across samples than brain. BA10, Brodmann area 10; BA20, Brodmann area 20; BA7, Brodmann area 7; WB, whole blood.

Additional file 5: Figure S5. PCs for which blood patterns and brain patterns correlated with each other. This subset of PCs captured 33.1% of the non-tissue-specific variation (PCs after PC3). The first 8 patterns have positive correlations and capture 74.5% of the variance of the correlating subset; the following 11 patterns have negative correlations and represent 25.5% of the variance of the subset.
PC, principal component.

Additional file 6: Figure S6. Correlation of blood and brain methylation patterns. We show the variance captured by individual-specific PCs (PCs after PC3), classified by the correlation of DNA methylation patterns in brain and blood.

Additional file 7: Figure S7. PCA performed on each tissue separately. All brain tissues (BA10, BA20, BA7) showed a first PC that correlated with neuron proportion (A-C) and a second one that correlated with age (E-G). Whole blood (WB) samples showed two PCs that correlated with granulocyte proportion (D, H) and no PCs correlating with age of participants (not shown). BA10, Brodmann area 10; BA20, Brodmann area 20; BA7, Brodmann area 7; PC, principal component; WB, whole blood.

Additional file 8: Figure S8. Percentage of datasets with PCs showing a significant correlation (P < 0.05 and P < 0.01) with the age of participants after cell composition was subtracted. The datasets of different sizes were generated by random sampling from a larger cohort. We observed that an age PC can be found with a probability of approximately 60% only in datasets with >22 samples.

Additional file 9: Figure S9. Enrichment and depletion of intron/exon categories in subsets of CpG sites with < −σ projections (left bar) and > σ projections (right bar) compared to the background total 450K CpG sites (central bar). (A-C) Tissue-related PC1 to PC3 showed an enrichment of intron probes and depletion of exon probes irrespective of the sign of the projection. (D) PC4, the age signature found in all tissues except blood, showed an enrichment of exon probes and depletion of intron probes in both positive and negative projections. (E) PC5, the age signature found in all of the tissues checked, was enriched in exon probes for positive projections (CpGs increase methylation with age) and in intron probes for negative projections (CpGs decrease methylation with age). PC, principal component.

Additional file 10: Figure S10.
Enrichment and depletion of CpG/non-CpG categories in subsets of probes with projections < −σ (left bar) and > σ (right bar), compared with the background of all 450K probes (central bar). (A) PC1 showed an enrichment of non-CpG sites among probes with negative projections (probes more methylated in brain than in blood). (B) For PC2, probes with positive projections were enriched for non-CpG sites (methylation increases with neuron fraction). (C) PC3 showed a general depletion of non-CpG sites for both projection signs. (D, F) The age-related PCs, PC4 and PC5, showed a slight enrichment of non-CpG sites among probes whose methylation increases with age (positive projections in PC4 and negative projections in PC5). (E) Projections of non-CpG sites in each PC, normalized by the standard deviation of all 450K probe projections in that PC. PC, principal component. Additional file 11: Figure S11. Scheme of the construction of spatial enrichment heatmaps. The background number of probes in each distance bin is shown in gray; error bars show the standard deviation of a binomial distribution. The color plot is a hypothetical example of an experimental distribution of probes. The number of standard deviations from the mean background (Z-score) is mapped to a color, with red indicating enrichment and blue indicating depletion. TSS, transcriptional start site. Additional file 12: Figure S12. Overlaps of genes containing probes with positive and negative 2σ projections on the first five PCs. The overlaps expected under independence were calculated as the product of the observed probabilities of belonging to each gene set. Additional file 13: Table S1. List of GO term enrichment results from DAVID. Additional file 14: Figure S14. Distribution of PC scores sorted by chip and row position, used to evaluate possible batch effects. Kolmogorov-Smirnov tests detected no significant differences between the distributions. 
BA10, Brodmann area 10; BA20, Brodmann area 20; BA7, Brodmann area 7; PC, principal component; WB, whole blood. Abbreviations. BA7: Brodmann area 7; BA10: Brodmann area 10; BA20: Brodmann area 20; GO: Gene ontology; HC: High-density CpG island; IC: Intermediate-density CpG island; KS: Kolmogorov-Smirnov; LC: Low-density CpG island; PCA: Principal component analysis; PC: Principal component; TSS: Transcriptional start site. Competing interests. The authors declare that they have no competing interests. Authors’ contributions. PF performed the analyses, generated the figures, and drafted the manuscript. MJJ assisted with the study design, interpreted the results, and drafted the manuscript. MM assisted with the study conception, advised on the study design, assisted with the interpretation of results, and revised the manuscript. EE assisted with the study design and analysis and revised the manuscript. GT conceived of the study, participated in the study design, and revised the manuscript. MSK conceived of the study, advised on the analysis and figures, and revised the manuscript. All authors read and approved the final manuscript. Acknowledgements. We thank Lucia Lam for excellent technical assistance. MSK is the co-lead of the Biology Working Group of the Canadian Longitudinal Study of Aging and the Canada Research Chair in Social Epigenetics. This study was supported by funds from NSERC (EE), Brain Canada/Garfield Weston Foundation (MSK, MJM, GT), and NeuroDevNet NCE (MSK). MJJ was supported by a Mining for Miracles Post-doctoral fellowship from the Child and Family Research Institute. MJM and MSK are Senior Fellows of the Canadian Institute for Advanced Research. Author details. 1 Department of Physics, Simon Fraser University, 8888 University Drive, Burnaby, BC V5A 1S6, Canada. 2 Centre for Molecular Medicine and Therapeutics, Child & Family Research Institute, 950 W 28th Ave, Vancouver, BC V5Z4H4, Canada. 
3 Department of Medical Genetics, University of British Columbia, 950 W 28th Ave, Vancouver, BC V5Z4H4, Canada. 4 Ludmer Centre for Neuroinformatics and Mental Health, Douglas Mental Health University Institute, McGill University, 6875 Boulevard Lasalle, Verdun, QC H4H 1R3, Canada. 5 Singapore Institute for Clinical Sciences, 30 Medical Drive, Singapore 117609, Singapore. 6 Canadian Institute for Advanced Research, Toronto, ON, Canada. 7 Department of Psychiatry, McGill University, 6875 Boulevard Lasalle, Verdun, QC H4H 1R3, Canada. Received: 15 January 2015. Accepted: 21 April 2015. References. 1. Illingworth RS, Bird AP. CpG islands–‘a rough guide’. FEBS Lett. 2009;583:1713–20. 2. Weber M, Hellmann I, Stadler MB, Ramos L, Pääbo S, Rebhan M, et al. Distribution, silencing potential and evolutionary impact of promoter DNA methylation in the human genome. Nat Genet. 2007;39:457–66. 3. Price ME, Cotton AM, Lam LL, Farré P, Emberly E, Brown CJ, et al. Additional annotation enhances potential for biologically-relevant analysis of the Illumina Infinium HumanMethylation450 BeadChip array. Epigenetics Chromatin. 2013;6:4. 4. Booth MJ, Branco MR, Ficz G, Oxley D, Krueger F, Reik W, et al. Quantitative sequencing of 5-methylcytosine and 5-hydroxymethylcytosine at single-base resolution. Science. 2012;336:934–7. 5. Tahiliani M, Koh KP, Shen Y, Pastor WA, Bandukwala H, Brudno Y, et al. Conversion of 5-methylcytosine to 5-hydroxymethylcytosine in mammalian DNA by MLL partner TET1. Science. 2009;324:930–5. 6. Ficz G, Branco MR, Seisenberger S, Santos F, Krueger F, Hore TA, et al. Dynamic regulation of 5-hydroxymethylcytosine in mouse ES cells and during differentiation. Nature. 2011;473:398–402. 7. Song C-X, Szulwach KE, Dai Q, Fu Y, Mao S-Q, Lin L, et al. Genome-wide profiling of 5-formylcytosine reveals its roles in epigenetic priming. Cell. 2013;153:678–91. 8. Wen L, Li X, Yan L, Tan Y, Li R, Zhao Y, et al. Whole-genome analysis of 5-hydroxymethylcytosine and 5-methylcytosine at base resolution in the human brain. 
Genome Biol. 2014;15:R49. 9. Jin SG, Wu X, Li AX, Pfeifer GP. Genomic mapping of 5-hydroxymethylcytosine in the human brain. Nucleic Acids Res. 2011;39:5015–24. 10. Ziller MJ, Gu H, Müller F, Donaghey J, Tsai LTY, Kohlbacher O, et al. Charting a dynamic DNA methylation landscape of the human genome. Nature. 2013;500:477–81. 11. Christensen BC, Houseman EA, Marsit CJ, Zheng S, Wrensch MR, Wiemels JL, et al. Aging and environmental exposures alter tissue-specific DNA methylation dependent upon CpG island context. PLoS Genet. 2009;5:e1000602. 12. Slieker RC, Bos SD, Goeman JJ, Bovée JV, Talens RP, van der Breggen R, et al. Identification and systematic annotation of tissue-specific differentially methylated regions using the Illumina 450k array. Epigenetics Chromatin. 2013;6:26. 13. Yuen RK, Neumann SM, Fok AK, Penaherrera MS, McFadden DE, Robinson WP, et al. Extensive epigenetic reprogramming in human somatic tissues between fetus and adult. Epigenetics Chromatin. 2011;4:7. 14. Horvath S, Zhang Y, Langfelder P, Kahn RS, Boks MP, van Eijk K, et al. Aging effects on DNA methylation modules in human brain and blood tissue. Genome Biol. 2012;13:R97. 15. Lowe R, Gemma C, Beyan H, Hawa MI, Bazeos A, Leslie RD, et al. Buccals are likely to be a more informative surrogate tissue than blood for epigenome-wide association studies. Epigenetics. 2013;8:445–54. 16. Jaffe AE, Irizarry RA. Accounting for cellular heterogeneity is critical in epigenome-wide association studies. Genome Biol. 2014;15:R31. 17. Zou J, Lippert C, Heckerman D, Aryee M, Listgarten J. Epigenome-wide association studies without the need for cell-type composition. Nat Methods. 2014;11:309–11. 18. Houseman EA, Accomando WP, Koestler DC, Christensen BC, Marsit CJ, Nelson HH, et al. DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinformatics. 2012;13:86. 19. Houseman EA, Molitor J, Marsit CJ. 
Reference-free cell mixture adjustments in analysis of DNA methylation data. Bioinformatics. 2014;30:1431–9. 20. Lam LL, Emberly E, Fraser HB, Neumann SM, Chen E, Miller GE, et al. Factors underlying variable DNA methylation in a human community cohort. Proc Natl Acad Sci U S A. 2012;109 Suppl 2:17253–60. 21. Fraga MF, Esteller M. Epigenetics and aging: the targets and the marks. Trends Genet. 2007;23:413–8. 22. Teschendorff AE, West J, Beck S. Age-associated epigenetic drift: implications, and a case of epigenetic thrift? Hum Mol Genet. 2013;22:R7–15. 23. Poulsen P, Esteller M, Vaag A, Fraga MF. The epigenetic basis of twin discordance in age-related diseases. Pediatr Res. 2007;61:38R–42. 24. Horvath S. DNA methylation age of human tissues and cell types. Genome Biol. 2013;14:R115. 25. Florath I, Butterbach K, Müller H, Bewerunge-Hudler M, Brenner H. Cross-sectional and longitudinal changes in DNA methylation with age: an epigenome-wide analysis revealing over 60 novel age-associated CpG sites. Hum Mol Genet. 2014;23:1186–201. 26. Weidner CI, Lin Q, Koch CM, Eisele L, Beier F, Ziegler P, et al. Aging of blood can be tracked by DNA methylation changes at just three CpG sites. Genome Biol. 2014;15:R24. 27. Hannum G, Guinney J, Zhao L, Zhang L, Hughes G, Sadda S, et al. Genome-wide methylation profiles reveal quantitative views of human aging rates. Mol Cell. 2013;49:359–67. 28. Lister R, Mukamel EA, Nery JR, Urich M, Puddifoot CA, Johnson ND, et al. Global epigenomic reconfiguration during mammalian brain development. Science. 2013;341:1237905. 29. Johansson A, Enroth S, Gyllensten U. Continuous aging of the human DNA methylome throughout the human lifespan. PLoS One. 2013;8:e67378. 30. Bjornsson HT, Fallin MD, Feinberg AP. An integrated epigenetic and genetic approach to common human disease. Trends Genet. 2004;20:350–8. 31. Boks MP, Derks EM, Weisenberger DJ, Strengman E, Janson E, Sommer IE, et al. The relationship of DNA methylation with age, gender and genotype in twins and healthy controls. 
PLoS One. 2009;4:e6767. 32. Bell JT, Tsai P-C, Yang T-P, Pidsley R, Nisbet J, Glass D, et al. Epigenome-wide scans identify differentially methylated regions for age and age-related phenotypes in a healthy ageing population. PLoS Genet. 2012;8:189–200. 33. Heijmans BT, Mill J. Commentary: the seven plagues of epigenetic epidemiology. Int J Epidemiol. 2012;41:74–8. 34. Jiao Y, Widschwendter M, Teschendorff AE. A systems-level integrative framework for genome-wide DNA methylation and gene expression data identifies differential gene expression modules under epigenetic control. Bioinformatics. 2014;30:2360–6. 35. McClay JL, Aberg KA, Clark SL, Nerella S, Kumar G, Xie LY, et al. A methylome-wide study of aging using massively parallel sequencing of the methyl-CpG-enriched genomic fraction from blood in over 700 subjects. Hum Mol Genet. 2014;23:1175–85. 36. Leek JT, Scharpf RB, Bravo HC, Simcha D, Langmead B, Johnson WE, et al. Tackling the widespread and critical impact of batch effects in high-throughput data. Nat Rev Genet. 2010;11:733–9. 37. Banovich NE, Lan X, McVicker G, van de Geijn B, Degner JF, Blischak JD, et al. Methylation QTLs are associated with coordinated changes in transcription factor binding, histone modifications, and gene expression levels. PLoS Genet. 2014;10:e1004663. 38. Jiang R, Jones MJ, Sava F, Kobor MS, Carlsten C. Short-term diesel exhaust inhalation in a controlled human crossover study is associated with changes in DNA methylation of circulating mononuclear cells in asthmatics. Part Fibre Toxicol. 2014;11:71. 39. Sequeira A, Mamdani F, Ernst C, Vawter MP, Bunney WE, Lebel V, et al. Global brain gene expression analysis links glutamatergic and GABAergic alterations to suicide and major depression. PLoS One. 2009;4:e6585. 40. Davies MN, Volta M, Pidsley R, Lunnon K, Dixit A, Lovestone S, et al. Functional annotation of the human brain methylome identifies tissue-specific epigenetic variation across brain and blood. Genome Biol. 2012;13:R43. 41. 
Koestler DC, Christensen B, Karagas MR, Marsit CJ, Langevin SM, Kelsey KT, et al. Blood-based profiles of DNA methylation predict the underlying distribution of cell types: a validation analysis. Epigenetics. 2013;8:816–26. 42. Guintivano J, Aryee MJ, Kaminsky ZA. A cell epigenotype specific model for the correction of brain cellular heterogeneity bias and its application to age, brain region and major depression. Epigenetics. 2013;8:290–302. 43. Chung NC, Storey JD. Statistical significance of variables driving systematic variation in high-dimensional data. Bioinformatics. 2015;31:545–54. 44. Reinius LE, Acevedo N, Joerink M, Pershagen G, Dahlen S-E, Greco D, et al. Differential DNA methylation in purified human blood cells: implications for cell lineage and studies on disease susceptibility. PLoS One. 2012;7:e41361. 45. Ladd-Acosta C, Hansen KD, Briem E, Fallin MD, Kaufmann WE, Feinberg AP. Common DNA methylation alterations in multiple brain regions in autism. Mol Psychiatry. 2014;19:862–71. 46. Berko ER, Suzuki M, Beren F, Lemetre C, Alaimo CM, Calder RB, et al. Mosaic epigenetic dysregulation of ectodermal cells in autism spectrum disorder. PLoS Genet. 2014;10:e1004402. 47. Ahrens M, Ammerpohl O, Von Schönfels W, Kolarova J, Bens S, Itzel T, et al. DNA methylation analysis in nonalcoholic fatty liver disease suggests distinct disease-specific and remodeling signatures after bariatric surgery. Cell Metab. 2013;18:296–302. 48. Ramsahoye BH, Biniszkiewicz D, Lyko F, Clark V, Bird AP, Jaenisch R. Non-CpG methylation is prevalent in embryonic stem cells and may be mediated by DNA methyltransferase 3a. Proc Natl Acad Sci U S A. 2000;97:5237–42. 49. Lister R, Pelizzola M, Dowen RH, Hawkins RD, Hon G, Tonti-Filippini J, et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature. 2009;462:315–22. 50. Huang DW, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2008;4:44–57. 51. 
Stewart SK, Morris TJ, Guilhamon P, Bulstrode H, Bachman M, Balasubramanian S, Beck S. oxBS-450K: a method for analysing hydroxymethylation using 450K BeadChips. Methods. 2015;72:9–15. 52. McGirr A, Alda M, Séguin M, Cabot S, Lesage A, Turecki G. Familial aggregation of suicide explained by cluster B traits: a three-group family study of suicide controlling for major depressive disorder. Am J Psychiatry. 2009;166:1124–34. 53. Sheehan DV, Lecrubier Y, Sheehan KH, Amorim P, Janavs J, Weiller E, et al. The Mini-International Neuropsychiatric Interview (M.I.N.I.): the development and validation of a structured diagnostic psychiatric interview for DSM-IV and ICD-10. J Clin Psychiatry. 1998;59 Suppl 20:22–33; quiz 34–57. 54. Du P, Kibbe WA, Lin SM. lumi: a pipeline for processing Illumina microarray. Bioinformatics. 2008;24:1547–8. 55. Dedeurwaerder S, Defrance M, Calonne E, Denis H, Sotiriou C, Fuks F. Evaluation of the Infinium Methylation 450K technology. Epigenomics. 2011;3:771–84. 56. Huang DW, Sherman BT, Lempicki RA. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 2009;37:1–13."@en ; edm:hasType "Article"@en ; edm:isShownAt "10.14288/1.0074675"@en ; dcterms:language "eng"@en ; ns0:peerReviewStatus "Reviewed"@en ; edm:provider "Vancouver : University of British Columbia Library"@en ; dcterms:publisher "BioMed Central"@en ; ns0:publisherDOI "10.1186/s13072-015-0011-y"@en ; dcterms:rights "Attribution 4.0 International (CC BY 4.0)"@en ; ns0:rightsURI "http://creativecommons.org/licenses/by/4.0/"@en ; ns0:scholarLevel "Faculty"@en ; dcterms:subject "DNA methylation"@en, "Principal component analysis"@en, "Aging"@en, "Brain"@en, "Blood"@en, "Epigenetics"@en ; dcterms:title "Concordant and discordant DNA methylation signatures of aging in human blood and brain"@en ; dcterms:type "Text"@en ; ns0:identifierURI "http://hdl.handle.net/2429/54766"@en .