UBC Faculty Research and Publications

Non-targeted transcription factors motifs are a systemic component of ChIP-seq datasets Worsley Hunt, Rebecca; Wasserman, Wyeth W Jul 29, 2014


Full Text

RESEARCH Open Access

Non-targeted transcription factors motifs are a systemic component of ChIP-seq datasets

Rebecca Worsley Hunt1,2 and Wyeth W Wasserman1,3*

Worsley Hunt and Wasserman Genome Biology 2014, 15:412
http://genomebiology.com/2014/15/7/412

Abstract

Background: The global effort to annotate the non-coding portion of the human genome relies heavily on chromatin immunoprecipitation data generated with high-throughput DNA sequencing (ChIP-seq). ChIP-seq is generally successful in detailing the segments of the genome bound by the immunoprecipitated transcription factor (TF), however almost all datasets contain genomic regions devoid of the canonical motif for the TF. It remains to be determined if these regions are related to the immunoprecipitated TF or whether, despite the use of controls, there is a portion of peaks that can be attributed to other causes.

Results: Analyses across hundreds of ChIP-seq datasets generated for sequence-specific DNA binding TFs reveal a small set of TF binding profiles for which predicted TF binding site motifs are repeatedly observed to be significantly enriched. Grouping related binding profiles, the set includes: CTCF-like, ETS-like, JUN-like, and THAP11 profiles. These frequently enriched profiles are termed ‘zingers’ to highlight their unanticipated enrichment in datasets for which they were not the targeted TF, and their potential impact on the interpretation and analysis of TF ChIP-seq data. Peaks with zinger motifs and lacking the ChIPped TF’s motif are observed to compose up to 45% of a ChIP-seq dataset. There is substantial overlap of zinger motif containing regions between diverse TF datasets, suggesting a mechanism that is not TF-specific for the recovery of these regions.

Conclusions: Based on the zinger regions proximity to cohesin-bound segments, a loading station model is proposed. Further study of zingers will advance understanding of gene regulation.

* Correspondence: wyeth@cmmt.ubc.ca
1 Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, University of British Columbia, Vancouver, BC, Canada
3 Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
Full list of author information is available at the end of the article

© 2014 Worsley Hunt and Wasserman; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Background

The mapping of the regulatory sequences in the human genome is proceeding rapidly. Large-scale chromatin immunoprecipitation coupled to high-throughput sequencing (ChIP-seq) experiments have been a central component of the mapping efforts, including both transcription factor (TF) target and histone target derivatives [1]. These mapping efforts are providing key insights into the properties of regulatory sequences, the interactions between TFs, and the mechanisms contributing to selective patterns of gene transcription. With the compilation of large and diverse ChIP-seq data collections, an opportunity has emerged to study the common characteristics of TF-bound regions revealed by ChIP-seq.

The characteristics of ChIP-seq data are shaped by both biological and technical influences [2-5]. As with every high-throughput technology, the community learns progressively more about the nuances of the data as they accumulate. Much effort has focused on the development of peak finding methods, which allow for the quantitative determination of TF-bound regions within the sequences recovered in a ChIP-seq experiment. In general, most methods take into account a background rate of sequence recovery and use this background to evaluate the significance of an observed number of mapped reads in the foreground ChIP experiment [2]. Most commonly background sequence data sources are generated from sheared input DNA or mock immunoprecipitation (mock-IP) using a non-specific antibody (for example, IgG). The comparison of the foreground against the background by peak finding software is often the basis for specifying the TF-bound regions, usually delineated with a start, stop, and local maximum read density position (that is, ‘peakMax’).

It is clear that the ChIP-seq procedure is working well for detecting regions bound by sequence-specific TFs. Analysis of ChIP-seq datasets reveals an enrichment of the expected TF binding site (TFBS) pattern close to the peakMax or, where no peakMax is determined, peak centre positions (hereafter also referred to as ‘peakMax’) [6,7]. Ab initio pattern discovery software applied to ChIP-seq data routinely recover the known TFBS pattern [8], and pattern enrichment methods confirm highly significant enrichment of the TFBS pattern of the ChIPped TF [9,10]. Additionally, a sufficient number of replicates have been performed to demonstrate general consistency between ChIP-seq datasets using the same cells and antibodies [11].

The properties of DNA in the nucleus have a strong influence on the results of diverse methods, including ChIP-seq and DNase I hypersensitivity mapping data [12].
Both input DNA and diverse ChIPped DNA reveal a strong tendency for the recovery of sequences from promoter regions [4,11], indicating that the DNA shearing process favors regions of open or less compact DNA. These open regions have been demonstrated to be enriched for TF binding and other indicators of accessible DNA such as key histone modifications [13].

One of the open questions about ChIP-seq results is the not infrequent recovery of peaks under which the target motif of the ChIPped TF is absent. Such observations might be attributable to an inadequate understanding of the TF binding specificity, the potential indirect tethering of a TF to a region through protein-protein interactions, or non-specific antibody pull-down. Based on this background, we sought to understand the properties of ChIP-seq TF binding data, with an emphasis on the identification of mechanisms to account for the past observations of peaks lacking the motif of the targeted TF. Based on our research, we report a striking property of TFBS enrichment around the peakMax for CTCF-like, JUN-like, ETS-like, and THAP11 motifs across a broad set of TF ChIP-seq data. The broadly enriched TFBS classes, which we term ‘zingers’ for their startling enrichment, can account for a substantial portion of TFBS ChIP-seq data. The zinger regions are observed to recur across ChIP-seq data from multiple cell lines and for multiple TFs. These recurring regions tend to be proximal to structural features defined by cohesin and polycomb group proteins. A model to account for the observed properties of zingers is introduced and discussed.

Results

Zingers are TF binding motifs enriched across multiple TF ChIP-seq datasets

A subset of TF ChIP-seq data has been reported to lack motifs for the ChIPped TF, suggesting that there may be additional proteins interacting in a sequence specific manner with these regions.
Drawing together diverse TF-ChIP-seq data, we sought to determine if characterized TFs might account for a portion of the discrepancy. To measure the enrichment of TF motifs across the compiled TF ChIP-seq datasets we performed motif over-representation analyses, using the oPOSSUM 3.0 software [9]. We tested 165 position weight matrices (PWMs) selectively curated from the JASPAR development database (see methods), on 285 human datasets (33 cell-lines) for 101 TFs (ENCODE and other resources; see Materials and methods). A parallel analysis of mouse data was performed for 81 datasets (12 cell-lines) encompassing 43 TFs (ENCODE and other resources; see Materials and methods). For each oPOSSUM analysis we provided a set of background sequences of similar length and nucleotide composition relative to the ChIP-seq dataset (all peaks were constrained to 401 bp length). As there were two or more ChIP-seq datasets for many TFs, generated from different cell lines or conditions, we averaged the oPOSSUM enrichment scores across all datasets for a given ChIPped TF. The details of the statistical measures and assessed thresholds are presented in the methods.
Briefly, two oPOSSUM enrichment scores were used to evaluate the datasets: a Fisher-log score (to assess enrichment of motifs across many ChIP-seq peaks) and a Kolmogorov-Smirnov (KS) centrality score (to assess enrichment of motifs in proximity to the peakMax position).

Of 165 TF motifs analyzed, CTCF, ETS-like (for example, GABPA and ELK4), and JUN-like motifs were found to be both the most enriched and most proximal to the peakMax across the greatest number of both human (Figure 1A and binding site logos in 1B) and mouse (Additional file 1: Figure S1A and binding site logos in S1B) TFs’ datasets. We refer to such broadly enriched TF motifs as ‘zingers’, reflecting their potential to confound the analysis and interpretation of TF ChIP-seq results.

To assess if zinger enrichment is independent of the ChIPped TFs’ motifs (that is, not overlapping the expected motif), we performed a second enrichment analysis on human ChIP-seq sequences in which the ChIPped TF motifs were masked (thus restricting the analysis to the subset of ChIP-seq datasets for which a TF binding profile is available). We again consider the two metrics of Fisher-log enrichment score and KS centrality score. The CTCF, ETS-like, and JUN-like zinger motifs remained enriched (Additional file 2: Figure S2A).

Short patterns, such as those found by PWMs, can occur by chance in the genome. To confirm the findings of zinger-specific enrichment, we shuffled the zinger PFMs and determined the likelihood of achieving the frequency of enriched datasets observed for the original profile (see Materials and methods).
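As a rough illustration of the shuffling control, the sketch below permutes the columns of a position frequency matrix, which leaves its width, per-position information content, and overall base composition unchanged. The column-permutation scheme, the names `shuffle_pfm_columns` and `base_totals`, and the toy matrix are assumptions made for illustration only; the procedure actually used is the one given in Materials and methods.

```python
import random

# Hypothetical toy position frequency matrix: one dict of base counts per position.
EXAMPLE_PFM = [
    {"A": 8, "C": 1, "G": 0, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 2, "C": 2, "G": 3, "T": 3},
]


def shuffle_pfm_columns(pfm, seed=None):
    """Return a copy of `pfm` with its positions (columns) in random order.

    Assumption: shuffling is done by permuting whole columns, so each
    column's counts stay intact and only the order of positions changes.
    The input matrix is not modified.
    """
    rng = random.Random(seed)
    shuffled = [dict(column) for column in pfm]
    rng.shuffle(shuffled)
    return shuffled


def base_totals(pfm):
    """Total count of each base over all positions of the matrix."""
    return {base: sum(column[base] for column in pfm) for base in "ACGT"}
```

A shuffled matrix produced this way can be scanned against the same datasets as the original profile to estimate how often a comparable but biologically meaningless pattern appears enriched.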
In all cases, the comparison against the frequency of enriched datasets obtained with the shuffled matrices confirmed that the true zinger motifs’ enrichment was extremely unlikely to occur by chance (P values are: 2.5e-44 for CTCF, 2.8e-09 for GABPA, and 3.7e-08 for JUN).

Figure 1 Zinger binding motifs are enriched across multiple human ChIP-seq datasets. (A) The histogram displays the results of TFBS motif enrichment analysis on 281 human ChIP-seq datasets generated with the oPOSSUM 3.0 software. Along the x-axis is the fraction of datasets that displayed enrichment near the peakMax for a TF profile. The y-axis is the number of TF profiles that were found enriched for a given fraction of datasets. The profiles most frequently observed to be enriched are labeled on the histogram. The likelihood (P values) of a PWM with the same width, information content, and GC composition as the CTCF, GABPA, or JUN PWMs to attain the enrichment frequency observed in the histogram follow: 2.5e-44 for CTCF, 2.8e-09 for GABPA, and 3.7e-08 for JUN. (B) The binding site logos of the 10 TF binding models with enriched motifs across the greatest number of datasets, manually grouped by motif similarity. Each logo depicts position along the x-axis and information content (that is, pattern strength) along the y-axis. (C) Motifs detected consistently by ab initio motif discovery across five datasets of 5,000 random sequences. The upper motif is similar to the CTCF logo in section B, while the lower motif is similar to the motif for the THAP11 TF.

Ab initio motif discovery of zinger profiles

We sought to determine if ab initio pattern discovery could recover either novel profiles or known TFBS profiles in pooled data, a process requiring a greater signal-to-noise ratio than the more noise-tolerant oPOSSUM motif enrichment testing above. Across all of the ChIP-seq data, we masked the motif of the ChIPped TF and repeat-masked the sequences (see Materials and methods), then drew five sets of 5,000 sequences from the ChIP-seq pool and subjected each set to pattern discovery analysis using the MEME system [8]. From the five replicate pools, MEME returned profiles for wide and high information content patterns. In all cases MEME detected a pattern consistent with the CTCF binding profile in the top six results (Figure 1C, top logo) and a profile unknown to MEME Suite’s TOMTOM pattern similarity scoring system [14] (Figure 1C, bottom logo). A report from Ngondo-Mbongo et al. [15] identified that THAP11 binds to a motif that matches the unknown profile, so we will hereafter refer to the MEME derived profile as the THAP11 profile. We reviewed oPOSSUM results for the enrichment of the THAP11 motif, and found that it is consistent with the zingers for the Fisher-log score enrichment frequency, but the motif is not frequently observed to be centrally positioned based on the oPOSSUM KS-score (although it is proximal to the peakMax by the heuristic motif enrichment method presented below in this report). Given the strength of evidence, we elected to classify THAP11 motif as an additional zinger.

Zinger motif enrichment observed within open chromatin and genomic datasets

Using the oPOSSUM enrichment analysis procedure, we sought to determine if the zingers showed enrichment in other genomic data collections. ChIP-seq data are recognized to be highly enriched with open chromatin regions, and in particular ChIP-seq data for CTCF, one of the zinger TFs, are known to strongly overlap with DNase I hypersensitive sites [16,17]. We therefore analyzed ENCODE DNaseI-seq and Faire-seq data to assess the enrichment of the zinger motifs. Each region (average 150 bp) was extended to 401 bp for enrichment analysis using the oPOSSUM 3.0 software.
oPOSSUM enrichment results revealed the zinger profiles to be the most frequently enriched within DNaseI-seq and Faire-seq datasets, showing enrichment near the region centre in 50% to 100% of the DNaseI-seq datasets, and 20% to 92% of Faire-seq datasets (Additional file 3: Figure S3). We further assessed the ratio of zinger motifs in DNase and Faire regions compared to flanking regions, providing an indication of the portion of each dataset that could be attributed to zingers: mean values of 47% for DNaseI-seq and 13% for Faire-seq were obtained (see Additional file 4: Text S1).

We have observed enrichment of zingers in other open chromatin associated data such as ChIP-seq data for helicase-related proteins or histone modifiers (Additional file 4: Text S1 and Additional file 5: Figure S4), and ChIP-seq control data (Additional file 4: Text S1 and Additional file 6: Figure S5). Thus zinger motifs are observed in multiple classes of genomic datasets.

Visualizing the pattern of motif enrichment

We first used visualization approaches to examine the distribution of both the motif scores and peakMax proximity for the CTCF, JUN, GABPA, and THAP11 zinger motifs for several datasets using TFBS-landscape plots [18]. To visually assess the topological pattern of enrichment of zinger motifs using TFBS-landscape plots, we extended all analyzed sequences to 1,001 bp (peakMax position at 501 bp), and plotted the motif position relative to the peakMax (x-axis; upstream and downstream of peakMax) and the motif score (y-axis) of the top scoring zinger motif for each peak.
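The coordinates plotted in such a landscape can be sketched as follows: for each peak sequence, find the best-scoring motif window and record its offset from the sequence centre together with its score. This is a minimal sketch under stated assumptions, not the TFBS-landscape implementation of [18]: the relative score (0 to 100, scaled between the matrix minimum and maximum) is one common convention, only the forward strand is scanned, and `relative_score`, `top_hit`, and the toy matrix are hypothetical names and values.

```python
# Hypothetical toy log-odds matrix whose best-scoring site is "ACG".
EXAMPLE_PWM = [
    {"A": 2.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 2.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 2.0, "T": -1.0},
]


def relative_score(pwm, site):
    """Scale the raw matrix score of `site` to 0-100 between the matrix min and max."""
    raw = sum(column[base] for column, base in zip(pwm, site))
    lowest = sum(min(column.values()) for column in pwm)
    highest = sum(max(column.values()) for column in pwm)
    return 100.0 * (raw - lowest) / (highest - lowest)


def top_hit(pwm, seq):
    """Return (offset, score) of the best-scoring window in `seq`, or None.

    `offset` is the motif centre minus the sequence centre, so for a
    1,001 bp sequence with the peakMax at position 501 an offset of 0
    places the motif on the peakMax. Windows containing masked or
    ambiguous bases are skipped; ties keep the leftmost window.
    """
    width = len(pwm)
    centre = len(seq) // 2  # zero-based index of the central base
    best = None
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue
        score = relative_score(pwm, window)
        if best is None or score > best[1]:
            best = (start + width // 2 - centre, score)
    return best
```

Collecting one `(offset, score)` pair per peak gives the x and y values of a landscape plot; a motif that is genuinely enriched at the peakMax shows a cluster of high scores around offset 0.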
As seen in Figure 2, the motif predictions of zinger PWMs are in general concentrated in motif score ranges across all positions relative to the peakMax, for example, motif scores 70 to 85 for CTCF (Figure 2A), or 80 to 87 for JUN (Figure 2B). However, proximal to the peakMax, there is a distinctive enrichment for the zinger motif, most strikingly seen for CTCF and THAP11 where almost all high scoring motifs (>85) are located proximal to the peakMax. The enrichment of JUN and particularly GABPA zinger motifs is less distinctive visually, due to the peakMax proximal enrichment overlapping the same score range as the background motifs. In control datasets and with shuffled matrices we do not see the distinct high scoring population of motif scores; we instead see a uniform distribution along the total 1,001 bp of sequence, which conveys, visually, the background rate of motif prediction for the PWM (Additional file 7: Figure S6). The distinctive zinger motif enrichment allowed for the selection of subsets of peaks that were enriched for the motif of a TF that was not specifically targeted by the ChIP-seq experiment.

Defining a set of zinger motif containing peaks

Based on the visualization analysis we used a procedure for determining the range of motif enrichment relative to peakMax proximity and motif score enrichment [18]. The outer limits of these ranges of enrichment were then applied as thresholds that defined ‘enrichment zones’ for quantitative analysis of ChIP-seq dataset motif composition (Figure 3; see Materials and methods).

For ease of reference, we will hereafter use ‘zinger motifs’ to refer to the collection of CTCF, JUN-like, ETS-like, and THAP11 motifs within the enrichment zones and ‘zinger motif peaks’ to refer to those peaks within a dataset that have a zinger motif but not the ChIPped TF’s motif.
Motif predictions outside the enrichment zones will be referred to as ‘distal-zinger’ motifs.

As anticipated, peaks with the ChIPped TF’s motif proximal to the peakMax comprised the majority of most datasets (up to 99% in the best case). After accounting for background ChIPped TF motif rates, the mean observed portion was 55% (the median was 59% with a median absolute deviation (MAD) of 27 pp). There are, however, extreme cases in which the ChIPped TF’s canonical binding motif is present in less than 10% of the peaks (Additional file 4: Text S1 and Additional file 8: Figure S7).

After accounting for background, and excluding two outliers, up to 45% of a ChIP-seq dataset are zinger motif peaks with a mean of 12% (median of 9% with a MAD of 3 pp) (Additional file 9: Figure S8A). The zinger motif peaks account for up to 69% of the set of peaks unexplained by the ChIPped TF’s motif, with a mean of 27% (median of 27% with MAD of 14 pp), in datasets with at least 1% zinger motif peak content (Additional file 9: Figure S8B); the zinger motif peak enrichment is visually depicted in a heat map format (Additional file 9: Figure S8C). For clarity, the portion of zinger motif peaks is anti-correlated with the portion of ChIPped TF motif peaks (Additional file 9: Figure S8D).

Figure 2 Zinger motifs are enriched at the peak maximum of non-zinger ChIP-seq datasets. The enrichment plots display the location of the top scoring motif for each peak relative to the peakMax (the peakMax is at 0) on the x-axis, while the score of the motif is plotted on the y-axis. The adjacent line plots display the fraction of motifs observed in 5 bp increments. The logo reflecting the binding specificity for each zinger appears above the related enrichment plot. (A) CTCF motif predictions from NRF1 ChIP-seq (GM12878 cells). (B) JUN motif predictions from TCF7L2 ChIP-seq (Hct116 cells). (C) GABPA motif predictions from NFKB ChIP-seq (GM19099 cells). (D) THAP11 motif predictions from IRF1 ChIP-seq (K562 cells).

No strong dependencies detected for zinger motif occurrence

As zinger motifs are present in peaks without the ChIPped TF’s motif we wanted to determine if there were any characteristics specific to or in common among this set of peaks. We found that neither the presence nor proportion of zinger motif peaks within a ChIP-seq dataset is dependent on cell type, as seen in Additional file 10: Figure S9A for the five most abundant cell lines. Neither is the proportion of zinger motif-containing peaks consistent across multiple datasets for the same TF (Additional file 10: Figure S9B).

Figure 3 The fraction of zinger motif peaks and ChIPped TF motif peaks varies across ChIP-seq datasets. The pie charts present a random selection of 50 datasets for multiple TFs and cell-lines with zinger motifs present (>1% zinger). The charts are ordered by greatest zinger motif peak enrichment to the least. Black is the portion of peaks with the ChIPped TF’s motif, red is the portion of zinger motif peaks, and brown is the remaining portion of peaks that do not contain either the ChIPped TF nor zinger motifs.

Next we asked if the zinger motifs have a strong tendency to co-occur in the same zinger motif peaks. We found that at most 11% of datasets show a positive association with a significant P value (Fisher exact P values <0.001 and log odds ratios >0) for any pairwise co-occurrence of two different zingers within a single peak (the most frequent pair of zingers being GABPA and THAP11). A few datasets (17%) show a negative association for zinger motif co-occurrence with a significant P value (Additional file 11: Figure S10A). Thus, the zinger motifs are not interdependent. We next evaluated the pairwise tendency for zinger motif peak enrichment within the same ChIP-seq datasets, finding unremarkable correlation values (correlation coefficients -0.0233 to 0.3803) (Additional file 11: Figure S10B).

Lastly we determined whether zinger motif peaks were consistently located near a feature in the genome. We evaluated the proximity of zinger-associated regions to genomic features such as transcription start sites (TSS), CpG islands, conserved regions, and repeat sequence regions. Comparing the set of zinger motif peaks to peaks with the ChIPped TF’s motif, we did not detect consistent enrichment tendencies that distinguished between the two sets of regions (Additional file 4: Text S1).

Peaks containing a zinger motif but lacking the ChIPped TF motif have low scores

As zinger motifs are an unexpected presence across datasets we assessed the quality of the peaks they occur in, asking if the zinger motifs tended to be in the lower scoring peaks of the dataset. We compared the peak calling scores of peaks containing the ChIPped TF’s motif against peaks with a zinger’s motif. The peak scores for the zinger containing peaks are significantly poorer than for those peaks with the ChIPped TF’s canonical motif (Wilcoxon one-tailed test P values <5.0e-05).

Peaks with a zinger motif may be bona fide targets of the zinger TF

Prediction of TFBSs can suffer from poor specificity, and as the enriched zinger motifs’ peaks were unexpectedly found in datasets for non-zinger TFs, we asked if the zinger motif peaks were actual binding locations for the zinger TF or not. Therefore we investigated the degree of agreement (co-occurrence within 100 bp) between zinger motif peaks with a strong motif score (score >85) and ChIP-seq data ChIPped for the zinger TF in the same cell type (Figure 4). On average 75% of zinger CTCF motif peaks overlapped CTCF ChIP-seq peaks (median 79% with a MAD of 15 pp); 38% of zinger JUN motif peaks overlapped JUN ChIP-seq peaks (median 38% with a MAD of 17 pp); and 28% of zinger GABPA motif peaks overlapped GABPA ChIP-seq peaks (median 27% with a MAD of 13 pp). In all cases the agreement was significant (Wilcoxon P values <3.4e-20) with respect to the distal-zinger control (see Additional file 4: Text S1), and indicated that many of the zinger regions may be bona fide binding regions for the zinger TF.

A comparison of the peak scores for the ChIP-seq peaks that overlapped the set of zinger motif peaks versus the set of distal-zinger peaks revealed a significant difference between the two groups (Wilcoxon one-tailed test significance threshold P value <0.001). The zinger motif peaks associated with stronger scoring ChIP-seq peaks than did the distal-zinger peaks for the majority of datasets (that is, 81%, 67%, and 79% of CTCF, JUN, and GABPA ChIP-seq datasets, respectively).

Zinger motif peak regions recur across multiple TF datasets

As zinger motif peaks are enriched in numerous datasets for which the zinger is not the targeted TF, we asked whether the same zinger regions were occurring repeatedly across multiple datasets, that is, are the same zinger regions being ChIPped by many TFs. We pooled the zinger motif peaks, which by definition lacked the motif of the ChIPped TF, from across datasets (33 cell lines; 823,574 peaks), requiring that the zinger motif have a strong motif score of 85 or greater to reduce false positives.
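The grouping of pooled peaks into neighbourhoods, and the count of unique TFs per neighbourhood, can be pictured with the minimal sketch below. It assumes simple chaining of sorted peakMax positions, so that a peak joins the current neighbourhood when its peakMax lies within 50 bp of the previous peak on the same chromosome; the exact rule is the one given in Materials and methods and may differ. `Peak`, `build_neighbourhoods`, and `recurrence` are hypothetical names.

```python
from collections import namedtuple

# Hypothetical minimal peak record: chromosome, peakMax coordinate, ChIPped TF.
Peak = namedtuple("Peak", ["chrom", "peak_max", "tf"])


def build_neighbourhoods(peaks, max_gap=50):
    """Group peaks whose sorted peakMax positions chain within `max_gap` bp.

    Assumption: chaining is to the previous peak on the same chromosome,
    so a neighbourhood can be wider than `max_gap` overall.
    """
    groups = []
    for peak in sorted(peaks, key=lambda p: (p.chrom, p.peak_max)):
        last = groups[-1][-1] if groups else None
        if (last is not None and last.chrom == peak.chrom
                and peak.peak_max - last.peak_max <= max_gap):
            groups[-1].append(peak)
        else:
            groups.append([peak])
    return groups


def recurrence(group):
    """Number of unique ChIPped TFs contributing a peak to the neighbourhood."""
    return len({peak.tf for peak in group})
```

A neighbourhood with a recurrence of two or more corresponds to a region recovered in datasets for at least two different TFs.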
We assigned peaks whose peakMax were within 50 bp of each other into neighbourhoods (see Materials and methods), and then assessed the recurrence of each neighborhood, that is, the number of unique TFs whose datasets contributed a zinger motif peak to the neighbourhood. We obtained 257,631 zinger neighbourhoods of which 92,244 neighbourhoods derived from regions ChIPped by two or more unique TFs. The neighborhoods ChIPped by two or more TFs are on average 167 bp in width (maximum 607 bp), and 77% derive from two or more cell lines. This amounts to approximately 15.4 Mbp of recurrently detected zinger motif associated sequence that was ChIPped by 2 to 41 non-zinger TFs in up to 21 cell lines. Figure 5 exemplifies the number of TFs that ChIPped zinger neighbourhoods across chromosomes 1 and 3 (zinger neighbourhood coordinates are provided in Additional file 12: Dataset S1).

We similarly generated neighborhoods from those regions with neither the ChIPped TF nor zinger motifs (unidentified motif neighborhoods - 536,546), and from the regions found to have a high scoring motif (score >85) for the ChIPped TF and no zinger motif (ChIPped TF neighborhoods - 408,677) (see Materials and methods). The zinger neighborhoods were found to be ChIPped by significantly more unique TFs than are the other two sets of neighborhoods (Wilcoxon one-tailed test P value = 0).

The recurrence of the zinger motif peaks across datasets prompted us to consider the motif content of HOT regions. HOT (high occupancy of transcription-related proteins) regions, as defined by Yip et al. [19], are ChIP-seq regions that within a single cell line (GM12878, HeLa, H1-hESC, HepG2, or K562) demonstrate binding co-occurrence among chromatin-related factors, general TFs, and sequence-specific TFs. Yip et al.
noted that a substantial portion of a cell-line’s HOT regions are motif-less for the ChIPped factor, and associate with strong DNaseI signals. The HOT regions are present in two or more cell lines in 25% of cases according to Yip et al., while zinger neighborhoods were noted above to be 77% of cases. oPOSSUM over-representation analysis on the combined set of HOT regions found the zinger motifs to be 13 out of the 20 most enriched patterns, consistent with what was observed above for the DNaseI-seq/Faire-seq open chromatin datasets (Additional file 13: Table S1).

Figure 4 ChIP-seq data for zinger TFs overlaps zinger motif peaks from other TF’s datasets. For each plot, a selection of TF ChIP-seq datasets is alphabetically ordered by TF name horizontally. The y-axis represents the fraction of peaks that overlap with the zinger TF’s ChIP-seq peak in experiments performed with the same cell type. Two populations of peaks are plotted per dataset: solid circles represent the subset of peaks with a peakMax-proximal zinger motif, and open triangles represent the subset of peaks with a distal-zinger motif. (A) CTCF, (B) JUN, or (C) GABPA. The horizontal dashed line at 0.13 is a qualitatively selected visual aid.

Zinger neighbourhoods tend to occur close to regions occupied by cohesin

Recurring open chromatin enrichment across datasets suggested that structural properties of chromatin might contribute to zinger motif recovery across ChIP experiments [12]. Cohesin is a protein noted for both its role in gene regulation and DNA structure [20,21]. It is a multi-subunit complex, which is believed to form a ring-like structure around DNA, and has been well documented in its role of sister chromatid interaction during the mitotic metaphase. Cohesin has also been implicated in promoting interaction between enhancers and core promoters of active genes in embryonic stem cells [21] and in chromosomal looping [22]. Chromosomal looping may be a structural element that is conducive to DNA shearing under the stress of sonication. Additionally, cohesin or associated proteins may function as a ‘loading station’ by bringing together proteins bound to remote regulatory elements and promoter regions that will in turn regulate transcription within the looped region [23].

We evaluated the proximity of the zinger neighborhoods to cohesin-interacting regions. Zinger neighborhoods are enriched for proximity (that is, within 500 bp) to cohesin regions (via RAD21 and SMC3 ChIP-seq) compared to the ChIPped TF neighborhoods or unidentified motif neighborhoods (Fisher exact one-tailed test P value of 0 for both comparisons; 77% of the zinger neighborhoods observed for multiple TFs are proximal, while 46% of the unidentified motif and 13% of the ChIPped TF neighborhoods are so positioned). The neighborhoods for unidentified motif peaks were also significantly more proximal to cohesin than neighborhoods from ChIPped TF peaks (Fisher exact one-tailed test P value of 0). As some of the neighborhoods contain CTCF zinger attributed regions, and cohesin is known to interact with CTCF [24,25], we removed neighborhoods within 500 bp of a CTCF ChIP-seq region and repeated the analysis. Regardless of the depletion of CTCF associated neighborhoods, the zinger neighborhoods remained significantly closer to cohesin (Fisher exact one-tailed test P value of 0 for all comparisons).

Figure 5 Zinger motif peaks recur across datasets for multiple TFs. The plots present two distinct neighbourhood sets (as defined in the text): one set derived from zinger motif peaks (red) and the other from ChIPped TF motif peaks without zinger motifs (black). The x-axis gives the neighborhood position on a chromosome: (A) chromosome 1, (B) chromosome 3. The y-axis is the number of unique TFs that ChIPped a peak in a neighborhood. A horizontal dotted line at y = 5 is given for visualization purposes, to highlight that there are many zinger neighborhood locations (red) that were ChIPped by multiple unique TFs.

Another system noted to impact chromatin structure is that of the polycomb group proteins (including polycomb repressive complex 1 (PRC1) and polycomb repressive complex 2 (PRC2) forms), which are implicated in the remodeling of chromatin. In Drosophila, PRC1 has been noted to interact with cohesin to co-regulate active genes [26]. We used ChIP-seq data for the constituent proteins CBX and EZH2 to identify regions bound by the PRC1 and PRC2 complexes, respectively. We found that the zinger neighborhoods were significantly closer to CBX peaks and EZH2 peaks than are the neighborhoods derived from either ChIPped TF motif peaks, or from unidentified motif peaks (Fisher exact one-tailed test P value of 0). We observed that the PRC1 and PRC2 peaks proximal to the zinger neighborhoods tend to be those that are also within 500 bp of cohesin (Fisher exact one-tailed test P value <7.6e-160 for PRC1, and P = 0 for PRC2). The unidentified motif neighborhoods are, in turn, significantly closer to PRC regions than the neighborhoods derived from peaks with the motif for the ChIP-seq experiment’s targeted TF.

Thus, the zinger neighborhoods, and to a lesser degree the unidentified motif neighborhoods, are associated with cohesin and polycomb repressive complex regions. This suggests that these diverse regions, which were initially identified as not containing the motif of the ChIPped TF, and yet in many cases enriched for an alternative motif (zingers), may be part of a structure involving cohesin.
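The proximity comparisons in this section reduce to a 2×2 table (proximal versus not proximal, for two sets of neighbourhoods) evaluated with a one-tailed Fisher exact test. The sketch below is a minimal illustration under stated assumptions: feature regions are reduced to single coordinates on one chromosome, and `is_proximal` and `fisher_exact_greater` are hypothetical helper names rather than the authors' code.

```python
from bisect import bisect_left
from math import comb


def is_proximal(position, sorted_sites, window=500):
    """True if any coordinate in `sorted_sites` lies within `window` bp of `position`.

    Assumption: `sorted_sites` is an ascending list of coordinates on the
    same chromosome as `position`.
    """
    i = bisect_left(sorted_sites, position - window)
    return i < len(sorted_sites) and sorted_sites[i] <= position + window


def fisher_exact_greater(a, b, c, d):
    """One-tailed Fisher exact P value for the table [[a, b], [c, d]].

    Returns the hypergeometric probability of observing `a` or more in
    the top-left cell given fixed row and column totals.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    numerator = sum(comb(row1, k) * comb(total - row1, col1 - k)
                    for k in range(a, upper + 1))
    return numerator / comb(total, col1)
```

Here `a` and `b` would be the proximal and non-proximal counts for one set of neighbourhoods, and `c` and `d` the same counts for the comparison set.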
Such a structure could influence the tendency for these regions to be detected recurrently across diverse ChIP-seq data.

Discussion
ChIP-seq experiments are increasingly used to investigate how sequence-specific DNA binding TFs regulate gene expression. In this report, we introduce ‘zingers’: four classes of TFBSs that display significant binding site enrichment, unexpectedly proximal to the peakMax, across ChIP-seq binding experiments for other TFs. Within individual TF ChIP-seq experiments, up to 45% of peaks lack the canonical TF binding motif and contain a zinger motif, with a mean of 12% (median 9%). While biased to the lower scoring peaks in other TF ChIP-seq data, the same zinger-associated regions tend to be high scoring peaks within datasets ChIPped for the zinger TF, indicating these regions are likely bound by the zinger TF. The zinger motif peaks derive from 257,631 regions (neighborhoods) in the genome, 36% of which are observed recurrently across datasets for diverse TFs, in sharp contrast to neighborhoods containing only the ChIPped TF’s motif, which recur relatively infrequently. Some regions lacking both the ChIPped TF’s motif and a zinger motif are also recurrently observed. Both zinger motif and unidentified motif neighborhoods are positioned proximal to structural regions defined by the presence of cohesin and polycomb group complexes. Accounting for the contribution of zinger-associated regions to global studies of regulatory sequences will be a consideration for future analysis of ChIP-seq data.

Figure 6 A model to account for zinger motif enrichment across ChIP-seq datasets. A TF loading station model is presented that is compatible with the observed enrichment of zinger motifs across diverse TF ChIP-seq data and cell lines. The dark blue oval represents the ChIPped TF, the magenta oval represents the zingers, the remaining colored ovals represent TFs or other proteins or complexes that engage with the DNA, and the red loop represents cohesin and polycomb group proteins. The grey strands are chromatin. (A) Overview of a loading station. Multiple proteins may interact within a local region, from which TFs may disperse to search for other regulatory regions. Zingers and structural components such as cohesin and polycomb group proteins are key features. Panels B, C, and D present specific scenarios under which DNA loading station segments might be recovered in a ChIP-seq experiment. (B) Direct binding. The ChIPped TF directly binds to a TFBS, while a zinger motif is present in trans (upper) or in cis (lower). (C) Indirect binding. The ChIPped TF is present due to an indirect interaction, involving a mediating protein. The zinger motif is again present in trans (upper) or in cis (lower). (D) Non-specific events. Numerous proteins are present at the loading station, providing an abundance of epitopes, thus increasing the probability of being recovered in a ChIP-seq experiment.

The underlying biochemical mechanism by which the zinger-associated regions are observed across such diverse datasets remains to be resolved. However, based on the findings in this investigation, we present a ‘loading station’ model consistent with our state of understanding (Figure 6). Cohesin/polycomb and zinger proteins are proposed to participate in demarcation and stabilization of inter-segment interactions of DNA at which TFs bind. At these ‘stations’, the ChIPped TF may be present via direct (Figure 6B) or indirect (Figure 6C) interactions with the DNA, and either in cis- or trans- arrangements with a zinger TFBS.
In a ChIP experiment, assuming covalent linking of the ChIPped TF and the cohesin-paired DNA, the patterns of motif enrichment observed in this report could emerge, including the presence or absence of motifs for both the ChIPped TF and a zinger. Alternatively, or possibly in combination, there may exist zinger-containing regions (Figure 6D) at which many proteins are present (at a cell population level). Such regions may contain a diverse range of epitopes and therefore be more likely to be recovered in ChIP-seq experiments, especially with polyclonal antibodies. Within this model, TFs may ‘visit’ cohesin and zinger marked regions, resulting in a low but consistent recovery of reads in a ChIP-seq experiment. The model accounts for recurring detection of zinger motif peaks, the proximity of the peaks to cohesin-interacting regions, and why the zinger motifs may be present in the sequence even when the ChIPped TF’s motif is absent.

From a broader mechanistic perspective, a loading station mechanism is consistent with the ‘hop-skip-jump’ theory for how TFs efficiently search the nucleus to arrive at their TFBSs [27]. The proposed loading station model is supported in recent literature. Faure et al. [23] propose a role for cohesin in stabilizing large protein-DNA complexes. While this manuscript was under review, Taipale et al. [28] published a study using the LoVo cell line suggesting that cohesin participates in holding chromatin open during cell division to facilitate TFs relocating back to those regions once division is complete.

The zinger content of every ChIP-seq dataset should be evaluated, consistent with a growing effort to critically evaluate such data [12,29,30]. For instance, in the STAT1 (GM12878) ChIP-seq dataset more than 30% of peaks have zinger motifs proximal to the peakMax, while STAT1 motifs occur only at the background frequency. We propose a general approach for the study of zinger content.
For each ChIP-seq dataset, the peak regions should be scanned for the presence of the ChIPped TF motif in proximity to the peakMax. The peaks lacking a ChIPped TF motif should be compared to the recurring zinger neighborhoods (Additional file 12: Dataset S1). The portion of the dataset overlapping the neighborhoods gives insight into the overall specificity of the experiment.

Conclusions
We have identified zinger motifs that are frequently enriched across a portion of TF ChIP-seq data, including CTCF-like, ETS-like, and JUN-like motif families, and THAP11. As high-throughput ChIP-seq data informs genome annotation, research into gene regulation may be impacted by zinger motif derived annotations. Moving forward it will be important to determine the prevalence of zinger-like motifs in ChIP-seq data in diverse organisms, probe the structural properties of the zinger regions, and develop computational approaches to systematically identify recurring zinger regions in large-scale genome annotation. Ultimately, understanding the biophysical processes that result in the zinger motif enrichment in ChIP-seq data may provide broader insight into the mechanisms of transcription regulation.

Materials and methods
Datasets
For our analyses, we used ENCODE ChIP-seq datasets (human and mouse), ENCODE DNaseI-seq and Faire-seq data, and human ChIP-seq controls [1] downloaded from the UCSC ENCODE database [31]. We also incorporated non-ENCODE ChIP datasets downloaded from GEO: (1) GSE11431 - 13 mouse ESC datasets [32]; (2) GSE25532 - mouse NFYA data in ES cells [33]; (3) GSE17917 and GSE18292 - human KLF4, POU5F1, cMYC, NANOG, and SOX2 data [34]; and (4) GSE22078 - human and mouse CEBPA and HNF4A [35]. Where only the mapped data were available, we used FindPeaks 4.0 [36] to call peaks using the following parameter options: dist_type 1 200 -subpeaks 0.6 -trim 0.2 -duplicatefilter.
The ENCODE broadPeak datasets frequently occurred in replicate; to avoid duplication, only the replicate with the most peaks of a pair was used for analyses.

Where coordinates were provided as NCBI36/hg18 or NCBI36/mm8, they were first converted to GRCh37/hg19 or NCBI37/mm9, using a locally installed version of the UCSC lift-over tool [37]. We then used the Ensembl API to retrieve sequences from GRCh37/hg19 and NCBI37/mm9 assemblies.

The ENCODE ChIP-seq data are in one of two formats, narrowPeak and broadPeak. Both formats contain two columns pertaining to statistical significance of the peaks (also known as peak scores): one is a P value, the other a q value (Bonferroni corrected). We used the q value field when it was assigned, and otherwise used the P value field. As peaks are reported in a multitude of lengths, in the range of 1 bp to greater than 5,000 bp, we trimmed or extended all peaks to a constant length centered at the peak maximum for narrowPeak format datasets, or at the peak centre for broadPeak format and DNase-seq/Faire-seq datasets. For enrichment visualization and determining heuristic boundaries of enrichment we used 1,001 bp sequences; the oPOSSUM TFBS enrichment analysis input was 401 bp sequences; and the ab initio motif detection input was 201 bp sequences.

Position frequency matrices (PFMs) were obtained from the JASPAR [38] development 4.0_alpha database of transcription factor models (prior to April 2013). Where the JASPAR PFM did not agree with the consensus in the literature we performed an ab initio analysis on the top 500 peaks (selected by peak score) of two or more ChIP-seq datasets for the given TF, using a locally installed version of the MEME software [8]. MEME results were then checked against the literature and for enrichment in a different ChIP-seq dataset for the given TF.
MEME position specific probability matrices (PSPMs) were converted to PFMs by transposing the PSPM and multiplying each letter (A, G, C, T) frequency in the matrix by the number of sites found by MEME. The PFMs were subsequently converted to position weight matrices (PWMs) using the TFBS Perl Module [39]; only PWMs based on PFMs with information content (IC) greater than 8 bits were retained. The PFMs used in this study are provided in Additional file 14: Dataset S2.

For those analyses using datasets of shuffled matrices, the datasets were generated by random permutation of all columns of the originating PFMs, excluding the lower information content columns on the edges (2 columns on each side for all cases, except for the wider CTCF PWM for which 3 columns on each side were held constant).

Motif over-representation analysis
Motif over-representation analyses were performed with a locally installed version of oPOSSUM 3.0 [9]. We used the sequence-based analysis option with default settings, except for specifying the use of the JASPAR development PFMs (Additional file 14: Dataset S2). We trimmed or extended all peaks to 401 bp. Backgrounds for the over-representation analyses came from the mappable portion of the genome, and were chosen to match the sequence length and mononucleotide GC composition distribution of each dataset.

The oPOSSUM Fisher-log enrichment score is derived from a one-tailed Fisher exact probability test, based on the hypergeometric distribution, which compares the number of sequences that contain a motif for the TF of interest in the target and background datasets. The negative natural logarithm of the Fisher test probabilities is the reported Fisher-log score. Thus a Fisher-log score of 6.91 or higher is equivalent to a P value of 0.001 or lower.
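To make the relationship between the Fisher test and the Fisher-log score concrete, the following is a minimal Python sketch, not the oPOSSUM implementation. It assumes the one-tailed test is the upper tail of the hypergeometric distribution over sequences containing a motif; the function name and its arguments are hypothetical and chosen only for illustration.

```python
from math import comb, log


def fisher_log_score(target_hits, target_size, background_hits, background_size):
    """Negative natural log of a one-tailed (enrichment) Fisher exact P value.

    target_hits / background_hits: sequences containing the motif.
    target_size / background_size: total sequences in each set.
    All names here are illustrative assumptions, not oPOSSUM identifiers.
    """
    total = target_size + background_size
    total_hits = target_hits + background_hits
    max_hits = min(target_size, total_hits)
    # Hypergeometric upper tail: probability of observing at least
    # target_hits motif-containing sequences in the target set.
    numerator = sum(
        comb(total_hits, k) * comb(total - total_hits, target_size - k)
        for k in range(target_hits, max_hits + 1)
    )
    p_value = numerator / comb(total, target_size)
    # A P value that underflows to zero yields an infinite score.
    return -log(p_value) if p_value > 0 else float("inf")


# The threshold quoted in the text: -ln(0.001) rounds to 6.91.
threshold = -log(0.001)
```

Under these assumptions, a dataset would be counted as enriched for a motif when `fisher_log_score(...)` meets or exceeds `threshold`.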
Fisher-log enrichment scores of ‘infinite’ value were set to either 500 or to 100 past the maximum non-infinite Fisher-log score.

The oPOSSUM KS centrality score is the negative logarithm of the probabilities from a Kolmogorov-Smirnov test. Thus a KS score of 6.91 or higher is equivalent to a P value of 0.001 or lower. The Kolmogorov-Smirnov test assesses whether a TF’s motifs are positionally enriched at the center of the target sequences relative to the motifs in the background set of sequences. KS ‘infinite’ enrichment scores were set to 100.

To calculate the number of datasets enriched for a motif we first obtained the average Fisher-log score and KS log score for datasets ChIPped for the same TF. Once we had a set of scores for each TF, we used a binary count of 1 or 0 to indicate whether both of the oPOSSUM enrichment scores passed a threshold based on the standard deviation (SD) of the scores or not (two SD for Fisher-log scores and one SD for KS log scores). This yielded the number of datasets with enrichment around the sequence midpoint for each of the 165 TFs. We then applied a further correction to compensate for the bias created by multiple datasets for families of TFs that recognize the same motif (for example, JUN, JUND, JUNB, AP1, FOS, FOSL1, FOSL2, and BATF PWMs all recognize a TGA(g/c)TCA consensus). The number of motif-family members, minus one, was subtracted from the count of datasets for each of the member TFs. For example, if JUNB were enriched in 20 TF datasets, and 9 of those datasets were ChIPped for a TF that recognizes the JUN-motif family consensus, then a count of eight would be subtracted from 20. The 165 TFs were then ranked according to this final number of associated datasets.

Motif over-representation analysis with shuffled matrices
To assess the probability of a PWM’s predictions being enriched within as many datasets as observed with the zinger PWMs, we shuffled the PFMs of the zingers and fit a distribution to the results.
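As an illustration of the column shuffling used to build these null matrices, the following is a minimal Python sketch rather than the code used in the study. It assumes a PFM stored as a list of columns, each holding four nucleotide counts in a fixed order; the function name and the `fixed_edge` parameter are hypothetical.

```python
import random


def shuffle_pfm_columns(pfm, fixed_edge=2, rng=None):
    """Randomly permute the inner columns of a PFM.

    pfm: list of columns, each a list of four nucleotide counts.
    fixed_edge: number of columns held constant on each side
                (2 in most cases, 3 for the wider CTCF matrix in the text).
    rng: optional random.Random instance for reproducibility.
    """
    rng = rng or random.Random()
    if len(pfm) <= 2 * fixed_edge:
        # Nothing to permute once the edge columns are excluded.
        return [col[:] for col in pfm]
    left = pfm[:fixed_edge]
    core = pfm[fixed_edge:len(pfm) - fixed_edge]
    right = pfm[len(pfm) - fixed_edge:]
    shuffled_core = core[:]
    rng.shuffle(shuffled_core)
    # Copy every column so the input matrix is left untouched.
    return [col[:] for col in left + shuffled_core + right]
```

Permuting whole columns keeps the nucleotide composition and per-column information content of the matrix while changing the pattern it recognizes.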
We generated 100 shuffled matrices as described above. We performed oPOSSUM enrichment analyses with the shuffled PWMs, on the same human datasets as used to generate Figure 1. The oPOSSUM results were evaluated as outlined above. However, we applied the enrichment score thresholds for each dataset as were set for the original PWMs. We then counted the number of datasets within which each shuffled profile was enriched, and fit a zero-adjusted logarithmic distribution (ZALG) to the counts. The distribution was selected using the fitDist() function in the R statistical package GAMLSS 4.1-5 [40], and the parameters describing the distribution were obtained with gamlss family ZALG and the gamlss() function. We tested for goodness-of-fit of the distribution to the data by generating datasets from the random generation function, rZALG, and assaying the similarity of the generated distributions to our data using a chi-squared test. The fitted distribution function was then used to determine the probability of the shuffled PWMs obtaining a result as extreme as the original PWM. The probability was calculated with the density function for the zero-adjusted logarithmic distribution (dZALG).

Motif prediction
Motif prediction was performed with C-code adapted from the TFBS Perl Module [39], reporting relative motif scores. Motifs predicted by a PFM are not permitted to overlap by more than one-fifth the PFM length (this setting is intended to equate to the low information content flanks of a PWM); for example, a 7 bp motif could only overlap a neighboring motif by 1 bp.

For post-oPOSSUM analyses, we predicted the presence of zinger motifs using one PWM per zinger TF motif family as proxy, to prevent redundancies.
CTCF-like motifs were predicted with the CTCF PWM, ETS-like motifs with the GABPA PWM, JUN-like motifs with the JUN PWM, and THAP11 motifs with a THAP11 PWM.

MEME suite tools
MEME [8] analyses were run using the following options: -dna -nmotifs 10 -minw 6 -maxw 15 -maxsize 2000000 -mod zoops -revcomp. TOMTOM [14] analyses were run from the web server with default values, aside from increasing the E-value threshold to 20.

Repeat-masking
Masking of repeat elements was performed using a local installation of RepeatMasker (RMBlast) [41] and RepBase [42], using default settings.

Data processing and statistical analyses
Data processing and statistical analyses were done with a combination of in-house Unix and R scripts (R version 2.14.1) [40]. Throughout the manuscript we report the combination of median and the median absolute deviation (MAD), a measure of dispersion around the median. For a normal distribution the median equals the mean, and the MAD, when scaled by the consistency constant 1.4826 (the default in R’s mad() function), equals the SD.

TFBS-landscape visualization plots
To visualize peakMax proximal enrichment of TF motifs within ChIP-seq datasets, the top scoring predicted motif in each region for the given TF PWM was plotted relative to its signed distance from the peakMax (using the R basic statistical package [40]). The dense horizontal ranges of motif scores spanning all positions relative to the peakMax, such as seen in the Figure 2 plots, are observed for the combination of most PWMs and ChIP-seq datasets, and are likely a mixture of both false and true TFBS predictions. Those motif matches that are distal to the peakMax are anticipated to be less reliable, as the observed frequency is consistent with background rates of motif prediction. If we take enrichment proximal to the peakMax as a measure of confidence for the predictions we can determine a distance threshold and motif score threshold (see next section) at a point where motif frequency proximal to the peakMax is greater than the flanking distal motif frequency.
Using this threshold, we can select a sub-population of peaks that are less likely to have arisen by chance.

Heuristic boundaries of enrichment
We assessed the enrichment of motif distance to the peakMax and motif score, using a heuristic method for topological motif enrichment [18], which we outline in brief here. To determine whether a motif was proximal to the peakMax, we used heuristic distance boundaries derived from the density of the top scoring motif for each 1,001 bp region. We identified the location, relative to the 501st bp, at which the density of motifs exceeds that of the distal region (approximately 175 to 500 bp distant from the peakMax). This change in density is observed in the TFBS-landscape plots of Figure 2, where there is a constant density of motif scores in the distal regions and an increase in the density of motif scores within approximately 100 bp of the peakMax. The heuristic distance boundaries were set at the transition point. A similar procedure was applied to determine a threshold for the motif score, where the motif score threshold was set at the point where the motif enrichment proximal to the peakMax was at least 20% higher than the flanking enrichment. The region defined by the distance boundaries and the motif score threshold was termed the ‘enrichment zone’. The enrichment zone was subsequently used to specify peakMax enriched proximal motifs. On average, an enrichment boundary was ±90 bp from the peakMax, and the motif relative score threshold was 82.

The heuristic analysis of motif enrichment across datasets reports that on average a CTCF zinger motif is enriched above a motif score threshold of 79, while for JUN the average was 86, for GABPA it was 83, and for THAP11 it was 84. CTCF and THAP11 in particular consistently have enrichment above a motif score threshold of 85 that is strongly distinct from the flanking regions of similar score range, as seen in Figure 2A and D.
The regions that flank the peakMax proximal enrichment in Figure 2 are representative of the background expectation of a PWM’s motif prediction. Thus, to reduce the presence of false positive predictions in subsets of peaks we analyzed, we selected, where noted in the main text, peaks with a motif scoring above the motif score threshold of 85. The use of a single threshold permits the processing of data as a single unit. A motif score of 85 is also the default threshold score in the oPOSSUM software.

Background expectation of motif predictions
To estimate the proportion of regions in a given dataset in which motifs may result from background motif prediction, we compared the count of regions with motifs in the enrichment zone relative to the count of regions with motifs at least 50 bp outside the enrichment zone. The distal ‘zone’ from which counts were determined was set to be the same length of sequence as the enrichment zone; that is, if the enrichment zone was 200 bp wide, then the distal zone was also 200 bp wide (100 bp from 5′ and 100 bp from 3′ of the region center). To estimate the proportion of regions in the enrichment zone with false positives, we divided the number of regions with motifs in the distal zone by the number of regions with motifs in the enrichment zone. See Additional file 4: Text S1 for the estimated overall background expectation of ChIPped TF and zinger motif prediction.

Calculating the background corrected estimates of ChIPped TF and zinger motif proportions within a dataset was done by subtracting the distal zone count from the enrichment zone count for the ChIPped TF or each zinger. For the ChIPped TF, the corrected count was divided by the size of the dataset.
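The two ratios described in this section can be written out as a short Python sketch. This is an illustration under stated assumptions, not the in-house scripts: the function names are hypothetical, and the counts in the usage lines are invented purely to show the arithmetic.

```python
def estimated_false_positive_fraction(enrichment_zone_count, distal_zone_count):
    """Share of enrichment-zone regions expected from background prediction."""
    return distal_zone_count / enrichment_zone_count


def background_corrected_fraction(zone_counts, dataset_size):
    """Background-corrected proportion of a dataset carrying a motif.

    zone_counts: list of (enrichment_zone_count, distal_zone_count) pairs.
                 One pair for the ChIPped TF; several pairs are summed
                 when motifs are pooled, as for the zingers.
    dataset_size: total number of peaks in the dataset.
    """
    corrected = sum(zone - distal for zone, distal in zone_counts)
    return corrected / dataset_size


# Hypothetical counts, for illustration only.
tf_fraction = background_corrected_fraction([(300, 60)], 1000)
fp_fraction = estimated_false_positive_fraction(300, 60)
```

With these invented numbers, 240 of 1,000 peaks would be attributed to the motif after correction, and one fifth of the enrichment-zone regions would be expected from background.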
For the four zingers, the four corrected counts were first summed, and then divided by the size of the dataset.

Heatmaps and correlation between zinger motifs
Heatmaps were created with the heatmap.2() function from the R statistical package gplots, with the distance measure as ‘manhattan’ and the ‘ward’ agglomeration method for clustering.

The heatmap of zinger motif peak log2 fold enrichment was generated using the log2 fold enrichment of zinger motif peaks with motif score 85 or greater, relative to distal-zinger peaks of similar score range. Where the fold enrichment was below 1.5 we assigned a minimum value, represented as a grey colour in the heatmap, to facilitate visualization.

A heatmap of zinger motif inter-dependency within datasets was generated using the set of zinger motif peaks with motif scores equal to or greater than 85, and a 2×2 confusion matrix for each pair of zinger motifs. A Fisher exact P value <0.001 was taken to indicate significance and the sign of the log odds ratio to indicate whether a positive or negative association existed. The values used to generate the heatmap were 1-pvalue for positive associations, -1*(1-pvalue) for the negative associations, and 0 for the non-significant P values.

The pairwise correlation of zinger motif peaks for the different zingers, across datasets, was assessed using the log2 fold enrichment values generated for the above heatmap. The correlations were evaluated with both Pearson correlation and Spearman’s rank order correlation (R basic statistical package: cor() function).

ChIP-seq controls
We obtained controls from a range of cell types and ENCODE consortium groups, and processed the mapped reads with FindPeaks. We used the peak height to rank the control peaks, and then selected the top 70,000 peaks. The number of peaks was chosen to match the average size of the ChIP-seq datasets.
The peaks were then scored with the zinger PWMs and the enrichment of the motifs with respect to the peakMax was evaluated.

Evaluating proximity of zinger motif peaks to genomic features
We compared the genomic feature proximity of zinger motif peaks with those peaks containing the ChIPped TF’s motif and lacking zinger motifs. We measured the distance between the peakMax and the middle of the feature, which in the case of transcription start sites (TSSs) was simply the starting coordinate of the transcript. We used only those datasets for which we had at least 200 zinger motif peaks. The number of peaks that were within 500 bp, 1 kb or 5 kb of the TSS, or within 500 bp of CpG islands, conserved regions or repeat-masked regions were compared between the zinger motif peaks and the ChIPped TF peaks using a Fisher exact test. For the results of a zinger to be considered striking we required that at least 60% of the datasets with zinger motifs show statistical significance in one direction, that is, either 60% of datasets tend to be proximal to a feature, or 60% of datasets tend to be distal to a feature.

Comparing zinger regions from non-zinger ChIP-seq datasets to peaks ChIPped by the zinger TF
We assessed the proximity of the zinger motif peaks with a high scoring zinger motif (score >85) to ChIP-seq peaks ChIPped by the zinger’s TF to determine whether the zinger motif peaks found in datasets for which the zinger is not the targeted TF are potential bona fide binding regions for the zinger TF. In all cases we required that the zinger motif peaks and zinger TF’s ChIP-seq data be from the same cell line. To call a zinger motif peak in agreement with the zinger TF’s ChIP-seq data we required that the peakMax of the zinger motif peak be within 100 bp of a peakMax in the zinger TF’s dataset.
This 100 bp distance reflects the average range of enrichment for a TF’s motif relative to the peakMax. The assessment of the distal-zinger peaks, that is, those peaks with motifs not proximal to the peakMax, relative to the zinger TF’s ChIP-seq dataset was performed in the same manner.

Generation of ChIP-seq peak neighborhoods
To determine the degree of recurrence for a zinger motif peak region across multiple datasets we pooled all zinger motif peaks that had a high scoring (score >85) zinger motif from all datasets. We then calculated the inter-zinger distances between each zinger motif peak and its nearest neighbour in the 3′ direction on the plus strand. Consecutive peaks that were within 50 bp of their nearest neighbor were merged into a ‘zinger neighborhood’. The distance of 50 bp was chosen as a stringent measure of proximity between zinger motif peaks. For each neighborhood, we counted the number of unique TFs that ChIPped the zinger motif peaks and the number of unique cell lines. We provide the coordinates for the zinger neighborhoods in Additional file 12: Dataset S1.

We generated neighborhoods from the remaining two groups of peaks in a similar manner: those with the ChIPped TF motifs and lacking zinger motifs (‘ChIPped TF neighborhoods’), and those without either motif (‘unidentified motif neighborhoods’). For the ChIPped TF neighborhoods we required that there be a high scoring motif (score >85) for the ChIPped TF. Neighborhood widths were <150 bp on average. As stated in the main text, zinger motif peaks may be bona fide binding regions for the zinger TF.
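The neighborhood construction described above can be sketched in Python as follows. This is a minimal illustration, not the scripts used in the study. It assumes each pooled peak is represented by a single coordinate (for example its peakMax), which the text does not specify; the function name, the tuple layout, and the dictionary keys are all hypothetical.

```python
from collections import defaultdict


def build_neighborhoods(peaks, max_gap=50):
    """Merge consecutive peaks lying within max_gap bp into neighborhoods.

    peaks: iterable of (chrom, position, tf, cell_line) tuples pooled
           across datasets; position is one coordinate per peak
           (an assumption made for this sketch).
    Returns a list of dicts recording the span of each neighborhood and
    the unique TFs and cell lines contributing peaks to it.
    """
    by_chrom = defaultdict(list)
    for chrom, pos, tf, cell in peaks:
        by_chrom[chrom].append((pos, tf, cell))

    neighborhoods = []
    for chrom in sorted(by_chrom):
        current = None
        # Walk along the plus strand, comparing each peak with the
        # last peak already placed in the open neighborhood.
        for pos, tf, cell in sorted(by_chrom[chrom]):
            if current is not None and pos - current["end"] <= max_gap:
                current["end"] = pos
                current["tfs"].add(tf)
                current["cell_lines"].add(cell)
            else:
                current = {"chrom": chrom, "start": pos, "end": pos,
                           "tfs": {tf}, "cell_lines": {cell}}
                neighborhoods.append(current)
    return neighborhoods
```

The sizes of the `tfs` and `cell_lines` sets give the per-neighborhood counts of unique TFs and unique cell lines.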
Thus, after generating the neighborhood sets, we removed from the ChIPped TF neighborhoods those regions that were within 300 bp (measured centre to centre) of the zinger neighborhoods to ensure that comparisons were made between distinct neighborhood sets. We also removed from the ChIPped TF neighborhoods those regions that overlapped the unidentified motif neighborhoods in the same manner.

Neighborhood proximity to cohesin and polycomb repressive complex
To assess whether a neighborhood is proximal to a region occupied by cohesin or the polycomb repressive complex (PRC) 1 or 2, we generated three datasets by combining the ENCODE ChIP-seq data for the cohesin proteins, RAD21 and SMC3, into one dataset; combining the ENCODE ChIP-seq data for CBX to form a dataset for PRC1 occupancy; and lastly combining EZH2 ChIP-seq data into a dataset for PRC2 occupancy. We then assessed how many zinger neighborhoods were situated within 500 bp of one of the three protein complexes, measuring from the center of a neighborhood to the ChIP-seq peakMax, and compared this to the two other neighborhood sets.

Additional files
Additional file 1: Figure S1. Zinger motifs are enriched across multiple mouse ChIP-seq datasets. (A) The histogram displays the results of TFBS motif enrichment analysis on 81 mouse ChIP-seq datasets generated with the oPOSSUM 3.0 software. Along the x-axis is the fraction of datasets that displayed enrichment near the peakMax for a TF profile. The y-axis is the number of TF profiles that were found enriched for a given fraction of datasets. The profiles most frequently observed to be enriched are labeled on the histogram. (B) The binding site logos of the nine TF binding models with enriched motifs across the greatest number of datasets, manually grouped by motif similarity. Each logo depicts position along the x-axis and information content (that is, pattern strength) along the y-axis.

Additional file 2: Figure S2.
Zinger motifs are enriched across multiple human datasets after masking the ChIPped TF’s motif. (A) The histogram displays the results of TFBS motif enrichment analysis on 281 human ChIP-seq datasets in which the ChIPped TF’s motifs were masked. Results were generated with the oPOSSUM 3.0 software. Along the x-axis is the fraction of datasets that displayed enrichment for a TF profile. The y-axis is the number of TF profiles that were found enriched near the peakMax for a given fraction of datasets. The profiles most frequently observed to be enriched are labeled on the histogram. (B) The binding site logos of the TF binding models with enriched motifs across the greatest number of datasets, manually grouped by motif similarity. Each logo depicts position along the x-axis and information content (that is, pattern strength) along the y-axis.

Additional file 3: Figure S3. DNaseI-seq and Faire-seq datasets are enriched for zinger motifs. The histograms display the results of TFBS motif enrichment analysis on (A) DNaseI-seq datasets and (B) Faire-seq datasets. Results were generated with the oPOSSUM 3.0 software. Along the x-axis is the fraction of datasets that displayed enrichment for a TF profile. The y-axis is the number of TF profiles that were found enriched for a given fraction of datasets. The profiles most frequently observed to be enriched are labeled on the histogram. (C) The binding site logos of the TF binding models with enriched motifs across the greatest number of either DNaseI-seq or Faire-seq datasets. The logos are manually grouped by motif similarity, except for the bottom row. Each logo depicts position along the x-axis and information content (that is, pattern strength) along the y-axis.

Additional file 4: Text S1. Additional observations regarding zinger motifs and zinger motif peaks.

Additional file 5: Figure S4. ChIP-seq datasets for non-sequence-specific proteins are enriched for zinger motifs.
The enrichment plots display the location of the top scoring motif for each peak relative to the peakMax (the peakMax is at 0) on the x-axis, while the score of the motif is plotted on the y-axis. The adjacent line plots display the fraction of motifs observed in 5 bp increments. The logo reflecting the binding specificity for each zinger appears above the related enrichment plot. (A) CTCF motif predictions on ChIP-seq data for WHIP, a helicase interacting protein. (B) JUN motif predictions on ChIP-seq data for p300, a histone acetyltransferase. (C) GABPA motif predictions on ChIP-seq data for CCNT2, a cyclin regulator of CDK9 kinase. (D) THAP11 motif predictions on ChIP-seq data for CHD2, a chromodomain helicase.

Additional file 6: Figure S5. Input and mock-IP control data are enriched for zinger motifs. The enrichment plots display the location of the top scoring CTCF motif for each peak relative to the peakMax (the peakMax is at 0) on the x-axis, while the score of the motif is plotted on the y-axis. The adjacent line plots display the fraction of CTCF motifs observed in 5 bp increments. The logo reflecting the binding specificity for CTCF appears above the related enrichment plot. (A) Input regions from the HUVEC cell line. (B) IgG rabbit mock-IP regions from GM12878 cells.

Additional file 7: Figure S6. Shuffled zinger PWMs are not enriched proximal to the peakMax. The enrichment plots display the location of the top scoring motif for each peak relative to the peakMax (the peakMax is at 0) on the x-axis, while the score of the motif is plotted on the y-axis. The adjacent line plots display the fraction of motifs observed in 5 bp increments. The logo reflecting the binding specificity for each zinger appears above the related enrichment plot. (A) Enrichment of CTCF motifs on the NRF1 (GM12878) dataset. (B) Enrichment of shuffled-CTCF motifs on the same NRF1 (GM12878) dataset. (C) Enrichment of JUN motifs on the TCF7L2 (Hct116) dataset. (D) Enrichment of a shuffled-JUN motif on the same TCF7L2 (Hct116) dataset.

Additional file 8: Figure S7. Untreated STAT1 ChIP-seq data show strong zinger motif enrichment and not STAT1 motif enrichment. The enrichment plots display the location of the top scoring motif for each peak relative to the peakMax (the peakMax is at 0) on the x-axis, while the score of the motif is plotted on the y-axis. The adjacent line plots display the fraction of motifs observed in 5 bp increments. The logo reflecting the binding specificity for each zinger appears above the related enrichment plot. (A) STAT1 motif predictions on STAT1 ChIP-seq from untreated GM12878 cells. No STAT1 motif. (B) CTCF motif predictions on STAT1 ChIP-seq from untreated GM12878 cells. (C) STAT1 motif predictions on STAT1 ChIP-seq from IFNγ treated HeLa cells. STAT1 motif is present. (D) CTCF motif predictions on STAT1 ChIP-seq from IFNγ treated HeLa cells.

Additional file 9: Figure S8. The distribution of zinger motif content varies across ChIP-seq datasets. (A, B) For those datasets with at least a 1% zinger component, the histograms present the distribution of observed zinger motif peak content. The x-axis reports the proportion of zinger motif peaks within an analyzed dataset, and the y-axis the frequency of such observations. The black vertical dashed line represents the mean, the blue vertical dashed line represents the median, and the red vertical dashed line represents the point where two-thirds of the datasets are to the right of the line. The asterisk indicates the maximum zinger proportion, excluding outliers. (A) Analysis performed on entire ChIP-seq datasets. (B) Analysis on the set of peaks unaccounted for by the ChIPped TF motif. (C) A heatmap of the individual zingers’ motif peaks log2 fold enrichment in the set of peaks unaccounted for by the ChIPped TF and with a strong motif score (score 85 or greater). Fold enrichment less than 1.5 is grey. The rows are individual datasets, the columns are the zingers. (D) A scatterplot of the proportions of zinger motif peaks (y-axis) and ChIPped TF motif peaks (x-axis) in each dataset.

Additional file 10: Figure S9. The proportion of a dataset with zinger motifs is not dependent on cell line or the ChIPped TF. (A) The x-axis is the proportion of datasets composed of zinger motif peaks. The y-axis is a density value reflecting the fraction of datasets with zinger motifs. The five cell lines are K562 (black), GM12878 (blue), HeLa (red), H1-hESC (green), and HepG2 (magenta). There are no significant differences between the distributions per Wilcoxon test P values. (B) The TFs analyzed are listed on the horizontal axis. The y-axis is the maximum difference of zinger proportions observed between two ChIP-seq datasets for the same TF.

Additional file 11: Figure S10. Zinger motifs and zinger motif peaks are not strongly correlated. (A) A heatmap of significance for inter-dependence between pairs of zinger motifs in zinger motif peaks. Positive associations with a significant Fisher exact P value (P value <0.001) are yellow, negative associations with a significant Fisher exact P value are red, and non-significant P values are grey. The color density reflects P value significance, with the densest colors being P values closest to 0. The columns are individual datasets; the rows are the six possible zinger pairs. (B) A correlation matrix presenting both Spearman’s rank (lower diagonal) and Pearson (upper diagonal) correlation coefficients for the pairwise association of zinger motif peak enrichment within the same ChIP-seq datasets.

Additional file 12: Dataset S1. Genomic coordinates for zinger neighborhoods (tab delimited).

Additional file 13: Table S1. The top 20 motifs from motif over-representation analysis on HOT regions.

Additional file 14: Dataset S2. Position frequency matrices.

Abbreviations
ChIP-seq: Chromatin immunoprecipitation and high-throughput sequencing; peakMax: Peak local maximum; PFM: Position frequency matrix; PWM: Position weight matrix; TF: Transcription factor; TFBS: Transcription factor binding site.

Competing interests
The authors declare that they have no competing interests.

Authors’ contributions
RWH performed all analyses. RWH and WWW designed the study and wrote the manuscript. Both authors read and approved the final manuscript.

Acknowledgements
We thank Dr. Sorana Morrissy, Dr. Anthony Fejes, Dr. Francois Parcy, Dr. Maja Tarailo-Graovac, Dr. Anthony Mathelier, Chih-yu Chen, and the other members of the Wasserman lab for suggestions and feedback, Miroslav Hatas for systems support, and Dora Pak for management support. We are indebted to the researchers around the globe who generated the ChIP-seq data. The work was supported by funding from the National Institutes of Health (USA) grant 1R01GM084875, the Canadian Institutes of Health Research, the Natural Sciences and Engineering Research Council (NSERC), and GenomeBC and GenomeCanada (ABC4DE Project). RWH was supported by fellowships from the Canadian Institutes of Health Research (CIHR) and the Michael Smith Foundation for Health Research (MSFHR).

Author details
1Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, University of British Columbia, Vancouver, BC, Canada. 2Bioinformatics Graduate Program, University of British Columbia, Vancouver, BC, Canada. 3Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada.

Received: 25 April 2014 Accepted: 29 July 2014 Published: 29 July 2014

References
1. ENCODE Project Consortium, Bernstein BE, Birney E, Dunham I, Green ED, Gunter C, Snyder M: An integrated encyclopedia of DNA elements in the human genome. Nature 2012, 489:57–74.
2. Pepke S, Wold B, Mortazavi A: Computation for ChIP-seq and RNA-seq studies. Nat Methods 2009, 6:S22–S32.
3.
Kidder BL, Hu G, Zhao K: ChIP-seq: technical considerations for obtaininghigh-quality data. Nat Immunol 2011, 12:918–922. 1529-2908.4. Chen Y, Negre N, Li Q, Mieczkowska JO, Slattery M, Liu T, Zhang Y, Kim TK,He HH, Zieba J, Ruan Y, Bickel PJ, Myers RM, Wold BJ, White KP, Lieb JD, LiuXS: Systematic evaluation of factors influencing ChIP-seq fidelity. NatMethods 2012, 9:609–614.5. Landt SG, Marinov GK, Kundaje A, Kheradpour P, Pauli F, Batzoglou S,Bernstein BE, Bickel P, Brown JB, Cayting P, Chen Y, DeSalvo G, Epstein C,Fisher-Aylor KI, Euskirchen G, Gerstein M, Gertz J, Hartemink AJ, HoffmanMM, Iyer VR, Jung YL, Karmakar S, Kellis M, Kharchenko PV, Li Q, Liu T, LiuXS, Ma L, Milosavljevic A, Myers RM, et al: ChIP-seq guidelines andpractices of the ENCODE and modENCODE consortia. Genome Res 2012,22:1813–1831.6. Wilbanks EG, Facciotti MT: Evaluation of algorithm performance in ChIP-seqpeak detection. PLoS One 2010, 5:e11471.7. Bailey TL, Machanick P: Inferring direct DNA binding from ChIP-seq.Nucleic Acids Res 2012, 40:e128.8. Bailey TL, Elkan C: Fitting a mixture model by expectation maximizationto discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol 1994,2:28–36.9. Kwon AT, Arenillas DJ, Hunt RW, Wasserman WW: oPOSSUM-3: advancedanalysis of regulatory motif over-representation across genes or ChIP-seqdatasets. G3 2012, 2:987–1002.10. Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, Cheng JX, Murre C,Singh H, Glass CK: Simple combinations of lineage-determining transcriptionfactors prime cis-regulatory elements required for macrophage and B cellidentities. Mol Cell 2010, 38:576–589.11. Rozowsky J, Euskirchen G, Auerbach RK, Zhang ZD, Gibson T, Bjornson R,Carriero N, Snyder M, Gerstein MB: PeakSeq enables systematic scoring ofChIP-seq experiments relative to controls. Nat Biotechnol 2009, 27:66–75.12. Teytelman L, Ozaydin B, Zill O, Lefrancois P, Snyder M, Rine J, Eisen MB:Impact of chromatin structures on DNA processing for genomicanalyses. 
PLoS One 2009, 4:e6700.13. Auerbach RK, Euskirchen G, Rozowsky J, Lamarre-Vincent N, Moqtaderi Z,Lefrancois P, Struhl K, Gerstein M, Snyder M: Mapping accessible chromatinregions using Sono-seq. Proc Natl Acad Sci U S A 2009, 106:14926–14931.14. Gupta S, Stamatoyannopoulos JA, Bailey TL, Noble WS: Quantifyingsimilarity between motifs. Genome Biol 2007, 8:R24.Yusuf D, Lenhard B, Wasserman WW, Sandelin A: JASPAR 2010: the greatlyexpanded open-access database of transcription factor binding profiles.Nucleic Acids Res 2010, 38:D105–D110.39. Lenhard B, Wasserman WW: TFBS: Computational framework for transcriptionfactor binding site analysis. Bioinformatics 2002, 18:1135–1136.40. R Core Team: R: A language and environment for statistical computing.2141st edition. Vienna: R Foundation for Statistical Computing; 2014.http://www.R-project.org/.41. Smit A, Hubley R, Green P: RepeatMasker 4.0.3 edition. Institute for SystemsBiology: Seattle, WA; 2013.42. Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J:Repbase Update, a database of eukaryotic repetitive elements.Cytogenet Genome Res 2005, 110:462–467. 1424-8581.doi:10.1186/s13059-014-0412-4Cite this article as: Worsley Hunt and Wasserman: Non-targetedtranscription factors motifs are a systemic component of ChIP-seq datasets.Genome Biology 2014 15:412.Worsley Hunt and Wasserman Genome Biology 2014, 15:412 Page 16 of 16http://genomebiology.com/2014/15/7/41215. Ngondo-Mbongo RP, Myslinski E, Aster JC, Carbon P: Modulation of geneexpression via overlapping binding sites exerted by ZNF143, Notch1 andTHAP11. Nucleic Acids Res 2013, 41:4000–4014.16. Boyle AP, Song L, Lee BK, London D, Keefe D, Birney E, Iyer VR, Crawford GE,Furey TS: High-resolution genome-wide in vivo footprinting of diversetranscription factors in human cells. Genome Res 2011, 21:456–464.17. 
Shu W, Chen H, Bo X, Wang S: Genome-wide analysis of the relationshipsbetween DNaseI HS, histone modifications and gene expressionreveals distinct modes of chromatin domains. Nucleic Acids Res 2011,39:7428–7443.18. Worsley Hunt R, Mathelier A, Del Peso L, Wasserman WW: Improvinganalysis of transcription factor binding sites within ChIP-seq data basedon topological motif enrichment. BMC Genomics 2014, 15:472.19. Yip KY, Cheng C, Bhardwaj N, Brown JB, Leng J, Kundaje A, Rozowsky J,Birney E, Bickel P, Snyder M, Gerstein M: Classification of human genomicregions based on experimentally determined binding sites of more than100 transcription-related factors. Genome Biol 2012, 13:R48.20. Schmidt D, Schwalie PC, Ross-Innes CS, Hurtado A, Brown GD, Carroll JS,Flicek P, Odom DT: A CTCF-independent role for cohesin in tissue-specifictranscription. Genome Res 2010, 20:578–588.21. Kagey MH, Newman JJ, Bilodeau S, Zhan Y, Orlando DA, van Berkum NL,Ebmeier CC, Goossens J, Rahl PB, Levine SS, Taatjes DJ, Dekker J, Young RA:Mediator and cohesin connect gene expression and chromatinarchitecture. Nature 2010, 467:430–435.22. Kim YJ, Cecchini KR, Kim TH: Conserved, developmentally regulatedmechanism couples chromosomal looping and heterochromatin barrieractivity at the homeobox gene A locus. Proc Natl Acad Sci U S A 2011,108:7391–7396.23. Faure AJ, Schmidt D, Watt S, Schwalie PC, Wilson MD, Xu H, Ramsay RG,Odom DT, Flicek P: Cohesin regulates tissue-specific expression bystabilizing highly occupied cis-regulatory modules. Genome Res 2012,22:2163–2175.24. Parelho V, Hadjur S, Spivakov M, Leleu M, Sauer S, Gregson HC, Jarmuz A,Canzonetta C, Webster Z, Nesterova T, Cobb BS, Yokomori K, Dillon N,Aragon L, Fisher AG, Merkenschlager M: Cohesins functionally associatewith CTCF on mammalian chromosome arms. Cell 2008, 132:422–433.25. 
Wendt KS, Yoshida K, Itoh T, Bando M, Koch B, Schirghuber E, Tsutsumi S,Nagae G, Ishihara K, Mishiro T, Yahata K, Imamoto F, Aburatani H, Nakao M,Imamoto N, Maeshima K, Shirahige K, Peters JM: Cohesin mediatestranscriptional insulation by CCCTC-binding factor. Nature 2008,451:796–801.26. Schaaf CA, Misulovin Z, Gause M, Koenig A, Gohara DW, Watson A, DorsettD: Cohesin and polycomb proteins functionally interact to controltranscription at silenced and active genes. PLoS Genet 2013, 9:e1003560.27. Loverdo C, Benichou O, Voituriez R, Biebricher A, Bonnet I, Desbiolles P:Quantifying hopping and jumping in facilitated diffusion of DNA-binding proteins. Phys Rev Lett 2009, 102:188101.28. Yan J, Enge M, Whitington T, Dave K, Liu J, Sur I, Schmierer B, Jolma A,Kivioja T, Taipale M, Taipale J: Transcription factor binding in human cellsoccurs in dense clusters formed around cohesin anchor sites. Cell 2013,154:801–813.29. Cheung MS, Down TA, Latorre I, Ahringer J: Systematic bias in high-throughputsequencing data and its correction by BEADS. Nucleic Acids Res 2011, 39:e103.30. Teytelman L, Thurtle DM, Rine J, van Oudenaarden A: Highly expressed lociare vulnerable to misleading ChIP localization of multiple unrelatedproteins. Proc Natl Acad Sci U S A 2013, 110:18602–18607.31. Rosenbloom KR, Sloan CA, Malladi VS, Dreszer TR, Learned K, Kirkup VM,Wong MC, Maddren M, Fang R, Heitner SG, Lee BT, Barber GP, Harte RA,Diekhans M, Long JC, Wilder SP, Zweig AS, Karolchik D, Kuhn RM, HausslerD, Kent WJ: ENCODE data in the UCSC Genome Browser: year 5 update.Nucleic Acids Res 2013, 41:D56–D63.32. Chen X, Xu H, Yuan P, Fang F, Huss M, Vega VB, Wong E, Orlov YL, ZhangW, Jiang J, Loh YH, Yeo HC, Yeo ZX, Narang V, Govindarajan KR, Leong B,Shahab A, Ruan Y, Bourque G, Sung WK, Clarke ND, Wei CL, Ng HH:Integration of external signaling pathways with the core transcriptionalnetwork in embryonic stem cells. Cell 2008, 133:1106–1117.33. 
Tiwari VK, Stadler MB, Wirbelauer C, Paro R, Schubeler D, Beisel C: Achromatin-modifying function of JNK during stem cell differentiation.Nat Genet 2012, 44:94–100.34. Lister R, Pelizzola M, Dowen RH, Hawkins RD, Hon G, Tonti-Filippini J, NeryJR, Lee L, Ye Z, Ngo QM, Edsall L, Antosiewicz-Bourget J, Stewart R, Ruotti V,Submit your next manuscript to BioMed Centraland take full advantage of: • Convenient online submission• Thorough peer review• No space constraints or color figure charges• Immediate publication on acceptance• Inclusion in PubMed, CAS, Scopus and Google Scholar• Research which is freely available for redistributionMillar AH, Thomson JA, Ren B, Ecker JR: Human DNA methylomes at baseresolution show widespread epigenomic differences. Nature 2009,462:315–322.35. Schmidt D, Wilson MD, Ballester B, Schwalie PC, Brown GD, Marshall A,Kutter C, Watt S, Martinez-Jimenez CP, Mackay S, Talianidis I, Flicek P,Odom DT: Five-vertebrate ChIP-seq reveals the evolutionary dynamics oftranscription factor binding. Science 2010, 328:1036–1040.36. Fejes AP, Robertson G, Bilenky M, Varhol R, Bainbridge M, Jones SJ: FindPeaks3.1: a tool for identifying areas of enrichment from massively parallelshort-read sequencing technology. Bioinformatics 2008, 24:1729–1730.37. Kuhn RM, Haussler D, Kent WJ: The UCSC genome browser and associatedtools. Brief Bioinform 2013, 14:144–161.38. Portales-Casamar E, Thongjuea S, Kwon AT, Arenillas D, Zhao X, Valen E,Submit your manuscript at www.biomedcentral.com/submit
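The line plots described in the additional file captions report the fraction of top-scoring motifs observed in 5 bp increments around the peakMax. The binning behind such a plot might be sketched as follows. This is a minimal Python illustration and not the authors' code: the function name `offset_fractions`, the convention of labelling each bin by its left edge, and the example offsets are all assumptions made for this sketch.

```python
from collections import Counter


def offset_fractions(offsets, bin_width=5):
    """Return the fraction of motif hits falling in each bin.

    `offsets` are signed distances (in bp) from the peakMax, which sits
    at position 0, to the top-scoring motif of each peak. Bins are
    labelled by their left edge (an assumed convention), so with the
    default width bin 0 covers offsets 0..4 and bin -5 covers -5..-1.
    """
    if not offsets:
        return {}
    # Floor division maps every offset to the left edge of its bin,
    # including negative offsets (e.g. -3 // 5 * 5 == -5).
    counts = Counter((off // bin_width) * bin_width for off in offsets)
    total = len(offsets)
    return {start: counts[start] / total for start in sorted(counts)}


# Hypothetical motif offsets for six peaks (illustrative values only).
example_offsets = [-12, -3, 0, 2, 4, 17]
fractions = offset_fractions(example_offsets)
for start, frac in fractions.items():
    print(f"[{start}, {start + 4}] bp: {frac:.3f}")
```

A sharp rise in the fractions for bins near 0 would correspond to the enrichment proximal to the peakMax that the captions describe; a flat profile would correspond to the shuffled-PWM controls.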
