UBC Faculty Research and Publications

Derivation of consensus inactivation status for X-linked genes from genome-wide studies Balaton, Bradley P; Cotton, Allison M; Brown, Carolyn J Dec 30, 2015



RESEARCH  Open Access

Derivation of consensus inactivation status for X-linked genes from genome-wide studies

Bradley P. Balaton1, Allison M. Cotton2 and Carolyn J. Brown1*

Abstract

Background: X chromosome inactivation is the epigenetic silencing of the majority of the genes on one of the X chromosomes in XX therian mammals. In humans, approximately 15 % of genes consistently escape from this inactivation and another 15 % of genes vary between individuals or tissues in whether they are subject to, or escape from, inactivation. Multiple studies have provided inactivation status calls for a large subset of the genes on the X chromosome; however, these studies vary in which genes they were able to make calls for and, in some cases, in which call they give a specific gene.

Methods: This analysis aggregated three published studies that have examined X chromosome inactivation status of genes across the X chromosome, generating consensus calls and identifying discordancies. The impact of expression level and chromosomal location on X chromosome inactivation status was also assessed.

Results: Overall, we assigned a consensus XCI status to 639 genes, including 78 % of protein-coding genes expressed outside of the testes, with a lower frequency for non-coding RNA and testis-specific genes. Study-specific discordancies suggest that there may be instability of XCI during cell culture and also highlight study-specific variations in call type.
We observe an enrichment of discordant genes at boundaries between genes subject to and escaping from inactivation.

Conclusions: This study has compiled a comprehensive list of X-chromosome inactivation statuses for genes and also discovered some biases which will help guide future studies examining X-chromosome inactivation.

Keywords: X-chromosome inactivation, Dosage compensation, Escape from X-chromosome inactivation, Somatic cell hybrids, Allelic imbalance, DNA methylation

Background

In mammals, sex is chromosomally determined with the presence or absence of the Y chromosome generally resulting in XY males and XX females. There is clear sexual dimorphism, with major contributing factors including expression of sex-linked genes and differential hormone regulation of some gene pathways [1–3]. Sex differences can have effects on disease predisposition and sensitivity to certain therapies, leading funding agencies, including the NIH in the USA and the Canadian Institutes of Health Research (CIHR) in Canada, to include the consideration of sex differences in their criteria for funding. The sex difference in expression of most X-linked genes is minimized by X-chromosome inactivation (XCI); however, some genes are known to escape from XCI, leading to male-female expression differences, particularly in humans [4].

XCI is the inactivation of one of the two X chromosomes (X) in XX eutherian females as a form of dosage compensation between XX females and XY males [5, 6]. Which X is inactivated is randomly chosen in each cell early in development and maintained in that cell's descendants, resulting in females being a mosaic of which parental X is inactive. XCI allows XX females and XY males to have similar levels of expression for the majority of X-linked genes [2, 7].
However, not all X-linked genes are fully inactivated on the inactive X (Xi). Different studies suggest that between 8 [8] and 15 % [9] of X-linked genes escape from XCI and are expressed from the Xi at a level at least 10 % that of the active X (Xa). Another 10 [9] to 32 % [8] of genes on the X are variable in their XCI status between individuals or tissues. Comparatively, in mice, 3–7 % of X-linked genes escape from XCI, depending on tissue and strain [10]. Such differences in which genes escape from XCI, along with other differences in XCI between mouse and human, challenge the use of mouse as a model organism for predicting the XCI status of X-linked genes in humans.

* Correspondence: carolyn.brown@ubc.ca
1 Department of Medical Genetics, Molecular Epigenetics Group, Life Sciences Institute, University of British Columbia, Vancouver, Canada
Full list of author information is available at the end of the article

© 2015 Balaton et al. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Balaton et al. Biology of Sex Differences (2015) 6:35 DOI 10.1186/s13293-015-0053-7

Examples of genes that escape from XCI are the genes in the pseudoautosomal region (PAR1) at the short arm terminus of the X chromosome [9]. There are two PARs on the X, and they are homologous to the PARs at the termini of the Y chromosome. These regions recombine during male meiosis and are therefore identical between the X and Y.
PAR genes do not need further dosage compensation because XX females and XY males have the same copy number. Interestingly, the PAR2 genes on the long arm of the X chromosome achieve dosage equivalence differently, as they are subject to XCI while also being silenced on the Y chromosome [11].

Knowing which genes escape from XCI is important because genes that escape from XCI can contribute to male-female sex differences. Multiple studies have shown an enrichment of genes with sex-biased expression on the X chromosome [2, 12, 13]. A female expression bias predominates on the X (5 % of genes); however, some X-linked genes do show a male expression bias (1.7 % of genes) [2]. Analysis of the Genotype-Tissue Expression (GTEx) pilot project data shows that most of the 29 X chromosome genes with a female bias escape from XCI, while the eight X chromosome genes showing a male expression bias were predominantly located in the PAR [12]. In mouse brain samples, 12 % of genes differentially expressed between the sexes are located on the sex chromosomes, and these genes have a larger fold change between males and females than other differentially expressed genes [13].

One consequence of escape from XCI and incomplete dosage compensation is that there will be altered gene expression associated with X chromosome aneuploidies. Having a single X without a Y chromosome (Turner's syndrome) is more severe in humans than in mice [6], and this is likely linked to differences in how many genes escape from XCI between the species [4]. In patients with Klinefelter's syndrome (XXY males), some genes that escape from XCI were found to be overexpressed and correlated with negative phenotypes [14]. Additionally, escape from XCI can affect disease susceptibility. X-linked tumor suppressor genes which escape from XCI, an example being UTX [15], require only one mutation to be knocked out in males but need two for females to be affected.
Another example of a gene which escapes from XCI with sex-specific disease effects is DDX3X, which has different severities of phenotype and disease mechanisms between males and females [16].

Determining which genes escape from XCI will also further our overall understanding of XCI, which has been a useful model system for understanding epigenetic regulation at other loci, especially those controlled by long non-coding RNA (lncRNA). XCI is thought to be initiated by the lncRNA XIST, which is expressed specifically from the Xi. Early in development, XIST spreads along one of the X chromosomes and allows for the recruitment of histone-modifying enzymes to make cooperative silencing modifications such as H3K27me3, ubH2A, H4K20me3, and H3K9me3 (reviewed in [17]). DNA methylation (DNAm) is another epigenetic mark associated with X inactivation, and blocking DNAm with 5-azacytidine allows reactivation of X-linked genes in human-mouse hybrid cells [18]. Other lncRNAs, such as HOTAIR, are implicated in similar epigenetic regulation [19]. Understanding XIST and the epigenetic mechanisms controlling XCI may help further our understanding of how these other lncRNAs function.

The goal of this study is to integrate the results from studies that have done large-scale analyses of which genes escape from, are subject to, or variably escape from XCI and to come up with a catalog of consensus XCI status calls using the hg19 gene map. The first of the three main studies to be integrated used two methods [9]. Human-mouse hybrid cell lines with an active or inactive human X chromosome allowed the direct examination of which genes are expressed from the Xi. Comparison of the expression of each gene from the Xi cell lines to the expression from the Xa cell lines led to a call of escape from XCI when there was 10 % or more relative Xi expression. These results will be referred to as the Carrel hybrid study. The Carrel hybrid study used nine Xi hybrid cell lines and made XCI status calls for 465 genes (Table 1).
Genes which escaped in only 0, 1, or 2 cell lines were called as being subject to XCI, and genes which escaped in 7, 8, or 9 cell lines were called as escaping from XCI. Genes which escaped XCI in 3 to 6 hybrid cell lines were called as variably escaping from XCI. The same publication examined the allelic ratio of X-linked expressed SNPs in fibroblast cell lines which were skewed completely for which X was inactivated, such that in a population of cells, the same allele was always on the Xa and biallelic expression would reflect escape from XCI. These results will be referred to as the Carrel SNP study [9]. The Carrel SNP study examined a panel of 40 cell lines and made XCI status calls for 84 genes, with an average of 12 informative cell lines per gene (Table 1). Genes which had less than 23 % of their cell lines escaping from XCI were called as subject to XCI while genes with over 78 % of their cell lines escaping XCI were called as escaping from XCI. Genes with between 23 and 78 % of their cell lines escaping from XCI were called as variably escaping from XCI.

The second study looked at the expression of X-linked SNPs using microarray data to include assessment of intronic polymorphisms [8]. The allelic imbalance (AI) between the allele on the Xa and the allele on the Xi for genes which already had strong evidence for being subject to XCI was used to assess how much skewing of XCI was present in each cell line, and this was then used to calculate how much of the AI was due to mosaicism and how much was due to escape from XCI. This will be referred to as the Cotton AI study [8]. The Cotton AI study used 99 cell lines and made XCI status calls for 419 genes with an average of 25 informative samples per gene. The same thresholds were used for the AI study as the SNP study (Table 1).

The third study used CpG island methylation data from the Illumina Infinium Human Methylation450 BeadChip platform [20].
It compared the female and male DNAm levels at CpG islands at the promoters of genes known to be subject to XCI and those known to escape from XCI to develop a classifier which could predict the XCI status of other genes. This classifier was then used on genes with unknown or less evident XCI status to make new XCI status calls. This will be referred to as the Cotton DNAm study [20]. The Cotton DNAm study examined 1875 female samples and 1053 male samples, giving XCI status calls for 409 genes (and multiple transcription start sites for most genes) (Table 1). XCI status calls were given individually by tissue, and the overall XCI status call was a list of calls which were obtained in at least one tissue. An uncallable designation was used when less than 50 % of samples in that tissue had a methylation level and male-female difference within two standard deviations of the subject or escape training genes in that tissue (50 genes were left in an uncallable category because they were uncallable in over half of the tissues examined). Genes were called as subject to or escaping from XCI in a tissue if all samples that were given an XCI status call gave the same call. Genes were called as variably escaping from XCI if they had at least one sample giving each XCI status call (subject and escape).
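The calling rules described above for the integrated studies can be sketched in a few lines. This is only an illustration of the stated thresholds, not the original authors' code: the function names, the "S"/"E"/"VE" labels, and the use of None for a sample without a call are assumptions made here, and values exactly at 23 % or 78 % are treated as variable escape.

```python
from typing import Optional, Sequence


def call_from_hybrids(n_escaping: int) -> str:
    """Carrel hybrid rule for nine Xi hybrid lines: 0-2 -> S, 3-6 -> VE, 7-9 -> E."""
    if not 0 <= n_escaping <= 9:
        raise ValueError("expected a count between 0 and 9")
    if n_escaping <= 2:
        return "S"
    if n_escaping >= 7:
        return "E"
    return "VE"


def call_from_fraction(n_escaping: int, n_informative: int) -> str:
    """SNP/AI rule: under 23 % escaping -> S, over 78 % -> E, otherwise VE."""
    if n_informative <= 0:
        raise ValueError("need at least one informative sample")
    fraction = n_escaping / n_informative
    if fraction < 0.23:
        return "S"
    if fraction > 0.78:
        return "E"
    return "VE"


def call_tissue_dnam(sample_calls: Sequence[Optional[str]]) -> str:
    """Per-tissue DNAm rule; None marks a sample that could not be called."""
    called = [c for c in sample_calls if c is not None]
    # Fewer than half of the samples callable -> tissue is uncallable.
    if not sample_calls or len(called) < 0.5 * len(sample_calls):
        return "uncallable"
    kinds = set(called)
    if kinds == {"S"}:
        return "S"
    if kinds == {"E"}:
        return "E"
    return "VE"  # at least one sample gave each call


print(call_from_hybrids(4), call_from_fraction(10, 12), call_tissue_dnam(["S", "E"]))
```

Keeping each study's rule in its own function makes the thresholds easy to compare side by side and to adjust if a different cutoff is being explored.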
Variable escape from XCI was rare in this study with a maximum of one third of all tissues showing variable escape for any given gene.

Additional approaches to determine XCI status, which have examined fewer genes, include DNAm analysis at non-CpG sites [21], SNP expression analysis in single cells [22], RNA-FISH to detect expression from both X chromosomes [23], analysis of protein polymorphisms in clonal cells by size [24] or by enzyme activity [25], microarray analysis of cellular expression with varying numbers of X chromosomes [26], microarray analysis of expression differences between males and females [27], and allelic expression analysis of RNA-seq data from clonal cells [28].

Each of the three studies integrated in this analysis has examined over 400 different genes, and combined there is data for 639 genes. Generally, multiple studies agree, and only 47 genes show substantial discordancies between studies, which we discuss. There is an enrichment of discordancies and calls of mostly variable escape from XCI at putative XCI boundaries. Seventy percent of protein-coding messenger RNA (mRNA) genes have an XCI status call, with the hypermethylated cancer-testes antigen gene family accounting for 42 % of the remaining uncalled mRNA genes. However, fewer of the non-protein-coding genes have a defined XCI status.

Methods

Categorization of X-linked genes

A full list of genes on the X chromosome was downloaded from University of California, Santa Cruz (UCSC)'s HG19.knownGene table browser [29]. The table was condensed manually from having an entry for each transcription start site to having an entry for each gene. XCI calls from the studies were added to the table, matching alternate gene names from the National Center for Biotechnology Information (NCBI) [30] along with using the in silico PCR tool in UCSC [31] with published primers [9].

Genes were placed into eight categories for an overall XCI status call.
If all of a gene's calls from different studies were the same, then the gene was placed in a category for all subjects, all escapes or all variable escapes. If the majority of studies (2 out of 3 or 3 out of 4) gave the same call, then the gene was placed in the mostly subject, mostly escape or mostly variable escape categories. Genes that had one subject call or one escape call and a variable escape call which leaned towards the same call (variable escape in a study, with less than 34 % or greater than 65 % of samples escaping XCI) were also placed in the mostly subject and mostly escape categories. The Cotton DNAm study gave some calls that were escape + variable escape or subject + variable escape; for our categorization, these genes were considered to be whichever call was given in the most tissues, which was usually subject or escape. Genes that had no calls in any of the studies were designated as the no call category, while genes that did not fit any of these other categories were placed in the discordant category. Discordant genes had either an even split of different calls or had one of each call (subject, escape, and variable escape from XCI).

Table 1 Sample sizes of previous studies

Study                                   Carrel hybrid   Carrel SNP   Cotton AI   Cotton DNAm
XCI status calls                        465             84           429         406
Number of samples                       9               40           99          1875
Average number of informative samples   –               12           25          –

The number of samples used and XCI status calls made per study for the Carrel hybrid, Carrel SNP, Cotton AI, and Cotton DNAm studies. The average number of informative samples was also included for the Carrel SNP and Cotton AI studies as only samples which were heterozygous at a SNP could be used for these studies.

Genes were sorted by their transcript type (mRNA, micro RNA (miRNA), ncRNA, snRNA, transfer ribonucleic acid (tRNA)) as determined by UCSC's HG19.kgXref table [29] and, if still unknown, a search of NCBI.
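The categorization rules for the overall XCI status call can be sketched as below. This is a minimal reading of the text rather than the authors' procedure: the function name, the tuple representation of a call (status plus, for variable escape calls, the fraction of samples escaping), and the restriction of the "leaning" rule to genes with exactly two calls are all assumptions made here. The pre-processing of combined Cotton DNAm calls into a single call is not shown.

```python
from collections import Counter
from typing import List, Optional, Tuple

# (status, fraction of samples escaping XCI if the study reported one)
Call = Tuple[str, Optional[float]]


def consensus_category(calls: List[Call]) -> str:
    """Assign one of the eight categories from per-study calls ("S", "E", "VE")."""
    if not calls:
        return "no call"
    statuses = [status for status, _ in calls]
    top, top_n = Counter(statuses).most_common(1)[0]
    if top_n == len(statuses):
        return f"all {top}"
    # Strict majority covers the 2-of-3 and 3-of-4 cases.
    if top_n > len(statuses) / 2:
        return f"mostly {top}"
    # Assumed reading: one S or E call plus one VE call leaning the same way.
    if len(calls) == 2 and "VE" in statuses:
        (other,) = [s for s in statuses if s != "VE"]
        fraction = next(f for s, f in calls if s == "VE")
        if fraction is not None:
            if other == "S" and fraction < 0.34:
                return "mostly S"
            if other == "E" and fraction > 0.65:
                return "mostly E"
    return "discordant"


print(consensus_category([("S", None), ("S", None), ("VE", 0.5)]))
```

Returning plain strings keeps the sketch close to the category names used in the text; an enumeration would be a reasonable alternative in a larger pipeline.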
A list of cancer-testis antigen genes was taken from CTdatabase [32].

To determine the source of discordancies, genes with three or four calls and only one study giving a different call from the other studies were examined. The study which gave the discordant call was noted, along with the call it gave and the call agreed upon by the other studies.

Expression analysis

Expression data for the lymphoblast cell line GM12878 was downloaded from GEO dataset GSE30400 [28], and expression data for the fibroblast cell line IMR90 was downloaded from GEO dataset GSM981249 [33]. This data was annotated using Seqmonk (Babraham Bioinformatics) using our condensed X chromosome gene list. A Tukey test was performed to determine if expression levels in lymphoblasts differed amongst the various categories using the multcomp package in R [34, 35]. This was repeated for the calls given by each individual study.

Domain analysis

Domains were annotated by labeling any genes between escape genes, without crossing a subject gene, as being in an escape domain and labeling any genes between subject genes without crossing an escape gene as being in a subject domain. Genes between a subject and escape gene, with no other subject or escape genes in between, were classified as boundaries; boundaries can start inside of the gene body of a gene which is subject to or escaping from XCI, as a gene's XCI status is likely determined by its promoter. Enrichment was determined using a chi-square test (chisq.test from the MASS package in R [34, 36]). Standardized residuals were extracted from the chi-square test and used to determine enrichment of certain categories [37], followed by a chi-square test comparing the enrichment of variable, mostly variable and discordant genes in boundaries, individually against genes with no call.
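The domain annotation rule described above can be illustrated with a short sketch. It assumes, as a simplification not stated in the text, that genes are given as an ordered list of calls along the chromosome, that only "S" and "E" calls act as anchors, and that an anchor gene is assigned to its own domain type; the function name and labels are hypothetical.

```python
from typing import List, Optional

ANCHORS = {"S": "subject", "E": "escape"}


def annotate_domains(calls: List[str]) -> List[Optional[str]]:
    """Label each gene by its flanking anchor genes, in chromosomal order."""
    n = len(calls)
    left: List[Optional[str]] = [None] * n
    right: List[Optional[str]] = [None] * n
    last = None
    for i, call in enumerate(calls):  # nearest anchor at or before gene i
        if call in ANCHORS:
            last = call
        left[i] = last
    nxt = None
    for i in range(n - 1, -1, -1):  # nearest anchor at or after gene i
        if calls[i] in ANCHORS:
            nxt = calls[i]
        right[i] = nxt
    labels: List[Optional[str]] = []
    for l, r in zip(left, right):
        if l is None or r is None:
            labels.append(None)  # not flanked on both sides
        elif l == r:
            labels.append(ANCHORS[l])  # between two like anchors
        else:
            labels.append("boundary")  # between a subject and an escape gene
    return labels


# Hypothetical ordering of calls: "D" discordant, "NC" no call.
print(annotate_domains(["S", "VE", "S", "D", "E", "NC", "E", "VE"]))
```

Two linear passes are enough because each gene's label depends only on the nearest anchor on either side.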
Genes with no call were shown to be a good control (p value >0.95) by a chi-square comparison between genes with no call and genes with a call, in boundaries compared to the outside of boundaries.

Results and discussion

Creation of a consensus XCI status

Gencode currently lists 1144 genes on the human X chromosome [38, 39]. Between the four datasets examined, 639 (54 %) of these genes have an XCI status call (Fig. 1a). There is a roughly equal distribution of genes that have been examined in one, two, or three of these studies; however, very few genes have an XCI status call in all four studies because the Carrel SNP study has a small sample size of 84 (Fig. 1a). Comparing the distribution of transcript types between genes with XCI status calls and those without, protein-coding genes are much more likely to have a call whereas genes for non-coding RNA such as miRNA and tRNA are more likely to not have an XCI status call (Fig. 1b). A large proportion of the protein-coding genes without a call can be explained by them belonging to the Cancer-Testis Antigen Gene (CTAG) family (Fig. 1b). CTAG genes are hypermethylated and silenced on both Xs in healthy female cells and are normally only expressed in cancer cells or in the testes of males [32]. Other genes lacking calls have very low expression (RPKM values less than 0.1) in the fibroblasts and lymphoblasts examined in the hybrid, SNP, and AI studies (102 out of 143 non-CTAG genes without a call (Additional file 1: Table S1)), and all genes without calls either are not present on, or were filtered out from, the DNAm microarray used for assessment in the DNAm study (reasons for filtering include hypermethylation in male samples and mapping to repetitive elements or to the autosomes [20]) or were found to have methylation levels in an uncallable region between that found for known subject and escape genes. There were only 24 genes that lacked expression and were called by the DNAm study but could not be called by the expression studies.
Enrichment of calls for protein-coding genes likely reflects the more recent identification of lncRNA genes. The smaller RNA types are too small or too tissue-specific to have their XCI status determined in these studies; furthermore, high homology to another gene might prevent assessment of XCI status and the X is enriched for large inverted repeats [40].

Genes were divided into eight categories based on what XCI status the studies called the gene and how often the studies agreed (Fig. 2a). Seventy-three percent of genes were given an overall call of subject or mostly subject, roughly agreeing with the percent found to be subject in each individual study (Fig. 2b). The percent of escape and mostly escape genes (12 %) was also similar to the percent of escape genes found by each individual study. The variable escape and mostly variable escape categories (8 %) agreed with the Carrel studies; however, the Cotton studies have large differences in the number of genes they call variable escape. This difference in the number of variable escape calls contributed to a fair number of the discordancies between studies. Seven percent of genes on the X were discordant between studies and no consensus call could be assigned, while another 28 % had a single discordancy (categorized into one of the mostly escape, mostly subject, or mostly variable escape categories) (Fig. 2a).

Discordancies between studies

To understand the nature of the discordancies between studies, we tabulated the frequency with which studies disagreed and the difference from the consensus call (Table 2). The Cotton AI study was the most discordant study with 11 % of its calls disagreeing with two or three other studies and a tendency to call a gene variable escape when other studies called that gene escape or subject (Fig. 3a). This tendency to call variable escape could be due to the extra calculations involved to correct for using cells which were only partially skewed.
Another contributing factor could be that the AI study, in addition to the exonic SNPs used in the SNP study, also used intronic SNPs, which are spliced out and degraded and would be present at lower levels, which may affect the XCI status calls drawn from them. The AI study also used more samples than the other expression studies (an average of 25 informative samples per gene compared to 12 in the SNP study and 9 in hybrids), which would increase the chance of finding variable escape genes. The Cotton DNAm study was the most concordant study with only 2 % of its calls disagreeing with two or three other studies; however, it also had an uncallable category for genes which had methylation levels or male-female methylation differences between the thresholds set by training sets of known subject and escape genes (the threshold was set at two standard deviations away from the training set mean). The Cotton DNAm study did not give these genes a call and they were not considered in this analysis. The discordancies in the Cotton DNAm study were mostly due to it not finding any genes with a high level of variable escape from XCI (Additional file 2: Figure S1). The hybrid study discordancies arose from genes called escape or variable escape when other studies gave a subject call.

Fig. 1 The majority of X-linked protein-coding genes have an XCI status call. a The number of datasets contributing an XCI status call per gene. The number of calls is the number of studies which gave an XCI status call of subject, escape, or variable escape from XCI. Genes with no call were not mentioned in any of the studies but were included in Gencode for HG19 [38, 39]. b The distribution of RNA transcript types for genes with and without an XCI status call. Transcript type was taken from Gencode or an NCBI search [30]. CTAG are cancer-testes antigen genes, which are protein-coding genes expressed exclusively in cancer and in testes and hypermethylated in other tissues, making XCI status calls very difficult. Other mRNAs are mRNA genes that are not members of the CTAG family

Fig. 2 Consensus XCI status calls. a Distribution of our consensus XCI status calls. E is escape from XCI, S is subject to XCI, and VE is variably escaping from XCI in some individuals or tissues. The mostly E, S, or VE categories are genes for which two out of three or three out of four XCI status calls agree on a call of E, S, or VE and the remaining study disagrees. The all E, S, or VE categories had at least one XCI status call for E, S, or VE and had no XCI status calls disagree. Discordant calls had either an even split of different XCI status calls or had one of each call. Genes with no call were left out of this graph. b The distribution of XCI status calls given by each individual study. See above for a description of E, S, and VE. E/VE and S/VE are calls from the Cotton DNAm study where most tissues were given a call of escape or subject, but some tissues were given a call of variable escape. For the sample sizes of each study see Table 1

Tissue-specific differences in XCI status are an important possible source of discordancies between studies. The Carrel hybrid and SNP studies were both done in a single tissue type, fibroblasts. The Cotton AI study used both lymphoblasts and fibroblasts and found that 10 % of genes showed evidence of tissue-specific escape from XCI; these genes would not appear to be variably escaping in the Carrel studies. However, the Cotton DNAm study looked at 27 tissue types (including fibroblasts and whole blood (which includes lymphoblasts)) and found high concordancy between tissues and very few tissue-specific differences in escape from XCI. Therefore, a more likely source of differences between studies could be differences acquired in cell culture. The Cotton DNAm study was the only study to use primary cells; the Carrel studies and Cotton AI study used cultured cells. Previous studies have shown differences in XCI between primary cells and cultured cells from the same organism [10, 41] and between individuals at different ages [42]. Genes with discordancies between studies or calls of variable escape in individual studies may be the genes most prone to epigenetic changes in culture. In the mostly subject and mostly escape categories, 90 % of the genes have variable escape as the discordant call and 82 % of the discordant genes have at least one variable escape call (Additional file 1: Table S1). This difference between the studies could also be due to differences between the methylation status and XCI status of some of the more variable genes; however, most genes which are found variable by other studies are not given an XCI status call by the Cotton DNAm study (Additional file 2: Figure S1).

Table 2 Most studies show a trend with what they are calling discordantly

                                   Discordant study
Discordant call   Consensus call   Carrel hybrid   Carrel SNP   Cotton AI   Cotton DNAm
E                 VE               0               1            0           2
E                 S                7               1            1           2
VE                E                1               1            17          0
VE                S                9               0            26          0
S                 E                0               1            3           0
S                 VE               0               1            1           3

Discordant call is the XCI status call given by the discordant study, while consensus call is the XCI status call agreed upon by two or more other studies. E escape from XCI, S subject to XCI, VE variable escape from XCI.

Fig. 3 Comparison of discordancies. a The level of discordancies in each study. A gene is counted as discordant in a study if that study gives a call and at least two other studies agree on a different call. For the sample sizes of each study, see Table 1. b Comparison of the Carrel hybrid calls to calls from other studies. The number of escaping hybrids is, for each gene, the number of mouse-human hybrid cell lines (out of 9) in which that gene escaped XCI. The Y axis is how many genes one or more other studies agreed were subject to, escaping from, or variably escaping from XCI. c A magnified version of b to better show escape and variable escape from XCI

The mouse-human hybrid cells may be the most different from primary cells. In hybrid cells, XIST fails to properly localize to the Xi [43]. This may reflect a loss of some heterochromatin marks on the Xi, leaving X inactivation to be maintained by fewer marks, including DNAm [44]. X-inactivated genes in hybrids are more vulnerable to reactivation by 5-azacytidine, a methylation inhibitor [18], and approximately 1 in 10^5 hybrid cells will spontaneously reactivate the HPRT gene, which is normally subject to inactivation [45]. Reactivation could explain the genes being called escape or variable escape in the Carrel hybrid study while being called subject in other studies. When compared with consensus calls from other studies, genes found to escape in three or four hybrid cell lines in the Carrel hybrid study (which were thus classified as variable escape in that study) are more often called subject to XCI than variably escaping from XCI (Fig. 3b, Additional file 3: Table S2). Reactivation of subject genes appears to occur for a small percentage of genes in hybrid cell lines.

Most of these studies have used expression to monitor XCI status. We therefore examined whether expression level has an effect on a gene's XCI status call (Additional file 4: Figure S2).
None of the categories had significantly different expression levels (p > 0.05) nor were there significant differences in expression levels for the calls in each individual study (not shown).

Domains of escape and boundaries

It has been hypothesized that there are domains on the Xi with coordinately regulated XCI caused by nearby XCI way stations spreading XCI or escape elements promoting euchromatin with boundaries separating the two [46–48]. We used our categories to locate these domains and examined the domain enrichment of discordancies and variably escaping genes (Fig. 4, Additional file 5 and Additional file 6). Fully variable escape genes were most often found in subject domains at a frequency similar to the overall distribution of genes (Fig. 4b). Mostly variable escape genes were most often in escape domains and boundary regions, suggesting variation in escape genes. Discordant genes were equally abundant in subject domains and boundary regions, despite the substantially smaller size of the boundary regions.

Boundaries between domains may provide clues to the mechanisms controlling XCI. Fully variable escape genes were not enriched in boundaries (p value >0.95) whereas mostly variable escape and discordant genes each had an approximately threefold enrichment (from 2 to 6 % of genes for mostly variable escape (p value < 5 × 10^−4) and from 7 to 20 % for discordant genes (p value < 4 × 10^−7)) (Fig. 4c). We hypothesize that these genes may be variable due to either natural variability in the position of a boundary or instability of boundaries due to cell culture.
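For readers who want to see how a standardized residual from a chi-square contingency table flags enrichment of a category in boundaries, the following sketch computes the usual adjusted form, (observed − expected) / sqrt(expected × (1 − row share) × (1 − column share)). The counts are hypothetical and chosen only for illustration, the function name is an assumption, and whether this matches the exact settings used in the original R analysis is not confirmed by the text.

```python
import math
from typing import List


def standardized_residuals(table: List[List[float]]) -> List[List[float]]:
    """Adjusted standardized residuals for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        out = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            variance = expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            out.append((observed - expected) / math.sqrt(variance))
        residuals.append(out)
    return residuals


# Hypothetical counts: rows = in boundary / outside boundary,
# columns = discordant genes / all other genes.
example = [[20, 80], [10, 190]]
for row in standardized_residuals(example):
    print([round(value, 2) for value in row])
```

A large positive residual in the boundary row for a category indicates more genes of that category in boundaries than expected under independence.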
These discordant and variable genes are spread throughout the different boundaries; 42 % of boundaries have discordant or variable genes in them and 45 % of all the discordant genes and 60 % of all the mostly variable escape genes are in boundaries.

Comparison to additional studies examining XCI

We compared our XCI status calls to those found by various studies examining the XCI status of single genes or regions and generally found agreement (Additional file 1: Table S1). A chi-square standardized residual analysis between the results of other studies and our analysis shows that our study was strongly enriched for calls of fully escape and mostly escape when other studies called a gene as escaping from XCI. Our analysis was also strongly enriched for calls of fully subject and enriched for calls of mostly subject and fully variable escape when other studies called a gene subject to XCI. When other studies disagreed with each other, our study tended to call genes discordant.

Another method of examining XCI, using non-CpG methylation (mCH), was recently reported [21] and was also compared to our results. Genes called escape by mCH were enriched for the mostly variable escape category while being strongly enriched for the escape and mostly escape categories and depleted for the subject category. Genes called subject by mCH were almost entirely in our subject and mostly subject categories. Another study used mCH to examine XCI across multiple tissue types and found tissue-specific differences [49]. Our consensus results were most concordant for genes that escaped XCI across multiple tissues.
Together, these comparisons to various calls associated with XCI have shown that the XCI calls presented in our analysis are robust and are relevant to further studies.

XCI status of genes with Y chromosome homology

The X and Y chromosomes were once a homologous pair of chromosomes, and XCI is hypothesized to provide dosage compensation as the Y homologs have decayed. The number of genes escaping XCI is higher on the evolutionarily more recent regions of the X chromosome [50], so we compared our consensus calls with the genes that have been identified as having Y homologs or Y pseudogenes [51]. X-linked genes with Y homologs are enriched for genes that escape and mostly escape from XCI (Additional file 7: Figure S3A). X-linked genes with pseudogenes on the Y are not particularly enriched in any XCI category, although they have significantly fewer genes with no call (Additional file 7: Figure S3B). Genes with Y homologs might be anticipated to escape from XCI as having a functioning Y homolog would negate the need for dosage compensation. In addition, these genes could also have been too dosage-sensitive for the stepwise process of upregulation and becoming subject to XCI [52, reviewed in 53]. The XCI pattern for genes with Y pseudogenes may be more random, as these genes have had time to evolve XCI. Being enriched for genes with calls may be an artifact due to pseudogenes and XCI calls both being enriched for genes that are better known and well annotated.

[Fig. 4 panel labels recovered from the figure. a: tracks labeled "Gene calls" and "Domains of XCI". b: bar chart with y-axis "% of all genes with that call" (0 to 90), x-axis categories Subject Domain, Escape Domain and Boundary Region, and series All calls, Fully variable escape, Mostly variable escape and Discordant. c: pie chart with Discordant, 20 (20 %); All E, 16 (16 %); Mostly E, 13 (13 %); All VE, 6 (6 %); Mostly VE, 6 (6 %); All S, 31 (30 %); Mostly S, 9 (9 %).]

Fig. 4 Domains of XCI and the enrichment of discordant and mostly variable escape genes at boundaries. a Our consensus gene calls and the domains of XCI along the X chromosome. The top row is the XCI status calls for all genes with a call on the X while the second row is the domains of XCI called from the consensus calls (see Section 2). For the XCI status calls, the colors are defined in c. For the domains of XCI: red is subject, green is escape, orange is boundaries, and white space is between domains. A magnification of two regions is shown below, demonstrating how genes line up with domains. Domains are defined by the first and last gene in the domain, even if they start or end inside of other genes which do not share the same domain call. See Additional files 5 and 6 for the BED files used to generate the UCSC browser track upon which this graph is based. b Distribution of genes into XCI status domains. The graph shows what percent of genes with each call are in each domain type. Percent is determined by dividing the number of genes with that XCI status call in that domain type by the total number of genes with that XCI status call. The all calls category includes all genes on the X chromosome, including genes with no calls. c Distribution of genes at boundaries. This figure includes the subject and escape genes which define the edges of the boundaries.

Our consensus XCI status calls and sex differences in expression

Genes that escape from XCI tend to not be expressed to the level that is observed from the active X chromosome. A threshold of 10 % has been used, and at this level expression from females would only be minimally higher than males; however, expression up to approximately 95 % of the Xa has been demonstrated [8], which would result in sex-biased expression. Recent genome-wide comparisons of expression across multiple tissues (GTEx [12]) tested for sex-based expression, and the results correlate well with our consensus calls.
Genes with a female expression bias were strongly enriched (p value < 10⁻¹⁵) for the escape and mostly escape from XCI categories. This makes sense as genes which escape have two transcriptionally active copies of a gene in females while only having one in males. Genes with a male expression bias are enriched for being in the PAR1 (p value < 10⁻¹⁵), supporting the theory that there is a minor spread of inactivation into the PAR so that the Y chromosomal copy of the gene has more expression than the Xi copy [7].

Conclusions

We have compiled a list of XCI status calls from three large studies that used different methodologies. We generated a stringent list in which multiple studies were entirely concordant for subject, escape, or variable categories. We extend those calls with a “mostly” category, allowing single discrepancies. Together, these classifications can be applied to 50 % of genes on the X, including 80 % of all non-CTAG protein-coding genes. Having a reference list of XCI statuses will prove valuable in the future as more research begins to consider sex differences and the effect of having an inactivated X chromosome. This table can be used by researchers to consider the sex effects of their genes of interest or for comparison to larger scale -omics studies such as the GTEx analysis project [12]. The table can also be informative for the impact of rearrangements, aneuploidies, or copy number variants on the Xi. This XCI status call list will also be valuable for labs such as ours studying X chromosome inactivation. Having a confident XCI status call is needed when attempting to determine patterns across genes with similar XCI statuses or when looking for boundaries between domains with differences in XCI.

Additional files

Additional file 1: Table S1. Our consensus XCI status calls for all genes on the X chromosome. The consensus calls from this study are under the column labeled Balaton consensus calls.
The data used for the rest of the analyses in this article are also included as columns. The second sheet has descriptions of each column. (XLSX 316 kb)

Additional file 2: Figure S1. Comparing the Cotton DNAm XCI status calls and consensus calls. No data reflects genes which were not called in the DNAm study, primarily due to a lack of CpG islands. Uncallable are genes which had methylation between the subject and escape classifiers and were unable to be confidently called by the DNAm study. S, E, and VE are subject, escape, and variable escape from XCI. E/VE and S/VE are genes which were fully subject or escape in some tissues while variably escaping in other tissues. All 4 states were genes which had some tissues subject, escaping, variably escaping and uncallable, making the gene not fit into any other XCI status category. A) The Cotton DNAm XCI status calls when the consensus call is variable escape or discordant. N = 91. B) The Cotton DNAm XCI status calls for all genes on the X chromosome for comparison. N = 1144. (PDF 126 kb)

Additional file 3: Table S2. The hybrid study tends to call genes variable escape discordantly. The data used to create Fig. 4. Escaping hybrids is how many human-mouse hybrid cell lines (out of 9) were found to escape from XCI by Carrel, Hybrid call is the XCI status call from the Carrel hybrid study, % agreement is the percent of genes with that number of escaping hybrids whose Carrel hybrid call agrees with one or more other study’s call. Consensus S, VE, and E are how many genes have other studies agree on a call of subject, variable escape or escape. (DOC 37 kb)

Additional file 4: Figure S2. Expression in GM12878 does not correlate with consensus XCI status call. A box and whisker plot of the log reads per kilobase of transcript per million mapped reads (RPKM) of expression. A value of 1 RPKM was added to each gene in order to include genes with 0 expression in a graph of log10(RPKM).
E, VE, S and PAR are escape, variable escape, and subject to XCI and pseudoautosomal region. The N are: Discordant = 44, E = 29, mostly E = 26, mostly S = 129, mostly VE = 10, no call = 509, PAR = 22, S = 331, VE = 37. (PDF 89 kb)

Additional file 5: BED file used to make a UCSC browser track with color coded consensus XCI status calls for each gene on the X (excluding genes with no XCI status calls). This file was used to generate Fig. 4a and colors correspond to Fig. 4a. (BED 39 kb)

Additional file 6: BED file used to make a UCSC browser track of the XCI domains. This file was used to generate Fig. 4a and colors correspond to Fig. 4a. (BED 5 kb)

Additional file 7: Figure S3. Consensus XCI status calls of genes with Y homologs or Y pseudogenes. A) XCI status calls of X genes with homologs on the Y chromosome. E is genes which escape from XCI in all studies, mostly E is genes which escape from XCI in the majority of studies, S is genes which are subject to XCI in all studies, discordant is genes which either have an even split of S and E calls or have one of each call (including variable escape), and no call is genes with no XCI status call in any study. N = 19. B) XCI status calls of X genes with pseudogenes on the Y chromosome. See above for description of most categories. VE and mostly VE are variable escape from XCI in all studies and variable escape from XCI in the majority of studies. Mostly S is subject to XCI in the majority of studies. N = 264. (PDF 106 kb)

Abbreviations

AI: allelic imbalance; CTAG: cancer-testes antigen gene; DNAm: DNA methylation; GTEx: Genotype-Tissue Expression; lncRNA: long non-coding RNA; mCH: non-CpG methylation; PAR: pseudoautosomal region; X: X chromosome; Xa: active X chromosome; XCI: X chromosome inactivation; Xi: inactive X chromosome.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

AMC generated the initial table. BPB condensed the table and performed the analyses from it. BPB wrote the manuscript.
All authors conceived the study and contributed to revision of the manuscript. All authors read and approved the final manuscript.

Acknowledgements

This work was supported by CIHR grant MOP-119586 to CJB.

Author details

1 Department of Medical Genetics, Molecular Epigenetics Group, Life Sciences Institute, University of British Columbia, Vancouver, Canada. 2 Department of Medical Genetics, Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, University of British Columbia, Vancouver, BC, Canada.

Received: 27 October 2015 Accepted: 14 December 2015

Balaton et al. Biology of Sex Differences (2015) 6:35

References

1. Ronen D, Benvenisty N. Sex-dependent gene expression in human pluripotent stem cells. Cell Rep. 2014;8:923–32.
2. Jansen R, Batista S, Brooks AI, Tischfield JA, Willemsen G, van Grootheest G, et al. Sex differences in the human peripheral blood transcriptome. BMC Genomics. 2014;15:33.
3. Arnold AP. Conceptual frameworks and mouse models for studying sex differences in physiology and disease: why compensation changes the game. Exp Neurol. 2014;259:2–9.
4. Deng X, Berletch JB, Nguyen DK, Disteche CM. X chromosome regulation: diverse patterns in development, tissues and disease. Nat Rev Genet. 2014;15:367–78.
5. Lyon MF. Gene action in the X-chromosome of the mouse (Mus musculus L.). Nature. 1961;190:372–3.
6. Lyon MF. Sex chromatin and gene action in the mammalian X-chromosome. Am J Hum Genet. 1962;14:135–48.
7. Johnston CM, Lovell FL, Leongamornlert DA, Stranger BE, Dermitzakis ET, Ross MT. Large-scale population study of human cell lines indicates that dosage compensation is virtually complete. PLoS Genet. 2008;4:e9.
8. Carrel L, Willard HF. X-inactivation profile reveals extensive variability in X-linked gene expression in females. Nature. 2005;434:400–4.
9. Cotton AM, Bing G, Light N, Adoue V, Pastinen T, Brown CJ. Analysis of expressed SNPs identifies variable extents of expression from the human inactive X chromosome. Genome Biol.
2013;14:R122.
10. Berletch JB, Ma W, Yang F, Shendure J, Noble WS, Disteche CM, et al. Escape from X inactivation varies in mouse tissues. PLoS Genet. 2015;11:e1005079.
11. De Bonis ML, Cerase A, Matarazzo MR, Ferraro MR, Strazzullo M, Hansen RS, et al. Maintenance of X- and Y-inactivation of the pseudoautosomal (PAR2) gene SPRY2 is independent from DNA methylation and associated to multiple layers of epigenetic modifications. Hum Mol Genet. 2006;15:1123–32.
12. Mele M, Ferreira PG, Reverter F, DeLuca DS, Monlong J, Sammeth M, et al. The human transcriptome across tissues and individuals. Science. 2015;348:660–5.
13. Armoskus C, Moreira D, Bollinger K, Jimenez O, Taniguchi S, Tsai H. Identification of sexually dimorphic genes in the neonatal mouse cortex and hippocampus. Brain Res. 2014;1562:22–38.
14. Zitzmann M, Bongers R, Werler S, Bogdanova N, Wistuba J, Kliesch S, et al. Gene expression patterns in relation to the clinical phenotype in Klinefelter syndrome. J Clin Endocrinol Metab. 2014;100:E518–23.
15. Van der Meulen J, Sanghvi V, Mavrakis K, Durinck K, Fang F, Matthijssens F, et al. The H3K27me3 demethylase UTX is a gender-specific tumor suppressor in T-cell acute lymphoblastic leukemia. Blood. 2015;125:13–21.
16. Snijders Blok L, Madsen E, Juusola J, Gilissen C, Baralle D, Reijnders MR, et al. Mutations in DDX3X are a common cause of unexplained intellectual disability with gender-specific effects on Wnt signaling. Am J Hum Genet. 2015;97:343–52.
17. Dixon-McDougall T, Brown CJ. The making of a Barr body: the mosaic of factors that eXIST on the mammalian inactive X chromosome. Biochem Cell Biol. 2015. doi:10.1139/bcb-2015-0016.
18. Mohandas T, Sparkes RS, Shapiro LJ. Reactivation of an inactive human X chromosome: evidence for X inactivation by DNA methylation. Science. 1981;211:393–6.
19. Tsai M, Manor O, Wan Y, Mosammaparast N, Wang JK, Lan F, et al. Long noncoding RNA as modular scaffold of histone modification complexes. Science. 2010;329:689–93.
20.
Cotton AM, Price EM, Jones MJ, Balaton BP, Kobor MS, Brown CJ. Landscape of DNA methylation on the X chromosome reflects CpG density, functional chromatin state and X-chromosome inactivation. Hum Mol Genet. 2015;24:1528–39.
21. Lister R, Mukamel EA, Nery JR, Urich M, Puddifoot CA, Johnson ND, et al. Global epigenomic reconfiguration during mammalian brain development. Science. 2013;341:1237905.
22. Carrel L, Willard HF. Heterogeneous gene expression from the inactive X chromosome: an X-linked gene that escapes X inactivation in some human cell lines but is inactivated in others. Proc Natl Acad Sci U S A. 1999;96:7364–9.
23. Hacisuleyman E, Goff LA, Trapnell C, Williams A, Henao-Mejia J, Sun L, et al. Topological organization of multichromosomal regions by the long intergenic noncoding RNA Firre. Nat Struct Mol Biol. 2014;21:198–206.
24. Davidson RG, Nitowsky HM, Childs B. Demonstration of two populations of cells in the human female heterozygous for glucose-6-phosphate dehydrogenase variants. Genetics. 1963;50:481–5.
25. Migeon BR, Moser HW, Moser AB, Axelman J, Sillence D, Norum RA. Adrenoleukodystrophy: evidence for X linkage, inactivation, and selection favoring the mutant allele in heterozygous cells. Proc Natl Acad Sci U S A. 1981;78:5066–70.
26. Sudbrak R, Wiezorek G, Nuber UA, Mann W, Kirchner R, Erdogan F, et al. X chromosome-specific cDNA arrays: identification of genes that escape from X-inactivation and other applications. Hum Mol Genet. 2001;10:77–83.
27. Craig IW, Mill J, Craig GM, Loat C, Schalkwyk LC. Application of microarrays to the analysis of the inactivation status of human X-linked genes expressed in lymphocytes. Eur J Hum Genet. 2004;12:639–46.
28. Rozowsky J, Abyzov A, Wang J, Alves P, Raha D, Harmanci A, et al. AlleleSeq: analysis of allele-specific expression and binding in a network framework. Mol Syst Biol. 2011;7:522.
29. Karolchik D, Hinrichs AS, Furey TS, Roskin KM, Sugnet CW, Haussler D, et al. The UCSC browser data retrieval tool. Nucleic Acids Res.
2004;32:493–6.
30. Brown GR, Hem V, Ovetsky KS, Wallin C, Ermolaeva O, Tolstoy I, et al. Gene: a gene-centered information resource at NCBI. Nucleic Acids Res. 2015;43:D36–42.
31. Hinrichs AS, Karolchik D, Baertsch R, Barber GP, Bejerano G, Clawson H, et al. The UCSC Genome Browser database: update 2006. Nucleic Acids Res. 2006;34:D590–8.
32. Almeida LG, Sakabe NJ, de Oliveira AR, Silva MC, Mundstein AS, Cohen T, et al. CTdatabase: a knowledge-base of high-throughput and curated data on cancer-testis antigens. Nucleic Acids Res. 2009;37:D816–9.
33. Yue F, Cheng Y, Breschi A, Vierstra J, Wu W, Ryba T, et al. A comparative encyclopedia of DNA elements in the mouse genome. Nature. 2014;515:355–64.
34. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. 2014. http://www.R-project.org/
35. Hothorn T, Bretz F, Westfall P. Simultaneous inference in general parametric models. Biom J. 2008;50:346–63.
36. Venables WN, Ripley BD. Modern applied statistics with S. 4th ed. New York: Springer; 2002.
37. Sharpe D. Your chi-square test is statistically significant: now what? PARE. 2015;20.
38. Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, et al. GENCODE: the reference human genome annotation for the ENCODE project. Genome Res. 2012;22:1760–74.
39. Harrow J, Denoeud F, Frankish A, Reymond A, Chen CK, Chrast J, et al. GENCODE: producing a reference annotation for ENCODE. Genome Biol. 2006;7:S4.1–9.
40. Warburton PE, Giordano J, Cheung F, Gelfand Y, Benson G. Inverted repeat structure of the human genome: the X-chromosome contain a preponderance of large, highly homologous inverted repeats that contain testes genes. Genome Res. 2004;14:1861–9.
41. Nino-Soto MI, Nuber UA, Basrur PK, Ropers HH, King WA. Differences in the pattern of X-linked gene expression between fetal bovine muscle and fibroblast cultures derived from the same muscle biopsies. Cytogenet. 2005;111:57–64.
42. Bennet-Baker PE, Wilkowski J, Burke DT.
Age-associated activation of epigenetically repressed genes in the mouse. Genetics. 2003;165:2055–62.
43. Clemson CM, Chow JC, Brown CJ, Lawrence JB. Stabilization and localization of Xist RNA are controlled by separate mechanisms and not sufficient for X inactivation. J Cell Biol. 1998;142:13–23.
44. Gartler SM, Dyer KA, Marshall Graves JA, Rocchi M. A two step model for mammalian X-chromosome inactivation. Prog Clin Biol Res. 1985;198:96–102.
45. Graves JA, Young GJ. X-chromosome activity in heterokaryons and hybrids between mouse fibroblasts and teratocarcinoma stem cells. Exp Cell Res. 1982;141:87–97.
46. Miller AP, Willard HF. Chromosomal basis of X chromosome inactivation: identification of a multigene domain in Xp11.21-p11.22 that escape X inactivation. Proc Natl Acad Sci U S A. 1998;95:8709–14.
47. Pinter SF, Sadreyev RI, Yildirim E, Jeon Y, Ohsumi T, Borowsky M, et al. Spreading of X chromosome inactivation via a hierarchy of defined polycomb stations. Genome Res. 2012;22:1864–76.
48. Li N, Carrel L. Escape from X chromosome inactivation is an intrinsic property of the Jarid1c locus. Proc Natl Acad Sci U S A. 2008;105:17055–60.
49. Schultz MD, He Y, Whitaker JW, Hariharan M, Mukamel EA, Leung D, et al. Human body epigenome maps reveal noncanonical DNA methylation variation. Nature. 2015;523:212–6.
50. Ross MT, Grafham DV, Coffey AJ, Scherer S, McLay K, Muzny D, et al. The DNA sequence of the human X chromosome. Nature. 2005;434:325–37.
51. Wilson Sayres MA, Makova KD. Gene survival and death on the human Y chromosome. Mol Biol Evol. 2013;30:781–7.
52. Lahn BT, Page DC. Four evolutionary strata on the human X chromosome. Science. 1999;286:964–7.
53. Veitia RA, Veyrunes F, Bottani S, Birchler JA. X chromosome inactivation and active X upregulation in therian mammals: facts, questions, and hypotheses. J Mol Cell Biol.
2015;7:2–11.

