UBC Theses and Dissertations


Comparative studies of X inactivation within Eutheria Yen, Ziny 2005

Full Text

Comparative Studies of X Inactivation within Eutheria

by

ZINY YEN

B.Sc., University of British Columbia, 2003

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES

MEDICAL GENETICS

THE UNIVERSITY OF BRITISH COLUMBIA

August 2005

© Ziny Yen, 2005

Abstract

X chromosome inactivation has not been well studied in mammals other than humans and mice. In both species, the inactive X expresses the XIST/Xist (X-inactivation specific transcript) non-coding RNA that is crucial for dosage compensation in females. Although both species belong to the same mammalian subclass, Eutheria, they show significant differences in imprinting patterns, negative regulation of XIST/Xist, and extent of silencing on the inactive X chromosome. Furthermore, the mechanism by which the Xist transcript coats and silences the X in cis is unknown. This study focuses on X-inactivation in other eutherians, first to unravel domains within XIST/Xist of biological significance, and second to investigate whether incomplete silencing in humans is unique within the mammalian subclass. Comparative analysis to predict conserved secondary structures between seven eutherian orthologs revealed common stems in the sequence before the Xist A repeat, the A repeat, F repeat, and exon 4. Several complex secondary structures were also similar between rodents but were not conserved in other species. These included the D repeat; structures between the B and D, as well as A and F repeats; and the unique rodent exon 5. The significance of these conserved domains in the context of potential biological functions, and how the structural differences might account for some species-specific differences, is discussed in this thesis.
To investigate the species variability in the extent of silencing, methylation analysis was performed on Zfx, Jarid1C, Crsp2, Utx, Ube1, Ar, and Fmr1 in the cow and coast mole, in addition to human and mouse. Results from this study suggest that mouse is distinct in its more complete inactivation at several loci - Zfx, Crsp2 - on the evolutionarily newer part of the X, and Ube1 on an evolutionarily older part of the chromosome. In addition to evolutionary age, factors such as the position of the centromere, distance from the X inactivation centre (XIC), and presence of Y homologs failed to consistently explain or predict whether the genes on the X chromosome would escape or be subject to inactivation. Further epigenetic analysis is necessary to understand the distinct mechanisms leading to escape versus inactivation amongst different mammals.

Table of Contents

Abstract
List of Tables
List of Figures
List of Abbreviations
Acknowledgements

Chapter I Introduction to Dosage Compensation and Xist in Mammals
1.1) Mammalian Phylogenetic Tree: Metatheria, Prototheria, Eutheria
1.2) Dosage Compensation in Different Species
1.3) The X-Inactivation Process
1.3.1) Initial X Inactivation
1.3.2) Establishment of Stable Silencing
1.4) Lyon Repeat Hypothesis
1.5) The XIC and Elements Involved in Count and Choice
1.6) The Xist Sequence
1.7) Differences Between Species

Chapter II Sequence in Coast Mole Xist and Comparative Analysis
2.1) Introduction
2.2) Choice of Data Set and Bioinformatic Tools for this Study
2.2.1) Xist Regions and Species Data Set
2.2.2) Bioinformatics Tools
2.3) Results
2.3.1) Generation of Xist Data Set
2.3.2) Notable Differences in Sequence Characteristics in Different Eutherians
2.3.3) Choice of Orthologous Segments
2.3.4) Findings from Method 1 Analysis (Repeats to Anchor Orthologs)
2.3.5) Findings from Method 2 Analysis (Dotplot to Detect Orthology)
2.3.6) Control for CARNAC Output
2.4) Discussion

Chapter III Comparative Survey of Inactivation Status in Multiple Eutherians
3.1) Introduction to Genes that Escape Inactivation
3.1.1) Origin of Mammalian Sex Chromosomes
3.1.2) Genes that Escape X Inactivation
  Escape as a Consequence of Sex Chromosome Evolution
  Human vs. Mouse: To Escape or not Escape?
  Other Considerations of Escape
3.1.3) Differences Between Species: Imprinting, Methylation, and Escape
3.2) Introduction to Methylation Analysis
3.2.1) Generation Time
3.2.2) Constitutive Heterochromatin
3.2.3) Distance from the XIC
3.2.4) Evolutionary Age and X/Y Divergence
3.3) Results
3.4) Discussion
3.4.1) Evidence of X-Linked Loci in Cow and Mole
3.4.2) Implications of Factors in Escape
  Generation Time
  Constitutive Heterochromatin
  Distance from the XIC
  Evolutionary Age of the Region
  Presence of Y Homologs
  Zfx
  Crsp2
  Ube1x
  Utx
  Jarid1C
  Ar and Fmr1
3.4.3) Summary of Factors in Escape

Chapter IV General Conclusion

Chapter V Material and Methods
5.1) Polymerase Chain Reaction (PCR)
5.2) Cloning
5.3) Restriction Digest (PstI) Cloning
5.4) 5' and 3' RACE
5.5) Gel Extraction and DNA Purification
5.6) NCBI and BCM Search Launcher
5.7) Nucleic Acid Dotplots
5.8) Tandem Repeat Finder
5.9) CARNAC
5.10) RNAlifold
5.11) Mfold
5.12) R Statistical Package
5.13) Tissue Culture
5.14) RNA Extraction
5.15) Reverse Transcription
5.16) DNA Extraction
5.17) UCSC-Degenerate Primers for CpG Islands
5.18) Methylation Analysis

References

Appendix

Figures
A.1) Coast Mole Extended Exon 4 in Xist
A.2) Overview of Percent Identity Plots of Eutherian Xist Sequences
A.3) Percent Identity Plots of Human and Mouse XIST/Xist Genomic Sequences
A.4) Multiple Alignments of Xist Repeats A and F
A.5) RNAlifold of the Xist 5' end
A.6) Partial Conservation of Exon 2 in Rodents
A.7) Rodents Xist Exon 6 Partially Conserved and Consensus Structures
A.8) Xist Exon 5 CARNAC Results Without Rodents
A.9) Rodent versus Non-Rodent Combined Internal Exon Region

Tables
A.1) Multiple Species Xist Splice Junctions
A.2) Primers Used for Sequencing Coast Mole Xist
A.3) Pairwise Identities of Exons in Eutherians
A.4) Pairwise Identities for Xist Sequences Before the A Repeat

List of Tables

Table 2.1. Xist Repeats in Multiple Eutherians
Table 2.2. Summary of Randomized Vs. Original Xist CARNAC Results
Table 3.1. Degenerate Primers Used for Methylation Analysis at CpG Islands of X-linked Genes

List of Figures

1.1) Mammalian Phylogenetic Tree
1.2) X Inactivation in Mouse Female Mammalian Embryonic Development
1.3) Silencing of the X Chromosome after Blastulation in the Mouse
1.4) Timeline of events in Undifferentiated and Differentiated Embryonic Stem Cells
1.5) Critical Regions of XIST/Xist in Human and Mouse
2.1) The Theory Behind the CARNAC Algorithm
2.2) Coast Mole Xist cDNA Sequence
2.3) Summary of Tandem Repeats within Xist in Multiple Eutherians
2.4) Eutherian Xist cDNA Dotplots
2.5) Eutherian Xist cDNA Dotplots against Human Xist cDNA
2.6) Summary Diagram of Exon and Intron Structures of Xist Orthologs
2.7) Method 1 Analysis Diagram
2.8) Method 2 Breakdown of Xist Orthologs
2.9) Xist Sequence Before A Repeat
2.10) CARNAC and RNAlifold Structures for the Xist A Repeat
2.11) Xist Repeat F
2.12) The Region Between the A and F Repeat for Rodents
2.13) Rodent versus Non-Rodent, Region between A and F Repeats
2.14) D Repeat in Rodents
2.15) Summary of Method 1 CARNAC Results
2.16) Method 2, Segment 1 CARNAC Results
2.17) Method 2, Segment 4 of Xist CARNAC Results for Non-Rodents
2.18) Method 2, Segment 4 of Xist CARNAC Results for Rodents
2.19) Method 2, Segment 6 of Xist CARNAC Results
2.20) Method 2, Segment 7 and 8 of Xist CARNAC Results
2.21) Summary of Method 2 CARNAC Results
2.22) Xist Exon 4 CARNAC Results
2.23) Xist Rodent Unique Exon 5 CARNAC Results
2.24) Consensus Structure and Multiple Alignment of Exon 4
2.25) Summary of CARNAC Results in Exonic Portions
2.26) Proposed Functions of Xist Regions
3.1) Evolution of Mammalian Sex Chromosomes
3.2) Extent of Genes that Escape Inactivation in Human versus Mouse
3.3) Differences Between Mammals
3.4) Control PCRs for Methylation Analysis
3.5) Methylation Analysis of X-Linked Loci in Four Eutherians
3.6) Summary Diagram of Methylation Analysis Results
3.7) Potential Developmentally-Dependent Expression status of Jarid1C in Cow
3.8) Comparison of Eutherian X and Y Chromosomes
5.1) Methylation Analysis

List of Abbreviations

CARNAC: Computer Alignment of RNA by Cofolding
CIP: Calf Intestinal Phosphatase
Cot-1: Concentration over Time-1
DCC: Dosage Compensation Complex
DEPC: Diethyl Pyrocarbonate
DMSO: Dimethyl Sulfoxide
DTT: Dithiothreitol
E (e.g. E6.5): Embryonic Day 6.5
ES: Embryonic Stem Cell
FCS: Fetal Calf Serum
FISH: Fluorescent in-situ Hybridization
HMG: High Mobility Group
K: Lysine
LPCR: Long PCR
MEM: Minimal Essential Medium
MFE: Minimum Free Energy
MIS: Mullerian Inhibiting Substance
M-MLV: Moloney Murine Leukemia Virus
MSCI: Meiotic Sex Chromosome Inactivation
Mya: Million Years Ago
NEAA: Non-Essential Amino Acids
PAR: Pseudoautosomal Region
PCR: Polymerase Chain Reaction
PNA: Peptide Nucleic Acid
RACE: Rapid Amplification of cDNA Ends
RT: Reverse Transcription
SAP: Shrimp Alkaline Phosphatase
SDS: Sodium Dodecyl Sulfate
TAP: Tobacco Acid Pyrophosphatase
X:A: X Chromosome to Autosome Ratio
Xa: Active X Chromosome / Xi: Inactive X Chromosome
Xm: Maternal X Chromosome / Xp: Paternal X Chromosome
XIST/Xist: X Inactive Specific Transcript
XIC: X Inactivation Centre
XCI: X Chromosome Inactivation
XAR: X Added Region
XCR: X Conserved Region
XCE: X Controlling Element
XITE: X Inactivation Intergenic Transcribed Element
UTR: Untranslated Region
Gene name capitalized: Human locus. E.g. XIST
Gene name lower case: Locus in non-human species. E.g. Xist
Non-italicized gene name: product from the gene. E.g. Human XIST transcript

Acknowledgements

I would like to thank Dr. Carolyn Brown, my supervisor and my mentor, for all her inspiration and encouragement throughout my Master's program.
I would like to thank all members of the Brown lab for their motivation, guidance, and support, which has made my experience at UBC all the more worthwhile. I would like to express my gratitude to members of the Lefebvre and Robinson labs for the informative lab meetings and enjoyable social gatherings we have shared together. Members of my committee, Dr. Louis Lefebvre, Dr. Evica Rajcan-Separovic, Dr. Sally Otto, and Dr. Wyeth Wasserman, are greatly appreciated for their continuous constructive criticism and detailed review of my thesis. I am grateful to have had Sohrab Shah assist me with the randomization procedure and Tracy Tucker help me with my statistical analyses. Lastly, I would like to thank Sanja Karalic, whose past work in the lab has made this project possible.

Chapter I Introduction to Dosage Compensation and Xist in Mammals

1.1) Mammalian Phylogenetic Tree: Metatheria, Prototheria, Eutheria

All mammals belong to the phylum Chordata in the kingdom Animalia, which includes all animals with backbones, known as vertebrates. Chordates have been categorized into classes based on anatomical differences. Fish, our swimming vertebrate ancestors, emerged from Vertebrata at least 450-530 million years ago (mya), as approximated by the age of the oldest fish fossil found (Figure 1.1) (reviewed in [1]). The first amphibian, the cross-over between solely water-living and solely land-living vertebrates, arose around 375 mya [1]. Amphibians include frogs, toads, and salamanders, which are characterized by their smooth skin, cold-bloodedness, and ability to dwell in water or on land. Land-living reptiles, on the other hand, radiated approximately 350 mya, and include lizards, snakes, turtles and crocodiles [1]. These animals are also cold-blooded and reproduce by either laying soft-shelled eggs or bearing live young.
In contrast to mammals and birds (below), which are endothermic (body temperature regulated and maintained internally), reptiles, amphibians, and fish are ectothermic (body temperature primarily regulated by ambient temperatures). Flying animals in the class Aves evolved approximately 310 mya [1]. Unlike the vertebrates mentioned above, these birds are warm-blooded, have hollow bones and feathers and are born from hard-shelled eggs. Mammals, the fifth class in Chordata, are also warm-blooded, but they are distinguished by hair/fur, milk production, and the birth of live young [1]. Within the class Mammalia, two subclasses exist, known as Theria and Prototheria.

The prototherians (egg-laying mammals), or monotremes, are the earliest-branching mammals, diverging from the therians about 200 mya (reviewed in [2]). The name monotreme derives from the term "one-holed," referring to the single hole that acts as the urinary tract, anus, and reproductive tract in these mammals. Three species of monotremes exist, currently restricted to Australia and New Guinea: the duck-billed platypus (Ornithorhynchus) and the two spiny anteater species, or echidnas (Tachyglossus and Zaglossus) (reviewed in [2]). Monotremes exhibit both ancestral reptilian and mammalian traits, as they reproduce by laying tiny eggs with a leathery shell (less than 2 cm long), but nourish their offspring with milk produced from a gland on their belly. Like other mammals, monotremes possess hair, but have a low metabolic rate and a temperature slightly below that of therians (placental mammals). The internal temperature of a platypus is 30°C, compared to the constant body temperature of 35-39°C in other mammals [2].

Figure 1.1. Mammalian Phylogenetic Tree. A) Mammalian relationships with adjusted branch lengths, as adapted from [3]. B) Simplified version with the species of interest in this study depicted in unrooted cladogram format, based on substitutions per synonymous site relative to human rather than phylogenies within Eutheria [3]. The mole belongs to the same order, Insectivora, as the hedgehog, and the vole belongs to the same order, Rodentia, as the rat and mouse.
[Figure 1.1 labels: Fish, Amphibians, Reptiles 530-350 mya; Birds 310 mya; Prototherians/Monotremes 200 mya; Metatherians/Marsupials 185 mya.]

The therians, on the other hand, are further subdivided into two infraclasses: the metatherians and eutherians. The metatherians (pouched mammals), or marsupials, diverged from prototherians 130-185 mya [2, 3]. They have short gestational times relative to eutherians, due to a yolk-like placenta in the mother, and give birth approximately a month after fertilization. After birth, the marsupial embryo continues to develop, for weeks or months depending on the species, in the mother's pouch, as apparent in the kangaroo or opossum [2, 3]. Unlike metatherians, the eutherians (wrongly coined "placentals" since all therians possess placentae) possess an allantoic placenta and exhibit long gestational times, nourishing the developing embryo with the mother's blood supply in utero. Hence, they give birth to well-developed young who generally feed on their mother's milk for further growth and development.

Eutheria is divided into many orders, including Insectivora, Rodentia, Carnivora, Artiodactyla, and Primates, as exemplified by the mole, mouse and rat, dog, cow, and human, respectively. Their relative positions on the phylogenetic tree, based on genetic data, are shown in Figure 1.1 [3]. The phylogeny among the mammalian orders, dating back to at least 60 to 100 mya, has not been easily resolved, because mammalian radiation occurred over a short period of time.
Contrasting groupings have been derived from morphological and molecular data [4], and incongruent branching orders have been observed between molecular trees generated from different inference methods. Reconstruction of evolutionary history relies on the identification of homologous characters shared between different organisms. Until the 1970s, this was essentially restricted to morphological or anatomical characteristics. The comparison of fossils and extant species has proved powerful to some extent, in that it has categorized the major groups of animals and plants. However, this approach is hampered by the limited number of reliable homologous characters. The availability of molecular data has increased the number of characters, thereby improving the resolving power for inferring phylogenetic relationships. However, conflicts arise between results obtained by molecular and morphological data, due to sampling bias and the limited data availability. Furthermore, different trees have been generated from mitochondrial (protein-coding or RNA) versus nuclear sequences, due to differences in mutation rates, inheritance patterns of these sequences, and sampling bias of different data sets [5]. The resulting branching patterns have also depended on whether parsimony, distance, or likelihood methods were used. Fortunately, the large amount of sequence information that has emerged in the last decade has allowed for a greater number of genes and species to be represented, as well as a combination of approaches where both mitochondrial and nuclear data are included. Phylogenomic data and inference methods such as supermatrix and supertrees have become powerful to accurately establish the branching orders within Mammalia [6].
Figure 1.1A shows a recent tree using marsupials as an outgroup and phylogenies based on results from two data sets: 16.4 kb of concatenated nuclear and mitochondrial exonic data [4] using a supermatrix method, and over 12 Mb of genomic sequence based on rare genomic changes such as transpositions, insertions, and deletions [7]. Furthermore, the trees generated from these two data sets were cross-validated using more than one inference approach and received high bootstrap support for the resulting branches. I consider Figure 1.1A to be a reliable tree (perhaps the most accurate to date), as the data are rich in representative species/sequence information and the tree relied on multiple phylogenomic inference methods [3].

1.2) Dosage Compensation in Different Species

Invertebrates utilize a variety of mechanisms for dosage compensation. C. elegans, for example, accounts for the genetic imbalance between XX hermaphrodites and XO males by downregulating both X chromosomes in the hermaphrodite (reviewed in [8, 9]). A multi-protein complex that assembles on both X chromosomes presumably alters chromatin structure to reduce transcriptional activity. The X chromosomes are apparently distinguished from the autosomes by a 793 bp element which acts as a recruiting site on the X chromosome, allowing the complex to first bind and then establish silencing of neighboring regions [10]. Drosophila melanogaster (fruit-fly), on the other hand, hypertranscribes the single X in XY males to achieve equivalency with the XX gene dosage in females. The upregulation requires the dosage compensation complex (DCC), including two non-coding RNAs - roX1 and roX2 - involved in chromosomal targeting and a variety of proteins that remodel chromatin, including a histone H4 acetyltransferase, and a histone kinase (reviewed in [11]).
Within vertebrates, it has been hypothesized that birds dosage compensate for the genetic imbalances between males and females by upregulating the single Z in females, achieved by a locus on the W chromosome, which maintains the Z:autosome (Z:A) ratio at 1 [12]. Inactivation of one Z chromosome for dosage compensation does not occur, as Z-borne genes show expression from both Z chromosomes [13, 14]. The repetitive sequence (MHM, male hypermethylated) on the Z chromosome that is hypomethylated and transcribed in ZW females appears to be up-regulated by a factor on the W chromosome [12]. The non-coding female-specific MHM transcript accumulates at the MHM locus on the Z chromosome and likely represses the nearby Dmrt1 [12]. Thus, in females Dmrt1 escapes the expression of the otherwise upregulated Z chromosome, expression that would have led to male sex determination. A female-specific histone H4 lysine 16 (H4K16) acetylation is enriched on the Z chromosome in the region of the MHM locus, which is the same histone modification enriched on the up-regulated Drosophila melanogaster male X chromosome that plays a vital role in the dosage compensation [15]. Hence, it is convincing that MHM is important for both sex determination (by repressing Dmrt1 in females) and dosage compensation (because it is W-dependent and upregulates the single Z chromosome only in females).

In marsupials, X chromosome inactivation occurs as the means of dosage compensation. All female mammals, including prototherians, display replication asynchrony between their X homologs. The inactive X replicates later than the active X chromosome, and this feature is a hallmark of silent chromatin [2]. Asynchronous replication timing in platypus and echidna occurs on part of the X chromosome in lymphocytes, but not fibroblasts, suggesting that inactivation is partial and tissue-specific [2].
Kangaroo females show late replication timing as well, but only one allele is expressed in heterozygous individuals for X-linked traits, revealing that inactivation affects only the paternal allele in marsupials [16]. The silencing is not only imprinted but appears to be unstable, for loci on the inactive opossum X are readily reactivated in culture [17]. In addition, no Xist homolog (see below) has been found in metatherians or monotremes to date [18].

Eutherian mammals silence one of the Xs per diploid set in females via a mechanism mediated by the Xist non-coding RNA. This transcript is an over 17 kb alternatively spliced, polyadenylated RNA (over 15 kb in mouse) that coats the inactive X from which it is exclusively expressed to trigger the events of silencing. Xist is necessary and sufficient to initiate the silencing cascade in cis [19, 20]. In humans, inactivation is random in all tissues, compared to cattle and mice, which show paternal inactivation in extraembryonic tissues and random inactivation elsewhere. The next section discusses the eutherian X-inactivation process as it occurs in mouse because most early embryo experiments have been conducted in this species.

1.3) The X-Inactivation Process

1.3.1) Initial X Inactivation

In mouse, inactivation occurs exclusively on the paternal X in the trophectoderm, whereas random inactivation occurs in the majority of the inner cell mass, with the exception of the primitive endoderm. The mouse paternal X chromosome is inactivated very early, at either the two- or four-cell stage after fertilization, as supported by Xist coating coupled with lack of Cot-1 staining (symbolic of nascent transcription), as well as absence of elongating RNA polymerase II at the inactive X, respectively (Figure 1.2) [21, 22]. The erasure of imprints must occur in the epiblast after blastulation, so that random inactivation can occur in the embryo proper.
Consistent with this, Xist coating is lost at the blastocyst stage [22]. It is unclear whether the paternal X is inherited in a preinactivated state by retaining features of heterochromatin from the meiotic sex body formation in male spermatogenesis [22-24], or whether both Xs are initially active followed by early inactivation of the X bearing the paternal imprint in all cells [25]. Although it is possible that the paternal X retains imprints from meiotic sex chromosome inactivation (MSCI) in the father, there are major differences between the two types of X inactivation processes. One obvious difference is that MSCI is Xist-independent (although Xist is present), as mice with deleted Xist loci are still fertile and able to form X-Y bodies during pachytene [26, 27]. A second difference is that the inactive Xs in each case carry different epigenetic marks: the H4 histones of the X involved in MSCI are not hypoacetylated [28] and a different histone H2 variant, the phosphorylated H2AX, accumulates on the sex chromatin [29]. The role of MSCI seems drastically opposite to that of somatic X inactivation. In the former case, transient silencing of the X results in a large dosage imbalance between females and males. MSCI may serve to silence genes detrimental to spermatogenesis or to allow for more efficient synapsis in males for proper segregation during gametogenesis [30].

[Figure 1.2: diagram of paternal X (Xp) and maternal X (Xm) status from the female zygote through the 2-cell, 4-cell, 8-cell, 16-cell morula and 32-cell blastocyst stages.]
Figure 1.2. X Inactivation in Mouse Female Mammalian Embryonic Development.
The two leftmost columns describe the stage of embryonic development, while the two rightmost columns depict and describe X inactivation associated events at the chromosomal level, respectively [25, 32-37].

1.3.2) Establishment of Stable Silencing

The paternal X in the mouse early embryo is only partially inactivated. With fluorescent in situ hybridization (FISH), the mouse paternal X shows only a small signal of Xist localization, indicating partial Xist coating. Monoallelic expression of X-linked genes occurs at loci close to the Xic but biallelic expression is still seen at loci situated at a greater distance [25]. Histone H3K4 (lysine 4) hypomethylation and H3K9 hypoacetylation occur at the 8-cell embryo stage, followed by H3K9 methylation by G9a [31] and K27 methylation by Eed/Ezh2 (mPRC2), thought to be important for establishing chromosomal memory of the inactive state [32]. MacroH2A recruitment is a late event, occurring in the 16-cell morula, followed by H3K9 dimethylation in the 32-cell blastocyst (Figure 1.2) [25]. At the blastocyst stage, cells of the mouse trophectoderm and primitive endoderm show stabilized paternal X inactivation, marked by complete Xist coating, and monoallelic expression of loci independent of distance from the Xic (Figure 1.3). On the other hand, reactivation of the paternal X occurs in the inner cell mass of the implanted blastocyst accompanied by loss of Xist coating and associated proteins, and the mechanisms of count and choice occur to establish random inactivation (Section 1.5 below).

In undifferentiated embryonic stem (ES) cells, silencing is reversible and dependent on continuous Xist expression, possibly because it lacks hallmarks such as histone hypoacetylation and/or DNA methylation, which might be important for maintenance (Figure 1.4) [33]. X inactivation in undifferentiated stem cells is not associated with late replication of the inactive X.
Upon the onset of differentiation, Xist coating occurs followed by histone H3 hypoacetylation at K9 and K27, H3K4 hypomethylation, H3K9/K20/K27 hypermethylation and H2A ubiquitinylation by the polycomb protein Ring1b [32, 34-38]. Thus, methyltransferases, acetylases, and polycomb proteins accompany Xist in remodeling the chromatin to establish silencing. This inactive chromatin is maintained synergistically by later modifications including H4 hypoacetylation, macroH2A recruitment, and DNA methylation [39, 40]. After two days of differentiation, silencing becomes irreversible, and subsequent expression of Xist fails to induce silencing in mice [41].

Figure 1.3. Silencing of the X chromosome after Blastulation in the Mouse. The establishment of random inactivation in the embryo proper and paternal inactivation in the extraembryonic tissues in the mouse [25, 65].
[Figure 1.3 panel labels: Trophectoderm (and primitive endoderm): stabilized silencing on inactive Xp, non-random inactivation. Epiblast and embryonic stem cells: loss of Xist and proteins on Xp, erasure of imprints, count and choice mechanisms, random inactivation; Xist coats the future inactive X, BRCA1-dependent.]

Figure 1.4. Timeline of events in Undifferentiated and Differentiated Embryonic Stem Cells [31-41]. A) Silencing characteristics in undifferentiated and differentiated stem cells in mouse. List of events related to X inactivation in chronological order after differentiation of embryonic stem cells. The green arrow signifies the point at which silencing becomes irreversible.

1.4) Lyon Repeat Hypothesis

Because LINE-1s (long-interspersed nuclear elements) comprise 30% of the X chromosome (two times the average abundance in the genome), they are postulated to serve as "way-stations" or "booster" elements to help propagate the silencing signal on the X [42, 43]. In support of this hypothesis, LINE-1s are found at a higher density within 20 kb of random monoallelically expressed loci compared to biallelically expressed loci [44]. L1 elements are also enriched on the long arm of the X, where most genes are subject to inactivation [45]. This enrichment is observed across three eutherian clades [46]. However, this correlation could simply be a remnant of evolution rather than reflect an active role in enhancing the Xist signal along the X chromosome. The mechanism by which LINE-1 works is unknown, but intrachromosomal pairing of L1s possibly amplifies the inactivation signal [43]. Although most L1 elements are repressed in the genome via methylation at their promoter, L1s are likely hypomethylated prior to X-inactivation, such that intrachromosomal pairing could occur. This has been supported by the observation that L1 methylation of the inactive X does not occur until after L1 methylation on the active X, being mediated by a distinct methyltransferase [47].

1.5) The XIC and Elements Involved in Count and Choice

In mammals, one X is kept active per diploid autosome set and all other Xs present in the same cell are inactivated. This "counting" phenomenon is clearly demonstrated in tetraploids, which retain two active X chromosomes, and triploids, which carry either one or two active Xs. Females trisomic for the X chromosome, on the other hand, retain only one active X, suggestive of autosomal involvement (reviewed in [48]).
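The counting rule described above can be sketched in a few lines of Python. This is only an illustration of the stated observations (diploid: one active X; tetraploid: two; triploid: one or two), not a model of the underlying mechanism, which is unknown. The function names and the treatment of odd ploidies other than triploid are assumptions introduced here for illustration.

```python
def expected_active_x(autosome_sets: int) -> tuple[int, ...]:
    """Possible numbers of active X chromosomes for a given ploidy.

    Encodes "one active X per diploid autosome set". For odd ploidy the
    text only describes triploids (one or two active Xs); extending that
    pattern to other odd ploidies is an assumption of this sketch.
    """
    if autosome_sets < 2:
        raise ValueError("sketch only covers diploid and higher ploidy")
    base = autosome_sets // 2
    if autosome_sets % 2 == 0:
        return (base,)
    return (base, base + 1)


def expected_inactive_x(total_x: int, autosome_sets: int) -> tuple[int, ...]:
    """All Xs beyond the active ones are inactivated (never below zero)."""
    return tuple(max(total_x - n, 0) for n in expected_active_x(autosome_sets))


# A diploid female trisomic for the X keeps one X active and silences two.
print(expected_active_x(2), expected_inactive_x(3, 2))
```

Because the number of active Xs depends only on the autosomal ploidy and not on the number of Xs present, the sketch reflects the autosomal involvement suggested in the text.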
How exactly the cell counts the number of Xs is unknown, but a 20 kb bipartite domain downstream of Xist (see below) appears to be involved [49, 50]. After determining the number of Xs to remain active, there is a choice between which X homolog per diploid set should be inactivated. This decision is stably inherited from parent to progeny. The mechanism of "choice" is complex and involves multiple elements of the Xic, including a roughly defined X controlling element (Xce) 3' of Xist that leads to complete nonrandom inactivation when deleted [51, 52], and Tsix, located 12 kb downstream of Xist, which negatively regulates the Xist RNA both by its overlapping transcription and by its antisense sequence [53]. Absence of the Tsix promoter leads to preferential inactivation of the mutant X allele, whereas upregulation of Tsix expression leads to the opposite outcome [54-56]. Deletions of Xite, which lies upstream of Tsix, reveal that it is a positive regulator of Tsix, suggesting that this locus may be equivalent to the previously described Xce [57].

Once the count and choice processes are established, the inactivation mechanism itself requires the X inactivation centre (Xic) on the X chromosome, a region originally mapped to a 1 Mb region (Xq13) by human X;autosome translocations and refined to a 450 kb region by mouse transgenic studies. The Xic encodes a number of non-coding RNAs, including Xist.

1.6) The Xist Sequence

Xist is necessary and sufficient to initiate silencing of the X chromosome in cis [19, 20]. In addition to the X chromosomes, autosomes are capable of being silenced, though less completely and stably, as evidenced by unbalanced X;autosome translocations leading to apparently normal phenotypes [58]. The same autosomal segment can either escape or be inactivated, and the spread of inactivation can be discontinuous or continuous depending on the rearrangement [59].
The chromatin environment changes the outcome of the inactivation process, and the Xist RNA must somehow be functionally constrained to work most effectively in its original context. The domains that are important for recognizing the X chromosome in cis, effectively propagating the silencing signal, and recruiting chromatin remodelers to the future inactive X are largely unknown. Because structure is closely connected to function, some attempts have been made to predict structures of biological significance within Xist using comparative sequence analysis. Three-way comparisons involving mouse-human-vole or mouse-human-cow XIST/Xist sequences have in general revealed low primary sequence conservation [19, 60, 61]. The XIST/Xist sequence conservation between human and mouse is 60-70% in gap-free regions, although some regions show up to 80% identity [19, 20]. Notably, there are many regions that are unique to either of the species. Although potential open reading frames (ORFs) were detected in the mouse and human sequences, they were less than 600 bp (483 bp in human and 576 bp in mouse), a length likely to be observed by chance alone [19, 62]. The most highly conserved blocks were located at the 5' end of the gene around transcript positions 250-800 bp in both species, consisting of 43-59 bp tandem repeats separated by AT-rich spacers (coined the "A" repeat), as well as a C-rich repeat 1 kb downstream (coined the "B" repeat) [19]. XIST/Xist has up to six sets of tandem repeats, comprising about 50% of the entire processed transcript [20, 63].

Surprisingly, the global sequence similarity between mouse and vole was found to be only 57%, although the two rodents diverged into different families just 15-25 mya [63]. This estimate was not much higher than the conservation between human and vole, found to be 49% overall.
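The percent identities quoted above depend on how gapped positions are treated. As a rough illustration only, the following Python sketch computes identity over the gap-free columns of a pairwise alignment. The function name `percent_identity`, the "-" gap character, and the toy sequences are assumptions made for this example; the cited studies may have used a different definition of identity.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the gap-free columns of a pairwise alignment.

    Assumption: both inputs are rows of the same alignment, so they have
    equal length and use `gap` to mark insertions/deletions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    # With no gap-free columns there is nothing to compare
    return 100.0 * matches / compared if compared else 0.0


# Toy alignment (made-up sequences, not Xist data):
# 4 identical columns out of 5 gap-free columns
print(percent_identity("ACGT-A", "ACCTTA"))  # 80.0
```

Counting gapped columns as mismatches instead would lower the value, which is one reason identities reported by different studies are not always directly comparable.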
Comparison of exonic regions between cow, human, and mouse further confirmed that the sequence similarity between XIST/Xist orthologs was lower than would be expected for protein-coding regions (previously established to be 85% by comparing nearly 2000 unique rodent versus human mRNAs) [60]. The study detected 66% sequence identity between human and mouse and 62% between mouse and cow. These values are even lower than the expected sequence conservation of 5' (67-79% identity) and 3' UTRs (69-74% identity) between rodent and human mRNAs [60]. Based on these analyses, there appears to be low evolutionary pressure to maintain the primary gene sequence of Xist, perhaps because the precise DNA sequence is less important than the secondary structure that the RNA adopts for functioning in X inactivation.

Despite poor global DNA similarity, there are several regions of high local similarity within exon 1 and exon 6 between the human, mouse and the four vole species. These include the six tandem repeat elements, designated A-F [19, 20, 63]. Notably, there are differential insertions and truncations specific to each species. In human, the B repeat is interrupted by an insertion such that it is split in comparison to the rodents [63]. In rodents, the C repeat (115 bp per monomer) has been truncated in repeat length but greatly expanded in copy number, relative to the single copy in human [19, 63]. The D repeat (290 bp per monomer), which spans over 3 kb in mouse and voles, is further expanded in human [19, 20, 63]. The last repeat, E (T-rich), which is situated within exon 6, seems to be variable in the vole, human, and mouse sequences, but always occurs within the 5' end of the second biggest exon [48]. The repetitive elements of Xist have been speculated to form secondary structures important for binding DNA or proteins necessary for dosage compensation [19].
Transgenic studies using deleted variants of Xist cDNA introduced into mouse ES cells have revealed that Xist localization involves the cooperation of the redundant repeat sequences A, B, D, E, F and parts of C. Silencing involves the A (5' conserved) repeat (Figure 1.5) [64]. The copy number of this repeat is important, as a minimum of 5.5 copies is necessary to achieve inactivation of genes along the mouse X chromosome [64]. A threshold number of repeat copies might be required to form structures important for recruiting proteins that establish silent chromatin.

Figure 1.5. Critical Regions of XIST/Xist in Human and Mouse. Functional regions of XIST/Xist established by deletion studies or PNA interference mapping in human or mouse [64, 69, 87; personal communication, Jennifer Chow]. Repeats A-D are within exon 1 of XIST/Xist and repeat E is within exon 6 of human XIST or exon 7 of mouse Xist.

On the other hand, macroH2A recruitment requires both repeats D and E, but silencing via repeat A can occur in the absence of D and E, confirming that macroH2A recruitment is a late event that is not essential for initial inactivation [64]. MacroH2A is not the only protein that is recruited to the inactive X. Other domains of Xist must interact with key players such as BRCA1, the only protein thus far shown to be necessary for the localization of Xist [65]. G9a [31], Eed/Ezh2 [36, 38, 66], Ring1b [34, 67], RNA polymerase II, and histone/DNA modifiers or polycomb group proteins [37] are other candidate interacting proteins.
Way stations such as L1s or LTRs that play a role in enhancing the silencing signal of Xist might also bind to the RNA. In addition, it has been recently demonstrated that Xist interacts with the SAF-A scaffold protein to constrain the RNA within the nucleus near the X chromosome [68]. Finally, the stability of Xist could result from RNA structures that prevent degradation. Because secondary structures may be important for the function of Xist despite poor DNA sequence conservation, comparative sequence analysis with a greater number of eutherian species will shed light on RNA structures that are common and indicative of significant biological functions.

Xist RNA structures have been investigated in human and mouse. With the Mfold software, the A repeat was predicted to fold into a stable hairpin 1-hairpin 2 structure in both human and mouse (Figure 1.5) [64]. In addition, human exon 4 (mouse exon 5), which shows better sequence similarity relative to the rest of the RNA, was predicted to fold stably into a long hairpin structure in each species [69]. The biological relevance of this conserved domain is currently unknown, as deletions of this region in Xist altered neither localization nor inactivation ability [69]. However, a decrease in the steady-state level of the mutant compared to the wild-type RNA was seen, suggestive of an effect on stability or processing of Xist [69].

1.7) Differences Between Species

Within eutherians, differences arise in the X-inactivation process. The antisense regulation of Xist is different between mouse and human. Although the Tsix transcript overlaps completely with the Xist locus in mouse, only a truncated form is found in human that does not overlap with XIST completely and lacks exons and promoter regions characteristic of Tsix [70, 71].
In addition, the critical CpG island methylation site of DXPas34 that is believed to regulate Tsix expression (important for imprinting) is absent in both human and cow, although cow still shows paternal inactivation in extraembryonic tissues [72]. Furthermore, continued expression of TSIX is coincident with XIST expression in human and is found at low levels, whereas mouse Xist is down-regulated on the chromosome that highly expresses Tsix (the future active X) during early development [73]. In female cow somatic tissues (differentiated), Tsix and Xist are co-expressed in the same cells and maintained in the adult, again arguing against Tsix's role in antisense regulation in this species [74].

Xist itself is different between eutherians. Variations in sequence and secondary structure between human XIST and mouse Xist might account for the observed differences in binding affinity. The mouse Xist is retained on metaphase chromosomes, as opposed to human XIST, which is released in prophase [73]. Species-specific factors in X inactivation might explain the inability of human XIST to localize to the human X in human-mouse somatic cell hybrids [75]. Examining the Xist sequence in multiple eutherians will provide insight into how Xist has evolved in different lineages, leading to variations in function and proteins involved in the X-inactivation pathway. Such studies form the basis of Chapter 2. Another large difference between human and mouse is the extent of silencing along the X chromosome. This topic will be addressed in detail in Chapter 3.

Chapter II
Sequencing Coast Mole Xist and Comparative Analysis

2.1) Introduction

Xist must interact with a wide array of trans-acting players and cis-elements in order to bring about the inactive state of one X chromosome in mammalian females. The apparently simple outcome of stable X inactivation is accomplished by a complex sequence of events.
Important interacting players include chromatin modification and restructuring proteins, regulatory and recognition machinery, and molecules which initially mark the chromosomes for silencing (see Chapter 1). How Xist might interact with these candidates and how it manages to coat the inactive X in cis is still poorly understood. Genetic studies have demonstrated that silencing is distinct from Xist coating (Section 1.6), as Xist localization on its own is insufficient to cause X inactivation [64]. Furthermore, localization seems to require species-specific autosomal factors, since human XIST cannot coat the human X chromosome in a human-mouse hybrid [75]. Apparently in this case, the human complement of the factors necessary to guide XIST to the corresponding X is absent, and the complete set of mouse chromosomes in the hybrid fails to replace this function. Thus, the autosomal factors that help Xist to localize might only recognize the chromosomes of their own species.

How Xist might structurally interact with these localization factors is addressed in this study. As mentioned previously (Section 1.6), localization involves redundant domains of Xist in the mouse model [64], but whether this statement generalizes to other eutherians is unknown. Xist sequence differences between species might lead to altered localization, as the mouse Xist transcript is able to localize and remain tethered to the mouse inactive X beyond metaphase, whereas human XIST falls off during this phase of cell division [73]. Even though the human and mouse transcripts must be dissimilar to account for observable differences in binding affinity, when human YAC XIST transgenes are expressed in mouse ES cells, the human XIST transcript is still able to coat and partially inactivate the mouse autosome from which it is transcribed [76]. This indicates that the transcripts are interchangeable to some extent for localization and silencing.
Presumably, there are regions in common between human and mouse XIST/Xist transcripts to bring about these functions, but these sites have not been clearly delineated and it is unknown whether the same regions are present in other eutherians.

The Lyon hypothesis proposes that L1 elements, which are enriched on the X chromosome relative to the rest of the genome, might act as "booster-elements" to propagate the spread of the silencing signal along the X [42]. Questions of how Xist might interact with these cis-elements and what domains are necessary to cause silencing remain to be answered. If the A repeat that is necessary for silencing is what interacts with L1 or other important cis-elements, then this repeat should hypothetically be found in all eutherians.

Certainly, as structure is tightly tied to function, especially for non-coding RNAs, obtaining secondary structure information for XIST/Xist would be useful to understand how the transcript works. Unfortunately, the large size of XIST/Xist (>15 kb in both cases) has made it difficult to determine structure with methods such as NMR, X-ray crystallography, chemical digestions, denaturing temperature differences in paired vs. unpaired regions, and footprinting.

The purpose of this study is to compare orthologous Xist sequences from different eutherians in order to find important domains of the transcript. The global primary sequence conservation, consensus secondary structure prediction from aligned sequences, and common structures found in all species will be useful to identify significant regions. Furthermore, the gene structure of Xist in terms of exon/intron properties and repeats has been assessed across the multiple species to gain insight into the evolution of the primitive Xist transcript. Differences between species will help to explain variation of Xist function within Eutheria.

2.2) Choice of Data Set and Bioinformatic Tools for this Study

2.2.1) Xist Regions and Species Data Set

Initial comparative sequence analyses of Xist concentrated on the conservation of DNA sequence, rather than secondary structures. Since Xist functions as a non-coding RNA, evolutionary pressures have presumably constrained the proper folding of the transcript, rather than primary sequence, to ensure efficient function. In other words, nucleotides can diverge substantially at the primary sequence level, as long as compensatory mutations occur to preserve the RNA structure. Additionally, repetitive regions may fold into similar structures, despite having poor sequence conservation. Since repeats carry a high potential for expansion, these repeats may serve to maintain a certain threshold of size for particular roles of the RNA. Thus, the focus of this study is on conserved folded structures using bioinformatics, with a large emphasis on the repetitiveness of Xist, which is highly conserved in the eutherians examined thus far. These analyses have assumed that conservation in diverged species implies functional significance, in order to highlight regions of biological function within the Xist molecule.

The dog, rat, and cow genomes have been sequenced, and their data are available. Additionally, I have sequenced ~14.5 kb of the coast mole Xist using PCR and traditional cloning approaches (Sections 6.1-6.5). Experiments have demonstrated that the processed Xist transcript is sufficient for coating and silencing. Hence it has been necessary to identify exonic boundaries within the dog, rat, cow, and coast mole sequences. Given that insectivores, carnivores, artiodactyls, rodents and primates are not clustered on the eutherian phylogenetic tree (Section 1.1 and Figure 1.1), the orthologs from the above species (coast mole, dog, cow, rat, mouse, human) have been chosen to give a representative picture of the Xist transcript in this mammalian subclass.
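To make the idea of compensatory mutations concrete, the sketch below classifies two alignment columns that are assumed to base-pair in a predicted stem. It is a simplified illustration, not the scoring used by any particular program: the function name `classify_pair`, the four labels it returns, and the toy sequences are assumptions introduced here, and G-U wobble pairs are treated as valid pairs.

```python
# Watson-Crick pairs plus the G-U wobble pair
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def classify_pair(alignment, i, j):
    """Classify alignment columns i and j (0-based), assumed to form a base pair.

    Returns one of (labels are illustrative, not a standard vocabulary):
      "not conserved" - at least one sequence cannot form the pair
      "invariant"     - every sequence carries the same pair
      "compensatory"  - both positions changed, pairing preserved (e.g. G-C -> A-U)
      "consistent"    - only one position changed, pairing preserved (e.g. G-C -> G-U)
    """
    pairs = []
    for seq in alignment:
        rna = seq.upper().replace("T", "U")  # accept DNA or RNA letters
        pairs.append((rna[i], rna[j]))
    if not all(p in CANONICAL for p in pairs):
        return "not conserved"
    distinct = set(pairs)
    if len(distinct) == 1:
        return "invariant"
    # Compensatory: some two sequences differ at BOTH positions yet still pair
    if any(p[0] != q[0] and p[1] != q[1] for p in distinct for q in distinct):
        return "compensatory"
    return "consistent"


# Toy 5-nt "hairpins" (made-up sequences): columns 0 and 4 close the stem
print(classify_pair(["GAAAC", "AAAAU", "CAAAG"], 0, 4))  # compensatory
print(classify_pair(["GAAAC", "GAAAT"], 0, 4))           # consistent
```

In this toy case the three sequences share only the loop nucleotides at the primary sequence level, yet all three can close the same stem, which is the kind of signal a structure-based comparison looks for.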
2.2.2) Bioinformatics Tools

To evaluate primary sequence conservation, ClustalW alignments were produced and pairwise percent identities were calculated per region of Xist, across the seven eutherian orthologs. MultiPipMaker [77] was used to generate percent identity plots of the Xist sequences relative to human and mouse XIST/Xist.

Unfortunately, to date, there are no programs available that will quantitatively assess the percent conservation of RNA foldings across an entire transcript (i.e., provide global structural conservation information in RNA). The poor DNA alignment and the large size of Xist make this especially problematic because many RNA conservation prediction tools require an initial alignment and cannot handle sizes larger than a few kilobases without prohibitive computational times [78]. One problem that must be overcome is defining what is considered a "conserved RNA structure." How would an algorithm be designed to recognize a conserved RNA domain in the context of an entire complex RNA molecule? Would a hairpin that is present in the RNA of all species be considered "conserved" and biologically significant even though it belongs to drastically different, larger predicted structures in each species? Should conservation be defined at individual hairpins/stems/loops or at complex global structures? Also, what should the assigned relative scores of conservation be based upon? Due to the existence of these problems, the present analysis of Xist RNA structure has been reduced to a qualitative one.

However, methods to draw consensus structures with alignment data are available. I have examined consensus RNA structures for regions of Xist that are conserved sufficiently in DNA sequence for proper alignment. RNAalifold was used to predict consensus structures from aligned Xist repeats.
The algorithm [79] simultaneously considers both thermodynamic stability (minimum free energy) and compensatory mutations to generate an optimal RNA structure, presented in PostScript format. The given overall energy of the structure is the energy averaged over all sequences in the alignment. Bonus energies for compensatory and consistent mutations are assigned, while penalties are given for predicted base-pairs of secondary structures that are not common to all sequences. Because RNAalifold can only handle 2 kb at a time with a running time of O(n^3) (proportional to the length in nucleotides cubed), repeats whose lengths exceeded this limit due to high copy number were reduced by whole monomers until they fit the limit [79].

Constructing multiple alignments of Xist was generally infeasible due to high sequence divergence and ambiguities as to which species segments were analogous. However, for the internal exons, orthology was clear. For these segments, I initially used Mfold [80] to predict the RNA structure of single sequences based on minimum free energy. The results were qualitatively assessed for the appearance of globally similar structures for each species. The categorization of structures as "similar" was not feasible, so this approach was abandoned. Nevertheless, some of the Mfold predicted structures are shown in the results section for comparison of images generated from different programs.

The CARNAC computational method [81, 82] was used in a more systematic attempt to find conserved structures. Unlike RNAalifold, this method requires no input alignment and hence works for diverged but related sequences. It indicates similar structures within each sequence, rather than giving one consensus structure based on all input data.
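The whole-monomer trimming described earlier in this section for RNAalifold input, together with the cubic scaling that motivates it, can be sketched as follows. This is only an illustration under stated assumptions: the function names `trim_to_limit` and `relative_cost` are hypothetical, the monomers are made-up placeholders rather than Xist sequence, and dropping monomers from the 3' end is an arbitrary choice, since the text does not state which copies were removed.

```python
from typing import List


def trim_to_limit(monomers: List[str], limit: int = 2000) -> List[str]:
    """Drop whole repeat monomers until the total length fits within `limit` nt.

    Assumption: monomers are removed from the 3' end; the source does not
    say which copies were discarded.
    """
    kept = list(monomers)
    while kept and sum(len(m) for m in kept) > limit:
        kept.pop()  # remove one whole monomer, never a partial one
    return kept


def relative_cost(length: int, reference: int = 2000) -> float:
    """Running time relative to a `reference`-nt input, assuming O(n^3) scaling."""
    return (length / reference) ** 3


# Hypothetical repeat: 13 copies of a 290-nt monomer (3770 nt in total)
repeat = ["A" * 290] * 13
trimmed = trim_to_limit(repeat)
print(len(trimmed), sum(len(m) for m in trimmed))  # 6 1740
print(relative_cost(4000))                          # 8.0
```

Under cubic scaling, doubling the input length multiplies the running time by eight, which is why long, high-copy repeats have to be shortened rather than simply submitted whole.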
One advantage is that it is able to handle up to 2 kb per input sequence at a time and is computationally simple, with a run time of a few seconds for shorter sequences (<300 nt) and a tolerable several minutes for longer input (up to 2 kb). CARNAC uses energy minimization to first predict the most stable stems in each input sequence. This is followed by detection of analogous stems from pairwise foldings of orthologs [81, 82], which is achieved by phylogenetic comparison and sequence conservation. The program performs a low-stringency alignment of the input sequences to look for short areas of sequence similarity to use as anchors for comparison. These anchors serve as contextual information so that folded structures can be compared between orthologs at corresponding locations. Stems that are similar between pairwise comparisons are then categorized into the same "components," consisting of nodes (each node represents a stem found in an ortholog) connected to one another if they are common between pairs of input sequences (Figure 2.1). Components with fully interconnected nodes equal in number to the input sequences are considered to be the most reliable stems, and are reported by CARNAC as conserved structures [81, 82]. These represent stems that are present in each sequence/ortholog and are shared between any pair of orthologs.

One disadvantage of CARNAC is that the output gives no quantitative measure of how confident one should be of the common stems found, but simply indicates the locations of the common stems in text (connect files, .ct) or visual formats (JPEG, .jpg; PostScript, .ps). The program finds common structures that are present in only a subset of the orthologs and also indicates when no folding commonalities can be found among the sequences inputted [81]. At least three different causes can block the appearance of shared structures even if they do exist.
These include overly high sequence similarity (>95%) between the inputs, which does not allow the program to take advantage of compensatory mutations to infer common structural foldings; too highly diverged sequences (>50%), which do not allow the program to use locally conserved regions to compare stems; and the presence of pseudoknots, which the program cannot detect [81].

The performance of this program was previously evaluated by testing it on diverged ciliate telomerase RNA sequences (differing lengths of approximately 200-300 nt, unalignable with traditional methods) from three species known to share common structures despite weak conservation. This confirmed CARNAC's ability to recover the structure of each, complete and consistent with the reference structures in Rfam (an RNA structure database) [83], with low false positive rates. As well, CARNAC was used in cases where the RNAs shared only a partial common structure in relatively long sequences, as in the 5' UTR of three 1800 nt mRNAs from enteroviruses. The analysis showed that CARNAC predicted conserved stems only before the start codons, in accordance with known data [82]. Although the performance of CARNAC was evaluated only qualitatively, it seemed suitable to use as a tool to analyze Xist, at least to get a crude picture of conserved regions found in a number of eutherians, as a starting point for experimental assays to test functionality of these regions in the future.

[Figure 2.1 schematic. Source label: Nucleic Acids Research, 2004, Vol. 32, Web Server issue. Panels: 1. all potential stems (seq1, seq2, seq3); 2. pairwise foldings (seq1/seq2, seq1/seq3, seq2/seq3); 3. stem graph.]

Figure 2.1. The Theory Behind the CARNAC Algorithm. Stable stems are first predicted for each sequence independently, followed by co-foldings to find stems common to each pair.
This information is used to build a stem graph which categorizes similar stems into the same component. The components with the number of nodes equal to the number of input sequences (signifying that the stem was found in each ortholog) as well as fully connected nodes (symbolizing that each pairwise comparison contained analogous stems) are taken to be the most reliable stems, given as CARNAC output [81, 82].

2.3) Results

2.3.1) Generation of the Xist Data Set

The Xist cDNA sequences from dog, rat, cow, human, mouse, vole and coast mole were used for comparative analysis. Because the X chromosome sequences of dog and rat are available on NCBI but have not yet been annotated, human XIST cDNA (gi: 340393) was compared with the dog and rat genomes (blastn) to identify Xist orthologs (MapViewer, NCBI). BLAST retrieved several dog and rat supercontigs that corresponded to repetitive regions within the human XIST sequence. Since the boundaries of dog and rat Xist within these contigs were unknown, the approximate 5' and 3' ends of the gene were extrapolated based on size. The predicted Xist ortholog sequences were then compared in dotplots against human XIST cDNA until the observed alignments spanned the entire human XIST sequence (i.e., sequences that showed alignments in excess of the human XIST sequence were truncated, and those that were too short were elongated until the plots reached full coverage). The resulting dog Xist genomic sequence consisted of two contigs that were combined (see Table A.1 for accession numbers), while the rat Xist sequence corresponded to the reverse complement of a portion of a single contig. Since Xist cDNA sequences were necessary for the subsequent bioinformatics analysis, approximate exon/intron boundaries within the dog and rat genomic sequences were defined using dotplots against human and mouse XIST/Xist cDNA.
This was followed by prediction of potential splice junctions using NNSPLICE version 0.9 (http://www.fruitfly.org/seq_tools/splice.html). The most probable of these splice sites were selected based on sequence similarity to known exons in vole, mouse and human [63], as well as consistency with estimated sites from the dotplots. Data regarding the exon and intron boundaries for the predicted dog and rat Xist are shown in Table A.1. From this information, the "introns" were removed from the Xist genomic DNA sequences to produce the virtual dog and rat cDNA sequences.

Initial coast mole Xist fragments were obtained via inverse PCR with conserved primers by a former graduate student, Sanja Karalic [84]. I sequenced the remainder of coast mole Xist via a progressive PCR approach, using primers that were mole Xist-specific and degenerate (Table A.2). Large segments within exon 1 gave multiple low-intensity PCR products due to repetitiveness in the region. Gel purification of the desired PCR products followed by sequencing failed to give results due to remaining contaminants and/or low template concentration for the sequencing reactions. This demanded that I use traditional cloning to increase the yield of the desired products for sequencing. Therefore, for these regions, I cloned either original purified PCR products (TA cloning) or PCR products that were restriction enzyme digested (creating sticky ends for vector ligation) (Figure 2.2). To sequence the internal exons, coast mole cDNA was generated from reverse transcription of RNA extracted from cultured female mole fibroblasts. Conserved primers designed from exons 2 and 4 were used for subsequent PCR amplification of the coast mole cDNA.
In total, I sequenced 14.5 kb of coast mole Xist, whose alignment with other Xist sequences confirmed that the transcript has almost been entirely sequenced, with the exception of 1 kb at the 5' end and 2.5 kb at the 3' end.

2.3.2) Notable Differences in Sequence Characteristics in Different Eutherians

Disparities between eutherian Xist function could result from sequence or structural variability within the orthologs. To examine differences between eutherian sequences, Xist repeats from the seven eutherians were determined using Tandem Repeat Finder. The repeat sequences were verified by dotplot comparison and by visual inspection. Details about the repeats (consensus sequence, monomer size, copy number, percent identity, location) in the different species are listed in Table 2.1 and Figure 2.3. The dog Xist sequence contains the complete set of repeats established in human and mouse. In addition, the dog B repeat is interrupted by an insertion similar to the situation in human [63]. The locations of the dog B and C repeats are reversed compared to other eutherians, suggestive of an inversion (Table 2.1). No C repeat (truncated or full length) was detected in cow or mole Xist using BLAST, low-stringency dotplot comparisons, visual inspection, or the Tandem Repeat Finder program (Table 2.1). However, a low number of copies (1-2X) was observed in human, dog and vole, while the C repeat was expanded in mouse and rat (Table 2.1).

Cow Xist contains at least two repeat elements detectable in dotplots (Figure 2.4) that were not visible in other eutherian sequences. The four non-rodents (human, mole, cow, dog) possess expanded D repeats compared to those observed in the rodents (Figure 2.5). Although all seven species display a characteristic repetitive region in the centre of the D repeat known as the D core [20] (Figure 2.5), the composition of the remainder of the D repeat differs between rodents and non-rodents. Specifically, according to dotplots, rodents lack obvious sequence
Specifically, according to dotplots, rodents lack obvious sequence  24  COAST MOLE Xist  700bp  4600  380bp 1507bp  /  19 23  R2  \  700bp/ 15  R3  R11 120  R8  1074  2581  R17  -27  s  4kbPCR (expect 2.1 kb)  1kb, 150bp PCR expect 600bp  1.4kb  9147 20  24  rT  11938  R13 R10  18 *  , R9 R12  5kbPCR (expect 4547bp)  25  ^  n  11300  12104  EXON 1 (11364 bp in human)  1  |—| sequenced PCR product |  j inverse PCR  4 0 0 b  P  green numbers refer to locations relative to human XIST  • - - - amplified with PCR  * 17  21  \  \  R16 13743 1 2  EXON 6 (4.5KB in human)  coast mole primer E2  sequenced clone  1—  1 22  R16  e 6  R5  >26  E3  E4  E5  E6  e6  human primer conserved primer  200bp  400bp PCR (expect 207bp)  F i g u r e 2.2. C o a s t m o l e Xist c D N A Sequence. Primers used for sequencing are shown with degenerate primers shown as purple arrows and coast mole Xist specific primers shown i n black. Green numbers indicate approximate locations o f the coast mole sequence relative to human XIST based on B L A S T alignments. P C R product sizes are g i v e n in bp or kb. The internal exon region (exons 2-5) is magnified below for clarity.  T a b l e 2.1, Xist Repeats in M u l t i p l e Eutherians. Repeat location, copy number, total length, monomer length, and percent identity are listed. The accession numbers of the Xist sequences are indicated.  A I ' lOOOnt in RNA; hairpinl + hairpin2 + A T - r i r h snacer . 
\ 1 —1 1L 11 284-781 (8.5X) = 429nt total 43-59nt monomer 78-86% s  Human cDNA GI: 340393 M97168 gDNA GI:45269107 U80460 (49937-82015)  Mouse cDNA GI:202420 AJ421479 gDNA GI:21425583 AJ421479 (106332I L  / SOU;  Vole cDNA gDNA GI: 13445263 AJ310127 C o w cDNA gDNA GI:21425595 AJ421481  F ~750nt downstream A, discovered in vole); 16bp motif 1443-1544 (2X) = 10Int total 42nt monomer 70%  B C rich 4-9bp motif  C  D 290bp monomer  E AT rich Variable Repeat  Bh:1975-2068 ( 1 2 X ) = 93nt total 7-9nt monomer 87%  3045-3090 (1.9X) = 45nt total 83%  4582-8419 (12.6X) = 3837nt total 289nt monomer 76-83%  12213-13702 (25A) 1489nt total 18-25nt monomer 69-84% 25045-26533  B : 2809-2927 ( 1 7 X ) = 118nt total 6-9nt monomer 78% 414-708 ( 8 X ) = 294nt total 42-74nt monomer 75-83% 395-688  1192-1531 ( 2 X ) G C = 339nt total 33nt monomer 82%  3225-4673 ( 1 4 X ) = 1448nt total 119nt monomer 89%  2805-2988  3064-4512  3003-3100 (0.8X) = 97nt total  3336-6451 (9X) = 3115nt total 78-83%  N o repeat seen  4432-16759 (48X) = 12327nt total 95-96nt monomers 58-62%  290-718 ( 8 X ) = 428nt  1450-1534 ( 5 X ) = 84nt  total  total  40-5 5 nt monomer 80-85%  91%  2800-2976 ( 3 0 X ) = 176nt total 6nt monomer 84%  445-799 (8x) = 354nt total 42-43nt monomer 88%  963-1579 (12x) = 616nt total 24nt 70%  2795-2970 ( 3 2 X ) = 175nt total 6nt monomer 87%  15nt monomer  6381-6500 ( I X ) = 119nt+ 1 0 X truncated copies = 380nttotal 73-80%o  2817-3000 ( 3 2 X ) = 183nt total 6-8nt monomer 83%  i m i n i 1 r\f\A ^ A Y ^ — 10230-1 1 0 0 4 ( j O A ; —  774nt total 20nt monomer 59% 15159-15932  14189-15520 (JOAJ -  1337nt total 16-22nt monomer 64-73% 8643-9978 • 1 A"ifi ^ 1 C\HH A( 4(AA~V\ 19294-19774 ) — 480nt total 14-18nt monomer 55-60% 143833-144313  Table 2.1. Xist Repeats in M u l t i p l e Eutherians (Continued...)  
Mole cDNA  Not yet sequenced  121-550 (11x) = 429nt total 39nt monomer 75%  1688-1850 (10x) = 162nt total 1 Int monomer 65%  No repeat seen  3353-10050 (18X) = 6072nt total 97-366nt monomer 76-93%  13021-13580 (4X) = 559nt total 17nt 76%  Dog cDNA  368-789 (8x) = 421nt total 44-45nt monomer 80-85%  1253-1554 (13x) = 301nt total 20nt monomer 73%  Bh: 1847-1892 (3x) = 45nt total 19nt monomer 82%  813-2975 (1.9X truncated copies) = 2162nt total 87%  4635-7470 (31x) = 509nt total 94-96nt monomers 59-78%  14232-15411 (6X) = 1189nt total 19-25nt 56-79% 27975-29164  3296-4810 (15x) = 1514nt total 98nt monomer 75% 7447-8960  6523-8699 (21x) = 2176nt total 103nt monomer 81% 10674-12850  11125-11714 (24X) = 589nt total 24nt monomer 55% 19768-20357  gDNA  GI: 50088291 AAEX01057775 (17368-24421bp) + AAEX01057774 (1-25440bp) pieced together, 1 = beginning of pieced sequence  Rat cDNA gDNA  GI: 34881475 NW_048043.1 (2960970-2986883bp reverse complement, where 1=2960970)  B: 1893-3135 (27x) = 1242nt total 6nt monomer 71%  638-1049 (8x) = 411nt total 45nt monomer 80-84% 4789-5200  1524-1849 (5x) = 325nt total 18nt monomer 82% 5675-6000  3094-3232 (24x) = 138nt total 6nt monomer 76% 7244-7382

[Figure 2.3: schematic of the positions of tandem repeats (A, F, B/Bh, C, D, E) along Xist in human, dog, mole, cow, rat, mouse and vole.]

Figure 2.3. Summary of Tandem Repeats within Xist in Multiple Eutherians. Repeats A and B [19], C, D, and E [20], as well as F [63] are shown in the different eutherians used in the study, based on dotplots against human and mouse XIST/Xist cDNA, or their own sequences. The two pink repeats in cow seem to be species specific, whereas the two pink repeats in mole seem to be distinct sequences that form a larger D repeat structure. The B and C repeats are reversed in order in dog. Both dog and human have a split B repeat.
[Figure 2.4: self-against-self dotplots of Xist cDNA, one panel each for cow, mouse, mole, vole, rat and dog.]

Figure 2.4. Eutherian Xist cDNA Dotplots. Sequences of Xist orthologs are aligned to themselves to reveal repetitive regions. In general, non-rodents have an expanded D repeat (yellow); rodents have more apparent C (turquoise) and E repeats (purple).

[Figure 2.5: dotplots of rat, dog, mole, cow, mouse and vole Xist cDNA (vertical axis) against human XIST cDNA (horizontal axis, 14210 bases).]

Figure 2.5. Eutherian Xist cDNA Dotplots against Human Xist cDNA. Repeats are indicated. The D repeat (yellow) in cow and dog is expanded compared to others, whereas the C repeat (turquoise) is amplified in mouse compared to others. Arrow points at the D core [20].

similarity before the D core region to human Xist cDNA (discussed below in Section 2.3.3 as segment 4) (Figure 2.5). The E repeat is not apparent in mole and dog Xist dotplots, whereas in rodents this region clearly displays expansion of a monomer (Figure 2.5). The primary sequence of the Xist E repeat differs significantly between eutherians, as the Tandem Repeat Finder identifies species-specific consensus sequences (data not shown in Table 2.1). However, all E repeat sequences are low in GC content and situated at the beginning of the large 3' Xist exon. The predicted cow, mole, and dog cDNA sequences have similar exonic structure to human Xist, with the notable exception of human exon 2 (Figure 2.6). The pairwise identity of cow-human exon 2 is 14%, mole-human is 12%, and dog-human is 15% (Table A.1). The Xist exon 2 of cow, mole, and dog resembles that of mouse and vole, with pairwise identities ranging from 63% to 70% (Table A.1).
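The pairwise identities quoted here can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned (for example by ClustalW) and that identity is counted over all alignment columns in which at least one sequence has a base; the thesis does not state which denominator was used, so that convention and the function name `percent_identity` are assumptions made for illustration only.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two rows of a pairwise alignment.

    Assumed convention (not taken from the thesis): columns where both
    rows are gaps are ignored; a base opposite a gap counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    matches = 0
    compared = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # column carries no information for this pair
        compared += 1
        if x == y:
            matches += 1  # identical bases (neither can be a gap here)
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    # Toy alignment, not real Xist sequence: 4 matches over 6 columns.
    print(round(percent_identity("ATGC-A", "ATGAAA"), 1))
```

Other conventions (for example, dividing by the length of the shorter sequence) would give different percentages for the same alignment, so values computed this way are only comparable with each other.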
As expected, the rat sequence is most similar to the other rodents, due to the presence of the rodent-specific exon 5 (Figure 2.6). The exon 4 sequence is the most conserved, showing pairwise identities ranging from 52-96%, compared to the sequences of the other exons (pairwise identities are 10-85% for exon 2, 12-81% for exon 3, and 8-83% for exon 5 [rodent exon 6]) (Table A.2). Dotplot comparisons of coast mole Xist cDNA with cow or human Xist genomic DNA revealed an exon in mole bearing sequence similarity to human and cow intronic sequence (Figure A.1). Inspection of the region confirmed that it is not similar to the unique rodent exon 5 or non-rodent exon 5 sequences. Hence, mole Xist contains an additional exonic region, signifying either an elongated exon 4 or a separate exon between traditional exons 4 and 5. MultiPipMaker analysis of Xist cDNA in the seven eutherians revealed that exon 4 was the most highly conserved in primary sequence (Figure A.2). Plots generated from genomic Xist sequences did not reveal any intronic regions with consistently high percent identities between pairs of species (Figure A.3). Dog and human, as well as rat and mouse, were the most similar to one another in intronic sequence.

2.3.3) Choice of Orthologous Segments

The structures of Xist repeats A to F (Figure 2.3), as well as exons 2 to 6 (Figure 2.6), were individually predicted using CARNAC. For other parts of Xist, repeats were used to anchor the orthologs, and regions between the repeats were considered related.
[Figure 2.6: comparison of Xist gene intron-exon structure across species, with GT/AG splice junctions, junction coordinates and polyA signals marked. Legend: exon (experimentally confirmed); conserved region corresponding to a known exon in at least one species.]

Figure 2.6. Summary Diagram of Exon and Intron Structures of Xist Orthologs. Mole, dog, rat, and vole Xist sequences are shown in comparison to human, mouse and cow orthologs. Figure is extended from [60]. Potential splice sites are shown for dog and rat, whereas the splice sites shown for other species have been experimentally confirmed. Regions showing sequence similarity to known exons are shaded in grey. The mole sequence has not been sequenced to completion at the 5' and 3' ends.

Because differences between repeats and sequence lengths made it difficult to distinguish analogous Xist segments across species, a second approach was used to ensure that the corresponding regions from different species were analyzed together. This approach consisted of comparing each species' Xist sequence to human XIST cDNA in dotplots (Figure 2.5) in order to find similar regions. These two input methods will hereafter be referred to as "Method 1" (repeat anchors) (Figure 2.7) and "Method 2" (dotplot) (Figure 2.8).
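The dotplot comparison that underlies Method 2 can be sketched as follows. This is a minimal illustration of a windowed dotplot with a mismatch limit, not the program used to produce the figures; the function name `dotplot` and the default values of `window` and `mismatch_limit` are assumptions chosen for the toy example.

```python
from typing import List, Tuple


def dotplot(seq_x: str, seq_y: str, window: int = 9,
            mismatch_limit: int = 0) -> List[Tuple[int, int]]:
    """Return (i, j) positions where a window of seq_x starting at i
    matches a window of seq_y starting at j with at most
    `mismatch_limit` mismatches.

    Brute-force comparison of every window pair: fine for short toy
    sequences, far too slow for full-length transcripts.
    """
    dots = []
    for i in range(len(seq_x) - window + 1):
        window_x = seq_x[i:i + window]
        for j in range(len(seq_y) - window + 1):
            window_y = seq_y[j:j + window]
            mismatches = sum(1 for a, b in zip(window_x, window_y) if a != b)
            if mismatches <= mismatch_limit:
                dots.append((i, j))
    return dots


if __name__ == "__main__":
    # Self-comparison of a toy tandem repeat (not real Xist sequence).
    # The main diagonal is always present; the off-diagonal dots
    # (0, 4) and (4, 0) mark the repeated ACGT unit.
    print(dotplot("ACGTACGT", "ACGTACGT", window=4))
```

In a self-comparison, tandem repeats appear as runs of dots parallel to the main diagonal, which is how the repetitive regions in Figure 2.4 stand out. In a comparison between species, a diagonal run indicates a region of sequence similarity, and its absence is what the "-" scores in Figure 2.8 record.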
The segments that were analyzed in Method 2 (based on alignment patterns to human XIST cDNA) are indicated on the dotplots in Figure 2.8. These segments are designated 1-10. Segment 1 represents the sequence before the A repeat and is therefore identical to the "before A" region used in Method 1. Segment 2 is between the A and F repeats, segment 3 is between the F and B repeats, and segment 4 is between the B and D repeats. Because rodents lacked sequence similarity to human XIST in segment 4 in the dotplots, I analyzed rodent and non-rodent sequences separately for this region. The D repeat was divided into four segments, segments 5-8. The first of the four, segment 5, is characterized by expansion of a small monomer present in the human XIST sequence. Rodent dotplots did not show this pattern; hence rodent sequences were excluded from the segment 5 CARNAC analysis. Segment 6 marks a less repetitive D repeat region that displayed a clear alignment with human XIST in non-rodents, whereas this characteristic was not visible in rodent Xist dotplots. For this reason, rodent sequences were excluded from the segment 6 input. The D repeat core was present in all species' dotplots and was designated segment 7. Segment 8 represents the remainder of the D repeat, whereas segment 9 contains the sequence after the D repeat continuing to the end of exon 1. The internal exons, exons 2-5 (or 6 in rodents), were analyzed individually, followed by segment 10, which was equivalent to the region of the large 3' Xist exon containing the E repeat.

2.3.4) Findings from Method 1 Analysis (Repeats to Anchor Orthologs)

The sequence before the A repeat, ranging from 242 to 325 bp in length, displayed pairwise identities of 35%-80% (Table A.4) based on ClustalW alignment (Figure 2.9). For this region, CARNAC revealed common stems in all species sequences analyzed, ranging from 3-9 bp in length (Figure 2.9).
Likewise, the A repeat and F repeat (355-435 bp and 46-372 bp regions, respectively) yielded a few common stems (Figures 2.10 and 2.11). The A repeat stems ranged from a minimum of 5 bp to a maximum of 10 bp, with two consecutive A repeats contributing to a single hairpin (as opposed to two hairpins forming from a single A repeat, the hairpin1-hairpin2 structure [Section 1.6]). CARNAC predicted only one conserved stem within Xist repeat F, which was 4-10 bp long depending on the species. The multiple alignments of these two repeats (Figure A.4) were entered into RNAalifold to predict consensus structures (Figures 2.10 and 2.11). A multiple alignment of the Xist 5' end spanning these two repeats was also analyzed by RNAalifold to visualize the common underlying structure in this region (Figure A.5). RNAalifold predictions were window independent, yielding the same consensus structures for a given portion of an alignment regardless of context (Figure A.5). Hairpins displayed in the RNAalifold consensus structure were compared to common stems predicted by CARNAC. The CARNAC stems always fell into the same locations as stems from RNAalifold, but RNAalifold generally also output stems not present in the CARNAC-predicted common structures. This could reflect the different questions each program addresses: RNAalifold attempts to find the common underlying structure from an alignment, whereas CARNAC predicts stable stems in single sequences and identifies stems common to cofolded sequences. Since RNAalifold outputs consensus structures even when CARNAC is unable to detect common stems, Xist sequences were first analyzed with CARNAC. RNAalifold was used subsequently, if alignment between the analogous regions was possible.

Figure 2.7. Method 1 Analysis Diagram. Segments that were analyzed together are shown in the same color. Fragments between repeats that were analyzed are shown in red. Orthologous sequences were analyzed up to 2 kb at a time, except the D repeat, which was divided into the first and last kb of the repeat, shown in yellow.

[Figure 2.8: dog and vole dotplots against human XIST with segments 1-10 and the internal exons marked, above a table of + / - scores per segment for human, cow, dog, mole, mouse, rat and vole.]

Figure 2.8. Method 2 Breakdown of Xist Orthologs. The Xist sequences analysed in Method 2 are designated segments 1-10. Segments 5-8 comprise the D repeat, segment 2 is between the A and F repeats, segment 3 is between the F and B repeats, and segment 4 is between the B and D repeats. Dog and vole dotplots are shown above to contrast rodents and non-rodents in their alignments with human XIST, especially for the D repeat. Segments are marked on the dotplots for clarity. + or - signify obvious or absent dotted alignments with human XIST, respectively.

Figure 2.9. Xist Sequence Before A repeat. A) ClustalW alignment of orthologs in this region. B) CARNAC structures depicted in each species. Consensus structure predicted by RNAalifold is boxed in green. C) Common stems found with CARNAC in six eutherians. Mole was excluded due to lack of sequence in this region.

Figure 2.10. CARNAC and RNAalifold Structures for the Xist A Repeat. A) Stems that are analogous in each species, as determined by CARNAC. B) Consensus structure predicted by RNAalifold. C) The stems corresponding to the top CARNAC figures.

Figure 2.11. Xist Repeat F. A) CARNAC stems conserved in the seven eutherians. B) Cow conserved secondary structure generated from CARNAC, representative of the stems in the other eutherians. C) RNAalifold consensus structure of Xist repeat F.

For the rest of Xist, common structures were predicted only when rodent Xist sequences were analyzed separately from non-rodent sequences. CARNAC detected conserved stems in the region between the A and F repeats, as well as for the D repeat (Figures 2.12 to 2.14).
Between the A and F repeats, the predicted common structures were complex, showing a large number of stems, when rodent or non-rodent sequences were considered separately. This is presumably because some of the structural similarities reflect close evolutionary relationships rather than conservation due to biological significance. The RNAalifold consensus structures resulting from multiple alignment of the region between the A and F repeats differed between the non-rodent and rodent groups (Figure 2.13). The last kilobase of the D repeat was conserved in structure between all three rodents, whereas the first kilobase of the same repeat shared secondary structures only between mouse and rat. However, according to CARNAC, non-rodents do not share secondary structures in these regions. This might imply that rodents have unique functional domains of Xist that are not found in the other eutherians.

Figure 2.12. The Region Between the A and F Repeat for Rodents. A) CARNAC stems identified when non-rodents were excluded from the input. B) CARNAC structure from the corresponding input.

Figure 2.13. Rodent versus Non-Rodent, Region Between A and F Repeats. A) ClustalW alignment of non-rodent sequences in this region. B) ClustalW alignment of rodent orthologs. C) RNAalifold consensus structure generated from the alignment in (A). D) RNAalifold consensus structure from the alignment in (B).

Figure 2.14. D repeat in Rodents. A) Conserved structures found in mouse and rat for the first kilobase of the D repeat. No commonality was found in vole. B) Conserved structures of the last kilobase of the D repeat, as depicted in all three rodents.

Method 1 allowed me to analyze the internal exons, exon 2 to exon 5 (or exon 6 in rodents), within one 2 kb window. The seven eutherian Xist orthologs did not exhibit common structures in this region. Likewise, no conserved stems were found at the 3' end of Xist. A summary diagram of Method 1 results is shown in Figure 2.15.

2.3.5) Findings from Method 2 Analysis (Dotplot to Detect Orthology)

As expected, CARNAC found common structures in segment 1, yielding the same structures as above for "before A" of Method 1 (Figure 2.16). No conserved stems were seen for segment 2 (sequence between repeats A and F) or segment 3 (sequence between F and B). The dotplot method allowed the detection of regions between rodent and non-rodent orthologs that displayed different alignment patterns to human XIST (Figure 2.8). Analyzing the eutherian Xist segments in two separate categories improved CARNAC's ability to predict common stems within segment 4 (sequence after the B repeat, before the D repeat). Secondary structures were conserved within non-rodents, although the stems were sparsely distributed, with stem lengths of 4-7 bp (Figure 2.17). In contrast, CARNAC predicted a complex secondary structure of segment 4 that was conserved in the rodents (Figure 2.18).
The consensus structures generated from RNAalifold showed distinct folding of RNA in both eutherian categories, confirming the differences in this region (data not shown). According to results from CARNAC, non-rodents share conserved secondary structures for segments 6 and 8 (parts of the D repeat) when rodent sequences were excluded from the input (Figures 2.19 and 2.20). In contrast, Method 1 detected no secondary structures from the beginning and end of the D repeat that were shared between human, cow, dog, and mole, despite analyzing non-rodent sequences independently from rodent sequences. Since the conserved structure predicted by CARNAC for segment 8 was a single stem of 6-7 bp within a >800 bp region, the significance of this stem is questionable. Additionally, since a short stem (3-10 bp within a >400 bp region) within segment 7 (D core) was conserved in only a subset of the seven species (Figure 2.20), this region likely does not have a functional role. No conserved structures were found for segment 9, before the internal exons, or segment 10, consisting of the E repeat in the large 3' exon. A summary diagram for Method 2 analysis is shown in Figure 2.21.

Figure 2.15. Summary of Method 1 CARNAC Results. Structures found to be conserved in rodents only are shown on the top; those conserved in non-rodents only are shown on the bottom; and those conserved in both non-rodents and rodents are boxed in red. Structures conserved in all rodents are boxed in yellow.

Figure 2.16. Method 2, Segment 1 CARNAC Results. The stems with their locations are shown on the left. The corresponding structures of the six eutherians (excluding mole) are shown on the right.

Figure 2.17. Method 2, Segment 4 of Xist CARNAC Results for Non-Rodents. Common stems and their corresponding visual structures are shown on the left and right, respectively, when rodents were excluded from the input.

Figure 2.18. Method 2, Segment 4 of Xist CARNAC Results for Rodents. Common stems and their corresponding visual structures are shown on the top (A) and bottom (B), respectively, when non-rodents were excluded from the input.

Figure 2.19. Method 2, Segment 6 of Xist CARNAC Results. Common stems and their corresponding visual structures are shown on the left (A) and right (B), respectively, when rodents were excluded from the input.

Figure 2.20. Method 2, Segment 7 and 8 of Xist CARNAC Results. Common stems and their corresponding visual structures are shown for segment 7 (A) and segment 8 (B). Rodents were excluded from the input for segment 8 but not for segment 7. Segment 7 shows partial conservation in the subset of eutherians shown (mole and rat did not show conserved structures).

Figure 2.21. Summary of Method 2 CARNAC Results. Structures boxed in red are conserved between the seven eutherians, while structures in yellow are found in all rodents or all non-rodents (analyzed separately). Segment 7 structures are found in only a subset of the seven mammals (mouse, vole, dog, human, cow). Segment 2 was not evaluated by CARNAC because it corresponds to essentially the same input data as the A repeat from Method 1 (see text).

Figure 2.22. Xist Exon 4 CARNAC Results. Common stems (B) and their corresponding visual structures (A) and (C) of the seven eutherians.

As for exons 2-5/6 (rodents), only the exon 4 structure (Figure 2.22) was predicted to be conserved in all species tested, whereas the mouse, vole and rat also shared structural similarity for exon 5 (unique to rodents) (Figure 2.23). Mfold predicted stable hairpin structures for all eutherian exon 4 sequences, including the extended exon 4 in coast mole (Figures A.1 and 2.24). In fact, the minimum free energy of the extended mole exon was -139 kcal/mol, compared to -78 kcal/mol for the unextended version (Figure A.1), and -64 to -67 kcal/mol for exon 4 hairpins in the other eutherians (data not shown). Exon 2 structures output by CARNAC were only common between rat and mouse (Figure A.6), and secondary structures predicted for exon 6 were only conserved between vole and mouse (Figure A.7). Of the non-rodent eutherians, only human and mole showed conservation for exon 5 (Figure A.8). Thus, it is unlikely that any of the exons, with the exception of exon 4 and rodent exon 5, plays a role in X inactivation. Rodents and non-rodents showed distinct conserved structures when internal exons were treated as a whole unit (Figure A.9). A summary diagram for the CARNAC exon analysis is shown in Figure 2.25.

2.3.6) Control for CARNAC Output

Out of concern that the stems predicted by CARNAC were in fact artifacts rather than real conserved structures, a randomization procedure was performed by Sohrab Shah on each region of Xist, as a control for chance stems. The randomization used the CLOTE Computational RNA shuffler (http://clavius.bc.edu/~clotelab/RNAdinucleotideShuffle/dinucleotideShuffle.html), which shuffled single sequences independently while preserving the lengths of the input sequences, which varied between the species [85]. Each species sequence was randomized, and sequences corresponding to the same region of Xist were entered together into CARNAC. Only Xist regions that were originally predicted by CARNAC to contain conserved secondary structures in all analyzed sequences were randomized.
Input sequences were shuffled 1000 times per region of Xist, and each shuffled set was then submitted to CARNAC for prediction. Only those resulting stems that were found to be conserved in all of the species from the shuffled data sets were included (i.e., trials where common structures were only predicted in a subset of species were discarded), in order to be consistent with the results given from the original data. In addition, using the same input sequences to perform the control as those used to generate the original Xist data ensured that the nucleotide content and sequence lengths were constant. Individual stems resulting from the same species in a particular Xist region were aggregated in each data set to calculate total stem length per orthologous region in a given trial, so that stem lengths within the same number of nucleotides were compared between randomized and unrandomized samples. Since any secondary structures resulting from the shuffled input sequences were location independent, and not based on real Xist data, they represent stems that are common due to chance rather than to biological significance. The average total stem length per orthologous region generated from the original Xist data set was compared to the average total stem length per orthologous region in the randomized samples using a T-test, to estimate how similar the original output was to "chance." This control procedure assumes that stems longer than those conserved by chance are likely to be biologically significant. Summary values of the standard deviation, range, and mean stem lengths are listed in Table 2.2.

Figure 2.23. Xist Rodent Unique Exon 5 CARNAC Results. Common stems and their corresponding visual structures are shown in (A) and (B), respectively. C) RNAalifold consensus structure.

Figure 2.24. Consensus Structure and Multiple Alignment of Exon 4. A) ClustalW alignment of a portion of exon 4. B) RNAalifold consensus structure as predicted from the alignment input. C) Mfold structures of exon 4 in the eutherians.

Figure 2.25. Summary of CARNAC Results in Exonic Portions. Structures conserved between all eutherians are boxed in red, while structures found in all rodents or all non-rodents are in yellow. Structures found only in a subset of these groups are shown.
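The comparison of original and shuffled stem lengths can be sketched as an unequal-variance (Welch) T-test. The Python below is a minimal illustration using only the standard library; it computes the test statistic and the Welch-Satterthwaite degrees of freedom, not the p value. The function name and the two input lists are hypothetical placeholders, not values from this study.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for a two-sample T-test assuming unequal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Sample variances (n - 1 denominator) scaled by sample size.
    term_a = variance(sample_a) / n_a
    term_b = variance(sample_b) / n_b
    se_squared = term_a + term_b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se_squared)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = se_squared ** 2 / (
        term_a ** 2 / (n_a - 1) + term_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical total stem lengths (bp): original data vs. shuffled trials.
original_lengths = [12, 15, 14]
shuffled_lengths = [9, 22, 5, 31, 17, 8, 26, 11]

t_stat, dof = welch_t(original_lengths, shuffled_lengths)
```

A negative `t_stat` here would indicate that the original stems are on average shorter than the shuffled ones. Converting the statistic to a two-tailed p value requires the Student t distribution; if SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the whole test in one call.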
After shuffling 1000 times for each region of Xist, the resulting stem lengths produced distributions with very broad variances, giving large standard deviations (5.09-36.182). After 1000 sets of input data into CARNAC, total stem lengths of virtually all sizes (e.g., 3-187 bp) were predicted from chance. Nevertheless, there were notable Xist regions whose unshuffled average stem lengths were significantly different (defined as p < 0.05) from those of the corresponding shuffled data sets, highlighted in grey in Table 2.2. These regions included the F repeat and segment 7, when all species were analyzed; the sequence after the A repeats, when only rodents were analyzed; and segment 8, when only non-rodents were analyzed. However, all of these Xist segments displayed significantly shorter mean total stem lengths when compared to randomized sequences. In fact, exon 4 was the only region found to show a higher mean total stem length compared to that from chance, although this difference was not found to be significant, with p = 0.109. The A repeat and rodent-specific exon 5 showed the next lowest p values of 0.177 and 0.201, respectively.

Table 2.2. Summary of Randomized Vs. Original Xist CARNAC Results. Values for stem lengths after randomization of Xist sequences are shown (left) in comparison to stems from the corresponding original, unshuffled Xist data (right). Significance values are from T-tests (two-tailed) between randomized and unrandomized samples assuming unequal variances. Significant difference is defined at p < 0.05. Regions that showed stems that were significantly different than stems from shuffled data are shown in grey and their p values are marked with an asterisk*. Highlighted in yellow is a region that shows a higher mean stem length from the original data compared to shuffled data, although the p value was not significant.

Randomized Mean Max 37 9.44 9.14 33 150 73.5 40 11.6 187 99.1 41 8.96 37 20.4
Xist Region  Conservation
Before A A Repeat After A F repeat D last kb Exon 4 Exon 5 Unique Combined Internal Exon Region Segment 4 Segment 4 Segment 6
All eutherians All eutherians All rodents All eutherians All rodents All eutherians All rodents  2  All rodents  4  80.6
All non-rodents All rodents All non-rodents All eutherians except mole & rat All non-rodents  3 4 3 3 3
Min
Segment 7 (D-core) Segment 8  3 4 3 4 2 4
SD 4.80 2.25 6.24 2.17 24.0 5.40 7.10
T-test p value 0.597 0.177 0.009* 0.002* 0.580 0.109 0.201
81  7.57  0.195
5.23 5.93 25.2 5.80  7 12 44 10  2.23 2.54 1.71 3.03  0.238 0.275 0.263 0.038*
6.75  7  0.50  0.000*
Unrandomized Mean Max 8.33 16 7.67 10 49 42.0 6.50 10 109 90.0 27 13.9 12.7 19
SD 5.27 5.09 27.4 6.70 36.2 5.45 4.72
Min 3 5 37 4 63 5 5
164  29.1  67  72.3
15.8 15.5 15.8 9.82  42 62 63 26  7.30 6.43 10.9 4.78  5 5 14 3
13.7  45  7.49  6

2.4) Discussion

Consistent with past comparative sequence studies, rodents have undergone rapid sequence divergence from human compared to the other eutherians studied [3]. In all pairwise sequence comparisons of Xist, rodents showed comparatively low sequence similarity to the non-rodents, but higher similarity to one another. The high intron similarity between the rodents is expected because of their relationship on the phylogenetic tree, but if the radiation of Xist were uniform, mouse-human sequence similarity would not be expected to be significantly lower than cow-human similarity (for example). Structures that were conserved in one eutherian category but not the other provide insight into functional differences of Xist across species. These include segments 6 and 8 (D repeat) structures, unique to non-rodents; as well as the last kilobase of the D repeat and exon 5, unique to rodents (Figure 2.26).
In addition, those regions that showed RNA structure conservation in both rodent and non-rodent groups (whether or not the structures differed between the groups) reflect functional constraints of the region that presumably still allowed divergence due to co-evolving partner molecules. Xist segments belonging to this category include the sequence between A and F, segment 4, and the internal exon region. Of greatest significance are those structures predicted by CARNAC to be conserved amongst all seven eutherians, given the divergence of the species analyzed. In particular, the region before the A repeat, the A repeat, the F repeat, and exon 4 contain secondary structures shared by all Xist sequences entered into CARNAC. However, the randomization control suggests that only a subset of these CARNAC structures may be meaningful biologically. In particular, exon 4 exceeded the average stem length generated by chance, although not significantly. The A repeat in all eutherians and the rodent-conserved internal exon 5 structures also gave relatively low, although not significant, p values when compared to the rest of Xist. On the other hand, some regions of Xist form conserved stems that are shorter than expected by chance, which include the sequence between the A and F repeat in rodents, as well as the F repeat in all eutherians tested (segment 7 is only conserved in a subset of species). Although most of the stem lengths found in the original analysis overlapped with those lengths that can arise from random nucleotide sequences, this does not signify that the conservation of structures in those regions does not exist. Instead, CARNAC with its high specificity (true positive rate of 85-93%, but false negative rate of 33-49%) likely outputs real (but partial) common structures [82], but whether those common structures are biologically significant is a question to be addressed with experiments.
Figure 2.26. Proposed Functions of Xist Regions. Structures conserved among non-rodents are shown on the top; structures conserved among rodents are shown on the bottom; structures shared between both rodents and non-rodents are shown in the centre. Proposed functions of conserved regions are shown in the centre. Functional differences possibly arising from structural differences in Xist RNA between rodents and non-rodents are shown on the bottom.

Since the silencing function of the A repeat is demonstrated in the mouse and human by deletion studies [64], and the primary sequence, copy number, and secondary structure are preserved in all eutherians examined, there is little doubt that this region is crucial to inactivate the X chromosome. This function is presumably achieved by recruiting initial players involved in heterochromatinization. Such players include G9a and the Eed/Enx1 complex, which lead to early histone methylation at H3K9 and H3K27 on the inactive X [31, 64]. However, mutants with G9a deficiency still display normal X inactivation and subsequent maintenance [86], and H3K27 trimethylation by Eed/Enx1 is independent of silencing [32]. Thus, the A repeat likely does not interact with these factors, but instead recruits unidentified players crucial for establishing silencing. Deletion of the A repeat in human not only abolishes the silencing function of XIST, but appears to additionally affect stability or localization of the XIST transcript, as only a pinpoint XIST signal is seen on the inactive X using RNA-FISH (personal communication, Jennifer Chow).
Deletion studies in mouse also suggest that the A repeat contributes to cis-localization [64]. Hence, domains of Xist seem to have overlapping roles. The exon 4 hairpin is conserved across all species, indicative of a functional role in the inactivation process. In mouse, deletion of exon 4 does not lead to obvious changes in silencing or localization. It could instead have a role in stability or processing of the Xist RNA [69]. Alternatively, it could bind to epigenetic modifiers important for the maintenance of inactivation status. This is compatible with the fact that maintenance was not thoroughly investigated in the exon 4 deletion study by Caparros et al. [69]. Because different mammals do not share a common secondary structure in the Xist B, C and E repeats, it is unlikely that RNA folding in these regions plays a crucial functional role in X inactivation. Despite the good sequence similarity across species, the B repeat does not fold into any globally stable structures (Mfold), but rather gives many tiny stems with an overall minimum free energy [MFE] of -11 kcal/mol (data not shown). This argues that if the B repeat indeed plays a biological role in X inactivation, the primary sequence of this repeat is more important than its secondary structure. Perhaps it forms a necessary DNA site of Xist for trans-acting factors to recognize, rather than a single RNA motif whose shape is important to bind DNA or other proteins. Thus it might be a binding site for trans-acting proteins such as histone modifiers, DNA methyltransferases, or the Scaffold attachment factor-A (SAF-A) structure to locally constrain Xist within the territory of its expression site and prevent the transcript from drifting in the nucleus [68]. This SAF-A structure is thought to be a component important for maintenance of silencing rather than induction.
Because past genetic studies did not address the influences of Xist deletions on the extent of silencing, the B repeat may interact with any proteins involved in the maintenance stage of silencing. In contrast, the E repeat is highly diverged even at the sequence level, so its presence in different eutherians could reflect the important role of expanding Xist. Increased size could be necessary for the transcript to sufficiently coat the X chromosome. Alternatively, the E repeat could prevent degradation of the Xist molecule by forming a protective secondary structure (the precise nature of the structure would be irrelevant). However, if the E repeat is important for the high stability of the Xist molecule (half life is over 5 hours), deletion of the 3' end of Xist should demolish all Xist activity, as the transcript would be quickly degraded. Yet mouse and human deletion studies indicate that truncation of Xist at the 3' end still allows Xist to silence the sex chromosome, as long as the A repeat is intact, although localization is reduced [64]. The C repeat is likely dispensable due to its absence in mole and cow, the presence of only one copy in humans, and its truncation in rodents. In accordance with this, deletion of the C repeat in human failed to cause any apparent disruption to XIST localization or silencing of the X chromosome (personal communication, Jennifer Chow). However, the C repeat is implicated in chromosome binding in mouse because peptide nucleic acid (PNA) interference to this region led to complete loss of Xist localization to the inactive X [87]. Either this C repeat function is unique to rodents (as the repeat is largely expanded in rodents) or the conflicting results from deletion versus PNA studies simply reflect different experimental methods.
The D repeat is distinct between rodents and the other eutherians in terms of both sequence composition (as depicted in dotplots) and structure (as predicted by CARNAC). Deletion studies in mouse illustrate that both D and E are important for macroH2A localization to the inactive X, which explains why Xist expression causes macrochromatin body formation [88]. However, it is unclear whether these repeats recruit macroH2A to the inactive X in human and other non-rodents. Since regions within the D repeat are diverged between rodents and non-rodents, the domain required to interact with macroH2A could lead to differences in the maintenance of silencing, which is the stage of X inactivation during which this histone variant is recruited. This may translate into disparities in the extent of silencing of the X chromosomes in rodents compared to non-rodents. The copy number of repeat A (around 8X) is conserved in the mammals studied, compared to B, C, D, and E, which are differentially expanded or shrunken in the various species. In mouse, various parts of the 3' end and the C, D and E repeats have been experimentally shown to function cooperatively for localization [64]. This justifies the variation observed across eutherians in repeat copy numbers: the shrinkage of one of the repeats is not detrimental to Xist function since it can be compensated by the expansion of the other two repeats. The redundancy of the C, D, and E repeats might also explain why human XIST is functional for localization in a mouse background [76]. Perhaps the divergence of the C, D, and E repeats at the sequence and structural levels accounts for a small degree of species selectiveness: how well the transcript localizes when it does.
This could account for the differences between human and mouse in XIST/Xist binding affinity to X chromosomes, as observed in M-phase, as well as the fact that human XIST localizes only partially to the mouse autosome from which it is transcribed [76]. Lastly, the conservation of the repeat F structure using CARNAC, despite differences in monomer copy number, implies that the single structure from the combined sequence itself is more important than the reiterated structures formed from independent copies. Results from the randomization control reveal that the F repeat is more likely to form longer stems by chance than in the original Xist context. The genomic composition in this region may inherently counter the formation of larger stems. This feature may have been selected for during evolution if the region is a useful site for binding DNA. The same idea applies to the sequence between the A and F repeat, segment 8 of the D repeat (unique to non-rodents) and the D core (present in five of seven eutherians), because they led to smaller conserved stems compared to chance. The problem with a bioinformatics approach is that one can never be certain of the biological significance of the output unless the candidate regions are later tested experimentally. RNA conservation analysis tools are currently not at the stage to account for interaction between domains and transient structures of the RNA. Additionally, the tools currently available publicly have at least one of the following problems: they are computationally complicated, have long running times, are unable to deal with large sequences, cannot detect pseudoknots, cannot deal with diverged sequences, and/or give no quantitative output. Until all of these factors can be addressed, experimental approaches are ultimately best suited to define functional domains.
However, since it is not feasible to perform shotgun or progressive deletions in RNAs of such a massive size in all eutherians of interest, bioinformatics provides a tool to narrow down regions of potentially functional importance and to look at many animals at the same time. Once a crude picture of functional domains is drawn, transgenic studies using Xist/XIST cDNA constructs in embryonic stem cells or somatic cell hybrids will reveal the importance of these regions. Additionally, one can test the binding affinity of a defined region of Xist to a candidate interacting molecule by mobility shift assay. Pull-down experiments using biotinylated Xist RNA can reveal important interacting players in X inactivation. One could also replace the repeats of one species with the same segment in another species to test whether the repeats are interchangeable. Even if the results of this present research have not uncovered additional regions of conservation, it has undoubtedly served the role of promoting the need for algorithms better suited to the needs of large and divergent functional RNAs such as Xist.

Chapter III Comparative Survey of Inactivation Status in Multiple Eutherians

3.1) Introduction to Genes that Escape Inactivation

3.1.1) Origin of Mammalian Sex Chromosomes

The mammalian sex chromosomes likely differentiated from an ancient autosomal pair 300-350 mya, shortly after the avian-mammalian divergence (Figure 3.1) [89]. These autosomes are independent from those that gave rise to the avian sex chromosomes, as evident in comparative mapping studies. The region proximal to human Xp11.23, otherwise known as the X conserved region (XCR), which is present on the X chromosome in all mammalian classes (prototherians, metatherians, and eutherians), is predominantly syntenic to chicken chromosome 4p, rather than to the avian sex chromosomes [90].
Conversely, the Dmrt1 gene, which is on the Z sex chromosome in birds, is homologous to a region on human autosome 9, rather than to the human sex chromosomes [91, 92].

Figure 3.1. Evolution of Mammalian Sex Chromosomes. Mammalian sex chromosomes likely derived from an ancestral pair of autosomes where the emergence of a sex determining allele on the proto-Y would favor recombination suppression mechanisms such as inversions on the Y chromosome. It is thought that at least four such rearrangements (labeled 1-4) have occurred to allow for genetic isolation of the Y and sex chromosome differentiation. Decay of the Y necessitated a dosage compensation mechanism [89, 94-96]. XAR = X added region; XCR = X conserved region; PAR = Pseudoautosomal region.

According to the leading theory of mammalian sex chromosome evolution, the proto X and Y began to differentiate when one allele of the proto-sex chromosomes acquired a sex-determining function. This was possibly achieved by a dominant mutation in the SOX3 gene, changing it into a new penetrant SRY allele. SRY must have emerged after the prototherian-therian divergence because sex-specific SRY is not found in monotremes or other vertebrates (but SOX genes are) [93]. Rearrangements such as inversions that could suppress recombination on the proto-Y were positively selected because this allowed genes evolving similar sex-specific roles to be inherited on the same chromosome. This recombination suppression allowed the proto-Y to be genetically isolated and to differentiate from the proto-X [94]. At least four major rearrangements on the Y have occurred through time, based on footprints left from the evolutionary process on present day human sex chromosomes, in the form of distinct strata. Lahn and Page (1999) grouped genes on the X into four regions ("strata") based on their percent X-Y divergence, which reflected the time each region had to accumulate mutations. The most diverged region was coined stratum 1, which is the oldest part of the sex chromosomes (the first to differentiate). From their study, they identified three other strata, with stratum 4 being the most recent addition onto the X chromosome, also known as the distal Xp pseudoautosomal region 1 (PAR1), named for its retention of highly homologous X-Y gene pairs, which are able to synapse, recombine, and be inherited in a manner similar to autosomal material. Based on comparative locations across species, Lahn and Page further estimated that stratum 1 is 300-350 my old, stratum 2 is 130-170 my old, stratum 3 is 80-130 my old, and stratum 4 is 30-50 my old. Thus, strata 1 and 2 correspond to the XCR, because they existed before the time of prototherian-therian divergence and thus exist on the X chromosome in all mammalian subclasses. However, both stratum 1 and stratum 2 contain exceptions that are found on the eutherian X chromosome yet are autosomal in marsupials and monotremes. This is because at least seven stratum 1 genes were independently translocated to autosomes in the monotreme lineage, and an inversion on the X chromosome involving stratum 2 presumably occurred after material was independently added in the eutherian lineage (Figure 3.1). The oldest X-Y gene pair corresponds to SOX3/SRY in stratum 1, in accordance with the sex chromosome evolution model. Comparative mapping to the chicken genome has revealed that the XCR stratum 1 is syntenic to chicken 4p, while XCR stratum 2 maps to a variety of chicken chromosomes (predominantly 1, 4, and 12).
On the other hand, strata 3 and 4 constitute recently added material, before and after the metatherian-eutherian divergence, respectively [95]. They correspond to the X added region (XAR), distal to Xp11.23, not found on the X chromosome in monotremes [96]. The majority of the XAR is syntenic to chicken chromosome 1q, with the exception of the pseudoautosomal regions [90]. Different autosomal material has been independently added to the pseudoautosomal regions in separate eutherian lineages, as evident in the differing gene compositions of PAR1 across species [97].

Without the potential benefit of recombination to reverse the effects of deleterious mutations that accumulated on the proto-Y, the Y began to shrink because mutated, functionless genes could become deleted without consequence. Indeed, the current human Y chromosome is enriched for pseudogenes. However, it is also abundant in large palindromic repeats, which are postulated to provide an alternative to recombination (i.e., gene conversion), necessary to prevent complete degradation of the Y [94]. The abundance of mutations and decay that occurred during evolution allowed the Y to slowly diverge from the X. Loss of functional genes on the Y chromosome resulted in dosage imbalance between females with two X chromosomes and males with only one X, which necessitated X inactivation as a dosage compensation mechanism (in mammals) to cope with detrimental differences between genders.

3.1.2) Genes that Escape X Inactivation

Escape as a Consequence of Sex Chromosome Evolution

The expression status along the inactive X chromosome in females broadly reflects the time of X-Y divergence, as less divergence corresponds to intactness of the Y homolog and dosage equivalence between males and females (Figure 3.2). Genes on the newer strata would be expected to escape inactivation in females, while those on the older strata would inactivate.
Consistent with this model, most escapees reside on the short arm of the X chromosome, and the escape pattern diminishes from the short arm to the long arm (XCR), where the majority of genes are subject to inactivation [98, 99]. Genes whose Y homologs have been retained would be expected to escape inactivation, and those whose Y homologs have decayed or have become non-functional would be expected to inactivate. This hypothesis has been verified for at least three loci by examining their methylation status in a wide variety of eutherian mammals, whether the genes are located on the XCR or XAR. RPS4X is unique in that it appears to only escape in primates, who have retained their functional Y homologs. On the other hand, Zfx seems only to be subject to silencing in rodents, with essentially no Y homolog expression in somatic tissues because Zfy has become testis-specific in this lineage. As expected, Aldl, which is on the XCR, with no Y homolog in any eutherians examined, was consistently subject to inactivation [100]. However, Jarid1C did not show any discernible pattern of escape or inactivation in the eutherians [100]. Numerous genes on the human X do not possess Y homologs but similarly escape inactivation in human females [99]. These genes might confer female-specific traits, such that two doses distinguish a female from the single dose in males. Marsupials seem to require a double X chromosome dosage to stimulate some female characteristics while inhibiting some male features [2]. This is likely achieved by genes without Y homologs that escape inactivation, allowing for sex differences. Alternatively, dosage differences may be inconsequential at some loci. Lastly, it is likely that levels of regulation exist beyond transcription that might alter dosage effects at the transcript or protein level, such that escapees of X inactivation do not translate to differences in eventual product concentration. Consistent with this, analysis of male versus female transcriptomes failed to detect over-expression of many genes in females observed to escape inactivation in other studies [101]. Nevertheless, a high number of genes located within the XCR, especially in the short arm, that do not possess Y homologs still escape inactivation, and this number seems to be greater in human versus mouse.

Figure 3.2. Extent of Genes that Escape Inactivation in Human versus Mouse. E = escape status; I = subject to inactivation; X chromosome on left and Y chromosome on right of each pair. A) The leftmost graph shows that the number of genes that escape inactivation increases towards the distal p arm of the X chromosome. Origins of the regions on the X are indicated [90, 105]. B) The seven genes that are known to escape inactivation on the mouse inactive X are shown.

Human vs. Mouse: To Escape or not Escape?

In general, the mouse X chromosome displays more thorough inactivation compared to the human counterpart. To date, only seven examined genes are known to escape silencing: Enox, Utx, Mid1, Jarid1c (formerly Smcx), Dbx, Eif2s3x, and Sts (with Sts being a PAR1 gene) (Figure 3.2).
The small number of loci that have been studied in mouse might not represent the entire mouse X chromosome. However, the absence of an abnormal phenotype in 39,XO mice compared to the Turner phenotype in 45,XO humans suggests incomplete silencing of the human X relative to the mouse chromosome. Consistent with this, an extensive expression analysis of 624 X-linked genes and ESTs confirmed that over 15% of the human loci escape inactivation and an additional 10% show variable expression from female to female [99]. Cattle monosomic for the X chromosome also show Turner syndrome, developing only streak gonads, improper ovaries and no ovulation. Analogous to human, the XXY cow displays Klinefelter syndrome, with small testes and sterility [102]. Therefore, the escape pattern in cow resembles that in human and contrasts with that in mouse. An exploration into other mammals may reveal the factors that explain the observed diversity across eutherians.

Other Considerations of Escape

When comparing human and mouse, where the retention of Y homologs can differ for the same locus, it is unclear what allows the silencing signal to skip the same locus in one species but not the other. Sequence comparisons between the 5' CpG islands of ZFX/Zfx fail to reveal any significant differences in sequence and structural elements [103, 104], arguing against cis-elements within the gene as being important regulators for expression. Delineating the factors that influence the expression of X-linked genes and finding species differences in these factors are the focus of Sections 3.2 - 3.4. In humans, the presence of blocks of escapees suggests domain-regulation of X-linked genes. The human escape cluster in Xp11 contains a region of distinct chromosomal origins (chicken chromosomes 1, 4 and 12) relative to the rest of the XCR (chicken 4p), including a number of XAR genes [90, 105].
This region corresponds to a portion of the X that presumably underwent a minor inversion spanning both the XCR and XAR (Figure 3.1). A comparative analysis of the corresponding block in mouse shows a lower level of long terminal repeats (LTR) in humans than mouse, suggesting that LTRs may enhance silencing in mouse [106]. CTCF insulators have recently been discovered at transition points between loci subject to inactivation and those that escape [107], which may help maintain open chromatin domains within larger heterochromatic contexts. Gene-specific regulation of X inactivation is evident on the mouse chromosome, as a small number of genes escape silencing despite being surrounded by larger domains subject to inactivation. However, analysis of the mouse Jarid1C promoter has not revealed any characteristics of this gene that could explain its unique escape status [106]. It is unclear whether escape status on the inactive X chromosome arises due to lack of maintenance factors for stable inactivation, or an initial resistance to Xist-dependent silencing (the concept of "precommitment"). The mouse Jarid1c gene, which is initially subject to inactivation and then reactivates early in development [108, 109], illustrates that reactivation leads to escape in some situations. In addition, females with ICF syndrome, associated with hypomethylation due to mutations in the DNMT3B DNA methyltransferase, show abnormal escape in a portion of cells [110]. Although this implicates a role for methylation in maintenance, it also demonstrates that the absence of methylation is insufficient to reactivate X-linked genes in all cells. TIMP1, which shows variable low-level expression despite methylation of its promoter, demonstrates that methylation is not enough to maintain a consistent inactivation status.
However, TIMP1 may be predisposed to becoming expressed by the acetylation of histone H3K9, a mark found on both naturally expressed TIMP1 genes and those successfully induced to be expressed [111]. Thus, it appears that escape might involve both pre-commitment and maintenance. Loci may inherently bear properties that make them more prone to reactivation later. Monoallelically expressed genes, including imprinted genes and genes that are subject to inactivation, have H3K4 dimethylation restricted to their promoters, whereas biallelically expressed genes show the same marks in exonic regions as well as their promoters [112]. The additional mark in the exon may predispose these genes to being expressed rather than silenced. Interestingly, these differences in modifications are more pronounced in undifferentiated stem cells than in differentiated fibroblasts, highlighting their role in marking the gene status prior to inactivation.

3.1.3) Differences Between Species: Imprinting, Methylation, and Escape

Unlike the random inactivation in all tissues of humans, paternal X chromosome inactivation occurs in all tissues of marsupials and monotremes, as well as in the extraembryonic membranes of rodents and cows (reviewed in [48]). The non-random inactivation pattern observed in marsupials has been attributed to early cleavage events that prevent erasure of imprints on the paternal X chromosome after blastulation (reviewed in [113]; see Section 1.3) (Figure 3.3). In mice and in cattle, the early stage cleavage occurs later than in marsupials. Paternal inactivation is observed only in extraembryonic lineages in these two species. The first cleavage stage in humans is comparatively late; this presumably allows more time for the erasure of imprints to occur before new epigenetic marks on the future inactive X are established randomly in all tissues [25, 48, 113].
Humans/primates are thus unique in terms of not normally displaying imprinted inactivation.

In eutherians, methylation of CpG islands on the inactive X maintains the silencing signal. Methylation is conserved within Eutheria, being associated with inactivation in diverged species such as the coast mole [84]. However, marsupials do not show a clear association between methylation and silencing, consistent with the incomplete, unstable, tissue-specific inactivation observed in metatherians [2]. Because late replication timing and histone deacetylation are associated with silencing in metatherians, but methylation of CpG islands is not, DNA methylation is likely a more recently evolved maintenance mechanism of repression [18]. As in marsupials, incomplete and unstable inactivation is observed in the early embryo of mouse before the blastocyst stage, and Jarid1c in mouse is variably expressed depending on the stage of development and tissue type [25, 108, 109]. In humans, the loci that show variability between females [99, 108, 109, 114] might reflect unstable inactivation due to absence of methylation, as variable loci tend to lack CpG islands [99]. Thus it is possible that there is a progression from unstable, variable, partial inactivation in the early branching mammals, to variable, incomplete inactivation seen in humans, to variable, nearly complete inactivation observed in mice. The mouse X chromosome might represent the eutherian X at an advanced state of acquiring epigenetic modifications that increase the stability and completeness of silencing. An examination of other eutherians will clarify whether the silencing pattern in mice is unique.

Figure 3.3. Differences between Mammals. Comparisons between prototherians, metatherians, human, cow and mouse in terms of cleavage time, imprinting, Xist and Tsix functionality, sex chromosome pairing, and silencing characteristics.

Imprinting
  Prototherians: Paternal; tissue specific
  Metatherians: Paternal; tissue specific
  Human: Random
  Cow: Paternal in extraembryonic; random in embryo
  Mouse: Paternal in extraembryonic; random in embryo
Tsix Functionality
  Prototherians: ?
  Metatherians: ?
  Human: Tsix
  Cow: Tsix
  Mouse: Tsix functional
Presence of Xist
  Prototherians: ?
  Metatherians: ?
  Human: +
  Cow: +
  Mouse: +
Features of Inactivation
  Prototherians: Unstable; tissue variable; late replication seen
  Metatherians: Unstable; lack of methylation; tissue variable; late replication and histone deacetylation
  Human: Stable but can be variable between individuals; late replication, histone deacetylation and DNA methylation
  Cow: Stable, unknown variability; late replication, histone deacetylation and DNA methylation
  Mouse: Stable but can be variable depending on developmental time; late replication, histone deacetylation and DNA methylation
Sex Chromosome Pairing
  Prototherians: X and Y synapse (large region)
  Metatherians: X and Y pair, no PAR (small region via modified axial elements)
  Human: X and Y pair at PAR
  Cow: X and Y pair at PAR
  Mouse: X and Y pair at PAR
Extent of Silencing
  Prototherians: Incomplete
  Metatherians: Incomplete silencing
  Human: Many escapees
  Cow: ?
  Mouse: Mostly silent
Xist Binding Affinity
  Prototherians: ?
  Metatherians: ?
  Human: Xist binds until prophase
  Cow: ?
  Mouse: Xist binds until metaphase

3.2) Introduction to Methylation Analysis

Not only is the initial and fundamental step of Xist coating the eutherian X chromosome in cis poorly understood (Chapter 2), but the mechanism by which this coating results in silencing along the X chromosome has similarly been ambiguous.
In particular, an understanding of why and how some genes escape inactivation, and furthermore, why the inactivation status of the same loci differs between species, would provide insight into the mechanisms that regulate silencing in different mammals and genes. Comparative studies in multiple eutherians could reveal factors important for silencing that differ between human and mouse, and clarify the roles of previously hypothesized factors in the extent of inactivation. In this study, I examine whether the presence of Y homologs, distance of the loci from the XIC, age of the region on the X, proximity to constitutive heterochromatin, and different generation times influence the escape status of X-linked genes in different species.

3.2.1) Generation Time

The rapid generation time of the mouse relative to human might explain the greater extent of inactivation. Because the number of mutations increases with a greater number of replications, species with shorter generation times would be expected to inherit more mutations per unit time than those with longer generation times. Thus, mouse sex chromosomes could represent more mutated forms relative to the human X and Y, exhibiting a higher amount of Y decay and a consequently greater need for more extensive inactivation along the X chromosome. Using this logic, rodents with shorter generation times would be expected to show more complete inactivation along the X chromosome, compared to insectivores, artiodactyls, and primates with longer generation times. However, there has been controversy over whether the generation time hypothesis holds true. Studies from over a decade ago suggest that synonymous mutation rates are consistently higher in the mouse compared to the cow and human, in the genes investigated [115].
On the other hand, a later DNA/DNA hybridization experiment followed by relative rate testing, using artiodactyls with similar biology/metabolic rates and different generation times, found no evidence of a greater accumulation of nucleotide changes in the species with shorter generation times [116]. Furthermore, a computational analysis of 5669 genes (17208 sequences) in 326 eutherians for mutation rate differences led to the conclusion that mutation rate is approximately constant per year across lineages and largely similar among genes [117]. Based on those results, the authors argued that overall mutation rates are influenced by factors that play larger roles than DNA replication errors in germ cells. Yet, a recent assessment of greater than 700,000 bp of full-length cDNA in human, pig, and mouse found that mouse and pig showed 1.44 and 2.86 times as many synonymous substitutions as human, although the rates of non-synonymous mutations were similar [118]. In addition, Margulies et al. (2005) found that the substitutions per site were higher in rodents compared to the hedgehog (belonging to the same mammalian order, Insectivora, as the coast mole), cow and human [3].

A major criticism given by the authors of the above computational analysis was that studies prior to it used either a small number of genes or a small number of species [117]. A large number of genes is important because only a fraction (~15%) of positions in a sequence are four-fold degenerate (expected to harbor only synonymous substitutions), and such positions are necessary to test mutation rates in the absence of selection, according to the nearly neutral theory of evolution [117]. Because the Margulies et al. (2005) study looked at a large number of placental mammals and sequences, it seems to have rebutted this argument, showing that mutation rates vary in different eutherian lineages, with the mouse showing branch lengths longer than those of the other mammals investigated [3].
Therefore, it remains a valid question whether the overall proportion of genes that are expressed on the inactive X chromosome is lower in species that have short generation times.

3.2.2) Constitutive Heterochromatin

Because the mouse X chromosome is acrocentric (some say "telocentric"), whereas the human homolog is metacentric, the presence of constitutive heterochromatin in the middle of the human X might hamper the spreading of XIST, resulting in more abundant escape in proximal regions as well as in the distal Xp, located on the opposite arm from the XIST locus. Indeed, Xist coverage is absent in G-dark metaphase bands, demonstrating that the RNA transcript has preferential affinity for non-constitutive heterochromatin [119]. In addition, intercalary constitutive heterochromatin between the autosome and X is selected for in species that have acquired X;autosome translocations, presumably because it prevents silencing from spreading to the autosome [120]. Since the mouse centromere is located distally, whereas its Xist locus resides near the centre of the chromosome, this arrangement could facilitate the spread of silencing by giving mouse Xist access to appropriate binding sites along the chromosome. If this were the case, then mammals with acrocentric X chromosomes (rat, river buffalo, sheep) [121] should show more complete silencing than mammals with metacentric sex chromosomes (coast mole, cow, human).

3.2.3) Distance from the XIC

Better spread of inactivation correlates with proximity to the XIC, as demonstrated by the gradient effect both in X;autosome translocations [59, 122] and in mouse early embryos, before inactivation becomes stable [25]. Thus, regions that have a predisposition to not be silenced or to be reactivated might have an increased chance of escape if they are situated at a greater distance from the Xist locus.
The locations of the genes on the X chromosome differ between species, which allows the distance from the XIC to be correlated with the expression status of the same loci across eutherians. For a gene located proximal to the Xist locus in one species but distal to the XIC in another, the gene would be expected to be subject to inactivation in the former species but to escape inactivation in the latter.

3.2.4) Evolutionary Age and X/Y Divergence

The age of the different regions on the X chromosome generally correlates with inactivation status, the older regions being more prone to silencing because they are less likely to contain functional Y homologs and/or more likely to have accumulated way-stations (if they do exist). However, on the human X chromosome, in addition to the genes that escape inactivation in the newer XAR on the short arm, escapees also reside in the XCR of the short arm (stratum 2), as well as in the long arm XCR [99]. Interestingly, the genes that are devoid of silencing in stratum 2 are syntenic to different chromosomal origins (chicken 1, 4, 12) than the rest of the XCR (chicken 4p) [90, 105]. One human escape cluster at Xp11 includes genes derived from both the X conserved region (XCR) and X added region (XAR), so the relationship between how long (in evolutionary time) these genes have spent on the X chromosome and why they escape inactivation is unclear. Because a significant portion of genes that escape inactivation in humans do not possess Y homologs, escape cannot be strictly for appropriate dosage regulation. An examination of other species will reveal in which cases the presence of Y homologs correlates with escape status.
The purpose of the present study is to use different species to test the expression status of genes that show discrepant status in human and mouse, in order to distinguish whether the incomplete inactivation is unique to primates or is representative of other eutherians. This will contribute to our understanding of what characteristics lead to discordant silencing status across these species, and whether these same characteristics help to predict whether a gene is prone to inactivation or escape. In addition to using human and mouse as comparative controls, two other eutherians, the cow and mole, have been chosen based on the availability of both female and male cell lines in the lab, which are necessary for DNA extractions for the assay. Given that the mole is a member of Insectivora and the cow belongs to Artiodactyla, their distributed positions relative to human and mouse on the phylogenetic tree also make them useful species for representing distinct eutherian lineages. Because methylation at the CpG islands of genes is associated with silencing, and this DNA modification is conserved within Eutheria, methylation analysis was used to assay for inactivation status.

3.3) Results

The experiments were conducted using the methylation-sensitive enzyme HpaII, where the presence of a band after restriction enzyme digestion followed by PCR amplification suggests the presence of methylation at the tested locus. This reflects a silent state on the inactive X, whereas the absence of a band indicates hypomethylation associated with expression from the inactive X. As a control to ensure that the absence of a PCR band reflects lack of methylation at the CpG island rather than the mere absence of DNA in the digest, a mock control was performed under identical conditions using glycerol/buffer solution in place of the enzyme and buffer mix. This also served to show the band size for the species in question.
Absence of a band in this mock control was due either to poor primer optimization or design, or to degradation of DNA in the digests. When possible, PCR conditions were adjusted, new primers were designed, or new digests were made. To ensure that the presence of a band in the HpaII digest reflects methylation at the CpG island of interest rather than incomplete restriction enzyme digestion, an MspI cutting control was used. Because MspI recognizes the same DNA site as HpaII but is not methylation-sensitive, performing this digest under identical conditions as HpaII first confirms that the locus being amplified contains an HpaII site, and second allows for direct comparison of the extent of digestion between the two enzymes. Remnant bands in the MspI control represented insufficient time or poor cutting conditions leading to incomplete digestion, or reflected an excessive number of cycles for the PCR reaction. Samples were either redigested or the number of cycles was decreased to counter these problems.

Degenerate primers were designed for those loci of interest (Table 3.1) possessing CpG islands lying within 2 kb of the 5' end (UCSC Genome Browser) for amplification across the four eutherians. Methylation digests were performed on female and male coast mole, cow, human, and mouse fibroblast-derived DNA, and three sets of reliable primers were used for PCR amplification of non-CpG island loci to make sure that all digests were positive for DNA (Figure 3.4). For digests that showed weaker PCR amplification, the amount of template was adjusted until all samples showed equal intensity of PCR bands. The established amount of template was then held constant for all PCR reactions to amplify X-linked CpG islands of interest.
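The decision logic of the assay described above can be summarized as a short sketch. This is only an illustration of the reasoning in the text, not part of the original protocol: the names `Lanes` and `interpret` and the wording of the returned labels are hypothetical, and the rule for a band in the male HpaII lane is an assumption drawn from how such a result is treated for the mole Ube1 locus later in this chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lanes:
    """Band present (True) or absent (False) for one sex at one locus."""
    mock: bool  # glycerol/buffer in place of enzyme: checks DNA and band size
    msp1: bool  # MspI digest: cuts the site regardless of methylation
    hpa2: bool  # HpaII digest: cutting is blocked by CpG methylation


def interpret(female: Lanes, male: Lanes) -> str:
    """Return an illustrative status call for one X-linked CpG island."""
    for lanes in (female, male):
        if not lanes.mock:
            # No product even without enzyme: primers or DNA are the problem.
            return "uninformative: no product in mock digest"
        if lanes.msp1:
            # MspI should always cut; a remnant band means incomplete
            # digestion or too many PCR cycles.
            return "uninformative: incomplete digestion or too many cycles"
    if male.hpa2:
        # Assumption: methylation on the single active X in males means a
        # female band cannot be attributed to X inactivation.
        return "methylated in both sexes"
    # Female HpaII band: methylated CpG island on the inactive X.
    return "subject to inactivation" if female.hpa2 else "escapes inactivation"


# Example: band in the female HpaII lane only, with clean controls.
print(interpret(Lanes(True, False, True), Lanes(True, False, False)))
```

Checking the controls before reading the HpaII lanes mirrors the order of the text: a result is only interpreted once the mock lane shows product and the MspI lane shows none.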
For the methylation analysis, I chose two loci that reside within the long arm of the X chromosome in the XCR, and which are subject to inactivation in both humans and mice, to test in the cow and coast mole. The purpose of this was to confirm that genes that are silent in both mouse and human show similar status in all eutherians. Indeed, for all species tested, both Fmr1 and Ar displayed a band in each of the mock female and male controls, along with each of the female HpaII lanes, but not in the male HpaII or any of the MspI digests (Figure 3.5). These results suggest the presence of methylation at the Fmr1 and Ar CpG islands in the eutherians tested, as expected for loci that are part of the evolutionarily older X, where there is a general depletion of functional Y homologs (methylation results are summarized in Figure 3.6).

To address the idea of unique escape status in human, Jarid1c and Ube1 (short arm XCR, stratum 2), which escape silencing in human but not in mouse, were tested in the other two mammals. For Jarid1c, bands were observed in the female mouse and cow HpaII lanes, but not

Table 3.1. Degenerate Primers Used for Methylation Analysis at CpG Islands of X-linked Genes. Optimized primers work for cow, mole, human, and mouse genomic DNA.
Primer Pair  Gene  CRSP2-CpGF CRSP2-CpGR  Crsp2  UTX-CpGF UTX-CpGR UTX-CpGF UTXCpGR2  Utx  UTXCpGRS UTXCpGF2  Utx  UTXCpGF3 UTXCpGR3  Utx  UBEl-CpGF UBEl-CpGR  Utx  Sequence 5' to 3' TAC AGR G G G CRG M G G TGA G R A GGG RC GCS CGC CTC A A Y TGC RCC G A R T A C A A CCT CAG CCT CCS CCT  MgC12/Betaine 1.5mM 1M Betaine  CGY G G A GGC Y A T TAT TTC Y A G C A G A A T G RAG GGT CCV GGC Y G K GTC CGY GGA GGC YAT TAT TTCY A G C ARW GGS AGC WKS Y K G TTA GGT TG TCR T Y C TGG CGC CAT CTT CAT G A  Product Size  40X  Hhal (IX) HpaU ( 3 X )  375bp  Hhal ( 8 X ) HpaU ( 5 X )  354bp  Hhal ( 8 X ) HpaU ( 5 X ) Hhal(\X) HpaU ( 3 X ) Hhal(\X) HpaU ( 4 X )  403bp  Hhal ( 4 X ) HpaU ( 7 X )  544bp  Hhal HpaU Hhal HpaU  (5X) (6X) (9X) (3X)  630bp  Failed  Hhal ( 3 X ) HpaU ( 4 X )  329bp  Optimizing  Hhal ( 2 X ) HpaU ( 3 X )  132bp  Failed 1.5mM 1M Betaine  Ubelx  ATG ATT C A T RAR TRG GCG C G G GGTMY G A Y B Y C A A G GTCA G A TTT  1.5mM  UBEl-CpGF UBElCpGR2 PCTKlCpGR3 PCTKlCpGF3  Ubelx  ATG ATT CAT RAR TRG GCG CGG GTT CTG A Y M K G M RAT R C A W G G Y T C DGG A  Pctkl  GTG CCA GTA GTC TTCRGC CAT TTT GGT A C A Y G C A G T CCG A G G TGA  PCTKl-CpGF PCTKl-CpGR PCTKl-CpGF PCTKlCpGR2  Pctkl  GGA CTD GGA TCG  Pctkl  R . E . Sites Flanked  Failed  GGT GAT G A G GRA A A G A A A ATG GCG C C T TCR T Y C TGG C G C C A T C T T C A T G A  GGA R A A GGA GGTCGC GCG CCC A W C C Y C A G C TCC Y A G R M C GGA R A A GGA GGTCGC GCG VGG ACR CGC TCA CCG GMG  Cycles  Cycling Conditions 94-1 min 54-1 min 72-2min Failed  951 min 54-1 min 72-2min 94-1 min 56-1 min 72-2min  35X  35X  Failed 1 M Betaine (Human, M o u s e ) 2 M and 3 M Betaine ( C o w , Mole)  95-1 min 54-1 min 72-2min  35X  240bp 368bp  350bp  . Degenerate P r i m e r s Used for Methylation Analysis at C p G Islands of X - l i n k e d Genes (Continued...) 
FRAX 2 FRAX 7  Fmrl  GCT C A G CTC CGT TTC GGT TTC ACT TCC G G T A G C CCC GCA CTT C C A C C A C C A GCT CCT C C A  NEMOCpGF NEMOCpGR ZFX1 ZFX2  IkBkg  TAY AAR GAG CTA  Zfic  GAC ACC GGA AGC CGG A A G ARG ACC A C A CCT GTC A G C A G CTC GGA GCT GAC A A A A A CCC TTC CGC A T T TTC C T '  SMCX1 SMCX2  JaridlC  CCT CGG GCC CAC CAT GGA CTG ATT TTC GCG ATG TAG CC  E1F2S3F EIF2S3R COW A R FOR COW A R REV  Eif2s3  CCT TCA TCG GGT  Ar  BAG CTC AGT GCC  Y V T TGC CTR C M C A R A W A T C CAS C Y T CDC CSC C M G C C A T G C A GCA CCT TCC GGC G GGC CTC GCT CAG GAT GT  2mM 2 M Betaine  94-1 min 50-1 min 72-1 min  35X  Untested  Hhal ( 2 X ) Hpall ( 3 X )  250bp400bp depending on snecies  Hhal ( 2 X ) Hpall ( 3 X )  165bp  2mM 2 M Betaine  94-1 min 50-1 min 72-1 min  35X  Hhal ( 2 X ) Hpall ( 3 X )  105bp  0.5mM I M Betaine  94-1 min 56-lmin 72-1 min Failed  35X  Hhal ( 2 X ) Hpall ( 2 X )  117bp  240bp  40X  Hhal ( I X ) Hpall ( 2 X ) Hhal ( 4 X ) Hpall ( 4 X )  35X  -  1.5mM  E6R CME6R ( M o l e D N A Control) mUBEl A mUBElB (Mouse D N A control)  Xist  G C A G A G A C A CTG A A G C A C A C A A TGT CTT A C C C A T TTC C A T G A T TC  1.5mM  Ubelx  A G C TGT GCT GCA A C G A T G A A GTC TTG A G G TTG CTG GGT A  1.5mM  CMXIST12 CMXIST13 (Human and C o w D N A Control)  Xist  TTC TCA G M A GTK CTG GCA CAT CTG TTC TTT T G A G A T G T M CTT TTT G A T G T T  1.5mM  95-5min 95-1 min 62-1 min 72-2min 94-1 min 54-1 min 72-2min 94-1 min 56-lmin 72-1 min 94-1 min 52-1 min 72-3min (Human) 94-1 min 49-1 min 72-3min (Cow)  380bp  430bp  35X  196bp  35X  ~400bp (Human) ~300bp (Cow)  HUMAN COW MOLE MOUSE  Figure 3.4. Control PCRs for Methylation Analysis. The equal presence o f X chromosome D N A for all digested and mock-digested samples was checked using independent primers from those used in the methylation survey. 
Cow, human, and coast mole digests were amplified with a degenerate pair of Xist primers; mouse digests were amplified using mouse Ube1 primers that did not surround HpaII cut sites.

Figure 3.5. Methylation Analysis of X-linked Loci in Four Eutherians. The genes and their respective locations are shown on the left in relation to the human X chromosome. The corresponding gels are shown on the right for mole, mouse, cow and human. Each panel includes the mock, MspI, and HpaII reactions for female and male as labeled. X added region (XAR) is in green, while X conserved region (XCR) is in blue on the human X chromosome.

Figure 3.6. Summary Diagram of Methylation Analysis Results. Mouse shows unique inactive status compared to the non-rodent eutherians in the XAR and most distal XCR. The proximal XCR shows variable status among two eutherians and escape status in the others. I = inactive; E = escape; +/- = presence or absence of Y homolog, respectively. Testes = testes-specific expression of Y homolog. The locations of the loci are indicated on the left, in Mb from the p terminus of the human X chromosome. [Diagram: columns for human, mouse, cow and mole; loci at 23.9 Mb (ZFX), 40.3 Mb (CRSP2), 44.5 Mb (UTX), 46.8 Mb (UBE1), 53.1 Mb (JARID1C), 66.5 Mb (AR) and 146.7 Mb (FMR1).]

in the corresponding lanes in human and mole, when mock controls were positive and MspI digests were negative. These results indicate that the mouse and cow Jarid1c genes are subject to inactivation, whereas the mole and human genes show escape. However, when using DNA from a different female cow (IVF), the absence of a band in the HpaII lane suggests that the DNA is unmethylated and Jarid1c escapes inactivation (the result was replicated three times) (Figure 3.7).
For Ube1, the gene escapes silencing in both human and cow, but is subject to inactivation in mouse. The status at this locus is unknown for the coast mole, as these primers failed to amplify a specific band of the expected size, but instead produced multiple non-specific bands under a variety of conditions. In addition, even after many redigestions, bands were seen in the mole male HpaII digests but not in MspI lanes. If any of these bands represents the Ube1 locus, the CpG island is normally methylated in both males and females. In humans, there is no X-linked locus known to date which is normally inactive in both males and females, other than Xist. The closest situations are SYBL1 and HSPRY3, which are subject to inactivation on both the human inactive X and Y [43, 99].

To test whether mouse shows unique inactivation, two XAR loci, Zfx and Crsp2, that escape inactivation in human but not in mouse were assayed. For both loci, bands were absent in the coast mole, cow, and human HpaII lanes, suggesting hypomethylation at the corresponding CpG islands. This pattern indicates an escape status in coast mole, cow, and human for Zfx and Crsp2, compared to the inactivation status in mouse.

Finally, to confirm whether loci with concordant active status in human and mouse are expressed on the inactive X in other eutherians, the Utx gene (XAR), which escapes inactivation in both human and mouse, was tested. The coast mole and cow HpaII digests failed to amplify bands, despite obvious bands in the mock control. Thus, the Utx expression pattern in both moles and cows resembles that seen in human and mouse.

3.4) Discussion

3.4.1) Evidence of X-linked Loci in Cow and Mole

To draw any conclusions from the present methylation analysis, it was important to know whether the loci tested in fact reside on the X chromosome in all four species.
Confirming the locations was not a problem for human, mouse and to some extent cow, since their X chromosomes have been sequenced to entirety (Figure 3.8). However, several X-linked orthologs of interest (Crsp2, Utx, Ube1, Jarid1c) have not been "placed" on the cow X chromosome map and they lie at undefined positions in the cow genome (NCBI, UCSC). Nevertheless, MaoA, which is near Utx, and Syn1, which is near Ube1 and Pctk1, are mapped onto the cow X chromosome according to the 2005 cow radiation map available in the BOVMAP database (http://locus.iouy.inra.fr/), albeit at unknown locations. At present there is no sequence information on the coast mole X chromosome. However, Zoo-FISH experiments have confirmed whole X chromosome synteny in all eutherians tested [123]. This is consistent with Ohno's law, first hypothesized in 1973, which states that the gene content of the traditional mammalian X chromosome (not including the PAR) would be highly conserved across taxa because of strong selection to maintain dosage compensation.

Figure 3.7. Potential Developmentally-Dependent Expression Status of Jarid1c in Cow. Methylation analysis of cow DNA derived from in vitro fertilization (top) compared to cow DNA derived from the pulmonary artery of a young female (bottom). DNA from the male cow cell line CCL207 was used in both cases. Left panels show positive control for the presence of DNA using Xist primers amongst all digested and mock-digested samples, while right panels show Jarid1c amplification of the same samples. [Gel image: rows labeled IVF and CCL 209; panels labeled positive control and Jarid1c PCR; female and male lanes.]

Figure 3.8. Comparison of Eutherian X and Y Chromosomes. A) The relative order of tested loci in the present methylation analysis. The loci of interest (and their Y homologs in panel [B]) are highlighted in red. The loci indicated with (?) in cow have been mapped onto the X chromosome, but the relative positions of these loci are unknown. Known expression status of the loci in mouse, human, and cow are indicated as (E) for escape status or (I) for inactive status. B) Y chromosomes of mouse, human, and cow, with cat and pig Y chromosomes shown as comparisons. The precise location of the centromeres and size of the chromosomes may not be accurate. Positions of loci are not drawn to scale [124, 126-128 and NCBI]. [Diagram: panel A shows X chromosomes of cat, dog, rat, mouse, human and cow.]

Examinations of gene order in rodents [124] and artiodactyls [125, 126] reveal a fair degree of rearrangement relative to the human X chromosome. The gene order of the rat X chromosome is drastically different from that of mouse, although both rodents share acrocentric sex chromosomes (NCBI). In cow, Fmr1 is located at Xp (instead of Xq as in human) [127], on the opposite arm from cow Xist (Xq23) [127], and the centromere has shifted in cattle compared to human [128]. Due to an inversion, Ar and Rps4x are still located on the q arm of cow, similar to human [127].
According to the map in several recent papers, Zfx has been shuffled to Xq34 in cow, compared to its location at Xp21 in human [127, 128]. Zoo-FISH analysis on the common shrew (belonging to the same order, Insectivora, as the coast mole) again shows that the human X arm is conserved in its entirety onto a single chromosome of the shrew, although this chromosome also consists of human chromosome 2 genes and lacks human Xp material [129]. Different shrews possess X-linked Zfx, suggesting that moles, which are also insectivores, also possess this gene on the X chromosome (NCBI). Recent reconstructions of the ancestral eutherian karyotype using the parsimony principle (which assumes that chromosomes identical in species belonging to different taxa are likely to be present in their common ancestor) predicted that ten ancestral chromosomes were homologous to entire human chromosomes: 5, 6, 9, 11, 13, 17, 18, 20, X and Y. The ancestral karyotype resulting from other studies ranges from 2n=44 to 2n=50, resembling the karyotype of human [130]. This gives confidence that the loci investigated in this study are indeed located on the X in all eutherians.

3.4.2) Implications of Factors in Escape

Generation Time

The findings in the present methylation analysis showed that three out of four genes expressed differently between human and mouse (Zfx, Crsp2, and Ube1) are inactivated in mouse but not in any other eutherian tested (Figure 3.6). The short generation time in mouse might explain its more complete silencing. Mouse has a gestational time of three weeks, while the coast mole has a similar gestational time of four weeks, compared to 266 days in human and 277-290 days in cow [131] (http://www.infoplease.com/ipa/A0004723.html). However, the coast mole breeds during January to March and produces a single yearly litter consisting of 3-4 young, compared to the mouse, which breeds throughout the year (http://www.dfg.ca.gov/whdab/html/M017.html).
In addition, female moles are sexually mature at 9-10 months compared to 2 months in mice (http://imnh.isu.edu/digitalatlas/bio/mammal/insec/mole/como.htm). In cattle, females begin to mate at 18 months of age, with mating taking place throughout the year (www.oaklandzoo.org/atoz/azcattle.html). Since Zfx and Crsp2 (XAR genes) escape inactivation in moles (with longer generation times than mice), resembling the pattern seen in cattle and humans, more complete inactivation in the XAR in mice could be due to accelerated evolution. Humans may show abundant escape compared to the other three eutherians due to their long generation time. Unfortunately, the mole Ube1 expression status is still unknown. However, the escape status of cow and human Ube1 (XCR) strengthens the assertion that mouse, with its short generation time, displays unique inactivation status for several genes on the X chromosome, independent of the age of the region (XCR or XAR). Therefore, other than generation time, any mechanism that leads to increased mutation rates relative to other species, such as high metabolism (which is thought to increase the amount of oxygen radical damage to DNA and correlates with small body size), also merits further investigation regarding its effects on the proportion of genes that escape inactivation on the X chromosome.

Constitutive Heterochromatin

The fact that Xist does not need to traverse constitutive heterochromatin might also explain the discrepancy between mouse and the non-rodent eutherians investigated here. The simplistic view is that centromeres at the termini of X chromosomes would allow the Xist RNA to coat and silence the chromosomes more efficiently. Cattle have a diploid number of 60 comprising acrocentric autosomes and a pair of metacentric sex chromosomes.
Of the mole chromosomes investigated to date, the centromeres occupy a metacentric or submetacentric position in the majority of species, with only one of the eight species showing an acrocentric position (in only one pair of homologs), usually with a diploid number of 34 [132, 133]. Although the coast mole has not been karyotyped, it is likely that it also possesses biarmed sex chromosomes with centromeres near the centre rather than near the termini. Human, cow and mole, all with metacentric X chromosomes, show escape at Zfx and Crsp2, compatible with constitutive heterochromatin playing a role in escape. However, in the Jegalian and Page study (1998), Zfx was shown to escape silencing even in sheep, which have acrocentric chromosomes [100]. The observation that the cow Zfx gene is relocated to the Xq arm but still escapes inactivation, as in human, argues against a role for constitutive heterochromatin in hampering Xist spread and subsequent silencing of this region. In addition, Fmr1 on the cow X chromosome seems to have relocated from the long arm to the p arm, but again the centromeric barrier between Xist on the q arm and this locus does not seem to hinder silencing. Therefore, a factor other than constitutive heterochromatin must contribute to the species differences in inactivation.

Distance from the XIC

Because cow Zfx is located at a distal position (Xq34) on the same arm as Xist [127], the large separation from the Xist locus might partly account for the gene's escape status in this species. On the other hand, two out of four genes in PAR2 in humans, located at a comparable distance from XIST, are subject to inactivation on both the X and Y chromosomes. Furthermore, mouse genes, whether located near or far from the XIC, are nevertheless stably silenced; the silencing status of genes along the mouse X chromosome seems to be uniform, without a gradient due to distance.
As mentioned before, the cow Fmr1 locus is relocated to the Xp arm, presumably increasing the distance between the gene and Xist (Xq23) [123]. Despite the distance, Fmr1 remains subject to inactivation. These lines of evidence make it unlikely that distance from the XIC accounts for expression differences of X-linked genes between species.

Evolutionary Age of the Region

Genes located in the XCR on the short arm of the human X chromosome (stratum 2), Jarid1C and Ube1, which escape silencing in human but are subject to inactivation in mouse, exhibit a variable pattern in the mole and cow. For Jarid1C, expression in cow is similar to that in mouse (inactive), whereas expression in mole is similar to that in human (escape). As for Ube1, in at least one other mammal, the cow, the gene escapes inactivation as in human. Therefore humans do not appear to be unique in allowing the expression of loci on the inactive X located within the short arm XCR. It is interesting to note that stratum 2, which contains Jarid1C and Ube1, may not have been part of the original ancestral pair of autosomes from which the sex chromosomes evolved (syntenic to chicken 4p). Instead, stratum 2 maps to distinct autosomal origins and was likely added onto the original pair before the prototherian-therian divergence [90, 105]. Jarid1C shows inconsistent expression in mouse depending on the time of development, being inactivated initially during early embryogenesis and escaping as development proceeds. Along with the tissue and timing variability seen with Jarid1C, some genes in this region also show variability in escape between human females (e.g., Timp1); genes such as these tend to lack 5' CpG islands, as suggested by a recent analysis in humans [99].
One explanation for this variability is that even though the stratum 2 XCR existed on the X prior to the development of dosage compensation, it originated from a different genomic context than the traditional XCR (stratum 1). It could lack cis-elements or epigenetic factors necessary for stable inactivation, or contain sequence features more resistant to Xist silencing than the traditional XCR. Eventually, this could have led to different eutherian lineages acquiring different expression profiles and evolving different mechanisms to maintain their expression or inactivation. The status of Ube1 in coast mole, and of Pctk1 (another gene within the region), might clarify whether each eutherian bears a distinct expression status. At loci whose expression concurs in human and mouse, the status seems to be similar in other species as well. Utx (XAR), Fmr1 and Ar (XCR), whose status coincides with that expected from evolutionary age (XCR genes that are subject to inactivation, or XAR genes that escape silencing, in human and mouse), show the same status in cow and mole. One explanation for the consistent expression within Eutheria is that the dosage of these genes is crucial and has therefore been maintained throughout evolutionary time. For these genes, it is likely that all eutherians share ancestral regulation mechanisms due to strong evolutionary pressures to maintain the correct dosage of the gene products.

Presence of Y Homologs

One question is whether the silencing status of genes in different species reflects differential Y homolog retention in various lineages. Understanding the functions of the loci tested and their Y homologs might help to explain why genes require differential escape or inactivation.

Zfx

Zfx encodes a zinc finger protein with transcriptional activation activity, containing the Cys(2)His(2) (C2H2) zinc finger [134].
In humans, both this gene and its Y homolog are ubiquitously expressed in all tissues tested and at all stages of organismal development (GeneCards, http://bioinfo.weizmann.ac.il/cards/index.shtml). On the other hand, the mouse Zfy has functions confined to germ cells (OMIM, http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM). Although a variety of carnivores (cat, dog, cheetah, jaguar, leopard, and lynx), artiodactyls (pig, cow, sheep, goat, and bison) and the horse possess the Zfy gene (NCBI), it is unknown whether Zfy is present in the mole and whether it is ubiquitously expressed in the cow. If the expression pattern of the cow Zfy gene is similar to that in human, the need for dosage compensation at this locus would explain the escape status in human and cow, compared to the inactive status in mouse, where expression of Zfy has become testis-specific. Unfortunately, there is not enough information on Zfy expression or the mole Y chromosome to draw any conclusions on whether the locus complies with the Y homolog "rules" for inactivation, although the situation in human and mouse suggests that it does.

Crsp2

As its name implies, Crsp2 (cofactor required for Sp1 activation) encodes a nuclear cofactor that forms part of the large multiprotein complex Crsp, required for Sp1 to initiate transcription [135]. In human and mouse, the Crsp2 gene is ubiquitously expressed, but mouse contains no Crsp2 Y homolog, whereas human contains a Y pseudogene (OMIM and [99]). To date, no Crsp2 Y homolog has been detected in the available cow and mole sequences deposited in the NCBI database, or in radiation hybrid, linkage, or physical maps. Although the requirement for dosage equivalency between males and females explains Crsp2 inactivation in mice, which lack the Y homolog, this same explanation is insufficient for the cow and mole, which also lack Y homologs yet show escape.
In addition, since the Y pseudogene in human by definition does not encode functional transcripts, there is no final dosage difference between males and females to correct, which is at odds with the observed escape status of Crsp2 in human. Thus, the presence of Y homologs does not account for the Crsp2 expression status on the inactive X chromosome across eutherians.

Ube1x

Ube1x encodes a ubiquitin-activating enzyme that catalyzes the first step of ubiquitin conjugation, which is necessary for diverse cellular processes including selective protein degradation, DNA repair, nuclear DNA replication/synthesis, and progression of the cell cycle (OMIM). Consistent with its important cellular functions, Ube1x is widely expressed in mouse [136]. Ube1y is found in many eutherians, a metatherian (opossum), and even a prototherian (platypus), but is nevertheless absent in humans [136]. No Ube1y homolog was found in the cow or any insectivore based on sequence data in the NCBI Map Viewer or through BLAST. Consistent with its proposed role in spermatogenesis, Ube1y expression is limited to the testis in mouse [136]. Thus, it is understandable that X-linked Ube1x is subject to X inactivation in mouse, because this gene is essentially expressed in one dose in somatic tissues of both males and females. However, this X-Y gene pair is differentially regulated, being expressed at different ages and in different tissues, signifying that Ube1x and Ube1y are not functionally equivalent [137]. In human and other eutherians (cow and mole) that do not appear to possess Ube1y, Ube1x escapes inactivation, suggesting that the exact amount of final product from the Ube1 locus is not crucial. The absence of Y homologs does not explain the expression status of Ube1x in human, cow and mole, although the expression pattern of the Y homolog explains the status in mouse.

Utx

The Utx (X-linked ubiquitously transcribed tetratricopeptide repeat [TPR]) gene, which is expressed from the inactive X chromosome in both mice and humans, has a Y homolog, Uty, that encodes a protein with 8 TPR motifs believed to mediate protein-protein interactions. The UTY protein confers H-Y antigenicity on male cells, which is known to be involved in stem cell graft rejection between males and females. Because the human UTY gene maps to band 5C, known to contain spermatogenesis genes, Uty may similarly be involved in spermatogenesis (OMIM). Both Utx and Uty are widely expressed in all tissues of human and mouse (OMIM and GeneCards). Uty is found in cow (GI: 17933095), human, mouse, and shrews (GI: 60267961, 60267959), whereas no evidence of this gene was found in dog, rat, and mole based on available sequence data at NCBI. Since Utx escapes inactivation in all four eutherians tested here, three of which are known to possess the Y homolog, it is likely that the coast mole also carries a homolog on the Y chromosome, because it shows a similar Utx escape pattern (the online databases likely lack information on the mole). Based on these results, I conclude that the dosage of the Utx/Uty protein products is important and predicts whether Utx is expressed or subject to inactivation. Eutherians lacking Uty homologs would be expected to display an inactive status for Utx on the X chromosome.

Jarid1C

Jumonji AT rich interactive domain 1C (JARID1C; formerly SMCX) is an evolutionarily conserved protein involved in transcriptional regulation and chromatin remodeling (OMIM and [138]). JARID1C transcripts are detected in many human adult tissues, with the highest expression in brain and skeletal muscle and the lowest expression in the heart and liver [138].
On the other hand, the Y-linked JARID1D, or Jumonji AT rich interactive domain 1D (formerly SMCY), is expressed only in pancreas, lung, brain, and skeletal muscle, at much lower levels than JARID1C [138]. The Jarid1D homolog is found in cow, cat, mouse, and human, but not in rat, dog, or insectivores, based on available information (NCBI, physical, linkage and RH maps). Therefore, relying solely on Y homolog differences, I would expect the coast mole Jarid1C to be subject to inactivation. Instead, the mole gene showed an escape pattern similar to that of human JARID1C, and the cow gene showed an inactivation status similar to that of mouse Jarid1c (early mouse development). In addition, Jarid1C and Jarid1D may not be functionally equivalent. In mice, Jarid1C is expressed at significantly higher levels in the adult female brain relative to the male brain, yet the expression of Jarid1D in males is insufficient to compensate for that difference [137]. Therefore, the presence of the low-expressing Y homolog in males likely does not necessitate a second dose in females, although a second dose might account for sex-specific differences. Because the expression of mouse Jarid1C on the inactive X increases with developmental time [108, 109, 114], such sex differences would be established during late development. In the present study, cow Jarid1C was observed both to be subject to inactivation in a CCL209 female cell line and to escape inactivation in an in vitro fertilization (IVF) female-derived sample (Figure 3.7). The cow gene might be subject to inactivation due to low expression of the Y homolog, as in mouse, but there is currently no information on expression levels of cow Jarid1D. Unfortunately, the developmental stage from which the IVF sample was derived is unknown, although the CCL209 cell line is derived from the pulmonary artery of a young female cow.
Experiments using 85- and 95-day-old fetuses from pregnant uteri, as well as bovine cells from skin biopsies with sex aneuploidy [139], have revealed that Jarid1C is subject to inactivation during later developmental stages and adulthood, a result replicated in this study using the CCL209 cell line. If the CCL209 DNA is from a cow of a later developmental stage than the IVF cow, then the pattern observed in cow (escape followed by silencing) is opposite to the silencing followed by escape seen in mouse. Taken together, this study presumes that sex differences resulting from differential expression are achieved during an early developmental stage in cow. However, the variability seen between cow samples could reflect differences between individuals or cell types rather than developmental time. This can be assessed by comparing the methylation status of Jarid1C in embryos of known developmental stages and by testing cell lines from different tissues and individuals.

Ar and Fmr1

The CpG islands of Ar and Fmr1 were methylated in the four species tested in the present study. AR encodes the androgen receptor, a ligand-activated nuclear transcription factor, which is also a serine/threonine protein kinase. The active androgen-receptor complex regulates the activity of androgen-responsive genes, which direct the development of sexual characteristics, such as hair growth and sex drive, in both sexes, as well as male-specific sexual characteristics. In human, AR is expressed in many of the body's tissues, whereas in mouse it is predominantly expressed in urogenital tissues. Mouse, human, cow and mole lack Y homologs for this gene. The fact that AR/Ar is subject to inactivation in all four species reflects the importance of precise dosage for regulating sexual characteristics, despite different tissue specificities. Fmr1, which is known for its role in fragile X mental retardation when functionally lost, is implicated in dendritic development.
As with Ar, Fmr1 has no Y homologs in the four species tested, which is characteristic of genes located in the evolutionarily old part of the X chromosome. The X-linked copy of the gene was subject to inactivation in all four eutherians examined in this study, as would be expected of a gene that has critical functions and is expressed in only one dose in males.

3.4.3) Summary of Factors in Escape

In conclusion, the presence of Y homologs correlated with the escape status of the X-linked loci Utx, Ar, and Fmr1 (and perhaps Zfx), but not Crsp2, Ube1x or Jarid1C. The first three loci (Utx, Ar, and Fmr1) showed concordant status in all four eutherians, and the absence of Y homologs correlated with the evolutionary age of the region. The dosage of the final products from these genes is presumably important, and thus regulation of appropriate expression in males and females has been maintained in different eutherian lineages. Of the three genes whose retention of Y homologs did not explain the need for escape, two are located within the stratum 2 XCR, and all three showed discordant expression status between human and mouse. These genes may possess epigenetic differences across species that could account for the expression differences. The results of the present study support the idea of gene-specific regulation, since the hypothesized reasons for escape at one locus did not always apply to another. It is likely that the expression profiles of X-linked genes across species result from multiple factors that contribute differently depending on the gene. Short generation times make it more likely that Y homologs are lost in evolutionarily new regions, and the lack of constitutive heterochromatin facilitates the spreading process. A large distance from the XIC makes genes less prone to inactivation, but the initial predisposition conferred on different genes by epigenetic marks varies.
Species with acrocentric chromosomes could have long generation times that counter the need for inactivation because Y homologs are still retained in the evolutionarily newer regions. Hence, meaningful comparisons are best achieved using mammals with as many similar features as possible other than the factor tested. Many additional genes and species are necessary to rigorously test the different factors associated with the regulation of escape versus inactivation. Future epigenetic analyses would also clarify whether genes that escape inactivation in different species share the same epigenetic modifications.

Chapter IV General Conclusion

Whether in Xist sequence or in regulation of Xist by Tsix, rodents appear to be outliers in the eutherian infraclass. Results from the present conservation and methylation analyses further suggest that the secondary structure of the Xist transcript and the extent of silencing along the inactive X chromosome are unique in rodents compared to other eutherians. Although the use of the mouse model to study X inactivation has provided tremendous insight into the process of dosage compensation over the past decades, the present comparative studies underline the importance of species-specific differences and caution against relying solely on mice to make generalizations about all eutherians. For the first time, secondary structure conservation of a transcript the size of Xist has been studied using data from more than three eutherian species and an RNA program designed to predict common structures in diverged but related sequences. Results from this analysis have revealed conserved structures for the sequence before the A repeat, the A repeat, the F repeat, and exon 4.
The confirmation that the A repeat and exon 4 stems are conserved within the Xist sequence in multiple eutherians argues that these regions carry an important biological role common to all eutherians in the X-inactivation process. Considering its role in silencing in mouse, the A repeat potentially recruits players involved in the establishment of silencing in all eutherians. However, the conserved A repeat stem predicted in the present study is different from the structure previously predicted by Mfold for single Xist orthologs [64]. This emphasizes that not all hairpins predicted to be stable are conserved, and raises the possibility that the conserved stems formed between adjacent A repeats are responsible for silencing, rather than the two stem loops formed within each A monomer. Past studies demonstrating that mutations of A repeats disrupted silencing involved changing sites crucial for base-pairing of stems internal to each repeat [64]. According to the location of the conserved stems found in this study, the sites mutated are the same sites necessary to form stems between A repeats. Hence, the past mutational studies do not allow us to clearly distinguish the roles of the formerly predicted stems and the stems found to be conserved here. On the other hand, exon 4 forms a stable stem loop in all of the eutherians compared in the present study, identical to that predicted previously. Since exon 4 deletion in mouse led to no apparent changes in localization or random inactivation [69], it presumably has no impact on the same processes in the other eutherians. Instead, the exon 4 hairpin could contribute to the stability of the Xist transcript [69]. In this study, two new Xist regions were found to share common secondary structures across seven eutherians: the sequence before the A repeat and the F repeat.
Because the A repeat is in close proximity to the 5' end, it is plausible that the transcriptional silencing machinery also binds to the sequence before the A repeat. Alternatively, conservation of this sequence could be a result of genetic hitchhiking. On the other hand, the F repeat conserved structure could mediate DNA binding, since the F repeat in seven eutherians has a lower tendency to form secondary structures than shuffled sequences bearing the same base composition. This region could affect the mobility of Xist by binding to the inactive X and/or interacting with SAF-A [68]. The present study also uncovered Xist secondary structures that are distinct to rodents. Conserved RNA folding of the unique exon 5, not shared with non-rodents, could confer an advantage to rodents in the stability of the transcript, as it is located in close proximity to exon 4. Furthermore, differences in the structures between the A and F repeats suggest that the process of silencing or mobility is altered between rodents and non-rodents. The RNA folding within the last kilobase of the D repeat was conserved only among the rodents investigated. Since deletions of the D repeat did not affect silencing in mouse [64], this region must act downstream of silencing; in mice, this repeat takes part in both macroH2A recruitment and localization. Structural differences in the D repeat across species suggest that these processes are altered between rodents and non-rodents. This difference in D repeat structure could explain why human XIST only partially coats and silences the mouse autosome from which it is expressed [76]. Differences in recruitment of macroH2A and localization, as well as downstream effects such as maintenance, have not been investigated in cow, dog, rat, vole, or mole. Such studies would be useful to test the idea of species variation due to secondary structure differences.
My work suggests that rodents have undergone more rapid Xist sequence and RNA structure divergence than other eutherian lineages. Given the differences between the Xist secondary structures of rodents and those of non-rodent eutherians, it is not surprising that rodents show differences in their extent of silencing along the X chromosome, if the shape of the Xist transcript has any impact on the affinity of the RNA for the X chromosome and/or its subsequent silencing ability. This difference could partly account for the variation seen in the extent of silencing on the X chromosomes between eutherian species, as observed in the methylation analysis presented here. In the current methylation analysis, used to test expression status along the X chromosome across different mammals, the mouse showed methylated status for three out of four X-linked genes that were consistently unmethylated in other mammals of distinct eutherian lineages. This study supports the idea that mice are distinct in their complete spread of silencing on the inactive X due to accelerated evolution of the chromosome resulting from short generation times. On the other hand, I found no evidence that the distance from the XIC, or the need for the Xist transcript to cross constitutive heterochromatin, directly affects the expression of X-linked genes in the mammals tested. Furthermore, for many loci examined here, and as is also notable from a recent extensive analysis by Carrel and Willard (2005), numerous X-linked genes escape inactivation despite the absence of their corresponding Y homologs [99]. The presence or absence of Y homologs adequately explains the methylation status only of those loci that show similar expression in human and mouse. Gene dosage from these loci is presumably important, and the retention of Y homologs coincides with the evolutionary age of the region.
Since the methylation status of these genes is consistent across eutherian lineages, conserved regulatory mechanisms likely exist at these loci to control appropriate expression during X inactivation. The discovery of variable Jarid1C expression between cow samples suggests that variability is not uncommon within the eutherian infraclass. As monotremes and marsupials normally show variable expression of X-linked genes, the eutherian variability could simply reflect common ancestral origins. The incomplete inactivation observed on the human, cow, and mole X chromosomes also resembles the incomplete inactivation seen along the marsupial X chromosome, although mouse instead shows nearly complete inactivation. This study strengthens the argument that the mouse X chromosome is in the process of acquiring more complete inactivation than other eutherians, possibly due to the short generation time of mice, any mechanism leading to an increased mutation rate in rodents, and/or the establishment of epigenetic factors necessary for efficient inactivation. Humans, however, might be at the other extreme, allowing a large number of genes to escape silencing on the long arm of the X [99], regardless of whether these same genes are located on the long arm or short arm in the other eutherians. Investigation of the expression status at these loci in other mammals will test whether the human expression profile is unique. The retention of Y homologs, as for RPS4X, might explain the need to escape for some human XCR genes [100]. Long generation times in human could partially account for the escape observed along the human X chromosome. Long arm XCR genes that escape inactivation in human, such as IKBKG, should be investigated in other mammals to test this idea.
Although the current methylation analysis has conclusively shown that mice have unique inactivation for several loci, the mechanisms by which genes escape inactivation in one species versus another remain unclear. X;autosome translocation studies suggest that escape correlates with increased distance from the XIC and differs depending on genomic context: which chromosome, and which segment of a given chromosome, is involved [43, 59, 122]. Inducible Xist transgenes inserted into human autosomes have confirmed that localization differs depending on insertion into the p versus q arm of a given chromosome (personal communication: Sharan Sidhu and Jennifer Chow). The spread of silencing to these regions might also differ depending on the composition of possible way stations, such as L1 or other repetitive elements [42, 106]. Similarly, in this study, because genes on the X have been shuffled (to an extent that depends on the eutherian) despite conserved synteny across the bulk of the chromosome, loci have been repositioned to different genomic contexts, which could affect silencing. What is necessary to delineate the factors correlated with escape are high-resolution X chromosome maps from numerous eutherians clearly showing gene order, an extensive analysis of the expression status of numerous X-linked genes in multiple eutherians, and genomic context information about their X chromosomes, especially regarding L1 and LTR elements, which might play a role in silencing [42, 106]. These data are not readily available, making correlations with escape from X inactivation difficult. Although the genomes of many eutherians are currently being sequenced, determining the gene order and expression status along the X will require concerted effort by many researchers. The methylation analysis used in this study is a limited means to assay for expression along the inactive X chromosome.
The genes investigated are restricted to those containing CpG islands near their promoters. Genes lacking CpG islands tend to be variable in expression [99], and this assay makes studying the regulation of these genes infeasible. The number of CpG dinucleotides whose methylation status can be assessed is limited by the number of HpaII restriction sites within the PCR product. Although primers are designed to flank numerous HpaII restriction sites, a single unmethylated CpG could lead to unsuccessful PCR amplification, suggesting hypomethylation of the CpG island. It is unclear whether the methylation status of a single CpG reflects the global methylation status of the entire island studied, although this could be tested using sodium bisulfite sequencing. Furthermore, methylation analysis does not allow one to investigate the role of other epigenetic differences in regulating, maintaining, or predisposing X-linked genes to a particular expression status. Differences such as H3K4 or H3K9 methylation, H3K9 acetylation, and H4 acetylation could cause variations in expression. Differences in L1 and CpG island density could also distinguish monoallelically expressed from biallelically expressed genes, as is the case for autosomal, random monoallelically expressed loci [44]. Future experiments using ChIP (chromatin immunoprecipitation) and correlations made by genomic sequence analyses could address these additional candidate factors in controlling expression patterns on the X chromosome.

Chapter V Material and Methods

5.1) Polymerase Chain Reaction (PCR)

25 µl PCRs were set up in Biometra or Techne Genius Thermal Cyclers with 1 min denaturation at 94°C, 1 min annealing at 50-62°C, and 2-3 min elongation at 72°C, depending on the primers (please see the primer table for corresponding properties and conditions).
Each reaction contained 200 ng template, 20 µM dNTPs, 1.5 mM MgCl2, 1 µl 10X PCR buffer (200 mM Tris-HCl [pH 8.4], 500 mM KCl), 1 µM primer pair, and 0.625 units (U) Taq DNA polymerase (all from Gibco/BRL except primers, which were ordered from the UBC NAPS sequencing centre). Occasionally, the amplification of CG-rich regions, for example the CpG islands in the methylation analysis, required 1 or 2 M betaine, which decreases secondary structure formation to allow for more efficient amplification, or a change in MgCl2 concentration. Amplifying long PCR fragments, such as those spanning coast mole Xist regions greater than 3 kb, necessitated the use of the Expand Long Template system (Boehringer Mannheim). Two separate 25 µl mixes, kept on ice, were combined into a total 50 µl volume for amplification. The first mix consisted of primers, dNTPs, and DNA in water, at the same final concentrations as for standard PCR above. The second mix contained 2 U of enzyme mix, Buffer 3 (400 mM Tris-HCl [pH 8.4], 1 M KCl, 0.75 mM MgCl2) and water. Once combined, the reaction was overlaid with 30 µl mineral oil before being placed into thermocyclers for an initial 2 min denaturation at 94°C, followed by 40 cycles of 30 sec at 94°C for denaturing, 30 sec at 50-54°C for annealing (depending on the primers), and 3-7 min at 68°C for elongation (depending on fragment size). The 40 cycles were followed by a 7 min extension at 68°C to ensure that partial fragments were completely synthesized.

5.2) Cloning

The pGEM-T vector system (Promega) was used to TA-clone PCR products, which increased the concentration of low-yield products for sequencing and ensured that the desired products were sequenced. It was required in cases where sequencing reactions failed, possibly due to low concentration of templates, suboptimal conditions for annealing of user-supplied primers, or remnant unwanted products in gel-purified samples.
Propagating the PCR fragment via bacterial clones increased the template concentration, ensured that a single product was sequenced, and eliminated the problem of user-supplied primers, since standard vector primers were used in the sequencing reactions. As described in the pGEM-T vector system manual, 10µl ligations were carried out using the Rapid Ligation Buffer (60mM Tris-HCl [pH 7.8], 20mM MgCl2, 20mM DTT, 2mM ATP, 10% polyethylene glycol [MW 8000, ACS Grade]), 3 U of T4 ligase, 50ng pGEM-T vector, and the appropriate nanogram amount of PCR product, depending on its size, to give a 1:1 vector:insert ratio. The ligation relied on the A overhangs preferentially left on the product by Taq polymerase (or any other DNA polymerase without 3'->5' exonuclease activity) in PCR reactions and on the T overhangs supplied by the vector, which are designed to reduce self-ligation of the vector or insert. The reaction was mixed by pipetting and left at 4°C overnight to allow for efficient ligation.

Transformation was carried out using 50µl JM109 High Efficiency Competent Cells (Fisher), which were thawed from -70°C and transferred to 2µl of each ligation reaction. The tubes were mixed by gentle flicking and placed on ice for 20 min, after which the cells were heat-shocked for 45-50 s at 42°C and immediately returned to ice for another 2 min. To propagate the transformants, 950µl of SOC medium was added and the bacteria were incubated for 1.5 hours at 37°C with shaking at 150 rpm. Aliquots of 10 and 100µl of each transformation culture were spread onto LB agar plates containing ampicillin (50µg/ml), IPTG, and X-gal, which were incubated overnight at 37°C. Plasmid minipreps were prepared from white colonies for insert analysis. The presence and size of inserts were confirmed via restriction enzyme analysis by NotI (8-cutter) and NcoI (6-cutter) double digests.
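The "appropriate nanogram amount" of PCR product for a given molar ratio follows from the relative lengths of insert and vector. The sketch below illustrates that arithmetic only; the function name is hypothetical, and the 3.0 kb vector length used in the example is an assumption that should be checked against the vector map.

```python
def insert_mass_ng(vector_ng, vector_kb, insert_kb, insert_to_vector_ratio=1.0):
    """Mass of insert (ng) that gives the requested insert:vector molar ratio.

    Equal molar amounts of two DNA fragments have masses proportional to
    their lengths, so the insert mass scales with insert_kb / vector_kb.
    """
    if vector_kb <= 0 or insert_kb <= 0:
        raise ValueError("fragment lengths must be positive")
    return vector_ng * (insert_kb / vector_kb) * insert_to_vector_ratio


# Example: 50 ng of vector (length assumed to be 3.0 kb) and a 1.5 kb
# PCR product at a 1:1 molar ratio.
print(insert_mass_ng(50, 3.0, 1.5))  # 25.0
```

Longer PCR products therefore need proportionally more mass to reach the same 1:1 ratio.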
5.3) Restriction Digest (PstI) Cloning

Several coast mole PCR fragments were in low concentration due to poor amplification, even after several coast mole specific primers and many PCR conditions were attempted. Before cloning the PCR product into vectors to increase the yield, the PCR sample was treated with a proteinase (proteinase K, Invitrogen) followed by phenol-chloroform extraction to remove proteins and salts. The clean PCR product was digested with PstI (6-bp cutter) to give bands of different sizes and cloned into the pBSII Bluescript vector (Stratagene), which was cut with the same enzyme and treated with shrimp alkaline phosphatase (SAP) to prevent self-ligation of the vector. A 1:1 insert:vector ratio was used in an overnight ligation at 4°C, followed by transformation into DH5 alpha competent bacteria, which were grown on LB plates with IPTG and X-gal, and picked the next morning using blue-white screening. White clones were propagated in LB media at 37°C in a shaker for 16 hours. Inserts were analyzed using plasmid minipreps followed by restriction enzyme analysis with PstI, to select positive clones for sequencing.

5.4) 5' and 3' RACE

Since PCR amplification of the 5' and 3' ends of coast mole Xist proved problematic using conserved primers designed from the multiple alignment of cow, human, and mouse, Rapid Amplification of cDNA Ends (RACE) (GeneRacer Kit, Invitrogen) was used to amplify the promoter and UTR regions of coast mole Xist, as well as to reach a poly-A signal. Alignments of sequenced coast mole Xist regions indicated that the gene ended ~2 kb and 2.5 kb away from the 5' and 3' ends of the sequenced DNA, respectively. However, according to the properties of mouse, vole, and human Xist, the largest exons at the 3' end (~4.5 kb; human e6, mouse e7) contain many alternative splice donor sites which lead to earlier termination of the exon.
The human exon 6 contains a splice donor site 3 kb into the exon. Since I sequenced 2.5 kb of coast mole exon 6, 3' RACE was performed to obtain the last portions of the mole Xist gene. 5' RACE was also attempted, since this system supposedly works to obtain 5' and 3' ends as far away as 2 kb from available sequences.

For 5' RACE, the GeneRacer method involved the treatment of total coast mole RNA with calf intestinal phosphatase (CIP) to remove 5' phosphates from non-capped RNA or non-mRNAs. The treatment of dephosphorylated RNA with tobacco acid pyrophosphatase (TAP) removed the 5' cap structure from intact, full-length RNA and exposed a 5' phosphate, preparing it for subsequent ligation to the GeneRacer oligo, whereas truncated mRNA and non-mRNA whose 5' phosphates were removed in the initial CIP step could not undergo this ligation. The ligation of the supplied oligo by T4 ligase provided a unique priming site for GeneRacer primers to generate full-length cDNA from mRNA. 3' RACE did not require any dephosphorylation or decapping steps. The RNA was reverse transcribed using a user-designed gene-specific primer near the 5' end or, in the case of 3' RACE, the GeneRacer OligodT primer. Next, a PCR reaction was performed with a supplied GeneRacer 5' or 3' primer and a complementary gene-specific primer with the following conditions:

Temperature (°C)    Time (minutes)    Cycles
94                  2                 1
94                  0.5               5
72                  2                 5
94                  0.5               5
70                  2                 5
94                  0.5               20
65                  0.5               20
68                  2                 20
68                  10                1

For both 5' and 3' RACE, the supplied total HeLa RNA was used as an internal control for proper conditions, performed alongside the coast mole reactions. It is uncertain whether Xist RNA is capped; therefore, total human RNA was used as a control in 5' RACE, with human Xist specific primers in the RT and PCR steps.
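The RACE cycling conditions above can be written down as data, which makes it easy to check the cycle count and the programmed run time. This is only an illustrative sketch: grouping the table rows into stages by their shared cycle counts is an assumption, the names are hypothetical, and ramp times between temperatures are ignored.

```python
# Each stage: (number of cycles, [(temperature in °C, minutes), ...]).
# Rows of the conditions table are grouped by their shared cycle count.
RACE_PROGRAM = [
    (1, [(94, 2)]),
    (5, [(94, 0.5), (72, 2)]),
    (5, [(94, 0.5), (70, 2)]),
    (20, [(94, 0.5), (65, 0.5), (68, 2)]),
    (1, [(68, 10)]),
]


def programmed_minutes(program):
    """Total hold time in minutes, excluding temperature ramping."""
    return sum(cycles * sum(minutes for _, minutes in steps)
               for cycles, steps in program)


def amplification_cycles(program):
    """Number of cycles in stages that contain more than one step."""
    return sum(cycles for cycles, steps in program if len(steps) > 1)


print(programmed_minutes(RACE_PROGRAM))    # 97.0
print(amplification_cycles(RACE_PROGRAM))  # 30
```

The stepwise drop in annealing/extension temperature (72, 70, then 65°C) is what the staged structure makes explicit.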
5.5) Gel Extraction and DNA Purification

PCR products were purified by first excising the PCR band from a low-melt gel and then cleaning up the DNA via the Qiagen QIAquick PCR purification kit (QIAGEN Cat. No.287704). The agarose fragment was solubilized using buffer QG and then placed in a supplied silica-gel membrane spin column, to which DNA adheres when centrifuged, whereas remaining agarose was filtered through into the collecting tube and discarded. Additional buffer QG was added to the column and spun through to remove trace amounts of agarose. The sample was washed several times in an ethanol-based buffer PE to remove any salts, and the final DNA was eluted with 30µl water into a clean 1.5ml Eppendorf tube.

5.6) NCBI, BCM Search Launcher

Searches for sequences similar to the Xist gene from different species used the nucleotide-nucleotide BLAST program (blastn) available on NCBI (http://www.ncbi.nlm.nih.gov/). Pairwise alignments of Xist orthologs were performed with the bl2seq BLAST tool in order to note locations relative to human or mouse XIST/Xist and to approximate exonic boundaries. The map viewer available on NCBI was used to compare the X chromosomes from dog, rat, cow, mouse and human, as well as to confirm locations of X-linked genes or Y homologs. Multiple sequence alignments were done via ClustalW v. 1.6 set, available on the BCM Search Launcher website (http://searchlauncher.bcm.tmc.edu/multi-align/multi-align.html), to design degenerate primers for sequencing of mole Xist.

5.7) Nucleic Acid Dotplots

The dot plot program available online at http://arbl.cvmbs.colostate.edu/molkit/dnadot/ was used to align two sequences with poor local similarities, which was infeasible using BLAST. This program allowed the detection of repetitive regions and global similarities within the sequences.
A sliding window size of 19 and a mismatch limit of 4 were generally used, such that the program placed a dot in the middle of the window when at least 15 out of 19 bases matched between the two sequences.

5.8) Tandem Repeat Finder

Xist sequences from each of the eutherians were submitted to the Tandem Repeat Finder (http://tandem.bu.edu/trf/trf.advanced.submit.html). To adjust all parameters for low-stringency detection of tandem repeats, the advanced input form was used. The alignment parameters were set at (2, 7, 7) for matches, mismatches and insertions/deletions, respectively. The minimum alignment score to retrieve results was 50 and the maximum period size was restricted to 500. The output regarding repeats included their locations, consensus sequences, and copy numbers.

5.9) CARNAC

CARNAC was employed to predict common secondary structures between Xist orthologs (public webserver, http://bioinfo.lifl.fr/carnac/). The sequences were entered in FASTA format, and the parameters were set to eliminate redundant sequences and to account for GC content within the sequences without allowing isolated stems. The stems (found in all orthologs) retrieved for each sequence were displayed in three formats: a Connect file (ct), which provided a textual description of the base pairings, and a PostScript (ps) file and a JPEG (jpg) file, which both provided graphical representations of the secondary structures. The JPEG and PostScript files were automatically generated from the ct file using the freely distributed drawing tool Naview. When no common stems were detected, the message 'No structure found' was displayed. To visualize all predicted foldings at once for easy comparison, RNAfamily, a Java applet dedicated to presenting multiple RNA sequences, was downloaded from the CARNAC site.
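Returning to the dot plots of section 5.7, the windowed matching rule (window of 19, at most 4 mismatches, dot at the centre of the window) can be sketched as follows. This is a minimal illustration of the rule as described, not the code of the online tool; the function name and the example sequence are invented.

```python
def dotplot(seq_a, seq_b, window=19, max_mismatch=4):
    """Return (i, j) window-centre positions where the two windows match.

    A dot is recorded when the windows starting at i in seq_a and j in
    seq_b differ at no more than max_mismatch positions.
    """
    a, b = seq_a.upper(), seq_b.upper()
    half = window // 2
    dots = []
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            mismatches = 0
            for k in range(window):
                if a[i + k] != b[j + k]:
                    mismatches += 1
                    if mismatches > max_mismatch:
                        break  # window already fails, stop comparing
            if mismatches <= max_mismatch:
                dots.append((i + half, j + half))
    return dots


# Invented 19-base example: identical sequences give a single centre dot.
seq = "ACGTTGCAAGCTAGGCTTA"
print(dotplot(seq, seq))  # [(9, 9)]
```

Runs of dots along a diagonal indicate extended similarity, and parallel diagonals indicate repeats. The brute-force comparison is quadratic in sequence length, which is acceptable for a sketch but slow for long sequences.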
5.10) RNAlifold

ClustalW alignments were first obtained for the orthologous sequences in question (http://www.ebi.ac.uk/clustalw/). The alignments, saved as text files, were then uploaded onto the RNAlifold server (http://rna.tbi.univie.ac.at/cgi-bin/alifold.cgi) for analysis using the default parameters. Alignment lengths were limited to 1.5 kb. In particular, the fold algorithm of "partition function and pair probabilities" was selected; the weight of the covariance term was 1; the penalty for non-compatible sequences was 1; and energy parameters were scaled to a temperature of 37°C. The output consisted of mountain plots, energy dotplots, and predicted consensus structures of the orthologs, given in PostScript format.

5.11) Mfold

Individual orthologous sequences from each species were entered into the Mfold webserver version 3 (http://mfold.burnet.edu.au/) or version 3.1 (http://www.bioinfo.rpi.edu/applications/mfold/old/dna/forml.cgi). Default parameters were used: folding temperature was 37°C, percent suboptimality was 5, the upper bound on the number of computed foldings was 50, ionic conditions were 1 M NaCl without divalent ions, and the default window parameter was used. No pairing constraints were applied.

5.12) R Statistics Package

Original as well as randomized Xist datasets were input to the R Statistics package (version for Windows), downloaded from http://cran.r-project.org. The data sets were read by the program as tab-delimited text files, displaying information in 4 columns: species, region of Xist, run number, and stem length. Each row signified a single stem detected by CARNAC as being conserved between species. Histograms of frequency versus stem length were generated for both randomized and original data sets using command lines. Values of the range, median, mean, and standard deviation (SD) were also obtained from the plotted histograms using the R software.
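The summary statistics described above can be reproduced from a four-column, tab-delimited stem table with a few lines of code. The sketch below uses Python rather than R, the function names are hypothetical, and the example rows are invented purely for illustration (they are not thesis data). It uses the sample standard deviation (n - 1 denominator), which is also what R's sd() computes.

```python
import csv
import io
import statistics
from collections import Counter


def read_stem_lengths(handle):
    """Read stem lengths (4th column) from tab-delimited rows:
    species, region of Xist, run number, stem length."""
    return [int(row[3]) for row in csv.reader(handle, delimiter="\t") if row]


def summarize(lengths):
    """Range, median, mean, sample SD, and a frequency table of lengths."""
    return {
        "range": (min(lengths), max(lengths)),
        "median": statistics.median(lengths),
        "mean": statistics.mean(lengths),
        "sd": statistics.stdev(lengths),
        "histogram": dict(sorted(Counter(lengths).items())),
    }


# Invented example rows, for illustration only.
example = io.StringIO(
    "human\tA repeat\t1\t4\n"
    "mouse\tA repeat\t1\t6\n"
    "cow\texon 4\t1\t5\n"
    "mole\texon 4\t1\t5\n"
)
print(summarize(read_stem_lengths(example)))
```

Running the same summary on the original and the randomized tables gives the two sets of values to compare.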
5.13) Tissue Culture

Female and male coast mole fibroblast cell lines were established previously by Sanja Karalic from fresh organs of moles sacrificed in a study by Dr. Kevin Campbell [84]. The female cat (CCL 176), cow (CCL 209), and rabbit (CCL 193), as well as male cow (CCL 207) fibroblast cell lines, were purchased from the American Type Culture Collection (ATCC) (http://www.atcc.org/catalog/cellBiology/cellBiologyIndex.cfm). These were immortal cell lines derived from the embryonic tongue in the cat, a main stem pulmonary artery of a young female cow, a lung biopsy from a rabbit, and the trachea of a male cow. The mouse BMSL2 cell line was derived from F1 generation mouse fibroblasts and contains both inactive and active Xs with distinguishable alleles due to heterozygosity [140]. This cell line was split from a growing flask and propagated in a separate t25 flask (25 cm2).

After thawing the cell lines from liquid nitrogen, the coast mole cells were maintained at 37°C in minimal essential media (alpha MEM, Gibco/BRL) with 15% fetal calf serum (Cansera FCS), containing 1% non-essential amino acids (NEAA), 1% penicillin/streptomycin (P/S), and 1% L-glutamine (all from Gibco/BRL). The cat, cow, and rabbit cells required only 10% FCS, again with NEAA, P/S, and L-glutamine. BMSL2 female mouse cells were maintained in 7.5% FCS MEM without NEAA.

To split growing cells, 90-100% confluent t25 flasks of cell lines were rinsed in 1X phosphate-buffered saline (PBS), trypsinized with 0.25% trypsin-EDTA and then resuspended in 10ml of media, with the exception of the coast mole fibroblasts, which required scraping instead of trypsinizing to lift the adherent cells from the flask surface. 1ml of the coast mole suspensions, or 0.3ml of ATCC cell suspensions, were then transferred to new flasks containing appropriate media to seed new colonies.
To prepare DNA or RNA, the cell suspensions were centrifuged at room temperature. After removing the supernatant, the cell pellets were stored at -70°C until further use, or used immediately for RNA and DNA extraction. Freezing cell lines for long-term storage required similar steps to making cell pellets, except that after centrifuging and removing the supernatant, the pellet was resuspended in 15% FCS RPMI containing 10% dimethyl sulfoxide (DMSO) and then transferred to a cryovial (1ml of solution per 50% confluent t25 flask). The cryovial remained in an isopropanol-filled container to cool slowly at -70°C for at least 4 hours before being placed in liquid nitrogen for long-term storage.

5.14) RNA Extraction

The acid-guanidinium-phenol-chloroform RNA extraction was performed as described initially by Chomczynski and Sacchi in 1987 [141]. Solution D containing guanidinium thiocyanate was added (0.6ml per confluent 60mm dish) to the cell pellet, followed by vortexing to disrupt the cell membranes. Guanidinium and water complex with RNA to prevent hydrophilic interactions with DNA and proteins, such that RNA remains in the aqueous layer after phenol-chloroform extraction. Following Solution D, an equal volume of phenol saturated with diethyl pyrocarbonate (DEPC)-treated water (DEPC inactivates RNases; phenol acts as a deproteinating agent) and 0.2M sodium acetate (pH 4) were added. The sample was vortexed for thorough mixing. 0.4 volumes of chloroform were added and the preparation was placed on ice for 5-15 min to remove any traces of phenol, followed by centrifugation (10 min, 13,500 rpm). The top layer containing RNA was placed in a new 1.5ml Eppendorf tube, while the lower layer containing DNA and proteins was discarded. One volume of isopropanol was added to the RNA solution and the sample was stored overnight at -20°C, where the sodium acetate salt helped to precipitate the RNA in the alcohol.
The next morning, the sample was centrifuged for 10 min at 13,500 rpm to pellet the RNA, which was briefly rinsed with 70% ethanol to remove excess traces of salt and then centrifuged again to re-pellet the RNA. After removing the alcohol supernatant, the pellet was air-dried, resuspended in 50µl DEPC-water and stored at -20°C.

5.15) Reverse Transcription

Before carrying out a reverse transcription (RT) reaction on the RNA sample to make cDNA, the RNA sample was DNase-treated to remove trace amounts of DNA. For this, 1/20th volume of porcine RNase inhibitor (RNasin, Amersham Pharmacia Biotech) and 1/10th volume of RNase-free DNase were combined with the RNA in DEPC-ddH2O and incubated for one hour at 37°C. A phenol-chloroform extraction was performed to remove proteins, salts, and buffer from the DNA-free RNA. An equal volume of 1:1 phenol:chloroform solution was added, followed by vortexing and centrifuging for 10 minutes at 12,500 rpm at 4°C. The upper aqueous layer was transferred to a new tube, after which 1 volume of chloroform was added and the sample was vortexed and centrifuged. To precipitate the RNA, 0.2M sodium acetate and 1 volume of isopropanol were added, as above, and the sample was left overnight at -20°C. In the morning the RNA was spun for 10-15 min in a 4°C centrifuge, the supernatant was removed, and the pellet was air-dried and resuspended in 50µl DEPC-water.

The actual RT reaction combined 5µg of RNA, 1X first-strand buffer (Gibco/BRL), 0.01M dithiothreitol (DTT) (Invitrogen), 0.0625mM dNTPs (Gibco/BRL), 1µl random hexamers, 2µl (1U) RNasin (Invitrogen), and 1µl (1U) M-MLV (Moloney murine leukemia virus) reverse transcriptase, brought up to a total volume of 20µl with DEPC-water. The reaction mix sat for 5 min at room temperature, followed by incubation for 2 hours at 42°C.
The mix was subsequently incubated at 95°C for 5 minutes and the cDNA was stored at -20°C until further use.

5.16) DNA Extraction

For cell pellets from a confluent t25 flask, 1 ml of Tris-EDTA (TE) buffer (pH 7.5-8), 1/20th volume of 20% sodium dodecyl sulfate (SDS) and 1µl of proteinase K were added. The sample was incubated at room temperature overnight. SDS detergent works by rupturing the cell membranes to expose nucleic acids, while proteinase K digests a wide array of proteins. The next morning, 0.5M NaCl was added, followed by incubation for 2 hours at 37°C until the sample was in solution. After adding 1.7M NaCl, the sample was shaken vigorously and centrifuged at room temperature for 15 min at 2,500 rpm. The supernatant was transferred to a new tube and 1/30 volume of 20% SDS as well as 1.7M NaCl were added. The mixture was again shaken vigorously and spun for 15 min at 2,500 rpm. Finally, the supernatant was transferred to a new vial and 2 volumes of ethanol were added to precipitate the DNA. The DNA was then recovered, suspended in TE, and stored at 55°C for two hours to dissolve. Alternatively, when little DNA was retrieved, the DNA in ethanol was stored at 4°C overnight for precipitation. The sample was centrifuged the following day to retrieve a pellet that was resuspended in TE after air-drying.

5.17) UCSC - Degenerate Primers for CpG Islands

Primers for methylation analysis were devised from sequence information supplied by the UCSC (Santa Cruz) genome browser (http://genome.ucsc.edu/) using the human genome May 2004 assembly. Regions with 40-60% GC content located within 2 kb of the X-linked genes were considered CpG islands. In the viewing options, conservation plots were displayed in order to visualize the multiple sequence alignment made available for each island.
Degenerate primers were designed from regions of high conservation within the X-linked CpG islands of interest to give PCR product sizes of 100-700 bp. Often, the parts of the CpG islands that overlapped with exonic regions of the nearby genes were the most conserved and useful.

5.18) Methylation Analysis

Methylation analysis was performed on cow, mole, human, and mouse genomic DNA to assess the expression status of X-linked genes. To first digest the DNA into smaller pieces, 4µg of genomic DNA was combined with 1/10 volume of the appropriate restriction enzyme buffer and 10U EcoRI, and brought up to a total volume of 40µl. The solution was subjected to RNase treatment and phenol-chloroform extraction (described above). To evaluate the presence or absence of methylation marks at CpG islands 5' of Zfx, Crsp2, Utx, Ube1, Jarid1C, Fmr1, and Ar, the restriction enzyme HpaII was used. This enzyme cuts only unmethylated CCGG recognition sites and leaves methylated recognition sites intact. Subsequent PCR amplification of HpaII-digested samples using primers to the CpG island of interest therefore exhibited bands on an agarose gel only when the locus was methylated (reflective of inactive status).

To make HpaII digests, 2µg of the EcoRI digest was added to a mixture of 1/10 volume HpaII restriction buffer and 10U HpaII enzyme (methylation sensitive, CCGG) and brought up to 20µl with water, for a final genomic DNA concentration of 100ng/µl. From the same EcoRI digest, a similar digest was done with MspI, which served as a cutting control. MspI recognizes the same site (CCGG) as HpaII but cuts whether the site is methylated or not. A third digestion was also performed from the initial EcoRI digest. This was a mock digest in which no restriction enzyme was added (glycerol/buffer solution instead), but the sample contained the same buffer used for HpaII. All three types of digests were incubated at 37°C for optimal cutting.
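The way the three digests combine into a call for one CpG island can be written as a small decision rule. This is a hedged sketch rather than the procedure actually used to score the gels: the function name is hypothetical, and it assumes that the amplicon contains at least one CCGG site, that a band from the MspI digest therefore means incomplete cutting, and that a missing band in the mock digest means the DNA or PCR failed.

```python
def interpret_locus(hpaii_band, mspi_band, mock_band):
    """Classify one CpG island from presence (True) or absence (False)
    of a PCR band after HpaII, MspI, and mock digestion."""
    if not mock_band:
        # No product even from undigested DNA: template or PCR problem.
        return "uninformative: no product from mock digest"
    if mspi_band:
        # MspI cuts CCGG regardless of methylation, so a band here is
        # taken (by assumption) to mean the cutting control failed.
        return "uninformative: MspI control not cut"
    # HpaII cuts only unmethylated CCGG, so a surviving band means the
    # site was protected by methylation.
    return "methylated" if hpaii_band else "unmethylated"


print(interpret_locus(hpaii_band=True, mspi_band=False, mock_band=True))
print(interpret_locus(hpaii_band=False, mspi_band=False, mock_band=True))
```

The first call returns "methylated" and the second "unmethylated"; for most X-linked genes these correspond to inactive and active status, respectively.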
PCRs of the digests were performed using degenerate primers designed for the X-linked CpG islands of interest. Absence of bands from HpaII digests signified an active status at the locus (Figure 5.1). The mock digest served as an internal control for the presence of DNA, tolerable incubation conditions, and successful PCR conditions.

Figure 5.1. Methylation Analysis. Genomic DNA samples from male and female eutherians were digested with HpaII, which cuts at unmethylated CpG recognition sites. PCR amplification of the region of interest reveals a band only if the locus was methylated, indicative of inactive status for most genes.

References

1. Schartl, M., Sex chromosome evolution in non-mammalian vertebrates. Curr Opin Genet Dev, 2004. 14(6): p. 634-41.
2. Graves, J.A., Mammals that break the rules: genetics of marsupials and monotremes. Annu Rev Genet, 1996. 30: p. 233-60.
3. Margulies, E.H., Comparative sequencing provides insights about the structure and conservation of marsupial and monotreme genomes. PNAS, 2005. 102(9): p. 3354-3351.
4. Murphy, W.J., et al., Resolution of the early placental mammal radiation using Bayesian phylogenetics. Science, 2001. 294(5550): p. 2348-51.
5. Springer, M., et al., Mitochondrial versus nuclear gene sequences in deep-level mammalian phylogeny reconstruction. Mol Biol Evol., 2001. 18(2): p. 132-4.
6. Delsuc, F., H. Brinkmann, and H. Philippe, Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet., 2005. 6(5): p. 361-75.
7. Thomas, J.W., et al., Comparative analyses of multi-species sequences from targeted genomic regions. Nature, 2003. 424(6950): p. 788-93.
8. Zarkower, D., Establishing sexual dimorphism: conservation amidst diversity? Nat Rev Genet, 2001. 2(3): p. 175-85.
9.
Hodgkin, J., Sex determination and dosage compensation in Caenorhabditis elegans. Ann. Rev. Genet, 1987. 21: p. 133-154.
10. Csankovszki, G., P. McDonel, and B.J. Meyer, Recruitment and spreading of the C. elegans dosage compensation complex along X chromosomes. Science, 2004. 303(5661): p. 1182-5.
11. Meller, V.H., et al., Ordered assembly of roX RNAs into MSL complexes on the dosage-compensated X chromosome in Drosophila. Curr Biol, 2000. 10(3): p. 136-43.
12. Graves, J.A., Sex and death in birds: A model of dosage compensation that predicts lethality of sex chromosome aneuploids. Cytogenet Genome Res., 2003. 101(3-4): p. 278-282.
13. Kuroda, Y., et al., Absence of Z-chromosome inactivation for five genes in male chickens. Chromosome Res., 2001. 9(6): p. 457-68.
14. Kuroiwa, A., et al., Biallelic expression of Z-linked genes in male chickens. Cytogenet. Genome Res., 2002. 99: p. 310-314.
15. Bisoni, L., et al., Female-specific hyperacetylation of histone H4 in the chicken Z chromosome. Chromosome Res., 2005. 13(2): p. 205-14.
16. Richardson, B.J., A.B. Czuppon, and G.B. Sharman, Inheritance of glucose-6-phosphate dehydrogenase variation in kangaroos. Nature New Biol., 1971. 230: p. 154-155.
17. Migeon, B.R., S. Jan de Beur, and J. Axelman, Frequent derepression of G6PD and HPRT on the marsupial inactive X chromosome associated with cell proliferation in vitro. Exp. Cell Res., 1989. 182: p. 597-609.
18. Graves, J.A. and M. Westerman, Marsupial genetics and genomics. Trends Genet, 2002. 18(10): p. 517-21.
19. Brown, C.J., et al., The human XIST gene: analysis of a 17 kb inactive X-specific RNA that contains conserved repeats and is highly localized within the nucleus. Cell, 1992. 71: p. 527-542.
20. Hendrich, B.D., C.J. Brown, and H.F. Willard, Evolutionary conservation of possible functional domains of the human and murine XIST genes. Hum. Mol.
Genet., 1993. 2(6): p. 663-672.
21. Okamoto, I., et al., Epigenetic dynamics of imprinted X inactivation during early mouse development. Science, 2004. 303(5658): p. 644-9.
22. Huynh, K.D. and J.T. Lee, Inheritance of a pre-inactivated paternal X chromosome in early mouse embryos. Nature, 2003. 426(6968): p. 857-62.
23. Hoyer-Fender, S., C. Costanzi, and J. Pehrson, Histone macroH2A1.2 is concentrated in the XY-body by the early pachytene stage of spermatogenesis. Exp. Cell Res., 2000. 258: p. 254-260.
24. Richler, C., S.K. Dhara, and J. Wahrman, Histone macroH2A1.2 is concentrated in the XY compartment of mammalian male meiotic nuclei. Cytogenet. Cell Genet., 2000. 89: p. 118-120.
25. Cheng, M.K. and C.M. Disteche, Silence of the fathers: early X inactivation. 2004. 26: p. 821-824.
26. Marahrens, Y., et al., Xist-deficient mice are defective in dosage compensation but not spermatogenesis. Genes & Dev., 1997. 11: p. 156-166.
27. McCarrey, J.R., et al., X-chromosome inactivation during spermatogenesis is regulated by an Xist/Tsix-independent mechanism in the mouse. Genesis, 2002. 34(4): p. 257-66.
28. Armstrong, S.J., et al., Different strategies of X-inactivation in germinal and somatic cells: histone H4 underacetylation does not mark the inactive X chromosome in mouse male germline. Exp. Cell Res., 1997. 230: p. 399-402.
29. Fernandez-Capetillo, O., et al., H2AX is required for chromatin remodeling and inactivation of sex chromosomes in male mouse meiosis. Dev Cell, 2003. 4(4): p. 497-508.
30. Turner, J.M., et al., Meiotic sex chromosome inactivation in male mice with targeted disruptions of Xist. J Cell Sci, 2002. 115(Pt 21): p. 4097-105.
31. Rougeulle, C., et al., Differential histone H3 Lys-9 and Lys-27 methylation profiles on the X chromosome. Mol Cell Biol, 2004. 24(12): p. 5475-84.
32. Kohlmaier, A.
, et al., A chromosomal memory triggered by Xist regulates histone methylation in X inactivation. PLoS Biol, 2004. 2(7): p. E171.
33. Latham, K.E., X chromosome imprinting and inactivation in preimplantation mammalian embryos. Trends Genet., 2005. 21(2): p. 120-7.
34. Fang, J., et al., Ring1b-mediated H2A ubiquitination associates with inactive X chromosomes and is involved in initiation of X-inactivation. J Biol Chem, 2004.
35. Heard, E., et al., Methylation of histone H3 at Lys-9 is an early mark on the X chromosome during X inactivation. Cell, 2001. 107: p. 727-738.
36. Plath, K., et al., Role of histone H3 lysine 27 methylation in X inactivation. Science, 2003. 300(5616): p. 131-5.
37. Plath, K., et al., Developmentally regulated alterations in Polycomb repressive complex 1 proteins on the inactive X chromosome. J Cell Biol, 2004. 167(6): p. 1025-35.
38. Silva, J., et al., Establishment of histone H3 methylation on the inactive X chromosome requires transient recruitment of Eed-Enx1 polycomb group complexes. Dev Cell, 2003. 4(4): p. 481-95.
39. Csankovszki, G., A. Nagy, and R. Jaenisch, Synergism of Xist RNA, DNA methylation, and histone hypoacetylation in maintaining X chromosome inactivation. J. Cell Biol., 2001. 153: p. 773-783.
40. Brown, C.J. and H.F. Willard, The human X inactivation center is not required for maintenance of X inactivation. Nature, 1994. 368: p. 154-156.
41. Wutz, A. and R. Jaenisch, A shift from reversible to irreversible X inactivation is triggered during ES cell differentiation. Mol. Cell, 2000. 5: p. 695-705.
42. Lyon, M.F., The Lyon and the LINE hypothesis. Seminars in Cell & Developmental Biology, 2003. 14: p. 313-318.
43. Brown, C.J. and J.M. Greally, A stain upon the silence: genes escaping X inactivation. Trends Genet, 2003. 19(8): p. 432-8.
44. Allen, E.
, et al., High concentrations of long interspersed nuclear element sequence distinguish monoallelically expressed genes. Proc Natl Acad Sci USA, 2003. 100(17): p. 9940-5.
45. Bailey, J.A., et al., Molecular evidence for a relationship between LINE-1 elements and X chromosome inactivation: the Lyon repeat hypothesis. Proc. Natl. Acad. Sci., USA, 2000. 97: p. 6634-6639.
46. Waters, P.D., et al., LINE-1 distribution in Afrotheria and Xenarthra: implications for understanding the evolution of LINE-1 in eutherian genomes. Chromosoma, 2004. 113: p. 137-144.
47. Hansen, R.S., X inactivation-specific methylation of LINE-1 elements by DNMT3B: implications for the Lyon repeat hypothesis. Hum Mol Genet, 2003. 12(19): p. 2559-67.
48. Plath, K., et al., Xist RNA and the mechanism of X chromosome inactivation. Annu Rev Genet, 2002. 36: p. 233-78.
49. Clerc, P. and P. Avner, Role of the region 3' to Xist exon 6 in the counting process of X chromosome inactivation. Nat. Genet., 1998. 19: p. 249-253.
50. Morey, C., et al., The region 3' to Xist mediates X chromosome counting and H3 Lys-4 dimethylation within the Xist gene. Embo J, 2004. 23(3): p. 594-604.
51. Cattanach, B. and C. Rasberry, Identification of the Mus spretus Xce allele. Mouse Genome, 1991. 89: p. 565.
52. Simmler, M.C., et al., Mapping the murine Xce locus with (CA)n repeats. Mammalian Genome, 1993. 4: p. 523-530.
53. Shibata, S. and J.T. Lee, Tsix transcription versus RNA-based mechanisms in Xist repression and epigenetic choice. Curr Biol, 2004. 14(19): p. 1747-54.
54. Lee, J.T. and N. Lu, Targeted mutagenesis of Tsix leads to nonrandom X inactivation. Cell, 1999. 99: p. 47-57.
55. Lee, J.T., Disruption of imprinted X inactivation by parent-of-origin effects at Tsix. Cell, 2000. 103: p. 17-27.
56. Sado, T., et al., Regulation of imprinted X-chromosome inactivation in mice by Tsix. Development, 2001. 128: p. 1275-1286.
57. Ogawa, Y. and J.T.
Lee, Xite, X-inactivation intergenic transcription elements that regulate the probability of choice. Mol. Cell, 2003. 11: p. 731-743.
58. White, W.M., et al., The spreading of X inactivation into autosomal material of an X;autosome translocation: evidence for a difference between autosomal and X-chromosomal DNA. Am. J. Hum. Genet., 1998. 63: p. 20-28.
59. Sharp, A.J., et al., Molecular and cytogenetic analysis of the spreading of X inactivation in X;autosome translocations. Hum. Mol. Genet., 2002. 11: p. 3145-3156.
60. Chureau, C., et al., Comparative sequence analysis of the X-inactivation center region in mouse, human and bovine. Genome Res., 2002. 12: p. 894-908.
61. Nesterova, T.B., et al., Comparative mapping of X chromosomes in vole species of the genus Microtus. Chromosome Research, 1998: p. 41-48.
62. Brockdorff, N., et al., The product of the mouse Xist gene is a 15kb inactive X-specific transcript containing no conserved ORF and located in the nucleus. Cell, 1992. 71: p. 515-526.
63. Nesterova, T.B., et al., Characterization of the Genomic Xist Locus in Rodents Reveals Conservation of Overall Gene Structure and Tandem Repeats but Rapid Evolution of Unique Sequence. Genome Research, 2001. 11(5): p. 833-849.
64. Wutz, A., T.P. Rasmussen, and R. Jaenisch, Chromosomal silencing and localization are mediated by different domains of Xist RNA. Nat Genet, 2002. 30(2): p. 167-74.
65. Ganesan, S., et al., BRCA1 supports XIST RNA concentration on the inactive X chromosome. Cell, 2002. 111(3): p. 393-405.
66. Mak, W., et al., Mitotically stable association of polycomb group proteins eed and enx1 with the inactive X chromosome in trophoblast stem cells. Curr Biol, 2002. 12(12): p. 1016-20.
67. de Napoles, M., et al., Polycomb Group Proteins Ring1A/B Link Ubiquitylation of Histone H2A to Heritable Gene Silencing and X Inactivation. Dev Cell, 2004. 7(5): p. 663-76.
68. Fackelmayer, F.O., A stable proteinaceous structure in the territory of inactive X chromosomes. J Biol Chem., 2005. 280(3): p. 1720-3.
69. Caparros, M.L., et al., Functional analysis of the highly conserved exon IV of Xist RNA. Cytogenet Genome Res, 2002. 99: p. 99-105.
70. Migeon, B.R., et al., Identification of TSIX, encoding an RNA antisense to human XIST, reveals differences from its murine counterpart: implications for X inactivation. Am. J. Hum. Genet, 2001. 69: p. 951-960.
71. Shibata, S. and J.T. Lee, Characterization and quantitation of differential Tsix transcripts: implications for Tsix function. Hum. Mol. Genet, 2003. 12: p. 125-136.
72. Xue, F., et al., Aberrant patterns of X chromosome inactivation in bovine clones. Nat Genet, 2002. 31(2): p. 216-20.
73. Brown, C.J. and J.C. Chow, Beyond sense: the role of antisense RNA in controlling Xist expression. Semin Cell Dev Biol, 2003. 14(6): p. 341-7.
74. Farazmand, A., et al., Expression of Xist sense and antisense in bovine fetal organs and cell cultures. Chromosome Res, 2004. 12(3): p. 275-83.
75. Tinker, A.V. and C.J. Brown, Induction of XIST expression from the human active X chromosome in mouse/human somatic cell hybrids by DNA demethylation. Nucl. Acids Res., 1998. 26: p. 2935-2940.
76. Heard, E., et al., Human XIST yeast artificial chromosome transgenes show partial X inactivation center function in mouse embryonic stem cells. Proc. Natl. Acad. Sci., USA, 1999. 96: p. 6841-6846.
77. Schwartz, S., et al., PipMaker - a web server for aligning two genomic DNA sequences. Genome Res., 2000. 10: p. 577-586.
78. Eddy, S.R., Computational genomics of noncoding RNA genes. Cell, 2002. 109(2): p. 137-40.
79. Hofacker, I., Vienna RNA secondary structure server. Nucleic Acids Res, 2003. 31(13): p. 3429-3431.
80. Zuker, M., Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res., 2003. 31(13): p. 3406-15.
81. Touzet, H., CARNAC: folding families of related RNAs. Nucleic Acids Res, 2004. 32: p. 142-145.
82. Perriquet, O., Find the common structure shared by 2 homologous RNAs. Bioinformatics, 2003. 19(1): p. 108-116.
83. Griffiths-Jones, S., et al., Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res, 2005. 33: p. 121-124.
84. Karalic, S., Studies of X-chromosome inactivation and the identification of the Xist gene in the insectivore Scapanus orarius, in Medical Genetics. 2001, UBC: Vancouver, p. 145.
85. Washietl, S. and I.L. Hofacker, Consensus folding of aligned sequences as a new measure for the detection of functional RNAs by comparative genomics. J. Mol. Biol., 2004. 342(1): p. 19-30.
86. Ohhata, T., et al., X-inactivation is stably maintained in mouse embryos deficient for histone methyl transferase G9a. Genesis, 2004. 40(3): p. 151-6.
87. Beletskii, A., et al., PNA interference mapping demonstrates functional domains in the noncoding RNA Xist. Proc. Natl. Acad. Sci., USA, 2001. 98: p. 9215-9220.
88. Rasmussen, T.P., et al., Expression of Xist RNA is sufficient to initiate macrochromatin body formation. Chromosoma, 2001. 110: p. 411-420.
89. Ayling, L.J. and D.K. Griffin, The evolution of sex chromosomes. Cytogenet Genome Res, 2002. 99(1-4): p. 125-40.
90. Kohn, M., et al., Wide genome comparisons reveal the origins of the human X chromosome. Trends Genet, 2004. 20(12): p. 598-603.
91. Nanda, I., et al., 300 million years of conserved synteny between chicken Z and human chromosome 9. Nat Genet, 1999. 21(3): p. 258-9.
92. Nanda, I., et al., Conserved synteny between the chicken Z sex chromosome and human chromosome 9 includes the male regulatory gene DMRT1: a comparative re(view) on avian sex determination. Cytogenetics and Cell Genetics, 2000. 89: p. 67-78.
93. Pask, A. and J.M. Graves, Sex chromosomes and sex-determining genes: insights from marsupials and monotremes. Cellular and Molecular Life Sciences, 1999. 55: p. 864-875.
94. Lahn, B.T., N.M. Pearson, and K. Jegalian, The human Y chromosome, in the light of evolution. Nat Rev Genet, 2001. 2(3): p. 207-16.
95. Lahn, B.T. and D.C. Page, Four evolutionary strata on the human X chromosome. Science, 1999. 286: p. 964-967.
96. Wilcox, S.A., et al., Comparative mapping identifies the fusion point of an ancient mammalian X-autosomal rearrangement. Genomics, 1996. 35(1): p. 66-70.
97. Graves, J.A., M.J. Wakefield, and R. Toder, The origin and evolution of the pseudoautosomal regions of human sex chromosomes. Hum Mol Genet, 1998. 7(13): p. 1991-6.
98. Carrel, L., et al., A first-generation X-inactivation profile of the human X chromosome. Proc. Natl. Acad. Sci., USA, 1999. 96: p. 14440-14444.
99. Carrel, L. and H.F. Willard, X-inactivation profile reveals extensive variability in X-linked gene expression in females. Nature, 2005. 434(17): p. 400-404.
100. Jegalian, K. and D.C. Page, A proposed path by which genes common to mammalian X and Y chromosomes evolve to become X inactivated. Nature, 1998. 394: p. 776-780.
101. Craig, I.W., et al., Application of microarrays to the analysis of the inactivation status of human X-linked genes expressed in lymphocytes. Eur J Hum Genet, 2004. 12(8): p. 639-46.
102. Coates, J.W., S.M. Schmutz, and C.G. Rousseaux, A survey of malformed aborted bovine fetuses, stillbirths, and non-viable neonates for abnormal karyotypes. Can. J. Res., 1988. 52: p. 258-63.
103. Tsuchiya, K.D. and H.F. Willard, Chromosomal domains and escape from X inactivation: comparative X inactivation analysis in mouse and human. Mamm. Genome, 2000. 11: p. 849-854.
104. Luoh, S.W., et al., CpG islands in human ZFX and Zfy and mouse Zfx genes: sequence similarities and methylation differences. Genomics, 1995. 29: p. 353-363.
105. Ross, M.T., et al., The DNA sequence of the human X chromosome. Nature, 2005. 434(7031): p. 325-37.
106. Tsuchiya, K.D., et al., Comparative sequence and X-inactivation analyses of a domain of escape in human Xp11.2 and the conserved segment in mouse. Genome Res, 2004. 14(7): p. 1275-84.
107. Filippova, G.N., et al., Boundaries between chromosomal domains of X inactivation and escape bind CTCF and lack CpG methylation during early development. Dev. Cell, 2005. 8: p. 31-42.
108. Carrel, L., P.A. Hunt, and H.F. Willard, Tissue and lineage-specific variation in inactive X chromosome expression of the murine Smcx gene. Hum. Mol. Genet., 1996. 5: p. 1361-1366.
109. Sheardown, S., et al., The mouse Smcx gene exhibits developmental and tissue specific variation in degree of escape from X inactivation. Hum. Mol. Genet., 1996. 5: p. 1355-1360.
110. Hansen, R.S., et al., Escape from gene silencing in ICF syndrome: evidence for advanced replication time as a major determinant. Hum Mol Genet, 2000. 9(18): p. 2575-87.
111. Anderson, C.L., and Brown, C.J., Epigenetic predisposition to expression of TIMP1 from the human inactive X chromosome. 2005, Submitted.
112. Rougeulle, C., P. Navarro, and P. Avner, Promoter-restricted H3 Lys 4 di-methylation is an epigenetic mark for monoallelic expression. Hum Mol Genet, 2003. 12(24): p. 3343-8.
113. Migeon, B.R., X-chromosome inactivation: theme and variations. Cytogenet. Genome Res., 2002. 99: p. 8-16.
114. Lingenfelter, P.A., et al., Escape from X inactivation of Smcx is preceded by silencing during mouse development. Nat. Genet., 1998. 18: p. 212-213.
115. Ohta, T., An examination of the generation-time effect on molecular evolution. PNAS, 1993. 90(22): p. 10676-80.
116. Douzery, E., J.D. Lebreton, and F.M. Catzeflis, Testing the generation time hypothesis using DNA/DNA hybridization between artiodactyls. J. evol. Biol., 1995. 8: p. 511-29.
117. Kumar, S. and S. Subramanian, Mutation rates in mammalian genomes. Proc Natl Acad Sci U S A, 2002. 99(2): p. 803-8.
118. Jorgensen, F.G., et al., Comparative analysis of protein coding sequences from human, mouse and the domesticated pig. BMC Biol., 2005. 3(1): p. 2-17.
119. Duthie, S.M., et al., Xist RNA exhibits a banded localization on the inactive X chromosome and is excluded from autosomal material in cis. Hum. Mol. Genet., 1999. 8: p. 195-204.
120. Dobigny, G., et al., Viability of X-autosome translocations in mammals: an epigenomic hypothesis from a rodent case-study. Chromosoma, 2004. 113(1): p. 34-41.
121. Iannuzzi, L., et al., Comparative FISH mapping of bovid X chromosomes reveals homologies and divergences between the subfamilies bovinae and caprinae. Cytogenet Cell Genet, 2000. 89(3-4): p. 171-6.
122. Sharp, A., D.O. Robinson, and P.A. Jacobs, Absence of correlation between late-replication and spreading of X inactivation in an X;autosome translocation. Hum. Genet, 2001. 109: p. 295-302.
123. Chowdhary, B.P., et al., Emerging patterns of comparative genome organization in some mammalian species as revealed by Zoo-FISH. Genome Res., 1998. 8(6): p. 577-89.
124. Kuroiwa, A., et al., Comparative FISH mapping of mouse and rat homologues of twenty-five human X-linked genes. Cytogenet Cell Genet., 1998. 81(3-4): p. 208-12.
125. Robinson, T.J., et al., A molecular cytogenetic analysis of X chromosome repatterning in the Bovidae: transpositions, inversions, and phylogenetic inference. Cytogenet Cell Genet, 1998. 80(1-4): p. 179-84.
126. Piumi, F., et al., Comparative cytogenetic mapping reveals chromosome rearrangements between the X chromosomes of two closely related mammalian species (cattle and goats). Cytogenet Cell Genet, 1998. 81(1): p. 36-41.
127. Raudsepp, T., et al., Exceptional conservation of horse-human gene order on X chromosome revealed by high-resolution radiation hybrid mapping. Proc Natl Acad Sci, 2004. 101(8): p. 2386-91.
128. Itoh, T., et al., A comprehensive radiation hybrid map of the bovine genome comprising 5593 loci. Genomics, 2005. 85(4): p. 413-24.
129. Dixkens, C., et al., ZOO-FISH analysis in insectivores: "Evolution extols the virtue of the status quo". Cytogenet Cell Genet., 1998. 80(1-4): p. 61-67.
130. Svartman, M., et al., A chromosome painting test of the basal eutherian karyotype. Chromosome Res., 2004. 12(1): p. 45-53.
131. Gorman, M.L. and R.D. Stone, The natural history of moles. 1990, New York: Comstock Publishing, Cornell University Press.
132. Yates, T.L., A.D. Stock, and D.J. Schmidly, Chromosome banding patterns and the nucleolar organizer region of the Eastern Mole (Scalopus aquaticus). Experientia, 1976 11(10): p. 1276-77.
133. Yates, T.L. and D.J. Schmidly, Karyotype of the eastern mole (Scalopus aquaticus), with comments on the karyology of the family Talpidae. Journal of Mammalogy, 1976. 56(4): p. 902-05.
134. Poloumienko, A., Cloning and comparative analysis of the bovine, porcine, and equine sex chromosome genes ZFX and ZFY. Genome, 2004. 47(1): p. 74-83.
135. Ryu, S., et al., The transcriptional cofactor complex CRSP is required for activity of the enhancer-binding protein Sp1. Nature, 1999. 397(6718): p. 446-50.
136. Mitchell, M.J., et al., The origin and loss of the ubiquitin activating enzyme gene on the mammalian Y chromosome. Hum Mol Genet, 1998. 7(3): p. 429-34.
137. Xu, J., P.S. Burgoyne, and A.P. Arnold, Sex differences in sex chromosome gene expression in mouse brain. Hum Mol Genet, 2002. 11(12): p. 1409-19.
138. Jensen, L., et al., Mutations in the JARID1C gene, which is involved in transcriptional regulation and chromatin remodeling, cause X-linked mental retardation. Am J Hum Genet, 2005. 76(2): p. 227-36.
139. Basrur, P.K., et al., Expression pattern of X-linked genes in sex chromosome aneuploid bovine cells. Chromosome Res, 2004. 12(3): p. 263-73.
140. Komura, J., et al., In vivo ultraviolet and dimethyl sulfate footprinting of the 5' region of the expressed and silent Xist alleles. Journal of Biological Chemistry, 1996. 272(16): p. 10975-10980.
141. Chomczynski, P., and Sacchi, N., Single-step method of RNA isolation by acid guanidinium thiocyanate-phenol-chloroform extraction. Anal. Biochem., 1987. 162: p. 156-159.

Appendix

Figure A.1. Coast Mole Extended Exon 4 in Xist. A dotplot of coast mole Xist cDNA (X axis) against cow Xist genomic sequence (Y axis) is depicted. The red arrow indicates the region of the mole sequence that aligns to intronic sequence in cow, between cow exon 4 and exon 5, representing an extended coast mole exon 4. Folding this elongated mole exon with Mfold generates a stable hairpin structure (B) that is conserved in other eutherians.

Figure A.2. Overview of Percent Identity Plots of Eutherian Xist Sequences. [Panels, with the end of each x axis: Mouse Xist genomic DNA (21530); Mouse Xist cDNA (11646); Human Xist genomic DNA (32079); Human Xist cDNA (14192). Each panel has one row per aligned species.] Regions of >50% sequence similarity to either the human or mouse XIST/Xist sequence (labeled at the top) are shown in green, while regions of >75% sequence similarity are shown in pink. The repeat regions of the indicated sequence (labeled at the top) are shown in red for A repeat, grey for F repeat, green for B repeat, blue for C repeat, yellow for D repeat, and purple for E repeat. With the exception of mole, genomic Xist sequences from eutherians were aligned to human or mouse genomic XIST/Xist sequences; eutherian Xist cDNA sequences were aligned to human or mouse XIST/Xist cDNA using MultiPipMaker [77].

Figure A.3. Percent Identity Plots of Human and Mouse XIST/Xist Genomic Sequences. The mouse Xist genomic sequence is shown at the top, with exons labeled 1-7 in black boxes. The human XIST genomic sequence is shown at the bottom, with exons labeled 1-6 in black boxes. Percent identities of the six eutherian Xist orthologs relative to the mouse and human sequences, as generated by MultiPipMaker [77], are shown.
Figure A.4. Multiple Alignments of Xist Repeats A and F. ClustalW alignments of A) repeat A and B) repeat F. A and F consensus sequences are boxed in red. Nucleotides are colored in green (T), orange (G), blue (A), and pink (C).

Figure A.5. RNAalifold of the Xist 5' end. A) Consensus structure of the first kilobase of Xist exon. This sequence includes the beginning of Xist until after the A repeat. B) Consensus structure of +500 bp to 1.5 kb of Xist. This region includes parts of the A repeat and extends to include the F repeat. Since this region overlaps with that of (A), the analogous structures found in each structure are boxed in green.
Figure A.6. Partial Conservation of Exon 2 in Rodents. CARNAC gave common structures depicted in (A) and (B) in vole and rat only, while RNAalifold generated the consensus structure (C) when inputting rat, mouse, and vole exon 2 sequences from the alignment shown in (D).

Figure A.7. Rodents Xist Exon 6 Partially Conserved CARNAC and Consensus Structures. A) CARNAC structures conserved in only mouse and vole when nonrodents were excluded. B) ClustalW alignment; only part of the sequences is shown. C) RNAalifold consensus structure of rodent exon 6.
Figure A.8. Xist Exon 5 CARNAC Results Without Rodents. Common stems and their corresponding visual structures are shown on the left (A) and right (B), respectively, when rodents were excluded from the input. Mole and human show conserved stems, whereas cow and dog do not. C) Mfold structures in each nonrodent eutherian. D) ClustalW alignment of non-rodent exon 5.

Figure A.9. Rodent versus Non-rodent Combined Internal Exon Region. A) CARNAC structures with rodent input only. B) CARNAC structures with non-rodent input only.

Table A.1. Multiple Species Xist Splice Junctions. Potential splice sites in dog, rat, and mole are indicated in comparison to experimentally determined splice junctions in cow. Splice sites of mouse, human, and vole can be found in previous work [19-20, 62-63].
Species | Mole | Rat | Dog | Cow
Accession Number | N/A | NW_048043.1 (2960970-2986883 bp reverse complement, where 1 = 2960970), GI:34881475 | AAEX01057775, GI:50088291 (17368-24421 bp) + AAEX01057774 (1-25440 bp), pieced together; 1 = beginning of pieced sequence | AJ421481, GI:21425595
Sequence Type | cDNA | gDNA | gDNA | gDNA
Size of Xist gDNA/cDNA | 14559 (incomplete) | ~25914/13732 | ~32267/15520 | 43940/21205
Start | 1 | ~4191 | ~1 | 116,080
Exon 1 - 3' SD | 12099 | 13899 gagtacagtaagtac | 13319 taccttggtaagctt | 134,811 tactgtaagtact
Exon 2 - 5' SA | 12100 | 16512 ttttccaggggtgaat | 13927 tttaaagggatgaat | 136216 tcttaaagggatg
Exon 2 - 3' SD | 12176 | 16609 gactgagagtctctgccctt | 14026 tccaaaggtgaatct | 136305 caaaggtgaatctt
Exon 3 - 5' SA | 12177 | 16714 cttcacaggaacaat | 22791 cttctcaaggaaattcc | 137781 ttctcaaggacat
Exon 3 - 3' SD | 12304 | 16853 aaaaaaggtactttg | 22939 aaaagatagtttggg | 137917 aaggtaatgtaag
Exon 4 - 5' SA | 12305 | 17604 tttcccccagagtct | 24983 tcttctccagatgtt | 139377 ttttccagatc
Exon 4 - 3' SD | 12782 | 17807 aaaataggtaagttt | 25723 ttgtcaggtaagact | 139588 ccagaggtgg
Exon 5 - 5' SA | 12783 | Rodent specific exon 5: 18018 gtttttcctaggacaa; Rat exon 6: 18519 tttttttgtagtgccatct | 26088 attttttatagctcct | 142651 ttttgtagctc
Exon 5 - 3' SD | 12918 | Rodent specific exon 5: 18160 tacacaagtgagtag; Rat exon 6: 18675 gactgaggtaagtta | 26219 gaatgaagtaagttg | 142783 gaagtaagt
Exon 6 - 5' SA | 12919 | Rat exon 7: 18930 gccatttttacaggcttaaa | 27707 ctctcctagatctggct | 143828 actgtagttt
Exon 6 - 3' SD | ? | Rat exon 7: 21877 ggcttaggtgagcag; 22682: aggctcagtaagttg | 29272 ctcttgggtgagcgg; 29695 tcaaatggtaaatat | 148720 taaatgggtaagatg

Table A.2. Primers Used for Sequencing Coast Mole Xist. Primers and their sequences, conditions, and product sizes are listed.
Primer Pair | Sequence 5' to 3' | [MgCl2] | Cycling Conditions | # Cycles | Product Size
CMXIST23 / CMXISTR8 | GGT TCT TTC TRG AAC ATT TTC CRG / GAT ACT AGA GTA ACT GCA GCG | 1.5mM | 94-1min 48-1min 72-3min | 35X | Multiple bands
CMXIST14 / CMXIST11 | TTC ATA TGC ACT AAT AAC AAT / AGC ACT GCT CAG AAG CAA TGC | 1.5mM | 94-1min 54-1min 72-2min | 35X | 1 band, 500bp-1kb
CMXIST17 / CMXISTR12 | AGC TCA CTA CCA CTG GGC AAC / AGC TGC TTG CAG TCC TCA TGT | 2.5mM | 94-1min 52-1min 72-3min | 35X | 1 band, 1-1.6kb
CMXIST18 / CMXIST11 | GTA TTG TTG CTG AGG AGT GCT AAC / AGC ACT GCT CAG AAG CAA TGC | 1.5mM | 94-1min 50-1min 72-2min | 35X | 1 band, 500bp-1kb
CMXIST15 / CMXISTREV9 | CAG CAG AGG GTA TTT GGG A / AGT TCA TTC ATT GTT AAC ATG GCC | 1.5mM | 94-2min (94-30s 54-30s 68-8min) 68-7min | 40X | 3-4kb
CMXIST12 / CMXISTR6 | TTC TCA GMA GTK CTG GCA CAT CTG / GAA CAG CAG TTC TTT GTA ATC | 1.5mM | 94-1min 52-1min 72-3min | 35X | 1.5-2kb
CMXISTR5 / CMXIST10 | ACT AGG CAA CAA CTC ACT GC / CAG GTG GAG TTG ATA ACC TGG | 1.5mM | (94-1min 54-1min 72-2min) 72-7min | 30X | 600bp
CMXIST11 / CMXISTR2 | AGC ACT GCT CAG AAG CAA TGC / CCT TGC CTT TCT CAA GAG GAA C | 1.5mM | (94-30s 52-30s 68-8min) 68-7min | 40X | 12kb
CMXIST1 / CMXIST2 | CAT TGC TGA AGT GGC CTG AGG / GTT CCT CTT GAG AAA GGC AAG G | 1.5mM | (94-1min 54-1min 72-2min) 72-7min | 30X | 200bp
CMXISTREV9 / CMXISTREV2 | AGT TCA TTC ATT GTT AAC ATG GCC / CCT TGC CTT TCT CAA GAG GAA C | 1.5mM | 94-2min (94-30s 54-30s 68-8min) 68-7min | 35X | 2 bands, 6kb (faint) or 9kb
CMXISTREV2 / CMXISTREV10 | CCT TGC CTT TCT CAA GAG GAA C / TCC CAA ATA CCC TCT GCT G | 1.5mM | 94-2min (94-30s 54-30s 68-8min) 68-7min | 35X | 5kb
CMXIST21 / CMXIST12 | GCC AAT ATT TAC TTC AAG ATG CC / TTC TCA GMA GTK CTG GCA CAT CTG | Failed to amplify
CMXISTR12 / CMXISTR6 | AGC TGC TTG CAG TCC TCA TGT / GAA CAG CAG TTC TTT GTA ATC | 1.5mM | 94-1min 54-1min or 56-1min 72-2min | 40X | 250bp
CMXISTR6 / CMXIST12 | GAA CAG CAG TTC TTT GTA ATC / TTC TCA GMA GTK CTG GCA CAT CTG | 1.5mM | 94-1min 54-1min 72-3min | 35X | 1.5-2kb
CMXIST21 / E6 | GCC AAT ATT TAC TTC AAG ATG CC / TTG TGT GCT TCA GTG TCT CTG C | 1.5mM | 94-1min 54-1min 72-3min | 35X | 400bp
CMXIST22 / E6 | GCC AAT ATT TAC TTA CTT CAA GAT GCC / TTG TGT GCT TCA GTG TCT CTG C | 1.5mM | 94-1min 54-1min 72-3min | 35X | 400bp
E6R / CMXIST12 | GCA GAG ACA CTG AAG CAC ACA A / TTC TCA GMA GTK CTG GCA CAT CTG | 3.0mM | 94-1min 54-1min 72-3min | 35X | 1605bp
CMXISTR15 / CMXIST25 | GCT TTA GAG GAA AGG GGA GGA CT / GTC TCC CCC TCT TTG TTT CAT AC | Failed to amplify
CMXIST26 / CMXISTR16 | AGT CCT CCC CTT TCC TCT AAA GC / CCA RWG CMR HAR AMA CAC AHT GGCC | 3mM | 94-1min 52-1min 72-3min | 40X | 2 bands, 450bp or 1kb
CMXIST15 / CMXISTR17 | CAG CAG AGG GTA TTT GGG A / AAT GGG AAG GCA AAG ATG GG | 1.5mM | (94-30s 52.6-30s 68-5min) 68-7min | 35X | 3-4kb
CMXISTR17 / CMXIST27 | AAT GGG AAG GCA AAG ATG GG / GAA GGA AAA GTA GGA GGG GTG G | Failed to amplify
CMXISTREV10 / CMXISTREV3 | TCC CAA ATA CCC TCT GCT G / AAG GCC AAT TAA TGA GTT CA | 1.5mM | 94-2min (94-30s 54-30s 68-8min) 68-7min | 35X | 5kb
CMXIST19 / CMXISTREV11 | GAA GAK GGY WCT AAC CTY AAK GTA / CTR GAA AAT GTT CYA GAA AGA ACC | 1.5mM | 94-1min 52-1min 72-2min | 30X | 900bp Mole; 600bp Human, Cow
CMXIST12 / CMXIST13 | TTC TCA GMA GTK CTG GCA CAT CTG / TTC TTT TGA GAT GTM CTT TTT GAT GTT | 1.5mM | 94-1min 52-1min 72-3min (Human) OR 94-1min 49-1min 72-3min (Cow) | 35X | ~400bp Human; ~300bp Cow; Failed in Mole

Table A.3. Pairwise Identities of Exons in Eutherians. Human exon 2 is unique in sequence, as reported previously. The length of each sequence is indicated. The conservation scores between the first sequence (Seq A) and the second sequence (Seq B) are given in the last column. CM = coast mole, h = human, m = mouse, c = cow; "ex" or "e" = exon.

A) Pairwise Identities of Xist Exon 2.
SeqA | Name | Len(nt) | SeqB | Name | Len(nt) | Score
1 | CM | 77 | 2 | vole | 83 | 64
1 | CM | 77 | 3 | hex2 | 64 | 12
1 | CM | 77 | 4 | mex2 | 91 | 48
1 | CM | 77 | 5 | cex2 | 90 | 80
1 | CM | 77 | 6 | dog | 97 | 83
1 | CM | 77 | 7 | rat | 96 | 44
2 | vole | 83 | 3 | hex2 | 64 | 14
2 | vole | 83 | 4 | mex2 | 91 | 80
2 | vole | 83 | 5 | cex2 | 90 | 66
2 | vole | 83 | 6 | dog | 97 | 63
2 | vole | 83 | 7 | rat | 96 | 60
3 | hex2 | 64 | 4 | mex2 | 91 | 21
3 | hex2 | 64 | 5 | cex2 | 90 | 14
3 | hex2 | 64 | 6 | dog | 97 | 15
3 | hex2 | 64 | 7 | rat | 96 | 10
4 | mex2 | 91 | 5 | cex2 | 90 | 70
4 | mex2 | 91 | 6 | dog | 97 | 64
4 | mex2 | 91 | 7 | rat | 96 | 83
5 | cex2 | 90 | 6 | dog | 97 | 85
5 | cex2 | 90 | 7 | rat | 96 | 61
6 | dog | 97 | 7 | rat | 96 | 58

SeqA | Name | Len(nt) | SeqB | Name | Len(nt) | Score
1 | hex3 | 137 | 2 | cex3 | 138 | 78
1 | hex3 | 137 | 3 | doge3 | 141 | 72
1 | hex3 | 137 | 4 | CM | 129 | 67
1 | hex3 | 137 | 5 | rat | 138 | 35
1 | hex3 | 137 | 6 | mex3 | 132 | 54
1 | hex3 | 137 | 7 | vole | 138 | 43
2 | cex3 | 138 | 3 | doge3 | 141 | 78
2 | cex3 | 138 | 4 | CM | 129 | 74
2 | cex3 | 138 | 5 | rat | 138 | 30
2 | cex3 | 138 | 6 | mex3 | 132 | 58
2 | cex3 | 138 | 7 | vole | 138 | 60
3 | doge3 | 141 | 4 | CM | 129 | 72
3 | doge3 | 141 | 5 | rat | 138 | 12
3 | doge3 | 141 | 6 | mex3 | 132 | 15
3 | doge3 | 141 | 7 | vole | 138 | 54
4 | CM | 129 | 5 | rat | 138 | 32
4 | CM | 129 | 6 | mex3 | 132 | 55
4 | CM | 129 | 7 | vole | 138 | 59
5 | rat | 138 | 6 | mex3 | 132 | 81
5 | rat | 138 | 7 | vole | 138 | 55
6 | mex3 | 132 | 7 | vole | 138 | 65
SeqA | Name | Len(nt) | SeqB | Name | Len(nt) | Score
1 | mole | 474 | 2 | vole | 213 | 52
1 | mole | 474 | 3 | hex4 | 209 | 69
1 | mole | 474 | 4 | mex4 | 211 | 52
1 | mole | 474 | 5 | cex4 | 208 | 70
1 | mole | 474 | 6 | dog | 215 | 67
1 | mole | 474 | 7 | rat | 131 | 77
2 | vole | 213 | 3 | hex4 | 209 | 77
2 | vole | 213 | 4 | mex4 | 211 | 87
2 | vole | 213 | 5 | cex4 | 208 | 73
2 | vole | 213 | 6 | dog | 215 | 73
2 | vole | 213 | 7 | rat | 131 | 86
3 | hex4 | 209 | 4 | mex4 | 211 | 77
3 | hex4 | 209 | 5 | cex4 | 208 | 87
3 | hex4 | 209 | 6 | dog | 215 | 89
3 | hex4 | 209 | 7 | rat | 131 | 78
4 | mex4 | 211 | 5 | cex4 | 208 | 73
4 | mex4 | 211 | 6 | dog | 215 | 74
4 | mex4 | 211 | 7 | rat | 131 | 96
5 | cex4 | 208 | 6 | dog | 215 | 86
5 | cex4 | 208 | 7 | rat | 131 | 77
6 | dog | 215 | 7 | rat | 131 | 80

SeqA | Name | Len(nt) | SeqB | Name | Len(nt) | Score
1 | cow | 133 | 2 | dog | 132 | 81
1 | cow | 133 | 3 | mole | 138 | 75
1 | cow | 133 | 4 | human | 164 | 78
1 | cow | 133 | 5 | mouse | 197 | 27
1 | cow | 133 | 6 | rat | 155 | 21
1 | cow | 133 | 7 | vole | 134 | 23
2 | dog | 132 | 3 | mole | 138 | 76
2 | dog | 132 | 4 | human | 164 | 78
2 | dog | 132 | 5 | mouse | 197 | 29
2 | dog | 132 | 6 | rat | 155 | 31
2 | dog | 132 | 7 | vole | 134 | 60
3 | mole | 138 | 4 | human | 164 | 74
3 | mole | 138 | 5 | mouse | 197 | 40
3 | mole | 138 | 6 | rat | 155 | 38
3 | mole | 138 | 7 | vole | 134 | 36
4 | human | 164 | 5 | mouse | 197 | 67
4 | human | 164 | 6 | rat | 155 | 71
4 | human | 164 | 7 | vole | 134 | 44
5 | mouse | 197 | 6 | rat | 155 | 83
5 | mouse | 197 | 7 | vole | 134 | 64
6 | rat | 155 | 7 | vole | 134 | 61

D) Pairwise Identities of Xist Unique Rodent Exon 5.

SeqA | Name | Len(nt) | SeqB | Name | Len(nt) | Score
1 | vole | 103 | 2 | mex5 | 147 | 33
1 | vole | 103 | 3 | rat5 | 131 | 49
2 | mex5 | 147 | 3 | rat5 | 131 | 84
E) Pairwise Identities of Xist Internal Exon Region (Exons 2-5 for Non-Rodents

SeqA | Name | Len(nt) | SeqB | Name | Len(nt) | Score
1 | vole | 671 | 2 | mouse | 989 | 52
1 | vole | 671 | 3 | mole | 533 | 47
1 | vole | 671 | 4 | human | 574 | 41
1 | vole | 671 | 5 | cow | 568 | 53
1 | vole | 671 | 6 | rat | 651 | 72
1 | vole | 671 | 7 | dog | 585 | 53
2 | mouse | 989 | 3 | mole | 533 | 45
2 | mouse | 989 | 4 | human | 574 | 41
2 | mouse | 989 | 5 | cow | 568 | 54
2 | mouse | 989 | 6 | rat | 651 | 48
2 | mouse | 989 | 7 | dog | 585 | 52
3 | mole | 533 | 4 | human | 574 | 43
3 | mole | 533 | 5 | cow | 568 | 58
3 | mole | 533 | 6 | rat | 651 | 33
3 | mole | 533 | 7 | dog | 585 | 57
4 | human | 574 | 5 | cow | 568 | 69
4 | human | 574 | 6 | rat | 651 | 47
4 | human | 574 | 7 | dog | 585 | 67
5 | cow | 568 | 6 | rat | 651 | 42
5 | cow | 568 | 7 | dog | 585 | 84
6 | rat | 651 | 7 | dog | 585 | 42

Table A.4. Pairwise Identities for Xist Sequences Before the A Repeat.

SeqA | Name | Len(nt) | SeqB | Name | Len(nt) | Score
1 | Hum | 284 | 2 | Rat | 325 | 56
1 | Hum | 284 | 3 | Cow | 319 | 74
1 | Hum | 284 | 4 | Mouse | 319 | 50
1 | Hum | 284 | 5 | Dog | 242 | 77
1 | Hum | 284 | 6 | vole | 290 | 59
2 | Rat | 325 | 3 | Cow | 319 | 46
2 | Rat | 325 | 4 | Mouse | 319 | 54
2 | Rat | 325 | 5 | Dog | 242 | 60
2 | Rat | 325 | 6 | vole | 290 | 55
3 | Cow | 319 | 4 | Mouse | 319 | 58
3 | Cow | 319 | 5 | Dog | 242 | 80
3 | Cow | 319 | 6 | vole | 290 | 51
4 | Mouse | 319 | 5 | Dog | 242 | 35
4 | Mouse | 319 | 6 | vole | 290 | 77
5 | Dog | 242 | 6 | vole | 290 | 38
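
The scores in Tables A.3 and A.4 read as percent identities between pairs of aligned sequences. This section does not state how they were calculated, so the following is only a minimal sketch of one common definition, assumed here for illustration: identical positions divided by the positions compared, with gapped columns skipped. The function name and the two toy sequences are hypothetical and are not taken from the thesis data.

```python
def pairwise_identity(seq_a: str, seq_b: str, gap: str = "-") -> int:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption for illustration: columns where either sequence has a gap
    are skipped, and the score is identities / compared positions * 100,
    rounded to the nearest integer.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # gapped column: not counted as compared
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0  # nothing to compare
    return round(100 * identical / compared)


# Toy aligned pair (hypothetical): 9 compared columns, 8 identical.
print(pairwise_identity("ACGT-ACGTA", "ACGTTACGAA"))
```

A score computed this way depends on the alignment supplied, so values would only be comparable with the tables if the same alignment program and gap treatment were used.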

