UBC Theses and Dissertations

Characterization of mammalian Asx-like genes and role of Asx-like-1 in development and hematopoiesis Fisher, Cynthia 2004

CHARACTERIZATION OF MAMMALIAN ASX-LIKE GENES AND ROLE OF ASX-LIKE-1 IN DEVELOPMENT AND HEMATOPOIESIS

by

CYNTHIA FISHER

B.Sc. Physics, University of Saskatchewan, 1991
B.Sc. Certificate Biology, University of Saskatchewan, 1993
M.Sc. Medical Biophysics, University of Toronto, 1996

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES, Department of Zoology

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
July, 2004
© Cynthia Fisher, 2004

ABSTRACT

Polycomb Group (PcG) chromatin regulatory proteins maintain repression of Hox genes during development, leading to posterior homeotic transformations in the absence of gene product. Conversely, trithorax Group (trxG) proteins maintain activation of Hox genes, and trxG null mutants exhibit anterior transformations. Mutations in some PcG genes, termed Enhancers of trx and Pc (ETP), enhance both trxG and PcG gene mutations, result in both posterior and anterior transformations, and mediate both silencing and activation of Hox genes. The Additional sex combs (Asx) gene of Drosophila belongs to the ETP group. I have identified and characterized three mammalian orthologues of Asx in mice and humans, named Asx-like-1, Asx-like-2, and Asx-like-3. Primary sequence conservation between ASX and mammalian ASX-like proteins is limited to an N-terminal nuclear receptor binding motif-containing region (termed the ASXH domain), and a C-terminal region containing a PHD zinc finger that mediates interaction with the SET domain of the TRITHORAX/MLL protein in Drosophila and mammals, respectively. Murine Asxl1 and Asxl2 are expressed ubiquitously in adult tissues and undifferentiated embryonic stem (ES) cells. Asxl1 is expressed selectively in hematopoietic cells.
To assess the role of Asxl1 in mouse development and hematopoiesis, I generated Asxl1 deficient mice by targeted mutagenesis of ES cells. Homozygous Asxl1-/- mice exhibit partial perinatal lethality. Surviving adult Asxl1-/- mice fail to thrive, and exhibit cell autonomous reductions in thymopoiesis and B-cell differentiation, whereas myeloid lineage differentiation is largely unaffected. Asxl1-/- newborn mice show bidirectional homeotic transformations of the axial skeleton, indicating that Asxl1 is a true functional homologue of Asx. To determine if Asxl1 is a conserved murine ETP gene, I generated compound mutants for Asxl1 and the PcG gene M33, a homologue of the Polycomb gene in Drosophila. Compound Asxl1;M33 mutants exhibit enhanced lethality, and more severe and highly penetrant axial skeletal homeotic transformations as compared to single Asxl1 and M33 mutants. These results are consistent with classification of Asxl1 as the first described ETP gene in mice. Further analysis of the mammalian Asx-like gene family will provide insight into the interacting regulatory mechanisms governing maintenance of gene expression states through cell differentiation and development.

TABLE OF CONTENTS

Abstract
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Acknowledgements

Chapter 1 Introduction
I. Cell differentiation, development, and memory
II. Epigenetic regulation
III. Homeotic cluster/Hox genes and anterior-posterior body axis patterning during development: a model system for investigations into cellular memory
IV. Hox gene regulation by epigenetic maintenance proteins: the Polycomb and trithorax Groups
V. Evolutionary conservation of maintenance proteins
VI. Maintenance elements
VII. Molecular mechanisms of PcG proteins
VIII. Molecular mechanisms of trxG proteins
IX. Models for maintenance protein function
  i) trxG models
  ii) PcG models
  iii) Maintenance models
X. Blurring the boundaries: Enhancer of trithorax and Polycomb (ETP) Group proteins
XI. Models for ETP function
XII. Beyond Hox gene regulation: multiple functions for PcG and trxG proteins
XIII. Hematopoiesis: a model system for investigating mechanisms of cell differentiation and memory
XIV. Role of maintenance proteins in regulation of hematopoiesis
XV. The ETP group maintenance protein Additional sex combs (Asx) in Drosophila melanogaster
XVI. Thesis aims

Chapter 2 Characterization of Asx-like genes in mammals
I. Introduction
II. Results
  i. Characterization of human ASXL1 cDNA and genomic clones
  ii. Cytological mapping of human ASXL1
  iii. Characterization of mouse Asxl1 cDNA and genomic clones
  iv. Sequence analysis of human ASXL1 and mouse Asxl1
  v. Genomic organization of human ASXL1
  vi. Genomic organization of mouse Asxl1
  vii. Analysis of human ASXL1 expression
  viii. Analysis of mouse Asxl1 expression
  ix. Characterization of other mammalian Asx-like genes
  x. Sequence comparisons of mammalian Asx-like genes
  xi. Conserved regions within mammalian ASX-like proteins
  xii. Analysis of mouse Asxl2 expression
III. Discussion

Chapter 3 Functional conservation of Asxl1 as an ETP gene in mice
I. Introduction
II. Results
  i. Generation of Asxl1 and Asxl1;M33Cterm mutant mice
  ii. Viability and general observations of Asxl1 mutant mice
  iii. Sex-determination and fertility in Asxl1 mutant mice
  iv. Eye development defects in Asxl1 mutant mice
  v. Homeotic phenotypes of Asxl1 mutant mice
    a. Axial skeletal transformations of Asxl1 mutant mice
    b. Hox gene expression in Asxl1 mutant mice
  vi. Genetic interactions between Asxl1 and M33 mutations
    a. Enhancement of lethality in Asxl1;M33Cterm compound mutant mice
    b. Sex-reversal phenotype of M33Cterm mutant mice
    c. Enhancement of skeletal defects in Asxl1;M33Cterm compound mutant mice
III. Discussion

Chapter 4 Role of Asxl1 in murine hematopoiesis
I. Introduction
II. Results
  i. Asxl1 expression in discrete hematopoietic lineage compartments
  ii. Hematopoietic abnormalities of Asxl1 mutant mice
    a. Changes in cellularity of hematopoietic organs in Asxl1 mutant mice
    b. Defects in myelo-erythropoiesis of Asxl1 mutant newborn and adult mice
    c. T lymphocyte development is impaired in thymus of Asxl1 mutant adults
    d. B lymphocyte development is impaired in Asxl1 mutant mice
    e. Bone marrow reconstitution experiments recapitulate hematopoietic defects of adult Asxl1 mutant mice
III. Discussion

Chapter 5 General discussion
I. Structure and function of ASX-like proteins
  i. Structural insights into mechanisms of action of ASX-like proteins
  ii. Functional redundancy
II. ASXLPs as part of the ETP group of MPs: tests of genetic interaction
III. Role of ASXLPs in hematopoiesis: future work
IV. Targets of ASXLPs in regulation of development and cell differentiation
V. Evolution of MPs and ASXLPs
VI. The future of maintenance

Chapter 6 Materials and methods
I. Nomenclature of mammalian Asx-like cDNAs
II. Bioinformatics
III. cDNA library screening, cloning, and sequencing
IV. Subcellular localization
V. Genomic cloning, PCR, and sequencing
VI. Chromosome mapping
VII. Northern blots
VIII. Single-cell global RT-PCR blots
IX. Gene targeting and generation of Asxl1 mutant mice
X. Screening of M33Cterm-/- mice and generation of Asxl1;M33Cterm double mutants
XI. Fertility testing of Asxl1 mutant mice
XII. General mouse procedures
XIII. Skeletal preparations
XIV. In situ RNA hybridization
XV. Preparation of sections and slides
XVI. Hematological analysis of peripheral blood, thymus, spleen, and bone marrow cells
XVII. Methylcellulose assays to detect clonogenic progenitors
XVIII. Day 12 CFU-spleen (CFU-S12) assay
XIX. Long-term reconstitution assay
XX. Flow cytometric (FACS) analysis
XXI. Statistical analysis

Bibliography
Appendix

LIST OF TABLES

Chapter 1
Table 1-1. PcG and ETP gene families, conserved protein domains, and molecular functions.
Table 1-2. Selected trxG and ETP gene families, conserved protein domains (present in most or all family members), and molecular functions.

Chapter 2
Table 2-1. Selected cDNA and genomic DNA sequences which contain ASXL1 sequence used to generate the ASXL1 cDNA contig.
Table 2-2. Selected cDNA and genomic DNA sequences and/or clones which contain sequence representing Asxl1, some of which were used to generate the Asxl1 cDNA contig.
Table 2-3. Exon-intron boundaries of the human ASXL1 gene.
Table 2-4. Exon-intron boundaries of the mouse Asxl1 gene.
Table 2-5. The Asx-like gene family in mice and humans.
Table 2-6. Conserved sequence features within the mammalian ASX-like protein family, listed in N-terminal to C-terminal order.

Chapter 3
Table 3-1. Numbers of offspring of Asxl1 heterozygous intercrosses.
Table 3-2. Fertility tests of Asxl1-/- adult male mice.
Table 3-3. Skeletal abnormalities of newborn Asxl1 mutant mice.
Table 3-4. Numbers of offspring of intercrosses between compound heterozygous Asxl1 and M33Cterm mutant mice.
Table 3-5. Skeletal defects in Asxl1;M33Cterm mice.

Chapter 4
Table 4-1. Peripheral blood cell counts, and percent gated and absolute number of nucleated blood cells expressing the cell differentiation antigens indicated, from recipients of bone marrow transplants from adult Asxl1-/- and Asxl1+/+ donors, determined by flow cytometry 18 weeks post-transplant.

Appendix
Appendix Table A-1. Numbers of 3 week old offspring of double heterozygous intercrosses of Asxl1 and M33Cterm mutant mice, by genotype and sex.
Appendix Table A-2. Percent gated and absolute number of cells expressing CD4 and CD8 from newborn thymus, determined by flow cytometry.

LIST OF FIGURES

Chapter 1
Figure 1-1. Summary of characteristics of PcG, ETP, and trxG gene functions, mutations, and phenotypes.

Chapter 2
Figure 2-1. Nucleic acid sequence of human ASXL1 cDNA.
Figure 2-2. Nucleic acid sequence of mouse Asxl1 cDNA.
Figure 2-3. Predicted amino acid sequence alignment of human ASXL1, mouse Asxl1, and Drosophila ASX and domain structure of Drosophila ASX compared to human ASXL1 and mouse Asxl1 predicted proteins.
Figure 2-4. Sequence alignment of the ASXH domains of Drosophila ASX and ASX-like proteins from other species.
Figure 2-5. Sequence alignment of the PHD fingers of human ASXL1 and Drosophila ASX with those of other Polycomb and trithorax group proteins.
Figure 2-6. Genomic structure of the human ASXL1 gene and the mouse Asxl1 gene.
Figure 2-7. Northern blot analysis of human ASXL1 expression in adult tissues, fetal liver, and cancer cell lines.
Figure 2-8. Northern blot analysis of mouse Asxl1 expression in adult tissues, and day 0 CCE embryonic stem (ES) cell line.
Figure 2-9. Asxl1 expression at 10.5 dpc and 11.0 dpc in whole mount mouse embryos detected by RNA hybridization in situ.
Figure 2-10. Sequence alignment of the N-terminal regions of Drosophila ASX and mouse and human ASX-like proteins.
Figure 2-11. Sequence alignment of the central regions of mouse and human ASX-like proteins.
Figure 2-12. Sequence alignment of a C-terminal region containing two nuclear receptor (NR) binding domains of mouse and human ASX-like proteins.
Figure 2-13. Sequence alignment of the C-terminal proximal PHD domains of mouse and human ASX-like proteins.
Figure 2-14. Domain structure of Drosophila ASX compared to the predicted ASX-like protein family in mammals.
Figure 2-15. Northern blot analysis of mouse Asxl2 expression in adult tissues, and day 0 CCE embryonic stem (ES) cell line.

Chapter 3
Figure 3-1. Targeting strategy and disruption of the Asxl1 gene in ES cells and mice.
Figure 3-2. Northern blot and RT-PCR analysis of Asxl1 expression in wild-type and Asxl1-/- mice.
Figure 3-3. In situ RNA hybridization of Asxl1+/+ and Asxl1-/- 10.5 dpc embryos, using an Asxl1 antisense riboprobe.
Figure 3-4. Comparison of adult mouse body weight and size according to Asxl1 genotype.
Figure 3-5. Organ weight comparisons in Asxl1 adult mice.
Figure 3-6. Splenomegaly in Asxl1-/- adult mice.
Figure 3-7. Role of Asxl1 in gonad development and spermatogenesis.
Figure 3-8. Defective eye development in Asxl1 mutant 12.5 dpc embryos.
Figure 3-9. Alterations of the axial skeleton of newborn Asxl1 mutant and Asxl1;M33 compound mutant mice: lateral views of the cervical regions and scapulae.
Figure 3-10. Alterations of the axial skeleton of newborn Asxl1 mutant and Asxl1;M33 compound mutant mice: lateral views of the thoracic regions.
Figure 3-11. Alterations of the axial skeleton of newborn Asxl1 mutant and Asxl1;M33 compound mutant mice: dorsal views of the lower cervical, thoracic, and upper lumbar regions.
Figure 3-12. Alterations of the axial skeleton of newborn Asxl1;M33 compound mutant mice: dorsal views of the cervical and thoracic regions.
Figure 3-13. Alterations of the axial skeleton of newborn Asxl1 mutant and Asxl1;M33 compound mutant mice: ventral views of the thoracic region.

Chapter 4
Figure 4-1. Hematopoiesis stage-defined murine single-cell amplified cDNA slot blots.
Figure 4-2. Red blood cell counts (RBC) and nucleated cell numbers of hematopoietic tissues in 18.5 dpc embryos, newborn, and adult Asxl1 mutant mice compared to wild type controls.
Figure 4-3. In vitro colony formation of committed myeloid progenitors in 18.5 dpc fetal liver, newborn spleen, and adult bone marrow of Asxl1-/- mice compared to Asxl1+/+ controls.
Figure 4-4. Numbers of cells expressing the myeloid lineage markers Gr-1, Mac-1, and Ter-119 as determined by flow cytometry of newborn spleen, and of bone marrow and spleen cells from adult Asxl1-/- mice compared to matched wild type controls.
Figure 4-5. Schematic diagram representing stages of T cell differentiation within the murine thymus including differentiation antigen expression profiles.
Figure 4-6. Thymus cells expressing or co-expressing the T lymphocyte lineage markers CD4 and CD8 as indicated, determined by flow cytometry in adult Asxl1-/- mice compared to matched wild type controls.
Figure 4-7. Schematic diagram representing stages of B cell differentiation (according to the Philadelphia nomenclature) within murine bone marrow, and immediately following migration to peripheral lymphoid organs including the spleen.
Figure 4-8. Newborn spleen cells expressing the B lymphocyte markers B220+, B220+IgM+, and IgM+IgD+, determined by flow cytometry, comparing Asxl1-/- mice to Asxl1+/+ controls.
Figure 4-9. Numbers of cells expressing the B lymphocyte lineage markers B220+, B220+IgM+, and B220+CD43+, determined by flow cytometry of bone marrow cells from adult Asxl1-/- mice compared to matched Asxl1+/+ controls.
Figure 4-10. CFU-IL-7 assay: in vitro colony formation of committed pre-B lymphocyte progenitors derived from adult bone marrow of Asxl1-/- mice compared to wild type controls.
Figure 4-11. Cells expressing the B lymphocyte lineage markers B220+, B220+IgM+, and IgM+IgD+, determined by flow cytometry of spleen cells from adult Asxl1-/- mice compared to matched wild type controls.
Figure 4-12. Examples of flow cytometric profiles showing marked reduction of B lymphocyte lineage populations in one adult Asxl1-/- mouse compared to its matched Asxl1+/+ control expressing B220+ or IgM+IgD+, in the peripheral blood, spleen, and bone marrow.
Figure 4-13. Bone marrow reconstitution of lethally irradiated Ly5.1+ adult recipients injected with Ly5.2+ bone marrow cells from adult Asxl1-/- donors recapitulates the hematopoietic defects observed in adult Asxl1-/- mice.

Appendix
Appendix Figure A-1. Pairwise sequence alignment between mouse Asxl3 and human ASXL3 predicted proteins.
Appendix Figure A-2. Pairwise sequence alignment between mouse Asxl1 and Asxl2 predicted proteins.
Appendix Figure A-3. Pairwise sequence alignment between mouse Asxl1 and Asxl3 predicted proteins.
Appendix Figure A-4. Pairwise sequence alignment between mouse Asxl2 and Asxl3 predicted proteins.
Appendix Figure A-5. In situ RNA hybridization of Asxl1+/+ and Asxl1-/- 10.5 dpc embryos, using a Hoxc8 antisense riboprobe.
Appendix Figure A-6. In vitro colony formation of committed erythroid progenitors CFU-E and BFU-E in adult bone marrow of Asxl1-/- mice compared to Asxl1+/+.
Appendix Figure A-7. Numbers of nucleated cells/ml in peripheral blood expressing the lineage markers Gr-1, B220, and Ter-119, in newborn Asxl1 mutant and wild type control mice, and expressing the lineage markers Gr-1, Mac-1, Ter-119, B220, IgM+IgD+, CD4+CD8- SP, CD4-CD8+ SP, and CD4+CD8+ DP in adult Asxl1-/- mice compared to Asxl1+/+ controls, as determined by flow cytometry.

LIST OF ABBREVIATIONS

AGM  Aorta-gonad-mesonephros
A-P  Anterior-posterior
ASX  Additional sex combs protein (Drosophila)
Asxl1  Additional sex combs-like 1 protein (mouse)
ASXLP  ASX-like protein
ATP  Adenosine triphosphate
BAP  Brahma associated protein
BCR  B cell receptor
bp  basepair
CBP  cAMP-responsive binding protein
CD  Cell differentiation antigen
CFU  Colony-forming unit
ChIP  Chromatin immunoprecipitation
CLP  Common lymphoid progenitor
CMM  Cellular Memory Module
CMP  Common myeloid progenitor
CtBP  Carboxyl-terminus binding protein
DN  Double negative thymocyte (CD4- and CD8-)
DNA  Deoxyribonucleic acid
DP  Double positive thymocyte (CD4+ and CD8+)
dpc  Days post-coitum (used to stage murine embryogenesis)
dPCC  Recombinant Drosophila Polycomb "core" complex
EMH  Extramedullary hematopoiesis
ETP  Enhancer of trithorax and Polycomb
FACS  Fluorescence activated cell sorting
GOF  Gain of function
GST  Glutathione S-transferase
GTF  General transcription factor
H3-K4  Histone H3 lysine 4
H3-K9  Histone H3 lysine 9
H3-K27  Histone H3 lysine 27
HAT  Histone acetyltransferase
HDAC  Histone deacetylase
Hox  Homeobox cluster
HMTase  Histone methyltransferase
HSC  Hematopoietic stem cell
IgH  Immunoglobulin heavy chain
IgL  Immunoglobulin light chain
IL-7  Interleukin-7
kDa  kilo-Daltons
LOF  Loss of function
MDa  mega-Daltons
Me  Methylation
ME  Maintenance elements
MLL  Mixed lineage leukemia
MP  Maintenance protein
NCBI  National Center for Biotechnology Information
NR  Nuclear receptor
NRBD  NR binding domain
PAS  Para-aortic splanchnopleura
PB  Peripheral blood
PC  Polycomb protein (Drosophila)
PCC  Polycomb core complex
PcG  Polycomb Group
PEV  Position-effect variegation
PRC1  Polycomb repressive complex 1
PRC2  Polycomb repressive complex 2 (contains E(Z) and ESC proteins)
PRE  Polycomb Response Element
RBC  Red blood cell
RNA  Ribonucleic acid
RNAi  RNA interference
RNA-TRAP  RNA tagging and recovery of associated proteins
SAM  Self-association motif
Sbf1  SET binding factor 1
SET  SU(VAR)3-9, ENHANCER OF ZESTE, and TRITHORAX
SP  Single positive thymocyte (CD4+CD8- or CD4-CD8+)
SRC  p160 steroid receptor coactivator
TAC1  trithorax acetylation complex 1
TAF  TBP-Associated Factor
TBP  TATA-binding protein
TCR  T cell receptor
TPE  Telomeric position effect
TRE  trithorax Response Element
TRX  trithorax protein (Drosophila)
trxG  trithorax Group
UCSC  University of California at Santa Cruz

Note: for PcG, trxG, and ETP group gene name abbreviations please refer to Tables 1-1 and 1-2.

ACKNOWLEDGEMENTS

I would like to thank my graduate supervisor, Dr. Hugh Brock, for the opportunity and latitude to pursue research in mouse Polycomb Group genetics, which was a rather unorthodox project in a laboratory previously focused on the Drosophila model system. I have appreciated Dr. Brock's enthusiastic commitment to my research and intellectual growth, and his support of my attendance at several international meetings and conferences. I am also grateful to Dr. Brock for allowing me to pursue an independent research project in Japan during the summer of 2003 through an award from the Japan Society for the Promotion of Science, which delayed completion of this thesis. I am also indebted to Dr. Keith Humphries, who served on my supervisory committee, and also generously welcomed me into his laboratory to conduct the majority of the research work contained in this thesis. Without his support and guidance, I could not have completed my research project. I am also thankful for the support, advice, and assistance provided by the other members of my supervisory committee, Drs. Linda Matsuuchi, Tom Grigliatti, and Michel Roberge, throughout my PhD programme. I would like to thank Dr. Yoshihiro Takihara for the research experience in his laboratory in Japan during the summer of 2003, and for interesting discussions on connections between the Polycomb Group and the cell cycle. I would also like to acknowledge the financial support of the Canadian Institutes of Health Research (formerly Medical Research Council of Canada) in the form of an MRC studentship, and the Society for Developmental Biology in the form of a travel award and poster competition award from the 2002 annual meeting.

Many other people provided knowledge and technical assistance to me at various stages of my research. I would particularly like to thank Jacob Hodgson for his infectious positive outlook, tireless encouragement, and never-ending intellectual stimulation, Patty Rosten for sharing her vast knowledge of molecular biology techniques, Cheryl Helgason for teaching me about hematopoiesis and relevant techniques, Caroline Bodner for her technical assistance and moral support during long experiments, and Rewa Grewal for her help with mouse procedures.
I am also grateful to the late Doug Houge for helpful discussions on targeting vector construction during the early phases of my project, and for providing me with the vector DNA that I used to construct the Asxl1 targeting vector backbone. I would like to thank the following current and former members of the Brock lab: Jacob, Ester, Sebastian, Bob, Tom, Yong-Jun, Jack, Pol, Andrea, and Iris; the Humphries lab: Patty, Cheryl, Jennifer, Nick, Sharlene, Rhonna, Ben, Andrew, Caroline, Hide, Koichi, Lars, Sylvia, Sanja, Chris, Suzan, and Rewa; and the Takihara lab: Moto, Shini, and Rika, for their friendship and camaraderie, as well as for their support and advice during my graduate programme. I am also very grateful to John Stingl, my parents Ron and Carol Fisher, and my sister Sheri Fisher, for their tireless encouragement and emotional support. I dedicate this work in memory of my grandmother, Kathleen North, who passed away in April 2004.

CHAPTER 1 Introduction

I. Cell differentiation, development, and memory

A one-cell animal zygote is capable of developing into an adult multicellular organism. A mammalian hematopoietic stem cell gives rise to multiple blood lineages, both myeloid and lymphoid, and yet retains an extensive capacity to self-renew. In each system, a single cell that is capable of giving rise to many cell types undergoes multiple rounds of cell division that yield daughter cells with either a preserved identity (self-renewal) or different identities with a reduced ability to differentiate compared to the initial cell. How the genomic information contained within that zygote, or stem cell, can be transduced into determination of cell fate at any given stage of the developmental process is a question of fundamental importance in biology.
The current paradigm holds that discrete expression patterns of gene subsets define the cell type, and that changes in activation or repression of key determination genes constitute the commitment events that drive various cell differentiation and developmental processes. Hence much research has gone into the identification and characterization of transcription factors that control gene expression by directly binding to DNA of target loci and altering their transcription, and connecting these activities with cell differentiation outcomes. This approach has been widely successful in identifying key commitment factors.

After a commitment decision has been made, and the appropriate subset of gene expression patterns has been initiated, these gene expression patterns must be maintained through subsequent cell divisions, even after the transiently expressed transcription factors responsible for making that decision are gone (Brock and van Lohuizen, 2001). Hence a cell must retain a memory of its appropriate expression state, likely at several different loci simultaneously, in many cases while still possessing further differentiation capabilities (Gaston and Jayaraman, 2003; Moazed, 2001; Smale, 2003). Disruption in appropriate maintenance of gene expression patterns, or cell memory, can have disastrous effects on the organism, such as death, or cellular transformation leading to cancer (Hake et al., 2004; Jacobs and van Lohuizen, 2002; Otte and Kwaks, 2003; Roberts and Orkin, 2004). On the other hand, dynamic regulation of maintenance of cell fate, or the ability to reprogram cell fate, can be advantageous under some circumstances, such as in naturally occurring vertebrate tissue regeneration (Tanaka, 2003) or transdetermination in Drosophila imaginal discs as a response to wounding (Maves and Schubiger, 2003; Wei et al., 2000), and in somatic cell cloning for therapeutic applications in medicine (Jouneau and Renard, 2003).

II. Epigenetic regulation

Maintenance of gene expression states through mitosis or even meiosis is an epigenetic phenomenon. Epigenetics includes all processes involved in cell memory and/or inheritance that do not result in changes to DNA sequence and/or do not follow Mendelian inheritance patterns (Cavalli, 2002; Urnov and Wolffe, 2001a). Selected examples of epigenetic phenomena include mating type silencing in yeast (Gasser and Cockell, 2001), position-effect variegation (PEV) (Schotta et al., 2003), telomeric position-effect (TPE) (Boivin et al., 2003; Perrod and Gasser, 2003), gametic imprinting and X chromosome inactivation in mammals (Chow and Brown, 2003; Okamoto et al., 2004), and maintenance of transcriptional regulatory states by Polycomb (PcG) and trithorax (trxG) Group proteins (Orlando, 2003). Epigenetic phenomena may share varying degrees of mechanistic overlap. Epigenetic regulation may modify the DNA itself (e.g. methylation), modify histones, affect recruitment of multiple accessory/co-regulatory proteins to chromatin, affect nuclear localization of loci, or affect overall chromatin structure (Carmo-Fonseca, 2002; Felsenfeld and Groudine, 2003; Fisher and Merkenschlager, 2002; Khorasanizadeh, 2004; Kornberg and Lorch, 2002; Vermaak et al., 2003).

Epigenetic mechanisms control modulation of loci within both heterochromatin and euchromatin (Vermaak et al., 2003). At the molecular level, the heterochromatic state is a permanently repressed (i.e. silenced) state within highly compacted chromatin, while euchromatic loci can be transiently activated or repressed, or silenced (Wolffe, 1998). Dynamic control of chromatin structure is a key element in gene regulation, and provides another level of regulation that must be integrated with transcription factor binding to DNA, which is also more dynamic than traditionally thought (Belmont, 2003; Fischle et al., 2003a; Luger, 2003; Wolffe, 1998). Several epigenetic mechanisms have been discovered.
Perhaps the best described mechanism of altering chromatin structure is the ATP-dependent remodeling of chromatin, i.e. shifting nucleosome positions and conformations, thereby altering DNA template accessibility. Numerous ATP-dependent chromatin remodeling complexes have been characterized to date (Becker and Horz, 2002; Fan et al., 2003). Other mechanisms for altering chromatin structure affect histone modification or replacement, which in turn can modify higher order chromatin structure from a permissive to a silenced state, or can signal reversible changes in transcription levels (Eberharter and Becker, 2002; Kornberg and Lorch, 2002).

Much effort is being spent on cataloging various types of post-translational histone modifications, generally referred to as 'marks', identifying enzymes that perform these modifications, and elucidating their role in epigenetic phenomena. Such modifications may form a 'histone code' which dictates the outcome on transcription and other processes in a combinatorial manner (Jenuwein and Allis, 2001; Khorasanizadeh, 2004; Narlikar et al., 2002; Strahl and Allis, 2000). Histone acetylases (HATs) and deacetylases (HDACs) modify lysine residues predominantly on the N-terminal histone tails (Carrozza et al., 2003; Hasan and Hottiger, 2002; Yang and Seto, 2003). HATs generally act as transcriptional activators (Carrozza et al., 2003; Eberharter and Becker, 2002; Narlikar et al., 2002; Sterner and Berger, 2000), whereas HDACs function as transcriptional repressors, and can reverse histone acetylation marks (Grozinger and Schreiber, 2002; Narlikar et al., 2002; Yang and Seto, 2003). Certain HAT- and HDAC-containing complexes can collaborate with ATP-dependent chromatin remodeling complexes (Fry and Peterson, 2001; Yang and Seto, 2003). HDAC activity is necessary but not sufficient for certain epigenetic silencing processes to occur (Eberharter and Becker, 2002).
Lysines on histones can also be methylated by histone methyltransferases (HMTases). However, no demethylase enzymes have yet been discovered, and it is entirely possible that they do not exist (Khorasanizadeh, 2004). Effects mediated by HMTases are more permanent, since the only way to remove a methylation mark on a histone is to replace the entire histone, and therefore these enzymes are thought to be key factors in transducing long-term maintenance of epigenetic phenomena. Other types of (reversible) histone modifications, such as ubiquitination and phosphorylation, also likely play a role in regulation of epigenetic processes, but these are not as well understood (Khorasanizadeh, 2004). The various models for epigenetic regulation of transcription will be discussed further below in relation to the primary focus of this thesis, the trithorax and Polycomb Groups of genes, once they and their classical targets, the homeotic gene clusters, have been introduced.

III. Homeotic cluster/Hox genes and anterior-posterior body axis patterning during development: a model system for investigations into cellular memory

Homeotic (HOM-C or Hox) genes are found in clusters, are critical determinants of the metazoan body plan, and have been conserved throughout evolution from basal Metazoans to mammals (Ferrier and Holland, 2001; Kmita and Duboule, 2003; Krumlauf, 1994; Lewis, 1978; Prince, 2002; Veraksa et al., 2000). Hox gene mutations result in a phenotype of 'homeosis', whereby one body segment is transformed into the likeness of another body segment (Bateson, 1894; Kmita and Duboule, 2003). Hox genes possess a characteristic 180 base-pair sequence called the homeobox that codes for a helix-turn-helix DNA binding motif, and Hox proteins function as transcription factors (McGinnis et al., 1984a; McGinnis et al., 1984b; Scott and Weiner, 1984).
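As a small worked check of the homeobox length quoted above, the following Python sketch converts the 180 base-pair figure into a residue count. It assumes only the standard triplet genetic code and an in-frame sequence; the function name `encoded_residues` is hypothetical and introduced purely for illustration.

```python
# Minimal arithmetic sketch: how many amino acids does a 180 bp homeobox encode?
# Assumption (not from the thesis text): standard triplet code, sequence in frame.

HOMEOBOX_BP = 180      # homeobox length stated in the text
BASES_PER_CODON = 3    # one codon = three bases

def encoded_residues(n_bp: int) -> int:
    """Return the number of amino acids encoded by an in-frame stretch of n_bp bases."""
    if n_bp % BASES_PER_CODON != 0:
        raise ValueError("length is not a whole number of codons")
    return n_bp // BASES_PER_CODON

print(encoded_residues(HOMEOBOX_BP))  # prints 60
```

The result, 60 residues, corresponds to the size generally quoted for the homeodomain, the protein domain that carries the helix-turn-helix motif.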
Most of our knowledge of how alterations in Hox gene expression and regulation affect body patterning comes from studies in the fruit fly Drosophila melanogaster (Lewis, 1978). There are 8 Hox genes in Drosophila, split into two linkage groups on the same chromosome, called the Antennapedia and Bithorax complexes (Lewis, 1978; McGinnis and Krumlauf, 1992). The HOM-C gene cluster has undergone two large-scale duplication events during evolution to mammals, resulting in four paralogous clusters (termed A, B, C, and D), each on a different chromosome, containing a total of 39 Hox genes (Duboule and Dolle, 1989; Graham et al., 1989; Martinez and Amemiya, 2002). Hox gene expression along the body axis from anterior to posterior is restricted to particular domains, and the anterior boundaries of these domains are colinear with the physical position of the respective gene, 3' to 5', within the Hox cluster genomic locus (Deschamps et al., 1999). Proper spatial and temporal regulation of these expression boundaries, as well as of gene product dosage, is required to generate proper segmental identity along the anterior-posterior (AP) axis (Deschamps et al., 1999; Dubrulle and Pourquie, 2002; Kessel and Gruss, 1991; Kmita and Duboule, 2003). Hox genes are global regulators of body axis patterning in mice, but this is only obvious when all paralogues are disrupted, on account of a high degree of functional redundancy between the paralogues (van den Akker et al., 2001; Wellik and Capecchi, 2003). Single Hox gene knockout mice exhibit mild phenotypes involving alterations in size or shape of the skeletal elements, primarily in the anterior range of the normal expression domain of that gene (Carpenter et al., 1993; Condie and Capecchi, 1993; Davis and Capecchi, 1994; Krumlauf, 1994; Mak and Simard, 1998; Mark et al., 1993).
Compound mutants of Hox gene paralogues generally show more severe transformation phenotypes in skeletal elements than the sum of the individual gene knockout phenotypes, implying redundancy of Hox gene function (Favier et al., 1996; Gavalas et al., 1998; Horan et al., 1995; Mak and Simard, 1998). Strikingly, mice lacking all three Hox10 paralogues no longer possess distinctly lumbar or sacral vertebrae but instead have ectopic rib processes extending from all posteriorly located vertebrae within the expected lumbar and sacral vertebral regions. Mice lacking all three Hox11 paralogues show normal rib morphology and location, but all vertebrae posterior to T13 (where the last rib normally occurs) take on a lumbar morphology such that no sacral vertebrae form. Mice carrying 5 of the 6 mutant alleles exhibited much less severe phenotypes (Wellik and Capecchi, 2003). Gene swapping experiments using paralogue group 3 Hox genes indicate that protein products from paralogous Hox loci are functionally equivalent (Duboule, 2000; Greer et al., 2000). Hox genes also regulate floral development in plants (Ferrario et al., 2004), and patterning of the limbs, gonads, and other body organs in animals (Estrada et al., 2003; Li and Cao, 2003; Lo and Frasch, 2003; Patterson and Potter, 2003). Hox genes regulate normal and leukemic blood cell differentiation in mammals, which will be discussed further below (Buske and Humphries, 2000; Owens and Hawley, 2002; Payne and Crooks, 2002). A key challenge in the field of Hox gene regulation is to discover the suites of downstream target genes that mediate the various patterning events and to characterize how Hox co-factors, such as products of the Pbx and Pax gene families (Graba et al., 1997; Mann and Affolter, 1998; Westerman et al., 2003), assist in the process of target gene regulation in vivo.
In Drosophila, the appropriate patterns of Hox gene expression are initially established through the coordinate action of the segmentation genes in a process that is well understood (Rivera-Pomar and Jackle, 1996; Thieffry and Sanchez, 2003; Wolpert, 1994). In mice, however, only a small number of upstream regulators involved in setting the initial Hox gene expression patterns have been characterized, such as the transcription factors Krox20 and Kreisler, and transcription factors activated by FGF and retinoic acid signaling (Bel-Vialar et al., 2002; Dubrulle et al., 2001; Manzanares et al., 2002). In vertebrates in general, establishment of Hox expression patterns appears to be a complex process that is partially under the control of a global clock-like regulatory system, termed the 'Hox clock', in which the temporal order of Hox gene activation along the cluster is translated into linearly ordered, spatially restricted expression domains in the body (Deschamps et al., 1999; Dubrulle and Pourquie, 2002; Kmita and Duboule, 2003; Pourquie, 2003). Other regulatory mechanisms are also involved (Bel-Vialar et al., 2002; Kmita and Duboule, 2003; Spitz et al., 2003; Zakany et al., 2001). Upstream regulators of the Hox genes also include the Polycomb and trithorax Groups of epigenetic modulators.

IV. Hox gene regulation by epigenetic maintenance proteins: the Polycomb and trithorax groups

Polycomb Group (PcG) and trithorax Group (trxG) genes, the focus of this thesis, are epigenetic regulators, most of which were first identified in genetic screens in Drosophila through the various types of homeotic transformations seen in the mutant flies, including extra sex combs on the second and third legs of male flies in PcG mutants, and haltere to wing transformations in trxG mutants (Garcia-Bellido, 1977; Ingham and Whittle, 1980; Jurgens, 1985; Kennison and Tamkun, 1988; Lewis, 1978; Lewis, 1947; Shearn et al., 1971; Slifer, 1942).
PcG and trxG genes are required to maintain proper Hox gene expression patterns during embryogenesis. In Drosophila, PcG and trxG mutants show normal initiation of Hox gene expression patterns by the segmentation gene products; however, these properly initiated expression patterns fail to be maintained (Simon, 1995; Mahmoudi and Verrijzer, 2001). Hence, PcG and trxG proteins are collectively referred to as maintenance proteins (MPs). Loss-of-function mutations in PcG genes enhance other PcG mutations, suppress trxG mutations, and cause posterior transformations along the body axis that result from ectopic expression of homeotic loci; that is, PcG genes silence Hox genes. Conversely, loss-of-function mutations in trxG genes enhance other trxG mutations, suppress PcG mutations, and cause anterior axial transformations that result from reduced expression of homeotic loci; that is, trxG genes are needed for maintenance of Hox gene activation (Capdevila and Garcia-Bellido, 1981; Kennison, 1995; Mahmoudi and Verrijzer, 2001). There are an estimated 30-40 PcG genes in Drosophila (Jurgens, 1985; Landecker et al., 1994), of which 15 have been cloned and characterized (Table 1-1) (Orlando, 2003). There are at least 14 trxG genes in Drosophila (Table 1-2).

V. Evolutionary conservation of maintenance proteins

Mammalian homologues exist for the PcG and trxG genes of Drosophila (Table 1-1) (Brock and van Lohuizen, 2001; Lessard and Sauvageau, 2003b; Otte and Kwaks, 2003). As a result of gene duplication, most PcG and trxG genes have at least two orthologues in mammals for each Drosophila counterpart (Table 1-1). PcG and trxG protein homologues contain conserved functional domains that are now considered characteristic of chromatin proteins (Tables 1-1 and 1-2; see below).
It appears likely that ancestral PcG and trxG epigenetic regulators evolved very early in eukaryotic evolution (see Chapter 5 for further discussion). Homologues of Drosophila PcG and trxG genes generally exhibit conserved function in AP body axis patterning via homeotic/Hox gene regulation in the nematode Caenorhabditis elegans (Ross and Zarkower, 2003; Zhang et al., 2003) and in the mouse (Mahmoudi and Verrijzer, 2001; van Lohuizen, 1998) (see Chapter 3). M33, a murine homologue of Polycomb (Pc), can largely rescue the phenotypes of Pc null mutants in Drosophila development, clearly illustrating strong functional as well as structural conservation of these PcG genes (Muller et al., 1995).

[Tables 1-1 and 1-2: table contents not recoverable from the extracted text]

VI. Maintenance elements

How do MPs find their specific gene targets on the chromosome? This question is still far from being fully answered, but most progress has been made using DNA from the Drosophila homeotic Antennapedia and Bithorax gene clusters as a model system.
Several cis-regulatory elements, termed Polycomb Response Elements (PREs), were discovered that are necessary for maintenance of homeotic gene silencing (Busturia and Bienz, 1993; Busturia et al., 1997; Chan et al., 1994; Chang et al., 1995; Muller and Bienz, 1991; Simon et al., 1993; Simon et al., 1990). PcG proteins specifically associate with PREs, which are typically several hundred base pairs in length. However, most PcG proteins do not bind directly to DNA (Chan et al., 1994; Chang et al., 1995; Gindhart and Kaufman, 1995; Hagstrom et al., 1997; Kapoun and Kaufman, 1995; Orlando et al., 1998; Orlando and Paro, 1993; Strutt et al., 1997; Strutt and Paro, 1997; Zink et al., 1991; Zink and Paro, 1995). PREs have also been identified in genes outside the homeotic clusters, including engrailed (Kassis, 1994), polyhomeotic (Fauvarque and Dura, 1993) and hedgehog (Maurange and Paro, 2002). trxG proteins associate with their target loci at trithorax Response Elements (TREs) (Chan et al., 1994; Chang et al., 1995). PREs are intermingled with TREs (Bloyer et al., 2003; Tillib et al., 1999) and the two have interconnected functions, so these regulatory regions have been collectively termed maintenance elements (MEs) (Brock and van Lohuizen, 2001) or cellular memory modules (CMMs) (Cavalli and Paro, 1998; Cavalli and Paro, 1999). A central question is how PcG and trxG proteins accomplish site-specific ME recognition given that most MPs do not bind DNA. It is not as simple as tethering PcG proteins to their target locus, since DNA binding domain-PcG fusion proteins cannot maintain silencing of the target locus, although a PcG complex can assemble (Poux et al., 2001a).
Several conserved DNA sequence motifs are present among characterized MEs (Ringrose et al., 2003). These include DNA binding sites for PHO and PHO-like, the only known PcG proteins with DNA-binding activity (Brown et al., 2003; Brown et al., 1998; Mihaly et al., 1998), for Pipsqueak (Hodgson et al., 2001; Huang et al., 2002), and for the trxG proteins GAGA factor (GAF) (Strutt et al., 1997) and Zeste (Hur et al., 2002; Saurin et al., 2001), as well as AACAA sequences with which no binding factors have yet been associated (Tillib et al., 1999). However, neither GAGA nor PHO binding sites by themselves can confer silencing (Strutt et al., 1997; Mohd-Sarip et al., 2002). Zeste sites are necessary for inheritance of the active transcriptional state, and for recruitment of the trxG protein Brahma, but not for recruitment of TRX protein (Dejardin and Cavalli, 2004). It appears that each of these DNA-binding factor sites is necessary but not sufficient for complete epigenetic inheritance function. The DNA binding factors likely facilitate targeting of PcG/trxG components or complexes to MEs, although the processes involved are not fully understood and are likely complex and combinatorial in nature; each ME may specifically recruit different MP complexes (Dejardin and Cavalli, 2004; Hodgson et al., 2001; Mulholland et al., 2003; Pirrotta et al., 2003; Ringrose et al., 2003). MPs also bind at the promoters of target loci in Drosophila, in addition to the ME (see below). MPs also bind at mammalian promoters (see below); however, PRE function has not yet been described in mammals (Orlando, 2003).
DNA binding proteins have been identified that interact with mammalian PcG proteins, including E2F6, which belongs to the E2F family of transcription factors that regulate cell proliferation; these proteins may therefore function in recruitment of MPs to target loci in mammals (Atchison et al., 2003; Dahiya et al., 2001; Garcia et al., 1999; Ogawa et al., 2002; Satijn et al., 2001; Trimarchi et al., 2001). Orlando (2003) has speculated that repetitive DNA elements may also play a role in targeting of PcG proteins in mammalian genomes, based on the observation that YY1 (the PHO homologue) belongs to a repressive complex that binds to the D4Z4 repeat element implicated in regulation of the facioscapulohumeral muscular dystrophy (FSHD) gene (Gabellini et al., 2002). Non-coding RNA transcripts may facilitate targeting and establishment of PcG-mediated gene silencing, as this mechanism has been implicated in heterochromatin silencing systems in yeast and mammals (Dernburg and Karpen, 2002; Maison et al., 2002; Muchardt et al., 2002; Orlando, 2003; Vermaak et al., 2003; Volpe et al., 2002).

VII. Molecular mechanisms of PcG proteins

How do PcG and trxG proteins function as a cell memory system? Early genetic studies in Drosophila demonstrated that mutations in different PcG genes act synergistically when combined (Jurgens, 1985; Kennison, 1995), suggesting that PcG proteins act together to silence Hox loci. PcG proteins directly interact with each other through conserved functional domains (Table 1-1), and these interactions are critical for PcG function in Hox gene regulation and axial patterning during Drosophila embryogenesis (Alkema et al., 1997a; Franke et al., 1992; Kyba and Brock, 1998; Satijn and Otte, 1999; Strutt and Paro, 1997). While some conserved domains of PcG proteins, such as the SAM domain, act as protein-protein interaction modules (Kyba and Brock, 1998; Peterson et al., 1997; Zhang et al., 2004), the recent discovery that others (e.g. the SET domain) exhibit enzymatic activity on their own was a great step forward in understanding the mechanism of action of maintenance proteins. The SAM domain in the PcG protein SOP-2 of C. elegans mediates interaction with the small ubiquitin-related modifier (SUMO)-conjugating enzyme UBC-9, and sumoylation of SOP-2 is required for Hox gene repression (Zhang et al., 2004). The SET domain is a histone methyltransferase (HMTase) domain present in the heterochromatin protein SU(VAR)3-9, the trxG proteins TRX, TRR, and ASH1, the PcG protein E(Z), and other proteins (Alvarez-Venegas and Avramova, 2002; Beisel et al., 2002; Breiling and Orlando, 2002; Marmorstein, 2003; Orlando, 2003; Rea et al., 2000; Xiao et al., 2003; Yeates, 2002). SU(VAR)3-9 preferentially methylates lysine (K) 9 on histone H3 (H3-K9Me) (Rea et al., 2000), whereas E(Z) predominantly exhibits H3-K27Me activity and to a lesser extent H3-K9Me activity (Cao et al., 2002; Czermin et al., 2002; Kuzmichev et al., 2002; Muller et al., 2002). Both H3-K27Me and H3-K9Me marks are considered indicators of repressive states (Kouzarides, 2002). However, it is not yet clear how histone modifications confer functional outcomes by PcG action, or whether a combinatorial "histone code" (Jenuwein and Allis, 2001) is utilized by PcG proteins at each target locus to dictate these outcomes. In support of combinatorial action, another conserved chromatin domain, the chromodomain, found in POLYCOMB (Table 1-1), the heterochromatin protein HP1 (Paro and Hogness, 1991), and approximately 40 other proteins in mice and humans (Tajul-Arifin et al., 2003), is an evolutionarily conserved recognition module for methylated lysines on histones, but also exhibits RNA binding activity in some cases (Eissenberg, 2001; Fischle et al., 2003a). PC binds to H3-K27Me (Fischle et al., 2003b).
E(Z) SET HMTase activity on H3-K27 may create the docking site for PC binding via its chromodomain, which may then recruit other PcG proteins (see below) at euchromatic target loci (Czermin et al., 2002; Platero et al., 1996). As predicted from earlier protein-protein interaction studies, multi-protein soluble complexes containing PcG proteins have been purified from Drosophila and mammalian nuclear extracts. However, there is some controversy regarding the number and composition of PcG complexes (Otte and Kwaks, 2003). Although it is an oversimplification, soluble PcG complexes are currently grouped into two broad categories. The first category is represented by the >1 to 2 MDa Polycomb repressive complex 1 (referred to as PRC1) in Drosophila, containing the PcG proteins PSC, SCM, PC, PH, and RING, the trxG protein ZESTE, and several other non-PcG/trxG proteins (Francis and Kingston, 2001; Saurin et al., 2001; Shao et al., 1999). Likewise, their human homologues BMI1, SCMH1, HPC2, HPH1, and RING1A, respectively, form a mammalian PRC1-type complex, although this complex lacks most of the non-PcG/trxG components found in the Drosophila PRC1 complex (Levine et al., 2002). PRC1 appears to have several distinct functions. First, the PRC1 complex, as well as a recombinant Drosophila Polycomb 'core' complex (dPCC) consisting of just PC, PSC, PH, and RING, represses transcription and inhibits in vitro chromatin remodeling of nucleosome arrays by the human SWI-SNF complex (Francis and Kingston, 2001; Shao et al., 1999). Surprisingly, PSC by itself can also inhibit remodeling, and both PCC and PSC may accomplish this via template exclusion. Because restriction enzymes can still access the nucleosomal arrays, it appears that the PCC, and by extension PRC1, does not completely 'coat' the template to block access to all other proteins (Francis and Kingston, 2001).
However, in vivo assays probing chromatin accessibility of homeotic gene templates repressed by PcG proteins indicate that restricted template access does occur in vivo (Fitzgerald and Bender, 2001). Second, recombinant dPCC and mouse PRC1 core complexes (mPCCs) act in trans to link nucleosome arrays, which may mediate long-distance interactions between the promoter region and distal PREs (Lavigne et al., 2004). A third potential function is suggested by the presence of TBP and TAFs (Muller and Tora, 2004) in the Drosophila PRC1 complex, implying that PRC1 acts directly at the promoter to repress activity of the basal transcription machinery (Saurin et al., 2001). However, the equivalent mammalian PRC1 complex does not contain TBP and TAFs (Levine et al., 2002), and the reason for this difference is unclear. The Drosophila PcG proteins E(Z), ESC, SU(Z)12, and possibly PHO (and the human homologues EZH2, EED, and SUZ12) belong to a second group of conserved soluble complexes, in Drosophila and humans respectively, generally referred to as PRC2 (Table 1-1) (Cao et al., 2002; Ng et al., 2000; Sewalt et al., 1998; van Lohuizen et al., 1998; Xu et al., 2001; Czermin et al., 2002; Kuzmichev et al., 2002; Muller et al., 2002; Tie et al., 1998; Tie et al., 2001; Tie et al., 2003). There appear to be at least two distinct PRC2-type complexes in Drosophila: a smaller one of about 600 kDa, and a larger 1 MDa complex which, in addition to the above factors, also contains the PcG protein PCL and the repressive HDAC protein RPD3 (Tie et al., 2001; Tie et al., 2003). A PRC2 (but not PRC1) complex also exists in plants (Hsieh et al., 2003). The PRC2 complex exhibits HMTase activity, predominantly at H3-K27, mediated by the SET domain of E(Z)/EZH2 (Czermin et al., 2002; Kuzmichev et al., 2002; Muller et al., 2002).
Human PRC2 associates with HDAC1 and HDAC2 proteins through direct interactions with EED, and trichostatin A, an HDAC inhibitor, relieves EED-mediated repression (van der Vlag and Otte, 1999). In Drosophila, the HDAC protein RPD3 is required for PRE-mediated target gene repression and binds directly to the PRE DNA in vivo (Tie et al., 2001). Hence there is likely a mechanistic link between repressive HDAC function and PRC2 function. What are the connections between PRC1 and PRC2, and how do they contribute to cell memory function? Both complexes are required for maintenance of Hox gene repression, since null mutants in PcG genes belonging to either complex result in embryonic lethality, and PcG members from each complex are required for PRE-mediated silencing (Simon et al., 1992; Simon et al., 1993). PRC2 and PRC1 interact transiently in early preblastoderm Drosophila development, but not in subsequent stages (Poux et al., 2001b). If PRC2 recruits PRC1 as a consequence of methylating histones, it is surprising that when the PRC1 component PC is tethered to target loci, mutations in the PRC2 component gene esc abolish silencing (Poux et al., 2001a), and that tethered PC can also recruit ESC (Poux et al., 2001b). Together these observations suggest that ESC must possess other functions in addition to a putative role in PRC1 recruitment.

VIII. Molecular mechanisms of trxG proteins

trxG proteins also contain conserved functional domains, many of which are also found in PcG proteins, and they also belong to multimeric protein complexes, based on biochemical evidence from yeast, Drosophila, and mammals (Table 1-2) (Kennison, 2004; Roberts and Orkin, 2004). The trxG proteins TRX, TRR, and ASH1, and their mammalian homologues (Table 1-2), each contain an HMTase SET domain which, where tested, shows specificity for H3-K4 methylation (Byrd and Shearn, 2003; Milne et al., 2002; Nakamura et al., 2002; Sedkov et al., 2003; Smith et al., 2004).
Tri-methylation of H3-K4 in yeast is a hallmark of transcriptionally active euchromatic genes, whereas di-methylation at H3-K4 is found at both inactive and active euchromatic loci, underscoring the complexity of the putative 'histone code' (Jenuwein and Allis, 2001; Santos-Rosa et al., 2002). Particular H3-K4 methylation tags mediated by trxG proteins may provide an epigenetic mark for maintenance of transcriptional activation (Liang et al., 2004; Milne et al., 2002), whereas H3-K9Me and/or H3-K27Me may be marks for maintenance of transcriptional repression of euchromatic loci, while H3-K9Me marks heterochromatic regions for silencing (Kouzarides, 2002). TRX and ASH1 interact with each other via their SET domains (Rozovskaia et al., 2000; Rozovskaia et al., 1999), although they have not been found to copurify within the same protein complexes (see below). A functional consequence of this interaction is indicated by the observation that partial loss of ash1 function abrogates TRX binding to polytene chromosomes (Rozovskaia et al., 1999). Interestingly, binding of TRX, as well as of the PcG proteins PSC and SU(Z)2, but not of ZESTE, to polytene chromosomes is also dependent upon E(Z) (Kuzin et al., 1994; Rastelli et al., 1993). These results suggest that both ASH1 and E(Z) are required for targeting TRX complex(es) to, possibly distinct, TREs. Both TRX and ASH1 directly interact and copurify with CREB-binding protein (CBP) (Bantignies et al., 2000), a transcriptional coactivator and member of the CBP/p300 family of HATs (Bantignies et al., 2000; Goto et al., 2002; Petruk et al., 2001; Petruk et al., 2004), which may mediate effects of these two trxG proteins on the basal transcriptional machinery, thereby affecting transcriptional initiation (see below). The trxG proteins TRX and BRAHMA also contain a bromodomain (Simon and Tamkun, 2002).
Bromodomains specifically bind to acetylated lysine residues, were the first modules recognized to 'read' histone marks, and have affinity for both histone and non-histone proteins (Dhalluin et al., 1999; Hudson et al., 2000; Jacobson et al., 2000; Owen et al., 2000; Zeng and Zhou, 2002). Bromodomains are present in virtually all known HAT proteins, including CBP, and are thought to tether regulatory proteins to specific chromosome sites through selective recognition of specific acetylated lysine residues (Kanno et al., 2004; Zeng and Zhou, 2002). All characterized members of the trxG are subunits of either chromatin remodeling complexes (Langst and Becker, 2004; Sif, 2004) or histone modifying complexes (Khorasanizadeh, 2004; Kouzarides, 2002). Multiple different complexes containing trxG proteins have been characterized so far in Drosophila. Two complexes contain the trxG protein Brahma (BRM), which is related to the core ATPase subunits of two ATP-dependent chromatin remodeling complexes in yeast: SWI2/SNF2 from the SWI/SNF complex, and Sth1 from the RSC complex (Mohrmann et al., 2004). BRM is found in the 2 MDa BAP complex, which contains the trxG proteins BRM, MOIRA, OSA, and SNR1, and also in the PBAP complex, which does not contain OSA but instead contains the proteins POLYBROMO and BAP170 (Dingwall et al., 1995; Mohrmann et al., 2004). KISMET is similar to BRM and is likely part of a distinct chromatin remodeling complex that is still uncharacterized (Daubresse et al., 1999). Drosophila TRX, ASH1, and ASH2 belong to distinct protein complexes, none of which exhibits chromatin remodeling activity. A 1 MDa TRX complex called TAC1, a 2 MDa ASH1 complex, and a 0.5 MDa ASH2 complex have been characterized in Drosophila (Papoulas et al., 1998; Petruk et al., 2001; Petruk et al., 2004).
The TAC1 complex, which contains CBP and Sbf1 (a phosphatase-related protein) in addition to TRX, is rapidly recruited to the hsp70 heat shock locus following heat shock induction (Sanchez-Elsner and Sauer, 2004; Smith et al., 2004). The TAC1 complex facilitates RNA polymerase II-mediated hsp70 transcriptional elongation rather than functioning in transcriptional initiation (Smith et al., 2004). TRX has 5 homologues in mammals (Table 1-2), of which one, variously called Mixed Lineage Leukemia (MLL)/ALL-1/HRX/hTRX, has been intensively studied on account of its role as a fusion oncogene in pediatric leukemia (Eguchi et al., 2003). The human MLL/ALL-1 protein belongs to a supercomplex containing at least 29 proteins (Nakamura et al., 2002), which is presumably distinct from a putative mammalian TAC1-type complex, similar to the Drosophila TAC1 described above, that would be more limited in function.

IX. Models for maintenance protein function

Once MPs are localized to their specific gene targets on the chromosome, how do they regulate gene activation or repression, and how do they maintain those gene expression states? As the preceding sections illustrate, these questions are still far from being answered. The following models summarize several broad possibilities mentioned above that are currently under consideration. It is entirely possible that different MPs utilize distinct mechanisms, and the following models are not necessarily mutually exclusive.

i) trxG models

There is substantial evidence for the model that trxG proteins function in transcriptional activation through the activity of the chromatin remodeling complexes to which they belong. Such complexes alter chromatin to allow for full access and function of the basal transcriptional machinery (Sif, 2004).
A second model for trxG protein function, based on the histone code model of Strahl and Allis (Jenuwein and Allis, 2001; Strahl and Allis, 2000), is that trxG proteins deposit and/or respond to particular epigenetic tags on histones that positively influence transcription and maintain gene activation. Consistent with this model, trxG proteins act as H3-K4 HMTases, are present at promoters of active genes, and recruit HAT coactivators (see above). However, additional research is required to elucidate exactly how trxG-mediated histone code alterations lead to enhanced transcription. The final model of trxG function postulates that trxG proteins antagonize PcG function in gene repression. This model will be discussed further below, in conjunction with the PcG model 'prevention of activation of trxG proteins'.

ii) PcG models

PcG proteins may interfere with either assembly or activity of the basal transcription apparatus (Bienz, 1992). Both PC and TRX are localized to promoters of silenced homeotic genes, based on results from chromatin immunoprecipitation (ChIP) experiments (Orlando et al., 1998). PcG silencing does not prevent binding at the promoter by general transcription factors (GTFs), TBP, or RNA polymerase II, but it does prevent initiation of transcription (Breiling et al., 2001; Dellino et al., 2004). The Drosophila PRC1 complex contains stoichiometric quantities of TAFs and other GTFs, and PRC1 components co-immunoprecipitate with TBP (Saurin et al., 2001). PRC1 and recombinant PCC block RNA polymerase II activity on a nucleosomal template in vitro (King et al., 2002). It therefore appears that PcG proteins do not affect trxG, PcG, or transcription factor recruitment at the promoter, but inhibit transcriptional activation of target genes through an as yet unknown mechanism. It is also possible that PcG proteins inhibit transcriptional elongation (e.g. inhibit TAC1 complex activity; see below) or some aspect of RNA metabolism (Orlando, 2003).
trxG and PcG proteins are bound to chromatin at both repressed and active promoters (Orlando et al., 1998), implying that decisions between silencing and activation are not simply due to differential recruitment of maintenance proteins. trxG and PcG proteins may act antagonistically (Klymenko and Muller, 2004). PRC1 PcG complexes block hSWI/SNF chromatin remodeling complex binding to nucleosome arrays in vitro (Francis et al., 2001; Shao et al., 1999), suggesting that PRC1 can prevent activation by trxG-containing chromatin remodeling proteins. In Drosophila embryos, targeting PcG proteins to an active reporter locus does not result in repression (Poux et al., 2001a), showing that PcG proteins prevent activation only if they act prior to establishment of an active transcriptional state. Like the trxG, PcG proteins have been proposed to alter the histone code (Jenuwein and Allis, 2001; Strahl and Allis, 2000), and can also modify other proteins. PcG proteins do not appear to have HDAC activity themselves, but they can recruit HDAC co-repressors (Chang et al., 2001; Tie et al., 2001; van der Vlag and Otte, 1999). The carboxyl-terminus binding protein (CtBP) transcriptional corepressor interacts with the PcG protein Pc2 in humans, which mediates sumoylation of CtBP (Kagey et al., 2003; Sewalt et al., 1999). H3-K27 histone methylation mediated by E(Z) of the PRC2 complex (Cao et al., 2002; Czermin et al., 2002; Kuzmichev et al., 2002; Muller et al., 2002) may recruit PRC1 (Fischle et al., 2003b). It is still unclear how the PcG-mediated silencing function is maintained through cell division, although this might be mediated by particular patterns of histone methylation (see below). It is likely that other post-translational modifications of histone tails, and of PcG proteins themselves (e.g. sumoylation; see above), will also be involved in gene silencing.
Based on early observations of a shared sequence motif (the chromodomain) between PC and the heterochromatin protein HP1, it was proposed that PcG proteins silence their targets by inducing formation of heterochromatin-like structures (Paro and Hogness, 1991). However, ChIP experiments demonstrate that PC does not spread along widespread chromosomal regions of silenced homeotic genes (Orlando et al., 1998), and that PcG repressed promoters do not prevent binding of GTFs (Breiling et al., 2001), RNA polymerase II, TBP, or heat shock factor to the target gene promoter (Dellino et al., 2004). Despite evidence indicating some degree of reduced DNA accessibility (Fitzgerald and Bender, 2001), the balance of the evidence rules out a general heterochromatin-like model for PcG proteins involving dramatically restricted access to the template.

There is increasing evidence that non-coding RNA is involved in multiple regulatory processes in higher eukaryotes (Mattick, 2004). Recent appreciation of the importance of non-coding RNA stems from the fortuitous discovery of the phenomenon of RNA interference (RNAi) (Denli and Hannon, 2003). Heterochromatic silencing is dependent upon the RNAi machinery (Pal-Bhadra et al., 2002). Several groups have speculated that PcG silencing may involve non-coding RNA components (Bender and Fitzgerald, 2002; Drewell et al., 2002; Hogga and Karch, 2002; Orlando, 2003; Pal-Bhadra et al., 2002; Rank et al., 2002; Vermaak et al., 2003; Volpe et al., 2002). It is too early to formulate a particular model for connections between PcG silencing and non-coding RNA; future researchers will undoubtedly continue to explore these novel possibilities.
The looping model for PcG mechanism of action proposes that cooperative binding between the PcG complexes located at PREs (or other sites) and the promoter results in looping out of the intervening DNA, and prevents enhancers from interacting with the promoter to activate gene transcription (Pirrotta, 1995; Pirrotta, 1998). Strong evidence exists for a similar DNA looping model in control of gene expression in the murine globin gene cluster (Carter et al., 2002; de Laat and Grosveld, 2003; Dekker, 2003). Only limited evidence consistent with the looping model exists for PcG regulation (Pirrotta and Rastelli, 1994), but DNA looping remains an attractive model for PcG function since it would incorporate some observed features of PcG regulation: firstly, the DNA could remain accessible to trans-acting factors; and secondly, multiple cooperative interactions between PcG proteins would allow for proper assembly of loop ends at hypothesized 'active chromatin hubs'. Given that the chromosome conformation capture (3C) and RNA-TRAP techniques are now available (de Laat and Grosveld, 2003; Dekker, 2003; Splinter et al., 2004), direct tests for involvement of DNA looping in PcG regulation at well-characterized target loci will likely be forthcoming.

In the nuclear compartmentalization model, silencing by PcG proteins is brought about by sequestering target loci to repressive perinuclear compartments within the nucleus, in analogy to yeast silent mating type loci and telomeres (Andrulis et al., 1998; Baxter et al., 2002; Hediger and Gasser, 2002; Hediger et al., 2004; Marshall, 2002). However, there is currently no direct evidence for a PcG-mediated subnuclear compartmentalization model. If this model of PcG function holds up, it may be involved only in silencing of particular loci, rather than being a general mechanism used for all PcG silenced loci.
Regulation of certain genes involved in lymphopoiesis, including the immunoglobulin and T-cell receptor loci that undergo rearrangement, Ikaros, and the Dntt terminal transferase gene, involves regulated changes in their nuclear compartmentalization; based on recent findings (see Chapter 4) it is tempting to speculate that this may be triggered by histone modifications mediated by Ezh2 (Kosak et al., 2002; Liberg et al., 2003; Ramsden and Zhang, 2003; Su et al., 2003; Su et al., 2004).

iii) Maintenance models

While the above models describe potential mechanisms of how trxG and PcG proteins might mediate transcriptional activation and repression respectively, one main problem is that most models do not explicitly address the question of cell memory (i.e. maintenance of transcriptional states through cell division). It is widely assumed that PcG proteins impart and/or respond to an epigenetic mark that distinguishes loci to be silenced (permanently repressed), and that re-establishment of the PcG-mediated silenced state occurs in cellular progeny after every cell division (Cavalli and Paro, 1999; Pirrotta et al., 2003). However, since most PcG proteins dissociate from chromatin during prophase in mitosis and then reassociate in late anaphase or telophase (Buchenau et al., 1998; E. O'Dor, unpublished observations), it seems unlikely that the physical presence of PcG proteins themselves could constitute the epigenetic mark. A methylation tag on lysine and/or arginine residues of histone tails within nucleosomes at target loci is the most attractive candidate for a trxG or PcG protein epigenetic mark for long-term memory of transcriptional states through cell division. As described above, certain trxG and PcG proteins act as HMTases. Histone methylation is the only type of post-translational histone modification not known to be enzymatically reversible (Bannister et al., 2002), and therefore could constitute a permanent mark.
Such a mark could only be removed by replacement of the histone containing the mark. Signals provided by methylation marks would presumably have to be integrated with other, reversible, histone modifications (Khorasanizadeh, 2004; Vermaak et al., 2003). However, a key problem arises during cell division and replication of DNA. If, as traditionally thought, the nucleosome octamer is disassembled into an H3-H4 heterotetramer plus two H2A-H2B heterodimers, which are then randomly distributed onto the leading and lagging strands of DNA to regenerate chromatin (Vermaak et al., 2003; Wolffe, 1998), it is unclear how the epigenetic information residing as marks on each nucleosome could be faithfully inherited by each daughter cell. A very attractive model that would solve this problem was recently proposed (Korber and Horz, 2004; Tagami et al., 2004). In this model, nucleosome assembly is semi-conservative: each daughter strand would inherit a single H3-H4 heterodimer from the existing nucleosome, which would be paired with a de novo synthesized H3-H4 heterodimer to assemble the (H3-H4)2 heterotetramer found within intact nucleosomes. Therefore each nucleosome would inherit one histone H3 molecule replete with its epigenetic marks, which could then act as a template to relay its information to the de novo synthesized histone H3. Evidence consistent with this model exists: protein folding studies show that there is a dynamic equilibrium of H3-H4 dimers and tetramers (Banks and Gloss, 2004), and H3-H4 dimers (not tetramers) are present in histone chaperone complexes (Tagami et al., 2004). Because the model offers an attractive mechanism for faithful transmission of epigenetic tags, numerous stringent tests of it will no doubt be forthcoming.
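The logic of the semi-conservative assembly model can be made concrete with a small illustrative sketch. It is a toy representation only, not a published simulation: the class name `H3H4Dimer`, the functions `replicate` and `copy_marks`, and the mark label "H3K27me" are assumptions introduced here for illustration, and the templating step is idealized as perfect copying.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class H3H4Dimer:
    """Hypothetical stand-in for one H3-H4 heterodimer."""
    marks: frozenset   # epigenetic marks carried by this dimer
    parental: bool     # True if inherited from the pre-replication nucleosome


def replicate(nucleosome):
    """Split a parental (H3-H4)2 tetramer between two daughter strands.

    Each daughter nucleosome receives one parental dimer (marks intact)
    paired with one newly synthesized, unmarked dimer.
    """
    daughters = []
    for old in nucleosome:
        inherited = H3H4Dimer(old.marks, parental=True)
        de_novo = H3H4Dimer(frozenset(), parental=False)
        daughters.append((inherited, de_novo))
    return daughters


def copy_marks(nucleosome):
    """Idealized templating step: marks on the parental dimer are
    relayed to the de novo dimer within the same nucleosome."""
    marks = frozenset().union(*(d.marks for d in nucleosome))
    return tuple(H3H4Dimer(marks, d.parental) for d in nucleosome)


# A fully marked parental nucleosome at a silenced locus (illustrative mark).
parent = (
    H3H4Dimer(frozenset({"H3K27me"}), parental=True),
    H3H4Dimer(frozenset({"H3K27me"}), parental=True),
)
daughter_nucleosomes = [copy_marks(n) for n in replicate(parent)]
```

Under these assumptions both daughter nucleosomes end up fully marked, whereas random segregation of an intact tetramer would leave one daughter strand with no marked histone to act as a template.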
Distinct histone variants and nucleosome assembly pathways may also be critically involved in gene regulation, and there is increasing evidence that they are associated with active versus silent chromatin states (Ahmad and Henikoff, 2002a; Ahmad and Henikoff, 2002b; Ahmad and Henikoff, 2002c; Korber and Horz, 2004; Loyola and Almouzni, 2004; McKittrick et al., 2004; Vermaak et al., 2003). For example, particular histone H2A and H3 variants show distinct functional specificity in regulation of V(D)J recombination during lymphopoiesis (Korber and Horz, 2004; Mostoslavsky et al., 2003; Redon et al., 2002; van Leeuwen and Gottschling, 2003). There are distinct DNA replication-coupled and replication-independent nucleosome assembly pathways; the latter pathway appears to be used specifically for active loci (Ahmad and Henikoff, 2002c; Korber and Horz, 2004; Vermaak et al., 2003). Competency to receive a histone modification on a given residue, or to respond to and integrate signals from a given modification, could be predetermined by the type of histone protein variant present in the nucleosome positioned at the target locus (Ahmad and Henikoff, 2002b; Khorasanizadeh, 2004; Vermaak et al., 2003). A mechanistic connection between DNA replication and transcription was proposed fifteen years ago (Almouzni et al., 1991; Wolffe, 1991), and early observations that transcriptionally active loci replicate early in higher eukaryotes have recently been confirmed in a genome-wide analysis in Drosophila (Goldman et al., 1984; Schubeler et al., 2002). Induction of vertebrate HoxB cluster genes appears to require DNA replication (Fisher and Mechali, 2003). It is an interesting possibility that gene regulation might be controlled through distinct pathways of nucleosome replacement involving cell-cycle dependent mechanisms.
Testing the above models specifically in relation to function and mechanisms of trxG and PcG mediated maintenance of transcriptional activation and repression, and connections with the cell cycle, will certainly be a priority of future research.

X. Blurring the boundaries: enhancer of trithorax and Polycomb (ETP) group proteins

Maintenance Proteins have recently been further subdivided on the basis of genetic observations. The category of Enhancer of trithorax and Polycomb (ETP) Group genes was created on the basis of a Drosophila genetic screen in Alan Shearn's laboratory for enhancers and suppressors of trxG mutations (Gildea et al., 2000). Mutations in six genes formerly classified in the PcG, namely Additional sex combs (Asx), Enhancer of zeste (E(z)), Enhancer of Polycomb (E(Pc)), Posterior sex combs (Psc), Suppressor of zeste 2 (Su(z)2), and Sex combs on midleg (Scm), were found to enhance trxG mutations, thereby leading to the reclassification of those genes to the ETP Group (Figure 1-1, Table 1-1, Table 1-2), defined as genes whose mutations enhance the phenotypes of both PcG and trxG gene mutants (Gildea et al., 2000; LaJeunesse and Shearn, 1996; Milne et al., 1999). The phenotypes of ETP group gene mutants imply that these genes function in both repression and activation of Hox target genes (Figure 1-1; Brock and van Lohuizen, 2001). Mammalian homologues exist for all Drosophila ETP Group proteins (Tables 1-1 and 1-2). However, the categorical separations into PcG, trxG, and ETP groups have not been clearly applied in the mammalian literature because the requisite genetic tests have generally not been done, and hence mammalian ETP gene homologues are still typically categorized as PcG genes.
Tests of ETP function in mammals are hindered by the relative lack of mouse mutant models available to test for genetic enhancement, as compared to the availability of multiple Drosophila mutant strains for each known PcG, trxG, and ETP group gene. Ultimately, the problem with applying such classification systems from Drosophila to mammals is that, for the most part, we do not yet have a full understanding of the pleiotropic functions for most PcG, trxG, and ETP genes, or of the target genes they regulate directly, and therefore cannot adequately judge whether or not function and mechanism of action of these genes have been conserved in evolution.

XI. Models for ETP function

Distinctions between the traditional views that all PcG genes maintain transcriptional repression of target loci and all trxG genes maintain transcriptional activation of target loci are becoming blurred on account of identification of ETP Group proteins and other observations. PcG and trxG proteins are now thought to have some degree of interconnected function in maintaining the appropriate transcriptional state of a given target locus, depending on the context, and ETP proteins may provide insight into the bridge(s) between these functions (Brock and van Lohuizen, 2001).
There are several models for ETP function and mechanism of action, which are not necessarily mutually exclusive, and different ETP Group proteins may have different functions and mechanisms of action: 1) ETP proteins are transcriptional activators of both PcG and trxG genes; 2) ETP proteins are components of both silencing (PcG) and activating (trxG) soluble complexes; 3) ETP proteins are cofactors of both PcG and trxG proteins; 4) ETP proteins change the accessibility of chromatin at target loci, which allows selective modulation of binding by either PcG or trxG complexes (Brock and van Lohuizen, 2001). There is no direct evidence for ETP models 3 and 4 above. However, several lines of evidence from Drosophila and mammalian systems illustrate the interconnected functions between PcG, trxG, and ETP group proteins, some of which lend support for ETP models 1 and 2, and others of which are too incomplete to fit into the above models.

Some PcG proteins bind PcG loci and yet they do not silence themselves, which is the predicted function of PcG proteins (Bloyer et al., 2003; Fauvarque et al., 1995; Strutt et al., 1997). PcG proteins may possess a dual role in modulation of gene expression as well as in silencing, depending on the target locus context (Bloyer et al., 2003). Homologues of the ETP protein Enhancer of Polycomb (E(PC)) (Sinclair et al., 1998a; Stankunas et al., 1998) belong to related histone acetyltransferase (HAT) complexes in yeast, C. elegans, and mammals which function to activate transcription, and also interact with the RET finger protein (RFP) transcriptional repressor in mammals (Boudreault et al., 2003; Ceol and Horvitz, 2004; Fuchs et al., 2001; Galarneau et al., 2000; Shimono et al., 2000; Tezel et al., 2002), consistent with a dual function. It is not yet known if PcG and/or trxG genes are direct targets of E(PC) HAT complexes. ETP and PcG or trxG proteins do belong to the same soluble complexes and/or interact directly.
The soluble Polycomb repressive complex 1 (PRC1) in flies and mammals contains both PcG and ETP proteins (Francis et al., 2001; Levine et al., 2002; Saurin et al., 2001; Shao et al., 1999). Similarly, a second conserved soluble complex, PRC2, also contains both PcG and ETP proteins (van Lohuizen et al., 1998; Hsieh et al., 2003; Ng et al., 2000; Sewalt et al., 1998; Tie et al., 2003; Xu et al., 2001). The PcG protein PCL (and its human counterpart PHF1) interacts directly with the ETP protein E(Z) (and human EZH2 respectively), via the PHD fingers of PCL/PHF1 (O'Connell et al., 2001). The human ETP protein BMI1 (PSC homologue) and the PcG protein HPC2 (PC homologue), as well as the histone deacetylase HDAC1 and the transcriptional corepressor CtBP, all interact directly with the trxG protein MLL (TRX homologue) via its repression domain (Xia et al., 2003). The trxG protein GAGA factor (GAF; encoded by Trithorax-like (Trl)) co-localizes to PREs and promoters repressed by PcG proteins, and co-immunoprecipitates or co-purifies with PcG proteins, indicating that GAF and PcG proteins are found within the same complex(es) (Breiling et al., 2001; Busturia et al., 2001; Faucheux et al., 2003; Hodgson et al., 2001; Horard et al., 2000; Poux et al., 2001b; Lehmann, 2004). Trl mutants also enhance PcG gene mutants, leading to the suggestion that Trl be reclassified as an ETP gene (Gildea et al., 2000; Strutt and Paro, 1997).

Some PcG and trxG mutants (not including ETP group mutants) exhibit homeotic transformation phenotypes that are inconsistent with the historical definitions of these groups. M33 is a murine structural homologue of Pc in Drosophila, and shows significant functional conservation as it can rescue most embryonic phenotypes of Pc nulls (Muller et al., 1995). Pc null mutants exhibit posterior homeotic transformations along the body axis, and cause ectopic anterior expression of Hox genes (McKeon and Brock, 1991; Simon et al., 1992).
However, M33 null mutants show bidirectional axial skeleton transformations, and minimal defects in Hox gene expression (Bel et al., 1998; Core et al., 1997; Katoh-Fukui et al., 1998). Drosophila trx null mutants exhibit bidirectional homeotic transformations (Milne et al., 1999), as do Mll+/- mice (Hanson et al., 1999; Yu et al., 1998; Yu et al., 1995).

Overall, a picture is emerging in which the soluble and/or DNA sequence-specific protein complexes involved in maintaining repressive and/or activated states of transcription contain different ETP, PcG, and trxG proteins depending on the cell type, developmental stage, and/or target locus, and are probably very dynamic structures (Brock and van Lohuizen, 2001; Otte and Kwaks, 2003; Simon and Tamkun, 2002). It may take some time before clarity is achieved with respect to how PcG, trxG, and ETP mechanism(s) of action translate into function and the pleiotropic phenotypic outcomes observed in mutant model systems. Investigation of multiple differentiation and developmental processes affected in PcG, trxG, and ETP gene mutants in conjunction with molecular mechanistic approaches will undoubtedly shed light on these unresolved issues.

XII. Beyond Hox gene regulation; multiple functions for PcG and trxG proteins

PcG and trxG proteins regulate numerous other target loci in addition to the Hox genes, as was suggested by: 1) early observations of pleiotropic developmental defects in Drosophila PcG mutants; and 2) PcG protein binding to over 100 sites on Drosophila polytene chromosomes (Chinwalla et al., 1995; DeCamillis et al., 1992; McKeon and Brock, 1991; Zink and Paro, 1989). In addition to their roles in euchromatic gene regulation, a subset of PcG and trxG proteins in Drosophila are required for position-effect variegation (PEV) (Birve et al., 2001; Laible et al., 1997; Schotta et al., 2003; Sinclair et al., 1998a; Yamamoto et al., 2004) and telomeric position effect (TPE) (Boivin et al., 2003).
PcG genes are also needed for random X inactivation in mammals (Silva et al., 2003), and in silencing of the germline X chromosome in C. elegans (Bean et al., 2004; Pirrotta, 2002; Xu et al., 2001). PcG and trxG proteins are required for genomic imprinting in plants (Grossniklaus et al., 2001; Hsieh et al., 2003), and PcG proteins regulate imprinted X inactivation (Erhardt et al., 2003; Mak et al., 2002; Wang et al., 2001) and autosomal locus imprinting in mammals (Mager et al., 2003). A subset of PcG genes is required for RNA interference (Dudley et al., 2002; Pal-Bhadra et al., 2002). There are several lines of evidence connecting PcG function with cell cycle regulation and control of senescence in Drosophila and mammals (Dimri et al., 2002; Gil et al., 2004; Itahana et al., 2003; Jacobs et al., 1999a; Jacobs et al., 1999b; Li and Rosenfeld, 2004; Luo et al., 2004; Pasini et al., 2004; Trimarchi et al., 2001), some of which will be discussed further below and in Chapter 4. PcG and trxG genes are also implicated in oncogenesis, in both leukemias and solid tumors (Ernst et al., 2002; Jacobs and van Lohuizen, 2002; Kleer et al., 2003; Lessard and Sauvageau, 2003b; Leung et al., 2004; Raaphorst, 2003; Roberts and Orkin, 2004; Varambally et al., 2002). Additional relevant observations relating PcG and trxG function to hematopoiesis and leukemia will be discussed in further detail below, and in Chapter 4. What does seem clear is that we have only explored the tip of the iceberg in terms of the multitude of functions and processes affected by the action of PcG and trxG genes, in only a very few organisms.

XIII. Hematopoiesis; a model system for investigating mechanisms of cell differentiation and memory

Hematopoiesis, the process whereby blood cells of all lineages are generated, is another excellent model system in which to probe mechanisms of commitment, cell differentiation, and epigenetic gene regulation by maintenance proteins and other factors that function at the level of chromatin (Ema and Nakauchi, 2003; Fisher and Merkenschlager, 2002; Lessard and Sauvageau, 2003a; Lessard and Sauvageau, 2003b; Smale, 2003). Hematopoietic stem cells (HSCs) give rise to all types of mature blood cells, and are also characterized by their ability to self-renew (Ema and Nakauchi, 2003). HSC regulation and hematopoiesis have been intensively studied in mice and humans due to their obvious clinical importance, although some mechanisms are conserved in Drosophila (Evans and Banerjee, 2003; Evans et al., 2003; Kondo et al., 2003). During hematopoiesis, daughter cells become progressively more limited in their differentiation potential following each division event, and intermediate multipotential progenitor cells and lineage restricted precursor cell types have been identified (Reya, 2003; Reya et al., 2001; Zon, 2001). HSCs give rise to two major types of early progenitors with restricted potential: a common myeloid progenitor (CMP) that can generate erythroid and myeloid lineage (granulocytes, monocytes, and megakaryocytes) cell types; and a common lymphoid progenitor (CLP), which is the precursor for B and T lymphocytes and natural killer (NK) cells. This hierarchical model is supported by substantial evidence; however, there are alternative models also under consideration (Ema and Nakauchi, 2003; Fisher, 2002; Katsura, 2002; Montecino-Rodriguez and Dorshkind, 2003; Quesenberry et al., 2002). Early efforts to understand lineage differentiation in hematopoiesis focused on identifying secreted cytokines specific to certain lineages and cell types (Zhu and Emerson, 2002).
Substantial progress has been made in characterizing phenotypic markers such as receptors present on the cell surface of different hematopoietic cells, thereby allowing categorization and isolation of particular cell types, as described in the following paragraphs (Kondo et al., 2003). However, in order to understand the decision-making process, current efforts have been focused on identifying key commitment genes, such as transcription factors and chromatin regulators, and cell signaling pathways, which drive these commitment processes at different stages of hematopoiesis; selected examples follow (Wang and Spangrude, 2003; Warren and Rothenberg, 2003; Zhu and Emerson, 2002).

During murine embryogenesis, the first adult repopulating HSCs are produced within the embryo proper, in the PAS/AGM region at approximately 10.5 dpc (Dzierzak, 2002; Dzierzak, 2003; Ling and Dzierzak, 2002). Subsequently, definitive hematopoiesis occurs within the murine fetal liver starting from about 11.0 dpc until birth, whereas HSCs colonize the spleen at approximately 15.5 dpc, and finally the bone marrow at 16.5 dpc (Douagi et al., 2002; Kondo et al., 2003; Robb, 1997; Speck et al., 2002). Shortly after birth, the liver and spleen are no longer major sites of definitive HSC generation (except during pathological conditions), whereas the bone marrow becomes the predominant site of definitive hematopoiesis, continuing throughout adult life. There are several differences between hematopoietic differentiation processes during embryogenesis as compared to adulthood (Douagi et al., 2002; Speck et al., 2002).

Myeloid and/or erythroid precursor cells arise within the adult bone marrow and migrate into the periphery to mature and perform their various functions within the blood or tissues (Montecino-Rodriguez and Dorshkind, 2002; Zhu and Emerson, 2002).
Differentiated cells of the myelo-erythroid lineages include macrophages and granulocytes, which can be detected by flow cytometry using antibodies to the surface antigens Mac-1 (CD11b) and Gr-1 (which detects granulocytes only), and erythrocytes, which are detected using the Ter-119 surface marker (Friedman, 2002).

Within adult mice, early precursor cells (including but perhaps not limited to the common lymphoid precursor (CLP)) with T lymphoid potential arise within the bone marrow and migrate to the thymus to complete the differentiation process into T lymphocytes (Borowski et al., 2002; Katsura, 2002; Montecino-Rodriguez and Dorshkind, 2003; Zhu and Emerson, 2002). T cell differentiation within the thymus follows a developmental pathway characterized by changes in cell surface antigen expression, including CD3, CD4, and CD8 (Rodewald and Fehling, 1998). This process ultimately generates mature T lymphocytes that are released to circulate in the blood and peripheral organs, possessing either αβ T cell receptors (TCRs; which are also either CD3+CD4+CD8- or CD3+CD4-CD8+), or γδ TCRs (which are less abundant, and are also CD3+CD4-CD8-), on their cell surface (see Chapter 4, Figure 4-5) (Carding and Egan, 2002; MacDonald et al., 2001a). Precursor cells within the CD4-CD8- double negative (DN) stage progress to the CD4+CD8+ double positive (DP) stage, and undergo major expansion while doing so, such that the DP subset contains the vast majority (80-90%) of thymocytes within the cortex. The DN thymic compartment can be subdivided into four subpopulations, DN1 through DN4, based on the cells' differentiation antigen expression profiles and on the state of rearrangement of the TCRβ locus (Figure 4-5) (Borowski et al., 2002; Godfrey et al., 1994; Godfrey et al., 1993; Rodewald and Fehling, 1998).
Pro-T cells (DN1 and DN2 subpopulations) have their TCRβ locus in germline configuration, and express CD44 (phagocytic glycoprotein-1, Pgp-1) and c-kit (the stem cell factor (SCF) receptor) but not CD25 (the IL-2Rα receptor) surface markers. DN2 cells upregulate CD25 expression, and are also responsive to growth factors IL-7 and SCF. Pre-T (DN3) cells are CD44-, c-kit- and CD25+, and show rearrangement of the TCRβ locus. A major T cell developmental checkpoint, β (negative) selection, occurs between the DN3 and DN4 (CD44-, c-kit- and CD25-) stages of differentiation (Borowski et al., 2002). At this checkpoint, only those cells that can produce pre-TCR molecules (consisting of a successfully rearranged TCRβ chain polypeptide in a heterodimer with an invariant pre-Tα chain) are permitted to continue along the developmental pathway to the DN4 stage. Cells that progress from DN4 to the DP stage upregulate their αβ TCRs, and only the relatively small proportion of these cells that undergo successful positive selection (i.e. are non-self reactive) progress to the single positive (SP) stage, characterized by selective downregulation of either CD4 or CD8 (i.e. surface phenotype of either CD4-CD8+ or CD4+CD8-). SP cells migrate to the medulla, and undergo further differentiation into mature helper (CD4+CD8-) or cytotoxic (CD4-CD8+) T cells which then migrate to lymphoid tissues in the periphery.

In newborn and adult mice, the early stages of B lymphocyte lineage progression occur within the bone marrow, and can be subdivided according to surface antigen expression, and by the status of rearrangement of their immunoglobulin (Ig) loci and Ig-chain expression within the cytoplasm (see Chapter 4, Figure 4-8) (Hardy et al., 1991; Hardy and Hayakawa, 1991; Hardy and Hayakawa, 1995; Hardy and Hayakawa, 2001; Rothenberg, 2000).
All cells progressing through the various B lineage stages express the surface marker B220 (CD45R), although there are some non-B lineage cells, including NK cells, which also express this marker in mouse bone marrow (Rolink et al., 1996). B cell precursors up to and including immature B cells also express the AA4.1 cell surface antigen. B cell progenitors at the pre-pro-B stage (Philadelphia nomenclature; also referred to as Hardy's Fraction A) also express CD43 (leukosialin) and c-kit, have their immunoglobulin-heavy (IgH) and -light (IgL) chain loci in the germline configuration, and are dependent on the bone marrow stroma for survival. Once rearrangements of the IgH D-J locus begin, CD19 expression is upregulated; these cells are termed pro-B cells (Fraction B/C). Both IgH alleles are located at the nuclear periphery in non-B-lymphoid cells; however, in committed pro-B cells they are relocated to central positions of the nucleus and undergo large-scale contraction (Kosak et al., 2002). Regulated transitions between subnuclear compartments, also one of the models for PcG mechanism of action discussed above, appear to be a novel mechanism for regulating IgH transcription and recombination during B-cell development (Kosak et al., 2002). After successful V-DJ rearrangements are completed, CD43 and c-kit expression is downregulated, and the cells express surface pre-B cell receptor (pre-BCR). Having passed this first major developmental checkpoint, termed pre-BCR selection, the cells progress to the 'early pre-B' compartment (Fraction C′), at which time they lose their stromal dependence, and undergo a period of rapid clonal expansion. B lineage progression from the late pro-B to pre-B stage is highly dependent on stimulation by the cytokine IL-7 (Suda et al., 1989; von Freeden-Jeffry et al., 1995), via signals integrated by the IL-7 receptor and pre-BCR (Fleming and Paige, 2002).
Signalling through the IL-7 receptor is required for both B and T lymphocyte progression (Akashi et al., 1998). In the absence of IL-7, B cell progression is severely impaired at the pro-B stage (von Freeden-Jeffry et al., 1995). A sensitive assay to detect B lymphoid development defects at the level of the pro-B to pre-B progenitor cell transition is the IL-7 dependent in vitro colony-forming assay, or CFU-IL7 assay (Dorshkind et al., 1989). IL-7 controls chromatin accessibility for V(D)J recombination (Huang and Muegge, 2001; Muegge, 2003). At the 'late pre-B' stage (Fraction D), cells express cytoplasmic μ class IgH chain protein, and begin IgL chain V-J rearrangement. The cells then progress to immature B cells after IgL chain rearrangement is completed, leading to the expression of surface IgM; these cells are also termed new-B (Fraction E). Immature B220+IgM+ B cells are released from the bone marrow and migrate to the spleen and other secondary lymphoid organs to complete B cell lineage progression within lymphoid follicles, and at this second major checkpoint about 90% of the immature bone marrow B cells are lost, likely through deletion of autoreactive cells (Paul, 2003; Rolink et al., 1999; Rolink et al., 2001). Following follicular selection, the now mature B cells, having undergone alternative splicing of IgH transcripts to express surface IgD in addition to IgM, are released into the blood and circulate through the secondary lymphoid organs.

Particular transcriptional regulators and signaling molecules have been identified that regulate some of the hematopoietic lineage decisions described above. Multiple members of the Hox gene family are expressed in hematopoietic cells (Buske and Humphries, 2000; Owens and Hawley, 2002). Strikingly, Hox gene expression is highest in the most immature compartments, arguing for critical roles in HSC and early progenitor regulation (Sauvageau et al., 1994).
Of interest, increased HoxB4 expression is associated with increased HSC regeneration and expansion, thus implicating Hox genes in fundamental processes of HSC self-renewal (Antonchuk et al., 2001; Antonchuk et al., 2002). HoxB4 is not essential for HSC generation, however, as HoxB4 knockout mice exhibit only a minor reduction in numbers of HSCs (Brun et al., 2004). While PcG and trxG genes are known to regulate Hox genes in development, it remains unknown whether or not they specifically regulate HoxB4 in hematopoiesis (see below and Chapter 4). The Ets family transcription factor PU.1 is essential for development of all myeloid and lymphoid lineages (Singh et al., 1999); low levels of PU.1 drive multipotential progenitors towards lymphoid development, while high levels of PU.1 favour myeloid lineage differentiation (DeKoter and Singh, 2000; Wang and Spangrude, 2003). Along the lymphoid pathway, PU.1 upregulates expression of the receptor for the cytokine IL-7, which is a key signaling molecule required for survival and proliferation of early B and T cells, as described above (Warren and Rothenberg, 2003). Another key transcription factor is GATA-1, which is required for erythrocyte development (Warren and Rothenberg, 2003). PU.1 and GATA-1 appear to act antagonistically in determining lymphoid, myeloid, or erythroid cell fate, at least in part by PU.1 competitively inhibiting acetylation of GATA-1, which is mediated by the HAT coactivator CBP (Hong et al., 2002). With regard to lymphocyte development, signaling through Notch-1, a member of the Notch family of transmembrane receptors, is a key step in regulating B- versus T-cell lineage decisions (Borowski et al., 2002; Koch et al., 2001; Maillard et al., 2003a; Maillard et al., 2003b; Pui et al., 1999; Radtke et al., 2004).
Targeted deletion of Notch-1 completely blocks T cell development and allows B lymphopoiesis to aberrantly occur within the thymus (MacDonald et al., 2001b; Radtke et al., 1999; Wilson et al., 2001; Wolfer et al., 2001), while induction of Notch-1 signaling leads to abnormal generation of T-cell committed precursors in the bone marrow and in vitro culture assays (De Smedt et al., 2002). The zinc finger protein and putative transcription factor Ikaros is also critically important for lymphoid development (Busslinger, 2004; Fisher and Merkenschlager, 2002; Liberg et al., 2003; Smale and Fisher, 2002). Ikaros interacts directly with chromatin remodeling complexes (Kim et al., 1999), functions as a transcriptional repressor, and accumulates with several inactive genes within perinuclear subcompartments containing centromeric heterochromatin (Brown et al., 1997; Trinh et al., 2001). The basic helix-loop-helix transcription factors E2A and EBF are necessary but not sufficient for proper early lymphoid development, act as upstream regulators of the Pax family member Pax5, and activate genes required for V(D)J recombination during B cell development including RAG1 and RAG2, as well as other B lineage specific genes (Johnson and Calame, 2003; O'Riordan and Grosschedl, 1999). Mice lacking E2A and/or EBF show arrested B-cell development at the B220+ CD43+ progenitor cell (Fraction A) stage, characterized by a germline configuration of IgH (O'Riordan and Grosschedl, 1999). Pax5 is a critical B-lineage commitment factor, restricting developmental trajectories of lymphoid progenitors to the B lineage pathway (Nutt and Busslinger, 1999; Nutt et al., 1999). Pax5 blocks T cell development by downregulating transcription of Notch1 (Radtke et al., 2004; Souabni et al., 2002). Further research will continue to elucidate the regulatory relationships between the transcription factors and signaling pathways discussed above.

XIV.
Role of maintenance proteins in regulation of hematopoiesis

In addition to their well-characterized roles in control of body axis patterning via transcriptional maintenance of Hox gene expression patterns, PcG, trxG, and ETP epigenetic maintenance proteins are also involved in regulating normal and leukemic hematopoiesis in Drosophila and mammals (Jacobs and van Lohuizen, 2002; Lessard and Sauvageau, 2003b). The first indications of mammalian PcG, trxG, and ETP function in these latter processes came with the identification of Bmi1, a homologue of the ETP genes Psc and Su(z)2 in flies (van Lohuizen et al., 1991a), as a cooperating oncogene in the generation of B and T cell lymphomas in Eμ-myc transgenic mice (Haupt et al., 1991; Haupt et al., 1993; van Lohuizen et al., 1991b), and with observations of children with mixed lineage leukemias showing 11q23 chromosomal translocations resulting in duplications or chimeric fusion proteins involving the TRITHORAX homologue MLL/ALL-1/HRX (Cimino et al., 1991; Tkachuk et al., 1992; Ziemin-van der Poel et al., 1991). PcG, trxG and ETP genes are expressed in distinctive tissue, cell, lineage, and differentiation stage specific patterns during normal murine and human hematopoiesis, and in leukemias and lymphomas (Bea et al., 2001; Fukuyama et al., 2000; Hasegawa et al., 1998; Lessard et al., 1998; Lessard et al., 1999; Park et al., 2002; Phillips et al., 2000; Raaphorst et al., 2001a; Raaphorst et al., 2001b; Raaphorst et al., 2000a; Raaphorst et al., 2000b; Raaphorst et al., 2004; van Kemenade et al., 2001; Visser et al., 2001). Such distinctive expression patterns likely reflect the existence of multiple maintenance protein complexes of varied composition within the nucleus, with distinctive functions in normal and leukemic hematopoiesis and lymphomagenesis (Jacobs and van Lohuizen, 2002; Otte and Kwaks, 2003).
Different maintenance proteins appear to regulate hematopoiesis at different (likely multiple) stages, primarily based on studies using single and compound null mutant mouse model systems and/or engineered gene overexpression, correlated with gene expression patterns. Bmi1 is required for maintenance of definitive hematopoietic stem cell (HSC) self-renewal ability, and both Bmi1 and Rae28 null mutants exhibit reduced bone-marrow competitive repopulating unit (CRU) activity and reduced myelo- and lymphopoiesis (Lessard and Sauvageau, 2003a; Lessard et al., 1999; Ohta et al., 2002; Park et al., 2004; Park et al., 2003; Raaphorst, 2003; Tokimasa et al., 2001). Mll mutant mice exhibit reduced B cell populations and reduced in vitro proliferation of yolk sac and fetal liver progenitors along the myeloid lineage, although erythropoiesis is only mildly affected in the yolk sac system, which undergoes primitive hematopoiesis (Hess et al., 1997; Yagi et al., 1998; Yu et al., 1995). Unfortunately, investigations of the role of Mll in normal hematopoiesis are hampered by mid-gestational embryonic lethality of Mll mutant models; hence a conditional Mll mutant model would be useful. Recently, however, Stanley Korsmeyer's group solved this problem by investigating hematopoietic defects of mice deficient for Rag-2 recombinase (which is required to generate mature lymphocytes) that are also chimeric for Mll deficiency (Ernst et al., 2004). The Mll-deficient chimeras do not give rise to B or T lymphocytes, showing that Mll is essential for lymphocyte generation, and they further showed that Mll is essential for fetal liver HSC activity (Ernst et al., 2004), similar to requirements for Bmi1 discussed above. Mll is also required during embryogenesis for full differentiation and expansion ability of AGM-derived HSCs (Ernst et al., 2004).
Another PcG gene, eed, affects both primitive and late lineage-specific progenitor proliferation (although its effects on HSCs have not been analyzed); however, eed null heterozygotes and hypomorphs unexpectedly show hyperproliferation defects in both myelopoiesis and B-lymphopoiesis (Lessard et al., 1999), while eed hypomorphs show a partial block in early T-cell development (Richie et al., 2002). Still other genes, including Mel18, M33, and Ezh2, have been demonstrated to affect only lymphocyte development, as null or hypomorphic mutants either show no defects in myelopoiesis (Mel18), or else their effects on myelopoiesis have not been investigated or published (Akasaka et al., 1997; Core et al., 1997; Su et al., 2003). Ezh2 regulates early B cell development by controlling aspects of immunoglobulin heavy chain (IgH) gene rearrangement, but is dispensable for maturation and activation of peripheral B cells (Su et al., 2003). Mel18 and YY1 affect chemokine expression within the thymus and a T cell line respectively, suggesting a role in intrathymic T cell migration that could prevent proper thymocyte differentiation in mutant mice (Hasegawa et al., 2001; Miyazaki et al., 2002). Finally, Mel18 is also known to control later stages of T lymphocyte function, as Mel18 null mutants exhibit impaired mature peripheral Th2 cell proliferation (Kimura et al., 2001). Common features of the above-mentioned PcG and ETP mutant models are the presence of all mature hematopoietic cell types in the circulation (sometimes even at normal levels) despite defects of varying severity at the level of progenitors, gene dosage dependence and variable penetrance of the hematopoietic phenotypes, and the progressive nature of most hematopoietic defects with increasing age of the fetuses and/or surviving adult mutant mice investigated.
Importantly, the above studies show that similarity of function between different MPs in the process of axial patterning is not necessarily predictive of similarity in function of the same proteins in hematopoiesis. For example, Bmi1 null and eed hypomorphic mutants each exhibit posterior homeotic transformations along the body axis and derepression of overlapping Hox targets during embryogenesis (Lessard et al., 1999; Schumacher et al., 1996; van der Lugt et al., 1996; van der Lugt et al., 1994; Wang et al., 2002), yet Bmi1 and eed have antagonistic effects specifically at the pre-B stage of early B cell development, whereas Bmi1 and eed have opposing phenotypic effects with no evidence of genetic interaction in regulation of primitive myeloid (LTC-IC assay), mature myeloid (CFC assay), and primitive lymphoid (WW-IC assay) progenitor differentiation (Lessard et al., 1999; van der Lugt et al., 1994). Another example is the incongruous effects of murine Bmi1 and Mel18 (both are Posterior sex combs homologues) in oncogenesis. Overexpression of Bmi1 in bone marrow of mice leads to increased probability of B and T cell lymphoma development, suggesting that Bmi1 is an oncogene (Alkema et al., 1995; Haupt et al., 1991; Haupt et al., 1993), whereas inhibition of Mel18 expression in an immortalized fibroblast cell line facilitates tumour formation when those cells are transplanted into nude mice, suggesting that Mel18 is a tumour suppressor (Kanno et al., 1995). However, both Mel18 and Bmi1 null mutants exhibit similar defects in early B and T lymphocyte development resulting from a poor response to IL-7 stimulation of precursor cells (Akasaka et al., 1997; van der Lugt et al., 1994). Eed may also be a tumour suppressor gene, as its expression suppresses development of carcinogen-induced thymic lymphomas (Richie et al., 2002).
The difficulties here lie in the fact that we have very little knowledge of which target genes, such as the transcription factors implicated in hematopoiesis as discussed above, are directly regulated by different MPs in different cell types, at different stages of differentiation. One would presume that the specific mechanisms of action of a given MP would not differ radically between cell types; hence diverse functional outcomes may result from coordinate regulation of distinct sets of target genes in concert with differential composition of MP complexes in different cell types. Purification of PcG and ETP complexes from various cell sources representing distinct stages of hematopoiesis (or development), together with examination of their functions and mechanisms of action, including identification of direct target loci, will ultimately be necessary to determine whether or not the disparate functions discussed above reflect the existence of separate multimeric complexes with stage-specific function (Otte and Kwaks, 2003). Indeed, since eed and Ezh2 null (or hypomorphic) mutants clearly exhibit opposite effects on early B cell development (Lessard et al., 1999; Su et al., 2003), it seems unlikely that their functions in this process would be mediated by a common PRC2-type nuclear complex (similar to that implicated in axial patterning; see above). Although the suite of direct target genes controlled by most PcG, trxG, and ETP epigenetic regulators in blood cell differentiation remains largely unknown, progress towards this goal has been made in characterizing effects on putative targets that control cell cycle progression.
Bmi1 negatively regulates the Ink4a/ARF (or cyclin-dependent kinase inhibitor 2a (Cdkn2a)) tumour suppressor locus (Itahana et al., 2003; Jacobs et al., 1999a; Jacobs et al., 1999b; Park et al., 2004; Park et al., 2003; Raaphorst, 2003), which encodes the murine proteins p16INK4a and p19ARF (p14ARF in humans) that function to inhibit cell-cycle progression through regulation of retinoblastoma (Rb) protein and p53 activity (Collins and Sedivy, 2003; Enders, 2003; Lowe and Sherr, 2003). In Bmi1-/- lymphocytes and mouse embryonic fibroblasts (MEFs), p16INK4a and p19ARF expression is dramatically enhanced, and the MEFs undergo premature senescence, whereas overexpression of Bmi1 in fibroblasts leads to an extension of their replicative lifespan and downregulation of p16INK4a and p19ARF expression (Itahana et al., 2003; Jacobs et al., 1999a). It is not yet known whether or not Bmi1 regulation of Ink4a/ARF is involved specifically in mediating its effect on self-renewal of HSCs; however, since compound Ink4a/ARF;Bmi1 null mutants showed partial rescue of Bmi1-/- hematopoietic phenotypes (Jacobs et al., 1999a), this is a distinct possibility. Bmi1 also regulates self-renewal in neural stem cells and cerebellar precursor cells, and may therefore be part of a general mechanism for maintenance of stem cell self-renewal (Leung et al., 2004; Molofsky et al., 2003; Park et al., 2004). Overexpression of Mel18 arrests cell cycle progression of stimulated B lymphocytes through a pathway leading to downregulation of c-myc and cdc25, resulting in a reduction of the hyperphosphorylated form of Rb (Akasaka et al., 1997; Tetsu et al., 1998). However, in the context of T lymphocytes, Mel18 positively regulates Th2 cell differentiation via induction of GATA-3 transcription factor expression (Kimura et al., 2001). Mel18 also appears to regulate the Ink4a/ARF locus in mouse embryonic fibroblasts (Kranc et al., 2003).
For the Bmi1, Mel18, and M33 proteins, the Ink4a/ARF locus is a direct target, as these proteins bind to the promoter proximal region (M. Djabali and M. van Lohuizen, unpublished observations), importantly establishing this locus as the first non-Hox gene direct target of PcG action. However, neither Mel18 nor M33 null mutants exhibit the degree of severity of hematopoietic phenotypes seen in Bmi1-/- mice, which suggests that these PcG genes act at different stages of hematopoietic differentiation, act on different targets in addition to Ink4a/ARF, and/or that their loss is compensated for by other MPs to varying degrees. Conversely, not all MP genes implicated in hematopoiesis function through modulation of Ink4a/ARF expression, as Rae28 and eed mutants show no differences in gene expression at this target locus in fetal liver and adult hematopoietic tissue respectively (Lessard et al., 1999; Ohta et al., 2002). Interestingly, eed mutants show hyperproliferative defects in myelo-erythropoiesis and B lymphopoiesis which are opposite to effects seen in Asxl1 and all other MP gene mutants investigated to date (Lessard et al., 1999), and yet the same eed mutants show hypoproliferative defects in T cell development which are similar to effects seen in other MP mutants including Asxl1 (Richie et al., 2002). It is probable that eed will be shown to produce such apparently opposing hematopoietic phenotypes by regulating different sets of target genes in these processes, rather than by utilizing disparate mechanisms. Unfortunately, there are currently no confirmed eed targets (Lessard et al., 1999; Richie et al., 2002).
Rae28-/- unfractionated total fetal liver cells exhibited a normal cell cycle phenotype and did not show aberrant expression of several cell cycle regulatory genes examined (Ohta et al., 2002), suggesting either that in fetal liver hematopoiesis Rae28 does not function by modulating the cell cycle, or alternatively that the assay used was not sufficiently sensitive to detect changes in gene expression that may be occurring only in progenitor subsets that comprise a small proportion of the total fetal liver sample and may correlate with hematopoietic differentiation stage.

Hox genes are another suspected group of MP targets in hematopoiesis, due to their known involvement in HSC function, hematopoiesis, and leukemogenesis, and since they are direct MP targets in axial patterning as discussed above (Antonchuk et al., 2002; Buske and Humphries, 2000; Milne et al., 2002; Owens and Hawley, 2002; Payne and Crooks, 2002; Thorsteinsdottir et al., 1997). However, evidence from Rae28 and eed mutant mice indicates that the effects of those particular PcG and ETP genes in fetal liver and adult hematopoiesis respectively are not mediated via Hox targets either (Lessard et al., 1999; Ohta et al., 2002). Conversely, certain Hox genes do play a critical role in Mll function during leukemogenesis, as Hoxa7 and Hoxa9 are required for myeloid transformation mediated by Mll chimeric fusion proteins (Ayton and Cleary, 2003), and are also likely regulated by Mll during normal hematopoiesis, as suggested by earlier studies showing downregulation of Hoxa7, Hoxa9, and Hoxa10 in 12.5 dpc fetal liver of Mll homozygous mutant mice (Yagi et al., 1998). MPs will likely regulate multiple target genes throughout hematopoiesis, depending on the particular stage and lineage context.

XV.
The ETP group maintenance protein ADDITIONAL SEX COMBS (ASX) in Drosophila melanogaster

The Drosophila Additional sex combs (Asx) gene was first described by Jurgens (1985) and was further characterized and cloned in Hugh Brock's laboratory (Sinclair et al., 1992; Sinclair et al., 1998b). Asx is an ETP Group gene because mutations in Asx enhance both Polycomb and trithorax Group gene mutations (Gildea et al., 2000; Milne et al., 1999). Consistent with this classification, Asx mutants exhibit bidirectional transformations along the anterior-posterior body axis, and ASX is required to maintain repression and activation of homeotic cluster (HOM-C/Hox) loci (Breen and Duncan, 1986; Campbell et al., 1995; Milne et al., 1999; Sinclair et al., 1992; Soto et al., 1995). Like most PcG and ETP gene LOF mutants, LOF Asx mutants are embryonic lethal; however, they also have unique defects in the head region, and do not complete head involution (Breen and Duncan, 1986; Jurgens, 1985; Sinclair et al., 1992; Soto et al., 1995). Unlike most Drosophila PcG and ETP mutants, Asx mutants show tissue-specificity in the ectopic expression patterns of homeotic genes (McKeon and Brock, 1991; Soto et al., 1995). ASX is also an enhancer of centromeric heterochromatin-mediated position effect variegation (PEV) in Drosophila (Sinclair et al., 1998a), suggesting a role in heterochromatin function, perhaps in mediating an open chromatin configuration at variegating loci. This observation suggests a unique role for ASX in both euchromatic gene regulation and heterochromatin functions, since most other PcG, trxG, and ETP proteins do not function in PEV at all, and those that do (i.e. E(Z), E(PC), and SU(Z)12) are suppressors, not enhancers, of PEV (Birve et al., 2001; Laible et al., 1997; Sinclair et al., 1998a; Yamamoto et al., 2004).
However, a subset of PcG gene mutants (not including Asx) act as dominant suppressors of telomeric position-effect (TPE), while several trxG gene mutants (all of which belong to chromatin remodeling complexes) behave as dominant enhancers of TPE (Boivin et al., 2003). Interestingly, mutations in Asx and alleles of certain other PcG genes that do not act as TPE suppressors on their own do show synergistic effects when in combination, suggesting that Asx is capable of affecting TPE but that PcG genes can largely compensate for each other (Boivin et al., 2003). How the observation of Asx being a weak suppressor of TPE relates to its role as an enhancer of PEV is unclear and requires further investigation. Of all PcG and ETP genetic mutants tested, Polycomb (Pc) mutants show the strongest genetic interaction with Asx mutations (Campbell et al., 1995; Milne et al., 1999). ASX also shows partial colocalization on polytene chromosomes with the PcG proteins Polycomb (PC), polyhomeotic (PH), polycomblike (PCL), and the ETP protein E(PC), whereas approximately 30% of ASX binding sites do not overlap with any of the PcG proteins tested (PC, PH, or PCL), suggesting that ASX also regulates a set of unique loci (Sinclair et al., 1998b; Stankunas et al., 1998). However, ASX does not appear to interact directly with PC using the 2-hybrid system (Kyba, 1998). There is no published evidence that ASX belongs to PRC1 or PRC2 soluble complexes in Drosophila. It therefore seems likely that, while ASX and PC interact genetically and localize to a common set of target loci, their interaction is not direct, and that ASX and PC belong to distinct regulatory complexes that function together in regulation of the same loci.

The mechanism of action of ASX is unclear, but several lines of evidence have recently provided some insight. ASX colocalizes with trithorax (TRX) to many loci on Drosophila polytene chromosomes (T. Milne, S. Smith, and A.
Mazo, unpublished observations), suggesting that they regulate a set of common target genes. Asx interacts genetically with trx (Milne et al., 1999), and it was shown by yeast-2-hybrid and in vitro GST pulldown analyses that ASX interacts directly through its C-terminal region with the SET domain of TRX (Kyba, 1998). SET domains act as histone methyltransferases and also participate in protein-protein interactions (Alvarez-Venegas and Avramova, 2002; Yeates, 2002). The same C-terminal region of Drosophila ASX interacts directly with the SET domains of E(Z), ASH1, and SU(VAR)3-9, using a yeast-2-hybrid assay (T. Rozovskaia and E. Canaani, unpublished observations). ASX and TRX also co-immunoprecipitate (S. Smith and A. Mazo, unpublished observations). The above data collectively suggest that ASX and TRX functionally interact in vivo and directly interact in vitro. Currently, however, there is no biochemical evidence showing that ASX is a component of soluble TRX complexes (Petruk et al., 2001; Petruk et al., 2004), or that ASX mammalian homologues (see below) belong to the ALL-1/MLL (TRX homologue) supercomplex in mammals (Nakamura et al., 2002). Such an interaction may therefore be transient or unstable, may involve substoichiometric amounts of ASX that have escaped detection, may occur in vivo only when ASX and/or TRX is bound to a target locus, or may simply remain unreported in the literature. The TAC1 complex containing TRX is recruited to the heat shock hsp70 locus (and certain other heat shock loci) shortly after heat shock induction and facilitates transcriptional elongation by modifying nucleosome structure, including SET domain-mediated methylation of H3-K4 (Smith et al., 2004). ASX is also rapidly localized to several heat shock loci, including hsp70, following induction (S. Smith and A. Mazo, unpublished observations).
Strikingly, in Asx mutants, TRX is no longer recruited to the hsp70 locus following heat shock (S. Smith and A. Mazo, unpublished observations), providing strong evidence of a functional role for the interaction between ASX and TRX. Taken together, the above results suggest two possible models for function of the ETP protein ASX at the hsp70 locus. The first model is that ASX, already localized to the target locus, subsequently recruits TRX via interaction with its SET domain. The second model is that ASX is present within a soluble complex including TRX that gets recruited to hsp70 upon induction. In either case, ASX is required for TRX recruitment and subsequent activation of hsp70 upon heat shock induction. Neither of these models, however, addresses how ASX would contribute to maintenance of transcriptional states of developmentally regulated loci. Extrapolating from the above results and models, ASX may function to mediate the balance between maintenance of activation versus repression of homeotic genes by modulating TRX function. This may be consistent with the ETP models 3 and 4 discussed above. It is important to keep in mind, however, that ASX likely possesses additional functions not related to its interaction with TRX. Similar models to the two listed above could also apply to other SET-domain containing proteins in addition to TRX, since ASX also interacts with the SET domains from SU(VAR)3-9, E(Z), and ASH1 (see above). ASX-SET domain interactions are likely context and target gene specific, and likely modulate the transcriptional state of Hox and numerous other target loci. Therefore, studying the function and mechanism of action of ASX should provide important insights into connections between maintenance of repressive and active transcriptional states of target loci, including homeotic genes. These possibilities will be further discussed in Chapter 5.

XVI.
Thesis aims

The molecular basis for the dual role of Asx in maintenance of transcriptional activation and repression is not well understood. In order to gain more insight into the structural basis for the role of Asx as an ETP gene, I first undertook a search for Asx mammalian orthologous genes to identify conserved domains that may be of functional importance. In Chapter 2 of this thesis, I report the discovery and characterization of three Asx homologues in humans and mice, and identify several conserved regions between Drosophila and mammalian homologues, and other regions that are uniquely shared among mammalian Asx homologues. All three mammalian ASX homologues contain a C-terminal region with a high degree of sequence identity to the C-terminal portion of ASX previously shown to interact with the SET domain of TRX in Drosophila. Subsequently, for the first Asx gene homologue discovered, called ADDITIONAL SEX COMBS LIKE 1 (human ASXL1 and mouse Asxl1), it was shown by yeast-2-hybrid and GST pulldown analyses that the interaction is conserved between ASXL1 and the SET domain of MLL/ALL-1/HRX, the human homologues of ASX and TRX respectively (C. Fisher, E. O'Dor, and H. Brock, unpublished observations). The same general C-terminal region also interacts with unmodified histone tails (H3, H4, H2A, and H2B) by GST pulldown analysis (E. O'Dor and H. Brock, unpublished observations). Further investigations of conserved structure-function relationships pertaining to the ASX-TRX and ASXL1-MLL interactions are currently being conducted in Hugh Brock's laboratory in collaboration with Alexander Mazo's laboratory. As discussed above, Asx possesses unique functional attributes for a PcG gene in Drosophila, which led to its reclassification as an ETP group gene. I became interested in determining the function of Asx mammalian homologues, and in asking whether or not the ETP function of Asx is conserved during evolution.
In Chapter 3 of this thesis, I present data addressing these questions, obtained by generating a mouse knockout model of Asxl1, and investigating the effect of Asxl1 mutation on axial patterning and Hox gene expression during embryogenesis, and genetic interaction with another PcG gene in mice called M33. In Chapter 4 of this thesis, I extend the functional characterization of Asxl1 null mouse mutants to the analysis of hematopoiesis, and compare functions of Asxl1 with those of other mammalian PcG and trxG genes known to be involved in hematopoiesis and leukemia. In Chapter 5 of this thesis, I discuss how information obtained from my investigations of Asx gene homologues in mammals, combined with knowledge of ASX structure and function from the Drosophila model system, may be used in the future to address questions of potential mechanisms of action of ASX-like proteins (ASXLPs) in Metazoans. I also discuss perspectives on how the above information should be used to advance our overall understanding of maintenance proteins in processes of epigenetic gene regulation.

CHAPTER 2

Characterization of Asx-like genes in mammals¹

¹ This chapter is partially based on the article: Fisher, C.L., J. Berger, F. Randazzo, and H.W. Brock. 2003. A human homolog of Additional sex combs, ADDITIONAL SEX COMBS-LIKE 1 (ASXL1) maps to chromosome 20q11. Gene 13:115-26. J. Berger and F. Randazzo contributed cloning and sequencing of partial cDNAs for human ASXL1 and mouse Asxl1, and Northern blot data for ASXL1. H. Brock contributed the subcellular localization of ASXL1 (unpublished results). I contributed all other primary data, and conducted all data analyses, including bioinformatics, contained within this chapter.

I. Introduction

Gene expression patterns within cells can be passed on to daughter cells, even in the absence of the transiently expressed transcription factors that initiated transcription or repression.
The mechanism of this cellular memory function is not clearly understood, but growing evidence suggests that epigenetic changes in chromatin structure are important in maintaining gene expression patterns. Two groups of genes that are essential for maintenance of homeotic gene expression have been identified in Drosophila. The Polycomb group (PcG) genes encode proteins required to silence homeotic genes (Mahmoudi and Verrijzer, 2001; Ringrose and Paro, 2001; Simon and Tamkun, 2002). The actions of PcG proteins are antagonized by the trithorax group (trxG) proteins (Francis et al., 2001; Mahmoudi and Verrijzer, 2001; Poux et al., 2002; Shao et al., 1999; Simon and Tamkun, 2002). Both PcG (Garcia et al., 1999; Kuzmichev et al., 2002; Levine et al., 2002; Muller et al., 2002; Saurin et al., 2001; Shao et al., 1999; Tie et al., 2001; Tie et al., 2003) and trxG (Armstrong et al., 2002; Nakamura et al., 2002; Papoulas et al., 1998; Petruk et al., 2001) genes encode chromatin proteins found in multimeric complexes of varying composition depending on the developmental context (Otte and Kwaks, 2003; Pirrotta et al., 2003). Epigenetic inheritance by PcG and trxG proteins has been linked to changes in histone covalent modifications at target loci resulting from enzymatic activity of conserved functional domains within PcG and trxG proteins and/or their interaction partners (Orlando, 2003). Consistent with the histone code hypothesis (Jenuwein and Allis, 2001; Strahl and Allis, 2000), PcG proteins associate with histone deacetylases (HDACs) (Saurin et al., 2001; Tie et al., 2001), whereas trxG proteins associate with histone acetyltransferases (HATs) (Petruk et al., 2001); however, changes in histone acetylation do not appear to contribute to long-term silencing but rather are reversible, acting as a switch between permissive and repressive chromatin states (Czermin and Imhof, 2003; Eberharter and Becker, 2002).
Heritable silencing of euchromatic loci by the PcG may result from other epigenetic marks on histones, leading to changes in higher-order chromatin organization (Orlando, 2003; Vermaak et al., 2003). The best-characterized example of this is histone methylation, conferred by the action of conserved SET domains, present in PcG, trxG, and heterochromatin proteins (e.g. SU(VAR)3-9, E(Z), TRX, ASH1), which act as histone methyltransferases (Kouzarides, 2002; Orlando, 2003). It is still unclear whether or not particular universal histone modifications dictate the balance between maintenance of gene activation and silencing according to a histone code (Jenuwein and Allis, 2001; Strahl and Allis, 2000); however, histone H3 methylation at lysine 4, which is mediated by TRX and ASH1, does appear to be associated with active genes, and can result in exclusion of PcG protein binding and promotion of SWI/SNF chromatin remodeling complex binding with associated HATs (Beisel et al., 2002). In flies, PcG mutations exhibit posterior transformations in embryos and adults, caused by failure to silence homeotic genes (Struhl and Akam, 1985), whereas trxG mutations exhibit anterior transformations caused by failure to maintain activation of homeotic loci (Breen and Harte, 1993; Mahmoudi and Verrijzer, 2001). Mutations in one PcG gene enhance the homeotic phenotypes of mutations in different PcG genes (Jurgens, 1985), and similarly mutations in one trxG gene enhance the homeotic phenotypes of mutations in other trxG genes (Gildea et al., 2000; Shearn, 1989). Recently, the clear distinction between PcG and trxG genes has been blurred by the discovery of genes that have phenotypes and genetic characteristics of both groups of genes (Brock and van Lohuizen, 2001; Gildea et al., 2000; Simon and Tamkun, 2002).
Some PcG gene mutations exhibit both anterior and posterior transformations, or enhance homeotic phenotypes of trxG mutations, and conversely, some trxG genes are required for silencing (Brock and van Lohuizen, 2001; Gildea et al., 2000; Hagstrom et al., 1997). Genes that exhibit genetic interactions with both trxG and PcG gene mutations have recently been renamed Enhancers of trithorax and Polycomb (ETP) genes, as they show dual function in maintenance of activation and silencing (Brock and van Lohuizen, 2001; Gildea et al., 2000). It is now apparent that maintenance of cellular memory is more dynamic and complex than previously thought. The developmental stage and context-specific interactions between PcG and trxG proteins, regulatory cofactors, the RNA polymerase II complex, and/or general transcription factors are all involved in controlling the balance between silencing and maintenance of activation of target loci (Orlando, 2003; Otte and Kwaks, 2003). PcG, trxG, and ETP proteins are conserved in mammals, and have similar functions in maintenance of activation and/or silencing of Hox genes (Brock and van Lohuizen, 2001). Murine embryos homozygous for targeted null alleles of PcG and ETP genes exhibit predominantly posterior transformations of the axial skeleton (Akasaka et al., 1996; Core et al., 1997; Donohoe et al., 1999; Schumacher et al., 1996; Takihara et al., 1997; van der Lugt et al., 1994; Wang et al., 2002), whereas null mutations in the trxG gene Mll cause primarily anterior transformations (Yu et al., 1995), and both groups of mutants exhibit misexpression of homeotic genes (Hanson et al., 1999; Yu et al., 1998). These results suggest that the functions of PcG, trxG, and ETP genes are conserved between Drosophila and mammals; however, it remains to be seen if the function of every Drosophila ETP gene is conserved in mammals and/or if novel ETP gene function has been acquired during evolution.
Mammalian PcG, trxG, and ETP genes appear to have acquired multiple functions during evolution, as gene knockout mice exhibit, for example, hematopoietic, neural crest, and cardiac defects (Koga et al., 2002; Lessard and Sauvageau, 2003a; Takihara et al., 1997; van der Lugt et al., 1994), as well as sex reversal (Katoh-Fukui et al., 1998). Drosophila PcG and ETP proteins are ubiquitously expressed (Carrington and Jones, 1996; DeCamillis and Brock, 1994), whereas most murine PcG and ETP genes show tissue-specific expression (Gunster et al., 2001). Collectively, the above data suggest that individual mammalian PcG, trxG, and ETP proteins are likely to have specific, multiple roles in embryonic development and cell differentiation processes. Accumulating evidence implicates PcG, trxG, ETP, and other chromatin proteins not only in maintaining proper cell fate decisions during development, but also in the control of cell proliferation and prevention of tumorigenesis, presumably by regulating a variety of target genes, few of which have been identified to date (Jacobs and van Lohuizen, 2002; Lessard and Sauvageau, 2003b; Muyrers-Chen and Paro, 2001; Wolffe, 2001). Notable in this regard is the proto-oncogene Bmi1, a murine homologue of the ETP gene Posterior sex combs (Psc), and the first mammalian PcG homologue to be identified, on account of its cooperation in inducing B-cell lymphomagenesis in Eμ-myc transgenic mice (van der Lugt et al., 1996; van der Lugt et al., 1994; van Lohuizen et al., 1991a; van Lohuizen et al., 1991b). Bmi1 is required to maintain self-renewal ability of neural stem cells, and normal and leukemic hematopoietic stem cells, likely mediated via regulation of the cell cycle through the Ink4a locus (Ema and Nakauchi, 2003; Jacobs et al., 1999a; Jacobs et al., 1999b; Lessard and Sauvageau, 2003a; Molofsky et al., 2003; Park et al., 2004; Park et al., 2003; Raaphorst, 2003).
The Additional sex combs (Asx) gene of Drosophila is an ETP gene, as mutants exhibit anterior and posterior transformations in adults (Sinclair et al., 1992). Asx mutations enhance the anterior transformation phenotypes of trx mutations, and the posterior transformation phenotypes of PcG genes, and result in homeotic cluster gene misexpression (Milne et al., 1999). Asx encodes a chromatin protein whose binding pattern to polytene chromosomes partially overlaps those of PcG proteins (Sinclair et al., 1998b) and trxG proteins (T. Milne, S. Smith, and A. Mazo, unpublished observations). ASX also interacts in vitro with the SET domains of TRX, SU(VAR)3-9, E(Z), and ASH1 (T. Rozovskaia and E. Canaani, unpublished observations). Asx is an enhancer of position effect variegation, suggesting that it has a role in establishment of heterochromatin structure (Sinclair et al., 1998a). However, the molecular basis for the dual role of Asx in maintenance of activation and silencing is not fully understood (see Chapter 1). In an effort to understand the structural basis for Asx function as an ETP gene, a search for mammalian homologues of Asx was undertaken in order to identify conserved domains. In this chapter I report the discovery and characterization of three Asx homologues in humans and mice: ADDITIONAL SEX COMBS LIKE 1 (human ASXL1 and mouse Asxl1), ADDITIONAL SEX COMBS LIKE 2 (human ASXL2 and mouse Asxl2), and ADDITIONAL SEX COMBS LIKE 3 (human ASXL3 and mouse Asxl3). I describe the predicted protein and genomic structures of ASXL1 and Asxl1, and examine the expression patterns of the ASXL1, Asxl1, and Asxl2 genes.

II. Results

i. Characterization of human ASXL1 cDNA and genomic clones

The dbEST public database (NCBI) was searched for human ESTs with similarity to the Drosophila Asx sequence using BLAST, and multiple human cDNAs exhibiting similarity to Drosophila Asx were identified.
The 1,683 bp R61738 clone derived from the Soares infant female brain 1NIB library was obtained from the IMAGE Consortium. To obtain a full-length cDNA, a 0.45 kb HindIII fragment from EST R61738 was used as a probe to screen an adult human heart cDNA library. One 4,926 bp clone was identified and sequenced, and contained a short 5' UTR, the entire putative open reading frame, and partial 3' UTR sequence of an Asx-like cDNA (all above work done by J. Berger and F. Randazzo). This partial human Asx-like cDNA sequence was submitted to GenBank under accession number AR072721 (F. Randazzo, unpublished data). In my subsequent searches of dbEST, I identified an additional EST cDNA (GenBank BM456370) that extended the 5' UTR sequence in the 5' direction. While the position of the 5' end has not been confirmed by S1 protection analysis, it is likely that the authentic 5' end is very close to the 5' end of the cDNA reported here. While our experiments were in progress, a partial human Asx-like cDNA (GenBank AB023195) was published as part of a study to clone novel large transcripts from human brain (Nagase et al., 1999). This partial cDNA starts at 777 bp in our sequence, and ends at 6,864 bp, extending our sequence in the 3' direction. By aligning the sequence of the 4,926 bp cDNA clone with that of EST cDNA clone BM456370, genomic contig NT_028392.4, cDNA clone AB023195, and multiple overlapping EST cDNAs, I generated a cDNA contig of 6,864 bp (Figure 2-1) that was named ASXL1 by agreement with the Gene Nomenclature Committee. Table 2-1 lists selected cDNA and genomic DNA sequences which represent ASXL1 and their relationship to the full length ASXL1 cDNA contig. Together, the available cDNAs show that the 5' UTR is 258 bp, the coding region is 4,626 bp, and the 3' UTR is 1,980 bp. The 3' UTR sequence contains three polyadenylation signal sequences at positions 4,882, 5,935 and 6,844 bp (Figure 2-1).
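As a quick consistency check on the segment sizes given above, the 5' UTR, coding region, and 3' UTR should sum to the length of the contig. The short Python sketch below performs that arithmetic. The function and variable names are illustrative only and were not part of the original analysis. The last step assumes that the stated coding-region length includes the stop codon, and compares the result with the 1,541-residue ASXL1 protein predicted in the sequence analysis later in this chapter.

```python
def segment_total(utr5: int, cds: int, utr3: int) -> int:
    """Return the transcript length implied by its three segments (bp)."""
    return utr5 + cds + utr3


# Segment sizes reported in the text for the human ASXL1 cDNA contig.
ASXL1_UTR5, ASXL1_CDS, ASXL1_UTR3 = 258, 4626, 1980
ASXL1_CONTIG = 6864

total = segment_total(ASXL1_UTR5, ASXL1_CDS, ASXL1_UTR3)
print(total == ASXL1_CONTIG)

# Assumption: the stated coding-region length includes the stop codon,
# so the number of residues is one less than the number of codons.
codons = ASXL1_CDS // 3
residues = codons - 1
print(residues)
```

Under that assumption the coding region is a whole number of codons and the residue count agrees with the predicted protein length.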
The distribution of oligo d(T)-primed EST cDNA clones in the dbEST database (NCBI) revealed by a BLAST search to the full length ASXL1 cDNA indicates that all three polyadenylation signals are used, and generate three alternative transcripts of approximately 4,925 bp, 5,976 bp, and 6,864 bp respectively. We have deposited the ASXL1 cDNA sequence described above in GenBank under accession number AJ438952. The ASXL1 cDNA was previously represented by two UniGene (NCBI) clusters, Hs.211193 and Hs.3686, the former containing only 5' EST sequences, and the latter containing ESTs corresponding to the remainder of the ASXL1 cDNA. The error in assignment of ASXL1 to two UniGene clusters is likely a result of the poor representation of EST cDNAs for the 5' region of ASXL1 in the dbEST database. Recently, the UniGene cluster Hs.211193 has been retired, and the 5' region ESTs in the ASXL1 locus have been incorporated into UniGene cluster Hs.422176, which is still incorrectly assigned. The LocusLink (NCBI) identifier for UniGene cluster Hs.3686 is 23393. As of the Nov. 2002 Freeze of the human genomic database (NCBI), there were 3 additional UniGene clusters containing ESTs that span the ASXL1 locus: Hs.90695, Hs.345093, and Hs.136033. The first two clusters likely represent additional 5' region sequence of the ASXL1 cDNA, while the latter cluster contains ESTs that fall within the large 59 kb intron of ASXL1. The origin of the Hs.136033 cluster ESTs is unknown, as many contain sequence not contained within the characterized ASXL1 cDNA, and many contain sequence from the reverse strand as compared to the ASXL1 locus direction of transcription. Two overlapping BAC genomic clones generated by the Sanger Centre, derived from chromosome 20, were identified that together contain the ASXL1 locus and show close to 100% identity to our cDNA within exons.
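The relationship between the three polyadenylation signal positions and the three approximate transcript lengths reported for ASXL1 can be illustrated by computing the distance from each signal to the corresponding transcript end. The sketch below is only a worked arithmetic example. It assumes that the signal positions and the transcript lengths are expressed in the same contig coordinates, and the function name is hypothetical.

```python
# Positions (bp) of the three polyadenylation signals in the ASXL1 contig
# and the approximate transcript lengths inferred from oligo d(T)-primed
# ESTs, both as reported in the text.
signal_positions = [4882, 5935, 6844]
transcript_lengths = [4925, 5976, 6864]


def signal_to_end_offsets(signals, ends):
    """Distance (bp) from each signal position to the matching transcript end."""
    return [end - signal for signal, end in zip(signals, ends)]


offsets = signal_to_end_offsets(signal_positions, transcript_lengths)
print(offsets)  # [43, 41, 20]
```

Each predicted transcript therefore ends a few tens of base pairs downstream of its signal.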
Human genomic clone RP11-358N2 (GenBank AL121583) includes exons corresponding to the ASXL1 cDNA up to the bp 2871 position and overlaps with genomic clone RP5-1184F4 (GenBank AL034550), which contains exonic sequence corresponding to the ASXL1 cDNA starting at bp 2,772, and the entire 3' UTR (Table 2-1). The human genomic contig NT_028392.4, containing sequence mapped to chromosome 20q11, contains the entire ASXL1 locus.

Figure 2-1. Nucleic acid sequence of human ASXL1 cDNA. Human cDNA clones were assembled into a 6,864 bp contig. The longest ORF is indicated by a black background and polyadenylation sites are in bold.

Table 2-1. Selected cDNA and genomic DNA sequences which contain ASXL1 sequence used to generate the ASXL1 cDNA contig.

GenBank Accession   Sequence type         Size of clone or   bp of ASXL1 contig contained
Number                                    contig (bp)        within listed sequence
AJ438952            cDNA contig           6864               1-6864
AR072721            partial cDNA clone    4926               1-4898 (a)
AB023195            partial cDNA clone    6088               777-6864
BM456370            EST cDNA clone        757 (b)            1-728
XM_047013           model cDNA            6354               511-6864
AL121583            genomic BAC clone     84710              1-2872
AL034550            genomic BAC clone     118873             2773-6864
NT_028392.4         genomic contig        5063606            1-6864

(a) The first 28 bp of AR072721 may represent a cloning artifact since that sequence is not present within the genomic contig NT_028392.4; therefore, that sequence was deleted from the ASXL1 cDNA contig.
(b) The stated size represents the sequenced portion of the EST, which is likely smaller than the full insert cDNA size.

ii. Cytological mapping of human ASXL1

ASXL1 was mapped by FISH with a cosmid containing ASXL1 to 20q11 (J. Berger and F. Randazzo, unpublished observations). This map assignment is in agreement with human chromosome 20 sequence and mapping data generated by the Sanger Centre, which places ASXL1 at 20q11.21, between markers for the characterized genes kinesin family member 3B (KIF3B) in the centromeric direction, previously mapped to 20q, and DNA-methyltransferase 3 beta (DNMT3B) in the telomeric direction, previously mapped to 20q11.2. The (predicted) gene immediately telomeric to ASXL1 codes for the hypothetical protein DKFZp566G1424, mapped to 20q11.1-q11.23, which is transcribed in the opposite direction to ASXL1.

iii. Characterization of mouse Asxl1 cDNA and genomic clones

The dbEST public database (NCBI) was searched for mouse ESTs with similarity to the Drosophila Asx sequence using BLAST, and multiple mouse cDNAs were identified. The W41911 EST clone derived from the Soares mouse 13.5-14.5 d.p.c. embryo library was obtained from the IMAGE Consortium.
To obtain a full-length cDNA, the W41911 EST clone was used by J. Berger and F. Randazzo as a probe to screen a mouse neonatal brain cDNA library (Stratagene; combined random and oligo(dT) primed library). Of six positives identified, two clones were sequenced: a 4.8 kb clone, mAsx7A, which contained a poly(A) tail, and a 1.8 kb clone, mAsx9A, without a poly(A) tail. Sequencing of these clones revealed a 1,273 bp overlap between the 3' end of mAsx9A and the 5' end of mAsx7A. Using these two clones, I constructed a single 5,330 bp cDNA clone which contains the longest putative open reading frame and partial 3' UTR sequence of Asxl1. Analysis of the distribution of BLAST hits of the cloned Asxl1 sequence against the mouse EST database (NCBI) revealed additional 3' UTR sequence. There is also a partial cDNA CAP-trapper clone (Okazaki et al., 2002) present in public databases (GenBank AK079284) that contains some 3' UTR sequence of Asxl1. By aligning multiple overlapping ESTs and the mAsx clone, I extended the 3' UTR sequence of mAsx by 1,215 bp to generate a full length cDNA contig of 6,545 bp (Figure 2-2; Table 2-2) that was named Asxl1 by agreement with the Gene Nomenclature Committee. To clone additional 3' UTR sequence, I performed 3' RACE PCR on a mouse day 0 ES cell oligo(dT) primed cDNA library and cloned a 2.1 kb fragment, mAsx3'Xho, which begins at 4,487 bp and ends at 6,534 bp of the Asxl1 cDNA contig. The identity of this clone compared to the Asxl1 contig was confirmed by end sequencing and restriction fragment mapping analysis; however, a clone containing the full length Asxl1 cDNA was not constructed. In an attempt to extend the 5' UTR sequence, I performed 5' RACE PCR on a mouse day 0 ES cell random primed cDNA library, and cloned a 0.3 kb fragment, mAsx5'XN. This 5' RACE fragment contains Asxl1 sequence from 86 bp to 363 bp of the contig; therefore I was unable to extend the contig further in the 5' direction.
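The principle of joining two overlapping clones into a single contig, as described above for mAsx9A and mAsx7A, can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the procedure actually used: it requires an exact suffix/prefix match, whereas real assembly tolerates sequencing errors, and the toy sequences and the function name `merge_by_overlap` are hypothetical.

```python
def merge_by_overlap(left: str, right: str, min_overlap: int = 3) -> str:
    """Join two sequences at their longest exact suffix/prefix overlap.

    Raises ValueError if no overlap of at least min_overlap bases exists.
    """
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return left + right[size:]
    raise ValueError("no sufficient overlap between the two sequences")


# Toy sequences (hypothetical, not taken from the clones themselves).
clone_5prime = "ATGAAGGACAAACAG"
clone_3prime = "CAAACAGAAGAGGAAG"

contig = merge_by_overlap(clone_5prime, clone_3prime)
print(contig)
print(len(contig))  # combined length minus the 7-base overlap
```

The same bookkeeping applies to the extension reported in the text: adding 1,215 bp of 3' UTR sequence to the 5,330 bp clone gives the 6,545 bp contig.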
The position of the 5' end of Asxl1 has not been confirmed by S1 protection analysis. There are no ESTs in current databases that further extend the 5' end of this cDNA; however, there is a CAP trapper selected cDNA clone (GenBank accession number AK081063.1) generated by the RIKEN Mouse Genome Encyclopedia Project (Okazaki et al., 2002) that extends the 5' UTR by 423 bp upstream and contains up to the 539 bp position of the Asxl1 cDNA contig. The 5' UTR sequence in clone AK081063.1 contains several sequence discrepancies compared to the Asxl1 cDNA contig and the mouse genomic contig NW_00179.1 (see below), including at the predicted translational start site. Therefore, I have not added this sequence to the cDNA contig for Asxl1; however, it does appear likely that there is additional 5' UTR sequence for Asxl1 that remains uncharacterized. Together, the available cDNAs of Asxl1 show that the 5' UTR is 66 bp, the coding region is 4,542 bp, and the 3' UTR is 1,937 bp. The 3' UTR sequence contains three polyadenylation signal sequences at positions 4,610, 5,620, and 6,525 bp (Figure 2-2), which predicts three alternatively polyadenylated transcripts, resulting in a similar transcript size distribution to that of human ASXL1 (see above). The distribution of multiple oligo d(T)-primed EST cDNA clones in the dbEST database (NCBI) revealed by a BLAST search to the full length Asxl1 cDNA indicates that all three polyadenylation signals are used, and generate three alternative transcripts of approximately 4,637 bp, 5,636 bp, and 6,545 bp respectively. The Asxl1 cDNA sequence described above has not yet been deposited in GenBank. There are three cDNA sequences exhibiting partial identity to the Asxl1 contig, which are model cDNAs generated by the automated NCBI Annotation Project: GenBank XM_206561, XM_149246, and XM_149245 (Table 2-2). These model cDNAs are not supported by our experimental evidence (see below).
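Locating candidate polyadenylation signals of the kind discussed above amounts to scanning the 3' UTR for the signal hexamer. The following Python sketch assumes the canonical AATAAA motif, which the text does not state explicitly, and uses a short fragment copied from the 3' portion of the Asxl1 sequence shown in Figure 2-2. The function name is hypothetical, and the reported positions are relative to the fragment, not to the contig.

```python
def find_polya_signals(seq: str, motif: str = "AATAAA") -> list[int]:
    """Return the 1-based start position of every occurrence of motif in seq."""
    seq = seq.upper()
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start + 1)  # convert 0-based index to 1-based position
        start = seq.find(motif, start + 1)  # allow overlapping matches
    return hits


# Fragment from the 3' portion of the Asxl1 cDNA sequence (Figure 2-2).
fragment = "TAATAAATTATGGCCATTGGAAACATTGTACATTTAGTG"
print(find_polya_signals(fragment))  # [2]
```

Applied to a full contig, the same scan would list every candidate hexamer, which could then be compared with the 3' ends of oligo d(T)-primed ESTs.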
The Asxl1 cDNA contig is currently represented by two UniGene (NCBI) clusters: 1) Mm.24019, containing only 5' region Asxl1 cDNA sequences; and 2) Mm.28424, containing cDNAs corresponding to the remainder of the Asxl1 cDNA contig. The error in assignment of Asxl1 to more than one UniGene cluster, which also occurred for human ASXL1, is likely a result of the poor representation of EST cDNAs for the 5' region of Asxl1 in the dbEST database.

Figure 2-2. Nucleic acid sequence of mouse Asxl1 cDNA. Mouse cDNA clones were assembled into a 6,545 bp contig. The longest ORF is indicated by a black background and polyadenylation sites are in bold.

Table 2-2. Selected cDNA and genomic DNA sequences and/or clones which contain sequence representing Asxl1, some of which were used to generate the Asxl1 cDNA contig. ND, not determined.

GenBank Accession Number    Sequence type                      Size of clone   bp of Asxl1 contig contained
(and/or name)                                                  or contig       within listed sequence
Asxl1                       cDNA contig                        6545 bp         1 to 6545
AR072722 (mAsx)             partial cDNA clone                 5362 bp         1 to 5330 (b)
mAsx7A                      partial cDNA clone                 4.8 kb          ND
mAsx9A                      partial cDNA clone                 1.8 kb          ND
mAsx3'Xho                   partial cDNA clone                 2.1 kb          4487 to 6534
AK081063.1                  CAP trapper selected cDNA clone    962 bp          -423 to 539 (c)
AK079284                    CAP trapper selected cDNA clone    1755 bp         3821 to 5574
W41911                      EST cDNA clone                     448 bp (a)      4211-4638
XM_206561 (LOC278997)       NCBI Predicted Gene                296 bp          26 to 206
XM_149246 (LOC228789)       NCBI Predicted Gene                695 bp          628 to 912
XM_149245 (LOC228790)       NCBI Predicted Gene                3393 bp         1245 to 4637
NW_00179.1 (chromosome 2)   genomic contig                                     1 to 6545

(a) The stated size represents the sequenced portion of the EST, which is likely smaller than the full insert cDNA size.
(b) The first 26 bp (5' end) and the last 6 bp (3' end) of AR072722 are derived from the cloning vector.
(c) The upstream 5' UTR sequence of AK081063.1 was not included in the Asxl1 contig due to numerous sequence discrepancies (see text for details).
There are two additional UniGene clusters located within the Asxl1 locus, in the region of the 154,170K position of the chromosome 2 NW_00179.1 contig, which are not contained within the Asxl1 cDNA contig, and also do not contain any shared sequence between each other. These clusters fall between exons 4 and 5 of the Asxl1 locus, within the large 35 kb intron (see below), and were identified by inspection of the NCBI Entrez Map View mouse genome display for the region containing the Asxl1 locus. The first 5' EST cluster is composed of Accession numbers BG173484, AA762074, and AA771664/AI587940 (two sequence reads of the same clone) and is represented by the UniGene Cluster Mm.172388, while the second 5' EST cluster is composed of Accession numbers BB636000, and AA274047/AI510258 (two sequence reads of the same clone) and is represented by the UniGene Cluster Mm.30796. The UniGene cluster Mm.172388 was retired by the NCBI in January 2003; upon BLASTN search of a contig representing these ESTs to the nr database (NCBI), more than 200 high similarity hits were obtained, the vast majority of which (including the first 18 high scoring alignments) represent mouse genomic sequences or contigs. A BLASTX search of this EST cluster contig translated in all reading frames to the protein database (NCBI) did not yield high scoring similarities to any characterized proteins, only moderate scoring similarities to hypothetical proteins from the mouse genome project. A BLASTN search of this EST contig to the mouse EST database yielded numerous hits of high to moderate similarity, indicating that similar but not identical transcripts are generated from numerous other regions within the mouse genome. An Rfam (Griffiths-Jones et al., 2003) search of this EST cluster yielded no matches to known non-coding RNA gene families. A search of this EST cluster to the RepeatMasker database yielded no hits to previously described mouse repetitive elements.
Conversely, a B L A S T N search of the second 5' EST cluster UniGene Mm.30796 to the nr database (NCBI) revealed only low scoring, and therefore likely non-signficant, similarities to known genes and characterized genomic sequences. A B L A S T X search to the protein database only yielded low scoring, and therefore likely non-significant, similarities to known and hypothetical proteins. A B L A S T N search to the mouse EST database yielded no high scoring similarities to other ESTs. Therefore the ESTs contained within UniGene cluster Mm.30796 appear to be unique. These ESTs do not belong to a previously described non-55 coding RNA family, since there were no matches to the contig by Rfam search. However, a Repeatmasker search revealed the presence of a B2 family SINE repeat element within this EST contig. The mouse genomic contig NW_00179.1 of chromosome 2 contains the Asxll locus, located within band 2H1 (Table 2-2). The LocusLink (NCBI) identifier for the Asxll locus is LOC228790. The mouse Asxll locus lies between the KifSb gene in the centromeric direction, and the uncharacterized gene LOC241733 which is similar to the locus for human hypothetical protein DKFZp566G1424 in the telomeric direction, and as such lies in a chromosomal region syntenic to human chromosome 20ql 1 where the ASXL1 gene is located. The mouse genomic B A C clone #gs 12943, and the PI clone #gsl 1815 (Incyte Genomics), each containing portions of the Asxll locus DNA, were identified in screens using the Asxll EST cDNA Accession number W41911 as a probe. The B A C clone #gs 12943 was subcloned into several fragments for use in subsequent experiments (see Chapter 3) and for sequence analysis. iv. 
Sequence analysis of human ASXL1 and mouse Asxl1

The ASXL1 gene product is predicted to contain 1,541 amino acids and to have a relative molecular mass of 165,462 and a pI of 5.85, whereas the Asxl1 gene product is predicted to contain 1,514 amino acids and to have a relative molecular mass of 162,674 and a pI of 5.70. The sequences surrounding the initiator codons of ASXL1 (GGGCGAAGGATGA) and Asxl1 (AGGAGGAGGATGA) agree with the vertebrate Kozak consensus only at the -9 position and the critical -3 position, at which 97% of genes analyzed have a purine, and at the -6 position for Asxl1 but not ASXL1 (Figures 2-1 and 2-2) (Kozak, 1991). Figure 2-3 compares the conceptual translations and domain structures of human ASXL1, mouse Asxl1, and Drosophila ASX. The overall degree of sequence conservation between human ASXL1 and Drosophila ASX is low, at 17% identity and 43% similarity; mouse Asxl1 and Drosophila ASX share 16% identity and 40% similarity; whereas mouse Asxl1 is 74% identical and 81% similar to human ASXL1. The conservation between ASXL1 and Asxl1 is extremely high in the N-terminal third of the predicted proteins, is lower within the central region, and is high again at the C-termini (Figure 2-3A). There are three putative nuclear localization signals (NLS) in each of ASXL1 and Asxl1, which are almost completely conserved: KDKQK(K/R)KK at residues 2-9; KQKKK at residues 160-164; and KKRSR at residues 409-413, all located within the N-terminal third of the protein (Figure 2-3A) (Cokol et al., 2000; Jans et al., 2000; Nair et al., 2003). Nuclear localization of human ASXL1 tagged at the N-terminus with two hemagglutinin (HA) tags was confirmed by fluorescence microscopy of transiently co-transfected OS2 cells (H. Brock, unpublished observations).
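The three putative NLS motifs listed above lend themselves to a simple pattern scan. The following is a minimal sketch, assuming the motifs are searched as literal patterns with the (K/R) alternative written as a character class; the function name `find_motifs` and the toy sequence are hypothetical and are not the real ASXL1 or Asxl1 sequence, and this is not the prediction method cited in the text.

```python
import re

# Patterns transcribed from the three putative NLS motifs quoted in the text.
# The (K/R) alternative is expressed as the character class [KR].
NLS_PATTERNS = {
    "NLS1": re.compile(r"KDKQK[KR]KK"),
    "NLS2": re.compile(r"KQKKK"),
    "NLS3": re.compile(r"KKRSR"),
}


def find_motifs(protein, patterns=NLS_PATTERNS):
    """Return (name, start, end) hits using 1-based, inclusive residue numbers."""
    hits = []
    for name, pattern in patterns.items():
        for match in pattern.finditer(protein):
            # match.start() is 0-based; match.end() is exclusive, so it already
            # equals the 1-based inclusive end coordinate.
            hits.append((name, match.start() + 1, match.end()))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical toy sequence for illustration only.
toy_protein = "MKDKQKRKKAAAAKQKKKGGG"
for name, start, end in find_motifs(toy_protein):
    print(f"{name}: residues {start}-{end}")
```

Reporting 1-based inclusive coordinates keeps the output directly comparable with residue ranges written in the style "residues 2-9".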
Co-localization of the HA-tagged ASXL1 protein with co-transfected FLAG-tagged MLL or BMI-1, both of which have previously been shown to be localized to the nucleus (Hanson et al., 1999), is consistent with the prediction that endogenous ASXL1 is localized to the nucleus (H. Brock, unpublished observations). There are several potential PEST sequences (Rechsteiner and Rogers, 1996; Rogers et al., 1986) for proteolytic degradation distributed throughout the predicted proteins: residues 93-143, 786-825, 838-860, and 1,126-1,140 of ASXL1; and residues 92-143, 452-477, 790-805, 936-958, 1,185-1,203, and 1,290-1,308 of Asxl1, two of which are conserved with ASXL1 (Figure 2-3A). Hence Asxl1 and ASXL1 are predicted to be unstable proteins. There is a serine-rich region near the N-terminus and a glycine-rich region near the middle of the mammalian ASX-like-1 proteins (Figure 2-3A). Serine contributes 11.6% of the total amino acids of ASXL1 and 12.4% of Asxl1, both higher than the 7.8% observed for Drosophila ASX. ASXL1 is also rich in leucine and proline, at 8.8% and 8.6% of the total amino acids respectively; the corresponding values for Asxl1 are 8.9% and 8.0%. Neither the putative AT hook motif (RGRP) nor the nucleotide-binding motif of Drosophila ASX (Sinclair et al., 1998b) is conserved in ASXL1 or Asxl1. There are two regions of high sequence conservation between the mammalian ASX-like-1 predicted proteins and Drosophila ASX, indicated on the sequence alignment in Figure 2-3A and illustrated by the domain architecture diagram shown in Figure 2-3B. The first extends from residues 249 to 368 of ASXL1, and is 46% identical and 62% similar to Drosophila ASX.
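Residue composition figures such as the serine, leucine, and proline percentages quoted above are simple counts over the predicted protein. A minimal sketch follows, assuming a one-letter amino acid string as input; the function name `residue_percentages` and the toy sequence are hypothetical, and the real ASXL1 and Asxl1 sequences are not reproduced here.

```python
from collections import Counter


def residue_percentages(protein, residues="SLP"):
    """Percentage of the total amino acids contributed by each listed residue."""
    sequence = protein.upper()
    if not sequence:
        raise ValueError("empty sequence")
    counts = Counter(sequence)
    return {aa: round(100.0 * counts[aa] / len(sequence), 1) for aa in residues}


# Hypothetical 10-residue toy sequence: 3 Ser, 2 Leu, 1 Pro.
toy_protein = "MSSPLSAGLK"
print(residue_percentages(toy_protein))
```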
This region is also highly conserved in ASX-like predicted proteins from other vertebrates, as shown in Figure 2-4. I have termed this region the ASX Homology (ASXH) domain, and it is encoded by exons 9, 10, and 11 of ASXL1 (Table 2-3) and Asxl1 (Table 2-4; also see below). Overall, this putative domain does not exhibit primary sequence conservation with any known domains. The first half of the domain is rich in conserved hydrophobic and charged residues, whereas the latter half of the domain contains numerous conserved charged residues, particularly glutamic acid and lysine. Structural predictions of the ASXH domain suggest that it is predominantly α-helical. Within the ASXH domain there are two ΦXXΦΦ motifs (where Φ is any hydrophobic residue) similar to the LXXLL conserved putative nuclear receptor (NR) binding motifs, at residues 264 to 268 and 284 to 288 of ASXL1 (Figure 2-3A).

Figure 2-3. A) Predicted amino acid sequence alignment of human ASXL1, mouse Asxl1, and Drosophila ASX. Sequences were aligned using ClustalW and manual adjustment, and formatted using BOXSHADE. Conserved amino acids are shown with black backgrounds, conservative amino acid changes are indicated with grey backgrounds, and gaps are indicated by dashes. Conserved nuclear receptor (NR) binding motifs, ASXH region (dotted underline), and PHD domains are indicated for ASXL1, Asxl1, and ASX. Nuclear localization signals (NLS) and PEST sequences are indicated for ASXL1 and Asxl1. Motifs that wrap across text lines are indicated by three dots (...). B) Domain structure of Drosophila ASX compared to human ASXL1 and mouse Asxl1 predicted proteins. Human and mouse ASX-like-1 predicted proteins have identical domain structure and therefore are shown here as one cartoon, labelled as ASXL1. Legend entries: AT hooks; ASXH domain; NR (LXXLL) motif; putative NLS; single amino acid-rich regions; PHD zinc finger.

[Table 2-3: position and size of the exons within the human ASXL1 cDNA, with splice donor and acceptor sequences. Table body not recoverable.]

[Table 2-4: position and size of the exons within the mouse Asxl1 cDNA, with splice donor and acceptor sequences. Table body not recoverable.]

Figure 2-4. Sequence alignment of the ASXH domains of Drosophila ASX and ASX-like proteins from other species. Sequences used in alignment were: Drosophila ASX (Accession CAA04568), human ASXL1 (Accession CAD27708), mouse Asxl1 (predicted translation of Accession AR072722), and homologous regions of ASX-like proteins from other species, represented either by predicted translations of EST cDNA clones in the dbEST database (NCBI; Accession numbers as indicated) or by Genscan protein predictions from the UCSC Genome annotation project on the Human Nov. 2002 Freeze and Mouse Feb. 2002 Freeze (Genscan Gene Prediction Numbers: ASXL2, NT_005204.107; Asxl2, chr12_2.4; ASXL3, NT_010966.68; Asxl3, chr18_2.478). The sequences were aligned using ClustalW and formatted using BOXSHADE. Conserved amino acids are shown with black backgrounds, similar amino acids have grey backgrounds, gaps are indicated by dashes, and missing sequences are indicated by dots. The conserved nuclear receptor (NR) binding motifs are indicated. Numbers refer to amino acid positions in the corresponding predicted protein fragments shown, with the most N-terminal amino acid defined as position one. Species abbreviations (shown after protein name) are: Hs, Homo sapiens; Mm, Mus musculus; Ss, Sus scrofa; St, Silurana tropicalis; Xl, Xenopus laevis; Rn, Rattus norvegicus; Bt, Bos taurus; Dm, Drosophila melanogaster.

A C4HC3 cluster at the extreme C-terminus, from residues 1,506 to 1,537 of ASXL1 and residues 1,479 to 1,510 of Asxl1, is the second highly conserved region between the mammalian ASX-like-1 proteins and ASX (Figure 2-3B). It is encoded by exon 13 of ASXL1 and Asxl1 (Table 2-3 and Table 2-4 respectively), and is 100% conserved between ASXL1 and Asxl1. This region is 73% identical and 87% similar to that of Drosophila ASX.
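The arrangement of potential zinc ligands in the C4HC3 cluster described above can be tabulated directly from the sequence. The sketch below is illustrative only: the 32-residue string is an assumption, transcribed from the C-terminal block of the Figure 2-3A alignment (its length matches residues 1,506 to 1,537 of ASXL1) and it may carry extraction errors; the helper names are hypothetical. Because the cluster contains one adjacent Cys-His pair, two alternative ligand assignments are listed.

```python
# Assumption: cluster transcribed from the Figure 2-3A alignment
# (ASXL1 residues 1,506 to 1,537); extraction errors are possible.
CLUSTER = "CACSLKAMIMCQGCGAFCHDDCIGPSKLCVLC"


def candidate_ligands(seq):
    """0-based positions of every Cys or His, the potential zinc ligands."""
    return [i for i, aa in enumerate(seq) if aa in "CH"]


def spacings(positions):
    """Number of residues lying between consecutive ligand positions."""
    return [b - a - 1 for a, b in zip(positions, positions[1:])]


candidates = candidate_ligands(CLUSTER)

# Locate the adjacent Cys-His pair; either member could serve as the 5th ligand.
adjacent = [(a, b) for a, b in zip(candidates, candidates[1:]) if b - a == 1]
cys_pos, his_pos = adjacent[0]

with_his = [p for p in candidates if p != cys_pos]  # His taken as 5th ligand
with_cys = [p for p in candidates if p != his_pos]  # Cys taken as 5th ligand

print("His as 5th ligand:", spacings(with_his))
print("Cys as 5th ligand:", spacings(with_cys))
```

Spacing lists of this kind are what one would compare against published PHD and RING consensus ranges; those consensus values are not encoded here.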
I have identified this region as a plant homeodomain (PHD) zinc finger, using multiple sequence alignments of the C4HC3 cluster to known zinc finger domain families and manual inspection, as this domain was not recognized by SMART or other database domain searches. PHD and RING zinc finger domains are similar in that they each possess two zinc-chelating pockets and form a treble clef fold with cross-brace topology; however, they are distinctly different protein domains, as evidenced by specific structural differences such as the conserved helical region C-terminal to the final pair of cysteines within RING but not PHD domains (Aravind et al., 2003; Kosarev et al., 2002; Matthews and Sunde, 2002). The zinc finger domains of ASX and ASX-like-1 proteins are more similar to the PHD type than to the RING type, and they lack the C-terminal helical extension found in RING proteins since they occur at the very C-terminus of the protein. The PHD domains of ASX and ASX-like-1 share some interesting features (Figure 2-5). The spacing between each conserved putative zinc-binding position is at the small end of the consensus range for a PHD domain. The number of residues between the 6th and 7th positions is smaller than the PHD consensus and is more consistent with the consensus for a RING domain. However, because the 4th putative zinc-binding position is Cys and not His, the C4HC3 cluster cannot be a RING domain. The 5th putative zinc-binding position could be His or Cys, since these residues are adjacent to one another. There is a precedent for a PHD domain with eight cysteines, found in wild-type ATRX (Picketts et al., 1998). Mutational analysis of the KAP-1 corepressor indicates that either His or Cys can be tolerated at the 5th position of the PHD domain, with only a slight further reduction in transcriptional activity (Capili et al., 2001).

v.
Genomic organization of human ASXL1

Comparison between the 6,864 bp ASXL1 cDNA contig and human genomic sequences present in public databases revealed that the ASXL1 gene consists of 13 exons and 12 introns, between the approximate positions 30,685K and 30,775K on chromosome 20, spanning 81 kb. The exon-intron organization of ASXL1 is shown schematically in Figure 2-6A. The position and size of the exons within the cDNA are listed in Table 2-3. The 5' donor and 3' acceptor sites fulfill the gt and ag splice site rules respectively (Table 2-3) (Kozak, 1991). The 5' UTR of ASXL1 consists of 258 bp, within exon 1. The entire 3' UTR is contained within exon 13, which is by far the largest exon at 4,887 bp. There are two large 5' introns with lengths of 7.6 kb and 59 kb. Notably, the smallest exon (exon 3), an internal translated exon, is only 3 bp long. The smallest internal translated exon previously described in humans is 4 bp, from the TNN1 gene (Zhang, 1998). Zhang argues that there is no minimum size constraint on this type of exon (Zhang, 1998). We confirmed the sequence of the ASXL1 5' splice donor site of exon 2 and the 3' splice acceptor site of exon 4 by PCR of human genomic DNA in order to rule out the possibility of sequencing errors within genomic database sequences. Because the corresponding genomic sequence is in agreement with genomic database contigs, and because the cDNA sequence of our 4,926 bp clone across the region corresponding to exons 2 to 4 has been confirmed independently by multiple ESTs of ASXL1, it is unlikely that our identification of exon 3 is the result of a sequencing error.

vi. Genomic organization of mouse Asxl1

A comparison of the Asxl1 cDNA contig to mouse genomic databases resulted in the identification of 13 exons and 12 introns on chromosome 2 band 2H1, between positions 154,140K and 154,212K. The genomic sequence is from mouse strain C57BL/6J, and the locus spans 58 kb.
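The gt/ag splice-site rule referred to above can be checked mechanically once intron sequences are in hand. The following is a minimal sketch, assuming each intron is supplied as a plain string running from its 5' donor to its 3' acceptor; the function names and the two toy introns are hypothetical and are not ASXL1 or Asxl1 sequence.

```python
def follows_gt_ag_rule(intron):
    """True if the intron begins with the gt donor and ends with the ag acceptor."""
    seq = intron.lower()
    # Require at least four bases so the donor and acceptor cannot overlap.
    return len(seq) >= 4 and seq.startswith("gt") and seq.endswith("ag")


def check_introns(introns):
    """Map each intron name to whether it satisfies the gt/ag rule."""
    return {name: follows_gt_ag_rule(seq) for name, seq in introns.items()}


# Hypothetical toy introns for illustration only.
toy_introns = {
    "intron_1": "gtaagtcccttttcag",  # canonical gt...ag
    "intron_2": "gcaagtcttcag",      # gc donor, fails the rule
}
print(check_introns(toy_introns))
```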
The exon-intron organization of Asxl1 is shown schematically in Figure 2-6B, and is very similar to that of human ASXL1 (Figure 2-6A). The position and size of the exons within the cDNA are listed in Table 2-4. The 5' donor and 3' acceptor sites fulfill the gt and ag splice site rules respectively (Kozak, 1991); however, the sequence of the first 5' splice donor site is uncharacterized, as it fell within a gap between adjacent mouse whole genome shotgun supercontigs at the time of analysis (Table 2-4; see http://www.ncbi.nlm.nih.gov/genome/seq/NCBIContigInfo.html). The 5' UTR of Asxl1 consists of 66 bp, within exon 1. The entire 3' UTR is contained within exon 13, which is the largest exon at 4,768 bp. There are two large 5' introns with lengths of 5.9 kb (estimated) and 34.6 kb. Identical in coding sequence to the corresponding region of the ASXL1 locus, the smallest exon of Asxl1 (exon 3) is 3 bp long, and is also an internal translated exon. The exon-intron boundaries from exons 5 to 13 were experimentally confirmed by sequence analysis of subclones of the mouse genomic BAC clone #gs12943 from strain 129/SvJ (Incyte Genomics), which contains part of the Asxl1 locus.

[Figure 2-5: PHD domains of ASX and ASX-like-1 proteins. Figure content and caption not recoverable.]

Figure 2-6. Genomic structure of the human ASXL1 gene (A) and the mouse Asxl1 gene (B). Exons are shown as boxes, and introns are represented as lines. The position of the 5' UTR (no shading), the coding region (black), and the 3' UTR (grey) are indicated.

vii. Analysis of human ASXL1 expression

Northern blots were analyzed to determine the expression profile of ASXL1 in human adult tissues, fetal liver, and cancer cell lines (Figure 2-7; done by J. Berger and F. Randazzo; see Fisher et al., 2003). Two transcripts are evident in most samples, with apparent sizes of 8.0 and 6.0 kb, with the 8.0 kb transcript showing a higher expression level than the 6.0 kb transcript in normal somatic tissues. Conversely, in the testis, the 6.0 kb transcript is predominant, and an additional transcript of 5.0 kb is observed which is undetectable in other tissues.
The smaller two transcript sizes agree well with predictions of 4,925 bp and 5,976 bp derived from analysis of ESTs and the location of polyadenylation signals within the ASXL1 cDNA contig sequence. The 8.0 kb transcript appears to be larger than the 6,864 bp predicted from our analysis of cDNAs. This transcript may result from additional 5' UTR, 3' UTR, or alternatively spliced internal exon sequence that is not contained within our cDNA contig. Alternatively, the larger mRNA may migrate aberrantly on the gel. ASXL1 expression is very low in many adult tissues, including heart, brain, placenta, skeletal muscle, pancreas, spleen, prostate, small intestine, colon, peripheral blood leukocytes, bone marrow, and fetal liver (Figure 2-7, and data not shown). Expression is undetectable in adult liver and kidney (data not shown). ASXL1 is expressed at moderate levels in thymus, ovary, lymph node, and appendix, and is highly expressed in testis relative to the beta-actin control hybridization signal. Testis is the only tissue examined in which the 5.0 kb transcript of ASXL1 is evident. The relative expression levels of the three ASXL1 transcripts are also different in testis compared to other tissues, as the 6.0 kb transcript shows the highest expression, followed by the 5.0 kb, and then the 8.0 kb transcript. The Serial Analysis of Gene Expression (SAGE; NCBI) tag AGGCTGCCCA corresponds to position 6,793 to 6,802 bp of the ASXL1 cDNA contig and provides additional information on ASXL1 expression. This SAGE tag is expressed in libraries derived from normal prostate, colon epithelial tissue, brain, pancreas, and breast mammary epithelium.

Figure 2-7. Northern blot analysis of human ASXL1 expression in (A) adult tissues and fetal liver, and (B) cancer cell lines.
The size of marker RNAs in kb is indicated to the left of each panel, and the tissue or cell line is indicated above each lane. PBL, peripheral blood leukocytes. Hybridization to β-actin was used as a loading control (narrow panel below each blot).

In contrast to its generally low expression in tissues, ASXL1 is expressed at higher levels in some of the cancer cell lines examined (Figure 2-7; done by J. Berger and F. Randazzo). In cell lines HeLa S3, K562, Burkitt's Raji, and SW480, both the 8.0 and 6.0 kb transcripts appear to have the same relative intensity, whereas in normal tissues the 6.0 kb transcript is expressed at a much lower level than the 8.0 kb transcript. As in all somatic tissues, the 5.0 kb transcript is undetectable in all cancer cell lines examined. The ASXL1 SAGE tag reported above is expressed in a range of primary tumor and carcinoma cell line libraries derived from prostate, colon epithelia, brain, pancreas, ovary, and breast. Most notable is the variety of brain tumor types in which this SAGE tag was expressed, including astrocytoma, glioblastoma, oligodendroglioma, and medulloblastoma primary tumors. Several ESTs derived from normal human tissue were identified by BLAST searches or by inspection of putative spliced ESTs shown on the UCSC Genome Browser. These ESTs contain sequence identical to portions of the ASXL1 cDNA contig within the ORF as well as novel sequence, and may therefore represent alternative splice variants involving internal translated exons. Comparison of human genomic DNA (Nov. 2002 Freeze) to these EST putative splice variants predicts one subgroup that contains an additional novel exon after exon 4, at the end of which the transcript is terminated, resulting in a cDNA at most 600-700 bp long. The existence of such short cDNAs is not supported by our Northern blot expression analysis of ASXL1 (Figure 2-7); however, it is possible that their abundance is below the level of detection.
Another example of a possible ASXL1 splice variant that was identified in EST database searches is represented by a single EST clone (GenBank AW235187) from pooled germ cell tumor samples. This putative splice variant is not fully characterized; however, the sequence contained in the EST predicts a transcript whose 3' end contains the last 10 bp of exon 11, spliced to the first 211 bp of exon 12, which is then spliced into exon 13 at a position 4,688 bp in the full length ASXL1 cDNA using cryptic splice donor and acceptor sites, and terminates using the poly(A) signal sequence at 4,882 bp. The predicted translation of this putative splice variant transcript contains sequence corresponding to the predicted ORF between the 360 aa and 432 aa positions of ASXL1, which then undergoes a frame shift after the cryptic 3' splice junction, thereby eliminating sequence C-terminal to the 432 aa position, including the PHD domain, and replacing it with 17 aa of nonhomologous sequence before the translation stop. If the entire putative splice variant transcript represented by EST AW235187 contains normally spliced exons 1 through 11, then the predicted transcript would be approximately 2 kb in length, and the predicted protein generated would be truncated shortly after the ASXH domain, have a length of 449 aa, and be localized to the nucleus. We did not detect a putative splice variant of this size in normal human testis tissue by Northern blot analysis (Figure 2-7), suggesting it may be specific to the germ cell tumour sample.

viii. Analysis of mouse Asxl1 expression

Northern blots were analyzed to determine the expression profile of Asxl1 in mouse adult tissues and the embryonic stem (ES) cell line CCE, and to confirm the predicted transcript structure (Figure 2-8). Using the BclI-3' probe derived from the 3' end of the predicted Asxl1 ORF, three transcripts are evident in most samples, with apparent sizes of 7.0, 6.0, and 4.7 kb.
The highest MW transcript of 7.0 kb shows a higher expression level than the other two transcripts in all tissues analyzed. The smallest transcript size of 4.7 kb agrees well with the prediction of 4,637 bp derived from analysis of ESTs and the location of the first polyadenylation signal within the Asxl1 cDNA contig sequence. However, the 6.0 and 7.0 kb transcripts appear to be larger than the 5,636 and 6,545 bp predicted respectively from our analysis of alternatively polyadenylated cDNAs. These transcripts may result from additional 5' UTR, 3' UTR, or alternatively spliced internal exon sequence that is not contained within the Asxl1 cDNA contig, or the larger mRNAs may migrate aberrantly on the gel. Overall, Asxl1 expression is low in heart, skeletal muscle, and possibly salivary gland; however, the relative GAPDH control hybridization signal for the latter tissue is also quite low. Asxl1 is expressed at higher levels in brain, kidney, and lung relative to the GAPDH control hybridization signal. Using the probe MX-G4, which was derived from the 5' end of Asxl1, expression is also observed in CCE ES cells; however, only two transcripts are detectable, at 7.2 and 5.4 kb. Since the CCE ES poly(A)+ RNA sample was run on a different gel than the adult tissue poly(A)+ RNA samples, it is possible that the differences in estimated MW of the transcripts simply reflect errors due to different electrophoretic conditions. The pattern of separation between the two Asxl1 transcripts in the CCE ES cell sample suggests that these transcripts correspond to the 7.0 kb transcript and the 4.7 kb transcript seen in adult tissues. The middle 6.0 kb Asxl1 transcript seen in adult tissue samples is undetectable in CCE ES cells, possibly due to limits in detection sensitivity, as the blot had previously been stripped and re-probed several times.
Consistent with this, the GAPDH control hybridization signal was also faint (Figure 2-8F); however, it is also possible that the middle 6.0 kb transcript of Asxl1 is not expressed in ES cells. Northern blot analysis using the BamHI-3' probe derived from the 3' end of the Asxl1 cDNA contig within the 3' UTR, on the same membranes as above following stripping, revealed one prominent transcript in all samples at 7.0 kb. Indeed, this probe was predicted to hybridize predominantly to the largest MW transcript of 7.0 kb based on the model of three alternatively polyadenylated Asxl1 cDNAs. This major transcript band exactly overlaps the position of the highest MW band identified using the BclI-3' probe of Asxl1, and I therefore conclude that both probes detect the same transcript of Asxl1. Using the BamHI-3' probe, an additional smaller transcript was detectable in kidney and brain tissue only, at 2.5 kb. This transcript was not detectable above background in the previous Northern blot using probe BclI-3'. The structure of this alternative transcript is unknown, as it is not explained by the predicted exon/intron structure of our Asxl1 full length cDNA contig. A detailed poly(A)+ RNA RT-PCR analysis of the mouse Asxl1 exon/intron structure should allow characterization of this alternative transcript.

Mouse embryos were analyzed by RNA in situ hybridization using a riboprobe for Asxl1 in order to obtain expression information within embryonic tissues. Analysis of several whole mount 10.5 dpc embryos showed ubiquitous expression of Asxl1, though with varying levels of staining, in all embryonic tissues (Figure 2-9 A,B). The strongest staining was seen in the telencephalic vesicles, branchial arches, forelimb and hindlimb buds, dorsal root ganglia, and the tail bud. A lesser degree of staining was observed in the paraxial mesoderm, with minimal to no detectable staining in the heart and liver primordia.
General staining in the cephalic region may be a result of reagent trapping in the lumen, as the level of staining varied between embryos; however, the strong staining in the telencephalic vesicles is robust in all replicate samples. As a positive control, littermate embryos were concurrently stained with a riboprobe for Hoxc8, resulting in the expected staining pattern, indicating that the observed expression pattern obtained with the Asxl1 riboprobe is not due to background staining (see Appendix Figure A-5).

Figure 2-8. Northern blot analysis of mouse Asxl1 expression in adult tissues (A,C), and day 0 CCE embryonic stem (ES) cell line (B,D). Transcripts detected by the Asxl1 probe BclI-3' derived from the 3' end of the coding region (A), and the MX-G4 probe derived from the 5' end of the coding region of Asxl1 (B), are compared to those detected by the extreme 3' end proximal 3' UTR probe BamHI-3' (C,D). The size of marker RNAs in kb is indicated to the left of each panel, and the tissue or cell line is indicated above each lane. Hybridization to GAPDH was used as a loading control (E,F).

Figure 2-9. Asxl1 expression at 10.5 dpc (A), and 11.0 dpc (B) in whole mouse embryos detected by hybridization in situ.

ix. Characterization of other mammalian Asx-like genes

There are two additional Asx-like genes in mice and humans that have not been well characterized experimentally (C.L.F. and Kathleen Millen, unpublished data; Katoh and Katoh, 2003). One of these homologues in humans has been termed ASX-LIKE 2 (ASXL2), and is one of two candidate genes involved in a balanced reciprocal translocation identified in a female patient with complete agenesis of the corpus callosum (ACC), bilateral periventricular nodular heterotopia (PNH), and other developmental nervous system defects (Ramocki et al., 2003). I initially identified cDNAs representing the mouse Asxl2 gene by BLAST searches of the EST database using the mouse Asxl1 cDNA.
The EST clone AW496276 contains Asxl2 sequence homologous to the N-terminal homology (ASXH) domain of Asxl1. Subsequent BLAST and BLAT searches using EST AW496276, the mouse Asxl1 cDNA contig, and the human ASXL1 cDNA contig against the mouse and human genome databases (NCBI and the UCSC Genome Browser) resulted in the identification of loci for the predicted Asx-like genes Asxl2 and Asxl3 in the mouse, and ASXL2 and ASXL3 in humans (Table 2-5). Predicted cDNA, protein, and genomic structure of human ASXL2, and predicted cDNA and protein structure of mouse Asxl2, have recently been described using in silico analyses (Katoh and Katoh, 2003). The predicted genomic structures, cDNAs, and amino acid sequences of these new Asx-like gene family members have not been experimentally confirmed, and as such may contain errors and/or be incomplete. The mammalian Asx-like genes are predicted to generate large cDNAs, and their loci occupy large genomic regions; they are therefore likely to be very complex and to generate multiple alternative splice variants, perhaps leading to multiple protein isoforms (see below). The mouse Asxl2 predicted protein obtained from the Genscan model protein prediction program from the UCSC Bioinformatics server (model protein chr12_2.4) contains a kinesin-like region at the 5' end of the overall predicted protein, which I interpret as the incorrect joining of two adjacent transcriptional units (see Table 2-5). Interestingly, kinesin-like genes are also located chromosomally proximal and 5' to the Asxl1 and ASXL1 genes, indicating that the Asxl1 and Asxl2 genes are located on paralogous regions of different chromosomes (see above, and Katoh and Katoh (2003)). I therefore deleted the kinesin-like region from the mouse Asxl2 predicted protein and used the C-terminal 1,471 aa for my analyses, choosing the new start methionine based on sequence conservation with mouse Asxl1, in particular a predominance of charged R and K residues (see below).
Additionally, it is probable that the N-terminal proximal 110 amino acids of the mouse Asxl3 predicted protein are incorrect, since this region showed no similarity with human ASXL3 nor with other ASX-like proteins (ASXLPs; see below). It is common for automated gene prediction programs to incorrectly predict exons containing 5'UTR and start site sequences, whereas internal and C-terminal exons are more accurately predicted (Wang et al., 2003).

Table 2-5. The Asx-like gene family in mice and humans

Gene       Organism  Chromosomal  Size of predicted  Genomic locus  Accession number/
                     location     protein (aa)       size (kb)      Identifier
Asxl1 (a)  mouse     2H1          1514               58             AR072722
ASXL1 (a)  human     20q11.21     1541               81             AJ438952
Asxl2 (b)  mouse     12A1.1       1370               136            chr12_2.4 and AK036839
ASXL2 (b)  human     2p23.3       1435               139            NM_018263
Asxl3 (c)  mouse     18A2         2298               181            chr18_2.478
ASXL3 (c)  human     18q12.1      2282               168            NT_010966.68

(a) Information on the Asx-like-1 genes is from Fisher et al. (2003) and this thesis chapter; the full length Asxl1 cDNA described herein has not yet been submitted to public databases.
(b) Information on mouse Asxl2 is from Katoh and Katoh (2003) except the predicted genomic locus size, which is based on my in silico analysis of the chr12_2.4 model protein from Genscan (UCSC Bioinformatics); information on ASXL2 is from Katoh and Katoh (2003).
(c) Information on Asx-like-3 genes is based on models in public genome project databases that have not been analyzed further, and therefore likely contain errors; mouse proteins are from Genscan (UCSC Bioinformatics); human proteins are from NCBI automated gene prediction programs.

x. Sequence comparisons of mammalian Asx-like genes

Using the predicted amino acid sequences obtained from gene prediction programs applied to public genome databases and from published sources (listed in Table 2-5; noting the caveats above), I performed multiple pairwise sequence comparisons using a Needleman-Wunsch algorithm between all possible combinations of the mouse and human ASX-like predicted proteins, in order to: a) confirm mouse/human protein homologies based on mouse/human syntenic chromosomal location; b) assess the overall degree of identity and similarity between the proteins; and, most importantly, c) identify novel conserved domains between mammalian family members that are not conserved with the Drosophila ASX protein. The degree of identity between mouse Asxl1 and human ASXL1 (74% identity and 81% similarity; Figure 2-3A) is comparable to that between mouse Asxl2 and human ASXL2 (79% identity; Katoh and Katoh, 2003) and between mouse Asxl3 and human ASXL3 (70% identity and 78% similarity; Appendix Figure A-1). Successive pairwise alignments between the mouse Asxl1, Asxl2, and Asxl3 predicted proteins revealed similarities between all three sequences to be within the 31 to 38% range, with the similarity between Asxl1 and Asxl2 (38%; Appendix Figure A-2) slightly higher than that between Asxl1 and Asxl3 (31%; Appendix Figure A-3) and between Asxl2 and Asxl3 (31%; Appendix Figure A-4). Overall, the degree of identity between any two mouse or human non-orthologous ASX-like predicted proteins was low, ranging from 22 to 27%, while the degree of similarity varied between 31 and 38%. Once the predicted cDNA and protein sequences of all mouse and human Asx-like genes have been experimentally confirmed, the above sequence comparisons between homologues can be verified and used to investigate evolutionary relationships between Asx-like genes.

xi. Conserved regions within mammalian ASX-like proteins

Inspection of the multiple pairwise comparisons between the mammalian ASX-like family members (Figure 2-3A and Appendix Figures A-1 through A-4) revealed novel conserved regions or motifs that were not conserved with the Drosophila ASX protein, in addition to the previously identified conserved regions discussed above (Table 2-6). Successive global pairwise comparisons (using a Needleman-Wunsch algorithm) were performed rather than a progressive multiple sequence alignment (using ClustalW), as the latter method did not yield acceptable results: the identified conserved regions showed extensive misalignment of at least one family member at any given sequence location. This is likely due to the differences in overall sequence lengths of ASX-like predicted proteins (Table 2-5) and to stretches of non-conserved amino acids of varying length between the dispersed conserved regions. The two highly conserved domains that were identified by comparing the Drosophila ASX protein sequence to that of mouse Asxl1 and human ASXL1 are also highly conserved in the other mammalian ASX-like predicted proteins, namely the N-terminal ASXH domain with its two ΦXXΦΦ motifs, similar to the consensus LXXLL NR binding motifs, and the C-terminal PHD domain (Table 2-6). By comparing the mammalian ASX-like predicted proteins to each other, four new conserved regions were identified that are not conserved with Drosophila ASX (Table 2-6). The first two new conserved motifs are N-terminal to the ASXH domain. The first of these is a single ΦΦXΦΦ motif. The second contains the consensus sequence GTSPLACLNAMLH, which is 100% identical between all mammalian ASX-like predicted proteins and which I have named ASXL-BOX 1 (Figure 2-10). The third new conserved motif is downstream of the ASXH region, is termed ASXL-BOX 2, and contains the core sequence TGARTLADIKA(R/K)A(Q/L)2 (Figure 2-11).
The fourth new conserved region contains two consecutive ΦXXΦΦ motifs flanked by prolines and separated by 5-6 amino acids including a glycine, with a consensus sequence of S2(V/I)X3NPLV(T/M)QLLQGX1-2(L/V)PLE(Q/K)(V/I)LP, and is located between ASXL-BOX 2 and the C-terminal PHD domain (Figure 2-12). The PHD domain is highly conserved among all Drosophila, mouse, and human ASX-like predicted proteins (Figure 2-13). Importantly, the order in which these conserved motifs occur within the ASX-like predicted proteins is conserved, and all mammalian ASX-like predicted proteins contain all of the conserved elements listed in Table 2-6 (Figure 2-14).

Table 2-6. [Summary of the conserved motifs and their amino acid positions in the ASX-like predicted proteins; the table was rotated in the source and its contents are not recoverable from the extracted text.]

Figure 2-10. Sequence alignment of the N-terminal regions of Drosophila ASX and mouse and human ASX-like proteins.
Sequences used in alignment were: Drosophila ASX (Accession CAA04568), human ASXL1 (Accession CAD27708), mouse Asxl1 (predicted translation of Accession AR072722), the ASX-like-2 proteins from human (Accession NP_060733; Katoh and Katoh, 2003) and mouse (Accession BAC29602.1; Katoh and Katoh, 2003), and ASX-like-3 proteins predicted by the Genscan program from the UCSC Genome annotation project on the Human Nov. 2002 Freeze and Mouse Feb. 2002 Freeze (Genscan Gene Prediction Numbers: ASXL3, NT_010966.68; Asxl3, chr18_2.478; these predicted proteins likely contain errors). The sequences were aligned using ClustalW, and formatted using BOXSHADE. Conserved amino acids are shown with black backgrounds, similar amino acids have grey backgrounds, and gaps are indicated by dashes. Conserved sequences representing the nuclear receptor (NR) binding motifs, and a region with no known homologies (termed ASXL-BOX 1), are indicated. Numbers refer to amino acid positions in the corresponding predicted protein fragments, with the most N-terminal amino acid shown defined as position one. Refer to Table 2-6 for a summary of all conserved motifs and their corresponding amino acid positions in the predicted ASX-like protein family, and to the Appendix for pairwise alignments.

Figure 2-11. Sequence alignment of the central regions of mouse and human ASX-like proteins.
Sequences used in alignment were: human ASXL1 (Accession CAD27708), mouse Asxl1 (predicted translation of Accession AR072722), and homologous regions of ASX-like proteins from human and mouse, represented by Genscan protein predictions from the UCSC Genome annotation project on the Human Nov. 2002 Freeze and Mouse Feb. 2002 Freeze (Genscan Gene Prediction Numbers: ASXL2, NT_005204.107; Asxl2, chr12_2.4; ASXL3, NT_010966.68; Asxl3, chr18_2.478; these predicted proteins likely contain errors). The sequences were aligned using ClustalW and manual adjustment, and formatted using BOXSHADE. Conserved amino acids are shown with black backgrounds, similar amino acids have grey backgrounds, and gaps are indicated by dashes. A highly conserved region with no known domain homologies is termed ASXL-BOX 2, as indicated. Numbers refer to amino acid positions in the corresponding predicted protein fragments, with the most N-terminal amino acid shown defined as position one. Refer to Table 2-6 for a summary of all conserved motifs and their corresponding amino acid positions in the predicted ASX-like protein family, and to the Appendix for pairwise alignments.

Figure 2-12. Sequence alignment of a C-terminal region containing two nuclear receptor (NR) binding domains of mouse and human ASX-like proteins. Sequences used in alignment were: human ASXL1 (Accession CAD27708), mouse Asxl1 (predicted translation of Accession AR072722), and homologous regions of ASX-like proteins from human and mouse, represented by Genscan protein predictions from the UCSC Genome annotation project on the Human Nov. 2002 Freeze and Mouse Feb. 2002 Freeze (Genscan Gene Prediction Numbers: ASXL2, NT_005204.107; Asxl2, chr12_2.4; ASXL3, NT_010966.68; Asxl3, chr18_2.478; these predicted proteins likely contain errors).
The sequences were aligned using ClustalW and formatted using BOXSHADE. Conserved amino acids are shown with black backgrounds, similar amino acids have grey backgrounds, and gaps are indicated by dashes. Conserved sequences representing the NR binding motifs are indicated. Numbers refer to amino acid positions in the corresponding predicted protein fragments, with the most N-terminal amino acid shown defined as position one. Refer to Table 2-6 for a summary of all conserved motifs and their corresponding amino acid positions in the predicted ASX-like protein family, and to the Appendix for pairwise alignments.

Figure 2-13. Sequence alignment of the C-terminal proximal PHD domains of mouse and human ASX-like proteins. Sequences used in alignment were: Drosophila ASX (Accession CAA04568), human ASXL1 (Accession CAD27708), mouse Asxl1 (predicted translation of Accession AR072722), and other ASX-like proteins from human and mouse, represented by Genscan protein predictions from the UCSC Genome annotation project on the Human Nov. 2002 Freeze and Mouse Feb. 2002 Freeze (Genscan Gene Prediction Numbers: Asxl2, chr12_2.4; ASXL3, NT_010966.68; Asxl3, chr18_2.478; these predicted proteins likely contain errors), and ASXL2 (Accession NP_060733). The sequences were aligned using ClustalW and manual adjustment, and formatted using BOXSHADE. Conserved amino acids are shown with black backgrounds, similar amino acids have grey backgrounds, and gaps are indicated by dashes. Numbers to the left of the alignment refer to amino acid positions in the corresponding predicted protein fragments, with the most N-terminal amino acid shown defined as position one. Refer to Table 2-6 for a summary of all conserved motifs and their corresponding amino acid positions in the predicted ASX-like protein family, and to the Appendix for pairwise alignments. Residues predicted to be involved in Zn2+ binding are numbered below the alignment.
Figure 2-14. [Rotated figure; the caption is not recoverable from the extracted text. It is cited in the text as showing the conserved order of the motifs within the ASX-like predicted proteins.]

Katoh and Katoh (2003) have recently compared their in silico generated gene and protein predictions of human ASXL2 and mouse Asxl2 to my experimentally confirmed human ASXL1 predicted protein (Fisher et al., 2003) and the Drosophila ASX sequence (Sinclair et al., 1998b), and identified a large region of sequence homology in the N-termini of ASXL1 and ASXL2 which they term the ASXN (ASX N-terminal) domain. This ASXN domain includes the N-terminal ΦΦXΦΦ motif and the ASXL-BOX 1 motif that I identified above as being conserved in all mammalian ASX-like predicted proteins, but which are not conserved in Drosophila ASX. However, the ASXN domain is not conserved throughout its length among all three mammalian ASX-like predicted proteins. Katoh and Katoh (2003) also identified the ASXH domain within the ASX-like-2 predicted proteins, but renamed it the ASXM ("middle") domain. The ASXH (ASX homology) domain was so named as it is the only domain other than the PHD zinc finger that is conserved in both Drosophila and mammalian ASX-like predicted proteins.
Katoh and Katoh (2003) did not use terminology to differentiate between conserved regions present only within the mammalian ASX-like family and regions conserved with Drosophila ASX, as I have done in this thesis chapter, which perhaps explains their inconsistent use of conserved region terminology. An in silico prediction of the ASXL3 gene has also recently been published (Katoh and Katoh, 2004).

xii. Analysis of mouse Asxl2 expression

In order to investigate Asxl2 expression, a probe generated from the EST AW496276 was hybridized to the same poly(A)+ Northern blots used to analyze Asxl1 expression (see above). Six transcripts were identified in adult mouse tissues, whereas three transcripts were detectable in CCE ES cells (Figure 2-15). The two most intense bands were detectable in all samples analyzed: kidney, lung, heart, salivary gland, skeletal muscle, brain, and CCE ES cells. Overall, Asxl2 expression is low in heart, skeletal muscle, and possibly salivary gland, whereas Asxl2 is expressed at higher levels in brain, kidney, and lung relative to the GAPDH control hybridization signal. This overall pattern of relative transcript abundance across tissues is very similar to that observed for Asxl1 (see Figure 2-8). In all tissues analyzed, the strongest transcript signal was seen at 9.5 kb, followed by the transcript at 4.5 kb. Four faint bands were clearly detectable only in brain and kidney, at 14.4, 12.0, 6.5, and 2.4 kb. In CCE ES cells, the predominant band is at 9.5 kb, followed by a band at 4.5 kb, and the faintest band at 2.4 kb. In tissues other than brain and kidney, and in CCE ES cells, the faintest bands could either have been below the level of detection, or perhaps not all possible transcripts are expressed in every tissue type. The expression patterns of human ASXL2, human ASXL3, and mouse Asxl3 have not yet been investigated experimentally, and I have not analyzed SAGE data available in silico for these genes. However, the complexity of transcript distribution for the Asxl2 gene indicates that it is a complex locus with at least six alternative transcripts, which must encode multiple protein isoforms since the longest ORF is longer than the smallest alternative transcript detected by Northern blot analysis. It is likely that the ASXL2, ASXL3, and Asxl3 gene loci are also very complex.

Figure 2-15. Northern blot analysis of mouse Asxl2 expression in adult tissues (A), and day 0 CCE embryonic stem (ES) cell line (B). The same blots used for Northern analysis of Asxl1 were rehybridized with a probe derived from an Asxl2 EST clone. The size of marker RNAs in kb is indicated to the left of each panel, and the tissue or cell line is indicated above each lane. Hybridization to GAPDH was used as a loading control (C,D).

III. Discussion

I have characterized the cDNA, predicted protein, and genomic structures of Asx-like-1 from human and mouse, which are novel mammalian homologues of the Drosophila Asx gene (Fisher et al., 2003), and have identified two novel mammalian Asx-like genes, Asx-like-2 and Asx-like-3. It is likely that these three genes represent the entire Asx-like gene family in mammals, since I could not identify additional homologues in searches of mouse and human genome and EST databases. I have provided experimental evidence for expression of the genes ASXL1, Asxl1, and Asxl2. The ASXL2, ASXL3, and Asxl3 gene and protein predictions are based solely on in silico analysis using Genscan (UCSC Bioinformatics) and NCBI gene prediction bioinformatics programs, and on the results of in silico analysis of ASXL2 by Katoh and Katoh (2003). The predicted proteins for all three mammalian ASX-like homologues show little similarity to ASX with the exception of two distinct regions: 1) an amino-terminal region that I have called the ASX homology domain (ASXH); and 2) a carboxy-terminal PHD domain (Table 2-6, Figure 2-3).
By comparing all mouse and human ASX-like predicted proteins to each other, I identified four additional regions that are conserved within the mammalian ASXLP family but not in Drosophila ASX (Table 2-6; Appendix Figures A-1 to A-4). Identification of novel conserved regions may assist in functional characterization of the mammalian ASXLPs, since conserved regions often represent important functional domains whose sequence was constrained through evolution. Two of the regions conserved within the mammalian ASXLPs, namely ASXL-BOX 1 and ASXL-BOX 2, do not exhibit similarity with any known protein domains present within databases. These conserved regions may represent important structural constraints, form novel functional domains specific to mammalian ASXLP function, or have sufficiently low similarity to known conserved domains so as to escape detection by database searching. Future analysis of the ASXL-BOX 1 and ASXL-BOX 2 conserved regions will distinguish between these possibilities. There are five conserved ΦXXΦΦ motifs (where Φ is any hydrophobic residue), similar or in some instances identical to the originally defined signature LXXLL motif for nuclear receptor (NR) binding, within mammalian ASXLPs (Figure 2-10). The ΦXXΦΦ motifs are generally referred to as NR binding domains or motifs in the literature. Two NR binding domains (NRBDs) are within the ASXH domain that is conserved with Drosophila ASX, while one is N-terminal, and the remaining two are adjacent to one another and are located C-terminal to the ASXH domain.
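As an illustration only, the ΦXXΦΦ pattern described above can be written as a simple sequence scan. The sketch below is not the method used in this chapter; the function name `find_nr_motifs`, the set of residues counted as hydrophobic (Φ), and the test peptide are all assumptions made for the example.

```python
import re

# Assumption: residues treated as hydrophobic (Φ); the text does not list them.
HYDROPHOBIC = "AVILMFWY"

# Zero-width lookahead so that overlapping five-residue windows are all reported.
_MOTIF = re.compile(rf"(?=([{HYDROPHOBIC}]..[{HYDROPHOBIC}]{{2}}))")


def find_nr_motifs(sequence):
    """Return (1-based start, motif) for every ΦXXΦΦ window in a protein sequence."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in _MOTIF.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical peptide containing a single classical LXXLL motif.
    print(find_nr_motifs("GSPKLQELLDR"))
```

Because a lookahead is used, overlapping candidate motifs are reported separately. A hit marks only a candidate motif by sequence pattern; it says nothing about whether the region actually binds an NR or an NR coregulator.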
NRs are DNA-binding transcription factors which function in the context of chromatin and act together with chromatin-remodeling factors, coactivators, and corepressors to switch target genes between active and repressed states, by binding to coactivators in the presence of ligand and to corepressors in the absence of ligand, respectively (Glass and Rosenfeld, 2000; Hsiao et al., 2002; Kraus and Wong, 2002; Rosenfeld and Glass, 2001; Urnov and Wolffe, 2001b). NR binding domains mediate interaction between the NR and NR coregulatory proteins (Heery et al., 1997; Nagy et al., 1999). Structural and mutational analyses indicate that the NRBD forms an amphipathic α-helix, and that determinants of functional specificity may reside in residues carboxy-terminal to the motif (Darimont et al., 1998; McInerney et al., 1998; Perissi et al., 1999). Several motifs very similar to classical LXXLL motifs have been identified within the NR coregulators CBP and ACTR/RAC3/AIB1/pCIP/TRAM-1 that mediate interaction with each other by formation of a cooperatively folded helical heterodimer, whereas the domains in isolation are intrinsically disordered (Demarest et al., 2002). It remains to be seen whether or not this mechanism of coregulator-coregulator interaction is a general phenomenon, though it will likely come to represent a new class of synergistic protein folding through domains similar to NRBDs. Recently, a member of the trithorax Group, the trithorax-related gene (trr), which contains four NR binding motifs in addition to a SET domain, was shown to interact with the ecdysone receptor (EcR) specifically through its second NR binding domain (Sedkov et al., 2003). Ecdysone hormone and its receptor are involved in multiple aspects of eye development, and, correspondingly, TRR was found to be required for eye development (Sedkov et al., 2003).
As expected for a SET-domain containing protein, TRR also acts as a histone methyltransferase that can trimethylate lysine 4 of histone H3 (Sedkov et al., 2003). The results of Sedkov et al. (2003) strongly suggest that the TRR protein acts as an EcR coactivator by modifying the chromatin structure at ecdysone-responsive promoters. It is therefore tempting to consider a similar role to TRR for ASX and mammalian ASXLPs in binding to specific NRs and modifying chromatin structure at target loci. Consistent with a possible role in NR binding, the ASXH domain is predicted to be predominantly α-helical; however, it is also possible that the two NRBDs within this domain and the other three NRBDs within ASXLPs mediate interactions with NR coregulators rather than with NRs themselves. These two possibilities need not be mutually exclusive, since the five NRBDs are present within three separate regions of mammalian ASXLPs and binding to multiple partners is therefore possible. Given the presence of multiple NRBDs in ASXLPs, it would be very interesting to explore the protein binding specificity of these motifs to determine whether or not ASXLPs do indeed bind to NRs or to NR coregulators. The second conserved motif between Drosophila ASX and mammalian ASXLPs, which is also found in chromatin regulatory proteins, is the PHD domain, located at the C-terminus within the ASXLP family members. PHD domains possess a consensus sequence of C-X2-C-X(9-21)-C-X(2-4)-C-X(4-5)-H-X2-C-X(12-46)-C-X2-C, are found in over 400 eukaryotic proteins, and function primarily as protein-protein interaction modules (Aasland et al., 1995; Coscoy and Ganem, 2003; Kosarev et al., 2002).
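For illustration, the PHD consensus quoted above can be transcribed directly into a regular expression. This is a hedged sketch rather than the analysis performed in this chapter: the function name `find_phd_candidates` and the synthetic test sequence are assumptions, and a match indicates only that the Cys/His spacing fits the consensus, not that the region folds as a PHD finger or binds zinc.

```python
import re

# Consensus from the text:
# C-X2-C-X(9-21)-C-X(2-4)-C-X(4-5)-H-X2-C-X(12-46)-C-X2-C
PHD_CONSENSUS = re.compile(
    r"C.{2}C.{9,21}C.{2,4}C.{4,5}H.{2}C.{12,46}C.{2}C"
)


def find_phd_candidates(sequence):
    """Return (1-based start, 1-based end, segment) for non-overlapping matches."""
    seq = sequence.upper()
    return [
        (m.start() + 1, m.end(), m.group(0))
        for m in PHD_CONSENSUS.finditer(seq)
    ]


if __name__ == "__main__":
    # Synthetic sequence (not a real protein) built to satisfy the spacing rules.
    synthetic = (
        "MK" + "C" + "AA" + "C" + "A" * 10 + "C" + "AAA" + "C"
        + "AAAA" + "H" + "AA" + "C" + "A" * 15 + "C" + "AA" + "C" + "KK"
    )
    for start, end, segment in find_phd_candidates(synthetic):
        print(start, end, segment)
```

The spacer ranges are taken verbatim from the consensus; a real search would normally be followed by inspection of the alignment, since the wildcard positions place no constraint on residue type.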
Structural studies of the KAP-1 corepressor PHD domain show that the conserved Cys and His residues cooperatively bind two zinc atoms using a cross-brace topology similar to that of RING domains, and that zinc binding is required both for proper folding of the domain and for transcriptional repression (Capili et al., 2001). It remains to be determined whether or not the ASX-like PHD domains bind zinc, and if so, which residues are involved. It is notable that other members of the PcG (Polycomblike) and trxG (trx, ash1, ash2) and their human homologs have PHD domains (see Figure 2-5), and interaction between the ETP protein E(Z) and the PcG protein PCL, and their conserved human homologs, is mediated by the PHD finger of Polycomblike (Brock and van Lohuizen, 2001; O'Connell et al., 2001). ASX has been shown to interact with the SET domains of E(Z), TRX, SU(VAR)3-9, and ASH1 through its C-terminal region including, but not necessarily limited to, the PHD domain (M. Kyba, T. Rozovskaia, and E. Canaani, unpublished observations). At least one of these interactions is conserved in mammals, as the ASXL1 C-terminus and the SET domain of MLL also interact (C. Fisher, E. O'Dor, and H. Brock, unpublished observations; see Chapter 1). The PHD domain of ASX/ASXL1 may be necessary but not sufficient for SET domain binding; however, a C-terminal fragment of ASXL1 limited to the PHD domain has recently been shown to bind both unmodified and modified histone tails in vitro (E. O'Dor and H. Brock, unpublished observations). Recently the PHD domain has been implicated in an enzymatic function (Coscoy and Ganem, 2003).
The PHD finger of the histone acetyltransferase CREB binding protein (CBP), which functions as a critical coactivator for a variety of transcription factors and as a protein scaffold that stabilizes RNA polymerase II-mediated transcription, is an essential component of the enzymatic core of the acetyltransferase domain (Kalkhoven et al., 2003; Kalkhoven et al., 2002). Several reports have also appeared claiming that PHD domains mediate E3 ubiquitin ligase (E3) activity of their target proteins and thereby mark them for regulated proteolytic degradation in the ubiquitin pathway, including the PHD-containing MIR family herpesvirus proteins and MEKK1, a regulator of cellular signaling (Chang and Karin, 2001; Coscoy and Ganem, 2003). However, subsequent careful analysis of the protein domain structure within those proteins revealed that the domains involved belong to the RING family, and are not PHD domains (Aravind et al., 2003). Therefore the confirmed functional roles of the PHD domain are limited to protein-protein interaction and involvement in acetylation of chromatin proteins. Mutations leading to amino acid substitutions within the PHD domain, or truncation of the PHD domain, of several different proteins (e.g. ATRX, AIRE, CBP, and ING1) correlate with human disease (Kalkhoven et al., 2002, and references therein). Mutations in the autoimmune regulator (AIRE) gene in humans lead to autoimmune polyendocrinopathy-candidiasis-ectodermal dystrophy (APECED), which is a rare autosomal recessively inherited disease that leads to destruction of the endocrine organs (Pitkanen and Peterson, 2003). It is interesting to note that AIRE contains 2 PHD domains and 4 LXXLL motifs, and hence shares some similarity in its complement of domains with ASXLPs, even though the order of domains differs.
Given these similarities, investigations of the mechanism of action of AIRE may provide insight into the roles of mammalian ASXLPs in both normal and disease states. The Brock laboratory is currently investigating the role of the PHD domain of ASX and ASXL1 in protein-protein interactions and intrinsic chromatin binding and/or modifying activities. Continued structure-function analyses will be required to elucidate the various roles of the PHD domain in ASX-like and other chromatin regulatory proteins, both in the context of normal activity and aberrant function in disease states. Analysis of the chromosomal locations of Asx-like genes may also give insight into potential functional roles in human disease. Deletions from 20q11.2 to 20q12, lying between the RPN2 and D20S17 markers, are found in 18/19 patients with myeloid disorders (Roulston et al., 1993). ASXL1 maps outside this region, suggesting that deletion of ASXL1 is not a contributing factor in myeloid disorders. However, a wide range of human primary tumors including sarcomas, and bladder, breast, colorectal, esophageal, gastric, hepatocellular, and pancreatic carcinomas, show frequent amplification on 20q as detected by comparative genomic hybridization (CGH) (Guan et al., 2000a; Guan et al., 2000b; Menghi-Sartorio et al., 2001). Male germ cell and ovarian cancers also show frequent gain of chromosome 20q (Looijenga et al., 2000; Mostert et al., 2000). This is especially interesting because ASXL1 shows the highest relative expression and unique distribution of transcripts in normal testis tissue (see below). Gain of 20q is also associated with dominant changes that contribute to immortalization of human uroepithelial cells in vitro (Cuthill et al., 1999). Resistance to chemotherapeutic agents is a serious clinical problem, and may involve multiple genetic pathways.
Gain of 20q11.2-13.1 is associated with acquisition of drug resistance to tamoxifen in a human breast cancer cell line (Achuthan et al., 2001), and amplification of 20q11.2-12 is observed in a subset of primary human male germ cell tumors that are resistant to cisplatin therapy (Rao et al., 1998). Amplification of the ASXL1 gene at 20q11.21 may be a causative factor for any or all of the aforementioned cancers, or may be a part of a genetic pathway leading from drug sensitivity to resistance. The human ASXL2 gene was identified as part of a balanced reciprocal translocation between chromosomes 2p24 and 9q32 in a juvenile female patient with agenesis of the corpus callosum (ACC), a common brain malformation (Ramocki et al., 2003). This translocation fuses the genomic region containing the first 5 exons of the Gt4-2 gene 5' to that of the last 9 exons of ASXL2, predicting a fusion protein that contains the PHD domain of ASXL2, presumably driven by the Gt4-2 gene promoter. The phenotypes observed in the patient may therefore be due to loss of function of one or both genes, and/or gain of function due to misregulation of functional domains remaining in the fusion protein. The patient also exhibited bilateral chorioretinal and iris colobomas and bilateral periventricular nodular heterotopia (PNH). There are over 150 human syndromes involving ACC, but very few human gene mutations leading to defective callosal development have been identified to date (Ramocki et al., 2003). Although it is not yet known whether or not the predicted fusion protein is expressed in this patient, since the required antibodies are lacking, the findings suggest a role for ASXL2 in human brain development. There is a precedent for involvement of PcG genes in brain development, as young homozygous null bmi-1 mutant mice exhibited severe ataxia, accompanied by tremors and seizures (van der Lugt et al., 1994).
Histological analysis of the bmi-1-/- CNS revealed defects in the cerebellum and hippocampus, with substantial loss or reduction in size of neurons, especially Purkinje cells. The corpus callosum of bmi-1-/- mice, the same region that is affected in the patient with ACC described above, showed extensive gliosis (van der Lugt et al., 1994). Similar to the chromosomal region around ASXL1, the chromosomal region at 2p23, the site of ASXL2, is amplified in a variety of tumour types, including B-cell leukemias and lymphomas, germ cell and gastric tumours, and chondrosarcomas (Katoh, 2003).

Another chimeric fusion protein involving ASXL2 was deposited into public databases by Hosoda et al. (2002; unpublished observations; GenBank accession number AB084281). In this chimeric protein, the Monocytic leukemia zinc finger protein (MOZ) (Borrow et al., 1996) is fused to ASXL2 (which they called ASXH2) such that only a small portion (19 aa) of the N-terminus of ASXL2 is deleted, and none of the conserved domains of ASXL2 listed in Table 2-6 are missing (analyzed by comparison of MOZ-ASXL2 to ASXL2 using the Blast2Seq program). MOZ belongs to the MYST family of histone acetyltransferase (HAT) proteins, is thought to act as a transcriptional coregulator, and interacts with Runt-domain transcription factors, which are implicated in T cell lymphomagenesis (Champagne et al., 1999; Champagne et al., 2001; Pelletier et al., 2003; Pelletier et al., 2002; Sterner and Berger, 2000). MOZ was recently shown to regulate Hox gene expression and segmental identity of the pharynx in zebrafish larvae (Miller et al., 2004). The structure of the MOZ-ASXL2 fusion protein suggests that virtually the entire ORF of ASXL2 is controlled by the MOZ promoter and regulatory sequences. The MOZ-ASXL2 fusion protein was discovered in a patient with therapy-related myelodysplastic syndrome (MDS) with a t(2;8)(p23;p11.2) reciprocal translocation.
No further functional information on the MOZ-ASXL2 fusion protein is available, and it remains to be seen whether the fusion protein plays a causative, or just correlative, role in development of MDS. However, it is intriguing that the two other known MOZ gene fusion proteins, involving the nuclear receptor coactivator protein CBP (which is also a HAT) and TIF2/NcoA2/GRIP1 (which interacts with CBP), have both been implicated in leukemogenesis, possibly via aberrant chromatin acetylation activity (Carrozza et al., 2003; Jacobson and Pillus, 1999; Sterner and Berger, 2000). In Drosophila, dCBP forms part of the TAC1 chromatin remodeling complex along with the MLL homologue TRX (Petruk et al., 2001; Petruk et al., 2004), and also interacts with ASH1, which itself is part of a separate chromatin remodelling complex (Papoulas et al., 1998). Both TRX and ASH1 SET domains interact with the C-terminus of ASX, and ASXL1 interacts with MLL (see Chapter 1 and Chapter 2 Introduction). By extension, given the high degree of sequence similarity between ASXL1 and ASXL2 at the C-terminus including the PHD domain, it is possible that ASXL2 also interacts with human TRX and ASH1 homologues and therefore possibly even with CBP itself. As in the MOZ-ASXL2 fusion, the MOZ portion is fused N-terminally to the CBP and TIF2 proteins, and most of the CBP and TIF2 protein remains intact. It may therefore be illuminating to investigate and compare functions of the three MOZ chimeric fusion proteins, as this may provide further information on ASXL2 (and general ASXLP) function and mechanism of action.

Human ASXL1, mouse Asxl1, and mouse Asxl2 are each expressed in most adult tissues analyzed as one major transcript at 8.0, 7.0, and 9.5 kb respectively, each with two or more minor transcripts.
The level of Asx-like gene expression varies substantially between different adult tissues, implying that expression is differentially regulated between tissues; indeed, this is common among mammalian PcG genes (Gunster et al., 2001). Transcripts of the mouse Asxl1 and Asxl2 genes are relatively highly expressed in adult brain, whereas human ASXL1 is expressed at relatively low levels in brain. This may reflect species differences in tissue-specific expression of the Asx-like-1 gene, as expression of mouse Asxl1 in the kidney was moderate whereas there was no detectable expression of human ASXL1. However, since largely non-overlapping sets of tissues were used for Northern blot analysis of mouse versus human Asx-like-1, this question cannot be adequately addressed at this time. Mouse Asxl2 is ubiquitously expressed in the CNS from 9.5 to 15.5 dpc, which is the period when callosal development begins (K. Millen, unpublished results), and I have shown by whole-mount RNA in situ hybridization that mouse Asxl1 is also expressed in the CNS at 10.5-11.0 dpc. Given the probable role of human ASXL2 in brain development as discussed above, and the observation that Asxl1 and Asxl2 are both expressed in the developing and adult brain, it will be interesting to investigate possible roles for all Asx-like genes in brain development and function.

Like ASXL1 and Asxl1, expression of other mammalian PcG genes is relatively high in gonads (Gunster et al., 2001; van Lohuizen, 1998). Overexpression and differences in patterns of mRNA expression in spermatogenic versus somatic cells are common (Kleene, 2001). Analysis of sectioned adult mouse testes by immunohistochemistry using an α-Asxl1 polyclonal antibody, to obtain cell-specific information on Asxl1 expression within gonads, revealed expression in pachytene spermatocytes and round spermatids up to stage VIII within the seminiferous tubules (C. Suarez-Quian and F. Randazzo, unpublished observations).
It will be interesting to investigate whether Asxl1 has a functional role in spermatogenesis, given the specific, stage-limited expression pattern observed in the testis.

Because of the map position of ASXL1 at human 20q11, the association of this region with amplification in tumors, and increasing evidence for the involvement of chromatin proteins in cancer (Jacobs and van Lohuizen, 2002; Wolffe, 2001), we examined the expression of ASXL1 in a variety of cell lines derived from human tumours. Compared to normal tissues, the minor 6.0 kb transcript showed marked upregulation, to the same level as the major 8.0 kb transcript, in a number of different tumour cell lines. It will therefore be interesting to confirm by quantitative gene expression analysis whether particular alternative transcripts of ASXL1 are overexpressed in primary human carcinomas compared to matched normal tissue.

In addition to the Asx-like gene transcripts discussed above, which were characterized by Northern blot analysis, useful expression information can be gleaned from SAGE gene expression data, and from careful analysis of ESTs mapping to the Asx-like gene loci. SAGE analysis of a tag from the 3' UTR of human ASXL1 extended the range of tissues in which ASXL1 expression is found to include normal prostate, colon epithelial tissue, brain, pancreas, and breast mammary epithelium. Conversely, certain uncharacterized ESTs must be interpreted with caution. The uncharacterized transcripts within the large 5' introns of mouse Asxl1 and human ASXL1 could represent: 1) non-coding transcripts, possibly involved in gene regulation; 2) alternative transcripts from uncharacterized internal exon(s) of the Asx-like-1 gene; 3) an alternative 5' end of the Asx-like-1 gene preceded by an alternative promoter region; 4) a gene other than Asx-like-1. It is probable that transcripts generated from the region represented by the mouse UniGene cluster Mm.172388 do not result in protein products, since their predicted translations do not match any entries in the protein databases. However, the second EST cluster, UniGene Mm.30796, may represent an additional exon of Asxl1 that is not present within the Asxl1 cDNA contig. The two clones within this second EST cluster are from lymph node and thymus, both of which are types of lymphoid tissue, and therefore these ESTs could represent rare lymphoid-specific transcripts of Asxl1. Since lymphoid tissues were not included in our analysis of mouse Asxl1 expression, this hypothesis remains to be tested; however, there do not appear to be alternative transcripts for human ASXL1 specific to lymph node or thymus as detected by Northern blot analysis. Alternatively, these EST clusters could indicate the presence of another gene within the large 35 kb intron of Asxl1. However, the most likely interpretation is that these ESTs represent non-coding transcripts, the relevance of which is currently unknown.

To confirm or rule out the existence of various putative alternative splice variants involving internal translated exons of Asxl1 and ASXL1, a detailed RT-PCR analysis on poly(A)+ RNA from several different normal and diseased tissues would need to be conducted. There is increasing evidence that aberrant splice variant usage is correlated with tumorigenesis and other disease states (Caballero et al., 2001; Cartegni et al., 2002). Given the relatively high level of expression of Asxl1 in normal murine male gonads, it may be informative to examine primary gonadal tumors to compare the human ASXL1 expression pattern to that of matched normal testis and ovary, as a first step in determining whether aberrant ASXL1 expression correlates with tumorigenesis in gonads.
It will also be of interest to characterize the putative ASXL1 splice variant protein product in pooled germ tumour samples similar to that noted above to determine if absence of the PHD domain correlates with disease progression.

Asxl1 and Asxl2 are both expressed in undifferentiated mouse embryonic stem (ES) cells. Confirmation of strong Asxl1 expression in day zero cultured mouse ES cells (using conditions promoting hematopoietic lineage differentiation as per Helgason et al. (1996)) was obtained by RT-PCR (as per Pineault et al. (2002)), in agreement with my results. This analysis further showed that Asxl1 is rapidly downregulated to undetectable levels upon ES differentiation after day zero, but is then upregulated at day 12 of ES differentiation (N. Pineault and K. Humphries, unpublished observations). Asxl1 expression is ubiquitous in 10.5-11.0 dpc embryos according to in situ RNA hybridization experiments; however, earlier and later stages should also be examined to determine if Asxl1 and Asxl2 are expressed throughout embryonic development. Expression in undifferentiated ES cells predicts expression in blastocyst stage embryos, since ES cell lines are derived from the inner cell mass of blastocysts. Gene expression in ES cells is often predictive of a very early developmental function of that gene at pre-implantation stages, as is the case for the PcG and ETP genes YY1, Ezh2, and Rnf2/Ring1B (Donohoe et al., 1999; O'Carroll et al., 2001; Voncken et al., 2003) and the trxG gene Mll (Ayton et al., 2001), or is predictive of an important function in stem cells, as is the case for Bmi1 (Molofsky et al., 2003; Raaphorst, 2003; Voncken et al., 2003). This contrasts with eed and Ring1/Ring1A, which are not expressed in ES cells or blastocysts but rather begin to be expressed shortly after the implantation stage of development, at 5.5 and 6.5 dpc respectively (Schumacher et al., 1996; Voncken et al., 2003).
Eed has a critical role in early post-implantation development, since eed null homozygotes die at the primitive streak stage, prior to initiation of Hox gene expression (Schumacher et al., 1996), whereas Ring1A null homozygotes are viable and fertile, yet show highly expressive and penetrant transformations of the axial skeleton at later stages of development (del Mar Lorente et al., 2000). Unlike all other characterized mammalian PcG genes, which are present as multiple paralogues in mice and humans, eed/EED is the single representative of its type, and therefore eed gene paralogues cannot compensate for loss of eed. The observation of Asxl1 and Asxl2 expression in ES cells is therefore suggestive of a possibly redundant, early function of Asx-like genes in embryonic development, and/or function in stem cell regulation.

Because the human and mouse Asx-like genes so far analyzed (ASXL1, Asxl1, and Asxl2) exhibit a unique expression profile in comparison to other human and mouse PcG genes (Gunster et al., 2001), they appear to be regulated independently of them. Interestingly, the variation in the major (9.5 kb) Asxl2 transcript intensity with tissue type is strikingly similar to that seen for the major (7.0 kb) Asxl1 transcript, suggesting that tissue-specific expression levels of Asxl1 and Asxl2 (and by extension, perhaps of all three Asx-like genes in mice) may be coordinately regulated. This observation, combined with the knowledge that the ASXLPs share multiple regions of sequence homology with a similar overall domain architecture, leads to the possibility that the three Asx-like genes are partially functionally redundant. Effects of functional redundancy will become relevant when analyzing phenotypes of LOF alleles of only one Asx-like gene at a time, as discussed in subsequent chapters of this thesis.

CHAPTER 3

Functional conservation of Asxl1 as an ETP gene in mice

I. Introduction

Polycomb group (PcG), trithorax group (trxG), and enhancer of trithorax and Polycomb (ETP) genes were initially identified and classified in Drosophila, based on the outcome of several different genetic screens, as described in Chapter 1. Multiple homologues of Drosophila PcG, trxG, and ETP proteins (hereafter collectively referred to as Maintenance Proteins (MP)) have been identified in mammals and other organisms based on primary sequence homology (see Chapter 1), many of which have subsequently been shown to possess functions analogous to those of their Drosophila counterparts through study of LOF and GOF mutant mouse models. However, functional divergence between Drosophila and mammalian MP homologues may have occurred, and therefore it is necessary to test for conservation of function and, where possible, mechanism of action of the putative homologues in order to understand their roles in mammalian development.

MP homologues exhibit tissue- and cell-specific expression patterns in mammals, as compared to ubiquitous expression in Drosophila (Gunster et al., 2001). Genetic disruption of murine MP genes results in embryonic or perinatal lethality, similar to their Drosophila counterparts (two exceptions are the Ring1a and PcG-associated E2F6 homozygous nulls, which are viable; del Mar Lorente et al., 2000; Storre et al., 2002). Mutant mice also exhibit phenotypes such as axial skeletal transformations, misregulation of Hox genes during embryogenesis, and defects in hematopoiesis, which are analogous to those seen in mutants of their Drosophila homologues (Jacobs and van Lohuizen, 2002; Lessard and Sauvageau, 2003b; Remillieux-Leschelle et al., 2002; Simon and Tamkun, 2002). However, murine MP genes have also acquired additional gene-specific functions across evolution, as illustrated by novel phenotypes observed in mouse mutant models.
Gene-specific defects include male-to-female sex reversal in M33 mutant mice (Katoh-Fukui et al., 1998), neural crest defects leading to ocular and heart malformations in Rae28/Mph1-/- mice (Shirai et al., 2002; Takihara et al., 1997), cerebellar abnormalities in Bmi1-/- mice (van der Lugt et al., 1994), and defects in X inactivation in Eed and Ezh2 mutants (Erhardt et al., 2003; Silva et al., 2003; Wang et al., 2001). Mice with compound mutations in PcG and/or ETP genes (Bmi1 x Mel18, Bmi1 x M33, and Ring1b x Mel18) exhibit functional synergism of homeotic phenotypes and effects on Hox gene expression compared to observations of the respective single PcG or ETP mutants (Akasaka et al., 2001; Bel et al., 1998; Suzuki et al., 2002). Conversely, compound mutant mice of the putative ETP gene Bmi1 and the trxG gene Mll show normalization of homeotic phenotypes and some (but not all) Hox gene misexpression patterns (Hanson et al., 1999). This functional normalization suggests that, in the context of AP axis patterning phenotypes, Bmi1 primarily acts as a PcG gene, not as an ETP, since it functions antagonistically to Mll. This is unlike the situation with the Bmi1 ETP homologue Psc in Drosophila, in which mutations enhance trx mutant homeotic phenotypes (Gildea et al., 2000). It cannot therefore be assumed that ETP function, with respect to a given phenotype such as homeotic transformations, is conserved between Drosophila and mammalian ETP gene homologues.

Murine Asxl1 is homologous to Drosophila ASX, based on primary sequence conservation, as shown in Chapter 2. ASX belongs to the ETP group, as Asx mutants enhance mutations in PcG and trxG genes. Asx mutants also show bidirectional homeotic transformations along the AP axis in Drosophila, and exhibit altered expression of homeotic genes (see Chapter 1).
If Asxl1 is indeed a functional homolog of ASX, and therefore a conserved ETP protein in mice, then mutations in Asxl1 will enhance mutations in both PcG and trxG genes, exhibit bidirectional transformations along the axial skeleton, and show ectopic activation and repression of different Hox genes. In order to test these hypotheses, and assess potential unique roles of Asxl1 in mammalian development, I generated a mouse mutant model of Asxl1.

II. Results

i. Generation of Asxl1 and Asxl1; M33Cterm mutant mice

The Asxl1 gene was disrupted by homologous recombination by inserting a neomycin resistance (neo) gene in reverse orientation to the Asxl1 direction of transcription (Figure 3-1A). The selection marker cassette was not removed after homologous recombination. I did not attempt a knockout strategy that would delete the entire Asxl1 locus, as that would be challenging due to its large size (see below; Muller, 1999). The neo insertion site is within exon 5, interrupts the reading frame of Asxl1 by introducing several stop codons, is downstream of the large 35 kb intron number 4, and is upstream of the exons encoding the conserved ASXH domain, which contains the nuclear receptor binding motif (see below). In order to generate a null mutant, it is preferable to design a targeting construct such that the promoter of the transcription unit is removed, without removing large portions of the locus, as that may inadvertently remove regulatory regions that would affect the transcription of nearby genes (Melton, 2002; Muller, 1999). I did not target Asxl1 in this manner since, at the time of targeting vector construction, I did not possess a genomic DNA clone containing this region of the locus, the mouse genome sequence had not yet been released, and Asxl1 promoter elements were (and are still) unknown.
By inserting the neo cassette within an early exon, upstream of the exons containing the ASXH domain, it was predicted that a loss-of-function mutant would result even if a short protein product truncated at the C-terminus were generated, since the resulting fragment would lack both the ASXH and PHD conserved domains of the ASXLP family. The targeting vector was electroporated into E14 ES cells, and 7 out of 627 clones analyzed in the final successful screen contained an Asxl1-targeted allele. The PCR-based screening method used to identify positive ES clones is described in Materials and Methods. Two of the positive ES clones, which were karyotypically normal and had no random integrations of the targeting vector, were used to generate chimeric mice that transferred the Asxl1-targeted allele to their offspring. Mice heterozygous for the mutated Asxl1 allele were interbred to generate homozygous offspring, which were genotyped by Southern blot and triple-primer genomic PCR analysis (Figure 3-1B,C), as described in the Materials and Methods, using the external probe and primers shown in Figure 3-1A.

Mice homozygous for the Asxl1 mutation lacked Asxl1 wild-type transcripts by Northern blot and RT-PCR analysis, though transcripts of altered MW at reduced levels of expression were detected (Figure 3-2). The size and pattern of altered transcripts generated from the Asxl1 mutant allele are consistent with read-through of the 1.8 kb neomycin cassette insertion, since each of these transcripts is approximately 1.8 kb larger than the corresponding wild-type band (Figure 3-2A), and all bands in the Asxl1-/- sample are also positive for a neomycin probe (Figure 3-2B). The neo insertion itself is intact and generates a full-length mRNA, since an additional band corresponding to the size of the neo transcript was detected in Asxl1 mutant samples but not in wild-type samples (Figure 3-2B).
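The read-through argument above is a simple size comparison, which can be sketched as follows. This is an illustration only: the function names and the 0.3 kb tolerance are assumptions introduced here, not part of the thesis methods. The 7.0 kb value is the major mouse Asxl1 transcript size given in Chapter 2, and the 368 bp and 2.2 kb values are the RT-PCR band sizes reported for primer set 1 in Figure 3-2C.

```python
NEO_CASSETTE_KB = 1.8  # size of the neo cassette insertion stated in the text


def expected_readthrough_kb(wild_type_kb, insert_kb=NEO_CASSETTE_KB):
    """Predicted size of a transcript or amplicon that reads through the insert."""
    return wild_type_kb + insert_kb


def consistent_with_readthrough(wild_type_kb, mutant_kb,
                                insert_kb=NEO_CASSETTE_KB, tolerance_kb=0.3):
    """True if the mutant band is about one insert length larger than wild type.

    tolerance_kb is a hypothetical allowance for gel sizing error.
    """
    predicted = expected_readthrough_kb(wild_type_kb, insert_kb)
    return abs(mutant_kb - predicted) <= tolerance_kb


if __name__ == "__main__":
    # Major 7.0 kb wild-type transcript: predicted read-through size in kb.
    print(expected_readthrough_kb(7.0))
    # RT-PCR primer set 1: 368 bp wild-type band versus 2.2 kb mutant band.
    print(consistent_with_readthrough(0.368, 2.2))
```

A band that fails this check would point to something other than simple read-through, for example use of a cryptic splice site.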
Reverse orientation of the neo insertion in the mutated Asxl1 allele was confirmed by RT-PCR analysis (Figure 3-2C). Moderately reduced levels of Asxl1 transcript expression in the Asxl1 mutants were observed by whole mount RNA in situ hybridization of 10.5 dpc embryos, visualized by an overall reduced signal intensity in the ubiquitous expression pattern (Figure 3-3; compare to Figure 2-9). Since stop codons are present in every reading frame within the PGK promoter of the inserted neo cassette, it is unlikely that the altered transcripts generated from the Asxl1 mutated allele will result in full-length Asxl1 protein product, since such aberrant transcripts would lack appropriate poly(A) tails and would therefore be degraded. However, it remains possible that a short protein truncated at the C-terminus may be generated, although this protein would lack both conserved ASXH and PHD domains, so it is unlikely that a dominant-negative mutant would result. It is also possible that utilization of cryptic splice sites may result in the generation of aberrant splice variants of low abundance from the mutated Asxl1 allele that contain internal deletions (Figure 3-2A).

We have generated three independent antibodies against different regions of the Asxl1 protein (C. Fisher, S. Bloyer, and H. Brock, unpublished results). We have so far been unable to obtain conclusive results with these antibodies on endogenous Asxl1 expression (C. Fisher, S. Bloyer, B. Argiropolous, and H. Brock, unpublished results) and are continuing to address this question. Hence Asxl1 protein expression data for the Asxl1 mutant mice is currently lacking; it is of paramount importance to obtain such information in the near future to determine the nature of the Asxl1 mutation I have generated.
Demonstration of complete absence of Asxl1 protein product using antibodies directed against several regions of the protein would be required to conclude that the Asxl1 homozygous mutants are true nulls. However, with the caveat that Asxl1 protein expression information is currently lacking, the Asxl1 homozygous mutants will be referred to herein as Asxl1-/-.

The generation of M33 mutant mice, which carry a targeted disruption in the M33 gene, has been described (Katoh-Fukui et al., 1998); the mice were obtained as a kind gift from Toru Higashinakagawa. These mice were designated as M33Cterm mutants rather than nulls since the targeting strategy used (neo cassette insertion into the coding region of the last (fifth) exon) may result in generation of a C-terminal truncated protein product. Although full length M33 protein was not detected by Western blot using a C-terminal derived anti-M33 antibody, an anti-M33 antibody directed against the N-terminus (upstream of the neo insertion site) was unavailable to use for confirmation of null status. Genotyping was done by triple-primer PCR on genomic DNA using M33-specific primers (Figure 3-1D; refer to Materials and Methods for details).

In order to generate various combinations of Asxl1 and M33 mutant mice, Asxl1+/-; M33Cterm/+ mice were intercrossed. Prior to intercrosses, both strains of mice had been maintained on a C57BL/6J background. Mice were genotyped by genomic PCR using M33-specific and Asxl1-specific primers (Figure 3-1C,D) in separate reactions, as described in the Materials and Methods.

Figure 3-1: Targeting strategy and disruption of the Asxl1 gene in ES cells and mice. (A) Diagram of the Asxl1 genomic locus, the targeting vector, and the modified Asxl1 allele. The Asxl1-coding exons are indicated by black boxes, introns by white boxes.
The PGKneo expression cassette (large white box) was used for positive selection. Position of relevant restriction sites (ClaI, C; EcoRI, E; HindIII, H; XbaI, X; XhoI, Xh), location of external probe and PCR primers, and sizes of diagnostic fragments are indicated. (B) Southern analysis of genomic DNA isolated from newborn offspring of an Asxl1+/- intercross after digestion with EcoRI and probed with the indicated 0.53 kb external probe shown in (A). This probe detects a 7.3 kb EcoRI fragment from the wild-type allele, and a 3.9 kb fragment from the Asxl1 targeted allele. (C) PCR analysis of liver genomic DNA of 18.5 dpc embryos from an Asxl1+/- intercross. Primers mAsxK02-Tfor and mAsxK02-Trev amplify a 253 bp fragment of the wild-type allele, whereas primers PGK-PR-AS and mAsxK02-Trev amplify a 426 bp fragment of the Asxl1 targeted allele. (D) PCR analysis of yolk sac genomic DNA of 10.5 dpc embryos from an Asxl1+/-; M33Cterm/+ intercross, showing M33-specific reactions. Primers M33A and M33B amplify a 200 bp fragment of the wild-type allele, whereas primers PGK-PR-AS and M33B amplify a 325 bp fragment of the M33Cterm allele.

Figure 3-2. Northern blot analysis of poly(A)+ RNA from pooled tissue of neonate wild-type (+/+) and Asxl1-/- mice using probes for Asxl1 (A) and neomycin (B). (C) RT-PCR analysis of total RNA from pooled tissue of individual newborn wild-type and Asxl1-/- mice. As predicted, primer set 1 (MEX1-F1 and mAsxK02-Trev) amplifies a 368 bp band in wild-type samples, and a 2.2 kb band in Asxl1-/- samples, in agreement with read-through of the inserted PGKneo cassette. Primer set 2 (MEX1-F1 and neo-F1) amplifies a 300 bp band from the Asxl1-/- samples, confirming the expected reverse orientation of the neo cassette, whereas no amplification occurs in the wild-type samples.
Refer to Materials and Methods and Figure 3-1A for further information on primers used.

Figure 3-3. In situ RNA hybridization of Asxl1+/+ and Asxl1-/- 10.5 dpc embryos, using an Asxl1 antisense riboprobe. Images were taken at the same magnification and exposure settings, and the embryos were littermates. The Asxl1-/- embryo shows reduced Asxl1 expression and is smaller than the control embryo.

ii. Viability and general observations of Asxl1 mutant mice

Intercrosses of mice heterozygous for the mutated Asxl1 allele produced homozygous offspring at the expected segregation ratios at time points up to and just following birth (Table 3-1). By the time of weaning (3 weeks of age), Asxl1-/- mice were found at a substantially lower frequency than expected, at only 28% of the number expected based on the number of Asxl1+/+ mice obtained (Table 3-1A). The frequency of Asxl1+/- mice observed was also reduced, at 87% of the number expected (Table 3-1A). Similar survivorship rates according to Asxl1 genotype were observed in mice derived from each of two independently targeted ES cell clones (Table 3-1B), and so subsequent results from mice derived from both lines have been pooled. By observation of newborn mice and young pups, it appears that the majority of lethality occurs within 1-3 days after birth, and the dead pups are cannibalized promptly by their mothers. Most ill or dead Asxl1-/- newborns and young pups discovered intact did possess milk in their stomachs, indicating that suckling had occurred. The cause of death has not been determined. Most newborn Asxl1-/- mice appear outwardly normal immediately following birth, are not anaemic, and have normal lung, kidney, liver, and heart histology (J. Hess, personal communication; data not shown); however, histological analysis of additional tissues has not been performed.
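The weaning-age percentages quoted above follow from the expected-value convention of Table 3-1A, in which the expected wild-type count is set equal to the observed wild-type count and the other genotypes are scaled by the 1:2:1 Mendelian ratio. A minimal sketch of that arithmetic is given below; the function names are hypothetical, and the counts are the 3 week values from Table 3-1A.

```python
from fractions import Fraction

# Mendelian frequencies expected from a heterozygous intercross (1:2:1).
MENDELIAN = {"+/+": Fraction(1, 4), "+/-": Fraction(1, 2), "-/-": Fraction(1, 4)}


def expected_from_wild_type(observed):
    """Scale the 1:2:1 ratio so that expected +/+ equals observed +/+."""
    scale = observed["+/+"] / MENDELIAN["+/+"]
    return {genotype: scale * freq for genotype, freq in MENDELIAN.items()}


def percent_of_expected(observed):
    """Observed count as a rounded percentage of the expected count."""
    expected = expected_from_wild_type(observed)
    return {g: round(100 * float(observed[g] / expected[g])) for g in observed}


if __name__ == "__main__":
    three_weeks = {"+/+": 213, "+/-": 370, "-/-": 60}  # Table 3-1A, 3 weeks
    print({g: int(v) for g, v in expected_from_wild_type(three_weeks).items()})
    print(percent_of_expected(three_weeks))
```

Anchoring on the wild-type count, rather than on the total number of pups, avoids deflating the expected values when mutant classes are lost before genotyping.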
Many Asxl1-/- mice that survive past the age of weaning are smaller than their littermate controls (Figure 3-4), but this phenotype is not completely penetrant, as some Asxl1-/- mice are virtually indistinguishable from controls. All surviving Asxl1-/- mice develop infections of the eyes, ears, and mucous membranes after a period of several weeks to months, in some cases after more than a year. Adult Asxl1-/- mice were sacrificed soon after displaying signs of illness (lethargy, ruffled coat, redness of mucous membranes), and each was used in experiments along with an age- and sex-matched Asxl1+/+ control (a littermate where possible; otherwise a sex-matched control mouse of similar age from the same mouse line and parental generation). Since the time of onset of these symptoms, and therefore of sacrifice, varied widely among the adult Asxl1-/- mice used for experiments, the results of the various assays were compared across the matched pairs and were treated as related rather than independent data points (see Materials and Methods for details on statistical analyses used).

Investigation of organs harvested from adult Asxl1 mice (Figure 3-5) revealed consistent reductions in the weights of the thymus and testes, and increased size of the spleen, in Asxl1-/- mice compared to Asxl1+/+ mice, with no change in weight of the liver, lungs, heart, or kidneys (Figure 3-5A). The above changes in thymus, spleen, and testes weights according to Asxl1 genotype are still seen once body weight of the mice is taken into account (Figure 3-5B,C,D). Reduction in thymus weight will be further discussed in Chapter 4.

Table 3-1. (A) Numbers of offspring of Asxl1 heterozygous intercrosses observed (O) by genotype, with corresponding expected values (E) noted in square brackets. Expected values are defined by equating the number of wild type (Asxl1+/+) mice expected to the number observed, and calculating expected values of the other genotypes accordingly using the expected Mendelian frequency distribution (%) shown in the far right column. dpc, days post-coitum; ND, genotype not determined. * indicates observed frequencies

                 Age
                 10.5-11.0 dpc     Newborn       3 weeks*       Expected (%)
Genotype         O     [E]         O     [E]     O     [E]
Asxl1+/+         15    [15]        31    [31]    213   [213]    25
Asxl1+/-         22    [30]        54    [62]    370   [426]    50
Asxl1-/-         12    [15]        26    [31]    60    [213]    25
n                50                119           643
ND               1 (resorbed)      8             1

Table 3-1. (B) Numbers of 3 week old offspring of Asxl1 heterozygous intercrosses broken down according to genotype, sex (M=male, F=female, ND=not determined), and mutant mouse line (1 or 2). Sex was determined by visual inspection. * indicates observed genotype frequencies according to mutant mouse line (using values from TOTAL columns), are significantly different from expected; while *s indicates that sex

Genotype    +/+                    +/-                    -/-
Line        M     F     TOTAL      M     F     TOTAL      M     F     TOTAL
1*          61    53    114        85    110   195        25    11    36
2*          48    51    99         75    100   175        13    11    24
1+2*        109   104   213        160   210   370*s      38    22    60*s

Figure 3-4. Comparison of adult mouse body weight and size according to Asxl1 genotype. (A) Adult mouse body weight presented in matched pairs of individual Asxl1-/- mice (black bars) with their age- and sex-matched Asxl1+/+ control (white bars). Wilcoxon signed-ranks test was used to compare differences across all adult Asxl1-/- and Asxl1+/+ matched pairs. Statistical significance is indicated by *, p < 0.05. (B) A matched pair of Asxl1 adult mice (approximately 6 months old) showing diminished size of the Asxl1-/- mouse compared to its matched Asxl1+/+ control.

Figure 3-5. [Legend not recoverable from the scanned page. Recoverable panel axes: organ weight (g); thymus/body weight (%); spleen/body weight (%); testes/body weight (%).]

Several adult Asxl1-/- mice exhibited splenomegaly (Figure 3-5C, Figure 3-6). In order to determine if ultrastructure was affected in the large Asxl1-/- spleens, histological sections stained with hematoxylin and eosin were analyzed, and a representative example is shown in Figure 3-6B. Regions of red pulp (where red blood cells are destroyed) and white pulp (also known as follicles, which contain B- and T-lymphoid cells) are visible within the Asxl1-/- spleens, as expected (not shown). However, there appears to be extramedullary hematopoiesis (EMH) present within the red pulp regions of the large Asxl1 mutant spleens (Figure 3-6B). This phenotype will be discussed further in Chapter 4.

iii. Sex-determination and fertility in Asxl1 mutant mice

Mutations of the mouse PcG gene M33 lead to male-to-female sex reversal, resulting in extremely low numbers of male mice within the M33Cterm/Cterm cohort (Katoh-Fukui et al., 1998). We therefore monitored Asxl1 mutant mice to see if there was any skewing in sex ratios towards an over-abundance of female Asxl1-/- mice, which may indicate male-to-female sex reversal. Three-week-old mice can be sexed by inspection of external genitalia.
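The comparison of observed and Mendelian expected genotype counts in Table 3-1 can be illustrated with a short calculation. The sketch below is not the analysis described in Materials and Methods; it assumes a standard chi-square goodness-of-fit test against 1:2:1 proportions scaled to the total number of mice genotyped, whereas Table 3-1A anchors its expected values to the wild-type count. The function and variable names are hypothetical, and the counts are the 3 week values from Table 3-1A.

```python
import math

# Mendelian proportions for a heterozygous intercross: +/+, +/-, -/-
MENDELIAN_1_2_1 = (0.25, 0.50, 0.25)


def chi_square_goodness_of_fit(observed, proportions=MENDELIAN_1_2_1):
    """Return (chi-square statistic, p-value) for three genotype classes.

    Assumption: expected counts are the total number of genotyped mice
    multiplied by the Mendelian proportions.
    """
    if len(observed) != 3 or len(proportions) != 3:
        raise ValueError("this sketch handles exactly three genotype classes")
    total = sum(observed)
    expected = [total * p for p in proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With three classes there are 2 degrees of freedom, and the chi-square
    # survival function for 2 degrees of freedom is exactly exp(-x / 2).
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


# 3 week counts from Table 3-1A: Asxl1+/+, Asxl1+/-, Asxl1-/-
three_week_counts = (213, 370, 60)
statistic, p_value = chi_square_goodness_of_fit(three_week_counts)
print(f"chi-square = {statistic:.2f}, p = {p_value:.2e}")
```

Under these assumptions the deficit of Asxl1-/- mice at weaning contributes most of the statistic, in line with the reduced survival of homozygous mutants noted in the table.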
There were no differences observed in sex ratios of 3 week old Asxl1+/+ mice; however, female Asxl1+/- mice were overrepresented and female Asxl1-/- mice were underrepresented, suggesting that the Asxl1-/- mutation leads to enhanced lethality in females over males (Chi-square test, p<0.05; Table 3-1B). All adult male Asxl1-/- mice sacrificed (n=6) did possess exclusively male internal reproductive organs (no hermaphroditic mice were observed). These results suggest that male-to-female sex-reversal does not occur in Asxl1 mutants; however, the chromosomal sex of mutants would have to be determined in order to confirm that sex-reversal does not occur.

Figure 3-6. Splenomegaly in Asxl1-/- adult mice. (A) Photograph showing an example of a grossly enlarged spleen harvested from an Asxl1-/- adult mouse, with a normal sized spleen from the matched Asxl1+/+ control mouse for comparison. (B) Spleen cross-sections stained with hematoxylin and eosin from an adult Asxl1-/- mouse and matched Asxl1+/+ control mouse. Extramedullary hematopoiesis (EMH) is visible throughout the Asxl1 mutant spleen as clusters of dark red staining red blood cells. Magnification = 156x.

The reduced testes weight (Figure 3-5), and size (Figure 3-7) observed in Asxl1-/- mice compared to matched controls suggested an effect on fertility and/or spermatogenesis. In order to determine if fertility was affected by loss of Asxl1, Asxl1-/- males were mated to female C57BL/6 mice of confirmed fertility, and the number of 1-cell and 2-cell zygotes and embryos in each oviduct were counted on day 2 following overnight mating. Of six Asxl1-/- males tested, 4 were fertile and 2 were infertile (as no 2-cell embryos were found in the oviduct of the females they were mated with despite the presence of a vaginal plug the morning following mating indicating that mating had occurred; Table 3-2). Additional attempts at mating the two infertile Asxl1-/- males with different female mice were unsuccessful in generating offspring, confirming that these two males were indeed infertile. Three Asxl1-/- females were also tested for fertility, and all were fertile, producing offspring when mated to C57BL/6 males. However, one other Asxl1-/- female sacrificed lacked an ovary on one side of the body only (Figure 3-7B). Asxl1+/- male and female heterozygotes were generally fertile, as judged by numerous successful inter-cross matings, and backcross matings between Asxl1+/- males and C57BL/6 females, although fertility was not quantified.

Figure 3-7. Role of Asxl1 in gonad development and spermatogenesis. Photographs showing reduced size of testis (A) from an adult Asxl1-/- male mouse, compared to a normal sized testis from the age-matched Asxl1+/+ control mouse. Ruled lines are 1 mm apart. (B) Photograph of reproductive organs dissected from an adult Asxl1-/- female mouse showing absence of the ovary (arrow) from one side of the body. (C) Testis cross-sections stained with hematoxylin and eosin from an adult Asxl1-/- male mouse and matched control Asxl1+/+ mouse. Aberrant spermatogenesis is visible in the Asxl1-/- testis sections as the presence of immature cell types in the lumen which normally only contains elongated spermatids.

Table 3-2. Fertility tests of Asxl1-/- adult male mice. Male mice were mated to C57BL/6 fertile females. Zygotes or unfertilized eggs (1-cell) and 2-cell embryos were harvested from successfully mated female mice (indicated by a vaginal plug). Male mice from each

                         1-cell                     2-cell embryos
Asxl1-/- male ID         Oviduct 1    Oviduct 2     Oviduct 1    Oviduct 2
Asx-1 (line 1; F5)       2            3             11           9
Asx-3 (line 2; F3)       2            2             18           22
Asx-4 (line 2; F3)       27           12            0            0
Asx-5 (line 1; F6)       27           24            0            0
Asx-6 (line 2; F4)       2            0             5            6
Asx-7 (line 2; F4)       2            3             11           23

Preliminary analysis of testis sections from six male Asxl1-/- mice revealed abnormal spermatogenesis within the seminiferous tubules of one of the testes examined (Figure 3-7C).
This is visualized as a disruption (within the Asxl1-/- testis shown) in the normal pattern of progression from spermatogonia to pachytene spermatocytes to round spermatids to elongated spermatids in a basal to luminal direction, such that precursor cells appear in juxtaposition with more mature spermatids (Figure 3-7C).

iv. Eye development defects in Asxl1 mutant mice

A number of surviving adult Asxl1-/- mice had visibly smaller eyes compared to their Asxl1+/+ littermates, while some Asxl1-/- mice lacked eyes on one or both sides of the face (e.g. the Asxl1-/- mouse shown in Figure 3-4B is lacking both eyes); however, this phenotype was incompletely penetrant, and the penetrance was not quantified. These phenotypes were not observed in the adult control or Asxl1+/- littermates, suggesting that they are not due to genetic background. However, mice of the C57BL/6J strain, which is the genetic background of the Asxl1 mutants, are known to exhibit sporadic microphthalmia and anophthalmia at a low frequency of approximately 4% (Smith et al., 1994; http://jaxmice.jax.org/library/notes/463a.html), and therefore the following results should be interpreted with caution. Dissection of the head of one male adult Asxl1-/- mouse exhibiting an eye defect phenotype revealed a complete lack of both eyeballs, both optic nerves, and the optic chiasm on the ventral surface of the brain (unpublished observations by Kathleen Millen, University of Chicago), indicating an early severe defect in eye differentiation. In order to investigate eye defects in Asxl1-/- mice during embryonic development, I examined 12.5 dpc embryos for eye abnormalities by inspection of unstained whole mounts using a dissecting scope. This time point was chosen since developing eye structures such as the lens epithelia and pigmented neural retina are clearly recognizable (Kaufman, 1992; Kaufman and Bard, 1999; Kondoh, 2002), and it is prior to the major phase of lethality of Asxl1-/- mice.
Out of one litter of mice analyzed (n=8), 2 Asxl1+/+, 4 Asxl1+/-, and 2 Asxl1-/- embryos were obtained, and both Asxl1-/- embryos exhibited abnormalities in the form of reduced or absent developing eyes, while eyes of all Asxl1+/+ and Asxl1+/- mice appeared normal (Figure 3-8). Interestingly, the size of the developing eyes differed between the two sides of the face in each Asxl1-/- embryo, an example of developmental left-right asymmetry (Figure 3-8B). The dark region encircling the light region of the wild-type eye at 12.5 dpc (Figure 3-8A) is the externally visible portion of the pigmented neural retina, the light region in the middle is the lens epithelium, and the ventral discontinuity is the choroid fissure. In the top panel of Figure 3-8B, the pigmented neural retina is present on the left-hand side (LHS) although the central lens epithelium is absent. On the right-hand side (RHS), however, there appear to be no eye structures whatsoever. In the bottom panel of Figure 3-8B, both eyes are further developed than in the top panel (although the eye on the LHS is reduced in size), both eyes exhibit discontinuous morphology of the lens epithelium, and the ventral choroid fissure appears larger than normal. Failure to close this fissure later in development would result in coloboma (Kondoh, 2002). The latter observations suggest a defect in early morphogenesis of the optic vesicle to the optic cup, with variable expressivity of the phenotype, as the most severe effect is complete loss of eye structures.

v. Homeotic phenotypes of Asxl1 mutant mice

Inactivation of PcG, trxG, and ETP function in mice characteristically results in homeotic transformations of the axial skeleton as a result of misexpression of Hox genes (see Chapter 1).
According to the traditional view of PcG genes as repressors, posterior homeotic axial skeletal transformations, as a result of ectopic Hox gene expression within the embryonic paraxial mesoderm (which demarcates the prospective vertebrae), should be found in PcG gene null mice. Conversely, trxG gene nulls should exhibit anterior transformations as a result of the failure to maintain proper Hox gene expression, since trxG genes are normally required for target gene activation. ETP null mutants should therefore exhibit bidirectional homeotic transformation phenotypes, and show both ectopic expression and downregulation of different Hox target genes. To determine whether or not mouse Asxl1 shows conserved ETP function compared to its Drosophila homolog Asx, I analyzed newborn Asxl1 mutants for skeletal patterning defects along the anterior-posterior body axis, and for defective Hox gene expression in embryos. If Asx-like-1 is a conserved ETP gene in mammals, then Asxl1 mouse mutants should show bidirectional axial transformations, and exhibit concordant Hox gene misexpression.

a. Axial skeletal transformations of Asxl1 mutant mice

In order to visualize skeletal defects in Asxl1 mutants, newborn mice were sacrificed, and skeletal and cartilage preparations were made by staining with alizarin-red and alcian-blue respectively. I found reproducible alterations in the cervical, thoracic, and lumbar vertebrae of Asxl1-/- mice, whereas these alterations were either absent or present at a very low frequency in wild type mice (Table 3-3). Some of the alterations were also observed in Asxl1+/- mice, albeit at a lower frequency than in Asxl1-/- mice.
In 93% of Asxl1-/- mice analyzed, I observed that the anterior arch (the anterior ossification projection normally associated only with the C1 vertebra, the atlas, via a cartilaginous connection) was distinctly larger, and shifted such that it was now also associated with the C2 vertebra (the axis) via an ectopic cartilaginous process (black arrows in Figure 3-9A-C). This type of alteration can be interpreted as a C2→C1 anterior transformation, and was seen at a lower frequency of 32% in Asxl1+/- mice, and also rarely (11%) in wild type mice (Table 3-3). In 14% of Asxl1-/- mice, another type of anterior transformation was observed in that the C1 vertebra was split laterally, similar to the break observed between the exoccipital (EO) and supraoccipital (SO) bones of the skull (orange arrow in Figure 3-9B). This alteration is interpreted as a C1→occipital anterior transformation. These types of defects are also seen in Hoxb4, Hoxd3, and Hoxd4 mutant mice (Ramirez-Solis et al., 1993; Condie and Capecchi, 1993; Horan et al., 1995a; Horan et al., 1995b). At the cervico-thoracic boundary, an ectopic complete rib on the C7 vertebra was seen in 14% of the Asxl1-/- mice (Table 3-3; Figure 3-10A,B); partial ectopic ribs on C7 were also observed in several Asxl1-/- mice (Figure 3-10C,D). The ectopic C7 rib phenotype was never observed in Asxl1+/- or wild type mice, and is interpreted as a C7→T1 posterior transformation. This type of defect is also observed in Hoxa4, Hoxa5, Hoxa6, Hoxb7, Hoxa7, and Hoxb9 null homozygous mice (Chen et al., 1998; Horan et al., 1995; Jeannotte et al., 1993; Kostic and Capecchi, 1994). Proceeding caudally within the thoracic region, on the ventral side, I observed an abnormal xiphoid process (indicated by a small hole within the process) in 50% of Asxl1-/- mice, and also at a very low frequency (5%) in Asxl1+/- mice, but never in wild type mice (Table 3-3). Such defects were also seen in Hoxc4 mutant mice (Saegusa et al., 1996).
At the thoraco-lumbar boundary, 64% of Asxl1-/- mice were either completely lacking the thoracic T13 rib on one or both sides, or else had rib buds instead of complete ribs on one or both sides of the T13 vertebra (Table 3-3; Figure 3-11A-D), indicating transformation towards a lumbar vertebral identity. This phenotype was also observed in a reduced proportion (26%) of Asxl1+/- mice, and in 11% of wild type mice (Table 3-3), and is interpreted as a T13→L1 posterior transformation. L1 to T13 anterior transformations are seen in single and/or compound null homozygous mutants for Hoxa9, Hoxb8, Hoxb9, Hoxc8, Hoxd8, and Hoxd9 (Chen and Capecchi, 1997; Fromental-Ramain et al., 1996; Le Mouellic et al., 1992; van den Akker et al., 2001). Sacral region transformations of Asxl1 mutants remain to be investigated. There appear to be no defects in patterning of the appendicular skeleton in Asxl1 mutants. In summary, Asxl1 mutants exhibit bidirectional axial skeletal homeotic transformations, with anterior transformations occurring in the upper cervical vertebrae and occipital bones of the skull, and posterior transformations in the lower cervical, thoracic, and lumbar vertebrae.

Table 3-3. Skeletal abnormalities of newborn Asxl1 mutant mice.

                                           +/+ (n=9)    +/- (n=19)    -/- (n=14)
                                           %            %             %
Cervical region
  C1→occipital                             0            0             14
  C2→C1                                    11           32            93
  C7→T1 ectopic complete rib               0            0             14
Thoracic and lumbar regions
  Abnormal xiphoid process                 0            5             50
  T13→L1 rudimentary or missing rib(s)     11           26            64

Figure 3-9. Alterations of the axial skeleton of newborn Asxl1 mutant and Asxl1;M33 compound mutant mice. Lateral views of the cervical regions and scapulae of cleared skeletons of newborn wild-type (A), two Asxl1 homozygous mutant (B,C), and three Asxl1;M33 compound heterozygous mutant mice (D,E,F). After staining with alizarin red and alcian blue, bone and cartilage appear red and blue respectively; skeletons in the lower panel (D,E,F) only show alizarin red staining. The anterior arch is normally associated with only the C1 vertebra, joined by a cartilaginous connection (black arrow in A). Anterior transformations of the C2 vertebra to C1 vertebra in mutant mice are indicated by an association of the (now larger) anterior arch with C2 (unlabelled arrows in B,C,D,E,F) either instead of (D), or in addition to (B,C,E,F) its association with C1. In one Asxl1-/- mouse shown (B), the C1 vertebra is anteriorly transformed toward the occipital bone of the skull, indicated by a split within the C1 (arrow). In one Asxl1;M33 compound mutant (D), more severe anterior transformations are indicated by fusion of the C1 vertebra to the exoccipital (EO) bone of the skull (arrow), while the anterior arch is reduced in size and is associated with the C2 vertebra (unlabelled arrow), which is thickened. Other malformations in (D) are reduced size of the supraoccipital (SO) bone of the skull (arrow), malformations of all cervical vertebrae, and defective ossification of the scapula (star), as compared to wild-type (A). Note that the scapula in (F) is broken, not a skeletal malformation.

Figure 3-10. Alterations of the axial skeleton of newborn Asxl1 mutant and Asxl1;M33 compound mutant mice. Lateral views of the thoracic regions of cleared skeletons of newborn wild-type (A), three Asxl1 homozygous mutants (B,C,D), and one Asxl1;M33 compound heterozygous mutant mouse (E). After staining with alizarin red and alcian blue, bone and cartilage appear red and blue respectively; the skeleton in the upper right panel (E) only shows alizarin red staining. C7 to T1 posterior transformations are indicated by the presence of an ectopic complete rib (B), partial rib (C), or rib bud (D,E) on the C7 vertebra, shown with arrows. Thoracic ribs T1 through T3 are numbered 1 through 3 respectively.

Figure 3-11. Alterations of the axial skeleton of newborn Asxl1 mutant and Asxl1;M33 compound mutant mice. Dorsal views of the lower cervical, thoracic, and upper lumbar regions of cleared skeletons of newborn wild-type (A), three Asxl1 homozygous mutants (B,C,D), one Asxl1-/-;M33Cterm/+ mutant (E), and three Asxl1;M33 compound heterozygous mutant mice (F,G,H). After staining with alizarin red and alcian blue, bone and cartilage appear red and blue respectively; the skeletons in the lower panel (E,F,G,H) only show alizarin red staining. T13 to L1 posterior transformations are shown by the lack of both T13 ribs (B,E), or the presence of rib buds on one or both sides of the T13 vertebra (C,D,F,G,H), noted with arrows. In C) and D), the cartilage associated with the T13 rib is still present even though its associated rib is reduced or absent. The T13 and L1 vertebrae are labeled.

b. Hox gene expression in Asxl1 mutant mice

Maintenance of Hox gene expression is altered in PcG, trxG, and ETP mutants, thereby leading to transformations of body segments along the AP axis. As a first step to determine if Hox genes are misexpressed in Asxl1-/- mice, I analyzed 10.5 dpc embryos for changes in the pattern of mRNA expression of Hoxc8 by whole mount in situ hybridization.
Hoxc8 was chosen for three reasons: it is expressed between the 11th prevertebra (pv11, somite 14, T4) and pv20 (somite 23, T13) within the paraxial mesoderm of mouse embryos; patterning of thoracic and upper lumbar vertebrae is affected in Hoxc8 mutants (Belting et al., 1998; Le Mouellic et al., 1992; Tiret et al., 1993; van den Akker et al., 2001); and Asxl1-/- mice showed skeletal abnormalities and homeotic transformations in the thoracic and upper lumbar regions, including C7 to T1 and T13 to L1 transformations. Hoxc8 is also known to be a direct target of Mll (Hanson et al., 1999; Milne et al., 2002), and a one segment posterior shift in the rostral Hoxc8 expression boundary is observed in Mll+/- mutant embryos, as well as an overall reduction in staining intensity (Hanson et al., 1999; Yu et al., 1998), consistent with a role of Mll in maintaining activation of Hox target genes. Effects on Hox gene expression in Mll null embryos cannot be accurately determined due to early embryonic lethality at approximately 10.5 dpc along with gross patterning defects (Yu et al., 1995). Since human ASXL1 and MLL interact directly (C. Fisher, E. O'Dor, and H. Brock, unpublished observations; see Chapter 1), this provided another reason to test for aberrant Hoxc8 expression in Asxl1 mutants, as Asxl1 and Mll may regulate common target loci. However, the anterior boundary of expression of Hoxc8 appeared normal in two Asxl1-/- embryos analyzed, with weak expression in pv11, and stronger expression in pv12 within the paraxial mesoderm (Appendix Figure A-5). The signal intensity in the Asxl1-/- embryos was very slightly reduced towards the caudal direction as compared to the wild type, although this effect is more pronounced in one Asxl1-/- embryo (Appendix Figure A-5B) than in the other (Appendix Figure A-5D).
Subsequent analyses of Hoxc8 expression by in situ hybridization of additional Asxl1-/- embryos were consistent with the above preliminary findings of very modestly reduced Hoxc8 staining within the normal boundaries of expression (n=8; B. Argiropolous and H. Brock, unpublished results). Confirmation of these preliminary findings by quantitative analysis of Hoxc8 expression in Asxl1-/- embryos compared to wild type is necessary due to the very modest nature of the phenotype. If confirmed, these preliminary observations would suggest a role of Asxl1 in maintaining activation of Hoxc8 within the normal boundaries of expression. However, in order to obtain a broader understanding of effects on Hox gene regulation in Asxl1 mutants, and to determine if Asxl1 is required to maintain repression or activation of other Hox genes, in situ expression analysis of additional Hox genes in 9.5-12.5 dpc Asxl1-/- embryos must be conducted in the future.

vi. Genetic interactions between Asxl1 and M33 mutations

If Asxl1 truly is an ETP gene in mice, as is suggested by the presence of bidirectional axial transformations in Asxl1 mutants shown above, then mutations in Asxl1 will enhance mutations in both PcG and trxG genes. As the first test in confirming this hypothesis, I investigated genetic interaction between Asxl1 and M33 by generating compound mutant mice. M33 is a member of the PcG and was the first identified Polycomb (Pc) homologue in mice (Pearce et al., 1992), whose functional conservation was strikingly illustrated by its ability to largely compensate for loss of Pc expression during Drosophila embryonic development (Muller et al., 1995). I chose to generate a compound mutant of Asxl1 and M33, because Pc exhibits the strongest genetic interaction with Asx of all PcG genes tested in Drosophila (Milne et al., 1999).
Two homozygous mutant mouse models of M33 exist, M33-/- (Core et al., 1997) and M33Cterm/Cterm (Katoh-Fukui et al., 1998), both of which demonstrate similar phenotypes, including partial perinatal lethality, male-to-female sex-reversal, and axial skeletal transformations correlated with defects in maintenance of Hox gene expression during embryogenesis. The M33Cterm mutant is likely LOF, but this has not yet been confirmed since an appropriate antibody directed against the N-terminal portion of M33 is lacking, precluding testing for the existence of a C-terminal truncated protein product generated from the targeted locus (Katoh-Fukui et al., 1998). Of interest in the context of ETP gene function, both M33 mutants exhibited bidirectional axial transformations (Core et al., 1997; Katoh-Fukui et al., 1998). In this study, I crossed Asxl1+/- and M33Cterm/+ (Katoh-Fukui et al., 1998) mice to generate compound mutant offspring, and analyzed their phenotype.

a. Enhancement of lethality in Asxl1;M33Cterm compound mutant mice

M33-/- mice survive to birth in the expected ratio, but show high postnatal lethality compared to M33Cterm/Cterm mice, as 90% die within 4 weeks and none live past 6 weeks (Core et al., 1997). M33Cterm/Cterm mice exhibit perinatal lethality, and 60% of M33Cterm/Cterm mice die between birth and 3 weeks of age (Katoh-Fukui et al., 1998). In that study, there also appeared to be an increase in lethality of M33Cterm/+ mice since observed ratios of 28:38:17 M33+/+:M33Cterm/+:M33Cterm/Cterm were seen in newborn mice, and therefore only 68% of the expected number of M33Cterm/+ were obtained, although this was not noted by the authors (Katoh-Fukui et al., 1998). By the time of weaning at three weeks of age, I observed 63% of the expected number of Asxl1+/+;M33Cterm/+ mice (Table 3-4), consistent with previous findings of Katoh-Fukui et al. (1998).
Asxl1+/-;M33+/+ mice were found at a frequency of 88% of the expected number, which is consistent with my previous findings for single Asxl1+/- mutants (see Table 3-1). At the three week timepoint, very few Asxl1-/-;M33+/+ and Asxl1+/+;M33Cterm/Cterm mice were still alive (Table 3-4). This observation is consistent with my previous results for survivorship rates of single Asxl1-/- mutants (see Table 3-1), indicating that any background effects in the double mutant background were not significantly altering survivorship rates of Asxl1 mutants. However, my data also indicate a reduction in viability of the M33Cterm/Cterm mutants compared to previously published results, since I observed only 6% of the expected number of Asxl1+/+;M33Cterm/Cterm mice alive at the 3 week timepoint. The reduced viability of the M33Cterm/Cterm mice in my study compared to that reported by Katoh-Fukui et al. (1998) is likely due to differences in genetic background, since they investigated offspring of N3 generation parents backcrossed to C57BL/6Njcl (B6), while I used mice that had been backcrossed to the C57BL/6J background for at least 10 generations. Only 47% of the expected number of Asxl1+/-;M33Cterm/+ trans-heterozygotes were observed at the 3 week timepoint, indicating an enhancement of lethality over the single mutant heterozygotes (Table 3-4; compare to Table 3-1 for Asxl1+/- survivorship data). Strikingly, no Asxl1-/-;M33Cterm/Cterm double homozygous mutant mice were seen at the 3 week timepoint (n=227), and so earlier timepoints were investigated.
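The percent-of-expected figures quoted above follow from anchoring expectations to the wild-type count, as in the legend to Table 3-1. As a minimal sketch of that arithmetic (the function and variable names are hypothetical, and only the published 28:38:17 newborn ratio of Katoh-Fukui et al. (1998) is used as input), the 68% recovery of M33Cterm/+ newborns can be reproduced as follows:

```python
def percent_of_expected(observed, wild_type_observed, ratio_to_wild_type):
    """Observed count as a percentage of the Mendelian expectation.

    The expected count is defined by setting the expected number of
    wild-type animals equal to the number observed, then scaling by the
    Mendelian ratio of the genotype relative to wild type (1:2:1 for a
    heterozygous intercross).
    """
    expected = wild_type_observed * ratio_to_wild_type
    return 100.0 * observed / expected


# Newborn counts reported by Katoh-Fukui et al. (1998), as cited in the text
counts = {"M33+/+": 28, "M33Cterm/+": 38, "M33Cterm/Cterm": 17}
# Mendelian ratio of each genotype relative to wild type
ratios = {"M33+/+": 1, "M33Cterm/+": 2, "M33Cterm/Cterm": 1}

recovery = {
    genotype: percent_of_expected(counts[genotype], counts["M33+/+"], ratios[genotype])
    for genotype in counts
}
for genotype, percent in recovery.items():
    print(f"{genotype}: {percent:.0f}% of expected")
```

The same calculation applies to the compound cross once the appropriate Mendelian ratio for each genotype class is substituted.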
No double homozygous mutants were found among the newborns analyzed (n=33), nor in the set of 10.5-11.0 dpc embryos analyzed (n=39), suggesting that Asxl1-/-;M33Cterm/Cterm mutants are lethal prior to 10.5-11.0 dpc; however, since the total n for these two latter experiments is low, the results could be due to sampling bias and therefore more samples should be obtained at these timepoints to confirm indications of embryonic lethality. On the other hand, consistent with speculations of embryonic lethality of Asxl1-/-;M33Cterm/Cterm mice, four resorbing embryos were observed among those harvested at 10.5 dpc. Unfortunately, embryo derived tissue could not be obtained from the resorbed material and so those genotypes could not be determined. Regardless of the precise lethal phase, my results still indicate enhanced lethality in the double homozygous mutants, since some Asxl1-/- and M33Cterm/Cterm single mutants were observed alive at 3 weeks post-partum, while no Asxl1-/-;M33Cterm/Cterm mice remained alive at that timepoint.

Table 3-4. [Table contents not recoverable from the scanned page.]

b. Sex-reversal phenotype of M33Cterm mutant mice

Since M33Cterm/Cterm and M33-/- (but not M33 heterozygous mutant) mice exhibit male-to-female sex reversal (Katoh-Fukui et al., 1998), we wondered if the presence of a compound Asxl1;M33 mutation would affect this phenotype, even though the Asxl1 mutant mice did not appear to exhibit a sex-reversal phenotype (see above).
Unfortunately, since we did not obtain any Asxl1-/-;M33Cterm/Cterm double homozygous mutant mice, and obtained only one 3 week old Asxl1+/-;M33Cterm/Cterm mouse (which was female), we cannot address this question at the present time (see Appendix Table A-1). There does not appear to be enhancement of the sex-reversal phenotype in Asxl1+/-;M33Cterm/+ mice since roughly equal numbers of phenotypically male (n=32) and female (n=35) mice of this genotype at the 3 week timepoint were obtained (Appendix Table A-1). If sex-reversal had occurred, we would have expected to see substantially more female mice than males. However, since sex was determined based on external genitalia, but not based on the chromosomal sex of these mice, it is possible that a proportion of the externally female mice may indeed be chromosomally male (XY). To determine chromosomal sex of mice, a PCR-based assay or Southern blot analysis to detect the presence of the Y-chromosome-specific male sex-determination gene Sry (Sex-determining region Y) could be used in the future (Lambert et al., 2000).

c. Enhancement of skeletal defects in Asxl1;M33Cterm compound mutant mice

Evidence for a strong genetic interaction between the Asxl1 and M33Cterm mutations was obtained by investigation of skeletal defects in compound mutant newborns compared to the single mutant mice. Although I did not obtain any Asxl1-/-;M33Cterm/Cterm mice for analysis, illuminating results were obtained by analyzing the expressivity and penetrance of skeletal transformations in newborn Asxl1-/-;M33Cterm/+ and trans-heterozygous mice (Table 3-5; Figures 3-9 through 3-13) compared to that of single gene mutant Asxl1-/-, Asxl1+/-, and M33Cterm/+ newborn mice (refer to Table 3-3, Table 3-5, and Katoh-Fukui et al. (1998)).

Table 3-5. [Table contents not recoverable from the scanned page.]

Figure 3-12. Alterations of the axial skeleton of newborn Asxl1;M33 compound mutant mice. Dorsal views of the cervical (A,B) and thoracic (C,D) regions of cleared skeletons of newborn wild-type (A,C), and Asxl1;M33 compound heterozygous mutant mice (B,D). After staining with alizarin red and alcian blue, bone and cartilage appear red and blue respectively; the skeletons in the right panel (B,D) only show alizarin red staining. In several Asxl1;M33 compound heterozygous mutants, there is a reduction in the ossification of the vertebral bodies of the cervical vertebrae, an example of which is shown with an arrow in B). One Asxl1;M33 compound heterozygous mutant (D) exhibited a thoracic rib defect, with an ectopic ossification joining two adjacent thoracic vertebrae (black arrow).

Figure 3-13. Alterations of the axial skeleton of newborn Asxl1 mutant and Asxl1;M33 compound mutant mice. Ventral views of the thoracic region of cleared skeletons of newborn wild-type (A), one Asxl1-/-;M33Cterm/+ mutant (B), and two Asxl1;M33 compound heterozygous mutant mice (C,D). After staining with alizarin red and alcian blue, bone and cartilage appear red and blue respectively; the skeletons in B), C), and D) only show alizarin red staining. Compound Asxl1;M33 mutant mice exhibit defects in the sternum, notably a hole in the xiphoid process (arrows in C,D), reduced ossification of the 5th sternebra (arrow in B; also seen in C,D to a lesser extent), and the presence of only 6 vertebrosternal ribs on the left hand side of the one Asxl1-/-;M33Cterm/+ mouse shown in B). There are 7 vertebrosternal ribs in a wild-type mouse (A). The vertebrosternal ribs on the left hand side are numbered in panels A) and B).

In the trans-heterozygote group, 82% exhibited a C2 to C1 transformation (Table 3-5; Figure 3-9), a substantially higher percentage than observed in single Asxl1+/- mutants (32%) or M33Cterm/+ mutants (0%), indicating enhancement of skeletal transformations in the Asxl1;M33Cterm double mutants. An ectopic rib bud was seen on the 7th cervical vertebra in one trans-heterozygote (Figure 3-10), indicating a C7 to T1 transformation, although this particular observation does not imply an enhanced genetic interaction since ectopic ribs were also seen at a low frequency in M33Cterm/+ mutants (Table 3-5; but not in Asxl1+/- mutants, see Table 3-3). The penetrance of the T13 to L1 transformation was increased in the trans-heterozygotes (Table 3-5; Figure 3-11), with a frequency of 45% compared to 26% in Asxl1+/- and 0% in M33Cterm/+ single gene mutants. Trans-heterozygotes also commonly exhibited reduced ossification of the vertebral bodies on the ventral side of the cervical vertebrae, and one trans-heterozygote had a unique thoracic rib defect in that there was an ectopic ossification joining two adjacent vertebrae on the dorsal side (Figure 3-12). While only two Asxl1-/-;M33Cterm/+ mice were analyzed, they both exhibit novel phenotypes of the sternum that were not observed in the Asxl1-/- or M33Cterm/+ single mutants (Figure 3-13). Specifically, these two mice show a reduced or defective ossification of the 5th sternebra, an offset joining of ribs to the sternum referred to as a "crankshaft" sternum, and one of the mice had only 6 vertebrosternal ribs attached to the sternum on the left side instead of 7 as occurs in a wild-type mouse (Figure 3-13B).
Both of these mice also had reduced or absent ossification centers (vertebral bodies) in some of the cervical vertebrae on the ventral side. Similar to the phenotypes observed in the Asxl1-/-;M33Cterm/+ mice, six vertebrosternal ribs on either side and a severe "crankshaft" sternum (not shown) were observed in one of the trans-heterozygous mice, which was found dead shortly after birth. This particular mouse also had a split scapula on the left side, and a fusion between the atlas and the exoccipital bones of the skull (Figure 3-9D). As well, 7 other trans-heterozygotes exhibited sternal defects including one or more of the following phenotypes: reduced or defective ossification in the 5th sternebra, a hole in the xiphoid process, or offset unilateral articulation of ribs joining the sternum (Figure 3-13). These trans-heterozygote phenotypes have been observed in M33-/- (van der Lugt et al., 1996) and M33Cterm/Cterm mice (Katoh-Fukui et al., 1998) but not in M33+/-, M33Cterm/+, or Asxl1-/- mice, indicating a pronounced enhancement of these phenotypes in the trans-heterozygotes. In addition, C2 to C1 anterior transformations and T13 to L1 posterior transformations were observed in 100% of Asxl1-/-;M33Cterm/+ mice analyzed (n=2; Figures 3-9 and 3-11), whereas these transformations were less penetrant in Asxl1-/- mice and were never observed in M33Cterm/+ mice (Table 3-5; Katoh-Fukui et al., 1998). Taken together, these results show an enhancement in skeletal transformation phenotypes in the compound Asxl1;M33Cterm mutants compared to the single Asxl1 and M33Cterm mutants, since stronger skeletal alterations and increased penetrance of transformations are observed at multiple positions along the anterior-posterior body axis.
III. Discussion
In this study, I generated a mutant mouse model of Asxl1, a gene showing sequence homology to the Drosophila ETP group gene Asx, in order to test for conservation of ETP function and to investigate novel roles of Asxl1 acquired across evolution. N1 generation Asxl1 mutants on a mixed 129/SvJ and C57Bl/6J background show predominantly perinatal lethality, with a small fraction of Asxl1-/- mice surviving into adulthood. This result is surprising given the observation that Asxl1 is one of a limited number of PcG or ETP genes expressed in undifferentiated embryonic stem (ES) cells, which is predictive of an early role of Asxl1 in embryogenesis (see Chapter 2 Discussion). Subsequent experiments intercrossing Asxl1 heterozygous mutants onto a more homogeneous C57Bl/6J background (i.e. from N5 and higher generations) have shown that no live Asxl1-/- mice are found at the 3 week timepoint (n>100; A. Dahl and H. Brock, unpublished observations), nor at the newborn timepoint (n=18), indicating an increase in lethality. Assuming the lethal phase of Asxl1 homozygous null mutants is still perinatal, this would seemingly place Asxl1 in the class of late-acting PcG and ETP genes along with M33, Bmi-1, Mel-18, and Rae28/Mph1, whose products form mammalian PRC1-type complex(es), and which also lead to perinatal lethality in single gene knockout models (Akasaka et al., 1996; Levine et al., 2002; Takihara et al., 1997; van der Lugt et al., 1994).
In contrast, several other single PcG and ETP gene homozygous null mutants show early embryonic lethality, namely eed, Ezh2, YY1, and Rnf2/Ring1b (Donohoe et al., 1999; Erhardt et al., 2003; O'Carroll et al., 2001; Schumacher et al., 1996; Voncken et al., 2003); their gene products belong to or interact with the second major class of soluble PcG complexes characterized to date, referred to in mammals as EED/EZH2 or PRC2-type complexes (Cao et al., 2002; Kuzmichev et al., 2002), with the exception of Rnf2/Ring1b, which interacts with PRC1 components (Suzuki et al., 2002). YY1, Ezh2, and Rnf2/Ring1B are expressed in ES cells (Donohoe et al., 1999; O'Carroll et al., 2001; Voncken et al., 2003), as is Asxl1. One homozygous mutant mouse model of the Mll gene also shows very early embryonic lethality, prior to 1.5 dpc (Ayton et al., 2001), although two other Mll homozygous mutant strains show mid-gestation stage embryonic lethality (Yagi et al., 1998; Yu et al., 1995); these phenotypic differences result from differing targeting strategies used in attempts to generate null mutants, as Mll is a highly complex locus (Ayton et al., 2001; Ernst et al., 2004; Ernst et al., 2002; Hsu and Look, 2003). Mll is also expressed in ES cells (Ayton et al., 2001). The stage of lethality of PcG and ETP gene mutants has been used to classify the genes as either early-acting or late-acting; however, it is now apparent that such classifications are an oversimplification. Early lethality of single PcG gene mutants is certainly indicative of early function involving target genes other than Hox, since in that case lethality occurs prior to the onset of Hox gene expression (at approximately 8.5 dpc). However, late lethality does not preclude the possibility of earlier function whose loss is compensated for by additional paralogues or orthologues identified via sequence homology, or even by other non-orthologous genes that possess PcG function, such as the sop-2 gene in C. elegans (Zhang et al., 2003).
Evidence from the Drosophila system suggests that a PRC2-type complex transiently interacts with a PRC1-type complex during early embryogenesis to set the stage for long-term gene silencing (Poux et al., 2001b), thereby invoking an early function of PRC1 that was not evident from separate studies of individual PRC1 or PRC1-interacting gene null mutants in Drosophila, which showed late embryonic lethality. Conservation in mammals of the transient PRC2-PRC1 complex interaction seems likely given that multiple paralogues for each Drosophila PcG and ETP orthologue exist in mice (with the exception of eed, which is the sole homologue of esc) and largely show conserved protein-protein interactions and complex composition. It is unclear how Asxl1 function would integrate into the above model since it has not been shown to belong to PRC1 or PRC2 type complexes (see Chapter 1). However, it is tempting to speculate that transient interactions between Asxl1 and SET-domain containing proteins occur (see Chapter 1) which, when abrogated by mutation in Asxl1, would adversely affect the functions of the interaction partners, leading to lethality and other phenotypes (see below) observed in Asxl1 mutant mice. For example, transient interactions between Asxl1 and the mammalian E(z) homologue Ezh2 (or Ezh1) may occur, resulting in modulation of PRC2 complex function, since Drosophila ASX and E(Z) have been shown to interact in the yeast-2-hybrid assay (see Chapter 1). Like all other PcG proteins in Drosophila, loss of maternal and zygotic expression of ASX and E(Z) leads to embryonic lethality (Soto et al., 1995).
However, while both Asx and E(z) are necessary for Hox gene repression in Drosophila embryos, they are, along with esc and pho, largely dispensable for maintenance of Hox gene repression during subsequent stages of development within imaginal wing disc cells, whereas all other PcG genes tested (ph, Psc, Su(z)2, Pc, Scm, Sce/Ring, and Pcl, which are all components of or interact with the PRC1 complex; see Chapter 1) are required continuously throughout larval development (Beuchle et al., 2001). These results suggest that ASX falls into the same functional category as the PRC2 components E(Z), ESC, and PHO, in that these proteins have their primary function in early, but not late, Drosophila development. In addition, it is also possible that Asxl1 interacts with Mll early in development, as they are both expressed in ES cells, and ASXL1 interacts with MLL in vitro (see Chapter 1). There is only one Asx gene in Drosophila, whereas there are three Asx-like genes in mice and humans (see Chapter 2). It is therefore highly probable that loss of Asxl1 is compensated for by the remaining two Asx-like genes (Asxl2 and Asxl3), especially since Asxl1 and Asxl2 show evidence of coordinate regulation in adult mouse tissues (see Chapter 2). Hence I predict that compound mutations in Asxl1 and one or both of the other two Asx-like genes in mice would result in early embryonic lethality, which would be consistent with expression of both Asxl1 and Asxl2 in ES cells (see Chapter 2), and would also be consistent with probable interaction of ASXLPs with the SET-domain proteins Mll and Ezh2, both of which show expression in ES cells and early lethality in null mutant models. These functional possibilities will be further discussed in Chapter 5.
As is the case for other PcG and ETP gene knockout models in mice, Asxl1-/- mice exhibit unique phenotypes, suggesting a role in regulation of other developmental processes in addition to patterning of the axial skeleton.
There are preliminary indications of involvement of Asxl1 in gametogenesis and/or development of the reproductive organs, and of reduced fertility in Asxl1-/- males, which correlates with the high levels of expression of human ASXL1 seen in testis (see Chapter 2). In order for male mice to be infertile, their sperm counts must be reduced by over 90% (Russell et al., 1990). Therefore it would be interesting to quantify sperm counts of Asxl1-/- male mice to see if they show reduced levels in comparison to Asxl1+/+ controls. Mutations in certain clustered Hox genes (e.g. Hoxa10, Hoxa11, Hoxd11) are known to affect development of the male and female reproductive system and can lead to defects in spermatogenesis and sterility (Davis and Capecchi, 1994; Davis et al., 1995; Hsieh-Li et al., 1995; Satokata et al., 1995); therefore it would be interesting to examine whether or not misexpression of these Hox genes occurs within the reproductive tract of Asxl1-/- mice, during development and in the adult. A null mutant of the transcriptional repressor and cell-cycle regulator E2f6, which also interacts with components of the PRC1 complex (Ogawa et al., 2002; Trimarchi et al., 2001), was shown to exhibit defects in spermatogenesis, as well as limited posterior homeotic axial skeleton transformations akin to those in PcG mutant mice, although the mutants were viable and fertile (Storre et al., 2002). It would therefore be interesting to determine whether or not Asxl1 interacts with E2F6 in regulating target genes involved in spermatocyte development. Recently, the Drosophila PRC1 complex component SCE/dRING was identified in a screen for ORD protein interactors, and was shown to be involved in mediating proper chromosome segregation during male and female meiosis through its interactions with ORD (Balicky et al., 2004). Drosophila mutants for the PcG protein ph show defects in chromosome segregation during mitosis (Lupo et al., 2001), as do Asx mutants (E. O'Dor and H.
Brock, unpublished observations). Given the above preliminary results for Asxl1, indications of involvement of ASX and other PcG proteins in spermatogenesis and/or chromosome segregation, and increasing knowledge of unique chromatin remodeling events and gene regulation occurring during spermatogenesis (Sassone-Corsi, 2002), further analysis of the role of Asxl1 in mammalian gametogenesis is warranted.
Asxl1 null mutants also exhibit defects in eye development, with varying expressivity and incomplete penetrance. The range of phenotypes observed in surviving adult N1 generation Asxl1-/- mice was extreme, varying from complete absence of eye and neural-optic structures on both sides, to small eyes, to no discernible effects on eye structures as determined by visual assessment. Examination of a limited number of Asxl1-/- 12.5 dpc embryos also revealed abnormalities in eye structures, again with varying expressivity and left-right asymmetry. It is interesting to note that a balanced reciprocal translocation involving the human ASXL2 gene was identified in a patient with brain malformations of the corpus callosum, as well as bilateral chorioretinal and iris colobomas (Ramocki et al., 2003). Notably, the developing eyes from one Asxl1-/- 12.5 dpc embryo exhibited larger than normal ventral choroid fissures, which would lead to colobomas if the fissures failed to close at subsequent stages of development. Asxl1 and Asxl2 are each relatively highly expressed in adult brain; however, specific expression in eye structures has not been examined (see Chapter 2). Although preliminary, the above findings lead to the speculation that there is a common role of the Asx-like gene family in eye and brain development.
Nullizygous mutants of the murine PcG gene rae28 also exhibit defects in eye development by 17.5 dpc, ranging from hypoplasia to complete absence of the optic cup, with left-right asymmetry (Takihara et al., 1997); however, target genes involved in mediating this phenotype have not yet been identified. Functional investigations of the young mutation in zebrafish, which mapped to a homologue of the brahma-related gene (brg1), a subunit of a SWI-SNF-type chromatin remodeling complex and member of the trxG, revealed that young/brg1 mediates retinal cell differentiation (Gregg et al., 2003). In addition, several members of the PcG and trxG were recently identified in a mosaic genetic screen for genes involved in Drosophila eye development (Janody et al., 2004). Null mutants of the Hox transcriptional cofactor Meis1 also exhibit eye defects subsequent to the 11.0 dpc stage of development in mice (Hisa et al., 2004). It is therefore possible that Asx-like genes function in evolutionarily conserved genetic pathways to regulate eye development along with certain other chromatin maintenance proteins and/or the Hox cofactor Meis1.
As expected for a PcG and ETP gene, Asxl1 null mutants exhibit homeotic transformations of the axial skeleton, as well as other skeletal patterning defects. The observation that Asxl1 null mutants exhibit both anterior (C1→occipital and C2→C1) and posterior (C7→T1 and T13→L1) transformations within the same animal is consistent with the proposed classification of Asx-like-1 as an ETP gene in mammals, and indicates functional conservation with the Drosophila Asx ETP gene, mutants in which also show bidirectional homeotic transformations along the AP axis (Milne et al., 1999). Asxl1 null heterozygotes exhibit the same types of homeotic transformations as null homozygotes, although at a lower penetrance, indicating an effect of gene dosage on Asxl1 function.
The bidirectional transformation phenotypes of Asx and Asxl1 mutants contrast with those of null or hypomorphic mutants in most PcG and ETP gene orthologues (e.g. ph/rae28, psc/bmi-1 and mel-18, esc/eed), which show exclusively posterior homeotic AP axis transformations in mice and Drosophila (Akasaka et al., 1996; Schumacher et al., 1996; Simon et al., 1995; Takihara et al., 1997; van der Lugt et al., 1994). Unlike the situation with Asx and Asxl1, there are also instances in which the mutant phenotypes of PcG and ETP genes are not analogous between Drosophila and mouse orthologues, suggesting some divergence in function across evolution. Ring/Sce mutants in Drosophila show strong posterior homeotic transformations (Breen and Duncan, 1986; Campbell et al., 1995; Fritsch et al., 2003), whereas Ring1a mouse mutants unexpectedly exhibit moderately penetrant anterior AP axial transformations in both null homozygotes and heterozygotes, and highly penetrant anterior AP axial transformations in transgenic hypermorphs (del Mar Lorente et al., 2000). Conversely, hypomorphs of the other characterized orthologue to Ring in mice, Ring1b/Rnf2, exhibit posterior homeotic transformations of low to moderate penetrance depending on the genetic background (Suzuki et al., 2002), while null homozygotes die at gastrulation, presumably due to severe effects on expression of target loci other than the Hox cluster, including the Cdkn2a cell-cycle inhibitor locus (Voncken et al., 2003). While the above observations are consistent with Ring being a core member of the repressive PRC1 complex in Drosophila (Francis et al., 2001), it is difficult to reconcile the disparate phenotypes of Ring1a and Ring1b mutants with the presence of both corresponding proteins in the mammalian equivalent of the PRC1 com