Open Collections

UBC Theses and Dissertations

Epigenetic information in the cell : a potential avenue for biocompatible computing Hentrich, Thomas 2011

Epigenetic information in the cell
A potential avenue for biocompatible computing

by Thomas Hentrich

Diplom-Informatiker, Friedrich-Schiller Universität Jena, Germany, 2006

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy in THE FACULTY OF GRADUATE STUDIES (Computer Science)

The University Of British Columbia (Vancouver)
August 2011
© Thomas Hentrich, 2011

Abstract

Living organisms sense, process and produce molecular signals to regulate their activity and, thus, process information and perform computations on biological substrates. Understanding the fundamental principles of information representation and manipulation along these algorithmic bioprocesses will both advance our understanding of biology and inspire novel forms of computation. From one side of this connection, the research field of Bioinformatics uses computational tools to gain insight into the processes of life. On the other side, Biomolecular Computing attempts to utilize molecules and cellular processing machineries to operate engineered nano-scale biocomputers in live cells for in situ diagnostics, therapeutics and many other applications in biotechnology, bioengineering and biomedicine.

Addressing this bifold character of information of life, this thesis contributes to the part of Computing Life by elucidating aspects of chromatin, the nucleoprotein structure of DNA in the cell. As a computation and storage layer in the cell, chromatin offers a high degree of plasticity to integrate (environmental) signals into DNA-based regulation and allows information propagation according to epigenetic principles. In this work, I present results on the complexity of chromatin modifications and reveal their impact on gene activity and related functions in the cell. I further discuss a bioinformatics pipeline I developed and that was applied for genome-wide chromatin profiling and transcriptome analysis.
From the perspective of Living Computers, I address the question of information encoding by biomolecules and draw on principles of epigenetics to devise a model for an RNA interference-based equivalent of the electric flip-flop, which is one of the fundamental elements in digital circuits. In particular, I focus on the digital switch abstraction of the flip-flop, as it is the pivotal idea in both the electric and biomolecular world that connects and at the same time decouples an underlying physical process and the abstract representation of information. This work contributes to elucidating the computational principles of the RNA interference machinery and suggests novel ideas for universal memory units in biomolecular computing.

By juxtaposing both the natural and artificial perspective, this thesis attempts to enhance our understanding of epigenetic information processing in the cell and its capacity for biocomputing applications.

Preface

Chapter 2 of this dissertation is based on a manuscript under review: Hentrich T and Unrau PJ, Programming with RNA interference: the potential for a bistable flip-flop. As first author, I was involved in formulating the research question and research design together with P. J. Unrau as principal investigator. In the course of the project, I generated, analyzed and visualized all the data and wrote the manuscript.

Section 3.5.2 in Chapter 3 contains material of a manuscript to be submitted: Hentrich T, Schulze JM, Kobor MS, Emberly E, CHROMATRA: a web-based tool for compact visualization of chromatin modifications across transcripts. Together with J. M. Schulze, I was involved in developing the visualization approach, implementing it in software and writing the manuscript. M. S. Kobor and E. Emberly provided guidance in the course of the project.
Section 4.3 in Chapter 4 contains parts of a published manuscript: Schulze JM, Jackson J, Nakanishi S, Gardner JM, Hentrich T, Haug J, Johnston M, Jaspersen SL, Kobor MS, Shilatifard A. (2009), Linking cell cycle to histone modifications: SBF and H2B monoubiquitination machinery and cell-cycle regulation of H3K79 dimethylation. Mol Cell. 35, 626-41. All experiments described in the chapter were performed by J. M. Schulze, and I developed the software to analyze the generated data. Parts of the manuscript that are not shown originate from work of J. Jackson, S. Nakanishi, J. Gardner and J. Haug. M. Johnston, S. L. Jaspersen, M. S. Kobor and A. Shilatifard guided the project as principal investigators.

Section 4.4 of the same chapter is based on a submitted manuscript: Schulze JM, Hentrich T, Nakanishi S, Gupta A, Emberly E, Shilatifard A, Kobor MS. Splitting the task: Ubp8 and Ubp10 deubiquitinate different cellular pools of H2BK123. As equal first author, I was involved in research design, data analysis and writing the manuscript. J. M. Schulze performed the experiments, S. Nakanishi provided chemical reagents, and A. Gupta, E. Emberly, A. Shilatifard and M. S. Kobor guided the project.

Results shown in Section 4.5 of Chapter 4 are based on experiments performed by J. M. Schulze in the laboratory of M. S. Kobor. My role in this project was to develop the software and analyze the presented data.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgements
Dedication
1 Introduction
    1.1 Motivation
    1.2 Biomolecular computing
        1.2.1 The computational paradigm
        1.2.2 Towards DNA-based computation
        1.2.3 Beyond DNA-based computation
    1.3 Epigenetic regulation in the cell
        1.3.1 Chromatin
        1.3.2 RNA interference
    1.4 Thesis contribution
2 RNA interference and the potential for a bistable flip-flop
    2.1 Synopsis
    2.2 Background
    2.3 RNAi as an intrinsically programmable machine
    2.4 Digital memory with RNAi
    2.5 The mathematical model of an RNAi-based flip-flop
    2.6 Steady state characteristics of the flip-flop model
    2.7 Dynamic characteristics of the flip-flop model
    2.8 Robustness of the flip-flop model
    2.9 Discussion
3 Software pipeline for genome-wide chromatin modification profiling and transcriptome analysis
    3.1 Synopsis
    3.2 Background
    3.3 Wet-lab data generation
        3.3.1 Chromatin modification mapping with ChIP-on-chip
        3.3.2 Transcriptome mapping with tiling arrays
    3.4 Raw data processing
        3.4.1 rMAT
        3.4.2 tilingArray
    3.5 Data analysis and visualization
        3.5.1 Local analysis pipeline
        3.5.2 Web-based analysis and collaboration
    3.6 Outlook
4 Analysis of chromatin modifications in the context of transcription
    4.1 Synopsis
    4.2 Background
    4.3 Distinct states of histone modifications
        4.3.1 Genome-wide localizations of H3K79me2 and H3K79me3
        4.3.2 Profile of H3K79me2 and H3K79me3 at promoters and ORFs
        4.3.3 Association of H3K79me2 and H3K79me3 with transcription
        4.3.4 Association of H3K79me2 and H3K79me3 with the cell cycle
        4.3.5 Dynamics of H3K79me2 and H3K79me3 during the cell cycle
        4.3.6 H3K79me2 at intergenic regions and ARS in G2/M-phase
        4.3.7 Association of H3K79me2 with the transcription factor Swi4
        4.3.8 Colocalization of H2BK123ub with H3K79me3
    4.4 Interdependencies of histone modifications
        4.4.1 Genome-wide distribution of H2BK123ub and H3 methylation marks with respect to gene length and transcriptional frequency
        4.4.2 Correlation and colocalization of H2BK123ub and its dependent marks
        4.4.3 Site-specific removal of H2BK123ub by Ubp8 and Ubp10
    4.5 Aberrant transcripts and their chromatin structure
        4.5.1 Chromatin at cryptic promoters
        4.5.2 Deposition and role of H2A.Z at cryptic promoters
    4.6 Discussion
5 Conclusion
Bibliography

List of Tables

Table 2.1 Variable values of the optimized flip-flop
Table 2.2 Parameter values of the optimized flip-flop
Table 2.3 Parameter ranges of randomly generated flip-flops
Table 4.1 H2BK123ub and H3 methylation at different genomic features
Table 4.2 Frequencies of histone modification patterns

List of Figures

Figure 1.1 Layers of cellular memory and information processing
Figure 1.2 Architectural hierarchy and modifications of chromatin
Figure 1.3 Core components and process steps of RNA interference
Figure 2.1 RNAi pathways in Caenorhabditis elegans
Figure 2.2 Flip-flop mode of operation
Figure 2.3 Sequence setup of the RNA flip-flop
Figure 2.4 Mathematical analysis of the RNA flip-flop
Figure 2.5 Simulated flip-flop behaviour (core RNAs)
Figure 2.6 Simulated flip-flop behaviour (all RNAs)
Figure 2.7 Robustness of the RNA flip-flop for different parameters
Figure 2.8 Robustness of the RNA flip-flop for different pulse parameters
Figure 3.1 Chromatin profiling using ChIP-on-chip: wet-lab part
Figure 3.2 Chromatin profiling using ChIP-on-chip: dry-lab part
Figure 3.3 Software ecosystem of the bioinformatics pipeline
Figure 4.1 H3K79me2 and H3K79me3 patterns across the genome
Figure 4.2 Distribution of H3K79me2 and H3K79me3 across genes
Figure 4.3 Association of H3K79me2 and H3K79me3 with transcriptional frequency
Figure 4.4 Association of H3K79me2 and H3K79me3 with the cell cycle
Figure 4.5 Association of H3K79me2 with the cell cycle in G2/M-phase
Figure 4.6 H3K79me2 profile in G2/M-phase
Figure 4.7 Association of H3K79me2 with the transcription factor Swi4
Figure 4.8 Colocalization of H2BK123ub with H3K79me3
Figure 4.9 Distribution of H2BK123ub and H3 methylation marks in all transcripts
Figure 4.10 Distribution of H2BK123ub and H3 methylation marks with respect to transcriptional frequency
Figure 4.11 Associations of H2BK123ub and H3 methylation marks with transcriptional frequencies
Figure 4.12 Correlation of H2BK123ub and its dependent marks
Figure 4.13 Patterns of H2BK123ub and its dependent marks
Figure 4.14 Site-specific removal of H2BK123 ubiquitin by Ubp8 and Ubp10 (probe level)
Figure 4.15 Site-specific removal of H2BK123 ubiquitin by Ubp8 and Ubp10 (transcript level)
Figure 4.16 Connection between Ubp8, Ubp10 and H3 methylation marks
Figure 4.17 Segment-based association of Ubp8 and Ubp10 with H3 methylation marks
Figure 4.18 Distribution of H2A.Z across ORFs and transcripts
Figure 4.19 H2A.Z at sites of cryptic initiation
Figure 4.20 H3K4me3 at sites of cryptic initiation
Figure 4.21 Role of SWR1-C in H2A.Z deposition at cryptic promoters
Figure 4.22 Role of H2A.Z for the occurrence of cryptic transcripts
Figure 4.23 H2A.Z at sites of antisense cryptic transcripts
Figure 4.24 Circuitry model for H2BK123ub and its dependent histone marks

Acknowledgements

This thesis would not have been possible without the many people who supported me in various ways over the past years. It is, therefore, my pleasure to express my thanks to those who aided me in completing the work presented herein.

First and foremost, I owe my deepest gratitude to my supervisor Dr. Arvind Gupta for giving me the opportunity to pursue the questions addressed in this thesis and for providing an inspiring and stimulating research environment. His support, guidance and encouragement as my mentor have been invaluable over the past years at Simon Fraser University and the University of British Columbia.

Similarly, I would like to thank my supervisory committee members, Dr. Michael Kobor, Dr. Eldon Emberly, Dr. Peter Unrau and Dr. Wyeth Wasserman for many fruitful discussions, interesting ideas and helpful comments. Their enthusiasm and joy for research were truly motivational. Special thanks to Dr. Ali Shilatifard and Dr. Frank Holstege for giving me the opportunities to collaborate on several exciting projects.

A warm thank you to all members of the Kobor group for making the lab one of the most incredible places for my research. Thanks for all the supportive and inspirational advice on science and life in general, the team spirit and, especially, for many fun times. I truly enjoyed the wonderful collaborations with the Centre for Molecular Medicine and Therapeutics.
I would further like to thank Joyce Poon, Gerdi Snyder, Val Galat and Sharon Ruschkowski for their support with all administrative and organizational needs over the past years. I also gratefully acknowledge the funding sources that made my work possible, foremost the CIHR/MSFHR Bioinformatics Program for Health Research, which offered a unique environment for my time in graduate school.

Lastly, I would like to thank my family for their love and encouragement, especially my parents and my brother who supported me in all my pursuits even if it meant to be far away from home on the other side of the globe.

Yet, most importantly, I would like to thank Julia, the most amazing person in my life. Her support, motivation and love made the hard times of this journey bearable and the good times unforgettable. I am deeply grateful for everything you have done for me, and I would not be what I am without you. This life can only be great and exciting with you on my side and I am looking forward to it!

To my parents and grandparents

Chapter 1
Introduction

1.1 Motivation

Biological systems process information. Across the scale, from unicellular to complex organisms, components of these systems sense, process and produce molecular signals to regulate their activity and interact with the environment they live in (Cooper and Hausman, 2009; Alberts et al., 2007; Lodish et al., 2007). This self-regulatory character of biological systems is primarily based on precisely controlled (de-)activation of genes in particular situations that are characterized by certain molecular inputs. However, many cellular pathways are not simple cascades of gene activation that always generate the same output on the same input. Often a proper response must take conditions into account that the cell encountered previously. Hence, such pathways require memory to store information (Alberts et al., 2007).
At first glance, DNA seems to be the natural choice for storing information, but in its organismal role DNA is a static storage entity for evolutionary-scale information conveyance (Figure 1.1). Manipulating this nucleic acid sequence usually has severe negative effects. Yet, how do organisms record, store and pass on information that changes frequently when manipulations to the DNA sequence might jeopardize the whole system?

Cells, especially in higher organisms with differentiated cell types, achieve the flexibility to cope with dynamic information by employing storage means above and beyond the genetic level of the DNA sequence (Bonn and Furlong, 2008; Istrail et al., 2007; Levine and Davidson, 2005). Such non-sequence-based layers of (heritable) information are capable of altering gene expression and are termed epigenetic (Greek: epi-, above).

Figure 1.1: Cellular information storage and processing occurs across several highly connected and interdependent layers that compute and perform logical operations on different biomolecular substrates through various cellular functions. Besides permanent sequence-based information on the DNA level, RNA and chromatin provide greater plasticity and abstract encoding schemes that reflect dynamic aspects of information processing in the cell. In particular, they are capable of epigenetic information storage and propagation.

The pictures of two of these regulatory layers, chromatin and RNA, have started to change profoundly over the past years. Chromatin, initially considered to be a mere packaging structure for DNA, now emerges as a highly dynamic entity with various means to control access to the DNA and to regulate transcriptional gene activity (Figure 1.1). It is becoming increasingly clear that chromatin exists in multiple states that are dynamically shaped by environmental conditions and intrinsic factors to modulate cellular functions (Allis, 2007).
Similarly, the discoveries of non-coding RNAs and associated pathways such as RNA interference (RNAi) broadened our understanding of RNA family members from rather passive substrates in the protein synthesis pathway to active players in a spectrum of cellular processes (Ketting, 2011). Even though research over the past years identified many parts on the molecular level of chromatin and RNA, many questions remain to be answered as to how these parts work together and influence cellular functions.

Addressing these questions is not motivated by a biological perspective alone; many of them have also been raised in the research field of in vivo biomolecular computing. The long-term goal of in vivo biomolecular computing is to build miniature computing machines that operate in live cells and interact directly with the host environment (Benenson, 2009b). Such biocompatible computing devices are envisioned to monitor, control and even reprogram cellular processes and to open new ways of diagnosis and disease treatments among many other applications. Proof-of-concept devices have already been demonstrated successfully, some even in human cells (Rinaudo et al., 2007).

However, their current design is challenged in two fundamental aspects: Firstly, information that these devices need or generate is encoded in nucleic acid sequences. Secondly, the computation manipulates the information, i.e. the nucleic acid molecules, which usually leads to their destruction and limits these devices to a single operational cycle (Benenson, 2009b). These characteristics conflict on one side with the biological constraints on sequence-based information in the cell and on the other side with the computational requirement of continuous operation to interact with cellular pathways. These limitations of biomolecular computing could be overcome by employing alternative encoding schemes for information and computational state.
Sequence-based information in the organismal context is static. Computation, however, demands the dynamic handling of information.

Against this background, this work explores the epigenetic layers of chromatin and RNA. From the biological perspective, it contributes to elucidating their gene regulatory impact; from the viewpoint of biocompatible computing, it explores how their characteristics could be utilized for handling dynamic information.

1.2 Biomolecular computing

1.2.1 The computational paradigm

Biomolecular computing describes the interdisciplinary attempt of biology, chemistry and computer science to fundamentally understand information processing in living systems on one side and, on the other side, to utilize biomolecules and biochemistry to probe and manipulate natural systems or to engineer artificial systems for computing and communication.

From the latter perspective, computing with biomolecules can be understood similarly to other 'unconventional computing' ideas, including quantum (DiVincenzo, 1995), optical (Jain and Pratt, 1976) or even billiard ball computing (Fredkin and Toffoli, 1982), which all strive to extend or explore alternatives to the computational architecture of von Neumann (Neumann, 1945) and/or the computational principles of Turing (Turing, 1937). These principles have undoubtedly led to the pervasive success of silicon-based computers, but they have also revealed challenges that current computational systems face in terms of computational complexity, energy consumption, recyclability and the like.

While some of the proposed 'unconventional' computing substrates led to alternatives to von Neumann's principles, none of them has (yet) solved a computational problem that cannot be solved with classical computers (assuming enough time and space resources from a computational complexity standpoint, even if they might be practically infeasible).
Even the computational superiority of quantum computing is still debated and generally doubted (Bernstein and Vazirani, 1997). Nevertheless, an undeniable novelty of many of the unconventional approaches is that each of them broadened our view of computation and information.

In the case of biomolecular computing in particular, many initial works used instances of well-known problems in complexity theory to demonstrate the feasibility of the new type of computation (Braich et al., 2002; Liu et al., 2000; Ouyang et al., 1997). But trading time for space complexity and exploiting the inherent parallel processing of biochemical reactions, as many of these attempts suggested, only scales to a certain degree (besides many other challenges, such as error-proneness of molecular operations). So far, biomolecular computing cannot compete in solving problem instances of sizes that are intractable for classical, silicon-based computing.

One of the unique strengths of biomolecular computing rather lies in its compatibility with our physiology and in its potential to carry out operations in living organisms (Fu, 2007). In contrast to any other computational substrate, biological molecules can be used to build nanoscale computing devices that operate in and interact with living cells. Computing in this environment does not try to solve large-scale or hard problems in the sense of classical complexity. Instead, biocomputers interface with cellular processes to monitor, analyze and manipulate molecular signals, and thereby open new ways for diagnostics and therapeutics among many other applications. Attempts in this direction are referred to as biocompatible computing (Shapiro and Benenson, 2006).

1.2.2 Towards DNA-based computation

The idea of biologically inspired computing is not as new as often stated.
Although Adleman in the mid-1990s undoubtedly pushed the door wide open by convincingly exploiting biological molecules to solve an instance of a well-known computational problem (Adleman, 1994), many others paved the way to the research field where we see it today.

Some of the earliest conceptual ideas of engineering on the molecular scale date back to Feynman in the late 1950s (Feynman, 1959). Even a decade earlier, von Neumann described the theoretical model of a self-replicating automaton in molecular vocabulary (Neumann, 1966). A few years later, Rössler and others then reasoned about nanoscale computing and universal automata based on chemical reactions (Rössler, 1972; 1974), and Hjelmfelt and Ross later showed that (ideal) chemical kinetics are indeed Turing-complete (Hjelmfelt et al., 1991; 1992; Hjelmfelt and Ross, 1992).

Parallel to the evolving chemical perspective on molecular computing, the discoveries of the DNA structure (Watson and Crick, 1953) and the genetic code (Crick et al., 1961) and their fundamental roles in cellular information processing triggered similar ideas in biology. The unravelling of genetic modules, such as the lambda repressor (Ptashne, 1986), the lac operon (Müller-Hill, 1996), the tetracycline repressor (Hillen and Berens, 1994; Hinrichs et al., 1994) and the lux operon (Babloyantz and Nicolis, 1972; Dunny and Winans, 1999), led to the understanding that sets of biomolecules are often connected in networks, and that they convert molecular inputs into molecular outputs according to certain regulatory rules, a bioprocess that essentially is a computation (Regev and Shapiro, 2002). Together with the growing portfolio of laboratory techniques, such as DNA recombination, sequencing and synthesis, the field of biology continuously developed its own approaches and tools to probe and manipulate the organismal machineries of computation.
Adleman, a computer scientist, blended the theoretical potential and practical feasibility of molecular operations and applied them in light of computation in the classical sense (Adleman, 1994). His proof-of-concept computation used DNA molecules as a primary substrate, which coined the term DNA computing that became virtually synonymous with the entire field of biomolecular computing. Most of the approaches and information encoding schemes his work inspired in the following years were based on DNA (reviewed in Benenson, 2009b; Condon, 2006). Inspired by the informational role of DNA in the cell, its stability and sequence properties were considered ideal for information storage and processing (Condon, 2006), and many creative works led to a colourful bouquet of ideas, ranging from theoretical proposals of DNA Turing machines (Lipton and Baum, 1996), to successful demonstrations of algorithmic self-assembly of DNA molecules (Douglas et al., 2009), to innovative uses of DNA structure (Shapiro and Benenson, 2006) and many others.

The field of DNA computing has continued to expand and diversify since then, and connections have been established with other research areas, such as Synthetic Biology and Nanotechnology. With respect to the latter, DNA molecules were used to build rigid 2D or even 3D scaffolds (Rothemund, 2006; Seeman, 2003) that allow assembly of other biological molecules (Niemeyer et al., 2002) or non-biological components (Xin and Woolley, 2003). Even locomotion and transport based on DNA molecules have been demonstrated (comprehensively reviewed in Simmel and Dittmer, 2005). From the interface with Synthetic Biology, several works successfully demonstrated engineered computational elements of biomolecules that are inspired by electronic circuits.
Beginning with Gardner's toggle switch (Gardner et al., 2000) and Elowitz's repressilator (Elowitz and Leibler, 2000), efforts towards standardizing parts began (Group et al., 2006; Canton et al., 2008; Weiss, 2001) that resulted in libraries with more than 3000 elements as of today (http://www.partsregistry.org).

From the perspective of biocompatible computing, recent works demonstrated that automata of biomolecules are capable of sensing molecular signals, evaluating cellular conditions and generating molecular outputs that directly affect cellular functions or can be measured by an external observer (Benenson et al., 2004). However, DNA-based automata are challenged in several ways in in vivo environments (Ezziane, 2006). Information and computational state encoded in the sequence or length of DNA molecules must be updated frequently in the course of a computation, and since many changes are not reversible and/or the information-encoding molecules degrade during the computational process, DNA automata are limited to a single operational cycle. Thus, one of the most critical obstacles towards continuous computation in vivo is the character of DNA as computational substrate itself (Benenson, 2009b). DNA in its organismal role is a static molecule for information transport on evolutionary time scales. Manipulating this nucleic acid sequence usually has severe negative effects. Computation, in contrast, is intrinsically dynamic and, hence, demands substrates that support this plasticity.

1.2.3 Beyond DNA-based computation

Given the challenges DNA-based in vivo computing is faced with, in particular the static organismal character of DNA in contrast to the dynamic nature of computation, alternative biomolecules were considered by the field. According to the central dogma of molecular biology (Crick, 1970), two other major classes of molecules are carriers of information in the cell: RNAs and proteins.
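The contrast between static sequence information and dynamic state can be made concrete with the genetic toggle switch of Gardner et al. (2000) cited in Section 1.2.2, which stores one bit in the relative levels of two mutually repressing gene products rather than in a sequence. The following is a minimal sketch of the dimensionless two-repressor model commonly used to describe such a switch; it is not the model developed in this thesis. The parameter values (synthesis rates of 10, cooperativity of 2), the forward-Euler integration and the function name simulate_toggle are assumptions chosen purely for illustration.

```python
def simulate_toggle(u0, v0, alpha1=10.0, alpha2=10.0, beta=2.0, gamma=2.0,
                    dt=0.01, steps=5000):
    """Integrate a two-repressor toggle switch with forward Euler.

    du/dt = alpha1 / (1 + v**beta)  - u
    dv/dt = alpha2 / (1 + u**gamma) - v

    u and v are dimensionless repressor concentrations; all parameter
    values are illustrative assumptions, not fitted to any experiment.
    """
    u, v = u0, v0
    for _ in range(steps):
        du = alpha1 / (1.0 + v ** beta) - u   # synthesis repressed by v, first-order decay
        dv = alpha2 / (1.0 + u ** gamma) - v  # synthesis repressed by u, first-order decay
        u += dt * du
        v += dt * dv
    return u, v


if __name__ == "__main__":
    # Two different initial conditions settle into two different stable states,
    # so the pair (u, v) behaves like one bit of memory.
    print(simulate_toggle(5.0, 1.0))  # u ends high, v ends low
    print(simulate_toggle(1.0, 5.0))  # v ends high, u ends low
```

Under these assumed parameters the system keeps whichever state it was pushed into, which is the property that makes a bistable element usable as memory without altering any nucleic acid sequence.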
Protein-based computations were explored in the context of enzymatic activity by integrating molecular inputs to logic functions (Baron et al., 2006a;b), combining logic gates to networks (Niazov et al., 2006; Privman et al., 2008) and even Turing machine-like operations (Bar-Ziv et al., 2002). However, since our understanding of peptide/enzyme engineering is still limited (Benenson, 2009b), more complex protein-based computations comparable to existing DNA-based systems remain to be demonstrated.

Similarly, RNA, given its higher instability and rapid turnover, remained in the shadow of DNA as computational substrate for quite a while. Although earlier works exist (Faulhammer et al., 2000; Stojanovic and Stefanovic, 2003), it was not until in vivo computing faced the limitations of DNA that RNA received broader attention (Benenson, 2009a). Then, the dynamic characteristics of RNA, such as the possibility of synthesizing new RNA in the cell, together with the discoveries of novel RNA regulatory pathways, such as RNA interference, led to increasing importance of RNA-based computations (Benenson, 2009a). Besides RNA interference (see Section 1.3.2 for details) and small regulatory RNAs, riboswitches received attention. Riboswitches exist in nature as meta-stable structures, such as stem-loops, in mRNA transcripts (Mandal et al., 2003; Winkler and Breaker, 2005), and their conformational state can be dynamically altered by an array of molecular signals and inputs, such as temperature, small molecules and other RNAs (Isaacs et al., 2004). Riboswitch(-like) structures can be optimized or designed de novo (Beisel and Smolke, 2009; Babiskin and Smolke, 2011) for a given input such that state changes approximate discrete behaviour. Multiple riboswitches can be introduced on a single transcript or linked in trans across multiple ones to implement logic functions (Bayer and Smolke, 2005; Win and Smolke, 2007a; 2008).
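How a graded molecular response can approximate discrete behaviour, and how two such responses can be combined into a logic function, may be sketched numerically. The example below is a toy abstraction and not a model taken from the cited works: it assumes that each switch follows a Hill-type dose response, that two switches on one transcript act independently, and that the transcript is ON only if both switches are ON (AND) or if at least one is ON (OR). The function names and all parameter values are hypothetical.

```python
def switch_response(ligand, k_half=1.0, hill=4.0):
    """Fraction of switches in the ON conformation for a given input level.

    Hill-type response (an assumption): the larger the Hill coefficient,
    the closer the graded response comes to a discrete ON/OFF step.
    """
    return ligand ** hill / (k_half ** hill + ligand ** hill)


def and_gate(input_a, input_b):
    """Both independent switches must be ON (product of ON fractions)."""
    return switch_response(input_a) * switch_response(input_b)


def or_gate(input_a, input_b):
    """At least one of two independent switches must be ON."""
    off_a = 1.0 - switch_response(input_a)
    off_b = 1.0 - switch_response(input_b)
    return 1.0 - off_a * off_b


if __name__ == "__main__":
    low, high = 0.1, 10.0  # hypothetical input levels well below/above k_half
    for a in (low, high):
        for b in (low, high):
            print(f"a={a:5.1f} b={b:5.1f} "
                  f"AND={and_gate(a, b):.4f} OR={or_gate(a, b):.4f}")
```

With inputs chosen well below or well above the half-maximal level, the outputs stay close to 0 or 1, which is the sense in which such analogue responses can be read as Boolean values.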
While riboswitches are mostly found in bacteria (Henkin, 2008), RNA interference plays a key regulatory role in higher organisms (Fire et al., 1998). For detailed modalities of this pathway from a biological perspective, please refer to Section 1.3.2. In biomolecular computing, RNA interference has been successfully used to implement the largest in vivo computation based on Boolean logic so far (Rinaudo et al., 2007). Current research tries to broaden the range of input molecules and flexibly integrate them with riboswitches (An et al., 2006; Tuleuova et al., 2008; Beisel et al., 2008) or RNA interference platforms (Leisner et al., 2010; Xie et al., 2010).

The increasing understanding of the regulatory roles of RNAs in the cell, combined with the importance of their associated pathways, makes RNAs attractive substrates and/or building blocks for molecular computers, especially in the envisioned diagnostic and therapeutic applications (Isaacs et al., 2006; Davidson and Ellington, 2007; Benenson, 2009a). The ability to rationally design or tailor RNA regulators (Ellington and Szostak, 1990) can enable computational devices to access a large variety of (extra-)cellular signals. Taken together, these features seem to position RNA-based approaches as key elements towards biocompatible systems.

From a biological perspective, research over the past years revealed that cellular computation, i.e. gene regulation, cannot be fully explained by just cis-regulatory networks (Bonn and Furlong, 2008; Istrail et al., 2007; Levine and Davidson, 2005) composed of transcription factors, signalling proteins and (non-coding) RNAs. Chromatin, the structure that packages DNA in the nucleus, emerged as another layer of regulatory impact (see Section 1.3.1 for details).
From a biomolecular computing viewpoint, practical attempts to utilize chromatin have not been made as of the writing of this thesis; first theoretical works, however, suggest great potential of chromatin as computational substrate, especially in terms of memory capacity and functional impact (Prohaska et al., 2010).

1.3 Epigenetic regulation in the cell

1.3.1 Chromatin

Chromatin architecture

In eukaryotic cells, the macro-molecule of DNA is tightly compacted into the nucleus of a few microns in diameter. This packaged structure is called chromatin and appears as regularly spaced ‘beads on a string’ under the electron microscope (Olins and Olins, 1974). The ellipsoidal ‘beads’ are the fundamental architectural units of the eukaryotic genome and referred to as nucleosomes (Kornberg, 1977; Oudet et al., 1975). Each nucleosome contains ∼147 base pairs (bp) of DNA wound 1 3/4 turns around an octamer of histone proteins: two heterodimers of core histones H2A and H2B and a tetramer of histones H3 and H4 (Arents et al., 1991; Luger et al., 1997).

As illustrated in Figure 1.2, the architectural level of nucleosomes is the first one in the hierarchy of increasingly complex layers, which together determine the chromatin structure (Misteli, 2007). Above the nucleosomes, on the second level, chromatin folds into higher-order structures known as fibres (Woodcock, 2006). Besides further compaction, these structural features may bring distant genome regions into close proximity, facilitating the interactions of enzymatic complexes and spatially joining nonadjacent genetic elements (Fraser, 2006; van Driel et al., 2003). On the third level, chromatin fiber looping, together with the spatial and temporal organization of enzymatic machineries, defines sub-nuclear compartments within the nucleus which facilitate cellular processes such as DNA replication, repair and transcription (Misteli, 2005).
Being at the first level of this architecture, nucleosomes indeed play a fundamental role in genome organization and regulation. They exhibit great plasticity, both in their composition and localization, and are subject to an array of post-translational histone modifications (Kouzarides, 2007) (see Section 1.3.1). The stability of these modifications varies. Some are transient, characterized by fast enzymatic removal or rapid histone turn-over, whereas others are more permanent, mitotically or even meiotically heritable and capable of affecting the next generation of cells (Youngson and Whitelaw, 2008).

Figure 1.2: Depiction of the architectural hierarchy of chromatin and some of its known modifications. In the cell nucleus, DNA is compacted into the dense chromosome structure, which in turn is composed of chromatin loops and fibers. The basic units of chromatin are nucleosomes, consisting of histone proteins at the core that are subject to post-translational modifications such as acetylation (Ac), methylation (Me) and ubiquitination (Ub). Together with histone modifications, histone variants such as H2A.Z (Z) and DNA methylation (Me) characterize distinct chromatin neighbourhoods.

Since some modifications are stable and heritable, chromatin has become a focus in the field of epigenetics (Bird, 2007). In this context, epigenetic modifications refer to changes in genome function such as gene expression that are not based on changes of the DNA sequence (Ptashne, 2007; Goldberg et al., 2007). Although heritability of the modification is required by most definitions of the term epigenetic, there are examples of cells that no longer divide and still possess non-sequence-based modulation of cellular functions via chromatin modifications (Bonasio et al., 2010; Dulac, 2010). The term somatic epitype has been proposed additionally for non-heritable epigenetic modifications (Lahiri and Maloney, 2006), but has not found wide usage so far.
The specific histone modification patterns and the structural composition of chromatin at any point in time serve as an epigenetic memory in the cell that allows integrating and storing input signals and generating functional outputs on cellular processes (Filion et al., 2010). It has been shown that past cell states and environmental influences, such as nutrition, exposure to toxic agents, infections and diseases, affect the chromatin structure (Zhang and Meaney, 2010). The capability of integrating such (environmental) signals in a rapid, flexible and often reversible manner into the program of the genome provides the regulatory mechanisms to adapt functions and pathways in higher organisms (Prohaska et al., 2010).

A classic example of the purpose of this memory platform, on which input signals are stored as epigenetic traces and influence cellular functions (at a future point in time), is given by the establishment and maintenance of cell identity in higher organisms (Satterlee et al., 2010). Even though every cell of an organism is based on a single zygotic genome, subsets of progeny cells launch distinct programs of gene expression along the developmental trajectory that eventually leads to their specific organismal character and role. These cell identities are usually maintained for a lifetime, even when differentiation signals were experienced only once during embryonic development (Bonasio et al., 2010). The induced epigenetic traces capture and stabilize these signals and render the new expression states heritable (in those cells that continue to divide) (Bryant et al., 2008; Ptashne, 2009).

Chromatin dynamics

Over the course of almost a century after its initial description in the early 1880s (Flemming, 1882), chromatin slowly emerged as a (passive) scaffolding structure for the DNA.
The fast pace of discoveries in the field of genetics, however, virtually eclipsed research on chromatin until the structure of the nucleosome was revealed (Kornberg, 1974). Against common belief up to that point that histone proteins would simply coat the DNA, the newly proposed structure put DNA on the ‘outside’ of nucleosomes (Kornberg, 1974; Oudet et al., 1975), suggesting they must be actively involved in making segments of DNA accessible to cellular machineries. However, the active role of chromatin in regulatory processes, in particular transcription, was not apparent until the description of the transcription-linked histone modifiers from Tetrahymena (Brownell et al., 1996) and mammalian cells (Taunton et al., 1996). Since then, the research fields of transcription and chromatin have been connected and changed the understanding of chromatin from a rigid scaffold to a dynamic structure. The fact that nucleosomes unite both stability and plasticity in order to maintain the DNA in a compacted form while still allowing controlled access to the underlying genetic information, demonstrates their regulatory importance for rendering genomic loci accessible or inaccessible for proteins in DNA-templated processes (Kornberg and Lorch, 1999).

Today, chromatin regulation is understood to be one of the fundamental modes in gene regulation and cellular information processing in eukaryotic cells (Prohaska et al., 2010). To control the impact of nucleosomes on chromatin structure and processes, cells employ two main strategies: One way is through chromatin remodelling (Cairns, 2007), the second way works through post-translational modifications of histones (Bhaumik et al., 2007; Kouzarides, 2007).

In the first mode, chromatin remodelling enzymes use energy from ATP molecules to slide, eject or change the composition of nucleosomes (Clapier and Cairns, 2009).
They play important roles in altering histone-DNA contacts and making DNA accessible during transcription and repair processes, and in reassembling the proper chromatin structure after DNA replication (Cairns, 2007). Chromatin remodellers are also capable of replacing canonical histone proteins with histone variants that differ in amino acid composition (Clapier and Cairns, 2009). Dynamic repositioning of nucleosomes and introduction of histone variants through chromatin remodellers or other modifiers yields distinct chromatin configurations or neighbourhoods characterized by nucleosome occupancy and composition. These neighbourhoods may impact cellular functions by being recognizable to and/or actively recruiting enzymatic machineries in DNA-templated processes (Talbert and Henikoff, 2010).

In the second mode of chromatin regulation, histone proteins are subject to post-translational modifications (Figure 1.2), including acetylation, methylation, phosphorylation, SUMOylation, ubiquitination, adenosine-diphosphate ribosylation and biotinylation (Kouzarides, 2007). So far, modifications on more than one hundred histone amino acid residues have been identified. Histone modifications can occur in certain combinations and may influence each other in a process referred to as regulatory crosstalk (Latham and Dent, 2007). Influence on other histone modifications can occur in cis on the same histone, or in trans across histones and/or nucleosomes, leading to spatial and temporal dependencies of modifications (Suganuma and Workman, 2008).

Histone modifications influence the chromatin architecture twofold: In a direct way, they may change the chromatin structure through charge-dependent alterations affecting the contacts between adjacent nucleosomes or between histones and the DNA (Simpson, 1978; Ausio and van Holde, 1986; Ura et al., 1997; Ahn et al., 2005).
Classically, regions with ‘loose’ wrapping and less condensed packaging were referred to as euchromatin, whereas those with tight nucleosome packaging were referred to as heterochromatin (Allis, 2007). In a more indirect way, individual histone modifications or combinations thereof may act as recognition signals for the recruitment of regulatory proteins that trigger functional responses (Berger, 2007; Rando and Chang, 2009). This hypothesis is known as the effector-mediated model, where signalling modules exist that operate as ‘writer’, ‘eraser’ and/or ‘reader’ of chromatin marks (Ruthenburg et al., 2007). Many of the identified chromatin-modifying complexes conform to this hypothesis and have enzymatic domains to add (write), remove (erase) or bind (read) certain modifications (Ruthenburg et al., 2007).

The capability of writing, reading and erasing modification signals on histones without affecting the underlying DNA sequence constitutes an additional computational layer for storing and retrieving information in the cell (Prohaska et al., 2010). Observed correlations between certain histone modifications and cellular functions, such as the transcriptional activity of genes, led to the proposal that, on top of the genetic code, there exists an epigenetic histone code (Jenuwein and Allis, 2001; Strahl and Allis, 2000; Turner, 2007), which would, for example, allow deriving the transcriptional state of a gene by determining its histone marks. However, recent research indicates that, from a biological perspective, histone modifications do not represent a self-contained code which can be understood detached from the context. According to this view, histone modifications and their functional impact must be seen in light of their temporal and inter-relational dependencies, which would render the idea of a code into a ‘nuanced language’ that constitutes the basis for transcriptional regulation through the chromatin signalling pathway (Lee et al., 2010).
From a more computational perspective, however, the ambiguity or sometimes even the lack of (immediate) functional effects of certain chromatin modifications on cellular processes might hint at a decoupling event that took place during evolution and separated the epigenetic modification layer from direct biochemical functions, making it available for information storage and computation in a way classical regulatory effectors cannot work (Prohaska et al., 2010).

Current research is attempting to generate genome-wide maps for diverse chromatin modifications (Milosavljevic, 2010). These maps will broaden our understanding of epigenetic regulation and are likely to increase the translational potential of the field, in particular with respect to diagnostic and therapeutic applications in medicine, as improper regulation of chromatin marks has been linked with several diseases, including imprinting disorders, Rett syndrome, facioscapulohumeral muscular dystrophy and even autism (Feinberg, 2010). Some of the most interesting advances in cancer biology, particularly leukaemias, are attributable to changes in the epigenome, such as abnormalities in histone marks in the promoter region of genes or aberrant methylation of DNA at CpG islands and microRNAs (Nature Biotechnology, 2010). However, given the plasticity of many chromatin modifications, mapping them can only be a first step towards a coherent picture of their dynamics and impact. It will become necessary to map their localization at different points in time, in different tissues, in healthy and diseased cells, etc. to further understand the molecular mechanisms by which they mediate their role for the functioning of the cell. Furthermore, deciphering the histone code or language (if there is any) is likely to require information from higher-order architectural dimensions of chromatin (see Section 1.3.2) and the interplay between DNA and/or chromatin modifications and RNAs, transcription factors, nuclear organizing factors and signal transduction pathways (Nature Biotechnology, 2010).

The current stage of epigenetic research seems comparable to genetic research before deciphering the DNA structure and the genetic code (Henikoff and Grosveld, 2008). If this comparison is true, the hopes of a revolutionary impact of epigenetics in many fields might not be unreasonable.

1.3.2 RNA interference

Operational principle

Conserved across a wide spectrum of eukaryotic organisms, RNA interference serves on one side as a natural defense mechanism in the cell against viruses and transposons and, on the other side, as a regulatory system that is key in development and gene expression in general (Zamore, 2002). With respect to the latter aspect, RNAi is part of the cellular machinery controlling which genes are active and tuning to which degree they are active.

Intrinsically, RNAi is a negative feedback process to silence genes. As described in Section 1.3.2, RNA interference may embrace processes or features of the chromatin layer in several organisms, but the ‘classic’ operational principle is based on the post-transcriptional level. There, the protein machinery of RNAi controls the abundance of messenger RNA (mRNA) transcripts through small-interfering RNA (siRNA) molecules that base-pair with homologous regions on mRNA targets and induce their degradation (Fire et al., 1998). Feedback loops in that process are capable of sustaining, reinforcing and modulating the interfering impact (Lipardi et al., 2001; Plasterk, 2002; Sijen et al., 2001; Zamore, 2002). Thereby, RNAi is capable of altering the information that is encoded and conveyed in mRNA expression levels.
Initially, the phenomenon of RNA interference, though termed differently, was discovered in plants (Ecker and Davis, 1986; Napoli et al., 1990). While trying to increase the colour intensity of a flower, additional copies of a gene had been introduced into the plant’s genome. However, instead of the expected overexpression of the gene and intensified colour, substantially reduced transcript levels and weaker colour were observed. The molecular mechanisms for this phenomenon first remained elusive, and research in many organisms began to uncover them. The observations made in these organisms were labelled differently, such as post-transcriptional gene silencing in plants, RNAi in animals, quelling in fungi and virus-induced gene silencing (Agrawal et al., 2003). It took almost a decade until research in the nematode worm Caenorhabditis elegans identified double-stranded RNA (dsRNA) molecules as the silencing trigger (Fire et al., 1998).

Generally, dsRNA triggers for RNAi exist in two major forms in the eukaryotic cell (Figure 1.3): microRNAs (miRNAs), which arise from non-coding RNAs in the host genome, and small-interfering RNAs (siRNAs) from selfish or exogenous genetic material (Ding and Voinnet, 2007). Although both types have common features and share the same protein machinery (Gregory et al., 2006), they differ in their biogenesis, final structure and function. A newly transcribed miRNA, called pri-miRNA, is first cleaved into a pre-miRNA by the Drosha complex such that the remaining sequence of ∼70 nt in length folds into a hairpin structure, which is then subject to further processing by the Dicer complex and finally leads to a mature miRNA of ∼22 nt (Gregory et al., 2006). In contrast, the biogenesis of siRNAs begins with a long selfish or invasive dsRNA trigger, such as the genetic material of a virus (Ding and Voinnet, 2007), which is first cleaved by Dicer into several short duplexes of ∼20 nt in length.
Subsequently, each duplex gets separated such that one part becomes a mature siRNA.

Figure 1.3: Schematic diagram of core components and process steps of RNA interference (RNAi). The RNAi pathways are guided by small ncRNAs, primarily siRNAs and miRNAs. In the siRNA pathway, (exogenous) dsRNA are cleaved by Dicer into siRNA duplexes of which one strand is incorporated into the RISC complex that targets and typically cleaves mRNAs based on perfect sequence homology. (For other siRNA-based processing modes, refer to Section 2.3.) In the miRNA pathway, endogenously produced transcripts (pri-miRNAs) are first processed by the Drosha complex into pre-miRNAs which are subsequently exported into the cytoplasm where Dicer cleaves the stem-loop structure into the final miRNA. Once one strand of a mature miRNA is loaded into RISC, mRNAs are targeted and translationally inhibited. In contrast to siRNAs, sequence homology between miRNA and target is typically imperfect.

Once produced, the pathways for miRNA and siRNA converge at the RNA-induced Silencing Complex (RISC), the core component of the RNAi protein machinery (Rana, 2007). Both miRNAs and siRNAs can be loaded into a RISC complex and guide it to mRNA transcripts that share sequence similarity with the trigger molecule (Figure 1.3). MiRNAs induce gene silencing by base-pairing with regions of an mRNA target and blocking the translational machinery. The pairing with the target happens at the post-transcriptional level in the cytoplasm but is often incomplete, with several base pairs mismatching. SiRNAs, on the other hand, generally base-pair perfectly with a site on the target and post-transcriptionally silence the gene by inducing cleavage of the transcript.
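The two targeting outcomes described above (perfect pairing leading to cleavage, imperfect pairing leading to translational inhibition) can be illustrated with a minimal Python sketch. It is a toy abstraction and not a model used in this thesis: the function names, the example guide sequence and the mismatch threshold `max_mismatches` are assumptions chosen for illustration only, and real target recognition depends on many factors that are ignored here.

```python
# Toy sketch: classify the effect of a RISC-loaded guide RNA on an mRNA by
# counting mismatches at the best-matching site. All names and the threshold
# are illustrative assumptions, not values taken from the text.
from typing import Optional, Tuple

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def best_site(guide: str, mrna: str) -> Optional[Tuple[int, int]]:
    """Slide the guide's complementary site along the mRNA.

    Returns (position, mismatches) of the window with the fewest mismatches,
    or None if the mRNA is shorter than the guide.
    """
    site = reverse_complement(guide)
    best = None
    for start in range(len(mrna) - len(site) + 1):
        window = mrna[start:start + len(site)]
        mismatches = sum(1 for a, b in zip(site, window) if a != b)
        if best is None or mismatches < best[1]:
            best = (start, mismatches)
    return best


def classify(guide: str, mrna: str, max_mismatches: int = 4) -> str:
    """Perfect pairing -> cleavage; few mismatches -> translational repression."""
    hit = best_site(guide, mrna)
    if hit is None or hit[1] > max_mismatches:
        return "no target"
    return "cleave" if hit[1] == 0 else "repress translation"


if __name__ == "__main__":
    guide = "UGAGGUAGUAGGUUGUAUAGUU"  # arbitrary 22-nt example guide
    site = reverse_complement(guide)
    print(classify(guide, "GGG" + site + "AAA"))                          # perfect site
    print(classify(guide, "GGG" + site[:9] + "GG" + site[11:] + "AAA"))   # two mismatches
```

The hard cut-off between the two outcomes is a deliberate simplification; it only mirrors the qualitative distinction drawn in the text.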
Even though differences in the modalities and features of the RNAi pathway exist between organisms, the phenomenon continues to converge towards a single universal theme of gene regulation: The common components of this regulatory pathway are that (i) the interference is triggered by a dsRNA molecule and that (ii) transcripts are targeted in a sequence-dependent manner by (iii) a ncRNA-controlled protein machinery (Agrawal et al., 2003).

The RNA interference pathway is not only a cellular machinery of intriguing universality but also represents a powerful technique that led to many applications in a spectrum of research fields, ranging from biology to medicine and even to biomolecular computing (see Section 1.2.3). As a tool in biology, RNAi allows for rapid determination of gene function in many organisms (Boutros et al., 2004; Kamath and Ahringer, 2003). RNAi has also found applications in the context of therapeutics. Given their high sequence specificity for genes, siRNAs can on one side be used to validate drug targets and on the other side to effectively modulate gene expression in disease cells (Bitko and Barik, 2001; McManus and Sharp, 2002; Moss, 2003). For future medical applications, great potential is attributed to RNAi to combat carcinomas, myelomas and cancers caused by overexpression of oncoproteins (Tuschl and Borkhardt, 2002).

For the field of biomolecular computing, RNA interference represents a promising platform to realize computations with biomolecules in live cells. Circuits of Boolean logic elements based on RNAi have already been demonstrated successfully (Rinaudo et al., 2007) and efforts are under way to make a variety of (extra-)cellular signals available for RNAi-computers (An et al., 2006; Beisel et al., 2008; Culler et al., 2010; Tuleuova et al., 2008).
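To make the idea of Boolean logic built on RNAi more concrete, the following Python sketch shows one way such logic can be abstracted. It is a simplified illustration under stated assumptions and does not reproduce the circuit design of the cited work: a transcript is assumed to be silenced whenever any siRNA matching one of its target sites is present, and an output protein is assumed to be expressed if at least one transcript encoding it survives. The identifiers `siA`, `siB` and `siC` are hypothetical.

```python
# Toy abstraction of RNAi-based Boolean logic (illustrative assumptions only).
# Each transcript is represented by the set of siRNA species that can silence it.
from typing import FrozenSet, Iterable, Set


def transcript_survives(target_sites: FrozenSet[str], sirnas_present: Set[str]) -> bool:
    """A transcript survives only if none of its target sites is matched (NOR)."""
    return target_sites.isdisjoint(sirnas_present)


def output_expressed(transcripts: Iterable[FrozenSet[str]], sirnas_present: Set[str]) -> bool:
    """The output is expressed if any transcript encoding it survives (OR)."""
    return any(transcript_survives(t, sirnas_present) for t in transcripts)


if __name__ == "__main__":
    # Two transcripts encode the same output protein, so the circuit computes
    # (NOT siA AND NOT siB) OR (NOT siC).
    circuit = [frozenset({"siA", "siB"}), frozenset({"siC"})]
    for present in (set(), {"siA"}, {"siC"}, {"siA", "siC"}):
        print(sorted(present), output_expressed(circuit, present))
```

In this abstraction, silencing supplies negation and conjunction within one transcript, while several transcripts encoding the same output supply disjunction, so the overall expression has the shape of a disjunctive normal form over negated inputs.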
Connections between chromatin and RNAi-like processes

Chromatin and RNA interference play key roles in epigenetic phenomena and work in tandem to realize an array of cellular functions, including gene regulation, heterochromatin formation, DNA methylation and programmed DNA elimination (Mochizuki et al., 2002; Pal-Bhadra et al., 2002; Tabara et al., 1999; Taverna et al., 2002; Zilberman et al., 2003).

While the ‘classic’ RNA interference pathway represents a form of post-transcriptional gene silencing (PTGS) and impacts messages in the cytoplasm, its regulatory intersection with the chromatin layer enables RNAi(-like) mechanisms of transcriptional gene silencing (TGS) in the nucleus. Analogous to the sequence-driven recognition of mRNA targets in PTGS, small non-coding RNAs (ncRNAs) are also key in TGS for identifying genomic sites to trigger chromatin modifications through ncRNA-DNA interactions (Jones et al., 2001; Mette et al., 2000). Both PTGS and TGS modes of RNAi share core enzymatic machineries and operational principles that impact gene activity.

The function of RNAi as TGS by regulating the epigenetic structuring of the genome is conserved across many species (Grewal and Rice, 2004). In Schizosaccharomyces pombe, core proteins of RNAi complexes are required for heterochromatin formation (Grewal and Elgin, 2007). For instance, the equivalent of the RISC complex, called RNA-Induced Transcriptional Silencing complex (RITS) (Verdel et al., 2004), contains both a chromatin-associated protein and an RNAi-associated protein and, together with the RNA-Directed RNA polymerase Complex (RDRC), is a critical component in heterochromatin assembly. Deletion of any of the genes encoding these proteins results in defects in histone methylation and centromere formation, which disturbs proper iteration through cell cycle stages (Hall et al., 2002; Volpe et al., 2002; 2003).
Furthermore, components of the RNAi machinery together with the histone variant H2A.Z suppress antisense transcripts in the S. pombe genome (Zofall et al., 2009), which represents another intriguing example of the deep connection between chromatin and RNAi for proper mRNA processing and genome stability.

In Drosophila melanogaster and Tetrahymena, RNAi-like mechanisms are involved in TGS by affecting histone methylation and gene silencing proteins in heterochromatic regions (Pal-Bhadra et al., 2002; 2004; Mochizuki et al., 2002; Taverna et al., 2002; Liu et al., 2004).

Plants too share a similar set of RNAi proteins and enzymatic activities that are required for structuring the epigenome. In Arabidopsis, histone methylation, RNA-directed DNA methylation (RdDM) and targeting of repetitive DNA sequences require an RNA-driven RNAi-like machinery (Matzke et al., 2002; Zilberman et al., 2003; Onodera et al., 2005; Gao et al., 2010; Lister et al., 2008; Cokus et al., 2008).

In mammals, RNAi has been associated with epigenetic chromatin mechanisms for inactivation of one of the X-chromosomes in female offspring. In that process, a long ncRNA, called Xist, is transcribed from the inactive X-chromosome and induces/maintains chromosome-wide gene silencing by recruiting silencing complexes in cis (Brockdorff et al., 1992; Brown et al., 1992; Clemson et al., 1996; Penny et al., 1996; Marahrens et al., 1997). Paradoxically, silencing does not affect the Xist gene itself. Instead, the recruited chromatin-modifying complexes maintain its expression in a positive feedback loop. On the active X-chromosome, in contrast, the antisense transcript of Xist, Tsix, is expressed and binds to Xist forming a dsRNA duplex (Lee et al., 1999; Lee and Lu, 1999; Lee, 2000; Luikenhuis et al., 2001; Sado et al., 2001). This duplex is proposed to be processed in an RNAi-like Dicer-dependent manner, which causes downregulation of Xist on the active X-chromosome (Ogawa et al., 2008).
It is further proposed that the RNAi pathway is involved in the regulation of chromatin modifications and in the spreading of silencing on the inactive X-chromosome (Ogawa et al., 2008).

Besides their role in chromatin structuring in cis, long ncRNAs also emerged as important regulators of the epigenome from a second perspective. HOTAIR, for example, is a long ncRNA that is proposed to act in trans by providing a scaffold for chromatin-modifying complexes (Tsai et al., 2010). In binding to two distinct complexes at the same time, HOTAIR mediates coordinated methylation and demethylation of two amino acid residues on histone H3, and thereby inherently specifies a pattern of chromatin modifications to silence the genes it is recruited to.

While some of the detailed molecular mechanisms remain to be elucidated, these examples clearly demonstrate a strong connection between RNA- and chromatin-mediated processes for cellular functions. Given that large portions of the genome are transcribed and only a fraction translated into proteins, it is likely that other examples and mechanisms enlarge the regulatory overlap of both layers (Lee, 2010).

1.4 Thesis contribution

With the work presented in this thesis, I elucidate fundamental aspects of information representation and manipulation in bioprocesses from both the Bioinformatics as well as the Biomolecular Computing perspective.

From the Bioinformatics side, I focus on chromatin, the nucleo-protein structure that packages the DNA in the cell. As a computation and storage layer in the cell, chromatin provides high plasticity to integrate (environmental) signals into DNA-based regulation and allows information propagation according to epigenetic principles. In this work, I present results on the complexity of chromatin modifications and reveal their impact on gene activity and related functions in the cell.
I further discuss the software pipeline that I developed and that was employed for genome-wide chromatin profiling and transcriptome analysis.

From the Biomolecular Computing perspective, I draw on epigenetic principles to devise a model for an RNA interference-based equivalent of the electric flip-flop, which is one of the fundamental elements in digital circuits. In particular, I focus on the digital switch abstraction of the flip-flop as it is a central idea in both the electric and biomolecular world to connect and at the same time decouple the underlying physical process from the abstract representation of information. This work explores the computational principles of the RNA interference machinery and suggests novel ideas for universal memory units in biomolecular computing.

By complementing both the natural and theoretical perspective, I attempt to enhance our understanding of epigenetic information processing in the cell and its capacity for biocomputing. In particular, the projects presented in this dissertation led to the following findings and contributions:

• Potential for an RNA interference-based flip-flop: I devise a mathematical model for the biomolecular equivalent of the electronic flip-flop using RNA interference and speculate how this system could potentially be used in biocompatible computing devices as a memory unit. The novelty of this work is centered on the non-sequence-based abstract encoding scheme for digital information and how the RNAi machinery might unite analog and digital features for computing in the organismal context.

• Non-redundancy of residue methylation states in chromatin architectural proteins: My analysis helped reveal that di- and trimethylation of histone H3 lysine 79 are mutually exclusive across the genome and that dimethylation associates with specific sets of cell cycle-regulated genes.
This work is important as it answers the long-debated question in the field of chromatin biology of whether H3 lysine 79 methylation states are functionally redundant and further reveals a novel connection between the epigenetic state of chromatin and transcriptional gene activity. From the biocomputing perspective, the fluctuation and distinct localization of H3K79 methylation during the cell cycle could indicate a toggle mechanism that signals a particular system state to cellular machineries and discriminates marked genes from others, thereby triggering the correct iteration through the cell cycle program similar to a finite state machine, with cycle stages being machine states, chromatin modifications being (part of) transition events and activation of certain genes being generated outputs.

• Cross-talk of residue modifications in chromatin architectural proteins: In this project, my analysis aided in establishing the first genome-wide profile of histone H2B lysine 123 monoubiquitination in S. cerevisiae and in characterizing H2B lysine 123 monoubiquitination as a signalling tag that is required for proper establishment of particular H3 residue modifications. With respect to the underlying computational principles, the H2B lysine 123 monoubiquitination pathway demonstrates how certain histone modifications trigger others in the context of transcriptional gene activation. Drawing again on the finite state machine abstraction, the transcriptional program can be divided into certain states that need to be executed in a certain order. The presented sequence and combination of histone modifications can be regarded as (part of the) input to trigger correct state transitions, and the recruitment of enzymatic complexes as well as the establishing of downstream histone marks as generated outputs of such a machine.
• Functional non-redundancy of protein function in the context of chromatin: My work contributed to the discovery that the histone deubiquitinases, Ubp8 and Ubp10, act on distinct genomic loci. While Ubp8 removes the ubiquitin tag from H2B lysine 123 at sites enriched for trimethylated H3 lysine 4, Ubp10 functions at those marked by trimethylated H3 lysine 79. This work made it possible to answer the question of functional redundancy of Ubp8 and Ubp10 and to reveal genome-wide dependencies of histone marks. Regarding potential biomolecular implications, the discussed results point towards principles of how identical enzymatic activity can be recruited to distinct genomic loci in a histone modification-dependent manner, supporting the hypothesized histone code.

• Aberrant transcripts and causalities regarding chromatin structure: The analysis I performed helped elucidate the chromatin structure at sites of cryptic transcription and determine causalities with respect to certain histone modifications. From the biomolecular computing viewpoint, the data suggest that certain chromatin modifications have no (direct) impact on gene activity, hence offering essential degrees of freedom to exploit these resources for information encoding in biocomputers with less interference with cellular function.

• Software tools for chromatin modification profiling and transcriptome analysis: As a product of studying chromatin modifications, I developed several software tools that allow analyzing ChIP-on-chip, gene expression and related data. I also implemented tools for visualizing genome-wide data and made them publicly available as plug-ins for the Galaxy bioinformatics environment.

Chapter 1 has motivated the idea of biomolecular computing, specifically biocompatible approaches. It has further introduced key concepts and terminologies of RNA interference, chromatin biology and their role in cellular information processing.
In the remainder of this thesis, Chapter 2 describes the RNA interference-based flip-flop model and presents its mathematical analysis. In Chapter 3, the bioinformatics pipeline that I developed is explained and used to analyze whole-genome chromatin modification maps and gene expression data. Chapter 4 puts the results derived through the pipeline in their biological context and studies aspects of chromatin structure and regulation and their impact on cellular functions. Chapter 5 finally presents the conclusions of this work and embeds them into the bigger picture of information processing in living systems.

Chapter 2

RNA interference and the potential for a bistable flip-flop

2.1 Synopsis

RNA interference continues to emerge as one of the central regulatory mechanisms in the cell and it is found across a wide spectrum of eukaryotic organisms. Uniquely, the protein machinery of RNAi is capable of generating arbitrary affector molecules and using them to identify and influence target molecules in a completely general yet highly specific manner. Feedback loops in that process relate affector and target molecules as inputs and outputs, suggesting that RNAi can sustain a general form of cellular computation.

In this chapter, we speculate on how eukaryotic cells might use RNAi to generate a form of epigenetic memory at the RNA level and realize computations based on sequential logic. We present a model for an RNAi-based equivalent of the well-known electric flip-flop, the fundamental memory unit in digital circuits, and reason how natural and engineered systems might use it. The presented work aims to elucidate abstract principles of RNAi from a biological perspective and suggests ideas for universal memory units from a biomolecular computing standpoint.

2.2 Background

Biological systems process information.
From simple to complex organisms, cellular components must sense, process and produce molecular signals so as to respond and interact appropriately with their environment (Cooper and Hausman, 2009; Alberts et al., 2007; Lodish et al., 2007). This regulation is primarily based on precisely controlled and highly contextual (de-)activation of genes in particular cellular situations. Since gene products often provide additional regulatory control that activates or represses further downstream genes, regulatory cascades have evolved that allow an appropriate cellular response to particular environmental cues.

A key feature of such regulatory cascades is whether or not they can feed back on themselves. In the absence of feedback, the differential equations describing such systems will always give a cellular response that is specified by a given environmental condition. Often, however, it is biologically useful and necessary to remember past cellular conditions and respond appropriately from this context. Processes like the iteration through the cell cycle and the unfolding of the developmental program in higher organisms exemplify various cellular programs that are stateful, and demonstrate that the underlying computational principles are sequential rather than combinatorial in nature (Paul Hill, 1989). Consequently, these regulatory programs require a memory to store information (Alberts et al., 2007).

Memory can be generated on the genetic level by linking regulatory elements through feedback mechanisms as exemplified by the lambda repressor (Ptashne, 1986), the lac operon (Müller-Hill, 1996), the tetracycline repressor (Hillen and Berens, 1994; Hinrichs et al., 1994), and the lux operon (Babloyantz and Nicolis, 1972; Dunny and Winans, 1999). These regulatory systems capture transient stimuli, modulate associated cellular processes and maintain the process states beyond the duration of the trigger signal.
Upon the discoveries of these regulatory systems, research towards the understanding of gene activity focused for almost two decades on identifying genetic regulatory elements (Dynan, 1989; Ptashne, 1988). The common view was that by knowing the regulatory elements of a gene, the timing and modality of its expression could be derived. It became clear, however, that regulation is often more complex and that additional mechanisms modulate gene activity.

Several of these mechanisms are referred to as epigenetic since they encode (heritable) information and mediate regulatory impact without changing the DNA sequence (Bird, 2007). Epigenetic mechanisms encode information and modulate gene expression based on the prior history of a eukaryotic cell (Bird, 2007). DNA methylation and histone modification both encode epigenetic information by defining distinct chromatin neighbourhoods that are able to modulate cellular transcription and that are fundamentally a cellular memory of past events (Bonasio et al., 2010). Epigenetic information may span multiple regulatory layers in order to converge and reinforce signals: besides DNA methylation and histone modifications, transcription factors and non-coding RNAs play critical roles in orchestrating complex epigenetic states (Bonasio et al., 2010).

A process that continues to emerge as a ‘glue’ between (epigenetic) regulatory layers is RNA interference. The RNAi machinery and epigenetic phenomena have been linked to a wide range of functions in the eukaryotic cell, including chromosomal dynamics (Hall et al., 2002), stem cell maintenance (Cox et al., 1998) and cell fate determination (Bohmert et al., 1998). This central role of RNAi makes it an ideal interface to understand and manipulate cellular computations. Uniquely, the protein machinery of RNAi can generate arbitrary RNA affector molecules (siRNAs) that in turn are used to target highly specific RNAi-mediated responses.
These responses make use of a completely general and highly specific strategy that relates affector sequences to target sequences via their ability to hybridize. Provided that the outputs from such a process can modulate the input, a general feedback mechanism exists that is defined by the RNAi machinery.

In this chapter, we speculate on how eukaryotic cells might use RNAi to generate epigenetic memory at the RNA level. We present a model for the biomolecular equivalent of the electric flip-flop, the fundamental memory unit in digital circuits, and reason how natural and engineered systems might employ it.

2.3 RNAi as an intrinsically programmable machine

Our current understanding of the facets of RNAi can be largely attributed to research in the nematode worm Caenorhabditis elegans. As shown in Figure 2.1, RNAi in C. elegans is initiated by the DICER protein complex, which cleaves long double-stranded RNA (dsRNA) molecules into short duplexes that are 21–25 bp in length (Gregory et al., 2005; Vermeulen et al., 2005; Zamore et al., 2000). Unwinding of a duplex yields a single-stranded guide and passenger strand. While the passenger strand is degraded, the guide strand gets loaded into the Argonaute RDE-1 as primary (1°) small-interfering RNA (siRNA) (Zamore et al., 2000). The ‘loaded’ enzyme, also known as RNA-induced Silencing Complex (RISC), then targets and potentially cleaves mRNA transcripts that are complementary to the guiding siRNA (Siomi and Siomi, 2009). Successful cleavage of an mRNA triggers the recruitment of an RNA-dependent RNA polymerase (RdRP) complex, which initiates the production of secondary (2°) siRNAs towards the 3′-end of the transcript (Baulcombe, 2007). Thereby, the location of the initial transcript cleavage site defines a periodicity and phasing of 2° siRNA production initiations (Pak and Fire, 2007). Consequently, different primary cleavage sites lead to distinct sets of 2° siRNAs.
Newly produced siRNAs become available to different RISCs and determine their mode of operation (Okamura and Lai, 2008) as depicted in Figure 2.1. The capacity of the RNAi pathway to amplify and modulate a primary silencing signal such that secondary affectors may target additional mRNAs is referred to as transitive RNAi (Alder et al., 2003).

The regulatory universality of RNA interference and its central connectivity with cellular processes led to the development of many applications. Besides the immediate utilization of RNAi in Biology to rapidly determine functions of genes (Boutros et al., 2004; Kamath and Ahringer, 2003), and its medical applications to validate drug targets and to effectively downregulate mutant genes in disease cells (Bitko and Barik, 2001; McManus and Sharp, 2002; Moss, 2003), RNA interference also has had a profound impact on Biomolecular Computing. A central goal of research in Biomolecular Computing is to build miniature computing machines that monitor, control and even reprogram processes in live cells (Benenson, 2009b; Culler et al., 2010). Efforts in the field are driven by the fact that biomolecules, in contrast to any other computational substrate, allow direct computational interaction with living systems, a paradigm that is referred to as biocompatible computing (Shapiro and Benenson, 2006).

Figure 2.1: RNAi pathways in Caenorhabditis elegans. The DICER protein complex cleaves long dsRNA molecules into short duplexes, of which one strand gets separated and loaded into the RISC complex as a primary (1°) siRNA. This complex then targets and potentially cleaves mRNAs that share complementarity with the loaded siRNA. The cleavage event triggers recruitment of an RdRP complex that produces secondary (2°) siRNAs primarily towards the 3′-end of the transcript. The location of the cleavage site defines the future periodicity of siRNA cleavage sites. Newly produced siRNAs are available to different RISCs and determine their mode of operation: they may again (1) cleave a target and amplify a response, or initiate production of other siRNAs, (2) degrade an mRNA without generating new siRNAs, or (3) block a transcript from being translated without cleaving it.

Most recent works from the field consider RNA-based approaches to be an ideal platform to interface with cellular networks and to compute inside cells (Benenson, 2009a; Culler et al., 2010). The versatility of RNA functionality in the cell to both sense and actuate, and the rational design techniques with which RNA structures can be designed (Link and Breaker, 2009; Davidson and Ellington, 2007), were already successfully employed to engineer information processing devices that identify and respond to disease conditions in live cells (Leisner et al., 2010; Rinaudo et al., 2007; Xie et al., 2010; Beisel and Smolke, 2009; Win et al., 2009; Win and Smolke, 2007a). Yet, these devices encode information directly as a function of sequence, length or structure of nucleic acids. The molecule manipulations required to reflect changed bits of information are not necessarily reversible and cause the information-encoding molecules to degrade over the course of the computational process. Thus, these devices are limited to a single operational cycle, in which they essentially evaluate a formula encoded in combinatorial logic. Sequential logic and iterated operation require a type of memory that matches the dynamics and plasticity of the underlying information and that is not intrinsically encoded by the problem to be solved (Paul Hill, 1989).
We herein propose an information encoding scheme that could potentially augment the computational versatility of current biocompatible devices by drawing on the characteristic of the RNA interference pathway to separate information from machinery. We present a model of the biomolecular equivalent of the well-known electric flip-flop, the fundamental memory unit in digital circuits. The information encoding of the flip-flop is not based on individual molecule properties but on relative concentration differences between mRNA populations. As for natural mRNA regulation, RNA interference is used to modulate the mRNA concentrations and thereby alter the encoded information.

2.4 Digital memory with RNAi

Digital memory at its core has bistable circuitry that records an ON or OFF state so long as the circuit is active. Applying a trigger pulse to such a flip-flop circuit results in flipping from ON to OFF, or OFF to ON, depending on the initial state of the circuit. Electronically, such memory is achieved by building a flip-flop from a combination of dissipative elements (resistors) and active elements (transistors) in a symmetrical fashion so that the state of the circuit can be read out as a high (ON) or low voltage (OFF) on one half of the circuit (say at S in Figure 2.2A). Such dynamic devices are at the core of modern computer memory (Random Access Memory, RAM) and store information so long as powered.

RNAi provides a nearly exact analog for the fundamental components of the electronic flip-flop. RNAi makes possible the selective degradation of particular RNAs (dissipative in nature), while in certain organisms secondary small RNAs can be produced in a sequence-dependent fashion (active in nature). Since this form of regulation is kept digitally separated by virtue of the sequence-specific and hybridization-based machinery, RNAi offers, at least in principle, a biological route to generalized computation.
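The trigger-driven flipping described above can be pictured with a purely logical toggle, sketched here in Python. The class and function names are hypothetical, and the sketch deliberately ignores all analog detail (voltages, concentrations, timing); it only illustrates that the stored bit persists between pulses and inverts on each pulse.

```python
class ToggleFlipFlop:
    """Idealized one-bit memory: holds its state until a trigger pulse arrives."""

    def __init__(self, state: bool = False) -> None:
        self.state = state  # True = ON, False = OFF

    def pulse(self) -> bool:
        """Apply one trigger pulse and return the new state."""
        self.state = not self.state
        return self.state


def run(pulses):
    """Feed a sequence of 0/1 trigger samples; return the state after each sample."""
    ff = ToggleFlipFlop()
    history = []
    for p in pulses:
        if p:  # a pulse flips the stored bit
            ff.pulse()
        history.append(ff.state)  # without a pulse the bit is simply kept
    return history
```

For instance, `run([0, 1, 0, 0, 1, 0])` keeps the bit OFF until the first pulse, holds it ON over the following pulse-free samples, and returns to OFF at the second pulse.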
As shown in Figure 2.2B, the RNAi flip-flop core is represented by two distinct mRNAs A and B. The flip-flop state is encoded by the relative concentration difference between these mRNA species. Similar to the electric flip-flop (see Figure 2.2A), A and B represent the equivalent of two cross-coupled inverting elements. Their influence on each other is mediated by primary siRNAs s and p, and 2° siRNAs a, b, x and y. We assume that A and B are transcribed uniformly and with equal rates. Primary siRNAs s and p are produced outside of the circuit, and s is assumed to be at a uniform concentration. They serve to set the phasing of the secondary RNAs, which in turn serve to reinforce a particular state of the flip-flop. A source of siRNA s serves to establish the entire flip-flop circuit and induces a pattern of secondary RNAs in both mRNA populations that will inevitably force the circuit in the steady state to have a defined state (i.e. A high and B low, or B high and A low). Switching between the two flip-flop states is under the control of an externally supplied siRNA p, which we assume to be a transient cellular signal of interest. Pulsing p results in the symmetric swap of concentrations between A and B and flips the state of the circuit, which can be read out by monitoring levels of A and B or the siRNAs a and b.

Figure 2.2: Flip-flop mode of operation. A: A bistable multivibrator circuit or electric flip-flop. The circuit is composed of two cross-coupled transistors and resistors (Paul Hill, 1989). Both parts are mutually exclusive for sustained conductance and influence each other in terms of the current and voltage. In analogy to the RNA flip-flop, the components are colour-coded. The flip-flop state is defined by the part that is set conductive by applying a voltage to the base of one of the transistors (e.g. through S). The state can be switched by setting the conductive part non-conductive by applying a voltage to the base of the opposite transistor (through R). Both system states are stable and maintained even after S or R are no longer applied. B: Interactions of the mRNA molecules through siRNAs in the RNA flip-flop. Blunt-ended arrows represent negative, regular arrows protective impact on the mRNAs through siRNAs in the sense of the flip-flop. The state of the flip-flop is defined by the relative concentrations of A and B, maintained through the impact of s, and altered by transient pulses of p. In analogy to its electric counterpart, both RNA flip-flop parts influence each other symmetrically such that one mRNA is in high concentration while the other is low.

Figure 2.3: Sequence setup of the RNA flip-flop. The mRNAs A, B represent the equivalent of two cross-coupled inverting elements in bistable electric circuits (see Figure 2.2A). They interact through 1° siRNAs s and p, and 2° siRNAs a, b, x, y. Stretches with primed labels along the mRNAs represent siRNA target sites, unprimed labels siRNA production sites. Vertical lines along the mRNA indicate idealized phasing defined by s and p.

Without loss of generality, we assume the flip-flop to be in the ON state when mRNA A is more abundant than B, and OFF conversely. Constant expression of the RNAs A, B and s causes constant cleavage in both mRNA populations since both transcripts have a target site for s. Successful cleavage events trigger the templated production of 2° siRNA b on A, and a on B, respectively. Once produced and loaded into a RISC, b induces degradation (RISC mode 2, Figure 2.1) of mRNA B, while a, originating from B, targets members of the A population. Since A is assumed to be in higher concentration, production of b is higher, too. This allows A to continue to dominate over the concentration of B and, hence, to maintain the current flip-flop state.
The actual concentrations of A and B represent dynamic equilibria: even though both mRNA populations are negatively impacted by siRNA-mediated cleavage and unspecific degradation, there is a constant transcriptional supply that accounts positively for their concentrations. Both opposing processes cancel each other out at the equilibrium points for each mRNA.

Switching the flip-flop from ON to OFF is triggered by a transient pulse of siRNA p. The target sites for p on A and B are out of phase with those for s. Hence, p causes templated production of a, x on A, and of b, y on B. Newly generated a and b then target the same mRNA population they originated from, while x and y have a rescue effect on the opposite population by partially obscuring the target sites for both s and p without triggering any mRNA degradation (RISC mode 3, Figure 2.1). Since A is assumed to be more abundant than B, more a than b is produced, which in turn establishes a reinforcement loop that brings down the A population quicker than B. Even further, since more x than y is produced, the protective effect of the former on B is stronger than the latter’s on A. Hence, the negative impact on A continuously increases while B can gradually recover. Under the right conditions, the concentration of B climbs above A. Eventually, the influence of s drives both populations to the equilibrium concentrations, now mirrored. The new flip-flop state is stabilized and maintained until the next trigger pulse p occurs and flips the system back to its initial state.

2.5 The mathematical model of an RNAi-based flip-flop

The molecular interactions between the different RNA species of the flip-flop were described in a set of first-order ordinary differential equations (ODEs) (see Equations (2.1)–(2.8)) to analyze under what conditions the desired bistable behaviour can be achieved. Each equation describes one RNA species of the flip-flop as labeled in Figure 2.3.
For the mRNAs A and B at the core of the flip-flop, a constant transcription rate τ and an unspecific (i.e. not siRNA-mediated) degradation rate γ were assumed. The RISC-mediated mRNA cleavage rate was expressed through φ, and RISC-mediated translational blocking and unblocking of mRNAs through µ and δ, respectively.

\[ \frac{\partial A}{\partial t} = \tau - \gamma A - \phi a A - \mu y A + \delta A_b \tag{2.1} \]
\[ \frac{\partial B}{\partial t} = \tau - \gamma B - \phi b B - \mu x B + \delta B_b \tag{2.2} \]
\[ \frac{\partial a}{\partial t} = \omega B - \varepsilon a - \rho a A + \xi p_t A \tag{2.3} \]
\[ \frac{\partial b}{\partial t} = \omega A - \varepsilon b - \rho b B + \xi p_t B \tag{2.4} \]
\[ \frac{\partial x}{\partial t} = \xi p_t A - \varepsilon x - \mu x B + \delta B_b \tag{2.5} \]
\[ \frac{\partial y}{\partial t} = \xi p_t B - \varepsilon y - \mu y A + \delta A_b \tag{2.6} \]
\[ \frac{\partial A_b}{\partial t} = \mu y A - \delta A_b \tag{2.7} \]
\[ \frac{\partial B_b}{\partial t} = \mu x B - \delta B_b \tag{2.8} \]

The s-mediated production of the 2° siRNAs a and b was described through ω, which dominates the system behaviour during the steady state of the flip-flop, and a pulse-dependent production rate ξ that dominates the state transitions. The loading rate of siRNAs into RISCs was modeled by parameter ρ, and the unspecific siRNA degradation rate by ε. Using two parameters, ρ and φ, to model RISC-mediated mRNA cleavage allowed representing potential multiple turnovers of siRNAs as well as ineffective cleavage attempts despite successful RISC loading. Similar to the mRNA-templated production of a and b, the 2° siRNAs x and y are produced in a p-dependent manner. The latter two transiently block A and B, and unspecifically degrade at rate ε like the other small RNAs. Temporarily blocked mRNAs are represented by Ab and Bb. Their assembly and disassembly rates correspond to the blocking and unblocking rates modeled by µ and δ as described above.

\[ p_t = A_p \sqrt{\frac{w_p}{\pi}} \, e^{-w_p (t-D)^2} \tag{2.9} \]

For the pulse, we assume a Gaussian behaviour as given by Equation (2.9), where Ap and wp control the area and width and D its time offset from t = 0.

2.6 Steady state characteristics of the flip-flop model

Assuming the flip-flop to be in either one of the steady states and the pulse to not be present, i.e. pt = 0, the RNA species x, y, Ab and Bb do not exist, and the ODEs can be reduced to:

\[ 0 = \tau - \gamma A - \phi a A \tag{2.10} \]
\[ 0 = \tau - \gamma B - \phi b B \tag{2.11} \]
\[ 0 = \omega B - \varepsilon a - \rho a A \tag{2.12} \]
\[ 0 = \omega A - \varepsilon b - \rho b B \tag{2.13} \]

Iterative substitutions of A, a and b by the parameters in the equations reduced the system further to a single equation in B and the parameters only:

\[ 0 = \tau + \frac{\gamma(\gamma B - \tau)(\varepsilon + \rho B)}{\phi\omega B} + \frac{\phi\omega B(\gamma B - \tau)(\varepsilon + \rho B)}{\varepsilon\phi\omega B - \rho(\gamma B - \tau)(\varepsilon + \rho B)} \tag{2.14} \]

Solving Equation 2.14 leads to four solutions for B:

\[ B_1 = \frac{\tau\rho - \gamma\varepsilon + \sqrt{\tau^2\rho^2 + 2\gamma\rho\tau\varepsilon + \gamma^2\varepsilon^2 + 4\tau\varepsilon\phi\omega}}{2(\phi\omega + \gamma\rho)} \]
\[ B_2 = \frac{\tau\rho - \gamma\varepsilon - \sqrt{\tau^2\rho^2 + 2\gamma\rho\tau\varepsilon + \gamma^2\varepsilon^2 + 4\tau\varepsilon\phi\omega}}{2(\phi\omega + \gamma\rho)} \]
\[ B_3 = \frac{(\tau\rho - \gamma\varepsilon)(\phi\omega - \rho\gamma) + \sqrt{(\phi\omega - \rho\gamma)\left(\phi\omega(\tau\rho - \gamma\varepsilon)^2 - \rho\gamma(\gamma\varepsilon + \tau\rho)^2\right)}}{2\rho\gamma(\phi\omega - \rho\gamma)} \]
\[ B_4 = \frac{(\tau\rho - \gamma\varepsilon)(\phi\omega - \rho\gamma) - \sqrt{(\phi\omega - \rho\gamma)\left(\phi\omega(\tau\rho - \gamma\varepsilon)^2 - \rho\gamma(\gamma\varepsilon + \tau\rho)^2\right)}}{2\rho\gamma(\phi\omega - \rho\gamma)} \]

Substituting back in, these solutions represented the fixpoints of the equation system, i.e. the concentration values for A and B in the stable states of the flip-flop. The goal was to render at least two of these solutions positive definite in order to get two positive-definite fixpoints. For positive-definite parameters, the equation for B1 always renders positive-definite, while solution B2 is always negative and can be disregarded. From B3 we can derive the following constraint:

\[ \left(\frac{\tau\rho}{\gamma\varepsilon}\right)^{2} \left(1 + \frac{\gamma\varepsilon}{\tau\rho}\right)^{2} \le \frac{\phi\omega}{\gamma\rho} \left(\frac{\tau\rho}{\gamma\varepsilon} - 1\right)^{2} \tag{2.15} \]

and additionally from B4:

\[ \frac{\phi\omega}{\gamma\rho} > 1 \tag{2.16} \]

Substituting

\[ \alpha = \frac{\phi\omega}{\gamma\rho}, \qquad \beta = \frac{\tau\rho}{\gamma\varepsilon} \]

we can rewrite the constraints (2.15) and (2.16) as follows:

\[ (\beta + 1)^2 \le \alpha(\beta - 1)^2 \tag{2.17} \]
\[ \alpha > 1 \tag{2.18} \]

Plotting the two-dimensional constraint system described through (2.17) and (2.18) dissects the positive-definite quadrant into three regimes as shown in Figure 2.4A. The area labelled bistability (+) represented combinations of α and β that fulfilled all parameter constraints. Values from the monostability area represented degraded flip-flops with a single monostable state, and combinations from the bistability (−) region represented negative-definite fixpoint values.
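As an illustrative check of the steady-state analysis, the following Python sketch evaluates α, β, the constraints (2.17) and (2.18), and the closed-form fixpoints for one parameter set. The function names are hypothetical. The example values are those of the optimized flip-flop (Table 2.2), with φ taken as 4/3 l/(µmol·min); that value is an assumption, chosen because it is consistent with the 2 to 6 µmol/l range reported for A and B in Table 2.1. The asymmetric roots use (γε + τρ)² under the square root, which is the form that agrees with constraint (2.17).

```python
import math


def is_bistable(alpha, beta):
    """Constraints (2.17) and (2.18) on the dimensionless pair (alpha, beta)."""
    return alpha > 1 and (beta + 1) ** 2 <= alpha * (beta - 1) ** 2


def fixpoints(tau, gamma, phi, omega, eps, rho):
    """Return (alpha, beta, symmetric, asymmetric) for the pulse-free steady state.

    symmetric  : positive solution with A == B (B1 in the text)
    asymmetric : (low, high) pair from B3/B4, or None if those roots are not real
    """
    alpha = phi * omega / (gamma * rho)
    beta = tau * rho / (gamma * eps)

    d = tau * rho - gamma * eps
    # B1: note (tau*rho + gamma*eps)^2 equals the first three terms under the root
    disc_sym = (tau * rho + gamma * eps) ** 2 + 4 * tau * eps * phi * omega
    symmetric = (d + math.sqrt(disc_sym)) / (2 * (phi * omega + gamma * rho))

    # B3 and B4 share the same discriminant
    k = phi * omega - rho * gamma
    disc_asym = k * (phi * omega * d ** 2
                     - rho * gamma * (gamma * eps + tau * rho) ** 2)
    if k == 0 or disc_asym < 0:
        return alpha, beta, symmetric, None
    root = math.sqrt(disc_asym)
    denom = 2 * rho * gamma * k
    b3 = (d * k + root) / denom
    b4 = (d * k - root) / denom
    return alpha, beta, symmetric, (min(b3, b4), max(b3, b4))


# Example parameter set (phi = 4/3 is an assumption, see the text above).
alpha, beta, sym, asym = fixpoints(tau=60, gamma=6, phi=4 / 3,
                                   omega=108, eps=18, rho=9)
```

For these values the pair (α, β) satisfies both constraints, the two asymmetric roots come out at about 2 and 6 µmol/l, and the symmetric root lies between them, matching the picture of two outer fixpoints and one middle fixpoint.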
When a suitable set of parameters is selected according to an (α,β) pair from the bistability (+) area and the nullclines for A and B are plotted as shown in Figure 2.4B, the resulting curves intersect three times, representing the fixpoints of the system. The outer points are stable and correspond to the stable ON and OFF states of the flip-flop, whereas the middle fixpoint is unstable. Although two stable states would be sufficient and favourable, the sigmoidal shape of the nullclines does not allow an even number of intersections. The basins of attraction around the outer fixpoints drive the system to the ON state for value pairs of A, B above the separatrix, and to the OFF state otherwise. The separatrix itself represents solutions where A, B are equal, which lead to degraded monostable flip-flops with no defined state.

Figure 2.4: Mathematical analysis of the RNA flip-flop. A: Phase space of the Equations 2.10–2.13. For positive-definite parameters the phase space divides into three areas: the upper area represents parameter combinations for (α,β) that lead to systems with two stable and one unstable positive-definite fixpoints as shown in (B). The middle area represents degraded, monostable systems with a single positive-definite fixpoint, and the lower area defines systems with three fixpoints that are negative-definite. B: The nullclines of the first-order ordinary differential system (see Equations 2.10–2.13) intersect three times for suitable parameter sets, yielding a symmetric bistable system with three fixpoints, the outer ones being stable and the middle one being unstable.

2.7 Dynamic characteristics of the flip-flop model

In order to mimic the behaviour of an electric flip-flop and at the same time stay within plausible biological boundaries, the parameters of the biomolecular flip-flop model had to be balanced and constrained.
On the one hand, the system should be reasonably stable in its ON and OFF states such that minor perturbations result neither in unintended flipping nor in oscillations of the system. On the other hand, the flip-flop has to be sensitive enough to capture a transient trigger pulse and switch properly. Finally, and to minimize interference with other cellular processes in a real-life environment, transitions between the states should be fast.

To find the optimal parameter combination for a representative flip-flop, we first arbitrarily fixed meaningful steady state concentrations for A and B as defined by the (α,β)-space. We further fixed an arbitrary Gaussian trigger pulse p within the same order of magnitude as the mRNA concentrations using Ap = 12, wp = 0.5 (see Equation (2.9)). The remaining degrees of freedom of the flip-flop were then adjusted such that the system flipped.

This flip-flop then served as a template to randomly generate a population of flip-flops with the same fixpoints but otherwise different parameters. These flip-flops were then handed to a genetic algorithm (Optimization Tool 5.1, MATLAB R2010b) and iteratively optimized for the free parameters (under the fixed pulse p). In the course of this process, the fitness of a flip-flop was defined by its flipping time, i.e. the time span from the onset of the pulse p to the symmetric swap of concentrations between A and B. Observed production and degradation rates greater than the slope of the pulse were penalized and so were any oscillations in RNA concentrations. Over several rounds of selection and reproduction in the genetic algorithm, the flip-flop population evolved to a fitness threshold where flipping times differed by less than 10⁻³ simulation time units among all systems.

With the units of the RNA species being [A], [B], . . . , [Bb] = µmol/l and time [t] = min, their value ranges are given in Table 2.1.
The units and values of the optimized parameters adhering to all constraints are listed in Table 2.2. Figures 2.5 and 2.6 show the time course plots of the simulation of the optimized system over two trigger pulses.

Figure 2.5: Simulated behaviour (core RNAs) over two trigger pulses (as represented by the star in Figure 2.8) of the optimized RNA flip-flop (represented by the star in Figure 2.7A and Table 2.2). Dashed lines indicate the fixpoints of the system.

Figure 2.6: Simulated behaviour (all RNAs) over two trigger pulses (as represented by the star in Figure 2.8) of the optimized RNA flip-flop (represented by the star in Figure 2.7A and Table 2.2). Dashed lines indicate the fixpoints of the system.

Variable | Molecule type     | Value (min, max) | Unit
---------|-------------------|------------------|-------
A, B     | mRNA              | (2, 6)           | µmol/l
a, b     | siRNA             | (3, 18)          | µmol/l
x, y     | siRNA             | (0, 1.4)         | µmol/l
Ab, Bb   | mRNA-siRNA hybrid | (0, 14.7)        | µmol/l

Table 2.1: Value ranges of the variables after optimizing the flip-flop. Values rounded to the first decimal place. The corresponding parameter values are given in Table 2.2.

Parameter | Description                            | Value | Unit
----------|----------------------------------------|-------|-------------
τ         | mRNA transcription rate                | 60    | µmol/(l·min)
γ         | unspecific mRNA degradation rate       | 6     | min⁻¹
φ         | RNAi-based mRNA consumption rate       | 4/3   | l/(µmol·min)
µ         | hybridization-based mRNA blocking rate | 58    | l/(µmol·min)
δ         | mRNA-siRNA deblocking rate             | 14    | min⁻¹
ω         | steady siRNA production rate           | 108   | min⁻¹
ε         | unspecific siRNA degradation rate      | 18    | min⁻¹
ρ         | RNAi-based siRNA consumption rate      | 9     | l/(µmol·min)
ξ         | trigger-mediated siRNA production rate | 1.4   | l/(µmol·min)

Table 2.2: Parameter values and units of the optimized flip-flop. Values rounded to the first decimal place.

2.8 Robustness of the flip-flop model

After having shown that a flip-flop with fixed stable states can be optimized for a given trigger pulse, the robustness of the model was analysed by releasing these constraints and restoring the degrees of freedom. First, the importance of the trigger pulse on the flipping time of the system was determined by varying the pulse width and height. Over the range of a 20-fold difference in pulse area and a 5-fold difference in pulse width, the flipping time of the system changed only by about a factor of three, except at the extremes towards the inner boundary (see Figure 2.8).

To further exclude the possibility that the manually set fixpoints of the optimized flip-flop had an impact on the overall system behaviour, the full degree of freedom for the flip-flop was restored and a large set of random flip-flops from the bistability (+) area of the (α,β)-space was generated. For each system, it was determined whether it flipped (for the same pulse as used before) and what time it took to flip. Of the 10,000 systems, 3173 flipped within the simulation time boundary. The other candidates did not flip within that time or did not flip at all.

Parameter | min | max   | Unit
----------|-----|-------|-------------
τ         | 4.2 | 120   | µmol/(l·min)
γ         | 0.3 | 12    | min⁻¹
φ         | 0.1 | 2.7   | l/(µmol·min)
µ         | 2.3 | 116   | l/(µmol·min)
δ         | 0.1 | 28    | min⁻¹
ω         | 5.1 | 215.9 | min⁻¹
ε         | 0.4 | 36    | min⁻¹
ρ         | 0.3 | 18    | l/(µmol·min)
ξ         | 0.3 | 2.8   | l/(µmol·min)

Table 2.3: Minimum and maximum of parameter values observed in the set of randomly generated flip-flops that flipped within the simulation time boundary (tmax = 128 simulation minutes). Values rounded to the first decimal place.

Figure 2.7 shows the colour-coded flipping times of the flipping systems and their localization within the (α,β)-space.
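To make the simulated system concrete, the following Python sketch encodes the right-hand side of Equations (2.1)–(2.8), the Gaussian pulse of Equation (2.9), and a plain fixed-step Runge-Kutta integrator. It is a minimal illustration, not the MATLAB setup used for the reported results. The function names, the step size, and the pulse offset D = 5 min are assumptions, and φ = 4/3 l/(µmol·min) is likewise an assumed value, chosen because it reproduces the steady state A = 6, B = 2, a = 3, b = 18 µmol/l that matches the ranges in Table 2.1.

```python
import math

# Parameter values of the optimized flip-flop; phi = 4/3 is an assumption.
P = dict(tau=60.0, gamma=6.0, phi=4.0 / 3.0, mu=58.0, delta=14.0,
         omega=108.0, eps=18.0, rho=9.0, xi=1.4)

# Pulse-free ON state (A high), ordered as (A, B, a, b, x, y, Ab, Bb).
ON = (6.0, 2.0, 3.0, 18.0, 0.0, 0.0, 0.0, 0.0)


def pulse(t, area=12.0, width=0.5, offset=5.0):
    """Gaussian trigger p_t of Equation (2.9); offset (D) is an assumed value."""
    return area * math.sqrt(width / math.pi) * math.exp(-width * (t - offset) ** 2)


def rhs(t, s, p=P, trigger=pulse):
    """Right-hand side of Equations (2.1)-(2.8) for state s."""
    A, B, a, b, x, y, Ab, Bb = s
    pt = trigger(t)
    block_A = p["mu"] * y * A - p["delta"] * Ab  # net blocking of A by y
    block_B = p["mu"] * x * B - p["delta"] * Bb  # net blocking of B by x
    return (
        p["tau"] - p["gamma"] * A - p["phi"] * a * A - block_A,        # (2.1)
        p["tau"] - p["gamma"] * B - p["phi"] * b * B - block_B,        # (2.2)
        p["omega"] * B - p["eps"] * a - p["rho"] * a * A + p["xi"] * pt * A,  # (2.3)
        p["omega"] * A - p["eps"] * b - p["rho"] * b * B + p["xi"] * pt * B,  # (2.4)
        p["xi"] * pt * A - p["eps"] * x - block_B,                     # (2.5)
        p["xi"] * pt * B - p["eps"] * y - block_A,                     # (2.6)
        block_A,                                                       # (2.7)
        block_B,                                                       # (2.8)
    )


def rk4_step(f, t, s, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = f(t, s)
    k2 = f(t + h / 2, tuple(v + h / 2 * k for v, k in zip(s, k1)))
    k3 = f(t + h / 2, tuple(v + h / 2 * k for v, k in zip(s, k2)))
    k4 = f(t + h, tuple(v + h * k for v, k in zip(s, k3)))
    return tuple(v + h / 6 * (c1 + 2 * c2 + 2 * c3 + c4)
                 for v, c1, c2, c3, c4 in zip(s, k1, k2, k3, k4))


def simulate(s0, t_end, h=1e-4, f=rhs):
    """Integrate from t = 0 to t_end (minutes) and return the final state."""
    t, s = 0.0, tuple(s0)
    for _ in range(int(round(t_end / h))):
        s = rk4_step(f, t, s, h)
        t += h
    return s
```

A call such as `simulate(ON, t_end=10.0)` integrates through one trigger pulse. Whether the state actually swaps for these particular values depends on details that are not fixed here (pulse offset, integration scheme), so the sketch makes no claim about reproducing Figures 2.5 and 2.6.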
2.9 Discussion

In this chapter, we presented a model of a biomolecular flip-flop that is based on the cellular processing machinery of RNA interference and stores 1 bit of information encoded as RNA concentrations. In comparison to existing approaches in biomolecular computing, which encode information in nucleic acid sequences, as a function of molecule length or other direct properties, the presented encoding scheme abstracts further by drawing on principles of epigenetic information processing in the cell.

In both natural and artificial circuits, computation is inherently dynamic and demands substrates that support this plasticity when processing and storing information. In contrast to permanent information, which is primarily sequence-based in a cellular context, metastable information is better represented on epigenetic layers. Messenger RNAs form one of these layers, as they convey information (besides their actual sequence) in the form of concentrations from the genetic blueprint to the final gene product. A key regulatory element to alter RNA-based information is the RNA interference pathway.

Figure 2.7: Robustness of the RNA flip-flop for different parameters. A: Colour-coded flipping times for 10,000 randomly generated flip-flops and a particular trigger pulse (represented by the star in Figure 2.8). Coloured dots represent positive-definite bistable systems that flipped. Grey dots represent positive-definite bistable systems that did not flip within the simulation time boundary (tmax = 128 simulation minutes). Initial flip-flop parameters were generated by randomly sampling each parameter of the optimized flip-flop (represented by the star) from the interval [0, 2vi], where vi is the optimized value of parameter i (see text for details). B: Distribution of flipping times for the subset of randomly generated systems that flipped (3173 total) within the time boundary (see A).
Figure 2.8: Robustness of the RNA flip-flop for different pulse parameters. Flipping times of the optimized RNA flip-flop for trigger pulses of varying Gaussian width (w_p) and area (A_p). The white part of the plot represents (A_p, w_p) combinations where the system did not flip within the simulation time boundary (t_max = 32 simulation minutes). The star indicates the (A_p, w_p) pair used to simulate the systems shown in Figure 2.7.

In this work, we understand the RNAi machinery to be universal, as it can be programmed by small non-coding RNAs to control the abundance of arbitrary (within certain limits) messenger RNAs. We further understand this machinery to lend itself naturally to bridging analog and digital computing in a cellular context, as its different modes of operation can modulate (for example, reinforce) interference signals such that they cause nearly discrete behaviour of targeted messages.

Based on known features of RNAi, we suggested a design for the flip-flop and showed that the model can encode 1 bit of digital information. We further showed that for a wide range of parameter values, for both the flip-flop and the trigger pulse, discrete and fast flipping is possible. A key unresolved question concerns the actual biological parameters for the processes described in Table 2.1. The broad range of parameters tolerated by the flipping solutions (Table 2.3) demonstrates the tolerance and flexibility of the model, and while the kinetics of the RNAi machinery remain largely unresolved, this suggests that such a flip-flop may be biologically possible. Of course, as our understanding of the functioning of the RNAi pathway develops, the equations used herein have to be revisited and critically examined as to whether they capture the observed molecular interactions properly.
Cooperativity between RISC complexes, for instance, could provide the non-linear kinetics directly, compared to the current model, which requires additional molecule species to achieve flip-flop behaviour.

From the biocompatible computing perspective, an implemented flip-flop as described herein could serve as a universal memory unit, as it can be interfaced with a variety of cellular and engineered pathways. Providing standardized interfaces for molecular signals is considered essential for building increasingly complex biomolecular circuits, and individual circuit elements should aspire to be as general as possible in a computational sense (Benenson, 2009a).

For many envisioned applications in the field of biomolecular computing, having more than a single bit of memory would be highly desirable. Multiple RNA flip-flops could in principle be coupled, just as their counterparts in electronic circuits are combined into arrays for storing more information. This coupling can potentially be achieved in several ways.

Against initial hopes, we were not able to link multiple flip-flops sequentially in a reliable way, in which case they would have acted as a counter or frequency divider of an external signal. The key to overcoming this obstacle would be to introduce the asymmetrical production of an siRNA that is produced only when the flip-flop completes a full cycle, i.e. after every other flip, or to clock the system. In the current model, the different RNA populations behaved too symmetrically and coupling signals could not be restored reliably over multiple instances.

The parallel usage of multiple flip-flops, however, is straightforward. They could, for example, be used as memory units to store temporally spread out inputs and feed them to a Boolean logic circuit whose output would then depend in a non-trivial way on the input condition (Rinaudo et al., 2007).
Another option for implementing flip-flop arrays could be to use a single unit in each of many cells and utilize cellular communication channels to store and read dispersed information, similar to how recent work used quorum sensing to link multiple NOR gates, implemented as genetic circuits, into Boolean functions of greater complexity (Tamsir et al., 2011). In such a scenario, the versatility of RNAi might again come into play, as interference signals are known to spread systemically throughout organisms such as C. elegans (Hunter et al., 2006), thereby providing a communication channel between spatially distributed flip-flops that could be used to synchronize or reset states.

As far as an actual implementation of the flip-flop is concerned, we suggest an in vitro environment first, before attempting to assemble the circuit in a cellular context. The availability of cell-free extracts with intact RNAi machinery, from C. elegans for example (Aoki et al., 2007), can provide a suitable test bed to determine proper functioning of individual flip-flop parts and their interplay. Given that RNA structures can be reliably predicted based on their Watson-Crick base-pairing, rational design and/or in vitro selection provide proven techniques to determine and optimize the sequence setup of the flip-flop parts. In the same way, the intended (and unintended) interactions between different RNAs according to the RNAi regulatory machinery can be determined to a much greater extent than is possible for protein-protein or protein-DNA interactions of protein regulators (Subsoontorn et al., 2011).

A functioning implementation of the RNA flip-flop would undoubtedly add to our understanding of the fundamental computational principles of the RNAi machinery in the cellular context of (epi-)genetic regulation.
We also see immediate potential applications for such a device as a tool for biologists: any transient cellular event of interest that is characterized by a (unique) small RNA effector could be linked to and captured by the flip-flop. Connected to a suitable readout mechanism, such as a fluorescent protein that is produced upon successful flipping of the system, the event could be visualized to an external observer.

Chapter 3

Software pipeline for genome-wide chromatin modification profiling and transcriptome analysis

3.1 Synopsis

Chromatin in general and histone modifications in particular are fundamentally involved in epigenetic regulation of cellular processes. Understanding the regulatory mechanisms will not only advance our understanding from a biological standpoint but might also lead to chromatin-based computations in live cells. As one of the steps towards these goals, chromatin modifications have to be profiled genome-wide and assessed regarding their associations with gene activity and each other in order to infer the abstract principles of this form of cellular computation.

For the profiling stages of this process, DNA microarrays play a central role. They represent one of the technologies that have transformed research in biology over the past years from single-gene studies to interrogations of thousands of genes and entire genomes at once. In chromatin biology research, they have been widely used for genome-wide mapping of histone modifications and studying the expression of genes under different conditions.

In this chapter, I present the bioinformatics pipeline I developed for genome-wide analysis of ChIP-on-chip and gene expression data to identify distinct chromatin neighbourhoods, interdependencies and combinatorial patterns of histone modifications and their correlations with gene regulatory processes.
3.2 Background

The pace of advancements in DNA microarray (Stoughton, 2005) and sequencing technology (Shendure and Ji, 2008) continues to transform biological research. Both techniques complement each other and provide powerful platforms to interrogate entire genomes at high resolution and from a variety of perspectives, ranging from sequence decoding, to gene expression profiling, to chromatin modification mapping (Metzker, 2010; Kapranov et al., 2007; Collas, 2010). It is in large part due to these technologies that large amounts of quantitative data are becoming available to scientists and allow questions of unprecedented depth to be asked.

With respect to chromatin, research over the past decade began to map the growing number of histone modifications and to link their occurrence with gene expression and other DNA-templated processes in the cell (Lee et al., 2010). Recent efforts try to identify combinations of modifications as ‘words’ of the proposed histone modification code (see Section 1.3.1). Evidence suggests that it is necessary to identify such words in their spatial and temporal context of the genome, to put them together and thereby continuously reveal the principles of syntax and grammar of the proposed histone crosstalk language (Lee et al., 2010). In order to profile histone modifications and their regulating enzymatic machineries under various experimental conditions, microarray and sequencing technology will continue to play a central role.

In this chapter, I present the bioinformatics pipeline I developed for analyzing DNA microarray data for profiles of gene expression and chromatin modifications. The pipeline uses scripts and programs written primarily in MATLAB, R, Python and Java that allow streamlined processing of microarray data, integration of other tools and data sets, and flexible ways to visualize data and results.
While the following sections focus on technical aspects of the data processing, Chapter 4 presents the results derived through the analysis pipeline in their biological context.

3.3 Wet-lab data generation

3.3.1 Chromatin modification mapping with ChIP-on-chip

Chromatin immunoprecipitation (ChIP) combined with microarrays (ChIP-on-chip) enables genome-wide profiling of protein-DNA interactions (Lieb, 2003). It served as a primary technology on the wet-lab side of the collaborative projects described in Chapter 4 to map various chromatin modifications across the genome of Saccharomyces cerevisiae, also known as budding yeast.

As an extremely versatile tool, ChIP-on-chip allows determining the localization of protein-DNA interactions for almost any protein (or its modifications) of interest in the context of chromatin (Aparicio et al., 2004). As shown in Figure 3.1, the ChIP-on-chip workflow starts by covalently linking the protein of interest (POI), such as a histone with its modifications, to the DNA sequence in vivo and extracting the chromatin complexes from the nucleus. Since a single cell does not yield enough material for a ChIP-on-chip analysis, larger numbers of cells have to be used. In the projects described hereafter, 500 ml of budding yeast cells were grown in a rich medium to an OD600 of 0.8, which represents about 5 · 10⁹ cells. Before chromatin extraction, the cells are exposed for about 20 min to 1% formaldehyde to crosslink all permanent and transient proteins associated with the DNA. While this concentration is not saturating, longer incubations result in large cross-linked agglomerates of (likely unspecific) proteins. The extracted chromatin-protein complexes are then sheared into DNA fragments of about 500 bp in length. An antibody that is mechanically or enzymatically attached to a solid surface or magnetic beads is then used to specifically recognize the POI and thereby extract the bound DNA fragments.
The cross-linking of DNA and POI is finally reversed and, in the experiments analyzed in Chapter 4, the immunoprecipitated DNA fragments are linearly amplified using two rounds of T7 RNA polymerase amplification (van Bakel et al., 2008). Eventually, the obtained DNA fragments are labelled with a biotin tag and transferred to a microarray.

Figure 3.1: Overview of the wet-lab workflow for a ChIP-on-chip experiment to profile chromatin architectural proteins and/or their modifications. (This schematic was accepted for the Wikipedia project of the Wikimedia Foundation.)

DNA microarrays, or chips, are solid carriers with short DNA oligonucleotides, so-called probes, of 25–60 nt in length. Several tens of thousands up to millions of copies of a particular probe represent one feature, and the probe sequence and feature position on the chip are known and catalogued. The most common array platforms used for ChIP-on-chip experiments are commercial tiling arrays from companies such as Affymetrix, NimbleGen and Agilent. In the projects described in this thesis, Affymetrix 1.0R S. cerevisiae GeneChip arrays were used, which comprise ∼3.2 million features covering the complete genome of budding yeast. On these arrays, probes are 25 nt in length and tiled at an average of 5 nt resolution, creating an overlap of approximately 20 nt between genomically adjacent features.

Once obtained, the amplified and labelled DNA samples are hybridized to an array and stained with a fluorescent dye. The amount of hybridized sample at each feature on the array, i.e. the number of bound probes per feature, can then be detected and measured as a function of fluorescence. The emitted fluorescence for a particular feature represents the relative quantity of sample DNA at the probe’s genomic position in the immunoprecipitated material.
The fluorescence pattern of the array is captured as a digital image, which is transformed into numerical intensity values for each feature and, in the case of the Affymetrix platform, stored as a CEL file.

However, the obtained values of an experiment cannot necessarily be fed into the analysis pipeline right away. The steps of the wet-lab workflow, from crosslinking the proteins with the DNA, to fragmenting the chromatin, to immunoprecipitating, amplifying and hybridizing the samples to the chip, as well as saturation effects of the fluorescent dye, can introduce biases due to DNA sequence- and genomic region-dependent characteristics (called probe effects) and/or biochemical and biophysical aspects (Kimmel and Oliver, 2006). Therefore, it is crucial for any ChIP-on-chip experiment to identify and correct for these biases. Typically, this is done by measuring a reference signal and dividing the signal intensities obtained from the immunoprecipitated DNA samples by the reference signal intensities (Kimmel and Oliver, 2006).

To generate a reference signal, either genomic DNA (input) or a mock immunoprecipitation (mock IP) is hybridized onto a second array when working with the Affymetrix platform. An input reference is obtained by setting aside a fraction of the DNA samples before the rest of the material is immunoprecipitated and treated further as described above. A mock IP is performed using an unspecific antibody or no antibody at all to enrich unspecifically bound DNA fragments before hybridizing them (Kimmel and Oliver, 2006).

Furthermore, to validate the observed effects and reduce variations based on fluorescence intensities and other potential noise sources, a ChIP-on-chip experiment is repeated to generate replicates, typically two to three.

Figure 3.2: Overview of the dry-lab workflow for a ChIP-on-chip experiment to profile chromatin architectural proteins and/or their modifications.
(This schematic was accepted for the Wikipedia project of the Wikimedia Foundation.)

At the end of the wet-lab portion of the workflow, a CEL file containing the raw data is generated for each sample and control. For the work presented herein, typically two replicates were generated per experiment. If their Spearman rank correlation was greater than ρ = 0.9, one representative dataset was used for all subsequent analysis. For data with lower correlation, quantile normalization across replicates was used to generate a high-confidence dataset. In either case, the dataset was computationally processed and mapped (given the catalogued array layout) in order to determine at which genomic sites the POI or chromatin modification was present (Figure 3.2). Section 3.5 describes the important steps of this analysis further.

3.3.2 Transcriptome mapping with tiling arrays

As mentioned above, the application range for DNA microarrays is broad and constantly increasing. Besides chromatin modification mapping and other analyses, DNA microarrays are versatile tools to monitor and compare gene expression levels and profile the transcriptome (Kapranov et al., 2003; Bertone et al., 2006). In this type of application, which is often used to reconstruct cellular pathways (Young, 2000) and, as described below, to identify genomic coordinates of transcription events, gene expression levels are analyzed for a sample and a reference condition.

Observed changes in gene activity between a sample and reference profile allow grouping of genes based on their response, assuming that genes with a common regulatory pattern appear in the same pathway or share a common function (Young, 2000). As long as some genes of a group have already been linked to a certain pathway or function, newly identified members can be associated with those, too.
On tiling arrays, the high density of probes allows determining where in the genome transcription occurs under various conditions and, hence, facilitates the identification of novel functional elements or annotation of unknown genomes (Kapranov et al., 2003; David et al., 2006).

Due to their resolution power, tiling arrays, which contain overlapping probes, allow a much more accurate view of a cell’s transcriptome than earlier methods, which contained fewer probes and were limited in representing the accurate structure of genetic elements (Bertone et al., 2005; Liu, 2007). Some of the first studies with tiling arrays revealed that a larger portion of the genome seemed to be transcribed than covered by known protein-coding regions (Cawley et al., 2004; Kapranov et al., 2002). A mere 1–2 % of the mammalian genome sequence encodes proteins while 70–90 % of the remaining 98 % is transcribed (Kapranov et al., 2007; Mercer et al., 2009). Formerly, this large portion of the genome was considered to be ‘junk’ DNA, but it now emerges as a source of non-coding RNAs (see Section 1.3.2) and of antisense, promoter-associated, intergenic and cryptic transcripts (Claverie, 2005; Katayama et al., 2005; Kapranov et al., 2007; He et al., 2008; Guttman et al., 2009; Khalil et al., 2009; Mercer et al., 2009).

In Section 4.5, I present an analysis of tiling array data that helped to elucidate an intriguing example of the interplay between cryptic transcription and chromatin structure. In that work, samples were prepared for array hybridization by synthesizing cDNA from total RNA isolated from S. cerevisiae. The purified cDNA was fragmented, labelled and hybridized to custom Affymetrix tiling arrays, which contain both strands of the S. cerevisiae genome tiled at 8 bp resolution (David et al., 2006). Analogous to ChIP-on-chip experiments on the Affymetrix platform, numerical fluorescence intensities obtained from tiling arrays are stored in CEL files.
3.4 Raw data processing

Once raw data from the sample and control microarray experiments are available as CEL files, they have to be processed computationally in several steps. As described above, intensity values from different batches of experiments are not directly comparable because of differences in quality and quantity of the labelled sample as well as differences in reagents, stain and array handling. In order to reduce these biases and to make the probe measurements more comparable, the data have to be normalized before they can be used to determine enriched regions.

Several algorithms have been developed for normalizing gene expression microarrays (Choe et al., 2005; Grewal et al., 2007). Since these arrays usually have a low probe density, the algorithms cannot be easily transferred to tiling arrays used for ChIP studies. The high density of probes covering large contiguous regions of the genome and the overlapping probe layout require different bioinformatics approaches (Bolstad et al., 2003). Several tools and methods have been developed to analyze ChIP-on-chip data from tiling arrays. Widely used are MAT (Johnson et al., 2006; Droit et al., 2010), Ringo (Toedling et al., 2007), TileProbe (Judy and Ji, 2009), Tilescope (Zhang et al., 2007), TiMAT¹, Affymetrix Tiling Array Software², Splitter³ and MA2C (Song et al., 2007).

A study by several labs comparing these algorithms (Johnson et al., 2008) found that some algorithms are more suitable than others when applied to data from certain array platforms. For Affymetrix tiling arrays, as used throughout the work described in this thesis, MAT performed best. The most recent version of the algorithm, rMAT (Droit et al., 2010), was used for all ChIP-on-chip data preprocessing presented hereafter.
¹ http://sourceforge.net/projects/timat2
² http://www.affymetrix.com/partners_programs/programs/developer/tools/affytools.affx
³ http://zlab.bu.edu/splitter

3.4.1 rMAT

As the most recent implementation, rMAT uses an adapted version of the Model-based Analysis of Tiling-arrays (MAT) algorithm to reliably detect enriched regions of the genome based on data from a ChIP-on-chip experiment (Johnson et al., 2006; Droit et al., 2010).

Firstly, rMAT tries to eliminate probe effect biases using a Bayesian approach. To this end, the intensity expected for each probe is first estimated based on its sequence composition and then subtracted from the measured intensity. These effects are corrected for because probes with a high content of G or C bases show stronger binding affinity to the chromatin-immunoprecipitated sample DNA, i.e. they tend to have higher intensity values just because of their sequence. This can be explained by the fact that G-C base-pairing between probe and sample is stronger than A-T base-pairing, because the latter is based on only two hydrogen bonds whereas the former is based on three.

Secondly, rMAT tries to address biases based on the copy number of each probe. Given that some genomic regions are repetitive, sample DNA from such a region may hybridize to any of multiple probes on the array. Therefore, rMAT attempts to correct measured probe intensities by taking the number of existing probe copies into account.

After these normalization steps, all probe intensities are smoothed using a sliding window approach, and an enrichment score is calculated for each probe that reflects the occurrence of the protein or modification of interest. Finally, rMAT uses a scoring function that calls enriched and coherent regions based on a score cutoff, a p-value cutoff or a false discovery rate (FDR) cutoff.

3.4.2 tilingArray

Biases arising in transcriptome analysis with tiling arrays are similar to those mentioned above for ChIP-on-chip tiling arrays.
Therefore, the intensity values of measured gene expression have to be normalized first. The normalization aims to remove variations in the data of non-biological origin due to probe effects and the like, such that the remaining variation in the data can be attributed as far as possible to truly biological effects. Of the normalization approaches proposed for tiling arrays, complete data methods, which consider the entire data of several replicates, are commonly used, and quantile normalization in particular performs best (Bolstad et al., 2003). To account for probe effects, tilingArray further uses reference intensities of genomic DNA samples to adjust the probe signal intensities and employs a probe-specific background correction (Huber et al., 2006).

The next step of transcriptome analysis is to obtain an unbiased map of position, abundance and architecture of transcripts by calculating the genomic coordinates of transcript start and stop sites on both strands of the DNA. To identify transcripts, some algorithms used sliding window methods together with threshold criteria (Bertone et al., 2004; Kampa et al., 2004; Royce et al., 2005; Schadt et al., 2004). Discrete-state hidden Markov models (HMMs) were also employed to detect transcript boundaries (Tjaden et al., 2002). The tilingArray package increases the prediction power of these approaches by treating transcript abundance not as a discrete but as a continuous metric and modelling the hidden state of the HMM as a real-valued variable (Huber et al., 2006). The hybridization profiles are then partitioned into segments of constant intensity values, which are separated by change-points indicating the transcript boundaries. To detect the change-points in hybridization intensities, tilingArray determines the global maximum of the log-likelihood of a piece-wise constant model by dynamic programming (Gentleman et al., 2004; Picard et al., 2005).
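The change-point search can be illustrated with a small Python sketch. It is not the tilingArray implementation. It assumes Gaussian noise with a common variance, in which case maximizing the log-likelihood of a piece-wise constant model is equivalent to minimizing the residual sum of squares, and it assumes that the number of segments k is given. The function name segment_profile is hypothetical.

```python
def segment_profile(y, k):
    """Split y into k contiguous segments of constant level.

    Returns (change_points, rss): the start indices of segments 2..k and the
    minimal residual sum of squares. Runs in O(k * n^2) time.
    """
    n = len(y)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(y)")

    # Prefix sums give the cost of any segment in constant time.
    s1, s2 = [0.0], [0.0]
    for v in y:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def cost(i, j):
        """Residual sum of squares of y[i:j] around its mean (requires j > i)."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[c][j]: minimal cost of splitting y[:j] into c segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                cand = best[c - 1][i] + cost(i, j)
                if cand < best[c][j]:
                    best[c][j], back[c][j] = cand, i

    # Trace back the segment start indices from the last segment to the first.
    starts, j = [], n
    for c in range(k, 0, -1):
        j = back[c][j]
        starts.append(j)
    starts.reverse()
    return starts[1:], best[k][n]


# Toy profile with three intensity levels; change_points == [3, 6].
change_points, rss = segment_profile([1, 1, 1, 5, 5, 5, 2, 2], 3)
```

In practice the number of segments is not known in advance and has to be chosen, for example with a penalized likelihood criterion; that choice is outside the scope of this sketch.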
Once processed by tilingArray, the gene expression data provided an unbiased genome-wide view of the transcriptome, and the subsequent analysis steps described below built on the results to correlate transcription with chromatin modifications.

3.5 Data analysis and visualization

3.5.1 Local analysis pipeline

As depicted in Figure 3.3, the software ecosystem used for analyzing DNA microarray data comprised several tools that had to be integrated and interfaced with external data sources to allow streamlined processing. Locally running applications and services were hosted on a 64-bit Ubuntu Linux Server 9.10 running on a dual quad-core Intel Xeon machine with 16 GB of main memory. The normalization and preprocessing of the array data was done in R 2.9 with the rMAT and tilingArray packages as provided by Bioconductor (Gentleman et al., 2004). Since neither the packages nor R natively support parallel execution of code, the Sun Grid Engine 6.1 was used as a job management system to pool and batch process multiple data sets concurrently and thereby utilize the available computational resources more efficiently. Once the data were preprocessed by calling functions of these packages, scripts I developed in R were used to visualize and plot them together with other genomic features (see below for details). Finally, the data sets were exported from R as BED and GFF files.

BED files are widely adopted and describe genomic features⁴. The format specifies three required and nine optional fields that are saved in a (tab-delimited) text file. As shown below, a BED file was generated when handling ChIP-on-chip data that stored the chromosome as well as the start and end coordinates of genomic regions that were called enriched for the chromatin modification of interest by rMAT.

Similar to BED but more flexible, GFF is an exchange format for describing genes and other features of DNA, RNA and protein sequences⁵.
It uses nine tab-delimited fields to store and exchange genomic data. Version 3 of the GFF specification was used to export genome-wide probe-level ChIP and mRNA expression data as follows.

⁴ For BED format definition, see http://genome.ucsc.edu/FAQ/FAQformat.html
⁵ For GFF format definition, see http://www.sequenceontology.org/gff3.html

Depending on the type of analysis, the generated files were then directly fed into analysis in MATLAB and/or deposited into a local MySQL 5.1 database, mostly via Java code in Talend Open Studio 3. As an open-source tool, Talend Open Studio provides a code generation engine and a graphical integration designer to flexibly combine, convert and update different (local and remote) data sets, connect to other data sources and output a variety of formats (http://www.talend.com). Several hundred processing functions and formats are predefined and can be assembled into a data integration configuration, which can be run directly in Open Studio or standalone using a Java or Perl engine. Batch processing, synchronization of databases, migration and complex transformation of data are all greatly facilitated through this tool.

At the core of the genome-wide DNA microarray analysis, I employed MATLAB (R2008a–R2010b) and its toolboxes, primarily because of the conveniences the language offers for handling multi-dimensional arrays, parallel execution of code, connectivity with external services (e.g. databases via SQL or web services via HTTP wrappers), debugging functions of the development environment and interactive options for plotting. In particular, MATLAB scripts were used to perform analyses and test biological hypotheses as described in the following sections.

Enrichment of genomic features for chromatin modifications. As explained in Section 1.3.1, chromatin modifications such as histone marks have been associated with gene activity and other DNA-templated processes in the cell.
Understanding the distribution of modifications and their localization with respect to functional genetic elements will provide insights into the underlying epigenetic regulation mechanisms. To determine enrichment of histone marks and chromatin modifications for certain genomic features, I developed a script that uses genomic coordinates saved in BED files representing enriched regions, and genomic coordinates of genetic features of interest, e.g. all open reading frames (ORFs) of genes in the S. cerevisiae genome. Iterating over all chromosomes, the script matches both sets of coordinates and determines the relative enrichment per feature. According to a user-defined threshold, enriched features, e.g. gene names, are saved.

Figure 3.3: Software ecosystem of the bioinformatics pipeline. See text for details on the individual components.

Subsequently, multiple lists of enriched features can be compared by set operations, such as union, intersection and complement, to determine features with co-occurring and exclusive marks. Such analysis helped to reveal the mutually exclusive distribution of H3K79me2 and H3K79me3 in the genome (see Section 4.3 for details).

Similarly, I developed scripts that determine enrichment of other genomic features, such as promoters, centromeres, telomeres, tRNAs, rRNAs, autonomously replicating sequences and silent mating-type loci. The Saccharomyces Genome Database (Cherry et al., 1998) served as a primary source of information on such features, and data were downloaded and stored in tab-delimited files and/or the local MySQL database. A list of all transcription start and end sites for all transcripts in S. cerevisiae was kindly provided by Harm van Bakel and is based on data published in Lee et al., 2007.

To determine the statistical significance of set cardinalities, e.g. of groups of genes enriched for particular histone marks, Fisher’s exact test (Fisher, 1922) was applied.
Similar to the χ²-test, Fisher’s test analyzes contingency tables to calculate the significance of associations between objects and categories. It is more accurate than the χ²-test, especially for small group cardinalities (Agresti, 1992). The binomial test was used as an exact test statistic when objects were equally likely to belong to either one of two categories, e.g. enriched or not enriched. In cases where expected distributions were unknown, Monte Carlo approaches were employed to generate reference values. For an example based on the latter approach, testing significant enrichment of autonomously replicating sequences with H3K79-methylation marks, refer to Section 4.3.6.

Since not all chromatin modifications are necessarily found across the entire length of genomic features but are localized to distinct segments, partitioning of features is essential to get a more accurate view of their enrichment for particular marks (Liu et al., 2005). Also, using probe-level enrichment scores from GFF data increases the accuracy over BED-file information. I developed an algorithm that partitions genes into five segments representing their promoter, transcription start site, 5′-coding sequence, mid-coding sequence and 3′-coding sequence, and determines their individual enrichment for histone marks. Overlapping of segments was avoided by applying length constraints. Section 4.4.2 presents results based on this approach.

Genome-wide visualization of chromatin modifications. While raw numbers may contain all relevant information of an analysis, they are not always suitable to convey biological meaning efficiently. Graphic visualizations of results often aid in representing abstract content and reveal information at a glance (Gehlenborg et al., 2010). Yet, satisfying this claim is often particularly challenging for genome-wide data.
For mapping chromatin modifications with ChIP-on-chip, a major goal is to assess the localization and distribution of the modifications across the genome.

Genome-wide visualization approaches for ChIP-on-chip and related data can be distinguished according to the level of detail they resolve. At single-gene resolution, tools like the UCSC Genome Browser (Fujita et al., 2011) offer the most detailed level of data presentation, typically spread out over several scrollable maps. At this resolution level, I developed visualization scripts in R that generate genome-wide maps of DNA microarray data. Similar to the Genome Browser, they allow visualization of additional features, such as nucleosome positions, introns, telomere elements, autonomously replicating sequences and silent mating-type loci, but extend its flexibility through layered plotting of multiple profiles and features, which, for example, facilitates a qualitative assessment of the colocalization of chromatin modifications relative to genomic elements. Plots of this kind are found in Section 4.3.1, for instance, and helped formulate initial hypotheses on the influence of chromatin structure at sites of cryptic transcription (see Section 4.5).

While offering fine-grained information, single-gene maps make it difficult to derive broader principles when evaluating enrichment profiles of chromatin modifications across all genes, since the information is too dispersed. Averaging approaches provide a widely used alternative: genes (or other genomic features) of a similar kind are first grouped into classes based on common characteristics, and the average enrichment profile is then calculated for each class. Following approaches used in previous studies (Pokholok et al., 2005; Mayer et al., 2010), I developed algorithms to generate average gene profiles based on the degree of enrichment, gene length or transcriptional frequency. Section 4.4.3 and others present results of this analysis.
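The averaging approach described above can be sketched in a few lines. The sketch assumes that each gene profile has already been resampled to the same number of positions; the gene names, class labels and the function name `average_profiles` are hypothetical and only serve as an illustration.

```python
from collections import defaultdict
from typing import Dict, List


def average_profiles(profiles: Dict[str, List[float]],
                     gene_class: Dict[str, str]) -> Dict[str, List[float]]:
    """Group gene profiles by class and average them position by position."""
    grouped: Dict[str, List[List[float]]] = defaultdict(list)
    for gene, profile in profiles.items():
        grouped[gene_class[gene]].append(profile)

    averaged: Dict[str, List[float]] = {}
    for label, members in grouped.items():
        n_positions = len(members[0])
        averaged[label] = [
            sum(member[i] for member in members) / len(members)
            for i in range(n_positions)
        ]
    return averaged


# Hypothetical enrichment profiles (3 positions per gene) and classes,
# e.g. classes of transcriptional frequency.
profiles = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [3.0, 2.0, 1.0],
    "geneC": [0.0, 0.0, 6.0],
}
gene_class = {"geneA": "high", "geneB": "high", "geneC": "low"}

averaged = average_profiles(profiles, gene_class)
print(averaged)
```

In this toy example the two genes in the "high" class have opposite trends, yet their average is flat, which illustrates how a class average can hide the individual profiles it summarizes.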
Average profiles can provide a good overview of the distribution of chromatin marks and their association with certain genomic features, but may also be biased towards the binding profile of the most strongly bound genes without revealing how many exceptions there are. Therefore, dividing features into segments, and thereby normalizing for gene length, as done in previous studies (Venters and Pugh, 2009; Badis et al., 2008), results in more accurate representations of enrichment profiles. However, these representations have so far typically visualized only certain feature segments individually, e.g. only the 5′-coding sequence of all genes, and thereby inevitably fragment the information that is necessary for a global assessment at a glance.

I improved on this approach and developed a visualization tool that allows an unbiased view of chromatin modification profiles at a genome-wide scale by sorting features by length and calculating mean enrichment scores for bins of absolute size. The tool thereby balances the detail and compactness of existing visualization approaches. For details on the tool and how it was used in the work described below, refer to Section 3.5.2.

Colocalization and patterns of chromatin modifications. Chromatin modifications can colocalize on the same or neighbouring nucleosomes and may depend on each other through pathways collectively referred to as histone crosstalk (see Section 1.3.1 for details). A major goal when analyzing chromatin modifications is to find such patterns of co-occurring modifications, sometimes called ‘words’ of the hypothesized code (Lee et al., 2010), and to elucidate their dependencies.

To interrogate the global relationship of histone modifications on a probe level, Spearman’s rank correlation coefficients are widely used (Spearman, 1987). This statistic is a non-parametric measure of the strength and direction of a monotonic association between two data sets.
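As a minimal sketch of how such a coefficient is obtained, the snippet below ranks both data sets (ties receive the average of their ranks) and then computes the Pearson correlation of the ranks. The probe values are invented, and the helper names `average_ranks`, `pearson` and `spearman` are illustrative rather than part of the pipeline described here.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical probe-level enrichment scores from two replicates.
replicate_1 = [0.1, 0.5, 2.0, 8.0, 100.0]  # note the extreme last value
replicate_2 = [0.2, 0.4, 1.5, 9.0, 12.0]

rho = spearman(replicate_1, replicate_2)
print(rho)
```

Because only the ordering of the values enters the calculation, the extreme value in the first replicate does not change the result.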
As a correlation metric, Spearman’s coefficient is based on ranked values rather than raw expression or enrichment levels, which makes it less sensitive to extreme values in the data. In the projects described below, Spearman correlation coefficients were calculated to assess the reproducibility of replicated microarray experiments and to find correlations between different chromatin modifications.

On the level of genomic features, potential links between chromatin modifications can be revealed by comparing sets of, for example, feature names or coordinates associated with a modification. Non-empty intersections between such sets indicate co-occurrences of marks, whereas empty intersections indicate exclusive enrichment. Derived results can be checked for their statistical significance using an appropriate test, such as Fisher’s exact test as described above.

As explained in Section 4.4.2, chromatin modifications do not necessarily occur across the entire length of genomic features, and, hence, colocalization of modifications must be considered on a feature-segment level, too. For the comprehensive analysis of chromatin modification patterns (chromatin signatures) across transcripts in Section 4.4.2, all transcripts in the S. cerevisiae genome were first partitioned into five segments and their relative enrichment was calculated based on probe-level GFF data. Secondly, similarly enriched segments were grouped automatically.

Clustering provides powerful methods for this purpose and allows organizing data into a smaller number of relatively homogeneous groups. Clustering methods can be divided into supervised and unsupervised approaches: the former assign some predefined order to the data, while the latter require no prior assumptions on how the data are grouped.
For microarray data, hierarchical clustering, k-means, k-nearest neighbour, principal-component analysis and self-organizing maps are commonly applied clustering methods, each with their own strengths and weaknesses in certain analyses (Nugent and Meila, 2010).

For the work presented in Section 4.4.2, generally no prior information on the number of clusters was available. Hence, supervised approaches were disfavoured. Hierarchical clustering based on the Euclidean distance between entities, with unweighted average distance (UPGMA) as the linkage criterion, yielded the most plausible grouping of modification profiles from a biological perspective.

To get a more quantitative measure of occurring chromatin signatures, an enrichment threshold can be applied to transform the patterns into binary combinations. For the five segments discussed in Section 4.4.2, the resulting 2^5 = 32 combinations can then be characterized further with respect to certain Gene Ontology terms, transcription factor binding lists and the like. To assess the statistical significance of any overlaps between sets, Fisher’s exact test with a cut-off of p = 0.0001 was applied.

Associations of cellular functions with chromatin modifications. Besides the genome-wide mapping of chromatin modifications, a major goal is to assess the association of marked genomic features, typically genes, with cellular functions. In the work presented below, genes as well as gene segments were analyzed for associations with transcriptional frequencies, transcription factors, Gene Ontology (GO) terms (Ashburner et al., 2000) and specific cellular processes of particular interest. Section 4.3, for instance, presents results that functionally link histone H3 methylation marks and cell cycle-regulated genes.

For microarray data, exploratory data mining based on GO annotations has proven effective in revealing the cellular functions and pathways in which genes of interest are involved (Werner, 2008).
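The thresholding step that turns five-segment enrichment patterns into binary signatures, as described above, can be sketched as follows. The scores, the threshold of 1.5 and the use of "greater than or equal" as the cut-off rule are assumptions made for illustration; the names `binary_signature` and `count_signatures` are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict, Sequence, Tuple

# Segment order assumed for this sketch.
SEGMENTS = ("promoter", "TSS", "5' CDS", "mid CDS", "3' CDS")

# All possible binary signatures for five segments: 2**5 = 32.
ALL_SIGNATURES = list(product((0, 1), repeat=len(SEGMENTS)))


def binary_signature(scores: Sequence[float],
                     threshold: float) -> Tuple[int, ...]:
    """Map per-segment enrichment scores to 1 (enriched) or 0 (not enriched)."""
    return tuple(int(score >= threshold) for score in scores)


def count_signatures(segment_scores: Dict[str, Sequence[float]],
                     threshold: float) -> Counter:
    """Count how many transcripts carry each binary signature."""
    return Counter(
        binary_signature(scores, threshold)
        for scores in segment_scores.values()
    )


# Hypothetical per-segment enrichment scores for three transcripts.
segment_scores = {
    "geneA": [0.2, 1.8, 2.1, 0.3, 0.1],
    "geneB": [0.1, 1.6, 1.9, 0.4, 0.2],
    "geneC": [0.0, 0.1, 0.2, 1.7, 2.2],
}

counts = count_signatures(segment_scores, threshold=1.5)
unobserved = [s for s in ALL_SIGNATURES if s not in counts]
print(counts, len(unobserved))
```

The genes sharing a signature form the sets that can subsequently be compared with, for example, Gene Ontology annotations or transcription factor target lists.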
As the most comprehensive effort to introduce consistency in naming biological events and properties, the Gene Ontology project provides a set of structured vocabularies and controlled terms for annotating genes, gene products and sequences (Ashburner et al., 2000). Represented as a hierarchical graph structure, GO terms allow deriving general as well as specific characteristics of genes depending on their location in the graph. For S. cerevisiae, the Saccharomyces Genome Database has used the Gene Ontology to annotate the gene products of the budding yeast. From MATLAB, the ontologies can be accessed either remotely or locally after downloading. They can be queried regarding molecular function to assign activities to gene products, regarding biological process to put those gene functions in their biological context, and regarding cellular component to determine the subcellular compartments in which the gene product is found.

Relationships between individual chromatin modifications, or combinations of them, and transcriptional regulation were further investigated by testing whether certain sets of genes or signatures correlated with transcriptional frequencies and/or binding of transcription factors. To this end, each gene was assigned to its corresponding transcriptional frequency class (Holstege et al., 1998) and/or compared to publicly available lists of genes bound by specific transcription factors (MacIsaac et al., 2006). Examples of this analysis are found in Section 4.3.

Associations of transcription events with chromatin modifications. Building on the analysis of chromatin modifications in the context of transcription, gene expression data were used to further investigate the associations between chromatin structures and transcriptional events.

For the work described in Section 4.5, gene expression experiments were performed to measure the complete transcriptome in different S. cerevisiae strains.
In the analysis of the resulting data sets, features of the tilingArray package (Huber et al., 2006) described above were used together with custom-written code, first to calculate the coordinates of transcription segments and visualize them on a genome-wide scale, and then to prepare the data for subsequent analysis. Emphasis was put on the identification of novel and ‘unusual’ transcripts in certain mutant strains. Combining database operations with MATLAB functionality made it possible to determine, on a genome-wide scale, the occurrences of cryptic transcription initiation events, characterized by intragenic transcript boundaries predicted by tilingArray, and to integrate ChIP-on-chip data so that the observations could be assessed together with the local chromatin structure. These analyses helped formulate several hypotheses regarding a potential relocalization of the histone variant H2A.Z in certain mutant strains and the restructuring of intragenic chromatin so that it resembles promoter characteristics. They may help discern dependencies and causalities of chromatin modifications in the context of transcriptional regulation in the cell.

3.5.2 Web-based analysis and collaboration

While the local bioinformatics pipeline performed well for all aspects of the analysis in the projects described in Chapter 4, ultimately it remains a heterogeneous software ecosystem that requires acquaintance with several tools, services, languages, file formats and so forth. These aspects can make it labour-intensive to reconfigure pipeline components, integrate new data sources, or update or migrate the setup to a new environment (Davidson et al., 1995). To further facilitate synthesizing new scientific knowledge and relationships within large amounts of data, solutions should strive for homogeneity, portability and flexibility (Ritter et al., 1994; Markowitz and Ritter, 1995; Baker et al., 1998).
Hence, one goal of my work was to identify and modularize components of the pipeline and facilitate their reuse as plugins for existing web-based bioinformatics environments.

Thanks to the pervasive popularity of the web browser as a user interface, the ubiquitous availability of network access and the separation between logic and representation, web-based software solutions have become increasingly popular in bioinformatics research as an effective way to access data sources and software tools. They can be broadly classified into data-centred and analysis-centred systems (Stockinger et al., 2008), although boundaries are not always clear-cut and hybrid solutions may be advantageous in the long term given the increasing adoption of cloud-based computing (Schadt et al., 2010).

Over the last decade, there have been tremendous efforts to increase the user-friendliness of such environments and to facilitate interactions with algorithms and data repositories, which often allow complex parametrization of their input and output data structures. Some of the environments integrating various data sources and/or analysis tools that found a broader community include, but are not limited to, Biology Workbench (Subramaniam, 1998), ISYS (Siepel et al., 2001), GenePattern (Reich et al., 2006), BioExtract Server (Lushbough et al., 2008) and Galaxy (Goecks et al., 2010; Taylor et al., 2007).

As one of the most popular, versatile and cloud-ready platforms, the Galaxy framework integrates a variety of tools that enable experimentalists without programming expertise to easily retrieve and query local and remote data, perform genome-scale analyses and visualize the results in a modularized way.

Galaxy. Galaxy is an open-source, extensible, web-based bioinformatics environment that offers various features and tools for the analysis of genome-scale data under a unified interface.
It is readily available online (http://galaxy.psu.edu) and downloadable for local installations and customizations. The standard setup already includes over one hundred computational tools and converters for many tasks and data formats in bioinformatics research. The philosophy of Galaxy thus aims beyond the data-centred approaches of projects like BioMart (Smedley et al., 2009) or the UCSC Genome Browser (Fujita et al., 2011). Furthermore, Galaxy allows sharing of analysis steps through history functions and collaboration of several people on entire analysis workflows, including the data. This increases the transparency of data analysis and facilitates comparisons as well as the general reproducibility and traceability of how results were derived.

From the local software pipeline for microarray data described above, one visualization approach appealed particularly to experimentalists, as it offers great versatility to represent genome-wide chromatin modifications in an accurate yet compact manner. In the following section, I use this component to exemplify how parts of a local analysis pipeline can be easily converted for use in Galaxy.

CHROMATRA. The previous sections outline how the rise of DNA microarrays contributed to the transformation of research in biology from a purely lab-based science towards an information science. As a positive result of this transition, the amount of data now available to biologists allows questions of unprecedented depth to be addressed. On the other hand, however, the increasingly large volume and complexity of data sets demand adequate computational methods to analyze and also to present the data more effectively.

With respect to the latter aspect of data handling, it remains a challenge to visualize large-scale data effectively such that their biological meaning can be grasped easily.
As described above, existing tools, like the UCSC Genome Browser (Fujita et al., 2011), offer the most detailed level of data presentation but may make it challenging to derive broader principles in the data since the information is too dispersed. Averaging approaches circumvent this problem by condensing the information. They, however, inevitably sacrifice potentially important details.

Balancing both detail and compactness of existing visualization approaches, I developed CHROMATRA (CHRomatin mOdification Mapping Across TRAnscripts), a visualization tool for genome-wide DNA-protein binding and chromatin modification maps. It portrays binding events or chromatin modification profiles across all transcripts, or other genomic features that can be aligned, in a comprehensive yet condensed form by accounting for their length and additional characteristics.

CHROMATRA is a set of Python scripts bundled as a plug-in for the Galaxy bioinformatics platform (Goecks et al., 2010; Taylor et al., 2007). Integration of the tool into Galaxy is achieved via parameter and user interface descriptions provided in an XML file. Since CHROMATRA is written in Python, it integrates easily with Galaxy, which is itself primarily implemented in Python, and has no dependencies on any (external) libraries not already used by Galaxy.

CHROMATRA consists of two visualization modules, which can be integrated into the analysis workflow for ChIP-on-chip or ChIP-Seq experiments. As explained in previous sections, the raw data of ChIP experiments are normalized, and enrichment scores are calculated and stored in standardized formats such as BED, GFF and WIG for subsequent analysis. After these steps, CHROMATRA can be tied in with the workflow and used to visualize ChIP profiles by reading GFF files that contain enrichment scores for the entire genome (Figure 3.3).
The first module, CHROMATRA-L, accounts for differences in the length of genomic features, such as genes, and eliminates potential biases in assessing ChIP profiles that arise when using non-absolute length scales. It does so by sorting all features by length and calculating mean enrichment scores for bins of absolute size according to user-specified parameters. To this end, the module requires feature coordinates, e.g. position information of all transcripts or open reading frames of an organism. These coordinates can be uploaded by the user or readily derived from other Galaxy modules in a tab-delimited format. The resulting enrichment profiles are colour-coded according to user-defined settings and visualized in a heatmap-like plot, which is available for download in different image formats, such as PNG, PDF or SVG.

The second module, CHROMATRA-T, extends the first one by allowing an additional metric, such as the transcriptional frequency of genes, to be accounted for when visualizing feature enrichment profiles. The values for the additional characteristic can be specified in the tab-delimited input file that also holds the gene coordinates. CHROMATRA-T first partitions the set of features according to user-defined intervals or clusters for the second metric and then sorts each group by length. Enrichment values are subsequently calculated and displayed as described above. CHROMATRA-T hence allows assessing ChIP enrichment profiles at a glance according to various feature characteristics and avoids length-dependent biases.

Section 4.4 describes results visualized with the CHROMATRA approach that allowed direct comparison of genome-wide ChIP-on-chip data addressing the localization and dependencies of chromatin modifications in the context of transcription. Data derived through related enrichment techniques, such as MeDIP-seq (Weber et al., 2005; Jacinto et al., 2008) or MBD-seq (Serre et al., 2010), can also be handled by CHROMATRA.
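The core idea of CHROMATRA-L, sorting features by length and averaging enrichment scores in bins of absolute size, can be illustrated with a small sketch. This is not the CHROMATRA source code: the function names `binned_profile` and `length_sorted_matrix`, the single-chromosome input, the neglect of strand orientation and the use of `None` for padding are simplifying assumptions, and the rendering of the heatmap itself is omitted.

```python
from typing import Dict, List, Optional, Tuple

Probe = Tuple[int, float]    # (genomic position, enrichment score)
Feature = Tuple[int, int]    # half-open interval [start, end)
Row = Tuple[str, List[Optional[float]]]


def binned_profile(start: int, end: int, probes: List[Probe],
                   bin_size: int) -> List[Optional[float]]:
    """Mean score per bin of `bin_size` bp, measured from the feature start."""
    n_bins = -(-(end - start) // bin_size)  # ceiling division
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for position, score in probes:
        if start <= position < end:
            index = (position - start) // bin_size
            sums[index] += score
            counts[index] += 1
    # Bins without probes stay empty (None) instead of being set to zero.
    return [s / c if c else None for s, c in zip(sums, counts)]


def length_sorted_matrix(features: Dict[str, Feature], probes: List[Probe],
                         bin_size: int) -> List[Row]:
    """One row per feature, shortest first, padded to a common width."""
    ordered = sorted(features.items(),
                     key=lambda item: item[1][1] - item[1][0])
    rows = [(name, binned_profile(start, end, probes, bin_size))
            for name, (start, end) in ordered]
    width = max(len(profile) for _, profile in rows)
    return [(name, profile + [None] * (width - len(profile)))
            for name, profile in rows]


# Hypothetical features and probe scores on one chromosome.
features = {"long": (0, 300), "short": (1000, 1100)}
probes = [(10, 1.0), (60, 3.0), (150, 2.0), (250, 4.0),
          (1020, 5.0), (1080, 7.0)]

matrix = length_sorted_matrix(features, probes, bin_size=100)
for name, row in matrix:
    print(name, row)
```

Because every bin covers the same number of base pairs, a column in the resulting matrix always refers to the same absolute distance from the feature start, whereas scaling each feature to a relative length would stretch short features and compress long ones.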
3.6 Outlook

CHROMATRA represents an example of how to transform an existing (heterogeneous) local analysis pipeline into a (homogeneous) web-based environment such as Galaxy, which increases the reuse, portability and flexibility of the tools and enables (remote) project collaborations and greater traceability of analyses. In the long term, standardization and integration efforts will continue for such platforms. GenomeSpace from the Broad Institute (http://www.genomespace.org), for example, already integrates Galaxy and other popular tools, including Cytoscape, GenePattern, Genomica, the Integrative Genomics Viewer and the UCSC Genome Browser, into a meta-workspace. Equipped with cloud-based computational resources, such efforts seem to provide suitable means to address the data-related challenges that genomics, metagenomics and epigenetics projects will pose (Kahn, 2011).

Chapter 4

Analysis of chromatin modifications in the context of transcription

4.1 Synopsis

The overall goal of the projects described in this chapter was to learn more about the genome-wide distribution and regulation of chromatin modifications, with a particular focus on histone post-translational modifications, such as methylation, acetylation and ubiquitination, as well as histone variants and their relationship with transcription events. My work helped elucidate the function and distribution of the modification states of a single histone residue, the distinctive patterns in which multiple histone modifications co-occur, the spatial and temporal dependencies between different modifications, and the structure of chromatin at sites of cryptic transcription.

In the first part, the genome-wide distribution of histone H3 lysine 79 di- and trimethylation was mapped using ChIP-on-chip, and subsequent analysis revealed that the methylation states are mutually exclusive across the genome. This non-redundancy suggests that individual states are linked to different cellular functions, and my analysis revealed that H3 lysine 79 dimethylation was linked to the cell cycle.

The second part of this chapter presents my analysis of patterns of co-occurring histone marks, with a particular focus on histone H2B monoubiquitination crosstalk. The first genome-wide map of H2B lysine 123 monoubiquitination (H2BK123ub) in S. cerevisiae was established, and my comprehensive analysis revealed that H2BK123ub is associated with both of its dependent marks, H3K4 and H3K79 trimethylation, on a genome-wide scale. Furthermore, the presented results indicate that the H2B deubiquitinases Ubp8 and Ubp10 act non-redundantly and remove H2BK123ub at different genomic loci. Ubp8 acted on loci that were enriched with H3K4 trimethylation, while Ubp10 removed ubiquitin at sites marked with H3K79 trimethylation.

In the last part of this chapter, my analysis supported studies of chromatin neighbourhoods at sites of cryptic transcription, which occurs when certain chromatin pathways, such as the Set2/Rpd3 pathway, are impaired. Using tiling arrays to capture the complete transcriptome of the cell, cryptic initiation sites are determined across the genome. Furthermore, the localization of the histone variant H2A.Z at sites of cryptic transcription as well as the role of its deposition complex are investigated.

4.2 Background

As described in Section 1.3.1, chromatin folds into a hierarchy of complex layers (Misteli, 2007) and serves as a state-encoding memory in the cell. On the first level of the hierarchy, histone proteins can be post-translationally modified (Bhaumik et al., 2007; Kouzarides, 2007), and modifications can colocalize in various combinations, thereby defining distinct chromatin signatures and patterns (Bhaumik et al., 2007; Kouzarides, 2007).
One of the first described post-translational modifications of histones was the methylation of lysine residues (Murray, 1964). It is one of the most stable epigenetic marks and has been proposed to serve as a core epigenetic mark during cell state inheritance and in cellular memory in general (Radman-Livaja et al., 2010; Muramoto et al., 2010; Lan and Shi, 2009). Although histone methylation was initially considered to be irreversible, several demethylases have been discovered over the past years (Mosammaparast and Shi, 2010).

Distinct states of histone modifications. To date, three histone methyltransferases have been identified in S. cerevisiae: Set1/COMPASS, Set2 and Dot1 (Shilatifard, 2006). They are responsible for methylation of lysines 4, 36 and 79 on histone H3 (H3K4, H3K36 and H3K79), respectively. In that process, each lysine residue can be mono-, di- or trimethylated (e.g. H3K79me1, -me2 or -me3) (Shilatifard, 2006).

Histone H3 methylation occurs in distinct regions of the genome and in different segments of genes, which are associated with particular stages of the transcription cycle, comprising transcription initiation, elongation and termination. H3K4me3, for example, is found at the 5′-end of genes (Bernstein et al., 2002; Briggs et al., 2001; Krogan et al., 2003b; Miller et al., 2001; Ng et al., 2003b; Santos-Rosa et al., 2002; Pokholok et al., 2005), where transcription initiation occurs. H3K36 methylation, in contrast, is located towards the 3′-end of open reading frames and is associated with transcription elongation (Pokholok et al., 2005; Strahl et al., 2002).

While H3K79 methylation has also been implicated in DNA repair and telomeric silencing (Lacoste et al., 2002; Ng et al., 2003a; van Leeuwen et al., 2002), its connection with the transcriptional process is not well understood yet.
This uncertainty can be largely attributed to the unsettled debate in the field as to whether different methylation states of H3K79 have similar, possibly even redundant, or distinct functions.

The work presented in the first part of this chapter addresses the debated redundancy of lysine 79 methylation states by mapping H3K79 di- and trimethylation genome-wide using chromatin immunoprecipitation (ChIP) followed by high-resolution DNA microarray profiling (ChIP-on-chip). My data analysis contributed to the finding that H3K79 methylation states are mutually exclusive across the genome and that individual states are linked to different cellular functions. In particular, H3K79 dimethylation was linked to the cell cycle and found to be enriched at M/G1 cell cycle-regulated genes as well as in Swi4/6-bound promoters.

In S. cerevisiae, more than 800 genes are known to be cell cycle-regulated, and their expression peaks periodically in one of the M, G1, S or G2 cycle phases (Spellman et al., 1998). A key mechanism driving cell cycle control is the regulatory interconnection and temporal orchestration of transcriptional activators (Simon et al., 2001). Often, transcription factors function specifically during one cell-cycle stage and control the expression of transcriptional activators for a subsequent one, thereby forming a feed-forward regulatory circuit to ensure an ordered progression through cell division. One of those is the heterodimeric transcription factor complex SBF (SCB-binding factor), consisting of Swi4 and Swi6 (Breeden, 1996; Andrews and Herskowitz, 1989). SBF binds the repeated regulatory sequence SCB (Swi4,6-dependent cell cycle box) upstream of the genes it activates during the G1/S phase transition (Harbison et al., 2004; Iyer et al., 2001; Simon et al., 2001).

The role of chromatin modifications in transcription control has been intensively studied in recent years (Hogan et al., 2006; Kaplan et al., 2008).
However, many questions remain to be answered regarding chromatin structure and the role that chromatin alterations play in normal cell-cycle progression and transcriptional regulation. The results presented in the first part of this chapter helped reveal a novel connection between H3K79me2 and the SBF transcriptional activation complex.

Interdependencies of histone modifications. Post-translational histone modifications may occur individually or in combination on the same nucleosome, resulting in their cooperative function in specific chromatin neighbourhoods (Suganuma and Workman, 2008). The spatial proximity of histone modifications sometimes reflects causal relationships and dependencies based on histone crosstalk (Latham and Dent, 2007; Suganuma and Workman, 2008).

One striking example is the evolutionarily conserved trans-tail histone crosstalk between H2BK123 monoubiquitination (H2BK123ub) and H3K4 and H3K79 trimethylation, in which H2BK123ub is required for the addition of H3K4me3 and H3K79me3 (Shilatifard, 2006). This regulatory pathway acts unidirectionally, as mutations that eliminate either H3K4me3 or H3K79me3 have no effect on H2B ubiquitin levels (Dover et al., 2002; Sun and Allis, 2002; Briggs et al., 2002). H2BK123ub is established by the Rad6/Bre1 ubiquitin ligase complex, and, consistent with roles in transcription initiation and elongation, it is found in gene promoters and their coding regions (Dover et al., 2002; Hwang et al., 2003; Robzyk et al., 2000; Wood et al., 2003a). Numerous aspects of the transcription cycle are associated with H2BK123ub. Bre1 is required for the recruitment of Rad6 to promoters, and Rad6 requires the PAF complex to associate with RNA Polymerase II (RNAPII) and travel with its elongating form into coding regions (Xiao et al., 2005; Wood et al., 2003a).
Consistently, the transcription elongation complexes PAF and BUR are essential for the proper establishment of H2BK123ub (Krogan et al., 2003b; Wood et al., 2003b; 2005).

While it is already known that H2BK123ub is linked to H3K4 and H3K79 methylation, its role in establishing these marks has not been elucidated on a genome scale due to the lack of a suitable antibody. As part of the project described in this chapter, the first genome-wide map of H2BK123ub in S. cerevisiae was established, and the localization of H2BK123ub was compared to H3 methylation marks. My comprehensive analysis revealed that H2BK123ub is associated with both dependent marks, H3K4me3 and H3K79me3, on a genome-wide scale. Furthermore, the global genomic analysis demonstrates that monoubiquitination of histone H2BK123 co-occurs in distinct combinations with histone H3 methylation marks in promoters, the 5′-end, the body and the 3′-end of genes.

In contrast to methylation marks, which are relatively stable and small, monoubiquitination is transient and attaches a large protein tag of 76 amino acids (Hochstrasser, 1996). In S. cerevisiae, H2BK123ub is removed by the ubiquitin-specific proteases Ubp8 and Ubp10 (Weake and Workman, 2008). Ubp8 is a subunit of the Spt-Ada-Gcn5-acetyltransferase (SAGA) complex, and the integrity of SAGA is required for Ubp8 deubiquitination activity (Daniel et al., 2004; Henry et al., 2003; Ingvarsdottir et al., 2005; Lee et al., 2005; Powell et al., 2004). Deubiquitination of H2BK123 by Ubp8 triggers the recruitment of Ctk1, which is involved in phosphorylating serine 2 at the C-terminal domain (CTDS2ph) of RNA polymerase II (Wyce et al., 2007). In contrast, Ubp10 (also known as Dot4) acts independently of SAGA and is associated with telomeric silencing (Emre et al., 2005; Gardner et al., 2005).
Ubp10 binds to the silencing protein Sir4 and is enriched in silenced loci, but its function is not restricted to heterochromatic regions (Gardner et al., 2005; Kahana and Gottschling, 1999).

Given these differences, it is likely that Ubp8 and Ubp10 deubiquitinate distinct pools of H2BK123ub in the cell. This is further supported by the observation that deleting both proteases results in a greater global increase in H2BK123ub levels than either single deletion alone (Emre et al., 2005; Gardner et al., 2005). However, the site-specific roles of Ubp8 and Ubp10 have not been investigated on a genome scale.

In the second part of this chapter, the presented results indicate that Ubp8 and Ubp10 act non-redundantly and remove H2BK123ub at different genomic loci. Ubp8 acted on loci that were enriched with H3K4me3, while Ubp10 removed ubiquitin at sites marked with H3K79me3.

Aberrant transcripts and their chromatin structure. Besides post-translational modifications, histone variants can alter chromatin structure and affect or regulate cellular processes. These variants are deposited into specific chromatin regions and differ from their canonical counterparts in primary amino acid sequence (Talbert and Henikoff, 2010). H2A.Z is one such variant and is deposited into chromatin by the ATP-dependent complex SWR1-C (Kobor et al., 2004; Krogan et al., 2003a; Mizuguchi et al., 2004). It is mainly localized to promoter regions of genes and has been proposed to play a role in transcriptional regulation (Guillemette and Gaudreau, 2006). Furthermore, H2A.Z, together with components of the RNAi machinery, has been proposed to suppress antisense transcripts in the S. pombe genome (Zofall et al., 2009), pointing to its essential role in transcriptional control.

The regulation of chromatin is not only essential for a controlled progression through the transcription cycle, but also influences the accurate choice of transcript initiation sites.
In cells lacking certain chromatin regulators, initiation of transcription occurs inappropriately within the protein-coding regions of genes, indicating that under altered genetic or physiological conditions, the expression of alternative genetic information may occur (Kaplan et al., 2003; Carrozza et al., 2005; Cheung et al., 2008; Joshi and Struhl, 2005). This phenomenon, called cryptic or spurious transcription, indicates that the precise organization of chromatin along transcription units is crucial to direct transcription factors and RNA polymerase to proper start sites within genes.

One of the chromatin regulators involved in the suppression of cryptic transcripts is the methyltransferase Set2. As mentioned above, Set2 interacts with RNA polymerase II during transcription elongation and methylates H3K36 co-transcriptionally (Strahl et al., 2002; Krogan et al., 2003c). H3K36 methylation was shown to act as a signal to recruit the histone deacetylase complex Rpd3S, which deacetylates histones within transcribed sequences. The removal of histone acetylation is an important step to prevent cryptic transcription, because histone acetylation marks open chromatin and increases its accessibility to the transcription machinery. This process can be understood as a kinetic proofreading scheme to ensure correct transcription initiation (Blossey and Schiessel, 2008). If the proper deacetylation in open reading frames is defective, for example in the absence of Set2, spurious transcription initiation occurs from cryptic start sites within ORFs (Carrozza et al., 2005; Lickwar et al., 2009). However, very little is known about how transcription is initiated from cryptic promoters and how the chromatin structure is altered (Pattenden et al., 2010).

In the following part, the localization of the histone variant H2A.Z at sites of cryptic transcription is analyzed and the role of its deposition complex is investigated.
4.3 Distinct states of histone modifications

4.3.1 Genome-wide localizations of H3K79me2 and H3K79me3

Little is known about the distinct functions of H3K79me2 and H3K79me3 marks in the cell, and a redundancy in their roles was proposed in budding yeast (Frederiks et al., 2008; Shahbazian et al., 2005). To better understand the relationship of H3K79 di- and trimethylation as well as elucidate a possible role for H3K79me2 in cell cycle progression, a comprehensive genome-wide map of these modifications was established via ChIP-on-chip experiments using high-resolution tiling microarrays. Protein-DNA complexes containing either di- or trimethylated forms of histone H3K79 were specifically immunoprecipitated with antibodies against H3K79me2 and H3K79me3, respectively.

In order to reliably detect enriched regions, an adapted version of the Model-based Analysis of Tiling arrays algorithm (rMAT, see Section 3.4.1 for details) (Johnson et al., 2006; Droit et al., 2010) was applied, comparing signal intensities between ChIP and genomic DNA to calculate the protein-binding profile. Spearman rank correlation coefficients of ρ = 0.9 on average indicated high reproducibility and robustness of the performed replicates, typically two per experiment.

Intriguingly, H3K79me2 and H3K79me3 were localized to different regions of the genome and had distinct and mutually exclusive patterns on chromatin (Figure 4.1A). In total, H3K79me2 covered ∼22 % and H3K79me3 ∼35 % of the genome with only 2 % overlap, suggesting that these H3K79 methyl marks were associated with distinct genomic regions. To compare the localization data with known genome-wide nucleosome occupancy data, H3K79 methylation profiles were overlaid with nucleosome position data predicted by a Hidden Markov Model (HMM) (Lee et al., 2007) (Figure 4.1A).
Despite slight differences in resolution, enriched regions (blue and red bars in Figure 4.1A) colocalized with regions of known nucleosome occupancy (green bars in Figure 4.1A).

Figure 4.1: H3K79me2 and H3K79me3 patterns across the genome. A: High-resolution profile of H3K79me2 and H3K79me3. Representative for the entire genome, regions of chromosome four and eight were plotted along the x-axis against the relative occupancy of H3K79me2 and H3K79me3 on the y-axis. ORFs are indicated as rectangles above the axis for Watson genes and below the axis for Crick genes. Green boxes represent HMM-predicted well-positioned and fuzzy nucleosome positions (Lee et al., 2007). B: Venn diagram comparing the number of H3K79me2- and H3K79me3-enriched ORFs. C: Average lengths of H3K79me3- and H3K79me2-enriched genes. Boxplots show the lengths of 6576 ORFs in yeast as well as the lengths of H3K79me3- and H3K79me2-enriched ORFs.

To further ensure that these profiles truly reflected specific H3K79 methylation marks recognized by the two antibodies, control experiments were performed in a strain lacking the H3K79 methyltransferase Dot1, in which H3K79 methylation is completely eliminated. The genome-wide control profiles showed randomly scattered background peaks and a trend toward occupancy of repetitive regions, demonstrating that the antibodies were specific for their respective H3K79 methylation state.

4.3.2 Profile of H3K79me2 and H3K79me3 at promoters and ORFs

Having established the detailed maps of H3K79 di- and trimethylation, we sought to understand the general features of the occupied regions. Genome-wide analysis revealed that H3K79me2 and H3K79me3 covered 1866 and 2350 of 6576 total open reading frames, respectively. As expected, the two sets of ORFs enriched with either H3K79 di- or trimethylation overlapped in very few genes (Figure 4.1B).
Interestingly, H3K79me3-enriched ORFs were longer (median 1815 bp) whereas H3K79me2-enriched ORFs were shorter (median 848 bp) relative to the average ORF (median 1067 bp) (Figure 4.1C).

To visualize the average profile of ORFs marked with H3K79me3 and H3K79me2, all enriched ORFs were aligned according to the location of translation initiation and termination sites, similar to an earlier published analysis (Pokholok et al., 2005) (Figure 4.2). Consistent with previous studies (Pokholok et al., 2005), H3K79me3 was uniformly enriched within the protein-coding region of genes (Figure 4.2A). In contrast, H3K79me2-enriched ORFs showed that H3K79 dimethylation was not only found in the protein-coding regions of genes, but also covered their promoter region (Figure 4.2B). Overall, H3K79me3 was found in only a few promoters, whereas H3K79me2 covered promoter regions more frequently. Promoters make up a fraction of the intergenic regions, which are defined as regions that do not encode protein. Consistent with the higher occupancy of H3K79me2 in promoter regions, H3K79me2 covered ∼20 % of the intergenic regions, whereas H3K79me3 covered less than 4 %.

Figure 4.2: Distribution of H3K79me2 and H3K79me3 across genes. Average profiles of A: H3K79me3- and B: H3K79me2-enriched ORFs. A gene was considered to be enriched if at least 50 % of its ORF was covered by the modification. ORFs were aligned according to their translational start and stop sites, similar to an approach by the Young lab (Pokholok et al., 2005). Each ORF was divided into 40 bins of equal length, probes were assigned accordingly, and average enrichment values were calculated for each bin. Probes in promoter regions (500 bp upstream of transcriptional start site) and 3′ UTR (500 bp downstream of stop site) were assigned to 20 bins, respectively. The average enrichment value for each bin was plotted.
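The binning scheme described in the caption of Figure 4.2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used for the figure: the function names, the representation of probes as (position, score) pairs, and the restriction to Watson-strand genes are all hypothetical choices made for the example.

```python
from statistics import mean


def bin_region(probes, start, end, n_bins):
    """Average probe scores in n_bins equal-width bins spanning [start, end).

    probes is a list of (position, score) pairs (an assumed representation).
    Bins that contain no probe are reported as None.
    """
    width = (end - start) / n_bins
    bins = [[] for _ in range(n_bins)]
    for pos, score in probes:
        if start <= pos < end:
            idx = min(int((pos - start) / width), n_bins - 1)
            bins[idx].append(score)
    return [mean(b) if b else None for b in bins]


def gene_profile(probes, orf_start, orf_end,
                 flank=500, orf_bins=40, flank_bins=20):
    """Profile of one Watson-strand gene: upstream flank, ORF, downstream flank.

    The 500 bp flanks and the 20/40/20 bin counts follow the figure caption;
    strand handling for Crick genes is omitted in this sketch.
    """
    return (bin_region(probes, orf_start - flank, orf_start, flank_bins)
            + bin_region(probes, orf_start, orf_end, orf_bins)
            + bin_region(probes, orf_end, orf_end + flank, flank_bins))


def average_profile(profiles):
    """Bin-wise mean over several gene profiles, skipping empty bins."""
    averaged = []
    for column in zip(*profiles):
        values = [v for v in column if v is not None]
        averaged.append(mean(values) if values else None)
    return averaged
```

Because every ORF is mapped onto the same number of bins, genes of different lengths become directly comparable, at the cost of averaging over different numbers of probes per bin.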
4.3.3 Association of H3K79me2 and H3K79me3 with transcription

In order to examine the correlation between H3K79 di- and trimethylation and gene expression, enriched genes were assigned into five different classes according to their transcription rate (Holstege et al., 1998), and a composite occupancy profile for each class was determined (Figure 4.3A–B). H3K79 dimethylation tended to be present at higher levels in transcriptionally less active genes, mainly toward their 5′-end and promoter region (Figure 4.3B). Consistent with previous findings (Pokholok et al., 2005), no clear correlation between the transcriptional activity of genes and enrichment for H3K79 trimethylation was found (Figure 4.3A). Genes with lower expression levels were enriched for H3K79 di- and trimethylation with similar ratios (Figure 4.3C). An exception, however, was the small group of the most highly expressed genes. These genes had very low levels of H3K79 di- and trimethylation in their promoters as well as in their ORFs.

Figure 4.3: Association of H3K79me2 and H3K79me3 with transcriptional frequency. Average profiles of A: H3K79me3- and B: H3K79me2-enriched ORFs according to transcriptional activity. All genes for which information was available (Holstege et al., 1998) were divided into five classes according to their transcriptional rate. Average gene profiles were computed and plotted as described in Figure 4.2. C: Percent enrichment of H3K79 di- and trimethylated ORFs in different transcriptional classes. As before, genes were divided into five classes according to their transcriptional activity (Holstege et al., 1998), and the percent overlap with H3K79 di- and trimethylated ORFs was plotted.

Gene expression has been reported to correlate inversely with nucleosome occupancy in promoters (Lee et al., 2007), so it was not surprising to find low H3K79 di- and trimethylation in these promoters.
However, coding regions of highly expressed genes have been shown to be more occupied by nucleosomes than lower expressed genes (Lee et al., 2007). Therefore, ORFs of highly expressed genes are devoid of H3K79 di- and trimethylation despite the dense occupancy with nucleosomes.

4.3.4 Association of H3K79me2 and H3K79me3 with the cell cycle

Because of the cell-cycle dependence of H3K79me2, we tested if H3K79 dimethyl-marked genes were regulated during the cell cycle. In budding yeast, ∼800 genes change their transcriptional profile and peak in certain stages of the cell cycle (Spellman et al., 1998). We compared ORFs enriched for H3K79 di- and trimethylation with the different classes of cell cycle-regulated genes (Figure 4.4), and asked if the overlap was significant using Fisher’s exact test (Tavazoie et al., 1999). Indeed, M/G1-regulated genes were significantly enriched for H3K79me2, but were not enriched for H3K79me3 (Figure 4.4). In contrast, genes regulated in the G2 phase showed a significant occupancy with H3K79me3, but were not marked by H3K79me2 (Figure 4.4). These results suggest that H3K79 methylation is not random, but rather that it is regulated in conjunction with progression through the cell cycle and might be involved in periodic transcription of genes during distinct cell-cycle phases.

4.3.5 Dynamics of H3K79me2 and H3K79me3 during the cell cycle

Protein blot analysis suggested that bulk levels of H3K79me2, but not H3K79me3, changed during the progression of the cell cycle, with reduced levels in G1 and elevated levels in G2/M. To test if levels of H3K79me2 fluctuated at the level of single genes during the cell cycle, ChIP-on-chip assays were performed using chromatin of nocodazole-arrested yeast cells and compared to asynchronous cells. In nocodazole, cells are arrested in the G2/M phase of the cell cycle and should have the highest level of H3K79me2, while asynchronous cells have a mixed distribution of G1, S, and G2/M phase cells.
While H3K79me2 profiles in asynchronous and G2/M-arrested cells were overall similar, with a Spearman rank correlation coefficient of ρ = 0.69, a detailed analysis revealed important differences between them. In asynchronous cells, ORFs enriched for H3K79me2 significantly overlapped with genes whose expression is regulated in M/G1 (Figure 4.4). Interestingly, ORFs enriched for H3K79me2 in G2/M-arrested cells significantly overlapped not only with M/G1- but also with G1-regulated genes (Figure 4.5A). Moreover, this effect expanded into the promoter region of genes, since promoter regions of M/G1- and G1-regulated genes were also significantly enriched for H3K79me2 in G2/M-arrested cells (Figure 4.5B). This result suggests that M/G1- and G1-regulated genes are marked in their ORF and promoter by H3K79me2 during cell-cycle stages (G2/M) when these genes are inactive.

Figure 4.4: Association of H3K79me2 and H3K79me3 with the cell cycle. Overlap of H3K79me2- and H3K79me3-enriched ORFs with transcriptionally regulated genes for each cell-cycle stage (Spellman et al., 1998). Numbers below the x-axis represent the total number of genes with periodic transcription. Numbers above the x-axis represent the overlap of these genes with those enriched for H3K79me2 and H3K79me3. The percentage of the overlap was plotted on the y-axis. A: H3K79me2-enriched ORFs in asynchronous cells. The overlap expected by chance is 28 % (1866 H3K79me2-enriched ORFs out of 6576 total). B: H3K79me3-enriched ORFs in asynchronous cells. The overlap expected by chance is 36 % (2350 H3K79me3-enriched ORFs out of 6576 total). The p-values were calculated using Fisher’s exact test.

Figure 4.5: Association of H3K79me2 with the cell cycle in G2/M-phase. Overlap of H3K79me2-enriched ORFs in G2/M-arrested cells with transcriptionally regulated genes for each cell-cycle stage (Spellman et al., 1998). A: H3K79me2-enriched ORFs in G2/M-arrested cells. The overlap expected by chance is 38 % (2444 H3K79me2-enriched ORFs out of 6576 total). B: H3K79me2-enriched promoters in G2/M. The overlap expected by chance is 23 % (1483 H3K79me2-enriched promoters out of 6576 total).

In contrast to H3K79me2, global levels of H3K79me3 were not altered during the cell cycle. In order to confirm this observation on a single-gene level, the H3K79me3 profile in G2/M-arrested cells was determined. As expected, the profiles of the asynchronous and G2/M-arrested cells were similar, occupied the same regions, and correlated with a Spearman rank correlation coefficient of ρ = 0.91. Consistent with the observations in asynchronous cells, H3K79 di- and trimethylation had distinct and mutually exclusive patterns on chromatin in the G2/M phase and overlapped in only 1 % of the genome.

4.3.6 H3K79me2 at intergenic regions and ARS in G2/M-phase

To further characterize similarities and differences between H3K79me2 in asynchronous and G2/M-arrested cells, we concentrated on typical genomic features. It is known that nucleosome occupancy does not exhibit large, global variation between cell-cycle phases (Hogan et al., 2006), so the observed changes in the H3K79 dimethylation pattern during the cell cycle were most likely not due to global changes in nucleosome occupancy.

Visualizing the average profile of all genes occupied by H3K79me2 in G2/M-arrested cells showed that, similar to asynchronous cells, H3K79 dimethylation was not only found in the protein-coding regions of genes, but also covered their promoters (Figure 4.6A). The trend toward occupancy of less-transcribed genes was weaker in G2/M-arrested cells compared to the asynchronous data set, but the relative height of the H3K79 dimethylation profile followed precisely the decreasing order of transcriptional activity (Figure 4.6B).

Figure 4.6: Association of H3K79me2 with ORFs and transcription in G2/M-phase. A: Average profile of H3K79me2-enriched ORFs in G2/M-arrested cells. The profile for the average enriched ORF was determined as explained in Figure 4.2. B: Average profile of H3K79me2-enriched ORFs in G2/M-arrested cells according to their transcriptional activity. The profile was determined as described in Figure 4.3.

Not only promoters but also intergenic regions in general were enriched more frequently with H3K79me2 in G2/M-arrested cells. Indeed, intergenic regions were covered to ∼50 % in G2/M-arrested cells compared to ∼20 % in the asynchronous data set.

To characterize additional chromosomal features for their enrichment with H3K79 di- and trimethylation, we focused on origins of replication (ARSs for autonomously replicating sequences) and centromeres. Intriguingly, ARSs were significantly enriched (131 out of 274, p < 3 × 10⁻¹⁴) for H3K79me2 in the G2/M phase. In contrast, low ARS occupancy of H3K79 di- and trimethylation (17 and 3 ARSs out of 274) was detected in asynchronous cells. The significant overlap of ARSs with H3K79me2 in G2/M-arrested cells indicates a potential role of H3K79me2 in maintaining ARSs in their inactive state. In yeast, the single centromeric nucleosome contains a specialized H3 variant, Cse4, in place of canonical histone H3. Consistent with replacement of H3 at the centromere, neither H3K79me2 nor H3K79me3 antibodies enriched for CEN sequences in either the asynchronous or G2/M data set.

4.3.7 Association of H3K79me2 with the transcription factor Swi4

Genome-wide analysis of SBF-binding sites by ChIP-on-chip using the DNA-binding subunit Swi4 revealed that SBF binds to promoters of genes expressed in G1/S (Harbison et al., 2004; Iyer et al., 2001; Simon et al., 2001). A comparison of H3K79me2-enriched promoters to the 137 promoters bound by Swi4 (Iyer et al., 2001) gave a significant overlap in G2/M-arrested cells (Figure 4.7).
This overlap was not significant for H3K79me2-enriched promoters and ORFs in asynchronous cells, perhaps indicating that H3K79me2 is a consequence of SBF-binding/transcription earlier in the cell cycle. In contrast, ORFs and promoters enriched with H3K79me3 showed significantly lower overlap than expected by chance (based on Fisher’s exact test) and no overlap with Swi4-bound genes, respectively (Figure 4.7). This analysis showed that a significant number of SBF-regulated genes were H3K79 dimethylated in their promoter during G2/M.

Figure 4.7: Association of H3K79me2 with the transcription factor Swi4. Venn diagram showing overlap of the 137 Swi4-bound genes (Iyer et al., 2001) with H3K79me2- and H3K79me3-enriched ORFs and promoters, respectively. H3K79me2-enriched promoters in G2/M-arrested cells showed significant overlap with Swi4-bound genes based on Fisher’s exact test. H3K79me3-enriched ORFs and promoters show significant underrepresentation of Swi4-bound genes. Promoters were called enriched when 450 bp upstream of the ORF were covered by the methyl mark.

4.3.8 Colocalization of H2BK123ub with H3K79me3

Given the results demonstrating that the patterns of H3K79 di- and trimethylation on chromatin are mutually exclusive (Figure 4.1), we next sought to understand the mechanism by which a single enzyme, Dot1, can distinguish between sites of di- versus trimethylation. The appearance of H3K79me2 after replication in S phase could be explained by either the de novo establishment of the methyl mark or the demethylation of an existing H3K79me3 mark resulting in H3K79 dimethylation. The latter would require regions that were dimethylated in G2/M to be trimethylated during the G1/S transition. We ruled out this possibility, because ChIP-on-chip of H3K79me3 in G1-arrested cells showed that regions that are dimethylated in G2/M were not trimethylated in G1, and the profiles of H3K79 trimethylation were similar in G1- and G2/M-arrested cells with a Spearman rank correlation of ρ = 0.82.

Based on these observations, demethylation of H3K79me3 does not appear to be the main mechanism by which yeast cells regulate the pattern of H3K79me2 and H3K79me3; however, it remains possible that the trimethyl state could be achieved transiently and quickly removed in G1. Given the data, we hypothesize that H3K79 di- and trimethylation are established independently, and additional factors control the distribution of di- versus trimethylation.

One major candidate is monoubiquitination of lysine 123 on histone H2B mediated by Rad6/Bre1, which is known to be required for proper H3K79 trimethylation. Since ChIP-on-chip studies indicated that the patterns of H3K79 di- and trimethylation are mutually exclusive and have ∼2 % overlap throughout the yeast genome, we hypothesize that the pattern of H2B monoubiquitination could control the distribution of H3K79 di- versus trimethylation on chromatin. Based on this hypothesis, we predict that factors required for H3K79me3 should function through the regulation of H2B monoubiquitination.

To test the hypothesis that H2BK123 monoubiquitination determines the pattern of H3K79 trimethylation and colocalizes with H3K79 tri- but not dimethylation, a polyclonal antibody was developed that specifically recognizes monoubiquitinated H2BK123. Employing this antibody, we determined a comprehensive genome-wide map of H2BK123 ubiquitination via ChIP-on-chip (Figure 4.8A). To ensure the specific enrichment of H2BK123ub over unmodified H2B, nonspecific binding was blocked with H2B peptide during the immunoprecipitation step. In addition, H2BK123ub enrichment data were normalized using an H2BK123A mutant profile obtained from an identical ChIP-on-chip experiment.
Intriguingly, H2BK123ub colocalized with H3K79 trimethylation at many genomic loci, but showed distinct and mutually exclusive patterns with H3K79 dimethylation (Figure 4.8A). Overall, high-confidence regions colocalized with H3K79 trimethylation, and only a very minor fraction overlapped with H3K79 dimethylation (Figure 4.8B). As expected from this analysis, very similar distributions existed at the ORF level, where 812 out of 2350 H3K79-trimethylated ORFs were marked by H2BK123 ubiquitination, but only 45 out of 1866 H3K79-dimethylated ORFs were enriched for H2BK123 ubiquitination (Figure 4.8C).

Since H3K79 dimethylation and H2B monoubiquitination were mutually exclusive on a genome-wide level, we predicted the link of H3K79me2 to genes expressed specifically during the cell cycle to be independent of the H2B monoubiquitination pattern. Indeed, cell cycle-regulated genes, especially those regulated in M/G1, were not marked by H2B ubiquitination (Figure 4.8D).

Taken together, these findings suggest that the regulation of H2BK123 monoubiquitination is linked to H3K79 tri- but not dimethylation and could play a role in distinguishing the genome-wide establishment of H3K79 di- versus trimethylation.

Figure 4.8: Colocalization of H2BK123ub with H3K79me3. A: Overlay of H2BK123ub, H3K79me2, and H3K79me3 profiles. Sample genomic positions for chromosomes 8 and 10 were plotted along the x-axis against the relative occupancy of the indicated histone modifications on the y-axis. ORFs are indicated as rectangles above the axis for Watson and below the axis for Crick strand. B: Diagram summarizing the percentage of genome-wide occupancy of H3K79me2 and H3K79me3 and their overlap with H2BK123ub. C: Diagram illustrating the overlap of H3K79me2- and H3K79me3-enriched genes with H2BK123ub-enriched genes. D: Overlap of H2BK123-ubiquitinated ORFs with transcriptionally regulated genes for each cell-cycle stage (Spellman et al., 1998). Numbers below and above the x-axis represent the total number of genes in each cell-cycle class and their overlap with H2B-ubiquitinated genes, respectively. The percentage of overlap is plotted on the y-axis with a dashed line indicating the percentage expected by chance. The p-values were calculated using Fisher’s exact test.

4.4 Interdependencies of histone modifications

4.4.1 Genome-wide distribution of H2BK123ub and H3 methylation marks with respect to gene length and transcriptional frequency

To illuminate the chromatin network around H2BK123ub in S. cerevisiae, H2BK123ub, its dependent marks H3K4 and H3K79 methylation as well as the transcription elongation mark H3K36me3 were mapped using high-resolution tiling arrays. These profiles allow a comprehensive analysis to elucidate the co-occurrences and dependencies of H2BK123ub and H3 methylation in the context of transcriptional regulation across the genome. H2BK123ub and the H3 methylation marks measured here were strongly enriched in genomic regions transcribed by RNAPII, but mostly absent from other genomic features such as telomeres, centromeres, the rRNA locus, ARSs and tRNAs (Table 4.1).

Table 4.1: H2BK123ub and H3 methylation at different genomic features. Overlap of investigated histone modifications with different genomic features like rRNAs, tRNAs, autonomously replicating sequences (ARS), centromeres (CENs) and telomeres (TEL). rRNAs, tRNAs, ARS and CENs were called associated when 100 % of underlying probes had an enrichment score above a threshold of 1.5. Telomeres were called enriched when at least 50 % of the underlying probes had an enrichment score above a threshold of 1.5.

Feature              rRNA   tRNA   ARS   CEN   TEL
Total                  25    275   274    16    32
H2BK123ub WT            0      0     4     0     0
H3K4me3                 1      4     9     1     0
H3K36me3                0      0     3     0     0
H3K79me3                0      0     3     0     0
H3K79me2                0      0    16     0     1
H2BK123ub ubp8∆         0      1     5     0     0
H2BK123ub ubp10∆        0      2     4     0     0

Therefore, the analysis was focused on RNAPII transcripts, and a compact yet comprehensive visualization approach (CHROMATRA) was developed (see Section 3.5.2) to assess the distribution of histone modifications across all transcripts at once, while accounting for gene length and transcriptional frequency. In this approach, enrichment scores of each modification were calculated using a 150 bp frame and colour-coded for all known transcripts, which were aligned according to their transcription start sites (TSS) and sorted by their length (Figure 4.9). Extending previous studies mapping H3 methylation in S. cerevisiae (Pokholok et al., 2005), our platform and visualization achieved greater resolution and avoided ambiguities caused by averaging effects in gene length.

H2BK123ub covered the entire coding sequence of genes and extended into some promoters (Figure 4.9), which agrees with studies in human cells (Minsky et al., 2008). H3K4me3 was localized in the 5′-end of almost all transcripts and sharply peaked just downstream of the TSS. In contrast, H3K36me3 covered the entire body of transcripts almost uniformly. The lateral distribution of H3K79me3 followed a similar shape, but was less pronounced towards the 3′-end and intensified with increasing transcript length. As shown above, the profile of H3K79me2 was mutually exclusive to that of H3K79me3, with higher occupancy towards shorter transcripts (Figure 4.9).

In order to examine the correlation between these histone marks and gene expression, all transcripts were grouped into five classes according to their transcriptional frequency (Holstege et al., 1998) and plotted with CHROMATRA (see Section 3.5.2) similarly to Figure 4.9.
While H2BK123ub was present in genes belonging to all transcriptional classes (Figure 4.10), it had a tendency for stronger occupancy in classes of higher transcriptional frequency (Figure 4.11). Similarly, H3K4me3 and H3K36me3 were enriched in all classes of transcriptional frequencies, whereas H3K79me2 and H3K79me3 were only enriched in classes of lower transcriptional frequencies as shown above (Figures 4.10 and 4.11).

Figure 4.9: Distribution of H2BK123ub and H3 methylation marks in all transcripts. Enrichment of H2BK123ub, H3K4me3, H3K36me3, H3K79me3 and H3K79me2 across all transcripts sorted by their length and aligned by their TSS. The normalized ChIP-on-chip MAT scores were binned into segments of 150 bp. The average enrichment value for each bin was colour-coded and plotted. The upper adjacent (UA) value of the MAT score distribution was used for colour bar limits.

Figure 4.10: Distribution of H2BK123ub and H3 methylation marks with respect to transcriptional frequency. As in Figure 4.9, but with transcripts sorted by their length as well as transcriptional frequency. Transcripts were grouped into five classes according to their number of transcripts per hour (Holstege et al., 1998). The upper adjacent (UA) value of the MAT score distribution was used for colour bar limits.

Figure 4.11: Associations of H2BK123ub and H3 methylation marks with transcriptional frequencies. Boxplots indicating the association of histone marks with transcriptional frequencies. As above, transcripts were grouped into five classes according to their transcriptional frequency. For all modifications and each transcript, the average enrichment score was calculated as the average MAT score of all probes between transcript start and end. For each transcription class, the average scores were then shown as boxplots. For H3K4me3, which peaks downstream of the TSS around the +2 and +3 nucleosome, average scores were calculated for 300 bp in that region.
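The per-transcript scoring behind Figure 4.11 can be illustrated with a short Python sketch. It is a hedged example rather than the code used in this work: the function names, the (position, score) probe representation and the (start, end, class) transcript tuples are hypothetical, and the five-number summary only approximates what a boxplot displays, since whisker conventions differ between plotting tools.

```python
from collections import defaultdict
from statistics import mean, quantiles


def transcript_score(probes, start, end):
    """Mean score of all probes located between transcript start and end."""
    inside = [score for pos, score in probes if start <= pos <= end]
    return mean(inside) if inside else None


def scores_by_class(transcripts, probes):
    """Group per-transcript mean scores by transcriptional-frequency class.

    transcripts is a list of (start, end, freq_class) tuples; the class
    labels are assumed to be assigned beforehand (for example 1 to 5).
    """
    grouped = defaultdict(list)
    for start, end, freq_class in transcripts:
        score = transcript_score(probes, start, end)
        if score is not None:
            grouped[freq_class].append(score)
    return dict(grouped)


def five_number_summary(values):
    """Minimum, quartiles and maximum of a list with at least two values."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)
```

For a mark such as H3K4me3, the same `transcript_score` function could be called with a 300 bp window downstream of the TSS instead of the full transcript coordinates.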
4.4.2 Correlation and colocalization of H2BK123ub and its dependent marks

The spatial relationship of H2BK123ub and H3 methylation marks was complex, as seen in the overlay at one representative genomic region (Figure 4.12A). To assess their associations quantitatively, their pairwise Spearman correlation was calculated at the 5 bp (probe) level. H2BK123ub correlated positively with its downstream marks H3K4me3 and H3K79me3, although the correlation was much stronger with H3K79me3 (ρ = 0.67) than with H3K4me3 (ρ = 0.26) (Figure 4.12B). In addition, H2BK123ub correlated positively with the elongation mark H3K36me3 (ρ = 0.63), which itself showed the strongest correlation with H3K79me3 (ρ = 0.76). In contrast, H2BK123ub and H3K79me2 correlated negatively (ρ = −0.31), consistent with the finding presented above that these two marks do not colocalize.

Figure 4.12: Correlation of H2BK123ub and its dependent marks. A: Overlay of H2BK123ub (black), H3K4me3 (red), H3K36me3 (green), H3K79me3 (blue) and H3K79me2 (orange) ChIP-on-chip profiles. A sample genomic position on chromosome 8 was plotted along the x-axis against the relative occupancy of the histone modifications on the y-axis. ORFs are indicated as rectangles, above the axis for Watson genes and below the axis for Crick genes. B: Spearman correlation matrix for genome-wide profiles calculated at the 5 bp (probe) level. Red indicates high correlation and blue represents anticorrelation.

To determine the colocalization of H2BK123ub and H3 methylation more directly, the patterns occurring across all genes were further analyzed. Since some modifications such as H3K4me3 are enriched in specific parts of the gene (Figure 4.9), genes were partitioned into segments for further analysis, similar to an approach used before (Liu et al., 2005).
Each gene was divided into five segments: promoter (300 bp upstream of the coding start), TSS-proximal (300 bp downstream of TSS), 5′-CDS (300 bp downstream of the coding start), mid-CDS (300 bp around the CDS center) and 3′-CDS (300 bp upstream of the coding end). Figure 4.13: Patterns of H2BK123ub and its dependent marks. Specific seg- ments of a gene showed different combination of marks. Genes were partitioned into the following segments: promoter region (300 bp up- stream of the coding region), TSS-proximal (300 bp downstream of TSS), 5′-CDS (300 bp downstream CDS start), middle CDS (300 bp around centre of ORF), and 3′-CDS (300 bp upstream of CDS end). Mid-CDS and 3′-CDS was only considered when the ORF had a length of at least 900 bp or 600 bp, respectively. The average enrichment scores for the histone modification were hierarchically clustered (inde- pendently for each gene segment), colour-coded and plotted. Columns represent modifications, rows correspond to genes. In order to determine the combinations of H2BK123ub and the H3 methylation marks within each gene segment, the average enrichment score for each modifi- cation was calculated, clustered and plotted as a heatmap (Figure 4.13). To get a 98 more quantitative measure of these cooccurring patterns, we determined the num- ber of all 32 possible combinations of marks for the five segments by assigning a binary value for each modification depending on whether it was enriched or not within each segment (Table 4.2). Both analyses revealed that promoters (column 3) were mostly enriched for H2BK123ub, H3K4me3 or H3K79me2 (lines 2, 3, 6), and characterized by two predominant combinations of marks: H2BK123ub and H3K4me3 (line 7) as well as H3K4me3 and H3K79me2 (line 13 and Figure 4.13, Table 4.2). H3K4me3 was most prominent in both the TSS-proximal segment (col- umn 4) and the 5′-CDS (column 5), and largely cooccurred with either H2BK123ub or H3K79me2 (lines 7, 13). 
Mid- and 3′-CDS (columns 6, 7) were dominated by a triple combination of H2BK123ub, H3K79me3 and H3K36me3 (line 20). In addition, and consistent with findings presented above, the clustering clearly separated H3K79me2 from H3K79me3, particularly in the mid-CDS and 3′-CDS, and showed that H3K79 tri- but not di-methylation was associated with H2BK123ub (Figure 4.13). Taken together, in all segments H2BK123ub colocalized with its dependent marks H3K4me3 and H3K79me3, but not necessarily vice versa, as discussed below.

Table 4.2: Frequencies of histone modification patterns. For all possible histone modification patterns of H2BK123ub, H3K4me3, H3K36me3, H3K79me3, and H3K79me2 in promoter, TSS-proximal, 5′-CDS, mid-CDS and 3′-CDS, the actual number of genes with each pattern is specified.

 #  Signature                              Prom  TSS-prox  5′-CDS  mid-CDS  3′-CDS
    Total                                  6572      4868    6132     3832    4786
 1  no modification                        3113       656     998      360    1439
 2  K123ub                                  413        69      38        5      86
 3  K4me3                                  1060      1465    1218       32      74
 4  K36me3                                   31         4      21       84     441
 5  K79me3                                   46        15      48      351     255
 6  K79me2                                  600       304     534      417     736
 7  K123ub, K4me3                           402       505     304        3       3
 8  K123ub, K36me3                           10         1       5        8     159
 9  K123ub, K79me3                           21        12      17       42      49
10  K123ub, K79me2                           23         5       3        1       8
11  K4me3, K36me3                            33        69     171       57     119
12  K4me3, K79me3                            70       267     480       12       7
13  K4me3, K79me2                           423       793     953       47     122
14  K36me3, K79me3                           36         1      38      919     507
15  K36me3, K79me2                            7         0      16       61     122
16  K79me3, K79me2                            6         4       8       56      50
17  K123ub, K4me3, K36me3                    35        96     172       51     109
18  K123ub, K4me3, K79me3                    69       410     468        3       1
19  K123ub, K4me3, K79me2                    44        81      65        1       4
20  K123ub, K36me3, K79me3                   24         1      16      767     277
21  K123ub, K36me3, K79me2                    2         0       0        0       4
22  K123ub, K79me3, K79me2                    0         1       2        2       1
23  K4me3, K36me3, K79me3                    31        13     137      204      60
24  K4me3, K36me3, K79me2                     7         7      36       58      61
25  K4me3, K79me3, K79me2                     7        30      73        5       4
26  K36me3, K79me3, K79me2                    6         0       2       58      26
27  K123ub, K4me3, K36me3, K79me3            41        41     266      189      38
28  K123ub, K4me3, K36me3, K79me2             0         4      13        4       5
29  K123ub, K4me3, K79me3, K79me2             2        11      16        1       0
30  K123ub, K36me3, K79me3, K79me2            0         0       0        1       6
31  K4me3, K36me3, K79me3, K79me2             8         1       9       30      13
32  K123ub, K4me3, K36me3, K79me3, K79me2     2         1       5        3       0

4.4.3 Site-specific removal of H2BK123ub by Ubp8 and Ubp10

To directly test the hypothesis that Ubp8 and Ubp10 remove the transient ubiquitin mark from distinct genomic regions (see page 77), H2BK123ub was mapped across the genome in strains lacking either Ubp8 or Ubp10. As expected from bulk protein blotting studies (Emre et al., 2005; Gardner et al., 2005), the number of probes enriched for H2BK123ub increased in the strains lacking Ubp8 or Ubp10 (Figure 4.14). Supporting the hypothesis, newly enriched probes for H2BK123ub were different between the two deletion strains (Figure 4.14). To assess the localization of these newly ubiquitinated regions, the H2BK123ub distribution was plotted for strains lacking Ubp8 or Ubp10 across all transcripts sorted by their length as well as transcriptional frequency using CHROMATRA (Figure 4.15A). In the ubp8∆ strain, H2BK123ub peaked downstream of the TSS, but was reduced throughout the body of the transcript compared to wild-type cells (Figure 4.15A). In contrast, H2BK123ub was localized across the coding sequence of mainly longer genes in the ubp10∆ strain, comparable to wild-type cells (Figure 4.15A). To clearly visualize where newly enriched sites are located along the transcripts, wild-type profiles were subtracted from deletion profiles and only positive differences were colour-coded (Figure 4.15B).

Figure 4.14: Site-specific removal of H2BK123 ubiquitin by Ubp8 and Ubp10 (probe level). Number of probes enriched for H2BK123ub within transcripts in wild-type as well as ubp8 and ubp10 deletion strains. Venn diagrams comparing the overlap of these probes between the different strains.
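The subtraction behind Figure 4.15B and the probe comparison behind the Venn diagrams of Figure 4.14 can be illustrated with a minimal sketch. It assumes that enrichment scores are available as equally long per-probe arrays and that a probe counts as enriched when its score exceeds a cutoff. The function names, the example values and the cutoff of 0.25 are hypothetical and are not taken from the pipeline used in this work.

```python
import numpy as np


def positive_difference(wildtype, deletion):
    """Subtract wild-type scores from deletion-strain scores and keep only gains."""
    diff = np.asarray(deletion, dtype=float) - np.asarray(wildtype, dtype=float)
    # Negative differences (loss of enrichment) are set to zero.
    return np.clip(diff, 0.0, None)


def enriched_probes(scores, cutoff):
    """Return the indices of probes whose score exceeds the (assumed) cutoff."""
    values = np.asarray(scores, dtype=float)
    return {int(i) for i in np.flatnonzero(values > cutoff)}


# Hypothetical per-probe enrichment scores for four probes.
wt = [0.1, 0.5, 1.2, 0.0]
ubp8_del = [0.3, 0.4, 2.0, 0.0]

gain = positive_difference(wt, ubp8_del)
# Probes enriched in the deletion strain but not in wild type.
newly_enriched = enriched_probes(ubp8_del, 0.25) - enriched_probes(wt, 0.25)
```

Comparing the `newly_enriched` sets of two deletion strains with ordinary set operations would then give the kind of overlap counts a Venn diagram displays.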
To test the hypothesis that Ubp8 removes H2BK123ub at sites marked by H3K4me3 and Ubp10 does so for regions enriched for H3K79me3, we asked how many H2BK123ub-enriched probes are enriched for H3K4me3 or H3K79me3 in wild-type, ubp8∆ and ubp10∆ strains (Figure 4.16A). Consistent with the hypothesis, newly enriched probes for H2BK123ub in the ubp8∆ strain were mainly marked by H3K4me3, and those in the ubp10∆ strain by H3K79me3 (Figure 4.16A). By averaging the enrichment profiles of H2BK123ub and its dependent marks in a length-dependent manner similar to a previous approach (Mayer et al., 2010), we noticed that the H2BK123ub profile was very different from H3K4me3 but comparable in its lateral distribution and overall shape to H3K79me3 (Figure 4.16B). Upon deletion of Ubp8, the H2BK123ub profile changed dramatically, now showing a striking similarity to H3K4me3. The ubp10 deletion profile, however, showed relatively modest changes, although the degree of resemblance to H3K79me3 further increased (Figure 4.16B). These data suggest that Ubp8 acts primarily in the 5′-CDS marked by H3K4me3, whereas Ubp10 deubiquitinates H2BK123 in the body of transcripts marked by H3K79me3.

Figure 4.15: Site-specific removal of H2BK123 ubiquitin by Ubp8 and Ubp10 (transcript level). A: Distribution of H2BK123ub in wild-type as well as ubp8 and ubp10 deletion strains across all transcripts sorted by their transcriptional frequencies. Calculations and plotting as in Figure 4.10. B: Differences in the enrichment of H2BK123ub in ubp8 and ubp10 deletion strains. Wild-type enrichment scores were subtracted from the H2BK123ub enrichment scores of the ubp8 and ubp10 deletion strains, and only positive differences were colour-coded. Average enrichment was calculated and transcripts sorted as in Figure 4.10.

Figure 4.16: Connection between Ubp8, Ubp10 and H3 methylation marks. A: All probes enriched for H2BK123ub within transcripts in wild-type as well as ubp8 and ubp10 deletion strains were compared with the number of these probes enriched for H3K4me3 and H3K79me3. B: All genes with a known TSS were divided into five length classes and the average enrichment of H2BK123ub in wild-type, ubp8 and ubp10 deletion strains as well as of H3K4me3 and H3K79me3 was mapped. All transcripts in each group were partitioned into 150 bp bins and the average enrichment values were calculated and plotted.

Since most changes in the distribution of H2BK123ub upon loss of Ubp8 and Ubp10 occurred in the 5′- and mid-CDS, these gene segments were further analyzed and the association of H2BK123ub with H3K4me3 and H3K79me3 was determined (Figure 4.17). Supporting the hypothesis, the 5′-ends deubiquitinated by Ubp8 were primarily marked by H3K4me3, whereas the mid-CDSs deubiquitinated by Ubp10 were mainly marked by H3K79me3 (Figure 4.17).

Figure 4.17: Segment-based association of Ubp8 and Ubp10 with H3 methylation marks. Venn diagrams comparing the number of 5′-CDS and mid-CDS enriched for H2BK123ub in wild-type, ubp8 and ubp10 deletion strains. For the 5′-CDS and mid-CDS enriched for H2BK123ub only in the ubp8 or ubp10 deletion strain, the overlap with the number of 5′-CDS and mid-CDS either marked by H3K4me3 (red), H3K79me3 (blue) or both (purple) is shown as bars.

To further infer dependencies in the circuitry of H2BK123 ubiquitination/deubiquitination and its dependent marks, we tested whether loss of Ubp8 or Ubp10 had any consequences for the global levels and the genome-wide distribution of H3K4me3 and H3K79me3. As previously shown (Gardner et al., 2005; Daniel et al., 2004; Song and Ahn, 2010), protein blot analysis revealed no bulk change of H3K4me3 and H3K79me3 in either the ubp8 or the ubp10 deletion strain compared to wild-type. Furthermore, the genome-wide distribution of H3K4me3 and H3K79me3 remained largely unchanged upon loss of Ubp8 or Ubp10 (data not shown).
These findings suggested that Ubp8 and Ubp10 remove ubiquitin from H2BK123 after H3K4 and H3K79 trimethylation have been established.

The link between H2BK123ub and the transcriptional cycle (Weake and Workman, 2008) may suggest that non-removal of the (bulky) ubiquitin mark on H2BK123 in ubp8∆ and ubp10∆ strains impairs transcription. However, and consistent with moderate effects on transcription levels (Gardner et al., 2005; Lenstra et al., 2011), the distribution of RNAPII was not affected by loss of either Ubp8 or Ubp10 (data not shown). Moreover, the distribution of the elongation mark H3K36me3 was not impaired upon loss of Ubp8 (data not shown), suggesting that transcription elongation still takes place.

4.5 Aberrant transcripts and their chromatin structure

4.5.1 Chromatin at cryptic promoters

It has been shown that cells lacking certain components of chromatin-associated pathways, such as the Set2/Rpd3S pathway critical for proper transcription-coupled chromatin remodeling, initiate cryptic transcripts within open reading frames (Carrozza et al., 2005; Lickwar et al., 2009; Joshi and Struhl, 2005; Kaplan et al., 2003; Cheung et al., 2008). To detect these spurious transcription initiation events in budding yeast, whole-genome custom Affymetrix tiling arrays were used to map the entire transcriptome in a wild-type and a set2 deletion strain, with two replicates each. As described in Chapter 3, the R package tilingArray (Huber et al., 2006) was applied to normalize the data and determine all transcript segments for a globally optimal fit of expression segments along genomic coordinates for an assumed length of 1500 bp per segment (nrBasesPerSegment = 1500). It is important to note that this parameter does not enforce a minimum length for individual segments; rather, it determines the number of segments the algorithm considers when partitioning the region of interest. ORFs in which segments newly appeared in the absence of Set2 were further considered. For these genes, the average gene expression values for all identified sections within known ORF boundaries were calculated and compared to gene expression levels in wild-type cells. In total, 521 of the ∼6500 ORFs in S. cerevisiae showed one or more cryptic initiation events. More than 57 % of those overlapped with previously reported cryptic transcripts (Lickwar et al., 2009) that were calculated employing the same underlying change-point detection principle (Bai and Perron, 1998) that tilingArray uses.

Little is known about the chromatin structure at sites of cryptic initiation within ORFs. One key chromatin modification found at many regular transcription initiation sites in budding yeast is the histone variant H2A.Z. We used ChIP-on-chip with high-resolution tiling arrays to determine its distribution across the genome. To visualize its distribution across transcripts, the average profile across genes was calculated (Figure 4.18A) and its association with transcriptional frequency visualized using CHROMATRA (Section 3.5.2) (Figure 4.18B). Consistent with previous studies (Guillemette et al., 2005; Raisner et al., 2005; Zhang et al., 2005; Albert et al., 2007), H2A.Z was mainly enriched at gene promoters (Figure 4.2A), with stronger occupancy in classes of lower transcriptional frequency (Figure 4.18B). To test whether cryptic promoters have H2A.Z localized to their initiation sites in intragenic regions, we mapped H2A.Z in cells lacking Set2. Strikingly, H2A.Z was enriched at about half of all sites of cryptic transcription initiation when the Set2/Rpd3S pathway was impaired (Figure 4.18C). Overlaying gene expression data with the H2A.Z mapping data allowed a direct comparison of the chromatin structure at sites of cryptic initiation (Figure 4.19).

Another histone modification strongly associated with transcription initiation sites is H3K4 tri-methylation.
As shown in Figure 4.10, H3K4me3 is enriched at 5′-ends of almost all genes in S. cerevisiae. If the chromatin structure at sites of cryptic transcription resembles the chromatin structure at normal initiation sites, one would expect to find H3K4me3 downstream of cryptic initiation sites as well. To test this hypothesis, we mapped H3K4me3 in cells lacking Set2 and found H3K4me3 to be enriched in many genes with cryptic transcripts (Figure 4.20).

Figure 4.18: Distribution of H2A.Z across ORFs and transcripts. A: Average profiles of H2A.Z across ORFs. ORFs were aligned according to their translational start and stop sites, similar to an approach by the Young lab (Pokholok et al., 2005). Plot calculated similar to Figure 4.2. B: Distribution of H2A.Z across transcripts with respect to transcriptional frequency. Plot as in Figure 4.10. C: Overlay of H2A.Z wild-type (red) and set2 deletion (blue) ChIP-on-chip profiles. Two sample genomic positions were plotted along the x-axis against the relative occupancy of the histone variant on the y-axis.

4.5.2 Deposition and role of H2A.Z at cryptic promoters

The occurrence of H2A.Z and H3K4me3 at sites of cryptic initiation indicates that patterns of chromatin are comparable to those at regularly transcribed regions. However, these data do not allow conclusions to be drawn about dependencies and the enzymatic machineries establishing these modifications. In order to elucidate such dependencies, we focused on the histone variant H2A.Z and its deposition machinery. H2A.Z is well known to be deposited into chromatin by the multi-subunit complex SWR1-C (Kobor et al., 2004; Krogan et al., 2003a; Mizuguchi et al., 2004). Impairment of the catalytic subunit Swr1 leads to a major loss of H2A.Z associated with chromatin, indicating that SWR1-C is the predominant H2A.Z-deposition machinery (Marques et al., 2010). To test whether SWR1-C is responsible for integrating H2A.Z at cryptic promoters as well, we mapped H2A.Z in cells lacking both Swr1 and Set2. As expected, H2A.Z was lost at regular promoters in the absence of Swr1, but surprisingly, the localization of H2A.Z to cryptic promoters was not dependent on SWR1-C (Figure 4.21), suggesting that another complex is involved in H2A.Z deposition at these sites. Our current experiments and analyses are geared towards revealing the cellular machineries important for H2A.Z deposition at those sites.

Figure 4.19: H2A.Z at sites of cryptic initiation. For two sample genes, the overlay of H2A.Z ChIP-on-chip profiles with the corresponding gene expression data in wild-type and set2 deletion strain is shown. Dotted vertical lines represent predicted transcript segments, which were used to calculate average expression levels for each segment visualized as horizontal green lines.

Figure 4.20: H3K4me3 at sites of cryptic initiation. For two sample genes, the overlay of H3K4me3 ChIP-on-chip profiles with the corresponding gene expression data in wild-type and set2 deletion strain is shown. Dotted vertical lines represent predicted transcript segments, which were used to calculate average expression levels for each segment visualized as horizontal green lines.

A strong correlation exists between transcription and many chromatin modifications. Yet, whether these correlations reflect mere associations or causal relationships between histone modifications and transcription is largely unknown. The disturbance of chromatin structures, e.g. by impairing the Set2/Rpd3 pathway, leads to the initiation of cryptic transcription within intragenic regions, arguing that chromatin has a direct effect on transcriptional regulation. However, the causal relationship of modifications associated with the initiation of transcripts, such as H2A.Z and H3K4me3, to the occurrence of cryptic transcripts has not been investigated so far.
Therefore, we studied the role of H2A.Z in this process by mapping cryptic transcription in cells lacking H2A.Z. Interestingly, despite its strong association with (cryptic) transcription, H2A.Z was not causal for the occurrence of cryptic transcripts, as the number of intragenic initiation events was similar in the set2∆ and set2∆htz1∆ strains (Figure 4.22).

Figure 4.21: Role of SWR1-C in H2A.Z deposition at cryptic promoters. For two sample genes, the overlay of H2A.Z ChIP-on-chip profiles in wild-type, set2∆ and set2∆swr1∆ strains with the corresponding gene expression data is shown. Dotted vertical lines represent predicted transcript segments, which were used to calculate average expression levels for each segment visualized as horizontal green lines.

As the example of cryptic transcription indicates, the cell's transcriptome is rather complex, and transcripts of various types have been identified so far. Moreover, transcription from the strand opposite to a protein-coding (sense) strand, called antisense transcription, seems to be widespread in budding yeast and has been ascribed roles in gene regulation (David et al., 2006; Havilio et al., 2005; Nagalakshmi et al., 2008; Steigele and Nieselt, 2005; Gelfand et al., 2011). We observed that most cryptic transcripts occur in sense direction, but antisense cryptic transcripts occur as well (Figure 4.23). To test whether the histone variant H2A.Z is localized to sites of antisense cryptic transcription, the analysis was focused on intragenic regions containing such transcripts. Interestingly, H2A.Z was found at these sites, indicating that chromatin structures at antisense cryptic promoters have a typical composition, too (Figure 4.23).

Taken together, these studies provide a glimpse of the complexity of transcription, in particular at cryptic sites, and of its association with the surrounding chromatin neighbourhoods, and they lay the cornerstones for further analysis.
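The selection of ORFs in which segments newly appear in the mutant (Section 4.5.1) can be illustrated by a simplified sketch. It does not reproduce the tilingArray segmentation itself. It only assumes that segment boundaries (change-points) have already been called for the wild-type and set2∆ data, and it flags mutant boundaries inside an ORF that have no wild-type counterpart nearby. The function names, the example coordinates and the 50 bp matching tolerance are hypothetical choices made for illustration.

```python
def new_boundaries(wt_breaks, mut_breaks, orf_start, orf_end, tol=50):
    """Mutant change-points strictly inside the ORF that have no wild-type
    change-point within tol bp (tol is an assumed matching tolerance)."""
    inside = [b for b in mut_breaks if orf_start < b < orf_end]
    return [b for b in inside if all(abs(b - w) > tol for w in wt_breaks)]


def orfs_with_cryptic_candidates(orfs, wt_breaks, mut_breaks, tol=50):
    """Names of ORFs that contain at least one new intragenic boundary."""
    return sorted(
        name
        for name, (start, end) in orfs.items()
        if new_boundaries(wt_breaks, mut_breaks, start, end, tol)
    )


# Hypothetical coordinates on one strand of one chromosome.
orfs = {"ORF_A": (1000, 3000), "ORF_B": (4000, 5000)}
wt_breaks = [1000, 3000, 4000, 5000]
mut_breaks = [1010, 2100, 3000, 4000, 5000]

candidates = orfs_with_cryptic_candidates(orfs, wt_breaks, mut_breaks)
```

The sketch omits the subsequent step described in Section 4.5.1, in which average expression values of the segments within each ORF are compared with wild-type expression levels.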
4.6 Discussion

The work presented in the first part of this chapter provides new evidence that individual modification states of the same histone residue have non-redundant cellular functions. Second, my analysis helped to reveal temporal and spatial dependencies of histone modifications and their distinctive patterns. Finally, evidence is presented that the chromatin structure is altered at cryptic transcription sites and that H2A.Z as well as H3K4me3 are enriched at cryptic initiation sites.

Figure 4.22: Role of H2A.Z for the occurrence of cryptic transcripts. For two sample genes, the overlay of H2A.Z ChIP-on-chip profiles with the corresponding gene expression data in wild-type, set2∆ and set2∆htz1∆ strains is shown. Dotted vertical lines represent predicted transcript segments, which were used to calculate average expression levels for each segment visualized as horizontal green and blue lines.

Distinct states of histone modification. In the first part, findings are reported that address di- and trimethylation of H3K79 and their debated redundancy in functional outcome. The analysis showed that H3K79 methylation states were mutually exclusive across the genome and linked to different cellular functions. In particular, the presented characteristics of methylation states point towards a fundamental connection between H3K79 dimethylation and the cell cycle. The ChIP-on-chip analysis revealed H3K79me2 to be enriched at M/G1 cell cycle-regulated genes as well as in Swi4/6-bound promoters.

Figure 4.23: H2A.Z at sites of antisense cryptic transcripts. For a sample gene, the overlay of H2A.Z ChIP-on-chip profiles with the corresponding sense as well as antisense gene expression data in wild-type, set2∆ and set2∆htz1∆ strains is shown. Dotted vertical lines represent predicted transcript segments, which were used to calculate average expression levels for each segment visualized as horizontal green lines.

Histone lysine residues have different methylation states, and it has been unclear whether these states are functionally distinct or redundant. For H3K79 methylation, it was proposed that the three states are functionally redundant (Frederiks et al., 2008; Shahbazian et al., 2005), based on genetic evidence indicating that all three states have roles in telomeric silencing (Frederiks et al., 2008). In contrast, work presented in this chapter provides evidence that H3K79 methylation states have non-redundant, distinct functions. The presented ChIP-on-chip profiles reveal that H3K79me2 and H3K79me3 are associated with mutually exclusive regions of the genome, a fundamental prerequisite for being functionally distinct. Mechanistically, the sole H3K79 methyltransferase Dot1 requires a mechanism to specifically establish each state at certain regions in the genome. In one possible mechanism for this region-specific activity, Dot1 could recognize other chromatin marks that induce state-specific methylation. Indeed, as shown here, a potential candidate is H2BK123ub, which colocalizes with H3K79me3, but not H3K79me2. Supporting this hypothesis, in vitro reconstitution of the methylation reaction suggested that the mono-ubiquitination of H2B induces a conformational change in the Dot1-nucleosome complex which specifically stimulates Dot1's catalytic activity (Jeltsch and Rathert, 2008; McGinty et al., 2008). Therefore, the pattern of H2B monoubiquitination could control the distribution of H3K79 di- versus trimethylation on chromatin.

Another point of evidence supporting the distinct function of the different methylation states is the presented association of H3K79 di- but not trimethylation with cell-cycle control. What is known about chromatin and the cell cycle? One major aspect of cell-cycle control is the regulatory network of interconnected transcriptional activators (Simon et al., 2001).
Often, transcription factors function specifically during one cell-cycle stage and control the expression of transcriptional activators for the subsequent one, thereby forming a feed-forward regulatory circuit to ensure an ordered progression through cell division. In addition to transcriptional regulation, cell-cycle progression also employs proteolysis, phosphorylation, localization, and other regulatory mechanisms to ensure completion of one event before entry into the next. The role of chromatin modifications in transcription control has been intensively studied in recent years. However, little is known about cell cycle-dependent changes in chromatin structure and the role that chromatin alterations play in normal cell-cycle progression and transcriptional regulation.

Based on observations presented in this chapter, we propose the following model of how H3K79me2 and cell-cycle control relate to one another. Dephosphorylation of Swi6 and synthesis of Swi4 during mitosis allow binding of SBF to its target genes in the late M/early G1 phase of the cell cycle. However, inhibitory factors such as Whi5 prevent transcriptional activation at these targets until early G1, after cells have reached a critical size threshold. At this point in the cell cycle, known as START in budding yeast, activation of the G1 cyclin-dependent kinase Cln3-Cdk1 inside the nucleus renders SBF functionally active, and a positive feedback loop results in amplification of SBF- as well as MBF-dependent transcription and expression of genes required for progression through G1 and entry into the S phase. SBF is inactivated upon entry into the S phase through phosphorylation of Swi6, which disrupts its binding to Swi4 and results in its export to the cytoplasm. Without Swi6, Swi4 can no longer bind to DNA. The observations that levels of H3K79me2 increase during the S phase and remain high during the G2/M phase, combined with the genome-wide location analysis showing that H3K79me2-occupied genes overlap extensively with those expressed specifically in the G1 phase (or bound by Swi4), suggest that H3K79me2 marks cell cycle-specific genes during G2/M.

Whether H3K79me2 is a consequence of gene inactivation or actively causes transcriptional inactivation remains to be determined. Here, we provide evidence that the modification is created de novo (that is, through the addition of two methyl groups to an unmodified H3K79) and is not generated by demethylation of an existing H3K79me3 residue.

Taken together, the data clearly show that H3K79 methylation states are mutually exclusive and linked to different cellular functions. In particular, a fundamental connection between H3K79 dimethylation and the cell cycle is presented, which offers a great opportunity for further analysis of chromatin's role in regulating the cell cycle.

From the computational perspective, the fluctuation and distinct localization of H3K79 methylation during the cell cycle stages could indicate a toggle mechanism that signals a particular overall system state to certain cellular machineries and discriminates marked genes from others, thereby guiding correct progression through the cell cycle program. According to the histone code hypothesis, the dimethyl mark might be recognized, via specific protein domains, by complexes that are recruited to marked genes in order to promote their transcription as they are required for the subsequent process stage. Expressed in terms of a finite state machine, the cell cycle stages can be understood as machine states, chromatin modifications as part of the input (and/or output) events that determine correct state transitions, and the activation of certain genes as output of a state change.

Interdependencies of histone modifications. H2BK123ub, a post-translational histone mark with roles in transcription initiation and elongation, is required in the histone trans-tail cross-talk for tri-methylation of histone H3K4 and H3K79 (Briggs et al., 2002; Dover et al., 2002; Nakanishi et al., 2009; Sun and Allis, 2002). H2BK123ub is highly transient and removed by the ubiquitin-specific proteases Ubp8 and Ubp10. However, the genomic regions these deubiquitinases act upon have not been investigated genome-wide. In this chapter, genome-wide maps of H2BK123ub in wild-type, ubp8 and ubp10 deletion strains as well as H3 methylation profiles obtained on the same high-resolution platform are presented. The analysis demonstrates that H2BK123ub was associated with both its dependent marks and, despite not being causally linked, also colocalized with the transcription elongation mark H3K36me3. Furthermore, Ubp8 and Ubp10 had site-specific roles in the genome: Ubp8 deubiquitinated H2BK123ub at H3K4me3-marked regions, whereas Ubp10 removed H2BK123ub at H3K79me3-enriched sites. The data further indicate that H2BK123ub might be more transient in the 5′-end than in the body of transcripts, thereby influencing certain steps of the transcription cycle differently.

These findings, together with the current understanding of the histone trans-tail cross-talk, suggest the following model (Figure 4.24): Initially, Rad6/Bre1 are recruited to promoters through interactions with transcriptional activators and monoubiquitinate H2BK123 (Weake and Workman, 2008). Together with the surrounding histone residues, H2BK123ub then provides a molecular 'tag' attracting the Set1/COMPASS complex, which in turn tri-methylates H3K4 (Lee et al., 2007). Eventually, Ubp8 removes the bulky H2BK123 monoubiquitin group, and H3K4me3 remains as a memory mark of recent transcriptional initiation (Gerber and Shilatifard, 2003; Krogan et al., 2003b; Muramoto et al., 2010; Ng et al., 2003b).
In longer genes, which require an extensive transcription elongation phase, Rad6/Bre1 stay associated with the elongating form of RNA polymerase II and monoubiquitinate H2BK123 throughout the CDS. This provides a molecular 'tag' recognized by Dot1, which binds more strongly to and resides longer at these sites to specifically trimethylate H3K79 (Jeltsch and Rathert, 2008). In those regions, Ubp10 removes H2BK123ub, and the stable H3K79me3 mark remains.

The data further indicate that H2BK123ub might be more transient in the 5′-end than in the body of transcripts. In wild-type cells, H2BK123ub was mainly enriched in the body of transcripts marked by H3K79me3 and did not peak in the 5′-end. The increase of ubiquitin in the body of transcripts was modest upon loss of Ubp10, while it strongly increased in their 5′-ends upon loss of Ubp8, suggesting that the ubiquitin mark is more transient at the 5′-end under normal conditions. Furthermore, H2BK123ub was reduced in the body of transcripts in the ubp8 deletion strain, indicating that non-removal of H2BK123ub at the 5′-end might block the ubiquitination of H2BK123 during later elongation steps.

Figure 4.24: Model depicting the circuitry of H2BK123ub and its dependent marks. First, H2B is monoubiquitinated by Rad6/Bre1, resulting in the recruitment of Set1/COMPASS and, in longer genes, Dot1 to trimethylate H3K4 and H3K79, respectively. The removal of H2BK123ub is proposed to be site-specific by either Ubp8 or Ubp10.

It has been shown that deletion of Ubp8 and Ubp10 together results in a higher global increase of the H2BK123ub level than either one of the single deletions alone, but the increase is not completely additive, indicating that they work synergistically at a few regions (Emre et al., 2005; Gardner et al., 2005). Our data suggest that these regions might be the ones marked by both dependent marks H3K4me3 and H3K79me3.
In particular, at 5′-ends of genes H3K4me3 and H3K79me3 cooccurred, suggesting that Ubp8 and Ubp10 most likely both act at those regions.

In support of the presented hypothesis, Ubp8 and Ubp10 catalyze deubiquitination of H2BK123 on distinct genomic loci, which raises the issue of specific recognition of these sites. Ubp8 is part of the SAGA complex and forms a module with Sgf11, Sus1 and Sgf73 (Köhler et al., 2010; Samara et al., 2010). Sus1 is required for the recruitment of Ubp8 to promoters (Köhler et al., 2006) and might help Ubp8 to be recruited to H3K4me3-marked regions specifically. Ubp10 has not been identified as part of a complex, and the mechanism of its recruitment to chromatin is not clear. Ubp10 was proposed to play a role in telomeric silencing (Emre et al., 2005; Gardner et al., 2005), but we observed no increase of H2BK123ub in telomeres upon loss of Ubp10. Besides its proposed importance for telomeric regions, gene expression studies point towards a role of Ubp10 in euchromatin as well (Gardner et al., 2005). Consistently, we revealed that Ubp10 specifically removes H2BK123ub from H3K79 tri-methylated open reading frames. Strikingly, the H3K79 methyltransferase Dot1 was initially found in the same screen as Ubp10 (Dot4) (Singer et al., 1998), further supporting this connection of Ubp10 and Dot1 across the genome.

Proper addition and removal of H2BK123ub are required for optimal gene expression (Weake and Workman, 2008), and Ubp8-mediated deubiquitination is involved in the transition between initiation and elongation (Daniel et al., 2004; Henry et al., 2003). Failure to remove the H2BK123 ubiquitin mark leads to defects in recruitment of the kinase Ctk1, but global defects in subsequent CTD serine 2 phosphorylation have not been detected (Wyce et al., 2007). Consistently, it has been shown that loss of Ubp8 does not alter recruitment of RNAPII to GAL1 during gene activation (Wyce et al., 2007).
Here, we confirmed that Rpb3 localization upon loss of Ubp8 or Ubp10 was unaltered genome-wide, which is reflected by moderate changes in gene expression (Gardner et al., 2005; Lenstra et al., 2011). Consistently, the elongation mark H3K36me3 did not change upon loss of Ubp8, suggesting that transcriptional elongation still takes place when H2BK123ub is not properly removed.

Together, these findings indicate that the cell is able to efficiently transcribe genes despite impaired removal of the ubiquitin moiety. A possible explanation could be an indirect removal of the ubiquitin mark through eviction of H2A and H2B during transcription or histone turn-over. In addition, the overall transcriptional program might be unaffected by the disturbed H2BK123ub pathway, potentially due to genetic robustness achieved by functional redundancy. Quantitative genetic interaction mapping showed that genes encoding components of the H2BK123ub machinery such as Rad6, Bre1 and Ubp8 interact genetically with genes encoding multiple transcription-related factors such as subunits of Set1/COMPASS, PAF, SWR1-C or the proteasome, and are embedded in the well-connected complexes of transcriptional elongation (Ingvarsdottir et al., 2005; Xiao et al., 2005).

Taken together, the results agree with previous findings that describe DUBs as major molecular regulators (D'Andrea, 2010) and point towards distinct roles of Ubp8 and Ubp10 in the deubiquitination machinery of eukaryotic cells. With respect to the underlying computational principles, the H2BK123ub pathway demonstrates nicely how one histone modification can trigger others in the context of transcriptional gene activation. Drawing again on the finite state machine abstraction, the transcriptional program can be divided into certain states, such as initiation, elongation and termination on a coarse-grained level, that need to be executed in a certain order.
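The finite state machine abstraction can be made concrete with a toy sketch. The three states are the coarse-grained stages named above. The input events, the outputs and the transition table are hypothetical labels chosen only to illustrate the abstraction; they are not a validated model of the pathway.

```python
# (state, input event) -> (next state, output); all entries are illustrative.
TRANSITIONS = {
    ("initiation", "H2BK123ub"): ("initiation", "H3K4 trimethylation"),
    ("initiation", "ub_removed"): ("elongation", "elongation starts"),
    ("elongation", "H2BK123ub"): ("elongation", "H3K79 trimethylation"),
    ("elongation", "end_of_gene"): ("termination", "transcript released"),
}


def run(events, state="initiation"):
    """Feed input events to the machine; return the final state and outputs."""
    outputs = []
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            # An event that is not allowed in the current state is an error,
            # mirroring the requirement that stages occur in a fixed order.
            raise ValueError(f"no transition from {state!r} on {event!r}")
        state, out = TRANSITIONS[key]
        outputs.append(out)
    return state, outputs


final_state, outputs = run(
    ["H2BK123ub", "ub_removed", "H2BK123ub", "end_of_gene"]
)
```

In this toy machine, an event sequence that skips a stage (for example `end_of_gene` during initiation) is rejected, which is one simple way of expressing that the states must be passed through in order.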
The sequence and combination of histone modifications described above might then be regarded as (part of the) inputs that determine correct state transitions. The recruitment of enzymatic complexes for certain transcription stages and the establishment of downstream histone marks can be understood as generated outputs of the machine.

Aberrant transcripts and their chromatin structure. Recent studies have shown that the proper regulation of chromatin influences the correct choice of transcript initiation sites. In cells lacking certain chromatin-regulation pathways, initiation of transcription occurs inappropriately within the protein-coding regions of genes, indicating that expression of alternative genetic information occurs (Kaplan et al., 2003; Carrozza et al., 2005; Cheung et al., 2008; Joshi and Struhl, 2005). Although it is known that some of these cryptic transcripts are translated into proteins, very little is known about the chromatin neighbourhoods at cryptic initiation sites.

In the last part of this chapter, we investigate chromatin alterations at sites of cryptic initiation when the Set2/Rpd3 pathway is impaired. We reveal that chromatin modifications typically found in normal promoters, such as H2A.Z, are also enriched in cryptic promoters.

Since histone acetylation is a key modification of transcript initiation, the occurrence of cryptic transcripts within genes can be explained by increased histone acetylation within open reading frames due to an impaired Set2/Rpd3 pathway (Carrozza et al., 2005). It has been proposed that cryptic transcripts arise at all intragenic regions containing DNA sequences that have the ability to inappropriately recruit transcription initiation factors (Lickwar et al., 2009). Additional factors such as gene length or transcription rate may influence the occurrence of sites of cryptic transcription (Lickwar et al., 2009). So far, the functional importance of cryptic transcripts in S. cerevisiae is unclear.
Potentially, these sites might have influenced the evolution of alternative promoters in higher eukaryotes, as the use of alternative transcription start sites is frequently found in higher organisms. To fully unravel the function of cryptic transcription, it is important to understand its association with the underlying chromatin structure.

Here, we show that the histone variant H2A.Z as well as the histone modification H3K4me3 are enriched at sites of cryptic initiation, indicating that cryptic promoters have chromatin structures similar to canonical promoters. However, the data suggest that the regulation of chromatin modifications at cryptic sites is different from that at sites of canonical transcript initiation. The histone variant H2A.Z is known to be deposited into chromatin by the SWR1-C under normal conditions (Kobor et al., 2004; Krogan et al., 2003a; Mizuguchi et al., 2004). In contrast, H2A.Z is not deposited at sites of cryptic initiation by the SWR1-C. Therefore, another protein or enzymatic complex may be responsible for integrating H2A.Z at these sites of cryptic initiation. Potential candidates are the H2A.Z chaperones Nap1 or Chz1 (Luk et al., 2007), although they contain no enzymatic activities themselves. Alternatively, previous studies proposed that H2A.Z can be randomly incorporated into chromatin in a non-targeted fashion (Hardy et al., 2009). In this scenario, H2A.Z is not found in coding regions, since it is constantly removed during transcription elongation (Hardy et al., 2009). Under impaired conditions, such as the disturbance of the Set2 pathway, when cryptic transcripts arise, the removal of H2A.Z might not function properly at intragenic sites, and H2A.Z becomes enriched at cryptic promoters without enzymatic activity.
Our work further reveals that the histone variant H2A.Z is not causal for the occurrence of cryptic transcripts under the conditions tested, indicating that it most likely does not influence the location of cryptic initiation either. As mentioned above, other factors that affect transcription, including the availability of binding sites for transcriptional activators and the existence of DNA elements such as a TATA box, may play causal roles in the context-dependent initiation of cryptic transcription (Lickwar et al., 2009).

Taken together, we show evidence that the chromatin modifications H2A.Z and H3K4me3 are enriched at sites of cryptic transcription. These data provide the first step in revealing the chromatin neighbourhood at cryptic start sites and offer the opportunity to further fill in the gaps on the epigenetic map of the yeast genome.

From the biocomputing viewpoint, the data suggest that the chromatin marks studied up to this point are not causal for the transcriptional program to be initiated at cryptic sites but are rather memory marks that signal where transcription recently took place and reflect the state of a certain genomic stretch. The lack of impact on gene activity, however, indicates that these chromatin marks are potentially suitable candidates to encode information for artificial biocomputers. The decoupling of information and cellular consequence provides essential degrees of freedom to exploit these resources with less interference with cellular function.

Chapter 5 Conclusion

While an all-embracing yet precise definition of life as the discriminating characteristic between inanimate matter and objects we refer to as organisms remains open to a certain degree, undisputed properties of living systems include the capacity to metabolize, respond to stimuli and reproduce (Koshland, 2002).
At first glance, some of these functions seem conceivable in higher organisms as emergent results of complex sensory networks, intricate structural organizations and elaborate nervous systems. Yet even the ‘lowliest of the lowest’ single-celled organisms that roam this planet, as Joseph Leidy described amoeba (Naturalists et al., 1878), lead highly sophisticated lives. Approaching the question of how such ‘simple’ living systems can, for example, respond to light, exhibit directed locomotion and even actively hunt prey, we start appreciating that these activities require information processing within the molecular realm.

Essentially, cells are systems of highly connected and interdependent circuits that compute and perform logical operations on biomolecular substrates (Bonn and Furlong, 2008; Istrail et al., 2007; Levine and Davidson, 2005). Understanding the fundamental principles of these algorithmic bioprocesses will mutually advance our understanding of biology and inspire novel forms of computation (Condon et al., 2009). At the very core, the connection between life as a concrete physical process and computation as an abstract principle can be derived from theories on pancomputationalism (Zuse, 1990; Jaynes, 1957a,b), according to which information is the essence of every process and, thus, any physical process is a computation and can be restated in terms of information:

It is not unreasonable to imagine that information sits at the core of physics, just as it sits at the core of a computer. (...) It from bit. Otherwise put, every ‘it’—every particle, every field of force, even the space-time continuum itself—derives its function, its meaning, its very existence entirely—even if in some contexts indirectly—from the apparatus-elicited answers to yes-or-no questions, binary choices, bits. ‘It from bit’ symbolizes the idea that every item of the physical world has at bottom—a very deep bottom, in most instances—an immaterial source and explanation; that which we call reality arises in the last analysis from the posing of yes-no questions and the registering of equipment-evoked responses; in short, that all things physical are information-theoretic in origin (...) (taken from Wheeler, 1990).

While the radical hypothesis of pancomputationalism awaits further elucidation, it is interesting to note that Wheeler’s idea suggests even binary logic at the core of every physical process. If this postulate is true, most efforts in biomolecular computing seem to follow the ‘correct’ computational paradigm when adopting digital encoding schemes and striving for discrete behaviour in man-made biocomputers, despite the fact that many biological processes appear to be inherently analog in nature, at least at the macromolecular scale as seen by the human observer.

Leaving the question of digital versus analog aside, computing, in both the classical and biological sense, is essentially a dynamic process that transforms a set of inputs into a set of outputs according to some algorithmic rules. To better reflect this dynamic aspect, certain branches in the field of biomolecular computing have started to make a transition away from DNA-based approaches towards more agile substrates as carriers of computational information in biocompatible applications (Benenson, 2009a). Two prominent biomolecular substrates that saw a dramatic shift towards agility in their roles in cellular computation are the main subject of this thesis: RNA and chromatin.

The presented work centres around the functionality and characteristics that RNA and chromatin provide with respect to their regulatory role in the cell as well as their potential capacities for biomolecular computing.

The first part of the dissertation, covered in Chapter 2, focuses on RNA.
I speculate on how eukaryotic cells might use regulatory RNA molecules and the RNA interference machinery to generate a form of epigenetic memory, and how they might potentially utilize it for computations based on sequential logic. The model I present is an RNA-based equivalent of the well-known electronic flip-flop, one of the fundamental memory units in digital circuits. This work contributes by elucidating the abstract principles of the RNA interference machinery from a biological perspective and suggests novel ideas for universal memory units in biomolecular computing.

The second part of this work, prepared in Chapter 3 and put into biological context in Chapter 4, focuses on chromatin. In collaborative projects with highly recognized research groups, my work contributes to the understanding of various aspects of chromatin biology. In particular, I analyzed genome-wide data of different methylation states of histone residue H3K79. Results of this analysis made it possible to answer a long-standing question in the field of chromatin biology, and demonstrated that H3K79 methylation states are mutually exclusive in the genome and associated with distinct cellular processes. I further identified genome-wide patterns of several histone H3 modifications in the context of histone H2BK123 monoubiquitination, which, among other results, led to the finding that Ubp8 and Ubp10 deubiquitinate H2BK123 at different loci in the genome. Lastly, my work sheds light on intriguing connections between chromatin structure and gene activity by analyzing transcriptome data together with genome-wide profiles of histone marks, the histone variant H2A.Z and parts of its depositing enzymatic machinery.
Although immediate translations of these results into applications for biomolecular computing seem far-fetched at this point, they aid in deciphering the principles of chromatin-based computation in the cell and, hence, lay the foundation for potential computational designs at a later point. The presented work emphasizes that there are different types of chromatin modifications, some of them rather transient and others more permanent, indicating that some modifications of this computational layer have storage functionality, while others act like signalling toggles to discriminate activation of the ‘correct’ genetic elements in a particular system state.

Chromatin and RNA represent two of several layers in the hierarchical network (Figure 1.1) that constitutes the cellular computation apparatus (Conrad, 1983). Although initially appearing to be rather independent, chromatin- and RNA-based regulatory functions quickly became connected with the discoveries of non-coding RNAs and their role in chromatin structure (see Section 1.3.1). As both research fields evolve, this overlap continues to grow. The examples in Section 1.3.2 of long ncRNAs, such as Xist and HOTAIR, demonstrate how deeply both layers are nested and hint at the versatile and powerful interplay they exhibit with respect to cellular function in general and epigenetic information processing in particular (Lee, 2010).

For biomolecular computing, and biocompatible approaches specifically, regulatory RNAs have already proven their superiority over DNA, and the field has passed the stage of identifying tractable goals for fully implementing in vivo computing (Benenson, 2009b). In the next phase, attempts are under way to assemble individual components into larger subunits.
Complexity-wise, these units are projected to contain about 10–20 components, and the key lesson to be learned on that route will be how to deal with the inherent noise and random fluctuations in living systems while maintaining proper functioning of the engineered systems (Benenson, 2009b).

Randomness and non-determinism neither conflict with the claim that cells compute nor are they unwanted byproducts. Instead, many molecular interactions in the cell heavily rely on random searching to increase the probability of encounters between reaction partners over relatively long distances; only once the partners are in proximity do more deterministic processes control the course of interaction (Yanagida et al., 2007; Bajic and Tan, 2005). The critical question for biomolecular computing here is how natural systems avoid the amplification of this endogenous noise and sustain their operation across a wide range of physiological conditions. It has been proposed that the hierarchical composition of interconnected computational layers in the cell might be at least part of the answer to this question, as it facilitates redundancy and converging of disintegrated signal and control flows (Bajic and Tan, 2005; Prohaska et al., 2010; Stojanovic, 2010). It even appears as if this form of distributed computing over multiple layers and different molecular substrates is critical for the flexible and intelligent behaviour of organisms (Bajic and Tan, 2005). Thus, once again, current efforts in biomolecular computing aiming to combine transcriptional, post-transcriptional and post-translational layers (Benenson, 2009b) seem to pursue the right path towards robust operation of complex engineered circuits in vivo.

With respect to RNA-based computations, this principle becomes obvious.
As explained in Sections 1.3.2 and 2.3, small interfering RNAs alone, despite being the pivotal element, have no capacity to unfold their regulatory impact without the protein machinery of RNA interference. This observation holds for other types of regulatory RNAs as well (Shalgi et al., 2007; Shimoni et al., 2007). Combinations of RNA and protein layer features led to the most advanced biocomputers thus far (Deans et al., 2007; Greber et al., 2008; Rinaudo et al., 2007; Win et al., 2009). Especially in the work of Rinaudo et al. (2007), the logic capabilities and computational power of such hybrid circuits are substantially greater than in previous approaches and could be further widened, towards sequential and/or iterated operation, if equipped with universal memory units such as flip-flops to temporarily store information.

Additional increases in computational flexibility of such devices could potentially be achieved by further integrating sequence- and structure-based features of RNAs. As successfully laid out by Smolke’s group, riboswitch RNAs with integrated aptamer domains offer tremendous opportunities as sensors and effectors for a wide range of molecular signals in the cell (Win and Smolke, 2007b). The induced switch-like conformational change of the RNA in response to binding between a specific ligand and the aptamer modulates the efficiency of target processing through the RNA interference machinery (Beisel et al., 2008). For the envisioned diagnostic and therapeutic applications of biocompatible computers, aptamers could serve as detectors for characteristic molecules in disease cells and specifically expose a disease-relevant target to RNAi processing (Beisel et al., 2008). Hence, RNAi-mediated processing of target RNAs can be controlled not only through the (sequence of the) siRNA effector but also by applying or removing the ligand signal to which the aptamer on the target is susceptible.
This dual control scheme might also provide a route to cascade multiple RNA flip-flops in a counter- or frequency divider-like fashion. Integrated aptamers in RNAs and corresponding ligands could render flip-flops susceptible to primary trigger signals or clock certain parts of flip-flop cascades to achieve (a-)synchronous behaviour.

As more and more studies reveal the diverse ways nature utilizes RNAs, and as much as the revealed versatility must appeal to current attempts towards in vivo computing, one also has to pay attention to the fact that the more nested a substrate or process is in the cell, the more likely it is that the degrees of freedom from an engineering perspective become limited. Many molecules that would be particularly interesting to exploit (for instance polymerases and ribosomes, because of their catalytic properties and the ability to translate information from DNA to RNA and from RNA to protein, respectively) are evolutionarily tightly locked within the cellular computing machinery and, hence, offer little flexibility, since other elements in the machinery would inevitably be affected too when trying to modify these molecules (Stojanovic, 2010).

Abstraction might provide a principle to circumvent this dilemma. By not encoding information as direct properties of substrates, such as their sequence, length or structure, one may gain additional freedom for computation without affecting other cellular processes. The RNA-based flip-flop discussed in Chapter 2 exemplifies such an approach. Although it is a memory model not (yet) found in nature, it could potentially operate within and according to the rules of the cellular machinery without any further requirements. Admittedly, an implementation would have to show whether the utilization of the RNAi machinery has negative effects on the rest of the cell.
It could certainly be that only a limited number of RISC complexes is available and that either the flip-flop or natural processes cease to function because of saturation effects.

Another substrate that might offer computational potential through abstraction is chromatin (Prohaska et al., 2010). While the impact of chromatin modifications on gene activity clearly suggests a link between the underlying DNA sequence and the positioning of nucleosomes, other chromatin architectural proteins and their chemical modifications, the mapping between genetic and epigenetic information does not seem to be a bijection (Jiang and Pugh, 2009; Washietl et al., 2008; Segal and Widom, 2009). This ‘partial detachment’ from DNA, together with the capacity to write, erase and read chromatin information through specific proteins, generates freedom for epigenetic information processing superimposed on the DNA-driven cis-regulatory processes (Benecke and Group, 2006; Fischle et al., 2003; Hall et al., 2002; Sedighi and Sengupta, 2007). In particular, writers, erasers and readers enable information exchange between different genomic regions and epigenetic information propagation over cell generations. The inducible chromatin state, from both the chemical and structural perspective, thereby serves as a large, flexible and resettable (Reik, 2007; Morgan et al., 2005) memory device. As emphasized by the work presented in Chapter 4, even down to the level of different modification states of individual histone residues, information is encoded in a non-redundant manner. Hence, postulated storage capacities of 70 bits per nucleosome in S. cerevisiae and around 200 bits in human (Prohaska et al., 2010), which do not even consider different states and context dependencies of modifications, seem conservative.
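A short calculation illustrates why such figures may be conservative. The sketch below assumes, purely for illustration, that the 70-bit figure corresponds to 70 independent two-state (present or absent) sites, and that a hypothetical 10 of these sites are lysines that can instead take four methylation states (unmethylated, mono-, di- or tri-methylated). Neither the independence assumption nor the count of 10 comes from the cited work.

```python
import math


def capacity_bits(states_per_site):
    """Upper bound on storage (in bits) for independent sites with the given state counts."""
    return sum(math.log2(k) for k in states_per_site)


# Assumption: 70 independent binary sites, matching the 70-bit yeast figure.
binary_only = [2] * 70
print(capacity_bits(binary_only))  # 70.0 bits
print(2 ** 70)                     # number of distinct nucleosome configurations

# Hypothetical: 10 of the 70 sites take four states instead of two.
with_methyl_states = [2] * 60 + [4] * 10
print(capacity_bits(with_methyl_states))  # 80.0 bits
```

Under these assumptions each four-state site contributes two bits rather than one, so distinguishing modification states raises the bound without adding any new sites. Context dependencies between marks would change the estimate further and are not modelled here.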
The ‘hybrid’ processing layer of epigenetics, which unites aspects of DNA sequence information, chromatin structure and RNA as well as protein activity, promises a wide range of computationally exploitable opportunities. However, much remains to be fundamentally elucidated from the biological side before we can start thinking of using chromatin for biocomputing. One of the central points research will have to address is the question of the ‘quality’ of connectivity between chromatin and other cellular layers. Up to this point, deriving general principles regarding the relations between different chromatin modifications, and between chromatin modifications and gene activity, that allow distinguishing actual causal links from mere associations has been a challenge (Lee et al., 2010). The exceptions that seem to exist for every derived rule may, however, be at least partially rooted in technical limitations.

As exemplified by the results presented in Chapter 4, the data suggest that multiple histone modifications (such as H2BK123ub and H3K4me3 or H3K79me3) co-occur on the same nucleosome. Yet neither ChIP-on-chip nor ChIP-seq profiling, both of which represent averages over entire, usually asynchronous cell populations, yields the resolution to further analyze and address such questions at the single-cell or single-nucleosome level. Carrier ChIP (CChIP) and MicroChIP currently represent the best approaches of this family, but still require on the order of 10^2–10^3 cells to generate a chromatin modification profile (O’Neill et al., 2006; Dahl and Collas, 2009), and genome-wide single-cell maps of modifications are technically still not feasible as of the writing of this dissertation. Even if we could already study chromatin on the single-cell level, the question of whether certain histone marks, such as methylation of different H3 residues as presented in Chapter 4, coexist on the same histone could still not be answered.
Single-molecule experiments are necessary to determine co-occurrence of modifications at high resolution. Recent work based on highly sensitive mass spectrometry generated progress in that direction (Young et al., 2010; Taverna et al., 2007). Yet all of the mentioned technologies capture only snapshots of the chromatin structure and fall short of addressing the dynamics and, thus, the sequential dependencies between chromatin modifications. While synchronizing or arresting cells, as done to derive the data analyzed in Section 4.3, might work for certain questions, ultimately techniques like CATCH-IT (covalent attachment of tags to capture histones and identify turnover) offer greater versatility for measuring the dynamic properties of histones (Deal et al., 2010).

The understanding gained on the ‘micro’ level of structural and dynamic properties of chromatin will have to go hand in hand with the unravelling of the spatial organization of this computational layer on the ‘macro’ level within the nucleus (Misteli, 2007). Advancements in live-cell imaging represent promising steps towards determining these conformational layouts (Zhao et al., 2006; Simonis et al., 2006; Dekker et al., 2002; Vassetzky et al., 2009). A more comprehensive appreciation of the spatial and temporal complexity of chromatin will also help to relate back to the processes located on its connecting layers, foremost DNA and the activity of genetic elements. The ease with which genetic manipulations can be performed in model organisms, in combination with suitable environmental conditions, will be crucial to dissect (context-dependent) causalities in these complex problems in cell biology. Hence, chromatin modification profiling complemented by transcriptome mapping under various conditions will greatly augment the ability to link observations and functional outcomes with respect to gene activity and non-coding RNA regulation.
The examples presented in Section 4.5 demonstrate the rich opportunities for scientific discovery that arise from this synergy. To take full advantage of these opportunities, the analysis of multi-dimensional data will be increasingly important and offer plenty of room for inter-laboratory and -disciplinary collaborations. Suitable analysis platforms supporting such en- deavours will be crucial along that path, and the Internet will play a key role as the web-based fusion of tools and data repositories steadily continues. This trend is best exemplified by the recent integration of one of the most advanced algo- rithms for genome-scale assembly of sequencing data (Ng et al., 2010) into the Galaxy environment. I too chose Galaxy to demonstrate in Chapter 3 how lo- cally running analysis tools can be easily migrated and made publicly available to broader audiences. Besides the development of smart new algorithms to man- age the data volume that is being generated through projects such as the Roadmap Epigenomics (Bernstein et al., 2010) and the large-scale integration of such re- sources into frameworks, simply increasing the exposure of computational people to biological and biomedical research—and vice versa—may be a relatively easy step to tackle the ‘informatics crisis’ (Goecks et al., 2010) in the ’omics era. Personally, I have greatly enjoyed this exposure over the past years while work- ing on the projects presented herein, and I feel benefited having the opportunities to straddle domains and explore the facets of information and life from both the biocomputing and bioinformatics perspective. I hope the results I was able to de- rive will contribute to the advancement in understanding information processing in living systems and inspire novel designs for biocomputing circuits. 129 Bibliography Adleman, L. M. (1994). Molecular computation of solutions to combinatorial problems. Science (New York, NY) 266, 1021–1024. → pages 5, 6 Agrawal, N., Dasaradhi, P. V. 
N., Mohmmed, A., Malhotra, P., Bhatnagar, R. K. and Mukherjee, S. K. (2003). RNA interference: biology, mechanism, and applications. Microbiology and molecular biology reviews : MMBR 67, 657–685. → pages 17, 19 Agresti, A. (1992). A Survey of Exact Inference for Contingency Tables. Statistical Science 7, 131–153. → pages 61 Ahn, S.-H., Cheung, W. L., Hsu, J.-Y., Diaz, R. L., Smith, M. M. and Allis, C. D. (2005). Sterile 20 kinase phosphorylates histone H2B at serine 10 during hydrogen peroxide-induced apoptosis in S. cerevisiae. Cell 120, 25–36. → pages 14 Albert, I., Mavrich, T. N., Tomsho, L. P., Qi, J., Zanton, S. J., Schuster, S. C. and Pugh, B. F. (2007). Translational and rotational settings of H2A.Z nucleosomes across the Saccharomyces cerevisiae genome. Nature 446, 572–576. → pages 106 Alberts, B., Johnson, A., Lewis, J., Raff, M., Roberts, K. and Walter, P. (2007). Molecular Biology of the Cell. 5 edition, Garland Science. → pages 1, 27 Alder, M. N., Dames, S., Gaudet, J. and Mango, S. E. (2003). Gene silencing in Caenorhabditis elegans by transitive RNA interference. RNA 9, 25–32. → pages 29 Allis, C. D. (2007). Epigenetics. CSHL Press. → pages 2, 14 An, C.-I., Trinh, V. B. and Yokobayashi, Y. (2006). Artificial control of gene expression in mammalian cells by modulating RNA interference through 130 aptamer-small molecule interaction. RNA (New York, NY) 12, 710–716. → pages 9, 19 Andrews, B. and Herskowitz, I. (1989). The yeast SWI4 protein contains a motif present in developmental regulators and is part of a complex involved in cell-cycle-dependent transcription. Nature 342, 830–833. → pages 75 Aoki, K., Moriguchi, H., Yoshioka, T., Okawa, K. and Tabara, H. (2007). In vitro analyses of the production and activity of secondary small interfering RNAs in C. elegans. The EMBO journal 26, 5007–5019. → pages 47 Aparicio, O., Geisberg, J. V. and Struhl, K. (2004). 
Chromatin immunoprecipitation for determining the association of proteins with specific genomic sequences in vivo. Current protocols in cell biology Chapter 17, Unit 17.7. → pages 50 Arents, G., Burlingame, R. W., Wang, B. C., Love, W. E. and Moudrianakis, E. N. (1991). The nucleosomal core histone octamer at 3.1 A resolution: a tripartite protein assembly and a left-handed superhelix. Proceedings of the National Academy of Sciences of the United States of America 88, 10148–10152. → pages 10 Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., Davis, A. P., Dolinski, K., Dwight, S. S., Eppig, J. T., Harris, M. A., Hill, D. P., Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese, J. C., Richardson, J. E., Ringwald, M., Rubin, G. M. and Sherlock, G. (2000). Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature genetics 25, 25–29. → pages 65 Ausio, J. and van Holde, K. E. (1986). Histone hyperacetylation: its effects on nucleosome conformation and stability. Biochemistry 25, 1421–1428. → pages 14 Babiskin, A. H. and Smolke, C. D. (2011). Engineering ligand-responsive RNA controllers in yeast through the assembly of RNase III tuning modules. Nucleic Acids Research, in press. → pages 8 Babloyantz, A. and Nicolis, G. (1972). Chemical instabilities and multiple steady state transitions in Monod-Jacob type models. Journal of theoretical biology 34, 185–192. → pages 6, 27 Badis, G., Chan, E. T., van Bakel, H., Pena-Castillo, L., Tillo, D., Tsui, K., Carlson, C. D., Gossett, A. J., Hasinoff, M. J., Warren, C. L., Gebbia, M., 131 Talukder, S., Yang, A., Mnaimneh, S., Terterov, D., Coburn, D., Li Yeo, A., Yeo, Z. X., Clarke, N. D., Lieb, J. D., Ansari, A. Z., Nislow, C. and Hughes, T. R. (2008). A library of yeast transcription factor motifs reveals a widespread function for Rsc3 in targeting nucleosome exclusion at promoters. Molecular cell 32, 878–887. → pages 63 Bai, J. and Perron, P. (1998). 
Estimating and Testing Linear Models with Multiple Structural Changes. Econometrica 66, 47–78. → pages 106 Bajic, V. B. and Tan, T. W. (2005). Information processing and living systems. Imperial College Pr. → pages 124, 125 Baker, P. G., Brass, A., Bechhofer, S., Goble, C., Paton, N. and Stevens, R. (1998). TAMBIS–Transparent Access to Multiple Bioinformatics Information Sources. International Conference on Intelligent Systems for Molecular Biology 6, 25–34. → pages 67 Bar-Ziv, R., Tlusty, T. and Libchaber, A. (2002). Protein-DNA computation by stochastic assembly cascade. Proceedings of the National Academy of Sciences of the United States of America 99, 11589–11592. → pages 8 Baron, R., Lioubashevski, O., Katz, E., Niazov, T. and Willner, I. (2006a). Logic gates and elementary computing by enzymes. The journal of physical chemistry. A 110, 8548–8553. → pages 8 Baron, R., Lioubashevski, O., Katz, E., Niazov, T. and Willner, I. (2006b). Elementary arithmetic operations by enzymes: a model for metabolic pathway based computing. Angewandte Chemie (International ed in English) 45, 1572–1576. → pages 8 Baulcombe, D. C. (2007). Molecular biology. Amplified silencing. Science (New York, NY) 315, 199–200. → pages 29 Bayer, T. S. and Smolke, C. D. (2005). Programmable ligand-controlled riboregulators of eukaryotic gene expression. Nature biotechnology 23, 337–343. → pages 8 Beisel, C. L., Bayer, T. S., Hoff, K. G. and Smolke, C. D. (2008). Model-guided design of ligand-regulated RNAi for programmable control of gene expression. Molecular systems biology 4, 224. → pages 9, 19, 125, 126 Beisel, C. L. and Smolke, C. D. (2009). Design principles for riboswitch function. PLoS computational biology 5, e1000363. → pages 8, 31 132 Benecke, A. and Group, S. E. (2006). Chromatin code, local non-equilibrium dynamics, and the emergence of transcription regulatory programs. The European physical journal. E, Soft matter 19, 353–366. → pages 127 Benenson, Y. (2009a). 
RNA-based computation in live cells. Current opinion in biotechnology 20, 471–478. → pages 8, 9, 31, 46, 122
Benenson, Y. (2009b). Biocomputers: from test tubes to live cells. Molecular bioSystems 5, 675–685. → pages 3, 6, 7, 8, 29, 124, 125
Benenson, Y., Gil, B., Ben-Dor, U., Adar, R. and Shapiro, E. (2004). An autonomous molecular computer for logical control of gene expression. Nature 429, 423–429. → pages 7
Berger, S. L. (2007). The complex language of chromatin regulation during transcription. Nature 447, 407–412. → pages 14
Bernstein, B. E., Humphrey, E. L., Erlich, R. L., Schneider, R., Bouman, P., Liu, J. S., Kouzarides, T. and Schreiber, S. L. (2002). Methylation of histone H3 Lys 4 in coding regions of active genes. Proceedings of the National Academy of Sciences of the United States of America 99, 8695–8700. → pages 74
Bernstein, B. E., Stamatoyannopoulos, J. A., Costello, J. F., Ren, B., Milosavljevic, A., Meissner, A., Kellis, M., Marra, M. A., Beaudet, A. L., Ecker, J. R., Farnham, P. J., Hirst, M., Lander, E. S., Mikkelsen, T. S. and Thomson, J. A. (2010). The NIH Roadmap Epigenomics Mapping Consortium. Nature biotechnology 28, 1045–1048. → pages 129
Bernstein, E. and Vazirani, U. (1997). Quantum Complexity Theory. SIAM Journal on Computing 26, 1411–1473. → pages 4
Bertone, P., Gerstein, M. and Snyder, M. (2005). Applications of DNA tiling arrays to experimental genome annotation and regulatory pathway discovery. Chromosome research : an international journal on the molecular, supramolecular and evolutionary aspects of chromosome biology 13, 259–274. → pages 54
Bertone, P., Stolc, V., Royce, T. E., Rozowsky, J. S., Urban, A. E., Zhu, X., Rinn, J. L., Tongprasit, W., Samanta, M., Weissman, S., Gerstein, M. and Snyder, M. (2004). Global identification of human transcribed sequences with genome tiling arrays. Science (New York, NY) 306, 2242–2246. → pages 57
Bertone, P., Trifonov, V., Rozowsky, J.
S., Schubert, F., Emanuelsson, O., Karro, J., Kao, M.-Y., Snyder, M. and Gerstein, M. (2006). Design optimization methods for genomic DNA tiling arrays. Genome research 16, 271–281. → pages 53
Bhaumik, S. R., Smith, E. and Shilatifard, A. (2007). Covalent modifications of histones during development and disease pathogenesis. Nature structural & molecular biology 14, 1008–1016. → pages 13, 73
Bird, A. (2007). Perceptions of epigenetics. Nature 447, 396–398. → pages 11, 28
Bitko, V. and Barik, S. (2001). Phenotypic silencing of cytoplasmic genes using sequence-specific double-stranded short interfering RNA and its application in the reverse genetics of wild type negative-strand RNA viruses. BMC microbiology 1, 34. → pages 19, 29
Blossey, R. and Schiessel, H. (2008). Kinetic proofreading of gene activation by chromatin remodeling. HFSP journal 2, 167–170. → pages 78
Bohmert, K., Camus, I., Bellini, C., Bouchez, D., Caboche, M. and Benning, C. (1998). AGO1 defines a novel locus of Arabidopsis controlling leaf development. The EMBO journal 17, 170–180. → pages 28
Bolstad, B. M., Irizarry, R. A., Astrand, M. and Speed, T. P. (2003). A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics (Oxford, England) 19, 185–193. → pages 55, 57
Bonasio, R., Tu, S. and Reinberg, D. (2010). Molecular signals of epigenetic states. Science (New York, NY) 330, 612–616. → pages 12, 28
Bonn, S. and Furlong, E. E. M. (2008). cis-Regulatory networks during development: a view of Drosophila. Current opinion in genetics & development 18, 513–520. → pages 2, 9, 121
Boutros, M., Kiger, A. A., Armknecht, S., Kerr, K., Hild, M., Koch, B., Haas, S. A., Paro, R., Perrimon, N. and Heidelberg Fly Array Consortium (2004). Genome-wide RNAi analysis of growth and viability in Drosophila cells. Science (New York, NY) 303, 832–835. → pages 19, 29
Braich, R. S., Chelyapov, N., Johnson, C., Rothemund, P. W. K. and Adleman, L. (2002).
Solution of a 20-variable 3-SAT problem on a DNA computer. Science (New York, NY) 296, 499–502. → pages 4
Breeden, L. (1996). Start-specific transcription in yeast. Current topics in microbiology and immunology 208, 95–127. → pages 75
Briggs, S. D., Bryk, M., Strahl, B. D., Cheung, W. L., Davie, J. K., Dent, S. Y., Winston, F. and Allis, C. D. (2001). Histone H3 lysine 4 methylation is mediated by Set1 and required for cell growth and rDNA silencing in Saccharomyces cerevisiae. Genes & development 15, 3286–3295. → pages 74
Briggs, S. D., Xiao, T., Sun, Z.-W., Caldwell, J. A., Shabanowitz, J., Hunt, D. F., Allis, C. D. and Strahl, B. D. (2002). Gene silencing: trans-histone regulatory pathway in chromatin. Nature 418, 498. → pages 76, 114
Brockdorff, N., Ashworth, A., Kay, G. F., McCabe, V. M., Norris, D. P., Cooper, P. J., Swift, S. and Rastan, S. (1992). The product of the mouse Xist gene is a 15 kb inactive X-specific transcript containing no conserved ORF and located in the nucleus. Cell 71, 515–526. → pages 21
Brown, C. J., Hendrich, B. D., Rupert, J. L., Lafrenière, R. G., Xing, Y., Lawrence, J. and Willard, H. F. (1992). The human XIST gene: analysis of a 17 kb inactive X-specific RNA that contains conserved repeats and is highly localized within the nucleus. Cell 71, 527–542. → pages 21
Brownell, J. E., Zhou, J., Ranalli, T., Kobayashi, R., Edmondson, D. G., Roth, S. Y. and Allis, C. D. (1996). Tetrahymena histone acetyltransferase A: a homolog to yeast Gcn5p linking histone acetylation to gene activation. Cell 84, 843–851. → pages 13
Bryant, G. O., Prabhu, V., Floer, M., Wang, X., Spagna, D., Schreiber, D. and Ptashne, M. (2008). Activator control of nucleosome occupancy in activation and repression of transcription. PLoS biology 6, 2928–2939. → pages 12
Cairns, B. R. (2007). Chromatin remodeling: insights and intrigue from single-molecule studies. Nature structural & molecular biology 14, 989–996. → pages 13
Canton, B., Labno, A. and Endy, D. (2008).
Refinement and standardization of synthetic biological parts and devices. Nature biotechnology 26, 787–793. → pages 7
Carrozza, M. J., Li, B., Florens, L., Suganuma, T., Swanson, S. K., Lee, K. K., Shia, W.-J., Anderson, S., Yates, J., Washburn, M. P. and Workman, J. L. (2005). Histone H3 methylation by Set2 directs deacetylation of coding regions by Rpd3S to suppress spurious intragenic transcription. Cell 123, 581–592. → pages 78, 105, 118, 119
Cawley, S., Bekiranov, S., Ng, H. H., Kapranov, P., Sekinger, E. A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J., Williams, A. J., Wheeler, R., Wong, B., Drenkow, J., Yamanaka, M., Patel, S., Brubaker, S., Tammana, H., Helt, G., Struhl, K. and Gingeras, T. R. (2004). Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 116, 499–509. → pages 54
Cherry, J. M., Adler, C., Ball, C., Chervitz, S. A., Dwight, S. S., Hester, E. T., Jia, Y., Juvik, G., Roe, T., Schroeder, M., Weng, S. and Botstein, D. (1998). SGD: Saccharomyces Genome Database. Nucleic acids research 26, 73–79. → pages 61
Cheung, V., Chua, G., Batada, N. N., Landry, C. R., Michnick, S. W., Hughes, T. R. and Winston, F. (2008). Chromatin- and transcription-related factors repress transcription from within coding regions throughout the Saccharomyces cerevisiae genome. PLoS biology 6, e277. → pages 78, 105, 118
Choe, S. E., Boutros, M., Michelson, A. M., Church, G. M. and Halfon, M. S. (2005). Preferred analysis methods for Affymetrix GeneChips revealed by a wholly defined control dataset. Genome biology 6, R16. → pages 55
Clapier, C. R. and Cairns, B. R. (2009). The biology of chromatin remodeling complexes. Annual review of biochemistry 78, 273–304. → pages 13
Claverie, J.-M. (2005). Fewer genes, more noncoding RNA. Science (New York, NY) 309, 1529–1530. → pages 54
Clemson, C. M., McNeil, J. A., Willard, H. F. and Lawrence, J. B. (1996).
XIST RNA paints the inactive X chromosome at interphase: evidence for a novel RNA involved in nuclear/chromosome structure. The Journal of cell biology 132, 259–275. → pages 21
Cokus, S. J., Feng, S., Zhang, X., Chen, Z., Merriman, B., Haudenschild, C. D., Pradhan, S., Nelson, S. F., Pellegrini, M. and Jacobsen, S. E. (2008). Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature 452, 215–219. → pages 21
Collas, P. (2010). The current state of chromatin immunoprecipitation. Molecular biotechnology 45, 87–100. → pages 49
Condon, A. (2006). Designed DNA molecules: principles and applications of molecular nanotechnology. Nature reviews Genetics 7, 565–575. → pages 6
Condon, A., Harel, D. and Kok, J. N. (2009). Algorithmic Bioprocesses. Springer-Verlag New York Inc. → pages 121
Conrad, M. (1983). Microscopic-macroscopic interface in biological information processing. Bio Systems 16, 345–363. → pages 124
Cooper, G. M. and Hausman, R. E. (2009). The Cell: A Molecular Approach, Fifth Edition. Sinauer Associates Inc. → pages 1, 27
Cox, D. N., Chao, A., Baker, J., Chang, L., Qiao, D. and Lin, H. (1998). A novel class of evolutionarily conserved genes defined by piwi are essential for stem cell self-renewal. Genes & development 12, 3715–3727. → pages 28
Crick, F. (1970). Central dogma of molecular biology. Nature 227, 561–563. → pages 8
Crick, F. H., Barnett, L., Brenner, S. and Watts-Tobin, R. J. (1961). General nature of the genetic code for proteins. Nature 192, 1227–1232. → pages 6
Culler, S. J., Hoff, K. G. and Smolke, C. D. (2010). Reprogramming cellular behavior with RNA controllers responsive to endogenous proteins. Science (New York, NY) 330, 1251–1255. → pages 19, 29, 31
Dahl, J. A. and Collas, P. (2009). MicroChIP: chromatin immunoprecipitation for small cell numbers. Methods in molecular biology (Clifton, NJ) 567, 59–74. → pages 128
D’Andrea, A. D. (2010).
Susceptibility pathways in Fanconi’s anemia and breast cancer. The New England journal of medicine 362, 1909–1919. → pages 118
Daniel, J. A., Torok, M. S., Sun, Z.-W., Schieltz, D., Allis, C. D., Yates, J. R. and Grant, P. A. (2004). Deubiquitination of histone H2B by a yeast acetyltransferase complex regulates transcription. The Journal of biological chemistry 279, 1867–1871. → pages 77, 104, 117
David, L., Huber, W., Granovskaia, M., Toedling, J., Palm, C. J., Bofkin, L., Jones, T., Davis, R. W. and Steinmetz, L. M. (2006). A high-resolution map of transcription in the yeast genome. Proceedings of the National Academy of Sciences of the United States of America 103, 5320–5325. → pages 54, 109
Davidson, E. A. and Ellington, A. D. (2007). Synthetic RNA circuits. Nature chemical biology 3, 23–28. → pages 9, 31
Davidson, S. B., Overton, C. and Buneman, P. (1995). Challenges in integrating biological data sources. Journal of computational biology : a journal of computational molecular cell biology 2, 557–572. → pages 67
Deal, R. B., Henikoff, J. G. and Henikoff, S. (2010). Genome-wide kinetics of nucleosome turnover determined by metabolic labeling of histones. Science (New York, NY) 328, 1161–1164. → pages 128
Deans, T. L., Cantor, C. R. and Collins, J. J. (2007). A tunable genetic switch based on RNAi and repressor proteins for regulating gene expression in mammalian cells. Cell 130, 363–372. → pages 125
Dekker, J., Rippe, K., Dekker, M. and Kleckner, N. (2002). Capturing chromosome conformation. Science (New York, NY) 295, 1306–1311. → pages 128
Ding, S.-W. and Voinnet, O. (2007). Antiviral immunity directed by small RNAs. Cell 130, 413–426. → pages 17
DiVincenzo, D. P. (1995). Quantum Computation. Science (New York, NY) 270, 255–261. → pages 4
Douglas, S. M., Dietz, H., Liedl, T., Högberg, B., Graf, F. and Shih, W. M. (2009). Self-assembly of DNA into nanoscale three-dimensional shapes. Nature 459, 414–418.
→ pages 6
Dover, J., Schneider, J., Tawiah-Boateng, M., Wood, A., Dean, K., Johnston, M. and Shilatifard, A. (2002). Methylation of histone H3 by COMPASS requires ubiquitination of histone H2B by Rad6. The Journal of biological chemistry 277, 28368–28371. → pages 76, 114
Droit, A., Cheung, C. and Gottardo, R. (2010). rMAT–an R/Bioconductor package for analyzing ChIP-chip experiments. Bioinformatics (Oxford, England) 26, 678–679. → pages 55, 56, 79
Dulac, C. (2010). Brain function and chromatin plasticity. Nature 465, 728–735. → pages 12
Dunny, G. M. and Winans, S. C. (1999). Cell-cell signaling in bacteria. Amer Society for Microbiology. → pages 6, 27
Dynan, W. S. (1989). Modularity in promoters and enhancers. Cell 58, 1–4. → pages 27
Ecker, J. R. and Davis, R. W. (1986). Inhibition of gene expression in plant cells by expression of antisense RNA. Proceedings of the National Academy of Sciences of the United States of America 83, 5372–5376. → pages 17
Ellington, A. D. and Szostak, J. W. (1990). In vitro selection of RNA molecules that bind specific ligands. Nature 346, 818–822. → pages 9
Elowitz, M. B. and Leibler, S. (2000). A synthetic oscillatory network of transcriptional regulators. Nature 403, 335–338. → pages 7
Emre, N. C. T., Ingvarsdottir, K., Wyce, A., Wood, A., Krogan, N. J., Henry, K. W., Li, K., Marmorstein, R., Greenblatt, J. F., Shilatifard, A. and Berger, S. L. (2005). Maintenance of low histone ubiquitylation by Ubp10 correlates with telomere-proximal Sir2 association and gene silencing. Molecular cell 17, 585–594. → pages 77, 100, 116, 117
Ezziane, Z. (2006). DNA computing: applications and challenges. Nanotechnology 17, R27. → pages 7
Faulhammer, D., Cukras, A. R., Lipton, R. J. and Landweber, L. F. (2000). Molecular computation: RNA solutions to chess problems. Proceedings of the National Academy of Sciences of the United States of America 97, 1385–1389. → pages 8
Feinberg, A. P. (2010).
Genome-scale approaches to the epigenetics of common human disease. Virchows Archiv : an international journal of pathology 456, 13–21. → pages 15
Feynman, R. (1959). Plenty of room at the bottom. In Presentation to American Physical Society. → pages 5
Filion, G. J., van Bemmel, J. G., Braunschweig, U., Talhout, W., Kind, J., Ward, L. D., Brugman, W., de Castro, I. J., Kerkhoven, R. M., Bussemaker, H. J. and van Steensel, B. (2010). Systematic protein location mapping reveals five principal chromatin types in Drosophila cells. Cell 143, 212–224. → pages 12
Fire, A., Xu, S., Montgomery, M. K., Kostas, S. A., Driver, S. E. and Mello, C. C. (1998). Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans. Nature 391, 806–811. → pages 9, 16, 17
Fischle, W., Wang, Y. and Allis, C. D. (2003). Binary switches and modification cassettes in histone biology and beyond. Nature 425, 475–479. → pages 127
Fisher, R. A. (1922). On the interpretation of χ² from contingency tables, and the calculation of P. Journal of the Royal Statistical Society 85, 87–94. → pages 61
Flemming, W. (1882). Zellsubstanz, Kern und Zelltheilung. F C W Vogel, Leipzig. → pages 12
Fraser, P. (2006). Transcriptional control thrown for a loop. Current opinion in genetics & development 16, 490–495. → pages 10
Frederiks, F., Tzouros, M., Oudgenoeg, G., van Welsem, T., Fornerod, M., Krijgsveld, J. and van Leeuwen, F. (2008). Nonprocessive methylation by Dot1 leads to functional redundancy of histone H3K79 methylation states. Nature structural & molecular biology 15, 550–557. → pages 79, 112
Fredkin, E. and Toffoli, T. (1982). Conservative logic. International Journal of Theoretical Physics 21, 219–253. → pages 4
Fu, P. (2007). Biomolecular computing: is it ready to take off? Biotechnology journal 2, 91–101. → pages 5
Fujita, P. A., Rhead, B., Zweig, A. S., Hinrichs, A. S., Karolchik, D., Cline, M. S., Goldman, M., Barber, G.
P., Clawson, H., Coelho, A., Diekhans, M., Dreszer, T. R., Giardine, B. M., Harte, R. A., Hillman-Jackson, J., Hsu, F., Kirkup, V., Kuhn, R. M., Learned, K., Li, C. H., Meyer, L. R., Pohl, A., Raney, B. J., Rosenbloom, K. R., Smith, K. E., Haussler, D. and Kent, W. J. (2011). The UCSC Genome Browser database: update 2011. Nucleic acids research 39, D876–82. → pages 62, 68, 69
Gao, Z., Liu, H.-L., Daxinger, L., Pontes, O., He, X., Qian, W., Lin, H., Xie, M., Lorkovic, Z. J., Zhang, S., Miki, D., Zhan, X., Pontier, D., Lagrange, T., Jin, H., Matzke, A. J. M., Matzke, M., Pikaard, C. S. and Zhu, J.-K. (2010). An RNA polymerase II- and AGO4-associated protein acts in RNA-directed DNA methylation. Nature 465, 106–109. → pages 21
Gardner, R. G., Nelson, Z. W. and Gottschling, D. E. (2005). Ubp10/Dot4p regulates the persistence of ubiquitinated histone H2B: distinct roles in telomeric silencing and general chromatin. Molecular and cellular biology 25, 6123–6139. → pages 77, 100, 104, 105, 116, 117
Gardner, T. S., Cantor, C. R. and Collins, J. J. (2000). Construction of a genetic toggle switch in Escherichia coli. Nature 403, 339–342. → pages 7
Gehlenborg, N., O’Donoghue, S. I., Baliga, N. S., Goesmann, A., Hibbs, M. A., Kitano, H., Kohlbacher, O., Neuweger, H., Schneider, R., Tenenbaum, D. and Gavin, A.-C. (2010). Visualization of omics data for systems biology. Nature methods 7, S56–68. → pages 62
Gelfand, B., Mead, J., Bruning, A., Apostolopoulos, N., Tadigotla, V., Nagaraj, V., Sengupta, A. M. and Vershon, A. K. (2011). Regulated antisense transcription controls expression of cell-type-specific genes in yeast. Molecular and cellular biology 31, 1701–1709. → pages 109
Gentleman, R. C., Carey, V. J., Bates, D. M., Bolstad, B., Dettling, M., Dudoit, S., Ellis, B., Gautier, L., Ge, Y., Gentry, J., Hornik, K., Hothorn, T., Huber, W., Iacus, S., Irizarry, R., Leisch, F., Li, C., Maechler, M., Rossini, A. J., Sawitzki, G., Smith, C., Smyth, G., Tierney, L., Yang, J. Y. H.
and Zhang, J. (2004). Bioconductor: open software development for computational biology and bioinformatics. Genome biology 5, R80. → pages 57, 58
Gerber, M. and Shilatifard, A. (2003). Transcriptional elongation by RNA polymerase II and histone methylation. The Journal of biological chemistry 278, 26303–26306. → pages 115
Goecks, J., Nekrutenko, A., Taylor, J. and Galaxy Team (2010). Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome biology 11, R86. → pages 68, 69, 129
Goldberg, A. D., Allis, C. D. and Bernstein, E. (2007). Epigenetics: a landscape takes shape. Cell 128, 635–638. → pages 11
Greber, D., El-Baba, M. D. and Fussenegger, M. (2008). Intronically encoded siRNAs improve dynamic range of mammalian gene regulation systems and toggle switch. Nucleic acids research 36, e101. → pages 125
Gregory, R. I., Chendrimada, T. P., Cooch, N. and Shiekhattar, R. (2005). Human RISC couples microRNA biogenesis and posttranscriptional gene silencing. Cell 123, 631–640. → pages 29
Gregory, R. I., Chendrimada, T. P. and Shiekhattar, R. (2006). MicroRNA biogenesis: isolation and characterization of the microprocessor complex. Methods in molecular biology (Clifton, NJ) 342, 33–47. → pages 17
Grewal, A., Lambert, P. and Stockton, J. (2007). Analysis of expression data: an overview. Current protocols in bioinformatics Chapter 7, Unit 7.1. → pages 55
Grewal, S. I. S. and Elgin, S. C. R. (2007). Transcription and RNA interference in the formation of heterochromatin. Nature 447, 399–406. → pages 20
Grewal, S. I. S. and Rice, J. C. (2004). Regulation of heterochromatin by histone methylation and small RNAs. Current opinion in cell biology 16, 230–238. → pages 20
Bio FAB Group, Baker, D., Church, G., Collins, J., Endy, D., Jacobson, J., Keasling, J., Modrich, P., Smolke, C. and Weiss, R. (2006). Engineering life: building a fab for biology. Scientific American 294, 44–51.
→ pages 7
Guillemette, B., Bataille, A., Gevry, N., Adam, M., Blanchette, M., Robert, F. and Gaudreau, L. (2005). Variant histone H2A.Z is globally localized to the promoters of inactive yeast genes and regulates nucleosome positioning. PLoS biology 3, e384. → pages 106
Guillemette, B. and Gaudreau, L. (2006). Reuniting the contrasting functions of H2A.Z. Biochemistry and cell biology = Biochimie et biologie cellulaire 84, 528–535. → pages 78
Guttman, M., Amit, I., Garber, M., French, C., Lin, M. F., Feldser, D., Huarte, M., Zuk, O., Carey, B. W., Cassady, J. P., Cabili, M. N., Jaenisch, R., Mikkelsen, T. S., Jacks, T., Hacohen, N., Bernstein, B. E., Kellis, M., Regev, A., Rinn, J. L. and Lander, E. S. (2009). Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 458, 223–227. → pages 54
Hall, I. M., Shankaranarayana, G. D., Noma, K.-I., Ayoub, N., Cohen, A. and Grewal, S. I. S. (2002). Establishment and maintenance of a heterochromatin domain. Science (New York, NY) 297, 2232–2237. → pages 20, 28, 127
Harbison, C., Gordon, D., Lee, T., Rinaldi, N., Macisaac, K., Danford, T., Hannett, N., Tagne, J., Reynolds, D., Yoo, J., Jennings, E., Zeitlinger, J., Pokholok, D., Kellis, M., Rolfe, P., Takusagawa, K., Lander, E., Gifford, D., Fraenkel, E. and Young, R. (2004). Transcriptional regulatory code of a eukaryotic genome. Nature 431, 99–104. → pages 75, 88
Hardy, S., Jacques, P.-E., Gévry, N., Forest, A., Fortin, M.-E., Laflamme, L., Gaudreau, L. and Robert, F. (2009). The euchromatic and heterochromatic landscapes are shaped by antagonizing effects of transcription on H2A.Z deposition. PLoS genetics 5, e1000687. → pages 119
Havilio, M., Levanon, E. Y., Lerman, G., Kupiec, M. and Eisenberg, E. (2005). Evidence for abundant transcription of non-coding regions in the Saccharomyces cerevisiae genome. BMC genomics 6, 93. → pages 109
He, Y., Vogelstein, B., Velculescu, V. E., Papadopoulos, N. and Kinzler, K. W. (2008).
The antisense transcriptomes of human cells. Science (New York, NY) 322, 1855–1857. → pages 54
Henikoff, S. and Grosveld, F. (2008). Welcome to epigenetics & chromatin. Epigenetics & chromatin 1, 1. → pages 16
Henkin, T. M. (2008). Riboswitch RNAs: using RNA to sense cellular metabolism. Genes & development 22, 3383–3390. → pages 9
Henry, K. W., Wyce, A., Lo, W.-S., Duggan, L. J., Emre, N. C. T., Kao, C.-F., Pillus, L., Shilatifard, A., Osley, M. A. and Berger, S. L. (2003). Transcriptional activation via sequential histone H2B ubiquitylation and deubiquitylation, mediated by SAGA-associated Ubp8. Genes & development 17, 2648–2663. → pages 77, 117
Hillen, W. and Berens, C. (1994). Mechanisms underlying expression of Tn10 encoded tetracycline resistance. Annual review of microbiology 48, 345–369. → pages 6, 27
Hinrichs, W., Kisker, C., Düvel, M., Müller, A., Tovar, K., Hillen, W. and Saenger, W. (1994). Structure of the Tet repressor-tetracycline complex and regulation of antibiotic resistance. Science (New York, NY) 264, 418–420. → pages 6, 27
Hjelmfelt, A. and Ross, J. (1992). Chemical implementation and thermodynamics of collective neural networks. Proceedings of the National Academy of Sciences of the United States of America 89, 388–391. → pages 5
Hjelmfelt, A., Weinberger, E. D. and Ross, J. (1991). Chemical implementation of neural networks and Turing machines. Proceedings of the National Academy of Sciences of the United States of America 88, 10983–10987. → pages 5
Hjelmfelt, A., Weinberger, E. D. and Ross, J. (1992). Chemical implementation of finite-state machines. Proceedings of the National Academy of Sciences of the United States of America 89, 383–387. → pages 5
Hochstrasser, M. (1996). Ubiquitin-dependent protein degradation. Annual review of genetics 30, 405–439. → pages 76
Hogan, G., Lee, C. and Lieb, J. (2006). Cell cycle-specified fluctuation of nucleosome occupancy at gene promoters. PLoS genetics 2, e158.
→ pages 75, 86
Holstege, F., Jennings, E., Wyrick, J., Lee, T., Hengartner, C., Green, M., Golub, T., Lander, E. and Young, R. (1998). Dissecting the regulatory circuitry of a eukaryotic genome. Cell 95, 717–728. → pages 66, 82, 83, 93, 95
Huber, W., Toedling, J. and Steinmetz, L. M. (2006). Transcript mapping with high-density oligonucleotide tiling arrays. Bioinformatics (Oxford, England) 22, 1963–1970. → pages 57, 66, 105
Hunter, C. P., Winston, W. M., Molodowitch, C., Feinberg, E. H., Shih, J., Sutherlin, M., Wright, A. J. and Fitzgerald, M. C. (2006). Systemic RNAi in Caenorhabditis elegans. Cold Spring Harbor symposia on quantitative biology 71, 95–100. → pages 46
Hwang, W., Venkatasubrahmanyam, S., Ianculescu, A., Tong, A., Boone, C. and Madhani, H. (2003). A conserved RING finger protein required for histone H2B monoubiquitination and cell size control. Molecular cell 11, 261–266. → pages 76
Ingvarsdottir, K., Krogan, N. J., Emre, N. C. T., Wyce, A., Thompson, N. J., Emili, A., Hughes, T. R., Greenblatt, J. F. and Berger, S. L. (2005). H2B ubiquitin protease Ubp8 and Sgf11 constitute a discrete functional module within the Saccharomyces cerevisiae SAGA complex. Molecular and cellular biology 25, 1162–1172. → pages 77, 118
Isaacs, F. J., Dwyer, D. J. and Collins, J. J. (2006). RNA synthetic biology. Nature biotechnology 24, 545–554. → pages 9
Isaacs, F. J., Dwyer, D. J., Ding, C., Pervouchine, D. D., Cantor, C. R. and Collins, J. J. (2004). Engineered riboregulators enable post-transcriptional control of gene expression. Nature biotechnology 22, 841–847. → pages 8
Istrail, S., De-Leon, S. B.-T. and Davidson, E. H. (2007). The regulatory genome and the computer. Developmental biology 310, 187–195. → pages 2, 9, 121
Iyer, V., Horak, C., Scafe, C., Botstein, D., Snyder, M. and Brown, P. (2001). Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF. Nature 409, 533–538. → pages 75, 88, 89
Jacinto, F. V., Ballestar, E.
and Esteller, M. (2008). Methyl-DNA immunoprecipitation (MeDIP): hunting down the DNA methylome. BioTechniques 44, 35, 37, 39 passim. → pages 70
Jain, K. and Pratt, G. W., Jr. (1976). Optical transistor. Applied Physics Letters 28, 719–721. → pages 4
Jaynes, E. T. (1957a). Information Theory and Statistical Mechanics. The Physical Review 106, 620–630. → pages 121
Jaynes, E. T. (1957b). Information Theory and Statistical Mechanics. II. The Physical Review 108, 171–190. → pages 121
Jeltsch, A. and Rathert, P. (2008). Putting the pieces together: histone H2B ubiquitylation directly stimulates histone H3K79 methylation. Chembiochem : a European journal of chemical biology 9, 2193–2195. → pages 113, 115
Jenuwein, T. and Allis, C. D. (2001). Translating the histone code. Science (New York, NY) 293, 1074–1080. → pages 15
Jiang, C. and Pugh, B. F. (2009). Nucleosome positioning and gene regulation: advances through genomics. Nature reviews Genetics 10, 161–172. → pages 127
Johnson, D. S., Li, W., Gordon, D. B., Bhattacharjee, A., Curry, B., Ghosh, J., Brizuela, L., Carroll, J. S., Brown, M., Flicek, P., Koch, C. M., Dunham, I., Bieda, M., Xu, X., Farnham, P. J., Kapranov, P., Nix, D. A., Gingeras, T. R., Zhang, X., Holster, H., Jiang, N., Green, R. D., Song, J. S., McCuine, S. A., Anton, E., Nguyen, L., Trinklein, N. D., Ye, Z., Ching, K., Hawkins, D., Ren, B., Scacheri, P. C., Rozowsky, J., Karpikov, A., Euskirchen, G., Weissman, S., Gerstein, M., Snyder, M., Yang, A., Moqtaderi, Z., Hirsch, H., Shulha, H. P., Fu, Y., Weng, Z., Struhl, K., Myers, R. M., Lieb, J. D. and Liu, X. S. (2008). Systematic evaluation of variability in ChIP-chip experiments using predefined DNA targets. Genome research 18, 393–403. → pages 55
Johnson, W., Li, W., Meyer, C., Gottardo, R., Carroll, J., Brown, M. and Liu, X. (2006). Model-based analysis of tiling-arrays for ChIP-chip. Proceedings of the National Academy of Sciences of the United States of America 103, 12457–12462. → pages 55, 56, 79
Jones, L., Ratcliff, F. and Baulcombe, D. C. (2001).
RNA-directed transcriptional gene silencing in plants can be inherited independently of the RNA trigger and requires Met1 for maintenance. Current biology : CB 11, 747–757. → pages 20
Joshi, A. A. and Struhl, K. (2005). Eaf3 chromodomain interaction with methylated H3-K36 links histone deacetylation to Pol II elongation. Molecular cell 20, 971–978. → pages 78, 105, 118
Judy, J. T. and Ji, H. (2009). TileProbe: modeling tiling array probe effects using publicly available data. Bioinformatics (Oxford, England) 25, 2369–2375. → pages 55
Kahana, A. and Gottschling, D. E. (1999). DOT4 links silencing and cell growth in Saccharomyces cerevisiae. Molecular and cellular biology 19, 6608–6620. → pages 77
Kahn, S. D. (2011). On the future of genomic data. Science (New York, NY) 331, 728–729. → pages 71
Kamath, R. S. and Ahringer, J. (2003). Genome-wide RNAi screening in Caenorhabditis elegans. Methods (San Diego, Calif) 30, 313–321. → pages 19, 29
Kampa, D., Cheng, J., Kapranov, P., Yamanaka, M., Brubaker, S., Cawley, S., Drenkow, J., Piccolboni, A., Bekiranov, S., Helt, G., Tammana, H. and Gingeras, T. R. (2004). Novel RNAs identified from an in-depth analysis of the transcriptome of human chromosomes 21 and 22. Genome research 14, 331–342. → pages 57
Kaplan, C. D., Laprade, L. and Winston, F. (2003). Transcription elongation factors repress transcription initiation from cryptic sites. Science (New York, NY) 301, 1096–1099. → pages 78, 105, 118
Kaplan, T., Liu, C., Erkmann, J., Holik, J., Grunstein, M., Kaufman, P., Friedman, N. and Rando, O. (2008). Cell cycle- and chaperone-mediated regulation of H3K56ac incorporation in yeast. PLoS genetics 4, e1000270. → pages 75
Kapranov, P., Cawley, S. E., Drenkow, J., Bekiranov, S., Strausberg, R. L., Fodor, S. P. A. and Gingeras, T. R. (2002). Large-scale transcriptional activity in chromosomes 21 and 22. Science (New York, NY) 296, 916–919. → pages 54
Kapranov, P., Cheng, J., Dike, S., Nix, D. A., Duttagupta, R., Willingham, A.
T., Stadler, P. F., Hertel, J., Hackermüller, J., Hofacker, I. L., Bell, I., Cheung, E., Drenkow, J., Dumais, E., Patel, S., Helt, G., Ganesh, M., Ghosh, S., Piccolboni, A., Sementchenko, V., Tammana, H. and Gingeras, T. R. (2007). RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science (New York, NY) 316, 1484–1488. → pages 54
Kapranov, P., Sementchenko, V. I. and Gingeras, T. R. (2003). Beyond expression profiling: next generation uses of high density oligonucleotide arrays. Briefings in functional genomics & proteomics 2, 47–56. → pages 53, 54
Kapranov, P., Willingham, A. T. and Gingeras, T. R. (2007). Genome-wide transcription and the implications for genomic organization. Nature reviews Genetics 8, 413–423. → pages 49, 54
Katayama, S., Tomaru, Y., Kasukawa, T., Waki, K., Nakanishi, M., Nakamura, M., Nishida, H., Yap, C. C., Suzuki, M., Kawai, J., Suzuki, H., Carninci, P., Hayashizaki, Y., Wells, C., Frith, M., Ravasi, T., Pang, K. C., Hallinan, J., Mattick, J., Hume, D. A., Lipovich, L., Batalov, S., Engström, P. G., Mizuno, Y., Faghihi, M. A., Sandelin, A., Chalk, A. M., Mottagui-Tabar, S., Liang, Z., Lenhard, B., Wahlestedt, C., RIKEN Genome Exploration Research Group, Genome Science Group (Genome Network Project Core Group) and FANTOM Consortium (2005). Antisense transcription in the mammalian transcriptome. Science (New York, NY) 309, 1564–1566. → pages 54
Ketting, R. F. (2011). The many faces of RNAi. Developmental cell 20, 148–161. → pages 3
Khalil, A. M., Guttman, M., Huarte, M., Garber, M., Raj, A., Rivea Morales, D., Thomas, K., Presser, A., Bernstein, B. E., van Oudenaarden, A., Regev, A., Lander, E. S. and Rinn, J. L. (2009). Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proceedings of the National Academy of Sciences of the United States of America 106, 11667–11672. → pages 54
Kimmel, A. and Oliver, B. (2006). DNA microarrays: Array platforms and wet-bench protocols. Methods in enzymology, Elsevier/Academic Press.
→ pages 52
Kobor, M., Venkatasubrahmanyam, S., Meneghini, M., Gin, J., Jennings, J., Link, A., Madhani, H. and Rine, J. (2004). A protein complex containing the conserved Swi2/Snf2-related ATPase Swr1p deposits histone variant H2A.Z into euchromatin. PLoS biology 2, E131. → pages 77, 107, 119
Köhler, A., Pascual-García, P., Llopis, A., Zapater, M., Posas, F., Hurt, E. and Rodríguez-Navarro, S. (2006). The mRNA export factor Sus1 is involved in Spt/Ada/Gcn5 acetyltransferase-mediated H2B deubiquitinylation through its interaction with Ubp8 and Sgf11. Molecular biology of the cell 17, 4228–4236. → pages 117
Köhler, A., Zimmerman, E., Schneider, M., Hurt, E. and Zheng, N. (2010). Structural basis for assembly and activation of the heterotetrameric SAGA histone H2B deubiquitinase module. Cell 141, 606–617. → pages 117
Kornberg, R. D. (1974). Chromatin Structure: A Repeating Unit of Histones and DNA. Science (New York, NY) 184, 868–871. → pages 13
Kornberg, R. D. (1977). Structure of chromatin. Annual review of biochemistry 46, 931–954. → pages 10
Kornberg, R. D. and Lorch, Y. (1999). Twenty-five years of the nucleosome, fundamental particle of the eukaryote chromosome. Cell 98, 285–294. → pages 13
Koshland, D. E. (2002). Special essay. The seven pillars of life. Science (New York, NY) 295, 2215–2216. → pages 121
Kouzarides, T. (2007). Chromatin modifications and their function. Cell 128, 693–705. → pages 10, 13, 14, 73
Krogan, N., Keogh, M., Datta, N., Sawa, C., Ryan, O., Ding, H., Haw, R., Pootoolal, J., Tong, A., Canadien, V., Richards, D., Wu, X., Emili, A., Hughes, T., Buratowski, S. and Greenblatt, J. (2003a). A Snf2 family ATPase complex required for recruitment of the histone H2A variant Htz1. Molecular cell 12, 1565–1576. → pages 77, 107, 119
Krogan, N. J., Dover, J., Wood, A., Schneider, J., Heidt, J., Boateng, M. A., Dean, K., Ryan, O. W., Golshani, A., Johnston, M., Greenblatt, J. F. and Shilatifard, A. (2003b).
The Paf1 complex is required for histone H3 methylation by COMPASS and Dot1p: linking transcriptional elongation to histone methylation. Molecular cell 11, 721–729. → pages 74, 76, 115
Krogan, N. J., Kim, M., Tong, A., Golshani, A., Cagney, G., Canadien, V., Richards, D. P., Beattie, B. K., Emili, A., Boone, C., Shilatifard, A., Buratowski, S. and Greenblatt, J. (2003c). Methylation of histone H3 by Set2 in Saccharomyces cerevisiae is linked to transcriptional elongation by RNA polymerase II. Molecular and cellular biology 23, 4207–4218. → pages 78
Lacoste, N., Utley, R., Hunter, J., Poirier, G. and Cote, J. (2002). Disruptor of telomeric silencing-1 is a chromatin-specific histone H3 methyltransferase. The Journal of biological chemistry 277, 30421–30424. → pages 74
Lahiri, D. K. and Maloney, B. (2006). Genes are not our destiny: the somatic epitype bridges between the genotype and the phenotype. Nature reviews Neuroscience 7. → pages 12
Lan, F. and Shi, Y. (2009). Epigenetic regulation: methylation of histone and non-histone proteins. Science in China Series C, Life sciences / Chinese Academy of Sciences 52, 311–322. → pages 74
Latham, J. A. and Dent, S. Y. R. (2007). Cross-regulation of histone modifications. Nature structural & molecular biology 14, 1017–1024. → pages 14, 76
Lee, J., Shukla, A., Schneider, J., Swanson, S., Washburn, M., Florens, L., Bhaumik, S. and Shilatifard, A. (2007). Histone crosstalk between H2B monoubiquitination and H3 methylation mediated by COMPASS. Cell 131, 1084–1096. → pages 115
Lee, J.-S., Smith, E. and Shilatifard, A. (2010). The language of histone crosstalk. Cell 142, 682–685. → pages 15, 49, 63, 127
Lee, J. T. (2000). Disruption of imprinted X inactivation by parent-of-origin effects at Tsix. Cell 103, 17–27. → pages 21
Lee, J. T. (2010). The X as model for RNA’s niche in epigenomic regulation. Cold Spring Harbor perspectives in biology 2, a003749. → pages 22, 124
Lee, J. T., Davidow, L. S. and Warshawsky, D. (1999).
Tsix, a gene antisense to Xist at the X-inactivation centre. Nature genetics 21, 400–404. → pages 21
Lee, J. T. and Lu, N. (1999). Targeted mutagenesis of Tsix leads to nonrandom X inactivation. Cell 99, 47–57. → pages 21
Lee, K. K., Florens, L., Swanson, S. K., Washburn, M. P. and Workman, J. L. (2005). The deubiquitylation activity of Ubp8 is dependent upon Sgf11 and its association with the SAGA complex. Molecular and cellular biology 25, 1173–1182. → pages 77
Lee, W., Tillo, D., Bray, N., Morse, R., Davis, R., Hughes, T. and Nislow, C. (2007). A high-resolution atlas of nucleosome occupancy in yeast. Nature genetics 39, 1235–1244. → pages 61, 79, 80, 84
Leisner, M., Bleris, L., Lohmueller, J., Xie, Z. and Benenson, Y. (2010). Rationally designed logic integration of regulatory signals in mammalian cells. Nature nanotechnology 5, 666–670. → pages 9, 31
Lenstra, T. L., Benschop, J. J., Kim, T., Schulze, J. M., Brabers, N. A. C. H., Margaritis, T., van de Pasch, L. A. L., van Heesch, S. A. A. C., Brok, M. O., Groot Koerkamp, M. J. A., Ko, C. W., van Leenen, D., Sameith, K., van Hooff, S. R., Lijnzaad, P., Kemmeren, P., Hentrich, T., Kobor, M. S., Buratowski, S. and Holstege, F. C. P. (2011). The specificity and topology of chromatin interaction pathways in yeast. Molecular cell 42, 536–549. → pages 105, 117
Levine, M. and Davidson, E. H. (2005). Gene regulatory networks for development. Proceedings of the National Academy of Sciences of the United States of America 102, 4936–4942. → pages 2, 9, 121
Lickwar, C. R., Rao, B., Shabalin, A. A., Nobel, A. B., Strahl, B. D. and Lieb, J. D. (2009). The Set2/Rpd3S pathway suppresses cryptic transcription without regard to gene length or transcription frequency. PloS one 4, e4886. → pages 78, 105, 106, 119, 120
Lieb, J. D. (2003). Genome-wide mapping of protein-DNA interactions by chromatin immunoprecipitation and DNA microarray hybridization. Methods in molecular biology (Clifton, NJ) 224, 99–109. → pages 50
Link, K.
H. and Breaker, R. R. (2009). Engineering ligand-responsive gene-control elements: lessons learned from natural riboswitches. Gene therapy 16, 1189–1201. → pages 31
Lipardi, C., Wei, Q. and Paterson, B. M. (2001). RNAi as random degradative PCR: siRNA primers convert mRNA into dsRNAs that are degraded to generate new siRNAs. Cell 107, 297–307. → pages 16
Lipton, R. J. and Baum, E. B. (1996). DNA based computers. Amer Mathematical Society. → pages 6
Lister, R., O’Malley, R. C., Tonti-Filippini, J., Gregory, B. D., Berry, C. C., Millar, A. H. and Ecker, J. R. (2008). Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell 133, 523–536. → pages 21
Liu, C. L., Kaplan, T., Kim, M., Buratowski, S., Schreiber, S. L., Friedman, N. and Rando, O. J. (2005). Single-nucleosome mapping of histone modifications in S. cerevisiae. PLoS biology 3, e328. → pages 61, 98
Liu, Q., Wang, L., Frutos, A. G., Condon, A. E., Corn, R. M. and Smith, L. M. (2000). DNA computing on surfaces. Nature 403, 175–179. → pages 4
Liu, X. S. (2007). Getting started in tiling microarray analysis. PLoS computational biology 3, 1842–1844. → pages 54
Liu, Y., Mochizuki, K. and Gorovsky, M. A. (2004). Histone H3 lysine 9 methylation is required for DNA elimination in developing macronuclei in Tetrahymena. Proceedings of the National Academy of Sciences of the United States of America 101, 1679–1684. → pages 21
Lodish, H., Berk, A., Kaiser, C. A., Krieger, M., Scott, M. P., Bretscher, A., Ploegh, H. and Matsudaira, P. T. (2007). Molecular Cell Biology. 6th edition, W.H.Freeman & Co Ltd. → pages 1, 27
Luger, K., Mader, A., Richmond, R., Sargent, D. and Richmond, T. (1997). Crystal structure of the nucleosome core particle at 2.8 Å resolution. Nature 389, 251–260. → pages 10
Luikenhuis, S., Wutz, A. and Jaenisch, R. (2001). Antisense transcription through the Xist locus mediates Tsix function in embryonic stem cells. Molecular and cellular biology 21, 8512–8520.
→ pages 21
Luk, E., Vu, N.-D., Patteson, K., Mizuguchi, G., Wu, W.-H., Ranjan, A., Backus, J., Sen, S., Lewis, M., Bai, Y. and Wu, C. (2007). Chz1, a nuclear chaperone for histone H2AZ. Molecular cell 25, 357–368. → pages 119
Lushbough, C. M., Bergman, M. K., Lawrence, C. J., Jennewein, D. and Brendel, V. (2008). Implementing bioinformatic workflows within the bioextract server. International journal of computational biology and drug design 1, 302–312. → pages 68
MacIsaac, K. D., Wang, T., Gordon, D. B., Gifford, D. K., Stormo, G. D. and Fraenkel, E. (2006). An improved map of conserved regulatory sites for Saccharomyces cerevisiae. BMC bioinformatics 7, 113. → pages 66
Mandal, M., Boese, B., Barrick, J. E., Winkler, W. C. and Breaker, R. R. (2003). Riboswitches control fundamental biochemical pathways in Bacillus subtilis and other bacteria. Cell 113, 577–586. → pages 8
Marahrens, Y., Panning, B., Dausman, J., Strauss, W. and Jaenisch, R. (1997). Xist-deficient mice are defective in dosage compensation but not spermatogenesis. Genes & development 11, 156–166. → pages 21
Markowitz, V. M. and Ritter, O. (1995). Characterizing heterogeneous molecular biology database systems. Journal of computational biology : a journal of computational molecular cell biology 2, 547–556. → pages 67
Marques, M., Laflamme, L., Gervais, A. L. and Gaudreau, L. (2010). Reconciling the positive and negative roles of histone H2A.Z in gene transcription. Epigenetics : official journal of the DNA Methylation Society 5, 267–272. → pages 108
Matzke, M. A., Aufsatz, W., Kanno, T., Mette, M. F. and Matzke, A. J. M. (2002). Homology-dependent gene silencing and host defense in plants. Advances in genetics 46, 235–275. → pages 21
Mayer, A., Lidschreiber, M., Siebert, M., Leike, K., Söding, J. and Cramer, P. (2010). Uniform transitions of the general RNA polymerase II transcription complex. Nature structural & molecular biology 17, 1272–1278. → pages 63, 101
McGinty, R.
K., Kim, J., Chatterjee, C., Roeder, R. G. and Muir, T. W. (2008). Chemically ubiquitylated histone H2B stimulates hDot1L-mediated intranucleosomal methylation. Nature 453, 812–816. → pages 113
McManus, M. T. and Sharp, P. A. (2002). Gene silencing in mammals by small interfering RNAs. Nature reviews Genetics 3, 737–747. → pages 19, 29
Mercer, T. R., Dinger, M. E. and Mattick, J. S. (2009). Long non-coding RNAs: insights into functions. Nature reviews Genetics 10, 155–159. → pages 54
Mette, M. F., Aufsatz, W., van der Winden, J., Matzke, M. A. and Matzke, A. J. (2000). Transcriptional silencing and promoter methylation triggered by double-stranded RNA. The EMBO journal 19, 5194–5201. → pages 20
Metzker, M. L. (2010). Sequencing technologies - the next generation. Nature reviews Genetics 11, 31–46. → pages 49
Miller, T., Krogan, N. J., Dover, J., Erdjument-Bromage, H., Tempst, P., Johnston, M., Greenblatt, J. F. and Shilatifard, A. (2001). COMPASS: a complex of proteins associated with a trithorax-related SET domain protein. Proceedings of the National Academy of Sciences of the United States of America 98, 12902–12907. → pages 74
Milosavljevic, A. (2010). Putting epigenome comparison into practice. Nature biotechnology 28, 1053–1056. → pages 15
Minsky, N., Shema, E., Field, Y., Schuster, M., Segal, E. and Oren, M. (2008). Monoubiquitinated H2B is associated with the transcribed region of highly expressed genes in human cells. Nature cell biology 10, 483–488. → pages 93
Misteli, T. (2005). Concepts in nuclear architecture. BioEssays : news and reviews in molecular, cellular and developmental biology 27, 477–487. → pages 10
Misteli, T. (2007). Beyond the sequence: cellular organization of genome function. Cell 128, 787–800. → pages 10, 73, 128
Mizuguchi, G., Shen, X., Landry, J., Wu, W., Sen, S. and Wu, C. (2004). ATP-driven exchange of histone H2AZ variant catalyzed by SWR1 chromatin remodeling complex. Science (New York, NY) 303, 343–348.
→ pages 77, 107, 119
Mochizuki, K., Fine, N. A., Fujisawa, T. and Gorovsky, M. A. (2002). Analysis of a piwi-related gene implicates small RNAs in genome rearrangement in tetrahymena. Cell 110, 689–699. → pages 20, 21
Morgan, H. D., Santos, F., Green, K., Dean, W. and Reik, W. (2005). Epigenetic reprogramming in mammals. Human molecular genetics 14 Spec No 1, R47–58. → pages 127
Mosammaparast, N. and Shi, Y. (2010). Reversal of histone methylation: biochemical and molecular mechanisms of histone demethylases. Annual review of biochemistry 79, 155–179. → pages 74
Moss, E. G. (2003). Silencing unhealthy alleles naturally. Trends in biotechnology 21, 185–187. → pages 19, 29
Müller-Hill, B. (1996). The lac Operon. A short history of a genetic paradigm, De Gruyter. → pages 6, 27
Muramoto, T., Müller, I., Thomas, G., Melvin, A. and Chubb, J. R. (2010). Methylation of H3K4 Is required for inheritance of active transcriptional states. Current biology : CB 20, 397–406. → pages 74, 115
Murray, K. (1964). The occurrence of epsilon-N-methyl lysine in histones. Biochemistry 3, 10–15. → pages 74
Nagalakshmi, U., Wang, Z., Waern, K., Shou, C., Raha, D., Gerstein, M. and Snyder, M. (2008). The transcriptional landscape of the yeast genome defined by RNA sequencing. Science (New York, NY) 320, 1344–1349. → pages 109
Nakanishi, S., Lee, J. S., Gardner, K. E., Gardner, J. M., Takahashi, Y.-h., Chandrasekharan, M. B., Sun, Z.-W., Osley, M. A., Strahl, B. D., Jaspersen, S. L. and Shilatifard, A. (2009). Histone H2BK123 monoubiquitination is the critical determinant for H3K4 and H3K79 trimethylation by COMPASS and Dot1. The Journal of cell biology 186, 371–377. → pages 114
Napoli, C., Lemieux, C. and Jorgensen, R. (1990). Introduction of a Chimeric Chalcone Synthase Gene into Petunia Results in Reversible Co-Suppression of Homologous Genes in trans. The Plant cell 2, 279–289. → pages 17
Naturalists, A. S. o., of Chicago Press. Journals Division, U. and (Organization), J. (1878).
The American naturalist. → pages 121
Nature Biotechnology, E. (2010). Making a mark. Nature biotechnology 28, 1031. → pages 15, 16
Neumann, J. V. (1966). Theory of Self-Reproducing Automata. University of Illinois Press, Champaign, IL, USA. → pages 5
Neumann, J. v. (1945). First draft of a report on the EDVAC. Technical report. → pages 4
Ng, H. H., Ciccone, D. N., Morshead, K. B., Oettinger, M. A. and Struhl, K. (2003a). Lysine-79 of histone H3 is hypomethylated at silenced loci in yeast and mammalian cells: a potential mechanism for position-effect variegation. Proceedings of the National Academy of Sciences of the United States of America 100, 1820–1825. → pages 74
Ng, H. H., Robert, F., Young, R. A. and Struhl, K. (2003b). Targeted recruitment of Set1 histone methylase by elongating Pol II provides a localized mark and memory of recent transcriptional activity. Molecular cell 11, 709–719. → pages 74, 115
Ng, S. B., Buckingham, K. J., Lee, C., Bigham, A. W., Tabor, H. K., Dent, K. M., Huff, C. D., Shannon, P. T., Jabs, E. W., Nickerson, D. A., Shendure, J. and Bamshad, M. J. (2010). Exome sequencing identifies the cause of a mendelian disorder. Nature genetics 42, 30–35. → pages 129
Niazov, T., Baron, R., Katz, E., Lioubashevski, O. and Willner, I. (2006). Concatenated logic gates using four coupled biocatalysts operating in series. Proceedings of the National Academy of Sciences of the United States of America 103, 17160–17163. → pages 8
Niemeyer, C. M., Koehler, J. and Wuerdemann, C. (2002). DNA-directed assembly of bienzymic complexes from in vivo biotinylated NAD(P)H:FMN oxidoreductase and luciferase. Chembiochem : a European journal of chemical biology 3, 242–245. → pages 7
Nugent, R. and Meila, M. (2010). An overview of clustering applied to molecular biology. Methods in molecular biology (Clifton, NJ) 620, 369–404. → pages 64
Ogawa, Y., Sun, B. K. and Lee, J. T. (2008). Intersection of the RNA interference and X-inactivation pathways.
Science (New York, NY) 320, 1336–1341. → pages 21
Okamura, K. and Lai, E. C. (2008). Endogenous small interfering RNAs in animals. Nature reviews Molecular cell biology 9, 673–678. → pages 29
Olins, A. L. and Olins, D. E. (1974). Spheroid chromatin units (ν bodies). Science (New York, NY) 183, 330–332. → pages 10
O’Neill, L. P., VerMilyea, M. D. and Turner, B. M. (2006). Epigenetic characterization of the early embryo with a chromatin immunoprecipitation protocol applicable to small cell populations. Nature genetics 38, 835–841. → pages 128
Onodera, Y., Haag, J. R., Ream, T., Nunes, P. C., Pontes, O. and Pikaard, C. S. (2005). Plant nuclear RNA polymerase IV mediates siRNA and DNA methylation-dependent heterochromatin formation. Cell 120, 613–622. → pages 21
Oudet, P., Gross-Bellard, M. and Chambon, P. (1975). Electron microscopic and biochemical evidence that chromatin structure is a repeating unit. Cell 4, 281–300. → pages 10, 13
Ouyang, Q., Kaplan, P. D., Liu, S. and Libchaber, A. (1997). DNA solution of the maximal clique problem. Science (New York, NY) 278, 446–449. → pages 4
Pak, J. and Fire, A. (2007). Distinct populations of primary and secondary effectors during RNAi in C. elegans. Science (New York, NY) 315, 241–244. → pages 29
Pal-Bhadra, M., Bhadra, U. and Birchler, J. A. (2002). RNAi related mechanisms affect both transcriptional and posttranscriptional transgene silencing in Drosophila. Molecular cell 9, 315–327. → pages 20, 21
Pal-Bhadra, M., Leibovitch, B. A., Gandhi, S. G., Rao, M., Bhadra, U., Birchler, J. A. and Elgin, S. C. R. (2004). Heterochromatic silencing and HP1 localization in Drosophila are dependent on the RNAi machinery. Science (New York, NY) 303, 669–672. → pages 21
Pattenden, S. G., Gogol, M. M. and Workman, J. L. (2010). Features of cryptic promoters and their varied reliance on bromodomain-containing factors. PloS one 5, e12927. → pages 78
Paul Hill, W. H. (1989). Art of Electronics, The; 2nd Edition.
Cambridge University Press. → pages 27, 31, 33
Penny, G. D., Kay, G. F., Sheardown, S. A., Rastan, S. and Brockdorff, N. (1996). Requirement for Xist in X chromosome inactivation. Nature 379, 131–137. → pages 21
Picard, F., Robin, S., Lavielle, M., Vaisse, C. and Daudin, J.-J. (2005). A statistical approach for array CGH data analysis. BMC bioinformatics 6, 27. → pages 57
Plasterk, R. H. A. (2002). RNA silencing: the genome’s immune system. Science (New York, NY) 296, 1263–1265. → pages 16
Pokholok, D., Harbison, C., Levine, S., Cole, M., Hannett, N., Lee, T., Bell, G., Walker, K., Rolfe, P., Herbolsheimer, E., Zeitlinger, J., Lewitter, F., Gifford, D. and Young, R. (2005). Genome-wide map of nucleosome acetylation and methylation in yeast. Cell 122, 517–527. → pages 63, 74, 81, 82, 93, 107
Powell, D. W., Weaver, C. M., Jennings, J. L., McAfee, K. J., He, Y., Weil, P. A. and Link, A. J. (2004). Cluster analysis of mass spectrometry data reveals a novel component of SAGA. Molecular and cellular biology 24, 7249–7259. → pages 77
Privman, V., Strack, G., Solenov, D., Pita, M. and Katz, E. (2008). Optimization of enzymatic biochemical logic for noise reduction and scalability: how many biocomputing gates can be interconnected in a circuit? The journal of physical chemistry. B 112, 11777–11784. → pages 8
Prohaska, S. J., Stadler, P. F. and Krakauer, D. C. (2010). Innovation in gene regulation: the case of chromatin computation. Journal of theoretical biology 265, 27–44. → pages 9, 12, 13, 14, 15, 125, 127
Ptashne, M. (1986). A genetic switch: gene control and phage [lambda]. Cell Press. → pages 6, 27
Ptashne, M. (1988). How eukaryotic transcriptional activators work. Nature 335, 683–689. → pages 27
Ptashne, M. (2007). On the use of the word ’epigenetic’. Current biology : CB 17, R233–6. → pages 11
Ptashne, M. (2009). Binding reactions: epigenetic switches, signal transduction and cancer. Current biology : CB 19, R234–41. → pages 12
Radman-Livaja, M., Liu, C.
L., Friedman, N., Schreiber, S. L. and Rando, O. J. (2010). Replication and active demethylation represent partially overlapping mechanisms for erasure of H3K4me3 in budding yeast. PLoS genetics 6, e1000837. → pages 74
Raisner, R., Hartley, P., Meneghini, M., Bao, M., Liu, C., Schreiber, S., Rando, O. and Madhani, H. (2005). Histone variant H2A.Z marks the 5’ ends of both active and inactive genes in euchromatin. Cell 123, 233–248. → pages 106
Rana, T. M. (2007). Illuminating the silence: understanding the structure and function of small RNAs. Nature reviews Molecular cell biology 8, 23–36. → pages 17
Rando, O. J. and Chang, H. Y. (2009). Genome-wide views of chromatin structure. Annual review of biochemistry 78, 245–271. → pages 14
Regev, A. and Shapiro, E. (2002). Cells as computation. Nature 419, 343. → pages 6
Reich, M., Liefeld, T., Gould, J., Lerner, J., Tamayo, P. and Mesirov, J. P. (2006). GenePattern 2.0. Nature genetics 38, 500–501. → pages 68
Reik, W. (2007). Stability and flexibility of epigenetic gene regulation in mammalian development. Nature 447, 425–432. → pages 127
Rinaudo, K., Bleris, L., Maddamsetti, R., Subramanian, S., Weiss, R. and Benenson, Y. (2007). A universal RNAi-based logic evaluator that operates in mammalian cells. Nature biotechnology 25, 795–801. → pages 3, 9, 19, 31, 46, 125
Ritter, O., Kocab, P., Senger, M., Wolf, D. and Suhai, S. (1994). Prototype implementation of the integrated genomic database. Computers and biomedical research, an international journal 27, 97–115. → pages 67
Robzyk, K., Recht, J. and Osley, M. A. (2000). Rad6-dependent ubiquitination of histone H2B in yeast. Science (New York, NY) 287, 501–504. → pages 76
Rössler, O. (1974). Chemical automata in homogeneous and reaction-diffusion kinetics. Springer, Heidelberg. → pages 5
Rössler, O. E. (1972). A principle for chemical multivibration. Journal of theoretical biology 36, 413–417. → pages 5
Rothemund, P. W. K. (2006).
Folding DNA to create nanoscale shapes and patterns. Nature 440, 297–302. → pages 7
Royce, T. E., Rozowsky, J. S., Bertone, P., Samanta, M., Stolc, V., Weissman, S., Snyder, M. and Gerstein, M. (2005). Issues in the analysis of oligonucleotide tiling microarrays for transcript mapping. Trends in genetics : TIG 21, 466–475. → pages 57
Ruthenburg, A., Li, H., Patel, D. and Allis, C. (2007). Multivalent engagement of chromatin modifications by linked binding modules. Nature reviews Molecular cell biology 8, 983–994. → pages 14
Sado, T., Wang, Z., Sasaki, H. and Li, E. (2001). Regulation of imprinted X-chromosome inactivation in mice by Tsix. Development (Cambridge, England) 128, 1275–1286. → pages 21
Samara, N. L., Datta, A. B., Berndsen, C. E., Zhang, X., Yao, T., Cohen, R. E. and Wolberger, C. (2010). Structural insights into the assembly and function of the SAGA deubiquitinating module. Science (New York, NY) 328, 1025–1029. → pages 117
Santos-Rosa, H., Schneider, R., Bannister, A. J., Sherriff, J., Bernstein, B. E., Emre, N. C. T., Schreiber, S. L., Mellor, J. and Kouzarides, T. (2002). Active genes are tri-methylated at K4 of histone H3. Nature 419, 407–411. → pages 74
Satterlee, J. S., Schübeler, D. and Ng, H.-H. (2010). Tackling the epigenome: challenges and opportunities for collaboration. Nature biotechnology 28, 1039–1044. → pages 12
Schadt, E. E., Edwards, S. W., GuhaThakurta, D., Holder, D., Ying, L., Svetnik, V., Leonardson, A., Hart, K. W., Russell, A., Li, G., Cavet, G., Castle, J., McDonagh, P., Kan, Z., Chen, R., Kasarskis, A., Margarint, M., Caceres, R. M., Johnson, J. M., Armour, C. D., Garrett-Engele, P. W., Tsinoremas, N. F. and Shoemaker, D. D. (2004). A comprehensive transcript index of the human genome generated using microarrays and computational approaches. Genome biology 5, R73. → pages 57
Schadt, E. E., Linderman, M. D., Sorenson, J., Lee, L. and Nolan, G. P. (2010). Computational solutions to large-scale data management and analysis.
Nature reviews Genetics 11, 647–657. → pages 67
Sedighi, M. and Sengupta, A. M. (2007). Epigenetic chromatin silencing: bistability and front propagation. Physical biology 4, 246–255. → pages 127
Seeman, N. C. (2003). DNA in a material world. Nature 421, 427–431. → pages 7
Segal, E. and Widom, J. (2009). From DNA sequence to transcriptional behaviour: a quantitative approach. Nature reviews Genetics 10, 443–456. → pages 127
Serre, D., Lee, B. H. and Ting, A. H. (2010). MBD-isolated Genome Sequencing provides a high-throughput and comprehensive survey of DNA methylation in the human genome. Nucleic acids research 38, 391–399. → pages 70
Shahbazian, M., Zhang, K. and Grunstein, M. (2005). Histone H2B ubiquitylation controls processive methylation but not monomethylation by Dot1 and Set1. Molecular cell 19, 271–277. → pages 79, 112
Shalgi, R., Lieber, D., Oren, M. and Pilpel, Y. (2007). Global and local architecture of the mammalian microRNA-transcription factor regulatory network. PLoS computational biology 3, e131. → pages 125
Shapiro, E. and Benenson, Y. (2006). Tapping the computing power of biological molecules gives rise to tiny machines that can speak directly to living cells. Scientific American 294, 44–51. → pages 5, 6, 31
Shendure, J. and Ji, H. (2008). Next-generation DNA sequencing. Nature biotechnology 26, 1135–1145. → pages 49
Shilatifard, A. (2006). Chromatin modifications by methylation and ubiquitination: implications in the regulation of gene expression. Annual review of biochemistry 75, 243–269. → pages 74, 76
Shimoni, Y., Friedlander, G., Hetzroni, G., Niv, G., Altuvia, S., Biham, O. and Margalit, H. (2007). Regulation of gene expression by small non-coding RNAs: a quantitative view. Molecular systems biology 3, 138. → pages 125
Siepel, A., Farmer, A., Tolopko, A., Zhuang, M., Mendes, P., Beavis, W. and Sobral, B. (2001). ISYS: a decentralized, component-based approach to the integration of heterogeneous bioinformatics resources.
Bioinformatics (Oxford, England) 17, 83–94. → pages 67
Sijen, T., Fleenor, J., Simmer, F., Thijssen, K. L., Parrish, S., Timmons, L., Plasterk, R. H. and Fire, A. (2001). On the role of RNA amplification in dsRNA-triggered gene silencing. Cell 107, 465–476. → pages 16
Simmel, F. C. and Dittmer, W. U. (2005). DNA nanodevices. Small (Weinheim an der Bergstrasse, Germany) 1, 284–299. → pages 7
Simon, I., Barnett, J., Hannett, N., Harbison, C., Rinaldi, N., Volkert, T., Wyrick, J., Zeitlinger, J., Gifford, D., Jaakkola, T. and Young, R. (2001). Serial regulation of transcriptional regulators in the yeast cell cycle. Cell 106, 697–708. → pages 75, 88, 113
Simonis, M., Klous, P., Splinter, E., Moshkin, Y., Willemsen, R., de Wit, E., van Steensel, B. and de Laat, W. (2006). Nuclear organization of active and inactive chromatin domains uncovered by chromosome conformation capture-on-chip (4C). Nature genetics 38, 1348–1354. → pages 128
Simpson, R. T. (1978). Structure of chromatin containing extensively acetylated H3 and H4. Cell 13, 691–699. → pages 14
Singer, M. S., Kahana, A., Wolf, A. J., Meisinger, L. L., Peterson, S. E., Goggin, C., Mahowald, M. and Gottschling, D. E. (1998). Identification of high-copy disruptors of telomeric silencing in Saccharomyces cerevisiae. Genetics 150, 613–632. → pages 117
Siomi, H. and Siomi, M. C. (2009). On the road to reading the RNA-interference code. Nature 457, 396–404. → pages 29
Smedley, D., Haider, S., Ballester, B., Holland, R., London, D., Thorisson, G. and Kasprzyk, A. (2009). BioMart–biological queries made easy. BMC genomics 10, 22. → pages 68
Song, J. S., Johnson, W. E., Zhu, X., Zhang, X., Li, W., Manrai, A. K., Liu, J. S., Chen, R. and Liu, X. S. (2007). Model-based analysis of two-color arrays (MA2C). Genome biology 8, R178. → pages 55
Song, Y.-H. and Ahn, S. H. (2010). A Bre1-associated protein, large 1 (Lge1), promotes H2B ubiquitylation during the early stages of transcription elongation.
The Journal of biological chemistry 285, 2361–2367. → pages 104
Spearman, C. (1987). The proof and measurement of association between two things. By C. Spearman, 1904. The American journal of psychology 100, 441–471. → pages 63
Spellman, P., Sherlock, G., Zhang, M., Iyer, V., Anders, K., Eisen, M., Brown, P., Botstein, D. and Futcher, B. (1998). Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular biology of the cell 9, 3273–3297. → pages 75, 84, 85, 86, 91
Steigele, S. and Nieselt, K. (2005). Open reading frames provide a rich pool of potential natural antisense transcripts in fungal genomes. Nucleic acids research 33, 5034–5044. → pages 109
Stockinger, H., Attwood, T., Chohan, S. N., Côté, R., Cudré-Mauroux, P., Falquet, L., Fernandes, P., Finn, R. D., Hupponen, T., Korpelainen, E., Labarga, A., Laugraud, A., Lima, T., Pafilis, E., Pagni, M., Pettifer, S., Phan, I. and Rahman, N. (2008). Experience using web services for biological sequence analysis. Briefings in bioinformatics 9, 493–505. → pages 67
Stojanovic, M. (2010). Two Molecular Information Processing Systems Based on Catalytic Nucleic Acids. In Natural Computing, (Peper, F., Umeo, H., Matsui, N. and Isokawa, T., eds), pp. 55–63. Springer Japan Columbia University USA. 10.1007/978-4-431-53868-4_6. → pages 125, 126
Stojanovic, M. N. and Stefanovic, D. (2003). A deoxyribozyme-based molecular automaton. Nature biotechnology 21, 1069–1074. → pages 8
Stoughton, R. B. (2005). Applications of DNA microarrays in biology. Annual review of biochemistry 74, 53–82. → pages 49
Strahl, B. D. and Allis, C. D. (2000). The language of covalent histone modifications. Nature 403, 41–45. → pages 15
Strahl, B. D., Grant, P. A., Briggs, S. D., Sun, Z.-W., Bone, J. R., Caldwell, J. A., Mollah, S., Cook, R. G., Shabanowitz, J., Hunt, D. F. and Allis, C. D. (2002).
Set2 is a nucleosomal histone H3-selective methyltransferase that mediates transcriptional repression. Molecular and cellular biology 22, 1298–1306. → pages 74, 78
Subramaniam, S. (1998). The Biology Workbench–a seamless database and analysis environment for the biologist. Proteins 32, 1–2. → pages 67
Subsoontorn, P., Kim, J. and Winfree, E. (2011). Bistability of an In Vitro Synthetic Autoregulatory Switch. ArXiv e-prints, http://adsabs.harvard.edu/abs/2011arXiv1101.0723S. → pages 47
Suganuma, T. and Workman, J. L. (2008). Crosstalk among Histone Modifications. Cell 135, 604–607. → pages 14, 75, 76
Sun, Z. and Allis, C. (2002). Ubiquitination of histone H2B regulates H3 methylation and gene silencing in yeast. Nature 418, 104–108. → pages 76, 114
Tabara, H., Sarkissian, M., Kelly, W. G., Fleenor, J., Grishok, A., Timmons, L., Fire, A. and Mello, C. C. (1999). The rde-1 gene, RNA interference, and transposon silencing in C. elegans. Cell 99, 123–132. → pages 20
Talbert, P. B. and Henikoff, S. (2010). Histone variants–ancient wrap artists of the epigenome. Nature reviews Molecular cell biology 11, 264–275. → pages 14, 77
Tamsir, A., Tabor, J. J. and Voigt, C. A. (2011). Robust multicellular computing using genetically encoded NOR gates and chemical ’wires’. Nature 469, 212–215. → pages 46
Taunton, J., Hassig, C. A. and Schreiber, S. L. (1996). A mammalian histone deacetylase related to the yeast transcriptional regulator Rpd3p. Science (New York, NY) 272, 408–411. → pages 13
Tavazoie, S., Hughes, J., Campbell, M., Cho, R. and Church, G. (1999). Systematic determination of genetic network architecture. Nature genetics 22, 281–285. → pages 84
Taverna, S. D., Coyne, R. S. and Allis, C. D. (2002). Methylation of histone h3 at lysine 9 targets programmed DNA elimination in tetrahymena. Cell 110, 701–711. → pages 20, 21
Taverna, S. D., Ueberheide, B. M., Liu, Y., Tackett, A. J., Diaz, R. L., Shabanowitz, J., Chait, B. T., Hunt, D. F. and Allis, C. D. (2007).
Long-distance combinatorial linkage between methylation and acetylation on histone H3 N termini. Proceedings of the National Academy of Sciences of the United States of America 104, 2086–2091. → pages 128
Taylor, J., Schenck, I., Blankenberg, D. and Nekrutenko, A. (2007). Using galaxy to perform large-scale interactive data analyses. Current protocols in bioinformatics / editorial board, Andreas D. Baxevanis ... [et al.] Chapter 10, Unit 10.5. → pages 68, 69
Tjaden, B., Haynor, D. R., Stolyar, S., Rosenow, C. and Kolker, E. (2002). Identifying operons and untranslated regions of transcripts using Escherichia coli RNA expression analysis. Bioinformatics (Oxford, England) 18 Suppl 1, S337–44. → pages 57
Toedling, J., Sklyar, O., Krueger, T., Fischer, J. J., Sperling, S. and Huber, W. (2007). Ringo–an R/Bioconductor package for analyzing ChIP-chip readouts. BMC bioinformatics 8, 221. → pages 55
Tsai, M.-C., Manor, O., Wan, Y., Mosammaparast, N., Wang, J. K., Lan, F., Shi, Y., Segal, E. and Chang, H. Y. (2010). Long noncoding RNA as modular scaffold of histone modification complexes. Science (New York, NY) 329, 689–693. → pages 21
Tuleuova, N., An, C.-I., Ramanculov, E., Revzin, A. and Yokobayashi, Y. (2008). Modulating endogenous gene expression of mammalian cells via RNA-small molecule interaction. Biochemical and biophysical research communications 376, 169–173. → pages 9, 19
Turing, A. M. (1937). On Computable Numbers, with an Application to the Entscheidungsproblem. Proc. London Math. Soc. s2-42, 230–265. → pages 4
Turner, B. M. (2007). Defining an epigenetic code. Nature cell biology 9, 2–6. → pages 15
Tuschl, T. and Borkhardt, A. (2002). Small interfering RNAs: a revolutionary tool for the analysis of gene function and gene therapy. Molecular interventions 2, 158–167. → pages 19
Ura, K., Kurumizaka, H., Dimitrov, S., Almouzni, G. and Wolffe, A. P. (1997).
Histone acetylation: influence on transcription, nucleosome mobility and positioning, and linker histone-dependent transcriptional repression. The EMBO journal 16, 2096–2107. → pages 14
van Bakel, H., van Werven, F., Radonjic, M., Brok, M., van Leenen, D., Holstege, F. and Timmers, H. (2008). Improved genome-wide localization by ChIP-chip using double-round T7 RNA polymerase-based amplification. Nucleic acids research 36, e21. → pages 51
van Driel, R., Fransz, P. F. and Verschure, P. J. (2003). The eukaryotic genome: a system regulated at different hierarchical levels. Journal of cell science 116, 4067–4075. → pages 10
van Leeuwen, F., Gafken, P. and Gottschling, D. (2002). Dot1p modulates silencing in yeast by methylation of the nucleosome core. Cell 109, 745–756. → pages 74
Vassetzky, Y., Gavrilov, A., Eivazova, E., Priozhkova, I., Lipinski, M. and Razin, S. (2009). Chromosome conformation capture (from 3C to 5C) and its ChIP-based modification. Methods in molecular biology (Clifton, NJ) 567, 171–188. → pages 128
Venters, B. J. and Pugh, B. F. (2009). A canonical promoter organization of the transcription machinery and its regulators in the Saccharomyces genome. Genome research 19, 360–371. → pages 63
Verdel, A., Jia, S., Gerber, S., Sugiyama, T., Gygi, S., Grewal, S. I. S. and Moazed, D. (2004). RNAi-mediated targeting of heterochromatin by the RITS complex. Science (New York, NY) 303, 672–676. → pages 20
Vermeulen, A., Behlen, L., Reynolds, A., Wolfson, A., Marshall, W. S., Karpilow, J. and Khvorova, A. (2005). The contributions of dsRNA structure to Dicer specificity and efficiency. RNA (New York, NY) 11, 674–682. → pages 29
Volpe, T., Schramke, V., Hamilton, G. L., White, S. A., Teng, G., Martienssen, R. A. and Allshire, R. C. (2003). RNA interference is required for normal centromere function in fission yeast. Chromosome research : an international journal on the molecular, supramolecular and evolutionary aspects of chromosome biology 11, 137–146.
→ pages 20 Volpe, T. A., Kidner, C., Hall, I. M., Teng, G., Grewal, S. I. S. and Martienssen, R. A. (2002). Regulation of heterochromatic silencing and histone H3 lysine-9 methylation by RNAi. Science (New York, NY) 297, 1833–1837. → pages 20 Washietl, S., Machné, R. and Goldman, N. (2008). Evolutionary footprints of nucleosome positions in yeast. Trends in genetics : TIG 24, 583–587. → pages 127 Watson, J. D. and Crick, F. H. (1953). Molecular structure of nucleic acids; a structure for deoxyribose nucleic acid. Nature 171, 737–738. → pages 6 Weake, V. M. and Workman, J. L. (2008). Histone ubiquitination: triggering gene activity. Molecular cell 29, 653–663. → pages 77, 105, 115, 117 Weber, M., Davies, J. J., Wittig, D., Oakeley, E. J., Haase, M., Lam, W. L. and Schübeler, D. (2005). Chromosome-wide and promoter-specific analyses identify sites of differential DNA methylation in normal and transformed human cells. Nature genetics 37, 853–862. → pages 70 Weiss, R. (2001). Cellular computation and communications using engineered genetic regulatory networks. PhD thesis,. → pages 7 Werner, T. (2008). Bioinformatics applications for pathway analysis of microarray data. Current opinion in biotechnology 19, 50–54. → pages 65 164 Wheeler, J. A. (1990). A journey into gravity and spacetime. Scientific American Library, New York. → pages 122 Win, M. N., Liang, J. C. and Smolke, C. D. (2009). Frameworks for programming biological function through RNA parts and devices. Chemistry & biology 16, 298–310. → pages 31, 125 Win, M. N. and Smolke, C. D. (2007a). A modular and extensible RNA-based gene-regulatory platform for engineering cellular function. Proceedings of the National Academy of Sciences of the United States of America 104, 14283–14288. → pages 8, 31 Win, M. N. and Smolke, C. D. (2007b). RNA as a versatile and powerful platform for engineering genetic regulatory tools. Biotechnology & genetic engineering reviews 24, 311–346. → pages 125 Win, M. N. and Smolke, C. D. 
(2008). Higher-order cellular information processing with synthetic RNA devices. Science (New York, NY) 322, 456–460. → pages 8 Winkler, W. C. and Breaker, R. R. (2005). Regulation of bacterial gene expression by riboswitches. Annual review of microbiology 59, 487–517. → pages 8 Wood, A., Krogan, N., Dover, J., Schneider, J., Heidt, J., Boateng, M., Dean, K., Golshani, A., Zhang, Y., Greenblatt, J., Johnston, M. and Shilatifard, A. (2003a). Bre1, an E3 ubiquitin ligase required for recruitment and substrate selection of Rad6 at a promoter. Molecular cell 11, 267–274. → pages 76 Wood, A., Schneider, J., Dover, J., Johnston, M. and Shilatifard, A. (2003b). The Paf1 complex is essential for histone monoubiquitination by the Rad6-Bre1 complex, which signals for histone methylation by COMPASS and Dot1p. The Journal of biological chemistry 278, 34739–34742. → pages 76 Wood, A., Schneider, J., Dover, J., Johnston, M. and Shilatifard, A. (2005). The Bur1/Bur2 complex is required for histone H2B monoubiquitination by Rad6/Bre1 and histone methylation by COMPASS. Molecular cell 20, 589–599. → pages 76 Woodcock, C. L. (2006). Chromatin architecture. Current opinion in structural biology 16, 213–220. → pages 10 Wyce, A., Xiao, T., Whelan, K. A., Kosman, C., Walter, W., Eick, D., Hughes, T. R., Krogan, N. J., Strahl, B. D. and Berger, S. L. (2007). H2B ubiquitylation 165 acts as a barrier to Ctk1 nucleosomal recruitment prior to removal by Ubp8 within a SAGA-related complex. Molecular cell 27, 275–288. → pages 77, 117 Xiao, T., Kao, C.-F., Krogan, N. J., Sun, Z.-W., Greenblatt, J. F., Osley, M. A. and Strahl, B. D. (2005). Histone H2B ubiquitylation is associated with elongating RNA polymerase II. Molecular and cellular biology 25, 637–651. → pages 76, 118 Xie, Z., Liu, S. J., Bleris, L. and Benenson, Y. (2010). Logic integration of mRNA signals by an RNAi-based molecular computer. Nucleic acids research 38, 2692–2701. → pages 9, 31 Xin, H. and Woolley, A. T. (2003). 
DNA-templated nanotube localization. Journal of the American Chemical Society 125, 8710–8711. → pages 7 Yanagida, T., Ueda, M., Murata, T., Esaki, S. and Ishii, Y. (2007). Brownian motion, fluctuation and life. Bio Systems 88, 228–242. → pages 124 Young, N. L., Plazas-Mayorca, M. D. and Garcia, B. A. (2010). Systems-wide proteomic characterization of combinatorial post-translational modification patterns. Expert review of proteomics 7, 79–92. → pages 128 Young, R. A. (2000). Biomedical discovery with DNA arrays. Cell 102, 9–15. → pages 53, 54 Youngson, N. A. and Whitelaw, E. (2008). Transgenerational epigenetic effects. Annual review of genomics and human genetics 9, 233–257. → pages 11 Zamore, P. D. (2002). Ancient pathways programmed by small RNAs. Science (New York, NY) 296, 1265–1269. → pages 16 Zamore, P. D., Tuschl, T., Sharp, P. A. and Bartel, D. P. (2000). RNAi: double-stranded RNA directs the ATP-dependent cleavage of mRNA at 21 to 23 nucleotide intervals. Cell 101, 25–33. → pages 29 Zhang, H., Roberts, D. N. and Cairns, B. R. (2005). Genome-wide dynamics of Htz1, a histone H2A variant that poises repressed/basal promoters for activation through histone loss. Cell 123, 219–231. → pages 106 Zhang, T.-Y. and Meaney, M. J. (2010). Epigenetics and the environmental regulation of the genome and its function. Annual review of psychology 61, 439–66, C1–3. → pages 12 166 Zhang, Z. D., Rozowsky, J., Lam, H. Y. K., Du, J., Snyder, M. and Gerstein, M. (2007). Tilescope: online analysis pipeline for high-density tiling microarray data. Genome biology 8, R81. → pages 55 Zhao, Z., Tavoosidana, G., Sjölinder, M., Göndör, A., Mariano, P., Wang, S., Kanduri, C., Lezcano, M., Sandhu, K. S., Singh, U., Pant, V., Tiwari, V., Kurukuti, S. and Ohlsson, R. (2006). Circular chromosome conformation capture (4C) uncovers extensive networks of epigenetically regulated intra- and interchromosomal interactions. Nature genetics 38, 1341–1347. → pages 128 Zilberman, D., Cao, X. 
and Jacobsen, S. E. (2003). ARGONAUTE4 control of locus-specific siRNA accumulation and DNA and histone methylation. Science (New York, NY) 299, 716–719. → pages 20, 21 Zofall, M., Fischer, T., Zhang, K., Zhou, M., Cui, B., Veenstra, T. D. and Grewal, S. I. S. (2009). Histone H2A.Z cooperates with RNAi and heterochromatin factors to suppress antisense RNAs. Nature 461, 419–422. → pages 20, 78 Zuse, K. (1990). Der Computer - mein Lebenswerk (2. Aufl.). Springer. → pages 121 167


