UBC Theses and Dissertations

Epigenetic information in the cell : a potential avenue for biocompatible computing Hentrich, Thomas 2011

Epigenetic information in the cell
A potential avenue for biocompatible computing

by Thomas Hentrich
Diplom-Informatiker, Friedrich-Schiller Universität Jena, Germany, 2006

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
Doctor of Philosophy
in
THE FACULTY OF GRADUATE STUDIES
(Computer Science)

The University Of British Columbia (Vancouver)
August 2011
© Thomas Hentrich, 2011

Abstract

Living organisms sense, process and produce molecular signals to regulate their activity and, thus, process information and perform computations on biological substrates. Understanding the fundamental principles of information representation and manipulation along these algorithmic bioprocesses will both advance our understanding of biology and inspire novel forms of computation. From one side of this connection, the research field of Bioinformatics uses computational tools to gain insight into the processes of life. On the other side, Biomolecular Computing attempts to utilize molecules and cellular processing machineries to operate engineered nano-scale biocomputers in live cells for in situ diagnostics, therapeutics and many other applications in biotechnology, bioengineering and biomedicine.

Addressing this twofold character of the information of life, this thesis contributes to the part of Computing Life by elucidating aspects of chromatin, the nucleoprotein structure of DNA in the cell. As a computation and storage layer in the cell, chromatin offers a high degree of plasticity to integrate (environmental) signals into DNA-based regulation and allows information propagation according to epigenetic principles. In this work, I present results on the complexity of chromatin modifications and reveal their impact on gene activity and related functions in the cell. I further discuss a bioinformatics pipeline that I developed and that was applied to genome-wide chromatin profiling and transcriptome analysis.
From the perspective of Living Computers, I address the question of information encoding by biomolecules and draw on principles of epigenetics to devise a model for an RNA interference-based equivalent of the electric flip-flop, which is one of the fundamental elements in digital circuits. In particular, I focus on the digital switch abstraction of the flip-flop, as it is the pivotal idea in both the electric and the biomolecular world that connects and at the same time decouples an underlying physical process and the abstract representation of information. This work contributes to elucidating the computational principles of the RNA interference machinery and suggests novel ideas for universal memory units in biomolecular computing. By juxtaposing the natural and the artificial perspective, this thesis attempts to enhance our understanding of epigenetic information processing in the cell and its capacity for biocomputing applications.

Preface

Chapter 2 of this dissertation is based on a manuscript under review: Hentrich T and Unrau PJ, Programming with RNA interference: the potential for a bistable flip-flop. As first author, I was involved in formulating the research question and research design together with P. J. Unrau as principal investigator. In the course of the project, I generated, analyzed and visualized all the data and wrote the manuscript.

Section 3.5.2 in Chapter 3 contains material from a manuscript to be submitted: Hentrich T, Schulze JM, Kobor MS, Emberly E, CHROMATRA: a web-based tool for compact visualization of chromatin modifications across transcripts. Together with J. M. Schulze, I was involved in developing the visualization approach, implementing it in software and writing the manuscript. M. S. Kobor and E. Emberly provided guidance in the course of the project.
Section 4.3 in Chapter 4 contains parts of a published manuscript: Schulze JM, Jackson J, Nakanishi S, Gardner JM, Hentrich T, Haug J, Johnston M, Jaspersen SL, Kobor MS, Shilatifard A. (2009), Linking cell cycle to histone modifications: SBF and H2B monoubiquitination machinery and cell-cycle regulation of H3K79 dimethylation. Mol Cell. 35, 626-41. All experiments described in the chapter were performed by J. M. Schulze, and I developed the software to analyze the generated data. Parts of the manuscript that are not shown originate from work of J. Jackson, S. Nakanishi, J. Gardner and J. Haug. M. Johnston, S. L. Jaspersen, M. S. Kobor and A. Shilatifard guided the project as principal investigators.

Section 4.4 of the same chapter is based on a submitted manuscript: Schulze JM, Hentrich T, Nakanishi S, Gupta A, Emberly E, Shilatifard A, Kobor MS. Splitting the task: Ubp8 and Ubp10 deubiquitinate different cellular pools of H2BK123. As equal first author, I was involved in research design, data analysis and writing the manuscript. J. M. Schulze performed the experiments, S. Nakanishi provided chemical reagents, and A. Gupta, E. Emberly, A. Shilatifard and M. S. Kobor guided the project.

Results shown in Section 4.5 of Chapter 4 are based on experiments performed by J. M. Schulze in the laboratory of M. S. Kobor. My role in this project was to develop the software and analyze the presented data.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgements
Dedication
1  Introduction
    1.1  Motivation
    1.2  Biomolecular computing
        1.2.1  The computational paradigm
        1.2.2  Towards DNA-based computation
        1.2.3  Beyond DNA-based computation
    1.3  Epigenetic regulation in the cell
        1.3.1  Chromatin
        1.3.2  RNA interference
    1.4  Thesis contribution
2  RNA interference and the potential for a bistable flip-flop
    2.1  Synopsis
    2.2  Background
    2.3  RNAi as an intrinsically programmable machine
    2.4  Digital memory with RNAi
    2.5  The mathematical model of an RNAi-based flip-flop
    2.6  Steady state characteristics of the flip-flop model
    2.7  Dynamic characteristics of the flip-flop model
    2.8  Robustness of the flip-flop model
    2.9  Discussion
3  Software pipeline for genome-wide chromatin modification profiling and transcriptome analysis
    3.1  Synopsis
    3.2  Background
    3.3  Wet-lab data generation
        3.3.1  Chromatin modification mapping with ChIP-on-chip
        3.3.2  Transcriptome mapping with tiling arrays
    3.4  Raw data processing
        3.4.1  rMAT
        3.4.2  tilingArray
    3.5  Data analysis and visualization
        3.5.1  Local analysis pipeline
        3.5.2  Web-based analysis and collaboration
    3.6  Outlook
4  Analysis of chromatin modifications in the context of transcription
    4.1  Synopsis
    4.2  Background
    4.3  Distinct states of histone modifications
        4.3.1  Genome-wide localizations of H3K79me2 and H3K79me3
        4.3.2  Profile of H3K79me2 and H3K79me3 at promoters and ORFs
        4.3.3  Association of H3K79me2 and H3K79me3 with transcription
        4.3.4  Association of H3K79me2 and H3K79me3 with the cell cycle
        4.3.5  Dynamics of H3K79me2 and H3K79me3 during the cell cycle
        4.3.6  H3K79me2 at intergenic regions and ARS in G2/M-phase
        4.3.7  Association of H3K79me2 with the transcription factor Swi4
        4.3.8  Colocalization of H2BK123ub with H3K79me3
    4.4  Interdependencies of histone modifications
        4.4.1  Genome-wide distribution of H2BK123ub and H3 methylation marks with respect to gene length and transcriptional frequency
        4.4.2  Correlation and colocalization of H2BK123ub and its dependent marks
        4.4.3  Site-specific removal of H2BK123ub by Ubp8 and Ubp10
    4.5  Aberrant transcripts and their chromatin structure
        4.5.1  Chromatin at cryptic promoters
        4.5.2  Deposition and role of H2A.Z at cryptic promoters
    4.6  Discussion
5  Conclusion
Bibliography

List of Tables

Table 2.1  Variable values of the optimized flip-flop
Table 2.2  Parameter values of the optimized flip-flop
Table 2.3  Parameter ranges of randomly generated flip-flops
Table 4.1  H2BK123ub and H3 methylation at different genomic features
Table 4.2  Frequencies of histone modification patterns

List of Figures

Figure 1.1  Layers of cellular memory and information processing
Figure 1.2  Architectural hierarchy and modifications of chromatin
Figure 1.3  Core components and process steps of RNA interference
Figure 2.1  RNAi pathways in Caenorhabditis elegans
Figure 2.2  Flip-flop mode of operation
Figure 2.3  Sequence setup of the RNA flip-flop
Figure 2.4  Mathematical analysis of the RNA flip-flop
Figure 2.5  Simulated flip-flop behaviour (core RNAs)
Figure 2.6  Simulated flip-flop behaviour (all RNAs)
Figure 2.7  Robustness of the RNA flip-flop for different parameters
Figure 2.8  Robustness of the RNA flip-flop for different pulse parameters
Figure 3.1  Chromatin profiling using ChIP-on-chip: wet-lab part
Figure 3.2  Chromatin profiling using ChIP-on-chip: dry-lab part
Figure 3.3  Software ecosystem of the bioinformatics pipeline
Figure 4.1  H3K79me2 and H3K79me3 patterns across the genome
Figure 4.2  Distribution of H3K79me2 and H3K79me3 across genes
Figure 4.3  Association of H3K79me2 and H3K79me3 with transcriptional frequency
Figure 4.4  Association of H3K79me2 and H3K79me3 with the cell cycle
Figure 4.5  Association of H3K79me2 with the cell cycle in G2/M-phase
Figure 4.6  H3K79me2 profile in G2/M-phase
Figure 4.7  Association of H3K79me2 with the transcription factor Swi4
Figure 4.8  Colocalization of H2BK123ub with H3K79me3
Figure 4.9  Distribution of H2BK123ub and H3 methylation marks in all transcripts
Figure 4.10  Distribution of H2BK123ub and H3 methylation marks with respect to transcriptional frequency
Figure 4.11  Associations of H2BK123ub and H3 methylation marks with transcriptional frequencies
Figure 4.12  Correlation of H2BK123ub and its dependent marks
Figure 4.13  Patterns of H2BK123ub and its dependent marks
Figure 4.14  Site-specific removal of H2BK123 ubiquitin by Ubp8 and Ubp10 (probe level)
Figure 4.15  Site-specific removal of H2BK123 ubiquitin by Ubp8 and Ubp10 (transcript level)
Figure 4.16  Connection between Ubp8, Ubp10 and H3 methylation marks
Figure 4.17  Segment-based association of Ubp8 and Ubp10 with H3 methylation marks
Figure 4.18  Distribution of H2A.Z across ORFs and transcripts
Figure 4.19  H2A.Z at sites of cryptic initiation
Figure 4.20  H3K4me3 at sites of cryptic initiation
Figure 4.21  Role of SWR1-C in H2A.Z deposition at cryptic promoters
Figure 4.22  Role of H2A.Z for the occurrence of cryptic transcripts
Figure 4.23  H2A.Z at sites of antisense cryptic transcripts
Figure 4.24  Circuitry model for H2BK123ub and its dependent histone marks

Acknowledgements

This thesis would not have been possible without the many people who supported me in various ways over the past years. It is, therefore, my pleasure to express my thanks to those who aided me in completing the work presented herein.

First and foremost, I owe my deepest gratitude to my supervisor Dr. Arvind Gupta for giving me the opportunity to pursue the questions addressed in this thesis and for providing an inspiring and stimulating research environment. His support, guidance and encouragement as my mentor have been invaluable over the past years at Simon Fraser University and the University of British Columbia.

Similarly, I would like to thank my supervisory committee members, Dr. Michael Kobor, Dr. Eldon Emberly, Dr. Peter Unrau and Dr. Wyeth Wasserman for many fruitful discussions, interesting ideas and helpful comments. Their enthusiasm and joy for research were truly motivational. Special thanks to Dr. Ali Shilatifard and Dr. Frank Holstege for giving me the opportunities to collaborate on several exciting projects.

A warm thank you to all members of the Kobor group for making the lab one of the most incredible places for my research.
Thanks for all the supportive and inspirational advice on science and life in general, for the team spirit and, especially, for the many fun times. I truly enjoyed the wonderful collaborations with the Centre for Molecular Medicine and Therapeutics.

I would further like to thank Joyce Poon, Gerdi Snyder, Val Galat and Sharon Ruschkowski for their support with all administrative and organizational needs over the past years. I also gratefully acknowledge the funding sources that made my work possible, foremost the CIHR/MSFHR Bioinformatics Program for Health Research, which offered a unique environment for my time in graduate school.

Lastly, I would like to thank my family for their love and encouragement, especially my parents and my brother, who supported me in all my pursuits even if it meant being far away from home on the other side of the globe. Yet, most importantly, I would like to thank Julia, the most amazing person in my life. Her support, motivation and love made the hard times of this journey bearable and the good times unforgettable. I am deeply grateful for everything you have done for me, and I would not be what I am without you. This life can only be great and exciting with you by my side and I am looking forward to it!

To my parents and grandparents

Chapter 1

Introduction

1.1  Motivation

Biological systems process information. Across the scale, from unicellular to complex organisms, components of these systems sense, process and produce molecular signals to regulate their activity and interact with the environment they live in (Cooper and Hausman, 2009; Alberts et al., 2007; Lodish et al., 2007). This self-regulatory character of biological systems is primarily based on precisely controlled (de-)activation of genes in particular situations that are characterized by certain molecular inputs. However, many cellular pathways are not simple cascades of gene activation that always generate the same output on the same input.
Often, a proper response must take into account conditions that the cell encountered previously. Hence, such pathways require memory to store information (Alberts et al., 2007). At first glance, DNA seems to be the natural choice for storing information, but in its organismal role DNA is a static storage entity for evolutionary-scale information conveyance (Figure 1.1). Manipulating this nucleic acid sequence usually has severe negative effects. Yet, how do organisms record, store and pass on information that changes frequently when manipulations to the DNA sequence might jeopardize the whole system?

Cells, especially in higher organisms with differentiated cell types, achieve the flexibility to cope with dynamic information by employing storage means above and beyond the genetic level of the DNA sequence (Bonn and Furlong, 2008; Istrail et al., 2007; Levine and Davidson, 2005). Such non-sequence based layers of (heritable) information are capable of altering gene expression and are termed epigenetic (Greek: epi-, above).

Figure 1.1: Cellular information storage and processing occurs across several highly connected and interdependent layers that compute and perform logical operations on different biomolecular substrates through various cellular functions. Besides permanent sequence-based information on the DNA level, RNA and chromatin provide greater plasticity and abstract encoding schemes that reflect dynamic aspects of information processing in the cell. In particular, they are capable of epigenetic information storage and propagation.

The picture of two of these regulatory layers, chromatin and RNA, has started to change profoundly over the past years. Chromatin, initially considered to be a mere packaging structure for DNA, now emerges as a highly dynamic entity with various means to control access to the DNA and to regulate transcriptional gene activity (Figure 1.1).
It is becoming increasingly clear that chromatin exists in multiple states that are dynamically shaped by environmental conditions and intrinsic factors to modulate cellular functions (Allis, 2007). Similarly, the discoveries of non-coding RNAs and associated pathways such as RNA interference (RNAi) broadened our understanding of RNA family members from rather passive substrates in the protein synthesis pathway to active players in a spectrum of cellular processes (Ketting, 2011). Even though research over the past years identified many parts on the molecular level of chromatin and RNA, many questions remain to be answered as to how these parts work together and influence cellular functions.

Addressing these questions is not motivated by biology alone; many of them have also been raised in the research field of in vivo biomolecular computing. The long-term goal of in vivo biomolecular computing is to build miniature computing machines that operate in live cells and interact directly with the host environment (Benenson, 2009b). Such biocompatible computing devices are envisioned to monitor, control and even reprogram cellular processes and to open new avenues for diagnosis and disease treatment, among many other applications. Proof-of-concept devices have already been demonstrated successfully, some even in human cells (Rinaudo et al., 2007).

However, their current design is challenged in two fundamental aspects. Firstly, information that these devices need or generate is encoded in nucleic acid sequences. Secondly, the computation manipulates the information, i.e. the nucleic acid molecules, which usually leads to their destruction and limits these devices to a single operational cycle (Benenson, 2009b). These characteristics conflict on one side with the biological constraints on sequence-based information in the cell and on the other side with the computational requirement of continuous operation to interact with cellular pathways.
These limitations of biomolecular computing could be overcome by employing alternative encoding schemes for information and computational state. Sequence-based information in the organismal context is static. Computation, however, demands the dynamic handling of information. Against this background, this work explores the epigenetic layers of chromatin and RNA. From the biological perspective, it contributes to elucidating their gene regulatory impact; from the viewpoint of biocompatible computing, it explores how their characteristics could be utilized for handling dynamic information.

1.2  Biomolecular computing

1.2.1  The computational paradigm

Biomolecular computing describes the interdisciplinary attempt of biology, chemistry and computer science to fundamentally understand information processing in living systems on one side and, on the other side, to utilize biomolecules and biochemistry to probe and manipulate natural systems or to engineer artificial systems for computing and communication. From the latter perspective, computing with biomolecules can be understood in the same way as other ‘unconventional computing’ ideas, including quantum (DiVincenzo, 1995), optical (Jain and Pratt, 1976) or even billiard ball computing (Fredkin and Toffoli, 1982). All of these strive to extend or explore alternatives to the computational architecture of von Neumann (Neumann, 1945) and/or the computational principles of Turing (Turing, 1937). These foundations have undoubtedly led to the pervasive success of silicon-based computers, but they have also revealed challenges that current computational systems face in terms of computational complexity, energy consumption, recyclability and the like.
While some of the proposed ‘unconventional’ computing substrates led to alternatives to von Neumann’s principles, none of them has (yet) solved a computational problem that cannot be solved with classical computers (assuming enough time and space resources from a computational complexity standpoint, even if they might be practically infeasible). Even the computational superiority of quantum computing is still debated and generally doubted (Bernstein and Vazirani, 1997). Nevertheless, an undeniable novelty of many of the unconventional approaches is that each of them broadened our view of computation and information.

In the case of biomolecular computing in particular, many initial works used instances of well-known problems in complexity theory to demonstrate the feasibility of the new type of computation (Braich et al., 2002; Liu et al., 2000; Ouyang et al., 1997). However, trading time for space complexity and exploiting the inherent parallel processing of biochemical reactions, as many of these attempts suggested, only scales to a certain degree (besides many other challenges, such as the error-proneness of molecular operations). So far, biomolecular computing cannot compete in solving problem instances of sizes that are intractable for classical, silicon-based computing.

One of the unique strengths of biomolecular computing rather lies in its compatibility with our physiology and in its potential to carry out operations in living organisms (Fu, 2007). In contrast to any other computational substrate, biological molecules can be used to build nanoscale computing devices that operate in and interact with living cells. Computing in this environment does not try to solve large-scale or hard problems in the sense of classical complexity. Instead, biocomputers interface with cellular processes to monitor, analyze and manipulate molecular signals, and thereby open new avenues for diagnostics and therapeutics, among many other applications.
Attempts in this direction are referred to as biocompatible computing (Shapiro and Benenson, 2006).

1.2.2  Towards DNA-based computation

The idea of biologically inspired computing is not as new as often stated. Although Adleman in the mid-1990s undoubtedly pushed the door wide open by convincingly exploiting biological molecules to solve an instance of a well-known computational problem (Adleman, 1994), many others paved the way for the research field as we see it today. Some of the earliest conceptual ideas of engineering on the molecular scale date back to Feynman in the late 1950s (Feynman, 1959). Even a decade earlier, von Neumann described the theoretical model of a self-replicating automaton in molecular vocabulary (Neumann, 1966). A few years later, Rössler and others reasoned about nanoscale computing and universal automata based on chemical reactions (Rössler, 1972; 1974), and Hjelmfelt and Ross later showed that (ideal) chemical kinetics are indeed Turing-complete (Hjelmfelt et al., 1991; 1992; Hjelmfelt and Ross, 1992).

Parallel to the evolving chemical perspective on molecular computing, the discoveries of the DNA structure (Watson and Crick, 1953) and the genetic code (Crick et al., 1961) and their fundamental roles in cellular information processing triggered similar ideas in biology. The unravelling of genetic modules, such as the lambda repressor (Ptashne, 1986), the lac operon (Müller-Hill, 1996), the tetracycline repressor (Hillen and Berens, 1994; Hinrichs et al., 1994) and the lux operon (Babloyantz and Nicolis, 1972; Dunny and Winans, 1999), led to the understanding that sets of biomolecules are often connected in networks, and that they convert molecular inputs into molecular outputs according to certain regulatory rules, a bioprocess that essentially is a computation (Regev and Shapiro, 2002).
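The view of a genetic module as a computation on molecular inputs can be made concrete with a deliberately simplified sketch. The Python snippet below treats the lac operon as a Boolean function of two inputs, using the common textbook idealization that the operon is strongly transcribed only when lactose is present and glucose is absent. This abstraction, and the names `lac_operon_expressed` and `truth_table`, are assumptions made for illustration only; they are not a model taken from this thesis, and real regulation is graded rather than binary.

```python
from itertools import product


def lac_operon_expressed(lactose_present: bool, glucose_present: bool) -> bool:
    """Boolean idealization (an assumption, not a quantitative model):
    the operon is strongly transcribed only with lactose and without glucose."""
    # Lactose (via allolactose) inactivates the LacI repressor.
    repressor_released = lactose_present
    # Low glucose raises cAMP, so the CAP activator can bind the promoter.
    activator_bound = not glucose_present
    return repressor_released and activator_bound


def truth_table():
    """Enumerate all input combinations as (lactose, glucose, expressed) tuples."""
    return [
        (lactose, glucose, lac_operon_expressed(lactose, glucose))
        for lactose, glucose in product([False, True], repeat=2)
    ]


if __name__ == "__main__":
    for lactose, glucose, expressed in truth_table():
        print(f"lactose={lactose!s:5} glucose={glucose!s:5} -> expressed={expressed}")
```

Read this way, the module behaves like a two-input gate (lactose AND NOT glucose), which is the sense in which regulatory rules map molecular inputs to molecular outputs.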
Together with the growing portfolio of laboratory techniques, such as DNA recombination, sequencing and synthesis, the field of biology continuously developed its own approaches and tools to probe and manipulate the organismal machineries of computation.

Adleman, a computer scientist, blended the theoretical potential and practical feasibility of molecular operations and applied them in light of computation in the classical sense (Adleman, 1994). His proof-of-concept computation used DNA molecules as a primary substrate, which coined the term DNA computing that became virtually synonymous with the entire field of biomolecular computing. Most of the approaches and information encoding schemes his work inspired in the following years were based on DNA (nicely reviewed in Benenson, 2009b; Condon, 2006). Inspired by the informational role of DNA in the cell, its stability and sequence properties were considered ideal for information storage and processing (Condon, 2006), and many creative works led to a colourful bouquet of ideas, ranging from theoretical proposals of DNA Turing machines (Lipton and Baum, 1996), to successful demonstrations of algorithmic self-assembly of DNA molecules (Douglas et al., 2009), to innovative uses of DNA structure (Shapiro and Benenson, 2006) and many others.

The field of DNA computing has continued to expand and diversify since then, and connections have been established with other research areas, such as Synthetic Biology and Nanotechnology. With respect to the latter, DNA molecules were used to build rigid 2D or even 3D scaffolds (Rothemund, 2006; Seeman, 2003) that allow assembly of other biological molecules (Niemeyer et al., 2002) or non-biological components (Xin and Woolley, 2003). Even locomotion and transport based on DNA molecules have been demonstrated (comprehensively reviewed in Simmel and Dittmer, 2005).
At the interface with Synthetic Biology, several works successfully demonstrated engineered computational elements of biomolecules that are inspired by electronic circuits. Beginning with Gardner’s toggle switch (Gardner et al., 2000) and Elowitz’s repressilator (Elowitz and Leibler, 2000), efforts towards standardizing parts began (Group et al., 2006; Canton et al., 2008; Weiss, 2001) that resulted in libraries with more than 3000 elements as of today.

From the perspective of biocompatible computing, recent works demonstrated that automata of biomolecules are capable of sensing molecular signals, evaluating cellular conditions and generating molecular outputs that directly affect cellular functions or that can be measured by an external observer (Benenson et al., 2004). However, DNA-based automata are challenged in several ways in in vivo environments (Ezziane, 2006). Information and computational state encoded in the sequence or length of DNA molecules must be updated frequently in the course of a computation, and since many changes are not reversible and/or the information-encoding molecules degrade during the computational process, DNA automata are limited to a single operational cycle. Thus, one of the most critical obstacles towards continuous computation in vivo is the character of DNA as computational substrate itself (Benenson, 2009b). DNA in its organismal role is a static molecule for information transport on evolutionary time scales. Manipulating this nucleic acid sequence usually has severe negative effects. Computation, in contrast, is intrinsically dynamic and, hence, demands substrates that support this plasticity.

1.2.3  Beyond DNA-based computation

Given the challenges that DNA-based in vivo computing faces, in particular the static organismal character of DNA in contrast to the dynamic nature of computation, alternative biomolecules were considered by the field.
According to the central dogma of molecular biology (Crick, 1970), two other major classes of molecules are carriers of information in the cell: RNAs and proteins. Protein-based computations were explored in the context of enzymatic activity by integrating molecular inputs into logic functions (Baron et al., 2006a;b), combining logic gates into networks (Niazov et al., 2006; Privman et al., 2008) and even implementing Turing machine-like operations (Bar-Ziv et al., 2002). However, since our understanding of peptide/enzyme engineering is still limited (Benenson, 2009b), more complex protein-based computations comparable to existing DNA-based systems remain to be demonstrated.

Similarly, RNA, given its higher instability and rapid turnover, remained in the shadow of DNA as computational substrate for quite a while. Although earlier works exist (Faulhammer et al., 2000; Stojanovic and Stefanovic, 2003), it was not until in vivo computing faced the limitations of DNA that RNA received broader attention (Benenson, 2009a). Then, the dynamic characteristics of RNA, such as the possibility of synthesizing new RNA in the cell, together with the discoveries of novel RNA regulatory pathways, such as RNA interference, led to increasing importance of RNA-based computations (Benenson, 2009a).

Besides RNA interference (see Section 1.3.2 for details) and small regulatory RNAs, riboswitches received attention. Riboswitches exist in nature as meta-stable structures, such as stem-loops, in mRNA transcripts (Mandal et al., 2003; Winkler and Breaker, 2005), and their conformational state can be dynamically altered by an array of molecular signals and inputs, such as temperature, small molecules and other RNAs (Isaacs et al., 2004).
Riboswitch(-like) structures can be optimized or designed de novo (Beisel and Smolke, 2009; Babiskin and Smolke, 2011) for a given input such that state changes approximate discrete behaviour, and multiple riboswitches can be introduced on a single transcript or linked in trans across multiple ones to implement logic functions (Bayer and Smolke, 2005; Win and Smolke, 2007a; 2008).

While riboswitches are mostly found in bacteria (Henkin, 2008), RNA interference plays a key regulatory role in higher organisms (Fire et al., 1998). For detailed modalities of this pathway from a biological perspective, please refer to Section 1.3.2. In biomolecular computing, RNA interference has been successfully used to implement the largest in vivo computation based on Boolean logic so far (Rinaudo et al., 2007). Current research tries to broaden the range of input molecules and flexibly integrate them with riboswitches (An et al., 2006; Tuleuova et al., 2008; Beisel et al., 2008) or RNA interference platforms (Leisner et al., 2010; Xie et al., 2010).

The increasing understanding of the regulatory roles of RNAs in the cell, combined with the importance of their associated pathways, makes them attractive substrates and/or building blocks for molecular computers, especially in the envisioned diagnostic and therapeutic applications (Isaacs et al., 2006; Davidson and Ellington, 2007; Benenson, 2009a). The ability to rationally design or tailor RNA regulators (Ellington and Szostak, 1990) can enable computational devices to access a large variety of (extra-)cellular signals. Taken together, these features seem to position RNA-based approaches as key elements towards biocompatible systems.

From a biological perspective, research over the past years revealed that cellular computation, i.e.
gene regulation, cannot be fully explained by just cis-regulatory networks (Bonn and Furlong, 2008; Istrail et al., 2007; Levine and Davidson, 2005) composed of transcription factors, signalling proteins and (non-coding) RNAs. Chromatin, the structure that packages DNA in the nucleus, emerged as another layer of regulatory impact (see Section 1.3.1 for details). From a biomolecular computing viewpoint, practical attempts to utilize chromatin have not been made as of the writing of this thesis; first theoretical works, however, suggest great potential of chromatin as computational substrate, especially in terms of memory capacity and functional impact (Prohaska et al., 2010).

1.3  Epigenetic regulation in the cell

1.3.1  Chromatin

Chromatin architecture

In eukaryotic cells, the macro-molecule of DNA is tightly compacted into the nucleus of a few microns in diameter. This packaged structure is called chromatin and appears as regularly spaced ‘beads on a string’ under the electron microscope (Olins and Olins, 1974). The ellipsoidal ‘beads’ are the fundamental architectural units of the eukaryotic genome and are referred to as nucleosomes (Kornberg, 1977; Oudet et al., 1975). Each nucleosome contains ∼147 base pairs (bp) of DNA wound in 1¾ turns around an octamer of histone proteins: two heterodimers of core histones H2A and H2B and a tetramer of histones H3 and H4 (Arents et al., 1991; Luger et al., 1997).

As illustrated in Figure 1.2, the architectural level of nucleosomes is the first one in the hierarchy of increasingly complex layers, which together determine the chromatin structure (Misteli, 2007). Above the nucleosomes, on the second level, chromatin folds into higher-order structures known as fibres (Woodcock, 2006).
Besides further compaction, these structural features may bring distant genome regions into close proximity, facilitating the interactions of enzymatic complexes and spatially joining nonadjacent genetic elements (Fraser, 2006; van Driel et al., 2003). On the third level, chromatin fiber looping, together with the spatial and temporal organization of enzymatic machineries, defines sub-nuclear compartments within the nucleus which facilitate cellular processes such as DNA replication, repair and transcription (Misteli, 2005).

Being at the first level of this architecture, nucleosomes indeed play a fundamental role in genome organization and regulation. They exhibit great plasticity, both in their composition and localization, and are subject to an array of post-translational histone modifications (Kouzarides, 2007) (see Section 1.3.1). The stability of these modifications varies. Some are transient, characterized by fast enzymatic removal or rapid histone turn-over, whereas others are more permanent, mitotically or even meiotically heritable and capable of affecting the next generation of cells (Youngson and Whitelaw, 2008).

Figure 1.2: Depiction of the architectural hierarchy of chromatin and some of its known modifications. In the cell nucleus, DNA is compacted into the dense chromosome structure, which in turn is composed of chromatin loops and fibers. The basic units of chromatin are nucleosomes, consisting of histone proteins at the core that are subject to post-translational modifications such as acetylation (Ac), methylation (Me) and ubiquitination (Ub). Together with histone modifications, histone variants such as H2A.Z (Z) and DNA methylation (Me) characterize distinct chromatin neighbourhoods.

Since some modifications are stable and heritable, chromatin has become a focus in the field of epigenetics (Bird, 2007).
In this context, epigenetic modifications refer to changes in genome function such as gene expression that are not based on changes of the DNA sequence (Ptashne, 2007; Goldberg et al., 2007). Although heritability of the modification is required by most definitions of the term epigenetic, there are examples of cells that no longer divide and still possess non-sequence-based modulation of cellular functions via chromatin modifications (Bonasio et al., 2010; Dulac, 2010). The term somatic epitype has been proposed additionally for non-heritable epigenetic modifications (Lahiri and Maloney, 2006), but has not found wide usage so far.

The specific histone modification patterns and the structural composition of chromatin at any point in time serve as epigenetic memory in the cell that allows integrating and storing input signals and generating functional outputs on cellular processes (Filion et al., 2010). It has been shown that past cell states and environmental influences, such as nutrition, exposure to toxic agents, infections and diseases, affect the chromatin structure (Zhang and Meaney, 2010). The capability of integrating such (environmental) signals in a rapid, flexible and often reversible manner into the program of the genome provides the regulatory mechanisms to adapt functions and pathways in higher organisms (Prohaska et al., 2010).

A classic example of the purpose of this memory platform, on which input signals are stored as epigenetic traces and influence cellular functions (at a future point in time), is given by the establishment and maintenance of cell identity in higher organisms (Satterlee et al., 2010). Even though every cell of an organism is based on a single zygotic genome, subsets of progeny cells launch distinct programs of gene expression along the developmental trajectory that eventually leads to their specific organismal character and role.
These cell identities are usually maintained for a lifetime, even when differentiation signals were experienced only once during embryonic development (Bonasio et al., 2010). The induced epigenetic traces capture and stabilize these signals and render the new expression states heritable (in those cells that continue to divide) (Bryant et al., 2008; Ptashne, 2009).

Chromatin dynamics

Over the course of almost a century after its initial description in the early 1880s (Flemming, 1882), chromatin slowly emerged as a (passive) scaffolding structure for the DNA. The fast pace of discoveries in the field of genetics, however, virtually eclipsed research on chromatin until the structure of the nucleosome was revealed (Kornberg, 1974). Against common belief up to that point that histone proteins would simply coat the DNA, the newly proposed structure put DNA on the ‘outside’ of nucleosomes (Kornberg, 1974; Oudet et al., 1975), suggesting they must be actively involved in making segments of DNA accessible to cellular machineries. However, the active role of chromatin in regulatory processes, in particular transcription, was not apparent until the description of the transcription-linked histone modifiers from Tetrahymena (Brownell et al., 1996) and mammalian cells (Taunton et al., 1996). Since then, the research fields of transcription and chromatin have been connected and changed the understanding of chromatin from a rigid scaffold to a dynamic structure.

The fact that nucleosomes unite both stability and plasticity in order to maintain the DNA in a compacted form while still allowing controlled access to the underlying genetic information demonstrates their regulatory importance for rendering genomic loci accessible or inaccessible for proteins in DNA-templated processes (Kornberg and Lorch, 1999).
Today, chromatin regulation is understood to be one of the fundamental modes in gene regulation and cellular information processing in eukaryotic cells (Prohaska et al., 2010).

To control the impact of nucleosomes on chromatin structure and processes, cells employ two main strategies: the first is chromatin remodelling (Cairns, 2007); the second works through post-translational modifications of histones (Bhaumik et al., 2007; Kouzarides, 2007).

In the first mode, chromatin remodelling enzymes use energy from ATP molecules to slide, eject or change the composition of nucleosomes (Clapier and Cairns, 2009). They play important roles in altering histone-DNA contacts, in making DNA accessible during transcription and repair processes, and in reassembling the proper chromatin structure after DNA replication (Cairns, 2007). Chromatin remodellers are also capable of replacing canonical histone proteins with histone variants that differ in amino acid composition (Clapier and Cairns, 2009). Dynamic repositioning of nucleosomes and introduction of histone variants through chromatin remodellers or other modifiers yields distinct chromatin configurations or neighbourhoods characterized by nucleosome occupancy and composition. These neighbourhoods may impact cellular functions by being recognizable to and/or actively recruiting enzymatic machineries in DNA-templated processes (Talbert and Henikoff, 2010).

In the second mode, histone proteins are subject to post-translational modifications (Figure 1.2), including acetylation, methylation, phosphorylation, SUMOylation, ubiquitination, adenosine-diphosphate ribosylation and biotinylation (Kouzarides, 2007). So far, modifications on more than one hundred histone amino acid residues have been identified. Histone modifications can occur in certain combinations and may influence each other in a process referred to as regulatory crosstalk (Latham and Dent, 2007).
Influence on other histone modifications can occur in cis on the same histone, or in trans across histones and/or nucleosomes, leading to spatial and temporal dependencies of modifications (Suganuma and Workman, 2008).

Histone modifications influence the chromatin architecture twofold: In a direct way, they may change the chromatin structure through charge-dependent alterations affecting the contacts between adjacent nucleosomes or between histones and the DNA (Simpson, 1978; Ausio and van Holde, 1986; Ura et al., 1997; Ahn et al., 2005). Classically, regions with ‘loose’ wrapping and less condensed packaging were referred to as euchromatin, whereas those with tight nucleosome packaging were referred to as heterochromatin (Allis, 2007). In a more indirect way, individual histone modifications or combinations of them may act as recognition signals for the recruitment of regulatory proteins that trigger functional responses (Berger, 2007; Rando and Chang, 2009). This hypothesis is known as the effector-mediated model, where signalling modules exist that operate as ‘writer’, ‘eraser’ and/or ‘reader’ of chromatin marks (Ruthenburg et al., 2007). Many of the identified chromatin-modifying complexes conform to this hypothesis and have enzymatic domains to add (write), remove (erase) or bind (read) certain modifications (Ruthenburg et al., 2007).

The capability of writing, reading and erasing modification signals on histones without affecting the underlying DNA sequence constitutes an additional computational layer for storing and retrieving information in the cell (Prohaska et al., 2010). Observed correlations between certain histone modifications and cellular functions, such as the transcriptional activity of genes, led to the proposal that, on top of the genetic code, there exists an epigenetic histone code (Jenuwein and Allis, 2001; Strahl and Allis, 2000; Turner, 2007), which would, for example, allow deriving the transcriptional state of the gene by determining its histone marks.
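As a rough illustration of the ‘writer’, ‘eraser’ and ‘reader’ abstraction described above, the following Python sketch treats a nucleosome as a set of mark labels that is stored independently of any DNA sequence. The class, the function names and the rule that a reader binds only when all required marks are present are assumptions made for this example; they are not taken from the cited works and do not model real enzyme kinetics or specificity.

```python
from dataclasses import dataclass, field


@dataclass
class Nucleosome:
    """Toy nucleosome: a set of mark labels, independent of any DNA sequence."""
    marks: set = field(default_factory=set)


def write(nuc: Nucleosome, mark: str) -> None:
    """'Writer' activity: add a mark (no effect if it is already present)."""
    nuc.marks.add(mark)


def erase(nuc: Nucleosome, mark: str) -> None:
    """'Eraser' activity: remove a mark (no effect if it is absent)."""
    nuc.marks.discard(mark)


def read(nuc: Nucleosome, required: set) -> bool:
    """'Reader' activity (assumed rule): bind only if all required marks are present."""
    return required <= nuc.marks


# Mark labels are used purely as strings; no biological rule is implied.
nuc = Nucleosome()
write(nuc, "H3K4me3")
write(nuc, "H2BK123ub")
bound_before = read(nuc, {"H3K4me3", "H2BK123ub"})  # both marks present
erase(nuc, "H2BK123ub")
bound_after = read(nuc, {"H3K4me3", "H2BK123ub"})   # one mark removed
```

In this toy picture, the stored state changes only through write and erase operations, and a reader's response depends on the combination of marks rather than on a single one.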
However, recent research indicates that—from a biological perspective—histone modifications do not represent a self-contained code which can be understood detached from the context. According to this view, histone modifications and their functional impact must be seen in light of their temporal and inter-relational dependencies, which would render the idea of a code into a ‘nuanced language’ that constitutes the basis for transcriptional regulation through the chromatin signalling pathway (Lee et al., 2010). From a more computational perspective, however, the ambiguity or sometimes even the lack of (immediate) functional effects of certain chromatin modifications on cellular processes might hint at a decoupling event that took place during evolution and separated the epigenetic modification layer from direct biochemical functions, making it available for information storage and computation in a way classical regulatory effectors cannot work (Prohaska et al., 2010). Current research is attempting to generate genome-wide maps for diverse chromatin modifications (Milosavljevic, 2010). These maps will broaden our understanding of epigenetic regulation and are likely to increase the translational potential of the field, in particular with respect to diagnostic and therapeutic applications in medicine as improper regulation of chromatin marks has been linked with several diseases, including imprinting disorders, Rett syndrome, facioscapulohumeral muscular dystrophy and even autism (Feinberg, 2010). Some of the most interesting advances in cancer biology, particularly leukaemias, are attributable to changes in the epigenome, such as abnormalities in histone marks in the promoter region of genes or aberrant methylation of DNA at CpG islands and microRNAs (Nature Biotechnology, 2010). However, given the plasticity of many chromatin modifications, mapping them can only be a first step towards a coherent picture of their dynamics and impact. 
It will become necessary to map their localization at different points in time, in different tissues, in healthy and diseased cells, etc. to further understand the molecular mechanisms by which they mediate their role for the functioning of the cell. Furthermore, deciphering the histone code or language, if there is any, is likely to require information from higher-order architectural dimensions of chromatin (see Section 1.3.2) and the interplay between DNA and/or chromatin modifications and RNAs, transcription factors, nuclear organizing factors and signal transduction pathways (Nature Biotechnology, 2010).

The current stage of epigenetic research seems comparable to genetic research before deciphering the DNA structure and the genetic code (Henikoff and Grosveld, 2008). If this comparison is true, the hopes of a revolutionary impact of epigenetics in many fields might not be unreasonable.

1.3.2  RNA interference

Operational principle

Conserved across a wide spectrum of eukaryotic organisms, RNA interference serves on one side as a natural defense mechanism in the cell against viruses and transposons and, on the other side, as a regulatory system that is key in development and gene expression in general (Zamore, 2002). With respect to the latter aspect, RNAi is part of the cellular machinery controlling which genes are active and tuning to which degree they are active. Intrinsically, RNAi is a negative feedback process to silence genes.

As described in Section 1.3.2, RNA interference may embrace processes or features of the chromatin layer in several organisms, but the ‘classic’ operational principle is based on the post-transcriptional level. There, the protein machinery of RNAi controls the abundance of messenger RNA (mRNA) transcripts through small-interfering RNA (siRNA) molecules that base-pair with homologous regions on mRNA targets and induce their degradation (Fire et al., 1998).
Feedback loops in that process are capable of sustaining, reinforcing and modulating the interfering impact (Lipardi et al., 2001; Plasterk, 2002; Sijen et al., 2001; Zamore, 2002). Thereby, RNAi is capable of altering the information that is encoded and conveyed in mRNA expression levels.

Initially, the phenomenon of RNA interference, though termed differently, was discovered in plants (Ecker and Davis, 1986; Napoli et al., 1990). While trying to increase the colour intensity of a flower, additional copies of a gene had been introduced into the plant’s genome. However, instead of the expected overexpression of the gene and intensified colour, substantially reduced transcript levels and weaker colour were observed. The molecular mechanisms for this phenomenon initially remained elusive, and research in many organisms began to uncover them. The observations made in these organisms were labelled differently, such as post-transcriptional gene silencing in plants, RNAi in animals, quelling in fungi and virus-induced gene silencing (Agrawal et al., 2003). It took almost a decade until research in the nematode worm Caenorhabditis elegans identified double-stranded RNA (dsRNA) molecules as the silencing trigger (Fire et al., 1998).

Generally, dsRNA triggers for RNAi exist in two major forms in the eukaryotic cell (Figure 1.3): microRNAs (miRNAs), which arise from non-coding RNAs in the host genome, and small-interfering RNAs (siRNAs) from selfish or exogenous genetic material (Ding and Voinnet, 2007). Although both types have common features and share the same protein machinery (Gregory et al., 2006), they differ in their biogenesis, final structure and function.
A newly transcribed miRNA, called pri-miRNA, is first cleaved into a pre-miRNA by the Drosha complex such that the remaining sequence of ∼70 nt in length folds into a hairpin structure, which is then subject to further processing by the Dicer complex and finally leads to a mature miRNA of ∼22 nt (Gregory et al., 2006). In contrast, the biogenesis of siRNAs begins with a long selfish or invasive dsRNA trigger, such as the genetic material of a virus (Ding and Voinnet, 2007), which is first cleaved by Dicer into several short duplexes of ∼20 nt in length. Subsequently, each duplex is separated such that one part becomes a mature siRNA.

Figure 1.3: Schematic diagram of core components and process steps of RNA interference (RNAi). The RNAi pathways are guided by small ncRNAs, primarily siRNAs and miRNAs. In the siRNA pathway, (exogenous) dsRNAs are cleaved by Dicer into siRNA duplexes of which one strand is incorporated into the RISC complex that targets and typically cleaves mRNAs based on perfect sequence homology. (For other siRNA-based processing modes, refer to Section 2.3.) In the miRNA pathway, endogenously produced transcripts (pri-miRNAs) are first processed by the Drosha complex into pre-miRNAs which are subsequently exported into the cytoplasm where Dicer cleaves the stem-loop structure into the final miRNA. Once one strand of a mature miRNA is loaded into RISC, mRNAs are targeted and translationally inhibited. In contrast to siRNAs, sequence homology between miRNA and target is typically imperfect.

Once produced, the pathways for miRNA and siRNA converge at the RNA-induced Silencing Complex (RISC), the core component of the RNAi protein machinery (Rana, 2007). Both miRNAs and siRNAs can be loaded into a RISC complex and guide it to mRNA transcripts that share sequence similarity with the trigger molecule (Figure 1.3). MiRNAs induce gene silencing by base-pairing with regions of an mRNA target and blocking the translational machinery.
The pairing with the target happens at the post-transcriptional level in the cytoplasm but is often incomplete, with several base-pairs mismatching. SiRNAs, on the other hand, generally base-pair perfectly with a site on the target and post-transcriptionally silence the gene by inducing cleavage of the transcript.

Even though differences in the modalities and features of the RNAi pathway exist between organisms, the phenomenon continues to converge towards a single universal theme of gene regulation: the common components of this regulatory pathway are that (i) the interference is triggered by a dsRNA molecule and that (ii) transcripts are targeted in a sequence-dependent manner by (iii) a ncRNA-controlled protein machinery (Agrawal et al., 2003).

The RNA interference pathway is not only a cellular machinery of intriguing universality but also represents a powerful technique that led to many applications in a spectrum of research fields, ranging from biology to medicine and even to biomolecular computing (see Section 1.2.3). As a tool in biology, RNAi allows for rapid determination of gene function in many organisms (Boutros et al., 2004; Kamath and Ahringer, 2003). RNAi has also found applications in the context of therapeutics. Given the high sequence specificity regarding genes, siRNAs can on one side be used to validate drug targets and on the other side to effectively modulate gene expression in disease cells (Bitko and Barik, 2001; McManus and Sharp, 2002; Moss, 2003). For future medical applications, RNAi is attributed great potential to combat carcinomas, myelomas and cancers caused by overexpression of oncoproteins (Tuschl and Borkhardt, 2002).

For the field of biomolecular computing, RNA interference represents a promising platform to realize computations with biomolecules in live cells.
Circuits of Boolean logic elements based on RNAi have already been demonstrated successfully (Rinaudo et al., 2007) and efforts are under way to make a variety of (extra-)cellular signals available for RNAi-computers (An et al., 2006; Beisel et al., 2008; Culler et al., 2010; Tuleuova et al., 2008).

Connections between chromatin and RNAi-like processes

Chromatin and RNA interference play key roles in epigenetic phenomena and work in tandem to realize an array of cellular functions, including gene regulation, heterochromatin formation, DNA methylation and programmed DNA elimination (Mochizuki et al., 2002; Pal-Bhadra et al., 2002; Tabara et al., 1999; Taverna et al., 2002; Zilberman et al., 2003).

While the ‘classic’ RNA interference pathway represents a form of post-transcriptional gene silencing (PTGS) and impacts messages in the cytoplasm, its regulatory intersection with the chromatin layer enables RNAi(-like) mechanisms of transcriptional gene silencing (TGS) in the nucleus. Analogous to the sequence-driven recognition of mRNA targets in PTGS, small non-coding RNAs (ncRNAs) are also key in TGS for identifying genomic sites to trigger chromatin modifications through ncRNA-DNA interactions (Jones et al., 2001; Mette et al., 2000). Both PTGS and TGS modes of RNAi share core enzymatic machineries and operational principles that impact gene activity.

The function of RNAi as TGS by regulating the epigenetic structuring of the genome is conserved across many species (Grewal and Rice, 2004). In Schizosaccharomyces pombe, core proteins of RNAi complexes are required for heterochromatin formation (Grewal and Elgin, 2007). For instance, the equivalent of the RISC complex, called RNA-Induced Transcriptional Silencing complex (RITS) (Verdel et al., 2004), contains both a chromatin-associated protein and an RNAi-associated protein and, together with the RNA-Directed RNA polymerase Complex (RDRC), represents a critical component in heterochromatin assembly.
Deletion of any of the genes encoding these proteins results in defects in histone methylation and centromere formation, which disturbs proper iteration through cell cycle stages (Hall et al., 2002; Volpe et al., 2002; 2003). Furthermore, components of the RNAi machinery together with the histone variant H2A.Z suppress antisense transcripts in the S. pombe genome (Zofall et al., 2009), which represents another intriguing example of the deep connection between chromatin and RNAi for proper mRNA processing and genome stability.

In Drosophila melanogaster and Tetrahymena, RNAi-like mechanisms are involved in TGS by affecting histone methylation and gene silencing proteins in heterochromatic regions (Pal-Bhadra et al., 2002; 2004; Mochizuki et al., 2002; Taverna et al., 2002; Liu et al., 2004).

Plants too share a similar set of RNAi proteins and enzymatic activities that are required for structuring the epigenome. In Arabidopsis, histone methylation, RNA-directed DNA methylation (RdDM) and targeting of repetitive DNA sequences require an RNA-driven RNAi-like machinery (Matzke et al., 2002; Zilberman et al., 2003; Onodera et al., 2005; Gao et al., 2010; Lister et al., 2008; Cokus et al., 2008).

In mammals, RNAi has been associated with epigenetic chromatin mechanisms for inactivation of one of the X-chromosomes in female offspring. In that process, a long ncRNA, called Xist, is transcribed from the inactive X-chromosome and induces/maintains chromosome-wide gene silencing by recruiting silencing complexes in cis (Brockdorff et al., 1992; Brown et al., 1992; Clemson et al., 1996; Penny et al., 1996; Marahrens et al., 1997). Paradoxically, silencing does not affect the Xist gene itself. Instead, the recruited chromatin-modifying complexes maintain its expression in a positive feedback loop.
On the active X-chromosome, in contrast, the antisense transcript of Xist, Tsix, is expressed and binds to Xist, forming a dsRNA duplex (Lee et al., 1999; Lee and Lu, 1999; Lee, 2000; Luikenhuis et al., 2001; Sado et al., 2001). This duplex is proposed to be processed in an RNAi-like Dicer-dependent manner, which causes downregulation of Xist on the active X-chromosome (Ogawa et al., 2008). It is further proposed that the RNAi pathway is involved in the regulation of chromatin modifications and in the spreading of silencing on the inactive X-chromosome (Ogawa et al., 2008).

Besides their role in chromatin structuring in cis, long ncRNAs also emerged as important regulators of the epigenome from a second perspective. HOTAIR, for example, is a long ncRNA that is proposed to act in trans by providing a scaffold for chromatin-modifying complexes (Tsai et al., 2010). In binding to two distinct complexes at the same time, HOTAIR mediates coordinated methylation and demethylation of two amino acid residues on histone H3, and thereby inherently specifies a pattern of chromatin modifications to silence the genes it is recruited to.

While some of the detailed molecular mechanisms remain to be elucidated, these examples clearly demonstrate a strong connection between RNA- and chromatin-mediated processes for cellular functions. Given that large portions of the genome are transcribed and only a fraction translated into proteins, it is likely that other examples and mechanisms enlarge the regulatory overlap of both layers (Lee, 2010).

1.4  Thesis contribution

With the work presented in this thesis, I elucidate fundamental aspects of information representation and manipulation in bioprocesses from both the Bioinformatics as well as the Biomolecular Computing perspective.

From the Bioinformatics side, I focus on chromatin, the nucleo-protein structure that packages the DNA in the cell.
As a computation and storage layer in the cell, chromatin provides high plasticity to integrate (environmental) signals into DNA-based regulation and allows information propagation according to epigenetic principles. In this work, I present results on the complexity of chromatin modifications and reveal their impact on gene activity and related functions in the cell. I further discuss the software pipeline that I developed and that was employed for genome-wide chromatin profiling and transcriptome analysis.

From the Biomolecular Computing perspective, I draw on epigenetic principles to devise a model for an RNA interference-based equivalent of the electric flip-flop, which is one of the fundamental elements in digital circuits. In particular, I focus on the digital switch abstraction of the flip-flop as it is a central idea in both the electric and biomolecular world to connect and at the same time decouple the underlying physical process from the abstract representation of information. This work explores the computational principles of the RNA interference machinery and suggests novel ideas for universal memory units in biomolecular computing.

By complementing both the natural and theoretical perspective, I attempt to enhance our understanding of epigenetic information processing in the cell and its capacity for biocomputing. In particular, the projects presented in this dissertation led to the following findings and contributions:

• Potential for an RNA interference-based flip-flop: I devise a mathematical model for the biomolecular equivalent of the electronic flip-flop using RNA interference and speculate how this system could potentially be used in biocompatible computing devices as a memory unit. The novelty of this work is centered on the non-sequence-based abstract encoding scheme for digital information and how the RNAi machinery might unite analog and digital features for computing in the organismal context.
• Non-redundancy of residue methylation states in chromatin architectural proteins: My analysis helped reveal that di- and trimethylation of histone H3 lysine 79 are mutually exclusive across the genome and that dimethylation associates with specific sets of cell cycle-regulated genes. This work is important as it answers the long-debated question in the field of chromatin biology of whether H3 lysine 79 methylation states are functionally redundant and further reveals a novel connection between the epigenetic state of chromatin and transcriptional gene activity. From the biocomputing perspective, the fluctuation and distinct localization of H3K79 methylation during the cell cycle could indicate a toggle mechanism that signals a particular system state to cellular machineries and discriminates marked genes from others, thereby triggering the correct iteration through the cell cycle program similar to a finite state machine, with cycle stages being machine states, chromatin modifications being (part of) transition events and activation of certain genes being generated outputs.
• Cross-talk of residue modifications in chromatin architectural proteins: In this project, my analysis aided in establishing the first genome-wide profile of histone H2B lysine 123 monoubiquitination in S. cerevisiae and in characterizing H2B lysine 123 monoubiquitination as a signalling tag that is required for proper establishment of particular H3 residue modifications. With respect to the underlying computational principles, the H2B lysine 123 monoubiquitination pathway demonstrates how certain histone modifications trigger others in the context of transcriptional gene activation. Drawing again on the finite state machine abstraction, the transcriptional program can be divided into certain states that need to be executed in a certain order.
The presented sequence and combination of histone modifications can be regarded as (part of the) input to trigger correct state transitions, and the recruitment of enzymatic complexes as well as the establishment of downstream histone marks as generated outputs of such a machine.

• Functional non-redundancy of protein function in the context of chromatin: My work contributed to the discovery that the histone deubiquitinases, Ubp8 and Ubp10, act on distinct genomic loci. While Ubp8 removes the ubiquitin tag from H2B lysine 123 at sites enriched for trimethylated H3 lysine 4, Ubp10 functions at those marked by trimethylated H3 lysine 79. This work made it possible to answer the question of functional redundancy of Ubp8 and Ubp10 and to reveal genome-wide dependencies of histone marks. Regarding potential biomolecular implications, the discussed results point towards principles by which identical enzymatic activity can be recruited to distinct genomic loci in a histone modification-dependent manner, supporting the hypothesized histone code.

• Aberrant transcripts and causalities regarding chromatin structure: The analysis I performed helped elucidate the chromatin structure at sites of cryptic transcription and determine causalities with respect to certain histone modifications. From the biomolecular computing viewpoint, the data suggest that certain chromatin modifications have no (direct) impact on gene activity, hence offering essential degrees of freedom to exploit these resources for information encoding in biocomputers with fewer interferences regarding cellular function.

• Software tools for chromatin modification profiling and transcriptome analysis: As a product of studying chromatin modifications, I developed several software tools that allow analyzing ChIP-on-chip, gene expression and related data. I also implemented tools for visualizing genome-wide data and made them publicly available as a plug-in for the Galaxy bioinformatics environment.
Chapter 1 has motivated the idea of biomolecular computing, specifically biocompatible approaches. It has further introduced key concepts and terminologies of RNA interference, chromatin biology and their role in cellular information processing. In the remainder of this thesis, Chapter 2 describes the RNA interference-based flip-flop model and presents its mathematical analysis. In Chapter 3, the bioinformatics pipeline that I developed is explained and used to analyze whole-genome chromatin modification maps and gene expression data. Chapter 4 puts the results derived through the pipeline in their biological context and studies aspects of chromatin structure and regulation and their impact on cellular functions. Chapter 5 finally presents the conclusions of this work and embeds them into the bigger picture of information processing in living systems.

Chapter 2

RNA interference and the potential for a bistable flip-flop

2.1  Synopsis

RNA interference continues to emerge as one of the central regulatory mechanisms in the cell and it is found across a wide spectrum of eukaryotic organisms. Uniquely, the protein machinery of RNAi is capable of generating arbitrary affector molecules and using them to identify and influence target molecules in a completely general yet highly specific manner. Feedback loops in that process relate affector and target molecules as inputs and outputs, suggesting that RNAi can sustain a general form of cellular computation. In this chapter, we speculate on how eukaryotic cells might use RNAi to generate a form of epigenetic memory at the RNA level and realize computations based on sequential logic. We present a model for an RNAi-based equivalent of the well-known electric flip-flop, the fundamental memory unit in digital circuits, and reason how natural and engineered systems might use it.
The presented work aims to elucidate abstract principles of RNAi from a biological perspective and suggests ideas for universal memory units from a biomolecular computing standpoint.

2.2  Background

Biological systems process information. From simple to complex organisms, cellular components must sense, process and produce molecular signals so as to respond and interact appropriately with their environment (Cooper and Hausman, 2009; Alberts et al., 2007; Lodish et al., 2007). This regulation is primarily based on precisely controlled and highly contextual (de-)activation of genes in particular cellular situations. Since gene products often provide additional regulatory control that activates or represses further downstream genes, regulatory cascades have evolved that allow an appropriate cellular response to particular environmental cues. A key feature of such regulatory cascades is whether or not they can feed back on themselves. In the absence of feedback, the differential equations describing such systems will always give a cellular response that is specified by a given environmental condition. Often, however, it is biologically useful and necessary to remember past cellular conditions and respond appropriately from this context. Processes like the iteration through the cell cycle and the unfolding of the developmental program in higher organisms exemplify various cellular programs that are stateful, and demonstrate that the underlying computational principles are sequential rather than combinatorial in nature (Paul Hill, 1989). Consequently, these regulatory programs require a memory to store information (Alberts et al., 2007).
Memory can be generated on the genetic level by linking regulatory elements through feedback mechanisms as exemplified by the lambda repressor (Ptashne, 1986), the lac operon (Müller-Hill, 1996), the tetracycline repressor (Hillen and Berens, 1994; Hinrichs et al., 1994), and the lux operon (Babloyantz and Nicolis, 1972; Dunny and Winans, 1999). These regulatory systems capture transient stimuli, modulate associated cellular processes and maintain the process states beyond the duration of the trigger signal.

Upon the discoveries of these regulatory systems, research towards the understanding of gene activity focused for almost two decades on identifying genetic regulatory elements (Dynan, 1989; Ptashne, 1988). Common ground was that by knowing the regulatory elements of a gene, the timing and modality of its expression could be derived. It became clear, however, that regulation is often more complex and that additional mechanisms modulate gene activity. Several of these mechanisms are referred to as epigenetic since they encode (heritable) information and mediate regulatory impact without changing the DNA sequence (Bird, 2007).

Epigenetic mechanisms encode information and modulate gene expression based on the prior history of a eukaryotic cell (Bird, 2007). DNA methylation and histone modification both encode epigenetic information by defining distinct chromatin neighbourhoods that are able to modulate cellular transcription and that are fundamentally a cellular memory of past events (Bonasio et al., 2010). Epigenetic information may span multiple regulatory layers in order to converge and reinforce signals: besides DNA methylation and histone modifications, transcription factors and non-coding RNAs play critical roles in orchestrating complex epigenetic states (Bonasio et al., 2010). A process that continues to emerge as a ‘glue’ between (epigenetic) regulatory layers is RNA interference.
The RNAi machinery and epigenetic phenomena have been linked to a wide range of functions in the eukaryotic cell, including chromosomal dynamics (Hall et al., 2002), stem cell maintenance (Cox et al., 1998) and cell fate determination (Bohmert et al., 1998). This central role of RNAi makes it an ideal interface to understand and manipulate cellular computations. Uniquely, the protein machinery of RNAi can generate arbitrary RNA affector molecules (siRNAs) that in turn are used to target highly specific RNAi-mediated responses. These responses make use of a completely general and highly specific strategy that relates affector sequences to target sequences via their ability to hybridize. Provided that the outputs from such a process can modulate the input, a general feedback mechanism exists that is defined by the RNAi machinery.

In this chapter, we speculate on how eukaryotic cells might use RNAi to generate epigenetic memory at the RNA level. We present a model for the biomolecular equivalent of the electric flip-flop, the fundamental memory unit in digital circuits, and reason how natural and engineered systems might employ it.

2.3  RNAi as an intrinsically programmable machine

Our current understanding of the facets of RNAi can be largely attributed to research in the nematode worm Caenorhabditis elegans. As shown in Figure 2.1, RNAi in C. elegans is initiated by the DICER protein complex, which cleaves long double-stranded RNA (dsRNA) molecules into short duplexes that are 21–25 bp in length (Gregory et al., 2005; Vermeulen et al., 2005; Zamore et al., 2000). Unwinding of a duplex yields a single-stranded guide and passenger strand. While the passenger strand is degraded, the guide strand gets loaded into the Argonaute RDE-1 as primary (1°) small-interfering RNA (siRNA) (Zamore et al., 2000).
The ‘loaded’ enzyme, also known as RNA-induced Silencing Complex (RISC), then targets and potentially cleaves mRNA transcripts that are complementary to the guiding siRNA (Siomi and Siomi, 2009). Successful cleavage of an mRNA triggers the recruitment of an RNA-dependent RNA polymerase (RdRP) complex, which initiates the production of secondary (2°) siRNAs towards the 3′-end of the transcript (Baulcombe, 2007). Thereby, the location of the initial transcript cleavage site defines a periodicity and phasing of 2° siRNA production initiations (Pak and Fire, 2007). Consequently, different primary cleavage sites lead to distinct sets of 2° siRNAs. Newly produced siRNAs become available to different RISCs and determine their mode of operation (Okamura and Lai, 2008) as depicted in Figure 2.1. The capacity of the RNAi pathway to amplify and modulate a primary silencing signal such that secondary affectors may target additional mRNAs is referred to as transitive RNAi (Alder et al., 2003).

Figure 2.1: RNAi pathways in Caenorhabditis elegans. The DICER protein complex cleaves long dsRNA molecules into short duplexes of which one strand gets separated and loaded into the RISC complex as a primary (1°) siRNA. This complex then targets and potentially cleaves mRNAs that share complementarity with the loaded siRNA. The cleavage event triggers recruitment of an RdRP complex that produces secondary (2°) siRNAs primarily towards the 3′-end of the transcript. The location of the cleavage site defines the future periodicity of siRNA cleavage sites. Newly produced siRNAs are available to different RISCs and determine their mode of operation: They may again (1) cleave a target and amplify a response, or initiate production of other siRNAs, (2) degrade an mRNA without generating new siRNAs, or (3) block a transcript from being translated without cleaving it.

Its regulatory universality and central connectivity with cellular processes led to the development of many applications for RNA interference. Besides the immediate utilization of RNAi in Biology to rapidly determine functions of genes (Boutros et al., 2004; Kamath and Ahringer, 2003), and its medical applications to validate drug targets and to effectively downregulate mutant genes in disease cells (Bitko and Barik, 2001; McManus and Sharp, 2002; Moss, 2003), RNA interference also has had a profound impact on Biomolecular Computing. A central goal of research in Biomolecular Computing is to build miniature computing machines that monitor, control and even reprogram processes in live cells (Benenson, 2009b; Culler et al., 2010). Efforts in the field are driven by the fact that biomolecules, in contrast to any other computational substrate, allow direct computational interaction with living systems, a paradigm that is referred to as biocompatible computing (Shapiro and Benenson, 2006). Most recent works from the field consider RNA-based approaches to be an ideal platform to interface with cellular networks and to compute inside cells (Benenson, 2009a; Culler et al., 2010). The versatility of RNA functionality in the cell to both sense and actuate, and the rational design techniques with which RNA structures can be designed (Link and Breaker, 2009; Davidson and Ellington, 2007), have already been successfully employed to engineer information processing devices that identify and respond to disease conditions in live cells (Leisner et al., 2010; Rinaudo et al., 2007; Xie et al., 2010; Beisel and Smolke, 2009; Win et al., 2009; Win and Smolke, 2007a). Yet, these devices encode information directly as a function of sequence, length or structure of nucleic acids.
The molecule manipulations required to reflect changed bits of information are not necessarily reversible and cause the information-encoding molecules to degrade over the course of the computational process. Thus, these devices are limited to a single operational cycle, in which they essentially evaluate a formula encoded in combinatorial logic. Sequential logic and iterated operation require a type of memory that matches the dynamics and plasticity of the underlying information and that is not intrinsically encoded by the problem to be solved (Paul Hill, 1989).

We herein propose an information encoding scheme that could potentially augment the computational versatility of current biocompatible devices by drawing on the characteristic of the RNA interference pathway to separate information from machinery. We present a model of the biomolecular equivalent of the well-known electric flip-flop, the fundamental memory unit in digital circuits. The information encoding of the flip-flop is not based on individual molecule properties but on relative concentration differences between mRNA populations. As for natural mRNA regulation, RNA interference is used to modulate the mRNA concentrations and thereby alters the encoded information.

2.4  Digital memory with RNAi

Digital memory at its core has bistable circuitry that records an ON or OFF state so long as the circuit is active. Applying a trigger pulse to such a flip-flop circuit results in flipping from ON to OFF, or OFF to ON, depending on the initial state of the circuit. Electronically, such memory is achieved by building a flip-flop from a combination of dissipative elements (resistors) and active elements (transistors) in a symmetrical fashion so that the state of the circuit can be read out as a high (ON) or low voltage (OFF) on one half of the circuit (say at S in Figure 2.2A).
Such dynamic devices are at the core of modern computer memory (Random Access Memory, RAM) and store information so long as powered.

RNAi provides a nearly exact analog for the fundamental components of the electronic flip-flop. RNAi makes possible the selective degradation of particular RNAs (dissipative in nature) while in certain organisms secondary small RNAs can be produced in a sequence-dependent fashion (active in nature). Since this form of regulation is kept digitally separated by virtue of the sequence-specific and hybridization-based machinery, in principle at least, RNAi offers a biological route to generalized computation.

As shown in Figure 2.2B, the RNAi flip-flop core is represented by two distinct mRNAs A and B. The flip-flop state is encoded by the relative concentration difference between these mRNA species. Similar to the electric flip-flop (see Figure 2.2A), A and B represent the equivalent of two cross-coupled inverting elements. Their influence on each other is mediated by primary siRNAs s and p, and 2° siRNAs a, b, x and y. We assume that A and B are transcribed uniformly and with equal rates. Primary siRNAs s and p are produced outside of the circuit, and s is assumed to be at a uniform concentration. They serve to set the phasing of the secondary RNAs, which in turn serve to reinforce a particular state of the flip-flop. A source of siRNA s serves to establish the entire flip-flop circuit and induces a pattern of secondary RNAs in both mRNA populations that will inevitably force the circuit in the steady state to have a defined state (i.e. A high and B low, or B high and A low). Switching between the two flip-flop states is under the control of an externally supplied siRNA p, which we assume to be a transient cellular signal of interest. Pulsing p results in the symmetric swap of concentrations between A and B and flips the state of the circuit that can be read out by monitoring levels of A and B or the siRNAs a and b. Without loss of generality, we assume the flip-flop to be in the ON state when mRNA A is more abundant than B, and OFF conversely.

Figure 2.2: Flip-flop mode of operation. A: A bistable multivibrator circuit or electric flip-flop. The circuit is composed of two cross-coupled transistors and resistors (Paul Hill, 1989). Both parts are mutually exclusive for sustained conductance and influence each other in terms of the current and voltage. In analogy to the RNA flip-flop, the components are colour-coded. The flip-flop state is defined by the part that is set conductive by applying a voltage to the base of one of the transistors (e.g. through S). The state can be switched by setting the conductive part unconductive by applying a voltage to the base of the opposite transistor (through R). Both system states are stable and maintained even after S or R are no longer applied. B: Interactions of the mRNA molecules through siRNAs in the RNA flip-flop. Blunt-ended arrows represent negative, regular arrows protective impact on the mRNAs through siRNAs in the sense of the flip-flop. The state of the flip-flop is defined by the relative concentrations of A and B, maintained through the impact of s, and altered by transient pulses of p. In analogy to its electric counterpart, both RNA flip-flop parts influence each other symmetrically such that one mRNA is in high concentration while the other is low.

Figure 2.3: Sequence setup of the RNA flip-flop. The mRNAs A, B represent the equivalent of two cross-coupled inverting elements in bistable electric circuits (see Figure 2.2A). They interact through 1° siRNAs s and p, and 2° siRNAs a, b, x, y. Stretches with primed labels along the mRNAs represent siRNA target sites, unprimed labels siRNA production sites. Vertical lines along the mRNA indicate idealized phasing defined by s and p.
Constant expression of the RNAs A, B and s causes constant cleavage in both mRNA populations since both transcripts have a target site for s. Successful cleavage events trigger the templated production of 2° siRNA b on A, and a on B, respectively. Once produced and loaded into a RISC, b induces degradation (RISC mode 2, Figure 2.1) of mRNA B, while a (originating from B) targets members of the A population. Since A is assumed to be in higher concentration, production of b is higher, too. This allows A to continue to dominate over the concentration of B and, hence, to maintain the current flip-flop state. The actual concentrations of A and B represent dynamic equilibria: Even though both mRNA populations are negatively impacted by siRNA-mediated cleavage and unspecific degradation, there is a constant transcriptional supply that accounts positively for their concentrations. Both opposing processes cancel each other out at the equilibrium points for each mRNA.

Switching the flip-flop from ON to OFF is triggered by a transient pulse of siRNA p. The target sites for p on A and B are out of phase with those for s. Hence, p causes templated production of a, x on A, and of b, y on B. Newly generated a and b then target the same mRNA population they originated from, while x and y have a rescue effect on the opposite population by partially obscuring the target sites for both s and p without triggering any mRNA degradation (RISC mode 3, Figure 2.1). Since A is assumed to be more abundant than B, more a than b is produced, which in turn establishes a reinforcement loop that brings down the A population more quickly than B. Furthermore, since more x than y is produced, the protective effect of the former on B is stronger than the latter’s on A. Hence, the negative impact on A continuously increases while B can gradually recover. Under the right conditions, the concentration of B climbs above A.
Eventually, the influence of s drives both populations to the equilibrium concentrations, now mirrored. The new flip-flop state is stabilized and maintained until the next trigger pulse p occurs and flips the system back to its initial state.

2.5  The mathematical model of an RNAi-based flip-flop

The molecular interactions between the different RNA species of the flip-flop were described in a set of first-order ordinary differential equations (ODEs) (see Equations (2.1)–(2.8)) to analyze under what conditions the desired bistable behaviour can be achieved. Each equation describes one RNA species of the flip-flop as labeled in Figure 2.3. For the mRNAs A and B at the core of the flip-flop, a constant transcription rate τ and an unspecific (i.e. not siRNA-mediated) degradation rate γ were assumed. The RISC-mediated mRNA cleavage rate was expressed through φ, and RISC-mediated translational blocking and unblocking of mRNAs through µ and δ, respectively.

\begin{align}
\frac{\partial A}{\partial t} &= \tau - \gamma A - \phi a A - \mu y A + \delta A_b \tag{2.1}\\
\frac{\partial B}{\partial t} &= \tau - \gamma B - \phi b B - \mu x B + \delta B_b \tag{2.2}\\
\frac{\partial a}{\partial t} &= \omega B - \varepsilon a - \rho a A + \xi p_t A \tag{2.3}\\
\frac{\partial b}{\partial t} &= \omega A - \varepsilon b - \rho b B + \xi p_t B \tag{2.4}\\
\frac{\partial x}{\partial t} &= \xi p_t A - \varepsilon x - \mu x B + \delta B_b \tag{2.5}\\
\frac{\partial y}{\partial t} &= \xi p_t B - \varepsilon y - \mu y A + \delta A_b \tag{2.6}\\
\frac{\partial A_b}{\partial t} &= \mu y A - \delta A_b \tag{2.7}\\
\frac{\partial B_b}{\partial t} &= \mu x B - \delta B_b \tag{2.8}
\end{align}

The s-mediated production of the 2° siRNAs a and b was described through ω, which dominates the system behaviour during the steady state of the flip-flop, and a pulse-dependent production rate ξ that dominates the state transitions. The loading rate of siRNAs into RISCs was modeled by parameter ρ, and the unspecific siRNA degradation rate by ε. Using two parameters, ρ and φ, to model RISC-mediated mRNA cleavage allowed representing potential multiple turnovers of siRNAs as well as ineffective cleavage attempts despite successful RISC loading. Similar to the mRNA-templated production of a and b, the 2° siRNAs x and y are produced in a p-dependent manner.
The latter two transiently block A and B, and unspecifically degrade at rate ε like the other small RNAs. Temporarily blocked mRNAs are represented by A_b and B_b. Their assembly and disassembly rates correspond to the blocking and unblocking rates modeled by µ and δ as described above.

\[ p_t = A_p \sqrt{\frac{w_p}{\pi}}\; e^{-w_p (t - D)^2} \tag{2.9} \]

For the pulse, we assume a Gaussian behaviour as given by Equation (2.9), where A_p and w_p control the area and width and D its time offset from t = 0.

2.6  Steady state characteristics of the flip-flop model

Assuming the flip-flop to be in either one of the steady states and the pulse to not be present, i.e. p_t = 0, the RNA species x, y, A_b and B_b do not exist, and the ODEs can be reduced to:

\begin{align}
0 &= \tau - \gamma A - \phi a A \tag{2.10}\\
0 &= \tau - \gamma B - \phi b B \tag{2.11}\\
0 &= \omega B - \varepsilon a - \rho a A \tag{2.12}\\
0 &= \omega A - \varepsilon b - \rho b B \tag{2.13}
\end{align}

Iterative substitutions of A, a and b reduced the system further to a single equation in B and the parameters only:

\[ 0 = \tau + \frac{\gamma(\gamma B - \tau)(\varepsilon + \rho B)}{\phi\omega B} + \frac{\phi\omega B(\gamma B - \tau)(\varepsilon + \rho B)}{\varepsilon\phi\omega B - \rho(\gamma B - \tau)(\varepsilon + \rho B)} \tag{2.14} \]

Solving Equation (2.14) leads to four solutions for B:

\begin{align*}
B_1 &= \frac{\tau\rho - \gamma\varepsilon + \sqrt{\tau^2\rho^2 + 2\gamma\rho\tau\varepsilon + \gamma^2\varepsilon^2 + 4\tau\varepsilon\phi\omega}}{2(\phi\omega + \gamma\rho)}\\
B_2 &= \frac{\tau\rho - \gamma\varepsilon - \sqrt{\tau^2\rho^2 + 2\gamma\rho\tau\varepsilon + \gamma^2\varepsilon^2 + 4\tau\varepsilon\phi\omega}}{2(\phi\omega + \gamma\rho)}\\
B_3 &= \frac{(\tau\rho - \gamma\varepsilon)(\phi\omega - \rho\gamma) + \sqrt{(\phi\omega - \rho\gamma)\left[\phi\omega(\tau\rho - \gamma\varepsilon)^2 - \rho\gamma(\gamma\varepsilon + \tau\rho)^2\right]}}{2\rho\gamma(\phi\omega - \rho\gamma)}\\
B_4 &= \frac{(\tau\rho - \gamma\varepsilon)(\phi\omega - \rho\gamma) - \sqrt{(\phi\omega - \rho\gamma)\left[\phi\omega(\tau\rho - \gamma\varepsilon)^2 - \rho\gamma(\gamma\varepsilon + \tau\rho)^2\right]}}{2\rho\gamma(\phi\omega - \rho\gamma)}
\end{align*}

Substituting back in, these solutions represented the fixpoints of the equation system, i.e. the concentration values for A and B in the stable states of the flip-flop. The goal was to render at least two of these solutions positive-definite in order to get two positive-definite fixpoints. For positive-definite parameters, B_1 is always positive-definite, while solution B_2 is always negative (the square root exceeds τρ − γε in magnitude) and can be disregarded.
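The closed-form solutions above can be checked numerically. The following is a minimal sketch, not part of the original analysis: it assumes the parameter values listed in Table 2.2 (with φ read as the fraction 4/3) and uses the term (γε + τρ)² in the radicand of B_3 and B_4, which is the form consistent with constraint (2.15). The helper names `fixpoints_B`, `partner_A` and `residual` are hypothetical.

```python
import math

# Assumed parameter values, taken from Table 2.2 (phi read as 4/3).
tau, gamma, phi = 60.0, 6.0, 4.0 / 3.0
omega, eps, rho = 108.0, 18.0, 9.0


def fixpoints_B():
    """Evaluate the closed-form solutions B1..B4 of Equation (2.14)."""
    d = tau * rho - gamma * eps            # recurring term (tau*rho - gamma*eps)
    s = phi * omega + gamma * rho
    m = phi * omega - rho * gamma
    rad_sym = (tau * rho) ** 2 + 2 * gamma * rho * tau * eps \
        + (gamma * eps) ** 2 + 4 * tau * eps * phi * omega
    b1 = (d + math.sqrt(rad_sym)) / (2 * s)
    b2 = (d - math.sqrt(rad_sym)) / (2 * s)
    # Radicand of the asymmetric pair; real only inside the bistable regime.
    rad_asym = m * (phi * omega * d ** 2
                    - rho * gamma * (gamma * eps + tau * rho) ** 2)
    b3 = (d * m + math.sqrt(rad_asym)) / (2 * rho * gamma * m)
    b4 = (d * m - math.sqrt(rad_asym)) / (2 * rho * gamma * m)
    return b1, b2, b3, b4


def partner_A(B):
    """Steady-state A belonging to a given B, from Eqs. (2.11) and (2.13)."""
    return (tau - gamma * B) * (eps + rho * B) / (phi * omega * B)


def residual(A, B):
    """Residual of Eq. (2.10) after eliminating a via Eq. (2.12)."""
    a = omega * B / (eps + rho * A)
    return tau - gamma * A - phi * a * A


if __name__ == "__main__":
    for name, B in zip(("B1", "B2", "B3", "B4"), fixpoints_B()):
        print(name, B, "A =", partner_A(B))
```

For these assumed values the asymmetric roots evaluate to 6 and 2 µmol/l, each being the partner concentration of the other, which agrees with the range reported for A and B in Table 2.1; B_1 is the symmetric fixpoint with A = B.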
From B_3 we can derive the following constraint:

\[ \left(1 + \frac{\tau\rho}{\gamma\varepsilon}\right)^2 \le \frac{\phi\omega}{\gamma\rho}\left(\frac{\tau\rho}{\gamma\varepsilon} - 1\right)^2 \tag{2.15} \]

and additionally from B_4:

\[ \frac{\phi\omega}{\gamma\rho} > 1 \tag{2.16} \]

Substituting

\[ \alpha = \frac{\phi\omega}{\gamma\rho}, \qquad \beta = \frac{\tau\rho}{\gamma\varepsilon} \]

we can rewrite the constraints (2.15) and (2.16) as follows:

\begin{align}
(\beta + 1)^2 &\le \alpha(\beta - 1)^2 \tag{2.17}\\
\alpha &> 1 \tag{2.18}
\end{align}

Plotting the two-dimensional constraint system described through (2.17) and (2.18) dissects the positive-definite quadrant into three regimes as shown in Figure 2.4A. The area labelled bistability (+) represented combinations of α and β that fulfilled all parameter constraints. Values from the monostability area represented degraded flip-flops with a single monostable state, and combinations from the bistability (−) region represented negative-definite fixpoint values.

Selecting a suitable set of parameters according to an (α, β) pair from the bistability (+) area and plotting the nullclines for A and B as shown in Figure 2.4B, the resulting curves intersect three times, representing the fixpoints of the system. The outer points are stable and correspond to the stable ON and OFF states of the flip-flop, whereas the middle fixpoint is unstable. Although two stable states would be sufficient and favourable, the sigmoidal shape of the nullclines does not allow an even number of intersections. The basins of attraction around the outer fixpoints drive the system to the ON state for value pairs of A, B above the separatrix, and to the OFF state otherwise. The separatrix itself represents solutions where A, B are equal, which lead to degraded monostable flip-flops with no defined state.

Figure 2.4: Mathematical analysis of the RNA flip-flop. A: Phase space of the Equations 2.10–2.13. For positive-definite parameters the phase space divides into three areas: The upper area represents parameter combinations for (α, β) that lead to systems with two stable and one unstable positive-definite fixpoints as shown in (B).
The middle area represents degraded, monostable systems with a single positive-definite fixpoint, and the lower area defines systems with three fixpoints that are negative-definite. B: The nullclines of the first-order ordinary differential system (see Equations 2.10–2.13) intersect three times for suitable parameter sets, yielding a symmetric bistable system with three fixpoints, the outer ones being stable and the middle one being unstable.

2.7  Dynamic characteristics of the flip-flop model

In order to mimic the behaviour of an electric flip-flop and at the same time stay within plausible biological boundaries, the parameters of the biomolecular flip-flop model had to be balanced and constrained. On one hand, the system should be reasonably stable in its ON and OFF state such that minor perturbations result neither in unintended flipping nor in oscillations of the system. On the other hand, the flip-flop has to be sensitive enough to capture a transient trigger pulse and switch properly. Finally, and to minimize interference with other cellular processes in a real-life environment, transitions between the states should be fast.

To find the optimal parameter combination for a representative flip-flop, we first arbitrarily fixed meaningful steady state concentrations for A and B as defined by the (α, β)-space. We further fixed an arbitrary Gaussian trigger pulse p within the same order of magnitude as the mRNA concentrations using A_p = 12, w_p = 0.5 (see Equation (2.9)). The remaining degrees of freedom of the flip-flop were then adjusted such that the system flipped. This flip-flop then served as a template to randomly generate a population of flip-flops with the same fixpoints but otherwise different parameters. These flip-flops were then handed to a genetic algorithm (Optimization Tool 5.1, MATLAB R2010b) and iteratively optimized for the free parameters (under the fixed pulse p).
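To illustrate how a single candidate flip-flop can be evaluated under a fixed pulse, the following sketch integrates Equations (2.1)–(2.8) with a hand-written fourth-order Runge-Kutta scheme. It is an illustration under stated assumptions, not the MATLAB setup used in this work: the parameter values are those listed in Table 2.2 (φ read as 4/3), the ON state (A, B, a, b) = (6, 2, 3, 18) is the fixpoint obtained from Equations (2.10)–(2.13) for these values, and the step size, pulse offset and function names are hypothetical choices. Since the tabulated parameters are rounded, the sketch does not assert that the system flips.

```python
import math

# Assumed values: Table 2.2 parameters (phi read as 4/3), pulse from Section 2.7.
P = dict(tau=60.0, gamma=6.0, phi=4.0 / 3.0, mu=58.0, delta=14.0,
         omega=108.0, eps=18.0, rho=9.0, xi=1.4)
A_P, W_P = 12.0, 0.5

# ON fixpoint (A, B, a, b, x, y, A_b, B_b) for the parameters above.
ON_STATE = (6.0, 2.0, 3.0, 18.0, 0.0, 0.0, 0.0, 0.0)


def pulse(t, offset):
    """Gaussian trigger pulse of Eq. (2.9) with area A_P centred at `offset`."""
    return A_P * math.sqrt(W_P / math.pi) * math.exp(-W_P * (t - offset) ** 2)


def rhs(t, s, offset=None):
    """Right-hand sides of Eqs. (2.1)-(2.8); offset=None means no pulse."""
    A, B, a, b, x, y, Ab, Bb = s
    p = pulse(t, offset) if offset is not None else 0.0
    k = P
    return (
        k["tau"] - k["gamma"] * A - k["phi"] * a * A - k["mu"] * y * A + k["delta"] * Ab,
        k["tau"] - k["gamma"] * B - k["phi"] * b * B - k["mu"] * x * B + k["delta"] * Bb,
        k["omega"] * B - k["eps"] * a - k["rho"] * a * A + k["xi"] * p * A,
        k["omega"] * A - k["eps"] * b - k["rho"] * b * B + k["xi"] * p * B,
        k["xi"] * p * A - k["eps"] * x - k["mu"] * x * B + k["delta"] * Bb,
        k["xi"] * p * B - k["eps"] * y - k["mu"] * y * A + k["delta"] * Ab,
        k["mu"] * y * A - k["delta"] * Ab,
        k["mu"] * x * B - k["delta"] * Bb,
    )


def rk4_step(t, s, dt, offset):
    """Advance the state by one classical Runge-Kutta step of size dt."""
    def shifted(k, h):
        return tuple(v + h * kv for v, kv in zip(s, k))
    k1 = rhs(t, s, offset)
    k2 = rhs(t + dt / 2, shifted(k1, dt / 2), offset)
    k3 = rhs(t + dt / 2, shifted(k2, dt / 2), offset)
    k4 = rhs(t + dt, shifted(k3, dt), offset)
    return tuple(v + dt / 6 * (c1 + 2 * c2 + 2 * c3 + c4)
                 for v, c1, c2, c3, c4 in zip(s, k1, k2, k3, k4))


def simulate(s0, t_end, dt=1e-3, offset=None):
    """Integrate from t = 0 to t_end and return the final state."""
    t, s = 0.0, tuple(s0)
    for _ in range(int(round(t_end / dt))):
        s = rk4_step(t, s, dt, offset)
        t += dt
    return s


if __name__ == "__main__":
    final = simulate(ON_STATE, t_end=20.0, offset=10.0)
    print("A, B after one pulse:", final[0], final[1])
```

A flipping time in the sense used here could be read off such a trajectory as the first time after pulse onset at which B exceeds A; a fixed-step explicit integrator is chosen only to keep the sketch dependency-free.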
In the course of this process, the fitness of a flip-flop was defined by its flipping time, i.e. the time span from the onset of the pulse p to the symmetric swap of concentrations between A and B. Observed production and degradation rates greater than the slope of the pulse were penalized and so were any oscillations in RNA concentrations. Over several rounds of selection and reproduction in the genetic algorithm, the flip-flop population evolved to a fitness threshold where flipping times differed by less than t = 10⁻³ simulation time units among all systems. With the units of the RNA species being [A], [B], ..., [B_b] = µmol/l and time [t] = min, their value ranges are given in Table 2.1. The units and values of the optimized parameters adhering to all constraints are listed in Table 2.2. Figures 2.5 and 2.6 show the time course plots of the simulation of the optimized system over two trigger pulses.

Figure 2.5: Simulated behaviour (core RNAs) over two trigger pulses (as represented by the star in Figure 2.7A) of the optimized RNA flip-flop (represented by the star in Figure 2.8 and Table 2.2). Dashed lines indicate the fixpoints of the system.

Figure 2.6: Simulated behaviour (all RNAs) over two trigger pulses (as represented by the star in Figure 2.7A) of the optimized RNA flip-flop (represented by the star in Figure 2.8 and Table 2.2). Dashed lines indicate the fixpoints of the system.

Variable    Molecule type        Value (min, max)    Unit
A, B        mRNA                 (2, 6)              µmol/l
a, b        siRNA                (3, 18)             µmol/l
x, y        siRNA                (0, 1.4)            µmol/l
A_b, B_b    mRNA-siRNA hybrid    (0, 14.7)           µmol/l

Table 2.1: Value ranges of the variables after optimizing the flip-flop. Values rounded to the first decimal place. The corresponding parameter values are given in Table 2.2.

Parameter    Description                                Value    Unit
τ            mRNA transcription rate                    60       µmol/(l·min)
γ            unspecific mRNA degradation rate           6        min⁻¹
φ            RNAi-based mRNA consumption rate           4/3      l/(µmol·min)
µ            hybridization-based mRNA blocking rate     58       l/(µmol·min)
δ            mRNA-siRNA deblocking rate                 14       min⁻¹
ω            steady siRNA production rate               108      min⁻¹
ε            unspecific siRNA degradation rate          18       min⁻¹
ρ            RNAi-based siRNA consumption rate          9        l/(µmol·min)
ξ            trigger-mediated siRNA production rate     1.4      l/(µmol·min)

Table 2.2: Parameter values and units of the optimized flip-flop. Values rounded to the first decimal place.

2.8  Robustness of the flip-flop model

After having shown that a flip-flop with fixed stable states can be optimized for a given trigger pulse, the robustness of the model was analysed by releasing these constraints and restoring the degrees of freedom. First, the importance of the trigger pulse on the flipping time of the system was determined by varying the pulse width and height. Over the range of a 20-fold difference in pulse area and a 5-fold difference in pulse width, the flipping time of the system changed only by about a factor of three, except at the extremes towards the inner boundary (see Figure 2.8).

To further exclude the possibility that the manually set fixpoints of the optimized flip-flop had an impact on the overall system behaviour, the full degree of freedom for the flip-flop was restored and a large set of random flip-flops from the bistability (+) area of the (α, β)-space was generated. For each system, it was determined whether it flipped (for the same pulse as used before) and what time it took to flip. Of the 10,000 systems, 3173 flipped within the simulation time boundary. The other candidates did not flip within that time or did not flip at all.
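The generation of random flip-flops from the bistability (+) area can be sketched as a rejection sampler. This is a hedged illustration rather than the procedure actually used: it assumes the sampling interval [0, 2v_i] around the Table 2.2 values v_i (φ read as 4/3) given in the caption of Figure 2.7, and it separates bistability (+) from bistability (−) by the extra condition β > 1, which is an assumption here (it follows from requiring A + B = (τρ − γε)/(γρ) to be positive at the asymmetric fixpoints). The function names are hypothetical.

```python
import random

# Assumed optimized values v_i from Table 2.2 (phi read as 4/3).
OPTIMIZED = {"tau": 60.0, "gamma": 6.0, "phi": 4.0 / 3.0, "mu": 58.0,
             "delta": 14.0, "omega": 108.0, "eps": 18.0, "rho": 9.0, "xi": 1.4}


def alpha_beta(p):
    """Map a parameter set onto the (alpha, beta) coordinates of Section 2.6."""
    alpha = p["phi"] * p["omega"] / (p["gamma"] * p["rho"])
    beta = p["tau"] * p["rho"] / (p["gamma"] * p["eps"])
    return alpha, beta


def classify(alpha, beta):
    """Assign an (alpha, beta) pair to one of the three regimes of Figure 2.4A.

    Constraints (2.17) and (2.18) decide bistability; the sign split via
    beta > 1 is an assumption of this sketch.
    """
    if alpha > 1.0 and (beta + 1.0) ** 2 <= alpha * (beta - 1.0) ** 2:
        return "bistability (+)" if beta > 1.0 else "bistability (-)"
    return "monostability"


def sample_flipflop(rng, max_tries=100000):
    """Draw each parameter uniformly from [0, 2*v_i] until bistable (+)."""
    for _ in range(max_tries):
        p = {k: rng.uniform(0.0, 2.0 * v) for k, v in OPTIMIZED.items()}
        # Guard against a zero draw before dividing in alpha_beta().
        if min(p.values()) > 0.0 and classify(*alpha_beta(p)) == "bistability (+)":
            return p
    raise RuntimeError("no bistable (+) parameter set found")


if __name__ == "__main__":
    rng = random.Random(0)
    population = [sample_flipflop(rng) for _ in range(5)]
    for p in population:
        print(alpha_beta(p))
```

Each sampled parameter set would then be simulated under the fixed trigger pulse to decide whether, and how fast, it flips.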
Figure 2.7 shows the colour-coded flipping times of the flipping systems and their localization within the (α, β)-space.

Parameter    min    max      Unit
τ            4.2    120      µmol/(l·min)
γ            0.3    12       min⁻¹
φ            0.1    2.7      l/(µmol·min)
µ            2.3    116      l/(µmol·min)
δ            0.1    28       min⁻¹
ω            5.1    215.9    min⁻¹
ε            0.4    36       min⁻¹
ρ            0.3    18       l/(µmol·min)
ξ            0.3    2.8      l/(µmol·min)

Table 2.3: Minimum and maximum of parameter values observed in the set of randomly generated flip-flops that flipped within the simulation time boundary (tmax = 128 simulation minutes). Values rounded to the first decimal place.

Figure 2.7: Robustness of the RNA flip-flop for different parameters. A: Colour-coded flipping times for 10,000 randomly generated flip-flops and a particular trigger pulse (represented by the star in Figure 2.8). Coloured dots represent positive-definite bistable systems that flipped. Grey dots represent positive-definite bistable systems that did not flip within the simulation time boundary (tmax = 128 simulation minutes). Initial flip-flop parameters were generated by randomly sampling each parameter of the optimized flip-flop (represented by the star) from the interval [0, 2v_i] where v_i is the optimized value of parameter i (see text for details). B: Distribution of flipping times for the subset of randomly generated systems that flipped (3173 total) within the time boundary (see A).

Figure 2.8: Robustness of the RNA flip-flop for different pulse parameters. Flipping times of the optimized RNA flip-flop for trigger pulses of varying Gaussian width (w_p) and area (A_p). The white part of the plot represents (A_p, w_p) combinations where the system did not flip within the simulation time boundary (tmax = 32 simulation minutes). The star indicates the (A_p, w_p) pair used to simulate the systems as shown in Figure 2.7.

2.9  Discussion

In this chapter, we presented a model of a biomolecular flip-flop that is based on the cellular processing machinery of RNA interference and stores 1 bit of information encoded as RNA concentrations. In comparison to existing approaches in biomolecular computing, which encode information in nucleic acid sequences, as a function of molecule length or other direct properties, the presented encoding scheme abstracts further by drawing on principles of epigenetic information processing in the cell.

In both natural and artificial circuits, computation is inherently dynamic and demands substrates that support this plasticity when processing and storing information. In contrast to permanent information, which is primarily sequence-based in a cellular context, metastable information is better represented on epigenetic layers. Messenger RNAs form one of these layers, as they convey information (besides their actual sequence) in the form of concentrations from the genetic blueprint to the final gene product. A key regulatory element to alter RNA-based information is the RNA interference pathway. In this work, we understand the RNAi machinery to be universal as it can be programmed by small non-coding RNAs to control the abundance of arbitrary (within certain limits) messenger RNAs. We further understand this machinery to naturally lend itself to bridge analog and digital computing in a cellular context, as its different modes of operation can modulate (for example, reinforce) interference signals such that they can cause nearly discrete behaviour of targeted messages.

Based on known features of RNAi, we suggested a design for the flip-flop and showed that the model can encode 1 bit of digital information. We further showed that for a wide range of parameter values, for both the flip-flop and the trigger pulse, discrete and fast flipping is possible. A key unresolved question concerns the actual biological parameters for the processes described in Table 2.1.
The broad range of parameters tolerated by the flipping solutions (Table 2.3) demonstrates the tolerance and flexibility of the model, and while the kinetics of the RNAi machinery remain largely unresolved, this suggests that such a flip-flop may be biologically possible. Of course, as our understanding of the functioning of the RNAi pathway develops, the equations used herein have to be revisited and critically examined as to whether they capture the observed molecular interactions properly. Cooperativity between RISC complexes, for instance, could provide the non-linear kinetics directly, whereas the current model requires additional molecule species to achieve flip-flop behaviour.

From the biocompatible computing perspective, an implemented flip-flop as described herein could serve as a universal memory unit, as it can be interfaced with a variety of cellular and engineered pathways. Providing standardized interfaces for molecular signals is considered to be essential for building increasingly complex biomolecular circuits, and individual circuit elements should aspire to be as general as possible in a computational sense (Benenson, 2009a).

For many envisioned applications in the field of biomolecular computing, having more than a single bit of memory would be highly desirable. Multiple RNA flip-flops could in principle be coupled, just as their counterparts in electronic circuits are combined into arrays for storing more information. This coupling can potentially be achieved in several ways. Against initial hopes, we were not able to sequentially link multiple flip-flops reliably, in which case they would have acted as a counter or frequency divider of an external signal. The key to overcoming this obstacle would be to introduce the asymmetrical production of an siRNA that is produced only when the flip-flop completes a full cycle, i.e. after every other flip, or to clock the system.
In the current model, the different RNA populations behaved too symmetrically and coupling signals could not be restored reliably over multiple instances. The parallel usage of multiple flip-flops, however, is straightforward. They could, for example, be used as memory units to store temporally spread-out inputs and feed them to a Boolean logic circuit whose output would then depend in a nontrivial way on the input condition (Rinaudo et al., 2007). Another option for the implementation of flip-flop arrays could potentially be to use a single unit in each of many cells and utilize cellular communication channels to store and read dispersed information, similar to how recent work used quorum sensing to link multiple NOR gates, implemented as genetic circuits, into Boolean functions of greater complexity (Tamsir et al., 2011). In such a scenario, the versatility of RNAi might again come into play, as interference signals are known to spread systemically throughout organisms such as C. elegans (Hunter et al., 2006), thereby providing a communication channel between spatially distributed flip-flops that could be used to synchronize or reset states.

As far as an actual implementation of the flip-flop is concerned, we suggest an in vitro environment first, before attempting to assemble the circuit in a cellular context. The availability of cell-free extracts with intact RNAi machinery, from C. elegans for example (Aoki et al., 2007), can provide a suitable test bed to determine proper functioning of individual flip-flop parts and their interplay. Given that RNA structures can be reliably predicted based on their Watson-Crick base-pairing, rational design and/or in vitro selection provide proven techniques to determine and optimize the sequence setup of the flip-flop parts.
In the same way, the intended (and unintended) interactions between different RNAs according to the RNAi regulatory machinery can be determined to a much greater extent than is possible for protein-protein or protein-DNA interactions of protein regulators (Subsoontorn et al., 2011).

A functioning implementation of the RNA flip-flop would undoubtedly add to our understanding of the fundamental computational principles of the RNAi machinery in the cellular context of (epi-)genetic regulation. We also see immediate potential applications for such a device as a tool for biologists: any transient cellular event of interest that is characterized by a (unique) small RNA effector could be linked to and captured by the flip-flop. Connected to a suitable readout mechanism, such as a fluorescent protein that is produced upon successful flipping of the system, the event could be visualized to an external observer.

Chapter 3

Software pipeline for genome-wide chromatin modification profiling and transcriptome analysis

3.1  Synopsis

Chromatin in general and histone modifications in particular are fundamentally involved in epigenetic regulation of cellular processes. Understanding the regulatory mechanisms will not only advance our understanding from a biological standpoint but might also lead to chromatin-based computations in live cells. As one of the steps towards these goals, chromatin modifications have to be profiled genome-wide and assessed regarding their associations with gene activity and each other in order to infer the abstract principles of this form of cellular computation. For the profiling stages of this process, DNA microarrays play a central role. They represent one of the technologies that have transformed research in biology over the past years from single-gene studies to interrogations of thousands of genes and entire genomes at once.
In chromatin biology research they have been widely used for genome-wide mapping of histone modifications and studying the expression of genes under different conditions. In this chapter, I present the bioinformatics pipeline I developed for genome-wide analysis of ChIP-on-chip and gene expression data to identify distinct chromatin neighbourhoods, interdependencies and combinatorial patterns of histone modifications and their correlations with gene regulatory processes.

3.2  Background

The pace of advancements in DNA microarray (Stoughton, 2005) and sequencing technology (Shendure and Ji, 2008) continues to transform biological research. Both techniques complement each other and provide powerful platforms to interrogate entire genomes at high resolution and from a variety of perspectives, ranging from sequence decoding, to gene expression profiling, to chromatin modification mapping (Metzker, 2010; Kapranov et al., 2007; Collas, 2010). It is in large part due to these technologies that large amounts of quantitative data are becoming available to scientists and allow questions of unprecedented depth to be asked.

With respect to chromatin, research over the past decade began to map the growing number of histone modifications and to link their occurrence with gene expression and other DNA-templated processes in the cell (Lee et al., 2010). Recent efforts try to identify combinations of modifications as ‘words’ of the proposed histone modification code (see Section 1.3.1). Evidence suggests that it is necessary to identify such words in their spatial and temporal context of the genome, to put them together and thereby continuously reveal the principles of syntax and grammar of the proposed histone crosstalk language (Lee et al., 2010). In order to profile histone modifications and their regulating enzymatic machineries under various experimental conditions, microarray and sequencing technology will continue to play a central role.
In this chapter I present the bioinformatics pipeline I developed for analyzing DNA microarray data for profiles of gene expression and chromatin modifications. The pipeline uses scripts and programs written primarily in MATLAB, R, Python and Java that allow streamlined processing of microarray data, integration of other tools and data sets, and flexible ways to visualize data and results. While the following sections focus on technical aspects of the data processing, Chapter 4 presents the results derived through the analysis pipeline in their biological context.

3.3  Wet-lab data generation

3.3.1  Chromatin modification mapping with ChIP-on-chip

Chromatin immunoprecipitation (ChIP) combined with microarrays (ChIP-on-chip) enables genome-wide profiling of protein-DNA interactions (Lieb, 2003). It served as a primary technology on the wet-lab side of the collaborative projects described in Chapter 4 to map various chromatin modifications across the genome of Saccharomyces cerevisiae, also known as budding yeast. As an extremely versatile tool, ChIP-on-chip allows determining the localization of protein-DNA interactions for almost any protein (or its modifications) of interest in the context of chromatin (Aparicio et al., 2004).

As shown in Figure 3.1, the ChIP-on-chip workflow starts by covalently linking the protein of interest (POI), such as a histone with its modifications, to the DNA sequence in vivo and extracting the chromatin complexes from the nucleus. Since a single cell does not yield enough material for a ChIP-on-chip analysis, larger numbers of cells have to be used. In the projects described hereafter, 500 ml of budding yeast cells were grown in a rich medium to an OD600 of 0.8, which represents about 5 × 10^9 cells. Before chromatin extraction, the cells are exposed for about 20 min to 1 % formaldehyde to crosslink all permanent and transient proteins associated with the DNA.
While this concentration is not saturating, longer incubations result in large cross-linked agglomerates of (likely unspecific) proteins. The extracted chromatin-protein complexes are then sheared into DNA fragments of about 500 bp in length. An antibody that is mechanically or enzymatically attached to a solid surface or magnetic beads is then used to specifically recognize the POI and thereby extract the bound DNA fragments. The cross-linking of DNA and POI is finally reversed and the immunoprecipitated DNA fragments are linearly amplified, using two rounds of T7 RNA polymerase amplification (van Bakel et al., 2008) in the experiments analyzed in Chapter 4. Eventually, the obtained DNA fragments are labelled with a biotin tag and transferred to a microarray.

Figure 3.1: Overview of the wet-lab workflow for a ChIP-on-chip experiment to profile chromatin architectural proteins and/or their modifications. (This schematic was accepted for the Wikipedia project of the Wikimedia Foundation.)

DNA microarrays, or chips, are solid carriers with short DNA oligonucleotides, so-called probes, of 25–60 nt in length. Several tens of thousands up to millions of copies of a particular probe represent one feature, and the probe sequence and feature position on the chip are known and catalogued. The most common array platforms used for ChIP-on-chip experiments are commercial tiling arrays from companies such as Affymetrix, NimbleGen and Agilent. In the projects described in this thesis, Affymetrix 1.0R S. cerevisiae GeneChip arrays were used, which comprise ∼3.2 million features covering the complete genome of budding yeast. On these arrays, probes are 25 nt in length and tiled at an average resolution of 5 nt, creating an overlap of approximately 20 nt between genomically adjacent features.

Once obtained, the amplified and labelled DNA samples are hybridized to an array and stained with a fluorescent dye.
The amount of hybridized sample at each feature on the array, i.e. the number of bound probes per feature, can then be detected and measured as a function of fluorescence. The emitted fluorescence for a particular feature represents the relative quantity of sample DNA at the probe’s genomic position in the immunoprecipitated material. The fluorescence pattern of the array is captured as a digital image, which is transformed into numerical intensity values for each feature and, in the case of the Affymetrix platform, stored as a CEL file.

However, the obtained values of an experiment cannot necessarily be fed into the analysis pipeline right away. The steps of the wet-lab workflow, from crosslinking the proteins with the DNA, to fragmenting the chromatin, to immunoprecipitating, amplifying and hybridizing the samples to the chip, as well as saturation effects of the fluorescent dye, can introduce biases due to DNA sequence- and genomic region-dependent characteristics (called probe effects) and/or biochemical and biophysical aspects (Kimmel and Oliver, 2006). Therefore, it is crucial for any ChIP-on-chip experiment to identify and correct for these biases. Typically, this is done by measuring a reference signal and dividing signal intensities obtained from the immunoprecipitated DNA samples by the reference signal intensities (Kimmel and Oliver, 2006).

To generate a reference signal, either genomic DNA (input) or a mock immunoprecipitation (mock IP) is hybridized onto a second array when working with the Affymetrix platform. An input reference is obtained by using a fraction of the DNA samples before the rest of the material is immunoprecipitated and treated further as described above. A mock IP is performed using an unspecific antibody or no antibody at all to enrich unspecifically bound DNA fragments before hybridizing them (Kimmel and Oliver, 2006).
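The reference correction described above can be sketched in a few lines. This is only an illustration of the ratio idea: reporting the ratio on a log2 scale and adding a pseudocount are assumptions of the sketch (both are common conventions, but neither is stated in the text), and the function name and example intensities are hypothetical.

```python
import math


def log2_enrichment(ip, reference, pseudocount=1.0):
    """Return per-probe log2(IP / reference) for two equally long intensity lists.

    The pseudocount is an assumption of this sketch; it only guards against
    division by zero and the logarithm of zero for empty features.
    """
    if len(ip) != len(reference):
        raise ValueError("IP and reference arrays must cover the same probes")
    return [math.log2((i + pseudocount) / (r + pseudocount))
            for i, r in zip(ip, reference)]


# Hypothetical intensities for three probes on the IP and the input array.
ip_signal = [1023.0, 255.0, 63.0]
input_signal = [255.0, 255.0, 255.0]
ratios = log2_enrichment(ip_signal, input_signal)
```

Positive values then indicate probes that are enriched in the immunoprecipitated sample relative to the reference, negative values indicate depletion.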
Furthermore, to validate the observed effects and reduce variations based on fluorescence intensities and other potential noise sources, a ChIP-on-chip experiment is repeated to generate replicates, typically two to three.

Figure 3.2: Overview of the dry-lab workflow for a ChIP-on-chip experiment to profile chromatin architectural proteins and/or their modifications. (This schematic was accepted for the Wikipedia project of the Wikimedia Foundation.)

At the end of the wet-lab portion of the workflow, a CEL file containing the raw data is generated for each sample and control. For the work presented herein, typically two replicates were generated per experiment. If their Spearman rank correlation was greater than ρ = 0.9, one representative dataset was used for all subsequent analysis. For data with lower correlation, quantile normalization across replicates was used to generate a high-confidence dataset. In either case, the dataset was computationally processed and mapped (given the catalogued array layout) in order to determine at which genomic sites the POI or chromatin modification was present (Figure 3.2). Section 3.5 describes the important steps of this analysis further.

3.3.2  Transcriptome mapping with tiling arrays

As mentioned above, the application range for DNA microarrays is broad and constantly increasing. Besides chromatin modification mapping and other analyses, DNA microarrays are versatile tools to monitor and compare gene expression levels and profile the transcriptome (Kapranov et al., 2003; Bertone et al., 2006). In this type of application, which is often used to reconstruct cellular pathways (Young, 2000) and, as described below, to identify genomic coordinates of transcription events, gene expression levels are analyzed for a sample and a reference condition.
Observed changes in gene activity between a sample and reference profile allow grouping of genes based on their response, assuming that genes with a common regulatory pattern appear in the same pathway or share a common function (Young, 2000). As long as some genes of a group have already been linked to a certain pathway or function, newly identified members can be associated with those, too.

On tiling arrays, the high density of probes allows determining where in the genome transcription occurs under various conditions and, hence, facilitates the identification of novel functional elements or the annotation of unknown genomes (Kapranov et al., 2003; David et al., 2006). Owing to their resolution, tiling arrays, which contain overlapping probes, allow a much more accurate view of a cell’s transcriptome than earlier arrays, which contained fewer probes and were limited in representing the accurate structure of genetic elements (Bertone et al., 2005; Liu, 2007). Some of the first studies with tiling arrays revealed that a larger portion of the genome seemed to be transcribed than covered by known protein-coding regions (Cawley et al., 2004; Kapranov et al., 2002). A mere 1–2 % of the mammalian genome sequence encodes proteins, while 70–90 % of the remaining 98 % is transcribed (Kapranov et al., 2007; Mercer et al., 2009). Formerly, this large portion of the genome was considered to be ‘junk’ DNA, but it now emerges as a source of non-coding RNAs (see Section 1.3.2), antisense, promoter-associated, intergenic and cryptic transcripts (Claverie, 2005; Katayama et al., 2005; Kapranov et al., 2007; He et al., 2008; Guttman et al., 2009; Khalil et al., 2009; Mercer et al., 2009).

In Section 4.5, I present an analysis of tiling array data that helped to elucidate an intriguing example of the interplay between cryptic transcription and chromatin structure. In that work, samples were prepared for array hybridization by synthesizing cDNA from isolated total RNA in S. cerevisiae. The purified cDNA was fragmented, labelled and hybridized to custom Affymetrix tiling arrays, which contain both strands of the S. cerevisiae genome tiled at 8 bp resolution (David et al., 2006). Analogous to ChIP-on-chip experiments on the Affymetrix platform, numerical fluorescence intensities obtained from tiling arrays are stored in CEL files.

3.4  Raw data processing

Once raw data from the sample and control microarray experiments are available as CEL files, they have to be processed computationally in several steps. As described above, intensity values from different batches of experiments are not directly comparable because of differences in quality and quantity of the labelled sample as well as differences in reagents, stain and array handling. In order to reduce these biases and to make the probe measurements more comparable, the data have to be normalized before they can be used to determine enriched regions.

Several algorithms have been developed for normalizing gene expression microarrays (Choe et al., 2005; Grewal et al., 2007). Since these arrays usually have a low probe density, the algorithms cannot be easily transferred to tiling arrays used for ChIP studies. The high-density probes covering large contiguous regions of the genome and the overlapping probe layout require different bioinformatics approaches (Bolstad et al., 2003).

Several tools and methods have been developed to analyze ChIP-on-chip data of tiling arrays. Widely used are MAT (Johnson et al., 2006; Droit et al., 2010), Ringo (Toedling et al., 2007), TileProbe (Judy and Ji, 2009), Tilescope (Zhang et al., 2007), TiMAT [1], Affymetrix Tiling Array Software [2], Splitter [3] and MA2C (Song et al., 2007). A multi-lab study comparing these algorithms (Johnson et al., 2008) found that some algorithms are more suitable than others when applied to data from certain array platforms.
For Affymetrix tiling arrays, as used throughout the work described in this thesis, MAT performed best. The most recent version of the algorithm, rMAT (Droit et al., 2010), was used for all ChIP-on-chip data preprocessing presented hereafter.

[1]
[2] programs/programs/developer/tools/affytools.affx
[3]

3.4.1  rMAT

As the most recent implementation, rMAT uses an adapted version of the Model-based Analysis of Tiling-arrays (MAT) algorithm to reliably detect enriched regions of the genome based on data from a ChIP-on-chip experiment (Johnson et al., 2006; Droit et al., 2010).

Firstly, rMAT tries to eliminate probe effect biases using a Bayesian approach. To this end, the expected intensity of each probe is first estimated based on its sequence composition and then subtracted from the measured intensity. These effects are corrected for because probes with a high content of G or C bases show stronger binding affinity to the chromatin-immunoprecipitated sample DNA, i.e. they tend to have higher intensity values just because of their sequence. This can be explained by the fact that G-C base-pairing between probe and sample is stronger than A-T base-pairing, because the latter association is based on only two hydrogen bonds whereas the former is based on three.

Secondly, rMAT tries to address biases based on the copy number of each probe. Given that some genomic regions are repetitive, sample DNA from such a region may hybridize to any of multiple probes on the array. Therefore, rMAT attempts to correct measured probe intensities by taking the number of existing probe copies into account.

After these normalization steps, all probe intensities are smoothed using a sliding window approach, and an enrichment score is calculated for each probe that reflects the occurrence of the protein or modification of interest. Finally, rMAT uses a scoring function that calls enriched and coherent regions based on a score cutoff, a p-value cutoff or a false discovery rate (FDR) cutoff.
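The two ideas of sequence-dependent probe correction and sliding-window smoothing can be illustrated with a deliberately simplified sketch. It is not the rMAT model: the probe effect is approximated here by an ordinary least-squares fit of intensity against the G/C count of each probe, and the smoothing is a plain window mean. All function names and example values are hypothetical.

```python
def gc_count(seq):
    """Number of G or C bases in a probe sequence."""
    return sum(base in "GC" for base in seq.upper())


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return mean_y, 0.0
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - b * mean_x, b


def remove_probe_effect(seqs, intensities):
    """Subtract the intensity predicted from G/C content (a stand-in model)."""
    gc = [gc_count(s) for s in seqs]
    a, b = fit_line(gc, intensities)
    return [y - (a + b * g) for g, y in zip(gc, intensities)]


def smooth(positions, values, half_window):
    """Mean of all values whose probe position lies within half_window bp.

    Quadratic in the number of probes; acceptable for a sketch only.
    """
    smoothed = []
    for p in positions:
        inside = [v for q, v in zip(positions, values) if abs(q - p) <= half_window]
        smoothed.append(sum(inside) / len(inside))
    return smoothed


# Hypothetical probes with genomic positions and measured intensities.
probe_seqs = ["ATATAT", "ATGCAT", "GCGCAT", "GGCCAT"]
probe_pos = [0, 5, 10, 100]
measured = [1.0, 3.0, 5.0, 9.0]
corrected = remove_probe_effect(probe_seqs, measured)
scores = smooth(probe_pos, corrected, half_window=5)
```

In this toy input the last probe lies far from the others, so its smoothed score equals its own corrected value, whereas the first three probes share information through overlapping windows.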
3.4.2  tilingArray

Biases arising in transcriptome analysis with tiling arrays are similar to those mentioned above for ChIP-on-chip tiling arrays. Therefore, the intensity values of measured gene expression have to be normalized first. The normalization aims to remove variations in the data of non-biological origin due to probe effects and the like, such that the remaining variation in the data can be considered, as much as possible, to reflect truly biological effects. Of the normalization approaches proposed for tiling arrays, complete data methods, which consider the entire data of several replicates, are commonly used, and quantile normalization in particular performs best (Bolstad et al., 2003). To account for probe effects, tilingArray further uses reference intensities of genomic DNA samples to adjust the probe signal intensities and employs a probe-specific background correction (Huber et al., 2006).

The next step of transcriptome analysis is to obtain an unbiased map of position, abundance and architecture of transcripts by calculating the genomic coordinates of transcript start and stop sites on both strands of the DNA. To identify transcripts, some algorithms used sliding window methods together with threshold criteria (Bertone et al., 2004; Kampa et al., 2004; Royce et al., 2005; Schadt et al., 2004). Discrete-state hidden Markov models (HMMs) were also employed to detect transcript boundaries (Tjaden et al., 2002). TilingArray increases the prediction power of these approaches by treating transcript abundance not as a discrete but as a continuous metric and modelling the hidden state of the HMM as a real-valued variable (Huber et al., 2006). The hybridization profiles are then partitioned into segments of constant intensity values, which are separated by change-points indicating the transcript boundaries.
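A partition into constant-intensity segments can be found with a small dynamic program. The sketch below is an illustration under stated assumptions rather than the tilingArray implementation: the number of segments is given as an input, and the fit criterion is the summed squared deviation from each segment mean (which corresponds to a Gaussian likelihood with a common variance). The function name and the example signal are hypothetical.

```python
def segment(values, n_segments):
    """Split values into n_segments contiguous pieces that minimise the summed
    squared deviation from each piece's mean.

    Returns the sorted start indices of segments 2..n (the change-points).
    Runs in O(n_segments * n^2) time.
    """
    n = len(values)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(values)")

    # Prefix sums give the cost of any segment values[i:j] in constant time.
    s1, s2 = [0.0], [0.0]
    for v in values:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def cost(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[k][j]: minimal cost of splitting values[:j] into k segments.
    best = [[inf] * (n + 1) for _ in range(n_segments + 1)]
    back = [[0] * (n + 1) for _ in range(n_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = best[k - 1][i] + cost(i, j)
                if candidate < best[k][j]:
                    best[k][j] = candidate
                    back[k][j] = i

    # Trace back the start index of each segment except the first.
    change_points, j = [], n
    for k in range(n_segments, 1, -1):
        j = back[k][j]
        change_points.append(j)
    return sorted(change_points)


# Hypothetical hybridization profile with three intensity levels.
signal = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 2.0, 2.0]
change_points = segment(signal, 3)
```

In practice the number of segments is itself unknown and has to be chosen by a model selection criterion, which this sketch leaves out.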
To detect the change-points in hybridization intensities, tilingArray determines the global maximum of the log-likelihood of a piecewise constant model by dynamic programming (Gentleman et al., 2004; Picard et al., 2005). Once processed by tilingArray, the gene expression data provided an unbiased genome-wide view of the transcriptome, and the subsequent analysis steps described below built on the results to correlate transcription with chromatin modifications.

3.5  Data analysis and visualization

3.5.1  Local analysis pipeline

As depicted in Figure 3.3, the software ecosystem used for analyzing DNA microarray data comprised several tools that had to be integrated and interfaced with external data sources to allow streamlined processing. Locally running applications and services were hosted on a 64-bit Ubuntu Linux Server 9.10 running on a dual quad-core Intel Xeon machine with 16 GB of main memory. The normalization and preprocessing of the array data was done in R 2.9 with the rMAT and tilingArray packages as provided by Bioconductor (Gentleman et al., 2004). Since neither the packages nor R support native parallel execution of code, the Sun Grid Engine 6.1 was used as a job management system to pool and batch process multiple data sets concurrently and thereby utilize the available computational resources more efficiently. Once the data had been preprocessed with the functions of these packages, scripts I developed in R were used to visualize and plot the data together with other genomic features (see below for details). Finally, the data sets were exported from R as BED and GFF files.

BED files are widely adopted and describe genomic features [4]. The format specifies three required and nine optional fields that are saved in a (tab-delimited) text file.
As shown below, a BED file was generated when handling ChIP-on-chip data that stored the chromosome as well as the start and end coordinates of genomic regions that were called enriched for the chromatin modification of interest by rMAT.

Similar to BED but more flexible, GFF is an exchange format for describing genes and other features of DNA, RNA and protein sequences [5]. It uses nine tab-delimited fields to store and exchange genomic data. Version 3 of the GFF specification was used to export genome-wide probe-level ChIP and mRNA expression data as follows.

[4] For BED format definition, see
[5] For GFF format definition, see

Depending on the type of analysis, the generated files were then directly fed into analysis in MATLAB and/or deposited into a local MySQL 5.1 database, mostly via Java code in Talend Open Studio 3. As an open-source tool, Talend Open Studio provides a code generation engine and a graphical integration designer to flexibly combine, convert and update different (local and remote) data sets, connect to other data sources and output a variety of formats. Several hundred processing functions and formats are predefined and can be assembled into a data integration configuration, which can be run directly in Open Studio or standalone using a Java or Perl engine. Batch processing, synchronization of databases, migration and complex transformation of data are all greatly facilitated through this tool.

At the core of the genome-wide DNA microarray analysis, I employed MATLAB (R2008a–R2010b) and its toolboxes, primarily because of the conveniences the language offers for handling multi-dimensional arrays, parallel execution of code, connectivity with external services (e.g. databases via SQL or web services via HTTP wrappers), debugging functions of the development environment and interactive options for plotting. In particular, MATLAB scripts were used to perform analyses and test biological hypotheses as described in the following sections.
Enrichment of genomic features for chromatin modifications. As explained in Section 1.3.1, chromatin modifications such as histone marks have been associated with gene activity and other DNA-templated processes in the cell. Understanding the distribution of modifications and their localization with respect to functional genetic elements will provide insights into the underlying epigenetic regulation mechanisms. To determine enrichment of histone marks and chromatin modifications for certain genomic features, I developed a script that uses genomic coordinates saved in BED files representing enriched regions, and genomic coordinates of genetic features of interest, e.g. all open reading frames (ORFs) of genes in the S. cerevisiae genome. Iterating over all chromosomes, the script matches both sets of coordinates and determines the relative enrichment per feature. According to a user-defined threshold, enriched features, e.g. gene names, are saved.

Figure 3.3: Software ecosystem of the bioinformatics pipeline. See text for details on the individual components.

Subsequently, multiple lists of enriched features can be compared by set operations, such as union, intersection and complement, to determine features with co-occurring and exclusive marks. Such analysis helped to reveal the mutually exclusive distribution of H3K79me2 and H3K79me3 in the genome (see Section 4.3 for details). Similarly, I developed scripts that determine enrichment of other genomic features, such as promoters, centromeres, telomeres, tRNAs, rRNAs, autonomous replicating sequences and silent mating-type loci. The Saccharomyces Genome Database (Cherry et al., 1998) served as a primary source of information on such features, and data were downloaded and stored in tab-delimited files and/or the local MySQL database. A list of all transcription start and end sites for all transcripts in S. cerevisiae was kindly provided by Harm van Bakel and is based on data published in Lee et al. (2007).
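The coordinate matching and set comparison described in this subsection can be sketched as follows. This is a minimal illustration and not the original script: the relative enrichment of a feature is taken here to be the fraction of its length covered by enriched regions (an assumption), BED intervals are read as 0-based and half-open, and all gene names, coordinates and the threshold are hypothetical.

```python
def parse_bed(lines):
    """Read chrom, start, end from BED lines (0-based, half-open intervals)."""
    regions = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        regions.setdefault(chrom, []).append((int(start), int(end)))
    return regions


def covered_fraction(feature, regions):
    """Fraction of a feature (start, end) covered by the given regions."""
    start, end = feature
    if end <= start:
        raise ValueError("feature must have positive length")
    clipped = sorted((max(s, start), min(e, end))
                     for s, e in regions if s < end and e > start)
    covered, cursor = 0, start
    for s, e in clipped:
        s = max(s, cursor)  # skip the part already counted by an earlier region
        if e > s:
            covered += e - s
            cursor = e
    return covered / (end - start)


def enriched_features(features, regions, threshold):
    """Names of features whose covered fraction reaches the threshold."""
    return {name for name, (chrom, start, end) in features.items()
            if covered_fraction((start, end), regions.get(chrom, [])) >= threshold}


# Hypothetical ORF coordinates and enriched regions for two marks.
orfs = {"GENE_A": ("chrI", 100, 500),
        "GENE_B": ("chrI", 800, 1200),
        "GENE_C": ("chrII", 0, 300)}
bed_me2 = ["chrI\t100\t200", "chrI\t150\t400"]
bed_me3 = ["chrI\t900\t1200"]

me2 = enriched_features(orfs, parse_bed(bed_me2), threshold=0.5)
me3 = enriched_features(orfs, parse_bed(bed_me3), threshold=0.5)
both = me2 & me3        # features carrying both marks
either = me2 | me3      # features carrying at least one mark
only_me2 = me2 - me3    # features carrying the first mark exclusively
```

With these toy inputs the two marks occupy different genes, so the intersection is empty, which is the kind of pattern a mutually exclusive distribution would produce.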
To determine the statistical significance of set cardinalities, e.g. groups of genes enriched for particular histone marks, Fisher’s exact test (Fisher, 1922) was applied. Similar to the χ²-test, Fisher’s test analyzes contingency tables to calculate the significance of associations between objects and categories. It is more accurate than the χ²-test, especially for small group cardinalities (Agresti, 1992). The binomial test was used as an exact test statistic when objects were equally likely to belong to either one of two categories, e.g. enriched or not enriched. In cases where expected distributions were unknown, Monte Carlo approaches were employed to generate reference values. For an example based on the latter approach, testing significant enrichment of autonomous replicating sequences with H3K79-methylation marks, refer to Section 4.3.6.

Since not all chromatin modifications are necessarily found across the entire length of genomic features but are localized to distinct segments, partitioning of features is essential to get a more accurate view of their enrichment for particular marks (Liu et al., 2005). Also, probe-level enrichment scores from GFF data increase the accuracy over BED-file information. I developed an algorithm that partitions genes into five segments representing their promoter, transcription start site, 5′-coding sequence, mid-coding sequence and 3′-coding sequence, and determines their individual enrichment for histone marks. Overlapping of segments was avoided by applying length constraints. Section 4.4.2 presents results based on this approach.

Genome-wide visualization of chromatin modifications. While raw numbers may contain all relevant information of an analysis, they are not always suitable to convey biological meaning efficiently. Graphic visualizations of results often aid in representing abstract content and reveal information at a glance (Gehlenborg et al., 2010).
Yet, satisfying this claim is often particularly challenging for genome-wide data. When mapping chromatin modifications with ChIP-on-chip, a major goal is to assess the localization and distribution of the modifications across the genome. Genome-wide visualization approaches for ChIP-on-chip and related data can be distinguished according to the level of detail they resolve. At single-gene resolution, tools like the UCSC Genome Browser (Fujita et al., 2011) offer the most detailed level of data presentation, typically spread out over several scrollable maps. At this resolution level, I developed visualization scripts in R that generate genome-wide maps of DNA microarray data. Similar to the Genome Browser, they allow visualization of additional features, such as nucleosome positions, introns, telomere elements, autonomously replicating sequences and silent mating-type loci, but extend its flexibility through layered plotting of multiple profiles and features, which, for example, makes it easier to qualitatively assess colocalization of chromatin modifications relative to genomic elements. Plots of this kind are found in Section 4.3.1, for instance, and aided in formulating initial hypotheses on the influence of chromatin structure at sites of cryptic transcription (see Section 4.5).

While offering fine-grained information, single-gene maps make it difficult to derive broader principles when evaluating enrichment profiles of chromatin modifications across all genes, since the information is too dispersed. Averaging approaches provide a widely used alternative by first grouping genes (or other genomic features) of similar kind into classes based on common characteristics and then calculating the average enrichment profile for each class.
Similar to results derived in previous studies (Pokholok et al., 2005; Mayer et al., 2010), I developed algorithms to generate average gene profiles based on the degree of enrichment, gene length or transcriptional frequency. Section 4.4.3 and others present results of this analysis. Average profiles can provide a good overview of the distribution of chromatin marks and their association with certain genomic features, but may also be biased towards the binding profile of the most strongly bound genes without revealing how many exceptions there are. Therefore, dividing features into segments, and thereby normalizing for gene length, as done in previous studies (Venters and Pugh, 2009; Badis et al., 2008), results in more accurate representations of enrichment profiles. However, these representations so far typically visualize only certain feature segments individually, e.g. only the 5′-coding sequence of all genes, and thereby inevitably fragment again the information that is necessary for a global assessment at a glance. I improved this approach and developed a visualization tool that allows an unbiased view on chromatin modification profiles at a genome-wide scale by sorting features by length and calculating mean enrichment scores for bins of absolute size. Thereby, the tool balances detail and compactness of existing visualization approaches. For details on the tool and how it was used in the work described below, refer to Section 3.5.2.

Colocalization and patterns of chromatin modifications. Chromatin modifications can colocalize on the same or neighbouring nucleosomes and may depend on each other through pathways collectively referred to as histone crosstalk (see Section 1.3.1 for details). A major goal when analyzing chromatin modifications is to find such patterns of co-occurring modifications, sometimes called ‘words’ of the hypothesized code (Lee et al., 2010), and to elucidate their dependencies.
To interrogate the global relationship of histone modifications on a probe level, Spearman’s rank correlation coefficients are widely used (Spearman, 1987). This non-parametric statistic measures the strength and direction of a monotonic relationship between two data sets. As a correlation metric, it is based on ranked values rather than raw expression/enrichment levels, which makes it less sensitive to extreme values in the data. In the projects described below, Spearman correlation coefficients were calculated to assess the reproducibility of replicated microarray experiments and to find correlations between different chromatin modifications.

On the level of genomic features, potential links between chromatin modifications can be revealed by comparing sets of, for example, feature names or coordinates associated with a modification. Non-empty intersections between such sets indicate co-occurrence of marks, whereas empty intersections indicate exclusive enrichment. Derived results can be checked for their statistical significance using appropriate tests, such as Fisher’s exact test as described above. As explained in Section 4.4.2, chromatin modifications do not necessarily occur across the entire length of genomic features, and, hence, colocalization of modifications must be considered on a feature-segment level, too. For the comprehensive analysis of chromatin modification patterns (chromatin signatures) across transcripts in Section 4.4.2, all transcripts in the S. cerevisiae genome were first partitioned into five segments and their relative enrichment was calculated based on probe-level GFF data. Second, similarly enriched segments were to be grouped automatically. Clustering provides powerful methods for this purpose and allows organizing data into a smaller number of relatively homogeneous groups.
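Returning to the probe-level comparison at the beginning of this passage, the rank-based nature of Spearman’s coefficient can be illustrated with a short, dependency-free Python sketch. It is not the code used in this work; the function names and the replicate values are hypothetical. The coefficient is computed as the Pearson correlation of the ranks, with tied values receiving their average rank.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the current group of tied values.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical enrichment scores of five probes in two replicates.
# The extreme value 30.0 only contributes through its rank.
replicate_1 = [0.1, 0.5, 2.0, 1.2, 30.0]
replicate_2 = [0.2, 0.4, 1.9, 1.0, 5.0]
rho = spearman(replicate_1, replicate_2)
```

Because both replicates order the probes identically, rho is 1 up to floating-point rounding, even though one probe has a far larger raw score in the first replicate than in the second.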
Clustering methods can be distinguished into supervised and unsupervised approaches: the former assign some predefined order to the data, while the latter require no prior assumptions on how the data is grouped. For microarray data, hierarchical clustering, k-means, k-nearest neighbour, principal-component analysis and self-organizing maps are commonly applied clustering methods, each with their own strengths and weaknesses in certain analyses (Nugent and Meila, 2010). For the work presented in Section 4.4.2, generally no prior information on the number of clusters was available. Hence, supervised approaches were disfavoured. Hierarchical clustering based on Euclidean distance between entities and unweighted average distance (UPGMA) as linkage criterion yielded the most plausible results to group modification profiles from a biological perspective. To get a more quantitative measure of occurring chromatin signatures, an enrichment threshold can be applied, which transforms the patterns into binary combinations. For the five segments discussed in Section 4.4.2, the resulting 2^5 = 32 combinations can then be characterized further with respect to certain Gene Ontology terms, transcription factor binding lists and the like. In order to explore the statistical significance of any overlaps between sets, Fisher’s exact test with a cut-off of p = 0.0001 was applied.

Associations of cellular functions with chromatin modifications. Besides the genome-wide mapping of chromatin modifications, it is a major goal to assess the association of marked genomic features, typically genes, with cellular functions. In the work presented below, genes as well as gene segments were analyzed for associations with transcriptional frequencies, transcription factors, Gene Ontology (GO) (Ashburner et al., 2000) terms as well as specific cellular processes of particular interest.
Section 4.3, for instance, presents results that functionally link histone H3 methylation marks and cell cycle-regulated genes. For microarray data, exploratory data mining based on GO annotations has proven effective in revealing the cellular functions and pathways in which genes of interest are involved (Werner, 2008). As the most comprehensive effort to introduce consistency in naming biological events and properties, the Gene Ontology project provides a set of structured vocabularies and controlled terms for annotating genes, gene products and sequences (Ashburner et al., 2000). Represented as a hierarchical graph structure, GO terms allow deriving general as well as specific characteristics of genes depending on their location in the graph. For the genome of S. cerevisiae, the Saccharomyces Genome Database has used the Gene Ontology to annotate gene products in the budding yeast. From MATLAB, the ontologies can be accessed either remotely or locally after downloading. They can be queried regarding molecular function to assign activities to gene products, regarding biological process to put those gene functions in their biological context, and regarding cellular component to determine the subcellular compartments in which the gene product is found.

Relationships between individual or combinations of chromatin modifications and transcriptional regulation were further investigated by testing whether certain sets of genes or signatures correlated with transcriptional frequencies and/or binding of transcription factors. To this end, each gene was assigned to its corresponding transcriptional frequency class (Holstege et al., 1998) and/or compared to publicly available lists of genes bound by specific transcription factors (MacIsaac et al., 2006). Examples of this analysis are found in Section 4.3.

Associations of transcription events with chromatin modifications.
Building on the analysis results regarding chromatin modifications in the context of transcription, gene expression data were used to further complement the associations between chromatin structures and transcriptional events. For the work described in Section 4.5, gene expression experiments were performed to measure the complete transcriptome in different S. cerevisiae strains. In the analysis of the resulting data sets, features of the tilingArray package (Huber et al., 2006) described above were used together with custom-written code to first calculate the coordinates and visualize transcription segments on a genome-wide scale and then to prepare the data for subsequent analysis. Among those, emphasis was put on the identification of novel and ‘unusual’ transcripts in certain mutant strains. Combining database operations with MATLAB functionality made it possible to determine the occurrences of cryptic transcription initiation events, characterized by intragenic transcript boundaries predicted by tilingArray, on a genome-wide scale and to integrate ChIP-on-chip data in order to assess the observations together with the local chromatin structure. These analyses aided in formulating several hypotheses regarding a potential relocalization of the histone variant H2A.Z in certain mutant strains and the restructuring of intragenic chromatin so that it resembles promoter characteristics, and may lead to discerning dependencies and causalities of chromatin modifications in the context of transcriptional regulation in the cell.

3.5.2  Web-based analysis and collaboration

While the local bioinformatics pipeline performed well for all aspects of the analysis in the projects described in Chapter 4, ultimately it remains a heterogeneous software ecosystem that requires acquaintance with several tools, services, languages, file formats and so forth.
These aspects can make it labour intensive to reconfigure pipeline components, integrate new data sources, update or migrate the setup to a new environment and the like (Davidson et al., 1995). In order to further facilitate synthesizing new scientific knowledge and relationships within large amounts of data, solutions should strive for homogeneity, portability and flexibility (Ritter et al., 1994; Markowitz and Ritter, 1995; Baker et al., 1998). Hence, one goal of my work was to identify and modularize components of the pipeline and facilitate their reuse as plugins for existing web-based bioinformatics environments.

Thanks to the pervasive popularity of the web browser as user interface, the ubiquitous availability of network access and the separation between logic and representation, web-based software solutions have become increasingly popular in bioinformatics research to effectively access data sources and software tools. They can be broadly classified into data-centred and analysis-centred systems (Stockinger et al., 2008), although boundaries are not always clear-cut and hybrid solutions may be advantageous in the long term given the increasing penetrance of cloud-based computing (Schadt et al., 2010). Over the last decade, there have been tremendous efforts to increase the user-friendliness of such environments and to facilitate interactions with the algorithms and data repositories that often allow complex parametrization regarding their input and output data structures. Some of the environments integrating various data sources and/or analysis tools that found a broader community include, but are not limited to, Biology Workbench (Subramaniam, 1998), ISYS (Siepel et al., 2001), GenePattern (Reich et al., 2006), BioExtract Server (Lushbough et al., 2008) and Galaxy (Goecks et al., 2010; Taylor et al., 2007).
As one of the most popular, versatile and cloud-ready platforms, the Galaxy framework integrates a variety of tools, which enable experimentalists without programming expertise to easily retrieve and query local and remote data, perform genome-scale analysis and visualize the results in a modularized way.

Galaxy. Galaxy is an open-source, extendable, web-based bioinformatics environment, which offers various features and tools for the analysis of genome-scale data under a unified interface. It is readily available online and downloadable for local installations and customizations. The standard setup already includes over one hundred computational tools and converters for many tasks and data formats in bioinformatics research. The philosophy of Galaxy, thus, aims beyond data-centred approaches of projects like BioMart (Smedley et al., 2009) or the UCSC Genome Browser (Fujita et al., 2011). Furthermore, Galaxy allows sharing of analysis steps through history functions and collaboration of several people on entire analysis workflows, including the data. This increases the transparency of data analysis and facilitates comparisons as well as the general reproducibility and traceability of how results were derived. From the local software pipeline for microarray data described above, one visualization approach appealed particularly to experimentalists as it offers great versatility to represent genome-wide chromatin modifications in an accurate yet compact manner. In the following section, I use this component to exemplify how parts of a local analysis pipeline can be easily converted for use in Galaxy.

CHROMATRA. The previous sections outline how the rise of DNA microarrays contributed to the transformation of research in biology from a purely lab-based towards an information science. As a ‘positive’ result of this transition, the amount of data now available to biologists allows questions of unprecedented depth to be addressed.
On the other side, however, the increasingly large volume and complexity of data sets demand adequate computational methods to analyze and also to present the data more effectively. With respect to the latter aspect of data handling, it remains a challenge to visualize large-scale data effectively such that their biological meaning can be grasped easily. As described above, existing tools, like the UCSC Genome Browser (Fujita et al., 2011), offer the most detailed level of data presentation but may make it challenging to derive broader principles in the data since the information is too dispersed. Averaging approaches circumvent this problem by condensing the information. They, however, inevitably sacrifice potentially important details. Balancing both detail and compactness of existing visualization approaches, I developed CHROMATRA (CHRomatin mOdification Mapping Across TRAnscripts), a visualization tool for genome-wide DNA-protein binding and chromatin modification maps. It portrays binding events or chromatin modification profiles across all transcripts, or other genomic features that can be aligned, in a comprehensive yet condensed form by accounting for their length and additional characteristics.

CHROMATRA is a set of Python scripts bundled as a plug-in for the Galaxy bioinformatics platform (Goecks et al., 2010; Taylor et al., 2007). Integration of the tool into Galaxy is achieved via parameter and user interface descriptions provided in an XML file. Since CHROMATRA was written in Python, it easily integrates with Galaxy, itself primarily implemented in Python, and has no dependencies on any (external) libraries not already used by Galaxy. CHROMATRA consists of two visualization modules, which can be integrated into the analysis workflow for ChIP-on-chip or ChIP-Seq experiments.
As explained in previous sections, the raw data of ChIP experiments are normalized and enrichment scores are calculated and stored in standardized formats such as BED, GFF and WIG for subsequent analysis. After these steps, CHROMATRA can be tied in with the workflow and used to visualize ChIP profiles by reading GFF files that contain enrichment scores for the entire genome (Figure 3.3).

The first module, CHROMATRA-L, accounts for differences in length of genomic features, such as genes, and eliminates potential biases in assessing ChIP profiles as they arise when using non-absolute length scales. It does so by sorting all features by length and calculating mean enrichment scores for bins of absolute size according to user-specified parameters. Therefore, the module requires gene coordinates to be uploaded, e.g. position information of all transcripts or open reading frames of an organism. These coordinates can be uploaded by the user or readily derived from other Galaxy modules in a tab-delimited format. The resulting enrichment profiles are colour-coded according to user-defined settings and visualized in a heatmap-like plot, which is available for download in different image formats, such as PNG, PDF or SVG.

The second module, CHROMATRA-T, extends the first one by allowing an additional metric, such as transcriptional frequency of genes, to be accounted for when visualizing feature enrichment profiles. The values for the additional characteristic can be specified in the tab-delimited input file that also holds the gene coordinates. According to user-defined intervals or clusters for the second metric, CHROMATRA-T first partitions the set of features accordingly and then sorts each group by length. Enrichment values are subsequently calculated and displayed as described above. CHROMATRA-T hence allows assessing ChIP enrichment profiles at a glance according to various feature characteristics and avoids length-dependent biases.
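The length-sorted binning that underlies CHROMATRA-L, as described above, can be sketched as follows. This is a simplified illustration under assumptions that are not taken from the tool itself: all features lie on one chromosome and on the forward strand, probes are given as (position, score) pairs, and the function names, the bin size of 100 bp and the toy data are hypothetical. Rendering the resulting matrix as a colour-coded heatmap is omitted.

```python
def mean_or_none(values):
    """Mean of a list, or None for an empty bin."""
    return sum(values) / len(values) if values else None


def length_sorted_profiles(features, probes, bin_size=100):
    """Compute one row of binned mean enrichment per feature.

    features: list of (name, start, end), half-open, forward strand
    probes:   list of (position, score) on the same chromosome
    bin_size: absolute bin width in base pairs (hypothetical default)

    Rows are ordered from the shortest to the longest feature. Bins are
    measured from the feature start in absolute base pairs, so bins that
    lie beyond the end of a shorter feature stay empty (None).
    """
    ordered = sorted(features, key=lambda f: f[2] - f[1])
    longest = ordered[-1][2] - ordered[-1][1] if ordered else 0
    n_bins = -(-longest // bin_size)  # ceiling division

    rows = []
    for name, start, end in ordered:
        bins = [[] for _ in range(n_bins)]
        for position, score in probes:
            if start <= position < end:
                bins[(position - start) // bin_size].append(score)
        rows.append((name, [mean_or_none(b) for b in bins]))
    return rows


# Hypothetical toy data: one long and one short feature.
toy_features = [("long_gene", 0, 300), ("short_gene", 1000, 1100)]
toy_probes = [(10, 1.0), (50, 3.0), (150, 2.0), (250, 4.0), (1020, 0.5)]
toy_rows = length_sorted_profiles(toy_features, toy_probes)
```

Because the bins have an absolute rather than a relative width, the rows form a staircase when stacked: each feature fills only as many bins as its length allows. A grouping step in the spirit of CHROMATRA-T would first split the features by the second metric and then apply the same function to each group.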
Section 4.4 describes results visualized with the CHROMATRA approach that allowed direct comparison of genome-wide ChIP-on-chip data addressing the localization and dependencies of chromatin modifications in the context of transcription. Also, data derived through other ChIP techniques, such as MeDIP-seq (Weber et al., 2005; Jacinto et al., 2008) or MBD-seq (Serre et al., 2010), can be handled by CHROMATRA.

3.6  Outlook

CHROMATRA represents an example of how to transform an existing (heterogeneous) local analysis pipeline into a (homogeneous) web-based environment such as Galaxy, which increases reuse, portability and flexibility of the tools, and enables (remote) project collaborations and greater traceability of analysis. In the long term, standardization and integration efforts will continue for such platforms. GenomeSpace from the Broad Institute, for example, already integrates Galaxy and other popular tools including Cytoscape, GenePattern, Genomica, Integrative Genomics Viewer and the UCSC Genome Browser into a meta-workspace. Equipped with cloud-based computational resources, such efforts seem to provide suitable means to address the data-related challenges genomics, metagenomics and epigenetics projects will pose (Kahn, 2011).

Chapter 4

Analysis of chromatin modifications in the context of transcription

4.1  Synopsis

The overall goal of the projects described in this chapter was to learn more about the genome-wide distribution and regulation of chromatin modifications, with a particular focus on histone post-translational modifications, such as methylation, acetylation and ubiquitination, as well as histone variants and their relationship with transcription events.
My work helped to elucidate the function and distribution of the modification states of a single histone residue, the distinctive patterns in which multiple histone modifications co-occur, the spatial and temporal dependencies between different modifications, and the structure of chromatin at sites of cryptic transcription.

In the first part, the genome-wide distribution of histone H3 lysine 79 di- and trimethylation was mapped using ChIP-on-chip, and subsequent analysis revealed that the methylation states are mutually exclusive across the genome. This non-redundancy suggests that individual states are linked to different cellular functions, and my analysis revealed that H3 lysine 79 dimethylation was linked to the cell cycle.

The second part of this chapter presents my analysis of patterns of co-occurring histone marks, with a particular focus on histone H2B monoubiquitination crosstalk. The first genome-wide map of H2B lysine 123 monoubiquitination (H2BK123ub) in S. cerevisiae was established, and my comprehensive analysis revealed that H2BK123ub is associated with both its dependent marks, H3K4 and H3K79 trimethylation, on a genome-wide scale. Furthermore, the presented results indicate that the H2B deubiquitinases Ubp8 and Ubp10 act non-redundantly and remove H2BK123ub at different genomic loci. Ubp8 acted on loci that were enriched with H3K4 trimethylation, while Ubp10 removed ubiquitin at sites marked with H3K79 trimethylation.

In the last part of this chapter, my analysis supported studies of chromatin neighbourhoods at sites of cryptic transcription, which occur when certain chromatin pathways, such as the Set2/Rpd3 pathway, are impaired. Using tiling arrays to capture the complete transcriptome of the cell, cryptic initiation sites are determined across the genome. Furthermore, the localization of the histone variant H2A.Z at sites of cryptic transcription as well as the role of its deposition complex are investigated.
4.2  Background

As described in Section 1.3.1, chromatin folds into a hierarchy of complex layers (Misteli, 2007) and serves as a state-encoding memory in the cell. On the first level of the hierarchy, histone proteins can be post-translationally modified (Bhaumik et al., 2007; Kouzarides, 2007) and modifications can colocalize in various combinations, thereby defining distinct chromatin signatures and patterns (Bhaumik et al., 2007; Kouzarides, 2007). One of the first described post-translational modifications of histones was methylation of lysine residues (Murray, 1964). It is one of the most stable epigenetic marks and is proposed to serve as a core epigenetic mark during cell state inheritance and cellular memory in general (Radman-Livaja et al., 2010; Muramoto et al., 2010; Lan and Shi, 2009). Although histone methylation was initially considered to be non-reversible, several demethylases have been discovered over the past years (Mosammaparast and Shi, 2010).

Distinct states of histone modifications. To date, three histone methyltransferases have been identified in S. cerevisiae: Set1/COMPASS, Set2 and Dot1 (Shilatifard, 2006). They are responsible for methylation of lysines 4, 36 and 79 on histone H3 (H3K4, H3K36 and H3K79), respectively. In that process, each lysine residue can be either mono-, di- or trimethylated (e.g. H3K79me1, -me2 or -me3) (Shilatifard, 2006). Histone H3 methylation occurs in distinct regions of the genome and in different segments of genes, which are associated with particular stages of the transcription cycle that comprises transcription initiation, elongation and termination. H3K4me3, for example, is found at the 5′-end of genes (Bernstein et al., 2002; Briggs et al., 2001; Krogan et al., 2003b; Miller et al., 2001; Ng et al., 2003b; Santos-Rosa et al., 2002; Pokholok et al., 2005) where transcription initiation occurs.
H3K36 methylation, in contrast, is located towards the 3′-end of open reading frames and associated with transcription elongation (Pokholok et al., 2005; Strahl et al., 2002). While H3K79 methylation has also been implicated in DNA repair and telomeric silencing (Lacoste et al., 2002; Ng et al., 2003a; van Leeuwen et al., 2002), its connection with the transcriptional process is not well understood yet. This uncertainty can be largely attributed to the unsettled debate in the field whether different methylation states of H3K79 have similar, possibly even redundant, or distinct functions.

The work presented in the first part of this chapter addresses the debated redundancy of lysine 79 methylation states by mapping H3K79 di- and trimethylation genome-wide using chromatin immunoprecipitation (ChIP) followed by high-resolution DNA microarray profiling (ChIP-on-chip). My data analysis contributed to the finding that H3K79 methylation states are mutually exclusive across the genome and that individual states are linked to different cellular functions. In particular, H3K79 dimethylation was linked to the cell cycle and found to be enriched at M/G1 cell cycle-regulated genes as well as in Swi4/6-bound promoters.

In S. cerevisiae, more than 800 genes are known to be cell cycle-regulated and their expression peaks periodically in either one of the M, G1, S or G2 cycle phases (Spellman et al., 1998). A key mechanism driving cell cycle control is the regulatory interconnection and temporal orchestration of transcriptional activators (Simon et al., 2001). Often, transcription factors function specifically during one cell-cycle stage and control the expression of transcriptional activators for a subsequent one, thereby forming a feed-forward regulatory circuit to ensure an ordered progression through cell division.
One of those is the heterodimeric transcription factor complex SBF (SCB-binding factor) consisting of Swi4 and Swi6 (Breeden, 1996; Andrews and Herskowitz, 1989). They bind the repeated regulatory sequence SCB (Swi4,6-dependent cell cycle box) upstream of the genes they activate during G1/S phase transition (Harbison et al., 2004; Iyer et al., 2001; Simon et al., 2001). The role of chromatin modifications in transcription control has been intensively studied in recent years (Hogan et al., 2006; Kaplan et al., 2008). However, many questions remain to be answered regarding chromatin structure and the role that chromatin alterations play in normal cell-cycle progression and transcriptional regulation. The results presented in the first part of this chapter aided in revealing a novel connection between H3K79me2 and the SBF transcriptional activation complex.

Interdependencies of histone modifications. Post-translational histone modifications may occur individually or in combination on the same nucleosome, resulting in their cooperative function in specific chromatin neighbourhoods (Suganuma and Workman, 2008). The spatial proximity of histone modifications sometimes underlies causal relationships and dependencies based on histone crosstalk (Latham and Dent, 2007; Suganuma and Workman, 2008). One striking example is the evolutionarily conserved trans-tail histone crosstalk between H2BK123 monoubiquitination (H2BK123ub) and H3K4 and H3K79 trimethylation, in which H2BK123ub is required for the addition of H3K4me3 and H3K79me3 (Shilatifard, 2006). This regulatory pathway acts unidirectionally, as mutations that eliminate either H3K4me3 or H3K79me3 have no effect on H2B ubiquitin levels (Dover et al., 2002; Sun and Allis, 2002; Briggs et al., 2002).
H2BK123ub is established by the Rad6/Bre1 ubiquitin ligase complex, and consistent with roles in transcription initiation and elongation, it is found in gene promoters and their coding regions (Dover et al., 2002; Hwang et al., 2003; Robzyk et al., 2000; Wood et al., 2003a). Numerous aspects of the transcription cycle are associated with H2BK123ub. Bre1 is required for the recruitment of Rad6 to promoters, and Rad6 requires the PAF complex to associate with RNA Polymerase II (RNAPII) and travel with its elongating form into coding regions (Xiao et al., 2005; Wood et al., 2003a). Consistently, the transcription elongation complexes PAF and BUR are essential for proper establishment of H2BK123ub (Krogan et al., 2003b; Wood et al., 2003b; 2005).

While it is already known that H2BK123ub is linked to H3K4 and H3K79 methylation, its role in establishing these marks has not been elucidated on a genome scale due to the lack of a suitable antibody. As part of the project described in this chapter, the first genome-wide map of H2BK123ub in S. cerevisiae was established, and the localization of H2BK123ub was compared to H3 methylation marks. My comprehensive analysis revealed that H2BK123ub is associated with both dependent marks H3K4me3 and H3K79me3 on a genome-wide scale. Furthermore, the global genomic analysis demonstrates that monoubiquitination of histone H2BK123 co-occurs in distinct combinations with histone H3 methylation marks in promoters, the 5′-end, the body and the 3′-end of genes.

In contrast to methylation marks, which are relatively stable and small, monoubiquitination is a transient and large protein tag of 76 amino acids (Hochstrasser, 1996). In S. cerevisiae, H2BK123ub is removed by the ubiquitin-specific proteases Ubp8 and Ubp10 (Weake and Workman, 2008).
Ubp8 is a subunit of the Spt-Ada-Gcn5-acetyltransferase (SAGA) complex, and the integrity of SAGA is required for Ubp8 deubiquitination activity (Daniel et al., 2004; Henry et al., 2003; Ingvarsdottir et al., 2005; Lee et al., 2005; Powell et al., 2004). Deubiquitination of H2BK123 by Ubp8 triggers the recruitment of Ctk1, which is involved in phosphorylating serine 2 at the C-terminal domain (CTDS2ph) of RNA polymerase II (Wyce et al., 2007). In contrast, Ubp10 (also known as Dot4) acts independently of SAGA and is associated with telomeric silencing (Emre et al., 2005; Gardner et al., 2005). Ubp10 binds to the silencing protein Sir4 and is enriched in silenced loci, but its function is not restricted to heterochromatic regions (Gardner et al., 2005; Kahana and Gottschling, 1999). Given these differences, it is likely that Ubp8 and Ubp10 deubiquitinate distinct pools of H2BK123ub in the cell. This is further supported by the observation that deletion of both proteases results in a higher global increase of H2BK123ub levels than either one of the single deletions alone (Emre et al., 2005; Gardner et al., 2005). However, the site-specific roles of Ubp8 and Ubp10 have not been investigated on a genome scale. In the second part of this chapter, the presented results indicate that Ubp8 and Ubp10 act non-redundantly and remove H2BK123ub at different genomic loci. Ubp8 acted on loci that were enriched with H3K4me3, while Ubp10 removed ubiquitin at sites marked with H3K79me3.

Aberrant transcripts and their chromatin structure. Besides post-translational modifications, histone variants can alter chromatin structure and affect or regulate cellular processes. These variants are deposited into specific chromatin regions and differ from their canonical counterparts in primary amino acid sequence (Talbert and Henikoff, 2010). H2A.Z is one such variant and is deposited into chromatin by the ATP-dependent complex SWR1-C (Kobor et al., 2004; Krogan et al., 2003a; Mizuguchi et al., 2004).
It is mainly localized to promoter regions of genes and has been proposed to play a role in transcriptional regulation (Guillemette and Gaudreau, 2006). Furthermore, H2A.Z, together with components of the RNAi machinery, has been proposed to suppress antisense transcripts in the S. pombe genome (Zofall et al., 2009), pointing at its essential role in transcriptional control.

The regulation of chromatin is not only essential for a controlled progress through the transcription cycle, but also influences the accurate choice of transcript initiation sites. In cells lacking certain chromatin regulators, initiation of transcription occurs inappropriately within the protein-coding regions of genes, indicating that under altered genetic or physiological conditions, the expression of alternative genetic information may occur (Kaplan et al., 2003; Carrozza et al., 2005; Cheung et al., 2008; Joshi and Struhl, 2005). This phenomenon, called cryptic or spurious transcription, indicates that the precise organization of chromatin along transcription units is crucial to direct transcription factors and RNA polymerase to proper start sites within genes.

One of the chromatin regulators involved in the suppression of cryptic transcripts is the methyltransferase Set2. As mentioned above, Set2 interacts with RNA polymerase II during transcription elongation and methylates H3K36 cotranscriptionally (Strahl et al., 2002; Krogan et al., 2003c). H3K36 methylation was shown to act as a signal to recruit the histone deacetylase complex Rpd3S, which deacetylates histones within transcribed sequences. The removal of histone acetylation is an important step to prevent cryptic transcription, because histone acetylation marks open chromatin and increases its accessibility for the transcription machinery. This process can be understood as a kinetic proofreading scheme to ensure correct transcription initiation (Blossey and Schiessel, 2008).
If the proper deacetylation in open reading frames is defective, for example in the absence of Set2, spurious transcription initiation occurs from cryptic start sites within ORFs (Carrozza et al., 2005; Lickwar et al., 2009). However, very little is known about how transcription is initiated from cryptic promoters and how the chromatin structure is altered (Pattenden et al., 2010). In the following part, the localization of the histone variant H2A.Z at sites of cryptic transcription is analyzed and the role of its deposition complex investigated.

4.3  Distinct states of histone modifications

4.3.1  Genome-wide localizations of H3K79me2 and H3K79me3

Little is known about the distinct functions of H3K79me2 and H3K79me3 marks in the cell, and a redundancy in their roles was proposed in budding yeast (Frederiks et al., 2008; Shahbazian et al., 2005). To better understand the relationship of H3K79 di- and trimethylation as well as elucidate a possible role for H3K79me2 in cell cycle progression, a comprehensive genome-wide map of these modifications was established via ChIP-on-chip experiments using high-resolution tiling microarrays. Protein-DNA complexes containing either di- or trimethylated forms of histone H3K79 were specifically immunoprecipitated with antibodies against H3K79me2 and H3K79me3, respectively. In order to reliably detect enriched regions, an adapted version of the Model-based Analysis of Tiling arrays algorithm (rMAT, see Section 3.4.1 for details) (Johnson et al., 2006; Droit et al., 2010) was applied, comparing signal intensities between ChIP and genomic DNA to calculate the protein-binding profile. Spearman rank correlation coefficients of ρ = 0.9 on average indicated high reproducibility and robustness of the replicates, typically two per experiment. Intriguingly, H3K79me2 and H3K79me3 were localized to different regions of the genome and had distinct and mutually exclusive patterns on chromatin (Figure 4.1A).
In total, H3K79me2 covered ∼22 % and H3K79me3 ∼35 % of the genome with only 2 % overlap, suggesting that these H3K79 methyl marks were associated with distinct genomic regions. To compare the localization data with known genome-wide nucleosome occupancy data, H3K79 methylation profiles were overlaid with nucleosome position data predicted by a Hidden Markov Model (HMM) (Lee et al., 2007) (Figure 4.1A). Despite slight differences in resolution, enriched regions (blue and red bars in Figure 4.1A) colocalized with regions of known nucleosome occupancy (green bars in Figure 4.1A). To further ensure that these profiles truly reflected specific H3K79 methylation marks recognized by the two antibodies, control experiments were performed in a strain lacking the H3K79 methyltransferase Dot1, in which H3K79 methylation is completely eliminated. The genome-wide control profiles showed randomly scattered background peaks and a trend toward occupancy of repetitive regions, demonstrating that the antibodies were specific for their respective H3K79 methylation state.

Figure 4.1: H3K79me2 and H3K79me3 patterns across the genome. A: High-resolution profile of H3K79me2 and H3K79me3. As representative examples for the entire genome, regions of chromosomes four and eight were plotted along the x-axis against the relative occupancy of H3K79me2 and H3K79me3 on the y-axis. ORFs are indicated as rectangles above the axis for Watson genes and below the axis for Crick genes. Green boxes represent HMM-predicted well-positioned and fuzzy nucleosome positions derived from Lee et al., 2007. B: Venn diagram comparing the number of H3K79me2- and H3K79me3-enriched ORFs. C: Average lengths of H3K79me3- and H3K79me2-enriched genes. Boxplots show the lengths of 6576 ORFs in yeast as well as the lengths of H3K79me3- and H3K79me2-enriched ORFs.
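The coverage and overlap percentages reported in this section can be derived from lists of enriched genomic intervals. The following is a minimal sketch under stated assumptions: the intervals are half-open (start, end) pairs on a single coordinate axis, the toy values are invented for illustration, and the helper names `merge`, `covered_length` and `overlap_length` are hypothetical rather than part of the pipeline described in Chapter 3.

```python
def merge(intervals):
    """Merge overlapping or touching half-open intervals given as (start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def covered_length(intervals):
    """Total number of base pairs covered by a set of intervals."""
    return sum(end - start for start, end in merge(intervals))


def overlap_length(a, b):
    """Number of base pairs covered by both interval sets."""
    a, b = merge(a), merge(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


# Hypothetical toy data on a 1000 bp "genome" (not thesis data).
genome_length = 1000
me2_regions = [(0, 100), (50, 220), (600, 620)]
me3_regions = [(200, 400), (610, 760)]

me2_fraction = covered_length(me2_regions) / genome_length
me3_fraction = covered_length(me3_regions) / genome_length
shared_fraction = overlap_length(me2_regions, me3_regions) / genome_length
print(f"me2 {me2_fraction:.0%}, me3 {me3_fraction:.0%}, overlap {shared_fraction:.0%}")
```

Merging before summing avoids counting base pairs twice when neighbouring enriched regions overlap.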
4.3.2  Profile of H3K79me2 and H3K79me3 at promoters and ORFs

Having established the detailed maps of H3K79 di- and trimethylation, we sought to understand the general features of the occupied regions. Genome-wide analysis revealed that H3K79me2 and H3K79me3 covered 1866 and 2350 of 6576 total open reading frames, respectively. As expected, the two sets of ORFs enriched with either H3K79 di- or trimethylation overlapped in very few genes (Figure 4.1B). Interestingly, H3K79me3-enriched ORFs were longer (median 1815 bp) whereas H3K79me2-enriched ORFs were shorter (median 848 bp) relative to the average ORF (median 1067 bp) (Figure 4.1C). To visualize the average profile of ORFs marked with H3K79me3 and H3K79me2, all enriched ORFs were aligned according to the location of translation initiation and termination sites, similar to an earlier published analysis (Pokholok et al., 2005) (Figure 4.2). Consistent with previous studies (Pokholok et al., 2005), H3K79me3 was uniformly enriched within the protein-coding region of genes (Figure 4.2A). In contrast, H3K79 dimethylation was found not only in the protein-coding regions of H3K79me2-enriched genes, but also in their promoter regions (Figure 4.2B). Overall, H3K79me3 was found in only a few promoters, whereas H3K79me2 covered promoter regions more frequently. Promoters comprise a fraction of the intergenic regions, which are defined as regions that do not encode protein. Consistent with the higher occupancy of H3K79me2 in promoter regions, H3K79me2 covered ∼20 % of the intergenic regions, whereas H3K79me3 covered less than 4 %.

Figure 4.2: Distribution of H3K79me2 and H3K79me3 across genes. Average profiles of A: H3K79me3- and B: H3K79me2-enriched ORFs. A gene was considered to be enriched if at least 50 % of its ORF was covered by the modification. ORFs were aligned according to their translational start and stop sites, similar to an approach by the Young lab (Pokholok et al., 2005).
Each ORF was divided into 40 bins of equal length, probes were assigned accordingly, and average enrichment values were calculated for each bin. Probes in promoter regions (500 bp upstream of transcriptional start site) and 3’ UTR (500 bp downstream of stop site) were assigned to 20 bins each. The average enrichment value for each bin was plotted.

4.3.3  Association of H3K79me2 and H3K79me3 with transcription

In order to examine the correlation between H3K79 di- and trimethylation and gene expression, enriched genes were assigned to five different classes according to their transcription rate (Holstege et al., 1998), and a composite occupancy profile for each class was determined (Figure 4.3A–B). H3K79 dimethylation tended to be present at higher levels in transcriptionally less active genes, mainly toward their 5’-end and promoter region (Figure 4.3B). Consistent with previous findings (Pokholok et al., 2005), no clear correlation between the transcriptional activity of genes and their enrichment for H3K79 trimethylation was found (Figure 4.3A). Genes with lower expression levels were enriched for H3K79 di- and trimethylation with similar ratios (Figure 4.3C). An exception, however, was the small group of the most highly expressed genes. These genes had very low levels of H3K79 di- and trimethylation in their promoters as well as in their ORFs. Gene expression has been reported to correlate inversely with nucleosome occupancy in promoters (Lee et al., 2007), so it was not surprising to find low H3K79 di- and trimethylation in these promoters. However, coding regions of highly expressed genes have been shown to be more occupied by nucleosomes than those of lower expressed genes (Lee et al., 2007). Therefore, ORFs of highly expressed genes are devoid of H3K79 di- and trimethylation despite their dense occupancy with nucleosomes.

Figure 4.3: Association of H3K79me2 and H3K79me3 with transcriptional frequency. Average profiles of A: H3K79me3- and B: H3K79me2-enriched ORFs according to transcriptional activity. All genes for which information was available (Holstege et al., 1998) were divided into five classes according to their transcriptional rate. Average gene profiles were computed and plotted as described in Figure 4.2. C: Percent enrichment of H3K79 di- and trimethylated ORFs in different transcriptional classes. As before, genes were divided into five classes according to their transcriptional activity (Holstege et al., 1998), and the percent overlap with H3K79 di- and trimethylated ORFs was plotted.

4.3.4  Association of H3K79me2 and H3K79me3 with the cell cycle

Because of the cell-cycle dependence of H3K79me2, we tested if H3K79 dimethyl-marked genes were regulated during the cell cycle. In budding yeast, ∼800 genes change their transcriptional profile and peak in certain stages of the cell cycle (Spellman et al., 1998). We compared ORFs enriched for H3K79 di- and trimethylation with the different classes of cell cycle-regulated genes (Figure 4.4), and asked if the overlap was significant using Fisher’s exact test (Tavazoie et al., 1999). Indeed, M/G1-regulated genes were significantly enriched for H3K79me2, but were not enriched for H3K79me3 (Figure 4.4). In contrast, genes regulated in the G2 phase showed a significant occupancy with H3K79me3, but were not marked by H3K79me2 (Figure 4.4). These results suggest that H3K79 methylation is not random, but rather that it is regulated in conjunction with progression through the cell cycle and might be involved in periodic transcription of genes during distinct cell-cycle phases.

4.3.5  Dynamics of H3K79me2 and H3K79me3 during the cell cycle

Protein blot analysis suggested that bulk levels of H3K79me2, but not H3K79me3, changed during the progression of the cell cycle, with reduced levels in G1 and elevated levels in G2/M.
To test if levels of H3K79me2 fluctuated at the level of single genes during the cell cycle, ChIP-on-chip assays were performed using chromatin of nocodazole-arrested yeast cells and compared to asynchronous cells. In nocodazole, cells are arrested in the G2/M phase of the cell cycle and should have the highest level of H3K79me2, while asynchronous cells have a mixed distribution of G1, S, and G2/M phase cells. While H3K79me2 profiles in asynchronous and G2/M-arrested cells were overall similar, with a Spearman rank correlation coefficient of ρ = 0.69, a detailed analysis revealed important differences between them.

Figure 4.4: Association of H3K79me2 and H3K79me3 with the cell cycle. Overlap of H3K79me2- and H3K79me3-enriched ORFs with transcriptionally regulated genes for each cell-cycle stage (Spellman et al., 1998). Numbers below the x-axis represent total number of genes with periodic transcription. Numbers above the x-axis represent the overlap of these genes with those enriched for H3K79me2 and H3K79me3. The percentage of the overlap was plotted on the y-axis. A: H3K79me2-enriched ORFs in asynchronous cells. The overlap expected by chance is 28 % (1866 H3K79me2-enriched ORFs out of 6576 total). B: H3K79me3-enriched ORFs in asynchronous cells. The overlap expected by chance is 36 % (2350 H3K79me3-enriched ORFs out of 6576 total). The p-values were calculated using Fisher’s exact test.

In asynchronous cells, ORFs enriched for H3K79me2 significantly overlapped with genes whose expression is regulated in M/G1 (Figure 4.4). Interestingly, ORFs enriched for H3K79me2 in G2/M-arrested cells significantly overlapped not only with M/G1- but also with G1-regulated genes (Figure 4.5A). Moreover, this effect expanded into the promoter region of genes, since promoter regions of M/G1- and G1-regulated genes were also significantly enriched for H3K79me2 in G2/M-arrested cells (Figure 4.5B).
This result suggests that M/G1- and G1-regulated genes are marked in their ORF and promoter by H3K79me2 during cell-cycle stages (G2/M) when these genes are inactive.

Figure 4.5: Association of H3K79me2 with the cell cycle in G2/M-phase. Overlap of H3K79me2-enriched ORFs in G2/M-arrested cells with transcriptionally regulated genes for each cell-cycle stage (Spellman et al., 1998). A: H3K79me2-enriched ORFs in G2/M-arrested cells. Expected by chance are 38 % (2444 H3K79me2-enriched ORFs out of 6576 total). B: H3K79me2-enriched promoters in G2/M. Expected by chance are 23 % (1483 H3K79me2-enriched promoters out of 6576 total).

In contrast to H3K79me2, global levels of H3K79me3 were not altered during the cell cycle. In order to confirm this observation on a single-gene level, the H3K79me3 profile in G2/M-arrested cells was determined. As expected, the profiles of the asynchronous and G2/M-arrested cells were similar, occupied the same regions, and correlated with a Spearman rank correlation coefficient of ρ = 0.91. Consistent with the observations in asynchronous cells, H3K79 di- and trimethylation had distinct and mutually exclusive patterns on chromatin in the G2/M phase and overlapped in only 1 % of the genome.

4.3.6  H3K79me2 at intergenic regions and ARS in G2/M-phase

To further characterize similarities and differences between H3K79me2 in asynchronous and G2/M-arrested cells, we concentrated on typical genomic features. It is known that nucleosome occupancy does not exhibit large, global variation between cell-cycle phases (Hogan et al., 2006), and observed changes in H3K79 dimethylation pattern during the cell cycle were most likely not due to global changes in nucleosome occupancy.

Visualizing the average profile of all genes occupied by H3K79me2 in G2/M-arrested cells showed that, similar to asynchronous cells, H3K79 dimethylation was not only found in the protein-coding regions of genes, but also covered their promoters (Figure 4.6A).
The trend toward occupancy of less-transcribed genes was weaker in G2/M-arrested cells compared to the asynchronous data set, but the relative height of the H3K79 dimethylation profile followed precisely the decreasing order of transcriptional activity (Figure 4.6B).

Figure 4.6: Association of H3K79me2 with ORFs and transcription in G2/M-phase. A: Average profile of H3K79me2-enriched ORFs in G2/M-arrested cells. The profile for the average enriched ORF was determined as explained in Figure 4.2. B: Average profile of H3K79me2-enriched ORFs in G2/M-arrested cells according to their transcriptional activity. The profile was determined as described in Figure 4.3.

Not only promoters were enriched more frequently with H3K79me2 in G2/M-arrested cells, but also intergenic regions in general. Indeed, ∼50 % of the intergenic regions were covered in G2/M-arrested cells compared to ∼20 % in the asynchronous data set. To further characterize additional chromosomal features for their enrichment with H3K79 di- and trimethylation, we focused on origins of replication (ARSs for autonomously replicating sequences) and centromeres. Intriguingly, ARSs were significantly enriched (131 out of 274, p < 3 × 10⁻¹⁴) for H3K79me2 in the G2/M phase. In contrast, low ARS occupancy of H3K79 di- and trimethylation (17 and 3 ARSs out of 274) was detected in asynchronous cells. The significant overlap of ARSs with H3K79me2 in G2/M-arrested cells indicates a potential role of H3K79me2 in maintaining ARSs in their inactive state. In yeast, the single centromeric nucleosome contains a specialized H3 variant, Cse4, in place of canonical histone H3. Consistent with replacement of H3 at the centromere, neither H3K79me2 nor H3K79me3 antibodies enriched for CEN sequences in either the asynchronous or G2/M data set.
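The overlap significance values quoted in Sections 4.3.4 to 4.3.6 rest on Fisher’s exact test. As a hedged illustration, the one-sided form of this test for a single gene class equals the upper tail of a hypergeometric distribution, which can be written with the standard library alone. The function name `overlap_p_value` and the example class (100 genes, 50 of them enriched) are hypothetical; only the totals 6576 and 1866 are taken from the text.

```python
from math import comb


def overlap_p_value(overlap, class_size, enriched, total):
    """One-sided Fisher's exact test (hypergeometric upper tail).

    Probability of seeing at least `overlap` enriched genes in a class of
    `class_size` genes if the class were drawn at random from `total` genes,
    `enriched` of which carry the mark.
    """
    denominator = comb(total, class_size)
    upper = min(class_size, enriched)
    numerator = sum(
        comb(enriched, k) * comb(total - enriched, class_size - k)
        for k in range(overlap, upper + 1)
    )
    return numerator / denominator


TOTAL_ORFS = 6576     # total ORFs, as stated in the text
ME2_ENRICHED = 1866   # H3K79me2-enriched ORFs, as stated in the text
expected_fraction = ME2_ENRICHED / TOTAL_ORFS

# Hypothetical class: 100 cell cycle-regulated genes, 50 of them enriched.
p_value = overlap_p_value(50, 100, ME2_ENRICHED, TOTAL_ORFS)
print(f"expected by chance: {expected_fraction:.0%}, p = {p_value:.2e}")
```

Exact integer arithmetic via `math.comb` keeps the tail sum free of rounding error until the final division; a statistics library would normally be used for two-sided tests or large batches of classes.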
4.3.7  Association of H3K79me2 with the transcription factor Swi4

Genome-wide analysis of SBF-binding sites by ChIP-on-chip using the DNA-binding subunit Swi4 revealed that SBF binds to promoters of genes expressed in G1/S (Harbison et al., 2004; Iyer et al., 2001; Simon et al., 2001). A comparison of H3K79me2-enriched promoters to the 137 promoters bound by Swi4 (Iyer et al., 2001) gave a significant overlap in G2/M-arrested cells (Figure 4.7). This overlap was not significant for H3K79me2-enriched promoters and ORFs in asynchronous cells, perhaps indicating that H3K79me2 is a consequence of SBF-binding/transcription earlier in the cell cycle. In contrast, ORFs and promoters enriched with H3K79me3 showed significantly lower overlap than expected by chance (based on Fisher’s exact test) and no overlap with Swi4-bound genes, respectively (Figure 4.7). This analysis showed that a significant number of SBF-regulated genes were H3K79 dimethylated in their promoter during G2/M.

4.3.8  Colocalization of H2BK123ub with H3K79me3

Given the results demonstrating that the patterns of H3K79 di- and trimethylation on chromatin are mutually exclusive (Figure 4.1), we next sought to understand the mechanism by which a single enzyme, Dot1, can distinguish between sites of di- versus trimethylation. The appearance of H3K79me2 after replication in S phase could be explained by either the de novo establishment of the methyl mark or the demethylation of an existing H3K79me3 mark resulting in H3K79 dimethylation. The latter would require regions that were dimethylated in G2/M to be trimethylated during the G1/S transition. We ruled out this possibility, because ChIP-on-chip of H3K79me3 in G1-arrested cells showed that regions that are dimethylated in G2/M were not trimethylated in G1, and the profiles of H3K79 trimethylation were similar in G1- and G2/M-arrested cells with a Spearman rank correlation of ρ = 0.82.
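The Spearman rank correlation used throughout this chapter to compare profiles is the Pearson correlation of the ranked values. The following is a minimal standard-library sketch, assuming two equally long lists of probe scores; the function names and the toy profiles are illustrative and not taken from the analysis pipeline.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical probe scores from two conditions (not thesis data).
profile_g1 = [0.1, 0.4, 2.3, 1.8, 0.2, 3.1]
profile_g2m = [0.3, 0.2, 2.0, 2.4, 0.1, 2.9]
rho = spearman(profile_g1, profile_g2m)
print(f"rho = {rho:.2f}")
```

Because only ranks enter the calculation, the coefficient is insensitive to differences in signal scale between arrays, which is one reason it suits comparisons between independent ChIP-on-chip experiments.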
Figure 4.7: Association of H3K79me2 with the transcription factor Swi4. Venn diagram showing overlap of the 137 Swi4-bound genes (Iyer et al., 2001) with H3K79me2- and H3K79me3-enriched ORFs and promoters, respectively. H3K79me2-enriched promoters in G2/M-arrested cells showed significant overlap with Swi4-bound genes based on Fisher’s exact test. H3K79me3-enriched ORFs and promoters show significant underrepresentation of Swi4-bound genes. Promoters were called enriched when 450 bp upstream of the ORF were covered by the methyl mark.

Based on these observations, it appears that demethylation of H3K79me3 may not be the main mechanism by which yeast cells regulate the pattern of H3K79me2 and H3K79me3; however, it remains possible that the trimethyl state could be achieved transiently and quickly removed in G1. Given the data, we hypothesize that H3K79 di- and trimethylation are established independently, and additional factors control the distribution of di- versus trimethylation. One major candidate is monoubiquitination of lysine 123 on histone H2B mediated by Rad6/Bre1, which is known to be required for proper H3K79 trimethylation. Since ChIP-on-chip studies indicated that the patterns of H3K79 di- and trimethylation are mutually exclusive and have ∼2 % overlap throughout the yeast genome, we hypothesize that the pattern of H2B monoubiquitination could control distribution of H3K79 di- versus trimethylation on chromatin. Based on this hypothesis, we predict that factors required for H3K79me3 should function through the regulation of H2B monoubiquitination.

To test the hypothesis that H2BK123 monoubiquitination determines the pattern of H3K79 trimethylation and colocalizes with H3K79 tri- but not dimethylation, a polyclonal antibody was developed specifically recognizing monoubiquitinated H2BK123. Employing this antibody, we determined a comprehensive genome-wide map of H2BK123 ubiquitination via ChIP-on-chip (Figure 4.8A).
To ensure the specific enrichment of H2BK123ub over unmodified H2B, nonspecific binding was blocked with H2B peptide during the immunoprecipitation step. In addition, H2BK123ub enrichment data were normalized using an H2BK123A mutant profile obtained from an identical ChIP-on-chip experiment. Intriguingly, H2BK123ub colocalized with H3K79 trimethylation at many genomic loci, but showed distinct and mutually exclusive patterns with H3K79 dimethylation (Figure 4.8A). Overall, high-confidence regions colocalized with H3K79 trimethylation, and only a very minor fraction overlapped with H3K79 dimethylation (Figure 4.8B). As expected from this analysis, very similar distributions existed at the ORF level, where 812 out of 2350 H3K79 trimethylated ORFs were marked by H2BK123 ubiquitination, but only 45 out of 1866 H3K79-dimethylated ORFs were enriched for H2BK123 ubiquitination (Figure 4.8C). Since H3K79 dimethylation and H2B monoubiquitination were mutually exclusive on a genome-wide level, we predicted the link of H3K79me2 to genes expressed specifically during the cell cycle to be independent of H2B monoubiquitination pattern. Indeed, cell cycle-regulated genes, especially those regulated in M/G1, were not marked by H2B ubiquitination (Figure 4.8D). Taken together, these findings suggest that the regulation of H2BK123 monoubiquitination is linked to H3K79 tri- but not dimethylation and could play a role in distinguishing the genome-wide establishment of H3K79 di- versus trimethylation.

Figure 4.8: Colocalization of H2BK123ub with H3K79me3. A: Overlay of H2BK123ub, H3K79me2, and H3K79me3 profiles. Sample genomic positions for chromosomes 8 and 10 were plotted along the x-axis against the relative occupancy of the indicated histone modifications on the y-axis. ORFs are indicated as rectangles above the axis for Watson and below the axis for Crick strand.
B: Diagram summarizing the percentage of genome-wide occupancy of H3K79me2 and H3K79me3 and their overlap with H2BK123ub. C: Diagram illustrating the overlap of H3K79me2- and H3K79me3-enriched genes with H2BK123ub-enriched genes. D: Overlap of H2BK123-ubiquitinated ORFs with transcriptionally regulated genes for each cell-cycle stage (Spellman et al., 1998). Numbers below and above the x-axis represent the total number of genes in each cell-cycle class and their overlap with H2B-ubiquitinated genes, respectively. The percentage of overlap is plotted on the y-axis with a dashed line indicating the percentage expected by chance. The p-values were calculated using Fisher’s exact test.

4.4  Interdependencies of histone modifications

4.4.1  Genome-wide distribution of H2BK123ub and H3 methylation marks with respect to gene length and transcriptional frequency

To illuminate the chromatin network around H2BK123ub in S. cerevisiae, H2BK123ub, its dependent marks H3K4 and H3K79 methylation as well as the transcription elongation mark H3K36me3 were mapped using high-resolution tiling arrays. These profiles allow a comprehensive analysis to elucidate the co-occurrences and dependencies of H2BK123ub and H3 methylation in the context of transcriptional regulation across the genome. H2BK123ub and the H3 methylation marks measured here were strongly enriched in genomic regions transcribed by RNAPII, but mostly absent from other genomic features such as telomeres, centromeres, the rRNA locus, ARSs and tRNAs (Table 4.1).

Table 4.1: H2BK123ub and H3 methylation at different genomic features. Overlap of investigated histone modifications with different genomic features like rRNAs, tRNAs, autonomously replicating sequences (ARS), centromeres (CEN) and telomeres (TEL). rRNAs, tRNAs, ARS and CENs were called associated when 100 % of underlying probes had an enrichment score above a threshold of 1.5.
Telomeres were called enriched when at least 50 % of the underlying probes had an enrichment score above a threshold of 1.5.

Feature             rRNA   tRNA   ARS   CEN   TEL
Total                 25    275   274    16    32
H2BK123ub WT           0      0     4     0     0
H3K4me3                1      4     9     1     0
H3K36me3               0      0     3     0     0
H3K79me3               0      0     3     0     0
H3K79me2               0      0    16     0     1
H2BK123ub ubp8∆        0      1     5     0     0
H2BK123ub ubp10∆       0      2     4     0     0

Therefore, the analysis was focused on RNAPII transcripts, and a compact yet comprehensive visualization approach (CHROMATRA) was developed (see Section 3.5.2) to assess the distribution of histone modifications across all transcripts at once, while accounting for gene length and transcriptional frequency. In this approach, enrichment scores of each modification were calculated using a 150 bp frame and colour-coded for all known transcripts, which were aligned according to their transcription start sites (TSS) and sorted by their length (Figure 4.9). Extending previous studies mapping H3 methylation in S. cerevisiae (Pokholok et al., 2005), our platform and visualization achieved greater resolution and avoided ambiguities caused by averaging effects in gene length. H2BK123ub covered the entire coding sequence of genes and extended into some promoters (Figure 4.9), which agrees with studies in human cells (Minsky et al., 2008). H3K4me3 was localized in the 5’-end of almost all transcripts and sharply peaked just downstream of the TSS. In contrast, H3K36me3 covered the entire body of transcripts almost uniformly. The lateral distribution of H3K79me3 followed a similar shape, but was less pronounced towards the 3’-end and intensified with increasing transcript length. As shown above, the profile of H3K79me2 was mutually exclusive with that of H3K79me3, with higher occupancy towards shorter transcripts (Figure 4.9).
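The binning step of this visualization can be sketched as follows. This is a minimal illustration under stated assumptions and not the CHROMATRA implementation of Section 3.5.2: all transcripts are assumed to lie on one chromosome, each transcript is given as a hypothetical tuple (name, TSS position, length, strand), and the probe scores are invented.

```python
BIN_SIZE = 150  # bp, the frame size given in the text


def binned_profile(tss, length, strand, probes, bin_size=BIN_SIZE):
    """Average probe scores in consecutive bins downstream of the TSS.

    probes is a list of (genomic position, score); strand is +1 or -1.
    Bins without probes are reported as None.
    """
    n_bins = -(-length // bin_size)  # ceiling division
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for position, score in probes:
        offset = (position - tss) * strand  # distance downstream of the TSS
        if 0 <= offset < length:
            index = offset // bin_size
            sums[index] += score
            counts[index] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


def profile_matrix(transcripts, probes):
    """One row of binned scores per transcript, shortest transcript first."""
    ordered = sorted(transcripts, key=lambda t: t[2])
    return [(name, binned_profile(tss, length, strand, probes))
            for name, tss, length, strand in ordered]


# Hypothetical transcripts and probe scores (not thesis data).
transcripts = [("geneA", 1000, 450, +1), ("geneB", 5000, 300, -1)]
probes = [(1010, 1.0), (1100, 3.0), (1200, 2.0), (1440, 4.0),
          (4990, 0.5), (4800, 1.5), (4700, 2.5)]

rows = profile_matrix(transcripts, probes)
for name, bins in rows:
    print(name, bins)
```

Each row can then be colour-coded and stacked, so that transcripts of different lengths remain aligned at the TSS instead of being averaged into a single profile.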
In order to examine the correlation between these histone marks and gene expression, all transcripts were grouped into five classes according to their transcriptional frequency (Holstege et al., 1998), and using CHROMATRA (see Section 3.5.2) plotted similarly to Figure 4.9. While H2BK123ub was present in genes belonging to all transcriptional classes (Figure 4.10), it had a tendency for stronger occupancy in classes of higher transcriptional frequency (Figure 4.11). Similarly, H3K4me3 and H3K36me3 were enriched in all classes of transcriptional frequencies, whereas H3K79me2 and H3K79me3 were only enriched in classes of lower transcriptional frequencies as shown above (Figure 4.10 and 4.11).

Figure 4.9: Distribution of H2BK123ub and H3 methylation marks in all transcripts. Enrichment of H2BK123ub, H3K4me3, H3K36me3, H3K79me3 and H3K79me2 across all transcripts sorted by their length and aligned by their TSS. The normalized ChIP-on-chip MAT-scores were binned into segments of 150 bp. The average enrichment value for each bin was colour-coded and plotted. The upper adjacent (UA) of the MAT score distribution was used for colour bar limits.

Figure 4.10: Distribution of H2BK123ub and H3 methylation marks with respect to transcriptional frequency. As in Figure 4.9, but with transcripts sorted by their length as well as transcriptional frequency. Transcripts were grouped into five classes according to their number of transcripts per hour (Holstege et al., 1998). The upper adjacent (UA) of the MAT score distribution was used for colour bar limits.

Figure 4.11: Associations of H2BK123ub and H3 methylation marks with transcriptional frequencies. Boxplots indicating the association of histone marks with transcriptional frequencies. As above, transcripts were grouped into five classes according to their transcriptional frequency.
For all modifications and each transcript the average enrichment score was calculated as the average MAT score of all probes between transcript start and end. For each transcription class the average scores were then shown as boxplots. For H3K4me3, which peaks downstream of the TSS around the +2 and +3 nucleosome, average scores were calculated for 300 bp in that region.

4.4.2  Correlation and colocalization of H2BK123ub and its dependent marks

The spatial relationship of H2BK123ub and H3 methylation marks was complex as seen in the overlay at one representative genomic region (Figure 4.12A). To assess their associations quantitatively, their pair-wise Spearman correlation was calculated on a 5 bp (probe)-level. H2BK123ub correlated positively with its downstream marks H3K4me3 and H3K79me3, although the correlation was much stronger with H3K79me3 (ρ = 0.67) than with H3K4me3 (ρ = 0.26) (Figure 4.12B). In addition, H2BK123ub correlated positively with the elongation mark H3K36me3 (ρ = 0.63), which itself showed the strongest correlation with H3K79me3 (ρ = 0.76). In contrast, H2BK123ub and H3K79me2 correlated negatively (ρ = −0.31), consistent with the finding presented above that these two marks do not colocalize.

Figure 4.12: Correlation of H2BK123ub and its dependent marks. A: Overlay of H2BK123ub (black), H3K4me3 (red), H3K36me3 (green), H3K79me3 (blue) and H3K79me2 (orange) ChIP-on-chip profiles. A sample genomic region of chromosome 8 was plotted along the x-axis against the relative occupancy of the histone modifications on the y-axis. ORFs are indicated as rectangles, above the axis for Watson genes and below the axis for Crick genes. B: Spearman correlation matrix for genome-wide profiles calculated on a 5 bp (probe) level. Red indicates high correlation and blue represents anticorrelation.

To more directly determine the colocalization of H2BK123ub and H3 methylation, the occurring patterns across all genes were further analyzed.
Since some modifications such as H3K4me3 are enriched in specific parts of the gene (Figure 4.9), genes were partitioned in segments for further analysis similar to an approach used before (Liu et al., 2005). Each gene was divided into five segments: promoter (300 bp upstream of the coding start), TSS-proximal (300 bp downstream of TSS), 5’-CDS (300 bp downstream of the coding start), mid-CDS (300 bp around the CDS center) and 3’-CDS (300 bp upstream of the coding end).

Figure 4.13: Patterns of H2BK123ub and its dependent marks. Specific segments of a gene showed different combinations of marks. Genes were partitioned into the following segments: promoter region (300 bp upstream of the coding region), TSS-proximal (300 bp downstream of TSS), 5’-CDS (300 bp downstream of CDS start), mid-CDS (300 bp around centre of ORF), and 3’-CDS (300 bp upstream of CDS end). Mid-CDS and 3’-CDS were only considered when the ORF had a length of at least 900 bp or 600 bp, respectively. The average enrichment scores for the histone modifications were hierarchically clustered (independently for each gene segment), colour-coded and plotted. Columns represent modifications, rows correspond to genes.

In order to determine the combinations of H2BK123ub and the H3 methylation marks within each gene segment, the average enrichment score for each modification was calculated, clustered and plotted as a heatmap (Figure 4.13). To get a more quantitative measure of these cooccurring patterns, we determined the number of all 32 possible combinations of marks for the five segments by assigning a binary value for each modification depending on whether it was enriched or not within each segment (Table 4.2). Both analyses revealed that promoters (column 3) were mostly enriched for H2BK123ub, H3K4me3 or H3K79me2 (lines 2, 3, 6), and characterized by two predominant combinations of marks: H2BK123ub and H3K4me3 (line 7) as well as H3K4me3 and H3K79me2 (line 13 and Figure 4.13, Table 4.2).
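The pattern counting behind Table 4.2 can be illustrated with a short sketch. It assumes that each gene segment has one average enrichment score per mark and that a mark counts as present above a fixed cut-off; the cut-off of 1.5, the function names and the toy scores are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations

MARKS = ("K123ub", "K4me3", "K36me3", "K79me3", "K79me2")
THRESHOLD = 1.5  # assumed enrichment cut-off, for illustration only


def signature(scores, threshold=THRESHOLD):
    """Tuple of marks whose average score in a segment exceeds the threshold."""
    return tuple(mark for mark in MARKS if scores.get(mark, 0.0) > threshold)


def count_signatures(segment_scores):
    """Number of genes per pattern, for all 2**5 = 32 possible patterns."""
    observed = Counter(signature(s) for s in segment_scores.values())
    patterns = [p for size in range(len(MARKS) + 1)
                for p in combinations(MARKS, size)]
    return {pattern: observed.get(pattern, 0) for pattern in patterns}


# Hypothetical promoter scores for four genes (not thesis data).
promoter_scores = {
    "gene1": {"K123ub": 2.0, "K4me3": 3.1},
    "gene2": {"K4me3": 2.2, "K79me2": 1.9},
    "gene3": {"K4me3": 0.4},
    "gene4": {"K123ub": 1.8, "K4me3": 2.7, "K36me3": 0.2},
}

table = count_signatures(promoter_scores)
for pattern, n_genes in table.items():
    if n_genes:
        print(", ".join(pattern) or "no modification", n_genes)
```

With the marks listed in this order, `itertools.combinations` enumerates the patterns in the same sequence as lines 1 to 32 of Table 4.2, so the keys of the result can be read against the table directly.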
H3K4me3 was most prominent in both the TSS-proximal segment (column 4) and the 5′-CDS (column 5), and largely cooccurred with either H2BK123ub or H3K79me2 (lines 7, 13). Mid- and 3′-CDS (columns 6, 7) were dominated by a triple combination of H2BK123ub, H3K79me3 and H3K36me3 (line 20). In addition, and consistent with findings presented above, the clustering clearly separated H3K79me2 from H3K79me3 particularly in the mid-CDS and 3′-CDS, and showed that H3K79 tri- but not di-methylation was associated with H2BK123ub (Figure 4.13). Taken together, in all segments H2BK123ub colocalized with its dependent marks H3K4me3 and H3K79me3, but not necessarily vice versa, as discussed below.

Table 4.2: Frequencies of histone modification patterns. For all possible histone modification patterns of H2BK123ub, H3K4me3, H3K36me3, H3K79me3, and H3K79me2 in promoter, TSS-proximal, 5′-CDS, mid-CDS and 3′-CDS, the actual number of genes with each pattern is specified.

No.  Signature                               Prom  TSS-prox  5′-CDS  mid-CDS  3′-CDS
     Total                                   6572      4868    6132     3832    4786
 1   no modification                         3113       656     998      360    1439
 2   K123ub                                   413        69      38        5      86
 3   K4me3                                   1060      1465    1218       32      74
 4   K36me3                                    31         4      21       84     441
 5   K79me3                                    46        15      48      351     255
 6   K79me2                                   600       304     534      417     736
 7   K123ub, K4me3                            402       505     304        3       3
 8   K123ub, K36me3                            10         1       5        8     159
 9   K123ub, K79me3                            21        12      17       42      49
10   K123ub, K79me2                            23         5       3        1       8
11   K4me3, K36me3                             33        69     171       57     119
12   K4me3, K79me3                             70       267     480       12       7
13   K4me3, K79me2                            423       793     953       47     122
14   K36me3, K79me3                            36         1      38      919     507
15   K36me3, K79me2                             7         0      16       61     122
16   K79me3, K79me2                             6         4       8       56      50
17   K123ub, K4me3, K36me3                     35        96     172       51     109
18   K123ub, K4me3, K79me3                     69       410     468        3       1
19   K123ub, K4me3, K79me2                     44        81      65        1       4
20   K123ub, K36me3, K79me3                    24         1      16      767     277
21   K123ub, K36me3, K79me2                     2         0       0        0       4
22   K123ub, K79me3, K79me2                     0         1       2        2       1
23   K4me3, K36me3, K79me3                     31        13     137      204      60
24   K4me3, K36me3, K79me2                      7         7      36       58      61
25   K4me3, K79me3, K79me2                      7        30      73        5       4
26   K36me3, K79me3, K79me2                     6         0       2       58      26
27   K123ub, K4me3, K36me3, K79me3             41        41     266      189      38
28   K123ub, K4me3, K36me3, K79me2              0         4      13        4       5
29   K123ub, K4me3, K79me3, K79me2              2        11      16        1       0
30   K123ub, K36me3, K79me3, K79me2             0         0       0        1       6
31   K4me3, K36me3, K79me3, K79me2              8         1       9       30      13
32   K123ub, K4me3, K36me3, K79me3, K79me2      2         1       5        3       0

4.4.3  Site-specific removal of H2BK123ub by Ubp8 and Ubp10

To directly test the hypothesis that Ubp8 and Ubp10 remove the transient ubiquitin mark from distinct genomic regions (see page 77), H2BK123ub was mapped across the genome in strains lacking either Ubp8 or Ubp10. As expected from bulk protein blotting studies (Emre et al., 2005; Gardner et al., 2005), the number of probes enriched for H2BK123ub increased in the strains lacking Ubp8 or Ubp10 (Figure 4.14). Supporting the hypothesis, newly enriched probes for H2BK123ub were different between the two deletion strains (Figure 4.14). To assess the localization of these newly ubiquitinated regions, the H2BK123ub distribution was plotted for strains lacking Ubp8 or Ubp10 across all transcripts sorted by their length as well as transcriptional frequency using CHROMATRA (Figure 4.15A). In the ubp8∆ strain, H2BK123ub peaked downstream of the TSS, but was reduced throughout the body of the transcript compared to wildtype cells (Figure 4.15A). In contrast, H2BK123ub was localized across the coding sequence of mainly longer genes in ubp10∆ strain comparable to wildtype cells (Figure 4.15A). To clearly visualize where newly enriched sites are located along the transcripts, wildtype profiles were subtracted from deletion profiles and positive-definite results colour-coded (Figure 4.15B).

Figure 4.14: Site-specific removal of H2BK123 ubiquitin by Ubp8 and Ubp10 (probe level). Number of probes enriched for H2BK123ub within transcripts in wild-type as well as ubp8 and ubp10 deletion strains. Venn diagrams comparing the overlap of these probes between the different strains.
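The two comparisons used in this section, the set of probes newly enriched in a deletion strain and the deletion-minus-wildtype difference profile in which only positive values are kept, can be illustrated with a short Python sketch. The function names and the plain-list representation of enrichment profiles are assumptions made for the example and do not reflect the actual implementation.

```python
def newly_enriched(deletion_probes, wildtype_probes):
    """Probes enriched in the deletion strain but not in wildtype."""
    return set(deletion_probes) - set(wildtype_probes)


def positive_difference(deletion, wildtype):
    """Element-wise (deletion - wildtype); non-positive values become 0.

    Both profiles are assumed to be sampled at the same positions.
    """
    if len(deletion) != len(wildtype):
        raise ValueError("profiles must have the same length")
    return [max(d - w, 0.0) for d, w in zip(deletion, wildtype)]
```

With probe identifiers per strain, `newly_enriched(ubp8_probes, wt_probes)` and `newly_enriched(ubp10_probes, wt_probes)` (hypothetical variable names) can then be compared directly, which is the comparison a Venn diagram of the three strains summarizes.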
To test the hypothesis that Ubp8 removes H2BK123ub at sites marked by H3K4me3 and Ubp10 does so for regions enriched for H3K79me3, we asked how many H2BK123ub-enriched probes are also enriched for H3K4me3 or H3K79me3 in wildtype, ubp8∆ and ubp10∆ strains (Figure 4.16A). Consistent with the hypothesis, newly enriched probes for H2BK123ub in the ubp8∆ strain were mainly marked by H3K4me3, and those in the ubp10∆ strain by H3K79me3 (Figure 4.16A). By averaging the enrichment profiles of H2BK123ub and its dependent marks in a length-dependent manner, similar to a previous approach (Mayer et al., 2010), we noticed that the wildtype H2BK123ub profile was very different from that of H3K4me3 but comparable in its lateral distribution and overall shape to that of H3K79me3 (Figure 4.16B). Upon deletion of Ubp8, the H2BK123ub profile changed dramatically, now showing a striking similarity to H3K4me3. The ubp10 deletion profile, however, showed relatively modest changes, although the degree of resemblance to H3K79me3 further increased (Figure 4.16B). These data suggest that Ubp8 acts primarily in the 5′-CDS marked by H3K4me3, whereas Ubp10 deubiquitinates H2BK123 in the body of transcripts marked by H3K79me3.

Figure 4.15: Site-specific removal of H2BK123 ubiquitin by Ubp8 and Ubp10 (transcript level). A: Distribution of H2BK123ub in wild-type as well as ubp8 and ubp10 deletion strains across all transcripts sorted by their transcriptional frequencies. Calculations and plotting as in Figure 4.10. B: Differences in the enrichment of H2BK123ub in ubp8 and ubp10 deletion strains. Wildtype enrichment scores were subtracted from the H2BK123ub enrichment scores of the ubp8 and ubp10 deletion strains, and only positive differences were colour-coded. Average enrichment was calculated and transcripts sorted as in Figure 4.10.

Figure 4.16: Connection between Ubp8, Ubp10 and H3 methylation marks. A: All probes enriched for H2BK123ub within transcripts in wildtype as well as ubp8 and ubp10 deletion strains were compared with the number of these probes enriched for H3K4me3 and H3K79me3. B: All genes with a known TSS were divided into five length classes, and the average enrichment for H2BK123ub in wildtype, ubp8 and ubp10 deletion strains as well as for H3K4me3 and H3K79me3 was mapped. All transcripts in each group were partitioned into 150 bp bins and the average enrichment values were calculated and plotted.

Since most changes in the distribution of H2BK123ub upon loss of Ubp8 and Ubp10 occurred in the 5′- and mid-CDS, these gene segments were further analyzed and the association of H2BK123ub with H3K4me3 and H3K79me3 determined (Figure 4.17). Supporting the hypothesis, the 5′-ends deubiquitinated by Ubp8 were primarily marked by H3K4me3, whereas the mid-CDSs deubiquitinated by Ubp10 were mainly marked by H3K79me3 (Figure 4.17).

Figure 4.17: Segment-based association of Ubp8 and Ubp10 with H3 methylation marks. Venn diagrams comparing the number of 5′-CDS and mid-CDS segments enriched for H2BK123ub in wildtype, ubp8 and ubp10 deletion strains. For the 5′-CDS and mid-CDS segments enriched for H2BK123ub only in the ubp8 or ubp10 deletion strain, the overlap with the number of 5′-CDS and mid-CDS segments marked by H3K4me3 (red), H3K79me3 (blue) or both (purple) is shown as bars.

To further infer dependencies in the circuitry of H2BK123 ubiquitination/deubiquitination and its dependent marks, we tested whether loss of Ubp8 or Ubp10 had any consequences on the global levels and the genome-wide distribution of H3K4me3 and H3K79me3. As previously shown (Gardner et al., 2005; Daniel et al., 2004; Song and Ahn, 2010), protein blot analysis revealed no bulk change of H3K4me3 and H3K79me3 in either the ubp8 or the ubp10 deletion strain compared to wild-type. Furthermore, the genome-wide distribution of H3K4me3 and H3K79me3 remained largely unchanged upon loss of Ubp8 or Ubp10 (data not shown).
These findings suggested that Ubp8 and Ubp10 remove ubiquitin from H2BK123 after H3K4 and H3K79 trimethylation have been established. The link between H2BK123ub and the transcriptional cycle (Weake and Workman, 2008) may suggest that non-removal of the (bulky) ubiquitin mark on H2BK123 in ubp8∆ and ubp10∆ strains impairs transcription. However, and consistent with moderate effects on transcription levels (Gardner et al., 2005; Lenstra et al., 2011), the distribution of RNAPII was not affected by loss of either Ubp8 or Ubp10 (data not shown). Moreover, the distribution of the elongation mark H3K36me3 was not impaired upon loss of Ubp8 (data not shown), suggesting that transcription elongation still takes place.

4.5  Aberrant transcripts and their chromatin structure

4.5.1  Chromatin at cryptic promoters

It has been shown that cells lacking certain components of chromatin-associated pathways, such as the Set2/Rpd3S pathway critical for proper transcription-coupled chromatin remodeling, initiate cryptic transcripts within open reading frames (Carrozza et al., 2005; Lickwar et al., 2009; Joshi and Struhl, 2005; Kaplan et al., 2003; Cheung et al., 2008). To detect these spurious transcription initiation events in budding yeast, whole-genome custom Affymetrix tiling arrays were used to map the entire transcriptome in a wild-type and a set2 deletion strain, each replicated twice. As described in Chapter 3, the R package tilingArray (Huber et al., 2006) was applied to normalize the data and to determine a globally optimal fit of expression segments along genomic coordinates for an assumed average length of 1500 bp per segment (nrBasesPerSegment = 1500). It is important to note that this parameter does not enforce a minimum length for individual segments; rather, it determines the number of segments the algorithm considers when partitioning the region of interest.
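The role of the segment-length parameter can be illustrated with a generic least-squares change-point segmentation. The Python sketch below is not the tilingArray implementation and does not reproduce its API; the function names, the rounding rule in `segment_count` and the dynamic-programming formulation are assumptions chosen to show the principle that the parameter fixes how many segments are fitted, while individual segments remain free to be much shorter.

```python
def segment_count(region_length, bases_per_segment=1500):
    """Number of segments to fit for a region (assumed rounding rule)."""
    return max(1, round(region_length / bases_per_segment))


def segment(values, n_segments):
    """Split `values` into n_segments contiguous pieces with minimal
    within-segment sum of squared errors; return segment start indices."""
    n = len(values)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(values)")

    # Prefix sums give the squared error of any slice in constant time.
    s1, s2 = [0.0], [0.0]
    for v in values:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def cost(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    best = [[inf] * (n + 1) for _ in range(n_segments + 1)]
    back = [[0] * (n + 1) for _ in range(n_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = best[k - 1][i] + cost(i, j)
                if candidate < best[k][j]:
                    best[k][j] = candidate
                    back[k][j] = i

    # Trace back the optimal segment boundaries.
    starts, j = [], n
    for k in range(n_segments, 0, -1):
        j = back[k][j]
        starts.append(j)
    return starts[::-1]
```

In this toy setting a 6000 bp region would be fitted with four segments, yet a single outlying probe can still form a segment of its own, which mirrors the point that the parameter sets the number of segments rather than a minimum segment length.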
ORFs in which segments newly appeared in the absence of Set2 were considered further. For these genes, the average gene expression values for all identified segments within known ORF boundaries were calculated and compared to gene expression levels in wildtype cells. In total, 521 of the ∼6500 ORFs in S. cerevisiae showed one or more cryptic initiation events. More than 57 % of those overlapped with previously reported cryptic transcripts (Lickwar et al., 2009), which had been identified with the same underlying change-point detection principle (Bai and Perron, 1998) that tilingArray uses.

Little is known about the chromatin structure at sites of cryptic initiation within ORFs. One key chromatin modification found at many regular transcription initiation sites in budding yeast is the histone variant H2A.Z. We used ChIP-on-chip with high-resolution tiling arrays to determine its distribution across the genome. To visualize its distribution across transcripts, the average profile across genes was calculated (Figure 4.18A) and its association with transcriptional frequency visualized using CHROMATRA (Section 3.5.2) (Figure 4.18B). Consistent with previous studies (Guillemette et al., 2005; Raisner et al., 2005; Zhang et al., 2005; Albert et al., 2007), H2A.Z was mainly enriched at gene promoters (Figure 4.2A), with stronger occupancy in classes of lower transcriptional frequency (Figure 4.18B). To test if cryptic promoters have H2A.Z localized to their initiation sites in intragenic regions, we mapped H2A.Z in cells lacking Set2. Strikingly, H2A.Z was enriched at about half of all sites of cryptic transcription initiation when the Set2/Rpd3S pathway was impaired (Figure 4.18C). Overlaying gene expression data with the H2A.Z mapping data allowed a direct comparison of the chromatin structure at sites of cryptic initiation (Figure 4.19). Another histone modification strongly associated with transcription initiation sites is H3K4 tri-methylation.
As shown in Figure 4.10, H3K4me3 is enriched at 5′-ends of almost all genes in S. cerevisiae. If the chromatin structure at sites of cryptic transcription resembles the chromatin structure at normal initiation sites, one would expect to find H3K4me3 downstream of cryptic initiation sites as well. To test this hypothesis, we mapped H3K4me3 in cells lacking Set2 and found H3K4me3 to be enriched in many genes with cryptic transcripts (Figure 4.20).

4.5.2  Deposition and role of H2A.Z at cryptic promoters

The occurrence of H2A.Z and H3K4me3 at sites of cryptic initiation indicates that patterns of chromatin are comparable to those at regularly transcribed regions. However, these data do not allow conclusions to be drawn about dependencies and the enzymatic machineries establishing these modifications. In order to elucidate such dependencies, we focused on the histone variant H2A.Z and its deposition machinery. H2A.Z is well known to be deposited into chromatin by the multi-subunit complex SWR1-C (Kobor et al., 2004; Krogan et al., 2003a; Mizuguchi et al., 2004). Impairment of the catalytic subunit Swr1 leads to a major loss of H2A.Z associated with chromatin, indicating that SWR1-C is the predominant H2A.Z-deposition machinery (Marques et al., 2010). To test whether SWR1-C is responsible for integrating H2A.Z at cryptic promoters as well, we mapped H2A.Z in cells lacking both Swr1 and Set2. As expected, H2A.Z was lost at regular promoters in the absence of Swr1, but surprisingly, the localization of H2A.Z to cryptic promoters was not dependent on SWR1-C (Figure 4.21), suggesting that another complex is involved in H2A.Z deposition at these sites. Our current experiments and analyses are geared towards revealing the cellular machineries important for H2A.Z deposition at those sites.

Figure 4.18: Distribution of H2A.Z across ORFs and transcripts. A: Average profiles of H2A.Z across ORFs. ORFs were aligned according to their translational start and stop sites, similar to an approach by the Young lab (Pokholok et al., 2005). Plot calculated similar to Figure 4.2. B: Distribution of H2A.Z across transcripts with respect to transcriptional frequency. Plot as in Figure 4.10. C: Overlay of H2A.Z wild-type (red) and set2 deletion (blue) ChIP-on-chip profiles. For two sample regions, genomic position is plotted along the x-axis against the relative occupancy of the histone variant on the y-axis.

Figure 4.19: H2A.Z at sites of cryptic initiation. For two sample genes, the overlay of H2A.Z ChIP-on-chip profiles with the corresponding gene expression data in wild-type and set2 deletion strain is shown. Dotted vertical lines represent predicted transcript segments, which were used to calculate average expression levels for each segment visualized as horizontal green lines.

Figure 4.20: H3K4me3 at sites of cryptic initiation. For two sample genes, the overlay of H3K4me3 ChIP-on-chip profiles with the corresponding gene expression data in wild-type and set2 deletion strain is shown. Dotted vertical lines represent predicted transcript segments, which were used to calculate average expression levels for each segment visualized as horizontal green lines.

A strong correlation exists between transcription and many chromatin modifications. Yet, whether these correlations reflect mere associations or causal relationships between histone modifications and transcription is largely unknown. The disturbance of chromatin structures, e.g. by impairing the Set2/Rpd3 pathway, leads to the initiation of cryptic transcription within intragenic regions, arguing that chromatin has a direct effect on transcriptional regulation. However, the causal role of modifications associated with the initiation of transcripts, such as H2A.Z and H3K4me3, in the occurrence of cryptic transcripts has not been investigated so far.
Therefore, we studied the role of H2A.Z in this process by mapping cryptic transcription in cells lacking H2A.Z. Interestingly, despite its strong association with (cryptic) transcription, H2A.Z was not causal for the occurrence of cryptic transcripts, as the number of intragenic initiation events was similar in the set2∆ and set2∆htz1∆ strains (Figure 4.22).

As the example of cryptic transcription indicates, the cell’s transcriptome is rather complex, and transcripts of various types have been identified so far. Moreover, antisense transcription, that is, transcription from the strand opposite to the protein-coding (sense) strand, seems to be widespread in budding yeast and has been ascribed roles in gene regulation (David et al., 2006; Havilio et al., 2005; Nagalakshmi et al., 2008; Steigele and Nieselt, 2005; Gelfand et al., 2011). We observed that most cryptic transcripts occur in sense direction, but antisense cryptic transcripts occur as well (Figure 4.23). To test whether the histone variant H2A.Z is localized to sites of antisense cryptic transcription, the analysis was focussed on intragenic regions containing such transcripts. Interestingly, H2A.Z was found at these sites, indicating that the chromatin structure at antisense cryptic promoters has a typical composition, too (Figure 4.23). Taken together, these studies provide a glimpse of the complexity of transcription, in particular at cryptic sites, and of its association with the surrounding chromatin neighbourhoods, and lay the cornerstones for further analysis.

Figure 4.21: Role of SWR1-C in H2A.Z deposition at cryptic promoters. For two sample genes, the overlay of H2A.Z ChIP-on-chip profiles in wildtype, set2∆ and set2∆swr1∆ strains with the corresponding gene expression data is shown. Dotted vertical lines represent predicted transcript segments, which were used to calculate average expression levels for each segment visualized as horizontal green lines.
4.6  Discussion

The work presented in the first part of this chapter provides new evidence that individual modification states of the same histone residue have non-redundant cellular functions. Secondly, my analysis helped to reveal temporal and spatial dependencies of histone modifications and their distinctive patterns. Finally, evidence is presented that the chromatin structure is altered at cryptic transcription sites and that H2A.Z as well as H3K4me3 are enriched at cryptic initiation sites.

Figure 4.22: Role of H2A.Z for the occurrence of cryptic transcripts. For two sample genes, the overlay of H2A.Z ChIP-on-chip profiles with the corresponding gene expression data in wild-type, set2∆ and set2∆htz1∆ strains is shown. Dotted vertical lines represent predicted transcript segments, which were used to calculate average expression levels for each segment visualized as horizontal green and blue lines.

Figure 4.23: H2A.Z at sites of antisense cryptic transcripts. For a sample gene, the overlay of H2A.Z ChIP-on-chip profiles with the corresponding sense as well as antisense gene expression data in wild-type, set2∆ and set2∆htz1∆ strains is shown. Dotted vertical lines represent predicted transcript segments, which were used to calculate average expression levels for each segment visualized as horizontal green lines.

Distinct states of histone modification. In the first part, findings are reported that address di- and trimethylation of H3K79 and their debated redundancy in functional outcome. The analysis showed that H3K79 methylation states were mutually exclusive across the genome and linked to different cellular functions. In particular, the presented characteristics of methylation states point towards a fundamental connection between H3K79 dimethylation and the cell cycle. The ChIP-on-chip analysis revealed H3K79me2 to be enriched at M/G1 cell cycle-regulated genes as well as in Swi4/6-bound promoters.

Histone lysine residues have different methylation states, and it has been unclear whether these states are functionally distinct or redundant. For H3K79 methylation, it was proposed that the three states are functionally redundant (Frederiks et al., 2008; Shahbazian et al., 2005), based on genetic evidence indicating that all three states have roles in telomeric silencing (Frederiks et al., 2008). In contrast, the work presented in this chapter provides evidence that H3K79 methylation states have non-redundant, distinct functions. The presented ChIP-on-chip profiles reveal that H3K79me2 and H3K79me3 are associated with mutually exclusive regions of the genome, a fundamental prerequisite for being functionally distinct. Mechanistically, the sole H3K79 methyltransferase Dot1 requires a mechanism to specifically establish each state at certain regions in the genome. In one possible mechanism for this region-specific binding, Dot1 could recognize other chromatin marks that induce state-specific methylation. Indeed, as shown here, a potential candidate is H2BK123ub, which colocalizes with H3K79me3, but not H3K79me2. Supporting this hypothesis, in vitro reconstitution of the methylation reaction suggested that the mono-ubiquitination of H2B induces a conformational change in the Dot1-nucleosome complex which specifically stimulates Dot1’s catalytic activity (Jeltsch and Rathert, 2008; McGinty et al., 2008). Therefore, the pattern of H2B monoubiquitination could control the distribution of H3K79 di- versus trimethylation on chromatin.

Another point of evidence supporting the distinct function of the different methylation states is the presented association of H3K79 di- but not trimethylation with cell-cycle control. What is known about chromatin and the cell cycle? One major aspect of cell-cycle control is the regulatory network of interconnected transcriptional activators (Simon et al., 2001).
Often, transcription factors function specifically during one cell-cycle stage and control the expression of transcriptional activators for the subsequent one, thereby forming a feed-forward regulatory circuit to ensure an ordered progression through cell division. In addition to transcriptional regulation, cell-cycle progression also employs proteolysis, phosphorylation, localization, and other regulatory mechanisms to ensure completion of one event before entry into the next. The role of chromatin modifications in transcription control has been intensively studied in recent years. However, little is known about cell cycle-dependent changes in chromatin structure and the role that chromatin alterations play in normal cell-cycle progression and transcriptional regulation.

Based on observations presented in this chapter, we propose the following model of how H3K79me2 and cell-cycle control relate to one another. Dephosphorylation of Swi6 and synthesis of Swi4 during mitosis allow binding of SBF to its target genes in the late M/early G1 phase of the cell cycle. However, inhibitory factors such as Whi5 prevent transcriptional activation at these targets until early G1, after cells have reached a critical size threshold. At this point in the cell cycle, known as START in budding yeast, activation of the G1 cyclin-dependent kinase Cln3-Cdk1 inside the nucleus renders SBF functionally active, and a positive feedback loop results in amplification of SBF- as well as MBF-dependent transcription and expression of genes required for progression through G1 and entry into the S phase. SBF is inactivated upon entry into the S phase through phosphorylation of Swi6, which disrupts its binding to Swi4 and results in its export to the cytoplasm. Without Swi6, Swi4 can no longer bind to DNA.
The observations that levels of H3K79me2 increase during the S phase and remain high during the G2/M phase, combined with the genome-wide location analysis showing that H3K79me2-occupied genes overlap extensively with those expressed specifically in the G1 phase (or bound by Swi4), suggest that H3K79me2 marks cell cycle-specific genes during G2/M. Whether H3K79me2 is a consequence of gene inactivation or actively causes transcriptional inactivation remains to be determined. Here, we provide evidence that the modification is created de novo (that is, through the addition of two methyl groups to an unmodified H3K79) and is not generated by demethylation of an existing H3K79me3 residue.

Taken together, the data clearly show that H3K79 methylation states are mutually exclusive and linked to different cellular functions. In particular, a fundamental connection between H3K79 dimethylation and the cell cycle is presented, which offers a great opportunity for further analysis of chromatin’s role in regulating the cell cycle.

From the computational perspective, the fluctuation and distinct localization of H3K79 methylation during the cell cycle stages could indicate a toggle mechanism that signals a particular overall system state to certain cellular machineries and discriminates marked genes from others, thereby determining how to proceed correctly through the cell cycle program. According to the histone code hypothesis, the dimethyl mark might be recognized by complexes via specific protein domains that are recruited to marked genes in order to promote their transcription as they are required for the subsequent process stage. Expressed in terms of a finite state machine, the cell cycle stages can be understood as machine states, chromatin modifications as part of the input (and/or output) events that determine correct state transitions, and the activation of certain genes as output of a state change.

Interdependencies of histone modifications.
H2BK123ub, a post-translational histone mark with roles in transcription initiation and elongation, is required in the histone trans-tail cross-talk for tri-methylation of histone H3K4 and H3K79 (Briggs et al., 2002; Dover et al., 2002; Nakanishi et al., 2009; Sun and Allis, 2002). H2BK123ub is highly transient and removed by the ubiquitin-specific proteases Ubp8 and Ubp10. However, the genomic regions these deubiquitinases act upon have not been investigated genome-wide. In this chapter, genome-wide maps of H2BK123ub in wildtype, ubp8 and ubp10 deletion strains as well as H3 methylation profiles obtained on the same high-resolution platform are presented. The analysis demonstrates that H2BK123ub was associated with both its dependent marks and, despite not being causally linked, also colocalized with the transcription elongation mark H3K36me3. Furthermore, Ubp8 and Ubp10 had site-specific roles in the genome: Ubp8 deubiquitinated H2BK123ub at H3K4me3-marked regions, whereas Ubp10 removed H2BK123ub at H3K79me3-enriched sites. The data further indicate that H2BK123ub might be more transient in the 5′-end than in the body of transcripts, thereby influencing certain steps of the transcription cycle differently.

These findings together with the current understanding of the histone trans-tail cross-talk suggest the following model (Figure 4.24): Initially, Rad6/Bre1 are recruited to promoters through interactions with transcriptional activators and monoubiquitinate H2BK123 (Weake and Workman, 2008). Together with the surrounding histone residues, H2BK123ub then provides a molecular ‘tag’ attracting the Set1/COMPASS complex, which in turn tri-methylates H3K4 (Lee et al., 2007). Eventually, Ubp8 removes the bulky H2BK123 monoubiquitin group, and H3K4me3 remains as a memory mark of recent transcriptional initiation (Gerber and Shilatifard, 2003; Krogan et al., 2003b; Muramoto et al., 2010; Ng et al., 2003b).
In longer genes, which require an extensive transcription elongation phase, Rad6/Bre1 stay associated with the elongating form of RNA polymerase II and monoubiquitinate H2BK123 throughout the CDS. This provides a molecular ‘tag’ recognized by Dot1, which binds more strongly to and resides longer at these sites to specifically trimethylate H3K79 (Jeltsch and Rathert, 2008). In those regions, Ubp10 removes H2BK123ub, and the stable H3K79me3 mark remains.

The data further indicate that H2BK123ub might be more transient in the 5′-end than in the body of transcripts. In wild-type cells, H2BK123ub was mainly enriched in the body of transcripts marked by H3K79me3 and did not peak in the 5′-end. The increase of ubiquitin in the body of transcripts was modest upon loss of Ubp10, while it strongly increased in their 5′-ends upon loss of Ubp8, suggesting that the ubiquitin mark is more transient at the 5′-end under normal conditions. Furthermore, H2BK123ub was reduced in the body of transcripts in the ubp8 deletion strain, indicating that non-removal of H2BK123ub at the 5′-end might block the ubiquitination of H2BK123 during later elongation steps.

Figure 4.24: Model depicting the circuitry of H2BK123ub and its dependent marks. First, H2B is monoubiquitinated by Rad6/Bre1, resulting in the recruitment of Set1/COMPASS and, in longer genes, Dot1 to trimethylate H3K4 and H3K79, respectively. The removal of H2BK123ub is proposed to be site-specific by either Ubp8 or Ubp10.

It has been shown that deletion of Ubp8 and Ubp10 together results in a higher global increase of H2BK123ub level than either one of the single deletions alone, but the increase is not completely additive, indicating that they work synergistically at a few regions (Emre et al., 2005; Gardner et al., 2005). Our data suggest that these regions might be the ones marked by both dependent marks H3K4me3 and H3K79me3.
In particular, at 5′-ends of genes H3K4me3 and H3K79me3 co-occurred, suggesting that Ubp8 and Ubp10 most likely both act at those regions.

In support of the presented hypothesis, Ubp8 and Ubp10 catalyze deubiquitination of H2BK123 at distinct genomic loci, which raises the issue of specific recognition of these sites. Ubp8 is part of the SAGA complex and forms a module with Sgf11, Sus1 and Sgf73 (Köhler et al., 2010; Samara et al., 2010). Sus1 is required for the recruitment of Ubp8 to promoters (Köhler et al., 2006) and might help Ubp8 to be recruited to H3K4me3-marked regions specifically. Ubp10 has not been identified as part of a complex, and the mechanism of its recruitment to chromatin is not clear. Ubp10 was proposed to play a role in telomeric silencing (Emre et al., 2005; Gardner et al., 2005), but we observed no increase of H2BK123ub in telomeres upon loss of Ubp10. Besides its proposed importance for telomeric regions, gene expression studies point towards a role of Ubp10 in euchromatin as well (Gardner et al., 2005). Consistently, we revealed that Ubp10 specifically removes H2BK123ub from H3K79 tri-methylated open reading frames. Strikingly, the H3K79 methyltransferase Dot1 was initially found in the same screen as Ubp10 (Dot4) (Singer et al., 1998), further supporting this connection of Ubp10 and Dot1 across the genome.

Proper addition and removal of H2BK123ub are required for optimal gene expression (Weake and Workman, 2008), and Ubp8-mediated deubiquitination is involved in the transition between initiation and elongation (Daniel et al., 2004; Henry et al., 2003). Failure to remove the H2BK123 ubiquitin mark leads to defects in recruitment of the kinase Ctk1, but global defects in subsequent CTD serine 2 phosphorylation have not been detected (Wyce et al., 2007). Consistently, it has been shown that loss of Ubp8 does not alter recruitment of RNAPII to GAL1 during gene activation (Wyce et al., 2007).
Here, we confirmed that Rpb3 localization upon loss of Ubp8 or Ubp10 was unaltered genome-wide, which is reflected by moderate changes in gene expression (Gardner et al., 2005; Lenstra et al., 2011). Consistently, the elongation mark H3K36me3 did not change upon loss of Ubp8, suggesting that transcriptional elongation still takes place when H2BK123ub is not properly removed. Together, these findings indicate that the cell is able to efficiently transcribe genes despite impaired removal of the ubiquitin moiety. A possible explanation could be an indirect removal of the ubiquitin mark through eviction of H2A and H2B during transcription or histone turn-over. In addition, the overall transcriptional program might be unaffected by the disturbed H2BK123ub pathway, potentially due to genetic robustness achieved by functional redundancy. Quantitative genetic interaction mapping showed that genes encoding components of the H2BK123ub machinery such as Rad6, Bre1 and Ubp8 interact genetically with genes encoding multiple transcription-related factors such as subunits of the Set1/COMPASS, PAF, SWR1-C or the proteasome, and that this machinery is embedded in the well-connected complexes of transcriptional elongation (Ingvarsdottir et al., 2005; Xiao et al., 2005). Taken together, the results agree with previous findings that describe DUBs as major molecular regulators (D’Andrea, 2010) and point towards distinct roles of Ubp8 and Ubp10 in the deubiquitination machinery of eukaryotic cells.

With respect to the underlying computational principles, the H2BK123ub pathway demonstrates nicely how one histone modification can trigger others in the context of transcriptional gene activation. Drawing again on the finite state machine abstraction, the transcriptional program can be divided into certain states, such as initiation, elongation and termination on a coarse-grained level, that need to be executed in a certain order.
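The finite state machine reading of the transcriptional program can be made concrete with a toy Mealy machine, in which chromatin marks act as inputs and the recruitment of enzymatic complexes as outputs. The Python sketch below is purely illustrative: the transition table is a hypothetical simplification loosely following the model in Figure 4.24, the "end of gene" input and the "release transcript" output are invented for the example, and none of it is meant as a claim about the actual mechanism.

```python
# Toy Mealy machine: (state, input mark) -> (next state, output).
# The table is a hypothetical illustration, not an experimental result.
TRANSITIONS = {
    ("initiation", "H2BK123ub"): ("initiation", "recruit Set1/COMPASS"),
    ("initiation", "H3K4me3"): ("elongation", "recruit Ubp8"),
    ("elongation", "H2BK123ub"): ("elongation", "recruit Dot1"),
    ("elongation", "H3K79me3"): ("elongation", "recruit Ubp10"),
    ("elongation", "end of gene"): ("termination", "release transcript"),
}


def run(inputs, state="initiation"):
    """Feed a sequence of input marks to the machine.

    Returns the final state and the list of generated outputs.
    An input that is not valid in the current state raises ValueError,
    which models the requirement that states are executed in order.
    """
    outputs = []
    for mark in inputs:
        key = (state, mark)
        if key not in TRANSITIONS:
            raise ValueError(f"input {mark!r} is not valid in state {state!r}")
        state, output = TRANSITIONS[key]
        outputs.append(output)
    return state, outputs
```

Feeding the marks in the order H2BK123ub, H3K4me3, H2BK123ub, H3K79me3, "end of gene" drives the toy machine from initiation to termination, whereas presenting H3K79me3 before H3K4me3 is rejected.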
The sequence and combination of histone modifications described above might then be regarded as (part of the) inputs to determine correct state transitions. The recruitment of enzymatic complexes for certain transcription stages and the establishment of downstream histone marks can be understood as generated outputs of the machine.

Aberrant transcripts and their chromatin structure. Recent studies have shown that the proper regulation of chromatin influences the correct choice of transcript initiation sites. In cells lacking certain chromatin-regulation pathways, initiation of transcription occurs inappropriately within the protein-coding regions of genes, indicating that expression of alternative genetic information occurs (Kaplan et al., 2003; Carrozza et al., 2005; Cheung et al., 2008; Joshi and Struhl, 2005). Although it is known that some of these cryptic transcripts are translated into proteins, very little is known about the chromatin neighbourhoods at cryptic initiation sites. In the last part of this chapter, we investigate chromatin alterations at sites of cryptic initiation when the Set2/Rpd3 pathway is impaired. We reveal that chromatin modifications typically found in normal promoters, such as H2A.Z, are also enriched in cryptic promoters.

Since histone acetylation is a key modification of transcript initiation, the occurrence of cryptic transcripts within genes can be explained by increased histone acetylation within open reading frames due to an impaired Set2/Rpd3 pathway (Carrozza et al., 2005). It has been proposed that cryptic transcripts arise at all intragenic regions containing DNA sequences that have the ability to inappropriately recruit transcription initiation factors (Lickwar et al., 2009). Additional factors such as gene length or transcription rate may influence the occurrence of sites of cryptic transcription (Lickwar et al., 2009). So far, the functional importance of cryptic transcripts in S. cerevisiae is unclear.
Potentially, these sites might have influenced the evolution of alternative promoters in higher eukaryotes, as the use of alternative transcription start sites is frequently found in higher organisms. To fully unravel the function of cryptic transcription, it is important to understand its association with the underlying chromatin structure. Here, we show that the histone variant H2A.Z as well as the histone modification H3K4me3 are enriched at sites of cryptic initiation, indicating that cryptic promoters have chromatin structures similar to canonical promoters. However, the data suggest that the regulation of chromatin modifications at cryptic sites differs from that at sites of canonical transcript initiation. The histone variant H2A.Z is known to be deposited into chromatin by the SWR1-C under normal conditions (Kobor et al., 2004; Krogan et al., 2003a; Mizuguchi et al., 2004). In contrast, H2A.Z is not deposited at sites of cryptic initiation by the SWR1-C. Therefore, another protein or enzymatic complex may be responsible for integrating H2A.Z at these sites. Potential candidates are the H2A.Z chaperones Nap1 or Chz1 (Luk et al., 2007), although they have no enzymatic activity themselves. Alternatively, previous studies proposed that H2A.Z can be randomly incorporated into chromatin in a non-targeted fashion (Hardy et al., 2009). In this scenario, H2A.Z is not found in coding regions, since it is constantly removed during transcription elongation (Hardy et al., 2009). Under impaired conditions, such as the disturbance of the Set2 pathway, when cryptic transcripts arise, the removal of H2A.Z might not function properly at intragenic sites, so that H2A.Z becomes enriched at cryptic promoters without enzymatic activity.
Our work further reveals that the histone variant H2A.Z is not causal for the occurrence of cryptic transcripts under the conditions tested, indicating that it most likely does not influence the location of cryptic initiation either. As mentioned above, other factors that affect transcription, including the availability of binding sites for transcriptional activators and the existence of DNA elements such as a TATA box, might play causal roles in the context-dependent initiation of cryptic transcription (Lickwar et al., 2009). Taken together, we show evidence that the chromatin modifications H2A.Z and H3K4me3 are enriched at sites of cryptic transcription. These data provide a first step towards revealing the chromatin neighbourhood at cryptic start sites and offer the opportunity to further fill in the gaps on the epigenetic map of the yeast genome. From the biocomputing viewpoint, the data suggest that the chromatin marks studied up to this point are not causal for the transcriptional program to be initiated at cryptic sites but are rather memory marks that signal where transcription recently took place and reflect the state of a certain genomic stretch. The lack of impact on gene activity, however, indicates that these chromatin marks are potentially suitable candidates to encode information for artificial biocomputers. The decoupling of information and cellular consequence provides essential degrees of freedom to exploit these resources with less interference with cellular function.

Chapter 5

Conclusion

While an all-embracing yet precise definition of life as the discriminating characteristic between inanimate matter and objects we refer to as organisms remains open to a certain degree, undisputed properties of living systems include the capacity to metabolize, respond to stimuli and reproduce (Koshland, 2002).
At first glance, some of these functions seem conceivable in higher organisms as emerging results of complex sensory networks, intricate structural organizations and elaborate nervous systems. Yet even the ‘lowliest of the lowest’ single-celled organisms that roam this planet, as Joseph Leidy described amoeba (Naturalists et al., 1878), lead highly sophisticated lives. Approaching the question of how such ‘simple’ living systems can, for example, respond to light, exhibit directed locomotion and even actively hunt prey, we start to appreciate that these activities require information processing within the molecular realm. Essentially, cells are systems of highly connected and interdependent circuits that compute and perform logical operations on biomolecular substrates (Bonn and Furlong, 2008; Istrail et al., 2007; Levine and Davidson, 2005). Understanding the fundamental principles of these algorithmic bioprocesses will both advance our understanding of biology and inspire novel forms of computation (Condon et al., 2009). At the very core, the connection between life as a concrete physical process and computation as an abstract principle can be derived from theories on pancomputationalism (Zuse, 1990; Jaynes, 1957a;b), according to which information is the essence of every process and, thus, any physical process is a computation and can be restated in terms of information:

It is not unreasonable to imagine that information sits at the core of physics, just as it sits at the core of a computer. (...) It from bit. Otherwise put, every ‘it’ – every particle, every field of force, even the space-time continuum itself – derives its function, its meaning, its very existence entirely – even if in some contexts indirectly – from the apparatus-elicited answers to yes-or-no questions, binary choices, bits. ‘It from bit’ symbolizes the idea that every item of the physical world has at bottom – a very deep bottom, in most instances – an immaterial source and explanation; that which we call reality arises in the last analysis from the posing of yes-no questions and the registering of equipment-evoked responses; in short, that all things physical are information-theoretic in origin (...) (taken from Wheeler, 1990).

While the radical hypothesis of pancomputationalism awaits further elucidation, it is interesting to note that Wheeler’s idea suggests even binary logic at the core of every physical process. If this postulate is true, most efforts in biomolecular computing seem to follow the ‘correct’ computational paradigm when adopting digital encoding schemes and striving for discrete behaviour in man-made biocomputers, despite the fact that many biological processes appear to be inherently analog in nature, at least at the macromolecular scale and to the human observer. Leaving the question of digital versus analog aside, computing, in both the classical and the biological sense, is essentially a dynamic process that transforms a set of inputs into a set of outputs according to some algorithmic rules. To better reflect this dynamic aspect, certain branches in the field of biomolecular computing have started to make a transition away from DNA-based approaches towards more agile substrates as carriers of computational information in biocompatible applications (Benenson, 2009a). Two prominent biomolecular substrates that saw a dramatic shift towards agility in their roles in cellular computation are the main subject of this thesis: RNA and chromatin.

The presented work centres around the functionality and characteristics that RNA and chromatin provide with respect to their regulatory role in the cell as well as their potential capacities for biomolecular computing. The first part of the dissertation, covered in Chapter 2, focuses on RNA.
I speculate on how eukaryotic cells might use regulatory RNA molecules and the RNA interference machinery to generate a form of epigenetic memory, and how they might potentially utilize it for computations based on sequential logic. The model I present is an RNA-based equivalent of the well-known electronic flip-flop, one of the fundamental memory units in digital circuits. This work contributes by elucidating the abstract principles of the RNA interference machinery from a biological perspective and suggests novel ideas for universal memory units in biomolecular computing.

The second part of this work, prepared in Chapter 3 and put into biological context in Chapter 4, focuses on chromatin. In collaborative projects with highly recognized research groups, my work contributes to the understanding of various aspects of chromatin biology. In particular, I analyzed genome-wide data of different methylation states of histone residue H3K79. The results of this analysis made it possible to answer a long-standing question in the field of chromatin biology and demonstrated that H3K79 methylation states are mutually exclusive in the genome and associated with distinct cellular processes. I further identified genome-wide patterns of several histone H3 modifications in the context of histone H2BK123 monoubiquitination, which, among other results, led to the finding that Ubp8 and Ubp10 deubiquitinate H2BK123 at different loci in the genome. Lastly, my work sheds light on intriguing connections between chromatin structure and gene activity by analyzing transcriptome data together with genome-wide profiles of histone marks, the histone variant H2A.Z and parts of its depositing enzymatic machinery. Although immediate translations of these results into applications for biomolecular computing seem far-fetched at this point, they aid in deciphering the principles of chromatin-based computation in the cell and, hence, lay the foundation for potential computational designs at a later point.
The presented work emphasizes that there are different types of chromatin modifications, some of them rather transient and others more permanent, indicating that some modifications of this computational layer have storage functionality, while others act like signalling toggles to discriminate activation of the ‘correct’ genetic elements in a particular system state.

Chromatin and RNA represent two of several layers in the hierarchical network (Figure 1.1) that constitutes the cellular computation apparatus (Conrad, 1983). Although they initially appeared to be rather independent, chromatin- and RNA-based regulatory functions quickly became connected with the discoveries of non-coding RNAs and their role in chromatin structure (see Section 1.3.1). As both research fields evolve, this overlap continues to grow. The examples of long ncRNAs in Section 1.3.2, such as Xist and HOTAIR, demonstrate how deeply both layers are nested and hint at the versatile and powerful interplay they exhibit with respect to cellular function in general and epigenetic information processing in particular (Lee, 2010). For biomolecular computing, specifically biocompatible approaches, regulatory RNAs have already proven their superiority over DNA, and the field has passed the stage of identifying tractable goals for fully implementing in vivo computing (Benenson, 2009b). In the next phase, attempts are under way to assemble individual components into larger subunits. Complexity-wise, these units are projected to contain about 10–20 components, and the key lesson to be learned on that route will be how to deal with the inherent noise and random fluctuations in living systems while maintaining proper functioning of the engineered systems (Benenson, 2009b). Randomness and non-determinism neither conflict with the claim that cells compute nor are they unwanted byproducts.
Instead, many molecular interactions in the cell rely heavily on random searching to increase the probability of encounters between reaction partners over relatively long distances; only once the partners are in proximity do more deterministic processes control the course of the interaction (Yanagida et al., 2007; Bajic and Tan, 2005). The critical question for biomolecular computing here is how natural systems avoid the amplification of this endogenous noise and sustain their operation across a wide range of physiological conditions. It has been proposed that the hierarchical composition of interconnected computational layers in the cell might be at least part of the answer to this question, as it facilitates redundancy and the convergence of disintegrated signal and control flows (Bajic and Tan, 2005; Prohaska et al., 2010; Stojanovic, 2010). It even appears as if this form of distributed computing over multiple layers and different molecular substrates is critical for the flexible and intelligent behaviour of organisms (Bajic and Tan, 2005). Thus, once again, current efforts in biomolecular computing that aim to combine transcriptional, post-transcriptional and post-translational layers (Benenson, 2009b) seem to pursue the right path towards robust operation of complex engineered circuits in vivo. With respect to RNA-based computations, this principle becomes obvious. As explained in Sections 1.3.2 and 2.3, small interfering RNAs alone, despite being the pivotal element, have no capacity to unfold their regulatory impact without the protein machinery of RNA interference. This observation holds for other types of regulatory RNAs as well (Shalgi et al., 2007; Shimoni et al., 2007). Combinations of features of the RNA and protein layers have led to the most advanced biocomputers thus far (Deans et al., 2007; Greber et al., 2008; Rinaudo et al., 2007; Win et al., 2009).
Especially in the work of Rinaudo et al. (2007), the logic capabilities and computational power of such hybrid circuits are substantially greater than in previous approaches and could be widened further, towards sequential and/or iterated operation, if the circuits were equipped with universal memory units such as flip-flops to temporarily store information. Additional increases in the computational flexibility of such devices could potentially be derived by further integrating sequence- and structure-based features of RNAs. As successfully laid out by Smolke’s group, riboswitch RNAs with integrated aptamer domains offer tremendous opportunities as sensors and effectors for a wide range of molecular signals in the cell (Win and Smolke, 2007b). The induced switch-like conformational change of the RNA in response to binding between a specific ligand and the aptamer modulates the efficiency of target processing through the RNA interference machinery (Beisel et al., 2008). For the envisioned diagnostic and therapeutic applications of biocompatible computers, aptamers could serve as detectors for characteristic molecules in disease cells and specifically expose a disease-relevant target to RNAi processing (Beisel et al., 2008). Hence, RNAi-mediated processing of target RNAs can be controlled not only through the (sequence of the) siRNA effector but also by applying or removing the ligand signal to which the aptamer on the target is susceptible. This dual control scheme might also provide a route to cascade multiple RNA flip-flops in a counter- or frequency-divider-like fashion. Integrated aptamers in RNAs and corresponding ligands could render flip-flops susceptible to primary trigger signals or clock certain parts of flip-flop cascades to achieve (a-)synchronous behaviour.
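The idea of cascading flip-flops into a counter or frequency divider can be illustrated with an abstract sketch. The code below models idealized toggle flip-flops in Python and chains them asynchronously, so that each stage is triggered by the falling transition of the previous one. It says nothing about how an RNA-based flip-flop would be realized; the class and function names are hypothetical and chosen for illustration.

```python
class ToggleFlipFlop:
    """Idealized one-bit memory that flips its state on every trigger."""

    def __init__(self):
        self.q = 0

    def trigger(self):
        """Toggle the stored bit; return True on a 1 -> 0 transition,
        which serves as the trigger (carry) for the next stage."""
        self.q ^= 1
        return self.q == 0


def ripple_counter(n_stages, n_pulses):
    """Chain n_stages flip-flops asynchronously and apply n_pulses
    trigger pulses to the first stage.

    Returns the state of all stages (least significant bit first)
    after each pulse.
    """
    stages = [ToggleFlipFlop() for _ in range(n_stages)]
    history = []
    for _ in range(n_pulses):
        for flip_flop in stages:
            # A stage passes the pulse on only when it falls back to 0.
            if not flip_flop.trigger():
                break
        history.append([flip_flop.q for flip_flop in stages])
    return history


# Three stages count pulses modulo 8; stage k changes state only on
# every 2**k-th pulse, i.e. it divides the input frequency.
states = ripple_counter(n_stages=3, n_pulses=8)
```

In this picture, a ligand that gates a particular stage would correspond to enabling or disabling its trigger input, which is one way to read the clocking role suggested for aptamers above.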
As more and more studies reveal the diverse ways in which nature utilizes RNAs, and as much as this versatility must appeal to current attempts at in vivo computing, one also has to keep in mind that the more nested a substrate or process is in the cell, the more limited the degrees of freedom are likely to be from an engineering perspective. Many molecules that would be particularly interesting to exploit (for instance polymerases and ribosomes, because of their catalytic properties and their ability to translate information from DNA to RNA and from RNA to protein, respectively) are evolutionarily tightly locked within the cellular computing machinery and, hence, offer little flexibility, since other elements in the machinery would inevitably be affected by any attempt to modify these molecules (Stojanovic, 2010). Abstraction might provide a principle to circumvent this dilemma. By not encoding information as direct properties of substrates, such as their sequence, length or structure, one may gain additional freedom for computation without affecting other cellular processes. The RNA-based flip-flop discussed in Chapter 2 exemplifies such an approach. Although it is a memory model not (yet) found in nature, it could potentially operate within and according to the rules of the cellular machinery without any further requirements. Admittedly, an implementation would have to show whether the utilization of the RNAi machinery has negative effects on the rest of the cell. It could certainly be that only a limited number of RISC complexes is available and that either the flip-flop or natural processes cease to function because of saturation effects.

Another substrate that might offer computational potential through abstraction is chromatin (Prohaska et al., 2010).
While the impact of chromatin modifications on gene activity clearly suggests a link between the underlying DNA sequence and the positioning of nucleosomes, other chromatin architectural proteins and their chemical modifications, the mapping between genetic and epigenetic information does not seem to be a bijection (Jiang and Pugh, 2009; Washietl et al., 2008; Segal and Widom, 2009). This ‘partial detachment’ from DNA, together with the capacity to write, erase and read chromatin information through specific proteins, generates freedom for epigenetic information processing superimposed on the DNA-driven cis-regulatory processes (Benecke and Group, 2006; Fischle et al., 2003; Hall et al., 2002; Sedighi and Sengupta, 2007). In particular, writers, erasers and readers enable information exchange between different genomic regions and the propagation of epigenetic information over cell generations. The inducible chromatin state, from both the chemical and the structural perspective, thereby serves as a large, flexible and resettable (Reik, 2007; Morgan et al., 2005) memory device. As emphasized by the work presented in Chapter 4, information is encoded in a non-redundant manner even down to the level of different modification states of individual histone residues. Hence, postulated storage capacities of 70 bits per nucleosome in S. cerevisiae and around 200 bits in human (Prohaska et al., 2010), which do not even consider different states and context dependencies of modifications, seem conservative. The ‘hybrid’ processing layer of epigenetics, which unites aspects of DNA sequence information, chromatin structure and RNA as well as protein activity, promises a wide range of computationally exploitable opportunities. However, much remains to be fundamentally elucidated from the biological side before we can start thinking of using chromatin for biocomputing.
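The argument that the postulated capacities are conservative can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that a capacity of 70 bits corresponds to 70 independent sites that are each either modified or unmodified; this reading, and the number of multi-state sites used in the second call, are assumptions made here and not figures taken from Prohaska et al. (2010). Under that assumption, every site that can take more than two states adds capacity.

```python
import math


def capacity_bits(states_per_site):
    """Upper bound on storage capacity in bits for independent sites.

    states_per_site lists, for each modifiable site, how many
    distinguishable states it can adopt. Independence of sites is an
    idealization; real modifications are context dependent.
    """
    return sum(math.log2(n_states) for n_states in states_per_site)


# Assumed reading of the yeast figure: 70 binary (on/off) sites.
binary_only = capacity_bits([2] * 70)

# Hypothetical refinement: 10 of these sites are lysines with four
# states (unmodified, mono-, di- and trimethylated), 2 bits each.
with_methyl_states = capacity_bits([2] * 60 + [4] * 10)
```

With these illustrative numbers the bound rises from 70 to 80 bits, which shows how distinguishing modification states, such as the H3K79 methylation states analyzed in Chapter 4, increases the information a nucleosome could carry in principle.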
One of the central points research will have to address is the question of the ‘quality’ of connectivity between chromatin and other cellular layers. Up to this point, it has been a challenge to derive general principles, regarding the relations among chromatin modifications and between chromatin modifications and gene activity, that allow actual causal links to be distinguished from mere associations (Lee et al., 2010). The exceptions that seem to exist for every derived rule may, however, be rooted at least partially in technical limitations.

As exemplified by the results presented in Chapter 4, the data suggest that multiple histone modifications (such as H2BK123ub and H3K4me3 or H3K79me3) co-occur on the same nucleosome. Yet neither ChIP-on-chip nor ChIP-seq profiling, which represent averages over entire and usually asynchronous cell populations, yields the resolution to further analyze and address such questions at the single-cell or single-nucleosome level. Carrier ChIP (CChIP) and MicroChIP currently represent the best approaches of this family, but still require on the order of 10^2–10^3 cells to generate a chromatin modification profile (O’Neill et al., 2006; Dahl and Collas, 2009), and genome-wide single-cell maps of modifications are technically still not feasible as of the writing of this dissertation. Even if we could already study chromatin at the single-cell level, the question of whether certain histone marks, such as methylation of different H3 residues as presented in Chapter 4, coexist on the same histone could still not be answered. Single-molecule experiments are necessary to determine the co-occurrence of modifications at high resolution. Recent work based on highly sensitive mass spectrometry has made progress in that direction (Young et al., 2010; Taverna et al., 2007).
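Why population averages cannot settle the question of co-occurrence can be shown with a toy example. The sketch below represents one locus in four cells as the set of marks present on the nucleosome in each cell; the two populations are invented for illustration and do not correspond to measured data. Both yield the same averaged profile, although the marks always coincide in one population and never in the other.

```python
def population_profile(cells):
    """Fraction of cells carrying each mark at the locus, which is the
    kind of quantity a population-averaged ChIP signal reflects."""
    marks = sorted({mark for cell in cells for mark in cell})
    return {mark: sum(mark in cell for cell in cells) / len(cells) for mark in marks}


def cooccurrence(cells, mark_a, mark_b):
    """Fraction of cells in which both marks sit on the same nucleosome.
    This needs per-cell (per-nucleosome) information."""
    return sum(mark_a in cell and mark_b in cell for cell in cells) / len(cells)


# Hypothetical population A: the two marks always occur together.
population_a = [{"H2BK123ub", "H3K4me3"}, {"H2BK123ub", "H3K4me3"}, set(), set()]
# Hypothetical population B: the two marks never occur together.
population_b = [{"H2BK123ub"}, {"H2BK123ub"}, {"H3K4me3"}, {"H3K4me3"}]

same_average = population_profile(population_a) == population_profile(population_b)
together_a = cooccurrence(population_a, "H2BK123ub", "H3K4me3")
together_b = cooccurrence(population_b, "H2BK123ub", "H3K4me3")
```

Here `same_average` is true while `together_a` and `together_b` differ (0.5 versus 0.0), so any inference about co-occurrence drawn from averaged profiles alone rests on additional assumptions.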
Yet all of the mentioned technologies capture only snapshots of the chromatin structure and fall short of addressing the dynamics and, thus, the sequential dependencies between chromatin modifications. While synchronizing or arresting cells, as done to derive the data analyzed in Section 4.3, might work for certain questions, techniques like CATCH-IT (covalent attachment of tags to capture histones and identify turnover) ultimately offer greater versatility for measuring the dynamic properties of histones (Deal et al., 2010). The understanding gained at the ‘micro’ level of structural and dynamic properties of chromatin will have to go hand in hand with the unraveling of the spatial organization of this computational layer at the ‘macro’ level within the nucleus (Misteli, 2007). Advancements in live-cell imaging represent promising steps towards determining these conformational layouts (Zhao et al., 2006; Simonis et al., 2006; Dekker et al., 2002; Vassetzky et al., 2009). A more comprehensive appreciation of the spatial and temporal complexity of chromatin will also help to relate back to the processes located on its connecting layers, foremost DNA and the activity of genetic elements. The ease with which genetic manipulations can be performed in model organisms, in combination with suitable environmental conditions, will be crucial to dissect (context-dependent) causalities in these complex problems in cell biology. Hence, chromatin modification profiling complemented by transcriptome mapping under various conditions will greatly augment the ability to link observations and functional outcomes with respect to gene activity and non-coding RNA regulation. The examples presented in Section 4.5 demonstrate the rich opportunities for scientific discovery that arise from this synergy.
To take full advantage of these opportunities, the analysis of multi-dimensional data will be increasingly important and offer plenty of room for inter-laboratory and interdisciplinary collaborations. Suitable analysis platforms supporting such endeavours will be crucial along that path, and the Internet will play a key role as the web-based fusion of tools and data repositories steadily continues. This trend is best exemplified by the recent integration of one of the most advanced algorithms for genome-scale assembly of sequencing data (Ng et al., 2010) into the Galaxy environment. I too chose Galaxy to demonstrate in Chapter 3 how locally running analysis tools can be easily migrated and made publicly available to broader audiences. Besides the development of smart new algorithms to manage the data volume generated through projects such as the Roadmap Epigenomics project (Bernstein et al., 2010) and the large-scale integration of such resources into frameworks, simply increasing the exposure of computational people to biological and biomedical research, and vice versa, may be a relatively easy step to tackle the ‘informatics crisis’ (Goecks et al., 2010) in the ’omics era. Personally, I have greatly enjoyed this exposure over the past years while working on the projects presented herein, and I feel I have benefited from the opportunities to straddle domains and explore the facets of information and life from both the biocomputing and the bioinformatics perspective. I hope the results I was able to derive will contribute to the advancement in understanding information processing in living systems and inspire novel designs for biocomputing circuits.

Bibliography

Adleman, L. M. (1994). Molecular computation of solutions to combinatorial problems. Science (New York, NY) 266, 1021–1024. → pages 5, 6
Agrawal, N., Dasaradhi, P. V. N., Mohmmed, A., Malhotra, P., Bhatnagar, R. K. and Mukherjee, S. K. (2003). RNA interference: biology, mechanism, and applications.
Microbiology and molecular biology reviews : MMBR 67, 657–685. → pages 17, 19
Agresti, A. (1992). A Survey of Exact Inference for Contingency Tables. Statistical Science 7, 131–153. → pages 61
Ahn, S.-H., Cheung, W. L., Hsu, J.-Y., Diaz, R. L., Smith, M. M. and Allis, C. D. (2005). Sterile 20 kinase phosphorylates histone H2B at serine 10 during hydrogen peroxide-induced apoptosis in S. cerevisiae. Cell 120, 25–36. → pages 14
Albert, I., Mavrich, T. N., Tomsho, L. P., Qi, J., Zanton, S. J., Schuster, S. C. and Pugh, B. F. (2007). Translational and rotational settings of H2A.Z nucleosomes across the Saccharomyces cerevisiae genome. Nature 446, 572–576. → pages 106
Alberts, B., Johnson, A., Lewis, J., Raff, M., Roberts, K. and Walter, P. (2007). Molecular Biology of the Cell. 5 edition, Garland Science. → pages 1, 27
Alder, M. N., Dames, S., Gaudet, J. and Mango, S. E. (2003). Gene silencing in Caenorhabditis elegans by transitive RNA interference. RNA 9, 25–32. → pages 29
Allis, C. D. (2007). Epigenetics. CSHL Press. → pages 2, 14
An, C.-I., Trinh, V. B. and Yokobayashi, Y. (2006). Artificial control of gene expression in mammalian cells by modulating RNA interference through aptamer-small molecule interaction. RNA (New York, NY) 12, 710–716. → pages 9, 19
Andrews, B. and Herskowitz, I. (1989). The yeast SWI4 protein contains a motif present in developmental regulators and is part of a complex involved in cell-cycle-dependent transcription. Nature 342, 830–833. → pages 75
Aoki, K., Moriguchi, H., Yoshioka, T., Okawa, K. and Tabara, H. (2007). In vitro analyses of the production and activity of secondary small interfering RNAs in C. elegans. The EMBO journal 26, 5007–5019. → pages 47
Aparicio, O., Geisberg, J. V. and Struhl, K. (2004). Chromatin immunoprecipitation for determining the association of proteins with specific genomic sequences in vivo. Current protocols in cell biology Chapter 17, Unit 17.7. → pages 50
Arents, G., Burlingame, R. W., Wang, B. C., Love, W. E. and Moudrianakis, E. N. (1991). The nucleosomal core histone octamer at 3.1 Å resolution: a tripartite protein assembly and a left-handed superhelix. Proceedings of the National Academy of Sciences of the United States of America 88, 10148–10152. → pages 10
Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., Davis, A. P., Dolinski, K., Dwight, S. S., Eppig, J. T., Harris, M. A., Hill, D. P., Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese, J. C., Richardson, J. E., Ringwald, M., Rubin, G. M. and Sherlock, G. (2000). Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature genetics 25, 25–29. → pages 65
Ausio, J. and van Holde, K. E. (1986). Histone hyperacetylation: its effects on nucleosome conformation and stability. Biochemistry 25, 1421–1428. → pages 14
Babiskin, A. H. and Smolke, C. D. (2011). Engineering ligand-responsive RNA controllers in yeast through the assembly of RNase III tuning modules. Nucleic Acids Research, in press. → pages 8
Babloyantz, A. and Nicolis, G. (1972). Chemical instabilities and multiple steady state transitions in Monod-Jacob type models. Journal of theoretical biology 34, 185–192. → pages 6, 27
Badis, G., Chan, E. T., van Bakel, H., Pena-Castillo, L., Tillo, D., Tsui, K., Carlson, C. D., Gossett, A. J., Hasinoff, M. J., Warren, C. L., Gebbia, M., Talukder, S., Yang, A., Mnaimneh, S., Terterov, D., Coburn, D., Li Yeo, A., Yeo, Z. X., Clarke, N. D., Lieb, J. D., Ansari, A. Z., Nislow, C. and Hughes, T. R. (2008). A library of yeast transcription factor motifs reveals a widespread function for Rsc3 in targeting nucleosome exclusion at promoters. Molecular cell 32, 878–887. → pages 63
Bai, J. and Perron, P. (1998). Estimating and Testing Linear Models with Multiple Structural Changes. Econometrica 66, 47–78. → pages 106
Bajic, V. B. and Tan, T. W. (2005). Information processing and living systems. Imperial College Pr. → pages 124, 125
Baker, P. G., Brass, A., Bechhofer, S., Goble, C., Paton, N. and Stevens, R. (1998). TAMBIS–Transparent Access to Multiple Bioinformatics Information Sources. International Conference on Intelligent Systems for Molecular Biology 6, 25–34. → pages 67
Bar-Ziv, R., Tlusty, T. and Libchaber, A. (2002). Protein-DNA computation by stochastic assembly cascade. Proceedings of the National Academy of Sciences of the United States of America 99, 11589–11592. → pages 8
Baron, R., Lioubashevski, O., Katz, E., Niazov, T. and Willner, I. (2006a). Logic gates and elementary computing by enzymes. The journal of physical chemistry. A 110, 8548–8553. → pages 8
Baron, R., Lioubashevski, O., Katz, E., Niazov, T. and Willner, I. (2006b). Elementary arithmetic operations by enzymes: a model for metabolic pathway based computing. Angewandte Chemie (International ed in English) 45, 1572–1576. → pages 8
Baulcombe, D. C. (2007). Molecular biology. Amplified silencing. Science (New York, NY) 315, 199–200. → pages 29
Bayer, T. S. and Smolke, C. D. (2005). Programmable ligand-controlled riboregulators of eukaryotic gene expression. Nature biotechnology 23, 337–343. → pages 8
Beisel, C. L., Bayer, T. S., Hoff, K. G. and Smolke, C. D. (2008). Model-guided design of ligand-regulated RNAi for programmable control of gene expression. Molecular systems biology 4, 224. → pages 9, 19, 125, 126
Beisel, C. L. and Smolke, C. D. (2009). Design principles for riboswitch function. PLoS computational biology 5, e1000363. → pages 8, 31
Benecke, A. and Group, S. E. (2006). Chromatin code, local non-equilibrium dynamics, and the emergence of transcription regulatory programs. The European physical journal. E, Soft matter 19, 353–366. → pages 127
Benenson, Y. (2009a). RNA-based computation in live cells. Current opinion in biotechnology 20, 471–478. → pages 8, 9, 31, 46, 122
Benenson, Y. (2009b). Biocomputers: from test tubes to live cells. Molecular bioSystems 5, 675–685. → pages 3, 6, 7, 8, 29, 124, 125
Benenson, Y., Gil, B., Ben-Dor, U., Adar, R. and Shapiro, E. (2004). An autonomous molecular computer for logical control of gene expression. Nature 429, 423–429. → pages 7
Berger, S. L. (2007). The complex language of chromatin regulation during transcription. Nature 447, 407–412. → pages 14
Bernstein, B. E., Humphrey, E. L., Erlich, R. L., Schneider, R., Bouman, P., Liu, J. S., Kouzarides, T. and Schreiber, S. L. (2002). Methylation of histone H3 Lys 4 in coding regions of active genes. Proceedings of the National Academy of Sciences of the United States of America 99, 8695–8700. → pages 74
Bernstein, B. E., Stamatoyannopoulos, J. A., Costello, J. F., Ren, B., Milosavljevic, A., Meissner, A., Kellis, M., Marra, M. A., Beaudet, A. L., Ecker, J. R., Farnham, P. J., Hirst, M., Lander, E. S., Mikkelsen, T. S. and Thomson, J. A. (2010). The NIH Roadmap Epigenomics Mapping Consortium. Nature biotechnology 28, 1045–1048. → pages 129
Bernstein, E. and Vazirani, U. (1997). Quantum Complexity Theory. SIAM Journal on Computing 26, 1411–1473. → pages 4
Bertone, P., Gerstein, M. and Snyder, M. (2005). Applications of DNA tiling arrays to experimental genome annotation and regulatory pathway discovery. Chromosome research : an international journal on the molecular, supramolecular and evolutionary aspects of chromosome biology 13, 259–274. → pages 54
Bertone, P., Stolc, V., Royce, T. E., Rozowsky, J. S., Urban, A. E., Zhu, X., Rinn, J. L., Tongprasit, W., Samanta, M., Weissman, S., Gerstein, M. and Snyder, M. (2004). Global identification of human transcribed sequences with genome tiling arrays. Science (New York, NY) 306, 2242–2246. → pages 57
Bertone, P., Trifonov, V., Rozowsky, J. S., Schubert, F., Emanuelsson, O., Karro, J., Kao, M.-Y., Snyder, M. and Gerstein, M. (2006). Design optimization methods for genomic DNA tiling arrays. Genome research 16, 271–281. → pages 53
Bhaumik, S. R., Smith, E. and Shilatifard, A. (2007). Covalent modifications of histones during development and disease pathogenesis. Nature structural & molecular biology 14, 1008–1016. → pages 13, 73
Bird, A. (2007). Perceptions of epigenetics. Nature 447, 396–398. → pages 11, 28
Bitko, V. and Barik, S. (2001). Phenotypic silencing of cytoplasmic genes using sequence-specific double-stranded short interfering RNA and its application in the reverse genetics of wild type negative-strand RNA viruses. BMC microbiology 1, 34. → pages 19, 29
Blossey, R. and Schiessel, H. (2008). Kinetic proofreading of gene activation by chromatin remodeling. HFSP journal 2, 167–170. → pages 78
Bohmert, K., Camus, I., Bellini, C., Bouchez, D., Caboche, M. and Benning, C. (1998). AGO1 defines a novel locus of Arabidopsis controlling leaf development. The EMBO journal 17, 170–180. → pages 28
Bolstad, B. M., Irizarry, R. A., Astrand, M. and Speed, T. P. (2003). A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics (Oxford, England) 19, 185–193. → pages 55, 57
Bonasio, R., Tu, S. and Reinberg, D. (2010). Molecular signals of epigenetic states. Science (New York, NY) 330, 612–616. → pages 12, 28
Bonn, S. and Furlong, E. E. M. (2008). cis-Regulatory networks during development: a view of Drosophila. Current opinion in genetics & development 18, 513–520. → pages 2, 9, 121
Boutros, M., Kiger, A. A., Armknecht, S., Kerr, K., Hild, M., Koch, B., Haas, S. A., Paro, R., Perrimon, N. and Consortium, H. F. A. (2004). Genome-wide RNAi analysis of growth and viability in Drosophila cells. Science (New York, NY) 303, 832–835. → pages 19, 29
Braich, R. S., Chelyapov, N., Johnson, C., Rothemund, P. W. K. and Adleman, L. (2002). Solution of a 20-variable 3-SAT problem on a DNA computer. Science (New York, NY) 296, 499–502. → pages 4
Breeden, L. (1996). Start-specific transcription in yeast. Curr Top Microbiol Immunol 208, 95–127. → pages 75
Briggs, S. D., Bryk, M., Strahl, B. D., Cheung, W. L., Davie, J. K., Dent, S. Y., Winston, F. and Allis, C. D. (2001). Histone H3 lysine 4 methylation is mediated by Set1 and required for cell growth and rDNA silencing in Saccharomyces cerevisiae. Genes & development 15, 3286–3295. → pages 74
Briggs, S. D., Xiao, T., Sun, Z.-W., Caldwell, J. A., Shabanowitz, J., Hunt, D. F., Allis, C. D. and Strahl, B. D. (2002). Gene silencing: trans-histone regulatory pathway in chromatin. Nature 418, 498. → pages 76, 114
Brockdorff, N., Ashworth, A., Kay, G. F., McCabe, V. M., Norris, D. P., Cooper, P. J., Swift, S. and Rastan, S. (1992). The product of the mouse Xist gene is a 15 kb inactive X-specific transcript containing no conserved ORF and located in the nucleus. Cell 71, 515–526. → pages 21
Brown, C. J., Hendrich, B. D., Rupert, J. L., Lafrenière, R. G., Xing, Y., Lawrence, J. and Willard, H. F. (1992). The human XIST gene: analysis of a 17 kb inactive X-specific RNA that contains conserved repeats and is highly localized within the nucleus. Cell 71, 527–542. → pages 21
Brownell, J. E., Zhou, J., Ranalli, T., Kobayashi, R., Edmondson, D. G., Roth, S. Y. and Allis, C. D. (1996). Tetrahymena histone acetyltransferase A: a homolog to yeast Gcn5p linking histone acetylation to gene activation. Cell 84, 843–851. → pages 13
Bryant, G. O., Prabhu, V., Floer, M., Wang, X., Spagna, D., Schreiber, D. and Ptashne, M. (2008). Activator control of nucleosome occupancy in activation and repression of transcription. PLoS biology 6, 2928–2939. → pages 12
Cairns, B. R. (2007). Chromatin remodeling: insights and intrigue from single-molecule studies. Nature structural & molecular biology 14, 989–996. → pages 13
Canton, B., Labno, A. and Endy, D. (2008). Refinement and standardization of synthetic biological parts and devices. Nature biotechnology 26, 787–793. → pages 7
Carrozza, M. J., Li, B., Florens, L., Suganuma, T., Swanson, S. K., Lee, K. K., Shia, W.-J., Anderson, S., Yates, J., Washburn, M. P. and Workman, J. L.
(2005). Histone H3 methylation by Set2 directs deacetylation of coding regions by Rpd3S to suppress spurious intragenic transcription. Cell 123, 581–592. → pages 78, 105, 118, 119 135  Cawley, S., Bekiranov, S., Ng, H. H., Kapranov, P., Sekinger, E. A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J., Williams, A. J., Wheeler, R., Wong, B., Drenkow, J., Yamanaka, M., Patel, S., Brubaker, S., Tammana, H., Helt, G., Struhl, K. and Gingeras, T. R. (2004). Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 116, 499–509. → pages 54 Cherry, J. M., Adler, C., Ball, C., Chervitz, S. A., Dwight, S. S., Hester, E. T., Jia, Y., Juvik, G., Roe, T., Schroeder, M., Weng, S. and Botstein, D. (1998). SGD: Saccharomyces Genome Database. Nucleic acids research 26, 73–79. → pages 61 Cheung, V., Chua, G., Batada, N. N., Landry, C. R., Michnick, S. W., Hughes, T. R. and Winston, F. (2008). Chromatin- and transcription-related factors repress transcription from within coding regions throughout the Saccharomyces cerevisiae genome. PLoS biology 6, e277. → pages 78, 105, 118 Choe, S. E., Boutros, M., Michelson, A. M., Church, G. M. and Halfon, M. S. (2005). Preferred analysis methods for Affymetrix GeneChips revealed by a wholly defined control dataset. Genome biology 6, R16. → pages 55 Clapier, C. R. and Cairns, B. R. (2009). The biology of chromatin remodeling complexes. Annual review of biochemistry 78, 273–304. → pages 13 Claverie, J.-M. (2005). Fewer genes, more noncoding RNA. Science (New York, NY) 309, 1529–1530. → pages 54 Clemson, C. M., McNeil, J. A., Willard, H. F. and Lawrence, J. B. (1996). XIST RNA paints the inactive X chromosome at interphase: evidence for a novel RNA involved in nuclear/chromosome structure. The Journal of cell biology 132, 259–275. → pages 21 Cokus, S. J., Feng, S., Zhang, X., Chen, Z., Merriman, B., Haudenschild, C. D., Pradhan, S., Nelson, S. 
F., Pellegrini, M. and Jacobsen, S. E. (2008). Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature 452, 215–219. → pages 21 Collas, P. (2010). The current state of chromatin immunoprecipitation. Molecular biotechnology 45, 87–100. → pages 49 Condon, A. (2006). Designed DNA molecules: principles and applications of molecular nanotechnology. Nature reviews Genetics 7, 565–575. → pages 6  136  Condon, A., Harel, D. and Kok, J. N. (2009). Algorithmic Bioprocesses. Springer-Verlag New York Inc. → pages 121 Conrad, M. (1983). Microscopic-macroscopic interface in biological information processing. Bio Systems 16, 345–363. → pages 124 Cooper, G. M. and Hausman, R. E. (2009). The Cell: A Molecular Approach, Fifth Edition. 5th edition edition, Sinauer Associates Inc. → pages 1, 27 Cox, D. N., Chao, A., Baker, J., Chang, L., Qiao, D. and Lin, H. (1998). A novel class of evolutionarily conserved genes defined by piwi are essential for stem cell self-renewal. Genes & development 12, 3715–3727. → pages 28 Crick, F. (1970). Central dogma of molecular biology. Nature 227, 561–563. → pages 8 Crick, F. H., Barnett, L., Brenner, S. and Watts-Tobin, R. J. (1961). General nature of the genetic code for proteins. Nature 192, 1227–1232. → pages 6 Culler, S. J., Hoff, K. G. and Smolke, C. D. (2010). Reprogramming cellular behavior with RNA controllers responsive to endogenous proteins. Science (New York, NY) 330, 1251–1255. → pages 19, 29, 31 Dahl, J. A. and Collas, P. (2009). MicroChIP: chromatin immunoprecipitation for small cell numbers. Methods in molecular biology (Clifton, NJ) 567, 59–74. → pages 128 D’Andrea, A. D. (2010). Susceptibility pathways in Fanconi’s anemia and breast cancer. The New England journal of medicine 362, 1909–1919. → pages 118 Daniel, J. A., Torok, M. S., Sun, Z.-W., Schieltz, D., Allis, C. D., Yates, J. R. and Grant, P. A. (2004). 
Deubiquitination of histone H2B by a yeast acetyltransferase complex regulates transcription. The Journal of biological chemistry 279, 1867–1871. → pages 77, 104, 117 David, L., Huber, W., Granovskaia, M., Toedling, J., Palm, C. J., Bofkin, L., Jones, T., Davis, R. W. and Steinmetz, L. M. (2006). A high-resolution map of transcription in the yeast genome. Proceedings of the National Academy of Sciences of the United States of America 103, 5320–5325. → pages 54, 109 Davidson, E. A. and Ellington, A. D. (2007). Synthetic RNA circuits. Nature chemical biology 3, 23–28. → pages 9, 31  137  Davidson, S. B., Overton, C. and Buneman, P. (1995). Challenges in integrating biological data sources. Journal of computational biology : a journal of computational molecular cell biology 2, 557–572. → pages 67 Deal, R. B., Henikoff, J. G. and Henikoff, S. (2010). Genome-wide kinetics of nucleosome turnover determined by metabolic labeling of histones. Science (New York, NY) 328, 1161–1164. → pages 128 Deans, T. L., Cantor, C. R. and Collins, J. J. (2007). A tunable genetic switch based on RNAi and repressor proteins for regulating gene expression in mammalian cells. Cell 130, 363–372. → pages 125 Dekker, J., Rippe, K., Dekker, M. and Kleckner, N. (2002). Capturing chromosome conformation. Science (New York, NY) 295, 1306–1311. → pages 128 Ding, S.-W. and Voinnet, O. (2007). Antiviral immunity directed by small RNAs. Cell 130, 413–426. → pages 17 DiVincenzo, D. P. (1995). Quantum Computation. Science (New York, NY) 270, 255–261. → pages 4 Douglas, S. M., Dietz, H., Liedl, T., H¨ogberg, B., Graf, F. and Shih, W. M. (2009). Self-assembly of DNA into nanoscale three-dimensional shapes. Nature 459, 414–418. → pages 6 Dover, J., Schneider, J., Tawiah-Boateng, M., Wood, A., Dean, K., Johnston, M. and Shilatifard, A. (2002). Methylation of histone H3 by COMPASS requires ubiquitination of histone H2B by Rad6. The Journal of biological chemistry 277, 28368–28371. 
→ pages 76, 114 Droit, A., Cheung, C. and Gottardo, R. (2010). rMAT–an R/Bioconductor package for analyzing ChIP-chip experiments. Bioinformatics (Oxford, England) 26, 678–679. → pages 55, 56, 79 Dulac, C. (2010). Brain function and chromatin plasticity. Nature 465, 728–735. → pages 12 Dunny, G. M. and Winans, S. C. (1999). Cell-cell signaling in bacteria. Amer Society for Microbiology. → pages 6, 27 Dynan, W. S. (1989). Modularity in promoters and enhancers. Cell 58, 1–4. → pages 27  138  Ecker, J. R. and Davis, R. W. (1986). Inhibition of gene expression in plant cells by expression of antisense RNA. Proceedings of the National Academy of Sciences of the United States of America 83, 5372–5376. → pages 17 Ellington, A. D. and Szostak, J. W. (1990). In vitro selection of RNA molecules that bind specific ligands. Nature 346, 818–822. → pages 9 Elowitz, M. B. and Leibler, S. (2000). A synthetic oscillatory network of transcriptional regulators. Nature 403, 335–338. → pages 7 Emre, N. C. T., Ingvarsdottir, K., Wyce, A., Wood, A., Krogan, N. J., Henry, K. W., Li, K., Marmorstein, R., Greenblatt, J. F., Shilatifard, A. and Berger, S. L. (2005). Maintenance of low histone ubiquitylation by Ubp10 correlates with telomere-proximal Sir2 association and gene silencing. Molecular cell 17, 585–594. → pages 77, 100, 116, 117 Ezziane, Z. (2006). DNA computing: applications and challenges. Nanotechnology 17, R27. → pages 7 Faulhammer, D., Cukras, A. R., Lipton, R. J. and Landweber, L. F. (2000). Molecular computation: RNA solutions to chess problems. Proceedings of the National Academy of Sciences of the United States of America 97, 1385–1389. → pages 8 Feinberg, A. P. (2010). Genome-scale approaches to the epigenetics of common human disease. Virchows Archiv : an international journal of pathology 456, 13–21. → pages 15 Feynman, R. (1959). Plenty of room at the bottom. In Presentation to American Physical Society. → pages 5 Filion, G. J., van Bemmel, J. 
G., Braunschweig, U., Talhout, W., Kind, J., Ward, L. D., Brugman, W., de Castro, I. J., Kerkhoven, R. M., Bussemaker, H. J. and van Steensel, B. (2010). Systematic protein location mapping reveals five principal chromatin types in Drosophila cells. Cell 143, 212–224. → pages 12 Fire, A., Xu, S., Montgomery, M. K., Kostas, S. A., Driver, S. E. and Mello, C. C. (1998). Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans. Nature 391, 806–811. → pages 9, 16, 17 Fischle, W., Wang, Y. and Allis, C. D. (2003). Binary switches and modification cassettes in histone biology and beyond. Nature 425, 475–479. → pages 127  139  Fisher, R. A. (1922). On the interpretation of from contingency tables, and the calculation of P.”Journal of the Royal Statistical Society. Technical report. → pages 61 Flemming, W. (1882). Zellsubstanz, Kern und Zelltheilung. F C W Vogel, Leipzig. → pages 12 Fraser, P. (2006). Transcriptional control thrown for a loop. Current opinion in genetics & development 16, 490–495. → pages 10 Frederiks, F., Tzouros, M., Oudgenoeg, G., van Welsem, T., Fornerod, M., Krijgsveld, J. and van Leeuwen, F. (2008). Nonprocessive methylation by Dot1 leads to functional redundancy of histone H3K79 methylation states. Nature structural & molecular biology 15, 550–557. → pages 79, 112 Fredkin, E. and Toffoli, T. (1982). Conservative logic. International Journal of Theoretical Physics 21, 219–253. → pages 4 Fu, P. (2007). Biomolecular computing: is it ready to take off? Biotechnology journal 2, 91–101. → pages 5 Fujita, P. A., Rhead, B., Zweig, A. S., Hinrichs, A. S., Karolchik, D., Cline, M. S., Goldman, M., Barber, G. P., Clawson, H., Coelho, A., Diekhans, M., Dreszer, T. R., Giardine, B. M., Harte, R. A., Hillman-Jackson, J., Hsu, F., Kirkup, V., Kuhn, R. M., Learned, K., Li, C. H., Meyer, L. R., Pohl, A., Raney, B. J., Rosenbloom, K. R., Smith, K. E., Haussler, D. and Kent, W. J. (2011). The UCSC Genome Browser database: update 2011. 
Nucleic acids research 39, D876–82. → pages 62, 68, 69 Gao, Z., Liu, H.-L., Daxinger, L., Pontes, O., He, X., Qian, W., Lin, H., Xie, M., Lorkovic, Z. J., Zhang, S., Miki, D., Zhan, X., Pontier, D., Lagrange, T., Jin, H., Matzke, A. J. M., Matzke, M., Pikaard, C. S. and Zhu, J.-K. (2010). An RNA polymerase II- and AGO4-associated protein acts in RNA-directed DNA methylation. Nature 465, 106–109. → pages 21 Gardner, R. G., Nelson, Z. W. and Gottschling, D. E. (2005). Ubp10/Dot4p regulates the persistence of ubiquitinated histone H2B: distinct roles in telomeric silencing and general chromatin. Molecular and cellular biology 25, 6123–6139. → pages 77, 100, 104, 105, 116, 117 Gardner, T. S., Cantor, C. R. and Collins, J. J. (2000). Construction of a genetic toggle switch in Escherichia coli. Nature 403, 339–342. → pages 7  140  Gehlenborg, N., O’Donoghue, S. I., Baliga, N. S., Goesmann, A., Hibbs, M. A., Kitano, H., Kohlbacher, O., Neuweger, H., Schneider, R., Tenenbaum, D. and Gavin, A.-C. (2010). Visualization of omics data for systems biology. Nature methods 7, S56–68. → pages 62 Gelfand, B., Mead, J., Bruning, A., Apostolopoulos, N., Tadigotla, V., Nagaraj, V., Sengupta, A. M. and Vershon, A. K. (2011). Regulated antisense transcription controls expression of cell-type-specific genes in yeast. Molecular and cellular biology 31, 1701–1709. → pages 109 Gentleman, R. C., Carey, V. J., Bates, D. M., Bolstad, B., Dettling, M., Dudoit, S., Ellis, B., Gautier, L., Ge, Y., Gentry, J., Hornik, K., Hothorn, T., Huber, W., Iacus, S., Irizarry, R., Leisch, F., Li, C., Maechler, M., Rossini, A. J., Sawitzki, G., Smith, C., Smyth, G., Tierney, L., Yang, J. Y. H. and Zhang, J. (2004). Bioconductor: open software development for computational biology and bioinformatics. Genome biology 5, R80. → pages 57, 58 Gerber, M. and Shilatifard, A. (2003). Transcriptional elongation by RNA polymerase II and histone methylation. The Journal of biological chemistry 278, 26303–26306. 
→ pages 115 Goecks, J., Nekrutenko, A., Taylor, J. and Team, G. (2010). Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome biology 11, R86. → pages 68, 69, 129 Goldberg, A. D., Allis, C. D. and Bernstein, E. (2007). Epigenetics: a landscape takes shape. Cell 128, 635–638. → pages 11 Greber, D., El-Baba, M. D. and Fussenegger, M. (2008). Intronically encoded siRNAs improve dynamic range of mammalian gene regulation systems and toggle switch. Nucleic acids research 36, e101. → pages 125 Gregory, R. I., Chendrimada, T. P., Cooch, N. and Shiekhattar, R. (2005). Human RISC couples microRNA biogenesis and posttranscriptional gene silencing. Cell 123, 631–640. → pages 29 Gregory, R. I., Chendrimada, T. P. and Shiekhattar, R. (2006). MicroRNA biogenesis: isolation and characterization of the microprocessor complex. Methods in molecular biology (Clifton, NJ) 342, 33–47. → pages 17 Grewal, A., Lambert, P. and Stockton, J. (2007). Analysis of expression data: an overview. Current protocols in bioinformatics Chapter 7, Unit 7.1. → pages 55 141  Grewal, S. I. S. and Elgin, S. C. R. (2007). Transcription and RNA interference in the formation of heterochromatin. Nature 447, 399–406. → pages 20 Grewal, S. I. S. and Rice, J. C. (2004). Regulation of heterochromatin by histone methylation and small RNAs. Curr Opin Cell Biol 16, 230–238. → pages 20 Group, B. F., Baker, D., Church, G., Collins, J., Endy, D., Jacobson, J., Keasling, J., Modrich, P., Smolke, C. and Weiss, R. (2006). Engineering life: building a fab for biology. Scientific American 294, 44–51. → pages 7 Guillemette, B., Bataille, A., Gevry, N., Adam, M., Blanchette, M., Robert, F. and Gaudreau, L. (2005). Variant histone H2A.Z is globally localized to the promoters of inactive yeast genes and regulates nucleosome positioning. PLoS biology 3, e384. → pages 106 Guillemette, B. and Gaudreau, L. (2006). 
Reuniting the contrasting functions of H2A.Z. Biochemistry and cell biology = Biochimie et biologie cellulaire 84, 528–535. → pages 78 Guttman, M., Amit, I., Garber, M., French, C., Lin, M. F., Feldser, D., Huarte, M., Zuk, O., Carey, B. W., Cassady, J. P., Cabili, M. N., Jaenisch, R., Mikkelsen, T. S., Jacks, T., Hacohen, N., Bernstein, B. E., Kellis, M., Regev, A., Rinn, J. L. and Lander, E. S. (2009). Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 458, 223–227. → pages 54 Hall, I. M., Shankaranarayana, G. D., Noma, K.-I., Ayoub, N., Cohen, A. and Grewal, S. I. S. (2002). Establishment and maintenance of a heterochromatin domain. Science (New York, NY) 297, 2232–2237. → pages 20, 28, 127 Harbison, C., Gordon, D., Lee, T., Rinaldi, N., Macisaac, K., Danford, T., Hannett, N., Tagne, J., Reynolds, D., Yoo, J., Jennings, E., Zeitlinger, J., Pokholok, D., Kellis, M., Rolfe, P., Takusagawa, K., Lander, E., Gifford, D., Fraenkel, E. and Young, R. (2004). Transcriptional regulatory code of a eukaryotic genome. Nature 431, 99–104. → pages 75, 88 Hardy, S., Jacques, P.-E., G´evry, N., Forest, A., Fortin, M.-E., Laflamme, L., Gaudreau, L. and Robert, F. (2009). The euchromatic and heterochromatic landscapes are shaped by antagonizing effects of transcription on H2A.Z deposition. PLoS genetics 5, e1000687. → pages 119 Havilio, M., Levanon, E. Y., Lerman, G., Kupiec, M. and Eisenberg, E. (2005). Evidence for abundant transcription of non-coding regions in the Saccharomyces cerevisiae genome. BMC genomics 6, 93. → pages 109 142  He, Y., Vogelstein, B., Velculescu, V. E., Papadopoulos, N. and Kinzler, K. W. (2008). The antisense transcriptomes of human cells. Science (New York, NY) 322, 1855–1857. → pages 54 Henikoff, S. and Grosveld, F. (2008). Welcome to epigenetics & chromatin. Epigenetics & chromatin 1, 1. → pages 16 Henkin, T. M. (2008). Riboswitch RNAs: using RNA to sense cellular metabolism. 
Genes & development 22, 3383–3390. → pages 9 Henry, K. W., Wyce, A., Lo, W.-S., Duggan, L. J., Emre, N. C. T., Kao, C.-F., Pillus, L., Shilatifard, A., Osley, M. A. and Berger, S. L. (2003). Transcriptional activation via sequential histone H2B ubiquitylation and deubiquitylation, mediated by SAGA-associated Ubp8. Genes & development 17, 2648–2663. → pages 77, 117 Hillen, W. and Berens, C. (1994). Mechanisms underlying expression of Tn10 encoded tetracycline resistance. Annual review of microbiology 48, 345–369. → pages 6, 27 Hinrichs, W., Kisker, C., D¨uvel, M., M¨uller, A., Tovar, K., Hillen, W. and Saenger, W. (1994). Structure of the Tet repressor-tetracycline complex and regulation of antibiotic resistance. Science (New York, NY) 264, 418–420. → pages 6, 27 Hjelmfelt, A. and Ross, J. (1992). Chemical implementation and thermodynamics of collective neural networks. Proceedings of the National Academy of Sciences of the United States of America 89, 388–391. → pages 5 Hjelmfelt, A., Weinberger, E. D. and Ross, J. (1991). Chemical implementation of neural networks and Turing machines. Proceedings of the National Academy of Sciences of the United States of America 88, 10983–10987. → pages 5 Hjelmfelt, A., Weinberger, E. D. and Ross, J. (1992). Chemical implementation of finite-state machines. Proceedings of the National Academy of Sciences of the United States of America 89, 383–387. → pages 5 Hochstrasser, M. (1996). Ubiquitin-dependent protein degradation. Annual review of genetics 30, 405–439. → pages 76 Hogan, G., Lee, C. and Lieb, J. (2006). Cell cycle-specified fluctuation of nucleosome occupancy at gene promoters. PLoS genetics 2, e158. → pages 75, 86  143  Holstege, F., Jennings, E., Wyrick, J., Lee, T., Hengartner, C., Green, M., Golub, T., Lander, E. and Young, R. (1998). Dissecting the regulatory circuitry of a eukaryotic genome. Cell 95, 717–728. → pages 66, 82, 83, 93, 95 Huber, W., Toedling, J. and Steinmetz, L. M. (2006). 
Transcript mapping with high-density oligonucleotide tiling arrays. Bioinformatics (Oxford, England) 22, 1963–1970. → pages 57, 66, 105 Hunter, C. P., Winston, W. M., Molodowitch, C., Feinberg, E. H., Shih, J., Sutherlin, M., Wright, A. J. and Fitzgerald, M. C. (2006). Systemic RNAi in Caenorhabditis elegans. Cold Spring Harbor symposia on quantitative biology 71, 95–100. → pages 46 Hwang, W., Venkatasubrahmanyam, S., Ianculescu, A., Tong, A., Boone, C. and Madhani, H. (2003). A conserved RING finger protein required for histone H2B monoubiquitination and cell size control. Molecular cell 11, 261–266. → pages 76 Ingvarsdottir, K., Krogan, N. J., Emre, N. C. T., Wyce, A., Thompson, N. J., Emili, A., Hughes, T. R., Greenblatt, J. F. and Berger, S. L. (2005). H2B ubiquitin protease Ubp8 and Sgf11 constitute a discrete functional module within the Saccharomyces cerevisiae SAGA complex. Molecular and cellular biology 25, 1162–1172. → pages 77, 118 Isaacs, F. J., Dwyer, D. J. and Collins, J. J. (2006). RNA synthetic biology. Nature biotechnology 24, 545–554. → pages 9 Isaacs, F. J., Dwyer, D. J., Ding, C., Pervouchine, D. D., Cantor, C. R. and Collins, J. J. (2004). Engineered riboregulators enable post-transcriptional control of gene expression. Nature biotechnology 22, 841–847. → pages 8 Istrail, S., De-Leon, S. B.-T. and Davidson, E. H. (2007). The regulatory genome and the computer. Developmental biology 310, 187–195. → pages 2, 9, 121 Iyer, V., Horak, C., Scafe, C., Botstein, D., Snyder, M. and Brown, P. (2001). Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF. Nature 409, 533–538. → pages 75, 88, 89 Jacinto, F. V., Ballestar, E. and Esteller, M. (2008). Methyl-DNA immunoprecipitation (MeDIP): hunting down the DNA methylome. BioTechniques 44, 35, 37, 39 passim. → pages 70 Jain, K. and G W Pratt, J. (1976). Optical transistor. Applied Physics Letters 28, 719–721. → pages 4 144  Jaynes, E. T. (1957a). 
Information Theory and Statistical Mechanics. The Physical Review 106, 620–630. → pages 121 Jaynes, E. T. (1957b). Information Theory and Statistical Mechanics. II. Phys. Rev. 108, 171–190. → pages 121 Jeltsch, A. and Rathert, P. (2008). Putting the pieces together: histone H2B ubiquitylation directly stimulates histone H3K79 methylation. Chembiochem : a European journal of chemical biology 9, 2193–2195. → pages 113, 115 Jenuwein, T. and Allis, C. D. (2001). Translating the histone code. Science (New York, NY) 293, 1074–1080. → pages 15 Jiang, C. and Pugh, B. F. (2009). Nucleosome positioning and gene regulation: advances through genomics. Nature reviews Genetics 10, 161–172. → pages 127 Johnson, D. S., Li, W., Gordon, D. B., Bhattacharjee, A., Curry, B., Ghosh, J., Brizuela, L., Carroll, J. S., Brown, M., Flicek, P., Koch, C. M., Dunham, I., Bieda, M., Xu, X., Farnham, P. J., Kapranov, P., Nix, D. A., Gingeras, T. R., Zhang, X., Holster, H., Jiang, N., Green, R. D., Song, J. S., McCuine, S. A., Anton, E., Nguyen, L., Trinklein, N. D., Ye, Z., Ching, K., Hawkins, D., Ren, B., Scacheri, P. C., Rozowsky, J., Karpikov, A., Euskirchen, G., Weissman, S., Gerstein, M., Snyder, M., Yang, A., Moqtaderi, Z., Hirsch, H., Shulha, H. P., Fu, Y., Weng, Z., Struhl, K., Myers, R. M., Lieb, J. D. and Liu, X. S. (2008). Systematic evaluation of variability in ChIP-chip experiments using predefined DNA targets. Genome research 18, 393–403. → pages 55 Johnson, W., Li, W., Meyer, C., Gottardo, R., Carroll, J., Brown, M. and Liu, X. (2006). Model-based analysis of tiling-arrays for ChIP-chip. Proc Natl Acad Sci U S A 103, 12457–12462. → pages 55, 56, 79 Jones, L., Ratcliff, F. and Baulcombe, D. C. (2001). RNA-directed transcriptional gene silencing in plants can be inherited independently of the RNA trigger and requires Met1 for maintenance. Current biology : CB 11, 747–757. → pages 20 Joshi, A. A. and Struhl, K. (2005). 
Eaf3 chromodomain interaction with methylated H3-K36 links histone deacetylation to Pol II elongation. Molecular cell 20, 971–978. → pages 78, 105, 118 Judy, J. T. and Ji, H. (2009). TileProbe: modeling tiling array probe effects using publicly available data. Bioinformatics (Oxford, England) 25, 2369–2375. → pages 55 145  Kahana, A. and Gottschling, D. E. (1999). DOT4 links silencing and cell growth in Saccharomyces cerevisiae. Molecular and cellular biology 19, 6608–6620. → pages 77 Kahn, S. D. (2011). On the future of genomic data. Science (New York, NY) 331, 728–729. → pages 71 Kamath, R. S. and Ahringer, J. (2003). Genome-wide RNAi screening in Caenorhabditis elegans. Methods (San Diego, Calif) 30, 313–321. → pages 19, 29 Kampa, D., Cheng, J., Kapranov, P., Yamanaka, M., Brubaker, S., Cawley, S., Drenkow, J., Piccolboni, A., Bekiranov, S., Helt, G., Tammana, H. and Gingeras, T. R. (2004). Novel RNAs identified from an in-depth analysis of the transcriptome of human chromosomes 21 and 22. Genome research 14, 331–342. → pages 57 Kaplan, C. D., Laprade, L. and Winston, F. (2003). Transcription elongation factors repress transcription initiation from cryptic sites. Science (New York, NY) 301, 1096–1099. → pages 78, 105, 118 Kaplan, T., Liu, C., Erkmann, J., Holik, J., Grunstein, M., Kaufman, P., Friedman, N. and Rando, O. (2008). Cell cycle- and chaperone-mediated regulation of H3K56ac incorporation in yeast. PLoS genetics 4, e1000270. → pages 75 Kapranov, P., Cawley, S. E., Drenkow, J., Bekiranov, S., Strausberg, R. L., Fodor, S. P. A. and Gingeras, T. R. (2002). Large-scale transcriptional activity in chromosomes 21 and 22. Science (New York, NY) 296, 916–919. → pages 54 Kapranov, P., Cheng, J., Dike, S., Nix, D. A., Duttagupta, R., Willingham, A. T., Stadler, P. F., Hertel, J., Hackerm¨uller, J., Hofacker, I. L., Bell, I., Cheung, E., Drenkow, J., Dumais, E., Patel, S., Helt, G., Ganesh, M., Ghosh, S., Piccolboni, A., Sementchenko, V., Tammana, H. 
and Gingeras, T. R. (2007). RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science (New York, NY) 316, 1484–1488. → pages 54 Kapranov, P., Sementchenko, V. I. and Gingeras, T. R. (2003). Beyond expression profiling: next generation uses of high density oligonucleotide arrays. Briefings in functional genomics & proteomics 2, 47–56. → pages 53, 54 Kapranov, P., Willingham, A. T. and Gingeras, T. R. (2007). Genome-wide transcription and the implications for genomic organization. Nature reviews Genetics 8, 413–423. → pages 49, 54 146  Katayama, S., Tomaru, Y., Kasukawa, T., Waki, K., Nakanishi, M., Nakamura, M., Nishida, H., Yap, C. C., Suzuki, M., Kawai, J., Suzuki, H., Carninci, P., Hayashizaki, Y., Wells, C., Frith, M., Ravasi, T., Pang, K. C., Hallinan, J., Mattick, J., Hume, D. A., Lipovich, L., Batalov, S., Engstr¨om, P. G., Mizuno, Y., Faghihi, M. A., Sandelin, A., Chalk, A. M., Mottagui-Tabar, S., Liang, Z., Lenhard, B., Wahlestedt, C., Group, R. G. E. R., Group), G. S. G. G. N. P. C. and Consortium, F. (2005). Antisense transcription in the mammalian transcriptome. Science (New York, NY) 309, 1564–1566. → pages 54 Ketting, R. F. (2011). The many faces of RNAi. Developmental cell 20, 148–161. → pages 3 Khalil, A. M., Guttman, M., Huarte, M., Garber, M., Raj, A., Rivea Morales, D., Thomas, K., Presser, A., Bernstein, B. E., van Oudenaarden, A., Regev, A., Lander, E. S. and Rinn, J. L. (2009). Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc Natl Acad Sci U S A 106, 11667–11672. → pages 54 Kimmel, A. and Oliver, B. (2006). DNA microarrays: Array platforms and wet-bench protocols. Methods in enzymology, Elsevier/Academic Press. → pages 52 Kobor, M., Venkatasubrahmanyam, S., Meneghini, M., Gin, J., Jennings, J., Link, A., Madhani, H. and Rine, J. (2004). 
ARGONAUTE4 control of locus-specific siRNA accumulation and DNA and histone methylation. Science (New York, NY) 299, 716–719. → pages 20, 21 Zofall, M., Fischer, T., Zhang, K., Zhou, M., Cui, B., Veenstra, T. D. and Grewal, S. I. S. (2009). Histone H2A.Z cooperates with RNAi and heterochromatin factors to suppress antisense RNAs. Nature 461, 419–422. → pages 20, 78 Zuse, K. (1990). Der Computer - mein Lebenswerk (2. Aufl.). Springer. → pages 121  167  

