
UBC Theses and Dissertations


Global computational regulatory analysis of the anti-endotoxin effect of LL-37 Doho, Gregory Hyojun 2006


GLOBAL COMPUTATIONAL REGULATORY ANALYSIS OF THE ANTI-ENDOTOXIN EFFECT OF LL-37

by

GREGORY HYOJUN DOHO

B.Sc., University of British Columbia, 2000

A thesis submitted in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

in

THE FACULTY OF GRADUATE STUDIES

(MICROBIOLOGY & IMMUNOLOGY)

THE UNIVERSITY OF BRITISH COLUMBIA

APRIL 2006

© GREGORY HYOJUN DOHO, 2006

ABSTRACT

LL-37, a human cationic peptide, selectively modulates the innate immune response by interacting with the effector cells of innate immunity. It has been shown to stimulate chemokine production and protect against infection, and to suppress endotoxin/lipopolysaccharide (LPS)-mediated production of pro-inflammatory cytokines and endotoxic shock in mouse models. Microarray experiments under these conditions have indicated that the expression of numerous genes is altered in the presence of LL-37 and/or LPS, but the mechanisms underlying these transcriptional changes and the immunomodulatory effects of LL-37 are poorly understood. Therefore, a computational approach was applied to time-course microarray data comprising gene sets upregulated in monocytic cells upon treatment with LPS, LL-37, or LPS and LL-37. Sets of co-expressed genes observed in the microarray experiments were clustered by a variety of methods into related temporal gene expression patterns. The promoter regions of the genes in these clusters were then analyzed to predict potential transcription factor binding sites (TFBS), and rigorous statistics were applied to identify over-represented TFBS. This permitted the generation of hypotheses concerning the transcription factors and signaling pathways involved in the anti-endotoxin effect(s) of LL-37.
These analyses indicated that LL-37 selectively neutralized the LPS-induced expression of genes whose promoters contained the signatures of both nuclear factor (NF)-κB and transcription factors downstream of mitogen-activated protein kinase (MAPK) pathways. Other genes that remained upregulated in the presence of both LPS and LL-37 tended to lack TFBS for NF-κB but did contain MAPK-related and other TFBS. LL-37 alone tended to induce expression of genes with promoters that bound transcription factors downstream of the ERK1/2 and p38 MAPKs, which are known to be activated by LL-37, but did not induce the expression of most pro-inflammatory genes. Thus these data extend previous hypotheses that selective suppression of the NF-κB pathway is one of the anti-endotoxin mechanisms of LL-37, and indicate that residual gene expression is controlled in part by MAPK pathways.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Acknowledgements
Statement of Authorship
Chapter 1  Introduction
  1.1  Innate immunity and the inflammatory response
  1.2  Signaling pathways regulating inflammation
  1.3  The Human Cationic Peptide LL-37
  1.4  Bioinformatics approach to understanding the anti-endotoxin effect of LL-37
  1.5  Hypothesis and Experimental Goals
  1.6  Bibliography
Chapter 2  Global Computational Regulatory Analysis of the Anti-endotoxin Effect of LL-37
  2.1  Introduction
  2.2  Materials and Methods
  2.3  Results
  2.4  Discussion
  2.5  Bibliography
Chapter 3  Discussion
  3.1  Expression pattern clustering
  3.2  TFBS analysis
  3.3  Biological implications
  3.4  Future directions
  3.5  Bibliography
Supplemental Tables

List of Tables

Chapter 1  Introduction
  Table 1.1  The most commonly used methods for clustering and their properties
Chapter 2  Global Computational Regulatory Analysis of the Anti-endotoxin Effect of LL-37
  Table 2.1  Differentially expressed (DE) genes from THP-1 microarray
  Table 2.2  Mapping of known NF-κB p50/p65 target genes using different clustering methods
  Table 2.3  Over-representation of TFBS in test datasets
  Table 2.4  Over-representation of TFBS in clusters of co-expressed genes from the LPS-stimulated THP-1 dataset
  Table 2.5  Over-representation of TFBS in clusters of co-expressed genes from THP-1 treated with LL-37 in the presence of LPS
Supplemental Tables
  Table I    p50/p65 NF-κB target genes mapped to the clustering results produced by SOTA and KMC
  Table II   Over-representation of Gene Ontology Term analysis
  Table III  Over-representation of Gene Ontology Term analysis

List of Figures

Chapter 2  Global Computational Regulatory Analysis of the Anti-endotoxin Effect of LL-37
  Figure 2.1  Expression patterns of p50/p65 target genes show up-regulation by LPS
  Figure 2.2  Clustering the transcriptional effect of LPS
  Figure 2.3  LL-37 up-regulation of genes downstream of ERK1/2 and p38 pathways
  Figure 2.4  Clusters of transcriptional events reflecting the action of LL-37 in the presence of LPS

List of Abbreviations

DE         differentially expressed
ERK1/2     extracellular signal-regulated kinase
GO         gene ontology
HC         hierarchical clustering
hCAP-18    human cationic host defense protein-18
HOPACH     hierarchical ordered partitioning and collapsing hybrid
hr         hour
IL         interleukin
KMC        k-means clustering
LBP        lipopolysaccharide binding protein
LPS        lipopolysaccharide
LTA        lipoteichoic acid
MAPK       mitogen-activated protein kinase
NF-κB      nuclear factor kappa B
PSSM       position-specific scoring matrix
SOM        self-organizing map
SOTA       self-organizing tree algorithm
TFBS       transcription factor binding site
TLR        Toll-like receptor
TNF-α      tumor necrosis factor alpha

Acknowledgements

I would like to first thank my supervisor Dr.
Bob Hancock for supporting and guiding me throughout my graduate work. I have learned tremendously under his guidance and I truly appreciate the opportunities he has provided me and the patience he sometimes had to have. I would also like to thank my co-supervisor Dr. Fiona Brinkman for providing guidance and support. I especially appreciate the encouragement she has given me throughout my work. I would like to thank my committee members, Dr. Wyeth Wasserman and Dr. Michael Murphy, for very productive and helpful discussions and advice. I would like to thank everyone in the Hancock lab and the Brinkman lab. I know I have been fortunate to work with such brilliant scientists and such wonderful people. There are so many people I would like to thank. First, thanks to Fiona Roche for always getting me on track; Dawn Bowdish, Neeloffer Mookherjee and Kelly Brown for helpful suggestions, wonderful discussions and talks; Karsten Hokamp and Michael Acab for technical support; Susan Farmer for being my mom in the lab; and Bernadette Mah and Barbara Sherman for so much administrative help. I would like to thank everyone else for making my years in the lab such a wonderful and memorable experience. I truly thank you all. I would like to thank my parents Catharina and Archie Doho for supporting and encouraging me. And to my wife Ashley Doho, thank you for your emotional support and encouragement.

Statement of Authorship

Chapter 2 was submitted for publication as Doho G.H., Mookherjee N., Brown K., Roche F.M., Brinkman F.S.L., Hancock R.E.W. 2006. Global Computational Regulatory Analysis of Anti-endotoxin Effect of LL-37. Genomics. Submitted January 2006. All data were collected and analyzed by G.H. Doho, who wrote the manuscript. Fiona Brinkman hosted me in her lab and supervised me for a portion of this research. Robert Hancock hosted me in his lab, supervised me for a portion of this research, and discussed data and methods.
Neeloffer Mookherjee and Kelly Brown provided the microarray data. Fiona Roche discussed methods and results.

Chapter 1

INTRODUCTION

1.1  Innate immunity and the inflammatory response

The innate immune system is essential for host defense and is responsible for early detection and containment of pathogens (1, 2). Various cell types and tissues participate in the ensuing inflammatory response, which includes the recognition of microbes, the activation of antimicrobial defenses and the recruitment of circulating inflammatory cells (3). The nature of the invading pathogen specifies a response that provides optimal host defense, but this inflammatory response is a double-edged sword that must be tightly regulated (3). The complex interactions initiated by infection set off a wave of events that can lead to several alternative outcomes: resolution of the infection with complete restoration of tissue architecture; resolution of the infection with destruction of tissue (scarring); control of the infection with ongoing inflammation (chronic inflammation); control of the infection with initiation of new inflammatory responses (autoimmunity); or failure to control the infection (3). The regulation of the inflammatory response is extraordinarily complicated and occurs on many levels.

1.2  Signaling pathways regulating inflammation

Inflammatory responses are regulated by a vast number of signaling pathways. The complex integration and interplay among signaling molecules result in the translation of information from extracellular signals into intracellular responses. Ligand binding to the extracellular domain of Toll-like receptors (TLRs) initiates a complex signal-transduction cascade, which ultimately leads to activation of the transcription factor nuclear factor-κB (NF-κB) and increased transcription of pro-inflammatory genes, including those encoding cytokines and chemokines.
Bacterial lipopolysaccharide (LPS), a TLR4 agonist, is known to trigger innate immunity (and therefore inflammation) in part via the TLR4-to-NF-κB pathway (4). Modest stimulation is considered beneficial, but an overly vigorous response can lead to excessive amounts of pro-inflammatory cytokines such as tumor necrosis factor alpha (TNF-α), leading to complications such as endotoxaemia and septic shock (5). LPS activates NF-κB in human monocytes and monocytic cell lines by initiating signal transduction pathway(s) downstream of TLR4, promoting the degradation of the cytosolic inhibitor IκB, which releases NF-κB to translocate into the nucleus (6). NF-κB comprises five subunits (p50/p105, p52/p100, p65 (RelA), RelB and c-Rel), which combine as homo- or heterodimers to mediate transcriptional activation or, in some cases, repression; p50/p65 dimers are the best-studied transcriptional activators, while p50 homodimers tend to mediate repression (7). Studies have also demonstrated that LPS is a potent activator of mitogen-activated protein kinase (MAPK) signal transduction pathways, including those involving the extracellular signal-regulated kinase (ERK1/2) and p38 MAPK (8, 9). Inhibition of either ERK1/2 or p38 activation has been shown to reduce LPS induction of TNF-α expression (8, 9). These findings suggest that the NF-κB, ERK1/2 and p38 pathways may act cooperatively in inducing pro-inflammatory cytokines such as TNF-α. An exaggerated response to bacterial stimuli underlies a clinical condition called systemic inflammatory response syndrome (5), or sepsis, in which high levels of cytokines and inflammatory mediators become destructive, causing organ failure, cardiovascular shock and/or death.
The nature, duration and intensity of inflammatory/septic responses are considered to involve the interplay between TLRs and other receptors, different adaptor molecules such as MyD88, TIRAP/Mal and TRIF, and different signaling pathways (10, 11). An ideal therapeutic regulator of the inflammatory response would antagonize potentially lethal conditions such as septic shock while maintaining innate immune defenses against bacterial infections, thus sustaining a balance between the protective and destructive components of inflammation (12).

1.3  The Human Cationic Peptide LL-37

Cationic host defense (antimicrobial) peptides are a component of the innate immune response that can be expressed constitutively or induced in response to inflammation and/or pathogen-associated molecules such as bacterial LPS (13-15). LL-37, a 37-amino-acid peptide that is produced throughout the body and at mucosal surfaces, represents an extracellular processed form of the human cationic host defense protein-18 (hCAP-18). It has a wide variety of immunomodulatory activities that have been demonstrated both in vitro and in vivo (16). Despite its weak antimicrobial activity under physiological salt conditions (16), its exogenous application can protect animals against Staphylococcus aureus infections (14). LL-37 is chemotactic for a variety of cells of the innate and adaptive immune responses and also induces chemokine production in monocytes, epithelial cells and animals. LL-37 also traffics into host cells and is known to induce the ERK1/2 and p38 pathways in primary human monocytes (17), indicating an ability to influence intracellular signal transduction.

While the above mechanisms might be considered pro-inflammatory, LL-37 is also able to reduce or abolish pro-inflammatory cytokine production induced by LPS and LTA in monocytes and macrophages, and is protective against endotoxaemia in animal models (18).
It has been proposed that the anti-endotoxin effects of host defence peptides are partly due to blocking of LPS binding to the serum LPS-binding protein (LBP) (19) and to the cell surface receptor component CD14 (20), although LL-37 has a much lower LPS-binding affinity than CD14 or LBP. Moreover, it has been demonstrated that the anti-endotoxin effect of this peptide is based on a complex series of events and reflects in part an ability to reduce, by around 75%, the amount of NF-κB p50/p65 in the nucleus (12). Although there is ample evidence that LL-37 interacts directly with the effector cells of innate immune responses, very little is known about eukaryotic cell signaling induced by peptide interaction. Therefore, systematic regulatory network and pathway analyses are needed to understand the immunomodulatory effect of LL-37.

1.4  Bioinformatics approach to understanding the anti-endotoxin effect of LL-37

Analysis of transcriptional regulation

Transcriptional regulation is a major mechanism controlling the spatial and temporal activity of genes, thereby governing the organization of biological processes in eukaryotes (21). Stimulation of eukaryotic cells initiates complex signaling cascades that activate transcription factor binding to the promoter regions of genes. In contrast to prokaryotes, where transcriptional regulation can often be understood in terms of induction by one or two factors, eukaryotic gene regulation is largely carried out through sophisticated interactions of multiple transcription factors (21). Advances in large-scale, high-throughput genomic analysis such as DNA microarray technology have provided the means to identify genes that are transcriptionally responsive to particular conditions.
Genes that are co-regulated by the same transcription factors often exhibit similar temporal expression patterns, which may result from the activity of several transcription factors acting in a coordinated and/or sequential fashion (22-26). Clustering methods, which group genes that respond similarly over a particular set of conditions (e.g. various treatment times), permit the identification of genes with similar expression profiles. Subsequent detection of over-represented cis-regulatory motifs in the upstream regions of the clustered genes, compared to a random data set, suggests a common mechanism of transcriptional regulation. Elucidation of such key transcriptional networks is crucial to understanding the upstream molecular interactions, such as signal transduction pathways, involved in complex systems such as human innate immunity.

Clustering of expression profiles

Since co-expression of genes often reflects co-regulation, closely related gene expression profiles can be grouped together by clustering techniques to extract regulatory information. Unsupervised clustering (i.e. where no external information is used in the clustering process) can be performed with a number of techniques that produce arrangements of the data based on a distance function (27). Various unsupervised clustering algorithms are available; the most commonly used are shown in Table 1.1.

Table 1.1  The most commonly used methods for clustering and their properties

Approach      Non-hierarchical      Hierarchical                  Properties
Standard      K-Means Clustering    Hierarchical Clustering       Sensitive to noise. Slow
NN-based      SOM                   SOTA                          Robust. Fast
Properties    -                     Provides information on relationships between clusters

Data can be clustered in two different ways: hierarchically or non-hierarchically.
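The two steps outlined above (grouping genes by the similarity of their temporal profiles, then asking whether a binding site is over-represented in a group relative to a background set) can be illustrated with a minimal Python sketch. It is not the pipeline used in this thesis: the distance (1 minus the Pearson correlation), the average-linkage merge rule, the one-sided hypergeometric test (equivalent to a one-tailed Fisher's exact test), the gene names and the log2 fold-change values at 1, 2, 4 and 24 hr are all assumptions chosen for clarity. The scoring actually used by oPOSSUM is described in (31) and may differ.

```python
import math
from itertools import combinations


def pearson_distance(x, y):
    """1 - Pearson r: 0 for identically shaped profiles, 2 for mirror-image ones.
    Assumes neither profile is constant (zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def average_linkage(profiles, n_clusters):
    """Agglomerative hierarchical clustering: start with one gene per cluster and
    repeatedly merge the pair of clusters with the smallest mean pairwise distance
    until n_clusters remain (i.e. cut the tree at that level)."""
    clusters = [[gene] for gene in profiles]
    dist = {frozenset(pair): pearson_distance(profiles[pair[0]], profiles[pair[1]])
            for pair in combinations(profiles, 2)}

    def linkage(c1, c2):
        return sum(dist[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [sorted(c) for c in clusters]


def hypergeom_upper_tail(k, n, K, N):
    """One-sided P(X >= k): chance that a random set of n genes, drawn from N
    background genes of which K carry the site, contains k or more site carriers."""
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(k, min(n, K) + 1))
    return hits / math.comb(N, n)


# Hypothetical log2 fold changes at 1, 2, 4 and 24 hr (not data from this thesis).
profiles = {
    "geneA": [2.0, 3.0, 1.5, 0.2],   # early, transient induction
    "geneB": [1.8, 2.7, 1.2, 0.1],
    "geneC": [0.1, 0.4, 1.5, 3.0],   # late, sustained induction
    "geneD": [0.0, 0.5, 1.8, 2.9],
}
print(average_linkage(profiles, n_clusters=2))
# [['geneA', 'geneB'], ['geneC', 'geneD']]

# Hypothetical counts: 8 of 20 clustered genes vs 100 of 1000 background genes carry a site.
print(hypergeom_upper_tail(8, 20, 100, 1000))
```

Cutting the tree at a fixed number of clusters is one simple way to turn a hierarchical result into discrete gene groups. With a correlation distance, profiles that rise and fall together are grouped regardless of their absolute fold changes; a Euclidean distance would also weigh amplitude.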
Hierarchical clustering allows detection of higher-order relationships between clusters of profiles, whereas most non-hierarchical classification techniques allocate expression profiles to a pre-defined number of clusters, with no assumptions as to the inter-cluster relationships (27). K-means clustering (KMC) starts with a pre-defined number of clusters and, by iterative reallocation of cluster members, minimizes the overall intra-cluster dispersion (28). Hierarchical clustering produces a representation of the data in the shape of a binary tree, in which the most similar patterns are clustered in a hierarchy of nested subsets (22). Self-organizing maps (SOM) (29) and the self-organizing tree algorithm (SOTA) (30) are both neural network (NN)-based methods. As with KMC, the number of clusters for SOM is arbitrarily fixed from the beginning. SOTA is a hierarchical method in which the tree grows from the root (30). Despite the arsenal of available methods, the optimal way of classifying gene expression data by unsupervised methods is still open to debate.

Detection of over-represented transcription factor binding sites (TFBSs)

Gene clusters with similar expression profiles can be assessed for regulatory information. Examination of TFBS information for a gene cluster can provide a strong hypothesis as to the signaling pathway(s) governing the expression profile exhibited in that cluster. The oPOSSUM database provides pre-computed predictions of conserved TFBSs in human and mouse promoters from the Ensembl database (31, 32). The TFBSs are predicted by position-specific scoring matrices (PSSMs) from the JASPAR database (33). Unlike the TRANSFAC database (38), which is commercially available and has redundancies in its DNA-binding profiles, the JASPAR database is an open-access database of annotated, high-quality, matrix-based transcription factor binding site profiles for multicellular eukaryotes (33). The specificity of these TFBS
The specificity of these TFBS  predictions is largely improved by reducing the analytical search space to regions of conserved, non-coding D N A sequence using a comparative genomics approach known as phylogenetic footprinting (31, 34-37). The predictions are made in promoter regions encompassing 5000 bp upstream and 5000 bp downstream of the transcription start site, enabling detection over a large sequence space that includes some coding and intronic regions. However, even with the improved performance conferred by phylogenetic footprinting, most predicted TFBSs are non-functional (31). The capacity to discriminate functional binding sites from these potential false positive predictions should be improved by incorporating gene expression data into the analysis procedures (31).  8  1.5  Hypothesis and Experimental Goals The goal of this study was to examine the anti-endotoxin properties of LL-37 in  the context of its ability to modulate transcription. Time-course microarray data using THP-1 monocytic cell line treated with LPS, LL-37 or both LPS and LL-37 was available and I hypothesized that the analysis of this microarray data using a systematic bioinformatics approach (clustering of expression profiles followed by extraction of regulatory information) should provide a more thorough understanding of the signaling pathways responsible for the anti-endotoxin effect of LL-37. A computational approach was applied to this time course microarray data (12), by clustering genes demonstrating similar temporal expression profiles followed by subsequent detection of overrepresented transcription factor binding sites (TFBSs) in the promoters of these coexpressed genes.  9  1.6  Bibliography  1.  Fearon, D. T., and R. M . Locksley. 1996. The instructive role of innate immunity in the acquired immune response. Science 272:50.  2.  Janeway, C. A., Jr., and R. Medzhitov. 2002. Innate immune recognition. Annu Rev Immunol 20:197.  3.  Aderem, A., and K . D. Smith. 2004. 
A systems approach to dissecting immunity and inflammation. Semin Immunol 16:55.  4.  Chow, J. C , D. W. Young, D. T. Golenbock, W. J. Christ, and F. Gusovsky. 1999. Toll-like receptor-4 mediates lipopolysaccharide-induced signal transduction. J Biol Chem 274:10689.  5.  Davies, M . G., and P. O. Hagen. 1997. Systemic inflammatory response syndrome. Br J Surg 84:920.  6.  Guha, M . , and N . Mackman. 2001. LPS induction of gene expression in human monocytes. Cell Signal 13:85.  7.  Caamano, J., and C. A . Hunter. 2002. NF-kappaB family of transcriptionfactors: central regulators of innate and adaptive immune functions. Clin Microbiol Rev 15:414.  8.  Guha, M . , M . A . O'Connell, R. Pawlinski, A . Hollis, P. McGovern, S. F. Yan, D. Stern, and N . Mackman. 2001. Lipopolysaccharide activation of the M E K ERK1/2 pathway in human monocytic cells mediates tissue factor and tumor necrosis factor alpha expression by inducing Elk-1 phosphorylation and Egr-1 expression. Blood 98:1429.  10  9.  van der Bruggen, T., S. Nijenhuis, E. van Raaij, J. Verhoef, and B . S. van Asbeck. 1999. Lipopolysaccharide-induced tumor necrosis factor alpha production by human monocytes involves the r a f - l / M E K l - M E K 2 / E R K l E R K 2 pathway. Infect Immun 67:3824.  10.  Vogel, S. N . , K . A . Fitzgerald, and M . J. Fenton. 2003. TLRs: differential adapter utilization by toll-like receptors mediates TLR-specific patterns of gene expression. Mol Interv 3:466.  11.  Mukhopadhyay, S., J. Herre, G. D. Brown, and S. Gordon. 2004. The potential for Toll-like receptors to collaborate with other innate immune receptors. Immunology 112:521.  12.  Mookherjee, N . , K . Brown, D. M . Bowdish, S. Doria, R. Falsafi, K . Hokamp, F. M . Roche, R. M u , G. H . Doho, J. Pistolic, J. P. Powers, J. Bryan, F. S. L . Brinkman, and R. E. W. Hancock. 2006. Modulation of the Toll-like receptormediated inflammatory response by the endogenous human host defence peptide LL-37. J Immunol 176:2455.  13.  
Liu, L., A . A . Roberts, and T. Ganz. 2003. By IL-1 signaling, monocytederived cells dramatically enhance the epidermal antimicrobial response to lipopolysaccharide. J Immunol 170:575.  14.  Diamond, G., J. P. Russell, and C. L . Bevins. 1996. Inducible expression of an antibiotic peptide gene in lipopolysaccharide-challenged tracheal epithelial cells. Proc Natl Acad Sci USA 93:5156.  15.  Hancock, R. E. W., and G. Diamond. 2000. The role of cationic antimicrobial peptides in innate host defences. Trends Microbiol 8:402.  11  16.  Bowdish, D. M . , D. J. Davidson, Y . E. Lau, K . Lee, M . G. Scott, and R. E. W. Hancock. 2005. Impact of LL-37 on anti-infective immunity. JLeukoc Biol 77:451.  17.  Bowdish, D. M . , D. J. Davidson, D. P. Speert, and R. E. W. Hancock. 2004. The human cationic peptide LL-37 induces activation of the extracellular signal-regulated kinase and p38 kinase pathways in primary human monocytes. J Immunol 172:3 758.  18.  Bals, R., D. J. Weiner, A. D. Moscioni, R. L . Meegalla, and J. M . Wilson. 1999. Augmentation of innate host defense by expression of a cathelicidin antimicrobial peptide. Infect Immun 67:6084.  19.  Scott, M . G., A . C. Vreugdenhil, W. A . Buurman, R. E. W. Hancock, and M . R. Gold. 2000. Cutting edge: cationic antimicrobial peptides block the binding of lipopolysaccharide (LPS) to LPS binding protein. J Immunol 164:549.  20.  Nagaoka, I., S. Hirota, F. Niyonsaba, M . Hirata, Y . Adachi, H . Tamura, and D. Heumann. 2001. Cathelicidin family of antibacterial peptides C A P 18 and CAP11 inhibit the expression of TNF-alpha by blocking the binding of LPS to CD14(+) cells. J Immunol 167:3329.  21.  Bluthgen, N . , S. M . Kielbasa, and H. Herzel. 2005. Inferring combinatorial regulation of transcription in silico. Nucleic Acids Res 33:272.  22.  Eisen, M . B., P. T. Spellman, P. O. Brown, and D. Botstein. 1998. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U SA 95:14863.  12  23.  Ross, D. 
T., U . Scherf, M . B . Eisen, C. M . Perou, C. Rees, P. Spellman, V . Iyer, S. S. Jeffrey, M . Van de Rijn, M . Waltham, A. Pergamenschikov, J. C. Lee, D. Lashkari, D. Shalon, T. G. Myers, J. N . Weinstein, D. Botstein, and P. O. Brown. 2000. Systematic variation in gene expression patterns in human cancer cell lines. Nat Genet 24:227.  24.  Ueda, H . R., W. Chen, A . Adachi, H. Wakamatsu, S. Hayashi, T. Takasugi, M . Nagano, K . Nakahama, Y . Suzuki, S. Sugano, M . lino, Y . Shigeyoshi, and S. Hashimoto. 2002. A transcription factor response element for gene expression during circadian night. Nature 418:534.  25.  Wu, Q., P. Kirschmeier, T. Hockenberry, T. Y. Yang, D. L . Brassard, L. Wang, T. McClanahan, S. Black, G. Rizzi, M . L. Musco, A . Mirza, and S. Liu. 2002. Transcriptional regulation during p21WAFl/CIPl-induced apoptosis in human ovarian cancer cells. J Biol Chem 277:36329.  26.  Chiang, D. Y . , P. O. Brown, and M . B . Eisen. 2001. Visualizing associations between genome sequences and gene expression data using genome-mean expression profiles. Bioinformatics 17 Suppl 1:S49.  27.  Tamames, J., D. Clark, J. Herrero, J. Dopazo, C. Blaschke, J. M . Fernandez, J. C. Oliveros, and A . Valencia. 2002. Bioinformatics methods for the analysis of expression arrays: data clustering and information extraction. JBiotechnol 98:269.  28.  Tavazoie, S., J. D. Hughes, M . J. Campbell, R. J. Cho, and G. M . Church. 1999. Systematic determination of genetic network architecture. Nat Genet 22:281.  29.  Kohonen, T., 1990. The self-organizing map. Proc. IEEE 78:1464.  13  30.  Dopazo, J., and J. M . Carazo. 1997. Phylogenetic reconstruction using an unsupervised growing neural network that adopts the topology of a phylogenetic tree. J Mol Evol 44:226.  31.  Ho Sui, S. J., J. R. Mortimer, D. J. Arenillas, J. Brumm, C. J. Walsh, B. P. Kennedy, and W. W. Wasserman. 2005. oPOSSUM: identification of overrepresented transcription factor binding sites in co-expressed genes. 
Nucleic Acids Res 33:3154.  32.  Birney E, D. Andrews, M . Caccamo, Y. Chen, L . Clarke, G. Coates, T. Cox, F. Cunningham, V . Curwen, T. Cutts, T. Down, R. Durbin, S. M . FernandezSuarez, P. Flicek, S. Graf, M . Hammond, J. Herrero, K . Howe, V . Iyer, K . Jekosch, A . Kahari, A. Kasprzyk, D . Keefe, F. Kokocinski, E. Kulesha, D. London, I. Longden, C. Melsopp, P. Meidl, B. Overduin, A . Parker, G. Proctor, A . Prlic, M . Rae, D. Rios, S. Redmond, M . Schuster, I. Sealy, S. Searle, J. Severin, G. Slater, D. Smedley, J. Smith, A. Stabenau, J. Stalker, S. Trevanion, A. Ureta-Vidal, J. Vogel, S. White, C. Woodwark, T. J. Hubbard. 2006. Ensembl. Nucleic Acids Res 34.D556.  33.  Sandelin, A., W. Alkema, P. Engstrom, W. W. Wasserman, and B. Lenhard. 2004. JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res 32.D91.  34.  Koop, B. F., 1995. Human and rodent D N A sequence comparisons: a mosaic model of genomic evolution. Trends Genet 11:367.  14  35.  Hardison, R. C , J. Oeltjen and W. Miller. 1997. Long human-mouse sequence alignments reveal novel regulatory elements: a reason to sequence the mouse genome. Genome Res 7:399.  36.  Duret, L . and P. Bucher. 1997. Searching for regulatory elements in human noncoding sequences. Curr Opin Struct Biol 7:399.  37.  Wasserman, W. W., M . Palumbo, W. Thompson, J. W. Fickett and C. E. Lawrence. 2000. Human-mouse genome comparisons to locate regulatory sites. Nature Genet 26:225.  38.  Matys, V . , E. Fricke, R. Geffers, E. Gossling, M . Haubrock, R. Hehl, K . Hornischer, D. Karas, A . E. Kel, O. V . Kel-Margoulis, D. U . Kloos, S. Land, B . Lewicki-Potapov, H . Michael, R. Munch, I. Reuter, S. Rotert, H . Saxel, M . Scheer, S. Thiele, and E. Wingender. 2003. T R A N S F A C : transcriptional regulation, from patterns to profiles. Nucleic Acids Res 31:374.  
Chapter 2

Global Computational Regulatory Analysis of the Anti-endotoxin Effect of LL-37*

*A version of this chapter has been submitted for publication as Doho G.H., Mookherjee N., Brown K., Roche F.M., Brinkman F.S.L., Hancock R.E.W. 2006. Global Computational Regulatory Analysis of Anti-endotoxin Effect of LL-37. Genomics. Submitted January 2006. The text has been modified accordingly. All data were collected and analyzed by G.H. Doho, who wrote the manuscript. Fiona Brinkman hosted me in her lab and supervised me for a portion of this research. Robert Hancock hosted me in his lab, supervised me for a portion of this research, and discussed data and methods. Neeloffer Mookherjee and Kelly Brown provided the microarray data. Fiona Roche discussed methods and results.

2.1  Introduction

Transcriptional regulation is a major mechanism controlling the spatial and temporal activity of genes, thereby governing the organization of biological processes in eukaryotes (1). Stimulation of eukaryotic cells initiates complex signaling cascades that activate transcription factor binding to the promoter regions of genes. In contrast to prokaryotes, where transcriptional regulation can often be understood in terms of induction by one or two factors, eukaryotic gene regulation is largely carried out through sophisticated interactions of multiple transcription factors (1). Advances in large-scale, high-throughput genomic analysis such as DNA microarray technology have provided the means to identify genes that are transcriptionally responsive to particular conditions. Genes that are co-regulated by the same transcription factors often exhibit similar temporal expression patterns, which may result from the activity of several transcription factors acting in a coordinated and/or sequential fashion (2-6).
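Grouping genes whose temporal patterns are similar is the job of a clustering algorithm. One of the methods applied in this work, k-means clustering (KMC), was described in Chapter 1 as starting from a pre-defined number of clusters and iteratively reallocating members to minimize the overall intra-cluster dispersion. A minimal sketch of that idea (Lloyd's algorithm) follows. It assumes squared Euclidean distance on hypothetical log2 fold changes at 1, 2, 4 and 24 hr, and a naive deterministic initialization from the first k genes; none of these are settings reported in this thesis, and the gene names are illustrative.

```python
def squared_distance(x, y):
    """Squared Euclidean distance between two expression profiles."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def kmeans(profiles, k, max_iter=100):
    """Lloyd's algorithm: alternately assign each gene to its nearest centroid and
    move each centroid to the mean of its members, until assignments stop changing.
    Returns the gene -> cluster assignment and the total intra-cluster dispersion."""
    genes = list(profiles)
    # Naive deterministic initialization: the first k profiles seed the centroids.
    centroids = [list(profiles[g]) for g in genes[:k]]
    assignment = {}
    for _ in range(max_iter):
        new_assignment = {
            g: min(range(k), key=lambda c: squared_distance(profiles[g], centroids[c]))
            for g in genes
        }
        if new_assignment == assignment:
            break  # converged: no gene changed cluster
        assignment = new_assignment
        for c in range(k):
            members = [profiles[g] for g in genes if assignment[g] == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    dispersion = sum(squared_distance(profiles[g], centroids[assignment[g]]) for g in genes)
    return assignment, dispersion


# Hypothetical log2 fold changes at 1, 2, 4 and 24 hr (not data from this thesis).
profiles = {
    "geneA": [2.0, 3.0, 1.5, 0.2],
    "geneC": [0.1, 0.4, 1.5, 3.0],
    "geneB": [1.8, 2.7, 1.2, 0.1],
    "geneD": [0.0, 0.5, 1.8, 2.9],
}
assignment, dispersion = kmeans(profiles, k=2)
print(assignment)
# {'geneA': 0, 'geneC': 1, 'geneB': 0, 'geneD': 1}
print(round(dispersion, 3))
# 0.175
```

Because the outcome of k-means can depend on the starting centroids, practical implementations usually run several random initializations and keep the solution with the lowest dispersion. Unlike hierarchical clustering, the result is a flat partition with no tree of inter-cluster relationships, consistent with the description in Chapter 1.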
Clustering methods, which group genes that respond similarly over a particular set of conditions (e.g. various treatment times), permit the identification of genes with similar expression profiles. Subsequent detection of over-represented cis-regulatory motifs in the upstream regions of the clustered genes, compared to a random data set, suggests a common mechanism of transcriptional regulation. Elucidation of such key transcriptional networks is crucial to understanding the upstream molecular interactions, such as signal transduction pathways, involved in complex systems such as human innate immunity. Here I report a global bioinformatic analysis of transcriptional events occurring in the human monocytic cell line THP-1, especially in the presence of the human cationic peptide LL-37 and the bacterial signature molecule lipopolysaccharide (LPS; also termed endotoxin).

Ligand binding to the extracellular domain of Toll-like receptors (TLRs) initiates a complex signal-transduction cascade, which ultimately leads to activation of the transcription factor nuclear factor-κB (NF-κB) and increased transcription of pro-inflammatory genes, including those encoding cytokines and chemokines. Bacterial LPS, a TLR4 agonist, is known to trigger innate immunity (and therefore inflammation) via the TLR4-to-NF-κB pathway (7). Modest stimulation is considered beneficial, but an overly vigorous response can lead to excessive amounts of pro-inflammatory cytokines such as TNF-α, leading to complications such as endotoxaemia and septic shock (8). LPS activates NF-κB in human monocytes and monocytic cell lines by initiating signal transduction pathway(s) downstream of TLR4, promoting the degradation of the cytosolic inhibitor IκB, which releases NF-κB to translocate into the nucleus (9).
NF-κB comprises five subunits (p50/p105, p52/p100, p65 (RelA), RelB and c-Rel), which combine as homo- or heterodimers to mediate transcriptional activation and, in some cases, repression; p50/p65 dimers are the best-studied transcriptional activators, while p50 homodimers tend to mediate repression (10). Studies have also demonstrated that LPS is a potent activator of mitogen-activated protein kinase (MAPK) signal transduction pathways, including those involving the extracellular signal-regulated kinase (ERK1/2) and p38 MAPK (11, 12). Inhibition of either ERK1/2 or p38 activation has been shown to reduce LPS induction of TNF-α expression (11, 12). These findings indicate that NF-κB and the ERK1/2 and p38 pathways may act cooperatively in inducing pro-inflammatory cytokines such as TNF-α.

Cationic host defence (antimicrobial) peptides are a component of the innate immune response; they can be expressed constitutively or induced in response to inflammation and/or pathogen-associated molecules such as bacterial LPS (13-15). LL-37, a 37-amino-acid peptide that is produced throughout the body and at mucosal surfaces, represents an extracellular form of human cationic host defence protein-18 (hCAP-18). It has a wide variety of immunomodulatory activities that have been demonstrated both in vitro and in vivo (16). For example, LL-37 is chemotactic for a variety of cells of the innate and adaptive immune responses and also induces chemokine production in monocytes, epithelial cells and animals. Despite its weak antimicrobial activity under physiological salt conditions (16), its exogenous application can protect animals against Staphylococcus aureus infections (14). LL-37 traffics into host cells and is known to induce the ERK1/2 and p38 pathways in primary human monocytes (17), indicating an ability to influence intracellular signal transduction.
While the above mechanisms might be considered pro-inflammatory, LL-37 is also able to reduce or abolish pro-inflammatory cytokine production induced by LPS and LTA in monocytes and macrophages, and it is protective against endotoxaemia in animal models (18). It has been proposed that the anti-endotoxin effects of host defence peptides are partly due to blocking LPS binding to the serum LPS-binding protein (LBP) (19) and to the cell surface receptor component CD14 (20), although LL-37 has a much lower LPS binding affinity than CD14 or LBP. Moreover, a recent study demonstrated that the anti-endotoxin effect of this peptide was based on a complex series of events and reflected, in part, an ability to reduce the amount of NF-κB p50/p65 in the nucleus by around 75% (21).

In this study, I examined the anti-endotoxin properties of LL-37 in the context of its ability to modulate transcription. A computational approach was applied to previously gathered time course microarray data (21): genes demonstrating similar temporal expression profiles were clustered, and over-represented transcription factor binding sites (TFBSs) were then detected in the promoters of these co-expressed genes. These data indicated that, in the THP-1 cell line, LL-37 selectively neutralized the LPS-induced expression of genes that contained binding sites both for the p50/p65 NF-κB heterodimer and for transcription factors downstream of ERK1/2 and p38. In particular, this bioinformatic study strongly implicated the MAPK-controlled transcription factors in the action of LL-37.

2.2  Materials and Methods

Microarray Experiments

The data set used for these analyses was previously presented (21) and has been deposited into ArrayExpress under accession number E-FPMI-4. Briefly, THP-1 cells were stimulated with LPS (100 ng/ml), LL-37 (20 μg/ml), or both LPS and LL-37 for 1, 2, 4, or 24 hr.
RNA was isolated using the RNeasy Mini kit (Qiagen Inc., ON, Canada) and treated with RNase-Free DNase (Qiagen Inc.) as per the manufacturer's instructions. Isolated RNA samples were eluted and stored in RNase-free water (Ambion Inc., TX, USA). RNA concentration, integrity and purity were assessed using an Agilent 2100 Bioanalyzer with RNA 6000 Nano kits (Agilent Technologies, CA, USA). The RNA samples were reverse-transcribed using the MessageAmpII™ amplification kit according to the manufacturer's instructions, and the samples were labeled with the mono-functional dyes Cyanine-3 and Cyanine-5 (Amersham Biosciences Corp., NJ, USA). Microarray slides were printed with the human genome 21K ArrayReady Oligo Set™ (Qiagen Inc.) at The Jack Bell Research Center (Vancouver, BC, Canada). Equal amounts (20 pmol) of cyanine-labeled samples from control and treated cells were then mixed and hybridized on the array slides as previously described (21). Following hybridization, the slides were washed, dried and scanned using a ScanArray™ Express scanner and software (Packard Bioscience BioChip Technologies), and the images were quantified using ImaGene™ (BioDiscovery Inc., El Segundo, CA, USA).

Assessment of slide quality, normalization, detection of differential gene expression and statistical analysis were carried out with ArrayPipe (version 1.6), a web-based, semi-automated software package specifically designed for processing microarray data (22). The following processing steps were applied:
1) flagging of markers;
2) subgrid-wise background correction, using the median of the lower 10% of foreground intensities as an estimate of the background noise;
3) data shifting, to rescue negative spots;
4) print-tip LOESS normalization;
5) merging of technical replicates;
6) a two-sided one-sample Student's t-test on the log2-ratios within each time-point group;
7) averaging of biological replicates to yield overall fold-changes for each treatment group.
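Steps 2 and 3 above can be illustrated with a short Python sketch. It assumes that the foreground intensities of one subgrid are available as a plain list per channel. The shift rule used here (adding one constant to both channels so that no value falls below 1) is a hypothetical stand-in, since the exact rule applied by ArrayPipe is not described in this chapter, and all function names and intensities are illustrative only.

```python
import math
import statistics


def subgrid_background(foreground, fraction=0.10):
    """Estimate background as the median of the lowest `fraction` of foreground intensities."""
    ranked = sorted(foreground)
    k = max(1, int(len(ranked) * fraction))  # use at least one spot
    return statistics.median(ranked[:k])


def background_correct(foreground, fraction=0.10):
    """Subtract the subgrid background estimate from every spot in the subgrid."""
    bg = subgrid_background(foreground, fraction)
    return [x - bg for x in foreground]


def shift_positive(cy3, cy5, floor=1.0):
    """Hypothetical data shift: add one constant to both channels so no value is below `floor`."""
    lowest = min(min(cy3), min(cy5))
    shift = floor - lowest if lowest < floor else 0.0
    return [x + shift for x in cy3], [x + shift for x in cy5]


def log2_ratios(treated, control):
    """Per-spot log2(treated / control) after correction and shifting."""
    return [math.log2(t / c) for t, c in zip(treated, control)]


# Toy subgrid with 20 spots per channel (illustrative numbers, not array data).
cy5 = [float(v) for v in range(100, 2100, 100)]   # treated channel
cy3 = [float(v) for v in range(150, 1150, 50)]    # control channel
cy3_s, cy5_s = shift_positive(background_correct(cy3), background_correct(cy5))
ratios = log2_ratios(cy5_s, cy3_s)
```

Shifting both channels by the same constant keeps every value positive so that a log2-ratio exists, but it also compresses ratios for dim spots, which is one reason such rescued spots are usually interpreted cautiously.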
In order to perform temporal expression pattern clustering, differentially expressed genes, defined as those with a Student's t-test p < 0.05 at one or more of the time points, were obtained from each treatment group. Log2-ratios at each time point were used as the gene expression measures for further analysis such as expression pattern clustering. Microarray probes predicted by ProbeLynx (23) to cross-hybridize, or to have a high probability of hybridizing to intergenic regions, were excluded from the differentially expressed gene list (Table 2.1).

Treatment   | Total DE genes | DE gene probes with multiple gene hits | DE genes with high probability of intergenic hybridization | DE genes for further analysis (% of total genes)
LPS         | 3981 | 190 | 225 | 3566 (15.3%)
LL-37       | 3343 | 164 | 178 | 3001 (12.9%)
LPS + LL-37 | 3613 | 187 | 206 | 3220 (13.9%)

Table 2.1. Differentially expressed (DE) genes from the THP-1 microarray. Microarray spots that might hybridize with more than one gene, or spots with a high probability of hybridizing to an intergenic region, as predicted by ProbeLynx, were excluded from the differentially expressed gene list.

Expression pattern clustering

Clustering of temporal gene expression data was performed using four publicly available clustering methods, three of which came from TIGR TMEV version 3.0: K-Means Clustering (KMC) (24), the Self-Organizing Tree Algorithm (SOTA) (25) and the Self-Organizing Map (SOM) (26). In addition, I used the hierarchical clustering method Hierarchical Ordered Partitioning and Collapsing Hybrid (HOPACH, http://www.r-project.org), which builds a hierarchical tree by recursively partitioning a gene expression dataset and collapsing clusters at each level. Different distance measures, including Euclidean distance, uncentered correlation and centered correlation, were tested for their ability to correctly cluster known co-regulated genes (p50/p65 NF-κB target genes) together.
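For reference, the three distance measures compared here can be written out in a few lines of Python. The sketch assumes that each gene is represented by its vector of log2-ratios at 1, 2, 4 and 24 hr, and it converts a correlation r into a distance as 1 - r, which is a common convention assumed here rather than a documented TMEV or HOPACH setting. The two profiles are hypothetical.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two expression profiles (log2-ratios per time point)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def uncentered_correlation(x, y):
    """Correlation without mean-centering (the cosine of the angle between profiles)."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return num / den


def centered_correlation(x, y):
    """Pearson correlation: centre each profile on its own mean, then correlate."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return uncentered_correlation([a - mx for a in x], [b - my for b in y])


# Two hypothetical profiles at 1, 2, 4 and 24 hr: same shape, double the amplitude.
g1 = [1.0, 2.0, 1.5, 0.0]
g2 = [2.0, 4.0, 3.0, 0.0]
d_euc = euclidean(g1, g2)                    # sqrt(7.25), about 2.69
d_unc = 1 - uncentered_correlation(g1, g2)   # about 0
d_cen = 1 - centered_correlation(g1, g2)     # about 0
```

The example shows the practical difference: both correlation-based distances treat the two genes as identical in shape, whereas Euclidean distance also separates them by the magnitude of induction.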
However, only minimal differences were observed for our data (data not shown), so Euclidean distance was used for subsequent analyses. Various other parameters were explored to optimize each clustering method for its ability to correctly group p50/p65 target genes. Key parameters included the cell variability setting with p < 0.05 as the division criterion in SOTA, k = 30 for KMC, and the default settings for HOPACH.

TFBS Analysis

The oPOSSUM database provides pre-computed predictions of conserved TFBSs in human and mouse promoters from the Ensembl database, obtained through phylogenetic footprinting (27). The TFBSs are predicted using position-specific scoring matrices (PSSMs) from the JASPAR database (28). The predictions are made in promoter regions encompassing 5000 bp upstream and 5000 bp downstream of the transcription start site, enabling detection over a large sequence space that includes some coding and intronic regions. For each gene analyzed, all TFBS predictions at a stringent conservation level (top 10% of the conserved regions, with a minimum matrix match score of 80%) were compiled.

To detect over-represented TFBSs in a set of co-expressed genes, a randomized re-sampling method was used to generate background data sets, and a Z-score was used as a statistical measure to rank the degree of TFBS over-representation in the set of co-expressed genes relative to the mean and distribution of the background data sets. For each co-expressed gene set, n sets of randomly sampled genes of the same size were generated as background data sets. The optimal value of n, beyond which the standard deviation decreased only minimally, was determined to be 1000 after testing values ranging from 100 to 10,000.
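The re-sampling procedure described in this section, together with the continuity-corrected Z-score and the 50% occurrence filter defined in the following paragraphs, can be sketched in Python as follows. This is an illustrative reconstruction, not the code used for the thesis. It assumes that the random gene sets are drawn from a pool of genes with oPOSSUM predictions (the pool is not specified in the text), and the gene identifiers and site assignments in the usage lines are hypothetical.

```python
import random
import statistics


def tfbs_zscore(cluster, background_pool, genes_with_site, n=1000, seed=0):
    """Resampling Z-score for one TFBS in one co-expressed gene set.

    cluster          -- genes in the co-expressed set
    background_pool  -- list of genes to draw random sets from (assumed pool)
    genes_with_site  -- set of genes whose promoters carry a predicted site
    """
    rng = random.Random(seed)
    size = len(cluster)
    x = sum(1 for g in cluster if g in genes_with_site)
    # Number of site-containing genes in each of n random gene sets of the same size.
    counts = [
        sum(1 for g in rng.sample(background_pool, size) if g in genes_with_site)
        for _ in range(n)
    ]
    mu = statistics.mean(counts)
    sigma = statistics.stdev(counts)
    if sigma == 0:
        raise ValueError("background counts have zero variance")
    z = (x - 0.5 - mu) / sigma          # continuity-corrected Z-score
    return z, x / size                  # also return the fraction of genes with the site


def is_reported(z, fraction, z_cutoff=1.645, min_fraction=0.5):
    """Keep a TFBS only if it is over-represented and present in at least half the genes."""
    return z > z_cutoff and fraction >= min_fraction


# Hypothetical example: 20% of a 1000-gene pool carries the site.
pool = list(range(1000))
with_site = set(range(200))
z, frac = tfbs_zscore(list(range(20)), pool, with_site)              # every gene has the site
z_null, frac_null = tfbs_zscore(list(range(900, 920)), pool, with_site)  # no gene has the site
```

Fixing the random seed makes the background reproducible; with n = 1000 the mean and standard deviation of the background counts are stable enough that rerunning with another seed changes the Z-score only slightly.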
The size of each background set of randomly selected genes was matched to the size of the co-expressed gene set, and the standard deviation was calculated accordingly. Since, at large values of n, the distribution of the number of genes containing a given TFBS followed a normal approximation to the binomial distribution, I applied the Central Limit Theorem with a correction for continuity to calculate the Z-score as:

$$Z_i = \frac{X_i - 0.5 - \mu_i}{\sigma_i}$$

where $X_i$ is the number of genes containing a particular TFBS, $i$, in the input co-expressed gene set, and $\mu_i$ and $\sigma_i$ are the mean and standard deviation of the number of genes containing TFBS $i$ in the random background data sets. A significant Z-score indicated that a significantly increased (i.e. non-random) proportion of the co-expressed genes analyzed contained the particular TFBS.

Because different TFBSs differ in their natural abundance, the mean proportions of different TFBSs in the background data sets are not the same, a property that could potentially introduce false positives. For example, if a particular TFBS had a very low mean proportion in the background, e.g. 10%, and was found in the promoters of 20% of the co-expressed genes under analysis, it could still be reported as over-represented despite its low rate of occurrence in the co-expressed gene set. To avoid this problem, over-represented TFBSs found in less than 50% of the co-expressed genes were filtered out.

Over-representation of Gene Ontology Terms

To further facilitate biological interpretation of the gene expression clusters, over-representation of Gene Ontology (GO) Terms was assessed for each cluster of co-expressed genes. GoMiner build 143 (29) was used to compare each gene cluster to the total set of differentially expressed genes within the same treatment group. This analysis was performed using GO Terms in the biological process category.
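GoMiner is generally described as reporting Fisher's exact test p-values for each GO term. When a cluster is compared to the differentially expressed genes of its own treatment group, the one-sided form of that test reduces to a hypergeometric upper tail, sketched below in Python. This is not GoMiner's code: the one-sided direction, the absence of multiple-testing correction and the counts in the usage line are all assumptions made for illustration.

```python
from math import comb


def go_enrichment_p(k, n, K, N):
    """One-sided hypergeometric P(X >= k).

    N -- differentially expressed genes in the treatment group (reference set)
    K -- of those, genes annotated with the GO term
    n -- genes in the cluster
    k -- cluster genes annotated with the GO term
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical counts: 12 of 50 cluster genes carry a term found on 150 of 3566 DE genes.
p_example = go_enrichment_p(12, 50, 150, 3566)
```

Python integers have arbitrary precision, so the binomial coefficients stay exact even for reference sets of several thousand genes; only the final division produces a floating-point p-value.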
GO Terms that were statistically significantly (p < 0.05) over-represented were reported.

2.3  Results

The human monocytic THP-1 cell line was used to assess the immune-modulating effects of LL-37. The microarray experiment analyzed global gene expression at four time points (1, 2, 4 and 24 hours post-treatment) under three separate treatment conditions (LPS alone, LL-37 alone, and LPS together with LL-37). All analyses represented three independent biological experiments performed in duplicate. ArrayPipe (22) was used to extract lists of differentially expressed genes from these microarray datasets. Since a gene can be differentially expressed at any time during the course of the experiment, differential expression was defined as p < 0.05 by a two-sided one-sample Student's t-test at one or more of the four time points. Using these criteria, 12.9 to 15.3% of the 23,232 genes analyzed were differentially expressed in each treatment group relative to the untreated control and were extracted for further computational analysis.

Evaluation of clustering methods

As various gene clustering methods are available, four of the most commonly used methods (HC, KMC, SOTA, SOM) were evaluated for their ability to group together genes that are known to be co-regulated by the same transcription factor. As LPS is known to activate the p50/p65 complex of the transcription factor NF-κB, each clustering method was validated using a reference list of 28 known target genes (30) of this complex, including a number of cytokines/growth factors, transcription factors, surface receptors and signaling molecules, as well as molecules involved in metabolism. Temporal expression profiles of differentially expressed genes from the microarray were clustered by the various methods, and the known p50/p65 target genes in the reference list were then mapped to the clustering results.
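The mapping step, which yields the "clusters with p50/p65 target genes" counts reported for each method, amounts to looking up each reference gene's cluster label and counting the distinct labels. A minimal Python sketch follows; the gene names and cluster labels are hypothetical placeholders, not the 28-gene reference list.

```python
from collections import Counter


def target_cluster_counts(assignments, targets):
    """Count reference target genes per cluster.

    assignments -- dict mapping gene to cluster label for the clustered DE genes
    targets     -- reference list of known target genes
    Targets that were not differentially expressed are absent from `assignments`
    and are therefore skipped.
    """
    return Counter(assignments[g] for g in targets if g in assignments)


# Hypothetical assignments for illustration only.
assignments = {"gene_a": 3, "gene_b": 3, "gene_c": 7, "gene_x": 1}
targets = ["gene_a", "gene_b", "gene_c", "gene_d"]
counts = target_cluster_counts(assignments, targets)
n_clusters_with_targets = len(counts)   # the per-method quantity being compared
```

A method that groups co-regulated genes well should give a small number of distinct labels, each holding several target genes.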
None of the methods tested was able to isolate all of the known target genes into a single cluster (Table 2.2). This included the hierarchical clustering method HOPACH (Hierarchical Ordered Partitioning and Collapsing Hybrid), which tended to produce two or three very large clusters and thus was not used further. For all methods, the clusters found to contain the known p50/p65 target genes showed similar up-regulated expression patterns, with maximal magnitudes at either 2 or 4 hr post-treatment (Figure 2.1). It seems likely that these moderate differences in the magnitude and timing of gene expression, which resulted in different temporal expression patterns, reflected the involvement of other transcription factors.

Method                                  | Total number of clusters | Clusters with p50/p65 target genes
Hierarchical Clustering (HC) by HOPACH  | 3  | 2
K-Means Clustering (KMC)                | 30 | 3
Self-Organizing Tree Algorithm (SOTA)   | 30 | 3
Self-Organizing Map (SOM)               | 30 | 7

Table 2.2. Mapping of known NF-κB p50/p65 target genes using different clustering methods. Four commonly used clustering methods were evaluated for their ability to group known target genes of the p50/p65 heterodimer of NF-κB.

Both SOTA and KMC produced three clusters that contained known p50/p65 target genes, whereas these target genes were spread over seven clusters using SOM (Table 2.2). Comparing the SOTA and KMC results revealed only modest discrepancies: of the 3566 genes in groups of 10 or more genes, 2803 genes (about 79%) were clustered similarly by the two methods. Given this high degree of overlap, the two methods were combined by extracting gene clusters of ten or more genes that were commonly predicted by both methods.
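One plausible reading of "gene clusters commonly predicted by both methods" is to group genes by their pair of labels (KMC cluster, SOTA cluster) and keep only pairs shared by at least ten genes. The Python sketch below implements that reading; it is an assumption about the procedure rather than a description of the exact script used, and the labels are hypothetical. (For scale, the overlap reported above corresponds to 2803/3566, or about 78.6%.)

```python
from collections import defaultdict


def conserved_clusters(kmc, sota, min_size=10):
    """Genes placed together by both methods.

    kmc, sota -- dicts mapping gene to cluster label from each method.
    Genes are grouped by their (KMC label, SOTA label) pair; only groups
    with at least `min_size` genes are kept.
    """
    groups = defaultdict(list)
    for gene, k_label in kmc.items():
        if gene in sota:
            groups[(k_label, sota[gene])].append(gene)
    return {pair: sorted(genes) for pair, genes in groups.items() if len(genes) >= min_size}


# Hypothetical labels: 12 genes share one label pair; 5 genes form a pair group that is too small.
kmc = {**{f"g{i}": 0 for i in range(12)}, **{f"h{i}": 1 for i in range(5)}}
sota = {**{f"g{i}": "A" for i in range(12)}, **{f"h{i}": "A" for i in range(5)}}
kept = conserved_clusters(kmc, sota)
```

Genes whose two labels disagree with the rest of their cluster end up in small pair groups and are dropped, which matches the observation that one target gene (IL-7R) was lost when the methods were combined.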
This process largely preserved the clustering of p50/p65 target genes, with only one target gene (IL-7R) being eliminated and the remainder falling into three clusters (Supplemental Table I). Conserved gene clusters were similarly obtained for the other datasets (treatment with LL-37 alone or with LPS and LL-37).

[Figure 2.1: plot titled "p50/p65 Target Genes"; x-axis: Time points]
Figure 2.1 Expression patterns of p50/p65 target genes show up-regulation by LPS.

Detection of over-represented TFBS in a group of co-regulated genes

Since co-regulation of genes is predicted to correlate with their co-expression, each cluster was assessed for over-represented TFBSs. To detect over-represented TFBSs in a set of genes, oPOSSUM (27), a database of pre-computed TFBS predictions for human genes made using JASPAR matrices (28), was used to obtain TFBS prediction data for the promoter regions of each gene. For comparison, random sets of genes of equivalent size were examined for non-random association with a given TFBS, using a randomized re-sampling method and statistical analysis by Z-score. To validate the statistical TFBS prediction for a set of co-regulated genes, I tested the method with three gene lists obtained from TRANSFAC (31), consisting of genes experimentally found to contain binding sites for NF-κB, CREB and SP1, respectively. Another test data set consisting of p50/p65 target genes (30) was also included in this validation study. TFBS prediction data were obtained from oPOSSUM, and a Z-score was computed for each TFBS relative to randomly sampled background gene sets to identify the TFBSs that were over-represented in these test datasets. Table 2.3 shows validation results for these datasets, indicating that this statistical method correctly identified the corresponding known TFBS in each of the test datasets.
In the NF-κB dataset, the p50 and p65 binding sites were the most over-represented, followed by the binding sites for c-REL and NF-κB (subunit non-specific matrix), all of which correspond to subunits of NF-κB. In the CREB dataset, the MEF2 binding site was the most over-represented, followed by the CREB binding site. Interestingly, however, no other TFBS was over-represented in this dataset. In the SP1 dataset, the SP1 binding site had the highest Z-score, 3.56. Finally, TFBS analysis of the p50/p65 data set identified p50 (Z-score 6.53) and p65 (Z-score 4.62) as the top hits. Some of the TFBSs downstream of the MAPK pathway, such as the CREB, Elk-1 and c-Fos binding sites, were also over-represented in this data set, indicating the possibility of combinatorial transcriptional regulation downstream of the TLR→NF-κB and MAPK pathways. These results indicated that this statistical detection of over-represented TFBSs could be applied to the co-expressed gene sets.

Dataset | TFBS | Class | Signalling pathway* | Z-score
NF-κB   | p50 | REL | NFKB | 3.64
NF-κB   | p65 | REL | NFKB | 3.48
NF-κB   | c-REL | REL | NFKB | 2.97
NF-κB   | NF-kappaB | REL | NFKB | 2.35
NF-κB   | NRF-2 | ETS | MAPK | 1.85
NF-κB   | MZF5-13 | ZN-FINGER | - | 1.72
CREB    | MEF2 | MADS | - | 1.99
CREB    | CREB | bZIP | MAPK | 1.73
SP1     | SP1 | ZN-FINGER | - | 3.56
SP1     | CREB | bZIP | MAPK | 2.63
SP1     | p50 | REL | NFKB | 2.58
SP1     | Max | bHLH-ZIP | - | 1.87
SP1     | Tal1beta-E47S | bHLH | - | 1.72
p50/p65 | p50 | REL | NFKB | 6.53
p50/p65 | p65 | REL | NFKB | 4.62
p50/p65 | NF-kappaB | REL | NFKB | 4.07
p50/p65 | Max | bHLH-ZIP | - | 3.45
p50/p65 | c-REL | REL | NFKB | 3.44
p50/p65 | CREB | bZIP | MAPK | 2.71
p50/p65 | Thing1-E47 | bHLH | - | 2.47
p50/p65 | n-MYC | bHLH-ZIP | - | 2.39
p50/p65 | ARNT | bHLH | - | 2.39
p50/p65 | USF | bHLH-ZIP | - | 2.35
p50/p65 | FREAC-2 | FORKHEAD | - | 2.32
p50/p65 | Elk-1 | ETS | MAPK | 2.19
p50/p65 | COUP-TF | NUCLEAR RECEPTOR | - | 2.18
p50/p65 | Androgen | NUCLEAR RECEPTOR | - | 2.16
p50/p65 | Ahr-ARNT | bHLH | - | 2.13
p50/p65 | HLF | bZIP | - | 2.08
p50/p65 | Irf-1 | | - | 2.06
p50/p65 | c-FOS | bZIP | MAPK | 2.05

Table 2.3 Over-representation of TFBS in test datasets.
Three test datasets, obtained from TRANSFAC and consisting of genes experimentally found to contain binding sites for NF-κB, CREB and SP1, and a data set consisting of known target genes of the p50/p65 NF-κB heterodimer (40), were assessed for over-represented TFBSs. TFBS predictions are reported with Z-scores higher than 1.645, which corresponds to a p-value of 0.05. *NFKB indicates the NF-κB pathway and MAPK indicates the MAPK pathway.

LPS up-regulates genes downstream of NF-κB (p50/p65), ERK1/2 and p38

It is known that LPS activates the TLR4→NF-κB (7), ERK1/2 and p38 pathways. Studies also strongly suggest that the ERK1/2 and p38 pathways must be activated in addition to the NF-κB pathway to lead to the production of certain pro-inflammatory cytokines and sepsis (9, 11, 12). Therefore, it was proposed that a subset of NF-κB p50/p65-regulated genes would be co-regulated by transcription factors (TFs) downstream of ERK1/2 and/or p38. These MAPK-downstream transcription factors are known to include CREB, c-Fos, Elk-1, serum response factor (SRF), which often forms a complex with Elk-1, and certain ETS-family transcription factors such as Spi-B, Sap-1 and Nrf-2 (32). To examine transcriptional effects downstream of TLR4→NF-κB, ERK1/2 and p38 in THP-1 cells in the presence of LPS, over-represented TFBSs were statistically detected in the gene clusters from the microarray experiments. The data were then screened for clusters with over-represented binding sites for CREB, c-Fos, Elk-1, SRF and other ETS-family transcription factors, as well as NF-κB p50/p65 binding sites. One cluster that contained over-represented p50 and p65 binding sites was also found to have over-representation of Elk-1, SRF, SAP-1, CREB and c-Fos binding sites (Figure 2.2A, Table 2.4A). This is consistent with cooperative regulation by NF-κB and the transcription factors downstream of ERK1/2 and p38.
Examination of the gene expression pattern showed that the genes were all optimally up-regulated at 2 and/or 4 hr and returned to baseline expression at 24 hours post-treatment (Figure 2.2A).

Cluster | TFBS | Class | Signalling pathway* | Z-score
A | c-REL | REL | NFKB | 3.69
A | p50 | REL | NFKB | 3.58
A | SRF | MADS | MAPK | 3.41
A | CREB | bZIP | MAPK | 3.14
A | Myc-Max | bHLH-ZIP | - | 2.83
A | FREAC-4 | FORKHEAD | - | 2.55
A | p65 | REL | NFKB | 2.49
A | Pbx | HOMEO | - | 2.48
A | FREAC-2 | FORKHEAD | - | 2.46
A | SP1 | ZN-FINGER, C2H2 | - | 2.43
A | Evi-1 | ZN-FINGER, C2H2 | - | 2.16
A | SAP-1 | ETS | MAPK | 2.09
A | Elk-1 | ETS | MAPK | 2.08
A | MZF5-13 | ZN-FINGER, C2H2 | - | 1.92
A | HNF-1 | HOMEO | - | 1.86
A | NF-kappaB | REL | NFKB | 1.79
A | HFH-1 | FORKHEAD | - | 1.77
A | c-FOS | bZIP | MAPK | 1.72
A | Max | bHLH-ZIP | - | 1.69
A | AML-1 | RUNT | - | 1.69
B | SAP-1 | ETS | MAPK | 2.67
B | SRY | HMG | - | 2.60
B | Gfi | ZN-FINGER, C2H2 | - | 2.20
B | SPI-B | ETS | MAPK | 2.00
B | CREB | bZIP | MAPK | 2.00
B | SOX17 | HMG | - | 1.87
B | Sox-5 | HMG | - | 1.87
B | c-FOS | bZIP | MAPK | 1.85
B | Elk-1 | ETS | MAPK | 1.80
B | Nkx | HOMEO | - | 1.73

Table 2.4 Over-representation of TFBS in clusters of co-expressed genes from the LPS-stimulated THP-1 dataset. TFBS analysis results are shown for expression pattern cluster A (the cluster in Figure 2.2A) and cluster B (the cluster in Figure 2.2B). Reported here are TFBS with Z-scores higher than 1.645, which corresponds to a p-value of 0.05. *NFKB indicates the NF-κB pathway and MAPK indicates the MAPK pathway.

Figure 2.2. Clustering the transcriptional effect of LPS. Clusters of genes affected by LPS treatment of THP-1 cells, and with over-represented binding sites for transcription factors downstream of ERK1/2 and p38, are presented. A, a cluster with TFBS over-representation for both NF-κB p50/p65 and transcription factors downstream of the ERK1/2 and p38 pathways. B, a cluster with TFBS over-representation for transcription factors downstream of the ERK1/2 and p38 pathways, but not NF-κB p50/p65.
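The Z-score cutoff of 1.645 quoted in the table captions is the one-sided 5% point of the standard normal distribution, and any reported Z-score can be turned back into a one-sided p-value the same way. The short Python check below uses the standard library's `statistics.NormalDist`; it computes a single-test p-value and does not adjust for the number of matrices tested.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def one_sided_p(z):
    """Upper-tail p-value for a Z-score under the standard normal approximation."""
    return 1.0 - STD_NORMAL.cdf(z)


cutoff = STD_NORMAL.inv_cdf(0.95)   # one-sided 5% point, about 1.6449
p_p50 = one_sided_p(3.58)           # p50 in Table 2.4, cluster A
```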
The TFBS analysis of these LPS-regulated genes also revealed that in another cluster, the binding sites for most of the transcription factors downstream of ERK1/2 and p38 were also over-represented, with up-regulation at 2 and 4 hr post-treatment; in contrast, the p50 and p65 binding sites were not over-represented in this cluster (Figure 2.2B, Table 2.4B). Therefore, these data were consistent with LPS-induced transcriptional regulation downstream of ERK1/2 and p38 that may be independent of the NF-κB pathway.

LL-37 up-regulates genes downstream of ERK1/2 and p38, but not NF-κB pathways

LL-37 is known to induce the activation of the ERK1/2 and p38 pathways, with downstream consequences that include the up-regulation of certain chemokine genes (17). Therefore, I examined LL-37-induced genes for clusters containing over-represented TFBSs for the transcription factors downstream of ERK1/2 and p38, including CREB, c-Fos, Elk-1 and other ETS-family transcription factors. The results indicated up-regulated gene expression at 2 and 4 hr post-treatment in the clusters with over-represented binding sites for CREB, c-Fos, Elk-1, Srf, Sap-1 and Spi-B (Figure 2.3). Because these TFBSs were found together in only 2 out of 30 clusters, it seems likely that this up-regulation was highly dependent on the ERK1/2 and p38 MAPK pathways, rather than reflecting other pathways that individually activated these transcription factors.

[Figure 2.3: panels A and B; x-axis: Time Points]
Figure 2.3 LL-37 up-regulation of genes downstream of the ERK1/2 and p38 pathways. Clusters of genes affected by LL-37 treatment of THP-1 cells and with over-represented TFBS for transcription factors: A, CREB, Elk-1, Srf and Sap-1, and B, c-Fos, Elk-1, Sap-1 and Spi-B binding sites, showing up-regulation at 2 and/or 4 hours after addition of LL-37.
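The screening step used throughout this section (looking for clusters where MAPK-downstream TFBSs are over-represented, and checking whether p50 and p65 are both present) can be expressed as a simple set operation on each cluster's reported Z-scores. The Python sketch below uses the transcription factors named in the text as downstream of ERK1/2 and p38, spelled as in the tables; the function name is illustrative, and the example input copies the Z-scores of Table 2.4, cluster B.

```python
# Transcription factors named in the text as downstream of ERK1/2 and p38.
MAPK_DOWNSTREAM = {"CREB", "c-FOS", "Elk-1", "SRF", "SAP-1", "SPI-B", "NRF-2"}
NFKB_P50_P65 = {"p50", "p65"}


def screen_cluster(z_scores, cutoff=1.645):
    """Return MAPK-downstream TFBSs passing the cutoff, and whether p50 AND p65 both pass."""
    hits = {tf for tf, z in z_scores.items() if z > cutoff}
    return hits & MAPK_DOWNSTREAM, NFKB_P50_P65 <= hits


# Z-scores from Table 2.4, cluster B (LPS alone).
cluster_b = {"SAP-1": 2.67, "SRY": 2.60, "Gfi": 2.20, "SPI-B": 2.00, "CREB": 2.00,
             "SOX17": 1.87, "Sox-5": 1.87, "c-FOS": 1.85, "Elk-1": 1.80, "Nkx": 1.73}
mapk_hits, has_p50_p65 = screen_cluster(cluster_b)
```

Applied to cluster B, the screen returns five MAPK-downstream sites and no p50/p65 pair, which is the pattern interpreted above as NF-κB-independent regulation.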
LL-37 is a multipotent modulator of the innate immune response, and to date a wide variety of interactions with immune effector cells have been described (15). Interestingly, as indicated previously, there were no genes for classical pro-inflammatory cytokines in these two clusters, suggesting that LL-37-induced activation of the transcription factors downstream of ERK1/2 and p38 did not trigger the inflammatory response (data not shown). Furthermore, there was no cluster in which both the p50 and p65 binding sites were over-represented, suggesting that LL-37 does not induce translocation of the NF-κB p50/p65 heterodimer into the nucleus. This is in agreement with the finding that nuclear translocation of p50 and p65 was reduced by about 75% in the presence of LL-37 (21).

Anti-endotoxin effect of LL-37

LL-37 is known to suppress endotoxin-mediated lethality in animals (15, 18). It has been shown to reduce pro-inflammatory cytokine production by monocytes and macrophages in response to TLR agonists such as LPS (20). However, it was recently demonstrated that a variety of genes remained up-regulated in the presence of both LPS and LL-37 in THP-1 cells (21). To understand this further, a TFBS analysis was performed on the conserved gene clusters generated from THP-1 cells stimulated with LPS together with LL-37. One cluster, which demonstrated up-regulation peaking at 2 hr post-treatment, exhibited over-representation of TFBSs downstream of ERK1/2 and p38 (Elk-1, Spi-B and c-Fos), although no NF-κB p50/p65 binding sites were over-represented (Figure 2.4C, Table 2.5B). The other TFBSs over-represented in this cluster largely included binding sites for transcription factors associated with cellular development and/or differentiation, including Sry, Sox-5, Sox-9 and Sp1 (Table 2.5B) (33-36).
Another large cluster showed over-representation of TFBSs downstream of ERK1/2 and p38 as well as of NF-κB p50 and p65 binding sites (Figure 2.4A, Table 2.5A). The expression patterns of this cluster indicated down-regulation to near or below baseline expression (Figure 2.4A), consistent with the interpretation that LL-37 selectively modulated the TLR4→NF-κB pathway, leading to suppression of those genes that require activation of both the NF-κB and MAPK pathways. In the presence of LPS alone, these genes were up-regulated to various extents (Figure 2.4B). The genes in the cluster with over-representation of TFBSs downstream of ERK1/2 and p38, but not p50/p65, also showed varying degrees of up-regulation in the presence of LPS alone (Figure 2.4D). These findings strongly support the hypothesis that the proposed anti-endotoxin effect of LL-37 occurs through overall suppression of the TLR4→NF-κB pathway, while the selective effects of LL-37 depend on genes with TFBSs downstream of MAPK.

[Figure 2.4: panels A to D; x-axis: Time Point (1 to 4)]
Figure 2.4 Clusters of transcriptional events reflecting the action of LL-37 in the presence of LPS. Clusters of genes affected by combined LPS and LL-37 treatment of THP-1 cells and with over-represented binding sites for specific transcription factors. A, the expression pattern of a gene cluster generated from the LPS + LL-37 data set with over-represented binding sites for NF-κB p50/p65 and transcription factors downstream of ERK1/2 and p38. B, the expression patterns of the same genes from the LPS data set. C, the expression pattern of a cluster generated from the LPS + LL-37 data set with over-represented TFBS downstream of ERK1/2 and p38 but not NF-κB p50 or p65. D, the same genes from the LPS data set.
Cluster | TFBS | Class | Signalling pathway* | Z-score | Genes containing TFBS (%)
A | TBP | TATA-box | - | 3.66 | 80
A | Myf | bHLH | - | 3.47 | 71
A | p65 | REL | NFKB | 3.18 | 67
A | SOX17 | HMG | - | 3.06 | 93
A | S8 | HOMEO | - | 3.02 | 98
A | SRY | HMG | - | 3.00 | 93
A | HFH-3 | FORKHEAD | - | 2.97 | 78
A | Gfi | ZN-FINGER, C2H2 | - | 2.80 | 87
A | cEBP | bZIP | - | 2.80 | 76
A | Max | bHLH-ZIP | - | 2.80 | 60
A | c-REL | REL | NFKB | 2.75 | 71
A | SAP-1 | ETS | MAPK | 2.63 | 58
A | p50 | REL | NFKB | 2.36 | 69
A | FREAC-4 | FORKHEAD | - | 2.59 | 71
A | Yin-Yang | ZN-FINGER, C2H2 | - | 2.35 | 93
A | HFH-2 | FORKHEAD | - | 2.34 | 76
A | c-FOS | bZIP | MAPK | 2.32 | 82
A | Elk-1 | ETS | MAPK | 2.07 | 92
A | NF-kappaB | REL | NFKB | 2.02 | 61
A | USF | bHLH-ZIP | - | 1.88 | 69
A | RORalfa-1 | NUCLEAR RECEPTOR | - | 1.87 | 56
A | n-MYC | bHLH-ZIP | - | 1.86 | 71
A | Thing1-E47 | bHLH | - | 1.78 | 82
A | ARNT | bHLH | - | 1.75 | 71
A | SPI-B | ETS | MAPK | 1.66 | 100
B | SRY | HMG | - | 2.61 | 90
B | Sox-5 | HMG | - | 2.55 | 92
B | HFH-3 | FORKHEAD | - | 2.52 | 73
B | SP1 | ZN-FINGER, C2H2 | - | 2.33 | 78
B | HFH-2 | FORKHEAD | - | 2.10 | 73
B | FREAC-4 | FORKHEAD | - | 2.07 | 69
B | Myf | bHLH | - | 2.06 | 59
B | SOX-9 | HMG | - | 2.03 | 73
B | RORalfa-1 | NUCLEAR RECEPTOR | - | 2.01 | 59
B | AML-1 | RUNT | - | 2.00 | 80
B | Elk-1 | ETS | MAPK | 1.95 | 86
B | c-FOS | bZIP | MAPK | 1.88 | 82
B | cEBP | bZIP | - | 1.82 | 67
B | SPI-B | ETS | MAPK | 1.73 | 100
B | USF | bHLH-ZIP | - | 1.66 | 67

Table 2.5 Over-representation of TFBS in clusters of co-expressed genes from THP-1 cells treated with LL-37 in the presence of LPS. TFBS analysis results are shown for expression pattern cluster A (the cluster in Figure 2.4A) and cluster B (the cluster in Figure 2.4C). Reported here are TFBS with Z-scores higher than 1.645, which corresponds to a p-value of 0.05. *NFKB indicates the NF-κB pathway and MAPK indicates the MAPK pathway.

To further facilitate biological interpretation of these data, over-representation of Gene Ontology (GO) Terms (37) was assessed for each gene cluster using GoMiner software (29). Many of the over-represented GO Terms in the cluster that showed suppressed expression patterns in the presence of both LL-37 and LPS, and that demonstrated over-representation of p50/p65 binding sites (Figure 2.4A), were related to the production and regulation of cytokines and the immune response (Supplemental Table II).
Conversely, the other cluster, which showed up-regulated expression patterns in the presence of LPS and LL-37 but no over-representation of p50/p65 binding sites, was associated with GO Terms that described natural cellular processes and immune responses, although none of these GO Terms related to cytokine production (Supplemental Table III).

2.4  Discussion

Time course microarray analysis provides the potential to analyze global gene expression patterns. Subsequent grouping of similar temporal expression patterns, followed by regulatory analysis of the promoters of such co-expressed genes, can give rise to hypotheses about the factors, TFBSs and putative pathways involved in generating the observed expression patterns (27). The current study represents a bioinformatic analysis of time-course microarray data obtained using a monocytic cell line and three treatment regimens, combining optimized clustering with TFBS analysis. This provided a method of generating hypotheses regarding the influence of the human cationic host defence peptide LL-37 on innate immunity.

Expression pattern clustering

A number of different clustering methods are currently available, and as there appears to be no optimal solution for all types of data, I chose to first evaluate the most commonly used clustering methods for their performance when applied to our microarray data sets. The methods were evaluated based on their performance characteristics and underlying algorithm (hierarchical vs. non-hierarchical). Analysis of the regulation by LPS of the known target genes of NF-κB provided insight into the ability of each method to separate these known co-regulated genes into a limited number of clusters.
Although NF-κB comprises five subunits (p50/p105, p52/p100, p65 (RelA), RelB and c-Rel) that combine as homo- or heterodimers to mediate transcriptional activation and, in some cases, repression, the best-studied transcriptional activator is the p50/p65 heterodimer. If the published NF-κB p50/p65 target genes (30) were influenced only by this heterodimeric transcription factor, then in the absence of microarray experimental error they should all have shown similar expression profiles, indicating co-regulation. The expression profiles of a number of these target genes under LPS stimulation showed rather similar patterns, with maximal up-regulation at either 2 or 4 hr post-treatment. This indicated that the p50/p65 target gene list was a strong test set for validating the methods. However, several genes on the list were not up-regulated, indicating that even within this set other transcription factors influence inducible expression. HOPACH was used to perform hierarchical clustering since it can identify the most homogeneous clusters by the Mean/Median Split Silhouette (MSS) criterion. Moreover, Tamames et al. (38) noted that standard hierarchical clustering works very well for clustering conditions (represented by columns, i.e. a small number of items), but several authors have noted that standard hierarchical clustering methods are not very robust when applied to thousands of gene expression profiles (26). Unlike standard hierarchical clustering, HOPACH builds a hierarchical tree from top to bottom using a divisive algorithm. When applied to our array data, however, it took the second level of the tree (the first division from the root) as the main clusters. With uncentered correlation as the distance measure, this produced only three gene clusters of several hundred genes each.
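Uncentered correlation, the distance measure used with HOPACH above, compares two expression profiles as vectors from the origin without subtracting each profile's mean. This is the form popularized by Eisen et al. (2). The sketch below takes the distance as 1 minus the correlation, a common convention assumed here, and the toy log-ratio profiles are hypothetical. Unlike Pearson correlation, this measure distinguishes a profile that is uniformly shifted away from zero. That matters when a log ratio of zero means "unchanged".

```python
import math


def uncentered_correlation(x: list[float], y: list[float]) -> float:
    """Cosine of the angle between two profiles; means are NOT subtracted."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    if norm == 0:
        raise ValueError("an all-zero profile has no direction")
    return dot / norm


def pearson_correlation(x: list[float], y: list[float]) -> float:
    """Centered correlation, for comparison: each profile's mean is subtracted first."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return uncentered_correlation([a - mx for a in x], [b - my for b in y])


def uncentered_distance(x: list[float], y: list[float]) -> float:
    """Distance in [0, 2]: 0 for proportional profiles, 2 for opposite ones."""
    return 1.0 - uncentered_correlation(x, y)


# Hypothetical log2 expression ratios at four time points.
induced = [1.0, 2.0, 3.0, 2.0]
shifted = [2.0, 3.0, 4.0, 3.0]  # same shape, offset upward
print(pearson_correlation(induced, shifted))     # identical once both are centered
print(uncentered_correlation(induced, shifted))  # below 1 because the offset is kept
```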
The expression patterns of the HOPACH clusters showed no co-expression (data not shown), and neither alternative distance measures nor different method parameters resolved the problem. Hierarchical clustering was therefore not effective for our microarray data sets. A possible reason is intrinsic problems, such as high sensitivity to noise or limited robustness, that appear when it is applied to thousands of gene expression profiles. In contrast, the SOTA and KMC clustering methods robustly separated the p50/p65 target genes into three highly similar clusters despite their different underlying algorithms. I combined the two methods and extracted conserved gene clusters, aiming to exploit both SOTA's robustness, which comes from its neural-network basis, and KMC's ability to minimize intra-cluster dispersion by iterative reallocation. Virtually all of the p50/p65 target genes remained within the same clusters after combining the methods, indicating that both SOTA and KMC were highly efficient and consistent when applied to our microarray data set.

TFBS analysis

The TFBS analyses in this study were based on TFBS predictions generated in the oPOSSUM database with very stringent parameters, to minimize false positives arising during binding site prediction. Although this may in principle have lowered the sensitivity of these analyses, it likely increased their specificity. However, an analogous TFBS analysis using less stringent parameters (e.g. the top 30% of the conserved regions with a minimum matrix match score of 75%) did not substantially affect the results. For example, the less stringent analysis of the four test data sets still correctly identified their corresponding TFBS as top hits, while also indicating over-representation of some additional TFBSs (data not shown).
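The matrix match threshold above refers to how well a sequence window scores against a binding site profile. This sketch assumes the common relative-score definition, (S - Smin)/(Smax - Smin), where S is the sum of per-position log-odds scores; it is not taken from oPOSSUM's documentation. The log-odds values come from a toy count matrix for a hypothetical GGA(A/T)-like core, not a JASPAR profile, with an assumed pseudocount and a uniform background. The sketch reports windows that score at least 75%. It scans one strand only and omits the phylogenetic conservation filter ("top 30% of the conserved regions").

```python
import math

BASES = "ACGT"


def log_odds(counts: list[dict[str, int]], pseudocount: float = 0.25,
             background: float = 0.25) -> list[dict[str, float]]:
    """Convert per-position base counts into log2 odds against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        matrix.append({b: math.log2((column[b] + pseudocount) / total / background)
                       for b in BASES})
    return matrix


def relative_scores(matrix: list[dict[str, float]], seq: str):
    """Yield (start, relative score) for every window; 1.0 is the best possible match."""
    width = len(matrix)
    s_min = sum(min(col.values()) for col in matrix)
    s_max = sum(max(col.values()) for col in matrix)
    for start in range(len(seq) - width + 1):
        s = sum(matrix[j][seq[start + j]] for j in range(width))
        yield start, (s - s_min) / (s_max - s_min)


def predicted_sites(matrix, seq, threshold=0.75):
    """Keep windows whose relative score reaches the (assumed) 75% threshold."""
    return [(start, round(r, 3)) for start, r in relative_scores(matrix, seq) if r >= threshold]


# Hypothetical 4-position count matrix for a GGA(A/T)-like core.
toy_counts = [
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 7, "C": 0, "G": 0, "T": 3},
]
sites = predicted_sites(log_odds(toy_counts), "TTGGAATTGGATCC")
print(sites)
```

Lowering `threshold` admits more partial matches. This is the sensitivity/specificity trade-off discussed above.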
To minimize false interpretations of the gene cluster TFBS analyses, any inferences reported here were therefore based on multiple over-represented TFBSs belonging to the same class of transcription factors. For example, inferring possible co-operative LPS-induced regulation by NF-κB and the transcription factors downstream of ERK1/2 and p38 required at least two MAPK-downstream TFBSs (e.g. ETS-family, CREB and c-Fos TFBS), as well as binding sites for both p50 and p65, to be over-represented within the same cluster (Figure 3A). Our approach to detecting over-represented TFBS in a set of co-expressed genes depended largely on the TFBS predictions obtained from oPOSSUM, which are based on JASPAR binding site matrices. The regulatory analyses reported here are therefore subject to the limitations of the JASPAR database. JASPAR currently contains 111 binding site profiles representing 25 classes of TFBS, so regulatory analysis is restricted to those 111 profiles. Nevertheless, the high quality of the JASPAR TFBS profiles and the information on the class of each TFBS allowed a comprehensive regulatory analysis.

Biological implications

Inhibitors of both the TLR4→NF-κB pathway and the ERK1/2 and p38 pathways are known to reduce LPS-induced production of TNF-α, among other pro-inflammatory cytokines (11, 12). Our bioinformatic analyses indicated that bacterial LPS up-regulated clusters of genes containing TFBS for both NF-κB subunits and transcription factors downstream of MAPK. These analyses are thus consistent with regulation of a large subset of pro-inflammatory genes by complexes of these two classes of transcription factors. Other transcription factors whose TFBS were over-represented in the promoter regions of genes in these clusters may also be involved.
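The inference rule stated under TFBS analysis can be written as a small check. The sketch below is an illustration, not the pipeline used in this study. It treats a TFBS as over-represented when its Z-score exceeds 1.645 (a one-sided p of about 0.05, as in Table 2.5). The set of MAPK-downstream factors is an assumption drawn from the examples in the text (ETS-family members, CREB and c-Fos), and the Z-scores are a subset of the rows in Table 2.5.

```python
import math

Z_THRESHOLD = 1.645  # one-sided p of about 0.05 under a standard normal

# Assumed grouping, following the examples in the text: ETS-family, CREB and c-Fos sites.
MAPK_DOWNSTREAM = {"Elk-1", "SAP-1", "SPI-B", "CREB", "c-FOS"}
NFKB_REQUIRED = {"p50", "p65"}


def one_sided_p(z: float) -> float:
    """Upper-tail probability of a standard normal Z-score."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def classify_cluster(z_scores: dict[str, float]) -> str:
    """Apply the 'both p50 and p65 plus at least two MAPK-downstream TFBSs' rule."""
    hits = {tf for tf, z in z_scores.items() if z > Z_THRESHOLD}
    has_nfkb = NFKB_REQUIRED <= hits
    has_mapk = len(hits & MAPK_DOWNSTREAM) >= 2
    if has_nfkb and has_mapk:
        return "NF-kB + MAPK"
    if has_mapk:
        return "MAPK only"
    if has_nfkb:
        return "NF-kB only"
    return "no inference"


# Subsets of the Z-scores reported in Table 2.5.
cluster_a = {"p65": 3.18, "c-REL": 2.75, "SAP-1": 2.63, "p50": 2.36,
             "c-FOS": 2.32, "Elk-1": 2.07, "SPI-B": 1.66}
cluster_b = {"SRY": 2.61, "Elk-1": 1.95, "c-FOS": 1.88, "SPI-B": 1.73}

print(f"p at Z = 1.645: {one_sided_p(Z_THRESHOLD):.4f}")
print("cluster A:", classify_cluster(cluster_a))
print("cluster B:", classify_cluster(cluster_b))
```

Requiring two members of a class, rather than a single hit, is what guards against a lone false-positive TFBS prediction driving the interpretation.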
Although NF-κB has been described as the key transcription factor in innate immune/inflammatory responses, our bioinformatic analyses suggest that the regulation of innate immunity is more complicated. Together with the fact that inhibitors of both the NF-κB and MAPK pathways reduce LPS-induced pro-inflammatory cytokines, they strongly suggest that this regulation involves complexes of multiple transcription factors, including the obligate involvement of factors downstream of MAPK. Innate immunity can also be expected to involve a cascade of events. These would include gene expression triggered by transcription factors that are up-regulated during the early innate immune response, as well as suppression of responses as a way of dampening this response (evident in the array data after 24 hours of exposure of monocytic cells to LPS). The human cationic host defence peptide LL-37 modulates innate immunity, in part through the activation of the ERK1/2 and p38 pathways (17). It was recently reported that the anti-endotoxin effect of this peptide partly reflects an ability to reduce the amount of NF-κB p50/p65 in the nucleus by around 75% (21). Here, computational evidence is provided that supports the contention that the selective abrogation of the association of NF-κB p50/p65 is the key mechanism of the anti-endotoxin effect of LL-37. The suppression of gene expression seemed to particularly affect genes predicted to contain binding sites for both NF-κB and transcription factors downstream of ERK1/2 and p38. Conversely, genes regulated by the transcription factors downstream of ERK1/2 and p38, possibly together with other transcription factors, but not by NF-κB p50/p65, were apparently still up-regulated. This provides a potential basis for protection against infections by LL-37 in the absence of an NF-κB p50/p65-directed response.

2.5 Bibliography

1. Bluthgen, N., S. M.
Kielbasa, and H. Herzel. 2005. Inferring combinatorial regulation of transcription in silico. Nucleic Acids Res 33:272.
2. Eisen, M. B., P. T. Spellman, P. O. Brown, and D. Botstein. 1998. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 95:14863.
3. Ross, D. T., U. Scherf, M. B. Eisen, C. M. Perou, C. Rees, P. Spellman, V. Iyer, S. S. Jeffrey, M. Van de Rijn, M. Waltham, A. Pergamenschikov, J. C. Lee, D. Lashkari, D. Shalon, T. G. Myers, J. N. Weinstein, D. Botstein, and P. O. Brown. 2000. Systematic variation in gene expression patterns in human cancer cell lines. Nat Genet 24:227.
4. Ueda, H. R., W. Chen, A. Adachi, H. Wakamatsu, S. Hayashi, T. Takasugi, M. Nagano, K. Nakahama, Y. Suzuki, S. Sugano, M. Iino, Y. Shigeyoshi, and S. Hashimoto. 2002. A transcription factor response element for gene expression during circadian night. Nature 418:534.
5. Wu, Q., P. Kirschmeier, T. Hockenberry, T. Y. Yang, D. L. Brassard, L. Wang, T. McClanahan, S. Black, G. Rizzi, M. L. Musco, A. Mirza, and S. Liu. 2002. Transcriptional regulation during p21WAF1/CIP1-induced apoptosis in human ovarian cancer cells. J Biol Chem 277:36329.
6. Chiang, D. Y., P. O. Brown, and M. B. Eisen. 2001. Visualizing associations between genome sequences and gene expression data using genome-mean expression profiles. Bioinformatics 17 Suppl 1:S49.
7. Chow, J. C., D. W. Young, D. T. Golenbock, W. J. Christ, and F. Gusovsky. 1999. Toll-like receptor-4 mediates lipopolysaccharide-induced signal transduction. J Biol Chem 274:10689.
8. Davies, M. G., and P. O. Hagen. 1997. Systemic inflammatory response syndrome. Br J Surg 84:920.
9. Guha, M., and N. Mackman. 2001. LPS induction of gene expression in human monocytes. Cell Signal 13:85.
10. Caamano, J., and C. A. Hunter. 2002. NF-kappaB family of transcription factors: central regulators of innate and adaptive immune functions. Clin Microbiol Rev 15:414.
11.
Guha, M., M. A. O'Connell, R. Pawlinski, A. Hollis, P. McGovern, S. F. Yan, D. Stern, and N. Mackman. 2001. Lipopolysaccharide activation of the MEK-ERK1/2 pathway in human monocytic cells mediates tissue factor and tumor necrosis factor alpha expression by inducing Elk-1 phosphorylation and Egr-1 expression. Blood 98:1429.
12. van der Bruggen, T., S. Nijenhuis, E. van Raaij, J. Verhoef, and B. S. van Asbeck. 1999. Lipopolysaccharide-induced tumor necrosis factor alpha production by human monocytes involves the raf-1/MEK1-MEK2/ERK1-ERK2 pathway. Infect Immun 67:3824.
13. Liu, L., A. A. Roberts, and T. Ganz. 2003. By IL-1 signaling, monocyte-derived cells dramatically enhance the epidermal antimicrobial response to lipopolysaccharide. J Immunol 170:575.
14. Diamond, G., J. P. Russell, and C. L. Bevins. 1996. Inducible expression of an antibiotic peptide gene in lipopolysaccharide-challenged tracheal epithelial cells. Proc Natl Acad Sci USA 93:5156.
15. Hancock, R. E. W., and G. Diamond. 2000. The role of cationic antimicrobial peptides in innate host defences. Trends Microbiol 8:402.
16. Bowdish, D. M., D. J. Davidson, Y. E. Lau, K. Lee, M. G. Scott, and R. E. W. Hancock. 2005. Impact of LL-37 on anti-infective immunity. J Leukoc Biol 77:451.
17. Bowdish, D. M., D. J. Davidson, D. P. Speert, and R. E. W. Hancock. 2004. The human cationic peptide LL-37 induces activation of the extracellular signal-regulated kinase and p38 kinase pathways in primary human monocytes. J Immunol 172:3758.
18. Bals, R., D. J. Weiner, A. D. Moscioni, R. L. Meegalla, and J. M. Wilson. 1999. Augmentation of innate host defense by expression of a cathelicidin antimicrobial peptide. Infect Immun 67:6084.
19. Scott, M. G., A. C. Vreugdenhil, W. A. Buurman, R. E. W. Hancock, and M. R. Gold. 2000. Cutting edge: cationic antimicrobial peptides block the binding of lipopolysaccharide (LPS) to LPS binding protein.
J Immunol 164:549.
20. Nagaoka, I., S. Hirota, F. Niyonsaba, M. Hirata, Y. Adachi, H. Tamura, and D. Heumann. 2001. Cathelicidin family of antibacterial peptides CAP18 and CAP11 inhibit the expression of TNF-alpha by blocking the binding of LPS to CD14(+) cells. J Immunol 167:3329.
21. Mookherjee, N., K. Brown, D. M. Bowdish, S. Doria, R. Falsafi, K. Hokamp, F. M. Roche, R. Mu, G. H. Doho, J. Pistolic, J. P. Powers, J. Bryan, F. S. L. Brinkman, and R. E. W. Hancock. 2005. Modulation of the Toll-like receptor-mediated inflammatory response by the endogenous human host defence peptide LL-37. J Immunol 176:2455.
22. Hokamp, K., F. M. Roche, M. Acab, M. E. Rousseau, B. Kuo, D. Goode, D. Aeschliman, J. Bryan, L. A. Babiuk, R. E. W. Hancock, and F. S. Brinkman. 2004. ArrayPipe: a flexible processing pipeline for microarray data. Nucleic Acids Res 32:W457.
23. Roche, F. M., K. Hokamp, M. Acab, L. A. Babiuk, R. E. W. Hancock, and F. S. Brinkman. 2004. ProbeLynx: a tool for updating the association of microarray probes to genes. Nucleic Acids Res 32:W471.
24. Tavazoie, S., J. D. Hughes, M. J. Campbell, R. J. Cho, and G. M. Church. 1999. Systematic determination of genetic network architecture. Nat Genet 22:281.
25. Dopazo, J., and J. M. Carazo. 1997. Phylogenetic reconstruction using an unsupervised growing neural network that adopts the topology of a phylogenetic tree. J Mol Evol 44:226.
26. Tamayo, P., D. Slonim, J. Mesirov, Q. Zhu, S. Kitareewan, E. Dmitrovsky, E. S. Lander, and T. R. Golub. 1999. Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc Natl Acad Sci USA 96:2907.
27. Ho Sui, S. J., J. R. Mortimer, D. J. Arenillas, J. Brumm, C. J. Walsh, B. P. Kennedy, and W. W. Wasserman. 2005. oPOSSUM: identification of over-represented transcription factor binding sites in co-expressed genes. Nucleic Acids Res 33:3154.
28. Sandelin, A., W.
Alkema, P. Engstrom, W. W. Wasserman, and B. Lenhard. 2004. JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res 32:D91.
29. Zeeberg, B. R., W. Feng, G. Wang, M. D. Wang, A. T. Fojo, M. Sunshine, S. Narasimhan, D. W. Kane, W. C. Reinhold, S. Lababidi, K. J. Bussey, J. Riss, J. C. Barrett, and J. N. Weinstein. 2003. GoMiner: a resource for biological interpretation of genomic and proteomic data. Genome Biol 4:R28.
30. Tian, B., D. E. Nowak, M. Jamaluddin, S. Wang, and A. R. Brasier. 2005. Identification of direct genomic targets downstream of the nuclear factor-kappaB transcription factor mediating tumor necrosis factor signaling. J Biol Chem 280:17435.
31. Matys, V., E. Fricke, R. Geffers, E. Gossling, M. Haubrock, R. Hehl, K. Hornischer, D. Karas, A. E. Kel, O. V. Kel-Margoulis, D. U. Kloos, S. Land, B. Lewicki-Potapov, H. Michael, R. Munch, I. Reuter, S. Rotert, H. Saxel, M. Scheer, S. Thiele, and E. Wingender. 2003. TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res 31:374.
32. Price, M. A., A. E. Rogers, and R. Treisman. 1995. Comparative analysis of the ternary complex factors Elk-1, SAP-1a and SAP-2 (ERP/NET). Embo J 14:2589.
33. Cook, A. L., A. G. Smith, D. J. Smit, J. H. Leonard, and R. A. Sturm. 2005. Coexpression of SOX9 and SOX10 during melanocyte differentiation in vitro. Exp Cell Res 308:222.
34. Ikeda, T., H. Kawaguchi, S. Kamekura, N. Ogata, Y. Mori, K. Nakamura, S. Ikegawa, and U. I. Chung. 2005. Distinct roles of Sox5, Sox6, and Sox9 in different stages of chondrogenic differentiation. J Bone Miner Metab 23:337.
35. Kanai, Y., R. Hiramatsu, S. Matoba, and T. Kidokoro. 2005. From SRY to SOX9: mammalian testis differentiation. J Biochem (Tokyo) 138:13.
36. Zhao, C., and A. Meng. 2005. Sp1-like transcription factors are regulators of embryonic development in vertebrates. Dev Growth Differ 47:201.
37.
Ashburner, M., C. A. Ball, J. A. Blake, D. Botstein, H. Butler, J. M. Cherry, A. P. Davis, K. Dolinski, S. S. Dwight, J. T. Eppig, M. A. Harris, D. P. Hill, L. Issel-Tarver, A. Kasarskis, S. Lewis, J. C. Matese, J. E. Richardson, M. Ringwald, G. M. Rubin, and G. Sherlock. 2000. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25:25.
38. Tamames, J., D. Clark, J. Herrero, J. Dopazo, C. Blaschke, J. M. Fernandez, J. C. Oliveros, and A. Valencia. 2002. Bioinformatics methods for the analysis of expression arrays: data clustering and information extraction. J Biotechnol 98:269.

Chapter 3

DISCUSSION

Time course microarray analysis makes it possible to analyze global gene expression patterns. Grouping similar temporal expression patterns and then analyzing the promoters of the co-expressed genes can generate hypotheses about the transcription factors, TFBSs, and putative pathways that produce the observed expression pattern (1). The current study is a bioinformatic analysis of time-course microarray data obtained from a monocytic cell line under three treatment regimens, using optimized clustering combined with TFBS analysis. This provided a way of generating hypotheses about the influence of the human cationic host defence peptide LL-37 on innate immunity.

3.1 Expression pattern clustering

Many clustering methods are currently available, and since no single method appears optimal for all types of data, I first evaluated the most commonly used methods for their performance on our microarray data sets. The methods were evaluated on their performance characteristics and underlying algorithm (hierarchical vs. non-hierarchical).
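KMC, one of the non-hierarchical methods evaluated here, minimizes intra-cluster dispersion by iteratively reallocating each gene to the nearest cluster centre. The sketch below is a plain Lloyd-style k-means under stated assumptions: Euclidean distance, fixed starting centroids and hypothetical toy log-ratio profiles. It is not the KMC implementation or the parameters used in this study.

```python
import math


def kmeans(profiles, centroids, max_iter=100):
    """Lloyd-style k-means: reassign each profile to its nearest centroid, then
    recompute centroids as cluster means, until no assignment changes."""
    centroids = [list(c) for c in centroids]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in profiles
        ]
        if new_assignment == assignment:
            break  # converged: no gene changed cluster
        assignment = new_assignment
        for k in range(len(centroids)):
            members = [p for p, a in zip(profiles, assignment) if a == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(v) / len(members) for v in zip(*members)]
    return assignment, centroids


def within_cluster_ss(profiles, assignment, centroids):
    """Intra-cluster dispersion: summed squared distance of genes to their centroid."""
    return sum(math.dist(p, centroids[a]) ** 2 for p, a in zip(profiles, assignment))


# Hypothetical log2 ratios at four time points: early, late and flat responders.
profiles = [
    [2.0, 1.5, 0.5, 0.0], [1.8, 1.6, 0.4, 0.1],
    [0.1, 0.5, 1.8, 2.1], [0.0, 0.4, 2.0, 1.9],
    [0.0, 0.1, 0.0, 0.1], [0.1, 0.0, 0.1, 0.0],
]
labels, centres = kmeans(profiles, [profiles[0], profiles[2], profiles[4]])
print(labels, round(within_cluster_ss(profiles, labels, centres), 3))
```

Each reallocation step can only lower (or keep) the within-cluster sum of squares, which is why the procedure converges. It does not guarantee a global optimum, so in practice the result depends on the starting centroids.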
Analysis of the LPS regulation of known NF-κB target genes showed how well each method could separate these known co-regulated genes into a limited number of clusters. Although NF-κB comprises five subunits (p50/p105, p52/p100, p65 (RelA), RelB and c-Rel) that combine as homo- or heterodimers to mediate transcriptional activation and, in some cases, repression, the best-studied transcriptional activator is the p50/p65 heterodimer. If the published NF-κB p50/p65 target genes (2) were influenced only by this heterodimeric transcription factor, then in the absence of microarray experimental error they should all have shown similar expression profiles, indicating co-regulation. The expression profiles of a number of these target genes under LPS stimulation showed rather similar patterns, with maximal up-regulation at either 2 or 4 hr post-treatment. This indicated that the p50/p65 target gene list was a strong test set for validating the methods. However, several genes on the list were not up-regulated, indicating that even within this set other transcription factors influence inducible expression. HOPACH was used to perform hierarchical clustering since it can identify the most homogeneous clusters by the Mean/Median Split Silhouette (MSS) criterion. Moreover, Tamames et al. (3) noted that standard hierarchical clustering works very well for clustering conditions (represented by columns, i.e. a small number of items), but several authors have noted that standard hierarchical clustering methods are not very robust when applied to thousands of gene expression profiles (4). Unlike standard hierarchical clustering, HOPACH builds a hierarchical tree from top to bottom using a divisive algorithm.
When applied to our array data, however, it took the second level of the tree (the first division from the root) as the main clusters. With uncentered correlation as the distance measure, this produced only three gene clusters of several hundred genes each. The expression patterns of these clusters showed no co-expression (data not shown), and neither alternative distance measures nor different method parameters resolved the problem. Hierarchical clustering was therefore not effective for our microarray data sets. A possible reason is intrinsic problems, such as high sensitivity to noise or limited robustness, that appear when it is applied to thousands of gene expression profiles. In contrast, the SOTA and KMC clustering methods robustly separated the p50/p65 target genes into three highly similar clusters despite their different underlying algorithms. I combined the two methods and extracted conserved gene clusters, aiming to exploit both SOTA's robustness, which comes from its neural-network basis, and KMC's ability to minimize intra-cluster dispersion by iterative reallocation. Virtually all of the p50/p65 target genes remained within the same clusters after combining the methods, indicating that both SOTA and KMC were highly efficient and consistent when applied to our microarray data set.

3.2 TFBS analysis

The TFBS analyses in this study were based on TFBS predictions generated in the oPOSSUM database with very stringent parameters, to minimize false positives arising during binding site prediction. Although this may in principle have lowered the sensitivity of these analyses, it likely increased their specificity. However, an analogous TFBS analysis using less stringent parameters (e.g.
the top 30% of the conserved regions with a minimum matrix match score of 75%) did not substantially affect the results. For example, the less stringent analysis of the four test data sets still correctly identified their corresponding TFBS as top hits, while also indicating over-representation of some additional TFBSs (data not shown). To minimize false interpretations of the gene cluster TFBS analyses, any inferences reported here were therefore based on multiple over-represented TFBSs belonging to the same class of transcription factors. For example, inferring possible co-operative LPS-induced regulation by NF-κB and the transcription factors downstream of ERK1/2 and p38 required at least two MAPK-downstream TFBSs (e.g. ETS-family, CREB and c-Fos TFBS), as well as binding sites for both p50 and p65, to be over-represented within the same cluster (Figure 3A). Our approach to detecting over-represented TFBS in a set of co-expressed genes depended largely on the TFBS predictions obtained from oPOSSUM, which are based on JASPAR binding site matrices. The regulatory analyses reported here are therefore subject to the limitations of the JASPAR database. JASPAR currently contains 111 binding site profiles representing 25 classes of TFBS, so regulatory analysis is restricted to those 111 profiles. Nevertheless, the high quality of the JASPAR TFBS profiles and the information on the class of each TFBS allowed a comprehensive regulatory analysis.

3.3 Biological implications

Inhibitors of both the TLR4→NF-κB pathway and the ERK1/2 and p38 pathways are known to reduce LPS-induced production of TNF-α, among other pro-inflammatory cytokines (5, 6). The bioinformatic analyses described in Chapter 2 indicated that bacterial LPS up-regulated clusters of genes containing TFBS for both NF-κB subunits and transcription factors downstream of MAPK.
These bioinformatic analyses are thus consistent with regulation of a large subset of pro-inflammatory genes by complexes of these two classes of transcription factors. Other transcription factors whose TFBS were over-represented in the promoter regions of genes in these clusters may also be involved. Although NF-κB has been described as the key transcription factor in innate immune/inflammatory responses, the bioinformatic analyses suggest that the regulation of innate immunity is more complicated. Together with the fact that inhibitors of both the NF-κB and MAPK pathways reduce LPS-induced pro-inflammatory cytokines, they strongly suggest that this regulation involves complexes of multiple transcription factors, including the obligate involvement of factors downstream of MAPK. Innate immunity can also be expected to involve a cascade of events. These would include gene expression triggered by transcription factors that are up-regulated during the early innate immune response, as well as suppression of responses as a way of dampening this response (evident in the array data after 24 hours of exposure of monocytic cells to LPS). The human cationic host defence peptide LL-37 modulates innate immunity, in part through the activation of the ERK1/2 and p38 pathways (7). It was recently reported that the anti-endotoxin effect of this peptide partly reflects an ability to reduce the amount of NF-κB p50/p65 in the nucleus by around 75% (8). Here, computational evidence is provided that supports the contention that the selective abrogation of the association of NF-κB p50/p65 is the key mechanism of the anti-endotoxin effect of LL-37. The suppression of gene expression seemed to particularly affect genes predicted to contain binding sites for both NF-κB and transcription factors downstream of ERK1/2 and p38.
Conversely, genes regulated by the transcription factors downstream of ERK1/2 and p38, possibly together with other transcription factors, but not by NF-κB p50/p65, were apparently still up-regulated. This provides a potential basis for protection against infections by LL-37 in the absence of an NF-κB p50/p65-directed response.

3.4 Future directions

The analysis of over-represented TFBSs reported in this thesis was based on the proportion of genes in each cluster that contained a particular TFBS. It therefore did not account for multiple occurrences of a given TFBS within a single promoter. Incorporating this information into the analysis pipeline would provide an improved statistical model that could further reduce false positives. In prokaryotes, transcriptional regulation can often be understood in terms of induction by one or two factors. Eukaryotic gene regulation, in contrast, is largely carried out through sophisticated interactions of multiple transcription factors acting in a coordinated and/or sequential fashion (9). In this study, the involvement of different signalling pathways was inferred from whether TFBSs for transcription factors downstream of those pathways were over-represented within a cluster of interest. The next step would be to examine the actual locations of the predicted binding sites and the over-representation of multiple TFBSs within a given sequence window. This would allow the hypothesized associations between signalling pathways to be extended to interactions between the downstream transcription factors.

3.5 Bibliography

1. Ho Sui, S. J., J. R. Mortimer, D. J. Arenillas, J. Brumm, C. J. Walsh, B. P. Kennedy, and W. W. Wasserman. 2005. oPOSSUM: identification of over-represented transcription factor binding sites in co-expressed genes. Nucleic Acids Res 33:3154.
2. Tian, B., D. E. Nowak, M. Jamaluddin, S. Wang, and A. R. Brasier. 2005.
Identification of direct genomic targets downstream of the nuclear factor-kappaB transcription factor mediating tumor necrosis factor signaling. J Biol Chem 280:17435.
3. Tamames, J., D. Clark, J. Herrero, J. Dopazo, C. Blaschke, J. M. Fernandez, J. C. Oliveros, and A. Valencia. 2002. Bioinformatics methods for the analysis of expression arrays: data clustering and information extraction. J Biotechnol 98:269.
4. Tamayo, P., D. Slonim, J. Mesirov, Q. Zhu, S. Kitareewan, E. Dmitrovsky, E. S. Lander, and T. R. Golub. 1999. Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc Natl Acad Sci USA 96:2907.
5. Guha, M., M. A. O'Connell, R. Pawlinski, A. Hollis, P. McGovern, S. F. Yan, D. Stern, and N. Mackman. 2001. Lipopolysaccharide activation of the MEK-ERK1/2 pathway in human monocytic cells mediates tissue factor and tumor necrosis factor alpha expression by inducing Elk-1 phosphorylation and Egr-1 expression. Blood 98:1429.
6. van der Bruggen, T., S. Nijenhuis, E. van Raaij, J. Verhoef, and B. S. van Asbeck. 1999. Lipopolysaccharide-induced tumor necrosis factor alpha production by human monocytes involves the raf-1/MEK1-MEK2/ERK1-ERK2 pathway. Infect Immun 67:3824.
7. Bowdish, D. M., D. J. Davidson, D. P. Speert, and R. E. Hancock. 2004. The human cationic peptide LL-37 induces activation of the extracellular signal-regulated kinase and p38 kinase pathways in primary human monocytes. J Immunol 172:3758.
8. Mookherjee, N., K. Brown, D. M. Bowdish, S. Doria, R. Falsafi, K. Hokamp, F. M. Roche, R. Mu, G. H. Doho, J. Pistolic, J. P. Powers, J. Bryan, F. S. L. Brinkman, and R. E. W. Hancock. 2005. Modulation of the Toll-like receptor-mediated inflammatory response by the endogenous human host defence peptide LL-37. J Immunol 176.
9. Bluthgen, N., S. M. Kielbasa, and H. Herzel. 2005.
Inferring combinatorial regulation of transcription in silico. Nucleic Acids Res 33:272.

Appendix

SUPPLEMENTAL TABLES

Table I. p50/p65 NF-κB target genes mapped to the clustering results produced by SOTA and KMC. The SOTA and KMC clustering results were combined to produce conserved clusters. A, mapping of the p50/p65 target genes before the merge; B, mapping after the merge.

A  Cluster  Clustering Method  1  SOTA KMC SOTA KMC SOTA KMC  2 3  p50/p65 CXCL2 CXCL2 CD83 CD83 TNFAIP2 TNFAIP2  TNFAIP3 TNFAIP3 NFKB1 NFKB1 PTGS2 PTGS2  NF-κB  target genes  IL-7R -  CXCL3 CXCL3 IL-6 IL-6  RELB RELB  NFKBIA NFKBIA  IL7R

B  Cluster  Clustering Method  1 2 3  SOTA+KMC SOTA+KMC SOTA+KMC  p50/p65 CXCL2 CD83 TNFAIP2  TNFAIP3 NFKB1 PTGS2  NF-κB  target genes  CXCL3 IL-6  RELB  NFKBIA

Table II. Over-representation of Gene Ontology Terms. Over-represented GO Terms associated with the gene cluster in Figure 4A, identified using GoMiner. The analysis was restricted to GO Terms in the biological process category, and GO Terms with p-value < 0.05 are reported as over-represented.
GO Term  p-Value
Regulation of lymphocyte proliferation  0.0014
Lymphocyte proliferation  0.0014
Regulation of T-cell proliferation  0.0014
T-cell proliferation  0.0014
Regulation of cytokine production  0.0045
Cytokine production  0.0045
Positive regulation of cell proliferation  0.0045
Cytokine metabolism  0.0045
Cytokine biosynthesis  0.0045
Regulation of cytokine biosynthesis  0.0045
Regulation of cell activation  0.0067
Regulation of T-cell activation  0.0067
Regulation of lymphocyte activation  0.0067
Regulation of cellular biosynthesis  0.0067
T-helper 1 type immune response  0.0067
Cell-mediated immune response  0.0067
Cellular defense response (sensu Vertebrata)  0.0092
T-cell activation  0.0121
Actin filament-based process  0.0129
Regulation of immune response  0.0154
Regulation of cell proliferation  0.0181
Regulation of cell cycle  0.0211
Negative regulation of S phase of mitotic cell cycle  0.022
Eye morphogenesis (sensu Endopterygota)  0.022
Regulation of activated T-cell proliferation  0.022
Vitamin A metabolism  0.022
Fat-soluble vitamin metabolism  0.022
Tissue homeostasis  0.022
Negative regulation of cytokine production  0.022
Bile acid biosynthesis  0.022
Eye morphogenesis  0.022
Negative regulation of interleukin-2 biosynthesis  0.022
Positive regulation of interferon-gamma biosynthesis  0.022
Regulation of interferon-gamma biosynthesis  0.022
Negative regulation of biosynthesis  0.022
Regulation of liquid surface tension  0.022
Activated T-cell proliferation  0.022
Negative regulation of lymphocyte proliferation  0.022
Negative regulation of protein biosynthesis  0.022
Alveolus development  0.022
Macrophage chemotaxis  0.022
Negative regulation of cellular biosynthesis  0.022
DNA restriction  0.022
Respiratory tube development  0.022
Negative regulation of T-cell proliferation  0.022
Positive regulation of activated T-cell proliferation  0.022
Natural killer cell activation  0.022
Interferon-gamma biosynthesis  0.022
Negative regulation of cytokine biosynthesis  0.022
Negative regulation of mitotic cell cycle  0.022
Vasculature development  0.0229
Blood vessel development  0.0229
Angiogenesis  0.0229
Blood vessel morphogenesis  0.0229
Antimicrobial humoral response (sensu Vertebrata)  0.0229
Antimicrobial humoral response  0.0229
Organogenesis  0.0256
Organ development  0.0256
Phosphate transport  0.0271
Lymphocyte activation  0.0271
Cellular physiological process  0.0276
Cytoskeleton organization and biogenesis  0.0305
Cellular protein metabolism  0.0328
Protein metabolism  0.0335
Positive regulation of cellular physiological process  0.0362
Immune cell activation  0.0363
Cell activation  0.0363
Humoral defense mechanism (sensu Vertebrata)  0.0363
Positive regulation of physiological process  0.0403
Regulation of mitotic cell cycle  0.0435
Regulation of S phase of mitotic cell cycle  0.0435
Negative regulation of T-cell activation  0.0435
Negative regulation of cell activation  0.0435
Protein stabilization  0.0435
Negative regulation of lymphocyte activation  0.0435
Negative regulation of immune response  0.0435
Positive regulation of phagocytosis  0.0435
Regulation of phagocytosis  0.0435
Positive regulation of lymphocyte proliferation  0.0435
Positive regulation of transport  0.0435
Response to cold  0.0435
Oxygen transport  0.0435
Gas transport  0.0435
Positive regulation of T-cell proliferation  0.0435
Regulation of endocytosis  0.0435
Bile acid metabolism  0.0435
Actin filament-based movement  0.0435
Positive regulation of endocytosis  0.0435
Respiratory gaseous exchange  0.0435
Parturition  0.0435
Regulation of protein biosynthesis  0.0466

Table III. Over-representation of Gene Ontology Terms. Over-represented GO Terms associated with the gene cluster in Figure 4C, identified using GoMiner. The analysis was restricted to GO Terms in the biological process category, and GO Terms with p-value < 0.05 are reported as over-represented.

GO Term                                                 p-Value
Microtubule-based process                               0.0008
Microtubule-based movement                              0.0011
Cytoskeleton-dependent intracellular transport          0.0011
Regulation of cellular process                          0.0017
Regulation of cellular physiological process            0.0021
Regulation of biological process                        0.0028
Response to pest, pathogen or parasite                  0.0045
Regulation of physiological process                     0.0049
Immune response                                         0.0033
Response to biotic stimulus                             0.0054
Defense response                                        0.0062
Response to external biotic stimulus                    0.0073
Response to virus                                       0.0075
Regulation of transferase activity                      0.0077
Regulation of protein kinase activity                   0.0077
Regulation of cyclin dependent protein kinase activity  0.011
Development                                             0.0171
G2/M transition of mitotic cell cycle                   0.0199
Anti-apoptosis                                          0.0201
Regulation of apoptosis                                 0.0209
Regulation of programmed cell death                     0.0209
Response to stress                                      0.0236
Brain development                                       0.0251
Negative regulation of programmed cell death            0.0257
Negative regulation of apoptosis                        0.0257
Immune cell mediated cytotoxicity                       0.0284
Post-chaperonin tubulin folding pathway                 0.0284
Chaperonin-mediated tubulin folding                     0.0284
RNA elongation from RNA polymerase II promoter          0.0284
RNA elongation                                          0.0284
Detection of virus                                      0.0284
Natural killer cell mediated cytotoxicity               0.0284
G1/S transition of mitotic cell cycle                   0.0308
Regulation of enzyme activity                           0.0355
Vasculature development                                 0.037
Blood vessel development                                0.037
Angiogenesis                                            0.037
Blood vessel morphogenesis                              0.037
Regulation of transcription, DNA-dependent              0.0443
Negative regulation of biological process               0.0444
Response to wounding                                    0.0457
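
The two operations behind these supplemental tables can be sketched in a few lines of Python. The first function keeps, for each cluster, only the genes that both SOTA and KMC assign to it. This mirrors the conserved clusters of Table I(B), assuming that the merge is a per-cluster intersection and that cluster numbers already correspond between the two methods. The second function computes a one-sided hypergeometric tail probability (equivalent to a one-sided Fisher's exact test), the kind of statistic used to call a GO term over-represented in a cluster. GoMiner's own p-values may be computed differently, so this is an illustration of the idea rather than a reimplementation of GoMiner. The function names and the array and annotation counts in the demo are hypothetical.

```python
from math import comb


def conserved_clusters(sota, kmc):
    """Per-cluster intersection of two clusterings (assumed merge rule).

    Assumes cluster IDs already correspond between methods, i.e. SOTA
    cluster 1 and KMC cluster 1 describe the same expression pattern.
    Gene order follows the SOTA listing.
    """
    merged = {}
    for cluster_id, genes in sota.items():
        kmc_genes = set(kmc.get(cluster_id, ()))
        merged[cluster_id] = [g for g in genes if g in kmc_genes]
    return merged


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes on the array (population)
    K: genes in the population annotated with the GO term
    n: genes in the cluster (sample)
    k: cluster genes annotated with the GO term
    A small value means the term is over-represented in the cluster.
    """
    total = comb(N, n)
    # math.comb returns 0 when asked to choose more items than exist,
    # so impossible configurations contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


if __name__ == "__main__":
    # Cluster 1 memberships as read from Table I(A).
    sota = {1: ["CXCL2", "TNFAIP3", "IL-7R", "CXCL3"]}
    kmc = {1: ["CXCL2", "TNFAIP3", "CXCL3"]}
    print(conserved_clusters(sota, kmc))

    # Hypothetical counts: a 40-gene cluster in which 5 genes carry a term
    # that annotates 150 of 20,000 genes on the array.
    p = hypergeom_upper_tail(k=5, n=40, K=150, N=20000)
    print(f"p = {p:.2e}, over-represented at 0.05: {p < 0.05}")
```

Under this merge rule, IL-7R, which is listed for SOTA but not for KMC in cluster 1, does not survive into the merged cluster, matching the absence of IL-7R from Table I(B).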
