UBC Theses and Dissertations
De novo detection of regulatory elements in the nematode Caenorhabditis elegans
Sleumer, Monica Celia
The availability of high-throughput gene expression data and of completely sequenced genomes from eight nematode species has provided an opportunity to identify novel cis-regulatory elements in the promoter regions of Caenorhabditis elegans transcripts. Motif discovery was performed in the promoter regions of genes expressed in the C. elegans intestine. We scanned the upstream regions of genes expressed in the intestine and in the ASE neurons for sequences similar to the binding sites of the transcription factors ELT-2 and CHE-1, respectively, and showed that these regions are more likely to contain high-scoring matches to these binding sites than the upstream regions of other genes. To create the cisRED C. elegans database, we determined orthologues for C. elegans transcripts in C. briggsae, C. remanei, C. brenneri, C. japonica, Pristionchus pacificus, Brugia malayi and Trichinella spiralis using the WABA alignment algorithm. We pooled the upstream region of each C. elegans transcript with the upstream regions of its orthologues and identified conserved DNA sequence elements by de novo motif discovery. In total, we discovered 158,017 novel conserved motifs upstream of 3847 C. elegans transcripts for which three or more orthologues were available, and identified 82% of the 44 experimentally validated regulatory elements in the ORegAnno TFBS database. We annotated 26% of the motifs as similar to known binding sequences of transcription factors from the ORegAnno, TRANSFAC and JASPAR databases. This is the first catalogue of annotated conserved upstream elements for nematodes; it can be used to find putative regulatory elements, improve gene models, discover novel RNA genes, and understand the evolution of transcription factors and their binding sites in the phylum Nematoda. We placed the cisRED motifs into groups based on sequence similarity and identified a series of motif groups that are significantly associated with functionally related sets of genes.
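The abstract does not specify how motif similarity was measured or how groups were formed. As a minimal sketch only, assuming each motif is reduced to an uppercase consensus string, the grouping step could score each pair by its best ungapped overlap identity on either strand and then merge any pair above a cutoff (single-linkage grouping via union-find). The function names `motif_similarity` and `group_motifs`, the 0.8 identity cutoff, the 5-base minimum overlap and the example strings are hypothetical choices for illustration, not parameters or data taken from the thesis.

```python
from itertools import combinations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def best_overlap_identity(a, b, min_overlap=5):
    """Best fraction of identical bases over all ungapped placements of b
    against a that overlap by at least min_overlap positions."""
    best = 0.0
    # shift = position in a where b's first base is placed (may be negative)
    for shift in range(min_overlap - len(b), len(a) - min_overlap + 1):
        start, end = max(0, shift), min(len(a), shift + len(b))
        if end - start < min_overlap:
            continue
        matches = sum(a[i] == b[i - shift] for i in range(start, end))
        best = max(best, matches / (end - start))
    return best


def motif_similarity(a, b, min_overlap=5):
    """Hypothetical similarity: best identity on either strand of b."""
    return max(best_overlap_identity(a, b, min_overlap),
               best_overlap_identity(a, revcomp(b), min_overlap))


def group_motifs(motifs, cutoff=0.8, min_overlap=5):
    """Hypothetical single-linkage grouping: any pair scoring at or above
    cutoff ends up in the same group."""
    parent = list(range(len(motifs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(motifs)), 2):
        if motif_similarity(motifs[i], motifs[j], min_overlap) >= cutoff:
            parent[find(i)] = find(j)

    groups = {}
    for i, motif in enumerate(motifs):
        groups.setdefault(find(i), []).append(motif)
    return list(groups.values())


# Invented consensus strings for illustration only.
print(group_motifs(["GATAAG", "CTTATC", "TTTCAGAA", "GTTTCAGA", "CCCGGG"]))
# → [['GATAAG', 'CTTATC'], ['TTTCAGAA', 'GTTTCAGA'], ['CCCGGG']]
```

Single linkage keeps the sketch short but can chain loosely related motifs into one group; comparing position weight matrices column by column would likely be a more faithful similarity measure for real motifs.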
Fifteen of the motif groups are specifically associated with ribosomal protein genes. Eight of these are extensions of the canonical C. elegans trans-splice acceptor site, and two are similar to the binding sites of transcription factors in other species. One motif group was tested for regulatory function in a series of GFP expression experiments and was shown to be involved in pharyngeal expression.
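The scan of intestine- and ASE-expressed upstream regions for ELT-2- and CHE-1-like sites can be illustrated with a position weight matrix (PWM) search. This is an assumption-laden sketch, not the thesis's method: it converts hypothetical base counts into log2-odds scores against a uniform background, takes the best-scoring window on either strand of each upstream sequence, and reports the fraction of sequences whose best score reaches a cutoff. The toy counts describe a generic GATA-like `TGATAA` site and are not a published ELT-2 matrix; the function names, the cutoff of 8.0 and the example sequences are likewise invented.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_matrix(counts, background=0.25, pseudocount=0.5):
    """Turn per-position base counts (one dict per position) into log2-odds scores."""
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        matrix.append({
            b: math.log2((column.get(b, 0) + pseudocount) / total / background)
            for b in BASES
        })
    return matrix


def best_site_score(seq, matrix):
    """Highest PWM score over every window of seq, on both strands."""
    seq = seq.upper()
    width = len(matrix)
    best = -math.inf
    for strand in (seq, seq.translate(COMPLEMENT)[::-1]):
        for start in range(len(strand) - width + 1):
            window = strand[start:start + width]
            if any(b not in BASES for b in window):
                continue  # skip windows containing N or other ambiguity codes
            best = max(best, sum(col[b] for col, b in zip(matrix, window)))
    return best


def fraction_with_hit(seqs, matrix, cutoff):
    """Fraction of sequences whose best window scores at or above cutoff."""
    return sum(best_site_score(s, matrix) >= cutoff for s in seqs) / len(seqs)


# Hypothetical toy counts for a GATA-like TGATAA site (NOT a published ELT-2 matrix).
toy_counts = [
    {"T": 8, "A": 1, "C": 1},
    {"G": 10},
    {"A": 10},
    {"T": 10},
    {"A": 9, "G": 1},
    {"A": 8, "G": 2},
]
pwm = log_odds_matrix(toy_counts)

# Invented example sequences: a "tissue-expressed" set versus "other genes".
expressed_upstream = ["CCTGATAAGCC", "ATTATCAGGG"]  # second hit lies on the reverse strand
other_upstream = ["CCCCCCCCCC", "GGGGCGCGCG"]

print(fraction_with_hit(expressed_upstream, pwm, 8.0),
      fraction_with_hit(other_upstream, pwm, 8.0))  # → 1.0 0.0
```

Comparing the two fractions with a significance test (for example Fisher's exact test on the hit/no-hit counts) is one way to ask whether a gene set is more likely than other genes to carry high-scoring matches; the abstract does not state which matrices, cutoffs or test were used.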
Attribution-NonCommercial-NoDerivatives 4.0 International