UBC Theses and Dissertations

Significance of low-abundance transcripts detected in Caenorhabditis elegans muscle SAGE libraries Veiga, Mariana Barçante 2008

SIGNIFICANCE OF LOW-ABUNDANCE TRANSCRIPTS DETECTED IN CAENORHABDITIS ELEGANS MUSCLE SAGE LIBRARIES

by

MARIANA BARÇANTE VEIGA

B.Sc., University of British Columbia, 2004

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF SCIENCE

in

THE FACULTY OF GRADUATE STUDIES (Genetics)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

May 2008

© Mariana Barçante Veiga, 2008

ABSTRACT

Serial Analysis of Gene Expression (SAGE) on Caenorhabditis elegans RNA from FACS sorted embryonic body wall muscle cells has identified nearly 8000 genes expressed in nematode body wall muscle. Approximately 60% of these genes are expressed at low levels (<5 tags per library of ~50,000-100,000 tags). Low-abundance transcripts have typically been overlooked since most are considered experimental or contamination errors. Consequently, research has focused on the transcripts that are most enriched in the particular tissue of interest. Here I focus on the analysis of low-expressed transcripts in the muscle SAGE libraries in order to investigate what percentage of these are in fact expressed in muscle and are not false positives. Most well characterized C. elegans body wall muscle genes are not expressed at low levels; therefore, I anticipate that focusing on these rarely expressed genes will allow for the identification of muscle components that have been previously unrecognized.

RT-PCR was performed on RNA isolated from purified body wall muscle cells to initially estimate what fraction of the low-abundance transcripts present in the SAGE data are indeed expressed in muscle. I examined 128 genes, of which 84 were represented by a single SAGE tag. From this initial list, 38% of the low-expressed transcripts were verified for their presence in body wall muscle. Subsequently, reporter GFP fusions were used to determine whether these low-expressed transcripts are indeed expressed in vivo within muscle.
Of the low-expressed genes that tested positive via RT-PCR, 42% showed in vivo expression in body wall muscle. When the results from the RT-PCR and in vivo expression experiments are combined (38% × 42% ≈ 16%), I can extrapolate that at least 16% of low-expressed genes identified by the SAGE libraries are in fact expressed in muscle and are not false positives.

RNAi and knockout analyses were performed in order to investigate the role of low-expressed muscle genes in myofilament structure. RNAi results show that 14/34 (41%) of the genes screened had mild defects in myofilament organization. The SAGE libraries identified 6388 low-expressed transcripts; this work suggests that at least 16% (1022 genes) of these are in fact expressed in muscle and may reveal new components previously overlooked by other approaches.

TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
ACKNOWLEDGEMENTS
DEDICATION
CHAPTER I - INTRODUCTION
  1.1 Caenorhabditis elegans as a model organism
  1.2 Caenorhabditis elegans muscle
  1.3 Serial Analysis of Gene Expression
    1.3.1 What is Serial Analysis of Gene Expression?
    1.3.2 Caenorhabditis elegans muscle SAGE libraries
  1.4 Low expressed genes and singletons
  1.5 Approach taken to investigate low-expressed genes in muscle SAGE data
CHAPTER II - MATERIALS AND METHODS
  2.1 Muscle SAGE library
    2.1.1 Cell sorting for muscle cells
    2.1.2 Generating the muscle SAGE library
    2.1.3 Generation of list of low-expressed genes
  2.2 RNA isolation from FACS sorted muscle cells
    2.2.1 Whole embryo RNA isolation
    2.2.2 Muscle RNA isolation
  2.3 Reverse-Transcription Polymerase Chain Reaction
    2.3.1 Primer design
    2.3.1 Whole embryo mRNA RT-PCR
    2.3.2 Muscle mRNA RT-PCR
  2.4 Promoter fusions
    2.4.1 PCR stitching to construct promoter fusions
    2.4.2 Microinjection procedure
    2.4.3 Microscopic analysis
  2.5 Mutant and RNAi analysis
    2.5.1 Knockout strains analysis
    2.5.2 RNAi analysis
CHAPTER III – RESULTS
  3.1 The muscle SAGE libraries
    3.1.1 Presence of low-expressed genes in Muscle Library 2
    3.1.2 Gene Ontology annotation of singletons
  3.2 Verification of low-abundance transcript expression in muscle cells
  3.3 The use of promoter fusions to verify in vivo expression of low-expressed genes in body wall muscle
  3.4 Role of low-expressed genes in muscle cells
    3.4.1 Knockout analysis
    3.4.2 RNAi analysis
CHAPTER IV - DISCUSSION
  4.1 16% of all low-abundance SAGE tags are indeed expressed in muscle
  4.2 Depleting low expressed genes in muscle has a minor effect on myofilament integrity
CHAPTER V – CONCLUSION
REFERENCES

LIST OF TABLES

Table 1. Specifications of thermocycler program used for RT-PCR reaction
Table 2. Summary of RT-PCR results
Table 3. Summary of promoter::GFP fusion data
Table 4. Summary of knockout strain analysis and their corresponding RNAi phenotype
Table 5. Summary of data for genes that were verified by RT-PCR

LIST OF FIGURES

Figure 1. A C. elegans sarcomere
Figure 2. Overview of C. elegans muscle
Figure 3. Schematic of Serial Analysis of Gene Expression procedure
Figure 4. Comparison between numbers of genes identified by SAGE and LongSAGE C. elegans embryonic libraries
Figure 5. Tag counts in muscle SAGE libraries for some known muscle genes
Figure 6. Illustration of fluorescent activated sorting of muscle cells
Figure 7. Summary of isolation of muscle cells for generating muscle SAGE library
Figure 8. Titration reactions
Figure 9. Titration reactions
Figure 10. PCR stitching protocol
Figure 11. Summary of genes expressed in body wall muscle as defined by SAGE
Figure 12. Summary of low-expressed genes expressed in body wall muscle as defined by SAGE
Figure 13. Relationship between sequence quality score and number of genes identified in Muscle Library 2 (SW031)
Figure 14. Distribution of tag counts in Muscle Library 2 (SW031)
Figure 15. GO annotations of singletons versus entire genome
Figure 16. RT-PCR reaction products
Figure 17. Promoter::GFP fusions showing expression in body wall muscle and other tissues
Figure 18. Examples of myofilament structure from knockout strains
Figure 19. Observed myofilaments defects in RNAi experiment

LIST OF ABBREVIATIONS

%        percent
°C       degrees Celsius
µL       microlitre
AE       anchoring enzyme
bp       base pair
cDNA     complementary DNA
CGC      Caenorhabditis Genetics Centre
DEPC     diethylpyrocarbonate
dH2O     distilled water
DL       dorsal left
DNA      deoxyribonucleic acid
Dpy      Dumpy
DR       dorsal right
dsRNA    double stranded ribonucleic acid
EDTA     ethylenediaminetetraacetic acid
e-PCR    electronic polymerase chain reaction
EST      expressed sequence tag
FACS     fluorescent activated cell sorter
g        g-force
g        grams
GFP      green fluorescent protein
GO       gene ontology
IPTG     isopropyl-beta-D-thiogalactopyranoside
kb       kilo base pairs
KO       knockout
KOH      potassium hydroxide
L1       larval stage 1
L4       larval stage 4
LB       Lennox broth
M9       minimal media salt solution 9
MHC      myosin heavy chain
min      minutes
mL       millilitre
mm       millimetres
mRNA     messenger ribonucleic acid
NaCl     sodium chloride
ng       nanogram
NGM      nematode growth medium
PCR      polymerase chain reaction
RNA      ribonucleic acid
RNAi     RNA interference
RT-PCR   reverse transcription-PCR
s        seconds
SAGE     Serial Analysis of Gene Expression
Taq      Taq polymerase
TE       tagging enzyme
UK       United Kingdom
unc      uncoordinated
USA      United States of America
UTR      untranslated region
VL       ventral left
VR       ventral right
WT       wild type

ACKNOWLEDGEMENTS

Being a graduate student was not an easy task and without the help of certain people it would have been impossible to complete my Master's degree. There are many people that I would like to thank for their support, mentorship, guidance, and assistance in generating the data that led to this thesis.

First and foremost I would like to thank my supervisor Dr. Donald Moerman. Don has helped me grow as a scientist and as a critical thinker through these last three years. Through Don’s knowledge and expertise I learned much about C. elegans, and if it weren’t for him I would never have pictured myself working with worms.
Thank you Don for not only your professional help but also for being a great person and developing a wonderful supervisor-student relationship. Through my Masters Don has also helped me learn many things about myself and grow on a personal level.   I would like to thank my committee Dr. Calvin Roskelley and Dr. Christopher Lowen who were extremely helpful in providing advice about my project and guiding me in the right direction. Thank you to Dr. Pamela Hoodless for taking time to be part of my examining committee. During my undergraduate years there were a few professors that triggered my love for science and genetics, it is because of them that I knew I would someday pursue graduate studies. I would like to thank these professors, Dr. Anthony Glass, Dr. Hugh Brock, Dr. Fabio Rossi and Dr. Jennifer Klenz. Dr. Klenz was key in me getting my first job after undergrad at the C. elegans knockout consortium. It was working at the C. elegans knockout consortium that I met Don and that is how this thesis came to be.   I want to thank all the members of the Moerman lab. Thank you for all the advice and support, and most importantly for the friendships. In particular I would like to thank Adam Lorch, the lab’s bioinformatician. If it weren’t for Adam’s help the work presented here would have not been possible. Adam, thank you for all the help with gathering data, thank you for all the emails, thank you for discussing with me my results and thank you for helping me even on weekends! Thank you to Dr. Barbara Meissner for teaching me how to generate transgenic worms and how to perform microinjections. Thank you to Rick Zapf for all the help with FACS sorting GFP muscle cells. Thank you to Dr. Teresa Rogalski for teaching me worm handling techniques and helping me with strain maintenance. Thank you to Adam Warner for teaching me about RNAi and for all your support. Thank you to Dr. Aruna Somasiri for being a great mentor and inspiring me. 
Aruna, thank you for your patience when teaching me how to work with proteins during my directed studies project. Thanks to Teresa, Aruna, Adam Lorch and Adam Warner for the feedback on this thesis.

Lastly I would like to thank my family and friends. Thank you to my parents for all their love and support, through good and hard times. My parents have always been my role models and I am the person that I am today because of them. The intelligence that you both have awes me and your big hearts inspire me. Thank you to my brother Victor for being the wonderful person that he is. Thank you to my family and friends for putting up with me during stressful times! My friends Dina Anastas, Tanja Teofilovic and Marie-Luise Ermisch were key in helping me maintain my sanity when nothing in the lab was going my way! Thank you to my grandparents for all their love and for also helping mold me into who I am today. My greatest appreciation goes to my husband Mauricio Blanco. Mauricio has been by my side every day since the very beginning and only he knew how to lift me up during times when I was unmotivated or angry. He always helped me put everything into perspective and always gave me his unconditional love and support. Thank you Mauricio for being my rock when I needed one but also for sharing and celebrating all the joys with me.

DEDICATION

This thesis is dedicated to my loving husband and my amazing parents

CHAPTER I – INTRODUCTION

Scientists have been working for many years to understand the biological processes that keep an organism alive. This impressive undertaking has led to the revelation that precise coordination of gene expression is essential for a single cell to develop into a multicellular organism. Deciphering which genes are turned on at each developmental stage and in each particular tissue continues to be an important step towards comprehending development.
Looking at a single gene at a time has been a daunting task for discovering all the players involved in cellular function, composition and maintenance. The sequencing of genomes from a variety of organisms has allowed for the application of genome-wide experiments where the expression of thousands of genes can be analyzed simultaneously.

Our lab focuses on looking at genes expressed in muscle. The ultimate goal is to understand how a muscle cell is formed, how it functions and how it is maintained. By studying the dynamics of a muscle cell we hope to shed light on causes and potential treatments for muscular diseases. Laing and Nowak (2005) report that approximately 20 human skeletal muscle disorders are caused by mutations in sarcomeric proteins. Investigating how sarcomeric proteins interact and function is essential for understanding the underlying biological basis of muscle disorders.

Serial Analysis of Gene Expression (SAGE) has aided in developing the transcription profile of a cell (McKay et al., 2003; Velculescu et al., 1995). SAGE has allowed us to determine which genes are being expressed at particular stages in development by revealing the mRNA transcripts present in each specific stage. In this work, I examine genes identified in a SAGE experiment carried out on Caenorhabditis elegans embryonic muscle cells (Moerman lab, unpublished data).

Caenorhabditis elegans is a soil nematode found in most parts of the world. C. elegans is widely used to study behaviour, cell cycle, development and functions associated with distinct tissue types such as gut, pharynx, muscle and neurons. The work done in C. elegans has greatly broadened the developmental biology field as researchers identify genes involved in the formation and function of each specific tissue.

1.1 Caenorhabditis elegans as a model organism

Sydney Brenner established the use of the soil nematode C.
elegans as a model organism in the mid-1960s, when he used the worm to study development and behaviour (Brenner, 1974). C. elegans possesses many features that make it attractive as a model organism. It has a short generation time of three days, it can mate or self-fertilize, it is inexpensive to maintain, and it has a transparent body, making it very easy to visualize internal structures. The complete cell lineage for this organism has been determined (Sulston and Horvitz, 1977; Sulston et al., 1983), many knockout mutants are available, and the worm contains various tissue types that can be easily isolated.

In 1998, the C. elegans genome sequence was completed (Consortium, 1998). The C. elegans genome has approximately 20,000 protein coding genes (Hillier et al., 2005) and just over 40% of these have human homologues (Ahringer, 1997). Currently WormBase (www.wormbase.org) lists 23,693 gene sequences, including splice variants. The availability of a sequenced genome proved to be an incredible resource and has influenced how gene expression is studied, moving from single gene analysis to genome-wide gene expression profiling.

The worm is now widely used as a genetic tool in a high-throughput manner. Many RNAi screens have been performed on C. elegans since dsRNA can be injected or fed (Fire et al., 1998; Kamath et al., 2003; Timmons and Fire, 1998). The Gene Knockout Consortium has generated thousands of mutants which are publicly available. Transgenic worms can be easily generated by micro-injecting DNA constructs into the worm’s gonads (Mello et al., 1991). Fusions using tissue-specific promoters driving Green Fluorescent Protein (GFP) can be used to isolate specific cell types via Fluorescent Activated Cell Sorting (FACS) (Dupuy et al., 2004; Hunt-Newbury et al., 2007; McKay et al., 2003).
Global gene expression experiments such as Serial Analysis of Gene Expression and microarray have been popular techniques used in the worm (Fox et al., 2007; Hill et al., 2000; Holt and Riddle, 2003; Jones et al., 2001; McGhee et al., 2007; Reinke et al., 2000).

1.2 Caenorhabditis elegans muscle

Our lab focuses on the study of muscle by examining the contractile repeating unit of muscle cells, the sarcomere. Sarcomeres consist of myosin (thick filaments), actin (thin filaments) and the attachment complexes anchoring these filaments (Figure 1). The structures that secure the thick filaments are known as M-lines, and the structures securing the thin filaments are known as dense bodies (analogous to Z-disks in vertebrates). A sarcomere is delineated as the distance between two dense bodies (Figure 1) (Moerman and Williams, 2006).

Figure 1. A C. elegans sarcomere
Thick filaments (represented in grey) and thin filaments (represented in black) are interdigitated, allowing myosin to move along actin and generate muscle contraction. Filaments are anchored to the basal lamina via dense bodies and M-lines. Force is transmitted to the basal lamina by the M-line and dense body attachment complexes; in turn, force is transmitted to the cuticle via intermediate filaments. Figure used with permission from Moerman and Williams (2006).

The adult worm possesses 95 body wall muscle cells organized into four quadrants: ventral left (VL), ventral right (VR), dorsal left (DL) and dorsal right (DR). VR, DL, and DR each have 24 muscle cells, while VL has 23 muscle cells (Sulston and Horvitz, 1977). Of these 95 cells, 81 cells are already formed before the embryo hatches, while the remaining cells mature after hatching (Sulston et al., 1983). Body wall muscle cells are arranged side by side along the hypodermis. C. elegans muscle cells are mononucleated and, unlike vertebrate muscle cells, do not fuse to each other to form multinucleated myotubes.
Thick and thin filaments are interdigitated in the sarcomere, allowing myosin heads to move along actin filaments (Figure 1). The movement of myosin along actin filaments causes tension at the cell attachment sites and thus the muscle cell contracts. Muscle cells are anchored to one another laterally via attachment plaques and through interactions with the hypodermis (Hresko et al., 1994). Dense bodies and M-lines are the vehicles for anchoring sarcomeres to the muscle cell membrane. Anchoring of dense bodies and M-lines to the muscle cell membrane involves an array of proteins similar to those found in vertebrate focal adhesions. In turn, fibrous organelles, structures analogous to vertebrate hemidesmosomes, connect the cell membrane to the hypodermis and the hypodermis to the cuticle (Figure 1) (Francis and Waterston, 1991; Hresko et al., 1994). When a muscle cell contracts, force is transmitted laterally to the cuticle via these attachment complexes, leading to sinusoidal movement of the animal. Dense bodies and M-lines are offset by 5-7° from the longitudinal axis (Figure 2), which is thought to contribute to the animal’s sinusoidal movement (Mackenzie and Epstein, 1990).

Our ultimate goal is to uncover all the genes that are being expressed in a muscle cell and understand how their products interact to form a sarcomere, to carry out muscle function and to maintain muscle stability.

Figure 2. Overview of C. elegans muscle
This image provides an overview of filament arrangement within a muscle cell. The A-band spans the area where the thick and thin filaments overlap. The I-band is the area containing only thin filaments spanning both sides of the dense body. Dense bodies and M-lines are offset by 5-7° from the longitudinal axis (Mackenzie and Epstein, 1980). Figure modified from Altun and Hall (2005).

1.3 Serial Analysis of Gene Expression

Information about the transcriptome will greatly benefit our understanding of how genes interact to regulate a single cell, a tissue, an organ or indeed the whole organism. The complexity of analysing the transcriptome increases as we tackle the issue of time and locality of gene expression. Since a cell contains thousands of transcripts, identifying only a few at a time is not an efficient approach if our goal is to eventually detect all of these transcripts.

Colony hybridization (Yamamoto et al., 1983) and cDNA subtractive hybridization (Kavathas et al., 1984) are techniques that were used in the past to compare gene expression between cells in different states; for example, genes expressed in tumor cells vs normal cells. These hybridization techniques are very time consuming and low-throughput, resulting in few genes being identified (Kavathas et al., 1984; Yamamoto et al., 1983). A new approach was needed to identify more transcripts, more rapidly.

To study spatial and temporal distribution of gene expression, methods such as northern blotting, RT-PCR and in-situ hybridization had to be employed. Again, these low-throughput approaches are limited by the number of genes that can be examined at a time. Large-scale efforts are being made to provide publicly available data examining patterns of gene expression. In the C. elegans community, Yuji Kohara has generated a database of in-situ hybridization images for C. elegans genes (http://nematode.lab.nig.ac.jp/). In the mouse community, the Allen Brain Atlas has been providing data on in-situ hybridization performed on genes expressed in the mouse brain (http://www.brain-map.org) while www.genepaint.org provides in-situ hybridization data on gene expression in additional tissues as well as in the mouse embryo.
The use of expressed sequence tags (ESTs) for transcript identification became popular before complete genome sequences were available, and ESTs continue to be widely used. An EST is a segment of sequenced cDNA from a cDNA library. An EST is generated by a single sequence read from either or both ends of the cDNA. Thousands of genes in the human genome have been identified by ESTs (Adams et al., 1992; Adams et al., 1991); however, this number appears to have reached a plateau (Chen et al., 2002). EST data provide only a partial picture of the transcriptome since they do not offer much information about the abundance of a transcript. Expressed sequence tags are typically a few hundred bases in length. Therefore, due to sequencing and labour costs, it is not realistic for every clone in a cDNA library to be sequenced. As a consequence, highly expressed genes mask low-expressed genes in the EST data set (Sun et al., 2004).

The availability of thousands of ESTs facilitated the development of new high-throughput technologies (Nowak, 1995). Much more genomic sequence became available and initiated a shift from investigating a few genes at a time to looking at thousands of genes simultaneously. Microarray technology boomed after 1995, when Schena et al. (1995) developed an array that used fluorescent labelling to quantify gene expression. In the same issue of Science in which Schena et al. (1995) published their microarray experiment, Velculescu et al. (1995) published an article on a new methodology for looking at gene expression called Serial Analysis of Gene Expression. Nowadays, with access to fully sequenced genomes, microarray and SAGE are the two most widely used high-throughput methodologies to observe gene expression at a genomic level.

1.3.1 What is Serial Analysis of Gene Expression?

SAGE is a high-throughput technology that allows detection of the expression levels of all transcripts in a cell at a given time point.
SAGE is based on the creation of short sequence tags (9-14 base pairs or 21 base pairs for LongSAGE) from cDNA, which can then be mapped back to the genome sequence to uniquely identify the transcript that gave rise to the tag (Figure 3). SAGE is efficient because the tags generated are concatenated and thus, unlike ESTs, many tags can be sequenced from a single clone (Pleasance et al., 2003; Velculescu et al., 1995).

Figure 3 visually describes the SAGE methodology. Double-stranded cDNA is first synthesized from mRNA using a biotinylated oligo(dT) primer. The cDNA is then cleaved with a restriction enzyme that has a 4 base pair (bp) recognition site (called the anchoring enzyme, AE). On average, this enzyme cleaves approximately every 256 bp (4^4 = 256 for a 4 bp site in random sequence). Since most transcripts are longer than 256 bp, it is expected that the anchoring enzyme will cut every transcript at least once. The 3’ ends of the transcripts are then recovered by binding to streptavidin beads. Streptavidin beads bind the biotin that is now incorporated into the poly-T tail of the cDNA. The sample is then divided in half and each half is ligated to a “linker”. The linker contains a tagging enzyme (type IIS enzyme) recognition site. The tagging enzyme (TE) recognizes its site on the linker and cuts up to 20 bp away (dependent on the TE used), causing the cDNA to be released from the streptavidin bead. The tagging enzyme used determines the tag length (typically 9 bp to 17 bp). The total tag length is the sum of the 4 bp AE site plus the number of base pairs the tagging enzyme leaves behind before cutting. Blunt ends are then generated, and the two samples are mixed together and ligated to form ditags. The ligated ditags are PCR amplified using primers specific to each linker. The PCR products are cleaved with the anchoring enzyme to release the linker from either end of the ditag, generating overhanging sticky ends that can base pair with the complementary sticky end of another ditag.
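
The tag-generation steps above can be mimicked in silico. The following is a minimal sketch, not the pipeline used for the muscle libraries: it assumes the anchoring enzyme recognizes CATG (the NlaIII site commonly used in SAGE protocols; the text here gives only the site length), it treats the input as the sense strand of a cDNA read 5' to 3', and it ignores poly-A trimming and sequencing errors. The function names `expected_cut_spacing` and `extract_tag` are hypothetical.

```python
from typing import Optional

ANCHOR_SITE = "CATG"  # assumption: NlaIII-style 4 bp anchoring enzyme site
TAG_LENGTH = 21       # LongSAGE: 4 bp anchoring site + 17 bp downstream


def expected_cut_spacing(site_length: int) -> int:
    """Average distance between occurrences of a site of this length,
    assuming random sequence with equal base frequencies."""
    return 4 ** site_length


def extract_tag(cdna: str, site: str = ANCHOR_SITE,
                tag_length: int = TAG_LENGTH) -> Optional[str]:
    """Return the tag that starts at the 3'-most anchoring site.

    Returns None when the transcript has no anchoring site, or when the
    3'-most site lies too close to the 3' end to yield a full-length tag
    (a simplification made for this sketch).
    """
    cdna = cdna.upper()
    pos = cdna.rfind(site)  # 3'-most occurrence of the site
    if pos == -1:
        return None
    tag = cdna[pos:pos + tag_length]
    return tag if len(tag) == tag_length else None


if __name__ == "__main__":
    # Hypothetical transcript with two anchoring sites; only the 3'-most counts.
    cdna = "ATGGC" + "CATG" + "TTTACCGGAT" + "CATG" + "ACGTACGTTAGCCATTG" + "ACAAAAAA"
    print(expected_cut_spacing(4))
    print(extract_tag(cdna))                 # 21 bp LongSAGE-style tag
    print(extract_tag(cdna, tag_length=14))  # 14 bp short tag
```

Only the 3'-most site is used because, after streptavidin capture, that is the fragment that remains attached to the bead.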
All ditags generated are concatenated to produce one long string of tags, and then introduced into a vector for cloning and sequencing. As a consequence of concatenation, clones contain 10-50 tags (Velculescu et al., 1995; Yamamoto et al., 2001). In our SAGE libraries each clone had approximately 40 tags. Using the sequenced genome, tags are then annotated to their gene of origin. The number of tags observed for a gene is directly proportional to the abundance of the gene’s transcript in the cell. SAGE results in both quantitative and qualitative data.

SAGE is being widely used for transcription profiling and for new gene discovery. SAGE has been applied to a variety of organisms, including yeast, worm, fly, mouse and humans (Divina and Forejt, 2004; Gorski et al., 2003; Jones et al., 2001; Reinke et al., 2000; Velculescu et al., 1999; Velculescu et al., 1997). Although SAGE depends on the availability of a partially or fully sequenced genome, no prior knowledge of the transcripts is needed since there is no requirement for probe design (Pleasance et al., 2003; Velculescu et al., 1995). Tags that have not been matched to known or predicted genes have served as a source of new gene discovery (Chen et al., 2002; Lee et al., 2005).

In the nematode C. elegans, SAGE is being used to generate a series of transcription profiles. This is possible because tissue-specific libraries, such as the ciliome library (Chen et al., 2006), the intestine library (McGhee et al., 2007), the germline library (Reinke et al., 2000), the neuronal libraries (Etchberger et al., 2007), and the muscle library (Moerman, unpublished data) are available. There are also SAGE libraries for specific developmental stages (Jones et al., 2001; McKay et al., 2003). Having access to multiple libraries allows for the comparison of changes in gene expression patterns between the different tissue types and developmental stages.

Figure 3.
Schematic of Serial Analysis of Gene Expression procedure. In the case of the muscle LongSAGE libraries, the anchoring enzyme used was NlaIII and the tagging enzyme used was MmeI, which gave rise to 21 bp long tags. See text for further details on the SAGE methodology. Based on Velculescu et al. (1995).

1.3.2 Caenorhabditis elegans muscle SAGE libraries

Our lab, in collaboration with Canada’s Michael Smith Genome Sciences Centre (Vancouver, Canada), generated two body wall muscle SAGE libraries. We used a C. elegans strain carrying a myo-3::GFP functional fusion construct, which results in GFP expression in the myosin filaments of body wall muscle. myo-3 codes for myosin heavy chain-A, which is located in the centre of thick filaments. Muscle cells begin expressing myo-3 at approximately the 1.5-fold embryo stage (Waterston, 1989). Worm embryos were ruptured and GFP-tagged muscle cells were isolated using a fluorescence-activated cell sorter (FACS) (McKay et al., 2003). The purity of such cell preparations is about 95%.

One of the limits to the usefulness of SAGE is tag specificity. Since SAGE tags are usually short, 9-14 bp in length, there are instances when multiple genes share the same tag due to an overlap in sequence. To address this issue of ambiguous tag mapping, LongSAGE was used to generate the muscle SAGE libraries. In LongSAGE the tagging enzyme MmeI produces 21 bp tags (4 bp AE site plus 17 bp tag) (Saha et al., 2002). Sequencing cost is a major factor in choosing between shorter SAGE tags (14 bp), which allow more tags to be sequenced, and longer SAGE tags (21 bp), which increase tag specificity but decrease overall tag numbers. When SAGE data are analysed, ambiguous tags are removed; therefore, having more short tags does not mean that more genes will be identified (McKay et al., 2003; Pleasance et al., 2003; Saha et al., 2002). Figure 4 shows an example comparing two C.
elegans embryonic SAGE libraries, one using SAGE tags of 14 bp and another using LongSAGE with tags of 21 bp. The LongSAGE library identified 1,416 more genes than the conventional SAGE library. Pleasance et al. (2003) estimate that a C. elegans SAGE library generated from 14 bp tags would result in ~12% of genes in the whole genome giving rise to ambiguous tags, whereas a SAGE library using 17 bp tags would result in ~6.5% of genes giving rise to ambiguous tags. LongSAGE libraries have fewer ambiguous tags; hence fewer tags are filtered out of the data set and more genes are identified (figure 4). The resolving power of LongSAGE was the attractive feature that made us choose this method for generating the muscle SAGE libraries.

Figure 4. Comparison between numbers of genes identified by SAGE and LongSAGE C. elegans embryonic libraries. [Venn diagram: embryo SAGE library (SWN21) only, 2107 genes; both libraries, 8004 genes; embryo LongSAGE library (SWN22) only, 3523 genes.] The LongSAGE library identified 1416 more genes than the regular SAGE library. LongSAGE increases tag specificity; as a consequence, more tags are unambiguously annotated. WormBase release used for mapping tags = WS180, sequence quality = 0.99.

Other limitations of SAGE arise from: 1) sequencing errors; 2) mRNA that does not contain the anchoring enzyme recognition site (presently 500 transcripts in the C. elegans genome do not possess the cut site of the AE used for the muscle SAGE); and 3) the inability to distinguish which splice variant is expressed, as all splice forms of a gene may share the same tag. A major drawback of SAGE compared to microarrays is the sequencing cost; however, as DNA sequencing technology improves, sequencing costs will continue to decline. In contrast, SAGE has advantages over microarrays in that no probe design is needed and it can detect very low-abundance transcripts that could otherwise be difficult to distinguish from background noise in a microarray.
Since no previous knowledge of the transcriptome is required, SAGE has great potential for identifying new genes (Chen et al., 2002; Lee et al., 2005; McKay et al., 2003). In recent years, the availability of complete genome sequences has allowed the development of whole genome tiling arrays, where array-based hybridization is used to scan the genome. This array method also has the potential to discover new genes without previous knowledge of transcripts (Biemar et al., 2006). Combining data from microarray and SAGE experiments could be a powerful tool for increasing confidence in the transcripts identified. Although SAGE recognizes many transcripts that are expressed at low levels, most attention is given to genes of high abundance.

1.4 Low expressed genes and singletons

Genes with few tags annotated to them are considered low-expressing because, as previously mentioned, the number of tags annotated to a gene is proportional to the abundance of transcripts present in the cell. Many low-expressed genes are eliminated from the SAGE data set because, when performing SAGE analysis, a minimum tag count cut off value is usually established to decrease the number of false positive tags. This cut off value is typically 3-5 tags for a library in the 100,000 tag size range (Chen et al., 2006; McGhee et al., 2007). Cut off values vary depending on the depth (total number of tags) of the SAGE library (Lash et al., 2000). In this thesis, a low-expressed gene is defined as a gene that has fewer than 5 tags annotated to it. If a gene has only one unique tag annotated to it, then that specific tag is considered a singleton. Low-abundance transcripts (<5 tags) have been shown to account for over 50% of genes identified in SAGE libraries (Etchberger et al., 2007; Holland, 2002; Lee et al., 2005; McGhee et al., 2007).
The percentage of genes considered low-expressed and the number of tags set as the cut off value will also depend on the quality of the sequence read. When analyzing SAGE data it is possible to set stringent filters that only allow tags with very low probability of sequencing errors to be used for gene annotation (section 3.1 in this thesis further addresses the issue of sequence quality). Many novel unmatched tags display low levels of expression (Chen et al., 2002; Kim et al., 2006; Lee et al., 2005). When unmatched tags are removed from the data set, then what do the other low-expressed transcripts represent? Some of these transcripts are indeed low-expressed genes present in the tissue being analyzed. Others can be false positives due to experimental errors, such as sequencing and annotation errors arising during SAGE procedure. False positives can also come from sample contamination where the gene is detected due to another cell type being present in the tissue specific cell sample used to extract RNA.

In this work I focus on the low-expressed genes detected in the muscle SAGE libraries. Many currently known muscle genes are not in the low-abundance category in the muscle SAGE libraries. For example, the myosin heavy chain (MHC) and paramyosin genes are expressed at high levels since these are the major components of thick filaments (figure 5). In the embryo, unc-54 (myosin heavy chain-B) is the minor myosin component as compared to myo-3 (myosin heavy chain-A) (Waterston, 1989). Gradually through development unc-54 becomes the major muscle myosin. Components of the attachment complex responsible for anchoring dense bodies to the basal lamina show mostly intermediate-abundance expression (figure 5). Since many highly expressed body wall muscle genes have been well studied, examining low-abundance genes may reveal novel muscle components.
My aim was to examine what percentage of low-expressed genes are not false positives and what roles they might perform within the developing muscle tissue. If these genes have non-essential roles in muscle cell formation and maintenance, it is possible that they were previously overlooked by genome-wide screens due to subtle mutant phenotypes.

Figure 5. Tag counts in muscle SAGE libraries for some known muscle genes. These well-studied muscle genes are expressed at a higher abundance than low-expressed genes. The transcription factor hlh-1 is the gene with the lowest number of tags, with 2 tags in library 1 and 10 tags in library 2. Examining low-expressed genes may reveal genes that were never detected before. myo-3 has a very high tag count since the strain used as a source for muscle RNA carries an extrachromosomal array containing myo-3.

1.5 Approach taken to investigate low-expressed genes in muscle SAGE data

The first step taken to investigate low-expressed genes identified in the muscle SAGE libraries was to validate their expression in muscle in vitro via reverse transcription-PCR (RT-PCR). Next, to observe in vivo gene expression, promoter::GFP fusions were generated for the genes that were validated via RT-PCR. In vivo validation also allowed for the elimination of positives in the in vitro experiment that were originally identified due to contamination by other cell types in FACS sorted muscle cells. Finally, RNAi and knockout mutant analysis was performed on the low-expressed genes confirmed via RT-PCR. Using RNAi and knockout analysis it was possible to observe the importance of these low-expressed genes in myofilament organization. Based on the loss-of-function observations, 14 of the 34 genes tested showed ≥50% penetrance of the myofilament disorganization phenotype. I found that at least 16% of the low-expressed genes tested are in fact expressed in muscle.
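As a rough check on the scale this implies, the 16% validation rate can be applied to the 6388 low-expressed genes (<5 tags) reported in section 3.1. The sketch below is illustrative only: the function name is hypothetical, and it assumes that the rate measured on the tested subset applies uniformly to every low-expressed gene in the libraries.

```python
def extrapolate_expressed(low_expressed_total: int,
                          validated_fraction: float) -> int:
    """Scale a validation rate measured on a tested subset up to the
    full set of low-expressed genes (assumes the subset is representative)."""
    if not 0.0 <= validated_fraction <= 1.0:
        raise ValueError("validated_fraction must lie between 0 and 1")
    return round(low_expressed_total * validated_fraction)


# 6388 low-expressed genes (section 3.1) at a 16% validation rate
estimated_muscle_genes = extrapolate_expressed(6388, 0.16)
```

Because 16% is described as a minimum, the resulting figure is best read as a lower-bound estimate rather than a count.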
If this trend holds true for all the low-expressed genes identified in the SAGE libraries, it means that approximately 1022 low-expressed genes are in fact present in muscle.

CHAPTER II - MATERIALS AND METHODS

2.1 Muscle SAGE library

2.1.1 Cell sorting for muscle cells

In order to obtain RNA from muscle I first had to isolate muscle cells from embryos via fluorescence-activated cell sorting (FACS). This was achieved by using the strain RW1596, which carries an extrachromosomal array of myo-3::GFP and therefore expresses GFP in its body wall muscle cells (this strain is described in more detail under “Strains used” in the RNAi analysis section) (McKay et al., 2003). RW1596 worms were grown on twenty-four 15 cm nematode growth medium (NGM) agar plates supplemented with 8x peptone and the bacterium E. coli (χ1666 strain) for 2 or 3 days, until worms became gravid adults. Five to six plates were washed at a time with 50 mL of autoclaved distilled water (dH2O) into a 50 mL polypropylene tube. Tubes were centrifuged, water was aspirated to just above the worm pellet, and autoclaved dH2O was added to the 45 mL mark. Subsequently another spin was performed and once again the water was aspirated to just above the worm pellet. Each tube was then treated with 25 mL of hypochlorite solution (75% dH2O, 20% sodium hypochlorite, 5% 10 N KOH) for approximately 6.5 minutes while being shaken. Tubes were spun, the solution was aspirated to just above the pellet, and 20 mL of hypochlorite solution was added to each tube for approximately 2.5 minutes, until no more animal carcasses were visible in the solution. After centrifuging the tubes, the resulting embryos were rinsed three times with minimal salts (M9) buffer to eliminate any trace of hypochlorite solution. Thereafter, the tubes were chilled on ice and 2 mL of egg buffer (6.896 g NaCl, 3.578 g KCl, 0.294 g CaCl2.2H2O, 0.406 g MgCl2.6H2O, 5.98 g HEPES, per litre of water, adjust osmolarity to 3.40 then pH to 7.3) was added to each tube.
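For readers who prefer concentrations to weights, the egg buffer recipe above can be converted to millimolar values. The sketch below is a convenience calculation, not part of the original protocol; the molar masses are standard values supplied here as assumptions, and the variable and function names are hypothetical.

```python
# Standard molar masses in g/mol (supplied here, not taken from the thesis)
MOLAR_MASS_G_PER_MOL = {
    "NaCl": 58.44,
    "KCl": 74.55,
    "CaCl2.2H2O": 147.01,
    "MgCl2.6H2O": 203.30,
    "HEPES": 238.30,
}

# Grams per litre, as listed in the egg buffer recipe
EGG_BUFFER_G_PER_L = {
    "NaCl": 6.896,
    "KCl": 3.578,
    "CaCl2.2H2O": 0.294,
    "MgCl2.6H2O": 0.406,
    "HEPES": 5.98,
}


def millimolar(grams_per_litre: float, molar_mass: float) -> float:
    """Convert a concentration in g/L to mmol/L."""
    return 1000.0 * grams_per_litre / molar_mass


egg_buffer_mM = {
    name: round(millimolar(grams, MOLAR_MASS_G_PER_MOL[name]), 1)
    for name, grams in EGG_BUFFER_G_PER_L.items()
}
```

Rounded to whole numbers, the recipe corresponds to roughly 118 mM NaCl, 48 mM KCl, 2 mM CaCl2, 2 mM MgCl2 and 25 mM HEPES.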
The embryonic chitin shell was digested by incubation in 0.5 units of chitinase at room temperature for one hour while on a rotator. Embryos were then dissociated using a syringe (3 mL, 21G 1 1/2, latex free, 0.8 mm x 40 mm) containing 1.5 mL of egg buffer, pipetting up and down about 20 times to ensure that all cells were separated. Cells were then passed through a 5 µm Millex-SV filter and pelleted by gentle centrifugation. Isolated cells were re-suspended in ice-cold egg buffer and maintained on ice in preparation for sorting.

Cells were sorted using a BD FACSAria equipped with a 488 nm argon laser and a GFP filter set. First, WT non-GFP cells were passed through the FACS sorter in order to identify the population of autofluorescent cells. Next, the scatter plot from cells of RW1596 embryos (described above) was compared to the scatter plot of the WT cells in order to identify the location of the GFP cell population (figure 6). Cells that showed GFP expression were selected by placing a gate, P3, around their location to direct the machine to isolate cells falling into the P3 area (figure 6). A typical sort yielded approximately 400,000 GFP muscle cells. Sort purity was gauged by examining an aliquot of cells with a fluorescence microscope. Sorts were typically 95% pure (95/100 cells counted) (Christensen et al., 2002; Fox et al., 2007; McKay et al., 2003). Moving the P3 gate further to the right to make sorting more stringent did not increase the purity of the sort. Since the embryonic muscle cells do not survive very long, sorts were carried out for a maximum period of two hours. The P3 gate was kept in the position where the greatest number of GFP cells could be obtained while maintaining the 95% purity of the sample.

Figure 6. Illustration of fluorescence-activated sorting of muscle cells. A) Images of an embryo expressing the myo-3::GFP construct and of FACS sorted embryonic muscle cells.
The top row shows images taken using Nomarski microscopy and the bottom row shows the GFP expression of muscle cells (95% of sorted embryo cells show GFP expression). B) Scatter plot of RW1596 embryo cells being sorted. Area P3 contains muscle GFP positive cells. [Panel labels: WT non-GFP cells scatter plot; RW1596, containing GFP muscle cells, scatter plot; GFP population.]

2.1.2 Generating the muscle SAGE library

Canada’s Michael Smith Genome Sciences Centre (Vancouver, Canada) was responsible for isolating RNA from GFP FACS sorted muscle cells (provided by our lab) and performing two LongSAGE experiments with the muscle RNA (McKay et al., 2003). The anchoring enzyme used to generate the LongSAGE library was NlaIII and the tagging enzyme was MmeI (figure 3) (McKay et al., 2003; Saha et al., 2002). The resulting tags were 21 bp in length.

The two LongSAGE libraries were biological replicates: SWEM1 (library 1) and SW031 (library 2). The total numbers of SAGE tags present in each trial were:

Library 1 (SWEM1): 49,655 tags (sequence quality = 0; raw tag count); 33,827 tags (sequence quality = 0.99)
Library 2 (SW031): 120,825 tags (sequence quality = 0; raw tag count); 89,561 tags (sequence quality = 0.99)

Sequence quality was used to filter out tags that are more likely to contain a sequencing error. A sequence quality filter of 0.99 means that any tag for which there is a greater than 1 in 100 chance of a base being incorrectly sequenced is filtered out (see section 3.1 for further information).

In these biological trials, 88% of tags were similarly expressed, with a correlation coefficient of 0.94 (Moerman lab, unpublished data). The steps described thus far, leading up to the production of muscle RNA for the SAGE experiment, are summarized in figure 7.

Figure 7.
Summary of isolation of muscle cells for generating muscle SAGE library

2.1.3 Generation of list of low-expressed genes

A list of all the genes (cds) identified by either one of the SAGE libraries, along with the number of tags annotated to each gene, was generated. Some tags show redundancy in that they map to various splice variants of a gene; this type of tag is not splice-variant specific. When I created the list of genes identified by the SAGE libraries, redundant tags were collapsed so that splice variants would be counted as just one gene. Any tag that was specific to a splice variant was not collapsed. This list of genes was then filtered for genes that had fewer than 5 tags in both of the muscle SAGE libraries. The list was created using data from the WormBase release WS150 and the following specifications were used: sequence quality of 0.99; removal of all duplicate ditags, ambiguous tags and antisense tags; and, lastly, tag numbers were not normalized. Note that WormBase release WS150 was the current release at the time the initial gene list was generated; however, all further analysis performed in this thesis was done using WormBase release WS180.

This list was generated in collaboration with Adam Lorch, a bioinformatician in Dr. Moerman’s lab at the University of British Columbia (Vancouver, Canada). SAGE data from any of the C. elegans tissue specific SAGE libraries can be viewed on the MultiSAGE website, http://tock.bcgsc.bc.ca/cgi-bin/sage180, a C. elegans resource page provided by Canada’s Michael Smith Genome Sciences Centre (Vancouver, Canada). One hundred and twenty eight genes were selected from the list of low-expressed genes in such a way that these genes represented various possible tag combinations between the 2 muscle SAGE libraries (also see section 3.2).
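The filtering described in this section can be sketched as follows. This is a simplified illustration rather than the script actually used: the record layout, function names and gene identifiers are hypothetical, isoforms are assumed to be named by appending a lowercase letter to the gene name, and every isoform is collapsed onto its gene (whereas the list above kept splice-variant-specific tags separate). Removal of duplicate ditags and antisense tags is not modelled.

```python
import re
from collections import defaultdict


def gene_of(transcript: str) -> str:
    """Collapse an isoform name such as 'C01A2.1a' onto its gene 'C01A2.1'
    (assumes isoforms carry a trailing lowercase letter)."""
    return re.sub(r"[a-z]$", "", transcript)


def low_expressed(tag_records, cutoff=5):
    """Return {gene: {library: tag_count}} for genes with fewer than
    `cutoff` tags in every library.

    tag_records: iterable of (tag, library, count, transcripts), where
    `transcripts` lists every transcript the tag maps to.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for _tag, library, count, transcripts in tag_records:
        genes = {gene_of(t) for t in transcripts}
        if len(genes) != 1:
            continue  # unmatched tag, or ambiguous tag hitting several genes
        counts[genes.pop()][library] += count  # raw counts, not normalized
    return {
        gene: dict(per_library)
        for gene, per_library in counts.items()
        if all(n < cutoff for n in per_library.values())
    }


# Hypothetical records: (tag, library, count, transcripts)
example_records = [
    ("tag1", "lib1", 1, ["C01A2.1a", "C01A2.1b"]),  # redundant across isoforms
    ("tag2", "lib2", 3, ["C01A2.1a"]),
    ("tag3", "lib1", 40, ["F02B3.2"]),              # abundant gene
    ("tag4", "lib1", 2, ["K03C4.3", "M04D5.4"]),    # ambiguous, dropped
    ("tag5", "lib2", 1, ["K03C4.3"]),               # singleton
]
example_low = low_expressed(example_records)
```

A library in which a gene has no tags simply does not appear in that gene's dictionary, which is treated as a count of zero and therefore below the cutoff.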
Subsequently, any information available for these 128 selected genes was collected: gene description, protein domain, homologues, RNAi and knockout phenotypes, and number of tags in tissue specific SAGE libraries other than muscle.

2.2 RNA isolation from FACS sorted muscle cells

Whole embryo RNA was isolated from embryonic cells extracted from embryos as described in section 2.1.1; however, no FACS sorting was required here. Muscle cells were isolated as described in section 2.1.1.

2.2.1 Whole embryo RNA isolation

Whole embryo RNA was isolated using 1 mL TRIzol reagent (Invitrogen Life Technologies) as per the manufacturer’s instructions. The RNA pellet was dissolved in 500 µL RNase-free dH2O (DEPC treated). RNA samples were quantified with a spectrophotometer, followed by DNase treatment. One unit of DNase I was used per 1 µg of RNA, and the sample was incubated at 37°C for 30 minutes. Following incubation, 1 µL of 25 mM ethylenediaminetetraacetic acid (EDTA) was added and the sample was further incubated for 10 minutes at 65°C to stop DNase action. The RNA was then ready to be used for RT-PCR.

2.2.2 Muscle RNA isolation

Muscle cell RNA isolation was performed using the Micro-FastTrack™ 2.0 Kit (Invitrogen Life Technologies). The manufacturer’s isolation protocol was followed; however, a few changes should be noted. One mL of Lysis Buffer was added per tube containing 800,000 cells and incubated at 45°C for 15 minutes. After the addition of NaCl stock solution, DNA was sheared by passing the sample 5 times through a 25G needle with a 1 cc syringe. The supernatant containing mRNA was added to the oligo(dT) cellulose powder tube and the tube was rocked for 20 minutes. After the 20 minute incubation, centrifugation and removal of supernatant, the oligo(dT) cellulose was washed exactly as described by the manufacturer. The mRNA was eluted from the oligo(dT) cellulose in two separate 100 µL washes with Elution Buffer.
The mRNA was then precipitated by addition of glycogen, sodium acetate and ethanol to the eluate, as recommended by the manufacturer. The tube containing this mix was allowed to freeze at -80°C overnight. The following day, the tube was removed from -80°C, thawed and then spun at 16,000 g for 15 minutes at 4°C. All of the ethanol was suctioned off, followed by the addition of 10 µL elution buffer to re-suspend the mRNA. The mRNA solution was further diluted with 90 µL of DEPC treated dH2O before being quantified with a spectrophotometer (BioMate 3, Thermo Spectronic).

2.3 Reverse-Transcription Polymerase Chain Reaction

2.3.1 Primer design

The primers used for RT-PCR were designed using the program AcePrimer1.3, made available by the British Columbia Genome Sciences Centre (Vancouver, Canada) at the website: http://elegans.bcgsc.bc.ca/aceprimer/aceprimer.shtml. The following parameters were selected: at least 6 primer sets generated per exon; primer size of 20 bp ± 2 bp; optimum Tm of 55°C ± 5°C; the RT-PCR primer option enabled, with the minimum coding DNA size set according to the length of the gene; and the e-PCR option enabled at a word size of 7 bp with 0 mismatches allowed.

The program generated a list of primers for the specific gene based on the search parameters. The following characteristics were taken into account when selecting which primer set to use for RT-PCR: a low quality value (this means that the primer set did not deviate far from the optimal criteria defined by the user); an e-PCR result showing that the particular primer set would not generate other unwanted products from the C. elegans genome; and, finally, a product from mRNA that could be easily differentiated by gel electrophoresis from a product of genomic DNA.

2.3.1 Whole embryo mRNA RT-PCR

The kit SuperScript™III One-Step RT-PCR with Platinum® Taq (Invitrogen Life Technologies) was used to perform the RT-PCR experiments.
Initially, a titration experiment was carried out in order to determine the minimum amount of mRNA needed in each reaction in order to obtain a product. This titration was performed using two control genes: hlh-1 (an essential muscle transcription factor expressed at a low level, with a total of 12 tags between the two body wall muscle SAGE libraries) and unc-54 (myosin heavy chain B, expressed at a high level, with a total of 102 tags between the two body wall muscle SAGE libraries) (figure 8). Since the genes tested in this study were of low abundance, I used the minimum amount of RNA required for obtaining a product for the hlh-1 transcript. Each one-step RT-PCR reaction contained 25 µL 2x reaction mix (provided with the kit), 10 ng whole embryo mRNA, 2 µL each of forward and reverse primers, 1 µL RT/Taq mix, and autoclaved, DEPC treated dH2O to top up the reaction volume to 50 µL. The specifications programmed into the thermocycler for carrying out the one-step RT-PCR reaction are described in Table 1.

Figure 8. Titration reactions. Titration experiment to determine the minimum amount of whole embryo mRNA required for each RT-PCR reaction. Template: whole embryo mRNA. A 100 bp ladder is shown on the gel (600 bp and 200 bp bands marked). Product sizes: hlh-1, 845 bp; unc-54, 735 bp.

Table 1. Specifications of thermocycler program used for RT-PCR reaction.

Step | Conditions
cDNA synthesis and pre-denaturation | 50°C for 30 min, then 94°C for 2 min
PCR amplification (35 cycles) | Denature: 94°C for 15 s; Annealing: 50°C for 30 s; Extension: 72°C for 1 min
Final extension | 1 cycle at 72°C for 10 min, then hold at 4°C indefinitely

RT-PCR products were separated by agarose gel electrophoresis and visualized using SYBR Safe gel stain (Invitrogen), followed by imaging with an ultraviolet transilluminator.

2.3.2 Muscle mRNA RT-PCR

The SuperScript™ One-Step RT-PCR with Platinum® Taq (Invitrogen Life Technologies) kit was also used for RT-PCR reactions using muscle mRNA as a template.
Initially, a titration experiment was carried out in order to determine the minimum amount of mRNA needed in each reaction to obtain a product. The titration reactions used the same two control genes, hlh-1 and unc-54 (see figure 9), as the titration reactions performed with whole embryo mRNA. The contents of each RT-PCR reaction were the same as those used for whole embryo mRNA RT-PCR. The reactions differed in that here the template was 10 ng of muscle mRNA. The program used for the thermocycler was the same as that shown in Table 1, except that 40 cycles of PCR amplification were executed rather than 35. Muscle mRNA template RT-PCR products were also separated by agarose gel electrophoresis and visualized using SYBR Safe gel stain (Invitrogen), followed by imaging with an ultraviolet transilluminator.

Figure 9. Titration reactions. Titration experiment to determine the minimum amount of muscle mRNA required for each RT-PCR reaction. The gene hlh-1 is transcribed at much lower levels than unc-54 and therefore requires a larger quantity of mRNA in the reactions for its detection. Template: muscle mRNA. A 100 bp ladder is shown on the gel (600 bp and 200 bp bands marked). Product sizes: hlh-1, 845 bp; unc-54, 735 bp.

2.4 Promoter fusions

2.4.1 PCR stitching to construct promoter fusions

Primer design

Primers were designed with the aid of the PCR primer design for C. elegans promoter::GFP fusions website, http://elegans.bcgsc.bc.ca/promoter_primers/index.html, provided by Canada’s Michael Smith Genome Sciences Centre (Vancouver, Canada). The desired amplicon size was 3.5-4 kb upstream of the start codon. Primers were 20-25 nucleotides long, with optimum melting temperature ranging from 58-62°C. A nested forward primer (figure 10, primer A’) also had to be designed; A’ was typically 3-10 bp downstream of primer A, as recommended by Hobert’s PCR stitching protocol (Hobert, 2002).
However, these primer specifications occasionally had to be adjusted depending on the gene’s location in the genome. e-PCR was performed for every primer set to ensure that only one product would be generated in the promoter amplification PCR reaction.

PCR stitching reactions

The PCR fusion reactions were performed in accordance with Hobert’s protocol published in BioTechniques (Hobert, 2002). This procedure is summarized in Figure 10. Two PCR reactions were carried out: PCR reaction 1 amplified the promoter region, and PCR reaction 2 amplified the GFP and unc-54 3’UTR sequence from vector pPD95.75. Then approximately 10-50 ng of product from each PCR reaction was added to the fusion PCR reaction mix, which contained nested primers A’ and D’. Primers A, A’ and B were specific for each gene. Primers D (AAGGGCCCGTACGGCCGACTAGTAGG) and D’ (GGAAACAGTTATGTTTGGTATATTGGG) were designed based on the unc-54 3’UTR sequence. The Expand Long Template PCR System (Roche) was used for every PCR reaction. The solution containing the fusion PCR products was clogging the microinjection needles used to inject constructs into worms. For this reason, I found it necessary to purify the fusion PCR product using the QIAquick PCR Purification Kit (Qiagen) before using it for injections.

Figure 10. PCR stitching protocol. Summary of the promoter::GFP fusion protocol for creating reporter genes for expression analysis. Modified from Hobert (2002).

2.4.2 Microinjection procedure

To perform promoter expression analysis, purified PCR fusion product was injected into worms. The PCR product was injected along with two co-injection markers: pRF4 [rol-6(su1006dm)], which contains a copy of the mutant collagen gene rol-6(su1006), leading to a roller phenotype in successful transformants, and pBx [pha-1::pha-1(+)], which carries the WT copy of the pha-1 gene and is thus able to rescue the temperature-sensitive lethal phenotype of the strain GE24.
The injection mixes contained 20-50 ng/µL of PCR product and 45 ng/µL each of pRF4 and pBx. This mix was injected into gonads of GE24 worms by established methods (Mello et al., 1991) using a Zeiss inverted compound microscope (IM35). The strain GE24 is homozygous for pha-1 (e2123t). This mutation causes animals to arrest at L1 if grown at 25°C; however, animals are viable at 15°C. Gravid adult animals grown at 15°C were used for microinjections and then transferred to 25°C. Only worms carrying the injected DNA as an extrachromosomal array are able to survive at 25°C. Transformants were picked based on their viability (rescued by the pha-1 co-injection marker) and their roller phenotype.

2.4.3 Microscopic analysis

All phenotypic analysis of worms was performed using a dissecting microscope (Wild Heerbrugg model). All transgenic worm strains were imaged using a compound fluorescence microscope (Zeiss Axiophot D-7082 Oberkochen) at 400x-1000x magnification (see magnification specified on images in chapter 3), and a Qimaging QICAM digital camera running Qcapture version 1.68.4. Microscopic analysis was focused on identifying the presence and location(s) of GFP expression in adults, larvae and embryos. The goal was to confirm the presence of GFP expression in muscle cells.

2.5 Mutant and RNAi analysis

2.5.1 Knockout strains analysis

Obtaining strains

All knockout strains were provided by the Caenorhabditis Genetics Center, University of Minnesota (USA): CB362, RB889, VC910, VC346, RB1127, CB315, KG421, DW101, PT709, HR483, GS2735.

Polarized optics imaging

The myofilaments of the knockout strains were visualized using polarizing optics on the compound microscope (Zeiss Axiophot D-7082 Oberkochen) at 400x magnification. The slide must be rotated until the worm is in the correct position relative to the polarized light source, thus allowing visualization of the filaments.
2.5.2 RNAi analysis

RNAi analysis was conducted to observe the effects of knocking down low-abundance genes on myofilament integrity.

Strains used

Two strains were used for RNA interference analysis: RW1596 and MT2495.

RW1596 (myo-3(st386) V; stEx30[myo-3::GFP + rol-6(su1006)]): This strain was provided to our lab by Pamela Hoppe, Western Michigan University (USA). The myo-3(st386) allele is lethal when homozygous. However, these animals can survive since they are rescued by an extrachromosomal array, stEx30, containing a wild-type copy of the myo-3 gene fused in frame with the coding sequence for GFP. This array also carries the marker rol-6(su1006), which confers a roller phenotype on these animals. The MYO-3 protein (myosin heavy chain-A) is tagged with GFP and thus allows us to observe muscle thick filaments using fluorescence microscopy. This strain is hypersensitive to myofilament disintegration (Meissner, personal communication).

MT2495 [lin-15(n744) X]: This strain was provided by the Caenorhabditis Genetics Center, University of Minnesota (USA). The MT2495 strain is a mutant for lin-15B, which is a negative regulator of RNAi, thus making these animals hypersensitive to RNAi. MT2495 worms appear wild-type under normal growth conditions (Lehner et al., 2006). Unlike the RW1596 worms, this strain does not contain GFP and is not hypersensitive to myofilament disorganization.

N2 (Bristol): Wild-type strain used as a control (Brenner, 1974) for proper myofilament structure.

Preparation of worms for screening

RW1596 and MT2495 worms were grown on five to six 60 mm NGM agar plates streaked with E. coli (OP50 strain). Plates were washed with M9 buffer and the wash was dispensed into a 15 mL polypropylene tube. Ten mL of hypochlorite solution was added to each tube. Tubes were shaken until the solution turned yellow and there were no visible animal carcasses.
Embryos were washed 3 times with M9 buffer to completely eliminate the hypochlorite solution. Embryos were re-suspended in approximately 2 mL of M9 buffer and transferred to a 1.5 mL microcentrifuge tube. The microcentrifuge tube was placed on a slow-speed rotator and left rotating overnight at room temperature. This allowed embryos to hatch and thus give rise to a synchronized population of first larval stage (L1) worms.

RNAi feeding

Bacterial clones containing dsRNA generating plasmids were picked from the publicly available frozen Geneservice feeding library (originally produced by Dr. Julie Ahringer’s lab at the University of Cambridge, UK) and grown overnight in Lennox broth (LB). Fifty µL of overnight culture was streaked onto an NGM plate containing 1 mM isopropyl-beta-D-thiogalactopyranoside (IPTG), to induce production of dsRNA in bacteria, and 50 µg/mL carbenicillin, to select for bacteria containing the RNAi construct. For each overnight culture (one culture per gene of interest), approximately 16 plates were streaked. Streaked plates were incubated overnight at room temperature to allow bacteria to produce dsRNA.

Aliquots containing an average of 20 L1 worms were taken from the M9 solution in the microcentrifuge tube containing hatched embryos (described above under “Preparation of worms for screening”) and placed on RNAi plates (one plate per feeding vector). Plates were incubated at 20°C until worms reached the young adult stage (~60-68 hours). Four young adult worms from each plate were transferred to two new RNAi plates (2 worms per plate) corresponding to the same feeding vector. Worms were allowed to feed on bacteria containing dsRNA for approximately 18 hours, until they had laid most of their embryos. These adult worms (called the P0) were then removed and plates containing embryos were placed at 20°C for about 36 hours, until the progeny had reached the fourth larval stage (L4)/young adult stage (the F1 generation).
Inspecting for overt phenotypes

Some animals never developed to the L4/young adult stage because they showed phenotypes such as embryonic lethality (Emb) or larval arrest (Lva). In the case of Emb or Lva phenotypes, imaging for muscle defects was performed on P0 worms at the L4/young adult stage. The F1 worms that reached the L4/young adult stage were screened for any overt mutant phenotype under a dissecting scope (Wild Heerbrugg model). Phenotypes observed included slow growth (Slo), protruding vulva (Pvl) or other body morphology defects (Bmd), sterility (Ste), paralysis (Prl), uncoordinated movement (Unc) and sick worms (Sck).

Observing myofilament organization

The RNAi-treated F1 worms were imaged at the L4/young adult stage using a compound fluorescence microscope (Zeiss Axiophot D-7082 Oberkochen). A 15 µL drop of M9 containing approximately 0.1% sodium azide was placed on a glass microscope slide, and 20 animals were then added to this drop. A 24 mm x 24 mm coverslip was carefully placed onto the slide surface to avoid rupturing the worms. RW1596 worms were imaged at 400x magnification using a QImaging QICAM digital camera running QCapture version 1.68.4, whereas MT2495 worms were screened at 400x magnification using polarizing microscopy (refer to section ).

Screening of RW1596 worms subjected to RNAi revealed myofilament defects such as aggregation of protein product (GFP aggregation), reduced protein levels (reduced fluorescence), and disorganization of thick filaments. When screening MT2495 worms that were exposed to RNAi, I examined the overall integrity of the myofilaments and inspected for possible gaps within the myofilament lattice. When ≥50% of the worms on a slide displayed any myofilament defect, the corresponding gene was labeled as muscle affecting.
These genes were selected for re-screening in order to increase confidence that knockdown of the gene was in fact causing myofilament disorganization. The same procedure as the one described in this section (2.5.2) was used for re-screening. Genes that again showed ≥50% penetrance of myofilament disorganization when re-screened kept their label of being muscle affecting.

CHAPTER III – RESULTS

In this chapter I show the results of my investigation into low-expressed genes identified by the SAGE libraries. The data presented in this chapter address the following questions: what percentage of the SAGE data is made up of low-expressed genes (<5 tags), what percentage of these are not false positives, and what roles might these genes have within the muscle cell?

3.1 The muscle SAGE libraries

The muscle SAGE libraries were generated from two biological trials performed by Canada’s Michael Smith Genome Sciences Centre (Vancouver, Canada) as previously described. Both libraries were created using LongSAGE (21 bp tags) and contained a total of 33,827 tags (library 1) and 89,561 tags (library 2), respectively. Together these two libraries identify 7975 genes (figure 11).

Figure 11. Summary of genes expressed in body wall muscle as defined by SAGE
Venn diagram depicting the number of genes identified in each muscle SAGE library and the number of genes identified in both libraries and in either library. Data generated using WormBase release WS180 and sequence quality 0.99.
[Venn diagram labels: Muscle library 1 (SWEM1), Muscle library 2 (SW031); values shown: 2929, 4330, 716]

Of these 7975 genes identified in the SAGE libraries, 6388 had fewer than 5 tags, which makes them what I classify as low-expressed genes (figure 12).

Figure 12.
Summary of low-expressed genes expressed in body wall muscle as defined by SAGE
Venn diagram depicting the number of low-expressed genes identified in each muscle SAGE library and the number of low-expressed genes identified in both libraries. Data generated using WormBase release WS180 and sequence quality 0.99.
[Venn diagram labels: Muscle library 1 (SWEM1), Muscle library 2 (SW031); values shown: 1681, 2611, 2096]

Sequence quality is another factor that must be considered when examining genes identified in the SAGE data. Figure 13 demonstrates that as the sequence quality filter becomes more stringent, the number of genes identified decreases. Early SAGE studies did not employ sequence quality filters. However, Pleasance et al. (2003) suggest that applying a quality filter removes ambiguous tags and tags mistakenly annotated to a gene due to sequencing errors. The analysis performed in this thesis used a sequence quality cut-off of 0.99. This means that for a tag to be considered in my data set, each of the peaks on its sequence read must meet a minimum threshold. This threshold is a Phred score of 20, meaning that each base has at least a 99% probability of having been called correctly (Ewing et al., 1998). Some tags at the end of the concatamer array (figure 3) are of lower quality since they are further from the sequencing primer. If the bases of these lower quality tags do not meet the 0.99 requirement, the tags are discarded from the analysis. A sequence quality score of 0.99 ensures that the tag sequence has high fidelity, but it is not as stringent as quality 0.9999, where ~1000 genes are lost compared to the raw data (sequence quality 0) (figure 13). The sequence quality score of 0.99 has been widely used for SAGE data analysis (Beissbarth et al., 2004; McGhee et al., 2007).

Figure 13.
Relationship between sequence quality score and number of genes identified in Muscle Library 2 (SW031)
As sequence quality is made more stringent, the number of genes identified by the SAGE library decreases.

3.1.1 Presence of low-expressed genes in Muscle Library 2

If we focus on the larger muscle SAGE library (Muscle Library 2), we see that the majority of genes identified in this library are expressed at low levels. Figure 14 shows the relationship between the number of tags and the number of genes with each particular tag count. The total number of genes identified in this library is 7259 (figure 14), of which 59% (4292 genes) are low-expressed (have fewer than 5 tags) and 25% (1824 genes) are represented by a single tag (singletons).

Figure 14. Distribution of tag counts in Muscle Library 2 (SW031)
The majority of genes identified by this library are expressed at low levels. Fifty percent of the genes in this muscle SAGE library have 3 or fewer tags. Data generated using sequence quality = 0.99 and WormBase release WS180.

3.1.2 Gene Ontology annotation of singletons

Gene Ontology (GO) annotation was used to categorize the gene products of singletons. GO uses sequence and functional conservation to deduce biological labels for gene products (Ashburner et al., 2000). A list of all singletons identified by the SAGE libraries was queried against twenty selected GO terms. These twenty GO terms, all relating to essential cellular processes, were deemed a reasonable number of terms to provide an overview of functional categories among the singletons’ gene products. The percentage of singletons annotated to each GO term is shown in figure 15. For comparison, we also determined the percentage of all genes in the C. elegans genome carrying each GO annotation (information obtained from WormBase release WS180).
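The comparison described above (percentage of singletons annotated to a GO term versus the percentage of all genes in the genome annotated to the same term) can be sketched in a few lines of Python. This is only an illustration of the calculation, not the procedure actually used to produce figure 15: the function names, the gene names and the annotation table below are hypothetical toy values, and only the two GO term labels are taken from the text.

```python
def percent_annotated(genes, annotations, term):
    """Return the percentage of `genes` annotated to the GO `term`."""
    genes = list(genes)
    if not genes:
        return 0.0
    hits = sum(1 for gene in genes if term in annotations.get(gene, set()))
    return 100.0 * hits / len(genes)


def compare_singletons_to_genome(singletons, genome, annotations, terms):
    """For each term, pair the singleton percentage with the genome-wide one."""
    rows = []
    for term in terms:
        singleton_pct = percent_annotated(singletons, annotations, term)
        genome_pct = percent_annotated(genome, annotations, term)
        rows.append((term, singleton_pct, genome_pct, singleton_pct - genome_pct))
    return rows


# Hypothetical toy data: these genes and annotations are NOT from the thesis.
toy_annotations = {
    "geneA": {"signal transduction", "cell communication"},
    "geneB": {"cell organization and biogenesis"},
    "geneC": {"cell organization and biogenesis"},
    "geneD": set(),
}
toy_genome = ["geneA", "geneB", "geneC", "geneD"]
toy_singletons = ["geneB", "geneD"]
toy_terms = ["cell organization and biogenesis", "signal transduction"]

for term, s_pct, g_pct, diff in compare_singletons_to_genome(
    toy_singletons, toy_genome, toy_annotations, toy_terms
):
    print(f"{term}: singletons {s_pct:.1f}%, genome {g_pct:.1f}%, difference {diff:+.1f}")
```

A positive difference for a term corresponds to singletons being over-represented relative to the genome, and a negative difference to under-representation. Whether a given difference is meaningful would need a statistical test, which this sketch does not attempt.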
Our goal was to look for biases among singletons toward any of the chosen GO terms. As shown in figure 15, the percentage of singletons annotated to a GO term is mostly on par with the percentage of genes in the whole genome annotated to that term. Singletons appear to show a bias for the GO annotation “Cell organization and biogenesis” (a process that is carried out at the cellular level which results in the formation, arrangement of constituent parts, or disassembly of a cellular component; includes the plasma membrane and any external encapsulating structures such as the cell wall and cell envelope) (Ashburner et al., 2000). On the other hand, singletons are underrepresented in the “signal transduction” and “cell communication” categories.

Figure 15. GO annotations of singletons versus entire genome
Bars represent the percentage of genes in each GO category. There does not appear to be a bias for a certain category for singletons. The only category where the percentage of singletons exceeds the percentage of genes in the entire genome is “cell organization and biogenesis”.

3.2 Verification of low-abundance transcript expression in muscle cells

The SAGE libraries identified 6388 low-expressed genes (genes with fewer than 5 tags). Reverse transcription polymerase chain reaction (RT-PCR) was applied to 128 low-abundance transcripts in order to verify their expression in muscle. When selecting these 128 genes I wanted to provide a broad sample of possible tag representations among the two muscle libraries.
Of the 128 genes:
• 20% are singletons in muscle library 1 and have no tags in muscle library 2
• 20% are singletons in muscle library 2 and have no tags in muscle library 1
• 25% are singletons in both libraries
• 25% have fewer than 5 tags in both libraries (but are not singletons in both libraries)
• 10% have 2-5 tags in one library but no tags in the other library

I did not take the gene product description into account when selecting these genes. They were selected purely on tag counts, which avoids biases for specific genes based on their predicted function. The goal was to work with a suitable number of low-expressed genes and determine what percentage of them are in fact expressed in muscle.

Two positive controls were used in the RT-PCR reactions: unc-54 and hlh-1. unc-54 encodes myosin heavy chain-B, one of the major components of body-wall muscle thick filaments (Epstein et al., 1974; White et al., 2003). This gene has 26 SAGE tags in muscle library 1 and 56 tags in muscle library 2, and was therefore used as a high-abundance control (a gene with a large number of SAGE tags). The gene used as a low-abundance control was hlh-1, which has only 2 tags in muscle library 1 and 10 tags in muscle library 2. hlh-1 is the C. elegans homologue of mouse/human MyoD, an essential gene of the myogenic regulatory factor family, involved in muscle cell fate determination during embryogenesis (Fukushige and Krause, 2005).

The initial RT-PCR reactions were performed using 50 ng of whole embryo RNA (figure 16, panel A). This was done to test the effectiveness of the primer sets, to adjust reaction conditions and to determine the quantity of RNA required for the reactions. Obtaining muscle RNA via cell sorting is labour intensive, and therefore these initial optimizing tests were performed using whole embryo RNA, which is easily obtainable. Out of the 128 genes tested, 114 gave positive results using whole embryo RNA as template.
See section 4.1 for discussion of possible reasons why the remaining 14 genes did not produce positive results in whole embryo RT-PCR.

The 114 primer sets that generated positive RT-PCR reactions were then used to perform RT-PCR with muscle RNA as template. Forty-three of the 114 reactions gave positive products using muscle RNA as template. This indicates that 38% of the low-expressed genes selected showed expression in muscle via RT-PCR. The value of 38% represents a minimum percentage of the selected low-expressed genes that are expressed in muscle. There may be false negatives that resulted from RNA degradation, non-optimal reaction conditions or insufficient template RNA. Figure 16, panel B, shows that 10 ng of muscle RNA was a sufficient amount of template for the RT-PCR according to the controls. It is possible that some genes required more than 10 ng of muscle RNA as template. The genes that did not produce positive products in the muscle RT-PCR reactions could have represented transcripts identified due to sample contamination or sequencing errors (see section 4.1 for further discussion). The muscle RNA used for the RT-PCR reactions did not come from the same batch of RNA used for the SAGE experiment. Nonetheless, the RNA used for muscle RT-PCR was isolated in the same manner as the RNA used for the SAGE experiment.

The data for the in vitro RT-PCR verification of the 128 low-expressed genes are summarized in table 2. It should also be noted that out of the 43 verified genes, 25 were singletons. Since there were 72 singletons among the 114 genes that gave correct reaction products using whole embryo RNA, it can be inferred that 35% of singletons were confirmed. Table 5 shows a summary of the data obtained in this thesis for these 43 positive genes.

Table 2. Summary of RT-PCR results

Measure | Result
------- | ------
Initial number of low-expressed genes tested | 128
Number of positive RT-PCR reactions using whole embryo RNA | 114/128 [72/114 were singletons (63%)]
Number of positive RT-PCR reactions using muscle RNA | 43/114 [25/43 were singletons (58%)]
Percent of genes confirmed | 38% [25/72 = 35% of singletons confirmed]

Figure 16. RT-PCR reaction products
Panel A shows RT-PCR reactions using 50 ng of whole embryo RNA as template. Panel B shows the same RT-PCR reactions as panel A, except that the template used was 10 ng of muscle RNA. unc-54 and hlh-1 were the controls for these reactions.

3.3 The use of promoter fusions to verify in vivo expression of low-expressed genes in body wall muscle

After demonstrating in vitro that 38% of the low-expressed genes tested show expression in muscle, promoter::GFP fusions were generated to verify in vivo expression of these genes. An in vivo experiment was performed in which the promoter (5’ upstream regulatory region which enables transcription) from each gene was fused to a reporter gene coding for GFP (Chalfie et al., 1994) and injected into worms in order to examine that gene’s expression pattern. This in vivo experiment also addressed the possibility that some positive RT-PCR products may have been the result of sample contamination in the RNA. The RNA used for the SAGE library and the RT-PCR came from muscle cells isolated in the same manner. If the muscle cell sample were not pure, the presence of contaminating cells from other tissues would contribute false positive results to both the SAGE and the RT-PCR experiments. For each of the 43 genes whose expression was detected in muscle via RT-PCR, a promoter::GFP fusion was generated using Hobert’s PCR fusion protocol (Hobert, 2002). Primers were designed to amplify 3.5 to 4 kb of the region upstream of the start codon for each gene.
Depending on the location within the genome, certain genes had a shorter upstream region available for amplification. Out of the 43 genes, 10 already had promoter::GFP fusion data available through the C. elegans Gene Expression Project (Vancouver, Canada) (Hunt-Newbury et al., 2007) and 5 had fusion expression patterns available in the literature (table 3). Table 3 summarizes the expression patterns available for 24 of the 43 genes. Of these 24, 14 genes show no detectable expression in muscle and 10 genes show expression in muscle.

Table 3. Summary of promoter::GFP fusion data.
Genes showing expression in body wall muscle are marked with an asterisk (highlighted in grey in the original table). For full gene description please see table 5.

No. | Gene (cds) | Location of promoter::GFP expression | Fusion data generated by
--- | ---------- | ------------------------------------ | ------------------------
1 | M03F8.1 | Hypodermis | My own promoter::GFP construct
2 | M03A8.2 * | Body wall muscle and other tissues | Gene expression project
3 | R12H7.5 | Pharynx | My own promoter::GFP construct
4 | F28F8.6 * | Body wall muscle and other tissues | Gene expression project
5 | Y50D4C.1 | Gut, excretory cells, gonads, pharyngeal-intestinal valve | My own promoter::GFP construct
6 | B0350.2 * | Body wall muscle and other tissues | Gene expression project
7 | Y45F10A.7 * | Body wall muscle, pharynx and vulval muscle | My own promoter::GFP construct
8 | R10E11.1 * | Body wall muscle and other tissues | Gene expression project
9 | T22F7.1 | Excretory cells, gut, pharynx and other tissues | My own promoter::GFP construct
10 | C11H1.5 | Seam cells | My own promoter::GFP construct
11 | T23E7.2 * | Body wall muscle | My own promoter::GFP construct
12 | B0414.7 | Gut and pharynx | My own promoter::GFP construct
13 | R06A10.2 * | Body wall muscle and other tissues | Literature research
14 | R13H4.1 | Neurons | Literature research
15 | R05H10.5 | Gut, pharynx, tail | My own promoter::GFP construct
16 | C05C10.6 | Neurons | Gene expression project
17 | C06C3.1 | Pharynx | Literature research
18 | C27B7.1 * | Many tissues including body wall muscle | Literature research
19 | E04D5.1 | Intestine, hypodermis, head neurons | Gene expression project
20 | F49E2.5 * | Body wall muscle | Literature research
21 | F56B3.5 | Pharynx | Gene expression project
22 | Y17G7B.5 | Intestine and other tissues (not BWM) | Gene expression project
23 | Y71F9AL.17 * | Body wall muscle and other tissues | Gene expression project
24 | Y71H2B.10 | Pharynx and some neurons | Gene expression project

The data presented in table 3 indicate that approximately 42% (10/24) of the 24 genes tested via promoter::GFP fusions show in vivo expression in body wall muscle. Figure 17 shows examples of promoter::GFP fusions being expressed in body wall muscle and in other tissues. Expression in body wall muscle (figure 17, panels A and B) can be easily distinguished from expression in other tissues (figure 17, panels C and D) using a fluorescence compound microscope.

[Figure 17 panels: A) Body wall muscle; B) Body wall muscle; C) Pharynx; D) Gut and tail. Image labels: pT23E7.2a, pY45F10A.7, pR12H7.5, pT22F7.1, nucleus, tail, end of gut. Scale bars: 20 µm.]

Figure 17. Promoter::GFP fusions showing expression in body wall muscle and other tissues
(A), (B) pY45F10A.7 and pT23E7.2 are examples of genes for which I made promoter::GFP constructs and GFP expression was present in muscle cells. GFP expression was present from embryo through adult stages. (C) pR12H7.5 shows GFP expression in the pharynx and (D) pT22F7.1 shows expression in the gut and the tail. These images illustrate the ease of identifying GFP expression in body wall muscle cells and demonstrate that there are indeed false positive low-expressed genes that are not expressed in body wall muscle. All images were taken using a fluorescence compound microscope at 400x magnification.

3.4 Role of low-expressed genes in muscle cells

Next, I investigated whether any of these verified low-expressed genes played a role in myofilament formation, maintenance and integrity. This was accomplished by analyzing the effect of knocking out the low-expressed gene in the worm.
Using the 43 genes that were validated via RT-PCR, I searched for knockout mutants available from the Caenorhabditis Genetics Center (CGC) (University of Minnesota, USA). RNA interference (RNAi) was also used as a method of inhibiting gene expression (Fire et al., 1991; Fire et al., 1998). Clones expressing double-stranded RNA were selected from the Ahringer RNAi feeding library (Kamath et al., 2003). Of the 43 genes that tested positive for muscle expression via RT-PCR:
• 11 genes had knockout mutants available from the CGC
• 34 genes had RNAi clones available from the feeding library
• 6 genes had neither an RNAi clone nor a KO available

3.4.1 Knockout analysis

Analysis of each knockout mutant was performed under a compound microscope fitted with a polarizing filter. This allowed myofilament integrity to be assessed within a living animal. The analysis performed on the 11 knockout strains is summarized in table 4. Any overt phenotype observed is also noted in table 4.

Table 4. Summary of knockout strain analysis and their corresponding RNAi phenotype
RNAi was performed on the same genes that were mutated in the knockout strain. Strains RW1596 and MT2495 were used for RNAi.

No. | KO strain – gene knocked out | Observations of myofilament and overt phenotype of KO strain | RNAi on RW1596 strain | RNAi on MT2495 strain
--- | ---------------------------- | ------------------------------------------------------------ | --------------------- | ---------------------
1 | CB362 – unc-44 | Normal filaments; uncoordinated phenotype, slightly dumpy | 41.2% | Mild filament defect
2 | RB889 – ceh-40 | Normal filaments; no overt phenotype | 25% | Normal filaments
3 | VC910 – mtk-1 | Normal filaments (a few worms may have very slight disorganization, but hard to be certain); “bag of worms”, sick, some male formation | 38.7% | Normal filaments
4 | VC346 – atx-3 | Normal filaments; no overt phenotype | RNAi clone not available | RNAi clone not available
5 | RB1127 – Y54G2A.2b | May have very slight disorganization, but hard to be certain; high incidence of male formation | RNAi clone not available | RNAi clone not available
6 | CB315 – unc-34 | Normal filaments; uncoordinated phenotype | 36.4% | Normal filaments
7 | KG421 – gsa-1 | Normal filaments (note: very hard to image muscle due to large amounts of eggs accumulated inside the worm); egg laying defective, hyperactive, “bag of worms”, adults are smaller than WT adults | 16.7% | Normal filaments
8 | DW101 – atl-1 | Normal filaments; embryonic lethal, lays dead embryos | 53.5% | Normal filaments
9 | PT709 – nphp-4 | Normal filaments; high incidence of male formation | 46.1% | Normal filaments
10 | HR483 – mel-11/mnC1 (deletion linked to unc-4, picked unc young adult for imaging) | Normal filaments (very hard to image these worms because of high background auto-fluorescence) | 19% | Normal filaments
11 | GS2735 – spr-2 (deletion linked to dpy-20, picked dpy young adult for imaging) | Normal muscle | RNAi clone not available | RNAi clone not available

All 11 knockout strains appeared to have normal myofilaments (figure 18). If there was in fact any disorganization, it was too mild to detect with certainty.

[Figure 18 panels: WT – control; CB362 – unc-44; RB889 – ceh-40; CB315 – unc-34]

Figure 18. Examples of myofilament structure from knockout strains
All knockout strains had normal filament structure. Here the myofilaments of three KO strains are shown as examples. The WT image was taken at 1000x magnification using polarized optics. All other images were taken using polarized optics at 400x magnification.

3.4.2 RNAi analysis

Here I took a reverse genetics approach to inhibit gene expression for the selected low-expressed genes. RNAi has been widely used in C. elegans due to the availability of a frozen bacterial library containing ~17,000 clones expressing dsRNA. This library covers about 87% of predicted C. elegans genes (Kamath et al., 2003). E. coli expressing the dsRNA are streaked on an agar plate and fed to worms to induce gene interference (Timmons et al., 2001).
RNAi using strain RW1596

The strain RW1596 carries an extrachromosomal array in which myosin heavy chain A (myo-3) is tagged with GFP. MYO-3 is found in the centre of thick filaments, so every thick filament is tagged with GFP (Epstein, 1990). This allows direct visualization of myosin filaments using a fluorescence microscope. Previous RNAi experiments performed in our lab (Meissner, personal communication) have shown that the RW1596 strain provides a sensitized background for examining myofilament disorganization. The myofilaments in RW1596 are more fragile than those in N2, so knocking down genes that have only a mild effect on filament organization makes the filaments in this strain fall apart. This enhanced filament fragility may be due to the fact that myosin heavy chain-A is bulkier with GFP attached. Possibly, the voluminous MHC-A::GFP cannot properly compact itself in the middle of the thick filament (Meissner, personal communication).

Out of the 34 genes with RNAi clones available from the Ahringer library, 8 genes also had knockout mutants available from the CGC. The results from the KO mutant analysis have already been described in section 3.4.1 (table 4). The first RNAi screen was performed on the 26 genes for which no KO mutant was available. Typical phenotypes observed included aggregation of protein product in clumps or in a “banding pattern” and disorganization of filament alignment (figure 19). Screening was performed by observing animals on a microscope slide and counting the percentage of animals that had myofilament disorganization (phenotype penetrance). In the first screen, 16/26 genes caused a myofilament disorganization phenotype with ≥50% penetrance when knocked down. As a negative control, RW1596 worms were fed E. coli containing the L4440 vector without an insert. The muscle of the negative control appeared exactly like the muscle of RW1596 worms feeding on bacteria with no vector (figure 19, panel A).
Both of these negative controls showed 10-20% penetrance of myofilament disorganization. Thus, background disorganization can account for up to 20% of the observed phenotype when silencing gene expression in the RW1596 strain. Two positive controls were used to verify the effectiveness of the RNAi. A bacterial clone carrying the L4440 vector containing an epi-1 insert gave rise to sterile P0 animals, and a clone carrying the L4440 vector containing an unc-97 insert gave rise to myofilament disorganization, since unc-97 is essential for filament structure. Worms were screened at the L4 to young adult stage because it has been noted in our lab that the strain RW1596 shows progressive muscle degeneration, which begins shortly after the animal becomes an adult. At the 1-day-old adult stage, 32% of the animals exhibit filament disorganization (Moerman lab, unpublished data).

After the preliminary screen, the 16 genes with ≥50% phenotype penetrance were subjected to a secondary screen. The goal of the secondary screen was to verify that the observed myofilament defects would carry through a second round of screening, which would provide more confidence in the noted defects. In the secondary screen, 13/16 genes showed ≥50% penetrance of the myofilament disorganization phenotype. Penetrance among the genes considered muscle affecting ranged from 50% to 69% in the second screen.

RNAi on genes that also had KO available

Eight of the eleven knockout strains analyzed also had RNAi clones available from the RNAi feeding library. These were screened in the same manner described above. One of the eight genes screened showed a myofilament disorganization phenotype with penetrance ≥50% (table 4). This observation suggests that there was little discrepancy between the KO analysis of these strains, where no myofilament disorganization was observed, and the RNAi analysis, where only one gene showed minor filament disruption (table 4).
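The two-round selection rule used in the RW1596 screen (≥50% penetrance in the first screen, confirmed by ≥50% penetrance in the re-screen) can be written out as a small Python sketch. The function name and data layout are hypothetical and only illustrate the rule. The penetrance values are copied from table 5 for four example genes. The 50% cut-off sits well above the 10-20% background disorganization seen in the negative controls.

```python
PENETRANCE_CUTOFF = 50.0  # percent of animals showing myofilament defects


def is_muscle_affecting(first_screen, second_screen=None, cutoff=PENETRANCE_CUTOFF):
    """True only if penetrance is >= cutoff in both the first screen and the re-screen."""
    if first_screen < cutoff:
        return False  # below the cut-off, so never selected for re-screening
    if second_screen is None:
        return False  # selected, but the label has not been confirmed by a re-screen
    return second_screen >= cutoff


# (first screen %, second screen %) as listed in table 5; None = not re-screened.
penetrance = {
    "C07D8.6": (92.3, 53.6),
    "T06E4.3 (atl-1)": (53.5, 52.0),
    "Y71F9AL.17": (72.7, 40.0),
    "R10E11.1 (cbp-1)": (42.1, None),
}

confirmed = [
    gene
    for gene, (first, second) in penetrance.items()
    if is_muscle_affecting(first, second)
]
print(confirmed)
```

With these four entries, C07D8.6 and T06E4.3 (atl-1) keep the muscle-affecting label, Y71F9AL.17 loses it in the re-screen, and R10E11.1 (cbp-1) never reaches the cut-off.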
Together, the RNAi screen data from both sets of genes indicate that 14/34 (41%) genes had ≥50% penetrance of disrupted myofilament organization when RNAi was carried out on the RW1596 strain.

RNAi using strain MT2495

The other strain used for testing the RNAi effects of low-expressed genes on myofilament organization was MT2495. This strain is hypersensitive to RNAi due to a mutation in the gene lin-15B, which has been shown to maximize the effect of dsRNA-mediated gene knockdown in worms (Lehner et al., 2006). Unlike the RW1596 strain, the MT2495 strain, although RNAi hypersensitive, is not sensitized for myofilament disorganization specifically.

When observing the effects of RNA interference on the MT2495 strain, worms had to be visualized under a compound microscope using a polarizing filter (figure 19). Because there is no GFP to aid visualization, it is difficult to estimate the percent penetrance of the RNAi phenotype when imaging with polarizing optics. RNAi on the MT2495 strain was performed for all 34 genes that had RNAi feeding clones available. Only 2 genes showed mild myofilament disorganization. This is fewer than the 14 genes identified using the strain RW1596. See figure 19, panels B and C, for a comparison between RNAi using RW1596 and RNAi using MT2495.

Out of the 34 genes screened in RW1596, 14 caused ≥50% of animals to display myofilament disorganization. The highest penetrance observed in the re-screen was 69% (table 5). I therefore conclude that these low-expressed genes do not play a major structural role, at least when knocked down individually. Seven of the 14 genes have an assigned or predicted function; the others have unknown functions (table 5). RNAi on genes for which a knockout mutant was available revealed only one gene with greater than 50% phenotype penetrance, whereas KO analysis did not reveal any myofilament defect. This one gene was atl-1, which has been shown to be expressed in gonads.
It is important to be aware that RNAi may generate false positives as well as false negatives due to its variable efficiency. Simmer et al. (2003) report that the false positive rate could be as low as 0.4%. Table 5 summarizes all the data obtained from the RNAi screen, the knockout mutant analysis and the promoter::GFP fusions. Some promoter fusion data are not yet available due to technical difficulties with the injection procedure. Table 5 also includes a brief description of each gene product and any homologues in other organisms.

[Figure 19 panels: A) Controls: myo-3::GFP (RW1596) and MT2495; B) F59F4.3: RNAi on RW1596 and RNAi on MT2495; C) Y71F9AL.17: RNAi on RW1596 and RNAi on MT2495; D) C11H1.5. Scale bars: 50 µm.]

Figure 19. Observed myofilament defects in RNAi experiment
Panel A shows the controls for both strains, RW1596 and MT2495. The RW1596 strain carries a myo-3::GFP extrachromosomal array which leads to GFP expression in the thick filaments. The worms shown in panel A were fed bacteria that did not carry a dsRNA-generating vector. Panels B through D show examples of muscle disorganization seen in the RW1596 strain as a result of RNAi. In panels B and C the image of RNAi on the MT2495 strain is also shown. Notice that although there is myofilament disorganization in the RW1596 strain, there is no disorganization in the MT2495 strain. All images were taken at 400x magnification.

Table 5. Summary of data for genes that were verified by RT-PCR
The first group of genes represents those that gave ≥50% phenotype penetrance in the RNAi screen (RW1596 strain). The second group of genes are those that had low RNAi penetrance (or no RNAi clone available) but showed expression in muscle in the in vivo expression study. The third group of genes are those that had low RNAi penetrance (or no RNAi clone available) and did not show promoter::GFP expression in muscle.
The control level of myofilament disorganization in the RW1596 RNAi screen is 10-20%. The last column shows the presence of the gene in the muscle microarray data from the Miller lab (Fox et al., 2007); n/a means that no probes were used to detect the gene.

Cosmid name (common name) | RNAi on RW1596 strain (1st screen / 2nd screen) | RNAi on MT2495 strain | KO analysis | Promoter fusion expression | Gene description | Homology to other organisms | Presence in muscle microarray data (3 trials) (Fox et al., 2007)
--- | --- | --- | --- | --- | --- | --- | ---
C07D8.6 | 92.3% / 53.6% | Normal filaments | n/a | Not yet available | Aldose reductase family proteins | Aldo reductases in various organisms | Yes (2/3 trials)
F54A5.2 | 84.6% / 68.8% | Normal filaments | n/a | Not yet available | Predicted: Cuticulin precursor | To uncharacterized fly protein containing zona pellucida (ZP) domain | n/a
C11H1.5 | 70.5% / 60% | Normal filaments | n/a | Seam cells | Unknown function. Contains a thrombospondin type-1 domain | Weak homology to other proteins containing thrombospondin type-I domain | n/a
T24B8.6 (hlh-3) | 66.7% / 66.7% | Normal filaments | n/a | Not yet available | helix-loop-helix transcription factor | Drosophila Achaete-scute (involved in embryonic nervous system formation) | Yes (1/3 trials)
E04D5.1 | 65% / 56.3% | Normal filaments | n/a | Hypodermis and other tissues | Unknown function | Homology to transcription initiation factors | n/a
M106.4 | 57.1% / 66.67% | Normal filaments | n/a | Not yet available | Guanine Monophosphate (GMP) Synthetase | Homology to GMP synthases in various organisms | n/a
F59F4.3 | 55.6% / 66.7% | Mild disorganization | n/a | Not yet available | Unknown function | Strong homology to bacteria Beta-lactamase II precursor (penicillinase) | n/a
B0464.1 (drs-1) | 58.8% / 51% | Normal filaments | n/a | Not yet available | aspartyl(D) tRNA synthetase | Homology to aspartyl(D) tRNA synthetase from various species | Yes (3/3 trials)
T06E4.3 (atl-1) | 53.5% / 52% | Normal filaments | Normal filaments | Not yet available | ATL-1 contains a PI-3 kinase-like domain. ATL-1 is required for early embryogenesis survival and normal chromosomal segregation | Homologous to human AT (mutated in ataxia telangiectasia) | n/a
R12H7.5 (skr-20) | 53.3% / 50% | Normal filaments | n/a | Pharynx | Unknown function | Skp1 in yeast (involved in cell cycle) | Yes (2/3 trials)
Y17G7B.5 | 50% / 50.5% | Normal filaments | n/a | Intestine and other tissues (not in muscle) | Unknown function | Homology to protein involved in DNA replication in humans, fly and yeast | n/a
Y48G1A.5 (lim-5) | 50% / 52.4% | Normal filaments | n/a | Not yet available | Encodes an importin-beta-like protein. Predicted to function in nuclear transport of proteins required for mitotic progression or apoptosis | Homology to Exportin-2 and to yeast chromosome segregation 1 | No (0/3 trials)
F33H1.3 | 50% / 50% | Normal filaments | n/a | Not yet available | Unknown function | Homology to unknown proteins | Yes (2/3 trials)
F10D11.6 | 50% / 50% | Normal filaments | n/a | Not yet available | Unknown function. Contains lipid binding domain | Bactericidal/permeability-increasing protein in humans | Yes (2/3 trials)
Y71F9AL.17 | 72.7% / 40% | Mild disorganization | n/a | Muscle as well as other tissues | alpha subunit of the coatomer (COPI) complex | Coatomer subunit alpha in humans, yeast and Drosophila | Yes (3/3 trials)
R10E11.1 (cbp-1) | 42.1% | Normal filaments | n/a | Muscle as well as other tissues | CBP-1 is required during embryogenesis for differentiation of all non-neuronal somatic cell types | Homolog to mammalian transcriptional cofactors shown to possess histone acetyltransferase activity | n/a
B0350.2 (unc-44) | 41.2% | Normal filaments | Normal filaments | Muscle as well as other tissues | Encodes an ankyrin-like protein required for proper sex myoblast and axonal guidance during development | Homology to an isoform of ankyrin-1 in humans | n/a
T23E7.2 | 40% | Normal filaments | n/a | Muscle | Unknown function | Weak homology to glycoprotein and proteoglycan | n/a
Y45F10A.7 | 35.5% | Normal filaments | n/a | Muscle, pharynx and vulval muscle | Unknown function | Glutaredoxin (responds to oxidative
stress) No (0/3 trials) R06A10.2 (gsa-1) 16.7% Normal filaments Normal filaments Muscle as well as other tissues Encodes a Gs alpha subunit of heterotrimeric G proteins  Homology to Guanine nucleotide-binding protein G(s) subunit alpha Yes (2/3 trials)  62 F28F8.6 (atx-3) n/a n/a Normal filaments Muscle as well as other tissues Encodes the C. elegans ortholog of human ataxin-3 (ATXN3) Ataxin-3 (ATXN3) (associated with Machado-Joseph disease: lack of muscle control) Yes (2/3 trials) M03A8.2 n/a n/a n/a Muscle as well as other tissues Unknown function Sequence identity to some unknown proteins in various organisms  n/a F49E2.5 n/a n/a n/a Muscle Unknown function Appears to be nematode specific gene n/a C27B7.1 (spr-2) n/a n/a Normal filaments Muscle as well as other tissues Predicted: part of SET protein complex that functions in chromatin remodeling, DNA repair, and transcriptional regulation. May regulate the activity and/or levels of the presenilin proteins Orthologue of mammalian and Drosophila SET proteins n/a H06O01.2 52.6% / 39.13% Normal filaments n/a n/a Chromodomain- helicase DNA- binding protein chromodomain helicase DNA binding protein 1, from humans and yeast Yes (2/3 trials) C05C10.6 (ufd-3) 57.1% / 38.5% Normal filaments n/a Neurons Predicted: Phospholipase A2- activating protein (contains WD40 repeats)  To domain of Phospholipase A-2- activating protein in various organisms n/a  63 T22F7.1 46.7% Normal filaments n/a Excretory cells, gut, pharynx and other tissues Predicted: Synaptic vesicle transporter SVOP and related transporters To unknown proteins No (0/3 trials) R13H4.1 (nphp-4) 46.1% Normal filaments n/a Neurons C. elegans NPH-4 protein. 
[NePHronophthisis (human kidney disease) homolog] Ortholog of human NPHP4 Yes (2/3 trials) B0414.7 (mtk-1) 38.7% Normal filaments Very mild filament disorganiz a-tion Gut and pharynx Predicted: MEKK and related serine/threonine protein kinases Mitogen-activated protein kinase kinase kinase in various organisms n/a Y50D4C.1 (unc-34) 36.4% Normal filaments Normal filaments Gut and other tissues Signal transduction protein Enabled, contains WH1 domain Protein enabled homolog in various organisms n/a T04D3.5 33.3% Normal filaments n/a Not yet available Unknown Appears to be nematode specific Yes (3/3 trials) VC5.3 (npa-1) 33.3% Normal filaments n/a Not yet available npa-1 encodes a large polyprotein precursor. Probable binding protein for fatty acids and retinol (Vitamin A) Similarity to Dictyocaulus viviparus (a parasitic nematode) DVA-1 polyprotein precursor n/a Y37E11AR .3 29.4% Normal filaments n/a Not yet available Predicted Zn- dependent hydrolase Some sequence similarity to unknown proteins n/a M03F8.1 26.67% Normal filaments n/a Hypodermis Unknown protein  Appears to be nematode specific  No (0/3 trials)  64 F38E1.8 (srsx-22) 25% Normal filaments n/a Not yet available Predicted serpentine receptor Similarity to 7 transmembrane receptor domain No (0/3 trials) F17A2.5 (ceh-40) 25% Normal filaments Normal filaments Not yet available C. elegans homeodomain proteins homologous to Extradenticle Homology to fly Extradenticle No (0/3 trials) C06C3.1 (mel-11) 19% Normal filaments Normal filaments Pharynx mel-11 encodes Myosin phosphatase, regulatory subunit. 
Role in embryonic elongation ortholog of the vertebrate smooth muscle myosin- associated phosphatase regulatory subunit n/a Y71H2B.10 (apb-1) 12.5% Normal filaments n/a Pharynx and some neurons apb-1 encodes an adaptin ortholog of the beta1/2 subunit of adaptor protein complex 1 (AP- 1) that affects fertility and embryonic viability n/a Y54G2A.2 n/a n/a Normal filaments Not yet available Predicted: Guanylate-binding protein Atlastin-2 in various organisms n/a R05H10.5 n/a n/a n/a Gut, pharynx and tail Predicted: Glutathione peroxidase Similarity to Glutathione peroxidase domain in various organisms No (0/3 trials) F56B3.5 (ech-5) n/a n/a n/a Pharynx  Predicted: Enoyl- CoA hydratase (involved in fatty acid metabolism)  Enoyl-CoA hydratase in various organism n/a  65 Y71H2AM. 19 n/a n/a n/a Not yet available Unknown function Contains sequence similarity to helicase domain No (0/3 trials) Y37E3.17 n/a n/a n/a Not yet available Encodes a putative mitochondrial dimethylglycine dehydrogenase Dimethylglycine dehydrogenase, mitochondrial precursor in various organisms n/a   66 CHAPTER IV – DISCUSSION  Examination of the body wall muscle SAGE libraries revealed that a large proportion of the genes are expressed at low levels (<5 tags). Here my aim was to determine if these genes with low tag counts are indeed expressed in muscle or whether they are simply the result of sample contamination or experimental errors. This was achieved first by verifying in-vitro expression via RT-PCR and then by in-vivo expression via promoter::GFP fusions. Forty-three out of 114 genes  (38%) were verified for their expression in body wall muscle by RT-PCR (table 2). The promoter::GFP fusion experiment resulted in 10 out of 24 genes (42%) tested displaying expression in muscle. Together these suggest that approximately 16% of the low- expressed genes tested are expressed in muscle. 
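The 16% figure follows from multiplying the two validation rates. The short sketch below reproduces that arithmetic from the counts given above; it assumes, as the estimate itself does, that the muscle-expression rate among the 24 genes with promoter::GFP data applies to all 43 RT-PCR-positive genes. The variable names are illustrative only.

```python
from fractions import Fraction

# Counts reported in the text.
rt_pcr_positive, rt_pcr_tested = 43, 114  # RT-PCR on muscle RNA
gfp_positive, gfp_tested = 10, 24         # promoter::GFP fusions with data

rt_pcr_rate = Fraction(rt_pcr_positive, rt_pcr_tested)
gfp_rate = Fraction(gfp_positive, gfp_tested)

# Assumption: the GFP rate observed for 24 genes holds for all 43 RT-PCR positives.
combined_rate = rt_pcr_rate * gfp_rate

print(f"RT-PCR validated:      {float(rt_pcr_rate):.1%}")
print(f"promoter::GFP in muscle: {float(gfp_rate):.1%}")
print(f"combined estimate:     {float(combined_rate):.1%}")
```

With unrounded fractions the product is slightly below the value obtained from the rounded percentages (38% x 42%), but both round to 16%.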
Since most research is focused on genes that are enriched in the tissue of interest, it is important to investigate the possibility that we are overlooking valuable candidate genes by discarding low-expressed transcripts. In fact, it has been shown that low-expressed transcripts have the potential to identify novel genes (Lee et al., 2005).

4.1 16% of all low-abundance SAGE tags are indeed expressed in muscle

A total of 6388 low-expressed transcripts were identified in either of the two muscle SAGE libraries. In muscle library 2, 59% of transcripts are expressed at low levels (figure 14), and of these 25% are singletons. This illustrates that low-expressed transcripts account for a significant portion of the overall data set. It was noted that the sequence quality cut-off plays a role in the number of genes annotated to a particular SAGE data set (figure 13). The total number of genes decreases as the sequence quality cut-off becomes more stringent; on the other hand, the proportion of low-expressed genes increases. As the quality cut-off increases, there are fewer tags in the library, since tags below the cut-off value are removed. This means that genes will have fewer tags annotated to them and thus more genes will be considered low-expressed. As more clones are sequenced to generate SAGE libraries with greater numbers of tags, there will gradually be more genes that have five or more tags and hence would no longer be considered low-expressed. However, this is only true if sequencing has saturated and identified all the possible transcripts in the cell. Muscle library 2 has approximately 90,000 tags (compared to ~34,000 in muscle library 1) and a larger number of low-abundance genes. It appears that as the number of tags in the SAGE library increases, so does the number of low-abundance transcripts.
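The effect of the quality cut-off on the proportion of low-expressed genes can be illustrated with a small sketch. The gene names and tag counts below are hypothetical toy values, not data from the muscle libraries; only the <5 tag definition of "low-expressed" and the notion of a singleton come from this thesis.

```python
from collections import Counter

LOW_ABUNDANCE_CUTOFF = 5  # fewer than 5 tags counts as low-expressed, as defined in this thesis


def summarize_library(tag_counts):
    """Count genes per abundance class; tag_counts maps gene name -> tag count."""
    classes = Counter()
    for n_tags in tag_counts.values():
        if n_tags <= 0:
            continue  # gene lost all of its tags and is no longer detected
        classes["detected"] += 1
        if n_tags < LOW_ABUNDANCE_CUTOFF:
            classes["low"] += 1
        if n_tags == 1:
            classes["singleton"] += 1
    return classes


def low_fraction(tag_counts):
    """Fraction of detected genes that fall below the low-abundance cut-off."""
    summary = summarize_library(tag_counts)
    return summary["low"] / summary["detected"]


# Hypothetical toy library at a lenient quality cut-off (not real SAGE data).
lenient = {"geneA": 1, "geneB": 3, "geneC": 5, "geneD": 40, "geneE": 1}
# The same toy library after a stricter cut-off has removed some tags.
strict = {"geneA": 1, "geneB": 2, "geneC": 4, "geneD": 35, "geneE": 0}

for label, library in (("lenient", lenient), ("strict", strict)):
    summary = summarize_library(library)
    print(label, dict(summary), f"low fraction = {low_fraction(library):.0%}")
```

In this toy case the stricter cut-off detects fewer genes overall while a larger share of the remaining genes falls below five tags, which is the pattern described for figure 13.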
This trend could indicate that the library has not been saturated in terms of transcript discovery, and that if sequencing were performed in even greater depth a larger number of genes would be identified in this data set. Both muscle SAGE libraries were generated using LongSAGE (21 bp tags) to increase tag specificity and therefore reduce tag ambiguity (Li et al., 2006). Muscle library 2 had a larger number of tags because it was sequenced in greater depth. Deeper sequencing means that more clones carrying concatenated ditags are sequenced, and therefore depends heavily on the budget available for the sequencing part of the SAGE experiment. In the case of the muscle SAGE libraries, library 2 identified 7,259 unique transcripts as compared to muscle library 1, which identified 5,046 transcripts (figure 11). If saturation of transcript identification is reached, then the number of transcripts identified should remain at a relatively constant level between libraries (Wang, 2006). Since this is not the case, sequencing deeper should continue to identify new transcripts.

With the aid of our lab’s bioinformatician, Adam Lorch (Moerman Lab, UBC, Vancouver, Canada), we queried the list of singletons against twenty GO terms (figure 15). I looked for biases among singletons for any of the queried functions (figure 15). The only category where there is a higher percentage of singletons relative to the percentage of genes in the whole genome is cell organization and biogenesis. The GO definition for cell organization and biogenesis is: “A process that is carried out at the cellular level which results in the formation, arrangement of constituent parts, or disassembly of a cellular component; includes the plasma membrane and any external encapsulating structures such as the cell wall and cell envelope” (Ashburner et al., 2000).
These single tag genes could represent proteins that aid in fine-tuning the organization of the cell, but are not essential for the cell’s survival, as suggested by the RNAi results. In contrast, the relative percentage of genes in the whole genome with the terms signal transduction and cell communication is much higher than the percentage of singletons with these terms. I had anticipated a greater representation of singletons in these categories, since it has been suggested that lower abundance transcripts generally code for genes with specialized functions (e.g. transcription factors) (Holland, 2002; Kim et al., 2006). Gene Ontology annotations provide a broad sense of function, and cannot be interpreted with complete confidence. GO terms are usually annotated based on sequence homology families, functional groups and cellular localization, supplied by a variety of other databases. GO collects information about the queried gene sequence and searches several databases for terms annotated to sequences homologous to the query sequence. This results in certain genes having a variety of GO terms annotated to them (Ashburner et al., 2000). It is important to be aware that the interpretation of functional terms varies between people and between databases; thus function eventually has to be determined via experimental methods (Pal, 2006).

I wanted to analyze a small subset of the total low-expressed genes identified in the SAGE data (6388 genes, figure 12) and determine what percentage of these genes are in fact expressed in muscle and are not false positives. A subset of 128 genes was a manageable number to work with and can provide an overview of the whole set. The availability of RNA restricted the number of genes I selected for RT-PCR. A major limitation was obtaining sufficient sorted muscle cells from which to extract RNA.
Each cell sort resulted in ~300,000-400,000 muscle cells that produced anywhere from 170 ng to 300 ng of RNA. Since sorting these small embryonic muscle cells (2-10 µm in diameter) (Christensen et al., 2002) is very time consuming and yields only small amounts of RNA, I chose to first perform test reactions using whole embryo RNA. In the whole embryo RT-PCR the goal was to explore the optimal reaction conditions in which most primer sets would work. There were 14 primer sets that never generated a product in these RT-PCR reactions using whole embryo RNA as template. For this reason, only 114 primer sets were used in reactions with muscle RNA as template (table 2). There are a few possible reasons why these 14 primer sets may not have been able to amplify their target gene when supplied with whole embryo RNA template:
(1) Poor primer design.
(2) Reaction conditions not optimal for the primer set.
(3) When queried against the embryonic SAGE libraries (figure 4), 1 of the 14 genes did not have any tags in either the normal SAGE or the LongSAGE library. Ten of the 14 genes have fewer than 5 tags in both whole embryo libraries and 3 genes have between 6 and 11 tags. Low-abundance transcript identification by SAGE may be prevented by the overwhelming number of other transcripts present in embryo RNA. The same is true for RT-PCR primers, which could be hindered from finding low-expressed transcripts in a pool of whole embryo transcripts. In this case the primer set requires the transcript enrichment provided by a pure tissue sample.
(4) There is a possibility that the low-expressed genes were identified in the SAGE data due to experimental errors (e.g. sequencing errors).
(5) Transcript not present in template due to RNA degradation.
The RT-PCR reactions using muscle RNA as template resulted in 43 out of 114 genes generating reaction products. This result indicates that at least 38% (43/114) of genes show expression in muscle (table 2).
Table 5 shows that 12 of these 43 genes are present in one or more of the three muscle microarray replicates performed by Fox et al. (2007). Presence in the microarray data provides additional verification that these 12 genes are in fact expressed in muscle. If a gene was identified in the SAGE library because of sample contamination or experimental errors, it may have dropped out at the RT-PCR step. False negatives could have resulted from RNA degradation or an insufficient amount of template. Increasing the amount of RNA template in each reaction may improve the detection of the low-abundance transcripts that resulted in no product. Out of the 114 genes, 72 were identified by just one tag present as a single copy, and 25 of these 72 (35%) were validated via RT-PCR (table 2). The percentage of singletons verified (35%) differs from the percentage of all low-expressed genes verified (38%) by only 3 percentage points. An intriguing experiment would be to select genes with 5 tags, 10 tags, 20 tags and 30 tags, and observe how the percentage of validated genes increases. This would allow us to determine at what tag count 100% of the genes could be validated. The RT-PCR reactions provided a first round of evidence for the question of what percentage of low-expressed genes are not false positives. Since the RNA used for the RT-PCR reactions and the SAGE library production came from myo-3 cells sorted in the same manner, both procedures could contain similar amounts of sample contamination. As a consequence, we investigated the expression of these low-expressed genes in vivo. The goal of the in vivo study was to express GFP using the gene’s own promoter and determine the location of gene expression. The 43 low-expressed genes that tested positive via RT-PCR were selected for this in vivo study. First, the 43 genes were queried against the data from the C.
elegans Gene Expression Project (Hunt-Newbury et al., 2007) to search for promoter::GFP fusion information already available for any of these genes. Next, I performed a literature search for these genes and found that 5 more genes already had promoter::GFP expression data available. Promoter::GFP fusion constructs were generated for the remaining genes. Table 3 shows a summary of the promoter::GFP data. At this time there are 10 genes that show expression in muscle and 14 that do not. Thus, approximately 42% (10/24) of tested genes have detectable muscle expression. Hunt-Newbury et al. (2007) also saw discrepancies between a gene’s presence in the SAGE data and its GFP fusion expression pattern. They report that 71% of the genes whose promoter::GFP constructs expressed in muscle are also present in the muscle SAGE data. In this paper they also showed GFP expression in body wall muscle for 26 singletons, which were not part of the 128 genes selected for RT-PCR. This provides further evidence to support my finding that not all low-expressed transcripts identified by the SAGE libraries are false positives; many are in fact expressed in muscle. Etchberger et al. (2007) looked at GFP expression for 12 singletons identified in their gustatory neuron SAGE library. They found that 7/12 (58%) showed expression in the correct neuron. Of the 24 genes for which I have shown GFP expression data, 13 are singletons, and 5 of the 13 (38%) are expressed in body wall muscle. All the singleton expression data show that GFP reporter expression can be detected even when GFP is fused to the upstream regulatory region of a low-expressed gene. There are a number of possible reasons why some of the low-expressed genes selected in this study did not show expression in muscle:
(1) The gene could have been present in the SAGE muscle data due to sample contamination (a cell from another tissue type mixed with the FACS-sorted muscle cells due to sorting error).
We know by visual inspection that our sample of FACS sorted muscle cells is approximately 95% pure. Hence a particular contaminant could make up as much as 5% of the total sample. Making the GFP gate more stringent (figure 6, gate P3) did not seem to decrease the amount of contamination significantly. C. elegans muscle cells are small and sticky, making it possible for a non-GFP cell to be sorted into the sample by sticking to a GFP muscle cell. It is very difficult, if not impossible, to obtain a 100% pure sample.
(2) Sequencing errors could have generated an erroneous tag matching a particular gene that is not actually expressed in muscle. Beissbarth et al. (2004) report that in LongSAGE libraries 3.5% of tag sequences have errors resulting from PCR artifacts and 17.3% of tags have errors due to sequencing errors. Sequencing errors were addressed by setting the sequence quality score to 0.99 to eliminate low quality tags.
(3) SAGE can also pick up some leaky transcription since clones are randomly sequenced. The deeper the library, the more likely it is that leaky transcription will be detected, and if present it will likely be represented as low-abundance tags. Leaky transcription can vary from cell to cell. If a transcript is picked up due to leaky transcription it will be under-represented in the sample of transcripts from many cells. Since clones are picked randomly for sequencing, it is less likely that an under-represented tag will be sequenced.
(4) The discrepancy could have arisen from the promoter fusion. It is possible that the upstream sequence that I fused to GFP did not contain some regulatory elements important for expression in body wall muscle. Regulatory elements could have been further upstream, in introns or even downstream of the coding region. It should be noted that if a silencer regulatory sequence was missing then the promoter::GFP fusion could potentially have resulted in a false positive.
If we combine the results from the RT-PCR (38%) and the promoter fusions (42% of the genes that tested positive by RT-PCR), the data suggest that at least 16% of low-expressed genes are expressed in body wall muscle. Since the rate of false negatives is unclear, the 16% figure is a conservative estimate and could actually be higher. Taking into consideration that the muscle SAGE libraries have identified 6388 low-expressed genes, if the 16% trend holds true for the entire sample then at least 1022 of these genes are not false positives. The SAGE profile for each of the 43 genes that produced RT-PCR products was investigated. A SAGE profile is the number of tags annotated to the particular gene in each of the 14 C. elegans tissue-specific SAGE libraries that are publicly available (http://tock.bcgsc.bc.ca/cgi-bin/sage180). It was expected that genes that did not show promoter::GFP expression in muscle might tend to have a high tag count in another tissue, which would indicate that these genes may have been picked up due to contamination. Out of the 14 genes that did not show promoter::GFP expression in muscle, 5 had low tag counts in every other tissue-specific SAGE library, whereas 8 genes had a high tag count in another tissue-specific SAGE library. Based on these findings I examined whether the same trend was observed for the genes that displayed promoter::GFP expression in muscle. Six of these genes had only low tag counts in other tissues, and 4 genes showed high tag counts in other tissues. These observations suggest that the SAGE profile is not an ideal predictor of whether a low-expressed gene found in the muscle SAGE library is actually expressed in muscle or is a false positive.
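The SAGE profile counts reported above can be laid out as a 2x2 table to show how weakly they separate the two groups. The sketch below is an illustrative check only; no such test was performed in this work. It uses the counts as reported (5 and 8 for the genes without muscle GFP expression, which account for 13 of the 14 genes, and 6 and 4 for the genes with muscle GFP expression) and a plain two-sided Fisher's exact test written with the standard library.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def weight(x):
        # Unnormalised hypergeometric weight for x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    observed = weight(a)
    # Sum every table that is no more probable than the observed one.
    extreme = sum(
        weight(x) for x in range(lowest, highest + 1) if weight(x) <= observed
    )
    return extreme / comb(n, col1)


# Rows: no muscle GFP expression, muscle GFP expression.
# Columns: only low tag counts elsewhere, high tag count in another tissue.
no_muscle = (5, 8)
muscle = (6, 4)

high_no_muscle = no_muscle[1] / sum(no_muscle)
high_muscle = muscle[1] / sum(muscle)
p_value = fisher_exact_two_sided(*no_muscle, *muscle)

print(f"high tag count elsewhere, no muscle GFP: {high_no_muscle:.0%}")
print(f"high tag count elsewhere, muscle GFP:    {high_muscle:.0%}")
print(f"two-sided Fisher exact p = {p_value:.2f}")
```

Under these assumptions the difference between the two groups is well within what chance could produce with so few genes, which is consistent with the conclusion that the SAGE profile is a poor predictor here.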
If there is an interest in studying low-abundance transcripts and no information is available to determine whether the gene is in fact expressed in the tissue, then RT-PCR and in vivo expression are good first steps in deciding whether or not to pursue the candidate gene.

4.2 Depleting low expressed genes in muscle has a minor effect on myofilament integrity

My objective here was to gain insight into the importance of low-expressed genes in the formation and maintenance of myofilaments. This was accomplished by individually depleting each low-expressed gene validated via RT-PCR, and then analyzing the myofilaments for misalignment and degeneration. Two methods of gene depletion were used: knockout mutants (supplied by the C. elegans Genetics Center) and RNAi (clones carrying dsRNA available in the Ahringer feeding library, as previously described).

Eleven knockout mutants had their myofilaments observed under polarized light and none of them revealed any significant changes in filament structure. I next performed RNAi experiments on 34 of the 43 low-expressed genes. For this analysis I used two strains: RW1596 (sensitive to myofilament disruption by RNAi) and MT2495 (an RNAi-hypersensitive strain).

The RW1596 strain is a homozygous myo-3 mutant rescued by an extrachromosomal array carrying the myo-3 gene fused to GFP. Animals that are not rescued by this array show embryonic arrest. The functional MYO-3 produced from the extrachromosomal array is fused to GFP, which allows for easy visualization of myofilaments since MYO-3 makes up the center of thick filaments. In the RNAi experiment a few types of filament disorganization were observed. GFP aggregation was a common phenotype (refer to figure 18) that could have resulted from MYO-3 not being able to localize to its proper location, or from MYO-3 not being distributed evenly and thus accumulating in excess at particular locations.
Slight GFP aggregation was sometimes seen in the control RW1596 worms; this could be due to the excess expression of myo-3 from the extrachromosomal array. Another commonly observed phenotype was the misalignment of myofilaments, indicating that the gene knocked down was involved in filament stabilization. Finally, some worms displayed large gaps in their filaments where MYO-3::GFP should have been present. This could be an effect of the loss of a gene associated with protein turn-over or filament stability, so that when the gene is gone the filament cannot endure as much mechanical stress.

The myofilaments in the RW1596 strain are sensitized towards disintegration. The myosin heavy chain-A (myo-3) fused to the 238 amino acid GFP (Chalfie et al., 1994) produced by the extrachromosomal array has to be tightly packed into the center of the myofilament (Epstein, 1990). Although the myo-3::GFP array can rescue the lethal mutant phenotype, it is possible that the GFP is causing physical hindrance and not allowing for optimal compact packing of the myosin heavy chain-A in the thick filament. Even if a gene that has only a minor role in filament integrity is knocked down, an enhanced RNAi phenotype can be observed. The fragile filaments make this strain ideal for identifying even minor components of myofilament assembly and maintenance. Another feature of the RW1596 strain is the age-related sarcopenia observed in adult worms. As these worms become adults, their filaments begin to fall apart, whereas N2 worms display organized filaments until late adulthood (Herndon et al., 2002; Meissner, personal communication). Due to this age-related filament breakdown, the RW1596 worms were imaged between the L4 and young adult stages. It has been previously reported that RNAi does not display complete penetrance of the mutant phenotype because the gene is not always completely knocked down.
The penetrance of a particular mutant phenotype varies between each trial of an RNAi screen (Kamath et al., 2003; Simmer et al., 2003). To address these issues, RNAi screening was always done in triplicate and worms from each plate were scored for abnormal myofilaments in order to obtain an average from the three replicates. In addition, genes that showed ≥50% phenotype penetrance were screened a second time in order to verify the phenotype. The cut-off value of 50% was selected because low-expressed genes do not appear to be essential components of myofilaments (figure 5). As a result, most phenotypes did not demonstrate a high penetrance of filament disorganization. Unpublished data from the Moerman lab show that many major muscle components, when eliminated, result in 75-100% of animals displaying the knockout phenotype. RNAi on the MT2495 strain produced different results. This strain does not carry a GFP array and it is not hypersensitive to myofilament disintegration. Visualization must be done using polarized optics (figure 19). Due to the technicalities of using polarized light it is difficult to accurately estimate the percent penetrance of the phenotype. In this screen using MT2495, two genes showed very mild disorganization: F59F4.3 (unknown protein) and Y71F9AL.17 (alpha subunit of the COPI complex) (table 5). In the RW1596 background both these genes had >50% phenotype penetrance in the first screen, while in the second screen F59F4.3 resulted in 67% and Y71F9AL.17 in 40% (table 5). When comparing the two strains used for RNAi, 14 genes showed ≥50% myofilament disorganization in the RW1596 strain, which is sensitized for muscle degeneration, compared to only 2 genes that showed mild myofilament disorganization in MT2495. The RW1596 strain is advantageous for uncovering genes that are indeed involved in filament formation and maintenance even if they play only a minor role.
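The scoring scheme described above (three replicate plates averaged, a ≥50% cut-off for re-screening, and a 10-20% control background) can be summarized in a short sketch. The plate counts below are hypothetical, and the function names are illustrative rather than part of any protocol used in this work.

```python
from statistics import mean

RESCREEN_CUTOFF = 50.0             # percent; genes at or above this were screened again
CONTROL_BACKGROUND = (10.0, 20.0)  # percent disorganization seen in control RW1596 worms


def penetrance(abnormal, scored):
    """Percent of scored worms on one plate with abnormal myofilaments."""
    if scored == 0:
        raise ValueError("no worms scored on this plate")
    return 100.0 * abnormal / scored


def mean_penetrance(plates):
    """Average penetrance over replicate plates given as (abnormal, scored) pairs."""
    return mean(penetrance(abnormal, scored) for abnormal, scored in plates)


def needs_rescreen(percent):
    """True when a gene meets the cut-off for a second round of screening."""
    return percent >= RESCREEN_CUTOFF


def above_background(percent):
    """True when penetrance exceeds the upper end of the control range."""
    return percent > CONTROL_BACKGROUND[1]


# Hypothetical counts for one RNAi clone on three replicate plates.
plates = [(12, 20), (9, 18), (11, 20)]
average = mean_penetrance(plates)

print(f"mean penetrance: {average:.1f}%")
print("re-screen:", needs_rescreen(average))
print("above control background:", above_background(average))
```

A value such as 42.1% (reported for R10E11.1 in table 5) would sit above the control background but below the re-screening cut-off under this scheme.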
If a gene plays a key role in muscle formation we would expect to see myofilament disorganization in both strains, since major muscle proteins have severe detrimental effects when knocked out (Moerman and Williams, 2006). It is possible that low-expressed genes may be involved in myofilament formation and stabilization via interactions with actin, myosin, other structural proteins or attachment complexes. It is also possible that some low-expressed genes are subject to genetic buffering, that is, redundant pathways can compensate for their loss, resulting in just a slight RNAi phenotype. Low-expressed genes could also be involved in specialized roles such as coding for a transcription factor. In most cases filaments are still properly formed when the genes investigated here are knocked down. However, the filaments could be much weaker, leading to early onset sarcopenia, or there could be minor changes in underlying structures that could not be visualized using myo-3::GFP (for example, attachment complexes). One speculation is that in the wild, low-expressed genes may play a more crucial role than when worms are grown in favorable laboratory conditions. Worms living in the wild face many more environmental stresses that could require “fine-tuning” roles that may be provided by low-expressed genes in order to keep their muscle as strong and intact as possible. Low-expressed genes may provide candidates for discovering novel genes that may have been missed by conventional approaches such as mutagenesis. In fact, in the past few years researchers have been looking at low-abundance transcripts that are unmatched to any part of the genome as a way to uncover new genes (Boheler and Stern, 2003; Chen et al., 2002; Lee et al., 2005). A follow-up to this work would be to select candidate genes that showed RNAi phenotypes and/or had their expression in muscle validated for further studies. Many of the low-abundance genes have unknown functions.
Studies of these genes could lead to a better understanding of the complexity of how muscle cells are formed, maintained and function. Our lab generated a compilation of known muscle-affecting genes from literature research (Warner, personal communication). Three of these genes show low expression in the muscle SAGE libraries. Of the 3 genes, 2 are involved in muscle attachment and 1 is involved in muscle contraction. This serves as further evidence that there are muscle components encoded by low-expressed transcripts and that in future expression profiling experiments it may be worth pursuing some of these low-abundance genes. If I were to pursue further work on the low-expressed genes tested in this thesis I would choose the genes T23E7.2 and Y71F9AL.17 (both verified by RT-PCR, see table 5). T23E7.2 is a good candidate since its in vivo expression was exclusive to muscle. It is an unknown gene and could reveal a novel muscle component. Y71F9AL.17 has a few features that make it a good candidate: a significant RNAi phenotype in the first screen, in vivo muscle expression, presence in all three microarray trials, and homologues in other species.

CHAPTER V – CONCLUSION

Here we have provided data to demonstrate that at least 16% of low-expressed genes identified in the C. elegans muscle SAGE libraries are not false positives and could potentially identify new candidate genes worth researching. Low-expressed genes were verified here using two approaches, RT-PCR and promoter::GFP fusions. Not all genes validated via RT-PCR showed muscle expression in promoter::GFP fusions, indicating that although an in vitro method is a good initial test to eliminate false positives from the data set, expression must be verified by in vivo studies.

In this work, reverse-transcription PCR was used for the first round of validation.
Quantitative RT-PCR could offer an alternative approach, and would allow observation of how the total level of mRNA for a certain gene correlates with the number of tags detected by the SAGE experiment. Promoter::GFP fusions were also employed to validate gene expression in muscle. Promoter fusions were chosen rather than full functional fusions because I was only interested in the location of gene expression. Promoter::GFP fusions can be easily generated via PCR stitching reactions and the fusion products are directly injected into worms (Hobert, 2002).

RNAi screening and null mutant analysis of low-expressed genes indicated that for the most part they play only a minor role in myofilament formation, maintenance and turn-over. A knockout phenotype analysis is a preliminary step in determining the potential role of these genes. Additional experiments are necessary to deduce their functions. These experiments could include mutant rescue, using antibodies to look at protein localization, yeast two-hybrid, and double mutant analysis. Confirming expression in muscle and performing an RNAi screen presents the possibility of identifying potential candidates worth further research.

SAGE provides a qualitative and quantitative profile of gene expression levels at a certain time point in development. SAGE is equivalent to a snapshot of the transcriptome at the stage when cells are harvested for RNA extraction. In the SAGE libraries analysed in this thesis I examined genes expressed in late embryonic muscle cells. The availability of multiple C. elegans SAGE libraries allows us to generate a quantitative SAGE profile for transcripts throughout various developmental stages and between various tissue types. SAGE profiles can indicate whether the low-expressed transcripts identified in these two muscle SAGE libraries actually become more abundant in later stages of development or remain low-expressed throughout.

The traditional SAGE experiment described by Velculescu et al.
(1995) has evolved over the years, with SAGE libraries growing from 1000 tags to at least 40,000 tags (Huang et al., 2005; Jones et al., 2001; McGhee et al., 2007; McKay et al., 2003; Velculescu et al., 1997). As sequencing becomes cheaper, researchers can afford to pay for more clones to be sequenced and therefore produce SAGE libraries with larger numbers of tags. More tags give an even more in-depth look at the transcriptome.

New transcription profiling methods such as SuperSAGE (Matsumura et al., 2005), 5' RNA end determination (TEC-RED) (Hwang et al., 2004), and Illumina Solexa sequencing (DeepSAGE) were created to address some limitations of SAGE and improve our analysis of the transcriptome. SuperSAGE generates 26 bp tags, instead of the 21 bp tags seen in LongSAGE, to increase tag specificity and identify splice variants more precisely (Matsumura et al., 2005). The problem with SuperSAGE is that sequencing costs increase: with longer tags there are fewer tags per clone, so more clones must be sequenced to obtain the same tag number as a LongSAGE library. Our lab has now invested in Illumina Solexa sequencing (DeepSAGE), where the amplification step and sequencing are done within a flow cell using transcript clusters and fluorescently labeled bases; a laser detects which bases are being added and thus produces a sequence read (Bennett, 2004; Bennett et al., 2005). Since Solexa is now capable of producing millions of tags (5-6 million), we hope to compare the low-expressed genes identified in the SAGE libraries to the Solexa data and observe what happens to their tag counts. I expect that if the low-expressed genes identified in the SAGE data increase significantly in tag numbers in the Solexa data, then they are in fact expressed in muscle.
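The expectation above can be made concrete with a small sketch. It assumes, as a simplification that is not tested in this thesis, that tag counts scale linearly with sequencing depth and that tag sampling is approximately Poisson. The function names `expected_tags` and `prob_detected` are hypothetical. The library sizes used (a SAGE library of 100,000 tags and a Solexa run of 5 million tags) fall within the ranges quoted in this thesis.

```python
import math


def expected_tags(tags_observed, library_size, new_depth):
    """Scale an observed tag count linearly to a library of a different size.

    Assumption: the transcript's share of all tags stays the same.
    """
    return tags_observed * new_depth / library_size


def prob_detected(tags_observed, library_size, new_depth, min_tags=1):
    """Poisson probability of seeing at least `min_tags` tags at `new_depth`.

    Assumption: tag counts follow a Poisson distribution whose mean is the
    linearly rescaled count.
    """
    lam = expected_tags(tags_observed, library_size, new_depth)
    p_below = sum(
        math.exp(-lam) * lam ** k / math.factorial(k) for k in range(min_tags)
    )
    return 1.0 - p_below


# A transcript seen once in a 100,000-tag SAGE library, rescaled to 5 million tags.
print(expected_tags(1, 100_000, 5_000_000))  # 50.0

# Chance of seeing the same transcript at all in a second 100,000-tag library.
print(prob_detected(1, 100_000, 100_000))

# Chance of seeing it at least five times at 5 million tags.
print(prob_detected(1, 100_000, 5_000_000, min_tags=5))
```

Under these assumptions, a transcript represented by a single tag in a 100,000-tag library would be expected to yield about 50 tags at a depth of 5 million, so a rare transcript that is truly present should be clearly visible at Solexa depth.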
One pitfall is that if a low-expressed gene is detected because of contamination by cells from other tissues in the FACS-sorted muscle cell sample, then its tag number will also increase as we sequence deeper. The low-expressed genes identified in the SAGE libraries that remain at very low levels as we sequence deeper are most likely to be false positives. Therefore, no matter how deep we sequence, experimental work is still needed to validate low-expressed genes.

Since low-expressed genes make up more than half of the genes identified by the muscle SAGE libraries, it is important to know whether these genes are false positives that should be removed from the data set or whether they are in fact expressed in muscle. This work provides encouraging data for the use of low-expressed genes as a source of new candidate genes. At least 16% of the tested low-expressed genes show expression in muscle both in vitro and in vivo. If this trend holds true for the entire library, then 16% is still a significant number: the SAGE libraries have 6388 low-expressed genes, so 16% corresponds to about 1022 genes (0.16 × 6388 ≈ 1022). Looking at genes that are enriched in the SAGE data can provide the key players of muscle cell structure and function. Focusing on low-abundance genes may help fill in the gaps in our knowledge by providing potential candidates that could be aiding the key players in fulfilling their essential functions.

REFERENCES

Adams, M.D., M. Dubnick, A.R. Kerlavage, R. Moreno, J.M. Kelley, T.R. Utterback, J.W. Nagle, C. Fields, and J.C. Venter. (1992). Sequence identification of 2,375 human brain genes. Nature. 355(6361):632-4.
Adams, M.D., J.M. Kelley, J.D. Gocayne, M. Dubnick, M.H. Polymeropoulos, H. Xiao, C.R. Merril, A. Wu, B. Olde, R.F. Moreno, et al. (1991). Complementary DNA sequencing: expressed sequence tags and human genome project. Science. 252(5013):1651-6.
Ahringer, J. (1997). Turn to the worm! Curr Opin Genet Dev. 7(3):410-5.
Altun, Z.F., and D.H. Hall.
(2005). Mesodermal Organs: Body Wall Muscle. In WormAtlas. http://www.wormatlas.org/handbook/contents.htm.
Ashburner, M., C.A. Ball, J.A. Blake, D. Botstein, H. Butler, J.M. Cherry, A.P. Davis, K. Dolinski, S.S. Dwight, J.T. Eppig, M.A. Harris, D.P. Hill, L. Issel-Tarver, A. Kasarskis, S. Lewis, J.C. Matese, J.E. Richardson, M. Ringwald, G.M. Rubin, and G. Sherlock. (2000). Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 25(1):25-9.
Beissbarth, T., L. Hyde, G.K. Smyth, C. Job, W.M. Boon, S.S. Tan, H.S. Scott, and T.P. Speed. (2004). Statistical modeling of sequencing errors in SAGE libraries. Bioinformatics. 20 Suppl 1:i31-9.
Bennett, S. (2004). Solexa Ltd. Pharmacogenomics. 5(4):433-8.
Bennett, S.T., C. Barnes, A. Cox, L. Davies, and C. Brown. (2005). Toward the 1,000 dollars human genome. Pharmacogenomics. 6(4):373-82.
Biemar, F., D.A. Nix, J. Piel, B. Peterson, M. Ronshaugen, V. Sementchenko, I. Bell, J.R. Manak, and M.S. Levine. (2006). Comprehensive identification of Drosophila dorsal-ventral patterning genes using a whole-genome tiling array. Proc Natl Acad Sci U S A. 103(34):12763-8.
Boheler, K.R., and M.D. Stern. (2003). The new role of SAGE in gene discovery. Trends Biotechnol. 21(2):55-7; discussion 57-8.
Brenner, S. (1974). The genetics of Caenorhabditis elegans. Genetics. 77(1):71-94.
Chalfie, M., Y. Tu, G. Euskirchen, W.W. Ward, and D.C. Prasher. (1994). Green fluorescent protein as a marker for gene expression. Science. 263(5148):802-5.
Chen, J., M. Sun, S. Lee, G. Zhou, J.D. Rowley, and S.M. Wang. (2002). Identifying novel transcripts and novel genes in the human genome by using novel SAGE tags. Proc Natl Acad Sci U S A. 99(19):12257-62.
Chen, N., A. Mah, O. Blacque, J. Chu, K. Phgora, M. Bakhoum, C.R. Hunt Newbury, J. Khattra, S. Chan, A. Go, E. Efimenko, R. Johnsen, P. Phirke, P. Swoboda, M. Marra, D. Moerman, M. Leroux, D. Baillie, and L. Stein. (2006).
Identification of ciliary and ciliopathy genes in Caenorhabditis elegans through comparative genomics. Genome Biology. 7(12):R126.
Christensen, M., A. Estevez, X. Yin, R. Fox, R. Morrison, M. McDonnell, C. Gleason, D.M. Miller, and K. Strange. (2002). A Primary Culture System for Functional Analysis of C. elegans Neurons and Muscle Cells. Neuron. 33(4):503-514.
C. elegans Sequencing Consortium. (1998). Genome sequence of the nematode C. elegans: a platform for investigating biology. Science. 282(5396):2012-8.
Divina, P., and J. Forejt. (2004). The Mouse SAGE Site: database of public mouse SAGE libraries. Nucleic Acids Res. 32(Database issue):D482-3.
Dupuy, D., Q.R. Li, B. Deplancke, M. Boxem, T. Hao, P. Lamesch, R. Sequerra, S. Bosak, L. Doucette-Stamm, I.A. Hope, D.E. Hill, A.J. Walhout, and M. Vidal. (2004). A first version of the Caenorhabditis elegans Promoterome. Genome Res. 14(10B):2169-75.
Epstein, H.F. (1990). Genetic analysis of myosin assembly in Caenorhabditis elegans. Mol Neurobiol. 4(1-2):1-25.
Epstein, H.F., R.H. Waterston, and S. Brenner. (1974). A mutant affecting the heavy chain of myosin in Caenorhabditis elegans. J Mol Biol. 90(2):291-300.
Etchberger, J.F., A. Lorch, M.C. Sleumer, R. Zapf, S.J. Jones, M.A. Marra, R.A. Holt, D.G. Moerman, and O. Hobert. (2007). The molecular signature and cis-regulatory architecture of a C. elegans gustatory neuron. Genes Dev. 21(13):1653-74.
Ewing, B., L. Hillier, M.C. Wendl, and P. Green. (1998). Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8(3):175-85.
Fire, A., D. Albertson, S.W. Harrison, and D.G. Moerman. (1991). Production of antisense RNA leads to effective and specific inhibition of gene expression in C. elegans muscle. Development. 113(2):503-14.
Fire, A., S. Xu, M.K. Montgomery, S.A. Kostas, S.E. Driver, and C.C. Mello. (1998). Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans. Nature. 391(6669):806-11.
Fox, R.M., J.D.
Watson, S.E. Von Stetina, J. McDermott, T.M. Brodigan, T. Fukushige, M. Krause, and D.M. Miller, 3rd. (2007). The embryonic muscle transcriptome of Caenorhabditis elegans. Genome Biol. 8(9):R188.
Francis, R., and R.H. Waterston. (1991). Muscle cell attachment in Caenorhabditis elegans. J Cell Biol. 114(3):465-79.
Fukushige, T., and M. Krause. (2005). The myogenic potency of HLH-1 reveals wide-spread developmental plasticity in early C. elegans embryos. Development. 132(8):1795-805.
Gettner, S.N., C. Kenyon, and L.F. Reichardt. (1995). Characterization of beta pat-3 heterodimers, a family of essential integrin receptors in C. elegans. J Cell Biol. 129(4):1127-41.
Gorski, S.M., S. Chittaranjan, E.D. Pleasance, J.D. Freeman, C.L. Anderson, R.J. Varhol, S.M. Coughlin, S.D. Zuyderduyn, S.J. Jones, and M.A. Marra. (2003). A SAGE approach to discovery of genes involved in autophagic cell death. Curr Biol. 13(4):358-63.
Hill, A.A., C.P. Hunter, B.T. Tsung, G. Tucker-Kellogg, and E.L. Brown. (2000). Genomic analysis of gene expression in C. elegans. Science. 290(5492):809-12.
Hillier, L.W., A. Coulson, J.I. Murray, Z. Bao, J.E. Sulston, and R.H. Waterston. (2005). Genomics in C. elegans: so many genes, such a little worm. Genome Res. 15(12):1651-60.
Hobert, O. (2002). PCR fusion-based approach to create reporter gene constructs for expression analysis in transgenic C. elegans. Biotechniques. 32(4):728-30.
Holland, M.J. (2002). Transcript abundance in yeast varies over six orders of magnitude. J Biol Chem. 277(17):14363-6.
Holt, S.J., and D.L. Riddle. (2003). SAGE surveys C. elegans carbohydrate metabolism: evidence for an anaerobic shift in the long-lived dauer larva. Mech Ageing Dev. 124(7):779-800.
Hresko, M.C., B.D. Williams, and R.H. Waterston. (1994). Assembly of body wall muscle and muscle cell attachment structures in Caenorhabditis elegans. J Cell Biol. 124(4):491-506.
Huang, J., X. Miao, W. Jin, P. Couble, K. Mita, Y. Zhang, W. Liu, L. Zhuang, Y. Shen, C.
Keime, O. Gandrillon, P. Brouilly, J. Briolay, G. Zhao, and Y. Huang. (2005). Serial analysis of gene expression in the silkworm, Bombyx mori. Genomics. 86(2):233-41.
Hunt-Newbury, R., R. Viveiros, R. Johnsen, A. Mah, D. Anastas, L. Fang, E. Halfnight, D. Lee, J. Lin, A. Lorch, S. McKay, H.M. Okada, J. Pan, A.K. Schulz, D. Tu, K. Wong, Z. Zhao, A. Alexeyenko, T. Burglin, E. Sonnhammer, R. Schnabel, S.J. Jones, M.A. Marra, D.L. Baillie, and D.G. Moerman. (2007). High-throughput in vivo analysis of gene expression in Caenorhabditis elegans. PLoS Biol. 5(9):e237.
Hwang, B.J., H.M. Muller, and P.W. Sternberg. (2004). Genome annotation by high-throughput 5' RNA end determination. Proc Natl Acad Sci U S A. 101(6):1650-5.
Jones, S.J., D.L. Riddle, A.T. Pouzyrev, V.E. Velculescu, L. Hillier, S.R. Eddy, S.L. Stricklin, D.L. Baillie, R. Waterston, and M.A. Marra. (2001). Changes in gene expression associated with developmental arrest and longevity in Caenorhabditis elegans. Genome Res. 11(8):1346-52.
Kamath, R.S., A.G. Fraser, Y. Dong, G. Poulin, R. Durbin, M. Gotta, A. Kanapin, N. Le Bot, S. Moreno, M. Sohrmann, D.P. Welchman, P. Zipperlen, and J. Ahringer. (2003). Systematic functional analysis of the Caenorhabditis elegans genome using RNAi. Nature. 421(6920):231-7.
Kavathas, P., V.P. Sukhatme, L.A. Herzenberg, and J.R. Parnes. (1984). Isolation of the gene encoding the human T-lymphocyte differentiation antigen Leu-2 (T8) by gene transfer and cDNA subtraction. Proc Natl Acad Sci U S A. 81(24):7688-92.
Kim, Y.C., Y.C. Jung, Z. Xuan, H. Dong, M.Q. Zhang, and S.M. Wang. (2006). Pan-genome isolation of low abundance transcripts using SAGE tag. FEBS Lett. 580(28-29):6721-9.
Laing, N.G., and K.J. Nowak. (2005). When contractile proteins go bad: the sarcomere and skeletal muscle disease. Bioessays. 27(8):809-22.
Lash, A.E., C.M. Tolstoshev, L. Wagner, G.D. Schuler, R.L. Strausberg, G.J. Riggins, and S.F. Altschul. (2000). SAGEmap: a public gene expression resource.
Genome Res. 10(7):1051-60.
Lee, S., J. Bao, G. Zhou, J. Shapiro, J. Xu, R.Z. Shi, X. Lu, T. Clark, D. Johnson, Y.C. Kim, C. Wing, C. Tseng, M. Sun, W. Lin, J. Wang, H. Yang, W. Du, C.I. Wu, X. Zhang, and S.M. Wang. (2005). Detecting novel low-abundant transcripts in Drosophila. RNA. 11(6):939-46.
Lehner, B., A. Calixto, C. Crombie, J. Tischler, A. Fortunato, M. Chalfie, and A.G. Fraser. (2006). Loss of LIN-35, the Caenorhabditis elegans ortholog of the tumor suppressor p105Rb, results in enhanced RNA interference. Genome Biol. 7(1):R4.
Li, Y.J., P. Xu, X. Qin, D.E. Schmechel, C.M. Hulette, J.L. Haines, M.A. Pericak-Vance, and J.R. Gilbert. (2006). A comparative analysis of the information content in long and short SAGE libraries. BMC Bioinformatics. 7:504.
Mackenzie, J.M., Jr., and H.F. Epstein. (1980). Paramyosin is necessary for determination of nematode thick filament length in vivo. Cell. 22(3):747-55.
Matsumura, H., A. Ito, H. Saitoh, P. Winter, G. Kahl, M. Reuter, D.H. Kruger, and R. Terauchi. (2005). SuperSAGE. Cellular Microbiology. 7(1):11-18.
McGhee, J.D., M.C. Sleumer, M. Bilenky, K. Wong, S.J. McKay, B. Goszczynski, H. Tian, N.D. Krich, J. Khattra, R.A. Holt, D.L. Baillie, Y. Kohara, M.A. Marra, S.J. Jones, D.G. Moerman, and A.G. Robertson. (2007). The ELT-2 GATA-factor and the global regulation of transcription in the C. elegans intestine. Dev Biol. 302(2):627-45.
McKay, S.J., R. Johnsen, J. Khattra, J. Asano, D.L. Baillie, S. Chan, N. Dube, L. Fang, B. Goszczynski, E. Ha, E. Halfnight, R. Hollebakken, P. Huang, K. Hung, V. Jensen, S.J. Jones, H. Kai, D. Li, A. Mah, M. Marra, J. McGhee, R. Newbury, A. Pouzyrev, D.L. Riddle, E. Sonnhammer, H. Tian, D. Tu, J.R. Tyson, G. Vatcher, A. Warner, K. Wong, Z. Zhao, and D.G. Moerman. (2003). Gene expression profiling of cells, tissues, and developmental stages of the nematode C. elegans. Cold Spring Harb Symp Quant Biol. 68:159-69.
Mello, C.C., J.M. Kramer, D. Stinchcomb, and V. Ambros. (1991).
Efficient gene transfer in C.elegans: extrachromosomal maintenance and integration of transforming sequences. EMBO J. 10(12):3959-70.
Moerman, D.G., and B.D. Williams. (2006). Sarcomere assembly in C. elegans muscle. In WormBook, ed. The C. elegans Research Community.
Nowak, R. (1995). Entering the postgenome era. Science. 270(5235):368-9, 371.
Pal, D. (2006). On gene ontology and function annotation. Bioinformation. 1(3):97-8.
Pleasance, E.D., M.A. Marra, and S.J. Jones. (2003). Assessment of SAGE in transcript identification. Genome Res. 13(6A):1203-15.
Reinke, V., H.E. Smith, J. Nance, J. Wang, C. Van Doren, R. Begley, S.J. Jones, E.B. Davis, S. Scherer, S. Ward, and S.K. Kim. (2000). A global profile of germline gene expression in C. elegans. Mol Cell. 6(3):605-16.
Saha, S., A.B. Sparks, C. Rago, V. Akmaev, C.J. Wang, B. Vogelstein, K.W. Kinzler, and V.E. Velculescu. (2002). Using the transcriptome to annotate the genome. Nat Biotechnol. 20(5):508-12.
Schena, M., D. Shalon, R.W. Davis, and P.O. Brown. (1995). Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science. 270(5235):467-70.
Simmer, F., C. Moorman, A.M. van der Linden, E. Kuijk, P.V. van den Berghe, R.S. Kamath, A.G. Fraser, J. Ahringer, and R.H. Plasterk. (2003). Genome-wide RNAi of C. elegans using the hypersensitive rrf-3 strain reveals novel gene functions. PLoS Biol. 1(1):E12.
Sulston, J.E., and H.R. Horvitz. (1977). Post-embryonic cell lineages of the nematode, Caenorhabditis elegans. Dev Biol. 56(1):110-56.
Sulston, J.E., E. Schierenberg, J.G. White, and J.N. Thomson. (1983). The embryonic cell lineage of the nematode Caenorhabditis elegans. Dev Biol. 100(1):64-119.
Sun, M., G. Zhou, S. Lee, J. Chen, R.Z. Shi, and S.M. Wang. (2004). SAGE is far more sensitive than EST for detecting low-abundance transcripts. BMC Genomics. 5(1):1.
Timmons, L., D.L. Court, and A. Fire. (2001).
Ingestion of bacterially expressed dsRNAs can produce specific and potent genetic interference in Caenorhabditis elegans. Gene. 263(1-2):103-12.
Timmons, L., and A. Fire. (1998). Specific interference by ingested dsRNA. Nature. 395(6705):854.
Velculescu, V.E., S.L. Madden, L. Zhang, A.E. Lash, J. Yu, C. Rago, A. Lal, C.J. Wang, G.A. Beaudry, K.M. Ciriello, B.P. Cook, M.R. Dufault, A.T. Ferguson, Y. Gao, T.C. He, H. Hermeking, S.K. Hiraldo, P.M. Hwang, M.A. Lopez, H.F. Luderer, B. Mathews, J.M. Petroziello, K. Polyak, L. Zawel, K.W. Kinzler, et al. (1999). Analysis of human transcriptomes. Nat Genet. 23(4):387-8.
Velculescu, V.E., L. Zhang, B. Vogelstein, and K.W. Kinzler. (1995). Serial analysis of gene expression. Science. 270(5235):484-7.
Velculescu, V.E., L. Zhang, W. Zhou, J. Vogelstein, M.A. Basrai, D.E. Bassett, Jr., P. Hieter, B. Vogelstein, and K.W. Kinzler. (1997). Characterization of the yeast transcriptome. Cell. 88(2):243-51.
Wang, S.M. (2006). Applying the SAGE technique to study the effects of electromagnetic field on biological systems. Proteomics. 6(17):4765-8.
Waterston, R.H. (1989). The minor myosin heavy chain, mhcA, of Caenorhabditis elegans is necessary for the initiation of thick filament assembly. EMBO J. 8(11):3429-36.
White, G.E., C.M. Petry, and F. Schachat. (2003). The pathway of myofibrillogenesis determines the interrelationship between myosin and paramyosin synthesis in Caenorhabditis elegans. J Exp Biol. 206(Pt 11):1899-906.
Yamamoto, M., Y. Maehara, K. Takahashi, and H. Endo. (1983). Cloning of sequences expressed specifically in tumors of rat. Proc Natl Acad Sci U S A. 80(24):7524-7.
Yamamoto, M., T. Wakatsuki, A. Hada, and A. Ryo. (2001). Use of serial analysis of gene expression (SAGE) technology. J Immunol Methods. 250(1-2):45-66.

