UBC Theses and Dissertations

Cloning and annotation of novel transcripts from human embryonic stem cells Khattra, Jaswinder 2007

CLONING AND ANNOTATION OF NOVEL TRANSCRIPTS FROM HUMAN EMBRYONIC STEM CELLS

by

Jaswinder Khattra

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

in

The Faculty of Graduate Studies

(Genetics)

THE UNIVERSITY OF BRITISH COLUMBIA

December 2007

© Jaswinder Khattra, 2007

ABSTRACT

Both cDNA tag-based and DNA chip hybridization assays have revealed widespread transcriptional activity across mammalian genomes, providing a rich source of novel protein-coding and non-coding transcripts. Annotation and functional evaluation of this undefined transcriptome space represent a major step towards the comprehensive definition of biomolecules regulating the properties of living cells, including embryonic stem cells (ESCs) and their derivatives. In this study I analysed 87 rare mRNA transcripts from human ESCs that mapped uniquely to the human genome, in regions lacking evidence for known genes or transcripts. These transcripts also appeared enriched in the hESC transcriptome as enumerated by serial analysis of gene expression (SAGE). Full-length transcripts corresponding to twelve novel LongSAGE tags were recovered and evaluated with respect to gene structure, protein-coding potential, and gene regulatory features. In addition, transcript abundance was compared between RNA isolated from undifferentiated hESCs and from differentiated cells. Analysis of full-length transcripts revealed that the novel ORFs did not exceed 129 amino acids and had no matches to well-characterized protein domains. Notable protein-level predictions included small disulfide-bonded proteins, a class whose known members are important in a variety of biological processes.
Transcripts evaluated for differential expression by real-time RT-qPCR (reverse transcription followed by real-time quantitative polymerase chain reaction) were found to be variably expressed (0.2- to 4.5-fold) in Day-2 or Day-4 retinoic acid-induced differentiation cultures compared to undifferentiated hESCs. Relative quantitation using a universal reference RNA (derived from pooled adult tissues) showed large differences in novel transcript levels (0.002- to 35-fold) compared to hESCs. Collectively, these results provide a detailed analysis of a set of novel hESC transcripts and of their abundance in early and adult differentiated cell types, which may advance our understanding of the transcriptional events governing stem cell behavior.

TABLE OF CONTENTS

Abstract
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Acknowledgements
1. Introduction
  1.1 Human embryonic stem cell research
  1.2 Small protein gene discovery
  1.3 Transcriptome profiling using LongSAGE
  1.4 Hypothesis
  1.5 Objectives
2. Amplification and cloning of novel cDNAs
  2.1 Introduction
  2.2 Methods
    2.2.1 Reverse transcription and RACE chemistry
    2.2.2 Cloning RACE products and full-length transcripts
    2.2.3 Cloning full-length ORFs
  2.3 Results and Discussion
3. Analysis of novel transcript sequences
  3.1 Introduction
  3.2 Methods
  3.3 Results and Discussion
    3.3.1 Candidate transcript 3012
4. Quantitation of transcript abundance using RT-qPCR
  4.1 Introduction
  4.2 Methods
    4.2.1 Primer selection
    4.2.2 qPCR data analysis
  4.3 Results and Discussion
5. Conclusion and Further Work
Bibliography
Appendices
  Appendix 1. Experimental work flowchart
  Appendix 2. hESC culture and differentiation
  Appendix 3. RNA quality electropherogram
  Appendix 4. Primers for 3' RACE, Full-Length cDNA, and ORF recovery
  Appendix 5. Novel transcript sequence analysis flowchart
  Appendix 6. Candidate 3012 sequence
  Appendix 7. Candidate 3210 structure and features
  Appendix 8. Candidate 3372 structure and features
  Appendix 9. Candidate 3373 structure and features
  Appendix 10. Example of relative quantitation calculation
  Appendix 11. Biohazard Approval Certificate

LIST OF TABLES

Table 1. Human ESC lines used for LongSAGE
Table 2. InterPro domain hits to novel predicted ORFs
Table 3. Transcripts that may have resulted from oligo d(T) mispriming
Table 4. Transcripts in a region of segmental duplication
Table 5. Novel ORFs with predicted disulfide bridges
Table 6. FASTA similarity search of 111 aa ORF from candidate 3210
Table 7. Primers for qPCR assays
Table 8. Relative quantitation of transcript abundance
Supplementary Table 1. Transcript sequence data. ftp://ftp03.bcgsc.ca/public/hESC_novel

LIST OF FIGURES

Figure 1. RACE chemistry
Figure 2. Agarose gel electrophoresis of RACE amplicons
Figure 3. Agarose gel electrophoresis of colony PCR verification of cloned RACE products
Figure 4. Candidate 3012 structure and features
Figure 5. Candidate 3210 alignment of FASTA similarity search results

LIST OF ABBREVIATIONS

Abbreviation | Stands for
aa | Amino acids
BCCA | British Columbia Cancer Agency
BLAST | Basic local alignment search tool
bp | Base pairs
cDNA | Complementary DNA
ChIP-chip | Chromatin immunoprecipitation on DNA chip
CMOST | Comprehensive mapping of SAGE tags
CSPG4 | Chondroitin sulfate proteoglycan 4
db | Database
DNMT3B | DNA (cytosine-5-)-methyltransferase 3 beta
dNTPs | Deoxynucleoside triphosphates
EGF-like | Epidermal growth factor-like
ESC | Embryonic stem cell
EST | Expressed sequence tag
GSC | Genome Sciences Centre
hESC | Human embryonic stem cell
irrMEFs | Irradiated mouse embryonic fibroblasts
kb | Kilo base-pairs
kDa | Kilo Dalton
LB | Luria-Bertani
LTR | Long terminal repeat
miRNA | Micro RNA
MPSS | Massively parallel signature sequencing
MW | Molecular weight
n/a | Not applicable
ORF | Open reading frame
pI | Isoelectric point
qPCR | Quantitative polymerase chain reaction
RA | Retinoic acid
RACE | Rapid amplification of cDNA ends
rpm | Revolutions per minute
RT | Reverse transcription
SAGE | Serial analysis of gene expression
SMART™ | Switching mechanism at the 5' end of RNA transcript
TFBS | Transcription factor binding site
UCSC | University of California at Santa Cruz
UTR | Untranslated region

ACKNOWLEDGEMENTS

Dr. Marco Marra for supporting and directing my research during graduate studies at the BCCA Genome Sciences Centre.

Dr. Connie Eaves and Dr. Pamela Hoodless for input and feedback as members of my thesis advisory committee.

Dr. Michael O'Connor and Dr. Connie Eaves for providing cultured human embryonic stem cells and differentiation cultures, and suggestions for qPCR assays.

Ryan Morin and Dr. Martin Hirst for providing assistance and orientation with starting this project.

The BCCA Genome Sciences Centre. I am indebted to numerous groups, including Sequencing, Administration, Operations, and IT Systems, who have provided expert assistance and maintained state-of-the-art research laboratory facilities.
Genome British Columbia, Stem Cell Network Canada, and the National Cancer Institute (USA) for funding this research.

The Michael Smith Foundation for Health Research for a senior graduate trainee award.

1. INTRODUCTION

1.1 Human embryonic stem cell research

Since the first successful derivation and sustained culture of pluripotent human ESCs (Thomson et al. 1998), there has been growing interest in the research community (Guhr et al. 2006) in better understanding and manipulating hESCs. Although the formal study of pluripotent cells can be traced back at least fifty years (Solter 2006), only recently have their defining properties of self-renewal and pluripotency raised expectations of myriad clinical and research applications. Potential uses exist both for undifferentiated hESCs and for differentiated cell types derived from them. An increasing number of reports address the latter, including the generation of cell types from all three primary germ layers. For example, chemical induction of hESCs with retinoic acid is one approach to generating neural-lineage cells for transplantation studies in animal models of nervous system damage (Ikeda et al. 2005).

1.2 Small protein gene discovery

Proteins and functional RNAs are important components defining cellular state and function, so a thorough knowledge of the ESC transcriptome and proteome is a major goal of current research (Wang et al. 2006, Van Hoof et al. 2006, Baharvand et al. 2006, Unwin et al. 2003). The application of ESC products as therapeutic agents or as models in laboratory research will be more effective with a comprehensive catalogue of the biomolecules present within such cells, along with an understanding of their expression patterns and interactions within or between individual cells. Small proteins are important effectors in a variety of biological processes, including cell signalling, immunity, and metabolism.
In regard to small-protein gene discovery, expert reviews of large-scale cDNA sequencing projects showed that the longest ORFs did not necessarily correspond to the actual protein-coding regions (Suzuki and Sugano 2006). Proteome-wide mass spectrometry has revealed large populations of small novel proteins, such as those encoded by upstream open reading frames in 5' UTRs (Oyama et al. 2004). A genome-wide evaluation of short ORFs in eukaryotes for the purpose of discovering new proteins was first conducted in the yeast model system (Andrade et al. 1997) and led to the discovery of bona fide novel proteins. Given these important roles in basic cellular processes, and by extension in developing cell-based therapeutics, it is imperative that short proteins be well characterized. However, their small size has left short proteins under-represented in large mammalian cDNA collections, as evidenced by an artificial discontinuity in annotated protein-coding cDNAs below the 100 amino acid mark (Frith et al. 2006). This is partly a result of weak statistical confidence for short hits in sequence similarity searches, which is compounded further for novel short proteins that lack similarity to known proteins or span a narrow phyletic range. Therefore, at a minimum, describing the short proteome requires avoiding artificial lower size cutoffs both in the transcriptome assay itself (e.g. cDNA size fractionation) and in cDNA annotation and de novo gene prediction methods. Compared with direct protein assays, the analysis of protein-coding mRNA transcripts and, more recently, entire transcriptomes is a more feasible approach for discovering low-abundance protein-coding genes. This is a consequence of our ability to manipulate nucleic acids in ways not yet possible with cellular proteins, including harvesting and assaying minute samples, molecular amplification, subtraction, and clonal propagation.
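The effect of a lower size cutoff in ORF annotation can be illustrated with a minimal six-frame ORF scanner. This is a sketch under stated assumptions (ATG-initiated ORFs, the standard stop codons TAA/TAG/TGA, ORFs without a stop codon ignored); the function names `find_orfs` and `reverse_complement` and the `min_aa` parameter are hypothetical, and this is not the algorithm of any particular annotation pipeline.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq, min_aa=0):
    """Scan all six frames for ATG-initiated ORFs that end at a stop codon.

    Returns (strand, start, end, length_aa) tuples. start/end are 0-based,
    end-exclusive coordinates on the scanned strand; length_aa excludes the
    stop codon. ORFs shorter than min_aa codons are discarded, which mimics
    a hard lower size cutoff in an annotation step.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG after the previous stop in this frame
                elif start is not None and codon in STOP_CODONS:
                    length_aa = (i - start) // 3
                    if length_aa >= min_aa:
                        orfs.append((strand, start, i + 3, length_aa))
                    start = None
    return orfs


if __name__ == "__main__":
    # Hypothetical test sequence: a 40-codon ORF followed by a 120-codon ORF.
    short_orf = "ATG" + "GCT" * 39 + "TAA"
    long_orf = "ATG" + "GCT" * 119 + "TGA"
    seq = short_orf + long_orf
    print(find_orfs(seq))              # both ORFs reported
    print(find_orfs(seq, min_aa=100))  # the 40-codon ORF is silently dropped
```

With `min_aa=100`, the 40-codon ORF disappears from the output even though it is a complete ATG-to-stop reading frame, which is the kind of artificial size floor the paragraph above cautions against.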
1.3 Transcriptome profiling using LongSAGE

Cross-platform comparisons of transcript abundance measurements have shown low correlation (Griffith et al. 2005, Oudes et al. 2005, van Ruissen et al. 2005), underscoring the value of using multiple technologies to circumvent biases inherent in individual methods. In comparison to LongSAGE (Saha et al. 2002) or other cDNA tag-based methods for interrogating the growing complexity of mammalian transcriptomes (Carninci 2006), DNA microarray studies have been biased towards well-characterized and relatively abundant transcripts, with little overlap observed among hESC-specific gene lists (Vogel 2003). MPSS data (Brandenberger et al. 2004a), despite greater sampling depth, showed poorer coverage of transcriptomes than well-curated databases (Siddiqui et al. 2006). hESC EST data (Miura et al. 2004) were limited, and EST sequencing is not affordable on a scale comparable to that achieved by shorter cDNA tag-based approaches such as SAGE. Because key molecular regulators may be of low abundance or transient in their expression, it is advantageous to use a comprehensive sampling assay such as deep LongSAGE. A further improvement in sampling cDNA tag-based libraries is now achievable with next-generation sequencing technologies that have recently become affordable (Nielsen et al. 2006; Ng et al. 2006).

Bioinformatic resources for processing and mapping SAGE data have improved vastly since SAGE was first described (Velculescu et al. 1995). Advances include direct tag-to-genome mapping of LongSAGE tags (Saha et al. 2002), cross-species conservation analysis of novel tag regions (Schnerch 2005), and CMOST and p-value-based quality filters (Siddiqui et al. 2006). These advances allow low-frequency tags to be processed into high-confidence predictions for downstream functional studies.
Furthermore, targeted rescue experiments on transcripts corresponding to low-abundance tags have shown many such tags to represent bona fide transcripts (Richards et al. 2006; Kim et al. 2006; Siddiqui et al. 2005; Chen et al. 2002).

In prior work (Schnerch 2005), an analysis was performed on a meta-library of 2.6 million LongSAGE tags acquired from nine undifferentiated hESC lines that had been independently derived in four different laboratories (Table 1). One aim of that analysis was to identify tags enriched in the hESC meta-library by comparison with publicly available human SAGE libraries constructed from a variety of tissues. Enriched tags were filtered to select those that mapped to human genomic sequence distant from known transcripts. A further criterion was that the tag-containing human genomic sequence be orthologous to mouse or rat sequence. The result was a filtered list of 301 novel hESC-enriched tags. 5' RACE experiments resulted in the rescue of 126 partial cDNA transcripts corresponding to these novel tags (Hirst et al. 2007). My project was a continuation of these hESC gene discovery efforts begun earlier in our department and complements similar efforts elsewhere (Richards et al. 2006; Boheler and Tarasov 2006).

Table 1.
Human embryonic stem cell lines used in LongSAGE

# | Cell line (source) | Number of tags | NIH registry | Passage | Feeder layer | Gender
1 | H1 (Thomson, WI) | 218,000 | WA01 | 54 | Matrigel | XY
2 | H1 (Thomson, WI) | 276,000 | WA01 | 31 | irrMEFs, CF-1 mice | XY
3 | H7 (Thomson, WI) | 272,000 | WA07 | 22 | irrMEFs, CF-1 mice | XX
4 | H9 (Thomson, WI) | 468,000 | WA09 | 38 | irrMEFs, CF-1 mice | XX
5 | H13 (Thomson, WI) | 221,000 | WA13 | 22 | irrMEFs, CF-1 mice | XY
6 | H14 (Thomson, WI) | 212,000 | WA14 | 22 | irrMEFs, CF-1 mice | XY
7 | ES 3 (Pera, Australia) | 206,000 | ES03 | 16 | irrMEFs, B-81 mice | XX
8 | ES 4 (Pera, Australia) | 209,000 | ES04 | 36 | irrMEFs, B-81 mice | XY
9 | HSF-6 (Firpo, CA) | 190,000 | UC06 | 50 | irrMEFs, CF-1 mice | XX
10 | hESBGN-01 (BresaGen, GA) | 202,000 | BG01 | 20 | irrMEFs | XX

1.4 Hypothesis

Human ESC transcriptomes contain uncharacterized protein-coding transcripts that may be variably expressed in undifferentiated versus differentiated cell types and may be candidate regulators of ESC function.

1.5 Objectives

My objectives were to discover novel ORFs encoded by rare mRNA transcripts detected in hESC transcriptomes, and to assess regulation of transcript levels by measuring transcript abundance in undifferentiated ES cells versus differentiated cell types. The experimental approach comprised the following steps. One, conduct a bioinformatic analysis of 126 candidate partial transcripts (which, in earlier work, had been recovered by 5' RACE from 301 novel hESC-enriched tags) and pick a subset for further study based on protein-coding potential. Two, recover 3' RACE products for the novel SAGE tags chosen in step one. Three, perform a comprehensive analysis of full-length transcript sequences from the chosen candidates to evaluate their gene structure, gene regulatory features, and predicted protein structure. Four, recover full-length cDNAs and ORFs for interesting candidates identified in step three.
The fifth and final aim was to determine whether any of the interesting candidates identified in steps three and four were present at different levels in differentiated cells compared with undifferentiated hESCs.

2. AMPLIFICATION AND CLONING OF NOVEL cDNAs

2.1 Introduction

In earlier work (Hirst et al. 2007), 126 cDNA fragments were recovered by 5' RACE from 301 novel hESC-enriched LongSAGE tags. Tags corresponding to these rescued fragments had a mean count of 2.5 tags per million (tpm; maximum 36 tpm), the rescued fragments had a mean length of 844 bp (maximum 2 kb), 47% were spliced, and 74% had independent EST support for transcription. Twelve candidate transcripts were prioritized for 3' RACE amplification after analysing the 5' RACE sequence data using a subset of the features described in Section 3 (Analysis of novel transcript sequences). Selection criteria included hits to known or poorly characterized protein domains, cross-species sequence conservation, sense-antisense tag pairs, putative single-exon genes, and tags observed only in the deeply sampled H9 library (468,000 tags). Tags for these candidate transcripts were detected in one to six libraries, meta-library tag counts were low (one to twelve tags in 2.6 million), and six transcripts were spliced.

My first task was to use standard RACE chemistry (Frohman et al. 1988; Figure 1) to rescue 3' cDNA fragments overlapping the 5' RACE products generated earlier with Clontech 5' SMART chemistry (the complete experimental workflow is in Appendix 1).

Figure 1. Schematic of RACE chemistry. Outer (GSP2) and nested (NGSP2) 3' gene-specific primers located upstream of a transcript's LongSAGE tag (and overlapping the 5' RACE product) were coupled with 3' oligo d(T)-based outer and inner primers to amplify the 3' ends of novel transcripts from reverse-transcribed hESC total RNA.
[Figure 1 schematic labels: LongSAGE tag; region of overlap; region to be amplified by 5' RACE; region to be amplified by 3' RACE; primers GSP1, NGSP1, GSP2, NGSP2; generalized first-strand cDNA, 5'-...NNAAAAA-3' paired with 3'-...NNTTTTT-5'.]

2.2 Methods

Amplicons were generated using transcript-specific oligonucleotide primers (Integrated DNA Technologies, Coralville, IA; Appendix 4) and total RNA from the H9 cell line (WA09; Thomson et al. 1998), one of the nine hESC lines that formed part of the original set used for LongSAGE library construction and meta-analysis (Table 1). RNA was kindly provided by Dr. Michael O'Connor from the laboratory of Dr. Connie Eaves (Terry Fox Laboratory, BCCA). RNA quality was verified periodically over the course of this project using a 2100 Bioanalyzer and RNA 6000 Nano chips (Agilent Technologies, Palo Alto, CA); RNA integrity numbers were 9.0 or higher.

2.2.1 Reverse transcription and RACE chemistry

Reactions were conducted using a FirstChoice™ RLM-RACE kit (Ambion, Austin, TX) and comprised the following three steps:

(a) Reverse transcription

Component | Concentration | Quantity
H9 hESC total RNA | 3.6 ug/uL | 0.3 uL (1.0 ug)
dNTP mix | 2.5 mM each | 4.0 uL
3' RACE adapter | 0.3 ug/uL | 2.0 uL
10x RT buffer | 10x | 2.0 uL
RNase inhibitor | 10 U/uL | 1.0 uL
M-MLV reverse transcriptase | 100 U/uL | 1.0 uL
Nuclease-free water | n/a | 9.7 uL

The 20 uL RT reactions were incubated at 42 °C for one hour in a Tetrad thermal cycler (MJ Research, Waltham, MA).

(b) Outer 3' RACE PCR

Component | Concentration | Quantity
Reverse transcription reaction product | n/a | 1.0 uL
10x PCR buffer | 10x | 5.0 uL
dNTP mix | 2.5 mM each | 4.0 uL
3' RACE gene-specific outer primer | 10 uM | 2.0 uL
3' RACE outer primer | 10 uM | 2.0 uL
SuperTaq™ Plus DNA polymerase | 5 U/uL | 0.25 uL
Nuclease-free water | n/a | 35.75 uL

The 50 uL outer 3' RACE PCR reactions were thermally cycled as follows: 94 °C for 3 minutes, followed by 35 cycles of (94 °C for 30 seconds, 60 °C for 30 seconds, 72 °C for 90 seconds), and a final extension at 72 °C for 7 minutes.
(c) Inner 3' RACE PCR

Component | Concentration | Quantity
Outer 3' RACE reaction product | n/a | 1.0 uL
10x PCR buffer | 10x | 5.0 uL
dNTP mix | 2.5 mM each | 4.0 uL
3' RACE gene-specific inner primer | 10 uM | 2.0 uL
3' RACE inner primer | 10 uM | 2.0 uL
SuperTaq™ Plus DNA polymerase | 5 U/uL | 0.25 uL
Nuclease-free water | n/a | 35.75 uL

The 50 uL inner 3' RACE PCR reactions were thermally cycled using the same parameters as the outer PCRs described above.

2.2.2 TOPO™ TA cloning of RACE products and full-length transcripts

PCR products were electrophoresed in preparative gels using UltraPure™ agarose (Invitrogen, Mississauga, ON), and DNA bands were visualized for excision using a non-UV Dark Reader transilluminator (Clare Chemical Research, Dolores, CO). Excised bands were purified with a MinElute Gel Extraction Kit (Qiagen, Mississauga, ON) following the manufacturer's standard microcentrifuge-based protocol.

Purified amplicons were cloned into the pCR4-TOPO vector using a TOPO™ TA Cloning Kit for Sequencing (Invitrogen) and the following reaction setup (optimized for transformation of electrocompetent E. coli):

Component | Concentration | Quantity
Fresh PCR product | n/a | 2.5 uL
Dilute salt solution | 0.25x | 1 uL
pCR4-TOPO vector | 10 ng/uL | 1 uL
Nuclease-free water | n/a | 1.5 uL

The 6 uL cloning reactions were incubated at room temperature for 30 minutes. A 2 uL aliquot was then used to transform 50 uL of One Shot™ TOP10 electrocompetent cells in 0.1 cm cuvettes with a Gene Pulser™ electroporator (Bio-Rad, Hercules, CA). Electroporated cells were immediately resuspended in 250 uL of SOC medium and incubated at 37 °C for 1 hour with shaking at 225 rpm. Serial dilutions of each transformation were then plated individually on LB-kanamycin (50 ug/mL) agar plates and incubated overnight at 37 °C. Resulting colonies were evaluated directly by colony PCR using Platinum™ Blue PCR SuperMix (Invitrogen) to verify insert size and the quality of each cloning-transformation reaction.
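When the same 50 uL 3' RACE PCR recipe from Section 2.2.1 is run for many candidates at once, the components common to every tube can be pooled into a master mix, while the template and the gene-specific primer, which differ between candidates, are added to each tube separately. The sketch below is a minimal illustration of that bookkeeping for the outer 3' RACE PCR; the 10% pipetting overage and the function name `master_mix` are assumptions for illustration, not part of the original protocol.

```python
from math import isclose

# Per-reaction volumes (uL) for the 50 uL outer 3' RACE PCR in Section 2.2.1(b).
# Components shared by every tube go into the master mix.
SHARED_UL = {
    "10x PCR buffer": 5.0,
    "dNTP mix (2.5 mM each)": 4.0,
    "3' RACE outer primer (10 uM)": 2.0,
    "SuperTaq Plus DNA polymerase (5 U/uL)": 0.25,
    "Nuclease-free water": 35.75,
}
# Components that differ between candidates are added to each tube individually.
PER_TUBE_UL = {
    "RT reaction product": 1.0,
    "3' RACE gene-specific outer primer (10 uM)": 2.0,
}
REACTION_UL = 50.0


def master_mix(shared, n_reactions, overage=0.10):
    """Scale shared components to n reactions plus a fractional pipetting overage.

    The default 10 % overage is an illustrative assumption.
    """
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in shared.items()}


# Sanity check: shared plus per-tube volumes must equal the stated reaction volume.
assert isclose(sum(SHARED_UL.values()) + sum(PER_TUBE_UL.values()), REACTION_UL)

if __name__ == "__main__":
    mix = master_mix(SHARED_UL, n_reactions=12)  # e.g. one tube per candidate
    for name, vol in mix.items():
        print(f"{name}: {vol} uL")
    print(f"Dispense {sum(SHARED_UL.values())} uL of mix per tube")
```

For twelve candidates with a 10% overage, this gives 66.0 uL of 10x buffer and 471.9 uL of water; each tube then receives 47 uL of mix, 2 uL of its own gene-specific primer, and 1 uL of RT product. The inner PCR recipe in (c) has identical volumes and can be handled the same way.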
Colony PCR reaction conditions were as follows:

Component | Concentration | Quantity
Platinum™ Blue PCR SuperMix | n/a | 24 uL
M13 F primer | 100 uM | 0.1 uL
M13 R primer | 100 uM | 0.1 uL
Template (colony-spiked) | n/a | n/a

The reactions were thermally cycled as follows: 94 °C for 10 minutes, followed by 30 cycles of (94 °C for 30 seconds, 55 °C for 30 seconds, 72 °C for 60 seconds), and a final extension at 72 °C for 10 minutes. Inserts were sized by analytical agarose gel electrophoresis, after accounting for the contribution of vector sequence (~130 bp) to the PCR amplicon size. Twelve colonies per amplicon, from each of the 12 candidates targeted for full-length recovery, were picked into 384-well LB-kanamycin plates and submitted to the Genome Sciences Centre (GSC) Sequencing Production Group for DNA sequencing.

2.2.3 Cloning full-length ORFs

Full-length ORFs were recovered using ORF-specific primers designed with flanking Gateway™ technology (Invitrogen) attB sites. Amplicons were cloned into the pDONR221 Gateway™ donor vector using standard BP recombination reaction chemistry. Cloned ORFs were subsequently prepared for transfer into the pEXP3-DEST vector using Gateway LR recombination chemistry. The expression-ready clones will be used in subsequent work for in vitro transcription and translation assays with the Expressway™ Lumio™ Cell-Free Expression and Detection System (Invitrogen).

2.3 Results and Discussion

The 3' RACE procedure required both outer and nested PCRs (Figure 2), because outer PCRs alone rarely yielded distinct amplicons. That was not surprising, given the low abundance of tags for the candidate transcripts in the hESC LongSAGE data (counts ranged from one to twelve in a meta-library of 2.6 million tags). Assuming a few hundred thousand transcripts per cell, these tag counts correspond to at most roughly one copy per cell.

[Figure 2 gel image: lanes for candidates 3399, 3443, 3461, and 3465; ladder size markers from 200 bp to 2000 bp.]

Figure 2. Agarose gel showing nested PCR amplicons generated from four candidates (3399, 3443, 3461, and 3465) using H9 hESC RNA and transcript-specific 3' RACE primers. Sizing was achieved using a 1 kb Plus DNA Ladder (Invitrogen).

I recovered a total of 36 amplicons from the 12 candidates, with one to five distinct amplicons (75 bp to 2 kb) per candidate. Clones were checked for inserts by colony PCR (Figure 3) and then sequenced.

[Figure 3 gel image: ladder size markers from 200 bp to 1000 bp; colony PCR bands labelled 75*, 175*, and 200* + 130 bp.]

Figure 3. Agarose gel electrophoresis from colony PCR verification of cloned RACE products. Six colonies were tested per RACE amplicon cloned. The three sizes of colony PCR amplicons shown for candidate 3012 represent 3' RACE products of 75 bp, 175 bp, and 200 bp. Sizing was achieved using a 1 kb Plus DNA Ladder (Invitrogen). * 3' RACE amplicon size; for colony PCR, add ~130 bp of vector.

Full-length transcript sequences were reconstructed in silico for all candidate transcripts. Following a detailed bioinformatic analysis of these sequences (as described in Section 3), I prioritized four candidates (3012, 3210, 3372, 3373) for further study and used transcript-specific primers to amplify and clone full-length cDNAs using the methods described above.

3. ANALYSIS OF NOVEL TRANSCRIPT SEQUENCES

3.1 Introduction

This project began with an analysis of 5' RACE data from the set of 126 novel hESC gene candidates that were successfully amplified by 5' RACE (Hirst et al. 2007). My objective was to shortlist a set of putative protein-coding transcripts for further study. The initial analysis included annotation of relevant empirical ChIP-chip data (Boyer et al. 2005) and scoring of nonsynonymous versus synonymous substitution rates (Ka/Ks) from pairwise alignments to the mouse genome (Nekrutenko et al. 2001). I was assisted in this regard by Mr. Ryan Morin (a CIHR Bioinformatics M.Sc. student at the GSC), who had tabulated hESC 5' RACE clone statistics.
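The Ka/Ks scoring mentioned above can be sketched as the final step of the Nei-Gojobori (1986) approach: the observed proportions of nonsynonymous and synonymous differences are each Jukes-Cantor corrected, and their ratio is taken. This is a minimal sketch, assuming the site and difference counts have already been obtained from a human-mouse ORF alignment; the function names are hypothetical, the counts in the example are invented for illustration, and this is not necessarily the exact estimator used by Nekrutenko et al. (2001).

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for an observed proportion of differing sites p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75); larger values are saturated")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ka_ks(syn_sites, nonsyn_sites, syn_diffs, nonsyn_diffs):
    """Ka/Ks from synonymous/nonsynonymous site and difference counts.

    The counts are assumed to come from a Nei-Gojobori-style comparison of an
    aligned ORF pair; this function only applies the correction and the ratio.
    """
    ks = jukes_cantor(syn_diffs / syn_sites)
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    if ks == 0.0:
        raise ZeroDivisionError("no synonymous differences: Ka/Ks is undefined")
    return ka / ks


if __name__ == "__main__":
    # Hypothetical counts for a short ORF (not data from this study).
    ratio = ka_ks(syn_sites=90.0, nonsyn_sites=297.0, syn_diffs=9, nonsyn_diffs=6)
    # A ratio well below 1 is commonly read as a sign of purifying selection.
    print(f"Ka/Ks = {ratio:.3f}")
```

For very short ORFs such as those studied here, the synonymous site count is small, so a single substitution can shift Ka/Ks considerably; the ratio is best treated as one line of evidence among several.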
I subsequently conducted 3' RACE data processing and the additional analyses described in the nine steps below (flowchart in Appendix 5). The strategy was to evaluate a variety of sequence features without restrictions on the size of transcripts or ORFs.

3.2 Methods

3.2.1. Raw 3' RACE DNA sequence was processed by trimming contaminating vector and RACE linker sequence and by excluding bases with quality below phred 20 (Ewing and Green 1998a; Ewing et al. 1998b). The presence and correct location of the LongSAGE tag were confirmed, and clones that passed these criteria were assembled using overlapping 5' and 3' RACE sequence reads. The resulting full-length transcript sequences for each target region were analyzed for sequence polymorphism and alternative polyadenylation.

3.2.2. Full-length transcript sequences were subjected to sequence composition analysis, including features such as word repeats, in-frame triplets, and nucleotide frequencies.

3.2.3. Transcripts were analysed extensively in the context of the human genome using the UCSC Genome Browser (Kuhn et al. 2007) and supporting data tracks (http://genome.ucsc.edu/, March 2006 assembly). This activity included:
(a) BLAT (Kent 2002) analysis of transcripts and corresponding ORFs.
(b) assessing potential oligo d(T) primer mispriming on genomic DNA from trace DNA contamination of RNA preparations.
(c) checking regions of pseudogenes (Ashurst et al. 2005; Karro et al. 2007) for overlap with the novel transcribed regions under study.
(d) documenting genomic repeat elements according to the RepBase library (Jurka 2000).
(e) checking for segmental duplications, defined as duplications over 1 kb of non-RepBase-masked sequence (Bailey et al. 2001).
(f) assessing cross-species sequence conservation across 17 vertebrate species, based on the phastCons model (Siepel et al. 2005).
(g) evaluating proximal regulatory sequences using the Transfac database (Matys et al.
2006), CpG island (Gardiner-Garden and Frommer 1987) and DNaseI hypersensitivity (Crawford et al. 2004) tracks, canonical TATA and CAAT box signals in upstream regions, and polyadenylation signals at transcript 3' ends. Additional regulatory feature data included aligned transcription factor binding sites predicted from ChIP-chip experiments in ES cells (Boyer et al. 2005) for two key stem cell transcription factors, NANOG and SOX2.
(h) documenting splicing with canonical GT/AG splice signals (Clark and Thanaraj 2002).

3.2.4. Open reading frames
(a) ORF discovery was performed using the NCBI ORF Finder program (http://www.ncbi.nlm.nih.gov/gorf/gorf.html).
(b) Predicted ORFs were assessed for codon usage bias using the Graphical Codon Usage Analyzer program (Fuhrmann et al. 2004), which computed the usage frequency of each codon against a reference codon usage table derived from well-characterized human protein-coding genes (http://gcua.schoedl.de).

3.2.5. Sequence similarity searches were conducted for both full-length transcripts and predicted ORFs, primarily using the BLAST family of algorithms (Altschul et al. 1990; McGinnis and Madden 2004; http://www.ncbi.nlm.nih.gov/BLAST/):
(a) TBLASTX, searching the translated human EST db using translated transcripts.
(b) TBLASTN, searching the translated human EST db using novel ORF aa sequences.
(c) BLASTX, searching the non-redundant protein db using translated transcripts.
(d) BLASTP, searching the non-redundant protein db using predicted ORF aa sequences.
To find regions of low similarity in diverged sequences and to detect distant evolutionary relationships, the following more sensitive search algorithms were also applied:
(e) PSI-BLAST (Altschul et al. 1997), searching the non-redundant protein db using novel ORF aa sequences.
(f) MPsrch (http://www.ebi.ac.uk/MPsrch/), searching the UniProt Knowledgebase (Bairoch et al. 2005) using novel ORF aa sequences.

3.2.6.
Domain and motif searches were performed for predicted ORFs using the Conserved Domain Database (Marchler-Bauer et al. 2005) and the InterPro collection of protein domains (Mulder et al. 2007), which comprises 16 major data sources (http://www.ebi.ac.uk/interpro/). Short functional sequences were evaluated using the Eukaryotic Linear Motif resource (Puntervoll et al. 2003), which includes motifs for peptide cleavage, ligand binding, post-translational modifications (PTMs), and subcellular targeting (http://elm.eu.org).

3.2.7. The predicted 5' and 3' UTR regions of novel transcripts were checked for regulatory RNA sequence signals using the RegRNA tool (http://regrna.mbc.nctu.edu.tw/). This involved searching a collection of motifs from UTRdb (Mignone et al. 2005), including translational control signals and miRNA target sites.

3.2.8. Protein-level analyses were performed on predicted ORFs, primarily using software tools hosted by the ExPASy proteomics resource (Gasteiger et al. 2005) at the Swiss Institute of Bioinformatics. Each ORF was queried using the comprehensive PredictProtein server (Rost et al. 2004), and I evaluated the following features:
(a) protein physicochemical parameters, using the ExPASy ProtParam tool.
(b) secondary structure and transmembrane regions, using the PredictProtein programs PHD and PROF, TopPred (Claros and von Heijne 1994; http://bioweb.pasteur.fr/seqanal/interfaces/toppred.html), and PSIPRED (Bryson et al. 2005; http://bioinf.cs.ucl.ac.uk/psipred/).
(c) coiled-coil segments, using Coils (Lupas 1996) in PredictProtein.
(d) globularity, using the PredictProtein tool Globe.
(e) cysteine bridges, using Disulfind (Ceroni et al. 2006) in PredictProtein.
(f) leucine zippers, using the program 2Zip (Bornberg-Bauer et al. 1998; http://2zip.molgen.mpg.de/).
(g) signal peptides and subcellular localization signals, using the SignalP 3.0 and TargetP 1.1 programs (Emanuelsson et al. 2007; http://www.cbs.dtu.dk/services/).
(h) post-translational modifications, using ScanProsite (de Castro et al. 2006).

3.2.9. The final step in data analysis was to screen novel transcript sequences against non-coding RNA gene datasets to filter out potential non-coding sequences. This involved BLAST searches against RNAdb (a mammalian non-coding RNA database; Pang et al. 2005) and queries of Rfam, a database of non-coding RNA family alignments (Griffiths-Jones et al. 2005).

3.3 Results and Discussion
To strengthen the evidence that these novel tags and corresponding transcripts were bona fide products of hESC transcriptional activity and not technical artifacts of the experimental process, transcript sequences were studied in detail with respect to gene structure, protein-coding potential, and gene regulatory features. This comprehensive manual evaluation included features generally used in eukaryotic gene finding (Stormo 2000), such as splice donor and acceptor sites, start and stop codons, TATA boxes, poly(A) addition signals, ORF annotation, codon usage bias, predicted protein features, upstream genomic regulatory potential, conservation, and sequence similarity. Seven key observations follow. One, novel predicted ORFs ranged in size from 22 to 129 amino acids (mean 66 aa). Two, there were no hits to well-described InterPro protein domains; the three observed hits were to two uncharacterized domains and one domain of unclear function (Table 2). Three, the most frequent independent support for expression at many of these novel transcribed regions came from an EST dataset (Brandenberger et al. 2004b) profiling both undifferentiated and differentiated human ESC samples. Four, GC content ranged from 26 to 54 %, and genomic repeat elements (LINEs, SINEs, LTRs) comprised 0 to 88 % of transcript sequence. Five, there was no overlap with known pseudogenes.
Six, splicing was evident in six of twelve candidates, with exon counts ranging from two to eight, and canonical splicing signals (GT/AG) were present in three candidates. Seven, the mean difference in codon usage frequency ranged from 18 to 26 % for selected ORFs.

The preceding observations suggest that novel predicted ORFs larger than 100-150 aa are unlikely to be found in human ES cells with current cDNA databases and transcript sampling protocols. Deeper sampling for rare transcripts using next-generation sequencing technologies, and assaying highly purified stem cell populations, may reveal longer novel proteins. Current cDNA and protein databases also contain many sparsely described or predicted proteins that require better characterization; two of the three InterPro hits in this study belonged to the 'uncharacterized' category (Table 2). Genomic repeat content varied widely among novel transcripts, and some transcripts were repeat-rich. For example, 56 % of the 609 bp transcript of candidate 3373 consisted of an LTR element. Such transcripts may prove interesting in studies of genomic repeat sequences and gene evolution. Splice sites were observed in half the transcripts studied; these represent genes that are difficult to predict de novo from genomic DNA, underscoring the utility of cDNA sampling from rare cell populations for novel gene discovery. The unspliced transcripts may be good candidates for novel single-exon genes, which are generally thought to be under-represented in multicellular eukaryotic gene databases. The mean difference in codon usage frequency in predicted ORFs was within typical values for known genes; for example, human POU5F1 and NANOG values were 11 % and 24 %, respectively, using the same analysis method.

Table 2.
InterPro domain hits to novel predicted ORFs
Transcript | ORF size | InterPro entry | Description
3210 | 66 aa | PTHR12138 | uncharacterized
3210 | 111 aa | PTHR16213 | uncharacterized
3373 | 72 aa | PS01186 | EGF-like

As noted above, a detailed analysis of twelve novel hESC transcripts did not yield transcripts encoding large ORFs or well-described protein domains. This is perhaps unsurprising when searching for novel protein-coding sequences in a species whose genome and transcriptome datasets have been subjected to exhaustive gene prediction. My focus therefore shifted to the evaluation of novel short proteins. I was, however, aware that hypothetical short ORFs can result from faulty annotation (Linial 2003) and thus require expert manual review. The scope of transcript and gene features that I reviewed exceeded that of the automated prediction algorithms used in projects annotating large cDNA collections. For example, one recent international collaborative annotation project (Yamasaki et al. 2006) used three features (ORF prediction, similarity search, and protein motif prediction). From the set of twelve candidates, four (3012, 3210, 3372, 3373) were shortlisted as most likely representing bona fide protein-coding transcripts. This selection was based on the presence of canonical ORF and splicing signals, EST support, Ka/Ks and codon usage scores, upstream genomic regulatory sequences, predicted protein structures, and hits to InterPro protein domain entries. Structures for these candidates are presented in Figure 4 (for 3012) and in Appendices 7, 8, and 9 (for 3210, 3372, and 3373, respectively), showing gene and transcript structure, predicted protein features, and gene regulatory features. As an example, candidate transcript 3012 is described below.
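ORF prediction, the first of the three features used in the annotation project cited above and the starting point for this shortlist, can be illustrated with a minimal sketch. NCBI ORF Finder was the tool actually used; the hypothetical `find_orfs` function below is only a simplified stand-in that scans the three forward-strand frames for ATG-to-stop stretches and, unlike ORF Finder, ignores the reverse strand and alternative start codons. The `min_aa` cutoff is an illustrative parameter, not a value from this study.

```python
# Simplified forward-strand ORF scan (hypothetical helper, not NCBI ORF Finder).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_aa: int = 20) -> list[tuple[int, int, int]]:
    """Return (start, end, aa_length) tuples for forward-strand ORFs.

    start/end are 1-based, inclusive nucleotide positions; end includes the
    stop codon. aa_length counts codons from ATG up to, but not including,
    the stop codon. Only the first ATG of each open stretch is used.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    orfs = []
    for frame in range(3):
        start = None
        # i + 3 <= len(seq) guarantees a full codon at every step
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                aa_len = (i - start) // 3
                if aa_len >= min_aa:
                    orfs.append((start + 1, i + 3, aa_len))
                start = None  # look for the next ATG after this stop
    return sorted(orfs)
```

With this coordinate convention, the 65 aa ORF of candidate 3012 (AUG at position 42) would be reported as (42, 239, 65).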
3.3.1 Candidate transcript 3012
This transcript had a meta-library frequency of 3 LongSAGE tags from 2 libraries (WA01 and WA09), including one tag from a deeply sampled H9 hESC library (468,000 tags). Two exons (163 and 1606 bp) were separated by a 62 kb intron, with the transcript in the sense orientation (Figure 4b). The UCSC Genome Browser view (Figure 4d) shows data tracks of known genes (ZNF281, 1.5 kbp upstream), pseudogenes (none), predicted exons (one upstream), RNA secondary structure predictions (mostly intronic), human and non-human ESTs (including a match to independent hESC EST data), predicted transcription factor binding sites (TFBS, mostly upstream in a block of conservation), cross-species conservation, and repeat elements (one LTR comprising 9 % of the transcript sequence at the 5' end). A CpG island and DNase I hypersensitive sites lay 1 to 2 kb upstream, indicative of potential gene regulatory sequences. The 3012 transcript (Figure 4a) was 1769 bp long, with a predicted 65 amino acid ORF, a 41 bp 5' UTR, and a 1530 bp 3' UTR (sequences in Appendix 6). The canonical AUG initiation codon at position 42 was the 5'-most AUG and the only one predicted to lie in a strong sequence context. Canonical GT/AG splice sites were also present at the termini of the predicted 62 kb intron. The 65 aa ORF had a higher GC content (49 %) than the complete transcript (40 %), and a codon usage frequency mean difference score of 22 %. There was empirical ChIP-chip evidence from hESCs (Boyer et al. 2005) for upstream NANOG and SOX2 transcription factor binding sites. The upstream promoter region had TATA and CAAT boxes, and there was a canonical polyadenylation signal 25 bp upstream of the 3' poly(A) tail. Furthermore, these gene sequence signals were found within canonical distances (Mathews and van Holde 1990) of the predicted gene or transcript.
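The coordinates reported for 3012 are internally consistent, which a small worked example can confirm. The hypothetical helper below derives the ORF start and end positions, the total transcript length, and the split of the ORF across the splice junction from the 5' UTR length, the ORF length in amino acids, the 3' UTR length, and the exon 1 length. It assumes a single stop codon and an ORF that spans the exon 1/exon 2 junction.

```python
def transcript_layout(utr5_len: int, orf_aa: int, utr3_len: int, exon1_len: int) -> dict[str, int]:
    """Derive 1-based ORF coordinates and exon split for a two-exon transcript."""
    orf_nt = 3 * (orf_aa + 1)          # coding codons plus the stop codon
    start = utr5_len + 1               # first base of the AUG
    end = utr5_len + orf_nt            # last base of the stop codon
    orf_in_exon1 = exon1_len - utr5_len
    if not 0 < orf_in_exon1 < orf_nt:
        raise ValueError("ORF does not span the exon 1/exon 2 junction")
    return {
        "start": start,
        "end": end,
        "length": end + utr3_len,
        "orf_exon1": orf_in_exon1,
        "orf_exon2": orf_nt - orf_in_exon1,
    }


layout = transcript_layout(utr5_len=41, orf_aa=65, utr3_len=1530, exon1_len=163)
print(layout)
```

With the 3012 values this gives start 42, end 239, a 1769 bp transcript, and a 122/76 bp ORF split across the junction, matching Figure 4 and the 163 + 1606 bp exon sizes.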
ORF translation and protein structure predictions for 3012 suggested a 7.6 kDa single-transmembrane protein (Figure 4c) with an N-terminal signal anchor or peptide region and four phosphorylation sites. There were 13 short motifs in total in the protein-coding region, including an N-arginine dibasic (NRD) convertase cleavage site (NRD convertase is an endopeptidase that processes secreted proteins). In addition, regulatory motifs were predicted in the 5' and 3' UTRs, including four miRNA target sites. One of the corresponding miRNAs (hsa-miR-128a) was found to be expressed in human HL-60 leukemia cells. Another (hsa-miR-197) was cloned from the Saos-2 human epithelial-like osteosarcoma cell line. A third (hsa-miR-331) was a predicted homologue of a miRNA cloned from rat neuronal tissue and was later found to be seven-fold down-regulated in human CNS tumor cell lines compared to normal tissue (Gaur et al. 2007). These predicted miRNA target sites may be useful in functional studies using RNA interference-mediated transcript knockdown.

Figure 4. Candidate 3012: (a) transcript, (b) gene, (c) predicted protein, (d) Genome Browser view.
(a) 3012 transcript: CAAT (-73) and TATA (-18) boxes upstream; AUG at position 42; 65 aa ORF ending with the stop codon at 239; 1530 bp 3' UTR; polyadenylation signal at 1745; tag at 1693; transcript end at 1769. 5' UTR: 6 motifs; 3' UTR: 64 motifs (including 4 miRNA).
(b) 3012 gene: Exon 1, 163 bp; intron, 62 kb (GT/AG); Exon 2, 1606 bp. The 65 aa ORF spans 122 bp at its 5' end (exon 1) and 76 bp at its 3' end (exon 2).
Summary transcript-gene information
• 3 tags in 2 LongSAGE libraries (WA01, WA09)
• 64 kb predicted transcription unit
• GT/AG intron splice signals
• 40 % GC (ORF = 49 %)
• 22 % codon frequency mean difference
• Cytoplasmic polyadenylation element, K-Box, and Gamma Interferon Activated Inhibitor element in 3' UTR
• 100-fold less abundant in reference adult RNA versus hESCs
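The single-transmembrane prediction for 3012 came from PHD/PROF, TopPred, and PSIPRED. As a rough illustration of the kind of calculation involved, the sketch below computes the ProtParam-style GRAVY score (mean Kyte-Doolittle hydropathy) and flags hydrophobic windows using the commonly cited Kyte-Doolittle heuristic of a 19-residue window averaging above 1.6. This is a crude stand-in, not the algorithm of any of the tools listed, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy scale (Kyte and Doolittle 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def gravy(protein: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    protein = protein.upper()
    return sum(KD[aa] for aa in protein) / len(protein)


def hydrophobic_windows(protein: str, window: int = 19, threshold: float = 1.6) -> list[int]:
    """1-based start positions of windows whose mean hydropathy exceeds threshold.

    A run of consecutive hits suggests a candidate membrane-spanning segment.
    """
    return [
        i + 1
        for i in range(len(protein) - window + 1)
        if gravy(protein[i:i + window]) > threshold
    ]
```

For a toy sequence of 19 leucines flanked by 10 lysines on each side, windows starting at positions 6 through 16 pass the threshold, marking the hydrophobic core as one candidate segment.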
Detailed analyses were similarly performed using cDNA sequences (Supplementary Table 1) from the other rescued novel transcripts. Five points of particular note were:

(i) Two candidates (3114 and 3630) were discredited due to evidence that they were artifactual transcripts arising from oligo(dT) mispriming of genomic DNA at a tract of adenines immediately downstream of the transcript sequence when aligned against the human genome (Table 3). This observation highlights the need to acquire full-length novel cDNA sequences in order to exclude such artifacts, which, based on this study, may account for one in six putative transcripts.

Table 3. Transcripts that may have resulted from oligo(dT) mispriming
Transcript | Number of adenines | Context | Location
3114 | 13 | Genomic | Immediately 3' of aligned transcript
3630 | 21 | Genomic | Immediately 3' of aligned transcript

(ii) Regions of segmental duplication were observed for candidate transcripts 3149 and 3630 (Table 4). Like the novel transcripts containing genomic repeat sequences, these segmentally duplicated regions may be interesting candidates for studying gene evolution.

Table 4.
Transcripts in a region of segmental duplication
Transcript | Chromosome (alignment score) | Size | Overlapping sequences
3149 | 7q (100 %); 1 duplication in 14 (96.1 %) | 4 kb | Intron of predicted TF and hypothetical thymus protein
3630 | 15q (99.8 %); 27 duplications in Y & 15 | 1.3 to 83 kb | Segment of CSPG4

(iii) Ka/Ks data yielded only two high-scoring segments, in transcripts 3149 and 3372. One of these segments overlapped a 101 aa ORF in 3372. Thus, this popular approach for recognizing potential protein-coding sequences by scoring regions of purifying selection provided few clues in this dataset of novel transcripts.

(iv) The 72 aa ORF of candidate 3373 (Appendix 9c) contained five cysteine residues with two predicted disulfide bridges (Table 5). This is an interesting example, with disulfide bridge complexity reminiscent of small disulfide-bonded proteins (SDPs), a class that includes hormones, toxins, and inhibitors, all characterized by small size and complex disulfide connectivity (Kong et al. 2004). This protein also contained a cleavage site for a PCSK, a member of the family of subtilisin-like proprotein convertases. These enzymes are expressed extensively in mammalian neural and endocrine cells and play a key role in the proteolytic processing of neuropeptide and peptide hormone precursors.

Table 5. Novel ORFs with predicted disulfide bridges
Transcript | ORF length | Number of cysteines | Cysteine positions
3372 | 101 aa | 2 | C23, C89
3373 | 72 aa | 5 | C17, C19, C30, C53, C72

(v) Sequence similarity searches, the mainstay of many gene finding approaches (Stormo 2000), did not contribute substantially to strengthening the evidence for the candidate transcripts in this study. As noted earlier, this outcome is not unexpected when querying short sequences. It was a key reason why large cDNA annotation projects used a size cutoff of 100 aa for gene finding (Frith et al. 2006): to avoid the weak statistical confidence of similarity hits to short sequences.
Nevertheless, I performed comprehensive sequence similarity searches using both the novel transcript sequences and their predicted ORFs. One interesting result was the predicted 111 aa ORF of transcript 3210 (Table 6 and Figure 5 below). Searching the large UniProt Knowledgebase of proteins (Bairoch et al. 2005) returned a collection of high-scoring hits, albeit only across a narrow phyletic range of primate sequences. The sequences mapped to distinct genomic loci. Thus, this set of predicted protein sequences dispersed over several chromosomes may represent an expansion in primate genomes. Interestingly, variation among these sequences occurred primarily at the N-terminus, suggestive of variant signal sequences, which have been found to be a common feature contributing to the diversity of protein output in the mouse transcriptome (Davis et al. 2006). This 111 aa ORF also had a high (96 %) amino acid identity hit to a patented full-length human cDNA sequence (NCBI accession ABP27450), the subject of US patent number 7193069 (Isogai et al. 2007).
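Two quantities in this paragraph, the percent identity between an ORF and a database hit and the positions along an alignment where a family of sequences varies, can be computed from pre-aligned sequences as sketched below. The definitions are simple illustrative ones (BLAST reports identity over its own local alignment), the function names are hypothetical, and the toy sequences are invented; a real analysis would first build the alignment with BLAST or a multiple aligner.

```python
from collections import Counter


def column_identity(aligned: list[str]) -> list[float]:
    """Per-column fraction of sequences sharing the most common character.

    Sequences must already be aligned to equal length; gaps ('-') are
    counted like any other character.
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be pre-aligned to equal length")
    return [Counter(col).most_common(1)[0][1] / len(col) for col in zip(*aligned)]


def percent_identity(a: str, b: str) -> float:
    """Identity (%) over aligned columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)
```

In a family like the 3210 hits, low-scoring columns clustered near the start of the alignment would correspond to the N-terminal variation described above.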
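Table 5 lists cysteine positions, and point (iv) notes the complexity of disulfide connectivity in small disulfide-bonded proteins. Both can be made concrete with a short sketch (hypothetical function names): listing 1-based cysteine positions in an ORF, and counting the distinct ways k disulfide bridges could pair up n cysteines, n! / ((n - 2k)! 2^k k!). For the five cysteines and two predicted bridges of 3373 this gives 15 possible connectivities. The example protein string is invented.

```python
from math import factorial


def cysteine_positions(protein: str) -> list[int]:
    """1-based positions of cysteine residues."""
    return [i + 1 for i, aa in enumerate(protein.upper()) if aa == "C"]


def n_connectivities(n_cys: int, n_bridges: int) -> int:
    """Ways to choose n_bridges disjoint cysteine pairs from n_cys cysteines."""
    if n_bridges < 0 or 2 * n_bridges > n_cys:
        return 0
    return factorial(n_cys) // (
        factorial(n_cys - 2 * n_bridges) * 2 ** n_bridges * factorial(n_bridges)
    )


print(cysteine_positions("MCACA"))  # [2, 4]
print(n_connectivities(5, 2))       # 3373: five cysteines, two bridges -> 15
print(n_connectivities(2, 1))       # 3372: two cysteines, one bridge -> 1
```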
4. RELATIVE QUANTITATION OF TRANSCRIPT ABUNDANCE
4.1 Introduction
As a first attempt to functionally characterize these novel transcripts, I evaluated transcript abundance in RNA samples from undifferentiated hESCs and differentiated cell types. This was done with real-time RT-qPCR, using relative quantitation assays with SYBR Green dye and transcript-specific primers. This nucleic acid quantitation approach offers verifiable specificity and high sensitivity, requires little input template, scales readily, and yields replicate data suitable for statistical analysis (Bustin 2002).
4.2 Methods
RNA samples used in this part of my study included qPCR human universal reference total RNA (Clontech, Mountain View, CA), undifferentiated H9 hESC total RNA, and total RNA samples from three time points in a retinoic acid-induced (RA-induced) differentiation protocol (Day 0, undifferentiated; Days 2 and 4, differentiated; Appendix 2). Cell samples were kindly provided by Dr. Michael O'Connor from the laboratory of Dr. Connie Eaves (Terry Fox Laboratory, BCCA). Total RNA was extracted using TRIzol reagent (Invitrogen, Burlington, ON), and quality was verified on a 2100 Bioanalyzer with RNA 6000 NanoChips (Agilent Technologies, Palo Alto, CA). RNA integrity numbers were 9.7 or higher (Appendix 3). Concentrations were verified using a NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies, Wilmington, DE), and equal masses of each RNA sample were used for DNase I treatment with DNA-free reagent (Ambion, Austin, TX). A two-step RT-PCR approach was used, in which equal amounts of DNase I-treated total RNA were first reverse transcribed using random hexamers and MultiScribe Reverse Transcriptase according to the manufacturer's protocol (Applied Biosystems, Foster City, CA). To prepare a dilution series of first-strand cDNA products for generating relative standard curves, I assumed a 1:1 mass conversion in the reverse transcription reaction when labeling the diluted aliquots. Thus, a 100 µL reverse transcription reaction containing 2.0 µg of RNA was assumed to yield 2.0 µg of first-strand cDNA per 100 µL of reaction product. Power SYBR Green 2x dye-based qPCR detection was performed on a 7900HT SDS instrument with SDS v2.2 software (Applied Biosystems). Assays were performed in 20 µL reaction volumes, in triplicate.
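The cDNA and reaction quantities in these Methods can be checked with simple arithmetic. Under the stated 1:1 mass-conversion assumption, 2.0 µg in 100 µL gives a nominal 20 ng/µL of cDNA; the qPCR reactions described in this section (10 µL of 2x master mix, 5 µL of a 0.6 µM primer mix, and 5 µL carrying 10 ng of template, in 20 µL) then follow from the dilution relation C1V1 = C2V2. The helper names are hypothetical, and the 10-fold step size of the dilution series is an assumption; only the 10 pg to 100 ng span is given in the text.

```python
def cdna_ng_per_ul(rna_ug: float, reaction_ul: float) -> float:
    """Nominal first-strand cDNA concentration, assuming 1:1 RNA-to-cDNA mass."""
    return rna_ug * 1000.0 / reaction_ul


def final_concentration(stock: float, added_ul: float, total_ul: float) -> float:
    """Dilution relation C1 * V1 = C2 * V2, solved for C2 (same units as stock)."""
    return stock * added_ul / total_ul


def serial_dilution_ng(top_ng: float, fold: float = 10.0, points: int = 5) -> list[float]:
    """Nominal template masses of a serial dilution series, highest first."""
    return [top_ng / fold ** i for i in range(points)]


REACTION_UL = 20.0
stock = cdna_ng_per_ul(2.0, 100.0)                          # 20.0 ng/uL nominal cDNA
primer_nM = final_concentration(600.0, 5.0, REACTION_UL)    # 0.6 uM (600 nM) mix -> 150 nM
master_mix_x = final_concentration(2.0, 10.0, REACTION_UL)  # 2x mix -> 1x
template_ng_per_ul = 10.0 / 5.0                             # 10 ng delivered in 5 uL
dilution_factor = stock / template_ng_per_ul                # 10-fold dilution of the stock
series = serial_dilution_ng(100.0)                          # 100 ng down to 0.01 ng (10 pg)
print(stock, primer_nM, master_mix_x, dilution_factor, series)
```

The resulting 150 nM of each primer falls inside the 50 to 300 nM range tested during optimization.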
Optimizations were conducted using a primer dilution series ranging from 50 to 300 nM, a template dilution series spanning four orders of magnitude from 10 pg to 100 ng, dissociation curve analysis, and measurements of PCR efficiency and precision of replicates. I kept input template amounts for each qPCR reaction equivalent by ensuring equal amounts of total RNA in the DNase I treatments and reverse transcription reactions, identical dilutions of the first-strand cDNA products, and equal volumes of diluted first-strand cDNA template per qPCR reaction. One large preparation of H9 hESC first-strand cDNA was synthesized and diluted for use as a reference standard in all qPCR experiments conducted over the course of this study. qPCR reactions comprised 10 µL of 2x Power SYBR Green dye (Applied Biosystems), 5 µL of a primer pair mix containing 0.6 µM of each primer (IDT Technologies), and 5 µL (10 ng equivalent) of first-strand cDNA template. Thermal cycling was performed as follows: 95°C for 10 minutes; then 40 cycles of 95°C for 15 seconds followed by 60°C for 1 minute; and a final dissociation curve stage comprising 95°C for 15 seconds followed by a slow 60-to-95°C ramp (at a rate of 2 %).

4.2.1 Primer selection
Up to four primer pairs were selected for each novel transcript, endogenous control gene, and marker of undifferentiated or differentiated cellular states, using Primer3 software (Rozen and Skaletsky 2000) and published literature or primer databases (PrimerBank, http://pga.mgh.harvard.edu/primerbank/; qPrimerDepot, http://primerdepot.nci.nih.gov/; and RTPrimerDB, http://medgen.ugent.be/rtprimerdb/). For endogenous controls, I reviewed relevant hESC literature (Synnergren et al. 2007; Willems et al. 2006) to include genes considered better normalization controls in ESC-to-EB differentiation protocols.
Likewise, markers of the undifferentiated and differentiated cellular states were selected based on published literature, including findings from a comprehensive study that characterised 59 hESC lines from 17 laboratories worldwide (The International Stem Cell Initiative; Adewumi et al. 2007). Novel transcript-specific primers were picked after assessing each transcript for regions of unique sequence. This involved a preliminary BLAST search of each novel sequence against human genomic, transcript, and EST DNA sequence databases to flag regions that lacked highly similar hits to known transcribed sequences and did not match multiple sites in the human genome. After design, each primer pair was rechecked against the same specificity criteria. Other priorities in primer design included picking amplicons that spanned introns, avoiding pseudogenes and regions of repetitive genomic sequence, and avoiding sites of known sequence polymorphism. The final primer pairs used for each target in this study are listed in Table 7 below.

Table 7.
Primers for qPCR assays
Target | Forward primer | Reverse primer
Endogenous controls
GAPDH | CTCTGCTCCTCCTGTTCGAC | TTAAAAGCAGCCCTGGTGAC
ACTB | GCACAGAGCCTCGCCTT | GTTGTCGACGACGAGCG
NUBP1 | TGGCCTAGCAGAGGATGAAAA | GTCTTCCACGTACACTGGAGA
Novel transcripts
3005 | GACTCTTGTAGCCCGCTTTG | TCCCAGCACGTTGTAGGATA
3010 | ACTGCAAAAGGAAGCCAAAA | TAAGCACCTTGGGACTCCAG
3012 | AGACTTTCCCCCATCCATCT | AAACTCGCTCTGCCCTTTCT
3042 | AACATTTCACCAGGGTTGGA | TGGAGCTGGCAACTGACTC
3052 | TACCATTCCATGAGGTGCTG | CATGTACAAAGAGGGCACGA
3061 | CCCTGCGCTTTTTCTCTCT | TCACACAAGGCAGGAATGAA
3116 | GAAAGACGGTCTGGCTTCAG | CGTCAGAAATAAAAGTCGCATGA
3194 | TGTGTGGGCCAATGATACAG | TATCTGGGCCTTCACACACA
3210 | CAGGGGAGGCTAAGTACAGG | GGGCAACGAGAGCAAAACT
3210_F02 | CACCCGAGATTCCTGTCTTG | TGCAACATGAAGGGTTCAAA
3372 | CGGAAGAAGCAGCTCATACA | TGTCAAGGCCTTACCTCCAC
3373 | AAAAGCATGACCTTCTCTTTGTG | CCCCCAACTAGGATCAGTTCT
Markers of undifferentiated state
POU5F1 | CTTGCTGCAGAAGTGGGTGGAGGAA | CTGCAGTGTGGGTTTCGGGCA
NANOG | CAAAGGCAAACAACCCACTT | TCTGCTGGAGGCTGAGGTAT
LIN28 | GATCAAAAGGAGACAGGTGCTAC | CTTCAGCGGACATGAGGCTA
DNMT3B | CCAATCCTGGAGGCTATCCG | ACTGGGGTGTCAGAGCCAT
GDF3 | TGCTACGTAAAGGAGCTGGG | CAGGAGGAAGCTTGGGAAAT
Markers of differentiated state
AFP | ACTGAATCCAGAACACTGCATAG | GCTTCTTGAACAAACTGGGCAAA
T (BRACHYURY) | CCCTATGCTCATCGGAACAA | CAATTGTCATGGGATTGCAG
NEFH | CAGGACCTGCTCAATGTCAA | CAAAGCCAATCCGACACTCT
NEUROD1 | GCCCCAGGGTTATGAGACTA | CTCGCTGTACGATTTGGTCA
CDX2 | GGCAGCCAAGTGAAAACCAG | GGTGATGTAGCGACTGTAGTGAA

4.2.2 qPCR data analysis
Raw qPCR data were processed for relative quantitation using a relative standard curve method (Performing Relative Quantitation of Gene Expression Using Real-Time Quantitative PCR, http://www.appliedbiosystems.com/support/apptech/). This approach does not require the PCR efficiencies of target and endogenous control primers to be equivalent, and gives more accurate quantitative results, especially for fold changes in low-abundance transcripts. Triplicate data points were averaged and standard deviations calculated.
The same preparation of undifferentiated H9 hESC total RNA was used as the reference sample for generating a relative standard curve for every primer set on each qPCR reaction plate. The Day-0 (undifferentiated) hESC sample from the retinoic acid-induced differentiation experiment served as the calibrator sample and the basis for comparing changes in relative expression during the differentiation time course.

4.3 Results and Discussion
To evaluate gene regulation at the level of transcript abundance, I conducted real-time RT-qPCR relative quantitation assays using SYBR Green dye and transcript-specific primers. First, the assays were optimized using undifferentiated H9 hESC total RNA and qPCR universal reference human adult RNA to pick the best primer pair per target, based on dissociation curves, PCR efficiency, precision, and dynamic range. Next, to assess the functional significance of these uncharacterized transcripts in a biological context, I measured relative transcript abundance during the differentiation of human ES cells using an RA-induced differentiation protocol. A comparison was also conducted between human adult universal reference RNA and undifferentiated H9 hESC total RNA. I observed five trends of differential expression when comparing the abundance of novel transcripts in undifferentiated (Day-0 and reference H9 hESC) versus differentiated (Days 2 and 4, and human adult universal reference RNA) samples (example calculation in Appendix 10). These trends are summarized in Table 8 below; the most dramatic changes were a 500-fold lower abundance of two transcripts (3372, 3373) in human adult reference RNA and a 35-fold higher abundance of one transcript (3005) in human adult reference RNA. Interestingly, some novel transcripts differed in abundance even between the two types of differentiated sample (RA-induced Days 2 or 4 versus adult reference RNA).
One set (3012, 3052, 3372, 3373) had expression patterns suggestive of transcripts restricted to early cellular differentiation events, and the other (3061) may be relevant in differentiated adult cells. In contrast to the changing expression patterns seen for the majority of transcripts, two transcripts (3194, 3210_F02) had relatively stable levels of expression across all samples and so may be regulated by a different mechanism. 3210_F02 represents an isoform whose poly(A) signal was located at a canonical distance upstream of the poly(A) tail, in contrast to the two other isoforms of candidate transcript 3210. To address whether a single housekeeping gene is a reliable endogenous control in ESC differentiation experiments, a panel of three such genes (GAPDH, ACTB, NUBP1) was used in all relative quantitation assays. I observed similar performance across this panel, with the GAPDH assay yielding the most consistent measurements.
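The relative standard curve method described in 4.2.2 can be sketched as follows: fit Ct against log10 of input for the reference-standard dilutions of each primer set, interpolate sample quantities from their Ct values, normalize each target quantity by an endogenous control such as GAPDH, and divide by the same ratio in the Day-0 calibrator. This is a minimal illustration with hypothetical function names and synthetic Ct values, not the SDS software's implementation; it also reports the amplification efficiency implied by the slope, E = 10^(-1/slope) - 1.

```python
from statistics import mean


def fit_standard_curve(log10_input: list[float], ct: list[float]) -> tuple[float, float]:
    """Least-squares fit of Ct = slope * log10(input) + intercept."""
    mx, my = mean(log10_input), mean(ct)
    sxx = sum((x - mx) ** 2 for x in log10_input)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log10_input, ct))
    slope = sxy / sxx
    return slope, my - slope * mx


def quantity(ct: float, curve: tuple[float, float]) -> float:
    """Interpolate relative input quantity from a Ct using a fitted curve."""
    slope, intercept = curve
    return 10 ** ((ct - intercept) / slope)


def efficiency(slope: float) -> float:
    """Amplification efficiency implied by the slope (1.0 means 100 %)."""
    return 10 ** (-1 / slope) - 1


def relative_expression(sample_target_ct: float, sample_control_ct: float,
                        calib_target_ct: float, calib_control_ct: float,
                        target_curve: tuple[float, float],
                        control_curve: tuple[float, float]) -> float:
    """Target/control ratio in a sample, divided by the same ratio in the calibrator."""
    sample = quantity(sample_target_ct, target_curve) / quantity(sample_control_ct, control_curve)
    calib = quantity(calib_target_ct, target_curve) / quantity(calib_control_ct, control_curve)
    return sample / calib
```

Because each primer set gets its own curve, target and control efficiencies need not match, which is the property of the method noted in 4.2.2. With a 100 % efficient curve, a target detected one cycle earlier in a sample than in the calibrator (control unchanged) comes out at about 2-fold.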
5. CONCLUSION AND FURTHER WORK
As has been generally observed in the field of transcriptional profiling over the last several years, there can be abundant diversity in the transcriptional output of mammalian genomic loci. Although tested on a much smaller scale, this phenomenon was evident from the extent of sequence variation observed in the 3' RACE data from this study. Nine of twelve candidate transcripts showed alternative polyadenylation, and eleven exhibited sequence polymorphisms among transcript isoforms, ranging in size from single nucleotide variants to indels of over 100 bp. Functionally evaluating such transcriptional diversity in a cell-specific and developmental context remains a formidable challenge. When conducting detailed analyses of novel transcribed sequences, it can be difficult to score the evidence for bona fide transcripts. Consider, for example, codon usage scores. On the one hand, assigning cut-off values for codon usage scores may help exclude apparent transcripts with poor predicted translation efficiency. On the other hand, there may be bona fide novel ORFs with low to intermediate scores, since below-average codon translation efficiency may be one way for a cell to modulate translational output. Thus, novel protein-coding gene discovery in the heavily mined human genome and transcriptome datasets is a particularly difficult task, given that more readily discernible clues, such as deep sequence conservation or regions with canonical gene signals, have already been annotated.
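The codon usage scores discussed here, including the 18 to 26 % mean differences reported in Chapter 3, came from GCUA, whose exact formula is not given in this thesis. The sketch below uses an assumed, illustrative definition: for each amino acid, compute each codon's relative adaptiveness (its count divided by the count of the most-used synonymous codon) in the ORF and in a reference table, then average the absolute differences over codons of amino acids the ORF uses, as a percentage. The definition and function names are hypothetical, and scores from it should not be expected to reproduce the GCUA values.

```python
from collections import Counter

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Synonymous codon families, stop codons excluded.
FAMILIES: dict[str, list[str]] = {}
for codon, aa in CODE.items():
    if aa != "*":
        FAMILIES.setdefault(aa, []).append(codon)


def codon_counts(orf: str) -> Counter:
    """Count in-frame codons of an ORF given as DNA or RNA."""
    s = orf.upper().replace("U", "T")
    return Counter(s[i:i + 3] for i in range(0, len(s) - len(s) % 3, 3))


def relative_adaptiveness(counts) -> dict[str, float]:
    """w(codon) = count / count of the most-used synonymous codon.

    Amino acids with no observed codons are omitted.
    """
    w = {}
    for codons in FAMILIES.values():
        top = max(counts.get(c, 0) for c in codons)
        if top > 0:
            for c in codons:
                w[c] = counts.get(c, 0) / top
    return w


def mean_difference(orf: str, reference_counts) -> float:
    """Assumed mean absolute difference (%) in relative adaptiveness vs. a reference."""
    w_orf = relative_adaptiveness(codon_counts(orf))
    w_ref = relative_adaptiveness(reference_counts)
    shared = [c for c in w_orf if c in w_ref]
    if not shared:
        raise ValueError("no amino acids in common with the reference table")
    return 100.0 * sum(abs(w_orf[c] - w_ref[c]) for c in shared) / len(shared)
```

A cut-off applied to such a score would face exactly the trade-off described above: lowering it admits more artifacts, raising it may exclude genuine but weakly optimized ORFs.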
On assessing novel transcript abundance, I found much larger fold differences when comparing human adult universal reference RNA with undifferentiated hESCs than in an RA-induced differentiation time course. A broader interpretation of the former comparison was hampered by insufficient (proprietary) information about the types and proportions of adult tissues contributing to the reference RNA preparation. The product was marketed as a human adult reference RNA sample optimal for qPCR studies and purported to represent transcripts from a wide variety of adult tissues. Nevertheless, the reference RNA can be considered representative of many adult differentiated cell types and free of large contaminating pools of stem cells. By comparison, resolution in the RA-induction assay may suffer from heterogeneity in the sampled cell populations: hESC cultures are not composed entirely of undifferentiated cells, nor are differentiating cultures devoid of undifferentiated cells. Conducting qPCR assays on more homogeneous and well-defined cell samples, such as a neural cell line, would be a logical next step to resolve these issues.

Candidate transcripts 3372 and 3373 were particularly interesting in that both exhibited similar expression profiles (500-fold less abundant in adult cells) and had protein structure predictions that included cysteine disulfide bridges. In addition, both had proprotein cleavage sites: an NRD endopeptidase cleavage site in 3372 and a PCSK cleavage site in 3373 (of functional importance in mammalian neural and endocrine cells). Perhaps these novel proteins form cysteine bridges with one another or with other proteins to build higher-order functional structures. A protein-protein interaction study would be particularly informative in this regard.

Another compelling example of a novel transcript with altered expression upon differentiation was candidate 3042 (2- to 8-fold increase).
In earlier work (Schnerch 2005), the mouse genomic region orthologous to 3042 was found to give rise to an uncharacterized cDNA from bone marrow (RIKEN 9430067K14), and the corresponding tag was present in mouse ESC SAGE data. Conserved expression at orthologous genomic coordinates across species, and in the same cell type, is strong evidence for a bona fide transcript. This candidate would be well suited for further functional studies in a mouse model. Similarly, the mouse genomic regions orthologous to candidates 3052 and 3116 were found to be transcribed (although without evidence for a tag in mouse ESC SAGE data). In addition, a segment of the 97 aa ORF from 3052 scored a strong BLASTP hit to ECEL1, a member of peptidase family M13. Expression of 3052 was also supported by independent hESC EST data (Brandenberger et al. 2004b).

Thus, the collective evidence from successful cDNA rescue of transcripts corresponding to novel hESC tags, transcript and gene structure signals, gene regulatory features, predicted protein features, and empirical quantitation of transcript abundance suggests that many of these novel candidates are bona fide transcripts that may have roles in stem cell biology. Subsequent experiments using cloned, expression-ready novel ORFs in an in vitro transcription and translation assay would provide further evidence for these cryptic transcripts in human ES cells. Other possible functional assays include transcript knockdown using RNA interference protocols targeting the predicted miRNA sites found in the 3' UTRs of some of these novel transcripts. However, despite the utility of the experimental approach adopted in this study for novel gene discovery, there are many technical and biological limitations to consider when interpreting such findings.
These include: the impact of cellular heterogeneity in ESC cultures defined as 'undifferentiated'; mouse feeder cell contamination; false negatives for known and novel transcripts lacking an NlaIII tagging site; oligo(dT) mispriming in the LongSAGE and 3' RACE protocols; noise from sampling RNA molecules other than mature mRNAs (e.g. pre-mRNAs) present in total RNA preparations; truncated transcripts captured by the 5' RACE SMART chemistry; the inability to distinguish between transcripts generating the same tag; tags spanning splice junctions; transcribed pseudogenes; ambiguities in tag-to-transcript mapping due to RNA editing and fusion transcripts; conserved but non-coding transcripts; the scoring of non-canonical gene structure signals in putative transcripts; and the challenge of sampling transiently expressed transcripts. Nevertheless, the discovery and validation of novel transcribed sequences from cDNA tag-based assays is an important contributor to our understanding of the transcriptional output of mammalian genomes. The approach is gaining power with the much deeper sampling offered by next-generation sequencing technologies. The resulting knowledge resource should help refine protocols for the controlled genetic manipulation and evaluation of stem cells and their derivatives in research and therapeutic applications.

BIBLIOGRAPHY

Adewumi, O., B. Aflatoonian, L. Ahrlund-Richter, M. Amit, P.W. Andrews, G. Beighton, P.A. Bello, N. Benvenisty, L.S. Berry, S. Bevan et al. 2007. Characterization of human embryonic stem cell lines by the International Stem Cell Initiative. Nat Biotechnol 25: 803-816.
Altschul, S.F., W. Gish, W. Miller, E.W. Myers, and D.J. Lipman. 1990. Basic local alignment search tool. J Mol Biol 215: 403-410.
Altschul, S.F., T.L. Madden, A.A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D.J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389-3402.
Andrade, M.A., A. Daruvar, G. Casari, R. Schneider, M. Termier, and C. Sander. 1997. Characterization of new proteins found by analysis of short open reading frames from the full yeast genome. Yeast 13: 1363-1374.
Ashurst, J.L., C.K. Chen, J.G. Gilbert, K. Jekosch, S. Keenan, P. Meidl, S.M. Searle, J. Stalker, R. Storey, S. Trevanion et al. 2005. The Vertebrate Genome Annotation (Vega) database. Nucleic Acids Res 33: D459-465.
Baharvand, H., M. Hajheidari, S.K. Ashtiani, and G.H. Salekdeh. 2006. Proteomic signature of human embryonic stem cells. Proteomics 6: 3544-3549.
Bailey, J.A., A.M. Yavor, H.F. Massa, B.J. Trask, and E.E. Eichler. 2001. Segmental duplications: organization and impact within the current human genome project assembly. Genome Res 11: 1005-1017.
Bairoch, A., R. Apweiler, C.H. Wu, W.C. Barker, B. Boeckmann, S. Ferro, E. Gasteiger, H. Huang, R. Lopez, M. Magrane et al. 2005. The Universal Protein Resource (UniProt). Nucleic Acids Res 33: D154-159.
Boheler, K.R. and K.V. Tarasov. 2006. SAGE analysis to identify embryonic stem cell-predominant transcripts. Methods Mol Biol 329: 195-221.
Bornberg-Bauer, E., E. Rivals, and M. Vingron. 1998. Computational approaches to identify leucine zippers. Nucleic Acids Res 26: 2740-2746.
Boyer, L.A., T.I. Lee, M.F. Cole, S.E. Johnstone, S.S. Levine, J.P. Zucker, M.G. Guenther, R.M. Kumar, H.L. Murray, R.G. Jenner et al. 2005. Core transcriptional regulatory circuitry in human embryonic stem cells. Cell 122: 947-956.
Brandenberger, R., I. Khrebtukova, R.S. Thies, T. Miura, C. Jingli, R. Puri, T. Vasicek, J. Lebkowski, and M. Rao. 2004a. MPSS profiling of human embryonic stem cells. BMC Dev Biol 4: 10.
Brandenberger, R., H. Wei, S. Zhang, S. Lei, J. Murage, G.J. Fisk, Y. Li, C. Xu, R. Fang, K. Guegler et al. 2004b. Transcriptome characterization elucidates signaling networks that control human ES cell growth and differentiation. Nat Biotechnol 22: 707-716.
Brown, N.P., C. Leroy, and C. Sander. 1998. MView: A web compatible database search or multiple alignment viewer. Bioinformatics 14: 380-381.
Bryson, K., L.J. McGuffin, R.L. Marsden, J.J. Ward, J.S. Sodhi, and D.T. Jones. 2005. Protein structure prediction servers at University College London. Nucleic Acids Res 33: W36-38.
Bustin, S.A. 2002. Quantification of mRNA using real-time reverse transcription PCR (RT-PCR): trends and problems. J Mol Endocrinol 29: 23-39.
Carninci, P. 2006. Tagging mammalian transcription complexity. Trends Genet 22: 501-510.
Ceroni, A., A. Passerini, A. Vullo, and P. Frasconi. 2006. DISULFIND: a disulfide bonding state and cysteine connectivity prediction server. Nucleic Acids Res 34: W177-181.
Chen, J., S. Lee, G. Zhou, and S.M. Wang. 2002. High-throughput GLGI procedure for converting a large number of serial analysis of gene expression tag sequences into 3' complementary DNAs. Genes Chromosomes Cancer 33: 252-261.
Clark, F. and T.A. Thanaraj. 2002. Categorization and characterization of transcript-confirmed constitutively and alternatively spliced introns and exons from human. Hum Mol Genet 11: 451-464.
Claros, M.G. and G. von Heijne. 1994. TopPred II: an improved software for membrane protein structure predictions. Comput Appl Biosci 10: 685-686.
Crawford, G.E., I.E. Holt, J.C. Mullikin, D. Tai, R. Blakesley, G. Bouffard, A. Young, C. Masiello, E.D. Green, T.G. Wolfsberg et al. 2004. Identifying gene regulatory elements by genome-wide recovery of DNase hypersensitive sites. Proc Natl Acad Sci U S A 101: 992-997.
Davis, M.J., K.A. Hanson, F. Clark, J.L. Fink, F. Zhang, T. Kasukawa, C. Kai, J. Kawai, P. Carninci, Y. Hayashizaki et al. 2006. Differential use of signal peptides and membrane domains is a common occurrence in the protein output of transcriptional units. PLoS Genet 2: e46.
de Castro, E., C.J. Sigrist, A. Gattiker, V. Bulliard, P.S. Langendijk-Genevaux, E. Gasteiger, A. Bairoch, and N. Hulo. 2006. ScanProsite: detection of PROSITE signature matches and ProRule-associated functional and structural residues in proteins. Nucleic Acids Res 34: W362-365.
Draper, J.S., C. Pigott, J.A. Thomson, and P.W. Andrews. 2002. Surface antigens of human embryonic stem cells: changes upon differentiation in culture. J Anat 200: 249-258.
Emanuelsson, O., S. Brunak, G. von Heijne, and H. Nielsen. 2007. Locating proteins in the cell using TargetP, SignalP and related tools. Nat Protoc 2: 953-971.
Ewing, B. and P. Green. 1998. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res 8: 186-194.
Ewing, B., L. Hillier, M.C. Wendl, and P. Green. 1998. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res 8: 175-185.
Frith, M.C., A.R. Forrest, E. Nourbakhsh, K.C. Pang, C. Kai, J. Kawai, P. Carninci, Y. Hayashizaki, T.L. Bailey, and S.M. Grimmond. 2006. The abundance of short proteins in the mammalian proteome. PLoS Genet 2: e52.
Frohman, M.A., M.K. Dush, and G.R. Martin. 1988. Rapid production of full-length cDNAs from rare transcripts: amplification using a single gene-specific oligonucleotide primer. Proc Natl Acad Sci U S A 85: 8998-9002.
Fuhrmann, M., A. Hausherr, L. Ferbitz, T. Schodl, M. Heitzer, and P. Hegemann. 2004. Monitoring dynamic expression of nuclear genes in Chlamydomonas reinhardtii by using a synthetic luciferase reporter gene. Plant Mol Biol 55: 869-881.
Gardiner-Garden, M. and M. Frommer. 1987. CpG islands in vertebrate genomes. J Mol Biol 196: 261-282.
Gasteiger, E., A. Gattiker, C. Hoogland, I. Ivanyi, R.D. Appel, and A. Bairoch. 2003. ExPASy: The proteomics server for in-depth protein knowledge and analysis. Nucleic Acids Res 31: 3784-3788.
Gaur, A., D.A. Jewell, Y. Liang, D. Ridzon, J.H. Moore, C. Chen, V.R. Ambros, and M.A. Israel. 2007. Characterization of MicroRNA Expression Levels and Their Biological Correlates in Human Cancer Cell Lines. Cancer Res 67: 2456-2468.
Griffith, O.L., E.D. Pleasance, D.L. Fulton, M. Oveisi, M. Ester, A.S. Siddiqui, and S.J. Jones. 2005. Assessment and integration of publicly available SAGE, cDNA microarray, and oligonucleotide microarray expression data for global coexpression analyses. Genomics 86: 476-488.
Griffiths-Jones, S., S. Moxon, M. Marshall, A. Khanna, S.R. Eddy, and A. Bateman. 2005. Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res 33: D121-124.
Guhr, A., A. Kurtz, K. Friedgen, and P. Loser. 2006. Current State of Human Embryonic Stem Cell Research: An Overview of Cell Lines and their Usage in Experimental Work. Stem Cells doi:10.1634/stemcells.2006-0053.
Henderson, J.K., J.S. Draper, H.S. Baillie, S. Fishel, J.A. Thomson, H. Moore, and P.W. Andrews. 2002. Preimplantation human embryos and embryonic stem cells show comparable expression of stage-specific embryonic antigens. Stem Cells 20: 329-337.
Hirst, M., A. Delaney, S.A. Rogers, A. Schnerch, D.R. Persaud, M.D. O'Connor, T. Zeng, M. Moksa, K. Fichter, D. Mah et al. 2007. LongSAGE profiling of nine human embryonic stem cell lines. Genome Biol 8: R113.
Ikeda, R., M.S. Kurokawa, S. Chiba, H. Yoshikawa, M. Ide, M. Tadokoro, S. Nito, N. Nakatsuji, Y. Kondoh, K. Nagata et al. 2005. Transplantation of neural cells derived from retinoic acid-treated cynomolgus monkey embryonic stem cells successfully improved motor function of hemiplegic mice with experimental brain injury. Neurobiol Dis 20: 38-48.
Isogai, T., T. Sugiyama, T. Otsuki, A. Wakamatsu, H. Sato, S. Ishii, J. Yamamoto, Y. Isono, Y. Hio, K. Otsuka et al. 2007. Full-length cDNA, sequence 2465, NCBI accession ABP27450. US Patent 7193069.
Jurka, J. 2000. Repbase update: a database and an electronic journal of repetitive elements. Trends Genet 16: 418-420.
Karro, J.E., Y. Yan, D. Zheng, Z. Zhang, N. Carriero, P. Cayting, P. Harrison, and M. Gerstein. 2007. Pseudogene.org: a comprehensive database and comparison platform for pseudogene annotation. Nucleic Acids Res 35: D55-60.
Kent, W.J. 2002. BLAT--the BLAST-like alignment tool. Genome Res 12: 656-664.
Kim, Y.C., Y.C. Jung, Z. Xuan, H. Dong, M.Q. Zhang, and S.M. Wang. 2006. Pan-genome isolation of low abundance transcripts using SAGE tag. FEBS Lett 580: 6721-6729.
Kong, L., B.T. Lee, J.C. Tong, T.W. Tan, and S. Ranganathan. 2004. SDPMOD: an automated comparative modeling server for small disulfide-bonded proteins. Nucleic Acids Res 32: W356-359.
Kuhn, R.M., D. Karolchik, A.S. Zweig, H. Trumbower, D.J. Thomas, A. Thakkapallayil, C.W. Sugnet, M. Stanke, K.E. Smith, A. Siepel et al. 2007. The UCSC genome browser database: update 2007. Nucleic Acids Res 35: D668-673.
Linial, M. 2003. How incorrect annotations evolve--the case of short ORFs. Trends Biotechnol 21: 298-300.
Lupas, A. 1996. Prediction and analysis of coiled-coil structures. Methods Enzymol 266: 513-525.
Marchler-Bauer, A., J.B. Anderson, P.F. Cherukuri, C. DeWeese-Scott, L.Y. Geer, M. Gwadz, S. He, D.I. Hurwitz, J.D. Jackson, Z. Ke et al. 2005. CDD: a Conserved Domain Database for protein classification. Nucleic Acids Res 33: D192-196.
Mathews, C.K. and K.E. van Holde. 1990. Biochemistry. The Benjamin/Cummings Publishing Company, Redwood City, CA.
Matys, V., O.V. Kel-Margoulis, E. Fricke, I. Liebich, S. Land, A. Barre-Dirrie, I. Reuter, D. Chekmenev, M. Krull, K. Hornischer et al. 2006. TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucleic Acids Res 34: D108-110.
McGinnis, S. and T.L. Madden. 2004. BLAST: at the core of a powerful and diverse set of sequence analysis tools. Nucleic Acids Res 32: W20-25.
Mignone, F., G. Grillo, F. Licciulli, M. Iacono, S. Liuni, P.J. Kersey, J. Duarte, C. Saccone, and G. Pesole. 2005. UTRdb and UTRsite: a collection of sequences and regulatory motifs of the untranslated regions of eukaryotic mRNAs. Nucleic Acids Res 33: D141-146.
Miura, T., Y. Luo, I. Khrebtukova, R. Brandenberger, D. Zhou, R.S. Thies, T. Vasicek, H. Young, J. Lebkowski, M.K. Carpenter et al. 2004. Monitoring early differentiation events in human embryonic stem cells by massively parallel signature sequencing and expressed sequence tag scan. Stem Cells Dev 13: 694-715.
Mulder, N.J., R. Apweiler, T.K. Attwood, A. Bairoch, A. Bateman, D. Binns, P. Bork, V. Buillard, L. Cerutti, R. Copley et al. 2007. New developments in the InterPro database. Nucleic Acids Res 35: D224-228.
Nekrutenko, A., K.D. Makova, and W.H. Li. 2002. The K(A)/K(S) ratio test for assessing the protein-coding potential of genomic regions: an empirical and simulation study. Genome Res 12: 198-202.
Ng, P., J.J. Tan, H.S. Ooi, Y.L. Lee, K.P. Chiu, M.J. Fullwood, K.G. Srinivasan, C. Perbost, L. Du, W.K. Sung et al. 2006. Multiplex sequencing of paired-end ditags (MS-PET): a strategy for the ultra-high-throughput analysis of transcriptomes and genomes. Nucleic Acids Res 34: e84.
Nielsen, K.L., A.L. Hogh, and J. Emmersen. 2006. DeepSAGE--digital transcriptomics with high sensitivity, simple experimental protocol and multiplexing of samples. Nucleic Acids Res 34: e133.
Oudes, A.J., J.C. Roach, L.S. Walashek, L.J. Eichner, L.D. True, R.L. Vessella, and A.Y. Liu. 2005. Application of Affymetrix array and Massively Parallel Signature Sequencing for identification of genes involved in prostate cancer progression. BMC Cancer 5: 86.
Oyama, M., C. Itagaki, H. Hata, Y. Suzuki, T. Izumi, T. Natsume, T. Isobe, and S. Sugano. 2004. Analysis of small human proteins reveals the translation of upstream open reading frames of mRNAs. Genome Res 14: 2048-2052.
Pang, K.C., S. Stephen, P.G. Engstrom, K. Tajul-Arifin, W. Chen, C. Wahlestedt, B. Lenhard, Y. Hayashizaki, and J.S. Mattick. 2005. RNAdb--a comprehensive mammalian noncoding RNA database. Nucleic Acids Res 33: D125-130.
Puntervoll, P., R. Linding, C. Gemund, S. Chabanis-Davidson, M. Mattingsdal, S. Cameron, D.M. Martin, G. Ausiello, B. Brannetti, A. Costantini et al. 2003. ELM server: A new resource for investigating short functional sites in modular eukaryotic proteins. Nucleic Acids Res 31: 3625-3630.
Richards, M., S.P. Tan, W.K. Chan, and A. Bongso. 2006. Reverse serial analysis of gene expression (SAGE) characterization of orphan SAGE tags from human embryonic stem cells identifies the presence of novel transcripts and antisense transcription of key pluripotency genes. Stem Cells 24: 1162-1173.
Rost, B., G. Yachdav, and J. Liu. 2004. The PredictProtein server. Nucleic Acids Res 32: W321-326.
Rozen, S. and H. Skaletsky. 2000. Primer3 on the WWW for general users and for biologist programmers. Methods Mol Biol 132: 365-386.
Saha, S., A.B. Sparks, C. Rago, V. Akmaev, C.J. Wang, B. Vogelstein, K.W. Kinzler, and V.E. Velculescu. 2002. Using the transcriptome to annotate the genome. Nat Biotechnol 20: 508-512.
Schnerch, A. 2005. Analysis of undifferentiated human embryonic stem cell lines using serial analysis of gene expression. M.Sc. Thesis. The University of British Columbia.
Siddiqui, A.S., A.D. Delaney, A. Schnerch, O.L. Griffith, S.J. Jones, and M.A. Marra. 2006. Sequence biases in large scale gene expression profiling data. Nucleic Acids Res 34: e83.
Siddiqui, A.S., J. Khattra, A.D. Delaney, Y. Zhao, C. Astell, J. Asano, R. Babakaiff, S. Barber, J. Beland, S. Bohacec et al. 2005. A mouse atlas of gene expression: large-scale digital gene-expression profiles from precisely defined developing C57BL/6J mouse tissues and cells. Proc Natl Acad Sci U S A 102: 18485-18490.
Siepel, A., G. Bejerano, J.S. Pedersen, A.S. Hinrichs, M. Hou, K. Rosenbloom, H. Clawson, J. Spieth, L.W. Hillier, S. Richards et al. 2005. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res 15: 1034-1050.
Solter, D. 2006. From teratocarcinomas to embryonic stem cells and beyond: a history of embryonic stem cell research. Nat Rev Genet 7: 319-327.
Stormo, G.D. 2000. Gene-finding approaches for eukaryotes. Genome Res 10: 394-397.
Suzuki, Y. and S. Sugano. 2006. Transcriptome analyses of human genes and applications for proteome analyses. Curr Protein Pept Sci 7: 147-163.
Synnergren, J., T.L. Giesler, S. Adak, R. Tandon, K. Noaksson, A. Lindahl, P. Nilsson, D. Nelson, B. Olsson, M.C. Englund et al. 2007. Differentiating human embryonic stem cells express a unique housekeeping gene signature. Stem Cells 25: 473-480.
Thomson, J.A., J. Itskovitz-Eldor, S.S. Shapiro, M.A. Waknitz, J.J. Swiergiel, V.S. Marshall, and J.M. Jones. 1998. Embryonic stem cell lines derived from human blastocysts. Science 282: 1145-1147.
Ungrin, M., M.D. O'Connor, C. Eaves et al. 2007. Phenotypic analysis of human embryonic stem cells. In: Current Protocols in Stem Cells (in press).
Unwin, R.D., S.J. Gaskell, C.A. Evans, and A.D. Whetton. 2003. The potential for proteomic definition of stem cell populations. Exp Hematol 31: 1147-1159.
Van Hoof, D., R. Passier, D. Ward-Van Oostwaard, M.W. Pinkse, A.J. Heck, C.L. Mummery, and J. Krijgsveld. 2006. A quest for human and mouse embryonic stem cell-specific proteins. Mol Cell Proteomics 5: 1261-1273.
van Ruissen, F., J.M. Ruijter, G.J. Schaaf, L. Asgharnegad, D.A. Zwijnenburg, M. Kool, and F. Baas. 2005. Evaluation of the similarity of gene expression data estimated with SAGE and Affymetrix GeneChips. BMC Genomics 6: 91.
Velculescu, V.E., L. Zhang, B. Vogelstein, and K.W. Kinzler. 1995. Serial analysis of gene expression. Science 270: 484-487.
Vogel, G. 2003. Stem cells. 'Stemness' genes still elusive. Science 302: 371.
Wang, J., S. Rao, J. Chu, X. Shen, D.N. Levasseur, T.W. Theunissen, and S.H. Orkin. 2006. A protein interaction network for pluripotency of embryonic stem cells. Nature 444: 364-368.
Willems, E., I. Mateizel, C. Kemp, G. Cauffman, K. Sermon, and L. Leyns. 2006. Selection of reference genes in mouse embryos and in differentiating human and mouse ES cells. Int J Dev Biol 50: 627-635.
Xu, C., M.S. Inokuma, J. Denham, K. Golds, P. Kundu, J.D. Gold, and M.K. Carpenter. 2001. Feeder-free growth of undifferentiated human embryonic stem cells. Nat Biotechnol 19: 971-974.
Yamasaki, C., H. Kawashima, F. Todokoro, Y. Imamizu, M. Ogawa, M. Tanino, T. Itoh, T. Gojobori, and T. Imanishi. 2006. TACT: Transcriptome Auto-annotation Conducting Tool of H-InvDB. Nucleic Acids Res 34: W345-349.
Appendix 2. hESC culture and differentiation

Contributions
Cell culture, differentiation experiments, and cell-based assays were done by Dr. Michael O'Connor in the laboratory of Dr. Connie Eaves (Terry Fox Laboratory, BC Cancer Research Centre), who kindly provided samples and protocol notes. I conducted RT-qPCR assays for markers of undifferentiated and differentiated cellular states.

(a) Cells
H9 and H1 cells were purchased from WiCell (Madison, WI, USA). Mouse embryo fibroblasts (MEFs) were a generous gift from StemCell Technologies, Inc. (Vancouver, BC, Canada). Approval for use of these cells as described was obtained from the Canadian Stem Cell Oversight Committee. Unless otherwise stated, all reagents were obtained from StemCell Technologies.

(b) Maintenance hESC cultures
Undifferentiated hESCs were cultured in maintenance medium consisting of DMEM/F12 supplemented with 20% Knockout Serum Replacement (Invitrogen, Carlsbad, CA), 0.1 mM beta-mercaptoethanol (Sigma, St. Louis, MO), 0.1 mM non-essential amino acids, 1 mM glutamine and 4 ng/mL human FGF2. Cells were grown either in contact with mitotically inactivated MEFs or in MEF-conditioned maintenance medium on Matrigel-coated dishes (BD Biosciences, San Jose, CA), and were passaged enzymatically with 1 mg/mL collagenase (Invitrogen) every 7 days as previously described (Thomson et al. 1998; Xu et al. 2001).

(c) hESC differentiation cultures
For all-trans retinoic acid (RA)-induced differentiation, the medium on adherent hESCs in 5- to 7-day-old maintenance cultures was replaced with fresh maintenance medium lacking FGF2 and supplemented with 10⁻⁵ M RA (Draper et al. 2002; Henderson et al. 2002). The cells were then cultured for up to 5 days with daily medium changes. RNA samples were collected in TRIzol (Invitrogen) from the initial undifferentiated hESCs and then from the differentiating cells each day for up to 5 days after initiation of RA differentiation. All undifferentiated and RA-differentiated samples were analysed for pluripotency-related antigens by flow cytometry and by colony-forming cell assays.

(d) Flow cytometry
Antigen expression was analysed by flow cytometry as described (Ungrin et al., in press) using the following unconjugated antibodies: anti-stage-specific embryonic antigen-3 (SSEA3), anti-SSEA4, and anti-SSEA1, all from the Developmental Studies Hybridoma Bank (Iowa City, IA); TRA-1-60 and TRA-1-81 (Abcam, Cambridge, MA); and anti-Oct3/4 (BD Biosciences). The respective secondary antibodies were: APC-conjugated anti-IgM (Jackson ImmunoResearch, West Grove, PA), FITC-conjugated anti-IgG3, and FITC-conjugated anti-Ig (BD Biosciences); FITC-conjugated anti-IgM (BD Biosciences); and FITC-conjugated anti-IgG1 (BD Biosciences). Cells were analysed using a FACSCalibur (BD) and FlowJo software (Tree Star, Ashland, OR), with gates for positive cells set to exclude >99% of the events detected when the same cells were stained only with the appropriate fluorochrome-labelled secondary antibody.

(e) Colony assays
For primary colony-forming cell (CFC) assays, cultures were incubated with 0.05% trypsin or TrypLE reagent (Invitrogen) for 10 minutes at 37°C. Cell suspensions were filtered through a 40 µm cell strainer, and viable cells were counted in a hemacytometer after staining with 0.1% nigrosine (Sigma). Cells were plated either onto MEFs in maintenance medium containing 4 ng/mL FGF2 or onto Matrigel-coated plates with MEF-conditioned maintenance medium containing 4 ng/mL FGF2, and were then cultured for 7 days.
For alkaline phosphatase (AP) detection, cells were fixed and then stained using an Alkaline Phosphatase kit (Sigma) as recommended by the manufacturer.

(f) RT-qPCR
The abundance of mRNA markers routinely used to assess pluripotent or differentiated cellular states (Table 7) was measured to complement the flow cytometry assays for pluripotency-related antigens and the colony-forming cell assays.

Gene | Day-2 fold difference from Day-0 hESC | Day-4 fold difference from Day-0 hESC
Predicted trend: downregulated upon differentiation. Observed trend: as predicted, except for LIN28.
POU5F1 | 0.53 ± 0.02 | 0.40 ± 0.02
NANOG | 0.32 ± 0.01 | 0.18 ± 0.03
LIN28 | 1.46 ± 0.06 | 1.64 ± 0.08
DNMT3B | 0.70 ± 0.03 | 0.45 ± 0.03
GDF | 0.60 ± 0.08 | 0.27 ± 0.02
Predicted trend: upregulated upon differentiation. Observed trend: as predicted, except for BRACHYURY.
AFP | 44.68 ± 3.22 | 58.5 ± 3.4
BRACHYURY (T) | 0.05 ± 0.01 | 0.03 ± 0.00
NEFH | 2.36 ± 0.64 | 1.82 ± 0.24
NEUROD1 | 0.72 ± 0.41 | 4.34 ± 0.60
CDX2 | 55.84 ± 4.50 | 160.2 ± 8.2
Predicted trend: unknown. Observed trend: Day-0 level elevated relative to days 2 and 4.
CD30 N-terminus | 0.40 ± 0.03 | 0.14 ± [illegible]
CD30 C-terminus | 0.48 ± [illegible] | 0.16 ± 0.03
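The Observed trend entries can be checked mechanically against the fold differences in the table above. The sketch below assumes that any fold difference above 1 counts as "up" and any below 1 as "down", with no significance threshold applied (a deliberate simplification); it classifies the Day-4 mean values copied from the table and lists the markers that disagree with the predicted direction. The function names are illustrative only.

```python
# Day-4 mean fold differences from Day-0 hESC, copied from the table above.
DAY4_FOLD = {
    "POU5F1": 0.40, "NANOG": 0.18, "LIN28": 1.64, "DNMT3B": 0.45, "GDF": 0.27,
    "AFP": 58.5, "BRACHYURY (T)": 0.03, "NEFH": 1.82, "NEUROD1": 4.34, "CDX2": 160.2,
}
# Predicted direction for each marker, from the table's "Predicted trend" groups.
PREDICTED = {
    **{gene: "down" for gene in ("POU5F1", "NANOG", "LIN28", "DNMT3B", "GDF")},
    **{gene: "up" for gene in ("AFP", "BRACHYURY (T)", "NEFH", "NEUROD1", "CDX2")},
}


def direction(fold):
    """Classify a fold difference relative to the Day-0 calibrator (fold = 1)."""
    if fold > 1.0:
        return "up"
    if fold < 1.0:
        return "down"
    return "unchanged"


def exceptions(folds, predicted):
    """Markers whose observed direction disagrees with the predicted trend."""
    return [gene for gene, fold in folds.items() if direction(fold) != predicted[gene]]


print(exceptions(DAY4_FOLD, PREDICTED))  # ['LIN28', 'BRACHYURY (T)']
```

Applied to the Day-2 column instead, the same check would also flag NEUROD1 (0.72), so the exceptions listed in the table agree with the Day-4 values rather than the Day-2 values.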
L.1) C M  0 ..7.-^ 0., in^ ._ ..,_ ... ..^ (-... ...:4. '^ cp.^ *a 19 2 ^ V I 0 ., knti^ .. g ^ 1., 1--.1 4...^ V  t !,,.:, .:.., 49-^ NV ea^ ao .0  0  0  C l.) 7 .4^ C  "  E  0  r o  = - 5 0  - - -CO , u  _ _ 4 ^ 444, :( 7-■^ i■ 0^ I.^ C N  0 Li)^ IT 3  C O  Z  C IL  C C  4 0  5 tu^ ,.. A N  -4 4 - C r c -^ i f )  N a 'o  ,,- c h  ue , , -  0 C6^ c.- =  i'j ua  .- 0  g  g  1 0  4 ,4 1 ti:^ , V^ re 4 -, u _ L T _ "-- 0  W ^ 444 2„1 I..^ '.it" u ^ ,_, ..,..-,_ 4 J  <  <  < ^ v ) 00 ^ rc l^ (FS (.0 ,.. >  Z  Z  '-  Z  1,,  .1 , i.. ^ -^ to co L.)^ c e  r,' ce cC  ac  U .. Z ^ •,,- IN 56 Appendix 4. Primers for 3' RACE, Full-Length cDNA, and ORF recovery Novel transcript^3' RACE Outer^ 3' RACE Inner 3012 TTGCTTTAGAATGCCCGATT TAAAAGGGCTTGAGCCATCA 3114 CTTTCATTTCCTTTGGTCATGC CTTTGGTCATGCTCTTGTGG 3149 TATATCCCCCATCCCATCTG ATCCCATCTGCTGGAAGTCA 3210 CGCAGGCCTGTTGAGTAAAT GGCCCTTCAAATCTCAGTGT 3372 AAGGCCTTGACATGTTTGGT AGGAAGCTCTTGCCCAAAAA 3373 TCCTTTACTGCGCATGTGTT GCATTTGGCTGTCTCTGCTT 3399 AACATCTTAGCATCTGGGGAAC GCATCTGGGGAACACCTAGT 3443 TCACAGTGGACGACTCAGC CAGTGGACGACTCAGCCCG 3461 TGGGCATCTCCACACATAAA CATCATACCCCACCCAACC 3465 TGGAAGAGAGGAGGGTCCTA CAGCCTGGTGGGATATTTTC 3518 TTTGATTTCTATGGGAACTTTGC TGGGAACTTTGCAAGTAGATATAGG 3630 GGACCAGTACCCCACATGTACT CACATGTACTTGAACCGAGGAG Novel FL transcri It^FL-Forward^ FL-Reverse 3012 GGAGAGCTTCCTGTAAGCGAA CCAAGGCCCTAGAATCAAGTT 3210 GATGCCGGGAGGGTTTGA AAATATAAAAATAAGCCTTTTAATG 3372 GACTCAGCCCGCCTGCAC GGAATTCATGTAATTTTAGGACAGA 3373 TACTAAGCAAAAGGAAAACTG GTTTTCAAAATAACTGATCATTTTA Novel ORF^ ORF-Forward^ ORF-Reverse 3012 65aa ATGGCGCACCCGGAAAGG TCAAACATGAAACTCGCTCTG 3210_111aa ATGGTGGAAGGAGCTCAGCCCAG TCATGGCCGGGCACGGTGGCTCT 3372 101aa ATGAAATTTGGTGCCGTGAC TCAAATCCTTCTACTTTTGAC 3373 72aa ATGCGGGCCCTTAGTCTG TCAACAGTTGATCATATTTATTT 57 I)cd4-,cct0rr; cil P.^ rl, -c-cs -ci^ 0 46". ^ ' t i° cuoH cu -cs wO 'a t:4^ el) ^ a) fd c..^ Z0 cns... ^ .- .) +->t)  • '4 - 7 :1 ^ t€1 : CI) 2  E  = • 6c..) 4 - -- 1^ 7 ,4 n . 00 *4 -, c d . 0 C.4^ a a .N.--Ns .. 
Appendix 6. Candidate transcript 3012 sequence

>3012_1769bp
GGAGAGCTTCCTGTAAGCGAACCCGTGAAGGTTCCAGGAGGATGGCGCACCCGGAAAGGGCGTGG
AAGCTCCGCGCCCCTTCCCACAGACTTTCCCCCATCCATCTCTTCATCTGGTGTTCATCGATAGCCTT
TGTAATGTCCTTAATAGTAAACCGGAAAACAGTGGAGGAAGAAGAGAATCACCACATATCGTATTTAG
AGGTCCTGCAGAAAGGGCAGAGCGAGTTTCATGTTTGAGTGCAGCGGATGCATGGGCAAAGCATTA
CACAGCCGCTCCATGGAAGTGACAATGAGCCTTGTCTTTCACACGTAACTCCAGGAACCAGGGGAA
ACAAAACCAATTCCACCCAGAGGTACAGACAAAGAAGAGCACCCTCCCAAAGGAGCCCGCATGATA
AGTGGCCACGGCGAACATCACTGTGATCAGTCAGAAAGGCAGGCGAGTTATTAGCTTAGCCCTGAA
CCTCCTACTTAATGTAGAACCAGTGGTGATGGTTAACAGACCCTGGCTGCAGTGACAAGCCTGGGAC
CTGCTCGAGGAAATCCAATTGAAAAGCAGCCAAGAGGAACAACAAGCTAGAAACAGATGTTCCAGCT
ACCAGGATGTGCACGCATTTAACCTGAACATGTAGTTGGGGTGGGATGGTAGCATTCGTTTCTAAAT
ATAGTGATTTCACTTCTTCCCCCATATAGATATTGTCAAGTTTAATTATAGCCATCCACATTAGTTTTGT
CAAATTTTCAGCAATTTCTTTAAAATAAGAAGATGTCAGAATGATAACTGCTATTAAGGGAATTCAGGG
TGTTTGGTTTATCTACCCAGTCTTACTAAAACATTTCAGTTAAATCATCCACTCTTAAACTCCAGTCTC
ATAGCAATTAAAAATTATCTTTTAAAAAATTGCCCAAATGAAATTCCAAACAGTTCATGTAATTATTTGA
TTAAATATTACCATGTTAGTAAAATATCTTTAAGATAGTTATATTTTCATGAGTCACTTTCTTAGTCACT
TCTAGTTCCTAAATAGGGGAGTTTGGGTTGACAGAGGTGTGATAACCATGCGAACAGGGGCTAGAAT
ATAAGCTTAACAGAATCAGTTAATTAGATTTGGAAGAGAGTTGAGTCTTGTGAACCACAGTGATATGT
CATGTACATTAGGAAACATTCTCCTTTACGTCAACACAATTCTGGTTTTCTCTAATGGGCTTTATACGA
GGAGAGAGACCATCTCACTGTGTTGATTTTTCTCAAGAAAAAAATCAATACTCTCAATGGATTTTAAGT
AGATCTAATGTAGTACTGCACTATGCTCATAAGAAGATAAGCATATGTAAAAGATTTTAAAATGTGAAT
TAATGCATGCCACACTGAAACCAGATCATACCCCACACTCATGTTGTTATTATGTGTACCTAAAGCAA
AGAACAGCTGCAGAGTATTTTCTCTGGTAACACTTCATTTTTTATTGGTAGGTTTTAATCCGATACATT
TTTGGGGTCACTGAATTCTTATTTGCTTTAGAATGCCCGATTATATCATTCTGCAAAGGATAAAAGGG
CTTGAGCCATCATCCTGTGCTGAAATTTATTGAAGGTATTTATGAAGAGGGTGATCTTGAGGAATGCA
AGCTTTTAAAATCTAAGAACAGCAGCAAAAATAAATTAATCAGGAATCCATCACTAATTTATACATGAG
TCACTTACGAGGAAGGTAAAACGGTTCAGTAAATTATGGATTTACAAATAAACTTGATTCTAGGGCCTTGG

>3012_1769bp_65aaORF (frame 3, 42 to 239)
MAHPERAWKLRAPSHRLSPIHLFIWCSSIAFVMSLIVNRKTVEEEENHHISYLEVLQKGQSEFHV
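The ORF header uses 1-based, inclusive coordinates on the transcript: nucleotides 42 to 239 (198 nt, that is, 65 codons plus a stop codon), read in the third forward frame. The Python sketch below is one way to confirm this, assuming the standard genetic code: it translates that span of the first 239 nt of 3012_1769bp and locates the 3012 ORF primers from Appendix 4 on the same sequence. It is not the ORF-prediction tool used in the thesis, which this section does not name, and the helper names (`translate`, `orf_from_coords`, `reverse_complement`) are hypothetical.

```python
from itertools import product

# First 239 nt of >3012_1769bp (Appendix 6), enough to cover the annotated ORF.
TRANSCRIPT_5P = (
    "GGAGAGCTTCCTGTAAGCGAACCCGTGAAGGTTCCAGGAGGATGGCGCACCCGGAAAGGGCGTGG"
    "AAGCTCCGCGCCCCTTCCCACAGACTTTCCCCCATCCATCTCTTCATCTGGTGTTCATCGATAGCCTT"
    "TGTAATGTCCTTAATAGTAAACCGGAAAACAGTGGAGGAAGAAGAGAATCACCACATATCGTATTTAG"
    "AGGTCCTGCAGAAAGGGCAGAGCGAGTTTCATGTTTGA"
)

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate codon by codon from the first base; '*' marks a stop codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def orf_from_coords(transcript, start, end):
    """Return (frame 1-3, protein without stop) for 1-based inclusive coordinates."""
    frame = (start - 1) % 3 + 1
    protein = translate(transcript[start - 1:end])
    if not protein.endswith("*"):
        raise ValueError("span does not end in a stop codon")
    return frame, protein[:-1]


frame, protein = orf_from_coords(TRANSCRIPT_5P, 42, 239)
print(f"frame {frame}, {len(protein)} aa: {protein}")

# 3012 ORF primers (Appendix 4): 1-based start of the forward primer, and
# 1-based end of the reverse primer's binding site on the sense strand.
fwd_start = TRANSCRIPT_5P.find("ATGGCGCACCCGGAAAGG") + 1
rev_site = reverse_complement("TCAAACATGAAACTCGCTCTG")
rev_end = TRANSCRIPT_5P.find(rev_site) + len(rev_site)
print(f"ORF-Forward starts at {fwd_start}; ORF-Reverse site ends at {rev_end}")
```

Run as written, it reports frame 3 and a 65-aa protein identical to the ORF sequence in the header, with the ORF-Forward primer starting at position 42 and the ORF-Reverse binding site ending at position 239.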
Appendix 10. Example calculation for relative quantitation of transcript abundance using the standard curve method.

Definitions
Reference standard = H9 hESC.
Calibrator sample (baseline) = Day-0 sample of retinoic acid (RA)-induced differentiation.
Test samples = Day-2 and Day-4 samples of RA-induced differentiation.
Endogenous reference gene = GAPDH.
Target (query) = novel transcript 3012.

(i) Calculate the mean and standard deviation of the Ct values from replicate reactions.

(ii) Generate a relative standard curve for the endogenous control (GAPDH) in the reference sample (H9 hESC).

Input total RNA, H9 hESC (ng)   Log of input   GAPDH-1f/r average Ct   GAPDH-1f/r Ct Std. Dev.
100                             2.0            15.72                   0.31
10                              1.0            17.84                   0.10
1                               0.0            21.45                   0.23
0.1                             -1.0           24.80                   0.14
0.01                            -2.0           28.41                   0.11

[Figure: Reference Std. curve - GAPDH. Average Ct plotted against log ng RNA, with linear fit y = -3.234x + 21.644.]

(iii) Generate a relative standard curve for the target (3012) in the reference sample (H9 hESC).

Input total RNA, H9 hESC (ng)   Log of input   3012-1f/r average Ct   3012-1f/r Ct Std. Dev.
100                             2.0            21.38                  0.01
10                              1.0            24.29                  0.07
1                               0.0            27.66                  0.05
0.1                             -1.0           31.77                  0.45
0.01                            -2.0           35.39                  0.91

[Figure: Reference Std. curve - 3012. Average Ct plotted against log ng RNA, with linear fit y = -3.55x + 28.098.]

(iv) Determine the relative quantity of the target (3012) in the calibrator (Day-0) and test (Days 2 and 4) samples by interpolation from the 3012 standard curve.

Sample     3012-1f/r average Ct   3012-1f/r Ct Std. Dev.   3012 quantity (ng) interpolated from std. curve
RA Day-0   25.00                  0.05                     7.49
RA Day-2   23.56                  0.09                     18.97
RA Day-4   25.02                  0.02                     7.39
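Steps (ii) to (iv) amount to fitting a straight line of average Ct against log10 of input RNA for each primer pair, then inverting the target's line to turn a sample's Ct into an input-equivalent quantity. The Python sketch below refits both standard curves from the tabulated averages by ordinary least squares and interpolates the 3012 quantities for the RA samples. The helper names (`fit_line`, `interpolate_quantity`) are hypothetical, and the thesis does not state which software produced its fits.

```python
# Appendix 10 dilution series: log10(ng H9 hESC total RNA) and average Ct.
LOG_INPUT = [2.0, 1.0, 0.0, -1.0, -2.0]
GAPDH_CT = [15.72, 17.84, 21.45, 24.80, 28.41]   # GAPDH-1f/r, step (ii)
T3012_CT = [21.38, 24.29, 27.66, 31.77, 35.39]   # 3012-1f/r, step (iii)

# Step (iv): average 3012 Ct in the RA differentiation samples.
SAMPLES = {"RA Day-0": 25.00, "RA Day-2": 23.56, "RA Day-4": 25.02}


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def interpolate_quantity(ct, slope, intercept):
    """Invert Ct = slope * log10(ng) + intercept to recover ng of input RNA."""
    return 10 ** ((ct - intercept) / slope)


gapdh_slope, gapdh_icpt = fit_line(LOG_INPUT, GAPDH_CT)
t3012_slope, t3012_icpt = fit_line(LOG_INPUT, T3012_CT)
print(f"GAPDH: y = {gapdh_slope:.3f}x + {gapdh_icpt:.3f}")
print(f"3012:  y = {t3012_slope:.3f}x + {t3012_icpt:.3f}")

for name, ct in SAMPLES.items():
    qty = interpolate_quantity(ct, t3012_slope, t3012_icpt)
    print(f"{name}: {qty:.2f} ng")
```

Run as written, the fits reproduce the standard-curve equations y = -3.234x + 21.644 (GAPDH) and y = -3.55x + 28.098 (3012), and the interpolated 3012 quantities (about 7.46, 18.98 and 7.36 ng) agree with the step (iv) table to within about 0.03 ng; the small residual differences presumably come from rounding in the tabulated values. The GAPDH fit is not used in the interpolation itself; in the standard curve method the endogenous reference serves to normalize target quantities, a step not shown here.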
Appendix 11. Biohazard Approval Certificate.

The University of British Columbia
Biohazard Approval Certificate

PROTOCOL NUMBER: H06-0109
INVESTIGATOR OR COURSE DIRECTOR: Marra, Marco
DEPARTMENT: Medical Genetics
PROJECT OR COURSE TITLE: Development of Technologies for the Derivation, Propagation and Differentiation of hESC
APPROVAL DATE: 07-06-20
APPROVED CONTAINMENT LEVEL: 2
FUNDING AGENCY: Networks of Centres of Excellence (NCE)

The Principal Investigator/Course Director is responsible for ensuring that all research or course work involving biological hazards is conducted in accordance with the Health Canada Laboratory Biosafety Guidelines (2nd Edition, 1996). Copies of the Guidelines (1996) are available through the Biosafety Office, Department of Health, Safety and Environment, Room 50 - 2075 Wesbrook Mall, UBC, Vancouver, BC, V6T 1Z1, 822-7596, Fax: 822-6650.

Approval of the UBC Biohazards Committee by one of: Chair, Biosafety Committee; Manager, Biosafety Ethics; Director, Office of Research Services.

This certificate is valid for one year from the above start or approval date (whichever is later) provided there is no change in the experimental procedures. Annual review is required. A copy of this certificate must be displayed in your facility.

Office of Research Services, 102, 6190 Agronomy Road, Vancouver, V6T 1Z3. Phone: 604-827-5111. FAX: 604-822-5093.
