UBC Theses and Dissertations

Transcriptomic consequences of RNA processing disruption via a novel CDC-like kinase inhibitor Funnell, Tyler 2014

Transcriptomic consequences of RNA processing disruption via a novel CDC-like kinase inhibitor

by

Tyler Funnell

Bachelor of Science, University of Northern British Columbia, 2009

A thesis submitted in partial fulfillment of the requirements for the degree of

Master of Science

in

The Faculty of Graduate and Postdoctoral Studies
(Bioinformatics)

The University of British Columbia
(Vancouver)

December 2014

© Tyler Funnell, 2014

Abstract

RNA splicing is a process by which introns are excised from precursor mRNA. Variations in the segments removed, and in the resulting mRNA molecule, may result in gene transcripts with differing and even opposing functions. The mechanisms involved in RNA splicing are tightly regulated, and their disruption has been implicated in several human diseases including cancer.

This presents the RNA splicing machinery as a potential therapeutic target. However, the effects of systematic splicing modulation through pharmaceutical intervention remain underexplored. A thorough understanding of splicing can be gained through controlled disruption of the molecular machinery.

The Takeda Pharmaceutical Company Limited (Osaka, Japan) has recently developed a novel compound that inhibits the CDC-like family of kinases, which regulate key splicing factors. Although splicing inhibitors have already been published, their effects on the RNA splicing landscape have not been systematically described. The creation of a novel splicing inhibitor presents the opportunity to perform a methodical analysis of transcriptomic response to RNA processing inhibition using modern RNA sequencing and analysis methods.

It is demonstrated, using the Takeda compound, that restricting the function of CDC-like kinases perturbs RNA splicing in both malignant and normal cells in a dose-dependent manner.
Post-treatment changes in splicing patterns revealed that these changes are mainly due to inefficient recognition of RNA splice sites. Splicing factors were among the earliest responders to treatment, indicating that splicing autoregulatory mechanisms are sensitive to changes in splicing efficiency. Downstream effects were seen as dose-dependent changes in gene expression regulation, and down-regulated genes were enriched for splicing factors. Treatment also resulted in increased generation of conjoined gene transcripts (RNA molecules transcribed from at least two different genes), likely caused by transcriptional read-through. This finding points to a previously undescribed role for CDC-like kinases in RNA processing.

Preface

This thesis is based on the CLK inhibitor project conducted through a collaboration between the BCCRC and Takeda Pharmaceutical Company. Experimental design and research direction were by Sam Aparicio with Osamu Nakanishi, Atsushi Nakanishi, and Gregg Morin. None of the text in this thesis has been previously published.

Experiments were performed by Arusha Oloumi at the BCCRC or by members of the Shonan Incubation Lab at Takeda Pharmaceutical Company. The T3 compound was identified by Takeda. Arusha Oloumi performed the CLK siRNA experiments and conjoined gene targeted sequencing.

Informatic analysis of the experimental data presented in this thesis was performed by myself. Exceptions include RNA-Seq alignment and processing with MISO, which were performed by Jamie Rosner and Celia Siu. Conjoined genes were first identified by colleagues at Takeda Pharmaceutical Company. Initial processing of HCT116 T3 treated RNA-Seq libraries with deFuse was performed by Karey Shumansky; all further deFuse processing and analysis was performed by myself with a version of deFuse modified according to Andrew McPherson’s advice. ESE motif density analysis was initially performed by Hirokazu Tozaki, but later modified and redone by myself.
Differentially spliced gene biological process enrichment analysis was initially performed by Sohrab Shah, and later modified and redone by myself. Selection of clustering method was performed in conjunction with Hiroyoshi Toyoshiba.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Glossary
1 Introduction
  1.1 Overview
  1.2 The Process and Mechanisms of RNA Splicing
    1.2.1 Formation of the spliceosome
    1.2.2 The role of SR proteins in RNA splicing
    1.2.3 The role of SRSF protein kinases and CDC-like kinases in RNA splicing
    1.2.4 Alternative splicing
    1.2.5 The role of CLK and SR proteins in non-splicing RNA metabolic processes
  1.3 Disruption of RNA Processing in Human Disease
    1.3.1 The splicing machinery as a therapeutic target
  1.4 Detecting and Measuring Changes in the Transcriptome
    1.4.1 Computational methods
  1.5 Experimental Approach and Aims
2 Transcriptomic Consequences of CLK Inhibition
  2.1 Datasets
  2.2 T3 Treatment Induces Dose-Dependent Alternative Splicing Changes
    2.2.1 Alternative splicing response to CLK inhibition is common to both HCT116 and hTERT cell types
    2.2.2 Splicing and cell cycle related genes are sensitive to CLK inhibition
    2.2.3 CLK knockdown partially reproduces effects of T3 treatment
    2.2.4 T3 induced CLK inhibition reduces splice junction recognition efficacy
    2.2.5 PSI clustering reveals distinct AS response groups
    2.2.6 ESE density is predictive of splicing response to CLK inhibition
  2.3 CLK Inhibition Promotes Conjoined Gene Transcription in a Dose Dependent Manner
    2.3.1 T3 treatment increases conjoined gene loci detection in a dose-dependent manner
    2.3.2 T3 treatment increases conjoined gene PSI in a dose-dependent manner
    2.3.3 Similar conjoined genes are sensitive to CLK inhibition in HCT116 and hTERT cells
    2.3.4 Conjoined gene events are validated in both HCT116 and hTERT using targeted sequencing
    2.3.5 Upstream partners of conjoined genes are involved in RNA metabolism and cell-cycle regulation
    2.3.6 Upstream conjoined gene partners may rely on auxiliary 3′-end processing factors
  2.4 CLK Inhibition Results in the Down Regulation of Splicing Factors and Cell Cycle Regulators
  2.5 Comparison of Unstranded and Stranded RNA-Seq Libraries
3 Discussion
  3.1 Limitations and Future Directions
  3.2 Conclusions
Bibliography
A Supporting Materials

List of Tables

Table 2.1 Summary of T3 treatment datasets. Each dataset contains sequences from either T3 treated or control cell populations. (unstr) and (str) indicate that an unstranded or stranded RNA-Seq protocol was used, respectively. An ‘X’ indicates that a sequencing library exists for the appropriate T3 concentration and dataset.
Table 2.2 Summary of CLK siRNA knockdown RNA-Seq libraries. The knockdown experiment was performed using HCT116 cells. An ‘X’ indicates that the corresponding sequencing library was generated from cells with the indicated target knocked down. (ctrl) indicates a control library, and ‘None’ represents a sample treated with vehicle Lipofectamine® 2000.
Table 2.3 T3 and CLK siRNA treated HCT116 GO BP enrichment
Table 2.4 Proposed similar AS PSI response clusters between the three RNA-Seq datasets. (unstr) and (str) indicate that an unstranded or stranded RNA-Seq protocol was used, respectively.
Table 2.5 HCT116 unstranded RNA-Seq PSI cluster event type proportion enrichment
Table 2.6 HCT116 stranded RNA-Seq PSI cluster event type proportion enrichment
Table 2.7 hTERT PSI cluster event type proportion enrichment
Table 2.8 ESE motif density comparisons for SE and RI events in PSI clusters 1 and 2. One-tailed t-test p-values are shown. The alternative hypothesis is that the indicated binding motif density is greater in AS events that increase in PSI upon T3 treatment. NS indicates that the null hypothesis was not rejected at a significance level of 0.05.

List of Figures

Figure 1.1 Regulation of SR protein localization by RS domain phosphorylation
Figure 1.2 Recruitment of spliceosome components mediated by RS domain phosphorylation
Figure 1.3 Alternative splicing event types
Figure 1.4 Comparison of RNA-Seq vs. PacBio cDNA reads
Figure 1.5 RNA sequencing analysis workflow
Figure 2.1 Differentially spliced event counts
Figure 2.2 HCT116 & hTERT MISO event Venn diagram
Figure 2.3 HCT116 unstranded RNA-Seq differentially spliced gene enrichment map
Figure 2.4 HCT116 stranded RNA-Seq differentially spliced gene enrichment map
Figure 2.5 hTERT differentially spliced gene enrichment map
Figure 2.6 CLK siRNA MISO event Venn diagram
Figure 2.7 HCT116 unstranded RNA-Seq MISO event PSI boxplots
Figure 2.8 HCT116 stranded RNA-Seq MISO event PSI boxplots
Figure 2.9 hTERT MISO event PSI boxplots
Figure 2.10 HCT116 unstranded RNA-Seq MISO event PSI clusters
Figure 2.11 HCT116 stranded RNA-Seq MISO event PSI clusters
Figure 2.12 hTERT MISO event PSI clusters
Figure 2.13 AS PSI cluster event type proportions
Figure 2.14 SE ESE density boxplots
Figure 2.15 RI ESE density boxplots
Figure 2.16 VSIG10-WSB2 conjoined gene sashimi plot
Figure 2.17 CG event count barplot
Figure 2.18 HCT116 unstranded RNA-Seq CG PSI change boxplots
Figure 2.19 HCT116 stranded RNA-Seq CG PSI change boxplots
Figure 2.20 hTERT CG PSI change boxplots
Figure 2.21 HCT116 unstranded RNA-Seq upstream WT RPM vs CG PSI
Figure 2.22 HCT116 unstranded RNA-Seq upstream WT RPM vs CG PSI
Figure 2.23 hTERT upstream WT RPM vs CG PSI
Figure 2.24 RNA-Seq CG event Venn diagram
Figure 2.25 RNA-Seq & PacBio CG event Venn diagram
Figure 2.26 HCT116 unstranded RNA-Seq CG enrichment map
Figure 2.27 HCT116 stranded RNA-Seq CG enrichment map
Figure 2.28 hTERT CG enrichment map
Figure 2.29 HCT116 unstranded RNA-Seq gene expression clusters
Figure 2.30 HCT116 stranded RNA-Seq gene expression clusters
Figure 2.31 hTERT gene expression clusters
Figure 2.32 HCT116 unstranded RNA-Seq differentially expressed gene enrichment map
Figure 2.33 HCT116 stranded RNA-Seq differentially expressed gene enrichment map
Figure 2.34 hTERT differentially expressed gene enrichment map
Figure 2.35 HCT116 unstranded vs. stranded RNA-Seq hexplot of AS event PSI values. PSI values were compared for each event at each concentration. Each hex represents a number of AS events. The lighter the shade of blue, the greater the number of AS events that map to that hex.
Figure 2.36 HCT116 PacBio vs. unstranded RNA-Seq hexplot of AS event PSI values. PSI values were compared for each event at each concentration. Each hex represents a number of AS events. The lighter the shade of blue, the greater the number of AS events that map to that hex.
Figure 2.37 HCT116 PacBio vs. stranded RNA-Seq hexplot of AS event PSI values. PSI values were compared for each event at each concentration. Each hex represents a number of AS events. The lighter the shade of blue, the greater the number of AS events that map to that hex.
Figure 2.38 Mapped read counts for RNA-Seq libraries from the three T3-treated RNA-Seq datasets. Counts for T3 concentrations common amongst the three datasets are shown.
Figure 2.39 Proportion of mapped reads split during the alignment process. Proportions for T3 concentrations common amongst the three datasets are shown.
Figure A.1 Soft threshold vs. scale independence and vs. mean connectivity for HCT116 unstranded RNA-Seq AS PSI WGCNA clustering.
Figure A.2 Soft threshold vs. scale independence and vs. mean connectivity for HCT116 stranded RNA-Seq AS PSI WGCNA clustering.
Figure A.3 Soft threshold vs. scale independence and vs. mean connectivity for hTERT stranded RNA-Seq AS PSI WGCNA clustering.
Figure A.4 hTERT-exclusive CG upstream partner FPKM violin plots
Figure A.5 hTERT-exclusive CG downstream partner FPKM violin plots
Figure A.6 HCT116 replicate 1 conjoined gene expression in validation dataset
Figure A.7 HCT116 replicate 2 conjoined gene expression in validation dataset
Figure A.8 hTERT conjoined gene expression in validation dataset
Figure A.9 Soft threshold vs. scale independence and vs. mean connectivity for HCT116 unstranded RNA-Seq FPKM WGCNA clustering.
Figure A.10 Soft threshold vs. scale independence and vs. mean connectivity for HCT116 stranded RNA-Seq FPKM WGCNA clustering.
Figure A.11 Soft threshold vs. scale independence and vs. mean connectivity for hTERT stranded RNA-Seq FPKM WGCNA clustering.

Glossary

A3SS   alternative 3′ splice site
A5SS   alternative 5′ splice site
AFE    alternative first exon
ALE    alternative last exon
AS     alternative splicing
CCS    circular consensus sequencing; see complementary DNA
CG     conjoined gene
CLK    CDC-like kinase
CLR    continuous long read; non-CCS PacBio reads
DSE    downstream element
ESE    exonic splicing enhancer
FPKM   Fragments Per Kilobase of exon model per Million mapped reads
mRNA   messenger RNA
MXE    mutually exclusive exon
PSI    percent spliced in
PTC    premature termination codon
RI     retained intron
RNA    ribonucleic acid
RPKM   Reads Per Kilobase of exon model per Million mapped reads
SE     skipped exon
siRNA  small interfering RNA
SMRT   Single Molecule Real Time
snRNP  small nuclear ribonucleoprotein particle
SRPK   SRSF protein kinase
UTR    untranslated region

Chapter 1
Introduction

1.1 Overview

The human genome contains approximately 22,000 protein-coding genes [1]. However, the number of unique protein isoforms is greater than can be explained by the number of genes alone. To reconcile this disparity, we must look at the corpus of gene-protein intermediates: ribonucleic acid (RNA) molecules. Before RNA molecules are ready to be translated into protein peptides, they must undergo a series of modifications to become messenger RNA (mRNA). Splicing is a pre-mRNA processing mechanism that occurs both during and after transcription from genes encoded in the DNA.

A typical eukaryotic gene is primarily composed of exons and introns. Exons are the regions included in the final mRNA product, while introns are the intervening sequences. When pre-mRNA is transcribed from genes, it contains both exons and introns. During splicing, the introns are excised from the RNA molecule and the remaining exons are ligated together, forming mRNA.

Alternative splicing (AS), or the differential inclusion of exons and selection of splice sites, is an important source of proteome diversity in humans.
In fact, there are approximately 6 protein coding transcript isoforms per gene on average [2]. Proteins produced from alternatively spliced RNA can have different and even opposing functions. AS can also impact gene expression regulation: in some members of the SR gene family, inclusion of a small “poison” exon containing a premature termination codon marks transcripts for decay, reducing transcript abundance within the cell [3].

Alterations in the patterns of AS have been implicated in various human diseases. Approximately 15% of all genetic disease-causing mutations specifically disrupt RNA splicing [4]. In cancer, protein products resulting from aberrant alternative splicing are linked with malignant phenotypes [5, 6]. Modulation of alternative splicing can retard oncogenic activity in tumour cells with relatively low cellular toxicity, suggesting the splicing machinery may be targeted for therapeutic intervention [7]. However, more research is needed to obtain a detailed understanding of the dynamics of splicing regulation and the effects of its suppression before AS-modulating agents can be used safely as clinical therapeutics.

RNA sequencing (RNA-Seq) [8] assays allow the precise measurement of nucleotide sequences and quantification of RNA levels. Many methods have been developed that use RNA-Seq data to detect and quantify RNA isoforms with high sensitivity. Recent advances in sequencing technologies, including the Pacific Biosciences (PacBio) [9] RS platform, provide the ability to sequence up to several thousand nucleotides, enough to capture entire transcripts for many genes. Using these long read methodologies, it is possible to verify the existence of specific mRNA splice variants.
Current RNA assays provide the ability to study RNA expression and processing mechanisms with unprecedented resolution.

1.2 The Process and Mechanisms of RNA Splicing

RNA splicing is a complex procedure that requires a collaboration of many distinct proteins and ribonucleoprotein particles. For splicing to occur, a subset of these splicing factors assemble onto the mRNA precursor around exon junctions to form the spliceosome complex. Once in place, the spliceosome cleaves the RNA molecule, removing the non-coding intron segment, and ligates the remaining exons together. Recognition and precise definition of exon boundaries involves several cis- and trans-acting elements that can either promote or inhibit splicing at a candidate exon junction.

1.2.1 Formation of the spliceosome

Formation of the spliceosome involves the cooperative action of five small nuclear ribonucleoprotein particles (snRNPs) in conjunction with many auxiliary proteins. Spliceosomal snRNPs and protein factors recognise and interact with several cis-elements including the 5′ splice site, branch point, polypyrimidine tract, and 3′ splice site during assembly of the spliceosome. Assembly proceeds in a step-wise fashion, forming several intermediate complexes before forming the final spliceosome complex [10].

The first pre-spliceosomal complex is formed when the U1 snRNP and splicing factor 1 (SF1) bind to the 5′ splice site and branch point of an intron, respectively, to form the E’ complex. The E’ complex is transformed into the E complex with the binding of U2 auxiliary factor (U2AF) to the polypyrimidine tract and 3′ splice site. The E complex can be converted to the A complex if U2 snRNP is recruited to the pre-mRNA intron through interactions with U2AF, replacing SF1 at the branch site. Recruitment of the U4/U6-U5 tri-snRNP to the A complex generates the B complex. Subsequent extensive rearrangements produce the C spliceosome complex.
The C complex catalyzes the next step in the splicing process before disassociating [10].

1.2.2 The role of SR proteins in RNA splicing

Regulation of splicing can occur at many different stages in the process of splice site selection and spliceosome formation. Splicing regulation involves cis-regulatory elements that are categorized into four groups: exonic splicing enhancers (ESE) and silencers (ESS), and intronic splicing enhancers (ISE) and silencers (ISS). ESEs are common, degenerate exonic sequences that are typically bound by members of the SR protein family to promote splicing.

SR proteins are characterized by the presence of a C-terminal Serine/Arginine-rich RS domain and at least one N-terminal RNA recognition motif (RRM). Traditional models of SR protein function maintain that the RRM domain mediates interaction between the SR protein and splicing regulatory elements (e.g. ESEs) [11], while the RS domain mediates protein-protein interactions with other splicing factors. For example, the RS domain is believed to facilitate the recruitment of U1 snRNP, U2AF, and U2 snRNP to the pre-mRNA substrate [10]. However, studies have also shown that the RS domain contacts the RNA itself during spliceosome formation [12, 13], and that the RRM of SRSF1 is directly involved in recruiting U1 snRNP to the 5′ splice site [14].

1.2.3 The role of SRSF protein kinases and CDC-like kinases in RNA splicing

Serine residues of SR protein RS domains are phosphorylated by members of several protein kinase families, including the CDC-like kinases (CLKs) and the SRSF protein kinases (SRPKs). SRPK-mediated phosphorylation of SR proteins located in the cytoplasm results in their nuclear entry and concentration in speckles. Subsequent phosphorylation by CLK is necessary for intra-nuclear localization and activation of splicing [15] (see Figure 1.1).
Although the exact manner by which this activity regulates splicing is not completely understood, recruitment of spliceosomal components by some SR proteins is thought to occur via phosphorylation-enhanced interactions with the SR protein RS domains [16] (see Figure 1.2a). However, in the case of SRSF1, hyper-phosphorylation of the RS domain promotes the recruitment of U1 snRNP via an RRM-RRM interaction [14] (see Figure 1.2b). Regardless of the precise mechanism, RS domain phosphorylation is a critical step in the formation of the spliceosome.

Figure 1.1: Regulation of SR protein cellular localization by phosphorylation. SR proteins in the cytoplasm are phosphorylated by SRPKs, which promotes interactions with Transportin-SR and nuclear entry. Within the nucleus, SR proteins tend to aggregate in nuclear speckles until further phosphorylation allows them to dissociate from the speckles and participate in spliceosome formation. Dephosphorylation of SR protein RS domains is necessary for splicing catalysis [17]. Once splicing is complete, SR proteins may either remain associated with the mRNA to facilitate nuclear export and translation, or remain in the nucleus and engage in further splicing reactions.

Figure 1.2: Two models for spliceosome recruitment by SR proteins. Phosphorylated RS domains are indicated by the presence of lowercase ‘p’s. a, Phosphorylated RS domain mediated recruitment of U2AF via the U2AF 35 kDa subunit’s RS domain. b, Recruitment of U1 snRNP by SRSF1. The un- or hypo-phosphorylated RS domain of SRSF1 interacts with a non-RNA-bound interface of the RRM domain. Subsequent phosphorylation of SRSF1’s RS domain disassociates it from the RRM, leaving the RRM open for interaction with the U1 snRNP 70 kDa subunit’s RRM domain. Inspired by figure 6 of [14] and figure 1 of [18].

1.2.4 Alternative splicing

The selection of exon junctions during RNA splicing can be variable. Changes in the set of selected splice sites will impact the structural composition of the final RNA molecule. The exonic structural consequences can be grouped into eight categories of AS events [19] (see Figure 1.3).

Figure 1.3: Alternative splicing event types (panels: SE, RI, A5SS, A3SS, MXE, AFE, ALE, Tandem 3′ UTR). Constitutive exonic regions are solid black. Regions that may be differentially included are striped. Thin black lines represent introns. SE: skipped exon, RI: retained intron, A5SS: alternative 5′ splice site, A3SS: alternative 3′ splice site, MXE: mutually exclusive exons, AFE: alternative first exon, ALE: alternative last exon. Inspired by figure 2 from [19].

Changes in splice site selection can, for example, result in the exclusion of entire exons, as with skipped exon (SE) and mutually exclusive exon (MXE) events; or, they may cause a shift in the location of an exon’s boundaries, as with alternative 3′ splice site (A3SS) and alternative 5′ splice site (A5SS) events [19]. AS events affecting the termini of RNA transcripts (e.g. alternative first exon (AFE) and alternative last exon (ALE) events) can result in changes to their untranslated region (UTR) sequences, which can affect transcript stability and localization [20]. This transcriptomic flexibility equips the cell with another regulatory mechanism with which to fine tune gene function.

Various factors play a role in determining the precise locations of splice sites. Recognition of splice sites is regulated in part by the binding of splicing factors (e.g. SR proteins) to splicing enhancer and silencer elements within the exonic and surrounding intronic sequences.
The relative concentrations and activities of these splicing factors affect the ability of the spliceosome to assemble on exon junctions [10]. Therefore, AS (and overall splicing activity) can be modulated by altering splicing factors’ expression, localization, or functional efficacy. For example, disrupting the phosphorylation of SR proteins could negatively impact splicing regulatory programmes.

1.2.5 The role of CLK and SR proteins in non-splicing RNA metabolic processes

The role of SR proteins in RNA splicing and their regulation through CLK-mediated phosphorylation has been established. However, members of the SR protein family are also involved in non-splicing RNA metabolic reactions, including formation of the exon junction complex (EJC) [21], and 3′ end formation [20, 22]. Disruption of regular SR protein activity may prevent SR proteins from fulfilling their role in other cellular processes.

EJCs assemble upstream of spliced RNA exon-exon junctions and play a number of roles including promotion of mRNA export and translation. However, the EJC is perhaps best known for its function in the nonsense-mediated mRNA decay pathway: if an mRNA molecule contains a premature stop codon upstream of an EJC, that transcript is marked for degradation. Several SR proteins have been found to interact with the EJC core and may act to stabilize it [21]. Preventing SR proteins from loading on the pre-mRNA substrate or interacting with other proteins may not only reduce levels of splicing, but also broadly inhibit mRNA transport and translation.

For the majority of eukaryotic transcripts, formation of the 3′ end entails the cleavage of the nascent RNA molecule, followed by the appending of a poly-adenine (poly(A)) tail to the newly formed 3′ end of the upstream (5′) cleavage product. The location of cleavage and poly-adenylation is subject to regulation, and at least half of human genes are alternatively poly-adenylated [23]. Alternative poly-adenylation allows for a greater diversity of RNA messages and, consequently, proteins.
In this sense, alternative poly-adenylation is similar to alternative splicing.

Generally, recognition of poly(A) sites begins with the binding of cleavage and polyadenylation specificity factor (CPSF) to an A(A/U)UAAA poly(A) signal hexamer in conjunction with the binding of cleavage stimulation factor (CstF) to a U/GU-rich downstream element (DSE). The subsequent steps of cleavage and poly-adenylation are performed by these core proteins along with a collection of other 3′ processing factors.

CPSF also recognises non-canonical poly(A) signals with reduced efficiency. In these cases, poly(A) site recognition relies on the cooperative action of auxiliary 3′ end processing factors, including cleavage factor I and II (CFIm, CFIIm). CFIm recognizes a UGUA signal upstream of the poly(A) site and recruits CPSF to the unprocessed RNA transcript [24].

CFIm is composed of a 25 kDa subunit and a large subunit of either 59, 68, or 72 kDa. The structures of the 59 and 68 kDa subunit proteins are similar to SR proteins due to their inclusion of both an RNA-binding domain and an RS-like alternating charge domain. CFIm has been demonstrated to interact with SR proteins [25]. Interactions between SR proteins and CFIm may work to promote binding of CFIm to the RNA substrate and recognition of non-canonical poly(A) sites [24].

The phosphorylation status of CFIm can affect 3′ end formation efficiency. Dephosphorylation of CFIm using Serine/Threonine phosphatases results in the loss of 3′ transcript end cleavage activity in HeLa cell nuclear extract [26]. Dephosphorylation of CPSF and CstF does not produce the same effect. Although the kinase(s) responsible for phosphorylating CFIm are not known, CLKs may be responsible for phosphorylating the CFIm RS-like domain.

The loading of CstF onto the poly(A) site U/GU-rich DSE is an early and essential step of the 3′ cleavage and polyadenylation process.
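
As an illustration only, and not part of the analyses in this thesis, the sequence signals described in this section can be located with a simple pattern scan. The Python sketch below assumes an RNA (or DNA) sequence given as a string, searches for the A(A/U)UAAA hexamer, and counts UGUA elements in a window upstream of each hit. The function name find_polya_signals and the 100 nt upstream window are hypothetical choices made for this example; they are not taken from the literature cited here.

```python
import re

# Canonical poly(A) signal hexamer A(A/U)UAAA, i.e. AAUAAA or AUUAAA.
# The lookahead lets overlapping candidates be reported as well.
POLYA_SIGNAL = re.compile(r"(?=(A[AU]UAAA))")


def find_polya_signals(seq, upstream_window=100):
    """Return (position, hexamer, n_upstream_UGUA) for each hexamer found.

    upstream_window is an assumed, illustrative distance (in nucleotides)
    within which UGUA elements are counted; it is not a measured value.
    """
    # Accept DNA input by converting T to U.
    rna = seq.upper().replace("T", "U")
    hits = []
    for match in POLYA_SIGNAL.finditer(rna):
        start = match.start(1)
        window = rna[max(0, start - upstream_window):start]
        # UGUA cannot overlap itself, so a plain count is sufficient.
        hits.append((start, match.group(1), window.count("UGUA")))
    return hits


if __name__ == "__main__":
    example = "CCUGUACCCAAUAAAGGGAUUAAACC"
    print(find_polya_signals(example))
    # [(9, 'AAUAAA', 1), (18, 'AUUAAA', 1)]
```

A scan of this kind only flags candidate signals; as the surrounding text makes clear, actual poly(A) site usage also depends on the DSE and on the availability and phosphorylation state of the processing factors.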
Like CFIm binding of UGUA elements, CstF binding to DSEs promotes selection of poly(A) sites with non-canonical poly(A) signals [27]. SR proteins can affect CstF binding affinity to regulate alternative 3′ end processing. SRSF3 recognition of splicing enhancer signals of the calcitonin/calcitonin gene-related peptide (CT/CGRP) gene promotes recruitment of CstF to the poly(A) site at exon 4 [22]. SRSF3’s influence on CstF binding to poly(A) sites may involve CFIm, as CFIm binds early in the 3′ end cleavage reaction and promotes the recruitment of other core 3′ end processing factors.

RS domain phosphorylation status is known to modulate interactions between SR proteins and other splicing factors. Therefore, it is likely that RS domain mediated interactions between SR proteins and factors involved in other RNA metabolic reactions are also subject to regulation via CLK activity. Disruption of CLK phosphorylation of SR protein RS domains may result in a reduction of SR protein-CFIm or SR protein-CstF interaction. Additionally, there is a possibility that disruption of CLK activity will directly reduce phosphorylation of the CFIm proteins. Either situation would negatively impact the ability of the 3′ end processing machinery to recognise poly(A) sites and effectively cleave nascent RNA molecules.

1.3 Disruption of RNA Processing in Human Disease

Studies of genetic diseases have often focussed on the protein coding regions of genes, especially mutations changing the amino acid sequence of the translated peptide. However, synonymous exonic changes and changes occurring in intronic regions can still lead to gene dysfunction and disease. Up to 50% of mutations contributing to disease affect RNA splicing [28]; 10% directly disrupt splice sites [29].

Essential to splicing is the recognition of splice site signals demarcating intronic sequences.
Mutations preventing the identification of splice sites can result in loss of exon recognition [4, 29] and potentially introduce a premature termination codon (PTC), as in the case of familial dysautonomia [30]. The fatty acid disorder MCAD deficiency is caused by a mutation that disrupts an ESE in the MCAD gene, resulting in skipping of exon 5 and nonsense-mediated decay of the RNA transcript [31].

Mutations affecting RNA splicing have also been implicated in cancer formation and progression. The splicing factor SF3B1 has been shown in a recent study to be mutated in approximately 20% of patients with myelodysplastic syndromes [32]. In prostate cancer, a mutation creates an ESE in the KLF6 gene and promotes expression of an isoform that accelerates tumour progression [33].

SR proteins have also been associated with cancer. Both SRSF1 and SRSF3 are up-regulated in ovarian and colon cancer, among others [5, 34]. For example, SRSF1 regulates splicing in the oncogene MST1R [35]; over-expression of SRSF1 increases expression of an MST1R isoform that bestows greater cell motility, which is related to tumour progression.

As with RNA splicing, mutations in either poly(A) sites or their cis-regulatory sequences can lead to disease [36]. Additionally, misregulation of alternative polyadenylation can cause or exacerbate pathological conditions. For example, cardiac hypertrophy and some cancers are associated with a general preference for the selection of proximal poly(A) sites. It is also possible that SR proteins play a role in disease involving misregulated polyadenylation; they are known both to regulate poly(A) site selection [20] and to be involved in disease.

1.3.1 The splicing machinery as a therapeutic target

The involvement of the RNA splicing machinery in a broad array of diseases makes it a potential target for therapeutic intervention. Two approaches have been identified in the development of therapies for splicing related diseases.
One approach uses antisense oligonucleotides to target specific regions of the nascent RNA transcript, thus preventing the expression of pathological RNA and protein isoforms. Another approach uses small molecules to modulate cellular signalling events that regulate splicing.

Antisense oligonucleotides can be designed to complement specific nucleotide sequences within a pre-mRNA. Depending on the sequence targeted, the selection of specific splice junctions or entire exons can be controlled. Isoform expression itself can be adjusted by promoting the degradation of target transcripts, while protein-RNA interactions can be prevented by blocking binding sites, for example, ESEs and ESSs. Antisense oligonucleotides have been successfully used to treat patients with Duchenne’s muscular dystrophy [34].

Small molecules can be used to modulate splicing by inhibiting or promoting certain cell signalling pathway events. A well known splicing related signalling event is the post-translational phosphorylation of splicing factors, especially those of the SR protein family. The phosphorylation status of SR proteins affects their ability to promote exon recognition. Inhibitors of proteins known to phosphorylate SR proteins have been recently developed, including KH-CB19 [37] and T3 (unpublished, but used in this project). Both KH-CB19 and T3 target the activity of the CLK family of kinases.

Although there is the potential to use small molecules to treat splicing related diseases, inhibiting components of cellular pathways is likely to have many unintended effects. Aside from potential drug off-targets, splicing regulators (e.g. CLKs) are important for the normal splicing of diverse transcript species.
To fully comprehend the consequences of small molecule splicing modulation, transcriptomic response must be studied in a systematic manner.

1.4 Detecting and Measuring Changes in the Transcriptome

There are several methods by which cellular RNA can be measured and compared. Recently developed RNA sequencing technologies allow the capture and identification of RNA transcript sequences, including splice junctions, without prior knowledge of their existence or composition. RNA-Seq, or “Whole Transcriptome Shotgun Sequencing”, samples many short RNA fragments from a population of cells. It uses “next-generation” sequencing technologies to produce reads usually around 30–700 base pairs (bp) in length, depending on the technology used. At the same time, the number of reads produced can be very large: up to hundreds of millions, or even billions, of reads per run. The number of bases sequenced allows the quantitative representation of the entire transcriptome.

A common approach to RNA sequencing involves fragmenting the transcriptome, or a subset thereof (e.g. only coding, polyadenylated transcripts). The fragments are reverse transcribed to create complementary DNA (cDNA), which is amplified and then sequenced. During sequencing, either a single end or both ends of the cDNA can be sequenced. Paired-end sequencing libraries, where both ends of a fragment have been sequenced, have the additional benefit of providing the expected distance between the two reads of each mate-pair. This information is useful for downstream analysis, including gene and RNA isoform quantification.

Standard RNA-Seq methodologies produce reads with no indication of which DNA strand the RNA fragment was transcribed from. Because there are regions of the genome in which genes on both strands overlap, RNA-Seq reads may not always be unambiguously assigned to one strand or the other. To address this problem, “strand-specific” RNA-Seq protocols have been developed [8].
Strand-specific RNA-Seq libraries are useful for quantifying transcript expression from genomic regions with genes occurring on both the forward and reverse strands.

A drawback of RNA-Seq methods is the short read length. A single RNA-Seq read typically cannot unambiguously reveal the structure of the full RNA molecule from which it was produced (Figure 1.4a). This problem is exacerbated by the presence of multi-exonic genes with multiple alternative isoforms. For example, a read may indicate the skipping of an exon if it maps to the two adjacent exons. However, it may not be useful in identifying alternative splicing decisions made upstream or downstream of that particular exon.

[Figure 1.4: three panels (a, b, c) showing reads and transcript structure at the SRSF2 locus]

Figure 1.4: A comparison of RNA-Seq vs PacBio cDNA reads mapped to SRSF2 using a plot generated by the Integrative Genomics Viewer. The longer PacBio reads can typically reveal more of a transcript’s structure than can single RNA-Seq reads. Grey blocks represent sequencing reads. The thin blue lines between grey blocks represent gaps within reads that are split across introns. Black dots within reads represent deletions. a, RNA-Seq reads. b, PacBio reads. c, SRSF2 transcript structure from RefSeq.

Long read sequencing technologies produce read lengths thousands of base pairs long. The Pacific Biosciences (PacBio) Single Molecule Real Time (SMRT) technology can produce reads with an average length of 4,200–8,500 bp, with the longest reads exceeding 30,000 bp. With these read lengths, large sections of mRNA, or even entire transcripts, may be captured (Figure 1.4b).

A potential disadvantage of the PacBio sequencing platform is the error rate: approximately 13% on average for raw reads [38].
However, reads with ≥ 99.9% average accuracy can be constructed from the raw continuous long reads (CLRs): when a single cDNA molecule is sequenced multiple times, the CLRs can be assembled into a single high quality circular consensus sequencing (CCS) read. If a cDNA molecule is too long to be sequenced multiple times before sequencing termination, then CLRs representing large portions or even the entire molecule can still be produced, albeit with greatly reduced accuracy.

Another limitation of PacBio sequencing is the moderate throughput. The PacBio RS platform produces around 100 Mb of sequence, while the Illumina HiSeq 2000 can produce 600 Gb [39]. Although short-read sequencing is still preferable for quantitative measurement of transcriptomes, long-read sequencing is valuable for isoform detection and validation.

Current short- and long-read sequencing technologies should be viewed as complementary, rather than competing, approaches. The high throughput of RNA-Seq allows the capture of sequence from many distinct RNA species and provides greater sensitivity than the PacBio platform. RNA-Seq also has a greater per-base accuracy, which is critical for mutation detection and accurate identification of splice sites. Therefore, RNA-Seq libraries can be used to predict spliced RNA isoforms with high sensitivity, while PacBio reads can then be used to validate the existence of the predicted transcripts.

1.4.1 Computational methods

Extracting information about the transcriptome of a cell population from RNA-Seq libraries is a difficult problem. However, many tools have been developed that attempt to compute statistics from RNA-Seq data, such as gene and RNA isoform expression levels, and relative inclusion levels of alternatively spliced transcript components.
Studies using RNA sequencing technologies often follow common analysis workflows starting with read alignment and proceeding to at least one of several different analyses, including differential expression analysis or RNA isoform prediction (see Figure 1.5). Each RNA sequencing method has its own sources of error and biases that can confound analyses. Many studies therefore validate results using an independent approach; for example, real-time PCR [40] or Sanger sequencing [41] can be used to verify the existence of spliced isoforms.

[Figure 1.5: flowchart. raw RNA reads → read alignment → aligned reads → alternative splicing, gene fusion detection, differential expression, etc. → further analyses → validation]

Figure 1.5: Common basic workflow of analysis with RNA sequencing libraries. RNA sequencing reads are first aligned to a reference genome. The resulting aligned reads can then be used in a number of different analyses, including differential expression analysis, alternative splicing quantification, gene fusion detection, etc. The products of these analyses may then be used in further downstream analyses. Validation of results may be performed using a variety of methods.

Splicing-aware RNA sequencing read alignment

The literature describes many methods for accurately aligning DNA sequencing reads to a reference genome. However, determining the genomic origins of RNA reads presents a distinct challenge: RNA reads can represent regions of RNA containing splice junctions. If a read overlapping a splice junction is to be accurately aligned to a reference genome, the read must be split apart and each portion mapped to the corresponding exon. Doing so can be difficult if the split read portions have insufficient sequence specificity to be accurately mapped to the reference genome. DNA sequence aligners are not optimized for the large-gap alignment necessary for RNA read mapping.

A potential solution to the problem of split read alignment is to use reference transcriptome sequences instead of a reference genome.
By aligning to a reference transcriptome, the need to split RNA sequencing reads across introns is greatly reduced. However, alignment would be restricted to a set of known or predicted RNA sequences, hindering novel isoform detection. Additionally, reads originating from transcripts not present in the reference transcriptome may be aligned to an incorrect reference transcript.

Rather than align RNA reads to a reference transcriptome, alignment can be performed against both a reference genome and a database of exon junction sequences. This approach eliminates the need for a reference transcriptome and allows the entire genome to be queried for possible matches. However, the set of splice junction sequences is also limited to known or predicted exon junctions, making the alignment of reads containing unknown splice junctions problematic.

These issues motivated the development of methods specifically tailored to RNA sequencing read alignment. Some short-read (i.e. RNA-Seq) alignment methods, such as GSNAP [42] and STAR [43], are able to detect and map reads across both annotated and predicted splice junctions. However, short-read alignment methods may not be the most appropriate choice for longer reads; for example, the GMAP [44] cDNA aligner is recommended for PacBio reads [45].

Alternative splicing detection and quantification

Common problems in the study of alternative splicing are the identification and quantification of existing spliced isoforms. Methods developed to address these problems employ a variety of techniques, and computations can be performed at the level of individual AS events or at the level of whole alternative transcript isoforms. Approaches to AS detection and quantification commonly use information inherent in mapped RNA-Seq reads. During alignment to a reference genome, some reads are split and each segment mapped to exonic sequences separated by an intron.
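As an illustration of how split reads carry junction information, the following minimal sketch extracts intron coordinates from the CIGAR string of an aligned read. It assumes SAM-style alignments, in which the `N` operation marks a skipped reference region (an intron). The function name `junctions_from_cigar` and the example coordinates are hypothetical and are not taken from any tool used in this work.

```python
import re

# One CIGAR element: a length followed by a SAM operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference sequence.
REF_CONSUMING = set("MDN=X")


def junctions_from_cigar(pos, cigar):
    """Return (intron_start, intron_end) pairs, 1-based inclusive.

    pos   -- 1-based leftmost reference position of the alignment
    cigar -- SAM CIGAR string, e.g. "40M500N35M"
    """
    junctions = []
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            # The skipped region spans the intron between two exon segments.
            junctions.append((ref, ref + length - 1))
        if op in REF_CONSUMING:
            ref += length
    return junctions


# A read whose first 40 bases align at 1000-1039 and whose remaining
# 35 bases align after a 500 bp gap.
print(junctions_from_cigar(1000, "40M500N35M"))
```

Soft-clipped and inserted bases do not advance the reference position, which is why only `M`, `D`, `N`, `=`, and `X` are counted when tracking the coordinate.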
These reads are useful for indicating the precise location of exon junctions. When paired-end RNA sequencing data is available, the genomic distance between the two mates of a pair mapped to the reference genome can be compared to the expected value of mate-pair distances in the originating sequence library. When mate-pair distances are longer than expected, it is possible that an exon in the gene model has been skipped in the final mRNA molecule. Although mate-pair distances cannot identify the precise location of exon junctions, they are valuable for inferring the exonic architecture of the originating cDNA fragment.

A measure of AS is the percent spliced in (PSI) value

\[ \mathrm{PSI} = \frac{I}{I + E} \tag{1.1} \]

where I is the number of inclusion isoform transcripts, and E is the number of exclusion isoform transcripts [46]. For example, the inclusion isoform for an SE event would be the isoform containing the potentially skipped exon. PSI values can be compared between two samples to identify RNA isoforms or AS events that are differentially spliced.

A popular method for AS analysis is the MISO software package [46]. MISO calculates PSI values for a set of annotated AS events belonging to 8 different classes (SE, retained intron (RI), MXE, A3SS, A5SS, AFE, ALE, tandem UTR) using a Bayesian approach. When comparing PSI values between two samples, MISO calculates a Bayes factor statistic

\[ \mathrm{BF} = \frac{\Pr(D \mid M_1)}{\Pr(D \mid M_2)} \tag{1.2} \]

where D is the observed data, and M_1, M_2 are two statistical models. The Bayes factor in this application is the ratio of the likelihood of the observed data under the assumption of differential splicing to its likelihood under the assumption of no differential splicing. Essentially, the higher the Bayes factor, the stronger the evidence that differential splicing has occurred. MISO is an appropriate choice for projects requiring differential splicing analysis of a broad range of AS event types in human cells.
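The PSI measure in Equation 1.1, and its comparison between two samples, can be illustrated with a short sketch. This is a simplification under stated assumptions: it takes inclusion and exclusion isoform abundances as given numbers, whereas MISO estimates PSI with a Bayesian model rather than a direct ratio. The function names and the example values are hypothetical.

```python
def psi(inclusion, exclusion):
    """Percent spliced in (Equation 1.1): I / (I + E).

    Returns None when neither isoform is observed, since the ratio is
    then undefined.
    """
    total = inclusion + exclusion
    if total <= 0:
        return None
    return inclusion / total


def delta_psi(psi_treated, psi_control):
    """Change in PSI between a treated sample and its control."""
    return psi_treated - psi_control


# Hypothetical skipped-exon event: inclusion falls from 80% to 30%.
control = psi(80, 20)
treated = psi(30, 70)
print(round(delta_psi(treated, control), 3))
```

A negative difference indicates reduced inclusion of the alternative region in the treated sample; a positive difference indicates increased inclusion.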
Although MISO contains only a specific set of functionality, the field of computational AS methods has developed to the point where there exist a number of statistically rigorous tools that can satisfy the needs of most sequencing based AS studies [47].

Gene expression quantification

The simplest way gene expression can be estimated given an RNA-Seq library is to count the number of reads or read pairs mapping to regions of the genome corresponding to annotated gene locations. For some applications, such as differential gene expression analysis using DESeq [48] or edgeR [49], it is necessary to calculate expression using this strategy. However, raw read counts are biased by factors including the sequencing depth of a library, and the length and GC content of genes. Generally, the higher the sequencing depth, or the longer the gene, the more reads will map to that gene. As a result, it is necessary to employ some form of read count normalization when dealing with gene expression analysis.

Some normalization schemes attempt to find a suitable scaling factor by which gene read counts within a sequencing library are divided. The DESeq and edgeR packages both use this approach for differential expression analysis. Another approach is to use quantile normalization to transform the gene expression distributions of each RNA-Seq library in such a way as to make them identical. Yet another approach is calculating Reads Per Kilobase of exon model per Million mapped reads (RPKM) values

\[ \mathrm{RPKM} = \frac{10^{9}\, C}{N L} \tag{1.3} \]

where C is the number of reads mapped to a gene’s exons, N is the total number of mapped reads in the sequencing library, and L is the length of the gene’s exons in base pairs [50].
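Equation 1.3 can be checked with a small worked example. The sketch below is illustrative only; the function name `rpkm` and the numbers used are hypothetical and do not come from the datasets in this study.

```python
def rpkm(read_count, total_mapped_reads, exon_length_bp):
    """Reads Per Kilobase of exon model per Million mapped reads.

    Implements Equation 1.3: RPKM = 1e9 * C / (N * L), where C is the
    number of reads mapped to the gene's exons, N the total number of
    mapped reads in the library, and L the exon length in base pairs.
    """
    if total_mapped_reads <= 0 or exon_length_bp <= 0:
        raise ValueError("N and L must be positive")
    return 1e9 * read_count / (total_mapped_reads * exon_length_bp)


# Hypothetical gene: 500 reads on 2,000 bp of exon sequence in a
# library of 20 million mapped reads.
print(rpkm(500, 20_000_000, 2_000))  # 12.5
```

The factor of 10^9 combines the two scalings in the name: 10^3 for "per kilobase" and 10^6 for "per million mapped reads".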
RPKM values represent global (rather than relative, e.g. PSI) expression level, and normalize read counts by the number of mapped reads in a sequencing library and by the lengths of gene models.

A variant of the RPKM measure, Fragments Per Kilobase of exon model per Million mapped reads (FPKM), is produced by the Cufflinks software [51]. The calculation of FPKM values takes into account that with paired-end sequencing data, only one mate of a read pair originating from the same cDNA fragment might be mapped to the genome reference. Counting individual reads therefore double counts fragments with both mates mapped, while counting other fragments only once. FPKM attempts to count cDNA fragments rather than individual RNA-Seq reads, thereby reducing this bias. The Cufflinks software can also correct for fragment bias (certain sequences being preferentially selected for by primers during PCR) when calculating FPKM values [52].

1.5 Experimental Approach and Aims

Takeda Pharmaceutical Company Limited has recently developed T3, a novel compound that suppresses RNA processing by inhibiting CLK phosphorylation of RS domains. The Takeda T3 compound inhibits CLK activity with a greater specificity than previously reported CLK inhibitors [unpublished data]. Although methods for splicing inhibition have been described [37, 53], the transcriptomic effects of progressively disrupting RNA processing have not been assessed in a systematic manner. Using this novel T3 compound, cellular responses to pharmacological restriction of RNA processing can be measured. Concentration-based analysis will facilitate the identification of transcriptomic components sensitive to CLK inhibition, and may provide valuable insight into the importance of RS domain phosphorylation in the RNA processing regulatory landscape.

Alternative splicing can be categorised into eight different event types (Figure 1.3). Each event type may rely on the activity of SR proteins to a greater or lesser extent.
SR proteins also have a role in non-splicing reactions, including 3′-end formation. Additionally, the phosphorylation status of the RS domain-containing CFIm appears to be important to the 3′-end cleavage reaction. The vulnerability of RNA processing events to CLK inhibition, and the manner in which these events react to progressive repression of CLK, are currently unknown.

Individual RNA processing events may have differing responses to T3 treatment. For example, the PSI values of AS events may increase or decrease to varying degrees upon treatment. The direction of response and level of sensitivity may reflect the strength of cis regulatory signals, or other relevant RNA sequence characteristics. There have been efforts to characterise an RNA splicing code [54]; nevertheless, there is a lack of research into transcriptome-wide RNA features predictive of splicing changes caused by the global impedance of SR protein function.

Disruption of RNA processing efficacy can lead to changes in the composition of the transcriptome, which may comprise changes in both RNA isoform balance and gene expression level. These changes may reflect both the direct effects of CLK inhibition and compensatory responses by the cell. For example, disruption of AS may increase production of aberrant transcripts, which may then prompt the cell to up-regulate the expression of gene isoforms involved in nonsense-mediated decay. Which biological processes are most vulnerable or responsive to CLK inhibition and alterations in RNA processing efficacy has yet to be described.

RNA processing patterns are dependent on biological context, including cell type [54] and tumour/normal status [5]. CLK phosphorylation of RS domains appears to be fundamental to RNA processing. However, there may still be variations between cell types in the degree to which RNA metabolism relies on CLK activity.
To gain insight into the regulation of RNA processing via CLK-mediated SR protein phosphorylation, HCT116 and hTERT cells were treated with progressively increasing concentrations of T3. Vehicle-treated cells were used as a negative control. To compare the effects on splicing between T3 treated cells and cells with artificially reduced CLK expression, a CLK small interfering RNA (siRNA) experiment was performed with HCT116 cells. RNA was measured using RNA-Seq and the Pacific Biosciences RS platform (see Section 2.1).

Gene expression and RNA splicing changes were quantified computationally using the MISO [46] and Cufflinks [51] software. Preliminary inspection of treated RNA-Seq libraries revealed the treatment-dependent formation of conjoined transcripts. A transcriptome-wide search for conjoined transcripts was therefore performed using a published gene-fusion detection method [55]. Biological processes affected by T3 treatment were found by selecting genes exhibiting changes in splicing or expression to build functional interaction networks [56], which were then queried for enriched GO biological process terms [57]. RNA features associated with splicing changes due to T3 treatment were computed using gene annotations and published sequence motifs [58].

A description of the generated datasets and the results of computational analysis are included in Chapter 2 of this document. The results are split up into three main parts. In Section 2.2, the dose-dependent effects on AS are reported along with the results of an investigation into affected biological processes. This section also compares the changes in AS between the HCT116 and hTERT cell types, as well as between T3 treated and CLK siRNA transfected HCT116 cells.
Section 2.3 includes a characterisation of conjoined gene transcripts produced as a result of T3 treatment and biological processes affected by conjoined gene transcription. Finally, Section 2.4 describes the effects of T3-induced CLK inhibition on gene expression and the biological processes affected by differential expression.

Chapter 2

Transcriptomic Consequences of CLK Inhibition

2.1 Datasets

The CLK inhibitor compound, T3, was applied to HCT116 malignant colon epithelial cells and normal hTERT cells at multiple concentrations. RNA was measured using either an unstranded (HCT116 cells) or stranded (both HCT116 and hTERT cells) RNA-Seq protocol, or using the Pacific Biosciences SMRT platform [9]. Table 2.1 summarizes the datasets.

Table 2.1: Summary of T3 treatment datasets. Each dataset contains sequences from either T3 treated or control cell populations. (unstr) and (str) indicate that an unstranded or stranded RNA-Seq protocol was used, respectively. An ‘X’ indicates that a sequencing library exists for the corresponding T3 concentration and dataset.

T3 dose (μM)   HCT116 RNA-Seq (unstr)   HCT116 RNA-Seq (str)   HCT116 PacBio   hTERT RNA-Seq (str)
0.0            X                        X                      X               X
0.05           X
0.10           X
0.50           X                        X                      X               X
1.0            X                        X                                      X
5.0            X                        X                      X               X
10.0           X

The primary dataset used for analysis was the HCT116 unstranded RNA-Seq dataset. This dataset includes the largest number of T3 treatment observations, providing the ability to detect changes at both very small and large doses, as well as greater resolution for response pattern detection. The hTERT dataset was used to determine whether observed transcriptomic response patterns were HCT116 cell-type specific, or observable in cells with differing biology. As the hTERT dataset was sequenced using a stranded RNA-Seq protocol, a second HCT116 dataset was generated for comparison using the same RNA-Seq protocol and the same T3 concentrations as the hTERT dataset.

Multiple datasets were also generated with the purpose of validating results presented in this study.
A CLK knockdown dataset was generated by using siRNA to target each CLK individually, or a combination of CLKs, in HCT116 cells. Two control libraries were generated by either knocking down NT3 (a growth factor in neurons), or treating cells with only vehicle Lipofectamine® 2000. The RNA from each CLK siRNA sample was sequenced using RNA-Seq (Table 2.2).

Table 2.2: Summary of CLK siRNA knockdown RNA-Seq libraries. The knockdown experiment was performed using HCT116 cells. An ‘X’ indicates that the corresponding sequencing library was generated from cells with the indicated target knocked down. (ctrl) indicates a control library, and ‘None’ represents a sample treated with vehicle Lipofectamine® 2000.

siRNA target columns: CLK1, CLK2, CLK3, CLK4, NT3 (ctrl), None (ctrl)
Sample 1: X
Sample 2: X
Sample 3: X
Sample 4: X
Sample 5: X X X
Sample 6: X X
Sample 7: X X X
Sample 8: X X X X
Sample 9: X
Sample 10: X

Another dataset, consisting of RNA sequences obtained from the PacBio RS platform, was generated mainly for the purpose of validating the existence of spliced isoforms arising due to CLK inhibition. The PacBio sequencing platform is suited for RNA isoform detection as it is able to produce long reads, enabling the identification of large portions of transcript structure. The PacBio dataset includes both high-quality CCS reads and lower quality CLR sequences.

All of the RNA-Seq libraries were aligned using GSNAP. The aligned libraries were then processed to remove potential PCR duplicates. The PacBio libraries were aligned with the GMAP aligner and filtered to include only reads whose aligned proportion is at least 90% and which have at least 80% identity with the reference.

2.2 T3 Treatment Induces Dose-Dependent Alternative Splicing Changes

Relative inclusion levels of alternative splicing events were quantified using MISO in all three T3-treated RNA-Seq datasets. The resulting PSI values in each treated library were compared to corresponding control PSI values.
Alternative splicing events were called as differentially spliced if the MISO-calculated Bayes factor (Equation 1.2) was ≥ 20 and the difference in PSI values between treated and untreated samples was ≥ 0.1.

To assess the transcriptome’s sensitivity to CLK inhibition, the number of differentially spliced events was counted for each treated library. The number of differentially spliced events increased with higher dosage (Figure 2.1). This response pattern demonstrates that the Takeda T3 compound is able to inhibit the splicing of a large number of exon junctions. A large change in the number of affected events occurred at 0.50μM (4474 events, compared with 799 and 1088 for 0.05μM and 0.10μM, respectively, or 5.6 and 4.1 fold more events), suggesting that at this concentration a regulatory mechanism was disrupted, resulting in greater numbers of differentially spliced transcripts.

[Figure 2.1: bar plot. x-axis: Concentration (µM), with 0.05, 0.10, 0.50, 1.0, 5.0, 10.0 for HCT116-unstranded and 0.50, 1.0, 5.0 for HCT116-stranded and hTERT-stranded; y-axis: AS event counts (0 to 8000); legend: Dataset]

Figure 2.1: Differentially spliced event counts for the HCT116 and hTERT datasets. Events have a Bayes factor ≥ 20.

2.2.1 Alternative splicing response to CLK inhibition is common to both HCT116 and hTERT cell types

Splicing response to CLK inhibition between HCT116 and non-malignant hTERT cells was compared to ascertain the extent to which biological context affects the reliance of splicing on normal CLK activity. Treated hTERT and HCT116 cell transcriptomes were sequenced with a stranded RNA-Seq protocol (see Table 2.1). To investigate the degree of overlap between differentially spliced AS events in the three T3-treated RNA-Seq datasets, the events affected by T3 treatment were collected for each dataset.
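The calling rule from Section 2.2 and the per-dataset collection of affected events can be sketched as follows. This is a hedged illustration, not the pipeline used in this study: the `Comparison` record, the function names, and the toy values are hypothetical, and the PSI difference is assumed here to be taken as an absolute value.

```python
from dataclasses import dataclass

BF_MIN = 20.0    # Bayes factor threshold stated in the text
DPSI_MIN = 0.1   # PSI difference threshold stated in the text


@dataclass(frozen=True)
class Comparison:
    """One treated-vs-control comparison for one AS event (hypothetical record)."""
    event_id: str
    bayes_factor: float
    psi_treated: float
    psi_control: float


def is_differentially_spliced(c):
    # Assumption: the PSI difference is compared as an absolute value.
    return (c.bayes_factor >= BF_MIN
            and abs(c.psi_treated - c.psi_control) >= DPSI_MIN)


def affected_events(comparisons_by_dose):
    """Union of event IDs called at any dose within one dataset."""
    return {c.event_id
            for comparisons in comparisons_by_dose.values()
            for c in comparisons
            if is_differentially_spliced(c)}


# Toy dataset keyed by T3 dose (μM); values are illustrative only.
toy_dataset = {
    "0.5": [Comparison("SE_1", 35.0, 0.20, 0.60),
            Comparison("SE_2", 5.0, 0.10, 0.70)],
    "5.0": [Comparison("SE_2", 120.0, 0.05, 0.70),
            Comparison("RI_1", 50.0, 0.52, 0.50)],
}
print(sorted(affected_events(toy_dataset)))
```

Once each dataset is reduced to a set of event identifiers in this way, overlaps between datasets follow from ordinary set intersection and difference.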
The number of overlapping and dataset-specific events was then counted (Figure 2.2).

[Figure 2.2: three-set Venn diagram. HCT116 unstranded only: 5476; HCT116 stranded only: 804; hTERT stranded only: 1508; HCT116 unstranded and HCT116 stranded only: 1409; HCT116 unstranded and hTERT stranded only: 1081; HCT116 stranded and hTERT stranded only: 447; all three datasets: 3074]

Figure 2.2: Venn diagram illustrating the number of unique overlapping and dataset-specific differentially spliced MISO events between the HCT116 and hTERT RNA-Seq datasets.

The HCT116 unstranded RNA-Seq dataset produced the greatest number of differentially spliced AS events (11,040), followed by the hTERT (6,110) and HCT116 stranded RNA-Seq datasets (5,734) (Figure 2.2). However, the unstranded RNA-Seq dataset includes more treated samples, including the 10.0μM concentration. The large majority of events for both stranded RNA-Seq datasets overlap with the events from at least one other dataset (HCT116: 86%; hTERT: 75%), while only 50% of events from the HCT116 unstranded RNA-Seq dataset overlap with the events from another dataset (Figure 2.2). The stranded RNA-Seq libraries may include less splicing information than the unstranded RNA-Seq libraries (see Section 2.5). Of the differentially spliced events detected in hTERT cells, 75% were also detected in HCT116 cells. Conversely, 37% of all HCT116 events and 61% of events from the HCT116 stranded RNA-Seq dataset were also detected in hTERT cells. The amount of AS event overlap between hTERT and HCT116 cells suggests that the effects of CLK inhibition are not predominantly hTERT or HCT116 cell-type specific.

2.2.2 Splicing and cell cycle related genes are sensitive to CLK inhibition

Identifying genes differentially spliced at low T3 concentrations will point towards the biological processes most sensitive to loss of CLK activity. Additionally, it may hint at novel roles for CLK phosphorylation in non-splicing processes. Observing effects that occur only at higher concentrations may reveal how the cell responds to widespread RNA processing disruption.

Affected biological processes were determined by identifying differentially spliced genes for each T3 concentration.
Genes were then grouped according to whether they were differentially spliced in the 0.05–0.5μM or 1.0–10.0μM CLK inhibitor treated samples. Each group of genes was used to create a gene interaction network using the ReactomeFI Cytoscape plugin [56]. Gene interaction networks were queried for enriched GO biological process terms with false discovery rate controlled at 0.05. Each group of significantly enriched biological process gene sets was then used to generate an enrichment map [59] (Figure 2.3, Figure 2.4, Figure 2.5).

[Figure 2.3: enrichment map. Cluster labels: apoptotic process; RNA splicing; RNA metabolism; cell cycle; toll-like receptor signaling pathway; histone acetylation; DNA repair; phosphorylation; chromatin modification; protein ubiquitination and catabolism; negative regulation of telomerase activity; gene expression, transcription]

Figure 2.3: Biological process enrichment map for differentially spliced genes in the HCT116 unstranded RNA-Seq dataset. Each node represents a GO biological process gene set. Node cores are coloured red when that gene set is enriched among genes differentially spliced in the 0.05–0.5μM samples, and the outer ring is coloured red when that gene set is enriched in the 1.0–10.0μM samples. Edge thickness indicates the level of overlap between two gene sets, considering the set of differentially spliced genes in the 0.05–0.5μM (green edges) or 1.0–10.0μM (blue edges) samples.

[Figure 2.4: enrichment map. Cluster labels: DNA repair; ubiquitin-dependent protein catabolism; protein phosphorylation; toll-like receptor signaling; cell cycle; chromatin modification; apoptosis; translation; transcription; RNA processing, splicing; RNA catabolism]

Figure 2.4: Biological process enrichment map for differentially spliced genes in the HCT116 stranded dataset. Each node represents a GO biological process gene set. Node cores are coloured red when that gene set is enriched among genes differentially spliced in the 0.05μM samples, and the outer ring is coloured red when enriched in the 1.0–5.0μM samples.
Edge thickness indicates the level of overlap between two gene sets, considering the set of differentially spliced genes in the 0.05μM (green edges) or 1.0–5.0μM (blue edges) samples.

[Figure 2.5: enrichment map. Cluster labels: toll-like receptor signaling; translation; cell cycle; apoptosis; histone modification; protein catabolism; DNA repair; phosphorylation; RNA processing, splicing; RNA catabolism]

Figure 2.5: Biological process enrichment map for differentially spliced genes in the hTERT dataset. Each node represents a GO biological process gene set. Node cores are coloured red when that gene set is enriched among genes differentially spliced in the 0.05μM samples, and the outer ring is coloured red when enriched in the 1.0–5.0μM samples. Edge thickness indicates the level of overlap between two gene sets, considering the set of differentially spliced genes in the 0.05μM (green edges) or 1.0–5.0μM (blue edges) samples.

Splicing factors were found to be affected by differential splicing in the lower T3 concentration samples. While the splicing machinery is known to be subject to autoregulation [60], the fact that splicing factors are among the genes affected by even low doses of CLK inhibitor indicates that splicing autoregulatory processes are sensitive to changes in CLK activity. Other forms of RNA metabolism were also affected at lower T3 concentrations, including gene expression and transcription. Cell cycle related genes were also found to be sensitive to CLK inhibition; cell cycle progression is known to rely on the normal operation of RNA splicing [61, 62]. Some groups of related biological processes (e.g. those involved with transcription or the cell cycle) had gene sets that were affected at only the higher T3 concentrations. This may be the result of a progressively stronger disruption of these biological processes with increasing T3 dose. A group of genes involved in toll-like receptor signaling were found to be predominantly affected in the 1.0–10.0μM samples.
This effect may be an innate immune response to toll-like receptor ligands released from dying cells [63] as a result of high concentrations of CLK inhibitor. Genes involved in apoptosis are differentially spliced due to treatment, which may also indicate cellular lethality at higher T3 concentrations.

2.2.3 CLK knockdown partially reproduces effects of T3 treatment

The T3 compound prevents CLKs from phosphorylating their target RNA processing factors. Therefore, one may hypothesize that reducing the expression of CLK genes would have a similar effect on RNA splicing. To test this notion, CLK expression was knocked down via siRNA in HCT116 cells and the resulting transcriptomes were sequenced using RNA-Seq (Table 2.2).

The RNA-Seq libraries from the CLK knockdown experiment were analyzed with MISO; each CLK knockdown library and the vehicle control library were compared to the NT3 siRNA control. Differentially spliced AS events were called at a Bayes factor (Equation 1.2) threshold of 20 and a PSI change threshold of 0.1, similar to the T3 concentration curve experiment (Section 2.2). A list was compiled of MISO events found to be differentially spliced in any of the CLK siRNA libraries but not in the vehicle control library. This list was then compared to lists of differentially spliced events from the T3-treated HCT116 datasets (Figure 2.6).

[Figure 2.6 set labels: CLK siRNA; T3 (unstranded RNA-Seq); T3 (stranded RNA-Seq)]

Figure 2.6: Venn diagram showing the number of dataset-specific and common AS events for the CLK knockdown and T3-treated HCT116 datasets.

In total, 1580 unique AS events were found to be differentially spliced in any of the CLK knockdown libraries. Of these events, 875 (55%) were found in at least one of the two T3-treated HCT116 AS event lists, demonstrating that at least some of the effects of T3 treatment are due to loss of CLK function as opposed to inhibition of other targets.
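The list comparison described above amounts to set operations on event identifiers. The following is a minimal sketch only, assuming each library's differentially spliced events are available as a collection of identifiers; the function names and the toy identifiers are hypothetical and are not part of MISO.

```python
def knockdown_specific_events(sirna_libraries, vehicle_events):
    """Events called in any CLK siRNA library but not in the vehicle control.

    sirna_libraries: iterable of collections of event identifiers (one per library).
    vehicle_events: collection of event identifiers called in the vehicle control.
    """
    called_in_any = set().union(*sirna_libraries)
    return called_in_any - set(vehicle_events)


def fraction_shared(knockdown_events, t3_event_lists):
    """Count knockdown events also present in at least one T3-treated event list."""
    t3_union = set().union(*t3_event_lists)
    shared = set(knockdown_events) & t3_union
    if not knockdown_events:
        return 0, 0.0
    return len(shared), len(shared) / len(knockdown_events)


# Toy identifiers for illustration only (hypothetical, not real MISO events).
toy_knockdown = knockdown_specific_events([{"e1", "e2"}, {"e2", "e3", "e4"}], {"e4"})
toy_count, toy_fraction = fraction_shared(toy_knockdown, [{"e1"}, {"e3", "e9"}])
```

With the counts reported in the text, 875 shared events out of 1580 gives 875 / 1580 ≈ 0.554, which is the 55% quoted above.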
Almost half of the events resulting from CLK knockdown were not found to be differentially spliced in the T3 treated datasets. This observation can be partially explained by differences in biological response to depleting CLK RNA versus inhibiting CLK phosphorylation activity.

Genes differentially spliced in both T3 treated cells and cells transfected with CLK siRNA are likely to be specifically affected by loss of CLK activity. Biological processes likely to be affected by splicing changes in this common set of genes were identified by constructing a gene interaction network with the ReactomeFI Cytoscape plugin [56]. Functional enrichment analysis was then performed using the genes in the network (Table 2.3). Biological processes enriched among genes differentially spliced in both T3 treated and CLK siRNA transfected cells included “gene expression”, “mitotic cell cycle”, “chromatin modification”, and “nuclear mRNA splicing, via spliceosome”. The enrichment of these biological processes underscores their sensitivity to normal CLK activity.

Table 2.3: Enrichment of GO biological process terms in differentially spliced genes common between T3 treated and CLK siRNA transfected HCT116 cells.

Biological Process | FDR | Genes
gene expression | 0.001 | XPO1, THRA, RPL13, U2AF1, RPL10, PTBP1, RPS18, MED15, SRSF11, HSPA1A, HNRNPL, UBE2D3, EIF3B, HNRNPK, TEAD4, RPL10A, RPS24, EEF1A1, CSTF3, EIF4H, NCOR1, NCOR2, RPS2, SNAPC5, POLR1C, EIF4A2, EEF1D, SNRNP70, GTF3C2
mitotic cell cycle | 0.0165 | XPO1, CEP78, CDC16, NDEL1, CNTRL, AZI1, TFDP1, CDC23, POLD2, AKAP9, POM121, PPP1R12A, ODF2, BUB3, LMNA, CEP63, CSNK1E
chromatin modification | 0.021 | MORF4L2, MTF2, HDAC5, NCOR1, MBTD1, CHD9, CHD3, PHF19
translational initiation | 0.024 | RPL13, RPL10, RPS18, EIF3B, RPL10A, RPS24, EIF4H, RPS2, EIF4A2
translational elongation | 0.02525 | RPL13, RPL10, RPS18, RPL10A, RPS24, EEF1A1, RPS2, EEF1D
nuclear mRNA splicing, via spliceosome | 0.03183 | U2AF1, PTBP1, SRSF10, SRSF11, HNRNPL, HNRNPK, CSTF3, DDX5, SF1, SNRNP70

2.2.4 T3 induced CLK inhibition reduces splice junction recognition efficacy

CLK inhibition causes changes in splicing levels in many alternatively spliced exons. In addition, multiple AS event types exhibited changes in splicing patterns upon T3 treatment. Understanding the manner in which each AS event type responds to CLK inhibition may provide insight into regulatory differences between event types. Specifically, relative sensitivities to splicing factor phosphorylation status may be revealed.

Differentially spliced events were identified and their PSI values in each T3 concentration were collected. Events with missing PSI values were removed. By inspecting the PSI value distributions at each T3 concentration, several response patterns can be observed (Figure 2.7, Figure 2.8, Figure 2.9). First, the PSI values of SE events decrease as drug concentration is increased (medians: -0.02, -0.13, -0.18, for 0.10μM, 0.50μM, and 1.0μM), indicating that these exons are being skipped more often due to treatment. The most substantial PSI decrease occurs at the 0.50μM concentration (6.5 fold decrease in median PSI from 0.10μM). This observation supports the notion that the 0.50μM concentration surpasses a biological threshold, resulting in widespread structural changes within the transcriptome. RI events tend to increase in PSI over increasing CLK inhibitor concentration (medians: 0.0, 0.02, 0.09, for 0.10μM, 0.50μM, and 1.0μM), demonstrating a tendency for introns to be retained more often as a result of treatment. However, retained introns see a more substantial increase in PSI at 1.0μM, compared to 0.50μM (4.5 fold increase in median PSI from 0.50μM). This response pattern suggests that intron retention is more resilient to CLK inhibition compared to exon skipping. In contrast to SE and RI events, A3SS and A5SS events both see a more gradual increase in PSI. Increases in alternative splice site PSI represent a tendency towards including an exon’s extension (i.e. choosing a splice site farther away from the centre of the exon).

[Figure 2.7 panels (number of events): A3SS (763), A5SS (559), AFE (1680), ALE (1422), MXE (220), RI (993), SE (5324); x-axis: Concentration (µM); y-axis: PSI change]

Figure 2.7: AS event type PSI distributions across CLK inhibitor concentration for the HCT116 unstranded RNA-Seq dataset. The number of events for each event type is shown in parentheses. Notches extend ±1.58 × IQR/√n, where IQR is the inter-quartile range.

[Figure 2.8 panels (number of events): A3SS (284), A5SS (201), AFE (819), ALE (884), MXE (50), RI (505), SE (2975); x-axis: Concentration (µM); y-axis: PSI change]

Figure 2.8: AS event type PSI distributions across CLK inhibitor concentration for the HCT116 stranded RNA-Seq dataset. The number of events for each event type is shown in parentheses. Notches extend ±1.58 × IQR/√n, where IQR is the inter-quartile range.

[Figure 2.9 panels (number of events): A3SS (291), A5SS (176), AFE (857), ALE (987), MXE (69), RI (474), SE (3190); x-axis: Concentration (µM); y-axis: PSI change]

Figure 2.9: AS event type PSI distributions across CLK inhibitor concentration for the hTERT stranded RNA-Seq dataset. The number of events for each event type is shown in parentheses.
Notches extend ±1.58 × IQR/√n, where IQR is the inter-quartile range.

The stranded RNA-Seq datasets were unable to show a shift in SE PSI change distributions at 0.5μM due to the lack of samples treated with T3 concentrations lower than 0.5μM. The dose-dependent changes in RI (and other event type) PSI distributions were often not as apparent in the stranded RNA-Seq datasets, partly due to the lack of treatment points, but possibly also due to reduced splicing information in the stranded RNA-Seq libraries compared to the unstranded RNA-Seq libraries (see Section 2.5).

The variety in response patterns reveals that different AS event types have varying levels of sensitivity to CLK inhibition. From these observations, it may be concluded that different classes of alternative splicing events are regulated through different mechanisms that in turn exhibit varying levels of sensitivity to CLK phosphorylation efficacy. Further, the tendency of the splicing machinery to select the exclusion isoform of SE events and the inclusion isoform of RI events suggests that CLK inhibition reduces splice site recognition.

2.2.5 PSI clustering reveals distinct AS response groups

In aggregate, AS events respond to CLK inhibition following patterns determined by event type. However, enforcing an event type based segregation of AS events may conceal finer-grained response profiles. Clustering of AS event PSI profiles will reveal treatment response patterns in an event class unaware manner.

Clustering of PSI profiles was performed using the WGCNA [64] clustering tool. Events were selected for clustering if they were differentially spliced in any of the treated samples at a Bayes factor threshold of 20. Events with missing PSI values were removed unless they contained only two non-consecutive missing values in the case of the HCT116 unstranded RNA-Seq dataset, or only one missing value in the stranded RNA-Seq datasets. Missing values were replaced using linear interpolation.
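The missing-value rule and interpolation step can be sketched as follows. This is an illustration under stated assumptions rather than the code used for the analysis: missing PSI values are represented as None, interpolation is carried out over the sample index (the text does not say whether index or concentration was used as the x-axis), and a missing value at either end of a profile is filled with the nearest observed value. The function names are hypothetical.

```python
def keep_profile(psi, max_missing):
    """Keep a PSI profile if it has at most max_missing missing values,
    none of which are adjacent to each other."""
    missing = [i for i, value in enumerate(psi) if value is None]
    if len(missing) > max_missing:
        return False
    return all(later - earlier > 1 for earlier, later in zip(missing, missing[1:]))


def interpolate_missing(psi):
    """Fill missing values (None) by linear interpolation over the sample index.

    Assumption: a missing value at the start or end of the profile copies the
    nearest observed value, since there is nothing to interpolate between.
    """
    observed = [i for i, value in enumerate(psi) if value is not None]
    if not observed:
        raise ValueError("profile has no observed PSI values")
    filled = list(psi)
    for i, value in enumerate(psi):
        if value is not None:
            continue
        left = max((j for j in observed if j < i), default=None)
        right = min((j for j in observed if j > i), default=None)
        if left is None:
            filled[i] = psi[right]
        elif right is None:
            filled[i] = psi[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = psi[left] + weight * (psi[right] - psi[left])
    return filled
```

Under these assumptions, max_missing would be 2 for the HCT116 unstranded RNA-Seq dataset and 1 for the stranded RNA-Seq datasets.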
WGCNA was run with networkType=“signed” and minModuleSize=25. The WGCNA clustering package requires a soft threshold value, which can be chosen by attempting to maximise both the scale independence and connectivity of the PSI correlation network. Soft thresholds of 17, 28, and 24 were selected for the HCT116 unstranded and stranded RNA-Seq, and hTERT datasets, respectively (Figure A.1, Figure A.2, Figure A.3). The threshold for the HCT116 unstranded RNA-Seq dataset was chosen by selecting one of the thresholds where the scale free topology model fit starts to plateau on the model fit versus threshold curve. For the stranded RNA-Seq datasets, values above 20 were chosen as this produced visually distinct clusters and agrees with the suggested guidelines for threshold selection when model fit R² values do not reach above 0.8 [65]. The scale free topology model fit can be low when clustering time-series data [65], which the CLK inhibitor concentration curve data can be considered to be. A representative event (i.e. “eigenevent”) was calculated for each cluster, and events whose PSI profiles did not strongly correlate with the eigenevent (Pearson correlation coefficient < 0.75) were removed.

Clustering revealed several distinct response patterns common across multiple event types and both cell types (Figure 2.10, Figure 2.11, Figure 2.12, number of events per cluster shown in plots). This resulted in 28 distinct PSI profile clusters for the HCT116 unstranded RNA-Seq dataset, and 7 clusters for each of the two stranded RNA-Seq datasets. Similarities in clustered PSI response patterns can be observed between the HCT116 and hTERT cell types when considering the two stranded RNA-Seq datasets. Similar response patterns can also be observed in the HCT116 unstranded RNA-Seq dataset, although the PSI response patterns in this dataset will be somewhat different due to differences in the number of observations.
A summary of proposed similar clusters between the three datasets is included in Table 2.4. Common response patterns found in the three RNA-Seq datasets are likely genuine.

[Figure 2.10 panels, cluster (number of events): 1 (4674), 2 (2187), 3 (408), 4 (194), 5 (161), 6 (159), 7 (146), 8 (149), 9 (135), 10 (134), 11 (114), 12 (78), 13 (74), 14 (71), 15 (56), 16 (58), 17 (50), 18 (49), 19 (43), 20 (36), 21 (40), 22 (36), 23 (37), 24 (25), 25 (30), 26 (28), 27 (27), 28 (27); x-axis: Concentration (µM), 0.0 to 10.0; y-axis: Standardized event PSI]

Figure 2.10: AS event PSI clusters for the HCT116 unstranded RNA-Seq dataset. Black lines represent AS event PSI profiles. Red lines are cluster eigenevents. The number of events in each cluster is shown in parentheses in the cluster label.

[Figure 2.11 panels, cluster (number of events): 1 (2171), 2 (1501), 3 (913), 4 (545), 5 (297), 6 (163), 7 (27); x-axis: Concentration (µM), 0.0 to 5.0; y-axis: Standardized event PSI]

Figure 2.11: AS event PSI clusters for the HCT116 stranded RNA-Seq dataset. Black lines represent AS event PSI profiles. Red lines are cluster eigenevents. The number of events in each cluster is shown in parentheses in the cluster label.

[Figure 2.12 panels, cluster (number of events): 1 (2272), 2 (1415), 3 (1009), 4 (439), 5 (350), 6 (334), 7 (127); x-axis: Concentration (µM), 0.0 to 5.0; y-axis: Standardized event PSI]

Figure 2.12: AS event PSI clusters for the hTERT stranded RNA-Seq dataset. Black lines represent AS event PSI profiles. Red lines are cluster eigenevents. The number of events in each cluster is shown in parentheses in the cluster label.

Table 2.4: Proposed similar AS PSI response clusters between the three RNA-Seq datasets.
(unstr) and (str) indicate that an unstranded or stranded RNA-Seq protocol was used, respectively.

HCT116 RNA-Seq (unstr)   HCT116 RNA-Seq (str)   hTERT RNA-Seq (str)
1                        1                      2
1                        2                      1
3                        3                      3
2                        4                      5
7                        5                      6
12                       6                      4
25                       7                      7

As PSI clustering was performed in an AS event type unaware manner, each cluster may contain a variety of event types. Calculating cluster event type proportions revealed a variety of event type distributions between clusters (Figure 2.13). Each cluster was enriched for certain event types, compared with the distribution of all differentially spliced events chosen for clustering (Table 2.5, Table 2.6, Table 2.7). General distributional trends are most apparent when inspecting the event type distributions of the two stranded RNA-Seq datasets. Clusters enriched for SE events (1, 2, 5, and 7 for HCT116; 1, 2, and 6 for hTERT) are characterised by a decrease in PSI between untreated samples and samples treated with 0.50μM of T3 CLK inhibitor. After the 0.50μM concentration, these clusters may either increase or decrease in PSI. The remaining clusters (excluding hTERT cluster 7) are characterised by an increase in PSI between untreated samples and samples treated with 0.50μM of T3. These clusters have a lower proportion of SE events and are enriched for ALE, AFE, A5SS, A3SS, MXE, and RI events. These results agree with the previous observation that SE events tend to decrease in PSI in a T3 dose-dependent manner, while RI events tend to increase in PSI with T3 treatment. A likely cause of this pattern is loss of splice junction recognition efficacy in T3 treated cells.

[Figure 2.13 panels: a, b, c; legend (Event type): SE, RI, AFE, ALE, A3SS, A5SS, MXE; x-axis: Cluster; y-axis: Proportion of cluster]

Figure 2.13: AS event type proportions across AS PSI clusters. a, HCT116 unstranded RNA-Seq dataset. b, HCT116 stranded RNA-Seq dataset.
c, hTERT dataset.

Table 2.5: AS PSI cluster event type proportion enrichment for the HCT116 unstranded RNA-Seq dataset. Benjamini-Hochberg adjusted p-values from hypergeometric tests are shown if they are below 0.05.

Cluster A3SS A5SS AFE ALE MXE RI SE
1 0.02 1.98e-06 1.97e-14 5.06e-32 5.05e-75 1.71e-123
3 4.97e-07 0.0364 0.00107 7.25e-08 0.00203
4 6.11e-08 0.00582 3.05e-05
5 3.16e-18 1.99e-11 0.00871
6 0.0155 3.64e-05
7 0.000222
8 0.000339
9 0.0477 0.000271 0.0477
10 0.00530 0.00638 1.06e-11
11 0.00582 1.54e-05
12 0.00181
13 0.0211
14 0.0120
15 0.00201
16 0.0219
17 0.0130
18
19
20 0.00107 0.00871 0.0477
21
22 0.0133
23 0.00553
24 0.00740 0.00997
25
26 0.0106
27 0.00582
28 0.0125

Table 2.6: AS PSI cluster event type proportion enrichment for the HCT116 stranded RNA-Seq dataset. Benjamini-Hochberg adjusted p-values from hypergeometric tests are shown if they are below 0.05.

Cluster A3SS A5SS AFE ALE MXE RI SE
1 6.63e-72
2 6.04e-72
3 1.75e-19 1.49e-12 3.21e-35 3.96e-11 0.000305 5.30e-18
4 0.0128 3.88e-15 2.09e-13 5.66e-53
5 4.72e-10
6 8.32e-07 0.0340 0.0128 0.00153
7 0.000224

Table 2.7: AS PSI cluster event type proportion enrichment for the hTERT dataset. Benjamini-Hochberg adjusted p-values from hypergeometric tests are shown if they are below 0.05.

Cluster A3SS A5SS AFE ALE MXE RI SE
1 7.88e-136
2 8.42e-54
3 8.76e-23 1.71e-12 3.64e-21 5.70e-18 0.0130 2.20e-39
4 0.00170 2.23e-14 1.32e-09 0.00454 0.000766
5 0.0407 5.00e-15 2.70e-05 1.00e-38
6 1.75e-07
7 0.000378

Analysis of PSI change patterns revealed that AFE and possibly ALE events tend to increase with T3 treatment, although PSI did not always clearly change as a function of T3 dose (Section 2.2.4). However, the event type unaware PSI profile clustering shows that both AFE and ALE events are more prevalent in clusters that increase in PSI in a dose-dependent manner.
AFE and ALE events are also present in clusters of events that decrease in PSI, which might explain the lack of a clear dose response in the event type PSI change analysis (Section 2.2.4). PSI increases for AFE and ALE events indicate that an isoform beginning closer to the gene centre is being chosen more often.

2.2.6 ESE density is predictive of splicing response to CLK inhibition

AS events of a particular type with contrasting responses (i.e. increasing vs. decreasing with treatment) may contain differences in splicing signals within overlapping and nearby RNA sequences. One common class of splicing signal is the exonic splicing enhancer (ESE). ESEs are recognized by SR proteins, usually to promote recruitment of the spliceosome to exon junctions. The ESE sequence motifs are degenerate and common in exonic sequences, especially near exon junctions [58]. Exons with a higher density of ESE motifs would present more opportunities for SR proteins to bind to the RNA substrate, promoting inclusion of the corresponding exon.

To test whether ESE density can explain some difference in splicing response, SE and RI events from clusters 1 (SE: 4208 events, RI: 145 events) and 2 (SE: 241 events, RI: 584 events) from the HCT116 unstranded RNA-Seq dataset AS event clustering (Figure 2.10) were selected. These events increase (cluster 2) or decrease (cluster 1) in PSI with T3 treatment. Alternatively included regions in each group of events were queried for the presence of SRSF1, SRSF2, SRSF5, and SRSF6 binding motifs obtained from ESEfinder [58]. The ESE motif search was performed in a probabilistic manner, correcting for background nucleotide rates. The density of each binding motif was calculated for each sequence. SRSF1, SRSF2, and SRSF5 motif density was significantly higher (one-tailed t-test, Table 2.8) in PSI-increasing vs. PSI-decreasing skipped exons (Figure 2.14).
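One common way to perform a background-corrected motif search of the kind described above is to score each window of a sequence with a position weight matrix as a log-odds ratio against background nucleotide frequencies, and to count windows that reach a threshold. The sketch below illustrates that general approach only. The matrix, background frequencies and threshold are toy values chosen for illustration, not ESEfinder parameters, and defining density as hits per nucleotide is an assumption, since the text does not give the exact definition.

```python
import math


def motif_density(sequence, pwm, background, threshold):
    """Number of motif hits per nucleotide of sequence.

    pwm: list of dicts, one per motif position, mapping base -> probability.
    background: dict mapping base -> background frequency.
    threshold: minimum log2-odds score for a window to count as a hit.
    """
    width = len(pwm)
    if not sequence:
        return 0.0
    hits = 0
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Skip windows containing bases outside the matrix alphabet (e.g. N).
        if any(base not in background for base in window):
            continue
        score = sum(
            math.log2(pwm[position][base] / background[base])
            for position, base in enumerate(window)
        )
        if score >= threshold:
            hits += 1
    return hits / len(sequence)


# Hypothetical two-position matrix favouring "AG"; not a real SR protein motif.
TOY_PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
UNIFORM_BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
toy_density = motif_density("AGAG", TOY_PWM, UNIFORM_BACKGROUND, threshold=2.0)
```

Densities computed this way for two groups of sequences could then be compared with a one-tailed two-sample t-test, as reported in Table 2.8.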
For retained introns, SRSF1, SRSF2, and SRSF6 binding motif density was significantly higher (one-tailed t-test, Table 2.8) in PSI-increasing events (Figure 2.15).

[Figure 2.14 panels: SRSF1, SRSF2, SRSF5, SRSF6; x-axis: Cluster; y-axis: Motif density]

Figure 2.14: ESE density boxplots for skipped exons increasing (cluster 2) or decreasing (cluster 1) in PSI with T3 treatment. ESEs tested include binding motifs for SRSF1, SRSF2, SRSF5, and SRSF6. Cluster 1: 4208 events, cluster 2: 241 events. Notches extend ±1.58 × IQR/√n, where IQR is the inter-quartile range.

[Figure 2.15 panels: SRSF1, SRSF2, SRSF5, SRSF6; x-axis: Cluster; y-axis: Motif density]

Figure 2.15: ESE density boxplots for retained introns increasing (cluster 2) or decreasing (cluster 1) in PSI with T3 treatment. ESEs tested include binding motifs for SRSF1, SRSF2, SRSF5, and SRSF6. Cluster 1: 145 events, cluster 2: 584 events. Notches extend ±1.58 × IQR/√n, where IQR is the inter-quartile range.

Table 2.8: ESE motif density comparisons for SE and RI events in PSI clusters 1 and 2. One-tailed t-test p-values are shown. The alternative hypothesis is that the indicated binding motif density is greater in AS events that increase in PSI upon T3 treatment. NS indicates that the null hypothesis was not rejected at a significance level of 0.05.

AS event type   SRSF1      SRSF2     SRSF5    SRSF6
SE              1.01e-06   0.00161   0.0287   NS
RI              0.00379    0.0379    NS       0.0013

The observation that ESE density correlates with splicing response demonstrates that the number of SR protein binding motifs is an important indicator of whether an alternatively included region of RNA will be present in the final transcript. RI and SE events appear to rely on different sets of SR proteins for their inclusion.
Both SRSF1 and SRSF2 were predictive of splicing response in SE and RI events; however, SRSF5 was only predictive of response in SE events, and likewise SRSF6 for RI events.

2.3 CLK Inhibition Promotes Conjoined Gene Transcription in a Dose Dependent Manner

Inspection of splicing patterns using the Integrative Genomics Viewer [66] revealed cases of splicing between consecutive genes located on the same genomic strand in treated RNA-Seq libraries (Figure 2.16). Conjoined genes (CGs) have been previously reported in the literature, and are believed to arise from transcriptional read-through from the upstream to the downstream partner gene [67]. This hypothesis is supported by a common pattern: the second-to-last exon of the upstream gene being spliced to the second exon of the downstream gene. Skipping of the last and first exons of CG partner genes may be due to a lack of splicing signals at what would normally be a polyadenylation site or transcription start site, respectively. Additionally, the existence of intergenic exons in some CG transcripts strongly points to transcriptional read-through as the underlying mechanism for CG formation. Both of these patterns are present in the CGs detected in T3-treated samples.

[Figure 2.16 tracks (read coverage range): 10.0μM [0 - 792]; 5.0μM [0 - 480]; 1.0μM [0 - 452]; 0.5μM [0 - 591]; 0μM [0 - 554]; coordinate axis: 118,488,991 to 118,506,694; annotated genes: VSIG10, WSB2]

Figure 2.16: IGV-generated plot of splicing in the VSIG10-WSB2 conjoined gene. Plots for T3 treatment concentrations of 0.0, 0.5, 1.0, 5.0, and 10.0μM are shown from top to bottom. The control sample plot is coloured grey, and the treated sample plots are coloured according to T3 concentration. RefSeq gene annotations are shown in blue at the bottom of the plot along with chromosome 12 coordinates. For each sample, the y-axis represents read coverage, and the value range is indicated between brackets.
Arcs connecting exons represent reads spliced across introns, with the number of spliced reads annotated over the line. Only arcs representing at least 3 reads are shown.

Although CLKs are known to play an important role in RNA splicing, the manner in which they might regulate 3′-end cleavage is unknown. Characterisation of CGs produced as a result of CLK inhibition is a first step towards understanding the genesis of these transcripts. Systematic analysis may provide insight into the regulation of 3′-end processing and reveal a novel role of CLK phosphorylation.

2.3.1 T3 treatment increases conjoined gene loci detection in a dose-dependent manner

A genome-wide search for further occurrences of conjoined transcripts was performed using the deFuse gene fusion detection method [55]. The deFuse classifier was modified by removing two features to increase CG detection sensitivity:
• est_breakseqs_percident
• breakseqs_estislands_percident

Conjoined genes are required to have both participating genes located on the same strand of the same chromosome. Detected CG events were filtered to have the following attributes:
• deletion = ‘Y’
• expression ≥ 50 reads for both genes
• splice_score = 4 OR exonboundaries = ‘Y’
• probability ≥ 0.9

These filters were chosen to produce a set of conjoined gene event calls that are likely due to splicing as opposed to genomic aberrations, and occur with a high probability. While it is likely that some real conjoined gene events have been missed due to stringent filter thresholds, this is acceptable as the focus of downstream analysis is on the characterisation of a set of true events rather than identifying all possible events.

Analysis of the RNA-Seq libraries in each T3-treated dataset revealed a common pattern of T3 dose-dependent detection of CG events (Figure 2.17). The HCT116 unstranded RNA-Seq dataset demonstrates a pattern similar to some AS events (e.g. SE) where the number of affected events increases dramatically at the 0.50μM T3 concentration.
This pattern was not observed in the stranded RNA-Seq datasets due to the lack of measurements at T3 concentrations lower than 0.5μM. Nevertheless, the stranded RNA-Seq datasets do not contradict the results from the unstranded HCT116 dataset as they still reveal an increase in conjoined gene events at 0.5μM, with a milder dose effect. The similarity with AS events in dose-dependent response, especially the increase in event detection at 0.5μM, suggests that the production of CGs due to CLK inhibition is a primary effect of the treatment itself, rather than a secondary effect induced by disruption of the transcriptomic landscape.

[Figure 2.17 legend (Dataset): HCT116-unstranded, HCT116-stranded, hTERT-stranded; x-axis: Concentration (µM); y-axis: Conjoined gene event counts]

Figure 2.17: Conjoined gene counts per RNA-Seq library as detected by a modified deFuse classifier.

A substantial difference exists in the number of detected CGs between the unstranded and stranded RNA-Seq datasets (HCT116 unstranded RNA-Seq: 586, HCT116 stranded RNA-Seq: 215, hTERT: 154 unique events for 0.0μM, 0.5μM, 1.0μM, and 5.0μM; 2.7 fold increase in unstranded vs. stranded HCT116 RNA-Seq). This pattern was also observed in the number of differentially spliced AS events, and may be due to differences in the amount of splicing information in the RNA-Seq libraries from the stranded and unstranded RNA-Seq datasets (see Section 2.5).

Conjoined genes were also detected in RNA-Seq libraries generated from HCT116 cells transfected with CLK siRNA. 33 CGs (upstream, downstream gene pairs) were detected in the siRNA dataset after subtraction of CGs found in the control libraries. 25 of these CGs were also found in the CG lists generated from the T3-treated sample libraries.
Therefore, increased CG transcription can be explained by loss of CLK activity (as opposed to a T3 off-target effect), for at least some loci.

2.3.2 T3 treatment increases conjoined gene PSI in a dose-dependent manner

Increased detection of CG events upon T3 treatment implies an increase in CG transcription rate. If a constant fraction of transcripts from the upstream partner gene read through to the downstream partner, then increased CG transcription may indicate increased expression of the upstream partner gene. Alternatively, CLK inhibition may increase the proportion of transcripts escaping 3′-end cleavage.

To investigate the effect of CLK inhibition on CG production rate, CG isoform annotations were generated and input into MISO. In cases where the second-to-last exon of the upstream CG partner gene is spliced to the second exon of the downstream partner, the isoform annotations can be constructed by using the last two exons of the upstream parent as the exclusion (wildtype) isoform, and the second-to-last exon of the upstream parent and the second exon of the downstream parent as the inclusion (CG) isoform. Any intergenic exons detected in CG transcripts are included in the inclusion isoform annotations. When splicing occurs from the last exon of the upstream gene, annotations are generated where the terminal exon of the upstream parent is the exclusion isoform, and the same exon plus the appropriate exon of the downstream parent is the inclusion isoform. This class of annotations is similar to tandem UTR AS events in the MISO annotations. The generated CG isoform annotations were used by MISO to calculate PSI values for each CG event.

CGs were called as “differentially spliced” if MISO reported a Bayes factor ≥ 20 and a PSI difference ≥ 0.1 between treated and untreated samples. 603, 194, and 185 CGs were differentially spliced in the HCT116 unstranded RNA-Seq, HCT116 stranded RNA-Seq, and hTERT datasets, respectively.
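The calling rule stated above can be written as a simple predicate. The sketch below is illustrative only: it assumes the Bayes factor and PSI estimates have already been obtained from MISO output, it treats the PSI difference as an absolute difference (the text does not state this explicitly), and the function names are hypothetical.

```python
def is_differentially_spliced(bayes_factor, psi_treated, psi_control,
                              min_bayes_factor=20.0, min_psi_difference=0.1):
    """Apply the Bayes factor and PSI difference thresholds to one comparison.

    Assumption: the PSI difference is taken as an absolute value, so events
    that increase or decrease in PSI are both eligible.
    """
    psi_difference = abs(psi_treated - psi_control)
    return bayes_factor >= min_bayes_factor and psi_difference >= min_psi_difference


def called_in_any_sample(comparisons):
    """True if any treated-vs-control comparison for an event passes the thresholds.

    comparisons: iterable of (bayes_factor, psi_treated, psi_control) tuples,
    one per treatment concentration.
    """
    return any(is_differentially_spliced(*comparison) for comparison in comparisons)
```

Calling an event when any single concentration passes mirrors the selection of events "differentially spliced in any of the treated samples" used for clustering in Section 2.2.5; whether the same convention was applied to CGs is an assumption here.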
PSI value differences across all T3 concentrations were collected for each differentially spliced CG, and CGs with missing PSI estimations were removed. PSI value distributions were then compared across each treatment concentration (Figure 2.18, Figure 2.19, Figure 2.20). Both the HCT116 and hTERT datasets show a dose-dependent increase in CG PSI (HCT116 unstranded RNA-Seq medians: 0.01, 0.07, and 0.14, for 0.10μM, 0.50μM, and 1.0μM; hTERT medians: 0.1, 0.16, and 0.23, for 0.50μM, 1.0μM, and 5.0μM). In the HCT116 unstranded RNA-Seq dataset, a clear increase (7 fold greater median PSI than 0.10μM) in PSI value changes can be seen at the 0.5μM concentration. CLK inhibition clearly increases the proportion of CG to wild-type transcripts in a dose-dependent manner. CG PSI changes were then compared to the expression of non-conjoined upstream transcripts, and upstream gene non-conjoined transcription was found to decrease with increased CG PSI (Figure 2.21, Figure 2.22, Figure 2.23). This pattern demonstrates that CGs “steal” transcription from the upstream CG participant.

[Figure 2.18 axes: x-axis: Concentration (µM); y-axis: PSI change]

Figure 2.18: CG PSI change boxplots per T3 treatment for the HCT116 unstranded RNA-Seq dataset. N = 603. Notches extend ±1.58 × IQR/√n, where IQR is the inter-quartile range.

[Figure 2.19 axes: x-axis: Concentration (µM); y-axis: PSI change]

Figure 2.19: CG PSI change boxplots per T3 treatment for the HCT116 stranded RNA-Seq dataset. N = 194. Notches extend ±1.58 × IQR/√n, where IQR is the inter-quartile range.

[Figure 2.20 axes: x-axis: Concentration (µM); y-axis: PSI change]

Figure 2.20: CG PSI change boxplots per T3 treatment for the hTERT stranded RNA-Seq dataset. N = 185.
Notches extend ±1.58 × IQR/√n, where IQR is the inter-quartile range.

[Figure 2.21 panels: 0.05μM, 0.10μM, 0.50μM, 1.0μM, 5.0μM, 10.0μM; x-axis: PSI change; y-axis: RPM ratio + 1]

Figure 2.21: Non-conjoined upstream transcript expression ratio vs CG PSI change in the HCT116 unstranded RNA-Seq dataset. Upstream non-conjoined transcript expression is reads per million (RPM) mapped reads supporting the non-conjoined isoform from the CG MISO analysis. RPM ratio is the RPM of the upstream gene in the treated sample divided by the RPM in the control sample. PSI change is the difference in PSI from the control sample to the treated sample for the CG event. Negative regression line slope indicates decrease in non-conjoined transcription with CG PSI increase.

[Figure 2.22 panels: 0.50μM, 1.0μM, 5.0μM; x-axis: PSI change; y-axis: RPM ratio + 1]

Figure 2.22: Non-conjoined upstream transcript expression ratio vs CG PSI change in the HCT116 stranded RNA-Seq dataset. Upstream non-conjoined transcript expression is reads per million (RPM) mapped reads supporting the non-conjoined isoform from the CG MISO analysis. RPM ratio is the RPM of the upstream gene in the treated sample divided by the RPM in the control sample. PSI change is the difference in PSI from the control sample to the treated sample for the CG isoform. Negative regression line slope indicates decrease in non-conjoined transcription with CG PSI increase.

[Figure 2.23 panels: 0.50μM, 1.0μM, 5.0μM; x-axis: PSI change; y-axis: RPM ratio + 1]

Figure 2.23: Non-conjoined upstream transcript expression ratio vs CG PSI change in the hTERT dataset. Upstream non-conjoined transcript expression is reads per million (RPM) mapped reads supporting the non-conjoined isoform from the CG MISO analysis. RPM ratio is the RPM of the upstream gene in the treated sample divided by the RPM in the control sample. PSI change is the difference in PSI from the control sample to the treated sample for the CG isoform.
Negative regression line slope indicates decrease in non-conjoined transcription with CG PSI increase.

2.3.3 Similar conjoined genes are sensitive to CLK inhibition in HCT116 and hTERT cells

RNA 3′-end processing is differentially regulated according to cell type and tumour/normal status, similar to RNA splicing [36]. Despite differences in RNA processing regulation, CLK inhibition increases CG transcription in both malignant HCT116 and normal hTERT cells. Nevertheless, there may be differences in the set of CG loci between cell types. The degree of overlap between CGs will reflect how strongly the vulnerability of genes to skipping 3′-end cleavage depends on biological context.

Overlapping CGs were identified by generating a unique list of upstream-downstream CG partner pairs for each dataset. These lists ignore variation in donor and acceptor splice sites from the same CG partners, as these differences can be considered to arise from different isoforms (or “events”) of the same CG. These lists were used to determine the set of conjoined genes common between cell types and exclusive to each cell type (Figure 2.24). Of the 117 hTERT conjoined gene calls, 15 (12.8%) were not present in the HCT116 conjoined gene lists. Only 9 of 161 (5.6%) calls from the HCT116 stranded RNA-Seq dataset were not present in the other conjoined gene lists. Overall, the majority of CGs in both stranded RNA-Seq datasets overlap with those in another dataset. However, 403 of 589 (68.4%) CGs called in the unstranded HCT116 RNA-Seq dataset were exclusive to that dataset.

Figure 2.24: Venn diagram of conjoined genes detected in the two HCT116 and one hTERT RNA-Seq datasets.

The stranded RNA-Seq datasets both revealed many fewer CGs than the unstranded HCT116 RNA-Seq dataset. This is likely due, in part, to a greater number of treatment concentrations in the unstranded RNA-Seq dataset, including the highest tested concentration (10.0μM).
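
As a rough illustration of the partner-pair comparison described in this section, the following Python sketch collapses CG events to unique upstream-downstream pairs and reports the pairs exclusive to one dataset. The `CGEvent` record, the function names, and the gene identifiers are hypothetical and do not reflect the deFuse output format or the code used for this analysis; only the counts passed to `percent` in the accompanying checks come from the text.

```python
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class CGEvent(NamedTuple):
    """Hypothetical record for one conjoined gene (CG) event."""
    upstream: str
    downstream: str
    donor: int      # donor splice site position (ignored when pairing)
    acceptor: int   # acceptor splice site position (ignored when pairing)


Pair = Tuple[str, str]


def partner_pairs(events: Iterable[CGEvent]) -> Set[Pair]:
    """Collapse events to unique upstream-downstream partner pairs."""
    return {(e.upstream, e.downstream) for e in events}


def exclusive_pairs(name: str, pairs: Dict[str, Set[Pair]]) -> Set[Pair]:
    """Pairs found in dataset `name` and in no other dataset."""
    others: Set[Pair] = set()
    for other_name, other_pairs in pairs.items():
        if other_name != name:
            others |= other_pairs
    return pairs[name] - others


def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100.0 * part / whole, 1)


# Toy input with made-up gene names; two events share the same partners
# and therefore count as a single CG.
datasets = {
    "HCT116_unstranded": partner_pairs([
        CGEvent("GENE_A", "GENE_B", 100, 900),
        CGEvent("GENE_A", "GENE_B", 120, 900),
        CGEvent("GENE_C", "GENE_D", 50, 400),
    ]),
    "hTERT": partner_pairs([
        CGEvent("GENE_A", "GENE_B", 100, 900),
        CGEvent("GENE_E", "GENE_F", 10, 300),
    ]),
}
hTERT_only = exclusive_pairs("hTERT", datasets)
```

Keying each CG on the partner pair alone is what lets alternative donor and acceptor sites of the same partners count once, which matches the treatment of isoforms described above.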
The majority of CGs in both of the stranded RNA-Seq datasets were detected in the unstranded RNA-Seq dataset. Also, large proportions of the HCT116 (55.3%) and hTERT (33.3%) stranded RNA-Seq datasets were detected in the unstranded RNA-Seq dataset, but not the other stranded RNA-Seq dataset. The unstranded RNA-Seq protocol may also be more sensitive to the detection of spliced sequences (see Section 2.5).

Fifteen conjoined genes were only present in the high-confidence hTERT calls. The stringent filtering process for CG events may have removed the CG predictions found in the hTERT cells from the HCT116 predictions. Alternatively, those particular CGs may not have been sampled in the HCT116 RNA-Seq libraries, which could occur if expression is low.

CGs were also detected in HCT116 cells using cDNA sequences generated with the Pacific Biosciences (PacBio) SMRT sequencing technology [9] (Table 2.1). The PacBio technology provides the ability to sequence up to several thousand nucleotides, allowing the capture of entire transcript sequences in many cases. As deFuse was designed to use paired-end RNA-Seq reads, an alternative method for CG detection was necessary for the PacBio data. Conjoined transcripts were detected by selecting reads that mapped across two different genes located on the same chromosome strand. For a PacBio read to be considered as “mapped” to a gene for the purposes of CG detection, at least three exon junctions within a read must match exon junctions belonging to a single gene in the Gencode level 1 and 2 transcript annotations. Cases where one gene is encapsulated within another gene (e.g. a miRNA located within the intron of another gene) are not considered conjoined genes. The result is an inclusive list of candidate CGs that can be compared to the CGs detected in the RNA-Seq datasets.

The PacBio CGs were compared to those detected in the RNA-Seq datasets and overlapping CGs were counted in an identical manner to the RNA-Seq dataset comparison (Figure 2.25).
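
The PacBio selection rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used for the analysis: the `Gene` record, the junction representation, the coordinate-based test for encapsulation, and the ordering of partners by transcription direction are all hypothetical choices made for the example.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, NamedTuple, Set, Tuple

# One exon junction as (chromosome, intron start, intron end); assumed format.
Junction = Tuple[str, int, int]


class Gene(NamedTuple):
    """Hypothetical annotation record for one gene."""
    chrom: str
    strand: str                     # "+" or "-"
    start: int
    end: int
    junctions: FrozenSet[Junction]  # annotated exon junctions of the gene


def mapped_genes(read_junctions: Iterable[Junction],
                 genes: Dict[str, Gene],
                 min_matches: int = 3) -> List[str]:
    """Genes sharing at least `min_matches` exon junctions with the read."""
    read_set = set(read_junctions)
    return [name for name, gene in genes.items()
            if len(read_set & gene.junctions) >= min_matches]


def encapsulated(a: Gene, b: Gene) -> bool:
    """True when one gene lies entirely within the span of the other."""
    return ((a.start >= b.start and a.end <= b.end)
            or (b.start >= a.start and b.end <= a.end))


def candidate_cgs(read_junctions: Iterable[Junction],
                  genes: Dict[str, Gene]) -> Set[Tuple[str, str]]:
    """Candidate (upstream, downstream) pairs supported by a single read."""
    pairs: Set[Tuple[str, str]] = set()
    for n1, n2 in combinations(mapped_genes(read_junctions, genes), 2):
        g1, g2 = genes[n1], genes[n2]
        if (g1.chrom, g1.strand) != (g2.chrom, g2.strand):
            continue  # partners must share a chromosome strand
        if encapsulated(g1, g2):
            continue  # nested genes are not treated as conjoined
        first, second = sorted((n1, n2), key=lambda n: genes[n].start)
        if g1.strand == "-":
            first, second = second, first  # upstream is rightmost on "-"
        pairs.add((first, second))
    return pairs


# Toy annotation with made-up genes and coordinates.
toy_genes = {
    "GENE_A": Gene("chr1", "+", 100, 1000, frozenset(
        {("chr1", 200, 300), ("chr1", 400, 500), ("chr1", 600, 700)})),
    "GENE_B": Gene("chr1", "+", 2000, 3000, frozenset(
        {("chr1", 2100, 2200), ("chr1", 2300, 2400), ("chr1", 2500, 2600)})),
}
full_read = sorted(toy_genes["GENE_A"].junctions | toy_genes["GENE_B"].junctions)
partial_read = full_read[:5]  # only two junctions from GENE_B
```

In this toy case the read spanning all six junctions yields the pair (GENE_A, GENE_B), while the read with only two GENE_B junctions yields nothing, because the three-junction requirement must be met for each gene separately.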
Of the 647 CGs detected in the PacBio dataset, 173 (26.7%) overlapped with those found in the RNA-Seq datasets. The PacBio-only CGs may be due, in part, to increased numbers of false positives: the detection method for the PacBio dataset was designed to favour sensitivity over specificity. However, the lower number of reads in the PacBio dataset (mean: 1,919,728) compared with the RNA-Seq datasets (e.g. HCT116 unstranded RNA-Seq mean: 167,167,942; approx. 87 times more than PacBio) means that the PacBio dataset may have sampled fewer conjoined gene transcripts. This may partially explain the lower overlap of the PacBio CGs with the RNA-Seq CGs.

Figure 2.25: Venn diagram of conjoined genes detected in the RNA-Seq and PacBio datasets.

Only 1 of the 15 hTERT-specific CGs from the RNA-Seq dataset comparison was detected in the PacBio data. These hTERT-specific CGs may indicate a differential 3′-end processing response to CLK inhibition. They may also be explained by cell-type specific gene expression profiles. Specifically, the hTERT-specific CGs may not be detected in the HCT116 samples merely due to low expression of the parent genes in HCT116 cells. To investigate this, FPKM values for genes involved in hTERT-specific CGs were calculated using Cufflinks for each of the HCT116 and hTERT datasets. The FPKM distributions of hTERT-specific CG partner genes reveal a pattern of higher expression in hTERT samples (Figure A.4, Figure A.5). Therefore, the presence of hTERT-specific CGs may be at least partially explained by reduced expression of participating genes in HCT116 cells.

2.3.4 Conjoined gene events are validated in both HCT116 and hTERT using targeted sequencing

While the PacBio dataset adds support for the presence of CGs detected in the RNA-Seq dataset, the low throughput and resulting lower sensitivity of the PacBio platform compared to RNA-Seq mean that another validation method is necessary to properly estimate the proportion of true CG events.
A set of 52 conjoined gene events (i.e. CG isoforms) was selected for targeted sequencing. The list of CGs includes events found in both HCT116 and hTERT cells, and events found only in the CG lists of one cell type. The final list of sequencing amplicons also includes regions of constitutive exons from three housekeeping genes. Housekeeping gene exon expression was used to normalize expression of each CG.

Targeted sequencing of the CG and housekeeping gene amplicons was performed on three datasets. Samples from the two HCT116 concentration curve experiments sequenced with unstranded and stranded RNA-Seq were used as two HCT116 replicate datasets. The hTERT concentration curve experiment samples were also used for CG targeted sequencing.

The validation sequencing libraries were analyzed for conjoined genes with deFuse. Detected CGs were compared to the set of CGs selected for validation. Of the 52 CG events, 37 (71.2%) were validated with this method. Interestingly, 5 events not selected for validation were detected in the validation dataset. Upon inspection, 4 appear to be alternative isoforms of other CGs selected for validation; the other is similar to another validation input event except that it involves a more distant paralog of the upstream gene. This CG event is likely due to reads misaligned to the paralog gene. Considering CG parent genes only, and ignoring specific splice sites, 40 (76.9%) of the selected CGs were detected in the validation dataset.

Since the CGs chosen for validation include those found in only HCT116 or hTERT cells according to the deFuse analysis, the CGs found in the validation dataset may support the existence of cell-type specific CG events. However, of the 42 detected events, only 1 was found in a single cell type (HCT116). One event was detected in a single HCT116 dataset and the hTERT dataset.
The vast majority (40) of the detected events were present in all three (2 HCT116, 1 hTERT) datasets.

Many of the CGs detected in the validation dataset (22 of 42, or 52.4%) were also detected in untreated samples. To verify the effect of CLK inhibition on CG formation, CG event expression was compared across T3 treatment concentrations (Figure A.6, Figure A.7, Figure A.8). CG expression distributions show a dose-dependent increase in both HCT116 and hTERT cells.

2.3.5 Upstream partners of conjoined genes are involved in RNA metabolism and cell-cycle regulation

CG regulation may be focussed on either the upstream or downstream gene partners. For example, the downstream partners may use the promoter of the upstream gene to increase expression; alternatively, CGs may form to add the downstream gene’s functionality to the upstream gene. To investigate the possibility that upstream and downstream CG partners are involved in similar biological processes, the upstream and downstream partners were used to create two gene interaction networks using the ReactomeFI Cytoscape plugin [56]. The interaction network genes were checked for enriched GO biological process gene sets with false discovery rate controlled at 0.05. Enriched biological processes in the upstream and downstream CG partners were then used to generate an enrichment map [59] (Figure 2.26, Figure 2.27, Figure 2.28).

[Figure: enrichment map; labelled clusters: RNA splicing, cell cycle, RNA metabolism, translation, protein catabolism]

Figure 2.26: Enrichment map for genes involved in CGs in the HCT116 unstranded RNA-Seq dataset. Each node represents a GO biological process gene set. Biological processes enriched in CG upstream partners have red cores, while biological processes enriched in downstream partners have red outer rings. Edge thickness indicates the level of CG partner overlap between gene sets.

[Figure: enrichment map; labelled clusters: cell cycle, translation, RNA metabolism, RNA splicing, protein catabolism]

Figure 2.27: Enrichment map for genes involved in CGs in the HCT116 stranded RNA-Seq dataset.
Each node represents a GO biological process gene set. Biological processes enriched in CG upstream partners have red cores, while biological processes enriched in downstream partners have red outer rings. Edge thickness indicates the level of CG partner overlap between gene sets.

[Figure: enrichment map; labelled clusters: cell cycle, protein catabolism, RNA metabolism and splicing, translation]

Figure 2.28: Enrichment map for genes involved in CGs in the hTERT dataset. Each node represents a GO biological process gene set. Biological processes enriched in CG upstream partners have red cores, while biological processes enriched in downstream partners have red outer rings. Edge thickness indicates the level of CG partner overlap between gene sets.

Analysis of the upstream gene partners produced a greater number of significantly enriched biological processes compared to the downstream partners. This suggests that the upstream gene partners are more related to each other, and that CG regulation is more focussed on the role of the upstream partners within the cell. Upstream CG partners are involved in RNA splicing, the cell cycle, protein catabolism, and translation. Genes associated with 3′-end processing, including A2AF1, CSTF1, and NUDT21 (a component of CFIm), were found to participate in CG transcription. Formation of CGs involving 3′-end processing factors may disrupt normal 3′-end cleavage, in turn promoting CG transcription at other loci.

Similar biological processes were affected by CG transcription (Figure 2.26, Figure 2.27, Figure 2.28) and differential splicing (Figure 2.3, Figure 2.4, Figure 2.5, Table 2.3), which may suggest that CLKs can regulate this common set of biological functions through different RNA processing mechanisms. Formation of CGs might comprise one aspect of cellular response to CLK inhibition.
For example, CG transcription may be a mechanism for upstream gene expression control if the CG transcript is targeted for degradation by the nonsense-mediated decay pathway [67].

2.3.6 Upstream conjoined gene partners may rely on auxiliary 3′-end processing factors

Transcriptional read-through of upstream CG partners into downstream genes may be regulated by components of the 3′-end processing machinery. While a gene may have multiple alternative cleavage and polyadenylation sites, CG formation requires the skipping of all possible sites in the upstream gene. Yet, final poly(A) sites generally contain a strong, canonical poly(A) signal [36]. Terminal poly(A)/3′ cleavage sites of upstream CG partners may contain common cis-regulatory signal patterns that are sensitive to RS domain phosphorylation status. These genes would then be susceptible to transcriptional read-through upon CLK inhibition.

Regulatory signals associated with CG formation were investigated by identifying the annotated locations of terminal poly(A) sites in the genome. The proportion of upstream CG partners with canonical poly(A) signals at their terminal poly(A) site was similar to the proportion for all genes. Therefore, the absence of a canonical poly(A) signal alone does not appear to be associated with CG generation. The regions around the terminal poly(A) sites were examined for the presence of a canonical A(A/U)UAAA poly(A) signal, an upstream UGUA signal, and a U/GU-rich downstream element (DSE). For the purposes of this analysis, a DSE is defined as a sequence of at least six nucleotides, composed of uracils and interspersed with up to three non-sequential guanines.

Polyadenylation sites without canonical poly(A) signals are known to rely on auxiliary 3′-end processing factors for poly(A) site selection [36].
So, genes were partitioned into two groups based on whether or not their terminal poly(A) site contained a nearby canonical poly(A) signal, as detected through this analysis. Upstream CG partners in the group lacking canonical poly(A) signals had a higher proportion of detected UGUA signals (chi-squared p-value < 0.01) and DSEs (chi-squared p-value < 0.05) compared to all genes lacking a canonical poly(A) signal. This pattern was not found in a similar comparison with the group containing nearby canonical poly(A) signals.

Upstream CG gene partners lacking canonical poly(A) signals seem to rely on CFIm binding to UGUA sites and CstF binding to U/GU-rich DSEs more often than typical genes. Proper 3′ cleavage of these genes may be especially sensitive to regulation of CFIm and CstF. The heavier reliance on CFIm binding in particular is interesting, because SR proteins are known to interact with CFIm, potentially by assisting in the recruitment of CFIm to the RNA substrate [25]. Furthermore, phosphorylation of CFIm is necessary for the 3′ cleavage reaction to occur [26]. CLKs may regulate RS domain mediated SR protein-CFIm interactions, or may even phosphorylate the RS-like domain of CFIm itself. This may partially explain the sensitivity of these CG loci to CLK inhibition. For genes with canonical poly(A) signals at terminal polyadenylation sites, CG formation propensity may be determined by regulation of core components of the 3′-end processing machinery.

2.4 CLK Inhibition Results in the Down Regulation of Splicing Factors and Cell Cycle Regulators

CLK inhibition causes widespread structural changes in the transcriptome. Any gene expression changes could be due to changes in transcriptome composition, or a direct response to the presence of the T3 compound. Cufflinks [51] was used to quantify transcript abundances in the three T3-treated RNA-Seq datasets, which produced FPKM values for each gene.
Genes selected for further analysis were required to have FPKM values >= 1 in at least 4 libraries for the unstranded HCT116 RNA-Seq dataset, and 3 libraries for the two stranded RNA-Seq datasets. This filtering was performed to remove unexpressed genes that have a low FPKM value due to the presence of misaligned reads. Each gene must also have an FPKM fold change >= 2 for at least one treated library when compared with the untreated control library. The resulting list represents candidate differentially expressed genes.

Determining whether a gene is differentially expressed in a statistically meaningful manner without biological replicates is challenging. However, by measuring RNA at a variety of CLK inhibitor concentrations, genes with expression profiles following clear trends across the concentration gradient can be identified as likely to be differentially expressed. Gene expression trends were discovered by clustering gene expression profiles using the WGCNA [64] clustering method. WGCNA was run with networkType=“signed”, minModuleSize=25, and power=28 for the unstranded HCT116 RNA-Seq dataset, power=27 for the stranded HCT116 RNA-Seq dataset, and power=30 for the hTERT dataset. This resulted in 6 clusters for the unstranded HCT116 RNA-Seq dataset, and 5 clusters each for the stranded HCT116 RNA-Seq and hTERT datasets. For each cluster, a representative gene expression profile (an “eigengene”) was calculated, and genes whose expression profiles had a correlation with the eigengene expression profile of less than 0.75 were removed.

All three datasets exhibit similar FPKM profile clusters (Figure 2.29, Figure 2.30, Figure 2.31). Both stranded RNA-Seq datasets include fewer treatment libraries and so the expression profiles will appear somewhat different. Down-regulated genes greatly outnumbered up-regulated genes. The splicing and transcriptional machineries are linked and splicing disruption may have caused a negative effect on gene expression.
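
The candidate-gene filter described earlier in this section can be sketched in Python as below. The function names and toy FPKM profiles are hypothetical. Two details are assumptions, since the text does not state them: the fold change is taken symmetrically (a 2-fold increase or decrease both qualify), and a gene with zero FPKM in exactly one of the two compared libraries is treated as having an infinite fold change.

```python
from typing import Dict, List, Sequence


def fold_change(treated: float, control: float) -> float:
    """Symmetric fold change (always >= 1) between two FPKM values."""
    low, high = sorted((treated, control))
    if high == 0.0:
        return 1.0            # both values are zero: no change
    if low == 0.0:
        return float("inf")   # assumption: expressed in only one library
    return high / low


def is_candidate(fpkm: Sequence[float],
                 min_libraries: int,
                 control_index: int = 0,
                 min_fpkm: float = 1.0,
                 min_fold: float = 2.0) -> bool:
    """Apply the expression and fold-change thresholds to one gene.

    `fpkm` holds one value per library; `control_index` marks the
    untreated control library.
    """
    expressed = sum(1 for value in fpkm if value >= min_fpkm)
    if expressed < min_libraries:
        return False
    control = fpkm[control_index]
    return any(fold_change(value, control) >= min_fold
               for i, value in enumerate(fpkm) if i != control_index)


# Toy profiles across seven libraries (control first); values are made up.
profiles: Dict[str, List[float]] = {
    "GENE_A": [10.0, 9.5, 9.0, 4.0, 3.0, 2.0, 1.0],  # dose-dependent decrease
    "GENE_B": [0.2, 0.1, 0.0, 0.3, 0.2, 0.1, 0.4],   # effectively unexpressed
    "GENE_C": [5.0, 5.1, 4.9, 5.2, 5.0, 4.8, 5.1],   # expressed but stable
}
candidates = [gene for gene, values in profiles.items()
              if is_candidate(values, min_libraries=4)]
```

For the stranded datasets the same function would be called with `min_libraries=3`, following the thresholds given above.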
In all datasets the largest cluster is characterised by genes that are strongly down-regulated starting at the 0.5μM concentration. Some clusters behave in an opposing manner: their genes are strongly up-regulated at the same concentrations. This pattern of greater regulatory activity at the 0.5μM concentration was also observed in the differential splicing analysis (Section 2.2), and the CG analysis (Section 2.3). The cluster profiles for both HCT116 datasets and the hTERT dataset demonstrate that gene regulatory processes are affected similarly in both HCT116 and hTERT cells as a result of CLK inhibitor treatment.

[Figure: one panel per cluster, labelled 1 (3198), 2 (469), 3 (507), 4 (232), 5 (37), 6 (22); x-axis: Concentration (μM), from 0.0 to 10.0; y-axis: Standardized FPKM]

Figure 2.29: Clustered gene expression profiles from the HCT116 unstranded RNA-Seq dataset. Genes have been clustered using WGCNA based on FPKM profiles. Each black line is a gene expression profile; the red lines are cluster eigengenes.

[Figure: one panel per cluster, labelled 1 (2647), 2 (359), 3 (334), 4 (182), 5 (45); x-axis: Concentration (μM), from 0.0 to 5.0; y-axis: Standardized FPKM]

Figure 2.30: Clustered gene expression profiles from the HCT116 stranded RNA-Seq dataset. Genes have been clustered using WGCNA based on FPKM profiles. Each black line is a gene expression profile; the red lines are cluster eigengenes.

[Figure: one panel per cluster, labelled 1 (2338), 2 (727), 3 (264), 4 (114), 5 (88); x-axis: Concentration (μM), from 0.0 to 5.0; y-axis: Standardized FPKM]

Figure 2.31: Clustered gene expression profiles from the hTERT stranded RNA-Seq dataset. Genes have been clustered using WGCNA based on FPKM profiles. Each black line is a gene expression profile; the red lines are cluster eigengenes.

Each cluster contains genes that appear to be subject to similar regulatory processes.
Therefore, it is likely that each cluster contains groups of genes that participate in similar or related biological processes. Identifying biological processes enriched within each gene expression cluster will provide a glimpse into how biological processes are affected by differential expression due to CLK inhibition.

Functional enrichment analysis of clustered genes was performed using the ReactomeFI Cytoscape plugin [56]. For each set of clustered genes, a gene interaction network was constructed and genes remaining in the constructed network were used to perform functional enrichment analysis. Enriched GO biological process terms with false discovery rate controlled at 0.05 were reported for each cluster. For the HCT116 datasets, only analysis of clusters 1–3 resulted in a list of enriched biological processes. The hTERT dataset only produced enriched biological processes for clusters 1–4. Enriched biological processes were used to create enrichment maps [59] (Figure 2.32, Figure 2.33, Figure 2.34).

[Figure: enrichment map; labelled clusters: DNA repair; cell cycle; signaling pathways; protein phosphorylation; chromatin modification; ubiquitin-dependent protein catabolism; transcription, RNA splicing; toll-like receptor signaling; protein dephosphorylation; nucleosome assembly]

Figure 2.32: Biological process enrichment map for differentially expressed genes in the HCT116 unstranded RNA-Seq dataset. Each node represents a GO biological process gene set. Red nodes represent biological processes enriched among up-regulated genes, likewise blue for down-regulated genes. Node cores are coloured blue when that gene set is enriched among genes in cluster 1, red for cluster 2. The outer ring is coloured blue when that gene set is enriched among genes in cluster 3.
Edge thickness indicates the level of overlap between two gene sets, considering the set of up- or down-regulated genes.

[Figure: enrichment map; labelled clusters: cell cycle; toll-like receptor signaling; ubiquitin-dependent protein catabolism; protein phosphorylation; phosphatidylinositol dephosphorylation; DNA repair; chromatin modification; transcription; RNA catabolism; nucleosome assembly; protein transport]

Figure 2.33: Biological process enrichment map for differentially expressed genes in the HCT116 stranded RNA-Seq dataset. Each node represents a GO biological process gene set. Red nodes represent biological processes enriched among up-regulated genes, likewise blue for down-regulated genes. Node cores are coloured blue when that gene set is enriched among genes in cluster 1, red for cluster 2. The outer ring is coloured blue when that gene set is enriched among genes in cluster 3. Edge thickness indicates the level of overlap between two gene sets, considering the set of up- or down-regulated genes.

[Figure: enrichment map; labelled clusters: chromatin modification; RNA processing, splicing; protein catabolism; transcription; cell cycle; DNA repair; toll-like receptor signaling; protein phosphorylation; protein dephosphorylation; nucleosome assembly]

Figure 2.34: Biological process enrichment map for differentially expressed genes in the hTERT dataset. Each node represents a GO biological process gene set. Red nodes represent biological processes enriched among up-regulated genes, likewise blue for down-regulated genes. Node cores are coloured blue when that gene set is enriched among genes in cluster 1, red for cluster 3. The outer ring is coloured blue when that gene set is enriched among genes in cluster 2, red for cluster 4. Edge thickness indicates the level of overlap between two gene sets, considering the set of up- or down-regulated genes.

Genes characterised by strong down-regulation at the 0.5μM concentration (cluster 1) are enriched for RNA splicing and processing genes (Figure 2.32, Figure 2.33, Figure 2.34).
Up-regulated gene clusters were not enriched for RNA splicing and processing genes. The down-regulation of genes involved in RNA metabolism may represent an attempt by treated cells to prevent the production of aberrant RNA transcripts due to CLK inhibition.

Aside from RNA processing, genes involved in cell cycle regulation were down-regulated. Down-regulation of cell cycle regulators upon T3 treatment suggests that CLK inhibition may disrupt normal cell cycle activity. RNA splicing is inhibited during mitosis [61], and this inhibition appears to involve the dephosphorylation of SRSF10 proteins [68]. In addition, down-regulation of SRSF3 induces G1 cell cycle arrest in HCT116 colon cancer cells [62]. Splicing repression via CLK inhibition may have a similar effect.

Genes in the second down-regulated cluster (3 for HCT116, 2 for hTERT) were fewer than those in cluster 1 and were enriched for many fewer biological process gene sets. Biological processes enriched in the secondary down-regulated cluster overlapped with those of cluster 1, and are related to RNA metabolism and cell cycle regulation. A subset of genes is perhaps more resilient to expression changes in the presence of CLK inhibition, with increasing T3 dose progressively disrupting biological processes.

Toll-like receptor signaling genes were down-regulated upon CLK inhibition. However, this biological process seemed to be more sensitive in HCT116 cells than hTERT cells. In HCT116 cells, toll-like receptor signaling was down-regulated in cluster 1 (strong down-regulation at 0.5μM) as well as cluster 3 (more resilient to down-regulation). In hTERT cells, toll-like receptor signaling was down-regulated in cluster 2 (more resilient to down-regulation).

Up-regulated genes were much fewer than down-regulated genes and thus affected fewer biological processes.
Histone assembly was among the few biological processes found to be enriched among only up-regulated gene expression clusters in all three datasets.

Biological processes affected by gene down-regulation are consistent with the biological processes affected by differential splicing and CG transcription. RNA metabolic processes (including splicing), cell cycle, and protein catabolism are affected by changes in all three processes. Both differential splicing and gene down-regulation affected DNA repair, histone modification, protein phosphorylation, and toll-like receptor signaling. SR proteins are reported to play a role in transcriptional elongation, and depletion of some SR proteins can have a negative impact on transcription [69]. Disruption of SR protein activity via CLK inhibition may attenuate the splicing and expression of a common set of genes, potentially explaining the similarities in biological processes affected by differential splicing and expression down-regulation.

2.5 Comparison of Unstranded and Stranded RNA-Seq Libraries

The stranded RNA-Seq datasets produced many fewer significant AS and CG events (Figure 2.1, Figure 2.17). To identify sources of these differences, the RNA-Seq libraries were compared to each other and to the PacBio libraries using various metrics. First, PSI values for each AS event were compared at each common T3 concentration between the HCT116 unstranded and stranded RNA-Seq libraries (Figure 2.35). AS events were not compared at a given T3 concentration unless they passed a coverage threshold in both datasets: at least 1 read each for the inclusion and exclusion isoforms, and at least 10 reads in total for the AS event. This read coverage filter is the same as that applied for the MISO differential splicing analysis. The unstranded and stranded RNA-Seq dataset AS event PSI values had a Pearson correlation coefficient of 0.75.
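
The comparison just described, a coverage filter applied in both datasets followed by a Pearson correlation of the paired PSI values, can be sketched in Python as follows. The `EventCounts` record, the event identifiers, and all counts and PSI values are made up for illustration; the sketch assumes the PSI estimate and read counts for each event are already available and does not reproduce MISO itself.

```python
from math import sqrt
from typing import Dict, List, NamedTuple, Tuple


class EventCounts(NamedTuple):
    """Hypothetical per-event summary at one T3 concentration."""
    inclusion: int   # reads supporting the inclusion isoform
    exclusion: int   # reads supporting the exclusion isoform
    total: int       # all reads assigned to the AS event
    psi: float       # PSI estimate for the event


def passes_coverage(c: EventCounts, min_each: int = 1,
                    min_total: int = 10) -> bool:
    """Coverage threshold: 1 read per isoform and 10 reads in total."""
    return (c.inclusion >= min_each and c.exclusion >= min_each
            and c.total >= min_total)


def paired_psi(a: Dict[str, EventCounts],
               b: Dict[str, EventCounts]) -> List[Tuple[float, float]]:
    """PSI pairs for events passing the coverage filter in both datasets."""
    shared = sorted(set(a) & set(b))
    return [(a[k].psi, b[k].psi) for k in shared
            if passes_coverage(a[k]) and passes_coverage(b[k])]


def pearson(pairs: List[Tuple[float, float]]) -> float:
    """Pearson correlation coefficient of paired values."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)


# Toy data: ev4 fails the filter in the first dataset (no inclusion reads).
unstranded = {
    "ev1": EventCounts(5, 5, 12, 0.50),
    "ev2": EventCounts(8, 2, 15, 0.80),
    "ev3": EventCounts(1, 9, 20, 0.10),
    "ev4": EventCounts(0, 10, 10, 0.00),
}
stranded = {
    "ev1": EventCounts(6, 6, 14, 0.55),
    "ev2": EventCounts(9, 3, 16, 0.85),
    "ev3": EventCounts(2, 10, 18, 0.15),
    "ev4": EventCounts(3, 7, 12, 0.30),
}
pairs = paired_psi(unstranded, stranded)
r = pearson(pairs)
```

Requiring the filter to pass in both datasets before pairing keeps poorly covered events from contributing unstable PSI estimates to the correlation.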
A pattern of anti-correlation amongst a subset of events was also observed.

[Figure: hexplot; x-axis: Unstranded RNA-Seq PSI; y-axis: Stranded RNA-Seq PSI; colour scale: count (1 to 1000)]

Figure 2.35: HCT116 unstranded vs. stranded RNA-Seq hexplot of AS event PSI values. PSI values were compared for each event at each concentration. Each hex represents a number of AS events. The lighter the shade of blue, the greater the number of AS events that map to that hex.

Similarly, the unstranded and stranded HCT116 RNA-Seq AS event PSI values were compared to PSI values computed from the PacBio sequencing libraries (Figure 2.36, Figure 2.37). PacBio reads violate some assumptions of the MISO model, so PSI values were calculated by counting reads supporting the inclusion and exclusion isoforms in the MISO event annotations. PacBio PSI values were more strongly correlated with the unstranded RNA-Seq dataset (Pearson correlation coefficient 0.76) than with the stranded RNA-Seq dataset (Pearson correlation coefficient 0.66). Anti-correlation can also be observed amongst a subset of events in the PacBio vs. RNA-Seq comparisons, although perhaps to a lesser extent in the comparison with the unstranded RNA-Seq data. The higher correlation between the PacBio PSI and unstranded RNA-Seq PSI values suggests that the unstranded RNA-Seq PSI values may be more reliable than those computed from the stranded RNA-Seq dataset.

[Figure: hexplot; x-axis: PacBio PSI; y-axis: Unstranded RNA-Seq PSI; colour scale: count (1 to 100)]

Figure 2.36: HCT116 PacBio vs. unstranded RNA-Seq hexplot of AS event PSI values. PSI values were compared for each event at each concentration. Each hex represents a number of AS events. The lighter the shade of blue, the greater the number of AS events that map to that hex.

[Figure: hexplot; x-axis: PacBio PSI; y-axis: Stranded RNA-Seq PSI; colour scale: count (1 to 100)]

Figure 2.37: HCT116 PacBio vs. stranded RNA-Seq hexplot of AS event PSI values.
PSI values were compared for each event at each concentration. Each hex represents a number of AS events. The lighter the shade of blue, the greater the number of AS events that map to that hex.

Next, the numbers of mapped reads in the three RNA-Seq datasets were compared at each common T3 concentration (Figure 2.38). Generally, the unstranded RNA-Seq libraries have a greater number of mapped reads. However, this pattern is not always consistent; at the 1.0μM concentration the number of mapped reads is roughly equal between the three datasets. Therefore, while read coverage may play a role in event count differences between the unstranded and stranded RNA-Seq libraries, it cannot be the primary cause.

[Figure: x-axis: Concentration (μM); y-axis: Mapped reads; legend (Dataset): HCT116-unstranded, HCT116-stranded, hTERT-stranded]

Figure 2.38: Mapped read counts for RNA-Seq libraries from the three T3-treated RNA-Seq datasets. Counts for T3 concentrations common amongst the three datasets are shown.

Finally, the proportions of mapped reads that were split during the mapping process were compared (Figure 2.39). The majority of these reads are split across introns and are an important source of evidence for RNA splicing in a sequencing library. A lower proportion of split reads may result in a reduced ability to detect and quantify alternative splicing. Lower split read proportions were detected in the stranded RNA-Seq libraries. In both HCT116 datasets the proportion of split reads decreases with increasing T3 dose. The hTERT dataset shows a similar dose-dependent effect; however, the decrease in split read proportion is not as strong, especially at the higher concentrations. This weaker dose effect in the hTERT dataset can also be observed in the differentially spliced AS event and CG event counts (Figure 2.1, Figure 2.17).
Differences in the proportion of mapped reads that align to splice junctions appear to be a main contributor to the reduction of detected splicing events in the stranded RNA-Seq datasets.

[Figure: x-axis: Concentration (μM); y-axis: Split reads / mapped reads; legend (Dataset): HCT116-unstranded, HCT116-stranded, hTERT-stranded]

Figure 2.39: Proportion of mapped reads split during the alignment process. Proportions for T3 concentrations common amongst the three datasets are shown.

Dose-dependent decreases in split read proportions may be explained by the increasing presence of aberrantly spliced transcripts. The GSNAP aligner may struggle to map splice junction reads from novel splice sites in these transcripts. The overall lower split read proportions in the stranded RNA-Seq libraries may suggest that the unstranded RNA-Seq libraries contain a greater proportion of reads erroneously mapped to non-contiguous regions of the genome. However, the higher correlation of unstranded RNA-Seq and PacBio AS event PSI values suggests that the opposite may be true: the stranded RNA-Seq datasets may include less RNA splicing information.

Chapter 3

Discussion

The results of the analyses presented in this thesis have demonstrated that the T3 CLK inhibitor is an effective disruptor of normal RNA processing. Applying the CLK inhibitor to cells in progressively greater concentrations allowed dose-dependent response patterns to be observed in alternative splicing regulation, 3′-end processing (i.e. conjoined gene formation), and gene expression regulation. Performing concentration-curve experiments in both HCT116 colon cancer and normal hTERT cells revealed that the majority of observable effects on the transcriptome were not specific to cancer or normal biology.

AS events exhibited varying levels of sensitivity to CLK inhibition. For example, SE events displayed a sharp decrease in PSI starting at the 0.5μM concentration, compared to lower concentrations.
RI events appear to be less dependent on CLK activity and began to show large increases in PSI at the 1.0μM concentration. These splicing responses clearly indicate that CLK inhibition disrupts splice site recognition.

The RS domains of SR proteins are generally thought to facilitate protein-protein interactions. However, a recent study has shown that phosphorylation is required for the RS domain of SRSF1 to dissociate from the RRM domain, allowing the RRM domain to recruit U1 snRNP [14]. Under either model, repressing RS domain phosphorylation prevents SR proteins already bound to the RNA substrate from promoting spliceosome formation. Therefore, RNA-bound and unphosphorylated SR proteins may directly inhibit splicing.

ESE density appears to be an important predictor of AS inclusion levels. Alternative sequences in SE and RI events that are up-regulated upon CLK inhibition tend to have a greater density of ESE motifs than those that are down-regulated. Greater ESE density provides more opportunities for SR protein binding, and increases the chances of a sufficiently phosphorylated SR protein being available to recruit members of the spliceosome. SEs and RIs appear to be regulated by different SR proteins: SE events that decreased in PSI with treatment were depleted of SRSF1, SRSF2, and SRSF5 binding motifs. Similarly responding RI events were depleted of SRSF1, SRSF2, and SRSF6 binding motifs.

RNA 3′-end cleavage was also shown to be negatively impacted by CLK inhibition. Conjoined gene formation occurred in a T3 dose-dependent manner and, similar to SE events, greater effects were observed starting at 0.5μM. Targeted sequencing of a subset of detected CGs recapitulated these results, and verified the existence of detected CGs in untreated cells.
Dose-dependent increases in CG PSI and decreases in non-conjoined upstream gene transcription indicate that CG expression is “stolen” from the upstream gene.

Conjoined gene formation through transcriptional read-through appears to be a natural phenomenon and has received some attention in the literature [67, 70]. T3-induced CG production patterns suggest that CLK-mediated phosphorylation is important for the 3′-end cleavage reaction of some genes. U2AF, a component of the spliceosome, has been shown to promote 3′-end cleavage by interacting with CFIm [20]. SR proteins facilitate the recruitment of U2AF to 3′ splice sites [11], and thus may indirectly promote recruitment of CFIm when properly phosphorylated. However, SR proteins have also been shown to interact directly with CFIm [25], and so may also directly promote its recruitment. Involvement of CFIm in CG transcription regulation is supported by the finding that, among genes lacking canonical poly(A) signals at their terminal polyadenylation site, upstream CG partners have a higher proportion of terminal polyadenylation sites with CFIm-binding UGUA signals. Interestingly, phosphorylation of CFIm is required for the 3′-end cleavage reaction to occur [26]. This presents the possibility that CLK phosphorylates the RS-like domain of CFIm itself and regulates 3′-end processing.

T3 treatment revealed 5–6 gene expression response patterns to CLK inhibition, with the bulk of genes being down-regulated upon treatment. For most differentially expressed genes, greater changes in expression were observed at CLK inhibitor concentrations of 0.5μM and higher. Gene expression regulation, therefore, shows a similar sensitivity to CLK inhibition as seen in AS and CG regulation. A notable exception is a group of genes strongly down-regulated beginning at 5.0μM.
These genes may be more resilient to CLK inhibition, or their down-regulation may be a secondary response to strong RNA processing disruption.

Splicing factors were among the genes most affected by AS changes at low doses of T3, indicating that RNA splicing auto-regulation is one of the cellular processes most sensitive to CLK inhibition. Splicing and other RNA processing factors were also involved in CG formation, and their expression was down-regulated in treated cells. One method of splicing factor auto-regulation is the inclusion of a “poison” exon that includes a premature termination codon, and the resulting degradation of the poisoned transcript [60]. AS changes and CG formation may lead to the inclusion of premature termination codons, resulting in reduced expression of RNA processing factors and other genes.

CLK inhibition may also result in cell cycle disruption. Cell cycle progression is linked to RNA splicing, and knock-down of splicing factors can cause cell cycle arrest [61, 62]. Cell cycle related genes were not only differentially spliced, but also participated in CG transcription and were generally down-regulated in treated cells. Therefore, global disruption of splicing through CLK inhibition may interfere with normal cell cycle progression.

High doses of T3 CLK inhibitor may cause pathological cell death. Toll-like receptor ligands released from dead and dying cells may have caused an innate immune response in nearby cells [63], explaining the effects on genes in the toll-like receptor signaling pathway observed in samples treated with high concentrations of T3.

A noticeable similarity in the biological processes affected by CLK inhibition was observed in the analysis of differentially spliced and expressed genes, and CG participants. RNA metabolism (e.g. transcription and splicing), cell cycle progression, and protein degradation were among those processes sensitive to loss of CLK activity.
Disruption of SR protein activity may cause defects in splicing and transcription [69] (and possibly 3′-end processing) in a common set of genes, which would explain the similarity in affected biological processes.

3.1 Limitations and Future Directions

Alternative splicing analysis was performed using event annotations provided by MISO. These annotations only include a limited set of events derived from expressed sequence tags and gene annotation databases. CLK inhibition may cause splicing defects even in constitutive gene regions, and so the MISO annotations may be too restrictive and may have prevented the capture of the full set of splicing changes present in treated cells. Further study into the effects of CLK inhibition on AS would benefit from performing differential splicing analysis on a more comprehensive set of potential AS events, including those which would not undergo differential splicing under normal conditions.

In this thesis, ESE density was shown to correlate with SE and RI splicing response. However, only SE and RI event types were tested and there are likely to be other genomic features predictive of splicing response. Future investigation may be able to predict changes in splicing upon CLK inhibition by inspecting a larger set of features, such as those used in splicing code studies [71], on the full spectrum of AS event types.

Analysis presented in this thesis suggests that SR protein or CFIm phosphorylation may be important for the 3′-end cleavage reaction and CG transcription regulation. However, further experiments are necessary to fully illuminate the role of CLKs in CG formation. One approach might be to use HITS-CLIP assays (cross-linking and immunoprecipitation combined with high-throughput sequencing) to compare RNA processing factor binding profiles in untreated and treated cells. Likewise, immunoprecipitation methods could be used to investigate changes in protein-protein interactions between and with 3′-end processing factors.
Further, the proportion of CG transcripts translated into proteins may be tested experimentally. This would shed light on whether CG transcription is primarily a gene expression regulatory mechanism, or whether it serves to produce functional proteins. Similar experiments could be performed to fully reveal the mechanism by which CLK inhibition disrupts alternative splicing.

A manuscript of the presented work is in preparation with the intent to submit to a scientific journal.

3.2 Conclusions

This is the first systematic analysis of the transcriptomic consequences of CLK inhibition. Loss of CLK function resulted in the disruption of RNA splicing, 3′-end processing, and gene expression for genes involved in a common set of biological processes. The dependence of transcript 3′-end cleavage on CLK activity has not been previously reported in the literature. Insights derived from this thesis will inform future investigations into RNA processing regulation, and the role of CLKs therein.

Bibliography

[1] Pertea M, Salzberg SL (2010) Between a chicken and a grape: estimating the number of human genes. Genome Biol 11: 206.
[2] Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, et al. (2012) GENCODE: the reference human genome annotation for The ENCODE Project. Genome research 22: 1760–1774.
[3] Ni JZ, Grate L, Donohue JP, Preston C, Nobida N, et al. (2007) Ultraconserved elements are associated with homeostatic control of splicing regulators by alternative splicing and nonsense-mediated decay. Genes & development 21: 708–718.
[4] Krawczak M, Reiss J, Cooper DN (1992) The mutational spectrum of single base-pair substitutions in mRNA splice junctions of human genes: causes and consequences. Human genetics 90: 41–54.
[5] David CJ, Manley JL (2010) Alternative pre-mRNA splicing regulation in cancer: pathways and programs unhinged. Genes & development 24: 2343–2364.
[6] Yoshida K, Sanada M, Shiraishi Y, Nowak D, Nagata Y, et al.
(2011) Frequent pathway mutations of splicing machinery in myelodysplasia. Nature 478: 64–69.
[7] Webb TR, Joyner AS, Potter PM (2013) The development and application of small molecule modulators of SF3b as therapeutic agents for cancer. Drug discovery today 18: 43–49.
[8] Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics 10: 57–63.
[9] Eid J, Fehr A, Gray J, Luong K, Lyle J, et al. (2009) Real-time DNA sequencing from single polymerase molecules. Science 323: 133–138.
[10] Chen M, Manley JL (2009) Mechanisms of alternative splicing regulation: insights from molecular and genomics approaches. Nature Reviews Molecular Cell Biology 10: 741–754.
[11] Shepard PJ, Hertel KJ (2009) The SR protein family. Genome Biol 10: 242.
[12] Shen H, Kan JL, Green MR (2004) Arginine-serine-rich domains bound at splicing enhancers contact the branchpoint to promote prespliceosome assembly. Molecular cell 13: 367–376.
[13] Shen H, Green MR (2004) A pathway of sequential arginine-serine-rich domain-splicing signal interactions during mammalian spliceosome assembly. Molecular cell 16: 363–373.
[14] Cho S, Hoang A, Sinha R, Zhong XY, Fu XD, et al. (2011) Interaction between the RNA binding domains of Ser-Arg splicing factor 1 and U1-70K snRNP protein determines early spliceosome assembly. Proceedings of the National Academy of Sciences 108: 8233–8238.
[15] Ngo JCK, Chakrabarti S, Ding JH, Velazquez-Dones A, Nolen B, et al. (2005) Interplay between SRPK and Clk/Sty kinases in phosphorylation of the splicing factor ASF/SF2 is regulated by a docking motif in ASF/SF2. Molecular cell 20: 77–89.
[16] Long J, Caceres J (2009) The SR protein family of splicing factors: master regulators of gene expression. Biochem J 417: 15–27.
[17] Zhou Z, Fu XD (2013) Regulation of splicing by SR proteins and SR protein-specific kinases. Chromosoma 122: 191–207.
[18] Graveley BR (2004) A protein interaction domain contacts RNA in the prespliceosome. Molecular cell 13: 302–304.
[19] Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, et al. (2008) Alternative isoform regulation in human tissue transcriptomes. Nature 456: 470–476.
[20] Millevoi S, Vagner S (2010) Molecular mechanisms of eukaryotic pre-mRNA 3′ end processing regulation. Nucleic acids research 38: 2757–2774.
[21] Singh G, Kucukural A, Cenik C, Leszyk JD, Shaffer SA, et al. (2012) The cellular EJC interactome reveals higher-order mRNP structure and an EJC-SR protein nexus. Cell 151: 750–764.
[22] Lou H, Neugebauer KM, Gagel RF, Berget SM (1998) Regulation of alternative polyadenylation by U1 snRNPs and SRp20. Molecular and cellular biology 18: 4977–4985.
[23] Iseli C, Stevenson BJ, de Souza SJ, Samaia HB, Camargo AA, et al. (2002) Long-range heterogeneity at the 3′ ends of human mRNAs. Genome research 12: 1068–1074.
[24] Venkataraman K, Brown KM, Gilmartin GM (2005) Analysis of a noncanonical poly(A) site reveals a tripartite mechanism for vertebrate poly(A) site recognition. Genes & development 19: 1315–1327.
[25] Dettwiler S, Aringhieri C, Cardinale S, Keller W, Barabino SM (2004) Distinct sequence motifs within the 68-kDa subunit of cleavage factor Im mediate RNA binding, protein-protein interactions, and subcellular localization. Journal of Biological Chemistry 279: 35788–35797.
[26] Ryan K (2007) Pre-mRNA 3′ cleavage is reversibly inhibited in vitro by cleavage factor dephosphorylation. RNA biology 4: 26.
[27] Nunes NM, Li W, Tian B, Furger A (2010) A functional human poly(A) site requires only a potent DSE and an A-rich upstream sequence. The EMBO journal 29: 1523–1536.
[28] López-Bigas N, Audit B, Ouzounis C, Parra G, Guigó R (2005) Are splicing mutations the most frequent cause of hereditary disease? FEBS letters 579: 1900–1903.
[29] Krawczak M, Thomas NS, Hundrieser B, Mort M, Wittig M, et al. (2007) Single base-pair substitutions in exon–intron junctions of human genes: nature, distribution, and consequences for mRNA splicing. Human mutation 28: 150–158.
[30] Slaugenhaupt SA, Blumenfeld A, Gill SP, Leyne M, Mull J, et al. (2001) Tissue-Specific Expression of a Splicing Mutation in the IKBKAP Gene Causes Familial Dysautonomia. The American Journal of Human Genetics 68: 598–605.
[31] Nielsen KB, Sørensen S, Cartegni L, Corydon TJ, Doktor TK, et al. (2007) Seemingly Neutral Polymorphic Variants May Confer Immunity to Splicing-Inactivating Mutations: A Synonymous SNP in Exon 5 of MCAD Protects from Deleterious Mutations in a Flanking Exonic Splicing Enhancer. The American Journal of Human Genetics 80: 416–432.
[32] Papaemmanuil E, Cazzola M, Boultwood J, Malcovati L, Vyas P, et al. (2011) Somatic SF3B1 mutation in myelodysplasia with ring sideroblasts. New England Journal of Medicine 365: 1384–1395.
[33] Narla G, DiFeo A, Fernandez Y, Dhanasekaran S, Huang F, et al. (2008) KLF6-SV1 overexpression accelerates human and mouse prostate cancer progression and metastasis. The Journal of clinical investigation 118: 2711–2721.
[34] Ward AJ, Cooper TA (2010) The pathobiology of splicing. The Journal of pathology 220: 152–163.
[35] Ghigna C, Giordano S, Shen H, Benvenuto F, Castiglioni F, et al. (2005) Cell Motility Is Controlled by SF2/ASF through Alternative Splicing of the Ron Protooncogene. Molecular cell 20: 881–890.
[36] Elkon R, Ugalde AP, Agami R (2013) Alternative cleavage and polyadenylation: extent, regulation and function. Nature Reviews Genetics 14: 496–506.
[37] Fedorov O, Huber K, Eisenreich A, Filippakopoulos P, King O, et al. (2011) Specific CLK inhibitors from a novel chemotype for regulation of alternative splicing. Chemistry & biology 18: 67–76.
[38] Chin CS, Alexander DH, Marks P, Klammer AA, Drake J, et al.
(2013) Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nature methods 10: 563–569.
[39] Quail MA, Smith M, Coupland P, Otto TD, Harris SR, et al. (2012) A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers. BMC genomics 13: 341.
[40] Vandenbroucke II, Vandesompele J, De Paepe A, Messiaen L (2001) Quantification of splice variants using real-time PCR. Nucleic Acids Research 29: e68–e68.
[41] Sanger F, Coulson AR (1975) A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase. Journal of molecular biology 94: 441–448.
[42] Wu TD, Nacu S (2010) Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics 26: 873–881.
[43] Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, et al. (2013) STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29: 15–21.
[44] Wu TD, Watanabe CK (2005) GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics 21: 1859–1875.
[45] A hands on tutorial of three aligners: BLAT, BLASR, and GMAP. 2014-07-15.
[46] Katz Y, Wang ET, Airoldi EM, Burge CB (2010) Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nature methods 7: 1009–1015.
[47] Alamancos GP, Agirre E, Eyras E (2013) Methods to study splicing from high-throughput RNA Sequencing data. arXiv preprint arXiv:1304.5952.
[48] Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11: R106.
[49] Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139–140.
[50] Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature methods 5: 621–628.
[51] Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, et al. (2010) Transcript assembly and quantification by RNA-seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature biotechnology 28: 511–515.
[52] Roberts A, Trapnell C, Donaghey J, Rinn JL, Pachter L (2011) Improving RNA-Seq expression estimates by correcting for fragment bias. Genome biology 12: R22.
[53] Muraki M, Ohkawara B, Hosoya T, Onogi H, Koizumi J, et al. (2004) Manipulation of alternative splicing by a newly developed inhibitor of Clks. Journal of Biological Chemistry 279: 24246–24254.
[54] Barash Y, Calarco JA, Gao W, Pan Q, Wang X, et al. (2010) Deciphering the splicing code. Nature 465: 53–59.
[55] McPherson A, Hormozdiari F, Zayed A, Giuliany R, Ha G, et al. (2011) deFuse: an algorithm for gene fusion discovery in tumor RNA-seq data. PLoS computational biology 7: e1001138.
[56] Wu G, Feng X, Stein L (2010) A human functional protein interaction network and its application to cancer data analysis. Genome Biol 11: R53.
[57] Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, et al. (2000) Gene ontology: tool for the unification of biology. Nature genetics 25: 25–29.
[58] Cartegni L, Wang J, Zhu Z, Zhang MQ, Krainer AR (2003) ESEfinder: a web resource to identify exonic splicing enhancers. Nucleic acids research 31: 3568–3571.
[59] Merico D, Isserlin R, Stueker O, Emili A, Bader GD (2010) Enrichment map: a network-based method for gene-set enrichment visualization and interpretation. PloS one 5: e13984.
[60] Anko ML, Muller-McNicoll M, Brandl H, Curk T, Gorup C, et al. (2012) The RNA-binding landscapes of two SR proteins reveal unique functions and binding to diverse RNA classes. Genome Biol 13: R17.
[61] Blencowe BJ (2003) Splicing regulation: the cell cycle connection. Current biology 13: R149–R151.
[62] Kurokawa K, Akaike Y, Masuda K, Kuwano Y, Nishida K, et al.
(2013) Downregulation of serine/arginine-rich splicing factor 3 induces G1 cell cycle arrest and apoptosis in colon cancer cells. Oncogene 33: 1407–1417.
[63] Kawai T, Akira S (2010) The role of pattern-recognition receptors in innate immunity: update on Toll-like receptors. Nature immunology 11: 373–384.
[64] Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC bioinformatics 9: 559.
[65] Langfelder P, Horvath S. WGCNA package FAQ. Accessed: 2014-11-12.
[66] Thorvaldsdóttir H, Robinson JT, Mesirov JP (2012) Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Briefings in bioinformatics: bbs017.
[67] Prakash T, Sharma VK, Adati N, Ozawa R, Kumar N, et al. (2010) Expression of conjoined genes: another mechanism for gene regulation in eukaryotes. PloS one 5: e13284.
[68] Shin C, Manley JL (2002) The SR protein SRp38 represses splicing in M phase cells. Cell 111: 407–417.
[69] Lin S, Coutinho-Mansfield G, Wang D, Pandit S, Fu XD (2008) The splicing factor SC35 has an active role in transcriptional elongation. Nature structural & molecular biology 15: 819–826.
[70] Greger L, Su J, Rung J, Ferreira PG, Lappalainen T, et al. (2014) Tandem RNA Chimeras Contribute to Transcriptome Diversity in Human Population and Are Associated with Intronic Genetic Variants. PloS one 9: e104567.
[71] Leung MK, Xiong HY, Lee LJ, Frey BJ (2014) Deep learning of the tissue-regulated splicing code. Bioinformatics 30: i121–i129.

Appendix A
Supporting Materials

[Plots: scale independence (scale free topology model fit, signed R^2) and mean connectivity, each versus soft threshold (power).]
Figure A.1: Soft threshold vs. scale independence and vs.
mean connectivity for HCT116 unstranded RNA-Seq AS PSI WGCNA clustering.

[Plots: scale independence (scale free topology model fit, signed R^2) and mean connectivity, each versus soft threshold (power).]
Figure A.2: Soft threshold vs. scale independence and vs. mean connectivity for HCT116 stranded RNA-Seq AS PSI WGCNA clustering.

[Plots: scale independence (scale free topology model fit, signed R^2) and mean connectivity, each versus soft threshold (power).]
Figure A.3: Soft threshold vs. scale independence and vs. mean connectivity for hTERT stranded RNA-Seq AS PSI WGCNA clustering.

[Plot: log10 FPKM per library (SA464 to SA470, SA502 to SA505, SA537 to SA540).]
Figure A.4: Violin plots of log10 FPKM values for upstream gene partners of hTERT exclusive conjoined genes. FPKM values are plotted for both HCT116 RNA-Seq datasets and the hTERT dataset. SA464-470 are the HCT116 samples used for unstranded RNA-Seq (dark blue), SA537-540 are HCT116 samples used for stranded RNA-Seq (light blue), and SA502-505 are hTERT samples (green). Violin plots for each dataset are ordered by increasing T3 concentration.

[Plot: log10 FPKM per library (SA464 to SA470, SA502 to SA505, SA537 to SA540).]
Figure A.5: Violin plots of log10 FPKM values for downstream gene partners of hTERT exclusive conjoined genes. FPKM values are plotted for both HCT116 RNA-Seq datasets and the hTERT dataset. SA464-470 are the HCT116 samples used for unstranded RNA-Seq (dark blue), SA537-540 are HCT116 samples used for stranded RNA-Seq (light blue), and SA502-505 are hTERT samples (green).
Violin plots for each dataset are ordered by increasing T3 concentration.

[Plot: normalized expression at T3 concentrations of 0.0, 0.05, 0.10, 0.50, 1.0, 5.0, and 10.0 µM.]
Figure A.6: Normalized conjoined gene expression boxplots across T3 concentrations for HCT116 replicate 1 dataset. Conjoined gene expression has been normalized to ACTB expression. This dataset is generated from the same samples used to generate the HCT116 unstranded RNA-Seq dataset.

[Plot: normalized expression at T3 concentrations of 0.0, 0.50, 1.0, and 5.0 µM.]
Figure A.7: Normalized conjoined gene expression boxplots across T3 concentrations for HCT116 replicate 2 dataset. Conjoined gene expression has been normalized to ACTB expression. This dataset is generated from the same samples used to generate the HCT116 stranded RNA-Seq dataset.

[Plot: normalized expression at T3 concentrations of 0.0, 0.50, 1.0, and 5.0 µM.]
Figure A.8: Normalized conjoined gene expression boxplots across T3 concentrations for hTERT dataset. Conjoined gene expression has been normalized to ACTB expression. This dataset is generated from the same samples used to generate the hTERT stranded RNA-Seq dataset.

[Plots: scale independence (scale free topology model fit, signed R^2) and mean connectivity, each versus soft threshold (power).]
Figure A.9: Soft threshold vs. scale independence and vs. mean connectivity for HCT116 unstranded RNA-Seq FPKM WGCNA clustering.

[Plots: scale independence (scale free topology model fit, signed R^2) and mean connectivity, each versus soft threshold (power).]
Figure A.10: Soft threshold vs. scale independence and vs.
mean connectivity for HCT116 stranded RNA-Seq FPKM WGCNA clustering.

[Plots: scale independence (scale free topology model fit, signed R^2) and mean connectivity, each versus soft threshold (power).]
Figure A.11: Soft threshold vs. scale independence and vs. mean connectivity for hTERT stranded RNA-Seq FPKM WGCNA clustering.




