UBC Theses and Dissertations

Identification of RNA binding proteins associated with differential splicing in neuroendocrine prostate… Yeung, Jake 2014

Identification of RNA binding proteins associated with differential splicing in neuroendocrine prostate cancer

by

Jake Yeung

B.A.Sc. Chemical Engineering, University of Waterloo 2012

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Master of Science

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Genome Science and Technology)

The University Of British Columbia
(Vancouver)

May 2014

© Jake Yeung, 2014

Abstract

Alternative splicing is a tightly regulated process that can be disrupted in cancer. Established cancer genes express splice isoforms with distinct properties and their differential expression is associated with tumour progression. Although prostate adenocarcinoma (PCa) is effectively managed at early stage by therapies targeting the androgen receptor signaling axis, up to 30% of late stage prostate cancers progress to a treatment-resistant form of the disease called neuroendocrine prostate cancer (NEPC), for which there are few therapeutic options. It is histologically distinct from PCa, expresses a neuronal gene signature and is associated with poor survival (<1 year). We hypothesize that alternative splicing has an important role in driving transformation of PCa tumours towards the NEPC phenotype and we seek to identify regulators of aberrant alternative splicing. We integrated a number of bioinformatics tools to investigate alternative splicing in NEPC. Analyzing RNA-Seq data from a patient-derived xenograft model of neuroendocrine transdifferentiation, we compared splicing profiles between NEPC and PCa and identified a set of differentially spliced cassette exons. We found these cassette exons to code for protein segments containing DNA-binding domains, protein-binding regions and posttranslational modification sites. We discovered evolutionarily conserved motifs around intronic regions of the cassette exons and implicated them with RNA recognition motifs of tissue-specific RNA binding proteins.
We corroborated our findings by analyzing RNA-Seq data from a patient-tumour cohort and found recurrent RNA binding proteins associated with cassette exon inclusion. Our integrated analysis suggests that splicing changes between PCa and NEPC are mediated by tissue-specific RNA binding proteins, which may be of therapeutic or diagnostic value.

Preface

This dissertation is original, unpublished, independent work by the author, Jake Yeung.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Glossary
Acknowledgments
1 Introduction
  1.1 Background on prostate cancer
    1.1.1 Cell types of origin and genetic alterations for prostate adenocarcinoma
    1.1.2 Mechanism of androgen action in prostate adenocarcinoma and castration resistance
    1.1.3 Histological and molecular characteristics of neuroendocrine prostate cancer
    1.1.4 Clinically relevant xenograft models of neuroendocrine prostate cancer
  1.2 Background on alternative splicing
    1.2.1 Mechanisms and regulation of alternative splicing
    1.2.2 Alternative splicing in cancer
    1.2.3 Therapeutic potential for targeting alternative splicing and its regulatory factors
  1.3 Scope and objectives of the thesis
2 Methods
  2.1 Three prostate cancer cohorts
    2.1.1 Patient-derived xenograft models, 331 and 331R
    2.1.2 Vancouver Prostate Centre cohort
    2.1.3 Beltran et al. cohort
    2.1.4 Combined human cohort
  2.2 Processing of RNA-Seq data
    2.2.1 Alignment of RNA-Seq reads to the genome
    2.2.2 Quantitating the expression of isoforms from RNA-Seq using MISO
    2.2.3 Detecting differential isoform expression
  2.3 Motif discovery and motif similarity comparisons
    2.3.1 Motif discovery with MEME
    2.3.2 Database of RNA recognition motifs
    2.3.3 Motif similarity comparison with TOMTOM
    2.3.4 Evolutionary conservation of motifs
  2.4 Functional analysis of differentially regulated cassette exons
    2.4.1 Translation from nucleotide to amino acid sequence in local exons
    2.4.2 Sequence annotations of protein segments encoded by exons
    2.4.3 Prediction of protein binding regions
3 Results
  3.1 Detection of differentially spliced events
    3.1.1 Differentially regulated alternatively spliced events
    3.1.2 Cassette exons are activated and repressed in NEPC
    3.1.3 Protein and mRNA expression of differentially regulated cassette exons
  3.2 Function of differentially regulated cassette exons
    3.2.1 Cassette exons code for a wide range of protein domains
    3.2.2 Cassette exons are significantly enriched in predicted binding protein regions
  3.3 Enriched motifs around genic regions of differentially regulated exons
    3.3.1 RNA binding proteins associated with enriched motifs in xenograft model
    3.3.2 Enriched motifs are enriched with evolutionarily conserved elements
  3.4 Corroboration of xenograft model results with a patient-tumour cohort
    3.4.1 Differentially regulated cassette exons in human cohort
    3.4.2 Motif analysis in human cohort corroborates SRRM4 as an RNA binding protein associated with exon inclusion
  3.5 Neural-specific splicing factors are overexpressed in NEPC
    3.5.1 Differentially spliced genes are enriched in previously reported SRRM4-regulated genes
  3.6 Recapitulation of the results
4 Discussion
  4.1 Parallels between neural cell differentiation and neuroendocrine differentiation
  4.2 Relevance of results in the context of prostate cancer
  4.3 Reflection and discussion of the methods
    4.3.1 Quantitation of differentially expressed isoforms using MISO
    4.3.2 Identification of annotated features of cassette exons
    4.3.3 Prediction of protein-binding regions in cassette exons
    4.3.4 Motif discovery and motif similarity analysis
  4.4 Future work
5 Conclusions
Bibliography
A Supporting Materials

List of Tables

Table 2.1 Mapped read pairs across three prostate cancer cohorts.
Table 2.2 Clinical features of frozen NEPC tumours in Beltran et al. cohort that were used in this work.
Table 2.3 Interpretations for Bayes factor according to Kass and Raftery
Table 3.1 Enrichment of SRRM4-regulated genes across cohorts. Fractions show number of genes matching to a list of SRRM4-regulated genes over the total number of DSEs found in specified cohort. Values in parentheses denote p-values from Fisher’s exact test. Overlap contains DSEs common in both xenograft and patient-tumour cohorts. Background genes used for Fisher’s exact test taken from Table A.4
Table A.1 RNA-Seq metadata for 331 and 331R
Table A.6 Uniprot annotations of differentially included exons (331 vs 331R in xenograft cohort)
Table A.7 Uniprot annotations of differentially excluded exons in NEPC (331 vs 331R in xenograft cohort)
Table A.8 Uniprot annotations of differentially included exons (NEPC vs PCa in VPC-Beltran pooled cohort)
Table A.9 Uniprot annotations of differentially excluded exons in NEPC (NEPC vs PCa in VPC-Beltran pooled cohort)
Table A.2 RNA-Seq metadata of patient-tumour samples from VPC cohort. RL and MRP denote read length and mapped read pairs, respectively.
Table A.3 Metadata of patient-tumour samples from Beltran et al. cohort that were sequenced by RNA-Seq using Illumina Genome Analyzer II
Table A.4 Annotated alternative splicing events used in MISO
Table A.5 Genes encoding baits with SRRM4-regulated exons used in LUMIER screens. Taken from supplementary table of study of Ellis et al.

List of Figures

Figure 1.1 Cell types of origin for prostate cancer
Figure 1.2 Advanced prostate adenocarcinoma is initially treated with androgen deprivation therapy but treatment inevitably fails and disease progresses to castration-resistant prostate cancer, of which NEPC is a subtype.
Figure 1.3 RNA binding proteins can encourage inclusion, exclusion or both by binding to cis-regulatory elements that promote either inclusion or exclusion.
Figure 1.4 Alternative isoforms of Bcl-x showing opposing functions. The longer form, Bcl-x(L), promotes anti-apoptotic function. The shorter form, Bcl-x(s), promotes apoptosis.
Figure 2.1 Cartoon of neuroendocrine transdifferentiation in 331 and 331R in the xenograft cohort. Briefly, neuroendocrine phenotype was induced by androgen deprivation and RNA-Seq was performed before castration and after neuroendocrine transdifferentiation was complete. Time frame to reach fully differentiated neuroendocrine state is between five and eight months.
Figure 2.2 Workflow for discovering and matching motifs. Entire exon regions and 100 nucleotide intronic regions were extracted around the cassette exons. Motifs were discovered using MEME with a position-specific prior coming from a background of constitutively spliced exons. Discovered motifs were matched to RNA-recognition motifs using TOMTOM.
Figure 3.1 Differentially regulated alternatively spliced events categorized by type of alternative splicing. Cassette exons are the most represented category and are used for further downstream analyses.
Figure 3.2 548 differentially regulated cassette exons found between 331 (PCa) and 331R (NEPC). Rows represent alternatively spliced events, columns represent samples. Color key represents PSI (Ψ) values. Cassette exons group into two types: 1) cassette exons included in NEPC (n=346) and 2) cassette exons excluded in NEPC (n=202).
Figure 3.3 Differential expression of genes at the mRNA and protein level between 331 and 331R. Positive values represent overexpression in 331R (NEPC) compared to 331 (PCa). (A): differential expression of all genes containing matched protein and mRNA data. Neuronal gene signatures are in the upper right, PCa signatures bottom left. (B): differential expression of only genes containing differentially regulated cassette exon.
Figure 3.4 Functions of differentially regulated cassette exons between 331 and 331R. Annotated gene names are examples of genes with differentially regulated cassette exons coding for interesting protein regions and are discussed in Section 3.2.1. Annotations under arrows denote further description of the UniProt annotation. For example, NASP inclusion exon contains 9 phosphoamino acids and 2 histone-binding regions
Figure 3.5 Sashimi plot of FOXM1 in the xenograft cohort. Reads per kilobase per million (RPKM) represent coverage around the alternatively spliced event. Numerical values linking RPKM coverage represent reads spanning different exon-exon junctions. Ψ estimates from MISO are shown as a probability distribution (right). Upstream, cassette and downstream exon structure shown schematically (below).
Figure 3.6 Sashimi plot of PTPRF (A) and PTPRS (B) in the xenograft cohort. Reads per kilobase per million (RPKM) represent coverage around the alternatively spliced event. Numerical values linking RPKM coverage represent reads spanning different exon-exon junctions. Ψ estimates from MISO are shown as a probability distribution (right). Upstream, cassette and downstream exon structure shown schematically (below).
Figure 3.7 Sashimi plot of FOXM1 in the xenograft cohort. Reads per kilobase per million (RPKM) represent coverage around the alternatively spliced event. Numerical values linking RPKM coverage represent reads spanning different exon-exon junctions. Ψ estimates from MISO are shown as a probability distribution (right). Upstream, cassette and downstream exon structure shown schematically (below).
Figure 3.8 Exons with predicted protein binding regions in the xenograft cohort. Included exons are exons that are included in NEPC but excluded in PCa. Excluded exons are exons that are excluded in NEPC but included in PCa. Number of exons used as input: 346 inclusion, 202 exclusion. Fraction indicates number of protein segments that are predicted for protein binding regions out of the total number of protein segments that were confidently translated from the exon (less than number of input exons because many could not be confidently translated to protein segments).
Figure 3.9 Motifs discovered from NEPC-specific cassette exons in the xenograft cohort. Discovered motifs were identified from 100 nucleotide intronic regions flanking the exons.
Figure 3.10 Motif map of RBPs whose RRMs matched to discovered motifs. P-values obtained through conversion of a score representing overlap of RRM and motif. Colours represent the discovered motifs. Size of the text represents number of DSEs that the discovered motifs matched. Since some RRMs were identified through RNA binding domain similarity (Section 2.3.2), RBPs with shared RRMs are identified using a forward slash ('/').
Figure 3.11 (A): Density plot of GERP scores for discovered motifs versus control sites. Dotted vertical line represents GERP score of 1.5, the threshold over which an element is considered to be conserved. (B): Fraction of intronic regions in the discovered motifs that were considered evolutionarily conserved in the xenograft cohort. Discovered motifs were compared against a control region where a 9mer was randomly chosen from the 100 nucleotide intronic region from which the discovered motif was found.
Figure 3.12 Exons with predicted protein binding regions in the human cohort. Included exons are exons that are included in NEPC but excluded in PCa. Excluded exons are exons that are excluded in NEPC but included in PCa. Number of exons used as input: 146 inclusion, 63 exclusion. Fraction indicates number of protein segments that are predicted for protein binding regions out of the total number of protein segments that were confidently translated from the exon (less than number of input exons because many could not be confidently translated to protein segments).
Figure 3.13 Density plot of intronic location of sites contributing to two discovered motifs found to be associated with known RNA recognition motifs.
Figure 3.14 (A): Density plot of GERP scores for discovered motifs versus control sites. Dotted vertical line represents GERP score of 1.5, the threshold over which an element is considered to be conserved. (B): Fraction of intronic regions in the discovered motifs that were considered evolutionarily conserved in the xenograft cohort. Discovered motifs were compared against a control region where a 9mer was randomly chosen from the 100 nucleotide intronic region from which the discovered motif was found.
Figure 3.15 209 differentially regulated cassette exons found between 331 (PCa) and 331R (NEPC). Rows represent alternatively spliced events, columns represent samples. Color key represents PSI (Ψ) values. Cassette exons group into two types: 1) cassette exons included in NEPC (n=146) and 2) cassette exons excluded in NEPC (n=63).
Figure 3.16 Splicing factors with RRMs matched with discovered motifs. P-values obtained through conversion of a score representing overlap of RRM and motif. Colours represent the discovered motifs. Size of the text represents number of DSEs that the discovered motifs matched.
Figure 3.17 Differential gene expression of RBPs across three prostate cancer cohorts. X- and y-axes represent Benjamini-Hochberg adjusted p-values for Student’s t-test for differential gene expression for VPC and Beltran cohort, respectively. Size of each bubble represents log2 fold change between 331R and 331 in the xenograft cohort. Colours represent over- or under-expression compared to PCa. Gene names shown for genes differentially expressed in any one of the three cohorts.
Figure 3.18 A: overlap of differentially regulated cassette exons between xenograft and human cohorts. B: overlap of genes containing differentially regulated cassette exons between xenograft and human cohorts. Cassette exon events exceed genes containing cassette exons because multiple cassette events can occur on a single gene.
Glossary

PCa prostate adenocarcinoma
ADT androgen deprivation therapy
CRPC castration-resistant prostate cancer
NED neuroendocrine transdifferentiation
NEPC neuroendocrine prostate cancer
AS alternative splicing
PSA prostate-specific antigen
DHT dihydrotestosterone
AR androgen receptor
RNA-Seq RNA Sequencing, also called "Whole Transcriptome Shotgun Sequencing"; uses next-generation sequencing to reveal a snapshot of mRNA abundance across the transcriptome.
RBP RNA binding protein
RRM RNA recognition motif
PWM position weight matrix
GERP Genomic Evolutionary Rate Profiling
RS rejected substitutions
mRNA messenger RNA
pre-mRNA precursor mRNA
snRNP small nuclear ribonucleoprotein particle
ESE exonic splicing enhancer
ESS exonic splicing silencer
ISE intronic splicing enhancer
ISS intronic splicing silencer
SR serine/arginine-rich protein
hnRNP heterogeneous nuclear ribonucleoprotein
DMD Duchenne muscular dystrophy
UniProt Universal Protein. UniProt is the universal protein resource, a central repository of protein data created by combining the Swiss-Prot, TrEMBL and PIR-PSD databases.
VPC Vancouver Prostate Centre
BH Benjamini-Hochberg, a multiple test-correction procedure based on false discovery rates.

Acknowledgments

I would like to express my appreciation to my supervisor, Professor Colin Collins. You have been a tremendous mentor for me. I would like to thank you for pushing me to think of the big picture while allowing me to grow as an independent researcher. You have given me valuable advice for my research as well as for my career. Special thanks to Professor Cenk Sahinalp, for showing me how computer scientists think and communicate, with clarity and precision. I would like to thank Professor Xuesen Dong for insightful discussions on my research and being on my committee. You remind me to always relate my computational results to the complex world of biology.
Your respective expertise was crucial for my research, allowing me to combine computational approaches to study prostate cancer.

I would like to thank all the people in the Collins lab for helping me get up to speed with prostate cancer research and computational biology. Coming from an engineering background, I knew very little about prostate cancer or computational biology in the beginning. Thank you all for being patient with my naive questions as I began my journey into this field. Special thanks to Drs. Anna Lapuk and Stanislav Volik for guiding my research and helping me overcome difficulties in both biological concepts and computational software. To the post-doctoral researchers, Drs. Fan Mo, Alex Wyatt and Nilgün Donmez, thank you for always being willing to help, from computational difficulties and proof-reading abstracts to explaining simple biological phenomena. A bulk of my learning and, consequently, insights into my research, came from conversations with each of you. I would like to thank Robert Bell for the lengthy discussions on statistical theory; Dr. Michael Hsing for your help with motif analysis; Kendric Wang and Rohan Romnarine for biological and computer science discussions; Raunak Shrestha for allowing me to participate in some of your work regarding networks; and Carmen Bayly for your initial work using ANCHOR. Finally, special thanks to Stephane Le Bihan, Dr. Shusuke Akamatsu, Dr. Dong Lin, Sonal Brahmbhatt, Shawn Anderson, Robert Shukin and Brian McConeghy for your memorable conversations and insights.

I would like to thank my research rotation supervisors, who have directly contributed to my development which indirectly contributed to this thesis. I would like to express my appreciation to Professor Jamie Piret for providing mentorship during my first exposure to graduate school. Your calm and thoughtful approach to research has stayed with me during my master’s thesis.
Thank you for being patient with me as I explored how to be a graduate student; I am a significantly better graduate student because of your initial guidance. Special thanks to Professor Michael Kobor and Dr. Anthony Fejes for an enlightening research rotation, allowing me to develop as a proper programmer, writing well-factored and clean code. Thank you for showing me the vast world of open-source software development and epigenetics. I wish there were more time in my research rotations to have taken both of the projects to a more satisfying completion.

Most importantly, I would like to thank my family, Kitty Lam, On Yeung and Leslie Yeung for supporting me along the way. Your encouragement to pursue my passions has been crucial for my development. Special thanks to all of my friends in Vancouver and beyond. Thank you for the memories, for providing balance with my research and for giving me perspective in life.

Finally, I would like to thank the Genome Science and Technology (GSAT) program and funding from NSERC-CREATE. The GSAT program was well designed and overall a great graduate program that gave me the freedom to explore the intersection of different fields, from engineering and computer science to molecular biology and genetics. I was lucky to have had the opportunity to explore each of these fields, which would not have been possible without the generous funding provided by the GSAT and NSERC-CREATE programs.

Chapter 1

Introduction

The introduction covers the background information for this thesis. Outlining the current literature in prostate cancer and alternative splicing, the chapter places the thesis in the context of the current state of the field. This chapter is broken down into several sections:
• Section 1.1 covers background information regarding prostate cancer. The section discusses mechanisms of androgen action and how prostate cancer becomes treatment-resistant.
The section places significant focus on neuroendocrine prostate cancer as an important treatment-resistant subtype of prostate cancer.
• Section 1.2 discusses alternative splicing in the context of disease, focusing on cancer.
• Section 1.3 outlines the scope of the thesis based on current knowledge of neuroendocrine prostate cancer, establishes objectives of the work and summarizes expected outcomes from the objectives.

1.1 Background on prostate cancer

Prostate cancer is the most commonly diagnosed cancer in men and is the second most common cause of male cancer-related deaths in North America [94]. To detect early stages of prostate adenocarcinoma (PCa), prostate-specific antigen (PSA) is used as a biomarker. PCa is a slow growing cancer, with an estimated lag-time of 15 years or more from initial detection of PSA to PCa [11]. When prostate cancer is first diagnosed, it is treated by surgery or radiation. For patients who progress to locally advanced or metastatic disease, androgen deprivation therapy (ADT) is used as treatment [92]. Although locally advanced PCa responds initially to ADT, the disease inevitably progresses to castration-resistant prostate cancer, of which neuroendocrine prostate cancer (NEPC) is a subtype [12].

Mechanistic insight into disease progression requires understanding of frequent genetic alterations and molecular mechanisms leading to castration-resistant prostate cancer (CRPC) and NEPC.
The following subsections address each concept individually:
• Section 1.1.1 discusses cell types of origin and genetic alterations in prostate adenocarcinoma.
• Section 1.1.2 discusses mechanisms of androgen action and the development of castration-resistant prostate cancer.
• Section 1.1.3 focuses on neuroendocrine prostate cancer and its characteristics.
• Section 1.1.4 discusses clinically relevant models of neuroendocrine transdifferentiation.

1.1.1 Cell types of origin and genetic alterations for prostate adenocarcinoma

The prostate epithelium is composed of three differentiated cell types: luminal secretory cells, basal cells and neuroendocrine cells. Prostate adenocarcinoma can derive from luminal or basal epithelial cells from primary human prostate tissue [34, 108] (Figure 1.1). Many prostate cancers contain multiple foci with varying genetic alterations [70]. Genetic alterations in PCa involve large-scale genomic rearrangements and copy number changes across multiple chromosomes, a phenomenon called chromoplexy [6, 91, 112]. A number of critical tumor suppressor genes are often deleted in one or both copies, including PTEN, NKX3.1, TP53 and CDKN1B [11]. TMPRSS2-ERG is an oncogenic gene fusion found in 50% of tumours, making it an important marker for patient stratification [70].

Figure 1.1: Cell types of origin for prostate cancer. (Diagram labels: luminal cells, NE cell, basal cells, transformed luminal cells, transformed basal cells, luminal differentiation, PCa.)

1.1.2 Mechanism of androgen action in prostate adenocarcinoma and castration resistance

Androgens regulate PCa growth through stimulating proliferation and inhibiting apoptosis. The main circulating androgen, testosterone, enters prostate cells and is converted to dihydrotestosterone (DHT) by 5α-reductase enzyme with a yield of 90%. The androgen receptor (AR) binds DHT in the cytoplasm, leading to dimerization and phosphorylation.
The dimerized AR translocates to the nucleus, binds to AR response elements and activates expression of genes in the AR signaling axis, leading to increased growth and survival [30].

Due to the importance of the androgen receptor in the proliferation of PCa cells, locally advanced PCa is initially treated with ADT. However, almost all advanced prostate cancer progresses to CRPC after a period of ADT [11]. Pathways leading to castration resistance are not fully understood. Potential, non-mutually exclusive mechanisms include [30]:

1. Hypersensitivity of AR pathway via AR amplification.
2. Promiscuity of AR mediated by AR mutations, leading to non-specific binding of AR to other ligands.
3. Alternative pathways increasing AR transcription activity, such as growth factor activated pathways and receptor tyrosine kinase activated pathways.
4. Alternative pathways bypassing the AR pathway.
5. Selection of pre-existing castration-resistant epithelial stem cells.

The variety of potential mechanisms leading to castration resistance makes castration-resistant prostate cancer difficult to treat. Nevertheless, there are novel oral agents available to combat castration-resistant prostate cancer such as Abiraterone, Bicalutamide and Enzalutamide. These oral agents all target the androgen signaling axis, a strategy that is ineffective against hormone refractory NEPC. Abiraterone blocks androgen production in tumours, testis and adrenal gland through inhibition of products of the CYP17A1 gene [25]. Enzalutamide blocks androgen receptor function by inhibiting androgen binding to the androgen receptor, nuclear translocation of the androgen receptor and androgen receptor association with nuclear DNA [90]. Certain subtypes of castration-resistant prostate cancer, such as NEPC, are hormone refractory (i.e.
do not express androgen receptor), rendering these anti-androgen therapeutics ineffective.

1.1.3 Histological and molecular characteristics of neuroendocrine prostate cancer

Neuroendocrine prostate cancer (NEPC), a small cell carcinoma of the prostate, is an aggressive subtype of castration-resistant prostate cancer, with most patients dying within one year of diagnosis [73]. In up to 30% of late stage PCas [12], there is evidence of PCa cells acquiring a neuroendocrine phenotype, a process known as neuroendocrine transdifferentiation (NED) [106] (Figure 1.2). Since ADT may promote the development of NEPC, it is anticipated that the incidence of NEPC may increase with the introduction of new anti-androgen therapies in the clinic [13]. Deeper molecular understanding of NEPC is needed to identify novel therapeutic strategies to combat this deadly disease.

NEPC is histologically and molecularly distinct from PCa. Histologically, NEPC resembles small cell carcinomas, large cell neuroendocrine carcinomas or carcinoid tumours of other primary sites [12]. Molecularly, NEPC does not express androgen receptor or PSA and expresses neuroendocrine markers including chromogranin A.

Figure 1.2: Advanced prostate adenocarcinoma is initially treated with androgen deprivation therapy but treatment inevitably fails and disease progresses to castration-resistant prostate cancer, of which NEPC is a subtype. (Diagram labels: PCa, AR+; ADT; treatment failure; NEPC, AR-; neuroendocrine transdifferentiation.)

Although NEPC shares phenotypic similarities with neuroendocrine cells in the prostate epithelium, recent data suggest that NEPC shares a common origin with PCa [73]. TMPRSS2-ERG has been reported in approximately 50% of NEPC, a frequency similar to PCa. This similarity in TMPRSS2-ERG rearrangement between NEPC and PCa suggests transdifferentiation from PCa to NEPC and distinguishes NEPC from small cell carcinomas of other primary sites [13].
Further support for neuroendocrine transdifferentiation came recently from a study by Lin et al. Using patient-derived xenograft tumour mouse models, Lin et al. induced neuroendocrine transdifferentiation from PCa to NEPC through castration and Bicalutamide treatment (a front-line therapy), supporting the hypothesis that NEPC can evolve directly from PCa via an adaptive response.

Despite the common origins of NEPC and PCa, vast differences in the transcriptome have been observed, with over 1000 genes showing differential expression [12]. Downregulation of REST, a transcription factor that represses neuronal differentiation, leads to upregulation of neuronal genes in NEPC [57]. In addition, NEPC-specific splicing has been reported [57], but the regulators of NEPC-specific splicing are poorly understood.

1.1.4 Clinically relevant xenograft models of neuroendocrine prostate cancer

Historically, reliable preclinical models for studying prostate cancer lagged behind those for other cancers such as breast cancer [101]. However, important progress has been made in developing preclinical models that reliably capture the dynamics of prostate cancer. The recent development of patient-derived xenograft tumour mouse models for PCa and castration-resistant prostate cancer subtypes, including NEPC, enables controlled experiments in a clinically relevant setting [60].

The use of patient-derived xenograft mouse models creates a tight feedback loop between preclinical experiments and translational solutions for the clinic. For example, we can induce neuroendocrine transdifferentiation in the xenograft model and perform deep whole transcriptome sequencing to gain mechanistic insights into the process of neuroendocrine transdifferentiation. By sequencing the transcriptome at high depth, we reach a resolution that enables us to tease apart subtle features such as alternative splicing.
The small sample size (n=2) and controlled setting of the xenograft models translate to meaningful, high-resolution data at a manageable cost. The thesis uses xenografts as a discovery cohort, followed by a patient-tumour cohort to corroborate robust findings.

1.2 Background on alternative splicing

1.2.1 Mechanisms and regulation of alternative splicing

Alternative splicing allows the production of multiple messenger RNA (mRNA) isoforms from a single gene, contributing to the vast proteomic diversity in humans. With the rapid development of sequencing technologies, it has been discovered that alternative splicing affects 95% of mammalian genes [52]. Different types of alternative splicing events have been described, including skipped exon, retained intron, alternative 5' splice site, alternative 3' splice site, mutually exclusive exon, alternative first exon, alternative last exon and tandem 3' UTRs [109]. The splicing process is performed by active splicing complexes called spliceosomes. Formation of the spliceosome complex occurs in a step-wise fashion, involving small nuclear ribonucleoprotein particles (snRNPs) and an additional ∼150 proteins that associate with the precursor mRNA (pre-mRNA) through direct recognition of short sequences at the exon/intron boundaries [16].

RNA binding proteins have key roles in determining splicing fate by binding to genic regions of the pre-mRNA (Figure 1.3). It has been shown that much of the information necessary to determine splicing fate is present in the sequence of the pre-mRNA [10]. The RNA binding specificity, however, is often low [107]. For example, Nova binds to short YCAY motifs [104]. The short, degenerate recognition motifs make motif analysis of RNA binding proteins a more challenging task than that of transcription factors, which often have longer recognition motifs. To compensate for the low specificity of RNA binding proteins, tight regulation of splice site selection often relies on cooperative binding of proteins to short redundant RNA motifs [95].
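To make the low specificity of a motif such as YCAY concrete, the following minimal sketch counts YCAY sites in a sequence and estimates how often the motif would occur by chance. It is an illustration only: the function names are hypothetical, and the chance estimate assumes independent, uniformly distributed bases (probability 0.25 each), which real pre-mRNA does not strictly satisfy.

```python
import re

# Y is the IUPAC code for a pyrimidine (C or U). The lookahead lets
# overlapping sites (e.g. UCAUCAC) be counted separately.
YCAY_PATTERN = re.compile(r"(?=[CU]CA[CU])")


def count_ycay(sequence: str) -> int:
    """Count (possibly overlapping) YCAY sites in an RNA or DNA sequence."""
    rna = sequence.upper().replace("T", "U")
    return len(YCAY_PATTERN.findall(rna))


def expected_ycay_sites(length: int) -> float:
    """Expected number of YCAY sites in a random sequence of given length.

    Assumes uniform, independent bases: P(Y) = 2/4, P(C) = P(A) = 1/4,
    so a site starts at any given position with probability 1/64.
    """
    per_position = (2 / 4) * (1 / 4) * (1 / 4) * (2 / 4)
    return max(length - 3, 0) * per_position
```

Under these assumptions a 100-nucleotide intronic window is expected to contain about 1.5 YCAY sites by chance alone, which is one way to see why a single short motif match carries little information on its own.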
A number of RNA binding proteins assist in "guiding" the spliceosome to the correct splice sites by "reading" the sequences present in the pre-mRNA [24]. RNA binding proteins associate with cis-regulatory elements, broadly categorized as exonic splicing enhancers (ESEs), exonic splicing silencers (ESSs), intronic splicing enhancers (ISEs) and intronic splicing silencers (ISSs). Two main families of RNA binding proteins, serine/arginine-rich proteins (SRs) and heterogeneous nuclear ribonucleoproteins (hnRNPs), associate with these cis-regulatory elements to encourage inclusion or exclusion of the exon [19, 52].

In general, SR protein–pre-mRNA interactions tend to encourage exon inclusion whereas hnRNP–pre-mRNA interactions tend to encourage exon exclusion [19]. Changes in local concentration, binding affinity to the pre-mRNA and activity of SR proteins and other RNA binding proteins have been reported to affect alternative splicing [116]. Differential splicing has been reported in knock-downs of RNA binding proteins [18, 26] as well as upon differential phosphorylation of RNA binding proteins [116].

Interestingly, alternative splicing has been shown to be tightly coupled to other gene regulatory layers, including mRNA transcription, turnover, transport, translation and even chromatin structure [16]. Factors regulating alternative splicing affect other layers and, conversely, factors regulating chromatin and transcription complexes affect alternative splicing. To develop a systems-level understanding of alternative splicing control in cancer, a quantitative and genome-wide understanding of alternative splicing, as well as of the crosstalk between it and other regulatory layers, will be needed.
This thesis focuses on one of these layers, alternative splicing, and seeks to elucidate regulators of alternative splicing in NEPC.

[Figure 1.3 (schematic): intronic splicing enhancers (ISE), exonic splicing enhancers (ESE), exonic splicing silencers (ESS) and intronic splicing silencers (ISS) around an exon; outcomes shown are promoting inclusion, promoting exclusion and a mixture of isoforms.]
Figure 1.3: RNA binding proteins can encourage inclusion, exclusion or both by binding to cis-regulatory elements that promote either inclusion or exclusion.

1.2.2 Alternative splicing in cancer

The flexibility for one gene to generate multiple mRNA isoforms allows for a cell- or tissue-specific proteome that can dynamically respond to physiological stimuli [80]. Cancer cells leverage this flexibility to produce proteins that promote growth and survival. Alternative splicing alters many areas of tumour biology, including metabolism, apoptosis, cell cycle control, motility and angiogenesis [24].

Alternatively spliced isoforms have been shown to exhibit differing and even opposing functions. Bcl-x, a transmembrane molecule in the mitochondria, produces two isoforms: Bcl-x(L) and Bcl-x(s) (Figure 1.4). Bcl-x(L) has anti-apoptotic effects whereas Bcl-x(s) promotes apoptosis [14]. Interestingly, high Bcl-x(L) / Bcl-x(s) ratios are observed in cancers, underscoring the importance of isoform ratios for cancer proliferation and survival. Other examples include CD44, FGFRs and Rac1, whose altered isoform ratios promote invasion and metastasis [24].

[Figure 1.4 (schematic): splicing between exon 2 and exon 3 of Bcl-x yields Bcl-x(L), associated with anti-apoptotic function and cell survival, or Bcl-x(s), associated with apoptosis and cell death.]
Figure 1.4: Alternative isoforms of Bcl-x showing opposing functions. The longer form, Bcl-x(L), promotes anti-apoptotic function. The shorter form, Bcl-x(s), promotes apoptosis.

The flexibility of alternative splicing allows cancer cells to adapt and survive. Alternative splicing can remodel protein interaction networks and alter signaling pathways [29, 47].
To develop effective therapies against cancer, it is important to develop a genome-wide understanding of the functions of cancer-specific isoforms and, importantly, to identify regulators promoting the expression of these isoforms.

1.2.3 Therapeutic potential for targeting alternative splicing and its regulatory factors

Given the role of alternative splicing in transcript variation in diseases, there is increasing interest in targeting aberrant splicing as a therapeutic treatment against cancer [24, 36, 105]. In general, there are two approaches to tackling aberrant splicing. The first is to target the differentially spliced isoforms and the second is to target the regulators of splicing.

Targeting aberrant or oncogenic alternatively spliced isoforms could rebalance ratios of isoforms, which may provide a therapeutic effect. One of the best examples of splice modulating therapies is the treatment of Duchenne muscular dystrophy (DMD) with antisense oligonucleotides. This approach is the closest to clinical application, with safety and efficacy studies of antisense oligonucleotides against DMD in progress [51]. Duchenne muscular dystrophy is a severe muscle-wasting disorder caused by mutations in the dystrophin gene. Point mutations and frame-shifting deletions lead to premature truncation of protein translation, leading to loss of function [97]. Antisense oligonucleotides induce skipping of exon 51, restoring function of dystrophin. The successes of two phase I/IIa trials using this therapeutic approach demonstrate the feasibility of modulating splicing in human subjects and offer promise for other splice modulating therapies.

For cancer, one example is the use of modified antisense oligonucleotides to target a splicing enhancer regulating STAT3 exon 23 alternative splicing. The shift from STAT3α to STAT3β expression led to apoptosis and cell-cycle arrest in breast cancer cell lines [115]. The obstacles of off-target effects and drug delivery to entire solid tumours have not yet been addressed.
Splice modulating therapies in cancer have not yet made it to the clinic, but the early success in cell lines gives grounds for optimism about rebalancing ratios of isoforms as a therapeutic strategy against cancer.

The second approach involves targeting the regulatory RNA binding proteins themselves. A number of RNA binding proteins, including SF2/ASF and hnRNPH, have been shown to drive oncogenesis and induce oncogenic splicing [48, 58]. This global approach to correcting aberrant splicing allows the splicing fate of a large number of genes to be affected by targeting a single RNA binding protein. Although targeting splicing factors may ameliorate aberrant splicing of a large number of genes, unexpected side-effects may arise due to changes in splicing of off-target genes. In addition, many RNA binding proteins have functional roles beyond regulation of alternative splicing, including polyadenylation, mRNA stability and translation initiation [89]. It is conceivable that oncogenic RNA binding proteins may be therapeutic targets, but the side-effects of inhibiting or activating RNA binding proteins are not well understood. This is in part because the exact exons they regulate are not well known and in part because we are only beginning to understand the multifunctionality of RNA binding proteins. In this thesis, I focus mainly on the role of RNA binding proteins in alternative splicing.

1.3 Scope and objectives of the thesis

Having discussed the current state of knowledge (to the best of my ability) regarding NEPC and alternative splicing, I now frame my thesis by discussing its scope and objectives.
The scope and objectives of the thesis bring together the current knowledge of NEPC and the current knowledge of alternative splicing, combining them to gain novel insights into alternative splicing and its regulators in NEPC.

By comparing the transcriptome profiles of PCa and NEPC, NEPC-specific signatures may be discovered and mechanistic insights into NEPC biology may be gained. I hypothesize that NEPC-specific RNA binding proteins drive NEPC-specific splicing and that these NEPC-specific exons are involved in important functions that contribute to the NEPC phenotype. To test this hypothesis, I first investigate alternative splicing changes in the xenograft model [60]. I discover alternatively spliced events and use a motif analysis to link RNA binding proteins to the events. To corroborate the results, I analyze RNA-Seq data from a human cohort published by Beltran et al. [13] and a human cohort from the Vancouver Prostate Centre [57]. I break down the objectives as follows:

• Identify NEPC-specific splicing signatures at the whole transcriptome level.
• Identify functions of differentially regulated cassette exons in the xenograft and the human cohorts.
• Infer the effect of differentially regulated cassette exons on protein-protein interactions.
• Discover enriched motifs in genic regions around the differentially regulated cassette exons.
• Create a database of known RNA recognition motifs for RNA binding proteins.
• Identify RNA recognition motifs that match the motifs discovered around the cassette exons. This will provide candidate RNA binding proteins linked to differential splicing.
• Quantify the gene expression of each RNA binding protein in the database across the three cohorts to infer whether the candidate RNA binding proteins are over- or underexpressed.

The outcomes of the objectives will provide deeper insight into the role of alternative splicing in NEPC and also identify candidate regulators that may control NEPC-specific splicing.
The comprehensive approach provides an unbiased method of associating RNA binding proteins with differential splicing. Significant dimensionality reduction can be achieved by reducing a set of 154 RNA binding proteins to a set of tractable candidates for further analyses.

Chapter 2

Methods

A major objective of the thesis is to identify potential regulatory splicing factors on a genome-wide scale. To accomplish this, I use principles of computational biology, leveraging a wide range of established methods to analyze RNA-Seq data and elucidate alternative splicing and its regulation. The methods range from mapping RNA-Seq reads and quantitating alternative splicing to discovering motifs and predicting protein-protein binding regions.

This section describes the methods used to identify regulatory splicing factors. Deeper reflection on the methods can be found in Section 4.3, where I analyze strengths and weaknesses of the methods in the context of the results. The methods are divided into the following sections:

1. Section 2.1: Description of the cohorts analyzed.
2. Section 2.2: Processing of RNA-Seq data.
3. Section 2.3: Motif discovery and similarity comparisons.
4. Section 2.4: Functions of differentially regulated exons.

2.1 Three prostate cancer cohorts

In total, PCa and NEPC samples across three prostate cancer cohorts were analyzed. The three prostate cancer cohorts are summarized in Table 2.1. I discuss details of each cohort in separate sections.

Table 2.1: Mapped read pairs across three prostate cancer cohorts.

Cohort                      # PCa   # NEPC   Avg mapped read pairs
Xenografts: 331 & 331R      1       1        285 × 10^6 / sample
Patient tumours: VPC        4       5        83 × 10^6 / sample
Beltran et al., 2011        27      4        22 × 10^6 / sample

[Figure 2.1 (cartoon): 331 (PCa) becomes 331R (NEPC) after castration.]
Figure 2.1: Cartoon of neuroendocrine transdifferentiation in 331 and 331R in the xenograft cohort.
Briefly, the neuroendocrine phenotype was induced by androgen deprivation, and RNA-Seq was performed before castration and after neuroendocrine transdifferentiation was complete. The time frame to reach the fully differentiated neuroendocrine state is between five and eight months.

2.1.1 Patient-derived xenograft models, 331 and 331R

One PCa-NEPC cohort is a patient-derived xenograft model, 331 (PCa) and 331R (NEPC), generated by Lin et al. The NEPC sample, 331R, was generated through chemical castration of the 331 PCa sample. After chemical castration, the tumour transdifferentiates from PCa to NEPC within a time frame of five to eight months (Figure 2.1). RNA-Seq was performed on the Illumina HiSeq. Metadata of the 331 and 331R samples are summarized in Table A.1.

2.1.2 Vancouver Prostate Centre cohort

The patient-tumour samples from the Vancouver Prostate Centre (VPC) cohort consist of 4 PCa samples and 5 NEPC samples. The cohort is a subset of a larger VPC cohort described by Lapuk et al. A significant fraction (80%, 4/5) of the NEPC tumours were metastases taken from other organs and lymph nodes. The PCa samples were primary tumours from the prostate. RNA-Seq was performed on the Illumina Genome Analyzer II. Metadata of the VPC cohort samples are summarized in Table A.2.

2.1.3 Beltran et al. cohort

The prostate cancer cohort from the study of Beltran et al. consisted of 7 NEPC and 30 PCa samples, but in this work 4 NEPC and 27 PCa samples from the cohort were used. Three PCa samples (2525A, 2620D and 3034C51) were excluded because they contained mixed read lengths. Three NEPC samples (4240, 7520 and 8740) were excluded because they had histological features of mixed PCa and NEPC, which may affect alternative splicing signatures. Clinical features of the remaining 4 NEPC tumours are summarized in Table 2.2. A summary of mapped read pairs is shown in Table A.3. Note the shallower RNA-Seq coverage in the Beltran et al. cohort compared with the xenograft and VPC cohorts (Table 2.1).
Its possible consequences are discussed in Section 4.3.

Table 2.2: Clinical features of frozen NEPC tumours in the Beltran et al. cohort that were used in this work.

ID     Age at Dx   Tissue      Histology   PSA IHC   SYP IHC
7820   NA          Met         NEPC        NEG       POS
7800   48          Prostate    NEPC        NEG       POS
7821   NA          Met         NEPC        NEG       POS
8220   NA          Xenograft   NEPC        NEG       POS

2.1.4 Combined human cohort

In order to obtain a larger number of statistically significant differentially regulated AS events, I combined the VPC cohort with the Beltran et al. cohort, generating a human cohort of 40 tumours (31 primary PCa, 9 primary or metastatic NEPC). Although the human cohort consists of shallow RNA-Seq data, signal that is consistent between the xenograft cohort (a purer tumour) and the human cohort may yield robust results that hold in both.

2.2 Processing of RNA-Seq data

2.2.1 Alignment of RNA-Seq reads to the genome

Paired-end RNA Sequencing (RNA-Seq) reads were aligned to the human genome (hg19) [32] with TopHat [102] using the default settings of the software, generating outputs in BAM format for each sample. Differential gene expression analysis was performed using DESeq [5]. The xenograft cohort was sequenced on the Illumina HiSeq. The VPC and Beltran et al. human cohorts were sequenced on the Illumina Genome Analyzer II. Mapping statistics, read lengths and metadata of the xenograft model, the VPC cohort and the Beltran et al. cohort are provided in Tables A.1, A.2 and A.3, respectively.

2.2.2 Quantitating the expression of isoforms from RNA-Seq using MISO

MISO [50] was used to quantify the expression level of alternatively spliced genes from RNA-Seq. MISO classifies alternatively spliced events into separate types, including cassette exons (skipped exons), mutually exclusive exons, alternative 5' splice site, alternative 3' splice site and retained introns.
The number of annotated events for each type of alternative splicing is summarized in Table A.4.

In brief, MISO seeks to calculate the 'Percent Spliced Isoform' values Ψ, representing the relative abundances of different isoforms in a gene. Given N isoforms, Ψ_i is defined broadly as the abundance of isoform i (α_i) as a fraction of the total abundance of all N isoforms (\sum_{k=1}^{N} \alpha_k):

$$ \Psi_i = \frac{\alpha_i}{\sum_{k=1}^{N} \alpha_k} \tag{2.1} $$

By restricting to only two isoforms, which is the case applicable to cassette exons, mutually exclusive exons, alternative 5' splice sites, alternative 3' splice sites and retained introns, Equation 2.1 simplifies to Equation 2.2:

$$ \Psi_1 = \frac{\alpha_1}{\alpha_1 + \alpha_2} \tag{2.2} $$

In the specific example of cassette exons, where one isoform includes the alternative exon and the other isoform excludes it, Ψ_inclusion can be calculated using Equation 2.3:

$$ \Psi_{\text{inclusion}} = \frac{\alpha_{\text{inclusion}}}{\alpha_{\text{inclusion}} + \alpha_{\text{exclusion}}} \tag{2.3} $$

and Ψ_exclusion can easily be obtained knowing that the Ψ values must sum to 1:

$$ \Psi_{\text{exclusion}} = 1 - \Psi_{\text{inclusion}} \tag{2.4} $$

A priori, longer isoforms are more likely to be sampled. The probability of sampling a read from an mRNA increases approximately linearly with the length of the mRNA [50]. Conversely, given a read that could map to either the longer or the shorter isoform, it is more likely (all other things being equal and under a uniform sampling assumption) to have originated from the shorter isoform, since the shorter isoform has fewer starting positions. To balance these factors, MISO incorporates latent information from exon bodies, read lengths and the length of the RNA isoform. MISO uses Bayesian inference to assign reads to a particular isoform, generating a posterior probability distribution for the Ψ estimates.
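As a quick numerical illustration of Equations 2.1 to 2.4, the sketch below computes Ψ values directly from isoform abundances. This is only the arithmetic of the definitions, not MISO's Bayesian estimation; the function names and the example abundances are hypothetical.

```python
from typing import List, Sequence


def psi_values(abundances: Sequence[float]) -> List[float]:
    """Equation 2.1: each isoform's abundance as a fraction of the total."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("total isoform abundance must be positive")
    return [alpha / total for alpha in abundances]


def psi_inclusion(alpha_inclusion: float, alpha_exclusion: float) -> float:
    """Equation 2.3: the two-isoform case for a cassette exon."""
    return psi_values([alpha_inclusion, alpha_exclusion])[0]


# Hypothetical abundances: 30 units of inclusion isoform, 10 of exclusion.
inclusion = psi_inclusion(30.0, 10.0)   # 0.75
exclusion = 1.0 - inclusion             # Equation 2.4
```

With these example abundances, Ψ_inclusion is 0.75 and Ψ_exclusion is 0.25. In practice the abundances are not observed directly, which is why MISO infers a posterior distribution over Ψ from the reads.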
Differential isoform expression was detected using Bayes factors in the xenograft cohort (two-way comparison) and the Student's t-test (grouped comparison) in the human cohorts.

2.2.3 Detecting differential isoform expression

Detection of differentially spliced events implicitly assumes that the genes are sufficiently expressed and that they are not differentially expressed. Therefore, events were filtered to retain only those in sufficiently expressed genes (normalized read count ≥ 1000, by DESeq) that were not differentially expressed (0.25 < reads_A / reads_B < 4) between the two samples A and B. In addition, the total number of reads assigned to both isoforms must be greater than or equal to 20, with at least 1 read supporting each isoform (i.e. an event with 20 reads assigned to the first isoform and 0 reads assigned to the second isoform would be filtered out).

To detect differential isoform expression between two samples A and B using Bayes factors, MISO uses a two-sided point null hypothesis test:

$$ H_0 : \Psi_A - \Psi_B = 0 \qquad H_1 : \Psi_A - \Psi_B \neq 0 \tag{2.5} $$

MISO calculates the Bayes factor K to choose between the two competing hypotheses:

$$ K = \frac{P(D \mid H_1)\,P(H_1)}{P(D \mid H_0)\,P(H_0)} \tag{2.6} $$

K can be interpreted as the ratio of the weight of the evidence in data D supporting H_1 to the weight of the evidence supporting H_0. Interpretations of the magnitude of K are provided by Kass and Raftery and summarized in Table 2.3.

Table 2.3: Interpretations of the Bayes factor according to Kass and Raftery.

K           Strength of evidence
1 to 3      Not worth more than a bare mention
3 to 20     Positive
20 to 150   Strong
>150        Very strong

In this work, an event was considered differentially spliced between two samples A and B if:

$$ |\Psi_A - \Psi_B| \geq 0.3, \qquad K \geq 10 \tag{2.7} $$

which provides a filter for both statistical significance and practical significance.

In group analysis, samples with low read counts were filtered out for each event independently.
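The two-sample criteria of Section 2.2.3 (the expression filters, the read-support filter and Equation 2.7) can be sketched as a single predicate. This is a minimal illustration under stated assumptions, not the thesis pipeline: the `PairwiseEvent` container and its field names are hypothetical, the expression threshold is assumed to apply to each sample separately, and the isoform read counts are assumed to be event-level totals.

```python
from dataclasses import dataclass


@dataclass
class PairwiseEvent:
    """Hypothetical container for one splicing event compared in samples A and B."""
    psi_a: float            # Psi estimate in sample A
    psi_b: float            # Psi estimate in sample B
    bayes_factor: float     # K reported for the A-versus-B comparison
    gene_count_a: float     # normalized gene read count in A
    gene_count_b: float     # normalized gene read count in B
    reads_isoform1: int     # reads assigned to the first isoform
    reads_isoform2: int     # reads assigned to the second isoform


def is_differentially_spliced(ev: PairwiseEvent,
                              min_delta_psi: float = 0.3,
                              min_bayes_factor: float = 10.0,
                              min_gene_count: float = 1000.0,
                              min_event_reads: int = 20) -> bool:
    # Gene must be sufficiently expressed (assumed: in both samples).
    if ev.gene_count_a < min_gene_count or ev.gene_count_b < min_gene_count:
        return False
    # Gene must not itself be differentially expressed (within 4-fold).
    ratio = ev.gene_count_a / ev.gene_count_b
    if not (0.25 < ratio < 4):
        return False
    # At least 20 reads in total, with at least 1 read per isoform.
    total_reads = ev.reads_isoform1 + ev.reads_isoform2
    if total_reads < min_event_reads or min(ev.reads_isoform1, ev.reads_isoform2) < 1:
        return False
    # Equation 2.7: practical and statistical significance.
    return (abs(ev.psi_a - ev.psi_b) >= min_delta_psi
            and ev.bayes_factor >= min_bayes_factor)
```

Keeping the thresholds as keyword arguments makes it easy to check how sensitive the number of called events is to each cutoff.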
For each event, the sum of the reads assigned to isoform 1 and isoform 2 must be greater than or equal to 10 for each sample; otherwise, the sample was not considered in the Student's t-test. This means that different events may be tested on different numbers of samples, depending on whether the samples had sufficient reads in the event.

To detect differential isoform expression in a group of samples, the posterior distribution for each calculated Ψ value, P(Ψ|D), was represented by its median value. Differential isoform expression was determined by Student's t-test assuming equal variance, as implemented by the SciPy Python module. In group analysis, an event is considered differentially spliced if:

$$ |\Psi_A - \Psi_B| \geq 0.3, \qquad \text{BH-corrected } p\text{-value} \leq 0.1 \tag{2.8} $$

2.3 Motif discovery and motif similarity comparisons

All motif analyses were performed with tools from the MEME Suite [8]. Motif discovery was performed with MEME [7] and motif similarity comparison was performed with TOMTOM [37]. The section is broken down into separate subsections:

1. Section 2.3.1 discusses motif discovery.
2. Section 2.3.2 discusses the compilation of an RNA recognition motif (RRM) database.
3. Section 2.3.3 discusses matching discovered motifs with motifs in the RRM database.
4. Section 2.3.4 discusses evolutionary conservation.

2.3.1 Motif discovery with MEME

Motif discovery was performed with MEME [7]. Nucleotide sequences of the cassette exon, the upstream and downstream constitutive exons, and 100-nucleotide regions of the introns were used to independently discover motifs, as shown in Figure 2.2. This creates nucleotide sequences in seven regions: upstream constitutive exon, cassette exon, downstream constitutive exon, 5' upstream intron, 3' upstream intron, 5' downstream intron and 3' downstream intron. Motifs were constrained to be between 5 and 9 nucleotides long, the minimum and maximum length of an RNA recognition motif, respectively.
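The seven regions listed above can be derived from the coordinates of the three exons. The sketch below is one possible way to do so, not the code used in the thesis. It assumes plus-strand, 0-based, half-open intervals; it reads "5' upstream intron" as the 5' end of the upstream intron (adjacent to the upstream exon), in line with how the background introns are described in this section; and it clips the 100-nucleotide windows to the intron when the intron is short. The function and key names are hypothetical.

```python
from typing import Dict, Tuple

# Assumed convention: 0-based, half-open [start, end) on the plus strand.
Interval = Tuple[int, int]


def seven_regions(upstream_exon: Interval,
                  cassette_exon: Interval,
                  downstream_exon: Interval,
                  flank: int = 100) -> Dict[str, Interval]:
    """Return the seven genic regions around a cassette exon."""
    u_start, u_end = upstream_exon
    c_start, c_end = cassette_exon
    d_start, d_end = downstream_exon
    if not (u_start < u_end <= c_start < c_end <= d_start < d_end):
        raise ValueError("exons must be ordered and non-overlapping")
    return {
        "upstream_exon": (u_start, u_end),
        # First `flank` nt of the upstream intron, clipped to the intron.
        "upstream_intron_5prime": (u_end, min(u_end + flank, c_start)),
        # Last `flank` nt of the upstream intron, clipped to the intron.
        "upstream_intron_3prime": (max(c_start - flank, u_end), c_start),
        "cassette_exon": (c_start, c_end),
        "downstream_intron_5prime": (c_end, min(c_end + flank, d_start)),
        "downstream_intron_3prime": (max(d_start - flank, c_end), d_start),
        "downstream_exon": (d_start, d_end),
    }
```

For minus-strand genes the same logic would apply after mirroring the coordinates, with the extracted sequences reverse-complemented before motif discovery.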
Discovered motifs were considered statistically significant for E-value ≤ 10^-10. Motifs from the seven genic regions were discovered independently. This was performed for two sets of differentially spliced events: 1) those with the cassette exon included in NEPC versus PCa and 2) those with the cassette exon excluded in NEPC versus PCa.

To de-emphasize the strength of constitutive splicing motifs, such as GUAAGUA at the 5' splice site and CAG at the 3' splice site [64], prior knowledge of these constitutive splicing signals must be directly incorporated into the motif discovery model. This attenuates constitutive splicing signals and strengthens differential splicing signatures in the motif discovery. To incorporate this prior knowledge, a set of constitutively spliced cassette exons described by Buljan et al. was used as a background to generate position-specific priors. Briefly, a position-specific prior incorporates auxiliary information and translates it into a measure of the likelihood that a motif starts at each position in each sequence in the input [9]. To recapitulate the seven regions, the 100 nucleotides downstream of the background exon were considered the 5' intron and the 100 nucleotides upstream were considered the 3' intron.

[Figure 2.2 (workflow schematic): entire exons and 100 bp intronic windows around the cassette exon are used to discover motifs with MEME, with a background from similar genic regions; the discovered motifs (two example sequence logos, E-values 3.1 × 10^-19 and 5.1 × 10^-13) are then matched to the RNA-recognition motif database with TOMTOM.]
Figure 2.2: Workflow for discovering and matching motifs. Entire exon regions and 100-nucleotide intronic regions were extracted around the cassette exons. Motifs were discovered using MEME with a position-specific prior derived from a background of constitutively spliced exons. Discovered motifs were matched to RNA-recognition motifs using TOMTOM.
No distinction was made between the upstream and downstream intron in the background.

2.3.2 Database of RNA recognition motifs

To match discovered motifs with RRMs, a database of 103 RRMs spanning 50 RNA binding proteins (RBPs) was compiled. All RRMs were stored in position weight matrix (PWM) format. Background nucleotide frequencies were 0.25 for A, C, G and U. The source E-value was defaulted to zero. RRMs from 49 of the 50 RBPs were derived from the work of Ray et al. RRMs derived from direct experimental evidence were used whenever possible. If no direct RRMs were found for an RBP, motifs from a related RBP were considered if the RBP identity was greater than or equal to 0.7, as described by Ray et al.

SRRM4 did not have an associated direct or indirect RRM in the work of Ray et al. However, Calarco et al. discovered 9 motifs enriched in intronic regions around SRRM4-regulated cassette exons. The SRRM4-associated motifs were converted from sequence logos into PWMs and added to the database.

2.3.3 Motif similarity comparison with TOMTOM

Discovered motifs from Section 2.3.1 (Figure 2.2) were compared against a custom database of RRMs (Section 2.3.2) using TOMTOM [37]. Reverse complement matches were ignored. Motif matches were considered statistically significant if the Benjamini-Hochberg (BH)-corrected p-value (q-value) ≤ [...]

2.3.4 Evolutionary conservation of motifs

Evolutionarily constrained elements were detected using Genomic Evolutionary Rate Profiling (GERP) [35]. Briefly, GERP identifies constrained elements in multiple alignments by quantifying substitution deficits, i.e. substitutions that would have occurred under non-selective conditions but did not occur due to functional constraint. This quantitative measure is called rejected substitutions (RS), a term coined by the authors of GERP [35]. RS can be interpreted as the expected rate of substitutions minus the observed rate of substitutions. A higher RS implies greater functional constraint.
Thus, a motif can be considered evolutionarily conserved if the region has a sufficiently high RS value. In practice, an RS score threshold of 2 provides high sensitivity while still strongly enriching for constrained sites [54].

2.4 Functional analysis of differentially regulated cassette exons

Since the differentially spliced genes were sufficiently expressed in both PCa and NEPC, the focus was localized to the exon level rather than the gene level. I inferred functions of the exons using two methods: 1) I analyzed the sequence annotations (features) of the protein segments that the exons encode (Section 2.4.2) and 2) I predicted protein-protein binding regions in disordered protein segments (Section 2.4.3). Both methods require translating the nucleotide sequence of the exon to its amino acid sequence, which is discussed in Section 2.4.1.

2.4.1 Translation from nucleotide to amino acid sequence in local exons

Correctly translating the nucleotide sequence of an exon to its corresponding amino acid sequence requires high-fidelity reconstruction of the mRNA transcript. Unfortunately, since mRNA-Seq technology involves random sampling of fragments of mRNA, reconstruction of the entire mRNA transcript is not trivial. In light of this difficulty, I used a localized approach for translating cassette exon nucleotide sequences to amino acid sequences.

Reading frames of exons were taken from the annotations of Ensembl 72. If multiple reading frames were annotated for an exon, all possible reading frames were considered. The reading frame was used to translate the nucleotide sequences to their corresponding amino acid sequences.

The constitutively spliced exons from Buljan et al. were from an older version of the human genome (hg18), so Ensembl 54 was used for translating these nucleotide sequences to amino acid sequences.
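The localized translation described above can be sketched as follows. This is an illustrative stand-in rather than the thesis code: it uses the standard genetic code, and it expresses the reading frame as a hypothetical `offset` parameter (the number of leading exon nucleotides to skip before the first complete codon), which would have to be derived from the annotated frame of the exon. Trailing nucleotides that do not complete a codon are dropped.

```python
from itertools import product
from typing import Dict, Iterable

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_exon(exon_seq: str, offset: int = 0) -> str:
    """Translate an exon locally, skipping `offset` (0, 1 or 2) leading bases.

    Codons containing characters other than A, C, G, T/U yield 'X'.
    """
    if offset not in (0, 1, 2):
        raise ValueError("offset must be 0, 1 or 2")
    dna = exon_seq.upper().replace("U", "T")[offset:]
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def translate_all_frames(exon_seq: str, offsets: Iterable[int]) -> Dict[int, str]:
    """Translate once per annotated frame when an exon has several."""
    return {offset: translate_exon(exon_seq, offset) for offset in offsets}
```

Because the exon is translated in isolation, residues encoded by codons that span an exon boundary are not recovered by this approach.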
All data mapping to hg19 used Ensembl 72 for reading frame annotations.

2.4.2 Sequence annotations of protein segments encoded by exons

Sequence annotations (features) were downloaded from the Universal Protein Resource (UniProt). Release version 2013-11 of the UniProt annotations was used. All sequence annotations in segments of the protein that overlap the exonic regions were considered. The sequence annotations were then grouped based on whether the exon was included or excluded in NEPC.

2.4.3 Prediction of protein binding regions

Prediction of protein binding regions was performed using ANCHOR [71]. ANCHOR identifies disordered protein segments that may undergo a disorder-to-order transition upon binding to a globular protein partner. Disordered regions containing 6 or more residues with ANCHOR probability value ≥ 0.5 were predicted to be protein binding regions.

Chapter 3

Results

The flow of the results follows a test and validation cohort procedure, where the xenograft model is used as a test cohort and the human cohort is used as a validation cohort for corroborating robust findings. The results are broken down into several sections:

1. Section 3.1 describes differentially regulated AS events found in the xenograft model.
2. Section 3.2 describes the potential functional role of AS exons by predicting protein binding regions and by exploring the annotated protein regions for which the exons code.
3. Section 3.3 describes a motif discovery and motif similarity strategy to match known RNA recognition motifs (RRMs) of RNA binding proteins (RBPs) to enriched motifs near AS events.
4. Section 3.4 corroborates the results from the xenograft model with a human cohort to identify SRRM4 as a recurrent RBP associated with exon inclusion.
5. Section 3.5 shows SRRM4 and other neural-specific splicing factors to be upregulated in both the xenograft model and the human cohort.
6. Section 3.6 recapitulates the results section.

3.1 Detection of differentially spliced events

3.1.1 Differentially regulated alternatively spliced events

Using MISO, I quantitated alternatively spliced events and identified those that were differentially regulated between NEPC and PCa. A total of 1137 differentially regulated AS events representing 876 genes were found in the xenograft cohort (Figure 3.1). PHF21A and SPTAN1, previously known differentially regulated AS events in NEPC, were rediscovered and correctly categorized as mutually exclusive and cassette, respectively [57]. Cassette exons represented 48% (548/1137) of differentially regulated AS events, making them the most represented category among the identified AS events. Taken together, a large number of genes are affected by alternative splicing and, in particular, by cassette exons. The cassette exons are used in downstream analyses, where a sufficiently large sample size is required to draw meaningful conclusions from the results.

[Figure 3.1 (bar chart): number of differentially regulated AS events (y-axis, 0 to 600) by event type (cassette, mutually exclusive, alternative 5', alternative 3', retained intron).]
Figure 3.1: Differentially regulated alternatively spliced events categorized by type of alternative splicing. Cassette exons are the most represented category and are used for further downstream analyses.

3.1.2 Cassette exons are activated and repressed in NEPC

To quantify the extent to which an exon is included, percentage spliced in (PSI or Ψ) values were calculated for each cassette exon event and for each sample independently. Ψ denotes the fraction of mRNA fragments that represent the inclusion isoform; Ψ = 1 and Ψ = 0 signify complete activation and complete repression of the cassette exon, respectively. Cassette events were found to split into two groups: 1) cassette exons activated in NEPC and 2) cassette exons repressed in NEPC (Figure 3.2).
These two groups are distinct in the context of NEPC biology, since activated exons suggest a potentially longer isoform whereas repressed exons suggest a potentially shorter isoform. The two groups, activated and repressed, are therefore analyzed separately in downstream analyses.

3.1.3 Protein and mRNA expression of differentially regulated cassette exons

To assess whether alternatively spliced genes may have altered expression at the protein or mRNA level, we integrated proteomic data from label-free mass spectrometry and gene expression from RNA-Seq to assess differential expression in two dimensions. Interestingly, genes with cassette events did not have significant changes at either the mRNA or the protein level (Figure 3.3A) compared to putative NEPC markers, including SCGN and TAGLN3 (Figure 3.3B). Taken together, the results suggest the cassette exons are not affecting protein or mRNA abundance.

3.2 Function of differentially regulated cassette exons

In Section 3.1.3, I found no significant differential expression of genes with differentially regulated cassette exons. In this section, I analyze cassette exons by translating the exon mRNA segment to a protein segment. This allows me to do two things, each discussed in its own subsection:

1. Map protein segments to annotated protein segments (Section 3.2.1)
2. Predict protein binding regions in protein segments (Section 3.2.2)

Figure 3.2: 548 differentially regulated cassette exons found between 331 (PCa) and 331R (NEPC). Rows represent alternatively spliced events, columns represent samples. Color key represents PSI (Ψ) values. Cassette exons group into two types: 1) cassette exons included in NEPC (n=346) and 2) cassette exons excluded in NEPC (n=202).

Figure 3.3: Differential expression of genes at the mRNA and protein level between 331 and 331R. Positive values represent overexpression in 331R (NEPC) compared to 331 (PCa). (A): differential expression of all genes containing matched protein and mRNA data. Neuronal gene signatures are in the upper right, PCa signatures bottom left. (B): differential expression of only genes containing differentially regulated cassette exons.

3.2.1 Cassette exons code for a wide range of protein domains

Mapping protein segments to the UniProt database, I found cassette exons to map to a wide range of protein domains, suggesting that cassette exons affect a wide range of protein functions (Figure 3.4). The inclusion or exclusion of a cassette exon encoding a feature could be inferred, in some cases, as gain or loss of the feature, respectively. Since the function of these exons does not appear to be to change expression of mRNA or protein (Figure 3.3B), I look at the possibility of the exons coding for interesting protein regions.

Examples of exons coding for three protein regions are discussed:

1. Exons coding for DNA binding domains, using FOXM1 as a specific example.
2. Exons coding for phosphoamino acids, using NASP as a specific example.
3. Protein segments in extracellular regions, using protein tyrosine phosphatases as a specific example.

FOXM1 is a transcription factor that plays a role in cell proliferation and regulates the expression of cell cycle genes important for DNA replication and mitosis. Other roles include repair of DNA breaks and DNA damage checkpoint response [31, 62, 99]. In NEPC, FOXM1 differentially expresses a cassette exon compared to PCa (Ψ̄_NEPC = 0.83, Ψ̄_PCa = 0.38, Figure 3.5). Interestingly, the cassette exon is in the DNA-binding region of the protein (amino acid positions 235-327) and contains a phosphoserine, suggesting altered protein-DNA interaction between FOXM1 and its response elements.
Indeed, FOXM1 activity has been reported to be modulated through tissue-specific alternative splicing [56, 114]. Further, emerging evidence suggests FOXM1 is an important transcription factor that is disrupted in cancer [46, 55, 56, 67, 111].

Protein segments located in extracellular regions are of particular interest in cancer, since they may be isoform-specific regions targetable by monoclonal antibodies. In NEPC, two protein tyrosine phosphatases, PTPRF and PTPRS, have an activated exon (Ψ̄_PTPRF,NEPC = 0.76, Ψ̄_PTPRS,NEPC = 0.77) that codes for a protein segment located in the extracellular domain and that is repressed in PCa (Ψ̄_PTPRF,PCa = 0.01, Ψ̄_PTPRS,PCa = 0.05). Coverage plots (sashimi plots) of PTPRF and PTPRS show a significant increase of the cassette exon in NEPC relative to PCa (Figure 3.6). There has been growing interest in targeting the protein tyrosine phosphatase family due to their roles in regulating a number of cellular processes including receptor tyrosine kinase activity, proliferation, cell-cell adhesion and invasion [4, 44, 78]. The NEPC-specific region in the extracellular domain of PTPRF and PTPRS suggests potential isoform-specific targets against NEPC.

Figure 3.4: Functions of differentially regulated cassette exons between 331 and 331R. Annotated gene names are examples of genes with differentially regulated cassette exons coding for interesting protein regions and are discussed in Section 3.2.1. Annotations under arrows denote further description of the UniProt annotation. For example, the NASP inclusion exon contains 9 phosphoamino acids and 2 histone-binding regions.

NASP is a histone-binding protein responsible for transporting H1 linker histones into the nucleus of dividing cells [86]. Further, NASP has been shown to be required for cell proliferation [87]. We found the T-exon of NASP to be differentially included in NEPC relative to PCa (Figure 3.7).
Interestingly, the differentially included T-exon in NEPC is 338 amino acids long, coding for nine phosphoamino acids (7 phosphoserines and 2 phosphothreonines) and two histone binding regions. Isoforms with the T-exon (tNASP), which are abundant in NEPC, have also been reported to be abundant in embryonic and tumour cells [85, 87]. The tNASP isoform suggests a connection between alternative splicing and histone modifications linked to cell cycle. Taken together, the mapping of protein segments to UniProt annotations provides an efficient method to infer altered functions of differentially regulated cassette exons.

[Figure 3.5 panel: sashimi plot of the FOXM1 A1 cassette exon, with inclusion and exclusion isoforms and the forkhead DNA binding domain marked on the protein.]
Figure 3.5: Sashimi plot of FOXM1 in the xenograft cohort. Reads per kilobase per million (RPKM) represent coverage around the alternatively spliced event. Numerical values linking RPKM coverage represent reads spanning different exon-exon junctions. Ψ estimates from MISO are shown as a probability distribution (right). Upstream, cassette and downstream exon structure shown schematically (below).

3.2.2 Cassette exons are significantly enriched in predicted protein binding regions

To investigate whether differentially regulated exons are important for protein-protein interactions, cassette exons were analyzed for predicted protein binding regions in disordered regions using ANCHOR [28, 71]. Differentially included and excluded cassette exons were found to be significantly enriched for predicted protein binding regions for the xenograft and patient-tumour cohorts, suggesting that the cassette exons may be important for rewiring protein interaction networks (Figure 3.8). Altered protein interactions have been previously suggested by Ellis et al. and Buljan et al. in human tissues, corroborating the results found in NEPC. The activation or repression of protein binding regions may be yet another mechanism by which alternative splicing shapes the NEPC phenotype.

Figure 3.6: Sashimi plot of PTPRF (A) and PTPRS (B) in the xenograft cohort. Reads per kilobase per million (RPKM) represent coverage around the alternatively spliced event. Numerical values linking RPKM coverage represent reads spanning different exon-exon junctions. Ψ estimates from MISO are shown as a probability distribution (right). Upstream, cassette and downstream exon structure shown schematically (below).

[Figure 3.7 panel: NASP T-exon region (AA 137 to AA 475), with inclusion and exclusion isoforms, phosphorylation sites and histone binding regions marked.]
Figure 3.7: Sashimi plot of NASP in the xenograft cohort. Reads per kilobase per million (RPKM) represent coverage around the alternatively spliced event. Numerical values linking RPKM coverage represent reads spanning different exon-exon junctions. Ψ estimates from MISO are shown as a probability distribution (right). Upstream, cassette and downstream exon structure shown schematically (below).

Figure 3.8: Exons with predicted protein binding regions in the xenograft cohort. Included exons are exons that are included in NEPC but excluded in PCa. Excluded exons are exons that are excluded in NEPC but included in PCa. Number of exons used as input: 346 inclusion, 202 exclusion. Fraction indicates number of protein segments that are predicted to contain protein binding regions out of the total number of protein segments that were confidently translated from the exon (less than the number of input exons because many could not be confidently translated to protein segments).

3.3 Enriched motifs around genic regions of differentially regulated exons

3.3.1 RNA binding proteins associated with enriched motifs in xenograft model

I sought to uncover potential regulators of splicing changes in NEPC.
First, we used MEME to discover motifs using genic regions of included and excluded exons, separately; a discriminative prior consisting of matched genic regions from constitutively spliced exons was used to emphasize motifs relevant to differential splicing [9]. Second, to assess which RBPs were associated with the discovered motifs, we matched motifs to a database of RNA recognition motifs (RRMs) of 156 RBPs using TOMTOM [37]. Two motifs (one upstream and one downstream of NEPC-specific cassette exons) matching RNA recognition motifs were identified from motif discovery analysis (Figure 3.9). Motif similarity analysis between discovered motifs and the database of 156 RNA-binding proteins (Section 2.3.2) identified 11 RRMs belonging to 15 RBPs associating with the discovered motifs (Figure 3.10). Interestingly, SRRM4, PTBP2 and ELAVL3 are tissue-specific RNA binding proteins known to regulate splicing in the nervous system [23, 59]. These tissue-specific RNA binding proteins may have similar roles in activating NEPC-specific exons. ELAVL3, also known as HuC, is part of a family of RNA binding proteins whose neuronal members (ELAVL2, ELAVL3, ELAVL4) play important roles in neuronal plasticity and differentiation [1, 2, 77]. Interestingly, it has been shown that ELAVL proteins and TIA1, an RNA binding protein that also enhances exon inclusion, compete for binding to the same U-rich sequence in an intronic splicing enhancer region [117], consistent with the appearance of TIA1 and ELAVL3 in Figure 3.10.

RNA recognition motifs of RNA binding proteins that repress exons were also discovered. The PTB family members PTBP1 and PTBP2, which were inferred to have the same RRM from RNA binding domain similarity (denoted PTBP1/2), and their paralog, PTBP3, are known repressors of many exons [96]. They have been shown to repress exons by binding to intronic regions and inhibiting the binding of U2AF, a ubiquitous splicing factor, leading to disruption of the assembly of the spliceosome.
These RNA binding proteins may also be involved in competitive binding with activators of exons including SRRM4, an NEPC-specific RNA binding protein. The presence of SRRM4 in NEPC may reduce the exon repressive effects from the PTB family and lead to increased exon inclusion. Indeed, SRRM4, PTBP1 and PTBP2 have been reported to be involved in a regulatory circuit leading to inclusion of neural-specific alternative exons [18].

[Figure 3.9 panels: MEME sequence logos of the poly(C) motif (Motif 1, 55 sites) and the pyrimidine-rich motif (Motif 2, 133 sites); reported E-values 3.1 × 10^-19 and 5.1 × 10^-13; distribution of sites contributing to the discovered motifs across the 100 nt flanking intronic regions.]
Figure 3.9: Motifs discovered from NEPC-specific cassette exons in the xenograft cohort. Discovered motifs were identified from 100 nucleotide intronic regions flanking the exons.

A number of poly(C) binding proteins were associated with a discovered motif upstream of the cassette exon. Poly(C) binding proteins 2, 3 and 4 (PCBP2, PCBP3, PCBP4) and hnRNP K are poly(C)-binding proteins that were associated with the upstream discovered motif. They share a triple KH domain, are linked by their poly(C) binding and are multifunctional [68]. PCBP2 and hnRNP K have been reported to be involved in alternative splicing [15, 22].

PCBP3 and PCBP4 have not yet been reported to be involved in alternative splicing. Although they have similar KH domains and poly(C) binding specificities, they localize to the cytoplasm and have not been reported to shuttle into the nucleus [88].
Further work will be needed to elucidate whether PCBP3 and PCBP4 localize to the nucleus in different cellular contexts.

[Figure 3.10 plot: RBP labels (PCBP2, PCBP3/4, hnRNPK, U2AF2, SRRM4, TIA1, ROD1, PTBP1/2, ELAVL1/3, CPEB3/4) placed along the exon-intron structure (upstream exon, upstream intron 5' and 3', cassette exon, downstream intron 5' and 3', downstream exon) against −log10 p-value; legend: motif ID (Motif 1, Motif 2) and number of sites.]
Figure 3.10: Motif map of RBPs whose RRMs matched discovered motifs. P-values obtained through conversion of a score representing overlap of RRM and motif. Colours represent the discovered motifs. Size of the text represents number of DSEs that the discovered motifs matched. Since some RRMs were identified through RNA binding domain similarity (Section 2.3.2), RBPs with shared RRMs are identified using a forward slash ('/').

3.3.2 Enriched motifs are enriched with evolutionarily conserved elements

To further infer functional significance of the discovered motifs, I calculated the GERP score for each of the 55 and 133 intronic sites contributing to the poly(C) and pyrimidine-rich motif, respectively. Comparing with a set of controls (Section 2.3.4), we found the sites were significantly enriched in evolutionarily conserved elements. I previously identified the enriched motifs as associating with RRMs of RBPs (Section 3.3.1). The conservation analysis further supports the functional significance of the enriched motifs.

3.4 Corroboration of xenograft model results with a patient-tumour cohort

Following the identification of a number of RNA binding proteins associated with exon inclusion in the xenograft cohort, I corroborate my results by analyzing a human cohort consisting of 31 PCa primary tumours and 9 NEPC tumours. The method used parallels the one used in the xenograft cohort. The results are divided into subsections:

1. Section 3.4.1 identifies 209 differentially regulated cassette exons.
2. Section 3.4.2 discusses enriched motifs matching RNA recognition motifs of RNA binding proteins. SRRM4 is rediscovered in the human cohort, corroborating the xenograft results.
3. Section 3.5.1 analyzes overlap of differentially regulated cassette exons and identifies enrichment of SRRM4-regulated genes.

3.4.1 Differentially regulated cassette exons in human cohort

To identify robust findings consistent with human tumours, I identified a set of differentially regulated cassette exons in a human cohort consisting of 31 PCa primary tumours and 9 NEPC tumours (Figure 3.15). A total of 209 differentially regulated cassette exons were identified in the human cohort, fewer than the 548 cassette exons identified in the xenograft model. The smaller number of identified cassette exons could have several causes:

1. Noise from impurity of NEPC tumours (e.g. stromal cells)
2. Noise from metastatic locations of the NEPC tumours (e.g. non-prostate tissue)
3. Noise from low transcriptome coverage

Figure 3.11: (A): Density plot of GERP scores for sites in discovered motifs versus control sites. Dotted vertical line represents GERP score of 1.5, the threshold over which an element is considered to be conserved. (B): Fraction of intronic regions in the discovered motifs that were considered evolutionarily conserved in the xenograft cohort. Discovered motifs were compared against a control region where a 9mer was randomly chosen from the 100 nucleotide intronic region from which the discovered motif was found.

The 209 cassette exons were categorized into two groups: 1) activated in NEPC (n=146) and 2) repressed in NEPC (n=63). The activated and repressed exons were independently analyzed to identify enriched motifs and associate them with RNA binding proteins.
When these two groups of cassette exons were analyzed for predicted protein binding regions, the activated exons were significantly enriched for predicted binding regions (Fisher's exact test p-value: 4.32 × 10^-3, Figure 3.12A), whereas the repressed exons were not (Fisher's exact test p-value: 1.88 × 10^-1, Figure 3.12B), likely due to the smaller sample size of the repressed exons (n_activated = 146, n_repressed = 63).

3.4.2 Motif analysis in human cohort corroborates SRRM4 as an RNA binding protein associated with exon inclusion

Three motifs associated with exon inclusion were discovered in the human cohort: 1) a poly(C) motif in the 5' splice site of the intron upstream of the cassette exon, 2) a purine-rich motif in the 3' splice site of the upstream intron and 3) a motif in the 3' splice site of the downstream intron (Figure 3.13). Conservation analysis showed that the discovered motifs were enriched with evolutionarily conserved elements (Figure 3.14).

Interestingly, SRRM4 was matched in both the human cohort (Figure 3.16) and the xenograft cohort (Figure 3.10). This further suggests that SRRM4 is an important RNA binding protein regulating the inclusion of cassette exons in NEPC. SRRM4 may indeed be important in NEPC through a connection with REST, a transcriptional repressor of neuronal genes. It has been previously reported that SRRM4 induces inclusion of an exon of REST, producing an inactive form and leading to derepression of neuronal genes in mammalian systems [43, 61, 76, 82].

Other RNA binding proteins matching enriched motifs in the human cohort include hnRNP H1, H2 and F (hnRNP H1/H2/F, inferred to have the same RRM from RNA binding domain similarity). hnRNP H1/H2/F have been shown to act together, mostly to activate exon inclusion [3, 42]. hnRNP H1 is known to be involved in neural-specific pre-mRNA splicing and modulates splicing fate of important transcripts such as the apoptotic mediator Bcl-x* [20, 33, 38, 72].
hnRNP H1 was identified as associated with exon inclusion in NEPC via discovered motifs in the intron region, which is consistent with previously published studies showing that hnRNP H1 binds to intronic regions to activate exons [69]. hnRNP proteins are abundantly expressed across many tissues [42]; their association with differential exon inclusion suggests these ubiquitously expressed RNA binding proteins may be involved in determining tissue-specific splicing fate.

Figure 3.12: Exons with predicted protein binding regions in the human cohort. Included exons are exons that are included in NEPC but excluded in PCa. Excluded exons are exons that are excluded in NEPC but included in PCa. Number of exons used as input: 146 inclusion, 63 exclusion. Fraction indicates number of protein segments that are predicted to contain protein binding regions out of the total number of protein segments that were confidently translated from the exon (less than the number of input exons because many could not be confidently translated to protein segments).

Lastly, RNA binding proteins of the serine/arginine-rich family, SRSF1, SRSF4 and SRSF6, were found to be associated with enriched motifs in the human cohort. SRSF1, SRSF4 and SRSF6 have been previously implicated in a number of cancers [21, 24, 74]. Intriguingly, CLK4, a gene with NEPC-specific inclusion of an exon containing a phosphoserine (Table A.9), has been reported to phosphorylate SRSF4 and SRSF6 [75]. Differential phosphorylation analysis will be needed to determine whether the phosphorylation status of these SR proteins is affected.

The recapitulation of SRRM4 as a potential activator of NEPC-specific exon inclusion in the human cohort supports previous findings in Section 3.3.1. The differing RNA-binding proteins found in the human cohort versus the xenograft cohort are consistent with the different DSEs found between the xenograft and the human cohort.
Overall, the motif analysis has proved to be an efficient method for associating cassette exons with a large collection of RBPs in an unbiased and genome-wide manner, reducing the dimensionality to a handful of biologically relevant candidates for functional analyses.

[Figure 3.13 panels: MEME sequence logos of Motif 1 (33 sites), Motif 2 (32 sites) and Motif 3 (51 sites); distribution of sites contributing to the discovered motifs across 100 nt intronic regions.]
Figure 3.13: Density plot of intronic location of sites contributing to the discovered motifs found to be associated with known RNA recognition motifs.

3.5 Neural-specific splicing factors are overexpressed in NEPC

After identification of a number of RNA binding proteins associated with exon inclusion, I sought to identify differential expression of the RNA binding proteins that may be associated with differentially regulated cassette exons.

Gene expression analysis of 156 RBPs across the xenograft and human cohort showed extensive and consistent differential expression of key RBPs (Figure 3.17). A number of differentially expressed RBPs are neural-specific, including SRRM4 [18], ELAVL3, ELAVL4 [40], CELF3, CELF4, CELF5 [53] and NOVA2 [104]. These RBPs have been reported to be important in neurogenesis and brain-specific splicing. The narrower range in BH-adjusted p-values in the VPC cohort may be due to the smaller sample sizes (5 NEPC, 4 PCa) compared to the Beltran cohort (4 NEPC, 27 PCa). Nevertheless, the consistent overexpression suggests that neural-specific RBPs may also be important for NEPC-specific splicing. One of the most significant neural-specific RBPs found to be overexpressed in NEPC was SRRM4. Importantly, SRRM4 was also identified in two independent motif analyses (Sections 3.3.1 and 3.4.2).
The overexpression of SRRM4 as well as its identification in independent motif analyses strongly suggests that SRRM4 is an important RNA binding protein associated with exon inclusion.

3.5.1 Differentially spliced genes are enriched in previously reported SRRM4-regulated genes

Comparing the overlap of cassette exons found between the xenograft model and the human cohort, only 16% (34/209) of cassette exon events in the human cohort overlapped with the xenograft cohort (Figure 3.18). To reconcile the small overlap, we investigated whether SRRM4 may be a common regulator of alternative splicing in NEPC in both cohorts (Table 3.1). I took 29 genes whose exons have been reported to be regulated by SRRM4 [29] and asked whether these genes were enriched in differentially spliced genes (DSGs) in the xenograft model (471 DSGs), human cohort (180 DSGs) and overlap (41 DSGs). Of the 29 known SRRM4-regulated genes (Table A.5), 7 were differentially spliced in the xenograft model (Fisher's exact test p-value < 0.001), 8 were differentially spliced in the human cohort (p-value < 10^-5) and 5 were differentially spliced in both (p-value < 10^-6). Despite the small overlap between cassette exons found between the xenograft model and the human cohort, we found consistent enrichment of SRRM4-regulated genes in both cohorts, further corroborating our results that SRRM4 may be a potential regulator of alternative splicing in NEPC.

Figure 3.14: (A): Density plot of GERP scores for discovered motifs versus control sites. Dotted vertical line represents GERP score of 1.5, the threshold over which an element is considered to be conserved. (B): Fraction of intronic regions in the discovered motifs that were considered evolutionarily conserved in the human cohort. Discovered motifs were compared against a control region where a 9mer was randomly chosen from the 100 nucleotide intronic region from which the discovered motif was found.

Figure 3.15: 209 differentially regulated cassette exons found between PCa (adenocarcinoma) and NEPC (neuroendocrine) patient tumour samples (40 samples). Rows represent alternatively spliced events, columns represent samples. Color key represents PSI (Ψ) values. Cassette exons group into two types: 1) cassette exons included in NEPC (n=146) and 2) cassette exons excluded in NEPC (n=63).

[Figure 3.16 plot: RBP labels (SRRM4, SRSF1, SRSF4/6, hnRNPH1/H2/F, PCBP3/4, PCBP2) placed along the exon-intron structure against −log10 p-value; legend: motif ID (Motif 1, Motif 2, Motif 3) and number of sites.]
Figure 3.16: Splicing factors with RRMs matched with discovered motifs. P-values obtained through conversion of a score representing overlap of RRM and motif. Colours represent the discovered motifs. Size of the text represents number of DSEs that the discovered motifs matched.

[Figure 3.17 plot: labelled genes PABPN1L, RBM4B, ESRP2, RBM47, ELAVL3, KHDRBS1, CSDA, SRSF4, PCBP3, RBM38, LIN28A, RBM24, A1CF, FUS, NOVA2, MBNL1, IGF2BP3, RBFOX1, CPEB2, RBMS3, RBMX, CELF3, DAZAP1, SRRM4; axes: −log10 BH-adjusted p-value for the Beltran cohort and for the VPC cohort; legend: log2 fold change (331R vs 331) and fold-change direction (overexpressed, underexpressed).]
Figure 3.17: Differential gene expression of RBPs across three prostate cancer cohorts. X- and y-axes represent Benjamini-Hochberg adjusted p-values for Student's t-test for differential gene expression for VPC and Beltran cohort, respectively. Size of each bubble represents log2 fold change between 331R and 331 in the xenograft cohort. Colours represent over- or under-expression compared to PCa. Gene names shown for differentially expressed genes in any one of the three cohorts.

[Figure 3.18 panels: Venn diagrams of xenograft versus human cohorts. A: 514 xenograft-only, 34 shared and 175 human-only cassette exons. B: 430 xenograft-only, 41 shared and 139 human-only genes; 5 SRRM4-regulated genes found among the shared genes.]
Figure 3.18: A: overlap of differentially regulated cassette exons between xenograft and human cohorts. B: overlap of genes containing differentially regulated cassette exons between xenograft and human cohorts. Cassette exon events exceed genes containing cassette exons because multiple cassette events can occur on a single gene.

Table 3.1: Enrichment of SRRM4-regulated genes across cohorts. Fractions show number of genes matching a list of SRRM4-regulated genes over the total number of DSEs found in the specified cohort. Values in parentheses denote p-values from Fisher's exact test. Overlap contains DSEs common to both xenograft and patient-tumour cohorts. Background genes used for Fisher's exact test taken from Table A.4.

Type                 Xenografts      Patient-tumour         Overlap
Cassette             7/471 (0.002)   8/180 (6.64 × 10^-6)   5/41 (4.10 × 10^-7)
Mutually exclusive   4/117 (0.007)   1/68 (0.35)            1/10 (0.06)

3.6 Recapitulation of the results

An integrated analysis, combining a number of established bioinformatics tools from alignment of reads to motif discovery, allowed for systematic characterization of NEPC-specific splicing, splicing regulation and their functional consequences. I have identified an NEPC subtype-specific splicing signature, characterized the predicted functions of the regulated exons, discovered motifs that may be associated with NEPC-specific splicing and computationally predicted NEPC-specific splicing factors that may play a role in differential splicing.

Taken together, the results show interactions between different regulatory layers including protein-DNA, protein-RNA and protein-protein networks.
Differential gene expression analysis of RNA binding proteins suggested that splicing differences found between NEPC and PCa could be regulated in part by altered abundances of RNA binding proteins. Analysis of the differentially regulated exons at the protein level showed that they code for a number of functions including DNA-binding regions, phosphorylation sites and membrane-spanning regions. Differentially spliced genes with distinct phosphorylation sites or membrane-spanning regions suggest these genes may be targetable in an isoform-specific manner. Motif discovery and similarity analysis with RRMs linked differential splicing with tissue-specific RNA binding proteins. This links differentially expressed RNA binding proteins to differentially regulated cassette exons. I robustly identified SRRM4 in two independent cohorts as an overexpressed RNA binding protein associated with exon inclusion in NEPC. Its functional significance is further corroborated through identification of enrichment of SRRM4-regulated genes in both the xenograft and the human cohorts.

Chapter 4
Discussion

This chapter reflects upon the results in Chapter 3. Section 4.1 proposes a model of how alternative splicing may be regulated in NEPC by drawing parallels with neural cell differentiation. Section 4.2 puts the results into the context of the field of prostate cancer. Section 4.3 reflects upon the methods used to obtain the results and discusses caveats and cautions related to each method. Section 4.4 discusses future work that could extend my results.

4.1 Parallels between neural cell differentiation and neuroendocrine differentiation

Identifying splicing factors associated with NEPC-specific exon inclusion using the global and unbiased approach (i.e. discovered motifs were matched against the entire database of RBPs) strongly suggests biological relevance and provides strong evidence for further functional validation.
It is tempting to draw parallels between the mechanisms of neural cell differentiation and the mechanisms of neuroendocrine differentiation, since many of the same players are involved. In mice, SRRM4 is required for the expression of REST target genes and neural cell differentiation [83]. REST directly inhibits SRRM4 in nonneural cells. The presence of a number of factors including PTBP2 and SRRM4 facilitates a neural exon network [18].

In NEPC, previous work identified REST to be downregulated and PHF21A to be alternatively spliced, leading to derepression of neural genes [57]. My work introduces an additional layer of complexity by implicating SRRM4 as overexpressed and associated with cassette exon inclusion in NEPC.

With striking parallels between neural cell differentiation and neuroendocrine differentiation, it is tempting to speculate that PCa undergoes a similar transition during the stresses of androgen deprivation. However, since neural cell differentiation involves proliferating neural stem and progenitor cells, it raises the question of whether PCa reverts to a transient state of plasticity in order to transdifferentiate into NEPC. The mechanisms that allow PCa to transdifferentiate are poorly understood, but may be elucidated through global analysis of different time points between terminally differentiated PCa and terminally differentiated NEPC. Nonetheless, this thesis underscores the extensive changes in alternative splicing in NEPC. Future work unraveling alternative splicing during the transition phase of NEPC (no longer PCa but not quite NEPC) may reveal critical players hidden in my analysis of NEPC vs. PCa.

4.2 Relevance of results in the context of prostate cancer

My work demonstrates the vast amount of biological information that is largely unexplored in differential gene expression or differential protein expression studies. Differential splicing affects a large number of non-differentially expressed genes.
By taking into account alternative splicing in RNA-Seq data, we provide a more comprehensive picture of cancer. Indeed, the large number of different functions coded in cassette exons underscores the importance of considering not only the expression of a gene, but also the ratio of its isoforms.

Alternative splicing is only one of many regulatory mechanisms used by cancer cells to promote growth and survival [39, 79]. In addition, work at the interface of epigenetics and RNA biology has revealed complex links between histone modifications, DNA methylation, the transcription machinery, non-coding RNAs and alternative splicing [66, 93]. We are heading towards an increasingly integrated analysis of high-dimensional biological data, combining layers of regulatory mechanisms to understand how cancer cells gain growth and survival advantages. In order to understand how regulatory layers are coupled with each other, a deep understanding of each layer is required. My work is a step towards understanding the role of alternative splicing in NEPC. The alternative splicing framework can be used to layer additional complexities to provide a global picture of neuroendocrine transdifferentiation.

The xenograft model provides a system that can be perturbed experimentally while recapitulating the complexity of human tumours [101]. Using a patient-derived xenograft model of PCa and NEPC forms a tight feedback loop between computational prediction and experimental validation. Further, deep sequencing of PCa and NEPC in the xenograft model reduces variance, allowing precise assessment of nuanced transcriptomic features such as alternative splicing [50, 100]. 
By using the insights provided by deep sequencing of PCa and NEPC xenograft models and corroborating findings in human cohorts, we can make robust discoveries to elucidate cancer biology and identify clinically relevant targets for further validation.

4.3 Reflection and discussion of the methods

The extent to which the results can be interpreted depends greatly on the methods used. Methods matter. This section is split into subsections that discuss separately the methods used in my work. Alignment (TopHat) and differential gene expression analysis (DESeq) tools are not discussed because their implementation was mainly the work of the post-doctoral researcher, Fan Mo. This section breaks down into the following subsections:

1. Section 4.3.1: discussion on alternative splicing identification.
2. Section 4.3.2: discussion on annotated functional features of cassette exons.
3. Section 4.3.3: discussion regarding prediction of protein binding regions.
4. Section 4.3.4: discussion on motif analysis.

4.3.1 Quantitation of differentially expressed isoforms using MISO

The discovery of RNA binding proteins associated with differential splicing between PCa and NEPC hinges upon the ability to robustly quantitate alternative splicing, specifically cassette exons. Identifying full-length isoforms from short-read RNA-Seq data is a challenging task, since we do not know all possible isoforms of each gene. My work ignores full-length isoform reconstruction and focuses on more local analysis, specifically cassette exons. Thus, my analysis can be described as 'exon-centric', whereas a 'transcript-centric' approach remains to be tackled in prostate cancer.

A number of tools for detecting alternative splicing from short-read RNA-Seq data are open-source and available. Reviews and comparisons of a number of these tools have been published [41, 45]. Even in local analysis of alternative splicing (e.g. 
identification of cassette exons), statistical models must be incorporated and tailored for RNA-Seq technology, since fragments from transcripts are randomly sampled.

I used MISO to perform my alternative splicing analysis [50] because it was ideal for two-way comparisons in the xenograft cohort (331 versus 331R). In MISO, the use of Bayesian inference to compute the probability that a read originated from a particular isoform allows statistically robust analysis in two-way comparisons. One drawback of MISO is that it lacks statistical methods for handling groups of samples. To mitigate this weakness, I implemented my own statistical analysis to detect differential splicing in the human cohort. A comparable tool, Cuffdiff 2, incorporates the added statistical power of multiple samples and/or biological replicates along with prediction of novel splicing events [103], but lacks the rich information obtained from the two-way comparisons of MISO. Since MISO's strength lies in two-way analysis, I used MISO on the xenograft model. The human cohort was then used to corroborate the results.

Due to the large number of possible isoforms a gene may express, the detection of alternative splicing requires a large number of fragments from transcripts to be sampled (i.e. requires high coverage). In a recent human study, >400 million reads (100 bp, paired-end) were needed to detect 80% of differential splicing between two conditions [63]. By comparison, the xenograft cohort was sequenced to an average depth of ∼285 million reads (100 bp, paired-end), whereas the VPC cohort and the Beltran et al. cohort were sequenced to average depths of ∼59 million reads (50 bp, paired-end) and ∼22 million reads (54 bp, paired-end), respectively. 
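As a rough illustration of the depth comparison above, the following Python sketch converts the quoted read counts and read lengths into total sequenced bases and expresses each cohort as a fraction of the >400 million read benchmark from [63]. It is only a back-of-the-envelope calculation: the dictionary and function names are hypothetical, the read counts are taken as reported without adjusting for how paired-end reads were counted, and total sequenced bases are used as a crude proxy for coverage.

```python
# Back-of-the-envelope comparison of sequencing depth (illustrative only).
# Read counts and lengths are the approximate values quoted in the text;
# all names here are hypothetical and not taken from the thesis code.

datasets = {
    # name: (approximate number of reads, read length in bp)
    "benchmark [63]": (400e6, 100),
    "xenograft": (285e6, 100),
    "VPC": (59e6, 50),
    "Beltran et al.": (22e6, 54),
}


def sequenced_bases(n_reads, read_length):
    """Total sequenced bases, used here as a crude proxy for coverage."""
    return n_reads * read_length


benchmark_bases = sequenced_bases(*datasets["benchmark [63]"])

# Fraction of the benchmark's sequenced bases reached by each dataset.
relative_depth = {
    name: sequenced_bases(*params) / benchmark_bases
    for name, params in datasets.items()
}

for name, fraction in relative_depth.items():
    print(f"{name}: {fraction:.3f} of benchmark bases")
```

Under these assumptions, the xenograft cohort reaches roughly 70% of the benchmark, whereas the two human cohorts fall below 10%. Because the benchmark is a lower bound (>400 million reads), these fractions are upper bounds.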
The differing read lengths (100 bp versus 50 bp) as well as the differing sequencing depths may account for the differing results between the xenograft and human cohorts.

4.3.2 Identification of annotated features of cassette exons

One of the difficulties in inferring possible functional domains of cassette exons is translating the nucleotide sequence derived from short-read RNA-Seq data to its corresponding amino acid sequence. Without reconstruction of the full-length isoform, it is impossible to determine the reading frame in which the cassette exon is translated. This is an additional limitation of using local approaches for alternative splicing detection, previously discussed in Section 4.3.1.

To mitigate this limitation, I incorporated genome annotations from Ensembl into the analysis. Every cassette exon that was annotated in Ensembl was considered and all of its annotated reading frames were used for translation to amino acid sequence. By considering all annotated reading frames, this method identifies only the upper bound of the number of protein domains the cassette exons can code for (Section 3.2). In reality, the cassette exons may code for only a subset of the identified protein domains.

4.3.3 Prediction of protein-binding regions in cassette exons

Protein-protein interactions can be categorized into different types based on their binding affinities [81]. Broadly, high binding affinity interactions tend to be strong and irreversible whereas low binding affinity interactions may be transient. My work did not consider the entire continuum of protein interactions, but focused on a narrow part of it.

The binding regions predicted by ANCHOR are restricted to transient interactions mediated by linear motifs [28]. Specifically, ANCHOR only considers potential binding sites in predicted disordered regions [71]. 
Therefore, the number of regions predicted in Section 3.2 can be considered a lower bound of the number of protein interactions mediated by cassette exons.

The subset of transient protein interactions predicted by ANCHOR may be biologically interesting. The smaller interface areas of the binding regions and weaker affinities may be important for forming and breaking interactions readily. Indeed, these types of protein interactions are frequently involved in signaling and other transient processes [98]. However, further computational and functional analysis is needed to uncover the nuanced signaling and regulatory roles these predicted binding regions may play.

4.3.4 Motif discovery and motif similarity analysis

Comprehensive motif analysis filtered 154 RNA binding proteins (RBPs) to only 14 RBPs and 10 RNA recognition motifs (RRMs). This unbiased identification of candidate RBPs can be applied in other alternative splicing analyses that seek candidate RBPs for further study. To link RBPs to DSEs, a database of RRMs and their corresponding RBPs needs to be built. The quality of the database has a direct effect on motif similarity analysis. The database contains both direct and indirect (inferred) RRMs. One caveat of incorporating indirect RRMs is that there may be multiple RBPs mapping to a single RRM. I illustrate this caveat with an example. CELF3 is an RBP that does not have a directly determined RRM. However, CELF3 has RNA-binding domain identity of 0.829, 0.819 and 0.706 with BRUNOL5, BRUNOL6 and BRUNOL4, respectively. Therefore, CELF3 has three inferred RRMs: those of BRUNOL5, BRUNOL6 and BRUNOL4. Consequently, if a motif matches BRUNOL5, it also matches CELF3. I caution that the association of an RBP with a set of differentially included cassette exons is only a first approximation; further functional analysis (e.g. CLIP-Seq) may resolve the ambiguity caused by high RNA-binding domain identity between a number of the RBPs. 
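The CELF3 caveat can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this work: the identity values are those quoted above, while the 0.70 identity cutoff for inferring an RRM and all variable and function names are assumptions made for the example.

```python
# Illustrative sketch of how inferred (indirect) RRMs cause one motif match
# to map to several RBPs. Identity values are those quoted in the text;
# the cutoff and every name below are assumptions for illustration.

IDENTITY_CUTOFF = 0.70  # assumed threshold for inheriting an RRM

# RBPs with a directly determined RRM (each RRM is labelled by its RBP).
direct_rrms = {"BRUNOL4", "BRUNOL5", "BRUNOL6"}

# RNA-binding domain identity between an RBP without a direct RRM
# and RBPs that do have one.
domain_identity = {
    "CELF3": {"BRUNOL5": 0.829, "BRUNOL6": 0.819, "BRUNOL4": 0.706},
}


def build_rrm_to_rbps(direct, identity, cutoff):
    """Map each RRM to every RBP it represents, directly or by inference."""
    rrm_to_rbps = {rrm: {rrm} for rrm in direct}
    for rbp, neighbours in identity.items():
        for donor, ident in neighbours.items():
            if donor in rrm_to_rbps and ident >= cutoff:
                rrm_to_rbps[donor].add(rbp)
    return rrm_to_rbps


rrm_to_rbps = build_rrm_to_rbps(direct_rrms, domain_identity, IDENTITY_CUTOFF)

# A discovered motif that matches the BRUNOL5 RRM implicates both RBPs.
candidates = sorted(rrm_to_rbps["BRUNOL5"])
print(candidates)  # ['BRUNOL5', 'CELF3']
```

The mapping is deliberately one-to-many: a similarity hit against a single RRM returns a set of candidate RBPs, which is why the resulting associations should be read as a first approximation.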
Nevertheless, the motif analysis is a powerful tool for predicting key regulators of alternative splicing in a global and unbiased manner.

The motif analysis focuses on proximal genic regions (≤100 nt) to identify regulators. Recent studies have identified binding in distal intronic regions (>500 nt) [65]. Further, emerging evidence suggests that RNA secondary structure plays a role in alternative splicing [27, 110, 113]. The increasing complexity of pre-mRNA splicing beyond what has been considered in my work, which focused on linear motifs proximal to cassette exons, encourages further studies to unravel the intricacies of alternative splicing.

4.4 Future work

The work presented in this thesis underscores the extensive alternative splicing changes between PCa and NEPC. Importantly, the work also identifies regulators that may contribute to these changes. There are a number of extensions of my work that would be significant for further understanding NEPC.

The RBPs predicted to be involved in exon inclusion are candidates for functional experiments in the wet lab. CLIP-Seq analyses of candidate RBPs may reveal cooperative regulation between tissue-specific and ubiquitously expressed RBPs. The small subset of RBPs identified (14/154) means it is experimentally tractable to obtain CLIP-Seq data to analyze the interactions between all 14 RBPs. Indeed, Huelga et al. have published a study exploring the role of 7 RBPs in the hnRNP family [42].

Additional computational analysis should explore the functions of alternative 5' and 3' splice sites, retained introns and mutually exclusive exons. The functional consequence of these types of alternative splicing events is largely unexplored at the whole-transcriptome level. 
This will give a more complete picture of alternative splicing in NEPC and PCa.

The splicing signatures between PCa and NEPC have been identified for the xenograft model, but the alternative splicing fate of the tumour during neuroendocrine transdifferentiation is not well understood. Exploring alternative splicing during the transition phase of neuroendocrine transdifferentiation may provide critical insights into the dynamics of alternative splicing during androgen deprivation. The rich information from two-way comparisons using MISO makes it particularly suitable for investigating alternative splicing in the xenograft model.

Elucidating the role of alternative splicing in neuroendocrine transdifferentiation opens the possibility of integrating other gene regulatory layers. By combining DNA methylation, histone modification, proteomic and phosphoproteomic data with alternative splicing, a systems-level view of the biology will emerge. Novel therapeutic strategies involving multiple targets across a number of gene regulatory layers may become possible as we begin to understand the crosstalk between layers and the systems-wide effects of perturbing biological components.

Chapter 5
Conclusions

NEPC is an aggressive subtype of prostate cancer that does not respond to androgen deprivation therapy; nearly all patients die within one year of diagnosis. Critically, its incidence is predicted to increase with the introduction of new hormonal agents. NEPC is histologically and molecularly distinct from PCa, and poor molecular understanding of NEPC accounts in part for the lack of effective therapies. Deeper understanding of NEPC is needed to propose novel therapeutic strategies for this disease.

To explore the differences between PCa and NEPC, I asked whether alternative splicing may contribute to its cancer subtype identity and sought to identify regulators of alternative splicing. 
I used a motif analysis to identify candidate RNA binding proteins (RBPs) in an unbiased and comprehensive approach. The method can be seen as a pseudo-dimensionality reduction technique, reducing a space of 154 RBPs to ∼10.

Using a discovery and validation procedure to identify relevant RBPs, I analyzed RNA-Seq data from a xenograft model and corroborated the results with a human cohort. We identified SRRM4 as a recurrent RBP associated with exon inclusion. Further, enriched motifs associated with SRRM4 were found to be evolutionarily conserved. Finally, we found SRRM4-regulated genes to be enriched in differentially spliced genes.

Mapping cassette exons to functional protein domains showed that the exons coded for a wide range of functions, including DNA binding and phosphorylation sites. Notable examples include FOXM1, a transcription factor with an included exon in the DNA binding region, and CLK4, a protein kinase that may phosphorylate SR proteins and that carries a phosphoserine in the included exon. Further, a statistically significant number of cassette exons were predicted to code for protein binding regions.

Evolutionarily conserved motifs were identified in genic regions around NEPC-specific exons, suggesting functional importance of the motifs. Motif similarity analysis linked the discovered motifs, and thus the NEPC-specific exons, with NEPC-specific splicing factors. Interestingly, SRRM4 was identified in two independent analyses, one on a xenograft cohort and another on a patient-tumour cohort, underscoring SRRM4 as a potential candidate for functional analysis.

Taken together, the alternatively spliced cassette exons were implicated in protein-DNA, protein-RNA and protein-protein interactions. This complexity of interaction networks will be multiplied as we begin to integrate other regulatory mechanisms. 
The crosstalk between alternative splicing and other regulatory layers will require extension into the realm of proteomics, histone modifications and DNA methylation. Since the gene regulatory mechanisms are tightly coupled with one another, this systems-level approach may be crucial to understanding the disease and, ultimately, developing effective therapies against it.

Bibliography

[1] W. Akamatsu, H. J. Okano, N. Osumi, T. Inoue, S. Nakamura, S. Sakakibara, M. Miura, N. Matsuo, R. B. Darnell, and H. Okano. Mammalian ELAV-like neuronal RNA-binding proteins HuB and HuC promote neuronal development in both the central and the peripheral nervous systems. Proceedings of the National Academy of Sciences of the United States of America, 96(17):9885–90, Aug. 1999. ISSN 0027-8424. URL → pages 33
[2] W. Akamatsu, H. Fujihara, T. Mitsuhashi, M. Yano, S. Shibata, Y. Hayakawa, H. J. Okano, S.-I. Sakakibara, H. Takano, T. Takano, T. Takahashi, T. Noda, and H. Okano. The RNA-binding protein HuD regulates neuronal cell identity and maturation. Proceedings of the National Academy of Sciences of the United States of America, 102(12):4625–30, Mar. 2005. ISSN 0027-8424. doi:10.1073/pnas.0407523102. URL → pages 33
[3] S. A. Alkan, K. Martincic, and C. Milcarek. The hnRNPs F and H2 bind to similar sequences to influence gene expression. The Biochemical journal, 393(Pt 1):361–71, Jan. 2006. ISSN 1470-8728. doi:10.1042/BJ20050538. URL → pages 38
[4] A. Alonso, J. Sasin, N. Bottini, I. Friedberg, I. Friedberg, A. Osterman, A. Godzik, T. Hunter, J. Dixon, and T. Mustelin. Protein tyrosine phosphatases in the human genome. Cell, 117(6):699–711, July 2004. ISSN 0092-8674. doi:10.1016/j.cell.2004.05.018. URL → pages 28
[5] S. Anders and W. Huber. Differential expression analysis for sequence count data. Genome biology, 11(10):R106, Jan. 2010. ISSN 1465-6914. doi:10.1186/gb-2010-11-10-r106. URL → pages 15
[6] S. C. Baca, D. Prandi, M. S. Lawrence, J. M. Mosquera, A. Romanel, Y. Drier, K. Park, N. Kitabayashi, T. Y. 
MacDonald, M. Ghandi, E. VanAllen, G. V. Kryukov, A. Sboner, J.-P. Theurillat, T. D. Soong,E. Nickerson, D. Auclair, A. Tewari, H. Beltran, R. C. Onofrio,G. Boysen, C. Guiducci, C. E. Barbieri, K. Cibulskis, A. Sivachenko, S. L.Carter, G. Saksena, D. Voet, A. H. Ramos, W. Winckler, M. Cipicchio,K. Ardlie, P. W. Kantoff, M. F. Berger, S. B. Gabriel, T. R. Golub,M. Meyerson, E. S. Lander, O. Elemento, G. Getz, F. Demichelis, M. A.Rubin, and L. A. Garraway. Punctuated evolution of prostate cancergenomes. Cell, 153(3):666–77, Apr. 2013. ISSN 1097-4172.doi:10.1016/j.cell.2013.03.021. URL →pages 2[7] T. L. Bailey and C. Elkan. Fitting a mixture model by expectationmaximization to discover motifs in biopolymers. Proceedings / ...International Conference on Intelligent Systems for Molecular Biology ;ISMB. International Conference on Intelligent Systems for MolecularBiology, 2:28–36, Jan. 1994. ISSN 1553-0833. URL → pages 18[8] T. L. Bailey, M. Boden, F. A. Buske, M. Frith, C. E. Grant, L. Clementi,J. Ren, W. W. Li, and W. S. Noble. MEME SUITE: tools for motifdiscovery and searching. Nucleic acids research, 37(Web Server issue):W202–8, July 2009. ISSN 1362-4962. doi:10.1093/nar/gkp335. URL → pages 18[9] T. L. Bailey, M. Bode´n, T. Whitington, and P. Machanick. The value ofposition-specific priors in motif discovery using MEME. BMCbioinformatics, 11(1):179, Jan. 2010. ISSN 1471-2105.doi:10.1186/1471-2105-11-179. URL → pages 19, 3358[10] Y. Barash, J. A. Calarco, W. Gao, Q. Pan, X. Wang, O. Shai, B. J.Blencowe, and B. J. Frey. Deciphering the splicing code. Nature, 465(7294):53–9, May 2010. ISSN 1476-4687. doi:10.1038/nature09000.URL → pages 6[11] L. J. Barlow and M. M. Shen. SnapShot: Prostate Cancer. Cancer Cell, 24(3):400–400.e1, 2013. URL →pages 1, 2, 3[12] H. Beltran, T. M. Beer, M. A. Carducci, J. de Bono, M. Gleave,M. Hussain, W. K. Kelly, F. Saad, C. Sternberg, S. T. Tagawa, and I. F.Tannock. New therapies for castration-resistant prostate cancer: efficacyand safety. 
European urology, 60(2):279–90, Aug. 2011. ISSN1873-7560. doi:10.1016/j.eururo.2011.04.038. URL → pages 2, 4, 5[13] H. Beltran, D. S. Rickman, K. Park, S. S. Chae, A. Sboner, T. Y.MacDonald, Y. Wang, K. L. Sheikh, S. Terry, S. T. Tagawa, R. Dhir, J. B.Nelson, A. de la Taille, Y. Allory, M. B. Gerstein, S. Perner, K. J. Pienta,A. M. Chinnaiyan, Y. Wang, C. C. Collins, M. E. Gleave, F. Demichelis,D. M. Nanus, and M. A. Rubin. Molecular characterization ofneuroendocrine prostate cancer and identification of new drug targets.Cancer discovery, 1(6):487–95, Nov. 2011. ISSN 2159-8290.doi:10.1158/2159-8290.CD-11-0130. URL → pages v,vii, 4, 5, 10, 14, 15, 51, 92[14] L. H. Boise, M. Gonza´lez-Garcı´a, C. E. Postema, L. Ding, T. Lindsten,L. A. Turka, X. Mao, G. Nun˜ez, and C. B. Thompson. bcl-x, abcl-2-related gene that functions as a dominant regulator of apoptotic celldeath. Cell, 74(4):597–608, 1993. URL →pages 7[15] K. Bomsztyk, O. Denisenko, and J. Ostrowski. hnRNP K: one proteinmultiple processes. BioEssays : news and reviews in molecular, cellularand developmental biology, 26(6):629–38, June 2004. ISSN 0265-9247.doi:10.1002/bies.20048. URL → pages 3459[16] U. Braunschweig, S. Gueroussov, A. M. Plocik, B. R. Graveley, and B. J.Blencowe. Dynamic integration of splicing within gene regulatorypathways. Cell, 152(6):1252–69, Mar. 2013. ISSN 1097-4172.doi:10.1016/j.cell.2013.02.034. URL → pages 6, 7[17] M. Buljan, G. Chalancon, S. Eustermann, G. P. Wagner, M. Fuxreiter,A. Bateman, and M. M. Babu. Tissue-specific splicing of disorderedsegments that embed binding motifs rewires protein interaction networks.Molecular cell, 46(6):871–83, June 2012. ISSN 1097-4164.doi:10.1016/j.molcel.2012.05.039. URL →pages 18, 21, 31[18] J. A. Calarco, S. Superina, D. O’Hanlon, M. Gabut, B. Raj, Q. Pan,U. Skalska, L. Clarke, D. Gelinas, D. van der Kooy, M. Zhen, B. Ciruna,and B. J. Blencowe. Regulation of Vertebrate Nervous System AlternativeSplicing and Development by an SR-Related Protein. 
Cell, 138(5):898–910, 2009. URL →pages 7, 20, 34, 40, 48[19] M. Chen and J. L. Manley. Mechanisms of alternative splicing regulation:insights from molecular and genomics approaches. Nature reviews.Molecular cell biology, 10(11):741–54, Nov. 2009. ISSN 1471-0080.doi:10.1038/nrm2777. URL → pages7[20] M.-Y. Chou, N. Rooke, C. W. Turck, and D. L. Black. hnRNP H Is aComponent of a Splicing Enhancer Complex That Activates a c-srcAlternative Exon in Neuronal Cells. Mol. Cell. Biol., 19(1):69–77, Jan.1999. URL ipsecsha.→ pages 38[21] M. Cohen-Eliav, R. Golan-Gerstl, Z. Siegfried, C. L. Andersen,K. Thorsen, T. F. Ø rntoft, D. Mu, and R. Karni. The splicing factorSRSF6 is amplified and is an oncoprotein inlung and colon cancers. TheJournal of pathology, 229(4):630–9, Mar. 2013. ISSN 1096-9896.60doi:10.1002/path.4129. URL → pages 40[22] D. Croft. Building models using Reactome pathways as templates.Methods in molecular biology (Clifton, N.J.), 1021:273–83, Jan. 2013.ISSN 1940-6029. doi:10.1007/978-1-62703-450-0\ 14. URL → pages 34[23] R. B. Darnell. RNA protein interaction in neurons. Annual review ofneuroscience, 36:243–70, July 2013. ISSN 1545-4126.doi:10.1146/annurev-neuro-062912-114322. URL → pages 33[24] C. J. David and J. L. Manley. Alternative pre-mRNA splicing regulation incancer: pathways and programs unhinged. Genes & development, 24(21):2343–64, Nov. 2010. ISSN 1549-5477. doi:10.1101/gad.1973010. URL → pages 7, 8, 9, 40[25] J. S. de Bono, C. J. Logothetis, A. Molina, K. Fizazi, S. North, L. Chu,K. N. Chi, R. J. Jones, O. B. Goodman, F. Saad, J. N. Staffurth,P. Mainwaring, S. Harland, T. W. Flaig, T. E. Hutson, T. Cheng,H. Patterson, J. D. Hainsworth, C. J. Ryan, C. N. Sternberg, S. L. Ellard,A. Fle´chon, M. Saleh, M. Scholz, E. Efstathiou, A. Zivi, D. Bianchini,Y. Loriot, N. Chieffo, T. Kheoh, C. M. Haqq, and H. I. Scher. Abirateroneand increased survival in metastatic prostate cancer. The New Englandjournal of medicine, 364(21):1995–2005, May 2011. 
ISSN 1533-4406.doi:10.1056/NEJMoa1014618. URL→ pages 4[26] F. J. de Miguel, R. D. Sharma, M. J. Pajares, L. M. Montuenga, A. Rubio,and R. Pio. Identification of alternative splicing events regulated by theoncogenic factor SRSF1 in lung cancer. Cancer research, pages0008–5472.CAN–13–1481–, Dec. 2013. ISSN 1538-7445.doi:10.1158/0008-5472.CAN-13-1481. URL → pages 761[27] Y. Ding, Y. Tang, C. K. Kwok, Y. Zhang, P. C. Bevilacqua, and S. M.Assmann. In vivo genome-wide profiling of RNA secondary structurereveals novel regulatory features. Nature, 505(7485):696–700, Jan. 2014.ISSN 1476-4687. doi:10.1038/nature12756. URL → pages 53[28] Z. Doszta´nyi, B. Me´sza´ros, and I. Simon. ANCHOR: web server forpredicting protein binding regions in disordered proteins. Bioinformatics(Oxford, England), 25(20):2745–6, Oct. 2009. ISSN 1367-4811.doi:10.1093/bioinformatics/btp518. URL → pages29, 52[29] J. D. Ellis, M. Barrios-Rodiles, R. Colak, M. Irimia, T. Kim, J. A. Calarco,X. Wang, Q. Pan, D. O’Hanlon, P. M. Kim, J. L. Wrana, and B. J.Blencowe. Tissue-specific alternative splicing remodels protein-proteininteraction networks. Molecular cell, 46(6):884–92, June 2012. ISSN1097-4164. doi:10.1016/j.molcel.2012.05.037. URL → pages viii, 8, 29, 41, 94[30] B. J. Feldman and D. Feldman. The development ofandrogen-independent prostate cancer. Nature reviews. Cancer, 1(1):34–45, Oct. 2001. ISSN 1474-175X. doi:10.1038/35094009. URL → pages 3[31] Z. Fu, L. Malureanu, J. Huang, W. Wang, H. Li, J. M. van Deursen, D. J.Tindall, and J. Chen. Plk1-dependent phosphorylation of FoxM1 regulatesa transcriptional programme required for mitotic progression. Nature cellbiology, 10(9):1076–82, Sept. 2008. ISSN 1465-7392.doi:10.1038/ncb1767. URL→ pages 27[32] P. A. Fujita, B. Rhead, A. S. Zweig, A. S. Hinrichs, D. Karolchik, M. S.Cline, M. Goldman, G. P. Barber, H. Clawson, A. Coelho, M. Diekhans,T. R. Dreszer, B. M. Giardine, R. A. Harte, J. Hillman-Jackson, F. Hsu,V. Kirkup, R. M. Kuhn, K. Learned, C. H. 
Li, L. R. Meyer, A. Pohl, B. J.Raney, K. R. Rosenbloom, K. E. Smith, D. Haussler, and W. J. Kent. TheUCSC Genome Browser database: update 2011. Nucleic acids research,6239(Database issue):D876–82, Jan. 2011. ISSN 1362-4962.doi:10.1093/nar/gkq963. URL 1/D876.short. → pages 15[33] D. Garneau, T. Revil, J.-F. Fisette, and B. Chabot. Heterogeneous nuclearribonucleoprotein F/H proteins modulate the alternative splicing of theapoptotic mediator Bcl-x. The Journal of biological chemistry, 280(24):22641–50, June 2005. ISSN 0021-9258. doi:10.1074/jbc.M501070200.URL ipsecsha.→ pages 38[34] A. S. Goldstein, J. Huang, C. Guo, I. P. Garraway, and O. N. Witte.Identification of a cell of origin for human prostate cancer. Science (NewYork, N.Y.), 329(5991):568–71, July 2010. ISSN 1095-9203.doi:10.1126/science.1189992. URL → pages 2[35] D. L. Goode, G. M. Cooper, J. Schmutz, M. Dickson, E. Gonzales,M. Tsai, K. Karra, E. Davydov, S. Batzoglou, R. M. Myers, and A. Sidow.Evolutionary constraint facilitates interpretation of genetic variation inresequenced human genomes. Genome research, 20(3):301–10, Mar.2010. ISSN 1549-5469. doi:10.1101/gr.102210.109. URL → pages 20[36] A. R. Grosso, S. Martins, and M. Carmo-Fonseca. The emerging role ofsplicing factors in cancer. EMBO reports, 9(11):1087–93, Nov. 2008.ISSN 1469-3178. doi:10.1038/embor.2008.189. URL → pages 9[37] S. Gupta, J. A. Stamatoyannopoulos, T. L. Bailey, and W. S. Noble.Quantifying similarity between motifs. Genome biology, 8(2):R24, Jan.2007. ISSN 1465-6914. doi:10.1186/gb-2007-8-2-r24. URL → pages 18, 20, 33[38] K. Han, G. Yeo, P. An, C. B. Burge, and P. J. Grabowski. A combinatorialcode for splicing silencing: UAGG and GGGG motifs. PLoS biology, 3(5):e158, May 2005. ISSN 1545-7885. doi:10.1371/journal.pbio.0030158.63URL → pages 38[39] D. Hanahan and R. A. Weinberg. Hallmarks of cancer: the nextgeneration. Cell, 144(5):646–74, Mar. 2011. ISSN 1097-4172.doi:10.1016/j.cell.2011.02.013. URL →pages 49[40] M. N. Hinman, H.-L. 
Zhou, A. Sharma, and H. Lou. All three RNArecognition motifs and the hinge region of HuC play distinct roles in theregulation of alternative splicing. Nucleic acids research, 41(9):5049–61,May 2013. ISSN 1362-4962. doi:10.1093/nar/gkt166. URL → pages 40[41] J. E. Hooper. A survey of software for genome-wide discovery ofdifferential splicing in RNA-Seq data. Human genomics, 8(1):3, Jan.2014. ISSN 1479-7364. doi:10.1186/1479-7364-8-3. URL → pages 51[42] S. C. Huelga, A. Q. Vu, J. D. Arnold, T. Y. Liang, P. P. Liu, B. Y. Yan, J. P.Donohue, L. Shiue, S. Hoon, S. Brenner, M. Ares, and G. W. Yeo.Integrative genome-wide analysis reveals cooperative regulation ofalternative splicing by hnRNP proteins. Cell reports, 1(2):167–78, Feb.2012. ISSN 2211-1247. doi:10.1016/j.celrep.2012.02.001. URL →pages 38, 40, 53[43] M. Irimia and B. J. Blencowe. Alternative splicing: decoding an expansiveregulatory layer. Current opinion in cell biology, 24(3):323–32, June2012. ISSN 1879-0410. doi:10.1016/ URL →pages 38[44] S. G. Julien, N. Dube´, S. Hardy, and M. L. Tremblay. Inside the humancancer tyrosine phosphatome. Nature reviews. Cancer, 11(1):35–49, Jan.2011. ISSN 1474-1768. doi:10.1038/nrc2980. URL → pages 2864[45] B. Kakaradov, H. Y. Xiong, L. J. Lee, N. Jojic, and B. J. Frey. Challengesin estimating percent inclusion of alternatively spliced junctions fromRNA-seq data. BMC bioinformatics, 13 Suppl 6(Suppl 6):S11, Jan. 2012.ISSN 1471-2105. doi:10.1186/1471-2105-13-S6-S11. URL → pages 51[46] T. V. Kalin, V. Ustiyan, and V. V. Kalinichenko. Multiple faces of FoxM1transcription factor: lessons from transgenic mouse models. Cell cycle(Georgetown, Tex.), 10(3):396–405, Mar. 2011. ISSN 1551-4005. URL → pages 27[47] A. Kalsotra and T. A. Cooper. Functional consequences ofdevelopmentally regulated alternative splicing. Nature reviews. Genetics,12(10):715–29, Oct. 2011. ISSN 1471-0064. doi:10.1038/nrg3052. URL → pages 8[48] R. Karni, E. de Stanchina, S. W. Lowe, R. Sinha, D. Mu, and A. 
Appendix A
Supporting Materials

Table A.1: RNA-Seq metadata for 331 and 331R

Sample ID | Location | Type | Read Length | Mapped Read Pairs
331 | Xenograft from Primary | PCa | 100 | 217,046,582
331R | Xenograft from Primary | NEPC | 100 | 353,370,756

Table A.6: UniProt annotations of differentially included exons (331 vs 331R in xenograft cohort)

Gene name | Annotated feature | Description
ACLY | MOD RES | Phosphoserine.
ADPGK | DOMAIN | ADPK.
AKAP10 | DOMAIN | RGS 1.
AKAP10 | DOMAIN | RGS 2.
ANO10 | TOPO DOM | Cytoplasmic (Potential).
APBA2 | DOMAIN | PID.
APTX | DOMAIN | FHA-like.
APTX | REGION | Interactions with ADPRT and NCL.
APTX | STRAND |
APTX | STRAND |
APTX | TURN |
ATL1 | REGION | Sufficient for membrane association.
ATL1 | TOPO DOM | Cytoplasmic.
ATP9B | TOPO DOM | Extracellular (Potential).
ATP9B | TRANSMEM | Helical; (Potential).
ATP9B | TRANSMEM | Helical; (Potential).
BARD1 | REGION | Interaction with BRCA1.
BARD1 | REGION | Interaction with BRCA1.
BARD1 | STRAND |
BARD1 | STRAND |
BARD1 | TURN |
BARD1 | TURN |
BARD1 | TURN |
BARD1 | TURN |
BARD1 | ZN FING | RING-type.
BARD1 | ZN FING | RING-type.
BCLAF1 | MOD RES | Phosphothreonine.
BPTF | REGION | Interaction with KEAP1.
BPTF | REGION | Interaction with KEAP1.
CACNA1D | TOPO DOM | Cytoplasmic (Potential).
CADPS | DOMAIN | MHD1.
CADPS | REGION | Interaction with DRD2.
CARM1 | MOD RES | Dimethylated arginine (By similarity).
CARM1 | REGION | Transactivation domain (By similarity).
CARS2 | METAL | Zinc (By similarity).
CARS2 | MOTIF | "HIGH" region.
CARS2 | TRANSIT | Mitochondrion (Potential).
CASK | DOMAIN | L27 1.
CASP8 | MOD RES | Phosphoserine (By similarity).
CASP8 | PROPEP |
CCDC125 | COILED | Potential.
CD44 | REGION | Stem.
CD44 | TOPO DOM | Extracellular (Potential).
CEP57 | COILED | Potential.
CEP57 | COMPBIAS | Poly-Lys.
CEP57 | REGION | centrosome localization domain (CLD) (By similarity).
CLASP1 | COMPBIAS | Ser-rich.
CLASP1 | REGION | Interaction with microtubules, MAPRE1 and MAPRE3.
CLK4 | MOD RES | Phosphoserine.
CLK4 | MOD RES | Phosphoserine.
CXADR | DISULFID | By similarity.
CXADR | DOMAIN | Ig-like C2-type 2.
CXADR | STRAND |
CXADR | STRAND |
CXADR | STRAND |
CXADR | STRAND |
CXADR | TOPO DOM | Extracellular (Potential).
CYFIP1 | TURN |
DHX33 | DOMAIN | Helicase ATP-binding.
DHX33 | NP BIND | ATP (By similarity).
DIS3 | DOMAIN | PINc.
DLX1 | DNA BIND | Homeobox.
DNAJC27 | DOMAIN | J.
DNM1L | CROSSLNK | Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in SUMO).
DNM1L | CROSSLNK | Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in SUMO).
DNM1L | REGION | Interaction with GSK3B.
DNM1L | REGION | B domain.
DNMT3B | DOMAIN | SAM-dependent MTase C5-type.
DTNB | COILED | Potential.
EPB41 | MOD RES | Phosphotyrosine; by EGFR.
EPB41 | MOD RES | Phosphotyrosine; by EGFR.
EPB41 | REGION | Spectrin–actin-binding.
EPB41 | REGION | Spectrin–actin-binding.
EPB41L1 | MOD RES | Phosphoserine.
EPB41L1 | MOD RES | Phosphoserine.
EPB41L1 | MOD RES | Phosphoserine.
EPB41L1 | MOD RES | Phosphoserine.
EPM2A | COMPBIAS | Arg-rich.
EXOG | STRAND |
EXOG | STRAND |
EXOG | STRAND |
EXOG | STRAND |
EXOG | TURN |
EXOG | TURN |
EXOG | TURN |
EXOG | TURN |
FNBP1 | REGION | Required for self-association and induction of membrane tubulation.
FOXM1 | DNA BIND | Fork-head.
FOXM1 | MOD RES | Phosphoserine.
GOPC | COILED | Potential.
GTPBP8 | DOMAIN | G.
GTPBP8 | NP BIND | GTP (Potential).
GZF1 | ZN FING | C2H2-type 9.
GZF1 | ZN FING | C2H2-type 10.
KCTD20 | DOMAIN | BTB.
KIAA0226 | DOMAIN | RUN.
LETMD1 | DOMAIN | LETM1.
LETMD1 | REGION | Required and sufficient for mitochondrial import.
LETMD1 | TOPO DOM | Cytoplasmic (Potential).
LMAN2L | DOMAIN | L-type lectin-like.
LMAN2L | METAL | Calcium (By similarity).
LMAN2L | METAL | Calcium (By similarity).
LMAN2L | METAL | Calcium (By similarity).
LMAN2L | REGION | Carbohydrate binding (By similarity).
LMAN2L | TOPO DOM | Lumenal (Potential).
LYPLAL1 | STRAND |
LYPLAL1 | STRAND |
LYPLAL1 | STRAND |
LYPLAL1 | STRAND |
LYPLAL1 | STRAND |
LYPLAL1 | STRAND |
MAP3K4 | COMPBIAS | Poly-Ala.
MCM9 | DOMAIN | MCM.
MEF2D | REGION | Beta domain.
MEF2D | SITE | Cleavage (Probable).
MXD3 | DOMAIN | bHLH.
NASP | COILED | Potential.
NASP | COILED | Potential.
NASP | COMPBIAS | Glu-rich (acidic).
NASP | MOD RES | Phosphoserine.
NASP | MOD RES | Phosphoserine.
NASP | MOD RES | Phosphoserine.
NASP | MOD RES | Phosphoserine.
NASP | MOD RES | Phosphothreonine.
NASP | MOD RES | Phosphoserine.
NASP | MOD RES | Phosphoserine.
NASP | MOD RES | Phosphoserine.
NASP | MOD RES | Phosphothreonine.
NASP | REGION | Histone-binding.
NASP | REGION | Histone-binding.
NEO1 | DOMAIN | Fibronectin type-III 6.
NEO1 | TOPO DOM | Extracellular (Potential).
NF1 | DOMAIN | Ras-GAP.
NFATC1 | MOTIF | Nuclear export signal.
NFATC1 | REGION | Trans-activation domain B (TAD-B).
NFATC1 | REGION | Trans-activation domain B (TAD-B).
OCRL | MOTIF | Clathrin box 2.
PCF11 | DOMAIN | CID.
PDLIM1 | DOMAIN | PDZ.
PDLIM1 | STRAND |
PDLIM1 | STRAND |
PDLIM1 | STRAND |
POMT2 | CARBOHYD | N-linked (GlcNAc...) (Potential).
POMT2 | TRANSMEM | Helical; (Potential).
PPP1R9A | DOMAIN | SAM.
PPP1R9A | REGION | Interacts with TGN38 (By similarity).
PTPN3 | MOD RES | Phosphoserine.
PTPN3 | MOD RES | Phosphoserine.
PTPN3 | MOD RES | Phosphothreonine.
PTPRF | DOMAIN | Fibronectin type-III 5.
PTPRF | TOPO DOM | Extracellular (Potential).
PTPRS | DOMAIN | Fibronectin type-III 5.
PTPRS | TOPO DOM | Extracellular (Potential).
PUM2 | COMPBIAS | Ser-rich.
PUM2 | MOD RES | Phosphoserine.
RACGAP1 | MOD RES | N-acetylmethionine.
RAD51AP1 | MOD RES | Phosphoserine.
RBM41 | DOMAIN | RRM.
RBM41 | MOD RES | Phosphoserine.
RBM41 | STRAND |
RBM41 | STRAND |
RFX3 | DNA BIND | RFX-type winged-helix.
RREB1 | MOD RES | Phosphothreonine.
RREB1 | MOD RES | Phosphoserine.
RREB1 | MOD RES | Phosphoserine.
RREB1 | ZN FING | C2H2-type 13.
RREB1 | ZN FING | C2H2-type 14.
SALL2 | COMPBIAS | Poly-Glu.
SALL2 | COMPBIAS | Poly-Ala.
SALL2 | COMPBIAS | Poly-Pro.
SALL2 | CROSSLNK | Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in ubiquitin).
SALL2 | MOD RES | Phosphoserine.
SALL2 | MOD RES | Phosphoserine.
SALL2 | MOD RES | Phosphoserine.
SALL2 | ZN FING | C2H2-type 1.
SALL2 | ZN FING | C2H2-type 2.
SALL2 | ZN FING | C2H2-type 3.
SALL2 | ZN FING | C2H2-type 4.
SALL2 | ZN FING | C2H2-type 5.
SALL2 | ZN FING | C2H2-type 6.
SALL2 | ZN FING | C2H2-type 7.
SCRIB | REGION | Sufficient for targeting to adherens junction and to inhibit cell proliferation.
SCRIB | REGION | Interaction with ARHGEF7.
SCRIB | STRAND |
SLC30A5 | TOPO DOM | Cytoplasmic (Potential).
SLC30A5 | TOPO DOM | Extracellular (Potential).
SLC30A5 | TRANSMEM | Helical; (Potential).
SLC30A5 | TRANSMEM | Helical; (Potential).
SNX12 | BINDING | Phosphatidylinositol 3-phosphate (By similarity).
SNX12 | BINDING | Phosphatidylinositol 3-phosphate (By similarity).
SNX12 | DOMAIN | PX.
SNX12 | STRAND |
SPAST | REGION | Required for interaction with RTN1.
SPAST | REGION | Sufficient for interaction with CHMP1B.
SPAST | REGION | Required for interaction with microtubules.
SPAST | REGION | Sufficient for interaction with microtubules (By similarity).
SURF4 | TRANSMEM | Helical; (Potential).
TAPT1 | TRANSMEM | Helical; (Potential).
TAPT1 | TRANSMEM | Helical; (Potential).
TMEM181 | TRANSMEM | Helical; (Potential).
TMPRSS4 | DISULFID | By similarity.
TMPRSS4 | DISULFID | By similarity.
TMPRSS4 | DOMAIN | Peptidase S1.
TMPRSS4 | TOPO DOM | Extracellular (Potential).
TTPAL | DOMAIN | CRAL-TRIO.
TTPAL | DOMAIN | CRAL-TRIO.
VAV2 | DOMAIN | PH.
VDAC3 | INIT MET | Removed.
VPS18 | COILED | Potential.
VPS18 | MOD RES | N6-acetyllysine.
VPS18 | MOD RES | Phosphoserine.
VPS18 | REPEAT | CHCR.
YTHDF1 | COMPBIAS | Gln/Pro-rich.
YTHDF1 | DOMAIN | YTH.
YWHAE | MOD RES | N6-acetyllysine.
YWHAE | MOD RES | N6-acetyllysine.
YWHAE | SITE | Interaction with phosphoserine on interacting protein.
ZDHHC16 | TOPO DOM | Lumenal (Potential).
ZNF141 | ZN FING | C2H2-type 1; atypical.
ZNF92 | DOMAIN | KRAB.

Table A.7: UniProt annotations of differentially excluded exons in NEPC (331 vs 331R in xenograft cohort)

Gene name | Annotated feature | Description
ANKMY1 | REPEAT | ANK 1.
ANKMY1 | REPEAT | ANK 5.
ANKMY1 | REPEAT | ANK 6.
ARHGAP17 | COMPBIAS | Pro-rich.
ATG10 | ACT SITE | Glycyl thioester intermediate (By similarity).
ATXN2 | COMPBIAS | Pro-rich.
ATXN2 | MOD RES | Phosphoserine.
BZRAP1 | COMPBIAS | Poly-Glu.
C9orf156 | DOMAIN | TsaA-like.
CCDC30 | COILED | Potential.
DDX19B | REGION | N-terminal lobe.
DLG1 | REGION | Interaction with SH3 domains.
EPN3 | REPEAT | UIM 1.
EPN3 | REPEAT | UIM 2.
FGFR1OP2 | COILED | Potential.
FGFR1OP2 | SITE | Breakpoint for translocation to form FGFR1OP2-FGFR1.
FIP1L1 | REGION | Necessary for stimulating PAPOLA activity.
FIP1L1 | REGION | Sufficient for interaction with PAPOLA.
FLNB | REGION | Hinge 1 (By similarity).
FLNB | REGION | Hinge 1 (By similarity).
FLNB | REPEAT | Filamin 15.
FLNB | REPEAT | Filamin 15.
FLNB | STRAND |
FLNB | STRAND |
FOXP1 | DNA BIND | Fork-head.
FOXP1 | STRAND |
FOXP1 | STRAND |
GXYLT1 | TOPO DOM | Lumenal (Potential).
MAGI1 | DOMAIN | PDZ 4.
MTRR | DOMAIN | Flavodoxin-like.
MTRR | DOMAIN | Flavodoxin-like.
MTRR | NP BIND | FMN (By similarity).
MTSS1 | COILED | Potential.
MTSS1 | DOMAIN | IMD.
PIGT | CARBOHYD | N-linked (GlcNAc...).
PIGT | TOPO DOM | Lumenal (Potential).
RCOR3 | DOMAIN | SANT 2.
RPGR | COMPBIAS | Glu-rich.
SLC24A1 | SIGNAL | Not cleaved.
SLC24A1 | TOPO DOM | Extracellular (Potential).
SLC24A1 | TOPO DOM | Cytoplasmic (Potential).
SLC24A1 | TRANSMEM | Helical; (Potential).
TJP1 | MOD RES | Phosphoserine.
TPM3 | COILED | By similarity.
TRPC1 | REPEAT | ANK 2.
TRPC1 | REPEAT | ANK 3.
TRPC1 | TOPO DOM | Cytoplasmic (Potential).

Table A.8: UniProt annotations of differentially included exons (NEPC vs PCa in VPC-Beltran pooled cohort)

Gene name | Annotated feature | Description
APLP2 | TOPO DOM | Extracellular (Potential).
ATL1 | REGION | Sufficient for membrane association.
ATL1 | TOPO DOM | Cytoplasmic.
BIN1 | COILED | Potential.
BIN1 | DOMAIN | BAR.
CACNA1D | TOPO DOM | Cytoplasmic (Potential).
DYRK1B | DOMAIN | Protein kinase.
EPB41 | REGION | Spectrin–actin-binding.
EPB41L1 | REGION | Spectrin–actin-binding.
FGF1 | STRAND |
FGF1 | STRAND |
FGF1 | STRAND |
FGF1 | STRAND |
FGF1 | STRAND |
FGF1 | TURN |
HNRNPD | COMPBIAS | Gly-rich.
HNRNPD | COMPBIAS | Tyr-rich.
KCNH6 | TOPO DOM | Cytoplasmic (Potential).
L1CAM | TOPO DOM | Cytoplasmic (Potential).
MEF2D | REGION | Beta domain.
MEF2D | SITE | Cleavage (Probable).
NEO1 | DOMAIN | Fibronectin type-III 6.
NEO1 | TOPO DOM | Extracellular (Potential).
OCRL | MOTIF | Clathrin box 2.
PARP6 | DOMAIN | PARP catalytic.
PTPRF | DOMAIN | Fibronectin type-III 5.
PTPRF | TOPO DOM | Extracellular (Potential).
PTPRS | DOMAIN | Fibronectin type-III 5.
PTPRS | TOPO DOM | Extracellular (Potential).
RTN4 | MOD RES | Phosphoserine (By similarity).
RTN4 | MOD RES | Phosphoserine (By similarity).
RTN4 | TOPO DOM | Cytoplasmic (Potential).

Table A.9: UniProt annotations of differentially excluded exons in NEPC (NEPC vs PCa in VPC-Beltran pooled cohort)

Gene name | Annotated feature | Description
ACHE | ACT SITE | Charge relay system.
ACHE | CARBOHYD | N-linked (GlcNAc...).
KIAA1217 | COILED | Potential.
SYNE1 | COMPBIAS | Ser-rich.
ACHE | DISULFID |
ZNF595 | DOMAIN | KRAB.
EPHB6 | DOMAIN | Eph LBD.
ZNF544 | DOMAIN | KRAB.
ZNF419 | DOMAIN | KRAB.
MPZL1 | MOD RES | Phosphoserine.
MPZL1 | MOD RES | Phosphoserine.
MPZL1 | MOD RES | Phosphoserine.
MPZL1 | MOD RES | Phosphoserine.
PSAT1 | MOD RES | N6-acetyllysine.
PSAT1 | MOD RES | N6-acetyllysine.
PSAT1 | MOD RES | N6-acetyllysine.
TJP1 | MOD RES | Phosphoserine.
FLNB | REGION | Hinge 1 (By similarity).
ANKRD65 | REPEAT | ANK 2.
ANKRD65 | REPEAT | ANK 3.
ANKRD65 | REPEAT | ANK 4.
ANKRD65 | REPEAT | ANK 5.
ANKRD65 | REPEAT | ANK 6.
ANKRD65 | REPEAT | ANK 7.
FLNB | REPEAT | Filamin 15.
EPN3 | REPEAT | UIM 1.
EPN3 | REPEAT | UIM 2.
TSEN15 | STRAND |
TSEN15 | STRAND |
TSN | STRAND |
FLNB | STRAND |
ACHE | STRAND |
ACHE | STRAND |
PSAT1 | STRAND |
PSAT1 | STRAND |
PSAT1 | STRAND |
MPZL1 | TOPO DOM | Cytoplasmic (Potential).
CSPG5 | TOPO DOM | Cytoplasmic (Potential).
CD47 | TOPO DOM | Cytoplasmic (Potential).
SYNE1 | TOPO DOM | Cytoplasmic (Potential).
SGCE | TOPO DOM | Cytoplasmic (Potential).
EPHB6 | TOPO DOM | Extracellular (Potential).
TPCN2 | TOPO DOM | Extracellular (Potential).
SLC9B2 | TRANSMEM | Helical; (Potential).
TPCN2 | TRANSMEM | Helical; Name=S3 of repeat II; (Potential).
MOSPD1 | TRANSMEM | Helical; (Potential).
MOSPD1 | TRANSMEM | Helical; (Potential).
ACHE | TURN |

Table A.2: RNA-Seq metadata of patient-tumour samples from the VPC cohort. RL and MRP denote read length and mapped read pairs, respectively.

Sample ID | Location | Type | RL | MRP
946b Urethra PE | Urethra | NEPC | 51 | 87,213,415
972 2 Penile PE | Penile | NEPC | 50 | 69,288,725
AB352 PE | Penile/Urethra | NEPC (Xenograft) | 50 | 45,942,348
963 PE | Primary | Hybrid PCa-NEPC | 50 | 56,920,570
963L ln PE | Lymph node | Hybrid PCa-NEPC | 50 | 59,333,945
1005 PE | Distant metastasis | PCa | 51 | 110,537,721
890LP PE | Primary | PCa | 51 | 100,651,842
945RP PE | Primary | PCa | 51 | 100,132,794
961RP PE | Primary | PCa | 51 | 120,371,272

Table A.3: Metadata of patient-tumour samples from the Beltran et al. cohort that were sequenced by RNA-Seq using the Illumina Genome Analyzer II

ID # | Subtype | Read Length | Mapped Read Pairs
97 T | PCa | 54 | 16,172,174
2621 | PCa | 54 | 11,899,985
2682 | PCa | 54 | 11,832,466
2741 B | PCa | 54 | 10,984,519
2743 D | PCa | 54 | 21,561,194
3035 B53 | PCa | 54 | 11,835,968
2620 D | PCa | 54 | 30,638,828
2832 B | PCa | 54 | 53,150,521
3023 B62 | PCa | 54 | 28,521,618
3034 C51 | PCa | 54 | 13,775,318
3042 H51 | PCa | 54 | 14,098,938
3043 B56 | PCa | 54 | 12,962,738
3051 B51 | PCa | 54 | 12,654,698
3071 B51 | PCa | 54 | 25,073,054
3085 B57 | PCa | 54 | 18,803,587
3134 B58 | PCa | 54 | 13,883,568
2740 A | PCa | 54 | 15,974,309
2858 C | PCa | 54 | 19,052,852
3026 B56 | PCa | 54 | 18,964,135
2761 D | PCa | 54 | 21,943,354
3048 B50 | PCa | 54 | 23,330,560
3849 D | PCa | 54 | 21,493,260
2844 D | PCa | 54 | 21,838,945
2872 D | PCa | 54 | 18,618,017
3050 B51 | PCa | 54 | 14,814,103
31332 B51 | PCa | 54 | 25,609,328
2660 B | PCa | 54 | 11,486,658
7820 | NEPC | 54 | 37,690,396
7800 | NEPC | 54 | 38,385,577
7821 | NEPC | 54 | 40,721,296
8220 | NEPC | 54 | 33,309,138

Table A.4: Annotated alternative splicing events used in MISO

Type of alternative splicing | Number of annotated alternative splicing events
Cassette (skipped exons) | 42485
Mutually exclusive | 7581
Retained introns | 7197
Alternative 5' splice site | 9035
Alternative 3' splice site | 14205

Table A.5: Genes encoding baits with SRRM4-regulated exons used in LUMIER screens. Taken from the supplementary table of the study by Ellis et al.

Gene | Found in xenograft cohort? | Found in patient-tumour cohort?
Madd | Yes | Yes
Ank3 | No | No
Arhgap21 | No | No
Arhgap21 | No | No
Klc1 | Yes | No
Clta | No | No
Dock4 | No | No
Git1 | No | No
Add1 | Yes | No
Mef2a | No | No
Ppp3cb | No | No
Ank2 | No | Yes
Dync1i2 | No | No
Cttn | No | No
Bin1 | No | Yes
Sorbs1 | No | No
Neo1 | Yes | Yes
Epb41 | Yes | Yes
Epb42 | No | No
Clip2 | No | No
Clasp2 | No | No
Scarb1 | No | No
Daam1 | No | No
Magi1 | Yes | No
Sgce | No | Yes
Nrxn2 | Yes | No
Myo18a | No | No
Synj1 | No | No
Elmo2 | No | No
Asph | No | No
Ppfia1 | Yes | Yes
Clasp1 | Yes | Yes
App | Yes | No

