UBC Theses and Dissertations


Analysis of the androgen regulated transcriptome in prostate cancer Lehman, Melanie Lynne 2011


Full Text

ANALYSIS OF THE ANDROGEN REGULATED TRANSCRIPTOME IN PROSTATE CANCER

by

Melanie Lynne Lehman

B.Sc., The University of Alberta, 1998

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE STUDIES (Experimental Medicine)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

December 2011

© Melanie Lynne Lehman, 2011

Abstract

Prostate cancer is one of the most common cancers diagnosed and the second leading cause of cancer related death in North American men.  The prostate is dependent on male sex hormones (androgens) for differentiation and growth.  Prostate cancer derived from prostatic epithelial cells is likewise dependent on androgens for survival.  Androgen deprivation therapy (ADT) through pharmacological methods is currently the most effective treatment for disease no longer localized to the prostate gland.  ADT offers a temporary remission, but only delays disease progression as the cancer cells adapt to survive and proliferate in castrate levels of circulating androgens.  There is mounting evidence to suggest that intratumoral androgens acting through the androgen receptor (AR) continue to play a critical role in castration resistant prostate cancer (CRPC) progression.  Recent advances in RNA profiling technologies are revealing a more complex picture of the human transcriptome.  This research identified alternative splice variants of protein-coding mRNAs and non-protein-coding regulatory RNAs (ncRNAs) that are expressed in prostate and prostate cancer cells.  A custom 180K microarray was designed to profile the expression of the identified prostate RNAs, ncRNAs, and other reference RNAs.  The custom microarray was used to profile expression after treatment in vitro with androgens and anti-androgens in the LNCaP prostate cancer cell line.
The expression of the identified androgen regulated RNAs was examined in vivo after castration and during progression to CRPC in LNCaP xenograft tumors.  The research presented is an integrative analysis of in vitro and in vivo expression profiles with AR-DNA interactions detected by AR ChIP-seq and microRNAs detected by small RNA sequencing.  The integrated expression profiles suggest a different androgen regulated transcriptional program in CRPC from that seen in treatment naive tumors.  Several unexpected findings were revealed, including a cell cycle related difference between the synthetic androgen R1881 and the physiological androgen DHT; RNAs increased by the anti-androgen MDV3100 but not by bicalutamide; and the use of alternative 3' UTRs following castration in the in vivo model.  The identified reference and novel androgen regulated RNAs may shed light on the mechanisms underlying androgen deprivation therapy, anti-androgen response, and the progression to CRPC.

Preface

The work presented in this thesis is a collaborative effort of researchers at the Vancouver Prostate Centre (VPC) in Vancouver, Canada, the Queensland University of Technology (QUT) in Brisbane, Australia, and other collaborating research institutions.  Each individual contribution to the research presented is listed under the chapter headings.

Chapter 2: Identification of Prostate Expressed Transcripts

The work presented is my analysis of primary sequence data from GenBank, RNA-seq data on prostate cancer cell lines, and RNA-seq data from the Illumina Body Map 2.0 tissue sequencing project.  The prostate cancer cell data were provided as pileup files by Drs. Colin Collins and Stanislav Volik (VPC).  Illumina provided raw next-generation sequencing data for the Illumina Body Map 2.0.  Dr. Marcel Dinger (University of Queensland, Brisbane, Australia) provided BED and BedGraph files from his analysis of the Illumina Body Map 2.0 (unpublished).  The BEDTools software developed by Dr.
Aaron Quinlan at the University of Virginia (Quinlan and Hall 2010) was used extensively throughout this thesis.

Chapter 3: Building a Custom Agilent Prostate Microarray

The work presented is my effort to design novel probes for a custom array and to analyze the performance of those probes.  A software tool written by Andrew Gray (VPC) was used to parse the BLAT results aligning microarray probes to the genome.

Chapter 4: Identification of Androgen Regulated Transcripts and Genomic Loci

The work presented is my analysis of four large high-throughput datasets.  The AR ChIP experiment was designed by Dr. Colleen Nelson, Stephen Hendy, and me, and was performed by Stephen Hendy (VPC).  The sequencing was performed at the Genome Sciences Centre (Vancouver, BC).  The LNCaP in vitro experiment was designed by Dr. Colleen Nelson, Stephen Hendy, and me, and the cell culture work was performed by Stephen Hendy (VPC).  The microarrays were processed by Anne Haegert (VPC).  The FACS analysis was performed by Dr. Martin Sadowski (QUT).  Laboratory validation of the KLK4 antisense transcript was performed by Dr. John Lai (QUT) and was published (Lai et al. 2010).  For the small RNA next-generation sequencing experiment, the cell culture, library preparation, and miRNA qRT-PCR were performed by Nadine Tomlinson (VPC).  The sequencing was performed at the BC Genome Sciences Centre.  Analysis of the small RNA sequencing data for novel miRNA was performed by Dr. Jiyuan An (QUT).  The design of the LNCaP xenograft experiment was a collaborative effort between Dr. Martin Gleave, Dr. Susan Ettinger, Mary Bowden, and me.  The animal work was completed by Mary Bowden.  The microarrays were processed by Sonal Brahmbhatt.

Chapter 5: Examples of Androgen Regulated Transcripts and Genomic Loci

The work presented consists of specific examples of androgen regulated transcripts and genomic loci identified from the analysis in Chapter 4.
The laboratory validation of CTBP1 sense-antisense transcripts was performed by John Cavanagh (VPC; Western Blot) and Stephen Hendy (VPC; qRT-PCR and siRNA).

Manuscripts in Preparation

Lehman ML, Dinger ME, Sadowski M, McPherson S, Hendy SC, Gleave ME, Mattick JS, Nelson CC.  Integrative Analysis of the Androgen and Anti-androgen Regulated Transcriptome in Prostate Cancer Cells.

Lehman ML, Najaraj SH, Lai J, An J, Hendy SC, Nelson CC.  miRNA Target Prediction using RNA-seq 3' UTR Expression Data in Prostate Cancer Cells.

Lai J, Lehman ML, Nelson CC.  Androgen Regulated Multitasking Genomic Loci in Prostate Cancer Cells.

Sadowski M, Lehman ML, An J, Rockstroh A, McPherson S, Hendy SC, Nelson CC.  Opposing Effects of a Synthetic Androgen (R1881) and a Physiological Androgen (DHT) on Cell Cycle Progression in Prostate Cancer Cells.

Ettinger S, McPherson S, Lehman ML, Hendy SC, Nelson CC.  Androgen Regulation of an Alternative 3' end of the Estrogen Receptor Alpha mRNA in Prostate Cancer Cells.

Additional Publications

Sieh S, Taubenberger A, Sadowski M, Rockstroh A, Lehman ML, Clements J, Nelson CC, Hutmacher D.  Phenotypic characterization of prostate cancer LNCaP cells cultured within a PEG-based synthetic and biomimetic matrix.  Submitted October 2011.

Wang Q, Bailey CG, Ng C, Tiffen J, Thoeng A, Minhas V, Lehman ML, Hendy SC, Buchanan G, Nelson CC, Rasko JEJ, Holst J.  Androgen receptor and nutrient signaling pathways coordinate the demand for increased amino acid transport in prostate cancer progression.  Accepted Cancer Research, November 2011.

Thompson VC, Hurtado-Coll A, Turbin D, Fazli L, Lehman ML, Gleave ME, Nelson CC.  Relaxin drives Wnt signaling through upregulation of PCDHY in prostate cancer.  The Prostate: July 2010; 70(10):1134-45.

Lai J, Lehman ML, Seim I, Lawrence MG, Hendy SC, Dinger ME, Mattick JS, Clements JA, Nelson CC.  A variant of the KLK4 gene is expressed as a cis sense-antisense chimeric transcript in prostate cancer cells.
RNA: June 2010;16(6):1156-66.

Locke JA, Guns ES, Lehman ML, Ettinger S, Zoubeidi A, Lubik A, Margiotti K, Fazli L, Adomat HH, Wasan KM, Gleave ME, Nelson CC.  Arachidonic acid activation of intratumoral steroid synthesis during prostate cancer progression to castration-resistance.  Prostate: September 2009; 70(3):239-251.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Acknowledgements
Chapter 1 Literature Review, Hypothesis and Objectives
  1.1 Prostate Cancer
    1.1.1 Prostate Biology and Function
    1.1.2 Screening and Diagnosis
    1.1.3 Localized Prostate Cancer
    1.1.4 Advanced Prostate Cancer
    1.1.5 Risk Factors
    1.1.6 Theories on Molecular Mechanisms of Prostate Cancer Initiation
  1.2 Androgens and the Androgen Receptor
    1.2.1 Androgens
    1.2.2 The Androgen Receptor
    1.2.3 Genomic Effects of Androgens
    1.2.4 Non-Genomic Effects of Androgens
    1.2.5 Androgen Deprivation Therapy (ADT) and Anti-androgens
  1.3 Castration Resistant Prostate Cancer (CRPC)
    1.3.1 Activation of Cell Survival and Alternative Growth Factor Pathways
    1.3.2 Increased Expression, Sensitivity or Activation of AR
    1.3.3 Increased Expression of Ligand-Independent AR Splice Variant
    1.3.4 Alternative Sources of Androgens
    1.3.5 Treatment of CRPC
  1.4 RNA
    1.4.1 High Throughput RNA Profiling
    1.4.2 Protein-Coding RNA
    1.4.3 RNA Databases and Data Visualization
    1.4.4 Alternative Splicing of Protein-Coding RNA
    1.4.5 Non-Coding RNA (ncRNA)
    1.4.6 Small Non-Coding RNA
    1.4.7 Long Non-Coding RNA (lncRNA)
    1.4.8 Long Non-coding RNAs in Prostate Cancer
    1.4.9 Multitasking Genomic Loci
  1.5 Scope of the Thesis
    1.5.1 Hypotheses
    1.5.2 Rationale and Specific Objectives
Chapter 2 Identification of Prostate Expressed Transcripts
  2.1 Introduction
  2.2 Methods and Results
    2.2.1 Software and Datasets
    2.2.2 Summary of NCBI RefSeq Transcripts
    2.2.3 Tissue-Specific Transcription in Illumina Body Map 2.0
    2.2.4 Prostate Expressed Genomic Regions
  2.3 Discussion
    2.3.1 RefSeq Transcripts
    2.3.2 Incorrect Association of mRNA Sequences with NCBI Entrez Genes
    2.3.3 Non-RefSeq Transcripts
    2.3.4 Prostate Expressed Genomic Regions
Chapter 3 Building a Custom Agilent Prostate Microarray
  3.1 Introduction
  3.2 Microarray Probe Design
    3.2.1 Previously Designed Probes
    3.2.2 Microarray Design Method and Inclusion Criteria
    3.2.3 Novel Probes
  3.3 Microarray Probe Classification and Annotation
  3.4 Microarray Design Versions
  3.5 Microarray Probe Performance
  3.6 Discussion
Chapter 4 Identification of Androgen Regulated Transcripts and Genomic Loci
  4.1 Introduction
  4.2 AR Binding Sites Detected by ChIP-seq
    4.2.1 Methods
    4.2.2 Results
    4.2.3 Discussion
  4.3 Androgen and Anti-androgen Regulated Transcripts Detected in vitro by Microarray Profiling
    4.3.1 Methods
    4.3.2 Results
    4.3.3 Discussion
  4.4 Androgen Regulated Small RNA Detected in vitro by Next-Generation Sequencing
    4.4.1 Methods
    4.4.2 Results
    4.4.3 Discussion
  4.5 Transcripts Regulated in Xenograft Tumors by Castration and During Progression to CRPC Detected by Microarray Profiling
    4.5.1 Methods
    4.5.2 Results
    4.5.3 Discussion
  4.6 Summary
Chapter 5 Examples of Androgen Regulated Transcripts and Genomic Loci
  5.1 Introduction
  5.2 Androgen Regulated Transcripts and Genomic Loci
    5.2.1 CTBP1 Sense-Antisense Transcripts
    5.2.2 miRNA-29a/29b Genomic Locus
    5.2.3 SREBF2 and miR-33a Genomic Locus
  5.3 Discussion
Chapter 6 Summary and Perspectives
  6.1 Summary
  6.2 Perspectives
  6.3 Conclusions and Future Work
Bibliography

List of Tables

Table 2.1  Summary of Sense-Antisense Transcripts in NCBI RefSeq
Table 2.2  Illumina Body Map 2.0 Tissue Coverage on the Human Genome
Table 2.3  Examples of Prostate-Specific Genes in Illumina Body Map 2.0
Table 2.4  Examples of Prostate GenBank mRNAs Incorrectly Associated with NCBI Entrez Genes
Table 3.1  Probe Classifications for Novel Probes Compared to Agilent 44K Probes
Table 3.2  Microarray Probes by eArray Base Composition Score
Table 3.3  Novel and Agilent Probes Expressed in the LNCaP in vitro Experiment
Table 4.1  Sample Conditions Included in the LNCaP in vitro Experiment
Table 4.2  R1881 and DHT Regulated Probes by Probe Classification
Table 4.3  DHT Regulated Probes Oppositely Regulated by Anti-Androgens
Table 4.4  Probes Designed to Detect Prostate Expressed Transcripts in the LNCaP in vitro Experiment
Table 4.5  R1881 Regulated miRNAs Detected by Next-Generation Sequencing
Table 4.6  Differentially Expressed Probes in the LNCaP Xenograft Experiment
Table 4.7  Probes Designed to Detect Prostate Expressed Transcripts in the LNCaP Xenograft Experiment

List of Figures

Figure 1.1  Histopathology of the Prostate
Figure 1.2  The Hypothalamic-Pituitary-Gonadal Axis
Figure 1.3  Progression from Androgen Dependent to Castration Resistant Prostate Cancer
Figure 1.4  UCSC Genome Browser: KLK3 Genomic Locus
Figure 1.5  miRNA Biogenesis
Figure 2.1  NCBI RefSeq Transcript Coverage on the Human Genome
Figure 2.2  Illumina Body Map 2.0 Compared to NCBI RefSeq Transcript Coverage on the Human Genome
Figure 2.3  Venn Diagrams of Prostate Expressed Genomic Regions
Figure 2.4  UCSC Genome Browser: HOXB13 Prostate Specific 3'UTR
Figure 3.1  Microarray Probe Classification
Figure 3.2  Novel Probe Intensities and Ratios by eArray Base Composition Score
Figure 3.3  Probe Reproducibility in the LNCaP in vitro Experiment
Figure 4.1  AR ChIP-seq Peaks Compared to Other Published AR ChIP Experiments
Figure 4.2  AR ChIP-seq Peaks Compared to RefSeq Transcript Features
Figure 4.3  AR ChIP-seq Peak Distance to the Nearest RefSeq TSS
Figure 4.4  DNA Sequence Motifs Detected in AR ChIP-seq Peaks
Figure 4.5  Microarray Probe Intensities for R1881 Compared to DHT
Figure 4.6  R1881 and DHT Regulated Probes Compared by Scatterplot
Figure 4.7  R1881 and Progesterone Regulated Probes Compared by Scatterplot
Figure 4.8  FACS Analysis for LNCaP Cells Treated with Different Steroids
Figure 4.9  AR ChIP-seq Peak Distance to the Nearest DHT or R1881 Regulated RefSeq TSS
Figure 4.10  Bicalutamide and DHT Regulated Probes Compared by Scatterplot
Figure 4.11  MDV3100 and DHT Regulated Probes Compared by Scatterplot
Figure 4.12  Probe Intensities for the Probes Specifically Increased by MDV3100
Figure 4.13  AR ChIP-seq Peak Distance to the Nearest Anti-Androgen Regulated RefSeq TSS
Figure 4.14  DHT Regulated Sense-Antisense RefSeq Transcripts
Figure 4.15  qRT-PCR Validation for miRNAs Increased by R1881
Figure 4.16  Tumor Samples Profiled in the LNCaP Xenograft Experiment
Figure 4.17  HIF1A Exon and 3' UTR Expression in the LNCaP Xenograft Experiment
Figure 4.18  Patterns of Expression for the DHT Regulated Probes in the LNCaP Xenograft Experiment
Figure 5.1  Androgen Regulation of Sense-Antisense Transcripts in the CTBP1 Genomic Locus
Figure 5.2  Laboratory Validation of CTBP1 and the CTBP1 Antisense Transcripts
Figure 5.3  Androgen Regulation of the miR-29a/29b Genomic Locus
Figure 5.4  Androgen Regulation of the SREBF2/miR-33a Genomic Locus
List of Abbreviations

aCGH  array comparative genomic hybridization
ADT  androgen deprivation therapy
AKT  v-akt murine thymoma viral oncogene
AR  androgen receptor
ARE  androgen response elements
BAX  BCL2-associated X protein
BCL2  B-cell CLL/lymphoma 2
BPH  benign prostatic hyperplasia
bp  base pair
CAGE  cap analysis gene expression
CCDC134  coiled-coil domain containing 134
ChIP  chromatin immunoprecipitation
CLCN5  chloride channel 5
CLU  clusterin
CRPC  castration resistant prostate cancer
CSS  charcoal stripped fetal bovine serum
CTBP1  c-terminal binding protein 1
CTBP1as  c-terminal binding protein 1 antisense
CV  coefficient of variation
CYP17A1  cytochrome P450, family 17, subfamily A, polypeptide 1
DHEA  dehydroepiandrosterone
DHT  dihydrotestosterone
DICER  dicer 1, ribonuclease type III
DNA  deoxyribonucleic acid
DNMT3A  DNA (cytosine-5-)-methyltransferase 3 alpha
DNMT3B  DNA (cytosine-5-)-methyltransferase 3 beta
DROSHA  drosha, ribonuclease type III
EGF  epidermal growth factor
ELK4  ETS-domain protein (SRF accessory protein 1)
EMT  epithelial to mesenchymal transition
ENCODE  encyclopedia of DNA elements project
Entrez Gene  NCBI Entrez Gene Database
ERG  v-ets erythroblastosis virus E26 oncogene homolog
ETV1  ets variant gene 1
EST  expressed sequence tag
EZH2  enhancer of zeste homolog 2
FACS  fluorescence-activated cell sorting
FKBP5  FK506 binding protein 5
FOXA1  forkhead box A1
kb  kilobase pair
KGF  keratinocyte growth factor
KLK4  kallikrein-related peptidase 4
GAS5  growth-arrest specific 5
GATA2  GATA binding protein 2
GO  gene ontology
GSTP1  glutathione S-transferase pi 1
HAT  histone acetyltransferase
HDAC  histone deacetylase
HIF1A  hypoxia inducible factor 1, alpha subunit
HOTAIR  HOX transcript antisense RNA
HOXB  homeobox B cluster
HOXB13  homeobox B13
HOXC  homeobox C cluster
HSP27  heat shock protein 27
HSP70  heat shock protein 70
HSP90  heat shock protein 90
IGF1  insulin-like growth factor 1 (somatomedin C)
IL6  interleukin 6 (interferon, beta 2)
LBD  ligand binding domain
LH  luteinizing hormone
LHRH  luteinizing hormone releasing hormone
lincRNA  long intergenic non-coding RNA
lncRNA  long non-coding RNA
miRNA  microRNA
MAPK  mitogen-activated protein kinase
MTOR  mechanistic target of rapamycin (serine/threonine kinase)
MYC  myelocytomatosis viral oncogene homolog
NCOA1  nuclear receptor coactivator 1
NCOA2  nuclear receptor coactivator 2
NCOR  nuclear receptor corepressor
ncRNA  non-coding RNA
NKX3-1  NK3 homeobox 1
NOVA  neuro-oncological ventral antigen
NR1D1  nuclear receptor subfamily 1, group D, member 1
nt  nucleotide
OCT1  octamer transcription factor
p15  cyclin-dependent kinase inhibitor 2B
p15as  cyclin-dependent kinase inhibitor 2B antisense
p21  cyclin-dependent kinase inhibitor 1A
p53  tumor protein 53
PCA3  prostate cancer gene 3
PCGEM1  prostate-specific transcript 1
PERP  TP53 apoptosis effector
piRNA  PIWI-interacting RNA
PIWIL1  piwi-like 1
PKA  protein kinase A
PKC  protein kinase C
PPAP2A  phosphatidic acid phosphatase type 2A
PRC2  polycomb repressive complex 2
PSA  prostate-specific antigen
PTEN  phosphatase and tensin homolog
PTENP1  pten pseudogene 1
RefSeq  NCBI reference sequence database
RIP  RNA immunoprecipitation
RISC  RNA-induced silencing complex
RNA  ribonucleic acid
SIAE  sialic acid acetylesterase
SKIV2L2  superkiller viralicidic activity 2-like 2
SLC45A3  solute carrier family 45, member 3
SMRT  nuclear receptor co-repressor 2
snoRNA  small nucleolar RNA
SNP  single nucleotide polymorphism
snRNA  small nuclear RNA
spliRNA  splice-site associated RNA
SRA  steroid receptor RNA activator
SREBF2  sterol regulatory element binding transcription factor 2
tiRNA  transcript initiating RNA
TBRG1  transforming growth factor beta regulator 1
THRA  thyroid hormone receptor-alpha
TMPRSS2  transmembrane protease, serine 2
TSIX  XIST antisense RNA
TSS  transcriptional start site
UTR  untranslated region
VPC  Vancouver Prostate Centre
YB1  Y box binding protein 1
XIST  X (inactive)-specific transcript
ZEB1  zinc finger E-box binding homeobox 1
ZEB2  zinc finger E-box binding homeobox 2

Acknowledgements

I would like to thank the following funding agencies that have supported this research: the Terry Fox Foundation, the Australian Canadian Prostate Cancer Research Alliance, funded by the Queensland Government National and International Research Alliance Program, the Prostate Cancer Foundation Australia and Cancer Australia.  Thank you to the Prostate Cancer Foundation of Canada and the Prostate Cancer Foundation BC for supporting my research through research scholarships.

I would like to thank a number of researchers at the Vancouver Prostate Centre for their support, encouragement, and contribution to this research: Stephen Hendy, Nadine Tomlinson, Anne Haegert, Sonal Brahmbhatt, Mary Bowden, Dr. Susan Ettinger, Andrew Gray, John Cavanagh, Bob Shukin, and Dr. Jennifer Locke.  I would also like to thank Dr. Colin Collins and Dr. Stanislav Volik for providing RNA-seq data included in this thesis.  Thank you to Dr. Michael Cox and Dr. Amina Zoubeidi for their mentorship and thought-provoking discussions.  This research would not have been possible without the careful and thorough work of Stephen Hendy.

I would like to thank collaborators at the Queensland University of Technology for their contribution to this research: Dr. Martin Sadowski, Dr. John Lai, Dr. Stephen McPherson, and Dr. Jiyuan An.  I would also like to thank collaborators at the University of Queensland, Dr. Marcel Dinger and Dr. John Mattick, who have completely changed my view of RNA.  Thank you to researchers at the National Research Council of Canada: Dr. Roy Walker, Brandon Smith, and Dr. Marianna Sikorska, who took a chance and hired a fledgling bioinformatician.

I would like to thank my supervisory committee members: Dr. Sam Aparicio, Dr. Martin Gleave, Dr. Colleen Nelson, and Dr.
Wyeth Wasserman for their insightful comments and guidance during my graduate studies in the Experimental Medicine Program at UBC.

To my research supervisor, Dr. Colleen Nelson, thank you for your support and for encouraging my sometimes unconventional ideas.  Your advice, enthusiasm, humor, and friendship are invaluable.

To my friend and colleague, Kathleen Barilla, thank you for knowing the exact moment to encourage, listen, cheer, and divert.

To my family, thank you for your endless encouragement and support.  I would especially like to thank my father for promoting and encouraging my early interest in science and medicine.

I would like to express my sincere gratitude to Troy Lehman.  I never would have completed this thesis without you.

Chapter 1 Literature Review, Hypothesis and Objectives

1.1 Prostate Cancer

Prostate cancer is the second most common cancer diagnosed in North America after skin cancer.  In Canada, one in seven men will be diagnosed with prostate cancer during his lifetime and one in 27 will die of the disease (Canadian Cancer Society 2010).  In the United States, prostate cancer accounts for 28% of all newly diagnosed cancers in men (Jemal et al. 2010).

1.1.1 Prostate Biology and Function

The prostate is a male gland that surrounds the urethra at the base of the bladder.  The walnut-sized gland is responsible for secreting components, including proteases such as kallikrein 3 (otherwise known as prostate specific antigen, PSA), into the seminal fluid upon ejaculation.  Male sex hormones called androgens are primarily responsible for the formation and development of the prostate (Cunha et al. 2004).
During neonatal development, prostatic buds grow out of the urogenital sinus (a part of the human body present during development of the urinary and reproductive organs) in response to a pulse of androgens that bind to their cognate androgen receptor (AR), a ligand-activated transcription factor, to activate an extensive program of gene regulation (Cunha et al. 2004). After birth, prostate development remains quiescent until puberty. Due to a surge in androgen levels at puberty, the prostate doubles in size and acquires the ability, under the regulation of androgens, to produce secretions for the seminal fluid. The dramatic morphological changes at puberty give rise to the complex glandular structure seen in the adult prostate (Donjacour and Cunha 1988; Dhanasekaran et al. 2005). A fully developed prostate is composed of epithelial acinar glands surrounded by fibro-muscular stroma. The epithelial layer is composed of columnar luminal secretory cells, which are AR positive and androgen responsive, and an underlying AR negative basal cell layer, with neuroendocrine cells scattered sparsely between the luminal and basal cells. The basal layer of the acini and the stroma are divided by an extracellular matrix called the basement or basal membrane (Noordzij et al. 1995; Long et al. 2005) (Figure 1.1 A). Over 90% of prostate cancers are adenocarcinomas that arise from the prostate secretory AR positive epithelium (Long et al. 2005) (Figure 1.1 B). A small proportion of prostate cancers arise from neuroendocrine and basal cells.

Figure 1.1  Histopathology of the Prostate
A. An image of a normal prostate gland showing the epithelial layer (consisting of luminal secretory cells with underlying basal cells) surrounded by fibro-muscular stroma.
B. An image of a diseased prostate gland with adenocarcinoma showing a well differentiated tumor with a Gleason grade 3. This tumor has the characteristic absence of basal cells.
Images by Dr. Ladan Fazli.

1.1.2 Screening and Diagnosis

Prostate cancer is commonly detected through a combination of a digital rectal examination and a blood test for PSA. PSA is produced in normal prostate secretions but is released into the blood stream of patients with prostate cancer due to the disruption of the basement membrane and normal prostate architecture by tumor growth (Lilja et al. 2008). Blood tests to measure PSA levels have been used since the early 1990s to detect disease and to monitor response to treatment (Lilja et al. 2008). Although PSA is prostate specific, it is not prostate cancer specific. PSA levels can increase due to benign prostate diseases such as BPH (benign prostatic hyperplasia) or prostatitis. Biopsies of the prostate are taken if a patient has an abnormal digital rectal exam or elevated levels of PSA. Prostate cancer is generally multifocal with genetically distinct foci of cancer (Aihara et al. 1994; Miller and Cygan 1994; Ruijter et al. 1996; Cheng et al. 1998; Macintosh et al. 1998). Therefore, 6-12 biopsy samples are taken in an attempt to detect and to reflect the extent of the disease. The biopsies are scored using a histologic scoring system called the Gleason score [reviewed in (Humphrey 2004)]. The scoring system uses five basic grades that range from 1 to 5 describing the most differentiated to least differentiated tumor based on glandular morphology. The Gleason score, ranging from 2-10, is the sum of the two most predominant grades. A patient with a Gleason score of 10 will have the worst prognosis. Currently most men are diagnosed with Gleason 6 or 7 in North America. Figure 1.1 B shows a tumor with a Gleason grade of 3. An additional score, called clinical stage, describes the status of the disease ranging from prostate confined to the presence or degree of local or distant metastases. 
A combination of PSA level, Gleason score and clinical stage is used in the context of more detailed nomograms, which take into account other minor contributing factors, to predict prognosis and guide treatment (Steyerberg et al. 2007).

1.1.3 Localized Prostate Cancer

More than 90% of men diagnosed in the United States have localized or extracapsular disease in which the tumor is confined to the prostate or the tissues surrounding it (Jemal et al. 2010). Those men diagnosed with localized prostate cancer have an average 5-year survival rate of almost 100% (Jemal et al. 2010). For the majority of men diagnosed with early stage prostate cancer, tumors will grow at a slow rate. It is commonly acknowledged that men with indolent or slow growing disease will die from causes other than prostate cancer. The current biomarker for prostate cancer, PSA (as discussed in 1.1.2), is an independent predictor of risk but cannot predict accurately enough whether a man diagnosed with prostate cancer will have indolent disease or an aggressive disease that will spread rapidly beyond the prostate gland. A recent clinical trial indicated that screening with PSA in Europe reduced the rate of death due to prostate cancer by 20%; however, the reduced mortality rate was associated with an increase in over-diagnosis and over-treatment (Schroder et al. 2009). In a parallel study in the USA, where screening has been prevalent for more than a decade, no benefit was observed by using PSA (Andriole et al. 2009). One test that is gaining in use is the measurement of a non-coding RNA, PCA3, in the urine of men suspected of having prostate cancer. PCA3 appears to be prostate-specific and prostate cancer selective [reviewed in (Hessels and Schalken 2009); discussed below in 1.4.8].
Treatment options for men with localized disease are active surveillance, surgery (retropubic prostatectomy or robotic-assisted laparoscopic prostatectomy) or radiation (brachytherapy or external beam) (American Urological Association 2007; Gleave 2009). The side effects for both surgery and radiation include difficulties with urinary control and impaired sexual function. Active surveillance includes regular monitoring of PSA and periodic prostate biopsies. Patients with a low risk of progressing to advanced disease, as indicated by their PSA levels, Gleason score, and clinical stage, may consider active surveillance. Approximately 30% of prostate cancers will progress to advanced disease within 5 years (Gleave 2009).

1.1.4 Advanced Prostate Cancer

Once prostate cancer has spread outside of the prostate gland and surrounding tissues, it is referred to as advanced or metastatic prostate cancer. In the United States, 4% of men with prostate cancer have advanced disease at diagnosis and have an average 5-year survival rate of 31% (Jemal et al. 2010). The majority of prostate cancer metastases will be found in either the regional lymph nodes or in bone. Bone metastasis is found in 90% of patients with advanced prostate cancer and is a major component of disease-specific mortality (Kelly and Yin 2008). The bone remodeling and new bone overgrowth associated with bone metastasis lead to pain, fractures, and spinal cord compression (Logothetis and Lin 2005). Treatment options for advanced prostate cancer are either radiation or hormone ablation therapy (discussed below in 1.2.5), but these treatments are palliative rather than curative.

1.1.5 Risk Factors

The cause of prostate cancer is unknown; however, several risk factors have been identified. Age is the strongest risk factor with most men diagnosed over the age of 65. Autopsy studies have shown that greater than 65% of men may have prostate cancer by the age of 80 (Rullis et al. 1975; Haas et al. 2008). 
Men with a family history of prostate cancer also have a higher risk of being diagnosed with the disease (Gronberg 2003; Johns and Houlston 2003). Prostate cancer is more commonly diagnosed in North America and Northwestern Europe than in Asia, Africa, and South America; however, mortality rates per capita are highest in Western Africa and the Caribbean (Jemal et al. 2011). The reasons for this geographical variation in risk are unclear, but lifestyle and accessibility to screening account for some of the differences (American Cancer Society 2010). Race is also a risk factor. In the United States, the risk of mortality due to prostate cancer is highest in African-American men and lowest in men of Asian descent (Jemal et al. 2010). Asian migrants to the USA, however, have increased risk within a generation compared to men within Asian countries, suggesting that western diet and lifestyle may increase risk (Sinha et al. 2009). Men with high androgen levels and/or increased expression of the AR have an increased risk of developing prostate cancer (discussed below in 1.2). More research is required to understand the link between risk factors and etiology of the disease.

1.1.6 Theories on Molecular Mechanisms of Prostate Cancer Initiation

A single molecular event has not been identified as responsible for the initiation of prostate cancer. A number of theories on prostate cancer initiation have been proposed including chronic inflammation, oxidative stress, chromosomal alterations, genomic mutations, and epigenetic changes [reviewed in (Shen and Abate-Shen 2010)]. Alterations at the DNA level have been implicated in prostate tumor development. Recurrent gene fusions between the androgen-regulated promoter of TMPRSS2 and two members of the ETS transcription factor family, ERG and ETV1, have been identified in ~50% of prostate tumors (Tomlins et al. 2005). 
There have been numerous but conflicting studies on the function and clinical significance of the most frequent fusion, TMPRSS2-ERG, whose fusion partners are located 3 Mb apart on chromosome 21 (21q22.2). TMPRSS2-ERG fusion has been described as contributing to aggressive disease and metastasis (Perner et al. 2006; Yu et al. 2010). Recent studies, however, found that the TMPRSS2-ERG fusion was associated with low Gleason scores and not with histological features of aggressive disease (Gopalan et al. 2009; Fine et al. 2010). In addition to the TMPRSS2-ERG fusion, frequent chromosomal gains are observed at 8q and losses at 3p, 8p, 10q, 13q, and 17p (Kim et al. 2007; Lapointe et al. 2007; Taylor et al. 2010). Several important regulatory genes have been mapped to these chromosomal locations including NKX3-1 at 8p21, PTEN at 10q23, and MYC at 8q24 (Shen and Abate-Shen 2010). It remains unclear if these chromosomal alterations are cancer initiating events or a consequence of the genomic instability seen in advanced cancer that contributes to progression. Although there is a familial risk associated with prostate cancer, few of the genetic susceptibility loci found in genome-wide association studies have been replicated (Shen and Abate-Shen 2010). The exception is a set of multiple single-nucleotide polymorphisms located upstream of MYC on chromosome 8 (8q24) (Amundadottir et al. 2006; Freedman et al. 2006). Global changes in chromatin modifications and DNA methylation have been described in normal prostate tissue and prostate tumors (Kondo et al. 2008; Ke et al. 2009; Kobayashi et al. 2011), but it is unclear whether these epigenetic changes are cancer initiating. One of the most striking epigenetic changes is GSTP1 CpG island hypermethylation, present in the DNA of ~90% of prostate cancer cases (Nakayama et al. 2004). 
Several proteins involved in the regulation of DNA methylation and histone methylation, including DNMT3A, DNMT3B and EZH2, are increased in prostate cancer (Varambally et al. 2002; Kobayashi et al. 2011). The inability to identify a single molecular event raises two possibilities which are not mutually exclusive. The first is the idea that multiple mechanisms or pathways may lead to the same outcome of uncontrolled and undifferentiated cellular growth seen in prostate cancer. The second is the idea of multiple cumulative environmental events triggering the disease. The strong familial risk associated with prostate cancer suggests an inherited genomic or epigenomic component may be involved but not required. Rather than mechanisms of initiation, this thesis will focus on the role of androgens in prostate cancer progression and treatment resistance.

1.2 Androgens and the Androgen Receptor

1.2.1 Androgens

Androgens are critical for the development and maintenance of the adult male phenotype including the development and maintenance of the prostate (discussed in 1.1.1). Cases of prostate cancer have not been described in eunuchs or in individuals without a functioning androgen receptor, suggesting that androgens and the androgen receptor may contribute to prostate carcinogenesis (Wu and Gu 1991; Heinlein and Chang 2004). Approximately 90% of the androgens in the male body are produced by the testes in the form of testosterone. Smaller amounts of weaker androgens are produced in the adrenal cortex in the form of dehydroepiandrosterone (DHEA) (So et al. 2003). Testosterone levels are regulated by the hypothalamic-pituitary-gonadal axis (Figure 1.2). Testosterone is converted in prostate cells to the more potent androgen, dihydrotestosterone (DHT), by 5α-reductase enzymes (Bruchovsky and Wilson 1968).

Figure 1.2  The Hypothalamic-Pituitary-Gonadal Axis
The primary signal to produce testosterone is the secretion of luteinizing hormone-releasing hormone (LHRH) from the hypothalamus. LHRH signals the pituitary to release luteinizing hormone (LH) which, in turn, signals Leydig cells in the testes to produce testosterone. Testosterone can be converted to estrogen by aromatase enzymes. The secretion of LHRH is inhibited through a negative feedback loop by circulating testosterone or estrogen.

1.2.2 The Androgen Receptor

The actions of androgens are primarily exerted through the androgen receptor (AR), a 110-kDa member of the steroid receptor family of transcription factors. The predominant ligands of AR in the prostate are testosterone and DHT. Prior to ligand binding, inactive AR resides in the cytoplasm where it is bound by chaperone molecules such as heat shock proteins. Chaperone proteins such as HSP70 and HSP90 hold AR in a conformation making it available for ligand binding and prevent AR degradation by the ubiquitin-proteasome pathway (Pratt et al. 2004). Once bound to its ligand, AR undergoes a conformational change, releasing the chaperone proteins, allowing activation through phosphorylation. The active AR protein conformation exposes its nuclear localization signal, permitting AR to be translocated to the nucleus where AR forms homodimers and binds DNA. AR regulates transcription by binding to DNA at specific sequences called androgen response elements (ARE) and recruiting other co-regulatory proteins and chromatin remodeling complexes [reviewed in (Dehm and Tindall 2007; Claessens et al. 2008)]. AR is encoded on chromosome X (Xq11-12). The AR locus spans more than 90 kb with 8 exons. An extended ~7 kb 3' UTR for the AR mRNA has been characterized in the prostate (Lubahn et al. 1988). 
The first exon of AR codes for the N-terminal domain; exons 2 and 3 code for the DNA-binding domain; exons 4 to 8 code for the hinge region and the C-terminal ligand-binding domain (LBD) (Gelmann 2002). The N-terminal domain is required for transcriptional activity and for binding to co-activators [reviewed in (Dehm and Tindall 2007; Claessens et al. 2008)]. The nuclear localization signal is located in the hinge region of AR. The LBD contains an additional co-regulatory binding domain that is involved in co-repression or co-activation in a gene and ligand specific manner.

1.2.3 Genomic Effects of Androgens

Many of the actions of androgens are mediated through AR's function as a transcriptional regulator. It has been estimated that thousands of RNA transcripts are regulated by androgens directly or indirectly and that the protein products generated from these transcripts have a role in many cellular functions including cell proliferation, survival, lipid metabolism, and cell differentiation (Lamont and Tindall 2010). From recent microarray studies, it has been estimated that ~25% of androgen regulated transcripts have little known or inferred function (Ngan et al. 2009). Additional research is required to further elucidate the role of androgens and AR in cellular function, tissue differentiation and disease processes. The consensus DNA binding sequence (classical ARE), which can be bound by other steroid receptors including the glucocorticoid, mineralocorticoid and progesterone receptors, consists of inverted repeats of 5'-TGTTCT-3' separated by a three bp spacer region (Cato et al. 1987; Ham et al. 1988). AR ChIP-on-chip and ChIP-seq experiments have revealed that AR binds thousands of DNA regions with many of the binding sites containing a divergent variant of this idealised ARE. Other DNA-binding transcription factors including GATA2, OCT1, and FOXA1 were found to bind to DNA near AR binding sites (Bolton et al. 2007; Massie et al. 2007; Wang et al. 
2007; Jia et al. 2008; Wang et al. 2009; Yu et al. 2010). DNA binding of both AR and these other transcription factors may be required for transcriptional regulation or, alternatively, AR may associate indirectly with the DNA through protein interactions at some sites. Transcriptional regulation of PSA (encoded by the KLK3 genomic locus) is achieved by AR binding to two AREs within the proximal promoter region (Riegman et al. 1991) and to a non-classical cluster of AREs ~4 kb upstream in an enhancer region (Huang et al. 1999). ChIP chromosome conformation capture assays suggest that the AR and co-activators bound at the promoter and enhancer AREs of PSA are brought together by chromatin looping to form a coordinated transcriptional complex (Shang et al. 2002; Wang et al. 2005). An AR ChIP-seq experiment revealed few AR binding sites in classical upstream promoter regions (5 kb upstream of the transcriptional start site; TSS). Many AR binding sites are found in enhancer regions, introns, or distant intergenic regions (Yu et al. 2010). For example, AR binds within an intron of FKBP5 ~90 kb downstream of the TSS (Makkonen et al. 2009). The androgen regulated promoter region of TMPRSS2 (discussed in 1.1.6) is another example of distant AR binding, with the binding site located ~13.5 kb upstream of the TSS (Wang et al. 2007). Chromatin looping is not restricted to intra-chromosomal enhancers, as androgen signalling can also promote inter-chromosomal proximity (Lin et al. 2009; Mani et al. 2009). Androgen signalling can recruit AR and components of the DNA double-stranded break machinery to the sites of the TMPRSS2 and ERG or ETV1 fusion points and, along with genotoxic stress, can lead to cell type-specific chromosomal alteration (Lin et al. 2009; Haffner et al. 2010).
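The classical ARE described in this section (inverted repeats of 5'-TGTTCT-3' separated by a three bp spacer) can be written out as a simple sequence scan. The following Python sketch is illustrative only and is not the method used in the cited ChIP studies. It assumes the idealised motif is the reverse complement of the half-site, any three bases, then the half-site itself (5'-AGAACAnnnTGTTCT-3'). The function names and the mismatch threshold are hypothetical, introduced here to show how divergent ARE variants might be tolerated.

```python
import re

HALF_SITE = "TGTTCT"   # half-site quoted in the text
SPACER_LEN = 3         # three bp spacer quoted in the text
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (upper case)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_classical_are(seq):
    """Return 0-based start positions of exact idealised AREs.

    Assumed motif layout: revcomp(half-site) + 3 arbitrary bases + half-site.
    A lookahead is used so that overlapping matches are also reported.
    """
    left = reverse_complement(HALF_SITE)
    pattern = re.compile(f"(?={left}[ACGT]{{{SPACER_LEN}}}{HALF_SITE})")
    return [m.start() for m in pattern.finditer(seq.upper())]


def find_are_like(seq, max_mismatches=2):
    """Return (start, mismatches) for windows close to the idealised ARE.

    Mismatches are counted in the two half-sites only; the spacer is free.
    The default threshold is an arbitrary illustrative choice.
    """
    seq = seq.upper()
    left = reverse_complement(HALF_SITE)
    half_len = len(HALF_SITE)
    width = 2 * half_len + SPACER_LEN
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        mismatches = sum(a != b for a, b in zip(window[:half_len], left))
        mismatches += sum(a != b for a, b in zip(window[-half_len:], HALF_SITE))
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits


print(find_classical_are("ccAGAACAgtcTGTTCTaa"))  # [2]
```

Because the assumed motif is its own reverse complement apart from the spacer, scanning one strand is sufficient for exact matches. A sequence scan of this kind only identifies candidate sites; as noted in this section, binding in vivo also depends on neighbouring transcription factors and chromatin structure.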
Although genome wide AR ChIP experiments are starting to give an unbiased view of AR binding, it is difficult to infer androgen regulation from a linear representation of the DNA without a clear understanding of dynamic chromatin structure.

1.2.4 Non-Genomic Effects of Androgens

Most research is focused on the genomic effects of androgens through AR-regulated transcription and the subsequent translation of those transcripts into proteins. Androgens can, however, stimulate a rapid cellular response that is independent of AR nuclear translocation and transcriptional or translational machinery. These non-genomic effects are seen within seconds or minutes of exposure to androgens. Non-genomic effects of androgens include the rapid increase in intracellular concentrations of calcium and activation of secondary signaling pathways, including activation of PKA, PKC, and MAPK which, in turn, regulate the activity of other transcription factors [reviewed in (Heinlein and Chang 2002; Foradori et al. 2008)]. The rapid non-genomic effects of androgens also lead to increased ligand-binding efficiencies and activation of AR and, therefore, work in concert with the genomic effects of androgens to regulate cellular function.

1.2.5 Androgen Deprivation Therapy (ADT) and Anti-androgens

Androgen Deprivation Therapy (ADT) remains the cornerstone for treatment of advanced prostate cancer. The primary goal of ADT is to suppress the production of testicular androgens. Prostate tumor regression following ADT was first described by Huggins and Hodges in 1941 (Huggins and Hodges 1941). In 1966, Huggins was awarded the Nobel Prize in Medicine for the seminal work that led to the understanding that the growth and proliferation of prostate cancer cells was dependent on androgens. In prostate cancer cells, androgens provide mechanisms to prevent apoptosis and increase cell proliferation. 
ADT slows tumor growth and disease progression by a combination of mechanisms including cellular death through apoptosis and cell cycle arrest (Denmeade et al. 1996; Agus et al. 1999; Balk and Knudsen 2008). ADT was previously achieved through surgical castration (orchiectomy) or high dose estrogens to disrupt the hypothalamic-pituitary-gonadal axis (Figure 1.2). Currently, ADT is most frequently achieved through medical castration using LHRH analogs: Zoladex, Lupron, Eligard or Suprefact (Gleave 2009). These LHRH agonists stimulate an initial production of testosterone by the testes. Through the negative feedback loop they inhibit any subsequent production of testicular androgens (Figure 1.2). LHRH antagonists have been used less frequently in the treatment of prostate cancer. ADT has a number of adverse side effects including hot flashes, loss of libido, erectile dysfunction, fatigue, weight gain, memory and mood changes, breast enlargement or tenderness, decreased bone density, anemia, and increased risk of diabetes and cardiovascular disease (Gleave 2009; Kohli and Tindall 2010). The production of testosterone, or hormone flare, that is initially seen with the use of LHRH agonists can be treated with the combined use of an anti-androgen to bind to the LBD and antagonize normal transcriptional activity of AR. Anti-androgens can be used as a monotherapy or in combination with ADT (called combined androgen blockade). The anti-androgen most frequently used in the clinical setting is the non-steroidal AR antagonist bicalutamide (Casodex). When bicalutamide is bound to the ligand-binding domain of AR, AR is translocated to the nucleus. Instead of the AR stabilization seen with DHT, binding of bicalutamide to AR causes AR degradation and ensuing inhibition of cell growth (Waller et al. 2000). A second generation anti-androgen, MDV3100, is currently being evaluated in Phase III clinical trials for treatment of advanced prostate cancer. 
Initial experiments show that MDV3100 has a higher affinity for the AR ligand-binding domain and a greater ability to impair nuclear translocation, DNA binding, and recruitment of AR coactivators than bicalutamide (Tran et al. 2009). The in vitro effects of bicalutamide and MDV3100 on the androgen regulated transcriptome in prostate cancer cells are presented in Chapter 4. In advanced prostate cancer, ADT and anti-androgens are used as a primary systemic therapy as the growth of most advanced prostate cancer is initially dependent on androgens. ADT and/or anti-androgens are initially effective at repressing AR activity as seen by a decrease in serum PSA levels and tumor regression. ADT is, however, rarely curative and in almost all cases a subset of cancer cells will grow despite castrate levels of circulating testicular androgens (Feldman and Feldman 2001). The recurrent disease following ADT was previously called Androgen Independent Prostate Cancer. It has recently become evident that androgens and AR play a critical role in the progression to what is now termed Castration Resistant Prostate Cancer (CRPC). Aspects of AR activity previously seen in androgen dependent disease, including expression of PSA, re-emerge in CRPC (Craft and Sawyers 1998; Gregory et al. 1998; Scher and Sawyers 2005). For those patients who have an initial biochemical response to ADT or combined androgen blockade (evidenced by decreased serum PSA levels), their disease will progress to CRPC in a median time of 18-30 months (Crawford et al. 1989; Denis et al. 1998; Eisenberger et al. 1998). Understanding the mechanisms underlying progression to CRPC remains the key to developing durable therapies for men with advanced disease.

1.3 Castration Resistant Prostate Cancer (CRPC)

A number of theories on the molecular mechanisms involved in the progression to CRPC have been described (Feldman and Feldman 2001). 
It is unclear whether the progression to CRPC is a result of the selection of a pre-existing castration resistant subpopulation of cells (Isaacs 1999; Wang et al. 2009) and/or if ADT promotes adaptive survival mechanisms that did not exist prior to treatment. Theories on the mechanisms involved in the progression to CRPC describe a complex interplay of selection and adaptation involving components of genomic instability, activation of alternative growth factor and cell survival pathways, and activation of the androgen-AR signaling axis through increased activation of AR and production of androgens from alternative sources. These theories are described below with emphasis on the continuing role of androgens and AR in CRPC.

1.3.1 Activation of Cell Survival and Alternative Growth Factor Pathways

The expression of key cell survival proteins is increased following ADT and in CRPC, including BCL2 (McDonnell et al. 1992), clusterin (CLU) (Miyake et al. 2000; July et al. 2002), heat shock protein 27 (HSP27) (Rocchi et al. 2004), and Y box protein 1 (YB1) (Gimenez-Bonafe et al. 2004). BCL2 is a pro-survival protein that prevents apoptosis by inhibiting the release of cytochrome c from the mitochondria into the cytoplasm (McDonnell et al. 1992). CLU and HSP27 are cytoprotective chaperone proteins that also prevent apoptosis by stabilizing proteins during cellular stress (Koch-Brandt and Morgans 1996; Gibbons et al. 2000; Rocchi et al. 2004). HSP27 can also interact with AR and enhance AR activity (Zoubeidi et al. 2007). YB1 is a transcriptional and translational regulator involved in drug resistance and survival (Gimenez-Bonafe et al. 2004). The activation of key components of cell survival signal transduction pathways such as AKT-MTOR and MAPK is increased in CRPC. Loss of PTEN, frequently seen in prostate cancer, leads to abnormal activation of AKT kinase activity and its downstream targets such as MTOR (Graff et al. 2000; Kremer et al. 2006). 
Increased MAPK activation has also been associated with progression to CRPC (Gioeli et al. 1999). MAPK activation is a point of convergence in a number of signaling pathways and can be triggered by aberrant growth factor signaling (Gioeli et al. 1999). Aberrant paracrine and autocrine signaling of growth factors and cytokines through increased expression of the growth factors, their cognate receptors, or their binding partners has been reported in CRPC (So et al. 2005). The aberrant signaling can lead to increased expression of cell survival proteins and activation of cell survival pathways independent of androgen-AR signaling. Alternative growth factors and cytokines such as IGF1, EGF, KGF, and IL6 can also lead to ligand-independent activation of AR (Culig et al. 1994). Growth factors and cytokines can activate protein kinases such as AKT or MAPK, leading to direct activation of AR (Yeh et al. 1999; Zhou et al. 2000; Ueda et al. 2002) or indirect activation of AR through activation of coactivators (Ueda et al. 2002; Gregory et al. 2004).

1.3.2 Increased Expression, Sensitivity or Activation of AR

In vitro experiments have shown that CRPC cell lines are more sensitive to lower levels of androgens than androgen dependent cell lines (Gregory et al. 2001). The increased sensitivity to androgens may be due to increased AR expression, AR mutations, or (ligand-dependent or ligand-independent) AR activation. AR expression is increased in CRPC, leading to increased tumor sensitivity to castrate levels of androgens. In ~30% of cases, increased AR expression may be due to AR genomic amplification (Visakorpi et al. 1995; Koivisto et al. 1997). Other adaptive transcriptional, translational or protein stabilizing mechanisms are, however, required to explain the increased AR mRNA and protein expression reported in most cases of CRPC (van der Kwast et al. 1991; Ruizeveld de Winter et al. 1994; Gregory et al. 2001; Chen et al. 2004).
Mutations in AR reported in some CRPC tumors can increase tumor sensitivity to castrate levels of androgens. Few mutations in AR are seen in primary or untreated tumors; however, ADT provides selective pressure for the rare subset of tumor cells with gain-of-function mutations in AR (Marcelli et al. 2000; Buchanan et al. 2001; Taplin et al. 2003; Steinkamp et al. 2009). AR mutations can change the spectrum of ligands or coactivators capable of activating AR to include adrenal androgens (DHEA) and other steroids such as progesterone, estradiol, and cortisol (Culig et al. 1993; Taplin et al. 1995; Tan et al. 1997). AR mutations can also convert anti-androgens (e.g. bicalutamide or an earlier anti-androgen, flutamide) from AR antagonists to AR agonists (Taplin et al. 1999; Hara et al. 2003). The shift from AR antagonist to agonist is a potential mechanism leading to Anti-Androgen Withdrawal Syndrome, where a decrease in PSA is seen when anti-androgen treatment is stopped. Increased ligand-dependent or ligand-independent AR activation in CRPC can increase tumor sensitivity to castrate levels of androgens. AR activation can be increased through non-steroidal activation of AR (discussed in 1.3.1), increased expression of coactivators, decreased expression of corepressors, or increased expression of ligand-independent AR splice variants (Dehm et al. 2008; Guo et al. 2009; Hu et al. 2009; Sun et al. 2010). Non-steroidal activation can occur through signal transduction pathways triggered by alternative growth factors or cytokines (discussed in 1.3.1). AR coactivators such as NCOA1 (alias SRC1) and NCOA2 (alias TIF2, SRC2) are increased in CRPC (Gregory et al. 2001). A change in the coactivator to corepressor (e.g. SMRT, NCOR) ratio can alter AR activation (Scher and Sawyers 2005).

1.3.3 Increased Expression of Ligand-Independent AR Splice Variant

Ligand-independent AR activation can occur through increased expression of AR splice variants lacking the ligand-binding domain. A number of alternative splice variants of AR have been recently identified. The AR variants lack the ligand-binding domain, AR∆LDB (Dehm et al. 2008; Guo et al. 2009; Hu et al. 2009; Sun et al. 2010). Certain AR∆LDB variants are constitutively active: they do not require ligand binding for AR transactivation and they can promote transcription of a subset of RNAs normally regulated by full-length AR. Increased expression of constitutively active AR∆LDB variants correlated with rapid disease recurrence after radical prostatectomy (Guo et al. 2009; Hu et al. 2009). The presence, albeit at low levels, of the constitutively active AR∆LDB variants in benign tissue and primary untreated tumors suggests that AR∆LDB variant expression may be a natural adaptive mechanism for prostate cells to compensate for a low androgen environment. The expression of AR∆LDB variants following ADT and in CRPC increases tumor sensitivity to castrate levels of androgens (Hu et al. 2009).

1.3.4 Alternative Sources of Androgens

The androgen-AR signaling axis can be activated in CRPC by increased AR activation (discussed in 1.3.2) and by increased expression of ligand-independent AR splice variants (discussed in 1.3.3) but also by persistent intratumoral androgens. Intratumoral DHT and testosterone levels in CRPC are similar to those in untreated benign prostate tissue and sufficient for AR signaling leading to PSA expression (Mohler et al. 2004; Titus et al. 2005; Mostaghel et al. 2007). A study of androgen levels in healthy men following ADT showed that serum testosterone levels were decreased by an average of 94% but testosterone and DHT levels in the prostate were only decreased by 70% and 80%, respectively (Page et al. 2006).
Persistent androgens in the prostate following ADT may come from an adrenal source of androgens or from de novo synthesis of androgens within the prostate tumor. Enzymes required to convert a systemic adrenal source of androgens to testosterone are increased in CRPC tumors (Stanbrough et al. 2006). Xenograft experiments with an androgen-dependent (LNCaP) cell line showed de novo synthesis of androgens from a cholesterol precursor and conversion of progesterone to DHT (through steroid intermediates) in an androgen-depleted environment (Locke et al. 2008). Several enzymes involved in steroid synthesis, including CYP17, are increased in CRPC tumors (Montgomery et al. 2008).
1.3.5 Treatment of CRPC
The androgen-AR signaling axis is still active in CRPC; ~30% of patients with biochemical recurrence following ADT and/or anti-androgen treatment will respond to additional hormone therapies (Small and Ryan 2006). A second line of treatment may be the use of different anti-androgens or inhibitors of androgen synthesis (discussed below). After failure of hormone manipulation therapies, the current treatment for CRPC is the combined use of docetaxel (an anti-mitotic chemotherapy) and prednisone (a synthetic corticosteroid used as an anti-inflammatory drug) (Tannock et al. 2004). Docetaxel-based chemotherapy in Phase III clinical trials showed an important but modest survival advantage of 2-3 months for patients with CRPC (De Dosso and Berthold 2008). A new chemotherapeutic drug, cabazitaxel, was recently approved in June 2011 for use in combination with prednisone after or during treatment with docetaxel. Provenge, a therapeutic cellular immunotherapy used to stimulate the patient's immune system to target prostate cancer cells, is a recent but expensive treatment option for CRPC with a survival advantage of ~4 months [reviewed in (Longo 2010)].
Figure 1.3 Progression from Androgen Dependent to Castration Resistant Prostate Cancer. Advanced prostate cancer is, in most cases, dependent on androgens for growth. Tumor volume can be monitored by the androgen-regulated biomarker, PSA. Androgen dependent disease can be treated with androgen deprivation therapy (ADT), which will lead to a decrease in tumor volume. After a period of time, prostate cancer cells will adapt to grow in castrate levels of circulating testicular androgens, leading to a castration resistant phenotype. Evidence of increased intraprostatic androgens and AR activity in CRPC has led to clinical trials using treatment with either second generation anti-androgens or inhibitors of androgen synthesis. A small proportion of patients will progress to castration resistant prostate cancer without an associated increase in PSA.
Promising results have recently been seen in early clinical trials with novel methods to directly inhibit the androgen-AR signaling axis in CRPC, including more potent second-generation anti-androgens and inhibitors of androgen synthesis. MDV3100 (discussed above in 1.2.5), a second generation anti-androgen, showed promising results in Phase I/II clinical trials with CRPC patients who failed current anti-androgen and chemotherapeutic treatments (Tran et al. 2009). MDV3100 is currently being evaluated in Phase III clinical trials. Abiraterone acetate, a specific inhibitor of an enzyme required for androgen synthesis (CYP17A1), was recently approved for clinical use for the treatment of CRPC. Clinical trials with abiraterone acetate in combination with docetaxel showed a 3.9 month survival advantage for patients with CRPC over treatment with docetaxel alone (de Bono et al. 2010). MDV3100 and abiraterone acetate could provide additional therapeutic benefit in combination with LHRH agonists to delay the progression to CRPC.
In light of the recently identified ligand-independent AR splice variants, inhibitors targeting the N-terminal domain of AR may be more effective than current anti-androgens that target the AR LBD. Additional methods to target the androgen-AR signaling axis would be to target AR chaperone molecules like HSP27 and HSP90. Phase II clinical trials treating CRPC patients with an HSP27 inhibitor (OGX-427) have shown decreased PSA levels (Hotte et al. 2009). In pre-clinical experiments, an inhibitor of HSP90 (PF-04929113) has been shown to inhibit progression to CRPC in an LNCaP xenograft model (Lamoureux et al. 2011).
New therapeutic agents to treat CRPC are currently in clinical trials. These agents target protein kinases, growth factor receptors and binding partners, cell survival proteins, and proteins involved in metastasis (i.e. proteins promoting cell migration, invasion, and angiogenesis). One of these agents, targeting a cell survival protein, is an inhibitor of clusterin (CLU). A Phase II clinical trial with an inhibitor of CLU (OGX-011) and docetaxel/prednisone showed a 6.9 month survival advantage over docetaxel/prednisone alone (Chi et al. 2010). OGX-011 is currently being evaluated in Phase III clinical trials. Both OGX-011 targeting CLU and OGX-427 targeting HSP27 are 2'-methoxyethyl antisense oligonucleotides designed to target the respective mRNA molecules to prevent translation of those mRNAs into protein.
Therapies for CRPC, to date, have generally been palliative with modest survival benefits. More research is required to understand the molecular mechanisms behind progression to CRPC in order to develop more durable and targeted therapeutics. It is clear that the AR continues to play a pivotal role in CRPC. The progression to CRPC may not be through a single molecular mechanism but through multiple methods to achieve the same goal of growth with castrate levels of circulating androgens.
Understanding the AR and non-AR mechanisms may provide biomarkers that allow patient stratification to inform targeted treatment options.
1.4 RNA
As discussed in the previous sections, the androgen-AR signaling axis is pivotal to understanding prostate cancer and progression to CRPC. A considerable amount of data has been generated in RNA transcript profiling experiments (i.e. microarray and RNA sequencing) in an effort to understand androgen transcript regulation [early efforts reviewed in (Dehm and Tindall 2006); (Jia et al. 2008; Ngan et al. 2009; Wang et al. 2009; Yu et al. 2010)]. These experiments have mainly focused on conventional transcriptional regulation of protein-coding mRNA. Protein levels are inferred from these RNA profiling experiments and pathway analyses have been generated [reviewed in (Lamont and Tindall 2010)]. In this section, high throughput RNA profiling methods, RNA databases and data visualization tools are examined. Unconventional regulatory roles of RNA that exceed the established role of RNA as a messenger between DNA and protein are also examined in this section. In Chapter 4 and Chapter 5, some of these regulatory roles in the context of androgen regulation in prostate cancer are examined.
1.4.1 High Throughput RNA Profiling
Many of the recent advances in our understanding of the human transcriptome (collection of RNAs transcribed from the human genome) have been the result of advances in high throughput RNA profiling technologies including high density microarrays and next-generation sequencing. In Chapter 4 and Chapter 5, data generated by both of these RNA profiling technologies are presented. Microarrays are a tool to measure the levels of a collection of RNAs within samples of interest. Specific single-stranded DNA sequences or probes are attached to a glass slide, generally the size of a microscope slide.
Single-stranded cDNA or cRNA, generated from RNA extracted from the samples of interest, is fluorescently labeled and then hybridized to the microarray. The hybridized microarrays are then scanned using a laser scanner and the fluorescence levels are compared between the samples of interest to determine relative levels of RNA. Advances in robotics and laser scanners have improved the density of microarrays, increasing the number of oligonucleotide sequences that can be attached to a single glass slide to ~1 million features. Agilent's SurePrint technology is an example of high density microarray technology. SurePrint technology involves ink-jet printing or in situ synthesis of oligonucleotides (60 nt), enabling rapid and cost-effective printing of standard or custom probe sets. High density microarrays are no longer limited to one microarray probe to measure the relative levels of a target protein-coding RNA. These microarrays can be used to tile across large genomic regions to detect novel RNA transcription or can be used to detect expression levels of alternatively spliced protein-coding transcripts (discussed below in 1.4.4). Advances in cRNA/cDNA generation and labeling protocols have increased strand-specificity and reduced the 3' bias seen in earlier microarray protocols. Microarrays have a number of limitations: they measure relative levels of RNA; they are limited to short oligonucleotides (e.g. the standard Agilent microarray probe length is 60 nt); and they are limited by probe design. Microarray probe design requires a priori knowledge of target sequence. Probe design is also limited by oligonucleotide composition in order to optimize specific hybridization of the target sequence to the oligonucleotide probe (discussed below in 3.5).
Another recent advance in high throughput RNA profiling is RNA-seq, in which next-generation sequencing technologies (e.g. Illumina Genome Analyzer II or ABI SOLiD sequencing platform) are used to generate short sequence reads from cDNA libraries. Millions of short cDNA fragments are sequenced in parallel, generating gigabytes of data in a single sequencing run. The short sequencing reads (current limit 100 bp) can be single-end or paired-end reads (i.e. both ends of a larger fragment of cDNA are sequenced). The sequencing reads can be aligned to a reference genome or can be used for de novo transcriptome assembly. RNA-seq has a number of advantages over microarray technologies including an unbiased view of the transcriptome with the ability to detect novel transcripts and alternative splice variants, and a larger dynamic range of measurement. Some of the current limitations to RNA-seq technologies are access to sequencing platforms, cost, strand-specificity, data storage, and analysis. The cost of RNA-seq is much less than previous sequence-based technologies but still significantly more than microarray technologies at the present time. Efforts are ongoing to reduce costs, increase read length, and increase efficiencies of high throughput sequencing. Strand-specific RNA-seq protocols are available but have not yet been universally adopted. Although analysis pipelines and storage methods exist to deal with the data generated by RNA-seq, we are still only at the edge of exploring the potential of what is possible with high throughput sequencing.
1.4.2 Protein-Coding RNA
The established role of RNA as a copy (or transcript) of a DNA blueprint that is translated into a protein is recognized in Francis Crick's Central Dogma of Molecular Biology (1958). The dogma characterizes RNA (messenger RNA; mRNA) as a single-stranded transitory messenger required to facilitate protein synthesis. RNA transcription is mediated by the binding of transcription factors (e.g. AR) and co-factors to DNA and recruitment of transcriptional machinery.
Unprocessed RNA, pre-mRNA, is transcribed from the DNA by an RNA polymerase enzyme, extending the pre-mRNA molecule in a 5' to 3' direction. Pre-mRNA is processed by 5' capping, exon splicing (removing sequences called introns) and 3' polyadenylation. Mature processed mRNA is exported from the nucleus to the cytoplasm and can then be translated into protein. Regulatory sequences exist at both the 5' and 3' ends of mRNA that are not translated into proteins; these regions are called untranslated regions (UTR) or 5' UTR and 3' UTR, respectively.
1.4.3 RNA Databases and Data Visualization
A number of open-access RNA databases and visualization tools are used later in this thesis including GenBank, RefSeq, Entrez Gene, and the UCSC Genome Browser. GenBank, RefSeq, and Entrez Gene are databases that are available from the National Center for Biotechnology Information (NCBI) in Bethesda, MD, USA. GenBank is a public primary repository that stores nucleotide sequences submitted from individual laboratories and large scale sequencing projects. GenBank sequences are curated to generate a list of non-redundant reference sequences that are stored in a secondary public database called RefSeq. In addition to reference mRNA sequences, RefSeq stores reference sequences for genomic DNA contigs and proteins for known genes (Pruitt et al. 2009). Standardization of sequences is critical for integration of genomic information with content relating to biological function and clinical significance. GenBank and RefSeq sequences are further integrated to consolidate gene-specific information in the Entrez Gene database. Each Entrez Gene is assigned a unique identifier (Gene ID) to facilitate integration with other NCBI and external resources including nomenclature (official gene names and symbols), links to citations, variation information, chromosomal location, expression, homologs, protein domains, and protein interaction information (Maglott et al. 2011).
Entrez Gene IDs are used extensively to link to other databases (e.g. KEGG, GO ontology terms from the Gene Ontology Consortium) for pathway and functional analysis of high throughput RNA profiling experiments. The majority of content in both RefSeq and Entrez Gene is focused on protein-coding regions of the genome.
The University of California, Santa Cruz (UCSC) Genome Browser provides a valuable online tool to access and visualize genomic data. The UCSC Genome Browser facilitates comparison of publicly available data, including information from such resources as the NCBI databases and a similar European data resource, Ensembl (Flicek et al. 2011), with user-generated data (Fujita et al. 2011). An RNA sequence can be viewed in genomic context with other RNA sequences (e.g. alternative transcripts, non-coding transcripts, and adjacent transcripts) and genomic features (e.g. repeats, SNPs, regulatory regions) in an effort to better understand regulation. An example of the UCSC Genome Browser representation is given for the KLK3 genomic location in Figure 1.4. The figure shows that GenBank holds numerous mRNA sequences for the KLK3 region that are consolidated into fewer mRNA reference sequences in RefSeq and Ensembl.
Figure 1.4 UCSC Genome Browser: KLK3 Genomic Locus. Transcript characteristics are represented as follows: short blocks are UTRs, tall blocks are protein-coding exons, lines between exons are introns, and arrows on intron lines represent direction of transcription. PSA protein is encoded on the positive strand (left: 5' end; right: 3' end) of chromosome 19 (19q13.33) at the KLK3 genomic region (human genome version: hg18). A. NCBI RefSeq has 4 transcript variants for KLK3. The top reference sequence for KLK3 has 5 coding exons and 4 introns. B. Ensembl has 3 transcript variants for KLK3. C. RNA sequences submitted to NCBI GenBank mapping to the KLK3 genomic region. Red transcripts are annotated as sequenced from prostate tissue. D. RNA sequence mapping to the negative chromosomal strand of the KLK3 genomic locus (i.e. antisense to KLK3).
1.4.4 Alternative Splicing of Protein-Coding RNA
In the last decade, large scale DNA sequencing projects of the human genome estimated that the human genome encoded 20,000-25,000 proteins (Human Genome Project 2004). The estimates for the human proteome are very similar to those for simpler organisms such as Caenorhabditis elegans, a common roundworm, estimated to have ~19,000 proteins (CESC 1998). Data generated by new high throughput RNA profiling technologies in large scale projects such as ENCODE (Encyclopedia of DNA Elements) are revealing a more complex picture of transcription. Most protein-coding regions identified at the DNA level encode an average of five alternative transcripts (Birney et al. 2007; Tress et al. 2007). The protein isoforms generated from these alternative transcripts can have vastly different biological functions. The modular use of UTRs, exons, and retained introns seen in alternative transcripts may be an important link to understanding the phenotypic complexity of higher organisms (Kim et al. 2007).
Alternative splicing of protein-coding RNAs can be both tissue- and condition-specific and has been implicated in disease (Garofalo et al. 2008). Alternative transcript splicing can alter protein function by altering the composition or conformation of a protein, including protein domains and localization signals. Alternatively spliced transcripts of clusterin, for example, when translated to protein isoforms have opposing biological functions: one isoform inhibits while the other promotes apoptosis (Leskov et al. 2003). As discussed previously, AR can be alternatively spliced to generate ligand-independent protein isoforms, expanding our understanding of the continuing role of AR in a low androgen environment (discussed in 1.3.3).
More than 20 additional AR transcript variants have been identified but little is known about their biological function or significance to disease (Guo et al. 2009). The KLK3 genomic region, encoding the PSA biomarker, can generate at least 15 different transcripts (Heuze-Vourc'h et al. 2003), many with unknown function and many that are not represented in reference RNA databases (Figure 1.4). High throughput RNA profiling experiments continue to reveal cellular potential for alternative splicing. The biological significance of most of the alternative splice events, however, remains unknown.
Alternative transcripts exist that do not alter the protein-coding potential of the transcripts but instead alter UTRs, which regulate protein translation efficiencies, RNA stability, or RNA localization. These alternative 5' or 3' UTRs are generated by alternative promoter usage and alternative polyadenylation sites, respectively, and are commonly seen to correlate with tissues and conditions (Wang et al. 2008). Estimates suggest that over half of protein-coding mRNAs can have an alternative promoter (Carninci et al. 2006; Kimura et al. 2006) and/or an alternative polyadenylation site (Tian et al. 2005). Global analysis of 3' UTR usage has shown that truncated 3' UTRs can be associated with proliferating cells (Sandberg et al. 2008). A coordinated shift to use truncated 3' UTRs has also been identified in some types of cancer cells (Mayr and Bartel 2009; Singh et al. 2009). The mechanism for these global changes in 3' UTR length is unknown. mRNAs with shorter 3' UTRs can be more stable, evading translational repression machinery, and can therefore lead to higher protein levels (Mayr and Bartel 2009). Longer 3' UTRs extending well beyond previously annotated polyadenylation sites have been identified (Doherty et al. 1999; Moucadel et al. 2007).
RNA sequencing projects are revealing chimeric transcripts that are encoded on distant DNA and sometimes different chromosomes.
In the ENCODE pilot study, over half of the protein-coding genes studied utilized exons outside the boundaries of an annotated gene (Birney et al. 2007). Many transcription start sites can be located at large distances upstream of the annotated start sites, often skipping neighboring protein-coding regions of the DNA (Denoeud et al. 2007). Some of these chimeric transcripts are formed by genomic rearrangements such as the TMPRSS2-ERG fusion transcript seen in prostate cancer (discussed in 1.1.5 above). Many efforts have been made to identify these chimeric transcripts as they provide potential disease-specific biomarkers and therapeutic targets (Maher et al. 2009). The chimeric transcript, SLC45A3-ELK4, is of particular interest, as it is regulated by androgens and, though it is expressed in normal tissue, it is expressed at high levels in a subset of prostate cancer samples. Unlike the TMPRSS2-ERG fusion, the presence of the SLC45A3-ELK4 transcript is not restricted to samples with genomic rearrangements (Rickman et al. 2009). SLC45A3 and ELK4 are located 25 kb apart on chromosome 1, suggesting that a trans-splicing mechanism (where two independent transcripts are spliced together) or a read-through mechanism (where transcripts from consecutive genes on the same chromosome strand are spliced together) may be involved in generating the chimeric transcript. A recent RNA-seq experiment identified additional read-through chimeric transcripts in prostate tumor samples; the function and clinical significance of these chimeric transcripts remain unknown (Nacu et al. 2011).
An intriguing study found the presence of a chimeric transcript (from different chromosomes) in normal tissue that mimics a genomic rearrangement in endometrial stromal tumors (Li et al. 2008). Since the chimeric transcript was detected in normal tissue but a genomic rearrangement was not, Li et al. proposed that a trans-splicing event might be a pre-condition for the genomic rearrangement.
The discovery of chimeric transcripts forces a shift in our perception of the role and complexity of RNA. Through mechanisms such as trans-splicing, RNA may not be a simple linear copy of DNA but may use modular regions of the DNA (Gingeras 2009).
1.4.5 Non-Coding RNA (ncRNA)
Large scale high throughput RNA profiling projects have revealed complex overlapping, interleaved, antisense, and intergenic transcription that cannot be attributed to protein-coding transcripts. Estimates from the ENCODE pilot project suggest that ~93% of the human genome is transcribed into RNA but less than 2% of that encodes proteins (Birney et al. 2007). Transcription from regions of 'junk DNA' (Ohno 1972) that did not encode a protein was previously viewed as transcriptional noise. Like protein-coding RNA, most non-protein-coding RNA is also spliced and similarly polyadenylated. Although the estimates for transcriptional potential may be liberal due to sensitive measurement, it is becoming increasingly apparent that non-protein-coding RNA, termed non-coding RNA (ncRNA), plays a significant regulatory role in cellular function. In this section, some examples of the regulatory roles of RNA are examined in order to highlight the importance of including non-protein-coding genomic regions in transcriptome analysis.
The early exceptions to the messenger role of RNA were two forms of RNA required for protein synthesis: transfer RNA (tRNA) delivers amino acids to ribosomes (the site of protein synthesis) and ribosomal RNA (rRNA) links those amino acids together into proteins. RNA is a single-stranded molecule and as such can form secondary and tertiary structures that exceed those of DNA, which is predominantly a double-stranded helix. Both tRNA and rRNA rely on complex structures for their function.
Although the sequences of many ncRNAs are highly conserved through evolution, it is difficult to rule out evolutionary conservation of the functional structure of an ncRNA (Mattick 2001). Many functional ncRNAs have been shown to have rapid sequence evolution, implying that a lack of sequence conservation does not equate to a lack of function (Pang et al. 2006).
Some of the functional classes of RNA described in the next sections will illustrate the potential for ncRNAs to provide a sequence specific signal for a generic protein complex (Mattick and Makunin 2006). Many therapeutics are designed to target the generic protein complex where it may be more effective to target a disease- or condition-specific RNA signal. In later chapters of this thesis, androgen regulated ncRNAs are identified that may provide therapeutic targets for the treatment of prostate cancer.
1.4.6 Small Non-Coding RNA
ncRNAs are crudely classified based on conventional RNA purification methods: small ncRNAs are shorter than 200 nt and long ncRNAs (lncRNA) are longer than 200 nt (Mercer et al. 2009). In the last decade, most of the research in the field of ncRNA has been focused on microRNA (miRNA). miRNAs are ~22 nt single-stranded RNAs that provide the sequence specific component of the RISC protein complex that allows for targeted translational repression of protein-coding transcripts. A miRNA can be processed from intronic transcription of a protein-coding RNA, either independently or as part of the protein-coding transcript, or from an lncRNA (see Figure 1.5 for a diagram of miRNA biogenesis). To date, 1,426 human miRNAs have been documented (miRBase Release 17).
Figure 1.5 miRNA Biogenesis. Hairpins found in longer protein-coding and non-coding RNA transcripts (pri-miRNA) are cleaved by the DROSHA enzyme to produce pre-miRNAs. Pre-miRNAs are exported from the nucleus and then cleaved by another enzyme called DICER. The resulting ~22 nt double-stranded RNA is unwound and a single strand is incorporated into the RISC complex to form a mature miRNA. The miRNA can then bind to a target mRNA (usually in the 3' UTR). Depending on the sequence complementarity of the miRNA to the mRNA, the mRNA will either be translationally repressed or degraded [reviewed in (Bartel 2004)].
A number of computer algorithms exist to predict the interaction of a miRNA with 3' UTRs of mRNA including Microcosm (formerly miRBase Targets) (Griffiths-Jones et al. 2006), TargetScan (Lewis et al. 2005), and PicTar (Krek et al. 2005). These algorithms each produce many false-positive predictions for a number of reasons: miRNA sequences are short and have a high degree of sequence similarity between families of miRNAs, and complete complementarity of the miRNA sequence to the mRNA sequence is not required for function. Predicting miRNA-mRNA interactions is further complicated by protein-coding RNAs utilizing alternative 3' UTRs in different tissues and conditions (discussed in 1.4.4).
Dysregulation of specific miRNAs has been observed in cancer tissue compared to normal tissue [reviewed in (Spizzo et al. 2009)]. miRNA mutations and dysregulation can play a role in promoting the growth of cancer cells (Esquela-Kerscher and Slack 2006). miRNAs provide an elegant mechanism for coordinated translational regulation of a network of proteins. These cancer associated miRNAs can provide valuable, previously overlooked insight into cancer progression and may provide biomarkers for cancer or specific subtypes of cancer (Chin and Slack 2008). Mechanisms to therapeutically suppress or express a miRNA are currently being evaluated in preclinical models [reviewed in (Kim et al. 2011)].
Small RNAs are difficult to detect using microarray technologies due to the short length of the RNA molecule and cross-hybridization potential.
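The target prediction problem described above can be illustrated with a minimal Python sketch. It assumes the commonly described "seed" model, in which a short stretch near the 5' end of the miRNA (here nucleotides 2-7) pairs perfectly with the 3' UTR while the remainder of the miRNA need not pair. The sequences and the function names `reverse_complement` and `seed_sites` are hypothetical and chosen only for illustration; the sketch is not the algorithm of Microcosm, TargetScan, or PicTar, which consider additional information beyond a simple sequence match.

```python
from typing import List

# Toy illustration of seed-based miRNA target-site scanning.
# All sequences are made up for this example; they are not real miRNAs or UTRs.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna: str, utr: str, seed_start: int = 2, seed_end: int = 7) -> List[int]:
    """Return 0-based UTR positions that pair perfectly with the miRNA seed.

    seed_start and seed_end are 1-based, inclusive positions on the miRNA
    (assumption: nucleotides 2-7 form the seed).
    """
    seed = mirna[seed_start - 1:seed_end]
    site = reverse_complement(seed)  # sequence expected in the UTR, 5'->3'
    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)
    return positions


# Two hypothetical miRNAs from the same "family": identical seed, different 3' ends.
MIRNA_A = "UACGGAUCCAGUUAGCAUGCAA"
MIRNA_B = "UACGGAUGGUCAAUCGGAUUCC"
# Hypothetical 3' UTR fragment containing two seed-match sites.
UTR = "GGCAUCCGUAAACCCAUCCGUUU"

print(seed_sites(MIRNA_A, UTR))  # [3, 15]
print(seed_sites(MIRNA_B, UTR))  # [3, 15]
```

Because only six nucleotides must match, short sites of this kind occur frequently by chance, and miRNAs that share a seed return identical candidate sites. This is consistent with the high false-positive rates noted above for sequence-based prediction.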
Many early microarray experiments to detect miRNA expression gave inconsistent results, especially in the context of miRNA profiling in prostate cancer (Gandellini et al. 2009). Protocols do exist, however, to isolate and sequence small RNA. Androgen regulated small RNAs in prostate cancer cells are identified using these sequencing protocols later in this thesis (Chapter 4).
Although early small RNA sequencing experiments were designed to detect miRNA, many other functional classes of small RNAs can be detected including small nucleolar RNA (snoRNA), involved in guiding chemical modifications of other RNAs; small nuclear RNA (snRNA), involved in RNA splicing; tiRNAs, associated with transcription initiation (Taft et al. 2009); spliRNA, associated with RNA splice-sites (Taft et al. 2010); and PIWI-interacting RNA (piRNA), involved in maintaining genome integrity and regulating the expression of transposable elements (Malone and Hannon 2009). Small RNA sequencing experiments are revealing many new classes of small RNA whose mode of action is still unclear. It is intriguing that in vitro siRNA knockdown of PIWIL1, a protein that binds piRNAs to maintain genome integrity, increased the frequency of the TMPRSS2-ERG chromosomal translocation in normal prostate epithelial cells (Lin et al. 2009) (discussed in 1.1.6). Evidence is mounting that small RNAs play an important role in cancer and cancer progression. The design of appropriate therapies to modulate small RNA function will open a whole new avenue for targeted therapeutics.
1.4.7 Long Non-Coding RNA (lncRNA)
There has been less focus in the literature on the regulatory roles of lncRNA. Examples of lncRNAs have been shown to regulate numerous biological processes including chromatin-remodeling, transcription, and post-transcriptional processing (Mercer et al. 2009).
Survey studies of RNA expression in the mouse have identified lncRNAs expressed in different tissues, cell types, subcellular locations, different stages of development, and during cell differentiation (Amaral and Mattick 2008; Dinger et al. 2008; Mercer et al. 2008). lncRNA can be polyadenylated and alternatively spliced in a similar fashion to protein-coding RNA. Functional classification of lncRNA has been mainly based on genomic context in relation to protein-coding transcripts: antisense (transcribed from the opposite strand of the chromosome), intergenic (transcribed outside of the annotated boundaries of a protein-coding gene), promoter or enhancer associated, intronic, and independent 3' UTRs.
Katayama et al. (2005) estimated, based on the FANTOM3 large scale cDNA sequencing project, that ~72% of protein-coding transcripts in the mammalian genome have divergent, convergent and fully-overlapping antisense transcription. The expression of antisense transcripts may be linked to the expression of transcripts from the other strand of DNA. Concordant regulation of sense-antisense pairs suggests that antisense transcription is involved in chromatin-remodeling to promote an active conformation of DNA. Discordant regulation of sense-antisense transcripts may be through the collision of transcriptional machinery or through the formation of sense-antisense RNA duplexes after transcription. Sense-antisense RNA duplexes could alter mRNA stability by changing the structural conformation of RNA or by triggering RNA degradation by various RNases (Faghihi et al. 2010).
The estimates for antisense transcription may, however, be liberal. The discovery of antisense artifacts generated by a reverse transcriptase enzyme during cDNA library construction has raised some concerns about current methods for detecting antisense transcription (Perocchi et al. 2007).
The possibility of artifact, however, should not detract from the significant potential of antisense transcription seen in such validated examples as X-chromosome inactivation regulated by the XIST-TSIX sense-antisense pair (Ohhata et al. 2008), p15 tumor suppressor protein inhibited by a p15 antisense (p15as) transcript (Yu et al. 2008), and the HOXD genomic loci inhibited by HOTAIR (expressed antisense to the HOXC genomic loci) (Rinn et al. 2007). All three of the antisense transcripts described above interact with chromatin- remodeling machinery to regulate transcription by altering chromatin state.  The interaction of these lncRNAs with chromatin-remodeling machinery suggests that lncRNAs could serve as scaffolds for the assembly of histone modification complexes at specific genomic loci (Tsai et al. 2010).  In the p15as example, Yu et al. found an inverse expression between p15as and p15 in tumor samples from patients with leukemia, a disease where epigenetic silencing of p15 is common.  Yu et al. showed that in vitro over-expression of p15as increased silencing of p15.  The silencing of p15 31  remained persistent after expression from the p15as construct was turned off.   If continuing expression of an lncRNA was not required for persistent epigenetic silencing, low levels of the lncRNA would be sufficient to alter cellular state.  The persistent epigenetic silencing of p15 has interesting therapeutic implications from a global perspective suggesting that it may be possible to co-opt a cancer cell‟s epigenetic silencing mechanisms through RNA-based therapies to achieve persistent silencing of oncogenic pathways. In addition to antisense transcription, many lncRNAs are transcribed in regions outside the annotated boundaries of known genes.  These intergenic lncRNAs have recently been termed lincRNAs (large intergenic non-coding RNAs) (Guttman et al. 2009; Khalil et al. 2009).  Khalil et al. 
(2009) designed a custom microarray to detect the expression of 3,289 lincRNAs in human cells.  Building on examples such as XIST and HOTAIR, Khalil et al. (2009) hybridized to their custom microarray RNA that had been coimmunoprecipitated (RNA IP) with antibodies directed to several protein members of chromatin-remodeling complexes.  They found that 24% of the lincRNAs tested physically associated with the chromatin-inactivating complex PRC2, the complex responsible for the repressive H3K27me3 histone modification.  One of the antibodies used in this experiment was against EZH2, a member of the PRC2 complex.  This is particularly interesting in relation to prostate cancer research as EZH2 has been reported to be increased in metastatic prostate cancer (Varambally et al. 2002; Yu et al. 2007).  The RNA IP microarray data generated by Khalil et al. will be integrated with data presented later in this thesis (Chapter 5).

lncRNAs can be associated with transcriptional activation through interaction with chromatin-activating complexes (Dinger et al. 2008) and through transcription at enhancers and promoters (Kim et al. 2010; Wang et al. 2011).  Wang et al. recently reported that transcription following FOXA1 binding at enhancer regions can regulate a subset of the AR transcriptional program in prostate cancer cells (Wang et al. 2011).  Promoter and enhancer associated transcripts may help to assemble transcriptional machinery at specific genomic loci.

lncRNAs can be associated with alternative splicing, as in the case of ZEB2.  ZEB2 is a protein that plays a role in epithelial to mesenchymal transition (EMT) and migration of cancer cells.  ZEB2 antisense transcription masks a splice site in the 5' UTR of ZEB2, preventing the excision of the internal ribosomal entry site necessary for protein translation (Beltran et al. 2008).

lncRNAs can act as decoys to compete with 3' UTRs of protein-coding transcripts to alter the dynamic regulation by miRNA and RNA-binding proteins.
Analysis of publicly available cDNA and CAGE sequencing data (CAGE sequencing uses the 5' cap of RNA to identify the first 20-25 nt in polyadenylated RNAs) detected expression of independent transcripts within 3' UTRs of protein-coding transcripts (Mercer et al. 2011).  Using in situ hybridization, Mercer et al. reported that some of these 3' UTR transcripts had tissue- and condition-specific expression that was not correlated with the protein-coding region of the transcript.  Independent 3' UTR transcripts may bind miRNAs or RNA-binding proteins, thereby altering their availability for regulation of protein-coding transcripts.  A similar modulation of miRNA dynamics has been identified through the transcription of pseudogenes (Poliseno et al. 2010).  Pseudogenes can be generated by genetic duplication or by retro-transposition of protein-coding transcripts.  Although pseudogenes have lost protein-coding capacity, they retain regulatory regions such as UTRs and can still be transcribed.  PTEN pseudogene 1 (PTENP1) transcripts can interact with PTEN-targeting miRNAs, allowing PTEN transcripts to evade miRNA-mediated translation repression.  Increased expression of the PTENP1 transcript leads to increased PTEN protein levels (Poliseno et al. 2010).  The expression of decoy RNA sequences can alter the translation of many different proteins.  The exogenous expression of decoy RNA sequences in vitro can inhibit the activity of a miRNA and may be an approach for therapeutic inhibition of miRNAs (Haraguchi et al. 2009).

lncRNAs can interact with proteins to modulate their localization and function.  An intriguing example is the interaction of a hairpin structure of an lncRNA, growth-arrest specific 5 (GAS5), with the DNA-binding domain of the glucocorticoid receptor (GR).  GAS5 expression can decrease ligand-dependent transcriptional activity of GR.
GAS5 can also inhibit ligand-dependent transcriptional activity of other steroid hormone receptors that share similar DNA response elements, such as the androgen, progesterone, and mineralocorticoid receptors (Kino et al. 2010).  It is possible that other transcription factors may have similar unidentified decoy RNAs that mimic DNA binding to prevent DNA-protein interactions.  In contrast to GAS5, the lncRNA steroid receptor RNA activator (SRA) can interact with the N-terminal domain of steroid receptors to promote their transactivation (Lanz et al. 1999).

1.4.8 Long Non-coding RNAs in Prostate Cancer

A focus of prostate cancer research has been on androgen and the networks of proteins that it regulates.  Little effort has been directed toward understanding the role of androgen regulated non-coding RNAs.  Two prostate cancer specific lncRNAs have been described: PCGEM1 and PCA3.  PCGEM1 is over-expressed in primary prostate tumors compared to benign prostatic tissue in the majority of patients (Srikantan et al. 2000).  Over-expression of PCGEM1 in an androgen-dependent cell line (LNCaP) promotes cell proliferation and an increase in colony formation, suggesting PCGEM1 has a functional role in prostate tumorigenesis (Fu et al. 2006).  PCA3 (also known as DD3) is a prostate cancer specific lncRNA which is being developed as a potential new diagnostic biomarker.  In clinical trials, PCA3 levels in the urine were able to predict the outcomes of prostate biopsies and, in conjunction with PSA, were shown to be more specific than PSA alone [reviewed in (Hessels and Schalken 2009)].  The biological function of PCA3 has yet to be determined.  Reis et al. performed a more systematic survey of lncRNA expression in prostate cancer using a custom low density cDNA microarray to detect antisense transcription from intronic sequences.  They detected intronic transcripts in prostate cancer cells, 39 of which were androgen regulated (Reis et al. 2004; Louro et al. 2007).
1.4.9 Multitasking Genomic Loci

As discussed in the previous sections, transcription is complex, with many genomic loci encoding highly regulated sets of transcripts.  The extent of biological complexity cannot be described by grouping transcripts encoded in a genomic locus into one gene with one associated function (Pearson 2006; Gerstein et al. 2007; Carninci 2010).  The gene-centric view of biology works in the context of Mendelian genetics, where a gene is a unit of inheritance, but it does not fully describe the functional potential of a genomic locus.  Gene-centric databases like NCBI Entrez Gene and the Gene Ontology (GO) are invaluable for global analyses of large scale RNA profiling experiments and critical for systems biology, but they may be masking the complexity of genomic loci.

A genomic locus can have the potential to generate multiple protein isoforms with different and potentially opposing functions through alternative RNA splicing (discussed in 1.4.4).  A miRNA with the potential to inhibit the translation of a large spectrum of protein-coding mRNAs can itself be processed from the intron of a protein-coding mRNA (discussed in 1.4.6).  A genomic locus can be transcribed in both directions and can have local (e.g. p15 antisense) and distal (e.g. HOTAIR) regulatory roles.

The lines defining a genomic locus as protein-coding or non-coding are further blurred by the SRA example.  As described above, the lncRNA SRA can promote transactivation of steroid receptors.  An isoform of SRA transcribed from the same genomic locus can encode a protein called SRAP (Chooniedass-Kothari et al. 2004).  SRAP protein inhibits SRA RNA functional activity through direct protein-RNA interaction (Hube et al. 2010).
Many genomic loci in NCBI's Entrez Gene database are annotated as protein-coding; however, the RefSeq reference sequences for those genomic loci may contain, in addition to protein-coding transcripts, reference transcripts that are not predicted to encode proteins.

An individual protein-coding transcript may have alternative non-coding regulatory functions.  p53 mRNA has both protein-coding and non-coding function.  p53 mRNA interacts directly with the MDM2 protein to prevent MDM2 from promoting the degradation of p53 protein.  Cancer-derived silent point mutations in p53 mRNA do not alter p53 protein composition but instead alter p53 protein stability.  The point mutation prevents p53 mRNA from binding to MDM2, permitting MDM2 to degrade p53 protein (Candeias et al. 2008).  The UTRs of a protein-coding mRNA may also have currently unknown regulatory functions.  The example of the PTEN pseudogene (PTENP1) (described in 1.4.7) modulating miRNA dynamics brings into question the regulatory role of UTRs as part of a protein-coding mRNA.  Many UTRs can be larger than their protein-coding counterparts and may themselves regulate miRNA dynamics or interact with RNA-binding proteins.  Most functional experiments designed to elucidate the role of a protein-coding transcript artificially introduce a vector containing only the protein-coding sequence; over-expression of the complete mRNA sequence, including UTRs, may have a different influence on cellular state.  Genome-wide experiments such as high throughput sequencing of RNA (RNA-seq) or of RNA coimmunoprecipitated with antibodies against RNA-binding proteins (RIP-seq) (Rederstorff et al. 2010; Zhao et al. 2010) will further elucidate state-specific RNA expression patterns and potential functional interactions with proteins.

Pathway and functional over-representation analysis, using data generated from high throughput RNA profiling experiments, can reveal important coordinated responses to treatment.
These computational approaches generally consider transcript expression independently of genomic context.  A parallel analytical approach considering genomic context and complexity may reveal critical regulatory mechanisms that could potentially be exploited for therapeutic purposes.  Concordant or discordant regulation of overlapping or adjacent protein-coding or non-coding transcripts may be an indicator of coordinated networks of transcription.  An analysis of a genome wide transcriptome profiling experiment revealed a ripple effect of transcription, where transcription from one locus spread to adjacent genomic regions (Ebisuya et al. 2008).  An example of the importance of genomic context is found in the thyroid hormone receptor-α (THRA) genomic locus.  A portion of an orphan receptor, NR1D1, is encoded on the opposite strand of DNA and partially overlaps THRA.  Expression of NR1D1 influences the splicing of two antagonistic isoforms of THRA (Hastings et al. 1997).

The examples of complex transcription described in the previous sections suggest a different, parallel approach to analysis of high throughput RNA profiling data.  Many current computational approaches average expression across a reference RNA sequence or consolidate expression to a gene level.  With lower density arrays, relying on expression values from probes in the 3' UTRs may be misleading, as the 3' UTR may be expressed independently of the coding sequence or an alternative 3' UTR may be used.  With high density microarrays, averaging the expression of individual probes across a reference RNA sequence will mask alternative splice variants.  Although RNA-seq has many advantages over high density microarrays, incorrect assumptions made during analysis can be misleading.  Most data currently generated by RNA-seq is not strand-specific; however, many genomic loci have overlapping antisense transcription.
Incorrect assumptions about the direction of transcription can lead not only to antisense transcription being completely overlooked but also to corrupted expression values for sense transcripts.  RNA-seq has the advantage of being a relatively unbiased technology; however, relying completely on RNA reference sequences for analysis re-introduces a gene-centric bias.

In this thesis, transcripts are identified that are expressed in the prostate and in prostate cancer tissue.  RNA-seq and high-density microarrays were used to identify and profile expression of genomic regions overlooked in the RefSeq RNA reference database.  Parallel analytical approaches were used to consider transcript expression in genomic context but also in a system-wide context using protein-coding transcript expression to infer the expression of proteins.

1.5 Scope of the Thesis

1.5.1 Hypotheses

RNA transcripts regulated by androgens that are novel or stored in primary sequence databases, in conjunction with well studied NCBI RefSeq transcripts, will inform on the mechanism underlying androgen-deprivation therapy, anti-androgen response and the progression to castration resistant prostate cancer.

1.5.2 Rationale and Specific Objectives

1) Identify prostate expressed transcripts using public databases and locally generated data and use those transcripts to build a custom microarray
2) Identify androgen regulated transcripts using data from in vitro experiments
3) Integrate in vitro data with RNA expression profiles from an in vivo model
4) Prioritize transcripts and genomic loci for further functional characterization

The first part of Objective 1, to identify prostate expressed transcripts, is addressed in Chapter 2.  Primary sequence databases are mined for RNA transcripts expressed in the prostate.  RNA-seq data of prostate cancer cell lines and the Illumina Body Map 2.0 (i.e. RNA-seq data on 16 different tissues including prostate) are also mined for genomic regions expressed in the prostate.
The second part of Objective 1, to build a custom microarray, is addressed in Chapter 3.  Novel probes are designed from the prostate expressed transcripts and genomic regions identified in Chapter 2 and used to design an Agilent custom 4x180K microarray.

Objective 2, to identify androgen regulated transcripts from in vitro experiments, is addressed in Chapter 4.  Androgen and anti-androgen regulated transcripts are identified by integrative analysis of three high throughput in vitro experiments in the LNCaP prostate cancer cell line: microarray profiling using the custom microarray described in Chapter 3, AR ChIP-seq, and small RNA next-generation sequencing.

The expression profiles of the identified androgen and anti-androgen regulated transcripts are examined in the LNCaP xenograft in vivo model in Chapter 4 to address Objective 3, the integration of in vitro and in vivo expression profiles.  The androgen and anti-androgen regulated transcripts are classified by their response to castration and progression to CRPC.  The global categories of response to androgen, anti-androgen, castration and progression to CRPC are identified for further functional characterization (Objective 4) in Chapter 4.  Chapter 5 further addresses Objective 4 by identifying specific transcripts and genomic loci for functional characterization.

Chapter 2 Identification of Prostate Expressed Transcripts

2.1 Introduction

Microarrays and RNA-seq technologies provide a platform for characterizing tissue- and condition-specific transcription.  To estimate tissue-specific transcription and to identify specific transcripts or splice variants expressed in the prostate (Objective 1), a comparative analysis of primary sequence databases and RNA-seq data was performed.  The Illumina Human Body Map 2.0 RNA-seq data was used to identify potential tissue-specific transcripts encoded on the human genome outside the boundaries of processed RefSeq transcripts.
Additional non-RefSeq prostate expressed transcripts were identified in GenBank and in RNA-seq data generated from two prostate cancer cell lines.

2.2 Methods and Results

2.2.1 Software and Datasets

The BEDTools (version 2.9) suite of software tools (Quinlan and Hall 2010) was used to compare sets of transcripts and to estimate the coverage of these transcript sets on the human genome (build 36.1; UCSC Genome Browser, hg18).  Genome alignments of RefSeq transcripts were downloaded from the UCSC Genome Browser in December, 2010 and GenBank human mRNA and EST transcripts were downloaded from the UCSC Genome Browser in April, 2009.  Functional over-representation analysis was performed with the DAVID bioinformatics online resource (Huang da et al. 2009).

Aligned RNA-seq data was obtained from the Collins Lab at the Vancouver Prostate Centre from the LNCaP androgen dependent prostate cancer cell line and from the C4-2 cell line, a castration resistant derivative of LNCaP.  An LNCaP stable transfectant line expressing wild-type AR was also used for RNA-seq.  Both cell lines were grown in 5% fetal bovine serum (FBS) media.  cDNA libraries were prepared from both cell lines using the Illumina standard mRNA-seq library preparation protocol from poly-A selected mRNA.  The library preparation protocol does not retain the original direction of transcription (i.e. not strand-specific).  It is, therefore, not possible to know which strand of DNA encoded the original transcript.  The Illumina Genome Analyzer II was used to generate 4 lanes of sequencing for each of the cell lines using the paired-end protocol (average read length of 50 bp).

RNA-seq data was obtained from Illumina for the Human Body Map 2.0 Project.  cDNA libraries were prepared from 16 different human tissues (see Table 2.2 for list of tissues) using the Illumina standard mRNA-seq library preparation protocol from poly-A selected mRNA.  The library preparation protocol is not strand-specific.
Each tissue was sequenced using the Illumina HiSeq 2000 system with one lane of paired-end sequencing (average read length 50 bp) and one lane of single-end sequencing (average read length 70 bp).  Raw data was aligned and analyzed by Dr. Marcel Dinger at the University of Queensland (Brisbane, Australia) using the de novo transcript assembly program, Cufflinks (Trapnell et al. 2010).  BED files and bedGraph files for each of the tissues were provided by Dr. Dinger (unpublished); these files were used to generate the analysis below.  In some cases, the Cufflinks software tool can infer the direction of transcription using the orientation of consensus splice sites.

2.2.2 Summary of NCBI RefSeq Transcripts

The RefSeq reference database contains a curated, non-redundant list of RNA reference sequences.  The 'genomeCoverageBed' tool in BEDTools was used to calculate the coverage of the protein-coding (i.e. NM_ accession number) and non-coding (i.e. NR_ accession number) transcripts on the human genome.  Figure 2.1 A gives a summary of the analysis: protein-coding mRNAs (UTRs and exons) covered 2.02% of the genome, ncRNAs covered 0.16%, and 0.06% (~1.94 Mb) of the genome could be used in either protein-coding or non-coding RNA.  The protein-coding focus of RefSeq is also illustrated in Figure 2.1 A, with 93% of the nucleotides covered by RefSeq transcripts used in protein-coding transcripts.

Figure 2.1 B shows that approximately a third of the human genome covered by RefSeq transcripts is present in multiple RefSeq transcripts.  Many of these multitasking genomic loci are overlapping in the same direction, suggesting that these nucleotides may encode alternative protein-coding splice variants.  A smaller number of nucleotides, though still an estimated 319 kb, are overlapping in opposite directions, meaning that two independent mature transcripts are encoded on opposite strands of the DNA.  Table 2.1 summarizes how these independent transcripts overlap on the genome.
The 3156 unique RefSeq transcripts involved in sense-antisense pairs in Table 2.1 overlap 2007 Entrez Gene genomic loci.  A functional over-representation analysis using the DAVID bioinformatics resource shows that a significant portion of these Entrez Genes (41%; Benjamini-Hochberg corrected p-value: 6.4E-10) are annotated with the keyword 'alternative splicing' by the UniProtKB/Swiss-Prot curated protein database.  It is possible that some of these sense-antisense pairs may be involved in directing alternative RNA splicing, as in the case of the sense-antisense transcription at the thyroid hormone receptor-α location (THRA and NR1D1 genomic loci discussed above in 1.4.9).  The THRA and NR1D1 genomic loci are included in the list of 2007 Entrez Genes.

Figure 2.1  NCBI RefSeq Transcript Coverage on the Human Genome
A.  A Venn diagram comparing the coverage of protein-coding to non-coding RefSeq transcripts (UCSC Genome Browser: December 2010) on the human genome (build 36.1).  Most of the human genome covered by RefSeq transcripts is protein-coding.  ~0.06% of the genome has the potential to be transcribed as a part of a protein-coding and a non-coding transcript.  B.  A pie chart showing a summary of how the nucleotides in the human genome are used by RefSeq transcripts.  A third of the human genome covered by RefSeq transcripts can be used by multiple transcripts.  These multitasking genomic loci can have overlapping transcripts in one direction (e.g. alternative splice variants) or from opposite directions (i.e. sense-antisense transcription).

Table 2.1  Summary of Sense-Antisense Transcripts in NCBI RefSeq
The columns show the types of RefSeq transcripts (UCSC Genome Browser: December, 2010) that are encoded on opposite strands of DNA (i.e. sense-antisense pair).  The first column gives the types of transcripts involved in the sense-antisense pair.
The second column further subcategorizes the overlapping regions by the components of the transcripts.  The third column gives the unique number of transcripts involved in the sense-antisense pairs.

Types of Transcripts                       | Overlapping Features | Transcripts Involved in Sense-Antisense Pairs
Protein-coding transcripts                 | 5' UTR - 5' UTR      | 343
                                           | 5' UTR - exon        | 250
                                           | 5' UTR - 3' UTR      | 30
                                           | exon - exon          | 111
                                           | exon - 3' UTR        | 861
                                           | 3' UTR - 3' UTR      | 1176
Protein-coding and non-coding transcripts  | 5' UTR - ncRNA       | 307
                                           | exon - ncRNA         | 726
                                           | 3' UTR - ncRNA       | 323
Non-coding transcripts                     | ncRNA - ncRNA        | 123

The analysis of the RefSeq transcript coverage on the human genome was used below to identify transcripts not found in RefSeq (non-RefSeq transcripts) expressed in prostate tissue and prostate cancer cell lines.

2.2.3 Tissue-Specific Transcription in Illumina Body Map 2.0

The 'genomeCoverageBed' tool in BEDTools was used to calculate the amount of coverage on the human genome for each of the tissues sequenced for the Illumina Body Map RNA-seq project (see Table 2.2 for list of tissues).  Tissue-specific transcription was calculated by subtracting the genomic regions expressed in all other tissues from the genomic regions expressed in the tissue of interest.  This approach gives a conservative estimation of transcription: the Illumina Body Map cannot distinguish between strands and this approach cannot distinguish between modular use of alternative exons and UTRs.  This approach does, however, show that a significant portion of the genomic regions expressed in the Illumina Body Map do not overlap with RefSeq transcripts.  Figure 2.2 summarizes the overlap between the Illumina Body Map and RefSeq.  14.94% of the human genome is expressed in the Illumina Body Map in at least one tissue but 87% of those regions do not encode RefSeq transcripts.  8.39% of the human genome is expressed in only one tissue in the Illumina Body Map but 98% of those tissue-specific regions do not encode RefSeq transcripts.
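The coverage and subtraction steps described above can be illustrated with a minimal Python sketch.  This is not the BEDTools implementation used in the thesis; it is a simplified illustration that assumes all intervals lie on a single chromosome, ignores strand, and uses half-open BED-style coordinates.  The function names (`merge`, `subtract`, `covered_bases`, `tissue_specific`) and the toy tissue data are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), as in BED files


def merge(intervals: List[Interval]) -> List[Interval]:
    """Collapse overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def subtract(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the parts of `a` that are not covered by any interval in `b`."""
    b_merged = merge(b)
    result: List[Interval] = []
    for start, end in merge(a):
        cursor = start
        for b_start, b_end in b_merged:
            if b_end <= cursor:
                continue          # this interval of b lies entirely before the cursor
            if b_start >= end:
                break             # no later interval of b can overlap
            if b_start > cursor:
                result.append((cursor, b_start))
            cursor = max(cursor, b_end)
            if cursor >= end:
                break
        if cursor < end:
            result.append((cursor, end))
    return result


def covered_bases(intervals: List[Interval]) -> int:
    """Number of distinct bases covered (each base counted once)."""
    return sum(end - start for start, end in merge(intervals))


def tissue_specific(expressed: Dict[str, List[Interval]], tissue: str) -> List[Interval]:
    """Regions expressed in `tissue` and in none of the other tissues."""
    others = [iv for name, ivs in expressed.items() if name != tissue for iv in ivs]
    return subtract(expressed[tissue], others)


# Toy example (hypothetical coordinates, one chromosome).
expressed = {
    "prostate": [(100, 200), (300, 400)],
    "liver": [(150, 250)],
    "brain": [(390, 500)],
}
prostate_only = tissue_specific(expressed, "prostate")
genome_length = 1000  # hypothetical genome size for the percentage
percent_specific = 100.0 * covered_bases(prostate_only) / genome_length
```

Dividing the covered bases by the genome length gives the kind of percentage reported in Table 2.2; because strand is ignored here, as in the unstranded Body Map data, an antisense transcript in another tissue would mask a sense transcript in the tissue of interest.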
Table 2.2 summarizes the coverage of the Illumina Body Map data on the human genome by each tissue sequenced.

Figure 2.2  Illumina Body Map 2.0 Compared to NCBI RefSeq Transcript Coverage on the Human Genome
A Venn diagram comparing the expressed genomic regions in the Illumina Body Map 2.0 RNA-seq data to the genomic regions encoding reference RNA transcripts in RefSeq (UCSC Genome Browser: December, 2010).  From the Cufflinks analysis of the Illumina data, 14.94% of the genome (build 36.1; hg18) is expressed in at least one tissue and 8.39% of the genome is expressed in only one tissue sequenced in this experiment.  While most of the regions encoding RefSeq transcripts are covered by the Illumina RNA-seq data, less than 2% of the tissue-specific expressed genomic regions are covered by RefSeq transcripts.  Transcript coverage of the human genome by tissue is summarized in Table 2.2.

Table 2.2  Illumina Body Map 2.0 Tissue Coverage on the Human Genome
Each row is the data for one of the 16 tissues in the Illumina Body Map RNA-seq data.  Non-RefSeq coverage is the percentage of the genome (build 36.1; hg18) not encoding processed RefSeq transcripts (UCSC Genome Browser: December, 2010).  Tissue-specific coverage is the percentage of the genome that is expressed in the RNA-seq data for the tissue of interest but not expressed in the other 15 tissues.  The prostate data is further discussed in 2.2.4 (row highlighted in grey).
tissue     | genome coverage (%) | Non-RefSeq coverage (%) | tissue-specific coverage (%) | Non-RefSeq tissue-specific coverage (%)
adipose    | 2.58 | 1.23 | 0.30 | 0.30
adrenal    | 5.16 | 3.72 | 1.47 | 1.47
brain      | 4.40 | 2.90 | 1.19 | 1.16
breast     | 3.48 | 2.06 | 0.59 | 0.58
colon      | 2.39 | 1.05 | 0.22 | 0.22
heart      | 2.96 | 1.58 | 0.52 | 0.51
kidney     | 3.13 | 1.73 | 0.53 | 0.53
liver      | 2.02 | 0.84 | 0.23 | 0.22
lung       | 2.45 | 1.11 | 0.28 | 0.27
lymph      | 2.60 | 1.29 | 0.32 | 0.31
ovary      | 4.05 | 2.53 | 0.71 | 0.70
prostate   | 2.68 | 1.24 | 0.26 | 0.26
skelmusc   | 1.44 | 0.42 | 0.08 | 0.07
testes     | 4.10 | 2.47 | 1.14 | 1.09
thyroid    | 3.34 | 1.86 | 0.48 | 0.47
whiteblood | 1.67 | 0.55 | 0.09 | 0.09

2.2.4 Prostate Expressed Genomic Regions

Three data sources were used to identify prostate expressed genomic regions outside of the genomic regions encoding RefSeq transcripts (UCSC Genome Browser: December, 2010): Illumina Body Map 2.0, GenBank database (UCSC Genome Browser: April, 2009), and RNA-seq data from the LNCaP and C4-2 prostate cancer cell lines.

Illumina Body Map 2.0

Total poly-A selected RNA from the prostate of a 73 year old Caucasian male who died of lung cancer was sequenced using the Illumina HiSeq 2000 system (as described in 2.2.1).  Prostate expressed genomic regions were identified from Cufflinks files (as described in 2.2.3) and the genome coverage is summarized in Table 2.2.  Prostate-specific genomic regions were calculated by subtracting the expressed genomic regions in the 15 other tissues from the genomic regions expressed in the prostate tissue.  The genomic regions were subtracted using the 'subtractBed' tool in BEDTools.  Figure 2.3 A summarizes the overlap between the prostate expressed genomic regions in the Illumina Body Map and RefSeq.  2.68% of the human genome is expressed in the prostate tissue sampled in the Illumina Body Map but only 54% of those genomic regions encode RefSeq transcripts.  0.26% of the human genome, an estimated 8.1 Mb, is expressed in the prostate and not in the other 15 tissues sampled.
Only 1.9% of the prostate-specific regions encode RefSeq transcripts.  The identified prostate-specific regions were annotated using the same annotation method used for microarray probes (details discussed below in 3.3).  Key previously identified prostate-specific transcripts were identified in the prostate-specific expressed genomic regions in the Illumina Body Map data (see examples in Table 2.3).  In addition to full mRNA transcripts, prostate-specific genomic regions were selected that overlapped with portions of the RefSeq transcripts, suggesting tissue-specific alternative exon or UTR usage.  Figure 2.4 shows the HOXB13 transcript as an example of prostate-specific usage of an alternative extended 3' UTR.

Figure 2.3  Venn Diagrams of Prostate Expressed Genomic Regions
A.  A Venn diagram comparing the percent of the human genome expressed in prostate tissue from the Illumina Body Map 2.0 data to the percent of the genome (human genome build 36.1) encoding RefSeq transcripts (UCSC Genome Browser: December, 2010).  B.  A Venn diagram comparing the percent of the human genome expressed in the prostate in the Illumina Body Map, GenBank, and LNCaP and C4-2 RNA-seq data not overlapping processed RefSeq transcripts.

Table 2.3  Examples of Prostate-Specific Genes in Illumina Body Map 2.0
These four NCBI Entrez Genes are expressed in the prostate RNA-seq data from the Illumina Body Map 2.0 and not in the other 15 tissues.  These genes have previously been shown to be prostate-specific.
Entrez GeneID | Gene Symbol | Gene Name                      | Genomic Location | Transcript Type
354           | KLK3/PSA    | kallikrein-related peptidase 3 | 19q13.41         | protein-coding
4824          | NKX3-1      | NK3 homeobox 1                 | 8p21             | protein-coding
50652         | PCA3        | prostate cancer antigen 3      | 9q21-q22         | non-coding
64002         | PCGEM1      | prostate-specific transcript 1 | 2q32             | non-coding

Figure 2.4  UCSC Genome Browser: HOXB13 Prostate Specific 3' UTR
The annotated 3' UTR (short block on the left) of the HOXB13 RefSeq transcript (blue) was found to be prostate-specific in the Illumina Body Map 2.0 (red).  The 3' UTR was covered by RNA-seq data for the LNCaP and C4-2 cell lines.  The GenBank sequence, AY937327 (submitted by Dr. Damu Yang from the Vancouver Prostate Centre), was sequenced from prostate tissue.  All other GenBank sequences have shorter 3' UTRs.

NCBI GenBank Database

Prostate and prostate cancer specific mRNA and EST sequences were identified by using the public access to the UCSC Genome Browser MySQL database.  RNA sequence accession numbers were retrieved using a query for prostate or prostate cancer cell lines (i.e. LNCaP, DU-145, PC3, C4-2) in the gbCdnaInfo table (i.e. library or tissue fields).  Genome alignments for the 1780 mRNA accession numbers and 181748 EST accession numbers were retrieved in BED format from the UCSC Genome Browser.  UTR and exon sequences for RefSeq transcripts were subtracted (using the subtractBed tool from BEDTools) from the prostate RNA sequences to identify prostate expressed genomic regions that do not overlap with RefSeq transcripts.  0.1% of the human genome is covered by these non-RefSeq prostate transcripts (Figure 2.3 B).

Some of these GenBank accession numbers are incorrectly associated with Entrez GeneIDs in the NCBI integrated data resource.  Five examples of these incorrect associations are found in Table 2.4, where the prostate expressed GenBank sequence does not overlap with any of the reference transcripts for the gene.
These incorrect associations were found using the NCBI text file, gene2accession.gz, downloaded from the NCBI ftp server (ftp.ncbi.nih.gov: May, 2011).

Table 2.4  Examples of Prostate GenBank mRNAs Incorrectly Associated with NCBI Entrez Genes
GenBank prostate or prostate cancer cell line accessions (first column) were incorrectly associated with an Entrez Gene ID (second column).  The last column gives the location of the GenBank sequence compared to the associated Entrez Gene.

GenBank Accession | Entrez GeneID | Gene Symbol | Gene Name                  | GenBank location compared to Gene
AK092643          | 1119          | CHKA        | choline kinase alpha       | intron sense
AK096798          | 112724        | RDH13       | retinol dehydrogenase 13   | downstream sense
AK127879          | 8565          | YARS        | tyrosyl-tRNA synthetase    | intron sense
BC019618          | 7289          | TULP3       | tubby like protein 3       | 3' UTR antisense
BC024020          | 81671         | VMP1        | vacuole membrane protein 1 | exon antisense

LNCaP and C4-2 RNA-seq Data

Expressed genomic regions were identified in RNA-seq data from the androgen dependent LNCaP prostate cancer cell line, and the LNCaP castration resistant derivative, C4-2 (RNA-seq data described in 2.2.1).  Genomic regions were selected if the region had a consecutive stretch of 5 reads for 60 nucleotides (the minimum number of nucleotides required to design a custom Agilent microarray probe) in at least one of the cell lines.  These regions were selected by parsing the raw pileup data using a custom Perl script.  UTR and exon sequences for RefSeq transcripts were subtracted (using the subtractBed tool from BEDTools) from the filtered RNA-seq genomic regions to identify the 2.70% of the genome expressed in either the LNCaP or C4-2 cell lines that does not overlap with RefSeq transcripts (Figure 2.3 B).

Analysis of the LNCaP and C4-2 RNA-seq data also revealed consecutive coverage of reads through and beyond annotated 3' UTRs.
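The region-selection rule applied to the pileup data can be sketched as follows.  The custom Perl script itself is not reproduced in this thesis text, so this Python sketch is only an illustration under stated assumptions: the rule is interpreted as a read depth of at least 5 at every position of a run of at least 60 consecutive positions, the input is a position-sorted list of (position, depth) pairs for one chromosome, and positions missing from the input are treated as gaps.  The function name `expressed_regions` and its parameters are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def expressed_regions(
    depths: Iterable[Tuple[int, int]],
    min_depth: int = 5,    # assumed reading of "5 reads"
    min_length: int = 60,  # minimum probe length stated in the text
) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) runs where depth >= min_depth at every
    consecutive position and the run is at least min_length positions long."""
    regions: List[Tuple[int, int]] = []
    run: Optional[List[int]] = None  # [start, end] of the current qualifying run

    for pos, depth in depths:
        if depth >= min_depth and run is not None and pos == run[1] + 1:
            run[1] = pos  # extend the current run by one position
            continue
        # The current run (if any) ends here, through low depth or a gap.
        if run is not None and run[1] - run[0] + 1 >= min_length:
            regions.append((run[0], run[1]))
        run = [pos, pos] if depth >= min_depth else None

    # Close a run that reaches the end of the input.
    if run is not None and run[1] - run[0] + 1 >= min_length:
        regions.append((run[0], run[1]))
    return regions


# Toy pileup (hypothetical): a 70 nt run, a 29 nt run that is too short,
# and a run of exactly 60 nt after a gap in the positions.
toy = (
    [(p, 6) for p in range(1, 71)]
    + [(71, 2)]
    + [(p, 9) for p in range(72, 101)]
    + [(p, 5) for p in range(200, 260)]
)
selected = expressed_regions(toy)
```

Applying the rule separately to each cell line and taking the union of the resulting regions would correspond to the "in at least one of the cell lines" condition.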
In this RNA-seq data, it is not possible to determine the original direction of transcription; however, the 3108 Entrez Genes with this pattern of consecutive sequencing beyond the annotated 3' UTR may be using longer alternative 3' UTRs. In some cases, the tissue-specific 3' UTR may be used as the annotated 3' UTR in RefSeq, as for HOXB13 (Figure 2.4). The annotated 3' UTR of a RefSeq transcript usually represents the longest 3' UTR found in GenBank; however, as in the HOXB13 case, many of these 3' UTRs may be tissue-specific. Although the coding sequence for HOXB13 does not change, the extended 3' UTR may alter translational efficiency and transcript localization. The HOXB13 protein is of particular interest in prostate cancer research as it is a transcription factor that is important for prostate gland development. HOXB13 can interact with AR to enhance the transcription of a subset of androgen-regulated transcripts (Norris et al. 2009).

2.3 Discussion

2.3.1 RefSeq Transcripts

RefSeq transcripts cover 2.18% of the human genome, with 93% of those regions covered by processed protein-coding transcripts. This focus on protein-coding transcripts may reflect the current protein-coding focus in scientific research and not the true ratio of non-coding to protein-coding transcripts in the human transcriptome.

Approximately a third of the genomic regions covered by RefSeq transcripts are present in multiple transcripts. These overlapping transcripts can be generated by alternative splicing of exons, intron retention, alternative promoter usage, or alternative use of polyadenylation sites. The genomic regions covered by RefSeq transcripts can also be used by both protein-coding and non-coding transcripts. Some of these genomic regions may encode both protein-coding and non-coding splice variants, like the lncRNA, SRA, and its antagonistic protein-coding isoform, SRAP (discussed in 1.4.9).
RefSeq transcripts can also be transcribed from both strands of DNA. It is important to note that many PCR-based sequencing methods do not retain the original direction of transcription, thereby potentially underestimating the extent of sense-antisense transcript pairs in the RefSeq database. In this analysis, we have identified 3156 unique RefSeq transcripts involved in sense-antisense overlapping pairs. To accurately detect and measure these sense-antisense transcripts, strand-specific sequencing protocols should be adopted.

2.3.2 Incorrect Association of mRNA Sequences with NCBI Entrez Genes

The current standards for grouping RNA sequences into genes may oversimplify actual genomic complexity. Within the Entrez Gene database, primary GenBank and RefSeq sequences are grouped by gene and integrated with additional gene-specific information (discussed in 1.4.3). Entrez Gene is used as an internet portal for gene information and as a primary resource for diverse informatics tools. Secondary sources including Stanford SOURCE, Oncomine, Ingenuity Pathway Analysis, and Agilent GeneSpring use Entrez GeneIDs to link functional information or to integrate across diverse platforms. Incorrect associations in Entrez Gene between RNA sequences (i.e. GenBank accessions) and protein-coding Entrez Genes may attribute lncRNAs or novel protein-coding genes to known protein-coding genes. These GenBank sequences may not overlap, in the correct orientation, any portion of the associated protein-coding RefSeq transcripts (examples shown above in Table 2.4). Relying on these associations to provide annotation for microarray or RNA-seq analysis may prevent novel biological discovery by overlooking lncRNAs or novel proteins (transcribed in introns, antisense, upstream or downstream of known transcripts), or may invalidate downstream functional analysis where RNA expression levels are used to infer protein levels.
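The orientation-aware overlap check described in 2.3.2 can be illustrated with a short Python sketch. This is not the script used in this work; the `Interval` type, the function names, and the coordinates in the usage note are hypothetical and serve only to show the logic: an association is supported only if an aligned block of the GenBank sequence overlaps a RefSeq exon or UTR block of the gene on the same strand.

```python
from typing import Iterable, NamedTuple


class Interval(NamedTuple):
    """A genomic block using BED-style coordinates (0-based, end exclusive)."""
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two blocks share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def supports_association(accession_blocks: Iterable[Interval],
                         refseq_blocks: Iterable[Interval]) -> bool:
    """True if any aligned block of the accession overlaps a RefSeq exon/UTR
    block of the associated gene in the same orientation."""
    refseq_blocks = list(refseq_blocks)
    return any(
        overlaps(acc, ref) and acc.strand == ref.strand
        for acc in accession_blocks
        for ref in refseq_blocks
    )
```

For example, with made-up coordinates, an accession aligned at `Interval("chr1", 150, 250, "-")` against a gene whose only exon is `Interval("chr1", 100, 300, "+")` would not support the association (exon antisense), whereas the same block on the "+" strand would.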
2.3.3 Non-RefSeq Transcripts

The analytical approach to identify non-RefSeq transcripts by subtracting genomic regions covered by RefSeq transcripts is too simplistic to capture the true complexity of the transcriptome. As described above, a genomic locus can encode multiple transcripts with vastly different biological functions. Analysis of RNA-seq data, exemplified by the Illumina Body Map, is revealing expression from genomic regions devoid of associated RefSeq transcripts. The analysis of the Illumina Body Map in this chapter has given an estimate of the human transcriptome, with almost 15% of the genome transcribed in at least one of the 15 tissues sampled. A percentage of this estimate may be due to artefacts in cDNA preparation (e.g. non-polyadenylated RNA priming from poly-A sequences encoded in the DNA) or to incorrect alignment of sequencing reads to the genome due to duplicate genomic regions or repeat regions.

Arguments can, however, be made that this estimate is extremely conservative. The 15 tissues sampled do not reflect the diversity of tissues and cell types within the human body. The transcript profile for each tissue sampled was a snapshot of the tissue in a particular state; an environmental stimulus or disease would alter the overall transcript profile of each tissue. The sequence of the DNA or genome remains consistent for each healthy cell in the human body; the expression of those DNA sequences is, however, not similarly constrained. Detection of prostate-specific transcripts, including KLK3/PSA and NKX3-1, in the prostate-specific genomic regions in the Illumina Body Map validates this analytical approach to identify tissue-specific transcripts. 98% of the tissue-specific genomic regions identified in the Illumina Body Map do not overlap with RefSeq transcripts. Reference transcripts may be biased toward transcripts expressed in commonly studied cell lines (e.g.
GM12878, K562, HepG2, HeLa-S3, HUVEC, keratinocytes, and H1 human embryonic stem cells used for the ENCODE project; http://www.genome.gov/10005107: July 2011). The potential for tissue- and condition-specific transcription highlights the importance of collecting appropriate metadata describing sample origins and conditions for sequencing projects and during sequence submission to repositories such as GenBank.

2.3.4 Prostate Expressed Genomic Regions

Analysis of data from the Illumina Body Map, GenBank, and RNA-seq data from LNCaP and C4-2 prostate cancer cell lines identified prostate expressed genomic regions outside of the genomic regions encoding RefSeq transcripts. An estimated 14.8 Mb (0.48% of the genome) was expressed in at least 2 of the 3 data sources. These genomic regions may contain novel proteins or regulatory lncRNAs important for understanding and potentially preventing or treating treatment resistance in prostate cancer. The prostate expressed genomic regions identified in this chapter were used to design microarray probes for a custom microarray discussed in Chapter 3. The custom microarray was later used in profiling experiments discussed in Chapter 4.

Chapter 3 Building a Custom Agilent Prostate Microarray

3.1 Introduction

The discovery of novel protein-coding RNAs and ncRNAs in large scale genome wide RNA profiling projects such as ENCODE (Birney et al. 2007) motivated the design of a custom microarray to profile prostate-expressed transcripts that may have been absent on standard gene expression microarrays. The custom microarray incorporated probes from a standard gene expression microarray (Agilent 44K) with probes to profile ncRNAs and alternative proteins. The goal was to design a cost-effective method to profile prostate-expressed transcripts under a large number of conditions and treatments in order to prioritize transcripts for further functional characterization.
The methods used to design and annotate the prostate focused microarray are described in this chapter.

3.2 Microarray Probe Design

3.2.1 Previously Designed Probes

30,755 unique probes from the Agilent 44K standard human gene expression probe set (design number 014850) were included in the custom microarray. 74% of these standard probes are designed to detect expression levels of features (primarily 3' UTRs and exons) within RefSeq transcripts. The remaining Agilent probes have been validated as functionally responsive probes and were most likely designed from sequences present in the Incyte Sequence Database (Agilent eArray FAQ: May, 2011).

In addition to the Agilent 44K set, 4,416 probes were included from the Agilent 244K aCGH probe set (design number 014693). These aCGH probes were included because they were differentially expressed (fold change of 1.5 and Benjamini-Hochberg corrected p-value < 0.1) in an experiment designed to detect the expression of novel RNAs. The experiment profiled RNA isolated from LNCaP cells treated for 48 h with a synthetic androgen (R1881) compared to an ethanol vehicle control on aCGH microarrays. aCGH microarrays are normally used to detect anomalies at the DNA level but were used in this experiment as an unbiased method to detect novel transcription across the genome. 79% of these aCGH Agilent probes do not overlap RefSeq transcripts. 13% of the aCGH probes included were less than 60 nt long; the length of these shorter probes was adjusted to optimize hybridization specificity (i.e. melting temperature and GC content). All other probes on the custom microarray, including the novel probes (discussed below in 3.2.2), were 60 nt long.

16,281 probes, designed to detect lncRNAs, were provided under a research collaboration agreement by Drs. John Mattick and Marcel Dinger from the University of Queensland (Brisbane, Australia).
The lncRNAs were identified from the literature, the Human Full-length cDNA Annotation Invitational project, and human cDNA and EST databases. 91% of these non-coding probes do not overlap RefSeq transcripts.

3.2.2 Microarray Design Method and Inclusion Criteria

Novel probes were designed by submitting sequences from the transcripts or genomic regions of interest to Agilent's online probe design portal, eArray. The eArray online tool designs microarray expression probes with a GC content of ~40%, a melting temperature (Tm) of ~80°C, and no base in excess of 60% (Base composition methodology: http://earray.chem.agilent.com: May, 2011). Quality microarray probes could not be designed for all sequences submitted to eArray due to probe composition constraints.

Novel probes were compared to RefSeq transcripts using BEDTools version 2.9 (Quinlan and Hall 2010) and the UCSC Genome Browser RefSeq track (downloaded April 2009). Previously designed probes (discussed in 3.2.1) and novel probes designed through eArray (discussed in 3.2.3) were aligned to the human genome (version hg18) using a local installation of the BLAT alignment tool from the UCSC Genome Browser (version 34: downloaded from http://www.soe.ucsc.edu/~kent). A probe was included only if it aligned to a single location in the human genome with a sequence identity greater than 95%.

3.2.3 Novel Probes

Prostate Expressed Transcripts

20,321 probes were designed to detect the prostate expressed genomic regions identified in GenBank and EST databases, and in the RNA-seq data from LNCaP and C4-2 prostate cancer cell lines (discussed in 2.2.4). Non-RefSeq genomic regions were identified by subtracting genomic regions covered by RefSeq transcripts from the identified prostate expressed genomic regions using the 'subtractBed' tool in BEDTools. The prostate expressed sequences were then submitted to eArray (April, 2009) for probe design.
Those probes that met the inclusion criteria (discussed in 3.2.2) were included in the custom microarray. For RNA-seq regions, probes with an average of 10 sequence reads or more in the pileup files for either the LNCaP or C4-2 RNA-seq data were included. The pileup file format represents RNA-seq data as the number of sequence reads for each bp of a reference genome. If the direction of the transcripts could not be determined, probes were included to detect transcripts from either strand.

6,605 probes were designed (eArray: December, 2010) to detect the prostate-specific regions identified in the Illumina Human Body Map 2.0 (discussed in 2.2.4) with the following criteria: longer than 60 nt and with expression greater than 1 FPKM (fragments per kilobase of transcript per million fragments mapped). These probes were only included in version 3 of the custom microarray (versions of the microarray discussed below in 3.4). 93% of these probes do not overlap RefSeq transcripts. If the original direction of transcription could not be determined, probes were included to detect both possible directions.

RefSeq Transcripts

96,461 probes were designed to detect features of both protein-coding and non-coding RefSeq transcripts not covered by the standard Agilent 44K probe set (see Table 3.1 for distribution of probes). Whenever possible, probes were designed to detect alternative splicing and alternative 3' UTR usage found in the UCSC Genome Browser RefSeq or GenBank mRNA tracks (downloaded April 2009) (as discussed in 1.4.4).

Antisense Transcripts

5,809 probes were designed to detect antisense transcription where alignment of sequences in the UCSC RefSeq, GenBank, or EST tracks (downloaded April 2009) suggested that transcription may occur from both directions. Efforts were made to pair antisense probes with sense probes designed to detect features of RefSeq transcripts.
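The two expression thresholds described above for the prostate expressed probes (an average pileup depth of 10 reads or more in either cell line, and an expression greater than 1 FPKM for regions longer than 60 nt) can be sketched in Python as follows. The function names and the example numbers are illustrative assumptions, not the scripts or data used in this work; the FPKM function simply applies the standard definition (fragments per kilobase of transcript per million fragments mapped).

```python
def fpkm(fragments, region_length_bp, total_mapped_fragments):
    """Fragments per kilobase of region per million fragments mapped."""
    return fragments * 1e9 / (region_length_bp * total_mapped_fragments)


def mean_depth(per_base_depths):
    """Average number of reads per base over a probe, from pileup depths."""
    return sum(per_base_depths) / len(per_base_depths)


def passes_rnaseq_filter(depths_by_cell_line, min_mean_depth=10):
    """Keep a probe if its average depth reaches the threshold in either cell line."""
    return any(mean_depth(d) >= min_mean_depth for d in depths_by_cell_line.values())


def passes_body_map_filter(region_length_bp, region_fpkm):
    """Keep a Body Map region if it is longer than 60 nt and above 1 FPKM."""
    return region_length_bp > 60 and region_fpkm > 1
```

As a worked example with made-up numbers, a 1,000 bp region with 50 fragments in a library of 10 million mapped fragments gives 50 x 10^9 / (1,000 x 10^7) = 5 FPKM, which would pass the Body Map filter.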
Androgen Receptor DNA Binding Sites

In the previous experiment using the 244K aCGH Agilent microarray to detect novel transcription, differential expression was detected upstream of PSA in the enhancer region, a well characterized AR DNA binding site (Huang et al. 1999). 2,118 probes were designed to test for potential transcription at other AR binding sites identified in a published AR ChIP-on-chip experiment (Massie et al. 2007) and in an AR ChIP-seq experiment (discussed below in 4.2). Probes were included if evidence of transcription was found in the UCSC RefSeq, GenBank, or EST tracks (downloaded April 2009) or in the LNCaP and C4-2 RNA-seq data.

3.3 Microarray Probe Classification and Annotation

Genomic coordinates of RefSeq transcripts were downloaded from the refFlat table in the UCSC Genome Browser (downloaded Dec 2010; version hg18). The refFlat table was parsed using a Perl script to generate a BED file (hereafter called the refBlocks file) containing the genomic coordinates for the separate features of the RefSeq transcripts: promoters (5 kb upstream), 5' UTRs, exons, introns, 3' UTRs, and 3' extended regions (5 kb downstream). Genomic coordinates (version hg18) of the previously designed and novel microarray probes were found using a local installation of the BLAT alignment tool from the UCSC Genome Browser. BLAT results were parsed into the 12 column BED format. The refBlocks BED file was compared to the microarray probe BED file using the 'intersectBed' tool in BEDTools. The results of the intersection were parsed using Perl and a MySQL database. Microarray probes for the custom microarray are classified based on their genomic location compared to RefSeq transcripts. Figure 3.1 gives a diagram describing the microarray probe classification.
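The construction of the refBlocks features can be illustrated with a minimal Python sketch. It is not the Perl script used in this work. It assumes refFlat-style 0-based, end-exclusive coordinates for a single transcript, treats a record with equal coding start and end as non-coding (an assumption about the table convention), and ignores the probe strand, so it does not distinguish sense from antisense classes. The function names `ref_blocks` and `overlapping_features` are hypothetical.

```python
FLANK = 5000  # 5 kb promoter and 3' extended regions, as in the text


def ref_blocks(tx_start, tx_end, cds_start, cds_end, exon_starts, exon_ends, strand):
    """Split one transcript into (start, end, feature) blocks."""
    plus = strand == "+"
    noncoding = cds_start == cds_end  # assumed convention for non-coding records
    blocks = [
        (max(0, tx_start - FLANK), tx_start, "promoter" if plus else "3'extended"),
        (tx_end, tx_end + FLANK, "3'extended" if plus else "promoter"),
    ]
    exons = sorted(zip(exon_starts, exon_ends))
    for i, (start, end) in enumerate(exons):
        if noncoding:
            blocks.append((start, end, "exon"))
        else:
            if start < cds_start:  # exon portion before the coding region
                blocks.append((start, min(end, cds_start), "5'UTR" if plus else "3'UTR"))
            if max(start, cds_start) < min(end, cds_end):  # coding portion
                blocks.append((max(start, cds_start), min(end, cds_end), "exon"))
            if end > cds_end:  # exon portion after the coding region
                blocks.append((max(start, cds_end), end, "3'UTR" if plus else "5'UTR"))
        if i + 1 < len(exons):  # gap to the next exon
            blocks.append((end, exons[i + 1][0], "intron"))
    return sorted(blocks)


def overlapping_features(probe_start, probe_end, blocks):
    """Names of all features a probe overlaps, or 'intergenic' if none."""
    hits = {name for start, end, name in blocks if probe_start < end and start < probe_end}
    return hits or {"intergenic"}
```

A probe returning more than one feature name would correspond to the 'multiple' classes in Table 3.1; a full implementation would also compare the probe strand to the transcript strand and consider all transcripts of a gene.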
Table 3.1 gives a summary of the probe classifications for the novel probes (designed by eArray) compared to the probe classifications for the probes included in the custom microarray from the Agilent 44K probe set.

Figure 3.1  Microarray Probe Classification
Microarray probes for the Agilent custom microarray were classified based on their genomic location compared to RefSeq transcripts. In this diagram, the RefSeq transcript (dark blue) is transcribed from left (5' end) to right (3' end). RefSeq probes (red) align with the 5' UTR (short block on left), exon (tall block in the middle), and the 3' UTR (short block on right). Probes targeting non-RefSeq transcripts (light blue) are classified based on their location relative to RefSeq transcripts. Non-RefSeq probes can align with the promoter region (5 kb upstream), an intron (sense or antisense), antisense to exons or UTRs, or the 3' extended region (5 kb downstream) of RefSeq transcripts. Intergenic probes align to the genome more than 5 kb from the closest RefSeq transcript.

3.4 Microarray Design Versions

Three versions of a custom microarray were generated. The version 1 design used the Agilent SurePrint HD 2x105K format. The version 2 and 3 designs used the Agilent SurePrint G3 4x180K format. The SurePrint G3 4x180K format allowed for the inclusion of more probes and was less expensive than the 2x105K format using the older SurePrint HD technology. The version 2 and 3 designs both contain the 4,848 standard controls for the 4x180K format as well as 750 replicated controls (i.e. 50 controls printed 15 times). The main difference between the version 2 and 3 designs was the inclusion in version 3 of the 6,605 probes designed to target the prostate-specific genomic regions identified in the Illumina Body Map 2.0. 7,201 version 2 probes were retired in order to include the new probes in version 3. Table 3.1 gives a summary of the probe classifications for version 2 and version 3.
Analysis of data generated from the version 2 and 3 designs is presented in later chapters.

Table 3.1  Probe Classifications for Novel Probes Compared to Agilent 44K Probes
The table shows the number of novel microarray probes on the 4x180K Agilent custom (version 2 and 3) microarray compared to the probes included from the Agilent 44K (version 1) probe set. The probes were classified as defined in Figure 3.1. 'multiple RefSeq' probes can detect a RefSeq transcript but align to multiple features of a transcript or transcripts associated with the same Entrez Gene ID.

Type of Transcript | Probe Classification | Version 2 Novel Probes | Version 2 Agilent 44K Probes | Version 3 Novel Probes | Version 3 Agilent 44K Probes
RefSeq | coding exon | 78075 | 8814 | 78391 | 8809
RefSeq | non-coding exon | 1936 | 854 | 1917 | 853
RefSeq | 5'UTR | 381 | 65 | 447 | 65
RefSeq | 3'UTR | 16103 | 11361 | 16372 | 11354
RefSeq | multiple RefSeq | 1344 | 1560 | 1409 | 1559
Non-RefSeq | exon antisense | 4554 | 77 | 2151 | 77
Non-RefSeq | 5'UTR antisense | 110 | 12 | 29 | 12
Non-RefSeq | 3'UTR antisense | 3360 | 167 | 2940 | 167
Non-RefSeq | intron sense | 4809 | 1845 | 6599 | 1844
Non-RefSeq | intron antisense | 1575 | 775 | 3751 | 775
Non-RefSeq | promoter sense | 404 | 111 | 474 | 112
Non-RefSeq | promoter antisense | 277 | 455 | 333 | 455
Non-RefSeq | 3'extended sense | 1409 | 849 | 855 | 848
Non-RefSeq | 3'extended antisense | 308 | 175 | 388 | 174
Non-RefSeq | multiple non-RefSeq | 8951 | 1648 | 7028 | 1650
Non-RefSeq | intergenic | 539 | 1987 | 1357 | 1977
total | | 124135 | 30755 | 124441 | 30731

3.5 Microarray Probe Performance

Microarray probe performance for the version 2 design was assessed using an in vitro experiment with 21 conditions (repeated in triplicate) studying the RNA profiles of LNCaP cells following treatments with combinations of steroids and steroid receptor inhibitors. The analysis of some of the data generated in this in vitro experiment is discussed in later chapters. The eArray tool provides a base composition score (proprietary algorithm) ranging from 1 to 4 based on how well the designed probes meet the optimal base composition parameters (described above in 3.2.2).
Probes with a base composition score of 1_BC will form a more stable and specific duplex with their intended targets than probes with a score of 4_BC. The eArray base composition scores are used below to assess probe performance. Table 3.2 summarizes the base composition scores for the novel probes and the Agilent 44K probes included in the version 2 and 3 designs.

Table 3.2  Microarray Probes by eArray Base Composition Score
The eArray base composition score (column 1) is based on the optimal probe design characteristics (1_BC: best probe; 4_BC: worst probe). The number (i.e. frequency) and percentage of novel probes included in all versions of the 4x180K custom microarray (columns 2 and 3) are compared to those of the probes included from the Agilent 44K (version 1) probe set (columns 4 and 5).

eArray Base Composition Score | Novel Probes Designed by eArray: Frequency | Novel Probes Designed by eArray: Percent | Agilent 44K Probes: Frequency | Agilent 44K Probes: Percent
1_BC | 103239 | 83.2 | 22128 | 71.9
2_BC | 13931 | 11.2 | 7072 | 23.0
3_BC | 3944 | 3.2 | 919 | 3.0
4_BC | 3021 | 2.4 | 636 | 2.1
Total | 124135 | 100 | 30755 | 100

Figure 3.2 summarizes the distribution of average probe intensities and log ratios, by eArray base composition score, for the 63 microarrays used in the LNCaP in vitro experiment. The median values of the average probe intensities and log ratios increased as the base composition score went from 1_BC (best score) to 4_BC (worst score). The increased signal intensities observed for the probes with a base composition score of 3_BC or 4_BC (5.6% of probes designed for version 2 and 3; 5% of the Agilent 44K probes included) may be due to suboptimal hybridization and not due to biological changes in signal.

Figure 3.2  Novel Probe Intensities and Ratios by eArray Base Composition Score
Data summarized in these boxplots is from an LNCaP in vitro experiment comparing 20 combinations of steroids and steroid receptor inhibitors to an ethanol only vehicle control (63 version 2 microarrays).
The top row shows the distribution of the average probe intensities by base composition score (A: RefSeq probes; B: non-RefSeq probes). The bottom row shows the distribution of the average log ratios by base composition score (C: RefSeq probes; D: non-RefSeq probes). A box represents the interquartile range (bottom: first quartile; top: third quartile) while the line through the middle represents the median or second quartile. The lines extending from the box, or whiskers, represent the data values that fall within 1.5 times the interquartile range. Outliers, data values that fall outside of the whiskers in a boxplot, are plotted individually.

The number of novel probes and Agilent 44K probes expressed in at least one condition in the LNCaP in vitro experiment is summarized in Table 3.3. The expressed column indicates the probe passed the 'isWellAboveBG' Agilent Feature Extraction Software (version 10.5) flag for all three replicates in at least one of the 21 conditions tested. The 'isWellAboveBG' flag is positive if the probe signal passed the 'isPosAndSignif' flag (i.e. the mean signal is significantly greater than an additive error model) and if the background-subtracted signal is significantly greater than the noise of its background (Agilent Feature Extraction Software version 10.5: Reference Guide).

Table 3.3  Novel and Agilent Probes Expressed in the LNCaP in vitro Experiment
A probe was defined as expressed if it passed the 'isPosAndSignif' flag from the Agilent Feature Extraction Software for all triplicates in at least one of the 21 conditions tested in the LNCaP in vitro experiment (63 microarrays; version 2).
Probe Source | Probe Type | Percent Expressed | Expressed | Not Expressed | Total
Agilent 44K Probes | RefSeq | 83.5 | 17969 | 3549 | 21518
Agilent 44K Probes | Non-RefSeq | 83.5 | 7711 | 1526 | 9237
Novel Probes Designed by eArray | RefSeq | 80.2 | 78446 | 19393 | 97839
Novel Probes Designed by eArray | Non-RefSeq | 79.8 | 20990 | 5306 | 26296

Probe reproducibility for the biological triplicates in the LNCaP in vitro experiment is shown in Figure 3.3. Each of the 21 samples in the experiment has a median coefficient of variation (i.e. standard deviation of probe intensity / mean probe intensity) below 0.16 (i.e. % CV below 16%). The biological replicates were performed at the same time; however, the coefficient of variation measured will include variation due to treatment, RNA isolation and labelling, and microarray processing.

Figure 3.3  Probe Reproducibility in the LNCaP in vitro Experiment
A boxplot showing the distribution of the coefficient of variation (CV) (i.e. standard deviation of probe intensity / mean probe intensity) for the biological triplicates for each of the 21 samples included in the LNCaP in vitro experiment. The median CV for all samples is below 0.16.

3.6 Discussion

A custom Agilent microarray was designed and annotated to detect transcripts in regions outside of the 3' biased probes in the standard Agilent 44K gene expression probe set. The custom microarray combined previously designed probes, including Agilent human gene expression probes and non-coding probes provided by Drs. John Mattick and Marcel Dinger, with novel probes designed to detect alternative splicing and 3' UTR usage in RefSeq transcripts and novel probes designed to detect the prostate expressed genomic regions discussed in Chapter 2 (Objective 1). All probes on the microarray were then classified based on their genomic location compared to the transcripts in the RefSeq database. The probe performance was then assessed based on the base composition.
The Agilent eArray portal provides a simple method to design novel probes with algorithms to ensure optimal hybridization conditions. Probe intensity and log ratios can be dependent on probe composition. The probes designed through the eArray portal with a base composition score of 1_BC or 2_BC were shown to perform better than probes with a base composition score of 3_BC or 4_BC. Probes on the custom microarray with a base composition score of 3_BC or 4_BC were flagged in the following analyses. Normalization methods that account for probe GC content may reduce the base composition effect on probe intensities and should be considered in the future.

The custom Agilent microarray discussed in this chapter was used to profile genomic regions and transcripts identified in RNA-seq data and present in primary RNA databases. Analysis of data generated using the version 2 and 3 custom microarrays is discussed in Chapter 4 and Chapter 5.

Chapter 4 Identification of Androgen Regulated Transcripts and Genomic Loci

4.1 Introduction

The custom 4x180K Agilent microarray described in Chapter 3 was used to detect androgen and anti-androgen regulated transcripts in an in vitro setting and transcripts regulated by castration and during progression to CRPC in an in vivo model. The data from the microarray profiling experiments were integrated with AR binding sites detected by ChIP-seq and androgen regulated small RNA detected by next-generation sequencing. Each of the four high throughput experiments is described below as a separate experiment and as a part of an integrative analysis. The LNCaP prostate cancer cell line was used for all four experiments. LNCaP cells are androgen dependent human prostate adenocarcinoma cells derived from a lymph node metastasis of a 50-year-old Caucasian male (Horoszewicz et al. 1983).
The LNCaP cell line is the most widely used androgen dependent prostate cancer cell line in prostate cancer research, with over five thousand references to LNCaP indexed in PubMed.

The in vitro microarray experiment examined the transcripts regulated by the synthetic androgen, R1881, and the physiological androgen, DHT. The synthetic androgen, R1881, is used extensively in prostate cancer research to study the role of androgens and AR in prostate cancer cell lines. R1881 has a number of advantages over DHT in a laboratory setting: R1881 has a higher affinity for AR and is not metabolized within the cell. The in vitro microarray experiment also examined the transcripts regulated by the clinically used anti-androgen (i.e. antagonist of AR), bicalutamide, and the next-generation anti-androgen, MDV3100, currently being evaluated in Phase III trials. Initial experiments comparing bicalutamide and MDV3100 showed that MDV3100 had a higher affinity for AR by competition assays with labelled DHT and was better able to impair AR nuclear localization (Tran et al. 2009).

In addition to androgens and anti-androgens, the effects of progesterone and a non-specific steroid receptor inhibitor, RU-486, on the transcriptome were examined. Progesterone levels increase following castration in an LNCaP xenograft tumor progression model (Locke et al. 2008), the same in vivo model used for the in vivo microarray experiment presented in this thesis (described below in 4.5). Progesterone can be metabolized to DHT within the cells and may be a source of androgen following ADT. RU-486 (also known as mifepristone) is a synthetic steroid compound that functions as a broadly acting steroid receptor inhibitor. RU-486 binds to the LBD of steroid receptors and inhibits the action of the progesterone, glucocorticoid, and androgen receptors. To date, there are no commercially available specific inhibitors of the progesterone receptor.
The expression of the androgen and anti-androgen regulated transcripts identified in vitro (Objective 2) was examined in the in vivo model (Objective 3) in order to classify and prioritize transcripts for further functional characterization (Objective 4).

4.2 AR Binding Sites Detected by ChIP-seq

4.2.1 Methods

LNCaP cells were grown in charcoal stripped fetal bovine serum (CSS) for 48 h and then treated with 1 nM R1881, a synthetic androgen, for 24 h. Genomic DNA was isolated from an AR ChIP experiment using the N-20 antibody against AR (sc-186; Santa Cruz Biotechnology) and from an input control (i.e. DNA isolated from the R1881 treated sample prior to ChIP). Library construction and sequencing of the R1881 treated AR ChIP sample, an ethanol vehicle control AR ChIP sample, and the input DNA control were performed on the Illumina Genome Analyzer II at the BC Genome Sciences Centre (Vancouver, BC) using their standard single-end sequencing protocol as described (Robertson et al. 2007). The sequencing reads (average read length of 36 bases) were aligned to the human genome (hg18) using the Eland software, an alignment program from the Illumina Genome Analyzer II platform. 5,157,750 reads were aligned for the R1881 treated AR ChIP sample. 10,032,993 reads were aligned for the input DNA control sample. Enriched genomic regions for AR binding (hereafter referred to as AR peaks) were detected using the MACS peak detection software (version 1.3.7.1) (Zhang et al. 2008) by comparing the R1881 treated AR ChIP sample to the input DNA control sample (p-value<10e-5). Motif analysis was performed on the detected AR peaks using MEME-ChIP, an online suite of motif-based sequence analysis tools (http://meme.nbcr.net/: version 4.6.1). The default MEME-ChIP settings were used with the exception of permitting repetitions of a single motif within an input sequence.
RefSeq transcriptional start sites (TSS) were parsed from the refFlat table in the UCSC Genome Browser (downloaded Dec 2010; version hg18). The distance from a detected AR peak to the closest RefSeq TSS was calculated using the 'closestBed' tool in BEDTools (Quinlan and Hall 2010).

4.2.2 Results

AR ChIP-seq Peaks Detected

6,630 peaks were detected in the AR ChIP-seq experiment in LNCaP cells 24 h after treatment with 1 nM R1881 compared to the input DNA control (MACS peak detection software; p-value<10e-5). The input DNA control should account for non-random artifacts introduced by DNA shearing, library preparation, and sequencing. The input DNA control should also account for DNA copy number differences in the LNCaP genome. An additional AR ChIP sample was prepared for an ethanol vehicle control; however, DNA yield was low and sequence coverage across the genome was not sufficient to use as a baseline. This is expected as AR is reported to translocate to the nucleus and bind to DNA only in the presence of androgens (discussed in 1.2.2). The 6,630 peaks detected by comparing the R1881 treated AR ChIP sample to the input DNA control were used to generate the analysis presented below.

AR ChIP-seq Peaks Compared to Other AR ChIP Experiments

The AR peaks were compared to sites in a recently published AR ChIP-seq experiment (Yu et al. 2010) and an AR ChIP-chip experiment (Wang et al. 2009). Yu et al. reported 37,193 peaks in their AR ChIP-seq experiments using a different AR antibody (no. 06-680 from Millipore) and a different time point (16 h after treatment with 1 nM R1881). HPeak peak detection software was used in the Yu et al. publication; however, a comparison of the AR peaks to an input DNA control was not reported. 12,269 genomic regions enriched for AR binding were downloaded from the associated website for the Wang et al. publication. The Wang et al.
AR ChIP-chip experiments used the same N-20 AR antibody (Santa Cruz) but a different time point and steroid treatment (4 h after treatment with 100 nM DHT). The AR ChIP-chip experiment was performed on the Affymetrix Human Tiling 2.0R Microarray Set. The genomic regions detected as enriched for AR binding in the three experiments were compared using the 'intersectBed' tool in BEDTools. The overlapping regions were summarized in a Venn diagram (Figure 4.1). 86% of the AR peaks detected (5,521 peaks) were also detected in either the Wang et al. or the Yu et al. AR ChIP experiment. 47% of the AR peaks (3,135 peaks) were common to all three AR ChIP experiments. Three well studied AR binding sites (discussed in 1.2.3) were found in all three AR ChIP experiments: the KLK3/PSA distal enhancer region (3.6 kb upstream of TSS), the FKBP5 downstream intronic region (96 kb downstream of TSS), and the TMPRSS2 enhancer region (13.3 kb upstream of TSS). AR binding of the proximal promoter region of KLK3/PSA was found in the AR ChIP-seq experiment and in the Yu et al. AR ChIP-seq experiment but not in the Wang et al. AR ChIP-chip experiment.

Figure 4.1  AR ChIP-seq Peaks Compared to Other Published AR ChIP Experiments
The peaks detected in the AR ChIP-seq experiment (VPC; red) were compared with other published AR ChIP experiments on LNCaP cells. 5,474 peaks were detected in both the VPC and the other AR ChIP-seq experiment [(Yu et al. 2010); blue]. 3,182 peaks were detected in both the VPC and the AR ChIP-chip experiment [(Wang et al. 2009); grey]. 3,135 peaks were common to all three experiments.

AR ChIP-seq Peaks Compared to RefSeq Transcript Features

The peaks found in the AR ChIP-seq experiment were annotated in relation to their location to RefSeq transcripts using the same annotation method described for the microarray probes (discussed above in 3.3 and shown in Figure 3.1). The annotation is summarized in the pie chart in Figure 4.2.
Most of the AR peaks were greater than 5 kb from the closest RefSeq transcript (45%) or in intronic regions (39%). Only 5% of the AR peaks were in the promoter regions (within 5 kb upstream) of RefSeq transcripts. The AR peak annotation was similar to the annotation reported (in Figure S2) for the Yu et al. ChIP-seq experiment. Yu et al. reported 45% of the AR peaks overlapping introns, 39% overlapping enhancer regions (greater than 5 kb), 11% overlapping promoters (5 kb upstream), and 5% overlapping exons.

Figure 4.2  AR ChIP-seq Peaks Compared to RefSeq Transcript Features
The AR ChIP-seq peaks were compared to RefSeq transcript features. 45% of the peaks (2,983) were greater than 5 kb from the closest transcript. 39% of the peaks (2,608) were detected in introns. 5% of the peaks (340) were within 5 kb upstream in the promoter region. 3% of the peaks (187) were within 5 kb downstream in the 3' extended region. 1% of the peaks (59) were detected in either exons or 3' UTRs. 6% of the peaks (450) overlapped multiple features of a transcript or multiple transcripts belonging to different Entrez Genes. Of these peaks overlapping multiple features, 65 overlapped a portion of a 5' UTR.

The minimum distance between an AR ChIP-seq peak and the closest RefSeq TSS was calculated (Figure 4.3). The distances ranged from 1.86 Mb upstream to 2.16 Mb downstream with 95% of the peaks detected within 500 kb of the closest TSS (Figure 4.3 Top). 29% of the peaks were detected within 20 kb of the closest TSS while only 10% were detected within 5 kb (Figure 4.3 Bottom). The intergenic peaks (labelled in Figure 4.2) were an average of 182 kb from the closest RefSeq TSS with 85% of those peaks further than 20 kb away. These analyses only consider RefSeq transcripts and do not take into account non-RefSeq and other novel transcripts detected by next-generation sequencing.
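The nearest-TSS calculation described above can be illustrated with a minimal Python sketch. It is not the BEDTools implementation: it measures from the peak midpoint rather than between interval edges, ignores strand (so the sign reflects genomic coordinates only, not upstream or downstream of the transcript), and handles a single chromosome. The function names and the toy coordinates are hypothetical.

```python
from bisect import bisect_left


def closest_tss_distance(peak_start, peak_end, tss_positions):
    """Signed distance in bases from a peak midpoint to the nearest TSS.

    tss_positions: non-empty, sorted list of TSS coordinates on one chromosome.
    A negative value means the peak midpoint has a lower coordinate than the TSS.
    """
    mid = (peak_start + peak_end) // 2
    i = bisect_left(tss_positions, mid)
    # Only the TSS just below and just above the midpoint can be the nearest.
    candidates = tss_positions[max(i - 1, 0):i + 1]
    nearest = min(candidates, key=lambda tss: abs(mid - tss))
    return mid - nearest


def fraction_within(distances, window):
    """Fraction of distances whose absolute value is at most `window` bases."""
    return sum(abs(d) <= window for d in distances) / len(distances)


# Hypothetical TSS coordinates and peaks (start, end), for illustration only.
tss = [1_000, 50_000, 200_000]
peaks = [(900, 1_100), (40_000, 40_200), (260_000, 260_400)]
distances = [closest_tss_distance(s, e, tss) for s, e in peaks]
print(distances)
print(fraction_within(distances, 20_000))
```

Applying `fraction_within` with windows of 5 kb, 20 kb, and 500 kb to the full set of peak distances would give the kind of summary reported in Figure 4.3.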
Figure 4.3  AR ChIP-seq Peak Distance to the Nearest RefSeq TSS
The distribution of the minimum distance from an AR ChIP-seq peak to the closest RefSeq transcriptional start site (TSS) is plotted by histogram. Top: The histogram for all peaks. Distances ranged from 1.86 Mb upstream to 2.16 Mb downstream. Bottom: The histogram for the subset of peaks within 20 kb of the nearest TSS (1,930 peaks). Only 10% of all peaks were detected within 5 kb of the nearest TSS.

De novo DNA Sequence Motif Analysis

The DNA sequences for the peaks detected in the AR ChIP-seq experiment were analyzed using the MEME-ChIP suite of tools (i.e. MEME, TOMTOM, and MAST). Two prominent motifs (E-values: 4.3e-033, 4.6e-057) were detected using MEME (Bailey and Elkan 1994) on 600 randomly chosen and trimmed (central 100 bases) input sequences. The two motifs were aligned to motifs from the JASPAR CORE 2009 database using TOMTOM (Gupta et al. 2007). One motif (motif logo shown in Figure 4.4 A) aligned with the AR consensus motif (MA0007.1; q-value<0.001). The detected motif also aligned to the glucocorticoid receptor consensus motif (NR3C1; MA0113.1; q-value<0.01). The alignment of the detected motif with the glucocorticoid receptor motif was not unexpected as the binding motifs for the glucocorticoid and androgen receptors are similar. The second motif (motif logo shown in Figure 4.4 B) aligned with the FOXA1 consensus motif (MA0148.1; q-value<0.05) and the fkh (Forkhead family) consensus motif (MA0446.1; q-value<0.05). FOXA1 is a member of the forkhead family of winged-helix transcription factors. The FOXA1 motif was also detected in the other AR ChIP experiments compared above. The two motifs were aligned to all of the AR peaks using MAST (position p-value<0.0001) (Bailey and Gribskov 1998). The AR-like motif was found in 40% of all AR peaks (2,635) while the FOXA1-like motif was found in 58% of all AR peaks (3,833).
21% of all AR peaks (1,385) contained both the AR-like and FOXA1-like motifs (not overlapping).

Figure 4.4  DNA Sequence Motifs Detected in AR ChIP-seq Peaks
The JASPAR CORE 2009 database consensus motifs (top) were aligned to the motifs detected by MEME on the AR ChIP-seq peaks (bottom) by TOMTOM (output above). A. The top line is the AR consensus motif (JASPAR name: MA0007.1) aligned to the detected motif in the bottom line (E-value: 4.3e-033, q-value<0.001). B. The top line is the FOXA1 consensus motif (JASPAR name: MA0148.1) aligned to the detected motif in the bottom line (E-value: 4.6e-057, q-value<0.05).

4.2.3 Discussion

The AR ChIP-seq experiment detected 6,630 potential AR binding sites in LNCaP cells 24 h after treatment with 1 nM R1881. Most of these regions (86%) were also enriched for AR binding in at least one of the two other genome wide AR ChIP experiments compared (Wang et al. 2009; Yu et al. 2010). Well studied AR binding sites including the enhancer regions for KLK3/PSA, FKBP5, and TMPRSS2 were detected in all three AR ChIP experiments. The AR and FOXA1 consensus motifs were enriched in all three experiments. FOXA1 and AR protein interaction has been reported to regulate prostate-specific transcription including transcription of the KLK3/PSA genomic locus (Gao et al. 2003). The consensus motifs for the forkhead family of transcription factors are similar; therefore, it is possible that other forkhead transcription factors may also interact with AR to coordinate transcription.

The 6,630 AR binding sites detected may be a conservative list, as 8,686 additional regions were detected in the other two AR ChIP experiments. Some variation was expected as different AR antibodies, steroids (i.e. R1881 and DHT), treatment durations, steroid concentrations, detection software and normalization methods were used in the three AR ChIP experiments.
The AR ChIP sample sequences were normalized to an input control as DNA shearing, library preparation, and sequencing may introduce non-random artifacts. Yu et al. did not report a similar normalization procedure so false positives may account for some of the additional "AR peaks" detected.

Most of the AR binding sites were located in distant enhancer or intergenic regions when compared to RefSeq transcripts. Only 10% of the detected AR binding sites were located within 5 kb of the closest RefSeq TSS, which may suggest that AR does not function primarily as a traditional transcription factor and may have more effect at distant enhancer regions. KLK3/PSA, FKBP5, and TMPRSS2 are well studied examples of AR binding to distant enhancer regions. A high throughput ChIP chromosome conformation capture assay may further elucidate the role of distant enhancer binding, as in the KLK3/PSA example where chromatin looping brings together AR binding at the KLK3/PSA distal enhancer region and proximal promoter region to form a coordinated transcriptional complex (Shang et al. 2002; Wang et al. 2005). An example of a high throughput ChIP chromosome conformation capture assay is ChIA-PET (chromatin interaction analysis by paired-end tag sequencing). The ChIA-PET protocol includes cross-linking and sonication of DNA-protein complexes, enrichment of DNA fragments by an antibody for a protein of interest, linker mediated ligation of proximal DNA fragments, and purification and paired-end sequencing of linker enriched DNA fragments (Fullwood et al. 2010). Intergenic AR binding may also regulate transcription of novel non-RefSeq transcripts.

The 6,630 AR binding sites detected in the AR ChIP-seq experiment were integrated with other datasets in order to predict direct targets of AR (discussed below).
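The comparison of peak sets reported in 4.2.2 reduces to interval overlap followed by inclusion-exclusion counting. The following is a minimal Python sketch under stated assumptions, not the BEDTools 'intersectBed' implementation: intervals are half-open (start, end) pairs on a single chromosome, the overlap search is a simple all-against-all scan, and the function names and toy intervals are hypothetical. The three counts in the last call are those given in the Figure 4.1 legend.

```python
def overlaps(a, b):
    """True if half-open intervals a and b, each (start, end), share a base."""
    return a[0] < b[1] and b[0] < a[1]


def count_overlapping(query, reference):
    """Number of query intervals overlapping at least one reference interval."""
    return sum(any(overlaps(q, r) for r in reference) for q in query)


def in_either(in_first, in_second, in_both):
    """Inclusion-exclusion: items found in at least one of two sets."""
    return in_first + in_second - in_both


# Hypothetical intervals on one chromosome, for illustration only.
ours = [(100, 300), (1_000, 1_200), (5_000, 5_400)]
other = [(250, 500), (5_300, 5_900), (9_000, 9_100)]
print(count_overlapping(ours, other))

# Counts from the Figure 4.1 legend: peaks shared with Yu et al., peaks shared
# with Wang et al., and peaks common to all three experiments.
print(in_either(5_474, 3_182, 3_135))
```

The second call recovers the 5,521 peaks reported as detected in at least one of the two published experiments.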
4.3 Androgen and Anti-androgen Regulated Transcripts Detected in vitro by Microarray Profiling

4.3.1 Methods

LNCaP cells were grown in CSS for 48 h and then treated for 48 h with an ethanol vehicle control (0.38%), 1 nM R1881, 10 nM DHT, or 5 nM 17-OH-progesterone. Three different steroid receptor inhibitors were also used either alone or in combination with a steroid: bicalutamide (10 uM), MDV3100 (10 uM), or RU-486 (10 uM). The subset of sample conditions presented in this thesis is summarized in Table 4.1. Each sample condition was prepared in triplicate. RNA was isolated using the mirVana RNA isolation kit (Ambion). The labelled cRNA was prepared using the Agilent Low Input Quick Amp (LIQA) labelling kit. The cRNA was hybridized to the Agilent 4x180K custom version 2 microarray (discussed in Chapter 3) using the standard Agilent single channel hybridization protocol. The microarrays were scanned with the Agilent scanner and feature extraction was performed using the Agilent Feature Extraction Software (version 10.5.1.1).

The raw data were analyzed using R (version 2.11.0) and LIMMA (version 3.4.1), a Bioconductor package (Smyth 2004). A between array quantile normalization was applied to make the distribution of signal intensities similar for all microarrays within the experiment in order to limit technical variation. Contrasts between the different sample conditions were generated using LIMMA linear models. Differential expression was determined using a Bayesian adjusted t-statistic from the LIMMA linear model corrected for a false discovery rate of 5% (Benjamini-Hochberg multiple test correction). Probes with an eArray base composition score of 3_BC or 4_BC (as discussed in 3.5) were flagged and not included in the analysis below. The probes were defined as expressed for a sample condition if the probes passed the 'isWellAboveBG' flag of the Agilent Feature Extraction Software (discussed in 3.5) in all three replicates.
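The Benjamini-Hochberg correction named above, combined with the 1.5 fold change threshold used in the results of this section, can be sketched in Python as follows. This is an illustration only: it does not reproduce the LIMMA linear model or its moderated statistics, the function names and the p-values are hypothetical, and treating the fold change threshold as inclusive is an assumption.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_differential(log2_ratio, adj_p, fold=1.5, fdr=0.05):
    """Apply a fold change threshold (assumed inclusive) and an FDR threshold."""
    return abs(log2_ratio) >= math.log2(fold) and adj_p < fdr


# Hypothetical raw p-values and log2 ratios for four probes.
raw_p = [0.01, 0.04, 0.03, 0.005]
log2_ratios = [1.2, 0.3, -0.9, -2.0]
adj_p = benjamini_hochberg(raw_p)
print(adj_p)
print([is_differential(r, p) for r, p in zip(log2_ratios, adj_p)])
```

A 1.5 fold change corresponds to an absolute log2 ratio of about 0.585, so the second hypothetical probe fails the fold change threshold even though its adjusted p-value is below 0.05.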
The probes classified as 'multiple RefSeq' probes in the tables below can overlap multiple RefSeq features (e.g. 5'UTR and exon) of an associated Entrez Gene ID. BED files containing the genomic alignments of the individual probes were generated for each of the sample contrasts. The BED files were loaded into the UCSC Genome Browser as custom tracks to visually compare the differential expression (color coded) across a genomic locus.

Functional over-representation analysis for Gene Ontology terms (GO term; i.e. GOTERM_BP_FAT and GOTERM_MF_FAT) was performed with the DAVID bioinformatics online resource version 6.7 (Huang da et al. 2009). DAVID uses a modified Fisher Exact Test to determine over-representation of genes annotated with a GO term in an input list of genes. The p-values listed are Benjamini-Hochberg adjusted p-values for the modified Fisher Exact Test. Figures were generated using R (version 2.11.0).

Table 4.1  Sample Conditions Included in the LNCaP in vitro Experiment
The 13 sample conditions included in the LNCaP in vitro experiment are listed by row. LNCaP cells were grown in CSS for 48 h then treated with the steroid and/or steroid receptor inhibitor as indicated in each row for 48 h. The sample B.DHT, for example, was treated with 10 nM DHT and 10 uM bicalutamide.
Sample Condition Abbreviation   Steroid or Control           Steroid Receptor Inhibitor
E                               Ethanol control              -
R1881                           R1881 (1 nM)                 -
DHT                             DHT (10 nM)                  -
P                               17-OH-Progesterone (5 nM)    -
B                               Ethanol control              Bicalutamide (10 uM)
B.DHT                           DHT (10 nM)                  Bicalutamide (10 uM)
B.P                             17-OH-Progesterone (5 nM)    Bicalutamide (10 uM)
MDV                             Ethanol control              MDV3100 (10 uM)
MDV.DHT                         DHT (10 nM)                  MDV3100 (10 uM)
MDV.P                           17-OH-Progesterone (5 nM)    MDV3100 (10 uM)
RU486                           Ethanol control              RU-486 (10 uM)
RU486.DHT                       DHT (10 nM)                  RU-486 (10 uM)
RU486.P                         17-OH-Progesterone (5 nM)    RU-486 (10 uM)

4.3.2 Results

Summary of Androgen Regulated Probes

Androgen regulated microarray probes were defined as differentially expressed (1.5 fold change; FDR adjusted p-value<0.05) in LNCaP cells 48 h after treatment with R1881 or DHT compared to an ethanol control. The number of androgen regulated probes is summarized in Table 4.2. More probes were differentially regulated by the synthetic androgen, R1881, than by the physiological androgen, DHT. The difference between the R1881 and DHT treatments is discussed later in this section.

Table 4.2  R1881 and DHT Regulated Probes by Probe Classification
The number of probes differentially expressed (1.5 fold change; FDR adjusted p-value<0.05) between DHT (10 nM) and the ethanol control (E), and between R1881 (1 nM) and the ethanol control are given in the columns. The total number of probes on the 4x180K Agilent version 2 custom microarray is listed in the last column. The microarray probes were classified using the method illustrated in Figure 3.1.
                        DHT vs E        R1881 vs E      version 2
RefSeq Probes           up      down    up      down    total
coding exon             8602    9878    12482   17242   87295
non-coding exon         209     500     485     728     3789
5'UTR                   31      29      65      54      485
3'UTR                   3067    3998    4472    6095    28326
multiple RefSeq         296     342     401     558     2958
Non-RefSeq Probes       up      down    up      down    total
exon antisense          358     214     1008    268     4963
5'UTR antisense         8       12      18      15      136
3'UTR antisense         404     148     743     274     3947
intron sense            843     1506    1777    2102    12242
intron antisense        224     275     761     315     4481
promoter sense          32      51      90      67      723
promoter antisense      118     142     184     253     1475
3'extended sense        286     384     371     701     3090
3'extended antisense    52      65      114     113     743
multiple non-RefSeq     955     971     1830    1829    13707
intergenic              338     610     1022    846     6927

The list of androgen regulated RefSeq transcripts generated from the LNCaP in vitro microarray experiment was compared to a previously published list of androgen regulated genes [(Nelson et al. 2002) downloaded from the Broad Institute's Molecular Signatures Database]. The previously published list was generated from a microarray experiment profiling LNCaP cells 24 h after treatment with 1 nM R1881. 73 of the 84 genes increased by R1881 in the published experiment were increased by either DHT or R1881 including KLK3/PSA, FKBP5, TMPRSS2 and NKX3-1. 15 of the 17 genes decreased by R1881 in the published experiment were decreased by either DHT or R1881 including CTBP1 (discussed in Chapter 5).

Functional over-representation analysis was performed for the Entrez Genes commonly regulated by DHT and R1881 compared to the ethanol control. The top four enriched GO term clusters for the 2,048 genes increased by R1881 and DHT (coding exons in Table 4.2) were steroid metabolic process (p-value: 1.2E-6), intracellular transport (p-value: 1.3E-6), phosphate metabolic process (p-value: 2.0E-4) and fatty acid metabolic process (p-value: 1.8E-3).
The top two enriched GO term clusters for the 1,569 genes decreased by R1881 and DHT (Table 4.2 coding exons) were cell adhesion (p-value: 2.4E-5) and chromatin modification (p-value: 1.9E-2).

Non-coding Entrez Genes were also differentially regulated by R1881 and DHT. 140 ncRNAs were increased by R1881 and DHT (Table 4.2 non-coding exons), including PCA3, the lncRNA currently being evaluated as a marker for prostate cancer (discussed in 1.4.8), and 208 ncRNAs were decreased by R1881 and DHT.

Non-RefSeq probes were also differentially regulated by R1881 and DHT (Table 4.2 bottom). 134 non-RefSeq probes increased by R1881 and DHT directly overlapped the AR ChIP-seq peaks detected in LNCaP cells 24 h after treatment with 1 nM R1881 (discussed in 4.2). 67 of these probes were classified as 'intron sense' probes, suggesting that these probes may be detecting novel variants of RefSeq transcripts. 430 intergenic probes (i.e. greater than 5 kb from the nearest RefSeq transcript) were increased by R1881 and DHT and 220 intergenic probes were decreased by R1881 and DHT. The probes designed to detect prostate expressed transcripts found in GenBank or LNCaP and C4-2 RNA-seq data were also differentially regulated by R1881 and DHT: 1,316 probes were increased and 1,169 probes were decreased.

Synthetic Androgen (R1881) Compared to Physiological Androgen (DHT)

The concentrations of steroids used (i.e. 1 nM R1881 and 10 nM DHT) are the standard concentrations used to mimic physiological levels of androgens. A higher concentration of DHT is used because it has a lower affinity for AR than R1881 and because DHT is metabolized within the cell whereas R1881 is not. The number of differentially expressed probes between R1881 and DHT treatments is illustrated in Figure 4.5 (yellow: increased by R1881; blue: decreased by R1881).
Figure 4.5  Microarray Probe Intensities for R1881 Compared to DHT
Microarray probe intensities (log2) from LNCaP cells after 48 h treatment with the physiological androgen, DHT (10 nM), and the synthetic androgen, R1881 (1 nM), were compared by scatterplot. Differential expression was defined as 1.5 fold change (black lines) and an FDR adjusted p-value<0.05. Yellow: The probes increased by R1881 compared to DHT (RefSeq probes: 6,082; non-RefSeq probes: 4,281). Blue: The probes decreased by R1881 compared to DHT (RefSeq probes: 8,990; non-RefSeq probes: 2,146).

R1881 would be expected to regulate the same transcripts as DHT as it is an agonist of AR. R1881 would also be expected to produce a larger magnitude of change as it has a higher affinity for AR and a longer half-life on the receptor. The ratios of R1881 and of DHT to the ethanol control were compared in a scatterplot in Figure 4.6. As expected, probes were identified that were regulated in the same direction by R1881 and DHT compared to ethanol but with a different magnitude of change. Unexpectedly, probes were identified that were specifically regulated by either DHT or R1881.

A functional over-representation analysis was performed for the Entrez Genes targeted by the R1881 and DHT specific probes (i.e. the probes plotted in red, blue, green and yellow in Figure 4.6). The top three enriched GO term clusters for the 2,398 genes increased specifically by R1881 (Figure 4.6 yellow) were cell adhesion (p-value: 5.2E-16), ion transport (p-value: 7.4E-12), and ligand-gated ion channel activity (p-value: 5.2E-7). The top three enriched GO term clusters for the 1,923 genes decreased specifically by R1881 (Figure 4.6 blue) were cell cycle (p-value: 2.8E-52), DNA metabolic process (p-value: 1.2E-38), and microtubule-based process (p-value: 7.6E-17).
The top three enriched GO term clusters for the 426 genes increased specifically by DHT (Figure 4.6 red) were cell cycle/M phase (p-value: 7.5E-6), DNA-dependent DNA replication (p-value: 0.04), and DNA metabolic process (p-value: 8.4E-4). The top three enriched GO term clusters for the 533 genes decreased specifically by DHT (Figure 4.6 green) were ion transport (p-value: 1.1E-4), neuron migration (p-value: 0.01), and cell projection organization (p-value: 2.8E-4).

54% of the probes decreased specifically by R1881 (Figure 4.6 blue) were increased with progesterone treatment (Figure 4.7 bottom left). The top three enriched GO term clusters for the 508 genes specifically decreased by R1881 and increased by progesterone were cell cycle (p-value: 1.1E-97), DNA metabolic process (p-value: 3.5E-51), and microtubule-based process (p-value: 2.8E-26).

Figure 4.6  R1881 and DHT Regulated Probes Compared by Scatterplot
Microarray expression profiles from LNCaP cells after 48 h treatment with the physiological androgen, DHT (10 nM), and the synthetic androgen, R1881 (1 nM), were compared by scatterplot. The log2 ratio for the DHT sample compared to the ethanol control (E) was plotted on the x-axis. The log2 ratio for the R1881 sample compared to the ethanol control was plotted on the y-axis. Differential expression was defined as 1.5 fold change (black lines) and an FDR adjusted p-value<0.05. Yellow: The probes increased specifically by R1881 compared to both DHT and ethanol (RefSeq probes: 3,409; non-RefSeq probes: 2,746). Blue: The probes decreased specifically by R1881 compared to both DHT and ethanol (RefSeq probes: 4,683; non-RefSeq probes: 1,051). Red: The probes increased specifically by DHT compared to both R1881 and ethanol (RefSeq probes: 728; non-RefSeq probes: 197). Green: The probes decreased specifically by DHT compared to both R1881 and ethanol (RefSeq probes: 631; non-RefSeq probes: 515).
Turquoise: The probes differentially expressed between R1881 and DHT but not differentially expressed when compared to ethanol (RefSeq probes: 5,712; non-RefSeq probes: 2,076). Grey: Probes not differentially expressed between R1881 and DHT.

Figure 4.7  R1881 and Progesterone Regulated Probes Compared by Scatterplot
Microarray expression profiles from LNCaP cells after 48 h treatment with the synthetic androgen, R1881 (1 nM), and 17-OH-progesterone (P; 5 nM) were compared by scatterplot. The log2 ratio for the progesterone sample compared to the ethanol control (E) was plotted on the x-axis. The log2 ratio for the R1881 sample compared to the ethanol control was plotted on the y-axis. Differential expression was defined as 1.5 fold change (black lines) and an FDR adjusted p-value<0.05. The probes specifically increased (yellow) and decreased (blue) by R1881 compared to both DHT and ethanol were plotted in the same colors as in Figure 4.6.

Fluorescence-activated cell sorting (FACS) analysis was performed to validate the cell cycle related difference detected by the functional over-representation analysis of transcripts differentially regulated by 1 nM R1881 and 10 nM DHT in the LNCaP in vitro microarray experiment. FACS analysis was performed on LNCaP cells treated with ethanol, DHT (10 nM), R1881 (1 nM), 17-OH-progesterone (5 nM), or a synthetic progesterone, R5020 (1 nM). All steroids tested except R1881 stimulated the LNCaP cells to progress through S phase of the cell cycle (Figure 4.8).

Figure 4.8  FACS Analysis for LNCaP Cells Treated with Different Steroids
Fluorescence-activated cell sorting (FACS) analysis was performed on LNCaP cells that were grown in CSS for 48 h and then treated with DHT (10 nM), R1881 (1 nM), 17-OH-progesterone (5 nM), or a synthetic progesterone, R5020 (1 nM), for 24 h, 48 h, and 72 h. The FACS analysis shown here gives the percentage of cells in S phase at the three time points for the different hormone treatments.
DHT, progesterone, and R5020 stimulated the LNCaP cells to progress through S phase while treatment with R1881 or the ethanol and CSS controls did not stimulate the cells to progress through S phase. The data presented are from one experiment. The experiment has been repeated with similar results.

The androgen regulated RefSeq transcripts detected in the LNCaP in vitro experiment were compared to the AR ChIP-seq peaks detected in LNCaP cells 24 h after treatment with 1 nM R1881 (discussed in 4.2). The distributions of the minimum distance from an AR ChIP-seq peak to the TSS of an R1881 or DHT regulated RefSeq transcript were compared by boxplot (Figure 4.9). The median distance between an AR ChIP-seq peak and a RefSeq TSS was lower for the transcripts increased by R1881 or DHT compared to ethanol than the median distance for transcripts not affected by either R1881 or DHT (Wilcoxon Rank Test: FDR adjusted p-value<0.05). The median distance was not lower for transcripts decreased by R1881 or DHT compared to ethanol. The median distance was also not lower for the R1881 specific transcripts, suggesting that many of these transcripts may not be regulated through direct AR DNA interactions. The R1881 specific transcripts may have been regulated through secondary transcription factors or other steroid receptors. The median distance was lower for the DHT specific transcripts. AR bound to R1881 may bind to the DNA of some of these DHT specific transcripts (the AR ChIP-seq peaks were detected after R1881 treatment), but secondary co-factors may be required for transcription.

Figure 4.9  AR ChIP-seq Peak Distance to the Nearest DHT or R1881 Regulated RefSeq TSS
The distribution of the minimum distance from an AR ChIP-seq peak to the TSS of a DHT or R1881 regulated RefSeq transcript was compared by boxplot. The AR ChIP-seq peaks were detected in LNCaP cells 24 h after treatment with 1 nM R1881 (discussed in 4.2).
RefSeq transcripts were subdivided by differential expression in LNCaP cells 48 h after treatment with 1 nM R1881 or 10 nM DHT compared to an ethanol control (E). Differential expression was defined as 1.5 fold change and an FDR adjusted p-value<0.05. R1881 specific transcripts (increased: yellow; decreased: blue; as in Figure 4.6) were differentially expressed compared to both DHT and ethanol. DHT specific transcripts (increased: red; decreased: green; as in Figure 4.6) were differentially expressed compared to both R1881 and ethanol. The distances for each subset were compared by Wilcoxon Rank Test to distances calculated for RefSeq transcripts not regulated by R1881 or DHT (no change; * FDR adjusted p-value<0.05; ** FDR adjusted p-value<0.001). The outlier distances were not plotted.

Current (Bicalutamide) and Next Generation (MDV3100) Anti-Androgens

The microarray expression profile following treatment with bicalutamide, the anti-androgen currently used in the clinic, was compared to the expression profile following treatment with MDV3100, the next generation anti-androgen. Bicalutamide and MDV3100 regulated probes were defined as differentially expressed with the anti-androgen in the presence of DHT compared to DHT alone. Bicalutamide and MDV3100 appeared to function as AR antagonists. The DHT regulated probes oppositely regulated by the anti-androgens are summarized in Table 4.3. 47% of the probes increased by DHT were decreased by bicalutamide (Figure 4.10 bottom right) and 28% of the probes decreased by DHT were increased by bicalutamide (Figure 4.10 top left). 60% of the probes increased by DHT were decreased by MDV3100 (Figure 4.11 bottom right) and 38% of the probes decreased by DHT were increased by MDV3100 (Figure 4.11 top left). 42% of the probes increased by DHT were decreased by both anti-androgens and 24% of the probes decreased by DHT were increased by both anti-androgens.
Bicalutamide and MDV3100 did not function as AR agonists under the conditions tested, as there were few probes commonly regulated between DHT and the anti-androgens (Figure 4.10 and Figure 4.11 top right; bottom left).

The top two enriched GO terms for the 2,095 Entrez Genes increased by DHT and decreased by either anti-androgen were steroid metabolic process (p-value: 4.2E-6) and fatty acid metabolic process (p-value: 2.4E-5). The top three enriched GO terms for the 2,183 genes decreased by DHT and increased by either anti-androgen were cell projection organization (p-value: 6.0E-9), cell adhesion (p-value: 1.7E-5), and cell motion (p-value: 5.4E-4).

The top three enriched GO terms for the 1,016 Entrez Genes increased by DHT and decreased by MDV3100 but not decreased by bicalutamide were sterol metabolic process (p-value: 0.02), cholesterol metabolic process (p-value: 0.02), and sterol biosynthesis process (p-value: 0.02). The top three enriched GO terms for the 1,102 genes decreased by DHT and increased by MDV3100 but not increased by bicalutamide were metal ion binding (p-value: 5.6E-5), cell projection organization (p-value: 1.7E-3), and cell motion/migration (p-value: 0.02).

Table 4.3  DHT Regulated Probes Oppositely Regulated by Anti-Androgens
DHT regulated probes were differentially expressed between DHT (10 nM) and the ethanol control (1.5 fold change; FDR adjusted p-value<0.05). The bicalutamide (B; 10 uM) and MDV3100 (MDV; 10 uM) regulated probes were differentially expressed between the anti-androgen in the presence of DHT and DHT alone. The microarray probes were classified using the method illustrated in Figure 3.1.
                        DHT up and Anti-Androgen down       DHT down and Anti-Androgen up
Anti-Androgen           -       B       MDV     B and MDV   -       B       MDV     B and MDV
RefSeq Probes
coding exon             8602    4207    5260    3828        9878    2832    3704    2383
non-coding exon         209     96      122     87          500     138     156     99
5'UTR                   31      12      16      10          29      11      14      9
3'UTR                   3067    1442    1810    1317        3998    1255    1869    1167
multiple RefSeq         296     125     156     115         342     95      143     85
Non-RefSeq Probes
exon antisense          358     130     172     109         214     42      87      35
5'UTR antisense         8       0       1       0           12      4       6       4
3'UTR antisense         404     158     223     134         148     38      65      29
intron sense            843     380     569     353         1506    390     429     266
intron antisense        224     87      109     69          275     68      136     59
promoter sense          32      13      19      13          51      12      16      6
promoter antisense      118     62      65      52          142     44      55      30
3'extended sense        286     117     171     114         384     104     134     95
3'extended antisense    52      16      25      16          65      17      22      15
multiple non-RefSeq     955     402     548     372         971     196     261     150
intergenic              338     118     171     104         610     179     255     152

Figure 4.10  Bicalutamide and DHT Regulated Probes Compared by Scatterplot
Microarray expression profiles from LNCaP cells after 48 h treatment with the anti-androgen, bicalutamide (10 uM), and the physiological androgen, DHT (10 nM), were compared by scatterplot. The log2 ratio for the DHT (10 nM) treated sample compared to the ethanol sample was plotted on the x-axis. The log2 ratio for the bicalutamide (10 uM) and DHT sample compared to the DHT sample was plotted on the y-axis. Differential expression was defined as 1.5 fold change (black lines) and an FDR adjusted p-value<0.05. Red: The probes increased by DHT compared to ethanol. 47% of the probes increased by DHT were decreased by bicalutamide treatment (bottom right; RefSeq probes: 5,882; non-RefSeq probes: 1,483). Green: The probes decreased by DHT compared to ethanol. 28% of the probes decreased by DHT were increased by bicalutamide treatment (top left; RefSeq probes: 4,331; non-RefSeq probes: 1,094). The probes specifically increased (yellow) and decreased (blue) by R1881 compared to both DHT and ethanol were plotted in the same colors as in Figure 4.6.
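The percentages quoted in the text and in the Figure 4.10 and Figure 4.11 legends can be recomputed from the column totals of Table 4.3. The short Python sketch below transcribes the table rows and sums each column; the variable and function names are hypothetical, and rounding to the nearest whole percent is an assumption about how the reported values were derived.

```python
# Rows of Table 4.3. Column order:
# DHT up, B down, MDV down, B and MDV down, DHT down, B up, MDV up, B and MDV up
TABLE_4_3 = {
    "coding exon": (8602, 4207, 5260, 3828, 9878, 2832, 3704, 2383),
    "non-coding exon": (209, 96, 122, 87, 500, 138, 156, 99),
    "5'UTR": (31, 12, 16, 10, 29, 11, 14, 9),
    "3'UTR": (3067, 1442, 1810, 1317, 3998, 1255, 1869, 1167),
    "multiple RefSeq": (296, 125, 156, 115, 342, 95, 143, 85),
    "exon antisense": (358, 130, 172, 109, 214, 42, 87, 35),
    "5'UTR antisense": (8, 0, 1, 0, 12, 4, 6, 4),
    "3'UTR antisense": (404, 158, 223, 134, 148, 38, 65, 29),
    "intron sense": (843, 380, 569, 353, 1506, 390, 429, 266),
    "intron antisense": (224, 87, 109, 69, 275, 68, 136, 59),
    "promoter sense": (32, 13, 19, 13, 51, 12, 16, 6),
    "promoter antisense": (118, 62, 65, 52, 142, 44, 55, 30),
    "3'extended sense": (286, 117, 171, 114, 384, 104, 134, 95),
    "3'extended antisense": (52, 16, 25, 16, 65, 17, 22, 15),
    "multiple non-RefSeq": (955, 402, 548, 372, 971, 196, 261, 150),
    "intergenic": (338, 118, 171, 104, 610, 179, 255, 152),
}


def column_totals(table):
    """Sum each of the eight columns over all probe classes."""
    return [sum(column) for column in zip(*table.values())]


def percent(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


dht_up, b_down, mdv_down, both_down, dht_down, b_up, mdv_up, both_up = (
    column_totals(TABLE_4_3)
)
print("DHT up, decreased by B / MDV / both:",
      percent(b_down, dht_up), percent(mdv_down, dht_up), percent(both_down, dht_up))
print("DHT down, increased by B / MDV / both:",
      percent(b_up, dht_down), percent(mdv_up, dht_down), percent(both_up, dht_down))
```

The column totals reproduce the 47%, 60%, and 42% reported for probes increased by DHT and the 28%, 38%, and 24% reported for probes decreased by DHT.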
Figure 4.11  MDV3100 and DHT Regulated Probes Compared by Scatterplot
Microarray expression profiles from LNCaP cells after 48 h treatment with the anti-androgen, MDV3100 (MDV; 10 uM), and the physiological androgen, DHT (10 nM), were compared by scatterplot. The log2 ratio for the MDV3100 and DHT sample compared to the DHT sample was plotted on the y-axis. Differential expression was defined as 1.5 fold change (black lines) and an FDR adjusted p-value<0.05. Red: The probes increased by DHT compared to ethanol. 60% of the probes increased by DHT were decreased by MDV3100 treatment (bottom right; RefSeq probes: 7,364; non-RefSeq probes: 2,073). Green: The probes decreased by DHT compared to ethanol. 38% of the probes decreased by DHT were increased by MDV3100 treatment (top left; RefSeq probes: 5,886; non-RefSeq probes: 1,466). The probes specifically increased (yellow) and decreased (blue) by R1881 compared to both DHT and ethanol were plotted in the same colors as in Figure 4.6.

The probes identified in the previous section as specifically regulated by R1881 compared to ethanol and DHT (Figure 4.6) were examined for differential expression in the anti-androgen treated samples. 46% of the probes specifically decreased by R1881 (Figure 4.6 blue) were increased by bicalutamide (Figure 4.10: blue top middle). 40% of the probes specifically decreased by R1881 (Figure 4.6 blue) were increased by MDV3100 (Figure 4.11: blue top middle). 39% of the R1881 specific probes decreased by R1881 were increased by both MDV3100 and bicalutamide. The top three enriched GO term clusters for these 494 genes decreased by R1881 and increased by bicalutamide and MDV3100 were cell cycle (p-value: 2.8E-93), DNA metabolic process (p-value: 2.5E-50), and microtubule-based process (p-value: 6.3E-25). Only 7% of the probes specifically increased by R1881 (Figure 4.6 yellow) were differentially regulated by bicalutamide.
59% of those probes specifically increased by R1881 were, however, increased by MDV3100 (Figure 4.11: yellow top middle).

A subset of probes (RefSeq probes: 1,728; non-RefSeq probes: 1,553) was increased by MDV3100 in the presence of ethanol, DHT, and progesterone but not increased by bicalutamide.  The subset of probes increased by the anti-androgen, MDV3100, was also increased by the synthetic androgen, R1881, and the non-specific steroid receptor inhibitor, RU-486, but was not increased by DHT, progesterone, or bicalutamide (Figure 4.12).  All probes in the subset were significantly expressed (Agilent FE Extraction Software 'isWellAboveBG' flag as discussed in 3.5) in all conditions tested (listed in Table 4.1).  The top three enriched GO term clusters for the 1,355 genes increased by MDV3100 were cell adhesion (p-value: 2.9E-10), cell-cell signalling (p-value: 2.1E-5), and ion transport (p-value: 3.9E-10).

Figure 4.12  Probe Intensities for the Probes Specifically Increased by MDV3100
The distributions of microarray probe intensities (log2) were compared by boxplot for each treatment in the LNCaP in vitro experiment (see Table 2.1 for sample abbreviations).  The subset of probes plotted (RefSeq probes: 1,728; non-RefSeq probes: 1,553) was increased when treated with the anti-androgen, MDV3100 (1 nM), compared to the ethanol control, DHT (1 nM), and 17-OH-progesterone (5 nM).

The bicalutamide and MDV3100 regulated RefSeq transcripts detected in the LNCaP in vitro experiment were compared to the AR ChIP-seq peaks detected in LNCaP cells 24 h after treatment with 1 nM R1881 (discussed in 4.2).  The distribution of the minimum distance from an AR ChIP-seq peak to the TSS of a RefSeq transcript regulated by an anti-androgen was compared by boxplot (Figure 4.13).
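The minimum distance from a TSS to the nearest AR ChIP-seq peak can be computed with a binary search over sorted peak positions. The sketch below is illustrative only and is not the code used in this analysis: the function names are hypothetical, and each peak is assumed to be represented by a single coordinate (for example the peak centre), which the text does not specify.

```python
from bisect import bisect_left
from statistics import median


def nearest_peak_distance(tss, peaks_sorted):
    """Distance (bp) from a TSS to the closest peak on the same chromosome.

    peaks_sorted is an ascending list of peak positions; each peak is
    assumed to be a single coordinate. Returns None if there are no peaks.
    """
    if not peaks_sorted:
        return None
    i = bisect_left(peaks_sorted, tss)
    # Only the peaks immediately left and right of the TSS can be closest.
    neighbours = peaks_sorted[max(i - 1, 0):i + 1]
    return min(abs(tss - p) for p in neighbours)


def median_distance(tss_by_chrom, peaks_by_chrom):
    """Median of the per-TSS minimum distances across chromosomes."""
    distances = []
    for chrom, tss_list in tss_by_chrom.items():
        peaks = sorted(peaks_by_chrom.get(chrom, []))
        for tss in tss_list:
            d = nearest_peak_distance(tss, peaks)
            if d is not None:
                distances.append(d)
    return median(distances) if distances else None
```

The per-subset distance lists produced this way are the kind of input that would be passed to a rank-based test and plotted as boxplots; the statistical comparison itself is not reproduced here.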
The median distance was lower for transcripts that were increased by DHT and decreased by bicalutamide or MDV3100 than the distances for transcripts that may not be directly regulated by DHT, bicalutamide, or MDV3100 (Wilcoxon Rank Test: FDR adjusted p-value<0.05).  The median distance was lower for transcripts increased by DHT and not decreased by bicalutamide or MDV3100, suggesting that some direct targets of AR are not antagonized by the anti-androgens.  The median distance was also lower for transcripts that were not increased by DHT but decreased by bicalutamide or MDV3100, suggesting that some direct targets of AR were not detected in the DHT vs ethanol comparison.  The median distance was not lower for transcripts that were decreased by DHT and increased by anti-androgens.  The median distance was not lower for the subset of probes specifically increased by MDV3100 in the presence of ethanol, DHT, and progesterone.

Figure 4.13  AR ChIP-seq Peak Distance to the Nearest Anti-Androgen Regulated RefSeq TSS
The distribution of the minimum distance from an AR ChIP-seq peak to the TSS of RefSeq transcripts regulated by an anti-androgen was compared by boxplot.  The AR ChIP-seq peaks were detected in LNCaP cells 24 h after treatment with 1 nM R1881 (discussed in 4.2).  RefSeq transcripts are subdivided by their differential expression in LNCaP cells 48 h after treatment with 10 nM DHT compared to ethanol control and their differential expression with 10 uM bicalutamide (B) or 10 uM MDV3100 (MDV) in the presence of DHT compared to DHT alone.  Differential expression was defined as 1.5 fold change and an FDR adjusted p-value<0.05.  The DHT up.B down subset, for example, included the transcripts increased by DHT compared to ethanol and decreased by bicalutamide in the presence of DHT compared to DHT alone.  no: no differential expression.
The distances for each subset were compared using a Wilcoxon Rank Test to the minimum distances to the TSS of RefSeq transcripts not regulated by DHT, bicalutamide or MDV3100 (no change).  *: FDR adjusted p-value<0.001.  The outliers are not plotted.

Antisense Transcripts

One of the features included in the design of the Agilent 4x180K custom microarray was sense-antisense probe pairs.  The version 2 design contained 29,644 sense-antisense probe pairs where a probe detecting a transcript encoded on the plus strand of DNA was within 500 nt of a probe detecting a transcript from the negative strand.  88% of the differentially regulated sense-antisense probe pairs were coordinately regulated by DHT: 1,356 probe pairs were both increased by DHT and 577 probe pairs were both decreased by DHT.  12% of the sense-antisense probe pairs were oppositely regulated by DHT.  192 probe pairs were oppositely regulated by DHT and targeted a RefSeq transcript on one strand and a non-RefSeq transcript on the other strand.  38 probe pairs were oppositely regulated by DHT and targeted distinct RefSeq transcripts encoded on opposite strands.  SKIV2L2 and PPAP2A, on chromosome 5 (Figure 4.14 A), and TBRG1 and SIAE, on chromosome 11 (Figure 4.14 B), are two examples of oppositely regulated sense-antisense RefSeq transcripts.  31 probe pairs were oppositely regulated by DHT and targeted non-RefSeq transcripts.  Other genomic loci with DHT regulated sense-antisense transcripts were detected but the probes were greater than 500 nt apart.  The CTBP1 sense-antisense transcript pair, for example, is oppositely regulated by DHT but the probes were greater than 500 nt apart (discussed in Chapter 5).

One of the sense-antisense pairs coordinately increased by DHT overlapped the KLK4 genomic locus.  KLK4 is a member of the kallikrein-related peptidase gene family, which includes KLK3/PSA.
The coordinate regulation of KLK4 sense and antisense transcripts was further validated by strand-specific linker-mediated qRT-PCR and by Northern blotting, and the antisense transcript was characterized by 5' and 3' RACE.  The experiments revealed unexpected sense-antisense chimeric transcripts in the KLK4 genomic locus where portions of the KLK4 RefSeq transcript were fused to transcripts encoded on the opposite strand of the DNA to the KLK4 transcript (Lai et al. 2010).  Independent KLK4 antisense transcripts could not be detected, suggesting that the sense-antisense KLK4 chimeric transcript may be generated by strand-switching during transcription instead of through a trans-splicing mechanism.  The sense-antisense chimera was detected by Northern blot analysis and by PCR with the addition of actinomycin D during cDNA library preparation to alleviate the potential template-switching previously described as being introduced by the reverse transcriptase enzyme (Perocchi et al. 2007).

Figure 4.14  DHT Regulated Sense-Antisense RefSeq Transcripts
Two genomic loci are illustrated (modified from the UCSC Genome Browser), each with two convergent transcripts oppositely regulated in LNCaP cells 48 h after treatment with 10 nM DHT compared to an ethanol (E) control in a microarray experiment.  Differential expression was defined as 1.5 fold change and an FDR adjusted p-value<0.05.  Blue: RefSeq transcripts.  Red: the microarray probes increased by DHT compared to ethanol.  Green: the microarray probes decreased by DHT compared to ethanol.  A. SKIV2L2, a RefSeq transcript decreased by DHT, is encoded on the positive strand of DNA and PPAP2A, a RefSeq transcript increased by DHT, is encoded on the negative strand.  SKIV2L2 and PPAP2A overlap in a sense-antisense manner at their 3' ends.  B. TBRG1, a RefSeq transcript increased by DHT, is encoded on the positive strand of DNA and SIAE, a RefSeq transcript whose 3' end is decreased by DHT, is encoded on the negative strand.
TBRG1 and SIAE overlap in a sense-antisense manner at their 3' ends.  The LNCaP Illumina RNA sequencing data represented are described in 2.2.4.

Probes Designed to Detect Prostate Expressed Transcripts

The numbers of differentially expressed novel probes designed to detect expression of the prostate expressed transcripts identified in GenBank, LNCaP, and C4-2 RNA-seq data are summarized in Table 4.4.  16% of these novel probes detected differential regulation by DHT.  7.6% of the novel probes detected differential regulation by DHT and antagonism by an anti-androgen (i.e. bicalutamide or MDV3100).  84% of the novel probes detected transcripts expressed in all triplicates for at least one sample condition tested in the LNCaP in vitro experiment.  The percentages for the Agilent 44K probes were also included in Table 4.4: 24% of the Agilent 44K probes detected differential regulation by DHT, 12.6% detected differential regulation by DHT and antagonism by an anti-androgen, and 82% detected transcript expression.

Table 4.4  Probes Designed to Detect Prostate Expressed Transcripts in the LNCaP in vitro Experiment
Probes designed to detect the prostate expressed transcripts identified in GenBank, LNCaP and C4-2 RNA-seq (discussed in 3.2.3) are presented.  Differential expression (DE) was defined as 1.5 fold change and an FDR adjusted p-value<0.05.  DHT DE: the probes differentially expressed by DHT compared to ethanol.  DHT AA DE: the probes differentially expressed by DHT and antagonized by an anti-androgen (AA) (i.e. bicalutamide or MDV3100).  EXP: the probes expressed in all triplicates for at least one sample condition.  %: percentage of the total on the Agilent 4x180K custom microarray (version 2).
| Probe class | Prostate Expressed Transcripts: DHT DE | Prostate Expressed Transcripts: DHT AA DE | Prostate Expressed Transcripts: DHT DE % | Prostate Expressed Transcripts: DHT AA DE % | Prostate Expressed Transcripts: EXP % | Agilent 44K Probes: DHT DE % | Agilent 44K Probes: DHT AA DE % | Agilent 44K Probes: EXP % |
|---|---|---|---|---|---|---|---|---|
| RefSeq Probes | | | | | | | | |
| coding exon | 18 | 7 | 28.6 | 11.1 | 96.8 | 21.4 | 10.7 | 81.9 |
| non-coding exon | 18 | 9 | 22.2 | 11.1 | 91.4 | 21.8 | 10.0 | 72.6 |
| 5'UTR | 8 | 3 | 16.7 | 6.3 | 83.3 | 20.0 | 12.0 | 68.0 |
| 3'UTR | 74 | 39 | 24.5 | 12.9 | 97.7 | 27.6 | 15.4 | 87.3 |
| multiple RefSeq | 14 | 7 | 28.6 | 14.3 | 93.9 | 24.0 | 11.6 | 86.0 |
| Non-RefSeq Probes | | | | | | | | |
| exon antisense | 295 | 154 | 12.1 | 6.3 | 86.4 | 12.5 | 5.6 | 75.0 |
| 5'UTR antisense | 4 | 2 | 9.8 | 4.9 | 87.8 | 37.5 | 0.0 | 75.0 |
| 3'UTR antisense | 453 | 261 | 14.9 | 8.6 | 76.3 | 18.8 | 12.5 | 75.0 |
| intron sense | 831 | 354 | 21.8 | 9.3 | 85.8 | 25.3 | 11.6 | 79.7 |
| intron antisense | 120 | 68 | 12.0 | 6.8 | 52.1 | 15.5 | 8.9 | 56.8 |
| promoter sense | 37 | 18 | 13.8 | 6.7 | 71.7 | 12.6 | 5.3 | 66.3 |
| promoter antisense | 41 | 23 | 20.0 | 11.2 | 89.3 | 22.9 | 13.0 | 83.8 |
| 3'extended sense | 279 | 124 | 22.9 | 10.2 | 93.0 | 24.6 | 12.1 | 77.1 |
| 3'extended antisense | 40 | 13 | 15.9 | 5.2 | 70.6 | 19.5 | 9.5 | 67.5 |
| multiple non-RefSeq | 871 | 382 | 13.0 | 5.7 | 88.2 | 23.0 | 11.1 | 82.0 |
| intergenic | 50 | 26 | 27.9 | 14.5 | 94.4 | 20.5 | 10.7 | 70.2 |
| total | 3153 | 1490 | 16.0 | 7.6 | 83.9 | 24.0 | 12.6 | 82.0 |

4.3.3 Discussion

Probes targeting both RefSeq and non-RefSeq transcripts were identified above as differentially regulated by androgen and antagonized by anti-androgens.  Integrating the expression profiles of these probes with the AR ChIP-seq peaks (discussed in 4.2) revealed that transcripts increased by androgen were closer to AR ChIP-seq peaks than non-androgen regulated transcripts were.  The transcripts decreased by androgen were not closer to AR ChIP-seq peaks than non-androgen regulated transcripts.  The proximity of the AR ChIP-seq peaks to transcripts suggests that a larger number of transcripts increased by androgens may be direct transcriptional targets of AR.  The inability to detect a difference in the distance to transcripts decreased by androgen when compared to transcripts not regulated by androgen suggests that fewer transcripts decreased by androgen are likely to be direct transcriptional targets of AR than the transcripts increased by androgens.
Repression of transcription by AR may, alternatively, be less dependent on AR-ARE binding distance.

Differential expression was compared between the synthetic androgen, R1881, and the physiological androgen, DHT.  A difference in the magnitude of change between R1881 and DHT was expected and observed, as R1881 has a higher affinity for AR than DHT and R1881 is not metabolized within the cell whereas DHT is metabolized; however, within the course of these experiments that is unlikely to be the driver of differential regulation.  The identification of a subset of probes specifically regulated by R1881 compared to DHT under these conditions was unexpected.  Functional over-representation analysis revealed that the transcripts specifically decreased by R1881 might be involved in cell cycle regulation.  FACS analysis revealed that under the same conditions as used for the microarray experiment, R1881 did not progress LNCaP cells through S phase while DHT and progesterone promoted cell cycle progression.  It is possible that the observed difference between R1881 and DHT expression profiles may be dependent on the time and concentration of steroid used.  The observed difference may also be specific to LNCaP cells.  The probes specifically increased by R1881 are not closer to AR ChIP-seq peaks than non-androgen regulated probes, suggesting that the R1881 specific probes may not be direct transcriptional targets of AR.  Experiments are underway to determine whether R1881 is acting directly through AR, using AR siRNA to examine the effects on transcription in response to R1881 treatment.  It is also possible that R1881 may be interacting with other steroid receptors such as the progesterone receptor.  54% of the probes specifically decreased by R1881 were increased by progesterone, suggesting that R1881 might have a role as a progesterone receptor antagonist.  The above possibilities are currently being investigated.
It is, however, clear from this analysis that experiments performed to evaluate androgen regulation should be validated with a physiological androgen such as DHT.

Differential expression was evaluated following treatment with current (i.e. bicalutamide) and next-generation (i.e. MDV3100) anti-androgens.  The probes regulated by DHT were compared to the probes regulated by either anti-androgen.  52% of the probes regulated by DHT were antagonized by either anti-androgen while only 2.5% of the probes regulated by DHT were similarly regulated by either anti-androgen.  Previous reports have suggested that a mutation found in LNCaP AR may allow AR to be activated by other steroid hormones such as progesterone and estrogens as well as by anti-androgens such as bicalutamide (Veldscholte et al. 1990).  Under these conditions, bicalutamide and MDV3100 appear to function as AR antagonists and not as AR agonists.  Comparing the differences between bicalutamide and MDV3100 is not within the scope of this thesis; however, treatment with MDV3100 antagonized a larger percentage of DHT regulated probes than treatment with bicalutamide under these conditions.  This is not unexpected as MDV3100 has been reported to have a higher affinity for the AR LBD and a greater ability to inhibit AR transcriptional activity than bicalutamide (Tran et al. 2009).  MDV3100 treatment increased a subset of probes that were also increased by the synthetic androgen, R1881, and the non-specific steroid receptor inhibitor, RU-486, but were not increased by DHT, progesterone, bicalutamide, or in the ethanol control.  The potential relationship between these transcripts that are increased both by a synthetic androgen and an anti-androgen is currently being explored in the lab.

DHT regulated sense-antisense probe pairs were also identified above.  88% of probe pairs where both probes were regulated by DHT were coordinately regulated.  There are several potential explanations for this coordinate regulation.
It is possible that antisense transcription is enhanced by the open chromatin formation required for transcription from the opposite strand.  It is also possible that the DHT regulated sense-antisense probe pairs may be targeting the same sense-antisense chimera as in the KLK4 example.  The third possibility is that the AffinityScript reverse transcriptase enzyme in the Agilent LIQA labeling kit introduced artifacts by template-switching as previously described with other reverse transcriptase enzymes (Perocchi et al. 2007).  The coordinately regulated sense-antisense pairs should be validated by Northern blot analysis or by PCR with the addition of actinomycin D to prevent second strand synthesis during cDNA library preparation.  In general, the Agilent LIQA protocol to generate labeled cRNA should maintain strand-specific information.  It is clear that some genuine sense-antisense transcription does occur in nature.  Examples of sense-antisense transcripts oppositely regulated by DHT were identified including two examples of overlapping sense-antisense RefSeq transcripts: SKIV2L2 and PPAP2A, on chromosome 5 (Figure 4.14 A), and TBRG1 and SIAE, on chromosome 11 (Figure 4.14 B).  These sense-antisense transcripts highlight the importance of adopting strand-specific next-generation sequencing protocols.  Current standards for next-generation sequencing lose the direction of transcription during library preparation.  This loss of information can lead to incorrect assignment of strand to sequencing reads.  The incorrect alignment of reads can result in incorrect measurement of protein-coding transcripts and subsequent incorrect inference of protein levels in downstream functional analysis.  The incorrect alignment can also mask the detection of biologically significant antisense transcripts.
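The coordinate versus opposite classification of DHT regulated sense-antisense probe pairs discussed in this section can be illustrated with a short Python sketch. The function name and the 'up'/'down'/'no' labels are assumptions made for the example; the probe pair counts are those reported in the results (4.3.2), and their sum reproduces the 88% and 12% quoted in the text.

```python
def classify_pair(sense_direction, antisense_direction):
    """Classify a sense-antisense probe pair by its response to DHT.

    Each direction is 'up', 'down', or 'no' (no differential expression).
    Only pairs where both probes are DHT regulated are classified.
    """
    if "no" in (sense_direction, antisense_direction):
        return "not both regulated"
    if sense_direction == antisense_direction:
        return "coordinate"
    return "opposite"


# Probe pair counts reported in the results.
coordinate_pairs = 1356 + 577      # both increased + both decreased
opposite_pairs = 192 + 38 + 31     # RefSeq/non-RefSeq + RefSeq/RefSeq + non-RefSeq
total_pairs = coordinate_pairs + opposite_pairs

percent_coordinate = round(100 * coordinate_pairs / total_pairs)
percent_opposite = round(100 * opposite_pairs / total_pairs)
```

With these counts, 1,933 of 2,194 probe pairs are coordinately regulated and 261 are oppositely regulated.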
The true composition of the non-RefSeq transcripts is unclear from this analysis; many of these probes may be detecting transcripts with novel exons or UTRs for protein-coding transcripts or they may be detecting novel non-coding transcripts.  Integration with other high throughput profiling data such as strand-specific RNA-seq data or CAGE data may help to further classify these probes as detecting novel exons of RefSeq transcripts or as independent transcripts.

84% of the probes designed for the prostate expressed genomic regions identified in GenBank and the LNCaP and C4-2 RNA-seq data were expressed in all three replicates of at least one sample condition tested (Table 4.4).  The percentage of probes designed to detect prostate expressed transcripts is comparable to the 82% of Agilent 44K probes (included on the custom microarray) expressed in at least one sample condition tested.  A lower percentage of the novel probes was expected as, in some cases when the original direction of transcription was unknown, microarray probes were designed for both strands.  It is clear from this analysis, however, that transcripts can be expressed from genomic regions outside of those regions covered by RefSeq.  Some of these non-RefSeq transcripts were identified above to be regulated by androgens and anti-androgens.

4.4 Androgen Regulated Small RNA Detected in vitro by Next-Generation Sequencing

4.4.1 Methods

LNCaP cells were grown in CSS for 48 h and then treated for 48 h with 1 nM R1881 or an ethanol only vehicle control.  RNA was isolated from the R1881 and ethanol control samples using the miRVana RNA isolation kit (Ambion).  The RNA was prepared for sequencing using the Illumina Small RNA Sample Prep Kit.  The small RNA libraries were sequenced on the Illumina Genome Analyzer II at the Genome Sciences Centre (Vancouver, BC) as described (Morin et al. 2008).
The sequencing reads were aligned to the human genome (hg18) and analyzed using miRAnalyzer, an online small RNA analysis tool (Hackenberg et al. 2009).  1,773,749 reads were sequenced for the R1881 treated sample and 1,112,332 reads were sequenced for the ethanol control sample.

4.4.2 Results

71.1% of the total sequencing read count from the R1881 sample and 70.3% of the total read count from the ethanol control sample aligned to mature miRNA in miRBase.  12.9% of the total read count from the R1881 sample and 11.2% of the total read count from the ethanol control aligned to repeat regions featured in RepeatMasker.  The top 25 known miRNAs increased by R1881 and 3 miRNAs decreased by R1881 in LNCaP compared to ethanol are listed in Table 4.5.

For many known miRNAs, the transcriptional regulation of their pri-miRNAs and the regulation of how those pri-miRNAs are processed into mature miRNAs (Figure 1.5) are unknown.  Some miRNAs are encoded in introns of protein-coding and non-protein-coding transcripts (host transcripts) and may therefore have common transcriptional regulation.  The expression profiles from the LNCaP in vitro experiment (discussed in 4.3) were integrated with the small RNA sequencing data to identify potential coordinate regulation between a miRNA and its host transcript.  The host transcripts with potential coordinate regulation of a miRNA are highlighted in bold in Table 4.5.  Of the top 25 miRNAs increased by R1881, eight host transcripts were also increased by R1881.  Of the three miRNAs decreased by R1881, all three hosts were decreased by R1881.  The increase of hsa-miR-29a, hsa-miR-33a, and hsa-miR-148a by R1881 was further validated by qRT-PCR (Ambion: miRVana qRT-PCR miRNA Detection Kit) (Figure 4.15).  The genomic loci for hsa-miR-29a and hsa-miR-33a are further discussed in Chapter 5.  Five of the miRNAs increased by R1881 (i.e.
let-7g, miR-21, miR-29a, miR-29b, and miR-148a) were previously reported to be increased by R1881 in a miRNA microarray profiling experiment of LNCaP cells (Ribas et al. 2009).

The R1881 regulated miRNAs were compared to the AR ChIP-seq peaks detected in LNCaP cells 24 h after treatment with 1 nM R1881 (discussed in 4.2).  The median distance from an AR ChIP-seq peak to an R1881 regulated miRNA was not lower than the distance to other known miRNAs not regulated by R1881 (Wilcoxon Rank Test: p-value>0.05).  The three miRNAs with the closest AR ChIP-seq peaks are miR-28-3p (~3 kb), miR-33a (~12 kb), and miR-21 (~28 kb).

Table 4.5  R1881 Regulated miRNAs Detected by Next-Generation Sequencing
The miRNAs differentially regulated in LNCaP cells 48 h after treatment with 1 nM R1881 compared to an ethanol control are listed.  The miRNAs presented had a fold change (FC) greater than 1.5 and a minimum number of sequencing reads of 100 in either sample.  Raw fold change: R1881 reads (column 2) compared directly to the ethanol reads (column 3).  Normalized fold change: a scaling factor was applied to account for differences in sequencing library size prior to comparison.  miRNAs can be processed from host protein-coding transcripts, usually from their introns (column 6).
Bold: the host was coordinately regulated by R1881 in the LNCaP in vitro experiment.

miRNAs Increased by R1881

| miRNA | R1881 Reads | Ethanol Reads | Raw FC R1881 vs E | Normalized FC R1881 vs E | Host Transcript |
|---|---|---|---|---|---|
| let-7a | 198099 | 73224 | 2.71 | 1.70 | |
| let-7f | 141489 | 46592 | 3.04 | 1.90 | HUWE1 |
| let-7g | 16583 | 3973 | 4.17 | 2.62 | WDR82 |
| miR-29a | 14647 | 3063 | 4.78 | 3.00 | |
| miR-1308 | 7330 | 903 | 8.12 | 5.09 | PHEX |
| miR-375 | 6162 | 2367 | 2.60 | 1.63 | CCDC108 |
| miR-21 | 6080 | 1939 | 3.14 | 1.97 | TMEM49 |
| miR-148a | 5933 | 2007 | 2.96 | 1.85 | |
| miR-33a | 2347 | 784 | 2.99 | 1.88 | SREBF2 |
| miR-22 | 2024 | 797 | 2.54 | 1.59 | TLCD2 |
| miR-128 | 1713 | 629 | 2.72 | 1.71 | |
| miR-92a | 1439 | 431 | 3.34 | 2.09 | |
| miR-30a | 1182 | 406 | 2.91 | 1.83 | |
| miR-151-3p | 900 | 370 | 2.43 | 1.53 | |
| miR-340 | 865 | 234 | 3.70 | 2.32 | RNF130 |
| miR-29b | 546 | 125 | 4.37 | 2.74 | |
| miR-203 | 476 | 141 | 3.38 | 2.12 | ASPG |
| miR-98 | 427 | 98 | 4.36 | 2.74 | HUWE1 |
| miR-28-3p | 327 | 128 | 2.55 | 1.60 | LP |
| miR-9 | 251 | 98 | 2.56 | 1.61 | C1orf61 |
| miR-429 | 216 | 72 | 3.00 | 1.88 | TTLL10 |
| miR-190b | 212 | 84 | 2.52 | 1.58 | TPM3 |
| miR-100 | 152 | 23 | 6.61 | 4.11 | LOC399959 |
| miR-23b | 135 | 56 | 2.41 | 1.52 | C9orf3 |
| miR-660 | 134 | 53 | 2.53 | 1.58 | CLCN5 |

miRNAs Decreased by R1881

| miRNA | R1881 Reads | Ethanol Reads | Raw FC R1881 vs E | Normalized FC R1881 vs E | Host Transcript |
|---|---|---|---|---|---|
| miR-532-5p | 248 | 1059 | -4.27 | -6.81 | CLCN5 |
| miR-615-3p | 69 | 171 | -2.48 | -3.96 | HOXC4 |
| miR-615-5p | 74 | 171 | -2.31 | -3.69 | HOXC4 |

4.4.3 Discussion

Some of the R1881 regulated miRNAs identified above have previously been reported to be regulated in prostate cancer.  miR-128 has been described as a negative regulator of prostate cancer cell invasion where knock-down of miR-128 induced invasion in benign prostate epithelial cells while over-expression attenuated invasion (Khan et al. 2009).  miR-148a has been reported to be expressed at lower levels in AR negative CRPC cell lines (PC3, DU145) compared to the AR positive LNCaP cell line.  Increased expression of miR-148a decreased cell migration and invasion and increased sensitivity to the anti-mitotic chemotherapeutic agent, paclitaxel (an agent similar to docetaxel) (Fujita et al. 2010).  The removal of androgens by ADT may decrease the levels of the miRNAs increased by R1881 or increase the levels of miRNAs decreased by R1881.
If validated, these R1881 regulated miRNAs, such as miR-128 and miR-148a, may be contributing to cell invasion, metastasis, progression to CRPC, and resistance to chemotherapeutics.

Figure 4.15  qRT-PCR Validation for miRNAs Increased by R1881
RNA was isolated from LNCaP cells treated with 1 nM R1881 for 48 h and compared to an ethanol control.  qRT-PCR was performed using the miRVana qRT-PCR miRNA Detection Kit (Ambion) to measure differential expression for the following mature miRNAs: hsa-miR-29a (A), hsa-miR-33a (B), and hsa-miR-148a (C).  miRNA expression values were normalized to the snoRNA, U43.  A Welch two sample t-test was used to compare R1881 and ethanol control expression values (Error bars: standard error of the mean, n=3).

The mechanisms underlying the regulation of miRNAs are difficult to infer by high throughput profiling experiments.  Although miRNAs can be processed from the introns of host protein-coding and non-coding transcripts, they may also be processed from independent intronic transcripts.  Eleven of the miRNAs regulated by R1881 had host transcripts similarly regulated by R1881 in the LNCaP in vitro experiment including the host transcript-miRNA pair, SREBF2 and miR-33a (discussed below in 5.2.3).  Some of the R1881 regulated miRNAs are most likely processed from lncRNAs not currently represented in the RefSeq transcripts including miR-29a and miR-29b (discussed below in 5.2.2).  In such cases, it is especially difficult to predict direct transcriptional regulation of the miRNA as the full sequence of the host transcript (i.e. pri-miRNA) is unknown.  It is, therefore, not unexpected that AR ChIP-seq peaks are not found near R1881 regulated miRNAs.  It is also possible that (as described above) the synthetic androgen, R1881, could regulate a subset of miRNAs that would not be commonly regulated by the physiological androgen, DHT.
These R1881 regulated miRNAs should be validated with DHT and anti-androgen treatment to predict AR specific regulation.

Regulation of miRNA expression can also occur at the miRNA processing level as is clear with miR-660 (increased by R1881) and miR-532-5p (decreased by R1881).  These two miRNAs, oppositely regulated by R1881, are encoded in the same intron of CLCN5 along with six other miRNAs that were either not detected (miR-188, miR-500a, and miR-500b) or not differentially expressed by R1881 (miR-362, miR-501, and miR-502).  The CLCN5 transcript was decreased by R1881 in the LNCaP in vitro experiment.  The mechanism of regulation is unclear as all eight of these miRNAs are encoded in the same orientation as the 'host' transcript, CLCN5.

Analysis of small RNA high throughput profiling data is somewhat problematic for a number of reasons.  The assumption (usually made with profiling experiments of long transcripts) that the majority of the small RNAs are not changing between conditions can be incorrect if changes occur in the small RNA processing machinery.  If, for example, the expression of the protein components required for miRNA processing (illustrated in Figure 1.5) is altered between conditions, significant global changes in miRNA levels will occur.  It is difficult to determine if the difference in sequencing library size is due to technical or biological reasons.  It is for this reason that a raw fold change and a normalized fold change are presented in Table 4.5.

miRNAs are also short sequences (~22 nt) with families of miRNAs having one or two distinguishing nucleotides.  Next-generation sequencing is better able to distinguish these differences than competitive hybridization methods such as microarrays.  Incorrect aligning of the sequencing reads or cross-mapping can, however, introduce artefacts similar to the cross-hybridization artefacts seen in microarray data (de Hoon et al. 2010).
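The relationship between the raw and normalized fold changes in Table 4.5 can be illustrated with a short calculation. The sketch below assumes that the scaling factor is the ratio of the total read counts of the two libraries (1,773,749 and 1,112,332 reads, from 4.4.1) and that ratios below 1 are reported as negative fold changes. Both are assumptions, although they reproduce the tabulated values for let-7a and miR-532-5p; the function names are hypothetical.

```python
# Total sequenced reads per library, as reported in the methods (4.4.1).
R1881_TOTAL_READS = 1_773_749
ETHANOL_TOTAL_READS = 1_112_332


def signed_fold_change(ratio):
    """Express a ratio below 1 as a negative fold change (assumed convention)."""
    return ratio if ratio >= 1 else -1 / ratio


def fold_changes(r1881_reads, ethanol_reads):
    """Return (raw FC, normalized FC) for one miRNA, rounded to 2 decimals.

    The normalized value divides the raw ratio by the library size ratio,
    which is an assumed form of the scaling factor.
    """
    raw_ratio = r1881_reads / ethanol_reads
    library_scale = R1881_TOTAL_READS / ETHANOL_TOTAL_READS
    normalized_ratio = raw_ratio / library_scale
    return (
        round(signed_fold_change(raw_ratio), 2),
        round(signed_fold_change(normalized_ratio), 2),
    )
```

For let-7a (198,099 vs 73,224 reads) this gives a raw fold change of 2.71 and a normalized fold change of 1.70; for miR-532-5p (248 vs 1,059 reads) it gives -4.27 and -6.81. Because the R1881 library is roughly 1.6 times larger, normalization shrinks the apparent increases and enlarges the apparent decreases.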
miR-1308, increased by R1881 (Table 4.5), is an example of a miRNA that aligns to mature tRNAs.  Next-generation sequencing of small RNA is also able to detect novel small RNA which may be, like the non-RefSeq transcripts described above, tissue- and condition-specific.  The small RNA sequencing data set presented is currently being analyzed by others for novel R1881 regulated miRNAs and other small RNAs.  The LNCaP in vitro data is also being analyzed for potential mRNA targets of these R1881 regulated miRNAs considering the potential for alternative 3'UTRs.

4.5 Transcripts Regulated in Xenograft Tumors by Castration and During Progression to CRPC Detected by Microarray Profiling

4.5.1 Methods

LNCaP xenograft tumors were grown in athymic nude mice at two sites as modified from a previously reported method (Ettinger et al. 2004).  All animal work was conducted using the accepted standards of the University of British Columbia Committee on Animal Care.  PSA serum levels were measured weekly from blood samples collected from tail vein incisions.  As described in the previous method, animals were castrated when serum PSA was > 100 ng/ml or the tumor was approximately 1.5 cm in diameter (approximately 8 weeks after injection with LNCaP cells) (Ettinger et al. 2004).  The animals were sacrificed and tumors collected in the following manner: 6 tumors prior to castration, 10 tumors while PSA was regressing (~8 days), 10 tumors while PSA was at nadir or its lowest level (~16 days), 6 tumors while PSA was recurring (~21 days), and 6 later-stage castration-resistant tumors (~28 days).  PSA levels were used as an indicator of tumor volume.  The tumor samples used in this experiment are summarized in Figure 4.16.  RNA was isolated from the collected xenograft tumors and hybridized to the Agilent 4x180K custom version 3 microarray (discussed in Chapter 3) and was analyzed using the same protocols as used for the LNCaP in vitro experiment (discussed above in 4.3.1).
The probes were defined as expressed for a tumor type if the probes passed the 'isWellAboveBG' Agilent FE Extraction Software flag (discussed in 3.5) for all tumors in the tumor type.

Figure 4.16  Tumor Samples Profiled in the LNCaP Xenograft Experiment
LNCaP cells were injected into athymic nude mice and allowed to grow in an androgen dependent manner for ~8 weeks.  10 tumors were collected from intact mice prior to castration.  Following castration, tumors were collected in the following manner: 6 tumors while PSA was regressing, 10 tumors while PSA was at its lowest level (nadir), 6 tumors while PSA was recurring, and 6 later-stage castration resistant tumors.

4.5.2 Results

Summary of Differentially Expressed Probes

Each of the tumor types following castration was compared to the intact tumors to determine differential expression (1.5 fold change; FDR adjusted p-value<0.05).  The numbers of differentially expressed probes for each of the contrasts are summarized in Table 4.6.

One of the global patterns of expression detected in the LNCaP xenograft experiment was a disproportionate number of 3' UTR and coding exon probes differentially expressed.  Generally, the number of probes increased is proportionate to the number of probes decreased, as is shown in the summary table of androgen regulated probes (Table 4.2).  In the LNCaP xenograft experiment, a larger number of 3' UTR probes are decreased than increased following castration compared to intact tumors (Table 4.6 green).  The shift in 3' UTR probe expression is accompanied by a shift in coding exon probe expression where a larger number of coding exon probes are increased than decreased following castration (Table 4.6 red).  413 genes in the nadir compared to intact tumors had at least one probe decreased in the 3' UTR accompanied by at least one probe increased in an exon.
The top four enriched GO term clusters for these 413 genes were regulation of transcription (p-value: 4.7E-4), chromosome organization (p-value: 2.5E-4), transcription (p-value: 1.9E-4), and cell cycle (p-value: 3.6E-4).  Generally, the observed switch occurred between the 3' UTR and exon; however, HIF1A is an example where the switch occurred in the middle of the coding sequence after exon 9 (Figure 4.17).

Table 4.6  Differentially Expressed Probes in the LNCaP Xenograft Experiment
The number of probes differentially expressed (1.5 fold change; FDR adjusted p-value<0.05) for each tumor type following castration (i.e. regressing, nadir, recurring, and castration resistant; as illustrated in Figure 4.16) compared to the intact tumors is given in the columns.  The total number of probes on the 4x180K Agilent version 3 custom microarray is listed in the last column.  The microarray probes were classified using the method illustrated in Figure 3.1.  Red: highlighting the larger number of increased coding exon probes.  Green: highlighting the larger number of decreased 3' UTR probes.
All contrasts are versus intact tumors (CR: castration resistant).

| Probe class | Regressing up | Regressing down | Nadir up | Nadir down | Recurring up | Recurring down | CR up | CR down | v3 total |
|---|---|---|---|---|---|---|---|---|---|
| RefSeq Probes | | | | | | | | | |
| coding exon | 6949 | 2951 | 11705 | 6177 | 9339 | 4294 | 9331 | 3395 | 87591 |
| non-coding exon | 148 | 80 | 215 | 137 | 131 | 96 | 102 | 81 | 3763 |
| 5'UTR | 18 | 15 | 28 | 23 | 20 | 14 | 20 | 12 | 551 |
| 3'UTR | 871 | 1656 | 1325 | 3395 | 754 | 2318 | 530 | 2523 | 28581 |
| multiple RefSeq | 117 | 130 | 175 | 303 | 126 | 214 | 111 | 197 | 3022 |
| Non-RefSeq Probes | | | | | | | | | |
| exon antisense | 17 | 121 | 43 | 157 | 39 | 87 | 45 | 53 | 2520 |
| 5'UTR antisense | 0 | 3 | 1 | 4 | 1 | 4 | 1 | 3 | 50 |
| 3'UTR antisense | 24 | 134 | 115 | 137 | 108 | 87 | 108 | 75 | 3486 |
| intron sense | 659 | 308 | 1028 | 669 | 766 | 411 | 827 | 490 | 14022 |
| intron antisense | 87 | 118 | 111 | 166 | 84 | 162 | 121 | 140 | 6656 |
| promoter sense | 11 | 6 | 21 | 177 | 21 | 18 | 19 | 20 | 794 |
| promoter antisense | 31 | 26 | 72 | 65 | 33 | 40 | 43 | 62 | 1531 |
| 3'extended sense | 85 | 106 | 126 | 231 | 76 | 180 | 84 | 182 | 2534 |
| 3'extended antisense | 13 | 29 | 24 | 38 | 11 | 22 | 13 | 29 | 820 |
| multiple non-RefSeq | 210 | 381 | 428 | 636 | 280 | 399 | 329 | 318 | 11636 |
| intergenic | 179 | 207 | 272 | 291 | 131 | 268 | 163 | 293 | 7724 |

Figure 4.17  HIF1A Exon and 3' UTR Expression in the LNCaP Xenograft Experiment
The HIF1A genomic locus is illustrated (modified from the UCSC Genome Browser). The HIF1A RefSeq transcript (blue) was differentially expressed in each tumor type following castration (i.e. regressing, nadir, recurring, and castration resistant; CR) compared to intact. Differential expression was defined as 1.5 fold and FDR adjusted p-value<0.05. The microarray probes targeting exons 2 through 8 are increased (red) following castration while the probes targeting exon 9 through to the 3' UTR are decreased (green) following castration.

Data Integration with LNCaP in vitro and AR ChIP-Seq Experiments

The microarray profiles for the LNCaP xenograft experiment were compared to the profiles for the LNCaP in vitro experiment. Four predominant patterns of expression were detected for the probes that were differentially regulated in vitro by DHT compared to ethanol (Figure 4.18).
Of the probes increased by DHT and decreased following castration in the regressing or nadir tumors, 50% were increased in castration resistant tumors (Figure 4.18 red). The other 50% of the probes increased by DHT and decreased following castration remained significantly lower in the castration resistant samples compared to intact (Figure 4.18 yellow). Of the probes decreased by DHT and increased following castration, 55% were decreased in castration resistant tumors (Figure 4.18 green), while the remaining 45% remained significantly higher in the castration resistant samples compared to intact (Figure 4.18 blue). The probes were further classified as detecting transcripts that are antagonized in vitro by an anti-androgen (i.e. bicalutamide or MDV3100) or not antagonized.

The distances from an AR ChIP-seq peak detected in LNCaP cells 24 h after treatment with 1 nM R1881 (discussed in 4.2) to the TSS of a DHT regulated RefSeq transcript were compared to the distances to the TSS of RefSeq transcripts not regulated in vitro by DHT or in vivo by castration. The median distance was lower for all the RefSeq transcripts increased by DHT and decreased following castration (Figure 4.18 red and yellow), no matter the expression level in castration resistant tumors or whether the transcripts were found to be antagonized in vitro by either anti-androgen (i.e. bicalutamide and MDV3100) (Wilcoxon Rank Test: FDR adjusted p-value<0.001). The median distance was not lower for the RefSeq transcripts decreased by DHT and increased following castration (Figure 4.18 green and blue), no matter the expression level in castration resistant tumors or whether the transcripts were found to be antagonized in vitro by either anti-androgen. The distances from the AR ChIP-seq peaks to the DHT regulated non-RefSeq probes were also compared and the results were as above.
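The distance comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the analysis actually run: the peak positions and TSS coordinates are invented, a single chromosome is assumed, each peak is reduced to one coordinate, and the helper `nearest_peak_distance` is hypothetical. The sketch stops at the group medians; the rank test and FDR adjustment reported in the text would be applied to the two distance lists with a statistics package.

```python
from bisect import bisect_left
from statistics import median


def nearest_peak_distance(tss, sorted_peaks):
    """Absolute distance (bp) from a TSS to the closest peak position.

    Assumes sorted_peaks is a non-empty, ascending list of peak coordinates
    on the same chromosome as the TSS.
    """
    i = bisect_left(sorted_peaks, tss)
    # Only the peaks immediately left and right of the TSS can be the nearest.
    candidates = sorted_peaks[max(i - 1, 0):i + 1]
    return min(abs(tss - p) for p in candidates)


# Invented coordinates for illustration only.
peaks = [1000, 5000, 20000]
regulated_tss = [1200, 4500, 19000]      # e.g. increased by DHT
unregulated_tss = [9000, 12000, 30000]   # e.g. not regulated

regulated_dist = [nearest_peak_distance(t, peaks) for t in regulated_tss]
unregulated_dist = [nearest_peak_distance(t, peaks) for t in unregulated_tss]

print(median(regulated_dist), median(unregulated_dist))
```

In this toy input the regulated group sits closer to the peaks (median 500 bp) than the unregulated group (median 7000 bp), which is the direction of effect described for the transcripts increased by DHT.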
The median distances for probes increased by DHT and decreased following castration (Figure 4.18 red and yellow) were lower than for the probes not regulated in vitro by DHT or in vivo by castration. The median distances for the probes decreased by DHT and increased following castration (Figure 4.18 green and blue) were not lower.

The full-length AR mRNA was one of the RefSeq transcripts decreased by DHT (1.7 fold) and increased in all tumor types following castration (an average of 3 fold for regressing, nadir, recurring, and castration resistant samples). The alternative 3' UTR used by the ligand-independent and constitutively active AR splice variant (discussed in 1.3.3), described as AR-V7 (Hu et al. 2009) and AR3 (Guo et al. 2009), was also increased 2.2 fold in castration resistant tumors compared to intact tumors. This is consistent with the previous reports of increased levels of AR-V7 in CRPC compared to treatment naive tumors (Hu et al. 2009).

Functional over-representation analysis of the Entrez Genes (using the coding exon probes) represented in the four patterns of expression illustrated in Figure 4.18 was performed. The top three enriched GO term clusters for the 529 genes increased by DHT, decreased following castration, and increased in castration resistant tumors (Figure 4.18 red) were DNA replication (p-value: 4.9E-3), DNA metabolic process (p-value: 9.8E-3), and lipid biosynthetic process (p-value: 1.4E-2). There was no over-representation of GO terms in the 409 genes (coding exons) increased by DHT and decreased both following castration and in castration resistant tumors (Figure 4.18 yellow). The top three enriched GO term clusters for the 650 genes decreased by DHT, increased following castration, and decreased in castration resistant tumors (Figure 4.18 green) were protein amino acid phosphorylation (p-value: 4.6E-5), cell adhesion (p-value: 3.2E-3), and cell projection organization (p-value: 1.1E-5).
The top three enriched GO term clusters for the 635 genes decreased by DHT, increased following castration, and increased in castration resistant tumors (Figure 4.18 blue) were chromatin modification (p-value: 5.0E-12); cell adhesion (p-value: 4.2E-4); and negative regulation of transcription, DNA-dependent (p-value: 5.7E-3).

Figure 4.18  Patterns of Expression for the DHT Regulated Probes in the LNCaP Xenograft Experiment
The four major patterns of expression are shown for the probes differentially expressed (RefSeq probes: 10,413; non-RefSeq probes: 2,225) by DHT compared to ethanol in the LNCaP in vitro experiment and in the LNCaP in vivo experiment (illustrated in Figure 4.16). Differential expression was plotted on the y-axis (1: increased; -1: decreased) and was defined as 1.5 fold change and an FDR adjusted p-value<0.05. Red: 15% of the probes were increased in vitro by DHT, decreased after castration, and then increased in castration resistant tumors. Yellow: 15% of the probes were increased in vitro by DHT and decreased after castration and in castration resistant tumors. Green: 24% of the probes were decreased in vitro by DHT, increased after castration, and then decreased in castration resistant tumors. Blue: 20% of the probes were decreased in vitro by DHT and increased after castration and in castration resistant tumors. Bracketed values: [% antagonized in vitro by either bicalutamide or MDV3100, % not antagonized by an anti-androgen]. Patterns not shown: 6% of the probes were increased by DHT and castration; 7% of the probes were decreased by DHT and castration; the remaining probes were not differentially expressed in regressing or nadir tumors compared to intact tumors.

Probes Designed for Prostate Expressed Transcripts

The numbers of differentially expressed novel probes designed to detect prostate expressed transcripts identified in Chapter 2 are summarized in Table 4.7.
Of the probes designed for the regions identified as prostate-specific in the Illumina Body Map 2.0 RNA-seq data, 13% detected differential expression in the LNCaP xenograft experiment (illustrated in Figure 4.16) and 42% detected transcript expression. The prostate specific lncRNAs, PCA3 and PCGEM1, were both differentially expressed following castration: PCA3 was decreased 1.5 fold at nadir while PCGEM1 was increased 3.6 fold at nadir compared to intact. Of the probes designed for the regions identified as expressed in the prostate from either GenBank or RNA-seq of prostate cancer cell lines (selected predominantly for regions of the genome not encoding RefSeq transcripts), 15% detected differential expression and 69% detected transcript expression. Of the Agilent 44K probes, 22% (RefSeq probes: 4,904; non-RefSeq probes: 1,504; 4,188 Entrez Genes) detected differential expression and 75% detected transcript expression.

Table 4.7  Probes Designed to Detect Prostate Expressed Transcripts in the LNCaP Xenograft Experiment
Probes targeting the Illumina Body Map 2.0 prostate specific transcripts and the other prostate expressed transcripts (i.e. GenBank, LNCaP and C4-2 RNA-seq) (discussed in 3.2.3) are presented. DE: the probes detecting differential expression in regressing, nadir, recurring, or castration resistant tumors compared to intact tumors. Exp: the probes detecting transcript expression in all tumors in at least one of the five tumor types. Differential expression was defined as 1.5 fold change and an FDR adjusted p-value<0.05. %: percentage of the total on the Agilent 4x180K custom microarray (version 3).
Body Map: Illumina Body Map 2.0 Prostate Specific. Other: Other Prostate Expressed Transcripts.

| Probe class | Body Map DE | Body Map Exp | Body Map % DE | Body Map % Exp | Other DE | Other Exp | Other % DE | Other % Exp |
|---|---|---|---|---|---|---|---|---|
| RefSeq Probes | | | | | | | | |
| coding exon | 108 | 361 | 22.4 | 74.9 | 15 | 47 | 30.0 | 94.0 |
| non-coding exon | 8 | 20 | 22.2 | 55.6 | 12 | 58 | 17.4 | 84.1 |
| 5'UTR | 1 | 11 | 4.8 | 52.4 | 10 | 27 | 27.8 | 75.0 |
| 3'UTR | 62 | 232 | 21.5 | 80.3 | 33 | 108 | 27.7 | 90.8 |
| multiple RefSeq | 8 | 20 | 22.9 | 57.1 | 6 | 24 | 24.0 | 96.0 |
| Non-RefSeq Probes | | | | | | | | |
| exon antisense | 17 | 55 | 17.9 | 57.9 | 243 | 1364 | 12.2 | 68.5 |
| 5'UTR antisense | 1 | 1 | 50.0 | 50.0 | 4 | 18 | 16.7 | 75.0 |
| 3'UTR antisense | 7 | 27 | 11.1 | 42.9 | 351 | 1544 | 12.3 | 54.1 |
| intron sense | 264 | 839 | 15.4 | 49.0 | 760 | 2848 | 20.9 | 78.5 |
| intron antisense | 162 | 438 | 8.1 | 21.8 | 106 | 349 | 10.9 | 36.0 |
| promoter sense | 2 | 26 | 3.4 | 44.1 | 33 | 123 | 15.4 | 57.5 |
| promoter antisense | 8 | 17 | 17.8 | 37.8 | 34 | 145 | 18.6 | 79.2 |
| 3'extended sense | 17 | 67 | 11.3 | 44.4 | 113 | 427 | 22.6 | 85.6 |
| 3'extended antisense | 11 | 14 | 12.2 | 15.6 | 26 | 107 | 10.9 | 45.0 |
| multiple non-RefSeq | 102 | 481 | 12.0 | 56.4 | 665 | 3905 | 12.9 | 75.8 |
| intergenic | 89 | 205 | 11.6 | 26.7 | 46 | 161 | 26.3 | 92.0 |
| total | 867 | 2814 | 12.9 | 41.9 | 2457 | 11255 | 15.1 | 69.4 |

4.5.3 Discussion

The subset of probes identified as DHT regulated in the LNCaP in vitro experiment were integrated with the probes differentially expressed in the LNCaP xenograft microarray experiment (summarized in Table 4.6). 74% of the probes in the integrated subset were oppositely regulated by the addition of androgen (i.e. DHT) in vitro and the removal of androgens in vivo by castration. The probes oppositely regulated by DHT and castration were divided somewhat evenly between those probes that had apparent recurring androgen regulation in the castration resistant tumors (Figure 4.18 red and green) and those probes that had persistent change following castration (Figure 4.18 yellow and blue). It is clear from this analysis that in the LNCaP xenograft model, the subset of transcripts regulated by androgens is different for treatment naive tumors (i.e. intact) compared to the castration resistant tumors.
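The integration summarized above amounts to assigning each probe to a pattern from three direction calls. The following Python sketch shows one way to express that bookkeeping; it is illustrative only. The probe identifiers and calls are invented, the `classify` helper is hypothetical, and the third call is taken as a given input (the direction assigned to castration resistant tumors in Figure 4.18) because the exact contrast used to derive it is not restated here.

```python
from collections import Counter

# Colour labels follow Figure 4.18. Keys are
# (call in vitro with DHT, call after castration, call in castration resistant tumors),
# each +1 (increased) or -1 (decreased).
PATTERNS = {
    (1, -1, 1): "red",      # recurring androgen regulation
    (1, -1, -1): "yellow",  # persistent decrease
    (-1, 1, -1): "green",   # recurring androgen regulation
    (-1, 1, 1): "blue",     # persistent increase
}


def classify(dht, post_castration, castration_resistant):
    """Return the Figure 4.18 colour label, or 'other' for any other combination."""
    return PATTERNS.get((dht, post_castration, castration_resistant), "other")


# Invented probe calls for illustration only.
calls = {
    "probe_1": (1, -1, 1),
    "probe_2": (1, -1, -1),
    "probe_3": (-1, 1, 1),
    "probe_4": (-1, 1, -1),
    "probe_5": (1, 1, 0),
}

counts = Counter(classify(*c) for c in calls.values())
for label in ("red", "yellow", "green", "blue", "other"):
    print(label, counts[label])
```

Grouping red with green and yellow with blue then gives the split between recurring regulation and persistent change discussed in the text.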
It is possible that more persistent chromatin modifications may occur following castration that alter the ability of AR to regulate those transcripts. The integration of the in vitro and in vivo experiments is somewhat incomplete as different versions of the custom microarray were used (in vitro: version 2; in vivo: version 3; differences discussed in 3.4). It is apparent, however, that some of the transcripts regulated by androgens in an acute manner in vitro are not detected in the in vivo model of persistent removal of testicular androgens and vice versa. Some of the transcripts differentially regulated in the in vivo model may be regulated by other mechanisms such as cellular stress response. The LNCaP xenograft dataset presented is a valuable resource to investigate other mechanisms of transcript regulation outside of androgen regulation; however, such questions are beyond the scope of this thesis.

The pattern of differential expression between 3' UTRs and coding exons was detected in the in vivo model but not in the in vitro model. The mechanism underlying the use of alternative 3' UTRs following castration is unclear and warrants further investigation. It is possible that alternative polyadenylation sites may be used to generate shorter 3' UTRs. As previously described in other cell lines (Sandberg et al. 2008), shorter 3' UTRs may be a mechanism for the cell to evade miRNA-mediated translational repression and thereby result in increased protein levels. Shorter 3' UTRs may also alter RNA-protein interactions. From these data, it is unclear whether shorter 3' UTRs were used by the identified transcripts or if alternative 3' UTRs were used that would not have been detected using the custom microarray.

The detection of the alternative 3' UTRs does, however, highlight a number of advantages to the technological and analytical approach presented.
Conventional microarrays with one 3' UTR probe per RefSeq transcript would have led to the inference that protein levels for the detected transcript decreased following castration, whereas it is possible that the protein levels may in fact be increased. The differential expression between 3' UTRs and coding exons would also have been missed on exon arrays or arrays with more probes per RefSeq transcript using the conventional practice of averaging the fold change across the RefSeq transcripts. By averaging the probes across a transcript, examples such as HIF1A (Figure 4.17) would not have been detected. RNA-seq data for these tumors may help to further characterize the alternative transcripts expressed following castration; but, as above, averaging the sequencing reads for a RefSeq transcript would mask the biological effect.

69% of the probes designed for the prostate expressed transcripts identified in GenBank and the LNCaP and C4-2 RNA-seq detected transcript expression in all tumors of at least one of the 5 tumor types present. This is a lower percentage than the 84% seen in the LNCaP in vitro experiment (Table 4.4) and a lower percentage than the 75% of Agilent 44K probes (included on the custom microarray) expressed in the xenograft experiment (Table 4.7). 42% of the probes designed for the Illumina Body Map 2.0 prostate specific transcripts detected expression in the xenograft experiment. As described above, a lower percentage of the probes designed to detect prostate expressed transcripts was expected to detect expression because, in some cases where the original direction of transcription was unknown, microarray probes were designed for both strands. The percentage of probes detecting expression of the transcripts identified in GenBank and the LNCaP and C4-2 RNA-seq data was higher than that of the probes designed for the Illumina Body Map 2.0 prostate specific transcripts.
This difference in percentage may be because a higher expression threshold was set for the LNCaP and C4-2 RNA-seq data than for the Illumina Body Map 2.0 RNA-seq data. 867 Illumina prostate specific probes and 2457 probes from GenBank and LNCaP and C4-2 RNA-seq data detected differential expression following castration which would not have been detected using conventional microarray profiling.

4.6 Summary

The custom 4x180K Agilent microarray described in Chapter 3 was used to detect androgen and anti-androgen regulated transcripts in an in vitro setting and transcripts regulated by castration and during progression to CRPC in an in vivo model. An AR ChIP experiment also presented in this chapter detected 6,630 potential AR binding sites. The AR binding sites were compared to the androgen and anti-androgen regulated transcripts to predict direct transcriptional targets of AR. AR binding sites were generally closer to the transcripts increased by androgen than the transcripts decreased or not regulated by androgens. Androgen regulated small RNAs were also identified in the analysis of next-generation sequencing data. Two miRNAs increased by androgen, miR-29 and miR-33, are discussed in Chapter 5.

The expression of the androgen and anti-androgen regulated transcripts identified in the in vitro experiment was examined in the LNCaP xenograft experiment in order to determine regulation following castration and during progression to CRPC. Of the transcripts increased by androgens in vitro and decreased following castration, approximately half were increased again in CRPC while the other half remained decreased in CRPC compared to the intact (i.e. pre-castrate) tumors. The same pattern was observed for the transcripts decreased in vitro by androgens and increased following castration, where approximately half of the transcripts were decreased again in CRPC while the other half remained increased in CRPC compared to intact tumors.
These four patterns of expression suggest that the androgen regulated transcriptional program in CRPC is different from that seen in hormone naive tumors. The removal of androgens may be effecting a permanent change in transcription, potentially through chromatin modification, that could promote progression to CRPC. These androgen regulated transcripts, especially those transcripts decreased by androgens and increased following castration, should be prioritized for further characterization as they may provide novel therapeutic targets for combination therapy with ADT to prevent or slow the progression to CRPC. Specific examples of androgen regulated transcripts and genomic loci are discussed in Chapter 5.

Several unexpected findings were revealed in this integrative analysis, including a cell cycle related difference between R1881 and DHT, transcripts increased by MDV3100 and not bicalutamide, and the potential use of alternative 3' UTRs following castration in the in vivo model. These unexpected findings warrant further investigation.

Chapter 5 Examples of Androgen Regulated Transcripts and Genomic Loci

5.1 Introduction

The integrative analysis presented in Chapter 4 revealed many examples of complex transcription beyond the former dogma of one gene per genomic region. Analysis of genomic context revealed potential modes of transcript regulation that might have been missed if the downstream analysis considered expressed transcripts independently. Androgen and anti-androgen regulated transcripts were identified that were transcribed from complex interleaved and overlapping genomic regions. Some of this complexity is represented in current RNA reference databases such as RefSeq, with examples including the oppositely androgen regulated sense-antisense RefSeq transcripts SKIV2L2 and PPAP2A, on chromosome 5, and TBRG1 and SIAE, on chromosome 11 (illustrated above in Figure 4.14).
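A screen for oppositely regulated sense-antisense pairs of the kind mentioned above can be sketched in Python as a simple interval-overlap test. This is a hypothetical illustration, not the procedure used in the thesis: the transcript names, coordinates, and calls are invented, and the `opposite_pairs` helper and the requirement that the two transcripts overlap by at least one base are assumptions.

```python
from itertools import combinations

# Invented records: (name, chromosome, start, end, strand, call),
# where call is +1 (increased by androgen) or -1 (decreased).
transcripts = [
    ("TX_1", "chr5", 100, 500, "+", 1),
    ("TX_2", "chr5", 400, 900, "-", -1),
    ("TX_3", "chr5", 2000, 2500, "-", -1),
    ("TX_4", "chr11", 100, 500, "+", 1),
    ("TX_5", "chr11", 450, 800, "-", 1),
]


def opposite_pairs(txs):
    """Pairs that overlap on opposite strands and have opposite calls."""
    pairs = []
    for a, b in combinations(txs, 2):
        same_chrom = a[1] == b[1]
        overlap = a[2] <= b[3] and b[2] <= a[3]
        opposite_strand = a[4] != b[4]
        opposite_call = a[5] * b[5] == -1
        if same_chrom and overlap and opposite_strand and opposite_call:
            pairs.append((a[0], b[0]))
    return pairs


print(opposite_pairs(transcripts))
```

In the toy input only TX_1 and TX_2 qualify; TX_4 and TX_5 overlap on opposite strands but change in the same direction.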
It is possible in such cases that transcription from one direction impairs the transcription or transcript stability of the transcript encoded in the other direction.

Some of the complexity identified in the integrative analysis above is not, however, represented in the RefSeq databases. Presented below are three snapshots of this genomic complexity: a prostate expressed transcript oppositely regulated by androgens to the CTBP1 protein-coding transcript; an androgen regulated genomic locus containing androgen regulated miRNAs, miR-29a and miR-29b; and coordinate expression of miR-33a and its host transcript, SREBF2. These androgen regulated transcripts and genomic loci should be prioritized for further functional characterization (Objective 4).

5.2 Androgen Regulated Transcripts and Genomic Loci

5.2.1 CTBP1 Sense-Antisense Transcripts

C-terminal binding protein 1 (CTBP1) was so named because it was first described as binding to the COOH-terminal end of the adenovirus protein E1a (Boyd et al. 1993). Subsequent studies have described CTBP1 as an antagonist of both the epithelial phenotype and anoikis (i.e. programmed cell death following cell detachment from the extracellular matrix) (Grooteclaes and Frisch 2000). CTBP1 can form a complex with other proteins that together have histone acetyltransferase (HAT) and histone deacetylase (HDAC) properties. The CTBP1 complex functions mainly through methylation of H3K9 (Shi et al. 2003). CTBP1 is generally considered a transcriptional repressor and has been described to repress the transcription of p21, BAX, NOXA, and PERP (Grooteclaes et al. 2003). Knockdown of CTBP1 decreases methylation of the E-cadherin promoter (Shi et al. 2003). E-cadherin can be repressed through a CTBP1 and ZEB1 interaction (Grooteclaes and Frisch 2000; Grooteclaes et al. 2003). E-cadherin is an epithelial adhesion protein and its loss has been implicated in epithelial-to-mesenchymal transition (EMT) and metastasis (Onder et al. 2008).
CTBP1, though generally described as a transcriptional repressor, can function as a transcriptional activator. CTBP1 can bind to the promoter of MDR1/ABCB1 to activate its transcription (Jin et al. 2007). MDR1/ABCB1 is an efflux pump protein involved in multi-drug resistance.

The androgen and anti-androgen regulation of the CTBP1 genomic locus is illustrated in Figure 5.1. The CTBP1 transcript (Figure 5.1 dark blue) was decreased an average of 4 fold by DHT and R1881 in the LNCaP in vitro experiment. This finding is in agreement with a previous microarray experiment in LNCaP cells where the CTBP1 transcript was decreased 24 h after treatment with 1 nM R1881 (Nelson et al. 2002). The CTBP1 transcript was increased after treatment with an anti-androgen in the presence of DHT compared to DHT alone (bicalutamide: 2.9 fold; MDV3100: 4.1 fold).

Novel probes were included on the 4x180K custom microarray to detect a prostate expressed transcript, identified in GenBank, that is encoded antisense to the 3' end of CTBP1. The antisense transcript, AK092548 (Figure 5.1 purple), was sequenced from prostate tissue and the sequence was submitted to GenBank in 2004 as part of a large scale sequencing project (Ota et al. 2004). Contrary to the androgen regulation observed for the CTBP1 transcript, the antisense transcript (CTBP1as) was increased 4.28 fold by DHT. CTBP1as was decreased after treatment with an anti-androgen (bicalutamide: 1.9 fold; MDV3100: 3.3 fold). An Agilent 44K probe, A_32_P118655, closer to the 5' end of the CTBP1 transcript was regulated similarly to the probes detecting AK092548, suggesting that the detected CTBP1as transcript may be longer than the GenBank sequence AK092548 (Figure 5.1 *).

The AR ChIP-seq peaks from the AR ChIP-seq experiment described in 4.2 (Vancouver Prostate Centre; VPC) as well as the other genome wide AR ChIP experiments (Wang et al. 2009; Yu et al.
2010) were examined at the CTBP1 genomic locus (Figure 5.1 brown). Although the Wang et al. and Yu et al. AR ChIP experiments revealed potential AR binding at different regions within the CTBP1 locus, all three AR ChIP experiments detected AR binding overlapping the 3' UTR of the CTBP1 protein-coding transcript. It is possible that AR binding in the CTBP1 3' UTR may promote the transcription of CTBP1as and therefore indirectly regulate the transcript levels of the CTBP1 protein-coding transcript.

CTBP1 and CTBP1as transcripts, although expressed, were not differentially expressed following castration and during progression to CRPC in the LNCaP xenograft experiment. The expression of these transcripts may be an acute response to androgen modulation that would not be detected in the in vivo model of persistent removal of testicular androgens. Modulation of CTBP1 may be occurring prior to the earliest time point (i.e. regressing tumors; 7 days) in the LNCaP xenograft experiment. Early modulation of the CTBP1 protein levels following castration could potentially lead to persistent chromatin modifications that may alter the progression to CRPC.

Figure 5.1  Androgen Regulation of Sense-Antisense Transcripts in the CTBP1 Genomic Locus
The CTBP1 genomic locus is illustrated (modified from UCSC Genome Browser). The AR binding sites detected in the AR ChIP-seq experiment (described in 4.2; Vancouver Prostate Centre; VPC), and other genome wide AR ChIP experiments (Wang et al. 2009; Yu et al. 2010) are colored brown. Differential expression was defined as 1.5 fold change and an FDR adjusted p-value<0.05. The illustrated probes detected the CTBP1 RefSeq transcripts (dark blue) increased (red) and decreased (green) in the indicated sample comparisons in the LNCaP in vitro experiment (E: ethanol; B: bicalutamide; MDV: MDV3100).
The illustrated probes detected the CTBP1as transcript (AK092548; purple) increased (yellow) and decreased (blue) in the indicated sample comparisons. The position of the CTBP1as siRNA is indicated (pink).

The expression profiles in the LNCaP in vitro experiment for the CTBP1 genomic locus were further validated using the same treatment conditions (Figure 5.2). In a Western blot (CTBP1 antibody: Abcam ab79417), the CTBP1 protein was detected in the ethanol control sample but not in the LNCaP cells treated with 10 nM DHT (Figure 5.2 A). The decrease of the CTBP1 transcript with DHT compared to the ethanol control and the converse increase of the CTBP1as transcript were confirmed by qRT-PCR (Figure 5.2 B). The inhibition of the CTBP1as transcript levels using an siRNA (3'-GAGUGGAGCGGCAGAGAAAUU-5'; Dharmacon; Figure 5.1 pink) increased CTBP1 protein levels in preliminary experiments. CTBP1 protein levels were increased 48 h after treatment with 1 nM R1881 and the CTBP1as siRNA compared to the R1881 and scramble siRNA (Western blot; Figure 5.2 C). The results of these preliminary experiments suggest that CTBP1as transcript levels may directly inhibit the levels of the CTBP1 transcript and ultimately CTBP1 protein levels.

Figure 5.2  Laboratory Validation of CTBP1 and the CTBP1 Antisense Transcripts
A. Western blot using a CTBP1 antibody (ab79417) on RIPA lysates of LNCaP cells treated with 10 nM DHT for 48 h compared to an ethanol control (EtOH). A Western blot of a loading control, Vinculin, is also shown. B. qRT-PCR results for the CTBP1 protein-coding transcript (sense) and the CTBP1 antisense transcript (AK092548) in LNCaP cells treated with 10 nM DHT for 48 h compared to an ethanol control. The results were normalized to GAPDH transcript levels (Error bars: standard deviation; n=3). C.
Western blot using a CTBP1 antibody on RIPA lysates of LNCaP cells treated with R1881 (1 nM), R1881 and a scramble siRNA, and R1881 and a CTBP1 antisense (CTBP1as) siRNA for 48 h compared to an ethanol control. The blot was normalized for GAPDH expression.

5.2.2 miRNA-29a/29b Genomic Locus

Altered expression of the miR-29 family of miRNAs has previously been described in some types of cancers. miR-29a was reported to reduce invasion and cell proliferation in an in vitro model of invasive lung cancer (Muniyappa et al. 2009). The miR-29 family has also been shown to inhibit protein translation of DNA methyltransferases, DNMT3A and DNMT3B, leading to aberrant methylation in lung cancer (Fabbri et al. 2007) and in acute myeloid leukemia (Garzon et al. 2009). Most miRNAs function in the cytoplasm to inhibit mRNA translation. miR-29b, however, is one of the few miRNAs reported to be localized predominantly in the nucleus (Hwang et al. 2007). The functional role of nuclear miR-29b remains unknown.

The androgen regulation of the miR-29a/29b genomic locus is illustrated in Figure 5.3. miR-29a and miR-29b were increased (3 fold and 2.7 fold, respectively) 48 h after treatment with 1 nM R1881 compared to an ethanol control in the LNCaP small RNA next-generation sequencing experiment (discussed in 4.4). The androgen regulation of miR-29a was validated by qRT-PCR (3.22 fold; Figure 4.15). This finding is in agreement with a previous miRNA profiling experiment where miR-29a and miR-29b were also increased by 1 nM R1881 in LNCaP cells (Ribas et al. 2009).

The pri-miRNA or host transcript on chromosome 7 for miR-29a and miR-29b remains uncharacterized, but these miRNAs are presumably processed from the intron of an lncRNA. miR-29a/29b are encoded on the negative strand of the chromosome downstream of two RefSeq transcripts encoding hypothetical lncRNAs, LOC646329 and FLJ43663 (Figure 5.3 dark blue).
Non-RefSeq and RefSeq probes on the custom microarray detected increased expression of this genomic locus (on the negative strand) 48 h after treatment with 10 nM DHT compared to ethanol in the LNCaP in vitro experiment (probe average: 2.0 fold). The probes in this genomic region were decreased with an anti-androgen in the presence of DHT compared to DHT alone (bicalutamide: 2.5 fold; MDV3100: 2.3 fold).

The AR ChIP-seq peaks from the AR ChIP-seq experiment described in 4.2 (Vancouver Prostate Centre; VPC) as well as the other genome wide AR ChIP experiments compared above (Wang et al. 2009; Yu et al. 2010) were examined at the miR-29a/29b genomic locus (Figure 5.3 brown). Although the Wang et al. and Yu et al. AR ChIP experiments revealed potential AR binding at different regions within the miR-29a/29b locus, all three AR ChIP experiments detected two AR binding sites in common. One common AR binding site was upstream of the RefSeq transcript LOC646329 and the other common AR binding site was upstream of FLJ43663.

Five of the non-RefSeq probes in this region that were increased by androgen and decreased by anti-androgens were modestly decreased following castration and remained decreased in CRPC compared to intact (>1.5 fold). Probes for coding exon regions of DNMT3A and DNMT3B, reported targets of miR-29a/29b (Fabbri et al. 2007; Garzon et al. 2009), are increased following castration and remain increased in CRPC compared to intact (>2 fold). It is possible that miR-29a and miR-29b are decreased in tumors following castration and that the loss of these miRNAs may promote, as described in other cancer types (Fabbri et al. 2007; Garzon et al. 2009; Muniyappa et al. 2009), increased invasion, cell proliferation, and aberrant DNA methylation in CRPC.

The data presented in this thesis were also integrated with data from a RIP-chip experiment (RNA immunoprecipitation) designed to detect interactions of lncRNAs (i.e.
long intergenic RNAs; lincRNA) with protein components of the polycomb repressive complex 2 (PRC2; proteins EZH2 and SUZ12) in other cell types (Khalil et al. 2009). The androgen regulated genomic region surrounding miR-29a and miR-29b overlapped with a PRC2 interacting lincRNA identified by Khalil et al. (Figure 5.3 purple). It is possible that the pri-miRNA of miR-29a and miR-29b or adjacent transcripts might provide a scaffold for the assembly of histone modification complexes, as is seen with the lncRNA HOTAIR (Tsai et al. 2010). This may be a fairly common feature of pri-miRNAs, as 83 known human miRNAs in miRBase overlapped with the PRC2 interacting lincRNAs identified by Khalil et al. In addition to miR-29a and miR-29b, three other miRNAs increased by R1881 in the small RNA next-generation sequencing experiment overlapped with the PRC2 interacting lincRNAs: miR-30a, miR-148a, and miR-100. It is intriguing to postulate that a miRNA and its host lncRNA transcript may cooperate to repress large networks of transcription: the lncRNA may interact with the PRC2 to inhibit transcription at the DNA level while the miRNA may interact with already transcribed mRNAs to prevent protein translation.

Figure 5.3  Androgen Regulation of the miR-29a/29b Genomic Locus
The miR-29a/29b genomic locus is illustrated (modified from UCSC Genome Browser). The AR binding sites detected in the AR ChIP-seq experiment (described in 4.2; Vancouver Prostate Centre; VPC), and other genome wide AR ChIP experiments (Wang et al. 2009; Yu et al. 2010) are colored brown. Differential expression was defined as 1.5 fold change and an FDR adjusted p-value<0.05. RefSeq probes increased (red) and decreased (green) in the indicated sample comparison in the LNCaP in vitro experiment (E: ethanol; B: bicalutamide; MDV: MDV3100) and the LNCaP xenograft experiment (i.e. castration resistant tumors vs intact tumors).
Non-RefSeq probes increased (yellow) and decreased (blue) in the indicated sample comparisons. All probes detected transcripts encoded on the negative strand (right to left). miR-29a and miR-29b are also encoded on the negative strand. The LNCaP Illumina RNA sequencing data shown are described in 2.2.4. Long intergenic RNA (lincRNA) reported in a RIP-chip experiment to interact with the protein components of the polycomb repressive complex 2 (PRC2) in other cell types (Khalil et al. 2009) is colored purple.

5.2.3 SREBF2 and miR-33a Genomic Locus
Sterol regulatory element binding transcription factor 2 (SREBF2) is a protein that controls cholesterol synthesis within the cell by regulating the transcription of sterol-regulated transcripts (Swinnen et al. 1997). SREBF2 protein expression has been reported to be increased following castration and during progression to CRPC in the LNCaP xenograft model (Ettinger et al. 2004). The expression of miR-33a, encoded in an intron of SREBF2, in concert with the SREBF2 protein, inhibits cholesterol export and fatty acid oxidation (Gerin et al. 2010; Najafi-Shoushtari et al. 2010). The androgen regulation of the SREBF2/miR-33a genomic locus is illustrated in Figure 5.4. The SREBF2 RefSeq transcript was modestly increased by DHT and R1881 in the LNCaP in vitro experiment. All RefSeq probes detecting SREBF2 were increased more than 1.5 fold (probe average 1.8 fold). No RefSeq probes were differentially regulated by the anti-androgen bicalutamide. Two RefSeq probes were decreased 1.5 fold by the anti-androgen MDV3100. Non-RefSeq probes in intron 12 and the 3' extended region of SREBF2 were regulated similarly to the RefSeq probes. The expression of an extended 3' UTR, 940 nt downstream of the SREBF2 annotated 3' UTR, was observed in the LNCaP RNA-seq data. A non-RefSeq probe 960 nt upstream of SREBF2 was conversely decreased 1.5 fold by DHT and increased 1.5 and 1.7 fold by bicalutamide and MDV3100, respectively.
The expression of the promoter sequence was observed in the LNCaP RNA-seq data and may be a 6.4 kb extension of the annotated 3' UTR for the adjacent transcript, CCDC134. The SREBF2 transcript and extended 3' UTR were both decreased 1.6 fold at the earliest time point following castration (i.e. regressing; 7 days) and were not differentially expressed at the later time points compared to intact tumors in the LNCaP xenograft experiment. miR-33a was increased 1.9 fold 48 h after treatment with 1 nM R1881 compared to an ethanol control in the LNCaP small RNA next-generation sequencing experiment (discussed in 4.4). The androgen regulation of miR-33a was validated by qRT-PCR (2.12 fold; Figure 4.15). The coordinated expression of miR-33a and the SREBF2 transcript may suggest that miR-33a is processed from an intron of SREBF2. An independent host transcript of miR-33a (pri-miR) cannot be ruled out, however, as differential expression of intronic regions of SREBF2 was also observed. A novel androgen regulated miRNA was also detected in the intron adjacent to miR-33a in the small RNA sequencing data. This novel miRNA has yet to be characterized. The AR ChIP-seq peaks from the AR ChIP-seq experiment described in 4.2 (Vancouver Prostate Centre; VPC) as well as the other genome wide AR ChIP experiments compared above (Wang et al. 2009; Yu et al. 2010) were examined at the SREBF2/miR-33a genomic locus (Figure 5.4 brown). Although all three AR ChIP experiments detected AR binding within the SREBF2/miR-33a genomic locus, no AR binding sites were in common between the AR ChIP experiments. Understanding the underlying mechanisms of SREBF2 and miR-33a regulation may be important for understanding cholesterol synthesis in prostate cancer. Cholesterol synthesis and homeostasis are important in prostate cancer progression as cholesterol can be a precursor for de novo synthesis of androgens in prostate tumors (Locke et al. 2008).
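The comparison of AR binding sites across the three AR ChIP experiments, which found two shared sites at the miR-29a/29b locus and none at the SREBF2/miR-33a locus, amounts to an interval intersection. The following is a minimal sketch of that idea and not the method used in this thesis: the function names, the half-open coordinate convention, the rule that any shared base counts as an overlap, and all coordinates are assumptions made for illustration.

```python
from typing import List, Tuple

# A peak is (chromosome, start, end) with half-open coordinates (assumed convention).
Peak = Tuple[str, int, int]


def overlaps(a: Peak, b: Peak) -> bool:
    """Return True when two peaks share a chromosome and at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def common_peaks(reference: List[Peak], *others: List[Peak]) -> List[Peak]:
    """Return the reference peaks overlapped by at least one peak in every other set."""
    return [
        peak
        for peak in reference
        if all(any(overlaps(peak, candidate) for candidate in peaks) for peaks in others)
    ]


# Hypothetical coordinates for illustration only; not taken from the thesis data.
vpc = [("chr7", 100, 300), ("chr7", 900, 1100), ("chr22", 500, 700)]
wang = [("chr7", 250, 400), ("chr22", 800, 950)]
yu = [("chr7", 280, 350), ("chr22", 100, 200)]

shared = common_peaks(vpc, wang, yu)
print(shared)
```

In this toy input only the first chr7 peak is supported by all three lists, while the chr22 peak has binding reported nearby in each experiment but no site in common. A real comparison would also need to account for differences in peak calling and peak width between experiments, which this sketch ignores.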
Figure 5.4 Androgen Regulation of the SREBF2/miR-33a Genomic Locus
The SREBF2/miR-33a genomic locus is illustrated (modified from UCSC Genome Browser). The AR binding sites detected in the AR ChIP-seq experiment (described in 4.2; Vancouver Prostate Centre; VPC), and other genome wide AR ChIP experiments (Wang et al. 2009; Yu et al. 2010) are colored brown. Differential expression was defined as a 1.5 fold change and an FDR adjusted p-value < 0.05. RefSeq probes increased (red) and decreased (green) in the indicated sample comparison in the LNCaP in vitro experiment (E: ethanol; B: bicalutamide; MDV: MDV3100). Non-RefSeq probes increased (yellow) and decreased (blue) in the indicated sample comparisons. All probes detected transcripts encoded on the positive strand (right to left). miR-33a is also encoded on the positive strand. The LNCaP Illumina RNA sequencing data shown are described in 2.2.4.

5.3 Discussion
The three snapshots of the genome described above highlight some of the features of transcription identified in the integrative analysis presented in Chapter 4. The CTBP1 sense-antisense example highlights the importance of identifying and profiling transcripts expressed in the tissue of interest. The prostate expressed CTBP1as antisense transcript (AK0922548), oppositely regulated by androgens and anti-androgens to the CTBP1 protein-coding transcript, would not have been detected in conventional microarrays focused on reference RNAs. CTBP1as would also not have been detected in non-strand-specific RNA-seq data, as the sequencing reads for the CTBP1as transcript would most likely have been assigned to the CTBP1 protein-coding transcript. AR binding in the 3' UTR of CTBP1 may promote transcription of the CTBP1as transcript. It is possible that other transcripts repressed by androgens may be affected by androgen regulated antisense transcripts.
Similar AR binding was detected for the sense-antisense RefSeq transcripts oppositely regulated by androgens (Figure 4.14). AR ChIP-seq peaks were detected upstream of PPAP2A (increased by DHT) but AR ChIP-seq peaks were not detected upstream of the convergent antisense transcript, SKIV2L2 (decreased by DHT). An AR ChIP-seq peak was also detected upstream of TBRG1 (increased by DHT) but AR ChIP-seq peaks were not detected upstream of the convergent antisense transcript SIAE (decreased by DHT). These potential regulatory relationships of adjacent transcripts highlight the importance of considering genomic context in the analysis of high throughput datasets. The potential regulation of transcripts within a local genomic region should be considered in experiments to perturb single transcripts through antisense or siRNA. The inhibition of a protein-coding transcript may have effects beyond the expected protein-coding functions by altering adjacent or local transcript expression or stability. The miR-29a example highlights the importance of the lncRNA hosting a miRNA. The lncRNA (or pri-miRNA) may have additional biological functions independent of the miRNA. Much of the literature on ncRNAs is focused on miRNAs and their role in regulating protein-coding mRNA translation; the potential function of the host lncRNA is generally overlooked. miRNAs and the host lncRNA transcript may function in concert to regulate specific biological programs as is described for some miRNAs and their host protein-coding transcripts (e.g. SREBF2 and miR-33a). One of the difficulties in the study of lncRNAs and their expression is that function is hard to infer. Crude classification systems (including the method presented; Figure 3.1) have been employed to classify and infer function of non-coding transcripts based on their genomic context in relation to protein-coding transcripts. It is possible, however, for lncRNAs to have trans-acting regulatory roles (e.g.
HOTAIR) that do not affect transcripts or transcription within a local genomic context (i.e. cis-acting). Integrative analysis including genome wide experiments to investigate RNA-protein interactions (e.g. RIP-chip or RIP-seq) will help to infer function of many lncRNAs. The integrative analysis presented at the androgen regulated miR-29 genomic locus revealed a potential interaction of the miR-29a pri-miRNA with the polycomb repressive complex 2. The SREBF2/miR-33a genomic locus highlights the differential regulation of some of the features included in the custom microarray. The example also highlights the coordinate androgen regulation of a miRNA and its host protein-coding transcript. The RNA-seq and microarray profiling data suggest that the SREBF2 transcript may have a longer alternative 3' UTR in LNCaP cells. The novel 3' UTR may have additional regulatory functions that might alter mRNA localization or stability through interaction with RNA-binding proteins or miRNAs. Differential expression was also detected in the promoter regions of SREBF2 and in intronic regions. Although detection and identification of differential expression is important, this finding highlights a disadvantage of the microarray based approach. It is impossible to know if the differential expression of a promoter sequence is an alternative 5' UTR for SREBF2, an alternative 3' UTR for the upstream transcript (i.e. CCDC134), or an independent transcript. It is also difficult to differentiate expression of overlapping transcripts if one transcript is expressed at a much lower level than another, especially in the case of alternative splicing or exon skipping. Appropriate analysis of RNA-seq data considering RNA splice sites (e.g. exon-exon junctions) or CAGE data may provide additional information.

Chapter 6 Summary and Perspectives
6.1 Summary
This thesis has explored the androgen regulation of the prostate cancer genome.
I have integrated information from public sources, our data, and next-generation sequencing data to create a prostate oriented custom 180K microarray. This microarray has been used to interrogate gene expression under the conditions of androgens and anti-androgens. These data have been further integrated with small RNA and AR ChIP-seq data to view the complexity of the protein-coding and non-coding RNA regulation by androgens across the genome. In Chapter 2, an analysis of the NCBI RNA reference database, RefSeq, revealed that 2.18% of the human genome is covered by processed RefSeq transcripts and 93% of those regions are protein-coding. A third of the genomic regions covered by RefSeq can be used in multiple transcripts, both overlapping in the same direction and overlapping in opposite directions (i.e. sense-antisense). Additional analyses on the Illumina Body Map 2.0 data (RNA-seq of 16 different tissues including prostate) revealed expression from regions of the genome not covered by RefSeq transcripts. A portion of those expressed non-RefSeq regions were tissue-specific. Prostate-specific regions were identified including KLK3/PSA and PCA3, transcripts previously identified as prostate-specific. Transcripts expressed in prostate tissue or prostate cancer cell lines were identified in the NCBI GenBank primary sequence database. Some of these GenBank accession numbers were incorrectly associated with NCBI Entrez Genes as they were transcribed exclusively in introns, antisense, upstream or downstream of the corresponding reference transcripts. Genomic regions expressed in RNA-seq data of LNCaP and C4-2 prostate cancer cells were also identified. In Chapter 3, the methods behind the design of a custom Agilent 4x180K prostate microarray were described. Previously designed probes were included in the design of the custom microarray from the following sources: the Agilent 44K human gene expression probe set, probes designed by Drs.
Marcel Dinger and John Mattick (University of Queensland, Brisbane, Australia) to detect lncRNAs, and Agilent aCGH probes identified in a previous experiment to detect novel androgen regulated transcripts. Novel probes were designed to detect the following: prostate expressed transcripts and genomic regions identified in Chapter 2, RefSeq transcripts considering alternative splicing and 3' UTRs, overlapping sense-antisense transcripts, and expressed genomic regions overlapping AR binding sites detected in an AR ChIP-seq experiment. The microarray probes were classified based on their genomic location compared to RefSeq transcripts. Probe performance was assessed and it was found that probes with eArray base composition scores of 3_BC and 4_BC (i.e. 6% of the novel probes designed) may have a probe composition dependent increase in intensity and log ratio. The remaining probes on the custom microarray performed as expected. In Chapter 4, an integrative analysis was presented which identified androgen and anti-androgen regulated transcripts in the LNCaP in vitro model and followed the expression of those transcripts in vivo in the LNCaP xenograft tumors after castration and during progression to CRPC. The androgen regulated transcripts were classified based on their expression following castration and during progression to CRPC (illustrated in Figure 4.18). Both RefSeq and non-RefSeq androgen regulated transcripts were identified. 16% of the microarray probes designed to detect prostate expressed transcripts (3,153 probes) were regulated by androgens and 7% of those probes (1,490 probes) detected antagonism by an anti-androgen (i.e. bicalutamide or MDV3100).
These RefSeq and non-RefSeq androgen regulated transcripts, especially those transcripts decreased by androgens and increased following castration, should be prioritized for further characterization as they may provide novel therapeutic targets for combination therapy with ADT to prevent or slow the progression to CRPC. An AR ChIP-seq experiment and a small RNA next-generation sequencing experiment were also described. The AR ChIP-seq experiment detected 6,630 potential AR binding sites. De novo sequence motif analysis revealed an AR-like consensus motif in 40% of the AR binding sites and a FOXA1-like consensus motif in 58% of the AR binding sites. Most of the AR binding sites were located in distant enhancer or intergenic regions when compared to RefSeq transcripts. The apparently small number of AR binding sites located at proximal promoters may be explained by chromatin looping. AR has been found to activate transcription by promoting chromatin looping which brings together AR binding at distant enhancer sites with AR binding at proximal promoter sites (Shang et al. 2002; Wang et al. 2005). Alternatively, AR may be activating transcription of novel transcripts not represented in the RefSeq database. When the AR binding sites were compared to the androgen regulated transcripts identified in the LNCaP in vitro model, the transcripts increased by androgens were closer to the detected AR binding sites than those transcripts decreased by androgens or those transcripts with no detected androgen regulation. It is possible that the transcripts decreased by androgens are regulated by AR in a different manner to the transcripts increased, potentially through distant enhancer binding or secondary transcription factors. The small RNA sequencing experiment identified 25 known miRNAs increased by R1881 and 3 miRNAs decreased by R1881 compared to the ethanol control.
Eleven of these miRNAs had coordinate androgen regulation of their host transcripts in the LNCaP in vitro experiment. In Chapter 5, three examples of genomic regions regulated by androgens were described in detail. An identified prostate antisense transcript (Chapter 2) was oppositely regulated by androgens to the CTBP1 protein-coding transcript. Preliminary experiments using an siRNA to knock down the prostate antisense transcripts showed increased levels of the CTBP1 protein. The androgen regulated genomic locus containing the androgen regulated miRNAs, miR-29a and miR-29b, was also described. The integrative analysis, including published RIP-chip data (Khalil et al. 2009), at the androgen regulated miR-29 genomic locus revealed a potential interaction of the miR-29a pri-miRNA with the polycomb repressive complex 2. Coordinate androgen regulation of the miRNA, miR-33a, and its host transcript, SREBF2, was also described. Several unexpected findings were revealed in the integrative analysis including transcripts specifically regulated by R1881 and not regulated by DHT; transcripts specifically increased by the next-generation anti-androgen, MDV3100, and not increased by the current anti-androgen, bicalutamide; and the potential use of alternative 3' UTRs following castration in the in vivo model.

6.2 Perspectives
The presented RNA profiling and analytical approaches have a number of advantages and disadvantages. The use of the Agilent custom microarray platform was a cost-effective method to profile the expression of features within both RefSeq transcripts and non-RefSeq transcripts. The analysis above revealed that many of the transcripts expressed were not represented in the RefSeq database and that some transcripts may be expressed in a tissue-specific manner.
Whereas microarray technologies require a priori knowledge of sequence for probe design and base composition for optimal hybridization, next-generation sequencing technologies do not have such constraints. The practice of exome sequencing (selectively sequencing the known coding exons in the genome) will re-introduce a bias. Unbiased sequencing of many tissue types in large scale sequencing projects such as the Illumina Body Map 2.0 will reveal a more complete picture of the potential for transcription within the genome. Appropriate metadata (including tissue, cell type, and condition), if associated with these large-scale sequencing projects, will enable analysis considering state-specific transcription. Relying completely on reference databases like RefSeq for analysis of unbiased next-generation sequencing data will also re-introduce a bias. The microarray based approach is also limited in the number of probes that can be included on one slide. Due to such constraints, this analysis is in no way a comprehensive analysis of the transcriptome under the conditions tested. For example, probes to detect 5' UTRs were not included. It is clear, however, from this analysis that the complexity of transcription is such that the former convention of one probe per gene does not fully measure the transcriptome. It is also clear that features of transcripts such as specific exons or portions of 3' UTRs can be differentially expressed. An example of this was described for the LNCaP xenograft experiment where a subset of probes detecting coding exons were increased while the corresponding probes detecting the 3' UTRs were decreased following castration. Averaging probes in microarrays or sequencing reads across a transcript may not have detected this differential expression. The analysis presented above detected a number of sense-antisense transcripts that were oppositely regulated by androgens and could therefore not be the result of cDNA library preparation artifacts.
Strand-specific RNA-seq protocols should be adopted to detect these sense-antisense transcripts. The potential regulatory relationships of adjacent or antisense transcripts such as CTBP1/CTBP1as highlight the importance of considering genomic context in the analysis of high throughput datasets. The linear representation of RNA on DNA may, however, not fully represent the magnitude of the complexity of RNA space considering distal intra-chromosomal and inter-chromosomal read-through and trans-splicing events (Gingeras 2009). The microarray based approaches have a number of disadvantages. One of these disadvantages is the inability to detect RNA splice junctions in an unbiased manner. It is impossible to predict, for example, if a probe that is differentially expressed in the intron of a RefSeq transcript is detecting a novel exon, an independent transcript, or a novel UTR of an adjacent transcript. It is also difficult for microarray technologies to detect overlapping alternative isoforms, especially if they are expressed at much lower levels; the ligand independent AR splice variant that skips exons 5, 6 and 7 [ARv567es; (Sun et al. 2010)] is such an example. RNA-seq computational approaches are better able to piece together the sequencing reads belonging to the different alternative transcripts using splice junctions. Microarrays are also limited to measuring relative levels of expression and have a limited dynamic range of detection.

6.3 Conclusions and Future Work
The decision to use the NCBI integrated bioinformatics resource (i.e. GenBank, RefSeq, and Entrez Gene) and especially to anchor microarray probe classification to RefSeq transcripts was made purely to simplify an already complicated analysis. Many of the downstream analysis tools rely on RefSeq or Entrez Gene identifiers to link functional information.
From experience, switching between identifiers can lead to a loss of information or incorrect associations, as was shown between GenBank accessions and Entrez Gene IDs (Table 2.4). Integration of information from other reference databases such as Ensembl and the UCSC Genome Browser Genes tracks will provide additional information. The analysis of high throughput RNA profiling data sets can be an overwhelming task. The integration or layering of data helps to identify and prioritize subsets of transcripts for further functional characterization. The integration of other genome wide datasets may also help to predict potential function especially of lncRNAs (e.g. potential interaction of the host transcript of miR-29a/b with the polycomb repressive complex). Integration of genome wide experiments such as RIP-seq, to examine RNA-protein interactions; ChIP-seq, to examine DNA-protein interactions; CAGE, to identify transcript start sites; and strand-specific RNA-seq, to identify splicing events, will provide additional information to further prioritize transcripts and identify novel therapeutic targets. This analysis does rely completely on the LNCaP cell line so integration with profiling data from other prostate cancer cell lines and human tumors should be performed. The integrative analysis presented above in no way exhausts the types of questions that can be asked of the high throughput experiments described. It did, however, identify a number of specific examples of androgen regulated and prostate specific transcripts that should be prioritized for further functional characterization. The integrative analysis also identified global patterns of expression such as R1881 specific expression and alternative 3' UTR usage following castration in the in vivo model. Many transcripts, both protein-coding and non-coding, are not represented in current RNA reference databases and therefore analyses of transcriptomes should not be constrained to reference databases.
Many of these overlooked transcripts may provide key missing links to understanding biology and in particular disease. The novel transcripts identified in this thesis may inform on the underlying biological mechanisms leading to castration resistant prostate cancer. The prostate-specific transcripts identified are currently being investigated as potential biomarkers of prostate cancer. It is clear from this work and earlier work that anti-androgens do not simply reverse or “switch off” the identified androgen regulated signature. Anti-androgens do not regulate all of the identified androgen regulated transcripts. Anti-androgens can also have off-target effects on prostate cancer cells that can trigger alternative programs of transcription. Knowledge of androgen regulation and cellular response to castration and anti-androgens may lead to the identification of more effective biomarkers of treatment response and may inform the design of novel therapeutics for castration resistant prostate cancer.

Bibliography
Agus, D. B., C. Cordon-Cardo, W. Fox, M. Drobnjak, A. Koff, D. W. Golde and H. I. Scher (1999). "Prostate cancer cell cycle regulators: response to androgen withdrawal and development of androgen independence." J Natl Cancer Inst 91(21): 1869-1876.
Aihara, M., T. M. Wheeler, M. Ohori and P. T. Scardino (1994). "Heterogeneity of prostate cancer in radical prostatectomy specimens." Urology 43(1): 60-66; discussion 66-67.
Amaral, P. P. and J. S. Mattick (2008). "Noncoding RNA in development." Mamm Genome 19(7-8): 454-492.
American Cancer Society. (2010, June 30). "Prostate Cancer Detailed Guide." Retrieved August, 2010, from http://www.cancer.org/.
American Urological Association (2007). Guideline for the Management of Clinically Localized Prostate Cancer: 2007 Update, American Urological Association Education and Research, Inc.
Amundadottir, L. T., P. Sulem, J. Gudmundsson, A. Helgason, A. Baker, et al. (2006). "A common variant associated with prostate cancer in European and African populations." Nat Genet 38(6): 652-658.
Andriole, G. L., E. D. Crawford, R. L. Grubb, 3rd, S. S. Buys, D. Chia, et al. (2009). "Mortality results from a randomized prostate-cancer screening trial." N Engl J Med 360(13): 1310-1319.
Bailey, T. L. and C. Elkan (1994). "Fitting a mixture model by expectation maximization to discover motifs in biopolymers." Proc Int Conf Intell Syst Mol Biol 2: 28-36.
Bailey, T. L. and M. Gribskov (1998). "Combining evidence using p-values: application to sequence homology searches." Bioinformatics 14(1): 48-54.
Balk, S. P. and K. E. Knudsen (2008). "AR, the cell cycle, and prostate cancer." Nucl Recept Signal 6: e001.
Bartel, D. P. (2004). "MicroRNAs: genomics, biogenesis, mechanism, and function." Cell 116(2): 281-297.
Beltran, M., I. Puig, C. Pena, J. M. Garcia, A. B. Alvarez, R. Pena, F. Bonilla and A. G. de Herreros (2008). "A natural antisense transcript regulates Zeb2/Sip1 gene expression during Snail1-induced epithelial-mesenchymal transition." Genes Dev 22(6): 756-769.
Birney, E., J. A. Stamatoyannopoulos, A. Dutta, R. Guigo, T. R. Gingeras, et al. (2007). "Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project." Nature 447(7146): 799-816.
Bolton, E. C., A. Y. So, C. Chaivorapol, C. M. Haqq, H. Li and K. R. Yamamoto (2007). "Cell- and gene-specific regulation of primary target genes by the androgen receptor." Genes Dev 21(16): 2005-2017.
Boyd, J. M., T. Subramanian, U. Schaeper, M. La Regina, S. Bayley and G. Chinnadurai (1993). "A region in the C-terminus of adenovirus 2/5 E1a protein is required for association with a cellular phosphoprotein and important for the negative modulation of T24-ras mediated transformation, tumorigenesis and metastasis." EMBO J 12(2): 469-478.
Bruchovsky, N. and J. D. Wilson (1968). "The conversion of testosterone to 5-alpha-androstan-17-beta-ol-3-one by rat prostate in vivo and in vitro." J Biol Chem 243(8): 2012-2021.
Buchanan, G., R. A. Irvine, G. A. Coetzee and W. D. Tilley (2001). "Contribution of the androgen receptor to prostate cancer predisposition and progression." Cancer Metastasis Rev 20(3-4): 207-223.
Canadian Cancer Society (2010). Canadian Cancer Statistics 2010. Toronto, Canadian Cancer Society.
Candeias, M. M., L. Malbert-Colas, D. J. Powell, C. Daskalogianni, M. M. Maslon, N. Naski, K. Bourougaa, F. Calvo and R. Fahraeus (2008). "P53 mRNA controls p53 activity by managing Mdm2 functions." Nat Cell Biol 10(9): 1098-1105.
Carninci, P. (2010). "RNA dust: where are the genes?" DNA Res 17(2): 51-59.
Carninci, P., A. Sandelin, B. Lenhard, S. Katayama, K. Shimokawa, et al. (2006). "Genome-wide analysis of mammalian promoter architecture and evolution." Nat Genet 38(6): 626-635.
Cato, A. C., D. Henderson and H. Ponta (1987). "The hormone response element of the mouse mammary tumour virus DNA mediates the progestin and androgen induction of transcription in the proviral long terminal repeat region." EMBO J 6(2): 363-368.
CESC (1998). "Genome sequence of the nematode C. elegans: a platform for investigating biology." Science 282(5396): 2012-2018.
Chen, C. D., D. S. Welsbie, C. Tran, S. H. Baek, R. Chen, R. Vessella, M. G. Rosenfeld and C. L. Sawyers (2004). "Molecular determinants of resistance to antiandrogen therapy." Nat Med 10(1): 33-39.
Cheng, L., S. Y. Song, T. G. Pretlow, F. W. Abdul-Karim, H. J. Kung, D. V. Dawson, W. S. Park, Y. W. Moon, M. L. Tsai, W. M. Linehan, M. R. Emmert-Buck, L. A. Liotta and Z. Zhuang (1998). "Evidence of independent origin of multiple tumors from patients with prostate cancer." J Natl Cancer Inst 90(3): 233-237.
Chi, K. N., S. J. Hotte, E. Y. Yu, D. Tu, B. J. Eigl, I. Tannock, F. Saad, S. North, J. Powers, M. E. Gleave and E. A. Eisenhauer (2010). "Randomized phase II study of docetaxel and prednisone with or without OGX-011 in patients with metastatic castration-resistant prostate cancer." J Clin Oncol 28(27): 4247-4254.
Chin, L. J. and F. J. Slack (2008). "A truth serum for cancer--microRNAs have major potential as cancer biomarkers." Cell Res 18(10): 983-984.
Chooniedass-Kothari, S., E. Emberley, M. K. Hamedani, S. Troup, X. Wang, A. Czosnek, F. Hube, M. Mutawe, P. H. Watson and E. Leygue (2004). "The steroid receptor RNA activator is the first functional RNA encoding a protein." FEBS Lett 566(1-3): 43-47.
Claessens, F., S. Denayer, N. Van Tilborgh, S. Kerkhofs, C. Helsen and A. Haelens (2008). "Diverse roles of androgen receptor (AR) domains in AR-mediated signaling." Nucl Recept Signal 6: e008.
Craft, N. and C. L. Sawyers (1998). "Mechanistic concepts in androgen-dependence of prostate cancer." Cancer Metastasis Rev 17(4): 421-427.
Crawford, E. D., M. A. Eisenberger, D. G. McLeod, J. T. Spaulding, R. Benson, F. A. Dorr, B. A. Blumenstein, M. A. Davis and P. J. Goodman (1989). "A controlled trial of leuprolide with and without flutamide in prostatic carcinoma." N Engl J Med 321(7): 419-424.
Culig, Z., A. Hobisch, M. V. Cronauer, A. C. Cato, A. Hittmair, C. Radmayr, J. Eberle, G. Bartsch and H. Klocker (1993). "Mutant androgen receptor detected in an advanced-stage prostatic carcinoma is activated by adrenal androgens and progesterone." Mol Endocrinol 7(12): 1541-1550.
Culig, Z., A. Hobisch, M. V. Cronauer, C. Radmayr, J. Trapman, A. Hittmair, G. Bartsch and H. Klocker (1994). "Androgen receptor activation in prostatic tumor cell lines by insulin-like growth factor-I, keratinocyte growth factor, and epidermal growth factor." Cancer Res 54(20): 5474-5478.
Cunha, G. R., W. Ricke, A. Thomson, P. C. Marker, G. Risbridger, S. W. Hayward, Y. Z. Wang, A. A. Donjacour and T. Kurita (2004). "Hormonal, cellular, and molecular regulation of normal and neoplastic prostatic development."
J Steroid Biochem Mol Biol 92(4): 221-236.
de Bono, J. S., C. Logothetis, K. Fizazi, S. North, L. Chu, K. N. Chi, T. Kheoh, C. M. Haqq, A. Molina and H. I. Scher (2010). "Abiraterone acetate (AA) plus low dose prednisone (P) improves overall survival (OS) in patients (pts) with metastatic castration-resistant prostate cancer (mCRPC) who have progressed after docetaxel-based chemotherapy (chemo): Results of COU-AA-301, a randomized double-blind placebo-controlled phase III study." Annals of Oncology 21 (suppl 8): (abstract LBA5).
De Dosso, S. and D. R. Berthold (2008). "Docetaxel in the management of prostate cancer: current standard of care and future directions." Expert Opin Pharmacother 9(11): 1969-1979.
de Hoon, M. J., R. J. Taft, T. Hashimoto, M. Kanamori-Katayama, H. Kawaji, et al. (2010). "Cross-mapping and the identification of editing sites in mature microRNAs in high-throughput sequencing libraries." Genome Res 20(2): 257-264.
Dehm, S. M., L. J. Schmidt, H. V. Heemers, R. L. Vessella and D. J. Tindall (2008). "Splicing of a novel androgen receptor exon generates a constitutively active androgen receptor that mediates prostate cancer therapy resistance." Cancer Res 68(13): 5469-5477.
Dehm, S. M. and D. J. Tindall (2006). "Molecular regulation of androgen action in prostate cancer." J Cell Biochem 99(2): 333-344.
Dehm, S. M. and D. J. Tindall (2007). "Androgen receptor structural and functional elements: role and regulation in prostate cancer." Mol Endocrinol 21(12): 2855-2863.
Denis, L. J., F. Keuppens, P. H. Smith, P. Whelan, J. L. de Moura, D. Newling, A. Bono and R. Sylvester (1998). "Maximal androgen blockade: final analysis of EORTC phase III trial 30853. EORTC Genito-Urinary Tract Cancer Cooperative Group and the EORTC Data Center." Eur Urol 33(2): 144-151.
Denmeade, S. R., X. S. Lin and J. T. Isaacs (1996). "Role of programmed (apoptotic) cell death during the progression and therapy for prostate cancer." Prostate 28(4): 251-265.
Denoeud, F., P.
Kapranov, C. Ucla, A. Frankish, R. Castelo, et al. (2007). "Prominent use of distal 5' transcription start sites and discovery of a large number of additional exons in ENCODE regions." Genome Res 17(6): 746-759. Dhanasekaran, S. M., A. Dash, J. Yu, I. P. Maine, B. Laxman, S. A. Tomlins, C. J. Creighton, A. Menon, M. A. Rubin and A. M. Chinnaiyan (2005). "Molecular profiling of human prostate tissues: insights into gene expression patterns of prostate development during puberty." FASEB J 19(2): 243-245. Dinger, M. E., P. P. Amaral, T. R. Mercer, K. C. Pang, S. J. Bruce, et al. (2008). "Long noncoding RNAs in mouse embryonic stem cell pluripotency and differentiation." Genome Res. Doherty, J. K., C. T. Bond, W. Hua, J. P. Adelman and G. M. Clinton (1999). "An alternative HER-2/neu transcript of 8 kb has an extended 3'UTR and displays increased stability in SKOV-3 ovarian carcinoma cells." Gynecol Oncol 74(3): 408-415. Donjacour, A. A. and G. R. Cunha (1988). "The effect of androgen deprivation on branching morphogenesis in the mouse prostate." Dev Biol 128(1): 1-14. Ebisuya, M., T. Yamamoto, M. Nakajima and E. Nishida (2008). "Ripples from neighbouring transcription." Nat Cell Biol 10(9): 1106-1113. Eisenberger, M. A., B. A. Blumenstein, E. D. Crawford, G. Miller, D. G. McLeod, P. J. Loehrer, G. Wilding, K. Sears, D. J. Culkin, I. M. Thompson, Jr., A. J. Bueschen and B. A. Lowe (1998). "Bilateral orchiectomy with or without flutamide for metastatic prostate cancer." N Engl J Med 339(15): 1036-1042. Esquela-Kerscher, A. and F. J. Slack (2006). "Oncomirs - microRNAs with a role in cancer." Nat Rev Cancer 6(4): 259-269. Ettinger, S. L., R. Sobel, T. G. Whitmore, M. Akbari, D. R. Bradley, M. E. Gleave and C. C. Nelson (2004). "Dysregulation of sterol response element-binding proteins and downstream effectors in prostate cancer during progression to androgen independence." Cancer Res 64(6): 2212-2221. Fabbri, M., R. Garzon, A. Cimmino, Z. Liu, N. Zanesi, et al. 
(2007). "MicroRNA-29 family reverts aberrant methylation in lung cancer by targeting DNA methyltransferases 3A and 3B." Proc Natl Acad Sci U S A 104(40): 15805-15810. Faghihi, M. A., M. Zhang, J. Huang, F. Modarresi, M. P. Van der Brug, M. A. Nalls, M. R. Cookson, G. St-Laurent, 3rd and C. Wahlestedt (2010). "Evidence for natural antisense transcript-mediated inhibition of microRNA function." Genome Biol 11(5): R56. Feldman, B. J. and D. Feldman (2001). "The development of androgen-independent prostate cancer." Nat Rev Cancer 1(1): 34-45. Fine, S. W., A. Gopalan, M. A. Leversha, H. A. Al-Ahmadie, S. K. Tickoo, Q. Zhou, J. M. Satagopan, P. T. Scardino, W. L. Gerald and V. E. Reuter (2010). "TMPRSS2-ERG gene fusion is associated with low Gleason scores and not with high-grade morphological features." Mod Pathol 23(10): 1325-1333. Flicek, P., M. R. Amode, D. Barrell, K. Beal, S. Brent, et al. (2011). "Ensembl 2011." Nucleic Acids Res 39(Database issue): D800-806. Foradori, C. D., M. J. Weiser and R. J. Handa (2008). "Non-genomic actions of androgens." Front Neuroendocrinol 29(2): 169-181. Freedman, M. L., C. A. Haiman, N. Patterson, G. J. McDonald, A. Tandon, et al. (2006). "Admixture mapping identifies 8q24 as a prostate cancer risk locus in African-American men." Proc Natl Acad Sci U S A 103(38): 14068-14073. Fu, X., L. Ravindranath, N. Tran, G. Petrovics and S. Srivastava (2006). "Regulation of apoptosis by a prostate-specific and prostate cancer-associated noncoding gene, PCGEM1." DNA Cell Biol 25(3): 135-141. Fujita, P. A., B. Rhead, A. S. Zweig, A. S. Hinrichs, D. Karolchik, et al. (2011). "The UCSC Genome Browser database: update 2011." Nucleic Acids Res 39(Database issue): D876-882. Fujita, Y., K. Kojima, R. Ohhashi, N. Hamada, Y. Nozawa, A. Kitamoto, A. Sato, S. Kondo, T. Kojima, T. Deguchi and M. Ito (2010). "MiR-148a attenuates paclitaxel resistance of hormone-refractory, drug-resistant prostate cancer PC3 cells by regulating MSK1 expression." 
J Biol Chem 285(25): 19076-19084. Fullwood, M. J., Y. Han, C. L. Wei, X. Ruan and Y. Ruan (2010). "Chromatin interaction analysis using paired-end tag sequencing." Curr Protoc Mol Biol Chapter 21: Unit 21 15 21-25. Gandellini, P., M. Folini and N. Zaffaroni (2009). "Towards the definition of prostate cancer-related microRNAs: where are we now?" Trends Mol Med. Gao, N., J. Zhang, M. A. Rao, T. C. Case, J. Mirosevich, Y. Wang, R. Jin, A. Gupta, P. S. Rennie and R. J. Matusik (2003). "The role of hepatocyte nuclear factor-3 alpha (Forkhead Box A1) and androgen receptor in transcriptional regulation of prostatic genes." Mol Endocrinol 17(8): 1484-1507. Garofalo, M., C. Quintavalle, G. Di Leva, C. Zanca, G. Romano, C. Taccioli, C. G. Liu, C. M. Croce and G. Condorelli (2008). "MicroRNA signatures of TRAIL resistance in human non-small cell lung cancer." Oncogene. Garzon, R., S. Liu, M. Fabbri, Z. Liu, C. E. Heaphy, et al. (2009). "MicroRNA-29b induces global DNA hypomethylation and tumor suppressor gene re-expression in acute myeloid leukemia by targeting directly DNMT3A and 3B and indirectly DNMT1." Blood. Gelmann, E. P. (2002). "Molecular biology of the androgen receptor." J Clin Oncol 20(13): 3001-3015. Gerin, I., L. A. Clerbaux, O. Haumont, N. Lanthier, A. K. Das, C. F. Burant, I. A. Leclercq, O. A. Macdougald and G. T. Bommer (2010). "Expression of miR-33 from an SREBP2 intron inhibits cholesterol export and fatty acid oxidation." J Biol Chem. Gerstein, M. B., C. Bruce, J. S. Rozowsky, D. Zheng, J. Du, J. O. Korbel, O. Emanuelsson, Z. D. Zhang, S. Weissman and M. Snyder (2007). "What is a gene, post-ENCODE? History and updated definition." Genome Res 17(6): 669-681. Gibbons, N. B., R. W. Watson, R. N. Coffey, H. P. Brady and J. M. Fitzpatrick (2000). "Heat-shock proteins inhibit induction of prostate cancer cell apoptosis." Prostate 45(1): 58-65. Gimenez-Bonafe, P., M. N. Fedoruk, T. G. Whitmore, M. Akbari, J. L. Ralph, S. Ettinger, M. E. Gleave and C. C. 
Nelson (2004). "YB-1 is upregulated during prostate cancer tumor progression and increases P-glycoprotein activity." Prostate 59(3): 337-349. Gingeras, T. R. (2009). "Implications of chimaeric non-co-linear transcripts." Nature 461(7261): 206-211. Gioeli, D., J. W. Mandell, G. R. Petroni, H. F. Frierson, Jr. and M. J. Weber (1999). "Activation of mitogen-activated protein kinase associated with prostate cancer progression." Cancer Res 59(2): 279-284. Gleave, M. E., Goldenberg S.L., So A., Black P, Chi K, Davidson B.J. (2009). Information for men newly diagnosed with prostate cancer. V. P. Centre. Gopalan, A., M. A. Leversha, J. M. Satagopan, Q. Zhou, H. A. Al-Ahmadie, S. W. Fine, J. A. Eastham, P. T. Scardino, H. I. Scher, S. K. Tickoo, V. E. Reuter and W. L. Gerald (2009). "TMPRSS2-ERG gene fusion is not associated with outcome in patients treated by prostatectomy." Cancer Res 69(4): 1400-1406. Graff, J. R., B. W. Konicek, A. M. McNulty, Z. Wang, K. Houck, S. Allen, J. D. Paul, A. Hbaiu, R. G. Goode, G. E. Sandusky, R. L. Vessella and B. L. Neubauer (2000). "Increased AKT activity contributes to prostate cancer progression by dramatically accelerating prostate tumor growth and diminishing p27Kip1 expression." J Biol Chem 275(32): 24500-24505. Gregory, C. W., X. Fei, L. A. Ponguta, B. He, H. M. Bill, F. S. French and E. M. Wilson (2004). "Epidermal growth factor increases coactivation of the androgen receptor in recurrent prostate cancer." J Biol Chem 279(8): 7119-7130. Gregory, C. W., K. G. Hamil, D. Kim, S. H. Hall, T. G. Pretlow, J. L. Mohler and F. S. French (1998). "Androgen receptor expression in androgen-independent prostate cancer is associated with increased expression of androgen-regulated genes." Cancer Res 58(24): 5718-5724. Gregory, C. W., B. He, R. T. Johnson, O. H. Ford, J. L. Mohler, F. S. French and E. M. Wilson (2001). "A mechanism for androgen receptor-mediated prostate cancer recurrence after androgen deprivation therapy." 
Cancer Res 61(11): 4315-4319. Gregory, C. W., R. T. Johnson, Jr., J. L. Mohler, F. S. French and E. M. Wilson (2001). "Androgen receptor stabilization in recurrent prostate cancer is associated with hypersensitivity to low androgen." Cancer Res 61(7): 2892-2898. Griffiths-Jones, S., R. J. Grocock, S. van Dongen, A. Bateman and A. J. Enright (2006). "miRBase: microRNA sequences, targets and gene nomenclature." Nucleic Acids Res 34(Database issue): D140-144. Gronberg, H. (2003). "Prostate cancer epidemiology." Lancet 361(9360): 859-864. Grooteclaes, M., Q. Deveraux, J. Hildebrand, Q. Zhang, R. H. Goodman and S. M. Frisch (2003). "C-terminal-binding protein corepresses epithelial and proapoptotic gene expression programs." Proc Natl Acad Sci U S A 100(8): 4568-4573. Grooteclaes, M. L. and S. M. Frisch (2000). "Evidence for a function of CtBP in epithelial gene regulation and anoikis." Oncogene 19(33): 3823-3828. Guo, Z., X. Yang, F. Sun, R. Jiang, D. E. Linn, H. Chen, X. Kong, J. Melamed, C. G. Tepper, H. J. Kung, A. M. Brodie, J. Edwards and Y. Qiu (2009). "A novel androgen receptor splice variant is up-regulated during prostate cancer progression and promotes androgen depletion-resistant growth." Cancer Res 69(6): 2305-2313. Gupta, S., J. A. Stamatoyannopoulos, T. L. Bailey and W. S. Noble (2007). "Quantifying similarity between motifs." Genome Biol 8(2): R24. Guttman, M., I. Amit, M. Garber, C. French, M. F. Lin, et al. (2009). "Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals." Nature 458(7235): 223-227. Haas, G. P., N. Delongchamps, O. W. Brawley, C. Y. Wang and G. de la Roza (2008). "The worldwide epidemiology of prostate cancer: perspectives from autopsy studies." Can J Urol 15(1): 3866-3871. Hackenberg, M., M. Sturm, D. Langenberger, J. M. Falcon-Perez and A. M. Aransay (2009). "miRanalyzer: a microRNA detection and analysis tool for next-generation sequencing experiments." Nucleic Acids Res. Haffner, M. 
C., M. J. Aryee, A. Toubaji, D. M. Esopi, R. Albadine, et al. (2010). "Androgen-induced TOP2B-mediated double-strand breaks and prostate cancer gene rearrangements." Nat Genet 42(8): 668-675. Ham, J., A. Thomson, M. Needham, P. Webb and M. Parker (1988). "Characterization of response elements for androgens, glucocorticoids and progestins in mouse mammary tumour virus." Nucleic Acids Res 16(12): 5263-5276. Hara, T., J. Miyazaki, H. Araki, M. Yamaoka, N. Kanzaki, M. Kusaka and M. Miyamoto (2003). "Novel mutations of androgen receptor: a possible mechanism of bicalutamide withdrawal syndrome." Cancer Res 63(1): 149-153. Haraguchi, T., Y. Ozaki and H. Iba (2009). "Vectors expressing efficient RNA decoys achieve the long-term suppression of specific microRNA activity in mammalian cells." Nucleic Acids Res. Hastings, M. L., C. Milcarek, K. Martincic, M. L. Peterson and S. H. Munroe (1997). "Expression of the thyroid hormone receptor gene, erbAalpha, in B lymphocytes: alternative mRNA processing is independent of differentiation but correlates with antisense RNA levels." Nucleic Acids Res 25(21): 4296-4300. Heinlein, C. A. and C. Chang (2002). "The roles of androgen receptors and androgen-binding proteins in nongenomic androgen actions." Mol Endocrinol 16(10): 2181-2187. Heinlein, C. A. and C. Chang (2004). "Androgen receptor in prostate cancer." Endocr Rev 25(2): 276-308. Hessels, D. and J. A. Schalken (2009). "The use of PCA3 in the diagnosis of prostate cancer." Nat Rev Urol 6(5): 255-261. Heuze-Vourc'h, N., V. Leblond and Y. Courty (2003). "Complex alternative splicing of the hKLK3 gene coding for the tumor marker PSA (prostate-specific-antigen)." Eur J Biochem 270(4): 706-714. Horoszewicz, J. S., S. S. Leong, E. Kawinski, J. P. Karr, H. Rosenthal, T. M. Chu, E. A. Mirand and G. P. Murphy (1983). "LNCaP model of human prostatic carcinoma." Cancer Res 43(4): 1809-1818. Hotte, S. J., E. Y. Yu, H. W. Hirte, C. S. Higano, M. E. Gleave and K. Chi (2009). 
"OGX-427, a 2'methoxyethyl antisense oligonucleotide (ASO), against HSP27: Results of a first-in-human trial." J Clin Oncol 27(15s): suppl; abstr 3506. Hu, R., T. A. Dunn, S. Wei, S. Isharwal, R. W. Veltri, E. Humphreys, M. Han, A. W. Partin, R. L. Vessella, W. B. Isaacs, G. S. Bova and J. Luo (2009). "Ligand-independent androgen receptor variants derived from splicing of cryptic exons signify hormone-refractory prostate cancer." Cancer Res 69(1): 16-22. Huang da, W., B. T. Sherman and R. A. Lempicki (2009). "Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources." Nat Protoc 4(1): 44-57. Huang, W., Y. Shostak, P. Tarr, C. Sawyers and M. Carey (1999). "Cooperative assembly of androgen receptor into a nucleoprotein complex that regulates the prostate-specific antigen enhancer." J Biol Chem 274(36): 25756-25768. Hube, F., G. Velasco, J. Rollin, D. Furling and C. Francastel (2010). "Steroid receptor RNA activator protein binds to and counteracts SRA RNA-mediated activation of MyoD and muscle differentiation." Nucleic Acids Res. Huggins, C. and C. V. Hodges (1941). "Studies on prostatic cancer. I. The effect of castration, of estrogen and androgen injection on serum phosphatases in metastatic carcinoma of the prostate." Cancer Research 1: 293-297. Human Genome Project (2004). "Finishing the euchromatic sequence of the human genome." Nature 431(7011): 931-945. Humphrey, P. A. (2004). "Gleason grading and prognostic factors in carcinoma of the prostate." Mod Pathol 17(3): 292-306. Hwang, H. W., E. A. Wentzel and J. T. Mendell (2007). "A hexanucleotide element directs microRNA nuclear import." Science 315(5808): 97-100. Isaacs, J. T. (1999). "The biology of hormone refractory prostate cancer. Why does it develop?" Urol Clin North Am 26(2): 263-273. Jemal, A., F. Bray, M. M. Center, J. Ferlay, E. Ward and D. Forman (2011). "Global cancer statistics." CA Cancer J Clin 61(2): 69-90. Jemal, A., R. Siegel, J. Xu and E. Ward (2010). 
"Cancer Statistics, 2010." CA Cancer J Clin. Jia, L., B. P. Berman, U. Jariwala, X. Yan, J. P. Cogan, A. Walters, T. Chen, G. Buchanan, B. Frenkel and G. A. Coetzee (2008). "Genomic androgen receptor-occupied regions with different functions, defined by histone acetylation, coregulators and transcriptional capacity." PLoS ONE 3(11): e3645. Jin, W., K. W. Scotto, W. N. Hait and J. M. Yang (2007). "Involvement of CtBP1 in the transcriptional activation of the MDR1 gene in human multidrug resistant cancer cells." Biochem Pharmacol 74(6): 851- 859. Johns, L. E. and R. S. Houlston (2003). "A systematic review and meta-analysis of familial prostate cancer risk." BJU Int 91(9): 789-794. 142  July, L. V., M. Akbari, T. Zellweger, E. C. Jones, S. L. Goldenberg and M. E. Gleave (2002). "Clusterin expression is significantly enhanced in prostate cancer cells following androgen withdrawal therapy." Prostate 50(3): 179-188. Katayama, S., Y. Tomaru, T. Kasukawa, K. Waki, M. Nakanishi, et al. (2005). "Antisense transcription in the mammalian transcriptome." Science 309(5740): 1564-1566. Ke, X. S., Y. Qu, K. Rostad, W. C. Li, B. Lin, O. J. Halvorsen, S. A. Haukaas, I. Jonassen, K. Petersen, N. Goldfinger, V. Rotter, L. A. Akslen, A. M. Oyan and K. H. Kalland (2009). "Genome-wide profiling of histone h3 lysine 4 and lysine 27 trimethylation reveals an epigenetic signature in prostate carcinogenesis." PLoS One 4(3): e4687. Kelly, K. and J. J. Yin (2008). "Prostate cancer and metastasis initiating stem cells." Cell Res 18(5): 528- 537. Khalil, A. M., M. Guttman, M. Huarte, M. Garber, A. Raj, D. Rivea Morales, K. Thomas, A. Presser, B. E. Bernstein, A. van Oudenaarden, A. Regev, E. S. Lander and J. L. Rinn (2009). "Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression." Proc Natl Acad Sci U S A 106(28): 11667-11672. Khan, A. P., L. M. Poisson, V. Bhat, D. Fermin, R. Zhao, S. Kalyana-Sundaram, G. Michailidis, A. I. 
Nesvizhskii, G. S. Omenn, A. M. Chinnaiyan and A. Sreekumar (2009). "Quantitative proteomic profiling of prostate cancer reveals a role for miR-128 in prostate cancer." Mol Cell Proteomics. Kim, E., A. Magen and G. Ast (2007). "Different levels of alternative splicing among eukaryotes." Nucleic Acids Res 35(1): 125-131. Kim, J. H., S. M. Dhanasekaran, R. Mehra, S. A. Tomlins, W. Gu, et al. (2007). "Integrative analysis of genomic aberrations associated with prostate cancer progression." Cancer Res 67(17): 8229-8239. Kim, M., A. L. Kasinski and F. J. Slack (2011). "MicroRNA therapeutics in preclinical cancer models." Lancet Oncol 12(4): 319-321. Kim, T. K., M. Hemberg, J. M. Gray, A. M. Costa, D. M. Bear, et al. (2010). "Widespread transcription at neuronal activity-regulated enhancers." Nature 465(7295): 182-187. Kimura, K., A. Wakamatsu, Y. Suzuki, T. Ota, T. Nishikawa, et al. (2006). "Diversification of transcriptional modulation: large-scale identification and characterization of putative alternative promoters of human genes." Genome Res 16(1): 55-65. Kino, T., D. E. Hurt, T. Ichijo, N. Nader and G. P. Chrousos (2010). "Noncoding RNA gas5 is a growth arrest- and starvation-associated repressor of the glucocorticoid receptor." Sci Signal 3(107): ra8. Kobayashi, Y., D. M. Absher, Z. G. Gulzar, S. R. Young, J. K. McKenney, D. M. Peehl, J. D. Brooks, R. M. Myers and G. Sherlock (2011). "DNA methylation profiling reveals novel biomarkers and important roles for DNA methyltransferases in prostate cancer." Genome Res. Koch-Brandt, C. and C. Morgans (1996). "Clusterin: a role in cell survival in the face of apoptosis?" Prog Mol Subcell Biol 16: 130-149. Kohli, M. and D. J. Tindall (2010). "New developments in the medical management of prostate cancer." Mayo Clin Proc 85(1): 77-86. Koivisto, P., J. Kononen, C. Palmberg, T. Tammela, E. Hyytinen, J. Isola, J. Trapman, K. Cleutjens, A. Noordzij, T. Visakorpi and O. P. Kallioniemi (1997). 
"Androgen receptor gene amplification: a possible molecular mechanism for androgen deprivation therapy failure in prostate cancer." Cancer Res 57(2): 314-319. 143  Kondo, Y., L. Shen, A. S. Cheng, S. Ahmed, Y. Boumber, C. Charo, T. Yamochi, T. Urano, K. Furukawa, B. Kwabi-Addo, D. L. Gold, Y. Sekido, T. H. Huang and J. P. Issa (2008). "Gene silencing in cancer by histone H3 lysine 27 trimethylation independent of promoter DNA methylation." Nat Genet 40(6): 741-750. Krek, A., D. Grun, M. N. Poy, R. Wolf, L. Rosenberg, E. J. Epstein, P. MacMenamin, I. da Piedade, K. C. Gunsalus, M. Stoffel and N. Rajewsky (2005). "Combinatorial microRNA target predictions." Nat Genet 37(5): 495-500. Kremer, C. L., R. R. Klein, J. Mendelson, W. Browne, L. K. Samadzedeh, K. Vanpatten, L. Highstrom, G. A. Pestano and R. B. Nagle (2006). "Expression of mTOR signaling pathway markers in prostate cancer progression." Prostate 66(11): 1203-1212. Lai, J., M. L. Lehman, M. E. Dinger, S. C. Hendy, T. R. Mercer, I. Seim, M. G. Lawrence, J. S. Mattick, J. A. Clements and C. C. Nelson (2010). "A variant of the KLK4 gene is expressed as a cis sense-antisense chimeric transcript in prostate cancer cells." Rna 16(6): 1156-1166. Lamont, K. R. and D. J. Tindall (2010). "Androgen regulation of gene expression." Adv Cancer Res 107: 137-162. Lamoureux, F., C. Thomas, M. J. Yin, H. Kuruma, L. Fazli, M. E. Gleave and A. Zoubeidi (2011). "A novel HSP90 inhibitor delays castrate resistant prostate cancer without altering serum PSA levels and inhibits osteoclastogenesis." Clin Cancer Res. Lanz, R. B., N. J. McKenna, S. A. Onate, U. Albrecht, J. Wong, S. Y. Tsai, M. J. Tsai and B. W. O'Malley (1999). "A steroid receptor coactivator, SRA, functions as an RNA and is present in an SRC-1 complex." Cell 97(1): 17-27. Lapointe, J., C. Li, C. P. Giacomini, K. Salari, S. Huang, P. Wang, M. Ferrari, T. Hernandez-Boussard, J. D. Brooks and J. R. Pollack (2007). 
"Genomic profiling reveals alternative genetic pathways of prostate tumorigenesis." Cancer Res 67(18): 8504-8510. Leskov, K. S., D. Y. Klokov, J. Li, T. J. Kinsella and D. A. Boothman (2003). "Synthesis and functional analyses of nuclear clusterin, a cell death protein." J Biol Chem 278(13): 11590-11600. Lewis, B. P., C. B. Burge and D. P. Bartel (2005). "Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets." Cell 120(1): 15-20. Li, H., J. Wang, G. Mor and J. Sklar (2008). "A neoplastic gene fusion mimics trans-splicing of RNAs in normal human cells." Science 321(5894): 1357-1361. Lilja, H., D. Ulmert and A. J. Vickers (2008). "Prostate-specific antigen and prostate cancer: prediction, detection and monitoring." Nat Rev Cancer 8(4): 268-278. Lin, C., L. Yang, B. Tanasa, K. Hutt, B. G. Ju, K. Ohgi, J. Zhang, D. W. Rose, X. D. Fu, C. K. Glass and M. G. Rosenfeld (2009). "Nuclear Receptor-Induced Chromosomal Proximity and DNA Breaks Underlie Specific Translocations in Cancer." Cell. Locke, J. A., E. S. Guns, A. A. Lubik, H. H. Adomat, S. C. Hendy, C. A. Wood, S. L. Ettinger, M. E. Gleave and C. C. Nelson (2008). "Androgen levels increase by intratumoral de novo steroidogenesis during progression of castration-resistant prostate cancer." Cancer Res 68(15): 6407-6415. Logothetis, C. J. and S. H. Lin (2005). "Osteoblasts in prostate cancer metastasis to bone." Nat Rev Cancer 5(1): 21-28. Long, R. M., C. Morrissey, J. M. Fitzpatrick and R. W. Watson (2005). "Prostate epithelial cell differentiation and its relevance to the understanding of prostate cancer therapies." Clin Sci (Lond) 108(1): 1-11. 144  Longo, D. L. (2010). "New therapies for castration-resistant prostate cancer." N Engl J Med 363(5): 479- 481. Louro, R., H. I. Nakaya, P. P. Amaral, F. Festa, M. C. Sogayar, A. M. da Silva, S. Verjovski-Almeida and E. M. Reis (2007). "Androgen responsive intronic non-coding RNAs." BMC Biol 5: 4. Lubahn, D. B., D. 
R. Joseph, M. Sar, J. Tan, H. N. Higgs, R. E. Larson, F. S. French and E. M. Wilson (1988). "The human androgen receptor: complementary deoxyribonucleic acid cloning, sequence analysis and gene expression in prostate." Mol Endocrinol 2(12): 1265-1275. Macintosh, C. A., M. Stower, N. Reid and N. J. Maitland (1998). "Precise microdissection of human prostate cancers reveals genotypic heterogeneity." Cancer Res 58(1): 23-28. Maglott, D., J. Ostell, K. D. Pruitt and T. Tatusova (2011). "Entrez Gene: gene-centered information at NCBI." Nucleic Acids Res 39(Database issue): D52-57. Maher, C. A., C. Kumar-Sinha, X. Cao, S. Kalyana-Sundaram, B. Han, X. Jing, L. Sam, T. Barrette, N. Palanisamy and A. M. Chinnaiyan (2009). "Transcriptome sequencing to detect gene fusions in cancer." Nature. Makkonen, H., M. Kauhanen, V. Paakinaho, T. Jaaskelainen and J. J. Palvimo (2009). "Long-range activation of FKBP51 transcription by the androgen receptor via distal intronic enhancers." Nucleic Acids Res 37(12): 4135-4148. Malone, C. D. and G. J. Hannon (2009). "Small RNAs as guardians of the genome." Cell 136(4): 656-668. Mani, R. S., S. A. Tomlins, K. Callahan, A. Ghosh, M. K. Nyati, S. Varambally, N. Palanisamy and A. M. Chinnaiyan (2009). "Induced chromosomal proximity and gene fusions in prostate cancer." Science 326(5957): 1230. Marcelli, M., M. Ittmann, S. Mariani, R. Sutherland, R. Nigam, L. Murthy, Y. Zhao, D. DiConcini, E. Puxeddu, A. Esen, J. Eastham, N. L. Weigel and D. J. Lamb (2000). "Androgen receptor mutations in prostate cancer." Cancer Res 60(4): 944-949. Massie, C. E., B. Adryan, N. L. Barbosa-Morais, A. G. Lynch, M. G. Tran, D. E. Neal and I. G. Mills (2007). "New androgen receptor genomic targets show an interaction with the ETS1 transcription factor." EMBO Rep 8(9): 871-878. Mattick, J. S. (2001). "Non-coding RNAs: the architects of eukaryotic complexity." EMBO Rep 2(11): 986-991. Mattick, J. S. and I. V. Makunin (2006). "Non-coding RNA." 
Hum Mol Genet 15 Spec No 1: R17-29. Mayr, C. and D. P. Bartel (2009). "Widespread shortening of 3'UTRs by alternative cleavage and polyadenylation activates oncogenes in cancer cells." Cell 138(4): 673-684. McDonnell, T. J., P. Troncoso, S. M. Brisbay, C. Logothetis, L. W. Chung, J. T. Hsieh, S. M. Tu and M. L. Campbell (1992). "Expression of the protooncogene bcl-2 in the prostate and its association with emergence of androgen-independent prostate cancer." Cancer Res 52(24): 6940-6944. Mercer, T. R., M. E. Dinger and J. S. Mattick (2009). "Long non-coding RNAs: insights into functions." Nat Rev Genet 10(3): 155-159. Mercer, T. R., M. E. Dinger, S. M. Sunkin, M. F. Mehler and J. S. Mattick (2008). "Specific expression of long noncoding RNAs in the mouse brain." Proc Natl Acad Sci U S A 105(2): 716-721. Mercer, T. R., D. Wilhelm, M. E. Dinger, G. Solda, D. J. Korbie, E. A. Glazov, V. Truong, M. Schwenke, C. Simons, K. I. Matthaei, R. Saint, P. Koopman and J. S. Mattick (2011). "Expression of distinct RNAs from 3' untranslated regions." Nucleic Acids Res 39(6): 2393-2403. Miller, G. J. and J. M. Cygan (1994). "Morphology of prostate cancer: the effects of multifocality on histological grade, tumor volume and capsule penetration." J Urol 152(5 Pt 2): 1709-1713. Miyake, H., C. Nelson, P. S. Rennie and M. E. Gleave (2000). "Testosterone-repressed prostate message-2 is an antiapoptotic gene involved in progression to androgen independence in prostate cancer." Cancer Res 60(1): 170-176. Mohler, J. L., C. W. Gregory, O. H. Ford, 3rd, D. Kim, C. M. Weaver, P. Petrusz, E. M. Wilson and F. S. French (2004). "The androgen axis in recurrent prostate cancer." Clin Cancer Res 10(2): 440-448. Montgomery, R. B., E. A. Mostaghel, R. Vessella, D. L. Hess, T. F. Kalhorn, C. S. Higano, L. D. True and P. S. Nelson (2008). "Maintenance of intratumoral androgens in metastatic prostate cancer: a mechanism for castration-resistant tumor growth." Cancer Res 68(11): 4447-4454. Morin, R. 
D., M. D. O'Connor, M. Griffith, F. Kuchenbauer, A. Delaney, A. L. Prabhu, Y. Zhao, H. McDonald, T. Zeng, M. Hirst, C. J. Eaves and M. A. Marra (2008). "Application of massively parallel sequencing to microRNA profiling and discovery in human embryonic stem cells." Genome Res 18(4): 610-621. Mostaghel, E. A., S. T. Page, D. W. Lin, L. Fazli, I. M. Coleman, L. D. True, B. Knudsen, D. L. Hess, C. C. Nelson, A. M. Matsumoto, W. J. Bremner, M. E. Gleave and P. S. Nelson (2007). "Intraprostatic androgens and androgen-regulated gene expression persist after testosterone suppression: therapeutic implications for castration-resistant prostate cancer." Cancer Res 67(10): 5033-5041. Moucadel, V., F. Lopez, T. Ara, P. Benech and D. Gautheret (2007). "Beyond the 3' end: experimental validation of extended transcript isoforms." Nucleic Acids Res 35(6): 1947-1957. Muniyappa, M. K., P. Dowling, M. Henry, P. Meleady, P. Doolan, P. Gammell, M. Clynes and N. Barron (2009). "MiRNA-29a regulates the expression of numerous proteins and reduces the invasiveness and proliferation of human carcinoma cell lines." Eur J Cancer. Nacu, S., W. Yuan, Z. Kan, D. Bhatt, C. S. Rivers, J. Stinson, B. A. Peters, Z. Modrusan, K. Jung, S. Seshagiri and T. D. Wu (2011). "Deep RNA sequencing analysis of readthrough gene fusions in human prostate adenocarcinoma and reference samples." BMC Med Genomics 4: 11. Najafi-Shoushtari, S. H., F. Kristo, Y. Li, T. Shioda, D. E. Cohen, R. E. Gerszten and A. M. Naar (2010). "MicroRNA-33 and the SREBP host genes cooperate to control cholesterol homeostasis." Science 328(5985): 1566-1569. Nakayama, M., M. L. Gonzalgo, S. Yegnasubramanian, X. Lin, A. M. De Marzo and W. G. Nelson (2004). "GSTP1 CpG island hypermethylation as a molecular biomarker for prostate cancer." J Cell Biochem 91(3): 540-552. Nelson, P. S., N. Clegg, H. Arnold, C. Ferguson, M. Bonham, J. White, L. Hood and B. Lin (2002). "The program of androgen-responsive genes in neoplastic prostate epithelium." 
Proc Natl Acad Sci U S A 99(18): 11890-11895. Ngan, S., E. A. Stronach, A. Photiou, J. Waxman, S. Ali and L. Buluwela (2009). "Microarray coupled to quantitative RT-PCR analysis of androgen-regulated genes in human LNCaP prostate cancer cells." Oncogene. Noordzij, M. A., G. J. van Steenbrugge, T. H. van der Kwast and F. H. Schroder (1995). "Neuroendocrine cells in the normal, hyperplastic and neoplastic prostate." Urol Res 22(6): 333-341. Norris, J. D., C. Y. Chang, B. M. Wittmann, R. S. Kunder, H. Cui, D. Fan, J. D. Joseph and D. P. McDonnell (2009). "The homeodomain protein HOXB13 regulates the cellular response to androgens." Mol Cell 36(3): 405-416. Ohhata, T., Y. Hoki, H. Sasaki and T. Sado (2008). "Crucial role of antisense transcription across the Xist promoter in Tsix-mediated Xist chromatin modification." Development 135(2): 227-235. Ohno, S. (1972). "So much "junk" DNA in our genome." Brookhaven Symp Biol 23: 366-370. Onder, T. T., P. B. Gupta, S. A. Mani, J. Yang, E. S. Lander and R. A. Weinberg (2008). "Loss of E-cadherin promotes metastasis via multiple downstream transcriptional pathways." Cancer Res 68(10): 3645-3654. Ota, T., Y. Suzuki, T. Nishikawa, T. Otsuki, T. Sugiyama, et al. (2004). "Complete sequencing and characterization of 21,243 full-length human cDNAs." Nat Genet 36(1): 40-45. Page, S. T., D. W. Lin, E. A. Mostaghel, D. L. Hess, L. D. True, J. K. Amory, P. S. Nelson, A. M. Matsumoto and W. J. Bremner (2006). "Persistent intraprostatic androgen concentrations after medical castration in healthy men." J Clin Endocrinol Metab 91(10): 3850-3856. Pang, K. C., M. C. Frith and J. S. Mattick (2006). "Rapid evolution of noncoding RNAs: lack of conservation does not mean lack of function." Trends Genet 22(1): 1-5. Pearson, H. (2006). "Genetics: what is a gene?" Nature 441(7092): 398-401. Perner, S., F. Demichelis, R. Beroukhim, F. H. Schmidt, J. M. Mosquera, et al. (2006). 
"TMPRSS2:ERG fusion-associated deletions provide insight into the heterogeneity of prostate cancer." Cancer Res 66(17): 8337-8341. Perocchi, F., Z. Xu, S. Clauder-Munster and L. M. Steinmetz (2007). "Antisense artifacts in transcriptome microarray experiments are resolved by actinomycin D." Nucleic Acids Res 35(19): e128. Poliseno, L., L. Salmena, J. Zhang, B. Carver, W. J. Haveman and P. P. Pandolfi (2010). "A coding- independent function of gene and pseudogene mRNAs regulates tumour biology." Nature 465(7301): 1033-1038. Pratt, W. B., M. D. Galigniana, Y. Morishima and P. J. Murphy (2004). "Role of molecular chaperones in steroid receptor action." Essays Biochem 40: 41-58. Pruitt, K. D., T. Tatusova, W. Klimke and D. R. Maglott (2009). "NCBI Reference Sequences: current status, policy and new initiatives." Nucleic Acids Res 37(Database issue): D32-36. Quinlan, A. R. and I. M. Hall (2010). "BEDTools: a flexible suite of utilities for comparing genomic features." Bioinformatics 26(6): 841-842. Rederstorff, M., S. H. Bernhart, A. Tanzer, M. Zywicki, K. Perfler, M. Lukasser, I. L. Hofacker and A. Huttenhofer (2010). "RNPomics: Defining the ncRNA transcriptome by cDNA library generation from ribonucleo-protein particles." Nucleic Acids Res. Reis, E. M., H. I. Nakaya, R. Louro, F. C. Canavez, A. V. Flatschart, et al. (2004). "Antisense intronic non- coding RNA levels correlate to the degree of tumor differentiation in prostate cancer." Oncogene 23(39): 6684-6692. Ribas, J., X. Ni, M. Haffner, E. A. Wentzel, A. H. Salmasi, W. H. Chowdhury, T. A. Kudrolli, S. Yegnasubramanian, J. Luo, R. Rodriguez, J. T. Mendell and S. E. Lupold (2009). "miR-21: An Androgen Receptor-Regulated MicroRNA that Promotes Hormone-Dependent and Hormone-Independent Prostate Cancer Growth." Cancer Res. 147  Rickman, D. S., D. Pflueger, B. Moss, V. E. VanDoren, C. X. Chen, A. de la Taille, R. Kuefer, A. K. Tewari, S. R. Setlur, F. Demichelis and M. A. Rubin (2009). 
"SLC45A3-ELK4 is a novel and frequent erythroblast transformation-specific fusion transcript in prostate cancer." Cancer Res 69(7): 2734-2738. Riegman, P. H., R. J. Vlietstra, J. A. van der Korput, A. O. Brinkmann and J. Trapman (1991). "The promoter of the prostate-specific antigen gene contains a functional androgen responsive element." Mol Endocrinol 5(12): 1921-1930. Rinn, J. L., M. Kertesz, J. K. Wang, S. L. Squazzo, X. Xu, S. A. Brugmann, L. H. Goodnough, J. A. Helms, P. J. Farnham, E. Segal and H. Y. Chang (2007). "Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs." Cell 129(7): 1311-1323. Robertson, G., M. Hirst, M. Bainbridge, M. Bilenky, Y. Zhao, et al. (2007). "Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing." Nat Methods 4(8): 651-657. Rocchi, P., A. So, S. Kojima, M. Signaevsky, E. Beraldi, L. Fazli, A. Hurtado-Coll, K. Yamanaka and M. Gleave (2004). "Heat shock protein 27 increases after androgen ablation and plays a cytoprotective role in hormone-refractory prostate cancer." Cancer Res 64(18): 6595-6602. Rocchi, P., A. So, S. Kojima, M. Signaevsky, E. Beraldi, L. Fazli, A. Hurtado-Coll, K. Yamanaka and M. Gleave (2004). "Heat shock protein 27 increases after androgen ablation and plays a cytoprotective role in hormone-refractory prostate cancer." Cancer Res 64: 6595-6602. Ruijter, E. T., C. A. van de Kaa, J. A. Schalken, F. M. Debruyne and D. J. Ruiter (1996). "Histological grade heterogeneity in multifocal prostate cancer. Biological and clinical implications." J Pathol 180(3): 295-299. Ruizeveld de Winter, J. A., P. J. Janssen, H. M. Sleddens, M. C. Verleun-Mooijman, J. Trapman, A. O. Brinkmann, A. B. Santerse, F. H. Schroder and T. H. van der Kwast (1994). "Androgen receptor status in localized and locally progressive hormone refractory human prostate cancer." Am J Pathol 144(4): 735- 746. Rullis, I., J. A. Shaeffer and O. M. 
Lilien (1975). "Incidence of prostatic carcinoma in the elderly." Urology 6(3): 295-297. Sandberg, R., J. R. Neilson, A. Sarma, P. A. Sharp and C. B. Burge (2008). "Proliferating cells express mRNAs with shortened 3' untranslated regions and fewer microRNA target sites." Science 320(5883): 1643-1647. Scher, H. I. and C. L. Sawyers (2005). "Biology of progressive, castration-resistant prostate cancer: directed therapies targeting the androgen-receptor signaling axis." J Clin Oncol 23(32): 8253-8261. Schroder, F. H., J. Hugosson, M. J. Roobol, T. L. Tammela, S. Ciatto, et al. (2009). "Screening and prostate-cancer mortality in a randomized European study." N Engl J Med 360(13): 1320-1328. Shang, Y., M. Myers and M. Brown (2002). "Formation of the androgen receptor transcription complex." Mol Cell 9(3): 601-610. Shen, M. M. and C. Abate-Shen (2010). "Molecular genetics of prostate cancer: new prospects for old challenges." Genes Dev 24(18): 1967-2000. Shi, Y., J. Sawada, G. Sui, B. Affar el, J. R. Whetstine, F. Lan, H. Ogawa, M. P. Luke and Y. Nakatani (2003). "Coordinated histone modifications mediated by a CtBP co-repressor complex." Nature 422(6933): 735-738. 148  Singh, P., T. L. Alley, S. M. Wright, S. Kamdar, W. Schott, R. Y. Wilpan, K. D. Mills and J. H. Graber (2009). "Global changes in processing of mRNA 3' untranslated regions characterize clinically distinct cancer subtypes." Cancer Res 69(24): 9422-9430. Sinha, R., Y. Park, B. I. Graubard, M. F. Leitzmann, A. Hollenbeck, A. Schatzkin and A. J. Cross (2009). "Meat and meat-related compounds and risk of prostate cancer in a large prospective cohort study in the United States." Am J Epidemiol 170(9): 1165-1177. Small, E. J. and C. J. Ryan (2006). "The case for secondary hormonal therapies in the chemotherapy age." J Urol 176(6 Pt 2): S66-71. Smyth, G. K. (2004). "Linear models and empirical bayes methods for assessing differential expression in microarray experiments." Stat Appl Genet Mol Biol 3: Article3. 
So, A., M. Gleave, A. Hurtado-Col and C. Nelson (2005). "Mechanisms of the development of androgen independence in prostate cancer." World J Urol 23(1): 1-9.
So, A. I., A. Hurtado-Coll and M. E. Gleave (2003). "Androgens and prostate cancer." World J Urol 21(5): 325-337.
Spizzo, R., M. S. Nicoloso, C. M. Croce and G. A. Calin (2009). "SnapShot: MicroRNAs in Cancer." Cell 137(3): 586-586 e581.
Srikantan, V., Z. Zou, G. Petrovics, L. Xu, M. Augustus, et al. (2000). "PCGEM1, a prostate-specific gene, is overexpressed in prostate cancer." Proc Natl Acad Sci U S A 97(22): 12216-12221.
Stanbrough, M., G. J. Bubley, K. Ross, T. R. Golub, M. A. Rubin, T. M. Penning, P. G. Febbo and S. P. Balk (2006). "Increased expression of genes converting adrenal androgens to testosterone in androgen-independent prostate cancer." Cancer Res 66(5): 2815-2825.
Steinkamp, M. P., O. A. O'Mahony, M. Brogley, H. Rehman, E. W. Lapensee, S. Dhanasekaran, M. D. Hofer, R. Kuefer, A. Chinnaiyan, M. A. Rubin, K. J. Pienta and D. M. Robins (2009). "Treatment-dependent androgen receptor mutations in prostate cancer exploit multiple mechanisms to evade therapy." Cancer Res 69(10): 4434-4442.
Steyerberg, E. W., M. J. Roobol, M. W. Kattan, T. H. van der Kwast, H. J. de Koning and F. H. Schroder (2007). "Prediction of indolent prostate cancer: validation and updating of a prognostic nomogram." J Urol 177(1): 107-112; discussion 112.
Sun, S., C. C. Sprenger, R. L. Vessella, K. Haugk, K. Soriano, E. A. Mostaghel, S. T. Page, I. M. Coleman, H. M. Nguyen, H. Sun, P. S. Nelson and S. R. Plymate (2010). "Castration resistance in human prostate cancer is conferred by a frequently occurring androgen receptor splice variant." J Clin Invest.
Swinnen, J. V., W. Ulrix, W. Heyns and G. Verhoeven (1997). "Coordinate regulation of lipogenic gene expression by androgens: evidence for a cascade mechanism involving sterol regulatory element binding proteins." Proc Natl Acad Sci U S A 94(24): 12975-12980.
Taft, R. J., E. A. Glazov, N. Cloonan, C. Simons, S. Stephen, et al. (2009). "Tiny RNAs associated with transcription start sites in animals." Nat Genet.
Taft, R. J., C. Simons, S. Nahkuri, H. Oey, D. J. Korbie, T. R. Mercer, J. Holst, W. Ritchie, J. J. Wong, J. E. Rasko, D. S. Rokhsar, B. M. Degnan and J. S. Mattick (2010). "Nuclear-localized tiny RNAs are associated with transcription initiation and splice sites in metazoans." Nat Struct Mol Biol 17(8): 1030-1034.
Tan, J., Y. Sharief, K. G. Hamil, C. W. Gregory, D. Y. Zang, M. Sar, P. H. Gumerlock, R. W. deVere White, T. G. Pretlow, S. E. Harris, E. M. Wilson, J. L. Mohler and F. S. French (1997). "Dehydroepiandrosterone activates mutant androgen receptors expressed in the androgen-dependent human prostate cancer xenograft CWR22 and LNCaP cells." Mol Endocrinol 11(4): 450-459.
Tannock, I. F., R. de Wit, W. R. Berry, J. Horti, A. Pluzanska, K. N. Chi, S. Oudard, C. Theodore, N. D. James, I. Turesson, M. A. Rosenthal and M. A. Eisenberger (2004). "Docetaxel plus prednisone or mitoxantrone plus prednisone for advanced prostate cancer." N Engl J Med 351(15): 1502-1512.
Taplin, M. E., G. J. Bubley, Y. J. Ko, E. J. Small, M. Upton, B. Rajeshkumar and S. P. Balk (1999). "Selection for androgen receptor mutations in prostate cancers treated with androgen antagonist." Cancer Res 59(11): 2511-2515.
Taplin, M. E., G. J. Bubley, T. D. Shuster, M. E. Frantz, A. E. Spooner, G. K. Ogata, H. N. Keer and S. P. Balk (1995). "Mutation of the androgen-receptor gene in metastatic androgen-independent prostate cancer." N Engl J Med 332(21): 1393-1398.
Taplin, M. E., B. Rajeshkumar, S. Halabi, C. P. Werner, B. A. Woda, J. Picus, W. Stadler, D. F. Hayes, P. W. Kantoff, N. J. Vogelzang and E. J. Small (2003). "Androgen receptor mutations in androgen-independent prostate cancer: Cancer and Leukemia Group B Study 9663." J Clin Oncol 21(14): 2673-2678.
Taylor, B. S., N. Schultz, H. Hieronymus, A. Gopalan, Y. Xiao, et al. (2010). "Integrative genomic profiling of human prostate cancer." Cancer Cell 18(1): 11-22.
Tian, B., J. Hu, H. Zhang and C. S. Lutz (2005). "A large-scale analysis of mRNA polyadenylation of human and mouse genes." Nucleic Acids Res 33(1): 201-212.
Titus, M. A., M. J. Schell, F. B. Lih, K. B. Tomer and J. L. Mohler (2005). "Testosterone and dihydrotestosterone tissue levels in recurrent prostate cancer." Clin Cancer Res 11(13): 4653-4657.
Tomlins, S. A., D. R. Rhodes, S. Perner, S. M. Dhanasekaran, R. Mehra, et al. (2005). "Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer." Science 310(5748): 644-648.
Tran, C., S. Ouk, N. J. Clegg, Y. Chen, P. A. Watson, et al. (2009). "Development of a second-generation antiandrogen for treatment of advanced prostate cancer." Science 324(5928): 787-790.
Trapnell, C., B. A. Williams, G. Pertea, A. Mortazavi, G. Kwan, M. J. van Baren, S. L. Salzberg, B. J. Wold and L. Pachter (2010). "Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation." Nat Biotechnol 28(5): 511-515.
Tress, M. L., P. L. Martelli, A. Frankish, G. A. Reeves, J. J. Wesselink, et al. (2007). "The implications of alternative splicing in the ENCODE protein complement." Proc Natl Acad Sci U S A 104(13): 5495-5500.
Tsai, M. C., O. Manor, Y. Wan, N. Mosammaparast, J. K. Wang, F. Lan, Y. Shi, E. Segal and H. Y. Chang (2010). "Long noncoding RNA as modular scaffold of histone modification complexes." Science 329(5992): 689-693.
Ueda, T., N. Bruchovsky and M. D. Sadar (2002). "Activation of the androgen receptor N-terminal domain by interleukin-6 via MAPK and STAT3 signal transduction pathways." J Biol Chem 277(9): 7076-7085.
Ueda, T., N. R. Mawji, N. Bruchovsky and M. D. Sadar (2002). "Ligand-independent activation of the androgen receptor by interleukin-6 and the role of steroid receptor coactivator-1 in prostate cancer cells." J Biol Chem 277(41): 38087-38094.
van der Kwast, T. H., J. Schalken, J. A. Ruizeveld de Winter, C. C. van Vroonhoven, E. Mulder, W. Boersma and J. Trapman (1991). "Androgen receptors in endocrine-therapy-resistant human prostate cancer." Int J Cancer 48(2): 189-193.
Varambally, S., S. M. Dhanasekaran, M. Zhou, T. R. Barrette, C. Kumar-Sinha, M. G. Sanda, D. Ghosh, K. J. Pienta, R. G. Sewalt, A. P. Otte, M. A. Rubin and A. M. Chinnaiyan (2002). "The polycomb group protein EZH2 is involved in progression of prostate cancer." Nature 419(6907): 624-629.
Veldscholte, J., M. M. Voorhorst-Ogink, J. Bolt-de Vries, H. C. van Rooij, J. Trapman and E. Mulder (1990). "Unusual specificity of the androgen receptor in the human prostate tumor cell line LNCaP: high affinity for progestagenic and estrogenic steroids." Biochim Biophys Acta 1052(1): 187-194.
Visakorpi, T., E. Hyytinen, P. Koivisto, M. Tanner, R. Keinanen, C. Palmberg, A. Palotie, T. Tammela, J. Isola and O. P. Kallioniemi (1995). "In vivo amplification of the androgen receptor gene and progression of human prostate cancer." Nat Genet 9(4): 401-406.
Waller, A. S., R. M. Sharrard, P. Berthon and N. J. Maitland (2000). "Androgen receptor localisation and turnover in human prostate epithelium treated with the antiandrogen, casodex." J Mol Endocrinol 24(3): 339-351.
Wang, D., I. Garcia-Bassets, C. Benner, W. Li, X. Su, Y. Zhou, J. Qiu, W. Liu, M. U. Kaikkonen, K. A. Ohgi, C. K. Glass, M. G. Rosenfeld and X. D. Fu (2011). "Reprogramming transcription by distinct classes of enhancers functionally defined by eRNA." Nature.
Wang, E. T., R. Sandberg, S. Luo, I. Khrebtukova, L. Zhang, C. Mayr, S. F. Kingsmore, G. P. Schroth and C. B. Burge (2008). "Alternative isoform regulation in human tissue transcriptomes." Nature 456(7221): 470-476.
Wang, Q., J. S. Carroll and M. Brown (2005). "Spatial and temporal recruitment of androgen receptor and its coactivators involves chromosomal looping and polymerase tracking." Mol Cell 19(5): 631-642.
Wang, Q., W. Li, X. S. Liu, J. S. Carroll, O. A. Janne, E. K. Keeton, A. M. Chinnaiyan, K. J. Pienta and M. Brown (2007). "A hierarchical network of transcription factors governs androgen receptor-dependent prostate cancer growth." Mol Cell 27(3): 380-392.
Wang, Q., W. Li, Y. Zhang, X. Yuan, K. Xu, et al. (2009). "Androgen receptor regulates a distinct transcription program in androgen-independent prostate cancer." Cell 138(2): 245-256.
Wang, X., M. Kruithof-de Julio, K. D. Economides, D. Walker, H. Yu, M. V. Halili, Y. P. Hu, S. M. Price, C. Abate-Shen and M. M. Shen (2009). "A luminal epithelial stem cell that is a cell of origin for prostate cancer." Nature 461(7263): 495-500.
Wu, C. P. and F. L. Gu (1991). "The prostate in eunuchs." Prog Clin Biol Res 370: 249-255.
Yeh, S., H. K. Lin, H. Y. Kang, T. H. Thin, M. F. Lin and C. Chang (1999). "From HER2/Neu signal cascade to androgen receptor and its coactivators: a novel pathway by induction of androgen target genes through MAP kinase in prostate cancer cells." Proc Natl Acad Sci U S A 96(10): 5458-5463.
Yu, J., R. S. Mani, Q. Cao, C. J. Brenner, X. Cao, et al. (2010). "An integrated network of androgen receptor, polycomb, and TMPRSS2-ERG gene fusions in prostate cancer progression." Cancer Cell 17(5): 443-454.
Yu, J., J. Yu, D. R. Rhodes, S. A. Tomlins, X. Cao, G. Chen, R. Mehra, X. Wang, D. Ghosh, R. B. Shah, S. Varambally, K. J. Pienta and A. M. Chinnaiyan (2007). "A polycomb repression signature in metastatic prostate cancer predicts cancer outcome." Cancer Res 67(22): 10657-10663.
Yu, W., D. Gius, P. Onyango, K. Muldoon-Jacobs, J. Karp, A. P. Feinberg and H. Cui (2008). "Epigenetic silencing of tumour suppressor gene p15 by its antisense RNA." Nature 451(7175): 202-206.
Zhang, Y., T. Liu, C. A. Meyer, J. Eeckhoute, D. S. Johnson, B. E. Bernstein, C. Nussbaum, R. M. Myers, M. Brown, W. Li and X. S. Liu (2008). "Model-based analysis of ChIP-Seq (MACS)." Genome Biol 9(9): R137.
Zhao, J., T. K. Ohsumi, J. T. Kung, Y. Ogawa, D. J. Grau, K. Sarma, J. J. Song, R. E. Kingston, M. Borowsky and J. T. Lee (2010). "Genome-wide identification of polycomb-associated RNAs by RIP-seq." Mol Cell 40(6): 939-953.
Zhou, B. P., M. C. Hu, S. A. Miller, Z. Yu, W. Xia, S. Y. Lin and M. C. Hung (2000). "HER-2/neu blocks tumor necrosis factor-induced apoptosis via the Akt/NF-kappaB pathway." J Biol Chem 275(11): 8027-8031.
Zoubeidi, A., A. Zardan, E. Beraldi, L. Fazli, R. Sowery, P. Rennie, C. Nelson and M. Gleave (2007). "Cooperative interactions between androgen receptor (AR) and heat-shock protein 27 facilitate AR transcriptional activity." Cancer Res 67(21): 10455-10465.
