UBC Theses and Dissertations

miRNA sequence analysis reveals cancer subtypes that correlate with tumour characteristics and patient… Lim, Emilia Lee Yian 2016

Full Text

MIRNA SEQUENCE ANALYSIS REVEALS CANCER SUBTYPES THAT CORRELATE WITH TUMOUR CHARACTERISTICS AND PATIENT OUTCOMES

by

Emilia Lee Yian Lim

B.Sc. (Honors), University of Alberta, 2009

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Bioinformatics)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

April, 2016

© Emilia Lee Yian Lim, 2016

Abstract

microRNAs (miRNAs) are small 17-25nt RNA molecules that regulate gene expression at the post-transcriptional level. A given miRNA may have up to several hundred gene targets, and 60% of messenger RNAs (mRNAs) have binding sites for multiple miRNAs in their 3’-untranslated regions (UTRs). miRNAs have been implicated in the regulation of numerous biological processes, including cellular growth, differentiation and apoptosis, and miRNA dysregulation has been associated with diseases including cancers. miRNAs are stable and robust in a variety of fresh and preserved human tissues, and thus are useful in disease classification and subtype identification. They have also been used to infer dysregulation of regulatory pathways. With the aims of identifying cancer subtypes and relating these to clinical covariates and studying miRNA-mediated regulation, I analyzed miRNA-seq and mRNA-seq expression profiles from diffuse large B-cell lymphomas (DLBCL), pediatric acute myeloid leukemias (AML) and pediatric malignant rhabdoid tumours (MRT). My analyses provided comprehensive characterization of miRNA expression profiles, revealed molecular sub-groups within cancer types, novel miRNA species, putative miRNA prognostic markers, and candidate functional miRNA:mRNA interactions. Of note, I discovered a novel miRNA (miR-10393-3p) that was preferentially expressed in DLBCL samples, and further revealed that it could target genes involved in chromatin modification.
I also found that the miR-106a-363 cluster was not only significantly associated with inferior patient outcomes in pediatric AML, but may also contribute to treatment resistance by modulating the expression of genes involved in oxidative phosphorylation. In addition, I performed hierarchical clustering of MRT miRNA profiles together with those of 11,753 other samples representing 36 cancer types and 26 normal tissue types. This analysis demonstrated that MRT samples are most similar to cerebellum and DLBCL samples, possibly reflecting a cell of origin related to that of these samples. Overall, the research presented in this thesis constitutes a step forward in our understanding of miRNA dysregulation within cancer types and identifies miRNAs that could be useful prognostic markers in guiding treatment selection.

Preface

In conjunction with my advisor, Dr. Marco Marra, I was involved in the conceptualization and design of research activities described in this thesis. In particular, I was responsible for the design and conduct of the computational experiments, data analysis, and generation of the tables, figures, and text in the thesis. Where there are exceptions, they are noted below.

Chapter 1 was written by me, and portions of it have been published: Lim EL and Marra MA. MicroRNA dysregulation in B-cell non-Hodgkin lymphoma. BLCTT. 2013;2013(3):25–40.

Chapter 2 has been published under the Creative Commons Attribution license 4.0: Lim EL, Trinh DL, Scott DW, Chu A, Krzywinski M, Zhao YJ, Robertson AG, Mungall AJ, Schein J, Boyle M, Mottok A, Ennishi D, Johnson NA, Steidl C, Connors JM, Morin RD, Gascoyne RD, Marra MA. “Comprehensive miRNA sequence analysis reveals survival differences in diffuse large B-cell lymphoma patients.” Genome Biol. 2015;16(1):18. In this study, I conducted all bioinformatics analyses and wrote the manuscript with Marco Marra and with conceptual contributions from Ryan Morin, Randy Gascoyne and Joseph Connors.
Diane Trinh performed the wet-lab experiments, including luciferase assays and qPCR, presented in this thesis. David Scott, Nathalie Johnson, Merrill Boyle, Anja Mottok and Daisuke Ennishi prepared the tumour samples and curated the clinical information for each sample. Andrew Mungall, Yongjun Zhao, Jacqueline Schein and Richard Moore generated the sequencing libraries and the high-throughput sequence data.

A version of Chapter 3 is in preparation for submission: Lim EL, Trinh DL, Ries R, Ma Y, Topham J, Hughes M, Pleasance E, Mungall AJ, Moore R, Zhao YJ, Gerhard DS, Oehler V, Kolb EA, Gamis A, Smith M, Alonzo TA, Arceci RJ, Meshinchi S, Marra MA. “Comprehensive Transcriptome Sequence Analysis Identifies mRNA and miRNA Transcripts Associated With Relapse and Refractory Pediatric Acute Myeloid Leukemia”. In this study, I conducted all bioinformatics analyses and wrote the manuscript with Marco Marra and with conceptual contributions from Daniela Gerhard, Soheil Meshinchi and Robert Arceci. Diane Trinh performed the wet-lab experiments, including the luciferase assays presented in this thesis. Rhonda Ries prepared the tumour samples and curated the clinical information for each sample. Todd Alonzo provided statistical consultation. Andrew Mungall, Yongjun Zhao, Jacqueline Schein and Richard Moore performed high-throughput sequencing on the samples. Members of the collaborative Children’s Oncology Group (COG) and National Cancer Institute/National Institutes of Health (NCI/NIH) Therapeutically Applicable Research to Generate Effective Treatments (TARGET) initiative, including Daniela Gerhard, Vivian Oehler, E. Anders Kolb, Alan Gamis and Malcolm Smith, were involved in the original conception of this project.
Chapter 4 is part of a published manuscript: Chun HJE, Lim EL, Heravi-Moussavi A, Saberi S, Mungall KL, Bilenky M, Carles A, Tse K, Shlafman I, Zhu K, Qian JQ, Harvey D, He A, Long W, Goya R, Ng M, LeBlanc V, Pleasance E, Thiessen N, Wong T, Chuah E, Zhao YJ, Schein JE, Gerhard DS, Taylor MD, Mungall AJ, Moore RA, Ma Y, Jones SJM, Perlman EJ, Hirst M, Marra MA. "Genome-wide profiles of extra-cranial malignant rhabdoid tumours reveal molecularly distinct sub-groups with dysregulated developmental pathways" Cancer Cell. 2016;29(3):394–406. While the work presented in this chapter is part of a larger effort, which includes whole genome, transcriptome and epigenome profiles of 40 MRT cases, the analyses reported here consist only of bioinformatics analyses performed by me. Portions of the text in this chapter were adapted from the submitted manuscript, and were written in collaboration with Elizabeth Chun and Marco Marra, with conceptual contributions from Elizabeth Perlman and Daniela Gerhard. Andrew Mungall, Yongjun Zhao, Jacqueline Schein and Richard Moore performed high-throughput sequencing on the samples. Members of the NCI/NIH TARGET initiative, including Daniela Gerhard, were involved in the original conception of this project. Chapter 5 was written by me. The research in this thesis was approved by the UBC Research Ethics Board – Chapter 2 (REB number H05-60103); Chapter 3 and 4 (REB number H09-02558).  v Table of Contents Abstract .................................................................................................................................... ii	Preface ..................................................................................................................................... iii	Table of Contents .................................................................................................................... 
v	List of Tables ........................................................................................................................... x	List of Figures ......................................................................................................................... xi	List of Abbreviations ........................................................................................................... xiii	Acknowledgements .............................................................................................................. xvi	Dedication ........................................................................................................................... xviii	1	 Studying miRNA Dysregulation in Cancer .................................................................... 1	1.1	 Introduction ............................................................................................................................... 1	1.2	 Early insights into cancer as a genetic disease ......................................................................... 2	1.2.1	 Oncogenes and tumour suppressors .................................................................................... 3	1.2.2	 Tumourigenesis requires multiple genetic alterations ......................................................... 4	1.3	 Cancer is a heterogeneous disease ............................................................................................ 5	1.3.1	 Genetic heterogeneity .......................................................................................................... 5	1.3.2	 Intra-patient heterogeneity ................................................................................................... 6	1.4	 Global molecular profiles of cancer .......................................................................................... 
7	1.4.1	 Gene expression analysis ..................................................................................................... 7	1.4.2	 Sequence-based molecular profiles ..................................................................................... 8	1.5	 miRNA-mediated repression .................................................................................................... 9	1.5.1	 miRNA-biogenesis and mechanism of action ..................................................................... 9	1.6	 miRNA dysregulation in cancer ............................................................................................. 11	1.6.1	 miRNAs as tumour suppressors ........................................................................................ 11	1.6.2	 miRNAs as oncogenes ....................................................................................................... 12	1.6.3	 Causes of miRNA dysregulation ....................................................................................... 13	1.6.3.1	 Copy number alterations .......................................................................................... 13	1.6.3.2	 Translocations .......................................................................................................... 13	1.6.3.3	 Aberrant transcription factor activity ....................................................................... 14	1.6.3.4	 Chromatin modification ........................................................................................... 14	1.6.3.5	 Viral infection .......................................................................................................... 15	1.6.3.6	 Defects in miRNA biogenesis .................................................................................. 16	1.6.3.7	 Alterations in miRNA targets .................................................................................. 
17	1.6.4	 The clinical utility of miRNAs .......................................................................................... 17	 vi 1.6.4.1	 miRNAs as diagnostic and prognostic tools in cancer ............................................. 17	1.6.4.2	 Correcting for aberrantly expressed miRNAs .......................................................... 18	1.6.5	 miRNAs in treatment resistance ........................................................................................ 19	1.7	 miRNA profiling approaches .................................................................................................. 20	1.7.1	 Probe-based miRNA profiling ........................................................................................... 20	1.7.2	 miRNA sequencing (miRNA-seq) .................................................................................... 20	1.7.3	 Profiling miRNAs in FFPET tissues ................................................................................. 21	1.8	 Identification of miRNA-mediated repression interactions .................................................... 21	1.8.1	 Prediction of miRNA binding sites ................................................................................... 21	1.8.2	 Integrative miRNA and mRNA expression profile analysis ............................................. 22	1.8.3	 Experimental validation of miRNA:mRNA interactions .................................................. 23	1.9	 miRNA expression profiles of cancers studied in this thesis .................................................. 24	1.9.1	 Diffuse large B-cell lymphoma ......................................................................................... 24	1.9.2	 Pediatric acute myeloid leukemia ...................................................................................... 25	1.9.3	 Pediatric malignant rhabdoid tumours ............................................................................... 
26	1.10	 Thesis objectives and chapter overview ............................................................................... 27	1.11	 Figures .................................................................................................................................. 29	2	 Comprehensive miRNA Sequence Analysis Reveals Survival Differences in Diffuse Large B-cell Lymphoma Patients .................................................................... 30	2.1	 Introduction ............................................................................................................................. 30	2.2	 Results ..................................................................................................................................... 31	2.2.1	 miRNA sequencing of fresh frozen DLBCL tumor and centroblast samples ................... 31	2.2.2	 Novel miRNA discovery ................................................................................................... 32	2.2.3	 miRNA expression in DLBCL .......................................................................................... 33	2.2.4	 B-cell-enriched miRNA expression profiles ..................................................................... 34	2.2.5	 Integrative analysis of miRNA and mRNA expression .................................................... 35	2.2.6	 miRNAs associated with DLBCL patient outcome ........................................................... 36	2.2.6.1	 R-CHOP-treated discovery cohort ........................................................................... 36	2.2.6.2	 miRNAs associated with patient survival ................................................................ 37	2.2.6.3	 R-CHOP-treated validation cohort ........................................................................... 37	2.2.6.4	 Validation of miRNAs associated with patient survival .......................................... 
38	2.2.6.5	 miRNA expression profiles associated with patient survival .................................. 38	2.3	 Discussion ............................................................................................................................... 40	2.4	 Methods .................................................................................................................................. 44	2.4.1	 Lymphoma patient samples (both discovery and validation cohorts) ............................... 44	 vii 2.4.2	 Patient sample acquisition (discovery cohort) ................................................................... 44	2.4.3	 Patient sample acquisition (validation cohort) .................................................................. 44	2.4.4	 Library construction and sequencing of miRNA-seq Illumina libraries ........................... 44	2.4.5	 Discovery of candidate novel miRNAs ............................................................................. 45	2.4.6	 Analysis of HITS-CLIP data ............................................................................................. 45	2.4.7	 Quantitative RT-PCR for novel miRNA validation .......................................................... 46	2.4.8	 mRNA isoform-specific expression profiling with mRNA-seq ........................................ 46	2.4.9	 Differential expression analysis ........................................................................................ 46	2.4.10	Integrative miRNA:mRNA expression analysis ............................................................... 47	2.4.11	Gene ontology (GO) term enrichment analysis ................................................................. 48	2.4.12	Cell culture ........................................................................................................................ 48	2.4.13	Plasmid constructs ............................................................................................................. 
48	2.4.14	miRNA mimics .................................................................................................................. 49	2.4.15	Dual-luciferase reporter assays .......................................................................................... 49	2.4.16	Survival analysis ................................................................................................................ 49	2.4.17	NMF clustering of miRNA-seq expression ....................................................................... 50	2.5	 Figures and tables ................................................................................................................... 50	3	 Comprehensive Transcriptome Sequence Analysis of Relapse and Induction Failure in Pediatric Acute Myeloid Leukemia Reveals Transcripts Associated With Treatment Resistance ........................................................................................... 85	3.1	 Introduction ............................................................................................................................. 85	3.2	 Results ..................................................................................................................................... 87	3.2.1	 Sequence analysis of pediatric AML samples ................................................................... 87	3.2.2	 Unsupervised clustering of mRNA transcript expression reveals 5 sub-groups correlated with survival differences .................................................................................. 88	3.2.3	 mRNA transcripts associated with treatment failure ......................................................... 89	3.2.4	 mRNA transcripts associated with patient survival ........................................................... 
91	3.2.5	 Unsupervised clustering of miRNA expression profiles reveals 2 sub-groups characterized by specific cytogenetic alterations .............................................................. 92	3.2.6	 miRNA expression in treatment resistant patient samples ................................................ 93	3.2.7	 Integrative analysis reveals putative miRNA:mRNA interactions in pediatric AML ....... 94	3.2.8	 miRNAs associated with patient survival .......................................................................... 95	3.2.9	 Targets of miR-106a-363 are involved in oxidative phosphorylation ............................... 96	3.2.10	Luciferase reporter assays confirm potential miR-106a interactions ................................ 97	3.3	 Discussion ............................................................................................................................... 97	3.4	 Methods ................................................................................................................................ 100	 viii 3.4.1	 Patient samples & treatment protocol .............................................................................. 100	3.4.2	 Cell lines .......................................................................................................................... 100	3.4.3	 Plasmid constructs and miRNA mimics .......................................................................... 101	3.4.4	 Library construction and sequencing of miRNA-seq Illumina libraries ......................... 101	3.4.5	 Library construction and sequencing of mRNA-seq Illumina libraries .......................... 101	3.4.6	 mRNA isoform-specific expression profiling of mRNA-seq .......................................... 103	3.4.7	 BLASTP .......................................................................................................................... 
103	3.4.8	 Differential expression analysis ...................................................................................... 103	3.4.9	 Integrative miRNA:mRNA expression analysis ............................................................. 103	3.4.10	Survival analysis .............................................................................................................. 104	3.4.11	NMF clustering of miRNA and mRNA expression ........................................................ 105	3.4.12	Dual-luciferase reporter assays ........................................................................................ 105	3.5	 Figures and tables ................................................................................................................. 106	4	 Characterization of miRNA Expression in Malignant Rhabdoid Tumours Reveals Dysregulated Transcripts Associated With Different Tissue Types .......... 149	4.1	 Introduction ........................................................................................................................... 149	4.2	 Results ................................................................................................................................... 152	4.2.1	 Unsupervised clustering of miRNA expression profiles reveals 2 sub-groups ............... 152	4.2.2	 Pan-cancer/tissue miRNA analysis reveals tissue types that are similar to MRT ........... 154	4.2.3	 miRNA expression comparison with normal cerebellum identifies potentially perturbed miRNA-mediated gene regulation in MRT ..................................................... 155	4.3	 Discussion ............................................................................................................................. 157	4.4	 Methods ................................................................................................................................ 
158	4.4.1	 Tissue samples and cell lines ........................................................................................... 158	4.4.2	 RNA-seq data generation and processing ........................................................................ 158	4.4.3	 miRNA-seq data generation and processing ................................................................... 159	4.4.4	 Hierarchical clustering of miRNA expression profiles from MRT and other cancer and normal tissue types ................................................................................................... 159	4.4.5	 Differential expression analyses ...................................................................................... 160	4.4.6	 NMF clustering of MRT miRNA expression profiles ..................................................... 160	4.4.7	 Identification of candidate miRNA:mRNA interactions through integrative miRNA:mRNA analysis .................................................................................................. 160	4.5	 Figures and tables ................................................................................................................. 161	5	 Conclusions and Future Directions ............................................................................. 172	5.1	 miRNA expression profiling reveals heterogeneity within cancer types ............................. 172	5.2	 miRNAs as biomarkers and therapeutic targets .................................................................... 173	 ix 5.3	 Identification of novel miRNA species and variation .......................................................... 175	5.4	 Integrative miRNA:mRNA analysis reveals miRNA-mediated repression interactions ...... 175	5.5	 Pan-cancer miRNA analysis associates samples of the same tissue type ............................. 177	5.6	 Conclusion ............................................................................................................................ 
177	Bibliography ........................................................................................................................ 178	Appendices ........................................................................................................................... 199	Appendices for Chapter 3 ............................................................................................................... 199	Appendix 3A: Differentially expressed mRNA transcripts in pediatric AML .......................... 199	mRNA transcripts characteristic of mRNA sub-group 1 ...................................................... 199	mRNA transcripts characteristic of mRNA sub-group 2 ...................................................... 199	mRNA transcripts characteristic of mRNA sub-group 3 ...................................................... 202	mRNA transcripts characteristic of mRNA sub-group 4 ...................................................... 204	mRNA transcripts characteristic of mRNA sub-group 5 ...................................................... 209	Relapse samples compared with primary samples ................................................................ 213	Refractory samples compared with primary samples ............................................................ 215	Appendix 3B: Differentially expressed miRNA transcripts in pediatric AML ......................... 228	Differentially expressed miRNAs between miRNA sub-groups ........................................... 228	Relapse samples compared with primary samples ................................................................ 241	Refractory samples compared with primary samples ............................................................ 249	Primary samples from refractory patients compared with primary samples from complete response patients .................................................................................................................... 
258	Samples with the cytogenetic alteration compared with samples without the cytogenetic alteration ................................................................................................................................ 262	miRNA transcripts characteristic of mRNA sub-group 2 ..................................................... 269	miRNA transcripts characteristic of mRNA sub-group 3 ..................................................... 270	miRNA transcripts characteristic of mRNA sub-group 4 ..................................................... 271	miRNA transcripts characteristic of mRNA sub-group 5 ..................................................... 275	Appendices for Chapter 4 ............................................................................................................... 278	Appendix 4A: Differentially expressed miRNA transcripts in pediatric MRT ......................... 278	MRT compared with normal fetal cerebellum ...................................................................... 278	Differentially expressed miRNAs between miRNA sub-groups ........................................... 283	     x List of Tables Table 2-1	 Clinical characteristics of the patients in the DLBCL discovery cohort .............. 80	Table 2-2	 Genomic coordinates of expressed candidate novel miRNAs ............................. 81	Table 2-3	 Clinical characteristics of the patients in the DLBCL validation cohort ............. 84	Table 3-1    Clinical characteristics of all 676 AML patients included in this study ............ 146	Table 4-1    List of identifiers and clinical information for 40 MRT cases .......................... 168	Table 4-2    Abbreviations of each cancer and normal tissue types included in the pan-cancer miRNA clustering analysis ..................................................................... 

List of Figures

Figure 1.1 miRNA biogenesis and function
Figure 2.1 Profiling miRNA expression in DLBCL
Figure 2.2 Differential expression analyses of known and candidate novel miRNAs
Figure 2.3 miR-142 expression in DLBCL, centroblasts, and other cancers
Figure 2.4 Pipeline for discovering putative miRNA:mRNA interactions
Figure 2.5 Candidate miRNA:mRNA interactions in DLBCL are involved in various cellular processes
Figure 2.6 MLL2 and EP300 may be targets of miRNA-mediated repression
Figure 2.7 Kaplan-Meier (KM) curves illustrating DLBCL patient survival
Figure 2.8 Survival analyses in the discovery and validation cohorts
Figure 2.9 Heat map comparing matched discovery cohort FF and validation cohort FFPET samples for 28 cases
Figure 2.10 Kaplan-Meier curves and strip charts of expression levels for the 6 miRNAs that were found to be associated with OS and PFS, independently of COO and IPI in both the discovery and validation cohorts
Figure 2.11 NMF Clustering identifies two clusters of DLBCL patients with distinct miRNA and outcome profiles
Figure 2.12 Non-negative matrix factorization (NMF) solutions and metrics
Figure 2.13 miR-148a and miR-21 expression levels are associated with survival
Figure 3.1 Transcriptome analysis of pediatric AML
Figure 3.2 Unsupervised NMF clustering of mRNA expression profiles from primary samples
Figure 3.3 mRNA k2-15 NMF Metrics
Figure 3.4 Complementary methods of clustering mRNA expression profiles
Figure 3.5 Differential expression analysis of mRNA transcript expression between relapse and primary samples
Figure 3.6 mRNAs associated with OS or EFS in the mRNA discovery and validation cohorts
Figure 3.7 Kaplan-Meier plots for mRNAs associated with OS or EFS in the mRNA discovery and validation cohorts
Figure 3.8 Unsupervised NMF clustering of miRNA expression profiles of primary samples
Figure 3.9 miRNA k2-15 NMF Metrics
Figure 3.10 Comparing miRNA NMF sub-groups (k=2) and mRNA NMF sub-groups (k=5)
Figure 3.11 miRNA expression in relapse and refractory AML
Figure 3.12 Workflow for miRNA:mRNA integrative analysis
Figure 3.13 Identifying miRNA:mRNA expression correlations
Figure 3.14 Integrative miRNA:mRNA analysis identifies putative miRNA targets
Figure 3.15 miRNAs associated with patient survival
Figure 3.16 Kaplan-Meier plots of miRNAs associated with OS and EFS in the miRNA discovery and validation cohorts
Figure 3.17 Clinical characteristics of AML patients in high and low miR-106a-5p expression groups
Figure 3.18 miR-106a-5p targets genes involved in oxidative phosphorylation
Figure 4.1 Two sub-groups are revealed by NMF clustering of miRNA expression within the MRT samples
Figure 4.2 Metrics from NMF clustering of miRNA expression profiles
Figure 4.3 Three groups of MRT samples are revealed by clustering miRNA expression profiles of MRT with other tumour and normal tissue types
Figure 4.4 Pvclust result of miRNA pan-can hierarchical clustering

List of Abbreviations

ABC – Activated B-cell-like
ALL – Acute lymphoblastic leukemia
AML – Acute myeloid leukemia
BH – Benjamini-Hochberg
BL – Burkitt’s Lymphoma
CDS – Coding region/sequence
CML – Chronic myelogenous leukemia
CNV – Copy number variant
CR – Complete response
DLBCL – Diffuse large B-cell lymphoma
DNA – Deoxyribonucleic acid
EBV – Epstein-Barr virus
EFS – Event-free survival
EMT – Epithelial-Mesenchymal Transition
FAB – French-American-British
FFPET – Formalin-fixed paraffin-embedded tissue
FPKM – Fragments per kilobase of transcript per million mapped reads
GCB – Germinal centre B-cell-like
GO – Gene ontology
ICGC – International Cancer Genome Consortium
IF – Induction failure
KM – Kaplan-Meier
MAGIC – Medulloblastoma Advanced Genomics International Consortium
MD – Medulloblastoma
MCL – Mantle Cell Lymphoma
miRISC – microRNA-induced silencing complex
miRNA – microRNA
miRNA-seq – miRNA sequencing
MRD – Minimal residual disease
mRNA – Messenger RNA
mRNA-seq – mRNA sequencing
MRT – Malignant rhabdoid tumours
NER – Nucleotide excision repair
NMF – Non-negative matrix factorization
OS – Overall survival
PFS – Progression free survival
RNA – Ribonucleic acid
RPM – Reads per million mapped reads
RSV – Rous strain of avian sarcoma virus
SNV – Single nucleotide variant
SV – Structural variant
TARGET – Therapeutically Applicable Research To Generate Effective Treatments
TCGA – The Cancer Genome Atlas
UTR – Untranslated region
WHO – World Health Organization
XP – Xeroderma pigmentosum

Acknowledgements

My deepest gratitude goes to my PhD advisor, Dr. Marco Marra. He has been a steadfast mentor throughout my PhD studies, and I am infinitely grateful for his continuous encouragement and commitment to me. His scientific prowess is exceptional and his insight into my work is invaluable. He is also a sage and a philosopher, who has an unwavering ability to take the high road and the long view.
I truly appreciate his genuine passion for discovery, remarkable standards for personal integrity and the many lessons in life that he has taught me. He is the best PhD advisor that anyone could ever ask for. Over the course of my PhD, I have been privileged to interact with many talented scientists, clinicians and professionals. To these individuals, I am indebted for my academic success. I thank my committee members Drs. Brad Hoffman, Aly Karsan, and Paul Pavlidis for voluntarily advising and overseeing my research. I would like to also thank the other examiners who have donated their time to read this thesis and participate in my defense.  I have been incredibly blessed to be part of Canada’s Michael Smith Genome Sciences Center, an institution filled with wonderful people who are passionate about science. I would like to thank Drs. Andrew Mungall, Richard Moore, YJ Zhao, Yussanne Ma and Jacqueline Schein for leading the teams that generated the sequence data for my research. I would also like to thank Dr. Gordon Robertson and Andy Chu for teaching me the basics of miRNA analysis, and the systems team for ensuring that I always had everything I needed for my computational analyses. My research projects would not be possible without the administrative expertise and patience of Lulu Crisostomo and our project managers: Drs. Sherry Wang and Karen Novik. To them, I am grateful. The Marra lab has played a pivotal role in my PhD studies – I am appreciative of the camaraderie and support provided to me by lab mates who were constantly there to share in my scientific joys and frustrations. I am truly indebted to Diane Trinh whose technical prowess has given rise to the many wet-lab validation experiments in this thesis. I would also like to thank past and present fellow lab members – Dr. Malachi Griffith, Dr. Ryan Morin, Dr. Olena Morozova, Dr. Sorana Morrissy, Dr. Noushin Farnoud, Dr. Julia Pon, Dr. Suganthi Chittaranjan, Dr. 
Jill Mwenifumbo, Rodrigo Goya, Ryan Huff, Marlo Firme, Elizabeth Chun, James Topham, Michelle Ng, Jay Song, Susana Chan, Veronique LeBlanc, Dr. Robert Camfield, Dr. Farah Zahir, Dr. Alessia Gagliardi and Dr. Isabel Serrano – who have taken time to discuss my work and to provide me with wisdom from their own experiences.

I have been fortunate to participate in a number of collaborative projects that have allowed me to interact with many world-class scientists and clinicians. I am honored to have been part of the National Cancer Institute’s Acute Myeloid Leukemia and Rhabdoid Tumour Therapeutically Applicable Research To Generate Effective Treatments (TARGET) initiatives, and would like to thank Drs. Soheil Meshinchi, Robert Arceci, Daniela Gerhard, and Elizabeth Perlman for this opportunity. I also extend my thanks to Dr. Todd Alonzo for his statistical support and Rhonda Ries for preparing the AML patient samples used in my research. I have also been part of the Lymphoma Research Center at the British Columbia Cancer Agency, and would like to thank Drs. Randy Gascoyne, Joseph Connors and Christian Steidl for this opportunity. I also extend my thanks to Dr. David Scott for teaching me about survival analysis, Merrill Boyle, Dr. Diasuke Ennishi, Dr. Anja Mottok and Dr. Natalie Johnson for preparing the DLBCL patient samples, and Martin Krzywinski for sharing his artistic talents with me.

I am appreciative of personal financial support from both the Canadian Institutes for Health Research and the University of British Columbia in the forms of the Banting and Best Doctoral Award and the Four Year Fellowship respectively. The projects in this thesis have also been supported by the National Cancer Institute (National Institutes of Health), the Terry Fox Research Institute, the BC Cancer Foundation and John Auston.
Finally, I sincerely thank my family - my grandparents, parents, uncles, aunts, Jit, cousins and Fong - for all their patience and support throughout my life. Their wisdom and encouragement are the motivation behind my pursuit of higher education, and their love and affirmation have allowed me to strive for excellence amidst the most challenging situations. A special thanks goes out to Fong, whose friendship through my studies has been invaluable.

Dedication

To my family.

1 Studying miRNA Dysregulation in Cancer

1.1 Introduction

Cancer is a disease marked by a high degree of heterogeneity. Genetic aberrations exist in all tumours, and a subset of them are present in different cancer types at variable frequencies [1]. Such genetic variability may result in different diagnoses, prognoses, treatment selection and treatment responses in patients. As such, an improved understanding of the genetic underpinnings of cancer heterogeneity may contribute to more precise diagnoses and treatment options, and consequently improved outcomes for patients. Genetic alterations include large-scale structural alterations, single nucleotide variations (SNVs), epigenetic alterations and aberrant mRNA and miRNA expression patterns. In particular, the role of dysregulated miRNA expression in the pathogenesis of cancer has recently emerged [2]. Moreover, in the context of cancer research and clinical practice, miRNAs offer several advantages over other molecular analytes, including tissue specificity, a relatively low cost to profile, and stability in a variety of fresh and preserved contexts [3]. Thus, the over-arching goal of this thesis was to study miRNA expression profiles in order to identify cancer subtypes and relate these to clinical covariates, reasoning that this might yield candidate miRNA biomarkers and therapeutic targets, and provide insight into the heterogeneous biologies that converge to form cancers.
This chapter introduces cancers as heterogeneous diseases and reviews miRNA dysregulation in cancers. It also discusses the approaches that have been developed for miRNA analysis, with an emphasis on the sequence-based technologies that are used for the research described in Chapters 2, 3 and 4 of this thesis. Since the focus of the thesis is on the miRNA sequence analysis of diffuse large B-cell lymphoma (DLBCL), pediatric acute myeloid leukemia (AML) and pediatric malignant rhabdoid tumours (MRT), brief overviews of miRNA dysregulation in each of the diseases are provided in Section 1.9. Finally, Section 1.10 introduces the specific hypotheses and experimental goals addressed in each of the data chapters of this thesis.

1.2 Early insights into cancer as a genetic disease

The notion of cancer being a genetic disease was first proposed by David von Hansemann in 1890 [4]. In his study of 13 carcinoma samples, von Hansemann observed that multipolar mitosis led to asymmetric distributions of chromosomes during cell division. He then postulated that these abnormalities were responsible for increases or decreases in chromatin content in cancer cells [4]. Theodor Boveri subsequently validated this theory through his experiments on sea urchin eggs, where he observed that aberrant mitosis would frequently result in cell death, but would occasionally result in uncontrolled cell proliferation. This concept laid the foundation for viewing cancer as a genetic disease and formed the basis of several of Boveri’s astoundingly accurate hypotheses about cancer. These hypotheses are now commonly accepted concepts. They include predictions of the existence of tumour suppressor genes and oncogenes, of cell cycle checkpoints and of the clonal origins of cancer, as well as the prediction that damaging events that promote aberrant mitoses cause cancer [5].
In agreement with Boveri’s predictions, several subsequent early efforts, including those by Hermann Muller [6] and Edgar Altenburg [7], demonstrated that carcinogens, such as ionizing radiation and ultraviolet light, can act as DNA damaging agents [8]. For instance, ultraviolet light induces adducts in DNA which are typically rectified by DNA nucleotide-excision repair (NER) mechanisms [9]. However, when NER is deficient in syndromes such as xeroderma pigmentosum (XP), these adducts result in permanent DNA damage. As such, patients diagnosed with XP have an increased predisposition to skin cancer [10]. In 1960, Peter Nowell and David Hungerford identified the first genetic defect that was associated with cancer. In their study of chronic myelogenous leukemia (CML) samples, they observed unusually short lengths of chromosome 22 [11]. This defect, later termed the “Philadelphia chromosome”, was subsequently shown to consist of a reciprocal translocation between chromosomes 9 and 22 [12] that results in a fusion between the BCR and ABL genes [13]. This oncogenic fusion gives rise to a constitutively active tyrosine kinase that promotes rapid cell division and inhibits DNA repair [14]. These findings further underscored the genetic basis of cancer.

1.2.1 Oncogenes and tumour suppressors

It is now widely accepted that cancer is a genetic disease, where the dysregulation of oncogenes and tumour-suppressor genes confers cells with a selective growth advantage [5].

Oncogenes encode proteins that control cell proliferation and apoptosis, and their activation, through mechanisms such as genomic mutation or over-expression, typically results in rapid and uncontrolled cell proliferation [15]. The discovery of oncogenes began in 1970 when G.S. Martin observed that the Rous strain of avian sarcoma virus (RSV) conferred tumourigenic properties in chickens [16]. Stehelin et al. further observed that RSV acquired its oncogenic activity by recombining with the chicken ‘c-src’ oncogene [17].
Soon after, genes that were distantly related to ‘c-src’ were identified in human and mouse DNA [18], and the related phosphoprotein was found in tumourigenic chicken, rat, quail and human cells [19,20]. These studies collectively constitute the first evidence of the role of viruses in tumorigenesis in human cells, and led to the hypothesis that mutations in human genes could also transform cells in the absence of any viral involvement. This hypothesis was verified by 3 groups that isolated the KRAS gene from bladder carcinoma cell lines [21-23], and went on to show that the transforming mutation altered a single amino acid, from glycine to valine, at position 12 [24]. This work led to studies that identified the oncogenic role of KRAS in signal transduction in tumours, and inspired numerous subsequent efforts that discovered other human oncogenes [25].

Tumour suppressor genes prevent tumourigenesis and may do so by repressing the progression of the cell cycle, preventing metastasis or inducing DNA damage repair or apoptosis when DNA cannot be repaired [26]. In 1971, Alfred Knudson performed a series of experiments on 48 cases of retinoblastoma, and noticed that there were 2 independently acquired mutations in the RB1 gene in each sample. Interestingly, he also observed that the presence of multiple tumours or tumours in both eyes occurred more frequently in familial (inherited) retinoblastoma when compared with the sporadic form of the disease. Based on statistical modeling of the frequency of these mutations, he conceived the ‘two-hit hypothesis’. This hypothesis postulates that in familial cancers, a germ line mutation in a gene could be inherited and could subsequently be followed by somatic loss of the homologous allele, resulting in a loss of both functional gene copies, whereas in sporadic cancers, both alleles would be somatically lost [27].
Eventually, the tumour suppressive role of RB1 was confirmed by the rescue of neoplastic RB1-deficient retinoblastoma cell lines with wild type RB1 via retroviral gene transfer [28]. Similarly, Baker et al. observed that the growth of TP53-deficient colorectal cells was suppressed by wild type p53 [29]. Subsequent efforts demonstrated that TP53 is a crucial tumour suppressor gene that has roles in orchestrating cellular processes including cell cycle arrest, cellular senescence, apoptosis, metabolism, stem cell maintenance and communication with the microenvironment [30].

Interestingly, mutations in TP53 not only result in a loss of function but may also result in a gain of function, where mutant p53 protein has an increased half-life or acquires the ability to interact with new protein partners and activate the expression of new gene targets [31]. The same is true of EZH2, where gain of function mutations, loss of function mutations and over-expression of this gene have been associated with aggressive forms of cancer [32]. These results suggest that the definition of oncogenes and tumour suppressors may not be clear-cut.

1.2.2 Tumourigenesis requires multiple genetic alterations

Tumourigenesis is a complex process but has several essential requirements that were originally summarized into 6 hallmarks by Hanahan and Weinberg [33]. These hallmarks included: 1) evading apoptosis, 2) self-sufficiency in growth signals, 3) insensitivity to anti-growth signals, 4) sustained angiogenesis, 5) limitless replicative potential and 6) tissue invasion and metastasis. Eleven years later, they added 4 emerging hallmarks to this list [34]: 7) deregulation of cellular energetics, 8) avoidance of immune destruction, 9) genome instability and mutation, and 10) tumour-promoting inflammation.
Since the initial discoveries of oncogenes and tumour suppressors, the identification of genetic aberrations that result in tumourigenesis has been a key focus of cancer research. Efforts aimed at doing so have revealed that at least 570 of the ~22,000 known protein coding genes (2.5%) harbor recurrent mutations with evidence of associations with cancer [35], and at least 138 of these genes can promote or drive tumourigenesis by modulating one or more of the hallmarks of cancer [36].

In 1990, Eric Fearon and Bert Vogelstein reconciled the existence of aberrations of multiple oncogenes and tumour suppressors with their model of multi-step tumourigenesis [37]. Their study was based on observations of co-occurring genomic alterations, including mutations in the KRAS oncogene, losses of the TP53 tumour-suppressor gene and aberrant hypomethylation in colorectal cancer samples. Through their analysis they noted that at least 4 to 5 mutations were required for the formation of a malignant tumour. They also noted that while each mutation appeared to be associated with a different stage of the disease, it was the accumulation of alterations, rather than their sequence of occurrence, that was responsible for the tumour’s biological characteristics. It is now widely understood that most human cancers harbor numerous genetic and epigenetic changes that are required for the step-wise progression of tumourigenesis [36].

1.3 Cancer is a heterogeneous disease

Cancer classification has traditionally been based on the anatomical location and morphology of the primary tumour. Patient tumour samples are analyzed under the microscope, and histopathology and immunostaining are used to classify the cancer and achieve a diagnosis. However, cancers that present with identical morphological and histopathological features can exhibit very different clinical behavior, suggesting that heterogeneity exists between tumours.
For instance, while the general microscopic appearance of medulloblastoma (MD) consists of small round blue cells, there are at least 4 MD subtypes (i.e., Wnt, sonic hedgehog, group 3, and group 4) that have characteristic molecular signatures and distinctive clinical features, and that are associated with different outcomes [38]. This phenomenon, referred to as genetic heterogeneity, has implications for a patient’s diagnosis, treatment plan, prognosis and response to treatment [1].

1.3.1 Genetic heterogeneity

Genetic heterogeneity underpins at least part of inter-patient heterogeneity. For instance, the M3 subtype of AML, also known as acute promyelocytic leukemia, is distinguished from the other subtypes of AML by the presence of a chromosomal translocation involving retinoic acid receptor alpha (RARA). Thus, treatment with all-trans retinoic acid is effective in patients with M3-AML [39]. In addition, diagnosis and treatment selection based on the presence of the translocation have resulted in improved outcomes of this disease, where the relapse rate dropped from 68.4% to 20.6% [40]. Genetic heterogeneity may also be explained by germ line or somatic variants in a subset of patients that affect their response to drug treatment. For example, DPYD encodes an enzyme that controls the rate-limiting step in 5-FU inactivation in the liver. Consequently, patients with deactivating germ line alterations in DPYD do not respond well to 5-FU treatment [41].

1.3.2 Intra-patient heterogeneity

Genetic heterogeneity may also exist among the cells of a single tumour. For example, cytogenetic analyses of cells from a single tumour rarely report that all cells have the same karyotype [42]. A study of individual cells in adenocarcinomas indicated that while most cells harboured TP53 mutations, only a subset of cells harboured KRAS mutations [43].
In another study, Bashashati et al. [44] sampled spatially separated specimens from individual high-grade serous cancer patients, and revealed that on average only 51.5% of mutations were present in every sample of a given case. Genetic heterogeneity is also evident between serially sampled tumours from the same patient. For example, Ding et al. [45] demonstrated a dynamic shift in the genomic profiles between paired primary and relapse biopsies of 8 AML patients. In particular, they observed 2 patterns of clonal evolution in AML relapse: 1) the founding clone in the primary tumour gained mutations and evolved into the relapse tumour, and 2) a subclone within the primary tumour survived therapy and expanded at the time of relapse. Likewise, genetic heterogeneity may also exist between primary and metastatic tumours within the same patient. For instance, Shah et al. [46] sequenced the genomes and transcriptomes of primary and metastatic samples from a lobular breast cancer patient, and observed that only 5 of the 32 (16%) somatic mutations were shared between metastatic samples and corresponding primary samples. This study suggested that patients with metastases could benefit from combination therapies directed at distinct primary and metastatic tumours.

1.4 Global molecular profiles of cancer

1.4.1 Gene expression analysis

In addition to genomic alterations, cancers are often characterized by aberrant gene expression levels that are regulated by a tight interplay between transcription factors, chromatin states and other forms of regulation, including non-coding RNAs [47]. Microarray-based expression profiling provided the first genome-wide transcriptome profiles and was instrumental in characterizing gene expression sub-groups within cancer types and in providing expression patterns that could be used in clinical decision making [48].
In this thesis, abundant expression refers to high and plentiful expression of a gene, while poor expression refers to low expression of a particular gene.

In 1999, Golub et al. [49] profiled gene expression in AML and ALL samples and provided the first evidence that gene expression analysis could distinguish between cancer types. They also designed an expression-based treatment response predictor, and provided a general framework for unsupervised prediction of cancer classes independent of existing knowledge of the cancer samples. A year later, Alizadeh et al. [50] revealed that analysis of gene expression profiles could uncover heterogeneity within a single cancer type. Specifically, they identified a gene expression signature that could distinguish between the 2 morphologically identical subtypes of DLBCL (ABC-DLBCL and GCB-DLBCL). In addition, they also noted that ABC-DLBCL patients had inferior outcomes when compared with GCB-DLBCL patients. In another study, van’t Veer et al. [51] analyzed 117 breast cancer samples and identified a poor prognosis signature consisting of genes regulating cell cycle, invasion, metastasis and angiogenesis. This signature predicted the occurrence of metastases and outperformed existing prognostic indicators. Both of these studies suggested that gene expression signatures may be powerful prognostic markers. Since then, numerous efforts have identified many gene expression signatures that can distinguish between tumour and healthy tissues [52], classify cancer subtypes [53-56], predict patient survival [51,53,57], predict therapeutic response [58] and predict transformation to more aggressive cancer types [59,60].
1.4.2 Sequence-based molecular profiles

With the advent of next-generation sequencing came opportunities for the sequencing of whole tumour genomes and the generation of comprehensive catalogues of somatic mutations from individual tumour genomes, which include single nucleotide variants (SNVs), copy number variants (CNVs), and structural variants (SVs) [61,62]. Further, researchers have also applied whole transcriptome sequencing (RNA-seq) to tumours to obtain isoform-specific and allele-specific gene expression profiles, and to identify fusion transcripts, novel RNA transcripts, and instances of RNA editing [63]. Bisulphite-seq and ChIP-seq on tumours have also been used to obtain DNA methylation and histone modification profiles, respectively [64]. Systematic cancer genomics projects, such as The Cancer Genome Atlas (TCGA) [64] and the International Cancer Genome Consortium (ICGC) [65], have profiled genomes, transcriptomes and epigenomes in thousands of tumour samples representing almost 30 different cancer types. These efforts have identified many novel somatic mutations and patient sub-groups within cancer types based on mRNA expression, miRNA expression and chromatin states. The heterogeneity observed in each one of these cancer profiling projects has reinforced the notion that cancer can take hundreds of different forms depending on location, cell of origin, or spectrum of molecular alterations. These distinctions may result in differences in tumour properties or in the patient’s response to treatment, and have the potential to dictate rationales for the types of treatment selected [64].

For instance, the TCGA AML project [66] reported that NMF clustering of mRNA-seq and miRNA-seq expression profiles revealed 7 sub-groups and 5 sub-groups of patients, respectively.
These mRNA and miRNA sub-groups displayed inter-patient heterogeneity not only at the expression level, but also in the presence of particular cytogenetic alterations and in patient outcomes: miRNA sub-groups 2, 3 and 5 were associated with unfavourable, intermediate and favourable risk categories, respectively. The mRNA sub-group characterized by PML-RARA chromosomal translocations had better outcomes than other groups, and was further characterized by a distinct DNA methylation signature.

These systematic cancer genomics projects demonstrated the utility of sequence-based mRNA and miRNA expression profiles in identifying cancer sub-groups that have distinct clinical outcomes. They also indicated that the study of mRNAs and miRNAs can provide insight into the genes and pathways that are dysregulated in cancer.

1.5 miRNA-mediated repression

miRNAs are small 17-25nt RNA molecules that regulate gene expression at the post-transcriptional level. They were first discovered in the context of regulating developmental gene expression patterns in C. elegans. In 1993, 2 groups [67,68] observed that lin-4 repressed lin-14 during C. elegans larval development. These efforts observed that the lin-4 gene encoded a short 22nt non-coding RNA molecule that exhibited sequence complementarity with several loci in the 3’-UTR of lin-14. This led to the postulation that these sequences had a role in the observed repressive lin-4:lin-14 interaction. In 2000, Reinhart et al. [69] identified a second small RNA (let-7) and demonstrated that let-7 regulated developmental timing in C. elegans by binding to and repressing lin-41. Just a year later, Lee and Ambros [70] discovered another 15 such regulatory RNA sequences in C. elegans and also revealed their homologs in mammals and insects. They declared a new class of RNAs, and began to refer to them as “microRNAs”.
Since then, many miRNAs have been identified; an online repository, miRBase (http://www.mirbase.org) version 21, lists 35,828 mature miRNAs in 223 species.

1.5.1 miRNA-biogenesis and mechanism of action

miRNA-mediated repression is a regulatory process in which miRNAs bind to mRNA molecules to prevent their translation into protein. miRNA genes are transcribed into long primary miRNA transcripts, termed pri-miRNAs, which form hairpin structures that distinguish them from typical RNA polymerase II products. pri-miRNA hairpins are then processed in the nucleus to ~70nt pre-miRNAs by the nuclear microprocessor complex, consisting of DGCR8 and the RNase III enzyme DROSHA. Pre-miRNAs are then exported to the cytoplasm by XPO5 and processed further by DICER to ~22nt double-stranded miRNA duplexes. Each miRNA duplex consists of a 3p and a 5p mature miRNA, which are derived from the 3’ and 5’ ends of the hairpin respectively. Double-stranded duplexes are loaded onto an argonaute (AGO) protein in the miRNA-induced silencing complex (miRISC) and rapidly unwound. The mature miRNA acts by directing the miRISC to complementary miRNA binding sites located in the target mRNAs, thereby inducing translational repression through cleavage or destabilization of the mRNA targets (Figure 1.1). Perfect complementarity of the miRNA seed region to its mRNA target is typically necessary for mRNA cleavage and degradation. In cases where there is imperfect complementarity, only mRNA destabilization is achieved (Figure 1.1). While mRNA cleavage is usually observed in plants, studies in HeLa cells indicate that the impact of mammalian miRNAs on protein output is primarily explained by destabilization of target mRNA transcripts [71].

miRNA genes are encoded in the genome as either independent transcriptional units with their own promoters, or as clusters of several polycistronic miRNA genes transcribed as a single transcript.
Many miRNA genes are also found within introns of protein coding genes, and some of these are mirtrons, a special class of miRNAs which do not rely on the canonical miRNA biogenesis pathway [72]. Instead of being cleaved by DROSHA, mirtrons are pre-processed by mRNA splicing machinery [73]. Specific RNA binding proteins are also required for the biogenesis of some miRNAs. This is illustrated by the biogenesis of miR-18a from the miR-17-92 cluster. Although all 6 miRNAs from this cluster are transcribed together as a polycistron, miR-18a additionally requires the presence of hnRNP A1 for its maturation [74]. Although the targets of the members of this miRNA cluster overlap because of the high degree of conservation between the miRNA members, this finding suggests that therapeutically targeting miR-18a alone could fine-tune the expression levels of miR-17-92 targets.

Although miRNAs were initially understood to target the 3’-UTRs of mRNAs, evidence indicates that they can also target the 5’-UTR and coding regions to elicit translational repression just as effectively [75]. For instance, Duursma et al. [76] demonstrated that miR-148 repressed the mRNA transcript of DNMT3B by binding to a region in its coding sequence. Lytle et al. [77] observed that miRNA binding sites in the 5’-UTR of the mRNA transcript are just as effective as those in the 3’-UTR. Since the 5’-UTR is where the ribosome binds, this finding suggested that miRNA-mediated repression of protein synthesis takes place during or downstream of translation initiation. Petersen et al. [78] later confirmed this hypothesis by demonstrating that AGO proteins within the RISC complex act in translational repression by inducing the ribosome to detach from the mRNA transcript.

miRNA:mRNA interactions are complex. A given miRNA may have multiple (up to several hundred) predicted gene targets, and 60% of mRNAs have binding sites for several miRNAs in their 3’-UTRs [79].
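
As an illustration of the seed-based targeting described in Section 1.5.1, the following Python sketch scans a 3’-UTR for sites that are perfectly complementary to a miRNA seed. It is a minimal sketch under stated assumptions, not the target-prediction method used in this thesis: the seed is assumed to span nucleotides 2-8 of the mature miRNA (a common convention that the text above does not specify), the function names `reverse_complement` and `seed_sites` are hypothetical, the let-7a-5p sequence is quoted from memory of miRBase and should be checked against the database, and the two UTR fragments are invented for the example.

```python
from typing import List

# Sketch: find perfect seed-complementary sites in a 3'-UTR (RNA alphabet).
# Assumption: the seed is nucleotides 2-8 of the mature miRNA (1-based).

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Assumed let-7a-5p sequence (5'->3'); verify against miRBase before reuse.
LET7A_5P = "UGAGGUAGUAGGUUGUAUAGUU"

# Invented UTR fragments, for illustration only.
UTR_WITH_SITE = "AAGCUACCUCAGGUU"
UTR_MISMATCH = "AAGCUACGUCAGGUU"


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def seed_sites(mirna: str, utr: str, seed_start: int = 2, seed_end: int = 8) -> List[int]:
    """Return 0-based UTR offsets at which the seed pairs perfectly."""
    seed = mirna.upper()[seed_start - 1:seed_end]
    site = reverse_complement(seed)  # sequence expected on the mRNA strand
    utr = utr.upper()
    return [i for i in range(len(utr) - len(site) + 1) if utr.startswith(site, i)]


if __name__ == "__main__":
    print(seed_sites(LET7A_5P, UTR_WITH_SITE))
    print(seed_sites(LET7A_5P, UTR_MISMATCH))
```

With these inputs, the first fragment yields a single site at offset 3, and the fragment carrying a single-base mismatch yields none. Target prediction tools also weigh features such as site context and conservation, so a seed match on its own should be read as a candidate site only.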
Currently, miRTarBase (Release 4.5) [80] has enumerated a total of 51,460 experimentally verified interactions between 1,232 miRNAs and 17,520 target genes. These interactions were validated using methods such as luciferase reporter assays, western blots and qPCR. Given the large numbers of interactions miRNAs are involved in, it is not surprising that miRNAs have been implicated in the regulation of numerous biological processes, including cellular growth, differentiation and apoptosis, which are all phenotypes of significance in cancers [2].

1.6 miRNA dysregulation in cancer

In 2002, Calin et al. [81] provided the first evidence of the involvement of miRNA dysregulation in the pathogenesis of cancer, which included the identification of a genomic deletion at chromosome 13q14 that resulted in the loss of miR-15 and miR-16 expression in 68% of B-cell chronic lymphocytic leukemia cases. Since then, molecular profiling efforts have begun to examine the causes and consequences of miRNA dysregulation [2]. Specific miRNAs have been found to characterize various subtypes of cancer [82] and to have essential roles in cell differentiation and tumourigenesis [83-85].

1.6.1 miRNAs as tumour suppressors

miRNAs that are responsible for repression of genes that would otherwise contribute to tumorigenesis can be classified as tumor suppressors. These tumor suppressive miRNAs are typically repressed or lost in malignancy. In some instances this loss exacerbates the aberrant expression of target genes that may have already been dysregulated by other mechanisms, such as copy number alterations or translocations. For instance, in DLBCL, miR-34a is down-regulated by MYC, and the result is increased cell proliferation through a FOXP1-dependent tumorigenesis pathway [86].
Similarly, miR-15a and miR-16-1 loci are lost in DLBCL87 and mantle cell lymphoma (MCL)88, and this loss contributes to tumorigenesis through the de-repression of their oncogenic targets, including anti-apoptotic BCL2 and the cell cycle regulator CCND189. In addition, the combinatorial loss of multiple miRNAs could synergistically contribute to tumorigenesis. For instance, a protein that is required for CCND1 activity, CDK6, is also up-regulated due to the loss of miR-2990. Another example is that of the miRNAs that target MYC. In Burkitt’s Lymphoma (BL), MYC is only successfully de-repressed in the absence of abundant let-7a/b, miR-125b, miR-132, miR-154, miR-331, and miR-363. Interestingly, the expression of these miRNAs is repressed by MYC, suggesting a MYC-miRNA feed-forward loop that may drive tumorigenesis91.

1.6.2 miRNAs as oncogenes

miRNAs that are over-expressed or amplified in malignancy can result in the repression of tumor suppressor genes. These miRNAs can be classified as oncogenic miRNAs or oncomiRs. For instance, miRNAs from the miR-17-92 cluster are frequently up-regulated in NHLs when compared with normal B-cells. Their over-expression facilitates cell proliferation and inhibits apoptosis by exerting translational repression on multiple target genes, including pro-apoptotic BCL2L11 and tumor suppressor PTEN92. miR-17-92 may also contribute to tumorigenesis by activating members of the PI3K/AKT pathway. In addition, the increased repressive effects of miR-17-92 on its target genes in vivo compared with in vitro suggest that miR-17-92 also regulates the tumor micro-environment, thus accelerating tumorigenesis. In particular, miR-17-92 promotes angiogenesis through suppressing anti-angiogenic THBS-1 and CTGF, or by inhibiting members of the TGFβ pathway93.
1.6.3 Causes of miRNA dysregulation

1.6.3.1 Copy number alterations

miRNA dysregulation may result from a variety of causes and may be due to changes in the underlying DNA that encodes them. One category of such genetic alterations is DNA copy number alterations94. Because copy number alterations affect large segments of genomic DNA, the same aberration may affect clusters of several miRNAs. A genome-wide study of miRNA expression and copy number in DLBCL identified 63 individual miRNAs, including 28 miRNA clusters that displayed recurrent copy number changes87. A third of these miRNAs were also found to be part of a tumor-driven classifier, suggesting the significance of copy number alterations in miRNA dysregulation in DLBCL87. In contrast, in adult AML, copy number alterations involving miRNA genes are uncommon95, suggesting that miRNA dysregulation in this disease may be due to other types of molecular alterations.

1.6.3.2 Translocations

Genomic fusions which arise in malignancy may be a cause of miRNA loss or gain. In DLBCL, the t(3;7)(q27;q32) translocation fuses BCL6 to a non-coding region at FRA7H near miR-29, thereby down-regulating the expression of miR-2996,97. Similarly, a complex BCL6 rearrangement t(3;13)(q27;q31)t(12;13)(p11;q31) in DLBCL cells results in an ITPR2-BCL6 chimeric fusion gene rearrangement and places the miR-17-92 cluster antisense within this fusion. This translocation results in the up-regulated expression of the miR-17-92 cluster98. The majority of follicular lymphoma (FL) cases (90%) are characterized at the karyotype level by the t(14;18)(q32;q21) translocation, a molecular aberration that is associated with the deregulated expression of miR-16, miR-26a, miR-101, miR-29c, and miR-138.99 In Burkitt’s lymphoma (BL), a translocation results in the fusion of the PVT1 exon 1b region (encoding miR-1204) with the immunoglobulin light chain constant region. This consequently results in the up-regulation of miR-1204100.
Although the consequences of this translocation are unclear, it is of interest as miR-1204 is the miRNA with closest genomic proximity to MYC, and the possible mis-regulation of both of these genes concurrently may be a synergistic mechanism promoting tumourigenesis.

1.6.3.3 Aberrant transcription factor activity

Transcription factors are essential in regulating the expression of other genes by binding to transcription factor binding sites, which are cis-regulatory elements located in close proximity to the transcription start sites of genes. When the expression of transcription factors that target miRNA loci is dysregulated, the expression of their target miRNA is consequently affected. There are several transcription factors that are dysregulated in tumourigenesis, but the most frequently disrupted is MYC101. In addition to the regulation of protein coding genes, a major consequence of MYC up-regulation is the extensive reprogramming of miRNA expression patterns in tumors102. When comparing the miRNA expression profiles of BL with those of other B-cell lymphomas, BL cases with high expression of MYC displayed a characteristic pattern of MYC-induced miRNA expression: up-regulated miRNAs include the miR-17-92 cluster, and down-regulated miRNAs include miR-15a and miR-16103. In DLBCL, MYC binds to the promoter of the miR-17-92 cluster and activates its transcription. miRNAs from the miR-17-92 cluster then repress the expression of E2F1, a transcription factor that promotes G1-to-S phase progression. Interestingly, MYC also binds to the E2F1 promoter, thereby directly activating its transcription. The collaborative regulatory action on E2F1 (up-regulation by MYC and down-regulation by miR-17-92) suggests that MYC “uses” the miR-17-92 cluster as a means to fine-tune its regulatory mechanism of proliferation104.
1.6.3.4 Chromatin modification

Chromatin modifications alter the structure of heterochromatin, restricting physical access of nuclear factors, such as pioneer factors105, to the underlying DNA106. One class of chromatin modifications is DNA methylation, which is conversion of DNA cytosine to methylcytosine by DNA methyltransferase. The presence of additional methyl groups on DNA residues modulates the accessibility of DNA to transcriptional machinery; DNA hypermethylation at gene promoters is associated with decreases in gene expression107. In cancer, DNA methylation may be dysregulated, and this results in the aberrant expression of genes involved in tumourigenesis108. For instance, DNA hypermethylation in NHL results in the repression of miR-124a expression, which in turn results in the over-expression of oncogenic CDK6109. Similarly, in several haematological malignancies, DNA hypermethylation results in the repression of miR-203 which enhances the expression of oncogenic ABL1 and BCR-ABL1 fusion genes110. In some instances, DNA hypermethylation could provide an alternative means for MYC activation. In the absence of the MYC-IGH translocation (in BL-translocation-negative cases) miR-9 is hypermethylated, resulting in its down-regulation and up-regulation of its oncogenic target, IGH 111. Histone modifications, unlike DNA methylation, involve the covalent modification of histone tails, rather than of DNA nucleotides. miRNAs may influence histone marks by regulating the expression of histone modifiers. In mantle-cell lymphoma (MCL), miR-15a, miR-16-1, and miR-29 are down-regulated due to histone hypoacetylation at the promoters of their genes. In this instance, the hypoacetylation is brought about by the over-expression of MYC: MYC binds to and recruits HDAC3, an enzyme that is responsible for removing acetyl groups from histone residues.
This in turn results in the down-regulation of miR-15a/16-1112, and consequently the de-repression of oncogenes including BCL2, MCL1, CCND1, and WNT3A113. In a subsequent study, MYC was observed to work in concert with EZH2 to repress HDAC3, which in turn resulted in the repression of miR-29114.

1.6.3.5 Viral infection

In some cases, miRNA dysregulation can be induced by viral infections. Epstein-Barr virus (EBV) is an oncogenic Herpes virus which establishes a latent infection in lymphocytes115. EBV readily transforms cells into permanently growing cells under certain conditions such as immunosuppression. EBV is implicated in 95% of endemic BL cases and 15% of DLBCL cases. Similarly, in gastric adenocarcinoma, EBV is found in 9% of cases, and these cases form a distinct molecular sub-group116. More specifically, EBV-positive gastric adenocarcinoma cases had miRNA-seq libraries which contained >500,000 reads that mapped to known EBV miRNAs, while EBV-negative cases had miRNA-seq libraries which contained <200 such reads116. EBV transformed cells include at least 44 mature viral miRNAs that target viral and endogenous genes117. For example, 3 viral miRNAs (ebv-mir-BHRF1-3) are up-regulated in EBV-positive tumors, and are responsible for the down-regulation of CXCL11. This targeted suppression of CXCL11 by a viral-encoded miRNA may serve as an immunomodulatory mechanism for promoting tumorigenesis118. EBV-specific miRNAs, including ebv-mir-BART3, ebv-mir-BART9 and ebv-mir-BART17-5p, are up-regulated in tumors and target BCL6, a transcription factor that typically represses many genes involved in tumourigenesis119.

In addition to introducing viral miRNAs into cells, EBV induces and represses the expression of several endogenous cellular miRNAs. When comparing EBV-positive with EBV-negative DLBCL cases, a distinct miRNA expression profile for EBV-positive DLBCL cases revealed 9 up-regulated miRNAs and 7 down-regulated miRNAs120.
These results suggest that the oncogenic potential of viruses could, in part, be mediated by miRNAs.

1.6.3.6 Defects in miRNA biogenesis

Dysregulation of miRNA expression can also occur post-transcriptionally during miRNA biogenesis121. DROSHA and DICER are enzymes that play key roles in processing pri-miRNA and pre-miRNA, respectively, in the miRNA biogenesis pathway. DICER1 is recurrently mutated in non-epithelial ovarian cancers122, and these mutations have been found to cause defective miRNA processing and result in a bias toward predominantly maturing 3p strands through loss of 5p strand cleavage123. Pathway analysis of gene expression profiles further indicated that genes de-repressed due to loss of 5p miRNAs are strongly associated with pathways regulating the cell cycle124. DROSHA and DICER are not only required for the biogenesis of endogenous miRNAs, but are also essential for the biogenesis of miRNAs introduced by EBV125. In particular, the expression level of DICER is crucial for the maintenance of cellular homeostasis. The loss of one functional allele of Dicer results in impaired miRNA biogenesis and consequently an accumulation of the pre-miRNA, which cannot be processed into mature miRNA. However, the complete loss of Dicer prevents tumourigenesis and is selected against in mouse tumors: one functional allele of Dicer is required for tumor survival. Accordingly, deletion of both alleles of DICER1 in B-cells does not promote tumorigenesis, but instead induces apoptosis126.

1.6.3.7 Alterations in miRNA targets

In addition to dysregulation of miRNA expression, miRNA-mediated repression can be disrupted as a consequence of dysregulation and mutation of mRNA targets. For instance, MCL tumors preferentially express a CCND1 mRNA transcript variant that arises as a consequence of point mutations and genomic deletions.
This truncated splice variant lacks the miR-16 binding site, escapes miRNA-mediated repression, and results in an up-regulation of truncated CCND1127. Sandberg et al.128 and Mayr et al.129 have also demonstrated more generally that proliferating cells tend to produce mRNA transcripts with shorter 3’-UTRs.

1.6.4 The clinical utility of miRNAs

1.6.4.1 miRNAs as diagnostic and prognostic tools in cancer

miRNAs are a promising new class of biomarkers that may supplement cancer diagnosis. Their suitability stems from their widespread dysregulation and characteristic expression profiles as described above.

Circulating miRNAs have several characteristics that make them suitable as non-invasive diagnostic biomarkers: blood serum miRNAs are resistant to RNase digestion and other harsh conditions such as extreme pH, boiling, extended storage, and multiple freeze-thaw cycles130. When considering the circulating miRNA profiles of serum from DLBCL patients compared with healthy subjects, one study found elevated expression of miR-155, miR-210, and miR-21 in DLBCL samples131, and a subsequent study additionally found elevated expression of miR-15a, miR-16-1 and miR-29c and decreased expression of miR-34a in DLBCL samples132.

miRNAs may also serve as non-invasive biomarkers at the prognostic level, where they can be used to predict patient outcomes and to monitor residual disease in patients after chemotherapy. Several studies have identified miRNAs whose expression profiles are associated with patient outcomes. One study found that high miR-21 levels in the serum of DLBCL patients are an indicator of relapse-free survival131. Similarly, another group found that the down-regulation of miR-92a in blood plasma of complete response (CR) NHL patients indicates an increased probability of disease relapse133.
Circulating miRNAs have also been found in patients diagnosed with leukemia, lung, breast, ovarian, prostate, renal cell, colorectal, gastric, hepatocellular, pancreatic, oesophageal, head and neck, thyroid and brain cancers134. While these findings suggest that the expression levels of these circulating miRNAs could be used as non-invasive biomarkers, a current hurdle that impedes their utility in the clinic is the lack of methods to distinguish between circulating miRNAs that originate from tumours and those that originate from normal cell types134.

1.6.4.2 Correcting for aberrantly expressed miRNAs

Given the large impact of miRNAs on tumorigenesis, miRNA-based cancer therapeutics are being designed to correct the aberrant expression of miRNAs. Miravirsen inhibits the biogenesis of miR-122. It is the first miRNA-based therapeutic tested in humans and is currently undergoing phase 2 clinical trials for the treatment of HCV infections135. miRNA therapy is advantageous because it is specific in its targeting: a partially complementary base pairing interaction is required between the miRNA seed sequence and mRNA. Moreover, a single miRNA can act as a “master switch” and thus can concurrently target numerous mRNA targets, and may be useful where several oncogenic pathways need to be targeted simultaneously136. The majority of currently developed miRNA-based therapeutics take the form of miRNA mimics or miRNA inhibitors that can modulate the expression of their target genes137. One study has successfully demonstrated LNA-mediated anti-miR-155 silencing in B-cell lymphomas in mice138. However, the development of miRNA therapeutics is still in its infancy, probably due to the confounding results of miRNA therapeutic experiments. For instance, in testing the tumor suppressor activity of miR-34a, one study reported that not only did miR-34a not inhibit cell proliferation, it resulted in pro-apoptotic activity by dysregulation of the c-MYC/p53 regulatory axis.
This suggests that the attribute of miRNAs as a “master switch”, while seemingly advantageous, could result in many off-target effects. Despite this, therapeutically targeting the causes of miRNA dysregulation directly can still be considered. For instance, aberrantly expressed transcription factors or epigenetic modifiers that regulate miRNA expression can be directly targeted to reverse dysregulation of miRNA and other targets of the epigenetic modifiers. One specific example would be the therapeutic targeting of collaborative oncogenic partners HDAC and EZH2 to restore the expression of tumor suppressive miR-29 and other HDAC and EZH2 targets in MCL cell lines. In a recent study, this restoration was achieved by the combined inhibition of HDAC and EZH2 (through vorinostat, DZNep, or their siRNAs). As a consequence, a reduction of oncogenic CDK6 and IGF1R and subsequent inhibition of cell survival and colony formation in vitro was observed139.

1.6.5 miRNAs in treatment resistance

Drug resistance is a major obstacle to the successful treatment of cancers. For example, in DLBCL, nearly half of the patients treated with CHOP (cyclophosphamide, doxorubicin, vincristine, prednisone) stop responding to treatment, becoming drug resistant. In pediatric AML, only about 80% of patients achieve complete response after induction therapy140, and of these patients, about 40% subsequently relapse and die of their disease141. Certain miRNAs have been found to regulate the sensitivity of patients to particular drug regimens. miR-21 expression levels in DLBCL cell lines are relatively high, and miR-21 knockdown can significantly modulate the expression level of PTEN protein and thereby increase the sensitivity of DLBCL cell lines to the CHOP chemotherapeutic regimen142.
Similarly, miR-148b levels were up-regulated in response to radiation treatment of DLBCL cell lines, and were found to inhibit proliferation and increase radio-sensitivity by enhancing radiation-induced apoptosis143. In AML cell lines, up-regulation of miR-126 contributes to resistance to cytarabine144, and abundant expression of miR-125b contributes to resistance to all-trans retinoic acid and doxorubicin145. In metastatic renal cell carcinoma, miR-221 and miR-222 modulate sensitivity to Sunitinib treatment by down-regulating angiogenesis pathways that are utilized by Sunitinib146. Identifying expression levels of these miRNAs that regulate treatment sensitivity could thus aid in the development of individualized treatment plans for cancer patients with abnormal expression of such miRNAs, with the aim of increasing the efficacy of treatment regimens143.

1.7 miRNA profiling approaches

1.7.1 Probe-based miRNA profiling

miRNA expression profiling has traditionally been probe-based. One major approach is the quantitative real-time polymerase chain reaction (q-RT-PCR), which requires the design of specific primers that capture individual miRNA species. This method is tedious and interrogates miRNAs one at a time, but is convenient for diagnostic and research labs that are routinely running PCR for other applications147. As with mRNAs, microarray platforms also exist for profiling miRNA expression, and have been widely used to profile annotated miRNAs148. Recently, a custom array-based technology, Nanostring nCounter, has come on the market. It features a multiplex probe library which is created using two sequence-specific capture probes that are tailored to each miRNA of interest, and thus has the ability to discriminate between similar miRNA variants with high accuracy149. While they are relatively inexpensive, microarrays do not allow the identification of novel miRNAs that could arise in malignancy147.
1.7.2 miRNA sequencing (miRNA-seq)

miRNA-seq involves the sequencing of short RNA molecules that are size-selected during library preparation. miRNA-seq has several advantages over microarrays. It is digital and measures absolute abundance over a wider dynamic range than is possible with microarrays. As such, it is able to detect miRNAs of low abundance and report expression of highly abundant miRNAs more accurately. In addition, since it is not limited by specific probes, it can facilitate the discovery of novel miRNAs or other small RNA species (e.g. snoRNAs and rRNAs). Moreover, miRNA-seq profiles each miRNA at single nucleotide resolution and thus provides high accuracy for distinguishing pri-miRNAs that share mature sequences or seed sequences. There are also several tools available to translate sequence reads into miRNA expression levels, discover novel miRNAs, and identify differentially expressed miRNAs in different cellular contexts79. While miRNA-seq provides several advantages over probe-based methods, it is relatively more expensive, typically requires more RNA (at least 1 µg), and analysis often requires advanced computation147.

1.7.3 Profiling miRNAs in FFPET tissues

Unlike mRNAs, miRNAs are stable in vitro and long-lived in vivo, and can be detected in urine, peripheral blood, and formalin fixed paraffin embedded tissue (FFPET)147. Tissue samples collected during surgery as well as biopsies are often fixed in paraffin, but extracting nucleic acids from FFPET has been problematic. However, since there are several methods that can reliably profile miRNA in FFPET, including qPCR, in situ hybridization150, and microarrays151, miRNA research using archival tissue is rapidly gaining popularity. Studies have compared miRNA expression profiles obtained from FFPET and fresh frozen samples, and have shown that differences between profiles exist152-154.
These efforts noted that although miRNAs extracted from FFPET tend to have shorter average lengths, reduced purity152, and higher expression levels than miRNAs from fresh-frozen samples154, the miRNA expression profiles obtained from FFPET were still comparable to those obtained from fresh-frozen samples in terms of the numbers of detected miRNAs and their relative expression levels.

1.8 Identification of miRNA-mediated repression interactions

The abundance of miRNA expression in mammalian cells prompted the question of what the cellular functions of these small non-coding RNAs were. Naturally, given that the function of miRNAs is to modulate the translation of mRNA targets by cleavage or destabilization, the key to answering this question is to identify the regulatory targets of each miRNA. Since the identification of the first miRNA-mediated interaction between lin-4 RNA and lin-14 mRNA67,68, researchers have sought to identify targets of miRNAs.

1.8.1 Prediction of miRNA binding sites

Individual miRNAs can have multiple targets, and one of the challenges in the field of miRNA biology is the identification of these targets. Although there have been several experimentally validated miRNA:mRNA target interactions80, many of these studies have been facilitated by computational predictions of target interactions. Software tools such as TargetScan155 and miRanda156 are able to predict the likelihood of a target interaction based on given miRNA and mRNA sequences of interest. Target prediction methods are diverse both in approach and performance, but they agree on 3 principles157. Firstly, Watson-Crick pairing is required between the mRNA sequence and the 5' region of the miRNA centered on its seed sequence (nucleotides 2-7). This complementarity allows the miRNA to correctly recognize its target. Secondly, evolutionary conservation observed at the miRNA seed region is sufficient for predicting targets.
The rationale behind this is that conserved regions of the genome are thought to be under selective pressure and therefore may be biologically functional. Thirdly, highly conserved miRNAs also have many conserved targets; more than half of the human protein coding genes appear to have been under selective pressure to maintain 3’-UTR pairing to miRNAs158. As such, prediction methods usually allow for hundreds of target predictions per miRNA. In addition, some software, such as miRanda, may also include free energy calculations to estimate the stability of the miRNA:mRNA binding interaction156.

1.8.2 Integrative miRNA and mRNA expression profile analysis

In addition to computational binding site predictions, expression profiles of miRNA and mRNA obtained from the same individual in a cohort of patients can be analysed together to determine putative interactions159. This approach of identifying functional miRNA:mRNA pairs based on anti-correlated expression has been used in multiple myeloma160,161, human lymphoblastoid cell lines162, NCI-60 leukemia cell lines163, and ovarian cancer164. The assumption behind these profiling efforts is that miRNAs promote destabilization of target mRNAs, and decreased mRNA levels could be a result of direct miRNA repression71. A miRNA:mRNA functional pair can be defined as a miRNA predicted to interact with a given mRNA, where the two entities have negatively correlated expression profiles. With this in mind, several methods have been developed to simultaneously profile and integrate mRNA and miRNA expression165. They typically employ a combination of the Pearson correlation, Spearman correlation, Hyper-Geometric test, Fisher’s exact test, or the Chi-square test to determine if the miRNA and mRNA expression profiles are negatively correlated.
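To make the definition of a functional pair concrete, the following is a minimal Python sketch under stated assumptions, not the method used by any of the cited tools. It combines a perfect seed match (miRNA nucleotides 2-7 against the 3’-UTR) with a Spearman correlation of matched expression profiles. The function names, the toy sequences and the correlation cut-off of -0.5 are hypothetical choices made only for illustration; conservation and free energy, which real predictors also use, are ignored.

```python
# Toy sketch: flag a candidate miRNA:mRNA functional pair as
# (predicted seed site in the 3'-UTR) AND (negatively correlated expression).
# All names and the rho_cutoff default are illustrative assumptions.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq):
    """Return the reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def has_seed_site(mirna, utr):
    """True if the UTR contains a perfect Watson-Crick match to miRNA nt 2-7."""
    seed = mirna.upper().replace("T", "U")[1:7]   # nucleotides 2-7 (1-based)
    site = reverse_complement_rna(seed)           # sequence expected in the mRNA
    return site in utr.upper().replace("T", "U")


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def is_candidate_pair(mirna, utr, mirna_expr, mrna_expr, rho_cutoff=-0.5):
    """Seed site present and expression anti-correlated across samples."""
    if len(mirna_expr) != len(mrna_expr):
        raise ValueError("expression vectors must cover the same samples")
    return has_seed_site(mirna, utr) and spearman(mirna_expr, mrna_expr) <= rho_cutoff
```

For example, `is_candidate_pair("UGAGGUAGUAGGUUGUAUAGUU", "AAACUACCUCAGG", [1, 2, 3, 4, 5], [10, 8, 6, 4, 2])` returns `True`: the toy UTR contains the reverse complement of the seed, and the two profiles are perfectly anti-correlated. In practice a significance test and multiple-testing correction would be needed rather than a fixed cut-off.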
While these methods have been successful in identifying many bona fide miRNA:mRNA interactions, they are limited to identifying interactions that involve mRNA cleavage or degradation, which account for about 84% of all miRNA-mediated repression interactions71.

1.8.3 Experimental validation of miRNA:mRNA interactions

While bioinformatic approaches have been successful in predicting candidate miRNA:mRNA interactions, these interactions can only be recognized as functional if they have been experimentally validated in vivo. A common method of establishing that a particular miRNA binds to a particular mRNA binding site is through luciferase reporter assays. These involve co-transfecting miRNA mimics or inhibitors with a luciferase reporter vector containing the predicted mRNA binding sequence. A functional miRNA:mRNA interaction would then be indicated by the quenching of luciferase activity upon miRNA over-expression. An alternative method for determining miRNA:mRNA interactions is through AGO immunoprecipitation (IP) followed by miRNA and mRNA quantification, where quantification is usually performed using qPCR, microarrays or sequencing166. The assumption made in AGO-IP experiments is that for miRNAs to elicit a repressive effect on mRNAs, both RNA molecules need to be bound by the AGO protein within the RISC complex. Precipitating RNA molecules bound to active AGO will identify RNAs involved in active interactions. Examples of high-throughput AGO-IP methods include high-throughput sequencing of RNA isolated by crosslinking immunoprecipitation (HITS-CLIP)166 and photoactivatable-ribonucleoside-enhanced crosslinking and immunoprecipitation (PAR-CLIP)167. AGO-IP identifies multiple miRNA:mRNA interactions simultaneously instead of single miRNA:mRNA pairs. Moreover, unlike miRNA over-expression assays which are often performed in unrelated cell lines, AGO-IP methods can be performed in primary tissues with endogenous miRNAs168.
Some studies take an additional step, and measure changes at the mRNA and protein levels, most frequently using qPCR and western blots, respectively. These experiments also involve transfecting cells with miRNA mimics or inhibitors. An observation of reduction of mRNA and protein levels in response to miRNA over-expression adds further evidence of a functional miRNA-mediated repression interaction169.

1.9 miRNA expression profiles of cancers studied in this thesis

In this thesis, I report on miRNA expression and interactions in cancers using a selection of the tools and techniques mentioned above. In this section, I provide brief introductions to miRNA dysregulation in the 3 cancers upon which the rest of the thesis is based: diffuse large B-cell lymphoma (DLBCL), pediatric acute myeloid leukemia (AML) and pediatric malignant rhabdoid tumours (MRT). DLBCL and AML are haematological malignancies, which derive from a common haematopoietic progenitor170. In contrast, MRT are solid tumours which are frequently found in the brain and kidney, but their cell of origin is debated171.

1.9.1 Diffuse large B-cell lymphoma

Diffuse large B-cell lymphoma (DLBCL) is the most common kind of NHL, accounting for 30-40% of newly diagnosed lymphomas172. DLBCL tumors are characterized by up-regulated expression of miR-150, miR-17-5p, miR-145 and miR-328 when compared to normal lymph node and follicular lymphoma (FL) samples173. Since miRNA expression tends to be tissue specific, miRNA expression profiles differ based on the primary tumor site. When miRNA expression was compared between central nervous system, testicular and nodal DLBCL samples, miR-17 was up-regulated in DLBCL tumors arising from within the central nervous system, while miR-127 was up-regulated in DLBCL tumors arising from within the testis174. Although DLBCL cases have a common morphology, they are noted for their striking clinical and molecular variability.
Gene expression profiling has revealed at least 3 DLBCL subtypes that have different genetic aberrations and clinical outcomes50. Activated B-cell-like (ABC) subtype tumors arise from plasmablasts while Germinal Center B-cell-like (GCB) subtype tumors arise from germinal center B-cells172. These two subtypes can be distinguished not only by gene expression profiles but also by distinct miRNA expression profiles. In particular, miR-155, miR-221, and miR-21 are more highly expressed in ABC-DLBCL when compared with GCB-DLBCL84,175. However, it is still not known if there are novel miRNA species that arise in malignancy, and it is unclear which miRNA:mRNA interactions may be acting in DLBCL.

1.9.2 Pediatric acute myeloid leukemia

Acute myeloid leukemia (AML), a hematopoietic malignancy, comprises nearly 25% of all pediatric leukemia diagnoses, with the highest rate of incidence in infancy176. Until recently, most molecular aberrations in pediatric AML were identified only because they were initially found in studies of adult AML176. Currently, there are several on-going efforts which are focused on identifying cytogenetic alterations and miRNA140,177,178 and mRNA141,179-186 expression profiles that are specific to pediatric AML176, reasoning that this would lead to a more precise characterization of pediatric AML. Probe-based miRNA expression profiling efforts demonstrated that miR-100, miR-125b, miR-335, miR-146a, and miR-99a are differentially expressed between pediatric AML and pediatric acute lymphoblastic leukemia (ALL) or adult AML178, and that each pediatric AML subtype (as defined by cytogenetic alterations) has a distinct miRNA expression pattern177.
For instance, cases harbouring t(8;21) are characterized by abundant expression of miR-126-3p, miR-181a, miR-181b, miR-146a, and miR-146b-5p; cases harbouring t(15;17) are characterized by abundant expression of miR-100, miR-125b, miR-146a, miR-146b-5p, miR-181a, and miR-181b; while MLL rearranged cases are characterized by abundant expression of miR-21. In a study of pediatric AML and CML, abundant expression of miR-99a was shown to contribute to tumourigenesis by targeting CTDSPL and TRIB2187. Another study of pediatric AML demonstrated that miR-663 was poorly expressed due to hypermethylation188. In addition, abundant expression of miR-100189, miR-375190, miR-335191 and reduced expression of miR-29a192 have been associated with poor outcome in pediatric AML. However, it is still not known which miRNAs are specific to refractory and relapsed AML and what biological processes their gene targets are involved in.

1.9.3 Pediatric malignant rhabdoid tumours

Malignant rhabdoid tumours (MRT) are rare but aggressive pediatric solid tumours with a median age at diagnosis of 11 months193. MRT can occur throughout the body, although they are detected frequently in the kidney (RTK) and the brain (Atypical Teratoid Rhabdoid Tumours; AT/RT). MRT is characterized by an inactivation of SMARCB1, a member of the nucleosome remodeling SWI/SNF complex171. miR-129, miR-142-5p, and miR-25 are abundantly expressed in pediatric brain tumours, including MRT, when compared to normal tissues194. In particular, abundant expression of miR-142-3p promotes tumour-initiating and radiotherapy-resistant properties in AT/RT195. In AT/RT, let-7a and let-7b are frequently deleted and target a gene involved in chromosome condensation during the cell cycle (HMGA2)196. miR-221 and miR-222 exhibit increased expression in AT/RT and target the tumour suppressors CDKN1B197 and SUN2198.
In AT/RT, abundant LIN28 expression is associated with the repression of let-7g and let-7b expression, and consequently with increased expression of the KRAS oncogene [199]. Studies on miRNA dysregulation in MRT are limited. Only two studies have generated probe-based miRNA expression profiles of MRT samples, comparing MRT with rhabdomyosarcoma [200] and AT/RT [201], respectively. One effort demonstrated that miR-200c is abundantly expressed in MRT when compared to alveolar rhabdomyosarcoma [200], and the other effort showed that miRNA expression profiles could not distinguish MRT from AT/RT [201]. In addition, other studies that were focused on the SWI/SNF complex reported that miR-193a repressed the expression of SMARCB1 in malignant rhabdoid tumours [202]; that miR-206, miR-381, and miR-671-5p repressed the expression of SMARCB1 in epithelioid sarcoma [203]; and that miR-199a repressed the expression of SMARCA2 in a variety of cancers [204]. However, there has not yet been an effort to identify miRNA-based sub-groups in MRT, or an effort to use miRNA expression profiles to aid in the identification of putative tissues of origin of MRT.

1.10 Thesis objectives and chapter overview

The over-arching hypothesis of this thesis was that stratification of heterogeneous cancer patient populations based on miRNA expression profiles would reveal miRNA biomarkers, therapeutic targets and patient sub-groups correlated with clinical characteristics. As such, I set out to accurately and efficiently quantify known and novel miRNAs from Illumina sequencing data, to reveal prognostic miRNA associations, and to develop approaches to identify miRNA:mRNA interactions that are characteristic of specific cancers.
In particular, I characterize miRNA expression in 3 different cancer types (DLBCL, pediatric AML and pediatric MRT), where each cancer type provided a different perspective for the study of miRNA dysregulation: 1) the DLBCL study highlights how miRNA dysregulation may indicate differing prognoses in adults, 2) the pediatric AML study explains how miRNA dysregulation is associated with treatment responses in children, and 3) the pediatric MRT study demonstrates that tumours may have different miRNA expression profiles despite a uniform driver alteration (inactivation of SMARCB1) in the disease. I also hypothesized that the identification of new miRNA prognostic markers and interactions may offer insights into disrupted biological pathways of the studied cancers, and perhaps suggest novel therapeutic options.

Chapter 2 provides a comprehensive description of miRNA expression in treatment naïve adult DLBCL cases. Given the poor outcomes in DLBCL, the objective of this project was to identify additional miRNA-based biomarkers that may offer the possibility of improved tools for clinical management of DLBCL. In this chapter, I detail miRNA expression profiles of known and 30 candidate novel miRNAs and reveal 2 sub-groups of DLBCL patients that differ in prognosis, and in the expression of miR-21 and miR-148a. In addition, I also reveal 6 new prognostic miRNAs for DLBCL patients, which are associated with survival independently of established indicators of outcome. Here, I also identified a novel miRNA species (miR-10393), which was more abundantly expressed in DLBCL samples than in benign centroblast samples. miR-10393 targeted MLL2 and EP300, both of which are genes that are known to be frequently mutated in NHL [205]. This reinforced the significance of dysregulation of chromatin modifiers in DLBCL, and suggested that miRNAs could provide an alternative mechanism for the dysregulation of these genes.
Chapter 3 describes the transcriptomes of pediatric AML, with a specific emphasis on treatment resistant disease. In addition to treatment naïve primary samples, I also comprehensively profiled miRNA and mRNA expression in relapse and refractory samples from the same patients. The objectives of this project were 1) to perform an analysis of primary samples to identify miRNA/mRNA expression patterns that underpin the biology of AML or are associated with treatment outcomes, and 2) to perform comparisons of relapse and refractory samples with primary samples to identify miRNA/mRNA expression patterns that underpin the biology of treatment resistant AML. In this chapter, I revealed 5 sub-groups of patients, where the sub-group with abundant expression of ribosomal genes was also characterized by superior outcomes, suggesting that the expression of ribosomal genes may be important in delaying the pathogenesis of the disease. Importantly, I also uncovered the robust association of abundant miR-106a-363 expression with relapse and refractory disease, and further demonstrated that miR-106a-363 may be acting in treatment resistance by reducing the expression of genes involved in oxidative phosphorylation, a process that is often inhibited in chemoresistant leukemic stem cells [206].

Chapter 4 describes miRNA expression in treatment naïve pediatric MRT, specifically using the tissue-specific quality of miRNA to characterize putative cells/tissues-of-origin of MRT. The objective of this project was to characterize the transcriptomes of MRT in order to identify dysregulated miRNA regulatory networks and to aid in the identification of the cell/tissue of origin. As such, I performed an analysis of the pediatric MRT samples along with >11,000 miRNA-seq samples (available from the TCGA, MAGIC and other projects), which revealed that the majority of pediatric MRT samples were similar to normal cerebellum and DLBCL samples.
Moreover, MRT serves as an ideal model for studying heterogeneity related to transcriptional and epigenomic dysregulation because, with the exception of inactivation of SMARCB1, they are relatively devoid of genomic alterations [207]. Finally, Chapter 5 provides conclusions and future directions of the presented work.

1.11 Figures

Figure 1.1 miRNA biogenesis and function
An illustration of miRNA biogenesis and function. miRNAs are transcribed in the nucleus into pri-miRNAs, which are processed by the DROSHA and DGCR8 complex. The pre-miRNAs are then exported to the cytoplasm by XPO5 and further processed by DICER into miRNA:miRNA* duplexes. Helicase unwinds the duplex, leaving the mature miRNA strands free to associate with the RISC complex to direct mRNA targeting. miRNA:mRNA interacting pairs with perfectly complementary sequences result in the cleavage of target mRNA, whereas partial complementarity achieves translational repression through mRNA destabilization.

2 Comprehensive miRNA Sequence Analysis Reveals Survival Differences in Diffuse Large B-cell Lymphoma Patients

2.1 Introduction

Diffuse large B-cell lymphoma (DLBCL) is an aggressive form of non-Hodgkin lymphoma (NHL) that accounts for 30% to 40% of newly diagnosed lymphomas. Molecular profiling has revealed that the activated B-cell-like (ABC) and germinal center B-cell-like (GCB) subtypes of DLBCL are derived from different cells of origin and exhibit differential responses to chemotherapy [50]. In particular, the current combination of cyclophosphamide, doxorubicin, vincristine, prednisone, and rituximab chemotherapy (R-CHOP) yields inferior outcomes in patients with the ABC subtype compared to patients with the GCB subtype [50]. Thus, these subtype assignments add prognostic value to the widely used International Prognostic Index (IPI) that constitutes the clinical gold standard for identifying patients with poor prognosis [208,209].
Although gene expression signatures and single gene mutation (or expression)-based prognosticators have been described [210], many of these molecular features are surrogates for either the IPI or cell-of-origin (COO) sub-groups [210]. As such, the identification of additional biomarkers and therapeutic targets may offer the possibility of improved tools for clinical management of NHL. miRNA signatures have been identified in cancers [148], and several miRNAs, including miR-155 and the miR-17-92 cluster, have expression patterns that distinguish DLBCL from non-malignant B-cells [173]. Expression of several miRNAs, including miR-155, miR-21, and miR-221, differs between the ABC and GCB DLBCL subtypes [175]. In addition, miR-21 expression in tumor cells [175] and serum [131] has been shown to be associated with DLBCL patient prognosis. Subsequently, several groups [84,211-213] performed survival analyses on larger DLBCL patient cohorts using qPCR-based strategies or miRNA microarrays and identified miRNAs that were associated with survival, including miR-21, miR-222, miR-23a, and miR-27a. Deep sequencing of miRNA (miRNA-seq) provides an opportunity to comprehensively catalog the repertoire of expressed miRNAs and to study miRNA dysregulation. miRNA-seq has been used to discover candidate novel miRNAs at various stages of B-cell development [85] and in NHL cell lines [214]. However, at the initiation of my study, miRNA-seq had not yet been used to profile DLBCL patient samples. Here I report on the miRNA-seq expression profiles of 92 DLBCL tumors and 15 purified benign centroblast fresh frozen samples, along with an integrated analysis of the DLBCL miRNA expression landscape including clinical annotation, mutational and mRNA expression data. An additional 140 independent DLBCL formalin-fixed, paraffin-embedded tissue (FFPET) samples were also sequenced to validate my survival analyses.
I identified candidate novel and known miRNAs expressed in DLBCL, including 25 miRNAs that appeared to be associated with survival independently of the established indicators of outcome (COO and IPI) in my Discovery Cohort. Of these 25, six miRNAs had their associations with survival replicated in my Validation Cohort. Abundant expression levels of miR-28-5p, miR-214-5p, miR-339-3p, and miR-5586-5p were associated with superior outcomes, while abundant expression levels of miR-324-5p and NOVELM00203M were associated with inferior outcomes. My comparisons of DLBCL miRNA expression to miRNA expression obtained from The Cancer Genome Atlas (TCGA) revealed that the miRNAs that are characteristic of DLBCL tend to have B-cell specific functions. In addition, my integrative miRNA:mRNA expression analysis provided evidence of miRNA-mediated repression of chromatin modification genes that are frequently inactivated by somatic mutations, reinforcing the notion that inactivation of these genes is linked to malignant progression in NHL.

2.2 Results

2.2.1 miRNA sequencing of fresh frozen DLBCL tumor and centroblast samples

Unlike miRNA microarrays, miRNA-seq provides, at least in principle, the opportunity to globally determine the presence and abundance of essentially all miRNAs across the entire DLBCL miRNA expression landscape. To quantify expressed miRNAs, I obtained miRNA-seq data for 92 DLBCL patient samples (30 ABC-DLBCL, 41 GCB-DLBCL, and 21 unclassified-DLBCL) and 15 purified benign centroblast samples. All patients from whom sequence data were obtained were treated with multi-agent chemotherapy (83 R-CHOP; 17 other treatment protocols; for clinical characteristics see Table 2-1).
Each miRNA-seq library was sequenced to an average depth of 5.34 (range: 1.34-16.91) million reads, which is generally sufficient to identify moderate-to-low-abundance miRNAs, including those exhibiting modest expression differences between samples that may not be detected by hybridization-based methods [215]. I observed that 310 known miRNAs (3p or 5p strands of 221 miRNA species in miRBase version 19) were expressed at levels of at least 10 reads per million (RPM) in at least 10% of the samples. My threshold for calling expressed miRNAs (at least 10 RPM in at least 10% of samples) was based on miRBase criteria [216] for high confidence miRNAs. In addition to miRNAs, which accounted for 60% of the aligned miRNA-seq reads, my pipeline also identified the expression of other classes of small RNAs. For example, an average of 9% of the aligned reads mapped to rRNAs and 6% to snoRNAs. Other non-coding RNAs (tRNAs, snRNAs, scRNAs) and DNA repeat elements were represented by fewer reads (Figure 2.1A).

2.2.2 Novel miRNA discovery

I interrogated my 92 DLBCL miRNA-seq libraries to identify candidate novel miRNA species that were dysregulated in NHL. Novel miRNA discovery was performed using miRDeep, which identifies novel miRNA species by searching within unannotated regions of the genome for palindromic sequences that fold into thermodynamically stable miRNA hairpins. In addition, miRDeep only shortlists novel miRNA species that have miRNA-seq alignment patterns that are characteristic of mature and star miRNA strands. In particular, the number of reads that align to the arm of the miRNA hairpin representing the mature strand should be at least twice the number of reads that align to the arm representing the star strand. After sequence filtering, I enumerated 234 candidate novel (that is, not identified in miRBase v19) miRNAs.
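The expression threshold used in Section 2.2.1 (at least 10 RPM in at least 10% of samples) can be illustrated with a minimal sketch. This is not the thesis pipeline: the function names are hypothetical, and the RPM denominator is assumed to be the sum of the counts supplied for each sample, which may differ from the normalization actually used.

```python
def reads_per_million(counts):
    """Convert one sample's raw counts {miRNA: reads} to reads per million.

    Assumption: the denominator is the sum of the counts passed in.
    """
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: count * 1_000_000 / total for name, count in counts.items()}


def expressed_mirnas(samples, min_rpm=10.0, min_fraction=0.10):
    """Return miRNAs with at least min_rpm in at least min_fraction of samples."""
    rpm_tables = [reads_per_million(counts) for counts in samples]
    names = set().union(*(table.keys() for table in rpm_tables))
    kept = []
    for name in sorted(names):
        # Count the samples in which this miRNA reaches the RPM threshold.
        n_pass = sum(1 for table in rpm_tables if table.get(name, 0.0) >= min_rpm)
        if n_pass / len(rpm_tables) >= min_fraction:
            kept.append(name)
    return kept
```

Both cut-offs are inclusive here, matching the "at least" wording of the text; a miRNA missing from a sample is treated as having zero reads.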
The mean expression levels of these candidate novel miRNAs (average: 3.84 RPM; range: 0.00-4,979.00 RPM) were lower than those of the annotated miRNAs (average: 218.50 RPM; range: 0.00-131,200.00 RPM). Thirty of these putative miRNAs were expressed at levels of at least 10 RPM in more than 10% of DLBCL and centroblast samples (Figure 2.1B; Table 2-2), and this subset was used in subsequent analyses. Of these, five were more abundant in benign centroblasts than in patient samples, while one, miR-10393-3p, was more abundant in DLBCL patient samples than in centroblasts (Wilcoxon test BH q-value <0.05; log2 fold change >2). Two miRNAs (miR-10397-5p, NOVELM00288M) were more abundant in ABC-DLBCL (Wilcoxon test BH q-value <0.05; Figure 2.1B). This differential abundance indicated that expression of these candidate novel miRNAs might reveal regulatory pathways deployed in these DLBCL subtypes and therefore might be useful in the classification of these tumors. To broadly survey the expression of these miRNAs in cancers, I analyzed their expression in 7,266 TCGA miRNA-seq samples from 21 other cancer types. Three miRNAs (NOVELM00260M, NOVELM00010M, and miR-10398-3p) were significantly more abundant (Wilcoxon test BH q-value <0.05; median of expression of miRNA in other cancers = 0) in B-cell contexts (DLBCL and centroblast samples; Figure 2.1C), suggesting that they may have functions enriched in, or specific to, B-cells. These 30 highly expressed candidate novel miRNAs were subjected to further analyses: my survival analysis revealed that some of them were associated with survival, while my integrative expression analysis revealed the potential lymphomagenic roles of others. I further assessed expression levels of each candidate novel miRNA in a published HITS-CLIP data set obtained from primary effusion lymphoma cells (Haecker et al. [217]).
I detected the expression of 12 of the candidate novel miRNAs in this external independent data set (≥1 RPM; ≥1 sample), thus providing evidence that these 12 miRNAs do indeed interact with the Ago protein (a subunit of the RISC complex), and thus may be bona fide miRNAs. Further, in order to detect the expression of these candidate novel miRNAs using an orthogonal technology, RT-qPCR was performed on tumor samples. Four of the 12 miRNAs that were verified by HITS-CLIP (NOVELM00060M, NOVELM00113M, NOVELM00222M, NOVELM00290M) were tested. These experiments confirmed the presence of all four of the tested miRNAs.

2.2.3 miRNA expression in DLBCL

To obtain a comprehensive list of candidate novel and known miRNAs that were characteristic of DLBCL, I compared the expression of each miRNA in DLBCL samples with that in benign centroblasts, using my miRNA-seq data. I noted that 63 miRNAs exhibited increased abundance in DLBCL, while 39 miRNAs exhibited decreased abundance in DLBCL (Wilcoxon test BH q-value <0.05; log2 fold change >2; Figure 2.2A). Of the miRNAs with increased abundance in DLBCL, only miR-125b-5p [218] and miR-34-5p [219] have been previously implicated in lymphomagenesis in mouse models. To identify miRNAs that were more abundant in either ABC or GCB DLBCL subtypes, I performed differential expression analysis for each miRNA by comparing expression values between the two groups. Twenty-three miRNAs were more abundant in ABC-DLBCL, while 30 miRNAs were more abundant in GCB-DLBCL (Wilcoxon test BH q-value <0.05; Figure 2.2C). In addition, my analysis revealed that the miRNAs whose expression is increased in GCB-DLBCL appear to target transcripts that are known to be dysregulated in the formation of germinal center lymphomas [220]. These miRNA:mRNA pairs, which had anti-correlated expression in my data, include miR-181-5p:BCL2, miR-181a-5p/miR-28-5p/miR-3150-3p/miR-589-5p:IFNAR1 and miR-129-5p/miR-3150b-3p/miR-28-3p:IRF4.
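The differential-abundance calls in this section combine a Benjamini-Hochberg (BH) q-value cut-off with a log2 fold-change cut-off. The sketch below shows only those two steps, taking per-miRNA p-values (for example, from a Wilcoxon rank-sum test) as given. The function names, the use of group means, and the pseudocount of 1 are illustrative assumptions and are not taken from the thesis.

```python
import math


def benjamini_hochberg(p_values):
    """Return BH-adjusted q-values in the same order as the input p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """log2 ratio of mean expression (A over B); the pseudocount is an assumption."""
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    return math.log2((mean_a + pseudocount) / (mean_b + pseudocount))


def call_increased(names, p_values, fold_changes, max_q=0.05, min_lfc=2.0):
    """Names with q-value below max_q and log2 fold change above min_lfc."""
    q_values = benjamini_hochberg(p_values)
    return [n for n, q, lfc in zip(names, q_values, fold_changes)
            if q < max_q and lfc > min_lfc]
```

Decreased abundance can be called the same way by swapping the two groups before computing the fold change.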
2.2.4 B-cell-enriched miRNA expression profiles

Given that miRNA expression is often cell lineage-specific [221], I sought to identify B-cell-enriched profiles using a pan-cancer miRNA-seq analysis. I compared my B-cell data set (DLBCL and centroblast samples) to TCGA data from 21 other cancer types (BLCA: bladder urothelial carcinoma; BRCA: breast invasive carcinoma; CESC: cervical squamous cell carcinoma and endocervical adenocarcinoma; COAD: colon adenocarcinoma; HNSC: head and neck squamous cell carcinoma; KICH: kidney chromophobe; KIRC: kidney renal clear cell carcinoma; KIRP: kidney renal papillary cell carcinoma; LGG: brain lower grade glioma; LIHC: liver hepatocellular carcinoma; LUAD: lung adenocarcinoma; LUSC: lung squamous cell carcinoma; OV: ovarian serous cystadenocarcinoma; PAAD: pancreatic adenocarcinoma; PRAD: prostate adenocarcinoma; READ: rectum adenocarcinoma; SARC: sarcoma; SKCM: skin cutaneous melanoma; STAD: stomach adenocarcinoma; THCA: thyroid carcinoma; UCEC: uterine corpus endometrial carcinoma) to identify miRNAs that were differentially expressed between my B-cell data set and all other TCGA cancer types. The 17 DLBCL cases from the TCGA data set were included in the B-cell test group for these comparisons. This analysis identified 15 miRNAs that were significantly more abundant in B-cell contexts when compared with each of the 21 cancer types (Wilcoxon test BH q-value <0.05; log2 fold change >3; Figure 2.2B). miR-142-3p was the most significantly increased, displaying a 64-fold increase in B-cell contexts (Figure 2.3). Interestingly, miR-142 expression was also more abundant in the benign centroblast samples when compared with the DLBCL patient samples, suggesting that miR-142 could play an important role in normal B-cell function.
Of the miRNAs that were significantly more abundant in B-cell contexts when compared with other cancers, abundant expression of miR-3150b-3p, miR-6087, and miR-4491 in B-cells has not been previously reported. My analysis indicated that miR-4491 may be involved in suppressing the expression of genes associated with the innate immune response (GO:0045087). Supporting this notion was the observation that several of these immune response genes are also frequently less abundantly expressed in GCB-DLBCL, including IFNAR1, TLL2, TLR4, and TLR8 [220]. I found that 17 miRNAs were significantly decreased in abundance in my B-cell data set when compared to other cancers (Wilcoxon test BH q-value <0.05; log2 fold change <-3; Figure 2.2B). Of note, members of the miR-200 family (miR-200a-3p, miR-200a-5p, miR-200b-3p, miR-200b-5p, miR-200c-3p, and miR-200c-5p) were the most significantly decreased in abundance. In agreement with this, it has been reported that reduced expression of miR-200 family members results in more aggressive DLBCL through the de-repression of ZEB1, a gene which induces the epithelial to mesenchymal transition (EMT) [222].

2.2.5 Integrative analysis of miRNA and mRNA expression

miRNA expression can regulate translation and mRNA stability. Considering the latter mechanism, I assessed the relationship between aberrantly expressed miRNA and mRNA abundance. Using the miRNA and mRNA profiles from the 92 DLBCL and 15 centroblast samples, I identified putative miRNA:mRNA regulatory interactions (Figure 2.4). miRNAs that were more abundantly expressed in DLBCL appeared to interact with genes enriched in the Gene Ontology (GO) biological processes related to cell cycle, metabolic processes, chromatin modification, protein modification, nerve growth factor signaling pathways, and organelle organization (Figure 2.5).
Conversely, miRNAs that were expressed at lower levels in DLBCL appeared to interact with genes that were enriched in GO biological processes related to extracellular organization, cellular adhesion, defense and wounding responses, actin cytoskeleton organization, blood vessel morphogenesis, and endocytosis (Figure 2.5). miR-10393-3p, the candidate novel miRNA that was more abundantly expressed in DLBCL than in centroblasts, appeared to interact with transcripts from chromatin modifier genes. These genes included BRPF3, RCOR1, WHSC1L1, WHSC1, CHD6, KDM5C, SMARCA4, MLL2/KMT2D, and EP300. Although the number of targeted chromatin modifiers was not sufficient to statistically enrich the chromatin modification GO Term (GO:0016568), two of these candidate targets (MLL2/KMT2D and EP300) are frequently mutated in NHL [205] (Figure 2.6A & B). This result is compatible with the notion that chromatin modification may be dysregulated in NHL patient samples by both miRNA-mediated repression and somatic mutation. These two interactions were further validated using luciferase assays, where over-expression of miR-10393-3p inhibited the luciferase activity of constructs containing each of the four predicted MLL2/KMT2D binding sites (Figure 2.6C). Sites 1 to 3 of MLL2/KMT2D contain the full putative miR-10393-3p binding site, whereas site 4 contains a 1 bp mismatch. The mismatch in site 4 may explain the reduced sensitivity to over-expression of miR-10393-3p for both the perfect binding and mismatched constructs. The effect of miR-10393-3p over-expression was similar for each of the four predicted EP300 binding sites (Figure 2.6D), where sites 1 and 3 of EP300, which contain the putative miR-10393-3p binding site, were more sensitive to miR-10393-3p over-expression than sites 2 and 4, which contain a 2 bp and a 1 bp mismatch, respectively.
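The integrative analysis in this section rests on finding predicted miRNA:mRNA pairs whose expression is anti-correlated across samples. The sketch below illustrates that idea only. This section does not state which correlation statistic or cut-off was used, so the Spearman rank correlation and the threshold of -0.5 are assumptions, and all names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation, computed as Pearson on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def anticorrelated_pairs(mirna_expr, mrna_expr, predicted_pairs, max_rho=-0.5):
    """Keep predicted (miRNA, gene) pairs whose correlation is at most max_rho."""
    hits = []
    for mirna, gene in predicted_pairs:
        rho = spearman(mirna_expr[mirna], mrna_expr[gene])
        if rho <= max_rho:
            hits.append((mirna, gene, rho))
    return hits
```

Here `mirna_expr` and `mrna_expr` map names to expression vectors over the same ordered set of samples, and `predicted_pairs` would come from a target-prediction resource. Anti-correlation alone does not establish regulation, which is why the thesis follows up selected pairs with luciferase assays.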
2.2.6 miRNAs associated with DLBCL patient outcome

2.2.6.1 R-CHOP-treated discovery cohort

Given that approximately 40% of DLBCL patients succumb to their disease, and that prognostic markers for improved risk stratification are needed, I sought to identify miRNAs that were associated with patient survival. For my survival analyses, I considered the subset of the 92 patients that were uniformly treated with R-CHOP (n = 83; 29 ABC-DLBCL, 41 GCB-DLBCL, and 13 unclassified-DLBCL). This cohort is hereafter referred to as the ‘Discovery Cohort’. The characteristics of my study population, including the parameters that comprise the International Prognostic Index (IPI), are shown in Table 2-1. The IPI, originally proposed in 1993 [208] and based on treatment with CHOP, and its modernized version, the R-IPI [209], which reflects the changes resulting from the addition of rituximab to the original CHOP regimen, remain the primary clinical tools used to predict outcome for patients with DLBCL [209]. However, even though both IPI and COO segregated patients into low and high clinical risk groups in my data set, the log-rank p-values were not significant (p-value >0.05; Figure 2.7).

2.2.6.2 miRNAs associated with patient survival

To identify miRNAs with expression patterns associated with patient overall survival (OS) and progression-free survival (PFS), I performed log-rank tests on X-tile-derived [223] low and high expression patient groups for each miRNA. This revealed that 58 and 45 miRNAs were associated with OS and PFS, respectively (log-rank q-value <0.05). Seven of these miRNAs (miR-330 [211], miR-93 [212], miR-148a [212], miR-155 [173], miR-151 [212], miR-181a [213], and miR-28 [212]) have previously been associated with DLBCL patient survival.
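The log-rank comparison of low and high expression groups described above can be sketched as follows. This is a generic two-group log-rank test, not the thesis code: it takes the two groups as given (the X-tile step that selects the expression cut-point is not reproduced), and the function name and argument layout are hypothetical.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value).

    times_*  : follow-up time for each patient
    events_* : 1 if the event (e.g. death) was observed, 0 if censored
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        at_risk_a = sum(1 for u, _, g in data if u >= t and g == 0)
        at_risk_b = sum(1 for u, _, g in data if u >= t and g == 1)
        events_in_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        events_in_b = sum(1 for u, e, g in data if u == t and e and g == 1)
        n = at_risk_a + at_risk_b
        d = events_in_a + events_in_b
        observed_a += events_in_a
        # Expected events in group A if both groups shared one hazard.
        expected_a += d * at_risk_a / n
        if n > 1:
            variance += d * (at_risk_a / n) * (at_risk_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value
```

Because a test of this kind is run once per miRNA, and the cut-point is itself chosen from the data, the resulting p-values need multiple-testing correction, consistent with the q-value threshold reported in the text.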
To determine which of these miRNAs were associated with OS and PFS independently of the two established indicators of DLBCL patient outcome (COO and IPI), I performed Cox proportional hazards (PH) multivariate analysis on the X-tile-derived low and high expression patient groups for each miRNA, along with COO and IPI patient status. The results of this analysis revealed that 25 miRNAs were associated with OS and PFS independently of COO and IPI (p-value <0.05; Figure 2.8A).

2.2.6.3 R-CHOP-treated validation cohort

To validate the association of these miRNAs with OS and PFS, I performed miRNA-seq on the diagnostic FFPET biopsies of 140 DLBCL patients treated with R-CHOP. I used FFPET samples because these were available. This FFPET cohort included 28 cases that were also in the fresh frozen discovery cohort; the 112 unique cases represent an independent validation cohort. The characteristics of my validation study population are shown in Table 2-3. I used the 28 cases with miRNA-seq data from both fresh frozen and FFPET samples to explore the potential effects of formalin fixation. To do so, I compared miRNA expression from FFPET and fresh frozen samples of these cases using hierarchical clustering. The result was two clusters: one consisting predominantly of fresh frozen samples, and the other consisting predominantly of FFPET samples (Figure 2.9). The majority (23/28) of FFPET samples clustered with other FFPET samples rather than with their matched fresh frozen sample, indicating that FFPET samples are more similar to other FFPET samples than they are to matched fresh frozen samples from the same patient. This is in agreement with a previous study that reported RNA degradation in FFPET miRNA-seq data [152].
2.2.6.4 Validation of miRNAs associated with patient survival

Despite the differences between fresh frozen and FFPET miRNA-seq expression profiles (Figure 2.9), I performed survival analyses (as performed in the Discovery Cohort) based on miRNA expression profiles obtained from FFPET samples in the Validation Cohort. This validation analysis replicated several associations of miRNA expression with OS and/or PFS that had been identified in the Discovery Cohort. Specifically, I validated the association of 28 of 58 miRNAs (48%) with OS, and the association of 19 of 45 miRNAs (32%) with PFS (log-rank p-value <0.05). My analysis also validated the association of six of 25 miRNAs (24%) with both OS and PFS independent of COO and IPI (Cox PH p-value <0.05). These six miRNAs include miR-28, which was previously associated with survival in DLBCL [212], and five other miRNAs that have not previously been associated with DLBCL patient survival. I observed that abundant expression levels of miR-28-5p, miR-214-5p, miR-339-3p, and miR-5586-5p were associated with superior outcome, while abundant expression levels of miR-324-5p and NOVELM00203M were associated with poor outcome (Figure 2.8A). Representative Kaplan-Meier curves and expression values for miR-5586-5p in both the Discovery and Validation Cohorts are displayed in Figure 2.8B-E, while results for the other five miRNAs are displayed in Figure 2.10.

2.2.6.5 miRNA expression profiles associated with patient survival

I next sought to determine whether DLBCL patients could be stratified using their miRNA expression profiles. Unsupervised non-negative matrix factorization (NMF) consensus clustering (Figure 2.11A and Figure 2.12), using only the miRNA expression profiles of the 83 R-CHOP treated patients, identified an optimum of two groups of patients (Figure 2.11A) with distinct outcome correlations (Figure 2.11B) and miRNA expression patterns (Figure 2.11C).
These two groups did not differ based on any clinical characteristics, including age, sex, lactate dehydrogenase level, number of extranodal sites, cell-of-origin subtype, or other parameters such as presence of a chromosomal break-apart at BCL2, BCL6, or MYC (Chi-square test p-value >0.05). However, two miRNAs were significantly differentially expressed between the groups. In the cluster of patients with poorer outcome, miR-148a was increased in abundance and miR-21 was decreased in abundance compared to the cluster of patients with superior outcome (Figure 2.11A). Low expression of miR-21 in tumors [175] and in serum [211] of DLBCL patients has been associated with poor outcome, and high expression of miR-148a has been associated with poor survival in a COO-based classifier [212]. In my discovery cohort, miR-21 and miR-148a expression patterns were significantly associated with OS and PFS (Figure 2.13A), and this trend was also evident in my validation cohort, although not at statistically significant levels (Figure 2.13B). Both of these miRNAs appear to be highly expressed and highly variable in DLBCL and centroblast samples and exhibit discontinuous expression patterns (Figure 2.11D), suggesting that they may be robustly detected in clinical samples. My integrative analysis revealed that miR-148a candidate targets included genes associated with the immune response (GO:0006955); for example, AMICA1, CCR5, CD28, CD3G, CD8A, CD96, CLEC10A, CSF1, CTSW, CXCL12, CXCL16, GZMM, ITK, LCP2, MX2, NUB1, OASL, PRKCQ, SAMHD1, SELL, SIGIRR, TMEM173, and XCL1. Of note, CXCL12 is a chemokine that plays a role in germinal center homing [224], and CCR5 expression is associated with the transformation of mucosa-associated lymphoid tissue (MALT) lymphoma to DLBCL [225].
The observation that several immune response genes were candidate targets of miR-148a is compatible with the notion that DLBCL patients with higher miR-148a expression levels exhibit attenuated immune responses due to the repression of immune response genes. Further, six of the genes (CD28, CD3G, CD8A, ITK, LCP2, PRKCQ) are part of the T-cell receptor pathway, suggesting that T-cell interactions could be disrupted in patients with poor prognosis.

2.3 Discussion

I report here on the first deep sequencing analysis of DLBCL miRNA expression profiles. I generated miRNA and mRNA expression profiles from miRNA-seq and mRNA-seq data from 92 patient samples (including samples from 83 uniformly R-CHOP treated patients) and 15 normal centroblast fresh frozen samples and analyzed the expression of known and candidate novel miRNAs. Further, miRNAs from a validation cohort of 140 FFPET-derived DLBCL samples were sequenced to confirm the results of my survival analyses. In addition, my integrative miRNA:mRNA expression analysis was used to inform on the potential impact of miRNA dysregulation on B-cell biology and on DLBCL pathogenesis. These data provide a genome-wide view of miRNA expression and dysregulation in DLBCL. Existing miRNA profiling efforts in DLBCL patient cohorts have largely been probe-based [84,173,175,211,226], and thus are biased toward detection of known miRNAs at the expense of identification of candidate novel miRNAs. miRNA-seq does not have this limitation, and thus provides an opportunity to identify candidate novel miRNA species. A previous miRNA-seq analysis of 3 DLBCL cell lines identified more than 200 novel miRNAs [214]. Here I report on the discovery of an additional 234 novel miRNAs in 92 DLBCL tumor samples, where 30 of these were frequently expressed across DLBCL tumor samples and 29 were also detected (median RPM >1) in the FFPET Validation Cohort (n = 112).
Of note, miR-10393-3p appeared to be more abundant in DLBCL tumor samples than in benign centroblasts. Further, my analysis is compatible with the notion that miR-10393-3p may play a role in the pathogenesis of DLBCL through attenuation of chromatin modifier gene expression. DLBCL tumors have been shown to have miRNA expression profiles distinct from those of benign B-cells, and dysregulated miRNAs have functional roles in B-cell differentiation and lymphomagenesis [85]. To shed light on the functions of dysregulated miRNAs, I performed an integrative miRNA and mRNA expression analysis which provided a transcriptome-wide view of miRNA:mRNA interactions that may be acting in DLBCL. This analysis indicated that the miRNAs that are abundantly expressed in DLBCL may modulate cell cycle regulation, cell metabolism and chromatin modification. Several efforts [108,205] recently reported the frequent mutation of chromatin modification genes in NHL, illustrating the relevance of the epigenome in malignant progression. My work here presents miRNA-mediated repression as another mechanism for the dysregulation of chromatin modification genes that are mutated in NHL. First, I show that a candidate novel miRNA (miR-10393-3p) is abundantly expressed in DLBCL when compared with centroblasts. Further, miR-10393-3p exhibits expression profiles that are anti-correlated with the expression profiles of 11 chromatin modification-related genes, including MLL2/KMT2D and EP300, which are recurrent targets of somatic mutation in NHL [205]. These results suggest that DLBCL progression could proceed through mutations or miRNA-mediated repression as mechanisms that dysregulate the epigenome. Given that DLBCL comprises molecularly distinct subtypes, I sought to identify differentially expressed miRNAs that were associated with these subtypes.
miRNAs that were upregulated in ABC-DLBCL included members of the oncogenic miR-17-92 cluster (miR-106a, miR-17, miR-20a, miR-92a) [92], and others that have been implicated in lymphomagenesis in mouse models (miR-155 [218], miR-21 [227]). Although not previously implicated in the pathogenesis of ABC-DLBCL, miR-625 has been shown to regulate invasion and metastasis in gastric cancer by targeting and regulating the expression of ILK [228]. Members of the miR-29 family, including miR-29b, target the WNT signaling pathway by attenuating expression of DNMT3A and DNMT3B [229]. Members of the miR-30 family have been shown to bind to and regulate BCL6 in B-lymphocytes and lymphoma cells [230]. Thus, decreased expression of miR-30b in GCB-DLBCL could promote the germinal center phenotype through the de-repression of BCL6. Previously, a pan-cancer miRNA analysis revealed that miRNA expression profiles tend to be tissue specific and can distinguish cancer samples of different cancer types from one another [221]. Another pan-cancer effort demonstrated that expression levels of miR-142 and miR-509 are characteristic of lymphomas when compared with melanomas within a decision tree consisting of 25 cancer types [231]. My comparison of DLBCL and centroblast miRNA expression data to similar data from TCGA cancers showed that the miRNAs that are frequently expressed in DLBCL (including 3 candidate novel miRNAs) tended to have B-cell enriched expression patterns and candidate functions and they are frequently dysregulated in B-cell lymphomas. For instance, miR-191 is part of a 6-miRNA signature that can discriminate B-lineage acute lymphoblastic leukemia (ALL) sub-groups harboring specific molecular lesions [232]. miR-7 is abnormally increased in abundance in lymphoid cancers including childhood ALL [233] and follicular lymphoma [234].
miR-155 expression is known to be crucial in the B-cell germinal center transition through regulation of the master B-cell regulator AID [235], and its expression levels are crucial for normal B-cell function: over-expression of miR-155 is associated with DLBCL, while under-expression is associated with Burkitt lymphoma [236]. miR-142, the miRNA that displayed the most significant increase in abundance in B-cell contexts, has been shown to regulate B-cell stimulation by downregulating the expression of SAP, CD84 and IL-10 proteins [237]. miR-142 is also mutated in approximately 20% of DLBCL cases, where mutations in the seed region lead to a loss of base pair affinity to oncogenic RAC1 and ADCY9 mRNA transcripts and a possible gain of base pair affinity to transcriptional repressors ZEB1 and ZEB2 [238]. The ability to accurately predict response to therapy and survival is advantageous for DLBCL patient treatment planning. In this regard, there have been several efforts to explore the utility of miRNA expression. For example, Alencar et al. [213] investigated the prognostic value of 11 miRNAs using qPCR, while Montes-Moreno et al. [212] similarly evaluated miRNA profiles in 36 patients using microarray-based technology. My results reveal that the expression of 25 miRNAs is associated with both OS and PFS independently of established indicators of patient outcome (COO and IPI). I also replicated my survival analyses in a FFPET-derived Validation Cohort. Studies have compared miRNA expression profiles obtained from FFPET and fresh frozen samples, and have shown that differences between profiles exist [152-154]. For example, miRNAs extracted from FFPET tend to have shorter average lengths [153], reduced purity [152], and higher expression levels than miRNAs from fresh frozen samples [154].
Despite the differences between the fresh frozen discovery and FFPET validation cohorts I studied, I was able to replicate the robust association of six miRNAs (miR-28-5p, miR-214-5p, miR-339-3p, miR-5586-5p, miR-324-5p and NOVELM00203M) with OS and PFS independently of COO and IPI. The independent association of these miRNAs with OS and PFS suggests that there is heterogeneity within the groups derived from the COO and IPI classifications. Further, my integrative analysis indicated that the mRNA targets of NOVELM00203M are involved in cell adhesion (GO:0007155), reinforcing the importance of cell adhesion [239] in the pathogenesis of DLBCL. miR-28 has previously been associated with DLBCL patient outcomes [212], and is a tumor-suppressor in Burkitt Lymphoma [240]. However, the other five miRNAs I identified as independent factors affecting survival of patients with DLBCL (miR-214-5p, miR-339-3p, miR-5586-5p, miR-324-5p and NOVELM00203M) have not previously been implicated in DLBCL outcome. Although beyond the scope of this study, these miRNAs may serve as the basis for a future prognostic tool and will inform further studies of DLBCL biology. I describe, for the first time, deep and comprehensive profiling of the DLBCL miRNA expression landscape using miRNA-seq. In accordance with the goal of the thesis to stratify heterogeneous cancer patient populations by outcome and other clinical and molecular correlates, I identified 2 sub-groups of DLBCL patients that had distinct outcomes and were characterized by differential expression of miR-148a and miR-21. Of particular note, my analysis identified (in both the discovery and validation cohorts) five known miRNAs and one candidate novel miRNA (miR-28-5p, miR-324-5p, miR-214-5p, miR-339-3p, miR-5586-5p, NOVELM00203M) that were associated with patient survival independently of established indicators of outcome.
My integrative miRNA:mRNA analysis revealed that miRNAs that are upregulated in DLBCL appear to regulate genes involved in modulating the epigenome, and several of these are recurrently mutated in DLBCL as previously reported. It thus appears that dysregulation of the epigenome in DLBCL can be achieved through these different mechanisms. In addition, my comparison of DLBCL miRNA-seq expression profiles with those from 7,238 TCGA miRNA-seq libraries identified miRNAs (including three candidate novel miRNAs) that were more abundant in B-cell contexts, suggesting that these miRNAs may have B-cell specific functions in malignancy.

2.4 Methods

2.4.1 Lymphoma patient samples (both discovery and validation cohorts)

This project was approved by the University of British Columbia–BC Cancer Agency Research Ethics Board as part of a broad effort to increase understanding of the molecular biologic characteristics of lymphoid cancers (REB #H05-60103). Informed consent was obtained in accordance with the Declaration of Helsinki. Lymphoma samples were classified by an expert hematopathologist (RDG) according to the World Health Organization criteria of 2008.

2.4.2 Patient sample acquisition (discovery cohort)

Benign specimens were purified CD77-positive centroblasts sorted from reactive tonsils using Miltenyi magnetic beads (Miltenyi Biotec, CA, USA). More details and the cell-of-origin subtype assignment (performed using RNA-seq expression values) are reported in Morin et al. [205]. RNA extraction was performed as reported in The Cancer Genome Atlas Research Network, 2013 [66].

2.4.3 Patient sample acquisition (validation cohort)

These samples were obtained from FFPET blocks, from each of which one to two 10 μm scrolls were cut.
Total RNA, including miRNA, was extracted from FFPET tissues using AllPrep DNA/RNA FFPET (Qiagen) and High Pure (Roche) kits in a procedure developed by the TCGA project through the Biospecimen Core Resources at Nationwide Children’s Hospital and International Genomics Consortium (manuscript in preparation). The cell-of-origin subtype assignment was performed as reported in Scott et al. [241].

2.4.4 Library construction and sequencing of miRNA-seq Illumina libraries

miRNA-seq library construction, sequencing, read alignment, and miRNA expression profiling were performed as reported in The Cancer Genome Atlas Research Network, 2013 [66]. My threshold for calling expressed miRNAs (>10 RPM in >10% samples) was based on miRBase criteria [216] for high confidence miRNAs. The miRNA-seq BAM files of DLBCL samples from both the discovery and validation cohorts and the centroblasts have been uploaded to EGA (Study#: EGAS00001001025 and Data Set#(s): EGAD00001001073, EGAD00001001074, EGAD00001001075).

2.4.5 Discovery of candidate novel miRNAs

Novel miRNA discovery was performed using miRDeep [242] in each of the 92 DLBCL miRNA-seq libraries. miRNA-seq reads were extracted from BAM files into a SAM format that was then analyzed using the miRDeep algorithm. As recommended by the authors of the software, only miRNA-seq reads >17 nucleotides in length were used for analysis. A list of all candidate novel miRNAs and their genomic coordinates was obtained from the results of each miRDeep run and then merged into a single file to eliminate duplicate entries. Merging was performed using a Perl script that treated candidates as duplicates if their genomic coordinates overlapped within +/- 2 bp. Each unique candidate novel miRNA was then given a name with the following format: ‘NOVEL[M/S]XXXXX’, where M and S indicated the mature or star strand respectively, and where the Xs represented a unique index number for each entry. In several instances, miRDeep incorrectly identified other RNA species (for example, snoRNA and tRNA) as miRNA.
These were identified by intersecting their sequence coordinates with tracks supplied by UCSC [243] for these RNA species (using intersect of the bedtools package v2.16.2), and disregarded in subsequent analyses. NOVELM00113, NOVELM00156, NOVELM00203, NOVELM00289, and NOVELM00295 were retained for analysis, but I note that they also share sequence identity with mt-tRNA, RNU12, SOX2-OT, RNU4-82P, and RNA28S5, respectively, and thus may also be classified as other species of RNAs. The shortlisted genomic coordinates were then used as annotations in the miRNA profiling pipeline to assess the expression of the candidate novel miRNAs in all 92 DLBCL and 15 centroblast miRNA-seq libraries.

2.4.6 Analysis of HITS-CLIP data

HITS-CLIP data from Haecker et al. [217] were obtained from the Sequence Read Archive (ID: SRR580359, SRR580360, SRR580361, SRR580362, SRR580363). The reads were aligned and processed for miRNA expression with the same protocols that were used for my miRNA-seq libraries.

2.4.7 Quantitative RT-PCR for novel miRNA validation

To measure miRNA expression, leftover total RNA from the tumor tissues used for miRNA sequencing was reverse transcribed into cDNA using the Universal cDNA Synthesis Kit II (Exiqon, Denmark), and qPCR was performed using the ExiLENT SYBR Green master mix (Exiqon) following the manufacturer’s protocol. Reverse transcription conditions were 42°C for 60 min followed by 95°C for 5 min. cDNA was stored at -20°C until ready for use and was diluted 1:80 prior to use for qPCR. qPCR conditions were 40 cycles of 95°C for 10 s and 60°C for 1 min. All measurements were performed in triplicate. miRNA expression was normalized to endogenous RNU48 levels using the ∆∆Ct method.

2.4.8 mRNA isoform-specific expression profiling with mRNA-seq

mRNA-seq sequence data were obtained from Morin et al. [205]. The mRNA-seq paired-end reads were aligned to the RefSeq hg19 genome using TopHat v1.4.1 [244].
Alignments were then interrogated for isoform-specific expression profiles using Cufflinks v1.3.0 [244]. Only mRNA transcript isoforms that were expressed at 1 fragment per kilobase of transcript per million mapped reads (FPKM) in at least 10% of samples were considered for analysis.

2.4.9 Differential expression analysis

Prior to differential expression analysis, miRNA expression profiles were quantile normalized using the R preprocessCore package. Evaluation of the differential expression of miRNA and mRNA was performed using the Wilcoxon rank-sum test for each miRNA and mRNA. Significantly differentially expressed miRNAs had Benjamini-Hochberg (BH) multiple test corrected p-values (q-values) <0.05.

2.4.10 Integrative miRNA:mRNA expression analysis

For the integrative miRNA:mRNA expression analysis I considered miRNAs and mRNA transcript isoforms that were expressed in >10% of DLBCL and centroblast samples. A Spearman correlation coefficient (rho) and p-value were generated for each miRNA:mRNA pair. The p-values were then multiple-test corrected for each miRNA with the BH algorithm. Significantly anti-correlated pairs were those that had Spearman correlation coefficients <0 and adjusted q-values <0.05. To account for correlations that might have been stochastic noise, the rho distribution was then divided into 40 bins and the counts for each bin compared with counts from a null distribution. miRNA:mRNA pairs in each bin were sorted by adjusted p-value, and only those that ranked above the threshold set by counts from bins derived from the null distribution were considered for further analysis. The null distribution was derived by performing the Spearman correlations 100 times, each time randomizing the miRNA-seq library IDs. Two algorithms were used for miRNA target prediction: TargetScan 6.0 [155] and miRanda [156]. Target prediction was performed on all RefSeq hg19 mRNA transcript isoform sequences (including the 5’-UTR, CDS and 3’-UTR).
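As an illustration only (this is not the code used for the analyses in this chapter), the correlation and multiple-test correction steps described above can be sketched in Python. The function names (average_ranks, spearman_rho, bh_adjust, anticorrelated_pairs) are hypothetical, the input to anticorrelated_pairs is assumed to be a list of (miRNA, mRNA, rho, p-value) tuples whose p-values have already been computed, and the permutation-based null filter is not reproduced here.

```python
from collections import defaultdict


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rho computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when one profile is constant
    return cov / (sx * sy)


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    qvalues = [0.0] * m
    running_min = 1.0
    for pos, idx in enumerate(order):
        rank = m - pos  # 1-based rank of this p-value in ascending order
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


def anticorrelated_pairs(results, alpha=0.05):
    """Keep pairs with rho < 0 and q < alpha, correcting within each miRNA."""
    by_mirna = defaultdict(list)
    for mirna, mrna, rho, pvalue in results:
        by_mirna[mirna].append((mrna, rho, pvalue))
    kept = []
    for mirna, rows in by_mirna.items():
        qvalues = bh_adjust([p for _, _, p in rows])
        for (mrna, rho, _), q in zip(rows, qvalues):
            if rho < 0 and q < alpha:
                kept.append((mirna, mrna))
    return kept
```

Grouping by miRNA before calling bh_adjust mirrors the statement that p-values were corrected for each miRNA separately; correcting across all pairs at once would be a different, stricter choice.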
While it is generally accepted that miRNAs target the 3’-UTR of mRNA transcripts, there are also reports of miRNA target sites in the CDS (for example, Forman et al. [245]; Duursma et al. [76]; Qin et al. [246]; Ott et al. [247]). In addition, the binding of miRNAs to binding sites within the 5’-UTR is as effective as binding to sites within the 3’-UTR [77]. Further, binding of miRNAs to CDS regions has been confirmed using large-scale high throughput approaches for isolating Argonaute-bound target sites (Chi et al. [248]; Hafner et al. [249]). Thus, although evidence for binding sites in 5’-UTR and CDS regions is still accumulating, evidence for them exists in the literature and so I included them in my analysis along with those within the 3’-UTR. Although I required that candidate binding sites be identified using both TargetScan 6.0 [155] and miRanda [156], it is possible that certain predictions represent false positives. miRNA sequences and input data for annotated miRNAs were obtained from TargetScan and miRanda, respectively, while candidate novel miRNA sequences were obtained from miRNA-seq consensus sequences. miRNA:mRNA pairs were considered to have a miRNA-mediated repression interaction if they had anti-correlated expression profiles and the miRNA had a predicted binding site (determined by both algorithms) on the mRNA.

2.4.11 Gene ontology (GO) term enrichment analysis

GO term enrichment analysis was performed using the MGSA (v 1.10.0) R package [250]. The lists of predicted target genes (obtained from the integrative expression analysis) for each miRNA were assessed separately for enriched GO Bioprocess terms. Significant terms were those with standard error measurements <0.05 and estimates >0.2. To assess whether a group of miRNAs (where a group consisted of miRNAs that are upregulated in DLBCL) together enriched particular GO terms more than expected by chance, a Fisher’s Exact test was performed for each enriched term.
The numbers of miRNAs in the category and out of the category that enriched the GO term were compared. The Fisher’s Exact Test p-values were then multiple-test corrected with the BH algorithm, where significant enrichments by a category were those with q-values <0.05.

2.4.12 Cell culture

HEK-293 cells were maintained in Dulbecco’s Modified Eagle Medium (DMEM; Life Technologies, Burlington ON) supplemented with 10% (v/v) fetal bovine serum (FBS; Life Technologies) in a humidified 37°C incubator with 5% CO2.

2.4.13 Plasmid constructs

The MLL2/KMT2D or EP300 genomic or mismatched sequences corresponding to the predicted miR-10393-3p binding sites were synthesized (IDT Technologies, Coralville, IA, USA) and cloned into the XhoI/NotI restriction sites of the psiCHECK2 vector (Promega, Madison, WI, USA) directly downstream of the Renilla luciferase reporter gene and verified by DNA sequence analysis. The mismatched sequences were designed to be exactly complementary to the seven nucleotide seed regions of each of the predicted miR-10393-3p binding sites to MLL2/KMT2D or EP300.

2.4.14 miRNA mimics

miRNA expression was increased using miRIDIAN miRNA mimics (ThermoScientific, Waltham MA) directed against miR-10393-3p (M10393; 5’-UUGGUCAGAUUUGAACUCUUCA-3’) and negative control #2 (NC2; non targeting control against C. elegans cel-miR-239b). Mimics were resuspended in nuclease-free water at a stock concentration of 100 μM.

2.4.15 Dual-luciferase reporter assays

HEK-293 cells were seeded onto 24-well plates the day before transfection. Perfect binding or mismatched reporter constructs were co-transfected with miR-10393-3p mimics or NC2 control mimics using TurboFect Transfection Reagent (ThermoScientific) in OPTI-MEM (Life Technologies) without FBS. Six hours following transfection, the medium was changed to DMEM supplemented with 10% FBS.
Cells were reseeded the following day into 96-well plates and 48 h following transfection, cells were lysed and luciferase activities were assayed using the Dual-Glo Luciferase Reporter Assay System (Promega). The Renilla:Firefly luciferase ratios were calculated for each well to account for transfection efficiencies. These experiments were performed in quadruplicate and are shown as means ± SEM. Statistical comparisons were performed using unpaired two-tailed t-tests with Bonferroni multiple-test correction, where significant differences were those with adjusted p-value <0.05.

2.4.16 Survival analysis

Progression-free survival (PFS; event = progression of disease or death from any cause) and overall survival (OS; event = death from any cause) were estimated. For each miRNA, I used X-tile cohort separation [223] to categorize patients into low and high expression groups, and then performed log-rank tests based on these derived groups. For the multivariate analysis for each miRNA, I considered the aforementioned low and high expression groups along with COO and IPI status using the Cox proportional hazards (Cox PH) method. All calculations were performed using the Survival R package [251]. Survival analyses were performed as above for both the discovery and validation cohorts. Significant associations with survival were those with p-value <0.05. In addition, p-values obtained from the log-rank tests in the discovery cohort were subjected to multiple-test correction using the BH algorithm, and significant associations for that analysis were those with corrected p-values (q-values) <0.05.

2.4.17 NMF clustering of miRNA-seq expression

miRNAs that were expressed at levels >10 RPM in at least 10% of the 92 DLBCL and 15 centroblast samples were included in the NMF clustering analysis.
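The expression filter stated above (the same form of threshold is used for calling expressed miRNAs in Section 2.4.4) is simple to express in code. The following Python sketch is illustrative only and is not the pipeline used in this work; the function names and the toy RPM values are hypothetical, and it assumes that "at least 10%" is evaluated as a fraction of all libraries supplied.

```python
def is_expressed(rpm_values, min_rpm=10.0, min_fraction=0.10):
    """True if the miRNA exceeds min_rpm in at least min_fraction of libraries."""
    if not rpm_values:
        return False
    n_above = sum(1 for value in rpm_values if value > min_rpm)
    return n_above / len(rpm_values) >= min_fraction


def filter_expressed(profiles, min_rpm=10.0, min_fraction=0.10):
    """profiles maps a miRNA name to its per-library RPM values."""
    return sorted(
        name
        for name, rpm_values in profiles.items()
        if is_expressed(rpm_values, min_rpm, min_fraction)
    )


# Toy example with 10 libraries per miRNA (values are made up).
toy_profiles = {
    "miR-x": [0, 0, 0, 0, 0, 0, 0, 0, 0, 50],   # above 10 RPM in 1 of 10
    "miR-y": [9, 9, 9, 9, 9, 9, 9, 9, 9, 10],   # never above 10 RPM
    "miR-z": [120, 95, 300, 80, 70, 65, 150, 90, 110, 200],
}
expressed = filter_expressed(toy_profiles)
```

Note that the comparison against min_rpm is strict, matching ">10 RPM" in the text, while the sample fraction uses ">=" to match "at least 10%".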
Because I was interested in assessing associations with outcome between groups of patients, I only considered the data from the 83 patients that were uniformly treated with R-CHOP for this clustering analysis. I generated unsupervised consensus clustering results as described in The Cancer Genome Atlas Research Network, 2013 [66]. I used the default Brunet algorithm and 100 iterations for the rank survey and clustering runs. A preferred cluster result was selected by considering the profiles of the cophenetic scores of the consensus membership matrix for clustering solutions having between two and eight clusters. I chose the 2-group (k = 2) solution as it had the second highest cophenetic score and produced a visually clean consensus matrix when compared with the other solutions (Figure 2.12). Since some of the k = 3 to 8 solutions have relatively high cophenetic scores, there is likely heterogeneity within ‘cluster 2’ of the k = 2 solution. However, I chose to present the k = 2 solution because the focus of my analysis was on the characterization of ‘cluster 1’, the cluster that does not lose its integrity as I increase the number of clusters. That is, in the k = 8 solution, ‘cluster 1’ (from the k = 2 solution) still appears as a distinct cluster of patients with poor outcome that is characterized by reduced expression of miR-21 and abundant expression of miR-148a.

2.5 Figures and tables

Figure 2.1 Profiling miRNA expression in DLBCL
(a) miRNA sequence analysis identifies small RNA species (miRNA, rRNA, snoRNA, tRNA, srpRNA, scRNA, snRNA), with the majority of reads aligning to miRNA loci. The pie chart depicts the proportion and origin of miRNA-seq aligned reads, where the blue slice represents miRNAs. Reported proportions are averaged across the 92 DLBCL and 15 centroblast libraries. (b) Expression of candidate novel miRNAs across DLBCL and centroblast libraries.
Column labels represent the type of sample: Dark Blue: ABC-DLBCL; Light Blue: GCB-DLBCL; Gray: Unclassified-DLBCL; Orange: Centroblasts. Row labels are annotated to indicate whether the miRNA was more abundantly expressed in a sample category. (c) Boxplots depicting expression of B-cell enriched candidate novel miRNAs (NOVELM00010M, miR-10398-3p and NOVELM00260M) in DLBCL, centroblasts, and other cancers. BLCA: bladder urothelial carcinoma; BRCA: breast invasive carcinoma; CESC: cervical squamous cell carcinoma and endocervical adenocarcinoma; COAD: colon adenocarcinoma; HNSC: head and neck squamous cell carcinoma; KICH: kidney chromophobe; KIRC: kidney renal clear cell carcinoma; KIRP: kidney renal papillary cell carcinoma; LGG: brain lower grade glioma; LIHC: liver hepatocellular carcinoma; LUAD: lung adenocarcinoma; LUSC: lung squamous cell carcinoma; OV: ovarian serous cystadenocarcinoma; PAAD: pancreatic adenocarcinoma; PRAD: prostate adenocarcinoma; READ: rectum adenocarcinoma; SARC: sarcoma; SKCM: skin cutaneous melanoma; STAD: stomach adenocarcinoma; THCA: thyroid carcinoma; UCEC: uterine corpus endometrial carcinoma. Blue: DLBCL; Orange: Centroblast.

Figure 2.2 Differential expression analyses of known and candidate novel miRNAs
Differential expression for each miRNA was calculated using the Wilcoxon rank-sum test, and p-values were multiple-test corrected using the BH algorithm. (a) MA (Log ratio (M) versus mean average (A) expression) plot showing differentially expressed miRNAs comparing DLBCL to centroblasts. (b) MA plot showing miRNAs that are differentially expressed between the B-cell data sets (DLBCL and centroblasts) and all other TCGA cancer data sets. In both MA plots, significantly differentially expressed known miRNAs are represented by red dots, while significantly differentially expressed candidate novel miRNAs are represented by green dots. (c) Heat map of differentially expressed miRNAs between the ABC and GCB DLBCL subtypes.
Column labels represent the type of sample: Dark Blue: ABC-DLBCL; Light Blue: GCB-DLBCL; Gray: Unclassified-DLBCL. Row labels indicate if the miRNA is more abundant in ABC-DLBCL or GCB-DLBCL.

Figure 2.3 miR-142 expression in DLBCL, centroblasts, and other cancers
Boxplots indicating miR-142 expression in different sample types. BLCA (Bladder Urothelial Carcinoma); BRCA (Breast invasive carcinoma); CESC (Cervical squamous cell carcinoma and endocervical adenocarcinoma); COAD (Colon adenocarcinoma); HNSC (Head and Neck squamous cell carcinoma); KICH (Kidney Chromophobe); KIRC (Kidney renal clear cell carcinoma); KIRP (Kidney renal papillary cell carcinoma); LGG (Brain Lower Grade Glioma); LIHC (Liver hepatocellular carcinoma); LUAD (Lung adenocarcinoma); LUSC (Lung squamous cell carcinoma); OV (Ovarian serous cystadenocarcinoma); PAAD (Pancreatic adenocarcinoma); PRAD (Prostate adenocarcinoma); READ (Rectum adenocarcinoma); SARC (Sarcoma); SKCM (Skin Cutaneous Melanoma); STAD (Stomach adenocarcinoma); THCA (Thyroid carcinoma); UCEC (Uterine Corpus Endometrial Carcinoma); Blue: DLBCL; Orange: Centroblast; Grey: Other cancer type.
[Plot: x-axis, miRNA-seq data set; y-axis, hsa-mir-142 (MIMAT0000433) expression (RPM); the number of libraries (n) in each data set is annotated on the plot.]

Figure 2.4 Pipeline for discovering putative miRNA:mRNA interactions
Algorithm for identification of miRNA:mRNA pairs with anti-correlated expression profiles. (i) Spearman correlations were calculated for each miRNA:mRNA pair, and interactions with Spearman correlation coefficients <0 (Benjamini-Hochberg q-value <0.05) were short listed. (ii) These interactions were filtered through TargetScan and miRanda predictions to obtain interactions where the mRNA had >1 binding site for the miRNA (predicted by both algorithms).
(iii) Only those interactions involving a miRNA that was increased in abundance in DLBCL when compared with benign centroblasts were considered, where the up-regulated miRNAs were determined by differential expression analysis (Wilcoxon test, Benjamini-Hochberg q-value <0.05).

Figure 2.5 Candidate miRNA:mRNA interactions in DLBCL are involved in various cellular processes
Chart showing Gene Ontology (GO) Bio Process terms that were enriched for genes involved in reciprocally expressed miRNA:mRNA pairs. Each column indicates a miRNA that was either increased (red label) or decreased (green label) in abundance in comparisons between DLBCL and centroblasts. Each row represents a GO term. A grey bar indicates that the GO term is significantly enriched in the candidate gene targets of one or more miRNAs. Only GO terms that were enriched by targets of at least two miRNAs are shown. The numbers of miRNAs in each category are shown on the right.

Figure 2.6 MLL2 and EP300 may be targets of miRNA-mediated repression
miR-10393-3p is involved in miRNA:mRNA interactions with chromatin modifiers MLL2/KMT2D and EP300. (a, b) miRNA and mRNA display anti-correlated expression patterns and the mRNA has a predicted binding site for miR-10393-3p (M10393). Orange dots represent centroblast libraries, red dots represent DLBCL libraries with a somatic mutation in MLL2/KMT2D or EP300, respectively, and blue dots represent DLBCL patient samples without the mutation. The boxplots to the top and right of each scatter plot summarize miRNA and mRNA expression in DLBCL (‘D’) and Centroblasts (‘C’). The green dotted line is the LOESS best-fit line. (c, d) Top: Schematic representations of the putative miR-10393-3p binding sites on MLL2/KMT2D or EP300. Putative seed regions within each site are underlined and in red font.
Bottom: Dose response of miR-10393-3p miRNA activity in HEK-293 cells was assessed using a psiCHECK2 dual luciferase reporter construct containing each of the putative MLL2/KMT2D or EP300 binding sites. Activity is measured as Renilla luciferase normalized to Firefly luciferase to control for transfection efficiencies. The data are shown as normalized relative luciferase units (RLU) with respect to the corresponding dose of the control mimic and are representative of three independent experiments (mean ± SEM). Statistically significant comparisons between the co-transfected M10393 miRNA and the NC2 control for the perfect binding reporter vector are noted over the solid colored bars. Statistically significant comparisons between perfect binding and mismatch constructs are indicated above double-headed arrows. *p-value <0.05. White bars, NC2 negative control mimics; Solid colored bars, M10393 mimics on perfect binding (PB) sites; Striped colored bars, M10393 mimics on mismatched (MM) sites.

Figure 2.7 Kaplan-Meier (KM) curves illustrating DLBCL patient survival
There were 83 R-CHOP treated patients for which I had overall survival (OS) and progression free survival (PFS) data. (a) KM curves for the entire cohort. The dotted lines above and below represent the confidence interval. (b) KM curves of patients stratified by International Prognostic Index (IPI) score (Low: 0-2; High: 3-5). (c) KM curves of patients stratified by tumor cell-of-origin (COO): ABC (Activated B-cell-like); GCB (Germinal Center B-cell-like). COO status was derived using a gene expression signature from mRNA-seq data.

Figure 2.8 Survival analyses in the discovery and validation cohorts
(a) miRNAs that were associated with OS and/or PFS in the discovery cohort (n = 83), and validation cohort (n = 112).
Six miRNAs (miR-28-5p, miR-324-5p, miR-214-5p, miR-339-3p, miR-5586-5p, NOVELM00203M) were found to be associated with OS and PFS, independently of COO and IPI, in both the discovery and validation cohorts. Light Blue: miRNAs associated with OS; Dark Blue: miRNAs associated with OS independently of COO and IPI; Light Green: miRNAs associated with PFS; Dark Green: miRNAs associated with PFS independently of COO and IPI. miR-5586-5p Kaplan-Meier curves and scatter plots of expression in (ABC)-DLBCL, (GCB)-DLBCL, and (U)nclassified-DLBCL: (b) Discovery cohort OS, (c) Discovery cohort PFS, (d) Validation cohort OS, (e) Validation cohort PFS. Plots for the other five validated miRNAs are shown in Figure 2.10.

Figure 2.9 Heat map comparing matched discovery cohort FF and validation cohort FFPET samples for 28 cases
Unsupervised clustering of miRNA expression profiles from matched fresh frozen (FF) and formalin-fixed, paraffin embedded tissue (FFPET) samples of 28 cases, from the discovery and validation cohorts respectively, indicates that FF and FFPET samples have distinct expression profiles. With the exception of 5 cases that have their FF and FFPET samples clustering closest to one another (indicated in red boxes around sample labels), all FF and FFPET samples fall within the 2 distinct clusters representing FF and FFPET samples. Expression profiles from FF and FFPET samples were normalized by the total number of mapped reads in each library and then quantile normalization was performed across all 56 samples. The value plotted in the heat map is a z-score of the expression of each miRNA in each sample.

Figure 2.10 Kaplan-Meier curves and strip charts of expression levels for the 6 miRNAs that were found to be associated with OS and PFS, independently of COO and IPI in both the discovery and validation cohorts (This figure spans 6 pages)
High and low expression groups are represented by blue and red respectively.
The dots in the strip chart represent miRNA expression of each sample, and samples are categorized by DLBCL subtype: ABC-DLBCL, GCB-DLBCL and (U)nclassified-DLBCL. Abundant expression of miR-324-5p and NOVELM00203M is associated with inferior outcomes, while abundant expression of miR-28-5p, miR-214-5p, miR-339-3p and miR-5586-5p is associated with superior outcomes.

Figure 2.11 NMF Clustering identifies two clusters of DLBCL patients with distinct miRNA and outcome profiles
I performed non-negative matrix factorization (NMF) clustering on miRNA expression profiles from 83 R-CHOP treated DLBCL patient samples. (a) NMF yielded two clusters of patients that had distinct differences in their outcomes. Patients in cluster 1 are indicated by dark gray bars, while patients in cluster 2 are indicated by light gray bars. Below the consensus matrix is a heat map showing the expression of miR-148a and miR-21 in each patient. (b) Kaplan-Meier curves showing overall survival and progression-free survival of patients in both clusters. Patients in cluster 1 exhibit inferior outcome compared to those in cluster 2. (c) To identify which miRNAs were characteristic of each cluster, I identified the differentially expressed miRNAs between the two clusters. The MA plot shows that miR-21 abundance was increased in cluster 2 patients, while miR-148a abundance was decreased in cluster 1 patients (Wilcoxon test Benjamini-Hochberg q-value <0.05). (d) Expression patterns of miR-148a and miR-21 are discontinuous. miRNA expression in DLBCL patient samples is indicated with squares, while expression in centroblast samples is indicated with diamonds.

Figure 2.12 Non-negative matrix factorization (NMF) solutions and metrics
(a) Consensus matrices from different NMF solutions for K=2:8. (b) Cophenetic scores from different solutions for K=2:8.
I selected the K=2 solution for further analysis as its consensus matrix appeared to be the cleanest and it had the highest cophenetic coefficient.

Figure 2.13 miR-148a and miR-21 expression levels are associated with survival
Kaplan-Meier plots show that miR-148a was positively correlated with poor outcome, and that miR-21 was negatively correlated with poor outcome in both the discovery cohort (a) and validation cohort (b).

Table 2-1 Clinical characteristics of the patients in the DLBCL discovery cohort

Demographic or Clinical Characteristic | Number of Patients (n=83) (24 patients)
Male - % | 61
Age - median (range) | 66 (16 to 92)
Stage – n (%) |
    I/II | 39 (47)
    III/IV | 40 (48)
    NA | 4
Lactate Dehydrogenase > ULN – n (%) |
    No | 31 (37)
    Yes | 35 (42)
    NA | 17
ECOG Performance Status – n (%) |
    0 to 1 | 57 (68)
    At least 2 | 22 (27)
    NA | 4
Extranodal Sites – n (%) |
    0 to 1 | 70 (84)
    Greater than 1 | 9 (11)
    NA | 4
Revised International Prognostic Index† – n (%) |
    Very Good and Good (0 to 2) | 52 (63)
    Poor (3 to 5) | 22 (27)
    NA | 9
Cell-of-Origin‡ – n (%) |
    GCB | 41 (49)
    ABC | 29 (35)
    Unclassified | 8 (10)
    NA | 5
BCL2 FISH Breakapart§ - n (%) |
    Positive | 20 (24)
    Negative | 39 (47)
    NA | 24
BCL6 FISH Breakapart§ - n (%) |
    Positive | 10 (12)
    Negative | 48 (58)
    NA | 25
MYC FISH Breakapart§ - n (%) |
    Positive | 6 (7)
    Negative | 52 (63)
    NA | 25
B Symptoms – n (%) |
    Absent | 50 (60)
    Present | 25 (30)
    NA | 8

NA indicates not available; ULN, upper limit of normal; ECOG, Eastern Cooperative Oncology Group; and GCB, germinal center B-cell like.
†The Revised International Prognostic Indicator (R-IPI) score ranges from 0-5, with higher scores indicating increased risk208,209.
‡Cell-of-origin (COO) was determined by RNA-seq gene expression profiling using the Wright et al.252 classifier.
§The presence of translocations was determined using commercial dual color “break-apart” probes from Abbott Molecular (Abbott Park, IL, US) on tissue microarray using the method described in Chin et al.253

Table 2-2 Genomic coordinates of expressed candidate novel miRNAs

Novel miRNA Name | miRNA ID | miRBase Assigned Name | Chromosome | Start | End | Strand | Transcript Position Start | Transcript Position End | Sequence
NOVELM00010 | NOVELM00010M | | 10 | 115463433 | 115463489 | - | 35 | 56 | cugcgaucuauugaaagucagc
NOVELM00010 | NOVELM00010S | | 10 | 115463433 | 115463489 | - | 1 | 18 | uugggguuucguauguag
NOVELM00015 | NOVELM00015M | hsa-mir-10392-5p | 11 | 64645994 | 64646081 | - | 1 | 23 | gcgcuucgacgggcugggcugug
NOVELM00015 | NOVELM00015S | hsa-mir-10392-3p | 11 | 64645994 | 64646081 | - | 64 | 87 | ccggccccgccucggcuccgcacc
NOVELM00058 | NOVELM00058M | hsa-mir-10393-3p | 15 | 45010041 | 45010091 | + | 29 | 50 | uuggucagauuugaacucuuca
NOVELM00058 | NOVELM00058S | hsa-mir-10393-5p | 15 | 45010041 | 45010091 | + | 1 | 25 | gagaauucucuuauccaacaucaac
NOVELM00060 | NOVELM00060M | | 16 | 33964144 | 33964202 | + | 1 | 21 | ccaguaagugcgggucauaag
NOVELM00060 | NOVELM00060S | | 16 | 33964144 | 33964202 | + | 36 | 58 | cugcccuauguacacacugcccg
NOVELM00078 | NOVELM00078S | | 18 | 77772983 | 77773066 | + | 1 | 23 | gcuucugggucgggguuucguac
NOVELM00079 | NOVELM00079M | | 18 | 43423106 | 43423167 | + | 40 | 61 | ucugaccucucugaccugcagc
NOVELM00088 | NOVELM00088M | hsa-mir-10394-3p | 19 | 58904730 | 58904813 | + | 61 | 83 | uugggcgcgccgggacugugaga
NOVELM00088 | NOVELM00088S | hsa-mir-10394-5p | 19 | 58904730 | 58904813 | + | 1 | 24 | cucugcagguccuggugaacgcca
NOVELM00089 | NOVELM00089M | hsa-mir-10395-3p | 19 | 12814414 | 12814478 | - | 44 | 64 | auguauucguacugucugaug
NOVELM00089 | NOVELM00089S | hsa-mir-10395-5p | 19 | 12814414 | 12814478 | - | 1 | 18 | gugauggagagcaauacc
NOVELM00113 | NOVELM00113M | | 1 | 564927 | 564972 | - | 25 | 45 | ggauggggugugauagguggc
NOVELM00113 | NOVELM00113S | | 1 | 564927 | 564972 | - | 1 | 24 | gcuuauuuagcugaccuuacuuua
NOVELM00146 | NOVELM00146M | | 20 | 26189310 | 26189351 | - | 23 | 41 | ggcucgucgccuacugugg
NOVELM00146 | NOVELM00146S | | 20 | 26189310 | 26189351 | - | 1 | 19 | caccgcggugguggccgag
NOVELM00149 | NOVELM00149M | | 20 | 26190183 | 26190253 | - | 1 | 18 | ugguugucgacuugcggg
NOVELM00149 | NOVELM00149S | | 20 | 26190183 | 26190253 | - | 52 | 70 | cgccggcccgucgugcugc
NOVELM00154 | NOVELM00154M | hsa-mir-10396a-3p | 21 | 9826449 | 9826498 | + | 29 | 49 | gggccccgggcccucgaccgg
NOVELM00154 | NOVELM00154S | hsa-mir-10396a-5p | 21 | 9826449 | 9826498 | + | 1 | 19 | cggcggggcucggagccgg
NOVELM00156 | NOVELM00156M | | 22 | 43011339 | 43011397 | + | 36 | 58 | gaugccugggaguugcgaucugc
NOVELM00156 | NOVELM00156S | | 22 | 43011339 | 43011397 | + | 1 | 19 | aaggucgcccucaagguga
NOVELM00173 | NOVELM00173M | | 2 | 199151734 | 199151804 | + | 52 | 70 | uggauuuuuggaaauagga
NOVELM00173 | NOVELM00173S | | 2 | 199151734 | 199151804 | + | 1 | 20 | ccuauugauucuauuucuuu
NOVELM00203 | NOVELM00203M | | 3 | 181540606 | 181540686 | - | 1 | 25 | gccugggaauaccgggugcuguagg
NOVELM00203 | NOVELM00203S | | 3 | 181540606 | 181540686 | - | 57 | 80 | ucccagcacuucgggaggccgagg
NOVELM00222 | NOVELM00222M | | 4 | 7584280 | 7584364 | + | 65 | 84 | gcccucgacacaaggguuug
NOVELM00222 | NOVELM00222S | | 4 | 7584280 | 7584364 | + | 1 | 21 | ugcuucuggcucgggguuuca
NOVELM00237 | NOVELM00237M | hsa-mir-10397-5p | 5 | 10402475 | 10402543 | + | 1 | 22 | uuccuugaccugaugcuguagg
NOVELM00237 | NOVELM00237S | hsa-mir-10397-3p | 5 | 10402475 | 10402543 | + | 45 | 68 | ucauagaucucgucgcuuacuggg
NOVELM00245 | NOVELM00245M | hsa-mir-10398-3p | 6 | 41701291 | 41701348 | - | 37 | 57 | gcccggagagcugggagccag
NOVELM00245 | NOVELM00245S | hsa-mir-10398-5p | 6 | 41701291 | 41701348 | - | 1 | 21 | uggcucccuucucuccgucug
NOVELM00263 | NOVELM00263M | hsa-mir-10399-5p | 7 | 138728845 | 138728903 | - | 1 | 21 | aauuacagauugucucagaga
NOVELM00263 | NOVELM00263S | hsa-mir-10399-3p | 7 | 138728845 | 138728903 | - | 38 | 58 | cucucggacaagcuguagguc
NOVELM00278 | NOVELM00278M | hsa-mir-10400-5p | 8 | 145687867 | 145687922 | - | 1 | 21 | cggcggcggcggcucugggcg
NOVELM00278 | NOVELM00278S | hsa-mir-10400-3p | 8 | 145687867 | 145687922 | - | 34 | 55 | cugggcucccggacgaggcggg
NOVELM00288 | NOVELM00288M | | 9 | 136856966 | 136857036 | - | 1 | 18 | uccgagacgcgaccucag
NOVELM00288 | NOVELM00288S | | 9 | 136856966 | 136857036 | - | 52 | 70 | gggucgcgccgccgccgcc
NOVELM00289 | NOVELM00289M | | 9 | 127649764 | 127649816 | - | 30 | 52 | ccgaggcgcgauuauugcuaauu
NOVELM00289 | NOVELM00289S | | 9 | 127649764 | 127649816 | - | 1 | 17 | guggcaguaucguagcc
NOVELM00290 | NOVELM00290M | | 9 | 97904679 | 97904745 | - | 1 | 17 | cgggugcuguaggcuuu
NOVELM00290 | NOVELM00290S | | 9 | 97904679 | 97904745 | - | 47 | 66 | uggcacacaccuguaauccc
NOVELM00294 | NOVELM00294M | hsa-mir-10401-3p | GL000220.1 | 149970 | 150026 | + | 36 | 56 | gaccucgccgucccgcccgcc
NOVELM00294 | NOVELM00294S | hsa-mir-10401-5p | GL000220.1 | 149970 | 150026 | + | 1 | 20 | gcgugugggaaggcgugggg
NOVELM00295 | NOVELM00295M | | GL000220.1 | 158352 | 158454 | + | 1 | 20 | cccccgcgggggcgcgccgg
NOVELM00295 | NOVELM00295S | | GL000220.1 | 158352 | 158454 | + | 84 | 102 | cccgcgcccccgccccggc
NOVELM00296 | NOVELM00296M | hsa-mir-10396b-3p | GL000220.1 | 108525 | 108576 | + | 30 | 51 | gggccccgggcccucgaccgga
NOVELM00296 | NOVELM00296S | hsa-mir-10396b-5p | GL000220.1 | 108525 | 108576 | + | 1 | 20 | ccggcggggcucggagccgg

Table 2-3 Clinical characteristics of the patients in the DLBCL validation cohort

Demographic or Clinical Characteristic | All Patients (n=112) (24 patients)
Male - % | 60
Age - median (range) | 65 (20 to 89)
Stage – n (%) |
    I/II | 51 (46)
    III/IV | 59 (53)
    NA | 2
Lactate Dehydrogenase – median (range) | 1.1 (0.57 to 17.1)
ECOG Performance Status – n (%) |
    0 to 1 | 75 (67)
    At least 2 | 35 (31)
    NA | 2
Extranodal Sites – n (%) |
    0 to 1 | 50 (44)
    Greater than 1 | 19 (17)
    NA | 43
Revised International Prognostic Index† – n (%) |
    Very Good and Good (0 to 2) | 69 (62)
    Poor (3 to 5) | 37 (33)
    NA | 6
Cell-of-Origin‡ – n (%) |
    GCB | 63 (56)
    ABC | 35 (31)
    Unclassified | 14 (13)
BCL2 FISH Breakapart§ - n (%) |
    Positive | 36 (32)
    Negative | 69 (61)
    NA | 7
BCL6 FISH Breakapart§ - n (%) |
    Positive | 25 (22)
    Negative | 80 (71)
    NA | 7
MYC FISH Breakapart§ - n (%) |
    Positive | 15 (13)
    Negative | 86 (77)
    NA | 11

NA indicates not available; ECOG, Eastern Cooperative Oncology Group; and GCB, germinal center B-cell like.
†The Revised International Prognostic Indicator (R-IPI) score ranges from 0-5, with higher scores indicating increased risk (1, 2).
‡Cell-of-origin (COO) was determined by the Lymph2Cx (3) classifier.
§The presence of translocations was determined using commercial dual color “break-apart” probes from Abbott Molecular (Abbott Park, IL, US) on tissue microarray using the method described in Chin et al. (4)

3 Comprehensive Transcriptome Sequence Analysis of Relapse and Induction Failure in Pediatric Acute Myeloid Leukemia Reveals Transcripts Associated With Treatment Resistance

3.1 Introduction

Acute myeloid leukemia (AML) is a hematopoietic malignancy that is characterized by genetic and epigenetic alterations in progenitor cells that lead to impairment of differentiation and promotion of clonal expansion254. Although AML is far more prevalent in adults, it is also diagnosed in young children and comprises almost 25% of pediatric leukemias176. The first course of treatment administered to pediatric AML patients consists of intensive cytarabine- and anthracycline-based induction chemotherapy. At the end of induction chemotherapy, a minimal residual disease (MRD) score, which is based on the total leukemic blast count, is determined for each patient. A patient is considered to have achieved complete response (CR), or to be disease free, if their MRD score is less than 5%140. The remaining patients, who are refractory to induction chemotherapy, are classified as cases that experience induction failure (IF) (Figure 3.1). While a majority of patients (80-90%) do achieve CR after induction chemotherapy, about 40% of these cases subsequently suffer from relapse141 (Figure 3.1); and when relapse does occur, even with hematopoietic stem cell transplantation, outcomes are generally poor, with 5-year overall survival (OS) rates between 21–33%184. Therefore, the identification of additional biomarkers and therapeutic targets may offer the possibility of improved tools for clinical management of pediatric AML. AML is a heterogeneous disease and several subtypes of the disease have been identified.
Two main systems of AML classification are 1) the French-American-British (FAB) classification, which comprises 8 subtypes (M0-M7) that are distinguished by cell-of-origin and maturity of the leukemia cells255, and 2) the newer World Health Organization (WHO) classification, which is based on the presence of cytogenetic abnormalities256. Most pediatric AML cases separate into distinct cytogenetic categories based on specific chromosomal alterations such as t(8;21), inv(16), t(15;17) or rearrangements involving the MLL locus176. Somatic point mutations in genes involved in hematopoiesis, such as FLT3, NPM1 and CEBPA, have been associated with the pathogenesis and outcome of pediatric AML and have been incorporated in clinical trials as prognostic markers176. Efforts have also associated a flow-cytometry-based signature76 and the presence of NUP98/NSD1 fusions and FLT3 internal tandem duplications239 with treatment failure in pediatric AML. However, although many somatic karyotypic and molecular alterations have been identified in pediatric AML, the majority of them do not identify a specific target or distinct pathway that can be readily exploited for therapeutic intervention254. Thus, pediatric AML poses significant therapeutic challenges. AML has one of the lowest mutation rates of all cancers257. Moreover, pediatric cancers tend to have lower mutation rates when compared with adult cancers36, suggesting that genomic alterations are not sufficient to explain the complexity and somatic heterogeneity of AML in its entirety258. This points to a need to consider transcriptome and epigenome dysregulation along with genomic alterations in order to gain an in-depth understanding of AML biology. Probe-based methods have been used to profile pediatric AML transcriptomes, including miRNA177,178 and mRNA182,183 expression.
These efforts demonstrated that miR-100, miR-125b, miR-335, miR-146a and miR-99a are abundantly expressed in pediatric AML when compared to pediatric acute lymphoblastic leukemia (ALL) or adult AML178, and that each pediatric FAB subtype has a distinct miRNA expression pattern177. mRNA expression patterns associated with patient prognosis179,180,182, refractory and relapse disease140,259, and particular FAB subtypes185 were also reported. Sequence-based transcriptome profiling, using mRNA-seq and miRNA-seq, provides mRNA isoform specific expression profiles and digital read counts of miRNA expression. Leveraging the strengths of sequence-based methods, TCGA has provided profiles of genomes, transcriptomes and epigenomes of 200 adult AML cases66. However, despite confirmation of the prevalence and prognostic utility of genetic events in adult AML (i.e., mutations in DNMT3A, IDH1, IDH2), such events appear to be rare or absent in pediatric AML260, suggesting a need to directly profile pediatric AML. To the best of my knowledge, there has not yet been a sequence-based effort to comprehensively profile the pediatric AML transcriptome to report on miRNA and mRNA expression patterns and interactions associated with refractory and relapse disease.

Here, I provide a detailed analysis of the pediatric AML transcriptome, including miRNA and mRNA expression patterns, and miRNA:mRNA interactions that are characteristic of the disease. Further, to discover genes and pathways that drive aggressive disease, I catalogued gene expression changes enriched in relapse and refractory samples. In particular, I observed that reduced expression of ribosomal genes (i.e., RPL28, RPL10, MRPS24) was associated with inferior outcomes. I also revealed the association of 4 members of the miR-106a-363 cluster with inferior OS and event-free survival (EFS) in patients.
Integrative miRNA:mRNA analysis and luciferase reporter assays further demonstrated that candidate targets of miR-106a-363 are involved in oxidative phosphorylation, a process that is suppressed in treatment-resistant leukemic cells206.

3.2 Results

3.2.1 Sequence analysis of pediatric AML samples

Transcriptome sequencing yields digital read counts of mRNA and miRNA transcripts and allows for the identification of dysregulated miRNA/mRNA expression patterns. To characterize the transcriptome of pediatric AML, I obtained and analyzed miRNA-seq and mRNA-seq data from AML patient samples. A total of 676 pediatric AML patients were considered for this study, and their clinical characteristics are listed in Table 3-1. My discovery set consisted of miRNA-seq from 259 primary, 22 refractory (16 with matched primary sample) and 37 relapse (26 with matched primary sample) samples, and mRNA-seq from 158 primary, 12 refractory (10 with matched primary sample) and 47 relapse (35 with matched primary sample) samples. Because there were sufficient matched samples, a matched-sample analysis was also performed when comparing mRNA expression in relapse and primary samples. I confirmed my survival analyses on a validation set which consisted of miRNA-seq from 378 primary samples and mRNA-seq from 95 primary samples (Figure 3.1A).

3.2.2 Unsupervised clustering of mRNA transcript expression reveals 5 sub-groups correlated with survival differences

Pediatric AML has been treated uniformly despite significant genetic heterogeneity between patients261. As such, the identification of clinically significant heterogeneity within AML may play an important role in future therapeutic strategies. To determine the extent of gene expression heterogeneity in pediatric AML, I performed unsupervised NMF consensus clustering using mRNA expression profiles of 158 primary samples.
This analysis returned 5 patient sub-groups (Figure 3.2A; Figure 3.3; Methods), each with distinct gene expression profiles (Figure 3.2B; Appendix 3A). Sub-group 1, which had an enrichment of trisomy 8 cases (Figure 3.2C; 5/22; 22.7%) and FLT3-ITD positive cases (Figure 3.2C; 6/22; 27.3%), was characterized by abundant expression of hemoglobin genes (GO:0015671) such as HBB, HBA2, HBD, HBG2, HBG1 and AHSP, suggesting that there are relatively more red blood cell progenitors in these patients. Accordingly, this sub-group also had the lowest white blood cell (WBC) count amongst all 5 groups (Sub-group 1 mean: 2.94E4 WBC/µL; Sub-groups 2-4 means: 5.61E4 – 1.30E5 WBC/µL). This is in agreement with reports of low WBC counts in trisomy 8 AML cases262. Sub-group 2 was characterized by abundant expression of genes involved in the inhibition of apoptosis (GO:0043066) such as AZU1, IER3, CEBPB, BNIP3L, BCL2A1, PIM1, ANXA1, PRDX2, CAT and SOD2. This suggests that dysregulated gene expression in this sub-group could be resulting in an evasion of cell death. Consistent with previous findings that gene expression is tightly correlated with FAB subtype in pediatric AML185, 2 of the 5 sub-groups showed specific enrichment for particular FAB subtypes. Sub-group 3, which had an enrichment for FAB M2 cases (Figure 3.2C; 23/69; 33.3%), was characterized by abundant expression of RNA processing genes (GO:0006396) such as SF3B1, RPL7, SF1, RPL11, RBM39 and HNRNPH1; while Sub-group 4, which had an enrichment for FAB M5 cases (Figure 3.2C; 19/30; 63.3%) and MLL rearranged cases (Figure 3.2C; 14/30; 46.7%), was characterized by abundant expression of immune response (GO:0006955) genes such as CCL3, LST1, BST2, NCF1, LY86, IL1RN, CNPY3, CCL5, WAS, CFP, CYBA, FCN1, FCER1G, CFD, IFI6 and CTSG.
Sub-group 5, which was enriched for inv(16) cases (Figure 3.2C; 10/23; 43.5%) and FLT3 mutated cases (Figure 3.2C; 6/23; 26.1%), was characterized by abundant expression of genes that encode ribosomal proteins (GO:0006414) such as RPL18, RPL35, RPLP2, RPL36, RPL27, RPL38, RPL28, RPL29, RPL30, RPS19, RPS28, RPL32, RPS16, RPS29, RPL31, RPLP1, RPL37A, RPS11, RPS21. Further, Sub-group 5 is distinguished from the others by superior outcome (OS and EFS log-rank p-value<0.05; Figure 3.2D). In agreement with this, Sub-group 5 is also enriched for low risk cases (Figure 3.2C; 18/23; 78.3%). To confirm the robustness of the NMF clustering solution, I also analyzed the same data using 2 different methods: 1) hierarchical clustering and 2) principal component analysis. While these methods did not clearly distinguish Sub-groups 1-4 from one another, they did emphasize the distinctiveness of Sub-group 5 compared to the other 4 sub-groups (Figure 3.4). To identify miRNAs that are differentially expressed between the 5 sub-groups, I performed Wilcoxon tests, comparing cases in each sub-group with cases in all other sub-groups combined. This revealed that there were 11 miRNAs that were more abundantly expressed and 30 miRNAs that were less abundantly expressed in Sub-group 2; 6 miRNAs that were more abundantly expressed and 2 miRNAs that were less abundantly expressed in Sub-group 3; 50 miRNAs that were more abundantly expressed and 36 miRNAs that were less abundantly expressed in Sub-group 4; and 26 miRNAs that were more abundantly expressed and 25 miRNAs that were less abundantly expressed in Sub-group 5. These miRNAs are listed in Appendix 3B. There were no differentially expressed miRNAs when comparing Sub-group 1 with the other sub-groups.

3.2.3 mRNA transcripts associated with treatment failure

Relapse and refractory disease are the major causes of treatment failure in pediatric AML.
To identify mRNA transcripts associated with relapse, I performed differential expression analysis between primary (n=158) and relapse (n=47) samples. This revealed that 23 mRNA transcripts were significantly more abundant in relapse samples, while 46 mRNA transcripts were significantly more abundant in primary samples (Wilcoxon test q-value<0.05, log2 fold change>1; Figure 3.5A; Appendix 3A). GO Term and KEGG pathway enrichment analysis revealed that mRNA transcripts that were more abundant in relapse samples are involved in the mitotic (M) phase of the cell cycle (CDK1, KIF11, NUSAP1), immune response (CCL2, IGJ, VPREB1, IGLL1, CD79B, CD79A), JAK-STAT signaling (CCL2, SOCS2), and mRNA splicing (SF1). Of the mRNA transcripts that were less abundant in relapse samples, 22/46 (48%) encode histone proteins that are required for the DNA synthesis (S) phase of the cell cycle. Of all differentially expressed transcripts, only Ribosomal protein L28 (RPL28 - NM_001136136) was also identified as significantly differentially expressed in matched primary and relapse sample comparisons (Paired Wilcoxon test q-value<0.05, log2 fold change>1). Reduced RPL28 mRNA expression has been shown to be a result of increased hypoxic conditions in HeLaS3 cells263, suggesting that relapsed AML could be more hypoxic than primary AML. Of note, only 1 (NM_001136136) of the 5 RPL28 mRNA transcript isoforms was differentially expressed between primary and relapse samples, while the other 4 mRNA transcript isoforms had relatively constant expression patterns across samples. Differential RPL28 mRNA isoform expression suggests that mRNA splicing of RPL28 transcripts could be altered between primary and relapse samples (Figure 3.5B). NM_001136136 encodes an additional 18 amino acids (EFCLVWARERPLSRVWEL) that are not present in the other 4 isoforms.
Using BLASTP, I found that the additional amino acid sequence matches the SCAN oligomerization domain of the Zinc finger protein 500 (O60304) with 72.2% identity. Since the SCAN oligomerization domain is involved in mediating protein complex formation264, its presence in NM_001136136 suggests that, unlike other RPL28 isoforms, NM_001136136 may encode a protein that has additional protein complex forming capabilities. To identify mRNA transcripts that were associated with refractory AML, I performed differential expression analysis between primary (n=19) and refractory (n=12) samples (Appendix 3A). No mRNA transcripts were significantly differentially expressed after multiple test correction (q-value < 0.05). I next considered the 200 most overexpressed and 200 most underexpressed (based on fold change) mRNA transcripts for GO Term and KEGG pathway enrichment analyses. These revealed that mRNA transcripts that were more abundant in refractory samples are involved in chemokine signaling (CCL14, CCL2, ARRB2, PPBP, FGR, RAP1B, GNG11, PF4, CCL5, CXCL12, CCL7), while mRNA transcripts that were less abundant in refractory samples are involved in glycolysis (ALDOA, LDHA, PFKL, PKM2, ENO1), oxidative phosphorylation (ATP5D, SDHB, NDUFB4, NDUFB6, COX7A2L, ATP5J), and drug metabolism (MGST3, GSTK1, MGST1, GSTP1), or encode ribosomal proteins (RPL17, RPL41, RPLP1, RPS4Y1, RPL11, RPL38, RPS24).

3.2.4 mRNA transcripts associated with patient survival

After initial induction therapy, MRD and molecular assessments are used to inform second induction cycle or consolidation therapy intensification for pediatric AML patients. Although MRD is a robust predictor of treatment response and relative risk of relapse, it is usually based on a measurement obtained between days 22 and 28 of induction therapy140. Hence, the availability of a prognostic marker that could be used at diagnosis might allow early application of alternative therapies140.
To identify mRNA transcripts associated with patient outcome, I performed Cox proportional hazards (PH) analyses using mRNA transcript expression profiles from my discovery cohort of 158 primary samples. This analysis revealed that 41 mRNA transcripts were associated with overall survival (OS) and 1 mRNA transcript was associated with event-free survival (EFS; Univariate Cox PH p-value<0.05, q-value<0.1). To validate my survival analyses, I performed mRNA-seq analysis on an additional 95 primary samples (mRNA validation cohort). This confirmed the association of the expression of 10 mRNA transcripts with OS (Inferior OS: NCKAP1L, ACTN1; Superior OS: RPL10, CD99, UXT, CUTA, C6orf48, RBM3, MRPS24, HERPUD2), and expression of 1 mRNA transcript (TM9SF2) with EFS (Figure 3.6). Kaplan-Meier plots for mRNA transcripts associated with OS or EFS are displayed in Figure 3.7. ACTN1 encodes a cytoskeletal bundling protein that is thought to anchor actin to intracellular structures265, CD99 triggers the reorganization of the actin cytoskeleton266 and UXT is also involved in cytoskeleton organization267. Therefore, the associations of ACTN1, CD99 and UXT expression with survival suggest that cytoskeletal processes may be relevant to pediatric AML outcomes, in agreement with a previous study which linked actin proteins with refractory disease in adult AML268. Of particular interest is the association of abundant expression of the ribosomal genes RPL10 and MRPS24 with superior survival. Consistent with my NMF clustering analysis that revealed that abundant expression of ribosomal genes was associated with superior outcomes, these observations indicate that decreased protein translation may play a role in treatment resistance.
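
The two-threshold screen described in this section (p-value<0.05 and q-value<0.1) can be sketched as follows. This is a minimal illustration only: it assumes the q-values are Benjamini-Hochberg adjusted p-values, which this section does not state, and it starts from p-values that have already been computed by a univariate Cox PH fit. The function names and the example transcript labels are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


def select_candidates(p_by_transcript, p_cut=0.05, q_cut=0.1):
    """Keep transcripts passing both the p-value and q-value cutoffs.

    p_by_transcript maps a transcript label to its univariate p-value
    (assumed to come from a Cox PH model fitted elsewhere).
    """
    names = list(p_by_transcript)
    q_values = benjamini_hochberg([p_by_transcript[n] for n in names])
    return [n for n, q in zip(names, q_values)
            if p_by_transcript[n] < p_cut and q < q_cut]


# Hypothetical p-values for four transcripts.
example = {"A": 0.01, "B": 0.04, "C": 0.03, "D": 0.5}
selected = select_candidates(example)
```

Applying the q-value cutoff in addition to the p-value cutoff limits the expected proportion of false discoveries when many transcripts are tested at once.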
3.2.5 Unsupervised clustering of miRNA expression profiles reveals 2 sub-groups characterized by specific cytogenetic alterations

To determine the extent of heterogeneity in miRNA expression across primary pediatric AML samples, I used miRNA expression profiles of 259 primary samples to perform unsupervised non-negative matrix factorization (NMF) consensus clustering. This identified an optimum of two sub-groups of patients (sub-group 1 n=65; sub-group 2 n=194; Figure 3.8A; Figure 3.9) that were characterized by distinct miRNA expression patterns and were associated with particular genomic aberrations. Of note, these two groups did not differ based on any clinical characteristics, including age, FAB subtype, MRD or CR status (Fisher’s exact test p-value>0.05), nor did they have significant survival differences (Figure 3.8D). Sub-group 1 was characterized by an abundance of 10 miRNAs, while sub-group 2 was characterized by an abundance of 28 miRNAs (Figure 3.8B; Appendix 3B). In addition, sub-group 1 cases were enriched in FLT3-ITD (46.2% vs 7.73%) and NPM1 mutation (26.2% vs 1.03%), while sub-group 2 cases were enriched in t(8;21) (18% vs 0%), inv(16) (19.6% vs 0%), MLL rearrangements (18% vs 0%) and CEBPA mutations (7.73% vs 0%) (Fisher’s exact test p-value ≤0.01; Figure 3.8C). These findings were consistent with reports of associations of genomic variants with miRNA expression in AML177,269: 1) NPM1 mutation with abundant expression of miR-196b and miR-10a, and reduced expression of miR-128, miR-126 and miR-130a; 2) CEBPA mutation with reduced expression of miR-335, miR-181a and miR-181b; and 3) t(8;21) with abundant expression of miR-133a. There was no significant overlap between miRNA NMF sub-groups (k=2) and mRNA NMF sub-groups (k=5) (Fisher’s Exact Test p-value > 0.01).

The robust association of miRNA expression with cytogenetic alterations prompted me to further investigate miRNA expression patterns associated with cytogenetic alterations.
Of the 24 cytogenetic alterations available for analysis, 9 alterations (t(8;21), t(6;11)(q27;q23), t(10;11)(p11.2;q23), t(9;11)(p22;q23), inv(16), MLL rearrangements, FLT3-ITD, NPM1 mutation, CEBPA mutation) were associated with increased or decreased expression of miRNAs (Wilcoxon test q-value<0.05; log2 fold change>1; Appendix 3B).

3.2.6 miRNA expression in treatment resistant patient samples

Although I found somatic genomic alterations that were associated with the expression of specific miRNAs in pediatric AML, none of these genomic aberrations or corresponding variations in miRNA expression patterns were associated with refractory or relapse disease. Therefore, I sought to identify specific miRNAs that were associated with treatment failure. To do so, I first compared primary (pre-treatment) samples of patients who achieved CR to primary (pre-treatment) samples of patients who experienced induction failure (i.e. treatment refractory). This analysis revealed that 15 miRNAs were more abundant in primary samples of CR patients, while 61 miRNAs were more abundant in primary samples of refractory patients (Wilcoxon test q-value <0.05; Figure 3.11A; Appendix 3B). I then compared miRNA expression between refractory (post-treatment) samples and primary (pre-treatment) samples, and this revealed that 193 miRNAs were more abundant in refractory samples, while 30 miRNAs were more abundant in primary samples (Figure 3.11B; Appendix 3B). miR-199a was the most significantly overexpressed miRNA in refractory samples (log2 Fold Change: 3.34), and its abundant expression has previously been associated with poor outcome in AML patients270. Finally, I compared miRNA expression between relapse (post-treatment) samples and primary (pre-treatment) samples, and this revealed that 113 miRNAs were more abundant in relapse samples, while 70 miRNAs were more abundant in primary samples (Figure 3.11C; Appendix 3B).
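
Each of the comparisons above rests on a Wilcoxon rank-sum test together with a log2 fold change. The sketch below shows one way such a two-group comparison could be computed for a single miRNA. It is an illustration under stated assumptions, not the procedure used in this chapter: the p-value uses a normal approximation without tie or continuity correction, the fold change is taken between group means, and the pseudocount of 1 is a hypothetical choice made here to avoid division by zero.

```python
import math


def midranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def rank_sum_test(group_a, group_b):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test.

    Returns (U statistic for group_a, approximate p-value). The p-value
    comes from a normal approximation with no tie or continuity correction.
    """
    n_a, n_b = len(group_a), len(group_b)
    ranks = midranks(list(group_a) + list(group_b))
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_a, p_value


def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """log2 ratio of mean expression in group_a over group_b."""
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    return math.log2((mean_a + pseudocount) / (mean_b + pseudocount))
```

In a full analysis this test would be repeated for every miRNA and the resulting p-values adjusted for multiple testing before applying a q-value cutoff.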
Interestingly, 12 miRNAs (hsa-miR-103a-2-5p, hsa-miR-15a-5p, hsa-miR-29b-1-5p, hsa-miR-29b-2-5p, hsa-miR-30c-5p, hsa-miR-324-5p, hsa-miR-500a-5p, hsa-miR-500b-5p, hsa-miR-106a-3p, hsa-miR-106a-5p, hsa-miR-20b-5p, hsa-miR-363-3p) were consistently abundantly expressed in treatment resistant contexts. They were more abundant in refractory samples than in primary samples, more abundant in relapse samples than in primary samples, and more abundant in primary samples of patients who subsequently experienced refractory disease when compared with primary samples of patients who achieved CR (Wilcoxon test q-value <0.05; Figure 3.11D,E). This result is compatible with the notion that these miRNAs could be markers of relapse and refractory disease, and that a study of their expression and function could reveal biology important to treatment resistance. Some of these miRNAs have been reported to be dysregulated in AML and are tumour suppressors: miR-29b is dysregulated in adult AML samples271 and inhibits cell proliferation and induces apoptosis in AML cell lines272 and in xenograft leukemia models273; miR-15a inhibits proliferation of leukemic cells by down-regulating WT1 protein levels274 and enhances retinoic acid-mediated cell differentiation in AML cell lines275; and miR-30c is down-regulated in AML and thus reduces myeloid differentiation276. The abundant expression of tumour-suppressor miRNAs suggests that treatment resistant cells could have a slow-growing and quiescent phenotype. Of particular interest were the 4 miRNAs (miR-106a-3p, miR-106a-5p, miR-20b-5p, miR-363-3p) of the same polycistronic miR-106a-363 cluster, which were significantly over-expressed (Wilcoxon test q-value <0.05; log2 fold change >1) in treatment resistant contexts. Since their role in leukemogenesis had not yet been reported, I studied these miRNAs in more detail (Sections 3.2.7 - 3.2.10).
3.2.7 Integrative analysis reveals putative miRNA:mRNA interactions in pediatric AML

To study candidate targets of expressed miRNAs in pediatric AML, I performed an integrative miRNA:mRNA analysis277 of matched miRNA-seq and mRNA-seq data for each patient (Figure 3.12). I identified miRNA:mRNA pairs with anti-correlated expression profiles, indicated by a negative Spearman correlation coefficient that exceeded a null distribution (617,403 miRNA:mRNA pairs; Figure 3.13). These pairs were then filtered to retain those with miRNA binding sites predicted by both TargetScan and miRanda in the 5’-UTR, CDS or 3’-UTR. 206,809 miRNA:mRNA pairs remained and are referred to as putative miRNA:mRNA interactions. In my analysis, there were 368 miRNAs with at least 1 putative target, where each of these miRNAs had an average of 566 (range: 1-3397) putative targets. I note that this analysis is biased towards the proportion of miRNA:mRNA interactions that might be linked to mRNA instability and degradation. Of note, my analysis detected the previously reported interaction between miR-139 and EIF4G2 in AML cells278, indicating that my analysis is sensitive to bona fide interactions. To identify the candidate biological functions of the putative targets, I performed pathway enrichment analyses on the collection of putative targets of various groups of miRNAs (Methods). As a large number of putative miRNA:mRNA interactions resulted from my analysis (206,809 miRNA:mRNA interactions), only miRNA:mRNA target interactions with the strongest anti-correlations (Spearman correlation coefficient <-0.3; 71,040 miRNA:mRNA interactions) were considered further. Significantly enriched pathways (Fisher’s exact test q-values < 0.05) are listed in Figure 3.14A-D.
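
The anti-correlation filter described above can be sketched as follows. This is a simplified illustration under stated assumptions: the comparison against a null distribution is omitted, `predicted_pairs` is a hypothetical stand-in for the set of pairs supported by both target prediction tools, and the correlation cutoff of -0.3 is applied in the same pass rather than as a later step. Expression vectors are assumed to be non-constant and ordered by the same patients.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)


def candidate_interactions(mirna_expr, mrna_expr, predicted_pairs, rho_cut=-0.3):
    """Return (miRNA, mRNA, rho) for predicted pairs with rho below rho_cut.

    mirna_expr and mrna_expr map names to per-patient expression lists.
    """
    hits = []
    for mirna, mrna in predicted_pairs:
        rho = spearman(mirna_expr[mirna], mrna_expr[mrna])
        if rho < rho_cut:
            hits.append((mirna, mrna, rho))
    return hits


# Hypothetical toy data: one miRNA and two genes across four patients.
toy_mirna = {"miR-x": [1, 2, 3, 4]}
toy_mrna = {"GENE1": [9, 7, 5, 3], "GENE2": [1, 2, 3, 4]}
toy_pairs = [("miR-x", "GENE1"), ("miR-x", "GENE2")]
toy_hits = candidate_interactions(toy_mirna, toy_mrna, toy_pairs)
```

A rank-based correlation is a natural fit here because it does not assume that miRNA and mRNA abundances are linearly related or normally distributed.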
KEGG pathways that were significantly enriched by both the putative targets of miRNAs that were abundant in refractory samples and the putative targets of miRNAs that were abundant in relapse samples included: genetic information processing (“Ribosome”, “RNA polymerase”), cellular signaling (“VEGF signaling”, “Cytokine-cytokine receptor interaction”, “Neuroactive ligand-receptor interaction”), and metabolism (“Oxidative phosphorylation”, “Metabolic pathways”, “Purine metabolism”, “Pyrimidine metabolism”). Of these pathways, only “Oxidative phosphorylation” was amongst the top 5 most enriched pathways in both analyses. This indicated that abundant expression of miRNAs could be inhibiting the expression of “Oxidative phosphorylation” target genes in refractory and relapse samples.

3.2.8 miRNAs associated with patient survival

To identify miRNAs that had expression patterns associated with survival, I performed Cox proportional hazards analyses on the expression of each miRNA across 259 primary samples (miRNA Discovery Cohort). There were 26 miRNAs associated with overall survival (OS) and 11 miRNAs associated with event-free survival (EFS) (Univariate Cox PH p-value<0.05; q-value < 0.1). To validate my survival analyses, I performed miRNA-seq and survival analyses on an additional 378 primary samples (miRNA Validation Cohort). This confirmed the association of the expression of 16 miRNAs with OS and expression of 8 miRNAs with EFS (Univariate Cox PH p-value<0.05; Figure 3.15A). Of note, 6 miRNAs were associated with both OS and EFS in both the Discovery and Validation cohorts: miR-378c and miR-181c-3p were associated with superior outcome, while 4 members of the miR-106a-363 cluster (miR-106a-5p, miR-106a-3p, miR-20b-3p, miR-363-3p) were associated with inferior OS and EFS. The Kaplan-Meier (KM) plots for miR-106a-5p are shown in Figure 3.15B,C, and plots for all miRNAs associated with either OS or EFS are shown in Figure 3.16.
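The q-values quoted in this section are Benjamini-Hochberg corrected p-values (Section 3.4.10). As an illustration of how the two thresholds combine, the following minimal sketch computes Benjamini-Hochberg adjusted p-values and applies the p-value <0.05 and q-value <0.1 criteria. The function names and the example p-values are hypothetical and are not results from this study.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each q-value
    # is never larger than the q-value of any less significant test.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


def significant(pvalues, p_cutoff=0.05, q_cutoff=0.1):
    """Flag tests that pass both the raw p-value and the q-value cutoffs."""
    qvalues = bh_qvalues(pvalues)
    return [p < p_cutoff and q < q_cutoff for p, q in zip(pvalues, qvalues)]


# Hypothetical per-miRNA Cox PH p-values.
example_p = [0.01, 0.04, 0.03, 0.20]
example_q = bh_qvalues(example_p)
example_flags = significant(example_p)
```

Applying the correction across all tested miRNAs, rather than only those with p-value <0.05, is what makes the q-value an estimate of the false discovery rate for the whole screen.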
3.2.9 Targets of miR-106a-363 are involved in oxidative phosphorylation

The miR-106a-363 cluster consists of 6 polycistronic miRNA species (miR-106a, miR-18b, miR-20b, miR-19b-2, miR-92a-2, miR-363). Four miRNAs from this cluster (miR-106a-3p, miR-106a-5p, miR-20b-5p, miR-363-3p) were consistently abundantly expressed in treatment resistant contexts (Figure 3.11). In addition, miR-106a-363 expression appeared to be independent of recurrent cytogenetic alterations (Figure 3.17). While abundant miR-106a-363 expression has been associated with drug resistance in glioma279, ovarian cancer280 and gastric cancer281, it has not yet been implicated in treatment resistance in AML. My integrative miRNA:mRNA analysis revealed that several putative targets of the miR-106a-363 cluster were involved in “Oxidative phosphorylation”, the pathway that was predicted to be consistently dysregulated in both refractory and relapse samples. These interactions are: miR-106a: ATP5J2-PTCD1 /ATP5S /NDUFA10 /NDUFC2 /UQCRB, miR-18b: ATP6V1A /ATP8A1 /COX11 /NDUFA4 /NDUFS1, miR-20b: ATP5J2-PTCD1 /ATP5S /ATP8B1 /NDUFA10 /NDUFA4 /NDUFAF3 /NDUFB11 /NDUFC2 /SYNJ2BP-COX16 /UQCRB, and miR-363: ATP2A1 /ATP5G3 /ATP8B1 /ATPAF2 /NDUFB11 /NDUFC2 /SDHB /UQCRB. I interpreted these results to indicate that miR-106a-363 may be modulating treatment resistance by regulating the expression of genes involved in energy metabolism in leukemic cells. This is consistent with a previous observation of oxidative phosphorylation being reduced in chemotherapy resistant leukemic stem cells206. That study used reactive oxygen species (ROS) as a measure of oxidative phosphorylation levels and isolated 2 cell populations from primary AML samples: one characterized by high ROS levels and the other characterized by low ROS levels. The authors noted that the low ROS cell population was preferentially enriched for chemo-resistant G0 quiescent leukemic stem cells.
3.2.10 Luciferase reporter assays confirm potential miR-106a interactions

To test whether miR-106a can act on predicted miRNA binding sites of genes involved in oxidative phosphorylation, luciferase constructs containing the predicted binding sites of a number of genes involved in oxidative phosphorylation (ATP5J2-PTCD1 (3 sites), ATP5S, NDUFA10 (2 sites), NDUFC2/UQCRB) were generated and luciferase assays were performed on them. The overexpression of miR-106a-5p was able to inhibit the luciferase activity of constructs containing the predicted miR-106a-5p binding sites in a dose-dependent manner, and the mismatch of the binding site resulted in reduced inhibition of luciferase activity (Figure 3.18). These data indicated that miR-106a expression can modulate expression of mRNA targets that harbor any one of these 6 predicted binding sites, and indicated that miR-106a may modulate the expression of the oxidative phosphorylation genes that harbor these binding sites.

3.3 Discussion

Previous studies exploring miRNA177,178 and mRNA179-183,259 expression in pediatric AML were limited to selected genes. For the first time I describe deep and comprehensive profiling of primary, relapse and refractory pediatric AML transcriptomes (summarized in Figure 3.1B). Deep sequencing of mRNA and miRNA transcripts provided me with a unique opportunity to catalog the repertoire of miRNA and mRNA expression and study the possible consequences of miRNA dysregulation. Gene expression profiling has been used to reveal sub-groups of clinical relevance in pediatric ALL282 and in adult AML66. In this study, unsupervised clustering of miRNA and mRNA expression profiles illustrated the gene expression heterogeneity in primary samples of the pediatric AML cohort. In particular, I observed that the mRNA sub-group that exhibited abundant expression of genes encoding ribosomal proteins was associated with superior outcome.
My survival analyses revealed that reduced expression of RPL10 (ribosomal protein L10) and MRPS24 (mitochondrial ribosomal protein S24) was associated with inferior OS. I also observed that the expression of 1 isoform of RPL28 (ribosomal protein L28) was reduced at relapse. These findings are in agreement with studies in adult AML which have demonstrated that ribosomal proteins sustain the morphology and function of leukemic blasts283, and that ribosomal stress as a result of reduced RPS14 contributes to erythroid failure through the upregulation of the p53 pathway284. Moreover, ribosomal proteins are also frequently mutated in Diamond-Blackfan anemia285, a congenital disease with risk of AML development. In addition, increased cell proliferation is made possible by an up-regulation of ribosomal biogenesis and thereby protein synthesis286. Thus, my observation of reduced expression of ribosomal genes suggests that treatment resistant cells are slow growing and may be evading therapy directed at rapidly proliferating cells. miRNAs can regulate the expression of hundreds of different target mRNA transcripts and have been reported to play oncogenic roles in various pediatric leukemias178. In pediatric AML, miR-99a187 was shown to be oncogenic, and low expression of miR-663188 was observed in AML samples. Unsupervised clustering of miRNA expression profiles revealed sub-groups that are highly correlated with the presence of specific cytogenetic aberrations. This is in agreement with previous miRNA profiling efforts which have associated miRNA expression profiles with specific cytogenetic aberrations in adult AML269, and one study has identified miR-9 as a tumor suppressor in pediatric AML with t(8;21)287. My study identified several additional associations of miRNA expression with specific cytogenetic alterations (listed in Appendix 3B).
These new associations included those involving established indicators of outcome (t(8;12), CEBPA-mut, FLT3-ITD), suggesting that miRNAs could also be valuable diagnostic and prognostic tools.

Accordingly, my survival analysis in 637 patients revealed that abundant expression of miR-378c and miR-181c was associated with superior OS and EFS, while abundant expression of 4 members of the miR-106a-363 cluster (miR-106a-5p, miR-106a-3p, miR-20b-5p, miR-363-3p) was associated with inferior OS and EFS. The miR-106a-363 miRNA cluster, a paralog of the oncogenic miR-17-92 cluster288, consists of 6 miRNAs: miR-106a, miR-18b, miR-20b, miR-19b-2, miR-92a-2, and miR-363. Although less well-characterized, miR-106a-363 has displayed oncogenic potential in T-cell leukemia289, and is associated with increased proliferation in mantle cell lymphoma290. However, unlike the miR-17-92 cluster, the miR-106a-363 cluster is dispensable during mouse embryonic development291, suggesting that it might be involved in the regulation of a different set of functions and pathways. My analysis showed that members of the miR-106a-363 cluster were not only robustly associated with OS and EFS, but were also more abundant in refractory and relapse samples and more abundant in primary samples of refractory patients when compared to CR patients. This expression pattern was observed across FAB subtypes and miRNA NMF sub-groups, suggesting a common mechanism of miRNA-mediated repression in pediatric AML. Further, my integrative miRNA:mRNA analysis and luciferase reporter assays indicated that miR-106a-5p target genes may be involved in oxidative phosphorylation. Oxidative phosphorylation is reduced in treatment resistant cell line populations and consequently results in a quiescent cell state206. Such quiescent cells may evade cytarabine and anthracycline based therapies, which have mechanisms of action that are selective for rapidly dividing and proliferating cells292.
Thus, I infer that abundant expression of miR-106a-363 could be contributing to treatment resistance by repressing energy metabolism. This also suggests that miR-106a-363 could serve as a diagnostic marker to identify AML cases with low levels of energy metabolism, perhaps identifying cases that may be better served with alternative therapies. A probe-based analysis reported that miRNA:mRNA interactions in pediatric AML293 samples mostly involved 3 miRNAs: miR-196b, miR-155 and miR-25. The targets of these 3 miRNAs were involved in the cell cycle, proliferation, cell death and apoptosis, and development, growth and angiogenesis. A PAR-CLIP array-based study has reported on miRNA:mRNA interactions in pediatric AML cell lines, where mRNAs associated with Ago were mostly involved in cancer signaling pathways including the MAPK, mTOR, Phosphatidylinositol and Wnt pathways177. My integrative miRNA:mRNA sequence analysis in pediatric AML samples complements these previous efforts by providing a global view of miRNA:mRNA interactions and pathways that are dysregulated in pediatric AML. In particular, it revealed that miRNAs that are abundantly expressed in treatment resistant contexts (relapse and refractory samples) appear to target genes involved in RNA processing, cellular signaling and energy metabolism. These are all processes required for rapid proliferation, and their inhibition in relapse and refractory samples may suggest a role for quiescence in pediatric AML treatment resistance.

As a goal of my thesis was to stratify heterogeneous cancer patient populations by outcome and other clinical and molecular correlates, I performed analyses of miRNA and mRNA expression across a cohort of AML patients with the aim of identifying transcripts that could classify AML patients. In summary, my analysis revealed 5 mRNA-based sub-groups where the sub-group with superior outcome was further characterized by abundant expression of ribosomal genes.
I also identified 2 miRNA-based sub-groups that correlated with cytogenetic alterations that could classify the disease. Further, I identified several mRNAs and miRNAs which are significantly associated with survival, including miR-106a-363. miR-106a-363 was abundantly expressed in refractory and relapse samples and targets genes involved in energy metabolism, suggesting that the oxidative phosphorylation pathway may be amenable to therapeutic targeting in treatment resistant AML cells.

3.4 Methods

3.4.1 Patient samples & treatment protocol

The data set for this study consisted of primary, refractory and relapse samples from 676 patients who were enrolled in at least one of the following pediatric AML studies: CCG-2961 (n=43), AAML0531 (n=556) or AAML03P1 (n=77). Details of these studies have been described294-296. Karyotyping was centrally reviewed by each study group. Consent was obtained from all study participants in accordance with the Declaration of Helsinki. Institutional review board approval was obtained by the Fred Hutchinson Cancer Research Centre before analysis.

3.4.2 Cell lines

HEK-293 cells were maintained in Dulbecco's Modified Eagle Medium (DMEM; Life Technologies, Burlington ON) supplemented with 10% (v/v) fetal bovine serum (FBS; Life Technologies). M-07e cells were maintained in RPMI 1640 (Life Technologies) supplemented with 10% (v/v) FBS and 10ng/ml human recombinant GM-CSF (Stem Cell Technologies; Vancouver BC). All cells were maintained in a 37°C incubator in a humidified atmosphere with 5% CO2.

3.4.3 Plasmid constructs and miRNA mimics

miRNA expression was increased using miRIDIAN miRNA mimics (ThermoScientific, Waltham MA) directed against miR-106a-5p and negative control #2 (NC2; negative control against C. elegans cel-miR-239b). Mimics were resuspended in nuclease-free water at a stock concentration of 100µM.
The genomic or mismatched sequences corresponding to the predicted binding sites to miR-106a-5p were synthesized (IDT Technologies; Coralville IA) and cloned into the XhoI/NotI restriction sites of the psiCHECK2 vector (Promega; Madison WI) directly downstream of the Renilla luciferase reporter gene and verified by DNA sequence analysis. The mismatched sequences are exactly complementary to the seven nucleotide seed regions of each of the predicted miR-106a-5p binding sites.

3.4.4 Library construction and sequencing of miRNA-seq Illumina libraries

RNA extraction, miRNA-seq library construction, sequencing, read alignment, and miRNA expression profiling were performed as reported by The Cancer Genome Atlas Research Network66. miRNA-seq reads were aligned to hg19 and miRBase v21, and miRNA 3p/5p strands that were expressed at a level of at least 10 reads per million mapped reads (RPM) in at least 10 libraries were retained for analysis.

3.4.5 Library construction and sequencing of mRNA-seq Illumina libraries

PolyA+ RNA was purified using the 96-well MultiMACS mRNA isolation kit on the MultiMACS 96 separator (Miltenyi Biotec, Germany) from 2µg total RNA with on-column DNaseI-treatment as per the manufacturer's instructions. The eluted PolyA+ RNA was ethanol precipitated and resuspended in 10µL of DEPC treated water with 1:20 SuperaseIN (Life Technologies, USA). First-strand cDNA for samples in the discovery cohort was synthesized from the purified polyadenylated messenger RNA using Superscript II Reverse Transcriptase (Thermo-Fisher, USA). For the validation cohort, the Maxima H Minus First Strand cDNA Synthesis kit (Thermo-Fisher, USA) and random hexamer primers at a concentration of 5µM along with a final concentration of 1µg/µL Actinomycin D were used, followed by Ampure XP SPRI bead purification on a Biomek FX robot (Beckman-Coulter, USA).
Second strand cDNA was synthesized following the Superscript cDNA Synthesis protocol by replacing the dTTP with dUTP in the dNTP mix, allowing the second strand to be digested using UNG (Uracil-N-Glycosylase, Life Technologies, USA) in the post-adapter ligation reaction, thus achieving strand specificity. cDNAs were fragmented using Covaris E210 sonication for 55 seconds at a “Duty cycle” of 20% and “Intensity” of 5. Paired-end sequencing libraries were prepared following the BC Cancer Agency Genome Sciences Centre strand-specific, plate-based and paired-end (PE) library construction protocols on a Biomek FX robot (Beckman-Coulter, USA). Briefly, the cDNA was purified in 96-well format using Ampure XP SPRI beads, and was subject to end-repair and phosphorylation using T4 DNA polymerase, Klenow DNA Polymerase, and T4 polynucleotide kinase in a single reaction, followed by SPRI bead cleanup and 3’ A-tailing using Klenow fragment (3’ to 5’ exo minus). Illumina PE adapters were ligated and the adapter-ligated products were purified using Ampure XP SPRI beads, and digested with UNG (1U/µL) at 37°C for 30 min followed by deactivation at 95°C for 15 min. The digested cDNAs were purified using Ampure XP SPRI beads, and then PCR-amplified with Phusion DNA Polymerase (Thermo Fisher Scientific Inc., USA) using Illumina’s PE primer set, with cycle conditions of 98°C for 30 sec followed by 10-13 cycles of 98°C for 10 sec, 65°C for 30 sec and 72°C for 30 sec, and finally 72°C for 5 min. The PCR products were purified using Ampure XP SPRI beads, and checked with Caliper LabChip GX for DNA samples using the High Sensitivity Assay (PerkinElmer, Inc., USA). PCR products of the desired size range were purified using SPRI beads, and the DNA quality was assessed and quantified using an Agilent DNA 1000 series II assay and Quant-iT dsDNA HS Assay Kit using a Qubit fluorometer (Invitrogen), then diluted to 8nM in preparation for Illumina HiSeq2500 paired-end 75 base sequencing.
3.4.6 mRNA isoform-specific expression profiling of mRNA-seq

The mRNA-seq paired-end reads were aligned to the RefSeq hg19 genome using TopHat244 v1.4.1. Alignments were then interrogated for isoform-specific expression profiles using Cufflinks244 v1.3.0. mRNA transcripts with at least 1 fragment per kilobase of transcript per million mapped reads (FPKM) in 1 mRNA-seq library were considered expressed and were retained for analysis.

3.4.7 BLASTP

To identify proteins with sequence identity to protein sequences of interest, I performed BLASTP297 using the NCBI web portal. I specified “refseq_protein” for the database and “Homo sapiens” for the organism.

3.4.8 Differential expression analysis

To correct for batch effects that may interfere with clustering analysis, miRNA expression profiles were quantile normalized using the R preprocessCore package prior to differential expression analysis. Evaluation of the differential expression of miRNA and mRNA was performed using the Wilcoxon rank-sum test for each miRNA and mRNA. I considered significantly differentially expressed miRNAs to be those with Benjamini-Hochberg (BH) multiple test corrected p-values (q-values) < 0.05. KEGG pathways and Gene Ontology (GO) terms that were enriched in differentially expressed genes were determined using DAVID298.

3.4.9 Integrative miRNA:mRNA expression analysis

Integrative miRNA:mRNA analysis was performed as described by Lim et al.277 (Figure 3.12). I considered samples for which I had both miRNA-seq and mRNA-seq data (n=164). Briefly, a Spearman correlation coefficient (rho) score and a p-value were generated for comparisons of expression profiles between all possible miRNA and mRNA pairs.
Then, miRNA:mRNA pairs were shortlisted based on the presence of target site predictions (from both TargetScan and miRanda algorithms) and significant anti-correlation between miRNA and gene expression, with statistical significance determined by comparison against bootstrapping-based null distributions (Figure 3.13). KEGG pathway enrichment analysis of the target genes of each group of miRNAs was performed using Fisher’s exact test. The groups of miRNAs I considered were: (1) miRNAs that were abundantly expressed in refractory samples vs primary samples; (2) miRNAs that were poorly expressed in refractory samples vs primary samples; (3) miRNAs that were abundantly expressed in relapse samples vs primary samples; and (4) miRNAs that were poorly expressed in relapse samples vs primary samples. Significantly enriched pathways were those with q-values < 0.05. As I was interested in cellular functions that were dysregulated in AML, only KEGG pathways in the following categories were considered: 1. Metabolism, 2. Genetic Information Processing, 3. Environmental Information Processing, 4. Cellular Processes, 5. Organismal Systems, while disease-centric KEGG pathways in the following categories were excluded: 6. Human Diseases, 7. Drug Development.

3.4.10 Survival analysis

Overall survival (OS) was measured from the date of registration to the date of death due to any cause, with patients last known to be alive censored at the date of last contact. Event-free survival (EFS) was measured from the date of registration to the date of the first of the following events: removal from protocol therapy without achieving CR, progression, or death due to any cause. Patients who were last known to be alive and progression free were censored at the date of last contact. For each miRNA/mRNA transcript, I performed Cox proportional hazards (Cox PH) analysis using the survival R package, where quantile normalized log2 RPM values were used as input.
Significant associations with survival were those with p-value <0.05 and Benjamini-Hochberg multiple test corrected p-values (q-values) <0.1. For the mRNA analysis, I only considered protein coding mRNA transcripts that had a median expression of >10 FPKM across all primary samples. For each KM plot, I used X-tile cohort separation223 to categorize patients into low and high expression groups. A cut point was then determined by taking the mean of the centroids of the high and low groups. This same cut point was applied to both the discovery and validation cohorts. The p-value displayed on each KM plot is a log-rank p-value.

3.4.11 NMF clustering of miRNA and mRNA expression

Only primary samples were included in the NMF clustering analysis. For mRNA and miRNA expression profiles, I generated unsupervised consensus clustering results as described by The Cancer Genome Atlas Research Network66. I used the default Brunet algorithm and 100 iterations for the clustering runs. A preferred cluster result was selected by considering the profiles of the cophenetic scores of the consensus membership matrix for clustering solutions having between 2 and 15 clusters. For the clustering of mRNA expression data, I chose the 5-group (k=5) solution as it had the second highest cophenetic score and produced a visually clean consensus matrix when compared with the other solutions (Figure 3.3), indicating that there was a high probability that the same result was obtained through the 100 clustering iterations. I reasoned that studying 5 groups (as opposed to the 2-group solution, which produced the highest cophenetic score) would uncover more insight into the heterogeneity of mRNA transcript expression in pediatric AML. For the clustering of miRNA expression data, I chose the 2-group (k=2) solution as it had the highest cophenetic score and produced a visually clean consensus matrix when compared with the other solutions (Figure 3.9).
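To make the notion of a consensus membership matrix concrete, the sketch below builds one from repeated clustering runs. It is an illustration under simplifying assumptions, not the NMF implementation used here: the function name and the toy labels are hypothetical, the cluster labels are taken as given rather than derived from NMF factorizations, and the cophenetic score that is computed from such a matrix is not shown.

```python
def consensus_matrix(runs):
    """Build a consensus matrix from repeated clustering runs.

    runs: list of label lists, one per run, all in the same sample order.
    Entry [i][j] is the fraction of runs in which samples i and j were
    assigned to the same cluster.
    """
    n_samples = len(runs[0])
    counts = [[0] * n_samples for _ in range(n_samples)]
    for labels in runs:
        for i in range(n_samples):
            for j in range(n_samples):
                # Only co-membership matters, so the arbitrary numbering
                # of clusters within a run has no effect.
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[count / len(runs) for count in row] for row in counts]


# Hypothetical labels for 4 samples over 3 runs. Runs 1 and 2 give the same
# partition with swapped label names; run 3 moves the third sample.
toy_runs = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
toy_consensus = consensus_matrix(toy_runs)
```

Entries close to 0 or 1 correspond to the sharply defined blocks of a visually clean consensus map, while intermediate values mark pairs of samples whose co-clustering varies between runs.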
Enrichment of particular clinical characteristics in each sub-group was determined using Fisher’s exact tests, where significant enrichment was reported if p-value <0.05.

3.4.12 Dual-luciferase reporter assays

Dual-Glo reporter assays were performed as previously described277. HEK-293 cells were seeded onto 24-well plates one day before transfection. Luciferase reporter constructs were co-transfected with miR-106a-5p or NC2 control mimics using TurboFect Transfection Reagent (ThermoScientific) in OPTI-MEM (Life Technologies) without FBS. Six hours following transfection, media were changed to DMEM supplemented with 10% FBS. 24h after transfection, cells were reseeded into 96-well plates. 48h after transfection, cells were lysed and luciferase activities were assayed using the Dual-Glo Luciferase Reporter Assay System (Promega). On the luciferase construct, Renilla luciferase was located upstream of the inserted miRNA binding site of interest (Section 3.4.3) and thus was used to monitor responses to miRNA over-expression, while the firefly luciferase is included on the plasmid as an intraplasmid transfection normalization reporter. As such, Renilla/Firefly luciferase ratios were calculated for each well to account for transfection efficiencies. These experiments were performed in triplicate and results are shown as means ± SEM. Statistical comparisons were performed using unpaired two-tailed t-tests with Bonferroni multiple-test correction, where significant differences were those with adjusted p-value <0.05.

3.5 Figures and tables

Figure 3.1 Transcriptome analysis of pediatric AML
A) Schematic diagram of my experimental design and possible trajectories of pediatric AML patients. In the data set are primary samples that are obtained at time of diagnosis (blue), relapse samples (orange), and induction failure/refractory samples (red). B) Summary of pediatric AML transcriptomes. mRNA and miRNA transcripts are arranged according to genomic locus on each chromosome.
Tracks from outer-most to inner-most: 1) Green bars indicate expression levels of all detected miRNA and mRNA transcripts. Expression values are represented as log2 FPKM (mRNA) or log2 RPM (miRNA); 2) Orange dots indicate the log2 fold change between relapse and primary samples. Each dot represents one miRNA or mRNA transcript; 3) Red dots indicate the log2 fold change between refractory and primary samples. Each dot represents one miRNA or mRNA transcript. RPL28 is less abundant in relapse samples when compared with primary samples, and its genomic locus is highlighted with a rectangular box on chromosome 19. miR-106a-363, miR-500a/b, miR-29b-1 and miR-103a are more abundant in relapse and refractory samples when compared with primary samples, and their genomic loci are also highlighted with rectangular boxes.

Figure 3.2 Unsupervised NMF clustering of mRNA expression profiles from primary samples
A) The NMF consensus map for k=5 sub-groups. Deep red blocks numbered 1 to 5 indicate the 5 robust groups that were identified using NMF. B) Heat map displaying expression of mRNA transcripts that are significantly differentially expressed between the 5 sub-groups (Wilcoxon test q-value <0.05; log2 fold change >1). Only the top 25 mRNA transcripts that characterize each sub-group (based on q-value) are displayed. C) Covariate tracks displaying clinical attributes of each patient. Red boxes indicate clinical attributes that are enriched in one cluster over another (Fisher’s exact test p-value <0.01). The visual legend for this figure is available at the end of the figure. D) Kaplan-Meier plots illustrating OS and EFS status of patients in each group, where Sub-group 5 has superior OS and EFS when compared to the other 4 sub-groups.
[Figure 3.2 image. Panel A: NMF consensus map with silhouette width (Sil. Width) track. Panel B: heat map of mRNA expression Z-scores, with rows labelled by gene symbol and RefSeq accession (for example HBA2|NM_000517, S100A9|NM_002965, RPL28|NM_000991). Panel C: covariate tracks for FAB subtype, gender, race, CNS disease, chloroma, trisomy 21, recurrent cytogenetic alterations, MLL rearrangements, FLT3, NPM1, CEBPA, WT1 and cKit mutations, MRD and CR status at the end of courses 1 and 2, and risk group. Panel D: Kaplan-Meier plots of OS and EFS (x-axis: Years; y-axis: % Survival) for sub-groups 1 to 5, with log-rank p-values.]

Figure 3.3 mRNA k2-15 NMF Metrics
A) Consensus maps of rank (k) 2-15 solutions of unsupervised clustering of mRNA expression profiles of 158 primary samples. Deep red blocks indicate samples that consistently cluster with one another. B) Cophenetic coefficients (which provide measurements of the stability of the clusters), and silhouette widths (which indicate the consistency of the membership of each sample in the assigned cluster) of k: 2-15 solutions. I chose to follow up on the k=5 solution as it had the second highest cophenetic coefficient, and I reasoned that studying 5 sub-groups (as opposed to 2 sub-groups) would uncover more insight into the heterogeneity of mRNA transcript expression in pediatric AML. This rationale emerged as meaningful: we were able to identify 5 sub-groups characterized by distinct expression profiles, where 1 sub-group (sub-group 5) had superior outcomes when compared to the other 4 sub-groups.
[Figure 3.3 image. Panel A: consensus maps for rank = 2 to rank = 15, with basis, consensus and silhouette width tracks. Panel B: silhouette width and cophenetic coefficient plotted against factorization rank (2 to 15).]

Figure 3.4 Complementary methods of clustering mRNA expression profiles
A) Hierarchical clustering (distance: manhattan, hclust: ward-D2) of mRNA expression data. Coloured bars indicate groups derived from NMF clustering of the same data. NMF Group 5 appears to be a distinct cluster from the other 4. 18 of 23 cases from NMF Group 5 clustered in a distinct branch, displayed to the extreme left of this plot. B) Principal components analysis of mRNA expression data. Three of the top 4 principal components (PC; based on eigenvalues; indicated by black bars) were able to separate patients similarly to the groupings derived from NMF. NMF Group 5 appears to be a distinct cluster from the other 4. (Group 5 dots are indicated in green colour, and are mostly within the dotted ellipse).

Figure 3.5 Differential expression analysis of mRNA transcript expression between relapse and primary samples
A) Volcano plot displaying differentially expressed mRNA transcripts between relapse and primary samples. mRNA transcripts that are more abundant in relapse samples are shown in red, while mRNA transcripts that are more abundant in primary samples are shown in green. The red line indicates the significance threshold of 0.05. The arrow indicates the mRNA transcript (RPL28|NM_001136136) that is also significantly differentially expressed between matched primary and relapse samples (n=35). B) Heat map showing expression levels of RPL28 RefSeq annotated mRNA transcripts in matched primary and relapse samples. The gene model of each mRNA isoform is displayed to the left of the heat map.
3 isoforms (NM_000991, NM_001136136 and NM_001136135) are expressed, but only NM_001136136 is significantly differentially expressed between primary and relapse samples.

Figure 3.6 mRNAs associated with OS or EFS in the mRNA discovery and validation cohorts
Forest plots displaying Cox proportional hazard ratios (and 95% confidence intervals) of mRNAs that are significantly associated with OS or EFS (Cox PH q-value <0.05). TOP: OS in the discovery cohort and OS in the validation cohort. BOTTOM: EFS in the discovery cohort and EFS in the validation cohort. A hazard ratio to the left of the y-axis indicates that abundant expression of that mRNA is associated with reduced risk (superior outcome), while a hazard ratio to the right of the y-axis indicates that abundant expression of that mRNA is associated with increased risk (inferior outcome). Blue dots/bars indicate a p-value of less than 0.05, while red dots/bars indicate both a p-value of less than 0.05 and an FDR of less than 0.1.
[Figure 3.6 panels: forest plots of OS hazard ratio (top, Overall Survival) and EFS hazard ratio (bottom, Event Free Survival), x-axis 0.5 to 2.5, for the discovery cohort (n=158) and the validation cohort (n=87). mRNAs shown: HERPUD2|NM_022373, CUTA|NM_001014838, MRPS24|NM_032014, RBM3|NM_006743, C6orf48|NM_001040438, UXT|NM_004182, RPL10|NM_006013, CD99|NM_002414, ACTN1|NM_001102, TM9SF2|NM_004800, NCKAP1L|NM_005337. Legend: not significant; p-value <0.05; p-value <0.05 and FDR <0.1.]

Figure 3.7 Kaplan-Meier plots for mRNAs associated with OS or EFS in the mRNA discovery and validation cohorts (This figure spans 6 pages)
There are 4 panels for each mRNA: OS in the discovery cohort (DISC), EFS in the discovery cohort, OS in the validation cohort (VAL) and EFS in the validation cohort. For each panel, the Kaplan-Meier plot is shown to the left, while the boxplot indicating expression levels in low and high groups is shown to the right. In the discovery cohort, low and high groups were determined using X-tile [223], which determines the optimal separation point for low and high expression groups by selecting the separation point that returns the lowest log-rank p-value. In both the discovery and validation cohorts, low and high groups were determined based on proximity to centroids that were the means of the low and high groups in the discovery cohort. The 11 mRNA transcripts featured here are associated with either OS or EFS in both the discovery and validation cohorts, indicating that their associations with survival are robust and that they should be regarded as potential prognostic markers.
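The two grouping steps described in the Figure 3.7 caption can be illustrated with a minimal sketch. This is not X-tile itself: it is a plain threshold scan that keeps the cutpoint with the lowest two-group log-rank p-value, followed by nearest-centroid assignment. The function names, the `min_group` parameter and the toy data are all hypothetical and are not taken from the thesis.

```python
import math

def logrank_p(times, events, in_high):
    """Two-group log-rank test p-value (chi-square with 1 degree of freedom)."""
    observed = expected = variance = 0.0
    # Visit each distinct time at which at least one event occurred.
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n_high = sum(1 for i in at_risk if in_high[i])
        deaths = [i for i in at_risk if times[i] == t and events[i]]
        d = len(deaths)
        observed += sum(1 for i in deaths if in_high[i])
        expected += d * n_high / n
        if n > 1:
            frac = n_high / n
            variance += d * frac * (1 - frac) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = (observed - expected) ** 2 / variance
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 d.f.

def best_cutpoint(expr, times, events, min_group=2):
    """Scan midpoints between sorted expression values; keep the lowest p-value."""
    best = None
    values = sorted(set(expr))
    for lo, hi in zip(values, values[1:]):
        cut = (lo + hi) / 2
        in_high = [v > cut for v in expr]
        n_high = sum(in_high)
        if n_high < min_group or len(expr) - n_high < min_group:
            continue  # skip splits that leave one group too small
        p = logrank_p(times, events, in_high)
        if best is None or p < best[1]:
            best = (cut, p)
    return best

def assign_by_centroid(value, low_centroid, high_centroid):
    """Label a sample by whichever discovery-cohort group mean is closer."""
    closer_to_high = abs(value - high_centroid) < abs(value - low_centroid)
    return "high" if closer_to_high else "low"

# Toy discovery cohort (hypothetical): expression, follow-up in years, event flag.
toy_expr = [1, 2, 3, 4, 10, 11, 12, 13]
toy_times = [9, 8, 9, 10, 1, 2, 1, 2]
toy_events = [0, 0, 1, 0, 1, 1, 1, 1]
toy_cut, toy_p = best_cutpoint(toy_expr, toy_times, toy_events)
```

Because the cutpoint is chosen to minimize the p-value, the discovery-cohort p-value is optimistic by construction, which is one reason the caption's validation-cohort comparison matters.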
[Figure 3.7 panels: for each of MRPS24|NM_032014, NCKAP1L|NM_005337, RPL10|NM_006013, ACTN1|NM_001102, C6orf48|NM_001040438, CD99|NM_002414, CUTA|NM_001014838, HERPUD2|NM_022373, RBM3|NM_006743, UXT|NM_004182 and TM9SF2|NM_004800, four Kaplan-Meier plots (OS-DISC, EFS-DISC, OS-VAL, EFS-VAL; x-axis: Years, 0 to 10; y-axis: % Survival) annotated with the log-rank p-value and the sizes of the high and low groups, each paired with a boxplot of mRNA expression (FPKM) in the high and low groups and the high/low threshold in FPKM.]

Figure 3.8 Unsupervised NMF clustering of miRNA expression profiles of primary samples
A) The NMF consensus map for k=2 sub-groups. Deep red blocks numbered 1 and 2 indicate 2 sub-groups that were identified using NMF. B) Heat map displaying expression of miRNAs that are significantly differentially expressed between sub-group 1 and sub-group 2 (Wilcoxon test q-value <0.03; log2 fold change >1). C) Covariate tracks displaying clinical attributes of each patient. Red boxes indicate clinical attributes that enrich one cluster over another (Fisher’s exact test p-value <0.01). The visual legend for this figure is available at the very end of the figure.
D) Kaplan-Meier plots illustrating OS and EFS status of patients in each sub-group.

[Figure 3.8 panels: A) consensus map for sub-groups 1 and 2 with silhouette widths; B) heat map of miRNA expression Z-scores; C) covariate tracks for the clinical attributes, with legends for gender, race, FAB, risk group and CR status at the end of course 1/2; D) Kaplan-Meier plots of OS and EFS (x-axis: Years; y-axis: % Survival).]

Figure 3.9 miRNA k2-15 NMF Metrics
A) Consensus maps of rank (k) 2-15 solutions of unsupervised clustering of miRNA expression profiles of 259 primary samples. Deep red blocks indicate samples that consistently cluster with one another. B) Cophenetic coefficients (which provide measurements of the stability of the clusters), and silhouette widths (which indicate the consistency of the membership of each sample in the assigned cluster) of k: 2-15 solutions. I chose to follow up on the k=2 solution as it had the highest cophenetic coefficient.

[Figure 3.9 panels: A) consensus maps for rank = 2 to rank = 15, with basis, consensus and silhouette-width tracks (colour scale 0 to 1); B) silhouette (0.4 to 1.0) and cophenetic (0.94 to 1.00) measures plotted against factorization rank (2 to 15) for the basis, best fit, coefficients and consensus measure types.]

Figure 3.10 Comparing miRNA NMF sub-groups (k=2) and mRNA NMF sub-groups (k=5)
Sankey plot displaying membership of AML samples in miRNA NMF sub-groups and mRNA NMF sub-groups.
There was no significant overlap between miRNA NMF sub-groups (k=2) and mRNA NMF sub-groups (k=5) (Fisher’s exact test p-value >0.01). “miRNA-NA” refers to cases where mRNA data were available but no miRNA data were available, while “mRNA-NA” refers to cases where miRNA data were available but no mRNA data were available.

Figure 3.11 miRNA expression in relapse and refractory AML
There is a small schematic diagram at the top left of each panel to indicate which samples are being compared and represented. A) Volcano plot displaying differentially expressed miRNAs between primary samples from refractory patients and primary samples from CR patients. B) Volcano plot displaying differentially expressed miRNAs between refractory and primary samples. C) Volcano plot displaying differentially expressed miRNAs between relapse and primary samples. A-C) The red dotted line represents the significance threshold for differential expression, set at 0.05. 4 members of the miR-106a-363 cluster are abundantly expressed in treatment-resistant contexts: they are more abundant in primary samples of refractory patients than in primary samples of patients who achieve CR, more abundant in refractory samples than in primary samples, and more abundant in relapse samples than in primary samples. Dots representing members of the miR-106a-363 cluster are marked with circles in the volcano plots. D) Boxplot depicting miR-106a-5p expression in primary, relapse and refractory samples. E) miRNA expression levels in primary samples. I feature the 12 miRNAs that are abundantly expressed in relapse samples, refractory samples, and primary samples of refractory patients. The heat map is sorted according to miR-106a-5p expression. The expression of these miRNAs correlates with the CR status of patients.
Figure 3.12 Workflow for miRNA:mRNA integrative analysis
Putative miRNA targets are those whose expression is anti-correlated with that of their targeting miRNA and that harbor a predicted binding site for that miRNA.

Figure 3.13 Identifying miRNA:mRNA expression correlations
A) In order to identify miRNA:mRNA pairs with correlated expression profiles, I performed Spearman correlation tests for each miRNA:mRNA pair. The resulting Spearman correlation coefficients are displayed in this histogram. B) In order to represent miRNA:mRNA correlations that are noise, I generated a null distribution consisting of Spearman correlation coefficients of scrambled data. The null distribution was derived by performing Spearman correlations 50 times, each time randomizing the miRNA-seq library IDs. The resulting Spearman correlation coefficients are displayed in this histogram. C) To account for correlations that might have been stochastic noise, the rho distribution was then divided into 40 bins and the counts for each bin were compared with counts from the null distribution depicted in B. miRNA:mRNA pairs in each bin were sorted by adjusted p-value, and only those that ranked above the threshold set by counts from bins derived from the null distribution were considered for further analysis. In this plot, the grey bars represent the null distribution, the dark blue bars represent miRNA:mRNA correlations that were considered for analysis, and light blue bars represent miRNA:mRNA correlations that were excluded from analysis because they were considered stochastic noise.
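The null-distribution procedure in the Figure 3.13 caption can be sketched for a single miRNA:mRNA pair as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, the 40 bins are taken to be equal-width bins over [-1, 1], and the per-bin filter is read as "keep the observed count in excess of the average null count per permutation", which is one possible interpretation of the caption rather than the thesis's confirmed implementation.

```python
import random

def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # a constant vector carries no rank information
    return cov / (vx * vy) ** 0.5

def null_rhos(mirna, mrna, n_perm=50, seed=0):
    """Rho values after shuffling which miRNA library is paired with which sample."""
    rng = random.Random(seed)
    shuffled = list(mirna)
    out = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        out.append(spearman(shuffled, mrna))
    return out

def bin_counts(rhos, n_bins=40):
    """Histogram of rho values over n_bins equal-width bins spanning [-1, 1]."""
    counts = [0] * n_bins
    for rho in rhos:
        idx = min(int((rho + 1) / 2 * n_bins), n_bins - 1)
        counts[max(idx, 0)] += 1
    return counts

def excess_per_bin(observed_counts, null_counts, n_perm):
    """Observed count minus the mean null count per permutation, floored at zero."""
    return [max(0.0, o - n / n_perm) for o, n in zip(observed_counts, null_counts)]
```

In a full analysis the observed histogram would pool rho values over all miRNA:mRNA pairs, and the pairs retained in each bin would be those with the smallest adjusted p-values, up to the excess count for that bin.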
[Figure 3.13 panels: three histograms of Spearman Correlation Coefficient (x-axis) against Number of miRNA:mRNA Correlations (y-axis, log scale): A) All Observed miRNA:mRNA Correlations; B) miRNA:mRNA Correlated In Generated Null Distribution; C) miRNA:mRNA Correlations Filtered By The Null Distribution. Legend: Considered For Analysis; Excluded From Analysis; Null Distribution.]

Figure 3.14 Integrative miRNA:mRNA analysis identifies putative miRNA targets
Pathways enriched by targets of miRNAs that are: A) over-expressed in refractory samples compared with primary samples, B) over-expressed in relapse samples compared with primary samples, C) under-expressed in refractory samples compared with primary samples, and D) under-expressed in relapse samples compared with primary samples. Numbers of target genes that fall into each pathway are indicated in brackets. Blue dotted lines indicate the p-value significance threshold, set at 0.05. Of note, “Oxidative phosphorylation” is the only pathway that is within the top 5 enriched pathways of both (A) and (B), suggesting that many energy metabolism related genes could be repressed by miRNAs in both refractory and relapse samples. “RNA transport” and “Spliceosome” pathways are amongst the top 5 enriched pathways of both (C) and (D) that are enriched by >10 target genes. This suggests that genes involved in RNA transport and splicing could be more abundantly expressed in both refractory and relapse samples due to decreased expression of their targeting miRNAs.

Figure 3.15 miRNAs associated with patient survival
A) Forest plots displaying Cox proportional hazard ratios and 95% confidence intervals of miRNAs that are significantly associated with OS and/or EFS. From left to right: OS in the discovery cohort, OS in the validation cohort, EFS in the discovery cohort, EFS in the validation cohort.
A hazard ratio to the left of the y-axis indicates that abundant expression of that miRNA is associated with reduced risk (superior outcome), while a hazard ratio to the right of the y-axis indicates that abundant expression of that miRNA is associated with increased risk (inferior outcome). Blue dots/bars indicate a p-value of less than 0.05, while red dots/bars indicate both a p-value of less than 0.05 and an FDR of less than 0.1. B and C) Kaplan-Meier plots showing the association of miR-106a-5p with OS and EFS in the discovery (B) and validation (C) cohorts. Boxplots indicate the range of miRNA expression in the low and high expression groups.

Figure 3.16 Kaplan-Meier plots of miRNAs associated with OS and EFS in the miRNA discovery and validation cohorts (This figure spans 3 pages)
There are 4 panels for each miRNA: OS in the discovery cohort (DISC), EFS in the discovery cohort, OS in the validation cohort (VAL) and EFS in the validation cohort. For each panel, the Kaplan-Meier plot is shown to the left, while the boxplot indicating expression levels in low and high groups is shown to the right. In the discovery cohort, low and high groups were determined using X-tile [223], which determines the optimal separation point for low and high expression groups by selecting the separation point that returns the lowest log-rank p-value. For both the discovery and validation cohorts, low and high groups were determined based on proximity to centroids that were the means of the low and high groups in the discovery cohort. The 6 miRNAs featured here are associated with both OS and EFS in both the discovery and validation cohorts, indicating that their associations with survival are robust and that they should be regarded as potential prognostic markers.
[Figure 3.16 panels: for each of 6 miRNAs (panel titles of the form hsa-miR with a MIMAT accession; the miRNA names are not fully recoverable from the extracted text), four Kaplan-Meier plots (OS-DISC, EFS-DISC, OS-VAL, EFS-VAL; x-axis: Years, 0 to 10; y-axis: % Survival) annotated with the log-rank p-value and the sizes of the high and low groups, each paired with a boxplot of miRNA expression (log2 RPM) in the high and low groups and the high/low threshold in log2 RPM.]

Figure 3.17 Clinical characteristics of AML patients in high and low miR-106a-5p expression groups
Top: Bar plot indicating miR-106a-5p expression levels in patient samples. Red bars indicate patients in the high expression group (n=57), while blue bars indicate patients in the low expression group (n=202). Bottom: Clinical characteristics of each AML patient. Aside from MRD and CR, there is no significant enrichment of any clinical characteristics in either high or low miR-106a-5p expression groups.

[Figure 3.17 panels: bar plot of miR-106a-5p Expression (log2 RPM, 0 to 10) above covariate tracks (FLT3 mutation, trisomy 21, gender, race, CNS disease, chloroma, FAB, cytogenetic abnormalities, MLL rearrangements, FLT3-ITD, NPM1, CEBPA, WT1 and cKit mutations, MRD and CR at the end of courses 1 and 2, risk group), with legends for gender, race, FAB, risk group and CR status.]

Figure 3.18 miR-106a-5p targets genes involved in oxidative phosphorylation
A) miR-106a-5p binding sites predicted by both TargetScan and miRanda on genes involved in oxidative phosphorylation: ATP5S, ATP5J2-PTCD1, NDUFA10, NDUFC2, UQCRB. B) miR-106a-5p activity in HEK-293 cells was assessed using a psiCHECK2 dual luciferase reporter construct containing each of the putative ATP5S, ATP5J2-PTCD1, NDUFA10, or NDUFC2/UQCRB binding sites. Activity is measured as Renilla luminescence normalized to Firefly luminescence to control for transfection efficiencies.
The data are shown as normalized relative luciferase units (RLU) with respect to the corresponding dose of the control mimic and are representative of three independent experiments (mean ± SEM). Statistically significant comparisons between the co-transfected miR-106a-5p miRNA (20 pmol) and the NC2 control for the perfect binding reporter vector are noted over the solid colored bars. Statistically significant comparisons between perfect binding and mismatch constructs are indicated above double-headed arrows. *p-value <0.05. White bars: NC2 negative control mimics; grey bars: miR-106a-5p mimics on perfect binding (PB) sites; striped grey bars: miR-106a-5p mimics on mismatched (MM) sites.

Table 3-1 Clinical characteristics of all 676 AML patients included in this study

Characteristics | Variables | Values
Age in days (Mean (Range)) | | 3462 (8-10500)
WBC x10^3/microliter (Mean (Range)) | | 72.08 (0.2-827.2)
Study | AAML03P1 | 77 (11.4%)
Study | AAML0531 | 556 (82.2%)
Study | CCG-2961 | 43 (6.36%)
Gender | Female | 333 (49.3%)
Gender | Male | 343 (50.7%)
Race | American Indian or Alaska Native | 3 (0.444%)
Race | Asian | 27 (3.99%)
Race | Black or African American | 77 (11.4%)
Race | Native Hawaiian or other Pacific Islander | 3 (0.444%)
Race | White | 501 (74.1%)
Race | Other | 20 (2.96%)
Race | Unknown | 45 (6.66%)
CNS Disease | No | 625 (92.5%)
CNS Disease | Yes | 51 (7.54%)
Chloroma | No | 591 (87.4%)
Chloroma | Yes | 84 (12.4%)
Chloroma | Unknown | 1 (0.148%)
FAB | M0 | 17 (2.51%)
FAB | M1 | 71 (10.5%)
FAB | M2 | 144 (21.3%)
FAB | M3 | 2 (0.296%)
FAB | M4 | 164 (24.3%)
FAB | M5 | 122 (18%)
FAB | M6 | 10 (1.48%)
FAB | M7 | 26 (3.85%)
FAB | Unknown | 120 (17.8%)
t(6;9) | No | 627 (92.8%)
t(6;9) | Yes | 15 (2.22%)
t(6;9) | Unknown | 34 (5.03%)
t(8;21) | No | 544 (80.5%)
t(8;21) | Yes | 98 (14.5%)
t(8;21) | Unknown | 34 (5.03%)
t(3;5)(q25;q34) | No | 641 (94.8%)
t(3;5)(q25;q34) | Yes | 1 (0.148%)
t(3;5)(q25;q34) | Unknown | 34 (5.03%)
t(6;11)(q27;q23) | No | 625 (92.5%)
t(6;11)(q27;q23) | Yes | 17 (2.51%)
t(6;11)(q27;q23) | Unknown | 34 (5.03%)
t(9;11)(p22;q23) | No | 602 (89.1%)
t(9;11)(p22;q23) | Yes | 38 (5.62%)
t(9;11)(p22;q23) | Unknown | 36 (5.33%)
t(10;11)(p11.2;q23) | No | 625 (92.5%)
t(10;11)(p11.2;q23) | Yes | 17 (2.51%)
t(10;11)(p11.2;q23) | Unknown | 34 (5.03%)
t(11;19)(q23;p13.1) | No | 627 (92.8%)
t(11;19)(q23;p13.1) | Yes | 15 (2.22%)
t(11;19)(q23;p13.1) | Unknown | 34 (5.03%)
inv(16) | No | 545 (80.6%)
inv(16) | Yes | 97 (14.3%)
inv(16) | Unknown | 34 (5.03%)
del5q | No | 637 (94.2%)
del5q | Yes | 5 (0.74%)
del5q | Unknown | 34 (5.03%)
del7q | No | 623 (92.2%)
del7q | Yes | 18 (2.66%)
del7q | Unknown | 35 (5.18%)
del9q | No | 616 (91.1%)
del9q | Yes | 26 (3.85%)
del9q | Unknown | 34 (5.03%)
monosomy 5 | No | 641 (94.8%)
monosomy 5 | Yes | 1 (0.148%)
monosomy 5 | Unknown | 34 (5.03%)
monosomy 7 | No | 627 (92.8%)
monosomy 7 | Yes | 15 (2.22%)
monosomy 7 | Unknown | 34 (5.03%)
trisomy 8 | No | 573 (84.8%)
trisomy 8 | Yes | 68 (10.1%)
trisomy 8 | Unknown | 35 (5.18%)
trisomy 21 | No | 623 (92.2%)
trisomy 21 | Yes | 19 (2.81%)
trisomy 21 | Unknown | 34 (5.03%)
MLL | No | 551 (81.5%)
MLL | Yes | 91 (13.5%)
MLL | Unknown | 34 (5.03%)
Minus Y | No | 612 (90.5%)
Minus Y | Yes | 30 (4.44%)
Minus Y | Unknown | 34 (5.03%)
Minus X | No | 621 (91.9%)
Minus X | Yes | 21 (3.11%)
Minus X | Unknown | 34 (5.03%)
FLT3-ITD positive | No | 557 (82.4%)
FLT3-ITD positive | Yes | 118 (17.5%)
FLT3-ITD positive | Unknown | 1 (0.148%)
FLT3 Mutation | No | 608 (89.9%)
FLT3 Mutation | Yes | 49 (7.25%)
FLT3 Mutation | Unknown | 19 (2.81%)
NPM Mutation | No | 609 (90.1%)
NPM Mutation | Yes | 53 (7.84%)
NPM Mutation | Unknown | 14 (2.07%)
CEBPA Mutation | No | 632 (93.5%)
CEBPA Mutation | Yes | 35 (5.18%)
CEBPA Mutation | Unknown | 9 (1.33%)
WT1 Mutation | No | 612 (90.5%)
WT1 Mutation | Yes | 55 (8.14%)
WT1 Mutation | Unknown | 9 (1.33%)
c-Kit Mutation | No | 156 (23.1%)
c-Kit Mutation | Yes | 30 (4.44%)
c-Kit Mutation | Unknown | 490 (72.5%)
c-Kit Exon Mutation | No | 160 (23.7%)
c-Kit Exon Mutation | Yes | 26 (3.85%)
c-Kit Exon Mutation | Unknown | 490 (72.5%)
CR Status (At End of Course 1) | CR | 512 (75.7%)
CR Status (At End of Course 1) | Death | 14 (2.07%)
CR Status (At End of Course 1) | Not in CR | 142 (21%)
CR Status (At End of Course 1) | Unevaluable | 8 (1.18%)
Risk Group | 10 | 2 (0.296%)
Risk Group | 30 | 1 (0.148%)
Risk Group | High | 94 (13.9%)
Risk Group | Low | 268 (39.6%)
Risk Group | Standard | 288 (42.6%)
Risk Group | Unknown | 23 (3.4%)

4 Characterization of miRNA Expression in Malignant Rhabdoid Tumours Reveals Dysregulated Transcripts Associated With Different Tissue Types

4.1 Introduction

Malignant rhabdoid tumours (MRT) are aggressive pediatric solid tumours with a median age at diagnosis of 11 months [193]. MRT can occur throughout the body, although they are detected frequently in kidneys (rhabdoid tumours of the kidney; RTK) and brain (Atypical Teratoid Rhabdoid Tumours; AT/RT). Although various cellular origins of MRT, such as neural, mesenchymal, epithelial, myogenic and neural crest, have been proposed [299-303], the cell of origin of malignant rhabdoid tumours is still unknown.
Despite being considered rare, with an age-standardized annual incidence rate of 0.6 per million children in the UK [304], the clinical burden of malignant rhabdoid tumours is considerable in the infant population, with extra-cranial MRT accounting for 18% of renal tumors, 14% of soft tissue tumors and 9% of liver tumors [304]. In addition, AT/RT accounts for 10-15% of central nervous system tumors [305]. New treatment options are needed, as MRT patients exhibit an overall 4-year survival rate of only 23.2% [306].

MRT are driven by loss of SMARCB1 [307,308] or, in rare instances, loss of SMARCA4 [309]. SMARCB1 and SMARCA4 are core subunits of the chromatin-remodeling SWItch/Sucrose Non-Fermentable (SWI/SNF) complex, a highly conserved global transcription regulator that can recruit transcription factors to target genes [310], or indirectly modulate target gene expression by altering nucleosome position [311]. Loss of SMARCB1 has been reported in other neoplasms such as epithelioid sarcomas [312] and schwannomatosis [313]. Loss of SMARCA4 has also been reported in other cancer types (e.g. clear cell renal cell carcinomas [314] and small cell carcinoma of the ovary, hypercalcemic type [315,316]). Previous studies of MRT samples and cell line models described the consequences of SMARCB1 loss, which included dysregulated G0-G1 cell cycle transition resulting from Cyclin D1 induction and repression of P16INK4A [317]; aberrant activation of the sonic hedgehog [318] and WNT/β-catenin signaling pathways [319]; dysregulated expression of genes involved in self-renewal of embryonic stem cells [320]; and dysregulated neural or neural crest development [321]. Previous exome [207] and microarray studies [322,323] noted a paucity of somatic mutations in MRT genomes, compatible with the notion that MRT progression is driven predominantly by SMARCB1 loss.
Despite this predominant driver event, previous studies have alluded to some degree of clinical heterogeneity in MRT, indicated by outcome correlation with tumour stage and patient age [306], and reports of a few long-term survivors [324]. Through analysis of gene expression, Torchia et al. [325] identified two sub-groups within AT/RT that differed in clinical outcome and in the expression of genes involved in cell lineage and developmental signaling. However, the existence of molecularly distinguishable sub-groups has not yet been described in extra-cranial MRT. Thus, I hypothesized that an analysis of extra-cranial MRT gene expression profiles may uncover patient sub-groups and reveal genes and pathways that are dysregulated in extra-cranial MRT.

One such molecular profile is that of the miRNAs. Array-based miRNA expression profiles have been generated for MRT samples for the purpose of comparison to other cancer types [200,201]. In particular, 107 miRNAs were differentially expressed between MRT and rhabdomyosarcoma [200], an undifferentiated round cell tumour consisting of poorly differentiated cells. Another effort, which compared miRNA profiles between RTK and AT/RT, noted that their microarray-based miRNA profiles could not distinguish between the 2 disease sample types [201]. Studies that focused on the SWI/SNF complex reported that miR-193a represses the expression of SMARCB1 in MRT [202]; that miR-206, miR-381 and miR-671-5p repress the expression of SMARCB1 in epithelioid sarcoma [203]; and that miR-199a represses the expression of SMARCA2, which encodes another member of the SWI/SNF complex, in a variety of cancers [204]. However, there has not yet been a sequence-based effort to profile miRNA expression in MRT. Moreover, since all MRT cases are characterized by inactivation of SWI/SNF complex members, MRT serves as a suitable model for studying miRNA dysregulation in the context of SWI/SNF loss.

miRNA expression patterns are useful in classifying cancers [221,231].
In a pan-cancer miRNA analysis, Lu et al.221 observed that nearly all miRNAs were differentially expressed across 20 cancer types, and hierarchical clustering of miRNA expression profiles of these samples paralleled the developmental origins of the tissues. In particular, those of epithelial origin were clearly distinguished from those of hematopoietic origin. Another effort by Rosenfeld et al.231 successfully designed a miRNA-based binary tree classifier that could distinguish between 22 different cancer types. It was also noted in the Lu et al.221 study that clustering of mRNA expression was unable to distinguish tumour samples to the same degree as miRNA expression: their miRNA-based classifier was able to correctly assign the anatomical context of 12 of 17 tumours of unknown origin, compared to only 1 of 17 that were correctly assigned by their mRNA-based classifier. In addition, Ferracin et al.326 identified a miRNA expression signature, consisting of 47 miRNAs, that not only correctly classified samples from 20 different cancer types, but also identified cells-of-origin for cancers of unknown primary with 86% accuracy. The ‘cells of origin’ of a cancer are the cells that acquire the first genetic hit or hits that culminate in the initiation of tumourigenesis327. Identifying a cell of origin of a cancer allows us to better understand the relative contribution of genetic and epigenetic aberrations to the pathogenesis of the disease. For instance, since the tissue of origin of medulloblastoma (MD) is the cerebellum, mutations in genes involved in the development of the cerebellum might be expected to play important roles in this cancer328. As such, it is not surprising that mutations in and aberrant expression of the sonic hedgehog gene, which encodes a protein that is secreted by the Purkinje cells in the developing cerebellum, have been found to drive at least one subtype of MD329.
The identification of cells of origin of cancers may allow earlier detection of malignancies and better prediction of tumour behavior and patient prognosis327. For instance, the cell of origin of DLBCL subtypes is tightly correlated with treatment outcomes. DLBCL patients diagnosed with the activated B-cell-like (ABC) subtype have inferior outcomes when compared with patients diagnosed with the germinal-center B-cell-like (GCB) subtype50.

Here, as part of the TARGET consortium of pediatric cancer researchers, a multi-platform dataset was generated. Using these data, we performed a comprehensive characterization of extra-cranial MRT, and an analysis of the consequences of SMARCB1 loss in MRT (Chun, Lim, et al., manuscript under review). The work described in this chapter consists only of the miRNA based analyses I performed in support of the larger effort. To derive comprehensive transcriptome datasets for MRT to facilitate a sequence-based miRNA characterization of extra-cranial MRT, miRNA-seq and RNA-seq were applied to a set of 40 MRT cases. Of these 40 cases, 34 samples were obtained from the kidney, 4 from soft tissues and 2 from the liver.

4.2 Results

4.2.1 Unsupervised clustering of miRNA expression profiles reveals 2 sub-groups

Analysis of miRNA-seq expression profiles revealed that 535 miRNAs were expressed (Methods) in at least 1 of the 40 MRT samples. To determine the extent of heterogeneity in miRNA expression across the 40 MRT samples, I performed NMF clustering. This identified an optimum of two sub-groups characterized by distinct miRNA expression patterns (Figure 4.1A and Figure 4.2). miRNA sub-group 1 consisted of 23 cases, while miRNA sub-group 2 consisted of 17 cases. Similar to gene expression sub-groups identified using the same samples (Chun, Lim, et al., manuscript under review), extra-renal MRT cases were found exclusively in miRNA sub-group 1 (Fisher’s exact test p-value <0.05; Figure 4.1B).
This indicated that extra-renal cases have gene and miRNA expression patterns that are distinct from those of renal cases, and may be considered as a separate subtype for classification purposes. To identify miRNA expression patterns that characterized each sub-group, I performed differential expression analysis comparing miRNA expression between sub-group 1 and sub-group 2. This identified 12 miRNAs that were over-expressed in sub-group 1, and 30 that were over-expressed in sub-group 2 (Figure 4.1C; DESeq log2 fold change >1, FDR <0.05; Appendix 4A). All 5 members of the miR-200 family (miR-200a, miR-200b, miR-200c, miR-141, miR-429) were expressed at relatively lower levels in sub-group 2 compared to sub-group 1. The tumour-suppressive miR-200 family is involved in suppressing the epithelial-mesenchymal transition (EMT) in metastatic bladder cancer, gastric cancer, nasopharyngeal carcinomas, ovarian cancer, pancreatic cancer, and prostate cancer330. EMT is the process by which epithelial cells lose their cell polarity and cell-cell adhesion and gain migratory and invasive properties330. The lower expression levels of the miR-200 family in sub-group 2 are compatible with the notion that EMT may be activated in sub-group 2 relative to sub-group 1. It may also indicate that MRT cases in sub-group 1 may be from tissues that are more epithelial-like, whereas MRT cases in sub-group 2 may be from tissues that are more mesenchymal-like. In agreement with this, we see abundant expression of APC, FGF7 and TWIST1 in sub-group 2. The observation that MRT samples have expression profiles that reflect different stages of EMT is in agreement with our mRNA analysis, where we also observed differential expression of EMT-inducing genes (i.e. ZEB1, TWIST2, CDH1, LMO4) between mRNA sub-groups (Chun, Lim, et al., manuscript under review).
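The enrichment of extra-renal cases in miRNA sub-group 1 (Figure 4.1B) can be checked by hand from the counts reported in this section. The Python sketch below is an illustration only and is not the analysis code used for this chapter. It assumes a 2x2 table of miRNA sub-group by anatomical site, with all 6 extra-renal cases among the 23 sub-group 1 cases and none among the 17 sub-group 2 cases, and it assumes a two-sided test, which the text does not specify. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that x of the col1 cases fall in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # The small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Rows: miRNA sub-group 1 (23 cases), miRNA sub-group 2 (17 cases).
# Columns: extra-renal, renal (assumed table layout).
p_value = fisher_exact_two_sided(6, 17, 0, 17)
print(f"p = {p_value:.4f}")
```

Under these assumptions the sketch gives a p-value of approximately 0.03, in line with the reported p-value <0.05.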
I also observed that liver-specific miRNAs, miR-122331,332 and miR-1269333, were the most significantly over-expressed miRNAs in sub-group 1, which included the two MRT cases that were found in the liver. Moreover, miR-202, which is poorly expressed in hepatocellular carcinoma334, was also under-expressed in sub-group 1, supporting the notion that miRNA expression can reflect tissue of origin of the malignancy. The caveat of this result is that it is limited by the analytic power offered by only 2 cases. To determine the potential impact of the differential miRNA expression on candidate target mRNA transcripts, I identified putative mRNA targets of the differentially expressed miRNAs between miRNA sub-groups 1 and 2. Candidate miRNA:mRNA interactions were determined by searching for miRNA:mRNA pairs with anti-correlated expression profiles, where the mRNA had a predicted binding site for the miRNA, and where the miRNA and mRNA were reciprocally differentially expressed between the 2 miRNA sub-groups. This analysis revealed 10 candidate mRNA targets that were more abundant in miRNA sub-group 2 (DESeq FDR <0.05). These were E2F7, CD248, ARSI, ARMC9, KRT12, GOLGA8B, ATP1B2, SLC6A9, RGMA, and RASL11B (Figure 4.1D). To the best of my knowledge, none of these genes have been implicated in MRT; however, RGMA and CD248 are deregulated in non-cancer diseases of the kidney335,336. Moreover, E2F7337 and CD248338 are promoters of angiogenesis, and RGMA339 is associated with poor prognosis in breast cancer. The results of this analysis suggest that these genes may have a role in MRT pathogenesis.

4.2.2 Pan-cancer/tissue miRNA analysis reveals tissue types that are similar to MRT

Next, I studied miRNA expression in MRT samples in the context of other cancer and tissue types, reasoning that such an analysis may identify tissue types that MRT are similar to, and thus would provide information about the cell of origin of the disease, which is as yet unknown.
I used an unsupervised clustering approach to broadly explore the relationship of MRT to other cancer types based on miRNA-seq expression profiles. Using the 535 mature miRNAs that were expressed in at least 1 of the 40 MRT cases (Table 4-1), I performed an unsupervised hierarchical clustering analysis to compare MRT miRNA profiles to those from 11,753 cases representing 36 cancer types and 26 normal tissue types (Methods). Each MRT case was represented individually, but each other cancer and tissue type was represented by median miRNA expression levels within that particular type. This analytical approach, which has been used in previous studies that compared expression profiles from multiple tissue types340, allowed me to specifically characterize individual MRT cases without the confounding effects of heterogeneity within the other cancer and tissue types. My analysis revealed that MRT cases segregated into three groups (Figure 4.3). To determine the robustness of the clustering solution, I ran pvclust, which returns Approximately Unbiased (AU) values. AU values range from 0 to 100, where an AU close to 100 indicates that the clustering solution is strongly supported by the clustered data. The pvclust analysis of my pan-cancer/tissue clustering solution returned a relatively robust median value of 86 (Methods, Figure 4.4). The group consisting of the majority of MRT cases (35/40) was characterized by a fairly robust AU (AU: 88), which was above the median. This group of MRT cases clustered closely with normal cerebellum samples (“MD_NORM”, “GBM_NORM”). These results suggest that despite being found in the kidney, these MRT cases may be cerebellum- or neural-like, which is consistent with the notion that neural crest-derived cells might be the cell of origin for MRT299-303. The second group, consisting of only three MRT cases, clustered closely with diffuse large B-cell lymphoma (DLBCL) tumours.
One of these cases (PARIRN) also exhibited a germline deletion in SMARCA4 (Chun, Lim, et al., manuscript under review), a gene recurrently mutated in B-cell lymphomas arising from the germinal center341. Another case in this group, PABKLN, exhibited high expression of several immunoglobulin genes as well as CD274/PD-L1 (Chun, Lim, et al., manuscript under review), a putative marker for effectiveness of anti-PDL1 therapy342. Finally, two kidney MRT cases clustered with Wilms’ tumour and uterine carcinosarcoma tumour samples originating from kidney and uterus respectively. Overall, the pan-cancer/tissue clustering analyses revealed that MRT cases might share cells of origin with normal cerebellum, DLBCL tumours, or tumours originating from the kidney and uterus. The caveat about this analysis is that it is limited to only 40 samples, where the DLBCL and kidney/uterus clustering groups are only represented by 3 and 2 cases respectively. Differential expression analyses with sample sizes as small as these are poorly powered, and thus a statistically significant result has a lower likelihood of reflecting a true effect. I next identified genes that were differentially expressed among these 3 miRNA groups. In general, differentially expressed genes reflected the tissue type with which each miRNA group was clustered. For example, the largest sub-group that clustered with normal cerebellum samples (N=35) showed over-expression of genes involved in neuron differentiation such as NNAT, ROBO2, SEMA3A, and STMN2 (Fisher’s exact test p-value = 4.84E-10). The second miRNA group (N=3) that clustered with DLBCL cases showed over-expression of genes involved in the immune response such as CCL19, CXCL12 and immunoglobulin genes (p-value = 9.38E-7).
The third group (N=2), which clustered with Wilms’ tumours and uterine carcinosarcoma samples, exhibited over-expression of genes involved in tissue and organ development such as WNT5A, SPRY2, and ETS2 (p-value = 2.05E-3 - 8.30E-6). The observation that mRNA expression patterns of the samples in each of these 3 miRNA-defined sub-groups also reflected the same tissue types that were identified in miRNA analysis reinforces the notion that gene expression analysis may reflect the biology of samples.

4.2.3 miRNA expression comparison with normal cerebellum identifies potentially perturbed miRNA-mediated gene regulation in MRT

The pan-cancer/tissue miRNA analysis indicated that the majority of MRT samples exhibited similarities to normal SMARCB1-intact cerebellum samples from both adults and fetuses, supporting the notion that the cerebellum may be a suitable normal tissue for comparison to MRT. Moreover, analysis of mRNA expression data indicated that there were fewer differentially expressed genes between MRT and fetal cerebellum than between MRT and adult cerebellum, suggesting that MRT samples were more similar to fetal cerebellum than adult cerebellum (Chun, Lim, et al., manuscript under review). Thus, I proceeded to identify miRNA transcripts that were differentially expressed between MRT and normal fetal cerebellum, revealing 33 over-expressed miRNAs and 97 under-expressed miRNAs in MRT compared to normal fetal cerebellum (DESeq FDR <0.05, log2 fold change >1; Appendix 4A). miR-372, the most significantly over-expressed miRNA in MRT, resides on an imprinted locus in chromosome 19q13343, and is associated with poor prognosis in multiple cancer types such as glioma344, hepatocellular carcinoma345, and colorectal cancer346. I note that miR-372 is located approximately 3.02Mb downstream of PEG3, an imprinted gene that also exhibited increased expression in mRNA sub-group 2 (Chun, Lim, et al., manuscript under review).
miR-219 was the most significantly under-expressed miRNA in MRT. miR-219 under-expression in medulloblastoma samples results in increased proliferation, invasion and migration347, and its under-expression in the cerebellums of Alzheimer’s disease patients correlates with neurodegeneration348. I also observed miR-200c over-expression and miR-9 under-expression in MRT, which are consistent with a previous study by Armeanu-Ebinger et al.200. In that study, they compared miRNA expression in MRT and SMARCB1-intact rhabdomyosarcoma, and found that there were 107 miRNAs (including miR-200c and miR-9) that were differentially expressed between the 2 cancer types. To identify candidate miRNA:mRNA interactions of dysregulated miRNAs in MRT, I performed an integrative miRNA and mRNA expression analysis. This analysis is identical to the analysis described in Section 3.4.9, with the exception that candidate interactions are now determined by considering only miRNA:mRNA pairs that are reciprocally differentially expressed between MRT and normal fetal cerebellum. This analysis revealed that over-expressed miRNAs in MRT compared to normal fetal cerebellum samples target mRNA transcripts involved in the MAPK signaling pathway (PRKCA, FGFR3, ARRB1, RASGRP2, PLA2G6, MAPK11, MAPK8IP1, PLA2G4B; Fisher’s exact test p-value = 0.028) and glycerolipid metabolism (AGPAT3, DGKD, MGLL; p-value = 0.09). Conversely, the candidate target genes of under-expressed miRNAs, which would be de-repressed in MRT, were involved in ribosome biogenesis (RPL7AP30, RPS18, RPL23, RPS16, RPL37, RPS27L, RPL37A, RPL23A, RPL22L1, RPS7; p-value = 0.0050) and the regulation of the cell cycle (CDC6, CDK1, TP53, SKP2, TTK, CHEK1, CDC20, CHEK2, PTTG1, CDK4, MCM3, MCM5, CCNB1, CCNE1, CDC45, CDKN2A, MAD2L1, HDAC1, BUB1, BUB1B, CCNA2, MYC; p-value = 4.63E-09).
These results suggest that miRNAs that are dysregulated in MRT may contribute to tumorigenesis by repressing the expression of genes involved in the MAPK signaling pathway and glycerolipid metabolism, and de-repressing the expression of genes involved in ribosome biogenesis and cell cycle checkpoints.

4.3 Discussion

MRT are lethal cancers that require improved treatment strategies, and extra-cranial MRT in particular appear understudied compared to AT/RT. To understand the molecular biology that underpins extra-cranial MRT, I set out to characterize miRNA expression and dysregulation using miRNA-seq and mRNA-seq data of 40 MRT samples. Although it is clear that the dominant driver alteration in rhabdoid tumours is SMARCB1 loss, previous efforts325,349 have identified sub-groups in AT/RT patient cohorts, indicating that the effects of SMARCB1 loss on transcriptional regulation are not uniform. For instance, Birks et al.349 found sub-groups that differed in the expression of genes in the BMP pathway, while Torchia et al.325 found 3 gene expression sub-groups that exhibited distinct clinical outcomes. Here, I identified 2 molecular sub-groups in MRT using miRNA expression analyses, where one group (N=23) consisted of all extra-renal samples, and the other group (N=17) was characterized by poor expression of the EMT-inhibiting miR-200 family. These findings suggest that there could be at least 2 MRT subtypes with differing mechanisms of pathogenesis. The miRNA expression patterns that characterize these MRT sub-groups may direct future studies aimed at designing targeted therapies.

miRNA expression is known to be tissue-specific, and has been used to classify cancers from various tissues221,350. Clustering of miRNA expression profiles of MRT samples with other tumour and normal cell types revealed that MRT clustered closely with normal cerebellum and B-cell lymphoma samples.
The largest group of MRT samples clustered most closely with normal cerebellum samples, suggesting that MRT may be relatively more similar to neuronal cell types despite its occurrence in kidneys. This observation is consistent with studies that observed expression of neurofilament proteins in MRT samples300,351 and MRT cell lines302. Moreover, another study determined that 25% of the genes that were differentially expressed between MRT and other diseases of the kidney were involved in neural development321. A comparison of miRNA expression in MRT and normal fetal cerebellum samples revealed that miR-372 and miR-200c are more abundant in MRT samples, while miR-219 and miR-9 are less abundant in MRT samples, suggesting that these 4 miRNAs may be dysregulated due to SMARCB1 inactivation. I presented here the first sequence-based miRNA expression landscape of MRT. In accordance with the goal of the thesis to stratify heterogeneous cancer patient populations by outcome and other clinical and molecular correlates, I described heterogeneity amongst extra-cranial MRT samples by revealing 2 sub-groups. These sub-groups differed in the expression of the EMT-related miR-200 family and liver-specific miRNAs. Although limited by small sample sizes, my pan-cancer/tissue analysis also revealed 3 groups of MRT cases and suggested that MRT may be similar to cerebellum cells, lending additional support to the notion that MRT may derive from neural-like cells.

4.4 Methods

4.4.1 Tissue samples and cell lines

40 primary treatment-naïve MRT samples (34 from kidneys, 4 from soft tissues and 2 from liver) were obtained from patients registered on the Children’s Oncology Group protocol. The complete sample names and other details are provided in Table 4-1.

4.4.2 RNA-seq data generation and processing

RNA-seq experiments were performed using polyA+ RNA enrichment and strand-specific sequencing protocols as reported by The Cancer Genome Atlas Research Network66.
Sequences were aligned to the reference genome (hg19) and to EnsEMBL gene models (version 69), including exon-exon junctions, using BWA and JAGuaR352.

4.4.3 miRNA-seq data generation and processing

miRNA-seq library construction, sequencing, read alignment, and miRNA expression profiling were performed as previously reported by the Cancer Genome Atlas Research Network66. miRNAs were considered expressed if they were represented by at least 10 reads per million mapped reads (RPM) in at least 1 of the 40 MRT miRNA-seq libraries.

4.4.4 Hierarchical clustering of miRNA expression profiles from MRT and other cancer and normal tissue types

I performed hierarchical clustering on 11,753 miRNA expression profiles from 36 other cancer types and 26 normal tissue types. Clustering was based on z-scores of log2-transformed RPM values using complete linkage and the Euclidean distance metric. The robustness of the clustering solution was then assessed by 100 iterations of clustering using the pvclust R package. These cancer types include 3 pediatric cancer types from the TARGET consortium (pAML N=109, ALL N=157, WT N=138), 1 cancer type from the MAGIC consortium (medulloblastoma N=1133), and 32 adult cancer types from the TCGA consortium (ACC N=80, BLCA N=418, BRCA N=796, CESC N=308, CHOL N=36, COAD N=452, DLBCL N=143 (inclusive of local data (Lim et al., 2015) and data from TCGA), ESCA N=184, HNSC N=522, KICH N=66, KIRC N=537, KIRP N=291, LGG N=511, LIHC N=372, LAML N=17, LUAD N=519, LUSC N=479, MESO N=87, OV N=487, PAAD N=178, PCPG N=179, PRAD N=498, READ N=161, SARC N=218, SKCM N=97, STAD N=487, TGCT N=150, THCA N=517, THYM N=124, UCEC N=540, UCS N=57, UVM N=80) 66,116,314,353-366.
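The expression filter and clustering set-up described above (at least 10 RPM in at least 1 library, z-scores of log2-transformed RPM values, Euclidean distance, complete linkage) can be illustrated with a small pure-Python sketch. This is a stand-in for illustration, not the R workflow used in this chapter. The pseudocount of 1 before the log2 transform, the use of the population standard deviation, the per-miRNA direction of the z-scores, the function names and the toy RPM values are all assumptions that are not stated in the text.

```python
import math
from statistics import mean, pstdev


def filter_expressed(rpm, min_rpm=10.0):
    """Keep miRNAs with at least min_rpm in at least one library."""
    return {mir: vals for mir, vals in rpm.items() if max(vals) >= min_rpm}


def log2_zscores(rpm, pseudocount=1.0):
    """Per-miRNA z-scores of log2(RPM + pseudocount) across libraries."""
    z = {}
    for mir, vals in rpm.items():
        logged = [math.log2(v + pseudocount) for v in vals]
        mu, sd = mean(logged), pstdev(logged)
        z[mir] = [(x - mu) / sd if sd > 0 else 0.0 for x in logged]
    return z


def euclidean(z, i, j):
    """Euclidean distance between libraries i and j over all miRNAs."""
    return math.sqrt(sum((vals[i] - vals[j]) ** 2 for vals in z.values()))


def complete_linkage(z, n_libraries, n_clusters):
    """Agglomerative clustering; cluster distance is the largest pairwise distance."""
    clusters = [[i] for i in range(n_libraries)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = max(euclidean(z, i, j)
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy RPM matrix with made-up values: 4 miRNAs x 4 libraries.
toy_rpm = {
    "miR-A": [500.0, 450.0, 20.0, 15.0],
    "miR-B": [12.0, 9.0, 300.0, 280.0],
    "miR-C": [3.0, 2.0, 4.0, 1.0],  # never reaches 10 RPM, so it is dropped
    "miR-D": [100.0, 120.0, 110.0, 90.0],
}
kept = filter_expressed(toy_rpm)
groups = complete_linkage(log2_zscores(kept), n_libraries=4, n_clusters=2)
print(sorted(kept), groups)
```

In this toy example the first two libraries and the last two libraries form separate clusters. An analysis at the scale described in this section would normally rely on an established clustering implementation rather than this quadratic-time loop.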
The normal tissue types include B-cell centroblasts (N=15), ES cells (N=3), the HEK293 cell line (N=1), cerebellum (“MD_NORM”; N=10), and normal tissues obtained from various TCGA projects (BLCA N=19, CESC N=3, CHOL N=9, COAD N=8, ESCA N=13, GBM N=5, HNSC N=44, KICH N=25, KIRC N=71, KIRP N=34, LIHC N=51, LUAD N=45, LUSC N=45, PAAD N=4, PCPG N=3, PRAD N=52, READ N=3, SKCM N=2, STAD N=46, THCA N=69, THYM N=2, UCEC N=33). Descriptions of all of these cancer and normal tissue type abbreviations are provided in Table 4-2. All miRNA-seq libraries were sequenced and profiled at the Genome Sciences Centre using the approaches used to generate the MRT libraries described above. For each cancer type, a median expression value for each miRNA across samples was determined. These median values, together with miRNA-seq profiles from the MRT data set, were then used for the clustering analysis. A similar approach using median expression levels for clustering has been used by the GTEx project340. I used the heatmap.2 function within the gplots R package (version 2.16.0) for visualization.

4.4.5 Differential expression analyses

Differentially expressed genes and miRNAs were identified using the DESeq package (version 1.14.0) in R (version 3.0.3)352 at FDR 5%. For miRNA analysis, I used miRNA-seq read count data for 3p and 5p miRNAs. For mRNA analysis, I removed genes (n=23,329 out of 58,450 EnsEMBL annotated genes (version 69)) that were expressed below a noise threshold of 1 RPKM in all 52 cases of MRT, cell line, hES and normal kidneys. Pathways that were enriched among differentially expressed genes were identified using DAVID298.

4.4.6 NMF clustering of MRT miRNA expression profiles

NMF clustering analyses were performed using miRNAs that were expressed at levels >10 RPM in at least 1 of the samples using the NMF package367 in R (NMF version 0.20.5; R version 3.0.2). I generated unsupervised consensus clustering results as described66.
I used the default Brunet algorithm and 100 iterations for the rank survey and performed clustering runs for k = 2-10. The random seed was set as 123456 for reproducibility of clustering results. A preferred cluster solution was selected by considering the profiles of the cophenetic scores of the consensus membership matrix for clustering solutions.

4.4.7 Identification of candidate miRNA:mRNA interactions through integrative miRNA:mRNA analysis

Identification of candidate miRNA:mRNA interactions was performed as previously described277. Briefly, a Spearman correlation coefficient (rho) score and a p-value were generated for comparisons of expression profiles between all possible miRNA and mRNA pairs. Then, miRNA:mRNA pairs were shortlisted based on the presence of target site predictions (from both TargetScan6.0155 and miRanda156 algorithms) and significant anti-correlation between miRNA and gene expression, with a statistical significance determined from comparing against bootstrapping-based null distributions. In addition, only miRNA:mRNA pairs that were also reciprocally differentially expressed between the groups in question (i.e. tumour vs normal; sub-group 1 vs sub-group 2) were considered candidate miRNA:mRNA interactions.

4.5 Figures and tables

Figure 4.1 Two sub-groups are revealed by NMF clustering of miRNA expression within the MRT samples
A) NMF consensus map indicating two consistent sub-groups after 100 clustering iterations. B) Rows 1-3: Clinical characteristics of MRT cases in each group; age at diagnosis, gender and tumour stage. I note that all six extra-renal MRTs from liver and soft tissue were in miRNA sub-group 1. Rows 4-6: NMF basis and consensus matrix membership and silhouette width (which indicate the consistency of the membership of each sample in the assigned cluster). C) Heat map depicts differentially expressed (DE) miRNAs between miRNA sub-group 1 and sub-group 2.
12 miRNAs were over-expressed in miRNA sub-group 1, and 30 miRNAs were over-expressed in sub-group 2 (DESeq log2 fold change >1, FDR <0.05). D) Heat map depicts expression of putative mRNA targets of miRNAs that are DE between the sub-groups. Putative mRNA targets harbor at least one miRNA binding site predicted by both miRanda and TargetScan programs and are reciprocally expressed and negatively correlated with expression levels of their corresponding targeting miRNA.

Figure 4.2 Metrics from NMF clustering of miRNA expression profiles
A) Cophenetic coefficients (which measure the stability of the clusters from multiple clustering iterations) and silhouette widths (which indicate the consistency of the membership of each sample in the assigned cluster) are shown from each NMF clustering run (with factorization ranks ranging from two to ten), based on miRNA-seq data. B) Consensus NMF clustering heat maps from each NMF miRNA clustering run are shown for each factorization rank k = 2-7. I chose to follow up on the k=2 solution as it had the highest cophenetic coefficient.

Figure 4.3 Three groups of MRT samples are revealed by clustering miRNA expression profiles of MRT with other tumour and normal tissue types
MRT samples are represented individually, while each non-MRT tumour and normal tissue type is represented by the median miRNA expression of all samples in that type. Hierarchical clustering was performed with log2 RPM values, but z-scores are plotted in order to accentuate miRNA-wise differences. Descriptions of all of these cancer and normal tissue type abbreviations are provided in Table 4-2. “TUM” at the end of each label indicates a tumour sample group, while “NORM” at the end of each label indicates a normal sample group. MRT cases are identified by “TARGET-52” labels. Pink shade: 35 of 40 MRT cases that clustered with normal cerebellum samples (“MD_NORM” and “GBM_NORM”). Green shade: MRT cases that clustered with DLBCL (“DLBC_TUM”).
Blue shade: MRT cases that clustered with Wilms’ tumour samples (“WT_TUM”) and uterine carcinosarcomas (“UCS_TUM”). The caveat about this analysis is that it is limited to only 40 samples, where the DLBCL and kidney/uterus clustering groups are only represented by 3 and 2 cases respectively.

Figure 4.4 Pvclust result of miRNA pan-can hierarchical clustering
Hierarchical clustering using Pvclust was performed over 100 clustering iterations to obtain Approximately Unbiased (AU) and Bootstrap Probability (BP) scores for each branch of the dendrogram. AU and BP scores, which are determined by multi-scale bootstrap resampling and normal bootstrap resampling respectively, range from 0-100 where AU and BP scores close to 100 indicate robust clustering solutions. The median AU for the clustering solution presented here is 86. The branch that includes the majority of the MRT samples has an AU value of 88. This result indicated that the clustering of MRT samples with normal cerebellum is relatively robust. Descriptions of all of these cancer and normal tissue type abbreviations are provided in Table 4-2. “TUM” at the end of each label indicates a tumour sample group, while “NORM” at the end of each title indicates a normal sample group. MRT cases are identified by “TARGET-52” labels.
Table 4-1 List of identifiers and clinical information for 40 MRT cases

Patient / TARGET USI | Age at Diagnosis in Years | Path Stage | Tumour content estimated by APOLLOH (%) | Anatomical site of tumour
TARGET-52-PABKLN | 1 | III | 88.4 | Kidney
TARGET-52-PADYZI | 3 | III | 90 | Kidney
TARGET-52-PAJLWM | 1 | III/IV | 92.25 | Kidney
TARGET-52-PAJMRB | 1 | III | 74.16 | Kidney
TARGET-52-PAJNER | <1 | II | 81.43 | Kidney
TARGET-52-PAJNFP | 1 | III | 90.32 | Kidney
TARGET-52-PAJNFZ | 2 | III/IV | 42.78 | Kidney
TARGET-52-PAKHTL | 3 | III | 86.96 | Kidney
TARGET-52-PARECB | <1 | III | 88.13 | Kidney
TARGET-52-PARGRN | <1 | III/IV | 93.61 | Kidney
TARGET-52-PARIRN | 4 | II/IV | 90.21 | Kidney
TARGET-52-PARPFY | 1 | III | 85.24 | Kidney
TARGET-52-PARRCL | 1 | II | 87.12 | Kidney
TARGET-52-PARUGK | 1 | III | 67.32 | Kidney
TARGET-52-PARZBI | 1 | III | 81.88 | Kidney
TARGET-52-PASABD | <1 | III | 88.31 | Kidney
TARGET-52-PASADZ | 1 | III | 89.1 | Kidney
TARGET-52-PASCDH | 1 | III/IV | 87.84 | Soft tissue
TARGET-52-PASDLA | 1 | II | 90.91 | Kidney
TARGET-52-PASRHU | 1 | III | 89.96 | Kidney
TARGET-52-PASVDP | <1 | III | 74.43 | Soft tissue
TARGET-52-PASWZZ | <1 | III | 91.84 | Kidney
TARGET-52-PASXNA | 1 | III/IV | 95.04 | Kidney
TARGET-52-PASYNF | 1 | III/IV | 87.81 | Kidney
TARGET-52-PASZYE | 1 | III/IV | 76.3 | Kidney
TARGET-52-PATAFT | 1 | III | 80.22 | Liver
TARGET-52-PATBLF | 2 | III | 89.46 | Soft tissue
TARGET-52-PATDVL | <1 | II | 89.09 | Kidney
TARGET-52-PATENH | 3 | III/IV | 81.78 | Kidney
TARGET-52-PATFXW | 2 | II | 89.47 | Kidney
TARGET-52-PATFZZ | 1 | III/IV | 89.14 | Kidney
TARGET-52-PATXEE | 2 | III/IV | 75.87 | Kidney
TARGET-52-PATXKA | <1 | III/IV | 86.98 | Liver
TARGET-52-PAUCGJ | 1 | II | 84.43 | Kidney
TARGET-52-PAUDPV | <1 | III/IV | 92.11 | Kidney
TARGET-52-PAUEKW | 15 | III | 89.95 | Soft tissue
TARGET-52-PAUFVP | 2 | I | 89.99 | Kidney
TARGET-52-PAUGYZ | 1 | II | 86.64 | Kidney
TARGET-52-PAUHAZ | 1 | III | 86.91 | Kidney
TARGET-52-PAUNPA | 3 | II | 84.93 | Kidney

Table 4-2 Abbreviations of the cancer and normal tissue types included in the pan-cancer miRNA clustering analysis

Disease/Tissue Abbreviation | Disease/Tissue
ALL | Acute Lymphoblastic Leukemia
BLCA | Bladder Urothelial Carcinoma
BRCA | Breast invasive carcinoma
CENTRO | Centroblasts (Benign B-cells)
CESC | Cervical squamous cell carcinoma and endocervical adenocarcinoma
CHOL | Cholangiocarcinoma
COAD | Colon adenocarcinoma
ES | Embryonic Stem Cells
ESCA | Esophageal carcinoma
GBM | Glioblastoma multiforme
HEK293 | Human Embryonic Kidney 293 Cell Line
HNSC | Head and Neck squamous cell carcinoma
KICH | Kidney Chromophobe
KIRC | Kidney renal clear cell carcinoma
KIRP | Kidney renal papillary cell carcinoma
LAML | Acute Myeloid Leukemia
LGG | Brain Lower Grade Glioma
LIHC | Liver hepatocellular carcinoma
LUAD | Lung adenocarcinoma
LUSC | Lung squamous cell carcinoma
MD | Pediatric Medulloblastoma
MESO | Mesothelioma
OV | Ovarian serous cystadenocarcinoma
PAAD | Pancreatic adenocarcinoma
pAML | Pediatric Acute Myeloid Leukemia
PCPG | Pheochromocytoma and Paraganglioma
PRAD | Prostate adenocarcinoma
READ | Rectum adenocarcinoma
RT | Malignant Rhabdoid Tumour
SARC | Sarcoma
SKCM | Skin Cutaneous Melanoma
STAD | Stomach adenocarcinoma
TGCT | Testicular Germ Cell Tumors
THCA | Thyroid carcinoma
THYM | Thymoma
UCEC | Uterine Corpus Endometrial Carcinoma
UCS | Uterine Carcinosarcoma
UVM | Uveal Melanoma
WT | Wilms' Tumour
ACC | Adrenocortical carcinoma
DLBC | Lymphoid Neoplasm Diffuse Large B-cell Lymphoma

5 Conclusions and Future Directions

miRNAs are important regulators of gene expression: not only do they play crucial roles in cell developmental fate decisions but they also buffer stochastic fluctuations in gene expression368. Given their essential roles in the cell, as demonstrated by Dicer knockout models369, and the complexity of the network of their interactions370, their dysregulation in cancer is frequent and their role in tumourigenesis is considerable. miRNA profiling efforts in the last decade have proven instrumental in improving our understanding of miRNA dysregulation in various cancers.
These efforts revealed miRNA expression patterns which are characteristic of specific cancer types148, defined sub-groups within cancer types66,116,314,353-359,361-364,366,371, and established roles for dysregulated miRNAs in the pathogenesis of cancers226,372-374.

The primary goal of this thesis was to utilize state-of-the-art miRNA expression profiling strategies to identify cancer subtypes and relate these to clinical covariates, reasoning that this would reveal relevant miRNA-based biomarkers and provide insight into the heterogeneous biologies that converge to form cancers. Thus, I provide the first comprehensive sequence-based characterizations of miRNA expression profiles of DLBCL (Chapter 2), pediatric AML (Chapter 3) and pediatric MRT (Chapter 4). My analysis revealed molecular sub-groups within cancer types that have distinct miRNA expression profiles. I also identified cellular functions and pathways that are altered by miRNA dysregulation and novel candidate miRNA prognostic markers. The translation of these findings into diagnostic and prognostic clinical tests could lead to improved clinical management for patients suffering from these and other cancers.

5.1 miRNA expression profiling reveals heterogeneity within cancer types

Cancer is a heterogeneous disease, and even amongst tumour samples of the same subtype, there exists a high degree of variation375. My unsupervised NMF clustering of miRNA expression profiles revealed patient sub-groups within each of the 3 studied cancer types, and thus provided frameworks for exploring how miRNA expression may explain biological and clinical differences amongst patients. In some instances, the derived sub-groups also provided a basis for the identification of therapeutic targets and biomarkers that may be useful in guiding decisions about therapy.
In DLBCL (Section 2.2.6.5), I identified 2 miRNA sub-groups, where the sub-group with inferior outcome was further characterized by abundant miR-148a expression and reduced miR-21 expression. Even though miR-21 and miR-148a have already been associated with patient survival in lymphoma212,376, this result suggests that their roles in treatment resistance should be investigated.

In pediatric AML (Section 3.2.5), I identified 2 miRNA sub-groups that correlated with known genomic rearrangements, some of which are established indicators of outcome in pediatric AML. This result prompts future studies into how these 2 molecular observations (miRNA expression and genomic rearrangements) may be related to one another, and suggests that specific miRNA-based therapies, such as miRNA mimics, anti-sense oligonucleotide miRNA inhibitors and mRNA decoys377, may benefit patients whose cancers harbor corresponding karyotypic abnormalities.

In pediatric MRT (Section 4.2.1), I observed 2 miRNA patient sub-groups that differed in the expression of the miR-200 family, suggesting that one sub-group may be from more epithelial-like tissues and the other from more mesenchymal-like tissues. Moreover, one sub-group contained all the liver samples and was characterized by abundant expression of liver-specific miRNAs. This reinforced the notion that miRNA expression is highly correlated with tissue type. I note here that the analysis of MRT is limited to 40 cases, and thus may not capture the full extent of heterogeneity in this disease. As such, this analysis would benefit from a larger cohort of MRT cases.

5.2 miRNAs as biomarkers and therapeutic targets

The ability to accurately predict response to therapy and survival is advantageous for patient treatment planning. Currently, several molecular-based biomarkers have been considered for clinical tests.
However, the high stability of miRNAs in tissues and fluids offers a key advantage over proteins and mRNAs as biomarkers that can be routinely used in the clinic. Moreover, a complete miRNA expression profile can be quantified from very low quantities of material (<1 μg) and in highly degraded samples, such as FFPET.

I was privileged to have access to large patient miRNA-seq data sets for DLBCL and pediatric AML, and thus was able to identify candidate miRNA prognostic markers in these diseases. In DLBCL (Section 2.2.6), I had access to a discovery cohort of 83 cases and a FFPET validation cohort of 112 cases; analyses of these data revealed that 6 miRNAs (miR-28-5p, miR-214-5p, miR-339-3p, miR-5586-5p, miR-324-5p and NOVELM00203M) were associated with OS and EFS independently of COO and IPI. In pediatric AML (Section 3.2.8), I had access to a discovery cohort of 259 cases and a validation cohort of 378 cases; analyses of these data revealed that 6 miRNAs (miR-106a-5p, miR-106a-3p, miR-20b-3p, miR-363-3p, miR-378c and miR-181c-3p) were significantly associated with OS and EFS.

The miRNAs that I discovered to be associated with survival are candidates for prognostic and diagnostic tests that could direct treatment plans. I have reported their associations with survival in a univariate manner, but a combination of their expression levels into a prognostic signature may well provide a more robust estimation of risk. As well, these findings are based on miRNA-seq technology, which currently is not the technology of choice for routine clinical usage. As such, future studies could investigate the use of these biomarkers on probe-based platforms such as qPCR or NanoString.

I also note that unlike the other tumour samples in this thesis, which were derived from fresh-frozen tissue samples, the DLBCL validation cohort was derived from FFPET. I utilized FFPET samples as these were available to me for validation and fresh frozen samples were not.
Studies have compared miRNA expression profiles obtained from FFPET and fresh frozen samples, and have shown that differences between profiles exist152-154. However, despite the differences between my fresh frozen discovery and FFPET validation cohorts, I replicated the robust association of 6 miRNAs with outcome, suggesting that the hundreds of archival FFPET samples from across the world may prove to be a useful source for miRNA expression profiles for retrospective research studies.

5.3 Identification of novel miRNA species and variation

A distinct advantage of miRNA-seq over probe-based methods is the capability of identifying novel miRNA species. This is of particular utility when attempting to search for novel miRNA species that might arise in malignancy. As a demonstration of this, I performed novel miRNA discovery in DLBCL (Sections 2.2.2 and 2.4.5), and identified 30 candidate miRNAs that were expressed across my data set. These miRNAs were submitted to the miRBase repository, and those that satisfied its criteria were given official miRNA names. One of these miRNAs was miR-10393-3p, which was more abundantly expressed in DLBCL tumour samples than in benign centroblasts, suggesting that it could have arisen in lymphomagenesis. This analysis demonstrated that it is possible for miRNA species to be preferentially expressed in malignancy, and thus motivates a thorough pan-cancer novel miRNA discovery exercise across miRNA-seq data sets, to identify novel miRNAs that may be over-represented in particular cancer types.

Two other advantages of miRNA-seq data are the capability of distinguishing different miRNA isoforms (isomiRs215) from one another, and the identification of SNVs within miRNA sequences. These 2 phenomena were not studied in this thesis, but present avenues for future investigations using these data sets.
In addition, there are recent reports concerning recurrent somatic mutations in miRNA processing genes in ovarian cancer122 and Wilms' tumours378, and how these mutations result in preferential expression of 3p miRNA strands123. It would be interesting to observe (through novel miRNA discovery) whether there exist any previously unannotated 3p miRNAs that arise as a result of these mutations.

5.4 Integrative miRNA:mRNA analysis reveals miRNA-mediated repression interactions

The large numbers of dysregulated miRNAs in my analyses prompted me to investigate what mRNA targets may be impacted by aberrant miRNA expression. This analysis proved to be useful in identifying functional miRNA:mRNA interactions and in informing on the gene functions and pathways that may be perturbed as a result of miRNA dysregulation.

In DLBCL (Section 2.2.5), I found that miRNAs that are abundantly expressed in DLBCL tumours, when compared with benign centroblasts, appeared to be targeting genes involved in chromatin modification. In particular, I demonstrated that the novel miRNA that was identified in this study (miR-10393-3p) could repress MLL2 and EP300. As genes involved in chromatin modification are frequently mutated in NHL205, this analysis was compatible with the notion that miRNAs may provide an alternative mechanism for dysregulating chromatin modifiers in NHL.

In pediatric AML (Sections 3.2.7 and 3.2.9), I found that miRNAs that were abundantly expressed in treatment resistant contexts (relapse and refractory samples) appeared to target genes involved in oxidative phosphorylation. In particular, my analysis revealed that miR-106a-5p, which was also significantly associated with OS and EFS, appeared to target 5 members of the electron transport chain: ATP5J2-PTCD1, ATP5S, NDUFA10, NDUFC2, and UQCRB. Cytarabine and anthracycline are chemotherapy agents that specifically target rapidly dividing and proliferating cells.
As such, quiescent cells with lowered levels of oxidative phosphorylation206 tend to be chemotherapy resistant. In that regard, my results are compatible with the notion that miRNAs may be contributing to treatment resistance in patients treated with cytarabine and anthracycline-based induction chemotherapy by reducing the expression of genes involved in oxidative phosphorylation. Moving forward, this hypothesis can be investigated using metabolic assays, such as those measuring glucose consumption and ATP production.

While my integrative analysis was useful in identifying functional miRNA:mRNA interactions which could be validated by luciferase assays, I note that my analysis is relatively straightforward, as it only considers each miRNA:mRNA interaction in isolation. In particular, a major caveat of my analysis is that it only captures miRNA:mRNA interactions where the mRNA is destabilized or degraded and does not take into account instances where the mRNA targets remain intact after translational repression. In addition, a more comprehensive analysis taking into account an interaction network of miRNA:mRNA interactions, such as those proposed by Sumazin et al.379, may provide more accurate predictions. As well, there are other aspects of miRNA:mRNA interactions that have not been considered in this thesis but could be used towards an enhanced understanding of miRNA:mRNA networks. These include competitive sponge effects from other RNA molecules380 and mRNA thresholds for miRNA interactions381.

5.5 Pan-cancer miRNA analysis associates samples of the same tissue type

Pediatric MRT is a cancer of unknown cell of origin. In my analysis of miRNA expression in MRT, I set out to investigate a putative cell of origin.
My unsupervised hierarchical clustering analysis of the MRT samples, along with medians of miRNA expression from 11,753 cases representing 36 cancer types and 26 normal tissue types, revealed that MRT samples are most similar to samples from the cerebellum and DLBCL. These findings were also supported by mRNA expression profiles of these samples. In addition, the finding that MRT was similar to neural-type cells was consistent with the literature299-303. Altogether, these results not only provided insight into each of the MRT cases, but also reinforced the notion that miRNA expression profiles may reflect tissue type. Moreover, they suggest that miRNA expression profiles might be routinely used in the diagnosis of cancers of unknown primary origin.

5.6 Conclusion

Overall, the research presented in this thesis constitutes a step forward in our understanding of miRNA dysregulation in DLBCL, pediatric AML, and pediatric MRT. I attained my goals of identifying cancer subtypes and relating these to clinical covariates, and studying miRNA-mediated regulation in each of these cancers. My findings not only detail cancer sub-groups that have distinct miRNA expression profiles, but also include putative functional miRNA:mRNA interactions. Moreover, I demonstrated the utility of miRNA-seq in the identification of candidate novel miRNA species, and in the identification of putative tissues of origin of cancer samples. Importantly, I identified miRNAs that are significantly associated with patient outcome and treatment resistance, which could be used in the design of clinical diagnostic and prognostic tests.

Bibliography

1. Bedard, P. L., Hansen, A. R., Ratain, M. J. & Siu, L. L. Tumour heterogeneity in the clinic. Nature 501, 355–364 (2013). 2. Croce, C. M. Causes and consequences of microRNA dysregulation in cancer. Nat Rev Genet 10, 704–714 (2009). 3. Caldas, C. & Brenton, J. D. Sizing up miRNAs as cancer genes. Nat Med 11, 712–714 (2005). 4.
Farrell, A., Hutchinson, E., Marte, B. & McCarthy, N. Nature Milestones - Cancer. Nature (2006). doi:10.1038/nrc1844 5. Balmain, A. Cancer genetics: from Boveri and Mendel to microarrays. Nature Rev. Cancer 1, 77–82 (2001). 6. Muller, H. J. The Production of Mutations by X-Rays. Proc Natl Acad Sci USA 14, 714–726 (1928). 7. Altenburg, E. The Artificial Production of Mutations by Ultra-Violet Light. The American Naturalist 68, 491–507 (1934). 8. van Gent, D. C., Hoeijmakers, J. H. & Kanaar, R. Chromosomal stability and the DNA double-stranded break connection. Nat Rev Genet 2, 196–206 (2001). 9. Weeda, G. et al. A presumed DNA helicase encoded by ERCC-3 is involved in the human repair disorders xeroderma pigmentosum and Cockayne's syndrome. Cell 62, 777–791 (1990). 10. Tanaka, K. et al. Analysis of a human DNA excision repair gene involved in group A xeroderma pigmentosum and containing a zinc-finger domain. Nature 348, 73–76 (1990). 11. Nowell, P. & Hungerford, D. A minute chromosome in human chronic granulocytic leukemia. Science 132, 1488–1501 (1960). 12. Rowley, J. D. Letter: A new consistent chromosomal abnormality in chronic myelogenous leukaemia identified by quinacrine fluorescence and Giemsa staining. Nature 243, 290–293 (1973). 13. de Klein, A. et al. A cellular oncogene is translocated to the Philadelphia chromosome in chronic myelocytic leukaemia. Nature 300, 765–767 (1982). 14. Sattler, M. & Griffin, J. D. Mechanisms of transformation by the BCR/ABL oncogene. Int. J. Hematol. 73, 278–291 (2001). 15. Croce, C. M. Oncogenes and cancer. N Engl J Med 358, 502–511 (2008). 16. Martin, G. S. Rous sarcoma virus: a function required for the maintenance of the transformed state. Nature 227, 1021–1023 (1970). 17. Stehelin, D., Varmus, H. E., Bishop, J. M. & Vogt, P. K. DNA related to the transforming gene(s) of avian sarcoma viruses is present in normal avian DNA. Nature 260, 170–173 (1976). 18. Spector, D. H., Varmus, H. E. & Bishop, J. M. 
Nucleotide sequences related to the transforming gene of avian sarcoma virus are present in DNA of uninfected vertebrates. Proc Natl Acad Sci USA 75, 4102–4106 (1978). 19. Oppermann, H., Levinson, A. D., Varmus, H. E., Levintow, L. & Bishop, J. M. Uninfected vertebrate cells contain a protein that is closely related to the product of the avian sarcoma virus transforming gene (src). Proc Natl Acad Sci USA 76, 1804–1808 (1979). 20. Collett, M. S., Erikson, E., Purchio, A. F., Brugge, J. S. & Erikson, R. L. A normal cell protein similar in structure and function to the avian sarcoma virus transforming gene product. Proc Natl Acad Sci USA 76, 3159–3163 (1979). 21. Shih, C. & Weinberg, R. A. Isolation of a transforming sequence from a human bladder carcinoma cell line. Cell 29, 161–169 (1982). 22. Goldfarb, M., Shimizu, K., Perucho, M. & Wigler, M. Isolation and preliminary characterization of a human transforming gene from T24 bladder carcinoma cells. Nature 296, 404–409 (1982). 23. Santos, E., Tronick, S. R., Aaronson, S. A., Pulciani, S. & Barbacid, M. T24 human bladder carcinoma oncogene is an activated form of the normal human homologue of BALB- and Harvey-MSV transforming genes. Nature 298, 343–347 (1982). 24. Reddy, E. P., Reynolds, R. K., Santos, E. & Barbacid, M. A point mutation is responsible for the acquisition of transforming properties by the T24 human bladder carcinoma oncogene. Nature 300, 149–152 (1982). 25. Malumbres, M. & Barbacid, M. RAS oncogenes: the first 30 years. Nature Reviews Cancer (2003). 26. Sherr, C. J. Principles of tumor suppression. Cell (2004). 27. Knudson, A. G. Mutation and cancer: statistical study of retinoblastoma. in (1971). 28. Huang, H. J., Yee, J. K., Shew, J. Y., Chen, P. L. & Bookstein, R. Suppression of the neoplastic phenotype by replacement of the RB gene in human cancer cells. Science (1988). 29. Baker, S. J., Markowitz, S., Fearon, E. R. & Willson, J. K.
Suppression of human colorectal carcinoma cell growth by wild-type p53. Science (1990). 30. Bieging, K. T., Mello, S. S. & Attardi, L. D. Unravelling mechanisms of p53-mediated tumour suppression. Nature Reviews Cancer (2014). 31. van Oijen, M. G. & Slootweg, P. J. Gain-of-function mutations in the tumor suppressor gene p53. Clin Cancer Res 6, 2138–2145 (2000). 32. Hock, H. A complex Polycomb issue: the two faces of EZH2 in cancer. Genes Dev 26, 751–755 (2012). 33. Hanahan, D. & Weinberg, R. A. The hallmarks of cancer. Cell 100, 57–70 (2000). 34. Weinberg, R. A. Hallmarks of cancer: the next generation. Cell 144, 646–674 (2011). 35. Forbes, S. A., Beare, D. & Gunasekaran, P. COSMIC: exploring the world's knowledge of somatic mutations in human cancer. Nucleic acids … (2015). 36. Vogelstein, B. et al. Cancer genome landscapes. Science 339, 1546–1558 (2013). 37. Fearon, E. R. & Vogelstein, B. A genetic model for colorectal tumorigenesis. Cell (1990). 38. Gajjar, A. J. & Robinson, G. W. Medulloblastoma-translating discoveries from the bench to the bedside. Nat Rev Clin Oncol 11, 714–722 (2014). 39. Tallman, M. S. & Altman, J. K. Curative strategies in acute promyelocytic leukemia. Hematology Am Soc Hematol Educ Program 391–399 (2008). doi:10.1182/asheducation-2008.1.391 40. Adès, L. et al. Very long-term outcome of acute promyelocytic leukemia after treatment with all-trans retinoic acid and chemotherapy: the European APL Group experience. Blood 115, 1690–1696 (2010). 41. Rukov, J. L. & Shomron, N. MicroRNA pharmacogenomics: post-transcriptional regulation of drug response. Trends Mol Med 17, 412–423 (2011). 42. Höglund, M., Gisselsson, D., Säll, T. & Mitelman, F. Coping with complexity. multivariate analysis of tumor karyotypes. Cancer Genet. Cytogenet. 135, 103–109 (2002). 43. Shibata, D., Schaeffer, J., Li, Z. H., Capella, G. & Perucho, M. Genetic heterogeneity of the c-K-ras locus in colorectal adenomas but not in adenocarcinomas. J. Natl. Cancer Inst.
85, 1058–1063 (1993). 44. Bashashati, A. et al. Distinct evolutionary trajectories of primary high-grade serous ovarian cancers revealed through spatial mutational profiling. J Pathol 231, 21–34 (2013). 45. Ding, L. et al. Clonal evolution in relapsed acute myeloid leukaemia revealed by whole-genome sequencing. Nature (2012). doi:10.1038/nature10738 46. Shah, S. P. et al. Mutational evolution in a lobular breast tumour profiled at single nucleotide resolution. Nature 461, 809–813 (2009). 47. Ellis, L., Atadja, P. W. & Johnstone, R. W. Epigenetics in cancer: targeting chromatin modifications. Mol. Cancer Ther. 8, 1409–1420 (2009). 48. Rhodes, D. R. & Chinnaiyan, A. M. Integrative analysis of the cancer transcriptome. Nat Genet 37 Suppl, S31–7 (2005). 49. Golub, T. R. et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286, 531–537 (1999). 50. Alizadeh, A. A. et al. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403, 503–511 (2000). 51. van 't Veer, L. J. et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature 415, 530–536 (2002). 52. Zou, T.-T. et al. Application of cDNA microarrays to generate a molecular taxonomy capable of distinguishing between colon cancer and normal colon. Oncogene 21, 4855–4862 (2002). 53. Sotiriou, C. et al. Breast cancer classification and prognosis based on gene expression profiles from a population-based study. Proc Natl Acad Sci USA 100, 10393–10398 (2003). 54. Bittner, M. et al. Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature 406, 536–540 (2000). 55. Garber, M. E. et al. Diversity of gene expression in adenocarcinoma of the lung. Proc Natl Acad Sci USA 98, 13784–13789 (2001). 56. Best, C. J. M. et al. Molecular differentiation of high- and moderate-grade human prostate cancer by cDNA microarray analysis. Diagn. Mol. Pathol. 12, 63–70 (2003). 57. 
van de Vijver, M. J. et al. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med 347, 1999–2009 (2002). 58. Chang, J. C. et al. Gene expression profiling for the prediction of therapeutic response to docetaxel in patients with breast cancer. Lancet 362, 362–369 (2003). 59. Ramaswamy, S., Ross, K. N., Lander, E. S. & Golub, T. R. A molecular signature of metastasis in primary solid tumors. Nat Genet 33, 49–54 (2003). 60. Lossos, I. S. et al. Transformation of follicular lymphoma to diffuse large-cell lymphoma: alternative patterns with increased or decreased expression of c-myc and its regulated genes. Proc Natl Acad Sci USA 99, 8886–8891 (2002). 61. Ley, T. J. et al. DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome. Nature 456, 66–72 (2008). 62. Mardis, E. R. et al. Recurring mutations found by sequencing an acute myeloid leukemia genome. N Engl J Med 361, 1058–1066 (2009). 63. Morozova, O., Hirst, M. & Marra, M. A. Applications of new sequencing technologies for transcriptome analysis. Annu Rev Genomics Hum Genet 10, 135–151 (2009). 64. The Cancer Genome Atlas Research Network et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet 45, 1113–1120 (2013). 65. International Cancer Genome Consortium et al. International network of cancer genome projects. Nature 464, 993–998 (2010). 66. The Cancer Genome Atlas Research Network. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. N Engl J Med 368, 2059–2074 (2013). 67. Lee, R. C., Feinbaum, R. L. & Ambros, V. The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell 75, 843–854 (1993). 68. Wightman, B., Ha, I. & Ruvkun, G. Posttranscriptional regulation of the heterochronic gene lin-14 by lin-4 mediates temporal pattern formation in C. elegans. Cell 75, 855–862 (1993). 69. Reinhart, B. J. et al.
The 21-nucleotide let-7 RNA regulates developmental timing in Caenorhabditis elegans. Nature 403, 901–906 (2000). 70. Lee, R. C. & Ambros, V. An extensive class of small RNAs in Caenorhabditis elegans. Science 294, 862–864 (2001). 71. Guo, H., Ingolia, N. T., Weissman, J. S. & Bartel, D. P. Mammalian microRNAs predominantly act to decrease target mRNA levels. Nature 466, 835–840 (2010). 72. Berezikov, E. Evolution of microRNA diversity and regulation in animals. Nat Rev Genet 12, 846–860 (2011). 73. Westholm, J. O. & Lai, E. C. Mirtrons: microRNA biogenesis via splicing. Biochimie 93, 1897–1904 (2011). 74. Ji, M. et al. The miR-17-92 microRNA cluster is regulated by multiple mechanisms in B-cell malignancies. Am. J. Pathol. 179, 1645–1656 (2011). 75. Schnall-Levin, M., Rissland, O. S., Johnston, W. K., Bartel, D. P. & Berger, B. Unusually effective microRNA targeting within repeat-rich coding regions of mammalian mRNAs. Genome Res 21, 1395–1403 (2011). 76. Duursma, A. M., Kedde, M., Schrier, M., le Sage, C. & Agami, R. miR-148 targets human DNMT3b protein coding region. RNA 14, 872–877 (2008). 77. Lytle, J. R., Yario, T. A. & Steitz, J. A. Target mRNAs are repressed as efficiently by microRNA-binding sites in the 5' UTR as in the 3' UTR. Proc Natl Acad Sci USA 104, 9667–9672 (2007). 78. Petersen, C. P., Bordeleau, M.-E., Pelletier, J. & Sharp, P. A. Short RNAs repress translation after initiation in mammalian cells. Mol Cell 21, 533–542 (2006). 79. Creighton, C. J., Reid, J. G. & Gunaratne, P. H. Expression profiling of microRNAs by deep sequencing. Brief. Bioinformatics 10, 490–497 (2009). 80. Hsu, S.-D. et al. miRTarBase: a database curates experimentally validated microRNA-target interactions. Nucleic Acids Res 39, D163–9 (2011). 81. Calin, G. A. et al. Frequent deletions and down-regulation of micro-RNA genes miR15 and miR16 at 13q14 in chronic lymphocytic leukemia. Proc Natl Acad Sci USA 99, 15524–15529 (2002). 82. Farazi, T. A., Spitzer, J.
I., Morozov, P. & Tuschl, T. miRNAs in human cancer. J Pathol 223, 102–115 (2011). 83. Zhang, J. et al. Patterns of microRNA expression characterize stages of human B-cell differentiation. Blood 113, 4586–4594 (2009). 84. Malumbres, R. et al. Differentiation stage-specific expression of microRNAs in B lymphocytes and diffuse large B-cell lymphomas. Blood 113, 3754–3764 (2009). 85. Basso, K. et al. Identification of the human mature B cell miRNome. Immunity 30, 744–752 (2009). 86. Craig, V. J. et al. Myc-mediated repression of microRNA-34a promotes high-grade transformation of B-cell lymphoma by dysregulation of FoxP1. Blood 117, 6227–6236 (2011). 87. Li, C. et al. Copy number abnormalities, MYC activity, and the genetic fingerprint of normal B cells mechanistically define the microRNA profile of diffuse large B-cell lymphoma. Blood 113, 6681–6690 (2009). 88. Klein, U. et al. The DLEU2/miR-15a/16-1 cluster controls B cell proliferation and its deletion leads to chronic lymphocytic leukemia. Cancer Cell 17, 28–40 (2010). 89. Deshpande, A. et al. 3'UTR mediated regulation of the cyclin D1 proto-oncogene. Cell Cycle 8, 3584–3592 (2009). 90. Zhao, J.-J. et al. microRNA expression profile and identification of miR-29 as a prognostic marker and pathogenetic factor by targeting CDK6 in mantle cell lymphoma. Blood 115, 2630–2639 (2010). 91. Bueno, M. J. et al. Combinatorial effects of microRNAs to suppress the Myc oncogenic pathway. Blood 117, 6255–6266 (2011). 92. Xiao, C. et al. Lymphoproliferative disease and autoimmunity in mice with increased miR-17-92 expression in lymphocytes. Nat Immunol 9, 405–414 (2008). 93. Rao, E. et al. The miRNA-1792 cluster mediates chemoresistance and enhances tumor growth in mantle cell lymphoma via PI3K/AKT pathway activation. Leukemia 26, 1064–1072 (2012). 94. Marcinkowska, M., Szymanski, M., Krzyzosiak, W. J. & Kozlowski, P. Copy number variation of microRNA genes in the human genome. BMC Genomics 12, 183 (2011). 95. Ramsingh, G. et al. 
Acquired copy number alterations of miRNA genes in acute myeloid leukemia are uncommon. Blood 122, e44–51 (2013). 96. Schneider, B. et al. T(3;7)(q27;q32) fuses BCL6 to a non-coding region at FRA7H near miR-29. Leukemia 22, 1262–1266 (2008). 97. Ruiz-Ballesteros, E. et al. MicroRNA losses in the frequently deleted region of 7q in SMZL. Leukemia 21, 2547–2549 (2007). 98. Schneider, B. et al. Neoplastic MiR-17~92 deregulation at a DNA fragility motif (SIDD). Genes Chromosomes Cancer 51, 219–228 (2012). 99. Leich, E. et al. MicroRNA profiles of t(14;18)-negative follicular lymphoma support a late germinal center B-cell phenotype. Blood 118, 5550–5558 (2011). 100. Huppi, K. et al. The identification of microRNAs in a genomically unstable region of human chromosome 8q24. Mol. Cancer Res. 6, 212–221 (2008). 101. Wahlström, T. & Henriksson, M. Mnt takes control as key regulator of the myc/max/mxd network. Adv. Cancer Res. 97, 61–80 (2007). 102. Chang, T.-C. et al. Widespread microRNA repression by Myc contributes to tumorigenesis. Nat Genet 40, 43–50 (2008). 103. Robertus, J.-L. et al. MiRNA profiling in B non-Hodgkin lymphoma: a MYC-related miRNA profile characterizes Burkitt lymphoma. Br. J. Haematol. 149, 896–899 (2010). 104. O'Donnell, K. A., Wentzel, E. A., Zeller, K. I., Dang, C. V. & Mendell, J. T. c-Myc-regulated microRNAs modulate E2F1 expression. Nature 435, 839–843 (2005). 105. Magnani, L., Eeckhoute, J. & Lupien, M. Pioneer factors: directing transcriptional regulators within the chromatin environment. Trends Genet 27, 465–474 (2011). 106. Ducasse, M. & Brown, M. A. Epigenetic aberrations and cancer. Mol Cancer 5, 60 (2006). 107. Herman, J. G. & Baylin, S. B. Gene silencing in cancer in association with promoter hypermethylation. N Engl J Med 349, 2042–2054 (2003). 108. Shaknovich, R. et al. DNA methylation signatures define molecular subtypes of diffuse large B-cell lymphoma. Blood 116, e81–9 (2010). 109. Wong, K. Y. et al.
Epigenetic inactivation of the miR-124-1 in haematological malignancies. PLoS ONE 6, e19027 (2011). 110. Chim, C. S. et al. Epigenetic inactivation of the hsa-miR-203 in haematological malignancies. J. Cell. Mol. Med. 15, 2760–2767 (2011). 111. Onnis, A. et al. Alteration of microRNAs regulated by c-Myc in Burkitt lymphoma. PLoS ONE 5, (2010). 112. Zhang, X. et al. Myc represses miR-15a/miR-16-1 expression through recruitment of HDAC3 in mantle cell and other non-Hodgkin B-cell lymphomas. Oncogene 31, 3002–3008 (2012). 113. Aqeilan, R. I., Calin, G. A. & Croce, C. M. miR-15a and miR-16-1 in cancer: discovery, function and future perspectives. Cell Death Differ. 17, 215–220 (2010). 114. Kluiver, J. et al. Regulation of pri-microRNA BIC transcription and processing in Burkitt lymphoma. Oncogene 26, 3769–3776 (2007). 115. Campo, E. et al. The 2008 WHO classification of lymphoid neoplasms and beyond: evolving concepts and practical applications. Blood 117, 5019–5032 (2011). 116. Network, C. G. A. R. Comprehensive molecular characterization of gastric adenocarcinoma. Nature 513, 202–209 (2014). 117. Barth, S., Meister, G. & Grässer, F. A. EBV-encoded miRNAs. Biochim Biophys Acta 1809, 631–640 (2011). 118. Xia, T. et al. EBV microRNAs in primary lymphomas and targeting of CXCL-11 by ebv-mir-BHRF1-3. Cancer Res 68, 1436–1442 (2008). 119. Martín-Pérez, D. et al. Epstein-Barr virus microRNAs repress BCL6 expression in diffuse large B-cell lymphoma. Leukemia 26, 180–183 (2012). 120. Imig, J. et al. microRNA profiling in Epstein-Barr virus-associated B-cell lymphoma. Nucleic Acids Res 39, 1880–1893 (2011). 121. Thomson, J. M. et al. Extensive post-transcriptional regulation of microRNAs and its implications for cancer. Genes Dev 20, 2202–2207 (2006). 122. Heravi-Moussavi, A. et al. Recurrent somatic DICER1 mutations in nonepithelial ovarian cancers. N Engl J Med 366, 234–242 (2012). 123. Anglesio, M. S. et al. 
Cancer-associated somatic DICER1 hotspot mutations cause defective miRNA processing and reverse-strand expression bias to predominantly mature 3p strands through loss of 5p strand cleavage. J Pathol 229, 400–409 (2013). 124. Chen, J. et al. Recurrent DICER1 hotspot mutations in endometrial tumours and their impact on microRNA biogenesis. J Pathol (2015). doi:10.1002/path.4569 125. Kim, D. N. & Lee, S. K. Biogenesis of Epstein-Barr virus microRNAs. Mol. Cell. Biochem. 365, 203–210 (2012). 126. Arrate, M. P. et al. MicroRNA biogenesis is required for Myc-induced B-cell lymphoma development and survival. Cancer Res 70, 6083–6092 (2010). 127. Chen, R. W. et al. Truncation in CCND1 mRNA alters miR-16-1 regulation in mantle cell lymphoma. Blood 112, 822–829 (2008). 128. Sandberg, R., Neilson, J. R., Sarma, A., Sharp, P. A. & Burge, C. B. Proliferating Cells Express mRNAs with Shortened 3' Untranslated Regions and Fewer MicroRNA Target Sites. Science 320, 1643–1647 (2008). 129. Mayr, C., Hemann, M. T. & Bartel, D. P. Disrupting the pairing between let-7 and Hmga2 enhances oncogenic transformation. Science 315, 1576–1579 (2007). 130. Ma, R., Jiang, T. & Kang, X. Circulating microRNAs in cancer: origin, function and application. J. Exp. Clin. Cancer Res. 31, 38 (2012). 131. Lawrie, C. H. et al. Detection of elevated levels of tumour-associated microRNAs in serum of patients with diffuse large B-cell lymphoma. Br. J. Haematol. 141, 672–675 (2008). 132. Wang, Y.-H. & Xu, W. Serum microRNAs are promising novel biomarkers for diffuse large B cell lymphoma. Ann. Hematol. 91, 553–559 (2012). 133. Ohyashiki, K. et al. Clinical impact of down-regulated plasma miR-92a levels in non-Hodgkin's lymphoma. PLoS ONE 6, e16408 (2011). 134. Schwarzenbach, H., Nishida, N., Calin, G. A. & Pantel, K. Clinical relevance of circulating cell-free microRNAs in cancer. Nat Rev Clin Oncol 11, 145–156 (2014). 135. Lindow, M. & Kauppinen, S. Discovering the first microRNA-targeted drug. J.
Cell Biol. 199, 407–412 (2012). 136. Garzon, R., Marcucci, G. & Croce, C. M. Targeting microRNAs in cancer: rationale, strategies and challenges. Nat Rev Drug Discov 9, 775–789 (2010). 137. Iorio, M. V. & Croce, C. M. MicroRNA dysregulation in cancer: diagnostics, monitoring and therapeutics. A comprehensive review. EMBO Mol Med 4, 143–159 (2012). 138. Zhang, Y. et al. LNA-mediated anti-miR-155 silencing in low-grade B-cell lymphomas. Blood 120, 1678–1686 (2012). 139. Zhang, X. et al. Coordinated Silencing of MYC-Mediated miR-29 by HDAC3 and EZH2 as a Therapeutic Target of Histone Modification in Aggressive B-Cell Lymphomas. Cancer Cell 22, 506–523 (2012). 140. Lacayo, N. J. et al. Development and validation of a single-cell network profiling assay-based classifier to predict response to induction therapy in paediatric patients with de novo acute myeloid leukaemia: a report from the Children's Oncology Group. Br. J. Haematol. (2013). doi:10.1111/bjh.12370 141. Pui, C.-H., Carroll, W. L., Meshinchi, S. & Arceci, R. J. Biology, risk stratification, and therapy of pediatric acute leukemias: an update. J Clin Oncol 29, 551–565 (2011). 142. MicroRNA-21 regulates the sensitivity of diffuse large B-cell lymphoma cells to the CHOP chemotherapy regimen. Int. J. Hematol. 97, 223–231 (2013). 143. Wu, Y. et al. MicroRNA-148b enhances the radiosensitivity of non-Hodgkin's Lymphoma cells by promoting radiation-induced apoptosis. J. Radiat. Res. 53, 516–525 (2012). 144. Shibayama, Y. et al. Upregulation of microRNA-126-5p is associated with drug resistance to cytarabine and poor prognosis in AML patients. Oncol. Rep. 33, 2176–2182 (2015). 145. Zhang, H. et al. Upregulation of microRNA-125b contributes to leukemogenesis and increases drug resistance in pediatric acute promyelocytic leukemia. Mol Cancer 10, 108 (2011). 146. Khella, H. W. Z. et al. miR-221/222 Are Involved in Response to Sunitinib Treatment in Metastatic Renal Cell Carcinoma. Mol. Ther. (2015).
doi:10.1038/mt.2015.129 147. Pritchard, C. C., Cheng, H. H. & Tewari, M. MicroRNA profiling: approaches and considerations. Nat Rev Genet 13, 358–369 (2012). 148. Calin, G. A. & Croce, C. M. MicroRNA signatures in human cancers. Nature Rev. Cancer 6, 857–866 (2006). 149. Wyman, S. K. et al. Post-transcriptional generation of miRNA variants by multiple nucleotidyl transferases contributes to miRNA transcriptome complexity. Genome Res 21, 1450–1461 (2011). 150. Peveling-Oberhag, J. et al. Dysregulation of global microRNA expression in splenic marginal zone lymphoma and influence of chronic hepatitis C virus infection. Leukemia 26, 1654–1662 (2012). 151. Nelson, P. T. et al. Microarray-based, high-throughput gene expression profiling of microRNAs. Nat Methods 1, 155–161 (2004). 152. Meng, W. et al. Comparison of microRNA deep sequencing of matched formalin-fixed paraffin-embedded and fresh frozen cancer tissues. PLoS ONE 8, e64393 (2013). 153. Weng, L. et al. MicroRNA profiling of clear cell renal cell carcinoma by whole-genome small RNA deep sequencing of paired frozen and formalin-fixed, paraffin-embedded tissue specimens. J Pathol 222, 41–51 (2010). 154. Li, J. et al. Comparison of miRNA expression patterns using total RNA extracted from matched samples of formalin-fixed paraffin-embedded (FFPE) cells and snap frozen cells. BMC Biotechnol. 7, 36 (2007). 155. Garcia, D. M. et al. Weak seed-pairing stability and high target-site abundance decrease the proficiency of lsy-6 and other microRNAs. Nat Struct Mol Biol 18, 1139–1146 (2011). 156. John, B. et al. Human MicroRNA targets. PLoS Biol 2, e363 (2004). 157. Bartel, D. P. MicroRNAs: target recognition and regulatory functions. Cell 136, 215–233 (2009). 158. Friedman, R. C., Farh, K. K.-H., Burge, C. B. & Bartel, D. P. Most mammalian mRNAs are conserved targets of microRNAs. Genome Res 19, 92–105 (2009). 159. Jayaswal, V., Lutherborrow, M., Ma, D. D. F. & Yang, Y. H. 
Identification of microRNA-mRNA modules using microarray data. BMC Genomics 12, 138 (2011). 160. Lionetti, M. et al. Identification of microRNA expression patterns and definition of a microRNA/mRNA regulatory network in distinct molecular groups of multiple myeloma. Blood 114, e20–6 (2009). 161. Gutiérrez, N. C. et al. Deregulation of microRNA expression in the different genetic subtypes of multiple myeloma and correlation with gene expression profiling. Leukemia 24, 629–637 (2010). 162. Wang, L. et al. Genome-wide transcriptional profiling reveals microRNA-correlated genes and biological processes in human lymphoblastoid cell lines. PLoS ONE 4, e5878 (2009). 163. Wang, Y.-P. & Li, K.-B. Correlation of expression profiles between microRNAs and mRNA targets using NCI-60 data. BMC Genomics 10, 218 (2009). 164. Creighton, C. J. et al. Integrated Analyses of microRNAs Demonstrate Their Widespread Influence on Gene Expression in High-Grade Serous Ovarian Carcinoma. PLoS ONE 7, e34546 (2012). 165. Muniategui, A. et al. Quantification of miRNA-mRNA interactions. PLoS ONE 7, e30766 (2012). 166. Chi, S. W., Zang, J. B., Mele, A. & Darnell, R. B. Argonaute HITS-CLIP decodes microRNA-mRNA interaction maps. Nature 460, 479–486 (2009). 167. Hafner, M. et al. Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell 141, 129–141 (2010). 168. Haecker, I. & Renne, R. HITS-CLIP and PAR-CLIP advance viral miRNA targetome analysis. Crit. Rev. Eukaryot. Gene Expr. 24, 101–116 (2014). 169. Thomson, D. W., Bracken, C. P. & Goodall, G. J. Experimental strategies for microRNA target identification. Nucleic Acids Res 39, 6845–6853 (2011). 170. Vardiman, J. W. et al. The 2008 revision of the World Health Organization (WHO) classification of myeloid neoplasms and acute leukemia: rationale and important changes. in 114, 937–951 (2009). 171. Kim, K. H. & Roberts, C. W. M. Mechanisms by which SMARCB1 loss drives rhabdoid tumor growth. 
Cancer Genet (2014). doi:10.1016/j.cancergen.2014.04.004 172. Staudt, L. M. Aggressive lymphomas. N Engl J Med 362, 1417–1429 (2010). 173. Roehle, A. et al. MicroRNA signatures characterize diffuse large B-cell lymphomas and follicular lymphomas. Br. J. Haematol. 142, 732–744 (2008). 174. Robertus, J.-L. et al. Specific expression of miR-17-5p and miR-127 in testicular and central nervous system diffuse large B-cell lymphoma. Mod. Pathol. 22, 547–555 (2009). 175. Lawrie, C. H. et al. MicroRNA expression distinguishes between germinal center B cell-like and activated B cell-like subtypes of diffuse large B cell lymphoma. Int. J. Cancer 121, 1156–1161 (2007). 176. Tarlock, K. & Meshinchi, S. Pediatric Acute Myeloid Leukemia: Biology and Therapeutic Implications of Genomic Variants. Pediatr. Clin. North Am. 62, 75–93 (2015). 177. Daschkey, S. et al. MicroRNAs distinguish cytogenetic sub-groups in pediatric AML and contribute to complex regulatory networks in AML-relevant pathways. PLoS ONE 8, e56334 (2013). 178. Zhang, H. et al. MicroRNA patterns associated with clinical prognostic parameters and CNS relapse prediction in pediatric acute leukemia. PLoS ONE 4, e7826 (2009). 179. Yagi, T. et al. Identification of a gene expression signature associated with pediatric AML prognosis. Blood 102, 1849–1856 (2003). 180. Lacayo, N. J. et al. Gene expression profiles at diagnosis in de novo childhood AML patients identify FLT3 mutations with good clinical outcomes. Blood 104, 2646–2654 (2004). 181. Andersson, A. et al. Microarray-based classification of a consecutive series of 121 childhood acute leukemias: prediction of leukemic and genetic subtype as well as of minimal residual disease status. Leukemia 21, 1198–1203 (2007). 182. de Jonge, H. J. M. et al. High VEGFC expression is associated with unique gene expression profiles and predicts adverse prognosis in pediatric and adult acute myeloid leukemia. Blood 116, 1747–1754 (2010). 183. Sandahl, J. D. et al. 
t(6;9)(p22;q34)/DEK-NUP214-rearranged pediatric myeloid leukemia: an international study of 62 patients. Haematologica 99, 865–872 (2014). 184. Gorman, M. F. et al. Outcome for children treated for relapsed or refractory acute myelogenous leukemia (rAML): a Therapeutic Advances in Childhood Leukemia (TACL) Consortium study. Pediatr Blood Cancer 55, 421–429 (2010). 185. Ross, M. E. et al. Gene expression profiling of pediatric acute myelogenous leukemia. Blood 104, 3679–3687 (2004). 186. Sander, A. et al. Consequent and intensified relapse therapy improved survival in pediatric AML: results of relapse treatment in 379 patients of three consecutive AML-BFM trials. Leukemia 24, 1422–1428 (2010). 187. Zhang, L. et al. MiR-99a may serve as a potential oncogene in pediatric myeloid leukemia. Cancer Cell Int. 13, 110 (2013). 188. Yan-Fang, T. et al. The promoter of miR-663 is hypermethylated in Chinese pediatric acute myeloid leukemia (AML). BMC Med. Genet. 14, 74 (2013). 189. Bai, J., Guo, A., Hong, Z. & Kuai, W. Upregulation of microRNA-100 predicts poor prognosis in patients with pediatric acute myeloid leukemia. Onco Targets Ther 5, 213–219 (2012). 190. Wang, Z., Hong, Z., Gao, F. & Feng, W. Upregulation of microRNA-375 is associated with poor prognosis in pediatric acute myeloid leukemia. Mol. Cell. Biochem. 383, 59–65 (2013). 191. Lin, X., Wang, Z., Zhang, R. & Feng, W. High serum microRNA-335 level predicts aggressive tumor progression and unfavorable prognosis in pediatric acute myeloid leukemia. Clin Transl Oncol (2014). doi:10.1007/s12094-014-1237-z 192. Zhu, C. et al. Prognostic value of miR-29a expression in pediatric acute myeloid leukemia. Clin. Biochem. 46, 49–53 (2013). 193. Weeks, D. A., Beckwith, J. B., Mierau, G. W. & Luckey, D. W. Rhabdoid tumor of kidney. A report of 111 cases from the National Wilms' Tumor Study Pathology Center. Am. J. Surg. Pathol. 13, 439–458 (1989). 194. Birks, D. K. et al. Survey of MicroRNA expression in pediatric brain tumors. 
Pediatr Blood Cancer 56, 211–216 (2011). 195. Lee, Y.-Y. et al. MicroRNA142-3p promotes tumor-initiating and radioresistant properties in malignant pediatric brain tumors. Cell Transplant 23, 669–690 (2014). 196. Zhang, K. et al. Frequent overexpression of HMGA2 in human atypical teratoid/rhabdoid tumor and its correlation with let-7a3/let-7b miRNA. Clin Cancer Res 20, 1179–1189 (2014). 197. Sredni, S. T. et al. Upregulation of mir-221 and mir-222 in atypical teratoid/rhabdoid tumors: potential therapeutic targets. Childs Nerv Syst 26, 279–283 (2010). 198. Hsieh, T.-H. et al. Downregulation of SUN2, a novel tumor suppressor, mediates miR-221/222-induced malignancy in central nervous system embryonal tumors. Carcinogenesis 35, 2164–2174 (2014). 199. Weingart, M. F. et al. Disrupting LIN28 in atypical teratoid rhabdoid tumors reveals the importance of the mitogen activated protein kinase pathway as a therapeutic target. Oncotarget 6, 3165–3177 (2015). 200. Armeanu-Ebinger, S. et al. Differential expression of miRNAs in rhabdomyosarcoma and malignant rhabdoid tumor. Exp Cell Res 318, 2567–2577 (2012). 201. Grupenmacher, A. T. et al. Study of the gene expression and microRNA expression profiles of malignant rhabdoid tumors originated in the brain (AT/RT) and in the kidney (RTK). Childs Nerv Syst 29, 1977–1983 (2013). 202. Kohashi, K. et al. Differential microRNA expression profiles between malignant rhabdoid tumor and epithelioid sarcoma: miR193a-5p is suggested to downregulate SMARCB1 mRNA expression. Mod. Pathol. (2013). doi:10.1038/modpathol.2013.213 203. Papp, G., Krausz, T., Stricker, T. P., Szendrői, M. & Sápi, Z. SMARCB1 expression in epithelioid sarcoma is regulated by miR-206, miR-381, and miR-671-5p on Both mRNA and protein levels. Genes Chromosomes Cancer 53, 168–176 (2014). 204. Sakurai, K. et al. MicroRNAs miR-199a-5p and -3p target the Brm subunit of SWI/SNF to generate a double-negative feedback loop in a variety of human cancers. 
Cancer Res 71, 1680–1689 (2011). 205. Morin, R. D. et al. Frequent mutation of histone-modifying genes in non-Hodgkin lymphoma. Nature 476, 298–303 (2011). 206. Lagadinou, E. D. et al. BCL-2 inhibition targets oxidative phosphorylation and selectively eradicates quiescent human leukemia stem cells. Cell Stem Cell 12, 329–341 (2013). 207. Lee, R. S. et al. A remarkably simple genome underlies highly malignant pediatric rhabdoid cancers. J. Clin. Invest. 122, 2983–2988 (2012). 208. Shipp, M. A. A predictive model for aggressive non-Hodgkin's lymphoma. The International Non-Hodgkin's Lymphoma Prognostic Factors Project. N Engl J Med 329, 987–994 (1993). 209. Sehn, L. H. et al. The revised International Prognostic Index (R-IPI) is a better predictor of outcome than the standard IPI for patients with diffuse large B-cell lymphoma treated with R-CHOP. Blood 109, 1857–1861 (2007). 210. Gascoyne, R. D., Rosenwald, A., Poppema, S. & Lenz, G. Prognostic biomarkers in malignant lymphomas. Leuk Lymphoma 51 Suppl 1, 11–19 (2010). 211. Lawrie, C. H. et al. Expression of microRNAs in diffuse large B cell lymphoma is associated with immunophenotype, survival and transformation from follicular lymphoma. J. Cell. Mol. Med. 13, 1248–1260 (2009). 212. Montes-Moreno, S. et al. miRNA expression in diffuse large B-cell lymphoma treated with chemoimmunotherapy. Blood 118, 1034–1040 (2011). 213. Alencar, A. J. et al. MicroRNAs are independent predictors of outcome in diffuse large B-cell lymphoma patients treated with R-CHOP. Clin Cancer Res 17, 4125–4135 (2011). 214. Jima, D. D. et al. Deep sequencing of the small RNA transcriptome of normal and malignant human B cells identifies hundreds of novel microRNAs. Blood 116, e118–27 (2010). 215. Morin, R. D. et al. Application of massively parallel sequencing to microRNA profiling and discovery in human embryonic stem cells. Genome Res 18, 610–621 (2008). 216. Kozomara, A. & Griffiths-Jones, S. 
miRBase: annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Res 42, D68–73 (2014). 217. Haecker, I. et al. Ago HITS-CLIP expands understanding of Kaposi's sarcoma-associated herpesvirus miRNA function in primary effusion lymphomas. PLoS Pathog. 8, e1002884 (2012). 218. Costinean, S. et al. Pre-B cell proliferation and lymphoblastic leukemia/high-grade lymphoma in E(mu)-miR155 transgenic mice. Proc Natl Acad Sci USA 103, 7024–7029 (2006). 219. Craig, V. J. et al. Systemic microRNA-34a delivery induces apoptosis and abrogates growth of diffuse large B-cell lymphoma in vivo. Leukemia (2012). doi:10.1038/leu.2012.110 220. Shaffer Iii, A. L., Young, R. M. & Staudt, L. M. Pathogenesis of Human B Cell Lymphomas. Annu Rev Immunol (2011). doi:10.1146/annurev-immunol-020711-075027 221. Lu, J. et al. MicroRNA expression profiles classify human cancers. Nature 435, 834–838 (2005). 222. Huang, W.-T., Kuo, S.-H., Cheng, A.-L. & Lin, C.-W. Inhibition of ZEB1 by miR-200 characterizes Helicobacter pylori-positive gastric diffuse large B-cell lymphoma with a less aggressive behavior. Mod. Pathol. (2014). doi:10.1038/modpathol.2013.229 223. Camp, R. L., Dolled-Filhart, M. & Rimm, D. L. X-tile: a new bio-informatics tool for biomarker assessment and outcome-based cut-point optimization. Clin Cancer Res 10, 7252–7259 (2004). 224. Allen, C. D. C., Okada, T. & Cyster, J. G. Germinal-center organization and cellular dynamics. Immunity 27, 190–202 (2007). 225. Deutsch, A. J. A. et al. Chemokine receptors in gastric MALT lymphoma: loss of CXCR4 and upregulation of CXCR7 is associated with progression to diffuse large B-cell lymphoma. Mod. Pathol. 26, 182–194 (2013). 226. Caramuta, S. et al. Role of microRNAs and microRNA machinery in the pathogenesis of diffuse large B-cell lymphoma. Blood Cancer J 3, e152 (2013). 227. Medina, P. P., Nolde, M. & Slack, F. J. OncomiR addiction in an in vivo model of microRNA-21-induced pre-B-cell lymphoma. 
Nature 467, 86–90 (2010). 228. Wang, M. et al. Down-regulated miR-625 suppresses invasion and metastasis of gastric cancer by targeting ILK. FEBS Lett. 586, 2382–2388 (2012). 229. Liston, A., Papadopoulou, A. S., Danso-Abeam, D. & Dooley, J. MicroRNA-29 in the adaptive immune system: setting the threshold. Cell Mol Life Sci 69, 3533–3541 (2012). 230. Lin, J. et al. Follicular dendritic cell-induced microRNA-mediated upregulation of PRDM1 and downregulation of BCL-6 in non-Hodgkin's B-cell lymphomas. Leukemia 25, 145–152 (2011). 231. Rosenfeld, N. et al. MicroRNAs accurately identify cancer tissue origin. Nat Biotechnol 26, 462–469 (2008). 232. Fulci, V. et al. Characterization of B- and T-lineage acute lymphoblastic leukemia by integrated analysis of MicroRNA and mRNA expression profiles. Genes Chromosomes Cancer 48, 1069–1082 (2009). 233. Xu, L., Liang, Y.-N., Luo, X.-Q., Liu, X.-D. & Guo, H.-X. [Association of miRNAs expression profiles with prognosis and relapse in childhood acute lymphoblastic leukemia]. Zhonghua Xue Ye Xue Za Zhi 32, 178–181 (2011). 234. Wang, W. et al. MicroRNA profiling of follicular lymphoma identifies microRNAs related to cell proliferation and tumor response. Haematologica 97, 586–594 (2012). 235. Teng, G. et al. MicroRNA-155 is a negative regulator of activation-induced cytidine deaminase. Immunity 28, 621–629 (2008). 236. Lenze, D. et al. The different epidemiologic subtypes of Burkitt lymphoma share a homogenous micro RNA profile distinct from diffuse large B-cell lymphoma. Leukemia 25, 1869–1876 (2011). 237. Ding, S. et al. Decreased microRNA-142-3p/5p expression causes CD4+ T cell activation and B cell hyperstimulation in systemic lupus erythematosus. Arthritis Rheum. 64, 2953–2963 (2012). 238. Kwanhian, W. et al. MicroRNA-142 is mutated in about 20% of diffuse large B-cell lymphoma. Cancer Med 1, 141–155 (2012). 239. Bosch, R. et al. Focal adhesion proteins expression in human diffuse large B cell lymphoma. 
Histopathology (2014). doi:10.1111/his.12381 240. Schneider, C. et al. microRNA 28 controls cell proliferation and is down-regulated in B-cell lymphomas. Proc Natl Acad Sci USA (2014). doi:10.1073/pnas.1322466111 241. Scott, D. W. et al. Determining cell-of-origin subtypes of diffuse large B-cell lymphoma using gene expression in formalin-fixed paraffin-embedded tissue. Blood 123, 1214–1217 (2014). 242. Friedländer, M. R., Mackowiak, S. D., Li, N., Chen, W. & Rajewsky, N. miRDeep2 accurately identifies known and hundreds of novel microRNA genes in seven animal clades. Nucleic Acids Res 40, 37–52 (2012). 243. Hsu, F. et al. The UCSC Known Genes. Bioinformatics 22, 1036–1046 (2006). 244. Trapnell, C. et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc 7, 562–578 (2012). 245. Forman, J. J., Legesse-Miller, A. & Coller, H. A. A search for conserved sequences in coding regions reveals that the let-7 microRNA targets Dicer within its coding sequence. Proc Natl Acad Sci USA 105, 14879–14884 (2008). 246. Qin, L. et al. A deep investigation into the adipogenesis mechanism: profile of microRNAs regulating adipogenesis by modulating the canonical Wnt/beta-catenin signaling pathway. BMC Genomics 11, 320 (2010). 247. Ott, C. E. et al. MicroRNAs differentially expressed in postnatal aortic development downregulate elastin via 3' UTR and coding-sequence binding sites. PLoS ONE 6, e16250 (2011). 248. Chi, S. W., Hannon, G. J. & Darnell, R. B. An alternative mode of microRNA target recognition. Nat Struct Mol Biol 19, 321–327 (2012). 249. Hafner, M. et al. Identification of microRNAs and other small regulatory RNAs using cDNA library sequencing. Methods 44, 3–12 (2008). 250. Bauer, S., Robinson, P. N. & Gagneur, J. Model-based gene set analysis for Bioconductor. Bioinformatics 27, 1882–1883 (2011). 251. Therneau, T. M. & Grambsch, P. M. Modeling Survival Data: Extending the Cox Model. (Springer, 2000). 252. 
Wright, G. A gene expression-based method to diagnose clinically distinct sub-groups of diffuse large B cell lymphoma. Proc Natl Acad Sci USA 100, 9991–9996 (2003). 253. Chin, S.-F. et al. A simple and reliable pretreatment protocol facilitates fluorescent in situ hybridisation on tissue microarrays of paraffin wax embedded tumour samples. MP, Mol. Pathol. 56, 275–279 (2003). 254. Schuback, H. L., Arceci, R. J. & Meshinchi, S. Somatic characterization of pediatric acute myeloid leukemia using next-generation sequencing. Semin. Hematol. 50, 325–332 (2013). 255. Bennett, J. M. et al. Proposals for the classification of the acute leukaemias. French-American-British (FAB) co-operative group. Br. J. Haematol. 33, 451–458 (1976). 256. Harris, N. L. et al. The World Health Organization classification of neoplastic diseases of the hematopoietic and lymphoid tissues. Report of the Clinical Advisory Committee meeting, Airlie House, Virginia, November, 1997. Ann. Oncol. 10, 1419–1432 (1999). 257. Kandoth, C. et al. Mutational landscape and significance across 12 major cancer types. Nature 502, 333–339 (2013). 258. Chen, J., Odenike, O. & Rowley, J. D. Leukaemogenesis: more than mutant genes. Nat Rev Cancer 10, 23–36 (2010). 259. Bachas, C. et al. Gene Expression Profiles Associated with Pediatric Relapsed AML. PLoS ONE 10, e0121730 (2015). 260. Ho, P. A. et al. Leukemic mutations in the methylation-associated genes DNMT3A and IDH2 are rare events in pediatric AML: a report from the Children's Oncology Group. Pediatr Blood Cancer 57, 204–209 (2011). 261. Cianfriglia, M. The biology of MDR1-P-glycoprotein (MDR1-Pgp) in designing functional antibody drug conjugates (ADCs): the experience of gemtuzumab ozogamicin. Ann. Ist. Super. Sanita 49, 150–168 (2013). 262. Wolman, S. R., Gundacker, H., Appelbaum, F. R., Slovak, M. L. & Southwest Oncology Group. 
Impact of trisomy 8 (+8) on clinical presentation, treatment response, and survival in acute myeloid leukemia: a Southwest Oncology Group study. Blood 100, 29–35 (2002). 263. Görlach, A. et al. Efficient translation of mouse hypoxia-inducible factor-1alpha under normoxic and hypoxic conditions. Biochim Biophys Acta 1493, 125–134 (2000). 264. Williams, A. J., Blacklow, S. C. & Collins, T. The zinc finger-associated SCAN box is a conserved oligomerization domain. Mol Cell Biol 19, 8526–8535 (1999). 265. Sjöblom, B., Salmazo, A. & Djinović-Carugo, K. Alpha-actinin structure and regulation. Cell Mol Life Sci 65, 2688–2701 (2008). 266. Yoon, S. S. et al. Engagement of CD99 triggers the exocytic transport of ganglioside GM1 and the reorganization of actin cytoskeleton. FEBS Lett. 540, 217–222 (2003). 267. Liu, L. & McKeehan, W. L. Sequence analysis of LRPPRC and its SEC1 domain interaction partners suggests roles in cytoskeletal organization, vesicular trafficking, nucleocytosolic shuttling, and chromosome activity. Genomics 79, 124–136 (2002). 268. Kaźmierczak, M. et al. Esterase D and gamma 1 actin level might predict results of induction therapy in patients with acute myeloid leukemia without and with maturation. Med. Oncol. 30, 725 (2013). 269. Marcucci, G., Mrozek, K., Radmacher, M. D., Garzon, R. & Bloomfield, C. D. The prognostic and functional role of microRNAs in acute myeloid leukemia. Blood 117, 1121–1129 (2011). 270. Garzon, R. et al. MicroRNA signatures associated with cytogenetics and prognosis in acute myeloid leukemia. Blood 111, 3183–3189 (2008). 271. Han, Y.-C. et al. microRNA-29a induces aberrant self-renewal capacity in hematopoietic progenitors, biased myeloid development, and acute myeloid leukemia. J Exp Med 207, 475–489 (2010). 272. Gong, J.-N. et al. The role, mechanism and potentially therapeutic application of microRNA-29 family in acute myeloid leukemia. Cell Death Differ. (2013). doi:10.1038/cdd.2013.133 273. Garzon, R. et al. 
MicroRNA 29b functions in acute myeloid leukemia. Blood 114, 5331–5341 (2009). 274. Gao, S.-M. et al. miR-15a and miR-16-1 inhibit the proliferation of leukemic cells by down-regulating WT1 protein level. J. Exp. Clin. Cancer Res. 30, 110 (2011). 275. Gao, S.-M. et al. miR-15a/16-1 enhances retinoic acid-mediated differentiation of leukemic cells and is up-regulated by retinoic acid. Leuk Lymphoma 52, 2365–2371 (2011). 276. Katzerke, C. et al. Transcription factor C/EBPα-induced microRNA-30c inactivates Notch1 during granulopoiesis and is downregulated in acute myeloid leukemia. Blood 122, 2433–2442 (2013). 277. Lim, E. L. et al. Comprehensive miRNA sequence analysis reveals survival differences in diffuse large B-cell lymphoma patients. Genome Biol 16, 18 (2015). 278. Emmrich, S. et al. miR-139-5p controls translation in myeloid leukemia through EIF4G2. Oncogene (2015). doi:10.1038/onc.2015.247 279. Wang, Q. et al. The Effects and Molecular Mechanisms of MiR-106a in Multidrug Resistance Reversal in Human Glioma U87/DDP and U251/G Cell Lines. PLoS ONE 10, e0125473 (2015). 280. Huh, J. H. et al. Dysregulation of miR-106a and miR-591 confers paclitaxel resistance to ovarian cancer. Br. J. Cancer 109, 452–461 (2013). 281. Zhang, Y., Lu, Q. & Cai, X. MicroRNA-106a induces multidrug resistance in gastric cancer by targeting RUNX3. FEBS Lett. 587, 3069–3075 (2013). 282. Yeoh, E.-J. et al. Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer Cell 1, 133–143 (2002). 283. Bassøe, C. F., Bruserud, O., Pryme, I. F. & Vedeler, A. Ribosomal proteins sustain morphology, function and phenotype in acute myeloid leukemia blasts. Leuk. Res. 22, 329–339 (1998). 284. Wang, L. et al. Ribosomal protein S14 silencing inhibits growth of acute myeloid leukemia transformed from myelodysplastic syndromes via activating p53. Hematology 19, 225–231 (2014). 285. Fuchs, O. 
Important genes in the pathogenesis of 5q- syndrome and their connection with ribosomal stress and the innate immune system pathway. Leuk Res Treatment 2012, 179402 (2012). 286. Montanaro, L., Treré, D. & Derenzini, M. Changes in ribosome biogenesis may induce cancer by down-regulating the cell tumor suppressor potential. Biochim Biophys Acta 1825, 101–110 (2012). 287. Emmrich, S. et al. miR-9 is a tumor suppressor in pediatric AML with t(8;21). Leukemia (2013). doi:10.1038/leu.2013.357 288. Olive, V., Jiang, I. & He, L. mir-17-92, a cluster of miRNAs in the midst of the cancer network. Int. J. Biochem. Cell Biol. 42, 1348–1354 (2010). 289. Landais, S., Landry, S., Legault, P. & Rassart, E. Oncogenic potential of the miR-106-363 cluster and its implication in human T-cell leukemia. Cancer Res 67, 5699–5707 (2007). 290. Deffenbacher, K. E. et al. Molecular distinctions between pediatric and adult mature B-cell non-Hodgkin lymphomas identified through genomic profiling. Blood (2012). doi:10.1182/blood-2011-05-349662 291. Ventura, A. et al. Targeted deletion reveals essential and overlapping functions of the miR-17 through 92 family of miRNA clusters. Cell 132, 875–886 (2008). 292. Braess, J. et al. Proliferative activity of leukaemic blasts and cytosine arabinoside pharmacodynamics are associated with cytogenetically defined prognostic sub-groups in acute myeloid leukaemia. Br. J. Haematol. 113, 975–982 (2001). 293. Yan, W. et al. MicroRNA biomarker identification for pediatric acute myeloid leukemia based on a novel bioinformatics model. Oncotarget (2015). 294. Cooper, T. M. et al. AAML03P1, a pilot study of the safety of gemtuzumab ozogamicin in combination with chemotherapy for newly diagnosed childhood acute myeloid leukemia: a report from the Children's Oncology Group. Cancer 118, 761–769 (2012). 295. Gamis, A. S. et al. 
Gemtuzumab Ozogamicin in Children and Adolescents With De Novo Acute Myeloid Leukemia Improves Event-Free Survival by Reducing Relapse Risk: Results From the Randomized Phase III Children's Oncology Group Trial AAML0531. J Clin Oncol (2014). doi:10.1200/JCO.2014.55.3628 296. Lange, B. J. et al. Outcomes in CCG-2961, a children's oncology group phase 3 trial for untreated pediatric acute myeloid leukemia: a report from the children's oncology group. Blood 111, 1044–1053 (2008). 297. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25, 3389–3402 (1997). 298. Dennis, G. et al. DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol 4, P3 (2003). 299. Blatt, J., Russo, P. & Taylor, S. Extrarenal rhabdoid sarcoma. Med. Pediatr. Oncol. 14, 221–226 (1986). 300. Fischer, H. P., Thomsen, H., Altmannsberger, M. & Bertram, U. Malignant rhabdoid tumour of the kidney expressing neurofilament proteins. Immunohistochemical findings and histogenetic aspects. Pathol. Res. Pract. 184, 541–547 (1989). 301. Ota, A. et al. Identification and characterization of a novel gene, C13orf25, as a target for 13q31-q32 amplification in malignant lymphoma. Cancer Res 64, 3087–3095 (2004). 302. Sugimoto, T. et al. Malignant rhabdoid-tumor cell line showing neural and smooth-muscle-cell phenotypes. Int. J. Cancer 82, 678–686 (1999). 303. Tsokos, M., Kouraklis, G., Chandra, R. S., Bhagavan, B. S. & Triche, T. J. Malignant rhabdoid tumor of the kidney and soft tissues. Evidence for a diverse morphological and immunocytochemical phenotype. Arch Pathol Lab Med 113, 115–120 (1989). 304. Brennan, B., Stiller, C. & Bourdeaut, F. Extracranial rhabdoid tumours: what we have learned so far and future directions. Lancet Oncol. 14, e329–36 (2013). 305. Packer, R. J. et al. Atypical teratoid/rhabdoid tumor of the central nervous system: report on workshop. in 24, 337–342 (2002). 306. Tomlinson, G. E. 
et al. Rhabdoid tumor of the kidney in the National Wilms' Tumor Study: age at diagnosis as a prognostic factor. J Clin Oncol 23, 7641–7645 (2005). 307. Versteege, I. et al. Truncating mutations of hSNF5/INI1 in aggressive paediatric cancer. Nature 394, 203–206 (1998). 308. Biegel, J. A. et al. The role of INI1 and the SWI/SNF complex in the development of rhabdoid tumors: meeting summary from the workshop on childhood atypical teratoid/rhabdoid tumors. in 62, 323–328 (2002). 309. Schneppenheim, R. et al. Germline nonsense mutation and somatic inactivation of SMARCA4/BRG1 in a family with rhabdoid tumor predisposition syndrome. Am. J. Hum. Genet. 86, 279–284 (2010). 310. Kia, S. K., Gorski, M. M., Giannakopoulos, S. & Verrijzer, C. P. SWI/SNF mediates polycomb eviction and epigenetic reprogramming of the INK4b-ARF-INK4a locus. Mol Cell Biol 28, 3457–3464 (2008). 311. Tolstorukov, M. Y. et al. Swi/Snf chromatin remodeling/tumor suppressor complex establishes nucleosome occupancy at target promoters. Proc Natl Acad Sci USA 110, 10165–10170 (2013). 312. Sullivan, L. M., Folpe, A. L., Pawel, B. R., Judkins, A. R. & Biegel, J. A. Epithelioid sarcoma is associated with a high percentage of SMARCB1 deletions. Mod. Pathol. 26, 385–392 (2013). 313. Hadfield, K. D. et al. Molecular characterisation of SMARCB1 and NF2 in familial and sporadic schwannomatosis. J. Med. Genet. 45, 332–339 (2008). 314. Cancer Genome Atlas Research Network. Comprehensive molecular characterization of clear cell renal cell carcinoma. Nature 499, 43–49 (2013). 315. Ramos, P. et al. Small cell carcinoma of the ovary, hypercalcemic type, displays frequent inactivating germline and somatic mutations in SMARCA4. Nat Genet 46, 427–429 (2014). 316. Jelinic, P. et al. Recurrent SMARCA4 mutations in small cell carcinoma of the ovary. Nat Genet 46, 424–426 (2014). 317. Betz, B. L., Strobeck, M. W., Reisman, D. N., Knudsen, E. S. & Weissman, B. E. 
Re-expression of hSNF5/INI1/BAF47 in pediatric tumor cells leads to G1 arrest associated with induction of p16ink4a and activation of RB. Oncogene 21, 5193–5203 (2002). 318. Jagani, Z. et al. Loss of the tumor suppressor Snf5 leads to aberrant activation of the Hedgehog-Gli pathway. Nat Med 16, 1429–1433 (2010). 319. Mora-Blanco, E. L. et al. Activation of β-catenin/TCF targets following loss of the tumor suppressor SNF5. Oncogene 33, 933–938 (2014). 320. Wilson, B. G. et al. Epigenetic antagonism between polycomb and SWI/SNF complexes during oncogenic transformation. Cancer Cell 18, 316–328 (2010). 321. Gadd, S., Sredni, S. T., Huang, C.-C., Perlman, E. J. & Renal Tumor Committee of the Children's Oncology Group. Rhabdoid tumor: gene expression clues to pathogenesis and potential therapeutic targets. Lab. Invest. 90, 724–738 (2010). 322. Hasselblatt, M. et al. High-resolution genomic analysis suggests the absence of recurrent genomic alterations other than SMARCB1 aberrations in atypical teratoid/rhabdoid tumors. Genes Chromosomes Cancer 52, 185–190 (2013). 323. Jackson, E. M. et al. Genomic analysis using high-density single nucleotide polymorphism-based oligonucleotide arrays and multiplex ligation-dependent probe amplification provides a comprehensive analysis of INI1/SMARCB1 in malignant rhabdoid tumors. Clin Cancer Res 15, 1923–1930 (2009). 324. Ammerlaan, A. C. J. et al. Long-term survival and transmission of INI1-mutation via nonpenetrant males in a family with rhabdoid tumour predisposition syndrome. Br. J. Cancer 98, 474–479 (2008). 325. Torchia, J. et al. Molecular sub-groups of atypical teratoid rhabdoid tumours in children: an integrated genomic and clinicopathological analysis. Lancet Oncol. 16, 569–582 (2015). 326. Ferracin, M. et al. MicroRNA profiling for the identification of cancers with unknown primary tissue-of-origin. J Pathol 225, 43–53 (2011). 327. Visvader, J. E. Cells of origin in cancer. Nature 469, 314–322 (2011). 328. Dyer, M. A. 
& Bremner, R. The search for the retinoblastoma cell of origin. Nature Rev. Cancer 5, 91–101 (2005). 329. Remke, M., Ramaswamy, V. & Taylor, M. D. Medulloblastoma molecular dissection: the way toward targeted therapy. Curr Opin Oncol 25, 674–681 (2013). 330. Feng, X., Wang, Z., Fillmore, R. & Xi, Y. MiR-200, a new star miRNA in human cancer. Cancer Lett. 344, 166–173 (2014). 331. Thakral, S. & Ghoshal, K. miR-122 is a unique molecule with great potential in diagnosis, prognosis of liver disease, and therapy both as miRNA mimic and antimir. Curr Gene Ther 15, 142–150 (2015). 332. Wu, Q. et al. Decreased expression of hepatocyte nuclear factor 4α (Hnf4α)/microRNA-122 (miR-122) axis in hepatitis B virus-associated hepatocellular carcinoma enhances potential oncogenic GALNT10 protein activity. J Biol Chem 290, 1170–1185 (2015). 333. Gan, T.-Q. et al. Upregulated MiR-1269 in hepatocellular carcinoma and its clinical significance. Int J Clin Exp Med 8, 714–721 (2015). 334. Zhang, Y. et al. miR-202 suppresses cell proliferation in human hepatocellular carcinoma by downregulating LRP6 post-transcriptionally. FEBS Lett. 588, 1913–1920 (2014). 335. Sandholm, N. et al. New susceptibility loci associated with kidney disease in type 1 diabetes. PLoS Genet 8, e1002921 (2012). 336. Smith, S. W. et al. CD248+ stromal cells are associated with progressive chronic kidney disease. Kidney Int. 80, 199–207 (2011). 337. Weijts, B. G. M. W. et al. E2F7 and E2F8 promote angiogenesis through transcriptional activation of VEGFA in cooperation with HIF1. EMBO J 31, 3871–3884 (2012). 338. Naylor, A. J. et al. A differential role for CD248 (Endosialin) in PDGF-mediated skeletal muscle angiogenesis. PLoS ONE 9, e107146 (2014). 339. Li, J., Ye, L., Mansel, R. E. & Jiang, W. G. Potential prognostic value of repulsive guidance molecules in breast cancer. Anticancer Res. 31, 1703–1711 (2011). 340. Melé, M. et al. Human genomics. The human transcriptome across tissues and individuals. 
Science 348, 660–665 (2015). 341. Love, C. et al. The genetic landscape of mutations in Burkitt lymphoma. Nat Genet 44, 1321–1325 (2012). 342. Brahmer, J. R. et al. Safety and activity of anti-PD-L1 antibody in patients with advanced cancer. N Engl J Med 366, 2455–2465 (2012). 343. Stelzer, Y., Sagi, I. & Benvenisty, N. Involvement of parental imprinting in the antisense regulation of onco-miR-372-373. Nat Commun 4, 2724 (2013). 344. Li, G. et al. Correlation of microRNA-372 upregulation with poor prognosis in human glioma. Diagn Pathol 8, 1 (2013). 345. Gu, H., Guo, X., Zou, L., Zhu, H. & Zhang, J. Upregulation of microRNA-372 associates with tumor progression and prognosis in hepatocellular carcinoma. Mol. Cell. Biochem. 375, 23–30 (2013). 346. Yamashita, S. et al. MicroRNA-372 is associated with poor prognosis in colorectal cancer. Oncology 82, 205–212 (2012). 347. Shi, J.-A., Lu, D.-L., Huang, X. & Tan, W. miR-219 inhibits the proliferation, migration and invasion of medulloblastoma cells by targeting CD164. Int. J. Mol. Med. 34, 237–243 (2014). 348. Santa-Maria, I. et al. Dysregulation of microRNA-219 promotes neurodegeneration through post-transcriptional regulation of tau. J. Clin. Invest. 125, 681–686 (2015). 349. Birks, D. K. et al. High expression of BMP pathway genes distinguishes a subset of atypical teratoid/rhabdoid tumors associated with shorter survival. Neuro-oncology 13, 1296–1307 (2011). 350. Subramanian, S. et al. MicroRNA expression signature of human sarcomas. Oncogene 27, 2015–2026 (2008). 351. Murphy, A. J., Viero, S., Ho, M. & Thorner, P. S. Diagnostic utility of nestin expression in pediatric tumors in the region of the kidney. Appl. Immunohistochem. Mol. Morphol. 17, 517–523 (2009). 352. Butterfield, Y. S. et al. JAGuaR: junction alignments to genome for RNA-seq reads. PLoS ONE 9, e102398 (2014). 353. Cancer Genome Atlas Network. Genomic Classification of Cutaneous Melanoma. Cell 161, 1681–1696 (2015). 354. Cancer Genome Atlas Research Network. 
Comprehensive, Integrative Genomic Analysis of Diffuse Lower-Grade Gliomas. N Engl J Med (2015). doi:10.1056/NEJMoa1402121 355. Cancer Genome Atlas Network. Comprehensive genomic characterization of head and neck squamous cell carcinomas. Nature 517, 576–582 (2015). 356. Cancer Genome Atlas Research Network. Integrated genomic characterization of papillary thyroid carcinoma. Cell 159, 676–690 (2014). 357. Davis, C. F. et al. The somatic genomic landscape of chromophobe renal cell carcinoma. Cancer Cell 26, 319–330 (2014). 358. The Cancer Genome Atlas Research Network et al. Comprehensive molecular profiling of lung adenocarcinoma. Nature 511, 543–550 (2014). 359. Cancer Genome Atlas Research Network. Comprehensive molecular characterization of urothelial bladder carcinoma. Nature 507, 315–322 (2014). 360. Brennan, C. W. et al. The somatic genomic landscape of glioblastoma. Cell 155, 462–477 (2013). 361. Cancer Genome Atlas Research Network et al. Integrated genomic characterization of endometrial carcinoma. Nature 497, 67–73 (2013). 362. Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012). 363. The Cancer Genome Atlas Research Network et al. Comprehensive genomic characterization of squamous cell lung cancers. Nature 489, 519–525 (2012). 364. Cancer Genome Atlas Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature 487, 330–337 (2012). 365. Verhaak, R. G. W. et al. Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1. Cancer Cell 17, 98–110 (2010). 366. The Cancer Genome Atlas Research Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 455, 1061–1068 (2008). 367. Gaujoux, R. & Seoighe, C. A flexible R package for nonnegative matrix factorization. BMC Bioinformatics 11, 367 (2010). 368. Osella, M., Riba, A., Testori, A., Corà, D. & Caselle, M. 
Interplay of microRNA and epigenetic regulation in the human regulatory network. Front Genet 5, 345 (2014). 369. Adams, C. M. & Eischen, C. M. Inactivation of p53 is insufficient to allow B cells and B-cell lymphomas to survive without Dicer. Cancer Res 74, 3923–3934 (2014). 370. Barbarotto, E., Schmittgen, T. D. & Calin, G. A. MicroRNAs and cancer: profile, profile, profile. Int. J. Cancer 122, 969–977 (2008). 371. Network, C. G. A. R. Integrated genomic analyses of ovarian carcinoma. Nature 474, 609–615 (2011). 372. Lawrie, C. H. MicroRNAs and lymphomagenesis: a functional review. Br. J. Haematol. 160, 571–581 (2013). 373. Png, K. J., Halberg, N., Yoshida, M. & Tavazoie, S. F. A microRNA regulon that mediates endothelial recruitment and metastasis by cancer cells. Nature 481, 190–194 (2012). 374. Png, K. J. et al. MicroRNA-335 inhibits tumor reinitiation and is silenced through genetic and epigenetic mechanisms in human breast cancer. Genes Dev 25, 226–231 (2011). 375. Zardavas, D., Irrthum, A., Swanton, C. & Piccart, M. Clinical management of breast cancer heterogeneity. Nat Rev Clin Oncol 12, 381–394 (2015). 376. Gu, L. et al. Inhibition of miR-21 Induces Biological and Behavioral Alterations in Diffuse Large B-Cell Lymphoma. Acta Haematol. 130, 87–94 (2013). 377. Broderick, J. A. & Zamore, P. D. MicroRNA therapeutics. Gene Ther. 18, 1104–1110 (2011). 378. Walz, A. L. et al. Recurrent DGCR8, DROSHA, and SIX Homeodomain Mutations in Favorable Histology Wilms Tumors. Cancer Cell 27, 286–297 (2015). 379. Sumazin, P. et al. An Extensive MicroRNA-Mediated Network of RNA-RNA Interactions Regulates Established Oncogenic Pathways in Glioblastoma. Cell 147, 370–381 (2011).  198 380. Ebert, M. S. & Sharp, P. A. Emerging roles for natural microRNA sponges. Curr. Biol. 20, R858–61 (2010). 381. Mukherji, S. et al. MicroRNAs can generate thresholds in target gene expression. Nat Genet 43, 854–859 (2011).      
199 Appendices Appendices for Chapter 3 Appendix 3A: Differentially expressed mRNA transcripts in pediatric AML mRNA transcripts characteristic of mRNA sub-group 1 Gene Name RefSeq Transcript ID log2 base mean log2 mean sub-group1 (n=22) log2 mean all other sub-groups (n=136) log2 fold change (sub-group1/all other sub-groups) p val adj p val HBB NM_000518 8.75E+00 1.37E+01 7.96E+00 5.70E+00 1.20E-12 9.46E-10 HBA2 NM_000517 8.15E+00 1.22E+01 7.49E+00 4.75E+00 2.94E-09 1.15E-06 HBG2 NM_000184 6.30E+00 9.57E+00 5.77E+00 3.80E+00 5.17E-06 1.02E-03 HBD NM_000519 4.92E+00 7.58E+00 4.49E+00 3.09E+00 3.87E-06 1.01E-03 HBG1 NM_000559 5.04E+00 8.05E+00 4.55E+00 3.50E+00 9.55E-06 1.50E-03 AHSP NM_016633 4.79E+00 7.16E+00 4.41E+00 2.75E+00 3.80E-05 4.98E-03 CA1 NM_001128831 4.18E+00 6.32E+00 3.83E+00 2.49E+00 1.80E-04 1.77E-02  mRNA transcripts characteristic of mRNA sub-group 2 Gene Name RefSeq Transcript ID log2 base mean log2 mean sub-group2 (n=14) log2 mean all other sub-groups (n=144) log2 fold change (sub-group2/all other sub-groups) p val adj p val HBA1 NM_000558 8.01E+00 3.20E+00 8.47E+00 -5.28E+00 2.29E-03 1.55E-02 TPSAB1 NM_003294 5.27E+00 3.09E+00 5.48E+00 -2.39E+00 6.84E-03 3.26E-02 CRIP1 NM_001311 7.88E+00 6.44E+00 8.03E+00 -1.59E+00 8.44E-03 3.70E-02 EGFL7 NM_201446 3.98E+00 2.58E+00 4.12E+00 -1.54E+00 3.32E-03 2.07E-02 ITM2C NM_030926 4.56E+00 3.19E+00 4.70E+00 -1.51E+00 1.03E-03 8.60E-03 JUP NM_021991 4.88E+00 3.54E+00 5.01E+00 -1.46E+00 1.42E-03 1.08E-02 PDLIM1 NM_020992 6.33E+00 5.02E+00 6.46E+00 -1.44E+00 1.05E-04 2.50E-03 CD99 NM_002414 7.79E+00 6.59E+00 7.91E+00 -1.32E+00 1.64E-04 2.86E-03 SOX4 NM_003107 4.81E+00 3.62E+00 4.92E+00 -1.30E+00 1.99E-04 3.17E-03 HLA-DPB1 NM_002121 5.32E+00 4.22E+00 5.43E+00 -1.21E+00 4.98E-03 2.68E-02  200 Gene Name RefSeq Transcript ID log2 base mean log2 mean sub-group2 (n=14) log2 mean all other sub-groups (n=144) log2 fold change (sub-group2/all other sub-groups) p val adj p val LGALS9 NM_002308 5.77E+00 4.80E+00 5.87E+00 
-1.07E+00 5.16E-04 5.41E-03 HSPB1 NM_001540 7.09E+00 6.12E+00 7.18E+00 -1.06E+00 9.56E-03 4.09E-02 CEBPB NM_005194 4.92E+00 5.86E+00 4.83E+00 1.03E+00 1.14E-02 4.53E-02 LITAF NM_001136472 4.89E+00 5.85E+00 4.80E+00 1.05E+00 1.59E-03 1.18E-02 NFKBIZ NM_031419 5.26E+00 6.26E+00 5.16E+00 1.10E+00 1.54E-03 1.16E-02 SOD2 NM_001024465 5.94E+00 6.97E+00 5.84E+00 1.13E+00 1.10E-02 4.51E-02 BNIP3L NM_004331 4.66E+00 5.70E+00 4.56E+00 1.14E+00 3.52E-03 2.13E-02 FBXO7 NM_001033024 4.97E+00 6.02E+00 4.86E+00 1.15E+00 4.19E-03 2.40E-02 ADIPOR1 NM_015999 5.11E+00 6.19E+00 5.01E+00 1.19E+00 1.92E-04 3.14E-03 CAT NM_001752 6.32E+00 7.41E+00 6.22E+00 1.19E+00 4.40E-03 2.47E-02 MAP2K3 NM_145109 5.42E+00 6.51E+00 5.31E+00 1.20E+00 2.56E-03 1.66E-02 PLAUR NM_002659 5.41E+00 6.56E+00 5.29E+00 1.27E+00 5.65E-04 5.70E-03 CTSD NM_001909 6.86E+00 8.04E+00 6.75E+00 1.30E+00 3.98E-05 1.30E-03 GLRX5 NM_016417 5.17E+00 6.36E+00 5.06E+00 1.30E+00 1.75E-03 1.27E-02 NINJ1 NM_004148 5.70E+00 6.91E+00 5.59E+00 1.32E+00 3.77E-03 2.24E-02 EGR1 NM_001964 5.54E+00 6.79E+00 5.41E+00 1.37E+00 1.27E-02 4.90E-02 ANXA1 NM_000700 7.55E+00 8.81E+00 7.42E+00 1.39E+00 7.23E-03 3.36E-02 TFRC NM_001128148 4.34E+00 5.64E+00 4.21E+00 1.43E+00 9.53E-04 8.23E-03 NCOA4 NM_005437 4.62E+00 5.95E+00 4.49E+00 1.46E+00 2.92E-04 3.48E-03 MS4A3 NM_001031666 3.36E+00 4.69E+00 3.23E+00 1.46E+00 7.13E-03 3.34E-02 IER3 NM_003897 4.38E+00 5.74E+00 4.24E+00 1.49E+00 3.41E-03 2.11E-02 LSP1 NM_002339 4.79E+00 6.18E+00 4.66E+00 1.52E+00 2.56E-03 1.66E-02 NCF1 NM_000265 4.12E+00 5.52E+00 3.99E+00 1.53E+00 6.72E-03 3.22E-02 IL1RN NM_173842 3.03E+00 4.46E+00 2.89E+00 1.57E+00 6.23E-03 3.14E-02 PIM1 NM_002648 4.56E+00 6.00E+00 4.42E+00 1.58E+00 2.89E-04 3.48E-03 SLC2A1 NM_006516 3.65E+00 5.10E+00 3.51E+00 1.59E+00 4.61E-03 2.54E-02 CSTA NM_005213 5.85E+00 7.34E+00 5.71E+00 1.63E+00 2.66E-03 1.70E-02 PNP NM_000270 4.72E+00 6.21E+00 4.58E+00 1.64E+00 1.06E-06 7.66E-04 RBM38 NM_017495 4.11E+00 5.60E+00 3.96E+00 1.64E+00 1.20E-03 9.59E-03 
SLC25A39 NM_001143780 3.77E+00 5.32E+00 3.62E+00 1.70E+00 1.39E-03 1.08E-02 PRDX2 NM_005809 4.94E+00 6.56E+00 4.78E+00 1.78E+00 4.57E-03 2.53E-02 AZU1 NM_001700 8.10E+00 9.74E+00 7.94E+00 1.80E+00 5.47E-03 2.85E-02 SLC25A39 NM_016016 5.17E+00 6.82E+00 5.01E+00 1.81E+00 9.24E-05 2.42E-03  201 Gene Name RefSeq Transcript ID log2 base mean log2 mean sub-group2 (n=14) log2 mean all other sub-groups (n=144) log2 fold change (sub-group2/all other sub-groups) p val adj p val NAMPT NM_005746 4.46E+00 6.12E+00 4.30E+00 1.83E+00 2.81E-05 1.21E-03 ALOX5AP NM_001629 5.48E+00 7.16E+00 5.31E+00 1.85E+00 2.25E-04 3.17E-03 CST7 NM_003650 5.77E+00 7.53E+00 5.60E+00 1.93E+00 1.47E-04 2.69E-03 CXCL2 NM_002089 4.05E+00 5.84E+00 3.88E+00 1.97E+00 1.19E-03 9.59E-03 SRGN NM_002727 9.74E+00 1.16E+01 9.56E+00 1.99E+00 1.07E-05 7.66E-04 RETN NM_020415 3.41E+00 5.22E+00 3.23E+00 1.99E+00 9.75E-03 4.12E-02 G0S2 NM_015714 4.34E+00 6.17E+00 4.16E+00 2.01E+00 6.89E-03 3.26E-02 MNDA NM_002432 3.37E+00 5.29E+00 3.18E+00 2.10E+00 2.81E-04 3.45E-03 LGALS3 NM_002306 4.91E+00 6.86E+00 4.72E+00 2.14E+00 9.17E-06 7.66E-04 IL8 NM_000584 8.04E+00 1.01E+01 7.84E+00 2.27E+00 4.60E-04 4.96E-03 BCL2A1 NM_004049 4.59E+00 6.78E+00 4.37E+00 2.41E+00 1.18E-05 7.66E-04 BLVRB NM_000713 5.24E+00 7.45E+00 5.03E+00 2.42E+00 9.43E-06 7.66E-04 FCN1 NM_002003 4.03E+00 6.25E+00 3.81E+00 2.44E+00 4.10E-04 4.60E-03 SLC25A37 NM_016612 4.29E+00 6.58E+00 4.06E+00 2.52E+00 4.94E-06 7.66E-04 CDA NM_001785 3.34E+00 5.67E+00 3.11E+00 2.56E+00 3.13E-05 1.21E-03 CLC NM_001828 4.31E+00 6.65E+00 4.08E+00 2.56E+00 2.76E-03 1.74E-02 RNASE2 NM_002934 7.32E+00 9.69E+00 7.09E+00 2.59E+00 2.09E-04 3.17E-03 S100P NM_005980 4.69E+00 7.08E+00 4.46E+00 2.62E+00 6.91E-04 6.47E-03 RNASE3 NM_002935 4.79E+00 7.39E+00 4.54E+00 2.85E+00 1.20E-04 2.52E-03 CTSG NM_001911 5.26E+00 8.00E+00 4.99E+00 3.01E+00 1.12E-04 2.52E-03 S100A9 NM_002965 9.98E+00 1.30E+01 9.69E+00 3.29E+00 1.22E-04 2.52E-03 S100A12 NM_005621 5.49E+00 8.49E+00 5.19E+00 3.30E+00 
2.36E-04 3.17E-03 S100A8 NM_002964 8.82E+00 1.19E+01 8.51E+00 3.41E+00 3.63E-05 1.24E-03 DEFA3 NM_005217 5.06E+00 8.45E+00 4.74E+00 3.72E+00 7.82E-03 3.53E-02 CA1 NM_001128831 4.18E+00 7.79E+00 3.83E+00 3.97E+00 1.28E-04 2.58E-03 HBG1 NM_000559 5.04E+00 8.73E+00 4.68E+00 4.05E+00 1.47E-04 2.69E-03 AHSP NM_016633 4.79E+00 8.52E+00 4.43E+00 4.09E+00 6.60E-05 1.85E-03 HBG2 NM_000184 6.30E+00 1.03E+01 5.91E+00 4.42E+00 9.84E-05 2.49E-03 HBD NM_000519 4.92E+00 9.03E+00 4.52E+00 4.52E+00 7.16E-05 1.94E-03 DEFA1 NM_004084 4.10E+00 9.16E+00 3.61E+00 5.55E+00 1.45E-05 8.13E-04     202 mRNA transcripts characteristic of mRNA sub-group 3 Gene Name RefSeq Transcript ID log2 base mean log2 mean sub-group3 (n=69) log2 mean all other sub-groups (n=89) log2 fold change (sub-group3/all other sub-groups) p val adj p val RNASE2 NM_002934 7.32E+00 6.03E+00 8.33E+00 -2.30E+00 5.62E-08 6.20E-07 CTSG NM_001911 5.26E+00 4.05E+00 6.19E+00 -2.14E+00 5.31E-06 2.47E-05 RNASE3 NM_002935 4.79E+00 3.71E+00 5.63E+00 -1.93E+00 5.84E-06 2.70E-05 S100A9 NM_002965 9.98E+00 8.91E+00 1.08E+01 -1.90E+00 5.58E-05 1.87E-04 CAPG NM_001747 3.61E+00 2.61E+00 4.39E+00 -1.78E+00 2.60E-07 1.95E-06 CST3 NM_000099 8.74E+00 7.75E+00 9.50E+00 -1.75E+00 1.86E-08 2.47E-07 CDA NM_001785 3.34E+00 2.36E+00 4.10E+00 -1.74E+00 7.01E-07 4.41E-06 AZU1 NM_001700 8.10E+00 7.14E+00 8.85E+00 -1.72E+00 1.02E-05 4.45E-05 TYROBP NM_003332 8.24E+00 7.30E+00 8.98E+00 -1.68E+00 7.37E-10 2.37E-08 LGALS1 NM_002305 9.55E+00 8.61E+00 1.03E+01 -1.67E+00 2.49E-08 3.15E-07 TYROBP NM_198125 6.01E+00 5.07E+00 6.74E+00 -1.66E+00 6.60E-10 2.36E-08 LST1 NM_205838 4.09E+00 3.15E+00 4.81E+00 -1.65E+00 4.88E-09 8.93E-08 CSTA NM_005213 5.85E+00 4.96E+00 6.55E+00 -1.59E+00 1.10E-06 6.35E-06 CCL3 NM_002983 5.08E+00 4.19E+00 5.77E+00 -1.58E+00 1.52E-05 6.12E-05 S100A11 NM_005620 7.00E+00 6.13E+00 7.68E+00 -1.56E+00 7.84E-08 8.00E-07 LGALS3 NM_002306 4.91E+00 4.03E+00 5.58E+00 -1.55E+00 1.60E-07 1.36E-06 NCF1 NM_000265 4.12E+00 3.25E+00 4.80E+00 
-1.54E+00 1.24E-06 7.04E-06 ALOX5AP NM_001629 5.48E+00 4.61E+00 6.15E+00 -1.54E+00 6.64E-08 7.02E-07 HBB NM_000518 8.75E+00 7.89E+00 9.42E+00 -1.53E+00 9.84E-03 1.91E-02 LY86 NM_004271 4.82E+00 3.96E+00 5.48E+00 -1.52E+00 4.18E-06 1.99E-05 C1orf162 NM_174896 5.55E+00 4.69E+00 6.22E+00 -1.52E+00 1.26E-09 3.31E-08 PRTN3 NM_002777 6.36E+00 5.52E+00 7.01E+00 -1.50E+00 1.61E-03 3.63E-03 HBA2 NM_000517 8.15E+00 7.30E+00 8.80E+00 -1.50E+00 1.69E-02 3.06E-02 RNASE6 NM_005615 3.80E+00 2.97E+00 4.45E+00 -1.48E+00 3.81E-06 1.83E-05 TSPO NM_000714 6.97E+00 6.14E+00 7.60E+00 -1.46E+00 7.25E-11 7.12E-09 CFD NM_001928 7.48E+00 6.66E+00 8.11E+00 -1.46E+00 1.85E-04 5.21E-04 CRIP1 NM_001311 7.88E+00 7.06E+00 8.52E+00 -1.46E+00 4.46E-05 1.54E-04 AIF1 NM_001623 5.41E+00 4.60E+00 6.03E+00 -1.43E+00 1.41E-06 7.65E-06 S100A8 NM_002964 8.82E+00 8.02E+00 9.43E+00 -1.41E+00 1.88E-03 4.15E-03 IFI27L2 NM_032036 4.43E+00 3.64E+00 5.05E+00 -1.41E+00 1.25E-09 3.31E-08 IFI30 NM_006332 6.66E+00 5.87E+00 7.27E+00 -1.40E+00 1.66E-05 6.61E-05 CCL5 NM_002985 5.70E+00 4.92E+00 6.31E+00 -1.39E+00 1.55E-07 1.36E-06 CHCHD10 NM_213720 5.08E+00 4.30E+00 5.68E+00 -1.38E+00 8.15E-10 2.46E-08 PYCARD NM_013258 5.74E+00 4.97E+00 6.34E+00 -1.37E+00 5.20E-08 6.01E-07 FCER1G NM_004106 7.82E+00 7.05E+00 8.41E+00 -1.36E+00 1.57E-07 1.36E-06  203 Gene Name RefSeq Transcript ID log2 base mean log2 mean sub-group3 (n=69) log2 mean all other sub-groups (n=89) log2 fold change (sub-group3/all other sub-groups) p val adj p val TYROBP NM_001173514 4.62E+00 3.86E+00 5.21E+00 -1.35E+00 9.03E-07 5.37E-06 LST1 NM_205839 4.47E+00 3.72E+00 5.04E+00 -1.32E+00 1.21E-07 1.12E-06 FCN1 NM_002003 4.03E+00 3.29E+00 4.60E+00 -1.30E+00 1.06E-03 2.53E-03 GRN NM_002087 6.59E+00 5.86E+00 7.16E+00 -1.30E+00 9.37E-07 5.45E-06 S100A10 NM_002966 7.08E+00 6.36E+00 7.65E+00 -1.29E+00 1.16E-04 3.48E-04 IL1RN NM_173842 3.03E+00 2.32E+00 3.58E+00 -1.26E+00 6.57E-05 2.12E-04 KLF4 NM_004235 4.12E+00 3.41E+00 4.67E+00 -1.26E+00 1.28E-04 3.80E-04 LTC4S 
NM_145867 3.88E+00 3.17E+00 4.42E+00 -1.26E+00 1.46E-04 4.22E-04 BLVRB NM_000713 5.24E+00 4.54E+00 5.79E+00 -1.25E+00 4.57E-06 2.16E-05 S100A12 NM_005621 5.49E+00 4.79E+00 6.03E+00 -1.24E+00 6.80E-03 1.36E-02 PLD4 NM_138790 3.80E+00 3.11E+00 4.33E+00 -1.22E+00 1.21E-04 3.62E-04 CAPG NM_001256139 4.28E+00 3.60E+00 4.81E+00 -1.21E+00 3.42E-05 1.24E-04 FTL NM_000146 1.13E+01 1.06E+01 1.18E+01 -1.20E+00 2.16E-10 1.21E-08 NKG7 NM_005601 6.34E+00 5.67E+00 6.85E+00 -1.18E+00 1.59E-05 6.39E-05 PLD3 NM_012268 3.88E+00 3.22E+00 4.39E+00 -1.17E+00 2.15E-05 8.24E-05 S100A4 NM_002961 9.40E+00 8.74E+00 9.91E+00 -1.17E+00 1.19E-07 1.11E-06 SRGN NM_002727 9.74E+00 9.09E+00 1.02E+01 -1.16E+00 3.36E-05 1.22E-04 SCPEP1 NM_021626 4.55E+00 3.91E+00 5.05E+00 -1.14E+00 2.63E-05 9.70E-05 CYBA NM_000101 8.90E+00 8.26E+00 9.39E+00 -1.13E+00 4.08E-10 1.95E-08 ITGB2 NM_000211 5.78E+00 5.14E+00 6.27E+00 -1.13E+00 1.19E-05 5.05E-05 PLAUR NM_002659 5.41E+00 4.77E+00 5.90E+00 -1.12E+00 1.61E-07 1.36E-06 CITED4 NM_133467 5.13E+00 4.50E+00 5.61E+00 -1.11E+00 5.45E-07 3.63E-06 S100A6 NM_014624 8.47E+00 7.85E+00 8.95E+00 -1.09E+00 1.95E-07 1.56E-06 CD68 NM_001251 6.48E+00 5.88E+00 6.95E+00 -1.07E+00 1.25E-05 5.21E-05 BLOC1S1 NM_001487 5.93E+00 5.34E+00 6.38E+00 -1.03E+00 1.72E-13 1.35E-10 C4orf48 NM_001168243 5.15E+00 4.57E+00 5.59E+00 -1.03E+00 2.01E-08 2.64E-07 TRAPPC1 NM_001166621 5.28E+00 4.70E+00 5.73E+00 -1.03E+00 4.40E-09 8.23E-08 G0S2 NM_015714 4.34E+00 3.77E+00 4.78E+00 -1.02E+00 1.41E-02 2.63E-02 GPX1 NM_000581 8.77E+00 8.20E+00 9.21E+00 -1.02E+00 3.71E-08 4.35E-07 CFP NM_001145252 4.81E+00 4.24E+00 5.25E+00 -1.01E+00 1.97E-05 7.67E-05 HCST NM_001007469 6.37E+00 5.80E+00 6.81E+00 -1.01E+00 2.30E-05 8.75E-05 RSL24D1 NM_016304 6.45E+00 7.02E+00 6.01E+00 1.00E+00 2.14E-11 4.21E-09 RPL7 NM_000971 7.51E+00 8.09E+00 7.06E+00 1.03E+00 1.24E-09 3.31E-08  204 Gene Name RefSeq Transcript ID log2 base mean log2 mean sub-group3 (n=69) log2 mean all other sub-groups (n=89) log2 fold change 
(sub-group3/all other sub-groups) p val adj p val RBM39 NM_004902 4.85E+00 5.44E+00 4.40E+00 1.04E+00 2.53E-07 1.95E-06 SLC38A2 NM_018976 5.32E+00 5.91E+00 4.85E+00 1.06E+00 4.22E-11 5.53E-09 HNRNPH1 NM_005520 6.56E+00 7.17E+00 6.08E+00 1.08E+00 9.93E-11 7.89E-09 RPL11 NM_001199802 7.61E+00 8.22E+00 7.13E+00 1.09E+00 1.65E-07 1.38E-06 EIF3E NM_001568 7.46E+00 8.09E+00 6.97E+00 1.12E+00 3.41E-09 6.86E-08 TFRC NM_001128148 4.34E+00 4.97E+00 3.84E+00 1.13E+00 2.30E-05 8.75E-05 AMD1 NM_001634 4.76E+00 5.40E+00 4.26E+00 1.14E+00 2.20E-09 5.23E-08 CCNL1 NM_020307 6.67E+00 7.37E+00 6.13E+00 1.24E+00 1.00E-08 1.58E-07 SRSF11 NM_004768 5.10E+00 5.87E+00 4.50E+00 1.36E+00 3.29E-11 5.18E-09 SF3B1 NM_012433 5.91E+00 6.69E+00 5.31E+00 1.37E+00 6.14E-12 1.61E-09 SF1 NM_001178031 4.46E+00 5.24E+00 3.86E+00 1.38E+00 6.70E-08 7.02E-07 ITM2A NM_004867 5.10E+00 5.89E+00 4.48E+00 1.40E+00 1.70E-03 3.80E-03  mRNA transcripts characteristic of mRNA sub-group 4 Gene Name RefSeq Transcript ID log2 base mean log2 mean sub-group4 (n=30) log2 mean all other sub-groups (n=128) log2 fold change (sub-group4/all other sub-groups) p val adj p val ITM2A NM_004867 5.10E+00 2.46E+00 5.71E+00 -3.25E+00 3.10E-08 2.44E-07 C19orf77 NM_001136503 5.89E+00 3.36E+00 6.48E+00 -3.13E+00 1.62E-09 2.55E-08 HBG1 NM_000559 5.04E+00 2.74E+00 5.58E+00 -2.84E+00 8.19E-05 2.26E-04 HBG2 NM_000184 6.30E+00 4.02E+00 6.84E+00 -2.82E+00 1.93E-04 4.93E-04 CD34 NM_001025109 4.37E+00 2.16E+00 4.89E+00 -2.73E+00 1.46E-05 4.89E-05 IL8 NM_000584 8.04E+00 5.84E+00 8.56E+00 -2.73E+00 4.71E-08 3.37E-07 CXCL2 NM_002089 4.05E+00 1.91E+00 4.55E+00 -2.64E+00 2.40E-08 2.07E-07 GYPC NM_002101 6.09E+00 3.98E+00 6.59E+00 -2.61E+00 1.05E-10 3.59E-09 IGLL1 NM_020070 4.71E+00 2.60E+00 5.21E+00 -2.61E+00 3.72E-05 1.16E-04 CYTL1 NM_018659 3.77E+00 1.70E+00 4.25E+00 -2.56E+00 9.27E-06 3.24E-05 JUP NM_021991 4.88E+00 2.94E+00 5.33E+00 -2.38E+00 4.63E-10 9.34E-09 TSC22D1 NM_006022 3.99E+00 2.13E+00 4.42E+00 -2.29E+00 2.02E-06 8.58E-06 EGFL7 
NM_201446 3.98E+00 2.15E+00 4.41E+00 -2.27E+00 3.67E-05 1.15E-04  205 Gene Name RefSeq Transcript ID log2 base mean log2 mean sub-group4 (n=30) log2 mean all other sub-groups (n=128) log2 fold change (sub-group4/all other sub-groups) p val adj p val C1QTNF4 NM_031909 4.50E+00 2.84E+00 4.89E+00 -2.05E+00 2.18E-05 6.96E-05 MPO NM_000250 7.59E+00 5.97E+00 7.97E+00 -2.00E+00 4.74E-03 8.98E-03 DEFA3 NM_005217 5.06E+00 3.51E+00 5.43E+00 -1.91E+00 8.99E-03 1.58E-02 PRSS57 NM_214710 6.83E+00 5.29E+00 7.19E+00 -1.90E+00 3.27E-04 7.83E-04 TPSAB1 NM_003294 5.27E+00 3.75E+00 5.62E+00 -1.87E+00 1.35E-02 2.29E-02 DEFA1 NM_004084 4.10E+00 2.63E+00 4.45E+00 -1.82E+00 7.71E-03 1.38E-02 AREG NM_001657 3.74E+00 2.27E+00 4.08E+00 -1.81E+00 3.64E-04 8.56E-04 HBD NM_000519 4.92E+00 3.48E+00 5.25E+00 -1.77E+00 6.40E-03 1.17E-02 LGALS3BP NM_005567 3.71E+00 2.29E+00 4.04E+00 -1.75E+00 7.86E-05 2.22E-04 HIF1A NM_001530 5.83E+00 4.42E+00 6.16E+00 -1.74E+00 8.64E-08 5.66E-07 TPSB2 NM_024164 4.76E+00 3.38E+00 5.09E+00 -1.70E+00 5.76E-03 1.07E-02 PTPRCAP NM_005608 5.57E+00 4.19E+00 5.89E+00 -1.70E+00 7.18E-07 3.50E-06 AHSP NM_016633 4.79E+00 3.47E+00 5.10E+00 -1.63E+00 8.78E-03 1.55E-02 GADD45A NM_001924 4.67E+00 3.37E+00 4.98E+00 -1.61E+00 2.51E-10 6.60E-09 TFRC NM_001128148 4.34E+00 3.08E+00 4.63E+00 -1.55E+00 6.26E-06 2.29E-05 CA1 NM_001128831 4.18E+00 2.93E+00 4.47E+00 -1.54E+00 1.77E-02 2.91E-02 NPW NM_001099456 3.33E+00 2.09E+00 3.62E+00 -1.53E+00 6.29E-04 1.41E-03 CCNL1 NM_020307 6.67E+00 5.48E+00 6.95E+00 -1.48E+00 6.19E-07 3.12E-06 GNA15 NM_002068 5.41E+00 4.21E+00 5.69E+00 -1.48E+00 1.50E-08 1.44E-07 RIOK3 NM_003831 4.97E+00 3.85E+00 5.24E+00 -1.39E+00 1.34E-07 8.22E-07 SLC38A2 NM_018976 5.32E+00 4.21E+00 5.58E+00 -1.37E+00 1.20E-09 2.01E-08 IER3 NM_003897 4.38E+00 3.27E+00 4.64E+00 -1.37E+00 2.68E-04 6.57E-04 CSRNP1 NM_033027 6.08E+00 4.98E+00 6.34E+00 -1.36E+00 3.76E-08 2.79E-07 STK17B NM_004226 4.84E+00 3.76E+00 5.09E+00 -1.33E+00 7.47E-06 2.67E-05 IFITM1 NM_003641 6.35E+00 
5.28E+00 6.60E+00 -1.32E+00 2.20E-04 5.59E-04 BNIP3L NM_004331 4.66E+00 3.60E+00 4.91E+00 -1.31E+00 3.44E-08 2.60E-07 SLC2A3 NM_006931 5.44E+00 4.39E+00 5.69E+00 -1.29E+00 1.83E-05 5.97E-05 AMD1 NM_001634 4.76E+00 3.73E+00 5.00E+00 -1.27E+00 3.16E-07 1.75E-06 RBM39 NM_004902 4.85E+00 3.84E+00 5.09E+00 -1.26E+00 1.53E-05 5.09E-05 ARRDC3 NM_020801 4.12E+00 3.14E+00 4.35E+00 -1.21E+00 1.26E-03 2.64E-03 YPEL5 NM_016061 6.30E+00 5.33E+00 6.53E+00 -1.20E+00 9.43E-07 4.36E-06 ELF1 NM_172373 5.53E+00 4.57E+00 5.76E+00 -1.19E+00 3.10E-06 1.24E-05 PIM3 NM_001001852 5.53E+00 4.57E+00 5.76E+00 -1.19E+00 2.45E-06 1.01E-05 ZNF207 NM_001098507 5.93E+00 4.98E+00 6.15E+00 -1.17E+00 2.35E-09 3.55E-08  206 Gene Name RefSeq Transcript ID log2 base mean log2 mean sub-group4 (n=30) log2 mean all other sub-groups (n=128) log2 fold change (sub-group4/all other sub-groups) p val adj p val FBXO7 NM_001033024 4.97E+00 4.02E+00 5.19E+00 -1.17E+00 3.76E-08 2.79E-07 RSL24D1 NM_016304 6.45E+00 5.51E+00 6.67E+00 -1.17E+00 3.31E-08 2.53E-07 SRSF11 NM_004768 5.10E+00 4.16E+00 5.32E+00 -1.16E+00 1.33E-05 4.51E-05 PMAIP1 NM_021127 5.79E+00 4.85E+00 6.01E+00 -1.16E+00 8.18E-05 2.26E-04 RPS24 NM_033022 1.03E+01 9.33E+00 1.05E+01 -1.16E+00 9.49E-09 9.81E-08 SF3B1 NM_012433 5.91E+00 4.98E+00 6.13E+00 -1.15E+00 6.06E-06 2.23E-05 CD69 NM_001781 6.58E+00 5.65E+00 6.80E+00 -1.15E+00 6.40E-03 1.17E-02 PTPRC NM_002838 4.94E+00 4.01E+00 5.16E+00 -1.15E+00 1.88E-05 6.12E-05 DNAJB6 NM_005494 5.44E+00 4.53E+00 5.66E+00 -1.13E+00 1.19E-07 7.46E-07 EIF1B NM_005875 7.07E+00 6.17E+00 7.28E+00 -1.12E+00 1.78E-10 5.17E-09 SFPQ NM_005066 6.31E+00 5.41E+00 6.53E+00 -1.11E+00 2.74E-08 2.24E-07 PELI1 NM_020651 3.99E+00 3.09E+00 4.20E+00 -1.11E+00 2.05E-03 4.19E-03 DUSP2 NM_004418 4.22E+00 3.33E+00 4.43E+00 -1.11E+00 3.87E-04 9.05E-04 SF1 NM_001178031 4.46E+00 3.58E+00 4.67E+00 -1.09E+00 7.06E-04 1.55E-03 HNRNPA1 NM_002136 7.32E+00 6.44E+00 7.53E+00 -1.08E+00 2.51E-09 3.72E-08 SOD2 NM_001024465 5.94E+00 5.07E+00 6.14E+00 
-1.07E+00 3.81E-05 1.18E-04 NCOA4 NM_005437 4.62E+00 3.76E+00 4.82E+00 -1.06E+00 5.15E-03 9.62E-03 PER1 NM_002616 5.44E+00 4.58E+00 5.63E+00 -1.05E+00 1.90E-06 8.12E-06 NAMPT NM_005746 4.46E+00 3.61E+00 4.66E+00 -1.05E+00 2.87E-03 5.68E-03 MKNK2 NM_199054 5.42E+00 4.59E+00 5.61E+00 -1.02E+00 3.19E-08 2.48E-07 EIF3E NM_001568 7.46E+00 6.64E+00 7.65E+00 -1.01E+00 5.57E-06 2.08E-05 ETS2 NM_005239 5.00E+00 4.19E+00 5.20E+00 -1.01E+00 4.16E-05 1.27E-04 CFL1 NM_005507 9.33E+00 1.01E+01 9.14E+00 1.00E+00 2.57E-08 2.13E-07 ATP5J2 NM_001003713 6.46E+00 7.29E+00 6.26E+00 1.02E+00 1.92E-07 1.11E-06 DBI NM_001079862 7.02E+00 7.85E+00 6.82E+00 1.03E+00 2.82E-06 1.15E-05 DYNLL1 NM_003746 5.49E+00 6.33E+00 5.30E+00 1.04E+00 8.11E-05 2.25E-04 ECH1 NM_001398 5.65E+00 6.50E+00 5.45E+00 1.05E+00 6.09E-09 6.84E-08 RNASET2 NM_003730 5.97E+00 6.82E+00 5.77E+00 1.05E+00 3.04E-06 1.23E-05 WAS NM_000377 5.65E+00 6.50E+00 5.45E+00 1.06E+00 7.39E-11 2.77E-09 PGAM1 NM_002629 5.56E+00 6.43E+00 5.36E+00 1.06E+00 5.06E-09 6.21E-08 UCP2 NM_003355 6.84E+00 7.71E+00 6.64E+00 1.07E+00 8.04E-07 3.86E-06 NDUFS7 NM_024407 5.45E+00 6.32E+00 5.24E+00 1.08E+00 6.59E-10 1.23E-08 ATP5J2 NM_004889 5.07E+00 5.95E+00 4.87E+00 1.08E+00 3.28E-06 1.30E-05 PFN1 NM_005022 9.97E+00 1.09E+01 9.76E+00 1.09E+00 1.82E-08 1.66E-07  207 Gene Name RefSeq Transcript ID log2 base mean log2 mean sub-group4 (n=30) log2 mean all other sub-groups (n=128) log2 fold change (sub-group4/all other sub-groups) p val adj p val IFI6 NM_002038 5.83E+00 6.73E+00 5.62E+00 1.11E+00 1.74E-03 3.60E-03 BLOC1S1 NM_001487 5.93E+00 6.83E+00 5.71E+00 1.11E+00 1.78E-10 5.17E-09 CTSD NM_001909 6.86E+00 7.78E+00 6.65E+00 1.13E+00 6.75E-08 4.58E-07 PLAC8 NM_016619 5.33E+00 6.28E+00 5.11E+00 1.17E+00 5.54E-08 3.92E-07 CSTB NM_000100 5.22E+00 6.17E+00 5.00E+00 1.17E+00 8.85E-08 5.75E-07 CCL3 NM_002983 5.08E+00 6.04E+00 4.86E+00 1.18E+00 1.10E-02 1.90E-02 COTL1 NM_021149 7.04E+00 8.00E+00 6.82E+00 1.18E+00 4.53E-07 2.39E-06 BST2 NM_004335 5.47E+00 
6.45E+00 5.25E+00 1.20E+00 4.43E-07 2.35E-06 CNPY3 NM_006586 5.50E+00 6.52E+00 5.26E+00 1.26E+00 6.22E-14 2.44E-11 C4orf48 NM_001168243 5.15E+00 6.16E+00 4.91E+00 1.26E+00 3.07E-08 2.44E-07 TPP1 NM_000391 6.86E+00 7.94E+00 6.61E+00 1.33E+00 6.78E-10 1.24E-08 RAC2 NM_002872 7.45E+00 8.55E+00 7.19E+00 1.35E+00 1.14E-11 7.47E-10 CITED4 NM_133467 5.13E+00 6.23E+00 4.87E+00 1.36E+00 2.94E-07 1.64E-06 CYBA NM_000101 8.90E+00 1.00E+01 8.63E+00 1.38E+00 1.23E-09 2.02E-08 ARPC1B NM_005720 6.73E+00 7.85E+00 6.47E+00 1.38E+00 2.66E-10 6.60E-09 TRAPPC1 NM_001166621 5.28E+00 6.42E+00 5.02E+00 1.41E+00 2.51E-10 6.60E-09 IFI27L2 NM_032036 4.43E+00 5.59E+00 4.16E+00 1.42E+00 5.39E-07 2.79E-06 SPI1 NM_003120 6.49E+00 7.67E+00 6.21E+00 1.46E+00 2.40E-12 2.36E-10 IL1RN NM_173842 3.03E+00 4.22E+00 2.75E+00 1.46E+00 1.42E-04 3.75E-04 CFD NM_001928 7.48E+00 8.70E+00 7.19E+00 1.50E+00 1.14E-03 2.40E-03 LAMTOR2 NM_014017 5.08E+00 6.31E+00 4.79E+00 1.51E+00 8.52E-12 6.09E-10 CLEC11A NM_002975 7.00E+00 8.25E+00 6.71E+00 1.54E+00 3.76E-07 2.04E-06 CFP NM_001145252 4.81E+00 6.07E+00 4.51E+00 1.56E+00 1.03E-06 4.71E-06 CDA NM_001785 3.34E+00 4.62E+00 3.04E+00 1.58E+00 6.34E-04 1.41E-03 IGFBP7 NM_001553 6.15E+00 7.45E+00 5.84E+00 1.61E+00 1.51E-07 9.06E-07 S100A6 NM_014624 8.47E+00 9.78E+00 8.16E+00 1.61E+00 3.76E-09 4.85E-08 CTSA NM_000308 5.59E+00 6.91E+00 5.28E+00 1.63E+00 3.57E-09 4.76E-08 MRPL33 NM_004891 5.81E+00 7.13E+00 5.50E+00 1.63E+00 3.62E-11 1.58E-09 LTC4S NM_145867 3.88E+00 5.21E+00 3.56E+00 1.65E+00 1.70E-04 4.38E-04 ALOX5AP NM_001629 5.48E+00 6.83E+00 5.16E+00 1.67E+00 4.10E-06 1.59E-05 TSPO NM_000714 6.97E+00 8.33E+00 6.65E+00 1.69E+00 3.44E-10 7.51E-09 IFI30 NM_006332 6.66E+00 8.04E+00 6.33E+00 1.71E+00 1.07E-05 3.71E-05 LST1 NM_205839 4.47E+00 5.88E+00 4.14E+00 1.74E+00 6.59E-08 4.50E-07 AZU1 NM_001700 8.10E+00 9.52E+00 7.77E+00 1.75E+00 3.24E-04 7.78E-04 ITGB2 NM_000211 5.78E+00 7.20E+00 5.44E+00 1.76E+00 2.76E-09 4.02E-08  208 Gene Name RefSeq Transcript ID log2 base mean 
log2 mean sub-group4 (n=30) log2 mean all other sub-groups (n=128) log2 fold change (sub-group4/all other sub-groups) p val adj p val LGALS3 NM_002306 4.91E+00 6.33E+00 4.57E+00 1.76E+00 1.85E-07 1.09E-06 NKG7 NM_005601 6.34E+00 7.80E+00 5.99E+00 1.81E+00 1.52E-08 1.44E-07 LST1 NM_205838 4.09E+00 5.56E+00 3.74E+00 1.82E+00 4.28E-07 2.29E-06 NRGN NM_001126181 3.76E+00 5.25E+00 3.41E+00 1.84E+00 7.13E-06 2.57E-05 CHCHD10 NM_213720 5.08E+00 6.64E+00 4.71E+00 1.93E+00 2.20E-11 1.15E-09 S100A4 NM_002961 9.40E+00 1.10E+01 9.03E+00 1.93E+00 6.95E-11 2.73E-09 CD68 NM_001251 6.48E+00 8.07E+00 6.11E+00 1.96E+00 2.98E-10 6.69E-09 PYCARD NM_013258 5.74E+00 7.33E+00 5.37E+00 1.96E+00 3.56E-12 3.11E-10 S100A11 NM_005620 7.00E+00 8.62E+00 6.62E+00 1.99E+00 6.19E-08 4.35E-07 TYROBP NM_003332 8.24E+00 9.86E+00 7.87E+00 2.00E+00 4.63E-10 9.34E-09 FCER1G NM_004106 7.82E+00 9.46E+00 7.43E+00 2.03E+00 5.93E-09 6.75E-08 FCN1 NM_002003 4.03E+00 5.68E+00 3.64E+00 2.03E+00 2.46E-05 7.76E-05 TYROBP NM_198125 6.01E+00 7.67E+00 5.62E+00 2.05E+00 2.66E-10 6.60E-09 CCL5 NM_002985 5.70E+00 7.36E+00 5.31E+00 2.05E+00 5.62E-09 6.60E-08 AIF1 NM_001623 5.41E+00 7.11E+00 5.01E+00 2.11E+00 8.92E-09 9.35E-08 C1orf162 NM_174896 5.55E+00 7.29E+00 5.14E+00 2.15E+00 7.95E-13 1.56E-10 CAPG NM_001256139 4.28E+00 6.03E+00 3.87E+00 2.16E+00 1.37E-10 4.48E-09 SRGN NM_002727 9.74E+00 1.15E+01 9.33E+00 2.16E+00 5.26E-09 6.26E-08 SCPEP1 NM_021626 4.55E+00 6.34E+00 4.14E+00 2.20E+00 2.72E-11 1.26E-09 BRE NM_199191 3.78E+00 5.57E+00 3.36E+00 2.22E+00 2.69E-10 6.60E-09 S100A10 NM_002966 7.08E+00 8.88E+00 6.66E+00 2.22E+00 1.48E-08 1.44E-07 KLF4 NM_004235 4.12E+00 5.92E+00 3.70E+00 2.22E+00 5.38E-07 2.79E-06 PLD3 NM_001031696 4.46E+00 6.28E+00 4.04E+00 2.25E+00 1.49E-12 2.29E-10 CSTA NM_005213 5.85E+00 7.70E+00 5.42E+00 2.27E+00 3.72E-09 4.85E-08 CTSG NM_001911 5.26E+00 7.13E+00 4.82E+00 2.31E+00 2.63E-04 6.49E-04 PLD4 NM_138790 3.80E+00 5.71E+00 3.35E+00 2.37E+00 5.32E-10 1.05E-08 CST3 NM_000099 8.74E+00 1.07E+01 
8.28E+00 2.41E+00 3.54E-10 7.51E-09 NCF1 NM_000265 4.12E+00 6.08E+00 3.67E+00 2.42E+00 4.92E-09 6.14E-08 GRN NM_002087 6.59E+00 8.55E+00 6.13E+00 2.42E+00 2.37E-11 1.17E-09 CTSH NM_004390 3.63E+00 5.59E+00 3.17E+00 2.42E+00 2.85E-10 6.69E-09 CRIP1 NM_001311 7.88E+00 9.86E+00 7.42E+00 2.44E+00 2.54E-08 2.12E-07 ANXA5 NM_001154 4.25E+00 6.23E+00 3.78E+00 2.44E+00 1.94E-11 1.09E-09 LGALS1 NM_002305 9.55E+00 1.17E+01 9.05E+00 2.64E+00 2.08E-12 2.33E-10 RNASE6 NM_005615 3.80E+00 5.96E+00 3.30E+00 2.67E+00 1.50E-10 4.73E-09 PLD3 NM_012268 3.88E+00 6.12E+00 3.36E+00 2.76E+00 1.75E-12 2.29E-10  209 Gene Name RefSeq Transcript ID log2 base mean log2 mean sub-group4 (n=30) log2 mean all other sub-groups (n=128) log2 fold change (sub-group4/all other sub-groups) p val adj p val RNASE3 NM_002935 4.79E+00 7.23E+00 4.22E+00 3.01E+00 1.05E-08 1.03E-07 LY86 NM_004271 4.82E+00 7.27E+00 4.24E+00 3.02E+00 1.83E-13 4.80E-11 CAPG NM_001747 3.61E+00 6.33E+00 2.97E+00 3.36E+00 9.99E-15 7.85E-12 RNASE2 NM_002934 7.32E+00 1.01E+01 6.67E+00 3.44E+00 1.57E-11 9.52E-10  mRNA transcripts characteristic of mRNA sub-group 5 Gene Name RefSeq Transcript ID log2 base mean log2 mean sub-group5 (n=23) log2 mean all other sub-groups (n=135) log2 fold change (sub-group5/all other sub-groups) p val adj p val TFRC NM_001128148 4.34E+00 2.50E+00 4.65E+00 -2.15E+00 1.59E-09 2.59E-08 HBG1 NM_000559 5.04E+00 3.25E+00 5.34E+00 -2.10E+00 1.45E-02 2.84E-02 RPL11 NM_001199802 7.61E+00 5.86E+00 7.91E+00 -2.05E+00 4.72E-04 1.37E-03 CA1 NM_001128831 4.18E+00 2.48E+00 4.47E+00 -1.99E+00 1.00E-02 2.06E-02 HBG2 NM_000184 6.30E+00 4.65E+00 6.58E+00 -1.93E+00 2.30E-02 4.30E-02 SF3B1 NM_012433 5.91E+00 4.46E+00 6.16E+00 -1.69E+00 2.58E-07 2.11E-06 SRSF11 NM_004768 5.10E+00 3.67E+00 5.34E+00 -1.67E+00 1.88E-07 1.62E-06 CCNL1 NM_020307 6.67E+00 5.40E+00 6.89E+00 -1.49E+00 1.12E-06 7.49E-06 DDX5 NM_004396 7.09E+00 5.87E+00 7.29E+00 -1.42E+00 3.55E-08 3.93E-07 SRGN NM_002727 9.74E+00 8.57E+00 9.94E+00 -1.37E+00 3.48E-04 
1.05E-03 HSP90AA1 NM_005348 6.78E+00 5.64E+00 6.97E+00 -1.33E+00 4.93E-06 2.50E-05 EIF4A2 NM_001967 6.57E+00 5.45E+00 6.76E+00 -1.30E+00 7.60E-08 7.86E-07 CAT NM_001752 6.32E+00 5.22E+00 6.51E+00 -1.29E+00 4.99E-06 2.51E-05 CD164 NM_006016 4.87E+00 3.79E+00 5.06E+00 -1.27E+00 7.81E-08 7.87E-07 EIF3E NM_001568 7.46E+00 6.42E+00 7.64E+00 -1.22E+00 9.43E-06 4.44E-05 HNRNPH1 NM_005520 6.56E+00 5.52E+00 6.73E+00 -1.22E+00 6.53E-08 6.85E-07 NUCB2 NM_005013 5.72E+00 4.71E+00 5.90E+00 -1.18E+00 6.97E-07 4.89E-06 TMEM123 NM_052932 5.82E+00 4.83E+00 5.99E+00 -1.16E+00 2.21E-06 1.27E-05 EEF1A1 NM_001402 9.95E+00 9.01E+00 1.01E+01 -1.10E+00 6.27E-04 1.76E-03 SF1 NM_001178031 4.46E+00 3.52E+00 4.62E+00 -1.10E+00 8.52E-04 2.33E-03  210 Gene Name RefSeq Transcript ID log2 base mean log2 mean sub-group5 (n=23) log2 mean all other sub-groups (n=135) log2 fold change (sub-group5/all other sub-groups) p val adj p val HMGB1 NM_002128 5.85E+00 4.91E+00 6.01E+00 -1.10E+00 2.83E-08 3.42E-07 BNIP3L NM_004331 4.66E+00 3.75E+00 4.81E+00 -1.07E+00 2.66E-05 1.10E-04 RBM39 NM_004902 4.85E+00 3.96E+00 5.00E+00 -1.05E+00 2.13E-04 6.93E-04 FAM46A NM_017633 4.13E+00 3.24E+00 4.28E+00 -1.04E+00 3.38E-04 1.03E-03 LCP1 NM_002298 6.86E+00 5.97E+00 7.01E+00 -1.03E+00 1.27E-05 5.66E-05 EIF4G2 NM_001418 6.97E+00 6.09E+00 7.11E+00 -1.02E+00 5.76E-07 4.15E-06 HSP90B1 NM_003299 6.69E+00 5.81E+00 6.84E+00 -1.02E+00 1.34E-07 1.20E-06 EMB NM_198449 4.73E+00 3.86E+00 4.88E+00 -1.02E+00 1.26E-04 4.34E-04 RPL7 NM_000971 7.51E+00 6.65E+00 7.66E+00 -1.01E+00 1.03E-05 4.78E-05 CD69 NM_001781 6.58E+00 5.72E+00 6.73E+00 -1.01E+00 2.72E-03 6.50E-03 TYROBP NM_003332 8.24E+00 9.11E+00 8.10E+00 1.01E+00 6.45E-03 1.39E-02 GNB2L1 NM_006098 1.09E+01 1.18E+01 1.08E+01 1.01E+00 1.70E-11 5.97E-10 C4orf48 NM_001168243 5.15E+00 6.01E+00 5.00E+00 1.01E+00 1.09E-04 3.82E-04 SERPING1 NM_001032295 3.57E+00 4.44E+00 3.42E+00 1.02E+00 1.37E-02 2.70E-02 DUSP2 NM_004418 4.22E+00 5.09E+00 4.08E+00 1.02E+00 1.52E-03 3.86E-03 RPS19 
NM_001022 1.02E+01 1.11E+01 1.00E+01 1.03E+00 7.68E-12 3.36E-10
RPL29 NM_000992 1.09E+01 1.18E+01 1.08E+01 1.03E+00 4.34E-13 4.13E-11
PTRHD1 NM_001013663 6.32E+00 7.21E+00 6.17E+00 1.04E+00 6.91E-11 1.87E-09
RPL27 NM_000988 1.10E+01 1.19E+01 1.09E+01 1.05E+00 2.48E-11 7.79E-10
RPL38 NM_001035258 9.92E+00 1.08E+01 9.77E+00 1.07E+00 1.75E-11 5.97E-10
HIST1H2BD NM_021063 5.60E+00 6.53E+00 5.45E+00 1.09E+00 4.34E-03 9.84E-03
SH3BGRL3 NM_031286 9.72E+00 1.06E+01 9.56E+00 1.09E+00 1.21E-05 5.46E-05
RPS11 NM_001015 1.10E+01 1.19E+01 1.08E+01 1.09E+00 3.70E-12 2.08E-10
PRSS57 NM_214710 6.83E+00 7.77E+00 6.67E+00 1.10E+00 2.39E-03 5.83E-03
CST3 NM_000099 8.74E+00 9.68E+00 8.58E+00 1.10E+00 7.10E-03 1.51E-02
MIF NM_002415 9.61E+00 1.06E+01 9.45E+00 1.11E+00 1.17E-07 1.07E-06
RPLP1 NM_001003 1.03E+01 1.13E+01 1.01E+01 1.14E+00 1.40E-09 2.39E-08
HIST2H2AC NM_003517 4.48E+00 5.45E+00 4.31E+00 1.14E+00 4.12E-04 1.22E-03
MGST1 NM_020300 5.66E+00 6.64E+00 5.49E+00 1.15E+00 2.49E-02 4.59E-02
FXYD5 NM_014164 8.85E+00 9.87E+00 8.68E+00 1.19E+00 2.12E-09 3.27E-08
RUNX1 NM_001122607 5.39E+00 6.41E+00 5.21E+00 1.20E+00 3.34E-05 1.31E-04
NR4A1 NM_173157 3.99E+00 5.01E+00 3.81E+00 1.20E+00 1.11E-03 2.92E-03
CLEC7A NM_197954 3.29E+00 4.31E+00 3.11E+00 1.20E+00 1.02E-02 2.10E-02
RPL32 NM_001007073 5.30E+00 6.35E+00 5.12E+00 1.23E+00 6.71E-09 9.26E-08
HIST1H1E NM_005321 5.06E+00 6.13E+00 4.88E+00 1.25E+00 2.20E-04 7.05E-04
RPL28 NM_000991 7.68E+00 8.76E+00 7.50E+00 1.27E+00 7.96E-10 1.53E-08
ATP5J2 NM_001039178 5.16E+00 6.25E+00 4.98E+00 1.27E+00 2.08E-06 1.20E-05
GYPC NM_002101 6.09E+00 7.18E+00 5.91E+00 1.28E+00 1.11E-03 2.92E-03
RPL31 NM_000993 1.16E+01 1.27E+01 1.14E+01 1.28E+00 2.67E-12 1.62E-10
RPS28 NM_001031 7.95E+00 9.05E+00 7.76E+00 1.29E+00 7.15E-05 2.59E-04
LY6E NM_002346 6.20E+00 7.30E+00 6.01E+00
1.29E+00 3.17E-05 1.26E-04
CD99 NM_002414 7.79E+00 8.90E+00 7.60E+00 1.30E+00 5.31E-08 5.80E-07
RPL38 NM_000999 1.18E+01 1.29E+01 1.16E+01 1.31E+00 6.66E-11 1.87E-09
HSPB1 NM_001540 7.09E+00 8.21E+00 6.89E+00 1.31E+00 2.40E-04 7.60E-04
RPL35 NM_007209 1.08E+01 1.20E+01 1.06E+01 1.32E+00 9.46E-13 7.44E-11
S100A10 NM_002966 7.08E+00 8.21E+00 6.89E+00 1.32E+00 4.88E-03 1.08E-02
HCST NM_014266 7.75E+00 8.90E+00 7.56E+00 1.34E+00 1.53E-05 6.77E-05
RPS16 NM_001020 1.12E+01 1.24E+01 1.10E+01 1.34E+00 1.17E-12 8.39E-11
SAP30 NM_003864 4.02E+00 5.18E+00 3.83E+00 1.35E+00 1.44E-06 9.17E-06
HLA-DMB NM_002118 4.31E+00 5.47E+00 4.11E+00 1.36E+00 6.06E-04 1.72E-03
FTH1 NM_002032 1.07E+01 1.19E+01 1.05E+01 1.37E+00 5.39E-05 2.05E-04
LY6E NM_001127213 4.72E+00 5.91E+00 4.51E+00 1.40E+00 3.29E-06 1.78E-05
JUP NM_021991 4.88E+00 6.08E+00 4.67E+00 1.41E+00 1.28E-04 4.37E-04
SYNGR1 NM_145731 4.87E+00 6.07E+00 4.66E+00 1.41E+00 1.05E-07 9.74E-07
HCST NM_001007469 6.37E+00 7.57E+00 6.16E+00 1.41E+00 9.65E-06 4.52E-05
HLA-DMA NM_006120 6.37E+00 7.58E+00 6.17E+00 1.41E+00 1.00E-06 6.81E-06
ITM2A NM_004867 5.10E+00 6.32E+00 4.89E+00 1.44E+00 6.59E-03 1.42E-02
RPL18 NM_000979 1.13E+01 1.25E+01 1.10E+01 1.49E+00 4.73E-13 4.13E-11
RPS21 NM_001024 1.12E+01 1.25E+01 1.09E+01 1.53E+00 2.56E-13 3.46E-11
TYROBP NM_001173514 4.62E+00 5.93E+00 4.39E+00 1.53E+00 7.66E-06 3.67E-05
HLA-DPA1 NM_033554 7.23E+00 8.55E+00 7.01E+00 1.54E+00 2.87E-05 1.16E-04
HLA-DPB1 NM_002121 5.32E+00 6.64E+00 5.10E+00 1.55E+00 3.84E-06 2.04E-05
TMSB10 NM_021103 1.15E+01 1.28E+01 1.12E+01 1.55E+00 1.70E-07 1.50E-06
TPT1 NM_003295 1.25E+01 1.39E+01 1.23E+01 1.56E+00 5.85E-09 8.36E-08
HIST1H2BG NM_003518 3.90E+00 5.27E+00 3.67E+00 1.60E+00 1.35E-03 3.47E-03
RPL28 NM_001136135 1.11E+01 1.25E+01 1.09E+01 1.60E+00 6.24E-12 2.89E-10
IFITM3
NM_021034 6.29E+00 7.67E+00 6.06E+00 1.61E+00 7.73E-04 2.15E-03
FTL NM_000146 1.13E+01 1.27E+01 1.11E+01 1.62E+00 9.17E-08 8.68E-07
C19orf77 NM_001136503 5.89E+00 7.29E+00 5.65E+00 1.64E+00 2.64E-04 8.20E-04
HLA-DQB1 NM_002123 4.64E+00 6.07E+00 4.40E+00 1.68E+00 4.77E-04 1.38E-03
IGLL1 NM_020070 4.71E+00 6.15E+00 4.47E+00 1.69E+00 4.55E-03 1.02E-02
CRIP1 NM_001311 7.88E+00 9.34E+00 7.64E+00 1.70E+00 2.47E-04 7.76E-04
PTPRCAP NM_005608 5.57E+00 7.04E+00 5.32E+00 1.72E+00 8.14E-08 8.10E-07
HIST1H2AE NM_021052 4.69E+00 6.23E+00 4.42E+00 1.80E+00 4.10E-05 1.59E-04
HLA-DQA1 NM_002122 4.08E+00 5.63E+00 3.82E+00 1.81E+00 3.18E-04 9.79E-04
CD52 NM_001803 7.82E+00 9.39E+00 7.56E+00 1.84E+00 1.45E-04 4.93E-04
TSC22D1 NM_006022 3.99E+00 5.57E+00 3.72E+00 1.85E+00 1.99E-04 6.58E-04
AREG NM_001657 3.74E+00 5.37E+00 3.46E+00 1.91E+00 1.03E-03 2.77E-03
HLA-DRB5 NM_002125 4.48E+00 6.13E+00 4.20E+00 1.93E+00 4.35E-05 1.67E-04
C1QTNF4 NM_031909 4.50E+00 6.16E+00 4.22E+00 1.94E+00 2.35E-04 7.47E-04
CEBPD NM_005195 5.47E+00 7.16E+00 5.18E+00 1.98E+00 2.94E-05 1.18E-04
CD44 NM_001202557 4.60E+00 6.29E+00 4.31E+00 1.98E+00 9.08E-08 8.68E-07
GTSF1 NM_144594 4.27E+00 5.97E+00 3.99E+00 1.98E+00 3.53E-06 1.90E-05
RPL36 NM_033643 1.16E+01 1.33E+01 1.14E+01 1.99E+00 6.84E-14 2.47E-11
RPL30 NM_000989 1.23E+01 1.40E+01 1.20E+01 2.00E+00 9.44E-14 2.47E-11
EGFL7 NM_201446 3.98E+00 5.74E+00 3.68E+00 2.06E+00 4.44E-06 2.31E-05
RPL28 NM_001136136 6.30E+00 8.07E+00 6.00E+00 2.07E+00 4.81E-06 2.46E-05
MDK NM_002391 4.09E+00 5.87E+00 3.78E+00 2.08E+00 2.61E-07 2.12E-06
RPS29 NM_001032 1.24E+01 1.43E+01 1.21E+01 2.17E+00 1.31E-13 2.58E-11
CYTL1 NM_018659 3.77E+00 5.65E+00 3.45E+00 2.20E+00 1.67E-04 5.61E-04
RPLP2 NM_001004 1.29E+01 1.50E+01 1.26E+01 2.37E+00 2.64E-13 3.46E-11
RPLP1 NM_213725 1.16E+01 1.36E+01 1.12E+01 2.40E+00 5.93E-12 2.89E-10
MPO NM_000250 7.59E+00 9.76E+00 7.22E+00 2.54E+00 1.00E-04 3.55E-04
CD34 NM_001025109 4.37E+00 6.58E+00 4.00E+00 2.58E+00 1.00E-04 3.55E-04
RPL37A NM_000998
1.31E+01 1.53E+01 1.27E+01 2.61E+00 8.51E-14 2.47E-11
NPW NM_001099456 3.33E+00 5.72E+00 2.92E+00 2.80E+00 1.03E-08 1.37E-07
TPSB2 NM_024164 4.76E+00 7.40E+00 4.31E+00 3.08E+00 5.84E-06 2.89E-05
TPSAB1 NM_003294 5.27E+00 8.29E+00 4.75E+00 3.54E+00 1.38E-06 8.99E-06

Relapse samples compared with primary samples
Gene Name  mRNA Transcript  log2 base mean  log2 mean Relapse (n=47)  log2 mean Primary (n=158)  log2 fold change (Relapse/Primary)  p val  adj p val
RPL28 NM_001136136 5.69E+00 3.66E+00 6.30E+00 -2.64E+00 8.22E-13 1.75E-08
TPSB2 NM_024164 4.34E+00 2.93E+00 4.76E+00 -1.84E+00 3.00E-04 7.91E-03
TPSAB1 NM_003294 4.86E+00 3.50E+00 5.27E+00 -1.77E+00 7.65E-04 1.48E-02
HIST1H2BG NM_003518 3.51E+00 2.17E+00 3.90E+00 -1.73E+00 7.60E-06 6.98E-04
TPPP3 NM_015964 2.61E+00 1.35E+00 2.99E+00 -1.64E+00 2.28E-04 6.65E-03
HIST1H2AE NM_021052 4.32E+00 3.11E+00 4.69E+00 -1.58E+00 4.25E-06 4.94E-04
HIST1H1D NM_005320 2.56E+00 1.37E+00 2.92E+00 -1.55E+00 3.71E-06 4.54E-04
HIST1H1E NM_005321 4.73E+00 3.63E+00 5.06E+00 -1.43E+00 5.57E-08 3.39E-05
HSPA1B NM_005346 3.98E+00 2.89E+00 4.30E+00 -1.41E+00 1.98E-05 1.30E-03
HIST1H3D NM_003530 2.56E+00 1.48E+00 2.88E+00 -1.40E+00 9.44E-07 1.81E-04
HIST1H4E NM_003545 2.24E+00 1.17E+00 2.56E+00 -1.39E+00 1.33E-07 5.43E-05
CRIP1 NM_001311 7.57E+00 6.52E+00 7.88E+00 -1.36E+00 3.91E-05 2.05E-03
HIST1H2AD NM_021065 2.19E+00 1.16E+00 2.50E+00 -1.34E+00 4.97E-07 1.21E-04
HIST1H1C NM_005319 6.94E+00 5.90E+00 7.24E+00 -1.34E+00 2.77E-08 2.56E-05
HIST1H2BD NM_021063 5.30E+00 4.27E+00 5.60E+00 -1.33E+00 1.53E-06 2.44E-04
HSPA1A NM_005345 4.19E+00 3.17E+00 4.50E+00 -1.33E+00 2.48E-05 1.53E-03
S100A10 NM_002966 6.78E+00 5.76E+00 7.08E+00 -1.32E+00 5.88E-05 2.83E-03
CRIP2 NM_001312 2.19E+00 1.20E+00 2.49E+00 -1.28E+00 7.39E-04 1.44E-02
HIST1H1B NM_005322 1.59E+00 6.19E-01
1.88E+00 -1.27E+00 6.49E-06 6.28E-04
HIST1H2AG NM_021064 2.37E+00 1.40E+00 2.66E+00 -1.26E+00 1.98E-05 1.30E-03
MID1IP1 NM_001098790 2.75E+00 1.80E+00 3.03E+00 -1.24E+00 3.75E-07 9.84E-05
RHOC NM_001042678 2.79E+00 1.84E+00 3.07E+00 -1.23E+00 4.64E-06 5.17E-04
HIST2H2AB NM_175065 1.32E+00 3.79E-01 1.60E+00 -1.22E+00 6.60E-09 1.02E-05
RHOB NM_004040 5.91E+00 4.97E+00 6.19E+00 -1.22E+00 7.02E-05 3.11E-03
HIST1H3H NM_003536 2.53E+00 1.60E+00 2.81E+00 -1.21E+00 1.47E-05 1.05E-03
HIST1H2BF NM_003522 2.27E+00 1.33E+00 2.54E+00 -1.21E+00 8.25E-06 7.22E-04
CHD2 NM_001042572 2.04E+00 1.13E+00 2.31E+00 -1.18E+00 8.85E-08 4.28E-05
PHLDA2 NM_003311 2.07E+00 1.17E+00 2.33E+00 -1.17E+00 1.86E-05 1.24E-03
HIST2H2BF NM_001161334 1.70E+00 8.03E-01 1.97E+00 -1.17E+00 8.10E-08 4.20E-05
SNAI1 NM_005985 2.10E+00 1.22E+00 2.36E+00 -1.15E+00 2.94E-05 1.69E-03
HIST2H2AA3 NM_001040874 8.84E+00 7.96E+00 9.10E+00 -1.15E+00 3.81E-06 4.55E-04
TPSD1 NM_012217 1.70E+00 8.25E-01 1.96E+00 -1.13E+00 1.61E-03 2.40E-02
HIST1H2AH NM_080596 1.84E+00 9.64E-01 2.10E+00 -1.13E+00 8.41E-05 3.54E-03
RRAS NM_006270 2.23E+00 1.36E+00 2.49E+00 -1.13E+00 2.65E-05 1.58E-03
GPX1 NM_201397 3.66E+00 2.80E+00 3.92E+00 -1.12E+00 7.79E-07 1.62E-04
HIST2H2AC NM_003517 4.22E+00 3.37E+00 4.48E+00 -1.11E+00 2.79E-06 3.69E-04
CCL3L1 NM_021006 2.49E+00 1.64E+00 2.75E+00 -1.10E+00 4.24E-03 4.45E-02
C1orf38 NM_004848 1.37E+00 5.26E-01 1.63E+00 -1.10E+00 3.52E-05 1.88E-03
TYROBP NM_001173515 3.22E+00 2.38E+00 3.47E+00 -1.09E+00 1.79E-04 5.71E-03
HIST2H2BE NM_003528 5.08E+00 4.25E+00 5.33E+00 -1.07E+00 2.04E-05 1.32E-03
HIST2H3D NM_001123375 1.29E+00 4.68E-01 1.53E+00 -1.07E+00 3.64E-06 4.50E-04
ZDHHC1 NM_013304 1.01E+00 2.01E-01 1.25E+00 -1.04E+00 6.53E-07 1.42E-04
HIST1H2AI NM_003509 1.80E+00 1.01E+00 2.03E+00 -1.02E+00 1.10E-04 4.20E-03
TMEM107 NM_183065 2.29E+00 1.51E+00
2.52E+00 -1.01E+00 4.48E-06 5.04E-04
H2AFJ NM_177925 2.51E+00 1.74E+00 2.74E+00 -1.00E+00 3.43E-05 1.85E-03
ODF3B NM_001014440 2.76E+00 1.99E+00 2.99E+00 -1.00E+00 9.86E-05 3.94E-03
TSPAN13 NM_014399 2.02E+00 2.79E+00 1.79E+00 1.01E+00 5.03E-06 5.40E-04
NUSAP1 NM_001243143 2.43E+00 3.21E+00 2.20E+00 1.01E+00 2.10E-08 2.35E-05
HMBS NM_001024382 1.94E+00 2.76E+00 1.70E+00 1.06E+00 7.81E-04 1.50E-02
GYPB NM_002100 2.35E+00 3.17E+00 2.11E+00 1.06E+00 3.40E-03 3.87E-02
SF1 NM_001178031 4.71E+00 5.53E+00 4.46E+00 1.07E+00 1.04E-04 4.05E-03
SOCS2 NM_003877 2.90E+00 3.73E+00 2.66E+00 1.07E+00 3.11E-03 3.65E-02
KIF11 NM_004523 1.26E+00 2.11E+00 1.01E+00 1.09E+00 1.45E-09 5.80E-06
CDK1 NM_001786 1.72E+00 2.60E+00 1.46E+00 1.14E+00 2.70E-07 8.37E-05
CD79B NM_021602 2.11E+00 2.99E+00 1.84E+00 1.15E+00 1.85E-04 5.87E-03
CD79A NM_001783 4.30E+00 5.21E+00 4.03E+00 1.18E+00 1.94E-05 1.28E-03
SET NM_001248001 3.70E+00 4.62E+00 3.43E+00 1.19E+00 1.50E-03 2.30E-02
FAM178B NM_001172667 1.32E+00 2.25E+00 1.04E+00 1.22E+00 6.01E-05 2.86E-03
GYPA NM_002099 1.73E+00 2.69E+00 1.45E+00 1.24E+00 8.91E-05 3.68E-03
TOP2A NM_001067 1.34E+00 2.30E+00 1.05E+00 1.24E+00 1.91E-08 2.26E-05
IGJ NM_144646 2.89E+00 3.90E+00 2.59E+00 1.31E+00 2.59E-05 1.55E-03
HBM NM_001003938 4.09E+00 5.10E+00 3.79E+00 1.31E+00 4.46E-03 4.61E-02
CCL2 NM_002982 1.58E+00 2.61E+00 1.27E+00 1.33E+00 1.38E-03 2.19E-02
SLC4A1 NM_000342 2.44E+00 3.55E+00 2.12E+00 1.43E+00 1.88E-04 5.91E-03
CA1 NM_001128831 4.52E+00 5.67E+00 4.18E+00 1.49E+00 4.79E-03 4.82E-02
AHSP NM_016633 5.14E+00 6.30E+00 4.79E+00 1.51E+00 2.37E-03 3.07E-02
VPREB3 NM_013378 1.27E+00 2.45E+00 9.12E-01 1.54E+00 9.34E-07 1.81E-04
VPREB1 NM_007128 1.21E+00 2.48E+00 8.28E-01 1.65E+00 2.26E-08 2.41E-05
IGLL1 NM_020070 5.09E+00 6.37E+00 4.71E+00 1.66E+00 2.03E-04 6.18E-03

Refractory samples compared with primary samples
Gene Name  mRNA Transcript  log2 base mean  log2 mean Refractory (n=12)  log2 mean Primary (n=19)  log2 fold change (Refractory/Primary)  p val  adj p val
DEFA3 NM_005217 5.49E+00 3.46E+00 6.77E+00 -3.32E+00 3.43E-02 6.74E-01
HBA2 NM_000517 8.88E+00 7.16E+00 9.97E+00 -2.81E+00 5.93E-02 6.74E-01
DEFA1 NM_004084 4.97E+00 3.25E+00 6.05E+00 -2.80E+00 4.61E-02 6.74E-01
HBB NM_000518 9.46E+00 8.04E+00 1.04E+01 -2.31E+00 1.28E-01 6.81E-01
HBA1 NM_000558 9.50E+00 8.14E+00 1.04E+01 -2.22E+00 1.50E-01 6.81E-01
DEFA4 NM_001925 2.61E+00 1.27E+00 3.46E+00 -2.18E+00 9.29E-02 6.76E-01
RPL11 NM_000975 7.40E+00 6.08E+00 8.23E+00 -2.15E+00 2.07E-01 7.13E-01
CLC NM_001828 2.22E+00 1.03E+00 2.98E+00 -1.95E+00 4.54E-02 6.74E-01
LCN2 NM_005564 1.84E+00 6.85E-01 2.57E+00 -1.88E+00 1.07E-02 6.74E-01
PRG2 NM_002728 1.97E+00 8.72E-01 2.67E+00 -1.80E+00 4.32E-02 6.74E-01
DYNLL1 NM_003746 3.92E+00 2.82E+00 4.61E+00 -1.79E+00 1.41E-02 6.74E-01
TYROBP NM_003332 5.35E+00 4.30E+00 6.01E+00 -1.72E+00 1.23E-01 6.81E-01
TSC22D3 NM_004089 3.82E+00 2.79E+00 4.47E+00 -1.67E+00 1.41E-02 6.74E-01
LY6E NM_002346 2.18E+00 1.22E+00 2.78E+00 -1.56E+00 1.63E-02 6.74E-01
CKLF NM_016951 2.08E+00 1.13E+00 2.68E+00 -1.55E+00 7.55E-03 6.74E-01
CLEC11A NM_002975 2.85E+00 1.93E+00 3.43E+00 -1.50E+00 6.95E-02 6.74E-01
PKM2 NM_002654 3.22E+00 2.31E+00 3.80E+00 -1.50E+00 9.37E-03 6.74E-01
PYCARD NM_013258 2.84E+00 1.94E+00 3.41E+00 -1.48E+00 3.23E-02 6.74E-01
IGFBP7 NM_001553 3.36E+00 2.48E+00 3.92E+00 -1.45E+00 6.19E-03 6.74E-01
GSTK1 NM_015917 3.89E+00 3.00E+00 4.45E+00 -1.45E+00 2.95E-02 6.74E-01
STMN1 NM_005563 3.37E+00 2.52E+00 3.90E+00 -1.38E+00 1.43E-01 6.81E-01
RNASE3 NM_002935 3.94E+00 3.10E+00 4.47E+00 -1.37E+00 3.19E-01 7.33E-01
PCBD1 NM_000281 2.49E+00 1.67E+00 3.01E+00 -1.35E+00 1.66E-03 6.74E-01
RPS4Y1 NM_001008 4.13E+00 3.30E+00 4.65E+00 -1.35E+00 3.72E-01
7.71E-01
GUK1 NM_001159390 2.16E+00 1.34E+00 2.68E+00 -1.34E+00 1.77E-02 6.74E-01
RAC2 NM_002872 4.47E+00 3.65E+00 4.99E+00 -1.33E+00 2.08E-02 6.74E-01
ZYX NM_001010972 2.15E+00 1.34E+00 2.67E+00 -1.32E+00 4.12E-02 6.74E-01
ARL2 NM_001667 1.53E+00 7.26E-01 2.04E+00 -1.32E+00 3.11E-03 6.74E-01
S100P NM_005980 3.38E+00 2.58E+00 3.88E+00 -1.31E+00 1.86E-01 7.12E-01
HIST2H2AA3 NM_001040874 7.54E+00 6.75E+00 8.05E+00 -1.30E+00 1.20E-01 6.81E-01
HSPA1B NM_005346 1.14E+00 3.43E-01 1.64E+00 -1.30E+00 2.92E-03 6.74E-01
DEFB1 NM_005218 1.06E+00 2.64E-01 1.56E+00 -1.30E+00 3.18E-02 6.74E-01
ACADVL NM_000018 1.83E+00 1.05E+00 2.33E+00 -1.28E+00 5.58E-03 6.74E-01
PSMA4 NM_002789 2.20E+00 1.42E+00 2.70E+00 -1.28E+00 2.12E-02 6.74E-01
LDHA NM_005566 3.45E+00 2.67E+00 3.94E+00 -1.27E+00 3.14E-02 6.74E-01
LY86 NM_004271 2.90E+00 2.13E+00 3.39E+00 -1.27E+00 7.67E-02 6.74E-01
GNAI2 NM_002070 3.60E+00 2.83E+00 4.08E+00 -1.25E+00 1.76E-02 6.74E-01
EIF4EBP1 NM_004095 2.65E+00 1.89E+00 3.13E+00 -1.23E+00 1.58E-02 6.74E-01
CD97 NM_001784 1.85E+00 1.09E+00 2.32E+00 -1.23E+00 2.36E-02 6.74E-01
MBOAT7 NM_024298 2.03E+00 1.28E+00 2.50E+00 -1.22E+00 2.47E-03 6.74E-01
LAT2 NM_014146 1.55E+00 7.98E-01 2.02E+00 -1.22E+00 1.12E-02 6.74E-01
PSMB10 NM_002801 3.01E+00 2.26E+00 3.48E+00 -1.22E+00 8.48E-03 6.74E-01
CLTA NM_001833 2.05E+00 1.31E+00 2.52E+00 -1.21E+00 9.60E-03 6.74E-01
NME2 NM_001018138 1.90E+00 1.16E+00 2.36E+00 -1.21E+00 6.87E-02 6.74E-01
CXorf26 NM_016500 1.20E+00 4.61E-01 1.66E+00 -1.20E+00 4.65E-04 6.74E-01
CD164 NM_006016 2.84E+00 2.10E+00 3.30E+00 -1.20E+00 1.39E-02 6.74E-01
PRDX5 NM_012094 3.31E+00 2.58E+00 3.77E+00 -1.20E+00 6.17E-03 6.74E-01
HSPB1 NM_001540 3.49E+00 2.76E+00 3.96E+00 -1.20E+00 3.07E-03 6.74E-01
ISG15 NM_005101 2.58E+00 1.85E+00 3.04E+00 -1.19E+00 3.67E-02 6.74E-01
EIF3D NM_003753 3.80E+00 3.07E+00 4.26E+00
-1.19E+00 5.64E-03 6.74E-01
TPM3 NM_153649 3.21E+00 2.49E+00 3.67E+00 -1.19E+00 3.84E-02 6.74E-01
LGALS9 NM_002308 2.19E+00 1.46E+00 2.65E+00 -1.18E+00 1.24E-02 6.74E-01
KLF6 NM_001300 2.48E+00 1.76E+00 2.94E+00 -1.18E+00 5.36E-02 6.74E-01
IFITM2 NM_006435 6.08E+00 5.36E+00 6.54E+00 -1.17E+00 2.84E-02 6.74E-01
SLPI NM_003064 1.12E+00 4.02E-01 1.57E+00 -1.17E+00 8.58E-02 6.76E-01
ATP5D NM_001001975 3.67E+00 2.95E+00 4.12E+00 -1.17E+00 2.87E-03 6.74E-01
HSPD1 NM_002156 2.02E+00 1.31E+00 2.47E+00 -1.16E+00 4.79E-02 6.74E-01
RPS24 NM_033022 9.28E+00 8.57E+00 9.73E+00 -1.16E+00 1.10E-01 6.81E-01
HSPA1A NM_005345 1.08E+00 3.70E-01 1.53E+00 -1.16E+00 2.04E-02 6.74E-01
ENO1 NM_001428 5.07E+00 4.36E+00 5.51E+00 -1.15E+00 3.15E-03 6.74E-01
UBE2E1 NM_003341 2.13E+00 1.43E+00 2.57E+00 -1.14E+00 2.45E-02 6.74E-01
SSR4 NM_006280 5.75E+00 5.05E+00 6.19E+00 -1.14E+00 2.54E-02 6.74E-01
SPI1 NM_003120 2.47E+00 1.77E+00 2.91E+00 -1.14E+00 5.84E-02 6.74E-01
HIST1H2BG NM_003518 5.01E+00 4.32E+00 5.45E+00 -1.14E+00 2.01E-01 7.13E-01
RPL41 NM_001035267 5.22E+00 4.53E+00 5.66E+00 -1.13E+00 7.10E-02 6.74E-01
PPP2R1A NM_014225 1.56E+00 8.71E-01 2.00E+00 -1.13E+00 8.52E-03 6.74E-01
ARRB2 NM_004313 1.51E+00 8.19E-01 1.95E+00 -1.13E+00 9.71E-02 6.76E-01
AP2S1 NM_004069 3.05E+00 2.36E+00 3.49E+00 -1.12E+00 3.14E-02 6.74E-01
NCF4 NM_000631 1.13E+00 4.43E-01 1.56E+00 -1.12E+00 1.93E-02 6.74E-01
ATG16L2 NM_033388 1.32E+00 6.41E-01 1.76E+00 -1.12E+00 5.21E-03 6.74E-01
TSC22D3 NM_001015881 1.37E+00 6.86E-01 1.80E+00 -1.11E+00 5.83E-02 6.74E-01
C6orf115 NM_021243 3.69E+00 3.01E+00 4.12E+00 -1.11E+00 3.66E-03 6.74E-01
NME2 NM_002512 4.19E+00 3.51E+00 4.62E+00 -1.11E+00 1.67E-02 6.74E-01
CAPNS1 NM_001003962 3.45E+00 2.77E+00 3.88E+00 -1.11E+00 9.87E-03 6.74E-01
PRDX1 NM_181697 3.98E+00 3.30E+00 4.41E+00 -1.11E+00 1.09E-01 6.81E-01
MRPS28 NM_014018 1.07E+00 4.00E-01 1.50E+00 -1.10E+00 3.90E-02 6.74E-01
FAM195A NM_138418 1.14E+00 4.63E-01 1.56E+00 -1.10E+00 1.21E-02 6.74E-01
MYO1G NM_033054 1.71E+00
1.03E+00 2.13E+00 -1.10E+00 1.66E-02 6.74E-01
CUEDC2 NM_024040 1.55E+00 8.74E-01 1.97E+00 -1.10E+00 7.23E-03 6.74E-01
ITGA5 NM_002205 1.47E+00 7.94E-01 1.89E+00 -1.09E+00 2.69E-02 6.74E-01
PIN1 NM_006221 2.01E+00 1.34E+00 2.44E+00 -1.09E+00 4.23E-03 6.74E-01
BPI NM_001725 1.31E+00 6.39E-01 1.73E+00 -1.09E+00 1.76E-01 6.99E-01
CST7 NM_003650 4.52E+00 3.85E+00 4.94E+00 -1.09E+00 2.01E-01 7.13E-01
CAP1 NM_006367 1.70E+00 1.03E+00 2.12E+00 -1.09E+00 8.04E-02 6.76E-01
ALDOA NM_184043 2.65E+00 1.98E+00 3.07E+00 -1.09E+00 1.07E-01 6.81E-01
ROMO1 NM_080748 5.18E+00 4.51E+00 5.60E+00 -1.09E+00 6.44E-02 6.74E-01
MPG NM_001015052 1.73E+00 1.07E+00 2.15E+00 -1.08E+00 3.11E-02 6.74E-01
CTSA NM_000308 2.39E+00 1.73E+00 2.81E+00 -1.08E+00 4.79E-02 6.74E-01
LSP1 NM_002339 2.52E+00 1.86E+00 2.94E+00 -1.08E+00 6.70E-02 6.74E-01
PSENEN NM_172341 3.55E+00 2.89E+00 3.97E+00 -1.08E+00 3.15E-03 6.74E-01
NAPA NM_003827 2.21E+00 1.55E+00 2.63E+00 -1.08E+00 4.87E-02 6.74E-01
CCT7 NM_006429 1.64E+00 9.89E-01 2.06E+00 -1.07E+00 3.92E-02 6.74E-01
C6orf48 NM_001040437 1.99E+00 1.33E+00 2.40E+00 -1.07E+00 2.17E-02 6.74E-01
PFDN2 NM_012394 3.73E+00 3.08E+00 4.14E+00 -1.06E+00 3.32E-02 6.74E-01
KLF2 NM_016270 2.23E+00 1.58E+00 2.64E+00 -1.06E+00 2.70E-02 6.74E-01
ORAI1 NM_032790 1.43E+00 7.79E-01 1.84E+00 -1.06E+00 5.09E-03 6.74E-01
RPL17 NM_001199341 8.46E-01 1.97E-01 1.26E+00 -1.06E+00 2.31E-02 6.74E-01
ARHGAP4 NM_001666 1.85E+00 1.21E+00 2.26E+00 -1.06E+00 1.34E-02 6.74E-01
TALDO1 NM_006755 4.15E+00 3.50E+00 4.56E+00 -1.06E+00 5.31E-02 6.74E-01
HSF1 NM_005526 1.39E+00 7.40E-01 1.80E+00 -1.06E+00 4.68E-03 6.74E-01
NOP10 NM_018648 5.70E+00 5.05E+00 6.11E+00 -1.05E+00 3.92E-02 6.74E-01
GSTP1 NM_000852 6.03E+00 5.39E+00 6.44E+00 -1.05E+00 1.41E-02 6.74E-01
PLIN3 NM_005817 8.29E-01 1.83E-01 1.24E+00 -1.05E+00 6.67E-03 6.74E-01
RBMX NM_002139 2.82E+00
2.17E+00 3.22E+00 -1.05E+00 1.61E-01 6.81E-01
CANX NM_001746 2.72E+00 2.08E+00 3.13E+00 -1.05E+00 6.70E-02 6.74E-01
RPL38 NM_001035258 7.23E+00 6.58E+00 7.63E+00 -1.05E+00 4.39E-01 8.26E-01
SMAP2 NM_022733 1.64E+00 9.95E-01 2.04E+00 -1.05E+00 1.63E-02 6.74E-01
NDUFB6 NM_002493 2.38E+00 1.74E+00 2.78E+00 -1.05E+00 1.26E-01 6.81E-01
AURKAIP1 NM_001127230 1.08E+00 4.47E-01 1.48E+00 -1.04E+00 1.19E-02 6.74E-01
SIVA1 NM_006427 3.19E+00 2.56E+00 3.59E+00 -1.04E+00 5.92E-02 6.74E-01
RHOB NM_004040 1.12E+00 4.83E-01 1.52E+00 -1.04E+00 4.10E-02 6.74E-01
SAT1 NM_002970 5.43E+00 4.79E+00 5.83E+00 -1.03E+00 1.41E-01 6.81E-01
C8orf40 NM_138436 1.20E+00 5.64E-01 1.60E+00 -1.03E+00 1.03E-02 6.74E-01
RPLP1 NM_001003 7.76E+00 7.13E+00 8.16E+00 -1.03E+00 1.52E-01 6.81E-01
AIF1 NM_032955 7.04E+00 6.41E+00 7.44E+00 -1.02E+00 2.05E-01 7.13E-01
MAZ NM_001042539 1.65E+00 1.03E+00 2.05E+00 -1.02E+00 3.23E-02 6.74E-01
ZNF22 NM_006963 1.57E+00 9.43E-01 1.96E+00 -1.02E+00 3.27E-03 6.74E-01
HCST NM_001007469 5.03E+00 4.40E+00 5.42E+00 -1.02E+00 7.07E-02 6.74E-01
CAMP NM_004345 1.52E+00 8.91E-01 1.91E+00 -1.02E+00 1.83E-01 7.12E-01
CD34 NM_001025109 2.42E+00 1.79E+00 2.81E+00 -1.02E+00 9.05E-02 6.76E-01
TSTD1 NM_001113206 2.54E+00 1.92E+00 2.93E+00 -1.02E+00 1.74E-01 6.96E-01
CD82 NM_002231 1.46E+00 8.36E-01 1.85E+00 -1.02E+00 2.45E-02 6.74E-01
C16orf13 NM_001040161 1.46E+00 8.35E-01 1.85E+00 -1.01E+00 4.21E-02 6.74E-01
RTN4 NM_153828 1.14E+00 5.19E-01 1.53E+00 -1.01E+00 4.33E-03 6.74E-01
TRIM22 NM_006074 1.35E+00 7.36E-01 1.74E+00 -1.01E+00 2.88E-02 6.74E-01
MS4A3 NM_001031666 1.37E+00 7.50E-01 1.76E+00 -1.01E+00 7.05E-02 6.74E-01
YWHAH NM_003405 2.52E+00 1.90E+00 2.91E+00 -1.01E+00 4.89E-03 6.74E-01
PRDX4 NM_006406 1.31E+00 6.92E-01 1.70E+00 -1.01E+00 1.90E-02 6.74E-01
P2RY8 NM_178129 9.17E-01 3.03E-01 1.30E+00 -1.00E+00 4.33E-03 6.74E-01
S100A8 NM_002964 8.14E+00 7.53E+00 8.53E+00 -1.00E+00 2.52E-01 7.13E-01
CHCHD8 NM_016565 1.70E+00 1.09E+00 2.09E+00 -9.99E-01 4.84E-02 6.74E-01
TNF NM_000594 9.83E-01 3.71E-01 1.37E+00 -9.98E-01 2.84E-02 6.74E-01
EXOSC5 NM_020158 1.25E+00 6.38E-01 1.63E+00 -9.93E-01 3.81E-02 6.74E-01
HAT1 NM_003642 1.98E+00 1.37E+00 2.36E+00 -9.89E-01 1.36E-02 6.74E-01
PTPRCAP NM_005608 3.28E+00 2.68E+00 3.66E+00 -9.88E-01 1.62E-01 6.81E-