UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Blood-based biomarkers of asthma Yang, Chen Xi 2017

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata

Download

Media
24-ubc_2017_september_yang_chenxi.pdf [ 6.86MB ]
Metadata
JSON: 24-1.0354271.json
JSON-LD: 24-1.0354271-ld.json
RDF/XML (Pretty): 24-1.0354271-rdf.xml
RDF/JSON: 24-1.0354271-rdf.json
Turtle: 24-1.0354271-turtle.txt
N-Triples: 24-1.0354271-rdf-ntriples.txt
Original Record: 24-1.0354271-source.json
Full Text
24-1.0354271-fulltext.txt
Citation
24-1.0354271.ris

Full Text

BLOOD-BASED BIOMARKERS OF ASTHMA by  Chen Xi Yang  B.Sc., The University of British Columbia, 2014  A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF  MASTER OF SCIENCE in THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Experimental Medicine)  THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)  August 2017  © Chen Xi Yang, 2017 ii  Abstract  Asthma is a chronic inflammatory airway disorder characterized by reversible airway obstruction and hyperresponsiveness. It affects more than 300 million people worldwide but remains poorly understood. Asthma can be induced by a variety triggers, including allergens, substances found in the workplace, cold air and exercises, and greatly diminishes the life quality of patients. In this thesis, I investigated two major categories of asthma: allergic asthma and occupational asthma. Two types of response are involved in allergic asthma. While 50% of the affected individuals only develop an acute early asthmatic response (EAR), the other individuals develop both the EAR and a chronic late asthmatic response (LAR). Those individuals who present an isolated EAR are classified as early responders (ERs) and individuals who present both the EAR and the LAR are classified as dual responders (DRs). In our study, patients with mild asthma were challenged with specific allergens and their blood was collected prior to the challenge. By measuring the gene expression and the protein levels of complement and coagulation molecules, I demonstrated that the complement and coagulation system may play a role in the LAR of allergic asthma.  Occupational asthma is caused by sensitivity to specific molecules found in the working environment. Western red cedar asthma (WRCA) is the most common form of occupational asthma in the Pacific Northwest region of North America, including British Columbia. It is due to sensitivity to a low molecular weight molecule, plicatic acid (PA), found in the dust of western red cedar wood. The current diagnosis of WRCA is through multiple bronchial challenges, which are time-consuming, complicated and expensive. Blood samples were collected from individuals iii  who were suspected to have WRCA prior to the bronchial challenges. Gene expression was measured using the NanoString platform. Using a pathway-directed approach of random forest and leave-one-out cross-validation, I identified and validated a blood-based two-gene biomarker panel which may help distinguish patients with WRCA from those with asthma due to non-western red cedar causes at baseline. Having such a biomarker panel may greatly simplify the current diagnosis of WRCA by way of a simple blood test. iv  Lay Summary  Asthma is a chronic inflammatory airway disorder characterized by reversible airway obstruction and affects more than 300 million people worldwide. In this thesis, I investigated two major categories of asthma: allergic asthma and occupational asthma. Allergic asthma is induced by a variety of allergens. Using a human allergen inhalation challenge model on individuals with mild asthma, I demonstrated that the complement and coagulation systems may play a role in the late asthmatic response. Similar to allergic asthma, occupational asthma is caused by specific substances found in the workspace. Using blood samples collected from individuals suspected of having western red cedar asthma and statistical modelling, I identified blood-based biomarkers which may help to diagnose this occupational disease. My findings support an approach, whereby a blood-based biomarker panel could be used to simplify the diagnosis of western red cedar asthma, and allow patients avoid the much more invasive bronchial challenges.   v  Preface  I participated in the experimental design and performed all the experiments and analyses described in this thesis. To distinguish from the main text, code pieces and file names in Chapter 4 will appear in a different font. Chapters 3 and 4 contain material that has either been published in a peer-reviewed journal or a book.  Chapter 3 contains the following manuscript:  Yang CX, Singh A, Kim YW, Conway EM, Carlsten C, Tebbutt SJ. Diagnosis of Western Red Cedar Asthma Using a Blood-based Gene Expression Biomarker Panel. Am J Respir Crit Care Med [Internet] 2017 [cited 2017 May 10];Available from: http://www.atsjournals.org/doi/abs/10.1164/rccm.201608-1740LE  I designed the experiment and selected the study participants from discovery to validation. I processed the collected blood samples and performed RNA extraction and RNA quality assessment. I performed the NanoString assay with YWK’s help. I performed the statistical and computational analyses with AS’s help and wrote the entire manuscript. CC participated in patient recruitment and sample collection. AS, EMC, CC and SJT participated in the research design of the study.  Chapter 4 contains parts of the following book chapter: Shannon C, Yang CX, Tebbutt SJ. Analysis of RNA-Seq from Tissue Admixtures --- A Bloody Primer. (Methods in Molecular Biology - In Press)  I developed the reported RNA-sequencing data processing pipeline and wrote part of the book chapter.  vi  Table of Contents  Abstract .......................................................................................................................................... ii	Lay Summary ............................................................................................................................... iv	Preface .............................................................................................................................................v	Table of Contents ......................................................................................................................... vi	List of Tables ................................................................................................................................ xi	List of Figures .............................................................................................................................. xii	List of Symbols and Abbreviations .......................................................................................... xiv	Acknowledgements .................................................................................................................... xvi	Dedication ................................................................................................................................. xviii	Chapter 1: Introduction ................................................................................................................1	1.1	 Thesis overview .............................................................................................................. 1	1.2	 Allergic asthma ............................................................................................................... 2	1.2.1	 Introduction ............................................................................................................. 2	1.2.2	 Clinical diagnosis .................................................................................................... 2	1.2.3	 Biphasic responses .................................................................................................. 3	1.2.4	 Disease mechanisms ............................................................................................... 4	1.3	 Occupational asthma ....................................................................................................... 5	1.3.1	 Western red cedar asthma ....................................................................................... 5	1.3.2	 Clinical diagnosis .................................................................................................... 6	1.3.3	 Disease mechanisms ............................................................................................... 6	1.4	 High throughput molecular technologies ........................................................................ 8	vii  1.4.1	 RNA-Seq ................................................................................................................. 8	1.4.2	 NanoString technologies ......................................................................................... 8	1.5	 Thesis Summary .............................................................................................................. 9	Chapter 2: Allergic asthma and the complement system .........................................................11	2.1	 Introduction ................................................................................................................... 11	2.2	 Hypothesis..................................................................................................................... 13	2.3	 Methods......................................................................................................................... 14	2.3.1	 Human studies ....................................................................................................... 14	2.3.2	 Inhalational challenges .......................................................................................... 14	2.3.3	 Blood collection .................................................................................................... 15	2.3.4	 Study participants .................................................................................................. 15	2.3.4.1	 Discovery set ..................................................................................................... 15	2.3.4.2	 Validation set .................................................................................................... 16	2.3.5	 ELISA set .............................................................................................................. 17	2.3.6	 Experimental techniques ....................................................................................... 18	2.3.6.1	 RNA extraction ................................................................................................. 18	2.3.6.2	 RNA-Seq ........................................................................................................... 19	2.3.6.3	 NanoString nCounter Elements assay ............................................................... 19	2.3.6.4	 ELISA ............................................................................................................... 20	2.3.7	 Data analysis ......................................................................................................... 20	2.3.7.1	 RNA-Seq processing ......................................................................................... 20	2.3.7.2	 NanoString data processing .............................................................................. 21	2.3.7.3	 Differential expression analysis ........................................................................ 22	viii  2.3.7.4	 Hypothesis testing on ELISA data .................................................................... 22	2.4	 Results ........................................................................................................................... 22	2.4.1	 Transcriptomic difference between ERs and DRs (RNA-Seq) ............................. 22	2.4.2	 Technical validation (NanoString Elements) ........................................................ 24	2.4.3	 Biological validation (NanoString Elements) ....................................................... 26	2.4.4	 Differential abundance of C3a and C5a ................................................................ 27	2.5	 Discussion ..................................................................................................................... 29	2.6	 Conclusions and future directions ................................................................................. 31	Chapter 3: Biomarkers of western red cedar asthma ..............................................................32	3.1	 Introduction ................................................................................................................... 32	3.2	 Hypothesis / Rationale .................................................................................................. 32	3.3	 Methods......................................................................................................................... 33	3.3.1	 Inhalational challenges .......................................................................................... 33	3.3.2	 Blood collection .................................................................................................... 34	3.3.3	 Study participants .................................................................................................. 34	3.3.3.1	 Discovery set ..................................................................................................... 34	3.3.3.2	 Validation Set .................................................................................................... 35	3.3.4	 Experimental techniques ....................................................................................... 36	3.3.4.1	 Blood RNA preparation .................................................................................... 36	3.3.4.2	 NanoString nCounter Elements assay ............................................................... 36	3.3.5	 Statistical methodologies ...................................................................................... 37	3.3.5.1	 Data normalization and batch correction .......................................................... 37	3.3.5.2	 Comparisons on CBC/diffs ............................................................................... 37	ix  3.3.5.3	 Biomarker analysis ............................................................................................ 37	3.3.5.4	 Differential expression analysis ........................................................................ 38	3.4	 Results ........................................................................................................................... 38	3.4.1	 Clinical and demographic characteristics ............................................................. 38	3.4.2	 Comparisons on CBC/diffs ................................................................................... 40	3.4.3	 Biomarker panel to distinguish PA-positives and PA-negatives .......................... 43	3.4.4	 Differential gene expression between PA-positives and PA-negatives ................ 48	3.4.5	 Biomarker panel validation ................................................................................... 49	3.5	 Discussion ..................................................................................................................... 51	3.6	 Conclusions and future directions ................................................................................. 53	Chapter 4: Pipeline for RNA-Seq data processing ...................................................................54	4.1	 Introduction ................................................................................................................... 54	4.2	 Methods......................................................................................................................... 54	4.2.1	 Workflow management with Snakemake ............................................................. 54	4.2.2	 Quality control with FastQC ................................................................................. 55	4.2.3	 Quantifying transcript abundance with RSEM ..................................................... 56	4.2.3.1	 Read alignment ................................................................................................. 57	4.2.3.1.1	 Generating the STAR index ........................................................................ 57	4.2.3.1.2	 Aligning reads ............................................................................................. 59	4.2.3.2	 Read summarization .......................................................................................... 59	4.2.4	 All together now! .................................................................................................. 60	4.2.4.1	 The Snakefile .............................................................................................. 61	4.2.4.2	 The config.yaml file ................................................................................... 63	x  4.2.4.3	 Running the pipeline ......................................................................................... 63	4.2.5	 Structure of working directory .............................................................................. 65	4.3	 Conclusions ................................................................................................................... 66	Reference ......................................................................................................................................67	Appendices ....................................................................................................................................72	Appendix A Supplementary material for Chapter 2 ................................................................. 72	A.1	 Genes of interest (microarrays) comparing ER and DR at pre-challenge (8ERs, 6DRs)  ................................................................................................................................... 72	A.2	 Proteins of interest (iTRAQ proteomics) comparing ER and DR at pre-challenge (4ERs, 4DRs) ........................................................................................................................ 73	A.3	 NanoString quality control (QC) criteria .................................................................. 73	A.4	 NanoString probe sequence of candidate genes ........................................................ 74	Appendix B Supplementary material for Chapter 3 ................................................................. 75	B.1	 Discovery and validation workflow of the WRCA panel ......................................... 75	B.2	 Classification tree of the best performing panel ....................................................... 76	B.3	 List of significant genes comparing expression change from 0h to 6h (PA-positive vs. PA-negative) .................................................................................................................... 77	B.4	 Expression levels of the significant genes comparing expression change from 0h to 6h (PA-positive vs. PA-negative) ......................................................................................... 79	 xi  List of Tables  Table 2.1 Clinical and demographic characteristics of the individuals in discovery set. ............. 16	Table 2.2 Clinical and demographic characteristics of the individuals in validation set. ............. 17	Table 2.3 Clinical and demographic characteristics of the individuals in ELISA set. ................. 18	Table 2.4 Differential expression of ERs and DRs in the discovery set at pre-challenge (RNA-Seq). .............................................................................................................................................. 24	Table 2.5 Differential expression of ERs and DRs in the discovery set at pre-challenge (NanoString). ................................................................................................................................ 25	Table 2.6 Differential expression of ERs and DRs in the validation set at pre-challenge (NanoString). ................................................................................................................................ 26	Table 3.1 Clinical and demographic characteristics of the individuals in discovery set. ............. 35	Table 3.2 Clinical and demographic characteristics of the individuals in validation set. ............. 36	 xii  List of Figures  Figure 1.1 Early and late phase response of allergic asthma. ......................................................... 4	Figure 1.2 Workflow of a general NanoString gene expression assay. .......................................... 9	Figure 1.3 Thesis Summary. ......................................................................................................... 10	Figure 2.1 The complement and coagulation pathway. ................................................................ 12	Figure 2.2 NanoString nCounter Elements tag-target complex. ................................................... 20	Figure 2.3 Expression levels of gene transcript isoforms that are differentially expressed between ERs and DRs at pre-challenge (RNA-Seq). .................................................................................. 23	Figure 2.4 Gene expression level of ERs and DRs in the discovery set at pre-challenge (NanoString). ................................................................................................................................ 25	Figure 2.5 Gene expression level of ERs and DRs in the validation set at pre-challenge (NanoString). ................................................................................................................................ 27	Figure 2.6 C3a concentration in ERs, DRs and non-asthmatic controls at pre-challenge. ........... 28	Figure 2.7 C5a concentration in ERs, DRs and non-asthmatic controls at pre-challenge. ........... 29	Figure 3.1 Percentage change in FEV1 in discovery set post methacholine challenge (Day 1). .. 39	Figure 3.2 Percentage change in FEV1 in discovery set post PA challenge (Day 2). ................... 40	Figure 3.3 Relative proportion (%) of leukocyte subtypes (discovery set) following Day 1 methacholine challenge. ................................................................................................................ 42	Figure 3.4 Relative proportion (%) of leukocyte subtypes (discovery set) following Day 2 PA challenge. ...................................................................................................................................... 43	Figure 3.5 AUC performance of all the tested panels in discovery set. ........................................ 44	xiii  Figure 3.6 AUC performance of the reported WRCA biomarker panel (solid line) and the AUC performance after the reshuffling of phenotypic labels (dashed line) in discovery set. ............... 45	Figure 3.7 Classification of individuals in the discovery set based on log2 expression of MAP2K2 and MAPKAPK2. .......................................................................................................................... 47	Figure 3.8 Probability score of individuals in the discovery set. .................................................. 48	Figure 3.9 Expression levels of MAP2K2 and MAPKAPK2 during the 6-hour PA challenge. .... 49	Figure 3.10 Classification of individuals in the validation set based on log2 expression of MAP2K2 and MAPKAPK2. .......................................................................................................... 50	Figure 3.11 Probability score of individuals in the validation set. ............................................... 51	 xiv  List of Symbols and Abbreviations  α  alpha AUC  area under the receiver operating characteristic curve  BAL  bronchoalveolar lavage BAM  binary compressed sequence alignment map (SAM) format BH-FDR Benjamini-Hochberg false discovery rate bp  base pair  CBC/diffs  complete blood cell counts and differentials cDNA  complementary deoxyribonucleic acid (DNA)  DNA  deoxyribonucleic acid DR  dual responder  EAR  early asthmatic response ELISA  enzyme-linked immunosorbent assay ENCODE Encyclopedia of DNA Elements ER  early responder ERCC  External RNA Control Consortium ERK  extracellular signal-regulated protein kinase  FEV1   forced expiratory volume in 1 second FVC  force vital capacity FOV   field of view  GATA  GATA-binding protein GTF  gene transfer format  HDM  house dust mite  IgE  immunoglobulin E iTRAQ isobaric tag for relative and absolute quantitation  JNK  c-Jun N-terminal kinase  LAR  late asthmatic response limma  linear models for microarray and RNA-Seq data LOESS locally weighted polynomial regression LOOCV leave-one-out cross-validation  MAC  membrane attack complex xv  MAPK  mitogen-activated protein kinase MBL  mannose binding lectin  PA  plicatic acid PAI-1  plasminogen activator inhibitor-1 PAMPs pathogen-associated molecular patterns PC20  provocative concentration of methacholine that causes a 20% drop in FEV1  QC  quality control qPCR  quantitative polymerase chain reaction  RAM   random access memory RNA  ribonucleic acid RNA-Seq RNA-sequencing RSEM  RNA-Seq by Expectation-Maximization  SAM  sequence alignment map STAR  Spliced Transcripts Alignment to a Reference SVM  Support Vector Machines  TF  tissue factor Th  T helper  UCSC  University of California, Santa Cruz  WRCA western red cedar asthma  xvi  Acknowledgements  First, and most of all, I would like to thank my supervisor Dr. Scott Tebbutt who supported me and helped me a lot during my graduate studies. He led me through my master’s studies all the way from an undergraduate student who knew nothing about research to a skillful researcher and provided me with many opportunities to learn. I would like to thank my co-supervisor Dr. Edward Conway who also supported me and gave me many valuable suggestions on both my research and my career path. I also want to thank my supervisor committee Dr. Gabriela Cohen Freue and Dr. Jayachandran Kizhakkedathu for their kind support and great feedbacks. I want to thank our collaborator Dr. Christopher Carlsten, too, for his professional insight towards phenotyping the study patients and his input on the research project. I would like to acknowledge the financial support from Mitacs, AllerGen NCE, the BC Lung Association, WorkSafe BC and UBC Faculty of Medicine. I would also like to thank Dr. Carlsten’s group at Chan-Yeung Centre for Occupational and Environmental Respiratory Disease and the research staff of AllerGen Clinical Investigator Collaborative in recruiting the research participants and collecting the samples. The studies could not be possible without their excellent work. I would like to express my sincere thanks to my colleagues and mentors, Dr. Amrit Singh and Mr. Casey Shannon for their great help on statistical and computational data analysis. As a student from a non-statistical background, these research projects could not be accomplished without their patient teaching. I also want to thank my colleagues, Mr. Young Woong Kim, Dr. Emilie Lameignere, Ms. Linnette Ocariza and Mr. Victor Lei, for teaching me experimental laboratory techniques. Thank you to all my previous and current lab members, Mr. Luka Culbrik, xvii  Ms. Amreen Toor, Ms. Jaclyn Rollins, Dr. Tara Fernandez, Dr. Piyushkumar Kapopara, Ms. Jenny Huang and Dr. Houra Loghmani for your generous help.  Lastly, I would like to thank my parents and my dear friend Xuan Zhang for their love and support. Thank you! xviii  Dedication To My Parents.  1  Chapter 1: Introduction  1.1 Thesis overview Asthma is a chronic inflammatory disorder characterized by airway obstruction. Both allergic asthma and occupational asthma are common types of asthma and greatly diminish patients’ quality of life. This thesis includes two goals: 1) to investigate the relationship between mild allergic asthma and the complement system, and 2) to develop a blood-based biomarker panel to help diagnose western red cedar asthma (WRCA), which is one of the most common forms of occupational asthma in British Columbia. Upon allergen inhalation, while most mildly allergic individuals develop an early asthmatic response (EAR), approximately 50% of the individuals develop both an EAR and a late asthmatic response (LAR) 1. Individuals who only present with an isolated EAR are classified as early responders (ERs) whereas individuals who present with both an EAR and a LAR are classified as dual responders (DRs). The complement system is a key component of innate immunity and plays an important role in allergic asthma. The first part of the thesis (Chapter 2) was designed to identify complement molecules (genes and proteins) that were differentially expressed in ERs and DRs. Identification of such molecules may help understand the underlying mechanism of the LAR and reveal new therapeutic targets. The second part (Chapter 3) of the thesis focused on biomarker development of WRCA, which is the most common form of occupational asthma in the Pacific Northwest region of North America, including British Columbia 2–4. I developed a blood-based biomarker panel that can discriminate patients with WRCA from those suffering from asthma due to other causes, at 2  baseline without having the patients to go through the inhalational challenges, which may greatly improve the current diagnosis of the disease. In addition, Chapter 4 introduces an automated ribonucleic acid-sequencing (RNA-Seq) data processing pipeline using “Snakemake” and describes how it can be used towards file management. RNA-Seq data quantification is complicated by a large number of files generated during the process. Having such an automated pipeline greatly reduces the complexity of the process, saves time and helps with project management.   1.2 Allergic asthma 1.2.1 Introduction Asthma is a chronic inflammatory disorder characterized by airway obstruction and hyperresponsiveness. It affects more than three million Canadians (~10%) 5 but remains poorly understood due to its complexity and heterogeneity 6. Among these individuals with asthma, 75% belong to the allergic type. Although there has been significant improvement in asthma diagnosis and clinical practice guidelines over the past decade, 50% of Canadians with asthma remain uncontrolled 5. Symptoms of allergic asthma include coughing, wheezing, shortness of breath and chest tightness.  1.2.2 Clinical diagnosis Medical history is critically important in the initial diagnosis of asthma 5. Spirometry is then used to confirm the diagnosis by measuring FEV1 (forced expiratory volume in 1 second) and FEV1/FVC (force vital capacity) ratio. This also provides a quantitative measure to follow. However, patients with normal spirometry results can also present with symptoms of asthma. In 3  such cases, bronchial provocation tests are used to confirm airway hyperresponsiveness. Methacholine is inhaled in doubling concentrations until the patient has a decrease of 20% in FEV1 (PC20). A concentration of less than 16 mg/L is used as an indicator of airway hyperresponsiveness. The lower the concentration, the greater the severity of airway hyperresponsiveness. Besides spirometry, physical examination of wheezing on auscultation can also be used to confirm airflow limitation. Skin prick tests are commonly used to confirm an atopic condition.   1.2.3 Biphasic responses Both an EAR and a LAR are present in allergic asthma. The EAR occurs several minutes after inhalation of the allergen and can be easily reversed by using bronchodilators. In contrast, the LAR develops 2-6 hours after inhalation and is hard to reverse even with corticosteroids (Figure 1.1). While the majority of individuals with asthma have an EAR, only 50% of individuals develop LAR 1. Based on their different responses to the allergen, individuals with an isolated EAR are classified as ERs whereas individuals with both an EAR and a LAR are classified as DRs. Understanding the molecular mechanisms underlying the differences would allow us to better understand disease heterogeneity and to develop new diagnostic tools and treatments.  4   Figure 1.1 Early and late phase response of allergic asthma 1. (Permission obtained from Journal of Allergy and Clinical Immunology)  1.2.4 Disease mechanisms Over the past 2-3 decades, it has become generally accepted that the allergic response of patients with asthma is associated with T helper type 2 (Th2) immune processes. Upon inhalation of allergen, elevated levels of Th2 cells release cytokines such as IL-4, IL-5, IL-9 and IL-13, which promote inflammation and immunoglobulin E (IgE) production by mast cells. IgE, in turn, triggers mast cell degranulation and the release of inflammatory mediators such as histamine. These mediators lead to the immediate EAR. The mechanism which results in the chronic LAR is not clear at the moment. Most studies believe that the LAR is related to the recruitment of eosinophils and neutrophils and that the structural and functional characteristics of the LAR of 5  allergic asthma provide the link between the acute immunoglobulin E response and chronic persistent asthma 7.  1.3 Occupational asthma 1.3.1 Western red cedar asthma Western red cedar asthma (WRCA) is the most common form of occupational asthma in the Pacific Northwest region of North America including British Columbia 2–4 and affects approximately 5% of exposed workers 3,4. The majority of the affected individuals are sawmill workers, shingle mill workers, carpenters and construction workers. With continuous exposure of approximately 3 years, symptoms including cough, wheeze and dyspnea, start to develop 3. Initially, the symptoms occur after working hours and the affected individual frequently awakens at night with cough, wheeze and chest tightness 8. The symptoms tend to appear at work if the exposure continues. Some of the patients also develop persistent breathlessness after several weeks or months. The symptoms show improvement during weekends and holidays but the recovery usually takes longer as the disease progresses 3. There is an assumption that the symptoms will disappear after removal from the exposure; however, as shown in previous studies, half of the patients continue to have recurrent asthma attacks and require regular medications for symptomatic relief 5,10,11. After years of exposure, patients usually have persistent airflow obstruction and respiratory impairment 11. Patients with a shorter duration of exposure prior to diagnosis, show better recovery 4, indicating that early diagnosis and exposure removal is essential for treating the disease 12 and improving health-related quality of life.  6  1.3.2 Clinical diagnosis Diagnostic testing for WRCA has not changed over the past several decades and involves multiple inhalational bronchial challenges 4,13,14. The patient needs to stay away from work and stay in the clinic for at least 2 days to confirm the diagnosis. Confirmation of the diagnosis of WRCA requires that the patient experiences a reduction in lung function (i.e. FEV1) in response to inhalation of plicatic acid (PA), the disease-causing molecule isolated from western red cedar wood dust. Methacholine challenge tests are also performed to confirm airway hyperresponsiveness. A detailed interview of the patient’s past and current occupation and health status is also required for a diagnosis. Therefore, the whole process can be time-consuming, complicated, expensive, and also may cause discomfort to the patients. Adiponectin has been shown in a previous study to be a potential WRCA biomarker after PA challenge; however, there are no known biomarkers that can predict WRCA at baseline without challenges 15.   1.3.3 Disease mechanisms Three types of asthmatic reactions to PA inhalation are observed in WRCA: isolated EAR, isolated LAR and dual responses.  Isolated LAR and dual responses are very common in WRCA and account for more than 90% of the disease population while the isolated EAR accounts for less than 10% of the patients 4. The mechanisms underlying WRCA are not yet well understood, but both immunological and non-immunological mechanisms are likely involved. Specific IgE antibodies to the crude red cedar dust have previously been detected in approximately 40% of patients with WRCA, and were not found in healthy controls or individuals who have asthma but did not respond to PA 16. However, the role of IgE antibodies is still not clear since the antibodies were not found in all the patients with WRCA and they were 7  not only observed in the early responders and dual responders but also observed in the isolated late responders 3. On the other hand, as a low molecular weight molecule, PA may act as a hapten to combine with other proteins that are found in the airways to form a complete, functional antigen. This complete antigen may then cross-link with specific IgE to trigger the allergic reaction seen in WRCA 17. Besides the immunological mechanisms, previous studies have shown that upon PA exposure, there was direct increased histamine release from human basophils or mast cells in some of the individuals with WRCA in comparison to healthy controls and patients with asthma unrelated to WRCA (non-WRCA) 3,18. Activation of the complement system via the classical pathway was also observed in the patients with WRCA 19. Proinflammatory molecules such as C3a and C5a generated during complement system activation could also induce histamine release from human basophils and mast cells. In addition, activation of the complement system could result in increased vascular permeability 20, which is an important feature of the late phase response. Moreover, upon PA inhalational challenge, there was an increase in sputum eosinophils among patients with WRCA at 6 hours and 24 hours post challenge and this was inversely correlated with the drop of FEV1 at the 6-hour time point 21,22. This suggests that the late phase response occurring in most patients with WRCA may be associated with elevated levels of eosinophils in the airway. Levels of exhaled NO were also increased at 24 hours post PA challenge and the levels of exhaled NO during methacholine challenge and PA challenge were correlated with the level of eosinophilia 23,24. Loss of epithelial cells during the late phase response 25 and increased numbers of T-lymphocytes in the bronchial mucosa 26 were also observed in individuals with WRCA.  8  1.4 High throughput molecular technologies 1.4.1 RNA-Seq Initially, gene expression studies were performed by low-throughput methods such as northern blots and quantitative polymerase chain reaction (qPCR) 27. These methods are limited to measuring only one transcript at a time. Over the last two decades, advanced technologies have enabled us to measure gene expression more globally (transcriptomics). Before RNA-Seq technology was developed, transcriptomic studies largely relied on hybridization-based microarrays. Microarrays can simultaneously quantify thousands of transcripts at a relatively low cost; however, several limitations are associated with microarrays. For example, there are cross-hybridization artifacts in the analysis of highly similar sequences and the ability to quantify low or highly expressed genes is very limited 27. Compared to microarray, RNA-Seq is more detailed in gene expression quantification. It not only performs better in quantifying low and highly abundant gene transcripts but also enables us to discover new sequences that are not yet annotated. The preparation of RNA-Seq includes the following steps: first, RNA is extracted from biological materials such as cells or tissues; next, RNA is converted to complementary deoxyribonucleic acid (cDNA) by reverse transcription; finally, after PCR amplification, a library is constructed and ready for sequencing 27.  1.4.2 NanoString technologies In comparison to RNA-Seq which has a long work-up (such as cDNA amplification and library construction), the NanoString platform is fully automated and has higher sensitivity, precision, reproducibility and a lower background signal. Also, the NanoString assay is a non-enzymatic assay and inexpensive to implement in a clinical setting with data being generated in 9  less than 24 hours. The workflow of the NanoString assay is shown in Figure 1.2. The assay consists of both capture and reporter probes. The reporter probe is fluorescently labeled with a unique colour code for each target transcript. In the assay, the target transcripts bind together with reporter probes and capture probes to form target-probe complexes by hybridization at 67°C for 18-24 hours. Then, the hybridized complexes are immobilized and aligned on a glass cartridge. At the final step, the cartridge is scanned in a digital analyzer and each transcript-unique colour code is quantified for its abundance by counting.  Figure 1.2 Workflow of a general NanoString gene expression assay.  1.5 Thesis Summary In this thesis, Chapter 2 describes my first project on the relationship between the complement system and the late asthmatic response. I compared both the protein abundance and the gene expression of complement molecules in blood between ERs and DRs at baseline (Figure 1.3). Since the protein abundance did not show any difference and the validation results of gene expression did not corroborate with the discovery results (derived from my analysis of RNA-Seq datasets), I decided to stop this project. However, I was still interested in the late asthmatic response and samples were available to us from another cohort of individuals who were suspected to have western red cedar asthma. Many of these individuals presented a late asthmatic 10  response, but unfortunately, their phenotypes were not clear and the sample size was small. Therefore, we switched to another research question, which was to identify biomarkers for diagnosing western red cedar asthma (Chapter 3). In this project, I successfully identified and validated a biomarker panel that distinguishes western red cedar asthmatics from non-western red cedar asthmatics (Figure 1.3). Concurrently, to simplify the processing procedures for RNA-Seq data, I developed an automated RNA-Seq data processing pipeline that can be used for any future projects involving RNA-Seq (Figure 1.3). This pipeline is described in Chapter 4 of my thesis.  Figure 1.3 Thesis Summary  11  Chapter 2: Allergic asthma and the complement system  2.1 Introduction The complement system is a danger-sensing and host-defense system of innate immunity. Upon recognition of pathogen-associated molecular patterns (PAMPs), the complement system can be activated through three pathways: the classical pathway, the alternative pathway and the lectin pathway 28. The classical pathway begins with the binding of antigen-antibody immune complexes, PAMPs or apoptotic cells to C1q while the lectin pathway starts with the binding of mannose binding lectin (MBL) or ficolins to carbohydrates on pathogens or apoptotic cells. The alternative pathway is activated through the hydrolysis of the internal thioester bond of C3 to form C3·H2O. All three types of activation lead to the formation of C3 convertase and converge at the level of C3. C3 can be cleaved into C3a and C3b. C3b participates in the formation of C5 convertase and cleaves C5 into C5a and C5b. C5b, together with C6, C7, C8 and C9, forms the membrane attack complex (MAC), which incorporates into the membranes and induces the lysis of pathogens or cells (Figure 2.1) 28,29. Anaphylatoxins C3a and C5a generated during this process are well-known proinflammatory molecules which are critical in activating and regulating allergic responses.  The coagulation system is part of the homeostatic process. It is activated through the intrinsic and extrinsic pathways (Figure 2.1). The intrinsic pathway is initiated by contact activation of factor XII while the extrinsic pathway is triggered by tissue factor (TF) which converts factor VII to factor VIIa. Both pathways converge at the level of factor X. The activated form of factor X, factor Xa, converts prothrombin (factor II) into thrombin (factor IIa), which 12  then catalyzes the cleavage of soluble fibrinogen into fibrin molecules. In the final step, fibrin molecules crosslink with each other to form the fibrin clot with the help of factor XIIIa.  There are many crosstalk pathways between the complement and the coagulation system (Figure 2.1) 29. For example, C5a is able to enhance blood thrombogenicity by upregulating the expression of TF and plasminogen activator inhibitor-1 (PAI-1) on platelets, mast cells and basophils. Thrombin also contributes to the cleavage of C3 into C3a and C3b, and C5 into C5a and C5b, thus amplifying the activation of the complement system 29.  Figure 2.1 The complement and coagulation pathway.  The complement and coagulation systems play a complex role in allergic asthma. Many studies have reported observing complement activation during allergic responses. Nagata and Glovsky showed that extracts of aeroallergens, such as house dust mite (HDM), Aspergillus 13  fumigatus and perennial ryegrass can induce in vitro production of anaphylatoxins C3a and C5a 30. HDM-derived proteases can also cleave C3 and C5 into C3a and C5a, respectively 31. In a segmental allergen challenge model of individuals with asthma, levels of both C3a and C5a were increased in bronchoalveolar lavage (BAL) after 24 hours while the levels of C3a and C5a only showed minor increases in healthy individuals 32. There was also evidence in genome-wide association studies showing linkage of asthma and chromosomal regions coding for C5 and its receptor 33,34. In animal studies, upon allergen exposure, C5-deficient mice were more prone to the development of airway hyperresponsiveness and pulmonary inflammation compared to wild-type mice 35,36. Further investigation showed that blocking C5a receptor signaling prior to allergen sensitization induced Th2-biased immune responses; however, blockade of C5a receptor in already sensitized mice suppressed airway inflammation and hyperresponsiveness 37.   2.2 Hypothesis Although it is generally accepted that the complement and coagulation systems participate in allergic responses of asthma, it is not clear whether it affects the late phase response of asthma or not. There are previous studies showing that C3 may regulate the late asthmatic response using mouse models 38,39. Using a human allergen inhalation challenge model, our preliminary data in microarrays 40 and iTRAQ (isobaric tag for relative and absolute quantitation) proteomics 41 (see appendix A.1-2 for details) suggested that the expression of complement molecules may be associated with the LAR of allergic asthma. We hypothesized that complement molecules (genes and proteins) are differentially abundant in ERs and DRs of allergic asthma. 14  2.3 Methods 2.3.1 Human studies Upon written informed consent, individuals with mild asthma were recruited into the study at the University of British Columbia, McMaster University and Université Laval as part of the AllerGen Clinical Investigator Collaborative. All participants were non-smokers and had been diagnosed with clinically stable asthma with a baseline FEV1 ≥ 70% of the predicted value and PC20 < 16 mg/mL. None of the patients had other lung diseases, cardiovascular disease or viral infections at least 4 weeks prior to the challenge. None had used inhaled corticosteroids or other medications for asthma (see Diamant et al. 1 for a complete list of inclusion and exclusion criteria).   2.3.2 Inhalational challenges Skin prick tests were used prior to inhalational challenges to determine the allergen sensitivity for each individual. Methacholine challenges were performed on Day 1 and Day 3 to confirm airway hyperresponsiveness and the allergen-induced shift of PC20 ([PC20]pre/[PC20]post). Allergen challenge was performed on Day 2, using allergen extracts that were pre-determined by skin prick tests. Both methacholine and allergen extracts were given in doubling doses until a drop of 20% in FEV1 was reached. The initial concentration of allergen extracts was determined based on the results of skin prick tests and Day 1 methacholine challenge. During the inhalational challenges, FEV1 was monitored at regular intervals until 7 hours post-challenge. EAR was defined as having a 20% drop or more in FEV1 between 0 to 2 hours post-allergen challenge while the LAR was defined as having a 15% drop or more between 3 to 7 hours post-allergen challenge. Patients who only presented the EAR were classified as ERs while those who 15  presented both EAR and LAR were classified as DRs. Individuals who did not fulfil the definition of LAR but demonstrated a drop of 10% or more in FEV1 and an allergen-induced shift ([PC20]pre/[PC20]post) value ≥ 2 were also classified as DRs.  2.3.3 Blood collection Blood samples were collected at baseline (prior to allergen challenge) and 2 hours post-allergen challenge using standardized protocols. Approximately 5.5 mL of blood was collected at each time point into EDTA tubes (~ 3ml) and PAXgene Blood RNA tubes (~ 2.5 mL). Complete blood cell counts and differentials (CBC/diffs) were measured from EDTA blood using an automated hematology analyzer (Cell Dyn 3700 System [Abbott Diagnostics, USA]). Then, the rest of the EDTA blood sample was centrifuged at 500 x g for 10 minutes at room temperature and freshly processed into plasma, buffy coat and erythrocytes. The processed plasma, buffy coat, erythrocytes samples and PAXgene blood samples were stored in -80 ºC freezer prior to and after shipment to the Tebbutt laboratory in Vancouver, Canada.   2.3.4 Study participants 2.3.4.1 Discovery set Thirty-six individuals (15 ERs and 21 DRs) were included in the discovery set. Their demographic and clinical characteristics are shown in the following table (Table 2.1). Samples collected from these individuals were used for both RNA-Seq and NanoString analysis.   16  Table 2.1 Clinical and demographic characteristics of the individuals in discovery set. Statistics are shown as mean ± standard deviations.  ERs (n=15) DRs (n=21) P-value Number of Female 10 (67%) 14 (67%)  Weight (kg) 70.2 ± 14.8 73.0 ± 16.3 0.47 Height (cm) 168.3 ± 8.4 168.4 ± 9.3 0.98 Age (years) 27.1 ± 8.0 32.0 ± 13.1 0.07 Baseline FEV1 (L) 3.4 ± 0.8 3.2 ± 0.8 0.44 % drop in FEV1 during EAR -35.2 ± 9.9 -35.4 ± 10.0 0.93 % drop in FEV1 during LAR -7.5 ± 4.0 -28.7 ± 15.1 < 0.001 Allergen induced PC20 shift 1.7 ± 1.7 3.4 ± 1.9 0.001 Allergen  (Number of individuals) Cat 8 5  Fungus 0 2  Grass 3 2  HDM 4 8  Horse 0 1  Ragweed 0 3  Site  (Number of individuals) Laval 10 8  McMaster 5 12  UBC 0 1   2.3.4.2 Validation set An additional 45 individuals (9 ERs and 36 DRs) were used for validation. Their demographic and clinical characteristics are shown in Table 2.2. Samples collected from these individuals were used for NanoString analysis.      17  Table 2.2 Clinical and demographic characteristics of the individuals in validation set. Statistics are shown as mean ± standard deviations.  ERs (n=9) DRs (n=36) P-value Number of Female 5 (56%) 15 (42%)  Weight (kg) 70.2 ± 11.2 73.3 ± 14.5 0.59 Height (cm) 167.9 ± 10.3 173.0 ± 8.0 0.13 Age (years) 33.4 ± 8.2 29.1 ± 10.1 0.24 Baseline FEV1 (L) 3.0 ± 0.5 3.5 ± 0.7 0.048 % drop in FEV1 during EAR -30.6 ± 10.3 -34.9 ± 9.0 0.22 % drop in FEV1 during LAR -6.4 ± 4.5 -22.5 ± 8.7 < 0.001 Allergen induced PC20 shift 1.2 ± 0.8 3.0 ± 1.8 0.02 Allergen  (Number of individuals) Birch 1 0  Cat 6 14  Fungus 0 0  Grass 2 7  HDM 0 12  Horse 0 1  Ragweed 0 2  Site  (Number of individuals) Laval 6 10  McMaster 1 19  UBC 2 7   2.3.5 ELISA set Samples collected from 49 individuals (18 ERs, 25 DRs and 6 non-asthmatics) were used for ELISA (enzyme-linked immunosorbent assay) quantification of C3a and C5a levels. The demographic and clinical characteristics of these individuals are shown in Table 2.3. Of the 49 individuals, 31 individuals (14 ERs and 17DRs) overlapped with the discovery set.     18   Table 2.3 Clinical and demographic characteristics of the individuals in ELISA set. Statistics are shown as mean ± standard deviations.  ERs (n=18) DRs (n=25) P-value Number of Female 12 (67%) 18 (72%)  Age (years) 28.4 ± 8.2 31.2 ± 12.2 0.50 Baseline FEV1 (L) 3.3 ± 0.8 3.2 ± 0.7 0.80 % drop in FEV1 during EAR -34.5 ± 10.3 -37.1 ± 9.0 0.43 % drop in FEV1 during LAR -6.7 ± 4.2 -28.8 ± 12.1 < 0.001 Allergen induced PC20 shift 1.4 ± 1.4 3.0 ± 1.8 <0.001 Allergen (Number of individuals) Cat 12 11  Fungus 0 1  Grass 2 2  HDM 4 8  Horse 0 1  Ragweed 0 2  Site  (Number of individuals) Laval 13 14  McMaster 5 10  UBC 0 1    2.3.6 Experimental techniques 2.3.6.1 RNA extraction RNA in blood was extracted from 5 mL of the PAXgene blood samples using the PAXgene Blood miRNA Kit (PreAnalytiX, Switzerland). The RNA concentration and quality was assessed using a NanoDrop 8000 Spectrophotometer (Thermo Scientific, USA) and Agilent 2100 Bioanalyzer system (Agilent Technologies, USA). A RIN (RNA integrity number) of 7.0 was used as the threshold for inclusion.  19  2.3.6.2 RNA-Seq  The extracted RNA samples in the discovery set were sent to Génome Québec (Centre d’Innovation Genome Quebec et Université McGill) for sequencing. The RNA quality was assessed again by Génome Québec prior to sequencing. External RNA Control Consortium (ERCC) spike-in controls (92 sequences) were added to all samples. rRNA/globin-depleted stranded cDNA libraries were constructed and sequenced using an Illumina HiSeqTM 2000 sequencing system as 100 base pairs (bp) paired-end reads.  2.3.6.3 NanoString nCounter Elements assay 100 ng of the purified total RNA was used in a custom NanoString nCounter ElementsTM assay (NanoString Technologies, US) to simultaneously quantify the expression of 166 selected transcripts, including 8 candidates from the complement and coagulation pathways. The candidates were selected based on the analysis of RNA-Seq data. 100 bases of oligonucleotide probes were designed for all selected transcripts using NanoString bioinformatics web-based tools by which users were allowed to select specific probe sequences for transcripts of interest. The NanoString nCounter Elements technologyTM consists of an nCounter Elements TagSet (capture and reporter probes) and target-specific oligonucleotide probe pairs (probe A & B) (Figure 2.2). The reporter probe is fluorescently labeled with a unique colour code (sequence of six colour spots) for each target transcript sequence. In the assay, the purified RNA sample was mixed with the specific oligonucleotide probe pairs, the 50-nucleotide-sized reporter probes and the capture probes of the target transcripts for hybridization at 67°C over 16 hours to form the tag-target complex (Figure 2.2). During hybridization, the reporter and the capture tags bound to probe A and B which further bound to the target transcript sequence. Then, the hybridized 20  samples were processed with a “high sensitivity protocol” on the automated PrepStation for 3 hours to immobilize the tag-target complex to the glass cartridge. The transcript counts were acquired from the GEN2 Digital Analyzer after a 2.5-hour scan.  Figure 2.2 NanoString nCounter Elements tag-target complex.  2.3.6.4 ELISA Baseline blood plasma (prior to allergen inhalation challenge) levels of C3a and C5a were measured by specific ELISAs. The assays were performed in the Conway Laboratory (Centre for Blood Research, Vancouver) using Quidel MicroVueTM C3a Plus EIA Kits and Quidel MicroVueTM C5a Plus EIA Kits, respectively, according to the manufacturer's instructions.   2.3.7 Data analysis 2.3.7.1 RNA-Seq processing All raw RNA-Seq FASTQ files were quality controlled using FastQC 42 standards. The first 5 bases and the last 25 bases were trimmed from the 100 bp paired-end reads using Seqtk (version 1.0). The University of California, Santa Cruz (UCSC) GTF file and 2013 human reference genome (GRCh38 build) were used to build a reference genome sequence for RSEM. 21  The sequencing data were aligned using Bowtie2 aligner (version 2.2.4) and quantified at both gene and gene-isoform levels using RSEM (version 1.2.19).  The gene and isoform counts were then normalized to log2 counts per million using the following equation from the voom() function of limma: 𝑋"#$% = 𝑙𝑜𝑔*( 𝑋,#-"./ + 0.5𝑙𝑖𝑏. 𝑠𝑖𝑧𝑒 + 1 ∗ 10;) where Xcounts is a p (variables) by n (samples) matrix of the count data and lib.size is the sum of all counts in each sample.  2.3.7.2 NanoString data processing The NanoString data were normalized and analyzed using the statistical computing environment, R (version 3.2.4). Firstly, the data were assessed for all the recommended quality control metrics including field of view, binding density, positive spike-ins and background signals (see appendix A.3 for details). The Elements assay uses six levels of positive spike-ins: 128fM, 32fM, 8fM, 2fM, 0.5fM and 0.125fM and 13 housekeeping genes. The first 5 levels of positive spike-ins were used for positive control normalization (as recommended by NanoString) to eliminate assay-to-assay variability and the top 5 housekeeping genes with the lowest coefficient of variations were used for house-keeping normalization to correct for the biological variations across the samples. Since there was no obvious batch effect between the data of the discovery set and the data of the validation set, the data of the two sets were normalized together by a positive control normalization followed by a housekeeping normalization.   22  2.3.7.3 Differential expression analysis Differential expression analysis was applied to the RNA-Seq data of 76 genes involved in the complement and coagulation systems to identify potential candidate genes, and to the NanoString data to validate the candidate genes, using linear models for microarray and RNA-Seq data (limma) 43. Limma shrinks the variance of each variable towards a common value by using a moderation factor, which reduces the number of false positives observed in studies with small sample sizes 44. The Benjamini-Hochberg method was used to control the false discovery rate (BH-FDR) 45 and to adjust for multiple hypotheses testing.  2.3.7.4 Hypothesis testing on ELISA data  The absorbance reading at 450 nm of standard dilutions were used to build the standard curves and the C3a and C5a concentrations were calculated from the standard curves. Shapiro-Wilk normality tests were used to assess the normality of the data. Non-parametric Kruskal-Wallis H tests were used to compare the concentrations of C3a and C5a among ERs, DRs and non-asthmatic individuals.  2.4 Results 2.4.1 Transcriptomic difference between ERs and DRs (RNA-Seq) Samples from patients in the discovery set were used for RNA-Seq data analyses. Differential expression analysis was performed to compare the difference between ERs and DRs. For discovery purposes, the eight top-ranked gene isoforms were selected as candidates and designed into a custom NanoString nCounterTM Elements assay (see appendix A.4 for gene 23  names and probe sequences). The log2 fold change, expression levels and P-values, BH-FDRs of these isoforms are shown in Figure 2.3 and Table 2.4.   Figure 2.3 Expression levels of gene transcript isoforms that are differentially expressed between ERs and DRs at pre-challenge (RNA-Seq).        24   Table 2.4 Differential expression of ERs and DRs in the discovery set at pre-challenge (RNA-Seq). *: average log2 expression of DRs – average log2 expression of ERs.  UCSC ID Gene Symbol logFC* Average Expression P-Value BH-FDR uc001muv.4 CD59 -0.346 5.984 0.014 0.287 uc001yda.1 SERPINA1 0.422 6.011 0.015 0.287 uc001hfq.4 CD55 -0.408 6.707 0.026 0.287 uc001hfr.4 CD55 -1.033 1.464 0.028 0.287 uc001ydc.4 SERPINA1 0.599 2.174 0.030 0.287 uc001ycz.4 SERPINA1 0.487 4.252 0.035 0.287 uc002pgj.1 C5AR1 0.347 5.262 0.037 0.287 uc001muu.4 CD59 1.135 2.811 0.047 0.320 uc031yse.1 C1R 0.507 -0.094 0.068 0.343 uc003mwv.3 F13A1 0.382 8.350 0.072 0.343 uc001hgm.3 CD46 -0.278 7.846 0.074 0.343 uc001ggg.1 F5 0.298 5.523 0.076 0.343  2.4.2 Technical validation (NanoString Elements) A technical validation was applied to the majority (29) of the samples from the discovery set using the custom-designed NanoString nCounterTM Elements assay. The differential expression analysis results at pre-challenge are shown in Table 2.5 and the expression levels of these genes are shown in Figure 2.4. Using a BH-FDR cutoff of 0.1, only F13A1 was differentially expressed between ERs and DRs. SERPINA1 was trending to significance as indicated by the P-value.     25  Table 2.5 Differential expression of ERs and DRs in the discovery set at pre-challenge (NanoString). *: average log2 expression of DRs – average log2 expression of ERs.   logFC* Average Expression t P-Value BH-FDR F13A1 0.598 10.740 3.054 0.005 0.036 SERPINA1 -0.207 13.035 -1.681 0.102 0.410 CD55 -0.076 5.302 -0.746 0.461 0.952 C5AR1 -0.065 10.740 -0.721 0.476 0.952 C3AR1 0.059 8.203 0.314 0.756 0.984 PLAUR -0.025 10.168 -0.235 0.815 0.984 CD46 -0.002 12.125 -0.025 0.980 0.984 CD59 -0.002 9.062 -0.020 0.984 0.984   Figure 2.4 Gene expression level of ERs and DRs in the discovery set at pre-challenge (NanoString).  26  2.4.3 Biological validation (NanoString Elements) As an external biological validation, the same NanoString Elements assay was applied to the independent validation set. The differential expression analysis results at pre-challenge are shown in Table 2.6 and the expression levels of these genes are shown in Figure 2.5. Using a BH-FDR cutoff of 0.1, 4 genes (PLAUR, SERPINA1, CD46 and C5AR1) were differentially expressed between ERs and DRs.   Table 2.6 Differential expression of ERs and DRs in the validation set at pre-challenge (NanoString). *: average log2 expression of DRs – average log2 expression of ERs.   logFC* Average Expression t P-Value BH-FDR PLAUR -0.302 10.207 -3.104 0.003 0.025 SERPINA1 -0.285 12.956 -2.454 0.018 0.071 CD46 -0.170 11.960 -2.103 0.041 0.088 C5AR1 -0.236 10.808 -2.068 0.044 0.088 F13A1 -0.228 10.590 -1.055 0.297 0.457 C3AR1 -0.156 8.099 -0.954 0.345 0.457 CD55 0.104 5.188 0.850 0.400 0.457 CD59 0.021 8.921 0.193 0.848 0.848  27   Figure 2.5 Gene expression level of ERs and DRs in the validation set at pre-challenge (NanoString).  2.4.4 Differential abundance of C3a and C5a C3a and C5a protein levels were measured by ELISA. Non-parametric Kruskal-Wallis H tests was used to compare C3a and C5a levels among ERs, DRs and non-asthmatic controls. There was a significant difference in C3a levels among ERs, DRs and non-asthmatic controls (P = 0.02) (Figure 2.6) while there was no difference in the C5a levels among the three groups (Figure 2.7) (P = 0.54). Post-hoc tests were performed using Dunn’s test 46 and the FDR was controlled using the Benjamini-Hochberg method (BH-FDR) 45. C3a levels in both ERs and DRs were significantly higher than non-asthmatic controls (ER vs. non-asthmatics: BH-FDR = 0.01, DR vs. non-asthmatics: BH-FDR = 0.03) (Figure 2.6).  28   Figure 2.6 C3a concentration in ERs, DRs and non-asthmatic controls at pre-challenge.  29   Figure 2.7 C5a concentration in ERs, DRs and non-asthmatic controls at pre-challenge.  2.5 Discussion Our NanoString results revealed SERPINA1 as a gene of interest which may be associated with the late phase response of allergic asthma. Although the differential expression of SERPINA1 between ERs and DRs was not significant in the discovery set, it was trending to significance and the direction of change was in the same direction as it was in the validation set. SERPINA1 encodes a serine protease inhibitor which targets elastase, plasmin, thrombin, trypsin, 30  chymotrypsin, and plasminogen activator. Abnormal expression of α1-antitrypsin (SERPINA1) may lead to an imbalance between α1-antitrypsin and elastase which forms the extracellular matrix. Degradation of the extracellular matrix is an important characteristic of airway inflammation and remodeling. Both α1-antitrypsin and elastase have been reported to be increased in sputum of patients with asthma 47. Gharib et al. have also shown using both Western blotting and shotgun proteomics that α1-antitrypsin was differentially expressed in sputum when comparing patients with or without asthma 48. Since we hypothesized that DRs may be more prone to severe asthma because of the presence of the chronic LAR, α1-antitrypsin may also play a role in progression to the late phase response of allergic asthma.  C3a is a key molecule of the complement system and our ELISA data confirmed previous literature that it is increase in patients with asthma. C3a is involved in the recruitment and activation of leukocytes and can trigger the release of many allergic mediators, such as leukotrienes, histamine, IL-1, IL-6, and TNF 28. C3a can also induce airway mucus secretion, smooth muscle contraction and increased vascular permeability 49. Although based on our data, C5a is not associated with the late phase response of asthma, there are other studies showing it may be a novel therapeutic target for allergic asthma. Eculizumab is a monoclonal antibody which prevents the formation of C5a and C5b-9 by binding to C5. Gauvreau et al. have reported that a single infusion of eculizumab 24 hours prior to the allergen challenge shows significant attenuation of the LAR 50. However, in another clinical trial outlined in a review by Monk et al., an oral dose of C5aR antagonist (NGD-2000-1) did not show any improvement in lung function 51.  31  2.6 Conclusions and future directions Limitations of the study include the small sample size of ERs, the limited consistency across different platforms (RNA-Seq to NanoString), and the heterogeneity of the phenotypes. Although we tried to match the probe sequences for NanoString to the candidate transcript sequences that we selected from RNA-Seq data, this does not guarantee equivalence. Also, some of the participants who went through the allergen challenges multiple times showed different types of responses each time, which led to difficulties in assigning phenotypic labels. In addition, since many of the complement and coagulation proteins are activated upon cleavage (e.g. C3a, C5a, factor Xa, factor IIa, and etc.), they may be different in their protein levels between ERs and DRs while the differences cannot be detected at the gene level. All these issues might contribute to the corroboration challenges that we observed in this study. Based on the current results, the complement and coagulation systems may be associated with allergic asthma, although the connection is not very strong. Further investigations including the evaluation of protein levels are required.  32  Chapter 3: Biomarkers of western red cedar asthma 52  3.1 Introduction Western red cedar asthma (WRCA) is the most common form of occupational asthma in British Columbia and is caused by increased sensitivity to plicatic acid (PA). The current diagnosis requires multiple bronchial challenges which are time-consuming, expensive and may cause discomfort to patients. This whole process takes at least two days and may require the patients to make several visits to the clinic. This may therefore delay the diagnosis of WRCA and hinder recovery from the disease. So far, there is no known molecular biomarker that can be used to diagnose WRCA without inhalational challenges. Peripheral whole blood is a useful and easily obtainable resource for studying WRCA and the transcriptional changes that occur after methacholine inhalation challenge 53. Instead of the traditional diagnostic method of WRCA which includes multiple inhalational challenges, a blood test would be ideal to more simply obtain an accurate diagnosis.   3.2 Hypothesis / Rationale We hypothesized that blood transcriptomic signatures could be developed into biomarker panels which would diagnose WRCA through a simple blood test. Patients with asthma who respond to PA are classified as PA-positive and patients with asthma who do not respond to PA are classified as PA-negative. We aimed to develop a blood-based biomarker panel which distinguishes PA-positive individuals from PA-negative individuals prior to the inhalation challenges. Longer-term, these biomarker molecules might also help to further elucidate the disease mechanism underlying WRCA. 33    3.3 Methods 3.3.1 Inhalational challenges Upon written informed consent, individuals with known or suspected WRCA were recruited into the study. All study patients underwent a methacholine challenge (Day 1) followed by a PA challenge (Day 2) at Vancouver General Hospital using standardized protocols 14. Before the challenges, patients were instructed not to use any inhaled short-acting bronchodilators for 12 hours, long-acting bronchodilators for 24 hours, and inhaled corticosteroids for at least 2 weeks. On Day 1, baseline spirometry and methacholine challenge was performed according to the American Thoracic Society guidelines 54. On Day 2, baseline spirometry was again obtained. PA was given to the patients in doubling doses (0.625, 1.25, 2.5, 5, and 10 mg/mL) and their FEV1 was measured after each increase in concentration until a drop of 20% in FEV1 was observed. On both days, FEV1 was monitored at regular intervals until 6 hours (20 min, 30 min, 40 min, 60 min and then hourly) after the commencement of inhalational challenge.  Patients with a positive response to methacholine were considered to have asthma and patients with a positive response to PA were diagnosed as having WRCA (PA-positive). According to previous literature, an EAR was defined a drop in FEV1 of 20% within the first 2 hours post-challenge and a LAR was defined as having a drop in FEV1 of 15% after 2 hours post-challenge. All PA-positive patients had either an EAR or a LAR, or both. Patients who did not respond to PA within the 6 hours of monitoring were diagnosed as PA-negative.  34  3.3.2 Blood collection Peripheral whole blood was collected in EDTA tubes and PAXgene Blood RNA tubes on both days prior to the challenges, and 2 hours and 6 hours post-challenge. After collection, blood samples were transferred to the Tebbutt laboratory at St. Paul’s Hospital for further processing. CBC/diffs were measured using EDTA blood samples to assess for changes in cellular composition. Then, 1 mL of each EDTA blood sample was aliquoted and the rest of the sample was centrifuged at 1200 x g for 10 minutes at room temperature and freshly processed into plasma, buffy coat and erythrocytes. The processed plasma, buffy coat, and erythrocyte samples and PAXgene blood samples were stored in a -80 ºC freezer.  3.3.3 Study participants 3.3.3.1 Discovery set  Seventeen individuals (8 PA-negative and 9 PA-positive) were selected for our discovery biomarker study. The clinical and demographic characteristics of the individuals are shown in Table 3.1. All 3 time-series blood samples collected on Day 2 PA-challenge (baseline 0h, 2h post-challenge and 6h post-challenge) were used in the study. Mann-Whitney U tests were used to compare the clinical and demographic characteristics between the PA-positive group and the PA-negative group.       35   Table 3.1 Clinical and demographic characteristics of the individuals in discovery set. Statistics are shown as mean ± standard deviations. *: Mann-Whitney U tests comparing PA-positive and PA-negative individuals.  PA-Positive (n=9) PA-Negative (n=8) P-value*  Age (Year) 48.0±12.3 38.8±12.8 0.152 BMI 28.40±3.70 29.35+6.97 0.962 Predicted FEV1 (L) 3.84±0.68 4.15±0.53 0.290 Predicted FEV1/FVC 0.90±0.19 0.81±0.02 0.736 Baseline FEV1 (L)        before methacholine challenge 2.91±0.94 3.33±0.79 0.481     before PA challenge 2.83±0.94 3.18±0.81 0.481 Current Smoker n = 0 n = 1   3.3.3.2 Validation Set An additional 7 PA-positive individuals were used for external validation. Their clinical and demographic characteristics are shown in Table 3.2. Only baseline (0h) blood samples collected on Day 2 PA-challenge were used in the validation study. Kruskal-Wallis H tests were used to compare the clinical and demographic characteristics among PA-positive group and PA-negative group in the discovery set, and PA-positive individuals in the validation set (3-group comparison).    36  Table 3.2 Clinical and demographic characteristics of the individuals in validation set. Statistics are shown as mean ± standard deviations. *: Kruskal-Wallis H tests comparing PA-positive individuals and PA-negative individuals in the discovery set, and PA-positive individuals in the validation set.  PA-Positive (n=7) P-value* Age (Year) 52.4±11.7 0.122 BMI 28.86±3.27 0.971 Predicted FEV1 (L) 3.86±0.83 0.509 Predicted FEV1/FVC 0.95±0.27 0.928 Baseline FEV1 (L)       before methacholine challenge 2.93±0.89 0.618     before PA challenge 2.74±0.79 0.516 Current Smoker n = 0    3.3.4 Experimental techniques 3.3.4.1 Blood RNA preparation Total RNA was extracted and purified from 5 ml of PAXgene blood samples using the PAXgene Blood miRNA Kit (PreAnalytiX, Switzerland). The RNA concentration and quality was assessed using NanoDrop 8000 Spectrophotometer (Thermo Scientific, USA) and Agilent 2100 Bioanalyzer system (Agilent Technologies, USA). A RIN number of 7 was used as a threshold for sample inclusion.  3.3.4.2 NanoString nCounter Elements assay 100 ng of the purified total RNA was used in the same NanoString nCounter ElementsTM assay (NanoString Technologies, US) described in Chapter 2 to simultaneously quantify the expression of 166 transcripts, which are mostly involved in immune processes such as the Th2 pathway. Please see Chapter 2 for details. 37   3.3.5 Statistical methodologies 3.3.5.1 Data normalization and batch correction The obtained NanoString data were normalized and analyzed within the statistical computing environment R (version 3.2.4). Firstly, the data were assessed for all the recommended quality control metrics as described in Chapter 2 (Section 2.3.7.2). The first 5 concentrations of positive spike-ins were used for positive control normalization (as recommended by NanoString) to eliminate inter-assay variability and the top 5 housekeeping genes with the lowest coefficient of variations were used for house-keeping normalization to correct for the biological variations across the samples. The data were normalized by a positive control normalization followed by a housekeeping normalization for both the discovery set and the validation set. Since there was a batch effect between the two sets of data, the data from the discovery set and the data from the validation set were normalized separately, and then batch-corrected using Combat 55 R package.  3.3.5.2 Comparisons on CBC/diffs The relative levels of leukocyte subtypes were compared at each time point and throughout the challenges between the PA-positive group and PA-negative group of the discovery set using non-parametric Mann-Whitney U tests.   3.3.5.3 Biomarker analysis After normalization, a biomarker discovery analysis was performed to identify biomarker panels that can differentiate PA-positive individuals from PA-negative individuals at baseline 38  (before Day 2 PA challenge). The panels were identified using a biomarker development pipeline, which includes various classification methods such as elastic net, random forests and Support Vector Machines (SVM). The pipeline also incorporated a pathway-directed approach in which a list of genes was pre-selected if they were in the same pathway. Then, these genes (or features) were further selected using various classification algorithms. Area under the receiver operating characteristic curve (AUC) obtained from leave-one-out cross-validation (LOOCV) was used as a criterion to assess the performance of the identified panels. The best performing panel was then tested using individuals from the validation set (see appendix B.1 for a detailed workflow of the discovery and validation procedures).   3.3.5.4 Differential expression analysis Differential expression analysis was performed on all 166 transcripts using limma 43 R package. The Benjamini-Hochberg method was used to control the false discovery rate (BH-FDR) 45 and adjust for multiple hypotheses testing.  3.4 Results 3.4.1 Clinical and demographic characteristics As shown in Table 3.1, there was no significant difference in age, BMI, or baseline FEV1 (before methacholine challenge and before PA challenge) between the PA-positive group and the PA-negative group in the discovery set using non-parametric Mann-Whitney U tests. There was no significant difference in % change in FEV1 from baseline at any time point between the two groups post Day 1 methacholine challenge (Figure 3.1). During the course of the Day 2 PA 39  challenge (Figure 3.2), the PA-positive individuals experienced significant drops in FEV1 compared to the PA-negative individuals at all time points post-challenge (P < .001).   Figure 3.1 Percentage change in FEV1 in discovery set post methacholine challenge (Day 1). Statistics are shown as mean ± standard deviations.  40   Figure 3.2 Percentage change in FEV1 in discovery set post PA challenge (Day 2). Statistics are shown as mean ± standard deviations.  3.4.2 Comparisons on CBC/diffs Figures 3.3 and 3.4 show the relative proportions of leukocyte subtypes (neutrophils, lymphocytes, monocytes, eosinophils and basophils from left to right) of individuals in the discovery set following Day 1 methacholine challenge and Day 2 PA challenge, respectively. Two-sample t-tests were used to compare the relative proportions of leukocyte subtypes between PA-positive individuals and PA-negative individuals. There was no significant difference in any of the leukocyte subtypes at each time point following Day 1 methacholine challenge or following Day 2 PA challenge. Paired tests were used to compare the leukocytes subtype 41  proportion changes between time points, between PA-positive individuals and PA-negative individuals. To achieve this, mean leukocytes subtype proportion differences were first calculated by subtracting paired samples between time points. The mean differences were then compared between PA-positive individuals and PA-negative individuals using two-sample t-tests. During Day 2 PA challenge, neutrophils changed differently from 0h to 6h (P = 0.046) and from 2h to 6h (P = 0.005) between the PA-positive group and the PA-negative group (Figure 3.4); lymphocytes changed differently from 2h to 6h (P = 0.023) between the PA-positive group and the PA-negative group (Figure 3.4); eosinophils changed differently from 0h to 6h (P = 0.046) and from 2h to 6h (P = 0.016) between the PA-positive group and the PA-negative group (Figure 3.4). Solid lines in the figure represent the locally weighted polynomial regression (LOESS) of the data across the 3 time points and the shaded area represent the 95% confidence interval of the regression model.  42   Figure 3.3 Relative proportion (%) of leukocyte subtypes (discovery set) following Day 1 methacholine challenge. 43   Figure 3.4 Relative proportion (%) of leukocyte subtypes (discovery set) following Day 2 PA challenge.  3.4.3 Biomarker panel to distinguish PA-positives and PA-negatives Having demonstrated that there was no difference in the clinical and demographic characteristics between the PA-positive and the PA-negative groups (excepting following PA challenge) in the discovery set, we undertook a biomarker discovery approach to identify a classifier of PA-positive individuals at baseline (see section 3.3.5.3 and appendix B.1 for details). Figure 3.5 shows the cross-validated AUC of all the tested panels.  44   Figure 3.5 AUC performance of all the tested panels in discovery set. Dotted line indicates an AUC performance of 0.5.  Of all the panels tested, a two-gene panel (MAP2K2 and MAPKAPK2) identified using a MAPK signaling pathway-directed approach with random forests demonstrated the highest AUC performance of 0.847 (95% Confidence Interval: 0.631-1.000) (Figure 3.6) (see section 3.3.5.3 and appendix B.1 for details about the biomarker analysis pipeline). The phenotypic labels (PA-positive or PA-negative) were then reshuffled and the AUC was recalculated 200 times using 45  LOOCV. The AUC performance (mean ± standard deviation) after reshuffling the phenotypic labels was 0.600 ± 0.095 (Figure 3.6).   Figure 3.6 AUC performance of the reported WRCA biomarker panel (solid line) and the AUC performance after the reshuffling of phenotypic labels (dashed line) in discovery set. Error bars indicate mean sensitivity ± standard deviations. (Reprinted with permission of the American Thoracic Society. Copyright (c) 2017 American Thoracic Society. Cite: Chen Xi Yang, Amrit Singh, Young Woong Kim, Edward M Conway, Christopher Carlsten, Scott J Tebbutt/2017/Diagnosis of Western Red Cedar Asthma Using a Blood-based Gene Expression Biomarker Panel/American Journal of Respiratory and Critical Care Medicine/Epub ahead of print. The American Journal of Respiratory and Critical Care Medicine is an official journal of the American Thoracic Society.) 46   Random forests are a tree-based ensemble method for machine learning. It combines many predictive tree classifiers where the two nodes of each tree represent the phenotypic classes (in the case of binary classification) and the branches are split based on the value of a certain feature. The features are ranked based on their importance score which reflects their appearing frequencies and abilities to split the phenotypic classes.  The classification was based on the log2 expression of MAP2K2 and MAPKAPK2 (see appendix B.2 for the classification tree). The classification of individuals in the discovery set and their probability scores are shown in Figure 3.7 and 3.8, respectively. The optimal probability threshold was calculated using the Youden Index. Individuals with a probability score higher than 0.362 were classified as PA-positive. 47   Figure 3.7 Classification of individuals in the discovery set based on log2 expression of MAP2K2 and MAPKAPK2.  48    Figure 3.8 Probability score of individuals in the discovery set.  3.4.4 Differential gene expression between PA-positives and PA-negatives Comparisons of the expression of the genes were made between the PA-positive group and the PA-negative group at baseline (prior to PA challenge) and throughout the course of PA-challenge (expression change from 0h to 6h). At an BH-FDR cutoff of 10%, there were no genes that were significantly differentially expressed between the two groups. However, restricting the 49  comparisons to the two panel genes (MAPK2 and MAPKAPK2), both MAP2K2 and MAPKAPK2 showed differential expression between PA-positive and PA-negative individuals (BH-FDR = 0.027). The expression levels of MAP2K2 and MAPKAPK2 during the 6-hour PA challenge are shown in Figure 3.9. Thirty-six genes showed significant differences in expression change from 0h to 6h (expression of 6h minus expression of 0h) between the PA-positive group and the PA-negative group using an BH-FDR cutoff of 10% (see appendix B.3-4 for details). Solid lines in the figure represent the LOESS regression of the data across the 3 time points and the shaded area represent the 95% confidence interval of the regression model.  Figure 3.9 Expression levels of MAP2K2 and MAPKAPK2 during the 6-hour PA challenge.  3.4.5 Biomarker panel validation To validate this panel, 7 additional PA-positive individuals (validation set) were tested. Batch correction across the discovery and validation sets was performed using ComBat 55 R 50  package. Batch difference was used as the only variable during correction. The classification model built using original (uncorrected) data from the discovery set was recalibrated after batch correction. Using the recalibrated model and the threshold (0.362) that was previously established using the discovery samples, 6 (86%) of these new individuals were correctly classified as PA-positive. The classification of individuals in the validation set and their probability scores are shown in Figure 3.10 and 3.11, respectively.   Figure 3.10 Classification of individuals in the validation set based on log2 expression of MAP2K2 and MAPKAPK2. 51    Figure 3.11 Probability score of individuals in the validation set.  3.5 Discussion The mitogen-activated protein kinase (MAPK) signaling pathway represents a convergence of many cellular events and participates in many physiological functions. Kinases such as MAPKAPK2 and MAP2K2 in the MAPK signaling pathway play an important role in 52  activating and regulating inflammatory responses. MAPK pathway activates when stimuli such as growth factor, cell adhesion, cellular and oxidative stress, and inflammatory responses are received. Upon “switching on” the activator protein, the MAP kinases are activated through a phosphorylation cascade, where MAP3Ks (MAP kinase kinase kinases) turns into MAP2Ks (MAP kinase kinases), and then MAPKs (MAP kinases) 56. The MAPK signaling pathway divides into three major groups: extracellular signal-regulated protein kinase (ERK), c-Jun N-terminal kinase (JNK), p38 MAPK. MAPKAPK2 is a downstream molecule of p38 MAPK while MAP2K2 belongs to the ERK group. The p38 MAPK group is activated upon inflammatory responses and is involved in many aspects of asthma such as cytokine release, mast cell migration, monocyte differentiation and neutrophil recruitment 57. It is a key player in activating transcription factor GATA-binding protein (GATA) 3, which further regulates Th2 cell differentiation and Th2 cytokine expression 58. It also contributes to eosinophil differentiation 59 and inhibition of eosinophil apoptosis 60, which may be associated with the elevated levels of these cells observed in WRCA 61. Consistently, in our CBC/diff data, compared to PA-negative individuals, PA-positive individuals showed a greater decrease in the relative abundancy of blood eosinophils from 0h to 6h post-PA challenge, which may suggest a migration of eosinophils from blood to the airway. In addition, previous studies have also shown that inhibitors of p38 MAPK are effective in reducing lung inflammation in animal models of asthma and chronic obstructive pulmonary disease 62. MAP2K2 (or more commonly known as MEK2), is one of the MAP2Ks in the ERK group. Upon activation of the Ras protein, the Raf protein (MAP3K), MEKs (MAP2K) and ERKs (MAPK) are activated in a cascaded fashion. The ERK group is involved in Th2 cell differentiation and cytokine production as well 57. It regulates 53  airway smooth muscle cell proliferation 63, which may further contribute to remodeling and narrowing of the airway.   3.6 Conclusions and future directions Limitations of this study include small sample size (for both discovery and validation) and the functional significance of blood, reflecting changes in lung function. To ensure that the results were not confounded by other variables, we carefully matched the clinical and demographic characteristics of the two groups. In conclusion, we have demonstrated for the first time that peripheral whole blood may have utility as a resource to help diagnose WRCA and that the transcriptional signatures may help to further elucidate the disease mechanisms of WRCA. Further validation of the biomarker panel with larger sets and additional PA-negative individuals is required to confirm its robustness. Since the majority of the study individuals were dual responders and isolated early responders, the panel also needs to be further tested with a more mixed group of PA-positive individuals (including isolated late responders). In addition, although the panel showed adequate classification performance using baseline blood samples prior to Day 2 PA challenge, it is not clear whether Day 1 methacholine challenge affected the classification performance or not. In the future, we would also like to examine this effect using baseline blood samples from Day 1 methacholine challenge.  54  Chapter 4: Pipeline for RNA-Seq data processing  4.1 Introduction RNA-Seq is a powerful technology which allows unbiased profiling of the entire transcriptome. RNA-Seq does not rely on transcript-specific probes and therefore, supports quantification of novel transcripts. It is also less affected by background noise and signal saturation compared to microarrays. After sequencing, the obtained raw reads usually require the following processing before they are ready for further analysis: 1) quality control, 2) read alignment, and 3) read summarization. During the whole process, different tools are used and a large number of files are generated. Installation of these tools can be tedious and each tool has its own parameters, input files and output files. An automated pipeline is needed to reduce the complexity of this process and to make the data more reproducible. In this chapter, I developed an automated transcript quantification pipeline which was powered by a workflow management tool called “Snakemake”.  4.2 Methods 4.2.1 Workflow management with Snakemake The pipeline was implemented as a Snakemake workflow. This workflow is composed of a series of “rules”. Each rule specifies the input and output files and invokes shell commands that create the output files. Dependencies between the rules are determined automatically, and the jobs can be parallelized across cores, or even computer cluster nodes. Since Snakemake only updates output files if the input files are more recent, the workflow can be suspended or resumed at any time, without having to start over. Rules are carried by a Snakefile, which can serve as 55  a record of the input files, parameters and the programs used in the workflow. The inputs and program parameters can also be stored into a configuration file, which allows further flexibility. The workflow template repository is available on Github and can be cloned with the following command lines in the terminal:  # shell  # clone the repository git clone https://github.com/PROOF-centre/rnaseq_workflow.git  # move into the repository cd rnaseq_workflow  # inspect its contents ls -la  The folder contains a Snakefile, where the rules are defined, a config.yaml file, where the inputs and parameters are stored, and a requirements.yaml file, which lists the software used in the workflow. Here I will introduce each rule before all the rules are assembled in the section “All together now!” (Section 4.2.4).  4.2.2 Quality control with FastQC The first part of the pipeline runs quality control over the downloaded FASTQ files using FastQC 42.  FastQC performs simple checks on the raw sequencing data and produces a HTML-fomatted report on the analyses. These analyses can help determine whether the sequencing data has any problems that we should be aware of before doing any further analysis. Interpretations of the diagnostic plots in the report can be found on the FastQC website. This Snakemake rule takes all the (compressed) FASTQ files as input, passes the files one by one (or in pairs for paired-end reads) to the FastQC program and calls the shell 56  commands on the input files. After quality evaluation, the output reports are stored in a new folder called “FastQC”, separated with sample names. Running time will depend on the total number of reads (approximately 10 min/FASTQ file for 50M reads). The benchmarks such as running time for each sample can be stored in the “benchmarks” folder.  Snakemake Rule rule fastQC:     input:         lambda wildcards: config["samples"][wildcards.sample]     output:         "fastQC/{sample}/"     benchmark:         "benchmarks/fastQC/{sample}.benchmark.txt"     threads: 8     params:         outDir = "fastQC/{sample}/"     shell:         "zcat -c {input} | fastqc -t {threads} --outdir {params.outDir} stdin"  4.2.3 Quantifying transcript abundance with RSEM The next part of the pipeline quantifies the transcript abundance in each sample using the RSEM (RNA-Seq by Expectation-Maximization) 64 and STAR (Spliced Transcripts Alignment to a Reference) 65 programs. This is done by a two-step process: reads are first aligned to a reference genome sequence by STAR and then summarized both at gene level and isoform level by RSEM. For simplicity, RSEM includes a script — rsem-calculate-expression — that performs alignment and summarization in a single command using the parameters recommended by the Encyclopedia of DNA Elements (ENCODE) consortium. This second part of the data processing pipeline downloads and prepares the reference genome sequence, aligns the reads from each FASTQ file (or, in pairs in the case of paired-end reads) and summarizes them to gene and isoform counts for further analysis. 57   4.2.3.1 Read alignment In the first step, reads are aligned to the reference genome sequence by the STAR aligner. STAR is a multi-threaded, high performance alignment program, purpose-built for RNA-Seq. In particular, it greatly improves alignment accuracy for intron-spanning reads. STAR was chosen over alternatives, such as bowtie, mainly because of its flexibility and speed. It achieves a very high alignment rate by using a specialized in-memory reference genome index and this index can be quite large: in the case of the human genome, the system should have > 30GB of random access memory (RAM).  4.2.3.1.1 Generating the STAR index A reference genome index needs to be built in the first place, so that it can subsequently be loaded into memory prior to carrying out any read alignments. Both the reference genome sequence in a FASTA-formatted file and the gene annotation file in gene transfer format (GTF) will be used in this step. The GTF file is a tab-delimited text file format used to hold information about gene structure. The most current human reference genome and transcriptome sequences, and corresponding annotation files, can be obtained from the GENCODE website. Here I defined a pair of Snakemake rules to download these files from the Sanger Institute FTP servers to the ~/genome/ directory on the system. The download location is specified by the downloadDir parameter in the config.yaml file and can be easily modified by the user. These two rules will be automatically skipped if the ~/genome/ directory contains the downloaded FASTA and GTF files or the reference genome has been built already.   58  Snakemake Rule rule downloadGTF:     output:         expand("{downloadDir}gtf/gencode.v25.annotation.gtf", downloadDir = config["downloadDir"])     params:         url = "ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_25/gencode.v25.annotation.gtf.gz",         downloadDir = config["downloadDir"]     shell:         "wget -O - {params.url} | gunzip -c > {output}"  Snakemake Rule rule downloadFASTA:     output:         expand("{downloadDir}fasta/GRCh38.p7.genome.fa", downloadDir = config["downloadDir"])     params:         url = "ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_25/GRCh38.p7.genome.fa.gz",         downloadDir = config["downloadDir"]     shell:         "wget -O - {params.url} | gunzip -c > {output}"  After the FASTA and GTF files are downloaded to the ~/genome/ directory, the next rule invokes the rsem-prepare-reference RSEM script, which will call STAR and build a suitable reference genome index. Snakemake Rule rule generateRSEMIndex:     input:         gtf = expand("{downloadDir}gtf/gencode.v25.annotation.gtf", downloadDir = config["downloadDir"]),         fasta = expand("{downloadDir}fasta/GRCh38.p7.genome.fa", downloadDir = config["downloadDir"])     output:         expand("{downloadDir}index/rsem/", downloadDir = config["downloadDir"])     shell:         "rsem-prepare-reference \ 59          -p 8 \         --star \         --gtf {input.gtf} \         {input.fasta} \         {output}GRCh38"  When the pipeline is invoked for the first time, it calls the downloadGTF, downloadFASTA, and generateRSEMIndex rules to download the GTF annotation and reference genome sequence FASTA files, and generate the necessary STAR genome index. If the genome index already exists, however, the pipeline will skip these rules and immediately start to process the FASTQ files.  4.2.3.1.2 Aligning reads Once the genome index has been built, we are ready to align the reads. STAR will take either a single, or a pair of (compressed) FASTQ files for each sample, take each read or read-pair (sometimes referred to as a fragment) in turn, and find the optimal alignment along the reference genome sequence. Reads (or fragments), along with location of the optimal alignment, are returned and written to a new file, called a BAM file. This is the binary version of a sequence alignment map (SAM) file, a tab-delimited text file standard for sequence alignment data. Since the — rsem-calculate-expression — script of RSEM does read alignment and summarization in a single shell command, no rule was defined specifically for alignment.  4.2.3.2 Read summarization This part of the pipeline calls the — rsem-calculate-expression — script of RSEM to invoke the STAR aligner to perform read alignment and the RSEM program to produce two tab-delimited text files containing the summarized read counts at both the gene level 60  and the isoform level. This Snakemake rule passes single, or pairs of (compressed) FASTQ files to STAR (via rsem-calculate-expression) for alignment using the created genome index. Conveniently, STAR can read from compressed FASTQ files directly (specified using --star-gzipped-read-file). Here I chose to discard the large BAM files following quantification with the flag --no-bam-output, but this can be modified in the Snakefile if the BAM files are of interest. Snakemake Rule rule RSEM:     input:         fastqs = lambda wildcards: config["samples"][wildcards.sample],         ref = expand("{downloadDir}index/rsem/", downloadDir = config["downloadDir"])     output:         "rsem/{sample}.genes.results",         "rsem/{sample}.isoforms.results"     benchmark:         "benchmarks/rsem/{sample}.benchmark.txt"     threads: 8     params:          readsType = config["readsType"],         id = "{sample}"     shell:        "rsem-calculate-expression \         {params.readsType} \         --star \         --star-gzipped-read-file \         -p {threads} \         --no-bam-output \         {input.fastqs} \         {input.ref}GRCh38 \         {params.id} && \         mv -v *.genes.results *.isoforms.results *.stat/ -t rsem/"   4.2.4 All together now!  Having described all the rules required to go from raw RNA-Seq reads contained in FASTQ files to summarized read counts for a set of annotated genomic features, here I put 61  together all the defined rules into a single Snakefile that both powers, and serves as a record of the data processing pipeline. I will also describe the config.yaml file -- a mechanism that allows us to customize this pipeline for different analytical requirements.   4.2.4.1 The Snakefile The Snakefile starts by specifying that the parameters are stored in a configuration file called config.yaml. Besides the rules that have been described in previous sections, we have a top-level rule (rule all) which invokes the other rules based on its dependencies. It also stores the current working environment, including the software installed and their version numbers, into an environment.yaml file for future reference. configfile: "config.yaml"  rule all:     input:         expand("fastQC/{sample}/", sample = config["samples"]),         expand("rsem/{sample}.genes.results", sample = config["samples"]),         expand("rsem/{sample}.isoforms.results", sample = config["samples"])     shell:         "conda env export > environment.yaml"  rule downloadGTF:     output:         expand("{downloadDir}gtf/gencode.v25.annotation.gtf", downloadDir = config["downloadDir"])     params:         url = "ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_25/gencode.v25.annotation.gtf.gz",         downloadDir = config["downloadDir"]     shell:         "wget -O - {params.url} | gunzip -c > {output}"  rule downloadFASTA:     output:         expand("{downloadDir}fasta/GRCh38.p7.genome.fa", downloadDir = config["downloadDir"]) 62      params:         url = "ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_25/GRCh38.p7.genome.fa.gz",         downloadDir = config["downloadDir"]     shell:         "wget -O - {params.url} | gunzip -c > {output}"  rule generateRSEMIndex:     input:         gtf = expand("{downloadDir}gtf/gencode.v25.annotation.gtf", downloadDir = config["downloadDir"]),         fasta = expand("{downloadDir}fasta/GRCh38.p7.genome.fa", downloadDir = config["downloadDir"])     output:         expand("{downloadDir}index/rsem/", downloadDir = config["downloadDir"])     shell:         "rsem-prepare-reference \         -p 8 \         --star \         --gtf {input.gtf} \         {input.fasta} \         {output}GRCh38"  rule fastQC:     input:         lambda wildcards: config["samples"][wildcards.sample]     output:         "fastQC/{sample}/"     benchmark:         "benchmarks/fastQC/{sample}.benchmark.txt"     threads: 8     params:         outDir = "fastQC/{sample}/"     shell:         "zcat -c {input} | fastqc -t {threads} --outdir {params.outDir} stdin"  rule RSEM:     input:         fastqs = lambda wildcards: config["samples"][wildcards.sample],         ref = expand("{downloadDir}index/rsem/", downloadDir = config["downloadDir"])     output:         "rsem/{sample}.genes.results",         "rsem/{sample}.isoforms.results"     benchmark:         "benchmarks/rsem/{sample}.benchmark.txt"     threads: 8 63      params:         readsType = config["readsType"],         id = "{sample}"     shell:         "rsem-calculate-expression \         {params.readsType} \         --star \         --star-gzipped-read-file \         -p {threads} \         --no-bam-output \         {input.fastqs} \         {input.ref}GRCh38 \         {params.id} && \         mv -v *.genes.results *.isoforms.results *.stat/ -t rsem/"  4.2.4.2 The config.yaml file While some commonly used parameters, such as taking the compressed format of FASTQ files as input, are specified in the Snakefile, some parameters, like whether to run STAR or RSEM in paired-end mode, may need to be modified for every new analysis. In the reported pipeline, the commonly used parameters were directly defined in the Snakefile while the project-specific parameters were stored in the config.yaml file. Each new project can be initialized with a static copy of the Snakefile and project-specific parameters can be modified in the config.yaml file. The config.yaml file is divided into two parts: the first part contains the parameters passed to RSEM and the STAR aligner (e.g. the path to the genome reference folder or whether the reads are paired-end reads or not) while the second part lists the samples to be processed, including sample IDs and paths to their corresponding FASTQ file(s).  4.2.4.3 Running the pipeline A number of programs need to be installed and placed on the system’s path before running the pipeline. The following steps describe the way of setting up a suitable environment 64  on a Linux system, as a regular user without administrative privileges, using Miniconda. Continue from section 4.2.1 where the template folder has been downloaded, 1. At the command line, install Miniconda. # shell  # download the install script wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh  # run it bash Miniconda3-latest-Linux-x86_64.sh  # cleanup rm Miniconda3-latest-Linux-x86_64.sh  2. Create a named Miniconda environment with the required software. # shell conda env create -n "rnaseq" -f "requirements.yaml"  3. Activate the newly created Miniconda environment. # shell source activate "rnaseq"  All software specified in the requirements.yaml file should now be available to use at the command line.  4. Create a fastq folder under the project directory. # shell # create a new directory for the FASTQ files mkdir fastq  5. Move the FASTQ files from your experiment into the newly created directory. # shell # move all FASTQ files into it, e.g. mv /path/to/experiment/*.fastq.gz fastq/.  65  Next, the config.yaml file needs to be modified with the project-specific parameters and sample list. The pipeline is now ready to start. Before running, we can perform a “dry” run which prints the entire execution plan. 6. Preview the pipeline execution with the -n and -p flags. # shell snakemake -np  7. Run the pipeline, utilizing multiple cores with the --cores flag. # shell snakemake -p --cores 8  4.2.5 Structure of working directory Here is what the home directory should look like after running the pipeline. |-[Miniconda3]     |-This is the working environment with all the tools installed inside. |-[genome]     |-[gtf]         |-gencode.v25.annotation.gtf     |-[fasta]         |-GRCh38.p7.genome.fa     |-[index]         |-[rsem]             |-This folder contains the RSEM genome reference files. |-[rnaseq_workflow]     |-requirements.yaml     |-Snakefile     |-config.yaml     |-[fastq]         |-sample1_1.fastq.gz         |-sample1_2.fastq.gz         |-sample2_1.fastq.gz         |-sample2_2.fastq.gz         ...     |-[fastQC]         |-[sample1]             |-stdin_fastqc.html             |-stdin_fastqc.zip         |-[samples2]             |-stdin_fastqc.html 66              |-stdin_fastqc.zip         ...     |-[rsem]         |-sample1.genes.results         |-sample1.isoforms.results         |-[sample1.stat]         |-sample2.genes.results         |-sample2.isoforms.results         |-[sample2.stat]         ...     |-[benchmarks]         |-[fastQC]         |-[rsem]  4.3 Conclusions During RNA-Seq data processing, it is difficult to keep track of all the input, intermediate and output files. The process is also likely to take a long time because of the amount of the data. With this automated pipeline, the inputs and outputs can be placed in the project folder in a highly organized manner. The pipeline can also be suspended and resumed at any time without having to re-run it from the beginning. For each project, we only need to modify the parameters in the configuration file, which is much easier than rewriting scripts every time we start a new project, thereby greatly speeding up our workflow. In addition, since the working instructions, software (including version numbers), and software parameters are stored automatically, the pipeline also ensures the reproducibility of the data if we ever want to return to a project. 67  Reference 1.  Diamant Z, Gauvreau GM, Cockcroft DW, Boulet L-P, Sterk PJ, de Jongh FHC, et al. Inhaled allergen bronchoprovocation tests. J Allergy Clin Immunol. 2013;132:1045–1055.e6.  2.  Chan-Yeung M. Mechanism of occupational asthma due to Western red cedar (Thuja plicata). Am J Ind Med. 1994;25:13–8.  3.  Chan-Yeung M. Immunologic and nonimmunologic mechanisms in asthma due to western red cedar (Thuja plicata). J Allergy Clin Immunol. 1982;70:32–7.  4.  Chan-Yeung M, Lam S, Koener S. Clinical features and natural history of occupational asthma due to western red cedar (Thuja plicata). Am J Med. 1982;72:411–5.  5.  Kim H, Mazza J. Asthma. Allergy Asthma Clin Immunol. 2011;7:S2.  6.  Anderson GP. Endotyping asthma: new insights into key pathogenic mechanisms in a complex, heterogeneous disease. The Lancet. 2008;372:1107–19.  7.  Bai TR, Knight DA. Structural changes in the airways in asthma: observations and consequences. Clin Sci. 2005;108:463–77.  8.  Vedal S, Moria C-Y, Enarson D, Fera T, MacLean L, Tse KS, et al. Symptoms and Pulmonary Function in Western Red Cedar Workers Related to Duration of Employment and Dust Exposure. Arch Environ Health Int J. 1986;41:179–83.  9.  Chan-Yeung M. Fate of Occupational Asthma. Am Rev Respir Dis. 1977;116:1023–9.  10.  Chan-Yeung M, MacLean L, Paggiaro PL. Follow-up study of 232 patients with occupational asthma caused by western red cedar (Thuja plicata). J Allergy Clin Immunol. 1987;79:792–6.  11.  Carlsten C, Dybuncio A, Pui MM, Chan-Yeung M. Respiratory Impairment and Systemic Inflammation in Cedar Asthmatics Removed from Exposure. PLOS ONE. 2013;8:e57166.  12.  Lin FJ, Dimich-Ward H, Chan-Yeung M. Longitudinal decline in lung function in patients with occupational asthma due to western red cedar. Occup Environ Med. 1996;53:753–6.  13.  Chan-Yeung M, Barton GM, MaClean L, Grzybowski S. Occupational Asthma and Rhinitis Due to Western Red Cedar (Thuja plicata). Am Rev Respir Dis. 1973;108:1094–102.  14.  Biagioni BJ, Pui MM, Fung E, Wong S, Hosseini A, Dybuncio A, et al. Sputum adiponectin as a marker for western red cedar asthma. J Allergy Clin Immunol. 2014;134:1446–1448.e5.  68  15.  Dominguez-Ortega J, Barranco P, Rodríguez-Pérez R, Quirce S. Biomarkers in Occupational Asthma. Curr Allergy Asthma Rep. 2016;16:63.  16.  Tse KS, Chan H, Chan-Yeung M. Specific IgE antibodies in workers with occupational asthma due to western red cedar. Clin Exp Allergy. 1982;12:249–58.  17.  Tarlo SM, Lemiere C. Occupational Asthma. N Engl J Med. 2014;370:640–9.  18.  Frew A, Chan H, Dryden P, Salari H, Lam S, Chan-Yeung M. Immunologic studies of the mechanisms of occupational asthma caused by western red cedar. J Allergy Clin Immunol. 1993;92:466–78.  19.  Chan-Yeung M, Giclas PC, Henson PM. Activation of complement by plicatic acid, the chemical compound responsible for asthma due to western red cedar (Thuja plicata). J Allergy Clin Immunol. 1980;65:333–7.  20.  Klos A, Tenner AJ, Johswich K-O, Ager RR, Reis ES, Köhl J. The role of the anaphylatoxins in health and disease. Mol Immunol. 2009;46:2753–66.  21.  Obata H, Dittrick M, Chan H, Chan-Yeung M. Sputum eosinophils and exhaled nitric oxide during late asthmatic reaction in patients with western red cedar asthma. Eur Respir J. 1999;13:489–95.  22.  Chan-Yeung M, Obata H, Dittrick M, Chan H, Abboud R. Airway Inflammation, Exhaled Nitric Oxide, and Severity  of Asthma in Patients with Western Red Cedar Asthma. Am J Respir Crit Care Med. 1999;159:1434–8.  23.  Obata H, Dittrick M, Chan H, Chan-Yeung M. Sputum eosinophils and exhaled nitric oxide during late asthmatic reaction in patients with western red cedar asthma. Eur Respir J. 1999;13:489–95.  24.  Chan-Yeung M, Obata H, Dittrick M, Chan H, Abboud R. Airway Inflammation, Exhaled Nitric Oxide, and Severity  of Asthma in Patients with Western Red Cedar Asthma. Am J Respir Crit Care Med. 1999;159:1434–8.  25.  Lam S, LeRiche J, Phillips D, Chan-Young M. Cellular and protein changes in bronchial lavage fluid after late asthmatic reation in patients with red cedar asthma. J Allergy Clin Immunol. 1987;80:44–50.  26.  Frew AJ, Chan H, Lam S, Chan-Yeung M. Bronchial inflammation in occupational asthma due to western red cedar. Am J Respir Crit Care Med. 1995;151:340–4.  27.  Kukurba KR, Montgomery SB. RNA Sequencing and Analysis. Cold Spring Harb Protoc. 2015;2015:pdb.top084970.  28.  Zhang X, Köhl J. A complex role for complement in allergic asthma. Expert Rev Clin Immunol. 2010;6:269–77.  69  29.  Markiewski MM, Nilsson B, Ekdahl KN, Mollnes TE, Lambris JD. Complement and coagulation: strangers or partners in crime? Trends Immunol. 2007;28:184–92.  30.  Nagata S, Glovsky MM. Activation of human serum complement with allergens. J Allergy Clin Immunol. 1987;80:24–32.  31.  Maruo K, Akaike T, Ono T, Okamoto T, Maeda H. Generation of anaphylatoxins through proteolytic processing of C3 and C5 by house dust mite protease. J Allergy Clin Immunol. 1997;100:253–60.  32.  Krug N, Tschernig T, Erpenbeck VJ, Hohlfeld JM, Köhl J. Complement Factors C3a and C5a Are Increased in  Bronchoalveolar Lavage Fluid after Segmental Allergen Provocation in Subjects with Asthma. Am J Respir Crit Care Med. 2001;164:1841–3.  33.  A genome-wide search for asthma susceptibility loci in ethnically diverse populations. The Collaborative Study on the Genetics of Asthma (CSGA). Nat Genet. 1997;15:389–92.  34.  Hasegawa K, Tamari M, Shao C, Shimizu M, Takahashi N, Mao X-Q, et al. Variations in the C3, C3a receptor, and C5 genes affect susceptibility to bronchial asthma. Hum Genet. 2004;115:295–301.  35.  Drouin SM, Sinha M, Sfyroera G, Lambris JD, Wetsel RA. A Protective Role for the Fifth Complement Component (C5) in Allergic Airway Disease. Am J Respir Crit Care Med. 2006;173:852–7.  36.  McKinley L, Kim J, Bolgos GL, Siddiqui J, Remick DG. Allergens induce enhanced bronchoconstriction and leukotriene production in C5 deficient mice. Respir Res. 2006;7:129.  37.  Köhl J, Baelder R, Lewkowich IP, Pandey MK, Hawlisch H, Wang L, et al. A regulatory role for the C5a anaphylatoxin in type 2 immunity in asthma. J Clin Invest. 2006;116:783–96.  38.  Mizutani N, Nabe T, Yoshino S. Complement C3a regulates late asthmatic response and airway hyperresponsiveness in mice. J Immunol Baltim Md 1950. 2009;183:4039–46.  39.  Mizutani N, Goshima H, Nabe T, Yoshino S. Complement C3a-induced IL-17 plays a critical role in an IgE-mediated late-phase asthmatic response and airway hyperresponsiveness via neutrophilic inflammation in mice. J Immunol Baltim Md 1950. 2012;188:5694–705.  40.  Singh A, Yamamoto M, Kam SHY, Ruan J, Gauvreau GM, O’Byrne PM, et al. Gene-metabolite expression in blood can discriminate allergen-induced isolated early from dual asthmatic responses. PloS One. 2013;8:e67907.  70  41.  Singh A, Cohen Freue GV, Oosthuizen JL, Kam SHY, Ruan J, Takhar MK, et al. Plasma proteomics can discriminate isolated early from dual responses in asthmatic individuals undergoing an allergen inhalation challenge. Proteomics Clin Appl. 2012;6:476–85.  42.  Andrews S. FastQC A Quality Control tool for High Throughput Sequence Data [Internet]. http://www.bioinformatics.babraham.ac.uk/projects/fastqc/. [cited 2017 Aug 10]. Available from: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ 43.  Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43:e47.  44.  Smyth GK. Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol. 2004;3:Article3.  45.  Benjamini Y, Hochberg Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. J R Stat Soc Ser B Methodol. 1995;57:289–300.  46.  Dunn OJ. Multiple Comparisons Using Rank Sums. Technometrics. 1964;6:241–52.  47.  Vignola AM, Bonanno A, Mirabella A, Riccobono L, Mirabella F, Profita M, et al. Increased Levels of Elastase and α1-Antitrypsin in Sputum of Asthmatic Patients. Am J Respir Crit Care Med. 1998;157:505–11.  48.  Gharib SA, Nguyen EV, Lai Y, Plampin JD, Goodlett DR, Hallstrand TS. Induced sputum proteome in healthy subjects and asthmatic patients. J Allergy Clin Immunol. 2011;128:1176–1184.e6.  49.  Wills-Karp M. Complement activation pathways: a bridge between innate and adaptive immune responses in asthma. Proc Am Thorac Soc. 2007;4:247–51.  50.  Gauvreau G, Boulet L-P, Severino B, Watson R, Peng T, Obminski G, et al. The effect of C5 inhibition by eculizumab on allergen-induced asthmatic responses in patients. Mol Immunol. 2009;46:2837–8.  51.  Monk PN, Scola A-M, Madala P, Fairlie DP. Function, structure and therapeutic potential of complement C5a receptors. Br J Pharmacol. 2007;152:429–48.  52.  Yang CX, Singh A, Kim YW, Conway EM, Carlsten C, Tebbutt SJ. Diagnosis of Western Red Cedar Asthma Using a Blood-based Gene Expression Biomarker Panel. Am J Respir Crit Care Med [Internet]. 2017 [cited 2017 May 11]; Available from: http://www.atsjournals.org/doi/abs/10.1164/rccm.201608-1740LE 53.  Tebbutt SJ, He J-Q, Singh A, Shannon CP, Ruan J, Carlsten C. Transcriptional Changes of Blood Eosinophils After Methacholine Inhalation Challenge in Asthmatics. Genomics Insights. 2012;5:1–12.  71  54.  Crapo RO, Casaburi R, Coates AL, Enright PL, Hankinson JL, Irvin CG, et al. Guidelines for methacholine and exercise challenge testing-1999. This official statement of the American Thoracic Society was adopted by the ATS Board of Directors, July 1999. Am J Respir Crit Care Med. 2000;161:309–29.  55.  Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostat Oxf Engl. 2007;8:118–27.  56.  Johnson GL, Lapadat R. Mitogen-activated protein kinase pathways mediated by ERK, JNK, and p38 protein kinases. Science. 2002;298:1911–2.  57.  Pelaia G, Cuda G, Vatrella A, Gallelli L, Caraglia M, Marra M, et al. Mitogen-activated protein kinases and asthma. J Cell Physiol. 2005;202:642–53.  58.  Gross NJ, Barnes PJ. New Therapies for Asthma and Chronic Obstructive Pulmonary Disease. Am J Respir Crit Care Med. 2016;195:159–66.  59.  Adachi T, Choudhury BK, Stafford S, Sur S, Alam R. The differential role of extracellular signal-regulated kinases and p38 mitogen-activated protein kinase in eosinophil functions. J Immunol Baltim Md 1950. 2000;165:2198–204.  60.  Kankaanranta H, De Souza PM, Barnes PJ, Salmon M, Giembycz MA, Lindsay MA. SB 203580, an inhibitor of p38 mitogen-activated protein kinase, enhances constitutive apoptosis of cytokine-deprived human eosinophils. J Pharmacol Exp Ther. 1999;290:621–8.  61.  Chan-Yeung M, Barton GM, MaClean L, Grzybowski S. Occupational Asthma and Rhinitis Due to Western Red Cedar (Thuja plicata). Am Rev Respir Dis. 1973;108:1094–102.  62.  Norman P. Investigational p38 inhibitors for the treatment of chronic obstructive pulmonary disease. Expert Opin Investig Drugs. 2015;24:383–92.  63.  Black JL, Roth M, Lee J, Carlin S, Johnson PRA. Mechanisms of Airway Remodeling. Am J Respir Crit Care Med. 2001;164:S63–6.  64.  Li B, Dewey CN. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics. 2011;12:323.  65.  Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al. STAR: ultrafast universal RNA-seq aligner. Bioinforma Oxf Engl. 2013;29:15–21.   72  Appendices  Appendix A   Supplementary material for Chapter 2  A.1 Genes of interest (microarrays) comparing ER and DR at pre-challenge (8ERs, 6DRs) Probe ID Gene Symbol logFC* Average Expression t P-Value BH-FDR 8037374 PLAUR -0.350 8.040 -2.360 0.031 0.854 8029907 C5AR1 -0.286 9.032 -1.780 0.094 0.854 8003656 SERPINF2 0.218 5.032 1.770 0.095 0.854 7981068 SERPINA1 -0.295 9.906 -1.743 0.100 0.854 8123259 PLG 0.425 4.335 1.704 0.107 0.854 8170215 F9 0.163 2.228 1.644 0.119 0.854 8024062 CFD -0.437 9.322 -1.548 0.141 0.854 7909400 CD46 -0.229 8.986 -1.545 0.141 0.854 8159491 C8G -0.197 6.013 -1.545 0.142 0.854 7947425 CD59 -0.291 6.752 -1.535 0.144 0.854 *: mean log2(expression in DR) - mean log2(expression in ER)   73  A.2 Proteins of interest (iTRAQ proteomics) comparing ER and DR at pre-challenge (4ERs, 4DRs) Gene Symbol Protein Name FC* p-Value BH-FDR C1R cDNA FLJ54471, highly similar to Complement C1r subcomponent 0.75 0.02 0.13 C1QC Complement C1q subcomponent subunit C 0.63 0.02 0.13 C1S Complement C1s subcomponent 0.67 0.02 0.13 C7 Complement component C7 0.83 0.03 0.15 C8G Complement component C8 gamma chain 0.87 0.03 0.16 C1QB Complement component 1, q subcomponent, B chain 0.82 0.04 0.16 C5 Complement component C5 0.61 0.05 0.16 * Fold-change: average expression in DR / average expression in ER  A.3 NanoString quality control (QC) criteria 1. Imaging QC: % FOV (field of view) must be greater than 75 FOV. 2. Binding Density QC: Binding density must be between 0.05 and 2.25. 3. Positive Control Linearity QC: The R2 must be greater than 0.9 between the counts and concentrations of the 6 positive controls. 4. Positive Control Limit of Detection QC: The second lowest positive control spike in (0.5fM) must have counts greater than the Mean ± 2SD of the negative controls for each sample.   74  A.4 NanoString probe sequence of candidate genes Gene Name Accession Position Target Sequence Tm CP Tm RP C3AR1 NM_004054.2 416-515 CAGTGTCTTCCTGCTTACTGCCATTAGCCTGGATCGCTGTCTTGTGGTATTCAAGCCAATCTGGTGTCAGAATCATCGCAATGTAGGGATGGCCTGCTCT 80 81 C5AR1 NM_001736.2 1196-1295 TTGCCTGTCTTTCCCAGACTTGTCCCTCCTTTTCCAGCGGGACTCTTCTCATCCTTCCTCATTTGCAAGGTGAACACTTCCTTCTAGGGAGCACCCTCCC 79 82 CD46 NM_172350.1 366-465 TATACCTCCTCTTGCCACCCATACTATTTGTGATCGGAATCATACATGGCTACCTGTCTCAGATGACGCCTGTTATAGAGAAACATGTCCATATATACGG 80 80 CD55 NM_000574.3 102-201 CCCTACTCCACCCGTCTTGTTTGTCCCACCCTTGGTGACGCAGAGCCCCAGCCCAGACCCCGCCCAAAGCACTCATTTAACTGGTATTGCGGAGCCACGA 82 82 CD59 NM_000611.4 731-830 GACTTGAACTAGATTGCATGCTTCCTCCTTTGCTCTTGGGAAGACCAGCTTTGCAGTGACAGCTTGAGTGGGTTCTCTGCAGCCCTCAGATTATTTTTCC 81 78 F13A1 NM_000129.3 3197-3296 TTCAGGTCCCCTTTCAGAGATATAATAAGCCCAACAAGTTGAAGAAGCTGGCGGATCTAGTGACCAGATATATAGAAGGACTGCAGCCACTGATTCTCTC 80 85 PLAUR NM_001005376.1 441-540 GAGAAGACCAACAGGACCCTGAGCTATCGGACTGGCTTGAAGATCACCAGCCTTACCGAGGTTGTGTGTGGGTTAGACTTGTGCAACCAGGGCAACTCTG 83 83 SERPINA1 NM_000295.4 761-860 TCACTGTCAACTTCGGGGACACCGAAGAGGCCAAGAAACAGATCAACGATTACGTGGAGAAGGGTACTCAAGGGAAAATTGTGGATTTGGTCAAGGAGCT 81 79 Tm CP: melting temperature of the capture probe Tm RP: melting temperature of the reporter probe   75  Appendix B   Supplementary material for Chapter 3   B.1 Discovery and validation workflow of the WRCA panel   76  B.2 Classification tree of the best performing panel     77  B.3 List of significant genes comparing expression change from 0h to 6h (PA-positive vs. PA-negative) Gene Symbol t P-Value BH-FDR GATA3 -4.090 0.001 0.030 ENTPD5 4.030 0.001 0.030 CHP1 3.996 0.001 0.030 CARM1 3.800 0.001 0.030 HIP1 3.740 0.001 0.030 SPARC 3.662 0.002 0.030 SULT1A2 3.644 0.002 0.030 PELI1_isoform 3.618 0.002 0.030 NAPA 3.573 0.002 0.030 IL17A 3.530 0.002 0.030 unknown2a_comp55647_c0_seq2 -3.522 0.002 0.030 SH3BGRL3 3.493 0.002 0.030 MPPED1 3.484 0.003 0.030 PSMF1 3.445 0.003 0.030 RALGPS2_isoform 3.423 0.003 0.030 C9orf78 3.399 0.003 0.030 HBA2 3.384 0.003 0.030 PDCD1 3.362 0.003 0.030 F13A1 3.342 0.003 0.030 MBNL3 3.277 0.004 0.034 RRAD 3.236 0.004 0.035 GTF2H2 3.018 0.007 0.054 MAPKAPK2 2.903 0.009 0.065 unknown2_comp55647_c0_seq2 2.888 0.010 0.065 noNamelncRNA 2.879 0.010 0.065 RPS6 -2.845 0.010 0.066 MT-ND1 2.834 0.011 0.066 AKT3 -2.777 0.012 0.072 DAP 2.760 0.013 0.072 SF3B1 -2.720 0.014 0.076 MT-CYB 2.672 0.015 0.081 ATP8A1 -2.650 0.016 0.083 RALGPS2 -2.592 0.018 0.087 78  Gene Symbol t P-Value BH-FDR SCARNA5 -2.575 0.019 0.087 FOS 2.573 0.019 0.087 PPP3R1 2.572 0.019 0.087    79  B.4 Expression levels of the significant genes comparing expression change from 0h to 6h (PA-positive vs. PA-negative)  

Cite

Citation Scheme:

        

Citations by CSL (citeproc-js)

Usage Statistics

Share

Embed

Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                        
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            src="{[{embed.src}]}"
                            data-item="{[{embed.item}]}"
                            data-collection="{[{embed.collection}]}"
                            data-metadata="{[{embed.showMetadata}]}"
                            data-width="{[{embed.width}]}"
                            async >
                            </script>
                            </div>
                        
                    
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.24.1-0354271/manifest

Comment

Related Items