UBC Research Data

Whole blood RNA-seq demonstrates an increased host immune response in individuals with cystic fibrosis who develop nontuberculous mycobacterial pulmonary disease Prieto, Miguel; Quon, Bradley; Jang, Jiah; Franciosi, Alessandro N.; Av-Gay, Yossef; Bach, Horacio; Tebbutt, Scott J.

Description

Abstract

Background

Individuals with cystic fibrosis have an elevated lifetime risk of colonization, infection, and disease caused by nontuberculous mycobacteria. A prior study involving non-cystic fibrosis individuals reported a gene expression signature associated with susceptibility to nontuberculous mycobacteria pulmonary disease (NTM-PD). In this study, we determined whether people living with cystic fibrosis who progress to NTM-PD have a gene expression pattern similar to the one seen in the non-cystic fibrosis population.

Methods

We evaluated whole blood transcriptomics using bulk RNA-seq in a cohort of cystic fibrosis patients with samples collected closest in timing to the first isolation of nontuberculous mycobacteria. The study population included patients who did (n = 12) and did not (n = 30) develop NTM-PD following the first mycobacterial growth. Progression to NTM-PD was defined by a consensus of two expert clinicians based on reviewing clinical, microbiological, and radiological information. Differential gene expression was determined by DESeq2.

Results

No differences in demographics or composition of white blood cell populations between groups were identified at baseline. Out of 213 genes associated with NTM-PD in the non-CF population, only two were significantly different in our cystic fibrosis NTM-PD cohort. Gene set enrichment analysis of the differential expression results showed that CF individuals who developed NTM-PD had higher expression levels of genes involved in the interferon (α and γ), tumor necrosis factor, and IL6-STAT3-JAK pathways.

Conclusion

In contrast to the non-cystic fibrosis population, the gene expression signature of patients with cystic fibrosis who develop NTM-PD is characterized by increased innate immune responses.; Methods <h3 style="text-align:left;" align="left">Study population and clinical data</h3> This study is a secondary data analysis using blood samples and data from the “CF Biomarker” cohort approved by the University of British Columbia-Providence Health Care ethics review board (H12-00835). The local ethics board also reviewed and approved the secondary analysis (H20-00117). Patients in the parent cohort were recruited following informed consent at the St. Paul’s Hospital Adult CF Clinic (Vancouver, Canada) between January 2012 and December 2019. In the current analysis, we included participants who consented to the future use of their samples and data, had at least one positive respiratory culture for NTM, and had a whole blood RNA sample available (PAXgene® stored at -70°C). Lung transplant recipients and subjects without a definite diagnosis of CF were excluded. We preferentially selected blood samples taken during clinically stable periods and closest to the first positive growth of NTM. We did not limit blood sampling to within a specified time frame in relation to NTM infection or the development of NTM-PD as we expected an intrinsic downregulation of gene expression in immune‑related pathways in patients with CF who develop NTM-PD. <h3 style="text-align:left;" align="left">Clinical and demographic data analysis</h3> Demographic characteristics, anthropometric measurements, pulmonary function tests, CFTR genotyping results, and microbiology laboratory results were extracted from the clinical charts using a pre-defined case report form. Clinical data were extracted from baseline positive NTM culture until December 2019; participants were censored at the time of NTM-PD diagnosis, lung transplant, or death. The diagnosis of NTM-PD was defined independently by two expert CF clinicians, based on current CF NTM guidelines, and disagreements were resolved by consensus. Lung function measurements (forced expiratory volume in 1 second [FEV1]) were standardized according to the 2012 Global Lung Function Initiative equation. <h3 style="text-align:left;">RNA extraction and RNA sequencing experiments</h3> The PAXgene Blood micro-RNA Kit was used for RNA extraction (in five batches) following the manufacturer’s instructions (QIAGEN) and omitting the DNA depletion steps. The quality of the extracted RNA was evaluated using the 260/280nm ratio in a NanoDrop™ spectrophotometer; the mean concentration was 117.4 ± 73.2 (SD) ng/uL. The RNA Integrity Number (RIN) was evaluated using a 2100 Bioanalyzer instrument from Agilent. A RIN ≥ 7 was required for sequencing; two samples failed bioanalyzer quality control and were re-extracted. Poly-A RNA-seq libraries were prepared in a single batch with a starting concentration of 250 ng per sample. The library preparation was performed using the strand-specific Nextera NEB mRNA kit with adapters AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC and AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT. The NovaSeq 6000 S4 PE100 (Illumina) platform with automatic base calling (RTA3) and an initial concentration of 200 pM per library was used for sequencing. Samples were multiplexed in a single flow cell after dual barcoding. The sequencing run generated 150 bp paired-end reads, the lowest mean Phred+33 score per file was 36/40. The median number of reads per sample was 65 x 106 (range: 36–151 x 106). All sequencing data from this study can be found in NCBI Gene Expression Omnibus with accession GSE205161. <h3 style="text-align:left;">RNA-seq data analysis</h3> Alignment and quantification of raw reads (FASTQ format) were performed in a High-Performance computational cluster (CentOS 7 Linux). Fast-QC was used to evaluate the quality of raw reads before and after alignment. Untrimmed reads were aligned to the primary human assembly GRCh38.p13 v38 (May 21, 2021) from GENCODE using STAR v 2.7. We produced gene‑level quantification (bam format) using RSEM v1.3.3 with default parameters. Quality control of alignment was performed using Picard tools, and quality reports were summarized using MultiQC. The raw count data was imported to R version 4.1.1 using tximport and annotated with the Ensembl database version 104 of Bioconductor. Globin subunits and genes showing abnormally large expression counts (> 7*107) were filtered out before analysis; before filtering, 68–93% of reads per sample were overrepresented. An outlier sample was detected in exploratory principal component analysis (CFB2006) and removed from further analysis. Count data were corrected for extraction batches using the ComBat-seq algorithm. Differential gene expression of the group with vs. without NTM-PD was performed using DESeq2 (v 1.3.2) with a cut-off FDR of 0.3. No minimum fold-change threshold was defined to increase the recovery of differentially expressed genes in this small exploratory cohort. Finally, gene set enrichment analysis of the molecular signatures database (human hallmark pathways) was performed with the fgsea package using the fold change ranked results from DESeq2. The cell-specific population compositions were estimated and explored using gene expression deconvolution in cibersortX against the complete blood cell counts (CBC) measured in patients at the time of sample procurement. The deconvolution process used the LM22 (22 blood immune cell types) signature matrix, bulk mode batch correction, and a hundred iterations.; Usage notes

The data sets are provided as comma-separated values and can be opened with standard statistical software or explored with a spreadsheet program. In our analyses, we employed R and the GUI R studio (v 4.1.1) for analysis. Raw sequencing processing and transcript counting was done in a CentOS high performance cluster, dependencies and commands ran are described inside markdown scripts.

Item Metadata

Title	Whole blood RNA-seq demonstrates an increased host immune response in individuals with cystic fibrosis who develop nontuberculous mycobacterial pulmonary disease
Creator	Prieto, Miguel; Quon, Bradley; Jang, Jiah; Franciosi, Alessandro N.; Av-Gay, Yossef; Bach, Horacio; Tebbutt, Scott J.
Date Issued	2023-04-27
Description	<b>Abstract</b><br/> <strong>Background </strong></p> Individuals with cystic fibrosis have an elevated lifetime risk of colonization, infection, and disease caused by nontuberculous mycobacteria. A prior study involving non-cystic fibrosis individuals reported a gene expression signature associated with susceptibility to nontuberculous mycobacteria pulmonary disease (NTM-PD). In this study, we determined whether people living with cystic fibrosis who progress to NTM-PD have a gene expression pattern similar to the one seen in the non-cystic fibrosis population. <strong> </strong></p> <strong>Methods</strong></p> We evaluated whole blood transcriptomics using bulk RNA-seq in a cohort of cystic fibrosis patients with samples collected closest in timing to the first isolation of nontuberculous mycobacteria. The study population included patients who did (n = 12) and did not (n = 30) develop NTM-PD following the first mycobacterial growth. Progression to NTM-PD was defined by a consensus of two expert clinicians based on reviewing clinical, microbiological, and radiological information. Differential gene expression was determined by DESeq2.</p> <strong>Results</strong></p> No differences in demographics or composition of white blood cell populations between groups were identified at baseline. Out of 213 genes associated with NTM-PD in the non-CF population, only two were significantly different in our cystic fibrosis NTM-PD cohort. Gene set enrichment analysis of the differential expression results showed that CF individuals who developed NTM-PD had higher expression levels of genes involved in the interferon (α and γ), tumor necrosis factor, and IL6-STAT3-JAK pathways. <strong> </strong></p> <strong>Conclusion</strong></p> In contrast to the non-cystic fibrosis population, the gene expression signature of patients with cystic fibrosis who develop NTM-PD is characterized by increased innate immune responses.</p>; <b>Methods</b><br /><h3 style="text-align:left;" align="left"><span lang="EN-US">Study population and clinical data</span></h3> <p style="text-align:left;" align="left">This study is a secondary data analysis using blood samples and data from the “CF Biomarker” cohort approved by the University of British Columbia-Providence Health Care ethics review board (H12-00835). The local ethics board also reviewed and approved the secondary analysis (H20-00117). Patients in the parent cohort were recruited following informed consent at the St. Paul’s Hospital Adult CF Clinic (Vancouver, Canada) between January 2012 and December 2019. In the current analysis, we included participants who consented to the future use of their samples and data, had at least one positive respiratory culture for NTM, and had a whole blood RNA sample available (PAXgene® stored at -70°C). Lung transplant recipients and subjects without a definite diagnosis of CF were excluded. We preferentially selected blood samples taken during clinically stable periods and closest to the first positive growth of NTM. We did not limit blood sampling to within a specified time frame in relation to NTM infection or the development of NTM-PD as we expected an intrinsic downregulation of gene expression in immune‑related pathways in patients with CF who develop NTM-PD.</p> <h3 style="text-align:left;" align="left"><span lang="EN-US">Clinical and demographic data analysis</span></h3> <p style="text-align:left;" align="left">Demographic characteristics, anthropometric measurements, pulmonary function tests, CFTR genotyping results, and microbiology laboratory results were extracted from the clinical charts using a pre-defined case report form. <span lang="EN-US">Clinical data were extracted from baseline positive NTM culture until December 2019; participants were censored at the time of NTM-PD diagnosis, lung transplant, or death. </span>The diagnosis of NTM-PD was defined independently by two expert CF clinicians, based on current CF NTM guidelines, and disagreements were resolved by consensus. Lung function measurements (forced expiratory volume in 1 second [FEV1]) were standardized according to the 2012 Global Lung Function Initiative equation. </p> <h3 style="text-align:left;"><span lang="EN-US">RNA extraction and RNA sequencing experiments</span></h3> <p style="text-align:left;" align="left"><span lang="EN-US">The PAXgene Blood micro-RNA Kit was used for RNA extraction (in five batches) following the manufacturer’s instructions (QIAGEN) and omitting the DNA depletion steps</span><span lang="EN-US">. The quality of the extracted RNA was evaluated using the 260/280nm ratio in a NanoDrop™ spectrophotometer; the mean concentration was 117.4 ± 73.2 (SD) ng/uL. The RNA Integrity Number (RIN) was evaluated using a 2100 Bioanalyzer instrument from Agilent. A RIN ≥ 7 was required for sequencing; two samples failed bioanalyzer quality control and were re-extracted. Poly-A </span>RNA-seq libraries were prepared in a single batch with a starting concentration of 250 ng per sample. The library preparation was performed using the strand-specific Nextera NEB mRNA kit with adapters AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC and AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT. The NovaSeq 6000 S4 PE100 (Illumina) platform with automatic base calling (RTA3) and an initial concentration of 200 pM per library was used for sequencing. Samples were multiplexed in a single flow cell after dual barcoding. The sequencing run generated 150 bp paired-end reads, the lowest mean Phred+33 score per file was 36/40. The median number of reads per sample was 65 x 10<sup>6</sup> (range: 36–151 x 10<sup>6</sup>). All sequencing data from this study can be found in NCBI Gene Expression Omnibus with accession GSE205161.</p> <h3 style="text-align:left;"><span lang="EN-US">RNA-seq data analysis</span></h3> <p style="text-align:left;" align="left">Alignment and quantification of raw reads (FASTQ format) were performed in a High-Performance computational cluster (<span lang="EN-US">CentOS 7 Linux). Fast-QC was used to evaluate the quality of raw reads before and after alignment</span><span lang="EN-US">. Untrimmed reads were aligned to the primary human assembly GRCh38.p13 v38 (May 21, 2021) from GENCODE using STAR v 2.7</span><span lang="EN-US">. We produced gene‑level quantification (bam format) using RSEM v1.3.3 with default parameters</span><span lang="EN-US">. Quality control of alignment was performed using Picard tools, and quality reports were summarized using MultiQC</span><span lang="EN-US">. The </span>raw count data was imported to R version 4.1.1 using tximport<span lang="EN-US"> and annotated </span>with the Ensembl database version 104 of Bioconductor. Globin subunits and genes showing abnormally large expression counts (> 7*10<sup>7</sup>) were filtered out before analysis<span lang="EN-US">; before filtering, 68–93% of reads per sample were overrepresented</span>. An outlier sample was detected in exploratory <span lang="EN-US">principal component analysis </span>(CFB2006) and removed from further analysis. Count data were corrected for extraction batches using the ComBat-seq algorithm. Differential gene expression of the group with vs. without NTM-PD was performed using DESeq2 (v 1.3.2) with a cut-off FDR of 0.3. No minimum fold-change threshold was defined to increase the recovery of differentially expressed genes in this small exploratory cohort. <span lang="EN-US">Finally, gene set enrichment analysis of the molecular signatures database (human hallmark pathways) was performed with the fgsea package using the fold change ranked results from DESeq2</span><span lang="EN-US">. The cell-specific population compositions were estimated and explored using gene expression deconvolution in cibersortX against the complete blood cell counts (CBC) measured in patients at the time of sample procurement. The deconvolution process used the LM22 (22 blood immune cell types) signature matrix, bulk mode batch correction, and a hundred iterations</span><span lang="EN-US">.</span></p>; <b>Usage notes</b><br /> The data sets are provided as comma-separated values and can be opened with standard statistical software or explored with a spreadsheet program. In our analyses, we employed R and the GUI R studio (v 4.1.1) for analysis. Raw sequencing processing and transcript counting was done in a CentOS high performance cluster, dependencies and commands ran are described inside markdown scripts. </p>
Subject	Other; cystic fibrosis; Mycobacteria; RNA sequencing; gene expression; blood; infectious diseases
Type	Dataset
Notes	Dryad version number: 9</p> Version status: submitted</p> Dryad curation status: Published</p> Sharing link: https://datadryad.org/stash/share/FZ1DbrVy1KzveU5UQ_UbjqG-IrfFSouFSpspXhdnghg</p> Storage size: 463244905</p> Visibility: public</p>
Date Available	2023-04-25
Provider	University of British Columbia Library
License	CC0 1.0
DOI	10.14288/1.0439796
URI	https://doi.org/10.5683/SP3/DCKLWQ
Publisher DOI	https://doi.org/10.5683/SP3/DCKLWQ; https://doi.org/10.5061/dryad.np5hqbzx2
Grant Funding Agency	Michael Smith Foundation for Health Research; Michael Smith Foundation for Health Research
Aggregated Source Repository	Dataverse

Item Media

Item Citations and Data

Licence

CC0 1.0

Open Collections