PulmonDB: a gene expression lung diseases database. Altamirano, Ana Beatriz


A massive amount of transcriptomic data (microarrays and RNA-seq) has accumulated since these technologies were developed. Analyzing and integrating these data to study a complex disease can be an overwhelming task, principally because it requires combining data from different technologies and platforms. Moreover, the lack of uniformity in experimental annotations in public databases such as Gene Expression Omnibus (GEO) adds to the challenge. By integrating transcriptomic datasets from different sources together with their curated annotations, we developed an online web resource to facilitate the exploration of gene expression profiles of two respiratory diseases: Idiopathic Pulmonary Fibrosis (IPF) and Chronic Obstructive Pulmonary Disease (COPD). Our first aim was to build a database that integrates existing transcriptomic data in order to identify differentially expressed genes that replicate across experiments. This project lays the foundation for integrating transcriptomic data from other respiratory diseases and smoker phenotypes, facilitating the identification of common and divergent pathways that lead to a pathological state. In 2011, Engelen et al. developed COMMAND, a platform that allows the comparison and integration of transcriptomic data from different sources and platforms into a compendium. COMMAND has been successfully used to build transcriptome data compendia in bacteria (Engelen K. et al., 2011) and grapevine (Moretto M., et al., 2016). We used COMMAND to create a human lung database that allows us to integrate, analyze and explore gene expression data from different sources by contrasting controls and patients according to clinical phenotypes (e.g., age, gender, disease status, FEV1), when these are available. We selected the relevant transcriptome experiments for IPF and COPD by querying GEO and ArrayExpress with selected keywords.
Each experiment was downloaded and imported into COMMAND, where the experimental conditions were annotated and the contrast group was selected; a similar pipeline was then used to normalize the data across experiments to create PulmonDB. PulmonDB is an exploratory web interface that contains gene expression data from IPF and COPD experiments, and the platform will be expanded to include other abnormal lung phenotypes. This resource facilitates the exploration of gene expression profiles under different pathological conditions and allows the identification of co-expression patterns. PulmonDB can help the scientific community to study which genes have a distinct expression profile associated with a disease, explore reproducibility across technologies and platforms, identify interesting co-expression patterns across diseases, and find relationships among distinct clinical or experimental variables.
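
The idea of contrasting patients against controls and looking for genes that replicate across experiments can be illustrated with a minimal sketch. It is not the COMMAND or PulmonDB implementation: the function names, the toy gene and experiment labels, the expression values, and the threshold of 1.0 are all assumptions made for illustration, and the log2 ratio of group means is a simple stand-in for whatever contrast and normalization the actual pipeline applies.

```python
from math import log2
from statistics import mean


def log2_contrast(patient_values, control_values):
    """Log2 ratio of mean patient expression to mean control expression."""
    return log2(mean(patient_values) / mean(control_values))


def replicated_genes(experiments, threshold=1.0):
    """Return genes that change in the same direction in every experiment.

    A gene is reported only if it was measured in at least two experiments
    and its contrast is >= threshold in all of them ("up") or
    <= -threshold in all of them ("down").
    """
    contrasts = {}
    for samples in experiments.values():
        for gene, (patients, controls) in samples.items():
            contrasts.setdefault(gene, []).append(
                log2_contrast(patients, controls)
            )

    result = {}
    for gene, values in contrasts.items():
        if len(values) < 2:
            continue  # replication cannot be assessed from one experiment
        if all(v >= threshold for v in values):
            result[gene] = "up"
        elif all(v <= -threshold for v in values):
            result[gene] = "down"
    return result


# Hypothetical toy data: gene -> (patient values, control values).
experiments = {
    "experiment_A": {
        "GENE1": ([8.0, 8.0], [2.0, 2.0]),
        "GENE2": ([1.0, 1.0], [4.0, 4.0]),
        "GENE3": ([4.0, 4.0], [1.0, 1.0]),
    },
    "experiment_B": {
        "GENE1": ([16.0, 16.0], [4.0, 4.0]),
        "GENE2": ([2.0, 2.0], [8.0, 8.0]),
        "GENE3": ([3.0, 3.0], [3.0, 3.0]),
    },
}

print(replicated_genes(experiments))
```

In this toy example GENE1 is higher in patients in both experiments and GENE2 is lower in both, while GENE3 changes in only one experiment and is therefore not reported as replicated.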

