Inferring novel gene-disease associations using Medical Subject Heading Over-representation Profiles Cheung, Warren A; Francis Ouellette, BF; Wasserman, Wyeth W
Background: MEDLINE®/PubMed® currently indexes over 18 million biomedical articles, providing unprecedented opportunities and challenges for text analysis. Using Medical Subject Heading Over-representation Profiles (MeSHOPs), an entity of interest can be robustly summarized, quantitatively identifying associated biomedical terms and predicting novel indirect associations. Methods: A procedure is introduced for quantitative comparison of MeSHOPs derived from a group of MEDLINE® articles for a biomedical topic (for example, articles for a specific gene or disease). Similarity scores are computed to compare MeSHOPs of genes and diseases. Results: Similarity scores successfully infer novel associations between diseases and genes. The number of papers addressing a gene or disease has a strong influence on predicted associations, revealing an important bias for gene-disease relationship prediction. Predictions derived from comparisons of MeSHOPs achieves a mean 8% AUC improvement in the identification of gene-disease relationships compared to gene-independent baseline properties. Conclusions: MeSHOP comparisons are demonstrated to provide predictive capacity for novel relationships between genes and human diseases. We demonstrate the impact of literature bias on the performance of gene-disease prediction methods. MeSHOPs provide a rich source of annotation to facilitate relationship discovery in biomedical informatics.
Item Citations and Data
Attribution 4.0 International (CC BY 4.0)