UBC Research Data

Using text mining to link journal articles to neuroanatomical databases

French, Leon; Pavlidis, Paul


The electronic linking of neuroscience information, including data embedded in the primary literature, would permit powerful queries and analyses driven by structured databases. This task would be facilitated by automated procedures that can identify biological concepts in journals. Here we apply an approach for automatically mapping formal identifiers of neuroanatomical regions to text found in journal abstracts, using a large body of abstracts from the Journal of Comparative Neurology (JCN). The analyses yield over 100,000 brain region mentions, which we map to 8,225 brain region concepts in multiple organisms. Based on the analysis of a manually annotated corpus, we estimate that mentions are mapped at 95% precision and 63% recall. Our results provide insights into the patterns of publication on brain regions and species of study in JCN, but they also point to important challenges in the standardization of neuroanatomical nomenclatures. We find that many terms in the formal terminologies never appear in a JCN abstract and, conversely, that many terms authors use are not reflected in the terminologies. To improve the terminologies, we deposited 136 unrecognized brain regions into the Neuroscience Lexicon (NeuroLex). The training data, terminologies, normalizations, evaluations, and annotated journal abstracts are freely available at http://www.chibi.ubc.ca/WhiteText/.
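The abstract reports mention-level precision and recall for mapping brain region mentions to formal identifiers. The following is a minimal Python sketch of how such figures can be computed against a manually annotated set. It is not the authors' pipeline: the exact-match lookup, the function names, the `HYP:` identifiers, and the toy data are all hypothetical and chosen only for illustration.

```python
def normalize(text):
    """Lowercase and collapse whitespace so simple surface variants compare equal."""
    return " ".join(text.lower().split())


def map_mentions(mentions, lexicon):
    """Map each mention to a concept identifier by exact match on the normalized string.

    Mentions with no lexicon entry are left unmapped (assumed behaviour for this sketch).
    """
    index = {normalize(term): concept_id for term, concept_id in lexicon.items()}
    mapped = {}
    for mention in mentions:
        concept_id = index.get(normalize(mention))
        if concept_id is not None:
            mapped[mention] = concept_id
    return mapped


def precision_recall(predicted, gold):
    """Compute precision and recall over sets of (mention, concept_id) pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical terminology: term -> made-up identifier.
lexicon = {
    "Hippocampus": "HYP:0001",
    "Superior colliculus": "HYP:0002",
    "Amygdala": "HYP:0003",
}

# Hypothetical mentions extracted from abstracts.
mentions = ["hippocampus", "superior  colliculus", "nucleus accumbens"]

# Hypothetical manual annotations; the third mention has no lexicon entry.
gold = {
    ("hippocampus", "HYP:0001"),
    ("superior  colliculus", "HYP:0002"),
    ("nucleus accumbens", "HYP:0004"),
}

predicted = set(map_mentions(mentions, lexicon).items())
precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case every mapping that is made is correct, but one annotated mention is missed because its term is absent from the lexicon. That is the same qualitative pattern as high precision with lower recall, and it illustrates why terms missing from a terminology reduce recall.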


CC0 Waiver