Latent semantic analysis for retrieving related biomedical articles

UBC Theses and Dissertations

Latent semantic analysis for retrieving related biomedical articles Lin, Sheng-Ting

Abstract

Retrieving relevant scientific papers in a scalable way is increasingly important as the number of published studies grows. PubMed's related-article recommendation is based on MeSH assignments by indexers, which requires significant human resources and can become a limitation in making papers searchable. Many recommendation systems use singular value decomposition (SVD) to pre-compute related products. In this study, we use latent semantic analysis (LSA), an application of SVD that determines relationships among a set of documents and terms, to find related biomedical papers. We focused on determining the best SVD parameters for retrieving relevant biomedical articles given a paper of interest. Using PubMed's recommendations as guidance, we found that measuring document similarity with cosine distance leads to better results than measuring it with Euclidean distance. We re-evaluated other parameters, including the weighting scheme and the number of singular values, using a larger abstract corpus. Finally, we asked people to compare the relevant abstracts retrieved with our method against those retrieved by PubMed. Our method retrieved sensible articles that were chosen over PubMed's relevant papers one-third of the time. We examined the abstracts retrieved by each method and discuss possible areas for experimentation and improvement.
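
To make the pipeline described in the abstract concrete, the following is a minimal sketch of LSA-based retrieval, not the thesis implementation. It assumes NumPy is available, uses a four-sentence toy corpus invented for illustration, and picks TF-IDF as the weighting scheme and k = 2 singular values; the thesis evaluates these parameters but the abstract does not state the values chosen. The function names (`tfidf_matrix`, `lsa_doc_vectors`, `rank_related`) are hypothetical.

```python
import numpy as np

# Toy corpus, invented for illustration (not data from the thesis).
docs = [
    "gene expression in breast cancer tumor cells",
    "tumor gene mutation in breast cancer patients",
    "protein folding and protein structure prediction",
    "structure prediction for membrane protein folding",
]

def tfidf_matrix(docs):
    """Build a term-by-document matrix with TF-IDF weights (an assumed scheme)."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            counts[index[w], j] += 1
    df = (counts > 0).sum(axis=1)          # number of documents containing each term
    idf = np.log(len(docs) / df) + 1.0     # smoothed inverse document frequency
    return counts * idf[:, None], vocab

def lsa_doc_vectors(X, k):
    """Project documents into a k-dimensional latent space via truncated SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k].T * s[:k]                # one row per document, k columns

def rank_related(vectors, query, metric="cosine"):
    """Return the other document indices, closest to `query` first."""
    q = vectors[query]
    if metric == "cosine":
        norms = np.linalg.norm(vectors, axis=1) * np.linalg.norm(q)
        dist = 1.0 - (vectors @ q) / norms
    else:                                  # Euclidean distance
        dist = np.linalg.norm(vectors - q, axis=1)
    return [int(i) for i in np.argsort(dist) if i != query]

X, vocab = tfidf_matrix(docs)
vecs = lsa_doc_vectors(X, k=2)
print("cosine   :", rank_related(vecs, 0, "cosine"))
print("euclidean:", rank_related(vecs, 0, "euclidean"))
```

Cosine distance depends only on the angle between document vectors, while Euclidean distance also depends on their lengths, so the two metrics can rank the same candidates differently on a real corpus. This toy corpus is too small to reproduce the comparison reported in the abstract; it only shows where the distance metric, the weighting scheme, and the number of singular values enter the computation.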

Item Metadata

Title	Latent semantic analysis for retrieving related biomedical articles
Creator	Lin, Sheng-Ting
Publisher	University of British Columbia
Date Issued	2017
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2017-04-19
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-ShareAlike 4.0 International
DOI	10.14288/1.0343967
URI	http://hdl.handle.net/2429/61273
Degree	Master of Science - MSc
Program	Bioinformatics
Affiliation	Science, Faculty of
Degree Grantor	University of British Columbia
Graduation Date	2017-05
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-sa/4.0/
Aggregated Source Repository	DSpace
