Digital Library Federation (DLF) (2015)

Statistical DPLA : metadata counting and word analysis; Think globally, act locally : how working with DPLA has improved our collections

Harper, Corey; Gore, Emily; McIntosh, Marcia; Ballinger, Linda; Stanton, Chris; Harlow, Christina


Statistical DPLA: Metadata Counting and Word Analysis -- Building on work recently presented at DPLAFest, this presentation will highlight next steps in research into metadata quantification and metadata quality. The presentation will begin with a review of phase one of the project, which focused on element counts, top level word counts across fields and across collections, and techniques for visualizing these data. The focus of phase two is applying Natural Language Processing (NLP) techniques to understand word frequencies, common words, and rare / unique words across the DPLA text corpus. Questions posed include:

• Are there naturally occurring clusters of words that differentiate DPLA providers?
• Are there differences in language patterns between providers or between hub types (content vs. service)?
• Are there gaps or differences, or is there alignment, in the language used in search terms versus collection metadata?
• Are there relationships between language in search terms, metadata terms, and item usage as measured by Google Analytics?
• Are there patterns to the language used in Twitter references to DPLA items?

The presentation will provide preliminary responses to these research questions and discuss the development of Metadata NLP and term / N-gram frequency visualization techniques. This will be followed by a participatory discussion of metadata language research; its relationship to metadata best practice, quality, and completeness; and possible next steps to expand exploration of these techniques beyond DPLA. Presenter: Corey Harper (New York University).

Think Globally, Act Locally: How Working with DPLA has Improved Our Collections -- This panel discussion brings together various people involved with resource aggregation for the Digital Public Library of America (DPLA). Each person will discuss how working as an aggregator for the DPLA has changed how they approach, work with and describe their own local collections. This can involve metadata processes, collection organization, tools used, local discovery practices, or other workflows related to digital collections... Presenters: Emily Gore (Digital Public Library of America), Marcia McIntosh (University of North Texas), Linda Ballinger (Penn State University), Chris Stanton (Metropolitan New York Library Council / Empire State Digital Network), Christina Harlow (University of Tennessee, Knoxville)



Attribution-NonCommercial-NoDerivs 2.5 Canada