UBC Theses and Dissertations

An intelligent class: the development of a novel context capturing method for the functional auto-classification of records

Payne, Nathaniel

Abstract

The need to accurately classify records is a core problem in many domains. Historically, records were classified manually, being categorized as they were received. Because the volume of records has grown significantly, there is now a strong need for robust auto-classification methods that can effectively “read” and classify records. Significant challenges remain, however, in developing effective auto-classification processes for records, because records traditionally require functional classification based on context, not topic classification based on content. Functional classification has traditionally been a challenge for both humans and machines, and there is little research on how to functionally classify a record effectively. To move research forward, this thesis will address the challenges of both human and machine classification of records. First, it will seek to evaluate the efficacy of human manual classifiers on a classification task, using knowledge from this process to articulate a process for automated functional classification that utilizes a record’s archival diplomatic context. Second, it will compare the efficacy of manual versus machine classification (i.e., auto-classification) on a record set of over 500,000 records, using a novel auto-classification approach that leverages a record’s context, not just its content, to improve classification accuracy. As this thesis will discuss, there is significant variance among expert human classifiers (i.e., records managers) during the manual classification process, with statistically significant differences in their ability to accurately classify both administrative and operational records. Moreover, this thesis will demonstrate that an auto-classifier, when trained using key elements of context, can outperform a group of expert human classifiers on a classification task by a statistically significant margin.

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International