Hierarchical structure and ordinal features in class-based linear models

UBC Theses and Dissertations

Wang, Wan Shing Martin

Abstract

In many real-world datasets, we seek to make predictions about entities that belong to interrelated classes. A commonly studied problem, known as the reference class problem, is how to combine information from relevant classes to make predictions about entities. The intersection of all classes that an entity is a member of constitutes the most specific class for that entity. When making predictions about intersection classes for which we have observed little or no data, we would like to combine information from more general classes to create a prior. If there is no data for the intersection, we have to rely entirely on the prior. If data exists but is scarce, we seek to balance the prior with the available data.

We first investigate a model where we assign weights to classes and additively combine the weights to make predictions. The use of regularisation forces generalisation; the signal gets pushed up to more general classes. To make a prediction for an unobserved intersection of classes, we use the weights from the individual classes that comprise the intersection. We introduce several variants that average the predictions, as well as a probabilistic mix of these variants. We then propose a bounded ancestor method, which balances the creation of an informed prior with observed data for classes with varying amounts of observations. When dealing with ordinal properties, such as shoe size, we can dynamically create new classes and subclasses in ways that are conducive to creating more informative priors. We do this by splitting the ordinal properties.

Throughout, we test on the MovieLens and UCSD Fashion datasets. We found that a combination of the three bounded ancestor method variants resulted in the best performance, and that the best combination varied between datasets. We found that a simple model that assigns weights to classes and additively makes predictions slightly outperformed the bounded ancestor method for supervised classification. For the bounded ancestor method, we found that splitting ordinal properties in different ways had minimal impact on the error metrics we used.
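
The additive class-weight idea in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the class names, the toy ratings, the shared "all" class, and the closed-form ridge (L2) solve are assumptions chosen purely for illustration. Each entity is encoded by indicators for the classes it belongs to, one weight is learned per class, and a prediction is the sum of the weights of the entity's classes. The intersection ("comedy", "old") never appears in the toy data, so its prediction is assembled entirely from the weights of its constituent classes.

```python
import numpy as np

# Hypothetical classes; "all" stands in for the most general class.
CLASSES = ["all", "action", "comedy", "young", "old"]
INDEX = {name: i for i, name in enumerate(CLASSES)}


def indicator(member_of):
    """Binary membership vector; every entity belongs to 'all'."""
    x = np.zeros(len(CLASSES))
    x[INDEX["all"]] = 1.0
    for name in member_of:
        x[INDEX[name]] = 1.0
    return x


def fit_class_weights(X, y, lam=1.0):
    """Ridge regression in closed form: (X^T X + lam * I)^-1 X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


def predict(weights, member_of):
    """Additive prediction: sum the weights of the entity's classes."""
    return float(indicator(member_of) @ weights)


# Toy observations (assumed): class memberships and an observed rating.
observations = [
    (("action", "young"), 5.0),
    (("action", "old"), 4.0),
    (("comedy", "young"), 2.0),
]
X = np.array([indicator(members) for members, _ in observations])
y = np.array([rating for _, rating in observations])

weights = fit_class_weights(X, y, lam=0.1)

# ("comedy", "old") is an unobserved intersection of classes.
unseen = predict(weights, ("comedy", "old"))
print(dict(zip(CLASSES, np.round(weights, 3))), round(unseen, 3))
```

Because "all" is shared by every observation, a single weight on it can explain what the observations have in common, which is one simple way to see how a penalty on weights can push signal toward more general classes. Raising `lam` shrinks all weights; how the thesis sets or structures its regulariser is not stated in the abstract.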

Item Metadata

Title	Hierarchical structure and ordinal features in class-based linear models
Creator	Wang, Wan Shing Martin
Publisher	University of British Columbia
Date Issued	2021
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2021-02-03
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0395817
URI	http://hdl.handle.net/2429/77241
Degree (Theses)	Master of Science - MSc
Program (Theses)	Computer Science
Affiliation	Science, Faculty of; Computer Science, Department of
Degree Grantor	University of British Columbia
Graduation Date	2021-05
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace
