Practical Considerations for Dimensionality Reduction:
User Guidance, Costly Distances, and Document Data

by

Stephen Ingram

B.Sc. Computer Science, Georgia Institute of Technology, 2004
M.Sc. Computer Science, University of British Columbia, 2007

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
Doctor of Philosophy
in
THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Computer Science)

The University of British Columbia
(Vancouver)

September 2013

© Stephen Ingram, 2013

Abstract

In this thesis, we explore ways to make practical extensions to Dimensionality Reduction, or DR, algorithms with the goal of addressing challenging, real-world cases.

The first case we consider is that of how to provide guidance to those users employing DR methods in their data analysis. We specifically target users who are not experts in the mathematical concepts behind DR algorithms. We first identify two levels of guidance: global and local. Global user guidance helps non-experts select and arrange a sequence of analysis algorithms. Local user guidance helps users select appropriate algorithm parameter choices and interpret algorithm output. We then present a software system, DimStiller, that incorporates both types of guidance, validating it on several use-cases.

The second case we consider is that of using DR to analyze datasets consisting of documents. In order to modify DR algorithms to handle document datasets effectively, we first analyze the geometric structure of document datasets. Our analysis describes the ways document datasets differ from other kinds of datasets. We then leverage these geometric properties for speed and quality by incorporating ideas from text querying into DR and other algorithms for data analysis.

We then present the Overview prototype, a proof-of-concept document analysis system. Overview synthesizes both the goals of designing systems for data analysts who are DR novices, and performing DR on document data.

The third case we consider is that of costly distance functions, or when the method used to derive the true proximity between two data points is computationally expensive. Using standard approaches to DR in this important use-case can result in either unnecessarily protracted runtimes or long periods of user monitoring. To address the case of costly distances, we develop an algorithm framework, Glint, which efficiently manages the number of distance function calculations for the Multidimensional Scaling class of DR algorithms. We then show that Glint implementations of Multidimensional Scaling algorithms achieve substantial speed improvements or remove the need for human monitoring.

Preface

Parts of this thesis have appeared in publications and journal submissions. Most of Chapter 3 is based on the following published conference paper:

- Stephen Ingram, Tamara Munzner, Veronika Irvine, Melanie Tory, Steven Bergner, and Torsten Möller. DimStiller: Workflows for dimensional analysis and reduction. In Visual Analytics Science and Technology (VAST), 2010 IEEE Symposium on, pages 3–10, 2010.

A version of Chapter 4 has been submitted for publication to a journal as:

- Stephen Ingram, Tamara Munzner. The geometry of fast document queries: implications for visual analytics algorithms.

Sections of Chapter 5 are under preparation for a future conference submission and have also appeared in the following technical report:
- Stephen Ingram, Tamara Munzner, and Jonathan Stray. Hierarchical clustering and tagging of mostly disconnected data. Technical Report TR-2012-01, University of British Columbia Department of Computer Science, May 2012.

Chapter 6 is based on the following published conference paper:

- Stephen Ingram and Tamara Munzner. Glint: an MDS framework for costly distance functions. In Proceedings of SIGRAD, number 81, pages 29–38, 2012.

The author of this thesis is the first author and main contributor on each of these published papers and submissions. As first author, I performed the majority of writing for each of the above-referenced papers. As main contributor, I was responsible for designing the algorithms and coding the software presented in the different thesis chapters, with the exception of the Overview prototype, the development of which received minor coding contributions from co-author Jonathan Stray.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgments
1 Introduction
  1.1 Dimensionality Reduction
  1.2 Methods and Uses for Dimensionality Reduction
  1.3 Thesis Contributions
2 Related Work
  2.1 Dimensionality Reduction Algorithms
    2.1.1 Orthogonal Projections
    2.1.2 Global Distances
    2.1.3 Manifold Methods
    2.1.4 Probability-based Methods
  2.2 Information Retrieval and the Spatial Analysis of Text
  2.3 Information Retrieval
  2.4 Nearest-Neighbor Search
    2.4.1 General Nearest-Neighbor Search
    2.4.2 Inverted-File-Based Nearest-Neighbor Search
  2.5 Hierarchical Clustering
  2.6 Software Systems
    2.6.1 Systems for High-Dimensional Analysis
    2.6.2 Systems for Spatial Analysis of Text Corpora
3 DimStiller: Workflows for Dimensional Analysis and Reduction
  3.1 Local and Global Guidance
  3.2 Users and Tasks
    3.2.1 Target User Population
    3.2.2 Are My Dimensions Meaningful?
    3.2.3 How Do My Dimensions Relate to Each Other?
    3.2.4 Are My Clusters Real?
  3.3 DimStiller Architecture
    3.3.1 Input Tables and Dimension Types
    3.3.2 Operators
    3.3.3 Expressions
    3.3.4 Workflows
    3.3.5 DimStiller Interface
  3.4 Case Studies
    3.4.1 Sustainability Simulation: Measuring Variability
    3.4.2 Computational Chemistry: Evaluating Clusters
  3.5 DimStiller Deployment
4 The Geometry of Fast Document Queries: Implications for Visual Analysis Algorithms
  4.1 The Mostly Disconnected Property of Term-Vector Datasets
    4.1.1 Term-Vector Datasets and TF-IDF
    4.1.2 Benchmark Datasets
    4.1.3 The Mostly Disconnected Property
    4.1.4 Implication for Efficient Queries
  4.2 Nearest-neighbor Search
    4.2.1 Implication: Nearest-Neighbor Search with an Impact-Ordered Inverted File
    4.2.2 Algorithm: APQ
    4.2.3 APQ Results
  4.3 Distance Matrix Computation and Storage
    4.3.1 Implication: Truncated Distance Matrices
    4.3.2 Algorithm: Fast Cluster Tree Calculation with DM2CT
    4.3.3 Cluster Tree Results
  4.4 Dimensionality Reduction
    4.4.1 Implication: Dimensionality Reduction through Local Attraction and Global Repulsion
    4.4.2 Algorithm: MD-SNE
    4.4.3 MD-SNE Results
5 Overview Prototype
  5.1 Overview Prototype Description
  5.2 Clustering, Tagging, and DR for Sensemaking
    5.2.1 Why Clustering?
    5.2.2 Why Tagging?
    5.2.3 Why Dimensionality Reduction?
    5.2.4 Why All Three?
  5.3 Overview Prototype Results
    5.3.1 Afghan War Logs
    5.3.2 Caracas Cables
  5.4 Overview Deployment
6 Glint: An MDS Framework for Costly Distance Functions
  6.1 Distances In MDS
    6.1.1 Expensive Distance Functions
    6.1.2 Experimental Analysis of Sparse MDS Solutions
  6.2 Glint Algorithm Framework
    6.2.1 Glint Outer Loop
  6.3 Glint Instantiations
    6.3.1 Component M: MDS Algorithm
    6.3.2 Component DS: Densification Strategy
    6.3.3 Component S: Objective Function
    6.3.4 Instantiation Design Summary
  6.4 Results
    6.4.1 Dataset and Distance Function Description
    6.4.2 Benchmark Speed and Quality Comparison
    6.4.3 Convergence
7 Conclusion and Future Work
  7.1 Conclusions
    7.1.1 DimStiller
    7.1.2 Algorithms for the Visual Analysis of MoDisco Data
    7.1.3 Overview
    7.1.4 Glint
  7.2 Future Work
    7.2.1 User Guidance
    7.2.2 Efficient DR with Costly Distances
    7.2.3 Mostly Disconnected Data
  7.3 Lessons Learned
    7.3.1 Making an Algorithmic Connection
    7.3.2 Research Impact and Collaboration
Bibliography

List of Tables

Table 4.1 Benchmark datasets, with size in terms of both points and dimensions.
Table 4.2 Comparison of hierarchical clustering timing and accuracy using the truncated matrix with our approximate DM2CT algorithm vs. using the inverted file with the exact Voorhees method.
Table 5.1 List of manually constructed document tags applied to the Cables dataset.
Table 6.1 Parameters used in Glint instantiations.
Table 6.2 Glint component design summary.
Table 6.3 The cost d of a single distance calculation for the benchmark datasets.
Table 6.4 Comparison of full objective functions, time (in seconds), and speedup between Glint instantiations and original MDS algorithms.

List of Figures

Figure 1.1 Uncovering "hidden dimensions" in a database of images.
Figure 1.2 Visualizing high-dimensional clusters in a text database.
Figure 3.1 Anatomy of a simple DimStiller expression
Figure 3.2 Interactive DimStiller controls
Figure 3.3 DimStiller correlation operator views
Figure 3.4 DimStiller Reduce:PCA control and scatterplot
Figure 3.5 Running the DimStiller Cluster Verify workflow on a computational chemistry dataset.
Figure 4.1 Chapter 4 organization diagram.
Figure 4.2 Plot of MoDisco statistics.
Figure 4.3 Distance Histograms of different dataset types.
Figure 4.4 Sample V and I
Figure 4.5 Anatomy of Priority Queue elements
Figure 4.6 All Pairs Queries algorithm pseudocode.
Figure 4.7 Diagram describing the accumulator and priority queue update
Figure 4.8 Speed vs. Accuracy chart for the APQ 5-nearest-neighbor search algorithm on the warlogs dataset.
Figure 4.9 Adjusted agreement rates AR_k among dimensionality reduction techniques on benchmark datasets.
Figure 4.10 2D layouts of the four MoDisco benchmark datasets.
Figure 5.1 The components of the Overview prototype
Figure 5.2 The Overview Prototype loaded with the Warlogs dataset.
Figure 5.3 The DiscoTree control at different prune levels
Figure 5.4 Cable alleging Iranian drone plans.
Figure 5.5 The Cables dataset, with the full set of categories listed in Table 5.1, from the AP Caracas Bureau Chief.
Figure 5.6 Hierarchical structure in the Disconnected Component Tree and Items Plot
Figure 5.7 Cable with description of alleged arms trafficking.
Figure 5.8 Cable alleging Venezuelan influence in Jamaican politics.
Figure 6.1 Diagram of Glint execution.
Figure 6.2 Comparison of speed and quality for MDS algorithms and their Glint instantiations.
Figure 6.3 Log-scale Glint convergence curves.

Acknowledgments

This dissertation has been made possible with help from many good folks. I would like to thank my PhD supervisor, Tamara Munzner, for being a delightful and patient academic mentor. Tamara was instrumental in helping me to select research topics, in imparting good writing and research habits, and in having reasonable expectations. I want to thank my other committee members, Torsten Möller and Nando de Freitas, for providing many helpful suggestions for greatly improving the original text of this thesis. Thanks to my university examiners, Giuseppe Carenini and Edie Rasmussen, for supplying many useful comments and corrections to the exam draft of the thesis. I'd like to thank my external examiner, Leland Wilkinson, for making the trip to Vancouver to attend my oral examination, as well as for being a wellspring of helpful advice and useful pointers to the literature.

Thanks to Jonathan Stray for offering to collaborate with my colleagues and me on the Overview project, as well as for being an abundant source of research ideas. Thanks to the InfoVis group, namely Matthew Brehmer, Jessica Dawson, Joel Ferstay, and Michael Sedlmair, for patiently reading many paper drafts, always providing their impressions and feedback, and for being good company. Thanks to my friend and mentor, Art Warburton, for encouraging me to finish my PhD and giving me the opportunity to work in a comfortable place while doing so.

Special thanks to my patient partner in life, Kelsey, for keeping me sane in a crazy world, for being a wonderful mother to our children, Eleanor and Henry, and for staying my best friend after all these years. And, of course, I want to thank my Mother and Father, for encouraging my interests and for always accepting me, no matter what.

Chapter 1
Introduction

Data analysis is the act of making interpretive statements from the careful scrutiny of recorded quantities. A data analyst is anyone engaged in data analysis, regardless of context, whether professional, educational, or recreational.

A common technique for performing data analysis is to consider the objects under scrutiny to be positioned in space, even when such objects are entities without any concrete reality. The space in which such objects are placed may be metaphorical, and not the three-dimensional physical space in which we reside. The abstract space has dimensions that are determined by properties of the data deemed important by the data analyst.

For example, when data is organized in a tabular format, its transformation into a set of points is straightforward.
In a table of data, the individual data samples are assigned to rows, and the different numerical measurements per sample are assigned to corresponding columns in each row. A single row of such a table then represents a point of the dataset, while an entire column represents a dimension. Fisher's famous iris dataset [33] illustrates this concept nicely: the different irises being analyzed can be conceived of as points in a "measurement" space.

Another spatial data format is the distance matrix, where rows similarly represent points. In contrast to tabular data, though, columns no longer represent dimensions, but distances to other points. The distance matrix therefore purely catalogues the proximities of points relative to each other, without invoking any dimensional measurements. If the number of data points is large, then these matrices can be very inefficient to store and compute.

The spatial metaphor aids in the visual analysis of the data. There is sometimes a direct correspondence between the spatial arrangement of points and meaningful facts about the data. For example, the data might belong to two classes which correspond to two separate densities of points in space. Furthermore, the human visual system can quickly detect complex spatial properties of the data like linear relationships and clustering. Thus, by positioning objects in a space that we can visually scrutinize, one can rapidly observe spatial patterns that may reveal underlying truths about the data.

When the number of dimensions is small, as is the case for the aforementioned iris dataset, visual scrutiny of the data is straightforward. Even if there are more than two dimensions, but fewer than a dozen, techniques like scatterplot matrices and parallel coordinates plots allow for effective visualization of spatial patterns [7, 60]. But when the number of dimensions exceeds a dozen or so, the number of scatterplots in a matrix or the complexity of the parallel coordinates plot makes visual analysis of all the linked views cumbersome and confusing.

Dimensionality Reduction, or DR, is a suite of data processing techniques designed to reduce the number of dimensions of a dataset while best preserving the spatial properties useful for understanding the data. Though DR has broad applicability to a variety of domains and problems, our focus in this thesis is on the use of DR for visual analysis tasks like data exploration. By reducing the dimensionality of the data to two, many techniques appropriate for low-dimensional data, like scatterplots, can be applied to high-dimensional cases.

1.1 Dimensionality Reduction

A major proportion of this thesis covers systems and algorithms for DR. As its name suggests, DR expresses a dataset in a smaller number of dimensions than originally found in the data. The input for such systems takes the form of a table representing dataset coordinates or a distance matrix.

Input tables of coordinates are represented as a matrix X with n rows and m columns. The matrix entry x_ij stores the ith point's jth coordinate value. Each row represents an instance of a sampled data point with the m measurements stored in the corresponding columns. The magnitudes of the values in the different columns may be distributed at widely varying scales and need to be normalized to a meaningful scale if they are to be combined together.
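To make these two input forms concrete, here is a minimal NumPy sketch (with made-up values, not data from the thesis) that builds a small coordinate table X, standardizes its differently scaled columns, and derives the corresponding Euclidean distance matrix:

```python
import numpy as np

# Hypothetical coordinate table X: n = 4 sampled points, m = 3 measurements,
# with the third column on a much larger scale than the first two.
X = np.array([[5.1, 3.5, 140.0],
              [4.9, 3.0, 138.0],
              [6.3, 3.3, 600.0],
              [5.8, 2.7, 510.0]])

# Normalize each column to zero mean and unit variance so that measurements
# recorded at widely varying scales can be meaningfully combined.
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)

# Pairwise Euclidean distance matrix D, where D[i, j] = f(i, j).
diff = X_norm[:, None, :] - X_norm[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))

print(D.shape)   # (4, 4): one row of distances per point
```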
Distance matrices D hold all the pairs of distances between points, where the matrix entry d_ij stores the distance between points i and j. Distance matrices are symmetric, with a zero diagonal representing a zero self-distance, and therefore only a lower triangle of the matrix need be stored. The distances stored in the matrix are the output of a non-negative distance function f(i, j), also known as a distance metric, or similarity or kernel function [79], and are interpreted as a measure of similarity between the two data points. While the most commonly encountered distance function is Euclidean, there exist many other possibilities that compute a more appropriate measure of similarity for a given application [42].

Output from dimensionality reduction is always in the form of a matrix of coordinates Y with a user-controlled number of dimensions. For some methods, like principal component analysis [64], metadata is available about how the coordinates are constructed from the high-dimensional input. In all but a few special cases, Y is invariant to both rotation and reflection. This important fact must be kept in mind to avoid ascribing meaning to the global locations of clusters and manifolds in Y; only relative positioning is meaningful.

1.2 Methods and Uses for Dimensionality Reduction

At a high level, the methods of reducing dimensions can be grouped into three simple classes: culling, collecting, and synthesizing. Culling and collecting techniques assume the input is a table of coordinates, while synthesizing algorithms generalize to both coordinate and distance data. Here we describe each of these classes and some of their important practical uses.

Culling dimensions, also called feature selection [46], means removing a subset of dimensions outright from the dataset. Perhaps the most important reason for removing a dimension is that it contributes no useful information to the structure of the dataset other than noise. A trivial example is a hypothetical dataset of two dimensions, one with a bimodal distribution of values, and one with a uniform distribution of values. The modes of the first dimension's distribution permit classification into one of two groups, while the distribution of the second dimension's values only works to obscure this grouping. By culling the second dimension, the useful structure of the first dimension is clearer, both for visualization and for automated classification. A variety of techniques exist for manually detecting and culling unimportant dimensions from a dataset [62], as well as a growing literature on sophisticated automatic-selection methods [77].

Like culling, collecting dimensions is another form of dimension filtering. Instead of removing dimensions, though, we combine them together linearly, as a weighted average. Collecting is appropriate when two dimensions are highly correlated, or express largely identical relationships to the other dimensions in the data. For example, in a database of different car models, the overall weight of the car is highly correlated with the fuel economy of the car. One simple reason for collecting together highly correlated dimensions is to reduce the number of redundant visual comparisons to other dimensions in scatterplots. Collecting dimensions can be performed manually, though it is customary to use automatic techniques like Principal Component Analysis that group together correlated dimensions [64].
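The culling and collecting operations just described can be sketched in a few lines of NumPy. This is an illustration under simple assumptions (a variance threshold for culling, an SVD-based principal component computation for collecting), not the specific procedures used later in the thesis:

```python
import numpy as np

def cull_low_variance(X, threshold=1e-6):
    """Culling: drop dimensions whose variance falls below a noise threshold."""
    keep = X.var(axis=0) > threshold
    return X[:, keep], keep

def collect_with_pca(X, n_components):
    """Collecting: replace groups of correlated dimensions with weighted
    linear combinations, here the leading principal components from the SVD."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T   # scores on the leading components

# Usage on a hypothetical 100 x 8 table with one information-free dimension.
X = np.random.rand(100, 8)
X[:, 3] = 0.5                          # a constant, uninformative column
X_culled, kept = cull_low_variance(X)  # 100 x 7
X_collected = collect_with_pca(X_culled, n_components=4)   # 100 x 4
```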
The third and final group of dimensionality reduction methods synthesizes new dimensions from the input dimensions. Synthesizing methods create new datasets whose dimensions are complex, often nonlinear combinations of the original input dimensions. Synthesizing techniques feature prominently in two broadly defined use cases of visual high-dimensional analysis [98].

The first use case of synthesizing new dimensions from old is that of uncovering "hidden dimensions," also called latent or hidden variables. In this case, one hypothesizes that there is a small set of dimensions that accounts for the observed behavior of a larger set of dimensions. A classic example of such a use case is inferring orientation axes, such as left/right and up/down, from a database of images of an object. Here the measured dimensions are light intensities at pixel locations in a P×P image, and the synthesized "hidden dimensions" are the axes of orientation [21]. As shown in Figure 1.1, image orientation axes are confined to run along smooth, nonlinear manifolds contained within the larger space of all possible P×P images. Manifolds such as these cannot be captured by simple culling or collecting of input dimensions. The meaning and orientation of the hidden dimensions are not given by these dimensionality reduction algorithms. Only by carefully analyzing the changes of the data across the synthesized space can the orientation and meaning of these dimensions be described [76].

Figure 1.1: Uncovering "hidden dimensions" in a database of images. Here, the underlying data points have dimensions describing pixel intensities. By using a dimensionality reduction technique [21], orientation dimensions can be constructed from the proximity relationships in the data. © 2009 by Taylor & Francis. Reprinted by permission.

The second use case of dimensionality reduction by synthesizing dimensions is that of visualizing high-dimensional clusters. A data analyst may not be interested in the way the input dimensions express the data and may instead merely want a way to visually verify whether there is cluster separation in their data. By using dimensionality reduction algorithms that preserve distance relationships, an analyst can adequately visualize separate clusters whose numerous separating boundaries may span a very large set of input dimensions, for example by using multidimensional scaling on a document database [57]. The dimensions spanned by these separating boundaries will be ignored by cull and collect algorithms because they will contain useful variation and often be uncorrelated. Worse, when the dimensions number in the hundreds or thousands, the separating boundaries become impossible to visualize across a set of scatterplot matrices. Figure 1.2 shows an example of dimensionality reduction producing a visualization of distinct clusters of a high-dimensional dataset.

Figure 1.2: Visualizing high-dimensional clusters in a text database described in Chapter 4. The underlying data points are term-vectors whose dimensions describe the presence or absence of a term in a text. By reducing to two dimensions, clusters of similar documents become readily apparent. Cluster colors correspond to cluster assignment by non-negative matrix factorization [101].

1.3 Thesis Contributions

This doctoral thesis is greatly informed by research done in a previous Master's thesis [56]. In that work, I presented a parallel dimensionality reduction algorithm, Glimmer, designed to optimize both speed and generality across datasets. In spite of its intended generality, Glimmer does not address many important dimensionality reduction problems that do not fit neatly into its assumptions.
The components of this thesis were inspired by deep complications that emerged when Glimmer was applied to real-world cases of high-dimensional data analysis. These complications arise at both the system and algorithm level, and our proposed solutions make up the content of our thesis contributions.

In the remainder of this section, we present our thesis contributions as a set of research question-answer pairs. In each answer we describe the complication, explain how our thesis contribution helps answer the research question, and give pointers in the text for more details.

How do we design systems for high-dimensional analysis aimed at DR novices? Supplying users with a powerful set of dimensionality reduction algorithms to answer their research questions only creates new questions like "which algorithm for what task?", "which parameter settings?", and "what am I looking at?". We contribute the design and implementation of a general analysis system called DimStiller, specifically targeted at non-expert, middle-ground users. DimStiller is built around the notion of guidance, where sensible parameter selections are built into the techniques themselves and commonly used compositions of techniques are stored as user-defined workflows. Chapter 3 presents the DimStiller system.

How do we design efficient DR and clustering algorithms for the important special case of large document datasets, which have very high dimensionality? Due to their complex structure, translating text into spatial points results in abstract spaces of very high dimensionality. Existing dimensionality reduction techniques encounter many algorithm-level and qualitative problems when processing such data. We contribute a set of design implications for such datasets that takes careful advantage of the spatial structure of such spaces. The design implications create a bridge between the field of Information Retrieval and high-dimensional analysis. We furthermore contribute a set of algorithms implementing these implications and illustrate and measure their improvement over competing approaches. Chapter 4 discusses the algorithm design implications induced by the geometry of large document datasets.

We also contribute a special-case, high-dimensional analysis system called the Overview prototype, built as part of a collaboration with computational journalists. Overview targets journalists interested in rapidly exploring large, unlabelled document dumps. These journalists are often unfamiliar with concepts such as clustering and DR. The prototype presents a set of linked visualizations of text collections that permit users to annotate different groupings of points and make sense of the larger text collection. Chapter 5 describes the Overview prototype system.

How do we practically handle costly distance functions? The design of dimensionality reduction algorithms often ignores the computational cost of computing distances, focusing instead solely on reducing the cost of computing new dimensions from those distances. But, in some important cases, the functions that produce the distances between points are themselves costly to compute. This special case is handled inefficiently by current dimensionality reduction algorithms. We contribute an algorithm framework, Glint, for performing dimensionality reduction efficiently in these cases. Glint is designed to be algorithm agnostic, and focuses on minimizing the number of distance calculations needed to achieve a stable layout. Chapter 6 presents the Glint framework.
Chapter 2
Related Work

In this section, we describe previous work related to the contributions of this thesis. Dimensionality reduction algorithms appear in every subsequent chapter, so we begin by providing a survey of dimensionality reduction algorithm research, with particular focus on multidimensional scaling algorithms. Because of our focus on analyzing text data in Chapters 4 and 5, we then describe relevant algorithms and techniques from information retrieval, nearest-neighbor search, and hierarchical clustering, with priority given to algorithms designed for processing text data. Finally, as Chapter 3 describes a software system for facilitating the analysis of multidimensional data, we survey relevant software systems targeting this use case.

2.1 Dimensionality Reduction Algorithms

We group our survey of dimensionality reduction algorithms into four different families based on their qualitative objectives: orthogonal projections, global distances, manifold distances, and probability distributions.

2.1.1 Orthogonal Projections

The family of orthogonal projections includes methods such as principal component analysis (PCA) [64], which computes the linear projection of coordinate data onto an orthonormal basis that best preserves the variance of the data. The transformation has numerous useful properties, both by being the linear projection of the data into the low-dimensional space with least-squared error, and also by providing a method for back-projection to the original embedding space of the data.

PCA provides useful metadata for analyzing the orthogonal projection. First, the algorithm outputs a weighted ranking of the orthogonal basis vectors, permitting determination of the most important directions of data dispersion. Second, the algorithm also supplies vectors, also called loadings, that contain weights describing the contribution of each input dimension to each output dimension.

Variants of PCA algorithms exist to address speed and robustness concerns. For example, PCA can be computed very efficiently using approximation algorithms for the singular value decomposition of a matrix [47]. Robust PCA [54] is designed to handle outliers in the data that would distort the proper alignment of the orthogonal axes.

PCA is implemented as a software component in Chapter 3 and is applied to processing text for visualization in Chapter 4.

2.1.2 Global Distances

Global distance methods, such as Multidimensional Scaling (MDS) [14], compute a low-dimensional layout of points with inter-point distances that best match the input high-dimensional distance matrix.

MDS refers to an entire family of algorithms with different objective functions, computational complexities, and qualitative results [14, 36]. The common thread is that they all minimize objectives that are some function of the difference between the Euclidean distances of the lower-dimensional layout coordinates and the magnitude of the original high-dimensional dissimilarities. The most recurrent function in the literature is the Stress function, a sum of squared residuals between high- and low-dimensional distances, which can be written compactly as

\[ \mathrm{Stress}(D, \Delta)^2 = \frac{\|D - \Delta\|_F^2}{\|D\|_F^2} \]

where D is the input distance matrix, Δ is the distance matrix computed from the low-dimensional coordinates, and ||X||_F^2 represents the square of the Frobenius norm of the matrix X, or the sum over i, j of x_ij^2. Clearly, Stress goes to 0 when the distances of the low-dimensional coordinates match the input. Stress may be a multimodal function with many stationary points, making global optimization problematic [44].
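Read as code, the Stress function above amounts to the following NumPy sketch (our illustration, not an implementation from the thesis), which evaluates the squared, normalized Stress of a candidate layout Y against an input distance matrix D:

```python
import numpy as np

def layout_distances(Y):
    """Euclidean distance matrix Delta of the low-dimensional layout Y (n x l)."""
    diff = Y[:, None, :] - Y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def stress_squared(D, Y):
    """Squared Frobenius norm of the residual distances, normalized by the
    squared Frobenius norm of the input distances; 0 for a perfect layout."""
    Delta = layout_distances(Y)
    return ((D - Delta) ** 2).sum() / (D ** 2).sum()
```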
Here, we discuss four major classes of MDS algorithms in terms of their shortcomings in handling costly distances. We do not describe the family of non-metric MDS methods based on distance rankings [69], as these techniques do not factor into our research.

MDS is used as a software component in both Chapters 3 and 5. In Chapter 4, we describe some shortcomings of using MDS as a dimensionality reduction technique for visualizing text datasets. The distinction between the different classes of MDS algorithms described below is important for understanding the Glint MDS algorithm framework in Chapter 6.

Analytic Algorithms

The original MDS algorithm, now called Classic MDS [113] or Principal Coordinates Analysis [41], computes a one-step global minimum of an objective function called Strain, which is expressed as

\[ \mathrm{Strain}(X) = \|XX^T - B\|^2 \]

where X is the n×l matrix of low-dimensional coordinates and B is the so-called double-centred distance matrix. Double-centring subtracts the row mean from each matrix row, the column mean from each matrix column, and the mean of all the entries in the matrix from each matrix entry. Strain minimizes the discrepancy between low- and high-dimensional inner products, not distances as in the Stress function. A major benefit of using Strain over Stress is that it is a convex function whose minimum can be computed without iterative techniques. The algorithm relies on computing the full SVD of a dense N×N matrix. The SVD of a dense, square matrix requires O(N^3) steps [40] and is therefore too computationally complex to be suitable for large datasets or problems with costly distance functions.

Several scalable Classic MDS approximation algorithms based on the Nyström approximation of the SVD have been presented [85]. For example, both Pivot MDS [15] and Landmark MDS [30] work by having the user select a number of "pivot" or "landmark" points. These particular columns in the distance matrix are then computed and processed by the algorithm to map the remaining points into low-dimensional space.

The main drawback to this strategy is the manual nature of selecting the proper number of landmark points. The Pivot MDS authors suggest a human-in-the-loop strategy where the user iteratively adds landmarks until visually determining the stability of the layout. The Landmark MDS authors propose an iterative strategy based on cross-validation, but do not present any benchmarks for this termination criterion. Chapter 6 shows a method to automate the selection of the number of landmarks and removes the human-in-the-loop from these techniques.
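For concreteness, the Classic MDS computation described at the start of this subsection can be sketched as follows: double-centre the squared distances, eigendecompose the resulting matrix B, and scale the leading eigenvectors into coordinates. This is the textbook procedure shown with a dense eigendecomposition rather than the SVD approximations discussed above; it is not code from the thesis.

```python
import numpy as np

def classic_mds(D, l=2):
    """Classic MDS / Principal Coordinates Analysis on a full N x N distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centring operator
    B = -0.5 * J @ (D ** 2) @ J                # double-centred squared distances
    evals, evecs = np.linalg.eigh(B)           # B is symmetric
    order = np.argsort(evals)[::-1][:l]        # keep the l largest eigenvalues
    scale = np.sqrt(np.maximum(evals[order], 0.0))
    return evecs[:, order] * scale             # n x l coordinate matrix X
```

The dense factorization in the middle is the O(N^3) step that makes this direct approach unsuitable for large N or costly distance functions.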
Force-Directed Algorithms

The Glimmer [56, 57] algorithm and its antecedent by Chalmers [19] are MDS approximation algorithms that minimize Stress by iteratively sampling high-dimensional distances and proportionally nudging the layout points in the direction of the residual distances. The movement of the points is controlled using a dampened force-directed simulation heuristic. The benefits of the force-directed approach include a simple implementation and a rapid convergence to a minimum region of the Stress function in fewer iterations. Force-directed algorithms can also be very scalable; the Glimmer algorithm achieves considerable speed improvement on large datasets by GPU parallelization.

Force-directed algorithms also have drawbacks. Their randomness may induce a visible level of noise in the final layout. Additionally, force-directed methods can converge to a local minimum of the Stress function that may be vastly inferior to the global minimum, though hierarchical force-directed techniques like Glimmer reduce this occurrence.

Force-directed algorithms may also compute more distances than are strictly necessary. The algorithms are designed to compute high-dimensional distances prior to each force simulation time step, regardless of whether enough distance information has already been sampled to achieve a quality layout. This oversampling becomes especially inefficient when distances are costly.

Gradient Algorithms

Other MDS techniques use exact gradient information to calculate layout coordinates. Some of these algorithms use backtracking gradient descent on the Stress function [17], while the SMACOF algorithm [29] minimizes a sequence of quadratic functions that majorize the Stress function. These techniques are costly but the most flexible, permitting weights and missing values, while also converging to a lower-error minimum than randomized techniques.

As shown in their application to graph drawing [37, 66], gradient techniques can harness a sparsely populated distance matrix as input, with good results in less time than using the full distance matrix. However, as with analytic approximation techniques such as Pivot MDS, the precise number of distances to compute in advance to converge to a quality minimum is left up to the practitioner to deduce.

Coordinate-Only Algorithms

When MDS input takes the form of a table of coordinates, the number of input dimensions m is often much smaller than the number of points N. The PLMP [83] and LAMP [63] algorithms build on this assumption to rapidly compute low-Stress layouts for very large datasets by computing mappings for each point derived from a subset of "control" points. The profound acceleration that these algorithms achieve relative to other approaches is hindered when the number of dimensions equals or exceeds the number of points N, forcing the complexity of computing the individual mapping to approach O(N^2). Thus, methods that rely on the relation m ≪ N for speed are less suitable than other approaches when the relation does not hold.

2.1.3 Manifold Methods

Manifold distance dimensionality reduction methods preserve distances across local surfaces formed by the data in high-dimensional space, building a model of manifold connectivity. Recent work on manifold methods began with the Isomap algorithm [110], Local Linear Embedding [92], and Laplacian Eigenmaps [8], which have been followed by many other variants [21, 70, 104, 125].

Manifold methods are computationally intensive and often subject to numerous tuning parameters controlling how the local manifold is inferred from the data. Furthermore, several methods make assumptions about the sampling density and number of manifolds generating the data points under analysis. In the ideal case, most of the methods are targeted at data with smooth, uniformly sampled, nonlinear structures [98]. Such structures can arise from sampling the state space of dynamical systems, as in human motion capture [122]. The target datasets that appear in our research are not generated by uniformly sampled, nonlinear processes and are therefore unlikely to lie upon a smooth, nonlinear manifold. As a result, algorithms from the class of manifold methods do not make an appearance in our thesis.
2.1.4 Probability-based Methods

Probability-based dimensionality reduction methods such as SNE [50], t-SNE [117], NE [134], and BH-SNE [116] try to minimize the discrepancy between high-dimensional and low-dimensional probabilities derived from distances. This discrepancy is measured as the Kullback-Leibler divergence

\[ \sum_{i}^{N} \sum_{j}^{N} p_{j|i} \log \frac{p_{j|i}}{q_{j|i}} \]

between the two distributions P and Q, representing the distributions for the high-dimensional and low-dimensional cases respectively. Probability methods work by building conditional probability distributions for each point over the other data points based on high-dimensional distances, where probability is interpreted as the likelihood that point i will be chosen as a given point j's nearest neighbor. This approach intuitively assigns points with close distances a high probability value, and points with far distances a low probability value.

The t-SNE probability-based method produces some of the most visually salient cluster visualizations of high-dimensional data. It accomplishes this salience by mapping the high-dimensional Gaussian probability distribution to a heavier-tailed Student-t probability distribution in the low-dimensional space. The mismatch in tails is specifically designed to address the crowding problem [117], where mappings of points within a sphere of radius r in high-dimensional space quickly exhaust the exponentially smaller volume contained within a corresponding sphere with identical radius in low-dimensional space. Using the N-body calculation speedup of SNE originally presented by de Freitas et al. [28], both NE and BH-SNE reduce the O(N^2) iteration complexity of t-SNE to O(N log N) without a quality penalty in certain cases [116, 134].

In Chapter 4, we present a method for improving the speed and efficiency of the NE and BH-SNE algorithms.

2.2 Information Retrieval and the Spatial Analysis of Text

Chapter 4 discusses Information Retrieval and Spatial Analysis methods for processing text data with an emphasis on its high-dimensional structure. Below, we briefly summarize research related to this discussion, including summaries of Information Retrieval, Nearest-Neighbor Search, and Hierarchical Clustering.

2.3 Information Retrieval

The field of information retrieval studies efficient and accurate methods for querying databases, including text databases [75]. Of the several useful models of text data, our work focuses on the vector space model, where documents are mapped to vectors in term space [96]. In particular, the term vectors we study have dimension values assigned by techniques like TF-IDF [95], which assigns each term dimension a value proportional to the frequency of the term in the document and then discounts the value by its frequency of appearance in the database.

Because the number of terms in even a modestly sized database of a few thousand documents often numbers in the tens of thousands, the dimensionality of term vectors is often equal to or greater than the number of documents themselves. In spaces of such extreme dimensionality, effects from the so-called curse of dimensionality [9] become readily apparent. Such effects are known to wreak havoc with spatial search algorithms [55].

Instead, very efficient query algorithms for TF-IDF term vectors eschew spatial metaphors and use a data structure called the inverted file [137], thus named because it represents a sparse transpose of the term-vector matrix. In Chapter 4, we present more details about TF-IDF vectors and discuss the spatial geometry of inverted file algorithms.
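The following toy sketch shows one common TF-IDF weighting and the corresponding inverted file for a three-document corpus. It is a simplified illustration (there are many TF-IDF variants, and the corpus is invented), but it makes the key point concrete: a query only touches the postings lists of its own terms, so documents that share no terms with the query are never visited.

```python
import math
from collections import Counter, defaultdict

docs = ["the cat sat on the mat",
        "the dog sat on the log",
        "cats and dogs"]                    # invented toy corpus

tf = [Counter(d.split()) for d in docs]     # term frequencies per document
df = Counter(t for counts in tf for t in counts)   # document frequencies
N = len(docs)

# One common TF-IDF variant: term frequency, discounted by how many
# documents in the database the term appears in.
tfidf = [{t: c * math.log(N / df[t]) for t, c in counts.items()}
         for counts in tf]

# Inverted file: for each term, a postings list of (document id, weight) pairs.
# This is the sparse transpose of the document-term matrix.
inverted = defaultdict(list)
for doc_id, vec in enumerate(tfidf):
    for term, w in vec.items():
        if w > 0:
            inverted[term].append((doc_id, w))

def query(terms):
    """Accumulate weights from the postings lists of the query terms only."""
    acc = defaultdict(float)
    for t in terms:
        for doc_id, w in inverted.get(t, []):
            acc[doc_id] += w
    return sorted(acc.items(), key=lambda kv: -kv[1])

print(query(["cat", "mat"]))   # only document 0 is ever scored
```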
2.4 Nearest-Neighbor Search

Nearest-neighbor search algorithms find the nearest set of k distinct points to a query point, where k is a parameter to the algorithm. We first discuss algorithms for general nearest-neighbor search on any dataset, and then for document-based nearest-neighbor search using inverted file indices.

2.4.1 General Nearest-Neighbor Search

General nearest-neighbor search algorithms fall into two classes: spatial-partitioning strategies and mapping-based techniques.

Spatial partitioning strategies are those techniques that construct hierarchical structures which partition data space. Examples include balanced box-decomposition trees [4] and vantage point trees [135]. After constructing hierarchical spatial decompositions of the data space with such methods, a spatial search for nearby points becomes similar to a generic search tree traversal.

Mapping-based techniques, by contrast, build functions that map similar data points close together with high probability. Using this technique, the search is restricted to the subspace, or set of bins, to which similar points are mapped. The seminal example of mapping techniques is locality sensitive hashing (LSH) [39], which uses random projections to divide the input space into a set of bins.

2.4.2 Inverted-File-Based Nearest-Neighbor Search

The exact nearest-neighbor problem has been tackled in the Information Retrieval and Database literature by using inverted files as a search data structure. The all-pairs k-nearest-neighbor problem is referred to as a top-k similarity join in the Database literature [12]. Initial work in Information Retrieval reduced the search space of nearest points by partially ordering the inverted file, and then iteratively computing upper bounds on the remaining documents to be processed [80, 105]. Recent work focused on scaling up similarity calculations to massive datasets also uses a bound calculation to compute the exact nearest neighbors, while building the inverted file on the fly [6, 129]. In contrast to these exact techniques with full traversal of possibly disk-resident, partial inverted files, our work in Chapter 4 performs a partial, impact-ordered traversal of a memory-resident inverted file with user-based controls over accuracy.

2.5 Hierarchical Clustering

Cluster Analysis is a rich field with numerous active subfields [32]. One such subfield that intersects this thesis is hierarchical clustering. A hierarchical clustering algorithm creates a binary tree whose leaves represent the input data points (and all points are connected to the tree). Agglomerative cluster trees are built from the bottom up: two nodes are joined when they are the most similar. When the similarity measure is the nearest distance between any pair of child nodes, the algorithm is called single-link clustering [106]. Single-link clustering has several nice properties, like computational efficiency [103], and existing algorithms can be augmented with measures to reduce quality problems like cluster chaining [20, 128]. Special single-link hierarchical clustering algorithms exist for document datasets [81, 119]. As in nearest-neighbor search, these methods utilize inverted file indices for fast processing of high-dimensional data. Chapter 4 details how an improved traversal of the inverted file can greatly improve the speed of computing a hierarchical clustering without a penalty to cluster quality.
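As a generic illustration of single-link agglomerative clustering (using SciPy's general-purpose routines rather than the inverted-file-based document algorithms cited above), the following sketch builds a cluster tree over a small synthetic dataset:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

# Synthetic stand-in data: two well-separated groups of points in 5 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (10, 5)),
               rng.normal(1.0, 0.1, (10, 5))])

# Condensed pairwise distance matrix (the flattened lower triangle).
d = pdist(X, metric="euclidean")

# Single-link clustering: repeatedly merge the two clusters whose closest
# members are nearest to each other, producing a binary merge tree.
tree = linkage(d, method="single")

# Cut the tree into two flat clusters to inspect the result.
labels = fcluster(tree, t=2, criterion="maxclust")
print(labels)
```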
2.6 Software Systems

This thesis presents two software systems, DimStiller in Chapter 3 and the Overview prototype in Chapter 5, designed for different applications: high-dimensional analysis and spatial analysis of text corpora. In this section, we survey other software systems targeted at addressing these two tasks.

2.6.1 Systems for High-Dimensional Analysis

DimStiller is designed to augment high-dimensional analysis tasks with visual guidance of algorithm parameter choices and the construction of analysis pipelines. Many other software systems are designed to be tools for high-dimensional analysis in some shape or another. In this section, we describe those relevant software systems that provide users with access to the previously described dimensionality reduction and clustering algorithms and produce visualizations of the results.

Programming Environments

We first consider full-fledged programming environments such as MATLAB [112] or R [87]. One strength of these systems is execution speed and breadth of available graphics and analysis packages [126], but at the cost of requiring the user to access functionality through a special programming language. Beyond help files that describe the built-in functions, these systems provide no built-in mechanisms for guidance to non-expert users.

Toolkit Solutions

In specialized toolkits, several algorithms have been packaged together, usually with a GUI front end. For example, the Matlab Toolbox for Dimensionality Reduction (homepage.tudelft.nl/19j49/Matlab Toolbox for Dimensionality Reduction.html, last visited on 2/01/2010) gathers together over 30 techniques for dimensionality reduction under one umbrella. Another example is the HIVE dataflow toolkit for dimensionality reduction [90]. While such tools reduce programmer time by providing easy access to a wide variety of analysis techniques, neither local nor global guidance is provided to the user.

Visual Dimensionality Analysis Environments

XmdvTool [123] supports interactive visual exploration of multivariate data sets through many types of views including scatterplot matrices, with interactive controls that include sophisticated linking and brushing techniques. It includes several approaches to collecting and culling dimensions that are based on hierarchical clustering of the dimensions using a variety of metrics, such as DOSFA [130] and VHDR [131]. Its Value and Relation (VaR) technique [132] does use MDS to create a scatterplot of dimensions, encoding information about the dimensions in an information-dense pixel-oriented glyph at each scatterplot point. However, XmdvTool is not primarily designed to support workflows built around reduction through synthesizing new dimensions. In contrast, the GGobi system [24] is a visualization framework for high-dimensional analysis with dimensionality reduction techniques that create synthetic dimensions as a central focus, supporting interaction between multiple kinds of linked views including scatterplot matrices. It also features sophisticated high-dimensional navigation including projection pursuit and grand tours, and a plugin architecture for easy connection with R [87].
The limitation of both of these frameworks is that while they implicitly provide ways to access and explore many relevant paths through table space in useful and novel ways, they lack an explicit framework for local and global guidance. The architecture of these systems is sufficiently orthogonal to the notions of guidance described in this paper that supplying them with such a framework would require substantial ground-up development.

The rank-by-feature framework of Seo and Shneiderman [100] allows the user to visually inspect and explore dimensional relationships, but only with subsets of the original dimensions, so the huge part of table space that can only be reached via constructing synthetic dimensions cannot be explored. The data exploration environment of Guo [45] has a component-based architecture for finding clusters of the data with unique dimensional relationships.

Johansson and Johansson's [62] system embodies the concept of guiding the user through analysis stages. Users can craft quality metrics from combinations of correlation, clustering, or other measures and cull dimensions according to these measures. However, their system only supports one hardwired global workflow, where the only flexibility is in setting parameter values at each local stage. Another limitation is the inability to construct synthetic dimensions.

2.6.2 Systems for Spatial Analysis of Text Corpora

The Overview prototype, presented in Chapter 5, aids in the analysis of a text corpus with a pair of linked views presenting a clustering and dimensionality reduction of the underlying text corpus. The most relevant previous work is perhaps the Newdle system by Yang et al. [133]. It is also oriented around hierarchical clustering of topics and tag-based analysis. Yang et al. do not incorporate linked dimensionality reduction, and they show only a particular cut through the hierarchy at once, whereas our interface is based on a full hierarchy that allows the user to explore the entire multiscale structure. Chen et al. [23] also take a sampling approach to dimensionality reduction for large document collections to produce layouts that show clear clusters, but do not discuss any sort of interactive browsing or annotation. Oesterling et al. compute the contour tree of a density field, which has some conceptual similarities to the hierarchical clustering [82]. The Overview prototype uses a point-based visual encoding rather than their landscape-based approach, based on guidelines from previous empirical studies [114, 115].

Chapter 3
DimStiller: Workflows for Dimensional Analysis and Reduction

Many practical questions about a high-dimensional dataset require understanding how the dimensions and points relate to each other and to an underlying space: Are my dimensions meaningful? How do my dimensions relate to each other? Are my clusters real?

A combination of known statistical and visualization techniques can help answer the three questions above. For example, the question "how do my dimensions relate to each other?" may be answered using Principal Components Analysis and interpreting the magnitudes of the eigenvalues and eigenvectors of the correlation matrix. Each technique may produce different output with a corresponding specialized interpretation.
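As a small example of the correlation-matrix reading mentioned above, the following sketch (with synthetic data, not a dataset from the thesis) computes the eigenvalues of the correlation matrix; a large leading eigenvalue signals dimensions that move together, while near-zero eigenvalues flag redundant dimensions:

```python
import numpy as np

# Synthetic table: 100 samples of 5 dimensions, with dimension 4 nearly a
# copy of dimension 0.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
X[:, 4] = X[:, 0] + 0.05 * rng.normal(size=100)

R = np.corrcoef(X, rowvar=False)       # 5 x 5 correlation matrix
evals, evecs = np.linalg.eigh(R)       # eigenvalues in ascending order
print(np.round(evals[::-1], 2))        # e.g. one eigenvalue near 2, one near 0
```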
The sheer proliferation of techniques makes navigating this analysis space a daunting prospect for many users, who do not fully understand how and when to use these techniques correctly. For instance, what parameter settings make sense for a particular dataset? When can the output of one technique be legitimately used as the input for another?

In contrast to the profusion of previous work proposing better or faster techniques for specific aspects of dimensional analysis [49, 57], far less attention has been paid to creating systems that guide their users through the larger process of analyzing high-dimensional data iteratively using a combination of techniques. For instance, many engaged in data analysis, who lack a deep knowledge of dimensional reduction, simply project their data to some 2D space and plot these points using a scatterplot. However, if the intrinsic dimensionality of the dataset is larger than two, clusters or orthogonal axes may be projected on top of each other, occluding relevant structures of interest. Although experts confronted with the problematic result of a single undifferentiated blob could conjecture that there is a mismatch between the intrinsic dimensionality of the dataset and the space they chose to project to, less sophisticated users are routinely misled. The reasons for their perplexity include the sheer number of proposed dimensionality reduction techniques [51], the complexity of the mathematics underlying them, the widespread availability of dimensionality reduction tools that create only a single 2D or 3D scatterplot [65], and the lack of clear characterizations in the literature of which techniques are suitable for what kinds of data and tasks. DimStiller was expressly designed to help users avoid this pitfall, with a dimension reduction workflow that includes an estimation of the intrinsic dimensionality of the dataset to guide users in making an informed choice about how many dimensions to reduce to, if at all, and having the default view of a table of more than two dimensions be a scatterplot matrix rather than a single scatterplot.

In this chapter we present the design and implementation of the DimStiller visualization framework. DimStiller gathers together a variety of techniques from dimensional analysis and reduction into a coherent framework that emphasizes the underlying dimensions and the relationships between them. For instance, it guides users through estimating the intrinsic dimensionality of a dataset before carrying out reduction. The analysis technique components of DimStiller are outfitted with interactive controls and linked views, allowing users to see and manipulate intermediate results at each analysis step. Section 3.2 describes the task of dimensionality reduction and analysis. We present the DimStiller architecture in Section 3.3, and two case studies showing how it can be used to analyze complex real-world datasets in Section 3.4.

The second contribution of this chapter to the thesis is the notion and implementation of both local and global guidance for navigating through the space of possible data tables during dimensional analysis and reduction, using the abstractions of operations, expressions, and workflows. Expressions instantiate a chaining together, or composition, of transformation operators on input data tables. Expressions show which operations have been applied to the data as well as the order in which they occur. Workflows are templates for exploration that consist of a specific expression along with the saved parameter values for each operator. Workflows bundle together the sequence of operators of an expression independent of the data on which they operate, permitting DimStiller users to re-use and share common patterns of analysis.
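To illustrate the expression and workflow abstractions (and only to illustrate them: this is our own sketch, not DimStiller's implementation, whose actual architecture is described in Section 3.3), the snippet below chains two operators whose names echo the example expression in Figure 3.1. A workflow stores the operator sequence and its saved parameters; applying it to an input table instantiates an expression.

```python
import numpy as np

class Operator:
    """One step of an expression: a named transformation of a data table."""
    def __init__(self, name, fn, **params):
        self.name, self.fn, self.params = name, fn, params

    def apply(self, table):
        return self.fn(table, **self.params)

def cull_variance(table, threshold=0.0):
    """Drop dimensions whose variance does not exceed the threshold."""
    return table[:, table.var(axis=0) > threshold]

def collect_pearson(table, threshold=0.9):
    """Greedily group dimensions whose absolute Pearson correlation exceeds
    the threshold and replace each group with its average."""
    corr = np.abs(np.corrcoef(table, rowvar=False))
    groups, assigned = [], set()
    for j in range(table.shape[1]):
        if j in assigned:
            continue
        group = [j] + [k for k in range(j + 1, table.shape[1])
                       if k not in assigned and corr[j, k] > threshold]
        assigned.update(group)
        groups.append(group)
    return np.column_stack([table[:, g].mean(axis=1) for g in groups])

# A workflow: the operator chain plus saved parameters, independent of any data.
workflow = [Operator("Cull:Variance", cull_variance, threshold=0.0),
            Operator("Collect:Pearson", collect_pearson, threshold=0.9)]

# Instantiating the workflow on an input table yields an expression.
table = np.random.rand(100, 8)          # stand-in for a loaded 100 x 8 table
for op in workflow:
    table = op.apply(table)
    print(op.name, "->", table.shape)
```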
Workflows are templates for exploration that consist of a specific expression along with the saved parameter values for each operator. Workflows bundle together the sequence of operators of an expression independent of the data on which they operate, permitting DimStiller users to re-use and share common patterns of analysis. While mechanisms such as expressions, operators, pipelines, and macros have been proposed previously in many contexts [48], the novelty in DimStiller is the way in which these are used to walk through a series of operations on data tables, providing guidance during the analysis process. In Section 3.1 we discuss the idea of guidance in further detail, and contrast our approach to previous work in Section 2.

3.1 Local and Global Guidance

Anyone engaged in the dimensional analysis and reduction process must choose from a vast number of possible transformations of the data table at every step, but only a relatively small subset of these transformations will yield meaningful information about the structure of the input dataset. We define providing guidance as structuring the exploration process to help users find this small, meaningful set from the huge space of possible transformations. We first describe this abstract analysis space, and then explain the two kinds of guidance, local and global, that DimStiller provides to support effective navigation in this space.

We model the dimensional analysis and reduction process as traversing table space, the space of all possible data tables. Conceptually, this space is like a graph where nodes are data tables, connected by edges representing a transformation. A path through the space begins at some node representing the input data table. Intermediate nodes are the tables that result from the transformations applied by the user, and the path terminates at the output data table. We thus must consider how to help system users locate, explore, and traverse relevant regions in table space.

At the global level, we guide system users who must find a path that traverses the space from some given start point to a useful end point. Workflows are a mechanism to represent entire paths in this space. They represent a chained pipeline of transformations that can be applied to an input dataset. The built-in workflows are a small set of paths intended to span the space between a few landmarks of potential interest. DimStiller is designed to help its users find new, useful paths through table space, and save them as new workflows for later use.

At the local level, users also need to explore neighborhoods in table space: tuning the parameters of an individual transformation operator corresponds to searching for the most informative data table in the region of table space that is reachable with a single transformation that can be carried out by the chosen operator. DimStiller supports this local exploration through a chain of linked operator controls and views, so that users have immediate visual feedback about the effects of parameter tuning.

3.2 Users and Tasks

We now describe in detail the intended target user population for DimStiller, the questions that DimStiller is designed to help these users answer, and common techniques currently used to answer such questions.
While these are not the only questions a data analyst might be interested in, we argue that they are a good place to start when considering a new dataset, especially one with unclear provenance that is not necessarily well curated.

3.2.1 Target User Population

DimStiller is aimed at bridging the gap between state of the art techniques in visually oriented dimensionality analysis, and the current practices of many users and potential users who do not already have deep knowledge of their data and the mathematics of reduction. Although dimensional analysis is sufficiently complex that we do not target casual users, we argue that this middle ground between utter novices and fully confident experts is a sizeable group that is underserved by the current set of available systems.

For example, a visualization researcher called on to help somebody analyze a dataset may be completely unfamiliar with the dataset characteristics and the tasks of the researchers at the beginning of the analysis process. Furthermore, the person might be a visualization generalist rather than a specialist in the mathematical foundations of high-dimensional techniques in particular. Another example is end users who have expertise in their own domain and the desire to do some dimensional analysis, but not deep knowledge of reduction mathematics. They might be developing algorithms to generate or process the data, and seek to evaluate the quality of their results or fine-tune parameter settings.

DimStiller is particularly aimed at providing major process improvements for data analysts who must deal with messy datasets that may have unclear provenance. By providing both local and global guidance through table space, we aim to support analyses that might otherwise seem too daunting and decrease the chances that non-expert users draw incorrect conclusions, supporting a qualitatively different analysis process than with previous tools. For those data analysts dealing with curated datasets where the meaning of each row and column is already fully understood, DimStiller may simply speed up a previously feasible, but slow, analysis process by automatically supplying a suite of visual results for the analyst to peruse.

3.2.2 Are My Dimensions Meaningful?

Sometimes an input dimension may actually contain little or no useful information at all. Because of this, it is important for a data analyst to be able to characterize the dataset in terms of which dimensions have useful information versus the "meaningless" ones. This understanding is not critical for downstream analysis algorithms in the same analysis session, since the mathematics of dimension reduction will handle creating the correct lower-dimensional projection. However, discovering that a given dimension is culled could reveal problems with the data source, with major upstream consequences in later iterations of the larger analysis loop: the data might be gathered differently, or the algorithms to generate it might be refined.

One such criterion is simply to check the underlying variance of the input dimensions and cull those beneath a small noise threshold. Another possibility is to use an information entropy cutoff.

Figure 3.1: Left: The anatomy of a simple DimStiller expression.
Input data is fed into a pipeline of operators that alter the dimensionality of the data. Each operator may or may not have controls or views. An operator's control is displayed when it is selected in the Expression Tree. An operator's view is shown in a separate window, allowing side by side comparison between multiple views. Changing an operator's parameter in the control may produce a change to its output that is propagated across the expression using events that may travel both upstream and downstream from the operator. Right: The DimStiller interface for this expression, showing the Collect:Pearson and View:SPLOM views. The Cull:Variance expression is selected in the Expression Tree, so its control is visible.

3.2.3 How Do My Dimensions Relate to Each Other?

Much of multivariate statistical analysis is concerned with how the individual dimensions relate to each other. Many popular metrics such as Pearson's correlation coefficient measure pairwise relationships between individual dimensions. More holistic methods such as Principal Components Analysis determine how all the dimensions may actually express a smaller number of dimensions. Other methods uncover more complex nonlinear relationships between the input data, such as multidimensional scaling or many of the manifold-following variants.

3.2.4 Are My Clusters Real?

While the previous questions are related to dimensions, the question of cluster membership relates to the points. Clustering is the assignment of a unique label to specific regions of the input data's feature space and the points that occupy them. Cluster labels can be computed from any of a myriad of clustering algorithms.

Clustering relates to the input dimensionality in a reciprocal way. If the data analyst trusts the dimensional basis in which the data is represented, then point clusters in such a space will be considered real clusters with higher confidence than without such a trust. Likewise, if the data analyst is given a clustering that is trusted to be real, and the space in which the data is projected maintains the clustering's coherence, then this result increases confidence in the current dimensions. Thus, a clustering can inform the quality of a dimensionality reduction, and vice versa.

3.3 DimStiller Architecture

It is clear that an analysis tool that provides users with the ability to load their data into the system, transform their data with different analysis techniques, and scrutinize their data before and after these transformations would help users answer these questions. What is not clear is how such a tool should organize the results of applying the transformations or how to link these transformations together with visualizations to keep users focused on the analysis.

DimStiller organizes dimensionality analysis and reduction as a pipeline of transformations to a data table and linked views of it at different pipeline stages, as shown in Figure 3.1. The DimStiller model is based on an abstraction called an expression which encapsulates a sequence of transformations, called operators, that act upon tables of data where rows are points and columns are dimensions. Operators transform tables by adding, removing, or changing points or dimensions. Operators may have control panels and associated views that provide a visual representation of a table at that pipeline stage. All views are linked, and selections are propagated up and down the pipeline appropriately.
A key aspect of the DimStiller architecture is the ability to instantiate expressions from pre-existing workflows that capture useful analysis patterns.

The DimStiller expression and operator abstractions were partially inspired by the Expression and Operator information visualization design patterns of Heer and Agrawala [48]. However, DimStiller expressions have a simple, linear topology that defers processing to the operators, in contrast to the general tree structure suggested by the Expression pattern. Likewise, DimStiller operators are composable processing units similar to the Operator pattern, but compute general transformations and not necessarily visual mappings.

3.3.1 Input Tables and Dimension Types

DimStiller supports an abstract interface to data via a table model. Conceptually, a table can represent anything from physical entries in a disk file, to a cross-network database, or even evaluations of a simulator. The current implementation only supports simple file-based tables. There are two kinds of dimensions: Data and Attribute. Data dimensions are either Quantitative or Categorical. Internally, both are represented by floating point values, and DimStiller maintains a lookup table to map floating point values to category symbols for Categorical dimensions. Attribute dimensions represent values such as color or selection that are used in views such as scatterplot matrices, but are ignored by purely data-oriented operators such as variance culling or dimension reduction.

3.3.2 Operators

Operators are functions that map an n × m table to an n′ × m′ table. That is, operators may add or delete points or dimensions, or change the value of any existing cell in the table. In the parlance of Section 3.1, they are the edges that connect two nodes in table space. For example, the Cull:Variance operator removes dimensions with low variance from an n × m table. If any of the m table dimensions has variance below the user-controllable threshold, then the application of this operator would result in a new n × m′ table where m′ < m.

Every operator may have an associated control and/or an associated view, although neither is mandatory. Operator controls are GUI elements that permit users to modify operator parameters. Operator controls afford the user command over local search in table space. For example, Figure 3.2 shows the control panel for the above Cull:Variance operator, which has an interactive plot of the variance for each input dimension. The threshold parameter for culling is adjusted by clicking directly on the plot, and this operator does not have a separate view. Operator views are visualizations of the data table at that stage of the pipeline. Some operators are purely view-oriented, and do not transform the data table at all. For example, the view for the View:SPLOM is a scatterplot matrix, and its control panel only affects this visual display. In contrast, the Collect:Pearson operator has both a control panel with a slider to change the threshold, and a separate view with a matrix of colored boxes to show the pairwise correlations encoded with a blue-yellow-red diverging colormap.
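To make the table-to-table contract concrete, the following minimal sketch shows one way such an operator could be expressed in code. The class and method names are hypothetical illustrations for this text rather than DimStiller's actual Java API, and the variance cull shown is only the simplest member of the operator families listed below.

```python
import numpy as np

class Operator:
    """Hypothetical operator contract: a function from an n-by-m table
    to an n'-by-m' table, with user-adjustable parameters."""

    def apply(self, table):
        raise NotImplementedError

class VarianceCull(Operator):
    """Removes dimensions whose variance falls below a threshold,
    mirroring the behaviour described for Cull:Variance."""

    def __init__(self, threshold=0.0):
        self.threshold = threshold     # adjusted interactively via the control

    def apply(self, table):
        keep = table.var(axis=0) > self.threshold
        return table[:, keep]          # same points, fewer dimensions

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    table = rng.random((100, 8))
    table[:, 4] = 0.0                  # one zero-variance dimension
    print(VarianceCull(threshold=1e-3).apply(table).shape)   # (100, 7)
```

Chaining several such operators, each consuming the table produced by the previous one, yields the expression structure described in Section 3.3.3.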
The Operator namespace has a two-level structure, where a family has specific instances. The set of Operator families and instances, with notation Family:Instance, is:

- Attrib:Color: Adds attribute color dim for views
- Collect:Pearson: Join highly correlated dims
- Cull:Variance: Identify/remove low-variance dims
- Cull:Name: Identify/remove dim by name string match
- Data:Norm: Normalize input dims
- Input:File: Load comma separated value (CSV) file
- Reduce:PCA: Estimate/reduce dimensionality with PCA
- Reduce:MDS: Estimate/reduce dimensionality with MDS
- View:SPLOM: Plot n-dim table using an n × n scatterplot matrix
- View:Histo: Plot dim distribution with histogram

The current families and operators serve to illustrate the potential of our approach to system architecture; the set of families is not exhaustive, nor is the set of operators within any family. The built-in set of operator families and instances can be extended by implementing new operators.

Cull and Collect Operator Families

Operators in the Cull family compute a specific criterion for each dimension and remove those dimensions that do not satisfy it. The criterion for the Cull:Variance operator is variance, and dimensions that fall beneath a user-specified threshold are culled. It can help users locate and eliminate dimensions whose variability is zero, or is small but non-zero because of noise. The Cull:Name operator allows users to selectively remove dimensions manually, for example to analyze only a subset of the input dimensions.

While the Cull family acts on individual dimensions, operators in the Collection family use pairwise criteria like covariance and Pearson's correlation coefficient. Rather than removing dimensions whose pairwise measures do not satisfy the threshold, these operators replace them with a single representative dimension for the collection, for example the average.

These operators can help users who may need to refine the processes used to generate their input dataset, as we discuss in Section 3.2.2. They are also useful for those whose analysis needs preclude the creation of synthetic dimensions.

Reduce Operator Family

A critical design choice in the DimStiller architecture is that the Reduce operator family includes estimation of the intrinsic dimensionality of the space in addition to actually performing the reduction. The control for the operator, shown in the lower left of Figure 3.4, has a scree plot: a bar chart with the number of dimensions on the horizontal axis, and an estimate of the variability that would not be accounted for if the dataset were reduced to a space of that size on the vertical axis. The user then can make an informed choice when selecting the target dimensionality, by clicking on the plot at the desired threshold. (The definition of "intrinsic dimensionality" that we espouse is the smallest dimensionality of the set of spaces in which the data can be embedded with distortion less than some noise tolerance, rather than zero distortion.) DimStiller supports users in experimentally determining the correct noise threshold, which differs between datasets, by interactive threshold adjustment.

Although scree plots are far from new, most previous toolkits do not explicitly couple them to the use of a reduction algorithm: users are simply expected to provide a number as input, with no guidance. Users who are not experts or are dealing with unfamiliar datasets will often have no idea of what a reasonable number might be. Worse yet, a significant number of reduction technique implementations are hardwired to blindly reduce to two (or three) dimensions, with no hint to the user that this choice might be inappropriate or misleading. Even the relatively sophisticated user who knows to run an estimator is often provided with the black-box output of a single number, rather than the detailed information for each possible number of cumulative dimensions shown in a scree plot [49].
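To illustrate the kind of detailed information a scree plot conveys, the sketch below computes a PCA-style scree from the eigenvalues of a table's covariance matrix and reports, for each candidate target dimensionality, the variance that would be left unaccounted for; the noise-tolerance selection stands in for the user's interactive click on the plot. This is a minimal sketch of the general idea under these assumptions, not the Reduce:PCA implementation, and all function names are illustrative.

```python
import numpy as np

def pca_scree(table):
    """Eigenvalues of the covariance matrix, sorted in decreasing order."""
    centered = table - table.mean(axis=0)
    eigvals = np.linalg.eigvalsh(np.cov(centered, rowvar=False))[::-1]
    return np.clip(eigvals, 0.0, None)

def residual_variance(eigvals):
    """For each candidate dimensionality k, the fraction of total variance
    discarded by keeping only the top-k components."""
    return 1.0 - np.cumsum(eigvals) / eigvals.sum()

def pick_dimensionality(eigvals, noise_tolerance=0.01):
    """Smallest k whose residual variance falls below the noise tolerance,
    standing in for the user's click on the scree plot."""
    return int(np.argmax(residual_variance(eigvals) <= noise_tolerance)) + 1

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    latent = rng.random((500, 3))                  # 3 intrinsic dimensions
    table = latent @ rng.random((3, 10)) + 0.01 * rng.standard_normal((500, 10))
    print(pick_dimensionality(pca_scree(table)))   # typically 3
```

A nonlinear estimate could back the same kind of control by substituting per-dimension stress values for the eigenvalues.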
Our design also has the benefit that users can see different estimates of intrinsic dimensionality in a lightweight and fast way with the scree plots, rather than the more heavyweight approach of reducing and then viewing the results in a scatterplot matrix. A related design choice is that the View:SPLOM view for showing a table is a scatterplot matrix rather than a single scatterplot. When the table has only two dimensions this view does of course show the case of only a single scatterplot, but when it has more the user is guided to see all of the information rather than an arbitrary subset.

While designing the architecture, we considered whether to have the estimation step separate from the reduction step. We ultimately decided that they should be coupled together into one module in service of the goal of providing guidance for the non-expert user. Understanding which estimators are appropriate for which reduction algorithms requires significant knowledge of dimensionality reduction: for example, nonlinear reduction methods should not be used in conjunction with linear estimators. Thus, we do not expect the middle-ground user to make that choice, reserving it for the designer of new operators.

The exact measure shown on the vertical axis of the scree plot depends on the operator instance. The Reduce:PCA operator shows the eigenvalues, and the Reduce:MDS shows the stress values of the embedding in each dimension. We use the CPU implementation of Glimmer [57] for both the MDS reduction and estimation.

Attribute and View Operator Families

Attribute operators add attribute dimensions to the output data table of the operator. Attribute dimensions are interpreted by view and attribute operators and ignored by other operator families. The Color attribute operator creates an attribute dimension used for coloring, to which it assigns values based on the values of numeric or categorical dimensions. The assignment of colors is performed either by linearly interpolating between two endpoint colors or by assigning colors to individual dimension values. The default colormap for categorical data, inspired by the work of Stone [108], has 10 bins; colors are repeated if there are more than 10 values.

View operators provide visualizations of their input data. The two built-in View operators are View:Histo for showing the distributions of individual dimensions, and View:SPLOM for showing pairwise relationships between dimensions. Both the SPLOM and the Histogram views provide global linked selection by creating an attribute dimension for selection, and displaying points with a nonzero selection attribute value in a default selection color. SPLOM views also use the color attribute dimension to color points.

3.3.3 Expressions

A DimStiller expression is the instantiation of an ordered list of operators applied to an input table. Figure 3.1 Left illustrates the elements of a sample expression, showing the associated views and controls for each operator. As the expression progresses, the data entries change value and the output table changes shape as it is progressively refined. Figure 3.1 Left also shows the relevant pathways for how information about input, view, and parameter changes moves across the expression. In our informal description of the table space graph of Section 3.1, the nodes are data tables and the edges are the transformation operators. However, in the DimStiller user interface, the more natural representation uses the dual graph, where operators are the nodes, and the edges represent the data flowing between them.
The user is thus encouraged to focus on manipulating and understanding the transformations of the data.

3.3.4 Workflows

Workflows are templates for entire expressions that can be immediately created with a few clicks. DimStiller has a base set of workflows built in, and users can create their own by saving the list of operators in any active expression as a workflow.

A workflow contains a sequence of individual operator steps, and saved parameters associated with each operator. When a user instantiates a workflow, a new expression is produced unique to a given input table. Because many operators may result in time-consuming computations, only the first operator in a workflow computes its output upon workflow instantiation, with subsequent operators greyed out in the user interface. Users choose when to progress to activating the next step, possibly after adjusting parameters at the current step, with a Step Operator button. Heavyweight operators downstream will thus only initiate their computations on data tables that may be much more compact than the input table due to reduction at upstream stages.

The built-in workflows are designed to help users begin to answer the set of questions that we identified in Section 3.2.1, as a proof of concept that this style of guidance can help middle-ground users. We do not claim that they are the only way, or even the best way in all cases, to answer these questions. Workflows provide optional global guidance; they are not mandatory. Power users have the flexibility to build up new expressions directly in DimStiller by choosing individual operators from the currently loaded set.

The set of workflows built into DimStiller is:

- Reduce:PCA: Cull:Variance → Data:Normalize → Collect:Pearson → Reduce:PCA → View:SPLOM
- Reduce:MDS: Cull:Variance → Data:Normalize → Collect:Pearson → Reduce:MDS → View:SPLOM
- Cluster Verify: Attrib:Color → Data:Normalize → Reduce:PCA → View:SPLOM
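In code, a workflow can be thought of as little more than an ordered list of operator steps with their saved parameters, bound to a concrete table only when instantiated and advanced one stage at a time. The sketch below illustrates that idea with hypothetical class names and a minimal stand-in operator; it is not DimStiller's actual implementation.

```python
import numpy as np

class VarianceCull:
    """Minimal stand-in operator so the sketch is self-contained."""
    def __init__(self, threshold=0.0):
        self.threshold = threshold
    def apply(self, table):
        return table[:, table.var(axis=0) > self.threshold]

class Workflow:
    """A reusable template: operator classes plus saved parameter values,
    independent of any particular input table."""
    def __init__(self, steps):
        self.steps = steps                        # list of (class, params)
    def instantiate(self, table):
        return Expression(self.steps, table)

class Expression:
    """A workflow bound to one table. Stages run one at a time, so
    heavyweight operators downstream see already-reduced tables."""
    def __init__(self, steps, table):
        self.pending = [cls(**params) for cls, params in steps]
        self.tables = [table]                     # result after each stage
    def step(self):
        op = self.pending.pop(0)                  # the "Step Operator" button
        self.tables.append(op.apply(self.tables[-1]))
        return self.tables[-1]

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    table = rng.random((100, 8))
    table[:, 4] = 0.0                             # one constant dimension
    expr = Workflow([(VarianceCull, {"threshold": 1e-3})]).instantiate(table)
    print(expr.step().shape)                      # (100, 7)
```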
3.3.5 DimStiller Interface

Figure 3.1 Right shows a screenshot of the DimStiller session containing the expression diagrammed in Figure 3.1 Left. The visual structure of the DimStiller interface, with views and controls for each operator, encourages the user to examine the individual operators, adjust their parameters, and observe the effects on the resulting transformations in the visual representations.

The DimStiller window on the left contains the Workflow Selector at the top, with the Expression Tree underneath and an operator control panel on the bottom. Two view windows are visible on the right, a scatterplot matrix for the View:SPLOM operator and the correlation matrix for the Collect:Pearson operator. The Expression Tree shows that the input file dimstillerwide.csv contained a table with 100 rows of points and 8 columns of dimensions. In this example dataset, Dim 1 and Dim 2 are independently sampled from a uniform distribution between 0 and 1, Dim 3 is a scalar multiple of Dim 1, Dim 4 and Dim 6 are scalar multiples of Dim 2, Dim 5 is set to all zeros, and Dim 7 and Dim 8 are linear combinations of both Dim 1 and Dim 2 with a uniform noise term.

The first E1 operator S1 is Cull:Variance. The user has clicked on the first nonzero dimension in the scree plot, resulting in a threshold value of 0.0007 (rounded to 0.001 for display in the tree). The summary line for S1 in the Expression Tree shows that the output table of S1 has 7 dimensions as opposed to the 8 dimensions that were input to the operator, and the expanded details beneath show that Dim 5 is the one that was culled. The S2 operator collects dimensions that are correlated with the threshold of 0.85, resulting in a table of 4 dimensions whose pairwise correlations are shown with color in the top view. The expanded details show that Dim 1 and Dim 3 are now represented by the new synthetic dimension S2.D1, and the remaining three are now represented by S2.D2. The last operator is View:SPLOM, and the bottom view shows the scatterplot matrix. The user selected some points in one plot, and they are colored red in all of the linked plots.

Workflow Selector

The Workflow Selector displays the available workflows and allows the user to select one and create a new expression from it. Selecting a workflow fills the adjacent list box with the sequence of steps for the user to inspect. If the user chooses to activate that workflow by clicking the Add button, DimStiller applies the workflow steps to the currently selected expression, making those operators visible in the Expression Tree.

Expression Tree

The Expression Tree is a three-level tree widget that lists all open expressions. At the top level, expressions are described by a short text summary where each new operator X is appended on the right of the text string as → [X], where X is a very terse label. When the user drills down to the next level, the individual operators that comprise the expression are listed with a concise yet complete text summary that includes the size of the output table produced by the operator in terms of points and dimensions, as well as any operator parameters that are set to non-default values. The third and final level of detail is only added if an operator modifies the output dimensionality, namely the list of the dimensions modified by that operator and any details that relate the input and output dimensions. For example, in Figure 3.1 the expansion of the Collect:Pearson operator shows two of the synthetic dimensions and the names of the original dimensions collected together.

Operator Control Panel

Each operator may have a control panel that lets the user adjust its parameters. Called operator controls, they afford the user the means to locally search for a meaningful region in table space. When an operator is selected in the Expression Tree, its control populates the operator control panel region at the bottom of the main DimStiller window. Only one operator can be selected at a time. Operator controls visible in this paper include the Cull:Variance control shown in Figure 3.1 and the Reduce:PCA control shown in Figures 3.4 and 3.5.

View Windows

Each operator also may have an associated view. When an expression is loaded or created, the associated views open up as individual windows to support side-by-side comparison across operators within the same expression or even across different expressions. All the views are created using the Processing language, but new operator view plugins could be created using any graphical toolkit that can interface with Java. The built-in operators that have views are View:Histo, View:SPLOM (shown in Figures 3.1 and 3.2), and the box matrix showing pairwise relationships for the Collect:Pearson operator (shown in Figures 3.1 and 3.3).

3.4 Case Studies

We now describe how the DimStiller architecture facilitates the task of dimensionality reduction and analysis through case studies on real-world data.
We use the built-in workflows to construct expressions that inform users about the character and relationships of the dimensions and clusters of the datasets. (The video at http://www.cs.ubc.ca/labs/imager/video/2010/dtil4 2.mov shows the look and feel of interactive sessions with these datasets.)

3.4.1 Sustainability Simulation: Measuring Variability

Our first case study focuses on a sustainability simulation dataset containing a large collection of simulated results of government policy decision scenarios [78]. The 294 dataset dimensions represent the environmental and societal indicators affected by the policy decisions. A first attempt to analyze this data using pre-existing tools fell prey to the "reduce to 2 and plot" pitfall discussed in the introduction to this chapter. The simulation is agglomerated from many subpieces originally designed for varying purposes, rather than being carefully constructed from custom components that would dovetail seamlessly. The simulation designers thus did not have a clear idea of the intrinsic dimensionality of this dataset. They did know what their dimensions were, with meaningful labels for each such as Cost of Living and Air Quality. However, they thought it was possible that some indicators always had the same value across the entire dataset. They were also interested in learning about how the dimensions related to each other: they suspected that many indicators were highly correlated, but did not know the number of equivalence classes or which indicators were in each group. They were also curious whether automatically computed correlation groups would match with their intuitions about indicator relationships.

The first choice to make when using DimStiller is whether to construct our own expression from scratch by individually choosing operators, or to instantiate a workflow from the existing list. Since we are interested in finding the intrinsic dimensionality of the space as well as any correlations, a workflow in the Reduce family seems to be a good match, and we start with Reduce:PCA.

Figure 3.2 shows the control of the Cull:Variance operator that plots the sorted variances of the dimensions, with the log-scale option selected to emphasize small values. We notice that there are indeed many zero-variance dimensions, and click the scree control to remove these 34 dimensions, leaving 260 in the output table. The researchers could now drill down in the Expression Tree to see the names of the potentially problematic culled dimensions. They now know that either the input policy choices used in this run of the simulator did not effectively span the indicator space, or that there are unforeseen interactions between simulator components.

We click the Step Operator button to activate the next workflow step, the Data:Norm operator. Both reduction workflows include a normalization step to guide users who may be unaware of the effects of transforming dimensions with differing scales of variation. The Expression Tree in Figure 3.4 shows that we chose to normalize using Z scores, so the operator subtracts the mean and divides by the standard deviation. We then step to activate the Collect:Pearson operator, which gathers highly correlated dimensions. Even the most stringent possible threshold setting of 1.0 for perfect correlation results in a drastic reduction of the number of dimensions: from 260 to 147.
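To make concrete the kind of computation this collection step performs, the sketch below groups dimensions whose pairwise Pearson correlation meets a threshold and replaces each group with its average, one plausible choice of representative as noted in Section 3.3.2. The greedy grouping strategy and the function names are illustrative assumptions, not the Collect:Pearson implementation.

```python
import numpy as np

def collect_correlated(table, threshold=0.8):
    """Greedily group dimensions whose Pearson correlation with the group
    seed meets the threshold, and replace each group with its mean."""
    corr = np.corrcoef(table, rowvar=False)
    unassigned = list(range(table.shape[1]))
    groups = []
    while unassigned:
        seed = unassigned.pop(0)
        group = [seed] + [j for j in unassigned if corr[seed, j] >= threshold]
        unassigned = [j for j in unassigned if j not in group]
        groups.append(group)
    collected = np.column_stack([table[:, g].mean(axis=1) for g in groups])
    return collected, groups

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    base = rng.random((100, 2))
    table = np.column_stack([base[:, 0], base[:, 1],
                             2 * base[:, 0], 3 * base[:, 1]])
    collected, groups = collect_correlated(table, threshold=0.9)
    print(collected.shape, groups)   # (100, 2), e.g. [[0, 2], [1, 3]]
```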
Figure 3.3 Left shows the correlation matrix view, where only a small fraction of the boxes are visible without scrolling. Relaxing the threshold to a more reasonable value of 0.8 results in the view shown in Figure 3.3 Right, where the number of dimensions in the table is reduced to 22. Again, the simulation designers could now drill down in the Expression Tree to see the names of which dimensions were collected together, in order to check whether the automatic computations match their intuitions about the expected behaviour of the simulator.

Figure 3.3: The DimStiller Collect:Pearson operator view shows the correlation matrix with a diverging color scale ranging from blue at the positive end, through yellow for independent pairs, to red for negative correlation. Left: The perfect correlation threshold of 1.0 reduces the table from 260 to 147 dimensions. Right: Relaxing to a more reasonable threshold of 0.8 reduces the number of dimensions to just 22, all visible without scrolling.

Figure 3.2: Interactive DimStiller controls. Left: The Cull:Variance operator control displays a sorted list of the dimension variances. Many have zero variance, indicating potential problems in the choice of input variables to the simulator or the operation of the simulator itself. Log-scaling of the variances emphasizes small values. Right: The scree plot for the nonlinear Reduce:MDS shows an intrinsic dimensionality of 12 dimensions, versus the 16 dimensions found by linear methods shown in Figure 3.4.

Finally, we would like to determine whether the intrinsic dimensionality of this space is even smaller than the 22 dimensions of the table after culling and collecting, and if so reduce to that space. The Reduce:PCA operator constructs a linear projection of the data into a subspace that minimizes distortion. Our motivation for doing the reduction is to observe the distribution of the eigenvalues corresponding to the major axes of the simulation output in a subsequent scree plot. The control view in Figure 3.4 shows the scree plot in the PCA control, and we see that the eigenvalues approach zero between 12 and 18 dimensions. Mousing over dimension 16 shows that it corresponds to a very small noise threshold of 0.001, and we click to select that value. We then click the Step Operator button and activate the View:SPLOM operator, which brings up the scatterplot matrix view.

In order to facilitate viewing the original lattice structure of the data in the input space, we insert an Attrib:Color operator into the expression after the Input:File. The Attrib:Color creates an attribute dimension containing color information derived from a single, user-selected input dimension. We select the initial dimension as the dimension from which colors are derived. The resulting colored SPLOM is shown in Figure 3.4 in the top right view window, labeled E1:[View:SPLOM].

The original analysis of the dataset was done by projecting the data down to two dimensions using multidimensional scaling. We quickly replicate this analysis in DimStiller so that we can compare the results directly. We reload the same data into a new expression E2, adding an Attrib:Color operator and using the Reduce:MDS workflow. The operators in this workflow are the same as in the previous analysis except for the Reduce operator, and we use the same settings as before. We check the scree plot for this case, and find that only 12 dimensions suffice with this nonlinear reduction, as shown in Figure 3.2.
Then, to illustrate the "reduce to 2 and plot" pitfall, we use the Reduce:MDS operator to select 2 as the output dimensionality. The SPLOM view at the bottom right of Figure 3.4 shows only the single scatterplot. The regular, lattice-like structure visible in the SPLOM above is completely hidden, and we see only an undifferentiated blob.

Figure 3.4: DimStiller Reduce:PCA control and scatterplot. The scree plot in the Reduce:PCA control shows an estimate of the intrinsic dimensionality as between 12 and 18 dimensions, and we have interactively selected 16 as the threshold. The top scatterplot matrix shows a result with a visible lattice structure. In contrast, the bottom view shows the pitfall of reducing to just two dimensions using MDS, where an undifferentiated blob gives a misleading impression of no such structure.

This analysis session with DimStiller shows that although the simulator produces hundreds of outputs, dozens have zero variance, most of the remainder are highly correlated, and the data can be represented with only around a dozen dimensions without losing information. Although in theory this full analysis could have been carried out with existing tools like MATLAB, and bits and pieces of it were done over the course of a few years, in practice we did not have a complete picture of this messy real-world dataset until we could analyze it with the DimStiller system.

3.4.2 Computational Chemistry: Evaluating Clusters

We now examine a 30-dimensional computational chemistry dataset. The individual dimensions of this data measure physical properties of chemicals such as molecular weight and the number of bonds they possess. The dataset includes a cluster membership dimension with 236 clusters of the data produced by a commercial clustering package. The goal of the chemists who work with this dataset is to evaluate the quality of the clustering.

The Cluster Verify workflow is appropriate for this goal, so we instantiate a DimStiller expression from it. After loading the data, we use the Color operator control to choose which dimension we use for the categorical colormap. By default, the color operator culls the dimension by which it colors the points; including this cluster membership dimension in the downstream analysis would usually skew the results.

After adjusting color settings, we activate the Data:Norm operator, which normalizes the dimensions to Z scores. We then activate the Reduce:PCA operator and observe its control. The scree plot of the eigenvalues, visible in Figure 3.5, shows an exponential drop-off in magnitudes. This plot strongly suggests that the majority of the variance of the data resides in a lower-dimensional space than the input dimensionality. Standard practice is to select the "knee" of the value drop-off curve as a good candidate for target dimensionality. We select 3 and then activate the View:SPLOM operator.

The resulting scatterplot matrix of the View:SPLOM operator, also visible in Figure 3.5, reveals several interesting cluster structures in the data. In the bottom row of two scatterplots, we observe clear separation of several clusters of points. In contrast to the spatial separation, some color labels appear to span gaps in the scatterplots. A single categorical color scheme of course cannot possibly show over 200 clusters with distinguishable colors, so DimStiller uses a repeating palette.
The Permute Colors button permutesthe order in which colors are assigned to categories. After trying a few differentpermutations of the color scheme we conclude that the phenomenon we saw wasnot just an artifact; several cluster labels do indeed span these observed clusters.This result gives strong, albeit not conclusive, evidence that there may be betterclusterings.Figure 3.5: Running the DimStiller Cluster Verify workflow on a com-putational chemistry dataset. The scree plot in the lower left showsthat most of the variability resides in a low dimensional subspace. Wechoose a threshold of 3 dimensions at the ?knee? in the plot. We see inthe colored SPLOM view that while the clusters are spatially coherent,they do not reflect the spatial separation in this projection, suggestingthat this clustering is not the most appropriate.3.5 DimStiller DeploymentDimStiller was produced as part of a larger, multi-pronged research initiative de-signed to study high-dimensional analysis. In the context this initiative, DimStiller42was our first attempt to architect a software solution to the major issues that webelieved were facing working data analysts: selecting appropriate methods andappropriately tuning their parameters. Another key thread of that larger researchinitiative was a multi-year qualitative field study in which I participated as a co-author [98]. This qualitative study provided an ideal opportunity to deploy Dim-Stiller to working data analysts from a variety of disciplines and informally gaugethe utility of the software to their problems.To deploy the software, we introduced some of the data analysts in the studyto DimStiller and encouraged them to try to incorporate using the software in theirown data analysis tasks. This limited deployment was instructive in several keyways. First, it highlighted use cases of high-dimensional analysis where DimStillerwas not an appropriate tool. Second, it brought to our attention incorrect assump-tions of DR algorithms that needed to be addressed before a tool like DimStillercould even be applied. Finally, the deployment shed light on external factors thatcan greatly affect the success and future development of software systems. In thereminder of this section, we describe these different results of the deployment inmore detail.In one deployment use case, detailed more completely in the qualitative fieldstudy as FISHPOP [98], a user applied DR to a technique more appropriate to a dif-ferent high-dimensional technique called sensitivity analysis. Sensitivity analysisis outside the scope of the goals of both DimStiller and DR. This case representsa mismatch between the user?s understanding the goals of existing DR workflowsand the user?s underlying task of gauging the sensitivity of output dimensions val-ues to small changes in input dimensions.In two other different deployment cases, users uncovered incorrect assumptionsin DR algorithms that needed to be addressed before DimStiller could effectivelyhandle their input. For example, one DimStiller user was a data analyst analyzingdistance-matrices resulting from the co-citation analysis of relevant terms, wheredistances between terms were the result of a costly database calculation. Here, thedata analyst was prohibited from doing a full analysis without performing an im-practical, months-long database calculation. Another user wanted to use DimStillerto perform DR on vectors derived from document collections. 
DimStiller was not equipped to process the extremely high-dimensional datasets resulting from analyzing document data. Both of these issues were less about inherent limitations of DimStiller itself than about DR in general, as any DR algorithm or tool would encounter similar difficulties when applied to the same data.

For these users, important algorithmic obstacles needed to be surmounted before applying the workflows for which DimStiller was designed. By applying DimStiller to perform DR on practical problems, these DimStiller users highlighted the issues that we tackle in the remainder of this thesis. In the case of the co-citation analyst, we developed an adaptive algorithm framework to handle the case of calculating DR in the presence of costly distance functions. For the case of document data, we developed DR algorithms to efficiently leverage their high-dimensional structure. Both of these issues, and the algorithms we developed, are covered in more detail in Chapters 6 and 4 respectively.

Missing from our deployment were any unambiguous success stories, where users reported insights and analysis process improvements from using DimStiller. We believe most users in the deployment did not adopt DimStiller into their analysis workflow, using the software only during the period of our interviews. This lack of adoption derives less from the design of the software itself, with which no user in our deployment had issues, than from two important external factors we observed: difficulty of software integration and our own lack of evangelism.

By software integration, we refer to the ease with which the inputs and outputs of software systems connect with other software tools. For example, one of our collaborators in the qualitative study remarked how she would be more likely to use and recommend DimStiller if it could be easily incorporated into her R workflow [86]. DimStiller was implemented as a standalone Java tool with a single generic input and output format. Thus, considerable software engineering and Java expertise is required to build and incorporate new operators into the tool. In the future, we believe guidance-oriented analysis tools should integrate themselves into software platforms with a substantial base of existing techniques and users. Doing so both mitigates the initial barrier to adoption and maximizes the ease with which new or existing techniques can be folded into the tool.

Another factor affecting the adoption of systems like DimStiller is the effect of software evangelism. Beyond the initial deployment and the publishing of an article describing DimStiller [58], we made no further effort to deploy to other communities or improve the software, having shifted our research focus to addressing the issues that appear later in the thesis, such as handling costly distances or document data. We discuss the effects of software evangelism and other ways to increase the impact of research more fully in Section 7.3.2 of this thesis.

Chapter 4

The Geometry of Fast Document Queries: Implications for Visual Analysis Algorithms

The field of Information Retrieval, or IR, focuses in part on improving the accuracy and speed of the task of querying document databases [75]. Search querying is performed by millions of users of varying skill each day. To do so, a user poses a set of terms to the query engine, which then provides a 1-dimensional list of results back to the user, sorted in terms of predicted relevance to the query.
Search query results computed on databases of billions of text records are often returned with processing time of less than a second, making search querying one of the more popular and scalable computer algorithms.

One reason Information Retrieval algorithms for search querying can perform so efficiently lies with an important property of the underlying data. We informally name this property the Mostly Disconnected property, or MoDisco for short. Put briefly, MoDisco datasets are those datasets with a large number of dimensions, but where each point has a nonzero value in only a subset of these dimensions, and each dimension is set to a nonzero value for only a subset of the points. That is, not only are the points sparse, but so are the dimensions themselves.

The MoDisco property has implications for the efficiency and scalability of query algorithms. High-quality query results can be computed by processing only a small subset of the total dataset without degradation in quality, by organizing the data into data structures such as inverted files and traversing these structures in order of query-term impact [137]. We discuss the MoDisco property in detail and its relationship with query algorithms in Section 4.1.

In contrast to querying, visual analysis often involves a more open-ended exploration. The document exploration task remains the purview of expert-level analysts in fields like intelligence analysis, law, and computational journalism, where it is important to quickly understand large, unlabelled collections of text documents [25, 73]. In this context, exploratory methods tend to encounter scaling difficulties, and the characterization of "large" is often ascribed to datasets well under ten thousand documents [52, 133].

We argue that the MoDisco property also has important implications for document exploration algorithm design in visualization. By interpreting the MoDisco property in terms of distance geometry, we can make useful statements about the construction of distance-based analysis algorithms, like clustering and dimensionality reduction, used to explore these large, complex datasets.

As the first contribution of this chapter, we identify and describe the following three implications for algorithm design in this paper.

- Nearest Neighbor search of MoDisco data can more efficiently be done with search-index based queries rather than generic distance-based approaches.
- Distance matrices of MoDisco data can be calculated and stored efficiently with low approximation error.
- Dimensionality Reduction algorithms for MoDisco data should use methods based on local attraction and global repulsion.

While distance matrices, nearest neighbor search, and dimensionality reduction are useful in a variety of analysis pipelines [58], it is not self-evident how such implications can be directly utilized to formulate new algorithms. As a followup contribution, we also describe three techniques, two new algorithms and a modification of an existing algorithm, for incorporating these implications into the design of algorithms intended for the visual analysis of documents:

- All Pairs Queries: we present a new algorithm for nearest neighbor search of MoDisco data, which we use to construct an efficient Distance Matrix approximation.
- Fast Cluster Tree: we present an algorithm for producing a hierarchical cluster tree from our Distance Matrix approximation in O(N log N) steps.
- MD-SNE: we introduce the MD-SNE algorithm for dimensionality reduction of MoDisco data.
Figure 4.1: Chapter diagram describing the relationship between our algorithmic contributions, the previous work used as algorithm components, and the data structures created and used by these algorithms. Section 4.2 describes the All-Pairs Query, or APQ, algorithm, which uses impact-sorted inverted file queries to process a set of term vectors to yield a set of nearest neighbors for each vector. In Section 4.3, we describe how this set of nearest neighbors can be input into the APQ to Distance Matrix algorithm to produce a Truncated Distance Matrix, a low-space Distance Matrix approximation. We then demonstrate how to use the Truncated Distance Matrix efficiently with the DM2CT algorithm, which uses the Minimum Spanning Tree (MST) algorithm as a component to produce a Cluster Tree. Section 4.4 describes the MoDisco-SNE (MD-SNE) algorithm for dimensionality reduction.

Figure 4.1 summarizes the contributions of this paper with a diagram showing the relationship between our implications, our algorithmic contributions, and previous work used as algorithm components.

After first defining the MoDisco property and how it relates to document term-vector datasets in Section 4.1, the remainder of the paper is organized around the three different algorithm design implications. In each of Sections 4.2, 4.3, and 4.4, we explain the design implication, present an algorithm incorporating the design, and compare the results of the algorithm with competing work.

4.1 The Mostly Disconnected Property of Term-Vector Datasets

Query systems often use the vector space model, storing documents as term vectors [96]. Here, document refers to a string containing the content of a single text file. The document string can then be broken into terms: n-grams of words. Term vectors reside in a space whose dimensions are all of the terms in the entire database, and whose weights indicate the importance of a given term in identifying the underlying document. Using such vectors is not only more efficient than processing the varying-length document content during a query, but it permits us to conceive of the documents as points residing in term space. We can then think about documents spatially, just as we would any other distribution of high-dimensional data points.

In this section we first describe the method by which documents are transformed into term vectors. We then present a set of term-vector datasets derived from real-world document corpora. Next, we indicate how the transformation method induces the mostly disconnected property by measuring properties of our benchmark datasets. Finally, we explain the method by which query algorithms are able to extract performance from this property.

4.1.1 Term-Vector Datasets and TF-IDF

An influential method for determining the weights assigned to term-vector dimensions in the IR literature is the TF-IDF method [95]. TF-IDF, short for Term Frequency times Inverse Document Frequency, is a scheme that assigns a large weight to terms that appear in a document frequently, but then discounts that weight by how often the term appears in other documents. The weight of common words is reduced and that of unique words increased.
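The sketch below shows one common variant of this weighting, together with the unit-length normalization and cosine distance discussed in the remainder of this subsection. The exact TF and IDF formulas differ across systems, so the particular choices here (raw counts and a logarithmic IDF) should be read as illustrative assumptions rather than the weighting used for the benchmark datasets described later.

```python
import numpy as np

def tfidf_vectors(term_counts):
    """term_counts: an N-by-M array of raw term frequencies.
    Returns unit-length TF-IDF vectors (one common variant of the scheme)."""
    n_docs = term_counts.shape[0]
    doc_freq = (term_counts > 0).sum(axis=0)
    idf = np.log(n_docs / np.maximum(doc_freq, 1))   # discount common terms
    weights = term_counts * idf                       # TF times IDF
    norms = np.linalg.norm(weights, axis=1, keepdims=True)
    return weights / np.maximum(norms, 1e-12)         # unit-length points

def cosine_distance(a, b):
    """1 minus the dot product of two unit-length term vectors."""
    return 1.0 - float(a @ b)

if __name__ == "__main__":
    counts = np.array([[3, 0, 1, 0],
                       [2, 0, 0, 1],
                       [0, 4, 0, 1]], dtype=float)
    vecs = tfidf_vectors(counts)
    # Documents 0 and 1 share a term; documents 0 and 2 share none.
    print(round(cosine_distance(vecs[0], vecs[1]), 3),
          round(cosine_distance(vecs[0], vecs[2]), 3))
```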
Using the probabilistic model of IR, TF-IDF has already been shown to be an optimal model of document relevance under a set of simplifying assumptions [89]. In contrast to this probabilistic perspective regarding document relevance, we develop a spatial intuition of why TF-IDF is efficient in document queries.

Term vectors are typically normalized to unit length, to control for document size by forcing long documents with large term frequencies to carry the same weight as short documents with small term frequencies. This normalization implies that document points reside on the surface of a hypersphere. The fact that term weights are always nonnegative restricts their distribution further to the positive orthant of this sphere.

The common distance metric used between term vectors is the cosine distance metric [95]. The cosine distance between vectors a and b is equal to 1 − aᵀb, or 1 minus the dot product between the vectors. While other related metrics exist, we focus on the cosine metric due to its simple geometric interpretation as one minus the length of the linear projection of vector a onto b.

4.1.2 Benchmark Datasets

Our explanation of the Mostly Disconnected property requires demonstrating and measuring the sparsity characteristics of a set of varying benchmark datasets. The benchmarks in this paper are run on a set of seven datasets. Four of them are collections of MoDisco term vectors generated using TF-IDF term weights from text databases; three are real-world datasets and one is synthetic. The metacombine dataset is produced from metadata tags harvested from a disparate collection of digital libraries [67]. The warlogs and cables datasets are subsets of documents from larger Wikileaks collections [59]. The conspiracy dataset is harvested from the conspiracy section of a BBS textfile collection (http://www.textfiles.com/, last visited on July 10, 2013). We include the synthetic simplex dataset as an extreme example of a MoDisco dataset, which is a set of 1000 points equally distant from each other in 999-dimensional space. An (N−1)-dimensional simplex dataset is equivalent to an N × N identity matrix I.

Name         Points (N)   Dims (M)   MoDisco?
metacombine  28K          28K        Y
cables       7K           66K        Y
warlogs      3K           4K         Y
conspiracy   1K           33K        Y
simplex      1K           1K         Y
grid         1K           2          N
mnist        2K           784        N
blobs        6K           5          N

Table 4.1: Benchmark datasets, with size in terms of both points and dimensions. The first five are MoDisco; the last three are not.

The term vector datasets in this thesis are constructed with the following pipeline. First the text is split into ordered sets of words. These word sets are then stripped of a designated list of stop words. The remaining words in the set are then used to create bigrams, pairs of subsequent words. Term counts for each document are then constructed by counting the occurrences of each bigram in each document. These term counts are finally re-weighted by TF-IDF and then normalized to unit length. Stray has a more detailed description of the term-vector generation process, including justifications for not using alternative approaches, such as entity recognition [109].

For contrast, we also include three datasets that are not MoDisco, also a mix of real-world and synthetic. As a synthetic example of a linear manifold, or flat surface, we include a regularly sampled 2-D lattice called grid. The real-world mnist dataset contains the combined test sets of digits 1 and 2 in the MNIST handwritten digit dataset (downloaded from http://yann.lecun.com/exdb/mnist/, last visited on July 10, 2013), as an example of a high-dimensional, real-world dataset with two meaningful clusters.
Finally, the synthetic blobs dataset is a collection of 6 clusters of 1000 points randomly generated from 6 separate 5-dimensional Gaussian distributions. Table 4.1 provides the characteristics of these benchmark datasets in terms of the number of points N, the term space dimensionality M, and whether it has the MoDisco property.

4.1.3 The Mostly Disconnected Property

The intuitive definition of the MoDisco property is that each point mostly resides in a small subspace of the full term space, represented by a small number of key terms. Furthermore, these same key terms weigh highly with only a small subset of the other data points. To formally define the MoDisco Property, we first introduce a descriptive statistic measuring the sparsity of both points and dimensions of the dataset with N points and M dimensions. First, we define the thresholded sparsity of a point i to be

P_i(t) = |{ x_ij : x_ij > t, j ≤ M }| / M

and the thresholded sparsity of a dimension j to be

D_j(t) = |{ x_ij : x_ij > t, i ≤ N }| / N

for some threshold 0 < t < 1. The purpose of introducing the threshold t is to remove vector terms from the data with very low weight. Low-weight terms often contribute little to the distance calculation between two points, and can potentially make operationally sparse data appear dense. In our analysis, we select t = 0.1, but it holds for a wide set of values. We can now define two worst-case sparsity measures

maxP(t) = max{ P_i(t) : i ∈ [1, N] }
maxD(t) = max{ D_j(t) : j ∈ [1, M] }

Given these worst-case thresholded sparsity measures, we now say that a dataset is MoDisco when both maxP(t) < B and maxD(t) < B. Here, B ∈ (0, 1) is some bound on the desired sparsity of the data; a smaller B imposes stricter sparsity requirements on the data. These two inequalities are descriptive statements about the worst-case sparsity of any single point and dimension in the data. They implicitly describe how the design implications we present in this paper hold true and are not computed explicitly by any of the presented algorithms. As a side note, it is not meaningful to make statements differentiating the average sparsity of points versus dimensions of a dataset, because these measures are equivalent.

Figure 4.2: Plot of MoDisco statistics (maxP(0.1), maxD(0.1)) showing clear separation between MoDisco datasets (circles) and non-MoDisco datasets (squares). The boundary B defines the region of MoDisco datasets; we present two extreme possibilities, B = 0.2 and B = 0.9, that agree with our datasets. In the MoDisco case, each point has a small number of nonzero dimensions and each dimension is expressed by a small number of points.

Figure 4.2 shows a plot of the points (maxP(0.1), maxD(0.1)) for the suite of 7 benchmark datasets. The five MoDisco term-vector datasets, represented as circles, are distinctly separated from the other 3 non-MoDisco datasets, represented as squares. Empirically, the worst-case sparsity bound B can take values anywhere between 0.2 and 0.9, demarcated by grey lines on Figure 4.2. Given that we observe significant separation between MoDisco and non-MoDisco datasets, we do not prescribe any specific value of B.
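Since these statistics are defined directly on the term-vector matrix, they are straightforward to compute. The sketch below is a near-literal transcription of the definitions above; it uses a dense array for clarity even though real term-vector data would be stored sparsely, and the bound B = 0.2 is just one of the admissible values noted above.

```python
import numpy as np

def modisco_stats(x, t=0.1):
    """Worst-case thresholded sparsity of points and dimensions for an
    N-by-M matrix x, following the definitions of P_i(t) and D_j(t)."""
    above = x > t
    point_sparsity = above.sum(axis=1) / x.shape[1]   # P_i(t), one per point
    dim_sparsity = above.sum(axis=0) / x.shape[0]     # D_j(t), one per dimension
    return point_sparsity.max(), dim_sparsity.max()   # maxP(t), maxD(t)

def is_modisco(x, t=0.1, bound=0.2):
    """True when both worst-case sparsity measures fall below the bound B."""
    max_p, max_d = modisco_stats(x, t)
    return max_p < bound and max_d < bound

if __name__ == "__main__":
    simplex = np.eye(1000)            # the extreme MoDisco case from Table 4.1
    dense = np.random.default_rng(4).random((1000, 5))
    print(modisco_stats(simplex), is_modisco(simplex))   # tiny values, True
    print(modisco_stats(dense), is_modisco(dense))       # near 1.0, False
```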
There is an intuitive explanation of why term-vector schemes like TF-IDF induce the MoDisco property. The TF component defines the subspace in which each vector resides, while the IDF component reduces the subspace weights below t for those dimensions shared by many vectors, effectively reducing the subspace dimensionality and disconnecting most vector subspaces from each other (unless they possess sufficient TF weight to compensate). Without the IDF discounting factor, most of the subspaces in which vectors reside would connect by intersecting across the dimensions representing common terms.

By definition, the MoDisco property implies that the distribution of the set of cosine distances between points is highly skewed toward 1. Two randomly sampled documents from a MoDisco dataset will share few, if any, terms on average. This phenomenon has been observed and noted in the IR literature on query datasets [84]. Figure 4.3 verifies this phenomenon by showing distance histograms for the benchmark datasets: the histogram of the MoDisco warlogs dataset is most similar to the high-dimensional simplex case, rather than to the three non-MoDisco examples, the regular grid and the two cluster datasets.

Figure 4.3: Distance histograms of different dataset types (horizontal axis: inter-point distance from 0 to 1; vertical axis: proportion of distances). a) Histograms of distances (top) and layouts showing dataset geometric structure after dimensionality reduction (bottom) for the non-MoDisco datasets grid (blue), mnist (yellow), and blobs (purple), and for the synthetic extreme MoDisco case of the simplex (green). b) The histogram of the real-world MoDisco dataset warlogs (brown) matches the simplex case the best.

4.1.4 Implication for Efficient Queries

MoDisco data are highly amenable to efficient search queries, due to the mapping of the data into fast, orderable search indices. We now explain how these indices are constructed, and how the MoDisco property affords efficient lookup of relevant documents.

An important search data structure is the Inverted File [137]. It is so named due to the inversion from standard term-vector storage, which maps a document to a list of terms, to a mapping from a single term to a list of the documents that contain that term. Query processing using the inverted file need not compare the query to the entire document corpus, but only to those documents that share the query terms.

Improved traversal of the inverted file can be achieved by performing an impact sort, a simple re-ordering of both the index lists and the order of their traversal. Impact ordering has been shown to compute higher-scoring query results earlier than low-scoring results [3]. The advantage of computing these higher-impact results first is that we can truncate the query computation after computing the top-scoring results.

The cost of impact-ordered traversal is that the query results are approximate. Many efficient query-evaluation systems exist that produce exact results [34]. In this paper, we focus on impact-ordering techniques for the user-controlled flexibility they provide in the speed and quality tradeoff.

Impact-ordered, inverted-file indices exploit the MoDisco property in the following way. The traversal of the documents indexed by the inverted-file terms contained in the query is ordered by the magnitude of their 1-D projections onto the query's subspace dimensions.
This ordering ensures that we first visit the stronglyconnected documents, those with large dimension weights intersecting the querysubspace.564.2 Nearest-neighbor SearchThe computational task of selecting the nearest neighbors of a point applies toa broad class of problems and is the subject of a significant amount of previouswork [72]. For data with the MoDisco property, we suggest that nearest neigh-bors are most efficiently computed using index-based queries, rather than the twoother major categories of nearest-neighbor search techniques that are relevant toour context: spatial partitioning strategies and mapping-based techniques. In thissection we discuss the related work, and then explain how index-based queries canbe expected to out-perform the state of the art. We then present a new algorithm forall-pairs, approximate k-nearest-neighbor search using an impact-ordered invertedfile and show how it achieves high accuracy results in less time than competingapproaches.4.2.1 Implication: Nearest-Neighbor Search with anImpact-Ordered Inverted FileWe suggest that when processing MoDisco data for nearest-neighbor search, queryindices are more efficient than either spatial partitioning or mapping based tech-niques. We provide an intuitive justification for this claim in this section, backedup with an empirical analysis of the benefits of using query indices in Section 4.2.3.Each term vector resides in a low-dimensional subspace of term space, and theinverted file maintains a record of the subspace dimensions for each vector. Whenusing the terms of a given term vector as a query into the impact-ordered index, wefirst encounter those vectors whose projections onto the query?s subspaces dimen-sions are the largest. These vectors are more likely to be in the set of points closestto our query. In this way, the impact-sorted, inverted file already acts like a map-ping function on MoDisco data, but without the need to compute any projectionsor hash functions.Spatial partitioning strategies and mapping based techniques, in contrast, op-erate with no a priori knowledge of the subspace structure of the data. They mustfirst resolve the relevant data subspaces by either building the spatial-partitioningdata structure or computing enough mapping functions to effectively discretize theterm space into relevant regions.57It is instructive to note how using index queries for the general case, when thedata does not have the MoDisco property, will fail to improve over other nearest-neighborhood search methods. In the general case, there is no guarantee that mostpoints will reside in a low-dimensional subspace. In the case of a dense dataset, sin-gle inverted index entries are full, containing references to the entire set of points.When inverted file entries are full, they provide no useful method for reducing thesearch space for neighboring points. Indeed, in the full inverted-index-entry case,the inverted file devolves into an exhaustive search over all the data points. In thecase of sparse rows, but not sparse columns, we can guarantee that most pointsreside in a low-dimensional subspace. 
However, we still cannot guarantee against the existence of a full inverted index entry.

4.2.2 Algorithm: APQ

Our All Pairs Queries (APQ) algorithm constructs an approximate set of k-nearest neighbors without exhaustively computing all the similarities between data points. We have argued that the fastest way to perform nearest-neighbor search on MoDisco data is with index queries, where the query is constructed from the terms of the document itself. Thus, to compute the full set of nearest neighbors, we want to perform an ensemble of queries across all pairs of data points. Using impact-ordered processing, we can then truncate the query computation once we have produced the highest-scoring, or nearest, results.

The APQ algorithm performs an ensemble of queries in an iterative way using data structures similar to the single-query case [137]. The algorithm maintains several lists of fundamental data structures. First, the term vector dataset V. Second, the inverted file I, which we store as a list of arrays of document indices sorted in order of term weight; the expression I[i, j] references the document whose weight for term i is the jth largest among documents possessing that term. Figure 4.4 presents a simple two-document term dataset and its corresponding inverted file. The third list Q holds the set of terms for each point, where each set of terms is organized as a priority queue. Figure 4.5 describes the values stored in each priority queue element: the priority used to position the item in the queue, a term used for indexing a row in the inverted file, a term pos recording the position number of the term in the term vector, and a counter used to index a column in the inverted file. Finally, the accumulators are a list A of hash sets that map document numbers to accumulated terms. The accumulator A[i, j] stores the accumulated terms that make up the inner product between documents i and j up to the current iteration. These sets of accumulators A are the output of the algorithm. We discuss how to interpret A as a distance matrix in Section 4.3.1.

Term vectors V:                  Inverted file I (sorted by term weight):
  Doc 1: dog 0.3, cat 0.9          dog: (0.8, Doc 2), (0.3, Doc 1)
  Doc 2: dog 0.8, cat 0.5          cat: (0.9, Doc 1), (0.5, Doc 2)

Figure 4.4: A sample term vector dataset and its inverted file. The terms in V become the indices of the rows in I, while the IDs of the documents forming the rows in V become the indices for the weights stored for each term in I. The list of weights for each term in I is stored in descending order of document weight.

The primary difference between our method and the single-query, impact-ordered algorithm is the introduction of the set of priority queues Q. In the single-query case, impact ordering is achieved by sorting the set of inverted-file term entries, using the product of each query term weight and the matching inverted-file term weight as the sort key. The priority queue for each document in Q replicates the same sort ordering by making the priority value of each queue entry equal to this sort key. The priority value of the head queue element is then updated to be the product of the queue entry's weight with the next entry for that term in the inverted file, thus repositioning the item in the queue. Using the priority queues interleaves the sort ordering and accumulation calculations in a single inner-loop iteration.
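As an illustration of these structures, the following Python sketch builds the inverted file and the initial priority queues for the two-document example of Figure 4.4, using the standard-library heapq module (priorities are negated because heapq is a min-heap, and indices are 0-based rather than the 1-based indexing of the pseudocode). This is our own illustrative code, not the thesis implementation, which is written in C.

```python
import heapq

# Term vectors V from Figure 4.4: one dict of {term: weight} per document (0-indexed here).
V = [
    {"dog": 0.3, "cat": 0.9},   # Doc 0 (Doc 1 in Figure 4.4)
    {"dog": 0.8, "cat": 0.5},   # Doc 1 (Doc 2 in Figure 4.4)
]

# Inverted file I: term -> list of (weight, doc id), sorted by descending weight.
I = {}
for doc_id, vec in enumerate(V):
    for term, weight in vec.items():
        I.setdefault(term, []).append((weight, doc_id))
for term in I:
    I[term].sort(reverse=True)

# One priority queue per document. Each queue item mirrors Figure 4.5:
# (priority, term, term position, counter), where the initial priority is the
# impact score: document term weight * top weight in that term's inverted list.
Q = []
for doc_id, vec in enumerate(V):
    heap = []
    for pos, (term, weight) in enumerate(vec.items()):
        top_weight = I[term][0][0]
        heapq.heappush(heap, (-(weight * top_weight), term, pos, 0))
    Q.append(heap)

print(I["dog"])   # [(0.8, 1), (0.3, 0)]
print(Q[0][0])    # highest-impact item for Doc 0: term "cat", priority 0.9*0.9 (stored negated)
```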
Figure 4.6 presents the whole APQ algorithm in pseudocode.

The anatomy of a Q element, for example (0.4, dog, 1, 1):
  Priority: the impact score for this element, used as a sort key
  Term: the term used to derive the impact score
  Term Pos: the position of the term in the term vector
  Counter: an increment into the document list for this term in I

Figure 4.5: Anatomy of the priority queue elements maintained for each document. The priority queue Q is used to sequence the computation of terms in the APQ algorithm in impact order.

INPUT
  V : Term Vector Database
  I : Inverted File
  maxiter : Maximum number of iterations to run
OUTPUT
  A : List of Accumulator Sets

procedure APQ(V, I, maxiter)
  for i ← 1 to M do                                ▷ Loop over terms
    I[i].sort()                                    ▷ Sort each term index
  for i ← 1 to N do                                ▷ Loop over documents
    for j ← 1 to length(V[i,·]) do                 ▷ Loop over terms
      t ← V[i, j].term                             ▷ term index
      a ← V[i, j].weight                           ▷ document term weight
      b ← I[t, 1].weight                           ▷ top index term weight
      qitem ← (a·b, t, j, 1)                       ▷ build priority queue item
      Q[i].insert(qitem)                           ▷ insert priority queue item
  while iteration < maxiter do
    for i ← 1 to N do                              ▷ Loop over documents
      ▷ Begin A update
      qitem ← Q[i].head                            ▷ grab the highest-priority item
      t ← qitem[2]                                 ▷ term index
      c ← qitem[4]                                 ▷ index counter
      A[i, I[t, c].docid] += qitem[1]              ▷ accumulate term
      ▷ Begin Q update
      c ← c + 1; qitem[4] ← c                      ▷ increment index counter
      j ← qitem[3]                                 ▷ term position
      qitem[1] ← I[t, c].weight · V[i, j].weight   ▷ update priority
      Q[i].update(qitem)                           ▷ reposition queue item

Figure 4.6: All Pairs Queries algorithm pseudocode.

Figure 4.7: Diagram describing the accumulator and priority queue update. The accumulator A and the priority queue Q associated with each document are updated using the term-vector dataset V and the inverted file I in each iteration of the APQ algorithm. The left half of the figure diagrams the steps that update A: select the head of Q, use its term and counter to select a row and column of I, use the referenced document ID to select an entry of A, and add the priority to that entry. The right half diagrams the steps that update Q: increment the counter, reference the corresponding entries of V and I, multiply their weights, and update the queue item's priority with the product.

The main loop of the APQ algorithm consists of two steps applied to each document. In the first step, we update the document's accumulator list, and in the second, we update its priority queue. Figure 4.7 illustrates these two steps using the example dataset listed in Figure 4.4. The diagram shows the relationship between the data structures in the pseudocode: the term vector database V, the inverted file I, the list of priority queues Q, and the list of accumulator sets A. (Technically, the two-document dataset in the illustration is not MoDisco, but we use this simple example for clarity.)

The update to a document's accumulator list, illustrated in the left box of Figure 4.7, is done in four sub-steps. First, we read the head element from the priority queue. We then use the term and counter members of the queue element to index into the inverted file I and reference the appropriate document-id/term-weight pair. The ID of the document we are currently processing and the ID stored in the referenced pair provide an index into the appropriate entry in the accumulator file A.
We then add the priority value of the queue element to the value stored in the accumulator file entry.

After updating the accumulator for a document, we update the priority value of the head queue element. This update is illustrated in the right box of Figure 4.7. First, we increment the counter field of the queue element. Then, in the next two steps, entries from V and I are referenced using the queue element's indexing fields. The priority value of the queue element is then set to the product of the weight fields from these referenced entries. After the queue item's priority is updated, its position in the queue is updated, possibly shifting its location relative to other queue elements.

Note that, before entering the main loop, we must first construct the priority queue Q[i] of each document, with one queue element per term. The priority values used by the priority queues are the impact-ordering scores. As shown in the right box of Figure 4.7, the impact-ordering score for a queue element is computed by taking the product of the term weight and the corresponding inverted-file entry referenced by the counter of the queue item. At the beginning of the algorithm, the index counters are all 1, so we reference the first entry in the inverted file list for the indexed term.

The algorithm outer loop may terminate according to a variety of possible thresholds. For instance, search engines often simply compute a fixed number of iterations, resulting in a small subset of results for the user to peruse, and allowing the user to compute further iterations only if interested. This case is presented in the pseudocode in Figure 4.6. In practice, we use a termination threshold based on the change in the largest k accumulators.
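The two-step update is compact enough to sketch end to end. The following Python sketch (our own illustration, not the thesis's C implementation) runs a fixed number of APQ iterations over sparse term vectors and returns the accumulators A. It makes two small choices the pseudocode leaves implicit: a document's own postings are skipped, and a queue item is dropped once its inverted-file list is exhausted.

```python
import heapq
from collections import defaultdict

def apq(V, maxiter):
    """All Pairs Queries sketch. V is a list of {term: weight} dicts (unit-length TF-IDF vectors).
    Returns A, a list of dicts mapping other-document ids to accumulated inner-product terms."""
    # Inverted file: term -> [(weight, doc id)], sorted by descending weight.
    I = defaultdict(list)
    for d, vec in enumerate(V):
        for term, w in vec.items():
            I[term].append((w, d))
    for term in I:
        I[term].sort(reverse=True)

    # One priority queue per document; items are (-impact, term, counter).
    Q = []
    for d, vec in enumerate(V):
        heap = [(-(w * I[term][0][0]), term, 0) for term, w in vec.items()]
        heapq.heapify(heap)
        Q.append(heap)

    A = [defaultdict(float) for _ in V]
    for _ in range(maxiter):
        for d, heap in enumerate(Q):
            if not heap:
                continue                         # all postings for this document consumed
            neg_priority, term, c = heapq.heappop(heap)
            other = I[term][c][1]
            if other != d:
                A[d][other] += -neg_priority     # accumulate one term of the inner product
            c += 1                               # advance to the next posting for this term
            if c < len(I[term]):
                new_priority = I[term][c][0] * V[d][term]
                heapq.heappush(heap, (-new_priority, term, c))
    return A

if __name__ == "__main__":
    V = [{"dog": 0.3, "cat": 0.9}, {"dog": 0.8, "cat": 0.5}]   # the Figure 4.4 example, 0-indexed
    print(dict(apq(V, maxiter=4)[0]))   # accumulated inner product of Doc 0 with Doc 1 (0.69)
```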
4.2.3 APQ Results

We compare our APQ algorithm to state-of-the-art representatives from the spatial-partitioning and mapping classes of nearest-neighbor-search algorithms. For spatial partitioning, we select the Vantage-Point Tree (VPT) algorithm3 [135]. For mapping algorithms, we select the Locality-Sensitive Hashing (LSH) algorithm4 [39]. We modified both the VPT and LSH implementations to rapidly store and compute cosine distances of sparse feature vectors.

As a quality measure, we compute the adjusted agreement rate [53] between the true set of nearest neighbors and the nearest neighbors returned by the approximate nearest-neighbor algorithm. Agreement rate measures the average size of the intersection between the two sets. Adjusted agreement rate subtracts the expected number of intersections due to chance. The formula for adjusted agreement rate is
\[
AR_k = \left( \frac{1}{kN} \sum_{i=1}^{N} a_i \right) - \frac{k}{N-1}
\]
where N is the number of points, k is the considered number of neighbors, and a_i is the number of true k-nearest neighbors appearing in the set of the k nearest neighbors in the layout.

For APQ, we sample the AR5 measure across a range of compute times. Direct comparison between APQ and the LSH algorithm is complicated by the presence of two parameters: the number of hash tables, and the number of mapping functions per table. We regularly sample these two parameters at integer locations over a range between 5 and 80 for hash tables, and 5 and 10 for mapping functions, generating a set of nearest neighbors for each sample and recording the compute time and AR5 for each sample. We then compute the efficient frontier of these samples with respect to time and AR5 to produce a comparison curve. Since VPT is a parameter-free algorithm, we simply ran it once, recording the resulting AR5 and compute time.

3 Implementation in C++ provided by van der Maaten.
4 Implementation in C++ as part of the Caltech Large Scale Image Search Toolbox.

Figure 4.8: Speed vs. accuracy chart for the APQ 5-nearest-neighbor search algorithm on the warlogs dataset (horizontal axis: milliseconds, 0 to 7000; vertical axis: AR5, 0 to 1; curves for LSH, APQ, and VPTree).

All result timings in this paper are generated on a 13-inch Macbook Pro running OS X 10.8 with a 2.5 GHz Intel Core i5 processor and 8 GB of memory. Our implementation of APQ and all other implementations in this paper are written in the C programming language.

Figure 4.8 shows the speed-vs-accuracy tradeoff for APQ, LSH, and VPT on the task of selecting the 5 nearest neighbors on the warlogs dataset. Speed is measured in CPU time, and accuracy is measured as the fraction of correctly labelled nearest neighbors in the 5-nearest-neighbor set. The chart indicates that APQ achieves the same search accuracy in less time than the LSH or Vantage Point Tree techniques. We ran the same analysis with other values of k and other MoDisco benchmark datasets; the result was the same relative positioning of the algorithms.

4.3 Distance Matrix Computation and Storage

Distance matrices store the distances between points in a dataset. In addition to being interesting objects of theoretical study, they often serve as input to practical data analysis algorithms like clustering [1] and dimensionality reduction [118]. Computing and scanning the entire distance matrix is costly due to the O(N^2) unique entries stored in the matrix. In this section we discuss how the MoDisco property implies that we do not need to compute and store this entire matrix, leading to significant savings in cost. We then demonstrate how to use approximate distance matrices in practice, introducing an algorithm that uses our distance matrix approximation to produce a hierarchical clustering of the input data in less time than a full inverted-file traversal.

4.3.1 Implication: Truncated Distance Matrices

We can greatly reduce the storage required for a distance matrix of MoDisco data without suffering excessive approximation error. We accomplish such an approximation by storing distances between term vectors that share high-impact terms and assuming the remaining distances are 1. This implication is a direct consequence of the distribution of weights among term vectors in MoDisco data, where we assume that a vector x_i possesses a small number X_i(t) of high-impact weights, and that the number of vectors Y_j(t) that also possess a high-impact weight for these same dimensions is also small.

An efficient strategy for storing and retrieving these distances is to maintain a hash table for each data point, where distance matrix values are keyed by point index. Using such a strategy, any particular distance d_ij can be computed in O(1) by checking whether the key j exists in the ith table. If it does exist, the stored distance value is returned; if the key does not exist, it is safe to return 1. We call this efficient distance matrix data structure a truncated distance matrix.

Finally, the APQ2DMAT method translates the set of accumulator lists A returned by APQ into a truncated distance matrix. First, we initialize the distance matrix by constructing an empty hash set for each document.
Then, we iterate over all accumulator entries in each accumulator list, adding 1 minus the value of the jth entry of the ith list to the jth entry of the ith hash set in the distance matrix. To symmetrize the distance matrix, we also insert the same value into the ith entry of the jth list.

4.3.2 Algorithm: Fast Cluster Tree Calculation with DM2CT

Cluster dendrograms are diagrams that display hierarchical pairwise relationships between clusters in the data. They are visual representations of the cluster tree: a tree whose nodes represent successively higher-density regions of the data, and which is computed using hierarchical clustering algorithms. Single-link clustering is a classical hierarchical technique that is isomorphic to the Minimum Spanning Tree (MST) of the complete graph induced by the distance matrix, where edge weights between points are determined by the distances between the same points.

We now present the DM2CT algorithm for computing a single-link cluster tree from a truncated distance matrix in O(N log N) steps. The first step of DM2CT is to convert the dataset into a graph. To do this we first construct a set of graph vertices, one for each document. Then we map the set of distances stored in the matrix into graph edges, where the distance d_ij is converted into an undirected edge between vertices i and j with weight d_ij. Finally, we perform Kruskal's algorithm in O(E log E) steps to generate the MST of this graph [68]. The order in which edges are added to the MST is the same order in which clusters are joined together in the hierarchy. DM2CT is similar to the Voorhees algorithm for computing exact cluster trees from inverted files [119], which uses Prim's MST algorithm with a full traversal of the inverted file. By using APQ with impact ordering and APQ2DMAT to generate our distances, we avoid full traversal while still maintaining a good approximation of the distance matrix.
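A minimal sketch of this construction, under the assumption that the truncated distance matrix is represented as one dict per document (missing keys mean a distance of 1): convert the stored entries to edges, then run Kruskal's algorithm with a union-find structure; the order of the merges is the single-link cluster hierarchy. The function names are ours, and components that never merge are only connected at the assumed distance of 1.

```python
def dm2ct(truncated_dm):
    """truncated_dm: list of dicts, truncated_dm[i][j] = distance between i and j
    (distances not stored are assumed to be 1). Returns the single-link merge
    sequence as (distance, root_a, root_b) tuples, in the order clusters are joined."""
    n = len(truncated_dm)
    edges = sorted(
        (d, i, j)
        for i, row in enumerate(truncated_dm)
        for j, d in row.items() if i < j        # symmetric entries stored twice; keep one
    )
    parent = list(range(n))                     # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]       # path halving
            x = parent[x]
        return x

    merges = []
    for d, i, j in edges:                       # Kruskal's algorithm
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
            merges.append((d, ri, rj))          # same order as single-link cluster joins
    return merges

if __name__ == "__main__":
    dm = [{1: 0.2, 2: 0.7}, {0: 0.2}, {0: 0.7}, {}]   # document 3 shares no high-impact terms
    print(dm2ct(dm))   # [(0.2, 0, 1), (0.7, 0, 2)]; document 3 joins only at distance 1
```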
4.3.3 Cluster Tree Results

We compare the speed of DM2CT with the method suggested by Voorhees [119], which performs a full traversal of the inverted file structure. We consider the clustering created by this exact method as the ground truth, and compute a Fowlkes-Mallows index score [35] between each clustering pair to check the quality of our approximation in terms of fidelity to this ground truth. Fowlkes-Mallows scores measure how closely two clusterings correspond for a set number of clusters C. A score close to zero is poor, indicating random associations between the two clusterings, while a score close to 1 implies identical cluster assignments. We denote the Fowlkes-Mallows index for the clusters produced by cutting the tree at the 500th split as FM500, and at the 1000th split as FM1000.

Our results in Table 4.2 show an order of magnitude improvement in speed using the approximate truncated distance matrix approach over using the exact inverted-file method. The quality is roughly equivalent; we observe high Fowlkes-Mallows scores near 1.0 between the two clusterings, whereas a random clustering would yield a score of 0.0.

Distance computation:
Benchmark      N     M     DM2CT time (ms)   Voorhees time (ms)   speedup   FM500   FM1000
metacombine    28K   28K   2597              14323                6         0.94    0.96
cables         7K    66K   2155              114300               53        0.96    0.92
warlogs        3K    4K    531               4608                 9         0.92    0.7
conspiracy     1K    33K   715               5839                 8         0.74    N/A

Cluster tree computation:
Benchmark      N     M     DM2CT time (ms)   Voorhees time (ms)   speedup   FM500   FM1000
metacombine    28K   28K   532               175036               329       0.94    0.96
cables         7K    66K   67                104308               1557      0.96    0.92
warlogs        3K    4K    34                5745                 169       0.92    0.7
conspiracy     1K    33K   7                 437                  62        0.74    N/A

Table 4.2: Comparison of hierarchical clustering timing and accuracy using the truncated matrix with our approximate DM2CT algorithm vs. using the inverted file with the exact Voorhees method. We divide the runtimes of the two methods into the time it takes to compute the distances and the time it takes to compute the cluster tree. The fourth and fifth columns give timings in milliseconds. The sixth column gives the speedup achieved using DM2CT over the Voorhees method. The last two columns, FM500 and FM1000, denote the Fowlkes-Mallows indices for the clusters at the 500th and 1000th split in the tree respectively; an index score close to 1.0 indicates high fidelity between the two clusterings. The DM2CT method achieves orders of magnitude speed improvement while maintaining high fidelity with the exact ground-truth clustering.

4.4 Dimensionality Reduction

In this section we first summarize the extensive related work regarding dimensionality reduction. We then discuss the implications of the MoDisco property for performing dimensionality reduction on term-vector data. Finally, we present an algorithm for scalable dimensionality reduction of term-vector data and measure its layout quality relative to competing dimensionality reduction approaches.

4.4.1 Implication: Dimensionality Reduction through Local Attraction and Global Repulsion

The unique geometry of MoDisco data presents several challenges to DR algorithms. The MoDisco property implies that each point mostly resides in a low-dimensional subspace that is orthogonal to most of the subspaces in which the other data points lie. We would naturally like the local region around each point in a low-dimensional layout to reflect the nearer points with intersecting subspaces. This requirement implies a necessary local attraction between a point and its nearest neighbors in term space. We now consider the distant points with orthogonal subspaces. While the cosine distance between two orthogonal points is 1, the semantic meaning of two documents being orthogonal is that they are unrelated. Therefore, we argue that it makes more sense for orthogonal points to simply repel each other than to attempt to fit them precisely to a unit distance apart. Since most of the points in a MoDisco dataset are unrelated, we need a DR method that can apply a global repulsion between all points.

In the remainder of this section, we discuss how the MoDisco property and the need for local attraction and global repulsion negatively affect the first three families discussed in Section 2.1, but can be accommodated by the probability-based family.

Candidate Property: Orthogonal Projections

Orthogonal projections are an ineffective solution for reducing the dimensions of MoDisco data because the majority of the orthogonal variance in MoDisco data is captured by the disconnected relationships in the data. Assuming the data is clustered into G clusters, the average distance between these clusters will be 1.0, creating a G-dimensional simplex. The first G principal components will primarily express these unit-length, inter-cluster distance relationships, often without resolving any of the intra-cluster distance relationships that may reside in a subset of orthogonal dimensions.
Because G is often larger than 2 or 3, PCA by itself is an ineffectivetool for producing low-dimensional visualizations of MoDisco data.Candidate Property: Global DistancesThere is surprising variability to the number of MDS techniques [36], but our ar-guments here hold for any particular MDS algorithm. The problem that MDSmethods have with MoDisco data lies with optimizing the layout to fit the prepon-derance of unit length distances. MDS methods optimize a function called stress,which sums the residuals between the high and low dimensional distance matrices.Buja and Swayne noted the problem of indifferentiation [16] when using MDSfor datasets where distances are clustered around a positive constant, with the ex-treme case being a perfect simplex: the high-dimensional analog of a tetrahedron,where every object is equally distant from every other object. In this case, applyingMDS yields low-dimensional layouts with a characteristic shape approximating auniform ball: in 2D, it is a circular disk with a sharp edge and a low density in thecenter. They caution data analysts to avoid misinterpreting this artifact as mean-70ingful dataset structure. Figure 4.3 shows the similarity of MoDisco data to theirsimplex example; both the simplex and the MoDisco distances are largely skewedtoward a single value. To our knowledge, their observation is the only close con-nection between the concept of MoDisco data and a Visual Analytics use case;unfortunately their thoughtful analysis has not percolated into the visualization lit-erature.Continuing with this line of analysis, we argue that MDS algorithms spendmost of their time on nearly useless computations when applied to MoDisco data.Because no simplex can fit without distortion into a low-dimensional projection,the unit-length distances make up the majority of the error of the objective function.Unfortunately, these disconnected unit-length distances dominate the computationin a way that is inversely related to their importance; the important local distancesto their nearest neighbors are not given enough weight.One strategy to improve MDS maps is to modifying the objective functionusing polynomial re-weighting of the terms inversely proportional to distance, asin Sammon mapping [97]. However, such schemes do not go far enough; they stillretain the need to fit all the distances of unit length in the objective function. Ourresults in Section 4.4.3 show the negative impact that fitting the global distanceshas on layout quality with MoDisco data.Candidate Property: Manifold DistancesAfter observing the deleterious effects of fitting global distances, the notion ofstitching together the local term vectors into a useful layout may seem attrac-tive. However, in practice, manifold techniques struggle with building an effectivemanifold model over MoDisco data. Their connectivity assumptions often do nothold because the sampling of the different subspaces is very sparse, often lead-ing to significant distortions. This phenomenon is well documented by van derMaaten [118], who reports that the performance of manifold methods are consis-tently thwarted by noisy, real-world examples.71Candidate Property: ProbabilitiesProbability-based methods like SNE [50] and t-SNE [117] can accommodate bothlocal attraction and global repulsion. To achieve local attraction, we fit a Gaussianprobability distribution only over the nearest-neighbor set of each input point. 
Toachieve global repulsion, we can simply assign a zero probability to points thatwe consider to be disconnected from each other. Unlike the previously describedDR techniques, probability methods permit significant flexibility in the relativeplacement and handling of zero-probability points. Using a spring-force metaphor,zero-probability points are attached to a point with an infinite-length spring thatrapidly diminishes in spring force as a function of distance. In contrast, distance-based methods such as MDS always assign a fixed-length spring between pointsthat will invariably contract to some finite length, producing an attractive forcebetween two points that have no meaningful relationship. The crucial property ofthis family of methods is the ability to assign zero weight to points in a way that isconsistent with the mathematical framework; developing different approaches thatdo not use probabilities to do so would be interesting future work.In Section 4.4.2 we detail the results of a modified implementation of the BH-SNE algorithm [116]. We choose the BH-SNE algorithm over other probability-based DR methods due to empirical evidence of good cluster separation, as well asits proven O(N logN) iteration cost. Because the NE algorithm [134] is virtuallyidentical to BH-SNE, the results below remain the same if the NE algorithm issubstituted for BH-SNE.4.4.2 Algorithm: MD-SNEOur discussion of Implication 3 notes that the t-SNE algorithm minimizes the di-vergence between the probabilities of the layout and the data [117]. We now de-scribe a scalable, efficient way to construct high-quality probability-based layoutof document datasets. We first discuss the implementation details of the BH-SNEalgorithm and its shortcomings, and then describe our own modification to the al-gorithm to adapt it for MoDisco data.72BH-SNEBH-SNE [116] is a modification of the t-SNE algorithm [117], which is itself amodification of the Stochastic Neighborhood Embedding, or SNE, algorithm [50].SNE fits Gaussian probability distributions for each point over the other points inthe dataset with variances determined by a single user-selected parameter calledperplexity. It then positions points in low-dimensional space in such a way thatminimizes the Kullback-Leibler divergence (a measure of the difference betweenprobability distributions) between the high and low dimensional Gaussian distribu-tions. The t-SNE algorithm improves the cluster separation of the low-dimensionalpoints by using a Student?s t distribution for the low-dimensional probability model.The mismatch between the tails in the Gaussian and Student?s t distributions effec-tively adds more space between dense regions in the layout, reducing the visualcrowding problem. Both SNE and tSNE perform dense operations on the distancematrix and therefore do not scale well for datasets beyond roughly 10,000 points.BH-SNE is a technique for approximating t-SNE efficiently on larger datasets.First, BH-SNE approximates the probability distribution for each point by fittingthe high-dimensional probability distribution to only the k-nearest neighbors ofany given point, and assigning zero probability to the remaining points. In thepublished method, nearest neighbor search is performed by first reducing the di-mensionality with PCA and then using Vantage Point Trees [135] in this reducedspace.t-SNE optimizes its objective function using a momentum-weighted gradientdescent, requiring O(N2) dense-matrix computations. 
For BH-SNE to perform thissame optimization efficiently, it re-expresses the gradient of its objective cost func-tion in two parts: an attractive part and a repulsive part. The attractive part of thegradient is calculated in O(N) time using the set of nearest neighbors, valid sincethe zero-probability points exert no attraction. The repulsive part of the gradient,which involves all pairs of points, is computed in O(N logN) time using a Barnes-Hut quadtree approximation. The Barnes-Hut quadtree discretizes space in such away that the algorithm can approximate repulsive forces with an accuracy depen-dent on the distance [5]. This approximation is valid because, in the case of t-SNE,the magnitude of the repulsive force often decays as a function of distance.73MD-SNEApplying BH-SNE directly to MoDisco data is straightforward. The primary short-coming of the BH-SNE procedure when applied to MoDisco data is its ineffectivenearest-neighbor strategy: using PCA to reduce to 50 dimensions followed by Van-tage Point Tree (VPT) search. While this method succeeds for small MoDiscodatasets, our results in Section 4.4.3 show that as the size of the term spaces in-crease, the PCA-VPT method breaks down. Instead, we can use our APQ algo-rithm to generate a truncated distance matrix of k distances per row, where k is thenumber of nearest neighbors requested by BH-SNE. Then, we simply substitutethe indices and distances of the truncated distance matrix in place of the nearestneighbors set computed using their strategy. The BH-SNE algorithm then pro-ceeds normally. Unlike the PCA-VPT method, the quality of the APQ method isindependent of the size of the input on MoDisco data.4.4.3 MD-SNE ResultsWe compare with representative algorithms from different classes of dimensional-ity reduction. For linear projection methods, we select Principal Component Anal-ysis, with eigenvectors computed using a fast SVD algorithm [47]. For distance-based methods, we select the hierarchical, stochastic Glimmer MDS algorithm [57].Finally, we compare MD-SNE to results produced by BH-SNE.The speeds between the different DR methods are commensurate. The MD-SNE algorithm produced a layout of our largest distance matrix, metacombine,in approximately 5 minutes. Because of its O(N logN) complexity and fixed-iteration optimization, it is not unreasonable to expect MD-SNE to handle million-scale datasets with hour-length processing times.We claim that probability maps are better suited to produce low-dimensional vi-sualizations of MoDisco data over competing techniques, and that our MD-SNE al-gorithm is better suited to MoDisco data than standard BH-SNE. Our argument forusing probability-based methods over other dimensionality reduction techniquesare based on quality, not speed. We validate this claim of quality improvementboth quantitatively and qualitatively.74Neighborhood AgreementWe first present a quantitative validation, where we compare a single quality mea-sure across different layouts of the same dataset. Many quantitative evaluations ofDR algorithms use the final Normalized Stress measure [57, 63, 83] when compar-ing against other algorithms. The Stress measure, however, incorporates all pairs ofdistances in its calculation, while we are specifically attempting to remove discon-nected distances from the equation. Furthermore, metric-oriented objectives likeStress are concerned with measuring precise distances, while Probability-basedtechniques subject these distances to a nonlinear transformation. 
We thus compare the rank-ordering discrepancy between points using different techniques, rather than the precise distance discrepancy.

As with the nearest-neighbor search discussed in Section 4.2.3, we select adjusted agreement rate as a suitable, rank-oriented measure, given that the target task is local exploration. Here, we measure the agreement between the local neighborhood of each point in high-dimensional space and in the layout, for different neighborhood sizes. Instead of measuring ARk for a single k, we vary k from 1 to 100 for each layout method, on each dataset.

The results are displayed in Figure 4.9. Both PCA and MDS report lower agreement rates on each dataset, while MD-SNE and BH-SNE show similar behavior on all benchmarks except for metacombine. For this dataset, BH-SNE shows worse agreement rates for larger neighborhoods. We hypothesize that this result is due to the metacombine dataset having a larger feature space than the other benchmarks; it is an order of magnitude larger than the others. The BH-SNE nearest-neighbor search technique of reducing to 50 dimensions with PCA before using Vantage Point Trees is more likely to fail to adequately separate points in such a large feature space. In contrast, MD-SNE exhibits the same pattern of neighborhood-agreement behavior as on the other MoDisco datasets.

Figure 4.9: Adjusted agreement rates ARk among dimensionality reduction techniques (MD-SNE, BH-SNE, PCA, Glimmer) on our benchmark datasets (conspiracy, N=1K; warlogs, N=3K; cables, N=7K; metacombine, N=28K) over a range of neighborhood sizes k. Higher numbers indicate higher agreement between the layout neighborhood and the true neighborhood. MD-SNE and BH-SNE exhibit roughly equivalent performance on the smaller MoDisco datasets, but BH-SNE breaks down on the benchmark with the larger feature space while MD-SNE maintains its performance.

The values for agreement rate in Figure 4.9 are much lower than those in Figure 4.8 on the same dataset. We suspect this discrepancy arises from the challenge of combining the different point neighborhoods into a single planar region. Despite the difference in magnitude, the ordering of the different neighborhood-agreement curves corresponds to the ordering in visual quality between the different techniques.
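For reference, the adjusted agreement rate used in these plots can be computed directly from the two neighbor sets of each point. The following Python sketch (our own, with a naive brute-force neighbor search that is adequate only for small examples) applies the AR_k formula from Section 4.2.3 to a high-dimensional point set and its 2-D layout.

```python
def k_nearest(points, i, k, dist):
    """Indices of the k nearest neighbors of point i under the given distance function."""
    others = [j for j in range(len(points)) if j != i]
    return set(sorted(others, key=lambda j: dist(points[i], points[j]))[:k])

def adjusted_agreement_rate(high_pts, layout_pts, k, high_dist, layout_dist):
    """AR_k = (1/(kN)) * sum_i a_i - k/(N-1), where a_i counts true k-nearest
    neighbors of point i that also appear among its k nearest layout neighbors."""
    n = len(high_pts)
    total = 0
    for i in range(n):
        true_nn = k_nearest(high_pts, i, k, high_dist)
        layout_nn = k_nearest(layout_pts, i, k, layout_dist)
        total += len(true_nn & layout_nn)            # a_i
    return total / (k * n) - k / (n - 1)

if __name__ == "__main__":
    euclid = lambda p, q: sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    high = [(0, 0, 0), (0, 0, 1), (5, 5, 5), (5, 5, 6)]
    layout = [(0, 0), (0, 1), (9, 9), (9, 10)]       # a layout that preserves both close pairs
    print(adjusted_agreement_rate(high, layout, k=1, high_dist=euclid, layout_dist=euclid))
```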
Visual Quality

We now present a qualitative validation based on a discussion of the result images. Figure 4.10 presents layouts of the four MoDisco datasets for each of the four dimensionality reduction techniques. The PCA layouts, as expected, have a difficult time separating points using linear projections, and clusters tend to be expressed in filament-like structures projected on top of each other. The Glimmer MDS algorithm applied to MoDisco data produces noisy, cloud-like visualizations with large areas of uniform density, and fails to produce much visual separation between dense regions.

On the smaller first three datasets, BH-SNE and MD-SNE both produce acceptable layouts, and we see a tradeoff between cluster separation and local fidelity. On the cables and warlogs datasets, BH-SNE produces excellent cluster separation, but within the local cluster regions MD-SNE layouts show better neighborhood agreement. However, with the larger metacombine result, there is a clear quality improvement for MD-SNE over BH-SNE: the BH-SNE nearest-neighbor strategy fails to achieve the same cluster cohesion as the MD-SNE approach.

Figure 4.10: 2D layouts of the four MoDisco benchmark datasets (columns: conspiracy, warlogs, cables, metacombine) across the four tested dimensionality reduction algorithms (rows: PCA, Glimmer, BH-SNE, MD-SNE). PCA yields too much overplotting and Glimmer fails to provide clear cluster separation. BH-SNE and MD-SNE both produce acceptable layouts for the first three smaller datasets, but BH-SNE breaks down with the large metacombine dataset.

Chapter 5

Overview Prototype

Chapter 3 presented a software system designed with the goal of providing guidance to non-expert users, and Chapter 4 presented a set of algorithm guidelines with the goal of efficiently processing text data for visual analysis. In this chapter, we combine these two goals to present a software system, the Overview Prototype, designed to provide guidance to journalists who are analyzing large text datasets. In the course of designing this tool, we worked in close collaboration with a journalist from the Associated Press who was specifically interested in helping other journalists to understand a large set of documents of mostly unknown content, when indices, summaries, and other knowledge organization aids are not available.

We define the underlying task of our target users as document set exploration: the computer-assisted human construction of a categorization that is instance-specific, that is, tailored to a specific use of a specific document set. We assume that full-text search of the dataset is available to users, but not completely helpful, because they do not know precisely what they are looking for. Our target users are not strictly confined to journalism, and may also arise in fields such as business, law, and intelligence.

To process and categorize our document set, we must encode documents in some way that preserves semantics. The vector space representation of documents described in Chapter 4 yields very high-dimensional spaces with thousands or tens of thousands of dimensions, corresponding to the natural language vocabulary of the document set. Many automatic clustering algorithms that carry out computations in these spaces have been proposed [10] as ways to categorize large document collections. The output of these algorithms is a list of the documents in each cluster, which does not necessarily provide a sense of the overall semantic structure of the document set. Nor is it obvious whether any particular clustering, out of the combinatorially huge number of partitions of objects into disjoint sets, captures the application-specific semantics of interest. For this reason, many sensemaking systems attempt to directly visualize the high-dimensional cluster structure of the documents through low-dimensional layouts created with dimensionality reduction (DR) techniques [127] [23]. Our first attempt at such a system involved a single interactive layout of a document dataset using a modification of the Glimmer algorithm.

However, we found that users of DR for document set visualization, including our collaborator, have the persistent unease that there is often but not always structure in their datasets that is not revealed; that is, that they see false negatives in many cases but true negatives in others. In the case of document sets, structure most often means clusterings of related documents.
But due to the extremelyhigh-dimensional nature of the underlying data, the dimensionally reduced plotcan become crowded, with local densities of documents resulting from chance (afalse positive). In a similar way, true densities of related documents may be ?in-terrupted? by spurious, unrelated documents (a false negative). The tool developedin this chapter, the Overview prototype, is designed to mitigate these two relatedproblems by augmenting and complementing a dimensionally reduced view of thedata with additional information in the form of a hierarchical decomposition of thedata and an infrastructure for manual annotation.The remainder of this chapter is organized as follows: we first describe theOverview prototype application that combines a hierarchical clustering dendro-gram and an MDS view to support text analysts in exploring and annotating docu-ment collections through tagging clusters. We then justify our design by discussinghow the different prototype components work to support our target analysis task.Next, we show the prototype at work in two separate data analysis cases; the firstcase demonstrates a hypothetical use of the tool, and the second case describes theresults of actually giving the tool to a real-world data analyst. Finally, we describethe results of the tool?s public deployment to users outside of the experimental79setting.5.1 Overview Protoype DescriptionThe Overview prototype, shown with labeled components in Figure 5.1, is a proof-of-concept application to address computer-assisted human classification throughtagging items. It is based on interactively combining clustering, tagging, and di-mensionality reduction. In this section we describe the different data views andhow to interact with the prototype.Disconnected Component Tree Tags View Items PlotItem ViewerActive Set ListFigure 5.1: The components of the Overview prototype. The Overview pro-totype consists of five main components, indicated in this figure withblue labels. The Disconnected Component Tree displays aninteractive dendrogram, the Tags View displays and manages user-created tags, the Items Plot displays a scatterplot of the data, theActive Set List displays the tags and clusters contained in thecurrent selection, and the Item Viewer presents the text of the high-lighted item in the selection.In the top half of the application are three different views of the data. Theleft cluster view features an interactive representation of the hierarchical clustering80dendrogram (hereafter called the Disconnected Component Tree, or Dis-coTree), and the right DR view, called the Items Plot, shows a 2D scatterplotof a low-dimensional embedding created with the Glimmer MDS technique [57].The Tags View in the middle allows tags to be created or highlighted. The viewsare all linked to support cross-filtering [124].Below these views is the Active Set List. The active set is the set ofdocuments highlighted by the user and their associated nodes. The active set isconstructed through a selection interaction, such as clicking a node or lassoing aset of items, within a particular view. The Active Set List shows DiscoTreenodes on the left and items on the right, where the labels are the top terms withina document, or of all documents within the cluster. 
At the very bottom is ItemViewer, which displays the underlying text of a specific item selected in the rightitems list of the Active Set List.A user interacts with the prototype by selecting nodes in the DiscoTree, byhighlighting regions of the Items Plot, or by clicking a Tag in the Tags View.This brings all items and nodes that match the selection criteria into the ActiveSet List. The user can then peruse the items that make up the selection in theItem Viewer. Finally, having collected related items into an active set, the usercan apply a tag to those items in the Tags View. Section 5.3 provides more de-tails on how to use the different components of the prototype by walking through ananalysis task on a real-world dataset. As a supplement, our collaborator producedan online tutorial video explaining the different components of the software1.5.2 Clustering, Tagging, and DR for SensemakingWe now justify the design of the Overview prototype by articulating the docu-ment set exploration task more precisely as supporting text analysts in buildingan application-specific hierarchical categorization schema through tagging, start-ing from the scaffolding of a schema automatically created through hierarchicalclustering. Dimensionality reduction fits into this workflow both to enable directvisualization of document set structure, where possible, and as a way to evaluate1http://overview.ap.org/blog/2012/03/video-document-mining-with-the-overview-prototype/ Lastaccessed July 17th, 201381the relationship between the distance metric used in algorithmic clustering and thesemantic content of the dataset.5.2.1 Why Clustering?Clusters of points in high-dimensional data sets are interesting because they oftenhave meaning; that is, cluster structure frequently represents semantics of interestto the user. This statement posits a strong connection between a mathematicalproperty and an abstract, high-level notion of human knowledge.The idea of representing document collections as point sets in high-dimensionalspace began with the work of Luhn in the 1950s [74] and was developed into thevector space model by Salton et al. [96], originally designed for information re-trieval tasks. Section 4.1 contains a thorough description of the vector space model.Information retrieval and dimensionality reduction applications rely on a simi-larity or distance function, defined over every pair of documents, which gives riseto a metric space and associated topology. Spatially compact clusters of documentvectors in cosine distance space were recognized by early information retrieval re-searchers as semantically interesting structures, giving rise to the cluster hypothe-sis [61], a modern version of which is articulated as ?documents in the same clusterbehave similarly with respect to relevance to information needs? [75]. The clusterhypothesis is widely assumed and has been shown to hold in the case of web-scaleinformation retrieval [26].Sensemaking differs from information retrieval in that the user does not knowbeforehand what type of information is sought [94]. However, because the docu-ments within a cluster are conceptually similar, representing a document corpus byits clusters may be a useful form of information reduction. The intuition is that ifthe text analyst has read a few documents in a cluster, they can assume that the restwill contain a similar type of information.5.2.2 Why Tagging?A cluster is, in the end, just a set of documents. 
While there is evidence thatmachine-extracted clusters capture interesting semantics, that does not help theuser to understand what any given cluster means, much less a tree which may82include hundreds of clusters and sub-clusters. Cluster labeling is the crucial nextstep in sensemaking.There have been many more or less sophisticated attempts at automatic clusterlabeling, ranging from displaying the most frequently occurring words to attempt-ing to extract a single key sentence from a text corpus [136]. A related prob-lem is the naming of topics extracted by topic modeling algorithms such as LatentDirichelet Analysis (LDA) [11]. Each topic found by such an algorithm is a proba-bility distribution over words, sometimes visualized with a word cloud as shown bythe TextFlow tool of Cui et al. [27]. Yet a word cloud is not a substitute for a goodtopic name, as researchers working with LDA-based methods tacitly acknowledgewhen they compose short, human-generated labels to refer to the semantics of ex-tracted word distributions.A deeper problem is the suitability of the classification schema implied by thedistance metric. In what sense are two documents really ?similar?? In practicesimilarity depends on the context of the analysis. For example, should the docu-ments be grouped by location, specific event, type of incident, or actors involved?There is no reason to assume that the particular encoding and distance metric usedto generate clusters necessarily partitions the documents in the most semanticallyuseful way, especially given that, for the sensemaking task, the user may not knowbeforehand what ways are going to be interesting.Grimmer and King approach this problem by visualizing the space of all pos-sible clusterings, populated by executing a variety of clustering algorithms on thesame data [43]. They are able to directly explore some of the different semanticallyinteresting categorizations on the same set of documents.In principle, the sensemaking loop should include adjustments to the vector en-coding and distance metrics so as to explore different categorization schemas, butvery little is known about how to automatically make these adjustments. Instead,we consider the automatically-generated clusters as starting points for human clas-sification. We argue for a tagging system that allows the user to summarize andannotate the content of a cluster, by applying a label to some or all of its docu-ments. These tags allow the construction of a manual classification scheme thatfollows or cuts across the cluster structure as desired, and has greater or lesserresolution around particular concepts and subtrees.835.2.3 Why Dimensionality Reduction?Dimensionality reduction methods such as MDS also use distance metrics, just ask-means and other clustering methods do, to map items to a lower dimensionalspace. The promise of MDS and other DR methods is visual confirmation of theexistence and size of clusters. MDS as a visualization technique is often usedto verify a proposed clustering by coloring the points according to the clustersand checking if the spatial proximity relationships in the low dimensional viewmatch up the coloring, or to visually check for cluster structure in unclassified databy noticing if there are any visually separated clusters in the view. 
The visualdisplay of dimensionality reduction results allows the user to cross-check whetherthe relationships between the visible clusters that arise from the distance metricmatch their mental model of semantic content.5.2.4 Why All Three?In this framework, the sensemaking task is the construction of a set of tags whichcapture the concepts of interest ? perhaps newly discovered ? in the documentcollection. The tags, presented in the tool in the Tags View, act as an annotationlayer to get human semantic understanding into the exploration loop. An automati-cally created clustering, which can be navigated and examined in the DiscoTreeView, serves to accelerate the process of constructing meaningful annotations.Finally, dimensionality reduction, as presented in the Items Plot, provides aparallax view of the cluster structures.The Overview prototype then combines these three capabilities as three verydifferent linked views of the document dataset. Using the views, an Overviewuser cycles between examining clusters, writing annotations, and examining localneighborhoods and outliers until the process converges to a satisfactory categoriza-tion of the document dataset.5.3 Overview Prototype ResultsWe show results of using the Overview Prototype with two real-world journalismdatasets from WikiLeaks, Warlogs and Cables. In both, documents were en-coded as a vector using the TF-IDF term weighting scheme [96] applied to all84vocabulary words plus automatically detected common two-word phrases, or bi-grams.5.3.1 Afghan War LogsFigure 5.2: The Overview Prototype loaded with the Warlogs dataset andhaving five nodes of the DiscoTree tagged. The colors reveal thatthe documents they contain fall into local neighborhoods in the MDSItems Plot, but most of this structure cannot be seen from proxim-ity relationships alone because there is no visible separation from theother data points.The Warlogs dataset is the subset of the WikiLeaks Afghan Warlogs datasetfrom July 2009. It contains 3077 points and 4286 dimensions, where a single pointcorresponds to a document from the dataset. These documents are military after-action reports with an extremely terse format and specialized jargon, so they arenot trivial for non-experts to read.Figure 5.2 shows the results. We began with the most compact pruning levelwhere only nodes of 64 or more items are visible in the DiscoTree, revealing sevenmain clusters. Exploring those with the combination of quickly reading labelsin the Active Set List and using the Item Viewer to read the full textof documents led us to quickly confirm five of these as semantically meaningfulcategories and tag them with the following names and colors: found ieds (purple),insurgent engagements (brown), fire missions (pink), missions containing a salturreport (size, activity, location, time, unit, result) following an enemy contact (gold),85and detainee transfers (teal).We can see that these groups do form neighborhoods in the MDS ItemsPlot, but would be difficult to identify without the coloring because the regionsare not visually separated from the rest of the points by regions of low density. Theexception is the well separated detainee transfers.Figure 5.3: The DiscoTree control at different prune levels. Using the prunecontroll at the bottom of the DiscoTree View permits the user tosee the cluster tree at several levels of abstraction. 
Here we show the view of the Cables dataset at prune level 8 (left) and 16 (right).

For clarity, the DiscoTree View is interactively prunable so that only nodes past a particular size are visible, using the logarithmic Show Nodes >= buttons along the bottom. Figure 5.3 shows the DiscoTree View of the Cables dataset at varying prune levels. Each node in the DiscoTree contains many document items, and due to the hierarchical clustering the same item will appear in every node along a path from the singleton leaf node at the bottom of the tree up to the root. When an item is selected through one of the other views, this entire branch is highlighted through an edge color.

The user can use the Tags View to create and assign an arbitrary set of named and colored tags. Clicking on a tag's name selects all items which have been assigned to that tag. While the user interface supports the useful shortcut of tagging all items in a node, in general tags may be arbitrarily distributed across items. Nodes are assigned a tag's color when all items within that node contain that tag. Due to the hierarchical nature of the tree, once a node is colored in, all of its child nodes will also be filled in.

Each item in the MDS Items Plot is a single document, represented by a point. An individual item can have many tags attached to it, and a node has many items and thus many tags as well. We do not attempt to show all tags at once with any sort of glyph, since items and nodes cover small regions of the screen. Instead, the last selection color takes priority over the others.

5.3.2 Caracas Cables

The Cables dataset comes from a different WikiLeaks source, the diplomatic cables. We analyzed the subset consisting of cables sent to or from the US Embassy in Caracas, Venezuela, or containing the word "Caracas". It has 6849 points and 65,777 dimensions.

Figure 5.4: Cable alleging Iranian drone plans. The journalist found a cable alleging Iranian plans to ship unmanned aerial vehicles to Venezuela.

We provided this dataset and the Overview prototype to the Associated Press Caracas bureau chief, who created several dozen tags over a session of a few hours. These tags, listed in Table 5.1, cross-cut the data in different ways including geography, politics, and events; the supplemental video (http://www.cs.ubc.ca/labs/imager/video/2012/modiscotag.mov, last accessed July 17th, 2013) shows the prototype with each of these three tag sets. Figure 5.4 shows a different subset of ten interesting tags. Tags are applied to documents, not nodes, because the automatically generated tree does not necessarily categorize the documents in a way that is meaningful to the user. For example, several different branches contain documents concerning Ecuador in Figure 5.5.

Tag Name                              # Tagged    Tag Name                              # Tagged
Colombia/rebels                       570         RCTV                                  89
Ven Opposition                        424         Student protests                      64
Oil industry                          428         Suriname/Dutch territories            287
Chavez politics                       383         Paraguay                              95
Finance                               382         Human trafficking/child labor         236
extradition                           678         land seizures                         43
vatican                               283         weapons                               79
China                                 64          food supply                           93
Caribbean politics                    923         Russia                                117
Peru                                  1405        Brazil                                180
Portugal                              109         Guyana                                143
Banks                                 59          Jewish community                      108
Elections                             138         extrajudicial killings/human rights   374
Argentina                             72          Afro-Colombians                       94
Cuba                                  66          health care                           39
ALBA countries                        213         education                             39
Nicaragua                             357         nationalizations                      64
Ecuador                               728         Spain                                 86
Airlines                              99          El Salvador                           113
Chile                                 104         Israel                                39
Honduras                              234         Trinidad                              39
Jamaica                               99          water/drought                         44
Colombia politics                     207         Belarus                               39

Table 5.1: List of manually constructed document tags applied to the Cables dataset.
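To make the node-coloring rule described above concrete, here is a small hedged Python sketch, not the prototype's code, that colors a cluster-tree node with a tag only when every document in the node carries that tag; the Node class, tag names, and item sets are illustrative assumptions.

```python
# Sketch: propagate a tag's color to cluster-tree nodes whose items all carry the tag.
# The Node structure and tag assignments are illustrative, not the Overview implementation.
class Node:
    def __init__(self, items, children=()):
        self.items = set(items)        # document ids contained in this node
        self.children = list(children)
        self.color = None

def color_nodes(node, tagged_items, color):
    # A node takes the tag's color only if all of its items are tagged;
    # children of a fully tagged node are then necessarily fully tagged as well.
    if node.items and node.items <= tagged_items:
        node.color = color
    for child in node.children:
        color_nodes(child, tagged_items, color)

# Example: a tiny two-level tree and one tag applied to two documents.
leaf_a = Node({1, 2})
leaf_b = Node({3})
root = Node({1, 2, 3}, [leaf_a, leaf_b])
color_nodes(root, tagged_items={1, 2}, color="purple")
print(leaf_a.color, leaf_b.color, root.color)   # purple None None
```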
Figure 5.6 shows hierarchical cluster structure, where theparent-child relationships between the branch tagged with Finance and its childrenabout banking and oil are also visible as spatial nesting in the MDS view.The journalist found several topics that he had not previously found throughunassisted inspection of the dataset, including arms shipments to Ecuador shownin Figure 5.7. He noted that the application allowed him to quickly spot subject88Figure 5.5: The Cables dataset, with the full set of categories listed in Ta-ble 5.1 from the AP Caracas Bureau Chief.(a) (b) (c) (d)Figure 5.6: Hierarchical structure in the Disconnected ComponentTree and Items Plot. The branch tagged with Finance has childrenconcerning banking and oil. DiscoTree View detail when financetag selected (a) and one child node selected (b); Item View detail forfinance alone (c) and child (d).areas that could be of greater news interest, such as information on Colombianrebels. The tool also helped him find several interesting individual documents, forexample claims that Chavez was giving millions to a particular Jamaican politi-cian?s election campaign shown in Figure 5.8, and the cable alleging Iranian plansto ship unmanned aerial vehicles to Venezuela shown in Figure 5.4.5.4 Overview DeploymentThe Overview prototype was deployed on a website maintained by the AssociatedPress3. The deployment was evangelized by our collaborator, Jonathan Stray, whopromoted the tool through social media and journalism conferences, and built a3http://overview.ap.org Last accessed July 17th, 201389Figure 5.7: Cable with description of alleged arms trafficking. The journalistfound cables with a description of such trafficking in Ecuador.comprehensive tutorial video of the tool on the website4. The prototype was thenconverted to a public, web-based analysis tool5, complete with user accounts forsaving document analyses, and the ability to import data from DocumentCloud, apublic document repository for warehousing primary sources. In its various incar-nations, Overview has been used by working journalists to assist in published newsstories. Here, we briefly present two such use-cases as further validation of ourdesign.The first use-case is that of reporter Jarrel Wade, working for the Tulsa Worldnewspaper [120]. He used Overview to analyze a collection of emails from theTulsa Police Department revealing problems with the in-car computer system pur-chased by the department. In his detailed user experience writeup6, Wade describes4http://overview.ap.org/blog/2012/03/video-document-mining-with-the-overview-prototype/ Lastaccessed July 17th, 20135overviewproject.org Last accessed July 17th, 20136http://overview.ap.org/blog/2012/09/how-i-used-overview-to-report-on-8000-police-department-emails/90Figure 5.8: Cable alleging Venezuelan influence in Jamaican politics. Thejournalist found cables alleging campaign financing of a Jamaican po-litical candidate by Chavez.using the tool as a method not for avoiding reading all the documents, but for or-ganizing his reading of the email conversations into categories. He praises theefficacy of Overview, stating, ?In the end, I?m guessing it would have taken fourreporters splitting up emails into stacks of a few thousand to do the work I did intwo weeks.?The second use-case story we describe occurred during the 2012 US Presiden-tial election campaign. 
Reporter Jack Gillum used Overview in building a story toreveal that vice-presidential candidate Paul Ryan requested and used federal moneyfrom government programs he publicly criticized [38]. Gillum describes his use ofthe tool in a writeup by Jonathan Stray on the Overview website7. In contrast toWade using Overview?s clustering as an organizational tool, Gillum reports usingLast accessed July 17th 20137http://overview.ap.org/blog/2012/11/document-mining-shows-paul-ryan-relying-on-the-the-programs-he-criticizes/Last accessed July 17th, 201391the clustering as a filtering tool to specifically avoid having to read all the doc-uments in his analysis. Gillum knew that large swaths of the documents in thedataset contained no useful information for his story, and so he used the clusteringalgorithm to identify and cull these swaths.These two use-cases provide evidence that our tool design accelerated the targettask of document analysis for non-experts users: working journalists having no ex-perience with high-dimensional analysis. We were encouraged that these analysesled to published articles. Furthermore, we discovered our design is flexible enoughto accommodate different analysis styles of our target users. As future work, weplan to present a detailed account of several different use-cases of the Overviewtool, with the hopes of improving the design of analysis tools for non-expert users.92Chapter 6Glint: An MDS Framework forCostly Distance FunctionsAll MDS algorithms work by minimizing an objective function quantifying thedistortion of the points in the low-dimensional space relative to their original inputconfiguration. Though the different MDS algorithms compute coordinates in awide variety of ways, in each case the computational work can be divided into twoparts: distance calculation, where the inter-point distances are calculated from theinput points, and layout calculation, which reads the computed high-dimensionaldistances and positions the points in the low-dimensional space.The focus of this chapter is Glint, an iterative algorithm framework for au-tomatically minimizing distance calculation in MDS. Structurally, Glint forms anouter loop around a modified MDS algorithm. It starts with an empty distancematrix, densifying the matrix as the outer loop iterates, automatically terminatingwhen the MDS layout is stable. Glint separates the distance calculation portion ofthe MDS algorithm from layout calculations and provides an automated termina-tion procedure.The time cost of individual high-dimensional distance calculations have a pro-found effect on the run time of an MDS algorithm. Even for an efficient metric likethe 10-dimensional Euclidean distance function, the time spent calculating high-dimensional distances occupies almost 80% of the algorithm run time using theGlimmer force-directed MDS algorithm [57]. Many real-world problems where93MDS is used require more costly distance functions than the Euclidean case. Inthese more expensive cases, total distance costs occupy more than 99% of MDSrun time using the same algorithm. Thus, an efficient MDS algorithm should seekto minimize the total work done, minimizing the sum of both the distance andlayout work.Previous work has assumed that individual distance computations are fast tocalculate and thus has not sought to automatically resolve the balance between dis-tance and layout work. 
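One way to see this balance concretely is to instrument the distance function itself. The sketch below is a generic illustration, not tied to any particular MDS implementation: it wraps a costly distance function so that the share of total runtime spent on distance calls can be measured against the layout work.

```python
# Sketch: instrumenting a distance function to measure how much of an MDS run
# is spent on distance calculations versus layout work. Generic illustration only.
import time
import numpy as np

class CountingDistance:
    def __init__(self, dist_fn):
        self.dist_fn = dist_fn
        self.calls = 0
        self.seconds = 0.0

    def __call__(self, a, b):
        start = time.perf_counter()
        value = self.dist_fn(a, b)
        self.seconds += time.perf_counter() - start
        self.calls += 1
        return value

# Example with a deliberately expensive Euclidean distance in many dimensions.
X = np.random.rand(200, 100_000)
dist = CountingDistance(lambda i, j: float(np.linalg.norm(X[i] - X[j])))

total_start = time.perf_counter()
# ... an MDS algorithm would run here, calling dist(i, j) for every needed distance ...
for i, j in zip(range(0, 199), range(1, 200)):   # stand-in for the algorithm's distance requests
    dist(i, j)
total = time.perf_counter() - total_start
print(f"distance share of runtime: {dist.seconds / total:.1%} over {dist.calls} calls")
```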
Current fast MDS algorithms that handle distance matriceseither compute many more distances than necessary [57], or leave the total numberof distances to compute as a tuning parameter and so do not have a fully automaticway to terminate [15, 30]. The Glimmer algorithm is an example of the overcompu-tation shortcoming [57]. It computes an iterative MDS approximation using force-directed heuristics. The Glimmer minimization strategy defines a cheap iterationand then iterates until convergence is detected. Within each iteration, both distancecalculations and layout calculations are done. Glimmer automatically chooses thenumber of distance calculations to make before terminating, but computes morethan are strictly necessary. The Pivot MDS algorithm is an example of the termina-tion shortcoming [15]. It computes a one-step analytic MDS approximation. TheMDS work is cleanly divided between distance calculation up front followed by asingle contiguous layout calculation. Pivot MDS computes all the distances it usesup front, but does not know how many to select.The above examples motivate a synthesis of the benefits of the two algorithms,keeping the automatic termination of algorithms like Glimmer while separating thedistance work from the layout work as in algorithms like Pivot MDS. The goal ofGlint is thus to not only compute far fewer distances than the iterative approxima-tion, but also to remove the tuning parameter from the analytic approximation.To demonstrate the generality and robustness of the Glint approach, we de-vise Glint instantiations for three very different classes of MDS algorithm: force-directed, analytic, and gradient-based. We present the design of the Glint compo-nents for each instantiation, where each is tailored to the requirements of the un-derlying MDS algorithm. We then show that these Glint instantiations drasticallyreduce total run time on datasets with costly distance functions without penalizingthe final layout quality.946.1 Distances In MDSThe distances between the points in a low-dimensional MDS solution are intendedto closely model those in the high-dimensional input dataset. The core premise ofMDS is that the input contains redundant information, allowing for correct outputeven with an incomplete set of distances as input. Glint exploits this redundancyby iteratively constructing a subset of distances that is as small as possible. Thissection describes two issues concerning these distances: the existence and effect ofexpensive distance functions, and how sparse the input distance matrix can be.6.1.1 Expensive Distance FunctionsMinimizing the total number of distances computed is especially important whenthe time spent computing distances dominates the time spent computing the layout.Many real-world applications involve datasets with expensive distance functions.Even the straightforward Euclidean distance metric can be costly if the numberof dimensions is large enough, for example in the millions. In image processing,the Earth Mover?s Distance, or EMD, compares the similarity of color distribu-tions between images and is useful for ranking images for querying and nearest-neighbor-type calculations [93]. Its calculation requires solving a linear program,often a costly operation relative to the layout calculation per point. Computa-tional complexity is not the only reason for distance calculation cost. Distancesbased on database lookups are costly due to the relative speed of disk I/O to mem-ory reads. 
Distances that involve elicitation of human judgement can be the most costly of all, because the time scales of human response are so much longer than of automatic computation. Human-elicited distances are of interest in many domains; in a marketing example, a single distance is derived from the averaged similarity judgements elicited from survey takers comparing two items [71]; in a psychophysics example, distances are derived from just noticeable differences in haptic stimuli [111].

In all of these cases, distance calculations can comprise well over 99.9% of the total time to compute the MDS layout. We will show that using the Glint framework can drastically reduce the time spent computing distances without compromising the final quality of the MDS layout.

6.1.2 Experimental Analysis of Sparse MDS Solutions

Spence and Domoney conducted a series of data experiments to determine if there could be an a priori way to select an optimal subset of distance matrix entries to compute prior to MDS layout [107]. Their experiments investigated the effect of controlling three factors pertaining to layout quality. The first two factors, the amount of noise in the distance measurement and the number of input data points, cannot be manipulated by an MDS algorithm. The last experimental factor they tested, which an algorithm can indeed control in practice, is distance matrix density, or how densely sampled the approximation of the distance matrix is compared to the full version.

The experiments resulted in two key findings that pertain to our work. First, only a fraction of the matrix, ranging from 20% to 60% of the distances on their example data, needed to be computed to accurately approximate the full layout. This finding verifies that the goal of minimizing distance computations is a reasonable one. Second, their results imply that there is no direct way to assess in advance exactly how many distances need to be computed. We thus designed Glint to run online, determining the optimal number of distances to compute on the fly.

6.2 Glint Algorithm Framework

Glint is an algorithm framework: an algorithm with modular components that are themselves algorithms. Figure 6.1 shows a diagram of these three components; each corresponds to a step in the Glint outer loop. Glint starts with an empty distance matrix and a random layout and then loops over the following three main steps to determine a final layout. First, in the Densify Matrix step, it selects a new subset of the distance matrix to compute with the distance matrix densification strategy DS and then updates the matrix with the computed values. In the Lay Out Points step, it updates the layout using the new distance information as input to the MDS layout algorithm M. Finally, in the Check Convergence step, it checks to see if the change in the objective function S is below a threshold ε. If convergence is detected, the last layout is returned, otherwise the loop repeats.

Figure 6.1: Diagram of Glint execution. The diagram shows the three outer-loop stages, Densify Matrix (DS), Lay Out Points (M), and Check Convergence (S), passing D̃_t, layout_t, and S_t forward to D̃_{t+1}, layout_{t+1}, and S_{t+1} on each iteration, with the distance metric d and the threshold ε as inputs.

The MDS layout algorithm M takes as input a low-dimensional point configuration as the starting point and a sparse distance matrix. To qualify for use in Glint, M must possess three characteristics. First, it must be able to compute a layout given a distance matrix. Next, it must be able to handle an incomplete (that is, sparse) distance matrix, given the Glint strategy of gradual densification.
Finally, M must compute a layout from a given starting position rather than starting from scratch each time, so that subsequent outer loop iterations start M from a state closer to the final layout configuration. We discuss M further in Section 6.3.1.

Controlling the density rate and pattern of the distance matrix is the job of the densification strategy DS. Some MDS algorithms, such as Pivot MDS and Landmark MDS, are able to compute layouts with incomplete matrices, but the precise sparsity pattern of the incomplete distance matrix may be constrained. Because matrix sparsity pattern requirements vary from algorithm to algorithm, we must tailor the selection of computed distances DS to the MDS algorithm M. We discuss DS further in Section 6.3.2.

Glint requires a cheap, monotonic objective function S in order to measure layout convergence; it must also be tailored to the MDS algorithm M. It should not invoke a costly full stress function that requires computing all the high- and low-dimensional distances, which would obviate all performance benefits of the system. We discuss S further in Section 6.3.3.

6.2.1 Glint Outer Loop

Algorithm 1: Pseudocode for Glint, with variable definitions.

    function GLINT(ε)
        layout_0 ← RANDOMLAYOUT
        t ← 0
        s_old ← MAXVALUE
        while !converged do
            P_{t+1} ← DS(P_t)
            D̃_{t+1} ← DISTANCE(D̃_t, P_{t+1}, d)
            layout_{t+1} ← M(D̃_{t+1}, layout_t)
            s_new ← S(P_t, D̃_{t+1}, layout_{t+1})
            converged ← |s_old − s_new| / s_old < ε
            s_old ← S(P_{t+1}, D̃_{t+1}, layout_{t+1})
            t ← t + 1
        return layout

    Variable     Description
    t            the current Glint iteration
    ε            termination threshold
    MAXVALUE     maximum possible value for the scalar objective function
    d            the distance metric d(i, j)
    s_old        scalar objective function value on the previous layout
    s_new        scalar objective function value on the current layout
    layout_t     layout coordinates at iteration t
    P_t          the set of computed point pairs at iteration t
    D̃_t          the sparse distance matrix with nonzeros specified by P_t

The Glint algorithm consists of a single threshold-controlled loop, similar to algorithms like gradient descent where the algorithm loops until the change in measured progress becomes very small. Algorithm 1 lists pseudocode for Glint. The algorithm initializes with a random configuration of points layout and then iterates through the main loop. In the main loop, we first call the densification strategy DS. On the first call it constructs the initial sparsity pattern P_t of the distance matrix D̃_t, and on subsequent calls it densifies the pattern by filling in more nonzero entries. Specifically, the sparsity pattern P_t contains the set of nonzero indices of D̃_t at time t. After selecting the precise entries to change, Glint updates the sparse distance matrix D̃_t by invoking the distance function for each pair of points contained in P_t. Next, the MDS algorithm M runs to termination with the starting point layout and the input distance matrix D̃. Glint itself terminates when the change in the objective function S is less than the termination threshold ε.

The objective function S takes three parameters: the sparsity pattern P that specifies the pairs of points over which we compare high-D and low-D distances, the distance matrix D̃ from which we read the distances specified by pairs in P, and the low-dimensional layout coordinates layout from which we compute the low-D distances. The reason for including the pattern P as an input instead of simply summing over the entirety of D̃ is subtle, but important. Glint terminates when the objective function converges; that is, when it stops changing between subsequent iterations.
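A minimal Python transcription of this outer loop is sketched below. It assumes dense NumPy arrays and caller-supplied components ds, mds_step, and sparse_objective, which are placeholders for the per-instantiation choices of DS, M, and S described in Section 6.3; the small first-iteration guard is an implementation convenience, not part of Algorithm 1.

```python
# Sketch of the Glint outer loop (Algorithm 1). The callables ds, mds_step, and
# sparse_objective stand in for the per-instantiation components DS, M, and S.
import numpy as np

def glint(n_points, dist_fn, ds, mds_step, sparse_objective, eps=1e-3, dim=2):
    layout = np.random.rand(n_points, dim)       # random initial configuration
    D = np.zeros((n_points, n_points))           # container for the sparse distance matrix
    pattern = set()                              # P_t: pairs whose distances are known
    s_old = np.finfo(float).max                  # MAXVALUE in Algorithm 1
    while True:
        new_pairs = ds(pattern, n_points)        # Densify Matrix: a set of new (i, j) pairs
        for i, j in new_pairs:
            D[i, j] = D[j, i] = dist_fn(i, j)    # costly distance calls happen only here
        prev_pattern = set(pattern)              # keep P_t for the convergence comparison
        pattern |= new_pairs                     # P_{t+1}
        layout = mds_step(D, pattern, layout)    # Lay Out Points: run M from the current layout
        eval_pattern = prev_pattern if prev_pattern else pattern  # first pass has no old pattern
        s_new = sparse_objective(eval_pattern, D, layout)
        if abs(s_old - s_new) / s_old < eps:     # Check Convergence
            return layout
        s_old = sparse_objective(pattern, D, layout)
```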
Thus, the objective function must compare results at time t+1 to results at time t. However, not only do the points in the layout change between iterations, but the number of terms in the distance function changes, because there are more nonzero entries in D̃_{t+1} than in D̃_t. To properly measure convergence, we need to compare functions with the same number of terms. Including the same sparsity pattern in the objective calculation ensures that we compare objective functions with equivalent terms at each iteration, by specifying which entries of the matrix to use. Thus, in the Algorithm 1 pseudocode, s_new is computed with the sparsity pattern from the previous iteration, P_t, to determine which entries to include in the computation, while using the actual values derived from the current layout at time t+1.

6.3 Glint Instantiations

A Glint instantiation substitutes implementations of three concrete components into the abstract framework of the Glint algorithm. We describe three Glint instantiations, one for each of the three different MDS algorithm families described in Section 2.1.2: force-directed, analytic, and gradient.

Several of the Glint instantiations require choosing input parameters, as discussed in detail below. Table 6.1 summarizes the default value of each parameter and our method for selecting it. It also includes our analysis of the tradeoffs, with the results for setting the parameter too small or too big.

Parameter Name   Instantiation              Default   Selection Method     If Too Small             If Too Big
ε                all                        0.001     benchmark            T: slower, Q: better     T: faster, Q: worse
numIters         gradient-based             100       benchmark            LT: faster, DT: slower   LT: slower, DT: faster
numDists         all                        log N     parameter doubling   T: faster, Q: worse      T: slower, Q: better
numRunsF         force-directed             5         benchmark            LT: faster, Q: worse     LT: slower, Q: better
numRunsA         analytic                   10        benchmark            LT: faster, Q: worse     LT: slower, Q: better
trainSize        force-directed, analytic   3         benchmark            T: faster, Q: worse      T: slower, Q: better

Table 6.1: Parameters used in Glint instantiations, their default values, how they were chosen, and the tradeoffs in setting them too small or too big. T is total time, LT is layout time, DT is distance calculation time, and Q is layout quality.

6.3.1 Component M: MDS Algorithm

The M component takes as input the low-dimensional input coordinates and places them in a new configuration based on the current distance matrix D̃ as output. For the analytic instantiation we substituted the Pivot MDS algorithm [15] for M, and for the gradient implementation we substituted the SMACOF algorithm [29] for M. The Pivot MDS algorithm is used without change, but the other instantiations require algorithm parameter choices or internal modifications which we detail in the following subsections.

Gradient-Based Instantiation

For the gradient-based instantiation, the SMACOF MDS algorithm has two tuning parameters: the inner termination threshold, ε, and the maximum number of inner-loop iterations before termination, numIters. We use the same value for ε as in the main Glint algorithm.

We observed that the gradient of the stress function for very sparse input matrices quickly shrinks in proximity to a minimum. Setting numIters too large results in over-optimizing with incomplete distance information, while setting it too small leads to computing more distances than are necessary. We select 100 as a good balance over all our benchmarks between these two extremes.

Force-Directed Instantiation

In the force-directed instantiation, we substitute a modified version of the Glimmer [56, 57] algorithm for M.
We used the version of Glimmer that supports distance matrix calculations in addition to handling points [56]. To make the Glimmer algorithm suitable as the M component, we must alter the randomized sampling regime used by the algorithm. In Glimmer, sampling is uniform and unconstrained over the entire distance matrix. Glint, however, only feeds a sparse subset of the distance matrix D̃ to M for each outer loop iteration. To compensate, we constrain Glimmer sampling to be uniform over the given nonzeros of the sparse distance matrix D̃.

6.3.2 Component DS: Densification Strategy

The DS component determines which distances to compute at each Glint iteration. For each instantiation, we follow a strategy of adding numDists new distances per point to the matrix D̃. By default, the numDists parameter is initially set to ⌈log₁₀ N⌉.

Setting the numDists parameter to an overly small value would result in an objective function S change that is less than the termination threshold ε and thus an incorrect algorithm termination after the first iteration. A small numDists is analogous to performing gradient descent with too small a gradient step-size. To ensure numDists is large enough, we follow a simple strategy of doubling numDists during the first iteration until we achieve a change in the objective function S greater than ε.

The distribution of new distances across the matrix D̃ varies for each instantiation. We describe these distributions on a per-instantiation basis.

Gradient Instantiation

The gradient instantiation DS is the simplest of the densification strategies. At each iteration, the DS uniformly samples numDists distances per point without replacement.

Force-Directed Instantiation

The force-directed DS is similar to the gradient instantiation, except for a single modification addressing Glimmer point hierarchies. The Glimmer algorithm divides points into a pyramid of levels, with the fewest points contained in the top level and increasingly larger sets of points at lower levels [57]. Sampling uniformly without replacement from the distance matrix would often lead to the case that, at the top level, several points will not have any distances computed between any of the other points in the top level, only distances computed to points in lower levels. To solve this problem, the force-directed DS samples numDists distances without replacement once for the points contained in each level. The sampling for a given level is constrained to be uniform over only the points contained in that level.

Analytic Instantiation

Pivot MDS works by operating on a subset of complete columns of the distance matrix. The uniform sampling of distances per point used by the other instantiations would violate this constraint by allowing zeros within columns. We instead compute numDists new columns of the distance matrix at each iteration. New columns are chosen using the MaxMin strategy described in the Pivot MDS paper [15], starting from a single column chosen uniformly at random.

6.3.3 Component S: Objective Function

Glint objective functions S are fast approximations of the true objective functions F that are far more costly. In each of the Glint instantiations, S fits the following template:

\[ S\langle hi, lo, sel \rangle(P, D, \mathit{layout}) = \frac{\sum_{(i,j) \in sel(P)} \bigl(lo(i,j) - hi(i,j)\bigr)^2}{\sum_{(i,j) \in sel(P)} hi(i,j)^2} \]

Here we use the ⟨·⟩ template notation from the C++ language to indicate parameters to S that do not change at runtime. The template parameters hi(i, j) and lo(i, j) are functions defining the high- and low-dimensional distances between points i and j.
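For concreteness, a small NumPy sketch of the stress-based reading of this template is shown below; it is a simplified placeholder rather than the thesis code, with pattern playing the role of the already-selected pair set sel(P).

```python
# Sketch: sparse, stress-based objective S restricted to a set of point pairs.
# 'pattern' is an iterable of (i, j) index pairs with known high-D distances,
# 'D' holds those distances, and 'layout' holds the low-D coordinates.
import numpy as np

def sparse_stress(pattern, D, layout):
    num, den = 0.0, 0.0
    for i, j in pattern:
        hi = D[i, j]                                 # high-dimensional distance
        lo = np.linalg.norm(layout[i] - layout[j])   # low-dimensional Euclidean distance
        num += (lo - hi) ** 2
        den += hi ** 2
    return num / den if den > 0 else 0.0
```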
The hi function varies from dataset to dataset, while lo is alwaysthe Euclidean distance function operating on the layout input parameter. The seltemplate parameter is an index-selection function that selects a subset from the setof nonzero distance matrix indices P. Intuitively, this function just measures thenormalized sum of distance residuals between the layout points and the data, butonly for a small set of point pairs instead of all pairs of points.Because they are stress-based techniques that minimize distance residuals, theforce-directed and gradient-based instantiations use D?i j for hi(i, j) and the low-dimensional Euclidean distance for lo(i, j). The analytic instantiation is strain-based, minimizing the inner-product residuals. To measure strain, we set hi(i, j)to be the inner product of ith and jth rows of the double-centered matrix C andset lo(i, j) to be the inner product of the ith and jth layout coordinates. The inter-ested reader should refer to the original Pivot MDS paper for more details on thedefinition of double-centering and the efficient computation of C [15].104For the analytic and gradient-based instantiations, the index-selection functionsel selects the entirety of the nonzero matrix indices P. In contrast, the force-directed instantiation selects a subset of P. The precise subset of P is the set ofpoint indices contained in the union of per-point random sample caches used bythe Glimmer algorithm.Each instantiation employs randomized sampling of new distance matrix in-dices after each Glint iteration, as mentioned in Section 6.3.2. In the case of thegradient-based instantiation, this random sampling does not impart enough randomnoise to the observed values of S to induce an unexpected termination. However,in the force-directed and analytic cases, we observed enough noise in the sequenceof S values that early termination was regularly observed. The sequence noise wasobserved to be Gaussian distributed (we confirmed normality with a Shapiro-Wilktest result of p = 0.55 [102]). In this section we describe our strategy for creatinga smooth S from the noisy series of raw objective function values.The obvious first choice for smoothing in Glint would be to convolve the signalwith a noise filter, as is done in the force-directed Glimmer algorithm. However,while that approach works well for Glimmer, it is not adequate for detecting con-vergence reliably within Glint. First, the smoothing filter size used in Glimmer ismuch bigger than the expected number of outer loop iterations. Reducing the filterwindow size would change the frequency response of the filter, allowing noise toleak into the signal. Furthermore, moving averages correspond to an impulse re-sponse filter that is finite. However, the noise in the sparse stress signal is Gaussianwhite noise, with equal power across all frequencies, so it will manifest itself afterfiltering any bandwidth.Rather than attempt to filter our the noise using convolution or averaging, weinstead take a more direct approach and model the observed noise in the sparsestress signal explicitly. 
Stochastic processes where any subset of process samples are normally distributed are known as Gaussian processes and can be accurately modelled by the machinery of Gaussian process regression (GPR) [88].

In order to perform GPR we must select the forms of the two functions that completely determine a Gaussian process, the mean and the covariance function. The mean of the Gaussian process encodes information about the shape of the underlying process, for example whether it is linear or constant. We chose a mean prior of zero, indicating that we have no advance knowledge about the signal. We select the squared exponential function, one of the most commonly chosen covariance functions [88], because it models smooth transitions between adjacent values of S, a behavior that matches our expectations for the convergence curve.

We can improve our smooth estimate of the mean of S by increasing the number of samples computed at each outer loop iteration. In the force-directed case, we compute more samples by restarting M with the same initial layout and a different random seed. Since the analytic case proceeds deterministically, the same technique cannot be used. To compute a set of random samples for the analytic case we were inspired by bootstrap resampling methods [31]. In order to produce a single sample we select numDists columns uniformly at random to leave out of P.

For the parameter designating the number of computed samples per Glint iteration, there is a parameter tradeoff between the fidelity of the estimated mean, which affects the likelihood of observing a false termination, and the speed of the algorithm. We empirically find that computing 5 runs for the force-directed numRunsF parameter and 10 runs for the analytic numRunsA parameter yields good results over all our benchmark datasets.

Using GPR requires initialization of the so-called process hyperparameters of the squared exponential covariance function. These include the length scale, or degree of smoothness, and the noise level. The hyperparameters can be efficiently learned from a small set of observations computed during the first trainSize iterations of the Glint outer loop, by optimizing a likelihood function using conjugate gradients. We empirically find that using 3 iterations for training yields good results over all our benchmark datasets.

6.3.4 Instantiation Design Summary

Table 6.2 summarizes the Glint component design decisions, emphasizing the underlying algorithm features that crosscut the three instantiations. Consideration of these features could guide designers of future instantiations. For example, an algorithm using the entire sparse input distance matrix, like Pivot MDS and SMACOF, can remain unaltered for M. Algorithms with objective functions S that are noisy, such as Pivot MDS and Glimmer, can employ GPR smoothing.

Alg. Class       M                  DS                                     S
force-directed   altered sampling   uniform pointwise for each hierarchy   GPR smoothed, stress-based across sample-cache sets
gradient-based   unchanged          uniform pointwise                      stress-based across P
analytic         unchanged          uniform columnwise                     GPR smoothed, strain-based across P

Table 6.2: Glint component design summary for each MDS algorithm class.
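As an illustration of the GPR smoothing described in Section 6.3.3, the following hedged sketch fits a Gaussian process with a squared-exponential (RBF) kernel plus a white-noise term to a noisy series of objective values. It assumes scikit-learn and synthetic stand-in data; it is not the thesis implementation, which learns the hyperparameters from the first trainSize iterations.

```python
# Sketch: smoothing a noisy series of sparse-objective samples with GPR.
# Assumes scikit-learn; the noisy series here is synthetic stand-in data.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

iters = np.arange(1, 31, dtype=float).reshape(-1, 1)
true_curve = 0.5 * np.exp(-iters.ravel() / 6.0)              # idealized convergence curve
noisy_samples = true_curve + 0.02 * np.random.randn(len(iters))

# The squared-exponential (RBF) covariance models smooth transitions between
# adjacent objective values; WhiteKernel models the Gaussian sampling noise.
kernel = 1.0 * RBF(length_scale=5.0) + WhiteKernel(noise_level=1e-3)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(iters, noisy_samples)

smoothed, std = gpr.predict(iters, return_std=True)   # std gives the posterior uncertainty
# Convergence would then be checked on 'smoothed' rather than on the raw samples.
```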
6.4 Results

We present the results in terms of a benchmark performance comparison and an assessment of convergence. We first describe the benchmark datasets in detail. We compare the efficiency and quality of Glint instantiations against the standard algorithms in terms of time and stress using these benchmarks. We then discuss convergence issues and demonstrate convergence behavior of each instantiation.

6.4.1 Dataset and Distance Function Description

The molecule dataset contains 661 points representing polymer-based nanocomposites. The distance function is cheap: it is the Euclidean distance metric where the number of dimensions m is 10. We include this dataset as a baseline where the Glint requirements are not met and unmodified algorithms should be employed instead. The 4000 points in the concept dataset are biomedical terms where the distance function to determine their co-occurrence in journal articles requires running database queries. The Flickr dataset contains 1925 images culled from the author's public photo collection, with distances computed using the Earth Mover's Distance (EMD) [93]. The BRDF dataset is an example from the computer graphics literature, where computations involving 100 points representing images use the Euclidean distance function. The number of dimensions m is four million [76]; this function is expensive despite being Euclidean because of the huge number of dimensions. The videogame dataset was created by gathering human judgements in response to a survey of questions about 96 games. While the exact timing information for the judgments was not reported [71], our conservative estimate is that the sum of the response times of the human participants took an average of 10 seconds for each pairwise comparison. Table 6.3 summarizes our benchmark distance functions and costs.

d cost (sec)   Distance Calculation         Benchmark
0.00001        Euclidean, m = 10            molecule
0.001          DB Query                     concept
0.01           Earth Mover, 83 signature    flickr
1.0            Euclidean, m = 4M            brdf
10.0           Human Elicited               videogame

Table 6.3: The cost d of a single distance calculation for the benchmark datasets, in seconds rounded to the nearest power of 10. Here m represents the number of dimensions of the input data in the case of using a Euclidean distance function.

6.4.2 Benchmark Speed and Quality Comparison

We validate Glint by comparing the benchmark performance of our implementations against the previous work in terms of speed and quality. Speed is measured in seconds to termination and quality is measured in terms of the full objective function F using the entire distance matrix D. For the force-directed and gradient-based instantiations, F is the full normalized stress function [13]. For the analytic instantiation, F is the full normalized strain function. We compute F only for performance validation; it is never computed in practice. All recorded values are averaged over 5 runs on an Intel Core 2 QX6700 2.66 GHz CPU with 2 GB of memory.

For the original approach in the force-directed and gradient-based performance comparison, we ran the Glimmer and SMACOF algorithms, respectively, with the same ε as used in Glint. For the original approach used in the analytic performance comparison, we know of no algorithms with termination criteria. Instead, we used a human-in-the-loop Pivot MDS setup, where the first author added numDists pivots at a time with a keystroke, and manually halted the process after visually assessing layout convergence.
The Pivot MDS algorithm is unable to handle incomplete distance matrix columns, so we omit the videogame benchmark, which possesses many missing matrix entries, from the analytic results.

Figure 6.2 and Table 6.4 compare the execution time and final layout quality of Glint to the original approaches.

Benchmark        Glint F    Orig. F    Glint Time    Orig. Time    Speedup
Force-Directed
  molecule       0.03       0.03       14            4             0.2
  concept        0.18       0.18       49            1016          20
  flickr         0.08       0.09       2.4K          98K           40
  brdf           0.03       0.04       3K            304K          115
  videogame      0.45       0.45       23K           482K          20
Analytic
  molecule       0.35       0.42       3             23            9
  concept        0.93       0.94       96            63            0.7
  flickr         0.48       0.59       1.2K          2.9K          2
  brdf           0.078      0.233      40K           6K            0.2
Gradient
  molecule       0.01       0.03       360           700           1.9
  concept        0.18       0.18       0.1K          113K          880
  flickr         0.06       0.04       8K            71M           8.8K
  brdf           0.008      0.005      4K            859K          200
  videogame      0.16       0.13       19K           430K          220

Table 6.4: Comparison of full objective functions, time (in seconds), and speedup between Glint instantiations and original MDS algorithms.

The speedup of the force-directed instantiation ranges from 20 to 115 for the costly target cases, while the original Glimmer algorithm is several times faster for the cheap baseline. The main benefit of the fully automatic analytic Glint instantiation is the elimination of the need for manual monitoring and intervention. The Glint instantiation was faster than Pivot MDS with a manual operator in the loop for molecule and flickr, but slower for concept and brdf. The speedup of the gradient-based Glint instantiation is dramatic: several orders of magnitude in the target cases, and a factor of two in the baseline case of molecule where the distance function is cheap.

The quality values for Glint are roughly the same magnitude and variability for each benchmark in the force-directed case. For the analytic instantiation, the quality values are equal or better than the manual Pivot MDS method. In the gradient case, most of the final quality values, except molecule, are slightly worse than the standard approach using the full distance matrix. The gradient Glint instantiation provides a speed and quality compromise between the extremely costly but accurate full gradient approach, and the fast but approximate force-directed Glint instantiation.

6.4.3 Convergence

We illustrate the convergence behavior of each Glint instantiation in Figure 6.3. Each log-scale plot displays two curves: the blue curve represents the value of the full, slow objective function F of the layout after each Glint iteration, while the orange curve shows the value of the smoothed, fast objective S. For those instantiations that employ GPR smoothing, we also plot the random samples used in the regression as gray dots. Similarly, for those instantiations that employ an iterative layout algorithm M, we plot the values of S after each M iteration. As in the benchmark comparison, F is the full normalized stress function for Glimmer and SMACOF, and F is the full strain function for Pivot MDS.

The magnitude of the change in the cheap objective S approximates that of the change in the costly F function. In the case of Pivot MDS, the smoothed S series is slightly offset from the gray random samples due to the effect of using sparsity patterns from the previous iteration. These benchmarks validate the claim that setting ε to a given termination threshold will terminate Glint when the corresponding change in F falls below the threshold, modulo some sampling noise.
Figure 6.2: Comparison of speed (top) and quality (bottom) for the force-directed, analytic, and gradient-based instantiations on the molecule, concept, flickr, brdf, and videogame benchmarks. In each pair, the top blue bar is the original MDS algorithm, and the bottom orange bar is the Glint instantiation. The black lines indicate 95% standard error bars.

Figure 6.3: Log-scale Glint convergence curves on each instantiation generated using the brdf dataset, plotting stress against force-directed iterations, strain against analytic iterations, and stress against gradient-based iterations. The orange S curve is derived from the noisy grey samples. S is designed to match the convergence behavior of the costly F series in blue.

Chapter 7
Conclusion and Future Work

We conclude the thesis by summarizing the finer points of the different thesis chapters. We then describe a set of future research directions that build upon and expand our work. We draw the thesis to a close with a set of lessons learned from our research.

7.1 Conclusions

Difficulties arise when dimensionality reduction is applied to the important use-cases of non-expert users, document data, and costly distance functions. In this thesis we identified the obstacles associated with each of these cases and explored ways to address the underlying problems. For non-expert users of dimensionality reduction, we identified the need for two kinds of user guidance, local and global, and designed a system, DimStiller, that encapsulates both. In the case of document data, we have identified the mostly-disconnected property of the data and its connection with query algorithms from Information Retrieval. We then presented algorithms for high-dimensional analysis, including dimensionality reduction, that take advantage of the mostly-disconnected property for improved efficiency and accuracy. For the case of costly distance functions, we identified an inefficiency in the design of multidimensional scaling algorithms with respect to this case and presented an algorithm framework, Glint, to minimize total running time without a penalty to output quality.

7.1.1 DimStiller

In Chapter 3, we presented DimStiller, a data analysis system that uses a set of abstractions to structure and navigate dimensional analysis and reduction: data resides in tables, operators modify and visualize tables, expressions chain together operators, and workflows permit pattern re-use. DimStiller uses these mechanisms to provide both local and global guidance through the analysis space of possible data tables. In both of the presented case studies in Section 3.4, we showed how individual operators probe the input dimensions and produce values like variance, correlations, and principal components, and visualizations such as scree plots and scatterplots.
It is the data analyst?s task to turn these quantitative figures into an-swers to qualitative questions about the data. DimStiller builds target users? trustin these answers by providing an intuitive pipeline architecture that visually guidesusers through making algorithm and parameter choices.7.1.2 Algorithms for the Visual Analysis of MoDisco DataChapter 4 showed how the MoDisco property of real-world document datasets isan important consideration in the design of algorithms for data analysis. This prop-erty has not been previously addressed in the visualization literature; we generalizealgorithms and data structures originally designed for information retrieval for vi-sualization applications. MoDisco data has both sparse vector columns and rows;this property has important implications for the design of search query algorithms.We showed how impact-ordered query algorithms implicitly use this property tocompute the higher-scoring queries faster. When data analysis algorithms, suchas dimensionality reduction, target TF-IDF document data, they can leverage theMoDisco property to efficiently compute important data structures for data explo-ration: nearest-neighbor sets, distance matrices, cluster trees, and 2D layout coor-dinates. We then presented three scalable algorithms for computing each of theseimportant data structures, with results that improve over the state-of-the-art.7.1.3 OverviewIn Chapter 5, we presented the Overview prototype application for this task thatcombines a hierarchical clustering view and a traditional MDS view. We vali-113dated the application with two complex, real-world datasets from the journalismdomain: subsets of the diplomatic cables and Afghan war logs from WikiLeaks.The Overview prototype application allowed direct comparison of different itemencoding, clustering, and layout algorithms with respect to each other and a fixedset of human-assigned tags, opening up a line of research into the semantic relation-ships between these different algorithmic stages in the visual exploration pipeline.7.1.4 GlintChapter 6 illustrated how expensive distance calculations change the efficiency ofexisting MDS algorithms like Glimmer and SMACOF. Such algorithms computemore distances than are required for an existing quality of layout, while analyticalgorithms require manually tuning the number of distances to compute as an inputparameter. We solve both these problems with Glint, an algorithm framework withthree components: a distance matrix densification strategy DS, an algorithm M,and an inexpensive objective measure S. Given these components, Glint samplesdistances from the distance matrix in fixed batches, updating the low-dimensionallayout with new information until the layout quality converges. We showed howcareful design of termination criteria can overcome the noise effect of random sam-pling on convergence. We presented and validated Glint instantiations for threeseparate types of previous MDS algorithms: the force-directed Glimmer, the ana-lytic Pivot MDS, and the gradient-based SMACOF.The Glint instantiations presented give essentially equivalent layout quality inall cases. The analytic instantiation was roughly equal in time performance to PivotPDS, with some cases of speedup and some of slowdown; the main contribution ofGlint in this situation is to remove the need for manual monitoring and interven-tion. 
The iterative instantiations showed substantial speedups against Glimmer andSMACOF in all of our target cases with costly distance functions, ranging from 20to 115 with the force-directed Glint instantiation and from 200 to 8800 with thegradient-based Glint instantiation.1147.2 Future WorkWe now present a series of stepping-off points for future research based on thecontents of this thesis.7.2.1 User GuidanceA form of guidance not yet explicitly provided by DimStiller is to help the userfully understand the inner workings of the supported operators. For example, userscould be given visual feedback highlighting relevant regions of scree plots in thereduce operator control panels. While the DimStiller architecture should in theorysupport this kind of targeted exploration, it would require significant future workto design a system that truly sheds light on all black-box algorithms. Anotherdirection of future work will be to add more encoding and interaction techniquespreviously shown to be effective for high-dimensional analysis, for example sortingthe Collect operator matrix view for the rank by feature capability suggested bySeo and Shneiderman [100].7.2.2 Efficient DR with Costly DistancesThe Glint system specifically targeted MDS algorithms. As discussed in sec-tion 2.1, there are other types of dimensionality reduction algorithms, like proba-bility based methods, that also rely on distance information. It would be interestingwork to apply the Glint framework to these other types of DR algorithms.Another interesting avenue of future work is to explore different types of densi-fication strategies. The examples we provide in Glint all use uniform sampling, buta more sophisticated sampling based on proximities of points in low-dimensionalspace might also be employed to good effect.7.2.3 Mostly Disconnected DataThere are several different avenues for possible future work with respect to thethe analysis of mostly-disconnected data. One is to determine if data other thanterm-vector databases can exhibit the MoDisco property. We conjecture that us-ing an IDF-like transformation of high-dimensional vector data in other domainscould induce MoDisco structure for great effect. Another area for future work is115modification of the APQ nearest-neighbor algorithm to include other innovationsfrom query algorithms like impact transformations of the input data and accumu-lator limiting and pruning [2]. It may also be fruitful to devise ways to incorporateefficient use of truncated distance matrices and inverted files into other clusteringalgorithms like k-medoids [91].7.3 Lessons LearnedShifting the discussion to a higher level of abstraction, we end the thesis with adiscussion of some of the overarching lessons learned about conducting interdisci-plinary research in the course of writing this thesis.Many of the projects in this thesis involve bringing together techniques fromdifferent disciplines. For example, Chapter 4 draws connections between Infor-mation Retrieval and Visualization. The Overview prototype combine ideas fromdata mining, visualization, and data journalism. Glint attempts to build an algo-rithm framework that crosscuts algorithm work in Visualization and Statistics. Di-mensionality Reduction itself is a subject of interest in several different researchcommunities: Statistics, Machine Learning, Data Mining, and Visualization. 
Eachof these disciplines functions like a lens, focusing on those aspects of algorithmdesign most beneficial to their own particular objectives.Below, we first relate the experience of drawing connections between the algo-rithmic foci of two different research communities. Then, we discuss what factorsof an interdisciplinary collaboration can aid in the dissemination of research, lead-ing to greater impact.7.3.1 Making an Algorithmic ConnectionRevealing algorithmic connections between High-dimensional data analysis andInformation Retrieval was one of the more challenging and intellectually satisfyingaspects of this thesis. In this section, we recount the somewhat roundabout storyof how these connections were made. It is our hope that this story functions asan informative exercise in revealing how research across disciplines is often a mixof wrong-turns, back-tracking, and (hopefully) breakthroughs. This contrasts withthe standard academic presentation of work as a finished product resulting from a116set of ordered, logical conclusions. We then discuss how a multifaceted researchprocess can actually work to create multiple connections between different fieldsand enrich the underlying algorithms developed along the way.Glimmer and Sparse DocsOne of the more compelling set of results produced from our validation of theGlimmer MDS algorithm was from the docs dataset [57]. This dataset, referredto as metacombine in Chapter 4, was our first encounter with analyzing term-vector data using techniques from high-dimensional analysis. The discrepancybetween the docs result from PivotMDS, a Classical Scaling algorithm, and theresult from Glimmer, a Distance Scaling algorithm, motivated an in-depth analysisof when Distance Scaling algorithms are more appropriate than Classical Scalingalgorithms. Our intuition was that docs was so intrinsically high-dimensional,that its structure was largely simplicial. Therefore, linear projection techniques likeClassical Scaling were inappropriate for producing dimensionally reduced layoutsbecause much of the data variance was orthogonal.Power Transformations and Local Graph LayoutOur success with using Glimmer to produce large document dataset layouts withvisible clustering attracted our collaborator, Jonathan Stray, who was interested inproducing the tool that would become Overview. We began experimenting withusing Glimmer on what would become the warlogs dataset in Chapter 5. Theseexperiments revealed that improved cluster structure could be observed with powertransformations of the distance function, as described by Buja et al [16]. But onetroubling visual artifact of using Glimmer on our data was the persistence of anoverall circular layout, inside which the clusters were being forced together. Fur-ther analysis of the data revealed this circularity to be an artifact of the predomi-nance of unit length distances in the data.Unit distances are a result of the cosine distance between two data points shar-ing no features. Semantically, though, the purpose of a unit distance is actuallyto imply no connection at all, not a connection of a specific length as was be-ing reported by the cosine distance function. Our next strategy was to attempt to117visually encode the absence of a connection between documents. This encodingwas achieved by replacing unit length distances with infinite distances and employa Box-Cox transformed version of the stress function as specified by Chen andBuja [22]. 
But this strategy of replacing distances led to many practical compli-cations: the data often broke into multiple disconnected components which wouldrepel each other in the layout, the energy model optimization was prone to get-ting stuck in inferior local minima, and the Box-Cox force model itself had manysensitive parameters.Contour Trees and DiscoTreesAt this point, we shifted our focus away from finding a better layout engine, and to-ward visually encoding the different densities of points in high-dimensional space.Focusing on densities was directly informed by the observation that removing allunit length ?edges? in the graph constructed from the distance matrix often resultedin meaningful disconnected components. The disconnected components were infact densities of points separated by an appreciable distance in terms-space. If den-sities separated by unit-distance were meaningful, we surmised that many otherdistances could produce meaningful decompositions. We set about answering thequestion of whether there are even more interesting component decompositions ofthe data determined by gradually relaxing the distance threshold.In our first analysis of the problem, we considered the Contour Tree [18] ofthe distance field. If one imposes a distance field over the high-dimensional dataspace, then the value at each point in space is determined by its distance to thenearest data point. Our unit-length decomposition of the data then resulted fromthe disjoint sets of points contained in the iso-contours of the distance field at athreshold of 1. The Contour Tree then represents all possible sets of componentsproduced by the union of all iso-contour values between one and zero. We then setabout computing and visualizing this decomposition.The DiscoTree described in Chapter 5 is the result of an approximate algorithmto calculate this Contour Tree. Our DiscoTree algorithm hinged upon the observa-tion that the maximum distance between points within a contour is bounded by thevalue of the contour [59]. By comparing the Contour Tree generated by this pro-118cess to other hierarchical decompositions of points, we then concluded that thereis an isomorphism between this Contour Tree of a scalar distance field and thehierarchy produced by Single-Link clustering.Single-Link to Impact-Ordered Inverted FilesRealizing that our DiscoTree algorithm efficiently computes a Single-Link hierar-chy, we set about comparing our strategy to previous work computing single-linkhierarchies. This comparison revealed a subset of clustering work using inverted-file index structures when clustering documents [81]. Rather than simply compareagainst this work, we examined current techniques for building and processing in-verted files [137]. It was in this deeper comparison that we were able to drawthe connection between the approximations we made in our DiscoTree algorithmwith approximations made by impact-ordered, inverted-file indices. The connec-tion between inverted files and high-dimensional space then permitted us to connectmany of the previously-visited dots and suggest the algorithm design implicationsin Chapter 4.Making Connections: Breadth-First vs. Depth-First ResearchIn developing new techniques, there is a tension between a breadth-first and depth-first approach to researching a solution to a problem. In the breadth-first approach,a researcher constructs an abstract description of their problem and then makes abroad survey of possible techniques, selecting the best overall fit for the solution athand. 
Making Connections: Breadth-First vs. Depth-First Research

In developing new techniques, there is a tension between a breadth-first and a depth-first approach to researching a solution to a problem. In the breadth-first approach, a researcher constructs an abstract description of their problem and then makes a broad survey of possible techniques, selecting the best overall fit for the solution at hand. In the depth-first approach, the researcher picks a familiar technique, developing and deepening its features until it is appropriate for solving the underlying problem. Both approaches have their merits and risks. Breadth-first connects the problem to a tried-and-true solution to a similar problem, but at the risk of making only a superficial connection between problem and algorithm. Depth-first often leads to novel work, pushing the frontiers of a technique into new venues, but at the risk of being inferior to algorithms from alternative fields.

Our experience shows there may be benefit in combining both breadth and depth. For example, our modifications to Glimmer and to the Contour Tree algorithm were a depth-oriented approach, taking spatial approaches from visualization and computer graphics and deepening them to better analyze high-dimensional term-vector data. Simultaneously, drawing connections across dimensionality reduction, hierarchical clustering, and information retrieval was breadth-oriented, selecting algorithms appropriate for the tasks of layout generation, grouping, and querying respectively. Using these two styles simultaneously allowed us to organize a set of disparate fields together (breadth) and then draw low-level connections between them (depth), ultimately leading to deeper insights and improved results.

7.3.2 Research Impact and Collaboration

The other high-level lesson learned in the course of developing this thesis concerns the benefits of cross-domain collaboration, especially with respect to research impact. Real-world impact, where non-experts derive benefit from the insights or results of research, can be difficult to achieve. A paper by Wagstaff [121], addressing research impact in Machine Learning, suggested that impact is greatly affected by follow-through, loosely defined as both "Publicize results to relevant user community" and "Persuade users to adopt technique." In this section, we discuss our experiences with collaborators in developing software systems, and how a collaborator's own goals can have a significant effect on the impact of one's research. Sedlmair et al. have presented a detailed approach to collaborator selection during the course of visualization design studies [99]. Our discussion here focuses specifically on impact through user adoption, and can be considered a complementary set of suggestions.

DimStiller Collaboration

In Chapter 3, we presented a high-dimensional analysis tool, DimStiller, aimed at providing non-expert data analysts with guidance in using sophisticated tools and workflows. After developing this software, we collaborated with different researchers and elicited their questions and analysis needs in response to the tool. The motivations and goals of these collaborators were focused on solving their domain analysis problems under aggressive time constraints. For them, taking part in an iterative design process might do more harm than good if the tool does not have immediate benefits to their work process. Our collaboration with these domain analysts led to insights in other research projects [98], but did not lead to further development of the tool, or to any popular adoption of the DimStiller software.

Overview Collaboration

Our collaboration with Jonathan Stray on the Overview project contrasts with the DimStiller collaboration. In particular, as a Knight Foundation grant recipient, he was motivated to see our collaboration succeed.
In addition, his profile was aligned with two aspects of follow-through highlighted by Wagstaff: publicizing results, and evangelizing the technique for adoption. That is, he had access to high-profile venues, such as the PBS Idea Lab website (http://www.pbs.org/idealab/2013/04/how-a-computer-can-organize-thousands-of-documents-for-a-reporter110.html, last accessed July 17th, 2013), and he was professionally well-connected enough with our target group to make headway evangelizing the technique to real journalists. As a result, we were able to see the tool put to use by journalists outside of our collaboration. We detailed some of this use in Section 5.4 of the thesis.

Ingredients of High-Impact Collaboration

The Overview project thus presents a study in some of the right ingredients for high-impact research. We summarize these ingredients as:

- Collaborator has a stake in the positive outcome of the project.
- Collaborator has a noted platform for publicizing good results.
- Collaborator is well-connected to high-profile users in the community.

Our collaborations on the DimStiller project possessed none of these ingredients, making user adoption through those specific collaborations unlikely. In our experience, if the ultimate goal of a project is community adoption of a tool, we surmise that project collaborations without these ingredients face an uphill struggle. We do not claim that these factors are either necessary or sufficient for high-impact research; to do so would cynically suggest seeking out only community leaders for any collaboration. But we do mean to imply that, in our experience, the presence of any or all three of these facets can go a long way toward seeing greater adoption rates in a target community.

Bibliography

[1] C. Aggarwal and C. Zhai. A survey of text clustering algorithms. Mining Text Data, pages 77–128, 2012.
[2] V. Anh and A. Moffat. Impact transformation: effective and efficient web retrieval. In Proc. ACM Conf. Information Retrieval (SIGIR), pages 3–10. ACM, 2002.
[3] V. Anh, O. de Kretser, and A. Moffat. Vector-space ranking with effective early termination. In Proc. ACM Conf. Information Retrieval (SIGIR), pages 35–42. ACM, 2001.
[4] S. Arya, D. Mount, N. Netanyahu, R. Silverman, and A. Wu. An optimal algorithm for approximate nearest neighbor searching fixed dimensions. Journal of the ACM (JACM), 45(6):891–923, 1998.
[5] J. Barnes and P. Hut. A hierarchical O(N log N) force-calculation algorithm. Nature, 324:446–449, 1986.
[6] R. Bayardo, Y. Ma, and R. Srikant. Scaling up all pairs similarity search. In Proceedings of the 16th International Conference on World Wide Web, pages 131–140, 2007.
[7] R. Becker and W. Cleveland. Brushing scatterplots. Technometrics, 29(2):127–142, 1987.
[8] M. Belkin and P. Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. Advances in Neural Information Processing Systems, 14:585–591, 2001.
[9] R. Bellman. Adaptive Control Processes: A Guided Tour, volume 4. Princeton University Press, 1961.
[10] P. Berkhin. A survey of clustering data mining techniques. Grouping Multidimensional Data, pages 25–71, 2006.
[11] D. Blei. Probabilistic topic models. Communications of the ACM, 55(4):77–84, 2012.
[12] C. Böhm and H. Kriegel. A cost model and index architecture for the similarity join. In Conf. on Data Engineering (ICDE), pages 411–420. IEEE, 2001.
[13] I. Borg and P. Groenen. Modern Multidimensional Scaling: Theory and Applications. Springer-Verlag, 2nd edition, 2005.
[14] I. Borg and P. Groenen. Modern multidimensional scaling: Theory and applications. Springer, 2005.
[15] U. Brandes and C. Pich. Eigensolver methods for progressive multidimensional scaling of large data. In Graph Drawing, volume 4372, pages 42–53. Springer, 2007.
[16] A. Buja and D. Swayne. Visualization methodology for multidimensional scaling. Journal of Classification, 19(1):7–43, 2002.
[17] A. Buja, D. Swayne, M. Littman, N. Dean, H. Hofmann, and L. Chen. Data visualization with multidimensional scaling. Journal of Computational and Graphical Statistics, 17(2):444–472, 2008.
[18] H. Carr, J. Snoeyink, and U. Axen. Computing contour trees in all dimensions. Computational Geometry: Theory and Applications, 24(2):75–94, 2003.
[19] M. Chalmers. A linear iteration time layout algorithm for visualising high dimensional data. In Proc. IEEE Visualization, pages 127–132, 1996.
[20] K. Chaudhuri and S. Dasgupta. Rates of convergence for the cluster tree. Advances in Neural Information Processing Systems, 23:343–351, 2010.
[21] L. Chen and A. Buja. Local multidimensional scaling for nonlinear dimension reduction, graph drawing, and proximity analysis. Journal of the American Statistical Association, 104(485):209–219, 2009.
[22] L. Chen and A. Buja. Stress functions for nonlinear dimension reduction, proximity analysis, and graph drawing. Journal of Machine Learning Research, 14:1145–1173, 2013.
[23] Y. Chen, L. Wang, M. Dong, and J. Hua. Exemplar-based visualization of large document corpus. IEEE Transactions on Visualization and Computer Graphics, 15(6):1161–1168, 2009.
[24] D. Cook and D. Swayne. Interactive and Dynamic Graphics for Data Analysis: With Examples Using R and GGobi. Springer, 2007.
[25] K. A. Cook and J. J. Thomas, editors. Illuminating the path: the research and development agenda for visual analytics. National Visual Analytics Center, 2005.
[26] F. Crestani and S. Wu. Testing the cluster hypothesis in distributed information retrieval. Information Processing & Management, 42:1137–1150, 2006.
[27] W. Cui, S. Liu, L. Tan, C. Shi, Y. Song, Z. Gao, X. Tong, and H. Qu. TextFlow: Towards better understanding of evolving topics in text. Proc. IEEE Symp. Information Visualization (InfoVis), 17(12):2412–2421, 2011.
[28] N. de Freitas, Y. Wang, M. Mahdaviani, and D. Lang. Fast Krylov methods for N-body learning. In Advances in Neural Information Processing Systems, pages 251–258, 2005.
[29] J. de Leeuw and P. Mair. Multidimensional scaling using majorization: SMACOF in R. Journal of Statistical Software, 31(3):1–30, August 2009.
[30] V. de Silva and J. Tenenbaum. Sparse multidimensional scaling using landmark points. Technical report, Stanford, 2004.
[31] B. Efron and R. Tibshirani. An Introduction to the Bootstrap, volume 57. CRC Press, 1993.
[32] B. Everitt, S. Landau, M. Leese, and D. Stahl. Cluster Analysis, 5th Edition. John Wiley & Sons, Ltd, 2011.
[33] R. Fisher. The use of multiple measurements in taxonomic problems. Annals of Human Genetics, 7(2):179–188, 1936.
[34] M. Fontoura, V. Josifovski, J. Liu, S. Venkatesan, X. Zhu, and J. Zien. Evaluation strategies for top-k queries over memory-resident inverted indexes. Proceedings of the VLDB Endowment, 4(12):1213–1224, 2011.
[35] E. Fowlkes and C. Mallows. A method for comparing two hierarchical clusterings. Journal of the American Statistical Association, 78(383):553–569, 1983.
[36] S. France and J. Carroll. Two-way multidimensional scaling: A review. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 41(5):644–661, 2011.
[37] E. Gansner, Y. Koren, and S. North. Graph drawing by stress majorization. In Graph Drawing, pages 239–250, 2004.
[38] J. Gillum. Ryan asked for federal help as he championed cuts. http://bigstory.ap.org/article/ryan-asked-federal-help-he-championed-cuts. Accessed: 2013-06-04.
[39] A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. In Proc. Intl. Conf. on Very Large Data Bases (VLDB), pages 518–529, 1999.
[40] G. Golub and W. Kahan. Calculating the singular values and pseudo-inverse of a matrix. Journal of the Society for Industrial & Applied Mathematics, Series B: Numerical Analysis, 2(2):205–224, 1965.
[41] J. Gower. Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika, 53(3-4):325–338, 1966.
[42] J. Gower. Measures of similarity, dissimilarity, and distance. Encyclopedia of Statistical Sciences, 5:397–405, 1985.
[43] J. Grimmer and G. King. General purpose computer-assisted clustering and conceptualization. Proc. Natl. Acad. Sciences (PNAS), 2010.
[44] P. Groenen and W. Heiser. The tunneling method for global optimization in multidimensional scaling. Psychometrika, 61(3):529–550, 1996.
[45] D. Guo. Coordinating computational and visual approaches for interactive feature selection and multivariate clustering. Information Visualization, 2(4):232–246, 2003.
[46] I. Guyon and A. Elisseeff. An introduction to variable and feature selection. The Journal of Machine Learning Research, 3:1157–1182, 2003.
[47] N. Halko, P. Martinsson, Y. Shkolnisky, and M. Tygert. An algorithm for the principal component analysis of large data sets. SIAM Journal on Scientific Computing, 33(5):2580–2594, 2011.
[48] J. Heer and M. Agrawala. Software design patterns for information visualization. Proc. IEEE Symp. Information Visualization (InfoVis), 12(5):853–860, 2006.
[49] M. Hein and J. Audibert. Intrinsic dimensionality estimation of submanifolds in R^d. In Proc. Intl. Conf. Machine Learning (ICML), pages 289–296. ACM, 2005.
[50] G. Hinton and S. Roweis. Stochastic neighbor embedding. Advances in Neural Information Processing Systems, 15:833–840, 2002.
[51] R. Holbrey. Dimension reduction algorithms for data mining and visualization. University of Leeds/Edinburgh, 2006.
[52] S. Huang, M. Ward, and E. Rundensteiner. Exploration of dimensionality reduction for text visualization. In Proc. Coordinated and Multiple Views in Exploratory Visualization (CMV), pages 63–74, 2005.
[53] L. Hubert and P. Arabie. Comparing partitions. Journal of Classification, 2(1):193–218, 1985.
[54] M. Hubert, P. Rousseeuw, and K. Branden. ROBPCA: a new approach to robust principal component analysis. Technometrics, 47(1), 2005.
[55] P. Indyk and R. Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, pages 604–613. ACM, 1998.
[56] S. Ingram. Multilevel multidimensional scaling on the GPU. Master's thesis, University of British Columbia Department of Computer Science, 2007.
[57] S. Ingram, T. Munzner, and M. Olano. Glimmer: Multilevel MDS on the GPU. IEEE Transactions on Visualization and Computer Graphics, 15(2):249–261, 2009.
[58] S. Ingram, T. Munzner, V. Irvine, M. Tory, S. Bergner, and T. Möller. Dimstiller: Workflows for dimensional analysis and reduction. In Proc. IEEE Symp. Visual Analytics Science and Technology (VAST), pages 3–10, 2010.
[59] S. Ingram, T. Munzner, and J. Stray. Hierarchical clustering and tagging of mostly disconnected data. Technical Report TR-2012-01, University of British Columbia Department of Computer Science, May 2012. URL http://www.cs.ubc.ca/labs/imager/tr/2012/modiscotag/.
[60] A. Inselberg and B. Dimsdale. Parallel coordinates. In Human-Machine Interactive Systems, pages 199–233. Springer, 1991.
[61] N. Jardine and C. van Rijsbergen. The use of hierarchical clustering in information retrieval. Information Storage & Retrieval, 7(5):217–240, 1971.
[62] S. Johansson and J. Johansson. Interactive dimensionality reduction through user-defined combinations of quality metrics. Proc. IEEE Symp. Information Visualization (InfoVis), 15(6):993–1000, 2009.
[63] P. Joia, D. Coimbra, J. Cuminato, F. Paulovich, and L. Nonato. Local affine multidimensional projection. IEEE Transactions on Visualization and Computer Graphics, 17(12):2563–2571, 2011.
[64] I. Jolliffe. Principal component analysis. Wiley Online Library, 2005.
[65] F. Jourdan and G. Melançon. Multiscale hybrid MDS. In Proc. Intl. Conf. on Information Visualization (IV'04), pages 388–393, 2004.
[66] M. Khoury, Y. Hu, S. Krishnan, and C. Scheidegger. Drawing large graphs by low-rank stress majorization. Comp. Graph. Forum, 31(3pt1):975–984, June 2012.
[67] A. Krowne and M. Halbert. An initial evaluation of automated organization for digital library browsing. In Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries, pages 246–255. IEEE, 2005.
[68] J. Kruskal. On the shortest spanning subtree of a graph and the traveling salesman problem. Proc. American Mathematical Society (AMS), 7(1):48–50, 1956.
[69] J. Kruskal. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29(1):1–27, 1964.
[70] N. Lawrence. Probabilistic non-linear principal component analysis with Gaussian process latent variable models. The Journal of Machine Learning Research, 6:1783–1816, 2005.
[71] J. P. Lewis, M. McGuire, and P. Fox. Mapping the mental space of game genres. In Proc. ACM SIGGRAPH Symp. Video Games, pages 103–108, 2007.
[72] Y. Lifshits. Algorithms for nearest neighbor search. Tutorial, Russian Summer School in Information Retrieval (RuSSIR), 2007.
[73] Y. Liu, S. Barlowe, Y. Feng, J. Yang, and M. Jiang. Evaluating exploratory visualization systems: A user study on how clustering-based visualization systems support information seeking from large document collections. Information Visualization, 12(1):25–43, 2013.
[74] H. Luhn. A statistical approach to the mechanized encoding and searching of literary information. IBM Journal of Research and Development, 1(4):309–317, 1957.
[75] C. Manning, P. Raghavan, and H. Schütze. Introduction to information retrieval, volume 1. Cambridge University Press, Cambridge, 2008.
[76] W. Matusik, H. Pfister, M. Brand, and L. McMillan. A data-driven reflectance model. ACM Trans. Graphics (Proc. SIGGRAPH 2003), 22(3):759–769, 2003.
[77] L. Molina, L. Belanche, and À. Nebot. Feature selection algorithms: A survey and experimental evaluation. In Proceedings 2002 IEEE International Conference on Data Mining, pages 306–313. IEEE, 2002.
[78] T. Munzner, A. Barsky, and M. Williams. Reflections on QuestVis: A visualization system for an environmental sustainability model. Scientific Visualization: Interactions, Features, Metaphors, 2:240–259, 2011.
[79] K. Murphy. Machine learning: a probabilistic perspective. The MIT Press, 2012.
[80] F. Murtagh. A very fast, exact nearest neighbor algorithm for use in information retrieval. Information Technology: Research and Development, 1(4):275–283, 1982.
[81] F. Murtagh. Clustering in massive data sets. In Handbook of Massive Data Sets, pages 501–543. Kluwer Academic Publishers, 1999.
[82] P. Oesterling, G. Scheuermann, S. Teresniak, G. Heyer, S. Koch, T. Ertl, and G. Weber. Two-stage framework for a topology-based projection and visualization of classified document collections. In Proc. IEEE Symp. Visual Analytics Science and Technology (VAST), pages 91–98, 2010.
[83] F. Paulovich, C. Silva, and L. Nonato. Two-phase mapping for projecting massive data sets. IEEE Transactions on Visualization and Computer Graphics, 16(6):1281–1290, 2010.
[84] S. Perry and P. Willett. A review of the use of inverted files for best match searching in information retrieval systems. Journal of Information Science, 6(2-3):59–66, 1983.
[85] J. Platt. FastMap, MetricMap, and Landmark MDS are all Nyström algorithms. In Proc. Intl. Workshop on Artificial Intelligence and Statistics, pages 261–268, 2005.
[86] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2013. URL http://www.R-project.org/. Accessed: 2013-09-27.
[87] R Development Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria, 2008. URL http://www.R-project.org.
[88] C. Rasmussen and C. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
[89] S. Robertson. Understanding inverse document frequency: on theoretical arguments for IDF. Journal of Documentation, 60(5):503–520, 2004.
[90] G. Ross and M. Chalmers. A visual workspace for hybrid multidimensional scaling algorithms. In Proc. IEEE Symp. Information Visualization (InfoVis), pages 91–96, 2003.
[91] L. Rousseeuw and L. Kaufman. Clustering by means of medoids. Statistical Data Analysis Based on the L1-norm and Related Methods, pages 405–416, 1987.
[92] S. Roweis and L. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, 2000.
[93] Y. Rubner, C. Tomasi, and L. Guibas. The Earth Mover's Distance as a metric for image retrieval. Intl. Journ. Computer Vision, 40(2):99–121, 2000.
[94] D. Russell, M. Stefik, P. Pirolli, and S. Card. The cost structure of sensemaking. In Proceedings of the INTERACT '93 and CHI '93 Conference on Human Factors in Computing Systems, pages 269–276. ACM, 1993.
[95] G. Salton and C. Buckley. Term-weighting approaches in automatic text retrieval. Information Processing & Management, 24(5):513–523, 1988.
[96] G. Salton, A. Wong, and C. Yang. A vector space model for automatic indexing. Communications of the ACM, 18(11):613–620, 1975.
[97] J. Sammon. A nonlinear mapping for data structure analysis. IEEE Trans. Computers, 100(5):401–409, 1969.
[98] M. Sedlmair, M. Brehmer, S. Ingram, and T. Munzner. Dimensionality reduction in the wild: Gaps and guidance. Technical Report TR-2012-03, UBC Computer Science, June 2012.
[99] M. Sedlmair, M. Meyer, and T. Munzner. Design study methodology: Reflections from the trenches and the stacks. IEEE Transactions on Visualization and Computer Graphics, 18(12):2431–2440, 2012.
[100] J. Seo and B. Shneiderman. A rank-by-feature framework for interactive exploration of multidimensional data. Information Visualization, 4(2):99–113, 2005.
[101] F. Shahnaz, M. Berry, V. Pauca, and R. Plemmons. Document clustering using nonnegative matrix factorization. Information Processing & Management, 42(2):373–386, 2006.
[102] S. Shapiro and M. Wilk. An analysis of variance test for normality (complete samples). Biometrika, 52(3/4):591–611, 1965.
[103] R. Sibson. SLINK: An optimally efficient algorithm for the single-link cluster method. The Computer Journal, 16(1):30–34, 1973.
[104] V. Silva and J. Tenenbaum. Global versus local methods in nonlinear dimensionality reduction. Advances in Neural Information Processing Systems, 15:705–712, 2003.
[105] A. Smeaton and C. Van Rijsbergen. The nearest neighbour problem in information retrieval: an algorithm using upperbounds. ACM Special Interest Group on Information Retrieval (SIGIR) Forum, 16(1):83–87, 1981.
[106] P. Sneath. The application of computers to taxonomy. Journal of General Microbiology, 17(1):201–226, 1957.
[107] I. Spence and D. Domoney. Single subject incomplete designs for nonmetric multidimensional scaling. Psychometrika, 39:469–490, 1974.
[108] M. Stone. Color in information display. IEEE Visualization 2006 Course Notes. http://www.stonesc.com/Vis06, Oct 2006.
[109] J. Stray. How Overview can organize thousands of documents for a reporter. http://overview.ap.org/blog/2013/04/how-overview-can-organize-thousands-of-documents-for-a-reporter/. Accessed: 2013-06-04.
[110] J. Tenenbaum, V. de Silva, and J. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319–2323, Dec 22 2000.
[111] D. Ternes and K. MacLean. Designing large sets of haptic icons with rhythm. In Intl. Conf. Haptics: Perception, Devices, and Scenarios (EuroHaptics), pages 199–208. Springer LNCS 5024, 2008.
[112] The MathWorks Inc. MATLAB. Natick, Massachusetts, 2010.
[113] W. Torgerson. Multidimensional scaling: I. Theory and method. Psychometrika, 17:401–419, 1952.
[114] M. Tory, D. Sprague, F. Wu, W. So, and T. Munzner. Spatialization design: Comparing points and landscapes. Proc. IEEE Symp. Information Visualization (InfoVis), 13(6):1262–1269, 2007.
[115] M. Tory, C. Swindells, and R. Dreezer. Comparing dot and landscape spatializations for visual memory differences. Proc. IEEE Symp. Information Visualization (InfoVis), 16(6):1033–1040, 2009.
[116] L. van der Maaten. Barnes-Hut-SNE. In Proceedings of the International Conference on Learning Representations, 2013. URL http://arxiv.org/abs/1301.3342.
[117] L. van der Maaten and G. Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9:2579–2605, 2008.
[118] L. van der Maaten, E. Postma, and H. Van Den Herik. Dimensionality reduction: A comparative review. Journal of Machine Learning Research, 10:1–41, 2009.
[119] E. Voorhees. Implementing agglomerative hierarchic clustering algorithms for use in document retrieval. Information Processing & Management, 22(6):465–476, 1986.
[120] J. Wade. TPD working through flawed mobile system. http://www.tulsaworld.com/article.aspx/TPD_working_through_flawed_mobile_system/20120603_11_a1_cutlin136616. Accessed: 2013-06-04.
[121] K. Wagstaff. Machine learning that matters. Proc. Intl. Conf. Machine Learning (ICML), pages 529–536, 2012.
[122] J. Wang, D. Fleet, and A. Hertzmann. Gaussian process dynamical models for human motion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(2):283–298, 2008.
[123] M. Ward. XmdvTool: Integrating multiple methods for visualizing multivariate data. In Proc. IEEE Visualization, pages 326–333, 1994.
[124] C. Weaver. Cross-filtered views for multidimensional visual analysis. IEEE Transactions on Visualization and Computer Graphics, 16(2):192–204, 2010.
[125] K. Weinberger and L. Saul. An introduction to nonlinear dimensionality reduction by maximum variance unfolding. In Proceedings of the National Conference on Artificial Intelligence, volume 21, pages 1683–1686. AAAI Press, 2006.
[126] H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer New York, 2009.
[127] J. Wise, J. Thomas, K. Pennock, D. Lantrip, M. Pottier, A. Schur, and V. Crow. Visualizing the non-visual: spatial analysis and interaction with information from text documents. In Proc. IEEE Symp. Information Visualization (InfoVis), pages 51–58, 1995.
[128] D. Wishart. Mode analysis: A generalization of nearest neighbor which reduces chaining effects. Numerical Taxonomy, 76:282–311, 1969.
[129] C. Xiao, W. Wang, X. Lin, and H. Shang. Top-k set similarity joins. In Conf. on Data Engineering (ICDE), pages 916–927. IEEE, 2009.
[130] J. Yang, W. Peng, M. Ward, and E. Rundensteiner. Interactive hierarchical dimension ordering, spacing and filtering for exploration of high dimensional datasets. In Proc. IEEE Symp. Information Visualization (InfoVis), pages 105–112, 2003.
[131] J. Yang, M. Ward, E. Rundensteiner, and S. Huang. Visual hierarchical dimension reduction for exploration of high dimensional datasets. In Proc. Eurographics/IEEE Symp. Visualization (VisSym), pages 19–28, 2003.
[132] J. Yang, A. Patro, S. Huang, N. Mehta, M. Ward, and E. Rundensteiner. Value and relation display for interactive exploration of high dimensional datasets. In Proc. IEEE Symp. Information Visualization (InfoVis), pages 73–80, 2004.
[133] J. Yang, D. Luo, and Y. Liu. Newdle: Interactive visual exploration of large online news collections. IEEE Computer Graphics & Applications, 30(5):32–41, 2010.
[134] Z. Yang, J. Peltonen, and S. Kaski. Scalable optimization of neighbor embedding for visualization. In Proc. Intl. Conf. Machine Learning (ICML), pages 127–135, 2013.
[135] P. Yianilos. Data structures and algorithms for nearest neighbor search in general metric spaces. In Proc. ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 311–321. Society for Industrial and Applied Mathematics, 1993.
[136] H. Zha. Generic summarization and keyphrase extraction using mutual reinforcement principle and sentence clustering. In Proc. ACM Conf. Information Retrieval (SIGIR), pages 113–120. ACM, 2002.
[137] J. Zobel and A. Moffat. Inverted files for text search engines. ACM Computing Surveys (CSUR), 38(2):6, 2006.
