- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Practical considerations for Dimensionality Reduction...
Open Collections
UBC Theses and Dissertations
UBC Theses and Dissertations
Practical considerations for Dimensionality Reduction : user guidance, costly distances, and document data Ingram, Stephen
Abstract
In this thesis, we explore ways to make practical extensions to Dimensionality Reduction, or DR algorithms with the goal of addressing challenging, real-world cases. The first case we consider is that of how to provide guidance to those users employing DR methods in their data analysis. We specifically target users who are not experts in the mathematical concepts behind DR algorithms. We first identify two levels of guidance: global and local. Global user guidance helps non-experts select and arrange a sequence of analysis algorithms. Local user guidance helps users select appropriate algorithm parameter choices and interpret algorithm output. We then present a software system, DimStiller, that incorporates both types of guidance, validating it on several use-cases. The second case we consider is that of using DR to analyze datasets consisting of documents. In order to modify DR algorithms to handle document datasets effectively, we first analyze the geometric structure of document datasets. Our analysis describes the ways document datasets differ from other kinds of datasets. We then leverage these geometric properties for speed and quality by incorporating ideas from text querying into DR and other algorithms for data analysis. We then present the Overview prototype, a proof-of-concept document analysis system. Overview synthesizes both the goals of designing systems for data analysts who are DR novices, and performing DR on document data. The third case we consider is that of costly distance functions, or when the method used to derive the true proximity between two data points is computationally expensive. Using standard approaches to DR in this important use-case can result in either unnecessarily protracted runtimes or long periods of user monitoring. To address the case of costly distances, we develop an algorithm framework, Glint, which efficiently manages the number of distance function calculations for the Multidimensional Scaling class of DR algorithms. We then show that Glint implementations of Multidimensional Scaling algorithms achieve substantial speed improvements or remove the need for human monitoring.
Item Metadata
Title |
Practical considerations for Dimensionality Reduction : user guidance, costly distances, and document data
|
Creator | |
Publisher |
University of British Columbia
|
Date Issued |
2013
|
Description |
In this thesis, we explore ways to make practical extensions to Dimensionality Reduction, or DR algorithms with the goal of addressing challenging, real-world cases.
The first case we consider is that of how to provide guidance to those users employing DR methods in their data analysis. We specifically target users who are not experts in the mathematical concepts behind DR algorithms. We first identify two levels of guidance: global and local. Global user guidance helps non-experts select and arrange a sequence of analysis algorithms. Local user guidance helps users select appropriate algorithm parameter choices and interpret algorithm output. We then present a software system, DimStiller, that incorporates both types of guidance, validating it on several use-cases.
The second case we consider is that of using DR to analyze datasets consisting of documents. In order to modify DR algorithms to handle document datasets effectively, we first analyze the geometric structure of document datasets. Our analysis describes the ways document datasets differ from other kinds of datasets. We then leverage these geometric properties for speed and quality by incorporating ideas from text querying into DR and other algorithms for data analysis.
We then present the Overview prototype, a proof-of-concept document analysis system. Overview synthesizes both the goals of designing systems for data analysts who are DR novices, and performing DR on document data.
The third case we consider is that of costly distance functions, or when the method used to derive the true proximity between two data points is computationally expensive. Using standard approaches to DR in this important use-case can result in either unnecessarily protracted runtimes or long periods of user monitoring. To address the case of costly distances, we develop an algorithm framework, Glint, which efficiently manages the number of distance function calculations for the Multidimensional Scaling class of DR algorithms. We then show that Glint implementations of Multidimensional Scaling algorithms achieve substantial speed improvements or remove the need for human monitoring.
|
Genre | |
Type | |
Language |
eng
|
Date Available |
2013-10-01
|
Provider |
Vancouver : University of British Columbia Library
|
Rights |
Attribution-NoDerivs 2.5 Canada
|
DOI |
10.14288/1.0052184
|
URI | |
Degree | |
Program | |
Affiliation | |
Degree Grantor |
University of British Columbia
|
Graduation Date |
2013-11
|
Campus | |
Scholarly Level |
Graduate
|
Rights URI | |
Aggregated Source Repository |
DSpace
|
Item Media
Item Citations and Data
Rights
Attribution-NoDerivs 2.5 Canada