Flexible and efficient exploration of rated datasets

UBC Theses and Dissertations

Featured Collection

UBC Theses and Dissertations

Flexible and efficient exploration of rated datasets Kolloju, Naresh Kumar

Abstract

As users increasingly rely on collaborative rating sites to achieve mundane tasks such as purchasing a product or renting a movie, they are facing the data deluge of ratings and reviews. Traditionally, the exploration of rated data sets has been enabled by rating averages that allow user-centric, itemcentric and top-k exploration. More speci cally, canned queries on user demographics aggregate opinion for an item or a collection of items such as 18-29 year old males in CA rated the movie The Social Network at 8:2 on average. Combining ratings, demographics, and item attributes is a powerful exploration mechanism that allows operations such as comparing the opinion of the same users for two items, comparing two groups of users on their opinion for a given class of items, and nding a group whose rating distribution is nearly unanimous for an item. To enable those operations, it is necessary to (i) adopt the right measure to compare ratings, and to (ii) develop e cient algorithms to nd relevant groups. We argue that rating average is a weak measure for capturing such comparisons. We propose contrasting and querying rating distributions, instead, using the Earth Mover's Distance (EMD), a measure that intuitively re ects the amount of work needed to transform one distribution into another. We show that the problem of nding groups matching given rating distributions is NP-hard under di erent settings and develop appropriate heuristics. Our experiments on real and synthetic datasets validate the utility of our approach and demonstrate the scalability of our algorithms.

Item Metadata

Title	Flexible and efficient exploration of rated datasets
Creator	Kolloju, Naresh Kumar
Publisher	University of British Columbia
Date Issued	2013
Description	As users increasingly rely on collaborative rating sites to achieve mundane tasks such as purchasing a product or renting a movie, they are facing the data deluge of ratings and reviews. Traditionally, the exploration of rated data sets has been enabled by rating averages that allow user-centric, itemcentric and top-k exploration. More speci cally, canned queries on user demographics aggregate opinion for an item or a collection of items such as 18-29 year old males in CA rated the movie The Social Network at 8:2 on average. Combining ratings, demographics, and item attributes is a powerful exploration mechanism that allows operations such as comparing the opinion of the same users for two items, comparing two groups of users on their opinion for a given class of items, and nding a group whose rating distribution is nearly unanimous for an item. To enable those operations, it is necessary to (i) adopt the right measure to compare ratings, and to (ii) develop e cient algorithms to nd relevant groups. We argue that rating average is a weak measure for capturing such comparisons. We propose contrasting and querying rating distributions, instead, using the Earth Mover's Distance (EMD), a measure that intuitively re ects the amount of work needed to transform one distribution into another. We show that the problem of nding groups matching given rating distributions is NP-hard under di erent settings and develop appropriate heuristics. Our experiments on real and synthetic datasets validate the utility of our approach and demonstrate the scalability of our algorithms.
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2013-08-31
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0052206
URI	http://hdl.handle.net/2429/44028
Degree (Theses)	Master of Science - MSc
Program (Theses)	Computer Science
Affiliation	Science, Faculty of; Computer Science, Department of
Degree Grantor	University of British Columbia
Graduation Date	2013-05
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace

Open Collections

UBC Theses and Dissertations

UBC Theses and Dissertations

Flexible and efficient exploration of rated datasets Kolloju, Naresh Kumar

Abstract

Item Metadata

Item Media

Item Citations and Data

Rights