UBC Theses and Dissertations
Flexible and efficient exploration of rated datasets
Kolloju, Naresh Kumar
As users increasingly rely on collaborative rating sites for everyday tasks such as purchasing a product or renting a movie, they face a deluge of ratings and reviews. Traditionally, the exploration of rated datasets has been enabled by rating averages, which support user-centric, item-centric, and top-k exploration. More specifically, canned queries on user demographics aggregate opinion for an item or a collection of items, for example: "18-29-year-old males in CA rated the movie The Social Network 8.2 on average." Combining ratings, demographics, and item attributes is a powerful exploration mechanism. It supports operations such as comparing the opinion of the same users on two items, comparing two groups of users on their opinion of a given class of items, and finding a group whose rating distribution for an item is nearly unanimous. Enabling these operations requires (i) adopting the right measure for comparing ratings and (ii) developing efficient algorithms for finding relevant groups. We argue that the rating average is a weak measure for such comparisons. Instead, we propose contrasting and querying rating distributions using the Earth Mover's Distance (EMD), a measure that intuitively reflects the amount of work needed to transform one distribution into another. We show that the problem of finding groups that match given rating distributions is NP-hard under different settings, and we develop appropriate heuristics. Our experiments on real and synthetic datasets validate the utility of our approach and demonstrate the scalability of our algorithms.
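To illustrate why an average can hide differences that EMD exposes, here is a minimal sketch. It assumes that ratings lie on an ordered integer scale, that each group's ratings are given as a histogram of counts per rating value, and that the ground distance between adjacent rating values is 1. Under those assumptions, the EMD between two normalized 1D histograms equals the sum of absolute differences between their cumulative distributions. The function names `emd_1d` and `mean_rating` are hypothetical, and the abstract does not state the exact EMD formulation (ground distance, normalization) used in the thesis.

```python
from typing import Sequence


def emd_1d(p: Sequence[float], q: Sequence[float]) -> float:
    """Hypothetical helper: EMD between two rating histograms on the same
    ordered scale, assuming unit ground distance between adjacent ratings."""
    if len(p) != len(q):
        raise ValueError("histograms must cover the same rating scale")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("histograms must contain at least one rating")
    cdf_p = cdf_q = 0.0
    work = 0.0
    for count_p, count_q in zip(p, q):
        # Cumulative share of ratings at or below the current rating value.
        cdf_p += count_p / total_p
        cdf_q += count_q / total_q
        # Mass that must cross from this rating value to the next one.
        work += abs(cdf_p - cdf_q)
    return work


def mean_rating(hist: Sequence[float], lowest: int = 1) -> float:
    """Hypothetical helper: average rating of a histogram whose first bin
    corresponds to rating value `lowest`."""
    total = sum(hist)
    return sum((lowest + i) * c for i, c in enumerate(hist)) / total


# Two groups on an assumed 1-5 scale with the same average but different opinions.
unanimous = [0, 0, 10, 0, 0]   # everyone rates 3
polarized = [5, 0, 0, 0, 5]    # half rate 1, half rate 5

print(mean_rating(unanimous), mean_rating(polarized))  # 3.0 3.0
print(emd_1d(unanimous, polarized))                    # 2.0
```

Both groups average 3.0, so a rating average treats them as identical, while the EMD of 2.0 reflects that each unit of rating mass must move two rating steps on average to turn the unanimous group into the polarized one. The same code works for any number of bins, for example, a 1-10 scale.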
Attribution-NonCommercial-NoDerivatives 4.0 International