Open Collections

UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

LePlaza : an event-centered social networking platform with location based personalized recommendation… Lin, Cong 2011

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata

Download

Media
24-ubc_2011_fall_lin_cong.pdf [ 1.97MB ]
Metadata
JSON: 24-1.0052105.json
JSON-LD: 24-1.0052105-ld.json
RDF/XML (Pretty): 24-1.0052105-rdf.xml
RDF/JSON: 24-1.0052105-rdf.json
Turtle: 24-1.0052105-turtle.txt
N-Triples: 24-1.0052105-rdf-ntriples.txt
Original Record: 24-1.0052105-source.json
Full Text
24-1.0052105-fulltext.txt
Citation
24-1.0052105.ris

Full Text

LePlaza: An Event-Centered Social Networking Platform with Location Based Personalized Recommendation Service  by Cong Lin  B.Sc., Tsinghua University, 2006  A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF  MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES (Computer Science)  THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)  August 2011  © Cong Lin, 2011  Abstract  Web based social networking services have afforded opportunities for people to enrich their social lives by building new relationships and sharing various kinds of information. While many people are interested in questions such as “what interesting social events are happening nearby that I can join?”, few existing social networking sites have successfully provided easy solutions for people to explore and organize real social activities.  This thesis represents LePlaza platform, an effort to offer an event-centered social networking platform through which the users can share real social events as well as scheduling their own events. To serve targeted events that are more likely to be of interest to a specific individual, a content-based user-interest model is proposed according to the assessment of the user‟s historical activity records and the description of the events. An intelligent recommender system built upon the user-interest model is developed to make personalized recommendations on restaurants as a particular case study in which events are parameterized by user‟s location, and the results are visualized in an interactive manner. Experiments and user studies show that the proposed model and algorithm effectively reflect the user‟s personal preferences with satisfying recommendation accuracy for the case of dining events.  ii  Table of Contents  Abstract .................................................................................................................................... ii Table of Contents ................................................................................................................... iii List of Tables ........................................................................................................................... v List of Figures ......................................................................................................................... vi Acknowledgements ............................................................................................................... vii Dedication ................................................................................................................................ 1 Chapter 1: Introduction ..................................................................................................... 1 1.1  Background and Motivation ................................................................................. 1  1.2  Thesis Goals and Contributions ............................................................................ 4  1.3  Structure of the Thesis .......................................................................................... 5  Chapter 2: Related Work ................................................................................................... 7 2.1  User Profiling and Interests Modeling .................................................................. 7  2.2  Recommender System ........................................................................................ 10  2.3  Existing Personalized LBS Systems ................................................................... 13  Chapter 3: LePlaza Platform – System Architecture .................................................... 15 3.1  Overall Architecture............................................................................................ 15  3.2  Friend Management ............................................................................................ 18  3.3  Event Management ............................................................................................. 19  Chapter 4: LePlaza Platform – The Recommendation System .................................... 20 4.1  Entity Representation .......................................................................................... 20  4.2  User Interest Model............................................................................................. 26  iii  4.3  Personalized Recommendation ........................................................................... 30  4.4  Summary ............................................................................................................. 33  Chapter 5: LePlaza Platform – Visualization ................................................................ 34 5.1  Dining History .................................................................................................... 34  5.2  Preference Analysis ............................................................................................ 35  5.3  Recommendation Results.................................................................................... 37  Chapter 6: Evaluation ...................................................................................................... 39 6.1  Effectiveness of Keyword Vector ....................................................................... 39  6.2  Determine Lubricating Factor ............................................................................. 43  6.3  User Study........................................................................................................... 46  Chapter 7: Conclusion ...................................................................................................... 50 7.1  Contributions....................................................................................................... 50  7.2  Future Work ........................................................................................................ 51  Bibliography .......................................................................................................................... 53 Appendix: Original Data of Restaurant Recommendation User Feedback .................... 56  iv  List of Tables  Table 4.1: Examples of Dining Activity ............................................................................... 21 Table 4.2: Representation of Restaurant Entity .................................................................... 25 Table 4.3: Representation of Restaurant Review .................................................................. 25 Table 4.4: Recommendation Algorithm ................................................................................ 32 Table 6.1: Intra-category Similarities .................................................................................... 41 Table 6.2: Inter-category Similarities .................................................................................... 42 Table 6.3: Precision Comparison ............................................................................................ 48 Table 6.4: Recall Comparison................................................................................................. 48  v  List of Figures Figure 1.1: Evolution of Location Based Search .................................................................... 4 Figure 2.1: Example of Topic Hierarchical Tree .................................................................... 9 Figure 3.1: LePlaza User Interface ....................................................................................... 16 Figure 3.2: LePlaza System Architecture ............................................................................. 17 Figure 3.3: Friend Tracking & Route Planning .................................................................... 18 Figure 4.1: An Example of Weighted Keyword Vector ........................................................ 25 Figure 4.2: Representation of Restaurant Entity ................................................................... 26 Figure 4.3: An Example of Dining History Tree ................................................................... 27 Figure 4.4: An Example of User Interest Tree ....................................................................... 28 Figure 5.1: Dining History Visualization............................................................................... 34 Figure 5.2: Summary Page of Dining History ....................................................................... 36 Figure 5.3: Detail Page of Dining History ............................................................................. 36 Figure 5.4: Customized Recommendation Result ................................................................. 37 Figure 5.5: Location Based Recommendation Result............................................................ 38 Figure 6.1: Measurement D Varies with 𝜶. ........................................................................... 46 Figure 6.2: Precision Comparison.......................................................................................... 48 Figure 6.3: Recall Comparison .............................................................................................. 49  vi  Acknowledgements   First of all, I would like to offer my enduring gratitude to my supervisor Dr. Son Vuong for his guidance. In the journey of two years, I have got great opportunities from him and his lab, and his support has always been a great motivation for me.    I would also like to give thanks to the second reader of my thesis, Dr. Eric Wohlstadter, and my lab mates: Andrew, Victor, Shahed, and all others for their kind help.    Special thanks are owed to my parents for their endless support in my life.  vii  Dedication Chapter 1: Introduction Social networking is a hot topic in internet computing area today. Rich contents can be posted and shared by people on social networking websites, including status, articles, photos, music, and videos. However, these are not typically what people do in their real social lives, where people are usually connected by various social events such as parties or sports games. Therefore, the first question is, can those social events be managed and shared by people on social network websites as conveniently as other stuffs? In addition, when talking about social events, people are often interested in questions like “Where does the event X take place?” or “What is happening at location Y?” The location based services are supposed to deal with such questions. However, the task of selecting some really interesting items among the huge number of choices returned by a location based search is left to users, because the traditional location based services have no idea of user‟s personal preferences. So the second question to be answered is, how to find upcoming events that are interesting to a specific user? In this thesis, a location-based, eventcentered social networking platform called LePlaza is presented, aiming to give answers to the questions above.  1.1  Background and Motivation  The introduction of the internet and web-based social networking sites such as Facebook and Twitter, has provided good answers to questions with “who”. People build relationships with friends in their real lives and strangers from every corner of the internet in  1  the cyberspace, where they chat, share status and photos, and play online social games together. Despite that many social networking sites have brought enormous fun to people‟s online social lives, few have been successfully provided a media that ties people‟s online social lives to their real lives, a platform that allows people to share, schedule, organize and get notified of real-life social events like an outdoor hiking or a live concert, with ease. To find the answers to questions with “where”, people could acquire a city tour guidebook or they would instead scan through the internet and search pages for keywords they are interested in. However, one may despondently find that either some of the places are not included in an old dated print guide or an upcoming event returned by traditional search engine is actually far away from where he lives. Traditional web search works as shown in Figure 1.1(a). The coming out of Location Based Service (LBS) has made such geographicallyrelated questions easier. Location Based Service is the use of certain mobile technical approaches through the internet/mobile network to access user‟s location information (latitude and longitude coordinate), and provides users with corresponding value-added services through the electronic map platform [1] [2]. Examples are restaurant finders, buddy trackers, applications in the areas of mobile marketing and mobile gaming. The attractiveness of LBSs comes from the fact that users are not required to enter location information manually but are automatically pin-pointed and tracked thanks to nowadays mobile devices. Some local event search services such as Eventful [3] and Zevent [4] help to find out public facilities and business in the surroundings and what is going to take place nearby, as shown in Figure 1.1(b). The result is usually presented with a list of upcoming events and their  2  addresses. However, as a classical music concert may not be attractive to a fan of rock and roll, the conventional location based search engine hardly knows the differences among people with different tastes, because there is insufficient way for them to express their personal preferences during search. Providing the same result to people with different tastes never satisfies all. Known as Context-Aware Service, an advanced location based service integrates personalized information and more targeted context services as shown in Figure 1.1(c). One very interesting class of context aware services are so called personalized applications [5], which suggests that “how well” an item fits a specific user. In the social life domain in this thesis, personalization means the use of knowledge of the user‟s personal preference on various social events as well as the location of the user to find place/events of interest with minimal effort on the user‟s part. Most of the personalized applications make use of a key technology called recommender system [6]. The recommender system was originally developed to address data overload issue, which is extremely important to applications running on hand held devices as the small screen and limited interactions make it difficult for navigating and searching trough large space of data. It has greatly contributed to success of e-commercial companies as Amazon and EBay, functioning well to recommend relevant products to consumers. However, despite the huge success the recommender system has made in commercial goods recommendations, few system delivering location based, personalized social activity information exists today.  3  Query  Search  Local Search  Query  (a) Traditional Web Search  Location (b) Local Search  Personalized Local Search  Query  Location  Personal Information  (c) Location Based Search with Personal Information  Figure 1.1: Evolution of Location Based Search  1.2  Thesis Goals and Contributions The overall goal of this thesis is to design, develop, implement and evaluate a  location based social networking platform prototype, on which users can publish, schedule, share, search and join various of social events. The aim also includes that the operations and resultant social events should be tailored to the user‟s personal preference and tries to make him feel in control of his experience in a visual manner. This goal will then be realized on the specific case of events - dining activities in Vancouver, resulting in a service that would help a user view his dining activities and make dining decisions according to his own historical dining records. The reason of choosing dining event as a particular case in the thesis is we believe that dining event is one of the most popular social events in people‟s social lives, and moreover, it is very convenient to retrieve dining-related data such as restaurants information and customers‟ reviews from criticism websites.  4  The main contributions of this thesis are: First, a location-based and social-event centered platform is designed and implemented. Some of the technical goals of the platform are:   User can publish, view, attend and invite other users to join social events;    User can perform an event search on his or her location and optional locations;    User can be presented the search result as selectable markers on a map and items in a list;    User can perform group operations on events and users.  Second, a new model of representation of user‟s personal preferences on social events based on user‟s historical social activities is presented. There are variety types of social events in the system, including education, sports, music, and so on. In this thesis a preference model of dining events is built and implemented, but the model itself is expected to be easily applied to other types of events in system. A recommending algorithm is run on pre-built preference model, aiming to give recommendations on local restaurants which meet the user‟s taste. Third, the user‟s history of social activities, preference model as well as recommendation results are visualized in an interactive and playful manner by using e-maps, graphs and diagrams.  1.3  Structure of the Thesis Chapter 2 discusses related work with respect to user interest modeling and  personalized recommendation, as well as the state of art of location based social platforms.  5  Chapter 3 gives an overall introduction to our event-centered social network platform, LePlaza, including system architecture, function modules and other technical details. Chapter 4 describes the steps of design and implementation of user preference model for dining events. A personalized recommendation algorithm running on the model will then be proposed. In Chapter 5, a user-interface is presented to visualize the model and results generated in previous chapter. Evaluations are made in Chapter 6. System parameters are adjusted based on experiments and final user experience evaluations are studied. The last chapter concludes the thesis.  6  Chapter 2: Related Work The most important issue for many personalized applications is to construct a computational model for each individual user to predict his preferences for upcoming events [7]. An accurate user profile is vital to the system‟s success as it largely depends on the ability of the model to represent the user‟s interests. In this chapter approaches of user profiling are firstly discussed, followed by recommendation algorithms running on the profile. Finally some state of the art relevant applications are presented.  2.1  User Profiling and Interests Modeling A user profile is a description of the user containing the most important facts about  him or her [8]. User profiling is a process of deriving unobservable features about users from observable information given by them or their actions in system. The most common contents of user profile include: user preferences and interests; the user‟s individual characteristics; user behavior patterns; the user‟s knowledge and skills; the user‟s goals. A user preference and interests model is a subset of a user profile as it is only concerned with modeling the following two aspects with respect to a user [7]: First, it models the interests of a user in some qualitative objects or concepts and second, it measures his preferences in these interests by some quantitative values. The two most common representations of user interests are keyword-based model and topic hierarchical model.  2.1.1  Keyword-based Interests Model In keyword-based models, user interests are represented by weighted vectors of  keywords. Weights traditionally represent the relevance or importance of the keyword for the  7  user or within the topic. The vector of keywords can be explicitly provided by the user, or they can be extracted from the description of products that user used to buy or view, the articles or Web documents visited by the user during browsing, etc. Amalthaea [9] is a typical system that creates keyword profiles by extracting them from web pages. The keywords are weighted with the widely used TF-IDF weighting scheme [10] and compared to the profile using the cosine formula [10]. Only the documents with vectors that are closest to the user profile are passed on to the user. Letizia [11] and Fab [12] did similar work in different application scenarios. In the personalized book recommendation system described in [7], each book is represented by a list of keywords referring to the categories and scope of the book. User‟s interest with respect to certain book category is then proportional to the frequency that user browses books containing the category keyword. Similar work can be found in PEA [13], which builds keyword-based profiles using terms extracted from the user‟s webpage bookmarks. The difference is that for each bookmark there is a set of keyword vectors generated. The rationale behind this is that the user‟s interests in multiple areas can be the combinations of sets of vectors under corresponding bookmarks. The keyword-based model is suitable for building user profile from unstructured data such as the description of an item or text in a document. The limitation is its inherent polysemy problem: the keyword “apple” could have totally different meanings to a nutritionist and an electronic device fancier.  8  2.1.2  Topic Hierarchical Interests Model Topic hierarchical model is usually represented by conceptual nodes and relationship  between these nodes. Each node in the hierarchy represents a topic of interest for a user, which is defined by a set of representative weighted words. A typical topic hierarchy can be seen in Figure 2.1. Some topics can have dozens or hundreds children while others may have few or none. In Sensus search project [14], the hierarchy contains approximately 70,000 nodes in total. Because building a deep and broad hierarchical model is mostly an expensive and manual work, profiles are often based on subsets of existing classifications. User Id  Travel  Sports  Hockey  Soccer  Real Madrid  MANUTD  Asia  ...  Movie  ...  Europe  Italy  Action  France  Figure 2.1: Example of Topic Hierarchical Tree  A lot of research has been carried out using topic hierarchies. Although no tree is created, the personalized search engine proposed in [15] introduced the concept of hierarchy by using hierarchical matrices instead to represent user‟s hierarchical relationship in his search history. In [16], [17], [18] interest trees are built up and each tree node is represented by a topic-weight pair. The user interests are then captured by a set of preference-vectors, which store the path information and node relationship from root node to a leaf node. These  9  approaches successfully address polysemy issue, however, they require every object has an explicit topic. In addition, the simple form of “topic-weight” is not powerful enough to represent complicated user interests. As an attempt to combine keyword-based model and hierarchical model, in the elearning system presented in [19], each leaf learning topic node is attached a set of weighted keywords, which are extracted from user‟s learning materials. A user interest score is then computed by the sum of the score from hierarchy part and the one from keywords part. It is the closest work to our interest model in the thesis. The difference is that in our model, a list of keywords is attached to every topic node not only to the leaf nodes. In summary, the good part of a hierarchical model is that it is able to store semantic relationship, which gives us many advantages over a flat structure, as we can utilize the inherent relationships between the nodes of the hierarchy to better understand the user‟s interest on a given context. The bad is that it is insufficient when dealing with unstructured data, where no explicit hierarchical structures are available. Therefore, it is believed that a hybrid model, which combines hierarchies and keyword vectors, would be more powerful when constructing interest model from mixed data sources.  2.2  Recommender System Recommender systems have been an active research area for more than a decade, and  many recommendation approaches with distinct strength have been proposed. In the literature, two types of recommendation approaches have been suggested: content-based filtering [20] and collaborative filtering [21].  10  2.2.1  Content-based Filtering Content-based filtering systems analyze the content of items unknown to the user,  compare them with pre-built user profile which is a representation of user‟s interests, and then identify items that are of particular interest to the user [22]. For example, a contentbased music recommendation system will typically rely on information such as style, instrument, player, and composer and will match these against the learned user preferences model in order to select a set of promising albums. A real case is Amazon.com [23]. Part of the Amazon‟s product recommendation results comes from its content-based feature, “favorites”. These favorites are either calculated by keeping track of the categories of items that users purchased or may be specified by the user. More work can be found in [24]. Content-based filtering is very efficient in text-intensive domains, and it has the advantage of immediate recommendations, i.e., no feedback from other users about the items is necessary. However, some well-know shortcomings of this approach are discussed in [25], [26], and [27]. Firstly, users can only be recommended with items that are similar to the ones they visited in the past. Secondly, sometimes it is difficult to formalize user perception and preference. Thirdly, the system will need to work with a sufficient amount of data.  2.2.2  Collaborative Filtering The rationale of Collaborative Filtering (CF) is the fact one is likely to enjoy an item  if his friends who share the similar tastes recommend it. Rather than determining the similarity between item and user preferences (as in Content-based approaches), CF analyzes users‟ historical ratings to identify similarities between users, and then generates new recommendations based on like-minded users‟ preferences.  11  A prototype of an academic event recommender system is discussed in [28]. The system constructs a researcher-academic event participating matrix, based on which similarity between any two researchers is computed. To make recommendations to a target researcher, a group of the most similar researchers is first selected and the rank of upcoming events is then determined by their aggregating ratings. Similar work is done in [29], [30]. A flaw of this method is that it requires users‟ feedback from events that haven‟t taken place yet. To address the problem that how to use the partial rating history to predict ratings for the remaining items that the users have yet to experience, [31] uses pair of event in past (Ei, Ej) to represent user preferences, where Ei is a liked event and Ej is a disliked event, and attempts to find a ranking function to the unknown event list that best captures past feedback (e.g. Ei is preferred over Ej) from the user. The method, however, is not really collaborative as it doesn‟t take the impact from other similar users into consideration. The strength of collaborative filtering is that it is completely independent of any machine readable representation of the objects being recommended, and works well for complex objects such as movies in which people‟s tastes and preferences vary [32]. A disadvantage of CF is the system is depended on many active users to ensure a sufficient amount of ratings and a proper distribution of the ratings across all the items. In summary, both content-based and collaborative approaches have their strengths and weakness. In our particular case where dining events are to be recommended, a contentbased method is preferred as the features of the contents are relatively easy to be extracted from a given data source and it is believed one‟s preference in food is mostly determined by his personal taste rather than other‟s opinions.  12  2.3  Existing Personalized LBS Systems Four personalized location based services are investigated in this section, including  two web-based services, one social networking application and one mobile phone application.  2.3.1  Loopt Loopt [33] is a mobile phone application, as shown in Figure 2.2, which allows users  to see the nearby buzz about the places and on-going events around them and where their friends are. The application is inspired by Google Latitude, which is a simple mobile locationbased application that allows the user to see the positions of his friends on Google map. Loopt adds more social elements to it by marking interesting places with highlighted features on map and encourage users to share comments. The recommendations on places are based on popularity of the place. Thus there is no personal preference considered.  2.3.2  GeoLife Project As a part of the GeoLife Project developed at Microsoft, [34] presents a web-based  tool that makes two types of recommendation: (1) Location recommendation given some activity query and (2) Activity recommendation given some location query. By mining GPS data and available user comments, the system is able to build up a location-activity correlation matrix. Recommendations can then be made due to the fact that certain activities are often performed at certain locations. No personal data is used in the system.  13  2.3.3  LivesGEO LivesGEO [48] is a location based multimedia sharing and tagging mobile application  developed by Network and Internet Computing Lab, Computer Department of UBC. Pictures and videos can be uploaded via mobile phone and marked on the map where people took them. Users can also search for contents shared by others on the map. LivesGEO and LePlaza project in this thesis complement each other: People schedule events on LePlaza, and share exiting moments of the events they attended using LivesGEO.  2.3.4  Eventer The Eventer [26] is a Facebook application that recommends upcoming concert  events to a specific user. The application uses an IP-GeoLocation mapping to get location data and retrieves concert events from a third music website. The combination of both content-based and collaborative methods is used to compute similarities between user and event, as well as user and user. The limitation is a user has no control of his location which is only determined by his IP address, and content-based similarity calculation relies on the data format from the specific music website.  14  Chapter 3: LePlaza Platform – System Architecture In this thesis, we developed LePlaza, a location based online social network which provides event-centered services such as social events publishing and searching, personalized event recommending, etc. The core concept of LePlaza is event. People are connected by events they attended and are going to attend. An event could be any kind of social activity like an academic conference, a weekend party or a soccer game on campus. Using the location information collected from mobile clients, LePlaza allows user to get real-time info of social events happening nearby, to know which friends are going to attend the event and to search potential participants nearby who have the similar interests. This chapter gives an overview of our LePlaza platform, including the system architecture and the introduction to each function module. More details about personalization module, which is the main work of this thesis, will be discussed in later chapters.  3.1  Overall Architecture A user interface of LePlaza after the user logged in is shown in Figure 3.1. Like many  mainstream social networking sites, the front page contains a navigation of functional links, a list of friends, and friends‟ news feed from left to right, respectively.  15  Figure 3.1: LePlaza User Interface  A high level view of the system consists of four functional components: User identification management, friend management, event management and messaging module. The complete system architecture is shown in Figure 3.2. ID management component keeps the user‟s profile information and validates when user login. Online chat and offline message are both supported by MSG management component, as well as the update of friends‟ news feed. The content in dash box is the core work of this thesis, which will be described in detail in the following chapters.  16  Login Module  ID Mngt.  User Profile  Basic Functions Add New Friends Friends Search & View  Friend Mngt.  Friends Tracking  Group Module  lePlaza  Basic Functions Public Event Module Event Mngt. Personalized Module  Publish & View Event Search Join Event  Preference Mngt. Recommendation Module  Friends Feed MSG Mngt.  On-line Chat  Off-line Message  Figure 3.2: LePlaza System Architecture  17  3.2  Friend Management Friend management module provides a set of basic operations such as view, invite,  search, etc. User can use a keyword-based search engine to search other user‟s profiles which are open to public, view their information and send new friend invitations. It is desirable that all the basic functions can be performed on a basis of group. This is useful when a user wants to schedule an event shared by a group of people. The group module allows the user to create and join groups, invite others and send messages to whole group or any selected members. Another feature of this module is it allows user to keep a track of his friends on the map. It also has a route planner to provide user a route to any selected friends. Figure 3.3 shows the interface for friend tracking.  Figure 3.3: Friend Tracking & Route Planning 18  3.3  Event Management Like friend management module, event management component also provides some  basic event operations in its Public Event sub-module. The event search is location-based and a search range can be specified by the user. Only events that match search condition and are within the given range are returned. In addition, more filters can be applied to search result such as event category and date. Any user can create new events and send event invitations to any selected friends either right after the events are created or later. A friend who decided to attend the event will be marked on an event-view map where all event-related people can be seen.  19  Chapter 4: LePlaza Platform – The Recommendation System As a specific case of social event, user‟s dining activities are studied in this chapter. Because the details of recommendation systems differ based on the representation of items, this chapter first discusses item representations. Next, a user interest tree is built upon the data representations. Finally a personalized restaurant recommendation service using user interest tree is suggested.  4.1  Entity Representation In order for the system to analyze item description to identify items that are of  particular interest to the user, presentations of two types of entities in system must be studied: the presentation of user‟s historical dining activity, which reflects his dining preferences, and the presentation of restaurant entity, which contains the feature and description of the restaurant.  4.1.1  Historical Dining Activity After a user having his meal, he could add a record of historical dining activity to  system database through the dining activity uploading interface of LePlaza. A table called Dining_Activity_Table is used by the system to store dining activities for all users. Table 4.1 shows the table with a few records that describe three dining activities.  20  Table 4.1: Examples of Dining Activity  ID  uid  rid  date  cost  rating with_friend  1001  14  bobs-subs-richmond  2011-05-04  12  4  1002  14  toshi-sushi-vancouver  2011-07-11  15  5  1003  15  red-robin-burnaby  2011-03-20  19  3  4;5;  10;  Each record has an ID as unique identifier. The rest of fields of a record are: user id (uid), restaurant id (rid), dining date, dining cost, user‟s rating and friends who had meal together. Uid and rid are foreign keys for User_Table and Restaurant_Table in database respectively. Rating is an integer from 1to 5 with minimum of 1, specified by user, indicating the satisfaction for the meal. The with_friend field contains a NULLABLE list of friend ids.  4.1.2  Restaurant Data Collection There are two types of restaurant data, structured and unstructured. The structured  data includes most information of the restaurant such as name, category, address, hours, and average rating, etc., whereas the unstructured data is users‟ reviews. All restaurant data is from Yelp [35], which is a social networking, user review and local search website [36]. Yelp offers online local search includes what the user is seeking (e.g. a restaurant) and the location from which the search is performed. The returned result usually contains both structured data and unstructured data (reviews).  21  Yelp also provides a set of APIs to complete the same task as its online local search. However, there is a limitation of using Yelp API to collect restaurant data, which is, the Yelp API only returns the most recent three reviews of a restaurant. This is not good enough for us to extract features for the restaurant from the reviews. Therefore, a web crawler is written to gather all users‟ reviews for a target restaurant. Steps to collect all restaurant data in Vancouver: Step I Call Yelp local search API with search parameters „city = Vancouver‟ and „type = restaurant‟. A list of restaurants is returned with all desired structured data and a page url for each restaurant on Yelp website. Store the structured data in Restaurant_Table. Step II For each restaurant page url, run the crawler to crawl all the reviews associated to the restaurant, and store them in Restaruant_Review_Table. We have collected 816 restaurants in Vancouver and 12,800 reviews in total.  4.1.3  Restaurant Entity The representation of restaurant entity consists of the representation of structured data  and unstructured data. The structured data is stored in Restaruant_Table and some of the important fields of the table is shown in Table 4.2, where category is the style of the restaurant, and url is the link to the detail page of the restaurant where reviews can be collected. Other fields that are not shown in the table include address, photo, phone, price, highlight_words of restaurant, etc.  22  The  unstructured  data,  which  is  user‟s  review,  is  firstly  store  in  Resturant_Review_Tables shown in Table 4.3, where ID is a unique identifier for the record, rid is the restaurant id, rev_rating is the user‟s rating for the restaurant. For each restaurant, a weighted keyword vector is attached to describe the features of the restaurant. The keyword vector is generated from the stored reviews. In order to compute a weighted keyword vector for each restaurant, a keyword dictionary needs to be generated first. The keyword dictionary, which is extracted from the contexts of reviews, contains all the keywords that will appear in keyword vector. To generate the keyword dictionary from users‟ review, three steps are to be performed. Step I Purifying. Since the reviews are user-generated, they usually contain spelling errors and non-recognizable words. It is necessary to make sure the dictionary doesn‟t contain invalid words. In addition, numbers and punctuations are also excluded. Step II Filtering. Not all the words in the reviews are useful, for example, prepositions and conjunctions are meaningless. In fact we are only interested in nouns which describe restaurant features and adjectives which might imply users‟ feelings. Therefore, all the meaningless words are to be filtered out. Step III Stemming. Removal of common word suffixes is done to decrease the size of dictionary. For instance, “apples” would be recognized as “apple” and “children” would be stored as “child” in the dictionary. The three steps are completed by using the WordNet [37] and JWNL APIs [38]. The WordNet is a large lexical database of English developed by Princeton University. Nouns, 23  verbs, adjectives and adverbs are grouped into sets of cognitive synonyms, each expressing a distinct concept. JWNL is a Java API for accessing the WordNet relational dictionary. The final keyword dictionary generated from users‟ review has 12,990 words in total, including all appeared nouns and adjectives. After the keyword dictionary is generated, we created a weighted keyword vector for each restaurant. The vector for a restaurant contains all 12,990 words in the dictionary, and the weight for each keyword in the vector is the number of times the keyword has appeared in the reviews of the restaurant. An example of a weighted keyword vector for a given restaurant is shown in Figure 4.1. The keyword vectors are stored in files. In summary, the representation of a restaurant entity is shown in Figure 4.2.  24  Table 4.2: Representation of Restaurant Entity  rid apgujungvancouver guu-vancouver-2  name category Apgujung korean;  latitude 49.28989  Guu  49.28422  au-petite-cafevancouver  Au Petite vietnamese; Cafe  japanese;  49.24123  longitude rating review_count url 123.13300 4 23 http://www.yelp.ca/biz/apgujungvancouver 123.12547 4.5 140 http://www.yelp.ca/biz/guuvancouver-2 123.10165 4 40 http://www.yelp.ca/biz/au-petitecafe-vancouver  Table 4.3: Representation of Restaurant Review  ID 12001 12002  rid acme-cafe-vancouver-3 au-petite-cafe-vancouver  rev_rating 5 4  12003  green-lemongrassrichmond  5  rev_date 5/19/2010 1/25/2011  rev_context This place is so cute!! I loved the atmosphere… Sandwiches! One of the best because of their very fresh and limited bread… 11/13/2010 Good Vietnamese place with a really clean and beautiful menu. The salad roll was …  sushi  spicy  grill  elegant  7  3  0  1  Figure 4.1: An Example of Weighted Keyword Vector  25  Structured Name, category … …  Database  Restaurant  Unstructured  File  Keyword vector Figure 4.2: Representation of Restaurant Entity  4.2  User Interest Model As discussed in Chapter 2, a keyword-based user interest model is able to make good  use of unstructured data while a hierarchical model is more powerful to deal with semantic issues. In this section a weighted tree, attached by a set of weighted keyword vectors, is proposed based on the user‟s dining history and representation of restaurants.  4.2.1  Dining History Tree For each user, a tree model that represents the user‟s historical activities is firstly built  up before his interest tree is created. In fact, it is the process of refactoring the record of user historical dining activities from a flat structure a tree structure. An example of dining history tree can be seen in Figure 4.3.  26  User ID  pizza  chinese  greek  #Times: 3 Ratings: 11  #Times: 1 Ratings: 4  #Times: 5 Ratings: 20  Bella Pizza  Camy's Pizza  Shanghai River  Maria's Taverna  Olympia Restaurant  #Times: 1 Ratings: 5  #Times: 2 Ratings: 6  #Times: 1 Ratings: 4  #Times: 3 Ratings: 13  #Times: 2 Ratings: 7  Figure 4.3: An Example of Dining History Tree  The root node of the tree stores user ID. The nodes at second level are category nodes, representing the categories of the restaurants that the user went before. Each node has two fields, „#Times‟, which is the number of times that user went to the restaurants that belong to the category, and „Ratings‟, which is the total ratings that the user gave to restaurants in the category. The leaf nodes are restaurants nodes which store the restaurants where the user had his meals in the past. Note that the summation of #Times and Ratings of all children nodes gives the value of corresponding fields in their parent node.  4.2.2  Interest Tree Representation The user interest tree is an extension of dining history tree, as shown in Figure 4.4.  Each node is attached a weighted keyword vector, which describes the features of the corresponding node. For example, the keyword vector of a restaurant node describes the feature of the restaurant, the keyword vector for a category node captures the features for all 27  restaurants under this category, and the keyword vector for root node is a description of all features that are preferred by the user. In addition, for both category nodes and restaurant nodes, an underlined field „Weight‟ is added, which indicates how likely the user will prefer the category of a specific restaurant.  User ID User KeywordVec  pizza  chinese  greek  #Times:3 Ratings:11 Weight: 0.333  #Times:1 Ratings:2 Weight: 0.061  #Times:5 Ratings:20 Weight: 0.606  Pizza KeywordVec  Chinese KeywordVec  Greek KeywordVec  Bella Pizza  Camy's Pizza  Shanghai River  Maria's Taverna  Olympia Restaurant  #Times: 1 Ratings: 5 Weight: 0.152  #Times: 2 Ratings: 6 Weight: 0.181  #Times: 1 Ratings: 2 Weight: 0.061  #Times: 3 Ratings: 13 Weight: 0.394  #Times: 2 Ratings: 7 Weight: 0.212  Keyword Vector  Keyword Vector  Keyword Vector  Keyword Vector  Keyword Vector  Figure 4.4: An Example of User Interest Tree  4.2.3  Computation of Interest Tree The computation of user interest tree consists of two parts, the computation for  keyword vectors and the computation for weight fields. They all follow a bottom-up way as described below. To compute keyword vectors, firstly, for each restaurant node, retrieve its keyword vector from keyword vector file generated in section 4.1.3, and attach it to the node. Then 28  iteratively compute the keyword vector for the parent node by summing up all keyword vectors of the children nodes. User keyword vector is created at last. Denote the keyword vector of a node p as 𝑝𝑘𝑣 , the computation can be formalized as 𝑖 𝑐𝑘𝑣  𝑝𝑘𝑣 = 𝑖  where 𝑐 𝑖 is a child of node p. To compute weight fields, let 𝑛𝑝𝑘 denote the node p at level k, where k = 0, 1, 2, 𝑊𝑝𝑘 denote the weight of 𝑛𝑝𝑘 , and 𝑅𝑝𝑘 denote the ratings of 𝑛𝑝𝑘 , the computation of the weight of node 𝑛𝑝𝑘 can be expressed as 𝑊𝑝𝑘  4.2.4  =  𝑅𝑝𝑘 𝑗  𝑅𝑗𝑘  Grey Node The user interests tree does not only show what the user likes but also reflect what the  user probably dislikes. For example, Figure 4.4 shows that user has gone to Chinese restaurants for only one time with a poor rating. It probably implies the restaurant, or even the type of food is of no interest to the user. Such a node is defined as „Grey Node‟. A bottom-up method is used to discover all the grey nodes. Firstly, for each restaurant node, if the values of both of its #Times and Ratings fields are below average value of the nodes at same level, then the node is marked as grey node. Then if all the restaurant nodes are grey nodes, the category node is also marked grey. Any restaurant that matches a grey node or is similar to a grey node shouldn‟t appear in the recommendation results. 29  4.3  Personalized Recommendation The recommendation problem can be generally formalized as follows [39]. Let U be  the set of users and P be the set of items to be recommended. A score function SF is defined to measure the score of item p in the perspective of user u, i.e., SF:𝑈 × 𝑃 → 𝑅, where R is a totally ordered set. To make a personalized recommendation for a given 𝑢 ∈ 𝑈, is to choose such item that maximizes the inferred score. Formally, we have ∀𝑢 ∈ 𝑈, 𝑝𝑢 = 𝑎𝑟𝑔𝑚𝑎𝑥𝑝∈𝑃 𝑆𝐹(𝑢, 𝑝)  4.3.1  Score Function In the case of restaurant recommendation, the restaurants to be recommended is  called target restaurants. The recommender system calculates a score for every target restaurant and recommends the ones with the higher scores. Given a target restaurant to the user, the score that the restaurant would get in the user‟s perspective consists of two parts, which are the category match score and the keyword vector match score. Formally, let r be 𝑢 𝑢 the target restaurant, u be the user, 𝑆𝑟𝑢 , 𝑆𝑟,𝑐 and 𝑆𝑟,𝑘 be the final score of the restaurant, the  category match score and the keyword vector match score respectively, it gives that 𝑢 𝑢 𝑆𝑟𝑢 = 𝛼 ∗ 𝑆𝑟,𝑐 + 1 − 𝛼 ∗ 𝑆𝑟,𝑘  where 𝛼 ∈ (0, 1) is the lubricating factor.  30  4.3.2  Category Match Score For a target restaurant r, if the category of r matches one category node in user‟s  interest tree, then the category match score is simply given by the weight of the category 𝑢 node, i.e., 𝑆𝑟,𝑐 = 𝑊𝑐 , where 𝑊𝑐 is the weight of the matched category node. Here the word  „match‟ means that the category of the target restaurant is the same as the one of a category node in user‟s interest tree. Note that a restaurant may have multiple categories that match multiple nodes in interest tree, resulting more than one category match scores. The strategy is to choose the highest match score as the final category match score, i.e., 𝑢 𝑆𝑟,𝑐 = 𝑚𝑎𝑥 𝑊𝑐  𝑐 ∈ 𝑠𝑒𝑡 𝑜𝑓 𝑚𝑎𝑡𝑐𝑕𝑒𝑑 𝑐𝑎𝑡𝑒𝑔𝑜𝑟𝑦 𝑛𝑜𝑑𝑒𝑠}  For target restaurant that doesn‟t match any category node, the category match score gives zero.  4.3.3  Keyword Vector Match Score For a target restaurant r that doesn‟t appear in user‟s dining history tree, if the  category of r matches one category node in user‟s interest tree, then the keyword vector match score is calculated by the cosine of the angle between the keyword vector of r and the keyword vector of the matched node. If no category node matched, then the keyword vector match score is calculated by the cosine of the angle between the keyword vector of r and the keyword vector of the root node.  31  Formally, let 𝑡𝑘𝑣 and 𝑐𝑘𝑣 be the keyword vector of r and matched category node c if any, respectively and let 𝑟𝑘𝑣 be the keyword vector of root node, the keyword vector match score is given by 𝑡𝑘𝑣 ∙ 𝑐𝑘𝑣 𝑢 𝑆𝑟,𝑘 =  𝑡𝑘𝑣 × 𝑐𝑘𝑣 𝑡𝑘𝑣 ∙ 𝑟𝑘𝑣 𝑡𝑘𝑣 × 𝑟𝑘𝑣  4.3.4  ,  𝑖𝑓 𝑐𝑎𝑡𝑒𝑔𝑜𝑟𝑦 𝑛𝑜𝑑𝑒 𝑚𝑎𝑡𝑐𝑕𝑒𝑑  ,  𝑜𝑡𝑕𝑒𝑟𝑤𝑖𝑠𝑒  Recommendation Algorithm The input of the recommender system are user‟s location L, which could either be an  address or a coordinate, and a user‟s preferred search range R, for example, 3 kilometers. The output is an ordered list of 10 restaurants that have the highest scores. The algorithm for recommending new restaurants to the user is described in Table 4.4. Table 4.4: Recommendation Algorithm  1. 2. 3.  4.  Recommendation Algorithm: Generate a candidate set C that contains all restaurants at within range of R location L; Initialize result set RS = {}; For each restaurant r in set C, find the matched category node Nc in user interest tree with the highest weight: a) If Nc is a grey node, continue; b) If Nc exists, sum up the category match score and keyword vector match score as the final score s, then add r with s to RS; c) If Nc is null, calculate keyword vector match score using root node of interest tree as the final score s, then add r with s to RS; Sort RS with respect to final score s of each entry r, and return the top 10 entries.  32  4.4  Summary This chapter is the core work of the thesis. The entity representation is first  investigated, based on which a hybrid user interest model is proposed. The advantage of this model is that it makes good use of unstructured data by having each node in the model attached a keyword vector, without losing the inherent hierarchical relationship between nodes. The interest mode is also able to discover what the user doesn‟t like. In addition, a content-based personalized recommendation algorithm is suggested based on user interest model. The recommender system takes both of the restaurant style which is reflected by the weight of category node, and the features of restaurant which are represented by keyword vectors, into consideration. A few of known limitations of the recommendation algorithm are, first, it is purely content-based, which implies that only the restaurants that are similar to the ones that user went before can appear in the recommended result; second, the algorithm wouldn‟t work well when the user doesn‟t have enough number of historical dining activities. More discussions regarding to the validation of keyword vectors, the configurations of parameters and the effectiveness of the algorithm can be found in Chapter 6.  33  Chapter 5: LePlaza Platform – Visualization LePlaza provides a user interface that visualizes user‟s dining history, interests model and recommendation results. The aim of the visualization UI is to offer the user a straightforward view of his historical dining activities, a detailed analysis of his dining interests using charts and graphs, and an interactive manner of making a recommendation.  5.1  Dining History User can view his historical dining activities by entering My Dining Activity page.  Figure 5.1 is a snap of dining history visualization.  Figure 5.1: Dining History Visualization  By default, user‟s dining activities in the last 3 months will be marked on the map, with different icons representing different types of food. The larger the icon is, the more 34  numbers of times the user has been there before. By dragging the slide bar above the map, user is allowed to choose to display his dining activities from last year to last two weeks. The detail of a dining activity will pop up when clicking any mark on the map, which includes the restaurant information such as ratings and address, and dining date. On the right side is a list showing all the dining activities that are currently displayed on the map. The list can be sorted by number of times or the user‟s rating for the dining activity.  5.2  Preference Analysis The user‟s interest model is visualized in My Preference page. In the preference  summary page as shown in Figure 5.2, a pie chart is used to show the proportion that each type of food takes in the user‟s all dining activities. A bar chart on the right side shows the user‟s monthly dining activities, in terms of number of times. These charts would help the user to have a better understanding of his own preferences and dining behaviors. A text summary is provided at the bottom of the page, including the total number of dining activities, the average rating and cost, the most preferred food category and so on. In the preference detail page, the pie chart is broken into separate parts, each of which representing a food category, with different sizes meaning different proportion in dining activities as shown in Figure 5.3. By double clicking any food category, the user can see a bar chart on the right showing the monthly dining activities within in one year of the selected  35  food category, with a text summary including the average rating, cost, and some highlighted features of the category.  Figure 5.2: Summary Page of Dining History  Figure 5.3: Detail Page of Dining History 36  This page also allows the user to play with his preference by dragging and dropping any food category to the pink circle in the middle. A window for recommendation will show up when some category is dropped in the drop area, and user can then be recommend with restaurants that serves the selected types of food within any specific range from user‟s current location. Up to 10 recommended restaurants will be ranked and displayed on the map on right side, as shown in Figure 5.4.  Figure 5.4: Customized Recommendation Result  5.3  Recommendation Results In My Recommendations page, a map is firstly displayed with the user‟s current  location marked in the middle, which is the default location where restaurants nearby will be recommended. However, user is free to change the location by dragging and dropping the 37  marker on the map. A recommendation event is fired and the result is ranked and displayed whenever the location is changed. The search range can be specified before moving the location marker and a list of recommended restaurants will be displayed on the right afterwards. Figure 5.5 gives an example of recommendation result.  Figure 5.5: Location Based Recommendation Result  38  Chapter 6: Evaluation In this chapter, the effectiveness of user interest model and personalized recommendation will be evaluated. The parameter used in recommendation module is determined by experiments. A user study is conducted and feedback collected from participants is provided as a proof of system feasibility.  6.1  Effectiveness of Keyword Vector Recall that in the personalized recommendation discussed in section 4.3.1, the final  score for a target restaurant to be recommended to a specific user is given by 𝑢 𝑢 𝑆𝑟𝑢 = 𝛼 ∗ 𝑆𝑟,𝑐 + 1 − 𝛼 ∗ 𝑆𝑟,𝑘 𝑢 𝑢 where the first part 𝑆𝑟,𝑐 is the category match score and the second part 𝑆𝑟,𝑘 is the keyword 𝑢 match score. 𝑆𝑟,𝑐 is just the weight of the matched category node in user interest tree, which  is pre-computed and stored and can be easily read from the tree when making a 𝑢 recommendation. In the second part, 𝑆𝑟,𝑘 measures the similarity between the target  restaurant and the matched node in user interest tree, which is also the cosine value of the 𝑢 angle between the two corresponding keyword vectors. However, the calculation of 𝑆𝑟,𝑘 , on  the other hand, is not that straightforward because it is not a pre-computed value that can be viewed or retrieved somewhere. Therefore, in order to evaluate the effectiveness of using the extracted keyword vectors as a measurement for the similarity between two restaurants and to have an intuitive  39  understanding of quantity of the similarity, we defined intra-category similarity and intercategory similarity respectively.  6.1.1  Intra-category Similarity Firstly, the intra-category similarity for a given category is defined as the average  similarity between any of the two restaurants within the category. Formally, let 𝐶𝑖 be the ith category, 𝑅𝑖 is the set of restaurants within the category, and 𝑆𝑖  is the intra-category  similarity for 𝐶𝑖 , then 𝑆𝑖 is given by 𝑆𝑖 =  𝑢∈𝑅𝑖, 𝑣∈𝑅𝑖 ,𝑢≠𝑣 𝑠𝑖𝑚(𝑢, 𝑣)  𝑁𝑖  where u and v are restaurants within category 𝐶𝑖 , sim(u,v) is the similarity between u and v, and 𝑁𝑖 is the number of (u, v) pairs in 𝐶𝑖 . We can also define 𝑆𝑖,𝑚𝑎𝑥 and 𝑆𝑖,𝑚𝑖𝑛 in the same way, 𝑆𝑖,𝑚𝑎𝑥 = 𝑚𝑎𝑥 𝑠𝑖𝑚 𝑢, 𝑣  𝑢 ∈ 𝑅𝑖, 𝑣 ∈ 𝑅𝑖 , 𝑢 ≠ 𝑣}  𝑆𝑖,𝑚𝑖𝑛 = 𝑚𝑖𝑛 𝑠𝑖𝑚 𝑢, 𝑣  𝑢 ∈ 𝑅𝑖, 𝑣 ∈ 𝑅𝑖 , 𝑢 ≠ 𝑣}  where 𝑆𝑖,𝑚𝑎𝑥 means the maximum similarity between two restaurants within category Ci and 𝑆𝑖,𝑚𝑎𝑥 means the minimum similarity within Ci. Table 6.1 shows the 𝑆𝑖,𝑚𝑎𝑥 , 𝑆𝑖,𝑚𝑖𝑛 and 𝑆𝑖 for the largest ten categories.  40  Table 6.1: Intra-category Similarities  Category  Number of Rests  𝑆𝑖,𝑚𝑎𝑥  𝑆𝑖,𝑚𝑖𝑛  𝑆𝑖  Hotdogs  45  0.9257702  0  0.0950225  Chinese  42  0.7988825  0  0.2257978  Indpak  39  0.7942415  0  0.3264948  Japanese  38  0.8487532  0.04476907  0.3394094  Seafood  36  0.8783334  0  0.2552037  breakfast_brunch  32  0.9234331  0.01918471  0.250193  Burgers  32  0.7062084  0  0.1843502  Sandwiches  32  0.7303357  0  0.1723489  Newamerican  28  0.6733683  0.03160561  0.3013634  Mideastern  25  0.6962025  0.00847397  0.2909511  The table shows that the similarities between restaurants are nearly distributed in the range of (0, 1) with a mean around 0.3. The 𝑆𝑖,𝑚𝑎𝑥 column shows that under each category, there exists some restaurants that are pretty similar to each other. More results show that the maximum similarity among all restaurants is 0.9257702, which is given by two hotdogs restaurants that both sell japa-dogs, „japa-dog-vancouver‟ and ‟japa-dog-vancouver-2‟, located at different streets in Vancouver downtown. This makes sense because it is highly possible that a  japa-dog lover will get recommended with other japa-dogs nearby when he is hanging around.  6.1.2  Inter-category Similarity  41  Secondly, the inter-category similarity is defined as the similarity between two categories. For a given pair of categories, the inter-category similarity is the average similarity between any of the two restaurants, each of which falls into one category. Formally, let 𝐶𝑖 and 𝐶𝑗 be two given categories, and 𝑅𝑖 and 𝑅𝑗 are the sets of restaurants that belong two 𝐶𝑖 and 𝐶𝑗 respectively. Let 𝑆𝑖,𝑗 be the inter-category similarity for 𝐶𝑖 and 𝐶𝑗 , which is given by, 𝑆𝑖,𝑗 =  𝑢∈𝑅𝑖, 𝑣∈𝑅𝑗  𝑠𝑖𝑚(𝑢, 𝑣)  𝑁𝑖,𝑗  where u and v are restaurants within category 𝐶𝑖 and 𝐶𝑗 respectively, and 𝑁𝑖,𝑗 is the number of (u, v) pairs. Table 6.2 gives the inter-category similarities for 6 selected typical food categories: Greek, Italian, French, Chinese, Malaysian and Thai food. Table 6.2: Inter-category Similarities  Greek  Italian  French  Italian  0.20465  French  0.211486  0.29955  Chinese  0.14474  0.168456 0.184176  Chinese  Malaysian  0.159773 0.162888 0.184505 0.235711  Thai  0.141429 0.153315 0.173946 0.255981  Malaysian  0.278918  The average of the whole table is 0.1973. It can be observed that the values in upperleft triangle and the bottom-right triangle are all above average, and the rest are below average. The result corresponds well with the fact that Greece, France and Italy are geographically close to each other and they have pretty similar food style. While the other 42  group of country, China, Malaysia and Thailand has the same conclusion, it can be seen that there is not too much in common for categories such as Thai and Greek, which is also the truth as we know. Not surprisingly, the conclusion draw from the two similarity experiments applies well with people‟s common sense. It does demonstrate that the keyword vectors that are extracted from users‟ reviews work well and they are effective measurements for making recommendations.  6.2  Determine Lubricating Factor One parameter to be determined is the lubricating factor 𝛼 in the recommendation  score  function 𝑢 𝑢 𝑆𝑟𝑢 = 𝛼 ∗ 𝑆𝑟,𝑐 + 1 − 𝛼 ∗ 𝑆𝑟,𝑘  𝑢 The factor is used to keep the balance of the category match score 𝑆𝑟,𝑐 and the keyword 𝑢 vector match score 𝑆𝑟,𝑘 that a target restaurant would get. The intuition is that if 𝛼 is too large  (close to 1), then the category match score will dominate the final score, resulting in a very limited coverage of the recommendation result as the most of recommended restaurants will probably fall only into a very few of categories preferred by the user. On the other hand, if 𝛼 is too small (close to 0), the user would probably be disappointed to see he is recommended with a lot of unfamiliar cuisines that he hardly tried before, which doesn‟t reflect the user‟s preferences on food styles well. 6.2.1  Measurement D 43  To determine a proper value of 𝛼, some measurements need to be defined first. We define the user interest distribution vector for a given list of food categories as the vector containing in sequence the percentage of each category in the user‟s historical dining activities, denoted by 𝑉 𝑙𝑢 , where 𝑙 is list of categories. For example, if a user had 10 dining activities in the past, with 5 times in Japanese food, 3 times French food and 2 times pizza, then the user interest distribution vector for category list (Japanese, French, pizza) is 𝑉 𝑙𝑢 = (0.5, 0.3, 0.2). Similarly, we also define the global distribution for a given list of food category as the percentage of each food category of all restaurants in database, denoted by 𝑉𝑔𝑙 ,. For example, in all 816 restaurants we have, there are 45 Japanese, 20 French and 23 pizza restaurants. Thus the global distribution vector for category list (Japanese, French, pizza) is 𝑉𝑔𝑙 = (45/816, 20/816, 23/816). In the same way, we can define the recommended distribution vector for a given list of food category in the recommendation results, which is the percentage of each category in recommendation results, denoted by 𝑉 𝑙𝑟𝑒𝑐 . Let‟s still take the list (Japanese, French, pizza) as an example. If the recommended distribution vector is very close to the user interest distribution vector 𝑉 𝑙𝑢 , this implies the factor 𝛼 is too large and the food categories dominate recommendation results. If the recommended distribution vector is very close to the global distribution vector 𝑉𝑔𝑙 , this implies the factor 𝛼 is too small and the user‟s preferences on food category are not reflected by recommendation results.  44  Therefore, it is reasonable to keep 𝑉 𝑙𝑟𝑒𝑐 in between 𝑉 𝑙𝑢 and 𝑉𝑔𝑙 , neither too far from 𝑉 𝑙𝑢 nor too far from 𝑉𝑔𝑙 . The distance between the vectors are defined as the measurement D where 𝐷= where  𝑉 𝑙𝑟𝑒𝑐 −𝑉𝑙𝑢  + 𝑉 𝑙𝑟𝑒𝑐 −𝑉𝑔𝑙  is the Euclidean distance between two vectors. It is obvious that the  recommendation results vary with 𝛼, resulting in the change of 𝑉 𝑙𝑟𝑒𝑐 and thus the change of D. The purpose is then to select the factor 𝛼 such that 𝛼 = 𝑎𝑟𝑔𝑚𝑖𝑛(𝐷)  6.2.2  Experiment Result In the experiment, we created 6 fake users, and each of the users has 40 automatically  generated dining activities that are nearly evenly distributed in 8 randomly picked food categories. For instance, fake user 1 has 40 dining activities for food categories (hotdogs, sandwiches, greek, foodstands, sushi, dimsum, creperies, ethnicmarkets) with a user interest distribution vector 𝑉 𝑙𝑢 = 0.15, 0.15, 0.175, 0.125, 0.125, 0.125,0.125, 0.125 . The corresponding global distribution vector 𝑉𝑔𝑙 is 𝑉𝑔𝑙 = (0.0551, 0.0392, 0.0281, 0.0270, 0.0257, 0.0245, 0.0196, 0.0147) . With 𝛼 varying from 0.1 to 0.8, the recommendation algorithm is run against 6 fake users and for each of the users 100 restaurants are recommended among all 816 restaurants. A recommended distribution vector 𝑉 𝑙𝑟𝑒𝑐 is then calculated from the recommendation result for  45  each user and finally the average distance D is computed. Figure 6.1 shows how the measurement D varies with 𝛼.  Average Distance (D) 0.225 0.22 0.215 0.21 0.205 0.2 0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  Figure 6.1: Measurement D Varies with 𝜶.  The figure shows that the curve of average distance is concave which has the minimum value when 𝛼 = 0.3, and 𝛼 = 0.4 gives the similar result. Therefore, we fix 𝛼 to 0.3 in recommendation score function for user evaluations discussed in next section.  6.3  User Study To evaluate the effectiveness of the personalized recommender system a user test  with 8 participants was conducted. Each user was asked to give 30-50 historical dining activities according to his personal preference, including the number of times and ratings, in 100 randomly picked restaurants. Based on the provided dining activities, a list containing the top 10 restaurants among another randomly selected 100 restaurants is recommended to  46  the user. Feedback of the recommendation result from the participants are collected and studied.  6.3.1  Metrics Various metrics have been used to evaluate the performance of recommender systems.  We choose the most common two metrics in this thesis, which are precision and recall. Precision P is defined as the ratio of the number of relevant items to the total number of items in the recommendation result, and recall R is defined as the ratio of the number of relevant items to the number of all relevant items available, as shown in formula [40] below. 𝑃=  𝑁𝑟𝑠 𝑁𝑠  ,𝑅=  𝑁𝑟𝑠 𝑁𝑟  where 𝑁𝑠 is the total number of items in recommendation result, 𝑁𝑟 is the total number of relevant items, and 𝑁𝑟𝑠 is the number of relevant items that appear in recommendation result. In our test case, 𝑁𝑠 is always equal to 10, 𝑁𝑟𝑠 is the number of the recommended restaurants that the user is satisfied with, and 𝑁𝑟 is the total number of restaurants that are interesting to the user and may or may not be included in the recommendation result.  6.3.2  Experiment Results We compared our personalized recommendation module in LePlaza system with a  simple traditional one that makes recommendations according to the average ratings and number of reviews generated by all users. Table 6.3 and Figure 6.2 show the precision of the  47  two systems from all 8 participants, who are denoted by P0 to P7. Table 6.4 and Figure 6.3 show the recall values. Table 6.3: Precision Comparison  P0  P1  P2  P3  P4  P5  P6  P7  LePlaza  0.8  0.8  0.9  0.9  0.9  0.9  0.9  1  Traditional  0.5  0.7  0.6  0.8  0.8  0.7  0.7  0.4  lePlaza  Precision  Traditional  1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 P0  P1  P2  P3  P4  P5  P6  P7  Figure 6.2: Precision Comparison  Table 6.4: Recall Comparison  P0  P1  P2  P3  P4  P5  P6  P7  LePlaza  0.62  0.89  0.90  0.75  0.69  0.90  0.69  1  Traditional  0.33  0.58  0.50  0.82  0.57  0.64  0.58  0.33  48  lePlaza  Recall  Traditional  1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 P0  P1  P2  P3  P4  P5  P6  P7  Figure 6.3: Recall Comparison  Some observations can be made. Firstly, for almost all users, our personalized recommender system performs much better than the traditional one, which demonstrates that the proposed user interest model reflects users‟ interests. Secondly, it can be seen from Figure 6.2 that the precision values of LePlaza for 8 users appear stable and they are all above 80%, which suggests that the LePlaza recommender system would be effective in real use. Thirdly, there are some obvious variations for recall metric in Figure 6.3. Even for LePlaza the best case is 100% while the worst case is only 69%. This is probably because in the experiment we hard-coded to recommend only the top 10 restaurants rather than all possible relevant restaurants. In addition, 4 out of 8 participants complained that the first restaurant that appeared in the recommended results was not satisfying when using the simple traditional recommender system, while this number of people decreased to 1 for our LePlaza system.  49  Chapter 7: Conclusion People often make efforts to enrich their social lives by seeking answers to questions like “Are there any exciting movies coming out next week that I can enjoy with my friends?”, “Is there a party taking place in my neighborhood?” or “Which restaurant nearby fits my taste best?”. To answer such questions, LePlaza, which is a location-based and eventcentered social networking platform, is presented in this thesis.  7.1  Contributions Firstly, the whole framework and fundamental functions of LePlaza system are  designed and implemented, on which users can create, browse, search and manage variety types of social events. Moreover, group operations are supported so that the user can manage friends and events in group with ease and flexibility. Secondly, as a particular case of events, restaurants and user‟s dining activities are studied, based on which a content-based hybrid user interest model is presented. Experimental results show that the proposed model can capture user‟s dining preference effectively. Thirdly, a personalized recommendation algorithm is proposed. It is location-based and can recommend restaurants that meet the user‟s taste within any given area according to the user interest model. A user study shows that the performance of the recommendation is satisfying with an average accuracy over 80%.  50  Finally, both user interest model and recommendation result are visualized in an interactive manner, aiming to give the user a better understanding of his preference and make him feel in control of his experience.  7.2  Future Work   Generalized user interest model. The current user interest model is built on top of the dining events only. It would be interesting to apply the interest model to other types of events. For example, a real estate application can use this model to give suggestions to people who are seeking for houses, according to location information and house descriptions. Other applications in which the model applies include shopping guide, academic conference recommendation, etc. In all, the model is expected to be generally applicable to many other applications since the tree structure in the model can still be remained and new nodes for other types of event could be easily attached to the tree.    Group interest model. To come up with a way that combines individual interest models to get a model for a group of people would be useful when an agreement or a decision has to be made among group members. For example, what event will be probably enjoyed by most of the people in a group when they are going to have some group activities?  51    Mobile side support. Location based services are often associated with mobile applications. With a LePlaza client installed on mobile phone, users will be able to find interesting events in real time at any places.  52  Bibliography  [1] Koeppel, I. “What are location services from a GIS Perspective”. ESRI white paper, 2000. [2] Shiode, N., Li, C., Batty, M., Longley, P., & Maguire, D. “The impact and penetration of location-based services”. In H. A. Karimi & A. Hammad (Eds.), Telegeoinformatics: location-based computing and services, pp. 349–366, 2004. [3] Eventful website. http://www.eventful.com. [4] Zevent website. http://www.zevent.com. [5] W. Schwinger, Ch. Grün, B. Pröll, W. Retschitzegger, A. Schauerhuber. “Contextawareness in mobile tourism guides – a comprehensive survey”. Technical report, 2005. [6] K. Kabassi. “Personalizing recommendations for tourists”. Telematics and Informatics, 27(1): 51-66, 2010. [7] M. Späth. “A Novel Approach to User Interest and Preference Modelling in a Sightseeing Planner for Dublin”, MSc Dissertation, Trinity College Dublin, 2007. [8] Y.J. Park, K.N. Chang. “Individual and group behavior-based customer profile model for personalized product recommendation”. Journal of Expert Systems with Applications, 36(2): 1932-1939, 2009. [9] Moukas, A. “Amalthaea: Information Discovery And Filtering Using A Multiagent Evolving Ecosystem”. In Applied Artificial Intelligence, 11(5): 437-457, 1997. [10] Salton, G., McGill, M. “Introduction to Modern Information Retrieval”. McGraw-Hill, 1986. [11] Lieberman, H. “Letizia: An Agent That Assists Web Browsing”. In Proceedings of the 14th International Joint Conference On Artificial Intelligence, Montreal, Canada, pp. 924-929, 1995. [12] Balabanovic, M., Shoham, Y “Fab: Content-Based Collaborative Recommendation”. In Communications of the ACM, vol. 40, no. 3, pp. 66-72, 1997. [13] Montebello, M., Gray, W., Hurley, S. “A Personal Evolvable Advisor for WWW Knowledge-Based Systems”. In Proceedings of the 1998 International Database Engineering and Application Symposium (IDEAS’98), Cardiff, Wales, U.K, pp. 224-233, 1998. [14] Gabrilovich, E., Markovitch, S. “Harnessing the expertise of 70,000 human editors: Knowledgebased feature generation for text categorization”. Journal of Machine Learning”, vol. 8, pp. 2297–2345, 2007. [15] Liu, F., Yu, C., and Meng, W. “Personalized Web Search for Improving Retrieval Effectiveness”. IEEE Transactions on Knowledge and Data Engineering, 16(1): 28-40, 2004. [16] H. H. Syed, P. Andritsos, “A Lightweight Tree Structure to Model User Preferences”. In Proceedings of PersDL, June 2007. [17] Liu, D., “The Design and Research of User Interest Model in Personalized Search 53  Engine”. In Proceedings of Asia-Pacific Information Processing, pp. 639-642, 2009. [18] Strobbe M., Van Laere O., Dauwe S., Dhoedt B., De Turck F., Demeester P., van Nimwegen C., and Vanattenhoven J. “Interest based selection of user generated content for rich communication services”. Journal of Network and Computer Applications, 33(2): 84-97, 2010. [19] Lee, B., Kim, J., & Jung, J. “Location-Based Service with Context Data for a Restaurant Recommendation”. Journal of Database and Expert Systems Applications, vol. 4080, pp. 430-438, 2006. [20] Resnick, P., & Varian, H. R. “Recommender systems”. Communications of the ACM, 40(3): 56–60, 1997. [21] Wang, Y. F., Chuang, Y. L., Hsu, M. H., & Keh, H. C. “A personalized recommender system for the cosmetic business”. Expert Systems with Applications, 26(3): 427–434, 2004. [22] Min, S. H., & Han, I. “Detection of the customer time-variant pattern for improving recommender systems”. Expert Systems with Application, 28(2): 189–199, 2005. [23] Amazon website. http://www.amazon.com. [24] Pazzani MJ, Billsus D. “Content-based recommendation systems”. Lecture notes in computer science. Berlin, vol. 4321, pp. 325–41, 2007. [25] A. Hinze and S. Junmanee. “Travel recommendations in a mobile tourist informationsystem”. In Proceedings of Information Systems and its Application (ISTA), Palmerston North, New Zealand, 2005. [26] M. Kayaalp, T. Özyer, S.T. Özyer. “A Collaborative and Content Based Event Recommendation System Integrated With Data Collection Scrapers and Services at a Social Networking Site”. In Proceedings of IEEE Advances in Social Network Analysis and Mining, pp. 113-118, 2009. [27] Gao, H., Li W., “A Hotel Recommendation System Based on Collaborative Filtering and Rankboost Algorithm”. In Proceedings of the Second International Conference on MultiMedia and Information Technology, vol. 1, pp.317-320, 2010. [28] R. Klamma, P.M. Cuong, and Y. Cao. “You Never Walk Alone: Recommending Academic Events Based on Social Network Analysis”. Journal of Social Informatics and Telecommunications Engineering, 4(1): 657-670, 2009. [29] A. Basso, M. Milanesio, A. Panisson, G. Ruffo. “From Recordings to Recommendations: Suggesting LiveEvents in the DVR Context”. In Proceedings of RecSys 2010, Barcelona, Spain, 2010. [30] W. Chu, S. Park. “Personalized recommendation on dynamic content using predictive bilinear models”. In Proceedings of the 18th international conference on World Wide Web, Madrid, Spain, 2009. [31] E. Minkov, B. Charrow, J. Ledie, S. Teller, T. Jaakola. “Collaborative Future Event Recommendation.” In Proceedings of the 19th ACM international conference on Information and knowledge management, pp. 819-828, 2010. [32] Jonathan L. Herlocker, Joseph A. Konstan, Loren G. Terveen, and John T. Riedl. 54  [33] [34]  [35] [36] [37] [38] [39]  [40]  [41]  [42] [43] [44]  [45]  [46]  [47] [48]  “Evaluating collaborative filtering recommender systems”. ACM Trans. Inf. Syst., 22(1):5-53, 2004. Loopt website. http://www.loopt.com/. W. Zheng, Y. Zheng, X. Xie, Q. Yang. “Collaborative Location and Activity Recommendations With GPS History Data”. In Proceedings of the 19th international conference on World Wide Web, 2010. Yelp website. http://www.yelp.com/. Wiki page of Yelp. http://en.wikipedia.org/wiki/Yelp,_Inc. WordNet project. http://wordnet.princeton.edu/. JWNL project. http://sourceforge.net/projects/jwordnet/. Adomavicius, G. and Tuzhilin, A. “Toward the Next Generation of Recommender Systems: A Survey of the State of-the-Art and Possible Extensions”. Journal of IEEE Trans. Knowledge and Data Eng., 17(6): 734-749, 2005. Herlocker, J. L., Konstan, J. A., Terveen, L. G., & Riedl, J. “Evaluation colaborative filtering recommender systems”. ACM Transactions on Information Systems, 22(1): 5–53, 2004. Y. Park and N. Chang. “Individual and group behavior-based customer profile model for personalized product recommendation”. Journal of Expert Systems with Applications, vol. 36, pp. 1932–1939, 2009. R. Hu. “Design and user issues in personality-based recommender systems”. In Proceedings of the fourth ACM conference on Recommender systems, 2010. S. Schiaffino, A. Amandi. “Intelligent User Profiling”. Journal of Artificial Intelligence, pp. 193-216, 2009. J. J. Levandoski, M. F. Mokbel, and M. E. Khalefa. “Caredb: A context and preferenceaware location-based database system”. In Proceedings of PVLDB, 3(2):1529–1532, 2010. Fan, W., Gordon, M. D., & Pathak, P. “Effective profiling of consumer information retrieval needs: A unified framework and empirical comparison”. Journal of Decision Support Systems, 40(2): 213–233, 2005. A. Miele, E. Quintarelli, and L. Tanca. “A methodology for preference-based personalization of contextual data”. In Proceedings of the 12th Intl Conf. on Extending Database Technology, pp. 287–298, 2009. Simon R. and Frőhlich P. “A mobile application framework for the geospatial Web”. In Proceedings of the 16th Intl. Conf. on World Wide Web. ACM Press: 381-390, 2007. Y. Chung. “LIVESGEO: A Location-based Multimedia Knowledge Sharing System”. Master Thesis. The University of British Columbia. 2011.  55  Appendix: Original Data of Restaurant Recommendation User Feedback Eight participants were asked to fill “Bad?” column with number “1” or “2” according to the recommended restaurants. Number “1” represents that the user is not happy with the restaurant, and number “2” means the user is not satisfied with the rank of the restaurant in the result. P0.xls  P1.xls  P2.xls  56  P3.xls  P4.xls  P5.xls  57  P6.xls  P7.xls  58  

Cite

Citation Scheme:

        

Citations by CSL (citeproc-js)

Usage Statistics

Share

Embed

Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                        
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            src="{[{embed.src}]}"
                            data-item="{[{embed.item}]}"
                            data-collection="{[{embed.collection}]}"
                            data-metadata="{[{embed.showMetadata}]}"
                            data-width="{[{embed.width}]}"
                            async >
                            </script>
                            </div>
                        
                    
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.24.1-0052105/manifest

Comment

Related Items