UBC Theses and Dissertations
Quantifying the Value of Peer-Produced Information in Social Tagging Systems. Santos-Neto, Elizeu (2014)

Quantifying the Value of Peer-Produced Information in Social Tagging Systems

by

Elizeu Santos-Neto

B. Computer Science, Universidade Federal de Alagoas, 2002
M. Computer Science, Universidade Federal de Campina Grande, 2004

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy in THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Electrical and Computer Engineering)

The University of British Columbia (Vancouver)

August 2014

© Elizeu Santos-Neto, 2014

Abstract

Commons-based peer production systems are marked by three main characteristics: they are radically decentralized, non-proprietary, and collaborative. Peer production stands in stark contrast to market-based production and/or production organized by a centralized entity (e.g., carpooling vs. car rental; couchsurfing vs. hotels; Wikipedia (http://www.wikipedia.com) vs. Encyclopaedia Britannica).

Social tagging systems represent a class of web systems where peer production is central to their design. In these systems, decentralized users collect, share, and annotate (or tag) content collaboratively to produce a public pool of annotated content. This uncoordinated effort helps fill the demand for labeling an ever-increasing amount of user-generated content on the web with textual information. Moreover, these labels (or simply tags) can be valuable as input to mechanisms such as personalized search or content promotion.

Assessing the value of individuals' contributions to peer production systems is key to designing user incentives that bring high-quality contributions. However, quantifying the value of peer-produced information such as tags is intrinsically challenging, as the value of information is inherently contextual and multidimensional. This research aims to address these two issues in the context of social tagging systems.

To this end, this study sets forth the following hypothesis: assessing the value of peer-produced information in social tagging systems can be achieved by harnessing context and user behavior characteristics.
The following questions guide the investigations:

Characterization:
(Q1) What are the characteristics of individual user activity?
(Q2) What are the characteristics of social user activity?
(Q3) What are the aspects that influence users' perception of tag value?

Design:
(Q4) How to assess the value of tags for exploratory search?
(Q5) What is the value of peer-produced information for content promotion?

This study applies a mixed-methods approach. The findings show that patterns of user activity can inform the design of supporting mechanisms for tagging systems. The results also suggest that the proposed method to assess the value of tags is able to differentiate valuable tags from less valuable ones, as perceived by users. Finally, the analysis of the value of peer-produced information for content promotion shows that peer-produced sources can oftentimes outperform expert-produced sources.

Preface

Although I am the main author of the studies presented in this dissertation, the results presented in the following chapters are the product of collaborative efforts. I had the pleasure to work with researchers from the University of British Columbia (my advisor, Matei Ripeanu), the University of South Florida (David Condon and Adriana Iamnitchi), Universidade Federal de Campina Grande (Nigini Oliveira and Nazareno Andrade), Universidade Federal de Minas Gerais (Flavio Figueiredo, Tatiana Pontes, and Jussara Almeida), and HP Labs, Bristol (Miranda Mowbray).

It is worth noting that this dissertation consists of research studies that have been published (or are under review) in peer-reviewed international conferences, workshops, and journals. First, the characterization study presented in Chapter 2, Chapter 3, and Chapter 4 led to the four publications and submissions below:

• Elizeu Santos-Neto, Flavio Figueiredo, Nigini Oliveira, Nazareno Andrade, Jussara Almeida, Matei Ripeanu.
Assessing Tag Value for Exploratory Search. Under review.

• Elizeu Santos-Neto, David Condon, Nazareno Andrade, Adriana Iamnitchi, Matei Ripeanu. Reuse, Temporal Dynamics, Interest Sharing, and Collaboration in Social Tagging Systems. First Monday, Vol. 19 (7), August 2014.

• Elizeu Santos-Neto, David Condon, Nazareno Andrade, Adriana Iamnitchi, Matei Ripeanu. Individual and Social Behavior in Tagging Systems. In the 20th ACM Conference on Hypertext and Hypermedia, Torino, Italy, June 2009. (acceptance rate: 32%)

• Elizeu Santos-Neto, Matei Ripeanu, Adriana Iamnitchi. Content Reuse and Interest Sharing in Tagging Communities. The AAAI 2008 Spring Symposia on Social Information Processing, Stanford, CA, USA, March 2008.

• Elizeu Santos-Neto, Matei Ripeanu, Adriana Iamnitchi. Tracking User Attention in Collaborative Tagging Communities. In Proceedings of the International ACM/IEEE Workshop on Contextualized Attention Metadata: Personalized Access to Digital Resources, Vancouver, BC, Canada, June 2007.

The results related to the system design part, which are reported in Chapter 5 and Chapter 6, are presented in the following refereed publications:

• Elizeu Santos-Neto, Flavio Figueiredo, Nigini Oliveira, Nazareno Andrade, Jussara Almeida, Matei Ripeanu. Assessing Tag Value for Exploratory Search. Under review.

• Tatiana Pontes, Elizeu Santos-Neto, Jussara Almeida, Matei Ripeanu. Where Are the 'Key' Words? On the Optimization of Multimedia Content Textual Attributes to Improve Viewership. Under review.

• Elizeu Santos-Neto, Tatiana Pontes, Jussara Almeida, Matei Ripeanu. On the Choice of Data Sources to Improve Content Discoverability via Textual Feature Optimization. In the Proceedings of the ACM Hypertext Conference (HT 2014), Santiago de Chile, Chile, September 2014.

• Elizeu Santos-Neto, Tatiana Pontes, Jussara Almeida, Matei Ripeanu. Towards Boosting Video Popularity via Tag Selection.
In the Proceedings of the ACM Workshop on Social Multimedia and Storytelling (SoMuS), April 2014, Glasgow, UK.

• Elizeu Santos-Neto. Characterizing and Harnessing Peer-Production of Information in Social Tagging Systems. In the Proceedings of the 5th ACM International Conference on Web Search and Data Mining Doctoral Consortium (WSDM 2012 Doctoral Consortium), Seattle, WA, USA, February 2012.

• Elizeu Santos-Neto, Flavio Figueiredo, Jussara Almeida, Miranda Mowbray, Marcos Gonçalves, Matei Ripeanu. Assessing the Value of Contributions in Tagging Systems. In the Proceedings of the 2nd IEEE International Conference on Social Computing (SocialCom 2010), Minneapolis, MN, August 2010. (acceptance rate: 15%)

Finally, it is worth noting that part of this research has been approved by the UBC Behavioral Research Ethics Board under the approval number H11-02039.

Table of Contents

Abstract . . . ii
Preface . . . iv
Table of Contents . . . vii
List of Tables . . . viii
List of Figures . . . ix
Acknowledgments . . . x
Dedication . . . xi

1. Introduction . . . 1
  1.1 Online Peer Production Systems . . . 3
  1.2 Social Tagging Systems . . . 5
  1.3 Research Questions . . . 6
  1.4 Summary of Methodology . . . 10
  1.5 Summary of Contributions . . . 10
  1.6 Dissertation Structure . . . 14
2. Characterizing Users' Individual Behavior . . . 15
  2.1 Related Work . . . 17
    2.1.1 General Characterization Studies . . . 17
    2.1.2 Evolution of Users' Tag Vocabularies . . . 19
  2.2 Data Collection and Notation . . . 20
  2.3 Tag Reuse and Item Re-Tagging . . . 21
    2.3.1 Levels of Item Re-tagging and Tag Reuse . . . 22
    2.3.2 New Incoming Users . . . 25
    2.3.3 The Influence of Power Users . . . 26
    2.3.4 Summary and Implications . . . 27
  2.4 Temporal Dynamics of Users' Tag Vocabularies . . . 28
    2.4.1 Methodology . . . 29
    2.4.2 Results and Implications . . . 30

3. Characterizing Social Aspects in Tagging Systems . . . 35
  3.1 Related Work . . . 36
  3.2 Interest Sharing . . . 38
    3.2.1 Quantifying Activity Similarity . . . 39
    3.2.2 How is Interest Sharing Distributed across the System? . . . 40
    3.2.3 Comparing to a Baseline . . . 42
    3.2.4 Summary and Implications . . . 43
  3.3 Shared Interest and Indicators of Collaboration . . . 45
    3.3.1 Group Membership . . . 46
    3.3.2 Semantic Similarity of Tag Vocabularies . . . 48
    3.3.3 Summary and Implications . . . 52

4. Understanding Users' Perception of Tag Value . . . 53
  4.1 Related Work . . . 55
    4.1.1 Why Do We Tag? . . . 56
    4.1.2 Economics of Information . . . 56
    4.1.3 Perceptions of Information Value in System Design . . . 58
  4.2 Methodology . . . 59
  4.3 Which Systems Do Participants Use? . . . 63
  4.4 Users' Perception of Tag Value . . . 64
    4.4.1 Aspects of Tag Production . . . 64
    4.4.2 Tag Value in Exploratory Search . . . 66
  4.5 Concept Map . . . 70
  4.6 Summary . . . 71

5. Assessing Tag Value for Exploratory Search . . . 73
  5.1 Related Work . . . 74
    5.1.1 Contributions in Peer Production Systems . . . 74
    5.1.2 Characterizing the Quality of Tags . . . 76
  5.2 System Model . . . 77
  5.3 A Framework to Assess the Value of User Contributions . . . 79
  5.4 A Naive Method . . . 81
  5.5 An Information-theoretical Approach . . . 82
    5.5.1 Search Space Reduction Property . . . 84
    5.5.2 Relevance Property . . . 87
  5.6 Evaluation . . . 88
    5.6.1 Experiment Design . . . 88
    5.6.2 Results . . . 90
    5.6.3 Alternatives . . . 92
  5.7 Summary . . . 93

6. Assessing Value of Peer-Produced Information for Content Promotion . . . 95
  6.1 Related Work . . . 98
  6.2 Context for Assessing the Value of Peer-Produced Information . . . 100
  6.3 Building the Ground Truth . . . 102
  6.4 Experimental Setup . . . 105
    6.4.1 Data Sources . . . 105
    6.4.2 Recommenders . . . 108
    6.4.3 Budget Adjustment . . . 109
    6.4.4 Success Metrics . . . 110
  6.5 Experimental Results . . . 111
    6.5.1 Are Tags Assigned to Videos Optimized? . . . 111
    6.5.2 Is Peer-Produced Information Valuable? . . . 113
    6.5.3 Combining Data Sources . . . 115
    6.5.4 Is the Number of Contributors a Factor? . . . 117
  6.6 Summary . . . 118

7. Conclusions . . . 120

Bibliography . . . 125

A. Contextual Interview Guideline . . . 138
B. Codebook . . . 141
C. Thick Descriptions . . . 143

List of Tables

Table 2.1 Summary of data sets used in this study . . . 21
Table 2.2 A summary of daily item re-tagging and tag reuse. The higher the score, the more an item/tag is re-tagged/reused. . . . 23
Table 2.3 The statistical test results reject the hypothesis that the item re-tagging and tag reuse observations with and without the power users are equal. However, in most cases the difference has a small magnitude. . . . 27
Table 3.1 The share of tagging activity captured by the tag vocabularies in CiteULike and Connotea that is found in the WordNet and WordNet-combined-with-YAGO lexical databases. As we use an anonymized version of the del.icio.us dataset, with all users, items, and tags identified by numbers, we could not perform the same analysis using WordNet and YAGO for del.icio.us. . . . 49
Table 5.1 Data set used as input of an experiment that estimates the method's accuracy . . . 88

List of Figures

Figure 1.1 The logical structure of this research. . . . 3
Figure 1.2 Illustration of classes (squares) and instances (ellipses) of commons-based peer production systems. . . . 4
Figure 2.1 Daily item re-tagging (left) and tag reuse (right). The curves are smoothed by a moving average with window size n = 30. . . . 24
Figure 2.2 Self-tag reuse (left) and daily activity generated by returning users (right). The curves are smoothed by a moving average with window size n = 30. . . . 25
Figure 2.3 The vocabulary growth pattern in CiteULike . . . 31
Figure 2.4 The vocabulary growth pattern in Connotea . . . 32
Figure 2.5 The vocabulary growth pattern in Delicious . . . 32
Figure 2.6 Rate of change in the tag usage frequency in the user vocabularies of CiteULike . . . 33
Figure 2.7 Rate of change in the tag usage frequency in the user vocabularies of Connotea . . . 33
Figure 2.8 Rate of change in the tag usage frequency in the user vocabularies of Delicious . . . 34
Figure 3.1 Distributions for item- and tag-based interest sharing (for pairs of users with non-zero sharing) in the studied systems . . . 40
Figure 3.2 Q-Q plots that compare the observed vs. simulated (i.e., the RNM model) interest sharing distributions for CiteULike (left) and Connotea (right) . . . 44
Figure 3.3 CDFs of tag vocabulary similarity for user pairs with positive (bottom curve) and zero (top curve) activity similarity (as measured by the item-based interest sharing defined in Eq. 3.1). CiteULike (left); Connotea (right) . . . 51
Figure 4.1 Example of a tag cloud extracted from the most popular tags in Flickr. Information seekers interact with the tag cloud by clicking on each term to retrieve items annotated with that specific tag. The tag cloud is reconstructed at each step. . . . 54
Figure 4.2 Illustration of the qualitative research cycle, inspired by the methodology described in Hennik et al. [39] and applied in this work. . . . 60
Figure 4.3 Illustration of observed users' transitions and decision making during exploratory search tasks. . . . 66
Figure 4.4 Concept map that illustrates the influence of several aspects on the perceived value of tags for exploratory search. . . . 70
Figure 5.1 Components of a framework to quantify the value of user contributions. . . . 80
Figure 5.2 Comparison between the cumulative distribution functions (CDFs) of tag values (naïve method), for tags in each set (Hidden and Random), from the perspective of each user in the LibraryThing data set. . . . 91
Figure 5.3 Comparison between the CDFs of tag values (our proposed method), for tags in each set (Hidden and Random), from the perspective of each user in the LibraryThing data set . . . 92
Figure 6.1 The recommendation pipeline. . . . 100
Figure 6.2 A screenshot of the survey we set up on Amazon Mechanical Turk: turkers watch the video presented on the left side, enter the suggested keywords, answer the questions, and move on to the next video. . . . 104
Figure 6.3 Histogram of the number of evaluations turkers have performed . . . 105
Figure 6.4 Histogram of the number of different keywords associated to a video by turkers . . . 106
Figure 6.5 Histogram of the total length (in characters) of the set of distinct keywords associated to each video. . . . 106
Figure 6.6 An illustration of the space of data sources we explore. . . . 107
Figure 6.7 CCDF of F3-measure for YouTube tags (dashed line) compared to recommendation based on input from all other data sources combined (continuous line), using the FREQUENCY (left) and RANDOMWALK (right) recommenders. . . . 112
Figure 6.8 CCDF of F3-measure for each data source used as input to the FREQUENCY (left) and RANDOMWALK (right) recommenders. . . . 113
Figure 6.9 CCDF of τ for each data source used as input to the FREQUENCY (left) and RANDOMWALK (right) recommenders. . . . 115
Figure 6.10 CCDF of NDCG for each data source used as input to the FREQUENCY (left) and RANDOMWALK (right) recommenders. . . . 116
Figure 6.11 CCDF F3-measure performance comparison between combinations of groups of data sources, Peers (MovieLens + Wikipedia) and Experts (NYTimes + RT Reviews), relative to YouTube and Rotten Tomatoes. . . . 117
Figure 6.12 Tau performance comparison between combinations of data sources: Peers (MovieLens + Wikipedia) vs. Experts (NYTimes + RT Reviews). . . . 118
Figure 6.13 NDCG performance comparison between combinations of data sources: Peers (MovieLens + Wikipedia) vs. Experts (NYTimes + RT Reviews). . . . 119
Acknowledgments

First and foremost, I must thank Matei Ripeanu, an academic advisor turned into a friend, for challenging me all the way during this journey, and for being a risk-taker by investing his time in investigations I decided to pursue.

Second, this work would have been no fun without my fabulous collaborators and external advisors. My special thanks to you all (in order of appearance): Walfredo Cirne (UFCG, Google), Francisco Brasileiro (UFCG), Nazareno Andrade (UFCG), Ian Foster (U. of Chicago), Adriana Iamnitchi (USF), David Condon (USF), Abdullah Gharaibeh (UBC), Lauro Beltrão Costa (UBC), Flavio Figueiredo (UFMG), Jussara Almeida (UFMG), Miranda Mowbray (HP Labs, Bristol), Dinan Gunawardena (Microsoft Research, UK/MSRC), Thomas Karagiannis (MSRC), Milan Vojnovic (MSRC), Alexandre Proutiere (MSRC), Tatiana Pontes (UFMG), and Nigini Oliveira (UFCG).

Third, thanks to all members of NetSysLab for their feedback and discussions; to the volunteers who participated in parts of this study; and for the support from the University of British Columbia (UBC), the British Columbia Innovation Council (BCIC), the Natural Sciences and Engineering Research Council of Canada (NSERC), and the Association of Universities and Colleges of Canada (AUCC).

Fourth, thanks to my Google colleagues for enriching internship experiences and great SF2G rides: Thomas Kotzmann, Foad Dabiri, Viresh Ratnakar, Daniel Fireman, and Guilherme Germoglio.

Finally, thanks to Cypress Mt. and Mt. Seymour for being there when I needed them the most.

Dedication

To my family, friends, Luciana, Anaïs, and Amália, who constantly help me to be a better human being.

Chapter 1

Introduction

Commons-based peer production systems (or simply, peer production systems) are marked by three main characteristics: they are radically decentralized, non-proprietary, and collaborative [8]. Peer production is in stark contrast to modes of production based on markets and/or on a centralized organization (e.g., Wikipedia vs.
Encyclopaedia Britannica; ontologies created by experts vs. folksonomies).

As peer production abounds in today's World Wide Web, with hundreds of millions of distributed users collaborating towards the production of information goods (e.g., Wikipedia (http://www.wikipedia.com), Flickr (http://www.flickr.com), YouTube (http://www.youtube.com), and Delicious (http://www.delicious.com)), it is important to study these systems to understand users' motivation to contribute to their communities and their perception of the value of peer-produced information. In fact, understanding peer production of information on the World Wide Web will enable us both to improve user experience in existing systems and to design the next generation of such systems.

Social tagging systems (often referred to as collaborative tagging systems, or simply tagging systems) are web systems where peer production is central to their design. In social tagging systems, users collect, share, and annotate (or tag) content collaboratively. For example, in Delicious, users bookmark URLs and annotate these URLs with free-form words (i.e., tags). As multiple users may annotate the same URL, the user community collaboratively produces a large-scale catalog of annotated content that may enable the design of improved mechanisms such as personalized search and recommendation [56, 93, 100].

As tagging becomes an effective way to collect useful metadata about content and users' interests, user behavior characteristics at the individual and social levels, and their relationship to peer production, are important to the design of mechanisms that improve users' experience [40, 99].
Therefore, unveiling usage characteristics is important to inform the design of current and future tagging systems, in particular, and peer production systems, in general.

More importantly, social tagging systems are inherently collaborative, and, as such, studying social tagging through the lens of commons-based peer production has the potential to enable the design of key supporting mechanisms, such as incentives to boost high-quality participation. More specifically, designing and evaluating methods that assess the value one participant produces for other individuals in the community is a fundamental building block for other mechanisms. For example, measuring how one user's tags help other users search more efficiently could inform the design of incentive mechanisms that improve the quality of tags.

To fill these gaps, this thesis presents efforts in two main directions:

• First, a characterization of social tagging systems based on both a qualitative investigation of user perceptions of value and a quantitative analysis of records of user activity. In particular, the characterization aims to understand how and why users contribute to tagging systems and what their perceptions are of others' contributions;

• Second, the design and evaluation of methods to assess the value of user contributions in social tagging systems. In particular, this thesis investigates the value of tags in two contexts: exploratory search and content promotion.

Figure 1.1: The logical structure of this research.

Figure 1.1 illustrates the logical structure of this research, while the rest of this chapter presents the background of this research, the research questions, and the contributions. First, it presents the definition of commons-based peer production systems [8] (Section 1.1). Next, it discusses the peer production of information in tagging systems, which is the specific area this research contributes to (Section 1.2).
The discussion follows with the introduction of the research questions that guide this thesis (Section 1.3), the methodology adopted to address these questions (Section 1.4), and a summary of contributions (Section 1.5). Finally, the chapter concludes with the presentation of the structure of this dissertation.

1.1 Online Peer Production Systems

Commons-based peer production systems are "systems where production is radically decentralized, collaborative and non-proprietary" [8]. Benkler uses the term commons to highlight that the production mode in these systems resembles that of a common property regime, where participants share a common pool of resources [67]. In this sense, commons-based peer production systems, such as carpooling (http://en.wikipedia.org/wiki/Carpool), represent the opposite of production systems based on private property, such as car rental.

Commons-based peer production spans a wide range of scenarios, both in the physical world (e.g., carpooling, community choirs) and over the Internet (e.g., open source software, Q&A portals, social media websites). This research focuses on the latter class of scenarios. In particular, it concentrates on a subclass of peer production systems that are concerned with the online peer production of information (see Figure 1.2). Systems in this subclass use the Internet to mobilize decentralized participants who collaboratively produce and share information. As in most peer production systems, individuals contribute to a common pool of resources without any enforced hierarchy.

Figure 1.2: Illustration of classes (squares) and instances (ellipses) of commons-based peer production systems.

Encyclopedia Britannica and Wikipedia, for instance, illustrate the contrast between the proprietary mode of information production and its commons-based counterpart.
In the former, the production of articles follows a traditional model, where authors are coordinated by a centralized entity to produce a proprietary good (i.e., the encyclopedia articles). In the latter, however, authors are inherently decentralized, contribute to a public pool of articles, and hierarchy emerges by consensus. It is worth noting that Encyclopedia Britannica has recently incorporated a few elements of peer production into its services (e.g., users can submit articles), which serves as evidence of the advantages of peer production in some contexts.

1.2 Social Tagging Systems

Along the same lines as Wikipedia in terms of collaboration, yet focusing on the production of a different type of information, many social systems target the demand for social content sharing and personal content management [37]. Systems like CiteULike (http://www.citeulike.org), Delicious, YouTube, and Flickr are commonly referred to as social tagging systems. These systems provide users with the capability to annotate content with free-form words (or tags), as well as social networking features. In fact, tagging features are commonplace even in major online social systems like Twitter (http://www.twitter.com), Facebook (http://www.facebook.com), and Google+ (http://plus.google.com), where social networking is a more prominent feature.

Although each of these systems targets different types of content and users, and provides unique features, their tagging capabilities are conceptually the same in terms of the abstract entities involved. In social tagging systems, each user maintains a library: a collection of annotated items (e.g., photos, videos, URLs, or textual posts). For example, in CiteULike, users collect citation records linked to online articles, while in Delicious, users bookmark URLs to generic web pages, and in Google+, Twitter, and Facebook, users post textual or multimedia content.
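The abstract entities shared by these systems (users, items, tags, libraries) can be captured compactly as a set of (user, item, tag) assignments. The sketch below is purely illustrative; the class and method names are my own assumptions and do not correspond to any real system's API:

```python
class TaggingSystem:
    """Illustrative sketch of the abstract tagging entities described above.
    Names are assumptions for illustration; not a real system's API."""

    def __init__(self):
        # The whole system state is a set of (user, item, tag) assignments.
        self.assignments = set()

    def tag(self, user, item, tag):
        """A user annotates an item (her own, or another user's public item)."""
        self.assignments.add((user, item, tag))

    def library(self, user):
        """The user's library: the set of items she has annotated."""
        return {i for (u, i, t) in self.assignments if u == user}

    def tags_for(self, item):
        """Public tags a user sees when tagging this item, and may
        reinforce by repeating them."""
        return {t for (u, i, t) in self.assignments if i == item}


ts = TaggingSystem()
ts.tag("alice", "url1", "python")
ts.tag("bob", "url1", "python")    # bob reinforces alice's tag
ts.tag("bob", "url1", "tutorial")  # and contributes a new one
```

Note how the same item tagged by multiple users is what produces the collaboratively built, system-wide catalog of annotated content.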
A user may assign tags to items in her library (e.g., tags in Delicious, or hashtags in Twitter). Additionally, a user may also tag items in another user's public library. Tags may serve to group items, as a form of categorization, or to help find items in the future [29, 66]. The tagging activity can be private (i.e., only the user who generated the tags and items can access these annotations) or public. A user can see what (public) tags other users assigned to an item when she is tagging it; thus, the user is able to reinforce the tags she considers appropriate by repeating the tags previously assigned to that item.

1.3 Research Questions

This section describes the specific goals of this research towards both the characterization of tagging systems and the design of methods to assess the value of tags.

Characterization of Individual User Activity (Chapter 2)

The rationale behind characterizing tagging systems is that usage patterns can inform the design of mechanisms and supporting infrastructure for these systems. With this mindset, this research focuses on the following aspects of tagging systems:

• RQ1. What are the characteristics of individual user activity? In tagging systems, activity can be described in terms of production (i.e., content publication and annotation) and consumption of information (i.e., search, navigation). This research question is divided into two specific aspects of users' activity, as follows:

– RQ1.1. Are there any patterns of information production in social tagging systems? Besides the activity distributions, the rate of information production over time provides valuable insights into the growth of the system. More specifically, the goal is to determine whether users are annotating more frequently the content that is already available in the system, or whether they are more likely to publish new content.
Understanding this aspect is important to inform the design of techniques that aim at predicting user behavior based on past activity, such as recommendation systems, as a high rate of new items poses an extra challenge to recommendation algorithms [2], while a high rate of tag reuse among users may help alleviate this problem [78, 100] (Section 2.3).

– RQ1.2. What are the temporal dynamics of tag vocabularies? Studying the evolution of tag vocabularies for individual users, in terms of vocabulary size and tag usage frequency, complements the characterization of information production in a tagging system. Patterns in vocabulary evolution may help in designing mechanisms that rely on tags as indicators of user preferences. For example, assuming that tags are used to represent user interests, personalization mechanisms [18, 56] could use the rate of vocabulary change (growth or tag usage frequency) to determine the shifting window of interests users might have (Section 2.4).

Characterization of Social User Activity (Chapter 3)

Social tagging systems provide features for users to engage in online social behaviour (e.g., collaborative content creation, curation, and sharing). This part of the thesis investigation focuses on the following questions:

• RQ2. What are the characteristics of social user activity? Understanding the social aspects of user activity is an important step to complement the analysis of individual user activity. More importantly, it can unveil characteristics that can inform system design. This characterization focuses on two main aspects of user social activity, as guided by the following questions:

– RQ2.1. How is the strength of implicit user ties, based on activity similarity, distributed across the system?
As a first step in understanding social user behaviour in tagging systems, this work characterizes whether the strength of implicit ties between users is concentrated on a small subset of user pairs or evenly spread across the population. Understanding this aspect can be harnessed by mechanisms that aim to detect communities of users with shared interest over specific topics, for instance (Section 3.2).

– RQ2.2 Are there relationships between implicit and explicit user ties? There are different types of ties between users: implicit ties, as inferred from the similarity of their tagging activity; and explicit ties, such as declared co-membership in discussion groups or friendship links. The goal is to characterize the intensity of implicit ties and determine whether one type of tie contains information about its counterpart. Understanding this relationship between the types of ties enables a finer interpretation of explicit ties, as the implicit ties provide a measure of the strength of existing explicit links between users (Section 3.3).

Characterizing Users' Perception of Tag Value (Chapter 4)

Tagging systems are a subclass of online peer production systems. However, quantifying users' contributions in tagging systems poses new challenges compared to other peer production systems, where the contribution value is often linked to the amount of resources shared. In particular, users contribute information to tagging systems, which is fundamentally different from other peer production systems, where users share units of physical resources (e.g., bandwidth, CPU and storage). This contrast demands a study of users' perception of the value of peer-produced information in the context of social tagging systems.

To this end, this research moves to a qualitative characterization of users' perception of value and their records of activity. The goal is to inform the design of methods that quantify the value of tags.
In particular, this work focuses on the following aspect of tagging system design:

RQ3. What are the aspects that influence users' perception of the value of tags? All tags are not created equal. Due to many factors, such as the context in which the tag is being used and personal interests, a user will naturally consider some tags more important than others. To inform the design of methods that assess the value of tags, it is important to understand what aspects users take into account when choosing to use tags in contexts such as exploratory search (also known as navigation). The results of this investigation directly inform the design of methods that assess the value of tags (Chapter 4).

Assessing the Value of Tags (Chapter 5, Chapter 6)

Inspired by both the qualitative and the quantitative characterizations of users' perception of tag value and users' tagging activity, this thesis moves towards investigating methods that automatically assess the value of peer-produced information in two relevant contexts for both information seekers and information producers in social tagging systems. In particular, this part of the thesis focuses on assessing the value of peer-produced information in two contexts: exploratory search and content promotion, as described in the following:

RQ4. How to assess the value of tags for exploratory search? In the light of the aspects that influence users' perception of tag value, one can design methods to automatically quantify the value of tags while aiming to capture the identified aspects. The goal is to formalize the aspects, and to design and evaluate methods that quantify the value of tags from the perspective of information seekers (i.e., users who use tags to discover new content) (Chapter 5).

RQ5. How to assess the value of tags for content promotion? The characterization of how users perceive tag value also provides insights on the aspects that influence tag production.
In particular, some users have clear goals that drive the choice of tags, such as promoting online content. Therefore, given the availability of tags and peer-produced information, it is paramount to assess the value of such sources as inputs to content producers and whether these sources can improve their success metrics (e.g., popularity of videos) (Chapter 6).

1.4 Summary of Methodology

This research was conducted using mixed methods [39] (i.e., a combination of both quantitative and qualitative research methods). In particular, quantitative methods are used to study the overall characteristics of tagging systems by analysing traces of user tagging activity. To this end, I have applied statistical methods and simulations. Part of the characterization of user behavior resorts to qualitative methods. More specifically, I used grounded theory [39] and in-depth interviews [39] to study what aspects information consumers take into account when choosing tags in an exploratory search (Chapter 4).

The combination of these methods provides primarily two important angles on user behavior in social tagging systems. Additionally, it enables the design of mechanisms while starting from the users' perspective of what is important, instead of relying on unvalidated assumptions.

1.5 Summary of Contributions

This section summarizes the contributions of this thesis. I note that the contributions are part of a series of refereed articles and technical reports [65, 75–81]. Each contribution briefly described below maps to each specific research question, as stated in the previous section, in order.

Characterization of Individual User Activity

• RQ1.1. Users tend to reuse tags already present in the system more often than they repeatedly tag existing items [77, 78]. This finding supports the intuition that tags are primarily a content categorization instrument.
Additionally, the results show that the difference between the levels of tag reuse and repeated item tagging varies across different systems. This observation suggests that features such as tag recommendation and the type of content play a role in the patterns of peer production of information in tagging systems.

• RQ1.2. The tag vocabulary of a user can be approximated by a small portion of her activity [79]. The experiments on the evolution of user tag vocabularies show that, to accurately approximate the characteristics of a tag vocabulary, only a small percentage of the initial tag assignments performed by a user is necessary. These results can be applied in the context of applications that rely on activity similarity scores between users, for example, as they provide a way to reason about the trade-offs between the accuracy of a user activity profile and the computational cost of updating the similarity scores.

Characterization of Social User Activity

• RQ2.1. The strength of implicit social ties is concentrated over a small portion of user pairs. Moreover, the observed strengths of activity similarity between pairs of users are the result of shared interest, as opposed to being generated by chance. The distributions of activity similarity strength deviate significantly from those produced by a Random Null Model (RNM) [71]. This suggests that the implicit ties between users, as defined by their activity similarity levels, capture latent information about user relationships that may offer support for optimizing system mechanisms.

• RQ2.2. The average strength of implicit ties is stronger for user pairs with explicit ties [78]. This investigation analyzes the similarity between users according to their tagging activity and its relation to explicit indicators of collaboration. The results show that the users' activity similarity is concentrated on a small fraction of user pairs.
Also, the observed distributions of users' activity similarity deviate significantly from those produced by a Random Null Model [71]. Finally, an analysis of the relationships between implicit ties based on activity similarity and other, more explicit relationships, such as co-membership in discussion groups, shows that user pairs that tag items in common have, on average, higher similarity in terms of co-membership in discussion groups.

Characterization of Users' Perception of Tag Value

To complement the quantitative characterization and to inform the design of methods that assess the value of tags, this research conducts a qualitative characterization of users' perception of tag value. A summary of the major findings of this investigation is presented below:

• RQ3. Users' perception of tag value in exploratory search is multidimensional, and the key aspects that influence users' perception are relevance of items retrieved and reduction of the search space [81]. Based on a qualitative characterization of users' perception of tag value in the context of exploratory search, this study finds that the two most salient aspects that influence users' perception of tag value are the ability to retrieve relevant content items and the ability to reduce the search space. These findings inform the design of a method that quantifies the value of tags automatically by taking into account the important aspects identified by the qualitative analysis.

Methods to Assess the Value of Peer-Produced Information

Finally, this research proposes new techniques that exploit the usage characteristics of tagging systems to improve their design. The next paragraphs briefly describe the contributions related to studying social tagging as commons-based peer production systems and the design of methods to assess the value of user contributions in these collaborative contexts.
Chapter 5 and Chapter 6 distill the proposed approaches and results in detail.

It is important to note that there are two perspectives on the problem of assessing the value of peer-produced information in tagging systems: the consumer's and the producer's. The goal is to design methods that cater to each of these perspectives. For consumers, assessing the value of tags is considered in the context of exploratory search, while for producers, the method takes into account the ability of a tag to improve the viewership of content (e.g., a YouTube video).

• RQ4. An information-theoretical approach to assessing the value of tags for exploratory search provides accurate estimates of value as perceived by users. This study first provides a framework that helps specify the components of methods that assess the value of user contributions in social tagging systems. In particular, this part of the research provides a method that automatically quantifies the value of tags and caters to the two desirable properties in the context of exploratory search, as identified by the qualitative user study. A proof shows that the proposed method has desirable theoretical properties while quantifying these two aspects. Additionally, an experiment using real tagging data shows that the proposed method accurately quantifies the value of tags according to users' perception.

• RQ5. Peer-produced information, though lacking formal curation, has comparable value to that of expert-produced information sources when used for content promotion.
An analysis of online videos provides evidence that the tags associated with a sample of popular movie trailers can be optimized further by an automated process: either by incorporating human computing engines (e.g., Amazon Mechanical Turk) at a much lower cost than using dedicated channel managers (the current industry practice); or, at an even lower cost, by using recommender algorithms to harness textual content produced by a multitude of data sources that are related to the video content. To this end, I perform a comparison of the effectiveness of using peer- and expert-produced sources of information as input for tag recommenders that aim to boost content popularity.

1.6 Dissertation Structure

This dissertation is naturally divided into three parts: i) quantitative characterization of user activity; ii) qualitative characterization of users' perception of tag value; and iii) system design. The first part consists of Chapter 2 (RQ1) and Chapter 3 (RQ2), which present a characterization of both individual and social user behaviour using quantitative research methods. The second part, presented in Chapter 4 (RQ3), focuses on a qualitative analysis of users' perception of tag value to inform system design. The third part, presented in Chapter 5 (RQ4) and Chapter 6 (RQ5), focuses on the design of methods to assess the value of tags from the perspectives of both information seeking and content promotion. Finally, Chapter 7 presents the final remarks and directions for future work. Note that each chapter contains its respective related work section to position the contributions among the related literature.

Chapter 2

Characterizing Users' Individual Behavior

Tagging systems [63] are a ubiquitous manifestation of online peer production of information [8], a production mode commonplace in today's World Wide Web [70]. The annotation feature, often referred to simply as tagging, was originally designed to support personal content management.
However, as this feature exposes user preferences and their temporal dynamics, similarities between users, and the aggregated characteristics of the user population, annotations have been recognized for their potential to support a wider range of mechanisms, such as social search [24], recommendation [32, 56], and spam detection [54]. Therefore, a better understanding, through characterization and modeling of usage patterns, is necessary to realize the full potential of this feature.

This chapter presents quantitative characterization results that complement previous characterization studies (presented in Section 2.1)¹. In particular, this chapter focuses on two major aspects of the tagging activity that have attracted relatively little attention in the past: i) the dynamics of peer production of tags and items; and ii) the temporal dynamics of users' tag vocabularies [64, 75, 76, 78]. More specifically, the following questions guide the quantitative characterization of individual user behavior:

¹ The results presented in this chapter appeared in the following references: [64, 76, 78, 81]

• RQ1.1. Are there any patterns of information production in social tagging systems? (Section 2.3).

• RQ1.2. What are the temporal dynamics of tag vocabularies? (Section 2.4).

To study the patterns of information production in social tagging systems, Section 2.3 concentrates on two metrics: i) item re-tagging, a measure of the degree to which items are repeatedly tagged; and ii) tag reuse, a measure of the degree to which users reuse a tag to perform new annotations.

The analysis of the evolution of users' tag vocabularies (i.e., the set of tags a user assigns to her items) in Section 2.4 focuses on the evolution of the user vocabularies over time.

This study uses activity traces from three distinct tagging systems: CiteULike, Connotea and Delicious (Section 2.2).
This selection of systems samples the diversity of the tagging ecosystem, as they are three emblematic tagging systems for the type of content they target, with CiteULike and Connotea concentrating on bookmarking of academic citations, and Delicious focusing on general URLs. The in-depth analysis of these three systems reveals regularities and relevant variations in tagging behavior.

The main findings of this characterization study are:

• The characteristics of peer production of information are qualitatively similar across systems but differ quantitatively, as suggested by the observed rates of item re-tagging and tag reuse. In all three systems investigated, users produce new items at a higher rate than they produce new tags. However, the observed rates in CiteULike and Connotea are different from those in Delicious. As the three systems provide essentially similar annotation features, these findings suggest that the target audience and the type of annotated content play an important role in the users' tagging behavior (Section 2.3).

• User tag vocabularies are constantly growing, but at different rates depending on the age of the user. However, despite the constant increase in size, the relative usage frequency of tags in a vocabulary converges to a stable ranking at early stages of a user's lifetime in the system. These observations have implications for applications that rely on tag-vocabulary similarity (e.g., recommender systems): these applications can use only a subsample of the entire user activity to estimate vocabulary similarity between users. Moreover, applications can aim to strike a balance between the accuracy of similarity estimates, the data volume used for estimation, and the freshness of the data.
(Section 2.4)

These characteristics have practical implications for the design of mechanisms that rely on implicit user interactions, such as collaborative search [15], spam detection [54] and recommendation [19, 35].

2.1 Related Work

This section positions this work among the related literature on two main topics: i) general studies about the characteristics of social tagging systems; and ii) characterization of the evolution of tag vocabularies.

2.1.1 General Characterization Studies

Previous characterization studies focusing on tagging systems vary along three main aspects: i) the system analyzed; ii) the focus of the characterization (i.e., system-, tag-, item- or user-centric analysis); and iii) the method of investigation (qualitative or quantitative research methods). Nevertheless, these works share the same intent: to characterize the usage patterns observed and to gain insight into the underlying processes that generate them. These works propose models that can be used to explain the observed characteristics of tagging activity, such as the incentives behind tagging, the relative frequency of tags over time for a given item, the interval between tag assignments performed by users, and the distributions of activity volume.

Hammond et al. [34] present, perhaps, the first study and discussion of the characteristics of social tagging, its potential, and the incentives behind tagging itself. The study comments on the features provided by different social tagging systems and discusses preliminary reasons that incentivize users to annotate and share content online. Following on the question of incentives, Ames et al. [3] study tagging in online social media websites by interviewing 13 users on the fundamental question of why do people tag? Based on user answers, the authors suggest that tagging serves to support content organization or to communicate aspects about the content.
These actions can be either socially- or personally-driven. More recent studies have followed the analysis of incentives at a larger scale [90]. Our study supports and, more importantly, extends these results by performing a large-scale user behavior analysis (covering more than 700,000 users) in three tagging systems. Although we do not focus on the question of incentives in particular, the quantitative analysis we present highlights and provides stronger evidence of existing incentives hypothesized by previous works.

One of the first works on the quantitative characterization of tagging systems is an item-centric characterization of del.icio.us that proposes the Eggenberger-Polya urn model [23] as an explanation for the observed relative frequencies of tags applied to an item [29]. Additionally, Cattuto et al. [12] show in a tag-centric characterization that the observed tag co-occurrence patterns in del.icio.us are well modeled by the Yule-Simon stochastic process [85]. Similarly, Capocci et al. [11] model the tag interarrival time distribution to show that it follows a power law. Using a different approach to characterize tagging activity, Chi and Mytkowicz [16] study the impact of user population growth on the efficiency of tags to retrieve items in Delicious. More recent works, however, focus on a characterization of social tagging systems that analyzes the impact of using tagging in external applications, such as information retrieval and expert-generated content [31, 58, 61, 82].

Another stream of characterization studies focuses on user-centric analysis. Nov et al. [66] present a user-centric qualitative study on the motivations behind content tagging in Flickr, where they suggest that users tag content due to a mixture of individual motivations, such as personal content organization, and social motivations, such as helping others find photos from a particular place.
In a previous study, we characterize the user-centric properties of tagging activity from two social bookmarking systems designed for academic citation management: CiteULike and Bibsonomy. The observations suggest that user activity across the system follows the Hoerl model [76].

The investigations presented in this chapter complement and extend these previous studies, as I study the characteristics of a combination of user-, item- and tag-centric tagging activity. Moreover, this chapter explores different aspects of tagging activity, such as the levels of item re-tagging and tag reuse over time and the relationship between implicit and explicit user ties in tagging systems. By applying a quantitative approach to a broad population of users and multiple tagging systems, this study also offers new insights on user behavior that complement previous qualitative work by Ames and Naaman [3].

2.1.2 Evolution of Users' Tag Vocabularies

Tags represent, to a certain extent, the user's perception or intended use of an item. It is natural, therefore, to assume that the set of tags (i.e., the tag vocabulary) of a given user provides information about her topics of interest, which is useful to design other mechanisms that support efficient content usage, such as recommender systems. Naturally, if inclusions of new tags or shifts in the tag usage frequency observed in a vocabulary are rare (i.e., if tag vocabularies are stable over time), a mechanism that relies on vocabulary snapshots can focus less on shifts of users' preferences over time when computing personalized predictions.
Indeed, the results show that this is the case (Section 2.4).

Previous studies on the characterization of the evolution of tag vocabularies can be divided into two categories: first, studies that aim to quantify and model the growth of tag vocabularies at both the system- and user-level [12, 13]; and, second, studies that estimate shifts in the tag vocabularies over time, such as the evolution of the tag popularity distribution of item-level tag vocabularies [33] and the variation of tag usage frequency across manually predefined tag classes [29] (i.e., factual tags, subjective tags and personal tags) [83].

In summary, these previous studies show that: i) the system-level and user-level tag vocabulary growth is sublinear; ii) the item-level tag popularity distribution converges to a power law; and iii) the usage frequency of tag categories shifts over time.

This study extends previous works by evaluating different facets of the vocabulary evolution. First, this work goes beyond the estimation of vocabulary growth, focusing on the evolution of tag usage frequency, as opposed to the frequency of tag categories. Second, it concentrates on individual, user-level tag vocabularies, as opposed to item-level vocabularies; more precisely, the approach used makes no assumptions about the categories of tags that appear in the user tag vocabularies. Finally, it uses a different methodology to estimate the difference between tag vocabularies from different points in time.

2.2 Data Collection and Notation

This section describes the activity traces collected and analyzed in this study. Additionally, it also introduces the basic notation used in the rest of this chapter.

Table 2.1 presents a summary of the data sets used in this investigation. The CiteULike and Connotea data sets consist of all tag assignments since the creation of each system in late 2004 until January 2009. The CiteULike dataset is available directly from its website.
For Connotea, I built a crawler that leverages Connotea's API to collect tagging activity since December 2004 (no earlier activity was available). Finally, the Delicious dataset is available at the website of a previous study by Görlitz et al. [30]².

² http://www.tagora-project.eu/

Table 2.1: Summary of data sets used in this study

                     CiteULike          Connotea           Delicious
 Activity Period     11/2004 – 01/2009  12/2004 – 01/2009  01/2003 – 12/2006
 # Users             40,327             34,742             659,470
 # Items             1,325,565          509,311            18,778,597
 # Tags (distinct)   274,982            209,759            2,370,234
 # Tag Assignments   4,835,488          1,671,194          140,126,555

Note that we do not have access to browsing or click traces. The traces analyzed in this work contain records that indicate when items are annotated with a given tag and who the user was, but the traces do not inform whether a tag is subsequently used by a user to navigate through the system, for example. The data sets are 'cleaned' to reduce sources of noise, such as the default tag 'no-tag' in CiteULike, tags composed only of symbols, and other tags like the automatically generated 'bibtex-import', which are clear outliers in the popularity distribution.

Notation. The rest of this chapter uses the following notation. A tagging system is composed of a set of users, items and tags, respectively denoted by U, I, T. The tagging activity in the system is a set of tuples (u, i, w, t), where u ∈ U is a user who tagged item i ∈ I with tag w ∈ T at time t. The activity of a user u ∈ U can be characterized by Au, Iu and Tu, which are respectively the set of tag assignments performed by u, the set of items annotated, and the vocabulary or set of tags used by u. The user's activity from the beginning of the trace up to a particular point in time is denoted by Au(t0, t), Iu(t0, t) and Tu(t0, t), where t0 and t are timestamps, t0 represents the beginning of the trace, and t0 ≤ t.

2.3 Tag Reuse and Item Re-Tagging

Let a new item (or tag) be an item (or tag) that has never been used in an annotation in the tagging system.
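To make this notation concrete, the sketch below shows one way to derive Au, Iu and Tu from a trace of (u, i, w, t) tuples. This is an illustrative Python fragment, not the thesis tooling; the toy records and the helper name are assumptions.

```python
# Illustrative sketch (not the thesis tooling): deriving A_u, I_u and T_u
# from a trace of (u, i, w, t) tuples. The toy records below are made up.
trace = [
    ("alice", "paper1", "ml",     1),
    ("alice", "paper1", "toread", 1),
    ("bob",   "paper1", "ml",     2),
    ("alice", "paper2", "ml",     3),
]

def user_profile(trace, user, t0=float("-inf"), t=float("inf")):
    """Return (A_u, I_u, T_u) for `user`, restricted to the window [t0, t]."""
    A_u = [(u, i, w, ts) for (u, i, w, ts) in trace if u == user and t0 <= ts <= t]
    I_u = {i for (_, i, _, _) in A_u}  # items annotated by the user
    T_u = {w for (_, _, w, _) in A_u}  # the user's tag vocabulary
    return A_u, I_u, T_u

A_u, I_u, T_u = user_profile(trace, "alice")
```

Here Au has three tag assignments, Iu = {paper1, paper2} and Tu = {ml, toread}; passing t0 and t restricts the profile to Au(t0, t), Iu(t0, t) and Tu(t0, t), matching the windowed notation above.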
If users introduce new items and tags frequently, efficiently harnessing information based on collective action is difficult, if not impossible. This is so because, in this case, information about future user actions towards the annotation of an item or use of a tag is hard to predict: prediction relies on the historical use of items and tags; new items or tags have no history in the system. Understanding the degree to which items are repeatedly tagged and tags reused can therefore help estimate the potential efficiency of techniques that rely on similarity of past user activity (e.g., recommender systems). To this end, this section addresses the following specific questions:

• RQ 2.3.1 What is the rate of repeated item annotation and tag reuse? (Section 2.3.1)

• RQ 2.3.2 Is the flow of new incoming users a major factor in the observed rates of repeated item annotation? (Section 2.3.2)

• RQ 2.3.3 Are the observed reuse patterns the result of a group of high-volume power users? (Section 2.3.3)

The rest of this section first formalizes the metrics item re-tagging and tag reuse used to address these questions. Second, it characterizes the levels of item re-tagging and tag reuse, as well as the level of activity generated by returning users. Finally, it discusses the implications of the usage characteristics discovered.

2.3.1 Levels of Item Re-tagging and Tag Reuse

An item is re-tagged (repeatedly tagged) if one or more users annotate it more than once (with the same or different tags).
Similarly, a tag is reused if it appears in the trace more than once (for the same or different items) with different timestamps. The goal is to determine which portion of the activity falls into these categories.

Definition 1 The level of item re-tagging during a time interval [t_{f-1}, t_f) is the ratio of the number of items tagged during that interval that have also been tagged in the past [t_0, t_{f-1}) to the total number of items tagged during the interval [t_{f-1}, t_f), as expressed by Equation 2.1. Tag reuse, denoted by tr(t_{f-1}, t_f), is similarly defined.

    ir(t_{f-1}, t_f) = |I(t_0, t_{f-1}) ∩ I(t_{f-1}, t_f)| / |I(t_{f-1}, t_f)|        (2.1)

This definition is used to determine the aggregate level of item re-tagging and tag reuse in CiteULike, Connotea and Delicious. Table 2.2 presents the median daily item re-tagging and tag reuse over the entire traces (i.e., the time interval [t_{f-1}, t_f) encompasses a day). The results show that CiteULike and Connotea have relatively low levels of item re-tagging while del.icio.us has a higher level of item re-tagging, yet all three systems present similarly high levels of tag reuse. One hypothesis is that the observed difference in item re-tagging between Delicious and its counterparts CiteULike and Connotea is due to the type of content users bookmark in each system (with URLs of any type in the former, and academic literature in the latter).

Table 2.2: A summary of daily item re-tagging and tag reuse. The higher the score, the more an item/tag is re-tagged/reused.

               Re-Tagged Items        Reused Tags
               Median   Std. Dev.     Median   Std. Dev.
 CiteULike     0.15     0.07          0.84     0.12
 Connotea      0.07     0.06          0.77     0.21
 del.icio.us   0.45     0.17          0.86     0.07

To test whether these aggregate levels are a result of stable behavior over time, Figure 2.1 presents the moving average (with a window size of 30 days) of daily item re-tagging and tag reuse.
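As a sketch of how Equation 2.1 (and its tag-reuse analogue) can be computed over a trace, the fragment below assumes a (user, item, tag, day) tuple layout; it is illustrative Python, not the analysis code used in this study.

```python
# Illustrative sketch (not the analysis code of this study).
# Records are assumed (user, item, tag, day) tuples; field 1 selects items
# (item re-tagging, Equation 2.1) and field 2 selects tags (tag reuse).
def reuse_level(trace, field, t_prev, t_f):
    past = {rec[field] for rec in trace if rec[3] < t_prev}
    current = {rec[field] for rec in trace if t_prev <= rec[3] < t_f}
    return len(past & current) / len(current) if current else 0.0

trace = [
    ("u1", "i1", "ml", 0), ("u2", "i2", "ml", 0),  # day 0
    ("u3", "i1", "ai", 1), ("u3", "i3", "ml", 1),  # day 1
]
ir = reuse_level(trace, field=1, t_prev=1, t_f=2)  # item re-tagging on day 1
tr = reuse_level(trace, field=2, t_prev=1, t_f=2)  # tag reuse on day 1
```

On this toy trace, only i1 of the two items tagged on day 1 was seen before (ir = 0.5), and only ml of the two tags used on day 1 was seen before (tr = 0.5).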
Overall, these results show that all three systems go through a bootstrapping period, after which they stabilize, with the levels of item re-tagging and tag reuse stabilizing much sooner for CiteULike and Connotea than for del.icio.us. However, the tag reuse levels have a similar evolution pattern in all three systems.

Figure 2.1: Daily item re-tagging (left) and tag reuse (right). The curves are smoothed by a moving average with window size n = 30.

On the one hand, from the perspective of personal content management, the observed levels of item re-tagging and tag reuse, together with the much larger number of items than tags in these systems, suggest that users indeed exploit tags as an instrument to categorize items according to, for example, topics of interest or intent of usage ('toread', 'towatch'). On the other hand, from the social (or collaborative) perspective, the relatively high level of tag reuse taken together with the low level of item reuse suggests that users may have common interest over some topics, but not necessarily over specific items. These quantitative results suggest that tags are used in the way the previous exploratory qualitative study by Ames and Naaman [3] discusses.

A question that arises from the above observations is whether the levels of item re-tagging and tag reuse are generated by the same user or by different users. An inspection of the activity trace shows that virtually none of the item re-tagging events are produced by the user who originally introduced the item to the system: generally, users (in our trace) do not add new tags to describe the items they collected and annotated once.

As illustrated by Figure 2.2 (left), about 50% of tag reuse is self-reuse (i.e., the reuse of a tag by a user who has already used it). This level of tag self-reuse indicates that users will often tag multiple items with the same tag, a behavior consistent with the use of tagging for item categorization and personal content management, as discussed above.
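The split between self-reuse and reuse by other users can be measured in one pass over the time-ordered trace. The sketch below is illustrative Python under the same assumed (user, item, tag, timestamp) tuple layout, not the script behind these measurements.

```python
from collections import defaultdict

# Illustrative sketch (not the script behind the measurements above).
# A reuse event is any use of a tag after its first appearance; it is
# self-reuse when the reusing user has already used that tag herself.
def self_reuse_fraction(trace):
    users_of_tag = defaultdict(set)  # tag -> users seen using it so far
    reuse_events = self_reuse = 0
    for user, _item, tag, _ts in sorted(trace, key=lambda rec: rec[3]):
        if users_of_tag[tag]:        # the tag has been used before: a reuse
            reuse_events += 1
            if user in users_of_tag[tag]:
                self_reuse += 1
        users_of_tag[tag].add(user)
    return self_reuse / reuse_events if reuse_events else 0.0
```

For example, if one user applies 'ml' twice and a second user then applies it once, there are two reuse events, one of which is self-reuse, giving a fraction of 0.5.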
Figure 2.2: Self-tag reuse (left) and daily activity generated by returning users (right). The curves are smoothed by a moving average with window size n = 30.

Additionally, the fact that half of the tag reuse is not self-reuse reinforces the notion that users do share tags, which indicates potentially similar interests. Chapter 3 further investigates this social aspect of tag reuse by defining and evaluating interest sharing among users, as implied by the similarity between users' activity (i.e., tags and items).

2.3.2 New Incoming Users

To understand whether the observed low level of item re-tagging is due to a high rate of new users joining the community, it is necessary to estimate the levels of activity generated by returning users (as opposed to new users that join the community). Figure 2.2 (right) shows that, after a short bootstrap period, the level of tagging activity generated by returning users remains stable at about 80% over the rest of the trace for both CiteULike and Connotea. In del.icio.us, the percentage of activity represented by returning users is even higher, with above 95% of daily activity performed by returning users.

Thus, the low levels of item re-tagging are the outcome of new items being added by returning users, instead of a constant stream of new users joining the community.

2.3.3 The Influence of Power Users

Finally, this study looks into the influence of highly active users on the observed item re-tagging and tag reuse levels. To this end, I conduct an experiment that consists of comparing the observed item re-tagging and tag reuse with and without the activity produced by such power users.
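The comparison just described reduces to a two-sample Kolmogorov-Smirnov test over the daily levels computed with and without the power users' activity. The fragment below sketches the D-statistic in pure Python for illustration (the daily-level numbers are made up); a real analysis would typically use a statistics package such as SciPy's ks_2samp, which also reports the p-value.

```python
import bisect

def ks_d_statistic(sample_a, sample_b):
    """Two-sample KS D-statistic: the maximum vertical distance between the
    empirical CDFs of the two samples. Illustrative sketch only."""
    a, b = sorted(sample_a), sorted(sample_b)

    def ecdf(sorted_sample, x):
        # Fraction of observations <= x.
        return bisect.bisect_right(sorted_sample, x) / len(sorted_sample)

    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)

# E.g., daily item re-tagging levels with vs. without the top-1% users
# (made-up numbers, for illustration only):
d = ks_d_statistic([0.15, 0.16, 0.14, 0.15], [0.11, 0.12, 0.12, 0.13])
```

Identical samples yield D = 0, while fully separated samples yield D = 1; the null hypothesis is rejected when D exceeds the critical value for the chosen α.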
The experiment takes the power users to be the top 1% most active users according to the number of annotations produced, and calculates item re-tagging and tag reuse as before.

The experiments test the hypothesis that the levels of item re-tagging and tag reuse are the same with and without the activity produced by these power users. To this end, I apply the Kolmogorov-Smirnov test (KS-test) to the two samples of activity (i.e., with and without the power users), with the null hypothesis that the item re-tagging and tag reuse observed in the two samples come from the same distribution (i.e., H0: the item re-tagging and tag reuse levels are equally distributed with and without the power users).

At a confidence level of 99% (α = 0.01, p = 1 − α), the null hypothesis can be rejected for all the systems except the item re-tagging levels for Delicious (see the p-values in Table 2.3). This means that removing the activity produced by the power users leads to statistically different levels of item re-tagging and tag reuse, as indicated by the D-statistic (i.e., the maximum difference between the two distributions) in Table 2.3 [87].

An explanation for the exception is that Delicious focuses on social bookmarking of URLs of any type (as opposed to being restricted to scientific articles, as in CiteULike and Connotea). Removing the top 1% most active users does not affect the observed levels of item re-tagging because some items attract the attention of many other, less active users; these users are therefore responsible in large part for the observed levels of item re-tagging in del.icio.us.

Table 2.3: The statistical test results reject the hypothesis that the item re-tagging and tag reuse observations with and without the power users are equal.
However, in most cases the difference has a small magnitude.

                 Re-Tagged Items
                 D-Statistic   p-value <
    CiteULike    0.03516       2.2 × 10^-16
    Connotea     0.1889        2.2 × 10^-16
    Delicious    0.0475        0.0768

                 Reused Tags
                 D-Statistic   p-value <
    CiteULike    0.2858        2.2 × 10^-16
    Connotea     0.2132        2.2 × 10^-16
    Delicious    0.1371        3.23 × 10^-16

2.3.4 Summary and Implications

The observed user behavior impacts the efficiency of systems that rely on the inferred similarity among items, such as recommender systems. On the one hand, the relatively low level of item re-tagging suggests a highly sparse data set (i.e., attempting to connect users based on similar items will connect only a few user pairs). A sparse data set poses challenges when designing recommender systems, as they typically rely on the similarity of users based on their past activity to make recommendations.

On the other hand, the higher level of tag reuse confirms that analyzing tags has the potential to circumvent, or at least alleviate, the sparsity problem described above. The tags and users that relate to each item could not only serve to link items and build an item-to-item structure, but could also potentially provide semantic information about items. This information may help, for instance, to design better bibliography and citation management tools for the research community.

The results on analyzing the impact of power users on the observed levels of item re-tagging and tag reuse support two ideas: first, the notion that some users are instrumental in reducing the sparsity of tagging data sets (i.e., without power users, tags and items would be reused less, and therefore fewer items would be connected through tags and users). In fact, recommender systems benefit directly from the activity produced by such power users, as they can connect more items via repeated tag usage.
Second, the role of power users differs from system to system, potentially due to effects of population size and diversity of interests. In the largest and most diverse system studied here, reuse is a result of the activity of less active users rather than only of power users.

2.4 Temporal Dynamics of Users' Tag Vocabularies

The item re-tagging and tag reuse analysis presented in the previous section shows that users constantly produce new information in the system, by adding both new items to their libraries and new tags to their vocabularies, though at different rates.

Although user tag vocabularies are constantly growing, it is unclear whether the growth rate is uniform over time. More importantly, vocabulary growth may or may not imply changes in the relative tag usage frequency of a given user. Changes in these frequencies can indicate shifts in user interests over time.

To better understand these aspects of tagging activity, this section characterizes the temporal dynamics of user tag vocabularies. In particular, we study the rate of change of user vocabularies over time, as it quantifies the growth rate and the changes in tag usage frequency for each user vocabulary. The following question guides this investigation:

• RQ1.2 What are the temporal dynamics of tag vocabularies?

To address this question, this section quantifies the evolution of user tag vocabularies by considering both their vocabulary growth and the tag usage frequency at different points in time. More specifically, the experiments first characterize the growth of user vocabularies and, second, estimate the distance between tag vocabularies, as expressed by the distance between snapshots of a user's vocabulary at various points in time and her final vocabulary.
To take tag usage frequency into account, the tags are ordered according to their frequency (i.e., the number of times the user annotated an item with the tag).

2.4.1 Methodology

Time is introduced into the definition of a user vocabulary by defining the tag vocabulary of a user Tu(s, f) as the set of tags used within the tag assignment interval [s, f]. A particular case is Tu(1, n), where 1 and n indicate the timestamps of the first and the last observed tag assignments by user u, respectively. Thus Tu(1, n) = Tu represents the user's entire vocabulary.

Vocabulary growth. To analyze vocabulary growth, it is necessary to track the distribution of growth rates across the user population for the duration of the traces. The goal is to understand whether the growth rate changes according to user age. Therefore, growth is measured by the following ratio:

    (|Tu(1, k+1)| − |Tu(1, k)|) / |Tu(1, k+1)|        (2.2)

where k ∈ [1, n] for all users in the system (i.e., 1 and n represent the timestamps of the first and last tag assignments of a particular user, respectively).

Vocabulary change. To measure the rate of change in the content of the vocabularies, this investigation considers vocabularies as sets of tags in decreasing order of usage frequency (i.e., the number of times the tag was used to annotate any item), and applies a distance metric as follows.

In this context, the final tag vocabulary Tu(1, n) is taken as a reference point to study the evolution of tag vocabularies in terms of the usage frequency of individual tags. The rationale behind this choice of reference is that, according to the tag reuse results in Section 2.3, user tag vocabularies are constantly growing. Therefore, it is unlikely that splitting the activity trace into disjoint windows could help identify meaningful evolution patterns. Instead, we trace the evolution of a user's tag vocabulary by comparing the distance of incremental snapshots to her final vocabulary.
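As a concrete illustration of the growth ratio in Equation 2.2, the sketch below evaluates it after every tag assignment of a single user; the list-of-tags input is an assumed minimal representation of a user's timestamp-ordered activity, and the ratio is also evaluated for the very first assignment.

```python
def vocabulary_growth_rates(tags):
    """Growth ratio (|Tu(1, k+1)| - |Tu(1, k)|) / |Tu(1, k+1)| evaluated
    after each assignment in a user's timestamp-ordered tag sequence."""
    vocab, rates = set(), []
    for tag in tags:
        before = len(vocab)
        vocab.add(tag)  # no-op if the tag is a reuse
        rates.append((len(vocab) - before) / len(vocab))
    return rates

print(vocabulary_growth_rates(["ml", "web", "ml", "nlp"]))
# [1.0, 0.5, 0.0, 0.333...]: a reused tag contributes no growth
```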
This way, it is possible to understand the rate of convergence of user vocabularies over time. The experiment consists of calculating the distance from the tag vocabularies Tu(1, k) (k ∈ [2, n]) to the reference tag vocabulary Tu(1, n).

A traditional metric to calculate the distance between two lists of ordered elements is Kendall's τ distance [51], which considers the number of pairwise swaps of adjacent elements necessary to make the lists similarly ordered. However, Kendall's τ distance assumes that both lists are composed of the same elements. Since we are interested in the evolution of tag vocabularies over time, this assumption is not valid in our case: tag vocabularies are likely to contain different tags at different times, due to the constant inclusion of new tags.

Therefore, we apply the generalized Kendall's τ distance, as defined by Fagin et al. [25], which relaxes the restriction mentioned above and accounts for elements that are present in one permutation but missing in the other. Like the original Kendall's τ distance, the generalized version counts the number of pairwise swaps of items necessary to make the lists similarly ordered. Additionally, the generalized version accounts for the absence of items via a parameter p, which can be set between 0 and 1 to express various levels of certainty about the order of absent items. For example, in the case where two items are missing from one list but present in the other, setting p = 0 indicates that there is not enough information to decide whether the two items are in the same order or not; conversely, setting p = 1 indicates that there is full information available to count the absence as an increase in the distance between the lists. In the experiments that follow we use p =

2.4.2 Results and Implications

Our analysis filters out users that had negligible activity, considering only users with at least 10 annotations.
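The generalized distance just described can be sketched as follows. This is my reading of Fagin et al.'s case analysis applied to two frequency-ranked tag lists (most frequent first), not the thesis implementation; pairs whose relative order is undecidable, because both tags are absent from one list, are charged the penalty p.

```python
def generalized_kendall_tau(list_a, list_b, p=0.0):
    """Generalized Kendall's tau distance between two ranked lists,
    counting discordant pairs and charging `p` for pairs whose order
    cannot be determined because both elements are missing from one
    list. An element absent from a list is treated as ranked below
    every element present in that list."""
    rank_a = {t: r for r, t in enumerate(list_a)}
    rank_b = {t: r for r, t in enumerate(list_b)}
    items = sorted(set(rank_a) | set(rank_b))
    dist = 0.0
    for idx, i in enumerate(items):
        for j in items[idx + 1:]:
            in_a = (i in rank_a, j in rank_a)
            in_b = (i in rank_b, j in rank_b)
            if all(in_a) and all(in_b):
                # Both ranked in both lists: discordant if orders differ.
                if (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j]) < 0:
                    dist += 1
            elif all(in_a) and any(in_b):
                # Both in A, exactly one in B; the absent one is
                # implicitly below the present one in B.
                present = i if i in rank_b else j
                absent = j if present == i else i
                if rank_a[absent] < rank_a[present]:
                    dist += 1
            elif all(in_b) and any(in_a):
                present = i if i in rank_a else j
                absent = j if present == i else i
                if rank_b[absent] < rank_b[present]:
                    dist += 1
            elif any(in_a) and any(in_b):
                # One element only in A, the other only in B: the two
                # lists imply opposite orders, so always discordant.
                dist += 1
            else:
                # Both elements missing from the same list: penalty p.
                dist += p
    return dist

print(generalized_kendall_tau(["ml", "web", "nlp"], ["web", "ml", "db"]))
# 2.0: the ("ml", "web") swap, plus ("db", "nlp") split across lists
```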
This sample is responsible for approximately 93%, 61%, and 90% of the total system activity in terms of tag assignments in CiteULike, Connotea, and Delicious, respectively.

Vocabulary growth rate. Figure 2.3, Figure 2.4, and Figure 2.5 illustrate the vocabulary growth rate across the user population in the three systems studied. The x-axis indicates categories of users according to their age (i.e., the number of days since their first recorded tag assignment), while the y-axis indicates the growth rate relative to each user vocabulary. For each of the systems studied we present two plots, labeled 'median' and '90th percentile'. A point in the median plot indicates that 50% of the users in that age group have a growth rate at or below that point.

The results show that, for the duration of the traces analyzed, the median growth rate (e.g., Figure 2.3, left) is relatively larger for older users. On the other hand, if we take the 90th percentile growth rate (e.g., Figure 2.3, right), except for the very young users, we observe that the rate is relatively the same for all age groups, with a slightly smaller rate for users in the middle of the age spectrum. An important observation is that, except for the growth rate of young vocabularies, the 90th percentile reaches a maximum rate of 0.1. This means that for 90% of users, the vocabulary growth rate is bounded above by 10%.

Figure 2.3: The vocabulary growth pattern in CiteULike

Vocabulary change. Figure 2.6, Figure 2.7, and Figure 2.8 change the focus from growth rate to the rate of change in users' vocabularies. The figures present the rate of change in the contents of user vocabularies, taking into account the frequency of tags and calculating the distance between vocabulary snapshots.
The results show that the distance from the vocabulary at earlier ages to its final state (i.e., the Kendall's τ distance t(Tu(1, k), Tu(1, n)), where k ∈ [2, n]) decreases rapidly in the first 100 days for 50% of users.

These findings have direct consequences for the design of similarity-aware applications. The rapid convergence to the final vocabulary shown in Figure 2.6, Figure 2.7, and Figure 2.8 suggests that it is possible to obtain a relatively accurate approximation of a user's vocabulary based on her initial and limited tag assignment history, which leads to a potential reduction in computation costs when attempting to estimate user similarity. In particular, users' vocabularies are used as the input for the methods that quantify the value of tags, as presented in Chapter 5. Therefore, using part of the user vocabulary can be beneficial, as one reduces the cost of computing tag values without compromising accuracy.

Figure 2.4: The vocabulary growth pattern in Connotea
Figure 2.5: The vocabulary growth pattern in Delicious
Figure 2.6: Rate of change in the tag usage frequency in the user vocabularies of CiteULike
Figure 2.7: Rate of change in the tag usage frequency in the user vocabularies of Connotea
Figure 2.8: Rate of change in the tag usage frequency in the user vocabularies of Delicious

Chapter 3

Characterizing Social Aspects in Tagging Systems

Tagging systems are inherently social, as users can oftentimes annotate content shared by others or use others' annotations to discover new content of interest. Therefore, besides understanding the individual characteristics of user behaviour, it is also important to study this social dimension of social tagging. This chapter presents results on the characterization of social user behaviour.¹
The focus lies on the characteristics of the social ties between users in these systems [64, 76, 78, 81].

The investigation of social ties between pairs of users focuses first on unveiling the characteristics of the implicit ties between users based on the similarity between their tagging activities. Additionally, this work explores the relationship between the strength of such implicit ties and that of more explicit social ties, such as co-membership in discussion groups and semantic similarity of tag vocabularies. Studying the relationship between the implicit and explicit ties is relevant, as we test whether the implicit ties based on usage similarity provide information about the potential creation of explicit social ties and, ultimately, about collaboration. This characterization focuses on two main aspects of user social activity, guided by the following questions:

• RQ2.1 How is the strength of implicit user ties based on activity similarity distributed across the system?

• RQ2.2 Are there relationships between implicit and explicit user ties?

¹ The results presented in this chapter appeared in the following references: [78, 81]

To address these questions, I applied a quantitative approach to characterize traces of activity collected from real social tagging systems. The main findings of this characterization study are:

• RQ2.1. The observed levels of activity similarity between pairs of users are the result of shared interest, as opposed to being generated by chance. The distributions of activity similarity strength deviate significantly from those produced by a Random Null Model (RNM) [71]. This suggests that the implicit ties between users, as defined by their activity similarity levels, capture latent information about user relationships that may offer support for optimizing system mechanisms (Section 3.2).

• RQ2.2. The implicit social ties are related to explicit indicators of collaboration.
We show that user pairs that share interest over items (i.e., annotate the same items) have higher similarity regarding the groups in which they participate together, and higher semantic similarity between their tag vocabularies (even after eliminating the portions of tagging activity related to the items they tag in common) (Section 3.3).

These characteristics have practical implications for the design of mechanisms that rely on implicit user interactions, such as collaborative search [15], spam detection [54], and recommendation [56], as outlined in Section 3.2.4 and Section 3.3.

3.1 Related Work

This section contextualizes this work along the topic of graph-based approaches to studying activity similarity among users.

An alternative way to characterize tagging systems is a graph-centric approach: two users are connected by a weighted edge with strength proportional to the similarity between the tagging activities of these two users. In this study, this similarity is referred to as an implicit social tie between users. Note that other types of connections between users are possible. In particular, we refer to explicit indicators of user collaboration, such as co-membership in discussion groups, as explicit social ties.

This approach has been used by Iamnitchi et al. [45, 46] to characterize scientific collaborations, the web, and peer-to-peer networks. The same model has been used by Li et al. [58] to target the problem of finding users with similar interests in online social networking sites. The authors use a Delicious data set and define links between users based on the similarity of their tags. Their conclusions support the intuition that tags accurately represent content, by showing that tags assigned to a URL match to a great extent the keywords that summarize that URL.
Additionally, they design and evaluate a system that clusters users based on similar interests and identifies topics of interest in a tagging community.

Another focus of graph-centric characterizations is to determine structural features in the graph formed by connecting users, items, and tags based on similarity. Hotho et al. [43] model a collaborative tagging system as a tripartite network (the network connects users, items, and tags in a hypergraph) and design a ranking algorithm to enable search in social tagging systems. Using the same tripartite network model, Cattuto et al. [12] study Bibsonomy and show the existence of small-world patterns in such networks representing social tagging systems. Krause et al. [55] also explore the topology of a tagging system, but the one formed by item similarity, to compare the folksonomies inferred from search logs and tagging systems. Their results suggest that search keywords can be considered as tags applied to URLs. More recently, Kashoob et al. [50] characterize and model the temporal evolution of sub-communities in social tagging systems by looking into the similarity between users' vocabularies.

Our study differs from these previous investigations in three aspects: first, the characterization of tagging activity similarity between users focuses on the system-wide concentration and intensity of pairwise similarities, as opposed to topological characteristics. Second, our methodology provides a principled way to test whether the user similarity observed in social tagging systems is the product of interest sharing among users or of chance. Finally, we investigate possible correlations between the observed levels of activity similarity between users (i.e., the implicit social ties) and external indicators of explicit collaboration (i.e., the explicit social ties), such as co-membership in discussion groups and semantic similarity of tag vocabularies (Sections 3.2 and 3.3).
We note that our methodology is inspired by previous work by Reichardt and Bornholdt that studies the patterns of similarity of product preferences among buyers and sellers on eBay [71].

3.2 Interest Sharing

The analysis of item re-tagging and tag reuse in Section 2.3 suggests that the observed level of re-tagging is the result of different users being interested in the same item and annotating it. We dub this similarity in item-related activity item-based interest sharing; similarly, we dub the similarity in tag-related activity tag-based interest sharing. This section defines and characterizes pairwise interest sharing between users as implied by their annotation activity in CiteULike, Connotea, and Delicious.

Analyzing interest sharing is relevant for information retrieval mechanisms such as search engines tailored for tagging systems [98, 101], which can exploit pairwise user similarity to estimate the relevance of query results. However, this section goes one step further and studies the system-wide characteristics of interest sharing and the implicit social structure that can be inferred from it. Moreover, the next section investigates the relationship between interest sharing (as inferred from activity similarity) and explicit indicators of collaboration, such as co-membership in discussion groups and semantic similarity between tag vocabularies (Section 3.3).

This section focuses in particular on characterizing the interest sharing distributions across the user pairs in the system, and addresses the following question:

• RQ2.1. How is interest sharing distributed across the pairs of users in the system?

3.2.1 Quantifying Activity Similarity

This study uses the Asymmetric Jaccard Similarity Index [47] to quantify the similarity between the item (or tag) sets of two users. Note that previous work (including ours) has used the Jaccard Index to quantify interest sharing: Stoyanovich et al.
[89] used this index to model shared user interest in Delicious and to evaluate its efficiency in predicting future user behavior. Chi et al. [14] applied the symmetric index to determine the diversity of users and its impact in a social search setting.

More formally, the item-based interest sharing metric is defined as follows (the tag-based version is defined similarly and denoted by wT):

Definition 2 The level of item-based interest sharing between two users, k and j, as perceived by k, is the ratio between the size of the intersection of the two item sets and the size of the item set of user k, where Ik is the set of items annotated by user k:

    wI(k, j) = |Ik ∩ Ij| / |Ik|        (3.1)

Equation 3.1 captures how much the interests of a user uk match those of another user uj, from the perspective of uk. We opt for the asymmetric similarity index rather than the symmetric version (which uses the size of the union of the two sets as the denominator in Equation 3.1) to account for the observation that the distribution of item set sizes in our data is heavily skewed. As a result, the situation where a user has a small item set contained in another user's much larger item set happens often. In such cases, the symmetric index would indicate little similarity between interests, while the asymmetric index accurately reflects that, from the standpoint of the user with the smaller item set, there is a large overlap of interests. From the perspective of the user with the large item set, however, only a small part of his interests intersect with those of the other user.

3.2.2 How is Interest Sharing Distributed across the System?

This section presents the distribution of pairwise interest sharing in CiteULike, Connotea, and Delicious. The first observation is that approximately 99.9% of user pairs in CiteULike and Delicious share no interest over items (i.e., wI(k, j) = 0). In Connotea, the percentage is virtually the same: 99.8%.
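The metric in Equation 3.1 is a one-liner over set representations of user libraries. The sketch below (toy item sets, hypothetical identifiers) also makes the asymmetry discussed above concrete:

```python
def interest_sharing(items_k, items_j):
    """Asymmetric Jaccard index of Eq. 3.1: the fraction of user k's
    item set that also appears in user j's item set."""
    if not items_k:
        return 0.0
    return len(items_k & items_j) / len(items_k)

small = {"p1", "p2"}                                      # light user
large = {"p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8"}  # heavy user
print(interest_sharing(small, large))  # 1.0: full overlap from k's view
print(interest_sharing(large, small))  # 0.25: small overlap from j's view
```

The symmetric version would score this pair at 2/8 = 0.25 regardless of perspective, hiding the fact that the lighter user's library is entirely contained in the heavier user's.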
For tag-based interest sharing, the percentage of user pairs with no shared interest (i.e., wT(k, j) = 0) is slightly lower: 83.8%, 95.8%, and 99.7% for CiteULike, Connotea, and Delicious, respectively. Such sparsity in user similarity supports the conjecture that users are drawn to tagging systems primarily by their personal content management needs, as opposed to the desire to collaborate with others (Section 4.4.1 further discusses the qualitative aspects of tag production).

The rest of this section focuses on the remaining user pairs, that is, those user pairs that have shared interest either over items or over tags. To characterize these user pairs, we determine the cumulative distribution function (CDF) of item- and tag-based interest sharing for these sets of user pairs in all three systems.

Figure 3.1: Distributions for item- and tag-based interest sharing (for pairs of users with non-zero sharing) in the studied systems

Figure 3.1 shows that, in all three systems, the typical intensity of tag-based interest sharing is higher than that of its item-based counterpart. This is not surprising: after all, all three systems include two to three times more items than tags. However, there is a qualitative difference across systems with respect to the concentration of item-based and tag-based interest sharing levels, with Delicious showing a much wider gap between the distributions.

The difference between the levels of item- and tag-based interest sharing suggests the existence of latent organization among users as reflected by their fields of interest.
We hypothesize that this observation is due to a large number of user pairs that have similar tag vocabularies regarding high-level topics (e.g., computer networks), but diverging interests in specific sub-topics (e.g., internet routing versus firewall traversal techniques), which could explain the relatively lower item-based interest sharing compared to the observed tag-based interest sharing.

Finally, to put the tag-based interest sharing levels in better perspective, we compare the observed values to those of controlled studies on the vocabulary of users describing computer commands [28]. The tag-based interest sharing level, as observed in Figure 3.1, is approximately 0.2 (or less) for 80% of the user pairs that have some interest sharing. Furnas et al. [28] show that, in an experiment where participants were instructed to provide a word to name a command based on its description (such that it is an intuitive name and more likely to be understood by other people), the ratio of agreement between two participants is in the interval [0.1, 0.2] (i.e., the number of times two participants use the same word divided by the total number of participant pairs).

These observations suggest that tag-based interest sharing is due to conscious choices of terms from vocabularies that are shared among users, rather than to chance. The next section looks more closely into this aspect by constructing a baseline against which to compare the observed interest sharing levels: a random null model.

3.2.3 Comparing to a Baseline

The goal of this section is to better understand the interest sharing levels we observe. In particular, we focus on the following high-level question:

• RQ.2.2.
Do the interest sharing distributions we observe differ significantly from those produced by random tagging behavior?

For this investigation, we compare the observed interest sharing distribution to that obtained in a system whose users have an identical volume of activity and the same user-level popularity distributions for items and tags, but do not act according to their personal interests. Instead, in the random null model (RNM) [71], the chance that a user is interested in an item or tag is simply that item or tag's popularity in the user's vocabulary.

The reason to perform this experiment is the following: we aim to validate the intuition that the interest sharing metric distils useful user behavior information. If the interest sharing levels we observe in the three real systems at hand are more concentrated than those generated by the RNM, then the interest sharing metric captures relevant information about the similarity of user preferences, rather than simple coincidence in tagging activity.

To reiterate, the random null model (RNM) is produced by emulating tagging system activity that preserves the main macro-characteristics of the real systems we explore (such as the number of items, tags, and users, as well as the item and tag popularity and user activity distributions), but where users make random tag assignments. As such, random assignments are used here as the opposite of interest-driven assignments.

To test this hypothesis, the experiment compares the two sets of data (real and RNM-generated) in terms of the number of user pairs with non-zero interest sharing and the interest sharing intensity distribution. Because of its probabilistic nature, we use the RNM to generate five synthetic traces corresponding to each of the real systems we analyze. For the rest of this section, the RNM results represent averages over the five RNM traces for each system.
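A minimal sketch of generating one such synthetic trace follows. It is a simplification of the model described above: it preserves each user's activity volume and weights random draws by global item and tag popularity, whereas the thesis RNM also preserves the user-level popularity distributions.

```python
import random

def rnm_trace(real_trace, rng=None):
    """Random-null-model sketch: keep the sequence of acting users (and
    hence each user's activity volume), but replace item and tag
    choices with popularity-weighted random draws, i.e. assignments
    driven by popularity rather than personal interest."""
    rng = rng or random.Random(42)
    # Lists with duplicates: drawing uniformly from them weights each
    # item/tag by its popularity in the real trace.
    items = [item for _u, item, _t in real_trace]
    tags = [tag for _u, _i, tag in real_trace]
    return [(user, rng.choice(items), rng.choice(tags))
            for user, _i, _t in real_trace]

real = [("a", "i1", "ml"), ("a", "i2", "ml"), ("b", "i1", "web")]
synthetic = rnm_trace(real)
print(synthetic)  # same users and volumes, randomized items and tags
```

Interest sharing distributions computed over such synthetic traces then serve as the chance-level baseline against which the real distributions are compared.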
We confirmed that the five synthetic traces represent a large enough sample to guarantee a narrow 95% confidence interval for the average interest sharing observed from the RNM simulations.

The data analysis presented in this section confirms that interest sharing deviates significantly from that generated by random behavior in two important respects.

First, interest sharing (and, consequently, the similarity between users) is more concentrated in the real systems than in the corresponding simulated RNM. More specifically, the number of user pairs that share some item-based interest (i.e., wI(k, j) > 0) is approximately three times smaller in the real systems than in the RNM-generated ones. Tag-based interest sharing follows a similar trend.

Second, the interest sharing distribution deviates significantly from that produced by the RNM. We compare the cumulative distribution functions (CDF) of the interest sharing intensity for the user pairs that have some shared interest (i.e., w(k, j) > 0). Figure 3.2 presents the Q-Q plots that directly compare the quantiles of the distributions of interest sharing levels derived from the actual trace and those derived from the simulated RNM. A deviation from the diagonal indicates a difference between these distributions: the higher the points are above the diagonal, the larger the difference between the observed interest sharing levels and those generated by the RNM.

Note that the only interest sharing distribution that is close to the one produced by the RNM is Connotea's tag-based interest sharing (Figure 3.2).
However, there is still a significant deviation from randomness: the real activity trace leads to three times fewer user pairs that share interest than the corresponding RNM.

3.2.4 Summary and Implications

This section provides a metric to estimate pairwise interest sharing between users; offers a characterization of interest sharing levels in CiteULike and Connotea; and investigates whether the observed interest sharing in these systems deviates from that produced by chance, given the amount of activity users had. Such a reference is given by a random null model (RNM) that preserves the macro-characteristics of the systems we investigate, but uses random tag assignments.

Figure 3.2: Q-Q plots that compare the interest sharing distributions for the observed vs. simulated (i.e., the RNM model) activity for CiteULike (left) and Connotea (right)

The comparison highlights two main characteristics of the interest sharing: first, interest sharing is significantly more concentrated in the real traces than in the RNM-generated activity; in quantitative terms, three times fewer user pairs share interests in the real traces. Second, most of the time, for the user pairs that have non-zero interest sharing, the observed interest sharing intensity is significantly higher in each real system than in its RNM equivalent.

A conjecture to explain these observations is as follows. Let us consider that the set of tags that can be assigned to an item is largely limited by the set of topics that item is related to. In this case, intuitively, the probability of choosing a tag is conditional on the set of topics the item is related to. At one extreme, the maximum diversity of topics occurs when there is a one-to-one mapping between topics and tags, that is, when each tag introduces a different topic.
The RNM simulates the other extreme: a single topic that encompasses all tags in the system.

However, in real systems, the interests of each individual user are limited to a finite set of topics, which is likely to determine their tag vocabulary. This leads to a concentration of interest sharing, as implied by the tag similarity, on few user pairs, yet at higher intensity than that produced by the RNM.

Finally, and most importantly, the divergence between the observed and the RNM-generated interest sharing distributions shows that activity similarity, our metric to quantify interest sharing intensity, embeds information about user self-organization according to their preferences. This information, in turn, could be exploited by mechanisms that rely on implicit relationships between users. The next section seeks evidence about the existence of such information by analyzing the relationship between implicit user ties, as inferred from the similarity between users' activity, and their explicit social ties, as represented by co-membership in discussion groups or semantic similarity between tag vocabularies.

3.3 Shared Interest and Indicators of Collaboration

The previous section characterizes interest sharing across all user pairs in each system and suggests that it encodes information about user behavior, as its distribution deviates significantly from that produced by a random null model.

This section complements this characterization and evaluates whether the implicit user relationships that can be derived from high levels of interest sharing correlate with explicit online social behavior.
More specifically, this section addresses the following question:

• RQ2.2 Are there correlations between interest sharing and explicit indicators of social behavior?

Before starting the analysis, it is important to mention that the number of externally observable elements of user behavior to which we have access is limited by the design of the tagging systems themselves (e.g., the tagging systems collect limited information on user attributes) and by our limited access to data (e.g., we do not have access to browsing traces or search logs).

One CiteULike feature, however, is useful for this analysis: CiteULike allows users to explicitly declare membership to groups and to share items among a selected subset of co-members – an explicit indicator of user collaboration in the system. Thus, this feature enables an investigation of the relationship between interest sharing and group co-membership (which we assume to indicate collaboration). Note that a similar experiment could be performed using the explicit friendship links in Delicious, for example; however, this data is not available to our study.

Along the same lines, we use a second external signal: semantic similarity between tag vocabularies.
More specifically, we test the hypothesis that item-based interest sharing relates to semantic similarity between user vocabularies. The underlying assumption here is that users who (have the potential to) collaborate employ semantically similar vocabularies.

This section presents the methodology and the results of these two experiments that mine the relationship between interest sharing and indicators of collaboration. In brief, our conclusions are:

• User pairs with positive item-based interest sharing have a much higher similarity in terms of group co-membership and semantic tag vocabulary than users who have no interest sharing.

• On the other hand, we find no correlation between the intensity of the interest sharing and the collaboration levels as implied by group co-membership or vocabulary similarity.

3.3.1 Group Membership

In CiteULike, approximately 11% of users declare membership to one or more groups. While the percentage may seem small, these are the most active users: they generate 65% of tag assignments, and introduce 51% of items and 50% of tags. For this section, we limit our analysis to the user pairs for which both users are members of at least one group. Also, the analysis focuses on groups that have two or more users (about 50% of all groups), as groups with only one user are obviously not representative of potential collaboration.

The goal is to explore the possible relationship between item-based interest sharing and co-membership in one or more groups. Let Hu be the set of groups in which the user u participates. We determine the group-based similarity wH(u,v) between two users u and v using the asymmetric Jaccard index, similar to the item-based definition in Eq.
3.1, but considering the sets of groups users participate in. Based on this similarity definition, we study whether the intensity of item-based interest sharing between two users with non-zero interest sharing (i.e., wI(u,v) > 0) correlates with group-membership similarity.

The experiments show no correlation between wI(u,v) – the item-based interest sharing – and wH(u,v) – the group-based similarity. More precisely, Pearson's correlation coefficient is approximately 0.12, and Kendall's τ is about 0.05. This is surprising, as one would expect that being part of the same discussion groups is a good predictor of the intensity with which users share interest over items. Therefore, we look into these correlations in more detail.

To put these correlation results in perspective, we look at group similarity for two distinct groups of user pairs: those with no item-based interest sharing (wI(u,v) = 0) and those with some interest sharing (wI(u,v) > 0). We observe that, although the group information is relatively sparse, pairs of users with positive interest sharing are more likely to be members of the same group than the user pairs where wI(u,v) = 0. In particular, 4% of the user pairs with wI(u,v) > 0 have wH(u,v) > 0.2, while twenty times fewer user pairs with wI(u,v) = 0 have wH(u,v) > 0.2.

These observations suggest that activity similarity (defined according to Eq. 3.1) is a necessary, but not sufficient, condition for higher-level collaboration, such as participation in the same discussion groups. Although users share interest over items, and may implicitly benefit from each other's tagging activity (e.g., using one another's tags to navigate the system), this may not directly lead to users actively engaging in explicit collaborative behavior.
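The group-based similarity mirrors the item-based one: both are asymmetric Jaccard indices over sets (items or groups). A minimal sketch, assuming Eq. 3.1 has the form |Su ∩ Sv| / |Su| (the exact definition appears earlier in the chapter):

```python
def asymmetric_jaccard(set_u, set_v):
    """Fraction of u's set that is shared with v -- the assumed
    Eq. 3.1-style form |Su & Sv| / |Su|."""
    if not set_u:
        return 0.0
    return len(set_u & set_v) / len(set_u)

# Toy profiles (illustrative data, not from the thesis datasets):
items = {"u": {1, 2, 3, 4}, "v": {3, 4, 5}}
groups = {"u": {"g1", "g2"}, "v": {"g2"}}

w_items = asymmetric_jaccard(items["u"], items["v"])    # |{3,4}| / 4 = 0.5
w_groups = asymmetric_jaccard(groups["u"], groups["v"])  # |{g2}| / 2 = 0.5
```

Arrays of wI(u,v) and wH(u,v) collected over all user pairs could then be handed to, e.g., scipy.stats.pearsonr and scipy.stats.kendalltau to reproduce the correlation analysis above.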
Conversely, the lack of interest sharing strongly suggests a lack of collaborative behavior.

3.3.2 Semantic Similarity of Tag Vocabularies

This section complements the previous analysis of the relationship between item-based interest sharing and collaboration indicators via group co-membership. It investigates the potential relation between the item-based interest sharing of a pair of users and the semantic similarity between their tag vocabularies, that is, the set of tags each has applied to items in their libraries. Since, through this experiment, we aim to understand the potential for user collaboration through similar vocabularies, when comparing vocabularies for a user pair we exclude the tags applied to the items the two users have tagged in common, as these tags likely have a high similarity.

The rest of this section is organized as follows: it presents the metric used to estimate the semantic similarity of two tag vocabularies; discusses methodological issues; and, finally, presents the evaluation results.

Estimating semantic similarity: This experiment uses the lexical database WordNet to estimate the semantic similarity between individual tags. WordNet consists of a set of hierarchical trees representing semantic relations between word senses, such as synonymy (the same or similar meaning) and hypernymy/hyponymy (one term is a more general sense of the other). Different methods have been implemented to quantify semantic similarity using WordNet. In particular, WordNet::Similarity – a Perl module – provides a set of semantic similarity measures [68].

The experiments use the Leacock-Chodorow similarity metric [10], as previous experiments, based on human judgments, suggest that it best captures the human perception of semantic similarity.
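The Leacock-Chodorow score for a pair of senses is −log(len / (2D)), where len is the shortest path between the senses in the "is-a" hierarchy (counted in nodes) and D is the depth of the taxonomy; with D = 20 this yields the WordNet-only maximum of −log(1/40) ≈ 3.689 quoted in this section. A self-contained sketch on a toy hierarchy follows; production work would use WordNet itself (e.g., via the WordNet::Similarity module cited above), and the toy tree here is purely illustrative:

```python
import math

# Toy "is-a" hierarchy: child -> parent (the root has no parent).
PARENT = {
    "dog": "canine", "canine": "animal",
    "cat": "feline", "feline": "animal",
    "animal": "entity",
}
DEPTH = 4  # maximum depth of this toy taxonomy, counted in nodes


def path_to_root(node):
    """List of nodes from `node` up to the root, inclusive."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def lch_similarity(a, b, depth=DEPTH):
    """-log(len / (2 * depth)), with len the shortest a-b path
    counted in nodes, in the spirit of Leacock-Chodorow."""
    pa = {n: i for i, n in enumerate(path_to_root(a))}
    for j, n in enumerate(path_to_root(b)):
        if n in pa:  # first (lowest) common ancestor
            length = pa[n] + j + 1  # nodes on the path, inclusive
            return -math.log(length / (2 * depth))
    raise ValueError("no common ancestor")


# Identical senses give the maximum score, -log(1 / (2 * depth)):
assert abs(lch_similarity("dog", "dog") - math.log(2 * DEPTH)) < 1e-9
```

Closer pairs score higher: here lch_similarity("dog", "canine") exceeds lch_similarity("dog", "cat"), since the dog-cat path must climb up to "animal".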
The Leacock-Chodorow metric is derived from the negative log of the path length between two word senses in the WordNet "is-a" hierarchy, and is only usable between word pairs where each has at least one noun sense.

Additionally, we explore a method to extend coverage to a larger subset of users' tag vocabularies, with an approach that builds on the YAGO ontology, developed and described by Suchanek et al. [91, 92]. YAGO ("Yet Another Great Ontology") is built from the entries in Wikipedia (http://wikipedia.org), a collaborative online encyclopedia. The standardized formatting of Wikipedia makes it possible for information to be automatically extracted from the work of thousands of individual contributors and used as the raw material of a generalized ontology. The primary content of the YAGO ontology is a set of fact tables consisting of bilateral relations between entities, such as "bornIn", a table of relations between persons and their birthplaces.

Table 3.1: The share of tagging activity captured by the tag vocabularies in CiteULike and Connotea that is found in the WordNet and WordNet-combined-with-YAGO lexical databases. As we use an anonymized version of the del.icio.us dataset, with all the users, items, and tags identified by numbers, we could not perform the same analysis using WordNet and YAGO for del.icio.us.

            WordNet only    WordNet + YAGO
CiteULike   62.1%           79.5%
Connotea    51.3%           65.3%
Combined    57.4%           73.4%
Five of the relations are of particular interest to us because they contain links between entities mentioned in Wikipedia and terms found in WordNet. In this way, we are able to identify some tags as probable personal, collective, or place names, and use the WordNet links from YAGO to map these onto a set of corresponding WordNet terms.

A merged tag vocabulary that combines tags from the CiteULike and Connotea datasets shows that a little over 13% of the tags had direct matches in WordNet. By adding the tags matched through comparison with YAGO's WordNet links, this was increased to 28.6% of the unique tags applied by users of both systems. Note, however, that these tags cover up to 75% of the tagging activity in the two systems, as shown in Table 3.1.

In order to match tags gathered from the two systems with corresponding entities in YAGO, all non-ASCII characters, such as accented letters, are replaced by their nearest ASCII equivalents; also, the experiment removes all characters other than letters and numerals, and reduces all the YAGO entities to lower case (tags from both systems being already reduced to lower case). Finally, partial matches are allowed, but for a partial match to count, the end of a tag must correspond to a word boundary in the YAGO entity, or vice-versa. This procedure enables the construction of a mapping between about 58,600 tags from the merged vocabulary and 57,900 distinct WordNet senses, with most tags matching multiple WordNet senses. Given that the addition of WordNet terms identified by mapping through YAGO effectively increases the total depth of the tree being considered, the Leacock-Chodorow algorithm required that we adjust all tag-pair similarity scores accordingly in order to fairly compare the WordNet-only and WordNet+YAGO scores.
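The normalization step described above (fold accented letters to their nearest ASCII equivalents, strip everything except letters and numerals, lower-case) can be sketched with the standard library; the exact folding rules used in the thesis are assumed:

```python
import re
import unicodedata


def normalize_tag(tag):
    """Fold accented characters to their closest ASCII equivalents,
    drop everything except letters and digits, and lower-case --
    mirroring the YAGO-matching procedure described above."""
    folded = unicodedata.normalize("NFKD", tag)  # split base char + accent
    ascii_only = folded.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-zA-Z0-9]", "", ascii_only).lower()


assert normalize_tag("Réseaux-Sociaux") == "reseauxsociaux"
assert normalize_tag("São Paulo!") == "saopaulo"
```

NFKD decomposition separates a character such as "é" into "e" plus a combining accent, so the subsequent ASCII encoding with errors="ignore" keeps the base letter and silently drops the accent.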
The maximum possible similarity with WordNet alone is −log(1/40), or about 3.689, whereas with WordNet + YAGO it is −log(1/42), or about 3.738.

The similarity sim(t1, t2) between two tags (t1, t2) is defined as the maximum Leacock-Chodorow similarity between every available noun sense of t1 and t2. Thus, the semantic similarity between the tag vocabularies Tu and Tv of two users, u and v, as perceived by u, is denoted by s(u,v), and determined by the ratio between the sum over the pairwise tag similarities and the size of u's vocabulary, as expressed by Eq. 3.2 below.

s(u,v) = ( Σ_{t1∈Tu, t2∈Tv} sim(t1, t2) ) / |Tu|    (3.2)

We then calculate the corresponding value of s(v,u) by reversing the u and v terms in Eq. 3.2 and record the smaller of the two – i.e., min(s(u,v), s(v,u)) – as the undirected tag-vocabulary similarity between the two users u and v. We note that this metric is based on the Modified Hausdorff Distance (MHD) [22].

Methodological issues. There are three practical issues regarding our experimental design that deserve a note. First, to avoid bias, if two users assigned the same tags to the same item, we omit these tags from their vocabularies before determining the aggregate similarity. By eliminating from the vocabularies the tags that have been used on exactly the same items, we eliminate the tags on which the two users have most likely already converged; we look only at the remaining parts of the vocabularies, where convergence is not apparent.

Figure 3.3: CDFs of tag vocabulary similarity for user pairs with positive (bottom curve) and zero (top curve) activity similarity (as measured by the item-based interest sharing defined in Eq. 3.1). CiteULike (left); Connotea (right)

Second, the Leacock-Chodorow similarity metric only considers words that have noun senses in WordNet, because it is calculated from paths through the "is-a" hierarchy, which is only defined for nouns.
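Given a tag-level similarity function, the vocabulary-level score of Eq. 3.2 and its undirected min-combination can be sketched as follows; the per-tag sim() below is a trivial stand-in for the WordNet-based score:

```python
def directed_vocab_similarity(vocab_u, vocab_v, sim):
    """s(u, v) of Eq. 3.2: sum of pairwise tag similarities,
    normalized by the size of u's vocabulary."""
    if not vocab_u:
        return 0.0
    total = sum(sim(t1, t2) for t1 in vocab_u for t2 in vocab_v)
    return total / len(vocab_u)


def vocab_similarity(vocab_u, vocab_v, sim):
    """Undirected similarity: min(s(u, v), s(v, u)), in the spirit
    of the Modified Hausdorff Distance mentioned above."""
    return min(directed_vocab_similarity(vocab_u, vocab_v, sim),
               directed_vocab_similarity(vocab_v, vocab_u, sim))


# Stand-in tag similarity: 1.0 for equal tags, else 0.0.
toy_sim = lambda a, b: 1.0 if a == b else 0.0
u, v = {"ml", "python", "nlp"}, {"ml", "art"}
assert directed_vocab_similarity(u, v, toy_sim) == 1.0 / 3
assert vocab_similarity(u, v, toy_sim) == 1.0 / 3  # s(v, u) = 0.5 is larger
```

Taking the minimum of the two directed scores makes the measure symmetric and conservative: a pair only scores high when each user's vocabulary is well covered by the other's.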
Tags in both systems may include words or phrases from any language, abbreviations, or even arbitrary strings invented by the user, while WordNet consists mainly of common English words. A third methodological issue was that matching tags to YAGO entries, in some cases, returned an unmanageably large set of distinct WordNet senses. We accordingly eliminated the tags that were above the 99th percentile in distinct WordNet senses matched, i.e., those returning more than 167 distinct senses.

Results. We use sampling to test, in both CiteULike and Connotea, whether there is a significant difference in tag vocabulary similarity between two sets of user pairs: one where all pairs have no item-based interest sharing and one with positive item-based interest sharing (we sample each group with n = 4000 pairs). This analysis shows that the vocabularies of user pairs with interest sharing are significantly more similar than those of user pairs with no interest sharing (Figure 3.3). The median vocabulary similarity for user pairs with positive interest sharing, µc = 2.112 (±0.02, 99% c.i.), is about 1.6 times that of user pairs with no interest sharing, µu = 1.308 (±0.04, 99% c.i.). This salient difference in vocabulary similarity suggests that the item-based interest sharing embeds information about the "language" shared by the users to describe the items they are interested in.

3.3.3 Summary and Implications

This section takes a first step towards understanding the relationship between implicit user ties, as inferred from pairwise interest sharing, and explicit social ties. First, we look at correlations between the item-based interest sharing and the group-based similarity.
The observations indicate that although the intensity of item-based interest sharing does not correlate with explicit collaborative behavior, as implied by group co-membership, user pairs with some interest sharing are more than one order of magnitude more likely to participate in similar groups.

Second, we evaluate the relationship between item-based interest similarity and the semantic similarity of tag vocabularies. We discover that, although the two show no linear (Pearson) correlation, item-based interest similarity does embed information about the expected semantic similarity between user vocabularies.

These results have implications for the design of mechanisms that aim to predict collaborative behavior, as these mechanisms could exploit item-based similarity to set expectations about group-based and vocabulary-based similarity. Moreover, assuming that the tagging activity characteristics of spammers differ from those of legitimate users, one could use deviations from the observed relationship between item-based similarity and the two indicators of collaborative behavior presented here to detect malicious user behavior.

Chapter 4

Understanding Users' Perception of Tag Value

The first part of this thesis (Chapter 2 and Chapter 3) presents a quantitative characterization of tag production with the goal of understanding how users individually produce tags and socially interact (share interest and collaborate). This chapter moves towards a characterization of users' perception of tag value.1 In particular, it considers users' perception of value both when producing and when using tags for particular tasks. The goal of this chapter is to understand the qualitative aspects users take into account both when producing tags (i.e., annotating content) and when using tags in exploratory search tasks.

In exploratory search tasks, information seekers (i.e., users who are looking to satisfy an information need) navigate the set of items by using tag clouds, as opposed to traditional keyword search.
Users tend to prefer tag-based navigation when they are exploring a topic and want to retrieve a set of related items, as opposed to the single most relevant item [86]. Tag clouds (or similar user-interface artifacts) are the default interaction mode provided by systems like Delicious, StackOverflow, or MrTaggy [49]. Figure 4.1 illustrates what a tag cloud typically looks like. Tag clouds are generally initialized with the set of most popular tags.

1 The results presented in this chapter appeared in the following reference: [80].

Information seekers start the navigation by entering a tag-query (typing or clicking). The system, in turn, retrieves items that are annotated with that tag-query and related tags (e.g., in the form of a tag cloud). The navigation continues further if the user selects one of the available tags presented by the system. The search result at each navigation step is generally composed of the items annotated with all the tags selected by the user. In this sense, we assume that the tagging system provides AND-semantics [6].

Figure 4.1: Example of a tag cloud extracted from the most popular tags in Flickr. Information seekers interact with the tag cloud by clicking on each term to retrieve items that are annotated with that specific tag. The tag cloud is reconstructed at each step.

The investigation presented in this chapter addresses the following research question:

• RQ3. What are the aspects that influence users' perception of tag value for exploratory search?

To address this question, this study uses qualitative research methods. In particular, in-depth contextual interviews help collect the data, while the analysis cycle resorts to grounded theory methods [39]. This leads to a characterization of the aspects that influence users' perception of tag value for exploratory search and when producing tags to annotate content.
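The navigation loop assumed in this chapter (select a tag, retrieve the items carrying every selected tag, rebuild the cloud from the co-occurring tags) can be sketched as follows; the data layout is an assumption for illustration:

```python
def search(annotations, selected_tags):
    """AND-semantics retrieval: the items carrying every selected tag,
    plus the next tag cloud rebuilt from the co-occurring tags.

    `annotations` maps item -> set of tags applied to that item.
    """
    selected = set(selected_tags)
    # AND-semantics: an item matches only if it carries all selected tags.
    items = {i for i, tags in annotations.items() if selected <= tags}
    # Next cloud: tags co-occurring on the result set, minus the
    # tags already selected.
    cloud = (set().union(*(annotations[i] for i in items)) - selected
             if items else set())
    return items, cloud


annotations = {
    "item1": {"python", "web", "tutorial"},
    "item2": {"python", "ml"},
    "item3": {"web", "design"},
}
items, cloud = search(annotations, ["python"])
assert items == {"item1", "item2"}
assert cloud == {"web", "tutorial", "ml"}
```

Each additional selected tag narrows the result set: with ["python", "web"] only "item1" survives, mirroring the step-by-step refinement described above.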
The rest of this chapter focuses on the use of tags in exploratory search, while one aspect of tag production is further explored in Chapter 6.

In summary, this chapter presents two major contributions:

• Findings about aspects of users' production of tags that help solidify the existing body of research on the motivation behind tagging. Moreover, it reveals that there is sometimes a disconnect between the motivations behind producing tags and the aspects that make a tag valuable to users when solving exploratory search tasks (Section 4.4.1).

• A qualitative characterization of users' perception of tag value in the context of exploratory search; based on the qualitative analysis of 9 contextual interviews with social tagging users, we find that the two most salient aspects that influence users' perception of tag value are the ability to retrieve relevant content items and the ability to reduce the search space (Section 4.4).

The next section provides background and positions the investigation in this chapter among the related literature.

4.1 Related Work

In a nutshell, this work differs from previous efforts in two main aspects. First, it is motivated by the view that social tagging systems are inherently online peer production systems [8]; thus, to improve the quality of user contributions, it is necessary to first quantify their value, so that one can then think of designing incentives for the production of high-quality content. Second, this research focuses both on characterizing users' perception of tag value, and on the design and analysis of a method to assess tag value in practice (as opposed to only studying the impact of tags on other information retrieval tasks such as recommendation [3, 9, 26, 90]).

This section starts by positioning the work in this chapter among the related studies on characterizing users' motivation behind tagging (Section 4.1.1).
Next, it discusses previous studies on the economics of information that provide a background for understanding the perceived value of peer-produced information (Section 4.1.2).

4.1.1 Why Do We Tag?

Hammond et al. [34] provide, perhaps, the first study that discusses the characteristics of social tagging, its potential, and the motivations users have to produce tags. The study comments on the features provided by different social tagging systems, and discusses preliminary reasons that incentivize users to annotate and share content online. Marlow et al. [63] discuss the properties of several tagging systems while pointing out their similarities and differences. Additionally, the authors conjecture about the motivations that can potentially drive the production of tags in these systems. Ames and Naaman [3] go deeper into the study of motivations behind tagging and investigate why people tag in mobile (i.e., ZoneTag) and web applications (i.e., Flickr). They interviewed 13 users to address the question: why do people tag? Their findings indicate that there are both personal and social motivations behind tagging. Moreover, the study builds a taxonomy of the motivations behind tagging in these systems along two dimensions: sociality and function. More recent studies have extended the analysis of motivations at a larger scale [90].

This work differs from these previous studies as it concentrates on understanding the use of tags to engage in exploratory search tasks (e.g., exploring the set of items available in a social bookmarking tool), as opposed to focusing on the motivations behind tagging (i.e., the production of tags).

4.1.2 Economics of Information

The value of information in market settings is contextual [88], as it requires one to make use of it to assess its expected value.
Hirshleifer [42] adds to the study of the characteristics of information goods by enumerating and discussing a set of economically significant information attributes that can influence their perceived value, namely: Certainty, Diffusion, Applicability, Content (environmental vs. behavioral), and Decision-relevance; as described below.

Certainty – the value of information goods depends on the amount of certainty they provide about the outcome of a particular process. For example, an annotation that increases the probability of finding an item of interest to a user is more valuable than an annotation that retrieves items of marginal interest to the user.

Diffusion – the availability of information goods across the user population may affect their value, as few users may have the privilege of possessing that information. In tagging systems, one may think of particular items or annotations that are kept private.

Applicability – an information good can be of general or particular applicability or interest. Indeed, a tag or item may serve a general audience or only a small fraction of the user population. For instance, tags can be general enough (e.g., networks) to be of interest to several sub-communities of users that use them to retrieve relevant content. Conversely, other tags (e.g., Agneta2) are only applicable to a more restricted subset of the user population.

Content – naturally, the value of information may be affected by the characteristics of its contents. Hirshleifer points out common subclasses of this aspect in market settings, distinguishing information about the environment from information about the behavior of other individuals in the market. In tagging systems, the content aspect of peer-produced information can map to the semantics of tags, for example: a tag may reveal information about how the user intends to use the annotated content (e.g., 'to-read').
Additionally, a tag may contain information about topics of interest to a user.

Decision-relevance – this dimension captures how important the information is in the context of a decision problem.

2 A character in Pedro Juan Gutierrez's Tropical Animal novel.

As noted by Bates [74], these five aspects give an idea of what factors influence the value of tags and items, but they shed little light on how to determine that value. Arrow [?] and Stigler [88] seem to provide a way out by adding to the five aspects described above the observation that the value of information is determined by its use.

One may expect that these attributes also influence the perceived value of peer-produced information such as tags. However, while the attributes that influence information value have been investigated and discussed in market contexts, it is unclear what role these attributes play, if any, in the context of peer-production systems in general, and in social tagging systems in particular.

Stigler states that the value of information goods can only be assessed by their use [88]. Repo [73] goes further with this statement and discusses two major approaches to assessing the value of information: value-in-use and exchange value.

Our study uses the same notion of a multidimensional value concept as discussed by Hirshleifer [42] and Repo [73]. However, we inquire further into the human perception by performing interviews with users to understand their perception of the value of peer-produced information (which departs from the type of information focused on in previous work). More precisely, we investigate what aspects users take into account when choosing tags in the context of exploratory search tasks.
Understanding the value of information in online peer production systems, as perceived by information seekers, extends the existing body of knowledge on the value of information in market contexts and in the design of information systems, discussed in the next section.

4.1.3 Perceptions of Information Value in System Design

Another context where the perception of information value plays a role is the design of social information-sharing systems. In a recent study, Lampe et al. [57] investigate users' perception of Facebook's value as an information source by conducting a survey with non-faculty staff at Michigan State University. Their analysis shows that Facebook users are not likely to engage in information seeking with their Facebook network. However, users who do engage in information seeking show common characteristics: they tend to be female, younger, and have more total and actual friends on Facebook than those who do not engage in information seeking. Similarly, André et al. [4] investigate the perception of the value of tweets' content. The authors show that while 35% of tweets are rated by users as worth reading, 25% are marked otherwise (not worth reading). Their analysis also shows that the tweets considered valuable are those that provide information about a topic of interest or have some humor.

Our study differs from this previous work as it focuses on the instrumental value of tags (peer-produced information) for a particular application, exploratory search, as opposed to the conceptual value of expert-produced information or the assessment of the value of a platform (e.g., an online social network or a particular search engine) as an information source.

4.2 Methodology

This section presents a brief description of the methodology adopted for the qualitative study (i.e., recruiting, data collection, and analysis methods). Figure 4.2 illustrates the qualitative research cycle used in this work.
It consists of an iterative process that starts by defining a set of research questions, recruiting participants, and performing contextual interviews, and iteratively refines the data collection procedures based on the ongoing analysis of the data.

Figure 4.2: Illustration of the qualitative research cycle inspired by the methodology described in Hennik et al. [39] and applied in this work.

Participants. The target population for this experiment was any Internet user who is familiar with search and navigation tasks in social tagging systems. The recruiting method used a combination of advertising via email and snowball recruiting techniques (i.e., where a participant suggests others who may qualify for the study) [39]. New participants were continuously recruited until a level of diversity among participants and saturation in the data collection was observed. Participants were asked to complete a background and demographics questionnaire.3 We recruited 12 participants. The first two interviews served as pilots and were not used for the final results. We discarded one participant because she failed to demonstrate basic knowledge about social tagging.

3 Available at: http://goo.gl/uLVkRl

The 9 interviewed participants are mostly young males (all are 19 years old or older): only one is female, and only two reported being over 30 years old (two preferred not to report their age). Brazilian nationals are the majority (5), followed by Iranians (2) and USA nationals (2). The group is highly educated: all of them have at least a graduate degree. The majority of the participants have an engineering/computer science background, while two others have backgrounds in linguistics and arts. All of them reported being fully capable of performing exploratory search tasks, and 8 reported being able to develop software.
The data is collected using semi-structured contextual interviews, as this technique provides flexibility in approaching participants about their tag-based search habits. The interview protocol consisted of open-ended questions that explore the users' application of tagging features in different systems (Appendix A).

The interviews were performed either face-to-face or using a video chat tool, at the participants' convenience. The duration of each interview was roughly one hour, and each consisted of two parts. Both parts consist of contextual inquiries where participants are encouraged to use a social tagging system to illustrate usage and explain their choices of tags while searching.

In the first part of the interview, participants were free to use any system they are familiar with and used to produce tags or use tags to search. The goal is to gain insight into users' habits as they explain their understanding of tags and their personal usage choices. In the second part, the participant used a Delicious-clone system4 that is populated with a snapshot of bookmarks and tags collected from Delicious (i.e., more than six hundred thousand entries collected in September 2009 5). The goal of this task-driven interview is to inquire deeper into the users' decision-making process during exploratory searches. By motivating the user with a real search task that is similar to the user's common tasks, we can explore specific aspects that influence the choice of one tag versus another.

All sessions were recorded as a video of the participant's screen and the audio of the conversation. The data collected via the interviews was transcribed, coded, and, finally, analyzed using the Grounded Theory methodology [39].
It is worth noting that data collection was conducted iteratively with data analysis. This approach allowed the research to stop recruiting and interviewing new participants when saturation in the data was observed (i.e., new issues are not found in the analysis of the interview transcripts) together with a diverse demographics in the participant sample.

Analysis. In summary, the analysis cycle consists of transcription of interview recordings, coding, description and comparison of codes, codebook consolidation, analysis plan design, categorization, production of a thick description for each identified issue, and, finally, conceptualization. It is important to highlight that this process is iterative and each step is constantly refined by the output of the previous one.

• Coding. The codebook is seeded with deductive codes (i.e., codes originating from the related literature). During the process of coding a transcribed interview, new inductive codes may surface from the data. This implies a refinement of the codebook by the addition of new codes. In collaboration with another researcher, the codebook is refined by applying triangulation. In this process, each researcher reviews the coding performed by the other independently to consolidate the codebook by means of a discussion and revision of each other's codes and coded interviews. The final codebook that results from this step is available in Appendix B.

4 Code available at: https://github.com/nigini/GetBoo
5 http://arvindn.livejournal.com/116137.htm

• Analysis plan. The next step consists of defining an analysis roadmap that guides the data searches on the coded interviews with the purpose of addressing the initial research questions. The analysis plan, with the result of the data searches per issue, is available at: http://goo.gl/YL38nA.

• Categorization. It is necessary to group codes with similar attributes to build a set of categories such that an understanding of higher-level concepts can be extracted from the data.
The categorization is connected to the step of producing thick descriptions (see Appendix C) of the issues and the related codes that result from the searches on the data guided by the analysis plan.

• Conceptualization. Finally, it is possible to move towards the extraction of concepts that connect the issues and help address the research questions. The main result of this step is the production of a concept map that highlights the aspects that influence users' perception of value. This result can ultimately inform the design of methods that assess the value of tags in the context of exploratory search. The concept map is discussed in detail in Section 4.5.

The next sections provide a description of the findings from the analysis of contextual inquiries based on in-depth interviews, as guided by the research questions mentioned above.

4.3 Which Systems Do Participants Use?

This section discusses the systems in which participants have reported using tags either to annotate content or to seek information. More importantly, we also inquire about the users' motivation behind using the tagging features of these systems, with the intent to confirm previous qualitative studies on the motivations behind tagging.

Systems. Participants reported using a variety of systems, with different motivations for each of them. Twitter was mentioned by eight participants, while Flickr, Delicious, and Facebook were discussed by three participants each.
Other systems mentioned were CiteULike, Dribble, Diigo, Evernote, Instagram, Pinterest, StackOverflow, Vimeo, and YouTube.

We note that this set of systems provides an opportunity for a more comprehensive understanding of tag production and consumption compared to previous studies, as they represent almost all categories of Marlow's tagging system design taxonomy [63]: from self-tagging usage (e.g., Evernote) to systems where free-for-all tagging is allowed (e.g., Twitter, Delicious); different types of objects, such as web pages (Delicious), images (Instagram, Flickr), and micro-blog posts (Twitter); and even different types of resource connectivity (e.g., Flickr photos can be grouped, scientific works in CiteULike cite each other) and social connectivity (from following users in Twitter to private usage in Evernote).

Motivation behind tagging. Participants declared a variety of reasons to use tagging systems, often providing several reasons for using a single system or for using tagging in general. These motivations can be driven by aspects unrelated to the tagging feature, such as the perception that the process of making sense of content is faster in Twitter, as declared by participant P1; or by aspects closely related to the tagging features provided by the system, such as the ability to bookmark items to read them later, as declared by participant P3.
Other participants declared that the need to collaboratively maintain a list of bookmarks that helps them organize and share a reading list with others was a driving motivation to use systems like CiteULike.

In summary, although there are multiple reasons that motivate a participant to use tagging systems, the personal and social information management they provide is the main reason that drives users to contribute to social tagging systems.

4.4 Users' Perception of Tag Value

This section presents a qualitative investigation of the aspects that influence users' perception of tag value in social tagging systems. In particular, the qualitative analysis that follows focuses mainly on the value of tags in the context of exploratory search tasks; however, the analysis also contributes to the existing body of research on the motivation behind tagging by presenting findings about aspects of tag production (Section 4.4.1). Next, Section 4.4.2 presents the analysis of users' perception of tag value for exploratory search.

4.4.1 Aspects of Tag Production

While our main goal is to characterize the perception of tag value in exploratory search tasks to inform the design of methods that quantify tag value, we start by probing the aspects that influence users' perception of tag value when producing annotations (as opposed to using them to search).

A prevalent theme, as observed from the data collected during the interviews, is that users perceive tags as valuable when they help describe the items they are annotating. Such tag assignments are deemed useful as they improve sense-making about a set of items and make individual items searchable. In particular, interviewees comment on the need for tags to describe images and videos with these two purposes. In this context, tags that describe features of the object, such as location, people, and aesthetic characteristics, are considered useful.
For textual items (e.g., tweets), which are themselves searchable, tags are reported as useful to augment their meaning by making explicit a feeling about the text or by providing context for the textual item.

While creating annotations to improve the ability to find the item later, some participants report that there is a tension between using general and specific terms. On the one hand, general tags are likely memorable, and will come to mind as search terms when looking for an item. On the other hand, such tags provide little discriminative power, as they are likely used in many items. Another aspect repeatedly raised by our subjects is the potential of tags to attract attention to items they create or post. Several subjects were concerned with annotating items so that they would likely become more popular, or at least have more chances to be found. As participant P11 describes a strategy to promote content by the use of tags:

P11 – Instead of writing 'got first place in the fencing championship (state league)', I write 'got first place in the #fencing champion (state league)' as it makes easier to other people find my tweet when searching for that tag.

Finally, interviewees commented on how annotations may attach content to a trend or the contributor to a group. Annotating an item with a tag that is currently used in a trending topic, or which is specific to certain groups, is seen as connecting the user with others. Different subjects were motivated to participate in the collective use of a trending tag, or were concerned with not using a tag that is normally used by users of a different opinion group.

In summary, the aspects that influence one user's perception of value during the production of tags may not be in tandem with the expectations of another user when searching for items.
This is based on the observation that some of the driving forces behind tagging and the perception of value during tag production are highly personal (e.g., feelings), and thus other users may not consider the same tag valuable when trying to locate the same item.

In the next section, we address the question of whether there is indeed a mismatch between the perceptions of tag value while annotating items and while searching for them.

4.4.2 Tag Value in Exploratory Search

Exploratory search is the process of acquiring a set of information resources related to an information need (e.g., a particular domain) under a certain level of uncertainty. This section presents the qualitative analysis of aspects that influence users' decision-making along the steps of exploratory search, such as 'What tag to use?' or 'When to stop the search?'.

The contextual interviews, and in particular the search tasks users performed in the second part of the session, enabled us to observe and identify patterns of user behaviour while they engaged in information-seeking tasks (see Appendix A for details on the contextual inquiry guidelines used in this study). Users provided data about their decision-making either voluntarily or by answering specific questions about their actions while trying to locate items that fulfill their information needs. Based on our observations, which synthesize the behaviour across all participants, the exploratory search process can be illustrated by the high-level model shown in Figure 4.3.

Figure 4.3: Illustration of observed users' transitions and decision making during exploratory search tasks.

The illustrative model in Figure 4.3 points out that users stay in a loop: (i) deciding which tags to use to define, at each step, a search space that better reflects their information need; and (ii) judging the relevance of returned items.
More than just reflecting a general intuition about exploratory search processes (and previously proposed models for information foraging [69]), this model is useful when discussing our more specific results about the aspects that influence the participants' decision-making, which we do in the next paragraphs.

Search space definition. A search space is basically a set of items indexed and retrieved by tags. Participants normally define a search space by translating their information need into tags. Search space definition is an essential part of the exploratory search process and involves different perspectives on the set of items retrieved by each tag. Participant P11, for example, during the execution of Search Task 1 (see Appendix A), clicked on 'web2.0' and reported that this tag is more representative of web social networks (which was the main topic of that participant's information need). The same idea was expressed by participant P6 when choosing the tag 'tutorial', as the participant explains that 'tutorial' is a better representation of her particular information need (the participant was looking for material to learn more about programming).

Known vocabulary. As users try to translate their information needs into tags, these tags tend to come from the users' known vocabulary. Participant P8 is clear about that when saying that she chooses hashtags that are 'alike terms that I hear' when performing exploratory search on Twitter. The same user goes further and, right after the previous statement, comments on the 'cryptic' aspect of the tag #DAADC13, saying: 'this one here I would probably not click on because I do not know what it means'. However, this is not simply a matter of a tag being known or unknown to a given user. Participant P3 justifies choosing 'computer science' to search instead of 'computing' by saying that 'basically it's because I use it more often'.
These observations suggest that the more a tag is already used by a user, the higher its perceived value.

Search space size. A characteristic of a search space (as defined by the use of a tag) is its size (i.e., the number of items it contains). Users tend to refer to the 'right size' of a search space in exploratory search when talking about the decision to continue searching for items. Participant P11, for instance, mentions that 'a lot of results is confusing and you'll not be able to find what you want', while 'a number of results that doesn't even fill the system's first page is kind of frustrating'. Additionally, P4 expressed a lack of confidence in the results when too many items were retrieved. In contrast, participant P1 took the action of removing an added tag because it 'might have filtered too much'. Interestingly, many participants mentioned the number of retrieved items (reported by the system in the search results page) as a way to gauge whether the tags are helping to control the search space size. As mentioned by the users, search space size affects their perception of the value of a tag.

Relevance. Besides finding the 'right size' of a search space, no search is complete without locating relevant items. The relevance aspect in our data becomes salient when users are deciding whether a space (composed of retrieved items) fulfills their information needs. In fact, this aspect was raised and described across all participants, which strongly suggests that it is a major influence on users' perception of tag value.

Participant P7 points to this aspect by stating: 'I am going to take a look at the first five or ten entries to have an idea about my results'. After a brief inspection, the participant decides that 'they (the results) still have a lot of noise, so I am going to add one more tag'.
Similarly, participant P8 reports an analysis of the relevance of the space defined by a tag, saying that 'this (set of items) is still not sufficient I gave a quick look but the first (entries) were not interesting'. Participant P7 is more direct in suggesting that relevance influences the perceived value of a tag when reasoning about a particular choice of tags. The participant selected 'software' instead of 'programming' based on the perception that she will 'find more things related (to my information needs)' using the former instead of the latter.

Combination of space size and relevance of items. Participants use words like 'focused', 'specific', 'restrict', and 'refined' to describe a desired search space that balances size and relevance well. Participant P7 supports this observation by explaining a click decision: as 'opensource' is already a subset of (software) development/programming, 'then I'll start clicking at opensource'. Similarly, participant P1 reasons that adding an additional tag to the navigation is beneficial because it 'might give more focused results'. Another strong example related to the influence of this combination of space characteristics is raised by participant P3 when deciding to redefine the space at a particular point of the navigation: 'It looks like this (result) is really related to storage but there is nothing to do with research. I need to refine it more.' This combination of characteristics of a space (as defined by a tag) positively influences the value of a tag, as a tag can define a smaller space that also contains highly relevant items.

Diversity and neighbouring spaces. Finally, two other identified aspects are connected to the tags related to a currently defined search space: diversity and neighbouring spaces.
To some degree, these two aspects are opposite concepts, if one considers that the tags related to a given space (presented as a tag cloud) can be perceived either as increasing the diversity of items in that space or as simply retrieving similar neighbouring spaces.

Participant P4 considers it confusing to have 'artists' as part of the tag cloud when the current space is already defined by the tag 'artist', which suggests that more diversity in the tag cloud improves the perceived value of the tags in the cloud relative to the currently defined space. Similarly, participant P1 is even more emphatic about this aspect while performing Search Task 1 (see Appendix A), stating: 'type and typography both of them point to the same thing, web and website, icon and icons, it's a bit of useless to have these two similar, very similar tags together, this is something that impacts the value, icons have zero value here because you have icon here'. When asked whether replacing these highly similar tags with a more diverse set of tags would improve the perceived value, the participant replied: 'Yes, meaningful diversity within the tags.'

On the other hand, participant P2 selects the tag 'user experience' after using 'ux', while reporting that these two terms are considered synonyms. Participant P2 explains that she perceives that the tag 'user experience' can retrieve results similar to those retrieved by (but not annotated with) the tag 'ux' (i.e., a neighbouring space to the currently defined one). We were unable, however, to identify whether either of these two aspects (i.e., neighbouring search spaces and diversity) is more important than the other regarding the characteristics of a tag cloud.

4.5 Concept Map

The analysis reported in the previous section leads to several insights into the aspects that influence users' perception of tag value.
These insights are summarized by the concept map in Figure 4.4.

Figure 4.4: Concept map that illustrates the influence of several aspects on the perceived value of tags for exploratory search.

The central entity of the map is a tag that has a value perceived by users. Tags may come from a user's vocabulary, which in turn mediates the perception of value a user has about a tag. Moreover, users express their information need via a tag that defines a search space. In turn, a search space has several aspects: relevance (of the items it contains), size (number of items), and a related tag cloud (a set of tags). Users report that two characteristics of the tag cloud (relative to the tag that already defines the space) influence their perception of the value of a tag: the ability to explore similar neighbouring spaces, and the diversity among tags.

These findings are key to designing methods that quantify the value of tags to information seekers in the context of exploratory search. The two aspects of a search space, as defined by a tag, can therefore guide the design of functions that measure the value of a tag. This definition and evaluation are presented in detail in Chapter 5.

It is important to highlight that the concepts and their interactions, as presented by the concept map in Figure 4.4, expose both limitations of current technology and fundamental characteristics of user behaviour in exploratory search. For instance, users express that diversity among tags in a tag cloud is important. This fact is generally expressed by showing discontent with the selection of tags by current systems.
This is an indication that current systems can be improved. On the other hand, the observation that users define a search space by extracting tags from their known vocabulary is a fundamental characteristic of user behaviour when performing exploratory search, regardless of the system in which it occurs.

4.6 Summary

In summary, at least four aspects, as discussed by the users during the contextual inquiry and revealed by the analysis, influence their perceived value of a tag. In particular, two of them are more salient: search space size and relevance. Therefore, the findings suggest that the perceived value of a tag is largely influenced by its ability to retrieve items that are relevant to a user while reducing the search space size. The tag reduces the search space by filtering out items, and maximizes relevance by retaining the items that address the user's information needs during exploratory search.

It is also worth highlighting that this study provides an important characterization that can help in designing, apart from methods to quantify the value of tags, other new social tagging features (e.g., tag cloud algorithms, user interface design, and ranking mechanisms) in many systems, as it improves our understanding of what users consider valuable when searching with tags.

Finally, it is also worth discussing potential threats to validity. As in any qualitative study, this study is subject to design decisions that may impact its validity. The most important aspect is external validity, which I discuss below.

External validity. Although the qualitative study is performed at a small scale, which limits our ability to make general claims about the findings, the list of systems described in Section 4.3 shows that the qualitative analysis covers usage scenarios of both tag production and tag-based search in a variety of systems. We believe that this variety of systems provides a broad set of real usage scenarios and reduces the threat to external validity.
It is also important to recognize that the diversity of participant demographics is limited.

Chapter 5

Assessing Tag Value for Exploratory Search

The previous chapter provides a characterization of users' perception of tag value. In particular, the analysis focuses on the aspects users consider when producing tags (i.e., annotating content) and when using tags in exploratory search tasks.

This chapter takes the lessons from the qualitative analysis to inform the design of methods that quantify the value of tags.1 In particular, the investigation presented in this chapter addresses the following question:

1 The results presented in this chapter appeared in the following references: [75, 80]

• RQ4. How to quantify the value of tags as perceived by information seekers in exploratory search?

To this end, this chapter presents a formalization of the aspects that, according to the participants of the qualitative study, influence their perception of tag value when they are performing exploratory search tasks. In summary, this chapter contains the following major contributions:

• A framework that helps reasoning about the problem of quantifying the value of user contributions in tagging systems (Section 5.3).

• A method that quantifies the value of tags that caters for the two desirable properties in the context of exploratory search, as identified by the qualitative user study. We prove that this method has desirable theoretical properties while quantifying these two aspects (Section 5.5).

• An experiment using real tagging data that shows that the proposed method accurately quantifies the value of tags according to users' perception (Section 5.6).

The rest of this chapter starts by positioning this study among the related literature.

5.1 Related Work

In a nutshell, this work differs from previous efforts in two main aspects: first, it is motivated by the view that social tagging systems are inherently online peer production systems.
Thus, to improve the quality of user contributions, it is necessary to first quantify their value, so that one can then think of designing incentives for the production of high-quality content. Second, this research focuses on the design and analysis of a method to assess tag value in practice (as opposed to only studying the impact of tags on other information retrieval tasks, such as recommendation [3, 9, 27, 90]).

Section 5.1.1 starts by discussing what makes the design of methods to quantify the value of peer-produced information challenging; Section 5.1.2 then reviews previous work that studies the quality of tags in different contexts.

5.1.1 Contributions in Peer Production Systems

Online peer production systems can be categorized into systems where users produce/share resources or information. In the former category, as we have already mentioned, quantifying the value of user contributions is based largely on counting the resource units one user produces and donates to other users (and, implicitly, to the system). For example, in P2P content sharing systems (e.g., BitTorrent), the value of contributions is estimated by the volume of content a peer donates to others [Andrade et al., 2009]. Similar methods have been applied to volunteer computing (e.g., BOINC), where contributions are quantified in terms of CPU hours.

Valuing contributions in these resource-sharing peer production systems relies on: first, the fact that the amount of resources donated is easily quantifiable; second, the assumption that contribution value can be directly linked to the resources consumed to deliver a service; and, third, the simplifying assumption that a unit of contributed resources has a uniformly perceived value across all users of the system.

In contrast, none of these assumptions holds for systems that support the production/sharing of information.
First, it is impossible to directly quantify the 'effort' that has led to the production of a specific piece of information; and, second, the value of information (e.g., tags or items in tagging systems) is subjective to users' opinions, interests, and task at hand (an aspect shared with other information goods).

To address this latter issue of contextual value, some peer production systems, such as StackOverflow.com, use intangible rewards (e.g., points) by allowing users to rate content items as a way to express how much they like a particular item. Ratings can, therefore, be interpreted as an estimate of the value of one user's contribution from the perspective of another. Although this approach generates rich feedback about what users like (or sometimes dislike), it has two limitations. First, rating information is generally sparse (i.e., the majority of users do not express their preferences via ratings); and second, in tagging systems, item ratings do little to address the problem of valuing tags. Thus, while this information can support a solution, a direct estimation of value that covers the entire spectrum of peer-produced information is necessary. In this study, we cope with the contextual nature of tag value by: first, using a qualitative analysis to identify the aspects that influence information seekers' perception of tag value (in the context of exploratory search); and, second, using the result of this analysis to inform the design of a method that quantifies the value of tags.

5.1.2 Characterizing the Quality of Tags

Several studies focus on characterizing the quality of tags, or of tagging (as a feature of an information system) in general. These studies instantiate the notion of 'quality' in various ways, which we discuss in turn.

Search and Recommendation. Focusing on the quality of tags for information retrieval tasks such as content classification and search, Figueiredo et al. [27] and Bischoff et al.
[9] evaluate the quality of information provided by tags (in comparison to other textual features) to improve the efficiency of recommendation or search mechanisms. Similar studies aim to harness tags to improve web search [41, 99].

Tagging efficiency in decentralized search. Helic et al. [38] study tagging systems from a network-theoretical perspective by analyzing whether tagging networks (i.e., formed by the associations of tags with items) have properties that enable efficient decentralized search [1]. In particular, the authors study the impact of different algorithms that build such tag networks (i.e., tag hierarchies, known as folksonomies) on a decentralized search process. The rationale is that one folksonomy is better than another if a decentralized search that uses it as background knowledge [1] performs more efficiently.

Tagging as a categorization mechanism. Moving the focus to a different application, Heymann and Garcia-Molina [41] investigate whether tags help users to categorize content, by analogy with widely deployed classification tools for library management systems. They use a qualitative analysis to evaluate the power of tags to build classification systems, rather than a user-centric quantitative approach to assess value. Lu et al. [61] perform a similar study, comparing peer-produced tags and expert-assigned terms to classify books, aiming to show that tags can improve the accessibility of items in a library catalog.

Quality of textual content. Other studies focus on the content of peer-produced information. Suchanek et al. [92] study the quality of tags by determining the descriptive power of a tag (i.e., its efficiency in describing an item). Similarly, Gu et al. [31] propose a way to measure the confidence with which a tag describes a particular item. In their context, confidence equates to the relevance of the tag to the topic of which the item is part. More recently, Baeza-Yates et al.
[5] characterize the lexical quality of several web sources, including a social tagging system (Flickr), finding that the lexical quality of texts in Flickr is better than that in the general web. Other work has focused on methods to detect and mitigate the impact of tag spam [54].

Building tag clouds. Helic et al. [38] and Venetis et al. [94] analyze algorithms to build tag clouds. Their approach to evaluating the quality of a tag cloud is directly related to the present study. However, while their method resorts to metrics that aim to capture intuitive aspects of users' information needs – such as novelty, diversity, and coverage of content items in the system – our method concentrates on relevance and the reduction of the search space. A comparison of all these methods is a direction for future work.

Our approach differs from previous efforts, as we start by characterizing users' perception of tag value to inform the design of a method that quantifies the value of tags for exploratory search. To the best of our knowledge, there are no previous attempts either to characterize the perception of tag value for exploratory search or to design methods to quantify tag value in such a context.

5.2 System Model

Before presenting the design of a method that quantifies the value of tags in the context of exploratory search, it is necessary to introduce the system model and some notation.

Let S = (U, I, A) be a social tagging system, where U represents the set of users in the system, I denotes the set of items, and A represents the set of annotations. An annotation is a tuple that specifies its author, the annotated item, a tag assigned to the item, and the time the annotation happened. Formally, A = {(s, i, t, e) | s ∈ U, i ∈ I}, where t is a tag – a word selected by the user from an uncontrolled vocabulary to annotate the item – and e is the timestamp of the annotation.

The set of annotations A_s characterizes a particular user s, where individual annotations can be distinguished by their timestamps.
More formally, A_s = {(q, i, t, e) ∈ A | q = s}. From the set of annotations A_s, it is possible to derive the set of items (or user library) and the set of tags (or user vocabulary), respectively annotated and used by a particular user s. The library and vocabulary of a user are defined, respectively, as follows: I_s = {i | (s, i, t, e) ∈ A_s} and T_s = {t | (s, i, t, e) ∈ A_s}. The set of tags assigned to a particular item i, T^i, and the set of items tagged with a particular tag t, I_t, are similarly defined.

Item relevance. We assume that, for an information seeker s, there is a probability mass function p(·|s) over the set of items in the system that the information seeker has not annotated yet (i.e., over I − I_s) that specifies the relevance of an item i to the information seeker s. Therefore, the set of items relevant to an information seeker can be defined as:

Definition 3. Given an information seeker s, the set of items relevant to s is: Γ_s = {i ∈ I − I_s | p(i|s) > 0}, where p(i|s) is the probability of relevance of an item i to an information seeker s.

Note that Γ_s can be defined for different search tasks, such as exploring or including a user's own library (i.e., items already tagged by the user). The proposed method is general enough to work with alternative definitions of Γ_s.

Modeling exploratory search. We model exploratory search as a communication channel between the search engine (the sender) and an information seeker (the receiver). Consider that the sender transmits items to the user, and the channel is characterized by the probability of item relevance p(i|s) to the receiving user over the set of items Γ_s. In this context, a tag defines a filter that creates a new channel from the original one. The new channel is characterized by the probability of item relevance conditional both on a tag and on the information seeker: p(i|t, s).

Search space.
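The set definitions above translate directly into code. The following is a minimal sketch (the function and type names are ours, not the thesis's) of how A_s, I_s, T_s, T^i, and I_t can be derived from a list of annotation tuples:

```python
from collections import namedtuple

# An annotation records who tagged which item, with which tag, and when.
Annotation = namedtuple("Annotation", ["user", "item", "tag", "time"])

def user_library(A, s):
    """I_s: the set of items annotated by user s."""
    return {a.item for a in A if a.user == s}

def user_vocabulary(A, s):
    """T_s: the set of tags used by user s."""
    return {a.tag for a in A if a.user == s}

def item_tags(A, i):
    """T^i: the set of tags assigned to item i."""
    return {a.tag for a in A if a.item == i}

def tag_items(A, t):
    """I_t: the set of items tagged with tag t."""
    return {a.item for a in A if a.tag == t}
```

Note that each derived set is a projection of the same annotation relation A; only the filtering attribute changes.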
Using a probabilistic interpretation where items are assigned a probability of relevance, a tag t reduces the search space if the probability mass function p(·|t, s) over the set of items Γ_s is more concentrated than the original probability mass function p(·|s) (see the discussion below).

Probability estimation. It is worth highlighting that there are many ways to estimate the probabilities of relevance p(·|s) and p(·|t, s). More importantly, it is not our goal to advocate a particular estimator. In particular, the evaluation of our proposed method (Section 5.5) considers two estimators: i) a language model based on Bayes smoothing, as described in Wang et al. [96]; and ii) a topic model based on Latent Dirichlet Allocation (LDA), as proposed by Harvey et al. [35].2

2 In the spirit of enabling reproducible research, all code used to estimate probabilities based on the work proposed in [35, 96], together with the scripts to produce the presented results, is available at: http://github.com/flaviovdf/tag_assess

5.3 A Framework to Assess the Value of User Contributions

Users in a tagging system are either information producers or information seekers, depending on the action they perform at a given moment. Information producers publish new items and/or annotate existing items. An information seeker navigates the set of items available in the system. To assess the value of a user's contribution in such a system, one must combine the value of the items and tags produced by the user.

More formally, let v(t_u, s) and r(i_u, s) be two functions that quantify the values of a tag t_u and of an item i_u, respectively, produced by user u, from the perspective of an information seeker s. A function K(u, s) should combine v(t_u, s) and r(i_u, s) for all tags and items produced by u.

In particular, the intuition behind computing v(t, s) is that the value of a tag should be proportional to its ability to retrieve relevant items while reducing the search space (as the qualitative characterization of users' perception of tag value indicates).

Similarly, the value r(i, s) of an item to an information seeker should be proportional to its 'relevance' and 'usefulness' to user s. This can be estimated directly based on: (1) network analysis similar to that applied to the citation graph to find influential authors [52, 97]; (2) direct user feedback such as ratings; or (3) indirect user feedback such as the frequency with which an item is (re)visited.

Figure 5.1: Components of a framework to quantify the value of user contributions.

Figure 5.1 presents a block diagram that illustrates the process to assess the value of user contributions. The top part of the diagram presents the flow to calculate the value of the tags produced by user u to an information seeker as a function of the tags' ability to lead to items relevant to her. The 'Tag Value Calculator' block combines the information seeker's set of relevant items Γ_s (produced by the relevant item set estimator, which can be based on an item recommendation engine) and the information producer's annotations to determine the value of the tags (i.e., the tags extracted from the annotations produced by u) to s.

The bottom part of the diagram presents the flow to calculate the value of the items produced by u that are used by s. The 'Item Value Calculator' box combines the information seeker's item usage statistics, represented by F_s (output from the item usage monitor), and the set of items originally published by u to estimate the value of these items. These usage statistics can be obtained via click traces, for example, which provide information about how often a user consumes a particular item.

Finally, the estimated values of tags and items are aggregated separately and then combined into the value of the contributions from u to s, K(u, s).

It is important to highlight that the proposed framework (Figure 5.1) is generic.
Each building block can be instantiated according to the specific characteristics of the system. For example, the availability of user activity data, such as records of tag assignments, click traces, item ratings, friendship links, or group co-membership information, can drive the design of specific solutions for the value calculator and aggregator boxes.

The rest of this chapter focuses on assessing the value of tags from the perspective of an information seeker. In particular, it designs and evaluates an instance of the function v(t, s).

5.4 A Naive Method

A simple method that captures both the ability of a tag to reduce the search space and its ability to retrieve relevant items (i.e., the two most salient aspects that influence users' perception of tag value, as presented in Chapter 4) can take as inputs the number of retrieved items (relative to the number of items in the system) and some aggregation of the relevance scores of the items retrieved (e.g., their average). More formally, let us assume that the relevance score is given by the probability that an item is relevant to an information seeker s. A naïve method can then define the value of a tag t from the perspective of the information seeker s as:

v(t, s) = (|I| − |I_t|) · (1 / |I_t|) · Σ_{i ∈ I_t} p(i|s)    (5.1)

where p(i|s) is the relevance probability of item i given a user s, defined over Γs.

Although Equation 5.1 captures the reduction of the search space and the relevance of the items retrieved by the tag, it fails to distinguish between two tags that retrieve the same number of items with the same average relevance but with different relevance distributions. In this case, choosing which tag is more valuable to the user is simply arbitrary.

To illustrate this situation, suppose two distinct tags t and w that retrieve the same number of items, i.e., |I_t| = |I_w|. Now, consider that the probabilities of the items in I_t are (0.5, 0.2, 0.2, 0.1), and those for I_w are (0.25, 0.25, 0.25, 0.25).
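A minimal sketch of Equation 5.1 on these two distributions makes the tie concrete; the system size |I| = 100 is an arbitrary value chosen for illustration:

```python
def naive_value(n_items, rel_probs):
    """Naive tag value (Equation 5.1): search-space reduction (number of items
    excluded by the tag) times the mean relevance of the retrieved items."""
    return (n_items - len(rel_probs)) * sum(rel_probs) / len(rel_probs)

v_t = naive_value(100, [0.5, 0.2, 0.2, 0.1])      # tag t: concentrated relevance
v_w = naive_value(100, [0.25, 0.25, 0.25, 0.25])  # tag w: uniform relevance
print(v_t, v_w)  # both ~24.0: the naive score cannot tell t and w apart
```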
In this case, the average relevance is the same for both tags, but the relevance distribution over the items retrieved by t is more concentrated. Tag t should therefore be considered more efficient than w when used to explore the set of items, as it reduces the search space (probabilistically) by concentrating the relevance distribution. The naïve method, however, is unable to assign appropriate values to each tag. It is thus important that a method take into account the distribution of item relevance from the perspective of an information seeker, given that she uses a tag to prune the original search space of items I. The next section elaborates on this idea and introduces our proposed method.

5.5 An Information-theoretical Approach

We split the presentation of our method into three parts: first, we present how we estimate the reduction of the search space achieved by a tag; second, we discuss an approach to estimate the relevance of the set of items retrieved by a tag; and, finally, we combine these two components.

Estimating search space reduction. To estimate how much a tag reduces the search space for a given information seeker, the proposed method adopts a probabilistic interpretation (as opposed to a deterministic approach that counts the number of items filtered by the tag).
In our model of exploratory search, a tag reduces the search space by leading to a higher concentration of the probability of relevance over the set of retrieved items. More formally, given the distribution of the probability of relevance p(·|s) and the conditional probability distribution p(·|t,s) over the set of relevant items Γs, the proposed method measures how much information one gains by using the channel defined by a tag to read the set of items with the Kullback-Leibler divergence [21] of the two distributions, as defined below:

D_KL(p(·|t,s) || p(·|s)) = Σ_{i ∈ Γs} p(i|t,s) log [ p(i|t,s) / p(i|s) ]    (5.2)

where p(i|t,s) represents the probability that an item i is relevant to a given information seeker s when she uses a tag t to navigate the system, while p(i|s) represents the probability that an item i is relevant to s.

Equation 5.2 measures the reduction in the item search space achieved by a given tag t, as it quantifies how much the distribution of relevance conditional on a tag, p(i|t,s), diverges from the unconditional probability of relevance of an item i. The reduction in search space occurs, for example, when conditioning p(i|s) on a tag t concentrates the probability of relevance over a smaller set of items. However, conditioning on a tag may concentrate the probability mass p(·|s) over a few items that are not necessarily the most relevant ones. Therefore, it is necessary to complement Equation 5.2 with a measure of the relevance of the items a tag t delivers to an information seeker s.

Estimating delivered relevance. To estimate the relevance, to a particular user s, of the set of items retrieved by a tag t, we compare the retrieved set (ordered by probability of relevance) to a reference point: the subset of top items of Γs ordered by probability of relevance. The intuition is that the more items from the top of the ranked Γs the tag retrieves, the more valuable it is. Note that, according to this definition, a tag maximizes its ability to retrieve relevant items by retrieving all items.
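To make Equation 5.2 concrete, the sketch below contrasts a tag that concentrates the relevance mass with a tag that simply retrieves every item; the uniform prior over ten items and the conditional distributions are illustrative assumptions:

```python
import math

def kl_divergence(p_cond, p_prior):
    """D_KL(p(.|t,s) || p(.|s)) over the seeker's relevant items (Equation 5.2).
    Items with zero conditional probability contribute nothing to the sum."""
    return sum(pc * math.log(pc / pp)
               for pc, pp in zip(p_cond, p_prior) if pc > 0)

p_prior = [0.1] * 10                        # p(.|s): uniform over 10 items
p_tag_t = [0.5, 0.2, 0.2, 0.1] + [0.0] * 6  # t concentrates the mass on 4 items
p_tag_all = p_prior                         # a tag that retrieves every item

print(kl_divergence(p_tag_t, p_prior))    # > 0: the search space shrinks
print(kl_divergence(p_tag_all, p_prior))  # 0.0: no reduction at all
```

The retrieve-everything tag leaves the distribution unchanged and earns zero divergence, which is exactly why Equation 5.2 must be complemented with a measure of delivered relevance.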
This, however, does not necessarily maximize its value, which also depends on how much of the search space the tag is able to reduce (as measured by Equation 5.2).

More formally, let I_t be the set of items retrieved by a tag t and not already annotated by the information seeker s (i.e., I_t ⊄ I_s), and let I_t be ordered by relevance to s. Let Γs[k] be the set of the top-k most relevant items to s from Γs when ordered according to p(·|s). We define the relevance delivered by a tag t to an information seeker s as:

ρ(t, s) = 1 − τ(I_t, Γs[k])    (5.3)

where τ(I_t, Γs[k]) is the generalized Kendall's τ distance between I_t and Γs[k], and k = |I_t|. Kendall's τ distance measures the fraction of changes needed (relative to the maximum number of changes) to convert one rank (I_t) into the other (Γs[k]). A distance of 0 means the ranks are the same, while 1 means the ranks are exact opposites. A penalty (which can be set between 0 and 1) is incurred when an item appears in one rank but not in the other. The rationale is that the more relevant items a given tag retrieves, the smaller the distance and the closer ρ(t, s) gets to 1.

Combining relevance and reduction of search space. The final step is to define the estimate of the value of a tag t from the perspective of an information seeker s, v(t, s).

Definition 4. Given an information seeker s and her set of relevant items Γs, the value of a tag t to s is defined as:

v(t, s) = ρ(t, s) · D_KL(p(·|t,s) || p(·|s))    (5.4)

The rationale behind this definition of tag value is that if a tag t retrieves only items with low relevance to s, the factor ρ(t, s) penalizes the value, as it computes the distance from the retrieved set of items to the set of items estimated to be relevant to the user. In that case, tag t has little value to the information seeker, even though it may reduce the search space towards a subset of Γs.
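One possible implementation of ρ(t, s) is sketched below, following the case analysis of the generalized (penalty-based) Kendall distance for top-k lists; normalizing by the distance between two disjoint lists is our own simplifying assumption, not the thesis's exact normalization:

```python
from itertools import combinations

def pair_penalty(a, b, pos1, pos2, p):
    """Penalty for one unordered pair under the generalized Kendall distance."""
    a1, b1, a2, b2 = a in pos1, b in pos1, a in pos2, b in pos2
    if a1 and b1 and a2 and b2:           # ranked by both lists
        return 1.0 if (pos1[a] < pos1[b]) != (pos2[a] < pos2[b]) else 0.0
    if a1 and b1 and (a2 or b2):          # list 2 ranks only one of the pair
        ranked, other = (a, b) if a2 else (b, a)
        return 0.0 if pos1[ranked] < pos1[other] else 1.0
    if a2 and b2 and (a1 or b1):          # list 1 ranks only one of the pair
        ranked, other = (a, b) if a1 else (b, a)
        return 0.0 if pos2[ranked] < pos2[other] else 1.0
    if (a1 and b2) or (a2 and b1):        # each item appears in one list only
        return 1.0
    return p                              # both items in a single list: penalty p

def rho(retrieved, top_relevant, p=1.0):
    """rho(t, s) = 1 - normalized generalized Kendall distance (Equation 5.3)."""
    pos1 = {x: r for r, x in enumerate(retrieved)}
    pos2 = {x: r for r, x in enumerate(top_relevant)}
    m, n = len(retrieved), len(top_relevant)
    max_dist = m * n + p * (m * (m - 1) + n * (n - 1)) / 2.0  # two disjoint lists
    dist = sum(pair_penalty(a, b, pos1, pos2, p)
               for a, b in combinations(set(retrieved) | set(top_relevant), 2))
    return 1.0 - dist / max_dist

print(rho(['a', 'b', 'c'], ['a', 'b', 'c']))  # 1.0: top items retrieved in order
print(rho(['x', 'y', 'z'], ['a', 'b', 'c']))  # 0.0: nothing relevant retrieved
```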
On the other hand, if t leads the user to a subset of relevant items, its value is proportional to the reduction in search space: the relevance of the items retrieved by t, represented by the coefficient ρ(t, s), will be close to one and will have a smaller penalty effect.

5.5.1 Search Space Reduction Property

This section shows that the method we propose can indeed distinguish between two arbitrary tags when they deliver different levels of search space reduction. (The generalized Kendall's τ used above is also known as the Kendall distance with a penalty [25], as it introduces a penalty parameter that extends the original Kendall's tau distance to enable the comparison of top-k lists with different elements.)

As described in Section 5.5, we use a probabilistic interpretation of the search space, where each item is assigned a probability of relevance. Therefore, a tag reduces the search space if the probability mass function p(·|t,s) over the set of items Γs is more concentrated than p(·|s). The goal of this analysis is to show that our proposed method is able to distinguish between two tags that lead to different levels of search space reduction. More formally, we prove the following proposition:

Proposition 1. Given an information seeker s, if a tag t reduces the search space more than another tag w by moving the probability mass towards more relevant items, then D_KL(p(·|t,s) || p(·|s)) > D_KL(p(·|w,s) || p(·|s)).

Proof. The first condition in the proposition states that, given an information seeker s, if a tag t reduces the search space more than another tag w, then p(·|t,s) is more concentrated than p(·|w,s), where both functions are defined over Γs.
Therefore, H(p(·|t,s)) < H(p(·|w,s)), where H is Shannon's entropy [21]. Moreover, if, according to the second condition in the proposition, tag t moves the probability mass towards more relevant items than tag w does, then there are at least two items j, k ∈ Γs with p(j|s) > p(k|s) such that, conditioning the probability on each tag, p(j|t,s) > p(j|w,s) and p(k|t,s) < p(k|w,s). Note that, to conserve the probability mass, it is necessary that |p(j|t,s) − p(j|w,s)| = |p(k|t,s) − p(k|w,s)|.

Putting these two conditions together and applying Equation 5.2 to p(·|t,s) and p(·|w,s), we prove Proposition 1 by contradiction. Assume the opposite of the claimed inequality:

D_KL(p(·|t,s) || p(·|s)) < D_KL(p(·|w,s) || p(·|s))

Σ_{i ∈ Γs} p(i|t,s) log [ p(i|t,s) / p(i|s) ] < Σ_{i ∈ Γs} p(i|w,s) log [ p(i|w,s) / p(i|s) ]

Σ_{i ∈ Γs} p(i|t,s) log p(i|t,s) − Σ_{i ∈ Γs} p(i|t,s) log p(i|s) < Σ_{i ∈ Γs} p(i|w,s) log p(i|w,s) − Σ_{i ∈ Γs} p(i|w,s) log p(i|s)

Replacing the first summation on each side by the corresponding entropy term leads to:

−H(p(·|t,s)) − Σ_{i ∈ Γs} p(i|t,s) log p(i|s) < −H(p(·|w,s)) − Σ_{i ∈ Γs} p(i|w,s) log p(i|s)

Next, expanding the second summation on each side to isolate the items j and k gives:

−H(p(·|t,s)) − Σ_{i ∈ Γs−{j,k}} p(i|t,s) log p(i|s) − p(j|t,s) log p(j|s) − p(k|t,s) log p(k|s) < −H(p(·|w,s)) − Σ_{i ∈ Γs−{j,k}} p(i|w,s) log p(i|s) − p(j|w,s) log p(j|s) − p(k|w,s) log p(k|s)

Cancelling the equal summations on each side leads to:

−H(p(·|t,s)) + log p(j|s) < −H(p(·|w,s)) + log p(k|s)

H(p(·|w,s)) − H(p(·|t,s)) < log p(k|s) − log p(j|s)

From the first condition set forth in the proposition, we know that H(p(·|w,s)) − H(p(·|t,s)) > 0, and from the second condition, p(k|s) − p(j|s) < 0, so the right-hand side is negative. Therefore, the last inequality contradicts the assumption, and the proposition holds.

5.5.2 Relevance Property

This section shows that the proposed method can distinguish between two arbitrary tags when they deliver different levels of relevance.
In particular, the analysis shows that, from the perspective of a given information seeker s, Equation 5.4 distinguishes two tags if they deliver two different levels of relevance. To show that our proposed method has this property, we prove the following proposition.

Proposition 2. Given an information seeker s, if a tag t retrieves more relevant items than a tag w, it follows that ρ(t, s) > ρ(w, s).

Proof. If t retrieves more relevant items than w, we have that:

τ(I_t, Γs[k]) < τ(I_w, Γs[k′])

where k = |I_t| and k′ = |I_w|. Inverting the signs and adding 1 to both sides gives:

1 − τ(I_t, Γs[k]) > 1 − τ(I_w, Γs[k′])

Therefore, ρ(t, s) > ρ(w, s).

5.6 Evaluation

The previous section proves that the proposed method can differentiate between two tags when they lead to different levels of search space reduction and of relevance of the retrieved items. This section complements these results with an experiment on real data that tests the accuracy of our method. The method is accurate if the tag values it produces match users' perception of value.

Two hard constraints limit the validation experiments we can execute: we have access neither to browsing traces nor to a ground truth, that is, to direct estimates of users' perception of value. We do, however, have access to tag assignment traces from a number of systems, and we use them to estimate our method's accuracy based on the following intuition: when a user assigns a tag to an item, that tag had a high value for the user from the perspective of a future search for that particular object.
Thus, if our method consistently estimates the value of previously used tags higher than the value of random tags (which the user has not used before), then there is a strong indication that the method is accurate in quantifying tag value as perceived by users.

5.6.1 Experiment Design

To test the hypothesis that the proposed method passes this accuracy criterion, we collect tag assignments from a real social tagging system, LibraryThing (data collected from: http://www.macle.nl/tud/LT/), as described in Table 5.1.

Table 5.1: Data set used as input to an experiment that estimates the method's accuracy

                        LibraryThing
# Users                        7,279
# Items                       37,232
# Tags (distinct)             10,559
# Tag Assignments          2,056,487

The experiment consists of two major steps: i) finding the best probability estimator parameters (steps 2 to 4 below), which are used as inputs to our method; and ii) for each user, computing the value of tags from two samples: a sample of tags from the user's vocabulary and a sample of tags not in the user's vocabulary (steps 5 and 6). These samples are denoted by Gs ⊂ Ts and Rs ⊂ T − Ts. It is important to highlight that neither the tags in Gs nor the tags in Rs are used in the parameter estimation phase; thus, the method has no information on whether the user has annotated items with them before.

More formally, this experiment tests the hypothesis that the method is able to assign higher values to tags in Gs than to tags in Rs. We follow these steps in the experiment:

1. First, we select a sample of users that use the system more than occasionally, that is, users with at least 50 annotated items. We denote this sample by S50.

2. With the tagging trace sorted by annotation timestamp, we break the set of annotations A into three sets: Atrain, Avalidation, and Atest. The training set contains the first 80% (sorted by date) of the items annotated by the users in the sample S50. The validation and test sets each comprise 10% of the annotations (half of the remainder).
We made sure that all tags and items in the validation and test sets also appear in the training set.

3. We train the estimators (with different parameters) for the probability distributions p(·|t,s) and p(·|s) on Atrain. The models trained were based on language models [96] and topic models [35]. As in [38], we were unable to reproduce the results in [96]; thus, for the rest of this section we discuss results based on topic models only.

4. The items in Avalidation are then used to measure the average Success@10 (as in [96]) of the estimator for each user. Success@10 captures the fraction of times at least one relevant item (that is, one item in the validation set) appears among the first 10 items when sorted by p(i|t,s) or p(i|s). Each probability distribution is evaluated independently of the other. This way, we pick the best estimator parameterization for both probability distributions. The best estimators reached Success@10 values of 0.05 and 0.06 for p(i|t,s) and p(i|s), respectively. The parameters used are α = 0.1/|I|, β = 0.1/|T|, γ = 0.001. We refer the reader to [35] and our source code for more details on parameters and implementation issues.

5. With the best parameterization, we use Atest to perform our experiments. Recall that no parameter tuning is done on this set. Now, for each user s ∈ S50, two sets of tags are constructed: hidden and random. The hidden set, denoted by Gs ⊂ Ts, contains tags used by user s in the test set Atest. The random set, denoted by Rs, comprises 50 tags that are randomly selected from the trace and have not been used by the user in any of the training, test, or validation sets.

6. Finally, we compare the distribution of tag values v(t, s) for tags in G = ∪s∈S50 Gs to that of the tags in R = ∪s∈S50 Rs.

5.6.2 Results

Figure 5.2 shows the results for the naïve method. The plot shows the cumulative distribution functions (CDF) of values for both tag sets from the perspective of all users in the LibraryThing data.
The result shows that the naïve method is not effective in distinguishing between the tags that users find valuable (i.e., those in the hidden set) and the others (i.e., those in the random set).

In contrast, Figure 5.3 shows the CDFs of tag values computed using our proposed method based on the information-theoretical approach. The result shows that the distribution of tag values for tags in the hidden set Gs is concentrated over larger values than that of tags in the random set Rs (i.e., tags that are chosen at random and have not been used by the user).

Figure 5.2: Comparison between the cumulative distribution functions (CDF) of tag values (naïve method), for tags in each set (Hidden and Random), from the perspective of each user in the LibraryThing data set.

To confirm that the tag values in one sample are significantly larger than those in the other, we apply a Kolmogorov-Smirnov test. The test allows us to reject the null hypothesis that the values in the two samples come from the same distribution, and to accept the alternative hypothesis that the CDF of tag values for the hidden set lies below that of the random set. In particular, we observe that the D-statistic, which measures the distance between the two CDFs, is 2.5 times larger for the information-theoretical method than for the naïve method. The larger this difference, the better the method distinguishes the valuable (hidden) tags from the random ones.
In fact, D− = 0.25 (p < 2.2×10^−16) for the naïve method, and D− = 0.64 (p < 2.2×10^−16) for ours. Therefore, these experiments provide evidence that the proposed method (formalized by Equation 5.4) is accurate, as it is able to assign higher values to the tags that users perceive as more valuable.

Figure 5.3: Comparison between the CDFs of tag values (our proposed method), for tags in each set (Hidden and Random), from the perspective of each user in the LibraryThing data set.

5.6.3 Alternatives

This section discusses alternatives to the proposed method and presents directions towards an experiment that compares methods of assessing the value of tags with respect to their ability to accurately capture the aspects that influence users' perception of value. Alternatives to the proposed method can be divided into two classes: i) incorporation of user feedback; and ii) traditional information retrieval metrics. The next paragraphs discuss these in turn.

User feedback. In production, the method proposed in the previous section could be augmented to incorporate user feedback. In particular, user click traces (i.e., a log of which tags users clicked when performing an exploratory search task) would not only enable better item relevance estimation, but also provide an important ground truth for evaluation purposes (as the tags clicked before finding the items of interest indicate which tags were valuable in that search task).

Traditional information retrieval metrics. It is natural to consider that other traditional metrics used in information retrieval tasks could be adapted to measure the value of tags. For instance, TF-IDF or the F-measure could be used to assign values to tags. These metrics could provide a partial ordering on the set of tags in the system from the perspective of an information seeker and, similar to the proposed method, can be used to measure the ability of a tag to retrieve items relevant to a given information seeker.
However, in contrast to the method proposed in this research, these traditional information retrieval metrics do not explicitly account for the reduction in search space that a particular tag can achieve.

Alternative experiment design. Suppose one can build a ground truth (e.g., based on click traces) that contains, for each information seeker, (at least) two classes of tags: more valuable and less valuable. Then, given an information seeker, a tag, and a method, the effectiveness of different methods can be compared by measuring their performance in classifying the tag correctly. More concretely, the experiment can be recast as a logistic regression problem with the class of the tag as the output variable and the value of the tag as its feature. The rationale is that if a method accurately captures the value of tags from the perspective of an information seeker, then using the tag value it produces as a classifier feature leads to an effective prediction of the tag's class (i.e., more valuable or less valuable).

5.7 Summary

This study focuses on the problem of quantifying the value of peer-produced tags for exploratory search. Informed by a qualitative analysis, it designs a method that quantifies tag value by considering the two most salient aspects that analysis identified: reduction of search space and relevance of retrieved items (Chapter 4). Finally, an evaluation with real tagging data provides evidence that the proposed method is able to quantify tag value and to differentiate valuable tags from less valuable ones.

It is also important to note that our qualitative analysis uncovers several aspects that influence users' perception of tag value in exploratory search; the proposed method quantifies the two most salient ones. Extending the method to account for the other aspects is future work.
A larger evaluation of our methods, using either a collected ground truth or click traces, is also a natural extension of this study.

Finally, as the experiment design is subject to decisions that may threaten its validity, it is necessary to comment on the internal validity of this study. One potential threat to internal validity is the interaction between the data used in the probability estimators and the methods that assess the value of a tag. We remove this threat by breaking the trace into three disjoint segments (training, validation, and test), which avoids using the same data for training (i.e., the probability estimators) and testing (i.e., the tag value computation).

Chapter 6
Assessing the Value of Peer-Produced Information for Content Promotion

This chapter focuses on assessing the value of peer-produced information in a different context: content promotion. (The results presented in this chapter appeared in [81].)

In the characterization of users' perception of tag value, some participants state that one strong motivation to produce tags is to promote their content. This observation, together with the sheer volume of content that owners generate (e.g., YouTube receives 100 hours of video every minute; see http://www.youtube.com/yt/press/statistics.html), creates the problem of optimizing tags to improve content viewership. To cope with such a high volume of content, it is common for large-scale content owners to offload online publication and monetization tasks to specialized content management companies.
The job of content managers is to publish, monitor, and promote the owner's content, usually under a revenue sharing agreement between the content manager and the content owner (e.g., the content manager's revenue is directly related to the number of ad prints each piece of content receives).

Although viewers may reach a content item starting from many 'leads' (e.g., an e-mail from a friend or a promotion campaign in an online social network), a large portion of viewers relies on keyword-based search and/or tag-based navigation to find videos. An argument supporting this assertion is the fact that 10.5% of YouTube's unique visitors come from Google.com searches (http://www.alexa.com/siteinfo/youtube.com#keywords). With the integration of Google and YouTube search, one might expect the volume of search traffic that leads to views on YouTube to only increase. Moreover, YouTube is the third most popular site on the web, behind Facebook.com and Google.com (http://www.alexa.com/topsites/global). Consequently, the textual features of video content (e.g., the title, description, comments, and tags, in the case of YouTube videos) have a major impact on the view count of each particular content item and, ultimately, on the revenues of the content manager and content owner [44, 102].

Experts can produce these textual features via manual inspection of a content object (and our industry contacts confirm that this is still current practice, e.g., at companies who provide content management to large content producers such as the NBA). This solution, however, is manpower intensive and limits the scale at which content managers can operate. Therefore, mechanisms to support this process (e.g., automating tag and title suggestion) are desirable.
It has been shown that even simple suggestions of textual features produce positive results: for example, title suggestions on eBay have benefitted both sellers, who increased revenue, and buyers, who found relevant products faster [44].

With the ever increasing volume of user-generated content available on the Web, there is a plethora of sources from which an automated mechanism that suggests textual features in general, and tags in particular, could extract candidate terms. For example, Wikipedia (a peer-produced encyclopedia), MovieLens and Rotten Tomatoes (social networks where movie enthusiasts collaboratively catalog, rate, and annotate movies), the New York Times movie review section (which covers over 28,000 movies), or even YouTube comments [102] are potential sources of candidate keywords for annotating user-generated video.

This study primarily investigates the value of information repositories like the above when used as data sources for tag recommendation algorithms that aim to boost video-content popularity. In particular, it categorizes the data sources as peer- or expert-produced according to their production mode, and evaluates whether sourcing from one category or the other leads to better recommended tags. The following research questions drive the investigations presented in this chapter:

• RQ5.1. To what extent are the tags currently associated with existing YouTube content optimized to attract search traffic? Is there room for improvement using automated tag recommendation solutions?

• RQ5.2. How do peer- and expert-produced input data sources compare with regard to their impact on the performance of tag recommenders for boosting content popularity?

• RQ5.3.
Do peer-production aspects, such as the number of contributors to a data source, influence the effectiveness of tag recommenders that aim to increase content popularity?

It is worth highlighting that this work uses recommender algorithms in a different context than many previous studies: the goal is not to design novel, more efficient recommendation algorithms, but to explore the impact of the choice of input data source. While previous work proposes tag recommenders that aim to maximize, for instance, relevance or diversity [7, 32, 56, 62, 72], this study focuses on comparing the outcomes of using different sources of information (e.g., peer- and expert-produced) when recommending tags to boost video popularity.

In summary, the contributions of this work are:

• Evidence that the tags associated with a sample of trailers of popular movies currently available on YouTube can be optimized by an automated process: either by incorporating human computing engines (e.g., Amazon Mechanical Turk) at a much lower cost than using dedicated 'channel managers' (the current industry practice), or, at an even lower cost, by using recommender algorithms to harness textual content produced by a multitude of data sources related to the video content.

• A comparison of the effectiveness of using peer- and expert-produced sources of information as input for tag recommenders that aim to boost content popularity.

• The production of a ground truth that is available to the community (together with the implemented tools).

It is worth noting that the quest to improve the visibility of one's content (e.g., a website, a video) is not new: the whole Search Engine Optimization segment has seen uninterrupted attention. Multiple avenues are available, ranging from some that are viewed as abusive (e.g., link farms) to perfectly legitimate ones (e.g., better content organization, good summaries in the title bar of webpages).
Our exploration falls into this latter category.

6.1 Related Work

The related literature falls into two broad categories: automated content annotation and tag value assessment. This section briefly discusses previous work on each topic in turn, and positions this work among these earlier efforts by highlighting the novel aspects of our comparison.

The majority of related work on automated content annotation (or tag recommendation) focuses on suggesting tags that maximize the relevance of the tag given the content [7, 17, 44, 60, 95], with a few exceptions where the authors propose to leverage other aspects, such as diversity [7]. Although finding tags that are relevant to a given content item is an important component of improving the tags assigned to that content, previous studies fail to account for the potential improvement in the view count of the annotated content,
Therefore, our work complements and extends these previous effortsas recommender systems solutions could be designed combining the techniquesproposed previously and the knowledge about the valuable data sources and theircombination to improve content popularity such as the one proposed by Lipczakand Milios [59].The problem of assessing the value of data sources for boosting content pop-ularity via tag recommendation is related to assessing the value of individual tagsin other contexts. In the context of exploratory search, for example, Santos-Netoet al. [65] (Chapter 5) pose the problem of assessing the value of contributions insocial tagging systems. The authors argue that the value of collaboratively pro-duced tags, in the context of exploratory search, is proportional to their ability toimprove the efficiency of information seeking tasks from the perspective of a user.In a different context, Gu et al. [31] propose a method to quantify ‘tag con-fidence’ in social tagging systems. Their approach quantifies the quality of tagsproduced collaboratively in social tagging systems by taking into account two as-pects of a tag: i) the credibility of its producer; and, ii) the strength of its semanticrelation to the tagged resource. Therefore, their work is an answer if one aims tooptimize the tags of a content item that best996.2 Context for Assessing the Value ofPeer-Produced InformationThis section describes the context in which we investigate the effectiveness ofdifferent information sources as inputs for tag recommendation algorithms andpresents the formal statement of the recommendation problem used as a backdropin our investigation.Annotating a video with tags that match the terms users would use to searchfor it increases the chance that users view the video. 
Various textual sources that are related to the video and whose content can be automatically retrieved (e.g., movie reviews, comments, wiki pages, news items) can be used as input sources for recommenders to suggest tags for these content items.

Figure 6.1: The recommendation pipeline.

A recommendation pipeline that implements the previous idea is schematically presented in Figure 6.1: data sources feed the pipeline with textual input data. Next, the textual data is pre-processed by filters to both clean and augment it (e.g., remove stopwords, detect named entities). This first processing step provides candidate keywords for the recommenders. The recommendation step uses the candidate keywords (and their related statistics, such as frequency and co-occurrence) to produce a ranked list according to a scoring function implemented by a given recommender algorithm. Finally, as the space available for tags provided by video sharing websites, such as YouTube or Vimeo, is limited, the selection of the most valuable candidate keywords is constrained by a budget, often defined by the number of words or characters. Therefore, the final step consists of solving an instance of the 0-1-knapsack problem [20] that selects a set of recommended tags from the ranked list produced by the recommender.

In summary, the recommendation pipeline is composed of four main elements: data sources, filters, recommender, and knapsack solver. The next paragraphs discuss each of these elements.

• Data Sources. This component provides the input textual data used by the recommenders. In particular, we are interested in peer-produced data sources such as Wikipedia and social tagging systems like MovieLens, as well as expert-produced data sources such as NYTimes movie reviews. We discuss each of the data sources used in detail in Section 6.4.1.

• Filters. The raw textual data extracted from a data source is filtered to minimize noise.
We consider simple filters such as stopword and punctuation removal, lowercasing, and named entity detection (we leverage the OpenCalais web service, http://www.opencalais.com, to perform named entity detection) applied to the input data.

• Recommender. Starting from a set of candidate keywords together with relevant statistics (e.g., frequency, co-occurrence), a recommender scores the candidate keywords. Note that there are many ways of defining scoring functions, and it is not our goal to advocate a specific scoring function or recommender. The intention is to investigate the influence of the choice of the data source on their performance. We discuss the recommenders used in this work in Section 6.4.2.

• Knapsack Solver. Finally, after ranking the candidate keywords, the final step is selecting the ones that best fit the budget. In this paper the budget is expressed in terms of the number of characters, as done in video sharing systems such as YouTube, where the total number of characters one can use for tags is limited to 500. This step is formulated as the 0-1-knapsack problem, as follows.

Let v be a video and C = 〈k_i〉, i = 1...n, be a list of candidate keywords provided by a data source when used as input to a tag recommendation algorithm. Additionally, let us denote the length of a keyword k_i by w_i. Therefore, the problem of selecting the best tags to improve the viewership of the video v is equivalent to solving the following optimization [20]:

    maximize    Σ_{i=1}^{n} f(k_i, v) · x_i
    subject to  Σ_{i=1}^{n} w_i · x_i ≤ B        (6.1)

where B is the budget in terms of the number of characters allowed in the tags field, x_i ∈ {0, 1} is an indicator variable, and f(k_i, v) is a scoring function provided by the recommender for the keyword k_i with respect to the video v.
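As a concrete illustration, this selection step can be sketched as a textbook 0-1 knapsack dynamic program over the scored candidates (a minimal sketch: the keywords and scores below are hypothetical, and real scores would come from a recommender):

```python
def select_tags(candidates, budget):
    """candidates: list of (keyword, score); budget: max total characters.
    Returns a subset of keywords maximizing total score within the budget."""
    # best[b] = (total_score, chosen_keywords) using at most b characters
    best = [(0.0, [])] * (budget + 1)
    for keyword, score in candidates:
        w = len(keyword)
        # iterate budgets downwards so each keyword is used at most once (0-1)
        for b in range(budget, w - 1, -1):
            cand_score = best[b - w][0] + score
            if cand_score > best[b][0]:
                best[b] = (cand_score, best[b - w][1] + [keyword])
    return best[budget][1]

# hypothetical candidates, as scored by some recommender
candidates = [("tarantino", 3.0), ("pulp fiction", 2.5), ("crime", 1.0), ("movie", 0.2)]
print(select_tags(candidates, budget=15))  # -> ['tarantino', 'crime']
```

With YouTube's 500-character limit, the same call would simply use budget=500.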
Considering that the costs (i.e., the keyword lengths) and the scores are both nonnegative, we use a well-known dynamic programming algorithm [20] to solve this optimization problem. (The budget could instead be defined in terms of the number of tags, as on Vimeo, which restricts to 20 the number of tags a publisher can apply to an uploaded video; this study can easily be extended to consider that situation.)

6.3 Building the Ground Truth

The ideal ground truth would consist of experiments that vary the set of tags associated with videos and capture their impact on the number of views attracted. However, collecting this ground truth requires having the publishing rights for the videos and, even then, it implies executing experiments over a considerable duration.

After unsuccessful attempts to collaborate with content publishers to execute such an experiment, we opted for an alternative solution: we built a ground truth by setting up a survey on Amazon Mechanical Turk (www.mturk.com), where participants watch a video and answer the question: What query terms would they use to search for that video? The rationale is that these terms would, if used as tags to annotate the video, maximize its retrieval by the YouTube search engine (and indirectly maximize viewership) while being relevant to the video.

The rest of this section presents the details of our methodology to build the ground truth and characterizes it.

Content Selection. Our study focuses on a specific type of content: movies. We ask 'turkers' (i.e., the Amazon Mechanical Turk workers who accept to participate in the survey) to watch movie trailers, and not the actual movies.
The reason is that the trailers are generally short (about five minutes or less), which makes the evaluation process more dynamic, encouraging 'turkers' to watch more trailers and associate more keywords with them. (Note that this work can easily be extended to other types of videos or content, as long as there is textual data available related to the content to produce candidate tags.)

In total, our dataset consists of 382 movies that were selected to meet two constraints: first, their trailers had to be available on YouTube; second, to enable comparisons, the movies selected had to have reviews available via the NYTimes movie reviews API (developer.nytimes.com/docs) and records in the MovieLens catalog (movielens.org). Section 6.4.1 describes the data sources used in our experiments in more detail.

Survey. First, we conducted a pilot survey by recruiting participants via our internal mailing lists and online social networks. This pilot highlighted two major issues: i) relying only on volunteerism to mobilize participants was insufficient (we were able to collect too few completed surveys); and, ii) quality control (e.g., typos in the keywords) was much harder, as there was no automatic way to recruit only participants who are fluent in English (all videos in the survey are in English).

Therefore, we published a task on Amazon Mechanical Turk (a form similar to the one used is available at: http://goo.gl/HZiUSw). The task requires the 'turkers' to watch trailers and provide the query terms they would use to search for the videos they have just watched (Figure 6.2). For each video, we collected answers from three 'turkers'. Turkers who accept the tasks are required to associate at least 3 keywords (and at most 10 keywords) with each video, as queries are typically of that length [36]. Each participant is paid $0.30 per task assignment, with an estimated completion time of 6 minutes (leading to a total cost of $345 to conduct the survey; we followed AMT pay guidelines).
This amounts to an hourly rate of $3, which is considerably cheaper than the wage paid to dedicated 'channel managers'.

Figure 6.2: A screenshot of the survey we set up on Amazon Mechanical Turk: turkers watch the video presented on the left side, enter the suggested keywords, answer the questions, and move on to the next video.

We also perform simple quality control by inspecting each answer to avoid accepting spam (which is expected to be low, due to the reputation mechanism adopted by the system). In fact, only one submission was rejected, because the turker submitted URLs instead of keywords, and they had nothing to do with the video.

A brief characterization of the ground truth. In total, 33 turkers submitted solutions. Figure 6.3 shows the number of videos evaluated per turker: as we can observe, 58% of the turkers evaluated more than 5 videos, with the maximum reaching 333 videos evaluated by one turker.

Figure 6.3: Histogram of the number of evaluations turkers have performed.

Figure 6.4 shows a histogram of the number of different keywords each video received. Even though we asked the turkers to associate at least 3 keywords with each video, 82% of the evaluations provided more than the required minimum, which resulted in 96% of the videos with 10 or more different associated keywords.

Figure 6.5 presents the histogram of the total number of characters in the set of unique keywords associated with each video. The length of the ground truth varies from 51 (min) to 264 (max) characters; in fact, 32% of the videos have up to 100 characters.
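This last quantity, the total length of the distinct ground-truth keywords per video, is exactly what defines the per-video budget used later; its computation can be sketched as follows (a minimal sketch with hypothetical turker answers; whether separators count toward the budget is left unspecified here):

```python
def ground_truth_budget(keyword_sets):
    """keyword_sets: one set of keywords per turker for the same video.
    The budget is the total length (in characters) of the distinct keywords."""
    distinct = set().union(*keyword_sets)
    return sum(len(k) for k in distinct)

# hypothetical answers from three turkers for one trailer
turker_answers = [{"heist", "thriller"}, {"thriller", "bank"}, {"heist", "action"}]
print(ground_truth_budget(turker_answers))  # heist+thriller+bank+action -> 23
```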
These values guide the budget parameter in our experiments, as we explain in Section 6.4.

6.4 Experimental Setup

This section presents the instances of data sources and recommenders, and the metrics used in the evaluation.

Figure 6.4: Histogram of the number of different keywords associated to a video by turkers.

Figure 6.5: Histogram of the total length (in characters) for the set of distinct keywords associated to each video.

6.4.1 Data Sources

We focus on comparing the effectiveness of using peer- and expert-produced data sources as input to recommender algorithms in the context of content promotion. The position of a data source in this spectrum (Figure 6.6) depends on whether the data is produced collaboratively by non-expert users or by a single expert user. For example, the page of a film on Wikipedia is likely edited by many non-expert users, while the reviews published by NYTimes are generally authored by a single movie critic. Next, we describe each of these data sources.

Figure 6.6: An illustration of the space of data sources we explore.

MovieLens is a web system where users collaboratively build and maintain a catalog of movies and their ratings. Users can create new movie entries; update existing entries; annotate movies with tags; and review and rate movies. Based on previous users' activity and ratings, MovieLens suggests movies a user may like to watch. For our evaluation we use only some of the data available in MovieLens: only the tags users produce while collaboratively annotating and bookmarking movies. This data is a trace of tag assignments made available on the Web (http://www.grouplens.org/taxonomy/term/14).

Wikipedia is a peer-produced encyclopedia where users collaboratively write articles about a multitude of topics. Users in Wikipedia also collaboratively edit and maintain pages for specific movies (e.g., http://en.wikipedia.org/wiki/Pulp_Fiction_(Film)). We leverage these pages as the sources of candidate keywords for recommending tags for their respective movies from our sample.

NYTimes reviews are written by movie critics, who can be considered experts on the subject.
Similar to the data provided by Wikipedia, we leverage the review page of a particular movie as the source of candidate keywords for the tag recommendation. The reviews are collected via the query interfaces of the New York Times API (http://developers.nytimes.com/).

Rotten Tomatoes is a portal where users can rate and review movies and, in addition, have access to all credits information: actors and roles, directors, producers, soundtrack, synopsis, etc. The portal links to critics' reviews as well. The information about the credits of a movie and the critics' reviews can be considered as produced by experts (the film credits are likely obtained directly from the producers, while the critics' reviews are similar to those from NYTimes). While users can review the movies as well (and this qualifies as peer-produced information), these reviews are available on the website but were not accessible via the API at the time of our investigation. The rest of the information about the movies, together with links to the experts' reviews, is available via the Rotten Tomatoes API (http://developer.rottentomatoes.com/). Therefore, in this investigation, the data used from this data source lies at the expert end of the spectrum. In the experiments, we divide Rotten Tomatoes into two data sources: Rotten Tomatoes (with the credits information); and RT Reviews (with the critics' reviews).

YouTube. Finally, to test whether the tags already assigned to YouTube videos can be further optimized, we collect the tags assigned to the YouTube videos in our sample from the HTML source of each video's page. The reason for using page scraping rather than API requests is that a video's tags are accessible via the API only to the video publisher, even though these tags are still used by the search engine to match queries and are available in the HTML of the video page.
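For illustration, extracting such tags from a saved page's HTML can be done with a standard parser (a sketch only: it assumes the tags appear in a <meta name="keywords"> element, which reflects the page layout at the time and may change):

```python
from html.parser import HTMLParser

class KeywordsExtractor(HTMLParser):
    """Collects the content of <meta name="keywords" content="..."> elements."""
    def __init__(self):
        super().__init__()
        self.keywords = []
    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name") == "keywords" and a.get("content"):
            self.keywords.extend(k.strip() for k in a["content"].split(","))

# hypothetical snippet of a video page's HTML source
html = '<html><head><meta name="keywords" content="pulp fiction, tarantino, trailer"></head></html>'
parser = KeywordsExtractor()
parser.feed(html)
print(parser.keywords)  # -> ['pulp fiction', 'tarantino', 'trailer']
```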
The YouTube data source figures at the expert-produced end of the spectrum, because only the publisher can assign tags to the video. Moreover, it is reasonable to assume that a video's publisher is an expert on that video and aims to optimize its textual features to attract more views.

6.4.2 Recommenders

The experiments use two tag recommendation algorithms that process the input provided by the data sources: FREQUENCY and RANDOMWALK. We selected these two recommendation algorithms primarily because they harness some fundamental aspects of the tag recommendation problem (i.e., tag frequency and tag co-occurrence patterns) that more sophisticated methods (e.g., [7, 53, 84]) also use. Moreover, our goal is to understand the relative influence of the data sources on the quality of the tags recommended. We note that the methodology we describe and the ground truth can be used to evaluate other, more sophisticated, recommender algorithms as well.

The FREQUENCY recommender scores the candidate keywords based on how often each keyword appears in the data provided by a data source. Given the movie title, our pipeline finds the documents in the data source that match the title and extracts a list of candidate keywords. For example, in Wikipedia, the candidate keywords for recommendation to a given movie are extracted from the Wikipedia page about the movie; hence, the frequency of a keyword is the number of times the keyword appears in that page. Similarly, in MovieLens, the frequency is the number of times a tag is assigned to a movie.

The RANDOMWALK recommender harnesses both the frequency and the co-occurrence between keywords. The co-occurrence is detected differently depending on the data source.
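Independently of how co-occurrence is extracted per source, the two scoring schemes can be sketched in a few lines (a minimal, hypothetical illustration: the keyword counts and edges are made up, and the random walk here is a plain damped power iteration with restart proportional to frequency, not necessarily the exact formulation of [18, 53, 95]):

```python
from collections import Counter, defaultdict

def frequency_scores(keywords):
    """FREQUENCY-style scoring: score = how often a keyword appears."""
    return dict(Counter(keywords))

def random_walk_scores(cooccurrence, freq, iters=50, damping=0.85):
    """RANDOMWALK-style scoring: PageRank-like iteration over the keyword
    co-occurrence graph, restarting in proportion to keyword frequency."""
    total = sum(freq.values())
    restart = {k: v / total for k, v in freq.items()}
    score = dict(restart)
    neighbors = defaultdict(list)
    for a, b in cooccurrence:          # undirected co-occurrence edges
        neighbors[a].append(b)
        neighbors[b].append(a)
    for _ in range(iters):
        score = {
            k: (1 - damping) * restart[k]
               + damping * sum(score[n] / len(neighbors[n]) for n in neighbors[k])
            for k in score
        }
    return score

freq = frequency_scores(["tarantino", "crime", "tarantino", "trailer"])
scores = random_walk_scores([("tarantino", "crime"), ("tarantino", "trailer")], freq)
print(max(scores, key=scores.get))  # -> tarantino
```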
In MovieLens, for example, two keywords co-occur if they are assigned to the movie by the same user, while in NYTimes, Rotten Tomatoes, and Wikipedia two keywords co-occur if they appear in the same page related to the movie (i.e., review, movie record, and movie page, respectively). The RANDOMWALK recommender builds a graph based on keyword co-occurrence, where each keyword is a node and an edge connects two keywords if they co-occur. The initial score of each node is proportional to the individual frequency of each keyword as obtained from the data source. The RANDOMWALK is executed until convergence and the final node scores are used to rank the candidate keywords [18, 53, 95].

6.4.3 Budget Adjustment

To make the comparison fairer, for each movie, we adjust the budget to the size of the tag set in the ground truth. The knapsack solver uses this budget to select the recommended tags for a particular video. The reason for using a per-video budget is that a budget larger than the ground truth for a video penalizes the F3-measure (see definition below) by definition, as the number of recommended tags will always be larger than the ground truth size.

6.4.4 Success Metrics

The final step in the experiment is to estimate, for each video and for various input data sources and recommender algorithms, the quality of the recommended tag set. To this end, we use multiple metrics to compare the ground truth with the recommended tag set: the F3-measure, the generalized τ distance [25], and the Normalized Discounted Cumulative Gain (NDCG). We present each of these in turn.

Let T_v and S_v be the sets of distinct words in the ground truth and the recommended tag set, respectively, for video v. The metrics are defined as follows:

• F3-measure. This metric is defined for video v as F3(v) = 10 · P(v) · R(v) / (9 · P(v) + R(v)), where P(v) is the precision and R(v) is the recall.
This metric, however, weighs all tags in the ground truth equally, and thus ignores one important piece of information: in some cases multiple 'turkers' suggested the same tag, a strong indication that the tag has higher value. To account for this we use:

• Generalized τ distance [25]. This metric allows the comparison between two ranked lists. Given a video v, we use this metric to compare T_v (ground truth) sorted by frequency (i.e., the number of turkers who assigned the tag to the video) and S_v (recommended set of tags) sorted by the recommender score function. Similar to the traditional Kendall τ distance, the generalized τ distance counts the number of permutations needed to transform one of the lists into the other, while relaxing the constraint that the two lists have to contain the same elements. The extension is done by introducing a penalty parameter to account for elements that are in one list but absent from the other. This metric, however, weighs all order inversions equally, regardless of whether they are at the top or at the bottom of the list. To compensate for this we use:

• Normalized Discounted Cumulative Gain (NDCG). This metric introduces a discount factor that penalizes order changes at the top of the ranked list. Given a video v, this metric is computed as follows:

    NDCG(T_v, S_v) = [ Σ_{j=1}^{|S_v|} (2^{f(w_j, v)} − 1) / log(j + 1) ] / [ Σ_{i=1}^{|T_v|} (2^{f(k_i, v)} − 1) / log(i + 1) ]

where f(·, v) is the frequency of a tag in the ground truth (i.e., the number of turkers who assigned the tag to the video), and i and j are the positions of a tag in the ground truth and in the recommended set of tags, respectively. Note that if a tag w ∈ S_v and w ∉ T_v, we consider f(w, v) = 0.

6.5 Experimental Results

This section presents our experimental results to address the research questions that guide this study. First, we compare the performance of tags already assigned to YouTube videos in our sample to the effectiveness of both FREQUENCY and RANDOMWALK recommenders when using input from all data sources (Section 6.5.1).
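The success metrics of Section 6.4.4 drive all the comparisons that follow; for concreteness, the F3-measure and NDCG can be computed per video along these lines (a minimal sketch with hypothetical tag sets; the natural logarithm is an assumption, as the base is not specified above):

```python
import math

def f3(ground_truth, recommended):
    """F3-measure: recall-weighted harmonic mean of precision and recall."""
    tp = len(set(ground_truth) & set(recommended))
    if tp == 0:
        return 0.0
    p = tp / len(set(recommended))   # precision
    r = tp / len(set(ground_truth))  # recall
    return 10 * p * r / (9 * p + r)

def ndcg(truth_freq, recommended):
    """truth_freq: tag -> number of turkers who suggested it (defines the
    ideal order). Tags outside the ground truth get frequency 0."""
    def gain(f, pos):
        return (2 ** f - 1) / math.log(pos + 1)
    dcg = sum(gain(truth_freq.get(t, 0), j) for j, t in enumerate(recommended, 1))
    ideal = sorted(truth_freq.values(), reverse=True)
    idcg = sum(gain(f, i) for i, f in enumerate(ideal, 1))
    return dcg / idcg

# hypothetical ground truth (tag -> turker count) and recommended list
truth = {"tarantino": 3, "crime": 2, "trailer": 1}
rec = ["tarantino", "movie", "crime"]
print(round(f3(list(truth), rec), 3), round(ndcg(truth, rec), 3))  # -> 0.667 0.905
```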
Next, we look into the performance of individual data sources to understand the influence that each one has on the recommendation performance (Section 6.5.2). To complement the comparison of individual data sources, we compare two sets of combined data sources that represent the two ends of the spectrum we study (Section 6.5.3). Finally, we perform a characterization to identify factors that may explain the observed performance of some peer-produced data sources (Section 6.5.4).

6.5.1 Are Tags Assigned to Videos Optimized?

The first experiment assesses the value of tags already assigned to videos on YouTube for boosting their popularity. To this end, we compare the tags to the ground truth of each video. If the tags are already optimized, they should show a large overlap with the keywords in the ground truth.

Figure 6.7: CCDF of F3-measure for YouTube tags (dashed line) compared to recommendation based on input from all other data sources combined (continuous line) using FREQUENCY (left) and RANDOMWALK (right) recommenders.

Figure 6.7 shows the performance of the tags previously assigned to the videos and the performance achieved by the recommenders using input from all data sources combined (MovieLens, Rotten Tomatoes, Wikipedia, and NYTimes). The curves represent the Complementary Cumulative Distribution Function (CCDF) of the F3-measure. A point in the curve indicates the percentage of videos (y-axis) for which the F3-measure is larger than x. The more to the right the curve is, the more the recommendation performance is concentrated around larger values of the F3-measure.

In fact, the Kolmogorov-Smirnov test of significance confirms that the performance of either recommender when using all data sources is significantly higher than that achieved by the YouTube tags (FREQUENCY: D− = 0.44, p-value = 3.9×10^-16; RANDOMWALK: D− = 0.43, p-value = 5.5×10^-15).
These results show that the tags recommended by both methods are better than those currently assigned to the videos on YouTube. Therefore, the tags assigned to the YouTube videos can still be improved towards attracting more search traffic and, hence, are more likely to boost popularity.

Figure 6.8: CCDF of F3-measure for each data source used as input for FREQUENCY (left) and RANDOMWALK (right) recommenders.

6.5.2 Is peer-produced information valuable?

The next experiment aims to assess the value of peer-produced versus expert-produced information in the context of recommendation to improve content popularity. To this end, we compare the recommendation performance of different sources of candidate tags while fixing the recommender.

Figure 6.8 shows the CCDFs of the F3-measure for each individual data source as the input for the two recommenders. The first observation is that Rotten Tomatoes provides significant improvements over the existing tags on YouTube. Second, MovieLens is significantly better than the other three data sources (NYTimes, RT Reviews, and Wikipedia), though MovieLens provides only minor improvements over the tags currently assigned to the videos on YouTube.

To put these results in perspective, we note that, besides providing expert-produced information, the Rotten Tomatoes data source incorporates a schema for the information provided (i.e., actor names, character names, directors — in other words, named entities). Thus, one explanation for the good individual performance of Rotten Tomatoes is that users tend to use exactly the names of entities related to the movie they are searching for. Therefore, by using an input that is rich in highly accurate named entities (i.e., entered by the movie producers), a recommender is more likely to be successful.

In fact, we inspected the ground truth and, after aggregating the top-5 most frequent keywords for each movie, around 50% of the top-10% most frequent keywords are named entities.
Although it might be intuitive that accurate named entities improve recommendation, the fact that the MovieLens data source adds value, though minor, is an interesting observation. In particular, one would expect that the candidate keywords extracted from expert-produced reviews (NYTimes and RT Reviews) or peer-produced fact pages (Wikipedia) about the movie would match what users would use to search. However, the relative performance between MovieLens and the other three data sources suggests that candidate keywords produced via collaborative annotation are more effective than those produced by either collaborative text editing or by experts.

Figure 6.9 and Figure 6.10 show the performance of all data sources individually, except the YouTube tags, in terms of the τ and NDCG metrics. The reason for removing the YouTube tags is that the data obtained lacks ordering information about the tags' relative importance, and these two metrics compare the recommendations to the ground truth by considering the tags ranked according to the scores.

The results for the τ and NDCG metrics are qualitatively similar to those observed for the F3-measure, as the relative order among data sources is kept unchanged – Rotten Tomatoes, MovieLens, Wikipedia, NYTimes, and RT Reviews, in order of the highest to the lowest performance. The only highlight that τ and NDCG bring is that the introduction of order in the comparison between the recommended tags and the ground truth widens the distance between Rotten Tomatoes and the other data sources.

Figure 6.9: CCDF of τ for each data source used as input for FREQUENCY (left) and RANDOMWALK (right) recommenders.

6.5.3 Combining data sources

The previous experiments focused on the performance of individual data sources. In this section, we investigate the relative performance of combinations of data sources. The goal is to understand whether each category of data sources leads to different performance levels.
In particular, the experiment considers two groups – Peers: MovieLens + Wikipedia; Experts: NYTimes Reviews + RT Reviews. Additionally, in these experiments, we use Rotten Tomatoes (which provides the credits information about the movies) and the YouTube tags as baselines for comparison.

The results in Figure 6.11 lead to three observations: first, the CCDFs show that the performance of both recommenders using the Peers data source is significantly better than using the Experts data source; second, for the FREQUENCY recommender, the performance of the Peers data source is comparable to that of Rotten Tomatoes (which has the advantage of highly accurate named entity information for the movie, as discussed in the previous section); third, with the FREQUENCY recommender the Peers data source provides significant improvement relative to the tags currently assigned to the YouTube videos, while for the RANDOMWALK recommender there is no evidence of significant improvement.

Figure 6.10: CCDF of NDCG for each data source used as input for FREQUENCY (left) and RANDOMWALK (right) recommenders.

Although Wikipedia alone leads the recommenders to poor performance, combining it with MovieLens seems encouraging, as the results for the FREQUENCY recommender using the Peers data source show.
The combination of candidate keywords produced by collaborative writing (Wikipedia) and collaborative annotation (MovieLens) seems to dilute important co-occurrence information that can be harnessed by RANDOMWALK when using only MovieLens, as suggested by the relative performance of the RANDOMWALK recommender with MovieLens (Figure 6.8) and with Peers (Figure 6.11), each compared to YouTube.

Figure 6.12 and Figure 6.13 show similar results for the τ and NDCG metrics: the performance of both recommenders using the Peers data source is significantly better than that using the Experts data source; and, while the performance of the FREQUENCY recommender using the Peers data source is comparable to Rotten Tomatoes, the RANDOMWALK performance is lower.

Figure 6.11: CCDF of F3-measure performance comparison between combinations of groups of data sources Peers (MovieLens + Wikipedia) and Experts (NYTimes + RT Reviews) relative to YouTube and Rotten Tomatoes.

6.5.4 Is the number of contributors a factor?

As the previous results show, the performance achieved by the peer-produced data sources varies widely across videos. This section investigates whether the number of peers who produce tags for a movie in the MovieLens data source has predictive power over the performance delivered by the data source in the recommendation (or: how many peers is an expert worth?). To this end, we compute the correlation between the number of users who annotated a movie in MovieLens and the value of each quality metric for that video's recommendation. The Spearman's rank correlation of 0.31 between the number of users and the F3-measure indicates a mild positive correlation between these aspects. Therefore, the number of contributors partially explains the value added by the MovieLens data source to the recommenders' performance.
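The correlation itself is a standard rank statistic; it can be sketched as follows (a pure-Python Spearman via rank transform, averaging ranks on ties; the per-movie counts and F3 values below are hypothetical):

```python
def ranks(xs):
    """Average ranks (1-based), with tied values sharing their mean rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, 1-based
        for k in order[i:j + 1]:
            r[k] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# hypothetical: number of MovieLens taggers per movie vs. F3 of its recommendation
num_taggers = [3, 10, 1, 25, 7]
f3_scores = [0.2, 0.5, 0.1, 0.6, 0.3]
print(round(spearman(num_taggers, f3_scores), 2))  # -> 1.0
```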
Yet, one potential reason for the lack of a stronger correlation is that the motivation behind tagging a movie in MovieLens leads to drastically different terms from those used by users searching for the video. For example, although 'boring' is used to annotate movies in MovieLens as a way to express an opinion about a movie to other users, it is unlikely that users searching for the same movie would use that term.

Figure 6.12: Tau performance comparison between combinations of data sources: Peers (MovieLens + Wikipedia) vs. Experts (NYTimes + RT Reviews).

Figure 6.13: NDCG performance comparison between combinations of data sources: Peers (MovieLens + Wikipedia) vs. Experts (NYTimes + RT Reviews).

6.6 Summary

A large portion of the traffic received by video content on the web originates from keyword-based search and/or tag-based navigation. Consequently, the textual features of this content can directly impact the popularity of a particular content item, and ultimately the advertisement-generated revenue. Therefore, understanding the performance of automatic tag recommenders is important for optimizing the view count of content items.

First, this study confirms that the tags currently assigned to a sample of YouTube videos can be further improved regarding their ability to attract search traffic. Next, we perform comparisons between different types of data sources (peer- and expert-produced) with the goal of understanding the relative value of data sources and combinations thereof, and we find that combinations of peer-produced data sources can add value compared to the best expert-based baseline. Finally, our experiments show that the number of contributors to a peer-produced data source partially explains its positive influence on the performance of tag recommendation for boosting content popularity.

Chapter 7

Conclusions

The study presented in this dissertation consists of a characterization of social tagging systems to inform the design of supporting mechanisms.
In particular, this research focuses on the value of peer-produced information in two distinct contexts: exploratory search and content promotion. In the former, the study presents the design and evaluation of a method to assess the value of tags from the information seeker's standpoint. In the latter, this study investigates the value of peer-produced information in the context of content promotion on social media websites via tag recommendation.

To this end, a combination of quantitative and qualitative methods has been applied. The methodology consists of the following parts: i) characterization of activity traces of social tagging systems with a focus on understanding usage patterns (Chapter 2 and Chapter 3); ii) qualitative analysis of users' perception of tag value when performing exploratory search tasks (Chapter 4); iii) design and evaluation of a method to assess the value of tags in exploratory search tasks (Chapter 5); and, iv) a study of the value of peer-produced information for content promotion in social media.

The main contributions of this research can be summarized as follows (grouped by their respective research question):

Characterization of Individual User Activity

• RQ1.1. Users tend to reuse tags already present in the system more often than they repeatedly tag existing items [77, 78]. This finding supports the intuition that tags are primarily a content categorization instrument. Additionally, the results show that the difference between the levels of tag reuse and repeated item tagging varies across different systems. This observation suggests that features such as tag recommendation and the type of content play a role in the patterns of peer production of information in tagging systems.

• RQ1.2. The tag vocabulary of a user can be approximated by a small portion of her activity [79].
The experiments on the evolution of user tag vocabularies show that, to accurately approximate the characteristics of a tag vocabulary, only a small percentage of the initial tag assignments performed by a user is necessary. These results can be applied in the context of applications that rely on activity similarity scores between users, for example, as they provide a way to reason about the trade-offs between the accuracy of a user activity profile and the computational cost of updating the similarity scores.

Characterization of Social User Activity

• RQ2.1. The strength of implicit social ties is concentrated over a small portion of user pairs. Moreover, the observed strength of activity similarity between pairs of users is the result of shared interest, as opposed to being generated by chance. The distributions of activity similarity strength deviate significantly from those produced by a Random Null Model (RNM) [71]. This suggests that the implicit ties between users, as defined by their activity similarity levels, capture latent information about user relationships that may offer support for optimizing system mechanisms.

• RQ2.2. The average strength of implicit ties is stronger for user pairs with explicit ties [78]. This investigation analyzes the similarity between users according to their tagging activity and its relation to explicit indicators of collaboration. The results show that the users' activity similarity is concentrated on a small fraction of user pairs. Also, the observed distributions of users' activity similarity deviate significantly from those produced by a Random Null Model [71].
Finally, an analysis of the relationship between implicit ties based on activity similarity and more explicit relationships, such as co-membership in discussion groups, shows that user pairs that tag items in common have, on average, higher similarity in terms of co-membership in discussion groups.

Characterization of Users' Perception of Tag Value

To complement the quantitative characterization and to inform the design of methods that assess the value of tags, this research conducts a qualitative characterization of users' perception of tag value. A summary of the major findings of this investigation is presented below:

• RQ3. Users' perception of tag value in exploratory search is multidimensional, and the key aspects that influence it are the relevance of the items retrieved and the reduction of the search space [81]. Based on a qualitative characterization of users' perception of tag value in the context of exploratory search, this study finds that the two most salient aspects influencing users' perception of tag value are the ability to retrieve relevant content items and the ability to reduce the search space. These findings inform the design of a method that quantifies the value of tags automatically by taking into account the important aspects identified by the qualitative analysis.

Methods to Assess the Value of Peer-Produced Information

Finally, this research proposes new techniques that exploit the usage characteristics of tagging systems to improve their design. The next paragraphs briefly describe the contributions related to studying social tagging systems as commons-based peer production systems and the design of methods to assess the value of user contributions in these collaborative contexts. Chapter 5 and Chapter 6 present the proposed approaches and results in detail.

It is important to note that there are two perspectives on the problem of assessing the value of peer-produced information in tagging systems: the consumer's and the producer's.
The goal is to design methods that cater to each of these perspectives. For consumers, the value of tags is assessed in the context of exploratory search; for producers, the method takes into account the ability of a tag to improve the viewership of content (e.g., a YouTube video).

• RQ4. An information-theoretic approach to assessing the value of tags for exploratory search provides accurate estimates of value as perceived by users. This contribution is a method that automatically quantifies the value of tags and caters to the two desirable properties identified by the qualitative user study in the context of exploratory search. A proof shows that the proposed method has desirable theoretical properties while quantifying these two aspects. Additionally, an experiment using real tagging data shows that the proposed method accurately quantifies the value of tags according to users' perception.

• RQ5. Peer-produced information, though lacking formal curation, has value comparable to that of expert-produced information sources when used for content promotion. An analysis of online videos provides evidence that the tags associated with a sample of popular movie trailers can be further optimized by an automated process: either by incorporating human computing engines (e.g., Amazon Mechanical Turk) at a much lower cost than using dedicated channel managers (the current industry practice); or, at an even lower cost, by using recommender algorithms to harness textual content produced by a multitude of data sources related to the video content. To this end, I perform a comparison of the effectiveness of using peer- and expert-produced sources of information as input for tag recommenders that aim to boost content popularity.

These contributions are key to understanding user behaviour in social tagging systems.
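As a concrete illustration of the information-theoretic view behind RQ4, the "search space reduction" aspect alone can be modeled as the entropy drop obtained by narrowing a uniform search over all items to the items carrying a given tag. The sketch below is a simplified, hypothetical formalization under that assumption; the method proposed in the thesis also accounts for the relevance of the retrieved items.

```python
import math

def search_space_reduction(index, tag):
    """Entropy reduction, in bits, from narrowing a uniform search over all
    items down to the items annotated with `tag` (0.0 for unused tags)."""
    n_total = len(index)
    n_tag = sum(1 for tags in index.values() if tag in tags)
    if n_tag == 0:
        return 0.0
    return math.log2(n_total) - math.log2(n_tag)

# Hypothetical item -> tags index.
index = {
    "item1": {"python", "tutorial"},
    "item2": {"python", "web"},
    "item3": {"web"},
    "item4": {"cooking"},
}
print(search_space_reduction(index, "cooking"))  # 2.0 bits: 4 items narrowed to 1
print(search_space_reduction(index, "python"))   # 1.0 bit: 4 items narrowed to 2
```

Under this toy model, a rare, specific tag contributes more bits than a ubiquitous one, matching the intuition that a tag attached to nearly every item has little discriminative value.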
More importantly, the characterization study, together with the design of methods that assess the value of tags (as proposed in this research), can help the design of incentive mechanisms that aim to boost user participation. In fact, a method to assess the value of users' contributions in social tagging systems is a key building block in the design of incentive mechanisms. Therefore, this research provides an important contribution to future research that pursues this direction.

Bibliography

[1] L. A. Adamic, R. M. Lukose, A. R. Puniyani, and B. A. Huberman. Search in power-law networks. Physical Review E, 64(4), 2001.

[2] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. Knowledge and Data Engineering, IEEE Transactions on, 17(6):734–749, 2005.

[3] M. Ames and M. Naaman. Why we tag. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '07, page 971, New York, New York, USA, 2007. ACM Press. ISBN 9781595935939.

[4] P. André, M. Bernstein, and K. Luther. Who gives a tweet? In Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work, CSCW '12, page 471, New York, New York, USA, Feb. 2012. ACM Press. ISBN 9781450310864. doi:10.1145/2145204.2145277.

[5] R. Baeza-Yates and L. Rello. On measuring the lexical quality of the web. In Proceedings of the 2nd Joint WICOW/AIRWeb Workshop on Web Quality, WebQuality '12, page 1, New York, New York, USA, Apr. 2012. ACM Press. ISBN 9781450312370. doi:10.1145/2184305.2184307.

[6] R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison Wesley, 1999. ISBN 020139829X.

[7] F. Belém, E. Martins, J. Almeida, and M. Gonçalves. Exploiting Novelty and Diversity in Tag Recommendation. In P. Serdyukov, P. Braslavski, S. Kuznetsov, J.
Kamps, S. Rüger, E. Agichtein, I. Segalovich, and E. Yilmaz, editors, Advances in Information Retrieval, volume 7814 of Lecture Notes in Computer Science, pages 380–391. Springer Berlin Heidelberg, 2013. ISBN 978-3-642-36972-8. doi:10.1007/978-3-642-36973-5_32.

[8] Y. Benkler. The Wealth of Networks: How Social Production Transforms Markets and Freedom. Yale University Press, May 2006. ISBN 0300110561.

[9] K. Bischoff, C. S. Firan, W. Nejdl, and R. Paiu. Can all tags be used for search? In CIKM '08, pages 193–202, Napa Valley, California, USA, 2008. ACM. ISBN 978-1-59593-991-3.

[10] A. Budanitsky and G. Hirst. Evaluating WordNet-based Measures of Lexical Semantic Relatedness. Computational Linguistics, 32(1):13–47, 2006. ISSN 0891-2017.

[11] A. Capocci, A. Baldassarri, V. D. Servedio, and V. Loreto. Statistical properties of inter-arrival times distribution in social tagging systems. In 20th ACM International Conference on Hypertext, pages 239–244, Torino, Italy, 2009. ACM. ISBN 978-1-60558-486-7.

[12] C. Cattuto, A. Baldassarri, V. D. P. Servedio, and V. Loreto. Vocabulary growth in collaborative tagging systems. 2007.

[13] C. Cattuto, A. Barrat, A. Baldassarri, G. Schehr, and V. Loreto. Collective dynamics of social annotation. Proceedings of the National Academy of Sciences, 106(26):10511–10515, June 2009. ISSN 1091-6490.

[14] E. Chi, P. Pirolli, and S. Lam. Aspects of Augmented Social Cognition: Social Information Foraging and Social Search. In D. Schuler, editor, Online Communities and Social Computing, volume 4564 of Lecture Notes in Computer Science, pages 60–69. Springer Berlin Heidelberg, 2007.

[15] E. H. Chi. Information Seeking Can Be Social. Computer, 42(3):42–46, Mar. 2009. ISSN 0018-9162.

[16] E. H. Chi and T. Mytkowicz. Understanding the efficiency of social tagging systems using information theory.
In 19th ACM International Conference on Hypertext, pages 81–88, Pittsburgh, PA, USA, 2008. ACM. ISBN 978-1-59593-985-2.

[17] P. A. Chirita, S. Costache, W. Nejdl, and S. Handschuh. P-TAG. In Proceedings of the 16th International Conference on World Wide Web, WWW '07, page 845, New York, New York, USA, 2007. ACM Press. ISBN 9781595936547. doi:10.1145/1242572.1242686.

[18] M. Clements, A. P. de Vries, and M. J. T. Reinders. Optimizing single term queries using a personalized markov random walk over the social graph. Mar. 2008.

[19] M. Clements, A. P. de Vries, and M. J. Reinders. The task dependent effect of tags and ratings on social media access. In ACM TOIS, 2010.

[20] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. The MIT Press, third edition, July 2009. ISBN 0262033844.

[21] T. M. Cover and J. A. Thomas. Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing). Wiley-Interscience, 2nd edition, July 2006. ISBN 0471241954.

[22] M.-P. Dubuisson and A. Jain. A modified Hausdorff distance for object matching. In Pattern Recognition, 1994. Vol. 1, Conference A: Computer Vision & Image Processing, Proceedings of the 12th IAPR International Conference on, volume 1, pages 566–568, Jerusalem, Israel, 1994. IEEE Comput. Soc. Press. ISBN 0-8186-6265-4.

[23] F. Eggenberger and G. Polya. Ueber die Statistik verketteter Vorgaenge. Zeit. Angew. Math. Mech., 3(4):279–289, 1923.

[24] B. M. Evans, S. Kairam, and P. Pirolli. Exploring the cognitive consequences of social search. CHI EA '09, pages 3377–3382, Boston, MA, USA, 2009. ACM. ISBN 978-1-60558-247-4.

[25] R. Fagin, R. Kumar, and D. Sivakumar. Comparing top k lists. In SODA '03, pages 28–36, Baltimore, Maryland, 2003. Society for Industrial and Applied Mathematics. ISBN 0-89871-538-5.

[26] F. Figueiredo, F. Belém, H. Pinto, J.
Almeida, M. Gonçalves, D. Fernandes, E. Moura, and M. Cristo. Evidence of Quality of Textual Features on the Web 2.0. In CIKM, 2009.

[27] F. Figueiredo, H. Pinto, F. Belém, J. Almeida, M. Gonçalves, D. Fernandes, and E. Moura. Assessing the quality of textual features in social media. Information Processing & Management, 49(1):222–247, Jan. 2013. ISSN 03064573. doi:10.1016/j.ipm.2012.03.003.

[28] G. W. Furnas, T. K. Landauer, L. M. Gomez, and S. T. Dumais. The vocabulary problem in human-system communication. Commun. ACM, 30(11):964–971, 1987. ISSN 0001-0782.

[29] S. A. Golder and B. A. Huberman. Usage patterns of collaborative tagging systems. Journal of Information Science, 32(2):198–208, Apr. 2006. ISSN 0165-5515.

[30] O. Görlitz, S. Sizov, and S. Staab. PINTS: Peer-to-Peer Infrastructure for Tagging Systems. In Proceedings of the 7th International Conference on Peer-to-Peer Systems, 2008.

[31] X. Gu, X. Wang, R. Li, K. Wen, Y. Yang, and W. Xiao. Measuring Social Tag Confidence: Is It a Good or Bad Tag? 6897, 2011.

[32] Z. Guan, J. Bu, Q. Mei, C. Chen, and C. Wang. Personalized tag recommendation using graph-based ranking on multi-type interrelated objects. SIGIR '09, pages 540–547, Boston, MA, USA, 2009. ACM. ISBN 978-1-60558-483-6.

[33] H. Halpin, V. Robu, and H. Shepherd. The complex dynamics of collaborative tagging. In 16th International World Wide Web Conference, WWW '07, pages 211–220, Banff, Alberta, Canada, 2007. ACM. ISBN 978-1-59593-654-7.

[34] T. Hammond, T. Hannay, B. Lund, and J. Scott. Social bookmarking tools (I): A general review. D-Lib Magazine, 11(4), April 2005. ISSN 1082-9873.

[35] M. Harvey, I. Ruthven, and M. J. Carman. Improving social bookmark search using personalised latent variable language models. In Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, WSDM '11, pages 485–494, New York, NY, USA, 2011.
ACM. ISBN 978-1-4503-0493-1.

[36] B. He and I. Ounis. Query performance prediction. Information Systems, 31(7):585–594, Nov. 2006. ISSN 03064379. doi:10.1016/j.is.2005.11.003.

[37] M. Heckner, M. Heilemann, and C. Wolff. Personal information management vs. resource sharing: Towards a model of information behaviour in social tagging systems. In Proc. of the Third International AAAI Conference on Weblogs and Social Media (ICWSM 09), 2009.

[38] D. Helic, M. Strohmaier, C. Trattner, M. Muhr, and K. Lerman. Pragmatic evaluation of folksonomies. WWW '11, pages 417–426, Hyderabad, India, 2011. ACM. ISBN 978-1-4503-0632-4.

[39] M. Hennik, I. Hutter, and A. Bailey. Qualitative Research Methods. SAGE Publications, 1st edition, 2011.

[40] P. Heymann, G. Koutrika, and H. Garcia-Molina. Can social bookmarking improve web search? In Proceedings of the 2008 International Conference on Web Search and Data Mining, WSDM '08, pages 195–206, New York, NY, USA, 2008. ACM. ISBN 978-1-59593-927-2.

[41] P. Heymann, A. Paepcke, and H. G. Molina. Tagging human knowledge. In WSDM '10, pages 51–60, New York, New York, USA, 2010. ACM. ISBN 978-1-60558-889-6.

[42] J. Hirshleifer. Where Are We in the Theory of Information? The American Economic Review, 63(2):31–39, 1973. ISSN 00028282.

[43] A. Hotho, R. Jäschke, C. Schmitz, and G. Stumme. Information retrieval in folksonomies: Search and ranking. In Proceedings of the 3rd European Semantic Web Conference, volume 4011 of LNCS, pages 411–426, Budva, Montenegro, June 2006. Springer. ISBN 3-540-34544-2.

[44] S. Huang, X. Wu, and A. Bolivar. The effect of title term suggestion on e-commerce sites. WIDM '08, pages 31–38, Napa Valley, California, USA, 2008. ACM. ISBN 978-1-60558-260-3.

[45] A. Iamnitchi, M. Ripeanu, and I. Foster. Small-world file-sharing communities. In INFOCOM 2004.
Twenty-third Annual Joint Conference of the IEEE Computer and Communications Societies, volume 2, pages 952–963, Hong Kong, China, 2004.

[46] A. Iamnitchi, M. Ripeanu, E. S. Neto, and I. Foster. The Small World of File Sharing. IEEE Transactions on Parallel and Distributed Systems, 22:1120–1134, 2011. ISSN 1045-9219.

[47] P. Jaccard. The Distribution of the Flora in the Alpine Zone. New Phytologist, 11(2):37–50, 1912.

[48] R. Jäschke, L. Marinho, A. Hotho, L. Schmidt-Thieme, and G. Stumme. Tag Recommendations in Folksonomies. In J. Kok, J. Koronacki, R. Lopez de Mantaras, S. Matwin, D. Mladenic, and A. Skowron, editors, Knowledge Discovery in Databases: PKDD 2007, volume 4702 of Lecture Notes in Computer Science, pages 506–514. Springer Berlin / Heidelberg, Warsaw, Poland, 2007. ISBN 978-3-540-74975-2.

[49] Y. Kammerer, R. Nairn, P. Pirolli, and E. H. Chi. Signpost from the masses: learning effects in an exploratory social tag search browser. In CHI '09, pages 625–634, Boston, MA, USA, 2009. ACM. ISBN 978-1-60558-246-7.

[50] S. Kashoob and J. Caverlee. Temporal dynamics of communities in social bookmarking systems. Social Network Analysis and Mining, 2:387–404, 2012. ISSN 1869-5450.

[51] M. G. Kendall. A New Measure of Rank Correlation. Biometrika, 30(1/2):81–93, June 1938. ISSN 00063444.

[52] M. Kitsak, L. K. Gallos, S. Havlin, F. Liljeros, L. Muchnik, H. E. Stanley, and H. A. Makse. Identifying influential spreaders in complex networks. http://arxiv.org/abs/1001.5285, 2010.

[53] I. Konstas, V. Stathopoulos, and J. M. Jose. On social networks and collaborative recommendation. In SIGIR '09, pages 195–202, Boston, MA, USA, 2009. ACM. ISBN 978-1-60558-483-6.

[54] G. Koutrika, F. A. Effendi, Z. Gyöngyi, P. Heymann, and H. G. Molina. Combating spam in tagging systems: An evaluation. ACM Trans. Web, 2(4):1–34, 2008. ISSN 1559-1131.

[55] B. Krause, C. Schmitz, A.
Hotho, and G. Stumme. The anti-social tagger: detecting spam in social bookmarking systems. pages 61–68, Beijing, China, 2008. ACM. ISBN 978-1-60558-159-0.

[56] R. Krestel and P. Fankhauser. Language Models and Topic Models for Personalizing Tag Recommendation. pages 82–89, Toronto, AB, Canada, Aug. 2010.

[57] C. Lampe, J. Vitak, R. Gray, and N. Ellison. Perceptions of Facebook's value as an information source. In Proceedings of the 2012 ACM Annual Conference on Human Factors in Computing Systems, CHI '12, page 3195, New York, New York, USA, May 2012. ACM Press. ISBN 9781450310154. doi:10.1145/2207676.2208739.

[58] P. Li, B. Wang, W. Jin, J. Y. Nie, Z. Shi, and B. He. Exploring categorization property of social annotations for information retrieval. CIKM '11, pages 557–562, Glasgow, Scotland, UK, 2011. ACM. ISBN 978-1-4503-0717-8.

[59] M. Lipczak and E. Milios. Efficient Tag Recommendation for Real-Life Data. ACM Transactions on Intelligent Systems and Technology, 3(1):1–21, Oct. 2011. ISSN 21576904. doi:10.1145/2036264.2036266.

[60] D. Liu, X. S. Hua, L. Yang, M. Wang, and H. J. Zhang. Tag ranking. WWW '09, pages 351–360, Madrid, Spain, 2009. ACM. ISBN 978-1-60558-487-4.

[61] C. Lu, J.-R. Park, and X. Hu. User tags versus expert-assigned subject terms: A comparison of LibraryThing tags and Library of Congress Subject Headings. Journal of Information Science, 6(6):763–779, Dec. 2010.

[62] L. B. Marinho and L. Schmidt-Thieme. Collaborative Tag Recommendations. In C. Preisach, H. Burkhardt, L. Schmidt-Thieme, and R. Decker, editors, Data Analysis, Machine Learning and Applications, chapter 63, pages 533–540. Springer Berlin Heidelberg, 2008. ISBN 978-3-540-78239-1.

[63] C. Marlow, M. Naaman, D. Boyd, and M. Davis. HT06, tagging paper, taxonomy, Flickr, academic article, to read.
In Proceedings of the Seventeenth Conference on Hypertext and Hypermedia, HYPERTEXT '06, page 31, New York, New York, USA, 2006. ACM Press. ISBN 1595934170.

[64] E. S. Neto, S. A. Kiswany, N. Andrade, S. Gopalakrishnan, and M. Ripeanu. Enabling cross-layer optimizations in storage systems with custom metadata. In International Conference on High Performance Computing, HotTopics, pages 213–216, Boston, MA, USA, 2008. ACM. ISBN 978-1-59593-997-5.

[65] E. S. Neto, F. Figueiredo, J. Almeida, M. Mowbray, M. Gonçalves, and M. Ripeanu. Assessing the Value of Contributions in Tagging Systems. Social Computing / IEEE International Conference on Privacy, Security, Risk and Trust, 2010, 0:431–438, Aug. 2010.

[66] O. Nov, M. Naaman, and C. Ye. What drives content tagging: the case of photos on Flickr. In 26th Annual SIGCHI Conference on Human Factors in Computing Systems, pages 1097–1100, Florence, Italy, 2008. ACM. ISBN 978-1-60558-011-1.

[67] E. Ostrom. Governing the Commons: The Evolution of Institutions for Collective Action. Political Economy of Institutions and Decisions. Cambridge University Press, Nov. 1990. ISBN 0521405998.

[68] T. Pedersen, S. Patwardhan, and J. Michelizzi. WordNet::Similarity: measuring the relatedness of concepts. In Demonstration Papers at HLT-NAACL, Human Language Technology Conference, Association for Computational Linguistics, pages 38–41, Boston, Massachusetts, 2004.

[69] P. L. T. Pirolli. Information Foraging Theory: Adaptive Interaction with Information (Oxford Series in Human-Technology Interaction). Oxford University Press, USA, 1st edition, Apr. 2007. ISBN 0195173325.

[70] R. Ramakrishnan and A. Tomkins. Toward a PeopleWeb. Computer, 40(8):63–72, 2007.

[71] J. Reichardt and S. Bornholdt. Market Segmentation: The Network Approach. In Managing Complexity: Insights, Concepts, Applications, pages 19–36. 2008.

[72] S. Rendle and L. Schmidt-Thieme.
Pairwise interaction tensor factorization for personalized tag recommendation. pages 81–90, New York, New York, USA, 2010. ACM. ISBN 978-1-60558-889-6.

[73] A. J. Repo. The dual approach to the value of information: An appraisal of use and exchange values. Information Processing and Management, 22(5):373–383, 1986. ISSN 0306-4573.

[74] B. D. Ruben. Information as an economic good: a reevaluation of theoretical approaches. Information and Behavior Series, 3. Transaction Publ., New Brunswick, NJ, 1990. ISBN 0887382789.

[75] E. Santos-Neto. Characterizing and harnessing peer-production of information in social tagging systems. In Proceedings of the Fifth ACM International Conference on Web Search and Data Mining, WSDM '12, pages 761–762, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-0747-5.

[76] E. Santos-Neto, M. Ripeanu, and A. Iamnitchi. Tracking User Attention in Collaborative Tagging Communities. In International ACM/IEEE Workshop on Contextualized Attention Metadata: Personalized Access to Digital Resources, pages 11–18, Vancouver, June 2007. CEUR-WS.org.

[77] E. Santos-Neto, M. Ripeanu, and A. Iamnitchi. Content Reuse and Interest Sharing in Tagging Communities. In Proceedings of the AAAI Spring Symposium on Social Information Processing (AAAI-SIP 2008), Stanford, Mar. 2008.

[78] E. Santos-Neto, D. Condon, N. Andrade, A. Iamnitchi, and M. Ripeanu. Individual and social behavior in tagging systems. In Hypertext, HT '09, pages 183–192, Torino, Italy, 2009. ACM. ISBN 978-1-60558-486-7.

[79] E. Santos-Neto, D. Condon, N. Andrade, A. Iamnitchi, and M. Ripeanu. Reuse, Temporal Dynamics, Interest Sharing, and Collaboration in Social Tagging Systems. Jan. 2013. URL http://arxiv.org/abs/1301.6191.

[80] E. Santos-Neto, F. Figueiredo, N. Oliveira, N. Andrade, J. Almeida, and M. Ripeanu.
Assessing value of peer-produced information for exploratory search. Technical report, submitted to WWW'2014, October 2013.

[81] E. Santos-Neto, T. Pontes, J. Almeida, and M. Ripeanu. How many peers is an expert worth? On the value of information sources for boosting content popularity via tag recommendation. Technical report, submitted to WSDM'2014, October 2013.

[82] K. Seki, H. Qin, and K. Uehara. Impact and prospect of social bookmarks for bibliographic information retrieval. JCDL '10, pages 357–360, Gold Coast, Queensland, Australia, 2010. ACM. ISBN 978-1-4503-0085-8.

[83] S. Sen, S. K. Lam, A. M. Rashid, D. Cosley, D. Frankowski, J. Osterhouse, F. M. Harper, and J. Riedl. Tagging, communities, vocabulary, evolution. In CSCW '06, pages 181–190, Banff, Alberta, Canada, 2006. ACM. ISBN 1-59593-249-6.

[84] B. Sigurbjörnsson and R. van Zwol. Flickr tag recommendation based on collective knowledge. In 17th International World Wide Web Conference, WWW '08, pages 327–336, Beijing, China, 2008. ACM. ISBN 978-1-60558-085-2.

[85] H. A. Simon. On a Class of Skew Distribution Functions. Biometrika, 42(3/4):425–440, 1955. ISSN 00063444.

[86] J. Sinclair and M. Cardew-Hall. The folksonomy tag cloud: when is it useful? Journal of Information Science, 34(1):15–29, Feb. 2008. ISSN 1741-6485.

[87] M. A. Stephens. EDF Statistics for Goodness of Fit and Some Comparisons. Journal of the American Statistical Association, 69(347):730, Sept. 1974. ISSN 01621459. doi:10.2307/2286009.

[88] G. J. Stigler. The Organization of Industry. University of Chicago Press, Mar. 1983. ISBN 0226774325.

[89] J. Stoyanovich, S. A. Yahia, C. Marlow, and C. Yu. Leveraging Tagging to Model User Interests in del.icio.us. In Proceedings of the AAAI Spring Symposium on Social Information Processing (AAAI-SIP 2008), Stanford, 2008.

[90] M.
Strohmaier, C. Körner, and R. Kern. Understanding why users tag: A survey of tagging motivation literature and results from an empirical study. Web Semantics: Science, Services and Agents on the World Wide Web, 17(0), 2012. ISSN 1570-8268.

[91] F. M. Suchanek, G. Kasneci, and G. Weikum. Yago: a core of semantic knowledge. In Proceedings of the 16th International Conference on World Wide Web, WWW '07, pages 697–706, Banff, Alberta, Canada, 2007. ACM. ISBN 978-1-59593-654-7.

[92] F. M. Suchanek, M. Vojnovic, and D. Gunawardena. Social tags: meaning and suggestions. In CIKM '08, pages 223–232, Napa Valley, California, USA, 2008. ACM. ISBN 978-1-59593-991-3.

[93] P. K. Vatturi, W. Geyer, C. Dugan, M. Muller, and B. Brownholtz. Tag-based filtering for personalized bookmark recommendations. pages 1395–1396, Napa Valley, California, USA, 2008. ACM. ISBN 978-1-59593-991-3.

[94] P. Venetis, G. Koutrika, and H. G. Molina. On the selection of tags for tag clouds. WSDM '11, pages 835–844, Hong Kong, China, 2011. ACM. ISBN 978-1-4503-0493-1.

[95] C. Wang, F. Jing, L. Zhang, and H. J. Zhang. Image annotation refinement using random walk with restarts. MULTIMEDIA '06, pages 647–650, Santa Barbara, CA, USA, 2006. ACM. ISBN 1-59593-447-2.

[96] J. Wang, M. Clements, J. Yang, A. P. de Vries, and M. J. Reinders. Personalization of tagging systems. Information Processing & Management, 46(1):58–70, Jan. 2010. ISSN 03064573.

[97] J. Weng, E. P. Lim, J. Jiang, and Q. He. TwitterRank: finding topic-sensitive influential twitterers. In WSDM '10, pages 261–270, New York, New York, USA, 2010. ACM. ISBN 978-1-60558-889-6.

[98] S. A. Yahia, M. Benedikt, L. V. S. Lakshmanan, and J. Stoyanovich. Efficient network aware search in collaborative tagging sites. Proc. VLDB Endow., 1(1):710–721, 2008. ISSN 2150-8097.

[99] Y. Yanbe, A. Jatowt, S. Nakamura, and K. Tanaka. Can social bookmarking enhance search in the web?
JCDL '07, pages 107–116, Vancouver, BC, Canada, 2007. ACM. ISBN 978-1-59593-644-8.

[100] D. Yin, Z. Xue, L. Hong, and B. D. Davison. A probabilistic model for personalized tag prediction. pages 959–968, Washington, DC, USA, 2010. ACM. ISBN 978-1-4503-0055-1.

[101] D. Zhou, J. Bian, S. Zheng, H. Zha, and C. L. Giles. Exploring social annotations for information retrieval. In 17th International World Wide Web Conference, pages 715–724, Beijing, China, 2008. ACM. ISBN 978-1-60558-085-2.

[102] R. Zhou, S. Khemmarat, L. Gao, and H. Wang. Boosting video popularity through recommendation systems. In Databases and Social Networks, DBSocial '11, pages 13–18, New York, New York, USA, June 2011. ACM Press. ISBN 9781450306508. doi:10.1145/1996413.1996416.

Appendix A

Contextual Interview Guideline

The contextual interview guide consisted of the questions below. Note that although the interviews helped to collect data that enable us to confirm previous studies about both the motivations to use tags and the types of use, the primary goal of this investigation is to understand what aspects influence users' perception of value when choosing tags during information seeking tasks:

1. Why do you use tags? Why do you use each of these specific systems you mentioned?

2. What is the perceived value, to you, of tags produced by other users?

3. Can you describe search interfaces/systems of your choice that you use when looking for a set of items related to the same topic? For example, to explore a given topic of interest. (Probes: to find articles related to a topic of interest.)

4. What are the situations where you feel the search interfaces mentioned above are more adequate for your search tasks, as opposed to other alternatives? (Probes: traditional keyword-based search vs. AND-search navigation.)

5. Please describe/show us (in as much detail as possible) the process you follow when using exploratory search.
You can recount your last experience, for example.

6. Consider a scenario where you are looking for content on a given topic of interest. How do you choose among tags when navigating (i.e., performing information seeking tasks)?

7. Can you show us an example of an exploratory search where you had to choose among tags to proceed?

8. Why did you choose these tags while looking up these content items (from question 7)?

9. How do the partial search results influence the tags you choose to proceed with the navigation?

10. Let's talk about a different use of tags: annotation, instead of its use in search/navigation. How did you choose the tags when annotating content?

11. Do you speak/write/read more than one language? If so, how do these multiple languages influence your choice of tags?

12. How does the intended use of the content you found during your search/navigation influence your choice of tags to annotate it?

In the second part, users are requested to 'solve' the following navigation tasks:

• Task 0 (tutorial). Find articles related to cooking. (The goal is to get the user acquainted with the Getboo interface and enable her to perform Tasks 1 and 2 without much intervention.)

• Task 1. Find articles related to your work that are interesting (and new) to you.

• Task 2. Find articles related to your hobbies that are interesting to you.

We note that the search tasks are deliberately vague. The reason is that such tasks are the ones that motivate users to go into exploratory search mode [86], rather than trying to locate a single specific answer to an information need (e.g., what is a factotum? What is the blindekuh restaurant's location in Zürich?).

Appendix B

Codebook

A - Validate previous work on motivation for tag usage

• Motivation to use systems (inductive). Users have different motivations to use a specific system that offers tagging features. Example (P1): "The process of making sense is faster. Instead of reading a page... I just look for one or two sentences that give me an idea of what it means and then I go for something else. ... When I do it traditionally, the way it works is I have to go over everything, I have to read everything more carefully. I think that in this sense it's way better."

• Tag production (inductive). Describes the aspects related to users' tag production habits. Example (P1): "With Flickr I think I used more of tagging (I used it a while ago), but [I used] tagging more as a content producer; in Twitter, more as consuming content."

• Categorization (tag production) (inductive). Tags can be used to annotate items with the purpose of grouping these items into different groups. Example (P1): "Flickr I use to put my photos in different groups."

• Content promotion (tag production) (inductive). Tags can be used to annotate items with the purpose of increasing the chance of their discovery by others. Example (P1, P3): "I try to tag them in a better way to discover them so people can come and comment on my photos."

• Join a trend (tag production) (inductive). Refers to situations where tags are used to create trends (like promotions or games) or to join a trend. Example (P1): "The only thing that I might use a hashtag: if something is trending in Twitter, and I want to join the trend, I add a hashtag to it."

• Target audience (tag production) (inductive). Refers to the fact that users may consider their audience when choosing tags to annotate content. Example (P4, P2): "Maybe, because the target audience, or the place where I used the content from, or I generate the content for, are not Persian. That's one issue."

• Language choice (tag production) (inductive). Example (P3): "Maybe, because the target audience, or the place where I used the content from, or I generate the content for, are not Persian. That's one issue."

• Descriptive tags (types of tags) (deductive). Tags may vary widely in type (or function); this code represents evidence of tags that are used to describe the items. Example (P1): "I think it was location that I paid attention to - kind of - describe the location. And... I think that feature of the photo, as far as I can remember. If I was interested in a low light photography I put a 'low light' tag, if it was HDR I put an 'HDR' tag, if it was a wide angle photography..."

• Task-organization tags (types of tags) (deductive). Example (P3, translated from Portuguese): "Another part of the tags was to say what I wanted to do with that; for example, there was a lot of stuff like: I thought a book was nice and I thought I should buy it or read it later."

• Source of tags (inductive). Tags can be created by the user or extracted from a social norm, for example. Example (P1, P6): "In some conferences, they say: here is a hashtag of the conference. I usually tend to... I don't tweet, but I search for the hashtag to see things about the paper that has just been presented, or anything specific that has been going on inside the conference."

• (SUB) Aggregated items evaluation (inductive). One of the motivations to use social tagging is that items are directly (via tags or other system-provided tools) or indirectly (by the number of users who added the item) evaluated by the users. Example (translated from Portuguese): "I think that when someone took the trouble to write a summary of the thing, it was either really bad or really good, you know?"

B - Build a theory on how users consume tags via exploration

• Types of searches (inductive). We can build a taxonomy of types of searches in the context of our study. Users have mentioned serious and less serious types of searches, which seem to affect their expectations about the relevance of results and the level of refinement. Example (P1, P4): "I wanted to look for one page websites, I am not sure I used two tags, or not. So, again, it was not a very serious search. So, I search for a website, I browsed through and tried to look for different things. But, I think I could use, I think, 'website' and 'one page', I think there would be a 'onepage' tag."

• Tag consumption (inductive). Describes aspects related to tag use for search or other activities, such as the aspects that influence the choice of tags that should be used. (P5)

• Semantic similarity (tag consumption) (inductive). Refers to the choice of tags based on their (dis)similarity. Example (P3): "So, if we show, type and typography both of them point to the same thing, web and website, icon and icons, it's a bit useless to have these two similar, very similar tags together; this is something that impacts the value: icons have zero value here because you have icon here."

• Relevance of results (tag consumption) (deductive). Refers to the user's action of considering the results of a query to decide on the next step of the search. Example (P3): "As I couldn't find lots of stuff in 'hci' 'research' 'usability', then I tried to remove the tags to find more interesting stuff with a broader... 'hci'."

• Search space reduction (tag consumption) (deductive). Refers to the fact that tags can be used to "cut" desired parts of the search space. Example (P3): "As I couldn't find lots of stuff in 'hci' 'research' 'usability', then I tried to remove the tags to find more interesting stuff with a broader... 'hci'."

• Source of tags (tag consumption) (inductive). The producer of the information influences the perception of value of information goods. Example (P3): "Sometimes it happens that some of my friends... they retweet others' tweets... the probability that I follow those tweets is less than their own tweets... because I'm much more interested to see what they are thinking, what they are doing, than what others are thinking... but this is different from search."

• Information foraging behaviour (inductive). Example (P3, P4): "If I feel that these first results are not the ones I'm looking for... I refine the search..."

• Search mechanism (deductive). Users have different understandings/perceptions about how the system works. Example (P1): "I think of Twitter of this full text search feature that it has, that it doesn't matter if your search is specifically a hashtag or not."

• Diversity (tag consumption) (deductive). Users may value tags conditionally on the other tags already selected. This code highlights the aspect where users value tags more if the tags are diverse among themselves. Example (P1): "Yeah... diversity, exactly! It did not add much information. Probably... hci... then I search for something like more orthogonal tags."

Observations on diversity during the navigation tasks:

• Diversity; work exploration; complete a task; Getboo. Suggesting too similar tags decreases their value (related codes: related tags, semantic similarity). P4, negative reaction (translated from Portuguese): "I find it confusing (...) I think it's bad. (...) Why not put 'portfolio', 'design', 'musics' here?" Design implication: increase diversity.

• Diversity; fun exploration; complete a task; Getboo. Using synonym tags in a search does not make sense (related code: semantic similarity). P6, neutral reaction (translated from Portuguese): "There are two 'fun(ny)' here, it doesn't make sense, right? I'll remove this one." Design implication: increase diversity.

• Diversity; fun exploration; find a resource; Dribble. Suggesting too similar tags decreases their value (related codes: related tags, diversity). P1, negative reaction: "It's a bit of useless to have these... very similar tags together. This is something that impacts the value. 'Icons' have zero value here because you have 'icon' here...
(I prefer tohave) meaninfuldiversity within thetags!"IncreaseKnownvocabularyworkexplorationcomplete ataskGetbooWhen differenciating similar tagsthe one that is known/used havemore value- SemanticsimilarityP3- Neutral."'computer_science'era uma tag que euusava maisbasicamente.”IncreaseKnownvocabularyworkexplorationcomplete ataskGetbooTag composition have less valuethan the tag with the words- Categorydefinition- TagconsumptionP4- Negative."'performance_art' éuma categoriaespecífica que não,'performance'sozinha"IncreaseKnownvocabularyfun explorationcomplete ataskGetbooWhen differenciating similar tagsthe one that is known/used havemore value- SemanticsimilarityP8- Neutral. "Sei lá!Quando eu procurorelacionado agames eu uso noplural."IncreaseAspectsContextMotivationSystemWhat and How is it discussed?RelationsCodesExamplesAbout emotion andsome textInfluenceon ValueAppendix CThick Descriptions143Knownvocabularystream/infoexplorationfind workrelated pointersTwitterContent and related tagsassociated with known vocabularyhave more value.- Relatedtags- More likethis!- Relevance ofresultsP8- Positive. "euescolheria ashashtagsassociadas quesejam semelhantesou iguais a termosque eu escuto ...esse aquiprovavelmente eunão clicaria porqueeu não sei o que éesse “DAADC13”"IncreaseMakingsensestream/infoexplorationbeen up-to-dateTwitterHashtags creates an easier spaceto be processed/evaluated. (Moreabout Twitter?)- Types ofsearchP2- Neutral. "Theprocess of makingsense is faster."More likethis!stream/infoexplorationjob searchTwitterIndexation by hashtags cangenerate a "more like this" search.It is faster for this kind of search- Categorization?- Types ofsearchP2- Positive. "In thiskind of stuff, find itusually easier andfaster."Increase?More likethis!photoexplorationfind image touseFlickrEven as a "secondary" way ofdoing it, clicking on tags may helpyou to explore a "theme"- Categorization?- TagconsumptionP5- Negative. "I didn’tmean to use it. I justsee a tag and I clickon it... 
and I mayfind more types oftrees."Increase?More likethis!photoexplorationfind image touseFlickrWhen the results are not goodenough, you may use similar tagsto explore a "near space"- Relatedtags- Nearspace?- SemanticsimilarityP7- Neutral. "a quemais se aproximaseria “dark room”,certo? Essa, esseseria meu próximoalvo."Increase?More likethis!workexplorationcomplete ataskGetbooWhen the results are not goodenough, you may use similar tagsto explore a "neighbour space"- Relatedtags- Nearspace?- SemanticsimilarityP2- Neutral. "the nextone is gonna beuser experience...there might be someresults that theyused these terminstead of ux"Increase?AspectsContextMotivationSystemWhat and How is it discussed?RelationsCodesExamplesAbout emotion andsome textInfluenceon Value144More likethis!stream/infoexplorationjob  searchTwitterOne tag has more value than theother if it describes better what Iwant (will retrieve more relevantresults)!!!- Relatedtags- Describability- Relevance ofresultsP2- Neutral. "If theresults were not thatmuch related towhat I’m looking forI start looking fornew hashtags ...(they might be)really more close towhat I’m lookingfor."IncreaseMore likethis!photoexplorationfind image touseFlickrAfter start with a more specific tag(s) and finding that the results arefrom "another category" shedecided to make the spacebroader.- Searchspace- Spacedefinition- Relevance ofresultsP7- Negative. "Isso, euqueria uma sala decinema real. Tá,minha próximaalternativa seriaapagar o 'room’ etentar só 'cinema'.Obviamente podeaparecer bem maiscoisa, por exemplo,o exterior decinemas."NULLNarrowerspacefun explorationcomplete ataskGetbooA more specific tag (that result ina more focused space) have morevalue- Space size- Spacedefinition- Relevance ofresultsP1- Neutral. "I mightlook at'typography'... 
itmight give morefocused result...something in thisarea"IncreaseNarrowerspaceworkexplorationcomplete ataskGetbooResults are used to identify if thesearch is going in the rightdirection- Space size- Spacedefinition- Relevance ofresultsP3- Neutral. "pareceque tem muito a vercom 'storage' e tal,mas nada a ver compesquisa... tem querefinar mais."IncreaseNarrowerspaceworkexplorationcomplete ataskGetbooSuggested tags would have morevalue if helped to cut more thespace, make it more focused.- Searchspace- Relatedtags- Search spacereductionP3- Neutral. "É aimpressão que eutenho é que.. é..essas tags queaparecem aqui no..na caixinha dadireita são muitogerais, sabe? ... Euía mais na coisamais restrita"IncreaseAspectsContextMotivationSystemWhat and How is it discussed?RelationsCodesExamplesAbout emotion andsome textInfluenceon Value145Narrowerspacefun explorationfind link tocontentTwitterA tag is more valuable when itcuts the space more precisely.- Spacedefinition- Proportion ofrelevant items- Search spacereductionP6- Positive. "Sozinhanão. Não, porque eunão estava atrás deconteudo do ufc, euestava atrás do linkdo UFC 160,entendeu?Certamente essapalavra “link” estariano meio."IncreaseNarrowerspacefun explorationfind link tocontentTwitterA tag is more valuable when itcuts the space more precisely.- Spacedefinition- Proportion ofrelevant items- Search spacereductionP6- Positive. "Essaaqui #UF160 (émais valiosa).Porque estáespecificando bem"IncreaseNarrowerspacefun explorationfind link tocontentTwitterA tag is more valuable when itcuts the space more precisely.- Spacedefinition- Proportion ofrelevant items- Search spacereductionP6- Neutral. "Eucombino 2 hashtagsou 3, quando fazsentido, para tentarrefinar mais."IncreaseNarrowerspaceworkexplorationcomplete ataskGetbooA tag have more value when itdefines a narrower search space- Spacedefinition- Search spacereductionP7- Neutral. 
"Como'opensource' já éum subconjunto dedesenvolvimento/programaçãoentão eu voucomeçar clicandoem 'opensource'”IncreaseNarrowerspaceworkexplorationcomplete ataskGetbooA tag have more value when itdefines a narrower search space- Spacedefinition- Search spacereductionP7- Positive. "Então euvou adicionar maisuma tag prarestringir mais aindae ver se eu acho oque eu quero."IncreaseNarrowerspaceworkexplorationcomplete ataskGetbooTwo things improve the value ofthe tag: space definition, andspace reduction.- Spacedefinition- Search spacereductionP7- Neutral. "como euacho que estas trêstags aqui já definembem, eu vouadicionar uma coisabem específica praver se tem nosistema algumacoisa relacionada."IncreaseAspectsContextMotivationSystemWhat and How is it discussed?RelationsCodesExamplesAbout emotion andsome textInfluenceon Value146Narrowerspaceworkexplorationcomplete ataskGetbooA tag have more value when ithelps to define a search space- Spacedefinition- Search spacereductionP2- Neutral. "I used'development'... youcan find differentthings.'ux+development'...that was kind ofdevelopment... usedfor user experience.if I had seen thisbefore I would startwith ux"IncreaseNarrowerspacefun explorationcomplete ataskGetboo- Two things improve the value ofthe tag: space definition, andspace reduction.- A tag have more value when itdefines a narrower search space- Spacedefinition- Search spacereductionP11- Neutral. "Eu achoque ela é uma tagmuito clara assim,mas muitoabrangente né.Então ela comcerteza não vai sersuficiente"IncreaseNarrowerspacestream/infoexplorationfind workrelated pointersTwitterA tag is more valuable when itcuts the space more precisely.- Knownvocabulary- Relevance ofresultsP8- Neutral. "Asegunda forma seriamais valiosa porqueeu já relaciono issoaqui como umapalavra-chave. 
Seeu colocar separadovai me trazerresultado aquirelacionado a“máquinas” e a“social”"IncreaseNarrowerspaceworkexplorationcomplete ataskGetbooA tag have more value when itdefines a narrower search space- Searchresults- Search spacereductionP8- Neutral. (afterseen that there weresome pages ofresults): Deixa eufiltrar mais. Deixa euver aqui se aparecealguma coisa sebotar "maps"NULLAspectsContextMotivationSystemWhat and How is it discussed?RelationsCodesExamplesAbout emotion andsome textInfluenceon Value147Proportionof relevantitemsworkexplorationcomplete ataskGetbooIf the proportion of relevance inthe first results is low the filter isnot good enough- Searchresults- Relevance ofresultsP7- Neutral. “eu vouolhar as primeirascinco, dez, eh,entradas aqui e vermais ou menoscomo é que tá omeu, como é queestão os meusresultados. (...) osresultados ainda tãocom muito ruído.Então eu vouadicionar mais umatag.IncreaseProportionof relevantitemsworkexplorationcomplete ataskGetbooIf the proportion of relevance inthe first results is low the filter isnot good enough- Searchresults- Relevance ofresultsP11- Neutral. "se num tána primeira páginalogo, nas primeiras,eu, eu acho que nãovai ser do meuinteresse"IncreaseProportionof relevantitemsstream/infoexplorationgeneral searchTwitterIf the proportion of relevance inthe first results is low the filter isnot good enough- Searchresults- Relevance ofresultsP11- Negative. "oprimeiro resultado éem espanhol, osegundo também...o quarto já é emespanhol também.... muito conteúdoem espanhol quenão me interessava"IncreaseProportionof relevantitemsworkexplorationcomplete ataskGetbooIf the proportion of relevance inthe first results is low the filter isnot good enough- Searchresults- Relevance ofresultsP8- Neutral. "Aindanão é suficiente issoaqui. (...) 
Eu olheide início assim, osprimeiros num meinteressaram não."IncreaseProportionof relevantitemsworkexplorationcomplete ataskGetbooIf the proportion of relevance inthe first results is low the filter isnot good enough- Spacedefinition- Relevance ofresultsP8- Neutral. "souanalista derequisitos, então, eucliquei em“software” ... ele meretornou coisasrelacionadas aodesenvolvimentoem si ... nada aoque interessa."IncreaseAspectsContextMotivationSystemWhat and How is it discussed?RelationsCodesExamplesAbout emotion andsome textInfluenceon Value148Proportionof relevantitemsphotoexplorationfind image touseFlickrTags that have a higherprobability to return relevant itemshave more value- Searchresults- Search spacereductionP7- Neutral. "eumanteria “cinema”porque eu acho queé a palavra principalaqui... Eu tentomanter aquele queeu tenho maiscerteza que vão melevar ao objeto queeu quero encontrar”IncreaseProportionof relevantitemsworkexplorationcomplete ataskGetbooOne tag has more value than theother if it will retrieve morerelevant results!!!- Spacedefinition- Relevance ofresultsP7- Neutral. "buscandoinicialmente por'software' ... (vs'programming) euacharia mais coisasque não sãorelacionadas ... aomeu trabalho."IncreaseProportionof relevantitemsworkexplorationcomplete ataskGetbooA set of tags used to search isconsidered ok when the result isrevelevant (Even if the createdspace is "too short/notmeaninful?")- Spacedefinition- Space size- Relevance ofresultsP3- Neutral. "Aíaparentementequando eu refino ascoisas não significaquase nada.  Mas oque fica é umacoisa bacana"IncreaseProportionof relevantitemsstream/infoexplorationgeneral searchTwitterWhen the tag retrieves too much(irrelevant) content this tag losesvalues.- Space size- Search spacereductionP11- Negative. " o queme desapontou umpouco na época foique veio muitoconteúdo. 
Ou seja,é uma tag tão clarae tão abrangenteque veio muitoconteúdoindesejado."IncreaseRelatedtagsfun explorationcomplete ataskGetbooIf after reading related tags theuser uses his own vocabulary thismeans that suggestions had lowvalue!?- Searchresults- Relevance ofresultsP6- Neutral."(Murmurandoanalisa osresultados da tag“videos”! Faz aleitura das tagsrelacionadas...dizendo algo como:“É eu não usarianenhuma daqui!”)"DecreaseAspectsContextMotivationSystemWhat and How is it discussed?RelationsCodesExamplesAbout emotion andsome textInfluenceon Value149Relatedtagsfun explorationcomplete ataskGetbooTags can have their valuedecremented if related withirrelevant content???- Searchresults- ContentRelevance- Relevance ofresultsP6- Negative."(Olhando para astags relacionadascom o link que nãogostou!) Mas táclassificado aqui ó:“free, funny, humor”.Então vamos tentar“fun”)"DecreaseRelatedtagsworkexplorationcomplete ataskGetboo- Content relevance can bejudged based on related tags: ifthere are irrelevant tags thecontent loses value.- Searchresults- Resultingcategory- ContentRelevance- Relevance ofresultsP8- Negative. "Talvezesse seja maisrelevante do queeste aqui! Porqueesse aqui trouxemais assim:“python”, eh “php”,que já são tags quenão me interessam."NULLRelatedtagsworkexplorationcomplete ataskGetbooTags can have their valueincremented if related withrelevant content- Searchresults- ContentRelevance- Relevance ofresultsP8- Neutral. "se euachei um linkinteressante, ... eutento ver quais sãoas tags que ele táusando pra fazerpossíveis pesquisasrelacionadas"IncreaseSpacedefinitionworkexplorationcomplete ataskGetbooA tag that better describes adesired theme will probablyretrieve more relevant results.- Searchresults- Searchspace- Describability- Relevance ofresultsP11- Neutral. 
"Mas aque eu vou clicar éa “web2.0”  porqueeu acho que ela émais, é,representativa darede social na web"IncreaseSpacedefinitionfun explorationfind link tocontentTwitterA tag that better describes adesired theme will probablyretrieve more relevant results.- Searchspace- Knownvocabulary- Describability- Search spacereductionP6- Neutral. "se euquisesse todo oconteúdo doUFC160, eu acholegal que o cara, porexemplo, toda vezque tweetassealguma coisa elebotasse UFC160."IncreaseAspectsContextMotivationSystemWhat and How is it discussed?RelationsCodesExamplesAbout emotion andsome textInfluenceon Value150Spacedefinitionworkexplorationcomplete ataskGetbooOne tag has more value than theother if it describes better what Iwant (will retrieve more relevantresults)!!!- Searchspace- Describability- Search spacereductionP6- Neutral. “Entãocomo era “tutorial”,para aprender, apalavra “tutorial”estava dentro da tag“Ruby” aqui, entãopara mim faria maissentido.”IncreaseSpacedefinitionbookmarkexplorationcomplete ataskGetbooThe search was simpler becausethe space was "well defined" (in asmall amount of tags)- Types ofsearchP2- Neutral. "I mean itwas simpler... thereare not so manyoptions of what I'mlooking for."IncreaseSpacedefinitionfun explorationcomplete ataskGetbooIf the space is already definedusing more tags will have a lowervalue.- Searchspace- Relevance ofresultsP3- Neutral. "eu nãousaria isso aqui(TAGSRELACIONADAS)pra buscar aqui. Eujá sabia que, porexemplo 'scifi'definiria isso aquipra mim osuficiente"DecreaseSpacedefinitionfun explorationcomplete ataskGetbooIf the space is already definedusing more tags will have a lowervalue.- Space size?- Search spacereductionP3- Positive. "Bacana.Parece quecolocando o temadireto é maisbacana do que(combinar comoutras tags) talvezseja porque é umacoisa.. 
sei lá, muitoespecífica, quepouca gente gosta.Não sei."DecreaseSpacedefinitionfun explorationcomplete ataskGetboo- A tag have more value when ithelps to define a search space- Search spacereductionP8- Neutral. (afterasked about'entertainment' tag):"porque eu tavavendo muita coisarelacionada anotícia aqui."IncreaseAspectsContextMotivationSystemWhat and How is it discussed?RelationsCodesExamplesAbout emotion andsome textInfluenceon Value151Space sizeworkexplorationcomplete ataskGetbooSometimes to decide betweentwo similar spaces is better totake a look at both and evaluatethe results- SemanticsimilarityP7- Neutral. "acho quepode ajudarbastante quandovocê precisa filtrarpor tagsrelacionadas, é aquantidade deresultados"MediateSpace sizeworkexplorationfind workreferencesgeneralThe number of results is anaspect to decide when to continueto search.- Search spacereductionP4- Neutral. "Aíaparece muitaspalavras aí eucoloco um nomeque tenta identificar"DecreaseSpace sizefun explorationcomplete ataskGetbooThe number of results is anaspect to decide when to continueto search.- Search spacereductionP4- Negative."(PORTLAND)apareceu demaisaqui! Eu acho quenão vou confiar ...(PORTLAND+MUSIC)"DecreaseSpace sizefun explorationcomplete ataskGetbooThere is an ideal size of a searchspace: not too many BUT withoptions!- Search spacereductionP11- Negative. "muitosresultadosconfundem e vocênão consegue acharaquilo que vocêquer... um númerode resultados quenem preenche aprimeira página dosistema é um poucofrustante."MediateSpace sizeworkexplorationcomplete ataskGetbooWhen the space is too small therelated tags have a smallervalue?- Searchspace- Search spacereductionP1- Neutral. "It mighthave filtered toomuch... or I wasreally specific... tovery focused... as Icouldn’t find lots ofstuff in ‘hci’‘research’ ‘usability’then I tried toremove the tags tofind more interestingstuff with abroader... 
‘hci’"IncreaseAspectsContextMotivationSystemWhat and How is it discussed?RelationsCodesExamplesAbout emotion andsome textInfluenceon Value152SearchspacephotoexplorationgeneralFlickrWhen you use tags to search youcreate different spaces to dealwith- TagconsumptionP5- Neutral. "expect tosee things more in acategory. … I think Ijust expecteddifferent sets."Searchspacestream/infoexplorationexplore forinformationTwitterHash tags have more value inTwitter because they define anarrower search space- Search spacereductionP7- Neutral. "o hashmarca o tópico.Normalmente issoajuda. Se você nãoachar com o tópicopode voltar e tentarsem eles."Searchstrategyworkexplorationcomplete ataskGetbooAfter defining a good space tosearch it is valuable to test morespecific tags- Searchspace- Relevance ofresultsP3- Neutral. "Aí eu voucolocar (filesystem)... mais ligado apesquisa... termomenor popular."Searchstrategyworkexplorationcomplete ataskGetbooCutting around is good enough fora first step- Searchspace- Search spacereductionP6- Neutral."Obviamente é umalinguagem, não é oframework, mas euiria ver do que é quese trata."Searchstrategyfun explorationcomplete ataskGetbooOne tag has more value than theother if it will retrieve morerelevant results!!!- Searchspace- Relevance ofresultsP8- Neutral. "tambémé interessante vocêtirar uma tag ... vero que tem a outra.... ‘não, isso aquinão interessa não,vou tirar que távindo muitabesteira’Types oftagsstream/infoexplorationgeneral searchTwitterGroups/Sources of tags that arepreviouly judged as having lowrelevance will contributenegatively for non knowing tags.- Searchresults- Relevance ofresultsP11- Negative. "as tagsdas trendsgeralmente... têmum caráter que nãome atraem nostweets... 
essa coisaaqui que eu nem seio que é, oh!"NULLAspectsContextMotivationSystemWhat and How is it discussed?RelationsCodesExamplesAbout emotion andsome textInfluenceon Value153Types oftagsworkexplorationcomplete ataskGetboo- To decide between two tags theexpected result is considered.- A "known tag" will have morevalue!?- Knownvocabulary- Describability- Relevance ofresultsP4- Neutral. "eu achoque.. esse tipo detag não seria usadopela pessoa quepostou"NULLAspectsContextMotivationSystemWhat and How is it discussed?RelationsCodesExamplesAbout emotion andsome textInfluenceon Value154

