Computational Social Influence: Models, Algorithms, and Applications

by

Wei Lu

B.Sc., Simon Fraser University, 2010
B.Eng., Zhejiang University, 2010

A thesis submitted in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

in

The Faculty of Graduate and Postdoctoral Studies
(Computer Science)

The University of British Columbia
(Vancouver)

July 2016

© Wei Lu, 2016

Abstract

Social influence is a ubiquitous phenomenon in human life. Fueled by the extreme popularity of online social networks and social media, computational social influence has emerged as a subfield of data mining whose goal is to analyze and understand social influence using computational frameworks such as theoretical modeling and algorithm design. It also holds substantial application potential for viral marketing, recommender systems, social media analysis, etc. In this dissertation, we present our research achievements that take significant steps toward bridging the gap between elegant theories in computational social influence and the needs of two real-world applications: viral marketing and recommender systems. In Chapter 2, we extend the classic Linear Thresholds model to incorporate price and valuation to model the diffusion process of new product adoption; we design a greedy-style algorithm that finds influential users from a social network as well as their corresponding personalized discounts to maximize the expected total profit of the advertiser. In Chapter 3, we propose a novel business model for online social network companies to sell viral marketing as a service to competing advertisers, for which we tackle two optimization problems: maximizing the total influence spread of all advertisers and allocating seeds to advertisers in a fair manner. In Chapter 4, we design a highly expressive diffusion model that can capture arbitrary relationships between two propagating entities to arbitrary degrees.
We then study the influence maximization problem in a novel setting consisting of two complementary entities and design efficient approximation algorithms. Next, in Chapter 5, we apply social influence to recommender systems. We model the dynamics of user interest evolution using social influence, as well as attraction and aversion effects. As a result, making effective recommendations is substantially more challenging, and we apply semi-definite programming techniques to achieve near-optimal solutions. Chapter 6 concludes the dissertation and outlines possible future research directions.

Preface

This dissertation is the result of collaborative research with several other computer scientists. All work was done under the supervision of Prof. Laks V.S. Lakshmanan.

Chapter 2 is based on a publication [109] in the 2012 IEEE International Conference on Data Mining, a joint work with Laks V.S. Lakshmanan. I developed the theory and conducted the experiments under his guidance.

Chapter 3 is based on a publication [106] in the 2013 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, a joint work with Francesco Bonchi, Amit Goyal, and Laks V.S. Lakshmanan. I developed the theory and conducted the experiments under their guidance. Dr. Goyal refined the idea of the K-LT model that eventually led to submodularity. Prof. Lakshmanan and Dr. Goyal gave the proof of NP-hardness for the Fair Seed Allocation problem.

Chapter 4 is based on a publication in the Proceedings of the VLDB Endowment (volume 9, issue 2) [107], a joint work with Wei Chen at Microsoft Research Asia and Laks V.S. Lakshmanan. I proposed the current ComIC model, studied its properties, and conducted the experiments under their guidance.
Wei Chen proposed the possible world definition, GeneralTIM framework, and the RR-ComIC algorithm.

Chapter 5 is based on a publication in the 2014 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining [108], a joint work with Stratis Ioannidis, Smriti Bhagat, and Laks V.S. Lakshmanan. I developed the theory and conducted experiments under their guidance. The semi-definite relaxation formulation was developed by Stratis Ioannidis.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgments
Dedication
1 Introduction
    1.1 Influence Maximization in Social Networks
    1.2 Challenges and Key Contributions
        1.2.1 Viral Marketing
        1.2.2 Competition and Complementarity: A Unified Influence Propagation Model
        1.2.3 Applying Social Influence in Recommender Systems
    1.3 Outline
2 Influence-Driven Profit Maximization in Social Networks
    2.1 Introduction
    2.2 Related Work
    2.3 Linear Threshold Model with User Valuations
        2.3.1 Model and Problem Definition
    2.4 Special Case with Fixed Valuations
    2.5 General Properties of the LT-V Model
    2.6 Algorithm Design and Analysis
        2.6.1 Two Baseline Algorithms
        2.6.2 The Price-Aware Greedy Algorithm
    2.7 Experiments
        2.7.1 Data
        2.7.2 Results and Analysis
    2.8 Discussion and Future Work
3 Competitive Viral Marketing: The Host Perspective
    3.1 Introduction
    3.2 Related Work
    3.3 Models and Problem Definition
        3.3.1 The K-LT Diffusion Model
        3.3.2 Problem Definition
        3.3.3 Possible Alternative Objectives
    3.4 Model Properties
        3.4.1 Submodularity
        3.4.2 Closed-form Expression for Influence Spread
        3.4.3 Adjusted Marginal Gain
    3.5 Fair Seed Allocation Algorithms
        3.5.1 Hardness Results
        3.5.2 Dynamic Programming
        3.5.3 Integer Linear Programming
        3.5.4 An Efficient Greedy Heuristic
        3.5.5 Discussion on Strategic Behaviors
    3.6 Experiments
        3.6.1 Settings
        3.6.2 Results and Analysis
    3.7 Discussion and Future Work
4 Comparative Influence Diffusion and Maximization
    4.1 Introduction
    4.2 Related Work
    4.3 The Comparative Independent Cascade Model
        4.3.1 Global Adoption Probability (GAP)
        4.3.2 Unreachable States of ComIC Model
        4.3.3 Diffusion Dynamics in the ComIC Model
        4.3.4 Design Considerations
        4.3.5 An Equivalent Possible World Model
    4.4 Influence Maximization with Complementary Goods
    4.5 Properties of ComIC Model
        4.5.1 Monotonicity
        4.5.2 Submodularity in Complementary Setting
    4.6 Scalable Approximation Algorithms
        4.6.1 A General Framework Extending TIM
        4.6.2 Generating RR-Sets for Influence Maximization with ComIC
    4.7 The Sandwich Approximation Strategy
        4.7.1 Methodology
        4.7.2 Applying Sandwich Approximation to Influence Maximization
    4.8 Experiments
        4.8.1 Experiments with Synthetic Adoption Probabilities
        4.8.2 Learning Global Adoption Probabilities
        4.8.3 Experimental Settings with Learned Adoption Probabilities
        4.8.4 Results and Analysis
    4.9 Discussion and Future Work
5 Recommendations with Attraction, Aversion, and Social Influence Diffusion
    5.1 Introduction
    5.2 Related Work
    5.3 Problem Formulation
        5.3.1 Overview
        5.3.2 Recommender System and User Utilities
        5.3.3 Interest Evolution
        5.3.4 Recommended Item Distribution
        5.3.5 Recommendation Objective
    5.4 Algorithm Design
        5.4.1 Steady State Social Welfare
        5.4.2 SDP Relaxation
        5.4.3 Solvable Special Cases
        5.4.4 Finite Catalog
    5.5 Parameter Learning
    5.6 Experiments
        5.6.1 Evaluation of Parameter Learning
        5.6.2 Social Welfare Performance
    5.7 Discussion and Future Work
6 Summary and Future Research
    6.1 Summary
    6.2 Discussions and Future Work
Bibliography

List of Tables

Table 2.1 Dataset statistics
Table 2.2 Running time, in hours (WD weights, $c_a = 0.1$)
Table 2.3 Running time, in hours (TV weights, $c_a = 0.1$)
Table 3.1 Cyclic behaviors of $C_1$ and $C_2$, assuming $c < 91/3$. The rightmost column represents the best response by a company. E.g., ($C_1$, 28) means that company $C_1$ will change its bid to 28 in the next round. Note that round 7 is identical to round 1, indicating a cyclic trend in the strategic behaviors.
Table 3.2 Datasets statistics
Table 3.3 Test cases with varying budget distribution
Table 3.4 Comparing Needy-Greedy and Integer Linear Programming: For each dataset, we show the largest deviation between Needy-Greedy’s outcome and Integer Linear Programming’s outcome, in percentage, among all instances where Integer Linear Programming finished.
Table 4.1 Frequently used acronyms.
Table 4.2 Statistics of graph data (all directed)
Table 4.3 Percentage improvement of GeneralTIM with RR-ComIC over VanillaIC & Copying, where the fixed B-seed set is chosen to be the 101st to 200th ones from the VanillaIC order
Table 4.4 Percentage improvement of GeneralTIM with RR-ComIC over VanillaIC & Copying, where the fixed B-seed set is randomly chosen
Table 4.5 Percentage improvement of GeneralTIM with RR-ComIC over VanillaIC & Copying, where the fixed B-seed set is chosen to be the top-100 nodes by VanillaIC
Table 4.6 Selected GAPs learned for pairs of movies in Flixster
Table 4.7 Selected GAPs learned for pairs of books in Douban-Book
Table 4.8 Selected GAPs learned for pairs of movies in Douban-Movie
Table 4.9 Sandwich approximation factor: $\sigma(S_\nu)/\nu(S_\nu)$
Table 5.1 Datasets statistics

List of Figures

Figure 2.1 Node states in the LT-V model.
Figure 2.2 An example graph.
Figure 2.3 Distribution of influence weights in Flixster
Figure 2.4 A review for Canon EOS 300D camera on Epinions.com. At the end of the review, the user mentioned the price: $999.
Figure 2.5 Statistics of Valuations (Epinions.com)
Figure 2.6 Expected profit achieved (Y-axis) on Epinions graphs w.r.t. |S| (X-axis). (N)/(U) denotes normal/uniform distribution.
Figure 2.7 Expected profit achieved (Y-axis) on Flixster graphs w.r.t. |S| (X-axis). (N)/(U) denotes normal/uniform distribution.
Figure 2.8 Expected profit achieved (Y-axis) on NetHEPT graphs w.r.t. |S| (X-axis). (N)/(U) denotes normal/uniform distribution.
Figure 2.9 Price assigned to seeds (Y-axis) w.r.t. |S| (X-axis) on Epinions-TV with N(0.53, 0.142).
Figure 3.1 Sample graph accompanying Example 1.
Figure 3.2 An example graph for illustrating adjusted marginal gain
Figure 3.3 Adjusted marginal gains on the four datasets. On the X-axis, the seeds are ranked in the order in which they were chosen by the greedy algorithm.
Figure 3.4 Minimum amplification factors (higher is better)
Figure 3.5 Maximum amplification factors (lower bar is better)
Figure 3.6 Empirical variance of amplification factors for Needy-Greedy, Round-Robin, and Random Allocation.
Figure 3.7 Running time comparisons. Bars touching the top of the Y-axis mean that the algorithm did not finish within one week.
Figure 3.8 Scalability tests of Needy-Greedy
Figure 4.1 ComIC model: Node-level automaton for product A.
Figure 4.2 ComIC model: Diffusion dynamics
Figure 4.3 The graph for Example 3
Figure 4.4 The graph for Example 4
Figure 4.5 Complementary effects learned from data: The histogram of all $(q_{A|B} - q_{A|\emptyset})$ and $(q_{B|A} - q_{B|\emptyset})$ values on Flixster, Douban-Book, and Douban-Movie (10000 pairs of items each)
Figure 4.6 Effects of $\epsilon$ on the running time and influence spread of RR-ComIC and RR-ComIC++, on all four datasets. The influence spread achieved by RR-ComIC and RR-ComIC++ was almost identical in all cases, and thus we only drew one line using the spread of RR-ComIC++.
Figure 4.7 Influence spread vs. seed set size
Figure 4.8 Running time comparisons on real networks: GeneralTIM with RR-ComIC, GeneralTIM with RR-ComIC++, and Greedy with Monte Carlo simulations
Figure 4.9 Running time comparisons on synthetic power-law random graphs up to one million nodes: GeneralTIM with RR-ComIC versus GeneralTIM with RR-ComIC++
Figure 5.1 Illustration of aversion and attraction in MovieLens, and gains from accounting for them in optimization.
Figure 5.2 The decreasing trend of Test RMSE, RMSE$_\alpha$, RMSE , and RMSE
Figure 5.3 Interest evolution probabilities learned on synthetic data, compared against the generated ground-truth values for accuracy
Figure 5.4 Learned values of $\alpha_i$ on three real-world datasets
Figure 5.5 Values of i i on three real-world datasets
Figure 5.6 Test RMSE comparisons between our extended MF model and the standard MF model
Figure 5.7 Relative increase in social welfare by GRA over MF-Local on synthetic datasets: Varying network size (n)
Figure 5.8 Relative increase in social welfare by GRA over MF-Local on synthetic datasets: Varying
Figure 5.9 Relative increase in social welfare by GRA over MF-Local on synthetic datasets: Varying i i
Figure 5.10 Social welfare, where FX(0.1) and FX(0.5) denote Flixster with = 0.1 and 0.5 respectively; FT denotes FilmTipSet and ML denotes MovieLens.
Acknowledgments

I wish to express my deepest gratitude to my supervisor Prof. Laks V.S. Lakshmanan. This dissertation would not have been possible without his supervision and continuous guidance and encouragement. Throughout the years I have learned so much from him, and his unparalleled passion for research will always motivate me to pursue excellence.

I sincerely thank my supervisory committee members Kevin Leyton-Brown and Raymond T. Ng for their generous time and valuable feedback on the dissertation. It is certainly worth mentioning that the valuable suggestions from Kevin have led to significant improvements to the dissertation, in particular Chapter 3. I am also very thankful to Sathish Gopalakrishnan, Jiawen Han (University of Illinois at Urbana-Champaign), and Ruben H. Zamar for examining my dissertation and providing very helpful comments. Bill Aiello chaired my proposal defense and William Dunford chaired the final defense; I am very thankful for their help and time.

I am grateful to Amit Goyal, Wei Chen, Francesco Bonchi, Stratis Ioannidis, Smriti Bhagat, Martin Ester, and Mitul Tiwari for being great mentors and research collaborators in various stages of my Ph.D. studies. It has been my great pleasure to collaborate with them.

The acknowledgments would hardly be complete without thanking all of my friends for making this journey joyful. Special thanks go to DMM buddies Min Xie, Pei Li, Rui Chen, and Xin Huang; I should definitely mention Zhilong Fang and Chenyi Zhang, my high school classmates reunited in Vancouver. I will not attempt to list any more names to avoid inevitable omissions, but buddies Xin Wang, Kan Wu, Weilu Liu, and Ning Tang must be called out. Together we went from Zhejiang University to Simon Fraser University, we grew in Vancouver, and have had so much fun.

Most importantly, I am greatly indebted to my family and especially my parents, Hanquan Lu and Ping Huo. Their unconditional love and support mean everything to me.
This dissertation is devoted to them.

To my parents.

Chapter 1
Introduction

Over the past decade, fueled by the rapid growth of online social networking websites (e.g., Facebook and LinkedIn) and social media (e.g., Twitter and Instagram), Computational Social Influence, and in particular the dynamics of social influence propagation (or diffusion, or cascade)¹, have attracted substantial interest from computer scientists in the fields of data mining, machine learning, algorithmic game theory, theoretical computer science, etc. [35, 52, 67].

Social influence occurs when an individual’s beliefs, opinions, or behaviors change as a result of interactions with other individuals [126]. Social influence theory has been studied extensively by sociologists and psychologists [11, 15, 30, 47, 57, 95, 96], dating back several decades. As our modern world becomes increasingly connected, it is now well understood that the actions of individuals (e.g., adopting a new technology, sharing a video, or retweeting a tweet) may impact their friends, families, and colleagues, and as a result trigger a viral effect over the entire social network. For instance, a large-scale voter mobilization experiment carried out on Facebook during the 2010 United States congressional elections reported statistically significant results indicating that messages with social cues (that promote and encourage voting) did impact the behaviors of millions of people [21].

In computational social influence, a fundamental algorithmic problem is the Influence Maximization problem proposed by Kempe et al.
[86]. An instance of Influence Maximization is defined by (i) a directed graph $G = (V, E)$, where each node $v \in V$ represents an individual (or user) in the social network and each edge $(u, v)$ represents a relationship between user $u$ and user $v$; (ii) a positive integer $k < |V|$ as the budget²; and (iii) a stochastic diffusion model $M$, which describes the random process by which a certain behavior spreads from one node to its neighbors in the network. The task is to find a subset $S^* \subset V$ satisfying the cardinality constraint $|S^*| \le k$, so that by targeting the nodes in $S^*$ as early adopters, the expected number of users who end up adopting the (propagating) behavior is maximized by the end of a propagation process that proceeds according to the probabilistic rules specified by model $M$. Influence Maximization is NP-hard under many stochastic diffusion models [35, 86]. However, a constant-factor greedy approximation algorithm can be obtained as long as model $M$ satisfies certain desirable properties.

¹ Throughout the dissertation, we use the three terms “propagation”, “diffusion”, and “cascade” interchangeably to refer to the processes in which a behavior or an action virally spreads from one user to another following links in a social network.

Following this seminal work by Kempe et al., the research on computational social influence began to take off, benefiting numerous applications such as viral marketing (a.k.a. viral advertising or word-of-mouth advertising) [8, 50, 77, 86, 98, 106, 128], community detection [13, 135], recommender systems [19, 70, 108, 144], outbreak detection [100], social media and blogosphere analysis [10, 76, 81, 101, 117], etc. Here, at a high level, we classify existing research into three major categories, inspired by [33]:

Influence Modeling. Oftentimes, the foremost challenge is how to model influence cascades over social networks. Only after a satisfactory model is given can one study worthy optimization problems and design optimization algorithms.
There are two common desiderata: first, a good model should reflect human interactions in real-world scenarios as much as possible; second, a good model should be mathematically sound and, ideally, allow tractable solutions for natural optimization problems. Notable work in this category has successfully modeled various aspects of real-life social influence, e.g., the spread of both positive and negative opinions [16, 29, 34, 79], competition among multiple diffusions [18, 23, 24, 31, 106, 123], cooperation or complementarity between different diffusions [107, 117, 118, 149], time delay in propagation [36, 102], non-progressive dynamics, where a user may switch between “on” (influenced) and “off” (not influenced) states [105], as well as continuous-time propagation [51, 65] as opposed to discrete-time.

² Following the convention [23, 69, 71, 86], throughout the thesis we use the term “budget” to refer to the maximum number of influential users (seeds) to be selected in the influence maximization problem. Note that this is not a monetary value and should not be confused with such. If a monetary budget is meant, it will be explicitly mentioned.

Influence Optimization. With a well-defined diffusion model, we can then optimize a certain desirable objective under this model. Among such, Influence Maximization is the most fundamental problem. Typically, optimization objectives are tailored to models.
For instance, in models with negative opinions, a common formulation is: given the set of “negative” seeds, find an optimal set of “positive” seeds to block the spread of negative opinions to the maximum extent possible [29, 79]. As another example, in models that focus on product adoptions, the typical objective is to maximize adoption, profit, or some other related business objective instead of pure influence spread (as those models explicitly distinguish between the state of merely being influenced and the state of actual adoption) [16, 107, 109].

Many of the influence optimization problems, including the seminal Influence Maximization, are NP-hard [86], and moreover, for a large family of stochastic diffusion models, it is #P-hard to compute the exact influence spread (in expectation) of any given seed set [37, 39]. Therefore, much work focuses on devising efficient and effective approximation algorithms or heuristics that scale to massive social networks [22, 37–40, 71, 72, 84, 87, 100, 136, 137].

Influence Learning. Almost all stochastic diffusion models assume complete knowledge of pairwise influence strength (in the form of probabilities or weights) between friends in social networks. This assumption can be unrealistic, as such data are almost never explicitly available. Work has been done to develop principled methodologies for learning influence strength from user interaction data. Representative work can be further divided into two kinds: some assume that network structures are known and thus the main task is to infer influence strength [68, 129, 141], while others learn network structures and influence strength at the same time [42, 64, 122].

This dissertation concentrates more on the first two aspects, namely modeling and optimization.
In the experiments where influence probabilities or influence weights are required, we draw on existing work on influence learning to compute such quantities.

1.1 Influence Maximization in Social Networks

Domingos and Richardson [50, 128] first posed influence maximization as an algorithmic problem to the data mining community. They used Markov Random Field (MRF) techniques to model and study the problem of finding an optimal set of individuals (seeds) on which a company should perform marketing actions, so that the expected increase in profit is maximized. They proposed a greedy-style hill-climbing heuristic to solve the problem.

However, the MRF-based formulation in [50, 128] did not gain nearly as much popularity as the discrete optimization formulation of Kempe et al. [86], which we restate here for convenience: given a directed graph $G = (V, E)$, in which nodes represent individuals in a social network and edges represent the links or relationships between individuals, as well as a positive integer $k$, the task of Influence Maximization is to find a node set $S$ of size no more than $k$, such that by targeting its nodes initially for early activation, the expected number of activated nodes over the entire social network is maximized under a certain stochastic diffusion model. Note that it is the diffusion model that specifies the probabilistic rules under which the dynamics of a propagation/diffusion/cascade process unfold.

The targeted set $S$ of nodes is often referred to as the seed set. Let $\sigma : 2^V \to \mathbb{R}_{\ge 0}$ denote the influence spread function, such that $\sigma(S)$ is the expected number of active nodes by the end of the diffusion, given that $S$ is activated at the beginning.
As a convention, $\sigma(S)$ is commonly referred to as the influence spread of $S$.

Two classic and fundamental stochastic diffusion models are the Independent Cascade (IC) model [63] and the Linear Thresholds (LT) model [73]. In both models, there are two possible states for nodes: {inactive, active}. Both models are progressive, meaning that once a node becomes active, it will never revert to inactive. Intuitively, the IC model describes individual interactions between friends in social networks, while the LT model characterizes thresholding behaviours often found in life experiences: e.g., a person may finally decide to purchase an iPhone if an overwhelming majority of her friends have already done so.

Independent Cascade (IC). An instance of the IC model has a directed graph $G = (V, E)$ and an influence probability function $p : E \to [0, 1]$. For each edge $(u, v) \in E$, let $p_{u,v} \stackrel{\text{def}}{=} p(u, v)$ denote the probability that $u$ activates $v$. Initially, all nodes are inactive. At time step 0, all seeds become active and the propagation starts to proceed in discrete time steps. At time $t$, every node $u$ that became active at $t - 1$ makes one attempt to activate each of its inactive out-neighbors $v \in N^{\text{out}}(u)$, succeeding with probability $p_{u,v}$, which is independent of the diffusion history thus far. The diffusion process terminates when no more nodes can be activated.

Algorithm 1: Greedy (The Greedy Hill-Climbing Algorithm)
Data: graph $G = (V, E)$, cardinality constraint $k$
Result: seed set $S$ such that $|S| \le k$
begin
    $S \leftarrow \emptyset$
    for $i \leftarrow 1$ to $k$ do
        $w \leftarrow \arg\max_{u \in V \setminus S} [f(S \cup \{u\}) - f(S)]$
        $S \leftarrow S \cup \{w\}$

Linear Thresholds (LT). In this model, each node $v$ has an activation threshold $\theta_v$ uniformly distributed in the interval $[0, 1]$, which represents the minimum weighted fraction of active in-neighbors that are needed to activate $v$.
Each edge $(u, v) \in E$ is associated with an influence weight $p_{u,v}$, and for all $v \in V$, the sum of incoming weights does not exceed 1:

$$\sum_{u \in N^{\text{in}}(v)} p_{u,v} \le 1,$$

where $N^{\text{in}}(v)$ denotes the set of in-neighbors of $v$.

Influence propagation also proceeds in discrete time steps. At time step 0, a seed set $S$ is activated. At any time step $t \ge 1$, any inactive $v$ becomes active if the total influence weight from its active in-neighbors reaches or exceeds $\theta_v$:

$$\sum_{\text{active } u \in N^{\text{in}}(v)} p_{u,v} \ge \theta_v.$$

The diffusion process terminates when no more nodes can be activated.

Hardness and Approximation. Influence Maximization is NP-hard under both models [86], and furthermore, it is #P-hard to compute the exact value of $\sigma(S)$ for a given seed set $S$ [37, 39]. To obtain approximation algorithms, Kempe et al. [86] show that the influence spread function $\sigma(S)$ is monotone and submodular w.r.t. the seed set $S$.

Intuitively, submodularity captures the law of diminishing marginal returns. Let $X$ be a finite ground set, and let $f : 2^X \to \mathbb{R}_{\ge 0}$ denote a set function that maps subsets of $X$ to nonnegative real values. The function $f$ is submodular iff for all $S \subseteq T \subseteq X$ and all $x \in X \setminus T$, we have $f(S \cup \{x\}) - f(S) \ge f(T \cup \{x\}) - f(T)$. Monotonicity simply means that the function value is non-decreasing as the set grows: $f(S) \le f(T)$ whenever $S \subseteq T \subseteq X$.

Thanks to these two properties, an approximation algorithm for Influence Maximization can be obtained by applying a seminal result of Nemhauser et al. [119]: the problem of maximizing a monotone submodular function subject to a cardinality constraint can be approximated within a factor of $1 - 1/e$ by the following greedy algorithm (pseudo-code in Algorithm 1): it starts with $S = \emptyset$ and adds one element at a time to $S$, namely the element that has the largest incremental function value w.r.t. $S$.
The algorithm terminates after the cardinality of $S$ reaches $k$.

To circumvent the #P-hardness of computing $\sigma(S)$, a common practice is to estimate the spread using Monte Carlo (MC) simulations, in which case the approximation ratio of Greedy drops from $1 - 1/e$ to $1 - 1/e - \epsilon$, for any $\epsilon > 0$. The value of $\epsilon$ depends on the number of MC iterations (which is typically 10000). Unless otherwise noted, hereafter whenever we mention Greedy, MC simulations are used jointly.

The aforementioned approximation result is not just applicable to IC and LT. In fact, this elegant framework works for a large class of discrete diffusion models, all of which can be seen as special cases of the General Thresholds (GT) model. This model specifies that each node $v \in V$ is associated with a threshold function $f_v : 2^V \to [0, 1]$ that is monotone w.r.t. the set of active in-neighbors of $v$. Mossel and Roch [116] show that whenever the threshold function at every node is monotone and submodular, the influence spread function $\sigma(\cdot)$ is also monotone and submodular (a conjecture posed in [86]). This enables the approximation results of Greedy to apply.

Efficient Influence Maximization Algorithms. Greedy suffers from serious efficiency issues because of the MC simulations. It may take days or weeks to finish mining 50 or 100 seeds on graphs with merely thousands of nodes [35]. By exploiting submodularity, Leskovec et al. [100] devised the cost-effective lazy forward (CELF) technique, which improves the running time of Greedy by up to 700 times. Goyal et al. [71] proposed CELF++, which further improves the efficiency of CELF by intelligent look-ahead computations of marginal gains. CELF++ was able to gain up to 61% in running time compared to CELF.

More recently, Tang et al. [137] proposed a randomized algorithm called Two-phase Influence Maximization (TIM), which produces a $(1 - 1/e - \epsilon)$-approximation with probability at least $1 - |V|^{-\ell}$ in $O((k + \ell)(|E| + |V|) \log |V| / \epsilon^2)$ expected running time.
TIM is based on the concept of Reverse-Reachable sets (RR-sets) [22], and is applicable to the Triggering Model [86] that generalizes both IC and LT. This algorithm is orders of magnitude faster than Greedy with CELF++, while still yielding approximate solutions with high probability. Very recently, Tang et al. also proposed a new improvement [136], which significantly reduces the number of RR-sets generated by using martingale analysis. The new IMM algorithm (for Influence Maximization with Martingale) allows dependent RR-sets, in contrast with TIM, which requires all generated RR-sets to be independent. IMM has the same approximation guarantee and expected running time complexity, but is empirically shown to be orders of magnitude faster than the fastest version of TIM [136]. In Section 4.6.1, we present a generalized solution framework that extends TIM to work for an arbitrary stochastic diffusion model that satisfies submodularity and monotonicity.

Efficient and Effective Heuristics. In addition to searching for scalable approximation algorithms, a number of heuristics have been proposed in the literature (especially before the invention of TIM and IMM) [37, 39, 72, 84]: they are not only orders of magnitude faster than Greedy but are also comparable in seed set quality, as measured by the influence spread achieved.

The common intuition is to leverage mathematical properties of the diffusion model and then devise efficient schemes to accurately estimate influence spread, replacing the prohibitive MC simulations. For example, the Maximum Influence Arborescence (MIA) algorithm [37] (for IC) and the Local Directed Acyclic Graph (LDAG) algorithm [39] (for LT) restrict the computation of influence spread to simpler structures, such as trees and DAGs, that are local to seeds. The SimPath algorithm (for LT) [72] intelligently enumerates simple paths originating from seeds and leverages the fact that the probability of a path quickly diminishes as the length increases.
Thus, inexpensive enumeration can be done to estimate influence spread accurately. For more details, we refer the reader to the original papers or the monograph by Chen et al. [35].

1.2 Challenges and Key Contributions

Despite extensive prior work, there are still plenty of unknowns to explore in this field. The goal of this dissertation is to advance the research in this field by studying some of the important new problems (described in more detail below) and tackling the challenges that arise from those problems using novel modeling ideas and algorithm design techniques.

Our overall approaches and objectives are (i) to draw on social sciences to design more expressive, more realistic, yet still tractable influence diffusion models; (ii) to pose well-motivated and challenging optimization problems and design approximation algorithms and/or effective heuristics to solve them; (iii) to recognize the important role of social influence in recommender systems and devise cascade-aware models and algorithms to improve the quality of recommendations. In what follows, we motivate the problems studied in this dissertation and summarize key contributions.

1.2.1 Viral Marketing

Viral marketing [26] is a cost-effective advertising strategy that leverages word-of-mouth effects to promote ideas or products through a social network. It is often seen as the primary application of computational social influence [35, 37, 50, 86], due to natural connections: in principle, techniques developed for influence maximization are well-suited for running viral marketing campaigns, as their objectives align. However, applying influence maximization theory as is to viral marketing may be problematic, as we elaborate below.

Influence-Driven Profit Maximization.
The ultimate goal of viral marketing is to convince consumers to adopt (purchase) a product, and such decisions undoubtedly involve monetary considerations, which are important missing links between social influence theory and viral marketing in practice. Various challenges arise in this context. For instance, how can we incorporate the monetary nature of the product adoption dynamics into the modeling and optimization of social influence propagation? And what should be the best price of the propagating product to extract as much profit as possible from the viral marketing campaign?

We address these challenges in Chapter 2. There are two major factors in play: the price of the product being advertised and the valuation that users have toward the product, which is defined as the maximum amount of money that a user is willing to pay. Economic theory suggests that an adoption will be made only when the valuation is at least as high as the price [85]. We extend the LT model by incorporating both price and buyer valuation into the adoption diffusion process. We study the problem of finding an optimal marketing strategy consisting of a seed set and a price vector, such that the expected total profit at the end of propagation is maximized.

One of the biggest challenges in the above problem is pricing, and more specifically, how to strike the balance between the profit earned on the seeds and the profit earned on follow-up adoptions: if a large discount is offered, then the seller has a higher chance to convince the seed to adopt, so that her influence can be leveraged; on the other hand, the loss of profit from this particular seed is larger due to the deeper discount. We come up with a price-aware greedy algorithm that not only identifies seeds from a social network, but also dynamically adjusts the optimal personalized discount for each seed chosen.
Our experiments on three real-world datasets demonstrate that the price-aware greedy algorithm achieves the best performance both in the expected profit achieved and in running time, compared to several intuitive baselines.

Viral Marketing from the Host Perspective. Another important yet often overlooked question regarding viral marketing is: can the advertisers simply take the liberty of approaching social network users and promoting their products? The answer is likely no, as such unsolicited actions would be viewed as spamming and may not gain much trust from users [104]. On the other hand, many service providers of social networks (a.k.a. hosts) have established their own platforms and protocols for advertising and monetization³. Thus, a viable way to conduct viral advertising is to do it through the host. However, to the best of our knowledge, prior work cannot offer suitable solutions for this task.

To bridge this gap, we propose a novel business model for the hosts to sell viral marketing as a service (Chapter 3). When competing advertisers request to run viral marketing campaigns, the host is responsible for selecting and allocating seeds, for which it charges advertisers. There are two main desiderata: first, the collective influence spread over all advertisers should be maximized, which in turn maximizes the host’s revenue via commission; second, the allocation of seeds must be fair, in such a way that the expected gain (influence spread achieved) for each advertiser is in proportion to its investment in the campaign. We extend the LT model to characterize competitive influence diffusion.
Then, by exploiting the mathematical properties derived from the diffusion model, we tackle both the influence maximization and fair allocation problems using greedy and dynamic programming algorithms. We run experiments on three real-world social networks, showing that our proposed allocation algorithms are both efficient and effective.

³ Advertising on Facebook: https://www.facebook.com/business/products/ads; Advertising on Twitter: https://ads.twitter.com/. Both last accessed on October 3, 2015.

1.2.2 Competition and Complementarity: A Unified Influence Propagation Model

Products or technologies often face fierce competition (e.g., iPhone vs. Android phones), and such scenarios can be modeled by competitive influence propagation models [18, 23, 24, 31, 106, 123]. However, consider the following hypothetical example: suppose Apple wants to launch a joint advertising campaign on the iPhone 6S and the Apple Watch. Unfortunately, none of the competitive diffusion models is suitable for this job. This is because the iPhone 6S and the Apple Watch are simply not competitors, but rather complementary to each other, yet all competitive models assume that users adopt at most one product. Therefore, the main challenge is how to carefully model influence cascades of complementary goods, especially when the degree of complementarity is asymmetric: adopting an iPhone without an Apple Watch is a rational decision, while the opposite action is not, because almost all functionalities of the Apple Watch rely on a paired iPhone.

In Chapter 4, we propose a new diffusion model called Comparative Independent Cascade (ComIC), which is capable of covering the full spectrum of entity interactions from competition to complementarity. Users’ adoption decisions depend not only on edge-level information propagation, but also on a node-level automaton whose behavior is governed by a set of model parameters, enabling our model to capture not only competition, but also complementarity, to any possible degree.
We study Influence Maximization in a novel setting with complementary entities, where the objective is to find seeds of product A (e.g., Apple Watch), given the seeds of its complementary product B (e.g., iPhone), such that the influence spread for A is maximized. We devise effective approximation algorithms via non-trivial techniques based on reverse-reachable sets and a novel “sandwich approximation” strategy. The applicability of both techniques extends beyond our model and problems. Our experiments show that the proposed algorithms consistently outperform intuitive baselines on four real-world social networks, often by a significant margin. In addition, we learn model parameters from real user action logs.

1.2.3 Applying Social Influence in Recommender Systems

Computational social influence has application value in a variety of data mining tasks. One primary example is the task of personalized recommendation. Typically, a recommender system extracts user feedback from historical transactions, ratings, and reviews to build a model that profiles users and items, and computes a ranked list of items for each user as recommendations [3, 127]. One of the key challenges is data sparsity: given an arbitrary user, we may not have enough past ratings to accurately profile her and predict her preferences. Previous work [82, 112, 152] leverages social influence to address the sparsity issue, the intuition being that a user tends to adopt an item rated positively by trustworthy or influential friends.

However, to the best of our knowledge, no prior work explicitly models interest cascades that may be triggered jointly by social influence and recommended items. To elaborate, suppose that Alice was recommended the book The Unbearable Lightness of Being by Milan Kundera. She really liked it and gave a five-star rating, which may trigger the RecSys to further suggest this book to users on whom Alice has strong influence.
Subsequently, a cascade of recommendations and high ratings may start propagating over the entire social network. Thus, a subtle and challenging question is how recommendations can be designed to ensure that (i) the suggested items remain highly relevant to users and (ii) the interest cascades over time can be incorporated to further improve the quality of recommendation.

To this end, in Chapter 5 we model interest evolution through dynamic interest cascades: we consider a scenario where a user’s interests may be affected by (i) the interests of other users in her social circle and (ii) suggestions she receives from a recommender system. In the latter case, we model user reactions through either attraction or aversion towards past suggestions. We study this interest evolution process, and the utility accrued by recommendations, as a function of the system’s recommendation strategy. We show that, in steady state, the optimal strategy can be computed as the solution of a semi-definite program (SDP). Using datasets of user ratings, we provide evidence for the existence of aversion and attraction in real-life data, and show that our optimal strategy can lead to significantly improved recommendations over systems that ignore aversion and attraction.

1.3 Outline

In Chapter 2, we incorporate monetary aspects into social influence propagation and study the problem of influence-driven profit maximization for viral marketing on social networks. In Chapter 3, we study viral marketing from the perspective of social networking service providers, and design a viral-marketing-as-a-service business model and corresponding algorithmic frameworks to solve the problem of fair seed allocation. In Chapter 4, we propose a novel influence diffusion model that characterizes both competitive and complementary relationships between two propagating entities, to any degree possible.
In Chapter 5, we use social influence propagation as a key building block to model dynamic user interests in social networks, and devise optimal recommendation algorithms. Finally, we summarize this dissertation and discuss future research directions in Chapter 6.

Chapter 2
Influence-Driven Profit Maximization in Social Networks

2.1 Introduction

Although Influence Maximization has been studied extensively, a majority of the previous work has focused on the classical propagation models, namely IC and LT, which do not fully incorporate important monetary aspects in people’s decision-making process of adopting new products. The importance of such aspects is seen in actual scenarios and recognized in the management science literature.

As a real-world example, until recently, Apple’s iPhone has seemingly created a bigger buzz in social media than any other smartphone. However, its worldwide market share in 2011 fell behind Nokia, Samsung, and LG¹. This is partly due to the fact that the iPhone is pricier in terms of both hardware (if one buys it contract-free and factory-unlocked) and monthly rate plans. In contrast, the HP TouchPad attracted little interest from the tablet market when it was initially priced at $499 (16GB). However, it sold out within a few days after HP dropped the price substantially to $99 (16GB)².

¹ IDC Worldwide Mobile Phone Tracker, July 28, 2011.
² http://www.pcworld.com/article/237088/hp_drops_touchpad_price_to_spur_sales.html, last accessed on September 7th, 2015.

In management science, the adoption of a new product is characterized as a two-step process [85]. In the first step, “awareness”, an individual gets exposed to the product and becomes familiar with its features. In the second step, “actual adoption”, a person who is aware of the product will purchase it if her valuation outweighs the price. Product awareness is modeled as being propagated through the word-of-mouth of existing adopters, which is indeed articulated by classical propagation models.
However, the actual adoption step is not captured in these classical models, and this is precisely the gap between these models and the one in [85].

In a real marketing scenario, viral or otherwise, products are priced and people have their own valuations for owning them, both of which are critical in making adoption decisions. Precisely, the valuation of a person for a certain product is the maximum amount of money she is willing to pay for it; the valuation for not adopting is defined to be zero [132]. Thus, when a company attempts to maximize its expected profit in a viral marketing campaign, such monetary factors need to be taken into account. However, in Influence Maximization, only influence weights and network structures are considered, and the marketing strategies are restricted to binary decisions: for any node in the network, an Influence Maximization algorithm just decides whether or not it should be seeded.

To address the aforementioned limitations, we propose the problem of profit maximization (ProMax) over social networks, by incorporating both prices and valuations. ProMax is the problem of finding an optimal strategy to maximize the expected total profit earned by the end of an influence diffusion process under a given propagation model. We extend the LT model to propose a new propagation model named the Linear Threshold model with user Valuations (LT-V), which explicitly introduces the states influenced and adopting. Every user will be quoted a price by the company, and an influenced user adopts, i.e., transitions to adopting, only if the price does not exceed her valuation.

As pointed out in [90], consumers typically do not want to reveal their valuations before the price is quoted, for reasons of trust. Moreover, due to privacy concerns, after a price is quoted, they usually only reveal their adoption decision (i.e., “yes” or “no”), but do not wish to share information about their true valuations.
Thus, following the literature [90, 132], we make the Independent Private Value (IPV) assumption, under which the valuation of each user is drawn independently at random from a certain distribution. Such distributions can be learned by a marketing company from historical sales data. Furthermore, our model assumes users to be price-takers who respond myopically to the prices offered to them, solely based on their privately-held valuations and the price offered.

Since prices and valuations should be considered in the optimization, marketing strategies for ProMax require non-binary decisions: for any node in the network, we (i.e., the system) need to decide whether or not to seed it, and what price should be quoted. Given this factor, the objective function to optimize in ProMax, i.e., the expected total profit, is a function of both the seed set and the vector of prices. As we shall show in Sections 2.3 and 2.6, since discounting may be necessary for seeds, the profit function is in general non-monotone. We shall also show that the profit function maintains submodularity for any fixed vector of prices, regardless of the specific forms of the valuation distributions.

In light of the above, ProMax brings about more challenges compared to Influence Maximization, and calls for more sophisticated algorithms for its solution. As the profit function is in the form of the difference between a monotone submodular set function and a linear function, we first design an “unbudgeted” greedy (U-Greedy) framework for seed set selection. In each iteration, it picks the node with the largest expected marginal profit until the total profit starts to decline. We show that for any fixed price vector, U-Greedy provides quality guarantees slightly lower than a $(1 - 1/e - \epsilon)$-approximation.

To obtain complete profit maximization algorithms, we propose to integrate U-Greedy with three pricing strategies, which leads to three algorithms: All-OMP (Optimal Myopic Prices), FFS (Free-For-Seeds), and PAGE (Price-Aware GrEedy).
The first two are baselines and choose prices in ad hoc ways, while PAGE dynamically determines the optimal price to be offered to each candidate seed in each round of U-Greedy. Our experimental results on three real-world network datasets illustrate that PAGE outperforms All-OMP and FFS in terms of expected profit achieved and running time, and is more robust against various network structures and valuation distributions.

2.2 Related Work

Bhagat et al. [17] addressed the difference between product adoption and influence in their Linear Thresholds with Colors (LT-C) model. They focus on the possible emergence of negative opinions and the fact that even non-adopting users may spread opinions or information to friends. In the LT-C model, the extent to which a node is influenced by its neighbors depends on two factors: influence weights and the opinions of the neighbors. LT-C also features a “tattle” state for nodes: if an influenced node does not adopt, it may still propagate positive influence (promote state) or negative influence (inhibit state) to neighbors. The optimization objective is to maximize the spread of positive influence, and the algorithmic framework in [86] is applicable. Our work departs from [17] by modeling the monetary aspects (price and valuation) in product adoption and posing a natural, but different, problem whose objective is to maximize the expected profit in a viral marketing campaign. Due to the specific properties of both the LT-V model and the formulation of profit maximization, the greedy approximation algorithm [86] no longer applies. In principle, the LT-V and LT-C models capture different aspects of influence diffusion and can be merged into a combined model; we defer detailed discussions to Section 2.8.
In this chapter, our focus is to understand the direct impact of price and valuation in solving influence-driven profit maximization, a problem that, as we shall see, is already challenging when just extending the canonical LT model.

Considerable work has been done on pricing problems in social networks. Hartline et al. [77] studied optimal marketing for digital goods (zero cost) in social networks and proposed the Influence-and-Exploit (IE) framework. In the Influence step, the seller offers free samples to a set of seeds; in the Exploit step, the seller determines a random sequence in which to visit each of the remaining nodes and offers them a price that would maximize the seller’s expected revenue based on the probability distribution of buyer valuation. The valuation of a user depends on the set of her active neighbors. Notice that this randomized approach effectively bypasses the network structure and, furthermore, does not consider the viral diffusion of adoption behaviors, which is a clear distinction from our work.

Arthur et al. [6] adopted the IE framework to study revenue maximization for viral marketing. Given a seed set $S$, their algorithm first computes an approximate max-leaf spanning tree $T$ of the input social network graph $G$ rooted at $S$. All seeds and all internal nodes of $T$ will be offered free samples. Each leaf node will be charged a constant price with a certain probability, or it gets a copy for free as well.
There are several key differences between this work and ours: (i) seeds are given as input in [6], whereas in our case, the choice of the seed set is dictated by profit maximization and hence made by algorithms; (ii) our profit maximization algorithm is capable of dynamically finding the best personalized discounts for seeds, whereas in [6] all seeds get free samples; (iii) unlike ours, their work lacks influence modeling.

[Figure 2.1: Node states in the LT-V model. States: Inactive, Influenced, Adopting. A node moves from Inactive to Influenced when influence ≥ threshold (and stays Inactive when influence < threshold); it moves from Influenced to Adopting when price ≤ valuation (and stays Influenced when price > valuation).]

2.3 Linear Threshold Model with User Valuations

2.3.1 Model and Problem Definition

In the LT-V model, the social network is modeled as a directed graph $G = (V, E)$, in which each node $u_i \in V$ is associated with a valuation $v_i \in [0, 1]$. Recall that in §2.1, we made the IPV assumption, under which valuations are drawn independently at random from some continuous probability distribution assumed known to the marketing company. Let $F_i(x) = \Pr[v_i \leq x]$ be the distribution function of $v_i$, and $f_i(x) = \frac{d}{dx} F_i(x)$ be the corresponding density function. The domain of both functions is $[0, 1]$, as we assume both prices and valuations are in $[0, 1]$. As in the classical LT model, each node $u_i$ has an influence threshold $\theta_i$ chosen uniformly at random from $[0, 1]$. Each edge $(u_i, u_j) \in E$ has an influence weight $w_{i,j} \in [0, 1]$, such that for each node $u_j$, $\sum_{u_i \in N^{in}(u_j)} w_{i,j} \leq 1$. If $(u_i, u_j) \notin E$, define $w_{i,j} = 0$. Following [50, 128], we assume that there is a constant acquisition cost $c_a \in [0, 1)$ for marketing to each seed (e.g., rebates, or costs of mailing ads and coupons).

Diffusion Dynamics. Figure 2.1 presents a state diagram for the LT-V model. At any time step, nodes are in one of three states: inactive, influenced, and adopting. A diffusion under the LT-V model proceeds in discrete time steps. Initially, all nodes are inactive. At time 0, a seed set $S$ is targeted and becomes influenced.
Next, every user $u_i$ in the network is offered a price $p_i$ by the system. Let $\mathbf{p} = (p_1, \ldots, p_{|V|}) \in [0, 1]^{|V|}$ denote a vector of quoted prices, which remains constant throughout the diffusion. Any $u_i \in S$ gets one chance to adopt (i.e., to enter the adopting state) at step 0 if $p_i \leq v_i$; otherwise it stays influenced.

At any time step $t \geq 1$, an inactive node $u_j$ becomes influenced if the total influence from its adopting in-neighbors reaches its threshold, i.e.,

$$\sum_{u_i \in N^{in}(u_j) \,\wedge\, u_i \text{ adopting}} w_{i,j} \geq \theta_j.$$

Then, $u_j$ will transition to adopting at time step $t$ if $p_j \leq v_j$, and will stay influenced otherwise. The model is progressive, meaning that once a node “advances” from one state to the next, e.g., from inactive to influenced, or from influenced to adopting, it will never revert to a previous state. The diffusion ends when no more nodes can change states. Following [85], we assume that only adopting nodes propagate influence, as adopters can share experience-related product features (e.g., durability, usability), making their recommendations more effective in removing the doubts of inactive users.

Formally, we define $\pi : 2^V \times [0, 1]^{|V|} \to \mathbb{R}$ to be the profit function such that $\pi(S, \mathbf{p})$ is the expected (total) profit earned by the end of a diffusion process under the LT-V model, with $S$ as the seed set and $\mathbf{p}$ as the vector of prices. The problem studied in this chapter is as follows.

Problem 1 (Profit Maximization (ProMax)). Given an instance of the LT-V model consisting of a graph $G = (V, E)$ with edge weights, find the optimal pair of a seed set $S$ and a price vector $\mathbf{p}$ that maximizes the expected profit $\pi(S, \mathbf{p})$.

It is worth emphasizing that users in this problem can be naturally seen as utility-maximizing agents, where the utility of each adopting user $i$ is defined to be the difference between $i$’s valuation toward the product and the price she is going to pay: $v_i - p_i$. The utility of not adopting is simply 0.
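The diffusion dynamics above can be sketched for a single realization of thresholds and valuations. This is an illustrative sketch rather than the thesis implementation: the function name, the dictionary-based graph layout, and the realized-profit accounting (prices paid by adopters minus $c_a$ per seed) are assumptions chosen to mirror the model description.

```python
def simulate_ltv(in_nbrs, seeds, prices, valuations, thresholds, c_a):
    """One LT-V diffusion with fixed thresholds and valuations.
    in_nbrs[v] maps each in-neighbor u of v to the weight w_{u,v}.
    Returns (adopting nodes, influenced-but-not-adopting nodes, realized profit)."""
    seeds = set(seeds)
    influenced = set(seeds)  # seeds are influenced at time 0
    adopting = {u for u in seeds if prices[u] <= valuations[u]}
    changed = True
    # Repeated sweeps reach the same final state as synchronous steps,
    # since the model is progressive.
    while changed:
        changed = False
        for v in in_nbrs:
            if v in influenced:
                continue
            # Only adopting in-neighbors propagate influence.
            total = sum(w for u, w in in_nbrs[v].items() if u in adopting)
            if total >= thresholds[v]:
                influenced.add(v)
                if prices[v] <= valuations[v]:
                    adopting.add(v)
                changed = True
    profit = sum(prices[u] for u in adopting) - c_a * len(seeds)
    return adopting, influenced - adopting, profit

# Hypothetical toy instance: a -> b -> c, seed a gets a free sample.
graph = {"a": {}, "b": {"a": 1.0}, "c": {"b": 1.0}}
theta = {"a": 0.5, "b": 0.5, "c": 0.5}
price = {"a": 0.0, "b": 0.6, "c": 0.6}
low = {"a": 0.3, "b": 0.4, "c": 0.9}   # b's valuation is below its price
high = {"a": 0.3, "b": 0.7, "c": 0.9}  # b's valuation exceeds its price
print(simulate_ltv(graph, ["a"], price, low, theta, 0.1))
print(simulate_ltv(graph, ["a"], price, high, theta, 0.1))
```

In the first run `b` is influenced but declines, so `c` stays inactive even though it would have been willing to pay; this is the blocking effect that distinguishes LT-V from the classical LT model.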
In our setup, every node is offered a price once and only once, and we assume that there is no reconsideration after a node declines to adopt the product. Hence, it is clear that no user has any incentive to deviate from her truthful valuation. In other words, for all users, the dominant strategy is to compare their truthful valuations to the price.

2.4 Special Case with Fixed Valuations

To better understand the properties of the LT-V model and the hardness of ProMax, we first study a special case of the problem. We assume that the valuation distributions degenerate to an identical single point, i.e., for all $u_i \in V$, $v_i = p$ with probability 1, where $p \in (0, 1]$ is a constant. As mentioned in §2.1, this is usually not the case, and the degeneration assumption here is of theoretical interest only.

For simplicity, we also assume that for every $u_i \in S$, the quoted price is $p_i = 0$. Strictly speaking, for the sake of maximizing expected total profit, the seller should also charge price $p$ to all seeds, since it is assumed that all users have valuation $p$. In this case, the expected profit function becomes $\hat{\pi}(S) = p \cdot L(S) - c_a |S|$, and the non-monotonicity still holds in general. To see this, consider a social network graph $G = (V, E)$ where $|V| = 100$, and let $p = 0.5$ and $c_a = 0.1$. Suppose that there is a single node $v$ that has an expected influence spread of 90. Seeding $v$ alone would yield a profit of $0.5 \cdot 90 - 0.1 \cdot 1 = 44.9$, while if $S = V$, the expected profit would be $0.5 \cdot 100 - 0.1 \cdot 100 = 40$, which is less than in the case $S = \{v\}$. This example illustrates that the profit function is still non-monotone when all users have the same valuation and are offered the same price.

Since valuation is the maximum amount of money one is willing to pay for the product, in this case, the optimal pricing strategy is to set $p_j = p$ for all $u_j \in V \setminus S$. The situation amounts to restricting the marketing strategy to a binary one: free sample ($p_i = 0$) for seeds and full price for non-seeds ($p_j = p$).
Given this pricing strategy, once a node is influenced, it transitions to adopting with probability 1. Thus, ProMax boils down to the problem of determining a seed set $S$, and the profit function $\pi(S, \mathbf{p})$ reduces to a set function $\hat{\pi}(S)$, since $\mathbf{p}$ is uniquely determined given $S$:

$$\hat{\pi}(S) = p \cdot (L(S) - |S|) - c_a |S| = p \cdot L(S) - (p + c_a) |S|, \tag{2.1}$$

where $L(S)$ is the expected number of adopting nodes under the LT-V model by seeding $S$.

In general, the degenerated profit function $\hat{\pi}$ is non-monotone. To see this, let $u$ be any seed that provides a positive profit. Now, clearly $\hat{\pi}(\emptyset) = 0 < \hat{\pi}(\{u\})$ but $\hat{\pi}(V) \leq 0 < \hat{\pi}(\{u\})$, as giving free samples to the whole network will result in a loss of $c_a |V|$ on account of seeding expenses. Since $\hat{\pi}$ is non-monotone, unlike in Influence Maximization, it is natural not to use a budget $k$ for the number of seeds, but instead to ask for a seed set of any size that results in the maximum expected profit. In other words, the number of seeds to be chosen, $k$, is not preset, but is rather determined by a solution. This restricted case of ProMax is to find $S = \arg\max_{T \subseteq V} \hat{\pi}(T)$, which we show is NP-hard.

Theorem 1. The Restricted ProMax problem (RPM) is NP-hard for the LT-V model.

Proof. Given an instance of the NP-hard Minimum Vertex Cover (MVC) problem, we can construct an instance of the RPM problem such that an optimal solution to the RPM problem gives an optimal solution to the MVC problem. Consider an instance of MVC defined by an undirected $n$-node graph $G = (V, E)$; we want to find a set $S$ such that $|S| = k$ and $k$ is the smallest number such that $G$ has a vertex cover (VC) of size $k$.

The corresponding instance of RPM is as follows: first, we direct all edges in $G$ in both directions to obtain a directed graph $G' = (V, E')$, where $E'$ is the set of all directed edges. Then, for each $u_i \in V$, set $\theta_i = 1$; for each $(u_i, u_j) \in E'$, define $w_{i,j} = 1/d^{in}(u_j)$, where $d^{in}(u_j)$ is the in-degree of $u_j$ in $G'$. Lastly, set $p = 1$ and $c_a = 0$, in which case $\hat{\pi}(S) = L(S) - |S|$.
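The construction can be checked by brute force on a small graph. The sketch below is only an illustration under stated assumptions: the helper names are hypothetical, the example graph is a 4-node path with no isolated vertices, and the spread is computed deterministically because $\theta_i = 1$ and $w_{i,j} = 1/d^{in}(u_j)$ mean a non-seed becomes active only when all of its neighbors are active.

```python
from itertools import combinations

def spread(adj, seeds):
    """L(S) in the constructed instance for an undirected adjacency dict."""
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v, nbrs in adj.items():
            # With theta = 1 and weights 1/deg, v needs all neighbors active.
            if v not in active and nbrs and all(u in active for u in nbrs):
                active.add(v)
                changed = True
    return len(active)

def is_vertex_cover(adj, s):
    return all(u in s or v in s for u in adj for v in adj[u])

def all_subsets(nodes):
    for r in range(len(nodes) + 1):
        for combo in combinations(nodes, r):
            yield frozenset(combo)

def maximizers_and_min_covers(adj):
    """Return (seed sets maximizing L(S) - |S|, minimum vertex covers)."""
    nodes = sorted(adj)
    profit = {s: spread(adj, s) - len(s) for s in all_subsets(nodes)}  # p = 1, c_a = 0
    best = max(profit.values())
    maximizers = {s for s, val in profit.items() if val == best}
    covers = [s for s in profit if is_vertex_cover(adj, s)]
    k = min(len(s) for s in covers)
    return maximizers, {s for s in covers if len(s) == k}

# Hypothetical example: the path a - b - c - d.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
best_sets, min_covers = maximizers_and_min_covers(path)
print(best_sets == min_covers)
```

For this path, the profit-maximizing seed sets and the minimum vertex covers coincide: both are {a, c}, {b, c}, and {b, d}. Exhaustive enumeration is exponential in $|V|$, so this is only a sanity check on tiny instances.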
Now, we want to show that a set $S \subseteq V$ is a minimum vertex cover (MVC) of $G$ if and only if $S = \arg\max_{T \subseteq V} \hat{\pi}(T)$.

($\Longrightarrow$). If $S$ is an MVC of $G$, then in ProMax we choose $S$ as the seed set, so that $\hat{\pi}(S) = n - |S|$. This is optimal, which we show by contradiction. Suppose otherwise, i.e., there exists some $T \subseteq V$, $T \neq S$, such that $\hat{\pi}(T) > \hat{\pi}(S)$. For the case of $|T| \geq |S|$, we have $\hat{\pi}(T) = L(T) - |T| \leq L(T) - |S|$. Since $L(T) \leq n$, $\hat{\pi}(T) \leq L(T) - |S| \leq n - |S| = \hat{\pi}(S)$, which is a contradiction. For the case of $|T| < |S|$, let $|S| - |T| = \ell$. Thus, $\hat{\pi}(T) = L(T) - (|S| - \ell)$. Since $T$ is not a VC, $L(S) = n$, and it is supposed that $\hat{\pi}(T) > \hat{\pi}(S)$, we have $L(T) = n - j$ for some $j \in [1, \ell)$. Then, from the way in which influence weights and thresholds are set up, we know there are exactly $j$ nodes in $V \setminus T$ that are not activated. Let $J$ be the set containing those $j$ nodes, and consider the set $T' = T \cup J$, for which we have $L(T') = n$. From the proof of Theorem 2.7 of [86], $T'$ is a VC of $G$. But since $|T'| = |T| + j < |S|$, $T'$ is a VC with a strictly smaller size than $S$, which gives a contradiction since $S$ is an MVC.

($\Longleftarrow$). Suppose that $S = \arg\max_{T \subseteq V} \hat{\pi}(T)$, but $S$ is not a VC of $G$ (we will consider MVC later). This implies that there exists at least one edge $e \in E$ such that both endpoints of $e$, denoted by $u_i$ and $u_j$, are not in $S$. From the way in which influence weights and thresholds are set up in $G'$, we know that neither $u_i$ nor $u_j$ is activated. Thus, if we add either one of them, say $u_i$, into $S$, $L(S \cup \{u_i\})$ is at least $L(S) + 2$, and thus $\hat{\pi}(S \cup \{u_i\}) - \hat{\pi}(S) \geq 1$, which contradicts the fact that $S$ optimizes $\hat{\pi}$. Hence, $S$ must be a VC of $G$. Now suppose that, in addition, $S$ is not an MVC. Then, there must exist some $x \in S$ such that the node-set $S \setminus \{x\}$ is still a VC of $G$; this means that $L(S \setminus \{x\}) = n$, too. Thus, $\hat{\pi}(S \setminus \{x\}) = n - |S| + 1 > \hat{\pi}(S) = n - |S|$, which is a contradiction.
Hence, $S$ is indeed an MVC of $G$.

Now we have shown that an optimal solution to the restricted ProMax problem is an optimal solution to the Minimum Vertex Cover problem, and vice versa; this completes the proof.

Observe that $L(S)$ is submodular and $|S|$ is modular (so $-|S|$ is also submodular). Hence $\hat{\pi}$ is submodular, as it is a non-negative linear combination of the two submodular functions $L(S)$ and $-|S|$. However, unlike for Influence Maximization, the function is non-monotone and we want to find a set $S$ of any size that maximizes $\hat{\pi}(S)$, so the standard Greedy is not applicable here.

Algorithm 2: U-Greedy
    Data: $G = (V, E)$
    Result: seed set $S$
    begin
        $S \leftarrow \emptyset$
        while true do
            $u \leftarrow \arg\max_{u_i \in V \setminus S} [\hat{\pi}(S \cup \{u_i\}) - \hat{\pi}(S)]$
            if $\hat{\pi}(S \cup \{u\}) - \hat{\pi}(S) > 0$ then
                $S \leftarrow S \cup \{u\}$
            else
                break

Feige et al. [55] give a randomized local search (2/5-approximation) for maximizing non-monotone submodular functions. This is applicable to $\hat{\pi}$, but has time complexity $O(|V|^3 |E| / \epsilon)$, where $(1 + \epsilon/|V|^2)$ is the per-step improvement factor in the search. In contrast, since the function $\hat{\pi}$ is the difference between a monotone submodular function and a linear function, we propose a greedy approach (Algorithm 2, U-Greedy) with time complexity $O(|V|^2 |E|)$ and a better approximation ratio, which is slightly lower than $1 - 1/e - \epsilon$. U-Greedy grows the seed set $S$ in a greedy fashion similar to Greedy, and terminates when no node can provide a positive marginal gain w.r.t. $S$.

Theorem 2. Given an instance of the restricted ProMax problem under the LT-V model consisting of a graph $G = (V, E)$ with edge weights and objective function $\hat{\pi}$, let $S_g \subseteq V$ be the seed set returned by Algorithm 2, and $S^* \subseteq V$ be the optimal solution. Then,

$$\hat{\pi}(S_g) \geq (1 - 1/e - \epsilon) \cdot \hat{\pi}(S^*) - \Theta(\max\{|S_g|, |S^*|\}). \tag{2.2}$$

Proof. Case (i). If $|S^*| \leq |S_g|$, then since $L$ is monotone and submodular, we have

$$L(S_g) \geq (1 - 1/e - \epsilon) \cdot L(S^*).$$

Thus, by the definition of $\hat{\pi}$, we have

$$\begin{aligned}
\hat{\pi}(S_g) &= p \cdot L(S_g) - (p + c_a) |S_g| \\
&\geq p (1 - 1/e - \epsilon) \cdot L(S^*) - (p + c_a) |S_g| \\
&= (1 - 1/e - \epsilon) \cdot \hat{\pi}(S^*) - (p + c_a) |S_g| + (1 - 1/e - \epsilon)(p + c_a) |S^*| \\
&= (1 - 1/e - \epsilon) \cdot \hat{\pi}(S^*) - \Theta(|S_g|).
\end{aligned}$$

Case (ii).
If $|S^*| > |S_g|$, consider a set $S'_g$ obtained by running U-Greedy until $|S'_g| = |S^*|$. Clearly, from case (i), we have
$$\hat{\pi}(S'_g) \geq (1 - 1/e - \epsilon) \cdot \hat{\pi}(S^*) - \Theta(|S'_g|).$$
Due to the fact that $|S^*| = |S'_g| > |S_g|$, and $S_g$ is obtained by running U-Greedy until no node can provide positive marginal profit, we have
$$\hat{\pi}(S_g) \geq \hat{\pi}(S'_g) \geq (1 - 1/e - \epsilon) \cdot \hat{\pi}(S^*) - \Theta(|S^*|).$$
Combining the above two cases gives Equation (2.2).

Theorem 2 indicates that the gap between the U-Greedy solution and a $(1 - 1/e - \epsilon)$-approximation grows linearly w.r.t. the cardinality of the seed set. Since this cardinality is typically much smaller than the expected spread, U-Greedy can provide quality guarantees for restricted ProMax with objective function $\hat{\pi}$.

2.5 General Properties of the LT-V Model

Theorem 1 shows that in a restricted setting where exact valuations are known and the optimal pricing strategy is trivial, ProMax is still NP-hard. Now we consider the general ProMax described in §2.3.1, and show that for any fixed price vector, the general profit function maintains submodularity (w.r.t. the seed set) regardless of the specific forms of the valuation distributions.

Given a seed set $S$ and a price vector $\mathbf{p}$, let $ap(u_i \mid S, \mathbf{p})$ denote $u_i$'s adoption probability, defined as the probability that $u_i$ adopts the product by the end of the diffusion started with seed set $S$ and price vector $\mathbf{p}$. Similarly, let $ip(u_i \mid S, \mathbf{p}_{-i})$ denote $u_i$'s probability of getting influenced under the same initial conditions, where $\mathbf{p}_{-i} \in [0, 1]^{|V| - 1}$ is the vector of all prices excluding $p_i$.
Also, let $\pi^{(i)}(S, \mathbf{p})$ be the expected profit earned from $u_i$.

By model definition, for any $u_i \in V \setminus S$, we have
$$ap(u_i \mid S, \mathbf{p}) = ip(u_i \mid S, \mathbf{p}_{-i}) \cdot (1 - F_i(p_i))$$
and
$$\pi^{(i)}(S, \mathbf{p}) = p_i \cdot ap(u_i \mid S, \mathbf{p}).$$
If $u_i \in S$, then we have
$$ip(u_i \mid S, \mathbf{p}_{-i}) = 1$$
and
$$\pi^{(i)}(S, \mathbf{p}) = p_i \cdot (1 - F_i(p_i)) - c_a.$$
By linearity of expectation, we have
$$\pi(S, \mathbf{p}) = \sum_{u_i \in V} \pi^{(i)}(S, \mathbf{p}).$$
Hence, to analyze the profit function, we just need to focus on the adoption probability, in which the factor $(1 - F_i(p_i))$ does not depend on $S$, but $ip(u_i \mid S, \mathbf{p}_{-i})$ requires careful analysis, which we shall present in the proof of Theorem 3.

Let $\mathbf{v} = (v_1, \ldots, v_{|V|}) \in [0, 1]^{|V|}$ be a vector of user valuations, corresponding to random samples drawn from the various user valuation distributions. We now have:

Theorem 3 (Submodularity). Given an instance of the LT-V model, for any fixed vector $\mathbf{p} \in [0, 1]^{|V|}$ of prices, the profit function $\pi(S, \mathbf{p})$ is submodular w.r.t. $S$, for an arbitrary vector $\mathbf{v}$ of valuation samples.

The proof of submodularity of the influence spread function in the classical LT model [86] relies on establishing an equivalence between the LT model and reachability in a family of random graphs generated as follows: for each node $u_i \in V$, select at most one of its incoming edges at random, such that $(u_j, u_i)$ is selected with probability $w_{j,i}$, and no edge is selected with probability $1 - \sum_{u_j \in N^{in}(u_i)} w_{j,i}$. We will use a similar approach in the proof of Theorem 3.

Proof of Theorem 3. By linearity of expectation as well as the above analysis on adoption probabilities,
$$\pi(S, \mathbf{p}) = \sum_{u_i \in V} \pi^{(i)}(S, \mathbf{p}) = \sum_{u_i \in S} \left[ p_i (1 - F_i(p_i)) - c_a \right] + \sum_{u_i \notin S} p_i (1 - F_i(p_i)) \cdot ip(u_i \mid S, \mathbf{p}_{-i}).$$
Since the first sum is linear in $S$, it suffices to show that $ip(u_i \mid S, \mathbf{p}_{-i})$ is submodular in $S$, whenever $u_i \notin S$.

To encode random events of the LT-V model using the possible world semantics, we do the following. First, we run a node coloring process on $G$: for each node $u_i$, if $p_i \leq v_i$, color it black; otherwise color it white.
Meanwhile, we run a live-edge selection process following the aforementioned protocol [86]. Note that the two processes are orthogonal and independent of each other. Combining the results of both leads to a colored live-edge graph, which we call a possible world $X$. Let $\mathcal{X}$ be the probability space in which each sample point specifies one such possible world $X$.

Next, we define the notion of “black-reachability”. In any possible world $X$, a node $u_i$ is black-reachable from a node set $S$ if and only if there exists a black node $s \in S$ such that $u_i$ is reachable from $s$ via a path consisting entirely of black nodes, except possibly for $u_i$ (even if $u_i$ is white, it is still considered black-reachable, since here we are interested in the probability of being influenced, not of adopting). By the same argument as in the proof of Claim 2.6 of [86], on any black-white colored graph, the following two distributions over the sets of nodes are the same: (i) the distribution over sets of influenced nodes obtained by running the LT-V process to completion starting from $S$; (ii) the distribution over sets of nodes that are black-reachable from $S$, under the live-edge selection protocol.

Let $I_X(u_i \mid S)$ be the indicator set function that is 1 if $u_i$ is black-reachable from $S$, and 0 otherwise. Consider two sets $S$ and $T$ with $S \subseteq T \subseteq V$, and a node $x \in V \setminus T$. Consider some $u_i$ that is black-reachable from $T \cup \{x\}$ but not from $T$. This implies that (i) $u_i$ is not black-reachable from $S$ either (otherwise, $u_i$ would also be black-reachable from $T$, which is a contradiction); (ii) the source of the path that “black-reaches” $u_i$ must be $x$. Hence, $u_i$ is black-reachable from $S \cup \{x\}$, but not from $S$, which implies
$$I_X(u_i \mid S \cup \{x\}) - I_X(u_i \mid S) = 1 \geq 1 = I_X(u_i \mid T \cup \{x\}) - I_X(u_i \mid T).$$
Thus, $I_X(u_i \mid S)$ is submodular. Since $ip(u_i \mid S, \mathbf{p}_{-i}) = \sum_{X \in \mathcal{X}} \Pr[X] \cdot I_X(u_i \mid S)$ is a nonnegative linear combination of submodular functions, it is also submodular w.r.t. $S$.
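The possible-world construction used in this argument can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function names (`sample_possible_world`, `black_reachable`) and the dictionary-based graph encoding are hypothetical and do not come from the thesis implementation. Because every node keeps at most one live incoming edge, the path leading into a node is unique, so black-reachability can be checked by walking backwards along live edges.

```python
import random

def sample_possible_world(in_edges, prices, valuations, rng):
    """Sample one colored live-edge graph (a possible world).

    in_edges[v] is a list of (u, w_uv) pairs whose weights sum to at most 1.
    Returns (black, live_parent): black[v] is True when price <= valuation,
    and live_parent[v] is the single live in-neighbor of v (or None).
    """
    black = {v: prices[v] <= valuations[v] for v in in_edges}
    live_parent = {}
    for v, edges in in_edges.items():
        r = rng.random()
        acc = 0.0
        live_parent[v] = None  # no edge selected with the leftover probability
        for u, w in edges:
            acc += w
            if r < acc:
                live_parent[v] = u
                break
    return black, live_parent

def black_reachable(target, seeds, black, live_parent):
    """True if target is black-reachable from the seed set in this world.

    Every node on the path except the target itself must be black, and the
    path must start at a black seed. Seeds count as influenced by definition.
    """
    if target in seeds:
        return True
    seen = {target}
    node = live_parent[target]
    while node is not None and node not in seen:
        if not black[node]:
            return False  # a white node blocks the unique incoming path
        if node in seeds:
            return True
        seen.add(node)
        node = live_parent[node]
    return False
```

Averaging `black_reachable` over many sampled worlds (with valuations redrawn from each $F_i$) would give a Monte Carlo estimate of the influence probability; that estimator is not part of the sketch.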
This completes the proof.

We also remark that in general graphs, given any $S$ and $\mathbf{p}$, it is #P-hard to compute the exact value of $\pi(S, \mathbf{p})$ for the LT-V model, just as in the case of computing the exact expected spread of influence for the LT model. This can be shown similarly to the proof of Theorem 1 in [39].

2.6 Algorithm Design and Analysis

For ProMax, since the expected profit is a function of both the seed set and the vector of prices, a ProMax algorithm should determine both the seed set and an assignment of prices to nodes to optimize the expected profit. Accordingly, it has two components: (i) a seed selection procedure that determines $S$, and (ii) a pricing strategy that determines $\mathbf{p}$. Due to acquisition costs and the possible need for seed-discounting (details later), $\pi(S, \mathbf{p})$ is still non-monotone in $S$ and is in the form of the difference between a monotone submodular function and a linear function. Hence, inspired by the restricted ProMax studied in §2.4, we propose to use U-Greedy for seed set selection.

We then propose three pricing strategies and integrate them with U-Greedy to obtain three ProMax algorithms. The first two, All-OMP and FFS, are baselines with simple strategies that set prices of seeds without considering the network structure and influence spread, while the third one, PAGE, computes optimal discounts for candidate seeds based on their “profit potential”. Intuitively, it “rewards” seeds with higher influence spread by giving them a deeper discount to boost their adoption probabilities, and in turn the adoption probabilities of nodes that may be influenced directly or indirectly by such seeds.

Notice that taking valuations into account when modeling the diffusion process of product adoption makes a difference for a marketing company. A pricing strategy that does not consider valuations is limited: either it charges everyone full price (or at best gives full discount to the seeds), or it uses an ad-hoc discount policy which is necessarily suboptimal.
By contrast, PAGE makes full use of valuation information to determine the best discounts.

2.6.1 Two Baseline Algorithms

Recall that in our model, users in the social network are price-takers who myopically respond to the price offered to them. Thus, given a distribution function $F_i$ of valuation $v_i$, the optimal myopic price (OMP) [77] can be calculated by:
$$p^m_i = \arg\max_{p \in [0, 1]} \; p \cdot (1 - F_i(p)). \qquad (2.3)$$

Algorithm 3: All-OMP (Optimal Myopic Price for All Users)
  Data: graph $G = (V, E)$, CDFs $F_i(\cdot)$ for all $u_i \in V$
  Result: seed set $S$, price vector $\mathbf{p}^m$
  1  begin
  2    $S \leftarrow \emptyset$
  3    $\mathbf{p}^m \leftarrow \mathbf{0}$
  4    foreach $u_i \in V$ do
  5      $\mathbf{p}^m[i] \leftarrow p^m_i = \arg\max_{p \in [0,1]} p \cdot (1 - F_i(p))$
  6    while true do
  7      $u \leftarrow \arg\max_{u_i \in V \setminus S} [\pi(S \cup \{u_i\}, \mathbf{p}^m) - \pi(S, \mathbf{p}^m)]$
  8      if $\pi(S \cup \{u\}, \mathbf{p}^m) - \pi(S, \mathbf{p}^m) > 0$ then
  9        $S \leftarrow S \cup \{u\}$
  10     else
  11       break

2.6.1.1 OMP For All Users

Offering OMP to a single influenced node ensures that the expected profit earned solely from that node is the maximum. This gives our first ProMax algorithm, All-OMP, which offers OMP to all nodes regardless of whether a node is a seed or how influential it is. First, for each $u_i \in V$, it calculates $p^m_i$ using Equation (2.3), and collects all OMPs to form the price vector $\mathbf{p}^m = (p^m_1, \ldots, p^m_{|V|})$. Then, treating $\mathbf{p}^m$ as fixed, it essentially runs U-Greedy (Algorithm 2) to select the seeds. When the algorithm cannot find a node whose marginal profit is positive, it stops.

Notice that Equation (2.3) overlooks the network structure and ignores the profit potential of seeds. This may lead to the sub-optimality of All-OMP in general. Figure 2.2 illustrates this with an example.

[Figure 2.2: An example graph (nodes 1 to 6; every edge has influence weight 0.5).]

Suppose that all valuations are distributed uniformly in $[0, 1]$ and the acquisition cost is $c_a = 0.001$. Hence, $\mathbf{p}^m = (1/2, \ldots, 1/2)$. Consider seeding node 1: it adopts w.p. 0.5, giving a profit of $0.5 + 5 \cdot 0.5^3 - 0.001 = 1.124$; it does not adopt w.p. 0.5, resulting in a profit of $-0.001$. Thus, the expected profit is $\pi(\{1\}, \mathbf{p}^m) = 0.5615$. However, when $p_1 = 3/16$, $\pi(\{1\}, \mathbf{p}^m_{-1} \oplus (3/16)) = 0.661$.
Here we have used $\mathbf{p}_{-i} \oplus x$ to denote a vector sharing all values with $\mathbf{p}$ except that the $i$-th coordinate is replaced by $x$, e.g., if $\mathbf{p} = (0.2, 0.3, 0.4)$, then $\mathbf{p}_{-1} \oplus 0.5 = (0.5, 0.3, 0.4)$. This shows that for high-influence networks and low acquisition cost, the profit earned by running All-OMP can be improved by seed-discounting, i.e., lowering prices for seeds so as to boost their adoption probabilities and thus better leverage their influence over the network. The intuition is that the profit loss over seeds (stemming from the discount) can potentially be compensated and even surpassed by the profit gain over non-seeds: more seeds may adopt as a result of the discount, and the probabilities of non-seeds getting influenced will go up as more seeds adopt.

2.6.1.2 Free Samples for Seeds

Generally speaking, there exists a trade-off between the immediate (myopic) profit earned from seeds and the potentially larger profit earned from non-seeds. Favoring the latter, we propose our second algorithm, FFS (Free-For-Seeds), which gives a full discount to seeds and charges non-seeds the OMP. FFS first calculates $\mathbf{p}^m = (p^m_1, \ldots, p^m_{|V|})$ using Equation (2.3). Then it runs U-Greedy: in each iteration, it adds to $S$ the node which provides the largest marginal profit when a full discount (i.e., price 0) is given. For all seeds added, their prices remain 0; the algorithm ends when no node can provide positive marginal profit.

Since FFS takes the opposite attitude towards seed-discounting compared to All-OMP, intuitively, it should be suitable for high-influence networks and low acquisition costs, but it may be overly aggressive for low-influence networks and high acquisition costs.
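The trade-off between the two baselines can be checked numerically on the Figure 2.2 example. The sketch below is a hedged illustration, not the thesis code: it assumes that node 1 points to the five other nodes, that all valuations follow $U(0,1)$ so that $1 - F(p) = 1 - p$, and it introduces a hypothetical helper `star_profit` that evaluates the expected profit of seeding node 1 in closed form.

```python
def star_profit(seed_price, other_price, weight, n_others, ca):
    """Expected profit of seeding the hub of a star graph under U(0,1) valuations.

    The hub adopts with probability 1 - seed_price. If it adopts, each of the
    n_others neighbors is influenced with probability `weight`, adopts with
    probability 1 - other_price, and then pays other_price. The acquisition
    cost ca is paid whether or not the hub adopts.
    """
    hub_adopts = 1.0 - seed_price
    neighbor_profit = n_others * weight * (1.0 - other_price) * other_price
    return hub_adopts * (seed_price + neighbor_profit) - ca

# High-influence setting: weights 0.5, acquisition cost 0.001.
all_omp_high = star_profit(0.5, 0.5, 0.5, 5, 0.001)   # seed pays the OMP
ffs_high = star_profit(0.0, 0.5, 0.5, 5, 0.001)       # seed gets the product free

# Low-influence setting: weights 0.01, acquisition cost 0.01.
all_omp_low = star_profit(0.5, 0.5, 0.01, 5, 0.01)
ffs_low = star_profit(0.0, 0.5, 0.01, 5, 0.01)
```

Under these assumptions the sketch gives 0.5615 for All-OMP and 0.624 for FFS in the high-influence setting (0.625 before the acquisition cost is subtracted), and 0.24625 versus 0.0025 in the low-influence setting, so the ranking of the two baselines flips with the influence strength.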
For example, in Figure 2.2, the FFS profit by seeding node 1 is 0.625, better than the All-OMP profit of 0.5615. But if all influence weights are 0.01 instead of 0.5, and $c_a = 0.01$, All-OMP gives a profit of 0.246, while FFS gives only 0.0025.

Algorithm 4: FFS (Free-For-Seeds)
  Data: graph $G = (V, E)$, CDFs $F_i(\cdot)$ for all $u_i \in V$
  Result: seed set $S$, price vector $\mathbf{p}^f$
  1  begin
  2    $S \leftarrow \emptyset$
  3    $\mathbf{p}^f \leftarrow \mathbf{0}$
  4    foreach $u_i \in V$ do
  5      $\mathbf{p}^f[i] \leftarrow p^m_i = \arg\max_{p \in [0,1]} p \cdot (1 - F_i(p))$
  6    while true do
  7      $u \leftarrow \arg\max_{u_i \in V \setminus S} [\pi(S \cup \{u_i\}, \mathbf{p}^f_{-i} \oplus 0) - \pi(S, \mathbf{p}^f)]$
  8      if $\pi(S \cup \{u\}, \mathbf{p}^f_{-u} \oplus 0) - \pi(S, \mathbf{p}^f) > 0$ then
  9        $S \leftarrow S \cup \{u\}$; $\mathbf{p}^f \leftarrow \mathbf{p}^f_{-u} \oplus 0$
  10     else
  11       break

2.6.2 The Price-Aware Greedy Algorithm

Both All-OMP and FFS are easy for marketing companies to operate, but they are not balanced and are not robust against different input instances, as illustrated by the examples above. To achieve more balance, we propose the PAGE (for Price-Aware GrEedy) algorithm (Algorithm 5). PAGE also employs U-Greedy to select seeds. It initializes all seed prices to their OMP values (Step 3). In each round, it calculates the best price for each candidate seed such that its marginal profit (MP) w.r.t. the chosen $S$ and $\mathbf{p}$ is maximized (Step 7); then it picks the node with the largest maximum MP (Step 8). It stops when it cannot find a seed with a positive MP (Step 11). For all non-seed nodes, PAGE still charges OMP. We next explain the details of determining the best price for a candidate seed.

Given a seed set $S$, consider an arbitrary candidate seed $u_i \in V \setminus S$, with its price $p_i$ to be determined. The marginal profit ($MP$) that $u_i$ provides w.r.t.
$S$ with $p_i$ is
$$MP(u_i) = \pi(S \cup \{u_i\}, \mathbf{p}_{-i} \oplus p_i) - \pi(S, \mathbf{p}_{-i} \oplus p^m_i),$$
where $\mathbf{p}_{-i}$ is fixed. The key task in PAGE is to find $p_i$ such that $MP(u_i)$ is maximized. Since $\pi(S, \mathbf{p}_{-i} \oplus p^m_i)$ does not involve $u_i$ as a seed or the price $p_i$, it suffices to find the $p_i$ that maximizes the quantity
$$\pi(S \cup \{u_i\}, \mathbf{p}_{-i} \oplus p_i).$$

Algorithm 5: PAGE (Price-Aware Greedy Algorithm)
  Data: graph $G = (V, E)$, CDFs $F_i(\cdot)$ for all $u_i \in V$
  Result: seed set $S$, price vector $\mathbf{p}$
  1  begin
  2    $S \leftarrow \emptyset$
  3    $\mathbf{p} \leftarrow \mathbf{0}$
  4    foreach $u_i \in V$ do
  5      $\mathbf{p}[i] \leftarrow p^m_i = \arg\max_{p \in [0,1]} p \cdot (1 - F_i(p))$
  6    while true do
  7      foreach $u_i \in V \setminus S$ do
  8        Estimate the value of $Y_0$ and $Y_1$ by MC simulations
  9        $p^*_i \leftarrow \arg\max_{p_i \in [0,1]} g_i(p_i)$; normalize if needed
  10     $u \leftarrow \arg\max_{u_i \in V \setminus S} g_i(p^*_i)$
  11     if $\pi(S \cup \{u\}, \mathbf{p}_{-u} \oplus p^*_u) - \pi(S, \mathbf{p}_{-u} \oplus p^m_u) > 0$ then
  12       $S \leftarrow S \cup \{u\}$; $\mathbf{p} \leftarrow \mathbf{p}_{-u} \oplus p^*_u$
  13     else
  14       break

Seeding $u_i$ at a certain price $p_i$ results in two possible worlds: world $X^{(i)}_1$ with $\Pr[X^{(i)}_1] = 1 - F_i(p_i)$, in which $u_i$ adopts, and world $X^{(i)}_0$ with $\Pr[X^{(i)}_0] = F_i(p_i)$, in which $u_i$ does not adopt. In world $X^{(i)}_1$, the profit earned from $u_i$ is $p_i - c_a$; let the expected profit earned from other nodes be $Y_1$. Similarly, in world $X^{(i)}_0$, the profit from $u_i$ is $-c_a$; let the expected profit from other nodes be $Y_0$. Notice that $Y_1$ depends on the influence of $u_i$ but $Y_0$ does not. Putting it all together, the quantity $\pi(S \cup \{u_i\}, \mathbf{p}_{-i} \oplus p_i)$ can be expressed as a function of $p_i$ as follows:
$$g_i(p_i) = (1 - F_i(p_i)) \cdot (p_i + Y_1) + F_i(p_i) \cdot Y_0 - c_a. \qquad (2.4)$$
Similarly to the expected spread of influence in Influence Maximization, the exact values of $Y_1$ and $Y_0$ cannot be computed in PTIME (due to #P-hardness [39]), but sufficiently accurate estimates can be obtained by Monte Carlo (MC) simulations.

Finding $p^*_i = \arg\max_{p_i \in [0,1]} g_i(p_i)$ now depends on the specific form of the distribution function $F_i$. We consider two kinds of distributions: the normal distribution, for which $v_i \sim \mathcal{N}(\mu, \sigma^2)$, $\forall u_i \in V$, and the uniform distribution, for which $v_i \sim U(0, 1)$, $\forall u_i \in V$. The choice of the normal distribution is supported by evidence from real-world data from Epinions.com (see §2.7), and also by the work in [83].
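Equation (2.4) can be evaluated directly once $Y_1$ and $Y_0$ are available. The following Python sketch is a minimal illustration under stated assumptions: the names `g`, `uniform_cdf`, and `best_price_on_grid` are hypothetical, the maximization is a plain grid scan rather than any method prescribed by the thesis, and the values $Y_1 = 0.625$, $Y_0 = 0$ are taken from the Figure 2.2 example (five neighbors, weight 0.5, non-seed price 0.5) instead of being estimated by MC simulations.

```python
def uniform_cdf(p):
    """CDF of U(0, 1), clamped outside the unit interval."""
    return min(max(p, 0.0), 1.0)

def g(price, cdf, y1, y0, ca):
    """Expected profit of seeding a node at `price`, following Equation (2.4)."""
    adopt = 1.0 - cdf(price)
    return adopt * (price + y1) + cdf(price) * y0 - ca

def best_price_on_grid(cdf, y1, y0, ca, steps=10000):
    """Scan an evenly spaced grid over [0, 1] and return the best price found."""
    grid = [k / steps for k in range(steps + 1)]
    return max(grid, key=lambda p: g(p, cdf, y1, y0, ca))

# Figure 2.2 example: seeding node 1 with uniform valuations.
p_star = best_price_on_grid(uniform_cdf, y1=0.625, y0=0.0, ca=0.001)
profit_at_omp = g(0.5, uniform_cdf, 0.625, 0.0, 0.001)
```

With these inputs the scan returns $p^* = 0.1875 = 3/16$, and evaluating $g$ at the OMP of 0.5 reproduces the All-OMP profit of 0.5615 from the same example.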
When sales data are not available, it is common to consider the uniform distribution with support $[0, 1]$ to account for our complete lack of knowledge [20,132].

Normal Distribution. For the normal distribution, assume that $v_i \sim \mathcal{N}(\mu, \sigma^2)$ for some $\mu$ and $\sigma$; then $\forall p_i \in [0, 1]$,
$$F_i(p_i) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{p_i - \mu}{\sigma\sqrt{2}}\right)\right],$$
where $\operatorname{erf}(\cdot)$ is the error function, defined as
$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\, dt.$$
Plugging $F_i(\cdot)$ back into Equation (2.4), one cannot obtain an analytical solution for $p^*_i$, as $\operatorname{erf}(x)$ has no closed-form expression. Thus, we turn to numerical methods to approximately find $p^*_i$. Specifically, we use the golden section search algorithm, a technique that finds the extremum of a unimodal function by iteratively shrinking the interval inside which the extremum is known to exist [61]. In our case, the search algorithm starts with the interval $[0, 1]$, and we set the stopping criterion to be that the size of the interval which contains $p^*_i$ is strictly smaller than $10^{-8}$.

Uniform Distribution. The uniform distribution admits simpler calculations and an analytical solution. If $v_i \sim U(0, 1)$, then $\forall p_i \in [0, 1]$, $F_i(p_i) = p_i$, and plugging it back into Equation (2.4) gives
$$g_i(p_i) = -p_i^2 + (1 - Y_1 + Y_0) \cdot p_i + Y_1 - c_a.$$
Setting the derivative $-2 p_i + (1 - Y_1 + Y_0)$ to zero, the optimal price is
$$p^*_i = \frac{1 - Y_1 + Y_0}{2}.$$

Table 2.1: Dataset statistics

                            Epinions   Flixster   NetHEPT
  Number of nodes           11K        7.6K       15K
  Number of edges           119K       50K        62K
  Average out-degree        10.7       6.5        4.12
  Maximum out-degree        1208       197        64
  #Connected components     4603       761        1781
  Largest component size    5933       2861       6794

For both normal and uniform distributions, if $p^*_i > 1$ or $p^*_i < 0$, it is normalized back to 1 or 0, respectively. Also note that the above solution framework applies to any probability distribution that $v_i$ may follow, as long as an analytical or numerical solution can be found for $p^*_i$.

Lines 7-10 in Algorithm 5 (and also the U-Greedy seed selection procedure in All-OMP and FFS) can be accelerated by the CELF optimization [100], or the more recent CELF++ [71].
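As an illustration of how lazy evaluation combines with the U-Greedy stopping rule, here is a minimal Python sketch. It is an assumption-laden toy rather than the CELF or CELF++ implementation from [100] or [71]: the function name `u_greedy_lazy` is hypothetical, and `profit` is assumed to be a deterministic, submodular set function supplied by the caller (in the thesis setting it would be a noisy MC estimate). Submodularity is what justifies treating a stale marginal gain as an upper bound.

```python
import heapq

def u_greedy_lazy(nodes, profit):
    """U-Greedy with lazy re-evaluation of marginal gains.

    `profit` maps a frozenset of seeds to a number and is assumed submodular,
    so a gain computed for a smaller seed set never underestimates the gain
    for the current one. Stops once the best fresh gain is not positive.
    """
    seeds = frozenset()
    base = profit(seeds)
    # Heap entries: (-gain, tie_breaker, node, seed-set size when gain was computed)
    heap = [(-(profit(frozenset([v])) - base), i, v, 0)
            for i, v in enumerate(nodes)]
    heapq.heapify(heap)
    while heap:
        neg_gain, i, v, stamp = heapq.heappop(heap)
        if stamp == len(seeds):
            # Gain is up to date, and it dominates every stale upper bound.
            if -neg_gain <= 0:
                break
            seeds = seeds | {v}
            base = profit(seeds)
        else:
            # Stale entry: recompute against the current seed set and re-queue.
            gain = profit(seeds | {v}) - base
            heapq.heappush(heap, (-gain, i, v, len(seeds)))
    return seeds
```

A toy usage with a coverage-minus-cost objective: with `cover = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4}}` and a profit of covered items minus 1.5 per seed, only `"a"` is selected; lowering the per-seed cost to 0.5 adds `"b"` as well.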
The adaptation is straightforward and the details can be found in [71,100].

2.7 Experiments

We conducted experiments on real-world network datasets to evaluate our proposed baselines and the PAGE algorithm. In all these algorithms, a key step is to compute the marginal profit of a candidate seed. As mentioned in Section 2.3, computing the exact expected profit is intractable for the LT-V model. Thus, we estimated the expected profit with Monte Carlo (MC) simulations. Following [86], we ran 10,000 simulations for this purpose. This is an expensive step and, as for Influence Maximization, it limits the size of networks on which we can run these simulations. For the same reason, the CELF optimization was used in all algorithms as a heuristic.

All implementations were done in C++ and all experiments were run on a server with a 2.50GHz eight-core Intel Xeon E5420 CPU, 16GB RAM, and Windows Server 2008 R2.

2.7.1 Data

We used three network datasets whose statistics are summarized in Table 2.1: (i) Epinions [128], a who-trusts-whom network extracted from the review site Epinions.com: an edge $(u_i, u_j)$ was drawn if $u_j$ has expressed her trust in $u_i$'s reviews; (ii) Flixster³, a friendship network from the social movie site Flixster.com: if $u_i$ and $u_j$ are friends, we drew edges in both directions; (iii) NetHEPT⁴ (a standard dataset that is widely adopted in the literature of social influence [37, 39, 72, 86]), a co-authorship network extracted from the High Energy Physics Theory section of arXiv.org: $u_i$ and $u_j$ are connected if they have co-authored papers, and edges are bidirectional in this dataset.

The raw data of Epinions and Flixster contain 76K users, 509K edges and 1M users, 28M edges, respectively. We used the METIS graph partitioning software⁵ to extract a subgraph from each of the two networks, to ensure that MC simulations can finish within a reasonable amount of time.

2.7.1.1 Computing Influence Weights

We used two methods, Weighted Distribution (WD) and Trivalency (TV), to assign influence weights to edges.
For WD, $w_{i,j} = A_{i,j} / N_j$, where $A_{i,j}$ is the number of actions $u_i$ and $u_j$ both perform, and $N_j$ is a normalization factor, i.e., the number of actions performed by $u_j$, to ensure $\sum_{u_i \in N^{in}(u_j)} w_{i,j} \leq 1$. In Flixster, $A_{i,j}$ is the number of movies $u_j$ rated after $u_i$; in NetHEPT, $A_{i,j}$ is the number of papers $u_i$ and $u_j$ co-authored; in Epinions, since no action data is available, we used $w_{i,j} = 1 / d^{in}(u_j)$ as an approximation. For TV, $w_{i,j}$ was selected uniformly at random from $\{0.001, 0.01, 0.1\}$, and was normalized to ensure $\sum_{u_i \in N^{in}(u_j)} w_{i,j} \leq 1$. Figure 2.3 illustrates the distribution of weights for Flixster, and shows that influence strength is higher in WD graphs.

[Figure 2.3: Distribution of influence weights in Flixster. (a) Weighted Distribution; (b) Trivalency. X-axis: influence weight; Y-axis: number of edges (frequency).]

³ http://www2.cs.sfu.ca/~sja25/personal/datasets/, last accessed on September 7th, 2015. All ratings in this dataset contain timestamps.
⁴ http://research.microsoft.com/en-us/people/weic/projects.aspx, last accessed on September 7th, 2015.
⁵ http://glaros.dtc.umn.edu/gkhome/views/metis, last accessed on September 7th, 2015.

2.7.1.2 Learning Valuation Distributions

As mentioned in Section 2.1, valuations are difficult to obtain directly from users, and we had to estimate the distribution using historical sales data. On Epinions.com, a user typically wrote a review, gave an integer rating from 1 to 5, and might also report the price paid in US dollars: Figure 2.4 shows an example of such reviews. If a review contains both price and rating, we can combine them to approximately estimate the valuation of that user, as in such systems, ratings are seen as people's utility for a good, and utility is the difference of valuation and price [132].

We observed that most products have only a limited number of reviews (typically no more than 100), and thus a single product may not provide enough samples. To circumvent this difficulty, we acquired all reviews for the popular Canon EOS 300D, 350D, and 400D cameras. Given that these cameras belong to the same product line of Canon (entry-level Digital Single-Lens Reflex camera, or DSLR), we assume that they had similar monetary values to consumers at their respective release times. Further, we assume most users opted to buy (and hence review) the latest model when multiple models exist on the market. This allows us to treat the three cameras as a “unified” product to obtain sufficient data points.

After removing reviews without prices reported, we were left with 276 samples. Next, we transformed prices and ratings to obtain estimated valuations as follows:
$$\text{valuation} = \text{price} \cdot \left(1 + \frac{\text{rating}}{5}\right).$$

We then normalized the results into $[0, 1]$ and fit the data to a normal distribution with mean 0.53 and standard deviation 0.14, estimated by maximum likelihood estimation (MLE). Figure 2.5a plots the histogram of the normalized valuations; Figure 2.5b presents the CDFs of our empirical data and $\mathcal{N}(0.53, 0.14^2)$. To test the goodness of fit, we computed the Kolmogorov-Smirnov (K-S) statistic [66] of the two distributions, which is defined as the maximum difference between the two CDFs; in our case, the K-S statistic is 0.1064.
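The estimation pipeline described above (transform, normalize, fit, test) can be sketched in a few lines of Python using only the standard library. This is a hedged illustration: the helper names are hypothetical, min-max scaling is an assumed choice because the text does not say how the values were normalized into $[0, 1]$, and the review tuples in the usage note are made up for demonstration, not taken from the Epinions data.

```python
from statistics import NormalDist, fmean, pstdev

def estimate_valuations(reviews):
    """Apply valuation = price * (1 + rating / 5) to (price, rating) pairs."""
    return [price * (1.0 + rating / 5.0) for price, rating in reviews]

def min_max_normalize(values):
    """Rescale values into [0, 1] (assumed normalization scheme)."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

def fit_normal_mle(values):
    """MLE for a normal distribution: sample mean and population std. dev."""
    return NormalDist(fmean(values), pstdev(values))

def ks_statistic(sample, dist):
    """Largest gap between the empirical CDF of `sample` and dist.cdf."""
    xs = sorted(sample)
    n = len(xs)
    gap = 0.0
    for k, x in enumerate(xs):
        cdf = dist.cdf(x)
        # The empirical CDF jumps from k/n to (k+1)/n at x; check both sides.
        gap = max(gap, abs((k + 1) / n - cdf), abs(k / n - cdf))
    return gap
```

For example, with hypothetical reviews `[(999, 5), (899, 4), (799, 4), (650, 3), (700, 5)]`, one would call `estimate_valuations`, then `min_max_normalize`, then `fit_normal_mle`, and finally `ks_statistic` on the normalized sample and the fitted distribution.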
As can be seen from Figure 2.5b, $\mathcal{N}(0.53, 0.14^2)$ is a reasonable fit for the estimated valuations of the three Canon EOS cameras.

Since there were no price data available to collect in Flixster and NetHEPT, we used $\mathcal{N}(0.53, 0.14^2)$ in the simulations for all three datasets. For a comprehensive comparison among different algorithms, we also tested the uniform distribution over $[0, 1]$, i.e., $U(0, 1)$, as it has been commonly assumed and used in the literature [20,132].

[Figure 2.4: A review for the Canon EOS 300D camera on Epinions.com. At the end of the review, the user mentioned the price: 999 US dollars.]

[Figure 2.5: Statistics of valuations (Epinions.com). (a) Histogram of valuations (X-axis: normalized estimated valuation; Y-axis: frequency). (b) Empirical and normal CDFs (empirical CDF vs. CDF of the fitted normal).]

2.7.2 Results and Analysis

We now compare PAGE, All-OMP, and FFS in terms of the expected profit achieved, price assignments, and running time. Although all algorithms employ U-Greedy, which does not terminate until no node offers a positive marginal profit, for uniformity we report simulation results up to 100 seeds.

Expected Profit Achieved. The quality of outputs (seed sets and price vectors) of All-OMP, FFS, and PAGE for general ProMax was evaluated based on the expected profit achieved. Figures 2.6, 2.7, and 2.8 illustrate the results on Epinions, Flixster, and NetHEPT, respectively. In each dataset, both valuation distributions were tested in four settings: WD weights with $c_a = 0.1$ and $0.001$; TV weights with $c_a = 0.1$ and $0.001$. As prices and valuations are in $[0, 1]$, we used 0.1 to simulate high acquisition costs and 0.001 for low costs. Except for NetHEPT-TV with $c_a = 0.1$ (Figure 2.8(c)) and $0.001$ (Figure 2.8(d)), FFS was better than All-OMP; this indicates that only in NetHEPT-TV is the influence strength low enough that giving free samples blindly to all seeds would negatively impact profits.

[Figure 2.6: Expected profit achieved (Y-axis) on Epinions graphs w.r.t. $|S|$ (X-axis). (N)/(U) denotes normal/uniform distribution. Panels: (a) WD with $c_a = 0.1$; (b) WD with $c_a = 0.001$; (c) TV with $c_a = 0.1$; (d) TV with $c_a = 0.001$. Curves: PAGE, FFS, and All-OMP under each distribution.]

[Figure 2.7: Expected profit achieved (Y-axis) on Flixster graphs w.r.t. $|S|$ (X-axis). (N)/(U) denotes normal/uniform distribution. Panels: (a) WD with $c_a = 0.1$; (b) WD with $c_a = 0.001$; (c) TV with $c_a = 0.1$; (d) TV with $c_a = 0.001$. Curves: PAGE, FFS, and All-OMP under each distribution.]

[Figure 2.8: Expected profit achieved (Y-axis) on NetHEPT graphs w.r.t. $|S|$ (X-axis). (N)/(U) denotes normal/uniform distribution. Panels: (a) WD with $c_a = 0.1$; (b) WD with $c_a = 0.001$; (c) TV with $c_a = 0.1$; (d) TV with $c_a = 0.001$. Curves: PAGE, FFS, and All-OMP under each distribution.]

In all test cases, PAGE performed consistently better than FFS and All-OMP. The margin between PAGE and FFS is higher in TV graphs (by, e.g., 15% on Epinions-TV with $\mathcal{N}(0.53, 0.14^2)$, $c_a = 0.1$) than in WD graphs (by, e.g., 2.1% on Epinions-WD with $\mathcal{N}(0.53, 0.14^2)$, $c_a = 0.1$), as higher influence in WD graphs can potentially bring more compensation for the profit loss over seeds for FFS. Also, the expected profit of all algorithms under $\mathcal{N}(0.53, 0.14^2)$ is higher than that under $U(0, 1)$, since adoption probabilities under $\mathcal{N}(0.53, 0.14^2)$ are higher.

[Figure 2.9: Price assigned to seeds (Y-axis) w.r.t. $|S|$ (X-axis) on Epinions-TV with $\mathcal{N}(0.53, 0.14^2)$. Curves: All-OMP, PAGE with $c_a = 0.001$, PAGE with $c_a = 0.1$, and FFS.]

Price Assignments. For $\mathcal{N}(0.53, 0.14^2)$ and $U(0, 1)$, the OMP is 0.41 and 0.5, respectively. Figure 2.9 demonstrates the prices offered to each seed by All-OMP, FFS, and PAGE on Epinions-TV with $\mathcal{N}(0.53, 0.14^2)$.
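The two OMP values quoted here can be reproduced from Equation (2.3) with a golden section search over $[0, 1]$. The sketch below is a minimal illustration, not the thesis code: the function names are hypothetical, the tolerance of $10^{-8}$ mirrors the stopping criterion stated in Section 2.6.2, and the search assumes the objective $p \cdot (1 - F(p))$ is unimodal on $[0, 1]$, which holds for the two distributions used.

```python
import math
from statistics import NormalDist

def golden_section_max(f, lo=0.0, hi=1.0, tol=1e-8):
    """Maximize a unimodal function on [lo, hi].

    Returns (argmax estimate, number of shrinking iterations). The loop stops
    once the bracketing interval is strictly smaller than `tol`.
    """
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # about 0.618
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = f(c), f(d)
    iterations = 0
    while hi - lo >= tol:
        if fc > fd:
            # Maximum lies in [lo, d]; reuse c as the new right probe.
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = f(c)
        else:
            # Maximum lies in [c, hi]; reuse d as the new left probe.
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = f(d)
        iterations += 1
    return (lo + hi) / 2.0, iterations

def optimal_myopic_price(cdf):
    """OMP from Equation (2.3): argmax over p in [0, 1] of p * (1 - F(p))."""
    return golden_section_max(lambda p: p * (1.0 - cdf(p)))

omp_normal, iters_normal = optimal_myopic_price(NormalDist(0.53, 0.14).cdf)
omp_uniform, _ = optimal_myopic_price(lambda p: p)
```

Since each iteration shrinks the interval by a factor of about 0.618, reaching a width below $10^{-8}$ from a unit interval takes fewer than 40 iterations, and the resulting prices round to 0.41 and 0.5.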
All-OMP and FFS assign 0.41 and 0 to all seeds, respectively. For PAGE, as the seed set grows, the price tends to increase, reflecting the intuition that the discount is proportional to the influence (profit potential) of seeds, as they are added in a greedy fashion and those added later have diminishing profit potential.

Table 2.2: Running time, in hours (WD weights, $c_a = 0.1$)

             Epinions-WD     Flixster-WD     NetHEPT-WD
             N       U       N       U       N       U
  All-OMP    6.7     2.3     3.0     1.0     2.6     2.2
  FFS        6.3     2.1     2.8     1.0     2.7     2
  PAGE       4.8     1.3     2.3     0.5     1.0     0.9

Table 2.3: Running time, in hours (TV weights, $c_a = 0.1$)

             Epinions-TV     Flixster-TV     NetHEPT-TV
             N       U       N       U       N       U
  All-OMP    5.1     2.4     1.4     1.0     2.3     2.1
  FFS        5.5     2.5     1.5     0.8     2.2     1.8
  PAGE       4.0     1.0     0.9     0.4     0.8     0.5

Running Time. Tables 2.2 and 2.3 present the running time of all algorithms on the three networks with WD weights and TV weights, respectively⁶. As adoption probabilities under $\mathcal{N}(0.53, 0.14^2)$ are higher, all algorithms ran longer with the normal distribution on all graphs. Similarly, as influence in WD graphs is higher, the running time on them is longer than that on TV graphs.

All-OMP and FFS had roughly the same running time. More interestingly, PAGE was faster than both baselines in all cases. The observation is that in each round of U-Greedy, PAGE maximizes the marginal profit for each candidate seed in the priority queue maintained by CELF. Thus, heuristically, the lazy-forward procedure in CELF (see [100]) may have a better chance of returning the best candidate seed sooner for PAGE. All-OMP and FFS also benefited from CELF, but since the marginal profits of candidate seeds are often suboptimal, elements in the CELF queue tend to be clustered, and thus the lazy-forward procedure is not as effective.
Besides, for PAGE under $\mathcal{N}(0.53, 0.14^2)$, the golden section search usually converges in fewer than 40 iterations with stopping criterion $10^{-8}$ (defined in Section 2.6.2); thus the extra overhead it brought is negligible compared to MC simulations.

To conclude, our empirical results on three real-world datasets with two different valuation distributions have shown that the PAGE algorithm consistently outperforms the baselines All-OMP and FFS in both expected profit achieved and running time. It was also the most robust one (against various inputs) among all algorithms.

⁶ The results for $c_a = 0.001$ are similar and are omitted here.

2.8 Discussion and Future Work

The current algorithms for ProMax cannot scale to graphs with more than tens of thousands of nodes due to the expensive MC simulations. To achieve scalability, one could extend fast heuristics developed specifically for LT, e.g., LDAG [39] and SimPath [72], so that they are suitable for computing expected profit accurately under our LT-V model. Another possible extension is to consider users' spontaneous interest in product adoption, and incorporate it into the LT-V model for profit maximization. Due to personal demand, a user may have a spontaneous interest in a certain product even when no neighbor in the network has adopted it. To model this, each node $u_i$ is associated with a “network-less” probability [50]. An inactive node becomes influenced when the sum of its network-less probability and the total influence from its adopting neighbors is at least $\theta_i$. A marketing company can thus wait for spontaneous adopters to emerge first and propagate their adoption (for $\ell$ time steps, where $\ell$ is the diameter of $G$), and then deploy a viral marketing campaign to maximize the expected profit. Our analysis and solution framework can be naturally applied to this setting.
In addition, it is interesting to look into more sophisticated methodologies to acquire knowledge on user valuations, e.g., by leveraging users' full previous transaction history, as well as to look at real datasets besides Epinions.

In Section 2.2, we discussed the connections and differences between the LT-V model proposed in this chapter and the LT-C model studied in [17]. As they model different aspects of social influence and offer orthogonal extensions to the canonical LT model, it is interesting to incorporate the new elements of both models into a combined one. Note that a common property of the two models is that they both distinguish between the state of a node being influenced and the state of actually adopting the product. Although the extensions model different aspects of social influence diffusion, in principle the two models may be integrated to obtain a mixed model. In the integrated model, termed LT-VC, a user $i$ transitions from the influenced state to adopting with probability $1 - F_i(p_i)$, namely the probability that the valuation of $i$ is at least as high as the price offered to $i$. If $i$ does not adopt the product, then she would “tattle”, i.e., still spread her opinions on the product to neighbors: she enters the “promote” state with probability $\alpha_i \cdot F_i(p_i)$ to spread positive opinions and the “inhibit” state with probability $(1 - \alpha_i) \cdot F_i(p_i)$ to spread negative opinions.

Challenges exist for solving ProMax under the new LT-VC model. First, is the expected total profit still a submodular function of the seed set given any fixed price vector, especially considering the presence of tattling behaviors as well as the existence of both positive and negative opinions? Second, will the approximation guarantee of the U-Greedy algorithm be preserved under this model?
Third, is it possible (and if so, how) to adapt the PAGE algorithm to tackle ProMax under this model?

Chapter 3
Competitive Viral Marketing: The Host Perspective

In this chapter, we propose a "viral-marketing-as-a-service" business model for social network hosts, and study viral marketing from a novel angle: the host's perspective. In particular, we solve the problem of how a social network service provider should select and allocate seeds to competing advertisers such that, first, the collective spread of influence over all advertisers is maximized, and second, seed allocations are fair: the expected reward for each advertiser is in proportion to its investment in the marketing campaign.

3.1 Introduction

The bulk of research on viral marketing and influence maximization assumes that there is one company introducing one product in the market. In other words, there is no competition. However, in the real world, multiple players typically compete with comparable products over the same market. For example, consider consumer technologies such as videogame consoles (Xbox vs. PlayStation), digital SLR cameras (Canon vs. Nikon), or smartphones (Android vs. iPhone): since the adoption of these consumer technologies is not free, it is very unlikely that an average consumer will adopt more than one of the competing products. Recognizing this, there has been some recent work on competitive viral marketing, where two or more players compete with similar products for the same market. The majority of these studies focus on the best strategy for one of the players [18, 24, 29, 31, 79, 94].

Our motivating observation is that social network platforms are owned by third parties such as Facebook and LinkedIn. The owner keeps the proprietary social graph confidential¹ for obvious business reasons, as well as due to privacy legislation. We call the owner the host. Companies that want to run viral campaigns are the host's clients.
The clients typically do not have direct access to the network and thus cannot choose seeds for their campaigns on their own. Any campaign needs the host's permission and privilege to run. Take Facebook as an example: business owners can set up a Facebook Page and create display ads or promoted posts to reach users², but they cannot effectively implement a viral marketing campaign that directly reaches individual users, due to the lack of access to the network graph and privacy concerns.

Motivated by this observation, we propose and study the novel problem of competitive viral marketing from the host perspective. We consider a new business model where the host offers viral marketing as a service, for a price. It allows the clients to run campaigns by specifying a seed budget, i.e., the number of seeds desired. The host controls the selection of seeds and their allocation to companies. Once seeds are allocated, companies compete for adopters of their products on the common network.

In classical non-competitive influence maximization, the objective is to choose the seeds so as to maximize the expected number of adopters. However, in a competitive setting, from the host's perspective, it is important not only to choose the seeds to maximize the collective expected number of adoptions across all companies, but also to allocate seeds to companies in a way that guarantees the "bang for the buck" for all companies is nearly the same.

More specifically, the bang for the buck for a company is the ratio between the expected number of adopters of its product and the number of seeds. We call this the amplification factor, as it reflects how the investment in a small number of seeds gets amplified by the network effect through propagation of social influence. If the host allocates the seeds carelessly to its clients, it may result in a wide variance in the amplification factors, leading to resentful clients. Consider the following hypothetical scenario.
Suppose Canon and Nikon are two clients with seed budgets 20 and 30 respectively, and Facebook, as host, selects 50 seeds in total. If those 50 seeds are allocated in such a way that Canon ends up with an expected spread of 400 ("bang for the buck" = 400/20 = 20), while Nikon gets 300 ("bang for the buck" = 300/30 = 10), this allocation is unfair and may leave Nikon resentful.

¹ http://techcrunch.com/2013/01/24/my-precious-social-graph/, last accessed on September 7th, 2015.
² https://www.facebook.com/business, last accessed on September 7th, 2015.

Motivated by this, we propose a new propagation model called K-LT by extending the classical Linear Threshold (LT) model [86] to capture the competitive aspect of viral marketing. Intuitively, propagation in our model consists of two phases. A node (user) is in one of three states: inactive, influenced, or active. It adopts a product only in the active state. In the first phase, inactive nodes may become influenced (to adopt a product) as a result of influence coming in from their neighbors. In the second phase, an influenced node makes its choice to adopt one of the products (i.e., becomes active) based on the relative strengths of incoming influence for different products. The model is intuitive and retains the desired properties of monotonicity and submodularity.

We then define the fair seed allocation problem, whose goal is to allocate seeds to the companies such that their amplification factors are as close to each other as possible, while the total expected number of adoptions over all companies is maximized. The problem is NP-hard and we devise an efficient and effective greedy heuristic to tackle it. To summarize, we make the following contributions:

• We study competitive viral marketing from a campaign host's perspective. We propose the K-LT propagation model and show that in our model, the expected influence spread for any individual competing product is monotone and submodular (Section 3.4.1).
• We define the problem of Fair Seed Allocation (FSA) and discuss a number of options for formalizing it. As a case study, we focus on maximizing the minimum amplification factor offered to companies (Section 3.3.2).
• We show that FSA under the K-LT model is NP-hard. To tackle this problem, we develop two exact algorithms based on dynamic programming and integer linear programming. We also propose an efficient heuristic algorithm, Needy Greedy, a natural adaptation of the classic greedy algorithm for machine scheduling [89] (Section 3.5.4).
• We conduct extensive experiments on four real-world network datasets.

3.2 Related Work

Historically, competition between two products has largely been addressed in economics. For example, in Arthur [7] and David [44], network-independent properties were employed to model the propagation of two technologies through a market. Tomochi et al. [139] offered a game-theoretic approach which relies on the network for spatial coordination games. However, they did not address the problem of how to effectively take advantage of social networks and viral influence propagation when introducing a new technology into a market.

Recent studies on competitive viral marketing mainly aimed to extend the IC or LT model. Almost all of them focused on the client perspective as opposed to the host perspective (which is the focus of our work in this chapter). In addition, some of them were restricted to just two entities. Two early papers to tackle influence maximization in a competitive setting are [18, 31], and both studied the problem from the "follower's perspective". The follower is the player trying to introduce a new product into an environment where a competing product already exists.
Both studies showed that the problem for the follower maintains the desired properties of monotonicity and submodularity, and that the approximation result in [119] is applicable.

In the model described in Bharathi and Kempe [18], when a node $u$ adopts a product at time $t$, it tries to activate each currently inactive neighbor $v$. If the activation attempt from $u$ on $v$ succeeds, $v$ will become active, adopting the same product as $u$, at time $t + T_{uv}$, where the $T_{uv}$'s are mutually independent, exponentially distributed random variables, named activation times. They are needed in order to avoid tie-breaking for simultaneous activation attempts.

Carnes et al. [31] proposed two models. In the distance-based model, each edge is also given a "length": this might be, for instance, the inverse of the probability of seeing influence traveling through that edge. At each moment, edges on which influence has traveled are called active. Let us consider two competing products A and B, and the two sets of initial adopters for A and B, denoted $I_A$ and $I_B$ respectively. For a given node $u$, we consider the shortest-path distance $d$ along active edges to any initiator. Let $u^d_A$ denote the number of initiator nodes in $I_A$ which are at distance $d$, along active edges, from $u$ (and similarly for $u^d_B$). Then $u$ adopts product A with probability $u^d_A / (u^d_A + u^d_B)$ (and similarly for product B). In the wave propagation model, propagation happens in discrete steps. In step $d$, all nodes that are at distance at most $d - 1$ from some node in either $I_A$ or $I_B$ have already adopted a product, and all nodes for which the closest initial node is farther than $d - 1$ have not adopted a product yet (where the distance is again with respect to active edges). Both models above reduce to the IC model if there is no competition. For both models, they showed that the corresponding influence function is monotone and submodular, so the simple greedy algorithm yields a $(1 - 1/e - \epsilon)$-approximation to the optimal strategy for the follower.

Kostka et al. [94] studied competitive influence diffusion under a game-theoretic framework and showed that finding the optimal strategy of both the first and the second player is NP-complete. Budak et al. [29] and He et al. [79] studied the problem of influence blocking maximization, where one entity tries to block the influence propagation of its competitor as much as possible, under extended IC and LT models, respectively. Pathak et al. [123] proposed an extension of the voter model to study multiple cascades. Chen et al. [34] studied influence maximization in the presence of negative opinions. Their model, called IC-N, extends the IC model by further dividing the active node state into two substates: positive and negative. The spread of negative opinions is characterized by a "quality factor" $q$, a parameter that models the quality of the propagating entity, as perceived by users. There is no seed set for the negative opinion; rather, any active user may become positive or negative with corresponding probabilities. They showed that when $q$ is assumed to be uniform across all users, the influence spread of the positive opinion is a monotone submodular function of the seed set. However, when this assumption is lifted, submodularity no longer holds. We note that none of the above work considered the host's perspective.

Borodin et al. [24] proposed extensions to the LT model to deal with competing products. In Section 3.3 we shall discuss their Weighted-Proportional Competitive LT (WPCLT) model in more detail and highlight the differences between WPCLT and our proposed K-LT model. More recently, Borodin et al. [23] studied viral marketing mechanism design from the host perspective. They showed that when the number of competitors is at least three, a mechanism which first uses the greedy algorithm to select seeds and then randomly assigns seeds to companies is truthful.
We shall discuss their results in more detail in Section 3.5.5.

Last but not least, Myers and Leskovec [117] analyzed Twitter data to study competition in real-world information and influence diffusion.

3.3 Models and Problem Definition

In this section we present the propagation model underlying our work and provide the problem statement. We first introduce our extended LT model (dubbed K-LT, for Linear Thresholds with K Competitors) that captures competition, and then provide conceptual justifications of the model. Then we highlight the difference between K-LT and the Weighted-Proportional Competitive LT (WPCLT) model by Borodin et al. [24].

3.3.1 The K-LT Diffusion Model

Let $K$ denote the number of competing companies. Let $C_i$ and $S_i$, $i \in \{1, 2, \ldots, K\}$, denote the $i$-th company and its seed set respectively. Each node $v \in V$ selects an activation threshold $\theta_v$ uniformly at random from $[0, 1]$. Initially, all nodes are inactive. At time step 0, for each company $C_i$, a seed set $S_i$ is targeted. This means that if $u \in S_i$, then $u$ becomes active with respect to $C_i$ at time 0. We also assume that all seed sets are disjoint, i.e., $S_i \cap S_j = \emptyset$ whenever $i \neq j$. Moreover, since it is a competitive model, each node adopts at most one company's product.

At any subsequent time step $t \geq 1$, the activation of a node takes place in two phases. First, an inactive node $v$ becomes influenced when the total influence weight from its active in-neighbors (regardless of which company) reaches $v$'s threshold:
\[ \sum_{\text{active } u \in N^{in}(v)} p_{u,v} \geq \theta_v . \]
Then, in a second phase (still at time $t$), $v$ becomes active by picking one company out of those of its in-neighbors activated at $t - 1$.

Let $A^i_{t-1}$ denote the set of nodes that are active with company $C_i$ at the end of time $t - 1$, and let $A_{t-1}$ denote the set of nodes that are active at the end of time $t - 1$ w.r.t. any company.
Hence, $v$ becomes active at time $t$ with company $C_i$ with probability
\[ \frac{\sum_{u \in A^i_{t-1} \setminus A^i_{t-2}} p_{u,v}}{\sum_{u \in A_{t-1} \setminus A_{t-2}} p_{u,v}} . \]
Once a node becomes active, it remains so and will not switch company. The diffusion process continues until no more nodes can be activated.

The K-LT model reflects several phenomena of competitive influence propagation that match our daily experience as well as studies in the literature. The first phase models the threshold behavior in influence propagation as in the original LT model, and the second phase incorporates the recency effect in the final decision among competing products. Indeed, it has been recognized in various studies that influence decays very quickly in time, and thus customers are more likely to rely on recent information than on old information when choosing which product to adopt [80, 124, 151].

Next, we introduce a critical invariance property w.r.t. influence spread under this model.

Influence spread functions. Let $\mathbf{S} = \{S_1, \ldots, S_K\}$ be the set of seed sets for the various companies, i.e., $\mathbf{S}$ corresponds to a seed set allocation. We use $S_{-i}$ to denote the set of seed sets for all companies but company $C_i$, i.e., $S_{-i} =_{def} \{S_1, \ldots, S_{i-1}, S_{i+1}, \ldots, S_K\}$.

Definition 1 (Expected Spread). For each company $C_i$, let $\sigma_i(S_i, S_{-i})$ denote the expected number of active nodes, or the expected spread, of $C_i$, given seed set allocation $\mathbf{S}$. We define the overall expected spread, denoted $\sigma_{all} =_{def} \sum_{i=1}^{K} \sigma_i(S_i, S_{-i})$, to be the expected number of active nodes w.r.t. any company.

Observe that under both the K-LT and WPCLT models, the first phase of activation follows the same activation condition as in the classic LT model. Therefore, we have the following proposition.

Proposition 1. Given a directed graph $G = (V, E)$ with edge weights, and $K$ pairwise disjoint subsets $S_1, S_2, \ldots, S_K$ of $V$, then under both the K-LT model and the WPCLT model, letting $S = S_1 \cup \ldots \cup S_K$, we have
\[ \sigma_{all} = \sigma_{LT}(S), \tag{3.1} \]
where $\sigma_{LT}$ is the spread function for the classical LT model.

This implies that once a union seed set $S$ is given, no matter how it gets partitioned (into $K$ disjoint subsets), $\sigma_{all}$ remains the same. In other words, the total influence spread is invariant w.r.t. any $K$-partition of $S$.

Comparisons with the WPCLT model. In the WPCLT model proposed by Borodin et al. [24], the first phase of activation is exactly the same as in K-LT. The difference lies in the second phase, i.e., the way in which newly influenced nodes decide on the company. In WPCLT, a node $v$ picks a certain $C_i$ with probability equal to the ratio between the total weight from the $C_i$-active in-neighbors and that from all active in-neighbors. That is, all past exposures are taken into account for adoption. Thus, $v$ becomes active with company $C_i$ with probability
\[ \frac{\sum_{u \in A^i_{t-1}} p_{u,v}}{\sum_{u \in A_{t-1}} p_{u,v}} . \]

As we shall show in Theorem 4, $\sigma_i(S_i, S_{-i})$ is monotone and submodular w.r.t. $S_i$ under the K-LT model, while this result does not hold in WPCLT, which is somewhat counter-intuitive, as noted by the authors of [24]. Indeed, $\sigma_i(S_i, S_{-i})$ being non-monotone means that adding a new seed $x$ to $S_i$ may cause the spread of $C_i$ to go down. This counter-intuitive phenomenon can be explained by the possibility that a certain graph structure allows the seeding of some nodes to trigger multiple "activation attempts" for seeds of a different company, as shown in Example 1 below. For more detailed examples that illustrate non-monotonicity and non-submodularity of the WPCLT model, we refer the reader to [24].

Example 1 (Activation in WPCLT). Consider Figure 3.1. Suppose that there are two companies with seed sets $S_1 = \{u\}$ and $S_2 = \{w\}$. Also suppose that $\theta_v$ and $\theta_x$ fall into the interval $(0.5, 1)$. At time step 1, $v$ becomes active w.r.t. company 2 (as $p_{w,v} = 1 > \theta_v$), while $x$ remains inactive (as $p_{u,x} = 0.5 < \theta_x$). Subsequently, at time step 2, $x$ first gets influenced, as the total incoming influence weight is now 1.
Then, $x$ will activate w.r.t. company 1 with probability 0.5 and company 2 with probability 0.5.

[Figure 3.1: Sample graph accompanying Example 1 (edges $u \to x$ with weight 0.5, $v \to x$ with weight 0.5, and $w \to v$ with weight 1.0).]

In this example, although $u$ (in company 1) fails to activate $x$ at time step 1, $x$ may still adopt company 1 under WPCLT. The reason is that $x$ gets additional influence from $v$, which has adopted company 2. Thus, seeding $w$ for company 2 ends up "helping" the competitor, company 1: $u$ gets a second chance at activating $x$ after failing at first. However, this phenomenon will not occur in K-LT: at time step 2, after getting influenced, $x$ will activate w.r.t. company 2 exclusively, with probability $0.5/0.5 = 1$.

3.3.2 Problem Definition

We are ready to provide the formal problem statement of fair competitive viral marketing from the host perspective. We will focus on the K-LT model hereinafter, unless otherwise specified. Assume that there are $K$ companies, as clients of the host $H$, competing with similar products (one product each). Before the campaign is run, each company $C_i$ approaches the host, specifying a positive integer $b_i$ as its budget (the maximum number of seeds wanted), and we assume that $b_1 + b_2 + \ldots + b_K < |V|$. As its business model, $H$ charges every company a fixed amount of money per requested seed, as well as a surcharge proportional to the expected spread achieved. Before defining the problem, we first introduce the important notion of amplification factor.

Definition 2 (Amplification Factor). The amplification factor of $C_i$, denoted $\alpha_i$, is the average influence spread that $C_i$ gets per seed, i.e.,
\[ \alpha_i = \frac{\sigma_i(S_i, S_{-i})}{b_i} . \tag{3.2} \]

Intuitively, after receiving budgets from all companies, $H$ will allocate each company $C_i$ a seed set $S_i$, $|S_i| = b_i$, such that

1. The overall influence spread $\sigma_{all}$ (Definition 1) is maximized;
2. Given that the overall influence spread is maximized, the allocation of seeds to the $K$ participating companies is done in such a way that the expected influence spread across all companies is as "balanced" as possible, i.e., the amplification factors of the companies are as close to each other as possible. In other words, the allocation of seeds should be fair.

Note that pursuing the second objective (fair seed allocation) does not conflict with the first objective (maximizing the total influence spread), due to the invariance property of the total influence spread under the K-LT model, as stated in Proposition 1.

Formally, we define the problem of competitive influence maximization from the host's perspective, which consists of two subproblems, as follows.

Problem 2 (Overall Influence Maximization). Given a directed graph $G = (V, E)$ with pairwise edge weights and numbers $b_1, b_2, \ldots, b_K \in \mathbb{Z}_+$ with $\sum_{i=1}^{K} b_i \leq |V|$, select a seed set $S \subseteq V$ of size $\sum_{i=1}^{K} b_i$ such that $\sigma_{all}$ is maximized.

By Proposition 1, under both the K-LT and WPCLT models, Problem 2 is equivalent to the original single-company influence maximization problem under the classical LT model, and hence it is NP-hard.

By the same token, since $\sigma_{LT}(\cdot)$ is monotone and submodular [86], the set of seeds $S$ can be selected using the classic greedy algorithm outlined in the introduction for the original LT model, giving a $(1 - 1/e - \epsilon)$-approximation to the optimal selection of seeds. Formally, we have:

Corollary 1. For an arbitrary instance of the K-LT model, Algorithm 1 provides a $(1 - 1/e - \epsilon)$-approximation for Problem 2.

Corollary 2. For an arbitrary instance of the WPCLT model, Algorithm 1 provides a $(1 - 1/e - \epsilon)$-approximation for Problem 2.

In the rest of this chapter, we assume that the union (global) seed set is selected by the greedy algorithm, and we shall focus on its fair allocation to the $K$ participating companies.

The goal of our second problem is to allocate seeds among the $K$ clients such that the amplification factors of all companies are as close as possible, to achieve maximum fairness. We have various options to formalize the notion of fairness. In the following problem statement and hereinafter, we adopt maximizing the minimum amplification factor as the objective. Intuitively, when the minimum amplification factor is maximized, it balances out all the amplification factors. A discussion of other alternatives is provided in Section 3.3.3.

Problem 3 (Fair Seed Allocation (FSA)). Given a directed graph $G = (V, E)$ with pairwise edge weights, numbers $b_1, b_2, \ldots, b_K \in \mathbb{Z}_+$, and a set $S \subseteq V$ with $|S| = \sum_{i=1}^{K} b_i$, find a partition of $S$ into $K$ disjoint subsets $S_1, S_2, \ldots, S_K \subseteq S$ such that $|S_i| = b_i$, $i \in [1, K]$, and the minimum amplification factor of any company (color) is maximized:
\[ \operatorname*{arg\,max}_{S_1, \ldots, S_K} \; \min_{i = 1, \ldots, K} \frac{\sigma_i(S_i, S_{-i})}{b_i} . \tag{3.3} \]

It is worth emphasizing that although the two problems are formulated separately, the host $H$ must solve them in sequential order to achieve its goals. In particular, notice that the output of Problem 2, i.e., the union seed set $S$, is given as input to Problem 3.

3.3.3 Possible Alternative Objectives

Our goal in partitioning the union seed set $S$ is to make the amplification factors of all advertisers as close as possible, so that the allocation is deemed fair. To achieve this goal, in Problem 3, we defined the objective function as maximizing the minimum amplification factor. One can offer similar alternative objective functions while trying to achieve the same goal. For instance, we may try to minimize the difference $\alpha_{\max} - \alpha_{\min}$, or the ratio $\alpha_{\max} / \alpha_{\min}$.
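To make these fairness measures concrete, the following minimal Python sketch evaluates the amplification factors of Definition 2, the max-min objective of Problem 3, and the difference and ratio alternatives for a given allocation. The function name `fairness_report` and its input format are illustrative assumptions, not part of the original formulation; the numbers reuse the hypothetical Canon/Nikon scenario from Section 3.1.

```python
def fairness_report(spreads, budgets):
    """Summarize how fair an allocation is.

    spreads[i] is the expected spread sigma_i of company i and
    budgets[i] is its seed budget b_i (hypothetical inputs).
    """
    if not spreads or len(spreads) != len(budgets):
        raise ValueError("need one spread and one budget per company")
    # Amplification factor alpha_i = sigma_i / b_i (Definition 2).
    alphas = [s / b for s, b in zip(spreads, budgets)]
    a_min, a_max = min(alphas), max(alphas)
    return {
        "alphas": alphas,
        "min": a_min,                      # Problem 3 maximizes this
        "difference": a_max - a_min,       # alternative objective, to minimize
        "ratio": a_max / a_min if a_min > 0 else float("inf"),  # to minimize
    }


# Canon: spread 400 with 20 seeds; Nikon: spread 300 with 30 seeds.
report = fairness_report([400, 300], [20, 30])
print(report)
```

A perfectly fair allocation would give a difference of 0 and a ratio of 1; in this scenario the amplification factors are 20 and 10, so both measures flag the imbalance.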
More sophisticated objective functions can be based on the $L_p$-norm. In general, the objective function based on the $L_p$-norm can be defined as
\[ \left( \sum_{i=1}^{K} \left| \sigma_i(S_i, S_{-i}) - \frac{b_i}{B} \, \sigma_{all} \right|^{p} \right)^{1/p} , \tag{3.4} \]
which one may want to minimize. A comprehensive theoretical analysis of these various objective functions would be an interesting exercise, but it is not the focus of this chapter. In the experiments section, we show that our algorithm performs well w.r.t. essentially all of these objectives.

3.4 Model Properties

Before we develop algorithms to solve Problem 3, we take a deeper look at the properties of the K-LT model, which will allow us to characterize the complexity of FSA under K-LT and develop efficient and effective seed allocation algorithms.

3.4.1 Submodularity

We first show that the expected spread function for individual colors is monotone and submodular (Theorem 4) in the K-LT model. To prove this result, we follow an approach similar to the one in Kempe et al. [86], by establishing the equivalence between the K-LT model and a competitive version of the "live-edge" model (Definition 3). This, importantly, will in turn help us derive a closed-form expression for the spread function (Theorem 5), which will play a pivotal role in the design of our algorithms and in characterizing the complexity of FSA: it is NP-hard in general (Theorem 6), but can be solved in pseudo-polynomial time for $K = 2$.

We start by introducing the competitive live-edge model, which extends the live-edge model defined in [86].

Definition 3 (Competitive Live-Edge Model). Given a directed graph $G = (V, E)$ with edges labeled by influence weights, we can obtain a possible world $X$ as follows. Each node $v$ picks at most one of its incoming edges at random, selecting edge $(u, v)$ with probability $p_{u,v}$ and selecting no edge with probability $1 - \sum_{w \in N^{in}(v)} p_{w,v}$. The selected edges are declared "live", while the others are "blocked". By definition, incoming edges to nodes in the seed set $S$ are blocked.
We call a directed path a live-edge path if it consists entirely of live edges.

In a possible world $X$, we say a node $v$ is $C_i$-reachable if there exists a live-edge path from a node in $S_i$ to $v$. Note that a node $v$ has at most one incoming live edge, thus there is at most one live-edge path from $S$ to $v$. Thus, the notion of color reachability is well-defined.

It is easy to see that the spread function under the competitive live-edge model is monotone and submodular. Clearly, each possible world $X$ is a deterministic graph. Let $R_X(\{u\})$ be the set of nodes reachable from a particular node $u$ on live-edge paths in $X$. Then the set of nodes reachable from $S_i$ is $R_X(S_i) = \bigcup_{u \in S_i} R_X(\{u\})$. The function $|R_X(S_i)|$ is clearly monotone and submodular. Finally, the expected number of $C_i$-reachable nodes according to the live-edge model, $\sum_X \Pr[X] \cdot |R_X(S_i)|$, is a non-negative linear combination of monotone submodular functions, and thus is monotone and submodular (in $S_i$). Here, $\Pr[X]$ is the probability of the possible world $X$, which is determined by the choice of live/blocked edges.

We now state the submodularity result for K-LT:

Theorem 4. Under the K-LT model, for any color $C_i$, the expected influence spread $\sigma_i(S_i, S_{-i})$ is monotone and submodular in $S_i$, for any fixed $S_{-i}$ such that all seed sets are pairwise disjoint.

Proof. We prove this result by establishing the equivalence between the K-LT model and the competitive live-edge model (Definition 3). We shall prove the following claim: given $K$ colors and their corresponding seed sets $S_1, S_2, \ldots, S_K$ (all disjoint), for any color $C_i$, the following two distributions over sets of nodes are equivalent: (i) the distribution over $C_i$-active sets obtained by running the K-LT process to completion from $S_1, S_2, \ldots, S_K$, and (ii) the distribution over sets of $C_i$-reachable nodes according to the live-edge model. The theorem follows from this claim.

We next prove the claim.
If a node $v$ has not become active after time step $t$, then the probability that it becomes $C_i$-active at time step $t + 1$ is
\[ \frac{\sum_{u \in A_t \setminus A_{t-1}} p_{u,v}}{1 - \sum_{u \in A_{t-1}} p_{u,v}} \cdot \frac{\sum_{u \in A^i_t \setminus A^i_{t-1}} p_{u,v}}{\sum_{u \in A_t \setminus A_{t-1}} p_{u,v}} = \frac{\sum_{u \in A^i_t \setminus A^i_{t-1}} p_{u,v}}{1 - \sum_{u \in A_{t-1}} p_{u,v}} , \]
where the first factor is the probability that $v$ becomes active at $t + 1$, and the second factor is the probability that $v$ adopts color $C_i$, given that $v$ gets activated.

For the competitive live-edge model, we start the "reach-out" process with seed sets $S_1, S_2, \ldots, S_K$. In the first stage, if a node $v$'s selected live edge is incident on $S_i$, then $v$ is $C_i$-reachable from a seed in $S_i$. We denote the set of such nodes by ${A'}^{i}_{1}$. In general, let ${A'}^{i}_{t}$ denote the set of nodes which are found to be $C_i$-reachable from a node in $S_i$ in stage $t$. In this way, we can obtain sets ${A'}^{i}_{2}, {A'}^{i}_{3}, \ldots$. Similarly, we can also obtain sets $A'_t$, $t = 1, 2, 3, \ldots$, which represent the sets of nodes reachable from $S_1 \cup S_2 \cup \ldots \cup S_K$ in stage $t$. Now, if a node $v$ has not yet been determined $C_i$-reachable by the end of stage $t$, then the probability that $v$ will be determined reachable at stage $t + 1$ is the chance that its chosen edge is from $A'_t \setminus A'_{t-1}$, which is
\[ \frac{\sum_{u \in A'_t \setminus A'_{t-1}} p_{u,v}}{1 - \sum_{u \in A'_{t-1}} p_{u,v}} . \]
Given that, the probability that $v$ proceeds to become $C_i$-reachable is
\[ \frac{\sum_{u \in {A'}^{i}_{t} \setminus {A'}^{i}_{t-1}} p_{u,v}}{\sum_{u \in A'_t \setminus A'_{t-1}} p_{u,v}} . \]
By the product rule, the probability that $v$ will be determined to be $C_i$-reachable at stage $t + 1$, given that it is not already so determined, is
\[ \frac{\sum_{u \in {A'}^{i}_{t} \setminus {A'}^{i}_{t-1}} p_{u,v}}{1 - \sum_{u \in A'_{t-1}} p_{u,v}} . \]
Applying induction on time steps (stages), it is easy to see that the distributions over $A^i_t$ and ${A'}^{i}_{t}$ are identical, and the same holds for $A_t$ and $A'_t$, $\forall t$.

3.4.2 Closed-form Expression for Influence Spread

We first introduce the needed notation. By virtue of the equivalence shown in Theorem 4, $\sigma_i(S_i, S_{-i})$ is equal to the expected number of $C_i$-reachable nodes under the competitive live-edge model. Let $X$ be a possible world. For simplicity, we write $V - S$ for $V \setminus S$ and $V - S + u$ for $(V \setminus S) \cup \{u\}$ hereinafter.
With node-sets as superscripts, we denote the corresponding induced subgraph: e.g., $\sigma^{W}_{LT}(S)$, where $W \subseteq V$, denotes the expected spread of the seed set $S$ in the subgraph of $G$ induced by the nodes $W$. When there is no superscript, the entire graph $G$ is meant by default.

We now derive the closed-form expression by establishing connections to the classical LT model. Let $I^{V - S_{-i}}_{X}(S_i, v)$ be the indicator function which takes value 1 if there exists a node $s$ in $S_i$ and a path from $s$ to $v$ in a possible world $X$ for the subgraph of $G$ induced on $V - S_{-i}$ (otherwise the function takes value 0). Thus, by definition,
\[ \sigma_i(S_i, S_{-i}) = \sum_{X} \Pr[X] \cdot \sigma_{i,X}(S_i, S_{-i}), \]
where $\sigma_{i,X}(S_i, S_{-i})$ is the number of $C_i$-reachable nodes in possible world $X$. Then, because any live-edge path from any node $u \in S_i$ to $v$ must not go through any node $w \in S_{-i}$, as all incoming edges to nodes in $S_{-i}$ are blocked by definition of the live-edge model (in other words, it has the effect of removing the nodes in $S_{-i}$ from $G$ and hence from the possible world $X$), we have
\[ \sigma_i(S_i, S_{-i}) = \sum_{X} \Pr[X] \cdot \sum_{v \in V} I^{V - S_{-i}}_{X}(S_i, v). \]

[Figure 3.2: An example graph for illustrating adjusted marginal gain (nodes $u_1$, $u_2$, $u_3$; edge weights 0.4, 0.3, 0.2, 0.1, 0.5).]

Let $W = V - S_{-i}$, the set of nodes after removing the nodes in $S_{-i}$. Then, by switching the summations, we have
\[ \sigma_i(S_i, S_{-i}) = \sum_{v \in V} \sum_{X} \Pr[X] \cdot I^{W}_{X}(S_i, v) = \sum_{v \in V} \Upsilon^{W}_{S_i, v} , \tag{3.5} \]
where $\Upsilon^{W}_{S_i, v}$ is the probability that there exists a path from $S_i$ to $v$ in the subgraph induced by $V - S_{-i}$. Since $S_i$ is the seed set for company $C_i$, it also denotes the probability that $v$ becomes $C_i$-active on the corresponding subgraph. Note that the indicator function depends only on the seed set $S_i$ and the subgraph $W$, and not on how the seeds are distributed among the other colors. Therefore, $\Upsilon^{W}_{S_i, v}$ is equal to the probability that $v$ is activated in the subgraph induced by $W = V - S_{-i}$, under the classical LT model, with seed set $S_i$.

3.4.3 Adjusted Marginal Gain

Next, we introduce the notion of adjusted marginal gain, which is key to solving Problem 3.

Definition 4 (Adjusted Marginal Gain). Given a set $S$ of seeds, for any $u \in S$, the adjusted marginal gain of $u$, denoted $\delta_u$, is the expected spread of influence of $\{u\}$ on the graph induced by $V - S + u$ under the classical LT model. That is, $\delta_u = \sigma^{V - S + u}_{LT}(\{u\})$.

Consider the example in Figure 3.2. Suppose $S = \{u_1, u_2\}$ is the seed set. Then, one can verify that $\delta_{u_1}$ is the expected spread of $u_1$ on the graph consisting of $u_1$ and $u_3$ only, which is $1 + 0.3 = 1.3$.

Next, we show the following useful result for the K-LT model, which says that given a set of seeds $S$ selected by the host, the expected spread for company $C_i$ depends only on the seeds $S_i$ allocated to it, and not on how the remaining seeds $S - S_i$ are distributed among the other companies.

Theorem 5. Consider an allocation of seed sets, where the seed set $S_i \subseteq S$ is assigned to company $C_i$ and the remaining seeds $S - S_i$ are allocated arbitrarily to the other companies (denoted by $S_{-i}$). Then under the K-LT model,
\[ \sigma_i(S_i, S_{-i}) = \sum_{u \in S_i} \delta_u . \tag{3.6} \]

Proof. Consider the right-hand side of the equation. Since $S$ is the set of all seeds, that is, $S = S_i \cup S_{-i}$, we have by Definition 4,
\[ \sum_{u \in S_i} \delta_u = \sum_{u \in S_i} \sigma^{V - S_i - S_{-i} + u}_{LT}(\{u\}) = \sum_{u \in S_i} \sum_{v \in V} \Upsilon^{V - S_i - S_{-i} + u}_{u, v} , \]
where $\Upsilon^{V - S_i - S_{-i} + u}_{u, v}$ is the probability with which $v$ is activated given seed set $\{u\}$, on the subgraph induced by the nodes $V - S_i - S_{-i} + u$, under the LT model. We next make use of the proof of Theorem 1 of [72]. There, it is shown that, under the LT model,
\[ \Upsilon^{W}_{S_i, v} = \sum_{u \in S_i} \Upsilon^{W - S_i + u}_{u, v} \]
for any set $S_i \subseteq W \subseteq V$, where $\Upsilon^{W}_{S_i, v}$ is the probability that $v$ becomes active, given seed set $S_i$, on the subgraph induced by $W$, and $\Upsilon^{W - S_i + u}_{u, v}$ is the corresponding probability for seed set $\{u\}$ on the subgraph induced by the nodes $W - S_i + u$. Let $W = V - S_{-i}$; then by switching the summations and applying this result, we get
\[ \sum_{u \in S_i} \delta_u = \sum_{v \in V} \Upsilon^{W}_{S_i, v} . \]
From the equivalence with the live-edge model and Eq. 3.5, the theorem follows.

Consider again the example shown in Figure 3.2. Suppose there are two companies with $S_1 = \{u_1\}$ and $S_2 = \{u_2\}$. Then, $\sigma_1(S_1, S_{-1}) = \delta_{u_1} = 1.3$. Similarly, $\sigma_2(S_2, S_{-2}) = \delta_{u_2} = 1.5$. Also, note that $\sigma_{all} = \sigma_{LT}(\{u_1, u_2\}) = 2.8$.

3.5 Fair Seed Allocation Algorithms

In this section, we first show that Fair Seed Allocation is NP-hard.
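The algorithms in this section take the adjusted marginal gains of Definition 4 as input. As a sanity check of Theorem 5, the following Python sketch computes them exactly on a tiny graph by enumerating live-edge possible worlds. It is an illustration under stated assumptions, not the computation used in the experiments: the names `lt_spread` and `adjusted_gain` and the edge-dictionary format are hypothetical, and the toy graph contains only the two edges whose weights are implied by the arithmetic of the Figure 3.2 example ($u_1 \to u_3$ with weight 0.3 and $u_2 \to u_3$ with weight 0.5); the remaining edges of the figure are not modeled.

```python
from itertools import product


def lt_spread(nodes, edges, seeds):
    """Exact expected LT spread of `seeds` on the subgraph induced by `nodes`.

    Enumerates every live-edge possible world, so it is only feasible
    for very small graphs. `edges` maps (source, target) to a weight.
    """
    nodes, seeds = set(nodes), set(seeds)
    free = sorted(nodes - seeds)          # seeds keep no incoming live edge
    choices = []
    for v in free:
        ins = [(u, w) for (u, t), w in edges.items() if t == v and u in nodes]
        no_edge = 1.0 - sum(w for _, w in ins)
        choices.append(ins + [(None, no_edge)])
    expected = 0.0
    for world in product(*choices):
        prob = 1.0
        parent = {}
        for v, (u, w) in zip(free, world):
            prob *= w
            parent[v] = u
        reached = len(seeds)
        for v in free:
            seen, cur = set(), v
            # Follow the unique live in-edge backwards until a seed,
            # a dead end (None), or a cycle is found.
            while cur is not None and cur not in seeds and cur not in seen:
                seen.add(cur)
                cur = parent[cur]
            if cur in seeds:
                reached += 1
        expected += prob * reached
    return expected


def adjusted_gain(u, nodes, edges, seed_set):
    """delta_u: LT spread of {u} on the subgraph induced by V - S + u."""
    induced = (set(nodes) - set(seed_set)) | {u}
    return lt_spread(induced, edges, {u})


# Assumed toy graph: only the two edges pinned down by the example's numbers.
nodes = {"u1", "u2", "u3"}
edges = {("u1", "u3"): 0.3, ("u2", "u3"): 0.5}
S = {"u1", "u2"}
gains = {u: adjusted_gain(u, nodes, edges, S) for u in sorted(S)}
print(gains, lt_spread(nodes, edges, S))
```

On this toy graph the two adjusted marginal gains add up to the LT spread of the union seed set, which is what Theorem 5 and Proposition 1 together predict for any partition of $S$.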
We then design several algorithms to tackle this problem, including two exact algorithms (Dynamic Programming and Integer Linear Programming) as well as a greedy heuristic called Needy-Greedy. All three algorithms leverage Theorem 5, which says that given the union seed set $S$, the expected spread of $C_i$ is solely determined by the seeds $S_i \subseteq S$ that are allocated to company $C_i$, and it can be calculated by taking the sum of the adjusted marginal gains of the seeds in $S_i$ in appropriate subgraphs. Therefore, the input to those algorithms will be the union seed set $S$ chosen by the greedy algorithm (Algorithm 1), the adjusted marginal gain $\delta_u$ for all $u \in S$, and the budget $b_i$ for each company $C_i$.

3.5.1 Hardness Results

Having established the notion of adjusted marginal gain, we first show that Fair Seed Allocation is NP-hard.

Theorem 6. Fair Seed Allocation is NP-hard under the K-LT model.

Proof. We prove the theorem by reduction from 3-PARTITION [59]. In 3-PARTITION, we are given a set $A$ of $3m$ elements, and a size $s(a) \in \mathbb{Z}^{+}$ for each element. Let $Y$ be the sum of sizes of all elements, i.e., $Y = \sum_{a \in A} s(a)$. The question is whether there exists a partition of $A$ into $m$ disjoint subsets $A_1, A_2, \ldots, A_m$, each with exactly 3 elements, such that the sum of sizes of elements in each subset is the same, i.e., $\sum_{a \in A_i} s(a) = Y/m$. This problem is known to be strongly NP-hard [59]. Recall that a problem is strongly NP-hard if it remains NP-hard even when the numerical parameters of the problem are bounded by a polynomial of the input size. In the context of 3-PARTITION, it implies that the problem remains NP-hard even when $Y$ is bounded by a polynomial in $m$.

Let $I$ be an arbitrary instance of 3-PARTITION. We reduce it to an instance $J$ of FSA as follows. First, create $m$ companies, and for each element $a \in A$ with size $s(a)$, create a seed $u_a$ in instance $J$, with its adjusted marginal gain set to $\delta_{u_a} := s(a)$. Then we set the budget of each company to 3.
Suppose there exists a polynomial time algorithm $\mathcal{A}$ that provides an optimal solution to FSA. Then by running this algorithm on instance $J$ and checking whether the minimum amplification factor is equal to $Y/(3m)$, we can separate the YES-instances from the NO-instances of 3-PARTITION, which is not possible unless P = NP.

In the above, we performed the reduction entirely in terms of adjusted marginal gains, instead of creating a graph, which is a required input to FSA. It is easy to create an input graph whose seed nodes $u_a$ satisfy the adjusted marginal gains above. E.g., create $3m$ disjoint trees, each rooted at a node $u_a$. The root $u_a$ has exactly $s(a) - 1$ children, with influence weights on all edges set to 1. Since the trees are disjoint, $\delta_{u_a} = s(a)$. Notice this reduction is polynomial time in $m$ since $Y$ is a polynomial in $m$.

Inapproximability. We note that the approximability of FSA is an open problem for the max-min objective as defined in Problem 3. However, if the problem is defined using the $L_p$-norm (Equation (3.4)), then FSA is trivially inapproximable within any factor because the optimal solution in this case may yield zero. The same result holds if the objective is to minimize $\alpha_{max} - \alpha_{min}$, which may also be zero.

Special Case. When $K = 2$, Fair Seed Allocation resembles the PARTITION problem, which is weakly NP-hard and admits an exact dynamic programming algorithm in pseudo-polynomial time. We can adapt it to solve Fair Seed Allocation, as described in the next subsection.

3.5.2 Dynamic Programming

In the special case where $K = 2$, the FSA problem can be solved by a dynamic programming (DP) algorithm in polynomial time. If all adjusted marginal gains are integers, then this approach yields an optimal solution. For instances with real-valued adjusted marginal gains, a standard technique is to multiply all input numbers by a factor of $10^d$ and round to the nearest integer.
In such cases, the dynamic programming algorithm is optimal w.r.t. the resultant precision, but is not guaranteed to be optimal w.r.t. the original real-valued instance. Generally, the loss of accuracy should be insignificant, as we empirically verified in Section 3.6.1 by comparing the output of dynamic programming with that of integer linear programming (to be proposed in the next subsection, which by definition solves the problem optimally for any input instance). Note that there is a trade-off between accuracy and complexity: as $d$ increases, accuracy improves but the size of the dynamic programming table also grows, which in turn translates into higher time and space complexity.

The dynamic programming algorithm is set up as follows. Let the seed set be $S = \{u_1, u_2, \ldots, u_B\}$, where $B = b_1 + b_2$. Also let $S_j$ denote the "partial" seed set $\{u_1, \ldots, u_j\}$ for $j \in \{1, 2, \ldots, B\}$. Then, we define

\[ P(j, \mu, \ell) = \begin{cases} 1, & \text{if } \exists Q \subseteq S_j : |Q| = \ell \text{ and } \sigma_1(Q, S_j - Q) = \mu, \\ 0, & \text{otherwise.} \end{cases} \]

Here, the variable $j$ keeps track of which seeds in $S$ have been explored, while $\ell$ tracks the size of a seed set $Q$ such that, with $Q$ allocated to $C_1$ and $S_j - Q$ allocated to $C_2$, the spread $\sigma_1(Q, S_j - Q)$ is exactly $\mu$. The size of $Q$ is bounded by $b_1$, the budget of $C_1$.

To derive the dynamic programming formula, a key observation is that in order to have a subset of $S_j$ of size $\ell$ yielding influence spread $\mu$ for $C_1$, at least one of the following must be true:

1. There is a subset $Q \subseteq S_{j-1}$ of size $\ell$, which when allocated to $C_1$, yields spread $\mu$;
2. There is a subset of $S_{j-1}$ of size $\ell - 1$, which does not give spread $\mu$ for $C_1$ itself, but will if we add $u_j$ to $C_1$'s allocation.

More formally, $P(j, \mu, \ell) = 1$ if $P(j-1, \mu, \ell) = 1$, or $P(j-1, \mu - \delta_{u_j}, \ell - 1) = 1$. This gives the following dynamic programming equation:

\[ P(j, \mu, \ell) = \max\{ P(j-1, \mu, \ell),\; P(j-1, \mu - \delta_{u_j}, \ell - 1) \}. \]

The base case is $P(0, 0, 0) = P(1, 0, 0) = 1$.
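The recurrence can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the function name `dp_two_companies`, the forward filling of reachable states, and the dictionary-based table keyed by $(\mu, \ell)$ are all choices made here. The final step, which picks the reachable spread closest to the proportional share $b_1/(b_1+b_2)$ of the total, is one natural way to read an allocation off the table.

```python
def dp_two_companies(gains, b1, d=1):
    """Split seeds between C1 (budget b1) and C2 (all remaining seeds).

    gains: adjusted marginal gains, one per seed of the union seed set.
    d:     gains are multiplied by 10**d and rounded to integers.
    Returns (indices of C1's seeds, indices of C2's seeds).
    """
    scaled = [round(g * 10 ** d) for g in gains]
    B = len(scaled)
    assert 0 <= b1 <= B

    # reach[j] holds a key (mu, l) exactly when P(j, mu, l) = 1.
    # The value records whether seed u_j was given to C1 on the way there.
    reach = [{(0, 0): None}]
    for j in range(1, B + 1):
        g = scaled[j - 1]
        cur = {}
        for (mu, l) in reach[j - 1]:
            cur.setdefault((mu, l), False)             # u_j goes to C2
            if l < b1:                                 # |Q| is bounded by b1
                cur.setdefault((mu + g, l + 1), True)  # u_j goes to C1
        reach.append(cur)

    # Among states with exactly b1 seeds, pick the spread closest to the
    # proportional share b1/B of the total (compared in integers).
    total = sum(scaled)
    finals = [mu for (mu, l) in reach[B] if l == b1]
    best = min(finals, key=lambda mu: abs(mu * B - b1 * total))

    # Walk the stored decisions backwards to recover C1's seed set.
    s1, mu, l = [], best, b1
    for j in range(B, 0, -1):
        if reach[j][(mu, l)]:
            s1.append(j - 1)
            mu, l = mu - scaled[j - 1], l - 1
    s1.reverse()
    taken = set(s1)
    s2 = [i for i in range(B) if i not in taken]
    return s1, s2
```

Storing only the reachable states keeps memory proportional to the number of non-zero cells, at the cost of keeping one dictionary per seed so that the allocation can be recovered.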
Whenever the context is clear, we refer to $P(\cdot, \cdot, \cdot)$ as the "DP table".

The spread of $C_1$ would be equal to $Z = \frac{b_1}{b_1 + b_2} \sigma_{all}$ in the theoretically best allocation. Thus, after all non-zero cells in the DP table are populated, to obtain the actual seed set partition we can set the target to be the number $t$ obtained by scaling (by the same factor $10^d$) and rounding the number $Z$. If $P(B, t, b_1) = 1$ then we have found this ideal partition. Otherwise, we search for $t'$ such that $P(B, t', b_1) = 1$ and $|t - t'|$ is minimized. The subset that satisfies $P(B, t', b_1) = 1$ is then set to be $S_1$, and the seed set of the other company is $S_2 = S \setminus S_1$. By construction, $|S_2| = B - b_1 = b_2$.

Time complexity. From the ranges for $j$, $\mu$, and $\ell$, the size of the DP table is $O(b_1 (b_1 + b_2) |V|)$. This in turn determines the running time. Note that typically, $b_1$ and $b_2$ are much smaller than $|V|$. In our implementation, we apply a couple of optimizations. First, there is no need to populate cells with $\ell > j$. Second, if $\mu < \delta_{u_j}$, there is no need to examine the second argument in the RHS of the dynamic programming equation.

We also note that the DP table can be represented using sparse data structures instead of a full 3-dimensional array. This would avoid storing any zeros in memory, and thus can reduce memory usage and make the implementation more scalable. For example, a hashing-based set can be used, where the keys are triples $(j, \mu, \ell)$ for which $P(j, \mu, \ell) = 1$.

3.5.3 Integer Linear Programming

When $K > 2$, the dynamic programming algorithm described above no longer works. In general, another exact approach for solving the FSA problem is Integer Linear Programming (ILP).

For convenience, we first introduce some notation. First let $[n]$ denote the integer set $\{1, 2, \ldots, n\}$. We use $i \in [K]$ to index companies and $j \in [m]$ to index seeds in the union seed set $S$, where $m := |S|$.
Let $x_{ij}$ be a binary indicator variable for whether seed $s_j$ is allocated to company $C_i$. That is,

\[ x_{ij} = \begin{cases} 1, & \text{if } s_j \in S_i, \\ 0, & \text{otherwise.} \end{cases} \]

We can formulate an integer program by directly translating the objective and all constraints as defined in Problem 3, writing $\delta_j$ for the adjusted marginal gain of seed $s_j$:

\[
\begin{aligned}
\text{maximize} \quad & \min_{i \in [K]} \frac{\sum_{j=1}^{m} \delta_j x_{ij}}{b_i} \\
\text{subject to} \quad & \sum_{i=1}^{K} x_{ij} = 1, \quad \forall j \in [m] \\
& \sum_{j=1}^{m} x_{ij} = b_i, \quad \forall i \in [K] \\
& x_{ij} \in \{0, 1\}, \quad \forall i \in [K], \forall j \in [m].
\end{aligned}
\]

However, the max-min objective is not linear and the formulation above needs to be transformed and standardized. To this end, we introduce a slack variable $z$ to represent the minimum amplification factor. Let

\[ z = \min_{i \in [K]} \frac{\sum_{j=1}^{m} \delta_j x_{ij}}{b_i}. \]

Since $z$ is the minimum amplification factor, by definition $z$ must be no greater than any amplification factor:

\[ z \le \frac{\sum_{j=1}^{m} \delta_j x_{ij}}{b_i}, \quad \forall i \in [K]. \tag{3.7} \]

Then, we use $z$ as the maximization objective and insert the above $K$ constraints (Equation (3.7)) into the original formulation to get a standardized ILP formulation:

\[
\begin{aligned}
\text{maximize} \quad & z \qquad\qquad (3.8) \\
\text{subject to} \quad & z \le \frac{\sum_{j=1}^{m} \delta_j x_{ij}}{b_i}, \quad \forall i \in [K] \\
& \sum_{i=1}^{K} x_{ij} = 1, \quad \forall j \in [m] \\
& \sum_{j=1}^{m} x_{ij} = b_i, \quad \forall i \in [K] \\
& x_{ij} \in \{0, 1\}, \quad \forall i \in [K], \forall j \in [m].
\end{aligned}
\]

In any feasible solution to the above ILP, we have $z \le \min_{i \in [K]} \frac{\sum_{j=1}^{m} \delta_j x_{ij}}{b_i}$ as per (3.7). In case $z$ is strictly smaller than the minimum amplification factor, we can simply increase $z$, and hence the objective (3.8), without breaking any constraint. This can be done until $z$ becomes equal to the minimum amplification factor.

3.5.4 An Efficient Greedy Heuristic

Thus far, we have described two exact algorithms for Fair Seed Allocation. However, since the problem is NP-hard under the K-LT model (Theorem 6), such exact algorithms may not scale to large problem instances. Therefore, we now propose an efficient greedy heuristic by drawing on the literature of machine scheduling and load balancing.

The heuristic is named Needy-Greedy, which comes from the intuition that in each iteration, the algorithm greedily chooses the "neediest" company to allocate a seed. The pseudo-code is given in Algorithm 6.

Algorithm 6: Needy-Greedy for Fair Seed Allocation with max-min objective function
  Data: Union seed set $S$ (with $\delta_u$ for all $u \in S$) and budget $b_i$ for all $i \in [K]$.
  Result: $K$-partition of $S$, with $|S_i| = b_i$ for all $i \in [K]$.
  1   begin
  2     $S_i \leftarrow \emptyset$ for all $i \in [K]$
  3     sort all seeds $u \in S$ in decreasing order of $\delta_u$
  4     foreach $u \in$ ordered $S$ do
  5       $C \leftarrow \{ i \in [K] : |S_i| < b_i \}$
  6       $J \leftarrow \arg\min_{i \in C} \sigma_i(S_i, S_{-i}) / b_i$
  7       if $|J| > 1$ then
  8         $j^* \leftarrow \arg\max_{j \in J} (b_j - |S_j|)$   // breaking ties (if argmax returns multiple, return one randomly)
  9       else
  10        $j^* \leftarrow$ the only element in $J$
  11      $S_{j^*} \leftarrow S_{j^*} \cup \{u\}$

The algorithm takes as input all seeds in the union seed set $S$, sorted in decreasing order of adjusted marginal gain, and the budget for each company. It begins by setting all individual seed sets $S_i$ to the empty set (line 2). We process one seed in $S$ per iteration. Let $C$ be the set of companies whose budget has not been exhausted, i.e., all $i \in [K]$ such that $|S_i| < b_i$ (line 5): note that we can allocate a new seed only to such companies. Next, it finds the subset $J$ of companies which have the smallest amplification factor (line 6). It is possible that $J$ has more than one element: for example, in the very first iteration, all companies have an amplification factor of zero and hence they are all in $J$. Tie-breaking is required in such a case. Since seeds are visited in decreasing order of adjusted marginal gain, as a reasonable heuristic we break ties by favoring the company with the largest deficiency in seed set size, namely the largest difference between budget and current seed set size (line 7 and line 8). On the other hand, if $J$ contains only one company (line 10), then that company is chosen. In either case, we proceed by assigning the current seed $u$ to the chosen company (line 11).

Adapting to $L_p$-norm objectives. Needy-Greedy not only works with the max-min objective as defined in Problem 3, but can also be flexibly adapted to deal with alternative optimization objectives for Fair Seed Allocation.
For instance, in order to minimize the $L_p$-norm

\[ \left( \sum_{i=1}^{K} \left| \sigma_i(S_i, S_{-i}) - \frac{b_i}{B} \sigma_{all} \right|^{p} \right)^{1/p}, \]

the adaptation can be done by replacing line 6 of Algorithm 6 with

\[ J \leftarrow \arg\max_{i \in C} \left| \sigma_i(S_i, S_{-i}) - \frac{b_i}{B} \sigma_{all} \right|. \]

That is, the "neediest" companies are those having the largest gap (absolute difference) between their current spread and the ideal "target" spread $\frac{b_i}{B} \sigma_{all}$.

Time complexity. Implemented efficiently, Needy-Greedy as in Algorithm 6 runs in $O(B(\log B + \log K))$ time. The term $B \log B$ comes from sorting the union seed set; $B \log K$ comes from the fact that there are in total $B$ iterations, each of which examines the state of the companies to determine $j^*$. Using a priority queue, we can perform the necessary searching and updates in $O(\log K)$ time per iteration. When a priority queue is not used, the complexity increases to $O(B(\log B + K))$.

3.5.5 Discussion on Strategic Behaviors

It is natural to think of companies participating in viral marketing campaigns as selfish agents. We now discuss interesting game-theoretical implications on fair seed allocation. In game theory and algorithmic mechanism design, a mechanism is said to be truthful if each agent's dominant strategy is to bid its true valuation. That is, no agent can benefit from lying. Taking classical single-item auctions as an example, the second-price auction is a truthful mechanism while the first-price auction is not [132]. In the context of seed allocation in viral marketing, truthfulness means that for each company, regardless of the seed budgets of other companies, reporting its true seed budget always yields the best expected influence spread.

The following counter-example shows that Needy-Greedy is not truthful. For the sake of discussion and illustration, we follow a similar analysis in [53] and assume that companies have complete knowledge of the bids of their competitors.
As is common in game theory, we assume companies are rational agents, and their strategic behaviors are driven by the reasoning on how to "best react" to the other company's bid so that their own expected influence spread can be maximized.

Example 2 (Non-truthfulness of Needy-Greedy). Consider two companies $C_1$ and $C_2$ with true budgets $b_1 = 30$ and $b_2 = 29$. Suppose the adjusted marginal gains of the seeds are as follows: the first three seeds $s_1$, $s_2$, and $s_3$ have 100, 96, and 95 respectively. Then there is a significant drop in terms of seed quality: for simplicity we assume the rest of the seeds have identical adjusted marginal gain $c$, where $c$ is a constant and $c < 91/3$ (see footnote 3).

When both $C_1$ and $C_2$ bid truthfully (30 and 29 respectively), Needy-Greedy will assign $s_1$ to $C_1$ and $s_2$ to $C_2$. At this point, one can verify that the amplification factor of $C_1$ is 3.33 while the amplification factor of $C_2$ is slightly less at 3.31. This means Needy-Greedy will assign $s_3$ to $C_2$. This implies that the spread of $C_1$ would be $100 + 29c$ and the spread of $C_2$ would be $191 + 27c$.

Suppose $C_1$ knows that $b_2$ is 29 and reacts by lowering its bid to 28 (underreporting). One can verify that Needy-Greedy would instead assign $s_1$ to $C_2$, and assign both $s_2$ and $s_3$ to $C_1$. The spread of $C_1$ in this case would become $191 + 26c$. As long as $c < 91/3$, implying that $100 + 29c < 191 + 26c$, company $C_1$ would be better off by lying and underreporting its budget to be 28. This violates the condition of truthfulness.

Remarks. Example 2 shows the myopic nature of a greedy algorithm: when deciding which company gets seed $s_3$, Needy-Greedy picks $C_2$ as the amplification factor of $C_2$ is smaller, although the difference is quite small (96/29 vs. 100/30). Furthermore, in this setup, the adjusted marginal gains of the first three seeds are quite close, and much larger than those of the rest of the seeds. This gives $C_1$ an incentive to underreport.

Interestingly, the example no longer applies if we just make a very small change to the input values.
First, suppose the true budget of $C_2$ is 28 instead of 29. Then since $96/28 > 100/30$, $C_1$ will get $s_3$ and as a result $C_1$ no longer has an incentive to underreport its budget. Second, suppose the adjusted marginal gain of seed $s_2$ is 97 instead of 96. Then again $C_1$ will get $s_3$ because $97/29 > 100/30$.

Cyclic Trend in Strategic Behaviors

We now conduct further analysis on the setup presented in Example 2, which indicates that the strategic behaviors of the two companies could lead to an interesting cyclic trend, if the host runs multiple campaigns in a row and the constant $c$ is small enough (less than 91/3, to be exact). This, to some extent, bears a resemblance to the cyclic behaviors observed from real-world Generalized First-Price (GFP) auctions for display ads, on a search engine called Overture [53]. The authors of [53] estimated that in 2002 and 2003 Overture suffered about 7.8% loss of revenue in auctions on popular keywords.

Footnote 3: As we shall see shortly, the exact value of $c$ does not matter for the purpose of this counter-example and the following analysis.

Round   Bid of C1   Bid of C2   Spread of C1   Spread of C2   Strategy
1       30          29          100 + 29c      191 + 27c      (C1, 28)
2       28          29          191 + 26c      100 + 28c      (C2, 27)
3       28          27          100 + 27c      191 + 25c      (C1, 26)
4       26          27          191 + 24c      100 + 26c      (C2, 25)
5       26          25          100 + 25c      191 + 23c      (C1, 30)
6       30          25          195 + 28c      96 + 24c       (C2, 29)
7       30          29          100 + 29c      191 + 27c      (C1, 28)
...     ...         ...         ...            ...            ...

Table 3.1: Cyclic behaviors of $C_1$ and $C_2$, assuming $c < 91/3$. The right-most column represents the best response by a company. E.g., ($C_1$, 28) means that company $C_1$ will change its bid to 28 in the next round. Note that round 7 is identical to round 1, indicating a cyclic trend of the strategic behaviors.

To see how the cyclic trend may take place, let us recall Example 2.
When $C_1$ drops its seed budget from 30 to 28 to "manipulate" Needy-Greedy to get seeds $s_2$ and $s_3$, the competitor $C_2$ can best respond by lowering its budget from 29 to 27, in which case $C_1$ ends up with $s_1$ and $C_2$ gets $s_2$ and $s_3$ back. Then, to best respond again, $C_1$ can further lower its bid from 28 to 26 to get $s_2$ and $s_3$ back. This could go on for a few more rounds, and we illustrate the complete scenario in Table 3.1. Also note that the companies are rational agents seeking to maximize their expected influence spread, so in this setup, when they decide to lower the bid, there is no incentive to declare a budget equal to the other company's. We shall further remark on this point shortly.

As can be seen from Table 3.1, from round 1 to round 5, the two companies alternately decrease their respective bids, until $C_1$ bids 26 and $C_2$ bids 25. Here, the best response of $C_1$ is actually to increase its bid again to its true budget of 30, in which case it gets both $s_1$ and $s_3$, and has a spread of $195 + 28c$. This action is better than lowering again to 24, in which case the spread of $C_1$ is merely $96 + 23c$: Needy-Greedy would allocate $s_1$ to $C_2$ and $s_2$ to $C_1$, and at this point both companies have an amplification factor of 4. Due to Needy-Greedy's tie-breaking rule, which favors the company with the larger remaining budget (here, the larger bid), $C_2$ would get $s_3$. By a similar token, bidding 30 is also better than bidding 25. In that case, in the first iteration both companies tie at amplification factor (zero) and budgets (25), and thus Needy-Greedy will allocate the first seed uniformly at random: with probability 0.5, $C_1$ gets $s_1$ and ends up with a spread of $100 + 24c < 195 + 28c$; with the remaining probability 0.5, $C_2$ gets $s_1$ and the spread of $C_1$ is $191 + 23c < 195 + 28c$.

Further remarks on avoiding equal budgets. Note that in our setup, when $C_1$ or $C_2$ has the incentive to lower its bid, it would avoid declaring a budget equal to the other company's (cf. round 1 to round 5 in Table 3.1).
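The spreads in Table 3.1 can be replayed numerically. The sketch below is an illustration only: it fixes $c = 1$ (any $c < 91/3$ is permitted by the example; the specific value is an assumption made here), the function name `replay` is hypothetical, and the procedure is a two-company specialization of Needy-Greedy with exact fractions.

```python
from fractions import Fraction


def replay(bid1, bid2, c=Fraction(1)):
    """Run Needy-Greedy for two companies on the seeds of Example 2.

    Seeds are 100, 96, 95 followed by identical seeds of gain c.
    Returns (spread of C1, spread of C2). Bids must differ; the random
    tie-break needed for equal bids is not modelled in this sketch.
    """
    assert bid1 != bid2
    budgets = (bid1, bid2)
    seeds = [100, 96, 95] + [c] * (bid1 + bid2 - 3)
    spread = [Fraction(0), Fraction(0)]
    count = [0, 0]
    for g in seeds:
        open_ = [i for i in (0, 1) if count[i] < budgets[i]]
        # Smallest amplification factor first; ties go to the company
        # with the larger remaining budget.
        j = min(open_, key=lambda i: (spread[i] / budgets[i],
                                      -(budgets[i] - count[i])))
        spread[j] += g
        count[j] += 1
    return spread[0], spread[1]


# Bids of (C1, C2) in rounds 1 to 6 of Table 3.1.
rounds = [(30, 29), (28, 29), (28, 27), (26, 27), (26, 25), (30, 25)]
for r, (x, y) in enumerate(rounds, start=1):
    s1, s2 = replay(x, y)
    print(r, x, y, s1, s2)
```

With $c = 1$, each printed pair of spreads should match the corresponding row of Table 3.1 after substituting the value of $c$. The replay deliberately leaves out equal bids, which the companies avoid in this setup.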
The reason is that companies are clearly better off by getting seeds $s_2$ and $s_3$ together, as opposed to getting $s_1$. When the budgets are equal, there exists a non-zero probability for either company to end up with $s_1$ due to Needy-Greedy's tie-breaking rules. E.g., consider the scenario when $C_1$ and $C_2$ bid 30 and 29 respectively. If $C_1$ then bids 29, the random tie-breaking rule is in effect: $C_1$ and $C_2$ have an equal probability of 0.5 to get $s_1$. In contrast, if $C_1$ declares slightly lower at 28, it is deterministically better off.

Comparisons to GFP Auctions. In [53], the authors discussed a similar, but much more "drastic" cyclic scenario that happened in real-world GFP auctions, on the search engine Overture. To show key differences between the cyclic strategic behaviors in Needy-Greedy seed allocation and Overture's GFP auctions, we first recap the analysis in [53]. There are $K$ agents (advertisers) bidding for $N$ positions on a search results page to display ads. Each agent declares a cost-per-click value to the search engine. In GFP, the agent with the $i$-th highest bid wins the $i$-th position, for all $1 \le i \le N$, and the payment is equal to its own bid value.

GFP is not truthful, either. Edelman and Ostrovsky [53] analyzed the following scenario: suppose there are two agents $A$ and $B$, whose true valuations are 0.6 and 0.55 respectively. $B$ can drop its bid all the way to 0.01 (which is assumed to be the minimum required bid) as it would still win the second position. To respond, $A$ will also decrease its bid to 0.02, barely beating $B$ and still winning the first position. Now, since $B$ in fact has a valuation of 0.55 per click, it will bid 0.03 to beat $A$ and win the first position. This alternating behavior will continue until $B$ hits its valuation of 0.55 but can no longer win the first position.
After that, $B$ will again drop to the minimum bid 0.01, resulting in a cyclic trend.

It is straightforward to see that in our setup, dropping all the way to the minimum bid (e.g., 1) is not a good strategy under the assumption that companies are rational. Consider a generic case where there are $K$ competing companies. If there exists a company $C_i$ that bids 1, then assuming no other companies bid the same, it will only get one seed with the $K$-th highest adjusted marginal gain. If $C_i$ can afford more seeds, it will be decisively better off by bidding higher than 1: in the worst case where $C_i$ still has the lowest budget amongst all companies, it will still get the $K$-th seed, plus another $b_i - 1$ seeds which result in a higher spread.

Truthful Mechanisms by Borodin et al. [23]. In a recent work, Borodin et al. [23] studied truthful mechanisms for the same problem setting as in this chapter, though they did not look into fairness. When $K = 2$ (two companies), they showed that there exists a randomized allocation algorithm that is truthful. When $K \ge 3$, there exists an even simpler result: the mechanism that allocates seeds uniformly at random, coupled with the greedy seed selection algorithm, is truthful. More specifically, given $K \ge 3$ companies and their budgets $b_1, b_2, \ldots, b_K$, the host runs the classic greedy algorithm for $B := \sum_{i=1}^{K} b_i$ iterations. In each iteration, it selects the seed that maximizes the marginal gain w.r.t. the spread under the classical LT model, and randomly assigns it to a company whose budget has not been exhausted.

Our experiments in the next section show that the uniform random allocation performs substantially worse w.r.t. fairness than Needy-Greedy. Even though random allocation guarantees that no company can benefit from underreporting budgets, its purely random nature makes it unsuitable as a building block for a solid and sustainable business model for social network hosts.
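For reference, the allocation step of the random mechanism can be sketched in a few lines of Python. This is one reading of the description above, not the code of [23]: the function name `random_allocation` is hypothetical, the seeds are assumed to arrive already ordered by the greedy selection step (which is not shown), and "randomly" is taken to mean uniformly among companies with remaining budget.

```python
import random


def random_allocation(ordered_seeds, budgets, rng=None):
    """Give each seed to a uniformly random company with budget left.

    ordered_seeds: seeds in the order chosen by the greedy algorithm.
    budgets:       list of integer budgets; sum must cover all seeds.
    Returns a list with one seed list per company.
    """
    assert len(ordered_seeds) <= sum(budgets)
    rng = rng or random.Random()
    alloc = [[] for _ in budgets]
    for u in ordered_seeds:
        # Only companies whose budget is not exhausted are eligible.
        open_ = [i for i, b in enumerate(budgets) if len(alloc[i]) < b]
        alloc[rng.choice(open_)].append(u)
    return alloc
```

Nothing in this procedure looks at the current spreads of the companies, so it has no means of balancing them.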
This motivates a study of how to design a seed allocation algorithm that is truthful (like the random allocation in [23]), but at the same time achieves a higher level of fairness like Needy-Greedy.

Remarks on Envy-freeness. Last but not least, Example 2 also shows that Needy-Greedy does not always guarantee envy-free outcomes. In the literature studying fair division (such as the classical cake-cutting problem) [41], envy-freeness means that no agent prefers another agent's allocation to its own. Although classic cake-cutting mainly focuses on divisible goods, while seeds in viral marketing are not divisible, we can still "port" the definition of envy-freeness to our context. We say that a seed assignment is envy-free if there do not exist two companies $i$ and $j$ such that $b_i > b_j$ but $\sigma_i(S_i, S_{-i}) < \sigma_j(S_j, S_{-j})$. In Example 2, as long as $c < 91/2$, we have $100 + 29c < 191 + 27c$, in which case company $C_1$ would envy company $C_2$ because $b_1 > b_2$. Hence, designing an envy-free seed allocation algorithm and studying its possible connections to a truthful allocation algorithm is also an interesting direction for future research on viral marketing.

                      Epinions   Flixster   NetHEPT   LiveJournal
Number of nodes       76K        7.6K       15K       4.8M
Number of edges       509K       50K        62K       69M
Average out-degree    13.4       6.5        4.12      28.5

Table 3.2: Datasets statistics

         Equal budgets   Unequal budgets
K = 2    30 each         20, 40
K = 3    20 each         10, 20, 30
K = 6    10 each         5, 5, 10, 10, 15, 15

Table 3.3: Test cases with varying budget distribution

3.6 Experiments

3.6.1 Settings

To evaluate our proposed algorithms for FSA, we conducted simulations on four real-world networks: Epinions, Flixster, NetHEPT, and LiveJournal. Table 3.2 presents the statistics of the datasets.

Network Data and Influence Weights. Epinions is a who-trusts-whom social network extracted from the consumer review website Epinions.com.
If user $v$ trusts the reviews of user $u$, then we drew a directed edge $(u, v)$. We apply the Jaccard model to compute influence weights on edges [68]: $p_{u,v} = A_{u2v} / A_{u|v}$, where $A_{u|v}$ is the number of actions either $u$ or $v$ has performed. After computing these weights, we normalized them to ensure that the sum of incoming weights to each node $v \in V$ is 1.

NetHEPT is a collaboration network from the High Energy Physics Theory section on arXiv.org with nodes representing authors and edges representing co-author relationships. We calculated the weights as $p_{u,v} = A_{u,v} / N_v$, where $A_{u,v}$ is the number of papers $u$ and $v$ co-authored. Flixster is a friendship network from the social movie site www.flixster.com, for which $A_{u,v}$ is the number of movies rated by both $u$ and $v$. In both datasets, $N_v$ is the normalizing factor to ensure the sum of weights incoming to $v$ is 1.

LiveJournal is a directed social network (www.livejournal.com) where users write an online blog, journal, or diary. We calculated the influence weight $p_{u,v}$ as $1/\deg_{in}(v)$ since there is no action log data available.

Figure 3.3: Adjusted marginal gains on the four datasets: (a) Epinions, (b) Flixster, (c) NetHEPT, (d) LiveJournal. On the X-axis (greedy rank), the seeds are ranked in the order in which they were chosen by the greedy algorithm; the Y-axis shows the adjusted marginal gain.

Algorithms and Baselines. In the experiments, we evaluated the following algorithms as proposed in Section 3.5: Dynamic Programming (DP), Integer Linear Programming (ILP), and Needy-Greedy (NG).
Two baselines were also tested. The first one is a simple Round-Robin (RR) allocation strategy: it first fixes a random permutation of the $K$ companies, and then allocates seeds to the companies in a round-robin fashion according to that order. The second one is the random allocation (RA) proposed in [23].

Implementations of all algorithms and baselines are in Python. For Integer Linear Programming we used the COIN-OR Branch-and-Cut (Cbc) solver [56]. For Dynamic Programming, we set the scaling factor to be 10. All experiments were conducted on a Linux server (Red Hat Enterprise 6.6) with 12 Intel Xeon CPUs at 2.10GHz each and 64GB RAM.

Competition Settings. We varied $K$, the number of competing companies, to be 2, 3 and 6. In each case, we further considered two scenarios: equal budgets and unequal budgets. The detailed set-up can be found in Table 3.3. Later we also increased the problem size for the scalability test on Needy-Greedy (more details shortly).

Figure 3.4: Minimum amplification factors (higher is better) on (a) Epinions, (b) Flixster, (c) NetHEPT, (d) LiveJournal. The X-axis lists the settings 2eq, 2neq, 3eq, 3neq, 6eq, 6neq; the bars correspond to DP, ILP, NG, RR, and RA.

The selection of the union seed set and the computation of adjusted marginal gains were done by invoking the martingale-based influence maximization algorithm [136] (IMM), which is the state-of-the-art influence maximization approximation algorithm for the LT model. The running time of IMM was within 10 seconds for all datasets, matching the results reported by [136].
Figure 3.3 illustrates the adjusted marginal gain of seeds in decreasing order, for all four datasets.

3.6.2 Results and Analysis

Minimum and maximum amplification factors. Figure 3.4 and Figure 3.5 give comparisons on quality: they depict the minimum and maximum amplification factors, respectively, achieved by various algorithms on all four datasets. Note that on the X-axis, "2eq" refers to the setting of $K = 2$ and equal budgets, while "neq" refers to the case of unequal budgets. A missing bar means that either the corresponding algorithm failed to complete within a reasonable amount of running time (one week; see footnote 4), or the algorithm does not apply (e.g., dynamic programming for $K > 2$). These two sets of plots offer a direct comparison of the quality and effectiveness of algorithms for the FSA problem. For Figure 3.4, the higher the bar, the better; while for Figure 3.5, a lower bar is better.

Figure 3.5: Maximum amplification factors (lower bar is better) on (a) Epinions, (b) Flixster, (c) NetHEPT, (d) LiveJournal. The X-axis lists the settings 2eq, 2neq, 3eq, 3neq, 6eq, 6neq; the bars correspond to DP, ILP, NG, RR, and RA.

As expected, the Integer Linear Programming method yielded optimal solutions by definition. Dynamic Programming also achieved the same level of performance, which indicates that the loss of precision due to rounding the adjusted marginal gains to integers is negligible.
On the other hand, the two baselines, Round-Robin and Random Allocation, yielded poor results.

Importantly, Needy-Greedy performed well on these two metrics: in all instances where Needy-Greedy, Dynamic Programming, and Integer Linear Programming all finished, the metrics are neck and neck. Table 3.4 shows the comparisons between the results of Needy-Greedy and Integer Linear Programming. More specifically, for each dataset, we compute the largest deviation between Needy-Greedy's outcome and Integer Linear Programming's outcome, in percentage, among all instances where Integer Linear Programming finished.

Footnote 4: To put this quantity into perspective, the current state-of-the-art algorithm [136] for selecting the union seed set for allocation finishes within 10 seconds for all datasets.

                        Epinions   Flixster   NetHEPT   LiveJournal
Deviation on min. AF    0.1%       0.8%       0.4%      0.8%
Deviation on max. AF    0.05%      0.5%       0.1%      0.4%

Table 3.4: Comparing Needy-Greedy and Integer Linear Programming: for each dataset, we show the largest deviation between Needy-Greedy's outcome and Integer Linear Programming's outcome, in percentage, among all instances where Integer Linear Programming finished.

When the metric is the minimum amplification factor, the percentage is calculated as

\[ \frac{\mathit{minAF}_{ILP} - \mathit{minAF}_{NG}}{\mathit{minAF}_{ILP}} \times 100\%. \tag{3.9} \]

When the metric is the maximum amplification factor, the percentage can be similarly calculated as

\[ \frac{\mathit{maxAF}_{NG} - \mathit{maxAF}_{ILP}}{\mathit{maxAF}_{ILP}} \times 100\%. \tag{3.10} \]

Note that both Equation (3.9) and Equation (3.10) are well-defined, as Integer Linear Programming is guaranteed to be optimal while Needy-Greedy is not, and thus the numerators as defined are always nonnegative. As can be seen from Table 3.4, for all cases presented, the deviation percentage is under 1%, and in some cases it is as low as 0.05%. This further confirms the strong performance of Needy-Greedy.

Figure 3.6 depicts the resultant amplification factors from another angle by plotting the empirical variance of the amplification factors of all companies.
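The quantities behind Table 3.4 and Figure 3.6 are simple to compute from per-company spreads and budgets. The helper names below are hypothetical, the spreads in the usage lines are made-up numbers rather than experimental results, and the use of the population variance is an assumption, since the text does not say which variance estimator was plotted.

```python
from statistics import pvariance


def amplification_factors(spreads, budgets):
    """Amplification factor of each company: spread divided by budget."""
    return [s / b for s, b in zip(spreads, budgets)]


def deviation_percent(af_ng, af_ilp):
    """Deviations of Needy-Greedy from ILP, as in Eq. (3.9) and (3.10)."""
    dev_min = (min(af_ilp) - min(af_ng)) / min(af_ilp) * 100
    dev_max = (max(af_ng) - max(af_ilp)) / max(af_ilp) * 100
    return dev_min, dev_max


# Illustrative numbers only: two companies with budgets 3 and 2.
af_ng = amplification_factors([19, 11], [3, 2])
af_ilp = amplification_factors([18, 12], [3, 2])
dev_min, dev_max = deviation_percent(af_ng, af_ilp)
var_ng = pvariance(af_ng)  # population variance (an assumption)
```

A perfectly fair allocation has identical amplification factors and therefore zero variance. Figure 3.6 reports this variance for Needy-Greedy and the two baselines.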
As can be seen, Needy-Greedy are often several orders of magnitudebetter than the two baselines.Running time. Figure 3.7 illustrates the comparison on running time forall algorithms on all datasets. When a bar in the plot touches the top ofY -axis, it means the algorithm did not finish within a reasonable amount of6910-310-210-11001011021031041052eq 2neq 3eq 3neq 6eq 6neqVariance of all amplification factorsNeedyGreedyRound-RobinRandom10-410-310-210-11001011021031042eq 2neq 3eq 3neq 6eq 6neqVariance of all amplification factorsNeedyGreedyRound-RobinRandom(a) Epinions (b) Flixster10-510-410-310-210-11001011021032eq 2neq 3eq 3neq 6eq 6neqVariance of all amplification factorsNeedyGreedyRound-RobinRandom1011021031041051061071082eq 2neq 3eq 3neq 6eq 6neqVariance of all amplification factorsNeedyGreedyRound-RobinRandom(c) NetHEPT (d) LiveJournalFigure 3.6: Empirical variance of amplification factors for Needy-Greedy, Round-Robin, and Random Allocation.time, which is often the case of Integer Linear Programming. Needy-Greedyis consistently much more efficient than Dynamic Programming and Inte-ger Linear Programming – it was several orders of magnitude faster. Forinstance, on NetHEPT when K = 3 (with unequal budgets), Needy-Greedyfinished in 0.0002 seconds while Integer Linear Programming finished in 33.3seconds (165,000 times slower). On instances when both Dynamic Program-ming and Integer Linear Programming finish, they have comparable perfor-mance with Integer Linear Programming being slightly faster. However, it isevident that significant scalability issue exists for Integer Linear Program-ming as it quickly becomes much slower when K increases from 2 to 3 or6. For instance, on NetHEPT with unequal budget cases, even though therunning time of Integer Linear Programming was 0.42 seconds for K = 2,33.3 seconds for K = 3, but it failed to finish when K = 6. 
Hence, we conclude that neither Dynamic Programming nor Integer Linear Programming is particularly scalable and practical, while Needy-Greedy is consistently more efficient.

[Figure 3.7: Running time comparisons for DP, ILP, NG, and RR. Bars touching the top of the Y-axis mean that the algorithm did not finish within one week. Panels: (a) Epinions, (b) Flixster, (c) NetHEPT, (d) LiveJournal. X-axis: problem instances 2eq, 2neq, 3eq, 3neq, 6eq, 6neq; Y-axis: running time in seconds (log scale).]

Scalability Tests on Needy-Greedy. We ran Needy-Greedy with larger problem instances on the LiveJournal dataset to test its scalability. For simplicity, in all cases described below the budget is equal amongst all companies. In particular, we first varied the total number of seeds to be allocated from 1000 to 5000 and fixed the number of companies at 10. The results of this test are shown in Figure 3.8(a). We then fixed the total number of seeds at 5000 and varied the number of companies from 10 to 50; the results are depicted in Figure 3.8(b). In both cases, it is evident that Needy-Greedy scaled well as the size of the problem instance increased.

[Figure 3.8: Scalability tests of Needy-Greedy. Panels: (a) Varying total budgets (X-axis: total number of seeds, 1000 to 5000), (b) Varying number of companies (X-axis: number of companies, 10 to 50). Y-axis: running time in seconds.]

In conclusion, we have demonstrated the effectiveness of our proposed solutions to the Fair Seed Allocation problem: Needy-Greedy, Integer Linear Programming, and Dynamic Programming. In addition, we showed that Needy-Greedy is also much more efficient and scalable than the other two algorithms.
On the other hand, the baselines, including the Random Allocation mechanism of [23], yielded much inferior results w.r.t. fairness.

3.7 Discussion and Future Work

Investigations of competitive influence propagation and fair seed allocation under other diffusion models (e.g., in which the virality of different products may vary) are worthy future work. Note that for our K-LT model, we are able to compute the adjusted marginal gains and use them as the input to the fair seed allocation problem. This result is specific to K-LT, and thus may not be easily replicated under other models such as the competitive IC models [35]. It is also interesting to further investigate whether other truthful mechanisms exist besides the uniform random allocation [23] (which performs poorly in terms of fairness).

An important issue that arises here is data privacy and security in the viral marketing setting described in this chapter. Note that for the host to accurately select and allocate seeds, it needs access to user action logs to compute pairwise user influence strength in the social network [68, 129]. If the user actions are restricted to those on the social networking website itself, such as clicking a link, liking or commenting on a post, and sharing a video, the host will have direct knowledge from its own tracking data. However, when the actions are product adoptions, users' purchase histories are more likely to be stored externally, e.g., in the participating companies' own databases or the databases of third-party stores such as Amazon or eBay. For the host to estimate influence strength, one possible solution is to use its own tracking data as proxies. An (arguably) better alternative is to establish a protocol with the participating companies and use their ground-truth historical sales data. However, the challenge is that neither the host nor the companies want their proprietary data to be disclosed or leaked, so there is a need to protect privacy on both ends.
To solve this problem, Tassa and Bonchi [138] proposed a privacy-preserving protocol for the host and the participating companies to jointly compute social influence strength. Their method is certainly appealing and useful, but it does not account for purchases in third-party stores, and hence the resulting estimates may be biased. Exploring possible improvements to eliminate such bias (if any) would be interesting, albeit quite challenging, future work (e.g., how to collect data and how to ensure privacy guarantees when more parties are involved?).

More generally, privacy-preserving influence maximization is itself an important, interesting, yet often overlooked problem, in spite of a large body of work on privacy-preserving social network mining [9, 103, 153]. For example, suppose the host wishes to publish (part or all of) its social network data for purposes such as research or outsourcing$^{5}$; the data must then be published in a privacy-preserving manner such that (i) user privacy (identity, connections, interests, etc.) can be safeguarded from adversaries and (ii) the published data still has sufficient utility for data mining applications. It is interesting to investigate whether well-established privacy models, such as k-anonymity by Zhou et al. [153] and k-degree-anonymity by Liu and Terzi [103], are suitable, and how influence maximization algorithms can be devised or adapted to achieve high-quality solutions.

$^{5}$ For example, in the 2014 Economic Graph Challenge held by LinkedIn, the world's largest professional social network company invited participation from US academic institutions to propose data mining and data analytics problems and solutions that would make good and novel use of LinkedIn's data.

Chapter 4

Comparative Influence Diffusion and Maximization

4.1 Introduction

Most existing work in computational social influence focuses on two types of diffusion models: single-entity models and pure-competition models.
A single-entity model has only one propagating entity for social network users to adopt: the classic Independent Cascade (IC) and Linear Thresholds (LT) models [86] belong to this category. These models, however, ignore complex social interactions involving multiple propagating entities. Considerable work has been done to extend IC and LT models to study competitive influence maximization, but almost all models assume that the propagating entities are in pure competition and users adopt at most one of them [18, 23, 24, 29, 31, 34, 79, 106, 123].

In reality, the relationship between different propagating entities is certainly more general than pure competition. In fact, consumer theories in economics have two well-known notions: substitute goods and complementary goods [113, 133]. Substitute goods are those that can be used for the same purpose and purchased one in place of the other, e.g., smartphones of various brands. Complementary goods are those that tend to be purchased together, e.g., iPhone and its accessories, computer hardware and software, etc. There are also varying degrees of substitutability and complementarity: buying a product could lessen the probability of buying the other without necessarily eliminating it; similarly, buying a product could boost the probability of buying another to any degree. Pure competition only corresponds to the special case of perfect substitute goods.

The limitation of pure-competition models can be exposed by the following example. Consider a viral marketing campaign featuring iPhone 6 and Apple Watch. It is vital to recognize the fact that Apple Watch generally needs an iPhone to be usable, and iPhone's user experience can be greatly enhanced by a paired Apple Watch (see, e.g., http://bit.ly/1GOqesc).
Clearly none of the pure-competition models is suitable for this campaign because they do not even allow users to adopt both the phone and the watch!

This motivates us to design a more powerful, expressive, yet reasonably tractable model that captures not only competition, but also complementarity, and any possible degrees associated with these notions. To this end, we propose the Comparative Independent Cascade model, or ComIC for short, which, unlike most existing diffusion models, consists of two critical components that work jointly to govern the dynamics of diffusions:

• Edge-level information propagation: This is similar to the social influence propagation dynamics captured by the classical IC and LT models, but only controls information awareness.

• Node-level decision-making: The model features a Node-Level Automaton (NLA) that ultimately makes adoption decisions based on a set of model parameters known as the Global Adoption Probabilities (GAPs).

The NLA is a novel feature and is unique in our model. Indeed, the term "comparative" comes from the fact that once a user is aware, via edge-level propagation, of multiple products, intuitively she makes a comparison between them by "running" her NLA. Notice that "comparative" subsumes "competitive" and "complementary" as special cases. In theory, the ComIC model is able to accommodate any number of propagating entities (items) and cover the entire spectrum from competition to complementarity between pairs of items, reflected by the values of GAPs.

In this work, as the first step toward comparative influence diffusion and viral marketing, we focus on the case of two items. At any time, w.r.t. any item A, a user in the network is in one of the following four states: A-idle, A-suspended, A-rejected, or A-adopted. The NLA sets out probabilistic transition rules between states, and different GAPs are applied based on a given user's state w.r.t. the other item B and the relationship between A and B.
Intuitively, competition (complementarity) is modeled as reduced probability (resp., increased probability) of adopting the second item after the first item is already adopted. After a user adopts an item, she propagates this information to her neighbors in the network, making them aware of the item. Each neighbor may then adopt the item with a certain probability, as governed by her NLA.

We then study the influence maximization problem in the context of the ComIC model for two complementary items A and B. Specifically, the problem asks for k seeds for A such that, given a fixed set of B-seeds, the expected number of A-adopted nodes is maximized. To the best of our knowledge, we are the first to systematically study influence maximization for complementary items. We show that the problem is NP-hard under the ComIC model. Moreover, two important properties of set functions, submodularity and monotonicity, which would allow a greedy approximation algorithm frequently used in discrete optimization problems, do not hold in the unrestricted ComIC model (where there are no constraints on the GAPs). Even when we restrict ComIC to mutual complementarity, submodularity still does not hold in general.

To circumvent the aforementioned difficulties, we first show that submodularity holds for a subset of the complementary parameter space. We then make a non-trivial extension to the Reverse-Reachable Set (RR-set) techniques [22, 136, 137], originally proposed for influence maximization with single-entity models, to obtain effective and efficient approximation solutions to Influence Maximization. Next, we propose a novel Sandwich Approximation (SA) strategy which, for a given non-submodular set function, provides an upper bound function and/or a lower bound function, and uses them to obtain data-dependent approximation solutions w.r.t.
the original function.

We further note that both techniques are applicable to a larger context beyond the model and problems studied in this paper: for RR-sets, we provide a new definition and general sufficient conditions not covered by [22, 136, 137] that apply to a large family of influence diffusion models, while SA applies to the maximization of any non-submodular function that is upper- and/or lower-bounded by submodular functions.

In the experiments, we first learn GAPs from user action logs from two social networking sites: Flixster.com and Douban.com. We demonstrate that our approximation algorithms based on RR-sets and SA techniques consistently outperform several intuitive baselines, typically by a significant margin on real-world networks.

Acronym   Full Name
ComIC     Comparative Independent Cascade
GAP       Global Adoption Probability
NLA       Node-Level Automaton
RR-set    Reverse-Reachable Set
SA        Sandwich Approximation
TIM       Two-Phase Influence Maximization

Table 4.1: Frequently used acronyms.

To summarize, we make the following contributions:

• We propose the ComIC model to characterize influence diffusion dynamics of products with arbitrary degree of competition or complementarity (Section 4.3).

• We identify a subset of the parameter space under which submodularity and monotonicity of influence spread hold, paving the way for designing approximation algorithms (Section 4.5).

• We study Influence Maximization for complementary products under the ComIC model (Section 4.4).
The problem remains NP-hard, and thus we devise efficient and effective approximation solutions by non-trivial extensions to RR-set techniques and by proposing Sandwich Approximation, both having applicability beyond this work (Section 4.6).

• We conduct empirical evaluations on four real-world social networks and demonstrate the superiority of our algorithms over intuitive baselines (Section 4.8).

• We also propose a methodology for learning global adoption probabilities for the ComIC model from user action logs of social networking sites (Section 4.8).

Table 4.1 summarizes frequently used acronyms in this chapter.

4.2 Related Work

Datta et al. [43] studied influence maximization with items whose propagations are independent. Narayanam et al. [118] studied a setting with two sets of products, where a product can be adopted by a node only when it has already adopted a corresponding product in the other set. Their model extends LT. In this chapter, we depart by defining a significantly more powerful and expressive model, ComIC, whereas theirs covers only the special case of perfect complementarity. Our technical contributions for addressing the unique challenges posed by ComIC are substantially different from [118].

Meyers and Leskovec [117] analyzed Twitter data to study the effect of different cascades on users and predicted the likelihood of a user adopting a piece of information (e.g., URLs in tweets) given cascades that the user was previously exposed to. McAuley et al. [114] used logistic regression to learn substitute or complementary relationships between products from user reviews. Both studies focus primarily on data analysis and behavior prediction; they neither provide diffusion models for competing and complementary items nor study the influence maximization problem in this context.

Substitutability and complementarity have been studied in other computer science subfields, such as theory.
A representative example is the allocation problem in combinatorial auctions: given a set of items and each bidder's valuation function, the task is to allocate items to bidders so that the total utility of all bidders (i.e., the social welfare) is maximized. The greedy algorithm can be naturally applied: enumerate the items in an arbitrary order, and allocate each item to the bidder with the largest marginal valuation under the current allocation. When all valuation functions are submodular and a value oracle is assumed$^{1}$, Lehmann et al. showed that the greedy algorithm achieves a 1/2-approximation [97]. Vondrak [142] proposed a continuous greedy algorithm with a $(1 - 1/e)$ approximation factor. However, maximizing social welfare is much more difficult when there exists complementarity. In a recent work, Abraham et al. [2] proposed a hypergraph-based model to succinctly represent valuations with complements and designed approximation algorithms under that model.

$^{1}$ The value oracle directly returns the value of the valuation function on any given subset of items. For example, suppose the function is denoted by $v$ and the set is denoted by $S$; then the value oracle returns $v(S)$ when queried with $S$. More powerful oracles such as the demand oracle and general oracle have also been studied in the combinatorial auction literature; see, e.g., [49].

Although the ComIC model also characterizes complementarity and substitutability between propagating entities in social influence diffusions, fundamental differences exist between social welfare maximization in combinatorial auctions and influence maximization under ComIC. More specifically, in combinatorial auctions the substitute or complementary relationship is present amongst items to be allocated, which are not subject to any dynamic propagation. Neither the items nor the agents that they are allocated to are involved in any recursive propagation. In contrast, in our setting (influence maximization), while seeds need to be allocated to different products, this allocation needs to factor in the effect of stochastic, recursive propagation of those products through a network as guided by the stochastic propagation rules under the ComIC model.

[Figure 4.1: ComIC model: Node-level automaton for product A. Transitions: from (A-idle and B-idle/B-suspended), when informed of A, to A-adopted w.p. $q_{A|\emptyset}$ and to A-suspended w.p. $1 - q_{A|\emptyset}$; from (A-idle and B-adopted), when informed of A, to A-adopted w.p. $q_{A|B}$ and to A-rejected w.p. $1 - q_{A|B}$; from A-suspended, upon adopting B, to A-adopted w.p. $\rho_A$ and to A-rejected w.p. $1 - \rho_A$.]

4.3 The Comparative Independent Cascade Model

We start this section by highlighting the essential ideas of our new model. In the Comparative IC (ComIC) model, there are at least two propagating entities (products, technologies, information pieces, opinions, etc.). For ease of exposition, we focus on the case of two products, denoted by A and B respectively. The diffusion dynamics unfold in discrete time steps 0, 1, . . . Nodes (users) in the social network can be in any of the following states: {idle, suspended, adopted, rejected} w.r.t. each of the products. Initially, all nodes are in the joint state (A-idle, B-idle).

One of the biggest differences between ComIC and IC is the separation of information diffusion (edge-level) and the actual adoption decisions (node-level). Edges only control the information that flows to a node: e.g., when u adopts a product, its out-neighbor v may be informed of this fact. Once that happens, v uses its own Node-Level Automaton (NLA) to decide which state to transit to. This depends on v's current state w.r.t. the two products as well as parameters corresponding to the state transition probabilities of the NLA, namely the Global Adoption Probabilities, defined below.

A concise representation of the NLA is shown in Figure 4.1. Each state is indicated by its label. For example, with probability $q_{A|\emptyset}$, a node transits from a state where it is A-idle to A-adopted, regardless of whether it was B-idle or B-suspended. From the A-suspended state, it transits to A-adopted w.p. $\rho_A$ and to A-rejected w.p. $1 - \rho_A$.
The probability $\rho_A$, called the reconsideration probability, as well as the reconsideration process will be explained below.

Note that in a ComIC diffusion process defined in Section 4.3.3, not every joint state is reachable from the initial (A-idle, B-idle) state, e.g., (A-idle, B-rejected). Since all unreachable states are irrelevant to adoptions, they can be safely ignored. We shall list all unreachable states and prove their unreachability in Section 4.3.2.

4.3.1 Global Adoption Probability (GAP)

The Global Adoption Probabilities, consisting of four parameters $Q = (q_{A|\emptyset}, q_{A|B}, q_{B|\emptyset}, q_{B|A}) \in [0, 1]^4$, are important parameters of the NLA which decide the likelihood of adoptions after a user is informed of an item. Formally,

• $q_{A|\emptyset}$ := the probability that a user adopts A given that she is A-informed but not B-adopted;

• $q_{A|B}$ := the probability that a user adopts A given that she is B-adopted;

• $q_{B|\emptyset}$ := the probability that a user adopts B given that she is B-informed but not A-adopted;

• $q_{B|A}$ := the probability that a user adopts B given that she is A-adopted.

Intuitively, GAPs reflect the overall popularity of products and how they are perceived by the entire market. They are considered aggregate estimates and hence are not user-specific in our model. In other words, users are assumed to be homogeneous in terms of preferences on consuming the two items. We shall provide further justifications at the end of this section and describe a way to learn GAPs from user action log data in Section 4.8. In addition, we shall discuss in Section 4.9 how the ComIC model can be extended to accommodate multiple types of users.

GAPs enable ComIC to characterize competition and complementarity to an arbitrary extent. We say that A competes with B iff
\[ q_{B|A} \le q_{B|\emptyset}. \]
Similarly, A complements B iff
\[ q_{B|A} \ge q_{B|\emptyset}. \]
We include the special case of $q_{B|A} = q_{B|\emptyset}$ in both cases above for convenience of stating our technical results; it actually means that the propagation of B is completely independent of A (cf. Lemma 5).
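The definitions above translate directly into a handful of comparisons. The following is a minimal sketch rather than part of the model specification; the class name `GAPs` and its field and method names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GAPs:
    """The four global adoption probabilities Q of a two-item ComIC instance."""
    q_a: float          # q_{A|emptyset}: adopt A when A-informed but not B-adopted
    q_a_given_b: float  # q_{A|B}: adopt A when already B-adopted
    q_b: float          # q_{B|emptyset}: adopt B when B-informed but not A-adopted
    q_b_given_a: float  # q_{B|A}: adopt B when already A-adopted

    def __post_init__(self):
        # Q must lie in [0, 1]^4.
        for q in (self.q_a, self.q_a_given_b, self.q_b, self.q_b_given_a):
            if not 0.0 <= q <= 1.0:
                raise ValueError("every GAP must lie in [0, 1]")

    def a_competes_with_b(self) -> bool:
        return self.q_b_given_a <= self.q_b

    def a_complements_b(self) -> bool:
        return self.q_b_given_a >= self.q_b

    def b_competes_with_a(self) -> bool:
        return self.q_a_given_b <= self.q_a

    def b_complements_a(self) -> bool:
        return self.q_a_given_b >= self.q_a
```

For instance, `GAPs(0.5, 1.0, 1.0, 0.0)` describes an asymmetric pair in which A competes with B while B complements A; when $q_{B|A} = q_{B|\emptyset}$, both `a_competes_with_b()` and `a_complements_b()` return `True`, matching the convention of including the boundary case on both sides.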
Competition and complementarity in the other direction are similar.

The degree of competition and complementarity is determined by the difference between the two relevant GAPs, i.e., $|q_{B|A} - q_{B|\emptyset}|$ and $|q_{A|B} - q_{A|\emptyset}|$. For convenience, we use $Q^+$ to refer to an arbitrary set of GAPs representing mutual complementarity:
\[ (q_{A|\emptyset} \le q_{A|B}) \wedge (q_{B|\emptyset} \le q_{B|A}), \]
and similarly, $Q^-$ for an arbitrary set of GAPs representing mutual competition:
\[ (q_{A|\emptyset} \ge q_{A|B}) \wedge (q_{B|\emptyset} \ge q_{B|A}). \]

4.3.2 Unreachable States of ComIC Model

Before an influence diffusion starts under the ComIC model, all nodes are in the initial joint state (A-idle, B-idle). According to the diffusion dynamics defined in Figure 4.2, there exist five unreachable joint states. They are not material to our analysis and problem-solving, since none of them is relevant to actual adoptions, the objectives studied in Influence Maximization. For completeness, we list these states here.

1. (A-idle, B-rejected)
2. (A-suspended, B-rejected)
3. (A-rejected, B-idle)
4. (A-rejected, B-suspended)
5. (A-rejected, B-rejected)

Lemma 1. In any instance of the ComIC model (no restriction on GAPs), no node can reach the state of (A-idle, B-rejected) from its initial state of (A-idle, B-idle).

Proof. Let $v$ be an arbitrary node from graph $G = (V, E, p)$. Note that for $v$ to reject B, it must first be informed of B (otherwise it remains B-idle, regardless of its state w.r.t. A), and then become B-suspended (otherwise it will be B-adopted, a contradiction). Now, since $v$ is A-idle, it has never been informed of A, and hence it will not be triggered to reconsider B, the only route to the state of B-rejected according to the model definition. Thus, (A-idle, B-rejected) is unreachable. $\square$

The argument for (A-rejected, B-idle) being unreachable is symmetric, and hence omitted.

Lemma 2. In any instance of the ComIC model (no restriction on GAPs), no node can reach the state of (A-suspended, B-rejected) from its initial state of (A-idle, B-idle).

Proof. Let $v$ be an arbitrary node from graph $G = (V, E, p)$.
Note that for $v$ to reject B, it must first be informed of B (otherwise it remains B-idle, regardless of its state w.r.t. A), and then become B-suspended (otherwise it will be B-adopted, a contradiction). Now, $v$ transits from A-idle to A-suspended, meaning that $v$ does not adopt A. This will not further trigger reconsideration, and hence $v$ stays at B-suspended. $\square$

The argument for (A-rejected, B-suspended) being unreachable is symmetric, and hence omitted. Finally, it is evident from the proof of Lemma 2 that the joint state (A-suspended, B-suspended) is a sink state, meaning that the node will never leave it to adopt or reject any product. This implies that (A-rejected, B-rejected) is also unreachable.

4.3.3 Diffusion Dynamics in the ComIC Model

Let $G = (V, E, p)$ be a directed social graph with pairwise influence probabilities. Let $S_A, S_B \subset V$ be the seed sets for A and B. Influence diffusion under ComIC proceeds in discrete time steps. Initially, every node is A-idle and B-idle.

At time step 0, every $u \in S_A$ becomes A-adopted and every $u \in S_B$ becomes B-adopted. No generality is lost in assuming seeds adopt an item without testing the NLA: for every $v \in V$, we can create two dummy nodes $v_A$, $v_B$ and edges $(v_A, v)$ and $(v_B, v)$ with $p_{v_A,v} = p_{v_B,v} = 1$. Requiring seeds to go through the NLA is equivalent to constraining that A-seeds (B-seeds) be selected from all $v_A$'s (resp. $v_B$'s). If $u \in S_A \cap S_B$, we randomly decide the order of $u$ adopting A and B with a fair coin. For ease of understanding, we describe the rest of the diffusion process in a modular way in Figure 4.2. We use $N^+(v)$ and $N^-(v)$ to denote the set of out-neighbors and in-neighbors of $v$, respectively.

We draw special attention to tie-breaking and reconsideration. Tie-breaking is used when a node's in-neighbors adopt different products and try to inform the node at the same step.
[Figure 4.2: ComIC model: Diffusion dynamics]

Global iteration. At every time step $t \ge 1$, for all nodes that became A- or B-adopted at $t - 1$, their outgoing edges are tested for transition (rule 1 below). After that, for each node $v$ that has at least one in-neighbor (with a live edge) becoming A- and/or B-adopted at $t - 1$, $v$ is tested for possible state transition (rules 2-4 below).

1. Edge transition. For an untested edge $(u, v)$, flip a biased coin independently: $(u, v)$ is live w.p. $p_{u,v}$ and blocked w.p. $1 - p_{u,v}$. Each edge is tested at most once in the entire diffusion process.

2. Node tie-breaking. Consider a node $v$ to be tested at time $t$. Generate a random permutation $\pi$ of $v$'s in-neighbors (with live edges) that adopted at least one product at $t - 1$. Then, test $v$ with each such in-neighbor $u$ and $u$'s adopted item (A and/or B) following $\pi$. If there is a $w \in N^-(v)$ adopting both A and B, then test both products, following their order of adoption by $w$.

3. Node adoption. Consider the case of testing an A-idle node $v$ for adopting A (Figure 4.1). If $v$ is not B-adopted, then w.p. $q_{A|\emptyset}$ it becomes A-adopted and w.p. $1 - q_{A|\emptyset}$ it becomes A-suspended. If $v$ is B-adopted, then w.p. $q_{A|B}$ it becomes A-adopted and w.p. $1 - q_{A|B}$ it becomes A-rejected. The case of adopting B is symmetric.

4. Node reconsideration. Consider an A-suspended node $v$ that just adopts B at time $t$. Define
\[ \rho_A \overset{\mathrm{def}}{=} \frac{\max\{q_{A|B} - q_{A|\emptyset},\, 0\}}{1 - q_{A|\emptyset}}. \tag{4.1} \]
Then, $v$ reconsiders to become A-adopted w.p. $\rho_A$, or A-rejected w.p. $1 - \rho_A$. The case of reconsidering B is symmetric, and the reconsideration probability of B can be similarly defined:
\[ \rho_B \overset{\mathrm{def}}{=} \frac{\max\{q_{B|A} - q_{B|\emptyset},\, 0\}}{1 - q_{B|\emptyset}}. \]

Node reconsideration concerns the situation in which a node $v$ did not adopt A initially but, later, after adopting B, may reconsider adopting A: when B competes with A ($q_{A|\emptyset} \ge q_{A|B}$), $v$ will not reconsider adopting A, but when B complements A (specifically, $q_{A|\emptyset} < q_{A|B}$), $v$ will reconsider adopting A.
In the latter case, the probability of adopting A upon reconsideration, $\rho_A$, is defined in such a way that the overall probability of adopting A is equal to $q_{A|B}$. That is,
\[ q_{A|B} = q_{A|\emptyset} + (1 - q_{A|\emptyset}) \cdot \rho_A, \]
where $\rho_A$ is defined as in Equation (4.1).

4.3.4 Design Considerations

The design of ComIC not only draws on the essential elements from a classical diffusion model (IC) proposed in mathematical sociology, but also closes a gap between theory and practice, in which diffusions typically do not occur just for one product or with just one mode of pure competition.

With GAPs in the NLA, the model can characterize any possible relationship between two propagating entities: competition, complementarity, and any degree associated with them. GAPs are fully capable of handling asymmetric relationships between products. Furthermore, introducing the NLA with GAPs and separating the propagation of product information from actual adoptions reflects Kalish's famous characterization of new product adoption [85]: customers go through two stages, awareness followed by actual adoption. In Kalish's theory, product awareness is propagated through word-of-mouth effects; after an individual becomes aware, she decides whether to adopt the item based on other considerations. Edges in the network can be seen as information channels from one user to another. Once the channel is open (live), it remains so. This modeling choice is reasonable as competitive goods are typically of the same kind and complementary goods tend to be adopted together.

We remark that ComIC encompasses previously studied single-entity and pure-competition models as special cases. When $q_{A|\emptyset} = q_{B|\emptyset} = 1$ and $q_{A|B} = q_{B|A} = 0$, ComIC reduces to the (purely) Competitive Independent Cascade model [35].
If, in addition, $q_{B|\emptyset}$ is 0, the model further reduces to the classic IC model.

4.3.5 An Equivalent Possible World Model

To facilitate a better understanding of the ComIC model and our submodularity analysis (Section 4.5), we now describe a Possible World (PW) model that provides an equivalent view of influence diffusion processes under the ComIC model.

Given a graph $G = (V, E, p)$ and a diffusion model, a possible world consists of a deterministic graph sampled from a probability distribution over all subgraphs of $G$. For ComIC, we also need some variables for each node to fix the outcomes of random events in relation to the NLA (i.e., adoption, tie-breaking, and reconsideration), so that the influence cascade is fully deterministic in a single possible world.

4.3.5.1 Definition of the Possible World Model

Generative rules. Let $W$ be any possible world. To generate such $W$, we first process edges: retain each edge $(u, v) \in E$ with probability $p_{u,v}$ (live edge) and drop it with probability $1 - p_{u,v}$ (blocked edge). This generates a deterministic graph $G_W = (V, E_W)$ where $E_W$ is the set of all live edges. Next we process nodes. For all $v \in V$:

1. Choose "thresholds" $\alpha_A^{v,W}$ and $\alpha_B^{v,W}$ independently and uniformly at random from the interval $[0, 1]$. These two values are used for comparison with GAPs in adoption decisions. When the possible world $W$ is clear from context, we write $\alpha_A^v$ and $\alpha_B^v$ for simplicity;

2. Generate a random permutation $\pi_v$ of all in-neighbors $u \in N^-(v)$. This is for tie-breaking;

3. Sample a discrete value $\tau_v \in \{A, B\}$, where each value has a probability of 0.5. This is used for tie-breaking in case $v$ is a seed of both A and B.

Deterministic diffusions in a possible world.
At time step 0, the A-seeds in $S_A$ become A-adopted and the B-seeds in $S_B$ become B-adopted (ties, if any, are broken based on $\tau_v$).

Then, iteratively for each time step $t \ge 1$, we say that a node $v$ is reachable by A at time step $t$ if $t$ is the length of a shortest path from any seed $u \in S_A$ to $v$ consisting entirely of live edges and A-adopted nodes. Node $v$ then becomes A-adopted at step $t$ if $\alpha_A^v \le x$, where
\[ x = \begin{cases} q_{A|\emptyset}, & \text{if $v$ is not B-adopted;} \\ q_{A|B}, & \text{if $v$ is already B-adopted.} \end{cases} \]

For reconsideration, suppose $v$ just becomes B-adopted at time step $t$ and it is A-suspended (i.e., $v$ became reachable by A before time step $t$ but $\alpha_A^v > q_{A|\emptyset}$). Then, $v$ adopts A if $\alpha_A^v \le q_{A|B}$. The reachability and reconsideration tests for item B are symmetric.

For tie-breaking, if $v$ is reached by both A and B at time step $t$, the permutation $\pi_v$ is used to determine the order in which A and B are considered. In addition, if $v$ is reached by A and B from the same in-neighbor, e.g., $u$, then the order in which $v$ is informed of A and B is the same as the order in which $u$ itself adopted A and B.

4.3.5.2 Equivalence to ComIC

The following lemma establishes the equivalence between the possible world model defined above and the ComIC model, from the standpoint of influence diffusion and the probability distribution of adopted nodes. This allows us to analyze model properties such as monotonicity and submodularity (Section 4.5) using the PW model, which tends to be more convenient technically.

Lemma 3. For any fixed A-seed set $S_A$ and fixed B-seed set $S_B$, the joint distributions of the sets of A-adopted nodes and B-adopted nodes obtained by (i) running a ComIC diffusion from $S_A$ and $S_B$ and (ii) randomly sampling a possible world $W$ and running a deterministic cascade from $S_A$ and $S_B$ in $W$, are the same.

Proof.
The proof is based on establishing equivalence on all edge-level and node-level activities in these two models.

By the principle of deferred decisions and the fact that each edge is only tested once in one diffusion, edge transition processes are equivalent. To generate a possible world, the live/blocked status of an edge is pre-determined and revealed when needed, while in a ComIC process, the status is determined on the fly.

Tie-breaking is also equivalent. Note that each node $v$ only needs to apply the random permutation $\pi_v$ for breaking ties at most once. For ComIC, we need to apply $\pi_v$ only when $v$ is transitioning out of state (A-idle, B-idle) after being informed of both A and B. Clearly, this transition occurs at most once for each node. The same logic applies to the PW model. Thus, the equivalence is obvious due to the principle of deferred decisions.

The equivalence of decision-making for adoption is straightforward as $\alpha_A^v$ and $\alpha_B^v$ are chosen uniformly at random from $[0, 1]$. Hence, $\Pr[\alpha_A^v \le q] = q$, where $q \in \{q_{A|\emptyset}, q_{A|B}, q_{B|\emptyset}, q_{B|A}\}$.

As to reconsideration, w.l.o.g. we consider A. In ComIC, the probability of reconsideration is $\rho_A = \max\{(q_{A|B} - q_{A|\emptyset}), 0\}/(1 - q_{A|\emptyset})$. In PW, when $q_{A|B} \ge q_{A|\emptyset}$, this amounts to the probability that $\alpha_A^v \le q_{A|B}$ given $\alpha_A^v > q_{A|\emptyset}$, which is $(q_{A|B} - q_{A|\emptyset})/(1 - q_{A|\emptyset})$. On the other hand, when $q_{A|B} < q_{A|\emptyset}$, $\alpha_A^v > q_{A|\emptyset}$ implies $\alpha_A^v > q_{A|B}$, which means reconsideration is meaningless, and this corresponds to $\rho_A = 0$ in ComIC. Thus, the equivalence is established.

Finally, the seeding protocol is trivially the same. Combining the equivalence for all edge-level and node-level activities, we can see that the two models are equivalent and yield the same distribution of A- and B-adopted nodes, for any given $S_A$ and $S_B$. $\square$

Note that since the $\alpha_A^v$'s and $\alpha_B^v$'s are real values in the interval $[0, 1]$, theoretically the number of valid possible worlds can be infinite.
However, from the perspective of influence diffusion (i.e., state transitions of all nodes), the number of all “effective” possible worlds is still finite. To see why, notice that instead of the exact values, it is the interval into which $\alpha_A^v$ or $\alpha_B^v$ falls that ultimately decides the outcomes of the random events regarding adoption. For $\alpha_A^v$, there are three possibilities: $[0, q_{A|\emptyset})$, $[q_{A|\emptyset}, q_{A|B})$, and $[q_{A|B}, 1]$, and the same applies to $\alpha_B^v$. Thus, consider two possible worlds $W_1$ and $W_2$ where everything else is the same but $\alpha_A^{v,W_1} \ne \alpha_A^{v,W_2}$ and $\alpha_B^{v,W_1} \ne \alpha_B^{v,W_2}$, for some $v \in V$. If, for instance, $\alpha_A^{v,W_1}, \alpha_A^{v,W_2} \in [0, q_{A|\emptyset})$ and $\alpha_B^{v,W_1}, \alpha_B^{v,W_2} \in [q_{B|A}, 1]$, then the diffusion dynamics in $W_1$ and $W_2$ will still be the same for any fixed seed sets.

This observation is particularly useful for showing submodularity, since it suffices to show that submodularity holds in an arbitrary possible world (see, e.g., the proof of Theorem 8 in Section 4.5).

4.4 Influence Maximization with Complementary Goods

Many interesting optimization problems can be formulated thanks to the expressiveness of the ComIC model. In this work, we focus on solving influence maximization in a novel context, where two propagating entities are complementary.^2

^2 Recall that competitive viral marketing has been studied extensively in the literature (see Section 3.2 and Section 4.2).

Given the seed sets $S_A$ and $S_B$, we define $\sigma_A(S_A, S_B)$ to be the expected number of A-adopted nodes. Similarly, let $\sigma_B(S_A, S_B)$ denote the expected number of B-adopted nodes. We can see that both $\sigma_A$ and $\sigma_B$ are real-valued bi-set functions mapping $2^V \times 2^V$ to $[0, |V|]$, for any fixed $Q$. Unless otherwise noted, GAPs are not considered as arguments to these two functions, as $Q$ is constant in a given instance of ComIC. Following conventions, these two functions are called influence spread functions.
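To make the two spread functions concrete, the following sketch estimates them by averaging over sampled possible worlds. It is an illustration only, under a simplifying assumption that is not part of the general definition: B is indifferent to A ($q_{B|\emptyset} = q_{B|A}$) and B complements A ($q_{A|\emptyset} \le q_{A|B}$), so that B can be propagated first and the final adoption sets do not depend on timing or tie-breaking. All function and parameter names (`sample_world`, `adopted_in_world`, `estimate_spread`, `q_a0`, `q_ab`, `q_b`) are hypothetical.

```python
import random
from collections import deque


def sample_world(nodes, edges, rng):
    """Sample one possible world: live out-edges plus per-node thresholds.

    `edges` maps (u, v) to the influence probability p_{u,v}.
    """
    live = {}
    for (u, v), p in edges.items():
        if rng.random() < p:          # edge is live with probability p
            live.setdefault(u, []).append(v)
    alpha_a = {v: rng.random() for v in nodes}
    alpha_b = {v: rng.random() for v in nodes}
    return live, alpha_a, alpha_b


def adopted_in_world(world, seeds_a, seeds_b, q_a0, q_ab, q_b):
    """Deterministic cascade in one world.

    Assumption (hypothetical simplification): q_{B|0} = q_{B|A} = q_b and
    q_a0 = q_{A|0} <= q_ab = q_{A|B}.
    """
    live, alpha_a, alpha_b = world

    # B spreads independently of A along live edges.
    b_adopted = set(seeds_b)
    queue = deque(seeds_b)
    while queue:
        u = queue.popleft()
        for v in live.get(u, []):
            if v not in b_adopted and alpha_b[v] <= q_b:
                b_adopted.add(v)
                queue.append(v)

    # A spreads through nodes whose threshold passes the applicable GAP.
    a_adopted = set(seeds_a)
    queue = deque(seeds_a)
    while queue:
        u = queue.popleft()
        for v in live.get(u, []):
            if v in a_adopted:
                continue
            bar = q_ab if v in b_adopted else q_a0
            if alpha_a[v] <= bar:
                a_adopted.add(v)
                queue.append(v)
    return a_adopted, b_adopted


def estimate_spread(nodes, edges, seeds_a, seeds_b, q_a0, q_ab, q_b,
                    runs=2000, seed=0):
    """Monte Carlo estimates of the A-spread and the B-spread."""
    rng = random.Random(seed)
    total_a = total_b = 0
    for _ in range(runs):
        world = sample_world(nodes, edges, rng)
        a, b = adopted_in_world(world, seeds_a, seeds_b, q_a0, q_ab, q_b)
        total_a += len(a)
        total_b += len(b)
    return total_a / runs, total_b / runs
```

In the general model, where B may also depend on A, the two cascades cannot be separated in this way and the time-stepped rules of the possible world model would have to be simulated directly.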
For simplicity, we will refer to $\sigma_A(\cdot, \cdot)$ as “A-spread” and $\sigma_B(\cdot, \cdot)$ as “B-spread”.

Without loss of generality we define the influence maximization problem in terms of A-spread as follows.

Problem 4 (Influence Maximization under ComIC). Given a directed graph $G = (V, E, p)$ with pairwise influence probabilities, B-seed set $S_B \subset V$, a cardinality constraint $k$, and a set of GAPs $Q^+$ representing mutual complementarity, find an A-seed set $S_A^* \subset V$ of size $k$, such that the expected number of A-adopted nodes is maximized under ComIC:

$$S_A^* \in \operatorname{argmax}_{T \subseteq V, |T| = k} \sigma_A(T, S_B).$$

Influence Maximization under ComIC is NP-hard, as it subsumes the original problem under the classic IC model when $S_B = \emptyset$ and $q_{A|\emptyset} = q_{A|B} = 1$.

4.5 Properties of ComIC Model

Since Influence Maximization cannot be solved in polynomial time unless P = NP, we now study submodularity and monotonicity for the ComIC model, which will pave the way for designing approximation algorithms. Without loss of generality, we focus on $\sigma_A$ only. Note that this influence spread function is a bi-set function taking arguments $S_A$ and $S_B$, so submodularity and monotonicity can be defined w.r.t. each of the two arguments. For the purpose of Influence Maximization, it suffices to study these two properties w.r.t. $S_A$ only.

4.5.1 Monotonicity

It turns out that if A competes with B, but B complements A, monotonicity does not hold in general, as shown in the following counter-example.

[Figure 4.3: The graph for Example 3 (nodes $v$, $w$, $u$, $y$, $s_1$, $s_2$)]

Example 3 (Non-Monotonicity). Consider the graph in Figure 4.3. All edges have probability 1. GAPs are $q_{A|\emptyset} = q \in (0, 1)$, $q_{A|B} = q_{B|\emptyset} = 1$, $q_{B|A} = 0$, which means that A competes with B but B complements A. Let $S_B = \{y\}$. If $S_A$ is $S = \{s_1\}$, the probability that $v$ becomes A-adopted is 1, because $v$ is informed of A from $s_1$, and even if it does not adopt A at the time, later it will surely adopt B propagated from $y$, and then $v$ will reconsider A and adopt A. If it is $T = \{s_1, s_2\}$, that probability is $1 - q + q^2 < 1$: $w$ gets A-adopted w.p. $q$, blocking B, and then $v$ gets A-adopted w.p. $q$; $w$ gets B-adopted w.p. $(1 - q)$ and then $v$ surely gets A-adopted. Replicating sufficiently many $v$'s, all connected to $s_1$ and $w$, will lead to $\sigma_A(T, S_B) < \sigma_A(S, S_B)$. The intuition is that the additional A-seed $s_2$ “blocks” B-propagation as A competes with B ($q_{B|A} < q_{B|\emptyset}$) but B complements A ($q_{A|B} > q_{A|\emptyset}$). Clearly $\sigma_A$ is not monotonically decreasing in $S_A$ either (e.g., in a graph where all nodes are isolated). Hence, $\sigma_A$ is not monotone in $S_A$.

The counter-example above has A competing with B, but B complementing A, which is unnatural. Hence, we now focus on mutual competition ($Q^-$) and mutual complementary cases ($Q^+$), and show that monotonicity is satisfied in these settings.

Theorem 7. For any fixed B-seed set $S_B$, the influence spread function of A, $\sigma_A(S_A, S_B)$, is monotonically increasing in $S_A$ for any set of GAPs in $Q^+$ and $Q^-$. Also, $\sigma_A(S_A, S_B)$ is monotonically increasing in $S_B$ for any GAPs in $Q^+$, and monotonically decreasing in $S_B$ for any $Q^-$.

For ease of exposition, we also state a symmetric version of Theorem 7 w.r.t. $\sigma_B$. That is, given any fixed A-seed set, $\sigma_B(S_A, S_B)$ is monotonically increasing in $S_B$ for any set of GAPs in $Q^+$ and $Q^-$. Also, $\sigma_B(S_A, S_B)$ is monotonically increasing in $S_A$ for any GAPs in $Q^+$, and monotonically decreasing in $S_A$ for any $Q^-$. For technical reasons and notational convenience, in the proof of Theorem 7 presented below, we “concurrently” prove both Theorem 7 and this symmetric version, without loss of generality.

Proof of Theorem 7. We first fix a B-seed set $S_B$. Since $S_B$ is always fixed, in the remaining proof we omit $S_B$ from the notation whenever it is clear from context. It suffices to show that monotonicity holds in an arbitrary, fixed possible world, which implies monotonicity holds for the diffusion model. Let $W$ be an arbitrary possible world generated according to §4.3.5.

Define $\Phi_A^W(S_A)$ (resp. $\Phi_B^W(S_A)$) to be the set of A-adopted (resp. B-adopted) nodes in possible world $W$ with $S_A$ being the A-seed set (and $S_B$ being the fixed B-seed set). Furthermore, for any time step $t \ge 0$, define $\Phi_A^W(S_A, t)$ (resp. $\Phi_B^W(S_A, t)$) to be the set of A-adopted (resp. B-adopted) nodes in $W$ by the end of step $t$, given A-seed set $S_A$. Clearly, $\Phi_A^W(S_A) = \bigcup_{t \ge 0} \Phi_A^W(S_A, t)$ and $\Phi_B^W(S_A) = \bigcup_{t \ge 0} \Phi_B^W(S_A, t)$. Let $S$ and $T$ be two sets, with $S \subseteq T \subseteq V$.

Mutual Competition $Q^-$. Our goal is to prove that for any $v \in V$, (a) if $v \in \Phi_A^W(S)$, then $v \in \Phi_A^W(T)$; and (b) if $v \in \Phi_B^W(T)$, then $v \in \Phi_B^W(S)$. Item (a) implies the self-monotonic increasing property while item (b) implies the cross-monotonic decreasing property. We use an inductive proof to establish both results together, as follows. For every $t \ge 0$, we inductively show that (i) if $v \in \Phi_A^W(S, t)$, then $v \in \Phi_A^W(T, t)$; and (ii) if $v \in \Phi_B^W(T, t)$, then $v \in \Phi_B^W(S, t)$.

Consider the base case of $t = 0$. If $v \in \Phi_A^W(S, 0)$, then it means $v \in S$, and thus $v \in T = \Phi_A^W(T, 0)$. If $v \in \Phi_B^W(T, 0)$, it means $v \in S_B$, and thus $v \in \Phi_B^W(S, 0) = S_B$.

For the induction step, suppose that (i) and (ii) hold for all $t < t'$, and we show (i) and (ii) also hold for $t = t'$. For (i), we only need to consider $v \in \Phi_A^W(S, t') \setminus \Phi_A^W(S, t' - 1)$, i.e., $v$ adopts A at step $t'$ when $S$ is the A-seed set. Since $v$ adopts A, we know that $\alpha_A^v \le q_{A|\emptyset}$. Let $U$ be the set of in-neighbors of $v$ in the possible world $W$. Let $U_A(S_A) = U \cap \Phi_A^W(S_A, t' - 1)$ and $U_B(S_A) = U \cap \Phi_B^W(S_A, t' - 1)$, i.e., $U_A(S_A)$ (resp. $U_B(S_A)$) is the set of in-neighbors of $v$ in $W$ that adopted A (resp. B) by time $t' - 1$, when $S_A$ is the A-seed set. Since $v \in \Phi_A^W(S, t')$, we know that $U_A(S) \ne \emptyset$. By the induction hypothesis, we have $U_A(S) \subseteq U_A(T)$ and $U_B(T) \subseteq U_B(S)$.

Thus, $U_A(T) \ne \emptyset$, which implies that by step $t'$, $v$ must have been informed of A when $T$ is the A-seed set. If $\alpha_A^v \le q_{A|B}$, then whether or not $v$ adopted B, $v$ would adopt A by step $t'$ according to the possible world model. That is, $v \in \Phi_A^W(T, t')$.

Now suppose $q_{A|B} < \alpha_A^v \le q_{A|\emptyset}$.
For a contradiction suppose $v \notin \Phi_A^W(T, t')$, i.e., $v$ does not adopt A by step $t'$ when $T$ is the A-seed set. Since $v$ has been informed of A by $t'$, the only possibility that $v$ does not adopt A is that $v$ adopted B earlier than A, which means $v \in \Phi_B^W(T, t')$. Two cases arise:

First, if $v \in \Phi_B^W(T, t' - 1)$, then by the induction hypothesis $v \in \Phi_B^W(S, t' - 1)$. Since $v \in \Phi_A^W(S, t') \setminus \Phi_A^W(S, t' - 1)$, it means that when $S$ is the A-seed set, $v$ adopts B first before adopting A, but this contradicts the condition that $q_{A|B} < \alpha_A^v$. Therefore, $v \notin \Phi_B^W(T, t' - 1)$.

Second, $v \in \Phi_B^W(T, t') \setminus \Phi_B^W(T, t' - 1)$. Since $v \notin \Phi_A^W(T, t')$, it means that $v$ is informed of A at step $t'$ when $T$ is the A-seed set, and thus the tie-breaking rule must have been applied at this step and B is ordered before A. However, looking at the in-neighbors of $v$ in $W$, by the induction hypothesis, $U_A(S) \subseteq U_A(T)$ and $U_B(T) \subseteq U_B(S)$. This implies that when $S$ is the A-seed set, the same tie-breaking rule at $v$ would still order B before A, but this would result in $v$ not adopting A at step $t'$, a contradiction. Therefore, we know that $v \in \Phi_A^W(T, t')$.

The statement of (ii) is symmetric to (i): if we exchange A and B and exchange $S$ and $T$, (ii) becomes (i). In fact, one can check that we can literally translate the induction step proof for (i) into the proof for (ii) by exchanging the pair A and B and the pair $S$ and $T$ (except that (a) we keep the definitions of $U_A(S_A)$ and $U_B(S_A)$, and (b) whenever we say some set is the A-seed set, we keep this A). This concludes the proof of the mutual competition case.

Mutual Complementarity $Q^+$. The proof structure is very similar to that of the mutual competition case.
Our goal is to prove that for any $v \in V$, (a) if $v \in \Phi_A^W(S)$, then $v \in \Phi_A^W(T)$; and (b) if $v \in \Phi_B^W(S)$, then $v \in \Phi_B^W(T)$. To show this, we inductively prove the following: for every $t \ge 0$, (i) if $v \in \Phi_A^W(S, t)$, then $v \in \Phi_A^W(T, t)$; and (ii) if $v \in \Phi_B^W(S, t)$, then $v \in \Phi_B^W(T, t)$. The base case is trivially true.

For the induction step, suppose (i) and (ii) hold for all $t < t'$, and we show that (i) and (ii) also hold for $t = t'$.

For (i), we only need to consider $v \in \Phi_A^W(S, t') \setminus \Phi_A^W(S, t' - 1)$, i.e., $v$ adopts A at step $t'$ when $S$ is the A-seed set. Since $v$ adopts A, we know that $\alpha_A^v \le q_{A|B}$. Since $v \in \Phi_A^W(S, t')$, we know that $U_A(S) \ne \emptyset$. By the induction hypothesis we have $U_A(S) \subseteq U_A(T)$. Thus we know that $U_A(T) \ne \emptyset$, which implies that by step $t'$, $v$ must have been informed of A when $T$ is the A-seed set. If $\alpha_A^v \le q_{A|\emptyset}$, then whether or not $v$ adopted B, $v$ would adopt A by step $t'$ according to the possible world model. Thus, $v \in \Phi_A^W(T, t')$.

Now suppose $q_{A|\emptyset} < \alpha_A^v \le q_{A|B}$. Since $v \in \Phi_A^W(S, t')$, the only possibility is that $v$ adopts B first by time $t'$ so that after reconsideration, $v$ adopts A due to the condition $\alpha_A^v \le q_{A|B}$. Thus we have $v \in \Phi_B^W(S, t')$, and $\alpha_B^v \le q_{B|\emptyset}$. If $v \in \Phi_B^W(S, t' - 1)$, by the induction hypothesis $v \in \Phi_B^W(T, t' - 1)$, which means that $v$ adopts B by time $t' - 1$ when $T$ is the A-seed set. Since $v$ has been informed of A by time $t'$ when $T$ is the A-seed set, the condition $\alpha_A^v \le q_{A|B}$ implies that $v$ adopts A by time $t'$ when $T$ is the A-seed set, i.e., $v \in \Phi_A^W(T, t')$.

Finally we consider the case of $v \in \Phi_B^W(S, t') \setminus \Phi_B^W(S, t' - 1)$. Looking at the in-neighbors of $v$ in $W$, $v \in \Phi_B^W(S, t')$ implies that $U_B(S) \ne \emptyset$. By the induction hypothesis, we have $U_B(S) \subseteq U_B(T)$, and thus $U_B(T) \ne \emptyset$. This implies that when $T$ is the A-seed set, node $v$ must have been informed of B by time $t'$. Since $\alpha_B^v \le q_{B|\emptyset}$, we have that $v$ adopts B by time $t'$ when $T$ is the A-seed set. Then the condition $\alpha_A^v \le q_{A|B}$ implies that $v$ adopts A by time $t'$ when $T$ is the A-seed set, i.e., $v \in \Phi_A^W(T, t')$.

This concludes the inductive step for (i) in the mutual complementarity case.
The inductive step for (ii) is completely symmetric, and hence omitted. Therefore, we have completed the proof for the mutual complementarity case. As a result, the whole theorem holds.

4.5.2 Submodularity in Complementary Setting

Next, we analyze the submodularity of $\sigma_A(S_A, S_B)$ w.r.t. $S_A$ for mutual complementary GAPs. This has direct impact on the approximability of Influence Maximization (Problem 4). We show that submodularity is satisfied in the case of “one-way complementarity”, i.e., B complements A ($q_{A|\emptyset} \le q_{A|B}$), but A does not affect B ($q_{B|\emptyset} = q_{B|A}$), or vice versa (Theorem 8). However, this property is not satisfied in general, as we show below.

[Figure 4.4: The graph for Example 4 (nodes $v$, $z$, $w$, $y$, $u$, $x$)]

Example 4 (Non-Submodularity). Consider the possible world in Figure 4.4. All edges are live. The node thresholds are: for $w$: $\alpha_A^w \le q_{A|\emptyset}$, $q_{B|\emptyset} < \alpha_B^w \le q_{B|A}$; for $z$: $\alpha_A^z > q_{A|B}$, $\alpha_B^z < q_{B|\emptyset}$; for $v$: $q_{A|\emptyset} < \alpha_A^v \le q_{A|B}$, $\alpha_B^v \le q_{B|\emptyset}$. Then fix $S_B = \{y\}$. For $S_A$, let $S = \emptyset$, $T = \{x\}$, and let $u$ be the additional seed. It can be verified that $v$ becomes A-adopted only when $S_A = T \cup \{u\}$, violating submodularity.

A concrete example of $Q$ for which submodularity does not hold is as follows: $q_{A|\emptyset} = 0.078432$; $q_{A|B} = 0.24392$; $q_{B|\emptyset} = 0.37556$; $q_{B|A} = 0.99545$. Seed sets are the same as above. We denote by $p_v(S_A)$ the probability that $v$ becomes A-adopted with A-seed set $S_A$. It can be verified that: $p_v(S) = 0$, $p_v(S \cup \{u\}) = 8.898 \cdot 10^{-5}$, $p_v(T) = 0.027254$, and $p_v(T \cup \{u\}) = 0.027383$. Clearly, $p_v(T \cup \{u\}) - p_v(T) > p_v(S \cup \{u\}) - p_v(S)$. Hence, replicating $v$ sufficiently many times will lead to $\sigma_A(T \cup \{u\}, S_B) - \sigma_A(T, S_B) > \sigma_A(S \cup \{u\}, S_B) - \sigma_A(S, S_B)$, violating submodularity.

In what follows, we first give two useful lemmas. Thanks to Lemma 4 below, we may assume w.l.o.g. that tie-breaking always favours A in complementary cases.

Lemma 4. Consider any ComIC instance with $Q^+$.
Given fixed A- and B-seed sets, for all nodes $v \in V$, all permutations of $v$'s in-neighbors are equivalent in terms of determining if $v$ becomes A-adopted and B-adopted. This implies that tie-breaking is not needed for the mutual complementary case.

Proof. Without loss of generality, we only need to consider a node $v$ and two of its in-neighbours $u_A$ and $u_B$ which become A-adopted and B-adopted at $t - 1$, respectively. In a possible world, there are nine possible combinations of the values of $\alpha_A^v$ and $\alpha_B^v$. We show that in all such combinations, the orderings $\pi_1 = \langle u_A, u_B \rangle$ and $\pi_2 = \langle u_B, u_A \rangle$ produce the same outcome for $v$.

1. $\alpha_A^v \le q_{A|\emptyset} \wedge \alpha_B^v \le q_{B|\emptyset}$. Both $\pi_1$ and $\pi_2$ make $v$ A-adopted and B-adopted.
2. $\alpha_A^v \le q_{A|\emptyset} \wedge q_{B|\emptyset} < \alpha_B^v \le q_{B|A}$. Both $\pi_1$ and $\pi_2$ make $v$ A-adopted and B-adopted. With $\pi_2$, $v$ first becomes B-suspended, then A-adopted, and finally B-adopted due to re-consideration.
3. $\alpha_A^v \le q_{A|\emptyset} \wedge \alpha_B^v > q_{B|A}$. Both $\pi_1$ and $\pi_2$ make $v$ A-adopted only.
4. $q_{A|\emptyset} < \alpha_A^v \le q_{A|B} \wedge \alpha_B^v \le q_{B|\emptyset}$. Symmetric to (2).
5. $q_{A|\emptyset} < \alpha_A^v \le q_{A|B} \wedge q_{B|\emptyset} < \alpha_B^v \le q_{B|A}$. In this case, $v$ does not adopt any item.
6. $q_{A|\emptyset} < \alpha_A^v \le q_{A|B} \wedge \alpha_B^v > q_{B|A}$. In this case, $v$ does not adopt any item.
7. $\alpha_A^v > q_{A|B} \wedge \alpha_B^v \le q_{B|\emptyset}$. Symmetric to (3): $v$ is B-adopted only.
8. $\alpha_A^v > q_{A|B} \wedge q_{B|\emptyset} < \alpha_B^v \le q_{B|A}$. Symmetric to (6).
9. $\alpha_A^v > q_{A|B} \wedge \alpha_B^v > q_{B|A}$. In this case, $v$ does not adopt any item.

Since the possible world model is equivalent to ComIC (Lemma 3), the lemma holds true.

Lemma 5. In the ComIC model, if B is indifferent to A (i.e., $q_{B|A} = q_{B|\emptyset}$), then for any fixed B-seed set $S_B$, the probability distribution over sets of B-adopted nodes is independent of the A-seed set. Symmetrically, the probability distribution over sets of A-adopted nodes is also independent of the B-seed set if A is indifferent to B.

Proof. Consider an arbitrary possible world $W$. Let $q := q_{B|\emptyset} = q_{B|A}$.
A node $v$ becomes B-adopted in $W$ as long as $\alpha_B^v \le q$ and there is a live-edge path $P_B$ from $S_B$ to $v$ such that for all nodes $w$ on $P_B$ (excluding seeds), $\alpha_B^w \le q$. Since $q_{B|\emptyset} = q_{B|A}$, this condition under which $v$ becomes B-adopted in $W$ is completely independent of any node's state w.r.t. A. Thus, the propagation of B-adoption is completely independent of the actual A-seed set (even if it is empty). Due to the equivalence of the possible world model and ComIC, the lemma holds.

Theorem 8. For any instance of the ComIC model with $q_{A|\emptyset} \le q_{A|B}$ and $q_{B|\emptyset} = q_{B|A}$, we have

1. $\sigma_A(S_A, S_B)$ is submodular w.r.t. A-seed set $S_A$, for any fixed B-seed set $S_B$.
2. $\sigma_B(S_A, S_B)$ is submodular w.r.t. B-seed set $S_B$, and is independent of A-seed set $S_A$.

Proof. First of all, the submodularity of $\sigma_B$ holds trivially. Lemma 5 implies that the diffusion of A does not affect the diffusion of B whatsoever. Thus, $\sigma_B(S_A, S_B) = \sigma_B(\emptyset, S_B)$. It can be shown that the function $\sigma_B(\emptyset, S_B)$ is both monotone and submodular w.r.t. $S_B$, for any $q_{B|\emptyset}$, through a straightforward extension to the proof of Theorem 2.2 in Kempe et al. [86].

Now we prove the submodularity of $\sigma_A$. First, fix any possible world $W$ and any B-seed set $S_B \ne \emptyset$. Let $\Phi_A^W(S_A)$ be the set of A-adopted nodes in possible world $W$ with A-seed set $S_A$ ($S_B$ omitted when it is clear from the context). Consider two sets $S \subseteq T \subseteq V$, some node $u \in V \setminus T$, and finally a node $v \in \Phi_A^W(T \cup \{u\}) \setminus \Phi_A^W(T)$. There must exist a live-edge path $P_A$ from $T \cup \{u\}$ to $v$ consisting entirely of A-adopted nodes. We denote by $w_0 \in T \cup \{u\}$ the origin of $P_A$. We first prove the following claim.

Claim 1. All nodes on path $P_A$ remain A-adopted even when $S_A = \{w_0\}$.

Proof of Claim 1. Consider any node $w_i \in P_A$. In this possible world, if $\alpha_A^{w_i} \le q_{A|\emptyset}$, then regardless of the diffusion of B, $w_i$ will adopt A as long as its predecessor $w_{i-1}$ adopts A. If $q_{A|\emptyset} < \alpha_A^{w_i} \le q_{A|B}$, then there must also be a live-edge path $P_B$ from $S_B$ to $w_i$ that consists entirely of B-adopted nodes, and it boosts $w_i$ to adopt A.
Since $q_{B|\emptyset} = q_{B|A}$, A has no effect on the diffusion of B (Lemma 5), so $P_B$ always exists and all nodes on $P_B$ would still be B-adopted through $S_B$ (fixed) irrespective of A-seeds. Thus, $P_B$ always boosts $w_i$ to adopt A as long as $w_{i-1}$ is A-adopted. Hence, the claim holds by a simple induction along $P_A$ starting from $w_0$.

Then, it is easy to see that $w_0 = u$. Suppose otherwise for a contradiction. Then, $w_0 \in T$ must be true. By Claim 1 and the monotonicity of $\sigma_A$ (Theorem 7), $v \in \Phi_A^W(\{w_0\})$ implies that $v \in \Phi_A^W(T)$, a contradiction. Therefore, since $v \notin \Phi_A^W(T)$ and $S \subseteq T$, monotonicity gives $v \notin \Phi_A^W(S)$, while Claim 1 with $w_0 = u$ gives $v \in \Phi_A^W(\{u\}) \subseteq \Phi_A^W(S \cup \{u\})$. This by definition implies that $|\Phi_A^W(\cdot)|$ is submodular for any $W$ and $S_B$. This is sufficient to show that $\sigma_A(S_A, S_B)$ is indeed submodular in $S_A$, due to the fact that a nonnegative linear combination of submodular functions is also submodular, and that the number of effective possible worlds is finite.

4.6 Scalable Approximation Algorithms

In this section, we derive a general framework (§4.6.1) to obtain approximation algorithms for Influence Maximization (§4.6.2).

Recall that for Influence Maximization under the classic IC model, the TIM algorithm proposed by Tang et al. [137] is able to produce a $(1 - 1/e - \varepsilon)$-approximation with at least $1 - |V|^{-\ell}$ probability in $O((k + \ell)(|E| + |V|) \log |V| / \varepsilon^2)$ expected running time. TIM relies on the notion of Reverse-Reachable sets (RR-sets) [22] for computing influence spread accurately and efficiently.

Reverse-Reachable Sets. In a deterministic directed graph $G_d = (V_d, E_d)$, given any $v \in V_d$, we say that all nodes that can reach $v$ in $G_d$ form an RR-set rooted at $v$ [22]. Let $R(v)$ denote such a set:

$$R(v) \stackrel{\text{def}}{=} \{u \in V_d : u \text{ can reach } v \text{ via edges in } E_d\}.$$

A random RR-set encapsulates two levels of randomness: (i) the “root” node $v$ is chosen uniformly at random from the graph, and (ii) a deterministic graph is sampled according to a certain probabilistic rule that retains a subset of edges from the graph.
E.g., for the IC model, each edge $(u, v) \in E$ is independently removed with probability $(1 - p_{u,v})$.

The TIM algorithm first computes a lower bound on the optimal influence spread (which itself is NP-hard to compute). Then it uses the lower bound to derive the number of random RR-sets to be sampled, denoted $\theta$. To guarantee approximation solutions, the following inequality must be satisfied:

$$\theta \ge \varepsilon^{-2} (8 + 2\varepsilon) |V| \cdot \frac{\ell \log |V| + \log \binom{|V|}{k} + \log 2}{OPT_k}, \qquad (4.2)$$

where $OPT_k$ is the optimal influence spread achievable amongst all size-$k$ node-sets, and $\varepsilon$ represents the trade-off between efficiency and quality: a smaller $\varepsilon$ implies more RR-sets (longer running time), but gives a better approximation factor. The approximation guarantee of TIM relies on a key result from [22], re-stated here:

Proposition 2 (Lemma 9 in [137]). Fix a set $S \subseteq V$ and a node $v \in V$. Under the Triggering model, let $\rho_1$ be the probability that $S$ activates $v$ in a cascade, and $\rho_2$ be the probability that $S$ overlaps with a random RR-set $R(v)$ rooted at $v$. Then, $\rho_1 = \rho_2$.

4.6.1 A General Framework Extending TIM

The solution framework proposed in [137] is promising but does not work as is for Influence Maximization with the ComIC model. The primary challenge is how to correctly generate random RR-sets, so that the influence spread computed using those RR-sets is accurate w.r.t. the ComIC model and the same approximation guarantee can be obtained for Problem 4.
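For reference, the classic IC ingredients recalled earlier in this section can be sketched as follows: sampling one random RR-set by a backward search that keeps each incoming edge $(u, v)$ with probability $p_{u,v}$, and computing the smallest $\theta$ allowed by Equation (4.2) once a lower bound on $OPT_k$ is available. This is a minimal sketch, not the TIM implementation of [137]; the function names, the `in_edges` adjacency format, and the use of natural logarithms are assumptions made for illustration.

```python
import math
import random
from collections import deque


def random_rr_set_ic(nodes, in_edges, rng):
    """Sample one random RR-set under the IC model.

    `in_edges[v]` is a list of (u, p_uv) pairs for edges (u, v).
    Returns the chosen root and the set of nodes that reach it
    through edges that survived sampling.
    """
    root = rng.choice(nodes)          # root chosen uniformly at random
    rr = {root}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for u, p in in_edges.get(v, []):
            # Each node is dequeued once, so edge (u, v) is tested at most once.
            if u not in rr and rng.random() < p:
                rr.add(u)
                queue.append(u)
    return root, rr


def num_rr_sets(n, k, eps, ell, opt_lower_bound):
    """Smallest integer theta satisfying Equation (4.2), with a lower
    bound substituted for OPT_k (natural logarithms assumed)."""
    log_binom = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    lam = (8 + 2 * eps) * n * (ell * math.log(n) + log_binom + math.log(2)) / eps ** 2
    return math.ceil(lam / opt_lower_bound)
```

Substituting a lower bound for $OPT_k$ can only increase $\theta$, so the inequality in Equation (4.2) remains satisfied.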
In what follows, we use Possible World (PW) models to generalize the theory in [22, 137] and show that the extended framework is capable of delivering approximation algorithms for our Problem 4.

For a generic stochastic diffusion model $M$, an equivalent PW model $M'$ is a model that specifies a distribution over $\mathcal{W}$, the set of all possible worlds, where influence diffusion in each possible world in $\mathcal{W}$ is deterministic. Further, given a seed set (or two seed sets $S_A$ and $S_B$ as in ComIC), the distribution of the sets of active nodes (or A- and B-adopted nodes in ComIC) in $M$ is the same as the corresponding distribution in $M'$. Then, we define a generalized concept of RR-set through the PW model:

Definition 5 (Generalized RR-Set). For any possible world $W \in \mathcal{W}$ and any root node $v$, the reverse-reachable set (RR-set) of $v$ in $W$, denoted by $R_W(v)$, is defined as the set of all nodes $u$ such that the singleton set $\{u\}$ would activate $v$ in $W$. A random RR-set of $v$ is a set $R_W(v)$ where $W$ is randomly sampled from $\mathcal{W}$ using the probability distribution specified by $M'$.

It is easy to see that Definition 5 encompasses the RR-set definition in [22, 137] for IC, LT, and Triggering models as special cases. For the entire solution framework to work, the key property that RR-sets need to satisfy is the following:

Definition 6 (Activation Equivalence Property). Let $M$ be a stochastic diffusion model and $M'$ be its equivalent possible world model. Let $G = (V, E, p)$ be a graph. Then, RR-sets have the Activation Equivalence Property if for any fixed $S \subseteq V$ and any fixed $v \in V$, the probability that $S$ activates $v$ according to $M$ is the same as the probability that $S$ overlaps with a random RR-set generated from $v$ in a possible world in $M'$.

As shown in [137], the entire correctness and complexity analysis is based on the above property, and in fact in their latest improvement [136], they directly use this property as the definition of general RR-sets.
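Definition 5 can be read operationally: given any deterministic activation test for a fixed possible world, the RR-set of $v$ is obtained by trying every singleton seed set. The sketch below does exactly that by brute force and is meant only to illustrate the definition, not as an efficient construction. The oracle interface `activates(seeds, v)` and the helper `make_reachability_oracle`, which models a world as plain live-edge reachability, are hypothetical.

```python
from itertools import combinations


def generalized_rr_set(nodes, activates, v):
    """RR-set of v in one fixed possible world, per Definition 5:
    all u such that the singleton seed set {u} activates v."""
    return {u for u in nodes if activates({u}, v)}


def make_reachability_oracle(live_out):
    """Hypothetical world: v is activated iff some seed reaches v
    through live edges. `live_out[u]` lists u's live out-neighbors."""
    def activates(seeds, v):
        seen = set(seeds)
        stack = list(seeds)
        while stack:
            u = stack.pop()
            if u == v:
                return True
            for w in live_out.get(u, []):
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return False
    return activates


def equivalence_holds_in_world(nodes, activates, v):
    """Check, for every seed set S, that S activates v iff S meets R_W(v)."""
    rr = generalized_rr_set(nodes, activates, v)
    for size in range(len(nodes) + 1):
        for subset in combinations(nodes, size):
            if activates(set(subset), v) != bool(rr & set(subset)):
                return False
    return True
```

For a world in which this per-world check fails, overlap with $R_W(v)$ and activation by $S$ disagree on some seed set, so the probabilities compared in Definition 6 are not guaranteed to coincide.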
Proposition 2 shows that the activation equivalence property holds for the Triggering model.

We now provide a more general sufficient condition for the activation equivalence property to hold (Lemma 7), which gives concrete conditions on when the RR-set based framework would work. More specifically, we show that for any diffusion model $M$, if there is an equivalent PW model $M'$ of which all possible worlds satisfy the following two properties, then the RR-sets as defined in Definition 5 will enjoy the activation equivalence property.

(P1). Given two seed sets $S \subseteq T$, if a node $v$ can be activated by $S$ in a possible world $W$, then $v$ shall also be activated by $T$ in $W$.

(P2). If a node $v$ can be activated by $S$ in a possible world $W$, then there exists $u \in S$ such that the singleton seed set $\{u\}$ can also activate $v$ in $W$.

In fact, (P1) and (P2) are equivalent to monotonicity and submodularity respectively, as we formally state below.

Lemma 6. Let $W$ be a fixed possible world. Let $f_{v,W}(S)$ be an indicator function that takes on 1 if $S$ can activate $v$ in $W$, and 0 otherwise. Then, $f_{v,W}(\cdot)$ is monotone and submodular for all $v \in V$ if and only if both (P1) and (P2) are satisfied in $W$.

Proof. First consider “if”. Suppose both properties hold in $W$. Monotonicity directly follows from Property (P1). For submodularity, suppose $v$ can be activated by set $T \cup \{x\}$ but not by $T$, where $x \notin T$. By Property (P2), there exists some $u \in T \cup \{x\}$ such that $\{u\}$ can activate $v$ in $W$. If $u \in T$, then $T$ can also activate $v$ by Property (P1), a contradiction. Hence we have $u = x$. Then, consider any subset $S \subset T$. Note that by Property (P1), $S$ cannot activate $v$ (otherwise so could $T$), while $S \cup \{x\}$ can. Thus, $f_{v,W}(\cdot)$ is submodular.

Next we consider “only if”. Suppose $f_{v,W}(\cdot)$ is monotone and submodular for every $v \in V$.
Property (P1) directly follows from monotonicity. For Property (P2), suppose for a contradiction that there exists a seed set $S$ that can activate $v$ in $W$, but there is no $u \in S$ such that $\{u\}$ activates $v$ alone. We repeatedly remove elements from $S$ until the remaining set is a minimal set that can still activate $v$. Let the remaining set be $S'$. Note that $S'$ contains at least two elements. Let $u \in S'$, and then we have $f_{v,W}(\emptyset) = f_{v,W}(\{u\}) = f_{v,W}(S' \setminus \{u\}) = 0$, but $f_{v,W}(S') = 1$, which violates submodularity, a contradiction. This completes the proof.

Lemma 7. Let $M$ be a stochastic diffusion model and $M'$ be its equivalent possible world model. If $M'$ satisfies Properties (P1) and (P2), then the RR-sets as defined in Definition 5 have the activation equivalence property as in Definition 6.

Proof. It is sufficient to prove that in every possible world $W \in \mathcal{W}$, $S$ activates $v$ if and only if $S$ intersects with $v$'s RR-set in $W$, denoted by $R_W(v)$.

Suppose $R_W(v) \cap S \ne \emptyset$. Without loss of generality, we assume a node $u$ is in the intersection. By the definition of RR-set, set $\{u\}$ can activate $v$ in $W$. Per Property (P1), $S$ can also activate $v$ in $W$.

Now suppose $S$ activates $v$ in $W$. Per Property (P2), there exists $u \in S$ such that $\{u\}$ can also activate $v$ in $W$. Then by the RR-set definition, $u \in R_W(v)$. Therefore, $S \cap R_W(v) \ne \emptyset$.

Compared with directly using the activation equivalence property as the RR-set definition as in [136], our RR-set definition provides a more concrete way of constructing RR-sets, and our Lemma 6 and Lemma 7 provide general conditions under which such constructions can ensure algorithm correctness.

Algorithm 7, GeneralTIM, outlines a general solution framework based on RR-sets and TIM. It provides a probabilistic approximation guarantee for any diffusion model that satisfies (P1) and (P2). Note that the estimation of a lower bound $LB$ of $OPT_k$ (line 2) is orthogonal to our contributions and we refer the reader to [137] for details. Finally, we have:

Theorem 9.
Suppose for a stochastic diffusion model $M$ with an equivalent PW model $M'$, that for every possible world $W$ and every $v \in V$, the indicator function $f_{v,W}$ is monotone and submodular. Then for the influence maximization problem under $M$ with graph $G = (V, E, p)$ and seed set size $k$, GeneralTIM (Algorithm 7) applied on the general RR-sets (Definition 5) returns a $(1 - 1/e - \varepsilon)$-approximate solution with at least $1 - |V|^{-\ell}$ probability.

Theorem 9 follows from Lemmas 6 and 7, and the fact that all theoretical analysis of TIM relies only on the Chernoff bound and the activation equivalence property, “without relying on any other results specific to the IC model” [137].

Algorithm 7: GeneralTIM (Generalized Two-phase Influence Maximization Algorithm)
Data: graph $G = (V, E, p)$, $k$, $\varepsilon$, $\ell$
Result: seed set $S$
1 begin
2     $LB \leftarrow$ lower bound of $OPT_k$ estimated by method in [137]
3     Compute $\theta$ using Equation (4.2) with $LB$ replacing $OPT_k$
4     $\mathcal{R} \leftarrow$ generate $\theta$ random RR-sets according to Definition 5 /* using RR-ComIC or RR-ComIC++ */
5     $S \leftarrow \emptyset$
6     for $i = 1$ to $k$ do
7         $v_i \leftarrow$ the node appearing in the most RR-sets in $\mathcal{R}$
8         $S \leftarrow S \cup \{v_i\}$
9         Remove all RR-sets in which $v_i$ appears

Next, we describe how to generate RR-sets correctly for Influence Maximization under the ComIC model (line 4 of Algorithm 7), which is much more involved than the generation process for IC/LT models [137]. We will focus on submodular settings ($q_{A|\emptyset} \le q_{A|B}$ and $q_{B|\emptyset} = q_{B|A}$, cf. Theorem 8) first, and then in §4.7, we propose Sandwich Approximation to handle any mutual complementary cases in which submodularity does not hold in general.

4.6.2 Generating RR-Sets for Influence Maximization with ComIC

We present two algorithms, RR-ComIC and RR-ComIC++, for generating random RR-sets per Definition 5.
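Independently of how the RR-sets are generated, the seed-selection loop of GeneralTIM (lines 5 to 9 of Algorithm 7) is a greedy maximum-coverage step. A minimal sketch follows; the function name `select_seeds`, the recount-from-scratch strategy, and the tie-breaking by smallest node label are illustrative assumptions rather than details taken from [137].

```python
def select_seeds(rr_sets, k):
    """Greedy max coverage over RR-sets (lines 5 to 9 of Algorithm 7).

    Returns the chosen seeds and the number of RR-sets they cover.
    """
    remaining = [set(r) for r in rr_sets]
    seeds = []
    for _ in range(k):
        # Count in how many still-uncovered RR-sets each node appears.
        counts = {}
        for r in remaining:
            for u in r:
                counts[u] = counts.get(u, 0) + 1
        if not counts:                # every RR-set is already covered
            break
        # Ties are broken by the smallest node label (an assumption).
        best = max(sorted(counts), key=counts.get)
        seeds.append(best)
        # Remove all RR-sets in which the chosen node appears.
        remaining = [r for r in remaining if best not in r]
    covered = len(rr_sets) - len(remaining)
    return seeds, covered
```

Under the activation equivalence property, the covered fraction of random RR-sets multiplied by $|V|$ is an estimate of the influence spread of the returned seed set. Recounting from scratch in each round keeps the sketch short; an inverted index from nodes to RR-sets would avoid the repeated scans.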
The overall algorithm for Influence Maximization can be obtained by plugging RR-ComIC or RR-ComIC++ into GeneralTIM (Algorithm 7).

According to Definition 5, for Influence Maximization, the RR-set of a root $v$ in a possible world $W$ is the set of nodes $u$ such that if $u$ is the only A-seed, then $v$ would be A-adopted in $W$ with any fixed B-seed set $S_B$. By Theorems 7 and 8 (whose proofs indeed show that the indicator function $f_{v,W}(S)$ is monotone and submodular), along with Lemmas 6 and 7, we know that RR-sets following Definition 5 have the activation equivalence property. We now focus on how to construct RR-sets that satisfy Definition 5. Recall that in ComIC, adoption decisions for A are based on a number of factors such as whether $v$ is reachable via a live-edge path from $S_A$ and its state w.r.t. B when reached by A. Note that $q_{B|\emptyset} = q_{B|A}$ implies that B-diffusion is independent of A (Lemma 5). Our algorithms take advantage of this fact, by first revealing node states w.r.t. B, which gives a sound basis for generating RR-sets for A.

Algorithm 8: RR-ComIC (Generating RR-set for Problem 4)
Data: graph $G = (V, E, p)$, root node $v$, B-seed set $S_B$
Result: RR-set $R_W(v)$
1 begin
2     Create an empty FIFO queue $Q$ and empty set $R_W(v)$
3     Enqueue all nodes in $S_B$ into $Q$ /* start forward labeling */
4     while $Q$ is not empty do
5         $u \leftarrow Q$.dequeue()
6         Mark $u$ as B-adopted
7         foreach $w \in N^+(u)$ such that $(u, w)$ is live do
8             if $\alpha_B^{w,W} \le q_{B|\emptyset}$ $\wedge$ $w$ is not visited then
9                 $Q$.enqueue($w$) /* also mark $w$ as visited */
10    Clear $Q$, and then enqueue $v$ /* start backward BFS */
11    while $Q$ is not empty do
12        $u \leftarrow Q$.dequeue()
13        $R_W(v) \leftarrow R_W(v) \cup \{u\}$
14        if ($u$ is B-adopted $\wedge$ $\alpha_A^{u,W} \le q_{A|B}$) $\vee$ ($u$ is not B-adopted $\wedge$ $\alpha_A^{u,W} \le q_{A|\emptyset}$) then
15            foreach $w \in N^-(u)$ such that $(w, u)$ is live do
16                if $w$ is not visited then
17                    $Q$.enqueue($w$)
18                    mark $w$ visited

4.6.2.1 The RR-ComIC Algorithm

Conceptually, RR-ComIC (Algorithm 8) proceeds in three phases.

Phase I. Sample a possible world as described in §4.3.5 (omitted from the pseudo-code).

Phase II.
A forward labeling process from the input B-seed set $S_B$ (lines 3 to 9): a node $v$ becomes B-adopted if $\alpha_B^{v,W} \le q_{B|\emptyset}$ and $v$ is reachable from $S_B$ via a path consisting entirely of live edges and B-adopted nodes.

Phase III. Randomly select a node $v$ and generate RR-set $R_W(v)$ by running a Breadth-First Search (BFS) backwards by following incoming edges (lines 10 to 18).

Note that the RR-set generation for IC and LT models [137] is essentially a simpler version of the first and third phases.

Backward BFS. For possible world $W$, an RR-set $R_W(v)$ is formed by all nodes explored in the backward BFS as follows. Initially, we enqueue $v$ into a FIFO queue $Q$. We repeatedly dequeue a node $u$ from $Q$ for processing until $Q$ is empty.

• Case 1: $u$ is B-adopted. There are two sub-cases: (i) If $\alpha_A^u \le q_{A|B}$, then $u$ is able to transition from A-informed to A-adopted. Thus, we continue to examine $u$'s in-neighbors. For all unexplored $w \in N^-(u)$, if edge $(w, u)$ is live, then enqueue $w$; (ii) If $\alpha_A^u > q_{A|B}$, then $u$ cannot transition from A-informed to A-adopted, and thus $u$ has to be an A-seed to become A-adopted. In this case, $u$'s in-neighbors will not be examined.

• Case 2: $u$ is not B-adopted. Similarly, if $\alpha_A^u \le q_{A|\emptyset}$, perform actions as in 1(i); otherwise perform actions as in 1(ii).

Theorem 10. Under one-way complementarity ($q_{A|\emptyset} \le q_{A|B}$ and $q_{B|\emptyset} = q_{B|A}$), the RR-sets generated by the RR-ComIC algorithm satisfy Definition 5 for Problem 4. As a result, Theorem 9 applies to GeneralTIM with RR-ComIC in this case.

Proof. It suffices to show that, given a fixed possible world $W$, a fixed B-seed set $S_B$, and a certain node $u \in V$, for any node $v \notin \Phi_A^W(\emptyset, S_B)$ with $\alpha_A^v \le q_{A|B}$, we have: $v \in \Phi_A^W(\{u\}, S_B)$ if and only if there exists a live-edge path $P$ from $u$ to $v$ such that every node $w \in P$, excluding $u$, satisfies $\alpha_A^w \le q_{A|B}$, and in case $\alpha_A^w > q_{A|\emptyset}$, $w$ must be B-adopted.

The “if” direction is straightforward as $P$ will propagate the adoption information of A all the way to $v$.
If $\alpha_A^v \le q_{A|\emptyset}$, it adopts A without question. If $\alpha_A^v \in (q_{A|\emptyset}, q_{A|B}]$, then $v$ must be B-adopted by the definition of $P$, which makes it A-adopted.

For the “only if” part, first suppose no live-edge path from $u$ to $v$ exists. This leads to a direct contradiction: since $u$ is the only A-seed and $u$ lacks a live-edge path to $v$, it is impossible for $v$ to get informed of A, let alone adopt A. Next, suppose there is a live-edge path $P$ from $u$ to $v$, but there is a certain node $w \in P$ which violates the conditions set out above. First, $w$ could have a “bad” threshold: $\alpha_A^w > q_{A|B}$. In this case, $w$ will not adopt A regardless of its status w.r.t. B, and hence the propagation of A will not reach $v$. Second, $w$ could have a threshold such that $\alpha_A^w \in (q_{A|\emptyset}, q_{A|B}]$ but it does not adopt B under the influence of the given $S_B$. Similar to the previous case, $w$ will not adopt A and the propagation of A will not reach $v$. This completes the “only if” part.

Then by Definition 5, the theorem follows.

Lazy sampling. For RR-ComIC to work, it is not necessary to sample all edge- and node-level variables (i.e., the entire possible world) up front, as the forward labeling and backward BFS are unlikely to reach the whole graph. Hence, we can simply reveal edge and node states on demand (“lazy sampling”), based on the principle of deferred decisions. In light of this observation, the following improvements are made to RR-ComIC.

First, the first phase is simply skipped. Second, in Phase II, edge states and $\alpha$-values are sampled as the forward labeling from $S_B$ goes on. We record the outcomes, as it is possible to encounter certain edges and nodes again in the third phase. Next, for Phase III, consider any node $u$ dequeued from $Q$.
We need to perform an additional check on every incoming edge $(w, u)$. If $(w, u)$ has already been tested live in Phase II, then we just enqueue $w$. Otherwise, we first sample its live/blocked status, and enqueue $w$ if it is live. Algorithm 8 provides the pseudo-code for RR-ComIC, where sampling is assumed to be done whenever we need to check the status of an edge or the $\alpha$-values of a node.

Expected time complexity. For the entire seed selection (Algorithm 7 with RR-ComIC) to guarantee approximate solutions, we must estimate a lower bound $\mathrm{LB}$ of $\mathrm{OPT}_k$ and use it to derive the minimum number of RR-sets required, defined as $\theta$ in Equation (4.2). In expectation, the algorithm runs in $O(\theta \cdot \mathrm{EPT})$ time, where $\mathrm{EPT}$ is the expected number of edges explored in generating one RR-set. In our case, $\mathrm{EPT} = \mathrm{EPT}_F + \mathrm{EPT}_B$, where $\mathrm{EPT}_F$ ($\mathrm{EPT}_B$) is the expected number of edges examined in forward labeling (resp., backward BFS). Thus, we have the following result.

Lemma 8. The expected running time complexity of GeneralTIM with RR-ComIC is

$$O\left((k + \ell)(|V| + |E|) \log |V| \left(1 + \frac{\mathrm{EPT}_F}{\mathrm{EPT}_B}\right)\right).$$

Proof. Given a fixed RR-set $R \subseteq V$, let $\omega(R)$ be the number of edges in $G$ that point to nodes in $R$. Since in RR-ComIC it is possible that we do not examine incoming edges to a node added to the RR-set (cf. Cases 1(ii) and 2(ii) in the backward BFS), we have:

$$\mathrm{EPT}_B \le \mathbb{E}[\omega(R)],$$

where the expectation is taken over the random choices of $R$. By Lemma 4 in [137] (note that this lemma only relies on the activation equivalence property of RR-sets, which holds true in our current one-way complementarity setting),

$$\frac{|V|}{|E|} \cdot \mathbb{E}[\omega(R)] \le \mathrm{OPT}_k.$$

This gives

$$\frac{|V|}{|E|} \cdot \mathrm{EPT}_B \le \mathrm{OPT}_k.$$

Following the same analysis as in [137], one can check that the lower bound $\mathrm{LB}$ of $\mathrm{OPT}_k$ obtained by the estimation method in [137] guarantees that $\mathrm{LB} \ge \mathrm{EPT}_B \cdot |V|/|E|$.
Since in our algorithm we set $\theta = \lambda/\mathrm{LB}$, where (following Eq. (4.2))

$$\lambda = \epsilon^{-2} (8 + 2\epsilon)\,|V| \left(\ell \log |V| + \log \binom{|V|}{k} + \log 2\right),$$

the expected running time of generating all RR-sets is:

$$O(\theta \cdot \mathrm{EPT}) = O\left(\frac{\lambda}{\mathrm{LB}} \cdot (\mathrm{EPT}_F + \mathrm{EPT}_B)\right) = O\left(\frac{\lambda |E|}{|V| \, \mathrm{EPT}_B} (\mathrm{EPT}_F + \mathrm{EPT}_B)\right) = O\left(\frac{\lambda |E|}{|V|} \left(1 + \frac{\mathrm{EPT}_F}{\mathrm{EPT}_B}\right)\right) = O\left((k + \ell)(|V| + |E|) \log |V| \left(1 + \frac{\mathrm{EPT}_F}{\mathrm{EPT}_B}\right)\right).$$

The last step treats $\epsilon$ as a constant and uses $\log \binom{|V|}{k} \le k \log |V|$, so that $\lambda = O((k + \ell)\,|V| \log |V|)$.

The time complexity for estimating $\mathrm{LB}$ and for calculating the final seed set given RR-sets is the same as in [137], and thus the final complexity is

$$O\left((k + \ell)(|V| + |E|) \log |V| \left(1 + \frac{\mathrm{EPT}_F}{\mathrm{EPT}_B}\right)\right).$$

This completes our analysis.

Observe that $\mathrm{EPT}_F$ increases when the input B-seed set grows. Intuitively, it is reasonable that a larger B-seed set may have more complementary effect and thus it may take longer to find the best A-seed set. However, it is possible to reduce $\mathrm{EPT}_F$, as described in the next algorithm.

4.6.2.2 The RR-ComIC++ Algorithm

The RR-ComIC algorithm may incur loss of efficiency because some of the work done in forward labeling (Phase II) may not be used in backward BFS (Phase III). For instance, consider an extreme situation where all nodes explored in forward labeling are in a different connected component of the graph than the root $v$ of the RR-set. In this case, forward labeling can be skipped safely and entirely! To leverage this observation, we propose RR-ComIC++ (pseudo-code in Algorithm 9), of which the key idea is to run two rounds of backward BFS from the random root $v$. The first round determines the necessary scope of forward labeling, while the second one generates the RR-set.

First backward BFS. As usual, we create a FIFO queue $Q$ and enqueue the randomly chosen root $v$. We also sample $\alpha_B^v$ uniformly at random from $[0, 1]$. Then we repeatedly dequeue a node $u$ until $Q$ is empty: for each incoming edge $(w, u)$, we test its live/blocked status based on probability $p_{w,u}$, independently. If $(w, u)$ is live and $w$ has not been visited before, enqueue $w$ and sample its $\alpha_B^w$.

Let $T_1$ be the set of all nodes explored.
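The first backward BFS described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis implementation (which was written in C++): the function name `first_backward_bfs`, the dictionary-based graph representation (`in_neighbors`, `prob`), and the returned bookkeeping are all hypothetical. Following Algorithm 9, an edge whose tail has already been visited is left untested, and edge states are revealed lazily.

```python
import random
from collections import deque


def first_backward_bfs(in_neighbors, prob, root, rng=None):
    """Illustrative first backward BFS of RR-ComIC++ with lazy sampling.

    in_neighbors: dict mapping node u -> list of in-neighbors w, i.e. edges (w, u).
    prob:         dict mapping edge (w, u) -> influence probability p_{w,u}.
    Returns (T1, edge_state, alpha_B), where T1 lists explored nodes in BFS order.
    """
    rng = rng or random.Random()
    edge_state = {}                   # (w, u) -> True if live; kept for later rounds
    alpha_B = {root: rng.random()}    # threshold of the root, uniform on [0, 1)
    visited = {root}
    T1 = []
    queue = deque([root])
    while queue:
        u = queue.popleft()
        T1.append(u)
        for w in in_neighbors.get(u, []):
            if w in visited:
                continue              # (w, u) stays untested in this round
            live = rng.random() < prob[(w, u)]
            edge_state[(w, u)] = live
            if live:
                visited.add(w)
                alpha_B[w] = rng.random()
                queue.append(w)
    return T1, edge_state, alpha_B


# Toy graph (hypothetical): b -> a -> v are certain edges, c -> v never fires.
toy_in = {"v": ["a", "c"], "a": ["b"]}
toy_p = {("a", "v"): 1.0, ("c", "v"): 0.0, ("b", "a"): 1.0}
T1, edge_state, alpha_B = first_backward_bfs(toy_in, toy_p, "v", random.Random(0))
needs_forward_labeling = bool(set(T1) & {"b"})  # B-seed set {b} intersects T1
```

Recording `edge_state` and `alpha_B` mirrors the lazy-sampling idea: outcomes revealed in one round are reused, never resampled, in later rounds.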
If $T_1 \cap S_B = \emptyset$, then none of the B-seeds can reach the explored nodes, so that forward labeling can be completely skipped. The above extreme example falls into this case. Otherwise, we run a residual forward labeling only from $T_1 \cap S_B$ along the explored nodes in $T_1$: if a node $u \in T_1 \setminus S_B$ is reachable by some $s \in T_1 \cap S_B$ via a live-edge path with all B-adopted nodes, and $\alpha_B^{u,W} \le q_{B|\emptyset}$, $u$ becomes B-adopted. Note that it is not guaranteed in theory that this always saves time compared to RR-ComIC, since the worst case of RR-ComIC++ is that $T_1 \cap S_B = S_B$, which means that the first round is wasted. However, our experimental results (§4.8) indeed show that RR-ComIC++ is often one order of magnitude faster than RR-ComIC.

Algorithm 9: RR-ComIC++ for Generating RR-set for Problem 4
Data: Graph G = (V, E, p), root node v, B-seed set S_B
Result: RR-set R_W(v)
 1 begin
 2   Create a FIFO queue Q and empty sets R_W(v), T_1
 3   Q.enqueue(v)                      /* 1st backward BFS */
 4   while Q is not empty do
 5     u ← Q.dequeue()
 6     T_1 ← T_1 ∪ {u}
 7     foreach unvisited w ∈ N^-(u) such that (w, u) is live do
 8       Q.enqueue(w)
 9       Mark w visited
10   if T_1 ∩ S_B ≠ ∅ then
11     /* auxiliary forward pass to determine B adoption */
12     Clear Q
13     Enqueue all nodes of T_1 ∩ S_B into Q
14     Execute line 4 to line 9 in Algorithm 8
15   Execute line 10 to line 18 in Algorithm 8   /* 2nd backward BFS */

Second backward BFS. This round is largely the same as Phase III in RR-ComIC, but there is a subtle difference. Suppose we just dequeued a node $u$. It is possible that there exists an incoming edge $(w, u)$ whose status is not determined. This is because we do not enqueue previously visited nodes in BFS. Hence, if in the previous round $w$ is already visited via an out-neighbor other than $u$, $(w, u)$ would not be tested.
Thus, in the current round we shall test $(w, u)$, and decide if $w$ belongs to $R_W(v)$ accordingly.

To see that RR-ComIC++ is equivalent to RR-ComIC, it suffices to show that for each node explored in the second backward BFS, its adoption status w.r.t. B is the same in both algorithms.

We now show that RR-ComIC and RR-ComIC++ are indeed equivalent.

Lemma 9. Consider any possible world $W$ of the ComIC model. Let $v$ denote the root node. For any $u \in V$ that can reach $v$ via live edges in $W$, $u$ is determined as B-adopted in RR-ComIC if and only if $u$ is determined as B-adopted in RR-ComIC++.

Proof. We first prove the “if” part. Suppose $u$ is determined as B-adopted in RR-ComIC++. This means that there exists a node $s \in T_1 \cap S_B$ such that there is a path from $s$ to $u$ consisting entirely of live edges and B-adopted nodes (every node $w$ on this path satisfies $\alpha_B^w \le q_{B|\emptyset}$). Therefore, in RR-ComIC, where $W$ is generated up front (or revealed on the fly using lazy sampling), this live-edge path must still exist. Thus, $u$ must also be B-adopted in RR-ComIC.

Next we prove the “only if” part. By definition, if $u$ is determined as B-adopted in possible world $W$, then there exists a path $P$ from some $s \in S_B$ to $u$ such that the path consists entirely of live edges and all nodes $w$ on the path satisfy $\alpha_B^{w,W} \le q_{B|\emptyset}$. It suffices to show that if $u$ is reachable from $v$ backwards in $W$, then $P$ will be explored entirely by RR-ComIC++.

Suppose otherwise. That is, there exists a node $z \in P$ that is not explored by the first-round backward BFS from $v$. We have established that in the completely revealed possible world $W$, there is a live-edge path from $z$ to $u$ and from $u$ to $v$ respectively. Thus, connecting the two paths at node $u$ gives a single live-edge path $P_z$ from $z$ to $v$.
Now recall that the continuation of the first backward BFS phase in RR-ComIC++ relies solely on edge status (as long as an edge $(w, u)$ is determined live, $w$ will be visited by the BFS). This means that $z$ must have been explored in the first backward BFS, a contradiction. This completes the proof.

From Lemma 9, together with the fact that all relevant edges will be tested for liveness (Phase III in RR-ComIC and the second backward BFS in RR-ComIC++), it follows that RR-ComIC and RR-ComIC++ are equivalent in terms of generating RR-sets for the ComIC model.

Expected time complexity. The analysis is similar: we can show that the expected running time of RR-ComIC++ is

$$O\left((k + \ell)(|V| + |E|) \log |V| \left(1 + \frac{\mathrm{EPT}_{B1}}{\mathrm{EPT}_{B2}}\right)\right),$$

where $\mathrm{EPT}_{B1}$ ($\mathrm{EPT}_{B2}$) is the expected number of edges explored in the first (resp., second) backward BFS.

$\mathrm{EPT}_{B2}$ is the same as $\mathrm{EPT}_B$ in RR-ComIC, so RR-ComIC++ will be faster than RR-ComIC if $\mathrm{EPT}_{B1} < \mathrm{EPT}_F$, i.e., if the first backward BFS plus the residual forward labeling explores fewer edges than the full forward labeling in RR-ComIC.

4.7 The Sandwich Approximation Strategy

In this section, we present the Sandwich Approximation strategy that leads to algorithms with data-dependent approximation factors for Influence Maximization in the general mutual complement case of ComIC ($q_{A|\emptyset} \le q_{A|B}$ and $q_{B|\emptyset} \le q_{B|A}$). In fact, Sandwich Approximation can be seen as a general strategy, applicable to any non-submodular maximization problem for which we can find submodular upper or lower bound functions. Thus, our presentation below is generic and independent of ComIC and Problem 4.

Let $\sigma : 2^V \to \mathbb{R}_{\ge 0}$ be a non-submodular set function. Let $\mu$ and $\nu$ be submodular and defined on the same ground set $V$ such that $\mu(S) \le \sigma(S) \le \nu(S)$ for all $S \subseteq V$. That is, $\mu$ ($\nu$) is a lower (resp., upper) bound on $\sigma$ everywhere. Consider the problem of maximizing $\sigma$ subject to a cardinality constraint $k$.
Notice that if the objective function were $\mu$ or $\nu$, the problem would be approximable within $1 - 1/e$ (e.g., max-k-cover) or $1 - 1/e - \epsilon$ (e.g., influence maximization) by the greedy algorithm [86, 119]. A natural question is: can we leverage the fact that $\mu$ and $\nu$ “sandwich” $\sigma$ to derive an approximation algorithm for maximizing $\sigma$? The answer is yes.

4.7.1 Methodology

First, run the greedy algorithm on all three functions. It produces an approximate solution for $\mu$ and $\nu$. Let $S_\mu$, $S_\sigma$, $S_\nu$ be the solutions obtained for $\mu$, $\sigma$, and $\nu$ respectively. Then, select the final solution to $\sigma$ to be

$$S_{\mathrm{sand}} = \arg\max_{S \in \{S_\mu, S_\sigma, S_\nu\}} \sigma(S). \quad (4.3)$$

Theorem 11. The Sandwich Approximation solution satisfies:

$$\sigma(S_{\mathrm{sand}}) \ge \max\left\{\frac{\sigma(S_\nu)}{\nu(S_\nu)}, \frac{\mu(S_\sigma^*)}{\sigma(S_\sigma^*)}\right\} \cdot (1 - 1/e) \cdot \sigma(S_\sigma^*), \quad (4.4)$$

where $S_\sigma^*$ is the optimal solution maximizing $\sigma$ (subject to cardinality constraint $k$).

Proof. Let $S_\mu^*$ and $S_\nu^*$ be the optimal solutions to maximizing $\mu$ and $\nu$ respectively. We have

$$\sigma(S_\nu) = \frac{\sigma(S_\nu)}{\nu(S_\nu)} \cdot \nu(S_\nu) \ge \frac{\sigma(S_\nu)}{\nu(S_\nu)} \cdot (1 - 1/e) \cdot \nu(S_\nu^*) \ge \frac{\sigma(S_\nu)}{\nu(S_\nu)} \cdot (1 - 1/e) \cdot \nu(S_\sigma^*) \ge \frac{\sigma(S_\nu)}{\nu(S_\nu)} \cdot (1 - 1/e) \cdot \sigma(S_\sigma^*), \quad (4.5)$$

and

$$\sigma(S_\mu) \ge \mu(S_\mu) \ge (1 - 1/e) \cdot \mu(S_\mu^*) \ge (1 - 1/e) \cdot \mu(S_\sigma^*) \ge \frac{\mu(S_\sigma^*)}{\sigma(S_\sigma^*)} \cdot (1 - 1/e) \cdot \sigma(S_\sigma^*). \quad (4.6)$$

The theorem follows by applying Equation (4.3).

Without loss of generality, in the theorem statement we use approximation factor $1 - 1/e$. In cases where the function value must be estimated using MC simulations, the factor drops to $1 - 1/e - \epsilon$, for any $\epsilon > 0$. However, this does not affect our analysis.

Further Remarks. While the factor in Equation (4.4) involves $S_\sigma^*$, which generally is not computable in polynomial time for problems such as Influence Maximization, the first term inside $\max\left\{\frac{\sigma(S_\nu)}{\nu(S_\nu)}, \frac{\mu(S_\sigma^*)}{\sigma(S_\sigma^*)}\right\}$ involves only $S_\nu$, so it can be computed efficiently and can be of practical value (see Table 4.9 in §4.8). We emphasize that Sandwich Approximation is much more general and is not restricted to cardinality constraints.
E.g., for a general matroid constraint, simply replace $1 - 1/e$ with $1/2$ in (4.4), as the greedy algorithm is a $1/2$-approximation in this case [119].

Furthermore, monotonicity is not important, as maximizing general submodular functions can be approximated within a factor of $1/2$ [28], and thus Sandwich Approximation applies regardless of monotonicity. On the other hand, the true effectiveness of Sandwich Approximation depends on how close $\nu$ and $\mu$ are to $\sigma$: e.g., a constant function can be a trivial submodular upper bound function but would only yield trivial data-dependent approximation factors. Thus, an interesting question is how to derive $\nu$ and $\mu$ that are as close to $\sigma$ as possible, while maintaining submodularity.

4.7.2 Applying Sandwich Approximation to Influence Maximization

For Problem 4, GeneralTIM (Algorithm 7) with RR-ComIC (Algorithm 8) or RR-ComIC++ (Algorithm 9) provides a $(1 - 1/e - \epsilon)$-approximate solution with high probability, when $q_{A|\emptyset} \le q_{A|B}$ and $q_{B|\emptyset} = q_{B|A}$. When $q_{B|\emptyset} < q_{B|A}$, the upper bound function $\nu$ can be obtained by increasing $q_{B|\emptyset}$ to $q_{B|A}$, while the lower bound function $\mu$ can be obtained by decreasing $q_{B|A}$ to $q_{B|\emptyset}$. The correctness of this approach is ensured by the following theorem.

Theorem 12. Suppose $q_{A|\emptyset} \le q_{A|B}$ and $q_{B|\emptyset} \le q_{B|A}$. Then, under the ComIC model, for any fixed A and B seed sets $S_A$ and $S_B$, $\sigma_A(S_A, S_B)$ is monotonically increasing w.r.t. any one of $\{q_{A|\emptyset}, q_{A|B}, q_{B|\emptyset}, q_{B|A}\}$ with the other three GAPs fixed, as long as after the increase the parameters are still in $\mathbf{Q}^+$.

Proof (Sketch). The detailed proof would follow a similar induction proof structure for each possible world as in the proof of Theorem 7. Intuitively, we would prove inductively that at every step increasing $q_{A|\emptyset}$ or $q_{A|B}$ would increase both A-adopted and B-adopted nodes.

Putting it all together, the final algorithm for Problem 4 is GeneralTIM with (i) RR-ComIC/RR-ComIC++ and (ii) Sandwich Approximation. It is important to see how useful and effective Sandwich Approximation is in practice.
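The selection rule of Section 4.7.1 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the helper names `greedy` and `sandwich` are hypothetical, set functions are evaluated exactly rather than estimated with RR-sets or MC simulations, and the toy functions `sigma`, `mu`, and `nu` are invented purely for illustration (they are not the ComIC spread functions). Here `sigma` rewards picking items 1 and 2 together, which makes it non-submodular, while `mu` and `nu` are submodular lower and upper bounds.

```python
from typing import Callable, FrozenSet, Sequence, Tuple

SetFn = Callable[[FrozenSet[int]], float]


def greedy(f: SetFn, ground: Sequence[int], k: int) -> FrozenSet[int]:
    """Plain greedy: k times, add the element with the largest marginal gain."""
    chosen: FrozenSet[int] = frozenset()
    for _ in range(k):
        candidates = [x for x in ground if x not in chosen]
        if not candidates:
            break
        best = max(candidates, key=lambda x: f(chosen | {x}) - f(chosen))
        chosen = chosen | {best}
    return chosen


def sandwich(sigma: SetFn, mu: SetFn, nu: SetFn,
             ground: Sequence[int], k: int) -> Tuple[FrozenSet[int], float]:
    """Return S_sand from Eq. (4.3) and the computable ratio sigma(S_nu)/nu(S_nu)."""
    s_mu, s_sigma, s_nu = (greedy(f, ground, k) for f in (mu, sigma, nu))
    s_sand = max((s_mu, s_sigma, s_nu), key=sigma)
    return s_sand, sigma(s_nu) / nu(s_nu)


# Toy functions (illustrative assumptions only).
def sigma(S: FrozenSet[int]) -> float:   # bonus only if both 1 and 2 are chosen
    return len(S) + (1.0 if {1, 2} <= S else 0.0)


def mu(S: FrozenSet[int]) -> float:      # modular lower bound
    return float(len(S))


def nu(S: FrozenSet[int]) -> float:      # coverage-style submodular upper bound
    return len(S) + (1.0 if S & {1, 2} else 0.0)


s_sand, ratio = sandwich(sigma, mu, nu, [1, 2, 3], k=2)
```

In an actual run on ComIC, each call to `greedy` would be replaced by GeneralTIM with the corresponding GAP setting, and `sigma` would be estimated by simulation; the final arg-max over the three candidate seed sets stays the same.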
We examine the practical effectiveness of Sandwich Approximation in §4.8, where we “stress test” the idea behind it. Intuitively, if $q_{B|\emptyset}$ and $q_{B|A}$ are close to each other, the upper and lower bounds ($\nu$ and $\mu$) obtained for Influence Maximization can be expected to be quite close to $\sigma$ in terms of function values. More details are presented in the next section.

4.8 Experiments

We performed extensive experiments on three real-world social networks, from which four datasets were derived and tested. We first present results with synthetically generated GAPs (Section 4.8.1). Then we propose a method for learning GAPs using action log data (Section 4.8.2), and conduct experiments using learned GAPs (Section 4.8.3).

Datasets. Flixster was collected from a movie website with social networking features, and we extracted a strongly connected component from the original network [82]. Douban³ was collected from a Chinese social network [147], where users rate and review numerous books, movies, music, etc. We crawled all movie and book ratings of the users in the graph, and derived two datasets from book and movie ratings: Douban-Book and Douban-Movie (details later). Last.fm is taken from the popular music website⁴ with social networking features. For all graphs, we learned influence probabilities on edges using the method in [68], which has been widely adopted in the field [35]. Table 4.2 presents the basic statistics of the datasets.

| | Douban-Book | Douban-Movie | Flixster | Last.fm |
|---|---|---|---|---|
| # nodes | 23.3K | 34.9K | 12.9K | 61K |
| # edges | 141K | 274K | 192K | 584K |
| avg. out-degree | 6.5 | 7.9 | 14.8 | 9.6 |
| max. out-degree | 1690 | 545 | 189 | 1073 |

Table 4.2: Statistics of graph data (all directed)

4.8.1 Experiments with Synthetic Adoption Probabilities

We first evaluated our proposed RR-set-based algorithms described in Section 4.6 under various combinations of synthetic GAPs, and compared them with two baselines:

1. 
VanillaIC: It selects $k$ seeds using the TIM algorithm [137] under the classic IC model, essentially ignoring the other product and the NLA in the ComIC model. Intuitively, VanillaIC ought to be effective for Influence Maximization when $q_{A|B} - q_{A|\emptyset}$ is small, in which case complementary effects are less strong and hence it is “safer” to ignore B-seeds and all the GAPs.
   Recall that the TIM algorithm is a highly scalable approximation algorithm for influence maximization and was shown in [137] to dominate many advanced heuristics [37, 39, 72] in terms of both seed set quality and running time.

2. Copying: For Influence Maximization, it simply selects the top-$k$ B-seeds to be A-seeds (assuming $|S_B| \ge |S_A|$).

We set $q_{A|B} = q_{B|A} = 0.75$ and $q_{B|\emptyset} = 0.5$. The value of $q_{A|\emptyset}$ was set to 0.1, 0.3, and 0.5, which represent strong, moderate, and low complementarity respectively.

³ http://www.douban.com/, last accessed October 6, 2015.
⁴ http://www.last.fm/, last accessed October 6, 2015.

Many possibilities exist for fixing the input B-seeds. We tested the following three representative cases:

1. Run VanillaIC and select the 101st to 200th nodes: this models a situation where we assume those seeds are moderately influential.

2. Randomly select 100 nodes: this models our complete lack of knowledge.

3. 
Run VanillaIC and select the top-100 nodes: this models a situation where we assume the advertiser might use an advanced algorithm such as TIM to target highly influential users.

In the three tables below, the value in parentheses in each column header is $q_{A|\emptyset}$.

| Dataset | VanillaIC (0.1) | VanillaIC (0.3) | VanillaIC (0.5) | Copying (0.1) | Copying (0.3) | Copying (0.5) |
|---|---|---|---|---|---|---|
| Douban-Book | 5.89% | 0.93% | 0.50% | 85.7% | 207% | 301% |
| Douban-Movie | 24.7% | 3.30% | 1.72% | 13.3% | 68.8% | 122% |
| Flixster | 35.5% | 11.3% | 5.15% | 16.7% | 48.0% | 84.8% |
| Last.fm | 31.5% | 2.75% | 0.70% | 22.6% | 88.5% | 168% |

Table 4.3: Percentage improvement of GeneralTIM with RR-ComIC over VanillaIC & Copying, where the fixed B-seed set is chosen to be the 101st to 200th ones from the VanillaIC order

| Dataset | VanillaIC (0.1) | VanillaIC (0.3) | VanillaIC (0.5) | Copying (0.1) | Copying (0.3) | Copying (0.5) |
|---|---|---|---|---|---|---|
| Douban-Book | 2.16% | 1.12% | 0.71% | 133% | 419% | 676% |
| Douban-Movie | 4.38% | 1.49% | 0.87% | 236% | 737% | 1283% |
| Flixster | 10.6% | 0% | 0% | 134% | 352% | 641% |
| Last.fm | 3.76% | 2.65% | 1.65% | 398% | 1355% | 2525% |

Table 4.4: Percentage improvement of GeneralTIM with RR-ComIC over VanillaIC & Copying, where the fixed B-seed set is randomly chosen

| Dataset | VanillaIC (0.1) | VanillaIC (0.3) | VanillaIC (0.5) | Copying (0.1) | Copying (0.3) | Copying (0.5) |
|---|---|---|---|---|---|---|
| Douban-Book | 0.34% | 0.34% | 0% | 0.34% | 0.34% | 0% |
| Douban-Movie | 0.64% | 0.54% | 0.36% | 0.64% | 0.54% | 0.36% |
| Flixster | 0% | 0% | 0% | 0% | 0% | 0% |
| Last.fm | 1.73% | 1.38% | 0.80% | 1.73% | 1.38% | 0.80% |

Table 4.5: Percentage improvement of GeneralTIM with RR-ComIC over VanillaIC & Copying, where the fixed B-seed set is chosen to be the top-100 nodes by VanillaIC

Table 4.3 shows the percentage improvement of our algorithms over the two baselines, for the case of selecting the 101st to 200th nodes output by VanillaIC as the fixed B-seed set. As can be seen, GeneralTIM performed consistently better than both baselines, and in many cases by a large margin.

Table 4.4 shows the percentage improvement of GeneralTIM over the VanillaIC and Copying baselines when the fixed B-seeds were chosen uniformly at random from $V$. As can be seen, GeneralTIM was significantly better except when compared to VanillaIC.
This is not surprising: when B-seeds were chosen randomly, they were unlikely to be influential, so it is rather safe to ignore them when selecting A-seeds, which is what VanillaIC essentially does.

Table 4.5 shows the results when the fixed B-seed set was chosen to be the top-100 nodes from VanillaIC. These nodes represent the most influential ones under the IC model that can be found efficiently in polynomial time⁵. We remark that VanillaIC and Copying are equivalent in this case. Here the advantage of GeneralTIM is less significant. On Flixster, the three algorithms achieve the same influence spread.

Overall, considering Tables 4.3, 4.4, and 4.5 all together, we can see that in the vast majority of all test cases, GeneralTIM outperformed the two baselines, often by a large margin. This demonstrates that GeneralTIM is robust with respect to different B-seeds. Furthermore, in real-world scenarios, the B-seed set may simply consist of “organic” early adopters, i.e., users who adopt the product spontaneously. The robustness of GeneralTIM is thus highly desirable, as it is often difficult to foresee which users would actually become organic early adopters in real life.

Also, in our model the influence probabilities on edges were assumed to be independent of the product; without this assumption it is expected that Copying and VanillaIC would perform even more poorly. If we additionally assume that the GAPs are user-dependent, VanillaIC would deteriorate further. In contrast, our GeneralTIM and RR-set generation algorithm (RR-ComIC) can be easily adapted to both these scenarios.

⁵ Recall that influence maximization is NP-hard under the IC model, and thus the optimal seed set of cardinality 100 is difficult to obtain. The seeds found by VanillaIC can be regarded as a good proxy.

4.8.2 Learning Global Adoption Probabilities

Extracting Signals from Data

For Flixster and Douban, we learned GAPs from the available timestamped rating data.
These datasets can be viewed as action logs in the following sense. Each entry is a quadruple $(u, i, a, t_{u,i,a})$, indicating that user $u$ performed action $a$ on item $i$ at time $t_{u,i,a}$. We counted a rating quadruple as one adoption action and one informing action: if someone rated an item, she must have been informed of it first; we assume only adopters rate items.

A key challenge is how to find actions that can be mapped to informing events that do not lead to adoptions. Fortunately, there are special ratings providing such signals in Flixster and Douban. Flixster allows users to indicate if they “want to see” a movie, or are “not interested” in one. We mapped both signals to the actions of a user being informed of a movie. Douban allows users to put items into a wish list. Thus, if a book/movie was in a user's wish list, we treated it as an informing action.

Learning Methodology and Results

Consider two items A and B in an action log. Let $R_A$ and $I_A$ be the set of users who rated A and who were informed of A, respectively. Clearly, $R_A \subseteq I_A$. Thus,

$$q_{A|\emptyset} = |R_A \setminus R_{B \prec_{\mathrm{rate}} A}| \,/\, |I_A \setminus R_{B \prec_{\mathrm{inform}} A}|,$$

where $R_{B \prec_{\mathrm{rate}} A}$ is the set of users who rated both items with B rated first, and $R_{B \prec_{\mathrm{inform}} A}$ is the set of users who rated B before being informed of A. Next, $q_{A|B}$ is computed as follows:

$$q_{A|B} = |R_{B \prec_{\mathrm{rate}} A}| \,/\, |R_{B \prec_{\mathrm{inform}} A}|.$$

Similarly, $q_{B|\emptyset}$ and $q_{B|A}$ can be computed in a symmetric way.

GAPs learned from Flixster and Douban. Tables 4.6 to 4.8 show selected GAPs learned from the Flixster, Douban-Book, and Douban-Movie datasets. Here we not only show the estimated probabilities, but also give 95% confidence intervals (the standard Wald interval) [27]. By the definition of GAPs (Section 4.3), we can treat each GAP as the parameter of a Bernoulli distribution. Consider any GAP, denoted $q$, and let $\bar{q}$ be its estimated value from action log data. The confidence interval of $\bar{q}$ is computed as follows:

$$\left[\bar{q} - 1.96\sqrt{\bar{q}(1 - \bar{q})/n_q},\ \ \bar{q} + 1.96\sqrt{\bar{q}(1 - \bar{q})/n_q}\right],$$

where $n_q$ is the number of samples used for estimating $q$.

| A | B | $q_{A\mid\emptyset}$ | $q_{A\mid B}$ | $q_{B\mid\emptyset}$ | $q_{B\mid A}$ |
|---|---|---|---|---|---|
| Monster Inc. | Shrek | .88 ± .01 | .92 ± .01 | .92 ± .01 | .96 ± .01 |
| Gone in 60 Seconds | Armageddon | .63 ± .02 | .77 ± .02 | .67 ± .02 | .82 ± .02 |
| Harry Potter: Prisoner of Azkaban | What a Girl Wants | .85 ± .01 | .84 ± .02 | .66 ± .02 | .67 ± .02 |
| Shrek | The Fast and The Furious | .92 ± .02 | .94 ± .01 | .80 ± .02 | .79 ± .02 |

Table 4.6: Selected GAPs learned for pairs of movies in Flixster

| A | B | $q_{A\mid\emptyset}$ | $q_{A\mid B}$ | $q_{B\mid\emptyset}$ | $q_{B\mid A}$ |
|---|---|---|---|---|---|
| The Unbearable Lightness of Being | Norwegian Wood | .75 ± .01 | .85 ± .02 | .92 ± .01 | .97 ± .01 |
| Harry Potter: Philosopher's Stone | Harry Potter: Half-Blood Prince | .99 ± .01 | 1.0 ± 0 | .97 ± .01 | .98 ± .01 |
| Stories of Ming Dynasty III | Stories of Ming Dynasty VI | .94 ± .01 | 1.0 ± 0 | .88 ± .03 | .98 ± .01 |
| Fortress Besieged | Love Letter | .89 ± .01 | .91 ± .03 | .82 ± .02 | .83 ± .03 |

Table 4.7: Selected GAPs learned for pairs of books in Douban-Book

| A | B | $q_{A\mid\emptyset}$ | $q_{A\mid B}$ | $q_{B\mid\emptyset}$ | $q_{B\mid A}$ |
|---|---|---|---|---|---|
| Up | 3 Idiots | .92 ± .01 | .94 ± .01 | .92 ± .01 | .93 ± .01 |
| Pulp Fiction | Leon | .81 ± .01 | .83 ± .01 | .95 ± .00 | .98 ± .01 |
| The Silence of the Lambs | Inception | .90 ± .01 | .86 ± .01 | .92 ± .01 | .98 ± .01 |
| Fight Club | Se7en | .84 ± .01 | .89 ± .01 | .89 ± .01 | .95 ± .01 |

Table 4.8: Selected GAPs learned for pairs of movies in Douban-Movie

[Figure 4.5: Complementary effects learned from data: the histogram of all $(q_{A|B} - q_{A|\emptyset})$ and $(q_{B|A} - q_{B|\emptyset})$ values on Flixster, Douban-Book, and Douban-Movie (10000 pairs of items each). Panels: (a) Flixster, (b) Douban-Book, (c) Douban-Movie. x-axis: strength of complementarity; y-axis: frequency (number of pairs).]

Complementary effects. 
Figure 4.5 plots the histogram of all $(q_{A|B} - q_{A|\emptyset})$ and $(q_{B|A} - q_{B|\emptyset})$ values on Flixster, Douban-Book, and Douban-Movie. For each dataset, we selected the top 10K pairs of items ranked by the number of common ratings received. The plots clearly demonstrate that complementarity and substitutability exist in the data.

4.8.3 Experimental Settings with Learned Adoption Probabilities

Baselines. In the following experiments, we compare our approximation algorithm, namely GeneralTIM with RR-ComIC/RR-ComIC++ and Sandwich Approximation, to several baselines commonly used in the influence maximization literature.

1. HighDegree: choose the $k$ highest out-degree nodes as seeds;
2. PageRank: choose the $k$ nodes with highest PageRank score as seeds;
3. Random: choose $k$ seeds uniformly at random.

Items and Global Adoption Probabilities. The following pairs of items are tested:

• Flixster: We chose movies Monster Inc. as A and Shrek as B, where $\mathbf{Q} = \{.88, .92, .92, .96\}$.
• Douban-Book: We chose books The Unbearable Lightness of Being as A and Norwegian Wood as B, and $\mathbf{Q} = \{.75, .85, .92, .97\}$.
• Douban-Movie: We chose movies Fight Club as A and Se7en as B, and $\mathbf{Q} = \{.84, .89, .89, .95\}$.
• Last.fm: There is no signal in the data to indicate informing events, thus our learning method in Section 4.8.2 is not applicable. As a result, we used synthetic $\mathbf{Q} = \{.5, .75, .5, .75\}$.

Other Parameters. Unless otherwise stated, following the convention we set $k = 50$ as the number of A-seeds to mine. For GeneralTIM, we set $\ell$ to 1 so that a success probability (of obtaining approximate solutions) of $1 - 1/|V|$ was ensured [137].
We set parameter $\epsilon$ to be 0.5 for RR-ComIC and RR-ComIC++, as this choice strikes a balance between efficient running time and high quality of seeds, as we shall show in Figure 4.6. For the input B-seed set, we ran VanillaIC to extract 200 seeds and took the bottom 100 to be B-seeds. This is to simulate the situation where the B-seeds are moderately influential.

All algorithms were implemented in C++ and compiled using g++ with -O3 optimization. All experiments were conducted on an openSUSE server with 2.93GHz CPUs and 128GB RAM.

[Figure 4.6: Effects of $\epsilon$ on the running time and influence spread of RR-ComIC and RR-ComIC++, on all four datasets. The influence spread achieved by RR-ComIC and RR-ComIC++ was almost identical in all cases, and thus we only drew one line using the spread of RR-ComIC++. Panels: (a) Flixster, (b) Douban-Book, (c) Douban-Movie, (d) Last.fm. x-axis: value of $\epsilon$; left y-axis: running time (sec, log scale); right y-axis: influence spread (x 100).]

4.8.4 Results and Analysis

Effects of $\epsilon$. We first evaluated the effects of $\epsilon$ on GeneralTIM with RR-ComIC and RR-ComIC++. As mentioned in Section 4.6, the parameter $\epsilon$ controls the trade-off between approximation ratio and efficiency. The larger $\epsilon$ is, the fewer RR-sets will be generated, since the required number of RR-sets scales with $\epsilon^{-2}$. Figure 4.6 depicts influence spread and running time (log-scale) side-by-side, as a function of $\epsilon$.
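To make the dependence on $\epsilon$ concrete, the following sketch evaluates the quantity $\lambda$ from Section 4.6 (the numerator of $\theta = \lambda/\mathrm{LB}$) for several values of $\epsilon$. It is only an illustration: the function name `tim_lambda` is hypothetical, the node count is rounded from Table 4.2 for Flixster, and the comparison assumes the lower bound $\mathrm{LB}$ is held fixed, so the ratios say nothing about constant factors or about time spent outside RR-set generation.

```python
import math


def tim_lambda(n: int, k: int, ell: float, eps: float) -> float:
    """lambda = eps^-2 (8 + 2 eps) n (ell log n + log C(n, k) + log 2)."""
    log_binom = math.log(math.comb(n, k))
    return (8 + 2 * eps) * n * (ell * math.log(n) + log_binom + math.log(2)) / eps ** 2


# Illustrative inputs: roughly the Flixster node count, k = 50 seeds, ell = 1.
n, k, ell = 12_900, 50, 1
baseline = tim_lambda(n, k, ell, 0.5)
for eps in (0.1, 0.2, 0.3, 0.4, 0.5):
    # Ratio of required RR-sets relative to eps = 0.5, for a fixed LB.
    print(f"eps={eps}: {tim_lambda(n, k, ell, eps) / baseline:.1f}x")
```

Because the bracketed term does not depend on $\epsilon$, the ratio between $\epsilon = 0.1$ and $\epsilon = 0.5$ is $(8.2/0.01)/(9/0.25) \approx 22.8$ regardless of the graph, which is in line with a running-time gap of more than one order of magnitude.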
We can see that as $\epsilon$ goes up from 0.1 to 0.5, the running time of both versions of GeneralTIM (RR-ComIC & RR-ComIC++) decreases dramatically, often by orders of magnitude, while the quality of seed sets in terms of influence spread (for Influence Maximization) is almost completely unaffected (the largest difference among all test cases is only 0.45%). This allows us to set $\epsilon = 0.5$ to achieve the best efficiency without sacrificing much seed set quality.

Figure 4.6 also illustrates that RR-ComIC++ was consistently faster than RR-ComIC, by up to one order of magnitude. For instance, on Douban-Book, RR-ComIC++ was 10.4, 11.4, 12.8, 11.9, and 11.7 times as fast as RR-ComIC when $\epsilon$ was set to 0.1, 0.2, 0.3, 0.4, and 0.5 respectively.

Quality of Seed Sets. The quality of seeds (output by an algorithm) is measured by the influence spread they achieve. We evaluated the spread of seed sets computed by all algorithms by MC simulations with 10000 iterations to ensure a fair comparison. As can be seen from Figure 4.7, our approximation algorithm GeneralTIM was consistently the best in all test cases, often by a significant margin. More specifically, GeneralTIM with RR-ComIC++ was 99.9%, 12.5%, 2.7%, and 12.6% better than the next best algorithm on Flixster, Douban-Book, Douban-Movie, and Last.fm respectively. HighDegree typically has good performance, especially in graphs with many nodes having very large out-degrees (Douban-Book and Douban-Movie), while PageRank produces good quality seeds only on Last.fm. Random is consistently the worst.

Running Time. We plot the running time of GeneralTIM with RR-ComIC and RR-ComIC++ ($\epsilon = 0.5$) and compare it to Greedy with MC simulations, as depicted in Figure 4.8. Note that the maximum value of the Y-axis is set to be two weeks of time, and touching it means the algorithm did not finish within this limit. On all datasets, we can see that GeneralTIM with RR-ComIC and RR-ComIC++ was about two to three orders of magnitude faster than Greedy.
In addition, RR-ComIC++ is 7.1, 11.7, 8.0, and 2.2 times as fast as RR-ComIC on Flixster, Douban-Book, Douban-Movie, and Last.fm respectively. We omit the running time of HighDegree, PageRank, and Random since they are typically very efficient [35, 37, 39].

[Figure 4.7: Influence spread vs. seed set size. Panels: (a) Douban-Book, (b) Douban-Movie, (c) Flixster, (d) Last.fm. x-axis: number of A-seeds; y-axis: influence spread of A (x 1000). Curves: GreedyMC (panels (a) and (c) only), RR-ComIC++, HighDeg, PageRank, Random.]

| | Douban-Book | Douban-Movie | Flixster | Last.fm |
|---|---|---|---|---|
| Learned GAP | 0.996 | 0.999 | 0.996 | 0.999 |
| $q_{B\mid\emptyset} = 0.1$ | 0.652 | 0.962 | 0.492 | 0.519 |
| $q_{B\mid\emptyset} = 0.5$ | 0.770 | 0.969 | 0.633 | 0.628 |
| $q_{B\mid\emptyset} = 0.9$ | 0.946 | 0.985 | 0.926 | 0.879 |

Table 4.9: Sandwich approximation factor: $\sigma(S_\nu)/\nu(S_\nu)$

Scalability Tests on Synthetic Graphs. We then used larger synthetic graphs to conduct scalability tests. We generated power-law random graphs of 0.2, 0.4, ..., up to 1 million nodes with a power-law degree exponent of 2.16 [37]. These graphs have an average degree of about 5. The GAPs were set to be the same as for Flixster. As can be seen from Figure 4.9, the running time of both algorithms grew linearly w.r.t. graph size, demonstrating good scalability.
Consistent with Figure 4.6 and Figure 4.8, RR-ComIC++ was consistently faster than RR-ComIC in all graphs we tested.

[Figure 4.8: Running time comparisons on real networks: GeneralTIM with RR-ComIC, GeneralTIM with RR-ComIC++, and Greedy with Monte Carlo simulations. x-axis: Flixster, Douban(B), Douban(M), Last.fm; y-axis: running time (sec, log scale).]

[Figure 4.9: Running time comparisons on synthetic power-law random graphs up to one million nodes: GeneralTIM with RR-ComIC versus GeneralTIM with RR-ComIC++. x-axis: number of nodes in graph (million); y-axis: running time (hour).]

Approximation Factors by Sandwich Approximation. Recall from Section 4.7 that the approximation factor yielded by Sandwich Approximation is data-dependent:

$$\sigma(S_{\mathrm{sand}}) \ge \max\left\{\frac{\sigma(S_\nu)}{\nu(S_\nu)}, \frac{\mu(S_\sigma^*)}{\sigma(S_\sigma^*)}\right\} \cdot (1 - 1/e - \epsilon) \cdot \sigma(S_\sigma^*).$$

To evaluate the effectiveness of Sandwich Approximation in real-world graphs, we computed $\sigma(S_\nu)/\nu(S_\nu)$: although $S_\sigma^*$ is unknown due to the NP-hardness of Influence Maximization, Sandwich Approximation is guaranteed to have an approximation factor of at least $(1 - 1/e - \epsilon)\,\sigma(S_\nu)/\nu(S_\nu)$. Note that in the GAPs learned from data, both $q_{B|A} - q_{B|\emptyset}$ and $q_{A|B} - q_{A|\emptyset}$ are small, hence “friendly” to Sandwich Approximation as mentioned in Section 4.7. Thus, we further “stress tested” the algorithm with more adversarial settings: we set $q_{A|\emptyset} = .3$, $q_{A|B} = .8$, $q_{B|A} = 1$ and varied $q_{B|\emptyset}$ over $\{.1, .5, .9\}$. This ensures mutual complementarity.

Table 4.9 presents the results on all datasets with both learned GAPs and artificial GAPs. As can be seen, with real GAPs, the ratio was quite close to 1, matching our intuition. For artificial GAPs, the ratio was not as high. E.g., in the case of $q_{B|\emptyset} = 0.5$, $\sigma(S_\nu)/\nu(S_\nu)$ ranges from 0.628 (Last.fm) to 0.969 (Douban-Movie), which correspond to approximation factors of 0.40 and 0.61 (omitting $\epsilon$). Even the smallest ratio 0.492 would yield a decent factor of about 0.3.
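The factors quoted above follow from multiplying each reported ratio by $1 - 1/e \approx 0.632$. The short check below reproduces the arithmetic; the helper name `sandwich_factor` is hypothetical, and the three ratios are the ones cited in the text from Table 4.9.

```python
import math


def sandwich_factor(ratio: float, eps: float = 0.0) -> float:
    """Guaranteed factor (1 - 1/e - eps) * sigma(S_nu) / nu(S_nu)."""
    return (1.0 - 1.0 / math.e - eps) * ratio


# Ratios sigma(S_nu)/nu(S_nu) cited in the text (eps omitted, as in the text).
reported = {"Last.fm": 0.628, "Douban-Movie": 0.969, "Flixster": 0.492}
factors = {name: round(sandwich_factor(r), 2) for name, r in reported.items()}
print(factors)  # {'Last.fm': 0.4, 'Douban-Movie': 0.61, 'Flixster': 0.31}
```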
This has shown that Sandwich Approximation is fairly effective and robust for solving non-submodular cases of Problem 4.

4.9 Discussion and Future Work

Our work in this chapter opens up a number of interesting avenues for future research. One direction is to design more efficient approximation algorithms or heuristics for Influence Maximization: e.g., whether a near-linear time algorithm is still available for these problems is open. Another direction is to fully characterize the entire GAP space $Q$ in terms of monotonicity and submodularity properties. In addition, an important direction is to extend the model to multiple items. Given the current framework, ComIC can be extended to accommodate $k$ items if we allow $k \cdot 2^{k-1}$ GAP parameters: for each item, we specify the probability of adoption for every combination of other items that have been adopted. However, how to simplify the model and make it tractable, how to reason about the complicated two-way or multi-way competition and complementarity, how to analyze monotonicity and submodularity, and how to learn GAP parameters from real-world data remain interesting challenges.

Last but not least, as mentioned in Section 4.3.1, the current ComIC model assumes user homogeneity: the GAPs $q_{A|\emptyset}$, $q_{A|B}$, $q_{B|\emptyset}$, and $q_{B|A}$ are the same for all users. It is worth considering an extension of the ComIC model to characterize heterogeneous users. Let $T$ be the number of user types and let $[T] := \{1, 2, \ldots, T\}$ denote the set of all types. For each type $i \in [T]$, there is a set of type-specific adoption probabilities: $q^i_{A|\emptyset}$, $q^i_{A|B}$, $q^i_{B|\emptyset}$, and $q^i_{B|A}$.
For any user $u$ of type $i$ in the propagation process, she decides to adopt A or B using the aforementioned type-specific probabilities. The rest of the model remains the same.

Under the extended ComIC model, one may investigate whether the submodularity and monotonicity results still hold: establishing submodularity for a subset of the parameter space is important, as it will enable the application of the generalized TIM algorithm and the Sandwich Approximation technique proposed in this chapter. In addition, how to leverage available datasets to learn the larger set of parameters ($4T$ adoption probabilities in total) is challenging. In principle, the methods described in Section 4.8.2 can still be used, but one may face data sparsity issues since the new model is of finer granularity and has many more parameters.

Chapter 5
Recommendations with Attraction, Aversion, and Social Influence Diffusion

5.1 Introduction

In this chapter, we describe how social influence makes a big impact in a widely-used data mining application: recommender systems. The primary goal of a recommender system is to infer user interest from data (e.g., numeric ratings, text reviews, or even binary adoption data) and suggest a list of personalized relevant items to each user. Relevant and accurate suggestions boost user engagement and deliver better user experiences, both of which are highly beneficial to the growth of the service. In this digital era, recommender systems are ubiquitous, and can be found in almost every major Internet service, such as Amazon (product recommendation), Netflix (movie recommendation), Spotify (music recommendation), Facebook and LinkedIn (friend recommendation).
We refer interested readers to surveys [3, 127] for comprehensive treatments of the large variety of computational models and techniques employed by modern recommender systems.

To come up with high-quality and highly relevant recommendations, one of the most crucial tasks is to accurately model and infer user interests. Importantly and naturally, users’ content consumption patterns evolve over time. For example, a user may be attracted towards content that is popular, content recommended to her by a service, or content being enjoyed by her friends. Alternatively, users may get tired of certain types of content, e.g., romantic comedy movies, and desire to consume something different and new.

A key challenge for recommender systems is accurately modeling such user preferences as they evolve over time. Although traditional matrix factorization approaches can be extended to incorporate temporal dynamics of user behavior [92, 93], such extensions do not identify or explicitly analyze the factors that influence the drift in interests.

A “classic” factor influencing user interests is attraction: users may be attracted to content they are exposed to repeatedly and often (e.g., a song played often on the radio). This phenomenon, known in psychology as the “mere-exposure effect” [148], is natural and intuitive, and is the main premise behind advertising [54, 75]. Nonetheless, repetition and/or overexposure can also have the opposite effect, leading to aversion: recent research argues that users often desire serendipitous, novel, previously unseen content [1, 5, 115, 130].
This notion is also quite natural and intuitive, but is usually not taken into account by recommender systems, yielding over-specialized, predictable recommendations [5, 115].

Importantly, a third factor affecting a user’s interests is social influence: users may feel attracted to content consumed and liked by their friends. Trend adoption through “word-of-mouth” or “viral” marketing is a well documented phenomenon [26, 50], and has been extensively studied since the seminal paper by Kempe et al. [86]. Nonetheless, to the best of our knowledge, the effect of social influence on interests, and its implications for recommender systems, has received attention only recently [70, 82, 88, 112, 130, 152]. Furthermore, existing work in social recommendation rarely considers the dynamics of influence cascades and how modelling and computational issues in recommender systems would be affected by those complex stochastic processes. Instead, it merely considers the effects of social influence within each user’s ego-network (neighbourhood).

Incorporating these influence factors in a recommender system raises several challenges. To begin with, under attraction and aversion, a recommender can no longer be treated as a passive entity: recommendations it makes may alter user interests, pushing them either towards or away from certain topics. Hence, traditional methods that merely profile a user and then cater to this specific profile may fall short of keeping up with these dynamics. Second, social influence implies that recommendation decisions to different users cannot be made in isolation anymore: as recommendations alter a user’s interests through attraction and aversion, social influence can spread these changes, resulting in an interest cascade.
Therefore, optimal recommendation decisions across users need to be computed globally, taking into account the joint effect they have over the users’ social network.

In this chapter, we make the following contributions:

• We formulate a global recommendation problem in the presence of attraction, aversion, and social influence. In particular, we propose a mathematical model that incorporates these phenomena, and study the steady state behavior of user interests as a function of the recommender’s strategy in selecting which items to show to users. Under this model, we seek the optimal recommendation strategy, i.e., one that maximizes the users’ social welfare in steady state (Section 5.3).
• We show that, for a large recommender item catalog, obtaining the optimal recommendation strategy amounts to solving a quadratically-constrained quadratic optimization problem (QCQP). Though this problem may not be convex, we present a semi-definite program (SDP) relaxation that can be solved in polynomial time. In many cases, this solution is also guaranteed to be an optimal solution; when the solution is not optimal, we show how a solution with a provable approximation guarantee can be constructed through randomization. We discuss how to determine whether the solution is optimal, and identify special cases for which an optimal solution is always reached, and randomization is unnecessary (Section 5.4).
• We provide evidence for the existence of attraction and aversion in three real-life rating datasets. We do so by developing and applying a method for learning the weight (i.e., importance) of these factors from rating data (Section 5.5).
Applied to three real-life datasets, our method indicates that between 14.0% and 18.6% of users show strong aversive or attractive behavior (Section 5.6).
• We conduct extensive experiments on real-world datasets, and show that our recommendation algorithm is 15.4% to 107.4% better than a baseline algorithm in terms of social welfare achieved (Section 5.6).

As we shall see, our work assumes that users are generally aware of items consumed by their friends (so that social influence is effective). There are plenty of real-life examples: for instance, friends often discuss popular movies and TV series with each other. In addition, content streaming services such as Netflix and Spotify allow users to import their social network data (e.g., from Facebook) and post activities as a feed. We also assume that the service host has access to social network connections amongst its users, and that the host is capable of computing pairwise influence strength amongst its users by, e.g., using timestamped action logs stored in its databases [68].

[Figure 5.1: Illustration of aversion and attraction in MovieLens, and gains from accounting for them in optimization. Panel (a) Attraction-Aversion: number of users against the value of $\gamma_i - \delta_i$ (from -1 to 1), showing the empirical distribution and a fit with $\mu \pm 1.8\sigma$ marked. Panel (b) Gains from SDP solution: social welfare of SDP versus Baseline.]

Our data analysis indicates that (i) the phenomena of attraction and aversion are present in real-life datasets and (ii) accounting for them can lead to significant gains in the quality of recommendations. Figure 5.1 provides a quick illustration of these two facts (see Section 5.6 for a detailed account of the derivation of these two figures). Figure 5.1(a) presents the distribution of a score measuring aversion and attraction among different users in MovieLens (with $-1$ indicating users with the strongest aversive behavior, and $+1$ indicating users with the strongest attractive behavior).
About 7.0% of users are strongly aversive (score $\le -0.5$) while 9.0% are strongly attractive (score $\ge 0.5$). Accounting for such users can lead to a significant impact on recommendations: as shown in Figure 5.1(b), the user social welfare more than doubles when incorporating this knowledge in recommendation decisions. Though there are clearly many factors of user behavior that are not accounted for in our analysis, we believe that these two facts, along with the SDP relaxation yielding optimal recommendations, indicate that investigating and accommodating such phenomena is both important and tractable.

5.2 Related Work

There has been significant interest in modeling the temporal dynamics of user interests for various settings close to ours [92, 125, 130, 143]. Early work on matrix factorization (MF) by Koren [92] incorporated time-variant user profiles, an approach that we also adopt. We depart from this line of work by modeling, and also including in the MF process (see Section 5.5), factors that impact these drifts, including attraction, aversion, and social influence.

Several studies have highlighted the need for serendipity and diversity in the context of recommender systems, both of which relate to the notion of aversion we describe here. The need for serendipity was first identified by McNee et al. [115]. To address this, Yu et al. [146] and Abbassi et al. [1] proposed algorithms for recommending items that maximize a score that combines both relevance to a user and diversity. Ge et al. [60] focused on evaluating the lack of serendipity and diversity, and how it hurts the quality of recommendations. We depart from these works by modeling how recommendations themselves may instigate aversion or attraction among users, through a dynamic evolution of user interests.

Our approach to incorporating the effects of aversion is closer to Das Sarma et al. [130], which considered users that iteratively consume items in one out of several categories.
They incorporate “boredom” and social influence in a manner similar to ours: inherent item values decrease as a function of a weighted frequency of past consumption, and a user’s utility is averaged among her friends’ utilities. The authors provide bounds on the steady state performance of different consumption strategies under such dynamics. We depart by modelling user interests as multi-dimensional vectors, and using a factor-based model for user utilities, whose dynamics and steady state behavior cannot be captured by the (one-dimensional) model in [130].

The literature on social influence is vast, motivated by the viral marketing applications introduced by Domingos and Richardson [50] and further studied by Kempe et al. [86]. Our influence model is closer to gossiping [131], in that the interest/state of each user results from averaging the interests of her neighbors. Though we depart from classic gossiping protocols in that we incorporate additional dynamics (through attraction and aversion), techniques similar to those in [131] could potentially be used to study our system in scenarios where interest evolution is asynchronous across users. In the context of matrix factorization, Jamali et al. [82] proposed incorporating the distance of a user’s profile to the average profile of users in their social circle as a regularization factor in MF. This is consistent with the social influence behavior we outline in Section 5.3.3. We depart from this work by modeling dynamic profiles, and studying the additional effect of recommendations on user profiles through attraction and aversion.

Semi-definite programming (SDP) relaxation for quadratically constrained quadratic programs (QCQP) lies at the core of our algorithmic contributions. Building on the seminal work by Goemans and Williamson [62], several papers have demonstrated classes of QCQPs for which an SDP relaxation gives a constant approximation guarantee [120, 121, 145].
Moreover, exact solutions of rank 1 are known to be attainable for several classes of QCQP, including when the problem has one [25] or two quadratic constraints [14]. Of special interest is the case where the quadratic objective involves non-negative off-diagonal elements, and constraints involve only quadratic terms of one variable [150], as the attraction-dominant case of our problem falls into this class (see Section 5.4.3). We refer the interested reader to [140] for SDP in general, and to [111, 121] for applications to quadratic programming.

5.3 Problem Formulation

In what follows, we present our mathematical model that describes how users interact with a recommender system. We use bold script (e.g., $\mathbf{x}, \mathbf{y}, \mathbf{u}, \mathbf{v}$) to denote vectors, and capital script (e.g., $A, B, H$) to denote matrices. For either matrices or vectors, we use $\ge$ to indicate element-wise inequalities. For symmetric matrices, we use $\succeq$ to indicate dominance in the positive semidefinite sense; in particular, $A \succeq 0$ implies that $A$ is positive semidefinite. For square matrices $A$, we denote by $\mathrm{tr}(A)$, $\mathrm{diag}(A)$, $\mathrm{rank}(A)$ the trace, diagonal and rank of $A$, respectively. Finally, given an $n \times m$ matrix $A$, we denote by $\mathrm{col} : \mathbb{R}^{n \times m} \to \mathbb{R}^{nm}$ the column-major order representation of $A$: i.e., $\mathrm{col}(A)$ maps the elements of $A$ to a vector, by stacking the $m$ columns of $A$ on top of each other.

5.3.1 Overview

Our model assumes that user interests are dynamic: they are affected both by the recommendations users receive and by how other users’ interests evolve. In particular, our model of user behavior takes into account the following factors:

1. Inherent interests. Our model accounts for an inherent predisposition users may have, e.g., towards particular topics or genres. This is static and does not change through time.
2. Social influence. A user’s behavior can be affected by what people in her social circle (e.g., her friends or family) are interested in.
3. Attraction.
As per the mere-exposure effect, users may exhibit attractive behavior: if a type of content is shown very often by the recommendation service, this might reinforce the desire of a user to consume it.
4. Aversion. Users may also exhibit aversive behavior: a user can grow tired of a topic that she sees very often, and may want to see something new or rare.

Under the joint effect of the factors above, suggestions made by the recommender instigate an interest cascade over the users. Suggestions alter user interests through attraction or aversion; in turn, these changes affect neighboring users as well, on account of their social behavior. These effects propagate dynamically over the users’ social network.

5.3.2 Recommender System and User Utilities

We now formally describe how each of the four factors mentioned above is incorporated in our model. We consider $n$ users receiving recommendations from an entity we call the recommender. We denote by $[n] \equiv \{1, 2, \ldots, n\}$ the set of all users. At each time step $t \in \mathbb{N}$, the recommender suggests an item to each user in $[n]$, selected from a catalog $C$ of available items. The user accrues a utility from the item recommended. As discussed below, the recommender’s goal is to suggest items that maximize the aggregate user utility, i.e., the social welfare.

Following the standard convention in recommender systems, we assume factor-based user utilities. At each $t \in \mathbb{N}$, each user $i \in [n]$ has an interest profile represented by a $d$-dimensional vector $\mathbf{u}_i(t) \in \mathbb{R}^d$. Moreover, the item recommended to user $i$ at time $t$ is represented by a $d$-dimensional feature profile $\mathbf{v}_i(t) \in \mathbb{R}^d$. Then, the expected rating¹ a user $i$ would give to the item suggested to her at time $t$ is given by $F(\mathbf{u}_i(t), \mathbf{v}_i(t))$, where

$$F(\mathbf{u}, \mathbf{v}) \overset{\mathrm{def}}{=} \langle \mathbf{u}, \mathbf{v} \rangle = \sum_{k=1}^{d} u_k v_k, \tag{5.1}$$

i.e., the inner product between the interest and feature profiles [91, 93]. Intuitively, each coordinate of a feature profile can be perceived as an item-specific feature, e.g., a movie’s genre or an article’s topic.
The corresponding coordinate in an interest profile captures the propensity of the user to react positively or negatively to this feature.

[Footnote 1: In practice, Equation (5.1) best approximates centered ratings, i.e., ratings offset by a global average across users.]

We call $F(\mathbf{u}_i(t), \mathbf{v}_i(t))$ the utility of user $i$ from the suggested item at time $t$. Without loss of generality², we assume that the item profiles $\mathbf{v}_i \in \mathbb{R}^d$ are normalized, i.e., $\|\mathbf{v}_i(t)\|_2 = 1$ for all $i \in [n]$, $t \in \mathbb{N}$. Under this assumption, given that the profile of user $i$ is $\mathbf{u}$, the best item to recommend to user $i$ is the one that yields the highest expected rating; indeed, this is

$$\arg\max_{\mathbf{v} \in \mathbb{R}^d : \|\mathbf{v}\|_2 = 1} F(\mathbf{u}, \mathbf{v}) = \mathbf{u} / \|\mathbf{u}\|_2,$$

i.e., the item that maximizes the utility of user $i$. Note that identifying items that maximize the aggregate utility across users (i.e., the sum of expected ratings of suggested items) is a natural goal for the recommender.

5.3.3 Interest Evolution

The evolution of user interests captures the four factors outlined in §5.3.1. At each time step $t \in \mathbb{N}$, the interest profile of a user $i \in [n]$ is determined by either a personalized or a social behavior. If personalized, the behavior of a user is again selected among three possible outcomes, corresponding to inherent interests, attraction, and aversion, respectively. The selection of which of these four behaviors takes place at a given time step is random, and occurs independently of selections at other users as well as selections at previous time slots. We denote by $\beta \in [0, 1]$ the probability that the user selects a social behavior at time slot $t$. The probability of selecting a personalized behavior is thus $1 - \beta$. Interests at these two distinct events are as follows:

Personalized Behavior. If a user’s interest is selected through a personalized behavior, the user selects her profile through one of the three personalized factors outlined in §5.3.1.
In particular, for every $i \in [n]$, there exist probabilities $\alpha_i, \gamma_i, \delta_i \in [0, 1]$ such that $\alpha_i + \gamma_i + \delta_i = 1$, and:

• Inherent interests. With probability $\alpha_i$, user $i$ follows her inherent interests. That is, $\mathbf{u}_i(t)$ is sampled from a probability distribution $\mu^0_i$ over $\mathbb{R}^d$. This distribution does not vary with $t$ and captures the user’s inherent predisposition.
• Attraction. With probability $\gamma_i$, user $i$ selects a profile that is attracted to the items suggested to her in the past. To capture this notion, denote by $V_i(t) = \{\mathbf{v}_i(\tau)\}_{\tau \le t}$ the history of (profiles of) items suggested to user $i$. Then, the interest of a user under attraction is given by $g(V_i(t-1))$, a weighted average of the items suggested to her in the past. That is:

$$\mathbf{u}_i(t) = g(V_i(t-1)) = \frac{\sum_{\tau=0}^{t-1} w_{t-\tau}\, \mathbf{v}_i(\tau)}{\sum_{\tau=0}^{t-1} w_{t-\tau}}. \tag{5.2}$$

• Aversion. With probability $\delta_i$, user $i$ selects a profile that is repulsed by the items suggested to her in the past; that is, her interest profile is given by

$$\mathbf{u}_i(t) = -g(V_i(t-1)) = -\frac{\sum_{\tau=0}^{t-1} w_{t-\tau}\, \mathbf{v}_i(\tau)}{\sum_{\tau=0}^{t-1} w_{t-\tau}}. \tag{5.3}$$

[Footnote 2: Note that $F(\mathbf{u}, \mathbf{v}) = F(s\mathbf{u}, \tfrac{1}{s}\mathbf{v})$ for any nonzero scalar $s \in \mathbb{R}$, so we can assume that either user or feature profiles have a bounded norm.]

To gain some intuition on Equation (5.2) and Equation (5.3), recall that a user’s utility at time $t$ is given by Equation (5.1). Therefore, a profile generated under Equation (5.2) implies that the suggestion that maximizes her utility at time $t$ would be one that aligns perfectly with (i.e., points in the same direction as) the weighted average $g$ up to time $t - 1$. In contrast, under the aversive behavior of Equation (5.3), the same suggestion minimizes the user’s utility.

Note that the weighted average $g$ is fully determined by the sequence of weights $\{w_\tau\}_{\tau \in \mathbb{N}}$. By selecting decaying weights, a higher importance can be placed on more recent suggestions.

Social Behavior. User $i$’s profile is selected through social behavior with probability $\beta$. Conditioned on this event:

• Social Influence. A user aligns her interests with a user $j$ selected from her social circle with probability $P_{ij}$.
That is:

$$\mathbf{u}_i(t) = \mathbf{u}_j(t-1), \quad \text{with probability } P_{ij}, \tag{5.4}$$

where $\sum_j P_{ij} = 1$.

The probability $P_{ij} \in [0, 1]$ captures the influence that user $j$ has on user $i$. Note that users $j$ for which $P_{ij} = 0$ (i.e., outside $i$’s social circle) have no influence on $i$. Moreover, the set of pairs $(i, j)$ such that $P_{ij} \ne 0$ defines the social network among users. We denote by $P \in [0, 1]^{n \times n}$ the stochastic matrix with elements $P_{ij}$, $i, j \in [n]$; we assume that $P$ is ergodic (i.e., irreducible and aperiodic) [58].

Under these dynamics, interests evolve in the form of a dynamic cascade: suggestions made by the recommender act as a forcing function, altering interests either through attraction or aversion. Such changes propagate across users through the social network.

5.3.4 Recommended Item Distribution

In practice, the recommender has access to a finite “catalog” of items. Recalling that feature profiles have norm 1, the recommender’s catalog can be represented as a set $C \subseteq B$, where $B = \{\mathbf{v} \in \mathbb{R}^d : \|\mathbf{v}\|_2 = 1\}$ is the set of items of norm 1 (i.e., the $d$-dimensional unit ball).

We assume that the recommender selects the items $\mathbf{v}_i(t) \in B$ suggested to user $i \in [n]$ by sampling them from a discrete distribution $\nu_i$ over $B$, whose support is $C$. Note that the expected feature profile of a suggested item is a weighted average among the vectors in $C$. As such, it belongs to the convex hull of the catalog $C$; formally:

$$\bar{\mathbf{v}}_i = \int_{\mathbf{v} \in B} \mathbf{v}\, d\nu_i \in \mathrm{conv}(C). \tag{5.5}$$

Note that $\mathrm{conv}(C)$ is a convex polytope included in $B$.

As we will see later in our analysis (cf. Theorem 13), the steady state user utilities depend only on the expectations $\bar{\mathbf{v}}_i$, $i \in [n]$, rather than the entire distributions $\nu_i$. We will thus refer to $\{\bar{\mathbf{v}}_i\}_{i \in [n]}$ as the recommender strategy; it is worth keeping in mind that, given a $\bar{\mathbf{v}}_i \in \mathrm{conv}(C)$, a $\nu_i$ such that (5.5) holds can be computed in time polynomial in $|C|$ (see also §5.4.4).

We further assume that the catalog $C$ is large; in particular, for large catalog size $|C|$, we have:

$$\mathrm{conv}(C) \simeq B. \tag{5.6}$$

This would be true if, for example, the items in the catalog are generated in an i.i.d. fashion from a distribution that covers the entire ball $B$; this distribution need not be uniform. Formally, $\lim_{|C| \to \infty} \mathrm{conv}(C) = B$ w.p. 1 if, e.g., items in catalog $C$ are sampled independently from a probability distribution absolutely continuous with respect to the uniform distribution on $B$. We revisit the issue of how to pick a distribution $\nu_i$ given $\bar{\mathbf{v}}_i$, as well as how to interpret our results in the case of a finite catalog, in §5.4.4.

5.3.5 Recommendation Objective

Observe that, under the above dynamics, the evolution of the system is a Markov chain, whose state comprises the interest and feature profiles. We define the objective of the recommender as maximizing the social welfare, i.e., the sum of expected user utilities, in steady state. Formally, we wish to determine a strategy $\{\bar{\mathbf{v}}_i\}_{i \in [n]}$ (and, hence, distributions $\nu_i$) that maximizes:

$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T} \sum_{i \in [n]} \langle \mathbf{u}_i(t), \mathbf{v}_i(t) \rangle = \lim_{t \to \infty} \sum_{i \in [n]} \mathbb{E}[\langle \mathbf{u}_i(t), \mathbf{v}_i(t) \rangle],$$

where the equality above holds w.p. 1 by the renewal theorem [58]. It is important to note that, under the interest dynamics described in §5.3.3, optimal recommendations to a user $i$ cannot be obtained independently of recommendations to other users: user $i$’s profile depends on recommendations made not only directly to this user, but also to any user reachable through $i$’s social network.

5.4 Algorithm Design

In this section, we discuss how the recommender selects which items to present to users to maximize the system’s social welfare. We begin by obtaining a closed-form formula for the social welfare in steady state, and then discuss algorithms for its optimization.

5.4.1 Steady State Social Welfare

Recall that $\mu^0_i$ is the inherent profile distribution of user $i \in [n]$, and let $\mu_i$ be the steady state distribution of the profile of user $i$. We denote by $\bar{\mathbf{u}}_i = \int_{\mathbf{u} \in \mathbb{R}^d} \mathbf{u}\, d\mu_i$ and $\bar{\mathbf{u}}^0_i = \int_{\mathbf{u} \in \mathbb{R}^d} \mathbf{u}\, d\mu^0_i$ the expected profile of $i \in [n]$ under the steady state and inherent profile distributions, respectively.
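Moreover, denote by $\bar{U}, \bar{U}^0, \bar{V} \in \mathbb{R}^{n \times d}$ the matrices whose rows comprise the expected profiles $\bar{\mathbf{u}}_i$, $\bar{\mathbf{u}}^0_i$, $\bar{\mathbf{v}}_i$, $i \in [n]$, respectively. Let also $A, \Gamma, \Delta \in \mathbb{R}^{n \times n}$ be the diagonal matrices whose diagonal elements are the coefficients $(1-\beta)\alpha_i$, $(1-\beta)\gamma_i$, and $(1-\beta)\delta_i$, respectively.

Then, the steady state social welfare can be expressed in closed form according to the following theorem.

Theorem 13. The expected social welfare in steady state is:

$$G(\bar{V}) \equiv \mathrm{tr}\left[ (I - \beta P)^{-1} \left( A \bar{U}^0 \bar{V}^T + (\Gamma - \Delta) \bar{V} \bar{V}^T \right) \right], \tag{5.7}$$

where $\mathrm{tr}(\cdot)$ denotes the matrix trace.

Proof. Observe that at any time step $t$, the profiles $\mathbf{u}_i(t)$ and $\mathbf{v}_i(t)$ are independent random variables. Hence,

$$\lim_{t \to \infty} \sum_{i \in [n]} \mathbb{E}[\langle \mathbf{u}_i(t), \mathbf{v}_i(t) \rangle] = \lim_{t \to \infty} \sum_{i \in [n]} \langle \mathbb{E}[\mathbf{u}_i(t)], \mathbb{E}[\mathbf{v}_i(t)] \rangle = \sum_{i \in [n]} \langle \bar{\mathbf{u}}_i, \bar{\mathbf{v}}_i \rangle = \mathrm{tr}\left( \bar{U} \bar{V}^T \right). \tag{5.8}$$

Observe that, by the linearity of expectation, $\mathbb{E}[g(V_i(t))] = \bar{\mathbf{v}}_i$ for all $t \in \mathbb{N}$ and $i \in [n]$. Thus, for $U(t) = [\mathbf{u}_i(t)]_{i \in [n]} \in \mathbb{R}^{n \times d}$ the matrix of all user profiles at time $t$, we get that

$$\mathbb{E}[U(t)] = A \bar{U}^0 + \beta P\, \mathbb{E}[U(t-1)] + \Gamma \bar{V} - \Delta \bar{V}.$$

As $\beta P$ is sub-stochastic and $P$ is ergodic, the Perron-Frobenius theorem [58] implies that $\bar{U} = \lim_{t \to \infty} \mathbb{E}[U(t)]$ exists and

$$\bar{U} = A \bar{U}^0 + \beta P \bar{U} + (\Gamma - \Delta) \bar{V}.$$

Solving the above linear system and substituting the solution for $\bar{U}$ in (5.8) completes the proof.

An important consequence of Theorem 13 is that the steady state social welfare depends only on the expected profiles $\bar{V}$, rather than the entire distributions $\nu_i$, $i \in [n]$. Hence, determining the optimal recommender strategy amounts to solving the following quadratically-constrained quadratic optimization problem (QCQP):

Global Recommendation

$$\begin{aligned} \text{Max.:} \quad & \mathrm{tr}\left[ (I - \beta P)^{-1} \left( A \bar{U}^0 \bar{V}^T + (\Gamma - \Delta) \bar{V} \bar{V}^T \right) \right] \\ \text{subj. to:} \quad & \|\bar{\mathbf{v}}_i\|_2^2 \le 1, \ \text{for all } i \in [n], \end{aligned} \tag{5.9}$$

where the norm constraint comes from Equation (5.6). Note that this is indeed a global optimization: to solve it, recommendations across different users need to be taken into account jointly. This manifests in (5.9) through the quadratic term in the social welfare objective.

5.4.2 SDP Relaxation

The QCQP (5.9) is not necessarily convex.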
It is thus not a priori clear whether it can be solved in polynomial time. However, the problem admits a semi-definite program (SDP) relaxation, which can be solved in polynomial time. Interestingly, in many cases, the solution obtained for the SDP relaxation turns out to be an optimal solution to our original problem (5.9), and there is a simple and efficient test that can verify whether the obtained solution is optimal. Finally, when the solution is not optimal, it can be transformed to yield a constant-factor approximation. We are thus able to obtain a strong and elegant theoretical result for solving the Global Recommendation problem.

It is important to note that the large-catalog assumption (5.6) is crucial to tractability: replacing the quadratic constraints with the linear constraints (5.5) does not lead to a problem that is amenable to an SDP relaxation. In fact, generic quadratic problems with linear constraints are known to be inapproximable, unless P = NP [121].

Deriving an SDP Relaxation. We begin by describing how to express (5.9) as “almost” an SDP, except for a rank constraint:

Theorem 14. There exists a symmetric matrix $H \in \mathbb{R}^{(nd+1) \times (nd+1)}$ and a convex polyhedral set $D \subseteq \mathbb{R}^{nd+1}$ such that Global Recommendation (5.9) is equivalent to:

$$\begin{aligned} \text{Max.:} \quad & \mathrm{tr}(HY) \\ \text{subj. to:} \quad & Y \succeq 0, \ \mathrm{diag}(Y) \in D, \ \mathrm{rank}(Y) = 1, \end{aligned} \tag{5.10}$$

where $Y \in \mathbb{R}^{(nd+1) \times (nd+1)}$.

Proof. Let $\mathbf{x} = \mathrm{col}(\bar{V}) \in \mathbb{R}^{nd}$ be the column-major order vector representation of the recommender’s strategy, and $\mathbf{b} = \mathrm{col}\left( (I - \beta P)^{-1} A \bar{U}^0 \right) \in \mathbb{R}^{nd}$ the vector representation of the linear term in (5.7). Moreover, for $Q = (I - \beta P)^{-1} (\Gamma - \Delta) \in \mathbb{R}^{n \times n}$, consider the following block-diagonal symmetric matrix, where $\frac{Q + Q^T}{2}$ is repeated $d$ times:

$$H_0 = \begin{bmatrix} \frac{Q + Q^T}{2} & 0 & \cdots & 0 \\ 0 & \frac{Q + Q^T}{2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{Q + Q^T}{2} \end{bmatrix} \in \mathbb{R}^{nd \times nd}.$$

Under this notation, (5.9) can be written as

$$\begin{aligned} \text{Max.:} \quad & \mathbf{b}^T \mathbf{x} + \mathbf{x}^T H_0 \mathbf{x} \\ \text{subj. to:} \quad & \mathbf{x}^2 \in D_0, \end{aligned} \tag{5.11}$$

where $\mathbf{x}^2 = [x_k^2]_{k \in [nd]} \in \mathbb{R}^{nd}_+$ results from squaring the elements of $\mathbf{x}$, and $D_0$ is the set implied by the norm constraints:

$$D_0 = \Big\{ \mathbf{x}' \in \mathbb{R}^{nd} \ \Big|\ \forall i \in [n],\ \sum_{j=1}^{nd} \mathbb{1}_{j \bmod n = i \bmod n}\, x'_j \le 1 \Big\}.$$

Note that $D_0$ is a convex polyhedral set defined by linear inequality constraints. Moreover, (5.11) can be homogenized to a quadratic program without linear terms using the following standard trick (see also [111, 121, 140]). Introducing an auxiliary variable $t$, the objective can be replaced by $t\, \mathbf{b}^T \mathbf{x} + \mathbf{x}^T H_0 \mathbf{x}$, where $t$ satisfies the constraint $t^2 \le 1$. Setting $\mathbf{y} = (\mathbf{x}, t) \in \mathbb{R}^{nd+1}$, this yields:

$$\begin{aligned} \text{Max.:} \quad & \mathbf{y}^T H \mathbf{y} \\ \text{subj. to:} \quad & \mathbf{y}^2 \in D, \end{aligned} \tag{5.12}$$

where $H$ is the following symmetric matrix:

$$H = \begin{bmatrix} H_0 & \mathbf{b}/2 \\ \mathbf{b}^T/2 & 0 \end{bmatrix} \in \mathbb{R}^{(nd+1) \times (nd+1)},$$

and

$$D = \{ \mathbf{y}' = (\mathbf{x}', t') \in \mathbb{R}^{nd+1} \mid \mathbf{x}' \in D_0,\ t' \le 1 \}.$$

To see that (5.12) and (5.11) are equivalent, observe that an optimal solution $(\mathbf{x}, t)$ to (5.12) must be such that $t = -1$ or $t = +1$. If $t = +1$, then $\mathbf{x}$ is an optimal solution to (5.11); if $t = -1$, then $-\mathbf{x}$ is an optimal solution to (5.11). Finally, (5.12) is equivalent to (5.10), by setting $Y = \mathbf{y}\mathbf{y}^T$ and using the fact that $\mathbf{y}^T H \mathbf{y} = \mathrm{tr}(H \mathbf{y}\mathbf{y}^T)$.

In particular, given an optimal solution $Y$ to (5.10), an optimal solution to Global Recommendation can be constructed as follows. Since $Y \succeq 0$ and $\mathrm{rank}(Y) = 1$, there exists $\mathbf{y} \in \mathbb{R}^{nd+1}$ such that $Y = \mathbf{y}\mathbf{y}^T$. More specifically, $Y$ has a unique positive eigenvalue $\lambda$. If $\mathbf{e}$ is the corresponding eigenvector, $\mathbf{y} = (\mathbf{x}, t) = \sqrt{\lambda} \cdot \mathbf{e}$. An optimal solution to (5.9) is thus the matrix $\bar{V} \in \mathbb{R}^{n \times d}$ with column-major order representation $\mathrm{col}(\bar{V}) = t \cdot \mathbf{x}$.

Problem (5.10) is still not convex, on account of the rank constraint on $Y$. However, in light of Theorem 14, a natural relaxation for Global Recommendation is the following semi-definite program, resulting from dropping this rank constraint:

SDP Relaxation

$$\begin{aligned} \text{Max.:} \quad & \mathrm{tr}(HY) \\ \text{subj. to:} \quad & Y \succeq 0, \ \mathrm{diag}(Y) \in D. \end{aligned} \tag{5.13}$$

This is a relaxation, in the sense that it enlarges the feasible set: any feasible solution to (5.10) is also a feasible solution to (5.13). Crucially, (5.13) is a convex SDP problem, and can be solved in polynomial time.
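The reformulation used in the proof of Theorem 14 can be sanity-checked numerically. The sketch below uses a hypothetical toy instance: all numeric values, and the helper names `matmul`, `transpose`, `diag`, `inverse`, and `col_major`, are illustrative assumptions rather than anything taken from the experiments. It evaluates the steady state welfare in three ways: through the trace formula (5.7), through the fixed point of the expected-profile recursion, and through the homogenized quadratic form $\mathbf{y}^T H \mathbf{y}$ with $\mathbf{y} = (\mathrm{col}(\bar{V}), 1)$. Plain Python is used only to keep the example self-contained; a numerical library would be the natural choice in practice.

```python
def matmul(A, B):
    """Product of two small dense matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def diag(values):
    return [[v if i == j else 0.0 for j in range(len(values))]
            for i, v in enumerate(values)]

def inverse(M):
    """Gauss-Jordan inverse with partial pivoting (adequate for tiny matrices)."""
    n = len(M)
    aug = [[float(x) for x in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def col_major(M):
    """col(M): stack the columns of M on top of each other."""
    return [M[i][k] for k in range(len(M[0])) for i in range(len(M))]

# Hypothetical toy instance: n = 3 users, d = 2 latent dimensions.
n, d, beta = 3, 2, 0.4
P = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]   # row-stochastic
alpha = [0.5, 0.2, 0.3]                                   # inherent interests
gamma = [0.4, 0.1, 0.6]                                   # attraction
delta = [0.1, 0.7, 0.1]                                   # aversion
U0 = [[1.0, 0.5], [-0.3, 0.8], [0.2, -0.6]]               # expected inherent profiles
V = [[0.6, 0.8], [1.0, 0.0], [0.0, -1.0]]                 # strategy with unit-norm rows

A = diag([(1 - beta) * a for a in alpha])
GD = diag([(1 - beta) * (g - dl) for g, dl in zip(gamma, delta)])  # Gamma - Delta
Minv = inverse([[float(i == j) - beta * P[i][j] for j in range(n)] for i in range(n)])
AU0, GDV = matmul(A, U0), matmul(GD, V)

# (1) Trace formula (5.7): tr[(I - beta P)^{-1} (A U0 V^T + (Gamma - Delta) V V^T)].
inner = matmul([[AU0[i][k] + GDV[i][k] for k in range(d)] for i in range(n)],
               transpose(V))
M = matmul(Minv, inner)
G_trace = sum(M[i][i] for i in range(n))

# (2) Fixed point of U <- A U0 + beta P U + (Gamma - Delta) V, then tr(U V^T).
U = [[0.0] * d for _ in range(n)]
for _ in range(200):
    PU = matmul(P, U)
    U = [[AU0[i][k] + beta * PU[i][k] + GDV[i][k] for k in range(d)] for i in range(n)]
G_fixed = sum(U[i][k] * V[i][k] for i in range(n) for k in range(d))

# (3) Homogenized quadratic form y^T H y with y = (col(V), 1).
Q = matmul(Minv, GD)
S = [[(Q[i][j] + Q[j][i]) / 2 for j in range(n)] for i in range(n)]
x, b = col_major(V), col_major(matmul(Minv, AU0))
N = n * d
H = [[0.0] * (N + 1) for _ in range(N + 1)]
for k in range(d):                      # d diagonal blocks, each equal to S
    for i in range(n):
        for j in range(n):
            H[k * n + i][k * n + j] = S[i][j]
for j in range(N):                      # border holding b / 2
    H[j][N] = H[N][j] = b[j] / 2
y = x + [1.0]
G_quad = sum(y[i] * H[i][j] * y[j] for i in range(N + 1) for j in range(N + 1))

print(G_trace, G_fixed, G_quad)
```

The three values agree up to floating-point error, which is what the equivalence between (5.9) and (5.12) requires at any feasible point with $t = 1$.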
Moreover, ifit happens that the optimal solution Y of (5.13) has rank 1, this solutionis also guaranteed to be an optimal solution to (5.10), and can thus beused to construct an optimal solution to Global Recommendation, byTheorem 14. If, on the other hand, rank(Y ) > 1, we are not guaranteed toretrieve an optimal solution to (5.10). However, a solution with a provableapproximation guarantee can still be constructed through a randomizationtechnique, originally proposed by Goemans and Williamson [62].Approximation Algorithm. Algorithm 10 summarizes the steps in theapproach outlined above to solving Global Recommendation. First, thealgorithm obtains an optimal solution Y to the SDP (5.13). It then testsif rank(Y ) = 1, i.e., if this solution happens to have rank 1. If it does,then it is also a solution to (5.10), and an optimal solution to (5.9) canbe constructed as outlined in the proof of Theorem 14. In particular, Ycan be written as Y = yyT, where y = (x, t) 2 Rnd+1 can be computedfrom the unique positive eigenvalue of Y and its corresponding eigenvector.The optimal solution to (5.9) can subsequently be obtained as the matrixV¯ 2 Rn⇥d that has a column-major order representation col(V ) = t · x.If, on the other hand, rank(Y ) > 1, the algorithm returns a vector (x, t)constructed in a randomized fashion. 
In particular, the algorithm returns the vector $\sqrt{\operatorname{diag}(Y)}$, namely the square root of $Y$'s diagonal elements, with each coordinate multiplied by a random sign ($+1$ or $-1$). The random sign vector $\sigma \in \{-1, +1\}^{nd+1}$ used in this multiplication is constructed as follows. Given that $Y \succeq 0$, there exists a matrix $Z \in \mathbb{R}^{(nd+1)\times(nd+1)}$ that factorizes $Y$, i.e., $Y = Z^\top Z$. Such a matrix can be obtained in polynomial time from the eigendecomposition of $Y$. Having $Z$, the algorithm proceeds by sampling a random vector $u \in \mathbb{R}^{nd+1}$ from a standard Gaussian distribution. Then, $\sigma$ is the binary vector computed by applying the sign operator to the coordinates of the vector $Z^\top u$.

Algorithm 10: Global Recommendation Algorithm
  Data: Model parameters $A$, $\Gamma$, $\Delta$, $\beta$, $P$, $\bar U^0$
  Result: Expected item profiles $\bar V$
  begin
    Solve SDP Relaxation (5.13) and let $Y$ be its solution
    if $\operatorname{rank}(Y) = 1$ then
      Let $\lambda > 0$ be the unique positive eigenvalue of $Y$, and $e$ the corresponding eigenvector
      $(x, t) \leftarrow \sqrt{\lambda} \cdot e$
    else
      Construct a factorization $Y = Z^\top Z$, where $Z \in \mathbb{R}^{(nd+1)\times(nd+1)}$
      Sample a random $u \in \mathbb{R}^{nd+1}$ from $\mathcal{N}(0, I_{nd+1})$
      $\sigma \leftarrow \operatorname{sgn}(Z^\top u)$
      $D \leftarrow$ a diagonal matrix containing $\sqrt{\operatorname{diag}(Y)}$
      $(x, t) \leftarrow D\sigma$
    Let $\bar V \in \mathbb{R}^{n \times d}$ be such that $\operatorname{col}(\bar V) = t \cdot x$

The resulting random $y = (x, t) \in \mathbb{R}^{nd+1}$ is guaranteed to yield a feasible solution to (5.10), via $Y = yy^\top$. Most importantly, the following approximation guarantee for the quality of the corresponding solution to Global Recommendation can be provided:

Theorem 15 (Ye [121]). Let $G^*$, $G_*$ be the maximal and minimal values of the social welfare $G$ given by (5.7), evaluated over the feasible domain of (5.9). Let also $\bar V$ be the solution generated by Algorithm 10 when $\operatorname{rank}(Y) > 1$. Then

$$\frac{G^* - \mathbb{E}_u[G(\bar V)]}{G^* - G_*} \le \frac{\pi}{2} - 1 < \frac{4}{7},$$

where the expectation $\mathbb{E}_u[\cdot]$ is over the Gaussian vector $u$.

The existence of a simple test (namely $\operatorname{rank}(Y) = 1$) verifying that the solution produced by Algorithm 10 is optimal is important. In fact, in Section 5.6, we study an extensive set of instances, involving several social network topologies and combinations of aversive and attractive behavior.
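The randomized branch of Algorithm 10 can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the thesis implementation (which was written in Matlab): the factor $Z$ is obtained here from a Cholesky factorization instead of the eigendecomposition mentioned in the text, which requires the toy matrix `Y` to be strictly positive definite. The function names and the example matrix are hypothetical.

```python
import math
import random


def cholesky(Y):
    """Return lower-triangular L with Y = L L^T (Y must be positive definite)."""
    n = len(Y)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(Y[i][i] - s)
            else:
                L[i][j] = (Y[i][j] - s) / L[j][j]
    return L


def randomized_rounding(Y, rng):
    """Return y with y_k = sigma_k * sqrt(Y_kk), where sigma = sgn(Z^T u)."""
    n = len(Y)
    L = cholesky(Y)                      # Z = L^T, so Y = Z^T Z and Z^T u = L u
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    zt_u = [sum(L[i][k] * u[k] for k in range(i + 1)) for i in range(n)]
    # A zero coordinate has probability zero; it is mapped to +1 by convention.
    sigma = [1.0 if s >= 0.0 else -1.0 for s in zt_u]
    return [sigma[k] * math.sqrt(Y[k][k]) for k in range(n)]


# Toy positive definite matrix standing in for an SDP solution of rank > 1.
Y = [[1.0, 0.5, 0.2],
     [0.5, 1.0, -0.3],
     [0.2, -0.3, 1.0]]

y = randomized_rounding(Y, random.Random(0))
```

By construction $y_k^2 = Y_{kk}$ for every coordinate, so $\operatorname{diag}(yy^\top) = \operatorname{diag}(Y)$ and the rounded point inherits feasibility from $Y$; only the signs are random.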
In each and every instance studied, Algorithm 10 yielded an optimal solution. Hence, although the quadratic program (5.9) is not known to be within the class of problems that can be solved exactly through an SDP relaxation, the experiments in Section 5.6 suggest that a stronger guarantee than the one provided by Theorem 15 is attained in practice.

5.4.3 Solvable Special Cases

Though for generic instances of (5.9) we cannot obtain a better theoretical guarantee than Theorem 15, there are specific instances of (5.9) for which optimality is always attained, and the approximation through randomization is not necessary. As these cases are also of practical interest, we briefly review them below.

Attraction Dominance. Consider a scenario where (a) $\gamma_i > \delta_i$ for all $i \in [n]$ and (b) $\bar U^0 \ge 0$. Intuitively, (a) implies that attraction to proposed content is more dominant than aversion to content, while (b) implies that user profile features take only nonnegative values. Hence, the matrix $H$ in Theorem 14 has nonnegative off-diagonal elements. Although the QCQP (5.9) is not convex in this case, it is known that Algorithm 10 then provides an optimal, rank-1 solution [150].

Uniform Aversion Dominance. Consider a scenario where (a) all parameters are uniform across users, i.e., $\gamma_i = \gamma$ and $\delta_i = \delta$ for all $i \in [n]$, and (b) $\gamma < \delta$, i.e., aversion dominates user behavior. In this case, the QCQP (5.9) is convex and can thus be solved optimally by standard interior point methods in polynomial time [25].

No Personalization. Consider the scenario where the same item is recommended to all users, i.e., $v_i(t) = v(t)$, $\forall i \in [n]$. In this case, Global Recommendation reduces to a quadratic objective with a single quadratic constraint, in which case, even though the problem may not be convex, Algorithm 10 is guaranteed to find a rank-1, optimal solution [25].

No Social Network.
In the case where $\beta = 0$, and there is no social component to the optimization, the social welfare (5.7) becomes separable in $\bar v_i$, i.e., $G(\bar V) = \sum_{i \in [n]} G_i(\bar v_i)$, where each $G_i$ is a quadratic function. Then, the optimization is separable, and a solution to (5.9) can be obtained by solving $\max_{\bar v_i \in \mathbb{R}^d : \|\bar v_i\|_2 \le 1} G_i(\bar v_i)$ for each $i \in [n]$; these are again quadratic problems with a single quadratic constraint, and can be solved exactly by Algorithm 10 [25].

5.4.4 Finite Catalog

Recall that our analysis assumes (5.6), which becomes applicable for a large catalog $C$ covering the unit ball $B$. We describe below how a computed profile $\bar v_i$, $i \in [n]$, can be used to construct a distribution $\nu_i$ over the catalog $C$.

If $\bar v_i \in \operatorname{conv}(C)$, the recommender can select probabilities $\nu_i(v)$, for $v \in C$, that satisfy (5.5); this equality, along with the positivity constraints and the constraint $\sum_{v \in C} \nu_i(v) = 1$ (as $\nu_i$ is a distribution), are linear, and define a feasible set. Thus, finding a probability distribution satisfying (5.5) (i.e., one that lies in the feasible set) is a linear program, which can be solved in polynomial time.

If, on the other hand, $\bar v_i \notin \operatorname{conv}(C)$, the same procedure can be applied to the projection of $\bar v_i$ onto $\operatorname{conv}(C)$. Given that $\operatorname{conv}(C)$ is a convex polytope, this projection can again be computed in polynomial time. Moreover, under (5.6), if the catalog $C$ is large this projection will be close to the optimal value $\bar v_i$.

5.5 Parameter Learning

In this section, we provide an algorithm for validating the existence of attraction and aversion phenomena in real datasets. In short, our approach involves incorporating aversion and attraction parameters into matrix factorization [91, 93]; we treat the parameters $\alpha_i$, $\gamma_i$, and $\delta_i$ as regularization terms, which are learned through cross validation.

Extending MF. We focus on datasets that comprise ratings generated by users at known times (such as the datasets used in §5.6). More specifically, we assume access to a dataset represented by tuples of the form $(i, j, r_{ij}, t)$, where $i \in [n] \equiv \{1, \ldots, n\}$ is the ID of a user, $j \in [m] \equiv \{1, \ldots, m\}$ the ID of an item, $r_{ij} \in \mathbb{R}$ the feedback (rating) provided by user $i$ to item $j$, and $t \in [T]$ the time at which the rating took place. Denoting by $E \subset [n] \times [m]$ the pairs appearing in tuples in this dataset, recall that matrix factorization (MF) amounts to constructing profiles $u_i \in \mathbb{R}^d$, $v_j \in \mathbb{R}^d$ that are solutions to:

$$\min_{u_i, v_j,\; i \in [n],\, j \in [m]} \ \sum_{(i,j) \in E} \big(r_{ij} - \langle u_i, v_j \rangle\big)^2 + \lambda \sum_{i \in [n]} \|u_i\|_2^2 + \mu \sum_{j \in [m]} \|v_j\|_2^2 \qquad (5.14)$$

where $\lambda, \mu > 0$ are regularization parameters to be learned through cross validation. Though this is not a convex problem, it is typically solved through either gradient descent or alternating least squares, both of which perform well in practice [91, 93].

We incorporate attraction and aversion in this formulation as follows. First, at any time step $t \in [T]$, the profile of a user $i$ is given by $u_i(t) \in \mathbb{R}^d$. Let $E_i(t) \subseteq [m]$ be the set of items rated by user $i$ at time $t$ and

$$V_i(t) = \{ v_j : j \in E_i(\tau),\ 1 \le \tau \le t \}$$

be the set of items the user has interacted with up to time $t$ (inclusive). As in §5.3, we denote by $g(V_i(t))$ the weighted average of items in $V_i(t)$. Then, we propose obtaining $u_i(t)$ as solutions to:

$$\min_{u_i(t),\; i \in [n],\, t \in [T]} \ \sum_{t \in [T],\, i \in [n],\, j \in E_i(t)} \big(r_{ij} - \langle u_i(t), v_j \rangle\big)^2 + \sum_{i \in [n],\, t \in [T]} \Big( \big\| u_i(t) - \alpha_i u_i^0 - (\gamma_i - \delta_i)\, g(V_i(t)) \big\|_2^2 + \lambda \|u_i(t)\|_2^2 \Big), \qquad (5.15)$$

where $u_i^0$, $v_j$ are computed through standard MF (5.14), and $\alpha_i$, $\gamma_i$, $\delta_i$, $i \in [n]$, and $\lambda$ are also treated as regularization parameters, to be learned through cross validation. Note that, in contrast to (5.14), (5.15) is a simple linear regression problem, and the profiles $u_i(t)$, where $i \in [n]$, $t \in [T]$, can be computed in closed form.

Learning Procedure. Based on this approach, our algorithm for learning the vectors $\alpha$, $\gamma$, and $\delta$ is outlined in Algorithm 11. First, we learn the inherent profiles $u_i^0$ and the item feature profiles $v_j$ by solving (5.14) through stochastic gradient descent. Then, we use these profiles to learn $\alpha_i$, $\gamma_i$, $\delta_i$ through cross validation.
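The closed form for (5.15) follows from the normal equations of a ridge-type regression. The derivation below is a sketch under one assumption: that the proximity term in (5.15) carries unit weight and the norm penalty carries weight $\lambda$. The shorthand $c_i(t)$ is introduced here for illustration only and does not appear in the original formulation.

```latex
% Fix a user i and a time step t, and write
%   c_i(t) = \alpha_i u_i^0 + (\gamma_i - \delta_i)\, g(V_i(t)).
% The terms of (5.15) that involve u = u_i(t) are
\[
f(u) \;=\; \sum_{j \in E_i(t)} \bigl(r_{ij} - \langle u, v_j \rangle\bigr)^2
      \;+\; \lVert u - c_i(t) \rVert_2^2 \;+\; \lambda \lVert u \rVert_2^2 .
\]
% Setting the gradient to zero:
\[
\nabla f(u) \;=\; -2 \sum_{j \in E_i(t)} \bigl(r_{ij} - v_j^\top u\bigr)\, v_j
      \;+\; 2\bigl(u - c_i(t)\bigr) \;+\; 2\lambda u \;=\; 0 ,
\]
% which gives the linear system
\[
\Bigl( \sum_{j \in E_i(t)} v_j v_j^\top + (1 + \lambda) I_d \Bigr)\, u_i(t)
   \;=\; \sum_{j \in E_i(t)} r_{ij}\, v_j \;+\; c_i(t).
\]
% The matrix on the left is positive definite for \lambda > -1,
% so the solution is unique.
```

Since $c_i(t)$ is linear in $(\alpha_i, \gamma_i, \delta_i)$, so is $u_i(t)$ for fixed $\lambda$, which is what makes the test error an explicit function of these parameters.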
In particular, we split the ratings dataset into $k$ folds, and use $k - 1$ folds as a training set and one fold as a test set. In our evaluation, we set $k = 5$. We learn $u_i(t)$ by solving (5.15) on this restricted dataset. Using these, we compute the square error on the test set as:

$$\mathrm{SE}_{\mathrm{test}} = \sum_{(i,j,t) \in \mathrm{test}} \big(r_{ij} - \langle u_i(t), v_j \rangle\big)^2.$$

We repeat this process across the $k$ folds and obtain an average $\mathrm{SE}_{\mathrm{test}}$.

We compute vectors $\alpha$, $\gamma$, $\delta$ that minimize the average $\mathrm{SE}_{\mathrm{test}}$. Note that this is a function of the regularization parameters of (5.15), i.e., $\mathrm{SE}_{\mathrm{test}} = \mathrm{SE}_{\mathrm{test}}(\alpha, \gamma, \delta, \lambda)$. As (5.15) admits a closed form solution, so does $\mathrm{SE}_{\mathrm{test}}(\alpha, \gamma, \delta, \lambda)$. Using this, we find $\alpha$, $\gamma$, $\delta$ through projected gradient descent, requiring that $\alpha_i + \gamma_i + \delta_i = 1$ for every user $i$.

Algorithm 11: Attraction-Aversion Learning Algorithm
  Data: Rating data
  Result: $(\alpha_i, \gamma_i, \delta_i)$, for all $i \in [n]$
  Obtain $u_i^0$, $\forall i \in [n]$, and $v_j$, $\forall j \in [m]$, through standard MF (5.14)
  Compute $g(V_i(t))$, $\forall i \in [n]$, $\forall t \in [T]$
  Split the dataset into $k$ folds
  Initialize values in $\alpha$, $\gamma$, $\delta$ uniformly at random from $[0, 1]$
  Project $(\alpha_i, \gamma_i, \delta_i)$ to the set $\{(x, y, z) \ge 0 : x + y + z = 1\}$, $\forall i \in [n]$
  repeat
    $(\alpha, \gamma, \delta, \lambda) \leftarrow (\alpha, \gamma, \delta, \lambda) - \rho \nabla \mathrm{SE}_{\mathrm{test}}(\alpha, \gamma, \delta, \lambda)$
    Project $(\alpha_i, \gamma_i, \delta_i)$ to $\{(x, y, z) \ge 0 : x + y + z = 1\}$, $\forall i \in [n]$
  until the change of $\mathrm{SE}_{\mathrm{test}}$ in two consecutive iterations is small enough

5.6 Experiments

We performed experiments to evaluate our parameter learning and social welfare-maximizing algorithms on three real-world rating datasets: Flixster, FilmTipSet, and MovieLens, as well as several synthetically generated traces. The implementations were done in Matlab and we made use of the CVX library [74] to solve the SDP in Algorithm 10. All experiments were run on a server with AMD Opteron 6272 CPUs (eight cores at 2.1GHz) and 128GB memory.

Table 5.1: Datasets statistics

               Flixster   FilmTipSet   MovieLens
  # users      4.6K       443          8.9K
  # items      25K        4.3K         3.8K
  # ratings    1.8M       118K         1.3M
  # SN edges   44K        N/A          N/A

Dataset Preparations.
We first describe the three real-world rating datasets, whose basic statistics are summarized in Table 5.1.

Flixster is a social movie rating site. The original dataset, collected by Jamali et al. [82], comprises 1M users, 14M undirected friendship edges, and 8.2M timestamped ratings (ranging from 0.5 to 5 stars). We used Graclus [48] to extract a dense subgraph of the social network. Further, we filtered out users and movies with fewer than 100 ratings so that there is enough data to learn temporal profile vectors. This left us with a core of 4.6K users, 44K edges, and 25K movies.

FilmTipSet³ is a Swedish movie fan community. The data was originally published for a research competition in the CAMRa workshop⁴. It has 16K users, 67K movies, and 2.8M timestamped ratings (on a scale of 1 to 5). We selected users who rated at least 100 movies in both 2004 and 2005. This gives a core of 443 users, 4.3K movies, and 118K ratings.

The third dataset is MovieLens⁵ (the largest version, with 10 million ratings). We focused on users that have rated at least 20 movies in the year 2000. Note that there is no social network in MovieLens. FilmTipSet contains some social networking information, which we however did not use in our analysis due to its extreme sparsity (85 edges for the 443 core users).

[Figure 5.2: The decreasing trend of Test RMSE, RMSE$_\alpha$, RMSE$_\gamma$, and RMSE$_\delta$. Axes: iteration (x), error value on a log scale (y).]

5.6.1 Evaluation of Parameter Learning

Learning on Synthetic Data. We first ran Algorithm 11 on a synthetically generated dataset to examine its accuracy.
We set $n = 100$, $T = 100$, and $d = 5$. Each user $i \in [n]$ consumes one random item at every time step $t \in [T]$. For all users $i$, we generated "ground-truth" $\alpha_i$, $\gamma_i$, and $\delta_i$ uniformly at random from $[0, 1]$ and normalized them so that $\alpha_i + \gamma_i + \delta_i = 1$. The expected inherent interest profiles and item profiles were generated uniformly at random from $[0, 1]^d$. Interest profiles evolved according to the dynamics in Section 5.3.3, with $\beta = 0$ and weighted average $g$ with equal weights. At each step, users "generated" ratings, which were computed by taking the inner product of the appropriate profile vectors.

³ http://www.filmtipset.se/, last accessed on September 24, 2015
⁴ http://www.dai-labor.de/camra2010/, last accessed on September 24, 2015
⁵ http://grouplens.org/datasets/movielens/, last accessed on September 24, 2015

[Figure 5.3: Interest evolution probabilities learned on synthetic data, compared against the generated ground-truth values for accuracy. Axes: ground-truth probability (x), learned probability (y). Series: Alpha, Gamma, Delta.]

Both the learning rate $\rho$ and the regularization parameter $\lambda$ in Algorithm 11 were set to 0.001 (determined by cross validation). The convergence condition of Algorithm 11 was that the change in $\mathrm{SE}_{\mathrm{test}}$ be smaller than $10^{-6}$. We repeated the process ten times with different random starting points and report the results obtained in the repetition that gave the smallest $\mathrm{SE}_{\mathrm{test}}$.

Let $\alpha_i^\ell$, $\gamma_i^\ell$, and $\delta_i^\ell$ be the learned evolution probabilities. We used RMSE to define the learning error w.r.t. the $\alpha_i$'s:

$$\mathrm{RMSE}_\alpha = \sqrt{\frac{\sum_{i=1}^{n} |\alpha_i - \alpha_i^\ell|^2}{n}}.$$

$\mathrm{RMSE}_\gamma$ and $\mathrm{RMSE}_\delta$ are defined in the same way. Figure 5.2 shows the decrease of these RMSEs as the number of iterations goes up. At convergence, they were 0.08, 0.58 and 0.56 respectively.
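The ground-truth generation and the error measure used above are easy to reproduce. The snippet below is a minimal sketch with hypothetical function names; the "learned" values are a made-up perturbation of the ground truth, used only to exercise the RMSE computation, and do not come from Algorithm 11.

```python
import math
import random


def sample_ground_truth(n, rng):
    """Draw (alpha_i, gamma_i, delta_i) uniformly from [0, 1] and normalize to sum to 1."""
    params = []
    for _ in range(n):
        raw = [rng.random() for _ in range(3)]
        total = sum(raw)          # zero only if all three draws are exactly 0
        params.append(tuple(value / total for value in raw))
    return params


def rmse(truth, learned):
    """Root mean square error between two equally long lists of parameters."""
    n = len(truth)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(truth, learned)) / n)


rng = random.Random(42)
truth = sample_ground_truth(100, rng)           # n = 100 users, as in the text
alpha_true = [p[0] for p in truth]

# Hypothetical estimate that is off by a constant 0.05 for every user.
alpha_learned = [a + 0.05 for a in alpha_true]
rmse_alpha = rmse(alpha_true, alpha_learned)    # a constant offset of 0.05 gives RMSE 0.05
```

The same `rmse` helper applies unchanged to the $\gamma_i$ and $\delta_i$ components of each triple.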
In addition, Figure 5.2 shows that the test RMSE (computed as $\sqrt{\mathrm{SE}_{\mathrm{test}} / |\mathrm{test}|}$) dropped steadily as learning proceeded, and finally converged to 0.02 after a total of 5300 iterations.

In Figure 5.3, we show a scatter plot of the ground-truth probabilities and the learned probabilities (at convergence). Each data point has a ground-truth probability value as its x-coordinate and the corresponding learned value as its y-coordinate. The $y = x$ line indicates points for which the learned and ground-truth probabilities are equal. As can be seen, the algorithm recovered the $\alpha_i$'s almost perfectly, while the results for the $\gamma_i$'s and $\delta_i$'s were also reasonably good. This gave us confidence in deploying the algorithm on real-world rating data, in which ground-truth parameters are not known.

[Figure 5.4: Learned values of $\alpha_i$ on three real-world datasets. Panels: (a) Flixster, (b) FilmTipSet, (c) MovieLens. Axes: probability values (x), number of users (y).]

[Figure 5.5: Values of $\gamma_i - \delta_i$ on three real-world datasets. Panels: (a) Flixster, (b) FilmTipSet, (c) MovieLens. Axes: value of $\gamma_i - \delta_i$ (x), number of users (y). Legend: Empirical, Fit, $\mu \pm 1.8\sigma$.]

Learning on Real Data. For each dataset, we sorted the ratings in chronological order and split them into $T = 10$ time steps. A single time step corresponded to 3, 1.2, and 2.5 calendar months in Flixster, MovieLens, and FilmTipSet, respectively. We then ran Algorithm 11 with learning rate $\rho = 0.001$, regularization parameter $\lambda = 0.001$, and number of latent features $d = 10$.

Figure 5.4 shows the distributions of values learned for the $\alpha_i$'s.
Furthermore, to compare the number of attraction-dominant users and the number of aversion-dominant users, in Figure 5.5 we display the distribution of $\gamma_i - \delta_i$, along with a Gaussian distribution fitted to the data within the interval $[\mu - 1.8\sigma, \mu + 1.8\sigma]$, where $\mu$ and $\sigma^2$ are the mean and variance of all $\gamma_i - \delta_i$'s. As can be seen, the empirical distribution has tails that are heavier than the Gaussian (at about $-0.5$ and $0.5$), indicating the existence of strongly aversive and strongly attracted users.

For reference, we also compared the average test RMSE on five-fold cross-validation achieved by our model and by standard MF. For standard MF, we implemented the stochastic gradient descent method as in [93] with $d = 10$, learning rate 0.002, and regularization parameters determined by cross validation. As shown in Figure 5.6, profiles learned by Algorithm 11 outperform standard MF in rating prediction, lowering the test RMSE by 11.8%, 11.9%, and 6.18% on Flixster, FilmTipSet, and MovieLens respectively.

[Figure 5.6: Test RMSE comparisons between our extended MF model and the standard MF model. Groups: Flixster, FilmTipSet, MovieLens. Axis: test RMSE (y). Legend: Our Model, Standard MF.]

5.6.2 Social Welfare Performance

Next, we evaluated Algorithm 10, hereafter referred to as GRA, and compared the social welfare it yielded with a baseline that ignores interest evolution. This baseline recommends to each user the item profile maximizing the user's utility under the inherent profile computed by standard MF. For each user $i \in [n]$, this is

$$v_i = \frac{u_i^0}{\|u_i^0\|_2}, \quad \text{for all } i \in [n].$$

It is thus easy to see that $\|v_i\|_2 = 1$ and that $v_i$ is in fact collinear with $u_i^0$. We hereafter refer to this baseline as MF-Local.
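The MF-Local baseline amounts to normalizing each inherent profile, which maximizes $\langle u_i^0, v \rangle$ over the unit ball by the Cauchy-Schwarz inequality. A minimal sketch follows; the function names and the example profile are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
import math


def inner(a, b):
    """Inner product of two equally long vectors."""
    return sum(x * y for x, y in zip(a, b))


def mf_local(u0):
    """Return the unit vector pointing in the direction of the inherent profile u0."""
    norm = math.sqrt(inner(u0, u0))
    if norm == 0.0:
        raise ValueError("inherent profile must be nonzero")
    return [c / norm for c in u0]


u0 = [3.0, 0.0, 4.0]      # toy inherent profile with Euclidean norm 5
v = mf_local(u0)          # [0.6, 0.0, 0.8]
utility = inner(u0, v)    # equals the norm of u0, the largest value over the unit ball
```

Note that this choice depends on $u_i^0$ alone, so it ignores both the drift of the profile over time and the effect of recommendations on neighbors.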
In all experiments, following the literature on social influence propagation and maximization [35, 86], we set the influence probability of user $j$ on user $i$ to be $1/\deg_{\mathrm{in}}(i)$, where $\deg_{\mathrm{in}}(i)$ is the in-degree of node $i$ in the network graph.

5.6.2.1 Experiments on Synthetic Networks

We started by evaluating the social welfare achieved by GRA and MF-Local on three different types of random networks that mimic the structure of a social network: Forest-Fire [12], Kronecker [99], and Power-Law [4]. For each type, we considered the following settings: Forest-Fire with forward and backward burning probabilities of 0.38 and 0.32 respectively, Kronecker with initiator matrix [0.9, 0.5; 0.5, 0.3], and Power-Law with exponent 2.1.

We varied the size of the network graphs (i.e., the number of users, $n$), the value of $\beta$ (i.e., users' tendency to be influenced by friends), and the difference between $\gamma_i$ and $\delta_i$, to evaluate their effects on the performance of our GRA algorithm in comparison to MF-Local. Unless otherwise noted, the $\alpha_i$'s, $\gamma_i$'s, $\delta_i$'s, and inherent user profiles were sampled randomly and the process was repeated ten times, of which we took the average social welfare. Also, $d$ was fixed to 10. In all cases, we plot the relative gap in social welfare, i.e.,

$$\frac{\mathrm{SocialWelfare}_{\mathrm{GRA}} - \mathrm{SocialWelfare}_{\mathrm{MF\text{-}Local}}}{|\mathrm{SocialWelfare}_{\mathrm{MF\text{-}Local}}|}.$$

Effect of Network Size. We tested five different values for $n$: 10, 50, 100, 150, and 200 for Forest-Fire and Power-Law, and 16, 32, 64, 128, and 256 for Kronecker (by definition a random Kronecker graph has $2^w$ nodes, where $w \in \mathbb{N}_+$ is the number of iterations of the Kronecker product taken in the generation process [99]). As can be seen from Figure 5.7, the gap between GRA and MF-Local was close to 10% for small graphs, but increased on all three networks for larger values of $n$: GRA achieved twice as much social welfare as MF-Local for $n = 200$.

Effect of $\beta$. In this test, we varied the value of $\beta$ from 0 up to 0.5.
Network size was fixed at 100 for Forest-Fire and Power-Law, and 128 for Kronecker. Figure 5.8 shows that GRA significantly outperformed MF-Local and, more interestingly, that the relative gap increased as $\beta$ increased. This intuitively suggests that when the influence among users is higher, ignoring the joint effect of recommendations becomes more detrimental to maximizing the social welfare.

[Figure 5.7: Relative increase in social welfare by GRA over MF-Local on synthetic datasets: varying network size ($n$). Axes: network size (x), relative gap in percentage (y). Series: Forest-Fire, Kronecker, Power-Law.]

[Figure 5.8: Relative increase in social welfare by GRA over MF-Local on synthetic datasets: varying $\beta$. Axes: value of $\beta$ (x), relative gap in percentage (y). Series: Forest-Fire, Kronecker, Power-Law.]

[Figure 5.9: Relative increase in social welfare by GRA over MF-Local on synthetic datasets: varying $\gamma_i - \delta_i$. Axes: value of $\gamma_i - \delta_i$ (x), relative gap in percentage on a log scale (y). Series: Forest-Fire, Kronecker, Power-Law.]

Effects of $\gamma_i - \delta_i$. Next, we tested different values of $\gamma_i - \delta_i$, representing cases from extreme aversion dominance to attraction dominance. Network size was $n = 100$ and $\beta = 0.25$, while $\alpha_i = \alpha$, $\gamma_i = \gamma$, and $\delta_i = \delta$ for all $i \in [n]$. We set $\alpha$ such that $\alpha(1 - \beta) = 0.25$, and varied $\gamma - \delta$, where $\alpha + \gamma + \delta = 1$. Relative gaps are shown in Figure 5.9. All in all, we see that gaps are far more pronounced in the strongly aversive regime, as targeting the existing profiles of users leads to suboptimal recommendations. For values less than $-0.3$, the social welfare under MF-Local was actually negative; it went up as $\gamma_i - \delta_i$ increased, i.e., as users tended towards attractive behavior. In contrast, the social welfare of GRA was always positive, and always greater than the one under MF-Local.
As a result, there is a large gap for values less than $-0.3$; the relative gap became small (but still positive) near 0.1, and then steadily increased.

It is important to note that in all evaluations of GRA over synthetic datasets, as well as the ones listed below on real datasets, GRA returned an optimal solution. That is, for all inputs tested, the matrix $Y$ computed had rank 1. Hence, although the QCQP problem (5.9) is not known to be solvable in polynomial time, in practice GRA outperformed the guarantee of Theorem 15.

5.6.2.2 Experiments on Real Data

We next compare the social welfare attained on Flixster, FilmTipSet, and MovieLens by GRA and MF-Local. For FilmTipSet and MovieLens, where no social network is considered, Global Recommendation is separable and can thus be parallelized (see Section 5.4.3): we can divide users into arbitrary subsets, run GRA on each of them, and then combine the total social welfare over all subsets as the final solution without any loss.

[Figure 5.10: Social welfare, where FX(0.1) and FX(0.5) denote Flixster with $\beta = 0.1$ and $0.5$ respectively; FT denotes FilmTipSet and ML denotes MovieLens. Axis: social welfare (y). Legend: GRA-heuristic, GRA, MF-Local.]

To improve the scalability of GRA over Flixster, and to parallelize its execution, we adopted the following heuristic. First, we split the social graph into 50 subgraphs using Graclus. Then, we solved the SDP on each subgraph separately. Note that, in effect, this optimization ignored the edges between subgraphs, and thus only yielded an approximation to the social welfare.

Figure 5.10 illustrates the performance of GRA and MF-Local on those datasets, where the values of $\alpha_i$, $\gamma_i$, and $\delta_i$ were all taken from the learning results in Section 5.6.1 and we set the dimensionality $d$ to 10. We can see that GRA is significantly superior to MF-Local: on FilmTipSet (1461 vs. 757) and MovieLens (11092 vs. 4926), it achieved approximately twice the social welfare.

On Flixster, we tested two cases for $\beta$: 0.1 and 0.5, representing weak and strong social behavior respectively. For GRA, we adopted the aforementioned clustering-based heuristic to compute the $\bar v_i$'s, and evaluated the welfare achieved by GRA in two ways: (i) simply calculating the welfare on each subgraph and taking the sum over all subgraphs (termed GRA-heuristic); (ii) using the $\bar v_i$'s to calculate the social welfare on the entire graph (termed GRA). The values computed by method (ii) were only slightly different from those of (i), indicating that our clustering heuristic closely followed the true social welfare, while enabling parallelization. The relative gain of GRA-heuristic over MF-Local was 39.0% when $\beta = 0.1$ and 13.4% when $\beta = 0.5$. The running time of GRA was reasonably good, e.g., on a subgraph of Flixster with 94 nodes and 276 edges, GRA finished in 90 seconds.

In summary, through extensive empirical evaluation on both real and synthetic data, we have demonstrated that, first, the phenomenon of interest evolution, especially attraction and aversion, can indeed be observed in real-world rating data, and second, both our learning algorithm and our global recommendation algorithm are highly effective in their respective tasks.

5.7 Discussion and Future Work

Extensions to accommodate user segmentation. Notice that the framework presented in this chapter assumes equal treatment of all users, as the objective function (5.7) is an unweighted sum of the expected steady-state utility of all users. We can extend it to capture scenarios where the service provider of the recommender system wishes to provide differential treatment to different segments of users. For instance, the system may want to further optimize for VIP users who have premium subscriptions.
Alternatively, we may want to weight active users more than inactive ones.

To this end, we extend the Global Recommendation problem formulation to use the weighted sum of the expected steady-state utility of all users as the maximization objective. More specifically, each user $i \in [n]$ is associated with a weight $w_i \in (0, 1]$, which indicates the importance of that user in the optimization.

By linearity of expectation and the linearity of inner product operations, the expected weighted social welfare in steady state is thus:

$$\lim_{t \to \infty} \sum_{i \in [n]} w_i \cdot \mathbb{E}[\langle u_i(t), v_i(t) \rangle] = \lim_{t \to \infty} \sum_{i \in [n]} \langle w_i \cdot \mathbb{E}[u_i(t)], \mathbb{E}[v_i(t)] \rangle = \sum_{i \in [n]} \langle w_i \bar u_i, \bar v_i \rangle = \operatorname{tr}\big(W \bar U \bar V^\top\big),$$

where $W$ is a diagonal matrix with $W_{ii} = w_i$ and $W_{ij} = 0$ whenever $i \ne j$.

Then, by Theorem 13 we can obtain the formulation of the Weighted Global Recommendation problem as follows.

Max.: $\operatorname{tr}\Big[ W (I - \beta P)^{-1} \big( A \bar U^0 \bar V^\top + (\Gamma - \Delta) \bar V \bar V^\top \big) \Big]$
subj. to: $\|\bar v_i\|_2^2 \le 1$, for all $i \in [n]$. (5.16)

Note that the nature of the problem does not change, as (5.16) is still a QCQP, and hence our solution framework based on SDP relaxation (see Section 5.4.2) remains applicable. More precisely, this is because Theorem 14 can be easily extended to hold for (5.16): in its proof, where we constructed an equivalent problem formulation consisting of an SDP and a rank-one constraint, we can simply modify vector $b$ and matrix $Q$ by multiplying $W$ to the left of $(I - \beta P)^{-1}$.

Future Work. In the experiments, we exploited parallel execution over weakly connected partitions of the social graph. This highlights an approach for scalable, parallelizable solutions to the SDP relaxation.
Further opportunities for improving efficiency exist: the sparse, block structure of the matrices in our SDP was not exploited by the generic solvers we employed. Investigating solutions that exploit this structure for higher efficiency is an interesting future direction.

Moreover, although the QCQP that expresses our problem is not known to be exactly solvable through an SDP relaxation, all solutions we obtained through our experiments were actually optimal. Understanding if optimality holds for a wider class than the ones presented in Section 5.4.3 is also an important open problem. Besides, there are clearly many phenomena beyond attraction, aversion, and social influence that may affect a user's interests. The quadratic nature of our problem arises from the standard factor-based model for utilities: understanding if other phenomena inducing drift on profiles can also be cast in this framework is also an interesting open question.

Chapter 6
Summary and Future Research

6.1 Summary

The rapid growth of online social networks and social media has opened up many opportunities in computational social influence research. Motivated by (i) the connection between social influence and two prevalent data mining applications, viral marketing and recommender systems, and (ii) the gaps between theories of computational social influence and practical applications, in this dissertation we have made the following contributions to the field of computational social influence:

• We proposed three novel influence diffusion models: Linear Thresholds with Valuation (LT-V), Linear Thresholds with K advertisers (K-LT), and Comparative Independent Cascade (ComIC).
All three models have more expressive power than classical diffusion models such as IC and LT [86].

• We studied various optimization problems under these models, such as profit maximization over social networks (ProMax), fair seed allocation for competitive viral marketing from the host perspective, and influence maximization for two complementary propagating entities.

• In the context of recommender systems, we modeled the dynamics of user interest evolution using social influence as well as users' attraction and aversion behaviors, and solved the optimal recommendation problem using semi-definite programming techniques.

In Chapter 2, we extended the classical LT model by incorporating prices and valuations to capture monetary aspects in product adoption. We studied an NP-hard profit maximization problem (ProMax) under the LT-V model. In this problem, the objective function is submodular but non-monotone w.r.t. the seed set, for any fixed price vector. We proposed the PAGE algorithm, which dynamically determines the optimal personalized discounts for targeted seeds. Our experimental results showed that PAGE is both efficient and effective, outperforming several intuitive baseline algorithms in all aspects evaluated, e.g., the expected total profit achieved and running time.

Our work in Chapter 3 also took a step towards closing the gap between influence maximization and real-world viral marketing. More specifically, we considered the fact that social networks are owned by a service provider (host) in real life, and competing advertisers cannot simply set up their campaigns autonomously. We considered a setting where the host sells viral marketing as a service to multiple competing companies. We posed the novel problem of Fair Seed Allocation, in which the host must allocate influential users to competing companies to guarantee that “the bang for the buck” for the advertisers is as balanced as possible.
We showed that the problem is NP-hard, and developed an efficient greedy heuristic called Needy-Greedy, as well as two exact algorithms based on dynamic programming and integer linear programming. We performed simulations on three real-world networks and showed that our algorithms are both effective and efficient, significantly outperforming baseline allocation schemes such as random and round-robin.

In Chapter 4, we proposed the Comparative Independent Cascade (ComIC) model, which characterizes both competition and complementarity relationships between two different propagating items, to any degree possible, and tackled the influence maximization problem when there are two complementary products. We identified parameter subspaces of ComIC under which submodularity and monotonicity are satisfied by influence spread functions, and developed non-trivial extensions to the RR-set techniques to obtain approximation algorithms. For non-submodular settings, we also devised a Sandwich Approximation scheme to achieve data-dependent approximation bounds. Our experiments demonstrated the effectiveness and efficiency of the novel approximation algorithms we proposed.

In Chapter 5, we applied social influence propagation, together with users' attraction and aversion behaviors, to recommender systems to model user interests. Taking these factors into account, we first proposed an interest evolution model in which the probability distributions of user interest profiles form a Markov chain. Devising recommendation strategies in this setting is challenging, as the items suggested in previous time steps have a direct impact on all users' future interests due to the interest cascade triggered by influence propagation. Therefore, effective recommendations should be made in a holistic fashion, in contrast to conventional methods that compute relevant items for each user separately.
Our objective was to maximize the total expected utility of all users, which we formulated as a quadratically constrained quadratic program (QCQP). We showed that the optimal recommendation problem is NP-hard, and devised an approximation algorithm using SDP relaxation. We learned interest evolution parameters by adapting low-rank matrix factorization, and our experiments showed that the SDP-based algorithms clearly outperform baseline recommendation methods which do not take interest evolution into account.

6.2 Discussions and Future Work

The field of Computational Social Influence is still booming and there are abundant opportunities for future research.

Influence Modeling. Most stochastic propagation models (including those presented in the dissertation) do not account for non-social channels through which people obtain information, such as TV, newspapers, web browsing, etc. This calls for a unified model which takes both social influence and non-social information channels into consideration. One possible way is to augment the social network graph $G = (V, E)$ by creating dummy nodes and edges that represent those channels and their influence on users. For each non-social channel $c$, we create a dummy node $x_c$ that has a directed edge to every user-node $u \in V$ in the social network. The edge weight on $(x_c, u)$ represents the propensity with which user $u$ obtains information from, or is influenced by, channel $c$. As future work, one could study the stability of influence optimization algorithms when such non-social information channels are accounted for.

Another direction is to factor in the heterogeneous nature of the activeness of social network users. For example, teenagers and college students are typically keen on modern social networking and social media apps and hence have more exposure to viral information, while people who spend less time on those technologies (e.g., busy professionals) do not have the same level of exposure.
Thus, one may consider segmenting social network users into multiple types, each corresponding to a different level of activeness, and study influence propagation models in the presence of such heterogeneity.

One may also consider multiple types of social relationships: family, co-workers, classmates, research collaborators, people sharing the same political belief or the same sports interest, etc. Similarly, the information being propagated over the social network can also be of multiple types, and depending on the type of the relationship, the strength of influence may vary. For example, suppose Alice and Bob became friends because they are both enthusiastic fans of the Vancouver Canucks. The mutual influence between Alice and Bob may be stronger when the information or product being propagated is about ice hockey, as the same topic and relationship type (ice hockey) may create high synergy. Studying relevant influence optimization problems in such a setting is also worth considering.

Influence Optimization. The reliance on submodularity to obtain approximation guarantees is ubiquitous in the literature. Prior to the proposal of the Sandwich Approximation method in Chapter 4, resorting to heuristics was the only way of solving maximization problems in the absence of submodularity. However, for Sandwich Approximation to be effective, one should derive an upper and/or lower bound submodular function that is reasonably tight w.r.t. the original objective function. A general principle is to analyze if there is any aspect of a non-submodular model that can be changed, tuned, or relaxed, so that submodularity is satisfied. The difficulty of such an analysis varies from model to model.
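To make the Sandwich Approximation idea discussed here concrete, the following is a minimal Python sketch under stated assumptions. The function names `greedy` and `sandwich`, the toy table `COVERS`, and the toy functions `objective`, `lower`, and `upper` are all hypothetical and are not taken from Chapter 4. The sketch assumes only that `lower` and `upper` are submodular functions bounding the non-submodular `objective` from below and above. It runs the same greedy selection on each of the three functions and keeps whichever seed set scores best under the original objective.

```python
from typing import Callable, FrozenSet, Set

SetFn = Callable[[FrozenSet[int]], float]


def greedy(f: SetFn, ground: Set[int], k: int) -> FrozenSet[int]:
    """Pick up to k elements, each time adding the largest marginal gain under f."""
    chosen: Set[int] = set()
    for _ in range(min(k, len(ground))):
        base = f(frozenset(chosen))
        best, best_gain = None, float("-inf")
        for u in sorted(ground - chosen):  # sorted for deterministic tie-breaking
            gain = f(frozenset(chosen | {u})) - base
            if gain > best_gain:
                best, best_gain = u, gain
        chosen.add(best)
    return frozenset(chosen)


def sandwich(objective: SetFn, lower: SetFn, upper: SetFn,
             ground: Set[int], k: int) -> FrozenSet[int]:
    """Run greedy on each function; return the candidate that is best under objective."""
    candidates = [greedy(fn, ground, k) for fn in (lower, upper, objective)]
    return max(candidates, key=objective)


# Toy instance (illustrative assumption): item u "covers" the elements in COVERS[u].
COVERS = {0: {1, 2}, 1: {2, 3}, 2: {4}, 3: {5}}


def coverage(s: FrozenSet[int]) -> float:
    """Number of distinct elements covered by s (monotone and submodular)."""
    return float(len(set().union(*(COVERS[u] for u in s))))


def objective(s: FrozenSet[int]) -> float:
    """Coverage plus a synergy bonus when items 2 and 3 are both chosen (not submodular)."""
    return coverage(s) + (3.0 if {2, 3} <= s else 0.0)


def upper(s: FrozenSet[int]) -> float:
    """Submodular upper bound: the bonus is split additively over items 2 and 3."""
    return coverage(s) + 1.5 * len(s & {2, 3})


lower = coverage  # submodular lower bound: simply drop the bonus

result = sandwich(objective, lower, upper, set(COVERS), k=2)
print(sorted(result), objective(result))  # prints: [2, 3] 5.0
```

In this toy instance, greedy on the objective itself stops at a set of value 3, while the set found through the upper bound reaches 5. How much is gained in general depends on how tightly the bounds track the objective.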
Thus, further studies on how to apply Sandwich Approximation effectively to influence maximization under well-known non-submodular models (e.g., the LT model with hard-wired node thresholds [86] and the WPCLT model [24]) are both interesting and worthwhile.

One may also be interested in designing optimization algorithms when additional constraints are imposed. For example, in a viral marketing campaign, many users could very well be inundated with signals, or impressions, of their neighbors having performed a promoted action (e.g., purchased a product). To avoid this and ensure a pleasant user experience, a social network host may want to place an upper bound on how many viral-marketing-related impressions each user will receive in a specific time frame. It is not difficult to extend simulation-based algorithms to choose seeds under such constraints (e.g., the greedy algorithm with MC simulations), but how to extend the much more advanced and efficient TIM [137] or IMM [136] algorithms remains a challenge. First, the RR-set sampling component in GeneralTIM assumes order-independence¹, which may not hold due to the constraint on the number of impressions. Second, there is no guarantee that submodularity will still be satisfied. We remark that Sandwich Approximation may offer a viable solution in this case, as the influence spread computed without the impression limit constraint is naturally an upper bound on the influence spread computed with the constraint imposed². Though this discussion sheds some light on the power of Sandwich Approximation, a full-scope study in the future is certainly worthwhile.

¹ As a specific example, in the IC model, the order in which active in-neighbors attempt to influence an inactive node does not matter for spread estimation and seed selection.

Influence Learning.
Another common assumption often made in computational social influence is that optimization algorithms have explicit knowledge of the influence strength between social network users. Getting ground truth is extremely challenging, and there has been considerable work on learning influence strength in social networks [68, 122, 129, 141]. Although the thesis does not focus on learning, we stress that the accuracy of learnt influence strength w.r.t. the ground truth is crucial in the sense that if the learnt values deviate substantially from the ground truth, then even an optimal influence maximization algorithm is likely to return suboptimal solutions. He and Kempe [78] found that for the IC model, if each of the estimated influence probabilities (one per edge) has more than 20% relative error, the noise of the data would dominate the objective function and lead to a significant risk of suboptimal solutions. This indicates the need to search for (i) capable influence learning methods that have proven bounds on relative errors (e.g., well under 20%) and (ii) influence maximization algorithms that are less susceptible to inaccurate and noisy input data.

Distributed Influence Maximization. The vast majority of the computational social influence literature assumes that for problems like influence maximization, the entire social network graph can fit into the main memory of a single machine. However, real-world social networks are getting larger and larger by the second³, and thus there is a pressing need to devise efficient and effective distributed algorithms.

Recently, Lucier et al. [110] proposed a distributed MapReduce-based [45] algorithm, InfEst, for estimating the expected influence spread of any given seed set under the IC model. They assumed the link-server model for querying and inter-machine communications, which is commonly used in distributed computing [32, 46, 134].
Under the link-server model, the social network graph $G = (V, E)$ is stored on the disk of a centralized server, and each query to the server on any node $v \in V$ returns the set of in- and out-neighbors of $v$, together with the relevant influence probabilities. For any given seed set $S \subset V$ and any $\epsilon \in (0, 1/4)$, the estimate $\hat{\sigma}(S)$ by InfEst is a $(1 + 8\epsilon)$-approximation to the true value $\sigma(S)$, with high probability. They showed that for estimating the spread of a singleton set $\{u\}$ within a factor of $1 + \epsilon$, the InfEst algorithm requires $\Omega(\sqrt{|V|})$ queries to the link server.

² For simplicity, we restrict this discussion to single-item propagation models only, as competitive models may introduce additional complications.

³ For example, according to the official quarterly earnings report of Facebook for the first quarter of 2016, it added 63 million monthly active users. This roughly translates into a growth rate of 8 new monthly active users per second. Source: http://investor.fb.com/.

This work offers a significant first step toward devising distributed algorithms for influence maximization. A natural extension is to treat InfEst as a value oracle in the classic greedy algorithm (Algorithm 1), which results in a MapReduce-based algorithm for influence maximization. More specifically, in line 4 of Algorithm 1, we invoke InfEst to estimate $f(S \cup \{u\})$. Note that the greedy algorithm requires $O(k|V|)$ calls to the spread estimation oracle, and in the case of InfEst, each call requires $\Omega(\sqrt{|V|})$ queries to the link server, as mentioned above.

For future work, it is certainly worthwhile to search for better distributed influence maximization algorithms (in terms of I/O complexity and/or solution quality).
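The oracle-based greedy scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `CountingOracle` is a hypothetical single-machine Monte Carlo simulator of the IC model that stands in for a spread estimator, and it is not the InfEst algorithm or its link-server interface. The function name `greedy_with_oracle` and the toy graph are likewise made up for the example. The sketch counts oracle invocations to show where the $O(k|V|)$ bound on oracle calls comes from.

```python
import random
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

# node -> list of (out-neighbor, influence probability); illustrative representation
Graph = Dict[int, List[Tuple[int, float]]]


class CountingOracle:
    """Monte Carlo estimate of IC-model spread that also counts how often it is called."""

    def __init__(self, graph: Graph, runs: int = 20, seed: int = 0) -> None:
        self.graph = graph
        self.runs = runs
        self.rng = random.Random(seed)
        self.calls = 0

    def __call__(self, seeds: FrozenSet[int]) -> float:
        self.calls += 1
        total = 0
        for _ in range(self.runs):
            active = set(seeds)
            frontier = list(seeds)
            while frontier:
                u = frontier.pop()
                # Each newly active node gets one activation attempt per out-edge.
                for v, p in self.graph.get(u, []):
                    if v not in active and self.rng.random() < p:
                        active.add(v)
                        frontier.append(v)
            total += len(active)
        return total / self.runs


def greedy_with_oracle(nodes: Set[int], k: int,
                       oracle: Callable[[FrozenSet[int]], float]) -> Set[int]:
    """Classic greedy: in each round, one oracle call per remaining candidate node."""
    seeds: Set[int] = set()
    for _ in range(min(k, len(nodes))):
        candidates = [u for u in sorted(nodes) if u not in seeds]
        best = max(candidates, key=lambda u: oracle(frozenset(seeds | {u})))
        seeds.add(best)
    return seeds


# Toy graph with probability-1 edges so the estimates are deterministic.
toy_graph: Graph = {0: [(1, 1.0), (2, 1.0)], 3: [(4, 1.0)]}
toy_nodes = {0, 1, 2, 3, 4}
oracle = CountingOracle(toy_graph)
chosen = greedy_with_oracle(toy_nodes, k=2, oracle=oracle)
print(sorted(chosen), oracle.calls)  # prints: [0, 3] 9
```

Here the greedy loop makes 5 + 4 = 9 oracle calls, which is within the k|V| = 10 bound. If each call were replaced by a distributed estimator, the per-call query cost would multiply this count, which is the overhead the text points out.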
One intuitive direction is to consider making the RR-set based algorithms (e.g., TIM in [137], GeneralTIM in Chapter 4, and IMM [136]) work in a distributed setting, as the single-machine versions of these algorithms are proven to have much better time complexity than the classic greedy algorithm.

Bibliography

[1] Zeinab Abbassi, Sihem Amer-Yahia, Laks VS Lakshmanan, Sergei Vassilvitskii, and Cong Yu. Getting recommender systems to think outside the box. In RecSys, pages 285–288. ACM, 2009. → pages 123, 126
[2] Ittai Abraham, Moshe Babaioff, Shaddin Dughmi, and Tim Roughgarden. Combinatorial auctions with restricted complements. In ACM Conference on Electronic Commerce, EC ’12, Valencia, Spain, June 4-8, 2012, pages 3–16, 2012. → pages 78
[3] Gediminas Adomavicius and Alexander Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering (TKDE), 17(6):734–749, 2005. → pages 10, 122
[4] William Aiello, Fan R. K. Chung, and Linyuan Lu. A random graph model for massive graphs. In STOC, pages 171–180, 2000. → pages 146
[5] Sihem Amer-Yahia, Laks VS Lakshmanan, Sergei Vassilvitskii, and Cong Yu. Battling predictability and overconcentration in recommender systems. IEEE Data Eng. Bull., 32(4):33–40, 2009. → pages 123
[6] David Arthur, Rajeev Motwani, Aneesh Sharma, and Ying Xu. Pricing strategies for viral marketing on social networks. In WINE, pages 101–112, 2009. → pages 15
[7] W. B. Arthur. Competing technologies, increasing returns, and lock-in by historical events. Economic Journal, 99(394), March 1989. → pages 41
[8] Çigdem Aslay, Wei Lu, Francesco Bonchi, Amit Goyal, and Laks V. S. Lakshmanan. Viral marketing meets social advertising: Ad allocation with minimum regret. PVLDB, 8(7):822–833, 2015. → pages 2
[9] Lars Backstrom, Cynthia Dwork, and Jon M. Kleinberg. Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganography.
In Proceedings of the 16th International Conference on World Wide Web, WWW 2007, Banff, Alberta, Canada, May 8-12, 2007, pages 181–190, 2007. → pages 73
[10] Eytan Bakshy, Jake M Hofman, Winter A Mason, and Duncan J Watts. Everyone’s an influencer: quantifying influence on twitter. In WWW, pages 65–74. ACM, 2011. → pages 2
[11] R. F. Bales. Interaction Process Analysis. Addison Wesley, Reading, MA, 1950. → pages 1
[12] Albert-László Barabási and Réka Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999. → pages 146
[13] Nicola Barbieri, Francesco Bonchi, and Giuseppe Manco. Cascade-based community detection. In WSDM, pages 33–42. ACM, 2013. → pages 2
[14] Amir Beck and Yonina C Eldar. Strong duality in nonconvex quadratic optimization with two quadratic constraints. SIAM Journal on Optimization, 17(3):844–860, 2006. → pages 127
[15] J Berger, S. J. Rosenholtz, and M. Zelditch Jr. Status organizing processes. Annual Review of Sociology, 6:479–508, 1980. → pages 1
[16] Smriti Bhagat, Amit Goyal, and Laks V. S. Lakshmanan. Maximizing product adoption in social networks. In WSDM, pages 603–612, 2012. → pages 2, 3
[17] Smriti Bhagat, Amit Goyal, and Laks V. S. Lakshmanan. Maximizing product adoption in social networks. In WSDM, pages 603–612, 2012. → pages 14, 15, 38
[18] Shishir Bharathi, David Kempe, and Mahyar Salek. Competitive influence maximization in social networks. In WINE, pages 306–311, 2007. → pages 2, 9, 39, 42, 74
[19] Rushi Bhatt, Vineet Chaoji, and Rajesh Parekh. Predicting product adoption in large-scale social networks. In CIKM, pages 1039–1048. ACM, 2010. → pages 2
[20] Francis Bloch and Nicolas Quérou. Pricing in networks. Working papers, unpublished, Ecole Polytechnique, October 2011. → pages 29, 33
[21] Robert M Bond, Christopher J Fariss, Jason J Jones, Adam DI Kramer, Cameron Marlow, Jaime E Settle, and James H Fowler. A 61-million-person experiment in social influence and political mobilization. Nature, 489(7415):295–298, 2012. →
pages 1
[22] Christian Borgs, Michael Brautbar, Jennifer T. Chayes, and Brendan Lucier. Maximizing social influence in nearly optimal time. In SODA, pages 946–957, 2014. → pages 3, 6, 76, 95, 96
[23] Allan Borodin, Mark Braverman, Brendan Lucier, and Joel Oren. Strategyproof mechanisms for competitive influence in networks. In WWW, pages 141–150, 2013. → pages 2, 9, 43, 64, 66, 72, 74
[24] Allan Borodin, Yuval Filmus, and Joel Oren. Threshold models for competitive influence in social networks. In WINE, pages 539–550, 2010. → pages 2, 9, 39, 43, 45, 74, 155
[25] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004. → pages 127, 138, 139
[26] Jacqueline J Brown and Peter H Reingen. Social ties and word-of-mouth referral behavior. Journal of Consumer Research, 1987. → pages 8, 123
[27] Lawrence D. Brown, T. Tony Cai, and Anirban DasGupta. Interval estimation for a binomial proportion. Statistical Science, 16(2):101–117, 2001. → pages 113
[28] Niv Buchbinder, Moran Feldman, Joseph Naor, and Roy Schwartz. A tight linear time (1/2)-approximation for unconstrained submodular maximization. In FOCS, pages 649–658, 2012. → pages 108
[29] Ceren Budak, Divyakant Agrawal, and Amr El Abbadi. Limiting the spread of misinformation in social networks. In WWW, pages 665–674, 2011. → pages 2, 3, 39, 43, 74
[30] John T Cacioppo, Richard E Petty, and Cal D Stoltenberg. Processes of social influence: The elaboration likelihood model of persuasion. Advances in cognitive-behavioral research and therapy, 4:215–274, 1985. → pages 1
[31] Tim Carnes, Chandrashekhar Nagarajan, Stefan M. Wild, and Anke van Zuylen. Maximizing influence in a competitive social network: a follower’s perspective. In ICEC, pages 351–360, 2007. → pages 2, 9, 39, 42, 74
[32] Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber. Bigtable: A distributed storage system for structured data. ACM Trans. Comput.
Syst., 26(2):4:1–4:26, June 2008. → pages 156
[33] Wei Chen. Computational social influence. In Marcelo G. Armentano, Ariel Monteserin, Jie Tang, and Virginia Yannibelli, editors, SocInf 2015, volume 1398 of CEUR Workshop Proceedings. CEUR-WS.org, 2015. → pages 2
[34] Wei Chen, Alex Collins, Rachel Cummings, Te Ke, Zhenming Liu, David Rincón, Xiaorui Sun, Yajun Wang, Wei Wei, and Yifei Yuan. Influence maximization in social networks when negative opinions may emerge and propagate. In SDM, pages 379–390, 2011. → pages 2, 43, 74
[35] Wei Chen, Laks V. S. Lakshmanan, and Carlos Castillo. Information and Influence Propagation in Social Networks. Synthesis Lectures on Data Management. Morgan & Claypool Publishers, 2013. → pages 1, 2, 6, 7, 8, 72, 84, 110, 117, 146
[36] Wei Chen, Wei Lu, and Ning Zhang. Time-critical influence maximization in social networks with time-delayed diffusion process. In AAAI, 2012. → pages 2
[37] Wei Chen, Chi Wang, and Yajun Wang. Scalable influence maximization for prevalent viral marketing in large-scale social networks. In KDD, pages 1029–1038, 2010. → pages 3, 5, 7, 8, 31, 110, 117, 118
[38] Wei Chen, Yajun Wang, and Siyu Yang. Efficient influence maximization in social networks. In KDD, pages 199–208, 2009. → pages 3
[39] Wei Chen, Yifei Yuan, and Li Zhang. Scalable influence maximization in social networks under the linear threshold model. In ICDM, pages 88–97, 2010. → pages 3, 5, 7, 24, 29, 31, 37, 110, 117
[40] Edith Cohen, Daniel Delling, Thomas Pajor, and Renato F. Werneck. Sketch-based influence maximization and computation: Scaling up with guarantees. In CIKM, pages 629–638, 2014. → pages 3
[41] Yuga J. Cohler, John K. Lai, David C. Parkes, and Ariel D. Procaccia. Optimal envy-free cake cutting. In Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2011, San Francisco, California, USA, August 7-11, 2011, 2011. → pages 64
[42] Hadi Daneshmand, Manuel Gomez-Rodriguez, Le Song, and Bernhard Schölkopf.
Estimating diffusion network structures: Recovery conditions, sample complexity & soft-thresholding algorithm. In ICML, pages 793–801, 2014. → pages 3
[43] Samik Datta, Anirban Majumder, and Nisheeth Shrivastava. Viral marketing for multiple products. In ICDM 2010, pages 118–127, 2010. → pages 77
[44] P. A. David. Technical Choice, Innovation and Economic Growth. Cambridge University Press, 1975. → pages 41
[45] Jeffrey Dean and Sanjay Ghemawat. Mapreduce: Simplified data processing on large clusters. Commun. ACM, 51(1):107–113, 2008. → pages 156
[46] Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. Dynamo: Amazon’s highly available key-value store. SIGOPS Oper. Syst. Rev., 41(6):205–220, October 2007. → pages 156
[47] Morton Deutsch and Harold B. Gerard. A study of normative and informational social influences upon individual judgement. Journal of Abnormal and Social Psychology, 51(3):629–636, 1955. → pages 1
[48] Inderjit S. Dhillon, Yuqiang Guan, and Brian Kulis. Weighted graph cuts without eigenvectors: A multilevel approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(11):1944–1957, 2007. → pages 142
[49] Shahar Dobzinski, Noam Nisan, and Michael Schapira. Approximation algorithms for combinatorial auctions with complement-free bidders. Math. Oper. Res., 35(1):1–13, 2010. → pages 78
[50] Pedro Domingos and Matthew Richardson. Mining the network value of customers. In KDD, pages 57–66, 2001. → pages 2, 3, 4, 8, 16, 38, 123, 126
[51] Nan Du, Le Song, Manuel Gomez-Rodriguez, and Hongyuan Zha. Scalable influence estimation in continuous-time diffusion networks. In NIPS, pages 3147–3155, 2013. → pages 2
[52] David A. Easley and Jon M. Kleinberg. Networks, Crowds, and Markets - Reasoning About a Highly Connected World. Cambridge University Press, 2010. → pages 1
[53] Benjamin Edelman and Michael Ostrovsky.
Strategic bidder behavior in sponsored search auctions. Decision Support Systems, 43(1):192–198, 2007. → pages 60, 61, 63
[54] Xiang Fang, Surendra Singh, and Rohini Ahluwalia. An examination of different explanations for the mere exposure effect. Journal of consumer research, 34(1):97–103, 2007. → pages 123
[55] Uriel Feige, Vahab S. Mirrokni, and Jan Vondrák. Maximizing non-monotone submodular functions. In FOCS, pages 461–471, 2007. → pages 20
[56] John Forrest. Coin-or branch-and-cut mip solver. https://projects.coin-or.org/Cbc. Accessed: 2016-05-17. → pages 66
[57] John R.P. French and Bertram Raven. The bases of social power. In D. Cartwright, editor, Studies in Social Power, pages 150–167. Institute for Social Research, 1959. → pages 1
[58] Robert G Gallager. Discrete stochastic processes, volume 101. Kluwer Academic Publishers Boston, 1996. → pages 131, 132, 133
[59] Michael R. Garey and David S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., New York, NY, USA, 1979. → pages 54
[60] Mouzhi Ge, Carla Delgado-Battenfeld, and Dietmar Jannach. Beyond accuracy: evaluating recommender systems by coverage and serendipity. In RecSys, pages 257–260. ACM, 2010. → pages 126
[61] Curtis F. Gerald and Patrick O. Wheatley. Applied numerical analysis (7th ed.). Addison-Wesley, 2004. → pages 29
[62] Michel X Goemans and David P Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM (JACM), 42(6):1115–1145, 1995. → pages 127, 136
[63] J. Goldenberg, B. Libai, and E. Muller. Talk of the network: A complex systems look at the underlying process of word-of-mouth. Marketing Letters, 12(3):211–223, 2001. → pages 4
[64] Manuel Gomez-Rodriguez, Jure Leskovec, and Andreas Krause. Inferring networks of diffusion and influence. TKDD, 5(4):21, 2012. → pages 3
[65] Manuel Gomez-Rodriguez and Bernhard Schölkopf. Influence maximization in continuous time diffusion networks.
In ICML, 2012. → pages 2
[66] Teofilo F. Gonzalez, Sartaj Sahni, and William R. Franta. An efficient algorithm for the kolmogorov-smirnov and lilliefors tests. ACM Trans. Math. Softw., 3(1):60–64, 1977. → pages 32
[67] Amit Goyal. Social Influence and its Applications: An algorithmic and data mining study. PhD thesis, The University of British Columbia, 2013. → pages 1
[68] Amit Goyal, Francesco Bonchi, and Laks V. S. Lakshmanan. Learning influence probabilities in social networks. In WSDM, pages 241–250, 2010. → pages 3, 65, 72, 110, 125, 156
[69] Amit Goyal, Francesco Bonchi, Laks V. S. Lakshmanan, and Suresh Venkatasubramanian. On minimizing budget and time in influence propagation over social networks. Social Netw. Analys. Mining, 3(2):179–192, 2013. → pages 2
[70] Amit Goyal and Laks V. S. Lakshmanan. RecMax: exploiting recommender systems for fun and profit. In KDD, pages 1294–1302, 2012. → pages 2, 123
[71] Amit Goyal, Wei Lu, and Laks V. S. Lakshmanan. Celf++: optimizing the greedy algorithm for influence maximization in social networks. In WWW, pages 47–48, 2011. → pages 2, 3, 6, 30
[72] Amit Goyal, Wei Lu, and Laks V. S. Lakshmanan. Simpath: An efficient algorithm for influence maximization under the linear threshold model. In ICDM, pages 211–220, 2011. → pages 3, 7, 31, 37, 53, 110
[73] Mark S Granovetter. Threshold models for collective behavior. The American Journal of Sociology, 83(6):1420–1443, 1978. → pages 4
[74] Michael Grant and Stephen Boyd. CVX: Matlab software for disciplined convex programming, version 2.0 beta. http://cvxr.com/cvx, September 2013. → pages 141
[75] Anthony Grimes and Philip J Kitchen. Researching mere exposure effects to advertising-theoretical foundations and methodological implications. International Journal of Market Research, 49(2):191–219, 2007. → pages 123
[76] Daniel Gruhl, R. Guha, David Liben-Nowell, and Andrew Tomkins. Information diffusion through blogspace. In WWW, pages 491–501, 2004. → pages 2
[77] Jason D. Hartline, Vahab S.
Mirrokni, and Mukund Sundararajan. Optimal marketing strategies over social networks. In WWW, pages 189–198, 2008. → pages 2, 15, 25
[78] Xinran He and David Kempe. Robust influence maximization. CoRR, abs/1602.05240, 2016. → pages 156
[79] Xinran He, Guojie Song, Wei Chen, and Qingye Jiang. Influence blocking maximization in social networks under the competitive linear threshold model. In SDM, pages 463–474, 2012. → pages 2, 3, 39, 43, 74
[80] Tad Hogg and Gabor Szabo. Diversity of user activity and content quality in online communities. In ICWSM, pages 58–65, 2009. → pages 44
[81] Dino Ienco, Francesco Bonchi, and Carlos Castillo. The meme ranking problem: Maximizing microblogging virality. In Proceedings of the SIASP 2010 workshop at ICDM 2010. → pages 2
[82] Mohsen Jamali and Martin Ester. A matrix factorization technique with trust propagation for recommendation in social networks. In RecSys, pages 135–142, 2010. → pages 10, 110, 123, 126, 141
[83] Albert X. Jiang and Kevin Leyton-Brown. Estimating bidders’ valuation distributions in online auctions. In Workshop on Game Theory and Decision Theory (GTDT) at IJCAI, 2005. → pages 29
[84] Kyomin Jung, Wooram Heo, and Wei Chen. IRIE: scalable and robust influence maximization in social networks. In ICDM, pages 918–923, 2012. → pages 3, 7
[85] Shlomo Kalish. A new product adoption model with price, advertising, and uncertainty. Management Science, 31(12):1569–1585, 1985. → pages 8, 12, 13, 17, 84
[86] David Kempe, Jon M. Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. In KDD, pages 137–146, 2003. → pages 1, 2, 3, 4, 5, 6, 8, 15, 19, 22, 23, 30, 31, 41, 47, 49, 74, 94, 107, 123, 126, 146, 152, 155
[87] Masahiro Kimura and Kazumi Saito. Tractable models for information diffusion in social networks. In PKDD, pages 259–271, 2006. → pages 3
[88] Irwin King, Michael R. Lyu, and Hao Ma. Introduction to social recommendation. In WWW, pages 1355–1356, 2010. → pages 123
[89] Jon M. Kleinberg and Éva Tardos.
Algorithm design. Addison-Wesley, 2006. → pages 41
[90] Robert D. Kleinberg and Frank Thomson Leighton. The value of knowing a demand curve: Bounds on regret for online posted-price auctions. In FOCS, pages 594–605, 2003. → pages 13
[91] Yehuda Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In KDD, 2008. → pages 128, 139, 140
[92] Yehuda Koren. Collaborative filtering with temporal dynamics. Commun. ACM, 53(4), 2010. → pages 123, 126
[93] Yehuda Koren, Robert Bell, and Chris Volinsky. Matrix factorization techniques for recommender systems. IEEE Computer, 2009. → pages 123, 128, 139, 140, 145
[94] Jan Kostka, Yvonne Anne Oswald, and Roger Wattenhofer. Word of mouth: Rumor dissemination in social networks. In SIROCCO, pages 185–196, 2008. → pages 39, 43
[95] Bibb Latané. The psychology of social impact. American psychologist, 36(4):343–356, 1981. → pages 1
[96] Bibb Latané. Dynamic social impact: The creation of culture by communication. Journal of Communication, 46:13–25, 1996. → pages 1
[97] Benny Lehmann, Daniel Lehmann, and Noam Nisan. Combinatorial auctions with decreasing marginal utilities. In ACM EC, 2001. → pages 78
[98] Jure Leskovec, Lada A. Adamic, and Bernardo A. Huberman. The dynamics of viral marketing. TWEB, 1(1), 2007. → pages 2
[99] Jure Leskovec, Deepayan Chakrabarti, Jon M. Kleinberg, Christos Faloutsos, and Zoubin Ghahramani. Kronecker graphs: An approach to modeling networks. Journal of Machine Learning Research, 11:985–1042, 2010. → pages 146
[100] Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne M. VanBriesen, and Natalie S. Glance. Cost-effective outbreak detection in networks. In KDD, pages 420–429, 2007. → pages 2, 3, 6, 30, 37
[101] Jure Leskovec, Mary McGlohon, Christos Faloutsos, Natalie S. Glance, and Matthew Hurst. Patterns of cascading behavior in large blog graphs. In SDM, pages 551–556, 2007. → pages 2
[102] Bo Liu, Gao Cong, Dong Xu, and Yifeng Zeng.
Time constrained influence maximization in social networks. In ICDM, pages 439–448. IEEE, 2012. → pages 2
[103] Kun Liu and Evimaria Terzi. Towards identity anonymization on graphs. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2008, Vancouver, BC, Canada, June 10-12, 2008, pages 93–106, 2008. → pages 73
[104] Jon Loomer. Facebook advertising and spam, deception, value and trust. http://www.jonloomer.com/2015/01/11/facebook-ads-spam-deception/, January 2015. → pages 9
[105] Vincent Yun Lou, Smriti Bhagat, Laks V. S. Lakshmanan, and Sharan Vaswani. Modeling non-progressive phenomena for influence propagation. In COSN, pages 131–138, 2014. → pages 2
[106] Wei Lu, Francesco Bonchi, Amit Goyal, and Laks V. S. Lakshmanan. The bang for the buck: fair competitive viral marketing from the host perspective. In KDD, pages 928–936, 2013. → pages iii, 2, 9, 74
[107] Wei Lu, Wei Chen, and Laks V. S. Lakshmanan. From competition to complementarity: Comparative influence diffusion and maximization. Proceedings of the Very Large Database Endowment (PVLDB), 9, 2015. → pages iii, 2, 3
[108] Wei Lu, Stratis Ioannidis, Smriti Bhagat, and Laks V. S. Lakshmanan. Optimal recommendations under attraction, aversion, and social influence. In KDD, pages 811–820, 2014. → pages iii, 2
[109] Wei Lu and Laks V. S. Lakshmanan. Profit maximization over social networks. In ICDM, pages 479–488, 2012. → pages iii, 3
[110] Brendan Lucier, Joel Oren, and Yaron Singer. Influence at scale: Distributed computation of complex contagion in networks. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, August 10-13, 2015, pages 735–744, 2015. → pages 156
[111] Zhi-quan Luo, Wing-kin Ma, AM-C So, Yinyu Ye, and Shuzhong Zhang. Semidefinite relaxation of quadratic optimization problems. Signal Processing Magazine, IEEE, 27(3):20–34, 2010. → pages 127, 135
[112] Hao Ma, Irwin King, and Michael R. Lyu.
Learning to recommend with explicit and implicit social relations. ACM TIST, 2(3):29, 2011. → pages 10, 123
[113] Andreu Mas-Colell, Michael D. Whinston, and Jerry R. Green. Microeconomic Theory. Oxford University Press, 1995. → pages 74
[114] J. J. McAuley, R. Pandey, and J. Leskovec. Inferring networks of substitutable and complementary products. In KDD, 2015. → pages 78
[115] Sean M McNee, John Riedl, and Joseph A Konstan. Being accurate is not enough: how accuracy metrics have hurt recommender systems. In CHI’06 extended abstracts on Human factors in computing systems, pages 1097–1101. ACM, 2006. → pages 123, 126
[116] Elchanan Mossel and Sébastien Roch. On the submodularity of influence in social networks. In STOC, pages 128–134, 2007. → pages 6
[117] Seth A. Myers and Jure Leskovec. Clash of the contagions: Cooperation and competition in information diffusion. In ICDM, pages 539–548, 2012. → pages 2, 43, 78
[118] Ramasuri Narayanam and Amit A Nanavati. Viral marketing for product cross-sell through social networks. In Machine Learning and Knowledge Discovery in Databases, pages 581–596. Springer, 2012. → pages 2, 77
[119] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for maximizing submodular set functions. Mathematical Programming, 14(1):265–294, 1978. → pages 6, 42, 107, 108
[120] Yuri Nesterov. Semidefinite relaxation and nonconvex quadratic optimization. Optimization methods and software, 9(1-3):141–160, 1998. → pages 127
[121] Yuri Nesterov, Henry Wolkowicz, and Yinyu Ye. Semidefinite programming relaxations of nonconvex quadratic optimization. In Handbook of semidefinite programming, pages 361–419. Springer, 2000. → pages 127, 134, 135, 137
[122] Praneeth Netrapalli and Sujay Sanghavi. Learning the graph of epidemic cascades. In SIGMETRICS, pages 211–222, 2012. → pages 3, 156
[123] Nishith Pathak, Arindam Banerjee, and Jaideep Srivastava. A generalized linear threshold model for multiple cascades. In ICDM, pages 965–970, 2010. →
pages 2, 9, 43, 74
[124] Gang Peng and Jifeng Mu. Technology adoption in online social networks. Journal of Product Innovation Management, 28:133–145, 2011. → pages 44
[125] Kira Radinsky, Krysta Svore, Susan Dumais, Jaime Teevan, Alex Bocharov, and Eric Horvitz. Modeling and predicting behavioral dynamics on the web. In WWW, 2012. → pages 126
[126] Lisa Rashotte. Social influence. The Blackwell Encyclopedia of Sociology, IX:4426–4429, 2006. → pages 1
[127] Francesco Ricci, Lior Rokach, Bracha Shapira, and Paul B. Kantor, editors. Recommender Systems Handbook. Springer, 2011. → pages 10, 122
[128] Matthew Richardson and Pedro Domingos. Mining knowledge-sharing sites for viral marketing. In KDD, pages 61–70, 2002. → pages 2, 3, 4, 16, 30
[129] Kazumi Saito, Ryohei Nakano, and Masahiro Kimura. Prediction of information diffusion probabilities for independent cascade model. In KES, pages 67–75, 2008. → pages 3, 72, 156
[130] Anish Das Sarma, Sreenivas Gollapudi, Rina Panigrahy, and Li Zhang. Understanding cyclic trends in social choices. In WSDM, pages 593–602, 2012. → pages 123, 126
[131] Devavrat Shah. Gossip algorithms. Now Publishers Inc, 2009. → pages 126
[132] Yoav Shoham and Kevin Leyton-Brown. Multiagent Systems - Algorithmic, Game-Theoretic, and Logical Foundations. Cambridge University Press, 2009. → pages 13, 29, 31, 33, 60
[133] Christopher Snyder and Walter Nicholson. Microeconomic Theory, Basic Principles and Extensions (10th ed). South-Western Cengage Learning, 2008. → pages 74
[134] Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, SIGCOMM ’01, pages 149–160, 2001. → pages 156
[135] Lei Tang and Huan Liu. Community detection and mining in social media. Synthesis Lectures on Data Mining and Knowledge Discovery, 2(1):1–137, 2010. →
pages 2
[136] Youze Tang, Yanchen Shi, and Xiaokui Xiao. Influence maximization in near-linear time: a martingale approach. In SIGMOD, pages 1539–1554, 2015. → pages 3, 7, 67, 68, 76, 97, 98, 155, 157
[137] Youze Tang, Xiaokui Xiao, and Yanchen Shi. Influence maximization: near-optimal time complexity meets practical efficiency. In SIGMOD, pages 75–86, 2014. → pages 3, 6, 76, 95, 96, 97, 98, 99, 101, 103, 104, 110, 115, 155, 157
[138] Tamir Tassa and Francesco Bonchi. Privacy preserving estimation of social influence. In Proceedings of the 17th International Conference on Extending Database Technology, EDBT 2014, Athens, Greece, March 24-28, 2014, pages 559–570, 2014. → pages 73
[139] M. Tomochi, H. Murata, and M. Kono. A consumer-based model of competitive diffusion: the multiplicative effects of global and local network externalities. Journal of Evolutionary Economics, 15:273–295, 2005. → pages 42
[140] Lieven Vandenberghe and Stephen Boyd. Semidefinite programming. SIAM review, 38(1):49–95, 1996. → pages 127, 135
[141] Sharan Vaswani, Laks Lakshmanan, and Mark Schmidt. Influence maximization with bandits. arXiv preprint arXiv:1503.00024, 2015. → pages 3, 156
[142] Jan Vondrák. Optimal approximation for the submodular welfare problem in the value oracle model. In Proceedings of the 40th Annual ACM Symposium on Theory of Computing, Victoria, British Columbia, Canada, May 17-20, 2008, pages 67–74, 2008. → pages 78
[143] Jaewon Yang and Jure Leskovec. Patterns of temporal variation in online media. In Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, 2011. → pages 126
[144] Mao Ye, Xingjie Liu, and Wang-Chien Lee. Exploring social influence for recommendation: a generative model approach. In SIGIR, pages 671–680. ACM, 2012. → pages 2
[145] Yinyu Ye. Approximating quadratic programming with bound and quadratic constraints. Mathematical programming, 84(2):219–226, 1999. → pages 127
[146] Cong Yu, Laks V. S. Lakshmanan, and Sihem Amer-Yahia.
It takesvariety to make a world: diversification in recommender systems. InEDBT, pages 368–378, 2009. ! pages 126[147] Nicholas Jing Yuan, Fuzheng Zhang, Defu Lian, Kai Zheng, Siyu Yu,and Xing Xie. We know how you live: exploring the spectrum ofurban lifestyles. In COSN, pages 3–14. ACM, 2013. ! pages 110[148] Robert B Zajonc. Attitudinal effects of mere exposure. Journal ofPersonality and Social Psychology, 9(2):1, 1968. ! pages 123[149] Ali Zarezade, Ali Khodadadi, Mehrdad Farajtabar, Hamid R. Rabiee,Le Song, and Hongyuan Zha. Correlated cascades: Compete orcooperate. CoRR, abs/1510.00936, 2015. ! pages 2[150] S. Zhang. Quadratic maximization and semidefinite relaxation.Mathematical Programming, 87(3):453–465, 2000. ! pages 127, 138[151] Shenghui Zhao, Robert J. Meyer, and Jin K. Han. The EnhancementBias in Consumer Decisions to Adopt and Utilize ProductInnovations. Office of Research, Singapore Management University,2004. ! pages 44[152] Tong Zhao, Julian J. McAuley, and Irwin King. Leveraging socialconnections to improve personalized ranking for collaborativefiltering. In CIKM, pages 261–270, 2014. ! pages 10, 123[153] Bin Zhou and Jian Pei. The k -anonymity and l -diversity approachesfor privacy preservation in social networks against neighborhoodattacks. Knowl. Inf. Syst., 28(1):47–77, 2011. ! pages 73172
Item Metadata
Title | Computational social influence : models, algorithms, and applications |
Creator | Lu, Wei |
Publisher | University of British Columbia |
Date Issued | 2016 |
Genre | Thesis/Dissertation |
Type | Text |
Language | eng |
Date Available | 2016-07-07 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
DOI | 10.14288/1.0305735 |
URI | http://hdl.handle.net/2429/58394 |
Degree | Doctor of Philosophy - PhD |
Program | Computer Science |
Affiliation | Science, Faculty of; Computer Science, Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 2016-09 |
Campus | UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
Aggregated Source Repository | DSpace |