Influence Maximization in Bandit and Adaptive SettingsbySharan VaswaniB.E. Computer Science, Birla Institute of Technology and Science, Pilani, 2012A THESIS SUBMITTED IN PARTIAL FULFILLMENTOF THE REQUIREMENTS FOR THE DEGREE OFMaster of ScienceinTHE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES(Computer Science)The University of British Columbia(Vancouver)August 2015c© Sharan Vaswani, 2015AbstractThe objective of viral marketing is to leverage a social network to spread awareness abouta specific product in the market through information propagation via word-of-mouth. Aclosely related problem is that of influence maximization which aims to identify the‘best’ set of ‘influential’ users (seeds) to give discounts or free products to, such thatawareness about the product is maximized. We study two relatively unexplored variantsof influence maximization (IM) in social networks.Conventional IM algorithms assume the structure of the social network and edgeweights to be known. Though the structure of the network is usually known, edge weightsneed to be estimated from past data. In the first part of this thesis, we tackle the real butdifficult problems of (i) learning these edge weights online and (ii) maximizing influencein the network when no past data is available as input. We adopt a combinatorial multi-armed bandit (CMAB) paradigm and formulate the above problems respectively as (i)network exploration, i.e. incrementally minimizing the error in learned edge weights and(ii) regret minimization i.e. minimizing the loss in influence from choosing suboptimalseed sets over multiple attempts.Most previous work on influence maximization in social networks is limited to thenon-adaptive setting in which the marketer is supposed to select all of the seed users upfront. A disadvantage of this setting is that she is forced to select all the seeds basedsolely on a diffusion model. If the selected seeds do not perform well, there is no op-portunity to course-correct. A more practical setting is the adaptive setting in which themarketer initially selects a batch of users and observes how seeding those users leads toa diffusion of product adoptions. Based on this market feedback, she formulates a policyfor choosing the remaining seeds. We study adaptive offline strategies for two problems:(a) MAXSPREAD - given a budget on number of seeds and a time horizon, maximizethe spread of influence and (b) MINTSS - given a time horizon and an expected numberof target users, minimize the number of required seeds.iiPrefaceThis thesis is submitted in partial fulfilment of the requirements for a Master of ScienceDegree in Computer Science. All the work presented in this dissertation are original andindependent work of the author, performed under the supervision of Prof. Laks.V.S.Lakshmanan.iiiTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiList of Symbols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiiAcknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 Background and Related Work . . . . . . . . . . . . . . . . . . . . . . . . 62.1 Conventional Influence Maximization . . . . . . . . . . . . . . . . . . 62.1.1 Diffusion Model . . . . . . . . . . . . . . . . . . . . . . . . . 62.1.2 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72.1.3 Learning probabilities . . . . . . . . . . . . . . . . . . . . . . 82.1.4 Feedback . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82.2 Multi-armed Bandits . . . . . . . . . . . . . . . . . . . . . . . . . . . 82.3 Adaptive Influence Maximization . . . . . . . . . . . . . . . . . . . . 103 Influence Maximization with Bandits . . . . . . . . . . . . . . . . . . . . 123.1 Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123.1.1 Combinatorial Multiarmed Bandit Framework . . . . . . . . . . 12iv3.1.2 Adaptation to IM . . . . . . . . . . . . . . . . . . . . . . . . . 133.2 Network Exploration . . . . . . . . . . . . . . . . . . . . . . . . . . . 213.2.1 Random Exploration . . . . . . . . . . . . . . . . . . . . . . . 213.2.2 Strategic Exploration . . . . . . . . . . . . . . . . . . . . . . . 243.3 Regret Minimization . . . . . . . . . . . . . . . . . . . . . . . . . . . 243.4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274 Adaptive Influence Maximization . . . . . . . . . . . . . . . . . . . . . . 344.1 Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344.1.1 MAXSPREAD . . . . . . . . . . . . . . . . . . . . . . . . . . . 374.1.2 MINTSS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 394.1.3 Bounded Time Horizon . . . . . . . . . . . . . . . . . . . . . . 414.2 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424.2.1 Two phase Influence Maximization . . . . . . . . . . . . . . . 424.2.2 Sequential Model Based Optimization . . . . . . . . . . . . . . 434.3 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454.3.1 Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454.3.2 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . 454.3.3 Sequential Model Based Optimization . . . . . . . . . . . . . . 454.3.4 MAXSPREAD . . . . . . . . . . . . . . . . . . . . . . . . . . . 464.3.5 MINTSS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54vList of TablesTable 3.1 Dataset characteristics . . . . . . . . . . . . . . . . . . . . . . . . . 28Table 3.2 Subroutine runtimes (in sec/round) . . . . . . . . . . . . . . . . . . 29Table 3.3 Epinions: Spread vs k . . . . . . . . . . . . . . . . . . . . . . . . . 33Table 4.1 Policies of p = 1 recovered by SMAC for varying time horizons(T )for Flixster with Q = 5800 . . . . . . . . . . . . . . . . . . . . . . 50viList of FiguresFigure 3.1 Summary of bandit framework for influence maximization . . . . . 14Figure 3.2 Example for node level feedback. . . . . . . . . . . . . . . . . . . 15Figure 3.3 NetHept,k = 50: Network Exploration . . . . . . . . . . . . . . . . 30Figure 3.4 k = 50:Regret vs Number of Rounds . . . . . . . . . . . . . . . . . 31Figure 3.5 NetHept, k = 50: L2 error vs Number of Rounds . . . . . . . . . . 32Figure 4.1 Example for adaptive versus non-adaptive seed selection . . . . . . 34Figure 4.2 Theoretical comparison of adaptive and non-adaptive strategies . . . 40Figure 4.3 Counterexample to show that the spread is not adaptive submodularunder incomplete diffusion . . . . . . . . . . . . . . . . . . . . . . 41Figure 4.4 Average Spread vs Number of seeds . . . . . . . . . . . . . . . . . 46Figure 4.5 Runtime vs Number of seeds . . . . . . . . . . . . . . . . . . . . . 47Figure 4.6 Epinions:Adaptive vs Non-adaptive for MAXSPREAD . . . . . . . 48Figure 4.7 Number of seeds required vs Target fraction . . . . . . . . . . . . . 49Figure 4.8 Flixster: Runtime vs Target fraction . . . . . . . . . . . . . . . . . 50viiList of SymbolsG Graph underlying the social networkV Set of vertices in GE Set of edges in GP Set of influence probabilities(edge weights) in GNout(u) Set of out-neighbours of node uNin(u) Set of in-neighbours of node uD Length of the longest path of Gk Seed budgetS Seed setW Possible world of the diffusion processσ(S) Expected spread for the seed set Sm Number of base arms in the bandit frameworkpi Seed selection policy for adaptive seedingH Time horizon for viral marketing campaignT Number of rounds in the bandit gameXi,s Reward of the ith in round sTi,s Number of times arm i has been triggered until round sT s(u,v) Number of times edge (u,v) has been triggered until round sµi Mean of the reward distribution for the ith armA Superarm (subset of base arms)µˆ i Mean estimate for arm ipiA Triggering probability of arm i if the superarm A is playedO Influence maximization oracleρ Failure probability of the Frequentist approach for node level feedbackviiipu,v Influence probability for the edge (u,v)pmin Minimum influence probability in the networkpmax Maximum influence probability in the networkF Bounded convex setcs Cost function in round s in Zinkevich’s frameworkxs Point selected in round s in Zinkevich’s frameworkηs Step size in round s for gradient descent in Zinkevich’s frameworkr~µ (A) Reward obtained on playing the superarm A when the mean rewards is given by ~µRegA~µ ,α,β Regret for algorithm A when the mean rewardsare given by ~µ calculated according to an (α,β ) oracleC Set of available cascadesM Feedback mechanismµi Upper confidence bound for the ith armω Initial exploration rate for the ε-greedy algorithmζ Fraction of exploration rounds for the Initial Exploration algorithmγ Correlation decay in the networkα Multiplicative error in the marginal gain estimaton for adaptive policiesQ Expected target spread for MINTSSδ , β Shortfall in achieving the target spread for MINTSSpiGA,k Greedy Adaptive policy constrained to select k seedspiOA,k Optimal Adaptive policy constrained to select k seedspiGNA,k Greedy Non-adaptive policy constrained to select k seedspiONA,k Optimal Non-adaptive policy constrained to select k seedsσ(pistrategy,k) Expected spread for a pistrategy policy constrained to select k seedsixAcknowledgementsI am indebted to my supervisor Professor Laks.V.S.Lakshmanan for being so patient andfor all his help - technical and otherwise. I am grateful to all the professors with whom Igot an opportunity to interact with. In particular, I am grateful to Nick Harvey and MarkSchmidt for taking time off their busy schedules to give me helpful advice. I would liketo thank my lab-mates and collaborators for all the useful discussions and ideas. I wouldlike to thank my friends at UBC for keeping me sane during these two years. Last butnot the least, I am thankful to my family for their constant love and support.xTo my familyxi1IntroductionThey say 90% of the promotion of a book comes through word of mouth. Butyou’ve somehow got to get your book into the hands of those mouths first!.– Claudia OsmondRecently, there has been tremendous interest in the study of influence propagation insocial and information networks, motivated by applications such as the study of spread ofinfections and innovations, viral marketing, and feed ranking to name a few (e.g.,see [18,31, 48, 49]). In these applications, the network is given by a directed graph G = (V,E,P),with nodes V , edges E, and edge weights P : E → [0,1]. The nodes denote users ofthe social network, the edges correspond to relations between users of the network (e.g.friendship in Facebook, followers in Twitter). The edge weights (also referred to as influ-ence probabilities) characterize the strength of these relations. Usually, they denote theprobabilities of influence between adjacent nodes and govern how information spreadsfrom a node to its neighbours. Information propagation in these applications is modelledwith the aid of a stochastic diffusion model.We address the problem of viral marketing [38, 43] the aim of which is to leveragethe social network to spread awareness about a specific product in the market throughinformation propagation via word-of-mouth. Specifically, the marketer aims to select afixed number of ‘influential’ users (called seeds) to give free products or discounts to.The marketer assumes that these users will influence others through the social network.This will result in information propagating through the network with increasing numberof users adopting or becoming aware of the product. Since there is a budget on thenumber of free samples / discounts that can be given, the marketer must strategically1choose seeds so that the maximum number of people in the network become aware ofthe product. This can be formalized as the influence maximization (IM) problem [33].Under an assumed model of information diffusion, IM aims to identify the ‘best’ set ofseeds which when given free products / discounts will result in the maximum numberof people adopting or becoming aware of the product. We refer to a user adopting orbecoming aware of the product as being ‘activated’. More precisely, if k is the budgeton the number of seeds that can be selected by the marketer, IM aims to find a set S ofk seed nodes, such that activating them in the beginning leads to the maximum expectedspread σ , i.e., expected number of activated nodes according to a given diffusion model.The IM problem can be formally formulated as:S = argmax|S|≤kσ(S) (1.1)In their seminal paper, Kempe, Kleinberg and Tardo¨s [33] formalized IM as a discreteoptimization problem. They studied several discrete-time diffusion models originally pi-oneered in mathematical sociology. These models include the independent cascade andlinear threshold model (details in Section 2). They showed that IM under these modelsis NP-hard but the expected spread function satisfies the nice properties of monotonic-ity and submodularity. Monotonicity means adding more seeds cannot lead to a lowerexpected spread whereas submodularity intuitively corresponds to the property of dimin-ishing returns (formal definitions in Chapter 2). By exploiting these properties and earlyresults by Nemhauser et al.[39], they showed that a simple greedy algorithm achievesa (1− 1/e)-approximation to the optimum. There has been an explosion of researchactivity around this problem, including development of scalable heuristics, alternativediffusion models, and scalable approximation algorithms (e.g., see [10] [51] [35] [28][27] [50]). We refer the reader to [13] for a more detailed survey.Most work on IM assumes that both the network and the set of influence probabil-ities is available as input. However, although the network structure might be known inpractice, influence probabilities are not generally available. To overcome this, the ini-tial series of papers following [33] simply assigned influence probabilities: e.g., assigna fixed small probability such as 0.01 to all edges, assign values drawn at random froma fixed set such as {0.1,0.01,0.001}, or assign an equal probability of 1Nin(v) to all in-coming edges of a node v, where Nin(v) is the set of in-neighbours of v. However, thereis no basis for such assignment. A useful kind of data sometimes available is the set ofdiffusions or cascades that actually happened in the past, specified in the form of a log of2actions by network users. There is a growing body of work devoted to learning the trueinfluence probabilities from past data [17, 26, 40, 45, 46]. Details of this body of workare covered in Section 2.Irrespective of the differences in the approaches proposed, diffusion models used,or parameters learned, a common characteristic of all these works is that they all criti-cally depend on actual diffusion cascades being available. We notice that in several realdatasets, only the social network data is available and there is no available informationabout cascades. This unfortunately renders the previous approaches for learning influ-ence probabilities inapplicable. This raises the following questions. (1) When only asocial network is available, how can we learn influence probabilities, and do so in anefficient manner? (2) More generally, how can we solve the IM problem when no cas-cade information or influence probabilities is available? We solve these problems in anonline fashion i.e. we perform IM in multiple rounds and improve our seed selection andknowledge about the influence probabilities incrementally across rounds. We refer to thefirst question as the network exploration problem i.e. how to perform our seed selectionsso that we can learn about the influence probabilities in the network quickly. The sec-ond question can be cast as a regret minimization problem (formal definitions given inchapter 3).Formally, we adopt a multi-armed bandits(MAB) paradigm for solving the problemsof learning influence probabilities and influence maximization, when only the social net-work is available as input. The inspiration for our work comes from recent work by Chenet al. [14, 15] on combinatorial MAB that includes influence maximization as an appli-cation. The stochastic multi-armed bandit setting [34] has been used to address problemsin clinical trials [44], adaptive routing [5] and advertisement placement on websites [42]and recommender systems [36]. The combinatorial MAB framework [2, 14] is an exten-sion to the multi-armed bandit setting and has proved to be important in solving taskswhich have an associated combinatorial structure. Adoption of a bandit paradigm in turnraises the following additional questions. (3) How should we update our estimates ofthe influence probabilities based on observations made from previous seed selections?(4) How many rounds of seed selections are needed before influence probabilities canbe learned in an accurate enough manner? Another natural question is: (5) How doesthe learning “rate” of bandit based algorithms compare with that of the batch algorithms(which assume the availability of a batch of cascades) ? (6) While seed selection does im-prove our knowledge of influence probabilities, we are paying a price for selecting seeds3under imperfect knowledge, in the form of reduced spread compared with the spread wecould have achieved if we had perfect knowledge of the true influence probabilities. Howcan we quantify and how can we minimize this loss in spread or regret? This questionis particularly important to a viral marketer. Intuitively, a new marketer who has just en-tered the market and has no knowledge about the network (besides its structure) will tryand learn about it through a trial and error procedure (the ‘exploration’ phase). These ex-ploration rounds are therefore like an investment and the knowledge gained in this phasecan be used to generate revenue (by achieving a high spread) in the subsequent rounds(the ‘exploitation’ phase). The regret minimization objective corresponds to minimizingthe loss in revenue while learning about the market whereas the network exploration ob-jective corresponds to learning about the network in a reasonable amount of time. Wedescribe the basics of MAB and CMAB in chapter 2. In chapter 3, we show how toaddress the above questions by mapping our IM problem to the CMAB framework andusing ideas and algorithms from bandit theory.In the second part of this thesis, we focus on IM in the adaptive setting. The majorityof work in influence maximization has confined itself to a non-adaptive setting where,in viral marketing terms, the marketer must commit to choosing all the k seeds up front.We refer to conventional IM as the MAXSPREAD problem. Instead of maximizing thespread, the marketer may have a certain expected spread as the target that she wants toachieve. This target may be derived from the desired sales volume for the product. A nat-ural problem is to find the minimum number of seeds needed to achieve the target. Thisproblem, called minimum targeted seed selection (MINTSS for short), has been studiedin the non-adaptive setting [29]. It was shown that the classic greedy algorithm leads to abi-criteria approximation to the optimal solution. Precisely, if the target spread is Q and ashortfall of β > 0 is allowed, then the greedy algorithm will achieve an expected spread≥ (Q− β ) using no more than OPT (1 + lndQ/βe) seeds, where OPT is the optimalnumber of seeds.For both the MAXSPREAD and MINTSS problems, a non-adaptive setting impliesthat the choice of every single seed is driven completely by the diffusion model used forcapturing the propagation phenomena. In practice, it may happen that the actual spreadresulting from the seeds chosen may fall short of the expected spread predicted by thediffusion model. Recent work by Goyal et al. [27] shows that most diffusion modelstend to over-predict the actual spread. Thus, committing to the choice of all k seedsin one shot can result in a sub-optimal performance in reality. A more realistic setting4is one where the marketer chooses a subset of seeds and activates them. She monitorshow their activation spreads through the network and observes the actual or ‘true’ spreadthus far. She can then take into account this market feedback in making subsequent seedselections. We call this setting an adaptive setting, as choices of subsequent seeds areadapted to observations made so far about the actual spread achieved by previous selec-tions. Hence, the adaptive setting introduces a policy pi which specifies which node(s) toseed at a given time.Adaptive seed selection raises several major challenges. For instance, in the adaptivesetting, in practice, there is a finite time horizon H within which the marketer wishes toconduct her viral marketing campaign. The marketer must then consider the followingquestions. How many seeds to select at a given time, that is, what is the batch size? Which nodes should be selected in each intervention ? How long should she waitbetween seeding successive batches i.e. what should be the inter-intervention time ? If His sufficiently long, it seems intuitive that selecting one seed at a time and waiting untilthe diffusion completes, before choosing the next seed, should lead to the maximumspread. The reason is that we do not commit any more seeds than necessary to start afresh diffusion and every seed selection takes full advantage of market feedback. Werefer to the above case as unbounded time horizon. Another natural question is, what ifthe time horizon H is not long enough to allow many small batches to be chosen and/ordiffusions to be observed in full. In this case, which we call bounded time horizon, themarketer has to choose a strategy in which the budget k is spent within the time horizonH and every seed selection benefits from as much feedback as possible.It is very intuitive that for the MAXSPREAD problem, adaptive seed selection shouldlead to a higher actual spread compared to non-adaptive seed selection, since it benefitsfrom market feedback and tailors seed selections accordingly. For MINTSS, an interest-ing question is to what extent can an adaptive seed selection strategy cut down on thenumber of seeds required to reach a given target spread. We answer the above questionstheoretically and empirically in chapter 4.52Background and Related WorkIf I have seen farther it is by standing on the shoulders of Giants.– Sir Isaac Newton2.1 Conventional Influence MaximizationWe first describe the necessary background for solving the conventional IM problem.2.1.1 Diffusion ModelInformation propagation is modelled using stochastic diffusion models such as Indepen-dent Cascade (IC) and Linear Threshold (LT) [33]. For both these models, time proceedsin discrete steps. In this thesis, we focus exclusively on the IC model which we brieflydescribe now. All our results can be easily adapted for the LT model. In the IC model,the seed nodes are active at time t = 0. Each active user at time t gets one chance toinfluence/activate her out-neighbours at the next time step t +1. This activation attemptsucceeds with the corresponding influence probability between the two users. If the at-tempt succeeds, the neighbour will become active at time t +1. An edge along which anactivation attempt succeeded is said to be ‘live’ whereas the other edges are said to be‘dead’. At a given instant of time t, an inactive node v may have multiple in-neighbourscapable of activating it. We refer to the in-neighbours of v as its parents and to its in-neighbours which became active at t−1 (set of in-neighbours which can activate v at t)as its active parents. The diffusion process is said to have ended if there are no morenodes which can be activated using the above procedure. The number of nodes activated6at the end of the diffusion is referred to as the spread σ¯ . Since the IC diffusion modelis stochastic, it can result in a large number of ‘possible worlds’. Note that each edgecan be either ‘live’ or ‘dead’ in a possible world W depending on its influence probabil-ity. Hence there can be 2|E| possible worlds where some worlds are more probable thanothers. The expected spread (σ ) can be trivially calculated by taking a weighted averageover σ¯ in these possible worlds.2.1.2 AlgorithmAs was mentioned in chapter 1, IM is NP-hard under these common discrete time models.However, under both the IC and LT models, the expected spread σ is monotonic andsubmodular.Definition 1. [Monotonicity and Submodularity] Given the universe of elements U , areal-valued set function f : 2U → R is monotone if f (S) ≤ f (S′),∀S ⊂ S′ ⊆ U . It issubmodular if ∀S⊂ S′ ⊂U and x ∈U \S′, f (S′∪{x})− f (S′)≤ f (S∪{x})− f (S). 2The difference f (S∪{x})− f (S) is the marginal gain of adding element x to the setS. Submodularity hence implies that the marginal gain of any element x decreases asthe size of the set increases. Submodularity enables a simple greedy algorithm [39] toprovide a (1− 1/e)-approximation to the optimal solution. For the greedy algorithm,we build the set S incrementally. At each point, we add the element (in our case, seed)which maximizes the marginal gain in expected spread. Computing the expected spreadof a given set (and hence marginal gain) is #P-hard for both IC and LT models [11, 12].Kempe et al. [33] advocated using MCMC simulations to estimate marginal gains. UsingMCMC estimation of the marginal gain, the greedy algorithm yields a (1− 1/e− ε)-approximation to the optimum, where ε > 0 is the error because of the marginal gainestimation. Another technique for improving the speed of marginal gain computationis that of lazy evaluation [35]. Tang et al. [50] proposed a near-optimal (w.r.t. timecomplexity) randomized greedy (1−1/e−ε)-approximation algorithm for IM. We notethat it is currently the state of the art for MAXSPREAD and has been shown to scale toa billion-edge network. We use the approach in [50] for solving the conventional IMproblem.72.1.3 Learning probabilitiesA number of methods have addressed the problem of learning probabilities from pastdiffusions or cascades available in the form of logs. Goyal et al. [27] showed empiricallythat learning the true influence probabilities from available data is critical for ensuringthe quality of seeds chosen and the spread achieved. Saito et al. [46] develop a likelihoodmaximization approach for learning influence probabilities from available cascades un-der the IC model and propose an expectation maximization algorithm. Goyal et al. [26]develop a frequentist approach for learning influence probabilities under the generalizedthreshold model. Netrapalli and Sanghavi [40] present a likelihood maximization for-mulation for learning both the graph structure and influence probabilities using the ICmodel. In continuous time diffusion models, in lieu of influence probabilities, one as-sociates transfer rates of diffusion with network edges. Rodriguez et al. [17, 24, 45]develop algorithms to infer these rates from available cascades under continuous timediffusion models.2.1.4 FeedbackIn both the variants addressed in this thesis, we need some feedback from either thepartial or complete diffusion process. In the case when the influence probabilities areunknown, we need feedback about the state of the network at the end of the diffusionprocess to refine our estimates of the influence probabilities so that we can do better onthe next round of seed selection. For adaptive influence maximization, we need feedbackfrom the partial diffusion process so that we can select the next seed. The state of thenetwork can be characterized by identifying either the activated nodes (i.e. users who be-came aware / adopted the product) or the ‘live’ edges (i.e. edges along which a successfulactivation attempt has occurred). We refer to the former as node level feedback and tothe latter as edge level feedback. We note that it is more feasible to identify users whohave adopted a product than to gain information about each edge in the network. Thus,node level feedback is more practical than edge level feedback and we use it throughout.2.2 Multi-armed BanditsAs mentioned in chapter 1, we can cast influence maximization with unknown influenceprobabilities as a combinatorial multiarmed bandit problem. The stochastic multi-armedbandit (MAB) problem was first addressed in [34]. In the traditional framework, there are8m arms each of which has an unknown reward distribution. The bandit game proceedsin rounds and in every round s, an arm is played, the environment samples the rewarddistribution for that arm and a corresponding reward is generated. This game continuesfor a fixed number of rounds T . Some of the arms will result in higher rewards whereasothers have underlying distributions that lead to smaller reward values. Initially, thereward distribution of every arm is unknown to the player. The aim of the bandit gameis to minimize the regret obtained by playing suboptimal arms across rounds (regretminimization). This results in an exploration (trying arms we don’t know much aboutin hope of a higher rewards) vs exploitation (pulling the arm which we have learnt giveshigh rewards) tradeoff. For the regret minimization setting, [1, 34] proposed algorithmswhich can achieve the optimal regret of O(log(T )) over T rounds. Another objective ofthe bandit game can be to learn the properties (such as mean) of the underlying rewarddistributions (pure exploration). This problem has been studied in the past in [7, 19, 37].The Combinatorial Multiarmed Bandit problem is an extension to the MAB frame-work in which we can pull a set of arms (or superarm) together [14, 21] or simultaneouslyplay k among the m arms [2, 8, 20]. Clearly, triggering a super arm triggers all the armsit contains and the resulting reward is some combination of the individual rewards. Pre-vious literature [14, 21] consider a CMAB framework with an approximation oracle tofind the best superarm to be played in each round. In [21] however, the authors onlyconsider cases where the resulting reward on pulling the superarm is some linear combi-nation of the rewards for individual arms in the superarm. This was generalized to anynon-linear combination in [14]. For the regret minimization case, [14] proposed a UCBbased algorithm Combinatorial UCB (CUCB) to solve the CMAB problem and obtainan optimal regret of O(log(T )). [25] proposes a Thompson Sampling based algorithmto solve the CMAB problem. Pure exploration case in the CMAB framework has beenaddressed in [9]. In [15], the authors introduced the notion of probabilistically triggeredarms which are triggered when a superarm is played, in addition to the arms contained inthe superarm. In this paper, Chen at al. target both ad placement on web pages and viralmarketing application under semi-bandit feedback [4] i.e. we can observe the rewards ofarms which were triggered when the superarm was pulled. The latter is of special interestto us, and we discuss this in more detail in Section 3.1.92.3 Adaptive Influence MaximizationAdaptive influence maximization has been proposed previously in [16, 22, 30]. In theadaptive setting, batches of nodes are seeded at different intervals. When a batch isseeded, an actual diffusion (called realization in [22]) unfolds as per the classical ICmodel. The next batch is chosen based on the partially observed diffusion/cascade. Wewish to choose a policy that maximizes such an objective function in the adaptive setting.Golovin and Krause [22] extend the definitions of submodularity and monotonicity tothe adaptive setting. An objective function is adaptive monotone and adaptive submod-ular if the marginal gain of every element is non-negative and non-increasing in everypossible realization, as the size of the set (alternatively length of the policy) increases.As before, the greedy policy consists of selecting the node with the maximum marginalgain. Golovin and Krause [22] derive average case bounds on the performance of greedyadaptive policies. They also prove bounds on the greedy adaptive policy for adaptivesubmodular functions under matroid constraints [23].They assume an edge level feedback mechanism under the IC model and show thatthe expected spread is adaptive monotone and adaptive submodular, guaranteeing an ap-proximation algorithm. Guillory et al. [30] study the problem of submodular set cover inthe adaptive setting in which the objective is to minimize the total number of sets requiredto cover a certain target set and prove worst case bounds for the greedy adaptive policy.They briefly describe how their framework can be used for influence maximization in asocial network with hidden information (e.g., hidden preferences of users). The probleminvolves simultaneously learning about the hidden preferences of users and maximizingthe influence among users who prefer a particular product. We consider the more tra-ditional influence maximization problem and assume that users do not have any hiddenpreferences. We establish average case guarantees similar to [22]. Finally, [16] addressesthe adaptive MINTSS problem and shows that under certain conditions, the batch-greedyadaptive policy, in which the seeds are chosen in batches in a greedy manner, is compet-itive not only against the sequential greedy policy (choosing one seed at a time) but alsoagainst the optimal adaptive policy.These papers assume the unrealistic edge level feedback described above. The ex-periments conducted in these papers (if at all) are on small toy networks with 1000 nodesand they do not clarify the practical benefits of going adaptive for real large networks.All previous studies are confined to the setting of unbounded time horizon, which meansthe horizon is long enough for the diffusion started by each batch to complete. In prac-10tice, the horizon may be bounded and not leave enough time for successive diffusions tocomplete. The theoretical results in these papers bound the performance of the greedyadaptive policy compared to the optimal adaptive policy. Notice that the optimal (adap-tive) policy cannot be computed in polynomial time. The only practical options for bothnon-adaptive and adaptive settings are greedy approximations (possibly with techniquesfor scaling up to large datasets). Thus, a real question of practical interest is what dowe gain by going adaptive, i.e., what is the gain in peformance of the greedy approx-imation algorithm when it is made adaptive? In contrast, we study MAXSPREAD andMINTSS under both unbounded and bounded time horizon and quantify the benefits ofgoing adaptive with reference to the greedy approximation algorithm, as opposed to theoptimal algorithm. Furthermore, we use the more realistic node level feedback modelthroughout and perform experiments on large real world networks.113Influence Maximization withBanditsGood judgment comes from experience, and experience comes from bad judg-ment– Rita Mae Brown3.1 Theory3.1.1 Combinatorial Multiarmed Bandit FrameworkWe briefly review the combinatorial multi-armed bandit (CMAB) framework proposedin [14, 15] and adapt it to the problem of IM. A CMAB problem consists of m base arms.Each arm i is associated with a random variable Xi,s which denotes the outcome or rewardof triggering the ith arm on the sth trial. The reward Xi,s is bounded on the support [0,1]and is independently and identically distributed according to some unknown distributionwith mean µi. The CMAB game is played in T rounds, for some fixed T ≥ 1. In eachround s, a superarm A (a set of base arms) is played, which triggers all arms in A. Inaddition, other arms may get probabilistically triggered. Let piA denote the triggeringprobability of arm i if the superarm A is played. Clearly, ∀i ∈ A, piA = 1. The rewardobtained in each round s can be a (possibly non-linear) function of the rewards Xi,s foreach arm i that gets triggered in that round. Let Ti,s denote the number of times a basearm i has been triggered until round s. For the special case s = T , we define Ti := Ti,T ,12as the number of times arm i is triggered in the game. Each time an arm i is triggered,we update its mean estimate µˆ i based on the observed reward. Using the current rewarddistribution estimates, the best superarm (which is expected to lead to the highest reward)to be played in each round is selected by an oracle O, which takes as input the vectorof current mean estimates ~ˆµ = (µˆ1, ..., µˆm) and outputs an appropriate superarm A. Inorder to accommodate intractable problems, the framework of [14, 15] assumes that theoracle provides an (α,β )-approximation to the optimal solution, i.e., the oracle outputswith probability β a superarm A which attains a reward within a factor α of the optimalsolution, based on the current mean estimate ~ˆµ .3.1.2 Adaptation to IMWe can map the IM problem with unknown influence probabilities to the CMAB frame-work as follows: the edges E in the network G map to arms. Each arm is characterized bya Bernoulli distribution, i.e., the edge is either live or dead in the “true” possible world.Thus, the rewards Xi,s are binary. The mean of this distribution for an arm i is preciselythe influence probability for the corresponding edge. A set of seed nodes S correspondsto a superarm defined by the set of all edges ES that are outgoing from the nodes in S.Specifically, given a seed budget k, in each round, k seeds are selected and the corre-sponding superarm is played. Once the superarm is played, information diffuses in thenetwork and a subset of network edges become live, and Xi,s for these edges is 1. Bydefinition, when an arm/edge (u,v) is made live, we say that the target node v is active.The total reward for each round is the spread σ¯(S), i.e., the number of active nodes at theend of the diffusion process. Note that ¯σ(S) is a non-linear function of the rewards ofthe triggered arms. The input to the influence maximization oracle is the graph G alongwith the current estimates of influence probabilities ~ˆµ . It constitutes a (1− 1e ,1−1|E|)-approximation oracle and outputs a seed set S under the cardinality constraint k. Noticethat the well-known greedy algorithm with MC simulations used for influence maximiza-tion can serve as such an oracle. The mean estimate ~ˆµ vector is updated based on theobserved rewards. In this context, the notion of feedback mechanism plays an importantrole. It characterizes the information available after a superarm is played. This infor-mation is used to update the model of the world to improve the mean estimates of theunderlying arm reward distributions. The entire framework is summarized in figure 3.1.In the specific context of our application of bandits to IM, there are two basic typesof feedback we can formulate.13Figure 3.1: Summary of bandit framework for influence maximization3.1.2.1 Edge Level FeedbackIn the feedback mechanism proposed in [14], which we call edge level feedback, oneassumes we know the status (live or dead) of each triggered edge in the “true” possibleworld generated by the diffusion in the network. The mean estimates of the arms distri-butions can then be updated using a simple frequentist formula, i.e., for the ith arm, µˆi=∑sj=1 Xi, jTi,s. Assuming edge level feedback amounts to assuming that for each activatednode in the network, we know exactly which of its active parents succeeded in activat-ing the node. A key point is that success/failure of activation attempts in general is notobservable and thus edge level feedback assumption is not realistic. This motivates us topropose node level feedback, made precise below.3.1.2.2 Node Level FeedbackUnlike the status of edges, we can always observe the status of each node, i.e., whethera user bought/adopted the marketed product and when. We call this node level feed-back. Clearly, compared to edge level feedback, node level feedback makes a weakerbut more realistic assumption. The flip side is that update to the mean estimates is morechallenging under node level feedback.To see this, consider a given node v which becomes active at time t. Suppose v14has more than one parent and the parents u1, ...,uK became active at time t − 1 (seeFigure 3.2).Figure 3.2: Example for node level feedback.Under edge level feedback, we assume that we know the status of each edge (u j,v),1 ≤ j ≤ K and use it to update mean estimates. Under node level feedback, any of theactive parents may be responsible for activating a node v and we don’t know which oneis. There are two ways of overcoming this problem.3.1.2.2.1 Frequentist Approach This is an adaptation of the frequentist approach foredge level feedback. Specifically, we propose a scheme whereby we choose one of theactive neighbors of v, say ui, uniformly at random, and assign the credit of activating vto ui (see Figure 3.2). The probability of assigning credit to any one of the K nodes is1K . Here, edge (ui,v) is given a reward of 1 whereas the edges (u j,v) corresponding toall the other active parents u j, j 6= i, are assigned a zero reward. We then follow the samefrequentist update formula as described for the edge level feedback model.Owing to the inherent uncertainty in node level feedback, note that we may makemistakes in credit assignment: we may infer an edge to be live while it is dead in thetrue world or vice versa. We term the probability of such faulty inference, the failureprobability under node level feedback. An important question is whether we can boundthis probability. This is important since failure probability could ultimately affect theachievable regret and the error in the learned probabilities. The following result settlesthis question. As a preview, in Section 3.4, we verify that the failure probability is lowfor real networks, thus establishing the effectiveness of learning influence probabilitiesunder node level feedback.Theorem 1. Suppose a network node v became active at time t and u1, ...,uK are theparents of v that became active at t − 1. Let i denote the arm corresponding to edge(ui,v). Let pmin and pmax be the minimum and maximum true influence probabilities inthe network. The failure probability ρ under node level feedback is then characterized15by:ρ ≤ 1K(1− pmin)(1−K∏k=1,k 6=i[1− pmax])+(1−1K)pmax. (3.1)Let µˆEi and µˆNi be the inferred influence probabilities for edge (ui,v) using edge leveland node level feedback respectively. Then the relative error in the learned influenceprobability is given by: ∣∣µˆNi − µˆEi∣∣µˆEi= ρ∣∣(1µˆEi−2)∣∣ (3.2)Proof. Consider the network in Figure 3.2. Consider updating the influence probabilityon the edge (ui,v) that is being learned. For the situation in Figure 3.2, we may infer theedge (ui,v) to be live or dead. Our credit assignment makes an error when the edge islive and inferred to be dead and vice versa. Recall that all probabilities are conditionedon the fact that the node v is active at time t and K of its parents (u1, ...,ui, ...,uK) areactive at time t− 1. Let Ed (El) be the event that the edge (ui,v) is dead (resp., live) inthe true world. Hence we can characterize the failure probability as follows:ρ = Pr[(ui,v) is inferred to be live]×Pr[Ed | v is active at time t]+Pr[(ui,v) is inferred to be dead]×Pr[El | v is active at time t]If (ui,v) is live in the true world, then node v will be active at time t irrespective ofthe status of the edges (u j,v), j ∈ [K], j 6= i. Hence,Pr[El | v is active at time t] = Pr[El]By definition of independent cascade model, the statuses of edges are independent ofeach other. Hence,Pr[Ed | v is activeat time t] = Pr[Ed ∧ at least one of the edges (u j,v), j 6= i, is live]ρ = Pr[(ui,v) is inferred to be live]×Pr[Ed ∧ at least one of the edges (u j,v), j 6= i, is live]+Pr[(ui,v) is inferred to be dead]×Pr[El]Let pu j,v be the true influence probability for the edge (u j,v), j ∈ [K]. We have the16following:Pr[El] = pui,vPr[Ed ∧ at least one of the edges (u j,v), j 6= i, is live] = (1−Pr[El])[1−K∏j=1, j 6=i[1− pu j,v]]Since one of the active nodes is chosen at random and assigned credit,Pr[choosing ui for credit] =1K⇒ Pr[(ui,v) is inferred to be live] =1KWe thus obtain:ρ = 1K(1− pui,v)[1−K∏k=1,k 6=i[1− puk,v]]+(1−1K)pui,v (3.3)Let pmin (pmax) denote the minimum (resp. maximum) true influence probability ofany edge in the network. Plugging these in Eq. (3.3) gives us the upper bound in Eq.(3.1),the first part of the theorem.Let µˆNi and µˆEi denote the mean estimates using node level and edge level feedbackrespectively. That is, they are the influence probabilities of edge (ui,v) learned undernode and edge level feedback. We next quantify the error in µˆNi relative to µˆEi . Let XNi,sbe the status of the edge corresponding to arm i inferred using our credit assignmentscheme described above, at round s. Recall that under both edge level and node levelfeedback, the mean is estimated using the frequentist approach. That is, µˆNi = ∑Ts=1XNi,sTi(similarly for edge level feedback). Note that Xi,s denotes the true reward (for edge levelfeedback) whereas XNi,s denotes the inferred reward under node level feedback, using thecredit assignment scheme described earlier. Thus, for each successful true activation ofarm i (i.e., Xi,s = 1) we obtain XNi,s = 1 with probability 1−ρ and for each unsuccessfultrue activation, we obtain XNi,s = 1 with probability ρ . Let Si denote the number of roundsin which the true reward Xi,s = 1. Hence, we have:17µˆEi =SiTi(3.4)µˆNi =Si(1−ρ)+(Ti−Si)(ρ)Ti(3.5)The second part of the theorem, Eq.(3.2), follows from Eq.(3.4) and (3.5) using sim-ple algebra.In Section 3.4, we empirically find typical values of pmax, pmin, and K on real datasetsand verify that the failure probability is indeed small. We also find that the proposed nodelevel feedback achieves competitive performance compared to edge level feedback.3.1.2.2.2 Maximum Likelihood Method As mentioned before, the papers [40, 46] de-scribe an offline method for learning influence probabilities, where a batch of cascadesis given as input. We adapt the approach of [40] into an online method, employing theCMAB paradigm. In an online setting, cascades stream in and only one cascade is avail-able at a time. Each cascade can be viewed as resulting from choosing a set of seeds ina round of a CMAB game. Before describing our extension, we briefly review the ap-proach of Netrapalli and Sanghavi [40]. They take as input a batch of diffusion cascadesand infer the influence probabilities and the neighborhood structure for each node in thegraph. They formulate a function Lc(θ) which models the likelihood of observing a par-ticular cascade c given the current estimated neighborhood and influence probabilities.They show that the likelihood function is concave and can be decoupled over the nodes,i.e., it can be maximized independently for each node. For our use case, we assume thatthe graph structure (the true neighborhood) is known for each node. We next describethe likelihood function used in [40] and then propose a method for making the likelihoodoptimization online.Offline MLE Learning: Let θuv = − ln(1− pu,v) where pu,v is the influence probabil-ity of the edge (u,v). We can characterize the likelihood in terms of ~θ , the vector of“θ values”. The overall likelihood function L(~θ) is a sum of the likelihood functionsfor individual cascades i.e., L(~θ) = ∑Cc=1 Lc(~θ), C being the number of cascades. Thelikelihood function L decomposes over the network nodes, i.e., L(~θ) = ∑Cc=1∑v∈V Lcv(~θ)where Lcv(θ) models the likelihood of observing the cascade c w.r.t. node v. Let tcu denotethe timestep at which node u becomes active in cascade c. Then Lcv(~θ) can be written as18follows:Lcv(~θ) =− ∑u:tcu≤tcv−2θuv + ln(1− exp(− ∑u:tcu=tcv−1θuv)) (3.6)The first term corresponds to unsuccessful attempts by u in activating node v, whereasthe second term corresponds to the successful activation attempts. Notice that pu,v is0 whenever u is not a parent of v. Since the likelihood function is concave, we canmaximize it for each node v and each cascade c separately, using gradient descent, using~θ q+1∗v = ~θq∗v−ηq∇(−Lcv(.)) (3.7)where ∇(−Lcv(.)) is the gradient of −Lcv(.) and ~θ ∗v is the vector containing the θuv fromall parent nodes u of v. Here, q corresponds to the gradient descent iteration and ηq isthe step size used in iteration q.Online MLE Learning: We use a result from online optimization of convex functions,to maximize the likelihood function above in an online streaming setting. Zinkevich [52]develops a method for online optimization of convex functions over a convex set. InZinkevich’s framework, there is a fixed convex set F of n-dimensional points, known inadvance. A series of convex functions cs : F → R stream in, with cs being revealed attime s. This framework makes no assumption about the distribution of the cs functions.At each timestep s, the online algorithm must choose a point xs ∈ F before seeing theconvex function cs. The objective is to minimize the total cost across T timesteps, i.e.,Coston(T ) = ∑s∈[T ] cs(xs), for a given finite T . Consider an alternative offline settingwhere the cost functions cs are revealed in advance s ∈ [T ], for T timesteps. The offlinealgorithm is required to choose a single point x ∈ F such that Costo f f (T ) = ∑s∈[T ] cs(x)is minimized. Zinkevich defines the loss1 of the online algorithm compared to the offlinealgorithm as Loss(T ) = Coston(T )−Costo f f (T ). Notice that the loss depends on thenumber of steps T and it can also be measured in an average sense, i.e., as Loss(T )/T .Zinkevich [52] proposes a greedy algorithm for updating the estimates xs i.e., xs+1 =xs−ηs∇(cs(xs)).For our IM application, the cost functions cs correspond to the negative likelihoodfunction −Lsv(.) for the cascade generated in round s of the CMAB game. Notice thatthese functions are convex. Thus, the online MLE method works as follows: at eachround, we pick a seed set at random, observe the resulting cascade, and update the “θ1He used the term regret which we rename to loss, to avoid confusion with the regret studied in Sec-tion 3.3.19values” using Eq.(3.7), the exact same update equation used in the offline MLE algo-rithm. The only difference with the offline method lies in not requiring all cascades upfront and in using only the last cascade observed (resulting from the last seed set chosen)to perform the update.Performance of Online MLE Learning: In our online algorithm, we seek to mini-mize the negative likelihood function across T rounds of the CMAB game and estimatethe true influence probabilities. Let F be the convex set of ~θ vectors corresponding topossible values for network influence probabilities. Let dia(F) be the maximum Eu-clidean distance d between any two points x,y ∈ F , i.e., dia(F) := maxx,y∈Fd(x,y). Theobjective is to minimize the total “cost” incurred until round T . We evaluate the onlinemethod of learning influence probabilities by comparing it with the offline algorithm in[40]. Let ~θbatch denote the parameters learned by their offline algorithm. Let ~θ s de-note the parameters learned by our online algorithm at the end of round s. (Recall thatθuv :=− ln(1− pu,v).) We can show:Theorem 2. Let ~θbatch be the set of parameters learned offline with the cascades avail-able in batch, and ~θ s be the estimate for the parameters in round s of the CMAB frame-work. Let Lsv be the likelihood function at round s for the node v under consideration, dvbe the in-degree of v, T be the total number of rounds and G = maxs∈[T ]||∇(−Ls(~θs))|| bethe maximum L2-norm of the gradient of the negative likelihood function over all rounds.Suppose we perform greedy updates to the estimates~θ s using Eq.(3.7) with ηs decreasingas 1√s . Then we have the following:T∑s=1(Lsv(~θ batch)−Lsv(~θs)≤dvθ 2max√T2+(√T −12)G2. (3.8)Proof. The proof is an adaptation of the loss result from [52], reproduced below:Loss(T )≤dia(F)2√T2+(√T −12)||∇(cmax)||2 (3.9)where ||∇cmax|| is the maximum gradient obtained across the T rounds in the frameworkof [52]. Let the true influence probabilities lie in the range (0, pmax], for some pmax. Thenthe θ values for various edges lie in the range (0,θmax) where θmax = − ln(1− pmax).Our optimization variables are ~θ and the cost function cs in our setting is −Lsv where1 ≤ s ≤ T . Furthermore, in our case, dia(F) =√dvθmax since this is the maximumdistance between any two “θ -vectors” and Loss(T ) = ∑Ts=1(Lsv(~θbatch)−Lsv(~θ s), where20~θbatch is the optimal value of ~θ , given all the cascades as a batch. Substituting thesevalues in Eq. 3.9, we obtain Eq. 3.8, proving the theorem.The significance of the theorem can be best understood by considering the averageloss over T rounds. The average loss Loss(T )T can be seen to approach 0 as T increases.This shows that with sufficiently many rounds T , the parameters learned by the onlineMLE algorithm are nearly as likely as those learned by the offline algorithm of [40]. Asthe number of cascades (rounds) becomes sufficiently large, ~θ batch approaches the “true”parameters and hence by the above result, as T increases, the parameters ~θ s tend to thetrue ones, on an average. Hence, the online approach for maximumizing the likelihoodresults in a consistent estimator of the influence probabilities. Before closing, we remarkthat the framework of [52] on which this result is based, makes no distributional assump-tions on the cascades being generated and hence the above result does not depend on thenetwork exploration algorithm used. For instance, the result holds whether successiveseed sets are chosen at random or using any criteria.3.2 Network ExplorationThe objective of network exploration is to obtain good estimates of the network’s influ-ence probabilities, regardless of the loss in spread in each round and it thus requires pureexploration of the arms. We seek to minimize the error in the learned (i.e., estimated)influence probabilities ~ˆµ w.r.t. the true influence probabilities~µ i.e. minimize ||~ˆµ−~µ||2.We study two exploration strategies – random exploration, which chooses a random su-perarm at each round and strategic exploration, which chooses the superarm which leadsto the triggering of a maximum number of edges which haven’t been sampled sufficientlyoften.3.2.1 Random ExplorationThe idea of random exploration is simple: at each round of the CMAB game, pick a seedset of size k at random. This starts a diffusion and based on the observations, we canupdate the current influence probabilities according to the desired feedback model. Thepseudocode is given in Algorithm 1.Frequentist Feedback: We next consider random exploration with a frequentist feed-back used for estimating the influence probabilities. We already discussed the frequentistupdate method in Section 3.1 (see Algorithm 4 and Sections 3.1.2.1 and 3.1.2.2.1). In21Algorithm 1: RANDOM EXPLORATION(budget k, Feedback mechanism M)1 for s = 1→ t do2 ES = Explore(G,k);3 Play the superarm ES and observe the diffusion cascade c ;4 ~ˆµ = UPDATE(c,M) ;this section, we mainly focus on its performance. Specifically, we address the importantquestion, how many CMAB rounds, equiv., number of cascades, are required in orderto learn the influence probabilities accurately? We settle this question below for randomexploration, assuming frequentist estimation of probabilities under edge level feedback.Thereto, we resort to the correlation decay assumption commonly made for randomprocesses over large graphs [40]. This assumption intuitively says that diffusion fromeach seed does not travel far. More formally, we assume that there is a number γ ∈ (0,1)which bounds the sum of incoming probabilities of edges to any node: precisely, ∀v ∈V : ∑(u,v)∈E pu,v ≤ (1− γ). From a straightforward application of Lemma 1 in [40], wecan show that the probability that any node v in V gets active is bounded by pinitγ , wherepinit is the probability that node v is a seed node. When the seed budget is k and the seedsare chosen at random, we have pinit = k|V | . We have the following result.Theorem 3. Consider an edge (ui,v) in the network with true probability pi := pui,v.Suppose that random exploration with frequentist estimation under edge level feedbackis used to learn the influence probabilities and that the number of available cascades Csatisfies the inequalityC ≥3γ|V | ln( 1δ )ε2 pik(3.10)where k is the seed budget, γ is the correlation decay bound, and |V | is the number ofnodes in the network. Then the influence probability of the edge (ui,v) can be learned towithin a relative error of ε with probability greater than 1−δ .Proof. Consider the edge (ui,v) which corresponds to the arm i in the CMAB framework.Suppose the number of CMAB rounds that are played in all is C and of these, the numberof rounds in which arm i (edge (ui,v)) is played is Ci. By the semantics of the IC model,this is equivalent to the number of cascades in which node ui gets activated. We establisha bound on the number of cascades Ci needed to learn the influence probability for edge22(ui,v) to within a relative error of ε and use it to bound the total number of cascades Cneeded. Arm i follows a Bernoulli distribution with mean as the true probability pi :=pui,v. Let the random variable Xi,s denote the outcome of whether the edge (ui,v) becamelive in the sth cascade, i.e., in the sth round in the CMAB game. Define X = ∑Cis=1 Xi,s.Using Chernoff bounds, we getPr[X ≥ (1+ ε)Ci pui,v]≤ exp(−ε2Ci pi3)Rewriting, we getPr[µˆEi − pipi]≥ ε]≤ exp(−ε2Ci pi3) (3.11)Bounding by δ the probability that the relative error exceeds ε , we getexp(−ε2Ci pi3)≤ δ (3.12)Ci ≥3ln(1/δ )piε2(3.13)This gives a bound on the number of times arm i needs to be sampled in order tolearn the probabilities to within a relative error of ε . If C is the total number of cascades,then we havePr[nodeui becomes active]≤pinitγ (3.14)Ci ≤Cpinitγ (3.15)From Eq. (3.13) and (3.15),C ≥3γ ln( 1δ )pinitε2 pi(3.16)(3.17)If the seeds are chosen randomly and the seed budget is k, then pinit = k|V | . Plugging thisinto Eq.(3.17), the theorem follows.This result gives the minimum number of cascades required in order to learn the23influence probabilities within a given relative error, with high probability. The is alsothe number of rounds of CMAB needed to learn those probabilities. Thus, this resultapplies for both offline and online learning. The result is couched in terms of edge levelfeedback. We can combine the result from Theorem 1 to bound the number of cascadesneeded in case of node level feedback. We leave this for future work.3.2.2 Strategic ExplorationRandom exploration doesn’t use information from previous rounds to to select seeds andexplore the network. On the other hand, a pure exploitation strategy selects a seed setaccording to the estimated probabilities in every round. This leads to selection of a seedset which results in a high spread and consequently triggers a large set of edges. However,after some rounds, it stabilizes choosing the same/similar seed set in each round. Thusa large part of the network may remain unexplored. We combine ideas from these twoextremes, and propose a strategic exploration algorithm: in each round s, select a seed setwhich will trigger the maximum number of edges that have not been explored sufficientlyoften until this round. We instantiate this intuition below.Recall Ti is the number of times arm i (edge (ui,v)) has been triggered, equivalently,number of times ui was active in the T cascades. Writing this in explicit notation, letT s(u,v) be the number of times the edge (u,v) has been triggered in the cascades 1 throughs, s ∈ [T ]. Define value(u) := ∑v∈Nout(u)1T(u,v)+1. Higher the value of a node, the moreunexplored (or less frequently explored) out-edges it has. Define value-spread of a setS ⊂ V exactly as the expected spread σ(S) but instead of counting activated nodes, weadd up their values. Then, we can choose seeds with the maximum marginal value-spreadgain w.r.t. previously chosen seeds. It is intuitively clear that this strategy will chooseseeds which will result in a large number of unexplored (or less often explored) edges tobe explored in the next round. We call this strategic exploration (SE). It should be notedthat the value of each node is dynamically updated by SE across rounds so it effectivelyshould result in maximizing the amount of exploration across the network. We leavederiving formal bounds for strategic exploration for future work.3.3 Regret MinimizationWe first formally define the notion of regret, tailored to the IM problem. Intuitively, givena seed budget k, we select seed sets Ss of size k in each round s, each corresponding to24Algorithm 2: EXPLORE(Graph G = (V,E,~µ), budget k)//Returns superarm for an exploration round ;1 Select k nodes at random from V to form seed set S ;Output ESAlgorithm 3: EXPLOIT(Graph G = (V,E,~µ), Oracle O, budget k)//Returns superarm for an exploitation round ;1 S = O(G,k) //Oracle returns optimal seed set using current ~µ estimates ;Output ESa superarm ESs . By selecting seeds under imperfect knowledge, we achieve suboptimalspread. Let ~µ denote the vector of true influence probabilities of the network edges.Let r~µ (ES) denote the reward from playing the superarm ES, under the true means ~µ .For IM, r~µ (ES) = σ¯(S), i.e., spread of S under the IC model, using the true influenceprobabilities. Let opt~µ = maxS:|S|=kσ(S), i.e., the optimal spread.If the true influence probabilities are known, using the oracle, we can achieve a spreadof α ·β ·opt~µ . Let A be any strategy used for choosing seeds in successive rounds. LetSAs be the seed set chosen by A in round s. There is a clear tradeoff between exploringand exploiting: exploiting improves the quality of seeds selected, but only relative tothe current knowledge of influence probabilities, which may be imperfect; exploringimproves the knowledge of influence probabilities, but at the expense of choosing seedsof lower quality. The regret incurred by A can be quantified asRegAµ,α,β = T ·α ·β ·optµ −ES[ T∑s=1r~µ (SAs )],where the expectation is over the randomness in the seed sets output by the oracle. Wenow discuss several strategies for seed selection over the rounds. The strategies, given asAlgorithms 5-7, invoke the subroutines in Algorithms 2-4. Given a network G and budgetk, Algorithm 2 outputs a random subset of size k from V as the seed set. Algorithm 3, onthe same input, consults an oracle and outputs the seed set of size k that maximizes thespread according to current mean estimates. Algorithm 4 examines the latest cascade cand updates the parameters using the desired feedback mechanism M.Combinatorial Upper Confidence Bound: The Combinatorial Upper ConfidenceBound (CUCB) algorithm was proposed in [14] and shown to achieve logarithmic regret25Algorithm 4: UPDATE(Cascade c, Feedback mechanism M)1 if M == ‘Edge Level’ then2 ∀i update µˆi according to scheme in section 3.1.2.1 ;3 else if M == ’Node Level Frequentist’ then4 ∀i update µˆi according to scheme in section 3.1.2.2.1 ;5 else6 ∀i update µˆi according to scheme in section 3.1.2.2.2 ;Output ~ˆµAlgorithm 5: CUCB(budget k, Feedback mechanism M)Initialization1 Initialize ~ˆµ ;2 ∀i initialize Ti ;3 for s = 1→ T do4 ES = Exploit(G,O,k) ;5 Play the superarm ES and observe the diffusion cascade c ;6 ~ˆµ = Update(c,M) ;7 ∀ arm i that was triggered, increment values of Ti,s ;8 ∀ i µˆi = µˆi +√3ln(s)2Ti;in the number of rounds (T ). It is a variant of the popular upper confidence boundalgorithm. The algorithm assumes that each edge in the network has been triggered atleast once initially and maintains an overestimate µi of the mean estimates µˆi. µi =µˆi +√3ln(t)2Ti. Pure exploitation, i.e., finding the best seed set using the oracle O with theµi values as input leads to implicit exploration and is able to achieve optimal regret [14].The pseudocode appears in Algorithm 5.ε-Greedy: Another strategy proposed for the CMAB problem in [14] is the ε-Greedystrategy. In each round s, this strategy involves exploration – select k seeds at randomwith probability εs and exploitation – with probability 1− εs use the influence maxi-mization oracle with the mean estimates µˆi as input to select the seed set. Chen et al.[14] show that that if εs is annealed as 1/s, logarithmic regret can be achieved. Thepseudocode appears in Algorithm 6.As mentioned above, both the CUCB and ε-greedy approaches have been shown toachieve optimal regret under edge level feedback [14]. The regret proofs for both thesealgorithms rely on the mean estimates. To characterize the regret for node level feedback,26Algorithm 6: ε -GREEDY(budget k, Feedback mechanism M, parameter ω)1 Initialize ~ˆµ ;2 for s = 1→ T do3 εs = ωs ;4 With probability 1− εs, ES = Exploit(G,O,k) ;5 Else ES = Explore(G, k) ;6 Play the superarm ES and observe the diffusion cascade c ;7 ~ˆµ = Update(c,M) ;Algorithm 7: INITIAL EXPLORATION(budget k, Feedback mechanism M, param-eter ζ )1 Initialize ~ˆµ ;2 for s = 1→ T do3 if s≤ ζT then4 ES = Explore(G, k) ;5 else6 ES = Exploit(G,O,k)7 ;8 Play the superarm ES and observe the diffusion cascade c ;9 ~ˆµ = Update(c,M) ;we can use the mean estimates for node level feedback from Theorem 1 and essentiallyrepeat the proofs to characterize the regret. For lack of space, we suppress these resultsand refer the reader to [14].Initial Exploration: In this strategy, we explore for the first ζ fraction of rounds andthen exploit. The pseudocode in Algorithm 7 is self-explanatory.Pure Exploitation: This strategy exploits in every round. Even if we have no knowl-edge about the initial probabilities, it results in implicit exploration: the initially pickedseed sets produce cascades which are used to learn the probabilities. This amounts toinvoking Algorithm 3 for T rounds.3.4 ExperimentsGoals: Our main goal is to test the various algorithms and evaluate the exploration al-gorithms on the error in learning the influence probabilities, and regret minimizationalgorithms on regret achieved. Our main goal is to benchmark the proposed algorithms27on the regret minimization and network exploration tasks.Datasets: We use 3 real datasets – NetHept, Epinions and Flickr, whose characteristicsare summarized in Table 3.1. The “true” probabilities for these datasets are not available,so we had to synthetically generate them: we set them according to the weighted cascademodel [33]: for an edge (u,v), the influence probability is set to pu,v = 1|Nin(v)| . Theprobabilities to be learned are initialized to 0.Dataset | V | | E | Av.Degree Max.DegreeNetHEPT 15K 31K 4.12 64Epinions 76K 509K 13.4 3079Flickr 105K 2.3M 43.742 5425Table 3.1: Dataset characteristicsExperimental Setup: We simulate the diffusion in the network by sampling a determin-istic world from the probabilistic graph G: for purposes of our experiments, we assumethat the diffusion cascade in the real world happened according to this deterministicgraph. For this, we sample each edge in the graph according to its “true” probability. Wevary the size of the seed set or budget k from 10 to 100. For our influence maximizationoracle, we use the TIM algorithm proposed by Tang et al. [50], which is based on thenotion of reverse reachable (RR) sets. For lack of space, we refer the reader to [50] fordetails of the algorithm but note that (i) it obtains a (1− 1e − ε)-approximation solutionin near optimal time, (ii) it is much faster than using Greedy algorithm with MC simula-tions, and (iii) it readily serves as a (1− 1e ,1−1|E|)-approximation oracle for IM. Owingto runtime constraints, we use a value of ε = 0.5. We have verified that smaller, more ac-curate values of ε give us qualitatively similar results. We run the bandit algorithm for atotal of 1000 rounds. For each round, we find the “actual” spread obtained using the dif-fusion according to the deterministic graph generated. We find this value with seed setsobtained using both the learned probabilities as well as the “true” probabilities. Thus, weare able to calculate the regret (difference in spread) in each round of the process. Wealso plot the relative L2 error between the true and the learned probabilities. In additionto this, we find the fraction of edges f within a relative error of p = 10% and plot f as pis varied. All our results are those obtained by averaging across 3 runs.Algorithms Compared: (abbreviations used in plots are in brackets) Regret Minimiza-tion: CUCB, ε-greedy(EG), Initial Exploration(IE), Pure Exploitation(PE). Network Ex-ploration: Random Exploration(R), Strategic Exploration(S). Feedback Mechanisms:28Edge Level(EL), Node Level Frequentist (NLF), Node Level Maximum Likelihood (NL-ML). Using algorithm A with feedback M is abbreviated as A-M.In order to characterize the runtime for the various algorithms, we characterize theruntime for each of the basic components - EXPLORE, EXPLOIT and UPDATE (underall three feedback mechanisms) for all three datasets. 3.2.Dataset Exploit Explore UpdateEL NL-F NL-MLNetHEPT 2.682 0.0002 0.026 0.0275 1.148Epinions 245.225 0.0014 0.214 0.224 15.156Flickr 97.808 0.0019 1.452 2.479 690.103Table 3.2: Subroutine runtimes (in sec/round)Network Exploration: Figure 3.3(a) shows the L2 error obtained by using RandomExploration and Strategic Exploration strategies, coupled with Edge level feedback andboth the frequentist and maximum likelihood based node level feedback mechanisms.First, we can see that strategic exploration is better than just choosing nodes at ran-dom because it incorporates feedback from the previous rounds and explicitly tries toavoid those edges which have been sampled (often). We observe that the decrease in er-ror for frequentist approaches is much quicker as compared to the maximum likelihoodapproach. As expected, edge level feedback shows the fastest decrease in error. Thefrequentist node level feedback takes more aggressive steps as compared to the maxi-mum likelihood approach which is more principled and moderate towards distributingcredit. However, since the failure probability is low in typical networks, the aggressive-ness of the frequentist approach pays off leading to a very quick decrease in error. InFigure 3.3(b), we plot the fraction of edges which are within a relative error of 10% oftheir true probabilities. Figure 3.3(c) depicts an ROC-type curve which shows how thefraction of edges whose probabilities are learned within a given precision. We observethat there is a quick decrease in L2 error and in just 1000 rounds, our network explorationalgorithm is able to learn a large percentage of edges lie within 20% error. Since we havethe flexibility to generate cascades to learn about the hitherto unexplored parts of the net-work, our network exploration algorithms can lead to a far lesser sample complexity ascompared to algorithms which try to learn the probabilities from a given set of cascades.Regret Minimization: For our regret minimization experiments, we compare CUCB,ε-Greedy, Initial Exploration, Pure Exploitation. We explored random seed selection29(a) L2 Error (b) Fraction of edges within 10% Rel Err(c) Fraction of edges with p% Rel ErrFigure 3.3: NetHept,k = 50: Network Explorationas a baseline but the regret it achieved is much worse than that of all other algorithmsand it barely decreases over rounds, so we omit it completely from the plots. Since themaximum likelihood approach is not able to learn the probabilities as effectively as thefrequentist approach and also has a high update runtime 3.2, we use just the frequentistapproaches in the regret minimization setting. For ε-Greedy, we set the parameter ω = 5and for initial exploration we set ζ = 0.2. For CUCB, if the update results in any µiexceeding 1, we reset it back to 1, following [14]. All the probabilities are initialized to0. We show the results for all 3 datasets for k = 50 for both the node level and edge levelfeedback. Results for other k are similar.Figure 3.4(a) shows the decrease in average regret as the number of rounds increases.We see that convergence for the CUCB is slow. This is because a UCB-based algorithm isbiased towards exploring edges which have not been triggered often (or even once). Since30(a) NetHept (b) Epinions(c) FlickrFigure 3.4: k = 50:Regret vs Number of Roundstypical networks consist of a large number of edges, the rate of decrease in regret forCUCB is slow. This behaviour is observed for other datasets as well, so we omit CUCBfrom further plots. For algorithms such as ε-Greedy and Initial Exploration, we cancontrol the amount of exploration using the parameters ω and ζ . For these algorithms,it can be seen from Figures 3.4(a)-(c), that the regret decreases at a much faster rate,becoming very low in 1000 rounds. This implies that at the end of 1000 rounds, theprobabilities are estimated well enough to lead to a comparable spread as against theknown probabilities. For Initial Exploration, the regret remains almost the same duringthe exploration phase in the first 200 rounds, after which its regret steeply decreases.Pure Exploitation achieves the best average regret at the end of 1000 rounds. This is notuncommon for cases where the rewards are noisy. Initially, with unknown probabilities,rewards are noisy in our case, so exploiting and greedily choosing the best superarm31often leads to very good results. We also observe that the edge level and node levelfeedback achieve almost the same regret in several cases and this leads us to conjecturethat the credit distribution in node level feedback does not lead to much error and hencethe failure probability ρ is indeed low for typical social networks. In fact, we can useEq. (3.1) to find the failure probability. For the NetHept dataset, we find that the averagenumber of active parents K for a node is 1.175. Hence credit distribution is not a bigissue. Previous work has shown that the probabilities learned from diffusion cascadesare generally small [26, 40, 47]. E.g., if pmin = 0 and pmax varies from 0.01 to 0.2, thefailure probability ρ varies from 0.0115 to 0.2261. Hence the failure probability of theproposed node level feedback is small. We obtain similar results on the Epinions andFlickr datasets as well.Figure 3.5: NetHept, k = 50: L2 error vs Number of RoundsIn order to better understand the behaviour of the algorithms, we also plot the L2error on influence probabilities. We show the results for NetHept in Figure 3.5. As therounds progress, the mean estimates improve and the relative L2 error goes down. Thisleads to better estimates of expected spread, the quality of the chosen seeds improves,the true spread increases and hence the average regret goes down (see Figure 3.4). Wecan see that although pure exploitation led to the best regret, it narrowed down on theseed set to be chosen quickly and did not learn about the other edges in the network.ε-Greedy and Initial Exploration however did a fair bit of exploration and hence have alower L2 error. However, for the larger datasets, pure exploitation explores the networkmore before settling on a seed set. Hence, the L2 error is as small as the other algorithms.For brevity, we omit the plots. We also observe that the L2 error increases for the CUCBalgorithm. This is because of the resetting of mean estimates above 1. Since the influence32probabilities are small, resetting them to 1 leads to a large error.k EG-NL spread EG-EL spread10 6549.04 6599.0620 8696.23 8616.9050 11998.53 11986.96100 14770.16 14730.80Table 3.3: Epinions: Spread vs kWe show the effect of changing k in Table 3.3. We calculate the average spreadobtained using the learned probabilities across the rounds. We observe that the spreadobtained using node level feedback is always close to that using edge level feedback andhence failure probability is small.334Adaptive Influence MaximizationThe measure of intelligence is the ability to change.– Albert Einstein4.1 TheoryWe introduced the idea of adaptive influence maximization in chapter 1 and hypothesizedthat following an adaptive strategy by reacting to market feedback can potentially leadto large gains for both the MAXSPREAD and MINTSS problems. In this chapter, wegive algorithms for adaptive influence maximization and formalize the benefit of goingadaptive. The toy example below provides basic intuition about the advantage of adaptiveseed selection over non-adaptive seed selection.(a) Original probabilistic network (b) Network after diffusion for oneseedFigure 4.1: Example for adaptive versus non-adaptive seed selectionExample 1 (Adaptive Seed Selection). Figure 4.1(a) shows a probabilistic network. The34influence probabilities (edge weights) corresponding to (u,v1) and (u,v2) are p1 and p2respectively with p2 < p1. All remaining edges have a probability p. We need to selectk = 2 seeds for this network. It is easy to see that node u has the maximum marginal gain,i.e. activating it will result in maximum number of nodes being activated in expectation.Given that u has been selected as a seed node, we observe that v2 has a higher marginalgain since it has lesser probability of being activated by u than v1 and activating it willlead to a greater number of additional nodes being influenced (4p as compared to 3pfor v1, in expectation). Hence the non-adaptive strategy will select nodes u and v2 asthe seed nodes. Suppose the true world consists of 4 links – (u,v2), (v1,z1), (v1,z2)and (v1,wz). Hence the non-adaptive strategy will result in an actual spread of 2 in thistrue world. An adaptive strategy selects the first seed node, observes the diffusion andthen selects the second seed node. It selects the first seed node as u, then observes thenetwork feedback. The precise feedback mechanism is described later. Figure 4.1(b)shows the observed network after seeding the first node. The double-circle for u means ithas already been seeded. Links shown by a bold arc ((u,v2)) are no longer probabilisticand are known to exist in the true world. Similarly, links shown by a grey arc ((u,v1) andall links outgoing from v2) are known not to exist. The other links are still probabilistic,i.e., they have not yet been revealed in the feedback. Given this, the adaptive strategy willselect node v1 as the second seed, resulting in an actual spread of 6 in this true world.This example gives an intuition on why we expect adaptive strategies to do better thannon-adaptive seed selection.Recall, we say time horizon is unbounded if H ≥ kD, where k is the seed budget, Dis the length of the longest path of G, and H is the time horizon. We consider unboundedtime horizon up to Section 4.1.2. Bounded time horizon is addressed in subsection 4.1.3.Our first result is that our spread function based on node level feedback is adaptive mono-tone and adaptive submodular, thus affording an efficient approximation algorithm.Theorem 4. For unbounded time horizon, if the diffusion process is allowed to com-plete after every intervention, node level feedback is equivalent to edge level feedbackw.r.t. marginal gain computation and therefore the expected spread function is adaptivesubmodular and adaptive monotone under node level feedback.Proof. We will show that node level feedback is equivalent to edge level feedback fromthe perspective of marginal gain computation. In [22], the authors show that the expectedspread function under edge level feedback is adaptive monotone and adaptive submod-35ular. The above theorem will follow from this. Specifically, we prove that (a) for everyedge level feedback based network state, there is a corresponding state based on nodelevel feedback, which preserves marginal gains of nodes, and (b) vice versa.Given edge level feedback, we clearly know which nodes are active. These are pre-cisely nodes reachable from the seeds via live edge paths in the revealed network. In therest of the proof, we show that for each node level feedback state, there is a correspond-ing edge level feedback state that preserves marginal gains. Let S0 be the set of seedschosen at time t = 0. Given node level feedback, we can infer the corresponding edgelevel feedback based network state using the following rules. Consider an edge froman active node u to node v. Notice that the status of an edge leaving an inactive node isunknown in either feedback model.Rule 1: If node u is active, v is inactive, and there is an edge from u to v, then infer thatedge (u,v) is dead.Rule 2: If nodes u and v are both active and u is the only in-neighbor of v, then concludethat the edge (u,v) is live.Rule 3: If nodes u and v are both active and u has more than one in-neighbor, arbitrarilyset the status of the edge (u,v) to be live or dead.We now show that the way edge status is chosen to be live or dead in Rule 3 plays norole in determining the marginal gains of nodes. We make the observation that if the dif-fusion process is allowed to complete after each intervention, the only extra informationabout the network that is observed using edge level feedback over node level feedback isthe status of edges between 2 active nodes. Given that the node u is active, we need tocalculate the marginal gain of every other node in the network for the next intervention.Next we show that the status of edges between 2 active nodes does not matter in themarginal gain computation for any node.For the rest of the argument, we consider the both u and v are active and that v hasmultiple active in-neighbours, i.e., the case that is addressed by Rule 3. Consider anarbitrary node w the marginal gain of which we need to calculate. There maybe multiplepaths from w to a node reachable from w. These paths can be classified into those whichinvolve the edge (u,v) and ones which don’t. The marginal gain computation involvingthe latter paths is independent of the status of the edge (u,v). Since the diffusion processis allowed to complete, all nodes which can be reached (in the ”true” possible world)from w through (u,v) have already been activated. Hence paths going through (u,v)do not contribute to the marginal gain for w. Thus, the status of the edge (u,v) does36not matter. Since w is any arbitrary node, we can conclude that the marginal gain ofevery node remains the same under states based on both feedback models. Adaptivemonotonicity and submodularity are properties of marginal gains. Since marginal gainsare preserved between edge level and node level feedback, it follows that these propertiescarry over to our node level feedback model.4.1.1 MAXSPREADThere are four types of policies – the greedy non-adaptive policy (abbreviated GNA),the greedy adaptive policy (GA), the optimal non-adaptive policy (ONA) and the optimaladaptive policy (OA). We use piGA,k to denote the greedy adaptive policy constrained toselect k seeds and σ(piGA,k) to refer to the expected spread for this policy. While pre-vious results bound the performance of greedy (adaptive) policies in relation to optimaladaptive policies, they do not shed light on practically implementable policies under ei-ther setting. These previous results do not quite answer the question ”What do we gainin practice by going adaptive?” since both the optimal non-adaptive or optimal adaptivepolicies are intractable. We establish relations between two key practical (and henceimplementable!) kinds of policies – the greedy non-adaptive policy and the greedy adap-tive policy – for both MAXSPREAD and MINTSS. These relations quantify the average“adaptivity gain”, i.e., the average benefit one can obtain by going adaptive.We first restate Theorem 7 from [16]. This theorem gives a relation between thespreads obtained using a batch greedy adaptive policy which is constrained to selectseeds in batches of size b and the optimal adaptive policy.Fact 1. If σ(piGA,lb) is the average spread obtained by using a greedy batch policy witha batchsize b and σ(piOA,mb) is the spread using an optimal sequential policy (the opti-mal policy if we are able to select one seed per intervention) constrained to selecting anumber of seeds divisible by the batchsize b, thenσ(piGA,lb)> (1− e−lαγm )σ(piOA,mb) (4.1)where α is the multiplicative error in calculating the marginal gains. gamma is a con-stant and equal to ( ee−1)2.Proposition 1. Let the horizon be unbounded. Let piGA,nGA be the greedy batch policy thatselect nGA seeds overall in batches of size bGA, and let piOA,nOA be the optimal adaptive37policy that selects nOA seeds overall in batches of size bOA. Thenσ(piGA,nGA)≥[1− exp(−⌈nGAbGA⌉αγ⌈nOAbOA⌉)]σ(piOA,nOA) (4.2)where α ≥ 1 is the multiplicative error in calculating the marginal gains and γ = ( ee−1)2is a constant.Proof. Fact 4.1 gives a relation between the spreads obtained by a batch greedy adap-tive policy constrained to select lb seeds and the optimal adaptive policy constrained toselect mb seeds. Both these policies are constrained to select seeds in batches of sizeb. The relation is in terms of the number of batches used by the policies. Let l andm be the number of batches for the greedy and optimal policies respectively. We makethe following observations. First, the two policies can be constrained to select seeds indifferent batchsizes, bGA and bOA respectively. Next, the number of seeds selected by thepolicies need not be divisible by the batchsizes. We can follow a similar proof procedureas Theorem 7 in [16] and replace l by dnGAbGA e and m by dnOAbOAe.Theorem 5. Let piGNA,k be a greedy non-adaptive policy, piGA,k and piOA,k be the greedyand optimal adaptive policies respectively with batch-sizes equal to one i.e. the adaptivepolicies are sequential. All policies are constrained to select k seeds. Then we have thefollowing relations:σ(piGA,k)≥ (1− e−1/αγ)σ(piOA,k) (4.3)σ(piGNA,k)≥ (1−1e− ε)2σ(piOA,k) (4.4)Proof. Proposition 1 gives us bounds on the ratio of the spread achieved by batch-greedyadaptive policy and that achieved by the optimal adaptive policy. We set nOA = k andbGA = bOA = 1 and obtain equation 4.3 of the theorem.Theorem 2 of [3] states that for a submodular monotone function, there exists a non-adaptive policy which obtains (1−1/e− ε) fraction of the value of the optimal adaptivepolicy. In our context, this implies that the spread due to an optimal non-adaptive policyconstrained to select nONA seeds is within a (1− e−nONA/nOA − ε) factor of the spread ofan optimal adaptive policy selecting nOA seeds. More precisely,σ(piONA,nONA)≥ (1− e−nONA/nOA− ε)σ(piOA,nOA) (4.5)38The classical result from Nemhauser [39] states that the greedy non-adaptive algorithmobtains a (1−1/e−ε) fraction of the value of the optimal non-adaptive algorithm, whereε is the additive error made in the marginal gain computation. Moreover if the greedynon-adaptive policy is constrained to select nGNA seeds and the optimal non-adaptivepolicy selects nONA seeds we have the following:σ(piGNA,nGNA)≥ (1− e−nGNA/nONA− ε)σ(piONA,nONA) (4.6)Combining equations 4.5 and 4.6, we obtain the following resultσ(piGNA,nGNA)≥ (1− e−nGNA/nOA− ε)(1− e−nGNA/nOA− ε)σ(piOA,nOA) (4.7)Setting nGNA = nGA = nOA = k, we obtain equation 4.4 of the theorem.Discussion: To clarify what this theorem implies, lets assume that we can estimatethe marginal gains perfectly. Let’s set ε = 0 and α = 1. We thus obtain the followingrelations: σ(piGA,k)≥ (1−e−1/γ)σ(piOA,k) and σ(piGNA,k)≥ (1− 1e )2σ(piOA,k). These twofactors are almost equal (in fact non-adaptive is slightly better) and in the case of perfectmarginal estimation, there is not much gain in going adaptive. This intuition is confirmedby our experiments in section 4.3.4.1.2 MINTSSGiven that it takes the optimal adaptive policy nOA seeds to achieve a spread of Q, we seekto find the number of seeds that it will take the greedy adaptive and traditional greedynon-adaptive policy to achieve the same spread. Since the non-adaptive policy can beguaranteed to achieve the target spread only in expectation, we allow it to have a smallshortfall βONA. In addition, we allow both the greedy policies to have a small shortfallagainst their optimal variants. We formalize these notions in the following theorem.Theorem 6. Let the target spread to be achieved by the optimal adaptive policy be Q.Let the allowable shortfall for the optimal non-adaptive policy over the optimal adaptivepolicy be βONA. Let βGA and βGNA be the shortfall for the greedy adaptive and non-adaptive policies over their optimal variants. Let the number of seeds required by thefour policies - OA, ONA, GA and GNA be nOA, nONA, nGA and nGNA. Then we have thefollowing relationsnGA ≤ nOA(αγ ln(Q/βGA)) (4.8)39Figure 4.2: Theoretical comparison of adaptive and non-adaptive strategiesnGNA ≤ nOA ln(QβONA−Qε)ln(Q−βONAβGNA− ε(Q−βONA))(4.9)nGNA ≤ nOA ln(QβGA−βGNA−Qε)ln(Q−βGA +βGNAβGNA− ε(Q−βGA +βGNA))(4.10)Proof. If in proposition 1, we set bGA = bOA = 1 , σ(piOA,nOA) = Q and σ(piGA,nGA) =Q−βGA, after some algebraic manipulation we can obtain equation 4.8 of the theorem.Setting σ(piONA,nONA) = Q−βONA, σ(piOA,nOA) = Q in equation 4.5, we obtain the inter-mediate relation 4.11.nONA ≤ nOA ln(QQ−βONA) (4.11)Setting σ(piONA,nONA) = Q−βONA and σ(piGNA,nGNA) = Q−βONA−βGNA, we obtain thefollowing relation.nGNA ≤ nONA ln(Q−βONAβGNA− ε(Q−βONA)) (4.12)We constrain the spreads for the greedy adaptive and greedy non-adaptive policies tobe the same. Hence, Q− βGA = Q− βONA− βGNA. Hence βONA = βGA− βGNA. Bycombining equations 4.11 and 4.12 and substituting βONA as βGA − βGNA, we obtainequation 4.10 of the theorem.Discussion: To understand the implications of this theorem, set α = 1, ε = 0. Letthe βGNA = 2 and βGA = 1, thus allowing for a shortfall of only 2 nodes in the spread. Weobtain the following relations: nGA ≤ nOAγ ln(Q/2) and nGNA ≤ nOA ln(Q) ln((Q−1)/2).Figure 4.2 shows the growth of these functions with Q. We can see that as Q in-creases, the ratio nGAnOA grows much slower thannGNAnOA. Hence, for the MINTSSproblem,there is clearly an advantage on going adaptive. This is confirmed by our experiments in40section 4.3.4.1.3 Bounded Time HorizonIn discrete diffusion models (e.g., IC), each time-step represents one hop in the graph,so the time needed for a diffusion to complete is bounded by D, the longest simple pathin the network. In networks where this length is small [41], most diffusions completewithin a short time. This is also helped by the fact that in practice, influence probabilitiesare small. However, if we are given a very short time horizon, the diffusion processmay not complete. In this case, seed selection is forced to be based on observations ofincomplete diffusions. We show that the spread function in this case is no longer adaptivesubmodular.Theorem 7. The spread function with the IC diffusion model is not adaptive submodularif the diffusion process after each intervention is not allowed to complete.Figure 4.3: Counterexample to show that the spread is not adaptive submodularunder incomplete diffusionProof. We give a counterexample. Consider the network shown in Figure 4.3 and thetrue world, where the edge (u,v) is live and (v,w) is dead. Let H = 2, k = 2. Supposeat t = 0, we choose the seed set S = {u}, so the next intervention must be made at timet = 1. Based on the true world, we observe that nodes u and v are active at time t = 1.Hence we infer the edge (u,v) to be live. We do not know the status of edge (v,w).Even though w is reachable in the network G, there is incomplete information in therealization revealed at t = 1 to decide if the node w is active or not, since the observeddiffusion is incomplete. Thus, the expected spreads w.r.t. the realization above are asfollows: σ(S) = 2+(1− p) and σ(S∪{w}) = 3. Let S′ = {u,v}. Then σ(S′) = 2 andσ(S′∪{w}) = 3. This is because w is one hop away from v ∈ S′ and the realization tellsus that w is not active. Thus, we have σ(S∪{w})−σ(S) < σ(S′∪{w})−σ(S′). Thiswas to be shown.What are our options, given that the spread under bounded time horizon is in generalnot adaptive submodular? Theorem 24 in [22] shows that if a function is not adap-tive submodular, no polynomial algorithm can approximate the optimal expected spread41within any reasonable factor. Thus, we may continue to use adaptive greedy policy,but without any guarantees in general. In our experiments 4.3, we use a novel SequentialModel Based Optimization (SMBO) approach for finding a reasonably good policy whenthe time horizon is bounded.4.2 AlgorithmsTo obtain a greedy adaptive policy, we need to repeatedly select nodes with the maximummarginal gain at every intervention. This implies that we need to run the greedy influ-ence maximization algorithm to compute the marginal gain over the entire network mul-tiple times. Fortunately, this can be done efficiently by exploiting the recent work [50]which describes a near-optimal and efficient greedy algorithm – Two-phase InfluenceMaximization (TIM) for non-adaptive influence maximization. We first review TIM anddescribe the modifications we made to it for the adaptive case.4.2.1 Two phase Influence MaximizationOverview of TIM: Given a budget of k seeds, a network with m edges and n nodes andan appropriate diffusion model such as IC, TIM obtains a (1− 1/e− ε) fraction of theoptimal spread in the non-adaptive case, incurring a near-optimal runtime complexity ofO(k+ l)(n+m)logn/ε2. TIM operates by generating a large number of random ReverseReachable (RR) sets. An RR set is defined for a particular node v and a possible worldW of the network. It consists of the set of nodes that can reach the node v in the possibleworld W . Given enough number (see [50] for an explicit bound) of RR sets, the nodeswhich cover a large number of RR sets are chosen as the seed nodes: the node u whichappears in the maximum number of RR sets is chosen as the first seed. Once a node u isselected as a seed, we remove all the RR sets containing u and the next seed is the nodewhich covers the maximum of the remaining RR sets and so on until a seed set S with knodes is chosen. Tang et al. [50] show that this simple strategy is enough to guarantee a(1−1/e− ε)-approximation factor in near optimal time.Adaptive TIM: In a greedy adaptive policy, we need to select seed nodes in every inter-vention. After each intervention, a certain number of nodes are influenced and becomeactive. These already active nodes should not be selected as seeds. To ensure this, weeliminate all RR sets covered by any of these active nodes. If the number of nodes whichbecame active is large, it brings the number of remaining RR sets below the required42bound, which in turn can invalidate the theoretical guarantees of TIM, as the marginalgain of seeds selected in the next intervention may not be estimated accurately. Hence af-ter each intervention, we need to re-generate the RR sets to effectively select seeds for thenext intervention. To avoid this expensive repeated RR set generation, we instead elimi-nate all active nodes from the original network, by making all the incoming and outgoingedges have a zero probability, and generating the required number of RR sets for the newmodified network. This guarantees that the resulting RR sets do not contain the alreadyactive nodes. This is equivalent to running the greedy non-adaptive algorithm multipletimes on modified networks and results in retaining preserves the theoretical guaranteesof TIM. For the unbounded time horizon, the optimal policy consists of selecting oneseed per intervention and letting the diffusion complete. For the IC model, the diffusioncan take a maximum of D time steps.4.2.2 Sequential Model Based OptimizationIn the case of bounded time horizon (i.e., H < kD), as discussed at the end of Sec-tion 4.1.3, there is no straightforward strategy to find or approximate the optimal policy.The policy depends on the precise values of time horizon H and properties of the network.For MAXSPREAD the two extreme cases are the non-adaptive policy and the completelysequential greedy policy. The non-adaptive policy does not take any feedback into ac-count and is hence suboptimal. For a sequential policy, the inter-intervention time H/kwill be less than D. Hence the completely sequential policy will result in incompletediffusions and from 7 will be suboptimal. A similar reasoning applies for MINTSSF˙orboth problems, we are either forced to seed more than one node per intervention or waitfor less than D time-steps between interventions, or both. We split the problem of findingthe optimal policy into two parts - finding the intervention times and the number of nodesto be seeded at each intervention and which nodes need to be seeded at each intervention.Using the logic in 7, we solve the latter problem by using the adaptive TIM algorithmdescribed above. For the former problem, we resort to a heuristic approach since theexpected spread function we need to optimize does not have any nice algebraic proper-ties w.r.t. time. In order to find the best offline policy, we need to calculate σ for eachcandidate policy. Calculating σ across all the candidate possible worlds is expensive.Thus we need to maximize an expensive function without any nice mathematical prop-erties. Hence we resort to a bayesian optimization technique known as sequential modelbased optimization (SMBO)[32]. The SMBO approach narrows down on the promising43configurations (in our case, policies) to optimize a certain function. It iterates betweenfitting models and using them to make choices about which configurations to investigate.We now show the above problems can be encoded for solving these problems usingSMBO. Consider MAXSPREADW˙e have a maximum of k interventions. Some of theseinterventions may seed multiple nodes whereas other might not seed any. There areanother k− 1 variables corresponding to the inter-intervention times. Since the numberof variables is 2k− 1, SMBO techniques will slow down as k increases. It is also non-trivial to add the constraint that the sum of seeds across all interventions will add tok. Since this leads to an unmanageable number of variables for large k, we introducea parameter p which we refer to as the policy complexity. Essentially, p encodes thenumber of degrees of freedom a policy can have. For every i < p, we have a variablesi which is the number of nodes to be seeded at a particular intervention. We have alsohave a variable ti which encodes waiting time before making the next intervention. Forexample, if p = 2 and s1 = 2, t1 = 5,s2 = 3, t2 = 7 we initially seed 2 nodes, wait for5 time-steps, then seed 3 nodes, wait for 7 time-steps before the next intervention. Inthe next intervention, we repeat the above procedure, until we run out of time, i.e., reachH or get too close (within s1 or s2) to the budget of k seeds. In the latter case, thelast intervention just consists of using the remaining seeds. We use the same strategy toencode policies for MINTSSI˙n this case, however, we stop if the time reaches H or if≥Qnodes become active. Since we have a manageable number of parameters, we can easilyuse SMBO techniques to optimize over these parameters. The objective function for thefirst problem is to maximize the spread. The constraint is covered by the encoding. Forthe second problem, the objective function is to minimize the seeds to achieve a spreadof Q. This can be modelled by introducing penalty parameters λ1 and λ2. The functioncan be written as,minimize g(x)+λ1(Q− f (x))+λ2( f (x)−Q) (4.13)where x is the parameter vector, g(x) is the number of seeds, f (x) is the spread, Q isthe target spread. The parameter λ1 penalizes not achieving the target spread whereas λ2penalizes over-shooting the target spread. λ1 encodes the hard constraint whereas λ2 isused to direct the search.444.3 Experiments4.3.1 DatasetsWe run all our experiments on 3 real datasets – the author collaboration network NetHEPT(15k nodes and 62k edges), the trust network Epinions (75k nodes and 500k edges) andFlixster. On NetHEPT and Epinions where real influence probabilities are not available,we set the probability of an edge into a node v to 1/indegree(v), following the popularapproach [10, 11, 51]. We use the Flixster network under the topic-aware independentcascade model of diffusion [6] for which the authors learned the probabilities using Ex-pectation Maximization. Their processed network has 29k nodes and 10 topics. Wechoose the topic which results in the maximum number of non-zero edge weights. Theresulting sub-network of Flixster consists of 29k nodes and 200k edges.4.3.2 Experimental SetupAs mentioned earlier, we consider only the IC model of diffusion. We compare betweengreedy non-adaptive, greedy sequential adaptive and the batch-greedy adaptive policies.Since the actual true world is not known, we sample each edge in the network accordingto its influence probability and generate multiple true worlds. Since we are interestedin the performance of a policy on an average, we randomly generate generate 100 trueworlds and average our results across them. For either problem, the seeds selected by thenon-adaptive policy is based on expected case calculations and remain the same irrespec-tive of the true world. Only the performance of the policy is affected by the true possibleworld. Also note that for MINTSS in some true worlds the spread of the non-adaptivepolicy might be less than the target Q. The shortfall can be modelled by the factor βintroduced in Section 4.1.4.3.3 Sequential Model Based OptimizationWe use Sequential Model-Based Optimization for General Algorithm Configuration (SMAC)[32]. SMAC is the state of the art tool used for automatic algorithm configuration for hardproblems including SAT and Integer Linear Programming. Based on the performance ofa configuration on certain kinds of benchmark instances characterized by problem spe-cific properties, SMAC creates a random forest model and uses it to choose promisingcandidate configurations to be evaluated. SMAC uses the model’s predictive distribution45to compute its expected positive improvement over the the incumbent (current best solu-tion). This approach automatically trades exploitation for exploration. SMAC can easilyhandle both numerical and categorical parameters.For our case, we need to optimize an expensive black-box function (as defined in theprevious section) over 2p configurations where p is the policy complexity. Because thefunction is hard to evaluate a simple brute-force grid search over the parameter space isnot feasible. SMAC is implicitly able to leverage the structure present in the problemand come up with promising solutions to the problem.The benchmark instances consist of seeds for the random process generating 10 trueworlds at a time. Hence, the evaluation of each configuration on each instance involvesrunning the algorithm 10 times. We use a training set of 1000 such instances and aseparate test set of 50 instances to evaluate the policies found by SMAC. We restrict thenumber of function evaluations SMAC can make to 500 and set the tuner timeout (themaximum time that can be spent by SMAC in building the random forest model anddeciding which configuration to evaluate next) is set to 100 seconds.4.3.4 MAXSPREAD4.3.4.1 Unbounded time horizonFor MAXSPREAD we vary the number of seeds k over {1,10,20,50,100}. For the un-bounded horizon, we compute the spread obtained using the greedy non-adaptive and thegreedy adaptive sequential policies. We set ε = 0.1.(a) NetHEPT (b) FlixsterFigure 4.4: Average Spread vs Number of seeds46Figures 4.4(a) and 4.4(b) show the average spread σ across 100 possible true worldsas the number of seeds is varied in the given range. We quantify the the effect of adap-tivity by the ratio σ(piGA)σ(piGNA) , which we call the average adaptivity gain. We see that theaverage adaptivity gain remains constant as the number of seeds are varied. We obtainsimilar results even with higher (100 to 500) values of k. This finding is consistent withthe observations made in Section 4.1.(a) NetHEPT (b) FlixsterFigure 4.5: Runtime vs Number of seedsFor the adaptive greedy sequential strategy in which we select one seed at a time, wegenerate RR sets for k = 1 and regenerate the RR sets between each pair of interventions.The run-time graphs are shown in Figures 4.5(a) and 4.5(b). As can be seen, although thismethod scales linearly with the number of seeds, it is much slower than the non-adaptivecase and will prohibitive for larger datasets. Instead we can generate a large number ofRR sets upfront and use these sets to select seeds for the first few interventions. TheRR sets are regenerated as soon as the change in the number of active nodes becomesgreater than a certain threshold (the regeneration threshold θ ). The intuition is that if thenumber of active nodes has not increased much in a few interventions, the number of RRsets does not decline significantly and they still represent information about the state ofthe network well. We call this optimization trick lazy RR(LR) set regeneration to contrastit with the full RR(FR) set regeneration. We observe that because of submodularity, thefrequency of RR set (re)generation decreases as the number of seeds (and time) increases.For our experiments, we empirically set θ equal to 10. Higher values of θ lead to lowerruntimes but to a smaller average adaptivity gain.We use this strategy to find the spread for both NetHEPT and Flixster. As can be seen47(a) Average Spread vs Number of seeds (b) Runtime vs Number of seedsFigure 4.6: Epinions:Adaptive vs Non-adaptive for MAXSPREADfrom the runtime graphs and average spread graphs, this strategy does not decrease thespread much but leads to significant computational savings. After verifying this strategy,we use it to compare the 2 policies on the larger Epinions dataset, where the same trendis observed – see Figures 4.6(a) and 4.6(b). The average adaptivity gain is small even forthe greedy adaptive sequential policy in case of unbounded time horizon.4.3.4.2 Bounded time horizonFor the bounded time horizon, the policy will be forced to group sets of seeds togetherto form a batch. From Fact 4.1, we know that the average spread for such a policy willbe lower and hence the average adaptivity gain will further decrease. To verify this, weconduct an experiment on the NetHEPT dataset in which we decrease the time horizon Tfrom a large value (corresponding to unbounded time horizon) to low values of the orderof the length of the longest path in the network. We vary the policy complexity p to be1 or 2 in this case. We aim to find the best configuration by varying the batch-size in therange 1 to 100 and the inter-intervention time between 1 and the D of the network. Sincethe difference between the spreads for the non-adaptive policy vs. the greedy adaptivesequential policy is so small, for the bounded time horizon, SMAC is unable to find aunique optimal policy. Different runs of SMAC yield different policies for the samenumber of seeds, sometimes converging to the non-adaptive policy even for reasonablylarge time horizons! A higher configuration time for SMAC might lead to stable resultsor alternatively we might need to encode the problem differently. We leave this for futurework.484.3.5 MINTSS4.3.5.1 Unbounded time horizonFor all 3 datasets, for the unbounded time horizon, we compare the greedy non-adaptiveand greedy adaptive policies with different batch sizes in the range {1,10,50,100}. Be-cause a large number of seeds may be required to saturate a certain fraction of the net-work, we use the lazy RR set generation approximation explained above and set ε to 0.2.Figures 4.7(a), 4.7(b) show the comparison between the non-adaptive and various adap-tive greedy policies for the NetHEPT and Flixster datasets. Epinions shows a similartrend.(a) NetHEPT (b) FlixsterFigure 4.7: Number of seeds required vs Target fractionAs can be seen, the non-adaptive policy is competitive for smaller number of targetnodes. But as the target fraction increases, the adaptive policies are better able to exploitthe market feedback mechanism and lead to large savings in the number of seeds. Thisagain agrees with our theoretical results which showed that the adaptivity gain increasesas the number of target nodes increases. As the size of the network increases, the es-timated spread calculation in the non-adaptive case is averaged across greater numberof true worlds and hence becomes less efficient. We observed that in many cases, thefinal true spread for the non-adaptive policy either overshoots the target spread or missesthe target spread by a large amount. We conclude that adaptive policies are particularlyuseful if we want to influence a significant fraction of the network.We give some intuition for the difference in the adaptivity gains for the two prob-lems. For adaptive policies, the rate of increase in the expected spread is fast in the49beginning before the effects of submodularity take over. Hence adaptive policies requirefewer seeds than non-adaptive to reach a comparable target spread. However, once sub-modularity kicks in, the additional seeds added contribute relatively little to the spread.Hence for MINTSS, where the objective is to reach a target spread with minimum seeds,the adaptivity gain is higher. However for MAXSPREAD, even though the adaptive pol-icy reaches a high spread with fewer seeds, the remaining seeds in the budget don’t addmuch more to the true spread.Figure 4.8: Flixster: Runtime vs Target fractionWe also plot the runtime graph for the Flixster dataset. The non-adaptive time domi-nates because it needs to choose a larger number of seeds. Since the batch-greedy policiesselect batches of seeds and consider feedback less often, they have a lower running timewhich decreases as the batch-size increases. Figure 4.8 shows the runtime variation forFlixster. Results on other datasets show a similar trend.4.3.5.2 Bounded time horizonT 10 50 100 1000ShortFall (β ) 709 174.98 10.54 0Number of seeds 200 177.33 171.11 168Objective function 7290 1927.1 276.51 168Policy(s,t) (100,6) (28,8) (20,11) (3,12)Table 4.1: Policies of p = 1 recovered by SMAC for varying time horizons(T ) forFlixster with Q = 5800We now consider the important question, how good is the effect of adaptivity for abounded time horizon for the MINTSS problem. For this, we vary the time horizon T50from 10 to 1000 and the policy complexity p is set to either 1 or 2. We use the Flixsterdataset and fix the target fraction of nodes to 0.2. As in the previous problem, we aimto find the best configuration by varying the batch size in the range 1-100 and the inter-intervention time between 1 and the D of the network. Since each configuration runinvolves solving MINTSS 500 times, to save computation time we use a relatively highε = 0.5. We verified that similar results hold for smaller values of ε . The optimal policyreturned by SMAC is evaluated on a different set of instances (possible true worlds)averaging the results over 50 such instances.Table 4.1 shows the results for this experiment. For both p = 1,2, as the time horizonincreases, the shortfall goes to zero and the objective function is just the number of seedsrequired. We see that even for a low time horizon, SMAC is able to find a policy for whichthe number of seeds is close to the policy (which uses 163 seeds) for an unbounded timehorizon. It is still much better than the non-adaptive version of the policy which uses alarge number of seeds even for unbounded time horizon. As T increases, in the policyfound by SMAC, the number of seeds/interventions decreases and inter-intervention timeincreases. In fact, for T = 1000 the p = 1, the policy found by SMAC seeded 3 nodesper intervention and had a inter-intervention time equal to 12 (which is greater than Dof the graph). For extremely small T , the policy found by SMAC had 100 nodes perintervention and a very short inter-intervention time of 3. We observe similar behavioureven for p = 2 and with the NetHEPT dataset as well. Note that as long as T > D, thenon-adaptive version will require the same number of seeds it needs for the unboundedhorizon case. This shows us the benefit of adaptivity even when the time horizon isseverely constrained. These experiments show the effectiveness of SMAC in findingreasonably good policies for any time horizon for MINTSS.515ConclusionA conclusion is simply the place where you got tired of thinking.– Dan ChaonIn this thesis, we studied two important variants of the influence maximization prob-lem - namely in the bandit and adaptive settings.We studied the important, but under-researched problem of influence maximizationwhen no influence probabilities or diffusion cascades are available. Adopting a combina-torial multi-armed bandit paradigm, we formulated two interesting technical problems:minimizing error in the learned influence probabilities and minimizing regret from re-duced spread as a result of choosing suboptimal seed sets across rounds of a CMABgame. We proposed various algorithms from the bandits literature for these problemsand proved bounds on their performance. We also evaluated their empirical performancecomprehensively on three real datasets. It would be interesting to consider Thompsonsampling based algorithms for these problems. It is important to extend the results andalgorithms to continuous time diffusion models. How to learn on the fly, not just influ-ence probabilities, but the graph structure as well, is another interesting question.For the second part of the thesis, we relaxed the assumption that all seeds need tobe selected in the beginning of the diffusion process by adopting an adaptive approach.We gave algorithms for finding guaranteed approximations to the optimal policy for boththe MAXSPREAD and MINTSS problems. We quantified the gain of an adaptive policyover a non-adaptive strategy for both these problems. We studied these problems underan unbounded as well as a bounded time horizon. We performed experiments on threereal world datasets to verify our theoretical findings. In the future, we plan to evaluate52the performance of the greedy adaptive policy under a bounded time horizon from botha theoretical and empirical perspective.Investigating a combination of both problems i.e. considering adaptive strategieswhile learning influence probabilities in the CMAB framework is another interestingfuture research direction.53Bibliography[1] R. Agrawal. Sample mean based index policies with o (log n) regret for themulti-armed bandit problem. Advances in Applied Probability, pages 1054–1078,1995. → pages 9[2] V. Anantharam, P. Varaiya, and J. Walrand. Asymptotically efficient allocationrules for the multiarmed bandit problem with multiple plays-part i: Iid rewards.Automatic Control, IEEE Transactions on, 32(11):968–976, 1987. → pages 3, 9[3] A. Asadpour, H. Nazerzadeh, and A. Saberi. Maximizing stochastic monotonesubmodular functions. arXiv preprint arXiv:0908.2788, 2009. → pages 38[4] J.-Y. Audibert, S. Bubeck, and G. Lugosi. Minimax policies for combinatorialprediction games. arXiv preprint arXiv:1105.4871, 2011. → pages 9[5] I. C. Avramopoulos, J. Rexford, and R. E. Schapire. From optimization to regretminimization and back again. In SysML, 2008. → pages 3[6] N. Barbieri, F. Bonchi, and G. Manco. Topic-aware social influence propagationmodels. Knowledge and information systems, 37(3):555–584, 2013. → pages 45[7] S. Bubeck, R. Munos, and G. Stoltz. Pure exploration in finitely-armed andcontinuous-armed bandits. Theoretical Computer Science, 412(19):1832–1852,2011. → pages 9[8] F. Caro and J. Gallien. Dynamic assortment with demand learning for seasonalconsumer goods. Management Science, 53(2):276–292, 2007. → pages 9[9] S. Chen, T. Lin, I. King, M. R. Lyu, and W. Chen. Combinatorial pure explorationof multi-armed bandits. In Advances in Neural Information Processing Systems,pages 379–387, 2014. → pages 9[10] W. Chen, Y. Wang, and S. Yang. Efficient influence maximization in socialnetworks. In Proceedings of the 15th ACM SIGKDD international conference onKnowledge discovery and data mining, pages 199–208. ACM, 2009. → pages 2,4554[11] W. Chen, C. Wang, and Y. Wang. Scalable influence maximization for prevalentviral marketing in large-scale social networks. In Proceedings of the 16th ACMSIGKDD international conference on Knowledge discovery and data mining,pages 1029–1038. ACM, 2010. → pages 7, 45[12] W. Chen, Y. Yuan, and L. Zhang. Scalable influence maximization in socialnetworks under the linear reshold model. In Proc. 2010 IEEE Int. Conf. on DataMining, pages 88–97, 2010. → pages 7[13] W. Chen, L. V. Lakshmanan, and C. Castillo. Information and influencepropagation in social networks. Synthesis Lectures on Data Management, 5(4):1–177, 2013. → pages 2[14] W. Chen, Y. Wang, and Y. Yuan. Combinatorial multi-armed bandit: Generalframework and applications. In Proceedings of the 30th International Conferenceon Machine Learning, pages 151–159, 2013. → pages 3, 9, 12, 13, 14, 25, 26, 27,30[15] W. Chen, Y. Wang, and Y. Yuan. Combinatorial multi-armed bandit and itsextension to probabilistically triggered arms. arXiv preprint arXiv:1407.8339,2014. → pages 3, 9, 12, 13[16] Y. Chen and A. Krause. Near-optimal batch mode active learning and adaptivesubmodular optimization. In Proceedings of The 30th International Conference onMachine Learning, pages 160–168, 2013. → pages 10, 37, 38[17] H. Daneshmand, M. Gomez-Rodriguez, L. Song, and B. Schoelkopf. Estimatingdiffusion network structures: Recovery conditions, sample complexity &soft-thresholding algorithm. arXiv preprint arXiv:1405.2936, 2014. → pages 3, 8[18] P. Domingos and M. Richardson. Mining the network value of customers. InProceedings of the seventh ACM SIGKDD international conference on Knowledgediscovery and data mining, pages 57–66. ACM, 2001. → pages 1[19] V. Gabillon, M. Ghavamzadeh, and A. Lazaric. Best arm identification: A unifiedapproach to fixed budget and fixed confidence. In Advances in Neural InformationProcessing Systems, pages 3212–3220, 2012. → pages 9[20] Y. Gai, B. Krishnamachari, and R. Jain. Learning multiuser channel allocations incognitive radio networks: A combinatorial multi-armed bandit formulation. InNew Frontiers in Dynamic Spectrum, 2010 IEEE Symposium on, pages 1–9. IEEE,2010. → pages 9[21] Y. Gai, B. Krishnamachari, and R. Jain. Combinatorial network optimization withunknown variables: Multi-armed bandits with linear rewards and individual55observations. IEEE/ACM Transactions on Networking (TON), 20(5):1466–1478,2012. → pages 9[22] D. Golovin and A. Krause. Adaptive submodularity: Theory and applications inactive learning and stochastic optimization. Journal of Artificial IntelligenceResearch, 42(1):427–486, 2011. → pages 10, 35, 41[23] D. Golovin and A. Krause. Adaptive Submodular Optimization under MatroidConstraints. Computing Research Repository, abs/1101.4, 2011. → pages 10[24] M. Gomez Rodriguez, J. Leskovec, and A. Krause. Inferring networks of diffusionand influence. In Proceedings of the 16th ACM SIGKDD international conferenceon Knowledge discovery and data mining, pages 1019–1028. ACM, 2010. →pages 8[25] A. Gopalan, S. Mannor, and Y. Mansour. Thompson sampling for complex onlineproblems. In Proceedings of The 31st International Conference on MachineLearning, pages 100–108, 2014. → pages 9[26] A. Goyal, F. Bonchi, and L. V. Lakshmanan. Learning influence probabilities insocial networks. In Proceedings of the third ACM international conference on Websearch and data mining, pages 241–250. ACM, 2010. → pages 3, 8, 32[27] A. Goyal, F. Bonchi, and L. V. Lakshmanan. A data-based approach to socialinfluence maximization. Proceedings of the VLDB Endowment, 5(1):73–84, 2011.→ pages 2, 4, 8[28] A. Goyal, W. Lu, and L. V. Lakshmanan. Simpath: An efficient algorithm forinfluence maximization under the linear threshold model. In Data Mining(ICDM), 2011 IEEE 11th International Conference on, pages 211–220. IEEE,2011. → pages 2[29] A. Goyal, F. Bonchi, L. Lakshmanan, and S. Venkatasubramanian. On minimizingbudget and time in influence propagation over social networks. Social NetworkAnalysis and Mining, 3(2):179–192, 2013. ISSN 1869-5450.doi:10.1007/s13278-012-0062-z. URLhttp://dx.doi.org/10.1007/s13278-012-0062-z. → pages 4[30] A. Guillory and J. Bilmes. Interactive submodular set cover. In ICML, pages415–422, 2010. → pages 10[31] H. W. Hethcote. The mathematics of infectious diseases. SIAM review, 42(4):599–653, 2000. → pages 156[32] F. Hutter, H. H. Hoos, and K. Leyton-Brown. Sequential model-basedoptimization for general algorithm configuration. In Learning and IntelligentOptimization, pages 507–523. Springer, 2011. → pages 43, 45[33] D. Kempe, J. Kleinberg, and E´. Tardos. Maximizing the spread of influencethrough a social network. In Proceedings of the ninth ACM SIGKDD internationalconference on Knowledge discovery and data mining, pages 137–146. ACM,2003. → pages 2, 6, 7, 28[34] T. L. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules.Advances in applied mathematics, 6(1):4–22, 1985. → pages 3, 8, 9[35] J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, and N. Glance.Cost-effective outbreak detection in networks. In Proceedings of the 13th ACMSIGKDD international conference on Knowledge discovery and data mining,pages 420–429. ACM, 2007. → pages 2, 7[36] L. Li, W. Chu, J. Langford, and R. E. Schapire. A contextual-bandit approach topersonalized news article recommendation. In Proceedings of the 19thinternational conference on World wide web, pages 661–670. ACM, 2010. →pages 3[37] S. Mannor and J. N. Tsitsiklis. The sample complexity of exploration in themulti-armed bandit problem. The Journal of Machine Learning Research, 5:623–648, 2004. → pages 9[38] A. L. Montgomery. Applying quantitative marketing techniques to the internet.Interfaces, 31(2):90–108, 2001. → pages 1[39] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximationsfor maximizing submodular set functions. Mathematical Programming, 14(1):265–294, 1978. → pages 2, 7, 39[40] P. Netrapalli and S. Sanghavi. Learning the graph of epidemic cascades. In ACMSIGMETRICS Performance Evaluation Review, volume 40, pages 211–222. ACM,2012. → pages 3, 8, 18, 20, 21, 22, 32[41] M. E. Newman. The structure and function of complex networks. SIAM review, 45(2):167–256, 2003. → pages 41[42] S. Pandey and C. Olston. Handling advertisements of unknown quality in searchadvertising. In Advances in neural information processing systems, pages1065–1072, 2006. → pages 3[43] J. Rayport. The virus of marketing. Fast Company, 6(1996):68, 1996. → pages 157[44] H. Robbins. Some aspects of the sequential design of experiments. In HerbertRobbins Selected Papers, pages 169–177. Springer, 1985. → pages 3[45] M. G. Rodriguez, D. Balduzzi, and B. Scho¨lkopf. Uncovering the temporaldynamics of diffusion networks. arXiv preprint arXiv:1105.0697, 2011. → pages3, 8[46] K. Saito, R. Nakano, and M. Kimura. Prediction of information diffusionprobabilities for independent cascade model. In Knowledge-Based IntelligentInformation and Engineering Systems, pages 67–75. Springer, 2008. → pages 3,8, 18[47] K. Saito, K. Ohara, Y. Yamagishi, M. Kimura, and H. Motoda. Learning diffusionprobability based on node attributes in social networks. In Foundations ofIntelligent Systems, pages 153–162. Springer, 2011. → pages 32[48] J. J. Samper, P. A. Castillo, L. Araujo, J. Merelo, O. Cordon, and F. Tricas.Nectarss, an intelligent rss feed reader. Journal of Network and ComputerApplications, 31(4):793–806, 2008. → pages 1[49] X. Song, Y. Chi, K. Hino, and B. L. Tseng. Information flow modeling based ondiffusion rate for prediction and ranking. In Proceedings of the 16th internationalconference on World Wide Web, pages 191–200. ACM, 2007. → pages 1[50] Y. Tang, X. Xiao, and S. Yanchen. Influence maximization: Near-optimal timecomplexity meets practical efficiency. 2014. → pages 2, 7, 28, 42[51] C. Wang, W. Chen, and Y. Wang. Scalable influence maximization forindependent cascade model in large-scale social networks. Data Mining andKnowledge Discovery, 25(3):545–576, 2012. → pages 2, 45[52] M. Zinkevich. Online convex programming and generalized infinitesimal gradientascent. In Machine Learning, Proceedings of the Twentieth InternationalConference (ICML 2003), August 21-24, 2003, Washington, DC, USA, pages928–936, 2003. URL http://www.aaai.org/Library/ICML/2003/icml03-120.php. →pages 19, 20, 2158
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Influence maximization in bandit and adaptive settings
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Influence maximization in bandit and adaptive settings Vaswani, Sharan 2015
pdf
Page Metadata
Item Metadata
Title | Influence maximization in bandit and adaptive settings |
Creator |
Vaswani, Sharan |
Publisher | University of British Columbia |
Date Issued | 2015 |
Description | The objective of viral marketing is to leverage a social network to spread awareness about a specific product in the market through information propagation via word-of-mouth. A closely related problem is that of influence maximization which aims to identify the `best' set of `influential' users (seeds) to give discounts or free products to, such that awareness about the product is maximized. We study two relatively unexplored variants of influence maximization (IM) in social networks. Conventional IM algorithms assume the structure of the social network and edge weights to be known. Though the structure of the network is usually known, edge weights need to be estimated from past data. In the first part of this thesis, we tackle the real but difficult problems of (i) learning these edge weights online and (ii) maximizing influence in the network when no past data is available as input. We adopt a combinatorial multi-armed bandit (CMAB) paradigm and formulate the above problems respectively as (i) network exploration, i.e. incrementally minimizing the error in learned edge weights and (ii) regret minimization i.e. minimizing the loss in influence from choosing suboptimal seed sets over multiple attempts. Most previous work on influence maximization in social networks is limited to the non-adaptive setting in which the marketer is supposed to select all of the seed users up front. A disadvantage of this setting is that she is forced to select all the seeds based solely on a diffusion model. If the selected seeds do not perform well, there is no opportunity to course-correct. A more practical setting is the adaptive setting in which the marketer initially selects a batch of users and observes how seeding those users leads to a diffusion of product adoptions. Based on this market feedback, she formulates a policy for choosing the remaining seeds. We study adaptive offline strategies for two problems: (a) MAXSPREAD - given a budget on number of seeds and a time horizon, maximize the spread of influence and (b) MINTSS - given a time horizon and an expected number of target users, minimize the number of required seeds. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2015-08-28 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivs 2.5 Canada |
DOI | 10.14288/1.0166665 |
URI | http://hdl.handle.net/2429/54688 |
Degree |
Master of Science - MSc |
Program |
Computer Science |
Affiliation |
Science, Faculty of Computer Science, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 2015-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/2.5/ca/ |
AggregatedSourceRepository | DSpace |
Download
- Media
- 24-ubc_2015_november_sharan_vaswani.pdf [ 1.23MB ]
- Metadata
- JSON: 24-1.0166665.json
- JSON-LD: 24-1.0166665-ld.json
- RDF/XML (Pretty): 24-1.0166665-rdf.xml
- RDF/JSON: 24-1.0166665-rdf.json
- Turtle: 24-1.0166665-turtle.txt
- N-Triples: 24-1.0166665-rdf-ntriples.txt
- Original Record: 24-1.0166665-source.json
- Full Text
- 24-1.0166665-fulltext.txt
- Citation
- 24-1.0166665.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.24.1-0166665/manifest