Computational Problems in Multiagent Systems

by Albert Xin Jiang
B.Sc., The University of British Columbia, 2003

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of Science in The Faculty of Graduate Studies (Computer Science)

The University of British Columbia
March 31, 2006
© Albert Xin Jiang 2006

Abstract

There has been recent interest from the computer science community in multiagent systems, where multiple autonomous agents, each with their own utility functions, act according to their own interests. In this thesis, we apply techniques developed in other areas of CS to solve two computational problems in multiagent systems:

Action Graph Games: Action Graph Games (AGGs), first proposed in [Bhat and Leyton-Brown 2004], are a fully expressive game representation which can compactly express strict and context-specific independence and anonymity structure in players' utility functions. We present an efficient algorithm for computing expected payoffs under mixed strategy profiles. This algorithm runs in time polynomial in the size of the AGG representation (which is itself polynomial in the number of players when the in-degree of the action graph is bounded). We also present an extension to the AGG representation which allows us to compactly represent a wider variety of structured utility functions.

Learning and planning in online auction environments: There is much active research into the design of automated bidding agents, particularly for environments that involve multiple auctions. These settings are complex partly because an agent's optimal strategy depends on information about other bidders' preferences. When bidders' valuation distributions are not known ex ante, machine learning techniques can be used to approximate them from historical data.
It is a characteristic feature of auctions, however, that information about some bidders' valuations is systematically concealed. This occurs in the sense that some bidders may fail to bid at all because the asking price exceeds their valuations, and also in the sense that a high bidder may not be compelled to reveal her valuation. Ignoring these "hidden bids" can introduce bias into the estimation of valuation distributions. To overcome this problem, we propose an EM-based algorithm. We validate the algorithm experimentally using agents that react to their environments both decision-theoretically and game-theoretically, using both synthetic and real-world (eBay) datasets. We show that our approach estimates bidders' valuation distributions and the distribution over the true number of bidders significantly more accurately than more straightforward density estimation techniques. Bidding agents using the estimated distributions from our EM approach were able to outperform bidding agents using the straightforward estimates, in both decision-theoretic and game-theoretic settings.
Contents

Abstract
Contents
List of Figures
1 Introduction
  1.1 Multiagent Systems
  1.2 Overview
2 Action Graph Games
  2.1 Introduction
  2.2 Action Graph Games
    2.2.1 Definition
    2.2.2 Examples
    2.2.3 Size of an AGG Representation
  2.3 Computing with AGGs
    2.3.1 Notation
    2.3.2 Computing V_i^{s_i}(σ_{-i})
    2.3.3 Proof of Correctness
    2.3.4 Complexity
    2.3.5 Discussion
  2.4 AGG with Function Nodes
    2.4.1 Motivating Example: Coffee Shop
    2.4.2 Function Nodes
    2.4.3 Representation Size
    2.4.4 Computing with AGGFNs
  2.5 Applications
    2.5.1 Application: Computing the Payoff Jacobian
  2.6 Experiments
    2.6.1 Representation Size
    2.6.2 Expected Payoff Computation
    2.6.3 Computing Payoff Jacobians
    2.6.4 Finding Nash Equilibria using the Govindan-Wilson Algorithm
  2.7 Conclusions
3 Bidding Agents for Online Auctions with Hidden Bids
  3.1 Introduction
    3.1.1 Game-Theoretic and Decision-Theoretic Approaches
    3.1.2 Overview
  3.2 A Model of Online Auctions
    3.2.1 Bidding Dynamics
    3.2.2 Hidden Bids
    3.2.3 Discussion
  3.3 Learning the Distributions
    3.3.1 The Simple Approach
    3.3.2 EM Learning Approach
    3.3.3 Learning Distributions in a Game-Theoretic Setting
  3.4 Constructing a Bidding Agent
    3.4.1 A Decision-Theoretic Approach to Repeated Auctions
    3.4.2 A Game-Theoretic Approach to Bidding in Online Auctions without Proxies
  3.5 Experiments
    3.5.1 Repeated Online Auctions
    3.5.2 Online Auctions without Proxies
  3.6 Related Work
  3.7 Conclusion
Bibliography

List of Figures

2.1 AGG representation of an arbitrary 3-player, 3-action game
2.2 AGG representation of a 3-player, 3-action graphical game
2.3 AGG representation of the ice cream vendor game
2.4 Projection of the action graph. Left: action graph of the ice cream vendor game. Right: projected action graph and action sets with respect to the action C1.
2.5 A 5 x 6 Coffee Shop Game. Left: the AGG representation without function nodes (looking at only the neighborhood of the node s). Middle: we introduce two function nodes. Right: s now has only 3 incoming edges.
2.6 Comparing representation sizes of the Coffee Shop Game (log scale). Left: 5 x 5 grid with 3 to 16 players. Right: 4-player r x 5 grid with r varying from 3 to 10.
2.7 Running times for payoff computation in the Coffee Shop Game. Left: 5 x 5 grid with 3 to 16 players. Right: 4-player r x 5 grid with r varying from 3 to 10.
2.8 Running times for Jacobian computation in the Coffee Shop Game. Left: 5 x 5 grid with 3 to 10 players. Right: 4-player r x 5 grid with r varying from 3 to 10.
2.9 Ratios of running times for the Govindan-Wilson algorithm in the Coffee Shop Game. Left: 4 x 4 grid with 3 to 5 players. Right: 4-player r x 4 grid with r varying from 3 to 9. The error bars indicate standard deviation over 10 random initial perturbations. The constant lines at 1.0 indicating equal running times are also shown.
3.1 An example of the bidding process of an auction with 7 potential bidders
3.2 Results for Data Set 1, distribution estimation: distribution of bids f(x) (left); distribution of highest bids f1(x) (right)
3.3 Results for Data Set 1, bidding: bidding strategies in the first auction (left); box plot of payoff regrets of the two approaches (right)
3.4 Box plot of expected payoff regrets for overlapping auctions
3.5 Results for Data Set 2: linear relationship between the mean of f(x|a) and a (left); box plot of payoff regrets (right)
3.6 Results for Data Set 3: distribution f(x) (top-left); distribution g(m) (top-right); distribution f1(x) (bottom-left); box plot of payoff regrets (bottom-right)
3.7 Box plot of payoff regrets on the eBay data set
3.8 Results for Online Auctions without Proxies: the value distributions f(v) (left); the distributions of the number of bidders g(m) (right)
3.9 Results for Online Auctions without Proxies: box plot of epsilon-equilibria using the simple and EM approaches

Glossary

C_s(m)  the contribution of action s to node m
D  a configuration
D(s)  the number of players that chose action s
D^{(s)}  a configuration over ν(s)
G  the action graph
N  the set of players
S  the set of distinct actions; also the set of action nodes in the action graph
S_i  player i's set of actions
V_i^{s_i}(σ_{-i})  expected utility to agent i for playing pure strategy s_i, given that all other agents play the mixed strategy profile σ_{-i}
Δ  the set of configurations
Δ^{(s,i)}  the set of configurations over ν(s) given that player i has played s
Δ^{(s_i, σ_{-i})}  the set of configurations over ν(s_i) that have positive probability of occurring under the mixed strategy profile (s_i, σ_{-i})
Δ^{(s)}  the set of configurations over ν(s) given that some player has played s
Σ  the set of all mixed strategy profiles
Σ_i  the set of all mixed strategies of player i
s  an action profile
I  the maximum in-degree of the action graph
σ  a mixed strategy profile
σ_i  the mixed strategy of player i
∇V_{(j,s_j)}^{(i,s_i)}(σ)  the payoff Jacobian's entry at row (i, s_i) and column (j, s_j)
ν  the neighbor relation of the action graph;
ν(s) is the set of neighbors of s
n  the number of players
s_i  an action of player i
u  the utility function
u^s  the utility function associated with action s; stores utilities given that the current player has played s

Chapter 1

Introduction

1.1 Multiagent Systems

There has been recent interest from the computer science community in (non-cooperative) multiagent systems, where multiple autonomous agents, each with their own utility functions, act according to their own interests. These situations can be modeled as games, and analyzed using game theory. Game theory has received a great deal of study, and is perhaps the dominant paradigm in microeconomics. However, the computational aspects of game theory received relatively little attention until recently.

The intersection of computer science and game theory includes many exciting topics. One line of research is to apply game theory to resource allocation problems in computing systems. Internet-based systems, due to their distributed and multiagent nature, are a popular area for game-theoretic analysis. Examples include network routing [Roughgarden and Tardos 2004], CPU scheduling [Regev and Nisan 1998; Waldspurger et al. 1992], peer-to-peer file sharing [Golle et al. 2001] and search engines' ranking systems [Altman and Tennenholtz 2005].

Another line of research, which includes the topic of this thesis, is to apply CS techniques to analyze and solve computational problems in multiagent systems. One type of problem is the computation of game-theoretic solutions such as Nash equilibria (and restricted versions such as pure-strategy equilibria and social-welfare-maximizing equilibria) and correlated equilibria, given a description of the game. Recently, complexity analysis from theoretical computer science has been used to analyze these problems [Conitzer and Sandholm 2003; Goldberg and Papadimitriou 2005; Daskalakis et al. 2005; Chen and Deng 2005].
There is also some recent research on practical algorithms for computing Nash equilibria [Koller et al. 1994; Porter et al. 2004].

The representation of games has also received recent interest. Traditionally, simultaneous-move games are represented in normal form, and dynamic games are represented in extensive form. However, for large games, these representations are impractically large, and any non-trivial computation on them would be intractable. Fortunately, most large games of any practical interest have highly structured payoff functions, so it is possible to represent them compactly. One influential approach is to represent structure in payoffs as graphs. AI researchers have used graphs to represent structure in joint probability distributions (as Bayes nets or Markov random fields) with great success. For multiagent systems, several graphical representations have been proposed to represent the (strict) independence between players' payoffs; this includes multi-agent influence diagrams [Koller and Milch 2001] for dynamic games and graphical games [Kearns et al. 2001] for simultaneous-move games. Another type of graphical representation focuses on context-specific independencies in agents' utility functions; that is, games in which agents' abilities to affect each other depend on the actions they choose. This includes local effect games [Leyton-Brown and Tennenholtz 2003] and action graph games [Bhat and Leyton-Brown 2004]; the latter is the topic of Chapter 2 of this thesis.

Another type of computational problem often encountered in multiagent systems is the learning problem. Game-theoretic solution concepts such as Nash equilibria assume that the players have perfect knowledge about the game being played, and are perfectly rational. In practice these conditions often do not hold. How should agents act in such environments?
This is often cast as a multiagent reinforcement learning problem: instead of assuming other agents to be perfectly rational, assume instead that each agent tries to learn an optimal policy through repeated interaction with the other players. Many approaches have been proposed [Bowling and Veloso 2001; Bowling 2004; Powers and Shoham 2004]. Most of these multiagent reinforcement learning approaches study settings of repeated games where players know the game being played. However, in many situations such as auctions, each player does not know the complete payoff function, and each player has certain private information about the payoffs. In the case of auctions, the private information is players' valuations of the item(s) being auctioned. If we know the joint probability distribution of the valuations, we can formulate the situation as a Bayesian game. Usually we do not know the distribution of valuations; instead, the bidding history of previous auctions is available, and we can try to estimate the distribution from data. This machine learning problem is an essential component of the problem of building automated bidding agents, which has attracted interest partly due to the Trading Agent Competitions (TAC-Classic and TAC Supply Chain Management) [Wellman et al. 2002]. In Chapter 3 of this thesis, we look at another auction setting of practical interest, namely online auction systems such as the popular eBay. Learning the distributions in this setting presents unique challenges due to the fact that not all bids are shown in the bidding history.

1.2 Overview

My thesis contains results from two research projects on computational problems in game theory and multiagent systems. One is a way of compactly representing games in general, and the other is about a learning problem in a specific kind of multiagent system, namely online auction environments.
The two projects may seem to be about vastly different things, but a common theme is that in both cases we apply techniques developed in other areas of CS (graphical models and dynamic programming in the former, and the Expectation-Maximization algorithm in the latter) to solve computational problems in multiagent systems.

Chapter 2 is about Action Graph Games (AGGs). First proposed in [Bhat and Leyton-Brown 2004], AGGs are a graphical representation of games that exploits the context-specific independence and anonymity structure in many games. Using AGGs we are able to compactly represent large structured games with many players, and furthermore efficiently compute expected payoffs under mixed strategies, which is an essential step in many game-theoretic computations, e.g. best response, Nash equilibria and correlated equilibria. In this thesis, we make several significant improvements to the results in [Bhat and Leyton-Brown 2004]. First, we significantly improve the algorithm for computing expected payoffs of AGGs. Our dynamic programming algorithm is now able to compute expected payoffs in polynomial time with respect to the size of the representation, whereas in [Bhat and Leyton-Brown 2004] a polynomial-time algorithm was found only for symmetric games with symmetric strategies. Furthermore, we extend the AGG representation to include function nodes in the action graph. This allows us to compactly represent a larger class of context-specific independence structure in games. Results of computational experiments on structured games confirm our theoretical predictions of compactness and computational speedup.

In Chapter 3 we try to solve the problem of learning bidders' valuation and number distributions in online auction environments. There is much active research into the design of automated bidding agents, particularly for environments that involve multiple auctions.
These settings are complex partly because an agent's optimal strategy depends on information about other bidders' preferences. When bidders' valuation distributions are not known ex ante, machine learning techniques can be used to approximate them from historical data. It is a characteristic feature of auctions, however, that information about some bidders' valuations is systematically concealed. This occurs in the sense that some bidders may fail to bid at all because the asking price exceeds their valuations, and also in the sense that a high bidder may not be compelled to reveal her valuation. Ignoring these "hidden bids" can introduce bias into the estimation of valuation distributions. To overcome this problem, we propose an EM-based algorithm. We validate the algorithm experimentally using agents that react to their environments both decision-theoretically and game-theoretically, using both synthetic and real-world (eBay) datasets. We show that our approach estimates bidders' valuation distributions and the distribution over the true number of bidders significantly more accurately than more straightforward density estimation techniques. Bidding agents using the estimated distributions from our EM approach were able to outperform bidding agents using the straightforward estimates, in both decision-theoretic and game-theoretic settings.

Chapter 2

Action Graph Games

2.1 Introduction

Game-theoretic¹ models have recently been very influential in the computer science community. In particular, simultaneous-action games have received considerable study, which is reasonable as these games are in a sense the most fundamental. In order to analyze these models, it is often necessary to compute game-theoretic quantities ranging from expected utility to Nash equilibria. Most of the game-theoretic literature presumes that simultaneous-action games will be represented in normal form.
This is problematic because quite often games of interest have a large number of players and a large set of action choices. In the normal form representation, we store the game's payoff function as a matrix with one entry for each player's payoff under each combination of all players' actions. As a result, the size of the representation grows exponentially with the number of players. Even if we had enough space to store such games, most of the computations we would like to perform on these exponential-sized objects take exponential time.

Fortunately, most large games of any practical interest have highly structured payoff functions, and thus it is possible to represent them compactly. (Intuitively, this is why humans are able to reason about these games in the first place: we understand the payoffs in terms of simple relationships rather than in terms of enormous look-up tables.) One influential class of representations exploits strict independencies between players' utility functions; this class includes graphical games [Kearns et al. 2001], multi-agent influence diagrams [Koller and Milch 2001], and game nets [LaMura 2000]. A second approach to compactly representing games focuses on context-specific independencies in agents' utility functions; that is, games in which agents' abilities to affect each other depend on the actions they choose. Since the context-specific independencies considered here are conditioned on actions and not agents, it is often natural to also exploit anonymity in utility functions, where each agent's utilities depend on the distribution of agents over the set of actions, but not on the identities of the agents. Examples include congestion games [Rosenthal 1973] and local effect games (LEGs) [Leyton-Brown and Tennenholtz 2003]. Both of these representations make assumptions about utility functions, and as a result cannot represent arbitrary games.
Bhat and Leyton-Brown [2004] introduced action graph games (AGGs). Similar to LEGs, AGGs use graphs to represent the context-specific independencies of agents' utility functions, but unlike LEGs, AGGs can represent arbitrary games. Bhat and Leyton-Brown proposed an algorithm for computing expected payoffs using the AGG representation. For AGGs with bounded in-degree, their algorithm is exponentially faster than normal-form-based algorithms, yet still exponential in the number of players.

In this chapter we make several significant improvements to the results in [Bhat and Leyton-Brown 2004]. In Section 2.3, we present an improved algorithm for computing expected payoffs. Our new algorithm is able to better exploit anonymity structure in utility functions. For AGGs with bounded in-degree, our algorithm is polynomial in the number of players. In Section 2.4, we extend the AGG representation by introducing function nodes. This feature allows us to compactly represent a wider range of structured utility functions. We also describe computational experiments in Section 2.6 which confirm our theoretical predictions of compactness and computational speedup.

¹A version of this chapter has been submitted for publication: Jiang, A.X. and Leyton-Brown, K. (2006). A Polynomial-Time Algorithm for Action-Graph Games. Submitted to AAAI.

2.2 Action Graph Games

2.2.1 Definition

An action graph game (AGG) is a tuple (N, S, ν, u). Let N = {1, ..., n} denote the set of agents. Denote by S = ∏_{i∈N} S_i the set of action profiles, where ∏ is the Cartesian product and S_i is agent i's set of actions. We denote by s_i ∈ S_i one of agent i's actions, and by s ∈ S an action profile. Agents may have actions in common. Let S = ∪_{i∈N} S_i denote the set of distinct action choices in the game. Let Δ denote the set of configurations of agents over actions. A configuration D ∈ Δ is an ordered tuple of |S| integers (D(s), D(s'), ...), with one integer for each action in S.
For each s ∈ S, D(s) specifies the number of agents that chose action s. Let 𝒟 : S → Δ be the function that maps from an action profile s to the corresponding configuration D. These shared actions express the game's anonymity structure: agent i's utility depends only on her action s_i and the configuration 𝒟(s).

Let G be the action graph: a directed graph having one node for each action s ∈ S. The neighbor relation is given by ν : S → 2^S. If s' ∈ ν(s) there is an edge from s' to s. Let D^{(s)} denote a configuration over ν(s); i.e., D^{(s)} is a tuple of |ν(s)| integers, one for each action in ν(s). Intuitively, agents are only counted in D^{(s)} if they take an action which is an element of ν(s). Δ^{(s)} is the set of configurations over ν(s) given that some player has played s.² Similarly we define 𝒟^{(s)} : S → Δ^{(s)}, which maps from an action profile to the corresponding configuration over ν(s).

The action graph expresses context-specific independencies of utilities of the game: for all i ∈ N, if i chose action s_i ∈ S, then i's utility depends only on the numbers of agents who chose actions connected to s_i, which is the configuration 𝒟^{(s_i)}(s). In other words, the configuration of actions not in ν(s_i) does not affect i's utility.

²If action s is in multiple players' action sets (say players i, j), and these action sets do not completely overlap, then it is possible that the set of configurations given that i played s (denoted Δ^{(s,i)}) is different from the set of configurations given that j played s. Δ^{(s)} is the union of these sets of configurations.

Figure 2.1: AGG representation of an arbitrary 3-player, 3-action game
Figure 2.2: AGG representation of a 3-player, 3-action graphical game
Figure 2.3: AGG representation of the ice cream vendor game

We represent the agents' utilities using a tuple of |S| functions u = (u^s, u^{s'}, ...), one for each action s ∈ S. Each u^s is a function u^s : Δ^{(s)} → ℝ.
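As a concrete illustration of these definitions, the following is a minimal sketch in Python (not from the thesis; the action names, graph and utility values are our own hypothetical examples): the action graph is a dict mapping each action s to its neighborhood ν(s), a configuration D counts how many agents chose each action, and each utility function u^s maps a configuration over ν(s) to a real number.

```python
from collections import Counter

# Hypothetical action graph: action -> tuple of neighbors nu(s).
nu = {
    "A": ("A", "B"),
    "B": ("A", "B", "C"),
    "C": ("B", "C"),
}

def configuration(action_profile):
    """D: for each action s, D(s) = number of agents that chose s."""
    return Counter(action_profile)

def local_config(s, action_profile):
    """D^(s): the configuration restricted to the neighborhood nu(s)."""
    D = configuration(action_profile)
    return tuple(D[t] for t in nu[s])

# Toy utility function u^B for action "B": defined only over
# configurations of the neighborhood nu("B") = (A, B, C).
u_B = {(1, 2, 0): 5.0, (0, 3, 0): 1.0}

profile = ("A", "B", "B")          # a 3-player action profile
D_B = local_config("B", profile)   # one agent on A, two on B, zero on C
print(D_B, u_B[D_B])               # prints: (1, 2, 0) 5.0
```

Note how both structural properties appear in the sketch: swapping which player chose "A" leaves the configuration unchanged (anonymity), and actions outside ν("B") never enter `local_config("B", ...)` (context-specific independence).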
So if agent i chose action s, and the configuration over ν(s) is D^{(s)}, then agent i's utility is u^s(D^{(s)}). Observe that all agents have the same utility function, i.e., conditioned on choosing the same action s, the utility each agent receives does not depend on the identity of the agent. For notational convenience, we define u(s, D^{(s)}) ≡ u^s(D^{(s)}) and u_i(s) ≡ u(s_i, 𝒟^{(s_i)}(s)).

2.2.2 Examples

Any arbitrary game can be encoded as an AGG as follows. Create a unique node s_i for each action available to each agent i. Thus ∀s ∈ S, D(s) ∈ {0, 1}, and ∀i, Σ_{s∈S_i} D(s) must equal 1. The distribution simply indicates each agent's action choice, and the representation is no more or less compact than the normal form (see Section 2.2.3 for a detailed analysis).

Example 1. Figure 2.1 shows an arbitrary 3-player, 3-action game encoded as an AGG. As always, nodes represent actions and directed edges represent membership in a node's neighborhood. The dotted boxes represent the players' action sets: player 1 has actions 1, 2 and 3; etc. Observe that there is always an edge between pairs of nodes belonging to different action sets, and that there is never an edge between nodes in the same action set.

In a graphical game [Kearns et al. 2001] nodes denote agents and there is an edge connecting each agent i to each other agent whose actions can affect i's utility. Each agent then has a payoff matrix representing his local game with neighboring agents; this representation is more compact than normal form whenever the graph is not a clique. Graphical games can be represented as AGGs by replacing each node i in the graphical game by a distinct cluster of nodes S_i representing the action set of agent i. If the graphical game has an edge from i to j, create edges so that ∀s_i ∈ S_i, ∀s_j ∈ S_j, s_i ∈ ν(s_j). The resulting AGG representations are as compact as the original graphical game representations.

Example 2.
Figure 2.2 shows the AGG representation of a graphical game having three nodes and two edges between them (i.e., player 1 and player 3 do not directly affect each other's payoffs). The AGG may appear more complex than the graphical game; in fact, this is only because players' actions are made explicit.

The AGG representation becomes even more compact when agents have actions in common, with utility functions depending only on the number of agents taking these actions rather than on the identities of the agents.

Example 3. The action graph in Figure 2.3 represents a setting in which n vendors sell chocolate or vanilla ice creams, and must choose one of four locations along a beach. There are three kinds of vendors: n_C chocolate (C) vendors, n_V vanilla vendors, and n_W vendors that can sell both chocolate and vanilla, but only on the west side. Chocolate (vanilla) vendors are negatively affected by the presence of other chocolate (vanilla) vendors in the same or neighboring locations, and are simultaneously positively affected by the presence of nearby vanilla (chocolate) vendors. Note that this game exhibits context-specific independence without any strict independence, and that the graph structure is independent of n.

Other examples of compact AGGs that cannot be compactly represented as graphical games include: location games, role formation games, traffic routing games, product placement games and party affiliation games.

2.2.3 Size of an AGG Representation

We have claimed that action graph games provide a way of representing games compactly. But what exactly is the size of an AGG representation? And how does this size grow as the number of agents n grows? From the definition of an AGG in Section 2.2.1, we observe that we need the following to completely specify an AGG:

• The set of agents N = {1, ..., n}. This can be specified by the integer n.

• The set of actions S.

• Each agent's action set S_i ⊆ S.
• The action graph G. The set of nodes is S, which is already specified. The neighbor relation ν can be straightforwardly represented as neighbor lists: for each node s ∈ S we specify its list of neighbors ν(s) ⊆ S. The space required is Σ_{s∈S} |ν(s)|, which is bounded by |S| · I, where I = max_s |ν(s)|, i.e. the maximum in-degree of the action graph.

• For each action s, the utility function u^s : Δ^{(s)} → ℝ. We need to specify a utility value for each distinct configuration D^{(s)} ∈ Δ^{(s)}. The set of configurations Δ^{(s)} can be derived from the action graph, and can be sorted in lexicographical order. So we do not need to explicitly specify Δ^{(s)}; we can just specify a list of |Δ^{(s)}| utility values that correspond to the (ordered) set of configurations.³ |Δ^{(s)}|, the number of distinct configurations over ν(s), in general does not have a closed-form expression. Instead, we consider the operation of extending all agents' action sets: ∀i, S_i ↦ S. Now the number of configurations over ν(s) is an upper bound on |Δ^{(s)}|. The bound is the number of (ordered) combinatorial compositions of n − 1 (since one player has already chosen s) into |ν(s)| + 1 nonnegative integers, which is the binomial coefficient

  C(n − 1 + |ν(s)|, |ν(s)|) = (n − 1 + |ν(s)|)! / ((n − 1)! |ν(s)|!).

Then the total space required for the utilities is bounded from above by |S| · C(n − 1 + I, I).

Therefore the size of an AGG representation is dominated by the size of its utility functions, which is bounded by |S| · C(n − 1 + I, I). If I is bounded by a constant as n grows, the representation size grows like O(|S| · n^I), i.e. polynomially with respect to n.

The AGG representation achieves compactness by exploiting two types of structure in the utilities:

1. Anonymity: agent i's utility depends only on her action s_i and the configuration (i.e. the number of players that play each action), but not on the identities of the players.
Since the number of configurations |Δ| is usually less than the number of action profiles ∏_i |S_i| and is never greater, we need fewer numbers to represent the utilities in an AGG compared to the normal form.

2. Context-specific independence: for each node s ∈ S, the utility function u^s only needs to be defined over Δ^{(s)}. Since |Δ^{(s)}| is usually less than |Δ| and is never greater, this further reduces the numbers we need to specify.

For each AGG, there exists a unique induced normal form representation with the same set of players and the same action set for each i; its utility function is a matrix that specifies each player i's payoff for each possible action profile s ∈ S. This implies a space complexity of n · ∏_{i=1}^n |S_i|. When S_i = S for all i, this becomes n|S|^n, which grows exponentially with respect to n.

The number of payoff values stored in an AGG representation is always less than or equal to the number of payoff values in the induced normal form representation. This is because for each entry in the normal form which represents i's utility under action profile s, there exists a unique action profile s in the AGG with the corresponding action for each player. This s induces a unique configuration 𝒟(s) over the AGG's action nodes. By construction of the AGG utility functions, 𝒟(s) together with s_i determines a unique utility u^{s_i}(𝒟^{(s_i)}(s)) in the AGG.

³This is the most compact way of representing the utility functions, but it does not provide easy random access to the utilities. Therefore, when we want to do computation using an AGG, we may convert each utility function u^s to a data structure that efficiently implements a mapping from sequences of integers to (floating-point) numbers (e.g. tries, hash tables or red-black trees), with space complexity on the order of O(I · |Δ^{(s)}|).
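The gap between the two representations can be made concrete with a small calculation. The following sketch (with hypothetical parameter values, not taken from the thesis's experiments) compares the normal-form size n|S|^n against the AGG utility bound |S| · C(n − 1 + I, I) derived above, for the case where every player shares the same |S| actions and the action graph has maximum in-degree I.

```python
from math import comb

def normal_form_size(n, num_actions):
    """Payoff entries in the induced normal form: n * |S|**n."""
    return n * num_actions ** n

def agg_size_bound(n, num_actions, max_indegree):
    """Upper bound on AGG utility entries: |S| * C(n - 1 + I, I),
    counting compositions of n - 1 agents over a size-I neighborhood."""
    return num_actions * comb(n - 1 + max_indegree, max_indegree)

# Hypothetical example: 10 players, 9 shared actions, in-degree 3.
n, S, I = 10, 9, 3
print(normal_form_size(n, S))   # 10 * 9**10 = 34867844010
print(agg_size_bound(n, S, I))  # 9 * C(12, 3) = 9 * 220 = 1980
```

With I fixed, the bound grows only polynomially in n (here C(n − 1 + 3, 3) = O(n³)), while the normal form grows exponentially, which is exactly the asymptotic claim made above.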
Furthermore, there are no entries in the AGG utility functions that do not correspond to some action profile (s_i, s_{-i}) in the normal form. This means that there exists a many-to-one mapping from entries of the normal form to utilities in the AGG. Of course, the AGG representation has the extra overhead of representing the action graph, which is bounded by |S| · I. But asymptotically, an AGG's space complexity is never worse than that of the equivalent normal form.

2.3 Computing with AGGs

One of the main motivations for compactly representing games is to do efficient computation on them. We have introduced the AGG as a compact representation of games; now we would like to exploit this compactness when we do computation. We focus on the computational task of computing expected payoffs under a mixed strategy profile. Besides being important in itself, this task is an essential component of many game-theoretic applications, e.g. computing best responses, Govindan and Wilson's continuation methods for finding Nash equilibria [Govindan and Wilson 2003; Govindan and Wilson 2004], the simplicial subdivision algorithm for finding Nash equilibria [van der Laan et al. 1987], and finding correlated equilibria using Papadimitriou's algorithm [Papadimitriou 2005].

Besides exploiting the compactness of the representation, we would also like to be able to exploit the fact that quite often the given mixed strategy profile will have small support. The support of a mixed strategy σ_i is the set of pure strategies played with positive probability (i.e. σ_i(s_i) > 0). Quite often games have Nash equilibria with small support. Porter et al. [2004] proposed algorithms that explicitly search for Nash equilibria with small support.
In other algorithms for computing Nash equilibria, such as Govindan–Wilson and simplicial subdivision, we will also often be computing expected payoffs for mixed strategy profiles with small support. Our algorithm appropriately exploits strategy profiles with small support.

2.3.1 Notation

Let φ(X) denote the set of all probability distributions over a set X. Define the set of mixed strategies for i as Σ_i = φ(S_i), and the set of all mixed strategy profiles as Σ = ∏_{i∈N} Σ_i. We denote an element of Σ_i by σ_i, an element of Σ by σ, and the probability that i plays action s as σ_i(s).

Define the expected utility to agent i for playing pure strategy s_i, given that all other agents play the mixed strategy profile σ_{-i}, as

    V^i_{s_i}(σ_{-i}) = Σ_{s_{-i}∈S_{-i}} u(s_i, s_{-i}) Pr(s_{-i} | σ_{-i}),   (2.1)

where Pr(s_{-i} | σ_{-i}) = ∏_{j≠i} σ_j(s_j) is the probability of s_{-i} under the mixed strategy profile σ_{-i}. The set of i's pure strategy best responses to a mixed strategy profile σ_{-i} is arg max_{s_i} V^i_{s_i}(σ_{-i}), and hence the full set of i's pure and mixed strategy best responses to σ_{-i} is

    BR_i(σ_{-i}) = φ(arg max_{s_i} V^i_{s_i}(σ_{-i})).   (2.2)

A strategy profile σ is a Nash equilibrium iff

    ∀i ∈ N, σ_i ∈ BR_i(σ_{-i}).   (2.3)

2.3.2 Computing V^i_{s_i}(σ_{-i})

Equation (2.1) is a sum over the set S_{-i} of action profiles of players other than i. The number of terms is ∏_{j≠i} |S_j|, which grows exponentially in n; thus Equation (2.1) gives an exponential-time algorithm for computing V^i_{s_i}(σ_{-i}). If we were using the normal form representation, there really would be |S_{-i}| different outcomes to consider, each with a potentially distinct payoff value, so evaluating Equation (2.1) is the best we could do for computing V^i_{s_i}. Can we do better using the AGG representation? Since AGGs are fully expressive, representing a game without any structure as an AGG would not give us any computational savings compared to the normal form.
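As a baseline, Equation (2.1) can be evaluated by brute force. The sketch below is illustrative only and uses hypothetical data structures (each opponent's mixed strategy as a dict from actions to probabilities, and the utility function as a Python callable); it makes the exponential blow-up explicit, since it enumerates every profile in S_{-i}:

```python
from itertools import product

def expected_payoff_naive(u_i, s_i, sigmas_others):
    """Brute-force evaluation of Equation (2.1): sum over every action
    profile s_{-i} of the other players, weighting i's utility by
    Pr(s_{-i} | sigma_{-i}) = prod_{j != i} sigma_j(s_j)."""
    action_sets = [list(sigma.keys()) for sigma in sigmas_others]
    total = 0.0
    for s_minus_i in product(*action_sets):       # |S_{-i}| terms
        prob = 1.0
        for sigma_j, s_j in zip(sigmas_others, s_minus_i):
            prob *= sigma_j[s_j]
        total += prob * u_i(s_i, s_minus_i)
    return total
```

The loop touches ∏_{j≠i} |S_j| profiles, which is exactly the exponential cost the rest of this section is designed to avoid.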
Instead, we are interested in structured games that have a compact AGG representation. In this section we present an algorithm that, given any i, s_i and σ_{-i}, computes the expected payoff V^i_{s_i}(σ_{-i}) in time polynomial in the size of the AGG representation. In other words, our algorithm is efficient if the AGG is compact, and requires time exponential in n if it is not. In particular, recall that classes of AGGs whose in-degrees are bounded by a constant have sizes polynomial in n; as a result, our algorithm is polynomial in n for such games.

First we consider how to take advantage of the context-specific independence structure of the AGG, i.e. the fact that i's payoff when playing s_i only depends on the configuration over the neighborhood of s_i. This allows us to project the other players' strategies onto smaller action spaces that are relevant given s_i. This is illustrated in Figure 2.4, using the ice cream vendor game (Figure 2.3). Intuitively, we construct a graph from the point of view of an agent who took a particular action, expressing her indifference between actions that do not affect her chosen action. This can be thought of as inducing a context-specific graphical game. Formally, for every action s ∈ S define a reduced graph G^(s) by including only the nodes ν(s) and a new node denoted ∅. The only edges included in G^(s) are the directed edges from each node in ν(s) to the node s. Player j's action s_j is projected to a node in the reduced graph G^(s) by the following mapping:

    s_j^(s) = s_j if s_j ∈ ν(s);  s_j^(s) = ∅ otherwise.   (2.4)

Figure 2.4: Projection of the action graph. Left: action graph of the ice cream vendor game. Right: projected action graph and action sets with respect to the action C1.

In other words, actions that are not in ν(s) (and therefore do not affect the payoffs of agents playing s) are projected to ∅.
The resulting projected action set S_j^(s) has cardinality at most min(|S_j|, |ν(s)| + 1). We define the set of mixed strategies on the projected action set by Σ_j^(s) = φ(S_j^(s)). A mixed strategy σ_j on the original action set S_j is projected to σ_j^(s) ∈ Σ_j^(s) by the following mapping:

    σ_j^(s)(s_j^(s)) = σ_j(s_j) if s_j ∈ ν(s);  σ_j^(s)(∅) = Σ_{s'∈S_j\ν(s)} σ_j(s').   (2.5)

So given s_i and σ_{-i}, we can compute σ_{-i}^(s_i) in O(n|S|) time in the worst case. Now we can operate entirely on the projected space, and write the expected payoff as

    V^i_{s_i}(σ_{-i}) = Σ_{s_{-i}^(s_i)} u(s_i, s_{-i}^(s_i)) Pr(s_{-i}^(s_i) | σ_{-i}^(s_i)),

where Pr(s_{-i}^(s_i) | σ_{-i}^(s_i)) = ∏_{j≠i} σ_j^(s_i)(s_j^(s_i)). The summation is over S_{-i}^(s_i), which in the worst case has (|ν(s_i)| + 1)^(n-1) terms. So for AGGs with strict or context-specific independence structure, computing V^i_{s_i}(σ_{-i}) this way is much faster than doing the summation in (2.1) directly. However, the time complexity of this approach is still exponential in n.

Next we want to take advantage of the anonymity structure of the AGG. Recall from our discussion of representation size that the number of distinct configurations is usually smaller than the number of distinct pure action profiles. So ideally, we want to compute the expected payoff V^i_{s_i}(σ_{-i}) as a sum over the possible configurations, weighted by their probabilities:

    V^i_{s_i}(σ_{-i}) = Σ_{D^(s_i)∈Δ^(s_i,i)} u^{s_i}(D^(s_i)) Pr(D^(s_i) | σ^(s_i)),   (2.6)

where σ^(s_i) = (s_i, σ_{-i}^(s_i)) and

    Pr(D^(s_i) | σ^(s_i)) = Σ_{s : D^(s_i)(s) = D^(s_i)} ∏_{j=1}^n σ_j^(s_i)(s_j^(s_i)),   (2.7)

which is the probability of D^(s_i) given the mixed strategy profile σ^(s_i). Equation (2.6) is a summation of size |Δ^(s_i,i)|, the number of configurations given that i played s_i, which is polynomial in n if the in-degree of the action graph is bounded. The difficult task is to compute Pr(D^(s_i) | σ^(s_i)) for all D^(s_i) ∈ Δ^(s_i,i), i.e. the probability distribution over Δ^(s_i,i) induced by σ^(s_i). We observe that the sum in Equation (2.7) is over the set of all action profiles corresponding to the configuration D^(s_i). The size of this set is exponential in the number of players.
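The projection steps (2.4)–(2.5) used above are a simple lumping of probability mass; a minimal sketch (with Python's `None` standing in for the dummy action ∅):

```python
def project_strategy(sigma_j, nu_s):
    """Project sigma_j onto the reduced action set of Equation (2.5):
    actions in nu(s) keep their probabilities, while the probability of
    all actions outside nu(s) is lumped onto the dummy action None."""
    projected = {}
    for s_j, prob in sigma_j.items():
        key = s_j if s_j in nu_s else None
        projected[key] = projected.get(key, 0.0) + prob
    return projected
```

For example, projecting {a: 0.2, b: 0.3, c: 0.5} with ν(s) = {a} lumps b and c together, giving probability 0.8 on ∅.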
Therefore directly computing the probability distribution using Equation (2.7) would take time exponential in n. Indeed this is the approach proposed in [Bhat and Leyton-Brown 2004]. Can we do better? We observe that the players' mixed strategies are independent, i.e. σ is a product probability distribution σ(s) = ∏_i σ_i(s_i). Also, each player affects the configuration D independently. This structure allows us to use dynamic programming (DP) to efficiently compute the probability distribution Pr(D^(s_i) | σ^(s_i)). The intuition behind our algorithm is to apply one agent's mixed strategy at a time; in other words, we add one agent at a time to the action graph. Let σ_{1..k}^(s_i) denote the projected strategy profile of agents {1,...,k}. Denote by Δ_k^(s_i) the set of configurations induced by the actions of agents {1,...,k}, and by D_k^(s_i) ∈ Δ_k^(s_i) such a configuration. Denote by P_k the probability distribution on Δ_k^(s_i) induced by σ_{1..k}^(s_i), and by P_k[D] the probability of configuration D. At iteration k of the algorithm, we compute P_k from P_{k-1} and σ_k^(s_i). After iteration n, the algorithm stops and returns P_n. The pseudocode of our DP algorithm is shown as Algorithm 1.

Each D_k is represented as a sequence of integers, so P_k is a mapping from sequences of integers to real numbers. We need a data structure for manipulating such probability distributions over configurations (sequences of integers) which permits quick lookup, insertion and enumeration. An efficient data structure for this purpose is a trie. Tries are commonly used in text processing to store strings of characters, e.g. as dictionaries for spell checkers. Here we use tries to store strings of integers rather than characters. Both lookup and insertion complexity is linear in |ν(s_i)|. To achieve efficient enumeration of all elements of a trie, we store the elements in a list, in the order of their insertion.
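The iteration just described can be sketched as follows, with a Python dict keyed by configuration tuples standing in for the trie (an illustrative sketch, not the thesis's implementation):

```python
def induced_distribution(nodes, projected_strategies):
    """DP over configurations: start from the empty configuration and
    apply one player's projected mixed strategy at a time.  `nodes`
    lists the action nodes in nu(s_i); each strategy maps a node in
    `nodes` (or None, the dummy action) to its probability."""
    index = {m: k for k, m in enumerate(nodes)}
    dist = {(0,) * len(nodes): 1.0}            # P_0: empty configuration
    for sigma_k in projected_strategies:       # iteration k applies player k
        new_dist = {}
        for config, p in dist.items():
            for action, prob in sigma_k.items():
                if prob <= 0.0:                # skip actions outside the support
                    continue
                if action is None:             # dummy action leaves D unchanged
                    nxt = config
                else:                          # apply the action: D(action) += 1
                    c = list(config)
                    c[index[action]] += 1
                    nxt = tuple(c)
                new_dist[nxt] = new_dist.get(nxt, 0.0) + p * prob
        dist = new_dist
    return dist                                # P_n = Pr(D | sigma)
```

For two players who each play action a with probability 1/2 after projection, the returned distribution over D(a) is binomial: 1/4, 1/2, 1/4 on counts 0, 1, 2.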
Algorithm 1 Computing the induced probability distribution Pr(D^(s_i) | σ^(s_i)).

Algorithm ComputeP
Input: s_i, σ^(s_i)
Output: P_n, the distribution Pr(D^(s_i) | σ^(s_i)), represented as a trie.
  D_0^(s_i) = (0, ..., 0);  P_0[D_0^(s_i)] = 1.0   // Initialization: Δ_0^(s_i) = {D_0^(s_i)}
  for k = 1 to n do
    Initialize P_k to be an empty trie
    for all D_{k-1}^(s_i) from P_{k-1} do
      for all s_k^(s_i) ∈ S_k^(s_i) such that σ_k^(s_i)(s_k^(s_i)) > 0 do
        D_k^(s_i) = D_{k-1}^(s_i)
        if s_k^(s_i) ≠ ∅ then
          D_k^(s_i)(s_k^(s_i)) += 1   // Apply action s_k^(s_i)
        end if
        if P_k[D_k^(s_i)] does not exist yet then
          P_k[D_k^(s_i)] = 0.0
        end if
        P_k[D_k^(s_i)] += P_{k-1}[D_{k-1}^(s_i)] × σ_k^(s_i)(s_k^(s_i))
      end for
    end for
  end for
  return P_n

2.3.3 Proof of Correctness

It is straightforward to see that Algorithm 1 computes the following recurrence in iteration k: for all D_k ∈ Δ_k^(s_i),

    P_k[D_k] = Σ_{D_{k-1}, s_k^(s_i) : D(D_{k-1}, s_k^(s_i)) = D_k} P_{k-1}[D_{k-1}] × σ_k^(s_i)(s_k^(s_i)),   (2.8)

where D(D_{k-1}, s_k^(s_i)) denotes the configuration resulting from applying k's projected action s_k^(s_i) to the configuration D_{k-1} ∈ Δ_{k-1}^(s_i). On the other hand, the probability distribution on Δ_k^(s_i) induced by σ_{1..k} is by definition

    Pr(D_k | σ_{1..k}) = Σ_{s_{1..k} : D^(s_i)(s_{1..k}) = D_k} ∏_{j=1}^k σ_j^(s_i)(s_j^(s_i)).   (2.9)

Now we want to prove that our DP algorithm indeed computes the correct probability distribution, i.e. that P_k[D_k] as defined by Equation (2.8) is equal to Pr(D_k | σ_{1..k}).

Theorem 1. For all k, and for all D_k ∈ Δ_k^(s_i), P_k[D_k] = Pr(D_k | σ_{1..k}).

Proof by induction on k. (For readability we drop the (s_i) superscripts on projected actions and strategies.)

Base case: applying Equation (2.8) for k = 1, it is straightforward to verify that P_1[D_1] = Pr(D_1 | σ_1) for all D_1 ∈ Δ_1^(s_i).

Inductive case: assume P_{k-1}[D_{k-1}] = Pr(D_{k-1} | σ_{1..k-1}) for all D_{k-1} ∈ Δ_{k-1}^(s_i). Then

    P_k[D_k]
    = Σ_{D_{k-1}, s_k : D(D_{k-1}, s_k) = D_k} P_{k-1}[D_{k-1}] × σ_k(s_k)   (2.10)
    = Σ_{D_{k-1}, s_k : D(D_{k-1}, s_k) = D_k} ( Σ_{s_{1..k-1} : D(s_{1..k-1}) = D_{k-1}} ∏_{j=1}^{k-1} σ_j(s_j) ) σ_k(s_k)   (2.11)
    = Σ_{D_{k-1}, s_k : D(D_{k-1}, s_k) = D_k} Σ_{s_{1..k-1} : D(s_{1..k-1}) = D_{k-1}} ∏_{j=1}^{k} σ_j(s_j)   (2.12)
    = Σ_{D_{k-1}} Σ_{s_k} Σ_{s_{1..k-1}} 1[D(D_{k-1}, s_k) = D_k] · 1[D(s_{1..k-1}) = D_{k-1}] · ∏_{j=1}^{k} σ_j(s_j)   (2.13)
    = Σ_{s_{1..k}} ( Σ_{D_{k-1}} 1[D(D_{k-1}, s_k) = D_k] · 1[D(s_{1..k-1}) = D_{k-1}] ) ∏_{j=1}^{k} σ_j(s_j)   (2.14)
    = Σ_{s_{1..k}} 1[D(s_{1..k}) = D_k] · ∏_{j=1}^{k} σ_j(s_j)   (2.15)
    = Σ_{s_{1..k} : D(s_{1..k}) = D_k} ∏_{j=1}^{k} σ_j(s_j)   (2.16)
    = Pr(D_k | σ_{1..k}).   (2.17)

Note that from (2.14) to (2.15) we use the fact that given an action profile s_{1..k-1}, there is a unique configuration D_{k-1} ∈ Δ_{k-1}^(s_i) such that D_{k-1} = D^(s_i)(s_{1..k-1}), so the inner sum collapses to a single indicator. ∎

2.3.4 Complexity

Our algorithm for computing V^i_{s_i}(σ_{-i}) consists of first computing the projected strategies using (2.5), then following Algorithm 1, and finally doing the weighted sum given in Equation (2.6). Let Δ^(s_i,i)(σ_{-i}) denote the set of configurations over ν(s_i) that have positive probability of occurring under the mixed strategy profile (s_i, σ_{-i}); |Δ^(s_i,i)(σ_{-i})| is then the number of terms we need to add together when doing the weighted sum in Equation (2.6). When σ_{-i} has full support, Δ^(s_i,i)(σ_{-i}) = Δ^(s_i,i). Since looking up an entry in a trie takes time linear in the size of the key, which is |ν(s_i)| in our case, the complexity of doing the weighted sum in Equation (2.6) is O(|ν(s_i)| |Δ^(s_i,i)(σ_{-i})|).

Algorithm 1 requires n iterations; in iteration k, we look at all possible combinations of D_{k-1}^(s_i) and s_k^(s_i), and in each case do a trie lookup which costs O(|ν(s_i)|). Since |S_k^(s_i)| ≤ |ν(s_i)| + 1 and |Δ_{k-1}^(s_i)| ≤ |Δ^(s_i,i)(σ_{-i})|, the complexity of Algorithm 1 is O(n |ν(s_i)|² |Δ^(s_i,i)(σ_{-i})|). This dominates the complexity of summing up (2.6). Adding the cost of computing σ_{-i}^(s_i), we get an overall complexity of expected payoff computation of O(n|S| + n |ν(s_i)|² |Δ^(s_i,i)(σ_{-i})|).

Since |Δ^(s_i,i)(σ_{-i})| ≤ |Δ^(s_i,i)| ≤ |Δ^(s_i)|, and |Δ^(s_i)| is the number of payoff values stored in the payoff function u^{s_i}, this means that expected payoffs can be computed in polynomial time with respect to the size of the AGG. Furthermore, our algorithm is able to exploit strategies with small support, which lead to a small |Δ^(s_i,i)(σ_{-i})|. Since |Δ^(s_i)| is bounded by the binomial coefficient (n−1+|ν(s_i)| choose |ν(s_i)|), if the in-degree of the action graph is bounded by a constant I, then the complexity of computing expected payoffs is O(n|S| + n^{I+1}).

Theorem 2. Given an AGG representation of a game, i's expected payoff V^i_{s_i}(σ_{-i}) can be computed in polynomial time with respect to the representation size; if the in-degree of the action graph is bounded by a constant, the complexity is polynomial in n.

2.3.5 Discussion

Of course it is not necessary to apply the agents' mixed strategies in the order 1...n; we can apply them in any order. Although the number of configurations |Δ^(s_i,i)(σ_{-i})| remains the same, the ordering does affect the intermediate sets of configurations Δ_k^(s_i). We can use the following heuristic to try to minimize the number of intermediate configurations: sort the players by the sizes of their projected action sets, in ascending order. This reduces the amount of work done in earlier iterations of Algorithm 1, but does not change the overall complexity of the algorithm.

In fact, we do not even have to apply one agent's strategy at a time. We could partition the set of players into sub-groups, compute the distributions induced by each of these sub-groups, and then combine these distributions. Algorithm 1 can be straightforwardly extended to deal with such distributions instead of mixed strategies of single agents. In Section 2.5.1 we apply this approach to compute Jacobians efficiently.

Relation to Polynomial Multiplication

We observe that the problem of computing Pr(D | σ^(s_i)) can be expressed as a multiplication of multivariate polynomials.
For each action node s ∈ ν(s_i), let x_s be a variable corresponding to s. Then consider the following expression:

    ∏_{k=1}^n ( σ_k^(s_i)(∅) + Σ_{s_k∈S_k∩ν(s_i)} σ_k^(s_i)(s_k) x_{s_k} )   (2.18)

This is a multiplication of n multivariate polynomials, each corresponding to one player's projected mixed strategy. The expression expands to a sum of |Δ^(s_i,i)| terms. Each term can be identified by the tuple of exponents of the x variables, (D(s), D(s'), ...); in other words, the set of terms corresponds to the set of configurations Δ^(s_i,i). The coefficient of the term with exponents D ∈ Δ^(s_i,i) is

    Σ_{s^(s_i) : D^(s_i)(s^(s_i)) = D} ( ∏_{k=1}^n σ_k^(s_i)(s_k^(s_i)) ),

which is exactly Pr(D | σ^(s_i)) by Equation (2.7)! So the whole expression (2.18) evaluates to

    Σ_{D∈Δ^(s_i,i)} Pr(D | σ^(s_i)) ∏_{s∈ν(s_i)} x_s^{D(s)}.

Thus the problem of computing Pr(D | σ^(s_i)) is equivalent to computing the coefficients in (2.18). Our DP algorithm corresponds to multiplying one polynomial at a time: at iteration k we multiply the polynomial corresponding to player k's strategy with the expanded polynomial of players 1...(k−1) computed in the previous iteration.

2.4 AGG with Function Nodes

There are games with certain kinds of context-specific independence structure that AGGs are not able to exploit. In Example 4 we show a class of games with one such kind of structure. Our solution is to extend the AGG representation by introducing function nodes, which allow us to exploit a much wider variety of structure.

2.4.1 Motivating Example: Coffee Shop

Example 4. In the Coffee Shop Game there are n players; each player is planning to open a new coffee shop in a downtown area, but has to decide on the location. The downtown area is represented by an r × c grid. Each player can choose to open a shop at any of the B = rc blocks, or decide not to enter the market.
Conditioned on player i choosing some location s, her utility depends on:
• the number of players that chose the same block,
• the number of players that chose any of the surrounding blocks, and
• the number of players that chose any other location.

The normal form representation of this game has size n|S|^n = n(B+1)^n. Since there are no strict independencies in the utility function, the size of the graphical game representation would be similar. Let us now represent the game as an AGG. We observe that if agent i chooses an action s corresponding to one of the B locations, then her payoff is affected by the configuration over all B locations. Hence ν(s) consists of the B action nodes corresponding to the B locations, and the action graph has in-degree I = B. Since the action sets completely overlap, the representation size is O(|S| |Δ^(s)|) = O(B (n−1+B choose B)). If we hold B constant, this becomes O(B n^B), which is exponentially more compact than the normal form and the graphical game representation. If we instead hold n constant, the size of the representation is O(B^n), which is only slightly better than the normal form and graphical game representations.

Intuitively, the AGG representation is only able to exploit the anonymity structure in this game. However, this game's payoff function does have context-specific structure. Observe that u^s depends only on three quantities: the number of players that chose the same block, the number of players that chose surrounding blocks, and the number of players that chose other locations. In other words, u^s can be written as a function g of only 3 integers: u^s(D^(s)) = g(D(s), Σ_{s'∈S'} D(s'), Σ_{s''∈S''} D(s'')), where S' is the set of actions surrounding s and S'' is the set of actions corresponding to the other locations. Because the AGG representation is not able to exploit this context-specific information, utility values are duplicated in the representation.
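The size comparison above is easy to check numerically; the sketch below counts configurations via the standard stars-and-bars identity (an illustrative helper, not part of the thesis's implementation):

```python
from math import comb

def coffee_shop_sizes(n, r, c):
    """Payoff-value counts for the coffee shop game: the normal form
    stores n*(B+1)^n values, while the AGG without function nodes stores
    roughly one value per location action per configuration of the other
    n-1 players over the B location nodes -- a composition of n-1 into
    B+1 nonnegative parts (the extra part counts players outside nu(s)),
    i.e. C(n-1+B, B) configurations."""
    B = r * c
    normal_form = n * (B + 1) ** n
    agg = B * comb(n - 1 + B, B)
    return normal_form, agg
```

For instance, n = 4 players on a 2 × 2 grid gives 2500 normal-form payoff values versus 140 for the AGG; the gap widens rapidly as n grows with B held fixed.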
2.4.2 Function Nodes

In the above example we showed a kind of context-specific independence structure that AGGs cannot exploit. It is easy to think of similar examples in which u^s can be written as a function of a small number of intermediate parameters. One example is a "parity game" in which u^s depends only on whether Σ_{s'∈ν(s)} D(s') is even or odd. Here u^s would have just two distinct values, but the AGG representation would have to specify a value for every configuration D^(s).

This kind of structure can be exploited within the AGG framework by introducing function nodes to the action graph G. Now G's vertices consist of both the set of action nodes S and the set of function nodes P. We require that no function node p ∈ P be in any player's action set, i.e. S ∩ P = {}, so the total number of nodes in G is |S| + |P|. Each node in G can have action nodes and/or function nodes as neighbors. For each p ∈ P, we introduce a function f^p : Δ^(p) → ℕ, where D^(p) ∈ Δ^(p) denotes a configuration over p's neighbors. Configurations D are extended over the entire set of nodes by defining D(p) = f^p(D^(p)). Intuitively, the D(p) are the intermediate parameters on which players' utilities depend.

To ensure that the AGG is meaningful, the graph G restricted to the nodes in P is required to be a directed acyclic graph (DAG). Furthermore, it is required that every p ∈ P has at least one neighbor (i.e. incoming edge). These conditions ensure that D(s) for all s and D(p) for all p are well-defined. To ensure that every p ∈ P is "useful", we also require that p has at least one outgoing edge. As before, for each action node s we define a utility function u^s : Δ^(s) → ℝ. We call this extended representation (N, S, P, ν, {f^p}_{p∈P}, u) an Action Graph Game with Function Nodes (AGGFN).
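The extended configuration D can be computed by counting actions and then evaluating function nodes in an order consistent with the DAG over P. A minimal sketch, under the hypothetical data layout of function nodes given as a topologically ordered list of (p, ν(p), f^p) triples:

```python
def extended_configuration(action_profile, action_nodes, function_nodes):
    """Count D(s) for each action node from a pure action profile, then
    set D(p) = f^p(D restricted to nu(p)) for each function node.  The
    DAG condition on P guarantees a topological order exists, so every
    D(p) is well-defined by the time it is evaluated."""
    D = {s: 0 for s in action_nodes}
    for s in action_profile:
        D[s] += 1
    for p, nu_p, f_p in function_nodes:   # topological order over P
        D[p] = f_p({m: D[m] for m in nu_p})
    return D
```

For example, with action nodes {a, b, c}, profile (a, a, b), and one summation function node p with ν(p) = {a, b}, the extended configuration assigns D(p) = 3.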
2.4.3 Representation Size

Given an AGGFN, we can construct an equivalent AGG with the same players N and actions S and equivalent utility functions, but represented without any function nodes. We put an edge from s' to s in the AGG if either there is an edge from s' to s in the AGGFN, or there is a path from s' to s through a chain of function nodes. The number of utilities stored in an AGGFN is no greater than the number of utilities in the equivalent AGG without function nodes. We can show this by arguments similar to those made before, establishing a many-to-one mapping from utilities in the AGG representation to utilities in the AGGFN.

On the other hand, AGGFNs have to represent the functions f^p, which can either be implemented using elementary operations or represented as mappings similar to u^s. We could construct examples with a huge number of function nodes, such that the space complexity of representing {f^p}_{p∈P} would be greater than that of the utility functions. In other words, blindly adding function nodes will not make the representation more compact. We want to add function nodes only when they represent meaningful intermediate parameters and hence reduce the number of incoming edges on action nodes.

Consider our coffee shop example. For each action node s corresponding to a location, we introduce function nodes p'_s and p''_s. Let ν(p'_s) consist of the actions surrounding s, and ν(p''_s) of the actions corresponding to the other locations. Then we modify ν(s) so that it has 3 nodes: ν(s) = {s, p'_s, p''_s}, as shown in Figure 2.5. For all function nodes p ∈ P, we define f^p(D^(p)) = Σ_{m∈ν(p)} D(m). Now each D^(s) is a configuration over only 3 nodes. Since f^p is a summation operator, |Δ^(s)| is the number of compositions of n−1 into 4 nonnegative integers, (n+2 choose 3) = n(n+1)(n+2)/6 = O(n³). We must therefore store O(Bn³) utility values.
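The configuration count used here can be checked with the standard compositions formula (`compositions` is a hypothetical helper written for this sketch):

```python
from math import comb

def compositions(total, parts):
    """Number of ordered ways to write `total` as a sum of `parts`
    nonnegative integers: C(total + parts - 1, parts - 1)."""
    return comb(total + parts - 1, parts - 1)

def coffee_shop_delta_s(n):
    """|Delta^(s)| with function nodes: the other n-1 players each count
    toward s, p'_s, p''_s, or none of the three, i.e. a composition of
    n-1 into 4 nonnegative parts, which equals n(n+1)(n+2)/6."""
    return compositions(n - 1, 4)
```

This matches the closed form above: for n = 5, both the compositions count and 5·6·7/6 give 35 configurations per action node.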
This O(Bn³) bound is significantly more compact than the AGG representation without function nodes, whose size was O(B (n−1+B choose B)).

Figure 2.5: A 5 × 6 Coffee Shop Game. Left: the AGG representation without function nodes (looking only at the neighborhood of a node s). Middle: we introduce two function nodes. Right: s now has only 3 incoming edges.

Remark 1. One property of the AGG representation as defined in Section 2.2.1 is that the utility function u^s is shared by all players that have s in their action sets. What if we want to represent games with agent-specific utility functions, where utilities depend not only on s and D^(s), but also on the identity of the player playing s? We could split s into individual players' actions s_i, s_j, etc., so that each action node has its own utility function; however, the resulting AGG would not be able to take advantage of the fact that the actions s_i, s_j affect the other players' utilities in the same way. Using function nodes, we are able to represent this kind of structure compactly. We again split s into separate action nodes s_i, s_j, but also introduce a function node p with s_i, s_j as its neighbors, and define f^p to be the summation operator f^p(D^(p)) = Σ_{m∈ν(p)} D(m). This way the function node p with its configuration D(p) acts as if s_i and s_j had been merged into one node. Action nodes can then include p instead of both s_i and s_j as a neighbor. This way agents can have different utility functions without sacrificing representational compactness.

2.4.4 Computing with AGGFNs

Our expected-payoff algorithm cannot be directly applied to AGGFNs with arbitrary f^p. First of all, projection of strategies does not work directly, because a player j playing an action s_j ∉ ν(s) could still affect D^(s) via function nodes.
Furthermore, our DP algorithm for computing the probabilities does not work, because for an arbitrary function node p ∈ ν(s), the players would not be guaranteed to affect D(p) independently. Therefore in the worst case we need to convert the AGGFN to an AGG without function nodes in order to apply our algorithm. This means that we are not always able to translate the extra compactness of AGGFNs over AGGs into more efficient computation.

Definition 1. An AGGFN is contribution-independent (CI) if
• For all p ∈ P, ν(p) ⊆ S, i.e. the neighbors of function nodes are action nodes.
• There exists a commutative and associative operator *, and for each node s ∈ S an integer w_s, such that given an action profile s, for all p ∈ P, D(p) = *_{i∈N : s_i∈ν(p)} w_{s_i}.

Note that this definition entails that D(p) can be written as a function of D^(p) by collecting terms: D(p) = f^p(D^(p)) = *_{s∈ν(p)} (*_{k=1}^{D(s)} w_s).

The coffee shop game is an example of a contribution-independent AGGFN, with summation serving as *, and w_s = 1 for all s. For the parity game mentioned earlier, * is instead addition mod 2. If we are modeling an auction, and want D(p) to represent the amount of the winning bid, we let w_s be the bid amount corresponding to action s and let * be the max operator.

For contribution-independent AGGFNs, each player's strategy affects D(p) independently, for every function node p. This fact allows us to adapt our algorithm to efficiently compute the expected payoff V^i_{s_i}(σ_{-i}). For simplicity we present the algorithm for the case where we have one operator * for all p ∈ P, but our approach can be directly applied to games with different operators associated with different function nodes, and likewise with a different set of w_s for each operator. We define the contribution of action s to node m ∈ S ∪ P, denoted C_s(m), as 1 if m = s, 0 if m ∈ S \ {s}, and, for m ∈ P, w_s if s ∈ ν(m) and the identity element of * (an empty *-expression) otherwise.
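The auction example can be checked directly: computing D(p) from the whole action profile agrees with folding in one player's contribution at a time under * = max, which is exactly the property the adapted DP relies on. An illustrative sketch; bid values and node names are hypothetical:

```python
def winning_bid_direct(profile, nu_p, w):
    """D(p) computed from the full action profile: the max bid w_s over
    players whose action s is a neighbor of p (0, the identity of max
    over nonnegative bids, if nobody bid)."""
    return max((w[s] for s in profile if s in nu_p), default=0)

def winning_bid_by_contributions(profile, nu_p, w):
    """The same D(p), built by combining one player's contribution
    C_s(p) at a time with * = max."""
    d = 0
    for s in profile:
        contribution = w[s] if s in nu_p else 0   # identity if s not in nu(p)
        d = max(d, contribution)
    return d
```

Both functions return the same value on every profile, which is what makes the per-player DP sound for this operator.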
It is then easy to verify that given an action profile s, D(s) = Σ_{j=1}^n C_{s_j}(s) for all s ∈ S and D(p) = *_{j=1}^n C_{s_j}(p) for all p ∈ P.

Given that player i played s_i, we define the projected contribution of action s, denoted C_s^(s_i), as the tuple (C_s(m))_{m∈ν(s_i)}. Note that different actions may have identical projected contributions. Player j's mixed strategy σ_j induces a probability distribution over j's projected contributions: Pr(C^(s_i) | σ_j) = Σ_{s_j : C_{s_j}^(s_i) = C^(s_i)} σ_j(s_j). Now we can operate entirely with the probabilities of projected contributions instead of the mixed strategy probabilities. This is analogous to the projection of σ_j to σ_j^(s_i) in our algorithm for AGGs without function nodes.

Algorithm 1 for computing the distribution Pr(D^(s_i) | σ^(s_i)) can be straightforwardly adapted to work with contribution-independent AGGFNs: whenever we apply player k's contribution C_{s_k}^(s_i) to D_{k-1}^(s_i), the resulting configuration D_k^(s_i) is computed componentwise as follows: D_k^(s_i)(m) = C_{s_k}^(s_i)(m) + D_{k-1}^(s_i)(m) if m ∈ S, and D_k^(s_i)(m) = C_{s_k}^(s_i)(m) * D_{k-1}^(s_i)(m) if m ∈ P. Following a similar complexity analysis, if an AGGFN is contribution-independent, expected payoffs can be computed in polynomial time with respect to the representation size. Applied to the coffee shop example: since |Δ^(s)| = O(n³), our algorithm takes O(n|S| + n⁴) time, which grows linearly in |S|.

Remark 2. We note that similar ideas are employed in the variable elimination algorithms that exploit causal independence in Bayes nets [Zhang and Poole 1996]. Bayes nets are compact representations of probability distributions that graphically represent independencies between random variables. A Bayes net is a DAG in which nodes represent random variables and edges represent direct probabilistic dependence between random variables.
Efficient algorithms have been developed to compute conditional probabilities in Bayes nets, such as clique tree propagation and variable elimination. Causal independence refers to the situation where a node's parents (which may represent causes) affect the node independently; the node's conditional probabilities can then be defined using a binary operator applied to values from each of the parent variables. Zhang and Poole [1996] proposed a variable elimination algorithm that exploits causal independence by factoring the conditional probability distribution into factors corresponding to the causes. The way factors are combined is similar in spirit to our DP algorithm, which combines the independent contributions of the players' strategies to the configuration D^(s_i).

This parallel between Bayes nets and action graphs is not surprising. In AGGFNs, we are trying to compute the probability distribution over configurations Pr(D^(s_i) | σ^(s_i)). If we view each node m in the action graph as a random variable D(m), this is the joint distribution of the variables in ν(s_i). However, whereas edges in Bayes nets represent probabilistic dependence, edges in the action graph have different semantics depending on the target. Incoming edges of an action node s specify the neighborhood ν(s) whose configuration probabilities we are interested in computing. Incoming edges of a function node represent the deterministic dependence between the function node's random variable D(p) and its parents. The only probabilistic components of action graphs are the players' mixed strategies; these are probability distributions of random variables associated with players, but are not explicitly represented in the action graph.
Whereas AGGFNs in general are not DAGs, given an action s_i we can construct an induced Bayes net consisting of ν(s_i), the neighbors of the function nodes in ν(s_i), the neighbors of any newly included function nodes, and so on until no more function nodes are included, finally augmented with n nodes representing the players' mixed strategies. Whereas for CI AGGFNs the Bayes net formulation has a simple structure and does not yield a more efficient algorithm than Algorithm 1, this formulation could be useful for non-CI AGGFNs with a complex network of function nodes, as standard Bayes net algorithms can be used to exploit the independencies in the induced Bayes net.

2.5 Applications

2.5.1 Application: Computing Payoff Jacobians

A game's payoff Jacobian under a mixed strategy σ is defined as a Σ_i |S_i| by Σ_i |S_i| matrix with entries defined as follows:

    ∇V^{s_i}_{s_{i'}}(σ̄) = Σ_{s̄∈S̄} u(s_i, (s_{i'}, s̄)) Pr(s̄ | σ̄).   (2.20)

Here, whenever we use an overbar in our notation it is shorthand for the subscript −{i,i'}; for example, s̄ = s_{−{i,i'}}. The rows of the matrix are indexed by i and s_i, while the columns are indexed by i' and s_{i'}. Given an entry ∇V^{s_i}_{s_{i'}}(σ̄), we call s_i its primary action node and s_{i'} its secondary action node.

One of the main reasons we are interested in computing Jacobians is that this is the computational bottleneck in Govindan and Wilson's continuation method for finding mixed-strategy Nash equilibria in multi-player games [Govindan and Wilson 2003]. The Govindan–Wilson algorithm starts by perturbing the payoffs to obtain a game with a known equilibrium. It then follows a path that is guaranteed to lead to one or more equilibria of the unperturbed game. At each step, we compute the payoff Jacobian under the current mixed strategy in order to get the direction of the path; we then take a small step along the path and repeat.
Efficient computation of the payoff Jacobian is important for more than this continuation method. For example, the iterated polymatrix approximation (IPA) method [Govindan and Wilson 2004] has the same computational problem at its core. At each step the IPA method constructs a polymatrix game that is a linearization of the current game with respect to the mixed strategy profile; the Lemke-Howson algorithm is used to solve this game, and the result updates the mixed strategy profile used in the next iteration. Though it offers no theoretical convergence guarantee, IPA is typically much faster than the continuation method. It is also often used to give the continuation method a quick start. The payoff Jacobian may also be useful to multiagent reinforcement learning algorithms that perform policy search.

Equation (2.20) shows that the entry ∇V^{s_i}_{s_{i'}}(σ̄) of the Jacobian can be interpreted as the expected utility of agent i when she takes action s_i, agent i' takes action s_{i'}, and all other agents play mixed strategies according to σ. So a straightforward approach is to use our dynamic programming algorithm to compute each entry of the Jacobian. However, the Jacobian matrix has additional structure that allows us to achieve further speedup. First, we observe that some entries of the Jacobian are identical. If two entries have the same primary action node s, then they are expected payoffs on the same utility function u^s; i.e., they have the same value if their induced probability distributions over Δ^{(s)} are the same. We need to consider two cases:

1. Suppose the two entries come from the same row of the Jacobian, say the row of player i's action s_i. There are two sub-cases to consider:

(a) Suppose the columns of the two entries belong to the same player j, but to different actions s_j and s'_j. If s_j^{(s_i)} = s'_j^{(s_i)}, i.e. s_j and s'_j project to the same action in s_i's projected action graph, then ∇V^{s_i}_{s_j}(σ̄) = ∇V^{s_i}_{s'_j}(σ̄).

(b) Suppose the columns of the entries correspond to actions of different players. We observe that for all j and s_j such that σ_j^{(s_i)}(s_j^{(s_i)}) = 1, ∇V^{s_i}_{s_j}(σ̄) = V^{s_i}(σ_{−i}). As a special case, if S_j^{(s_i)} = {∅}, i.e. agent j does not affect i's payoff when i plays s_i, then for all s_j ∈ S_j, ∇V^{s_i}_{s_j}(σ̄) = V^{s_i}(σ_{−i}).

2. If s_i and s_j correspond to the same action node s (but owned by agents i and j respectively), thus sharing the same payoff function u^s, then ∇V^{s_i}_{s_j}(σ̄) = ∇V^{s_j}_{s_i}(σ̄). Furthermore, if there exist s'_i ∈ S_i and s'_j ∈ S_j such that s'_i^{(s)} = s'_j^{(s)}, then ∇V^{s_i}_{s'_j}(σ̄) = ∇V^{s_j}_{s'_i}(σ̄).

Even if the entries are not equal, we can exploit the similarity of the projected strategy profiles (and thus the similarity of the induced distributions) between entries, and re-use intermediate results when computing the induced distributions of different entries. Since computing the induced probability distributions is the bottleneck of our expected payoff algorithm, this provides significant speedup. First we observe that if we fix the row (i, s_i) and the column's player j, then σ̄ is the same for all secondary actions s_j ∈ S_j. We can compute the probability distribution Pr(D_{n−1} | s_i, σ̄^{(s_i)}), and then for each s_j ∈ S_j we just need to apply the action s_j to obtain the induced probability distribution for the entry ∇V^{s_i}_{s_j}(σ̄). Now suppose we fix the row (i, s_i). For two column players j and j', the corresponding strategy profiles σ_{−{i,j}} and σ_{−{i,j'}} are very similar; in fact they are identical in n−3 of their n−2 components. For AGGs without function nodes, we can exploit this similarity by computing the distribution Pr(D_{n−1} | σ_{−i}^{(s_i)}), and then for each j ≠ i "undoing" j's mixed strategy to obtain the distribution induced by σ_{−{i,j}}. Recall from Section 2.3.5 that the distributions are coefficients of the multiplication of certain polynomials.
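Concretely, each such distribution is a vector of polynomial coefficients: combining two disjoint sub-groups of players multiplies their polynomials, and (for plain AGGs) removing one player divides by that player's polynomial. The sketch below illustrates the single-node special case where a player contributes to an action node's count with probability s, giving the factor [1 − s, s]; it is an illustration, not the full projected-graph computation:

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (index = count).
    If p and q are the count distributions contributed by two disjoint
    groups of players, the product is the combined group's distribution."""
    r = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def poly_div(r, q):
    """Long division from the constant term: undo one player's factor q
    from the combined distribution r, recovering the other factor.
    Requires q[0] != 0, which holds for a factor [1 - s, s] with s < 1."""
    p = [0.0] * (len(r) - len(q) + 1)
    rem = list(r)
    for i in range(len(p)):
        c = rem[i] / q[0]
        p[i] = c
        for j, b in enumerate(q):
            rem[i + j] -= c * b
    return p
```

Here multiplying the two-coin distribution [0.25, 0.5, 0.25] by a third player's factor [0.7, 0.3] and then dividing that factor back out recovers the original distribution, which is exactly the "undo" operation used above.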
So we can undo j's strategy by computing the long division of the polynomial for σ_{−i} by the polynomial for σ_j. This method does not work for contribution-independent AGGFNs, because in general a player's contributions to the configurations are not reversible; i.e., given Pr(D_{n−1} | σ_{−i}^{(s_i)}) and σ_j, it is not always possible to undo the contributions of σ_j. Instead, we can efficiently compute the distributions by recursively bisecting the set of players into sub-groups, computing the probability distributions induced by the strategies of these sub-groups, and combining them. For example, suppose n = 9 and i = 9, so σ_{−i} = σ_{1...8}. We need to compute the distributions induced by σ_{−{1,9}}, ..., σ_{−{8,9}}, respectively. Now we bisect σ_{−i} into σ_{1...4} and σ_{5...8}. Suppose we have computed the distributions induced by σ_{1...4} as well as by σ_{234}, σ_{134}, σ_{124} and σ_{123}, and similarly for the other group 5...8. Then we can compute Pr(· | σ_{−{1,9}}) by combining Pr(· | σ_{234}) and Pr(· | σ_{5...8}), compute Pr(· | σ_{−{2,9}}) by combining Pr(· | σ_{134}) and Pr(· | σ_{5...8}), and so on. We have reduced the problem to two smaller problems over the sub-groups 1...4 and 5...8, which can then be solved recursively by further bisecting the sub-groups. This method avoids the re-computation of sub-groups of strategies when computing the induced distributions for each row of the Jacobian, and it works with any contribution-independent AGGFN because it does not use long division to undo strategies.

2.6 Experiments

We implemented the AGG representation and our algorithm for computing expected payoffs and payoff Jacobians in C++. We ran several experiments comparing the performance of our implementation against the (heavily optimized) GameTracer implementation [Blum et al. 2002], which performs the same computation on the normal form representation.
We used the Coffee Shop game (with randomly-chosen payoff values) as a benchmark. We varied both the number of players and the number of actions.

Figure 2.6: Comparing representation sizes of the Coffee Shop game (log scale). Left: 5×5 grid with 3 to 16 players. Right: 4-player r×5 grid with r varying from 3 to 10.

2.6.1 Representation Size

For each game instance we counted the number of payoff values that need to be stored in each representation. Since for both the normal form and the AGG the size of the representation is dominated by the number of payoff values stored, this count is a good indication of the size of the representation. We first looked at Coffee Shop games on a 5×5 grid with varying numbers of players. Figure 2.6 (left) shows a log-scale plot of the number of payoff values in each representation versus the number of players. The normal form representation grew exponentially with the number of players and quickly became impractical, while the AGG representation grew only polynomially with n. We then fixed the number of players at 4 and varied the number of blocks. For ease of comparison we fixed the number of columns at 5 and changed only the number of rows. Figure 2.6 (right) shows a log-scale plot of the number of payoff values versus the number of rows. The size of the AGG representation grew linearly with the number of rows, whereas the size of the normal form representation grew like a higher-order polynomial.
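These growth rates can be checked directly by counting stored payoff values. The sketch below assumes the Coffee Shop action set consists of the rows × cols locations plus one opt-out action (so |S| = rows·cols + 1), and uses the O(|S|n^3) asymptotic form with constant factor 1 for the AGGFN count; both are illustrative assumptions rather than the implementation's exact bookkeeping:

```python
def coffeeshop_sizes(n, rows, cols):
    """Approximate payoff-value counts for an n-player Coffee Shop game.

    Returns (normal_form, aggfn). The normal form stores one payoff per
    player per pure strategy profile, n * |S|^n values; the AGGFN count
    uses the O(|S| n^3) form quoted in the text."""
    S = rows * cols + 1          # assumed action count: locations + opt out
    normal_form = n * S ** n     # exponential in the number of players
    aggfn = S * n ** 3           # polynomial in n, linear in |S|
    return normal_form, aggfn
```

For a 4-player game on a 5×5 grid this gives 4 · 26^4 ≈ 1.8 million normal form values versus 26 · 4^3 = 1664 for the AGGFN, matching the orders of magnitude visible in Figure 2.6.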
This was consistent with our theoretical prediction that AGGFNs store O(|S|n^3) payoff values for Coffee Shop games while normal form representations store n|S|^n payoff values.

2.6.2 Expected Payoff Computation

Second, we tested the performance of our dynamic programming algorithm against GameTracer's normal-form-based algorithm for computing expected payoffs, on Coffee Shop games of different sizes. For each game instance, we generated 1000 random strategy profiles with full support, and measured the CPU (user) time spent computing expected payoffs under these strategy profiles. We first fixed the size of the grid at 5×5 and varied the number of players. Figure 2.7 shows plots of the results. For very small games the normal-form-based algorithm is faster due to its smaller bookkeeping overhead; as the number of players grows, our AGGFN-based algorithm's running time grows polynomially, while the normal-form-based algorithm scales exponentially. For more than five players, we were not able to store the normal form representation in memory.

Figure 2.7: Running times for payoff computation in the Coffee Shop game. Left: 5×5 grid with 3 to 16 players. Right: 4-player r×5 grid with r varying from 3 to 10.

Next, we fixed the number of players at 4 and the number of columns at 5, and varied the number of rows. Our algorithm's running time grew roughly linearly in the number of rows, while the normal-form-based algorithm's running time grew like a higher-order polynomial. This was consistent with our theoretical prediction that our algorithm takes O(n|S| + n^4) time for this class of games while normal-form-based algorithms take O(|S|^{n-1}) time. Last, we considered strategy profiles with partial support.
While ensuring that each player's support included at least one action, we generated strategy profiles in which each action was included in the support with probability 0.4. GameTracer took about 60% of its full-support running times to compute expected payoffs in this domain, while our algorithm required only about 20% of its full-support running times.

2.6.3 Computing Payoff Jacobians

We also ran similar experiments on computing payoff Jacobians. As discussed in Section 2.5.1, the entries of a Jacobian can be formulated as expected payoffs, so a Jacobian can be computed by performing an expected payoff computation for each of its entries. Section 2.5.1 also described methods that exploit the structure of the Jacobian to further speed up this computation. GameTracer's normal-form-based implementation likewise exploits the structure of the Jacobian by re-using partial results of expected-payoff computations. When we compared our AGG-based Jacobian algorithm as described in Section 2.5.1 against GameTracer's implementation, the results were very similar to the above results for computing expected payoffs: our implementation scales polynomially in n while GameTracer's scales exponentially in n. We instead focus on the question of how much speedup the methods of Section 2.5.1 provide, by comparing our algorithm from Section 2.5.1 against the algorithm that computes expected payoffs (using our AGG-based algorithm described in Section 2.3) for each of the Jacobian's entries.

Figure 2.8: Running times for Jacobian computation in the Coffee Shop game. Left: 5×5 grid with 3 to 10 players. Right: 4-player r×5 grid with r varying from 3 to 10.

The results are shown in Figure 2.8. Our algorithm is about 50 times faster.
This confirms that the methods discussed in Section 2.5.1 provide significant speedup for computing payoff Jacobians.

2.6.4 Finding Nash Equilibria using the Govindan-Wilson Algorithm

Govindan and Wilson's algorithm [Govindan and Wilson 2003] is one of the most competitive algorithms for finding Nash equilibria in multi-player games. The computational bottleneck of the algorithm is the repeated computation of payoff Jacobians as defined in Section 2.5.1. We now show experimentally that the speedup we achieved for computing Jacobians using the AGG representation leads to a speedup of the Govindan-Wilson algorithm. We compared two versions of the Govindan-Wilson algorithm: one is the implementation in GameTracer, where the Jacobian computation is based on the normal form representation; the other is identical to the GameTracer implementation, except that the Jacobians are computed using our algorithm for the AGG representation. Both techniques compute the Jacobians exactly. As a result, given an initial perturbation to the original game, the two implementations follow the same path and return exactly the same answers, so any difference in their running times is due to the different speeds of computing Jacobians.

Again, we tested the two algorithms on Coffee Shop games of varying sizes: first we fixed the size of the grid at 4×4 and varied the number of players; then we fixed the number of players at 4 and the number of columns at 4, and varied the number of rows. For each game instance, we randomly generated 10 initial perturbation vectors, and for each initial perturbation we ran the two versions of the Govindan-Wilson algorithm. Since the running time of the Govindan-Wilson algorithm depends heavily on the initial perturbation, it is not meaningful to compare running times under different initial perturbations. Instead, we look at the ratio of running times between the normal form implementation and the AGG implementation; a ratio greater than 1 means the AGG implementation spent less time than the normal form implementation.

Figure 2.9: Ratios of running times for the Govindan-Wilson algorithm in the Coffee Shop game. Left: 4×4 grid with 3 to 5 players. Right: 4-player r×4 grid with r varying from 3 to 9. Error bars indicate standard deviation over 10 random initial perturbations; the constant lines at 1.0 indicate equal running times.

We plotted the results in Figure 2.9. They confirm our theoretical prediction that as the size of the game grows (either in the number of players or in the number of actions), the speedup of the AGG implementation over the normal form implementation increases.

2.7 Conclusions

We presented a polynomial-time algorithm for computing expected payoffs in action-graph games. For AGGs with bounded in-degree, our algorithm achieves an exponential speedup compared to normal-form-based algorithms and to Bhat and Leyton-Brown's [2004] algorithm. We also extended the AGG representation by introducing function nodes, which allow us to compactly represent a wider range of structured utility functions, and we showed that if an AGGFN is contribution-independent, expected payoffs can be computed in polynomial time. Our current and future research includes two directions. Computationally, we plan to apply our expected payoff algorithm to speed up other game-theoretic computations, such as computing best responses and the simplicial subdivision algorithm for finding Nash equilibria.
Also, as a direct corollary of our Theorem 2 and Papadimitriou's [2005] result, correlated equilibria can be computed in time polynomial in the size of the AGG. Representationally, we plan to extend the AGG framework to represent more types of structure, such as additivity of payoffs. In particular, we intend to study Bayesian games. In a Bayesian game, players are uncertain about which game (i.e., which payoff function) they are playing, and each receives certain private information about the underlying game. Bayesian games are heavily used in economics for modeling competitive scenarios involving information asymmetries, e.g. for modeling auctions and other kinds of markets. A Bayesian game can itself be seen as a compact representation, since it is much more compact than its induced normal form. We plan to use the AGG framework to represent not only the structure inherent in Bayesian games, but also context-specific independence structures such as the ones we have considered here.

Chapter 3

Bidding Agents for Online Auctions with Hidden Bids

3.1 Introduction

There¹ has been much research on automated bidding agents for auctions and other market-based environments. The Trading Agent Competitions (TAC-Classic and TAC Supply Chain Management) have attracted much interest [Wellman et al. 2002]. There have also been research efforts on bidding agents and bidding strategies in other auction environments [Byde 2002; Boutilier et al. 1999; Greenwald and Boyan 2004; Arora et al. 2003; Cai and Wurman 2003; Anthony et al. 2001]. Although this body of work considers many different auction environments, bidding agents always face a similar task: given a valuation function, the bidding agent needs to compute an optimal bidding strategy that maximizes expected surplus. (Some environments, such as TAC-SCM, also require agents to solve additional tasks, e.g. scheduling.)
The "Wilson Doctrine" in mechanism design argues that mechanisms should be constructed so that they are "detail-free", that is, so that agents can behave rationally in these mechanisms even without information about the distribution of other agents' valuations. For example, the VCG mechanism is detail-free because under this mechanism it is a weakly dominant strategy to bid exactly one's valuation, regardless of other agents' beliefs, valuations or actions. Under common assumptions (in particular, independent private values), single-item English auctions are similar: an agent should remain in the auction until the bidding reaches the amount of her valuation. While detail-free mechanisms are desirable, they are not ubiquitous. Very often, agents are faced with the problem of deciding how to behave in games that do not have dominant strategies and in which other agents' preferences are strategically relevant. For example, we may want to participate in a series of auctions run by different sellers at different times. This is a common scenario at online auction sites such as eBay. In Section 3.4 we consider a sequential auction model of this scenario, and show that information about other bidders' preferences is very relevant in constructing a bidding strategy.

¹ A version of this chapter has been submitted for publication: Jiang, A.X. and Leyton-Brown, K. (2005). Bidding Agents for Online Auctions with Hidden Bids. Submitted to the Machine Learning Journal.

3.1.1 Game-Theoretic and Decision-Theoretic Approaches

How should a bidding agent be constructed? Depending on the assumptions we choose to make about other bidders, two broad approaches to computing bidding strategies suggest themselves: a game-theoretic approach and a decision-theoretic approach.
The game-theoretic approach assumes that all agents are perfectly rational and that this rationality is common knowledge; the auction is modeled as a Bayesian game. Under this approach, a bidding agent computes a Bayes-Nash equilibrium of the auction game and plays the equilibrium bidding strategy. Much of the extensive economics literature on auctions follows this approach (see, e.g., the survey in [Klemperer 2000]). For example, in environments with multiple sequential auctions for identical items in which each bidder wants only a single item, the Bayes-Nash equilibrium is well known [Milgrom and Weber 2000; Weber 1983]. Such equilibria very often depend on the distribution of agents' valuation functions and on the number of bidders. Although this information is rarely available in practice, it is usually possible to estimate these distributions from the bidding history of previous auctions of similar items. Note that this involves the assumption that past and future bidders share the same valuation distribution. The game-theoretic approach has received a great deal of study, and is perhaps the dominant paradigm in microeconomics. In particular, there are very good reasons for seeking strategy profiles that are resistant to unilateral deviation. However, this approach is not always useful to agents who need to decide what to do in a particular setting, especially when the rationality of other bidders is in doubt, when the computation of equilibria is intractable, or when the game has multiple equilibria. In such settings, it is sometimes more appropriate to rely on decision theory. A decision-theoretic approach effectively treats other bidders as part of the environment, and ignores the possibility that they may change their behavior in response to the agent's actions.
We again make the assumption that the other bidders come from a stationary population; in this case, however, we model agents' bid amounts directly rather than modeling their valuations and then seeking an equilibrium strategy. We then solve the resulting single-agent decision problem to find a bidding strategy that maximizes expected payoff. (We could also use a reinforcement-learning approach, continuing to learn the bidding behavior of other bidders while participating in the auctions.) Much recent literature on bidding agent design follows the decision-theoretic approach, e.g. [Boutilier et al. 1999; Byde 2002; Greenwald and Boyan 2004; Stone et al. 2002; MacKie-Mason et al. 2004; Osepayshvili et al. 2005]. This chapter does not attempt to choose between these two approaches; it is our opinion that each has domains for which it is the most appropriate. The important point is that regardless of which approach we take, we are faced with the subproblem of estimating two distributions from the bidding history of past auctions: the distribution over the number of bidders, and the distribution of bid amounts (for decision-theoretic approaches) or of valuations (for game-theoretic approaches). Consequently, this problem has received much discussion in the literature. For example, Athey and Haile [2002] and various others in econometrics study the estimation of valuation distributions in standard auction types given observed bids, assuming that bidders are perfectly rational and follow equilibrium strategies. On the decision-theoretic front, a popular approach is to estimate the distribution of final prices in auctions based on observed selling prices, and then to use this distribution to compute an optimal bidding strategy.
Examples include Stone et al.'s [2002] ATTac agent for the Trading Agent Competition, MacKie-Mason et al.'s [2004] study of bidding in simultaneous ascending auctions and the follow-up work in [Osepayshvili et al. 2005], Byde's [2002] study of bidding in simultaneous online English auctions, and Greenwald and Boyan's [2004] analysis of sequential English auctions. A paper by Boutilier et al. [1999] takes a different decision-theoretic approach which is relevant to the approach we propose in this chapter; we defer discussion of this work until Section 3.6, after we have presented our approach.

3.1.2 Overview

In this chapter we consider sequential English auctions in which a full bidding history is revealed, such as the online auctions run by eBay. It might seem that there is very little left to say: we learn the distributions of interest from the bidding history, then compute a bidding strategy based on that information for the current and future auctions. However, we show that under realistic bidding dynamics (described in Section 3.2) the observed bidding histories omit some relevant information. First, some bidders may come to the auction when it is already in progress, find that the current price already exceeds their valuation, and leave without placing a bid. Second, the amount the winner was willing to pay is never revealed. Ignoring these sources of bias can lead to poor estimates of the underlying valuation distribution and of the distribution over the number of bidders. In Section 3.3 we propose a learning approach based on the Expectation Maximization (EM) algorithm, which iteratively generates hidden bids consistent with the observed bids, and then computes maximum-likelihood estimates of the valuation distribution based on the completed set of bids. The learned distributions can then be used to compute decision-theoretic or game-theoretic bidding strategies.
Section 3.4 discusses the computation of the optimal strategy for an agent in a repeated English auction setting under the decision-theoretic approach, and in a similar (non-repeated) setting under the game-theoretic approach. In Section 3.5 we present experimental results on synthetic data sets as well as on data collected from eBay. These show that, compared to a straightforward approach that ignores hidden bids, our EM learning approach makes better estimates of the distributions, achieves greater payoff under the decision-theoretic model, and better approximates the Bayes-Nash equilibrium under the game-theoretic model.

3.2 A Model of Online Auctions

Consider a (possibly repeated) auction held online, e.g. on eBay. There are m potential bidders interested in a certain auction for a single item. We assume that bidders are risk-neutral, have independent private values (IPV), and that utilities are quasilinear. The number of bidders m is drawn from a discrete distribution g(m) with support [2, ∞). Bidders' potential valuations (in the game-theoretic context) or bids (in the decision-theoretic context) are independently drawn from a continuous distribution f(x).

3.2.1 Bidding Dynamics

The m potential bidders arrive at the auction site sequentially. When each bidder arrives, she observes the bids that have been accepted in the auction so far, places a single proxy bid,² and then leaves the system. The auctioneer processes new bids as follows:

1. When a proxy bid is submitted, the auctioneer compares it to the current price level, which is the second-highest proxy bid so far plus a small bid increment.³

(a) If the submitted bid is not greater than the current price level, the bid is dropped and no record of it appears in the auction's history.
(b) If the submitted bid is higher than the current price level but lower than the highest proxy bid so far, then the submitted bid is accepted, the bid amount of the currently winning bid is increased to equal the submitted bid (i.e., this bid remains the winning bid), and the submitted bid loses.

(c) If the submitted bid is higher than the previously winning bid, then the price level is increased to equal the previously winning bid, and the submitted bid becomes the current winner.

2. At the end of the auction, the item is awarded to the bidder who placed the highest bid, and the final price level is the amount of the second-highest bid.

² A proxy bidding system asks bidders for their maximum willingness to pay, and then bids up to this amount on the bidder's behalf. Such systems are common on online auction sites such as eBay; see e.g. http://pages.ebay.com/help/buy/proxy-bidding.html.
³ For simplicity, in this chapter we ignore this small increment and assume that the current price level is the second-highest proxy bid so far.

3.2.2 Hidden Bids

According to our model, some bidders' proxy bid amounts will be revealed: any bid that was higher than the current price level at the time it was placed, even if that bid was immediately outbid by another proxy bid. Other bids, however, will not be observed. There are two types of hidden bids:

Figure 3.1: An example of the bidding process of an auction with 7 potential bidders.

1. The highest bid of each auction, x_hi.

2. Dropped bids x_d that were lower than the current price level when they were placed.

Let us denote the set of visible bids as x_v and the set of hidden bids as x_h. Let n denote the number of bidders that appear in the bidding history. This means that x_v will always contain (n − 1) bids, since the winning bidder's proxy bid is never observed.
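The bid-processing rules above, together with the visible/hidden split, can be sketched as a short simulation. This is a sketch of the model (the bid increment is ignored, as in footnote 3; the function name and the example bid sequence below are illustrative):

```python
def run_auction(proxy_bids):
    """Simulate the proxy-bidding dynamics of Section 3.2.1.

    proxy_bids arrive in time order. Returns (visible, hidden, price):
    visible are the bid amounts recorded in the auction history (sorted),
    hidden are the dropped bids plus the winner's proxy bid, and price
    is the final price level (the second-highest proxy bid)."""
    accepted, dropped = [], []
    price, winning = 0.0, None
    for b in proxy_bids:
        if winning is None:
            winning = b                 # first bid becomes the standing bid
        elif b <= price:
            dropped.append(b)           # rule 1(a): hidden, never recorded
        elif b <= winning:
            price = b                   # rule 1(b): accepted, but loses
            accepted.append(b)
        else:
            price = winning             # rule 1(c): old winner's amount is
            accepted.append(winning)    # revealed; submitted bid now wins
            winning = b
    visible = sorted(accepted)
    hidden = sorted(dropped + ([winning] if winning is not None else []))
    return visible, hidden, price
```

On the proxy-bid sequence [5, 8, 3, 6, 7, 10, 4], bids 3 and 4 are dropped and the winner's proxy bid of 10 stays hidden: the history shows only n − 1 = 4 visible bids although m = 7 bidders participated, and the final price is 8, the second-highest proxy bid.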
Since there are m potential bidders in total, (n − 1) bids are visible, and one bid is the highest bid x_hi, there are (m − n) dropped bids in x_d. Figure 3.1 shows an example of the bidding process for an auction under our online auction model, and illustrates how bids are dropped. Bids arrive sequentially from left to right. Bids 3, 4 and 7 (the grey bars) are dropped because they are lower than the current price level at their respective time steps. The amount of the winning bid (6) is never revealed, although an observer would know that it was at least the amount of bid 2, which is the amount paid by bidder 6.

3.2.3 Discussion

Our model of the bidding process is quite general. Notice that when a bidder observes that the price level is higher than her potential bid, she may decide not to bid in this auction at all. This is equivalent to our model, in which she always submits the bid, because dropped bids do not appear in the bidding history (indeed, this is the motivation for our model). Our model also covers the case of last-minute bidding, which happens quite often in eBay auctions [Roth and Ockenfels 2002]: even though last-minute bids may be submitted almost simultaneously, eBay processes the bids in sequence. Observe that with a proxy bidding system, and when agents have independent private values, bidders cannot benefit from bidding more than once in an auction. In practice, however, eBay bidders quite often make multiple bids in one auction. One possible motivation for these bids is a desire to learn more information about the proxy bid of the current high bidder [Shah et al. 2003]. However, only a bidder's last bid represents her willingness to pay; given a bidder's last bid, her earlier bids carry no extra information. Therefore, we will be interested only in the last bid from each bidder.⁴
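Under the IPV assumption, keeping only each bidder's last bid is a single pass over the history; the (bidder_id, amount) pair encoding below is an assumed format, not the actual eBay record layout:

```python
def last_bids(history):
    """history: list of (bidder_id, amount) pairs in time order.
    Returns the history restricted to each bidder's final bid,
    preserving the order in which those final bids were placed."""
    last_index = {bidder: t for t, (bidder, _) in enumerate(history)}
    return [bid for t, bid in enumerate(history) if last_index[bid[0]] == t]
```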
We can preprocess the bidding histories by removing all bids except the last bid from each bidder, without losing much information.

3.3 Learning the Distributions

Given this model of the bidding process, the first task of our bidding agent is to estimate the distributions f(x) and g(m) from the bidding history. Suppose we have access to the bidding histories of K auctions for identical items.

3.3.1 The Simple Approach

A simple approach is to ignore the hidden bids and to estimate f(x) and g(m) directly from the observed data. The observed number of bidders, n, is used to estimate g(m). To estimate f(x) we use the observed bids x_v, which consist of (n − 1) bids for each auction. Any standard density estimation technique may be used. Because it ignores hidden bids, this approach can be expected to produce biased estimates of f and g:

• g(m) will be skewed towards small values because n ≤ m.

• f(v) may be skewed towards small values because it ignores the winning bid x_hi, or towards large values because it ignores the dropped bids x_d.

A popular variation of this approach (mentioned in Section 3.1) is to directly estimate the distribution of final prices from the selling prices of previous auctions, and then to use this distribution to compute a decision-theoretic bidding strategy. The problem with this approach is that in English auctions the selling prices are the second-highest bids, while, as we will show in Section 3.4, computing a decision-theoretic strategy really requires the distribution of the other bidders' highest bids. Using the distribution of final prices introduces bias, as this distribution is skewed towards small values compared to the distribution of highest bids.

⁴ In a common value model, the earlier bids do carry some information, and we would not be able to simply ignore them.
3.3.2 E M Learning Approach We would like to have an estimation strategy that accounts for the hidden bids Xh and any bias introduced by their absence. Suppose f(x) belongs to a class of distr ibutions parameterized by 9, f[x\0). Further suppose that g(m) belongs to a class of distr ibutions parameterized by A, g(m\\). For example, }(x) could be a Norma l distr ibut ion parameterized by its mean fx and variance a2, whereas g(m) could be a Poisson distr ibut ion parameterized by its mean, A. We want to find the max imum l ikel ihood estimates of 9 and A given the observed data xv. Suppose that we could actual ly observe the hidden bids Xh in addi t ion to xv. Then estimating 9 and A from the completed data set (xv,Xh) would be easy. Unfortunately we do not have Xh- G iven xv, and wi th the knowledge of the bidding process, we could generate Xh if we knew 9 and A. Unfortunately we do not know 9 and A. A popular strategy for learning this k ind of model wi th missing data is the Expecta t ion Max imiza t ion ( E M ) algor i thm [Dempster et al . 1977). E M is an iterative procedure that alternates between E steps which generate the missing data given current estimates for the parameters and M steps wji ich compute the max imum l ikel ihood (or max imum a posteriori) estimates for the parameters based on the completed data, which consists of the observed data and current estimates for the missing data. Formal ly, the E step computes Q{9) = j\og{p{xh,xv\9))p{xh\xvyÂ°ld\X^)dxh, (3.1) and the M step solves the opt imizat ion 0("Â«") =a rgmax (Q(0 ) ) . (3.2) 6 Analogous computations are done to estimate A, the parameter for g(m\X). The E M algor i thm terminates when A and 9 converge. It can be shown that E M wi l l converge to a local max imum of the observed data l ikel ihood function p{xv\9, A). E M is a part icular ly appropriate algori thm in our auction setting wi th hidden bids, because 1. 
EM was designed for finding maximum likelihood (ML) estimates for probabilistic models with unobserved latent variables.

2. EM is especially helpful when the observed data likelihood function p(x_v | θ, λ) (which we want to maximize) is hard to evaluate, but ML estimation given complete data (x_v, x_h) is relatively easy (because the M step corresponds to maximizing the expected log-likelihood of (x_v, x_h)). This is exactly the case in our auction setting.

The E Step

The integral in Equation (3.1) is generally intractable for our complex bidding process. However, we can compute a Monte Carlo approximation of this integral by drawing N samples from the distribution p(x_h | x_v, θ^(old), λ^(old)), and approximating the integral by a small sum over the samples (see e.g. [Andrieu et al. 2003]). Applied to our model, in each E step our task is therefore to generate samples from the distribution p(x_h | x_v, θ^(old), λ^(old)).

Recall that x_h consists of the highest bid x_hi and the dropped bids x_d. Given θ^(old) and the second-highest bid (which is observed), the highest bid x_hi can easily be sampled. According to the bidding process described in Section 3.2, the highest bid was generated by repeatedly drawing from f(x|θ^(old)) until we get a bid higher than the previously highest (now second-highest) bid. This is exactly the rejection-sampling procedure for the distribution f(x|θ^(old)) truncated at the second-highest bid and renormalized. For distributions with a simple functional form (e.g. normal distributions), it may be easier to sample directly from the truncated distribution by inverting the CDF (see e.g. [West 1994]).

Sampling the dropped bids x_d is a more difficult task. We use the following procedure, which is based on simulating the bidding process:

1. Sample m from g(m|λ^(old)).
2. If m < n, reject the sample and go back to step 1.
3.
Simulate the bidding process using x_v and m - n dropped bids:

(a) Repeatedly draw a sample bid from f(x|θ^(old)), and compare it to the current price level. If it is lower than the price level, add the bid to the set of dropped bids x_d. Otherwise, the current price level is increased to the next visible bid from x_v.

(b) If the number of bids in x_d exceeds m - n, or if we use up all the bids in x_v before we have m - n dropped bids in x_d, we reject this sample and go back to step 1. Only when we have used up all bids in x_v and have m - n bids in x_d do we accept the sample of x_d.

4. Repeat until we have generated N samples of x_d.

The M Step

Our task at each M step is to compute the maximum likelihood (ML) estimates of λ and θ from x_v and the generated samples of x_h. For many standard parametric families of distributions, there are analytical solutions for the ML estimates. For example, if f(x) is a normal distribution N(μ, σ), then given the complete set of bids (x_v, x_h), the ML estimate of μ is the sample mean, and the ML estimate of σ is the sample standard deviation. If g(m) is a Poisson distribution, then the ML estimate of the mean parameter λ is the mean number of bidders per auction. If analytical solutions do not exist, we can use numerical optimization methods such as simulated annealing. If prior distributions on λ and θ are available, we may instead compute maximum a posteriori (MAP) estimates, which are point estimates of λ and θ that maximize their posterior probabilities.

3.3.3 Learning Distributions in a Game-Theoretic Setting

The approach we just described is decision-theoretic because we estimate the distribution of bid amounts without considering how other agents would react to our behavior. What if we want to take a game-theoretic approach?
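Before turning to the game-theoretic case, the two E-step sampling subroutines of Section 3.3.2 can be sketched in code. This is an illustrative Python sketch under our own simplifying assumptions: `f_sample` and `g_sample` are assumed to be provided callables that draw from f(x|θ^(old)) and g(m|λ^(old)), and the hidden highest bid is drawn by plain rejection sampling rather than by inverting the truncated CDF:

```python
import random

def sample_highest_bid(f_sample, second_highest):
    """Rejection-sample the hidden highest bid x_hi: redraw from f
    until the draw exceeds the observed second-highest bid."""
    while True:
        x = f_sample()
        if x > second_highest:
            return x

def sample_dropped_bids(f_sample, g_sample, visible_bids):
    """Sample the dropped bids x_d by simulating the bidding process.
    visible_bids: the observed bids x_v, in the order they were placed."""
    n = len(visible_bids)
    while True:                        # step 1: propose a number of bidders
        m = g_sample()
        if m < n:                      # step 2: cannot have fewer bidders than observed
            continue
        dropped, rejected = [], False
        upcoming = list(visible_bids)
        price = upcoming.pop(0)        # price starts at the first visible bid
        while upcoming:                # step 3a: replay the visible price path
            x = f_sample()
            if x < price:
                dropped.append(x)      # below the price level: a hidden, dropped bid
                if len(dropped) > m - n:
                    rejected = True    # step 3b: too many dropped bids
                    break
            else:
                price = upcoming.pop(0)  # price rises to the next visible bid
        if not rejected and len(dropped) == m - n:
            return dropped             # accept: all of x_v used, exactly m-n dropped

# Usage sketch with a normal f and a degenerate g that always returns 5 bidders.
random.seed(0)
f = lambda: random.gauss(4.0, 3.5)
x_hi = sample_highest_bid(f, 10.0)
x_d = sample_dropped_bids(f, lambda: 5, [1.0, 2.0, 3.0])
```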
Athey and Haile [2002] discussed estimation in the game-theoretic setting; however, they generally assume that the number of bidders is known (there is a brief discussion of an unknown number of bidders, but it is not relevant to our online auction setting). Let f(v) be the distribution of bidders' valuations (instead of bid amounts), and let g(m) denote the distribution of the number of bidders, as before. Given a bidder's valuation v, her bid amount in the game-theoretic setting is given by the Bayes-Nash equilibrium of the auction game. Many (but not all) auction games have symmetric pure strategy equilibria. Assume there is a symmetric pure strategy equilibrium given by the bid function b(v | f, g). Our task is to estimate f(v) and g(m) given the observed bids. We can use an EM learning algorithm similar to the one in Section 3.3.2 to estimate f(v|θ) and g(m|λ), where θ and λ are parameters for f and g:

- E step: for each auction with observed bids x_v:
  - Compute the observed bidders' valuations v_v from x_v by inverting the bid function.^5
  - Generate bidders with valuations v_h who place hidden bids x_h = b(v_h | f^(old), g^(old)). This is done by a procedure similar to the one in Section 3.3.2 that simulates the auction's bidding process.
- M step: update θ and λ to maximize the likelihood of the valuations (v_v, v_h).

3.4 Constructing a Bidding Agent

In Sections 3.2 and 3.3 we presented a model of the bidding process for a single auction, and proposed methods to estimate the distributions of bids and of the number of bidders in an auction. But our work is not done yet: how do we make use of these estimated distributions to compute a bidding strategy?

^5 If the bidding function does not have a single-valued inverse function, or if the equilibrium has mixed strategies, we just generate bidders with valuations v_v that would likely bid x_v under the equilibrium.
If we only participate in one English auction, bidding is simple: under the IPV model it is a dominant strategy to bid up to our valuation for the item, and we do not even need to estimate the distributions. However, if we are interested in more complex environments, good estimates of the distributions f(x) and g(m) are essential for computing a good bidding strategy. In this section we discuss two such environments: finitely repeated online auctions and unrepeated online auctions without proxy bidding. We consider the first problem from a decision-theoretic point of view, and take a game-theoretic approach to the second problem.

3.4.1 A Decision-Theoretic Approach to Repeated Auctions

In this section we develop a decision-theoretic bidding agent for finitely repeated auctions. We choose this setting because it is a reasonable model of the decision-theoretic problem we would face if we wanted to buy one item from an online auction site. Our estimation algorithm can straightforwardly be applied to more complex decision-theoretic models such as infinite-horizon models with discount factors and combinatorial valuation models.

The Auction Environment

Suppose we only want to buy one item (say a Playstation 2) in an environment (say eBay) where multiple auctions for similar, but not necessarily identical, Playstation 2 systems are held regularly. Recall that we have assumed that utility is quasilinear: thus if we successfully win one item, our utility will be equal to our valuation for the item minus the price we paid. So our bidding agent's task is to compute a bidding strategy that will maximize this utility. Assume that we are only interested in the first k auctions that will be held after we arrive at the auction site. One motivation for such a restriction is that we prefer to have the item within a bounded amount of time.
If we fail to win an item in the k auctions, we lose interest in the item and leave the auction site, and our utility is 0. (Alternatively, we could say that if we fail to win an auction then we could buy the item from a store after leaving the auction site, in which case we would get some other known and constant amount of utility.) Some of the k auctions may overlap in time, but since eBay auctions have strict closing times, this can be modeled as a sequential decision problem, in which our agent makes bidding decisions right before each auction closes. Number the auctions 1 ... k according to their closing times. Let v_j denote our valuation for the item from auction j. Note that this allows the items in the auctions to be non-identical. Let b_j denote our agent's bid for auction j. Let U_j denote our agent's expected payoff from participating in auctions j ... k, assuming we did not win before auction j. Let U_{k+1} be our payoff if we fail to win any of the auctions. For simplicity we define U_{k+1} = 0, though the analysis would be similar for any constant value.

Suppose that for each auction j the number of other bidders is drawn from g_j(m) and each bidder's bid is drawn from f_j(x). Since each auction j is an English auction, only the highest bid from the other bidders affects our payoff. Let f^1_j(x) and F^1_j(x) respectively denote the probability density function and cumulative distribution function (CDF) of the highest bid from the other bidders in the jth auction. Then

F^1_j(x) = sum_{m=2}^{infinity} g_j(m) (F_j(x))^m,

where F_j(x) is the CDF of f_j(x). Now U_j can be expressed as the following function of the future bids b_{j:k} = (b_j, ..., b_k) and valuations v_{j:k} = (v_j, ..., v_k):

U_j(b_{j:k}, v_{j:k}) = ∫_{-infinity}^{b_j} (v_j - x) f^1_j(x) dx + (1 - F^1_j(b_j)) U_{j+1}(b_{j+1:k}, v_{j+1:k})    (3.3)

The first term in Equation (3.3) is the expected payoff from the jth auction; the second term is the expected payoff from the later auctions.
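The sum defining F^1_j is easy to evaluate numerically. A small sketch in Python, using an illustrative normal F and a toy two-point distribution for g (our own example values, not the thesis's data):

```python
import math

def normal_cdf(x, mu=4.0, sigma=3.5):
    """F(x) for an illustrative normal bid distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def highest_bid_cdf(x, F, g):
    """F^1(x) = sum_m g(m) * F(x)^m, with g a dict mapping m -> probability.
    (For an unbounded g such as a Poisson, truncate the sum at a large m.)"""
    return sum(p * F(x) ** m for m, p in g.items())

g = {2: 0.5, 3: 0.5}   # toy distribution over the number of other bidders
f1_at_mean = highest_bid_cdf(4.0, normal_cdf, g)
# F(4) = 0.5 here, so F^1(4) = 0.5 * 0.5**2 + 0.5 * 0.5**3 = 0.1875:
# the highest of several bids is rarely below the single-bid median.
```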
The Optimal Strategy

Greenwald and Boyan [2004] and Arora et al. [2003] have analyzed similar auction models. Following similar reasoning, we can derive the optimal bidding strategy for our auction model. Let b*_{j:k} be the optimal bidding strategy for auctions j, ..., k. Let U*_j(v_{j:k}) denote the expected payoff under the optimal strategy, i.e. U*_j(v_{j:k}) = U_j(b*_{j:k}, v_{j:k}). We can optimize U_j by working from the kth auction back to the first one in a manner similar to backward induction. By solving the first-order conditions of U_j, we obtain the optimal bidding strategy:

b*_j = v_j - U*_{j+1}(v_{j+1:k})    (3.4)

In other words, our agent should shade her bids by the "option value", i.e. the expected payoff of participating in the future auctions. The exception is of course the kth auction; in this case there are no future auctions and the optimal bid is b*_k = v_k. Thus we see that the standard result that honest bidding is optimal in an unrepeated auction is recovered as the special case k = 1.

The computation of the optimal bidding strategies requires the computation of the expected payoffs U*_j = U_j(b*_{j:k}, v_{j:k}), which involves an integral over the distribution f^1_j(x) (Equation 3.3). In general this integral cannot be solved analytically, but we can compute its Monte Carlo approximation if we can sample from f^1_j(x). If we can sample from f_j(x) and g_j(m), we can use the following straightforward procedure to generate a sample from f^1_j(x):

1. draw m from g_j(m);
2. draw m samples from f_j(x);
3. keep the maximum of these m samples.

The bidding strategy b*_{1:k} computed using Equations (3.4) and (3.3) is optimal, provided that the distributions f_j(x) and g_j(m) are the correct distributions of bids and numbers of bidders for all j ∈ 1 ... k.
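The backward-induction computation of Equations (3.3) and (3.4) can be sketched as follows. This is an illustrative Python implementation under our own toy distributions (uniform bids, a fixed number of other bidders); it uses the sampling procedure above to approximate the expected-payoff integral by Monte Carlo:

```python
import random

def sample_highest(f_sample, g_sample):
    """Draw one sample from f^1: the highest of m bids, m ~ g."""
    m = g_sample()
    return max(f_sample() for _ in range(m))

def optimal_bids(valuations, f_sample, g_sample, n_samples=20000):
    """Backward induction: b*_j = v_j - U*_{j+1} (Equation 3.4), with each
    U*_j estimated by Monte Carlo over samples of the highest opposing bid
    (Equation 3.3). Assumes the same f and g in every auction."""
    k = len(valuations)
    u_next = 0.0                        # U*_{k+1} = 0: no utility if we never win
    bids = [0.0] * k
    for j in range(k - 1, -1, -1):      # from the last auction back to the first
        b = valuations[j] - u_next      # Equation (3.4)
        bids[j] = b
        payoff = 0.0
        for _ in range(n_samples):
            x = sample_highest(f_sample, g_sample)
            payoff += (valuations[j] - x) if x < b else u_next
        u_next = payoff / n_samples     # Monte Carlo estimate of U*_j
    return bids

# Usage sketch: two auctions, valuation 10 in each, 3 opposing bidders
# with bids uniform on [0, 20]. The last bid is truthful; the first is
# shaded down by the option value of the second auction.
random.seed(1)
bids = optimal_bids([10.0, 10.0], lambda: random.uniform(0.0, 20.0), lambda: 3)
```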
Of course, in general we do not know the true f_j(x) and g_j(m); the focus of this chapter is to estimate the distributions from the bidding history and use the estimated distributions to compute the bidding strategy. As a result, the computed bidding strategy should be expected to achieve less than the optimal expected payoff. However, it is reasonable to think that better estimates of f(x) and g(m) should give bidding strategies with higher expected payoffs. This is confirmed in our experiments across a wide range of data sets, which we discuss in Section 3.5.

Auctions that Overlap in Time: Exploiting Early Bidding

We observe that while the optimal bid in auction j does not depend on f^1_j, it does depend on f^1_l for l > j. So far we have been estimating f^1_l(x) using f_l(x) and g_l(m). In practice, auctions overlap in time, and we often observe some early bidding activity by other bidders in auctions j+1, ..., k before we have to make a bid in auction j. This extra information allows us to make even more informed (posterior) estimates of f^1_l(x), l > j, based on f_l(x), g_l(m) and the observed bids for auction l, which leads to a better bid for auction j.

Suppose we have observed n - 1 early bids, denoted by x_v; the current highest bid x_hi is not revealed (but can be sampled from f(x) truncated at the current price). Since the auction is not over, there will be some set of future bids x_future (possibly empty). When the auction closes, the highest bid from the other bidders will be max{x_hi, x_future}. We can generate x_future if we know the number of future bids. We know the total number of bids m is drawn from g(m), and the number of bids made so far is n + |x_d|, where x_d are the dropped bids so far, so the number of future bids is m - n - |x_d|. Now we have a procedure that samples from f^1(x):

1. Simulate the auction using our model in Section 3.2 to generate x_d, the dropped bids so far.
2. Sample the total number of bids m from g(m).
3.
Compute the number of future bids, m - n - |x_d|:

(a) If this quantity is negative, reject the sample.
(b) Otherwise, generate x_future as (m - n - |x_d|) bids drawn from f(x).

4. Generate x_hi and take the maximum of x_future and x_hi.

3.4.2 A Game-Theoretic Approach to Bidding in Online Auctions without Proxies

In Section 3.4.1 we discussed building a decision-theoretic bidding agent for repeated English auctions. What happens if we try to use the game-theoretic approach on our auction model to build an agent that plays a Bayes-Nash equilibrium of the auction game? The difficulty turns out to be identifying the equilibrium.

If each bidder other than our agent only participates in one auction, then the situation is easy. In this case the other bidders' dominant strategies will be to bid truthfully up to the amounts of their valuations. Assuming all bidders follow this dominant strategy, as a game-theoretic approach would have us do, we find that the distribution of bids is the same as the distribution of valuations. Thus, our agent's strategy should be the same as the decision-theoretic optimal strategy described in Section 3.4.1. It is easy to verify that this strategy, together with the other players' truthful bidding, forms a Bayes-Nash equilibrium.

If the other bidders participate in more than one auction, then the equilibrium strategy gets more complex, both strategically and computationally. Milgrom and Weber [2000] have derived equilibrium strategies for sequential auctions under some strong simplifying assumptions, such as identical items, a fixed number of bidders, and all bidders entering the auctions at the same time. In an online auction setting, these assumptions are not reasonable: the items are often heterogeneous, the number of bidders is unknown ex ante, and bidders have different entry times and exit policies.
The equilibrium strategy in the general case is not known. Therefore, it is an open problem to find a game-theoretic model for repeated online auctions that is realistic and yet still tractable.

Online Auctions without Proxies

Because of the difficulties described above, we turn to a slightly different auction setting in order to show how our distribution learning techniques can be used to build a game-theoretic bidding agent. Specifically, we will consider an online auction setting which differs in one crucial respect from the setting defined in Section 3.2. Bidders still participate in an online ascending auction against an unknown number of opponents, and still arrive and bid sequentially; however, now we add the restriction that bidders cannot place proxy bids. Instead, bidders can now place only a single bid before they leave the auction forever; the game now takes on a flavor similar to a first-price auction, but with different information disclosure rules. We chose this game because it has hidden-bid characteristics similar to the online auctions we discussed earlier, but at the same time it also has a known, computationally tractable, and yet non-trivial Bayes-Nash equilibrium.

More formally, suppose that there are m potential bidders interested in a single item. The number m is drawn from a distribution g(m), and is observed by the auctioneer but not by the bidders. Bidders have independent private valuations, drawn from the distribution f(v). A bid history records every bid which is placed along with the bid amount, and is observable by all bidders. The auction has the following rules:

- The auctioneer determines a random sequential order in which to approach bidders.
- Each bidder is approached and asked to make a bid. Each bidder is asked once and only once.
- When asked, a bidder must either make a bid which is at least the current highest bid plus a constant minimum increment δ, or decide not to bid.
  1. If the bidder places a bid, it is recorded in the bidding history.
  2. Otherwise, no record is made of the fact that the auctioneer approached the bidder.
- The auction ends after the last bidder makes her decision. The highest submitted bid wins, and the winner pays her bid amount. The winner's utility is her valuation minus her payment; the other bidders get zero utility.

Computing Equilibrium Strategies

We now turn to the construction of a game-theoretic bidding agent for the auction described in Section 3.4.2. We begin by supposing that the distributions f(v) and g(m) are common knowledge, which allows us to treat the auction as a Bayesian game and find its Bayes-Nash equilibrium.

Suppose there exists a pure-strategy equilibrium of this game. Consider bidder i with valuation v, who is asked to make the next bid when the bidding history is x_v. Denote by b(v | x_v) the equilibrium bidding strategy. Before we compute b(v | x_v), let us first eliminate dominated strategies. As in classical sealed-bid first-price auctions, it is obvious that we should never bid higher than our valuation v. Let b_0 denote the highest submitted bid so far; thus b_0 + δ is the minimum bid amount. It immediately follows that if v < b_0 + δ then we should not bid. On the other hand, if v > b_0 + δ, then we have non-negative expected payoff from making a bid (as long as we bid below v). Define P as the class of strategies of the following form:

- if v < b_0 + δ, do not bid;
- if v > b_0 + δ, bid an amount in the interval [b_0 + δ, v].

By the above reasoning, any strategy not in P is weakly dominated. Thus, in equilibrium all bidders will play some strategy in P. Suppose bidder i's valuation is v > b_0 + δ. Thus she places a bid b ∈ [b_0 + δ, v].
Note that bidder i's expected payoff depends on her valuation, her bid b, and what future bidders do. Since future bidders play strategies in P, we know that i will be outbid if and only if at least one future bidder has a valuation greater than or equal to b + δ. In other words, as long as everyone plays a strategy in P, a bidder's expected payoff depends on future bidders' valuations but not on their exact bidding functions. Denote by m_f the number of future bidders, and denote by p_0 = Pr(m_f = 0 | x_v) the probability that there will be no future bidders. Let F^1(v) be the CDF of the (posterior) distribution of the highest future bid given x_v, conditioned on having at least one future bidder. In other words,

F^1(v) = E_{m_f | m_f > 0}[F(v)^{m_f}]    (3.5)

where F(v) is the CDF of the value distribution. F^1 and p_0 depend on the posterior distribution of the number of future bidders, Pr(m_f | x_v). If we can generate samples of m_f, then F^1 and p_0 can be straightforwardly approximated. Note that the set of bidders consists of the past bidders (observed and hidden), the current bidder, and the future bidders. Thus we can generate m_f using essentially the same procedure as described in Section 3.4.1, by drawing m from g(m), simulating the auction so far to get the number of already dropped bids |x_d|, then letting m_f = m - |x_v| - |x_d| - 1. Note that we do not need to know the exact equilibrium strategy to generate the hidden bidders; the knowledge that the equilibrium strategy is in P is enough to determine whether a bidder should bid or not.

Now we can write the expected payoff U(b, v) as:

U(b, v) = (1 - p_0) F^1(b + δ)(v - b) + p_0 (v - b)    (3.6)

Since v is fixed for each bidder, U(b, v) is then a function of b. We can then use any standard numerical technique for function maximization to find the optimal b* ∈ [b_0 + δ, v] that maximizes U.

Theorem 3.
The following strategy (played by all bidders) is a Bayes-Nash equilibrium of our auction setting:

- if v < b_0 + δ, do not bid;
- if v > b_0 + δ, bid b* = argmax_{b ∈ [b_0 + δ, v]} U(b, v).

The proof is straightforward: first observe that this strategy is in P. We have shown above that if everyone else is playing a strategy in P, then b* as defined will maximize the bidder's utility. It follows that if everyone is playing the above strategy, then each bidder is playing a best response to the other bidders' strategies, and so we have an equilibrium.

Learning the Distributions

At the beginning of Section 3.4.2 we assumed that the distributions f(v) and g(m) were commonly known. Now we relax this assumption. We assume that we have access to the bidding histories of K past auctions, and consider the problem of learning the distributions. As in our online English auctions, some information is hidden from the bidding history: the number and valuations of the bidders who decided not to bid. As discussed in Section 3.3.1, the simple approach of ignoring these hidden bidders can be expected to introduce bias into the estimates of f and g. Furthermore, the observed bids are not equal to the bidders' valuations; as shown above, the equilibrium bid is always lower than the valuation.

To correctly account for the hidden information, we use our EM algorithm for game-theoretic settings, as described in Section 3.3.3. To implement the EM algorithm, an important subproblem is inverting the equilibrium bidding strategy. Formally: given a complete bidding history x_v of an auction and the current estimates f(v|θ) and g(m|λ), compute the corresponding valuations v_v such that bidders with valuations v_v playing the equilibrium strategy would have placed the bids x_v. Consider each bid b_i ∈ x_v. Denote by b_0 the previous observed bid.
From Section 3.4.2 we know that b_i = argmax_{b ∈ [b_0 + δ, v]} U(b, v). There are two cases:

1. If b_i > b_0 + δ, then b_i must be a local maximum in [b_0 + δ, v] satisfying the first-order condition dU(b, v)/db = 0. Solving for v, we get

v = b_i + (F^1(b_i + δ) + p_0/(1 - p_0)) / f^1(b_i + δ),    (3.7)

where f^1 is the derivative of F^1. We can numerically approximate F^1, f^1 and p_0 by generating samples of m_f in the same way as in Section 3.4.2.

2. Otherwise, b_i = b_0 + δ. Intuitively, this means that the bidder's valuation is high enough to make a bid (i.e. v > b_0 + δ), but not high enough to fall into Case 1. Any bidder with a valuation in the interval

[ b_0 + δ,  b_0 + δ + (F^1(b_0 + δ) + p_0/(1 - p_0)) / f^1(b_0 + δ) ]

would bid in this way. Thus we generate samples of v by drawing from f(v|θ) truncated to the above interval.

Once the EM algorithm has estimated the distributions f and g, our agent can use these distributions to compute the Bayes-Nash equilibrium strategy as described in Section 3.4.2.

3.5 Experiments

We evaluated both our EM learning approach and the simple approach^6 on several synthetic data sets and on real-world data collected from eBay. We compared the approaches in three ways:

1. Which approach gives better estimates of the distributions f(x), g(m) and f^1(x)? This is important because better estimation of these distributions should give better results, regardless of whether agents take a decision-theoretic approach or a game-theoretic approach to bidding. We measure the closeness of an estimated distribution to the true distribution using the Kullback-Leibler (KL) divergence from the true distribution to the estimated distribution. The smaller the KL divergence, the closer the estimated distribution is to the true one.

^6 We also looked at a variant of the simple approach that directly estimates the distribution f^1(x) using the selling prices.
Our results show that this approach consistently underestimates f^1(x), as expected, and its performance is much worse than that of both the EM approach and the simple approach. For clarity, detailed results for this approach are not shown.

2. For repeated online auction data, which approach gives better expected payoff under the decision-theoretic bidding model described in Section 3.4.1?

3. For data from online auctions without proxies, which approach gives closer estimates of the Bayes-Nash equilibrium under the game-theoretic bidding model (i.e., computes ε-equilibria for smaller values of ε), as described in Section 3.4.2?

Our experiments show that the EM learning approach outperforms the simple approach in all three ways, across a wide range of data sets.

3.5.1 Repeated Online Auctions

In this section we consider repeated online auctions, and thus attempt to answer the first two questions above. We present results from four data sets:

- Data Set 1 has auctions of identical items, and we know the family of distributions that f(x) and g(m) belong to.
- Data Set 2 has auctions of non-identical items, but we know the bid distribution f(x) is influenced linearly by an attribute a.
- Data Set 3 has auctions of identical items, but we do not know what kind of distributions f(x) and g(m) are. We use nonparametric estimation techniques to estimate the distributions.
- Data Set 4 is real-world auction data on identical items, collected from eBay.

Synthetic Data Set 1: Identical Items

In this data set, the items for sale in all auctions are identical, so the numbers of bidders and bid amounts come from stationary distributions g(m) and f(x). f(x) is a Normal distribution N(4, 3.5). g(m) is a Poisson distribution shifted to the right: g(m - 2) = P(40), i.e. the number of bidders is always at least 2.
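For reference, the generative distributions of Data Set 1 can be sampled with a few lines of Python. This sketch draws only the raw numbers of bidders and bid amounts; producing full bidding histories additionally requires simulating the bidding process of Section 3.2. The Poisson sampler by CDF inversion is our own illustrative choice:

```python
import math
import random

def sample_poisson(lam):
    """Draw from Poisson(lam) by inverting the CDF (fine for moderate lam)."""
    u, k = random.random(), 0
    p = math.exp(-lam)
    cdf = p
    while u > cdf:
        k += 1
        p *= lam / k
        cdf += p
    return k

def sample_data_set_1(n_auctions):
    """Data Set 1: f = N(4, 3.5) and g(m - 2) = P(40), so m >= 2 always."""
    data = []
    for _ in range(n_auctions):
        m = 2 + sample_poisson(40.0)                    # number of bidders
        bids = [random.gauss(4.0, 3.5) for _ in range(m)]
        data.append((m, bids))
    return data
```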
The bidding history is generated using our model of the bidding process as described in Section 3.2. Each instance of the data set consists of bidding histories from 40 auctions. We generated 15 instances of the data set.

Estimating the Distributions

Both estimation approaches are informed of the parametric families from which f(x) and g(m) are drawn; their task is to estimate the parameters of the distributions, (μ, σ) for f(x) and λ for g(m). At the M step of the EM algorithm, standard ML estimates for μ, σ, and λ are computed: the sample mean of the bid amounts for μ, the sample standard deviation of the bid amounts for σ, and the mean number of bidders minus 2 (due to the shifting) for the Poisson parameter λ.

Our results show that the EM approach outperforms the simple approach in the quality of its estimates for the distributions f(x), g(m) and f^1(x). Figure 3.2 shows typical estimated distributions^7 and the true distributions. We observe that the plot of f(x) as estimated by the simple approach is significantly shifted to the right of the true distribution, i.e. the simple approach overestimated f(x). We have also calculated the KL divergences from the true distributions to the estimated distributions, and the EM estimates have consistently lower KL divergences. This difference was verified to be significant, using the nonparametric Wilcoxon sign-rank test.

Bidding in Repeated Auctions

Estimates from both approaches were used to compute bidding strategies for an auction environment with 8 sequentially held auctions of the same kind of items, using the decision-theoretic model introduced in Section 3.4.1. The agent's "actual" expected payoffs U_1(b, v) under these bidding strategies were then computed, using the true distributions. The optimal bidding strategy and its expected payoff were also computed.
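The KL divergence comparison has a closed form in this parametric case. A brief sketch in Python of KL(true || estimated) for two normal distributions; the shifted estimate used here is a made-up illustration of the simple approach's rightward bias, not a measured result:

```python
import math

def kl_normal(mu0, s0, mu1, s1):
    """KL divergence from N(mu0, s0) to N(mu1, s1), i.e. KL(true || estimate)."""
    return math.log(s1 / s0) + (s0 ** 2 + (mu0 - mu1) ** 2) / (2.0 * s1 ** 2) - 0.5

true_kl = kl_normal(4.0, 3.5, 4.0, 3.5)    # identical distributions: divergence 0
biased_kl = kl_normal(4.0, 3.5, 6.0, 3.5)  # a rightward-shifted estimate scores worse
```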
Our results show that the EM approach gives rise to bidding strategies closer to the optimal strategy, and achieves higher expected payoffs, than the simple approach. Figure 3.3 shows a plot of the bidding strategies in the first auction, and a box plot of the mean regrets, i.e. the differences between optimal expected payoffs and actual expected payoffs. Formally, let b* denote the optimal strategy and b the strategy computed using the estimated distributions; then the regret given our agent's valuation v is

R(v) = U_1(b*, v) - U_1(b, v)

and the mean regret is the expected value of R(v) over the distribution of v, which we set to be the same as the other bidders' bid distribution f. From the box plot we observe that the mean regret of the EM approach is much smaller than that of the simple approach. The ratio between the mean regrets of the EM and simple approaches is 1 : 56.

We also used the estimated distributions in the decision-theoretic model with overlapping auctions, as described in Section 3.4.1. We again tested both approaches on 8 sequentially held auctions, but now some early bidding activity on each auction was generated. These results are shown in Figure 3.4. Again we see that the EM approach achieves higher expected payoffs (and thus lower regret) than the simple approach. The EM approach seemed to benefit more from the extra information of early bids than the simple approach: the ratio between the mean regrets of the EM and simple approaches increased to 1 : 390.

^7 The distributions shown were randomly chosen from the 15 instances of the data set. We have verified that the plots of the other distributions are qualitatively similar.

Figure 3.2: Results for Data Set 1, Distribution Estimation: distribution of bids f(x) (left); distribution of highest bids f^1(x) (right).
Figure 3.3: Results for Data Set 1, Bidding: bidding strategies in the first auction (left); box plot of payoff regrets of the two approaches (right).

Synthetic Data Set 2: Non-identical Items

In our second data set, the items on sale are not identical; instead the distribution of valuations is influenced by an observable attribute a. In this data set the dependence is linear: f(x|a) = N(1.1a + 1.0, 3.5). g(m) is a Poisson distribution as before: g(m - 2) = P(35). For each auction, a is sampled uniformly from the interval [3, 9]. In other words, this data set is similar to Data Set 1, except that the bid distribution f(x) is drawn from a different parametric family. Both approaches now use linear regression to estimate the linear coefficients.

Estimating the Distributions

Again, our results show that the EM approach outperforms the simple approach on this data set, in terms of its estimates for f(x) and g(m). Figure 3.5 (left) shows the estimated linear relation between the mean of f(x|a) and a. From the figure we can see that the EM approach gives a much better estimate of the linear function. The simple approach again significantly overestimates the bid amounts. In fact the simple approach has consistently overestimated f(x) on all the synthetic data sets we tested.

Figure 3.4: Box plot of expected payoff regrets for overlapping auctions.

Figure 3.5: Results for Data Set 2: linear relationship between the mean of f(x|a) and a (left); box plot of payoff regrets (right).
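The linear-coefficient estimation for Data Set 2 reduces to ordinary least squares on (attribute, bid) pairs: observed bids only for the simple approach, completed bids (observed plus sampled hidden) in the EM case. A self-contained sketch; the helper name and input format are our own:

```python
def fit_linear_mean(pairs):
    """Least-squares fit of E[x | a] = c1 * a + c0 from (a, bid) pairs."""
    n = float(len(pairs))
    sa = sum(a for a, _ in pairs)
    sx = sum(x for _, x in pairs)
    saa = sum(a * a for a, _ in pairs)
    sax = sum(a * x for a, x in pairs)
    c1 = (n * sax - sa * sx) / (n * saa - sa * sa)  # slope
    c0 = (sx - c1 * sa) / n                         # intercept
    return c1, c0

# Noise-free check: points on the true line 1.1 * a + 1.0 are recovered exactly.
c1, c0 = fit_linear_mean([(3.0, 4.3), (6.0, 7.6), (9.0, 10.9)])
```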
Bidding in Repeated Auctions

We then used the estimated distributions to compute a decision-theoretic agent's bidding strategies and expected payoffs in an auction environment with 8 sequential auctions, where the attribute a of each item is observed. The EM approach again gives better expected payoffs; the statistical significance of this result is confirmed by Wilcoxon's signed-rank test. Figure 3.5 (right) shows a box plot of regrets from different instances of data sets, which shows that the EM approach achieved consistently higher payoffs.

Synthetic Data Set 3: Unknown Distributions

We go back to the identical-items model with stationary distributions f(x) and g(m). For this data set, f(x) is a Gamma distribution with shape parameter 2 and scale parameter 3. g(m) is a mixture of two Poisson distributions: P(4) with probability 0.6 and P(60) with probability 0.4. But now the estimation approaches do not know the types of the true distributions. Instead, both use kernel density estimation (kernel smoothing), a nonparametric estimation strategy. Essentially, given N samples from a distribution p(x), we estimate p(x) by a mixture of N kernel functions centered at the N samples.

Estimating the Distributions

A Gaussian kernel is used for estimating f(x) and a uniform kernel is used for estimating g(m). At each M step of the EM algorithm, the bandwidth parameters of the two kernel estimations need to be selected. We use the simple "rule of thumb" strategy [Silverman 1986] for bandwidth selection. We used the same kernel estimation and bandwidth selection technique for the simple approach. Our results show that the EM approach gives better estimates than the simple approach. Figure 3.6 shows typical estimated distributions and true distributions.
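The kernel smoothing just described can be sketched as follows. This is a generic Gaussian-kernel density estimator with Silverman's rule-of-thumb bandwidth (h = 1.06 * sigma * N^(-1/5)), not the thesis's actual implementation.

```python
import math

def silverman_bandwidth(samples):
    """Silverman's 'rule of thumb' bandwidth: h = 1.06 * sigma * N^(-1/5)."""
    n = len(samples)
    mean = sum(samples) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in samples) / (n - 1))
    return 1.06 * sigma * n ** (-0.2)

def gaussian_kde(samples, h=None):
    """Estimate p(x) by a mixture of N Gaussian kernels, one centred
    at each sample, each with bandwidth (standard deviation) h."""
    if h is None:
        h = silverman_bandwidth(samples)
    n = len(samples)
    norm = n * h * math.sqrt(2 * math.pi)
    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples) / norm
    return density
```

Since each kernel integrates to one, the returned mixture is itself a proper density.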
From the figure we can observe that the EM estimates of f(x), g(m) and f1(x) are much closer to the true distributions than the simple estimates. The EM estimates have significantly smaller KL divergences compared to the simple estimates, as verified by Wilcoxon's signed-rank test.

Bidding in Repeated Auctions

We then computed the expected payoffs under the decision-theoretic model with 8 sequential auctions. The expected payoffs of the EM approach were not significantly better than those of the simple approach, as shown by the box plot in Figure 3.6. One possible explanation is that although KL divergence is a good measure of similarity of distributions in general, under this particular sequential auction decision-theoretic model KL divergence might not be the best measure of quality. The bidding strategy as defined in (3.4) and (3.3) is optimal if the distributions f and g are the true underlying distributions. Using our estimated distributions, the resulting bidding strategy may be higher or lower than the optimal bids. However, in our experience in these experiments, bidding too high is more costly than bidding too low. This is not taken into account in the KL divergence computation, nor in our ML estimation procedure. This suggests that a possible future research direction is to identify a loss function suited to the sequential auction bidding model, and to compute estimated distributions by minimizing that loss function. Another possible approach is to compute a posterior distribution instead of point estimates of the parameters in each iteration. For example, West [1994] used Gibbs sampling techniques to compute the posterior distribution of parameters in a relatively simple model of incomplete data. Bidding strategies computed using the distribution of parameters should be free of the problem mentioned above. However, this kind of approach would be more computationally expensive.
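The KL divergence comparison between a true and an estimated distribution can be computed over a shared discretization. This is a generic sketch; the bin layout and the eps floor for empty bins are our assumptions, not details from the thesis.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """Discrete KL divergence D(p || q) over a shared set of bins.
    p and q are (possibly unnormalized) histograms; empty bins of q are
    floored at eps so the divergence stays finite."""
    zp, zq = float(sum(p)), float(sum(q))
    return sum((pi / zp) * math.log((pi / zp) / max(qi / zq, eps))
               for pi, qi in zip(p, q) if pi > 0)
```

D(p || p) is zero, and a histogram further from p (in particular one with an empty bin where p has mass) scores a larger divergence.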
eBay Data on Sony PlayStation 2 Systems

Our experiments on synthetic data sets showed that our EM approach gave good estimates of the true distributions in several different settings. However, the synthetic data sets were generated using our model of the bidding process. Thus, the above experiments do not tell us whether our model of the bidding process is an accurate description of what happens in real-world online auctions. To answer this question, we wanted to test our approach on real-world bid data. On eBay, the bidding histories of completed auctions are available for 30 days. Unfortunately, information on the hidden bids, especially the proxy bids of the winners of the auctions, is not publicly available. So unlike in the synthetic data experiments, we cannot compare our estimated distributions with the "true" distributions.8

To get around this problem, we used the following approach. First we collected bidding histories from a set of eBay auctions. Next we pretended that those highest bids were not placed, and that the previously second-highest bids were the highest bids. We then "hid" these new highest bids of each auction. This gave us a "ground truth" for the hidden bids which allowed us to evaluate our different approaches. It is true that this approach of hiding bids changed the underlying distribution of bids; however, this did not worry us, as our goal was not to learn the true distribution of bids in our eBay auctions.

8 Of course we could have used our techniques to generate values for the missing bids; however, this would have been unfair when the goal was to test these techniques!

Figure 3.6: Results for Data Set 3: distribution f(x) (top-left); distribution g(m) (top-right); distribution f1(x) (bottom-left); box plot of payoff regrets (bottom-right).
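The bid-hiding construction above can be sketched as follows, assuming each auction's observed bids are given as a flat list of amounts (a simplification of real eBay histories):

```python
def hide_highest_bid(auction_bids):
    """Pretend the highest bid was never placed, then 'hide' the new
    highest bid (the previously second-highest): return the remaining
    observed bids together with the hidden bid, which now serves as
    ground truth.  Requires at least 3 bids so that one observed bid
    remains."""
    if len(auction_bids) < 3:
        raise ValueError("need at least 3 bids")
    ranked = sorted(auction_bids, reverse=True)
    hidden_highest = ranked[1]   # previously second-highest bid
    observed = ranked[2:]        # bids that remain visible
    return observed, hidden_highest
```

The minimum of 3 bids mirrors the selection criterion used later for the PlayStation data: drop one bid and still have one observed bid.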
Instead, our goal was to evaluate our algorithms' performance on a realistic, non-synthetic data set. We believe that despite the removal of the high bid, our data set preserves the qualitative characteristics of the original bidding history data. If our model of the bidding process is correct, then our EM approach should be able to correctly account for the hidden bids in this data set and produce good estimates of f1(x).

We collected the bidding history of eBay auctions on brand new Sony PlayStation 2 (Slim Model) consoles, over the month of March 2005. We chose to study these auctions because they had been previously studied by Shah et al. [2003], who argued that bidders' valuations on PlayStations tend to be close to the private value model. We considered only auctions that lasted one day and had at least 3 bidders.9 We found 60 auctions that satisfied these requirements. We then randomly divided our data into a training set and a testing set.

Estimating the Distributions

We tested four learning approaches: the EM and simple approaches that estimate a Normal distribution for f(x) and a Poisson distribution for g(m), and the EM and simple approaches that use kernel density estimation to estimate f(x) and g(m). Of course we did not have a ground truth for these distributions, so it was not possible to compare the accuracy of the predictions. However, we could use both approaches' estimates to estimate f1(x) based on the training set, and compare these estimates against the highest bids from the test set. We did 8 runs of this experiment with different random partitions of training set and testing set, and aggregated the results. The KL divergences of f1(x) of the approaches were similar, and no one approach was significantly better than the others.

Bidding in Repeated Auctions

We then computed the expected payoffs under the decision-theoretic model.
The EM approaches achieved significantly higher payoffs than the simple approaches, as shown in Figure 3.7. The approaches using parametric models achieved payoffs similar to those of the corresponding approaches with kernels. The good performance of the parametric-estimation EM approach on the eBay data set indicates that the Normal and Poisson models for f(x) and g(m) may be adequate for modeling bidding on eBay. The EM approaches did not have better KL divergence than the simple approaches, but nevertheless outperformed the simple approaches in the repeated auction experiments. This is similar to our situation in Data Set 3. Our experiments have shown that there is a positive correlation between better KL divergence and better performance in repeated auctions, but that this correlation is not perfect.

9 We needed at least 3 bidders so that we could drop one and still have one observed bid.

Figure 3.7: Box plot of payoff regrets on the eBay Data Set.

3.5.2 Online Auctions without Proxies

We built a data set modeling online auctions without proxies in order to compare the EM approach against the simple approach in the game-theoretic setting described in Section 3.4.2. Thus, in this section we consider questions 1 and 3 from the list at the beginning of Section 3.5. Similar to Section 3.5.1, we use a normal distribution N(5.0, 2.0) for f(v) and a shifted Poisson distribution (with lambda = 10) for g(m). Each instance of the data set consists of bidding histories from 30 auctions.

Estimating the Distributions

As might be expected from the results in the previous section, we found that the EM approach gave much better estimates of f(v) and g(m) than the simple approach. Figure 3.8 shows typical estimated distributions and the true distributions.
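Sampling one instance of this no-proxy data set can be sketched as follows. Two details are our assumptions: we read 2.0 in N(5.0, 2.0) as the standard deviation, and we reuse the shift of 2 from the earlier shifted-Poisson data sets so that every auction has at least 2 bidders.

```python
import math
import random

def generate_instance(n_auctions=30, seed=0):
    """Sample one instance of the no-proxy data set: for each auction,
    draw the number of bidders m from a shifted Poisson with lambda = 10
    (assumed shift of 2, so m >= 2), and each bidder's valuation from
    N(5.0, 2.0), truncated at zero."""
    rng = random.Random(seed)

    def shifted_poisson(lam, shift=2):
        # inverse-transform sampling from the Poisson CDF
        u, k = rng.random(), 0
        p = math.exp(-lam)
        cdf = p
        while cdf < u:
            k += 1
            p *= lam / k
            cdf += p
        return k + shift

    return [[max(0.0, rng.gauss(5.0, 2.0))
             for _ in range(shifted_poisson(10))]
            for _ in range(n_auctions)]
```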
Estimating the Bayes-Nash Equilibrium

We then compared the game-theoretic bidding agents that would have been built using the distributions estimated by the two learning approaches. Unlike in Section 3.5.1, we cannot simply use expected payoff as a measure of performance, since in a game-theoretic setting our payoff depends on other agents' strategies; for example, a bad approximation to an equilibrium could actually lead to increased payoffs for all agents. (Consider, e.g., the prisoner's dilemma.) Instead, what we can measure is the amount that each player could gain by unilaterally deviating from the current strategy at the "equilibrium" strategy profile computed using each learning approach. One way of thinking of this value relates to the concept of epsilon-equilibrium. (Recall that a strategy profile is an epsilon-equilibrium if each agent can gain at most epsilon by unilaterally deviating from the strategy profile.) Every strategy profile is an epsilon-equilibrium for some epsilon; what we compute is the smallest epsilon for which the learned strategies form an epsilon-equilibrium.

For our auction setting, we observe (from Theorem 3) that no matter what distributions we use for f and g, our agents will play a strategy in P. As a result, the true equilibrium strategy (computed from Theorem 3 using the true distributions) is always a best response to the strategies our agents play. Given our agent's valuation v and the observed bidding history so far xv, we can compute the difference between the expected payoff of the best response and the expected payoff of our agent's strategy. The epsilon for the game is then the expectation of this difference over v, m, and bidding history xv.

Figure 3.8: Results for Online Auctions without Proxies: the value distributions f(v) (left); the distributions of the number of bidders g(m) (right).
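The epsilon estimate just described can be sketched as a Monte Carlo average of the gain from unilateral deviation. All callables here (`best_response`, `learned_strategy`, `payoff`, `sample_state`) are hypothetical stand-ins for the thesis's auction model, not its actual interface.

```python
import random

def estimate_epsilon(best_response, learned_strategy, payoff,
                     sample_state, n=5000, seed=0):
    """Monte Carlo estimate of the smallest epsilon for which the
    learned strategies form an epsilon-equilibrium: the expected gain
    from unilaterally deviating to the best response, averaged over
    sampled (valuation, bidding-history) situations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        v, history = sample_state(rng)
        gain = (payoff(best_response(v, history), v, history)
                - payoff(learned_strategy(v, history), v, history))
        total += max(0.0, gain)  # a true best response never does worse
    return total / n
```

A strategy that already coincides with the best response yields epsilon = 0; any systematic deviation yields a strictly positive estimate.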
We approximate this expectation by sampling v, m, xv and taking the mean of the payoff differences. We generated bidding histories for 20 auctions using each approach, and for each bidder in these auctions we computed expected payoff differences for 50 valuations v drawn from the true f(v). Our results show that strategy profiles computed using the EM approach are epsilon-equilibria for much smaller values of epsilon than the strategy profiles computed using the simple approach. This means that the EM bidding agent's strategy is much closer to the true equilibrium. We computed epsilon for 15 instances of the training data set, and Figure 3.9 gives a box plot of the resulting epsilons.

3.6 Related Work

Our EM approach is similar in spirit to an approach by Boutilier et al. [1999]. This work concerns a decision-theoretic MDP approach to bidding in sequential first-price auctions for complementary goods. For the case when these sequential auctions are repeated, this paper discusses learning a distribution of other agents' highest bids for each good, based on the winning bids in past auctions. If the agent's own bid wins in an auction, the highest bid by the other agents is hidden, because only the winning bid is revealed. To overcome this problem of hidden bids, the paper uses an EM approach to learn the distribution of the highest bid.

Figure 3.9: Results for Online Auctions without Proxies: box plot of epsilon-equilibria using the simple and EM approaches.

In our English auction setting the highest bids are always hidden, and thus cannot be directly estimated. Instead we have to estimate the distributions of bids f(x) and of the number of bidders g(m)—which requires us to take into account bids other than the highest—and then compute the distribution of highest bids using f(x) and g(m). As a result, the learning task in our domain is more complicated than the problem considered by Boutilier et al.
[1999]. Several other papers have tried to solve the hidden bid problem in various auction settings. Rogers et al. [2005] studied English auctions with discrete bid levels. They discussed estimating the distributions of valuations and numbers of bidders (f and g) from data, then using the estimated distributions to compute an optimal set of bid levels (including the reserve price) for the auctioneer. Their learning approach is to look only at the final prices of the auctions, and then to use Bayesian inference to compute posterior distributions of the parameters of f and g given the final prices of previous auctions. We note that this approach ignores all the earlier bids, which carry information about f and g, while our approach uses all observed bids. Furthermore, Rogers et al.'s [2005] approach works only for parametric distributions. If the true underlying distributions are not in the chosen parametric family of distributions, parametric estimation tends to give poor results. In Section 3.5.1 we showed that our approach can use nonparametric estimation techniques (kernel density estimation). Finally, the method of Rogers et al. [2005] needs to compute the joint distribution of multiple parameters, which takes exponential time in the number of parameters. Our EM approach only tries to compute ML/MAP estimates of the parameters, so each iteration of EM scales roughly linearly with the number of parameters.

Another relevant paper is Song [2004]. Like us, Song studies the estimation problem in online English auctions in eBay-like environments. However, she used a different approach, based on the theoretical result that the second- and third-highest valuations uniquely identify the underlying value distribution. However, applying this fact to the eBay model presents a problem: while the highest
observed bid corresponds to the second-highest valuation, the second-highest observed bid is not necessarily the third-highest valuation. This is because the first- or second-highest bidders may have submitted bids higher than the third-highest value before the third-highest-valued bidder had a chance to submit her final bid, which means her bid can be hidden from the bidding history. Thus the second-highest observed bid is sometimes less than the third-highest valuation, and ignoring this fact would introduce bias into the estimation. Song recognizes this problem, and her solution is to use data from auctions in which the first- or second-highest bidder submitted bids higher than the second-highest observed bid late in the auction, because these auctions have a relatively higher probability that their second-highest observed bids are third-highest valuations. However, this approach merely reduces expected bias rather than avoiding it entirely. Indeed, we note that there is empirical evidence [Roth and Ockenfels 2002] that bidding activity late in auctions has much higher density than earlier bidding activity. This suggests that even for the selected auctions of Song's approach, there may be a significant probability that second-highest observed bids are less than third-highest valuations. Our approach models the hidden bidders, so it does not suffer from the bias introduced in her approach, and does not need to drop any auctions from consideration. Finally, Song [2004] did not discuss how to estimate the distribution of the number of bidders, which is essential for computing optimal bidding strategies in our repeated auctions setting (see Section 3.4).

Haile and Tamer [2003] analyzed a different form of the hidden bid problem: a bidder may have submitted an earlier bid below her valuation, then the current price rises above her valuation, so she does not bid again.
As a result, bidders' final observed bids may be below their valuations. Haile and Tamer [2003] proposed a method to compute bounds on the value distributions given the observed bids. In our model, bidders' final observed bids coincide with their willingness to pay, so this issue is not addressed in our current model. On the other hand, Haile and Tamer's [2003] approach was intended for physical (as opposed to online) auctions, where there are no fixed closing times and the number of potential bidders is assumed to be known. Thus their approach cannot be directly applied to online auctions, where the number of bidders is inherently unknown. An interesting future research direction is to combine our approach, which deals with hidden bidders, with Haile and Tamer's [2003] technique for bounding the valuation distribution.

3.7 Conclusion

In this chapter we have described techniques for building bidding agents in online auction settings that include hidden bids. In particular, we have addressed the issue of estimating the distributions of the number of bidders and bid amounts from incomplete auction data. We proposed a learning approach based on the EM algorithm that takes into account the missing bids by iteratively generating missing bids and doing maximum likelihood estimates on the completed set of bids. We applied our approach to both decision-theoretic and game-theoretic settings, and conducted experiments on both synthetic data as well as on eBay data. Our results show that our approach never did worse and often did much better than the straightforward approach of ignoring the missing data, both in terms of the quality of the estimates and in terms of expected payoffs under a decision-theoretic bidding model.

Bibliography

Altman, Alon and Moshe Tennenholtz (2005). Ranking systems: The PageRank axioms. ACM Conference on Electronic Commerce.

Andrieu, C., N.
de Freitas, A. Doucet and M.I. Jordan (2003). An introduction to MCMC for machine learning. Machine Learning.

Anthony, P., W. Hall, V.D. Dang and N. Jennings (2001). Autonomous agents for participating in multiple online auctions. IJCAI Workshop on EBusiness and the Intelligent Web.

Arora, A., H. Xu, R. Padman and W. Vogt (2003). Optimal bidding in sequential online auctions. Working Paper.

Athey, S. and P. Haile (2002). Identification in standard auction models. Econometrica, 70(6), 2107-2140.

Bhat, N. and K. Leyton-Brown (2004). Computing Nash equilibria of action-graph games. UAI.

Blum, B., C. Shelton and D. Koller (2002). Gametracer. http://dags.stanford.edu/Games/gametracer.html.

Boutilier, C., M. Goldszmidt and B. Sabata (1999). Sequential auctions for the allocation of resources with complementarities. IJCAI.

Bowling, M. (2004). Convergence and no-regret in multiagent learning. NIPS.

Bowling, M. and M. Veloso (2001). Convergence of gradient dynamics with a variable learning rate. ICML.

Byde, A. (2002). A comparison among bidding algorithms for multiple auctions. Agent-Mediated Electronic Commerce IV.

Cai, G. and P.R. Wurman (2003). Monte Carlo approximation in incomplete-information, sequential-auction games (Technical Report). North Carolina State University.

Chen, X. and X. Deng (2005). Settling the complexity of 2-player Nash equilibrium (Technical Report TR05-150). ECCC.

Conitzer, V. and T. Sandholm (2003). Complexity results about Nash equilibria. IJCAI.

Daskalakis, C., P.W. Goldberg and C.H. Papadimitriou (2005). The complexity of computing a Nash equilibrium (Technical Report TR05-115). ECCC.

Dempster, A., N. Laird and D. Rubin (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39(1), 1-38.

Goldberg, P.W. and C.H. Papadimitriou (2005).
Reducibility among equilibrium problems (Technical Report TR05-090). ECCC.

Golle, Philippe, Kevin Leyton-Brown, Ilya Mironov and Mark Lillibridge (2001). Incentives for sharing in peer-to-peer networks. International Workshop on Electronic Commerce (WELCOM).

Govindan, S. and R. Wilson (2003). A global Newton method to compute Nash equilibria. Journal of Economic Theory.

Govindan, S. and R. Wilson (2004). Computing Nash equilibria by iterated polymatrix approximation. Journal of Economic Dynamics and Control, 28, 1229-1241.

Greenwald, A. and J. Boyan (2004). Bidding under uncertainty: Theory and experiments. UAI.

Haile, Philip A. and Elie Tamer (2003). Inferences with an incomplete model of English auctions. Journal of Political Economy, 111(1), 1-51.

Kearns, M.J., M.L. Littman and S.P. Singh (2001). Graphical models for game theory. UAI.

Klemperer, P. (2000). Auction theory: A guide to the literature. In P. Klemperer (Ed.), The economic theory of auctions. Edward Elgar.

Koller, D., N. Megiddo and B. von Stengel (1994). Fast algorithms for finding randomized strategies in game trees. STOC.

Koller, D. and B. Milch (2001). Multi-agent influence diagrams for representing and solving games. IJCAI.

La Mura, P. (2000). Game networks. UAI.

Leyton-Brown, K. and M. Tennenholtz (2003). Local-effect games. IJCAI.

Mackie-Mason, J.K., A. Osepayshvili, D.M. Reeves and M.P. Wellman (2004). Price prediction strategies for market-based scheduling. Fourteenth International Conference on Automated Planning and Scheduling (pp. 244-252).

Milgrom, P. and R. Weber (2000). A theory of auctions and competitive bidding, II. In P. Klemperer (Ed.), The economic theory of auctions. Edward Elgar.

Osepayshvili, A., M.P. Wellman, D.M. Reeves and J.K. Mackie-Mason (2005). Self-confirming price prediction for simultaneous ascending auctions.
UAI.

Papadimitriou, C.H. (2005). Computing correlated equilibria in multiplayer games. STOC.

Porter, R., E. Nudelman and Y. Shoham (2004). Simple search methods for finding a Nash equilibrium. AAAI (pp. 664-669).

Powers, R. and Y. Shoham (2004). New criteria and a new algorithm for learning in multiagent systems. NIPS.

Regev, O. and N. Nisan (1998). The popcorn market: Online markets for computational resources. International Conference on Information and Computation Economies.

Rogers, A., E. David, J. Schiff, S. Kraus and N.R. Jennings (2005). Learning environmental parameters for the design of optimal English auctions with discrete bid levels. International Workshop on Agent-Mediated E-Commerce. Utrecht, Netherlands.

Rosenthal, R.W. (1973). A class of games possessing pure-strategy Nash equilibria. Int. J. Game Theory, 2, 65-67.

Roth, A.E. and A. Ockenfels (2002). Last-minute bidding and the rules for ending second-price auctions: Evidence from eBay and Amazon auctions on the internet. American Economic Review.

Roughgarden, T. and E. Tardos (2004). Bounding the inefficiency of equilibria in nonatomic congestion games. Games and Economic Behavior, 47(2), 389-403.

Shah, H.S., N.R. Joshi, A. Sureka and P.R. Wurman (2003). Mining for bidding strategies on eBay. Lecture Notes on Artificial Intelligence.

Silverman, B.W. (1986). Density estimation. London: Chapman and Hall.

Song, Unjy (2004). Nonparametric estimation of an eBay auction model with an unknown number of bidders. University of British Columbia.

Stone, P., R.E. Schapire, J.A. Csirik, M.L. Littman and D. McAllester (2002). ATTac-2001: A learning, autonomous bidding agent. Agent-Mediated Electronic Commerce IV (pp. 143-160).

van der Laan, G., A.J.J. Talman and L. van der Heyden (1987).
Simplicial variable dimension algorithms for solving the nonlinear complementarity problem on a product of unit simplices using a general labelling. Mathematics of OR, 12(3), 377-397.

Waldspurger, C.A., T. Hogg, B.A. Huberman, J.O. Kephart and W.S. Stornetta (1992). Spawn: A distributed computational economy. IEEE Transactions on Software Engineering.

Weber, R. (1983). Multi-object auctions. In R. Engelbrecht-Wiggans, M. Shubik and R. Stark (Eds.), Auctions, bidding, and contracting: Uses and theory, 165-191. New York University Press.

Wellman, M.P., A. Greenwald, P. Stone and P.R. Wurman (2002). The 2001 Trading Agent Competition. IAAI.

West, M. (1994). Discovery sampling and selection models. In J.O. Berger and S.S. Gupta (Eds.), Decision theory and related topics IV. New York: Springer Verlag.

Zhang, N.L. and D. Poole (1996). Exploiting causal independence in Bayesian network inference. JAIR, 5, 301-328.